A Real-Time Monitor for a Distributed Real-Time Operating System

Hideyuki Tokuda, Makoto Kotera and Clifford W. Mercer

Computer Science Department
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
(412) 268-7672
H. [email protected]

Abstract

Monitoring and debugging for a distributed real-time system is a complicated problem due to the lack of a set of advanced tools and adequate operating system capability. Software tools can cover the wide range of the software development life cycle from the requirement analysis phase to the debugging and maintenance phases. However, many of these modern tools are not effective for building or analyzing complex real-time systems. Real-time software tools and effective kernel support are essential to reduce the complexity of real-time software. In this paper, we first address the issues in real-time monitoring and debugging, such as capturing a timing error, the monitor's invasive nature, and the visualization of system behavior. We then describe the architecture of the ART Real-Time Monitor, which is being built for a distributed real-time operating system called ARTS. The built-in monitoring/debugging kernel primitives are also described.

1. Introduction

Monitoring and debugging for a distributed real-time system is a complicated problem due to a lack of a set of advanced tools and adequate operating system capability. Advances in software engineering have provided us with a set of modern programming tools for building large, complex software systems. Various tool sets can cover the wide range of the software development life cycle from the requirement analysis phase to the debugging and maintenance phases [1, 6]. However, we cannot simply reuse the existing tool set for designing and building complex real-time computing systems without considering their time management capabilities. For instance, the interactive debugging tool in Smalltalk-80 [7] allows us to track down a "logical" bug in a program very well. However, it is almost impossible to detect or fix a timing bug (error) in a real-time program. An additional tool set, such as a timing tool, a schedulability analyzer, and a real-time monitor/debugger, should be developed to reduce the complexity of real-time software.

One of the major issues is that the monitoring result should be able to capture both the logical correctness and the timing correctness of a target program. Another aspect of the difficulty in real-time monitoring comes from the invasive nature of the real-time monitoring activity in the distributed environment. It interferes not only with processor scheduling, but also with communication scheduling and activities.

The lack of kernel support makes monitoring/debugging for a distributed real-time system more complicated. Without kernel support, all attempts to realize real-time systems are likely to lead to ad hoc solutions. As a result, in traditional real-time systems, testing and verifying the timing correctness of a program was done in an ad hoc manner. Since a timing error cannot be captured until the last phase of integrated testing, it is extremely difficult to isolate the source of the timing error. It is therefore indispensable for a real-time operating system to effectively support a monitoring/debugging facility.

In this paper, we describe our "software approach" to real-time monitoring/debugging. A software approach cannot be completely non-invasive; however, it is very practical and flexible. Similar software approaches were used in distributed program monitoring [14, 15, 16, 17], a distributed debugger [2], and a performance diagnostic system [13]. First, we address the fundamental issues in monitoring and debugging distributed real-time systems. We then describe the ART Real-Time Monitor, which cooperates with a schedulability analyzer, Scheduler 1-2-3 [22]. This monitor is being developed for a real-time distributed operating system called ARTS. ARTS is to support research and experimentation with an object-oriented computational model, an integrated time-driven scheduler, and reliability mechanisms that are essential to distributed real-time operating systems. We present the built-in monitoring/debugging primitives provided by the ARTS kernel. We then discuss the benefits and limitations of our real-time monitoring approach.

2. Issues in Real-Time Monitoring and Debugging

There are two types of errors we often encounter while debugging a distributed real-time program: one is related to logical errors and the other to timing errors. Both are very difficult to track down in a real-time environment. In a distributed real-time environment in particular, the lack of an instantaneous, accurate global state or event ordering creates extra complexity in analyzing program behavior. In a distributed system with special hardware support, the system may provide a global uniform clock to each node [3]. However, it is still difficult to capture a timing error in a

This research was supported in part by the U.S. Naval Ocean Systems Center under contract number N66001-87-C-0155, by the Office of Naval Research under contract number N00014-64-K-073, and by the Federal Systems Division of IBM Corporation under University Agreement YA-276067. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of NOSC, ONR, IBM, or the U.S. Government.

program. In this section, let us first focus on the timing error problem.

In many traditional real-time systems, there was an implicit binding between a program code segment and its timing constraint. Even though individual module testing verifies that each module can meet its timing constraint, a timing error may be exposed as a result of integrated module testing. This is due to a poor scheduler, such as a "cyclic executive" based on a specific time-line chart analysis [8]. In such a cyclic executive, there is one major scheduling cycle, and the major cycle is divided into a group of minor cycles. Each minor cycle is assigned to a periodic activity (often a set of "routines") and referred to as a frame. Although each frame has a fixed number of routines, the actual routines called can vary from frame to frame. If a frame fails to complete within its minor cycle and overruns, one of the common problems, called frame overrun, takes place. Then, finding the timing error becomes non-trivial, since the specific routine causing the problem cannot be determined. It is the system or program designer's responsibility to determine how the timing error has occurred.

What we need is a notion of time encapsulation in a real-time environment, in contrast to the well-known notion of data encapsulation [11]. By time encapsulation we mean that each module's timing requirement and timing error will be encapsulated and cannot penetrate the module boundary. Unlike data encapsulation, programming language support alone is not sufficient to provide a time encapsulation mechanism in the system. In other words, we need a better programming language construct which can express timing constraints explicitly in a module. Then we must develop an underlying scheduler which supports time encapsulation.

We are developing an "integrated time-driven scheduler" which supports such time encapsulation among real-time tasks for our testbed system. The integrated time-driven scheduler uses a rate monotonic scheduling policy [12] for periodic "hard" real-time tasks and adopts the deferrable server algorithm [9] for aperiodic "soft" real-time tasks. The timing constraint of each aperiodic task is defined by its value-function [21]. The scheduler attempts not only to meet all "hard" deadlines, but also to minimize the average response time of aperiodic tasks. The rate monotonic scheduling policy also allows us to analyze the schedulability of a given task set by using one of our tools, a schedulability analyzer (see Section 3). The deferrable server algorithm preserves a proper amount of total CPU utilization which can be given to aperiodic activities. The value-function of a task represents the semantic importance of the task and is used to select the next runnable aperiodic task or to perform load shedding under overload conditions. By using the value-function, we can easily select the most important task for the next run and can also abort the least important tasks.

Other important issues are the "invasive" nature of the system monitoring activities and the "visualization" of important events or system behavior. No software monitoring approach can provide a 100% non-invasive monitoring scheme. Our approach is to minimize and predict the effect of performance degradation or system interference due to the monitoring activities. In a real-time system without careful scheduling, a monitoring activity can easily interfere with a "hard" deadline activity and cause a missed deadline. To avoid the interference, we built the monitor in such a way that each periodic monitoring activity is also taken into account in the schedulability analysis. Then, we can verify beforehand whether the "hard" tasks and the monitoring processes are schedulable.

The visualization part is a key component of the monitor functions; however, there is no uniform visualization technique for representing anticipated or unexpected system events or behavior. For instance, the interactive use of an animated tracer for interprocess communication activities [2] improved the user's debugging ability significantly in our previous testbed [19]. In fact, it also helped to find a minor bug in an IPC primitive itself. Since no one wants to see a massive amount of raw data, powerful visualization support is essential to the monitor.

3. ART Real-Time Monitor

The objective of the ART Real-Time Monitor is to visualize the system's internal behavior for the designers of ARTS. Our approach to system monitoring is to build the monitoring activity as a permanent part of the target system(1). The system must be able to perform reliably with the monitoring activity, and no functionality is lost by leaving the monitor in place. If the monitor were removed after the development phase, the timing interactions of the system might change, and this is not desirable. Moreover, because of a real-time system's time-critical nature, the monitoring activity must produce a minimum amount of interference. Of course, it is difficult to totally eliminate the interference. Our approach can predict in advance whether the given task set can meet its time constraints with the overhead produced by the built-in monitoring activity.

The monitor should also be able to visualize the system activity at an arbitrary level of abstraction. However, as a first implementation, the Real-Time Monitor visualizes the system's scheduling decisions among periodic and aperiodic processes(2) in terms of an execution history diagram. This functionality is suitable for verification of the rate monotonic scheduling algorithm and the deferrable server. The monitor will be integrated with interactive debugging capability.

(1) A similar approach was proposed by Svobodova [18], and her model has a richer set of monitoring operations.
(2) "Process" and "task" are used interchangeably in this paper.
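To illustrate the rate monotonic side of the scheduling model described above, the following sketch assigns priorities by period and applies the Liu and Layland utilization bound. It is a simplified stand-in, not the actual Scheduler 1-2-3 or ARTS code, and the task set (names, periods, and execution times) is hypothetical.

```python
# Illustrative sketch, not the actual ARTS/Scheduler 1-2-3 code.
# Rate monotonic: shorter period => higher priority. A set of n
# periodic tasks is schedulable if total utilization U satisfies
# U <= n * (2**(1/n) - 1), a sufficient (not necessary) condition.

def rm_priority_order(tasks):
    """Return task names from highest to lowest RM priority."""
    return [t["name"] for t in sorted(tasks, key=lambda t: t["period"])]

def rm_schedulable(tasks):
    """Liu-Layland utilization bound test."""
    n = len(tasks)
    u = sum(t["wcet"] / t["period"] for t in tasks)
    return u <= n * (2 ** (1.0 / n) - 1)

# Hypothetical task set (times in ticks); note that the monitor's
# Reporter is included as an ordinary periodic task, so its
# interference is part of the same analysis.
tasks = [
    {"name": "attitude_updater", "period": 25,  "wcet": 8},
    {"name": "velocity_updater", "period": 400, "wcet": 40},
    {"name": "reporter",         "period": 200, "wcet": 5},
]
print(rm_priority_order(tasks))  # -> ['attitude_updater', 'reporter', 'velocity_updater']
print(rm_schedulable(tasks))     # U = 0.445 <= 0.7798 -> True
```

Treating the Reporter as just another periodic task is the key point: the same bound that verifies the "hard" tasks also verifies the monitoring overhead.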

3.1. Overall Structure

The ART Real-Time Monitor is divided into three functional units: the Event Tap, the Reporter, and the Visualizer. The following is a brief description of each; more details are provided in the next section. Figure 3-1 shows these components and their relationship to each other.

- The Event Tap is the part of the operating system code which records information about interesting events, which, in this case, are changes of process state. The Tap is embedded inside the kernel code which performs process switching.

- The Reporter sends the event messages from the target system to the Visualizer on a remote host. The time requirement of the communication portion of the Reporter is incorporated into the given task set so that the interference by the Reporter's activity can be analyzed. By creating a separate task for the Reporter, the interference is predictable in our schedulability analysis.

- The Visualizer is on a host outside of the target system. The Visualizer requires too many resources in terms of time and space to be accommodated within the target system. It utilizes the event messages sent by the Reporter and visualizes the events in the form of an execution history diagram.

3.2. Event Tap

The Event Tap is the mechanism by which the kernel captures the events of interest. The ART monitor has the capability for monitoring events at the process level. An event is generated each time a process changes its state. Events at the process level include process creation, waking up, blocking, scheduling, freezing, unfreezing, killing with completion, killing with missed deadline, and killing with frame overrun. The states and transitions are shown in Figure 3-2; each transition corresponds to an event. These events and related timing information are first stored in the event message buffer inside the kernel.

Figure 3-2: Process State-Transition Diagram

3.3. Reporter

The Reporter is a process on the target host which is responsible for communicating the events to the Visualizer. The Reporter periodically invokes a kernel primitive to package the events into an event message and sends the message over the network.

The Event Tap places the event information into the event message buffers. In each of these buffers, message headers are set in advance so that the buffers can be sent as event messages when they are filled with event records. Message headers depend on the communication media and protocol the Reporter uses. As soon as one buffer is filled, the data part of the next buffer is used, and the Event Tap begins storing event records again.
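The buffering scheme above can be sketched as follows. This is an illustrative model, not ARTS kernel code; the record layout and names are assumptions, and the 64-events-per-message figure comes from the measurements in Section 3.6.

```python
# Illustrative sketch of the Event Tap's buffering, not ARTS kernel
# code. Headers are prepared in advance, so a filled buffer can be
# handed to the Reporter as a complete event message without any
# further formatting work in the kernel's process-switch path.

EVENTS_PER_MESSAGE = 64   # event message size reported in Section 3.6

class EventTap:
    def __init__(self):
        self.ready = []    # filled event messages awaiting the Reporter
        self.events = []   # data part of the buffer currently in use
        self.seq = 0

    def record(self, pid, event, timestamp):
        """Called on each process state change."""
        self.events.append((pid, event, timestamp))
        if len(self.events) == EVENTS_PER_MESSAGE:
            # Buffer full: it becomes an event message as-is, and the
            # Tap switches to the next buffer.
            self.ready.append({"header": {"seq": self.seq},
                               "events": self.events})
            self.seq += 1
            self.events = []

tap = EventTap()
for t in range(130):                      # simulate 130 process switches
    tap.record(pid=t % 5, event="chosen", timestamp=t)
print(len(tap.ready), len(tap.events))    # -> 2 2
```

After 130 recorded state changes, two full 64-event messages are queued for the Reporter and two events sit in the buffer in use.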

[Figure 3-1 components: ARTS kernel image with Scheduler 1-2-3, file server and synthetic workload, and the Real-Time Monitor's Visualizer]

Figure 3-1: Overall Structure of the ART Real-time Monitor

3.4. Visualizer

The Visualizer utilizes the event messages from the Reporter, visualizes them, and then provides the user with an interface for remote debugging. An event message is a collection of raw data, from which it is hard to find meaningful information. The Visualizer interprets these raw data so as to visualize an execution diagram. After receiving an event message from the Reporter, the Visualizer not only creates an execution history diagram, but also calculates important statistics such as the total CPU utilization for periodic and aperiodic tasks, the number of successful completions, the number of deadlines missed by abortion or by cancellation, and the number of events per second. The Visualizer is also capable of replaying an execution diagram for later analysis. All received messages are filed, and the Visualizer replays the same execution diagram by referring to the filed information. Statistics are also provided while the replay is going on.

Figure 3-3 shows the monitoring results of the first 6.41 seconds of the system execution. The diagram consists of horizontal and vertical lines. Horizontal lines show the execution time of tasks, while vertical lines indicate task switches. This figure illustrates that all of the periodic tasks (processes 3, 4, and 5) met their 10 deadlines, and CPU utilization was recorded as 32.14% at that point. Task 5 is the Reporter, and it was executed successfully every 2 seconds after its creation. Figure 3-4, on the other hand, depicts how the Visualizer can help us debug the system, even though it works in a passive manner. In this picture, the Visualizer indicates that something wrong is happening within the system, showing that all activities have stopped 4.7 seconds after system initiation.
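The statistics pass described above can be sketched as a single scan over the event stream. The record layout and event names below are our assumptions for illustration, not the actual ARTS event format.

```python
# Illustrative sketch of the Visualizer's statistics pass; the record
# layout and event names are assumptions, not the ARTS event format.
# Each record is (time_sec, pid, event).

def summarize(records, wall_clock_sec):
    chosen_at = {}                     # pid -> time it last got the CPU
    busy = 0.0
    completed = missed = 0
    for t, pid, event in records:
        if event == "chosen":
            chosen_at[pid] = t
        elif pid in chosen_at:         # task gave up (or lost) the CPU
            busy += t - chosen_at.pop(pid)
            if event == "killed-completion":
                completed += 1
            elif event == "killed-missed-deadline":
                missed += 1
    return {"cpu_utilization": busy / wall_clock_sec,
            "completed": completed,
            "missed_deadlines": missed,
            "events_per_sec": len(records) / wall_clock_sec}

records = [(0.0, 3, "chosen"), (0.5, 3, "killed-completion"),
           (1.0, 4, "chosen"), (1.5, 4, "killed-missed-deadline")]
print(summarize(records, wall_clock_sec=2.0))
# -> {'cpu_utilization': 0.5, 'completed': 1,
#     'missed_deadlines': 1, 'events_per_sec': 2.0}
```

Because the same filed records drive both the live diagram and the replay, the statistics are reproducible on playback.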

Figure 3-3: The visualized run-time system behavior

3.5. Monitorability Analysis

In real-time monitoring, it is very important to predict the maximum interference and the capability of the monitoring process itself. Our approach is to predict the interference and monitorability during the system analysis phase by using a schedulability analyzer, called Scheduler 1-2-3. Scheduler 1-2-3 can analyze the schedulability of a given set of periodic and aperiodic processes under a given scheduling policy. The Reporter process is added to the given task set as a periodic process so that the interference by the monitoring activities can be taken into account in the schedulability analysis. The monitorability is also analyzed by using the worst case analysis performed by Scheduler 1-2-3. By monitorability, we mean that the Reporter can keep reporting the system behavior to the Visualizer. In a best-effort sense, even if the monitorability is not guaranteed, the system behavior can be reported to a certain extent. Nonetheless, events generated by the system will eventually overwhelm the capacity of the Reporter, and important events might be lost.

However, this approach has a limitation. In the case of procedure calls or object invocations, it is impossible to predict the worst case without a dynamic flow analysis scheme. Currently, the events of interest for the ART Real-Time Monitor are process switches, for which it is possible to analyze the worst case as follows.

For the monitorability analysis, we should know how many events can occur in the worst case. It is assumed that there are $n$ periodic tasks with periods $T_i$, for $1 \le i \le n$. In accordance with Leinbaugh's analysis [10], $task_i$ with period $T_i$ can be preempted $\lceil T_i / T_j \rceil + 1$ times in a period by $task_j$ with period $T_j$. In general, the number of preemptions in the worst case is estimated in the following manner.


Figure 3-4: The playback image of the Visualizer

The total worst-case number of preemptions of $task_i$ is

$$\mathit{Preemption}_i = \sum_{j=1,\, j \neq i}^{n} \left( \left\lceil \frac{T_i}{T_j} \right\rceil + 1 \right),$$

where each $task_j$ has the possibility of interfering with the activity of $task_i$. Meanwhile, under rate monotonic scheduling, the number of preemptions can decrease: $task_j$ can preempt $task_i$ iff $T_j < T_i$. Assuming an idle task whose period is infinitely long, every task arrival can be regarded as a preemption of the idle task. So, letting $E_{switch}$ and $T_r$ be the number of events caused by a preemption and the period of the monitoring task, respectively, the maximum number of events produced in $T_r$ is calculated with the following expression:

$$E_p = E_{switch} \cdot \sum_{j=1}^{n} \left( \left\lceil \frac{T_r}{T_j} \right\rceil + 1 \right).$$

If $E_p$ is bigger than the number of events that can be reported in one period, the system behavior cannot be followed by the Reporter. For example, Figure 3-5 shows a screen image of Scheduler 1-2-3 analyzing the schedulability of the Inertial Navigation System [4] task set, which consists of 9 periodic tasks and 3 aperiodic tasks. A user of Scheduler 1-2-3 can create periodic and aperiodic tasks by simply adjusting the length of a sliding bar with the mouse. In this example, Scheduler 1-2-3 reports that the given task set is schedulable with 88.5% CPU utilization, and that the system behavior can be reported with ~51.6 events being generated during every period of the Reporter.
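The bounds above can be evaluated mechanically; the sketch below does so for a hypothetical task set and is not the actual Scheduler 1-2-3 computation. The value of $E_{switch}$ (events generated per preemption) is an assumption here.

```python
# Illustrative evaluation of the worst-case preemption and event
# bounds from Section 3.5; hypothetical task set, not the actual
# Scheduler 1-2-3 computation.
from math import ceil

def preemptions(t_i, periods):
    """Worst-case preemptions of a task with period t_i (Leinbaugh-
    style bound): sum over the other tasks of ceil(t_i / t_j) + 1."""
    return sum(ceil(t_i / t_j) + 1 for t_j in periods if t_j != t_i)

def worst_case_events(t_r, periods, e_switch=2):
    """Max events in one Reporter period t_r. Under rate monotonic
    scheduling only tasks with t_j < t_r can preempt; e_switch is
    the (assumed) number of events a single preemption generates."""
    return e_switch * sum(ceil(t_r / t_j) + 1
                          for t_j in periods if t_j < t_r)

periods = [25, 100, 150, 500]           # hypothetical periods (ticks)
print(preemptions(200, periods))        # -> 17
print(worst_case_events(200, periods))  # -> 30
```

Comparing the second number against the Reporter's per-period capacity (64 events per message in the current implementation) answers the monitorability question directly.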

[Figure 3-5 screen image: Scheduler 1-2-3 with the rate monotonic scheduling algorithm applied to the INS task set. Cyclic tasks include Attitude_Updater, Velocity_Updater, Attitude_Sender, Navigation_Sender, Status_Display, Run_Time_BIT, Position_Updater, and Reporter64; acyclic tasks include Console_KB_ISR and Console_Screen_ISR. The message window reports "Schedulable with 0.885 utilization" and that the behavior "can be reported with Reporter64 up to 51.599897 events/reporter's period".]

Figure 3-5: The monitorability analysis on the INS task set

3.6. Monitoring Performance

The current implementation of the ART Real-Time Monitor has been done on a network of Sun3s (Sun3 is a registered trademark of Sun Microsystems, Inc.). The Reporter runs on the ARTS real-time kernel, and the Visualizer runs on a UNIX 4.2bsd host workstation. The performance results of the runtime monitor can be summarized as follows:

    Event Record Size:                           20 bytes
    Event Message:                               64 events per message
    Event Logging Overhead:                      ~40 usec per event
    Reporter's Message Transfer Speed:           ~3 msec per message (64 events)
    Visualizer's Message Receiving/Drawing Speed: ~150 msec per message (64 events)
    Creating a Light-Weight Process:             ~300 usec per process

4. ARTS Kernel Support

The ARTS Kernel provides a distributed real-time computing environment based on an object-oriented model. In ARTS, every computational entity is represented as an object, called an "artobject", and it can be a "passive" as well as an "active" object which can contain more than one (light-weight) process. In this section, we describe our object model, the kernel mechanisms which can confine timing errors, and the part of the ARTS kernel interface which is related to system monitoring and debugging support at various levels of abstraction.

4.1. Object Model

We view an "artobject" as a basic module for embodying a distributed abstract data type. An artobject is a distributed abstract data type consisting of two principal parts: a specification and a body. The artobject specification, describing the external user's view of the artobject, consists of a set of operations which other artobjects use to activate services offered by the artobject. The artobject body consists of a set of "passive" procedures, or contains at least one process, called INITIAL, as well as other processes which share a set of shared data objects. We refer to the former type as a "passive" object and the latter as an "active" object. When an instance of an active artobject is created, the INITIAL process is created and run immediately. We implemented the processes in an artobject as light-weight processes, for which creation and destruction can be done cheaply. The shared data objects in an artobject are totally hidden from other artobjects and may be accessed by other artobjects only by invoking an operation defined in the specification part.

Interaction among artobjects is performed by operation invocation. An invocation request to a passive artobject is performed as a remote procedure call, while a request to an active artobject is performed by an explicit "Request-Accept-Reply" sequence. Namely, an artobject invokes an operation by sending an invocation request message to the destination artobject explicitly, and then it waits for a result. When the destination sends a result back, the caller wakes up and receives the result. From a user's point of view, there is a common invocation syntax, like "object.opr(args)"; however, the ARTS Kernel provides a more flexible invocation mechanism which can perform an asynchronous request and one-to-many communication [5, 19].

The skeleton of a simple artobject is shown in Figure 4-1.

    artobject Sample specification
    begin
        operation opr1(arg, ...) => (result)
            within time except recovery_opr1(arg, ...)
        operation opr2(arg, ...) => (result)
            within time except recovery_opr2(arg, ...)
        ...
    end;

    artobject Sample body
    begin
        var DataObject;  -- shared data objects among processes

        process INITIAL();
        begin
            Accept(AnyOperation, ReqMsg);
            ...
            result = DoComputation(ReqMsg);
            Reply(Req.TransId, result);
        end INITIAL;

        operation opr1(arg, ...) => (result)
            within time except recovery_opr1(arg, ...)
        operation recovery_opr1(arg, ...) => (result)
        operation opr2(arg, ...) => (result)
            within time except recovery_opr2(arg, ...)
        operation recovery_opr2(arg, ...) => (result)
        ...
    end Sample;

    Figure 4-1: A Skeleton of Artobject Declaration

For each operation definition of an artobject, the designer of the object must provide a "time fence" (i.e., a worst case time bound) and its time exception handling routine by specifying "within time except recovery_opr()". The "time" indicates that the operation must be completed within that time limit; otherwise the specified "recovery" operation will be executed. The timing exception routine may raise a critical section problem within the requested operation. The basic principle in ARTS is that the designer must put the state of any critical data object "back" or "forward" to a consistent state. In real-time applications, we often prefer to use the "forward" recovery scheme (we call it a "compensation" routine [20]), since an "undo" operation cannot be performed against any external (real) state change.

4.2. Time Encapsulation Support

The notion of "time encapsulation" is to encapsulate (or confine) each module's timing error within the module. This also requires us to specify a timing requirement explicitly for each object. The ARTS Kernel supports time encapsulation among real-time objects/processes by providing two mechanisms: the integrated "time-driven" scheduler and the "time fence" mechanism.

The integrated time-driven scheduler (ITDS) uses a rate monotonic scheduling policy for periodic "hard" real-time tasks and adopts a value function based scheduling policy with the "deferrable server" for "soft" aperiodic tasks. In the ITDS scheduler, we first analyze the given "hard" real-time task set's schedulability; then we can compute the maximum CPU utilization that the deferrable server can consume. In other words, we try to provide the maximum amount of CPU cycles to the aperiodic task set while we can guarantee meeting all hard real-time tasks' deadlines. In summary, the ITDS scheduler can guarantee the following:

- Schedulability of the "hard" periodic tasks (together with the result from Scheduler 1-2-3),
- Value function based "soft" real-time task scheduling, and
- Overload control based on the given value functions of the aperiodic tasks.

The "time fence" is a mechanism to detect a timing error at every object invocation at runtime in the ARTS Kernel. Since we must specify the worst case timing requirement for each operation in an object, the kernel can perform the timing check. This is in a sense similar to an array boundary check. Suppose that artobject A's process P invokes an operation X on artobject Q (i.e., "Q.X()"); then the ARTS Kernel will initiate the following timing check during the invocation protocol. Suppose Pct and Pwst stand for P's current time and P's worst case slack time, respectively, and P's current fence time (i.e., P's starting time plus the current fence value) is represented by Pcft.

- When P invokes Q.X(), P's request message will carry Pct as well as Pwst (i.e., Pwst = Pcft - Pct) in P's local time.

- After the request message is received by the communication manager at Q's site, it will check whether P can meet the current fence value Qcfv by the following condition:

        if (Pwst < Qcfv + 2*Commd + ClockE) {
            art_error("Time fence violation");
            return(INVOCATION_FENCE_ERROR);
        }

  where Commd indicates the maximum end-to-end communication delay in our real-time network and ClockE specifies the maximum clock drift between any two nodes in the system.

4.3. Monitoring/Debugging Support

The objective of the monitoring and debugging support primitives in the ARTS Kernel is to reduce the complexity of application development in a distributed real-time environment. The primitives are useful not only for building a debugger, but also for providing application specific monitoring functions. In ARTS, there are three sets of basic primitives for any type of object/process:

- "Freeze/Unfreeze" an object's or process's activities,
- "Fetch/Store" an object's or process's data objects, and
- "Watch/Capture" an object's or process's communication activities.

The "Freeze/Unfreeze" operations control the activity of an artobject or process by stopping or resuming it. The "Fetch/Store" operations can be used for retrieving and storing data objects from a specific artobject or process. The "Watch/Capture" operations monitor the communication activity among artobjects by copying or intercepting selected messages. In the following, we briefly describe the functionality of the kernel primitives.

4.3.1. Freeze and Unfreeze

The FreezeObject primitive stops the execution of an artobject (i.e., all its associated processes), while the FreezeProcess primitive halts a specific process for inspection. The UnfreezeObject and UnfreezeProcess primitives resume a suspended artobject and process, respectively. While a process is in a frozen state, many of the factors used for making scheduling decisions can be selectively ignored. For instance, a timeout value will be ignored by specifying a proper flag in the option field. In a similar context, the ARTS kernel also provides "FreezeNode" and "UnfreezeNode" primitives to halt or restart all of the client's activities on a specific node. The Freeze/Unfreeze primitives are defined as follows:

    val = FreezeObject(oid [, options])
    val = UnfreezeObject(oid [, options])
    val = FreezeProcess(pid [, options])
    val = UnfreezeProcess(pid [, options])
    val = FreezeNode(nid [, options])

Note that the "options" parameter can be used to selectively ignore ongoing target activities. For instance, it can effectively skip the timeout processing of the target activity while the target object/process is frozen.

4.3.2. Fetch and Store

The Fetch primitives inspect the status of a "running" or "frozen" artobject or process in terms of a set of values of data objects. The specific state of the artobject or process is selected by a data object id. The state includes not only the status of private variables, but also object/process control information. The Fetch/Store primitives are defined as follows:

    fval = FetchObject(oid, dataoid, buffer, size)
    sval = StoreObject(oid, dataoid, buffer, size)
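The time-fence check of Section 4.2 can be sketched as follows. This is a model of the check, not ARTS kernel code: the constants are assumed values here, whereas in ARTS Commd and ClockE would be measured properties of the real-time network.

```python
# Illustrative model of the ARTS time-fence check (Section 4.2), not
# kernel code; COMM_D and CLOCK_E are assumed constants here.

COMM_D  = 3   # max end-to-end communication delay (msec), assumed
CLOCK_E = 1   # max clock drift between any two nodes (msec), assumed

def time_fence_check(p_ct, p_cft, q_cfv):
    """P invokes Q.X(): reject the invocation if P's worst case
    slack cannot cover Q's current fence value plus a message
    round trip and the clock drift."""
    p_wst = p_cft - p_ct          # worst case slack, Pwst = Pcft - Pct
    if p_wst < q_cfv + 2 * COMM_D + CLOCK_E:
        return "INVOCATION_FENCE_ERROR"
    return "OK"

print(time_fence_check(p_ct=100, p_cft=150, q_cfv=20))  # slack 50 >= 27 -> OK
print(time_fence_check(p_ct=100, p_cft=120, q_cfv=20))  # slack 20 <  27 -> error
```

The round-trip term 2*Commd plus ClockE makes the check safe even though P's times are measured on P's local clock.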

Note that "dataoid" indicates a private data object's id within a given object or process. The system does not guarantee the consistency of the target data unless the target was frozen in a consistent state.

4.3.3. Capture and Watch

The Capture primitives capture ongoing communication messages from the specified artobject or process. A CaptureObject primitive captures all incoming requests as well as outgoing reply messages to a specified artobject, and can also select a target message based on the name of the operation. A CaptureProcess primitive captures all incoming messages and outgoing reply messages for a specified process. The Watch primitives are similar to the Capture primitives except that all monitored messages are duplicated, not captured. The "Capture/Watch" primitives are defined as follows.

    val = CaptureObject(oid, commtype, opr, requestor)
    val = CaptureProcess(pid, opr)
    val = WatchObject(oid, commtype, opr, requestor)
    val = WatchProcess(pid, opr)

Note that the "commtype" argument indicates either the "IN" or "OUT" type: "IN" selects an incoming invocation request message and "OUT" indicates an outgoing reply message. The "requestor" specifies the requester's oid or "ANY_OID". Similarly, the "opr" parameter can be a specific operation name or "ANY_OPR".

5. Summary

We addressed the issues in real-time monitoring and debugging in a distributed real-time system. In particular, we discussed the timing error problem in traditional real-time systems which use a "cyclic executive" and showed the lack of real-time software tools and of adequate kernel support. We then introduced the notion of "time encapsulation", which allows us to reduce the complexity of handling the timing error problem in a distributed real-time environment. The architecture of the ART Real-Time Monitor based on our "software approach" for real-time monitoring/debugging is described. We also showed how the monitor works together with the schedulability analyzer, called Scheduler 1-2-3. Currently, the ART Real-Time Monitor works in a passive manner, in the sense that the monitor does not allow us to actively debug the target system. However, an example of the monitor output demonstrated that it could also work effectively as a debugger. The ARTS kernel provides monitoring/debugging support with the built-in primitives, which can extend the monitor to an "active" real-time debugger.

It is clear that our "software approach" is not a non-invasive monitoring/debugging scheme. However, it is very practical, yet predictable, if the real-time monitor is used with the integrated scheduler. In particular, the separation between the Reporter and the Visualizer allows us to use the monitor for an embedded system as well. It is also easy to extend the system to visualize many different levels of activities, such as programs, objects, procedures, statements, and variables. However, if the target system does not have reasonable resources, the monitoring capability will be limited. In particular, the granularity and accuracy of low-level activities of the system will be very limited, so that a hybrid approach using additional hardware may be necessary.

References

[1] Barstow, D. R., Shrobe, H. E. and Sandewall, E. (editors). Interactive Programming Environments. McGraw-Hill, 1984.

[2] Bei, J. N. Communication Graph Display System: On the Use of Computer Graphics to Debug Distributed Software. PhD thesis, Dept. of Computer Science, University of Waterloo, 1985.

[3] Bhatt, D., Ghonami, A. and Ramanujan, R. An Instrumented Testbed for Real-Time Distributed Systems Development. In Proc. 8th IEEE Real-Time Systems Symposium, December 1987.

[4] Borger, M. W. VAXELN Experimentation: Programming a Real-Time Periodic Task Dispatcher using VAXELN Ada 1.1. Technical Report CMU/SEI-87-TR-32 (ESD-TR-87-195), Carnegie Mellon University, September 1987.

[5] Cheriton, D. R. The V Kernel: A Software Base for Distributed Systems. IEEE Software, April 1984.

[6] Dart, S. A., Ellison, R. J., Feiler, P. H. and Habermann, A. N. Software Development Environments. COMPUTER 20(11), November 1987.

[7] Goldberg, A. Smalltalk-80. Addison-Wesley, 1984.

[8] Hood, P. and Grover, V. Designing Real Time Systems in Ada. Technical Report 1123-1, SofTech, Inc., January 1986.

[9] Lehoczky, J. P., Sha, L. and Strosnider, J. K. Aperiodic Scheduling in a Hard Real-Time Environment. In Proc. 8th IEEE Real-Time Systems Symposium, December 1987.

[10] Leinbaugh, D. W. Guaranteed Response Times in a Hard-Real-Time Environment. IEEE Transactions on Software Engineering, January 1980.

[11] Liskov, B. H. and Zilles, S. N. Programming with Abstract Data Types. SIGPLAN Notices, April 1974.

[12] Liu, C. L. and Layland, J. W. Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment. JACM 20(1), 1973.

[13] Maxion, R. A. Distributed Diagnostic Performance Reporting and Analysis. In Proc. IEEE Int. Conf. on Computer Design, October 1986.

[14] McDaniel, G. METRIC: A Kernel Instrumentation System for Distributed Environments. In Proc. 6th Symposium on Operating Systems Principles, November 1977.

[15] Miller, B. P., Sechrest, S. and Macrander, C. A Distributed Program Monitor for Berkeley Unix. Technical Report UCB/CSD 84/201, University of California, Berkeley, 1984.

[16] Miller, B. P. and Yang, C.-Q. IPS: An Interactive and Automatic Performance Measurement Tool for Parallel and Distributed Programs. In Proc. 7th Int. Conf. on Distributed Computing Systems, September 1987.

[17] Poirier, M. The Shoshin Software Performance Monitor. PhD thesis, Dept. of Computer Science, Univ. of Waterloo, 1982.

[18] Svobodova, L. Performance Monitoring in Computer Systems: A Structured Approach. ACM Operating Systems Review, July 1981.

[19] Tokuda, H., Radia, S. R. and Manning, E. Shoshin OS: a Message-based Operating System for a Distributed Software Testbed. In Proc. of 16th Hawaii Int. Conf. on System Science, Vol. 1, January 1983.

[20] Tokuda, H., Locke, C. D. and Clark, R. K. Client Interface Specification of ArchOS. Tech. Report, Computer Science Department, Carnegie Mellon Univ., October 1985.

[21] Tokuda, H., Wendorf, J. W. and Wang, H.-Y. Implementation of a Time-Driven Scheduler for Real-Time Operating Systems. In Proc. 8th IEEE Real-Time Systems Symposium, December 1987.

[22] Tokuda, H. and Kotera, M. Scheduler 1-2-3: An Interactive Schedulability Analyzer for Real-Time Systems. Computer Science Department ART Project, Carnegie Mellon University, February 1988.
