A Real-Time Monitor for a Distributed Real-Time Operating System

A Real-Time Monitor for a Distributed Real-Time Operating System Hideyuki Tokuda, Makoto Kotera and Clifford W. Mercer Computer Science Department Carnegie Mellon University Pittsburgh, Pennsylvania 15213 (412) 268-7672 H. [email protected] Abstract monitoring comes from the invasive nature of the real-time monitoring activity in the distributed environment. It not only Monitoring and debugging for a distributed real-time sys- interferes with the processor scheduling, but also with com- tem is a complicated problem due to the lack of a set of munication scheduling and activities. advanced tools and adequate operating system capability. Software tools can cover the wide range of the software The lack of kernel support makes monitoring/debugging development life cycle from the requirement analysis phase for a distributed real-time system more complicated. With- to debugging and maintenance phases. However, many of out kernel support, all attempts to realize real-time systems these modern tools are not effective for building or analyzing possibly leads to ad hoc solutions. As a result, in traditional complex real-time systems. Real-time software tools and ef- real-time systems, testing and verifying the timing correct- fective kernel support are essential to reduce the complexity ness of the program was done in an ad hoc manner. Since of real-time software. In this paper, we first address the a timing error cannot be captured until the last phase of issues in real-time monitoring and debugging, such as cap- integrated testing, it is extremely difficult to isolate the turing the timing error, the monitor’s invasive nature and source of the timing error. It is also indispensable for a visualization of system behavior. We then describe the architecture of the ART Real-Time Monitor which is being built real-time operating system to effectively support a for a distributed real-time OS, called ARTS. The built-in monitoring/debugging facility. monitoring/debugging kernel primitives are also mentioned. In this paper, we describe our “software approach” for real-time monitoring/debugging. “Software approach” can- 1. Introduction not be completely non-invasive, however, it is very practical Monitoring and debugging for a distributed real-time sys- and flexible. Similar software approaches were used in dis- tem is a complicated problem due to a lack of a set of tributed program monitoring [14, 15, 16, 17j, a distributed advanced tools and adequate operating system capability. debugger [2] and a performance diagnostic system [13]. Advances in Software Engineering provided us with a set of First, we address the fundamental issues in monitoring and modern programming tools for building a large complex debugging distributed real-time systems. We then describe software system. Various tool sets can cover the wide the ART Real-Time Monitor which cooperates with a range of the software development life cycle from a require- schedulability analyzer, Schedulerl-2-3 [22]. This monitor is ment analysis phase to debugging and maintenance phases being developed for a real-time distributed operating sys- [1, 61. However, we cannot simply reuse the existing tool tem, called the ARTS. The ARTS is to support research and set for designing and building complex real-time computing experimentation with an object-oriented computational systems without considering the time management model, an integrated time-driven scheduler and reliability capabilities. For instance, an interactive debugging tool in mechanisms that are essential to distributed real-time Smalltalk- [A allows us to track down a “logical” bug in a operating systems. We present the built-in program very well. However, it is almost impossible to monitoring/debugging primitives provided by the ARTS ker- detect or fix a timing bug (error) for real-time programs. An nel. We then discuss the benefits and limitations of our additional tool set such as a timing tool, schedulability real-time monitoring approach. analyzer, and a real-time monitor/debugger should be developed to reduce the complexity of real-time software. 2. Issues in Real-Time Monitoring and One of the major issues is that the monitoring result should Debugging be able to capture the logicalcorrectness and timing correct- There are two types of errors we often encounter while ness of a target program. Another aspect of the difficulty in we debug a distributed real-time program. One is related to logical error and the other is related to timing error. Both This research was supported in part by the U.S. Naval Ocean Systems errors are very difficult to track down in a real-time environ- Center under contract number N66001-87-C-0155, by the Office of Naval ment. In a distributed real-time environment, in particular, a Research under contract number NOOOI4-64-K-073, and by the Federal lack of instantaneous, accurate global state or event order- Systems Division of IBM Corporation under University Agreement ing creates extra complexity in analyzing program behavior. YA-276067. The views and conclusions contained in this document are those In a distributed system with special hardware support, the of the authors and should not be interpreted as representing official policies. system may provide a global uniform clock to each node [3]. either expressed or implied, of NOSC, ONR, IBM, or the U.S. Government. However, it is still difficult to capture a timing error in a 68 program. In this section, let us first focus on the timing error tant events or system behavior. No software monitoring problem. approach can provide a 100% non-invasive monitoring In many traditional real-time systems, there was implicit scheme. Our approach is to minimize and predict the effect binding between program code segment and timing con- of performance degradation or system interference due to straint. Even though individual module testing verifies that the monitoring activities. In a real-time system without care- each module can meet its timing constraint, a timing error ful scheduling concern, a monitoring activity can easily inter- may be exposed as a result of integrated module testing. fere with a “hard” deadline activity and can cause a missed This is due to a poor scheduler such as a “cyclic executive” deadline. To avoid the interference, we built a monitoring based on a specific time-line chart analysis [8]. In such a process in such a way that each periodic monitoring activity is also taken into account for the schedulability analysis. cyclic executive, there is one major scheduling cycle and the major cycle is divided into a group of minor cycles. Each Then, we can verify whether the “hard” tasks and the minor cycle is assigned to a periodic activity (often a set of monitoring processes are schedulable or not beforehand. “routines”) and referred to as a frame. Although each frame The visualization part is the key component of the monitor has a fixed number of routines, the actual routines called functions, however, there is no uniform visualization tech- can vary from frame to frame. If a frame failed to complete nique for representing anticipated, or unexpected system within its minor cycle and overruns, one of the common events or behavior. For instance, the interactive use of an problems called overrun frame takes place. Then, finding animated tracer for interprocess communication activities the timing error becomes non-trivial, since which specific [2] improved user’s debugging ability significantly in our pre- routine which is causing the problem cannot be determined. vious testbed [19]. In fact, it also helped to find a minor bug It is a system or program designer’s responsibility to deter- in IPC primitive itself. Since no one wants to see massive mine how the timing error has occurred. amount of raw data, powerful visualization support is essential to the monitor. What we need is a notion of time encapsulation in a real-time environment in contrast to the well-known notion of data encapsu/afion[ll]. By time encapsulation we mean 3. ART Real-Time Monltor that each module’s timing requirement and timing error will The objective of ART Real-Time Monitor is to *visualize* the system’s internal behavior for the designers of ARTS. be encapsulated and cannot penetrate a module boundary. Our approach for system monitoring is to build the monitor- Unlike data encapsulation, programming language support ing activity as a permanent part of the target system’. The is not sufficient to provide the time encapsulation system must be able to perform reliably with the monitoring mechanism in the system. In other words, we need to use a activity, and no functionality is lost by leaving the monitor in better programming language construct which can express place. If the monitor were removed after the development timing constraints explicitly in a module. Then we must de- phase, the timing interactions of the system might be velop an underlying scientific scheduler which supports time changed, and this is not desirable. Moreover, because of encapsulation. real-time system’s time critical nature, the monitoring activity We are developing an “integrated time-driven scheduler” must produce a minimum amount of interference. Of course, which supports such time encapsulation among real-time it is difficult to totally eliminate the interference. Our ap- tasks for our testbed system. The integrated time-driven proach can predict in advance whether the given task set scheduler uses a rate monotonic scheduling policy [12] for can meet its time constraint with the overhead produced by periodic “hard” real-time tasks and adopts the deferrable the built-in monitoring activity. server algorithm [9] for aperiodic “soft” real-time tasks. The The monitor shoutd also be able to visualize the system timing constraint of each aperiodic task is defined by its value-function [21].

A Real-Time Monitor for a Distributed Real-Time Operating System

Processes Process States

Chapter 3: Processes

CS 450: Operating Systems Sean Wallace <[email protected]>

Sched-ITS: an Interactive Tutoring System to Teach CPU Scheduling Concepts in an Operating Systems Course

The Big Picture So Far Today: Process Management

Lecture 4: September 13 4.1 Process State

Execution Architecture for Real-Time Systems by Ton Kostelijk

CS 162 Operating Systems and Systems Programming Lecture 3

Operating Systems Processes and Threads

Process and Kernel • OS Provides Execution Environment to Run User

Process States and Memory Management

The Operating System Kernel: Implementing Processes and Threads