A Lightweight Kernel for the Harness Metacomputing Framework ∗
C. Engelmann and G. A. Geist
Computer Science and Mathematics Division
Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
{engelmannc, [email protected]}
http://www.csm.ornl.gov

∗ This research is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.

Abstract

Harness is a pluggable heterogeneous Distributed Virtual Machine (DVM) environment for parallel and distributed scientific computing. This paper describes recent improvements in the Harness kernel design. By using a lightweight approach and moving previously integrated system services into software modules, the software becomes more versatile and adaptable. This paper outlines these changes and explains the major Harness kernel components in more detail. A short overview is given of ongoing efforts in integrating RMIX, a dynamic heterogeneous reconfigurable communication framework, into the Harness environment as a new plug-in software module. We describe the overall impact of these changes and how they relate to other ongoing work.

1. Introduction

The heterogeneous adaptable reconfigurable networked systems (Harness [9]) research project is an ongoing collaborative effort among Oak Ridge National Laboratory, the University of Tennessee, Knoxville, and Emory University. It focuses on the design and development of a pluggable lightweight heterogeneous Distributed Virtual Machine (DVM) environment, where clusters of PCs, workstations, and “big iron” supercomputers can be aggregated to form one giant DVM (in the spirit of its widely-used predecessor, “Parallel Virtual Machine” (PVM) [10]).

As part of the Harness project, a variety of experiments and system prototypes were developed to explore lightweight pluggable frameworks, adaptive reconfigurable runtime environments, assembly of scientific applications from software modules, parallel plug-in paradigms, highly available DVMs, fault-tolerant message passing, fine-grain security mechanisms, and heterogeneous reconfigurable communication frameworks.

Currently, there are three different Harness system prototypes, each concentrating on different research issues. The teams at Oak Ridge National Laboratory [3, 5, 12] and at the University of Tennessee [6, 7, 13] provide different C variants, while the team at Emory University [11, 14, 15, 18] maintains a Java-based alternative.

Conceptually, the Harness software architecture consists of two major parts: a runtime environment (kernel) and a set of plug-in software modules. The multi-threaded user-space kernel manages the set of dynamically loadable plug-ins. While the kernel provides only basic functions, plug-ins may provide a wide variety of services needed in fault-tolerant parallel and distributed scientific computing, such as messaging, scientific algorithms, and resource management. Multiple kernels can be aggregated into a Distributed Virtual Machine.

This paper describes recent improvements in the kernel design, which significantly enhance versatility and adaptability by introducing a lightweight design approach. First, we outline the changes in the design and then explain the major kernel components in more detail. We continue with a short overview of recent efforts in integrating RMIX, a dynamic heterogeneous reconfigurable communication framework, into the Harness environment. We describe the overall impact of these changes and how they relate to other ongoing work. This paper concludes with a brief summary of the presented research.
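To make the plug-in concept concrete, the sketch below shows what a minimal Harness plug-in module might look like in C. It assumes the shared-library packaging and the <plug-in name>_init()/<plug-in name>_fini() entry-point convention described in Section 2.3; the plug-in name, the example service, and the zero-on-success return convention are illustrative assumptions, not part of the actual Harness interface.

/* example_plugin.c -- illustrative sketch of a Harness plug-in module.
 * Built as a shared library, e.g.:
 *   gcc -shared -fPIC -o example_plugin.so example_plugin.c
 * The kernel's plug-in loader resolves and calls <plug-in name>_init()
 * on load and <plug-in name>_fini() on unload, if present (Section 2.3).
 */
#include <stdio.h>

/* Hypothetical service offered by this plug-in; a real Harness plug-in
 * would expose its services through the loading module's handle. */
int example_plugin_echo(const char *msg)
{
    return printf("example_plugin: %s\n", msg);
}

/* Called automatically by the plug-in loader after loading. */
int example_plugin_init(void)
{
    /* acquire resources, load dependent plug-ins, ... */
    return 0;   /* 0 = success (assumed convention) */
}

/* Called automatically by the plug-in loader before unloading. */
int example_plugin_fini(void)
{
    /* release resources, unload dependent plug-ins, ... */
    return 0;
}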
2. Kernel Architecture

Earlier designs of a multi-threaded kernel [3] for Harness were derived from an integrated approach similar to PVM [10], since the Harness software was initially developed as a follow-on to PVM, based on the distributed virtual machine (DVM) model.

In PVM, every node runs a local virtual machine and the overall set of virtual machines is controlled by a single master. This master is a single point of control and failure. In the DVM model, all nodes together form a distributed virtual machine, which they control equally in virtual synchrony. Symmetric state replication among all nodes, or a subset, assures high availability. Since there is no single point of control or failure, the DVM survives as long as at least one node is still alive.

Figure 1. Previous Kernel Design

The original Harness kernel design (Figure 1) featured all components needed to manage plug-ins and external processes, to send messages between components inside and between kernels, and to control the DVM. Recent improvements in the design (Figure 2) are based on a lightweight approach. By moving previously integrated services into plug-ins, the kernel becomes more versatile and adaptable.

Figure 2. New Lightweight Kernel Design

The new design features a lightweight kernel with plug-in, thread, and process management only. All other basic components, such as communication, distributed control, and event notification, are moved into plug-ins in order to make them reconfigurable and exchangeable at runtime. The kernel provides a container for the user to load and arrange all software components based on actual needs, without any preconditions imposed by the managing framework (Figure 3). The user can choose the programming model (peer-to-peer, client/server, PVM, MPI, Harness DVM) by loading the appropriate plug-in(s).

Figure 3. Harness Architecture

The kernel itself may run as a daemon process and accepts various command line and configuration file options, such as loading default plug-ins and spawning certain programs on startup, for bootstrapping with an initial set of services. A debug variant with more extensive error logging and memory tracing is also provided, selected via the ".debug" executable extension or the "--debug" command line flag.

In the following sections, we describe the three major kernel components (process manager, thread pool, and plug-in loader) in more detail.

2.1. Process Manager

Harness is a pluggable multi-threaded environment. However, in heterogeneous distributed scientific computing scenarios it is sometimes necessary to fork a child process and execute a different program, e.g. for remote logins using the ssh program. The process manager allows just that. In addition, it provides access to the standard input and output channels of spawned programs, allowing plug-ins to initiate and control them.

Only programs that reside in a specific preset program directory, or in a respective sub-directory, may be executed. Links to system programs, such as ssh, may be created by the system administrator.

Forking a process in a multi-threaded environment is not very well defined. On startup, the Harness kernel immediately creates a separate process that is responsible for launching child processes and relaying standard input and output between spawned programs and the kernel. This avoids any problems associated with duplicating threads and file descriptors in the kernel.
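The following is a minimal C sketch of this "fork early" pattern on a POSIX system: a helper process is created with fork() before the kernel starts any threads, and a pipe carries spawn requests to it. The function names, the one-program-per-line request format, and the absence of stdio relaying are simplifications for illustration; they do not reproduce the actual Harness process manager.

/* Sketch of the "fork early" pattern: a helper process is created before
 * the kernel spawns any threads, so later fork()/exec() calls never
 * duplicate kernel threads or file descriptors. Illustrative only. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

static int to_helper = -1;          /* kernel -> helper command pipe */

/* Runs in the helper process: read one program name per line and run it.
 * A real implementation would also relay stdin/stdout of the child. */
static void helper_loop(int cmd_fd)
{
    FILE *in = fdopen(cmd_fd, "r");
    char line[1024];

    while (in && fgets(line, sizeof(line), in)) {
        line[strcspn(line, "\n")] = '\0';
        pid_t pid = fork();
        if (pid == 0) {
            execlp(line, line, (char *)NULL);   /* run requested program */
            _exit(127);                         /* exec failed */
        }
        if (pid > 0)
            waitpid(pid, NULL, 0);   /* simplistic: wait for each child */
    }
    _exit(0);
}

/* Called once at kernel startup, before any threads are created. */
int process_manager_start(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* helper process */
        close(fds[1]);
        helper_loop(fds[0]);
    }
    close(fds[0]);                   /* kernel keeps only the write end */
    to_helper = fds[1];
    return 0;
}

/* Ask the helper to spawn a program (e.g. "ssh"). */
int process_manager_spawn(const char *program)
{
    return dprintf(to_helper, "%s\n", program) > 0 ? 0 : -1;
}

Because the helper never starts any threads of its own, its fork() calls remain well defined regardless of how many threads the kernel later creates.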
2.2. Thread Pool

Due to the potentially unpredictable runtime configuration of the Harness environment, certain issues regarding thread creation, termination, and synchronization need to be addressed. They range from dealing with the limited number of threads controllable by a single process to stalling the system by using too many simultaneous threads. The thread pool allows the execution of jobs in threads while staying within a preset range of running threads by utilizing a job queue. Jobs are submitted to the thread pool in the form of a

2.3. Plug-in Loader

The plug-in loader handles loading and unloading of runtime plug-ins, which exist in the form of shared libraries. Such plug-ins are automatically unloaded on kernel shutdown. Plug-ins may load and unload other plug-ins on demand. Only plug-in modules that reside in a specific preset plug-in directory, or in a respective sub-directory, may be loaded. The kernel itself may load plug-ins on startup, in a separate thread that applies command line options.

Every plug-in has the opportunity to initialize itself during loading and to finalize itself during unloading via <plug-in name>_init() and <plug-in name>_fini() functions that are automatically resolved and called by the plug-in loader if they exist. Plug-ins are able to load and unload other plug-ins they depend on during initialization and finalization, since the plug-in loading and unloading functions are reentrant and deadlock-free.

A plug-in (shared library) is loaded by the plug-in loader via dlopen() without exporting any of its symbols (functions and data) to the global name space of the kernel. Plug-in symbols are not made automatically available to the kernel or to other plug-ins, to prevent naming conflicts and to allow multiple plug-ins to offer the same interface with different implementations. In fact, only the module (kernel or another plug-in) that loads a plug-in may access it, using a unique and private handle.

The plug-in loading and unloading policy is based on ownership. A module that loads a specific plug-in owns it and is also responsible for unloading it. The ownership of orphaned plug-ins is automatically transferred to the kernel, and they are unloaded on kernel shutdown. The plug-in loader supports multiple loading to allow different modules to load the same plug-in. Plug-ins that maintain state may use reference counting and separate state instances to support multiple loading.
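The loading behavior described above maps directly onto the POSIX dynamic loader. The sketch below illustrates it: dlopen() with RTLD_LOCAL keeps the plug-in's symbols out of the kernel's global namespace, and the optional <plug-in name>_init()/<plug-in name>_fini() entry points are resolved by name with dlsym(). The handle structure, function names, and error handling are illustrative assumptions, not the actual Harness plug-in loader code.

/* Sketch of a plug-in load/unload path using the POSIX dynamic loader.
 * RTLD_LOCAL keeps the plug-in's symbols out of the global namespace,
 * so several plug-ins may export the same interface. Illustrative only;
 * link with -ldl where dlopen() is not in the standard C library. */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*plugin_func_t)(void);

typedef struct {
    void *dl;             /* private handle returned by dlopen()            */
    plugin_func_t fini;   /* <plug-in name>_fini(), if the plug-in has one  */
} plugin_handle_t;

/* Load "<dir>/<name>.so" and call <name>_init() if it exists. */
plugin_handle_t *plugin_load(const char *dir, const char *name)
{
    char path[512], symbol[256];
    snprintf(path, sizeof(path), "%s/%s.so", dir, name);

    void *dl = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (dl == NULL) {
        fprintf(stderr, "plugin_load: %s\n", dlerror());
        return NULL;
    }

    plugin_handle_t *h = calloc(1, sizeof(*h));
    if (h == NULL) {
        dlclose(dl);
        return NULL;
    }
    h->dl = dl;

    /* Optional entry points, resolved by name if present. */
    snprintf(symbol, sizeof(symbol), "%s_init", name);
    plugin_func_t init = (plugin_func_t)dlsym(dl, symbol);
    snprintf(symbol, sizeof(symbol), "%s_fini", name);
    h->fini = (plugin_func_t)dlsym(dl, symbol);

    if (init != NULL && init() != 0) {   /* plug-in refused to initialize */
        dlclose(dl);
        free(h);
        return NULL;
    }
    return h;
}

/* Call <name>_fini() if present, then unload the shared library. */
void plugin_unload(plugin_handle_t *h)
{
    if (h == NULL)
        return;
    if (h->fini != NULL)
        h->fini();
    dlclose(h->dl);
    free(h);
}

Because every loading module receives its own private handle, two modules can load plug-ins that export identical interfaces without symbol clashes, which matches the ownership and multiple-loading policy described above.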