Experiences in Developing a Linux Cluster for Real-Time Simulation
Andrew Robbie; Graeme Simpkin; John Fulton; David Craven
Defence Science and Technology Organisation
[email protected]; [email protected]; [email protected]; [email protected]

Abstract. There has been considerable interest in making use of the GNU/Linux operating system as a platform for simulation and training systems. The simulation centre in DSTO's Air Operations Division is focusing on a GNU/Linux cluster primarily designed to serve as a real-time computational resource, rather than as a graphics display system. In this paper we examine some of the design challenges, such as system configuration, modifications required to achieve real-time performance under GNU/Linux, the inter-process communication methods used by simulation middleware, and performance benchmarking.

1. INTRODUCTION

In 1992 the Air Operations Division of the Defence Science and Technology Organisation undertook to consolidate three decades of real-time human-in-the-loop simulation development and expertise into a new capability named the Air Operations Simulation Centre (AOSC) [1].

The primary mission given to the AOSC was to develop the capability for, and conduct, real-time human-in-the-loop simulation research with an emphasis on the air domain [2]. A flexible, modular system architecture was developed to address both immediate and anticipated simulation requirements. The facility demonstrated its initial operational capability in 1994 by conducting an investigation of the tactical utility of symbology presented to aircrew using helmet-mounted displays. The following ten years of development and operation have seen the range of simulation tools expand. Fixed wing and rotary wing cockpits with both single and dual seat configurations have been integrated with corresponding system models to provide the range of crew environments needed for the work of the Division.

Image generation systems and simulation system host computers have often been partitioned into separate hardware configuration items. The AOSC adopted a different approach, using a large multi-processor Silicon Graphics computer to simultaneously provide multi-channel image generation and host system models (flight models, avionics subsystems etc.). Combining these functions within a single system image delivered benefits through reduced maintenance and administration of the system and simplified inter-process communications. A simulation software infrastructure was developed by the AOSC to take full advantage of multi-processor machines. Scalability was provided through interconnection of host computers via high-bandwidth reflective memory links.

During early years of service the AOSC typically ran simulation experiments in a serial fashion. However, the demands of recent projects highlighted a requirement for routinely developing and performing experiments in parallel. Addressing this requirement provided the development team with an opportunity to investigate alternative design solutions afforded by new technology developments. Rapid advancements in the performance of commodity computers made them an attractive design option, meriting further investigation. This paper discusses issues that were identified while exploring the concept of transitioning the successful AOSC architecture to lower-cost computing systems running the GNU/Linux operating system.

2. ARCHITECTURE OF THE AOSC

The broad range of applications for the AOSC dictated a real-time system design that would be flexible, re-configurable, scalable and robust. The research objectives of the simulation imply that behaviour of subsystem models must be both observable and repeatable. Furthermore, the system was designed to undergo continuous development and modification to accommodate changing features of the modelling domain.
The architectural solution, developed by the AOSC to meet these requirements, can be described in terms of two design patterns [3] – the façade and the mediator.

Figure 1: A low-cost flying desk system; computations are performed on the AOSC Linux cluster.

Façade

The façade design pattern calls for each subsystem, or user developed module, to be wrapped in a common interface. Individual components of the AOSC are integrated into the simulation system in a unified approach that provides standard input, output and control interfaces. User modules that employ the façade interface include subsystem models (such as flight dynamics and avionics models), audio systems, image generation systems, operator station modules, distributed simulation gateways and cockpit devices.

Mediator

The mediator design pattern calls for a design component that liaises between other components of the system (Figure 2). The AOSC uses a mediator process to transfer data between user modules, and control or schedule subsystem activity. De-coupling user modules in this way greatly facilitates their re-use.

HLA Analogy

When viewed at the abstraction level of design patterns, it can be seen that the AOSC architecture has much in common with the DMSO sponsored High Level Architecture (HLA) [5]. It should be noted, however, that the HLA initiative developed several years after the AOSC began operations. The approximate mapping between architectural components is shown in Table 1.

Table 1: Similarities between AOSC and HLA

  AOSC                     HLA
  User Module & Façade  ⇒  Federate
  Mediator              ⇒  Run Time Infrastructure (RTI)
  Global Map File       ⇒  Federation Object Model

3. PROJECT ALUMINIUM

A spiral life-cycle model was adopted for concept exploration so that higher-risk options would be examined in earlier work iterations. The initial iteration of work determined to resolve issues that would be introduced by migrating the original architecture onto PC based systems.
The name given to this initial iteration of work was Project Aluminium.

Figure 2: Mediator linking user modules.

Process control

A simulation scenario configuration file is created for each experiment, detailing the modules to be used, the resources assigned and the inter-module data-flows. This file is referred to as a global map file. Mediators and user modules use this information to establish run-time interconnections. The mediator coordinates time-wise execution of subsystem processes in accordance with resources specified in the global map file. Data transfers occur on simulation frame-boundaries as shown in Figure 3. Robbie [4] describes this behaviour in further detail.

Figure 3: Subsystem scheduling

3.1 Design Decisions

Migrate from IRIX to GNU/Linux

GNU/Linux presently enjoys popularity as a well supported, widely available UNIX-like operating system. The Free Software Foundation's GNU project provides the development toolchain (compilers and core libraries), while the Linux kernel provides the underlying infrastructure, such as virtual memory and scheduling. It was decided early in the project that AOSC simulation software should be migrated to the GNU/Linux platform to achieve the following project goals:

• Establishing a technology roadmap that includes low-cost computing options
• Improving AOSC flexibility by enhancing platform independence
• Minimising risk by maintaining commonality with the original architecture

Implement a compute cluster

It was decided that the multi-processor scalability of the original implementation should be reproduced by implementing a cluster of GNU/Linux nodes. A number of toolkits are available for developing cluster applications. These include message passing APIs, such as MPI [6], and virtual machine systems, such as PVM [7], Mosix [8] and OpenSSI [9].
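To make the process-control description concrete, a global map file of the kind described might look like the sketch below. The paper does not give the actual file format; the syntax, module names and field names here are invented purely for illustration.

```
# Hypothetical global map file for one experiment (illustrative syntax only)

# modules to run, with assigned resources and frame rates
module flight_model   node=cluster01  rate=60Hz
module avionics       node=cluster01  rate=60Hz
module audio          node=cluster02  rate=30Hz

# inter-module data-flows; the mediator performs these transfers
# on simulation frame boundaries
flow flight_model.airspeed  -> avionics.airspeed_in
flow avionics.warning_tone  -> audio.tone_select
```

A file of this kind is what lets the mediator, rather than the modules themselves, own the interconnection and scheduling decisions for each experiment.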
These tools provide an abstraction layer that hides the distributed nature of the system. Unfortunately, none of these were designed to support real-time systems. Rather, they are focused on massively parallel problems, where every thread is executing the same code, and jobs can run for days or months. There is also a steep learning curve for some toolkits, notably MPI. Hence, these tools were not used.

Reuse system components

Rework was minimized by determining that original AOSC components should be reused wherever possible. Peripheral components able to be reused without alteration included the audio subsystem, instruments, cockpit interfaces and terrain databases.

De-couple from image generation system

The image generation subsystem was the only user module requiring significant redevelopment for this project. It was decided to limit the scope of Project Aluminium to only …

… from the beginning for real-time tasks, it does not suffer the scheduling latency overhead of a general purpose operating system.

The main drawback to using RTAI is that real-time tasks cannot use standard Linux system calls — instead, they communicate with a normal Linux process via shared memory or message queues, which would be complicated for our application. In addition, there are few real-time mode Network Interface Card (NIC) drivers (none for Gigabit Ethernet). As high speed, low latency communication between nodes is essential, this is a significant issue. Therefore, it was decided that real-time Linux would only be used if normal Linux would not be able to provide the required performance.

Recently the RTAI developers have introduced the LXRT mode, which allows normal Linux threads to switch to a hard real-time scheduler. Additionally, these threads run in user mode rather than kernel mode. This mode of operation makes IPC significantly easier, and may