
2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing
1550-6533/12 $26.00 © 2012 IEEE. DOI 10.1109/SBAC-PAD.2012.14

FusedOS: Fusing LWK Performance with FWK Functionality in a Heterogeneous Environment

Yoonho Park@, Eric Van Hensbergen%, Marius Hillenbrand#, Todd Inglett∗, Bryan Rosenburg@, Kyung Dong Ryu@, Robert W. Wisniewski+

@ IBM Research.
% ARM Research. Work done while at IBM Research.
# Karlsruhe Institute of Technology. Work done while at IBM Research.
∗ IBM Systems and Technology Group.
+ Intel Corporation. Work done while at IBM Research.

Abstract

Traditionally, there have been two approaches to providing an operating environment for high performance computing (HPC). A Full-Weight Kernel (FWK) approach starts with a general-purpose operating system and strips it down to better scale up across more cores and out across larger clusters. A Light-Weight Kernel (LWK) approach starts with a new thin kernel code base and extends its functionality by adding more system services needed by applications. In both cases, the goal is to provide end-users with a scalable HPC operating environment with the functionality and services needed to reliably run their applications.

To achieve this goal, we propose a new approach, called FusedOS, that combines the FWK and LWK approaches. FusedOS provides an infrastructure capable of partitioning the resources of a multicore heterogeneous system and collaboratively running different operating environments on subsets of the cores and memory, without the use of a virtual machine monitor. With FusedOS, HPC applications can enjoy both the performance characteristics of an LWK and the rich functionality of an FWK through cross-core system service delegation.

This paper presents the FusedOS architecture and a prototype implementation on Blue Gene/Q. The FusedOS prototype leverages Linux with small modifications as a FWK and implements a user-level LWK called Compute Library (CL) by leveraging CNK. We present CL performance results demonstrating low noise and show micro-benchmarks running with performance commensurate with that provided by CNK.

Index Terms

HPC, Operating System, Kernel

I. INTRODUCTION

A decade ago, as processor frequencies leveled out and faded as a major contributor to continued performance improvement, there was a marked shift towards using multiple cores to design “faster” computers. While multicore counts will continue to increase, heterogeneous technology, whether GPUs, bi-modal cores, enhanced SIMD units, or more powerful FPGAs, is being considered as a way to help address the new challenges inherent in the drive towards exascale.

At the same time, there is an increased need for system software to provide richer environments to allow disparate applications to utilize the hardware on the largest supercomputers. This has implications across the full range of system software. It results in a need to support the capabilities provided by a general-purpose operating system, such as Linux, including libraries, file systems, and daemon-provided services. We explicitly call out that the application cares about the operating environment Linux provides, not the Linux kernel itself. We refer to a collection of Linux APIs, personality, etc., as a Linux environment.

FusedOS’s design objectives are to address both core heterogeneity and the need for a rich and familiar operating environment for more applications. We generalize the types of compute elements to cores optimized for power efficiency (Power-Efficient Cores or PECs), and cores optimized for single-thread performance (Single-Thread-Optimized Cores or STOCs). We envision that PECs may have limited capability to run traditional kernels (such as GPUs do today), and that applications running on a chip with PECs and STOCs will desire to fully utilize the capability of the chip in a Linux environment.

There have been two approaches to providing an operating environment for HPC. A Full-Weight Kernel (FWK) approach starts with a general-purpose operating system, typically Linux, and strips down the environment to better scale up across more cores and out across larger clusters. In contrast, a Light-Weight Kernel (LWK) approach starts with a new thin kernel code base and extends its functionality by adding more system services needed by applications. In FusedOS, rather than choosing either an LWK or an FWK approach, we combined both. FusedOS uses a Linux kernel as the FWK and implements a user-level LWK called Compute Library (CL) by leveraging CNK [9] from IBM® Blue Gene®/Q. FusedOS’s design goal is for HPC performance-critical code to run without interference on the PECs (or STOCs), and for requests requiring a full Linux environment to be delegated to the STOCs. Applications should achieve similar performance in FusedOS as in a LWK (we chose performance in CNK as our baseline) and, at the same time, be able to make use of the richer functionality of a FWK: We introduce FWK functionality but do not expose applications to the interference and jitter of FWKs [8].

There are two main issues in understanding whether a FusedOS strategy would be viable. They are (i) whether Linux is sufficiently malleable to allow the fusion and (ii) whether the interactions between CL and Linux introduce too much latency, hurting performance. We believe that if we needed to make substantial Linux modifications, the effort of maintaining them would be prohibitive. Examining the frequency and types of interactions between CL and Linux should help determine the feasibility of our approach and may influence the design of future architectural features that improve the performance of the paths between CL and Linux.

The concept of FusedOS in general has advantages beyond heterogeneity. Historically, the Linux developers have been reluctant to adopt changes specific to the HPC community. This is in part because the Linux community tends to accept changes that matter for the general population, while HPC architectures have tended to push technology limits in order to achieve the highest performance for scientific and engineering applications. FusedOS can support a variety of applications with legacy requirements while providing the ability to leverage a more nimble LWK to effectively incorporate new technologies.

In order to study both the extent of the required modifications to Linux and the performance impact of our approach, we implemented a prototype of FusedOS on Blue Gene/Q. Although Blue Gene/Q has homogeneous cores, we simulate heterogeneous cores by assigning a set of cores to act as PECs. In that role, cores run almost exclusively in user mode executing application code. A small supervisor-state monitor is used only to simulate the hardware we would expect to exist on true PECs. This prototype provides the additional ability to accurately trace and monitor events. It represents a conservative view of how the actual hardware would perform as its capabilities need to be simulated by the prototype software.

The rest of the paper is structured as follows. In Section II, we describe the FusedOS architecture for combining an FWK and an LWK to provide a complete operating environment with FWK functionality and LWK performance for HPC code. We show that our user-space LWK variant, Compute Library (CL), can manage cores that a traditional kernel cannot. In Section III, we describe our implementation of a prototype environment running on current hardware intended to allow us to explore the architecture and performance implications of CL. In Section IV, we present an evaluation of our prototype demonstrating low noise, and show application benchmarks running with performance close to that achieved under CNK. There are several threads of related work, which we describe in Section V. In Section VI, we discuss future work. We conclude in Section VII.

II. [...]

[...] need to manage. STOCs are best suited for serial computation and any required system processing, while PECs are targeted for parallel computation. STOCs and PECs will have similar instruction set architectures. However, STOCs will have features found in high-performance, general-purpose processors while PECs will be optimized for power and space. PECs will have a subset of STOC features and may not contain capabilities such as supervisor mode. The FusedOS design assumes coherent shared memory between and across STOCs and PECs. Current research has shown that heterogeneous nodes with non-coherent shared memory can be quite difficult to program. Examples include GPUs and IBM Cell processors. Indications are that these types of architectures are moving towards a more tightly coupled approach. Today, GPUs are typically treated as functional units controlled by a CPU. In contrast, PECs are independent processors, having their own independent flow of execution. In our FusedOS prototype, the CL (Compute Library) manages PECs. CL is a Linux application that encapsulates LWK functionality. Specifically in our prototype, it is built from CNK source code and runs as a user process on Linux, but it could be derived from any LWK.

Fig. 1. FusedOS architecture.

Linux applications will run on a subset (or all) of the STOCs, like the Linux App A in Figure 1. Applications that run on CNK or another LWK will run unmodified on the PECs, like the CNK App. While Linux is not an LWK, the FusedOS approach can provide a Linux environment on a PEC. This is represented by Linux App B.

Fig. 2. PEC management interface.

The CL manages the PECs and applications through the PEC management interface as illustrated in Figure 2. To run an LWK application, the CL requests a PEC, loads the LWK application into the memory region assigned to the PEC, stores start-up information in a memory area shared with the PEC, then tells the PEC to start the application. When an LWK application thread makes a system call or encounters an exception, the PEC stores the system call or exception [...]