SST + gem5 = A Scalable Simulation Infrastructure for High Performance Computing

Mingyu Hsieh (Sandia National Labs, P.O. Box 5800, Albuquerque, NM), Jie Meng (Boston University, ECE Department, Boston, MA), Michael Levenhagen (Sandia National Labs, P.O. Box 5800, Albuquerque, NM), Kevin Pedretti (Sandia National Labs, P.O. Box 5800, Albuquerque, NM), Ayse Coskun (Boston University, ECE Department, Boston, MA), Arun Rodrigues (Sandia National Labs, P.O. Box 5800, Albuquerque, NM)

ABSTRACT

High Performance Computing (HPC) faces new challenges in scalability, performance, reliability, and power consumption. Solving these challenges will require radically new hardware and software approaches. It is impractical to explore this vast design space without detailed system-level simulations. However, most existing simulators are either not sufficiently detailed, not scalable, or unable to evaluate key system characteristics such as energy consumption or reliability. To address this problem, we integrate the highly detailed gem5 performance simulator into the parallel Structural Simulation Toolkit (SST). We add fast-forwarding capability to SST/gem5 and port the lightweight Kitten operating system to gem5. In addition, we improve the reliability model in SST to enable a more comprehensive analysis of system reliability. Utilizing the simulation framework, we evaluate the impact of two energy-efficient, resource-conscious scheduling policies on system reliability. Our results show that the effectiveness of the scheduling policies differs according to the composition of the workload and the system topology.

Categories and Subject Descriptors
B.6.3 [Logic Design]: Design Aids—Simulation

General Terms
Performance, Reliability

Keywords
Simulation, Architecture

1. INTRODUCTION

As HPC continues to push the bounds of computation, it faces new challenges in scalability, performance, reliability, and power consumption. To address these challenges, we need to design new architectures, operating systems, programming models, and software. It is impractical to explore this vast design space without detailed system-level simulations. However, most simulators are either not sufficiently detailed (i.e., glossing over important details in processor, memory, or network implementations), not scalable (i.e., simulating highly parallel architectures with serial simulators), or not able to evaluate key system characteristics such as energy consumption or reliability.

To address this problem, we have integrated the instruction-level gem5 simulator into the parallel Structural Simulation Toolkit (SST) [17]. We extend the SST/gem5 simulator by adding fast-forwarding capabilities, porting the HPC-oriented Kitten operating system [12], and adding reliability analysis capabilities. We evaluate the benefits of the SST/gem5 framework by analyzing scheduling policies. Specifically, this paper makes the following contributions:

• We create a new tool for exploring key issues in HPC by combining the capabilities of gem5 (detailed processor and cache models, and full-system simulation ability) and SST (a rich set of HPC-oriented architecture components, scalable parallel simulation, and the ability to analyze power and reliability).

• We add fast-forwarding features to the simulation framework, enabling multiple levels of granularity for the same component in the same simulation.

• We integrate the Kitten operating system with gem5. In comparison to a full Linux system, Kitten's lightweight kernel enables faster simulation and prototyping of system software and run-time management ideas.

• We extend the reliability model of SST by implementing the Monte Carlo method and adding a model for the NBTI failure mechanism, enabling more comprehensive system reliability analysis. Utilizing the integrated simulation framework, we evaluate the impact of two resource-conscious scheduling policies on system reliability.

The rest of the paper starts with a discussion of the background and related work. Section 3 describes the integration of SST and gem5. Extensions to the simulation framework are described in Section 4. Section 5 presents the evaluation results and is followed by the conclusion.

2. BACKGROUND

2.1 SST

The Structural Simulation Toolkit (SST) [17] is an open-source, multi-scale, parallel architectural simulator aimed at the design and evaluation of HPC systems. The core of SST uses a component-based event simulation model built on top of MPI for efficient parallel simulation. Hardware devices, such as processors, are modeled as components in SST. The component-based modular interface of SST enables the integration of existing simulators (such as gem5) into a scalable simulation framework. SST has been validated against two real systems [9]. SST's main multicore processor model, genericproc, is derived from SimpleScalar [4]. It couples multiple copies of the sim-outorder pipeline model with a front-end emulation engine executing the PowerPC ISA. Genericproc has a cache coherency model, a prefetcher, and a refactored memory model that can be connected to more accurate memory models, such as DRAMSim2 [18]. However, genericproc has limitations, such as a simple cache coherency protocol and a non-configurable memory hierarchy. These limitations drive the need for more detailed processor models in SST.

2.2 gem5

The gem5 simulator [2] is an event-driven performance simulation framework for computer system architecture research. It models major structural simulation components (such as CPUs, buses, and caches) as objects, and supports performance simulation in both full-system and syscall-emulation modes. The full-system mode simulates a complete computer system including the operating system kernel and I/O devices, while syscall-emulation mode simulates statically compiled binaries by functionally emulating the necessary system calls. The memory subsystem in gem5 models inclusive/exclusive cache hierarchies with various replacement policies, coherence protocol implementations, DMA, and memory controllers. gem5 has been validated against real systems with about 10% error.

2.3 Related Work

There are a number of tools in the area of microarchitecture-level performance and power modeling. For example, SimpleScalar [4] has been integrated with several power models, such as Wattch [3], for examining the trade-off between performance and power. However, SimpleScalar runs only user-mode single-threaded workloads and cannot simulate multiple processor cores. Recently, Lis et al. presented HORNET, a parallel manycore simulator with power and thermal modeling [14]. HORNET does not have a modular design that allows integration of architectural models not shipped with the package. In addition, it uses ORION [11] for power estimation, which focuses on power modeling of network components. McPAT is an integrated power, area, and timing modeling framework for multicore architectures [13]. A widely used tool for modeling temperature is HotSpot, which provides steady-state and transient temperature responses given chip properties and power traces [10]. The SST/gem5 framework includes both McPAT and HotSpot for power and thermal modeling. The modular design of SST eases the integration of existing simulators and the interactions among simulators in a parallel, scalable, and open-source framework.

A distinguishing feature of SST/gem5 is the ability to evaluate the impact of scheduling policies on system reliability. Prior research has introduced performance- and energy-aware job scheduling policies that take application memory characteristics into account [6, 15]. Merkel et al. [15] employ task activity vectors to characterize applications based on their resource utilization and schedule applications to improve system performance and energy efficiency. Fedorova et al. [6] develop the Power Distributed Intensity (Power DI) scheduling policy, which clusters the computation-bound applications on as few machines and as few memory domains as possible. However, none of these resource-conscious scheduling policies takes system reliability into consideration. In this work, we use the integrated SST/gem5 framework to evaluate two scheduling policies by examining which policy achieves better system reliability.

3. INTEGRATION OF SST AND GEM5

In this section, we provide the details of the integration of gem5 and SST. The integration requires synchronizing the clocks of gem5 and SST, working around synchronous messages, and allowing communication between gem5 and other SST components. We test the integration with two examples: (1) a Portals 4 [16] offload NIC and router; (2) an external memory model, DRAMSim2 [18].

3.1 System Emulation Mode Integration

Figure 1 shows a high-level view of how gem5 is encapsulated as an SST component. On each rank, all gem5 SimObjects live inside an sst::component. The gem5 event queue is modified so that it is driven by the SST event queue. gem5 is triggered by an SST clock event, and the gem5 event queue delivers the events that are scheduled to arrive at that cycle. gem5 then stops and returns control back to SST until the next clock interval.

Figure 1: Gem5 encapsulated as an sst::component.
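The clock-driving scheme described above can be pictured with a minimal, self-contained C++ sketch. The types below (Gem5EventQueue, Gem5Component, clockTick, and the cycles-per-tick bookkeeping) are simplified stand-ins invented for illustration, not the actual gem5 or SST classes.

#include <cstdint>
#include <functional>
#include <iostream>
#include <map>

// Stand-in for gem5's event queue: events keyed by the simulated cycle
// at which they should fire.
struct Gem5EventQueue {
    std::multimap<uint64_t, std::function<void()>> events;

    void schedule(uint64_t cycle, std::function<void()> ev) {
        events.emplace(cycle, std::move(ev));
    }

    // Deliver every event scheduled up to and including 'cycle',
    // then return control to the caller (the SST side).
    void serviceUpTo(uint64_t cycle) {
        auto end = events.upper_bound(cycle);
        for (auto it = events.begin(); it != end; ++it) it->second();
        events.erase(events.begin(), end);
    }
};

// Stand-in for the SST component that wraps all gem5 SimObjects on one
// rank.  The (stand-in) framework calls clockTick() once per SST clock
// interval.
class Gem5Component {
public:
    explicit Gem5Component(uint64_t cyclesPerTick) : cyclesPerTick_(cyclesPerTick) {}

    // Invoked by the SST clock event: advance gem5 by one interval,
    // drain its event queue, then hand control back to SST.
    void clockTick(uint64_t tick) {
        queue_.serviceUpTo(tick * cyclesPerTick_);
    }

    Gem5EventQueue& queue() { return queue_; }

private:
    uint64_t cyclesPerTick_;
    Gem5EventQueue queue_;
};

int main() {
    Gem5Component comp(1000);  // e.g., 1000 gem5 cycles per SST clock tick
    comp.queue().schedule(1500, [] { std::cout << "cache fill completes\n"; });
    comp.queue().schedule(2500, [] { std::cout << "load retires\n"; });

    for (uint64_t tick = 1; tick <= 3; ++tick)  // SST driving the component
        comp.clockTick(tick);
}

In the real integration, the registered SST clock handler plays the role of clockTick, and the gem5 queue is only serviced up to the cycle corresponding to the current SST clock interval.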

There are some changes to the initialization. gem5 traditionally uses Python to dynamically build the simulation. However, Python is not available on some large HPC systems. In addition, SST uses a two-stage initialization process (components are partitioned before they are constructed), which is not compatible with gem5's initialization method. Thus, we repackage gem5 as a static library and directly call the SimObject constructors. The configuration is controlled by an XML schema.

3.1.1 Working Around Synchronous Messages

A key difference between gem5's ports and SST's links is that ports have a synchronous interface, which allows instant, untimed communication between two SimObjects. This convenience is used for some "backdoor" operations such as system calls, loading binaries into memory, and debugging. The instant communication is enabled by the fact that gem5 is a serial simulator in which all SimObjects exist in the same address space. However, SST does not allow such a synchronous interface, as the greater scalability of MPI comes at the cost of a distributed memory model.

For system calls, the system emulation mode of gem5 uses synchronous access to the physical memory to copy data directly to/from the processor, and invokes the host OS to emulate the calls. For example, in a write() call the processor directly accesses the buffer in the main memory SimObject and passes it to the host OS. To avoid the synchronous call, we add a system call handler SimObject. All system calls are transformed into memory-mapped requests to an uncached region of memory. These requests are then directed to the system call handler SimObject, which can issue DMA transactions to copy buffers to/from memory and then perform the required call to the host OS. This comes at the cost of having to extend the system call handler for each system call, but the number of calls that require emulation is relatively small for our applications.

Another use of synchronous messages is to perform the initial loading of the binary into main memory. Normally, the gem5 processor directly accesses the main memory object. For the integrated SST/gem5, we instead tell the memory object (which may reside outside of the gem5 component) to load the data directly. The gem5 CPU object then only loads the initial thread state from the binary.

3.1.2 SST/gem5 Translators

The integrated gem5 must be able to interact with other components in SST. This requires new simObjects to be added to gem5. These objects must be aware of SST and able to communicate over SST links (see Figure 2).

Figure 2: Translation and assistance objects to interface gem5 with SST and allow parallel operations.

The first example of this connects gem5 to DRAMSim2, a robust and validated DRAM memory model. To support the connection to DRAMSim2, a new simObject is created which inherits from the gem5 memory bus. This new object, memBusPlus, replaces the default memory bus and passes incoming memory requests through an SST link to the DRAMSim2 component. DRAMSim2 is modified to include a backing store of data (by default DRAMSim2 only provides timing, not actual storage), allowing it to replace the gem5 physical memory model. DRAMSim2 is also modified to load the application binary into its backing store, as mentioned in Section 3.1.1.

To perform larger HPC network experiments, gem5 is also interfaced with models of an HPC network. SST contains a model of a high-performance protocol-offload NIC which uses the Portals network API. This Portals NIC can connect to a cycle-accurate model of an HPC 3D torus router, based on the Red Storm SeaStar network [21]. This setup allows a detailed processor and cache model (gem5) to be connected to a detailed HPC NIC model (the Portals NIC) and a detailed HPC router model (the SeaStar router).

Integrating the Portals/SeaStar models with gem5 starts with creating a translator simObject inside of gem5. This object is designed as a memory-mapped device within gem5 and can be accessed by writing into a reserved uncached part of the address space from the gem5 CPU model. The memory bus and IO bridge divert accesses to this address space to the translator object, which buffers them until a mailbox address is written. When this occurs, the buffered data are assembled into a Portals message, and this message is transferred over an SST link to the Portals NIC component. The Portals NIC can also send events to the translator object to notify the processor of incoming messages or to start DMA transfers to or from memory. Because communication between SST components (such as the Portals NIC and the SeaStar router) can be serialized and passed between ranks of SST, the combination of gem5, the Portals NIC, and the SeaStar router allows detailed exploration of HPC network protocols and parameters in parallel. Experiments (see Section 3.3) show that running this configuration in SST's parallel simulation environment achieves significant simulation speedup.
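The memBusPlus forwarding described in Section 3.1.2 can be pictured with the following self-contained sketch. MemRequest, SstLink, and the response callback are stand-ins invented for illustration; they are not the actual gem5 packet or SST link interfaces.

#include <cstdint>
#include <deque>
#include <functional>
#include <iostream>

// Stand-in for a gem5 memory packet.
struct MemRequest {
    uint64_t addr;
    bool     isWrite;
    std::function<void()> onComplete;  // called when the request finishes
};

// Stand-in for an SST link: events pushed here cross component (and
// possibly MPI rank) boundaries with a delivery latency.
struct SstLink {
    std::deque<MemRequest> outbound;
    void send(MemRequest req) { outbound.push_back(std::move(req)); }
};

// Replacement for the default gem5 memory bus: instead of passing a
// request to gem5's physical memory, it ships the request to the
// DRAMSim2 component over an SST link and remembers it until the
// response event arrives.
class MemBusPlus {
public:
    explicit MemBusPlus(SstLink& toDramSim) : link_(toDramSim) {}

    void handleRequest(MemRequest req) {
        pending_.push_back(req);
        link_.send(req);               // timing is handled by DRAMSim2
    }

    // Invoked by the SST event delivered when DRAMSim2 finishes a request.
    void handleResponse(uint64_t addr) {
        for (auto it = pending_.begin(); it != pending_.end(); ++it) {
            if (it->addr == addr) {
                it->onComplete();      // wake up the waiting CPU/cache
                pending_.erase(it);
                return;
            }
        }
    }

private:
    SstLink& link_;
    std::deque<MemRequest> pending_;
};

int main() {
    SstLink link;
    MemBusPlus bus(link);
    bus.handleRequest({0x1000, false, [] { std::cout << "read 0x1000 done\n"; }});
    // ...some cycles later, the DRAMSim2 component answers over the link:
    bus.handleResponse(0x1000);
}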

3.2 Full System Mode (gem5_FS) Integration

Running the gem5 full system model is similar to running the system emulation model, except that gem5_FS additionally needs to load a Linux kernel and mount one or more disk images for its filesystems. The disk image should contain the application binaries that one wants to run. The paths to the kernel image and the disk image need to be set up when configuring gem5_FS. Integrating gem5_FS with SST is similar to integrating the system emulation mode described in the previous sections. For example, the simObjects that are created and inherited from the gem5 in-order CPUs, out-of-order (O3) CPUs, caches, and physical memory are kept the same. On the other hand, the interface for gem5_FS and SST integration no longer needs the workload simObject (the application that is assigned to a core in system emulation mode) and adds several simObjects for full-system components, such as interrupts. The path to the system files (kernel and disk images) is configured by SST's System Description Language (SDL). Because the full system mode does not require synchronous messages for system calls, additional handler objects are not needed.

Table 1: Speedup from parallel simulation
  Number of host ranks | Execution time (s)
  1                    | 24869
  32                   | 578

Table 2: Core-switching performance between two cores in the same processor
                    | unmodified gettid() | modified gettid() with context-switch
  Kitten            | 44 ns               | 2630 ns
  Linux 2.6.35.7    | 47 ns               | 4435 ns
  V1 Linux [20]     | 83 ns               | 4094 ns
  Best Linux [20]   | 83 ns               | 2870 ns

3.3 Effectiveness of Parallelism of SST/gem5

In order to test the effectiveness of the parallelism of SST/gem5, we conduct experiments that combine the gem5 O3 processor model (Alpha ISA) with the Portals NIC and the SeaStar router models as described in Section 3.1.2. gem5 is run in system emulation mode with only a single core per simulated rank. Our test machine is a small 4-socket, 32-core machine running Red Hat Enterprise Linux 5 and OpenMPI 1.4.3. SST can run over distributed-memory machines even though this test system uses shared memory. In these experiments, the processors run the miniGhost compact application from the Mantevo Suite [1]. MiniGhost performs a simple 3D multi-point stencil computation on a regular grid. We compare the execution time of running a 128-node simulation in serial with running the same simulation on 32 host ranks. Our results demonstrate that gem5 benefits from the parallel simulation, which yields a speedup of 43x (Table 1). Although these results are only for small systems, the ability to run even a hundred gem5 cores in a single scalable simulation with a realistic HPC network represents a significant increase over what was previously possible with gem5. Also, because each gem5 component can be configured with multiple cores, even a 128-node simulation could be a simulation of several thousand cores, which fits well into the range of encountering interesting HPC effects.

3.4 Kitten on gem5 Integration

Kitten is an open-source, lightweight kernel designed to operate as the compute-node operating system on distributed-memory supercomputers (http://code.google.com/p/kitten/). We integrate Kitten with gem5 for two main technical reasons. First, the simpler code base enables more rapid prototyping and evaluation of new system software and runtime ideas in comparison to a full Linux system. Second, lightweight kernels in general enable faster and more reproducible simulation compared to a Linux system. For example, the authors of [7] point out that CNK, a lightweight kernel for PowerPC with a similar design to Kitten, is an essential tool for chip verification: a full Linux kernel requires days to boot in a cycle-accurate simulator, while CNK requires only a couple of hours. We modify Kitten to support embedding its equivalent of an initramfs file in the Kitten kernel image, resulting in a single file for gem5 to boot. Two additional minor changes are required: 1) gem5 is modified to pass "console=serial,vga" to the kernel being booted, and 2) the vendor string returned by the CPUID instruction is modified to return "GenuineIntel". With these changes, Kitten is able to boot in a gem5 simulation and run user-level processes and threads.

We explore novel power management schemes which require low-overhead task migration mechanisms, as well as frequent experimentation with different task scheduling policies. Such changes are relatively straightforward to implement in the Kitten kernel. As an example, we implement the V1 optimized core-switching strategy described in [20] in Kitten in approximately a day of effort. Since neither the Linux sys_gettid() modifications nor the gettid() benchmark used in that paper is publicly available, we re-implement them in Linux 2.6.35.7 so that Kitten can be compared to Linux. The results of this comparison are shown in Table 2, where the gettid benchmark performs 1,000,000 iterations per run. Our results are from evaluations on an Intel X5570 2.93 GHz CPU, while the prior work [20] experiments on an Intel X5160 3.0 GHz CPU. It is interesting to note that Kitten achieves a speedup of 1.69 compared to Linux running on the same hardware, which is better than the best speedup of 1.43 achieved in [20]. Kitten is expected to perform even better for the more complex V2 or V3 algorithms in [20].

4. EXTENSIONS TO THE SST/GEM5

In this section, we introduce the extensions to SST/gem5: a fast-forwarding facility and an improved reliability model. The fast-forwarding feature enables multiple levels of granularity for the same component within the same simulation (Section 4.1). The enhanced reliability model in SST/gem5 provides a more comprehensive analysis of system reliability (Section 4.2).

4.1 Fast-Forward

In order to accelerate simulation of real benchmark suites, we add a fast-forwarding feature to SST/gem5. It enables simulations to reach the region of interest (ROI) of an application at a faster speed. We create a new O3switchCPU object, which consists of a simple gem5 in-order CPU object and a detailed gem5 out-of-order (O3) CPU object. The gem5 in-order CPU object uses atomic memory access and is suitable for the cases where a detailed model is not necessary. The gem5 O3 CPU has detailed timing memory access for running precise simulations. When the O3switchCPU object is initialized, it first starts the simulation with the in-order CPU. After reaching the start point of the ROI, it switches out the in-order CPU and makes the O3 CPU take over. The numbers of instructions to fast-forward and to execute within the ROI are configurable in SST's SDL.
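The switching behavior can be sketched as follows. The CpuModel interface, the instruction counting, and the hand-over point are illustrative assumptions; the actual O3switchCPU builds on gem5's own CPU models and state-transfer machinery, which are not reproduced here.

#include <cstdint>
#include <iostream>
#include <memory>

// Minimal CPU-model interface: both the fast functional model and the
// detailed O3 model just "run" some number of instructions here.
struct CpuModel {
    virtual ~CpuModel() = default;
    virtual void run(uint64_t instructions) = 0;
};

struct AtomicInOrderCpu : CpuModel {        // fast, atomic memory accesses
    void run(uint64_t n) override { std::cout << "fast-forwarded " << n << " insts\n"; }
};

struct DetailedO3Cpu : CpuModel {           // detailed timing model
    void run(uint64_t n) override { std::cout << "simulated " << n << " insts in detail\n"; }
};

// Sketch of the O3switchCPU idea: start on the in-order CPU, switch to
// the O3 CPU once the start of the region of interest (ROI) is reached.
// Both instruction counts would come from SST's SDL configuration.
class O3SwitchCpu {
public:
    O3SwitchCpu(uint64_t ffInsts, uint64_t roiInsts)
        : ffInsts_(ffInsts), roiInsts_(roiInsts),
          fast_(std::make_unique<AtomicInOrderCpu>()),
          detailed_(std::make_unique<DetailedO3Cpu>()) {}

    void simulate() {
        fast_->run(ffInsts_);        // reach the ROI quickly
        // ...architectural state (registers, caches as desired) would be
        // handed over to the detailed model here...
        detailed_->run(roiInsts_);   // precise simulation inside the ROI
    }

private:
    uint64_t ffInsts_, roiInsts_;
    std::unique_ptr<CpuModel> fast_, detailed_;
};

int main() {
    O3SwitchCpu cpu(/*fast-forward*/ 500000000, /*ROI*/ 100000000);
    cpu.simulate();
}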

The effectiveness of fast-forwarding is highly dependent on the system's architectural configuration (such as the system bus width, memory access rate, and cache/memory access latencies), the number of instructions before reaching the ROI, and the benchmark. To evaluate the effectiveness of fast-forwarding in SST/gem5, we run each of the eight NAS Parallel Benchmarks (Figure 3) on a single node with 16 threads. This avoids the speedup from fast-forwarding being overshadowed by the synchronization between processes of workloads running on multiple nodes. For each benchmark, we fast-forward 50% of the operations, and the results show an average increase of 47% in simulation speed (Figure 3).

Figure 3: Fast-forward performance improvements (percent speedup for the NAS benchmarks cg, dc, ep, is, lu, mg, sp, ua, and their average).

The largest improvement is for the most memory-bound benchmark, "mg", which is 75.90% faster with fast-forwarding than with the standard O3 CPU model. For the most computationally intensive benchmark, "cg", fast-forwarding still makes the simulation 12.57% faster. In general, benchmarks that have more memory accesses tend to benefit more from fast-forwarding.

4.2 Reliability Modelling

We implement the technology interface for power, temperature, and reliability simulation in SST to help guide the design and operation of future computers [8, 9]. We add a model for another emerging critical failure mechanism, Negative Bias Temperature Instability (NBTI). NBTI typically occurs when the input to a gate is low while the output is high, resulting in an accumulation of positive charges in the gate oxide. This accumulation causes the threshold voltage of the transistor to increase and eventually leads to processor failure due to timing constraints. The NBTI model is defined in [19] and its failure rate is given by:

\lambda_{NBTI} = C_{NBTI}\left[\left(\ln\!\left(\frac{A}{1+2e^{B/kT}}\right) - \ln\!\left(\frac{A}{1+2e^{B/kT}} - C\right)\right)\cdot\frac{T}{e^{-D/kT}}\right]^{-\frac{1}{\beta}}    (1)

where C_NBTI, A, B, C, D, and beta are fitting parameters. The values that we use are C_NBTI = 0.006, A = 1.6328, B = 0.07377, C = 0.01, D = -0.06852, and beta = 0.3.

In our prior work [8], the system reliability is modeled based on two assumptions which limit the applicability of the model. First, the model assumes that the simulated system is a serial failure system. Second, the model assumes a constant failure rate for each failure mechanism. In this work, we address the limitations of both assumptions and enable the reliability model to evaluate structural duplication while accounting for wear-out failure mechanisms.

We implement the Monte Carlo method and a MIN-MAX analysis in the reliability model, which together can calculate the system-level reliability of serial-parallel or all-serial failure systems [19, 5]. Also, we use a lognormal distribution for each failure mechanism, which has been found to better model the wear-out of semiconductors [19]. In each iteration of the Monte Carlo simulation, we generate a random lifetime from the lognormal distribution for each failure mechanism and component of the system as in Equation (2):

rand_{lognormal} = e^{\ln(MTTF) - 0.125 + 0.5\sin(2\pi\,rand_1)\sqrt{-2\ln(rand_2)}}    (2)

where rand_lognormal is a lognormally distributed random value representing a random lifetime for each structure, MTTF is provided by the reliability model described earlier, and rand_1 and rand_2 are two uniform random variables. A MIN-MAX analysis is performed on these lifetimes based on the configuration of the system and gives the system-level lifetime for that iteration. The MTTF of the system is calculated by repeating this process over many iterations (1000 iterations in our study) and averaging the system-level lifetimes.
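One way to read Equation (2), which is our interpretation rather than something stated explicitly above, is as a Box–Muller construction of a lognormal lifetime whose mean equals the model-provided MTTF:

z = \sin(2\pi\,rand_1)\sqrt{-2\ln(rand_2)} \sim \mathcal{N}(0,1) \quad \text{(Box--Muller)}

rand_{lognormal} = e^{\mu + \sigma z}, \qquad \sigma = 0.5,\; \mu = \ln(MTTF) - \tfrac{\sigma^2}{2} = \ln(MTTF) - 0.125

\mathbb{E}\!\left[e^{\mu + \sigma z}\right] = e^{\mu + \sigma^2/2} = MTTF

That is, the 0.125 term is sigma^2/2 for sigma = 0.5, so averaging many sampled lifetimes recovers the per-structure MTTF.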
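A compact sketch of the Monte Carlo / MIN-MAX procedure, combined with the NBTI rate of Equation (1), is shown below. It is not the SST implementation; the temperature, the units of the resulting rate, the topology constants, and the RNG seeding are illustrative assumptions (here: cores within a socket in series, sockets in parallel, matching the topology used later in Section 5.2).

#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <random>

// NBTI failure rate as a function of temperature T (Kelvin), following
// Equation (1) with the fitting parameters listed in the text.  The
// absolute scale of the result is illustrative only.
double nbtiRate(double T) {
    const double C_NBTI = 0.006, A = 1.6328, B = 0.07377,
                 C = 0.01, D = -0.06852, beta = 0.3;
    const double k = 8.617e-5;                        // Boltzmann constant, eV/K
    double frac = A / (1.0 + 2.0 * std::exp(B / (k * T)));
    double term = (std::log(frac) - std::log(frac - C)) * T / std::exp(-D / (k * T));
    return C_NBTI * std::pow(term, -1.0 / beta);
}

// One lognormal lifetime sample per Equation (2): sigma = 0.5, so the
// shift ln(MTTF) - 0.125 makes the mean of the samples equal MTTF.
double sampleLifetime(double mttf, std::mt19937_64& rng) {
    const double kPi = 3.141592653589793;
    std::uniform_real_distribution<double> uni(1e-12, 1.0);
    double r1 = uni(rng), r2 = uni(rng);
    double z = std::sin(2.0 * kPi * r1) * std::sqrt(-2.0 * std::log(r2));
    return std::exp(std::log(mttf) - 0.125 + 0.5 * z);
}

int main() {
    const int sockets = 2, coresPerSocket = 4, iterations = 1000;
    const double mttfPerCore = 1.0 / nbtiRate(350.0); // per-core MTTF at ~350 K
    std::mt19937_64 rng(42);
    double sum = 0.0;

    for (int it = 0; it < iterations; ++it) {
        double systemLife = 0.0;                      // sockets in parallel: MAX
        for (int s = 0; s < sockets; ++s) {
            double socketLife = std::numeric_limits<double>::infinity();
            for (int c = 0; c < coresPerSocket; ++c)  // cores in series: MIN
                socketLife = std::min(socketLife, sampleLifetime(mttfPerCore, rng));
            systemLife = std::max(systemLife, socketLife);
        }
        sum += systemLife;
    }
    std::cout << "estimated system MTTF: " << sum / iterations << "\n";
}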

5. THE IMPACT OF SCHEDULING POLICIES ON SYSTEM RELIABILITY

5.1 Experimental Setup

We model a dual-socket Intel Xeon processor comprising eight cores in total. Each socket has two chips, and each chip has one L2 cache shared by a pair of cores. When multiple cores share a resource, the threads running on those cores can use this resource either constructively or destructively, depending on whether the threads share data. We study all combinations of the 4-thread workloads that are constructed from 12 representative SPEC2006 benchmarks (computation-bound: libquantum, namd, specrand_fp, specrand_int, astar; memory-bound: bzip2, gcc, gobmk, h264ref, hmmer, omnetpp, mcf). Table 3 lists the five SPEC2006 scenarios, where c and m stand for computation-bound and memory-bound benchmarks, respectively. We evaluate the Power DI policy and the vector balancing policy with frequency scaling. The results are shown in Figures 5 and 6.

Table 3: SPEC2006 benchmark scenarios
  Scenario | Benchmarks
  1        | four c
  2        | three c and one m
  3        | two c and two m
  4        | one c and three m
  5        | four m

5.2 Evaluation

We show the impact of resource contention on system performance, energy, and reliability in Figure 4. In the simulation, we consider two thread-to-core mapping scenarios. In one scenario, each of the eight cores is assigned a memory-bound application (gcc). In the other scenario, each core is assigned a computation-bound application (libquantum). Simulation results show that in the memory-bound case, the number of L2 cache accesses is about 3,400 times higher than in the computation-bound case. As shown in Figure 4, memory-bound threads can contend for shared resources, destructively increase energy consumption and temperature, and degrade reliability.

Figure 4: The effect of application characteristics on system energy, temperature, and reliability.

We compare the Energy-Delay Product (EDP) and the system reliability yielded by the two scheduling policies. Figure 5 shows that the Power DI policy results in better EDP when the fraction of memory-bound tasks in the workload is lower. The effectiveness of the scheduling policies differs according to the number of memory-bound benchmarks in each scenario. For example, in Scenario 1, the two policies allocate the computation-bound tasks to cores differently: Power DI clusters all tasks on the cores of the same socket, while the vector balancing policy spreads the tasks among all cores. The frequency scaling of the vector balancing policy is not invoked in this scenario. The Power DI policy, on the other hand, can save power by switching the idle cores to a low-power state. In contrast, all the tasks in Scenario 5 are memory-bound, and the best policy is to spread them across memory domains so that no two tasks end up running in the same memory domain. In this case, the vector balancing policy improves EDP by saving energy through frequency scaling, while the power-saving strategy of the Power DI policy is not invoked.

Figure 5: The effect of scheduling policies on EDP.

Figure 6 shows the impact of the two policies on system reliability. We assume a serial-parallel failure system, where cores in the same socket are in series (failure of any core results in the socket failing) and the two sockets are in parallel (the system fails when both sockets fail). Results show that the Power DI policy yields better system reliability when the fraction of memory-bound tasks in the workload is low. In Scenario 1, the Power DI policy clusters all tasks in the same socket while leaving the cores on the other socket idle, and thus achieves much better reliability. It is worth noting that when the system topology changes, the impact of the scheduling policies on system reliability changes as well.

Figure 6: The effect of scheduling policies on system reliability.

6. SUMMARY

In this paper, we have introduced the integration of SST and gem5. We have enhanced the integrated framework by adding the Kitten OS to provide a flexible HPC-oriented OS for architectural experimentation, and by adding a fast-forwarding feature to accelerate simulation. We have also improved the reliability model in SST. Experiments have demonstrated that the SST/gem5 integration is scalable. This allows simulating the impact of many effects, such as load imbalance or system overheads, that are only visible at scale. We have used the simulation framework to evaluate two resource-conscious job-scheduling policies. Results show that the effectiveness of both policies on system energy and reliability depends on the workload and the system topology. Our future work will use the integrated SST/gem5 infrastructure to develop novel temperature management techniques that take application characteristics into account to optimize the performance, energy consumption, and reliability of multithreaded multicore systems.

Acknowledgments

Sandia National Laboratories is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

References

[1] Mantevo Project. https://software.sandia.gov/mantevo/.
[2] N. Binkert et al. The gem5 simulator. SIGARCH Comput. Archit. News, 39:1-7, 2011.
[3] D. Brooks, V. Tiwari, and M. Martonosi. Wattch: A framework for architectural-level power analysis and optimizations. In International Symposium on Computer Architecture (ISCA), pages 83-94, 2000.
[4] D. Burger and T. Austin. The SimpleScalar tool set 2.0. ACM SIGARCH Computer Architecture News, 25(3):13-25, 1997.
[5] A. K. Coskun, T. S. Rosing, K. Mihic, Y. Leblebici, and G. D. Micheli. Analysis and optimization of MPSoC reliability. Journal of Low Power Electronics (JOLPE), 2(1):56-69, April 2006.
[6] A. Fedorova, S. Blagodurov, and S. Zhuravlev. Managing contention for shared resources on multicore processors. Communications of the ACM, 53(2):49-57, 2010.
[7] M. Giampapa et al. Experiences with a lightweight supercomputer kernel: Lessons learned from Blue Gene's CNK. In International Conference on High-Performance Computing, Networking, Storage, and Analysis (SC), November 2010.
[8] M. Hsieh. A scalable simulation framework for evaluating thermal management techniques and the lifetime reliability of multithreaded multicore systems. In IEEE Workshop on Thermal Modeling and Management (IGCC-TEMM), 2011.
[9] M. Hsieh, A. Rodrigues, R. Risen, K. Thompson, and W. Song. A framework for architecture-level power, area, and thermal simulation and its application to network-on-chip design exploration. ACM SIGMETRICS Performance Evaluation Review, 38:63-68, 2011.
[10] W. Huang et al. Differentiating the roles of IR measurement and simulation for power and temperature-aware design. In International Symposium on Performance Analysis of Systems and Software (ISPASS), April 2009.
[11] A. Kahng et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration. In Design, Automation and Test in Europe (DATE), April 2009.
[12] J. Lange, K. Pedretti, T. Hudson, P. Dinda, Z. Cui, L. Xia, P. Bridges, A. Gocke, S. Jaconette, M. Levenhagen, and R. Brightwell. Palacios and Kitten: New high performance operating systems for scalable virtualized and native supercomputing. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), April 2010.
[13] S. Li et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In International Symposium on Microarchitecture, 2009.
[14] M. Lis et al. Scalable, accurate multicore simulation in the 1000-core era. In ISPASS, 2011.
[15] A. Merkel, J. Stoess, and F. Bellosa. Resource-conscious scheduling for energy efficiency on multicore processors. In Proceedings of the 5th European Conference on Computer Systems, pages 153-166. ACM, 2010.
[16] R. E. Riesen, K. T. Pedretti, R. Brightwell, B. W. Barrett, K. D. Underwood, T. B. Hudson, and A. B. Maccabe. The Portals 4.0 message passing interface. Technical Report SAND2008-2639, Sandia National Laboratories, April 2008.
[17] A. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. CooperBalls, and B. Jacob. The structural simulation toolkit. SIGMETRICS Perform. Eval. Rev., 38:37-42, 2011.
[18] P. Rosenfeld, E. Cooper-Balis, and B. Jacob. DRAMSim2. http://www.ece.umd.edu/dramsim/, July 2010.
[19] J. Srinivasan et al. Exploiting structural duplication for lifetime reliability enhancement. In ISCA, pages 520-531, June 2005.
[20] R. Strong, J. Mudigonda, J. C. Mogul, N. Binkert, and D. Tullsen. Fast switching of threads between cores. ACM SIGOPS Operating Systems Review, 43:35-45, April 2009.
[21] K. Underwood, M. Levenhagen, and A. Rodrigues. Simulating Red Storm: Challenges and successes in building a system simulation. In International Parallel and Distributed Processing Symposium (IPDPS), pages 1-10, 2007.