Accuracy Evaluation of GEM5 Simulator System
Anastasiia Butko, Rafael Garibotti, Luciano Ost and Gilles Sassatelli
LIRMM, CNRS/University of Montpellier II – 161 rue Ada, Cedex05 34095 Montpellier, France
{last_name}@lirmm.fr

Abstract - Design space exploration (DSE) of complex embedded systems that combine a number of CPUs, dedicated hardware and software is a tedious task for which a broad range of approaches exists, from the use of high-level models to hardware prototyping. Each of these entails different simulation speed/accuracy tradeoffs, and thereby enables exploring a certain subset of the design space in a given time. Several simulation frameworks devoted to CPU-centric systems have been developed over the past decade. They feature either near real-time simulation speed or moderate to high speed with quasi-cycle-level accuracy, often by means of instruction-set simulators or binary translation techniques. This paper presents an accuracy evaluation of the GEM5 simulator, which belongs to the first class, in modeling real systems. Performance figures of a wide range of benchmarks (e.g. in domains such as scientific computing and media applications) are captured and compared to results obtained on real hardware.

Keywords: embedded system, GEM5, modeling and full-system simulation

I. INTRODUCTION

Embedded computing systems are found in a wide range of products. They are becoming more and more complex, and their main concerns are always performance, power consumption and cost. With a large number of existing solutions, developers must choose the best-suited configuration while meeting time-to-market constraints.

There exist different approaches to design space exploration. The golden point design is an approach that specifies architectures at a very detailed level using hardware description languages such as VHDL or Verilog [1]. This low level of abstraction gives high accuracy. In turn, it poses severe limitations on design space exploration and is also extremely time consuming. Employed methodologies therefore often rely on higher-level models.

Following this direction, there are cycle-accurate models, abstract executable models and other high-level abstraction models, as shown in Figure 1. These models simulate faster than RTL at the cost of some accuracy. Furthermore, they facilitate analysis iterations around various architectural options as well as software execution. This gives the flexibility to explore more features than a low-level abstraction model allows.

Figure 1 – Pyramid of abstraction levels that comprise a system design, from the specification to a possible optimal solution. Figure extracted from [2].

In this scenario, to maintain a reasonable balance between simulation time and accuracy, we focus on full-system simulators. These are software programs that simulate hardware, making the target software believe that it is running on physical hardware. Moreover, these simulators claim to be fast and flexible. While the simulation speed is easily observed, the claimed level of accuracy of such systems often remains unclear. This paper contributes by evaluating the accuracy of one popular framework, GEM5, against a real hardware platform.

This paper is organized as follows. Section II describes related work on full-system simulation. Section III describes GEM5, the chosen simulator. The reference model, environment configuration and benchmarks are presented in Section IV. Section V describes the experiments. Finally, Section VI draws conclusions and gives directions for future work.

II. RELATED WORK

This section provides a brief survey of the most popular full-system simulators according to four criteria: (i) accuracy, (ii) supported processor architectures, (iii) licensing and (iv) development activity.

Simics is a functionally-accurate full-system simulator that enables unmodified target software (e.g. operating system, applications) to run on a virtual platform just as on the physical hardware [3]. Simics supports a wide range of processor architectures (e.g. Alpha, ARM, MIPS, PowerPC, SPARC, x86), as well as operating systems (e.g. Linux, VxWorks, Solaris, FreeBSD, QNX, RTEMS). Simics is composed of an instruction-set simulator and memory-management unit models, as well as models of all memories and devices found in the memory map of the processors. Simics has two main disadvantages: it is not claimed to be cycle-accurate, and a commercial license is required (marketed by Wind River Systems).

PTLsim is a cycle-accurate full-system x86 microprocessor simulator with an out-of-order pipelined model. PTLsim also supports modeling of multiprocessor or simultaneous multithreading (SMT) machines [4]. PTLsim has two main drawbacks: only x86 architectures are supported, and the tool suite is no longer actively maintained.

SimpleScalar is an open source infrastructure for simulation and architectural modeling. It supports several processor architectures including Alpha, ARM, PowerPC and x86. Moreover, it features a large range of CPU models, which vary from simple unpipelined processors to detailed dynamically scheduled microarchitectures with multiple-level memory hierarchies [5]. SimpleScalar was extensively improved in the past. However, both development and support seem to have slowed down significantly: at the time of this writing, the last update was more than a year old.

OVPsim is a dynamically linked library marketed by Imperas, which simulates complex multiprocessor platforms containing arbitrary local and shared-memory topologies [1]. An important feature of this simulator is dynamic binary translation, which improves simulation speed. OVPsim's advantages are extensive documentation and excellent support for different processor architectures. However, OVPsim does not model cycle-accurate processors, only instruction-accurate ones.

GEM5 is a modular, discrete-event-driven full-system simulator released under a BSD license. It supports different instruction set architectures, such as Alpha, ARM, x86, SPARC, PowerPC and MIPS [6]. Moreover, it has an active development and support team.

Table I summarizes the reviewed work according to the four criteria mentioned before. Except for PTLsim, which only supports x86, the reviewed simulators support several processor architectures. OVPsim supports the largest number of processor architectures among them. Unfortunately, it targets application development rather than simulation accuracy. Simics requires a private license, and SimpleScalar is no longer supported or developed. In turn, GEM5 covers all four criteria, which justifies our choice.

To the best of our knowledge, no published material reports and discusses GEM5 accuracy in terms of performance estimation. In [7], for instance, the authors evaluate the accuracy of the M5 full-system simulator for TCP/IP-based network-intensive workloads. They used only two benchmarks, executed on a single Alpha CPU model. Unlike that work, this paper evaluates the accuracy of GEM5, which combines M5 [8] and GEMS [9] into one simulator. Further, several workloads from different domains are used to stress a dual-core ARMv7 ISA processor (Cortex-A9), which is widely used in today's high-performance embedded systems.

TABLE I. RELATED WORKS ON FULL-SYSTEM SIMULATION

Reference | Simulator | Accuracy | Supported processor architectures | License | Development/Support activity
WindRiver [3] | Simics | Functionally-accurate | Alpha, ARM, MIPS, PowerPC, SPARC and x86 | Private | Yes
Yourst [4] | PTLsim | Cycle-accurate | x86 | Open | Yes
Austin et al. [5] | SimpleScalar | Cycle-accurate | Alpha, ARM, PowerPC and x86 | Open | No
Imperas [1] | OVPsim | Instruction-accurate | OpenCores OpenRISC, ARM, Synopsys ARC, MIPS, PowerPC, Xilinx MicroBlaze and others | Open and Private | Yes
Binkert et al. [6] | GEM5 | Cycle-accurate | Alpha, ARM, x86, SPARC, PowerPC and MIPS | Open | Yes

III. GEM5: THE SIMULATION FRAMEWORK

This section presents the chosen simulator. GEM5 was created by combining the best features of two projects. One focused on full-system simulation (M5 [8]) and the other on memory systems (GEMS [9]). The GEM5 simulator provides a flexible, modular simulation system for exploring multiprocessor architecture features. It offers a diverse set of CPU models, system execution modes and memory system models [6].

GEM5 is an event-driven simulation framework with several abstraction levels, which balance simulation speed against accuracy. Furthermore, GEM5 has an open source license, a good object-oriented infrastructure and a very active mailing list.

A. System Modes

GEM5 supports two system modes: (i) system emulation (SE) and (ii) full system (FS). SE mode emulates most operating-system-level services, including OS services and devices, through stubs on the simulation workstation. This yields a significant simulation speedup at the cost of limited support for some functionalities, such as multithreading. FS mode, on the other hand, performs complete system simulation, including the OS, thread scheduler and devices, and executes both user-level and kernel-level instructions. This increases simulation accuracy at the cost of longer simulation time.

B. CPU Models

GEM5 supports four different CPU models: (i) AtomicSimple, (ii) TimingSimple, (iii) In-Order, and (iv) Out-of-Order (O3), which differ in their speed/accuracy trade-offs.

IV. EXPERIMENTAL SETUP

AtomicSimple is the simplest scalar one-cycle-per-instruction/ideal memory model,