The Design and Utility of the ML-RSIM System Simulator, Lambert
Total Page:16
File Type:pdf, Size:1020Kb
Journal of Systems Architecture 52 (2006) 283–297 www.elsevier.com/locate/sysarc The design and utility of the ML-RSIM system simulator Lambert Schaelicke a,*, Mike Parker b a Intel Corporation, 3400 Harmony Road, Fort Collins, CO 80528, USA b Cray, Inc., PO Box 6000, Chippewa Falls, WI 54729, USA Received 24 July 2004; accepted 21 July 2005 Available online 25 October 2005 Abstract Execution-driven simulation has become the primary method for evaluating architectural techniques as it facilitates rapid design space exploration without the cost of building prototype hardware. To date, most simulation systems have either focused on the cycle-accurate modeling of user-level code while ignoring operating system and I/O effects, or have modeled complete systems while abstracting away many cycle-accurate timing details. The ML-RSIM simulation system presented here combines detailed hardware models with the ability to simulate user-level as well as operating system activ- ity, making it particularly suitable for exploring the interaction of applications with the operating system and I/O activity. This paper provides an overview of the design of the simulation infrastructure and discusses its strengths and weaknesses in terms of accuracy, flexibility, and performance. A validation study using LMBench microbenchmarks shows a good cor- relation for most of the architectural characteristics, while operating system effects show a larger variability. By quantifying the accuracy of the simulation tool in various areas, the validation effort not only helps gauge the validity of simulation results but also allows users to assess the suitability of the tool for a particular purpose. Ó 2005 Elsevier B.V. All rights reserved. Keywords: Simulator validation; Execution-driven simulator; Simulator accuracy 1. Introduction not provided by real hardware. However, the design and implementation of a suitable and robust Simulation has become one of the predominant simulator is in itself a major undertaking that tools for computer architecture research. It allows requires a combination of architectural expertise researchers to rapidly explore novel techniques and software development skills. For this reason, and architectures without incurring the engineering a number of readily available and proven simula- effort and expense of implementing prototype hard- tion systems such as SimpleScalar [2], SimOS ware. In addition, it affords a flexibility normally [16], SimICS [12] and RSIM [15] have found wide- spread use. Understanding the performance, accu- racy, and limitations of simulators is critical when * Corresponding author. selecting a tool to answer a particular research E-mail addresses: [email protected] (L. Schaelicke), [email protected] (M. Parker). question. URLs: http://www.cs.utah.edu/~lambert (L. Schaelicke), This paper describes the design and validation of http://www.cs.utah.edu/~map (M. Parker). ML-RSIM, an execution-driven system simulator 1383-7621/$ - see front matter Ó 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.sysarc.2005.07.001 284 L. Schaelicke, M. Parker / Journal of Systems Architecture 52 (2006) 283–297 that combines detailed hardware models of modern The following section briefly describes the design workstation and server-class machines with a Unix- of ML-RSIM, focusing on the I/O subsystem and like operating system [18]. The ML-RSIM environ- operating system infrastructure. Section 3 presents ment, based on RSIM [15], extends the system the validation methodology used to gauge the accu- features normally represented by simulators to racy of ML-RSIM. Section 4 presents the validation include I/O devices and an operating system with- results and discusses how important design deci- out sacrificing fidelity of the processor model and sions affect the simulatorÕs utility and accuracy. Sec- with only a small impact on simulation speed. A tion 5 contrasts this work with other simulation detailed characterization of the accuracy of the infrastructures and validation efforts. Finally, Sec- hardware models as well as the simulator operating tion 6 summarizes and outlines ongoing and future system highlights key strengths and weaknesses of work. ML-RSIM and helps determine the toolÕs suitability for specific research tasks. 2. ML-RSIM design Simulator design is a trade-off between develop- ment effort, simulation speed, modeling scope, and 2.1. Machine model accuracy. As such, most simulators are targeted towards specific application domains and research The ML-RSIM processor model is based on agendas. For instance, architectural simulators like RSIM v1.0 [15] with extensions to simulate operat- SimpleScalar [2] and RSIM [15] focus on the ing system code and memory-mapped access to I/O cycle-accurate modeling of user-level instructions devices. Other hardware models include a two-level by providing highly detailed models of the processor cache hierarchy, a coherent memory bus, a memory and memory hierarchy. These tools are invaluable controller, and several I/O devices. Combined, these for experiments involving compute-intensive appli- components model a contemporary workstation or cations, but they ignore many system-level effects server-class machine, as shown in Fig. 1. such as virtual memory, system calls, interrupt han- The ML-RSIM CPU model implements the dling, and other I/O related effects. System-level SPARC V8 instruction set [21] on a MIPS R12000- simulators like SimOS [16] and SimICS [12],on like dynamically scheduled superscalar processor the other hand, provide complete system models core [14]. It also includes an instruction cache model, and thus account for application, operating system, TLB models, exception handling capabilities, and and I/O effects. However, to achieve acceptable sim- support for privileged instructions and registers. ulation speed, hardware models are usually simpli- Both user-level and privileged instructions are simu- fied and not cycle-accurate. Such system-level lated, so simulation sessions account for both appli- simulators are able to model the interaction of hard- cation and kernel effects. ware, system software, and applications. However, The cache hierarchy is modeled as a conventional due to the abstract hardware models, only coarse- two-level structure with support for snooping cache grain effects can be observed. coherency. An uncached buffer handles load and In contrast, ML-RSIM combines detailed hard- store instructions to I/O addresses, and supports ware models of a microprocessor, memory hierar- store combining to reduce system bus utilization. chy, and I/O subsystem with a simulated operating The system bus implements snooping cache coher- system. As such, it supports detailed explorations ency through a MESI protocol. The memory con- of the interaction of applications, operating sys- troller supports SDRAM or RAMBUS memory in tems, and I/O devices. Hardware models include a a variety of configurations and accurately models dynamically scheduled processor with a two-level queueing delays, bank contention, and DRAM pre- cache hierarchy, coherent system bus, and a multi- charge and refresh overhead. bank memory controller. These components are The I/O subsystem is based on the PCI bus and augmented with an I/O subsystem consisting of a consists of a PCI bridge with autoconfiguration sup- PCI bridge, SCSI adapter, SCSI bus, and hard disk. port, a real-time clock, and a number of SCSI host The fully-simulated ‘‘Lamix’’ operating system is adapters with hard disks. The real-time clock is based on NetBSD source code and includes a sys- modeled after a MOSTEK 48T02 clock chip and tem call layer, multiple file systems with buffer implements common date and time functionality cache, device drivers, and multitasking capabilities as well as two independent periodic interrupt [18]. sources. The SCSI adapter is a variation of an L. Schaelicke, M. Parker / Journal of Systems Architecture 52 (2006) 283–297 285 CPU L1 I-Cache L1 D-Cache L2 Cache System Bus Disk Memory Controller PCI Config PCI Bridge DRAM . DRAM Realtime SCSI Clock Adapter SCSI Bus Fig. 1. ML-RSIM system model. Adaptec AIC 7770-based SCSI host adapter and ances the need to accurately model a realistic system supports multiple outstanding requests, disconnect with the cost of implementing the minute and often and reconnect transactions, and request queueing obscure details of a particular commercial system. [1]. Each adapter controls one SCSI bus that accounts for arbitration delays, idle times, and data 2.2. Lamix kernel transfer. Each SCSI bus connects an adapter to a configurable number of SCSI disks. These disks The Lamix kernel is a Unix-compatible multi- model seek and transfer times as well as an on- tasking operating system specifically designed to board DRAM buffer. Seek times are calculated run on the ML-RSIM simulator. It consists of an using one of four algorithms: (a) a fitted curve based independently developed multitasking core and sig- on single-track, average, and full-stroke seek times, nificant portions of NetBSD source code that imple- (b) a three-point linear curve based on the same ment file system and device driver functionality. The parameters, (c) a constant seek time, or (d) instanta- system ABI is Solaris-compatible; thus applications neous seeks with zero latency [8,11]. The sector- compiled for Lamix can, in most cases, execute on based disk cache is used for data staging, prefetch- native Solaris hosts without modifications. Lamix ing and optional write buffering. The disk model only requires that applications are statically