GUEST EDITORS’ INTRODUCTION

Performance Simulation Tools

Understanding the performance of microprocessors, multiprocessors, and distributed computers requires studying them in isolation as well as observing their interaction with the entire system architecture.

Shubhendu S. Mukherjee, Intel
Sarita V. Adve, University of Illinois at Urbana-Champaign
Todd Austin, University of Michigan
Joel Emer, Intel
Peter S. Magnusson, Virtutech Inc.

High-performance computing has grown largely in scale with Moore’s law. As hundreds of millions, and soon to be billions, of transistors crowd onto processor chips, they support computing devices of extreme complexity. Predicting the performance of these machines often requires using sophisticated programs to model them. Performance simulators, software programs typically written in a high-level language such as C or C++, enable exploration of design alternatives for future high-performance computers.

Since the early 1980s, the design of high-performance computers has been largely data-driven. For example, analyses of instruction usage revealed that real machines do not use all instructions with equal frequency. Designers used this observation to optimize the implementation of these machines. Direct measurement, however, is a post-design step and does not always help optimize machines under design. As an alternative, designers adopted analytical models to predict performance. Such models are successful in many cases, particularly in culling the design space in preliminary explorations. However, analytical models have been less successful in assessing detailed design tradeoffs. Because these tradeoffs are crucial in today’s highly competitive high-performance computing market, designers have reverted to simulation models to predict machine performance.

Performance modeling and analysis are now integral to the design flow of modern computing systems, especially for high-performance microprocessors. As Figure 1 shows, designers begin by developing performance models of the target architecture, followed by actual logic design, also called Register Transfer Language, or RTL. Circuit designers convert the logic specification into circuits, and layout engineers eventually position the circuits on the processor floor plan.

[Figure 1 diagram: performance modeling, logic design, circuit design, layout design.]

Figure 1. Typical flow in a microprocessor design process. Interaction between the process steps refines the performance model throughout the process.

For very complex designs, such a design process can take as long as seven years. Of course, the process involves close interactions between the steps. The performance model, in particular, is refined as logic and circuit designers feed back better timing estimates for different hardware components of the target architecture. Thus, the performance model’s fidelity to the target architecture is often key to the success of the architecture itself. Additionally, the performance model must at least run faster than RTL; otherwise, developers could obtain performance estimates from the logic blocks themselves. Our experience shows that performance models usually run several orders of magnitude faster than RTL.

The increased complexity of target architectures and applications has made performance modeling a daunting task. Microarchitecture pipelines have extended from five to 20 stages to exploit increasing levels of parallelism. This trend will continue as Web server and database markets require both fine-grained multithreading embedded in aggressive pipelines and coarse-grained multiprocessing spanning multiple processors. Such multithreading and multiprocessing architectures are even moving on chip, exemplified by Intel’s Hyperthreaded architectures, IBM’s Power4, and Hewlett-Packard’s Mako processor.

Understanding the performance of distributed computing, network server, and parallel applications on this new generation of processors requires studying how they interact with the system architecture and the operating system. Unfortunately, performance models that can evaluate system architectures and operating systems have evolved into extremely complex and gigantic software projects.

IN THIS ISSUE

This special issue presents four performance simulators (Rsim, Simics, SimpleScalar, and Asim) that address different aspects of the complexities encountered in performance simulation.

In “Rsim: Simulating Shared-Memory Multiprocessors with ILP Processors,” Christopher J. Hughes and coauthors review the development process for Rsim, an architecture simulator widely used in academic research related to multiprocessors. It provides detailed models for shared-memory multiprocessors based on processors that support dynamic scheduling. The experience with Rsim’s detailed processor modeling demonstrates that simple models of an older generation of sequential processors cannot approximate the more complicated dynamically scheduled processors.

The Simics simulation platform is based on the idea that reliable performance estimates require full system simulation. Simics runs unmodified firmware, operating system kernels, and device drivers. In “Simics: A Full System Simulation Platform,” Peter S. Magnusson and colleagues describe how this system simulates a network of multiple, heterogeneous computers that designers can use to duplicate real-world scenarios. Simics also can export the models for these functions to other tools. Originally an academic research project, Simics is today a commercial product, available from Virtutech.

“SimpleScalar: An Infrastructure for Computer System Modeling,” by Todd Austin, Eric Larson, and Dan Ernst describes how researchers can reuse this uniprocessor performance simulator’s tools to quickly obtain meaningful results from complex architectures. Subsequent to its development, other researchers incorporated models of multithreaded and multiprocessing architectures into SimpleScalar.

Finally, Asim extends SimpleScalar’s reuse philosophy to finer-grained modular components within the simulator itself. “Asim: A Performance Model Framework” by Joel Emer and his colleagues explains how Asim provides a simulation infrastructure with a library of modules that model different hardware components, such as caches and branch predictors. With this library, designers and researchers can easily reuse, extend, and modify architectural components to quickly build complex performance models. Currently, Asim is a proprietary tool within Compaq and Intel.

Performance simulation is a research topic of long-standing importance that comprises a huge body of literature. Comprehensive coverage is impractical in a single special issue, but these four articles and tools demonstrate the overall state of the art in performance simulation and also offer a glimpse of the problems and challenges that lie ahead. We hope you enjoy them.

Shubhendu S. Mukherjee is a senior hardware engineer in VSSAD at Intel. Contact him at shubu.[email protected].

Sarita V. Adve is an associate professor in the Computer Science Department at the University of Illinois at Urbana-Champaign. She is a member of the ACM and the IEEE. Contact her at [email protected].

Todd Austin is an assistant professor in the Department of Electrical Engineering and Computer Science at the University of Michigan. Contact him at [email protected].

Joel Emer is an Intel Fellow in VSSAD. Contact him at [email protected].

Peter S. Magnusson is CEO of Virtutech. Contact him at [email protected].

Computer, 0018-9162/02/$17.00 © 2002 IEEE

February 2002 39