Princeton University Technical Report TR-811-08, January 2008
The PARSEC Benchmark Suite: Characterization and Architectural Implications
Christian Bienia†, Sanjeev Kumar‡, Jaswinder Pal Singh† and Kai Li†
† Department of Computer Science, Princeton University
‡ Microprocessor Technology Labs, Intel
[email protected]
ABSTRACT
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multi-threaded commercial programs. Our characterization shows that the benchmark suite is diverse in working set, locality, data sharing, synchronization, and off-chip traffic. The benchmark suite has been made available to the public.

Categories and Subject Descriptors
D.0 [Software]: [benchmark suite]

General Terms
Performance, Measurement, Experimentation

Keywords
benchmark suite, performance measurement, multi-threading, shared-memory computers

1. INTRODUCTION
Benchmarking is the quantitative foundation of computer architecture research. Program execution time is the only accurate way to measure performance[18]. Without a program selection that provides a representative snapshot of the target application space, performance results can be misleading and no valid conclusions may be drawn from an experiment outcome. CMPs require a disruptive change in order for programs to benefit from their full potential. Future applications will have to be parallel, but due to the lack of a representative, multi-threaded benchmark suite most scientists were forced to fall back on existing benchmarks. This usually meant the use of older High-Performance Computing (HPC) workloads, smaller suites with only a few programs, or unparallelized benchmarks. We consider this trend extremely dangerous for the whole discipline. Representative conclusions require representative experiments and, as we argue in this paper, existing benchmark suites cannot be considered adequate to describe future CMP applications.

Large processor manufacturers have already reacted and developed their own, internal collections of workloads. An example is the Intel RMS benchmark suite[14]. However, these suites often include proprietary code and are not publicly available. To address this problem, we created the PARSEC benchmark suite in collaboration with Intel Corporation. It includes not only a number of important applications from the RMS suite but also several leading-edge applications from Princeton University, Stanford University and the open-source domain. The goal is to create a suite of emerging workloads that can drive CMP research.

This paper makes three contributions:

• We identify shortcomings of commonly used benchmark suites and explain why they should not be used to evaluate CMPs.

• We present and characterize PARSEC, a new benchmark suite for CMPs that is diverse enough to allow representative conclusions.

• Based on our characterization of PARSEC, we analyze what properties future CMPs must have in order to be able to deliver scalable performance for emerging applications. In particular, our understanding of the behavior of future workloads allows us to quantify how CMPs must be built in order to mitigate the effects of the memory wall on the next generation of programs.

In Section 2 we describe why existing benchmark suites cannot be considered adequate to describe future CMP applications. In Section 3, we present the PARSEC benchmark suite and explain how it avoids the shortcomings of other collections of benchmarks. The methodology which we use to characterize our workloads is presented in Section 4. In Sections 5 to 8, we analyze the parallelization, working sets, communication behavior and off-chip traffic of the benchmark programs. We conclude our study in Section 9.

2. MOTIVATION
The goal of this work is to define a benchmark suite that can be used to design the next generation of processors. In this section, we first present the requirements for such a suite.
We then discuss how the existing benchmarks fail to meet these requirements.

2.1 Requirements for a Benchmark Suite
We have the following five requirements for a benchmark suite:
Multi-threaded Applications  Shared-memory CMPs are already ubiquitous. The trend for future processors is to deliver large performance improvements through increasing core counts on CMPs while only providing modest serial performance improvements. Consequently, applications that require additional processing power will need to be parallel.

Emerging Workloads  Rapidly increasing processing power is enabling a new class of applications whose computational requirements were beyond the capabilities of the earlier generation of processors[14]. Such applications are significantly different from earlier applications (see Section 3). Future processors will be designed to meet the demands of these emerging applications, and a benchmark suite should represent them.

Diverse  Applications are increasingly diverse, run on a variety of platforms and accommodate different usage models. They include both interactive applications like computer games, offline applications like data mining programs, and programs with different parallelization models. Specialized collections of benchmarks can be used to study some of these areas in more detail, but decisions about general-purpose processors should be based on a diverse set of applications. While a truly representative suite is impossible to create, reasonable effort should be made to maximize the diversity of the program selection. The number of benchmarks must be large enough to capture a sufficient amount of the characteristics of the target application space.

Employ State-of-Art Techniques  A number of application domains have changed dramatically over the last decade and use very different algorithms and techniques. Visual applications, for example, have started to increasingly integrate physics simulations to generate more realistic animations[20]. A benchmark should not only represent emerging applications but also use state-of-art techniques.

Support Research  A benchmark suite intended for research has additional requirements compared to one used for benchmarking real machines alone. Benchmark suites intended for research usually go beyond pure scoring systems and provide infrastructure to instrument, manipulate, and perform detailed simulations of the included programs in an efficient manner.

2.2 Limitations of Existing Benchmark Suites
In the remaining part of this section we analyze how existing benchmark suites fall short of the presented requirements and must thus be considered unsuitable for evaluating CMP performance.

SPLASH-2  SPLASH-2 is a suite composed of multi-threaded applications[44] and hence seems to be an ideal candidate for measuring the performance of CMPs. However, its program collection is skewed towards HPC and graphics programs. It thus does not include parallelization models such as the pipeline model which are used in other application areas. SPLASH-2 should furthermore not be considered state-of-art anymore. Barnes, for example, implements the Barnes-Hut algorithm for N-body simulation[8]. For galaxy simulations it has largely been superseded by the TreeSPH[19] method, which can also account for mass which is not concentrated in bodies, such as dark matter. However, even for pure N-body simulation barnes must be considered outdated. In 1995 Xu proposed a hybrid algorithm which combines the hierarchical tree algorithm and the Fourier-based Particle-Mesh (PM) method into the superior TreePM method[45]. Our analysis shows that similar issues exist for a number of other applications of the suite, including raytrace and radiosity.

SPEC CPU2006 and OMP2001  SPEC CPU2006 and SPEC OMP2001 are two of the largest and most significant collections of benchmarks. They provide a snapshot of current scientific and engineering applications. Computer architecture research, however, commonly focuses on the near future and should thus also consider emerging applications. Workloads such as systems programs and parallelization models which employ the producer-consumer model are not included. SPEC CPU2006 is furthermore a suite of serial programs that is not intended for studies of parallel machines.

Other Benchmark Suites  Besides these major benchmark suites, several smaller suites exist. They were usually designed to study a specific program area and are thus limited to a single application domain. Therefore they usually include a smaller set of applications than a diverse benchmark suite typically offers. Due to these limitations they are commonly not used for scientific studies which do not restrict themselves to the covered application domain. Examples for these types of benchmark suites are ALPBench[25], BioParallel[22], MediaBench[1], NU-MineBench[23] and PhysicsBench[46]. Because of their different focus we do not discuss these suites in more detail.

3. THE PARSEC BENCHMARK SUITE
One of the goals of the PARSEC suite was to assemble a program selection that is large and diverse enough to be sufficiently representative for scientific studies. It consists of 9 applications and 3 kernels which were chosen from a wide range of application domains. In Table 1 we present a qualitative summary of their key characteristics. PARSEC workloads were selected to include different combinations of parallel models, machine requirements and runtime behaviors.

PARSEC meets all the requirements outlined in Section 2.1:

• Each of the applications has been parallelized.

• The PARSEC benchmark suite is not skewed towards HPC programs, which are abundant but represent only a niche. It focuses on emerging workloads. The algorithms these programs implement are usually considered useful, but their computational demands are prohibitively high on contemporary platforms. As more powerful processors become available in the near future, they are likely to proliferate rapidly.
Program       | Application Domain | Parallelization Model | Granularity | Working Set | Data Sharing | Data Exchange
blackscholes  | Financial Analysis | data-parallel         | coarse      | small       | low          | low
bodytrack     | Computer Vision    | data-parallel         | medium      | medium      | high         | medium
canneal       | Engineering        | unstructured          | fine        | unbounded   | high         | high
dedup         | Enterprise Storage | pipeline              | medium      | unbounded   | high         | high
facesim       | Animation          | data-parallel         | coarse      | large       | low          | medium
ferret        | Similarity Search  | pipeline              | medium      | unbounded   | high         | high
fluidanimate  | Animation          | data-parallel         | fine        | large       | low          | medium
freqmine      | Data Mining        | data-parallel         | medium     | unbounded   | high         | medium
streamcluster | Data Mining        | data-parallel         | medium      | medium      | low          | medium
swaptions     | Financial Analysis | data-parallel         | coarse      | medium      | low          | low
vips          | Media Processing   | data-parallel         | coarse      | medium      | low          | medium
x264          | Media Processing   | pipeline              | coarse      | medium      | high         | high

Table 1: Qualitative summary of the inherent key characteristics of the PARSEC benchmarks. Working sets and data usage patterns are explained and quantified in later sections. The pipeline model is a data-parallel model which also uses a functional partitioning. PARSEC workloads were chosen to cover different application domains, parallel models and runtime behaviors.
• The workloads are diverse and were chosen from many different areas such as computer vision, media processing, computational finance, enterprise servers and animation physics.

• Each of the applications chosen represents the state-of-art technique in its area.

• PARSEC supports computer architecture research in a number of ways. The most important one is that for each workload six input sets with different properties are defined. Three of these inputs are suitable for microarchitectural simulation. We explain the different types of input sets in more detail in Section 3.1.

3.1 Input Sets
PARSEC defines six input sets for each benchmark:

test  A very small input set to test the basic functionality of the program.

simdev  A very small input set which guarantees basic program behavior similar to the real behavior, intended for simulator test and development.

simsmall, simmedium and simlarge  Input sets of different sizes suitable for microarchitectural studies with simulators.

native  A large input set intended for native execution.

test and simdev are merely intended for testing and development and should not be used for scientific studies. The three simulator inputs for studies vary in size, but the general trend is that larger input sets contain bigger working sets and more parallelism. Finally, the native input set is intended for performance measurements on real machines and exceeds the computational demands which are generally considered feasible for simulation by orders of magnitude. From a scientific point of view, the native input set is the most interesting one because it resembles real program inputs most closely. The remaining input sets can be considered coarser approximations which sacrifice accuracy for tractability. Table 2 shows a breakdown of instructions and synchronization primitives for the simlarge input set which we used for the characterization study.

3.2 Workloads
The following workloads are part of the PARSEC suite:

3.2.1 blackscholes
The blackscholes application is an Intel RMS benchmark. It calculates the prices for a portfolio of European options analytically with the Black-Scholes partial differential equation (PDE)[10]

    ∂V/∂t + (1/2)σ²S²(∂²V/∂S²) + rS(∂V/∂S) − rV = 0

where V is an option on the underlying S with volatility σ at time t if the constant interest rate is r. There is no closed-form expression for the Black-Scholes equation and as such it must be computed numerically[21]. The blackscholes benchmark was chosen to represent the wide field of analytic PDE solvers in general and their application in computational finance in particular. The program is limited by the amount of floating-point calculations a processor can perform.

blackscholes stores the portfolio with numOptions derivatives in array OptionData. The program includes file optionData.txt which provides the initialization and control reference values for 1,000 options which are stored in array data_init. The initialization data is replicated if necessary to obtain enough derivatives for the benchmark.

The program divides the portfolio into a number of work units equal to the number of threads and processes them concurrently. Each thread iterates through all derivatives in its contingent and calls function BlkSchlsEqEuroNoDiv for each of them to compute its price.
Instructions (Billions): Total, FLOPS, Reads, Writes. Synchronization Primitives: Locks, Barriers, Conditions.

Program       | Problem Size                     | Total | FLOPS | Reads | Writes | Locks      | Barriers | Conditions
blackscholes  | 65,536 options                   | 2.67  | 1.14  | 0.68  | 0.19   | 0          | 8        | 0
bodytrack     | 4 frames, 4,000 particles        | 14.03 | 4.22  | 3.63  | 0.95   | 114,621    | 619      | 2,042
canneal       | 400,000 elements                 | 7.33  | 0.48  | 1.94  | 0.89   | 34         | 0        | 0
dedup         | 184 MB data                      | 37.1  | 0     | 11.71 | 3.13   | 158,979    | 0        | 1,619
facesim       | 1 frame, 372,126 tetrahedra      | 29.90 | 9.10  | 10.05 | 4.29   | 14,541     | 0        | 3,137
ferret        | 256 queries, 34,973 images       | 23.97 | 4.51  | 7.49  | 1.18   | 345,778    | 0        | 1,255
fluidanimate  | 5 frames, 300,000 particles      | 14.06 | 2.49  | 4.80  | 1.15   | 17,771,909 | 0        | 0
freqmine      | 990,000 transactions             | 33.45 | 0.00  | 11.31 | 5.24   | 990,025    | 0        | 0
streamcluster | 16,384 points per block, 1 block | 22.12 | 11.6  | 9.42  | 0.06   | 191        | 129,600  | 127
swaptions     | 64 swaptions, 20,000 simulations | 14.11 | 2.62  | 5.08  | 1.16   | 23         | 0        | 0
vips          | 1 image, 2662 × 5500 pixels      | 31.21 | 4.79  | 6.71  | 1.63   | 33,586     | 0        | 6,361
x264          | 128 frames, 640 × 360 pixels     | 32.43 | 8.76  | 9.01  | 3.11   | 16,767     | 0        | 1,056

Table 2: Breakdown of instructions and synchronization primitives for input set simlarge on a system with 8 cores. All numbers are totals across all threads. Numbers for synchronization primitives also include primitives in system libraries. 'Locks' and 'Barriers' count all lock- resp. barrier-based synchronizations, 'Conditions' count all waits on condition variables.
If error checking was enabled at compile time, it also compares the result with the reference price.

The input sets for blackscholes are sized as follows:

• test: 1 option
• simdev: 16 options
• simsmall: 4,096 options
• simmedium: 16,384 options
• simlarge: 65,536 options
• native: 10,000,000 options

3.2.2 bodytrack
The bodytrack computer vision application is an Intel RMS workload which tracks the 3D pose of a marker-less human body with multiple cameras through an image sequence[13, 6]. bodytrack employs an annealed particle filter to track the pose, using edges and the foreground silhouette as image features, based on a 10-segment 3D kinematic tree body model. These two image features were chosen because they exhibit a high degree of invariance under a wide range of conditions and because they are easy to extract. An annealed particle filter was employed in order to be able to search high-dimensional configuration spaces without having to rely on any assumptions about the tracked body such as the existence of markers or constrained movements. This benchmark was included due to the increasing significance of computer vision algorithms in areas such as video surveillance, character animation and computer interfaces.

For every frame set Zt of the input videos at time step t, the bodytrack benchmark executes the following steps:

1. The image features of observation Zt are extracted. The features will be used to compute the likelihood of a given pose in the annealed particle filter.

2. Every time step t the filter makes an annealing run through all M annealing layers, starting with layer m = M.

3. Each layer m uses a set of N unweighted particles which are the result of the previous filter update step to begin with:

    S_{t,m} = {s_{t,m}^(1), ..., s_{t,m}^(N)}.

Each particle s_{t,m}^(i) is an instance of the multi-variate model configuration X which encodes the location and state of the tracked body.

4. Each particle s_{t,m}^(i) is then assigned a weight π_{t,m}^(i) by using the weighting function ω(Zt, X) corresponding to the likelihood of X given the image features in Zt, scaled by an annealing level factor:

    π_{t,m}^(i) ∝ ω(Zt, s_{t,m}^(i)).

The weights are normalized so that Σ_{i=1..N} π_{t,m}^(i) = 1. The result is the weighted particle set

    S^π_{t,m} = {(s_{t,m}^(1), π_{t,m}^(1)), ..., (s_{t,m}^(N), π_{t,m}^(N))}.

5. N particles are randomly drawn from set S^π_{t,m} with a probability equal to their weight π_{t,m}^(i) to obtain the temporary weighted particle set
    S̄^π_{t,m} = {(s̄_{t,m}^(1), π_{t,m}^(1)), ..., (s̄_{t,m}^(N), π_{t,m}^(N))}.

Each particle s̄_{t,m}^(i) is then used to produce the particle

    s_{t,m−1}^(i) = s̄_{t,m}^(i) + B_m

where B_m is a multi-variate Gaussian random variable. The result is particle set S_{t,m−1} which is used to initialize layer m − 1.

6. The process is repeated until all layers have been processed and the final particle set S^π_{t,0} has been computed.

7. S^π_{t,0} is used to compute the estimated model configuration χ_t for time step t by calculating the weighted average of all configuration instances:

    χ_t = Σ_{i=1..N} s_{t,0}^(i) π_{t,0}^(i).

8. The set S_{t+1,M} is then produced from S^π_{t,0} using

    s_{t+1,M}^(i) = s_{t,0}^(i) + B_0.

In the subsequent time step t+1 the set S_{t+1,M} is used to initialize layer M.

The likelihood ω(Zt, s_{t,m}^(i)) which is used to determine the particle weights π_{t,m}^(i) is computed by projecting the geometry of the human body model into the image observations Zt for each camera and determining the error based on the image features. The likelihood is a measure of the 3D body model alignment with the foreground and edges in the images. The body model consists of conic cylinders to represent 10 body parts: 2 for each limb plus the torso and the head. Each cylinder is represented by a length and a radius for each end. The body parts are assembled into a kinematic tree based upon the joint angles. Each particle represents the set of joint angles plus a global translation. To evaluate the likelihood of a given particle, the geometry of the body model is first built in 3D space given the angles and translation. Next, each 3D body part is projected onto each of the 2D images as a quadrilateral. A likelihood value is then computed based on the two image features, the foreground map and the edge distance map. To compute the foreground term, samples are taken within the interior of each 2D body part projection and compared with the binary foreground map images. Samples that correspond to foreground increase the likelihood while samples that correspond to background are penalized. The edge map gives a measure of the distance from an edge in the image; values closer to an edge have a higher value. To compute the edge term, samples are taken along the axis-parallel edges of each 2D body part projection and the edge map values at each sample are summed together. In this way, samples that are closer to edges in the images increase the likelihood while samples farther from edges are penalized.

bodytrack has a persistent thread pool which is implemented in class WorkPoolPthread. The main thread executes the program and sends a task to the thread pool with method SignalCmd whenever it reaches a parallel kernel. It resumes execution of the program as soon as it receives the result from the worker threads. Possible tasks are encoded by enumeration threadCommands in class WorkPoolPthread. The program has three parallel kernels:

Edge detection (Step 1)  bodytrack employs a gradient-based edge detection mask to find edges. The result is compared against a threshold to eliminate spurious edges. Edge detection is implemented in function GradientMagThreshold. The output of this kernel will be further refined before it is used to compute the particle weights.

Edge smoothing (Step 1)  A separable Gaussian filter of size 7×7 pixels is used to smooth the edges in function GaussianBlur. The result is remapped between 0 and 1 to produce a pixel map in which the value of each pixel is related to its distance from an edge. The kernel has two parallel phases, one to filter image rows and one to filter image columns.

Calculate particle weights (Step 4)  This kernel evaluates the foreground silhouette and the image edges produced earlier to compute the weights for the particles. This kernel is executed once for every annealing layer during every time step, making it the computationally most intensive part of the body tracker.

The parallel kernels use tickets to distribute the work among threads and balance the load dynamically. The ticketing mechanism is implemented in class TicketDispenser and behaves like a shared counter.

The input sets for bodytrack are defined as follows:

• test: 4 cameras, 1 frame, 5 particles, 1 annealing layer
• simdev: 4 cameras, 1 frame, 100 particles, 3 annealing layers
• simsmall: 4 cameras, 1 frame, 1,000 particles, 5 annealing layers
• simmedium: 4 cameras, 2 frames, 2,000 particles, 5 annealing layers
• simlarge: 4 cameras, 4 frames, 4,000 particles, 5 annealing layers
• native: 4 cameras, 261 frames, 4,000 particles, 5 annealing layers

3.2.3 canneal
This kernel was developed by Princeton University. It uses cache-aware simulated annealing (SA) to minimize the routing cost of a chip design[7]. SA is a common method to approximate the global optimum in a large search space.
canneal pseudo-randomly picks pairs of netlist elements and tries to swap them. To increase data reuse, the algorithm discards only one element during each iteration, which effectively reduces cache capacity misses. The SA method accepts swaps which increase the routing cost with a certain probability in order to make an escape from local optima possible. This probability continuously decreases during runtime to allow the design to converge. The program was included in the PARSEC program selection to represent engineering workloads, for the fine-grained parallelism of its lock-free synchronization techniques, and due to its pseudo-random worst-case memory access pattern.

canneal uses a very aggressive synchronization strategy that is based on data race recovery instead of avoidance. Pointers to the elements are dereferenced and swapped atomically, but no locks are held while a potential swap is evaluated. This can cause disadvantageous swaps if one of the relevant elements has been replaced by another thread during that time. This equals a higher effective probability to accept swaps which increase the routing cost, and the SA method automatically recovers from it. The swap operation employs lock-free synchronization which is implemented with atomic instructions. An alternative implementation which relied on conventional locks turned out to be too inefficient due to excessive locking overhead. The synchronization routines with the atomic instructions are taken from the BSD kernel. Support for most new architectures can be added easily by copying the correct header file from the BSD kernel sources.

The annealing algorithm is implemented in the Run function of the annealer_thread class. Each thread uses the function get_random_element to pseudo-randomly pick one new netlist element per iteration with a Mersenne Twister[31]. calculate_delta_routing_cost is called to compute the change of the total routing cost if the two elements are swapped. accept_move evaluates the change in cost and the current temperature and decides whether the change is to be committed. Finally, accepted swaps are executed by calling swap_locations.

canneal implements an AtomicPtr class which encapsulates a shared pointer to the location of a netlist element. The pointer is atomically accessed and modified with the Get and Set functions offered by the class. A special Swap member function executes an atomic swap of two encapsulated pointers. If an access is currently in progress the functions spin until the operation has completed. The implementation of Swap imposes a partial order to avoid deadlocks by processing the pointer at the lower memory location first.

We provide the following input sets for canneal:

• test: 5 swaps per temperature step, 100° start temperature, 10 netlist elements
• simdev: 100 swaps per temperature step, 300° start temperature, 100 netlist elements
• simsmall: 10,000 swaps per temperature step, 2,000° start temperature, 100,000 netlist elements
• simmedium: 15,000 swaps per temperature step, 2,000° start temperature, 200,000 netlist elements
• simlarge: 15,000 swaps per temperature step, 2,000° start temperature, 400,000 netlist elements
• native: 15,000 swaps per temperature step, 2,000° start temperature, 2,500,000 netlist elements

3.2.4 dedup
The dedup kernel was developed by Princeton University. It compresses a data stream with a combination of global compression and local compression in order to achieve high compression ratios. Such a compression is called 'deduplication'. The reason for the inclusion of this kernel is that deduplication has become a mainstream method to compress storage footprints for new-generation backup storage systems[36] and to compress communication data for new-generation bandwidth-optimized networking appliances[39].

The kernel uses a pipelined programming model to parallelize the compression and to mimic real-world implementations. There are five pipeline stages, the intermediate three of which are parallel. In the first stage, dedup reads the input stream and breaks it up into coarse-grained chunks to get independent work units for the threads. The second stage anchors each chunk into fine-grained small segments with rolling fingerprinting[29, 11]. The third pipeline stage computes a hash value for each data segment. The fourth stage compresses each data segment with the Ziv-Lempel algorithm and builds a global hash table that maps hash values to data. The final stage assembles the deduplicated output stream consisting of hash values and compressed data segments.

Anchoring is a method which identifies brief sequences in a data stream that are identical with sufficiently high probability. It uses fast Rabin-Karp fingerprints[24] to detect identity. The data is then broken up into two separate blocks at the determined location. This method ensures that fragmenting a data stream is unlikely to obscure duplicate sequences, since duplicates are identified on a block basis.

dedup uses a separate thread pool for each parallel pipeline stage. Each thread pool should have at least a number of threads equal to the number of available cores to allow the system to work fully on any stage should the need arise. The operating system scheduler is responsible for a thread schedule which will maximize the overall throughput of the pipeline. In order to avoid lock contention, the number of queues is scaled with the number of threads, with a small group of threads sharing an input and output queue at a time.

dedup employs the following five kernels, one for each pipeline stage:

Coarse-grained fragmentation  This serial kernel takes the input stream and breaks it up into work units which can be processed independently from each other by the parallel pipeline stages of dedup. It is implemented in function DataProcess. First, the kernel reads the input file from disk. It then determines the locations where the data is to be split up by jumping a fixed length in the buffer for each chunk.
The resulting data blocks are enqueued in order to be further refined by the subsequent stage.

Fine-grained fragmentation  This parallel kernel uses Rabin-Karp fingerprints to break a coarse-grained data chunk up into fine-grained fragments. It scans each input block starting from the beginning. An anchor is found if the lowest 12 bits of the Rabin-Karp hash-sum are 0. The data is then split up at the location of the anchor. On average, this produces blocks of size 2^12 / 8 = 512 bytes. The fine-grained data blocks are sent to the subsequent pipeline stage to compute their checksums. This kernel is implemented in function FindAllAnchor.

Hash computation  To uniquely identify a fine-grained data block, this parallel kernel computes the SHA1 checksum of each chunk and checks for duplicate blocks with the use of a global database. It is implemented in function ChunkProcess. A hash table which is indexed with the SHA1 sum serves as the database. Each bucket of the hash table is associated with an independent lock in order to synchronize accesses. The large number of buckets and therefore locks makes the probability of lock contention very low in practice. Once the SHA1 sum of a data block is available, the kernel checks whether a corresponding entry already exists in the database. If no entry could be found, the data block is added to the hash table and sent to the compression stage. If an entry already exists, the block is classified as a duplicate. The compression stage is omitted and the block is sent directly to the pipeline stage which assembles the output stream.

Compression  This kernel compresses data blocks in parallel. It is implemented in function Compress. Once the compressed image of a data block is available, it is added to the database and the corresponding data block is sent to the next pipeline stage. Every data block is compressed only once because the previous stage does not send duplicates to the compression stage.

Assemble output stream  This serial kernel reorders the data blocks and produces a compressed output stream. It is implemented in the SendBlock function. The stages which fragment the input stream into fine-grained data blocks add sequence numbers to allow a reconstruction of the original order. Because data fragmentation occurs in two different pipeline stages, two levels of sequence numbers have to be considered, one for each granularity level. SendBlock uses a search tree for the first level and a heap for the second level. The search tree allows rapid searches for the heap corresponding to the current first-level sequence number. For second-level sequence numbers only the minimum has to be found and hence a heap is used. Once the next data block in the sequence becomes available, it is removed from the reordering structures. If it has not been written to the output stream yet, its compressed image is emitted. Otherwise it is a duplicate and only its SHA1 signature is written as a placeholder. The kernel uses the global hash table to keep track of the output status of each data block.

Each input for dedup is an archive which contains a selection of files. The archives have the following sizes:

• test: 10 KB
• simdev: 1.1 MB
• simsmall: 10 MB
• simmedium: 31 MB
• simlarge: 184 MB
• native: 672 MB

3.2.5 facesim
This Intel RMS application was originally developed by Stanford University. It takes a model of a human face and a time sequence of muscle activations and computes a visually realistic animation of the modeled face by simulating the underlying physics[38, 40]. The goal is to create a visually realistic result. Certain effects such as inertial movements would have only a small visible effect and are not simulated[20]. The workload was included in the benchmark suite because an increasing number of computer games and other forms of animation employ physical simulation to create more realistic virtual environments. Human faces in particular are observed with more attention from users than other details of a virtual world, making their realistic presentation a key element for animations.

The parallelization uses a static partitioning of the mesh. Data that spans nodes belonging to more than one partition is replicated. Every time step the partitions process all elements that contain at least one node owned by the partition, but only results for nodes which are owned by the partition are written.

The iteration which computes the state of the face mesh at the end of each time step is implemented in function Advance_One_Time_Step_Quasistatic. facesim employs the fork-join model to process computationally intensive tasks in parallel. It uses the following three parallel kernels for its computations:

Update state  This kernel uses the Newton-Raphson method to solve the nonlinear system of equations in order to find the steady state of the simulated mesh. This quasi-static scheme achieves speedups of one to two orders of magnitude over explicit schemes by ignoring inertial effects. It is not suitable for the simulation of less constrained phenomena such as ballistic motion, but it is sufficiently accurate to simulate effects such as flesh deformation, where the material is heavily influenced by contact, collision and self-collision and inertial effects only have a minor impact on the state. In each Newton-Raphson iteration, the kernel reduces the nonlinear system of equations to a linear system which is guaranteed to be positive definite and symmetric. These two properties allow the use of a fast conjugate gradient solver later on. One iteration step is computed by function Update_Position_Based_State. The matrix of the linear system is sparse and can hence
be stored in two one-dimensional arrays, dX_full and R_full. The matrix is the sum of the contributions of each tetrahedron of the face mesh.

Add forces This module computes the velocity-independent forces acting on the simulation mesh. After the matrix of the linear system with the position-independent state has been computed by the previous kernel, the right-hand side of that system has to be calculated. The kernel does this by iterating over all tetrahedra of the mesh, reading the positions of the vertices and computing the force contribution to each of the four nodes.

Conjugate gradient This kernel uses the conjugate gradient algorithm to solve the linear equation system assembled by the previous two modules. The two arrays dX_full and R_full which store the sparse matrix are sequentially accessed, and matrix-vector multiplication is employed to solve the system.

The input sets of facesim all use the same face mesh. Scaling down the resolution of the mesh to create more tractable input sizes is impractical: a reduction of the number of elements in the model would result in under-resolution of the muscle action and cause problems for collision detection[20]. Our input sets for facesim are defined as follows:

• test: Print out help message.
• simdev: 80,598 particles, 372,126 tetrahedra, 1 frame
• simsmall: Same as simdev
• simmedium: Same as simdev
• simlarge: Same as simdev
• native: Same as simdev, but with 100 frames

3.2.6 ferret
This application is based on the Ferret toolkit which is used for content-based similarity search of feature-rich data such as audio, images, video, 3D shapes and so on[27]. It was developed by Princeton University. The reason for its inclusion in the benchmark suite is that it represents emerging next-generation desktop and Internet search engines for non-text document data types. In the benchmark, we have configured the Ferret toolkit for image similarity search. Ferret is parallelized using the pipeline model with six stages. The first and the last stage are for input and output. The middle four stages are for query image segmentation, feature extraction, indexing of candidate sets with multi-probe Locality Sensitive Hashing (LSH)[28] and ranking. Each stage has its own thread pool, and the basic work unit of the pipeline is a query image.

Segmentation is the process of decomposing an image into separate areas which display different objects. The rationale behind this step is that in many cases only parts of an image are of interest, such as the foreground. Segmentation allows the subsequent stages to assign a higher weight to image parts which are considered relevant and seem to belong together. After segmentation, ferret extracts a feature vector from every segment. A feature vector is a multi-dimensional mathematical description of the segment contents. It encodes fundamental properties such as color, shape and area. Once the feature vectors are known, the indexing stage can query the image database to obtain a candidate set of images. The database is organized as a set of hash tables which are indexed with multi-probe LSH[28]. This method uses hash functions which map similar feature vectors to the same hash bucket with high probability. Because the number of hash buckets is very high, multi-probe LSH first derives a probing sequence which considers the success probabilities for finding a candidate image in a bucket. It then employs a step-wise approach which indexes buckets with a higher success probability first. After a candidate set of images has been obtained by the indexing stage, it is sent to the ranking stage, which computes a detailed similarity estimate and orders the images according to their calculated rank. The similarity estimate is derived by analyzing and weighing the pair-wise distances between the segments of the query image and the candidate images. The underlying metric employed is the Earth Mover's Distance (EMD)[37]. For two images X and Y, it is defined as

    EMD(X, Y) = min Σ_i Σ_j f_ij d(X_i, Y_j)

where X_i and Y_j denote segments of X and Y and f_ij is the extent to which X_i is matched to Y_j.

The first and the last pipeline stage of ferret are serial. The remaining four modules are parallel:

Image segmentation This kernel uses computer vision techniques to break an image up into non-overlapping segments. The pipeline stage is implemented in function t_seg, which calls image_segment for every image. This function uses statistical region merging (SRM)[33] to segment the image. This method organizes the pixels of an image in sets, starting with a fine-granular decomposition, and repeatedly merges them until the final segmentation has been reached.

Feature extraction This module computes a 14-dimensional feature vector for each image segment. The features extracted are the bounding box of the segment (5 dimensions) and its color moments (9 dimensions). A bounding box is the minimum axis-aligned rectangle which includes the segment. Color moments are a compact representation of the color distribution; they are conceptually similar to a histogram but use fewer dimensions. Segments are assigned a weight which is proportional to the square root of their size. This stage is implemented in function t_extract. It calls image_extract_helper to compute the feature vectors for every image.

Indexing The indexing stage queries the image database to obtain no more than twice the number of images which are allowed to appear in the final ranking. This stage is implemented in function t_vec. ferret manages image data in tables which have type cass_table_t. Tables can be queried with function cass_table_query.
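The bucket-hashing idea behind the indexing stage can be illustrated compactly. The sketch below is a toy random-hyperplane LSH index in Python, not ferret's implementation: the class and method names are our own, and real multi-probe LSH additionally derives a probing sequence over nearby buckets ordered by success probability, which this sketch omits by probing only the query's own bucket.

```python
import math
import random

class LSHIndex:
    """Toy locality-sensitive index over feature vectors (illustrative only).

    Random hyperplanes map similar vectors to the same bit signature with
    high probability; exact distances then rank the surviving candidates."""

    def __init__(self, dim, bits=8, seed=1):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
        self.buckets = {}

    def _signature(self, v):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0
                     for plane in self.planes)

    def add(self, key, v):
        self.buckets.setdefault(self._signature(v), []).append((key, v))

    def query(self, v, top_k=3):
        # Probe only the query's own bucket; multi-probe LSH would also visit
        # neighboring buckets in order of estimated success probability.
        candidates = self.buckets.get(self._signature(v), [])
        ranked = sorted(candidates, key=lambda kv: math.dist(kv[1], v))
        return [k for k, _ in ranked[:top_k]]
```

In ferret itself this lookup runs over cass_table_t tables through cass_table_query, with the LSH probing done in LSH_query.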
The indexing stage uses this function to access the database in order to generate the candidate set of type cass_result_t for the current query image. Indexing employs LSH for the probing, which is implemented in function LSH_query.

Ranking This module performs a detailed similarity computation. From the candidate set obtained by the indexing stage it chooses the final set of images which are most similar to the query image and ranks them. The ranking stage is implemented in function t_rank. It employs cass_table_query to analyze the candidate set and to compute the final ranking with EMD. The type of query that cass_table_query is to perform can be described with a structure of type cass_query_t.

The number of query images determines the amount of parallelism. The working set size is dominated by the size of the image database. The input sets for ferret are sized as follows:

• test: 1 image query, database with 1 image, find top 1 image
• simdev: 4 image queries, database with 100 images, find top 5 images
• simsmall: 16 image queries, database with 3,544 images, find top 10 images
• simmedium: 64 image queries, database with 13,787 images, find top 10 images
• simlarge: 256 image queries, database with 34,973 images, find top 10 images
• native: 3,500 image queries, database with 59,695 images, find top 50 images

3.2.7 fluidanimate
This Intel RMS application uses an extension of the Smoothed Particle Hydrodynamics (SPH) method to simulate an incompressible fluid for interactive animation purposes[32]. Its output can be visualized by detecting and rendering the surface of the fluid. The force density fields are derived directly from the Navier-Stokes equation. fluidanimate uses special-purpose kernels to increase stability and speed. fluidanimate was included in the PARSEC benchmark suite because of the increasing significance of physics simulations for computer games and other forms of real-time animation.

A simplified version of the Navier-Stokes equation for incompressible fluids[35] which formulates conservation of momentum is

    ρ (∂v/∂t + v · ∇v) = −∇p + ρg + μ∇²v

where v is a velocity field, ρ a density field, p a pressure field, g an external force density field and μ the viscosity of the fluid. The SPH method uses particles to model the state of the fluid at discrete locations and interpolates intermediate values with radially symmetric smoothing kernels. An advantage of this method is the automatic conservation of mass due to a constant number of particles, but it alone does not guarantee certain physical principles such as symmetry of forces, which have to be enforced separately. The SPH algorithm derives a scalar quantity A at location r by a weighted sum of all particles:

    A_S(r) = Σ_j m_j (A_j / ρ_j) W(r − r_j, h).

In the equation, j iterates over all particles, m_j is the mass of particle j, r_j its position, ρ_j the density at its location and A_j the respective field quantity. W(r − r_j, h) is the smoothing kernel to use for the interpolation, with core radius h. Smoothing kernels are employed in order to make the SPH method stable and accurate. Because each particle i represents a volume with constant mass m_i, the density ρ_i appears in the equation and has to be recomputed every time step. The density at a location r can be calculated by substituting A with ρ in the previous equation:

    ρ_S(r) = Σ_j m_j W(r − r_j, h).

Applying the SPH interpolation equation to the pressure term −∇p and the viscosity term μ∇²v of the Navier-Stokes equation yields the equations for the pressure and viscosity forces, but in order to solve the force symmetry problems of the SPH method, fluidanimate employs slightly modified formulas:

    f_i^pressure = − Σ_j m_j (p_i + p_j) / (2ρ_j) ∇W(r_i − r_j, h)

    f_i^viscosity = μ Σ_j m_j (v_i − v_j) / ρ_j ∇²W(r_i − r_j, h)

Stability, accuracy and speed of fluidanimate are highly dependent on its smoothing kernels. In all cases but the pressure and viscosity computations the program uses the following kernel:

    W_poly6(r, h) = (315 / (64πh⁹)) (h² − r²)³   for 0 ≤ r ≤ h, and 0 otherwise.

One feature of this kernel is that the distance r only appears squared; the computation of square roots is thus not necessary to evaluate it. For pressure computations, fluidanimate uses Desbrun's spiky kernel W_spiky[12], and W_viscosity for viscosity forces:
    W_spiky(r, h) = (15 / (πh⁶)) (h − r)³   for 0 ≤ r ≤ h, and 0 otherwise

    W_viscosity(r, h) = (15 / (2πh³)) (−r³/(2h³) + r²/h² + h/(2r) − 1)   for 0 ≤ r ≤ h, and 0 otherwise

The scene geometry employed by fluidanimate is a box in which the fluid resides. All collisions are handled by adding forces in order to change the direction of movement of the involved particles instead of modifying the velocity directly. The workload uses Verlet integration[42] to update the positions of the particles. This scheme does not store the velocity of the particles explicitly, but their previous location in addition to the current position. The current velocity can thus be deduced from the distance traveled since the last time step. The force and mass are then used to compute the acceleration and subsequently the new velocity. This scheme is more robust because the velocity is given implicitly.

Every time step, fluidanimate executes five kernels, the first two of which were further broken up into several smaller steps:

Rebuild spatial index Because the smoothing kernels W(r − r_j, h) have finite support h, particles can only interact with each other up to the distance h. The program uses a spatial indexing structure in order to exploit proximity information and limit the number of particles which have to be evaluated. Functions ClearParticles and RebuildGrid build this acceleration structure, which is used by the subsequent steps.

Compute densities This kernel estimates the fluid density at the position of each particle by analyzing how closely particles are packed in its neighborhood. In a region in which particles are packed together more closely, the density will be higher. This kernel has 3 phases which are implemented in the functions InitDensitiesAndForces, ComputeDensities and ComputeDensities2.

Compute forces Once the densities are known, they can be used to compute the forces. This step happens in function ComputeForces. The kernel evaluates pressure, viscosity and also gravity as the only external influence. Collisions between particles are handled implicitly during this step, too.

Handle collisions with scene geometry The next kernel updates the forces in order to handle collisions of particles with the scene geometry. This step is implemented in function ProcessCollisions.

Update positions of particles Finally, the forces can be used to calculate the acceleration of each particle and update its position. fluidanimate uses a Verlet integrator[42] for these computations, which is implemented in function AdvanceParticles.

The input sets for fluidanimate are sized as follows:

• test: 5,000 particles, 1 frame
• simdev: 15,000 particles, 3 frames
• simsmall: 35,000 particles, 5 frames
• simmedium: 100,000 particles, 5 frames
• simlarge: 300,000 particles, 5 frames
• native: 500,000 particles, 500 frames

3.2.8 freqmine
The freqmine application employs an array-based version of the FP-growth (Frequent Pattern-growth) method[15] for Frequent Itemset Mining (FIMI). It is an Intel RMS benchmark which was originally developed by Concordia University. FIMI is the basis of Association Rule Mining (ARM), a very common data mining problem which is relevant for areas such as protein sequences, market data or log analysis. The serial program this benchmark is based on won the FIMI'03 best implementation award for its efficiency. freqmine was included in the PARSEC benchmark suite because of the increasing demand for data mining techniques, which is driven by the rapid growth of the volume of stored information.

FP-growth stores all relevant frequency information of the transaction database in a compact data structure called an FP-tree (Frequent Pattern-tree)[16]. An FP-tree is composed of three parts: First, a prefix tree encodes the transaction data such that each branch represents a frequent itemset. The nodes along the branches are stored in decreasing order of frequency of the corresponding item. The prefix tree is a more compact representation of the transaction database because overlapping itemsets share prefixes of the corresponding branches. The second component of the FP-tree is a header table which stores the number of occurrences of each item in decreasing order of frequency. Each entry is also associated with a pointer to a node of the FP-tree. All nodes which are associated with the same item are linked to a list. The list can be traversed by looking up the corresponding item in the header table and following the links to the end. Each node furthermore contains a counter that encodes how often the represented itemset, as seen from the root to the current node, occurs in the transaction database. The third component of the FP-tree is a lookup table which stores the frequencies of all 2-itemsets. A row in the lookup table gives all occurrences of items in itemsets which end with the associated item. This information can be used during the mining phase to omit certain FP-tree scans and is the major improvement of the implemented algorithm. The lookup table is especially effective if the dataset is sparse, which is usually the case. The FP-trees are then very big due to the fact that only few prefixes are shared. In that case tree traversals are more expensive, and the benefit from being able to omit them is greater. The initial FP-tree can be constructed with only two scans of the original database, the first one to construct the header table and the second one to compute the remaining parts of the FP-tree.

In order to mine the data for frequent itemsets, the FP-growth method traverses the FP-tree data structure and recursively constructs new FP-trees until the complete set of frequent itemsets is generated. To construct a new FP-tree
T_X∪{i} for an item i in the header of an existing FP-tree T_X, the algorithm first obtains a new pattern base from the lookup table. The base is used to initialize the header of the new tree T_X∪{i}. Starting from item i in the header table of the existing FP-tree T_X, the algorithm then traverses the associated linked list of all item occurrences. The patterns associated with the visited branches are then inserted into the new FP-tree T_X∪{i}. The resulting FP-tree is less bushy because it was constructed from fewer itemsets. The recursion terminates when an FP-tree has been built which has only one path. The properties of the algorithm guarantee that this is a frequent itemset.

freqmine has been parallelized with OpenMP. It employs three parallel kernels:

Build FP-tree header This kernel scans the transaction database and counts the number of occurrences of each item. It performs the first of the two database scans necessary to construct the FP-tree. The result of this operation is the header table for the FP-tree, which contains the item frequency information. This kernel has one parallelized loop and is implemented in function scan1_DB.

Construct prefix tree The next kernel builds the initial tree structure of the FP-tree. It performs the second and final scan of the transaction database necessary to build the data structures which will be used for the actual mining operation. The kernel has four parallelized loops. It is implemented in function scan2_DB, which contains two of them. The remaining two loops are in its helper function database_tiling.

Mine data The last kernel uses the data structures previously computed and mines them to recursively obtain the frequent itemset information. It is an improved version of the conventional FP-growth method[16]. This module has similarities with the previous two kernels which construct the initial FP-tree because it builds a new FP-tree for every recursion step. The module is implemented in function FP_growth_first. It first derives the initial lookup table from the current FP-tree by calling first_transform_FPTree_into_FPArray. This function executes the first of two parallelized loops. After that the second parallelized loop is executed, in which the recursive function FP_growth is called. It is the equivalent of FP_growth_first. Each thread calls FP_growth independently so that a number of recursions up to the number of threads can be active.

The input sets for freqmine are defined as follows:

• test: Database with 3 synthetic transactions, minimum support 1.
• simdev: Database with 1,000 synthetic transactions, minimum support 3.
• simsmall: Database with 250,000 anonymized click streams from a Hungarian online news portal, minimum support 220.
• simmedium: Same as simsmall but with 500,000 click streams, minimum support 410.
• simlarge: Same as simsmall but with 990,000 click streams, minimum support 790.
• native: Database composed of a spidered collection of 250,000 web HTML documents[26], minimum support 11,000.

3.2.9 streamcluster
This RMS kernel was developed by Princeton University and solves the online clustering problem[34]: For a stream of input points, it finds a predetermined number of medians so that each point is assigned to its nearest center. The quality of the clustering is measured by the sum of squared distances (SSQ) metric. Stream clustering is a common operation where large amounts of continuously produced data have to be organized under real-time conditions, for example in network intrusion detection, pattern recognition and data mining. The program spends most of its time evaluating the gain of opening a new center. This operation uses a parallelization scheme which employs static partitioning of data points. The program is memory bound for low-dimensional data and becomes increasingly computationally intensive as the dimensionality increases. Due to its online character, the working set size of the algorithm can be chosen independently from the input data. streamcluster was included in the PARSEC benchmark suite because of the importance of data mining algorithms and the prevalence of problems with streaming characteristics.

The parallel gain computation is implemented in function pgain. Given a preliminary solution, the function computes how much cost can be saved by opening a new center. For every new point, it weighs the cost of making it a new center and reassigning some of the existing points to it against the savings caused by minimizing the distance

    d(x, y) = |x − y|²

between two points x and y for all points. The distance computation is implemented in function dist. If the heuristic determines that the change would be advantageous, the results are committed.

The amount of parallelism and the working set size of a problem are dominated by the block size. The input sets of streamcluster are defined as follows:

• test: 10 input points, block size 10 points, 1 point dimension, 2–5 centers, up to 5 intermediate centers allowed
• simdev: 16 input points, block size 16 points, 3 point dimensions, 3–10 centers, up to 10 intermediate centers allowed
• simsmall: 4,096 input points, block size 4,096 points, 32 point dimensions, 10–20 centers, up to 1,000 intermediate centers allowed
• simmedium: 8,192 input points, block size 8,192 points, 64 point dimensions, 10–20 centers, up to 1,000 intermediate centers allowed
• simlarge: 16,384 input points, block size 16,384 points, 128 point dimensions, 10–20 centers, up to 1,000 intermediate centers allowed
• native: 1,000,000 input points, block size 200,000 points, 128 point dimensions, 10–20 centers, up to 5,000 intermediate centers allowed

3.2.10 swaptions
The swaptions application is an Intel RMS workload which uses the Heath-Jarrow-Morton (HJM) framework to price a portfolio of swaptions. The HJM framework describes how interest rates evolve for risk management and asset liability management[17] for a class of models. Its central insight is that there is an explicit relationship between the drift and volatility parameters of the forward-rate dynamics in a no-arbitrage market. Because HJM models are non-Markovian, the analytic approach of solving the PDE to price a derivative cannot be used. swaptions therefore employs Monte Carlo (MC) simulation to compute the prices. The workload was included in the benchmark suite because of the significance of PDEs and the wide use of Monte Carlo simulation.

The program stores the portfolio in the swaptions array. Each entry corresponds to one derivative. swaptions partitions the array into a number of blocks equal to the number of threads and assigns one block to every thread. Each thread iterates through all swaptions in the work unit it was assigned and calls the function HJM_Swaption_Blocking for every entry in order to compute the price. This function invokes HJM_SimPath_Forward_Blocking to generate a random HJM path for each MC run. Based on the generated path the value of the swaption is computed.

The following input sets are provided for swaptions:

• test: 1 swaption, 5 simulations
• simdev: 3 swaptions, 50 simulations
• simsmall: 16 swaptions, 5,000 simulations
• simmedium: 32 swaptions, 10,000 simulations
• simlarge: 64 swaptions, 20,000 simulations
• native: 128 swaptions, 1,000,000 simulations

3.2.11 vips
This application is based on the VASARI Image Processing System (VIPS)[30], which was originally developed through several projects funded by European Union (EU) grants. The benchmark version is derived from a print on demand service that is offered at the National Gallery of London, which is also the current maintainer of the system. The benchmark includes fundamental image operations such as an affine transformation and a convolution. It was chosen because image transformations are a common task on desktop computers and for the ability of the VASARI system to construct multi-threaded image processing pipelines transparently on the fly. Future libraries might use concepts such as the ones employed by VIPS to make multi-threaded functionality available to the user.

The image transformation pipeline of the vips benchmark has 18 stages. It is implemented in the VIPS operation im_benchmark. The stages can be grouped into the following kernels:

Crop The first step of the pipeline is to remove 100 pixels from all edges with VIPS operation im_extract_area.

Shrink Next, vips shrinks the image by 10%. This affine transformation is implemented as the matrix operation

    f(x) = [ 0.9  0.0 ] x + [ 0 ]
           [ 0.0  0.9 ]     [ 0 ]

in VIPS operation im_affine. The transformation uses bilinear interpolation to compute the output values.

Adjust white point and shadows To improve the perceived visual quality of the image under the expected target conditions, vips brightens the image, adjusts the white point and pulls the shadows down. These operations require several linear transformations and a matrix multiplication, which are implemented in im_lintra, im_lintra_vec and im_recomb.

Sharpen The last step slightly exaggerates the edges of the output image in order to compensate for the blurring caused by printing and to give the image a better overall appearance. This convolution employs a Gaussian blur filter with mask radius 11 and a subtraction in order to isolate the high-frequency signal component of the image. The intermediate result is transformed via a look-up table shaped as

    f(x) = 0.5x          for |x| ≤ 2.5
    f(x) = 1.5x + 2.5    for x < −2.5
    f(x) = 1.5x − 2.5    for x > 2.5

and added back to the original image to obtain the sharpened image. Sharpening is implemented in VIPS operation im_sharpen.

The VASARI Image Processing System fuses all image operations to construct an image transformation pipeline that can operate on subsets of an image. VIPS can automatically replicate the image transformation pipeline in order to process multiple image regions concurrently. This happens transparently for the user of the library. Actual image processing and any I/O is deferred as long as possible. Intermediate results are represented in an abstract way by partial image descriptors. Each VIPS operation can specify a demand hint which is evaluated to determine the work unit size of the combined pipeline. VIPS uses memory-mapped I/O to load parts of an input image on demand. After the requested part of a file has been loaded, all image operations
are applied to the image region before the output region is written back to disk.

A VIPS operation is composed of the main function, which provides the public interface employed by the users, the generate function, which implements the actual image operation, as well as a start and a stop function. The main functions register the operation with the VIPS evaluation system. Start functions are called by the runtime system to perform any per-thread initialization. They produce a sequence value which is passed to all generate functions and the stop function. Stop functions handle the shutdown at the end of the evaluation phase and destroy the sequence value. The VIPS system guarantees the mutually exclusive execution of start and stop functions, which can thus be used to communicate between threads during the pipeline initialization or shutdown phase. The generate functions transform the image and correspond to the pipeline stages.

The sizes of the images used for the input sets for vips are:

• test: 256 × 288 pixels
• simdev: 256 × 288 pixels
• simsmall: 1,600 × 1,200 pixels
• simmedium: 2,336 × 2,336 pixels
• simlarge: 2,662 × 5,500 pixels
• native: 18,000 × 18,000 pixels

3.2.12 x264
The x264 application is an H.264/AVC (Advanced Video Coding) video encoder. In the 4th annual video codec comparison[41] it was ranked 2nd best codec for its high encoding quality. It is based on the ITU-T H.264 standard, which was completed in May 2003 and which is now also part of ISO/IEC MPEG-4. In that context the standard is also known as MPEG-4 Part 10. H.264 describes the lossy compression of a video stream[43]. It improves over previous video encoding standards with new features such as increased sample bit depth precision, higher-resolution color information, variable block-size motion compensation (VBSMC) or context-adaptive binary arithmetic coding (CABAC). These advancements allow H.264 encoders to achieve a higher output quality with a lower bit-rate at the expense of a significantly increased encoding and decoding time. The flexibility of H.264 allows its use in a wide range of contexts with different requirements, from video conferencing solutions to high-definition (HD) movie distribution. Next-generation HD DVD or Blu-ray video players already require H.264/AVC encoding. The flexibility and wide range of application of the H.264 standard and its ubiquity in next-generation video systems are the reasons for the inclusion of x264 in the PARSEC benchmark suite.

H.264 encoders and decoders operate on macroblocks of pixels which have the fixed size of 16 × 16 pixels. Various techniques are used to detect and eliminate data redundancy. The most important one is motion compensation, which is employed to exploit temporal redundancy between successive frames. Motion compensation is usually the most expensive operation that has to be executed to encode a frame, and it has a very high impact on the final compression ratio. The compressed output frames can be encoded in one of three possible ways:

I-Frame An I-Frame includes the entire image and does not depend on other frames. All its macroblocks are encoded using intra prediction. In intra mode, a prediction block is formed using previously encoded blocks. This prediction block is subtracted from the current block prior to encoding.

P-Frame These frames include only the changed parts of an image from the previous I- or P-Frame. A P-Frame is encoded with intra prediction and inter prediction with at most one motion-compensated prediction signal per prediction block. The prediction model is formed by shifting samples from previously encoded frames to compensate for motion such as camera pans.

B-Frame B-Frames are constructed using data from the previous and next I- or P-Frame. They are encoded like a P-Frame but using inter prediction with two motion-compensated prediction signals. B-Frames can be compressed much more than the other frame types.

The enhanced inter and intra prediction techniques of H.264 are the main factors for its improved coding efficiency. The prediction schemes can operate on blocks of varying sizes and shapes which can be as small as 4 × 4 pixels.

The parallel algorithm of x264 uses the pipeline model with one stage per input video frame. This results in a virtual pipeline with as many stages as there are input frames. x264 processes a number of pipeline stages equal to the number of encoder threads in parallel, resulting in a sliding window which moves from the beginning of the pipeline to its end. For P- and B-Frames the encoder requires the image data and motion vectors from the relevant region of the reference frames in order to encode the current frame, and so each stage makes this information available as it is calculated during the encoding process. Fast upward movements can thus cause delays which can limit the achievable speedup of x264 in practice. In order to compensate for this effect, the parallelization model requires that x264 be executed with a number of threads greater than the number of cores to achieve maximum performance.

x264 calls function x264_encoder_encode to encode another frame. x264_encoder_encode uses function x264_slicetype_decide to determine the type as which the frame will be encoded and calls all necessary functions to produce the correct output. It also manages the threading functionality of x264. Threads use the functions x264_frame_cond_broadcast and x264_frame_cond_wait to inform each other of the encoding progress and to make sure that no data is accessed while it is not yet available.

The videos used for the input sets have been derived from the uncompressed version of the short film "Elephants Dream"[3]. The number of frames determines the amount of parallelism. The exact characteristics of the input sets are:
• test: 32 × 18 pixels, 1 frame

• simdev: 64 × 36 pixels, 3 frames

• simsmall: 640 × 360 pixels (1/3 HDTV resolution), 8 frames

• simmedium: 640 × 360 pixels (1/3 HDTV resolution), 32 frames

• simlarge: 640 × 360 pixels (1/3 HDTV resolution), 128 frames

• native: 1,920 × 1,080 pixels (HDTV resolution), 512 frames

4. METHODOLOGY

In this section we explain how we characterized the PARSEC benchmark suite. We are interested in the following characteristics:

Parallelization PARSEC benchmarks use different parallel models which have to be analyzed in order to know whether the programs can scale well enough for the analysis of CMPs of a certain size.

Working sets and locality Knowledge of the cache requirements of a workload is necessary to identify benchmarks suitable for the study of CMP memory hierarchies.

Communication-to-computation ratio and sharing The communication patterns of a program determine the potential impact of private caches and the on-chip network on performance.

Off-chip traffic The off-chip traffic requirements of a program are important to understand how off-chip bandwidth limitations of a CMP can affect performance.

In order to characterize all applications, we had to make several trade-off decisions. Given a limited amount of computational resources, higher accuracy comes at the expense of a lower number of experiments. We followed the approach of similar studies[44, 22] and chose faster but less accurate execution-driven simulation to characterize the PARSEC workloads. This approach is feasible because we limit ourselves to studying fundamental program properties which should have a high degree of independence from architectural details. Where possible we supply measurement results from real machines. This methodology made it possible to gather the large amount of data which we present in this study. We preferred machine models comparable to real processors over unrealistic models which might have been a better match for the program needs.

4.1 Experimental Setup

We used CMP$im[22] for our workload characterization. CMP$im is a plug-in for Pin[2] that simulates the cache hierarchy of a CMP. Pin is similar to the ATOM toolkit for Compaq's Tru64 Unix on Alpha processors. It uses dynamic binary instrumentation to insert routines at arbitrary points in the instruction stream. For the characterization we simulate a single-level cache hierarchy of a CMP and vary its parameters. The baseline cache configuration was a shared 4-way associative cache with 4 MB capacity and 64 byte lines. By default the workloads used 8 cores. All experiments were conducted on a set of Symmetric Multiprocessor (SMP) machines with x86 processors and Linux. The programs were compiled with gcc 4.2.1.

Because of the large computational cost we could not perform simulations with the native input set; instead we used the simlarge inputs for all simulations and analytically describe any differences between the two sets of which we know.

4.2 Methodological Limitations and Error Margins

For their characterization of the SPLASH-2 benchmark suite, Woo et al. fixed a timing model which they used for all experiments[44]. They give two reasons: First, nondeterministic programs would otherwise be difficult to compare because different execution paths could be taken, and second, the characteristics they study are largely independent of the architecture. They also state that they believe the timing model should have only a small impact on the results. While we use similar characteristics and share this belief, we think a characterization study of multi-threaded programs should nevertheless analyze the impact of nondeterminism on the reported data. Furthermore, because our methodology is based on execution on real machines combined with dynamic binary instrumentation, it can introduce additional latencies, and a potential concern is that the nondeterministic thread schedule is altered in a way that might affect our results in unpredictable ways. We therefore conducted a sensitivity analysis to quantify the impact of nondeterminism.

Alameldeen and Wood studied the variability of nondeterministic programs in more detail and showed that even small pseudo-random perturbations of memory latencies are effective at forcing alternate execution paths[5]. We adopted their approach and modified CMP$im to add extra delays to its analysis functions. Because running all experiments multiple times as Alameldeen and Wood did would be prohibitively expensive, we instead decided to randomly select a subset of all experiments for each metric which we use and report its error margins.

The measured quantities deviated by no more than ±0.04% from the average, with the following two exceptions. The first exception is the metrics of data sharing. In two cases (bodytrack and swaptions) the classification is noticeably affected by the nondeterminism of the program. This is partially caused by shared and thread-private data contending aggressively for a limited amount of cache capacity. The high frequency of evictions made it difficult to classify lines and accesses as shared or private. In these cases, the maximum deviation of the number of accesses from the average was as high as ±4.71%, and the amount of sharing deviated by as much as ±15.22%. We considered this uncertainty in our study and did not draw any conclusions where the variation of the measurements did not allow it. The second case of high variability is when the value of the measured quantity is very low (below 0.1% miss rate or corresponding ratio). In these cases the nondeterministic noise
made measurements difficult. We do not consider this a problem because in this study we focus on trends of ratios, and quantities that small do not have a noticeable impact. It is however an issue for the analysis of working sets if the miss rate falls below this threshold and continues to decrease slowly. Only few programs are affected, and our estimate of their working set sizes might be slightly off in these cases. This is primarily an issue inherent to experimental working set analysis, since it requires well-defined points of inflection for conclusive results. Moreover, we believe that in these cases the working set size varies nondeterministically, and researchers should expect slight variations for each benchmark run.

Figure 1: Upper bound for speedup of PARSEC workloads with input set simlarge based on instruction count. Limitations are caused by serial sections, growing parallelization overhead and redundant computations.

Figure 2: Parallelization overhead of PARSEC benchmarks. The chart shows the slowdown of the parallel version on 1 core over the serial version.

The implications of these results are twofold: First, they show that our methodology is not susceptible to the nondeterministic effects of multi-threaded programs in a way that might invalidate our findings. Second, they also confirm that the metrics which we present in this paper are fundamental program properties which cannot be distorted easily. The reported application characteristics are likely to be preserved on a large range of architectures.

5. PARALLELIZATION

In this section we discuss the parallelization of the PARSEC suite. As we will see in Section 6, several PARSEC benchmarks (canneal, dedup, ferret and freqmine) have working sets so large that they should be considered unbounded for an analysis. In practice these working sets are limited only by the amount of main memory, and they are actively used for inter-thread communication. The inability to use caches efficiently is a fundamental property of these programs and affects their concurrent behavior. Furthermore, dedup and ferret use a complex, heterogeneous parallelization model in which specialized threads execute different functions with different characteristics at the same time. These programs employ a pipeline with dedicated thread pools for each parallelized pipeline stage. Each thread pool has enough threads to occupy the whole CMP, and it is the responsibility of the scheduler to assign cores to threads in a manner that maximizes the overall throughput of the pipeline. Over time, the number of threads active for each stage will converge to the inverse throughput ratios of the individual pipeline stages relative to each other.

Woo et al. use an abstract machine model with a uniform instruction latency of one cycle to measure the speedups of the SPLASH-2 programs[44]. They justify their approach by pointing out that the impact of the timing model on the characteristics which they measure - including speedup - is likely to be low. Unfortunately, this is not true in general for PARSEC workloads. While we have verified in Section 4.2 that fundamental program properties such as miss rate and instruction count are largely not susceptible to timing shocks, the synchronization and timing behavior of the programs is. Using a timing model with perfect caches significantly alters the behavior of programs with unbounded working sets, for example how long locks to large, shared data structures are held. Moreover, any change of the timing model has a strong impact on the number of active threads of programs which employ thread specialization. It will thus affect the load balance and synchronization behavior of these workloads. We believe it is not possible to discuss the timing behavior of these programs without also considering, for example, different schedulers, which is beyond the scope of this paper. Similar dependencies of commercial workloads on their environment are already known[9, 4].

Unlike Woo et al., who measured actual concurrency on an abstract machine, we therefore decided to analyze inherent concurrency and its limitations. Our approach is based on the number of executed instructions in parallel and serial regions of the code. We neglect any delays due to blocking on contended locks and load imbalance. This methodology is feasible because we do not study performance; our interest is in fundamental program characteristics. The presented data is largely timing-independent and a suitable measure of the concurrency inherent in a workload. The results in
Figure 1 show the maximum achievable speedup measured that way. The numbers account for limitations such as unparallelized code sections, synchronization overhead and redundant computations. PARSEC workloads can achieve actual speedups close to the presented numbers. We verified on a large range of architectures that lock contention and other timing-dependent factors are not limiting factors, but we know of no way to show this in a platform-independent way given the complications outlined above. The maximum speedup of bodytrack, x264 and streamcluster is limited by serial sections of the code. fluidanimate is primarily limited by growing parallelization overhead. On real machines, x264 is furthermore bound by a data dependency between threads; however, this has a noticeable impact only on machines larger than the ones described here. It is recommended to run x264 with more threads than cores, since modeling and exposing these dependencies to the scheduler is a fundamental aspect of its parallel algorithm, comparable to the parallel algorithms of dedup and ferret. Figure 2 shows the slowdown of the parallel version on 1 core over the serial version. The numbers show that all workloads use efficient parallel algorithms which are not substantially slower than the corresponding serial algorithms.

PARSEC programs scale well enough to study CMPs. We believe they are also useful on machines larger than the ones analyzed here. The PARSEC suite exhibits a wider variety of parallelization models than previous benchmark suites, such as the pipeline model. Some of its workloads can adapt to different timing models and can use threads to hide latencies. It is important to analyze these programs in the context of the whole system.

6. WORKING SETS AND LOCALITY

The temporal locality of a program can be estimated by analyzing how the miss rate of a processor's cache changes as its capacity is varied. Often the miss rate does not decrease continuously as the size of a cache is increased, but stays on a certain level and then makes a sudden jump to a lower level when the capacity becomes large enough to hold the next important data structure. For CMPs an efficient functioning of the last cache level on the chip is crucial because a miss in the last level will require an access to off-chip memory.

To analyze the working sets of the PARSEC workloads we studied a cache shared by all processors. Our results are presented in Figure 3. In Table 3 we summarize the important characteristics of the identified working sets. Most workloads exhibit well-defined working sets with clearly identifiable points of inflection. Compared to SPLASH-2, PARSEC working sets are significantly larger and can reach hundreds of megabytes, as in the cases of canneal and freqmine. Two types of workloads can be distinguished: The first group contains benchmarks such as bodytrack and swaptions which have working sets no larger than 16 MB. These workloads have a limited need for caches with a bigger capacity, and the latest generation of CMPs often already has caches sufficiently large to accommodate most of their working sets. The second group of workloads is composed of the benchmarks canneal, ferret, facesim, fluidanimate and freqmine. These programs have very large working sets of 65 MB and more, and even with a relatively constrained input set such as simlarge, their working sets can reach hundreds of megabytes. Moreover, the need of these workloads for cache capacity is nearly insatiable and grows with the amount of data which they process. In Table 3 we give our estimates for the largest working set of each PARSEC workload for the native input set. In several cases they are significantly larger and can even reach gigabytes. These large working sets are often the consequence of an algorithm that operates on large amounts of collected input data. ferret, for example, keeps a database of feature vectors of images in memory to find the images most similar to a given query image. The cache and memory needs of these applications should be considered unbounded, as they become more useful to their users if they can work with increased amounts of data. Programs with unbounded working sets are canneal, dedup, ferret and freqmine.

In Figure 4 we present our analysis of the spatial locality of the PARSEC workloads. The data shows how the miss rate of a shared cache changes with line size. All programs benefit from larger cache lines, but to different extents. facesim, fluidanimate and streamcluster show the greatest improvement as the line size is increased, up to the maximum value of 256 bytes which we used. These programs have streaming behavior, and an increased line size has a prefetching effect which these workloads can take advantage of. facesim, for example, spends most of its time updating the position-based state of the model, for which it employs an iterative Newton-Raphson algorithm. The algorithm iterates over the elements of a sparse matrix which is stored in two one-dimensional arrays, resulting in streaming behavior. All other programs also show good improvement of the miss rate with larger cache lines, but only up to line sizes of about 128 bytes; the miss rate is not substantially reduced with larger lines. This is due to the limited size of the basic data structures employed by these programs. They represent independent logical units, each of which is worked with intensely during a computational phase. For example, x264 operates on macroblocks of 8 × 8 pixels at a time, which limits the sizes of the used data structures. Processing a macroblock is computationally intensive and largely independent from other macroblocks. Consequently, the amount of spatial locality is bounded in these cases.

For the rest of our analysis we chose a cache capacity of 4 MB for all experiments. We could have used a matching cache size for each workload, but that would have made comparisons very difficult, and the use of very small or very large cache sizes is not realistic. Moreover, in the case of the workloads with an unbounded working set size, a working set which completely fits into a cache would be an artifact of the limited simulation input size and would not reflect realistic program behavior.

7. COMMUNICATION-TO-COMPUTATION RATIO AND SHARING

In this section we discuss how PARSEC workloads use caches to communicate. Most PARSEC benchmarks share data intensely. Two degrees of sharing can be distinguished: Shared data can be read-only during the parallel phase, in which case it is only used for lookups and analysis. Input data is frequently used in such a way. But shared data can also be used for communication between threads, in which
Figure 3: Miss rates versus cache size. Data assumes a shared 4-way associative cache with 64 byte lines. WS1 and WS2 refer to important working sets which we analyze in more detail in Table 3. Cache requirements of PARSEC benchmark programs can reach hundreds of megabytes.
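The inflection-point analysis behind Figure 3 and Table 3 can be sketched in a few lines. The miss-rate curve below is synthetic and the drop threshold is an illustrative assumption, not the criterion used in the paper; the sketch only shows how a sudden jump in the curve identifies a working-set size.

```python
# Sketch: locating working-set sizes from a miss-rate-vs-capacity curve.
# The cache sizes and miss rates below are hypothetical, not measured
# PARSEC data.

def working_set_sizes(sizes_kb, miss_rates, min_drop=0.5):
    """Return cache sizes at which the miss rate falls sharply.

    A point of inflection is flagged when growing the cache by one step
    removes at least a fraction `min_drop` of the remaining misses.
    """
    ws = []
    for i in range(1, len(sizes_kb)):
        prev, cur = miss_rates[i - 1], miss_rates[i]
        if prev > 0 and (prev - cur) / prev >= min_drop:
            ws.append(sizes_kb[i])
    return ws

# Hypothetical curve: plateaus at 2% and 0.5%, with drops once the cache
# holds a first working set (~64 KB) and a second one (~16 MB).
sizes = [8, 16, 32, 64, 128, 256, 512, 1024, 4096, 16384, 65536]
rates = [2.0, 2.0, 2.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.05, 0.05]

print(working_set_sizes(sizes, rates))  # → [64, 16384]
```

A workload with an unbounded working set would show no second drop within realistic cache sizes, which is exactly the behavior described for canneal, dedup, ferret and freqmine.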
Program       | WS1 Data Structure(s) | WS1 Size | WS1 Growth | WS2 Data Structure(s) | WS2 Size (simlarge) | WS2 Growth | WS2 Size (native, estimate)
blackscholes  | options               | 64 KB    | C          | portfolio data        | 2 MB                | C          | same
bodytrack     | edge maps             | 512 KB   | const.     | input frames          | 8 MB                | const.     | same
canneal       | elements              | 64 KB    | C          | netlist               | 256 MB              | DS         | 2 GB
dedup         | data chunks           | 2 MB     | C          | hash table            | 256 MB              | DS         | 2 GB
facesim       | tetrahedra            | 256 KB   | C          | face mesh             | 256 MB              | DS         | same
ferret        | images                | 128 KB   | C          | data base             | 64 MB               | DS         | 128 MB
fluidanimate  | cells                 | 128 KB   | C          | particle data         | 64 MB               | DS         | 128 MB
freqmine      | transactions          | 256 KB   | C          | FP-tree               | 128 MB              | DS         | 1 GB
streamcluster | data points           | 64 KB    | C          | data block            | 16 MB               | user-def.  | 256 MB
swaptions     | swaptions             | 512 KB   | C          | same as WS1           | same                | same       | same
vips          | image data            | 64 KB    | C          | image data            | 16 MB               | C          | same
x264          | macroblocks           | 128 KB   | C          | reference frames      | 16 MB               | C          | same
Table 3: Important working sets and their growth rates. DS represents the data set size and C is the number of cores. Working set sizes are taken from Figure 3. Values for native input set are analytically derived estimates. Working sets that grow proportional to the number of cores C are aggregated private working sets and can be split up to fit into correspondingly smaller, private caches.
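The growth rates in Table 3 can be read as a simple scaling model. The sketch below is an illustrative assumption about how such an extrapolation might be done, not the paper's exact estimation procedure; only the baseline values are taken from Table 3.

```python
# Sketch: extrapolating working-set sizes from the growth-rate classes
# of Table 3. "C" scales with the number of cores, "DS" with the data-
# set size, "const." not at all. The scaling model is illustrative.

def scaled_ws(base_mb, growth, cores=8, base_cores=8, data_scale=1.0):
    """Scale a simlarge working-set size (in MB) to a larger run."""
    if growth == "C":     # aggregated private working sets
        return base_mb * cores / base_cores
    if growth == "DS":    # grows with the processed data set
        return base_mb * data_scale
    return base_mb        # "const."

# canneal's WS2 (netlist) is 256 MB with simlarge and grows with DS;
# an 8x larger data set pushes it to 2 GB, matching Table 3's estimate.
print(scaled_ws(256, "DS", data_scale=8))  # → 2048.0

# blackscholes' WS2 (portfolio data) grows with C: doubling the cores
# from 8 to 16 doubles the aggregated working set from 2 MB to 4 MB.
print(scaled_ws(2, "C", cores=16))  # → 4.0
```

The last column of Table 3 follows the same logic: "same" for const. and C entries at fixed core count, and a data-set-proportional estimate for DS entries.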
Figure 4: Miss rates as a function of line size. Data assumes 8 cores sharing a 4-way associative cache with 4 MB capacity. Miss rates are broken down to show the effect of loads and stores.
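The prefetching effect of larger lines on streaming workloads can be illustrated with a back-of-the-envelope cold-miss count for a sequential array sweep. The workload below is synthetic, not one of the PARSEC programs; it only shows why doubling the line size halves the miss rate of a pure stream, up to the point where data-structure size, not line size, becomes the limit.

```python
# Sketch: cold misses for a sequential sweep over an array of n_elems
# elements. Each miss fetches line_size // elem_size elements, so the
# miss rate of a pure stream is inversely proportional to the line size.

def stream_miss_rate(n_elems, elem_size, line_size):
    elems_per_line = line_size // elem_size
    misses = (n_elems + elems_per_line - 1) // elems_per_line  # ceil div
    return misses / n_elems

# 4-byte elements, as in a float array streamed by a solver:
for line in (64, 128, 256):
    print(line, stream_miss_rate(1_000_000, 4, line))
# → 64 0.0625, 128 0.03125, 256 0.015625
```

Workloads that process small independent units (such as macroblocks in x264) stop benefiting once a line spans a whole unit, which is why their curves in Figure 4 flatten beyond about 128 bytes while the streaming workloads keep improving to 256 bytes.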
case it is also modified during the parallel phase. In Figure 5 we show how the line size affects sharing. The data combines the effects of false sharing and the access pattern of the program due to constrained cache capacity. In Figure 6 we analyze how the program uses its data. The chart shows what data is accessed and how intensely it is used. The information is broken down in two orthogonal ways, resulting in four possible types of accesses: read and write accesses, and accesses to thread-private and shared data. Additionally, we give numbers for true shared accesses. An access is a true access if the last reference to that line came from another thread. True sharing does not count repeated accesses by the same thread. It is a useful metric to estimate the requirements for the cache coherence mechanism of a CMP: A true shared write can trigger a coherence invalidate or update, and a true shared read might require the replication of data. All programs exhibit very few true shared writes. Cache misses are not included in Figure 6; we analyze them separately when we discuss off-chip traffic in Section 8.

Figure 5: Portion of a 4-way associative cache with 4 MB capacity which is shared. The line size is varied from 8 to 256 bytes. Data assumes 8 cores and is broken down to show the number of threads sharing the lines.

Four programs (canneal, facesim, fluidanimate and streamcluster) showed only trivial amounts of sharing. They have therefore not been included in Figure 5. In the case of canneal, this is a result of the small cache capacity. Most of its large working set is shared and actively worked with by all threads. However, only a minuscule fraction of it fits into the cache, and the probability that a line is accessed by more than one thread before it gets replaced is very small in practice. With a 256 MB cache, 58% of its cached data is shared. blackscholes shows a substantial amount of sharing, but almost all its shared data is only accessed by two threads. This is a side-effect of the parallelization model: At the beginning of the program, the boss thread initializes the portfolio data before it spawns worker threads which process parts of it in a data-parallel way. As such, the entire portfolio is shared between the boss thread and its workers, but
the worker threads can process the options independently from each other and do not have to communicate with each other. ferret shows a modest amount of data sharing. Like the sharing behavior of canneal, this is caused by severely constrained cache capacity. ferret uses a database that is scanned by all threads to find entries similar to the query image. However, the size of the database is practically unbounded, and because threads do not coordinate their scans with each other, it is unlikely that a cache line gets accessed more than once. bodytrack and freqmine exhibit substantial amounts of sharing due to the fact that threads process the same data. The strong increase of sharing of freqmine is caused by false sharing, as the program uses an array-based tree as its main data structure. Larger cache lines will contain more nodes, increasing the chance that the line is accessed by multiple threads. vips has some shared data which is mostly used by only two threads. This is also predominantly an effect of false sharing, since image data is stored in a consecutive array which is processed in a data-parallel way by threads. x264 uses significant amounts of shared data, most of which is only accessed by a low number of threads. This data is the reference frames, since a thread needs this information from other stages in order to encode the frame it was assigned. Similarly, the large amount of shared data of dedup is the input which is passed from stage to stage.

Figure 6: Traffic from cache in bytes per instruction for 1 to 16 cores. Data assumes a shared 4-way associative cache with 64 byte lines. Results are broken down into accesses to private and shared data. True accesses do not count repeated accesses from the same thread.

Most PARSEC workloads use a significant amount of communication, and in many cases the volume of traffic between threads can be so high that efficient data exchange via a shared cache is severely constrained by its capacity. An example of this is x264. Figure 6 shows a large amount of writes to shared data, but contrary to intuition its share diminishes rapidly as the number of cores is increased. This effect is caused by a growth of the working sets of x264: Table 3 shows that both working sets WS1 and WS2 grow proportionally to the number of cores. WS1 is mostly composed of thread-private data and is the one which is used more intensely. WS2 contains the reference frames and is used for inter-thread communication. As WS1 grows, it starts to displace WS2, and the threads are forced to communicate via main memory. Two more programs which communicate intensely are dedup and ferret. Both programs use the pipeline parallelization model with dedicated thread pools for each parallel stage, and all data has to be passed from stage to stage. fluidanimate also shows a large amount of inter-thread communication, and its communication needs grow as the number of threads increases. This is caused by the spatial partitioning that fluidanimate uses to distribute the work to threads. Smaller partitions mean a worse surface-to-volume ratio, and communication grows with the surface.

Overall, most PARSEC workloads have complex sharing patterns and communicate actively. Pipelined programs can require a large amount of bandwidth between cores in order to communicate efficiently. Shared caches with insufficient capacity can limit the communication efficiency of workloads, since shared data structures might get displaced to memory.

8. OFF-CHIP TRAFFIC

In this section we analyze the off-chip bandwidth requirements of the PARSEC workloads. Our goal is to understand how the traffic of an application grows as the number of cores of a CMP increases and how the memory wall will limit performance. We again simulated a shared cache and analyze how traffic develops as the number of cores increases. Our results are presented in Figure 7.

The data shows that the off-chip bandwidth requirements of blackscholes are small enough that memory bandwidth is unlikely to be an issue. bodytrack, dedup, fluidanimate, freqmine, swaptions and x264 are more demanding. Moreover, these programs exhibit a growing bandwidth demand per instruction as the number of cores increases. In the case of bodytrack, most off-chip traffic happens in short, intense bursts, since the off-chip communication predominantly
Figure 7: Breakdown of off-chip traffic into bytes for loads, stores and write-backs per instructions. Results are shown for 1 to 16 cores. Data assumes a 4-way associative 4 MB cache with 64 byte lines, allocate-on-store and write-back policy. takes place during the edge map computation. This phase prepared for the demands of future applications. Many ar- is only a small part of the serial runtime, but on machines chitectural problems still have to be solved to create micro- with constrained memory bandwidth it quickly becomes the processors which are ready for the next generation of work- limiting factor for scalability. The last group of programs loads. Large working sets and high off-chip bandwidth de- is composed of canneal, facesim, ferret, streamcluster mands might limit the scalability of future programs. PAR- and vips. These programs have very high bandwidth re- SEC can be used to drive research efforts by application quirements and also large working sets. canneal shows a demands. decreasing demand for data per instruction with more cores. This behavior is caused by improved data sharing. 10. ACKNOWLEDGMENTS First and foremost we would like to acknowledge the many It is important to point out that these numbers do not authors of the PARSEC benchmark programs which are take the increasing instruction throughput of a CMP into too numerous to be listed here. The institutions who con- account as its number of cores grows. A constant traffic tributed the most number of programs are Intel and Prince- amount in Figure 7 means that the bandwidth requirements ton University. Stanford University allowed us to use their of an application which scales linearly will grow exponen- code and data for facesim. tially. 
Since many PARSEC workloads have high band- width requirements and working sets which exceed conven- We would like to acknowledge the contribution of the fol- tional caches by far, off-chip bandwidth will be their most lowing individuals: Justin Rattner, Pradeep Dubey, Tim severe limitation of performance. Substantial architectural Mattson, Jim Hurley, Bob Liang, Horst Haussecker, Yemin improvements are necessary to allow emerging workloads to Zhang, and Ron Fedkiw. They convinced skeptics and sup- take full advantage of larger CMPs. ported us so that a project the size of PARSEC could suc- ceed. 9. CONCLUSIONS The PARSEC benchmark suite is designed to provide par- 11. REFERENCES [1] MediaBench II. allel programs for the study for CMPs. It focuses on emerg- http://euler.slu.edu/~fritts/mediabench/. ing desktop and server applications and does not have the [2] Pin. http://rogue.colorado.edu/pin/. limitations of other benchmark suites. It is diverse enough to be considered representative, it is not skewed towards [3] Elephants Dream. http://www.elephantsdream.org/, 2006. HPC programs, it uses state-of-art algorithms and it sup- [4] A. Alameldeen, C. Mauer, M. Xu, P. Harper, M. Martin, ports research. In this study we characterized the PARSEC and D. Sorin. Evaluating Non-Deterministic workloads to provide the basic understanding necessary to Multi-Threaded Commercial Workloads. In Proceedings of the Computer Architecture Evaluation using Commercial allow other researchers the effective use of PARSEC for their Workloads, February 2002. studies. We analyzed the parallelization, the working sets [5] A. Alameldeen and D. Wood. Variability in Architectural and locality, the communication-to-computation ratio and Simulations of Multi-threaded Workloads. In Proceedings of the off-chip traffic of its workloads. 
The high cost of mi- the 9th International Symposium on High-Performance croarchitectural simulation forced us to use smaller machine Computer Architecture, February 2003. and problem sizes than we would really like to evaluate. [6] A. Balan, L. Sigal, and M. Black. A Quantitative Evaluation of Video-based 3D Person Tracking. In IEEE Our analysis shows that current CMPs are not sufficiently Workshop on VS-PETS, pages 349–356, 2005.
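To make the off-chip traffic argument concrete, the following sketch computes the aggregate bandwidth a linearly scaling workload would demand when its traffic per instruction stays constant, as in Figure 7. All parameter values (IPC, clock rate, bytes per instruction) are illustrative assumptions, not measurements from this study:

```python
# Sketch: aggregate off-chip bandwidth when bytes-per-instruction is
# constant but total instruction throughput scales with the core count.
# The default parameters are hypothetical, not PARSEC measurements.

def required_bandwidth_gbs(cores, ipc=1.0, ghz=2.0, bytes_per_instr=0.5):
    """Aggregate off-chip bandwidth in GB/s for a linearly scaling workload."""
    instrs_per_sec = cores * ipc * ghz * 1e9  # total instruction throughput
    return instrs_per_sec * bytes_per_instr / 1e9

for cores in (1, 2, 4, 8, 16):
    print(f"{cores:2d} cores -> {required_bandwidth_gbs(cores):5.1f} GB/s")
```

Demand grows in proportion to the core count, so as core counts double with each CMP generation, the required off-chip bandwidth of such a workload doubles as well.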