A Survey Paper on Processor Workloads

Christoph Jechlitschek, [email protected]

Abstract

Processors are ubiquitous in today's world. Almost any electronic device contains a processor, and their computing power ranges from a few thousand to millions of instructions per second. Processors are built for many different purposes, and so their capabilities vary greatly. Even within a specific area, processors are different enough to make it difficult to compare them. Benchmarks have been developed to measure the performance of processors and rank them among their peers. Much research has been conducted to design and develop benchmarks capable of reflecting a processor's performance. This paper surveys benchmarks in five major processor families: general purpose processors, graphics processors, network processors, embedded processors, and processors in supercomputers. This paper also covers some basics about benchmarks.

Keywords: processor workloads, benchmarks, performance evaluation, performance analysis, performance measurements, supercomputers, graphics processors, network processors, embedded processors, general purpose processors, performance metric, Dhrystone, Whetstone, Fhourstone, SPEC CPU, SiSoft Sandra, LINPACK, LAPACK, Lawrence Livermore Loops, NAS Parallel Benchmarks, 3DMark06, SPEC viewperf, CommBench, PacketBench, EEMBC

See also: Other Reports on Performance Evaluation

Table of Contents

1. Introduction
2. Benchmark Background
2.1 The Origin of the Word Benchmark
2.2 Types of Benchmarks
2.2.1 Addition Instruction
2.2.2 Instruction Mixes
2.2.3 Kernels
2.2.4 Synthetic Programs
2.2.5 Application Benchmarks
2.3 Benchmark Metrics
3. Benchmarks for Desktop Computers
3.1 Whetstone
3.2 Dhrystone
3.3 Fhourstone
3.4 SPEC CPU 2006
3.5 SiSoft Sandra
4. Benchmarks for Supercomputers
4.1 LINPACK
4.2 LAPACK
4.3 Lawrence Livermore Loops
4.4 NAS Parallel Benchmarks
5. Benchmarks for Graphic Processors
5.1 3DMark06
5.2 SPEC viewperf 9.0
5.3 3D Games
6. Benchmarks for Network Processors
6.1 CommBench
6.2 PacketBench
7. Benchmarks for Embedded Processors
7.1 EEMBC Benchmarks
8. Summary
9. References
10. List of Acronyms

1. Introduction

Processors are the brains of many devices. They can perform computations, interact with other devices, and do many other things. They come in many sizes and shapes, vary in speed, have different instruction sets, and are designed for different uses. With this multitude of processors, it becomes important to determine how well a processor performs in its intended area and how it compares to other processors.

In this paper, we start by explaining what benchmarks are and what they measure. It is important to understand the different metrics in order to select the right criteria when making comparisons. In the following sections, we present benchmarks for different classes of processors. We start by covering benchmarks for general purpose CPUs in desktop systems, followed by benchmarks for those CPUs in supercomputers. After that, we cover special purpose processors and their benchmarks. The special purpose processors covered are graphics, network, and embedded processors.

2. Benchmark Background

In this section, we cover some of the background material that is needed to understand the rest of the paper. We first explain the different kinds of benchmarks that a processor can be subjected to, and discuss what they measure and whether they reflect just the speed of the processor or also include the interactions of the processor with peripheral devices. After that, we cover some common metrics that are used to report the performance of processors. Before we start with these two sections, we cover the origin of the word benchmark as a side note.

2.1 The Origin of the Word Benchmark

The term benchmark was created and used by surveyors. After surveying an area, surveyors had a great interest in marking the exact position from which they took their measurements so that they could reposition the leveling rod at a later time. To do so, they chiseled marks into which they placed an angle-iron to bracket ("bench") the rod. From there, the word benchmark found its way into computer science, where it came to mean the evaluation of computer performance. Next, we describe five different groups of benchmarks.

2.2 Types of Benchmarks

Benchmarks can be roughly divided into five groups [Jain91]:

- Addition instruction
- Instruction mixes
- Kernels
- Synthetic programs
- Application benchmarks

We proceed by describing each group in more detail.

2.2.1 Addition Instruction

In the early days of computing, CPUs were "the most expensive and most used components of the system" [Jain91], so performance was measured by counting how many additions a computer could perform per unit time. Since the addition instruction was the most common instruction, and because computers had very few other instructions, comparing the number of additions performed was a reasonable benchmark for early computers.

2.2.2 Instruction Mixes

Measurements based on a single instruction had the disadvantage that they did not take into account that other instructions have different complexities. As the number of instructions grew, it became clear that a mix of instructions was needed. The number of times each instruction appears in the mix should ideally reflect the frequency with which the instruction appears in an average program. The most popular mix is the Gibson mix, developed by Jack Gibson in 1959. The Gibson mix is shown in Table 1.

Table 1: The Gibson mix, ordered by instruction frequency. Data taken from [Jain91].

    Instruction                          Percentage
    Load and store                       31.2
    Indexing                             18.0
    Branches                             16.6
    Floating Add and Subtract             6.9
    Fixed-Point Add and Subtract          6.1
    Instructions not using registers      5.3
    Shifting                              4.4
    Compares                              3.8
    Floating Multiply                     3.8
    Logical, And, Or                      1.6
    Floating Divide                       1.5
    Fixed-Point Multiply                  0.6
    Fixed-Point Divide                    0.2
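To make concrete how an instruction mix is turned into a single performance figure, the short C program below computes the weighted average instruction time implied by the Gibson mix. This is a minimal illustrative sketch, not code from the survey or from any benchmark suite: only the percentages come from Table 1, while the per-class execution times are invented purely for the example.

    #include <stdio.h>

    /* Weighted-average instruction time from an instruction mix.
     * The percentages are the Gibson mix of Table 1; the per-class
     * execution times (in microseconds) are made-up values for
     * illustration, not measurements of any real machine. */
    struct mix_entry {
        const char *instruction_class;
        double      percentage;   /* weight from the mix */
        double      time_us;      /* assumed execution time (hypothetical) */
    };

    int main(void)
    {
        struct mix_entry gibson[] = {
            { "Load and store",                  31.2, 2.0 },
            { "Indexing",                        18.0, 1.5 },
            { "Branches",                        16.6, 1.8 },
            { "Floating Add and Subtract",        6.9, 4.0 },
            { "Fixed-Point Add and Subtract",     6.1, 1.0 },
            { "Instructions not using registers", 5.3, 1.2 },
            { "Shifting",                         4.4, 1.0 },
            { "Compares",                         3.8, 1.0 },
            { "Floating Multiply",                3.8, 6.0 },
            { "Logical, And, Or",                 1.6, 1.0 },
            { "Floating Divide",                  1.5, 9.0 },
            { "Fixed-Point Multiply",             0.6, 5.0 },
            { "Fixed-Point Divide",               0.2, 8.0 },
        };
        size_t n = sizeof gibson / sizeof gibson[0];
        double weighted_time = 0.0;

        for (size_t i = 0; i < n; i++)
            weighted_time += (gibson[i].percentage / 100.0) * gibson[i].time_us;

        /* The weighted average is the "typical" instruction time implied
         * by the mix; its reciprocal is the implied instruction rate. */
        printf("Average instruction time: %.3f us\n", weighted_time);
        printf("Implied rate: %.0f instructions/s\n", 1e6 / weighted_time);
        return 0;
    }

The reciprocal of the weighted average is the instruction rate that an instruction-mix benchmark would report for the assumed machine, which is how a mix collapses many instruction timings into one comparable number.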
2.2.3 Kernels

Two reasons caused the focus to shift from instruction mixes to kernels. The first is that emerging processors continued to increase the number of instructions in their instruction sets, so it became more and more difficult to include all instructions in the mix. The second reason was the introduction of advanced features such as pipelining and address translation: the execution time of an instruction started to depend on the instructions issued in its vicinity. To account for this, researchers started to use the most frequent functions in a program. Later, the focus shifted from functions to operations such as matrix inversion and sorting, so that performance could be compared across different systems. Most, if not all, kernels are processing kernels. These kernels do not issue I/O instructions and therefore measure the performance of the processor without regard to the surrounding system.
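To illustrate what a processing kernel looks like in practice, the following C sketch times a naive matrix multiplication with no I/O inside the measured region. It is a hypothetical example in the spirit of kernels such as the Lawrence Livermore Loops, not code taken from any of the suites surveyed in this paper; the matrix size N and the use of clock() are arbitrary choices for the sketch.

    #include <stdio.h>
    #include <time.h>

    #define N 256

    /* A processing kernel: dense matrix multiplication. The measured
     * region performs no I/O, so the result reflects processor (and
     * memory hierarchy) performance rather than the surrounding system. */
    static double a[N][N], b[N][N], c[N][N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                a[i][j] = (double)(i + j);
                b[i][j] = (double)(i - j);
                c[i][j] = 0.0;
            }

        clock_t start = clock();
        for (int i = 0; i < N; i++)
            for (int k = 0; k < N; k++)
                for (int j = 0; j < N; j++)
                    c[i][j] += a[i][k] * b[k][j];
        clock_t end = clock();

        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        double flops = 2.0 * N * N * N;  /* one multiply and one add per inner step */
        printf("Kernel time: %.3f s, %.1f MFLOPS\n", seconds, flops / seconds / 1e6);
        return 0;
    }

Because the timed loop touches only registers, caches, and main memory, such a kernel says little about disks, system calls, or other peripherals, which is exactly the limitation the next type of benchmark addresses.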
2.2.4 Synthetic Programs

The disadvantage of kernels is that they do not reflect the behavior of a typical application. Applications, unlike kernels, issue system calls and interact with I/O devices. Synthetic programs were developed to account for this by exercising a mix of computations, system calls, and I/O requests. The main advantage of synthetic programs is that they can be developed quickly and do not rely on real data. To ensure portability, these programs were written in a high-level language. However, these programs are far from perfect. The main drawback is that they are too small to make statistically relevant memory or disk references. Similar issues arise with page faults, disk cache hits and misses, and the CPU-I/O overlap.

2.2.5 Application Benchmarks

By far the most accurate way to obtain measurements reflecting application program performance is to use applications themselves as benchmarks. This is especially true in areas where there are only one or very few application programs, such as banking (credit-debit transactions). For a more general environment, a set of "typical" applications is used to compute an overall performance score. These benchmarks are generally identical to the original applications, with the exception of small modifications made to automate the evaluation process. Application benchmarks are not perfect either, since they evaluate the whole computer rather than just the processor, which is the component of interest here.

2.3 Benchmark Metrics

Before we can start measuring and comparing the performance of different processors, we need to review the metrics that are commonly used. One of the earliest metrics is MIPS (million instructions per second). This metric simply counts how many instructions a processor can execute each second. Many computer scientists regard it as meaningless [Hennessy03] and have even nicknamed it "Meaningless Indication of Processor Speed" or "Meaningless Information on Performance for Salespeople" [wiki-MIPS]. There are two reasons for this. First, MIPS does not reflect how much work is done per instruction. For example, RISC processors have simple instructions, and even though they can execute instructions faster than CISC processors, they do not necessarily do more work. Second, different instructions can take different amounts of time to execute. For example, add instructions execute faster on most processors than branch instructions. Most MIPS ratings are based on the fastest-executing instruction and should be considered peak MIPS, while the MIPS rating under "normal" use will be lower. Here again, it is problematic to compare different processor architectures: a processor with a single very fast instruction and otherwise slow instructions will have a higher peak MIPS rating than a processor whose instructions are all reasonably fast, although the latter will execute most programs faster. As a side note, Linux uses a related measure called BogoMIPS.
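The MIPS calculation itself is simple: MIPS = instruction count / (execution time in seconds * 10^6). The small C example below, using made-up numbers for two hypothetical machines (not data from the survey), shows how a processor that executes more, simpler instructions can report a higher MIPS rating while finishing the same program later.

    #include <stdio.h>

    /* MIPS = instructions executed / (execution time in seconds * 10^6).
     * The two "machines" below are invented for illustration: the RISC-like
     * one executes more (simpler) instructions for the same program. */
    static double mips(double instructions, double seconds)
    {
        return instructions / (seconds * 1e6);
    }

    int main(void)
    {
        /* The same hypothetical program on two hypothetical machines. */
        double cisc_insns = 5.0e8, cisc_time = 1.0;  /* 0.5 billion instructions in 1.0 s */
        double risc_insns = 1.5e9, risc_time = 1.2;  /* 1.5 billion instructions in 1.2 s */

        printf("CISC-like machine: %.0f MIPS, runtime %.1f s\n",
               mips(cisc_insns, cisc_time), cisc_time);
        printf("RISC-like machine: %.0f MIPS, runtime %.1f s\n",
               mips(risc_insns, risc_time), risc_time);
        /* The RISC-like machine reports 1250 MIPS against 500 MIPS,
         * yet it finishes the same program 0.2 s later. */
        return 0;
    }

In this invented comparison the higher MIPS number belongs to the slower machine for this program, which is precisely why MIPS figures should not be compared across architectures without also considering the work done per instruction.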