A Lightweight Processor Benchmark Mark Claypool Worcester Polytechnic Institute, [email protected]
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by DigitalCommons@WPI Worcester Polytechnic Institute DigitalCommons@WPI Computer Science Faculty Publications Department of Computer Science 7-1-1998 Touchstone - A Lightweight Processor Benchmark Mark Claypool Worcester Polytechnic Institute, [email protected] Follow this and additional works at: http://digitalcommons.wpi.edu/computerscience-pubs Part of the Computer Sciences Commons Suggested Citation Claypool, Mark (1998). Touchstone - A Lightweight Processor Benchmark. Retrieved from: http://digitalcommons.wpi.edu/computerscience-pubs/206 This Other is brought to you for free and open access by the Department of Computer Science at DigitalCommons@WPI. It has been accepted for inclusion in Computer Science Faculty Publications by an authorized administrator of DigitalCommons@WPI. WPI-CS-TR-98-16 July 1998 Touchstone { A Lightweight Pro cessor Benchmark by Mark Claypool Computer Science Technical Rep ort Series WORCESTER POLYTECHNIC INSTITUTE Computer Science Department 100 Institute Road, Worcester, Massachusetts 01609-2280 Touchstone { A Lightweight Pro cessor Benchmark Mark Claypool [email protected] Worcester Polytechnic Institute Computer Science Department August 11, 1998 Abstract Benchmarks are valuable for comparing pro cessor p erformance. However, p orting and run- ning pro cessor b enchmarks on new platforms can b e dicult. Touchstone, a simple addition b enchmark, is designed to overcome p ortability problems, while retaining p erformance measure- ment accuracy. In this pap er, we present exp erimental results that showTouchstone correlates strongly with pro cessor p erformance under other b enchmarks. Equally imp ortant, we presenta measure of p ortability that demonstrates Touchstone is easier to p ort and run than other b ench- marks. We conclude that while Touchstone is not a replacement for more extensive pro cessor b enchmarks, it can b e used for quick, reasonably accurate estimations of pro cessor p erformance. Keywords: b enchmark, pro cessor p erformance, measurement 1 Intro duction benchmark v. trans - to sub ject a system to a series of tests in order to obtain prear- ranged results not available on comp etitive systems. { S. Kel ly-Bootle, \The Devil's DP Dictionary" Standard measures of p erformance for computers systems provide a basis for comparison, which can lead to system improvements and predictions of e ectiveness under other applications. If computer users ran the same programs day in and day out, p erformance comparisons would b e straight-forward. As this will never b e the case, systems must rely on other metho ds to predict p erformance under estimated workloads. The pro cess of p erformance comparison for two or more systems by measurements is called benchmarking, and the workloads used in the measurements are called benchmarks. Broadly, there are four typ es of b enchmarks [10,11]: Real programs. Real programs are applications that users will actually run on the system. Observ- ing a system running such programs may b e a go o d indication of p erformance under normal op erations. Examples are language compilers suchasgcc, acc and f77, text-pro cessing to ols like TeX and Framemaker, and computer-aided design to ols like Spice. 1 Kernels. Kernels are small, key pieces from real programs. Unlike real programs, no user runs kernel programs, for they exist only for p erformance measurements. They are b est used to isolate p erformance of key system features. Examples are LINPACK see Section 2 for more information and the Livermore Loops [14]. Synthetic benchmarks. Synthetic b enchmarks havecharacteristics similar to those of a set of real programs and can b e applied rep eatedly in a controlled manner. They require no real-world data les that may b e large and contain sensitive data, and can b e easily p orted without a ecting their basic op eration. They often have built-in measurement capabilities. Examples are Spec see Section 2 for more information, Whetstone [7] and Dhrystone [15]. Toy benchmarks. Toy b enchmarks are small pieces of co de that pro duce results known b efore hand. They are small, easy to implement, and can b e run on almost any computer. They may b e used in some real programs, but often are not. Examples are the Sieve of Eratosthenes and Quicksort see Section 2 for more information. Although the ab ove b enchmarks may provide valuable system p erformance information, they are often dicult to run prop erly. Determining the correct mix of real programs is dicult and in- stalling real programs can b e time-consuming and resource intensive. Gcc, for example, required more disk space than we had available and demanded a signifcant amount of compilation time see Section 3.2 for more details. Running kernels can b e dicult without the necessary compilers and often small changes are required to p ort kernels b etween platforms. LINPACK, for example, requires a Fortran compiler and a user-supplied second function, that caused us p orting dif- culties see Section 3.2 for more details. Co ding toy b enchmarks can b e surprisingly dicult. Many computer professionals nd it dicult to accurately co de a Mergesort correctly, despite un- derstanding the algorithm. Obtaining reputed synthetic programs can b e equally challenging. For instance, although SPEC is non-pro t, they charge a supp ort and development fee for use of their b enchmarks. Prices in 1995 were [6]: Benchmark Typ e Price CPU intensiveinteger b enchmarks $ 425 CPU intensive oating p oint b enchmarks $ 575 Integer and oating p oint b enchmarks $ 900 System level le server NFS workload $ 1200 UNIX software developmentworkloads $ 1450 Even the basic integer or oating p oint b enchmarks are far to o much for the average graduate student wishing to simply explore workstation p erformance. Toovercome some of the diculties in running standard b enchmarks, we present Touchstone,a lightweight b enchmark for pro cessor p erformance. Touchstone is pro cess that incrementsavariable in a tight lo op for a xed p erio d of time. The value that it rep orts represents the p ower of the pro cessor; the higher the numb er, the more p owerful the pro cessor while the lower the numb er, the less p owerful the pro cessor. Touchstone's b eauty is its simplicity.At its core, Touchstone uses only 2 lines of high-level co de, a branch and an add. These commands are found on all systems and in all languages. Its simplicity makes it easy to p ort while its basic functionality provides a surprisingly accurate measure of system p erformance. 2 Increment instruction b enchmarks, suchasTouchstone, are not new. Historically, pro cessors were the most exp ensive and the most used system comp onents. Initially, computers had very few instructions, the most frequent of whichwas addition. Thus, the computer that added faster was considered a b etter p erformer. As the numb er and complexity of instructions grew, the addition b enchmark was no longer sucient as the only means of measuring pro cessor p erformance. This remains true to day, and so Touchstone should not replace existing b enchmarks. However, rather than b eing abandoned, Touchstone should b e used as a quick, easy way to give an approximate estimate of pro cessor p erformance. Wehavetwohyp otheses ab out the usefulness of Touchstone: Hypothesis:Touchstone is a go o d indicator of pro cessor p erformance under other b ench- marks. Hypothesis:Touchstone is easier to p ort than other b enchmarks. The rest of this pap er is laid out as follows: Section 2 describ es exp eriments we ran to test our hyp otheses; Section 3 analyzes our exp erimental data and tests our hyp otheses; and Section 4 summarizes our conclusions and presents other p ossible uses for Touchstone. 2 Exp eriments touchstone noun - a test or criterion for determining the quality or genuineness of a thing. { Webster's 7th dictionary, on-line. In order to test our rst hyp othesis, Touchstone is a good indicator of processor performance under other benchmarks,we need to correlate Touchstone with one b enchmark from each of the four categories presented in Section 1. This Section describ es our exp eriments to gather data for these correlation computations. The four b enchmarks we selected are: gcc, LINPACK, SPEC int and fp and quicksort. To strengthen our results, we p erform correlations on nine di erent computer workstations. The workstations are: SGI Crimson, SGI Indigo 2 Extreme, Sun Sparc 10, Sun IPX, Sun Sparc 2, Intel 80486-33, SGI Personal Iris, Sun Sparc 1 and Intel 80386-20. Table 1 summarizes the systems. The next 5 subsections describ e the exp eriments involving Touchstone and each b enchmark. 2.1 Touchstone We ran Touchstone on the 9 platforms listed ab ove. Since the countTouchstone rep orts is sensitive to other pro cesses running concurrently,we p ostulate that lightly loaded machines a must for accurate measurements. We p erformed the exp eriments on machines with a minimum of other pro cesses. In order to facilitate nding when time-shared machines are little used, we develop ed a network script to monitor pro cessor load. Figure 1 depicts the sample output from one of our script runs. In this case, the script help ed us avoid running Touchstone during the 3 a.m. system 3 Workstation Op erating System Pro cessor Sp eed RAM SGI Crimson Irix 5.2 100 Mhz 160 Mbytes SGI Indigo 2 Extreme Irix 5.2 100 Mhz 32 Mbytes Sun Sparc 10 SunOs 4.1.4 33 Mhz 64 Mbytes SGI Personal Iris Irix 4.01 20 Mhz 32 Mbytes Sun IPX SunOs 4.1.4 40 Mhz 40 Mbytes Sun Sparc 2 SunOs 4.1.4 40 Mhz 30 Mbytes Sun Sparc 1 SunOs 4.0.2 20 Mhz 20 Mbytes Intel 80486 Linux 1.2.13 33 Mhz 8Mbytes Intel 80386 Linux 1.2.13 20 Mhz 8Mbytes Table 1: Exp erimental platforms.