EEMBC JOURNAL SUMMER 2005


INSIDE THIS ISSUE:
FROM THE PRESIDENT  1
BENCHMARKING MULTICORE PLATFORMS  1
NEWEST BENCHMARK SCORE REPORTS  2
FROM THE LAB  2
NEWS BRIEFS  3
MEMBERS ONLY  3
EEMBC CALENDAR  4

FROM THE PRESIDENT
Testing Your Mobile Phone’s Performance

There are many reasons why it’s important to test the Java performance in a mobile phone. But the number one reason is that, as a mobile phone user, you might be one of the estimated 220 million mobile gamers expected by 2009. In this case, your phone’s Java performance will play a big role in the quality of your user experience. The good news is that there are many benchmarks available for testing a mobile phone’s Java performance. The bad news is that there are many benchmarks available for testing a mobile phone’s Java performance. As an industry-standard consortium, it’s time we took action to ensure that the proper tests were being conducted on the world’s mobile phones.

EEMBC has always closely guarded its source code to protect the benchmarks’ credibility, and only EEMBC members are allowed to publish benchmark scores for their own products. With our GrinderBench™ Java benchmarks however, more than just processor performance is being measured; the score results show the performance of a whole Java implementation. This being the case, we’ve come to recognize that a different set of rules from our processor-centric benchmarks ought to apply to GrinderBench.

Based on a decision by the EEMBC Board, our consortium has agreed to make binary versions of the algorithms in the GrinderBench Java benchmark suite available for free-of-charge download from a new Web (and WAP) site, making it possible for anyone with a Java-enabled mobile phone to run the benchmarks and test their phone’s performance in Java-enabled games, encryption, internet connectivity, and other applications.

The new site also presents scores for a wide variety of mobile phones that were tested under controlled conditions by EEMBC members. The preliminary data we’ve seen is already (continued on page 3)

AN EEMBC MEMBER SPEAKS OUT Benchmarking Multicore Platforms: Just how can I know? By John Goodacre, ARM

Benchmarking has long been invoked as the best way for designers to determine how well any particular solution will work in a given application. The truth is that the abstraction of a solution into some manageable set of benchmarking tests provides the answer that one size really doesn’t fit all.

Within EEMBC, there are multiple suites of tests, each with a number of test kernels that exercise a candidate platform in some contrived or generalized manner. Even so, we still see huge numbers of alternative suites and kernels being offered right across the computer industry. If I were to try and offer an overview of the benchmarking world, I’d probably start by splitting it into three distinct aspects.

Algorithmic excellence: benchmarks of this form concentrate solely on a single task that is typically focused on some form of signal processing. (continued on page 4)

www.eembc.org for the latest in benchmark scores

NEWEST BENCHMARK SCORE REPORTS

MC7448 1.7 GHz, Pre-production Silicon
Automotive/Industrial, Out-of-the-Box
Consumer, Out-of-the-Box
Digital Entertainment, Out-of-the-Box
Digital Entertainment, Optimized
Networking Version 1.1, Out-of-the-Box
Networking Version 1.1, Optimized
Networking Version 2.0, Out-of-the-Box
Office Automation, Out-of-the-Box
Telecom, Out-of-the-Box
Telecom, Optimized

IBM 970FX 2 GHz, Production Silicon
Automotive/Industrial, Out-of-the-Box
Consumer, Out-of-the-Box
Digital Entertainment, Out-of-the-Box
Networking, Out-of-the-Box
Networking Version 2.0, Out-of-the-Box
Office Automation, Out-of-the-Box
Telecom, Out-of-the-Box

From the Lab

Power Consumption: The Next Great Benchmarking Frontier
Alan R. Weiss, EEMBC Certification Laboratory (ECL, LLC)

For some time now, the semiconductor and processor industries, and even suppliers of embedded software, have considered power as a parameter with as much, or even more, importance than performance. EEMBC has not ignored this industry focus and has been working on a consistent, reliable, and certifiable method to measure the energy required to run EEMBC benchmarks in devices under test (DUTs). A power technical advisory group, led by Shay Gal-On of PMC-Sierra, has been discussing various techniques and concepts, with ECL acting as a technical advisor and laboratory to perform experiments. Dr. David Kaeli of Northeastern University has been another technical advisor, offering feedback and ideas.

This group’s latest milestone is a white paper, “Measuring Power with National Instruments LabView™.” The title gives some indication of our approach. Rather than use a power meter for measurements, we believe EEMBC and its stakeholders will be best served by the use of National Instruments LabView software running on a host platform (such as Windows XP or ) containing a digital-to-analog converter (DAC) board. This board is connected to a breakout board, which in turn is connected to the DUT, or, more accurately, to power access points to which a power resistor has been soldered.

One major advantage of using this PC-based measurement setup is that the test results are automatically recorded. This system logs all of our samples to a file and provides a complete audit trail of our work. These datapoints can be plotted on a graph, resulting in a reasonable picture of power consumption while the benchmark is being run. Warm-up periods are taken into account to assure accuracy and stability of measurements. We will measure using two different sampling frequencies to factor out possible aliasing concerns.

ECL has designed a method to modify the EEMBC Portable Test Harness to trigger the power measurement system programmatically. By automating the system, we are providing useful tools to EEMBC members and ECL-certified power consumption measurements for their customers.

The following questions and answers address the work on power measurements that EEMBC and ECL have done so far.

Q: Can the measurement equipment do the job?
A: The National Instruments 250-kHz DAC board and software tools proved suitable for the task at hand, although it may be prudent to move to a 1-MHz board (1 million samples per second). ECL was quickly able to build the National Instruments VI program and we can instrument the code to trigger the logging system.

Q: Is the equipment cost effective?
A: This same equipment doubles as the data acquisition and signal generation equipment used for the Automotive/Industrial Real Time Version 2 development, benchmarking, and certification work. The cost of a DAC board plus the National Instruments software is around $4000, far less than the hardware-only system, and it can be used for future EEMBC work.

Q: Can we measure power on a variety of platforms?
A: Yes. We think that any system where the power rails have power access points (points we can solder on a power resistor) will work. We also must be able to access a signal pin on the board for control purposes.

Q: Can we measure more than one power rail?
A: Yes, so long as there are access points.

(continued on page 3)
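
To make the path from logged samples to a power figure concrete, here is a minimal sketch in Python. It is not the method from the white paper: it assumes the logger records the voltage drop across the power resistor, that the resistor sits in series with a rail of known voltage, and that a fixed number of warm-up samples is discarded. The function names, resistor value, rail voltage, and sample data are all hypothetical.

```python
def power_samples(v_drop, r_shunt_ohm, v_rail):
    """Convert sampled resistor voltage drops (V) into DUT power samples (W)."""
    # Current through the resistor (Ohm's law) is the current drawn by the DUT.
    # Assumption: the resistor is in series with the rail, so the DUT sees
    # the rail voltage minus the drop across the resistor.
    return [(v / r_shunt_ohm) * (v_rail - v) for v in v_drop]


def energy_joules(power_w, sample_rate_hz, warmup_samples=0):
    """Integrate power over time (trapezoidal rule), skipping warm-up samples."""
    kept = power_w[warmup_samples:]
    dt = 1.0 / sample_rate_hz
    return sum((a + b) / 2.0 * dt for a, b in zip(kept, kept[1:]))


# Hypothetical log: a constant 50 mV drop across a 0.1-ohm resistor on a
# 3.3 V rail, sampled at 1 kHz, with the first 100 samples treated as warm-up.
log = [0.05] * 1101
power = power_samples(log, r_shunt_ohm=0.1, v_rail=3.3)
energy = energy_joules(power, sample_rate_hz=1000, warmup_samples=100)
average_power = energy / 1.0  # 1001 kept samples span one second
```

Integrating the samples, rather than averaging a handful of readings, is what lets a sampled log stand in for the energy required to run a benchmark; the same functions would apply unchanged to a log taken at a different sampling frequency.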

FROM THE PRESIDENT (continued from page 1)

The new GrinderBench Web site at www.grinderbench.com will debut in late August 2005.

quite revealing. Some phones have outstanding Java performance and will make game playing a joy. Other phones have such poor Java performance, it’s stretching the facts to claim that they can run Java games.

In addition to the ‘fun’ and curiosity aspect for end-users, how will GrinderBench be used? This depends on the audience, which includes service providers, hardware OEMs (the mobile phone manufacturer), Java Virtual Machine (JVM) developers, and silicon providers. While “Vendor X” may build the phone, user complaints always come through the service provider. Hence, all service providers should make use of EEMBC’s new service and try out every phone that they are evaluating and subsequently offering to the public. Vendors will be able to make use of this service to test their phones and subsequently improve them. The same is true for JVM developers and silicon providers.

In doing performance testing, these stakeholders may want to go beyond just running the benchmarks. They might also want to examine the actual source code, do some performance profiling, and determine how to improve their products. We invite companies such as these to become members of the EEMBC Java Subcommittee or to license the GrinderBench benchmark software.

With several different Java benchmarks now on the loose, GrinderBench stands as a thoroughly engineered, tested, and certifiable standard for measuring the performance of Java implementations. We’ve changed our rules so everyone with a Java-enabled mobile phone can experience GrinderBench while ensuring that only valid and reliable scores on these phones will be published. Compared with the alternatives, GrinderBench may be somewhat less exciting to run; it won’t bring up any fancy images on your cell phone display. The rather plain interface allows the benchmarks to focus on the true performance aspects of the Java implementation. This is also the reason why GrinderBench doesn’t provide a progress meter while the benchmark is running, as even that interferes with the actual benchmark execution. But under the hood, there’s a lot of processing going on, making this a very powerful benchmark and laying a solid foundation for future versions of these industry-standard benchmarks.

Markus Levy

NEWS BRIEFS

Broadcom Corporation and SunPlus Technology are the newest members of the EEMBC Board of Directors. Broadcom supplies wired and wireless broadband communications semiconductors and had revenues of $2.4 billion in 2004. Customers for its system-on-chip and software products include manufacturers of computing and networking equipment, digital entertainment and broadband access products, and mobile devices. Taiwan-based SunPlus provides custom and standard-product IC solutions for handheld electronics, interactive toys, digital cameras, MP3 and DVD players, and many other consumer electronic systems. The company’s products include LCD controllers/drivers, , multimedia ICs, and memory chips.

Interest in EEMBC benchmarks on the part of academic researchers is on the rise in Europe. Since the last issue of EEMBC Journal, faculty at the University of Hertfordshire (UK), Chalmers University of Technology (Goteborg, Sweden), and the Institut für Datentechnik und Kommunikationsnetze (IDA) of the Technical University at Braunschweig (Germany) have become EEMBC U academic members.

Most datasheets for the DENBench 1.0 Digital Entertainment benchmark suite are now available from the Digital Entertainment section of the EEMBC Web site.

Power Consumption: The Next Great Benchmarking Frontier (Continued from Page 2)

Q: Can we make measurements over a wide range of frequencies?
A: Yes. For experimental purposes, we measured a 180-MHz processor, a 266-MHz processor, and a 600-MHz processor. We can take at least 250,000 samples per second, and if we run the benchmarks over at least a two- to three-second range, that will generate a lot of data samples. If necessary, a faster DAC board can be inexpensively acquired.

Q: Can we measure performance and power simultaneously?
A: Yes, although separate power and performance runs would be better to remove any potential for intrusive behavior.

Q: Can we find a way to get consistent results?
A: Consistent to within 5% is achievable and practical. The power resistor itself contributes about 1% variance, and the DAC board another 1%.

Q: What is the difference across measuring tests, and are there differences between benchmarks in terms of power?
A: We saw consistent differences between benchmarks and different power characteristics (waveforms, graphs). We think that benchmarking across different benchmarks is meaningful based on our experiments.

Q: When will this process be submitted for approval to the EEMBC Board of Directors?
A: A few more experiments must be performed, and the entire process from start to finish must be demonstrated on a suitable number of platforms, but we believe that there are no technical obstacles to completing the process development.
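
The sampling and consistency figures quoted in these answers lend themselves to a quick back-of-the-envelope check, sketched here in Python. The sample rate, run lengths, 1% contributions, and 5% target come from the text; the helper names are hypothetical, and the root-sum-square combination rests on the assumption, not stated in the text, that the two error sources are independent.

```python
import math


def sample_count(duration_s, rate_hz=250_000):
    """Number of samples logged for a run of the given length."""
    return round(duration_s * rate_hz)


def worst_case(contributions):
    """Pessimistic bound: every error source pushes in the same direction."""
    return sum(contributions)


def root_sum_square(contributions):
    """Combined error if the sources are independent (an assumption)."""
    return math.sqrt(sum(c * c for c in contributions))


short_run = sample_count(2)   # two-second benchmark run
long_run = sample_count(3)    # three-second benchmark run

# About 1% from the power resistor and about 1% from the board.
sources = [0.01, 0.01]
target = 0.05
within_target = worst_case(sources) <= target
```

Under either way of combining them, the two named sources stay well inside the 5% target, which leaves room for whatever run-to-run variation the remaining experiments turn up.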

EEMBC Calendar

ARM Developers’ Conference
October 5 – Santa Clara Convention Center
Markus Levy will present “Update on Industry-Standard Performance Analysis”
October 6 – Santa Clara Convention Center
Markus Levy and Graham Wilkinson will present “EEMBC Java Development and Testing with GrinderBench”

GSPx 2005 Pervasive Signal Processing Conference
October 24-27 – Santa Clara Convention Center
Markus Levy and Danny Wilson of LSI Logic will co-present “A Layered Approach to Evaluating Processor and System Performance for VoIP Applications”

Major recent and forthcoming contributed articles by Markus Levy
“Evaluating Digital Entertainment System Performance” (IEEE Computer, July 2005)
“New Benchmarks Aim to Verify Microcontroller Real-Time Capabilities” ( Engineering, May 2005)
“Performance-Tests für Bildverarbeitungs-Systeme” (forthcoming Q4 in Elektronik Praxis)

Benchmarking Multicore Platforms: Just how can I know? (Continued from Page 1)

Generalized competence: this is where I’d put the EEMBC suites. These benchmarks take a cross section of realistic workloads and provide standardized implementations which are applied to solutions that provide a generalized processing capability.

Consumer satisfaction: these forms of benchmarks are designed to provide user-based scenarios and will typically only execute on the final, complete solution.

Even in offering these distinctions, I hope it’s clear that the boundary between each can be somewhat blurred. Technology is constantly evolving, and what may once have been very much a specific signal processing algorithm can now be performed by a general-purpose processor. Likewise, a generalized task, by virtue of rigidity through standardization, becomes a candidate for fixed-function signal processing. As far as benchmarks are concerned, end users really only care about how well their purchased solution will operate. It is for this reason that I separate the consumer satisfaction benchmarks into their own category. These tests target end user applications on a defined type of platform, and this is where another key use of benchmarking comes into conflict.

Using benchmarking to measure the consequences of design is fine, and can be accomplished by all three forms of benchmarks. The challenge comes when designers use the benchmarks to make design choices and architecturally explore the options they have within a given design space. In this case, the distinction between these three forms of benchmark must be clear and concise.

For the targeted signal processing benchmarks, a time consuming, very expensive, unique, and complex redesign of the benchmark implementation for each design variation must be undertaken. In the case of consumer satisfaction benchmarks, the sheer size of the benchmarks makes it impossible to execute them in any of today’s design exploration tools, without the designer needing to either invest excessive amounts of simulation time, or to simplify the benchmarks to a point they can be executed. This brings us back to generalized competence benchmarks and EEMBC.

The EEMBC benchmarks for many years have been offering designers both the ability to compare final solutions and to help improve their products. Delivered as portable C-code, a simple recompile allows these tests to run on some new or improved general-purpose processing unit. But the world is changing. The general-purpose compute engines (CPUs) targeted by EEMBC are starting to call on techniques long used in embedded systems, for example by using multiple general purpose compute engines, i.e. a multiprocessor implementation, instead of a single uniprocessor. The distinction between multiple processors at the system level and multiple processors at the CPU level is an important one when you start to look at what must be done within benchmarking. Generalized benchmarks that historically focused on a CPU’s performance now must also understand the multiprocessor (MP) and multi- (MT) forms of processor implementations. But the world of system-level multiprocessing is quite diverse. It is fair to ask whether this is something best left to the consumer satisfaction level of benchmarks.

When looking into processors that provide software with a view of multiple CPUs, it’s clear that the traditional programmer’s view of writing for a single flow of instructions cannot be used across processors capable of executing multiple flows of instructions concurrently. Any benchmark code must change, but to offer comparisons with existing uniprocessor designs, the benchmark must be portable between the two forms of processor.

Symmetric multiprocessing (SMP) is a software technique where code is written so that one or more threads (flows of instructions) can be represented in a portable form. The task of multi-threaded programming is limited. Its role is really only to define the syntax and the API of how to represent code for a multiprocessor. It does not define the semantics of how a workload is to be partitioned to accomplish the given work.

When looking at SMP, there are three main forms for semantic decomposition of workloads. Together these can be used to provide the required meaning in the development of MP benchmarks.

Decomposition of a single task: in this scenario, a single algorithm is parallelized to execute concurrently across the multiple processors. Similarities exist here in the targeting of a signal processing algorithm to a specific (MP) architecture. The key difference is that the benchmark is focused at general purpose processors in which the designer will always maintain a C-code target. You could consider this as simply a code optimization of an existing EEMBC kernel.

Execution of multiple algorithms: this form of benchmark takes multiples of the algorithms that represent a user scenario and runs them together in much the same way the end user would. It thus starts to encroach on consumer satisfaction tests. The key understanding gained from this type of test is how well a given solution can manage multiple concurrent activities.

Multiple streams of data: in embedded computing, performance is not just a matter of how well a solution can solve a single task or multiple tasks, but of how well it can process a given input data: its throughput. Here the multi-processor is using its multiple threads to split up and share the data across the processors. To measure how well this is done, a different kind of benchmark is needed.

In short, there is no simple answer to the question “Just how can I know?” Thankfully, though, the available benchmarks are broad and cover a wide design space. But we must keep in mind that as technology changes, so will the ways in which devices address a problem and offer a solution. In measuring how well they do this with benchmarks, the important thing is to ensure that these instruments of measurement remain appropriate, and evolve in parallel with the greater changes that are taking place in processor architectures, implementations, and applications.
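
The idea of splitting input data across threads while keeping the benchmark portable between uniprocessor and multiprocessor forms can be sketched briefly. The example below is in Python for brevity, although the article has portable C-code in mind, and it is only an illustration under stated assumptions: the checksum kernel, the function names, and the contiguous partitioning scheme are all hypothetical and are not EEMBC code.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(chunk):
    """Stand-in workload: a 16-bit checksum over one chunk of input data."""
    total = 0
    for value in chunk:
        total = (total + value * value) & 0xFFFF
    return total


def split(data, parts):
    """Partition data into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def run(data, workers):
    """Run the kernel over the data with one worker or several threads."""
    chunks = split(data, workers)
    if workers == 1:
        partials = [kernel(c) for c in chunks]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            partials = list(pool.map(kernel, chunks))
    # Partial checksums combine by modular addition, so the result does not
    # depend on how many workers shared the data.
    return sum(partials) & 0xFFFF


data = list(range(1000))
single = run(data, workers=1)
shared = run(data, workers=4)
```

The point of the sketch is that `run` returns the same result for any worker count, so one source can be compared across both forms of processor. Note that in standard CPython the global interpreter lock keeps CPU-bound threads from running in parallel, so this shows only the partitioning, not a speedup.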