Microprocessor

MICROPROCESSOR REPORT
THE INSIDER’S GUIDE TO MICROPROCESSOR HARDWARE
www.MPRonline.com

EEMBC’S MULTIBENCH ARRIVES
CPU Benchmarks: Not Just For ‘Benchmarketing’ Any More
By Tom R. Halfhill {7/28/08-01}

Imagine a world without measurements or statistical comparisons. Baseball fans wouldn’t fail to notice that a .300 hitter is better than a .100 hitter. But would they welcome a trade that sends the .300 hitter to Cleveland for three .100 hitters?

System designers and software developers face similar quandaries when making trade-offs with multicore processors. Even if a dual-core processor appears to be better than a single-core processor, how much better is it? Twice as good? Would a quad-core processor be four times better? Are more cores worth the additional cost, design complexity, power consumption, and programming difficulty?

The Embedded Microprocessor Benchmark Consortium (EEMBC) wants to help answer those questions. EEMBC’s MultiBench 1.0 is a new benchmark suite for measuring the throughput of multiprocessor systems, including those built with multicore processors. MultiBench is EEMBC’s biggest launch since introducing the EnergyBench power-consumption benchmarks two years ago. (See MPR 7/17/06-02, “EEMBC Energizes Benchmarking.”)

[Full disclosure: EEMBC was founded and is headed by Markus Levy, a former Microprocessor Report technology analyst and a current member of the MPR Editorial Board.]

EEMBC’s Evolution: More Than Scores

Since its inception in 1997, EEMBC’s mission has been to produce benchmarks for embedded processors, not for PC or server CPUs. In keeping with that mission, the MultiBench suite consists of compute-intensive tasks commonly performed by networking equipment, office equipment (especially printers), and consumer-electronics products. Although EEMBC benchmarks may be considered artificial in that they aren’t complete applications, they perform functions extracted or adapted from real-world applications.

EEMBC’s role has evolved over the years, and MultiBench is another step. Originally, EEMBC was conceived as an independent entity that would create benchmark suites and certify the scores for accuracy, allowing vendors and customers to make valid comparisons among embedded microprocessors. (See MPR 5/1/00-02, “EEMBC Releases First Benchmarks.”) EEMBC still serves that role. But, as it turns out, most EEMBC members don’t openly publish their scores. Instead, they disclose scores to prospective customers under an NDA or use the benchmarks for internal testing and analysis.

Partly for this reason, MPR rarely cites EEMBC scores in articles. EEMBC forbids members to publicly release their scores without EEMBC’s independent certification. Another reason for the scarcity of EEMBC scores in MPR is they’re usually not available for the processors we write about: processors that aren’t available yet. And even if more EEMBC scores were available, some engineers remain suspicious of any attempts to distill microprocessor performance into a few numbers. (See MPR 8/30/04-01, “Benchmarking the Benchmarks.”)

MultiBench seems to be moving in the same general direction as other EEMBC benchmarks: away from published scores, closer to private testing. Although some EEMBC members have been using early versions of MultiBench for six months, no member is near to releasing certified MultiBench scores. MPR suspects that MultiBench will find its niche primarily as an analysis tool for internal testing and closed-door sales pitches, not as a source for widely publicized benchmark comparisons.

That’s not all bad. Although published scores make for interesting “benchmarketing,” what the industry sorely needs is a technically valid method for evaluating the performance of multicore processors. Within its limitations, MultiBench meets that need.

New Concept: Work Items and Workloads

EEMBC adapted most tasks in MultiBench from existing EEMBC suites, in addition to writing entirely new tasks. Code reuse saved development time and made sense, because multicore embedded processors generally perform the same application-level tasks as single-core embedded processors. Tasks were created or ported by EEMBC members and by the EEMBC Technology Center and its director of software engineering, Shay Gal-On.

Previously, EEMBC referred to the benchmark tasks in each suite as “kernels,” a somewhat confusing term, because they aren’t related to operating-system kernels. In EEMBC parlance, a kernel is simply an algorithm or routine that performs a common task found in real-world embedded software. Now EEMBC is using new terminology. MultiBench tasks are called “workloads,” which may include one or more “work items.” Work items are similar to the kernels of old, because they mate algorithms with sample datasets on which the algorithms operate.

For example, one MultiBench workload is 64M-cmykw2. It has a single work item: a color-conversion routine that transforms four 12-megapixel images from the RGB color space to the CMYK color space. Another single-item MultiBench workload is 64M-rotatew2, an image-rotation routine that operates on four 12-megapixel images, turning each one 90 degrees clockwise. And another MultiBench workload is 64M-cmykw2-rotatew2, which combines the color-conversion and rotation workloads to create a two-item workload. In all, MultiBench 1.0 has 36 workloads, some of which are combinations of work items in other workloads.

There’s another twist. Work items may be single-threaded or may decompose the data for processing by multiple threads. This variation allows MultiBench to test data-level parallelism within a single work item as well as thread-level parallelism among multiple work items. The grand concept is a little difficult to grasp at first, but it allows for a great variety of possible workloads. A MultiBench workload may consist of a single work item; two or more instances of the same work item, operating on different datasets; two or more different work items; and work items that are single- or multithreaded. Figure 1 illustrates this concept.

EEMBC chose the 36 workloads in MultiBench 1.0 to keep things relatively simple. Actually, EEMBC has ported almost all the kernels in existing EEMBC suites to function as MultiBench workloads. Including all these workloads in MultiBench 1.0 would have overwhelmed benchmark testers. For scoring purposes, if nothing else, it’s better to limit the number of workloads to a manageable number. For now, MultiBench departs from the usual EEMBC practice of sorting tasks into application-oriented categories of benchmark suites (Networking, Consumer, Telecom, etc.). MultiBench 1.0 is a unified suite that doesn’t specialize by application domain. Specialized suites will come in the future.

Creating Custom Workloads

Although MultiBench 1.0 defines 36 workloads, EEMBC members and MultiBench licensees can create their own custom workloads by selectively choosing work items and changing the parameters of the items. Almost all MultiBench work items can operate on different datasets with adjustable levels of threading. Testers can substitute their own datasets, too.

Custom workloads aren’t valid for MultiBench scoring, but they allow testers to create a virtually infinite variety of workloads, even when using the standard EEMBC datasets. Testers can exercise processors and systems to identify strengths and weaknesses or to compare the performance of multicore processors with that of single-core processors. Engineers can test the effectiveness of multithreading at different levels: time-sliced on a single processor or distributed among multiple processors.

To help testers assemble the workloads, EEMBC offers new tools. Figure 2 is a screen shot of the Workload Creator, which lets users choose work items from a list. Items available in the list depend on which workloads the tester has licensed. In addition, testers can build their own work items from scratch and link them into the Workload Creator by implementing a special application programming interface (API). Workload Creator is part of MultiBench Architect, an optional tool. EEMBC’s technical documentation explains the function of each work item and shows its adjustable parameters, such as the number of possible threads and the official EEMBC datasets approved for use with the item.

Because MultiBench allows a variable number of work items to run simultaneously, the number of simultaneous threads is also variable. And some multithreaded work items operate on multiple slices of data, potentially another adjustable parameter. When porting existing kernels to MultiBench, EEMBC made other improvements as well, such as modifying the algorithms to mirror the latest application trends and changing the sample data to reflect the larger datasets crunched by today’s systems. In all these ways, MultiBench is more than simply a multiprocessing version of existing EEMBC benchmarks. At every level, EEMBC has updated the tasks to better represent the progress of real-world embedded software.

Inside the MultiBench Suite

Existing benchmark suites from which EEMBC derived the MultiBench workloads include Networking 2.0, Office Automation 2.0, and Digital Entertainment 1.0. As mentioned above, EEMBC has ported almost all the tasks in all the suites (including Automotive 1.1 and Telecom 1.1) but is using only a subset of them in MultiBench 1.0. Future releases of MultiBench will have more workloads. In addition to

© IN-STAT JULY 28, 2008 MICROPROCESSOR REPORT
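
The relationship between workloads and work items described under “New Concept: Work Items and Workloads” can be made concrete with a small sketch. This is only an illustrative model, not EEMBC’s actual API or file format: the class names `WorkItem` and `Workload`, their fields, the item names `cmyk` and `rotate`, and the thread counts are all assumptions made for the example. Only the workload name 64M-cmykw2-rotatew2 and the four 12-megapixel images come from the article.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkItem:
    """An algorithm paired with a dataset (hypothetical model, not EEMBC's API)."""
    name: str
    dataset: str
    threads: int = 1  # 1 = single-threaded; >1 = data split across threads


@dataclass
class Workload:
    """A named set of one or more work items that run simultaneously."""
    name: str
    items: list[WorkItem] = field(default_factory=list)

    def total_threads(self) -> int:
        # Simultaneous threads grow with both item count and per-item threading.
        return sum(item.threads for item in self.items)

    def parallelism(self) -> set[str]:
        kinds = set()
        if len(self.items) > 1:
            kinds.add("thread-level")  # parallelism among multiple work items
        if any(item.threads > 1 for item in self.items):
            kinds.add("data-level")    # parallelism within a single work item
        return kinds


# Thread counts below are assumed purely for illustration.
cmyk = WorkItem("cmyk", "four 12-megapixel images", threads=2)
rotate = WorkItem("rotate", "four 12-megapixel images", threads=1)

single = Workload("single-item example", [rotate])
combined = Workload("64M-cmykw2-rotatew2", [cmyk, rotate])

print(combined.total_threads())
print(sorted(combined.parallelism()))
```

Keeping the thread count on the work item rather than on the workload mirrors the article’s point that the two kinds of parallelism vary independently: adding items changes thread-level parallelism, while raising one item’s thread count changes data-level parallelism.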

