RAMP: Research Accelerator for Multiple Processors
John Wawrzynek and David Patterson, University of California, Berkeley
Mark Oskin, University of Washington
Shih-Lien Lu, Intel
Christoforos Kozyrakis, Stanford University
James C. Hoe, Carnegie Mellon University
Derek Chiou, University of Texas at Austin
Krste Asanović, Massachusetts Institute of Technology

Hot Chips, March–April 2007. Published by the IEEE Computer Society. 0272-1732/07/$25.00 © 2007 IEEE

RAMP's goal is to enable the intensive, multidisciplinary innovation that the computing industry will need to tackle the problems of parallel processing. RAMP itself is an open-source, community-developed, FPGA-based emulator of parallel architectures. Its design framework lets a large, collaborative community develop and contribute reusable, composable design modules. Three complete designs (for transactional memory, distributed systems, and distributed-shared memory) demonstrate the platform's potential.

In 2005, the computer hardware industry took a historic change of direction: The major microprocessor companies all announced that their future products would be single-chip multiprocessors, and that future performance improvements would rely on software-specified parallelism rather than additional software-transparent parallelism extracted automatically by the microarchitecture. Several of us discussed this milestone at the International Symposium on Computer Architecture (ISCA) in June 2005. We were struck that a multibillion-dollar industry would bet its future on solving the general-purpose parallel computing problem, when so many have previously attempted but failed to provide a satisfactory approach.

To tackle the parallel processing problem, our industry urgently needs innovative solutions, which in turn require extensive codevelopment of hardware and software. However, this type of innovation currently gets bogged down in the traditional development cycle:

- Prototyping a new architecture in hardware takes approximately four years and many millions of dollars, even at only research quality.
- Software engineers are ineffective until the new hardware actually shows up, because simulators are too slow to support serious software development activities. Software engineers tend to innovate only after hardware arrives.
- Feedback from software engineers on the current production hardware cannot help the immediate next generation because of overlapped hardware development cycles. Instead, the feedback loop can take several hardware generations to close fully.

Hence, we conspired on how to create an inexpensive, reconfigurable, highly parallel platform that would attract researchers from many disciplines (architectures, compilers, operating systems, applications, and others) to work together on perhaps the greatest challenge facing computing in the past 50 years. Because our industry desperately needs solutions, our goal is to develop a platform that would allow far more rapid evolution than traditional approaches.

RAMP vision

Our hallway conversations led us to the idea of using field-programmable gate arrays (FPGAs) to emulate highly parallel architectures at hardware speeds. FPGAs enable very rapid turnaround for new hardware. You can tape out an FPGA design every day, and have a new system fabricated overnight. Another key advantage of FPGAs is that they easily exploit Moore's law. As the number of cores per microprocessor die grows, FPGA density will grow at about the same rate. Today we can map about 16 simple processors onto a single FPGA, which means we can construct a 1,000-processor system in just 64 FPGAs. Such a system is cheaper and consumes less power than a custom multiprocessor, at about $100 and 1 W per processor.

Because our goal is to ramp up the rate of innovation in hardware and software multiprocessor research, we named this project RAMP (Research Accelerator for Multiple Processors). RAMP is an open-source project to develop and share the hardware and software necessary to create parallel architectures. RAMP is not just a hardware architecture project. Perhaps our most important goal is to support the software community as it struggles to take advantage of the potential capabilities of parallel microprocessors, by providing a malleable platform through which the software community can collaborate with the hardware community.

Unlike commercial multiprocessor hardware, RAMP is designed as a research platform. We plan to include research features that are impossible to include in real hardware systems owing to speed, cost, or practicality issues. For example, the FPGA design can incorporate additional hardware to monitor any event in the system. Being able to add arbitrary event probes, including arbitrary computation on those events, provides visibility formerly only available in software simulators, but without the inevitable slowdown faced by software simulators when introducing such visibility.

A second example of how RAMP differs from real hardware is reproducibility. Using the RAMP Description Language (RDL) framework, different researchers can construct the same deterministic parallel computing system that will perform exactly the same way every time, clock cycle for clock cycle. By using processor designs donated by industry, RAMP users will start with familiar architectures and operating systems, which will provide far more credibility than software simulations that model idealized processors or that ignore operating-system effects. RDL is designed to make constructing a full computer out of RDL-compatible modules easy. Our target speeds of 100 to 200 MHz are slower than real hardware, but they are fast enough to run standard operating systems and large-scale applications, and they are orders of magnitude faster than software simulators. Finally, because of the similarities in the design flow of logic for FPGAs and custom hardware, we believe RAMP is realistic enough to convince software developers to start aggressive development on innovative architectures and programming models and to convince hardware and software companies that RAMP results are relevant.

This combination of cost, power, speed, flexibility, observability, reproducibility, and credibility will make the platform attractive to software and hardware researchers interested in the parallel challenge. In particular, it allows the research community to revive the 1980s culture of building experimental hardware and software systems, which today has been almost entirely lost because of the higher cost and difficulty of building hardware.

Table 1 compares alternatives for pursuing parallel-systems research in academia. The five options are a conventional shared-memory multiprocessor (SMP), a cluster, a simulator, a custom-built chip and system, and RAMP. The rows are the features of interest, with a grade for each alternative and quantification where appropriate.

Table 1. Relative comparison of five options for parallel research. From the architect's perspective, the most surprising aspect of this table is that not only is performance not the top concern, it is at the bottom of this list. The platform just needs to be fast enough to run the entire software stack.

Feature                   SMP          Cluster      Simulator      Custom       RAMP
Scalability (1,000 CPUs)  C            A            A              A            A
Cost (1,000 CPUs)         F ($40M)     C ($2M-$3M)  A+ ($0M)       F ($20M)     A ($0.1M-$0.2M)
Cost of ownership         A            D            A              D            A
Power/space (kW, racks)   D (120, 12)  D (120, 12)  A+ (0.1, 0.1)  A            B (1.5, 0.3)
Development community     D            A            A              F            B
Observability             D            C            A+             A            B+
Reproducibility           B            D            A+             A            A+
Reconfigurability         D            C            A+             C            A+
Credibility of result     A+           A+           D              A+           B+/A
Performance (clock)       A (2 GHz)    A (3 GHz)    F (0 GHz)      B (0.4 GHz)  C (0.1 GHz)
Modeling flexibility      D            D            A              B            A
Overall grade             C            C+           B              B-           A

Cost rules out a large SMP for most academics. The costs of both purchase and ownership make a large cluster too expensive for most academics as well. Our only alternative thus far has been software simulation, and indeed that has been the vehicle for most architecture research in the past decade. As mentioned, software developers rarely use software simulators, because they run too slowly, and results might not be credible. In particular, it's unclear how credible results will be to industry when they are based on simulations of 1,000 processors running small snippets of applications. The RAMP option is

total instructions per second, drops as more cores are simulated and as operating-system effects are included, and the amount of memory required for each node in the host compute cluster rises rapidly.

RAMP is obviously attractive to a broad set of hardware and software researchers in paral-