Openpiton: an Open Source Manycore Research Framework
Total Page:16
File Type:pdf, Size:1020Kb
OpenPiton: An Open Source Manycore Research Framework Jonathan Balkind Michael McKeown Yaosheng Fu Tri Nguyen Yanqi Zhou Alexey Lavrov Mohammad Shahrad Adi Fuchs Samuel Payne ∗ Xiaohua Liang Matthew Matl David Wentzlaff Princeton University fjbalkind,mmckeown,yfu,trin,yanqiz,alavrov,mshahrad,[email protected], [email protected], fxiaohua,mmatl,[email protected] Abstract chipset Industry is building larger, more complex, manycore proces- sors on the back of strong institutional knowledge, but aca- demic projects face difficulties in replicating that scale. To Tile alleviate these difficulties and to develop and share knowl- edge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which Chip support extensibility, scalability, and configurability, along- side an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research proto- types from 1 core to 500 million cores. OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry hardened OpenSPARC T1 core with modifica- Figure 1: OpenPiton Architecture. Multiple manycore chips tions and builds upon it with a scratch-built, scalable uncore are connected together with chipset logic and networks to creating a flexible, modern manycore design. In addition, build large scalable manycore systems. OpenPiton’s cache OpenPiton provides synthesis and backend scripts for ASIC coherence protocol extends off chip. and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verifica- tion infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and has been widespread across the industry with manycore pro- is written in industry standard Verilog. Multiple implemen- cessors being fabricated by numerous vendors [49, 70, 82, tations of OpenPiton have been created including a taped-out 84, 86]. Academia has also heavily studied multicore and 25-core implementation in IBM’s 32nm process and multi- manycore [28, 47, 87], though often through the use of ple Xilinx FPGA prototypes. software simulation [14, 21, 31, 50, 59, 81] and model- ing [43, 72] tools. While software tools are convenient to 1. Introduction work with and provide great flexibility, they do not provide Multicore and manycore processors have been growing in the high performance, characterization fidelity (power, area, size and popularity fueled by the growth in single-chip tran- and frequency), and ability to perform research at-scale that sistor count and the need for higher performance. This trend taking a manycore design to RTL, FPGA, or silicon imple- mentation can provide. ∗ Now at Nvidia A major barrier limiting manycore research is the lack of an easy to use, highly scalable, open source, manycore pro- cessor design. In order to maximize benefit, such a design and framework would have robust software tools, be eas- Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice ily configurable, provide a mature tool ecosystem with OS and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to support, be written in an industry standard HDL, be easily lists, contact the Owner/Author. Request permissions from [email protected] or Publications Dept., ACM, Inc., fax +1 (212) 869-0481. Copyright 2016 held by Owner/Author. Publication Rights Licensed to ACM. synthesizable to FPGA and ASIC, and provide a large test ASPLOS ’16, April 02 - 06, 2016, Atlanta, GA, USA suite. Building and supporting such an infrastructure is a ma- Copyright c 2016 ACM 978-1-4503-4091-5/16/04. $15.00 DOI: http://dx.doi.org/10.1145/2872362.2872414 jor undertaking which has prevented such prior designs. Our framework, OpenPiton, attacks this challenge and provides all of these features and more. OpenPiton is the world’s first open source, general- purpose, multithreaded manycore processor. The Open- Piton platform shown in Figure 1 is a modern, tiled, many- core design consisting of a 64-bit architecture using the mature SPARC v9 ISA with a distributed, directory-based, cache coherence protocol (shared distributed L2) imple- mented across three physical, 2D mesh Networks-on-Chip (NoCs). OpenPiton contains a pipelined dual-precision FPU per core and supports native multithreading to hide mem- (a) (b) ory latency. OpenPiton builds upon the industry hardened OpenSPARC T1 [6, 58, 78] core, but sports a completely Figure 2: Architecture of (a) a tile and (b) chipset. scratch-built uncore (caches, cache-coherence protocol, NoCs, of applications, OpenPiton is configurable and extensible. scalable interrupt controller, NoC-based I/O bridges, etc), The number of cores, attached I/O, size of caches, number a new and modern simulation and synthesis framework, a of TLB entries, presence of an FPU, number of threads, and modern set of FPGA scripts, a complete set of ASIC back- network topology are all configurable from a single config- end scripts enabling chip tape-out, and full-stack multiuser uration file. OpenPiton is easy to extend; the presence of a Debian Linux support. OpenPiton is available for download well documented core, a well documented coherence pro- at http://www.openpiton.org. tocol, and an easy to interface NoC make adding research OpenPiton is scalable and portable; the architecture sup- features easy. Research extensions to OpenPiton that have ports addressing for up to 500-million cores, supports shared already been built include several novel memory system ex- memory both within a chip and across multiple chips, and plorations, an Oblivious RAM controller, and a new in-core has been designed to easily enable high performance 1000+ thread scheduler. The validated and mature ISA and software core microprocessors and beyond. The design is imple- ecosystem support OS and compiler research. The release of mented in industry standard Verilog HDL and does not OpenPiton’s ASIC synthesis, FPGA synthesis, and back-end require the use of any new languages. OpenPiton enables (Place and Route) scripts make it easy for others to port to research from the small to the large with demonstrated im- new process technologies or FPGAs. In particular, this en- plementations from a single lite-core, PicoPiton, on a Xilinx ables CAD researchers who need large netlists to evaluate Artix 7 operating at 29.5MHz up to Piton, our 25-core cus- their algorithms at-scale. tom ASIC implementation of OpenPiton, taped-out at 1GHz The following are our key contributions: in 32nm silicon. • Creation and release of the world’s first open source, OpenPiton has been designed as a platform to enable at- general-purpose, multithreaded manycore processor. scale manycore research. An explicit design goal of Open- • Detailed description of the OpenPiton architecture in- Piton is that it should be easy to use by other researchers. To cluding a novel coherent memory system that seamlessly support this, OpenPiton provides a high degree of integra- integrates on-chip and off-chip networks. tion and configurability. Unlike many other designs where • Presentation of multiple use-cases of configurability and the pieces are provided, but it is up to the user to compose extensibility of the OpenPiton Architecture. them together, OpenPiton is designed with all of the compo- • Comparison of FPGA and ASIC implementations of nents integrated into the same easy to use build infrastruc- OpenPiton ture providing push-button scalability. Computer architec- • Characterization of backend synthesis runtimes. ture researchers can easily deploy OpenPiton’s source code, • Characterization of the test coverage of the OpenPiton add in modifications and explore their novel research ideas test suite. in the setting of a fully working system. Over 8000 targeted, • Qualitative comparison against prior open source proces- high-coverage test cases are provided to enable researchers sors. to innovate with a safety net that ensures functionality is maintained. OpenPiton’s open source nature also makes it 2. Architecture easy to release modifications and reproduce previous work for comparison or reuse. OpenPiton is a tiled-manycore architecture, as shown in Beyond being a platform designed by computer archi- Figure 1. It is designed to be scalable, both intra-chip and tects for use by computer architects, OpenPiton enables re- inter-chip. searchers in other fields including OS, security, compilers, Intra-chip, tiles are connected via three networks on-chip runtime tools, systems, and even CAD tool design to con- (NoCs) in a 2D mesh topology. The scalable tiled architec- duct research at-scale. In order to enable such a wide range ture and mesh topology allow for the number of tiles within an OpenPiton chip to be configurable. In the default config- uration, the NoC router address space supports scaling up to NoC1 NoC1 256 tiles in each dimension within a single OpenPiton chip Off-chip Core Private Distributed CCX NoC2 NoC2 Memory (64K cores/chip). L1 L1.5 L2 Controller Inter-chip, an off-chip interface, known as the chip bridge, NoC3 NoC3 connects the tile array (through the tile in the upper-left) to off-chip logic (chipset). The chipset may be implemented on an FPGA, ASIC, or integrated on an OpenPiton chip. The Figure 3: OpenPiton’s memory hierarchy datapath. chip bridge extends the three NoCs off-chip, multiplexing (one way to implement CPU control registers in SPARC). them over a single link. These configuration registers can be useful for adding ad- The extension of NoCs off-chip allows the seamless con- ditional functionality to the core which can be configured nection of multiple OpenPiton chips to create a larger sys- from software, e.g.