High-Speed Soft-Processor Architecture for FPGA Overlays by Charles Eric Laforest a Thesis Submitted in Conformity with the Requ

Total Page:16

File Type:pdf, Size:1020Kb

High-Speed Soft-Processor Architecture for FPGA Overlays by Charles Eric Laforest a Thesis Submitted in Conformity with the Requ High-Speed Soft-Processor Architecture for FPGA Overlays by Charles Eric LaForest A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto c Copyright 2015 by Charles Eric LaForest Abstract High-Speed Soft-Processor Architecture for FPGA Overlays Charles Eric LaForest Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 2015 Field-Programmable Gate Arrays (FPGAs) provide an easier path than Application- Specific Integrated Circuits (ASICs) for implementing computing systems, and generally yield higher performance and lower power than optimized software running on high- end CPUs. However, designing hardware with FPGAs remains a difficult and time- consuming process, requiring specialized skills and hours-long CAD processing times. An easier design process abstracts away the FPGA via an \overlay architecture", which implements a computing platform upon which we construct the desired system. Soft- processors represent the base case of overlays, allowing easy software-driven design, but at a large cost in performance and area. This thesis addresses the performance limitations of FPGA soft-processors, as building blocks for overlay architectures. We first aim to maximize the usage of FPGA structures by designing Octavo, a strict round-robin multi-threaded soft-processor architecture tailored to the underlying FPGA and capable of operating at maximal speed. We then scale Octavo to SIMD and MIMD parallelism by replicating its datapath and connecting Octavo cores in a point-to-point mesh. This scaling creates multi-local logic, which we preserve via logical partitioning to avoid artificial critial paths introduced by unnecessary CAD optimizations. We plan ahead for larger Octavo systems by adding support for architectural ex- tensions, instruction predication, and variable I/O latency. These features improve the ii efficiency of hardware extensions, eliminate busy-wait loops, and provide a building block for more efficient code execution. By extracting the flow-control and addressing sub-graphs out of a program's Control- Data Flow Graph (CDFG), we can execute them in parallel with useful computations using special dedicated hardware units, granting better performance than fully unrolled loops without increasing code size. Finally, we benchmark Octavo against the MXP soft vector processor, the NiosII/f scalar soft-processor, and equivalent benchmark implementations written in C synthe- sized with the LegUp High-Level Synthesis (HLS) tool, and direct Verilog Hardware Description Language (HDL) implementations. Octavo's higher clock frequency offsets its higher cycle count, performing roughly on-par with MXP, and within an order of magnitude of the performance of LegUp and Verilog solutions, but with an order of magnitude area penalty. iii Dedication To my family and friends, near and far. It's been a life-long road, and you all helped me travel it. I am to all in your debt. In Memoriam: John Gregory Steffan August 27, 1972 - July 24, 2014 iv Acknowledgements Since I was a teenager, I've always wanted to design a computer. You would not be reading this without the help and influence of many. Firstly, my wife Joy. Do I need to expound on the love, support, and patience of a spouse over a decade spent in academia, with her own career, while raising a family? Secondly, to Greg Steffan. I can credit pretty much my entire academic success to his ability to pull me back to the big picture after fruitfully letting me get lost in the details, arguing for my own direction. Back in 2008, he proposed I do my MASc on a seemingly small little problem, which grew into a sub-field of its own after we published, and won us a significant award. Right from the beginning, while he co-supervised my undergraduate BIS thesis in 2006, before he even had tenure, we talked about what kind of computer architectures would work best on an FPGA. The work you are holding right now stems from those conversations, and I think I can say we figured out a good solution. Neither of us knew how far it would all go, and I am sorry he did not get to see it completed. Jason Anderson stepped up to carry the supervision torch, and gave his time and effort to make sure the LegUp benchmarks were great. I am lucky to have had his support, unflagging enthusiasm, and invaluable thesis-writing feedback. Guy Lemieux, of UBC, influenced the later stages of this work enormously. The entire work on execution overhead originates from a single conversation we had. Guy always took the time to help and explain things, criticizing honestly, greatly stimulating my thinking. One of his students, Aaron Severance, gave his time and effort also to provide MXP benchmark data, better than any I could have done myself, and patiently explained how the machinery worked. Any mistakes or omissions are mine alone. I have had the luck of also working with Vaughn Betz and Tarek Abdelrahman on my thesis committee, both bringing contrasting and broad perspectives on my research. In fact, every ECE and DCS Faculty member I have spoken with have answered my questions and given help freely. v I have also had the pleasure of great student colleagues, too numerous to mention all, but especially Ruediger Willenberg, Xander Chin, Braiden Brousseau, Robert Hesse, Ming G. Liu, Zimo Li, Tristan O'Rourke, Mihai Burcea, Andrew Shorten, Jason Lu, Henry Wong, Mark Jeffreys, Jasmina Vasiljevic, Ioana Baldini, and Charles Chiasson. Some roots of this work came from conversations at MIT with Jack Dennis and some of Arvind's grad students in the summer of 2010. Joel Emer of Intel also provided some early feedback. Octavo's initial design improved a lot after speaking with Patrick Doyle, Mark Stoodley, and other people of IBM Markham. I am also indebted to Blair Fort (as a student), Martin Labrecque, and Peter Yian- nacouras. Their earlier soft-processor work laid a lot of the groundwork for my own. I am still learning things from it. I got nothing but first-class IT tech support from Jaro Pristupa, Eugenia Distefano, and YongJoo Lee, who never blinked at all my software installation requests, and saved my data through RAID failures and my errors. Furthermore, Kelly Chan and Beverly Chu were always available for EECG administrative help, as were Judith Levene and Darlene Gorzo of the ECE Grad Office, and Jayne Leake who managed my TA assignments. A lot of computational horsepower (over 24 CPU-years of it!), was provided by the GPC supercomputer at the SciNet HPC Consortium [83]. My special thanks to Ching- Hsing Yu and Dr. Daniel Gruner for their enthusiasm at helping me scale my computa- tions, even when I unintentionally broke their system a little sometimes. (\What do you mean, I used up all the file descriptors?!?") The folk at Altera have also provided support in many ways: funding, Quartus li- censes, support, and hardware. My special thanks to Blair Fort (as Altera staff) for his freely given help. Further funding came from the ECE Department, the Queen Elizabeth II World Telecommunication Congress Graduate Scholarship in Science and Technology, the Wal- ter C. Sumner Foundation, and NSERC. If you read this and feel forgotten, my apologies. Call me up and reminisce! vi \He was alive, trembling ever so slightly with delight, proud that his fear was under control. Then without ceremony he hugged in his forewings, extended his short, angled wingtips, and plunged directly toward the sea. By the time he passed four thousand feet he had reached terminal velocity, the wind was a solid beating wall of sound against which he could move no faster. He was flying now straight down, at two hundred fourteen miles per hour. He swallowed, knowing that if his wings unfolded at that speed he'd be blown into a million tiny shreds of seagull. But the speed was power, and the speed was joy, and the speed was pure beauty." {\Jonathan Livingston Seagull", Richard Bach I was born to Rock n' Roll. Everything I need. I was born with the hammer down. I was built for speed. {\Built For Speed", Mot¨orhead It is by caffeine alone I set my mind in motion. It is by the beans of Java that thoughts acquire speed, the hands acquire shakes, the shakes become a warning. It is by caffeine alone I set my mind in motion. { variation on the \Mentat Mantra" { attributed to Mark Stein, Arisia SF Convention, Boston, ca. 1993, Sunday morning. Soundtrack by deadmau5. vii Contents 1 Introduction1 1.1 Motivation...................................2 1.2 Contributions.................................3 1.2.1 Chapter3: The Octavo Soft-Processor...............3 1.2.2 Chapter4: Tiling Overlay Architectures..............4 1.2.3 Chapter5: Planning for Larger Systems..............5 1.2.4 Chapter6: Approaching Overhead-Free Execution.........5 1.3 Organization.................................6 2 Background7 2.1 FPGA Architecture..............................7 2.1.1 Logic Elements............................7 2.1.2 Logic Clusters and Interconnect...................9 2.1.3 Island-Style FPGA Architecture...................9 2.2 Limitations of Design Processes with FPGAs................ 10 2.2.1 Hardware Description Languages.................. 11 2.2.2 High-Level Synthesis......................... 12 2.2.3 Overlays................................ 13 Overlays: Design Cycle........................ 13 Overlays: Scalar Soft-Processors................... 14 Overlays: Vector Processors..................... 16 Overlays: \Virtual" FPGAs and CGRAs.............. 17 2.3 Significant FPGA Features Affecting The Design Process......... 18 2.4 The Potential for Higher-Performance Overlays.............. 22 2.5 Summary................................... 25 viii 3 The Octavo Soft Processor 26 3.1 Design Goals................................. 27 3.1.1 A Multi-Threaded Overlay...................... 28 3.2 Related Work................................. 29 3.3 Contributions................................. 30 3.4 Introducing Octavo.............................. 31 3.5 Experimental Framework........................... 32 3.6 Storage Architecture............................
Recommended publications
  • Latticemico32 Development Kit User's Guide for Latticeecp
    LatticeMico32 Development Kit User’s Guide for LatticeECP Lattice Semiconductor Corporation 5555 NE Moore Court Hillsboro, OR 97124 (503) 268-8000 May 2007 Copyright Copyright © 2007 Lattice Semiconductor Corporation. This document may not, in whole or part, be copied, photocopied, reproduced, translated, or reduced to any electronic medium or machine- readable form without prior written consent from Lattice Semiconductor Corporation. Trademarks Lattice Semiconductor Corporation, L Lattice Semiconductor Corporation (logo), L (stylized), L (design), Lattice (design), LSC, E2CMOS, Extreme Performance, FlashBAK, flexiFlash, flexiMAC, flexiPCS, FreedomChip, GAL, GDX, Generic Array Logic, HDL Explorer, IPexpress, ISP, ispATE, ispClock, ispDOWNLOAD, ispGAL, ispGDS, ispGDX, ispGDXV, ispGDX2, ispGENERATOR, ispJTAG, ispLEVER, ispLeverCORE, ispLSI, ispMACH, ispPAC, ispTRACY, ispTURBO, ispVIRTUAL MACHINE, ispVM, ispXP, ispXPGA, ispXPLD, LatticeEC, LatticeECP, LatticeECP-DSP, LatticeECP2, LatticeECP2M, LatticeMico8, LatticeMico32, LatticeSC, LatticeSCM, LatticeXP, LatticeXP2, MACH, MachXO, MACO, ORCA, PAC, PAC-Designer, PAL, Performance Analyst, PURESPEED, Reveal, Silicon Forest, Speedlocked, Speed Locking, SuperBIG, SuperCOOL, SuperFAST, SuperWIDE, sysCLOCK, sysCONFIG, sysDSP, sysHSI, sysI/O, sysMEM, The Simple Machine for Complex Design, TransFR, UltraMOS, and specific product designations are either registered trademarks or trademarks of Lattice Semiconductor Corporation or its subsidiaries in the United States and/or other countries. ISP, Bringing the Best Together, and More of the Best are service marks of Lattice Semiconductor Corporation. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies. Disclaimers NO WARRANTIES: THE INFORMATION PROVIDED IN THIS DOCUMENT IS “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTY OF ANY KIND INCLUDING WARRANTIES OF ACCURACY, COMPLETENESS, MERCHANTABILITY, NONINFRINGEMENT OF INTELLECTUAL PROPERTY, OR FITNESS FOR ANY PARTICULAR PURPOSE.
    [Show full text]
  • Latticemico32 Software Developer User Guide
    LatticeMico32 Software Developer User Guide May 2014 Copyright Copyright © 2014 Lattice Semiconductor Corporation. This document may not, in whole or part, be copied, photocopied, reproduced, translated, or reduced to any electronic medium or machine-readable form without prior written consent from Lattice Semiconductor Corporation. Trademarks Lattice Semiconductor Corporation, L Lattice Semiconductor Corporation (logo), L (stylized), L (design), Lattice (design), LSC, CleanClock, Custom Mobile Device, DiePlus, E2CMOS, ECP5, Extreme Performance, FlashBAK, FlexiClock, flexiFLASH, flexiMAC, flexiPCS, FreedomChip, GAL, GDX, Generic Array Logic, HDL Explorer, iCE Dice, iCE40, iCE65, iCEblink, iCEcable, iCEchip, iCEcube, iCEcube2, iCEman, iCEprog, iCEsab, iCEsocket, IPexpress, ISP, ispATE, ispClock, ispDOWNLOAD, ispGAL, ispGDS, ispGDX, ispGDX2, ispGDXV, ispGENERATOR, ispJTAG, ispLEVER, ispLeverCORE, ispLSI, ispMACH, ispPAC, ispTRACY, ispTURBO, ispVIRTUAL MACHINE, ispVM, ispXP, ispXPGA, ispXPLD, Lattice Diamond, LatticeCORE, LatticeEC, LatticeECP, LatticeECP-DSP, LatticeECP2, LatticeECP2M, LatticeECP3, LatticeECP4, LatticeMico, LatticeMico8, LatticeMico32, LatticeSC, LatticeSCM, LatticeXP, LatticeXP2, MACH, MachXO, MachXO2, MachXO3, MACO, mobileFPGA, ORCA, PAC, PAC-Designer, PAL, Performance Analyst, Platform Manager, ProcessorPM, PURESPEED, Reveal, SensorExtender, SiliconBlue, Silicon Forest, Speedlocked, Speed Locking, SuperBIG, SuperCOOL, SuperFAST, SuperWIDE, sysCLOCK, sysCONFIG, sysDSP, sysHSI, sysI/O, sysMEM, The Simple Machine for Complex
    [Show full text]
  • Co-Simulation Between Cλash and Traditional Hdls
    MASTER THESIS CO-SIMULATION BETWEEN CλASH AND TRADITIONAL HDLS Author: John Verheij Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) Computer Architecture for Embedded Systems (CAES) Exam committee: Dr. Ir. C.P.R. Baaij Dr. Ir. J. Kuper Dr. Ir. J.F. Broenink Ir. E. Molenkamp August 19, 2016 Abstract CλaSH is a functional hardware description language (HDL) developed at the CAES group of the University of Twente. CλaSH borrows both the syntax and semantics from the general-purpose functional programming language Haskell, meaning that circuit de- signers can define their circuits with regular Haskell syntax. CλaSH contains a compiler for compiling circuits to traditional hardware description languages, like VHDL, Verilog, and SystemVerilog. Currently, compiling to traditional HDLs is one-way, meaning that CλaSH has no simulation options with the traditional HDLs. Co-simulation could be used to simulate designs which are defined in multiple lan- guages. With co-simulation it should be possible to use CλaSH as a verification language (test-bench) for traditional HDLs. Furthermore, circuits defined in traditional HDLs, can be used and simulated within CλaSH. In this thesis, research is done on the co-simulation of CλaSH and traditional HDLs. Traditional hardware description languages are standardized and include an interface to communicate with foreign languages. This interface can be used to include foreign func- tions, or to make verification and co-simulation possible. Because CλaSH also has possibilities to communicate with foreign languages, through Haskell foreign function interface (FFI), it is possible to set up co-simulation. The Verilog Procedural Interface (VPI), as defined in the IEEE 1364 standard, is used to set-up the communication and to control a Verilog simulator.
    [Show full text]
  • An Open-Source Python-Based Hardware Generation, Simulation
    An Open-Source Python-Based Hardware Generation, Simulation, and Verification Framework Shunning Jiang Christopher Torng Christopher Batten School of Electrical and Computer Engineering, Cornell University, Ithaca, NY { sj634, clt67, cbatten }@cornell.edu pytest coverage.py hypothesis ABSTRACT Host Language HDL We present an overview of previously published features and work (Python) (Verilog) in progress for PyMTL, an open-source Python-based hardware generation, simulation, and verification framework that brings com- FL DUT pelling productivity benefits to hardware design and verification. CL DUT generate Verilog synth RTL DUT PyMTL provides a natural environment for multi-level modeling DUT' using method-based interfaces, features highly parametrized static Sim FPGA/ elaboration and analysis/transform passes, supports fast simulation cosim ASIC and property-based random testing in pure Python environment, Test Bench Sim and includes seamless SystemVerilog integration. Figure 1: PyMTL’s workflow – The designer iteratively refines the hardware within the host Python language, with the help from 1 INTRODUCTION pytest, coverage.py, and hypothesis. The same test bench is later There have been multiple generations of open-source hardware reused for co-simulating the generated Verilog. FL = functional generation frameworks that attempt to mitigate the increasing level; CL = cycle level; RTL = register-transfer level; DUT = design hardware design and verification complexity. These frameworks under test; DUT’ = generated DUT; Sim = simulation. use a high-level general-purpose programming language to ex- press a hardware-oriented declarative or procedural description level (RTL), along with verification and evaluation using Python- and explicitly generate a low-level HDL implementation. Our pre- based simulation and the same TB.
    [Show full text]
  • Corecommander for Microprocessors and Microcontrollers
    CORECOMMANDER FOR MICROPROCESSORS AND MICROCONTROLLERS Factsheet Direct access to memory and peripheral devices (I/O) for testing, debugging and in-system programming • Direct access to memory and peripheral (I/O) devices of a micro- processor through its (JTAG) debug interface • Read data from, write data to memory and peripherals without software programming • At-speed execution of read and write cycles • Testing and debugging of the connectivity of processor memory and peripherals with at-speed bus cycles without software programming • Easy programming of processor flash memory without software programming Corecommander provides high-level functions to write data to and read data from microprocessor memory Order information CoreComm Micro (core) and I/O addresses without software programming. (core) = ARM 7, ARM 9, ARM 11, Cortex-A, Cortex-R, CoreCommander functions are applied via the JTAG Cortex-M, Blackfin, PXA2xx, PXA3xx, IXP4xx, PowerPC- interface. MPC500 family, PowerPC-MPC5500 family, PowerPC- MPC5600 family, C28x, XC166, Tricore, PIC32 Applications CoreCommander is used in design debug, manufactu- ring test and (field) service for many different applica- [1] if the uProcessor also contains a boundary-scan register then teh tests and in-system tions such as: programming operations can also be done using the boundary-scan register instead of the CoreCommander. Whether in that case the CoreCommander or the boundary-scan register is used depends on preference or performance. • Diagnosing “dead-kernel” boards; no embedded code is required to perform memory reads and Background writes. A uP performs read and write operations on its bus to ac- cess memory and I/O locations. The read and write cycles • Determining the right settings for the peripheral normally result when the uP executes a program that is controller (DDR controller, flash memory controller, stored in memory.
    [Show full text]
  • Implementation, Verification and Validation of an Openrisc-1200
    (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 10, No. 1, 2019 Implementation, Verification and Validation of an OpenRISC-1200 Soft-core Processor on FPGA Abdul Rafay Khatri Department of Electronic Engineering, QUEST, NawabShah, Pakistan Abstract—An embedded system is a dedicated computer system in which hardware and software are combined to per- form some specific tasks. Recent advancements in the Field Programmable Gate Array (FPGA) technology make it possible to implement the complete embedded system on a single FPGA chip. The fundamental component of an embedded system is a microprocessor. Soft-core processors are written in hardware description languages and functionally equivalent to an ordinary microprocessor. These soft-core processors are synthesized and implemented on the FPGA devices. In this paper, the OpenRISC 1200 processor is used, which is a 32-bit soft-core processor and Fig. 1. General block diagram of embedded systems. written in the Verilog HDL. Xilinx ISE tools perform synthesis, design implementation and configure/program the FPGA. For verification and debugging purpose, a software toolchain from (RISC) processor. This processor consists of all necessary GNU is configured and installed. The software is written in C components which are available in any other microproces- and Assembly languages. The communication between the host computer and FPGA board is carried out through the serial RS- sor. These components are connected through a bus called 232 port. Wishbone bus. In this work, the OR1200 processor is used to implement the system on a chip technology on a Virtex-5 Keywords—FPGA Design; HDLs; Hw-Sw Co-design; Open- FPGA board from Xilinx.
    [Show full text]
  • Programmable Logic Design Grzegorz Budzyń Lecture 11: Microcontroller
    ProgrammableProgrammable LogicLogic DesignDesign GrzegorzGrzegorz BudzyBudzy ńń LLectureecture 11:11: MicrocontrollerMicrocontroller corescores inin FPGAFPGA Plan • Introduction • PicoBlaze • MicroBlaze • Cortex-M1 Introduction Introduction • There are dozens of 8-bit microcontroller architectures and instruction sets • Modern FPGAs can efficiently implement practically any 8-bit microcontroller • Available FPGA soft cores support popular instruction sets such as the PIC, 8051, AVR, 6502, 8080, and Z80 microcontrollers • Each significant FPGA producer offers their own soft core solution Introduction • Microcontrollers and FPGAs both successfully implement practically any digital logic function. • Each solution has unique advantages in cost, performance, and ease of use. • Microcontrollers are well suited to control applications, especially with widely changing requirements. • The FPGA resources required to implement the microcontroller are relatively constant. Introduction • The same FPGA logic is re-used by the various microcontroller instructions, conserving resources. • The program memory requirements grow with increasing complexity • Programming control sequences or state machines in assembly code is often easier than creating similar structures in FPGA logic • Microcontrollers are typically limited by performance. Each instruction executes sequentially. Introduction – block diagram Source:[1] FPGA vs Soft Core Microcontroller: – Easy to program, excellent for control and state machine applications – Resource requirements remain constant
    [Show full text]
  • Intro to Programmable Logic and Fpgas
    CS 296-33: Intro to Programmable Logic and FPGAs ADEL EJJEH UNIVERSITY OF ILLINOIS URBANA-CHAMPAIGN © Adel Ejjeh, UIUC, 2015 2 Digital Logic • In CS 233: • Logic Gates • Build Logic Circuits • Sum of Products ?? F = (A’.B)+(B.C)+(A.C’) A B Black F Box C © Adel Ejjeh, UIUC, 2015 3 Programmable Logic Devices (PLDs) PLD PLA PAL CPLD FPGA (Programmable (Programmable (Complex PLD) (Field Prog. Logic Array) Array Logic) Gate Array) •2-level structure of •Enhanced PLAs •For large designs •Has a much larger # of AND-OR gates with reduced costs •Collection of logic blocks with programmable multiple PLDs with •Larger interconnection connections an interconnection networK structure •Largest manufacturers: Xilinx - Altera Slide taken from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 4 Combinational Programmable Logic Devices PLAs, CPLDs © Adel Ejjeh, UIUC, 2015 5 Programmable Logic Arrays (PLAs) • 2-level AND-OR device • Programmable connections • Used to generate SOP • Ex: 4x3 PLA Slide adapted from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 6 PLAs contd • O1 = I1.I2’ + I4.I3’ • O2 = I2.I3.I4’ + I4.I3’ • O3 = I1.I2’ + I2.I1’ Slide adapted from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 7 Programmable Array Logic (PALs) • More Versatile than PLAs • User Programmable AND array followed by fixed OR gates • Flip-flops/Buffers with feedback transforming output ports into I/O ports © Adel Ejjeh, UIUC, 2015 8 Complex PLDs (CPLD) • Programmable PLD blocks (PALs) I/O Block I/O I/O
    [Show full text]
  • PDF 19308 Kb ADSP-Bf50x Blackfin ® Processor Hardware Reference
    ADSP-BF50x Blackfin® Processor Hardware Reference Revision 1.2, February 2013 Part Number 82-100101-01 Analog Devices, Inc. One Technology Way Norwood, Mass. 02062-9106 a Copyright Information © 2013 Analog Devices, Inc., ALL RIGHTS RESERVED. This docu- ment may not be reproduced in any form without prior, express written consent from Analog Devices, Inc. Printed in the USA. Disclaimer Analog Devices, Inc. reserves the right to change this product without prior notice. Information furnished by Analog Devices is believed to be accurate and reliable. However, no responsibility is assumed by Analog Devices for its use; nor for any infringement of patents or other rights of third parties which may result from its use. No license is granted by impli- cation or otherwise under the patent rights of Analog Devices, Inc. Trademark and Service Mark Notice The Analog Devices logo, Blackfin, CrossCore, EngineerZone, EZ-KIT Lite, and VisualDSP++ are registered trademarks of Analog Devices, Inc. All other brand and product names are trademarks or service marks of their respective owners. CONTENTS PREFACE Purpose of This Manual .................................................................. li Intended Audience .......................................................................... li Manual Contents ........................................................................... lii What’s New in This Manual ........................................................... lv Technical Support .........................................................................
    [Show full text]
  • Openpiton: an Open Source Manycore Research Framework
    OpenPiton: An Open Source Manycore Research Framework Jonathan Balkind Michael McKeown Yaosheng Fu Tri Nguyen Yanqi Zhou Alexey Lavrov Mohammad Shahrad Adi Fuchs Samuel Payne ∗ Xiaohua Liang Matthew Matl David Wentzlaff Princeton University fjbalkind,mmckeown,yfu,trin,yanqiz,alavrov,mshahrad,[email protected], [email protected], fxiaohua,mmatl,[email protected] Abstract chipset Industry is building larger, more complex, manycore proces- sors on the back of strong institutional knowledge, but aca- demic projects face difficulties in replicating that scale. To Tile alleviate these difficulties and to develop and share knowl- edge, the community needs open architecture frameworks for simulation, synthesis, and software exploration which Chip support extensibility, scalability, and configurability, along- side an established base of verification tools and supported software. In this paper we present OpenPiton, an open source framework for building scalable architecture research proto- types from 1 core to 500 million cores. OpenPiton is the world’s first open source, general-purpose, multithreaded manycore processor and framework. OpenPiton leverages the industry hardened OpenSPARC T1 core with modifica- Figure 1: OpenPiton Architecture. Multiple manycore chips tions and builds upon it with a scratch-built, scalable uncore are connected together with chipset logic and networks to creating a flexible, modern manycore design. In addition, build large scalable manycore systems. OpenPiton’s cache OpenPiton provides synthesis and backend scripts for ASIC coherence protocol extends off chip. and FPGA to enable other researchers to bring their designs to implementation. OpenPiton provides a complete verifica- tion infrastructure of over 8000 tests, is supported by mature software tools, runs full-stack multiuser Debian Linux, and has been widespread across the industry with manycore pro- is written in industry standard Verilog.
    [Show full text]
  • A Pythonic Approach for Rapid Hardware Prototyping and Instrumentation
    A Pythonic Approach for Rapid Hardware Prototyping and Instrumentation John Clow, Georgios Tzimpragos, Deeksha Dangwal, Sammy Guo, Joseph McMahan and Timothy Sherwood University of California, Santa Barbara, CA, 93106 USA Email: fjclow, gtzimpragos, deeksha, sguo, jmcmahan, [email protected] Abstract—We introduce PyRTL, a Python embedded hardware To achieve these goals, PyRTL intentionally restricts users design language that helps concisely and precisely describe to a set of reasonable digital design practices. PyRTL’s small digital hardware structures. Rather than attempt to infer a and well-defined internal core structure makes it easy to add good design via HLS, PyRTL provides a wrapper over a well- defined “core” set of primitives in a way that empowers digital new functionality that works across every design, including hardware design teaching and research. The proposed system logic transforms, static analysis, and optimizations. Features takes advantage of the programming language features of Python such as elaboration-through-execution (e.g. introspection), de- to allow interesting design patterns to be expressed succinctly, and sign and simulation without leaving Python, and the option encourage the rapid generation of tooling and transforms over to export to, or import from, common HDLs (Verilog-in via a custom intermediate representation. We describe PyRTL as a language, its core semantics, the transform generation interface, Yosys [1] and BLIF-in, Verilog-out) are also supported. More and explore its application to several different design patterns and information about PyRTL’s high level flow can be found in analysis tools. Also, we demonstrate the integration of PyRTL- Figure 1. generated hardware overlays into Xilinx PYNQ platform.
    [Show full text]
  • Implementation of PS2 Keyboard Controller IP Core for on Chip Embedded System Applications
    The International Journal Of Engineering And Science (IJES) ||Volume||2 ||Issue|| 4 ||Pages|| 48-50||2013|| ISSN(e): 2319 – 1813 ISSN(p): 2319 – 1805 Implementation of PS2 Keyboard Controller IP Core for On Chip Embedded System Applications 1, 2, Medempudi Poornima, Kavuri Vijaya Chandra 1,M.Tech Student 2,Associate Proffesor. 1,2,Prakasam Engineering College,Kandukuru(post), Kandukuru(m.d), Prakasam(d.t). -----------------------------------------------------------Abstract----------------------------------------------------- In many case on chip systems are used to reduce the development cycles. Mostly IP (Intellectual property) cores are used for system development. In this paper, the IP core is designed with ALTERA NIOSII soft-core processors as the core and Cyclone III FPGA series as the digital platform, the SOPC technology is used to make the I/O interface controller soft-core such as microprocessors and PS2 keyboard on a chip of FPGA. NIOSII IDE is used to accomplish the software testing of system and the hardware test is completed by ALTERA Cyclone III EP3C16F484C6 FPGA chip experimental platform. The result shows that the functions of this IP core are correct, furthermore it can be reused conveniently in the SOPC system. ---------------------------------------------------------------------------------------------------------------------------------------- Date of Submission: 8 April 2013 Date Of Publication: 25, April.2013 ---------------------------------------------------------------------------------------------------------------------------------------
    [Show full text]