The Past and Future of FPGA Soft Processors


Jan Gray
Gray Research LLC
ReConFig 2014 Keynote, 9 Dec 2014

Copyright © 2014, Gray Research LLC. Licensed under Creative Commons Attribution 4.0 International (CC BY 4.0) license. http://creativecommons.org/licenses/by/4.0/

In Celebration of Soft Processors
• Looking back
• Interlude: “old school” soft processor, revisited
• Looking ahead

New Engines Bring New Design Eras

1. EARLY DAYS

1985-1990: Prehistory
• XC2000, XC3000: not quite up to the job
  – Early multi-FPGA coprocessors
  – ~8-bit MISCs

1991: XC4000

1991: RISC4005 [P. Freidin]
• The first monolithic general purpose FPGA CPU
• “FPGA Devices: 1 Xilinx XC4005 ... On-board RAM: 64K Words (16 bit words) Notes: A 16 bit RISC processor that requires 75% of an XC4005, 16 general registers, 4 stage pipeline, 20 MHz. Can be integrated with peripherals on 1 FPGA, and ISET can be extended. … Includes a macro assembler, gate level simulator, ANSI C compiler, and a debug monitor.” [Steve Guccione: List of FPGA-based Computing Machines, http://www.cmpware.com/io.com/guccione/HW_list.html]
Photos: Philip Freidin

1994-95: Gathering Steam
• Communities: FCCM, comp.arch.fpga [http://fpga-faq.org/archives/index.html]
• Research, commercial interest
  – OneChip, V6502

1995: J32
• 32-bit RISC + “SoC”
• Integer only
• 33 MHz ÷ 2φ
• 4-stage pipeline
• <60% of XC4010
• C++ → XNF → .bit

J32 Microarchitecture

1995-96: XC4000E and FLEX10K

1998: XSOC/xr16 Kit
+ http://www.xess.com/shop/product/xs40-005e/

1998: xr16
• 40 MHz 16-bit RISC, DMA
• XC4005E/XL/Spartan-10, 265 LUTs
• LCC C compiler, simulator
• Building a RISC System in an FPGA, Circuit Cellar series [http://fpgacpu.org/xsoc/cc.html]
• FPGA CPU News, fpga-cpu list

1998: xr16 Datapath, XC4005E Floorplan

1998: Virtex

2000: Nios, SOPC Builder

2000: FPGA Chip Multiprocessors
• 3rd gen 16/32-bit RISC PE: 200/330 LUTs + 1 BRAM
• 8 cores fit in an XCV50E, 60 in an XCV600E

2001-02: Virtex-II/Pro, MicroBlaze, EDK

2002: The End of the Beginning
• Diverse 3rd party soft processors
  – Little MCUs: KCPSM → PicoBlaze
  – Commercial RISCs: ARC, LEON SPARC
  – Legacy ISAs: 6502, Z80, 68000
  – Hobbyist / open source: OpenRISC
  – Language specific cores: Java, Forth, Erlang
  – Teaching: Chalmers, Cornell, Georgia Tech, Hiroshima, Mich. State, NM Tech, Riverside, Tokai, UCSC, Valladolid, Virginia Tech, WUStL
• Nios, MicroBlaze: comprehensive SoC platforms

2001-2014: MicroBlaze Evolution/Configurability
Version:
1. 3-stage, mul, bshift, CoreConnect
2. div, FSL, I$, D$, 150 MHz = 100 DMIPS
3. cache links
4. FPU, debug trace
5. 5-stage pipeline, = 240 DMIPS
6. 3/5-stage
7. MMU, exceptions, Linux
8. AXI4, fault tolerance, 330 MHz, = 400 DMIPS
~4.4× faster / 12 years = +13%/year

Remarkable (But Typical) Applications
[http://forums.xilinx.com/t5/Xcell-Daily-Blog/Mars-Curiosity-Rover-s-MAHLI-images-a-dusty-penny-on-Mars-with/ba-p/369275]
[http://www-robotics.jpl.nasa.gov/publications/Reg_Willson/Edgett_etal_MAHLI_7Jul2012published.pdf]
[http://forums.xilinx.com/t5/Xcell-Daily-Blog/The-Search-for-Gravity-Waves-and-Dark-Energy-Gets-Help-from/ba-p/491048]

The Utility of Soft Processors
• Run existing software, in the FPGA
  – Replace external MCU
  – Run RTOS / Linux / drivers / networking / web server
• Control plane
  – Replace complex state machines
  – Hardware boot, diagnostics, telemetry
• Accelerators
  – Customize with app-specific instruction sets
  – Tightly couple software to accelerators
• Computer architecture research
  – RAMP*, CHERI, FlexPRET

2. DOING IT “OLD SCHOOL”: AN AUSTERE APPROACH TO SOFT PROCESSOR DESIGN

Design Study: How Many Austere 32-bit RISCs Fit in a Modern FPGA?
Virtex-7 690T: (433K 6-LUTs, 1470 BRAMs) ÷ (330 LUTs, 1 BRAM) = ?
• What about…?
  – 6-LUTs? DSPs?
  – NoC? Memory model?
• Let’s see

Planning an Austere RISC PE – Psilos
• Assumptions
  – Xilinx 6-LUTs
  – CMP on-die shared memory
  – Run small kernels (no I$)
  – MicroBlaze integer subset
• Essential: I-fetch, reg file, ALU, PC/branches, lw/sw
• Configurable: lb/sb, lh/sh
• Cluster-shared: mul bshift
• A<<k = A×2^k    A>>k = mulh(A×2^(32−k))
• Pipeline design
  – Critical path: operand regs → ALU → mux → mux → operand regs
  – BRAM for instructions, data
  – IF:DC:EX:MEM
• 2-core clusters
  – Share 4 KB instruction BRAM
  – Share 4 KB local data BRAM
  – Share 32×32=64 mul
  – 2 BRAM 10R×16C slices
  – 1 PE ≤ ½×10×16×4×¾ = 240 LUTs

RISC PE Datapath
[Figure: PE datapath. Blocks: REG FILE (operands A and B), ALU, MULT*, RESULT, NEXTPC, IR, PC. Ports: D_AD, D_IN, D_OUT, DO, IN_AD, IN_IN.]

RISC PE Technology Mapping
[Figure: the same datapath annotated with resource costs. Recoverable annotations: 2 @ 4 LUTs/6 bits; 2 @ 4 LUTs/8 bits; 1 LUT/bit (three places); MULT*: 4 DSPs (clustered).]

Floorplan: <200 6-LUTs, 300 MHz

2-PE Cluster
[Figure: PE 0 and PE 1 attached to a DATA BRAM and an INSN BRAM, each with two AD/DI/DO ports; NoC signals N_AD, N_DI, N_DO; PE signals P0_AD, P0_DO, P1_AD, P1_DO.]
• PEs share 4 KB instruction RAM and 4 KB data RAM.
• NOC interface can write instruction RAM and read/write data RAM.
• PE 0 enjoys dedicated ports. PE 1 shares with NOC. Priority: NOC, PE 1.
• Not shown: PEs 0 and 1 share the non-pipelined 32x32=64 multiplier.

4-PE Cluster
[Figure: 2-PE CLUSTER 0 and 2-PE CLUSTER 1 combined into a 2x2=4 PE cluster; signals P0 to P3 AD/DO, N_AD, N_DI, N_DO, N_I, N_O.]
• 4:1 concentration of NOC accesses.
• NOC interface can write instruction RAM and read/write data RAM.
• Upon receipt of a read-request, a read-response (e.g. write) is injected back into NOC.
• (Not shown: read-response FIFO.)

A 25R×10C 2-D Torus NoC Using 250 5-port Routers (!)
[Figure: a router with N, S, E, W ports plus I and O ports to a 4-PE cluster.]

Hoplite Austere X:Y Router
HOPLITE: AUSTERE X:Y ROUTER (Unidirectional 2D Torus)
[Figure: a router with N, W, E, S ports plus I and O ports to a 4-PE cluster (8 KB data).]
• Wide. Fast. Simple. Tiny.
• Bufferless, but no dropped traffic.
• Dimension Ordered Routing: X>Y. Priority: N, W, I.
• No segmentation/reassembly. No flits. No VCs. Simple flow control.
• Grossly unfair, but deadlock-free.

Phalanx
[Image: http://www.freewebs.com/apworldpolitics/phalanx%20facts%20picture.jpg]
• 25R×10C×4-PEs + routers
  – 1000 PEs / Virtex-7-690T
• OK, but …?
  – Latency?
  – External memory system?
  – Workloads?

3. THE FUTURE OF SOFT PROCESSORS

FPGAs: What’s Next?
• Today
  – 20-28 nm low-power
  – 600 KLUTs, 1000s DSPs
  – 2.5D packaging
    • 1.2 MLUTs
    • Hetero: 28 Gbps serdes die
  – + 2×ARM A9s+SoC
    • ARM tools and IP
• Soon? (speculative)
  – 10-14 nm lower-power but Dennard
  – 2+ MLUTs, 1000s DSP-FPUs, TFLOPS!
  – 2.5D packaging
    • 4+ MLUTs
    • + DRAM, 50+ Gbps serdes
  – “Datacenter edition”?
    • 4-8×64-bit ARM+SoC + DRAM + ?
  – Opportunities and challenges!
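
As a rough check on the arithmetic quoted in the Psilos, Hoplite and Phalanx slides above, the following Python sketch recomputes the figures and the shift-by-multiply identities. It is a sketch under stated assumptions, not material from the talk. The function names are hypothetical. The ¾ factor is read as a target LUT utilization and the 4 as LUTs per slice. The shifts are treated as unsigned 32-bit. The hop count is an idealized, uncontended model in which each ring of the unidirectional torus flows toward increasing index.

```python
# Figures quoted in the slides (Virtex-7 690T design study).
DEVICE_LUTS = 433_000   # "433K 6-LUTs"
DEVICE_BRAMS = 1470
MASK32 = 0xFFFFFFFF


def naive_pe_count(pe_luts=330, pe_brams=1):
    """Upper bound on PEs from raw resources alone (ignores NoC and routing)."""
    return min(DEVICE_LUTS // pe_luts, DEVICE_BRAMS // pe_brams)


def pe_lut_budget(rows=10, cols=16, luts_per_slice=4,
                  pes_per_cluster=2, utilization=0.75):
    """Per-PE LUT budget: 1/2 x 10 x 16 x 4 x 3/4 from the planning slide.

    Assumption: 3/4 is a utilization target, 4 is LUTs per slice.
    """
    return rows * cols * luts_per_slice * utilization / pes_per_cluster


def shl_via_mul(a, k):
    """A << k computed as the low 32 bits of A x 2^k (unsigned, assumed)."""
    return (a * (1 << k)) & MASK32


def shr_via_mulh(a, k):
    """A >> k computed as the high 32 bits of A x 2^(32-k), for 1 <= k <= 31."""
    if not 1 <= k <= 31:
        raise ValueError("k must be in 1..31 so 2^(32-k) fits in 32 bits")
    return ((a & MASK32) * (1 << (32 - k))) >> 32


def torus_hops(src, dst, cols=10, rows=25):
    """Ideal X-then-Y hop count on a unidirectional torus (assumed model).

    Coordinates are (column, row). Deflections under contention are ignored.
    """
    (sx, sy), (dx, dy) = src, dst
    return (dx - sx) % cols + (dy - sy) % rows


if __name__ == "__main__":
    print(naive_pe_count())            # resource-only bound on PE count
    print(pe_lut_budget())             # per-PE LUT budget
    print(25 * 10 * 4)                 # PEs in the 25R x 10C x 4-PE array
    print(torus_hops((3, 5), (2, 5)))  # one column "back" wraps the ring
```

Under this model a destination one column behind the source costs nine hops rather than one, because traffic can only travel one way around each ring. Since the router is bufferless, latency under load may be higher than this ideal count.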

Catapult [ISCA14]
• Microsoft Research + Bing joint study and pilot
• Accelerate Bing search query ranking with FPGAs at datacenter scale
• Doubled throughput /or/ greatly reduced latency

Server Node += FPGA
• Two 8-core Xeon 2.1 GHz CPUs
• 64 GB DRAM
• 4 HDDs, 2 SSDs
• 10 Gb Ethernet

Catapult FPGA Accelerator Card
• Altera Stratix V D5: 172,600 ALMs, 2,014 M20Ks, 1,590 DSPs
• PCIe Gen 3 x8
• 8GB DDR3-1333
• Powered by PCIe slot
• FPGA-FPGA: torus network

1,632 Server Pilot Deployed in a Production Datacenter

FPGA Accelerator for Search Ranking
[Figure: an 8-stage pipeline across FPGA 0 to FPGA 7, each attached to a server. Recoverable labels: RaaS Servers, Route to Head, Head Server, Document Scoring Request, Compute Score, Return Score, FE: Feature Extraction, FFE: Free-Form Expressions.]

FFE: Free Form Expressions
• {Query, Document}
• ~4K Dynamic Features: Occurrences_0 = 7, Occurrences_1 = 4, Tuples_0_1 = 1
• ~2K Synthetic Features: FFE #1 = (2*Occurrences_0 + Occurrences_1) / (2 * Tuples_0_1), so FFE #1 = 9
• L2 Score

FFE Soft Processor Arrays
• Many FFE “programs” → soft processor approach
• 4 hardware threads/core
• 6 cores/cluster
  – Shared input vector
  – Shared pipelined FPUs
• 10 clusters/FPGA
  – 60 cores, 240 threads
• C++ compiler
[Figure: Cluster 0 with Core 0 to Core 5. Other labels: FST, Complex, Output.]

The Design Productivity Challenge
• Catapult Bing ranking: ~20 KLOCs C++ → FPGAs
• RTL design productivity << software development
  – Smaller talent pool, weak tools, bespoke, fragile
• Missing essentials
  – Abstraction builders: languages, types, libraries, services, OS
  – Reuse, composability, portability, longevity
• Often the workload is software already, changes often
  – Expensive to port and maintain

Making HW Dev More Like SW
• Vivado HLS one core
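
Returning to the FFE #1 example on the Free Form Expressions slide, its arithmetic can be checked in a few lines of Python. This is only a sketch. The division between the two parenthesized terms is an inference from the stated result of 9, and the function name `ffe_1` and the dictionary layout are illustrative rather than Catapult's actual representation.

```python
# Dynamic feature values quoted on the slide.
features = {"Occurrences_0": 7, "Occurrences_1": 4, "Tuples_0_1": 1}


def ffe_1(f):
    """Synthetic feature FFE #1, assuming the two terms are divided."""
    numerator = 2 * f["Occurrences_0"] + f["Occurrences_1"]
    denominator = 2 * f["Tuples_0_1"]
    return numerator / denominator


if __name__ == "__main__":
    print(ffe_1(features))  # 9.0, matching "FFE #1 = 9" on the slide
```

Each such expression is a small straight-line floating-point program over a shared feature vector, which fits the slide's description of cores that share an input vector and pipelined FPUs.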