Efficient Hardware for Low Latency Applications

Total Page:16

File Type:pdf, Size:1020Kb

Efficient Hardware for Low Latency Applications EFFICIENT HARDWARE FOR LOW LATENCY APPLICATIONS Inauguraldissertation zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften der Universität Mannheim vorgelegt von Christian Harald Leber (Diplom-Informatiker der Technischen Informatik) aus Ludwigshafen Mannheim, 2012 Dekan: Professor Dr. Heinz Jürgen Müller, Universität Mannheim Referent: Professor Dr. Ulrich Brüning, Universität Heidelberg Korreferent: Professor Dr. Holger Fröning, Universität Heidelberg Tag der mündlichen Prüfung: 20.8.2012 Für meine Eltern Abstract Abstract The design and development of application specific hardware structures has a high degree of complexity. Logic resources are nowadays often not the limit anymore, but the develop- ment time and especially the time it takes for testing are today significant factors that limit the possibility to implement functionality in hardware. In the following techniques are pre- sented that help to improve the capabilities to design complex hardware. The first part presents a generator (RFS) which allows defining control and status structures for hardware designs using an abstract high level language. These structures are usually called register files. Special features are presented for the efficient implementation of counters and error reporting facilities. Native support for the partitioning of such register files is an important trait of RFS. A novel method to inform host systems very efficiently about changes in the register files is presented in the second part. It makes use of a microcode programmable hardware unit. The microcode is auto generated based on output from RFS. The general principle is to push changed information from the register file to the host systems instead of relying on the stan- dard pull approach. In the third part a highly efficient, fully pipelined address translation mechanism for remote memory access in HPC interconnection networks is presented. It is able of performing up to one translation per cycle and provides efficient interfaces for software to control the content of its n-way set associative translation cache. The mechanism is centered on a pipeline with a new concept to resolve dependency problems. The last part of this thesis addresses the problem of sending TCP messages for a low latency trading application using a hybrid TCP stack implementation that consists of hardware and software components. This separation allows low latency while keeping the complexity under control. Furthermore, a simulation environment for the TCP stack is presented, which is used to test the stack within a hardware simulator. This hardware-software co-simulation allows the acceleration of testing and debugging of complex hardware designs. It allows employing standard components like an operating system kernel instead of implementing the functionality again within the simulation environment. Zusammenfassung Zusam- men- fassung Das Design und die Entwicklung applikationsspezifischer Hardwarestrukturen bedingen einen hohen Grad an Komplexität. Die Menge an Logik Ressourcen ist oft nicht mehr das Limit, sondern die Entwicklungszeit und im speziellen die Zeit, welche zum Testen benötigt wird, sind heute ein signifikanter Faktor welcher die Möglichkeiten Logik in Hardware zu implementieren limitiert. Im Folgenden werden Techniken präsentiert, welche die Fähigkei- ten komplexe Hardware zu entwickeln verbessern. Der erste Teil präsentiert einen Generator, welcher es ermöglicht Kontroll- und Status- Strukturen für Hardware Designs mittels einer abstrakten Hochsprache zu beschreiben. Diese werden für gewöhnlich Registerfiles genannt. Spezielle Funktionen für die effiziente Implementierung von Zählern und Einrichtungen zur Indikation von Fehlern werden vorge- stellt. Zentrales Element dieser Entwicklung ist die Fähigkeit die Partitionierung von Regi- sterfiles direkt zu unterstützen. Eine neuartige Methode um das Hostsystem sehr effizient über Änderungen in besagtem Registerfile zu informieren wird im zweiten Teil präsentiert. Dafür wird eine durch Mikro- code programmierbare Hardwareeinheit genutzt. Der Mikrocode wird basierend auf der Ausgabe des Registerfile Generators automatisch erstellt. Das generelle Prinzip beruht dar- auf geänderte Informationen vom Registerfile zum Hostsystem zu schicken, anstatt sich auf den Standardansatz zu stützen, dass das Hostsystem die Daten abholt. In einem dritten Teil wird ein hocheffizienter Mechanismus zur Übersetzung von Adressen für den Zugriff auf entfernte Speicher in HPC Verbindungsnetzwerken gezeigt. Der Mecha- nismus ist in der Lage bis zu eine Übersetzung pro Takt abzuarbeiten und stellt effiziente Schnittstellen zur Verfügung, so dass Software den Inhalt des n-fach assoziativen Zwi- schenspeichers für Übersetzungen kontrollieren kann. Dieser Mechanismus benutzt eine Pipeline mit einer neuen Lösung um Abhängigkeiten aufzulösen. Der letzte Teil dieser Dissertation befasst sich mit dem Problem TCP Nachrichten mit einer möglichst niedrigen Latenz für den Hochgeschwindigkeitshandel an der Börse zu versen- den. Dazu wird eine hybride TCP Implementation, welche aus Hardware und Software Komponenten besteht, präsentiert. Diese Trennung ermöglicht niedrige Latenz und erlaubt die Komplexität unter Kontrolle zu behalten. Weiterhin wird eine Hardware-Software Kosi- mulation für die besagte TCP Implementation präsentiert, welche dazu dient um diese innerhalb eines Hardware Simulators zusammen mit Standard Komponenten wie einem Betriebssystemkern zu testen, anstatt die Funktionalität innerhalb des Simulator neu zu implementieren. Contents TOC Contents I 1 Introduction 1 1.1 Problems, Goals and Solutions . 3 1.2 Environment of the Hardware Structures . 4 1.2.1 HyperTransport on Chip Protocol . 5 1.2.2 Interface to the Host System . 7 1.2.3 Hardware . 8 1.3 Application Specific Hardware Structures . 9 1.3.1 EXTOLL . 9 1.3.2 Trading Accelerator . 14 2 Register File Generation 17 2.1 Introduction . 17 2.1.1 Register File . 17 2.1.2 Auto Generation . 20 2.1.3 Hierarchical Decomposition . 23 2.2 State of the Art and Motivation . 27 2.2.1 Denali Blueprint . 27 2.2.2 Others . 30 2.2.3 Motivation . 30 2.3 Design Decisions . 31 2.3.1 Features . 31 2.3.2 Input Method and Data Format . 32 2.3.3 Output Data Formats . 34 2.3.4 HW/SW Interface Consistency . 38 2.3.5 Interface . 40 I 2.4 XML Specification and Features . 42 2.4.1 Basic XML Example of a RF . 44 2.4.2 Hardware Register - hwreg . 47 2.4.3 64 bit Register - reg64 . 61 2.4.4 Aligner . 63 2.4.5 Ramblock . 65 2.4.6 Regroot / Regroot Instance . 71 2.4.7 Multi Element Instantiation . 75 2.4.8 Placeholder . 77 2.4.9 Trigger Events . 78 2.4.10 Annotated XML . 80 2.5 Implementation of RFS . 81 2.6 Supporting Infrastructure . 82 2.6.1 HyperTransport on Chip to RF Converter . 83 2.6.2 Debug Port . 83 2.6.3 OVM Components . 84 2.6.4 RF Driver Generator . 85 2.7 Application of RFS . 85 2.7.1 Shift Register Read and Write Logic . 85 2.7.2 EXTOLL RF . 87 2.8 Optimization Efforts . 88 2.8.1 Step 1 . 92 2.8.2 Step 2 . 92 2.8.3 Step 3 . 93 2.8.4 Step 4 . 93 2.8.5 Results . 94 3 System Notification Queue 95 3.1 Introduction . 95 3.2 Motivation . 96 3.2.1 Performance Counters . 97 3.3 Register File . 97 3.3.1 Trigger Dump Groups . 99 3.4 Functionality . 100 3.5 Architecture . 101 3.5.1 Control Logic . 104 II 3.6 Hardware Environment and Components . 108 3.6.1 HyperTransport on Chip to RF Converter . 109 3.6.2 Trigger Arbiter and Control . 110 3.6.3 Microcode Engine . 113 3.6.4 Register File Handler . 114 3.6.5 HT Send Unit . 115 3.6.6 SNQ Register File . 122 3.7 Microcode Engine Instruction Set . 122 3.7.1 JUMP Arbiter . 123 3.7.2 JUMP . ..
Recommended publications
  • EP Activity Report 2014
    EUROPRACTICE IC SERVICE THE RIGHT COCKTAIL OF ASIC SERVICES EUROPRACTICE IC SERVICE OFFERS YOU A PROVEN ROUTE TO ASICS THAT FEATURES: • Low-cost ASIC prototyping • Flexible access to silicon capacity for small and medium volume production quantities • Partnerships with leading world-class foundries, assembly and testhouses • Wide choice of IC technologies • Distribution and full support of high-quality cell libraries and design kits for the most popular CAD tools • RTL-to-Layout service for deep-submicron technologies • Front-end ASIC design through Alliance Partners Industry is rapidly discovering the benefits of using the EUROPRACTICE IC service to help bring new product designs to market quickly and cost-effectively. The EUROPRACTICE ASIC route supports especially those companies who don’t need always the full range of services or high production volumes. Those companies will gain from the flexible access to silicon prototype and production capacity at leading foundries, design services, high quality support and manufacturing expertise that includes IC manufacturing, packaging and test. This you can get all from EUROPRACTICE IC service, a service that is already established for 20 years in the market. THE EUROPRACTICE IC SERVICES ARE OFFERED BY THE FOLLOWING CENTERS: • imec, Leuven (Belgium) • Fraunhofer-Institut fuer Integrierte Schaltungen (Fraunhofer IIS), Erlangen (Germany) This project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement N° 610018. This funding is exclusively used to support European universities and research laboratories. By courtesy of imec FOREWORD Dear EUROPRACTICE customers, Time goes on. A year passes very quickly and when we look around us we see a tremendous rapidly changing world.
    [Show full text]
  • Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences
    Downloaded from orbit.dtu.dk on: Oct 02, 2021 Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences Hindborg, Andreas Erik; Schleuniger, Pascal; Jensen, Nicklas Bo; Karlsson, Sven Published in: Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing (DASIP) Publication date: 2014 Link back to DTU Orbit Citation (APA): Hindborg, A. E., Schleuniger, P., Jensen, N. B., & Karlsson, S. (2014). Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences. In A. Morawiec, & J. Hinderscheit (Eds.), Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing (DASIP) IEEE. General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Hardware Realization of an FPGA Processor – Operating System Call Offload and Experiences Andreas Erik Hindborg, Pascal Schleuniger Nicklas Bo Jensen, Sven Karlsson DTU Compute – Technical University of Denmark fahin,pass,nboa,[email protected] Abstract—Field-programmable gate arrays, FPGAs, are at- speedup of up to 64% over a Xilinx MicroBlaze based baseline tractive implementation platforms for low-volume signal and system.
    [Show full text]
  • A Full-System VM-HDL Co-Simulation Framework for Servers with Pcie
    A Full-System VM-HDL Co-Simulation Framework for Servers with PCIe-Connected FPGAs Shenghsun Cho, Mrunal Patel, Han Chen, Michael Ferdman, Peter Milder Stony Brook University ABSTRACT the most common connection choice, due to its wide availability in The need for high-performance and low-power acceleration tech- server systems. Today, the majority of FPGAs in data centers are nologies in servers is driving the adoption of PCIe-connected FPGAs communicating with the host system through PCIe [2, 12]. in datacenter environments. However, the co-development of the Unfortunately, developing applications for PCIe-connected application software, driver, and hardware HDL for server FPGA FPGAs is an extremely slow and painful process. It is challeng- platforms remains one of the fundamental challenges standing in ing to develop and debug the host software and the FPGA hardware the way of wide-scale adoption. The FPGA accelerator development designs at the same time. Moreover, the hardware designs running process is plagued by a lack of comprehensive full-system simu- on the FPGAs provide little to no visibility, and even small changes lation tools, unacceptably slow debug iteration times, and limited to the hardware require hours to go through FPGA synthesis and visibility into the software and hardware at the time of failure. place-and-route. The development process becomes even more diffi- In this work, we develop a framework that pairs a virtual ma- cult when operating system and device driver changes are required. chine and an HDL simulator to enable full-system co-simulation of Changes to any part of the system (the OS kernel, the loadable ker- a server system with a PCIe-connected FPGA.
    [Show full text]
  • Co-Simulation Between Cλash and Traditional Hdls
    MASTER THESIS CO-SIMULATION BETWEEN CλASH AND TRADITIONAL HDLS Author: John Verheij Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) Computer Architecture for Embedded Systems (CAES) Exam committee: Dr. Ir. C.P.R. Baaij Dr. Ir. J. Kuper Dr. Ir. J.F. Broenink Ir. E. Molenkamp August 19, 2016 Abstract CλaSH is a functional hardware description language (HDL) developed at the CAES group of the University of Twente. CλaSH borrows both the syntax and semantics from the general-purpose functional programming language Haskell, meaning that circuit de- signers can define their circuits with regular Haskell syntax. CλaSH contains a compiler for compiling circuits to traditional hardware description languages, like VHDL, Verilog, and SystemVerilog. Currently, compiling to traditional HDLs is one-way, meaning that CλaSH has no simulation options with the traditional HDLs. Co-simulation could be used to simulate designs which are defined in multiple lan- guages. With co-simulation it should be possible to use CλaSH as a verification language (test-bench) for traditional HDLs. Furthermore, circuits defined in traditional HDLs, can be used and simulated within CλaSH. In this thesis, research is done on the co-simulation of CλaSH and traditional HDLs. Traditional hardware description languages are standardized and include an interface to communicate with foreign languages. This interface can be used to include foreign func- tions, or to make verification and co-simulation possible. Because CλaSH also has possibilities to communicate with foreign languages, through Haskell foreign function interface (FFI), it is possible to set up co-simulation. The Verilog Procedural Interface (VPI), as defined in the IEEE 1364 standard, is used to set-up the communication and to control a Verilog simulator.
    [Show full text]
  • Co-Emulation of Scan-Chain Based Designs Utilizing SCE-MI Infrastructure
    UNLV Theses, Dissertations, Professional Papers, and Capstones 5-1-2014 Co-Emulation of Scan-Chain Based Designs Utilizing SCE-MI Infrastructure Bill Jason Pidlaoan Tomas University of Nevada, Las Vegas Follow this and additional works at: https://digitalscholarship.unlv.edu/thesesdissertations Part of the Computer Engineering Commons, Computer Sciences Commons, and the Electrical and Computer Engineering Commons Repository Citation Tomas, Bill Jason Pidlaoan, "Co-Emulation of Scan-Chain Based Designs Utilizing SCE-MI Infrastructure" (2014). UNLV Theses, Dissertations, Professional Papers, and Capstones. 2152. http://dx.doi.org/10.34917/5836171 This Thesis is protected by copyright and/or related rights. It has been brought to you by Digital Scholarship@UNLV with permission from the rights-holder(s). You are free to use this Thesis in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s) directly, unless additional rights are indicated by a Creative Commons license in the record and/ or on the work itself. This Thesis has been accepted for inclusion in UNLV Theses, Dissertations, Professional Papers, and Capstones by an authorized administrator of Digital Scholarship@UNLV. For more information, please contact [email protected]. CO-EMULATION OF SCAN-CHAIN BASED DESIGNS UTILIZING SCE-MI INFRASTRUCTURE By: Bill Jason Pidlaoan Tomas Bachelor‟s Degree of Electrical Engineering Auburn University 2011 A thesis submitted
    [Show full text]
  • An Open-Source Python-Based Hardware Generation, Simulation
    An Open-Source Python-Based Hardware Generation, Simulation, and Verification Framework Shunning Jiang Christopher Torng Christopher Batten School of Electrical and Computer Engineering, Cornell University, Ithaca, NY { sj634, clt67, cbatten }@cornell.edu pytest coverage.py hypothesis ABSTRACT Host Language HDL We present an overview of previously published features and work (Python) (Verilog) in progress for PyMTL, an open-source Python-based hardware generation, simulation, and verification framework that brings com- FL DUT pelling productivity benefits to hardware design and verification. CL DUT generate Verilog synth RTL DUT PyMTL provides a natural environment for multi-level modeling DUT' using method-based interfaces, features highly parametrized static Sim FPGA/ elaboration and analysis/transform passes, supports fast simulation cosim ASIC and property-based random testing in pure Python environment, Test Bench Sim and includes seamless SystemVerilog integration. Figure 1: PyMTL’s workflow – The designer iteratively refines the hardware within the host Python language, with the help from 1 INTRODUCTION pytest, coverage.py, and hypothesis. The same test bench is later There have been multiple generations of open-source hardware reused for co-simulating the generated Verilog. FL = functional generation frameworks that attempt to mitigate the increasing level; CL = cycle level; RTL = register-transfer level; DUT = design hardware design and verification complexity. These frameworks under test; DUT’ = generated DUT; Sim = simulation. use a high-level general-purpose programming language to ex- press a hardware-oriented declarative or procedural description level (RTL), along with verification and evaluation using Python- and explicitly generate a low-level HDL implementation. Our pre- based simulation and the same TB.
    [Show full text]
  • High Performance Computing Systems: Status and Outlook
    Acta Numerica (2012), pp. 001– c Cambridge University Press, 2012 doi:10.1017/S09624929 Printed in the United Kingdom High Performance Computing Systems: Status and Outlook J.J. Dongarra University of Tennessee and Oak Ridge National Laboratory and University of Manchester [email protected] A.J. van der Steen NCF/HPC Research L.J. Costerstraat 5 6827 AR Arnhem The Netherlands [email protected] CONTENTS 1 Introduction 1 2 The main architectural classes 2 3 Shared-memory SIMD machines 6 4 Distributed-memory SIMD machines 8 5 Shared-memory MIMD machines 10 6 Distributed-memory MIMD machines 13 7 ccNUMA machines 17 8 Clusters 18 9 Processors 20 10 Computational accelerators 38 11 Networks 53 12 Recent Trends in High Performance Computing 59 13 HPC Challenges 72 References 91 1. Introduction High Performance computer systems can be regarded as the most power- ful and flexible research instruments today. They are employed to model phenomena in fields so various as climatology, quantum chemistry, compu- tational medicine, High-Energy Physics and many, many other areas. In 2 J.J. Dongarra & A.J. van der Steen this article we present some of the architectural properties and computer components that make up the present HPC computers and also give an out- look on the systems to come. For even though the speed of computers has increased tremendously over the years (often a doubling in speed every 2 or 3 years), the need for ever faster computers is still there and will not disappear in the forseeable future. Before going on to the descriptions of the machines themselves, it is use- ful to consider some mechanisms that are or have been used to increase the performance.
    [Show full text]
  • Technical Portion
    50 50 YEARS OF INNOVATION DESIGN AUTOMATION CONFERENCE Celebrating 50 Years of Innovation! www.DAC.com JUNE 2-6, 2013 AUSTIN CONVENTION CENTER - AUSTIN, TX SPONSORED BY: IN TECHNICAL COOPERATION WITH: 1 2 TABLE OF CONTENTS General Chair’s Welcome ....................................................................................................................... 4 Sponsors ................................................................................................................................................. 5 Important Information ............................................................................................................................ 6 Networking Receptions .......................................................................................................................... 7 Keynotes ....................................................................................................................................8,9,13-15 Kickin’ it up in Austin Party .................................................................................................................. 10 Global Forum ........................................................................................................................................ 11 Awards .................................................................................................................................................. 12 Technical Sessions ..........................................................................................................................16-36
    [Show full text]
  • Intro to Programmable Logic and Fpgas
    CS 296-33: Intro to Programmable Logic and FPGAs ADEL EJJEH UNIVERSITY OF ILLINOIS URBANA-CHAMPAIGN © Adel Ejjeh, UIUC, 2015 2 Digital Logic • In CS 233: • Logic Gates • Build Logic Circuits • Sum of Products ?? F = (A’.B)+(B.C)+(A.C’) A B Black F Box C © Adel Ejjeh, UIUC, 2015 3 Programmable Logic Devices (PLDs) PLD PLA PAL CPLD FPGA (Programmable (Programmable (Complex PLD) (Field Prog. Logic Array) Array Logic) Gate Array) •2-level structure of •Enhanced PLAs •For large designs •Has a much larger # of AND-OR gates with reduced costs •Collection of logic blocks with programmable multiple PLDs with •Larger interconnection connections an interconnection networK structure •Largest manufacturers: Xilinx - Altera Slide taken from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 4 Combinational Programmable Logic Devices PLAs, CPLDs © Adel Ejjeh, UIUC, 2015 5 Programmable Logic Arrays (PLAs) • 2-level AND-OR device • Programmable connections • Used to generate SOP • Ex: 4x3 PLA Slide adapted from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 6 PLAs contd • O1 = I1.I2’ + I4.I3’ • O2 = I2.I3.I4’ + I4.I3’ • O3 = I1.I2’ + I2.I1’ Slide adapted from Prof. Chehab, American University of Beirut © Adel Ejjeh, UIUC, 2015 7 Programmable Array Logic (PALs) • More Versatile than PLAs • User Programmable AND array followed by fixed OR gates • Flip-flops/Buffers with feedback transforming output ports into I/O ports © Adel Ejjeh, UIUC, 2015 8 Complex PLDs (CPLD) • Programmable PLD blocks (PALs) I/O Block I/O I/O
    [Show full text]
  • Multiprocessing Contents
    Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References .............................................
    [Show full text]
  • A Pythonic Approach for Rapid Hardware Prototyping and Instrumentation
    A Pythonic Approach for Rapid Hardware Prototyping and Instrumentation John Clow, Georgios Tzimpragos, Deeksha Dangwal, Sammy Guo, Joseph McMahan and Timothy Sherwood University of California, Santa Barbara, CA, 93106 USA Email: fjclow, gtzimpragos, deeksha, sguo, jmcmahan, [email protected] Abstract—We introduce PyRTL, a Python embedded hardware To achieve these goals, PyRTL intentionally restricts users design language that helps concisely and precisely describe to a set of reasonable digital design practices. PyRTL’s small digital hardware structures. Rather than attempt to infer a and well-defined internal core structure makes it easy to add good design via HLS, PyRTL provides a wrapper over a well- defined “core” set of primitives in a way that empowers digital new functionality that works across every design, including hardware design teaching and research. The proposed system logic transforms, static analysis, and optimizations. Features takes advantage of the programming language features of Python such as elaboration-through-execution (e.g. introspection), de- to allow interesting design patterns to be expressed succinctly, and sign and simulation without leaving Python, and the option encourage the rapid generation of tooling and transforms over to export to, or import from, common HDLs (Verilog-in via a custom intermediate representation. We describe PyRTL as a language, its core semantics, the transform generation interface, Yosys [1] and BLIF-in, Verilog-out) are also supported. More and explore its application to several different design patterns and information about PyRTL’s high level flow can be found in analysis tools. Also, we demonstrate the integration of PyRTL- Figure 1. generated hardware overlays into Xilinx PYNQ platform.
    [Show full text]
  • Totalview Reference Guide
    TotalView Reference Guide version 8.8 Copyright © 2007–2010 by TotalView Technologies. All rights reserved Copyright © 1998–2007 by Etnus LLC. All rights reserved. Copyright © 1996–1998 by Dolphin Interconnect Solutions, Inc. Copyright © 1993–1996 by BBN Systems and Technologies, a division of BBN Corporation. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, elec- tronic, mechanical, photocopying, recording, or otherwise without the prior written permission of TotalView Technologies. Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DFARS 252.227-7013. TotalView Technologies has prepared this manual for the exclusive use of its customers, personnel, and licensees. The infor- mation in this manual is subject to change without notice, and should not be construed as a commitment by TotalView Tech- nologies. TotalView Technologies assumes no responsibility for any errors that appear in this document. TotalView and TotalView Technologies are registered trademarks of TotalView Technologies. TotalView uses a modified version of the Microline widget library. Under the terms of its license, you are entitled to use these modifications. The source code is available at: ftp://ftp.totalviewtech.com/support/toolworks/Microline_totalview.tar.Z. All other brand names are the trademarks of their respective holders. Book Overview part I - CLI Commands 1
    [Show full text]