Generating Processors from Specifications of Instruction Sets

Generating Processors from Specications of Instruction Sets Dissertation A thesis submitted to the Faculty of Electrical Engineering, Computer Science, and Mathematics of the University of Paderborn in partial fulllment of the requirements for the degree of Doctor rerum naturalium (Dr. rer. nat.) by Ralf Dreesen Paderborn, September 2011 Supervisor: Prof. Dr. Uwe Kastens (University of Paderborn) Reviewer: Prof. Dr. Uwe Kastens (University of Paderborn) Prof. Dr. Marco Platzner (University of Paderborn) Prof. Dr.-Ing. Ulrich Ruckert¨ (University of Bielefeld) Additional members of committee: Prof. Dr. Franz Rammig (University of Paderborn) Dr. Matthias Fischer (University of Paderborn) Submitted: 15.09.2011 Examination: 6.12.2011 Published: 9.12.2011 Acknowledgments I would like to use this opportunity to show my gratitude to all people who supported me during this thesis. First of all, I would like to thank my adviser Professor Uwe Kastens, for the support and guidance he gave me and which significantly contributed to the quality of this thesis. Professor Kastens allowed me great latitude in the design of the system and has shown me the right direction during numerous fruitful discussions. I want to thank Professor Ulrich Ruckert¨ and Professor Marco Platzner for their interest in my thesis and reviewing. In addition, I would like to thank Professor Ruckert¨ for the constructive cooperation with his research group. I am obliged to many of my colleagues, who accompanied me in the past years. In particular, I owe earnest thankfulness to Dr. Michael Thies for many intensive discussions during coffee break. Special thanks go to Thorsten Junge- blut for helping me in the area of hardware design and giving me access to the respective infrastructure. I would like to show my gratitude to Friederike Sudholt and Paul Jaessing for proof reading, in particular as the subject is far away from their profession. My personal thanks go to Johanna Sudholt, who has supported me throughout this thesis, as she has done for the past 12 years. I am thankful to my parents for giving me advice and virtues for my life. Paderborn, 15.09.2011 Ralf Dreesen Abstract Most digital systems include microprocessors, as they are very flexible. Some of these microprocessors are tailored to the respective area of application, to optimize execution time and power consumption. An efficient development of such processors necessitates convenient development tools. This thesis contributes a specification language called ViDL and two generators for rapid development of application specific processors. A developer specifies an instruction set in ViDL, then generates an instruction set simulator as well as a microarchitectural processor implementation from that specification. ViDL provides powerful concepts to specify a wide range of instruction sets at a high level of abstraction. For instance, the pipeline and its control are not defined by the developer, but contributed automatically by the processor generator. To demonstrate the power of ViDL, real world instruction sets, such as MIPS, ARM, Power and CoreVA have been specified. A ViDL specification is very similar to instruction set manuals, which allows for rapid formalization (e.g. one day for MIPS). The generated high speed simulators execute 60 million instructions per second (Mips) on average with a peak performance of 140Mips. The generated processor implementations yield a clock frequency of 600MHz on a 65nm ST Microelectronics low power technology. The processor generator produces a pipelined n-stage microarchitecture, where n is automatically derived from a user defined target clock frequency (e.g. 600MHz) and instruction semantics. As a result, different microarchitectural implementations (e.g. 2-stage and 6- stage) can be generated at no extra effort. All processors and the simulator are guaranteed to be consistent, as they are generated from the very same specification. There is no way that a ViDL user can break this consistency, neither by mistake nor by intention. To prove the fitness of ViDL for design space exploration (DSE) and instruction set extension (ISE), a new application specific processor has been developed as part of this thesis. The processor called DNACore is based on MIPS and includes a sophisticated SIMD instruction set extension to acceler- ate the Smith-Waterman algorithm. The algorithm is used in bioinformatics to align DNA, RNA and protein sequences. The generated processor is freely programmable and achieves a throughput of 2.6 billion cell updates per second (GCUPS) at an estimated power consumption of only 0.05W. Zusammenfassung Viele digitale Systeme beinhalten Mikroprozessoren, weil sie aufgrund ihrer Pro- grammierbarkeit sehr flexibel einsetzbar sind. Einige dieser Mikroprozessoren sind auf den jeweiligen Anwendungsbereich zugeschnitten, um die Ausfuhrungs-¨ geschwindigkeit von Programmen zu steigern und Energie zu sparen. Eine effizi- ente Entwicklung solcher anwendungsspezifischer Prozessoren bedarf geeigneter Entwicklungswerkzeuge. Dazu trägt diese Arbeit, sowohl durch die Spezifikationssprache ViDL, als auch durch zwei Generatoren zur schnellen Entwicklung anwendungsspezifischer Prozessoren bei. Ein Entwickler spezifiziert einen Instruktionssatz in ViDL und generiert daraus einen Simulator, als auch eine mikroarchitektonische Implemen- tierung eines Prozessors. ViDL verfugt¨ uber¨ mächtige Konzepte, um eine Viel- zahl an Instruktionssätzen auf einem hohen Abstraktionsniveau zu spezifizieren. So werden zum Beispiel die Pipeline und ihre Kontrolle nicht durch den Entwick- ler definiert, sondern automatisch durch den Generator aus dem Instruktions- satz hergeleitet. Um die praktische Anwendbarkeit von ViDL zu demonstrieren, wurden reale Instruktionssätze wie ARM, MIPS, Power und CoreVA spezifiziert und entsprechende Simulatoren und Prozessoren generiert. Eine ViDL Spezifika- tion ist der Notation in Instruktionssatz-Handbuchern¨ sehr ähnlich und erlaubt deshalb eine schnelle Formalisierung (z.B. einen Tag fur¨ MIPS). Der generierte Simulator fuhrt¨ durchschnittlich 60 Millionen Instruktionen pro Sekunde (Mips) aus, mit einer Spitzengeschwindigkeit von 140Mips. Ge- nerierte Prozessoren erreichen eine Geschwindigkeit von ca. 600 MHz fur¨ eine 65nm STMicroelectronics low power Technologie. Der Prozessorgenerator erzeugt eine n-stufige Pipeline, wobei n automatisch anhand einer vom Benut- zer gegebenen Zielfrequenz ermittelt wird (z.B. 600MHz). Folglich können ohne zusätzlichen Aufwand verschiedenste mikroarchitektonische Implementierungen generiert werden. Alle Prozessoren und der Simulator sind garantiert konsistent, weil sie vollautomatisch aus derselben Spezifikation erzeugt werden. Ein ViDL Entwickler kann diese Konsistenz nicht verletzen, weder unbeabsichtigt noch vorsätzlich. Die Eignung ViDLs zur Exploration von Entwurfsräumen und Erweiterung von Instruktionssätzen wurde durch die Entwicklung eines anwendungsspezifi- schen Prozessors im Rahmen dieser Arbeit nachgewiesen. Der Prozessor namens DNACore basiert auf MIPS und wurde durch einen Satz anspruchsvoller SIMD Instruktionen erweitert, die die Ausfuhrung¨ des Smith-Waterman Algorithmus erheblich beschleunigen. Der Algorithmus wird in der Bioinformatik eingesetzt, um DNS-, RNS- und Proteinsequenzen zu analysieren. Der generierte Prozessor ist frei programmierbar und erreicht einen Durchsatz von 2.6 giga cell updates per second (GCUPS) bei einer Leistungsaufnahme von nur 0.05 W Contents 1 Introduction 15 1.1 Motivation .............................. 15 1.2 Overview ............................... 17 1.3 Processor aspects ........................... 18 1.4 Scientific contributions ........................ 19 1.4.1 Language concepts ...................... 19 1.4.2 Generation methods ..................... 20 1.5 Processor implementations ...................... 21 1.6 System overview ........................... 22 1.7 Evolution of ViDL and its generators ................ 23 1.8 Areas of expertise ........................... 24 2 Fundamentals 27 2.1 Instruction set architectures ..................... 27 2.1.1 ARM ............................. 28 2.1.2 MIPS ............................. 29 2.1.3 OISC — One instruction set computer ........... 30 2.2 Design scenarios ........................... 30 2.3 Domain specific languages ...................... 32 2.4 Compilation methods ........................ 34 2.4.1 Front-end ........................... 35 2.4.2 Middle-end .......................... 36 2.4.3 Back-end ........................... 36 2.4.4 Compiler framework ..................... 37 2.5 Type systems ............................. 37 2.5.1 Subtyping ........................... 38 2.5.2 Tuples ............................. 38 2.5.3 Signatures ........................... 38 2.5.4 Polymorphic types ...................... 39 2.5.5 Polymorphic functions .................... 39 2.6 Term rewriting systems ....................... 39 2.6.1 Term .............................. 39 2.6.2 Rewrite rules ......................... 40 2.6.3 Termination and confluence ................. 40 10 CONTENTS 2.7 Microarchitecture ........................... 40 2.7.1 Storages ............................ 41 2.7.2 Datapath ........................... 42 2.7.3 Pipeline ............................ 43 2.7.4 Execution order ........................ 44 2.7.5 Forwarding .......................... 44 2.7.6 Interlocking .......................... 45 2.7.7 Branch prediction ...................... 46 3 Related approaches 49 3.1 Taxonomy of ISA specification languages

Generating Processors from Specifications of Instruction Sets

Structured Recursion for Non-Uniform Data-Types

Binary Search Trees

Recursive Type Generativity

Data Structures Are Ways to Organize Data (Informa- Tion). Examples

Recursive Type Generativity

Compositional Data Types

4. Types and Polymorphism

Computing Lectures

Table of Contents

Scrap Your Boilerplate: a Practical Design Pattern for Generic Programming

Lecture 5. Data Types and Type Classes Functional Programming

Implementing Reflection in Nuprl