Object-Oriented Development for Reconfigurable Architectures

Total Page:16

File Type:pdf, Size:1020Kb

Object-Oriented Development for Reconfigurable Architectures Object-Oriented Development for Reconfigurable Architectures Von der Fakultät für Mathematik und Informatik der Technischen Universität Bergakademie Freiberg genehmigte DISSERTATION zur Erlangung des akademischen Grades Doktor Ingenieur Dr.-Ing., vorgelegt von Dipl.-Inf. (FH) Dominik Fröhlich geboren am 19. Februar 1974 Gutachter: Prof. Dr.-Ing. habil. Bernd Steinbach (Freiberg) Prof. Dr.-Ing. Thomas Beierlein (Mittweida) PD Dr.-Ing. habil. Michael Ryba (Osnabrück) Tag der Verleihung: 20. Juni 2007 To my parents. ABSTRACT Reconfigurable hardware architectures have been available now for several years. Yet the application devel- opment for such architectures is still a challenging and error-prone task, since the methods, languages, and tools being used for development are inappropriate to handle the complexity of the problem. This hampers the widespread utilization, despite of the numerous advantages offered by this type of architecture in terms of computational power, flexibility, and cost. This thesis introduces a novel approach that tackles the complexity challenge by raising the level of ab- straction to system-level and increasing the degree of automation. The approach is centered around the paradigms of object-orientation, platforms, and modeling. An application and all platforms being used for its design, implementation, and deployment are modeled with objects using UML and an action language. The application model is then transformed into an implementation, whereby the transformation is steered by the platform models. In this thesis solutions for the relevant problems behind this approach are discussed. It is shown how UML can be used for complete and precise modeling of applications and platforms. Application development is done at the system-level using a set of well-defined, orthogonal platform models. Thereby the core features of object-orientation - data abstraction, encapsulation, inheritance, and polymorphism - are fully supported. Novel algorithms are presented, that allow for an automatic mapping of such application models to the tar- get architecture. Thereby the problems of platform mapping, estimation of implementation characteristics, and synthesis of UML models are discussed. The thesis explores the utilization of platform models for gen- eration of highly optimized implementations in an automatic yet adaptable way. The approach is evaluated by a number of relevant applications. The execution of the generated implementations is supported by a run-time service. This service man- ages the hardware configurations and objects comprising the application. Moreover, it serves as broker for hardware objects. The efficient management of configurations and objects at run-time is discussed and opti- mized life cycles for these entities are proposed. Mechanisms are presented that make the approach portable among different physical hardware architectures. Further, this thesis presents UML profiles and example platforms that support system-level design. These extensions are embodied in a novel type of model compiler. The compiler is accompanied by an imple- mentation of the run-time service. Both have been used to evaluate and improve the presented concepts and algorithms. ACKNOWLEDGEMENTS This work would have never been started or even finished without the support of many people. I am partic- ularly grateful to my advisors Prof. Steinbach and Prof. Beierlein for the long hours they spent with me in discussing the tangible pieces of this work. Especially thankful I am to Prof. Beierlein and my colleague Thomas Oehme for providing me the time and encouragement to finish this thesis. I am greatly indebted to my parents Regina and Elmar for their seamlessly never-ending emotional and financial support, and patience while I was doing my academic endeavours. Without them this work would have never been possible. The development of only the most important parts of the MOCCA project took many man-years of effort. Clearly, this can not be done by a single person in such a restricted time frame. Therefore, I am grateful to Isabel Drost, Alexander Faber, Andre Gauter, Thomas Mathiebe, Henning Riedel, and Vitali Tomm for the discussion and development of parts of MOCCA. Peter Grünberg helped me preparing most of the experimental results. Particular thanks go to Frank Anke for his work in improving the quality of the compiler and performing the experiments on model-driven graphical user interface generation. Many thanks to Jana, Anja, Susi, Tina, Erik, Lars, and many more who offered my their friendship, and provided me an emotional refuge. Particularly I’d like to thank Mandy. We met in a time of my life when I was almost giving up finishing this thesis. It was her emotional support and kind nature that made me smile again and continue to the end. CONTENTS List of Figures .......................................... vii List of Tables ........................................... xi List of Algorithms ........................................ xv List of Listings .......................................... xvii List of Acronyms ......................................... xix List of Symbols ......................................... xxiii 1. Introduction ........................................... 1 1.1 Motivation...................................... ..... 1 1.2 RelatedWork ..................................... .... 3 1.3 ContributionsandRestrictions . ............ 4 1.4 Overview ........................................ ... 5 2. Theoretical and Technological Foundations ........................... 7 2.1 Reconfigurable Computing . ........ 7 2.1.1 Design Space of Reconfigurable Computing . .......... 7 2.1.2 System-LevelDesign. ..... 9 2.2 Hardware Design with High-Level Languages . ............. 14 2.2.1 High-Level Languages for Hardware Design . .......... 14 2.2.2 DesignSpaceofVHDL ............................. 15 2.2.3 HardwareDesignFlow. 16 2.3 The Unified Modeling Language . ........ 18 2.3.1 DesignSpaceofUML .............................. 18 2.3.2 SystemDesignwithUML . 22 3. Model-Driven Architecture for Reconfigurable Computer Architectures ............. 25 3.1 System-LevelDesignwithUML . ........ 25 3.1.1 UML as System-Level Design Language . ....... 25 3.1.2 MOCCAActionLanguage. 26 ii Contents 3.2 Model-Driven Development Methodology . ............ 31 3.2.1 Co-Design, Platform-based Design and Model-Driven Architecture . 31 3.2.2 Model-Driven, Platform-Based System-Level Design . ............... 33 3.3 PlatformsandModels.............................. ....... 36 3.3.1 Use-CaseModel ................................. 36 3.3.2 DesignPlatformModel. ..... 36 3.3.3 DesignModel................................... 41 3.3.4 Implementation Platform Model . ........ 42 3.3.5 ImplementationModel . ..... 49 3.3.6 Deployment Platform Model . ...... 52 3.3.7 DeploymentModel............................... 53 4. Platform Mapping ........................................ 55 4.1 Platform Mapping for Object-Oriented Specifications . .................. 55 4.1.1 Definition of the Mapping Problem . ....... 55 4.1.2 Challenges .................................... 55 4.1.3 StructureoftheDesignSpace . ....... 56 4.2 TargetPlatformArchitecture . ........... 60 4.2.1 ArchitecturalIllusions . ........ 60 4.2.2 ImplementationOptions . ...... 61 4.2.3 ArchitecturalConstraints . ......... 62 4.3 PlatformMappingAlgorithms . ......... 63 4.3.1 A Platform-Based Distributed Mapping Approach . ............ 63 4.3.2 MappingControl ................................ 65 4.3.3 BreedingofMappings . 66 4.3.4 Computation of Candidate Mappings . ........ 73 4.3.5 MappingEvaluation ............................. 76 4.4 Estimation of Model Characteristics . ............. 77 4.4.1 Estimation of Execution Characteristics . ............. 77 4.4.2 Estimation of Implementation Characteristics . ............... 78 5. Synthesis ............................................. 83 5.1 Synthesis for Object-Oriented Specifications . ................. 83 5.1.1 Definition of the Synthesis Problem . ......... 83 5.1.2 UML-to-Implementation Mappings . ........ 83 5.1.3 SynthesisFlow ................................. 84 5.2 Hardware/SoftwareInterface . ........... 84 5.2.1 Hardware Object and Component Life Cycle . ......... 84 5.2.2 Logical Hardware Object Interface . .......... 85 Contents iii 5.3 Implementation of Software Modules . ........... 86 5.3.1 UML-to-C++Mapping. 86 5.3.2 Communication with Hardware Objects . ......... 87 5.4 Implementation of Hardware Modules . ........... 87 5.4.1 UML-to-VHDLMapping . 87 5.4.2 Hardware Implementation of Components . .......... 88 5.4.3 Hardware Implementation of Objects . ......... 94 5.4.4 Hardware Implementation of Behavior . .......... 96 5.5 HardwareObjectModel ............................. 105 6. Run-Time Reconfiguration .................................... 107 6.1 HardwareAbstractionLayer . ......... 107 6.1.1 ResourceManagement . 107 6.1.2 ResourceBroking............................... 108 6.2 Dynamic Object Lifetimes and Communication . ............. 108 6.2.1 ObjectCreation................................ 108 6.2.2 ObjectDestruction . .... ..... ..... ..... ..... .... 110 6.2.3 Communication with Hardware Objects . ......... 111 7. Experimental Results ....................................... 113 7.1 MOCCA-Development Environment . ......... 113 7.1.1 Overview ...................................... 113 7.1.2 Model Compiler for Reconfigurable Architectures . ............. 114 7.2 BooleanNeuralNetworks. ........ 117 7.2.1 ProblemDescription . 117 7.2.2 Experiments
Recommended publications
  • Lecture 7: Synchronous Sequential Logic
    Systems I: Computer Organization and Architecture Lecture 7: Synchronous Sequential Logic Synchronous Sequential Logic • The digital circuits that we have looked at so far are combinational, depending only on the inputs. In practice, most systems have many components that contain memory elements, which require sequential logic. • A sequential circuit contains both a combinational circuit and memory elements, whose output also serves as an input for the combinational circuit. • The binary data stored in the memory elements define the state of the sequential circuit and help determine the conditions for changing the state in the memory elements. Inputs Combinational Outputs circuit Memory elements Sequential Circuits: Synchronous and Asynchronous • There are two types of sequential circuits, classified by their signal’s timing: – Synchronous sequential circuits have behavior that is defined from knowledge of its signal at discrete instants of time. – The behavior of asynchronous sequential circuits depends on the order in which input signals change and are affected at any instant of time. Synchronous Sequential Circuits • Synchronous sequential circuits need to use signal that affect memory elements at discrete time instants. • Synchronization is achieved by a timing device called a master-clock generator which generates a periodic train of clock pulses. Basic Flip-Flop Circuit Using NOR Gates 1 R(reset) Q 0 Set state Clear state 1 Q’ 0 S(set) S R Q Q’ 1 0 1 0 0 0 1 0 (after S = 1, R = 0) 0 1 0 1 0 0 0 1 (after S = 0, R = 1) 1 1 0 0 Basic
    [Show full text]
  • On the Hardware Reduction of Z-Datapath of Vectoring CORDIC
    On the Hardware Reduction of z-Datapath of Vectoring CORDIC R. Stapenhurst*, K. Maharatna**, J. Mathew*, J.L.Nunez-Yanez* and D. K. Pradhan* *University of Bristol, Bristol, UK **University of Southampton, Southampton, UK [email protected] Abstract— In this article we present a novel design of a hardware wordlength larger than 18-bits the hardware requirement of it optimal vectoring CORDIC processor. We present a mathematical becomes more than the classical CORDIC. theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z- In this particular work we propose a formulation to eliminate datapath. Using this technique it is possible to achieve three and 1.5 all the arithmetic operations along the z-datapath for conventional times reduction in the number of registers and adder respectively two-sided vector rotation and thereby reducing the hardware compared to classical CORDIC. Following this, a 16-bit vectoring while increasing the accuracy. Also the resulting architecture CORDIC is designed for the application in Synchronizer for IEEE shows significant hardware saving as the wordlength increases. 802.11a standard. The total area and dynamic power consumption Although we stick to the 2’s complement number system, without of the processor is 0.14 mm2 and 700μW respectively when loss of generality, this formulation can be adopted easily for synthesized in 0.18μm CMOS library which shows its effectiveness redundant arithmetic and higher radix formulation. A 16-bit as a low-area low-power processor. processor developed following this formulation requires 0.14 mm2 area and consumes 700 μW dynamic power when synthesized in 0.18μm CMOS library.
    [Show full text]
  • Synthesis and Verification of Digital Circuits Using Functional Simulation and Boolean Satisfiability
    Synthesis and Verification of Digital Circuits using Functional Simulation and Boolean Satisfiability by Stephen M. Plaza A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Science and Engineering) in The University of Michigan 2008 Doctoral Committee: Associate Professor Igor L. Markov, Co-Chair Assistant Professor Valeria M. Bertacco, Co-Chair Professor John P. Hayes Professor Karem A. Sakallah Associate Professor Dennis M. Sylvester Stephen M. Plaza 2008 c All Rights Reserved To my family, friends, and country ii ACKNOWLEDGEMENTS I would like to thank my advisers, Professor Igor Markov and Professor Valeria Bertacco, for inspiring me to consider various fields of research and providing feedback on my projects and papers. I also want to thank my defense committee for their comments and in- sights: Professor John Hayes, Professor Karem Sakallah, and Professor Dennis Sylvester. I would like to thank Professor David Kieras for enhancing my knowledge and apprecia- tion for computer programming and providing invaluable advice. Over the years, I have been fortunate to know and work with several wonderful stu- dents. I have collaborated extensively with Kai-hui Chang and Smita Krishnaswamy and have enjoyed numerous research discussions with them and have benefited from their in- sights. I would like to thank Ian Kountanis and Zaher Andraus for our many fun discus- sions on parallel SAT. I also appreciate the time spent collaborating with Kypros Constan- tinides and Jason Blome. Although I have not formally collaborated with Ilya Wagner, I have enjoyed numerous discussions with him during my doctoral studies. I also thank my office mates Jarrod Roy, Jin Hu, and Hector Garcia.
    [Show full text]
  • A Logic Synthesis Toolbox for Reducing the Multiplicative Complexity in Logic Networks
    A Logic Synthesis Toolbox for Reducing the Multiplicative Complexity in Logic Networks Eleonora Testa∗, Mathias Soekeny, Heinz Riener∗, Luca Amaruz and Giovanni De Micheli∗ ∗Integrated Systems Laboratory, EPFL, Lausanne, Switzerland yMicrosoft, Switzerland zSynopsys Inc., Design Group, Sunnyvale, California, USA Abstract—Logic synthesis is a fundamental step in the real- correlates to the resistance of the function against algebraic ization of modern integrated circuits. It has traditionally been attacks [10], while the multiplicative complexity of a logic employed for the optimization of CMOS-based designs, as well network implementing that function only provides an upper as for emerging technologies and quantum computing. Recently, bound. Consequently, minimizing the multiplicative complexity it found application in minimizing the number of AND gates in of a network is important to assess the real multiplicative cryptography benchmarks represented as xor-and graphs (XAGs). complexity of the function, and therefore its vulnerability. The number of AND gates in an XAG, which is called the logic net- work’s multiplicative complexity, plays a critical role in various Second, the number of AND gates plays an important role cryptography and security protocols such as fully homomorphic in high-level cryptography protocols such as zero-knowledge encryption (FHE) and secure multi-party computation (MPC). protocols, fully homomorphic encryption (FHE), and secure Further, the number of AND gates is also important to assess multi-party computation (MPC) [11], [12], [6]. For example, the the degree of vulnerability of a Boolean function, and influences size of the signature in post-quantum zero-knowledge signatures the cost of techniques to protect against side-channel attacks.
    [Show full text]
  • 18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures
    18-447 Computer Architecture Lecture 6: Multi-Cycle and Microprogrammed Microarchitectures Prof. Onur Mutlu Carnegie Mellon University Spring 2015, 1/28/2015 Agenda for Today & Next Few Lectures n Single-cycle Microarchitectures n Multi-cycle and Microprogrammed Microarchitectures n Pipelining n Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, … n Out-of-Order Execution n Issues in OoO Execution: Load-Store Handling, … 2 Reminder on Assignments n Lab 2 due next Friday (Feb 6) q Start early! n HW 1 due today n HW 2 out n Remember that all is for your benefit q Homeworks, especially so q All assignments can take time, but the goal is for you to learn very well 3 Lab 1 Grades 25 20 15 10 5 Number of Students 0 30 40 50 60 70 80 90 100 n Mean: 88.0 n Median: 96.0 n Standard Deviation: 16.9 4 Extra Credit for Lab Assignment 2 n Complete your normal (single-cycle) implementation first, and get it checked off in lab. n Then, implement the MIPS core using a microcoded approach similar to what we will discuss in class. n We are not specifying any particular details of the microcode format or the microarchitecture; you can be creative. n For the extra credit, the microcoded implementation should execute the same programs that your ordinary implementation does, and you should demo it by the normal lab deadline. n You will get maximum 4% of course grade n Document what you have done and demonstrate well 5 Readings for Today n P&P, Revised Appendix C q Microarchitecture of the LC-3b q Appendix A (LC-3b ISA) will be useful in following this n P&H, Appendix D q Mapping Control to Hardware n Optional q Maurice Wilkes, “The Best Way to Design an Automatic Calculating Machine,” Manchester Univ.
    [Show full text]
  • Logic Optimization and Synthesis: Trends and Directions in Industry
    Logic Optimization and Synthesis: Trends and Directions in Industry Luca Amaru´∗, Patrick Vuillod†, Jiong Luo∗, Janet Olson∗ ∗ Synopsys Inc., Design Group, Sunnyvale, California, USA † Synopsys Inc., Design Group, Grenoble, France Abstract—Logic synthesis is a key design step which optimizes of specific logic styles and cell layouts. Embedding as much abstract circuit representations and links them to technology. technology information as possible early in the logic optimiza- With CMOS technology moving into the deep nanometer regime, tion engine is key to make advantageous logic restructuring logic synthesis needs to be aware of physical informations early in the flow. With the rise of enhanced functionality nanodevices, opportunities carry over at the end of the design flow. research on technology needs the help of logic synthesis to capture In this paper, we examine the synergy between logic synthe- advantageous design opportunities. This paper deals with the syn- sis and technology, from an industrial perspective. We present ergy between logic synthesis and technology, from an industrial technology aware synthesis methods incorporating advanced perspective. First, we present new synthesis techniques which physical information at the core optimization engine. Internal embed detailed physical informations at the core optimization engine. Experiments show improved Quality of Results (QoR) and results evidence faster timing closure and better correlation better correlation between RTL synthesis and physical implemen- between RTL synthesis and physical implementation. We elab- tation. Second, we discuss the application of these new synthesis orate on synthesis aware technology development, where logic techniques in the early assessment of emerging nanodevices with synthesis enables a fair system-level assessment on emerging enhanced functionality.
    [Show full text]
  • System Design for a Computational-RAM Logic-In-Memory Parailel-Processing Machine
    System Design for a Computational-RAM Logic-In-Memory ParaIlel-Processing Machine Peter M. Nyasulu, B .Sc., M.Eng. A thesis submitted to the Faculty of Graduate Studies and Research in partial fulfillment of the requirements for the degree of Doctor of Philosophy Ottaw a-Carleton Ins titute for Eleceical and Computer Engineering, Department of Electronics, Faculty of Engineering, Carleton University, Ottawa, Ontario, Canada May, 1999 O Peter M. Nyasulu, 1999 National Library Biôiiothkque nationale du Canada Acquisitions and Acquisitions et Bibliographie Services services bibliographiques 39S Weiiington Street 395. nie WeUingtm OnawaON KlAW Ottawa ON K1A ON4 Canada Canada The author has granted a non- L'auteur a accordé une licence non exclusive licence allowing the exclusive permettant à la National Library of Canada to Bibliothèque nationale du Canada de reproduce, ban, distribute or seU reproduire, prêter, distribuer ou copies of this thesis in microform, vendre des copies de cette thèse sous paper or electronic formats. la forme de microficbe/nlm, de reproduction sur papier ou sur format électronique. The author retains ownership of the L'auteur conserve la propriété du copyright in this thesis. Neither the droit d'auteur qui protège cette thèse. thesis nor substantial extracts fkom it Ni la thèse ni des extraits substantiels may be printed or otherwise de celle-ci ne doivent être imprimés reproduced without the author's ou autrement reproduits sans son permission. autorisation. Abstract Integrating several 1-bit processing elements at the sense amplifiers of a standard RAM improves the performance of massively-paralle1 applications because of the inherent parallelism and high data bandwidth inside the memory chip.
    [Show full text]
  • Computer Architecture: Dataflow (Part I)
    Computer Architecture: Dataflow (Part I) Prof. Onur Mutlu Carnegie Mellon University A Note on This Lecture n These slides are from 18-742 Fall 2012, Parallel Computer Architecture, Lecture 22: Dataflow I n Video: n http://www.youtube.com/watch? v=D2uue7izU2c&list=PL5PHm2jkkXmh4cDkC3s1VBB7- njlgiG5d&index=19 2 Some Required Dataflow Readings n Dataflow at the ISA level q Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974. q Arvind and Nikhil, “Executing a Program on the MIT Tagged- Token Dataflow Architecture,” IEEE TC 1990. n Restricted Dataflow q Patt et al., “HPS, a new microarchitecture: rationale and introduction,” MICRO 1985. q Patt et al., “Critical issues regarding HPS, a high performance microarchitecture,” MICRO 1985. 3 Other Related Recommended Readings n Dataflow n Gurd et al., “The Manchester prototype dataflow computer,” CACM 1985. n Lee and Hurson, “Dataflow Architectures and Multithreading,” IEEE Computer 1994. n Restricted Dataflow q Sankaralingam et al., “Exploiting ILP, TLP and DLP with the Polymorphous TRIPS Architecture,” ISCA 2003. q Burger et al., “Scaling to the End of Silicon with EDGE Architectures,” IEEE Computer 2004. 4 Today n Start Dataflow 5 Data Flow Readings: Data Flow (I) n Dennis and Misunas, “A Preliminary Architecture for a Basic Data Flow Processor,” ISCA 1974. n Treleaven et al., “Data-Driven and Demand-Driven Computer Architecture,” ACM Computing Surveys 1982. n Veen, “Dataflow Machine Architecture,” ACM Computing Surveys 1986. n Gurd et al., “The Manchester prototype dataflow computer,” CACM 1985. n Arvind and Nikhil, “Executing a Program on the MIT Tagged-Token Dataflow Architecture,” IEEE TC 1990.
    [Show full text]
  • Introduction to ASIC Design
    ’14EC770 : ASIC DESIGN’ An Introduction Application - Specific Integrated Circuit Dr.K.Kalyani AP, ECE, TCE. 1 VLSI COMPANIES IN INDIA • Motorola India – IC design center • Texas Instruments – IC design center in Bangalore • VLSI India – ASIC design and FPGA services • VLSI Software – Design of electronic design automation tools • Microchip Technology – Offers VLSI CMOS semiconductor components for embedded systems • Delsoft – Electronic design automation, digital video technology and VLSI design services • Horizon Semiconductors – ASIC, VLSI and IC design training • Bit Mapper – Design, development & training • Calorex Institute of Technology – Courses in VLSI chip design, DSP and Verilog HDL • ControlNet India – VLSI design, network monitoring products and services • E Infochips – ASIC chip design, embedded systems and software development • EDAIndia – Resource on VLSI design centres and tutorials • Cypress Semiconductor – US semiconductor major Cypress has set up a VLSI development center in Bangalore • VDAT 2000 – Info on VLSI design and test workshops 2 VLSI COMPANIES IN INDIA • Sandeepani – VLSI design training courses • Sanyo LSI Technology – Semiconductor design centre of Sanyo Electronics • Semiconductor Complex – Manufacturer of microelectronics equipment like VLSIs & VLSI based systems & sub systems • Sequence Design – Provider of electronic design automation tools • Trident Techlabs – Power systems analysis software and electrical machine design services • VEDA IIT – Offers courses & training in VLSI design & development • Zensonet Technologies – VLSI IC design firm eg3.com – Useful links for the design engineer • Analog Devices India Product Development Center – Designs DSPs in Bangalore • CG-CoreEl Programmable Solutions – Design services in telecommunications, networking and DSP 3 Physical Design, CAD Tools. • SiCore Systems Pvt. Ltd. 161, Greams Road, ... • Silicon Automation Systems (India) Pvt. Ltd. ( SASI) ... • Tata Elxsi Ltd.
    [Show full text]
  • Designing a RISC CPU in Reversible Logic
    Designing a RISC CPU in Reversible Logic Robert Wille Mathias Soeken Daniel Große Eleonora Schonborn¨ Rolf Drechsler Institute of Computer Science, University of Bremen, 28359 Bremen, Germany frwille,msoeken,grosse,eleonora,[email protected] Abstract—Driven by its promising applications, reversible logic In this paper, the recent progress in the field of reversible cir- received significant attention. As a result, an impressive progress cuit design is employed in order to design a complex system, has been made in the development of synthesis approaches, i.e. a RISC CPU composed of reversible gates. Starting from implementation of sequential elements, and hardware description languages. In this paper, these recent achievements are employed a textual specification, first the core components of the CPU in order to design a RISC CPU in reversible logic that can are identified. Previously introduced approaches are applied execute software programs written in an assembler language. The next to realize the respective combinational and sequential respective combinational and sequential components are designed elements. More precisely, the combinational components are using state-of-the-art design techniques. designed using the reversible hardware description language SyReC [17], whereas for the realization of the sequential I. INTRODUCTION elements an external controller (as suggested in [16]) is utilized. With increasing miniaturization of integrated circuits, the Plugging the respective components together, a CPU design reduction of power dissipation has become a crucial issue in results which can process software programs written in an today’s hardware design process. While due to high integration assembler language. This is demonstrated in a case study, density and new fabrication processes, energy loss has sig- where the execution of a program determining Fibonacci nificantly been reduced over the last decades, physical limits numbers is simulated.
    [Show full text]
  • Datapath Design I Systems I
    Systems I Datapath Design I Topics Sequential instruction execution cycle Instruction mapping to hardware Instruction decoding Overview How do we build a digital computer? Hardware building blocks: digital logic primitives Instruction set architecture: what HW must implement Principled approach Hardware designed to implement one instruction at a time Plus connect to next instruction Decompose each instruction into a series of steps Expect that most steps will be common to many instructions Extend design from there Overlap execution of multiple instructions (pipelining) Later in this course Parallel execution of many instructions In more advanced computer architecture course 2 Y86 Instruction Set Byte 0 1 2 3 4 5 nop 0 0 addl 6 0 halt 1 0 subl 6 1 rrmovl rA, rB 2 0 rA rB andl 6 2 irmovl V, rB 3 0 8 rB V xorl 6 3 rmmovl rA, D(rB) 4 0 rA rB D jmp 7 0 mrmovl D(rB), rA 5 0 rA rB D jle 7 1 OPl rA, rB 6 fn rA rB jl 7 2 jXX Dest 7 fn Dest je 7 3 call Dest 8 0 Dest jne 7 4 ret 9 0 jge 7 5 pushl rA A 0 rA 8 jg 7 6 popl rA B 0 rA 8 3 Building Blocks fun Combinational Logic A A = Compute Boolean functions of L U inputs B 0 Continuously respond to input changes MUX Operate on data and implement 1 control Storage Elements valA A srcA Store bits valW Register W file dstW Addressable memories valB B Non-addressable registers srcB Clock Loaded only as clock rises Clock 4 Hardware Control Language Very simple hardware description language Can only express limited aspects of hardware operation Parts we want to explore and modify Data
    [Show full text]
  • A Dataflow Architecture for Beamforming Operations
    A dataflow architecture for beamforming operations Msc Assignment by Rinse Wester Supervisors: dr. ir. Andr´eB.J. Kokkeler dr. ir. Jan Kuper ir. Kenneth Rovers Anja Niedermeier, M.Sc ir. Andr´eW. Gunst dr. Albert-Jan Boonstra Computer Architecture for Embedded Systems Faculty of EEMCS University of Twente December 10, 2010 Abstract As current radio telescopes get bigger and bigger, so does the demand for processing power. General purpose processors are considered infeasible for this type of processing which is why this thesis investigates the design of a dataflow architecture. This architecture is able to execute the operations which are common in radio astronomy. The architecture presented in this thesis, the FlexCore, exploits regularities found in the mathematics on which the radio telescopes are based: FIR filters, FFTs and complex multiplications. Analysis shows that there is an overlap in these operations. The overlap is used to design the ALU of the architecture. However, this necessitates a way to handle state of the FIR filters. The architecture is not only able to execute dataflow graphs but also uses the dataflow techniques in the implementation. All communication between modules of the architecture are based on dataflow techniques i.e. execution is triggered by the availability of data. This techniques has been implemented using the hardware description language VHDL and forms the basis for the FlexCore design. The FlexCore is implemented using the TSMC 90 nm tech- nology. The design is done in two phases, first a design with a standard ALU is given which acts as reference design, secondly the Extended FlexCore is presented.
    [Show full text]