A Hybrid-Parallel Architecture for Applications in Bioinformatics

A Hybrid-parallel Architecture for Applications in Bioinformatics

M.Sc. Jan Christian Kässens

Dissertation submitted in 2017 to the Faculty of Engineering of Christian-Albrechts-Universität zu Kiel in partial fulfillment of the requirements for the academic degree Doktor der Ingenieurwissenschaften (Dr.-Ing.).

Kiel Computer Science Series (KCSS) 2017/4, dated 2017-11-08
URN:NBN urn:nbn:de:gbv:8:1-zs-00000335-a3
ISSN 2193-6781 (print version)
ISSN 2194-6639 (electronic version)

Electronic version, updates, and errata available via https://www.informatik.uni-kiel.de/kcss

The author can be contacted via [email protected]

Published by the Department of Computer Science, Kiel University, Computer Engineering Group.

Please cite as: Jan Christian Kässens. A Hybrid-parallel Architecture for Applications in Bioinformatics. Number 2017/4 in Kiel Computer Science Series. Department of Computer Science, 2017. Dissertation, Faculty of Engineering, Kiel University.

@book{Kaessens17,
  author    = {Jan Christian K\"assens},
  title     = {A Hybrid-parallel Architecture for Applications in Bioinformatics},
  publisher = {Department of Computer Science, CAU Kiel},
  year      = {2017},
  number    = {2017/4},
  doi       = {10.21941/kcss/2017/4},
  series    = {Kiel Computer Science Series},
  note      = {Dissertation, Faculty of Engineering, Kiel University.}
}

© 2017 by Jan Christian Kässens

About this Series

The Kiel Computer Science Series (KCSS) covers dissertations, habilitation theses, lecture notes, textbooks, surveys, collections, handbooks, etc. written at the Department of Computer Science at Kiel University. It was initiated in 2011 to support authors in the dissemination of their work in electronic and printed form, without restricting their rights to their work. The series provides a unified appearance and aims at high-quality typography. The KCSS is an open access series; all series titles are electronically available free of charge at the department's website. In addition, authors are encouraged to make printed copies available at a reasonable price, typically with a print-on-demand service. Please visit http://www.informatik.uni-kiel.de/kcss for more information, for instructions on how to publish in the KCSS, and for access to all existing publications.

First reviewer: Prof. Dr. Manfred Schimmler, Christian-Albrechts-Universität zu Kiel
Second reviewer: Prof. Dr. Bertil Schmidt, Johannes-Gutenberg-Universität Mainz

Date of the oral examination: 10 October 2017

Zusammenfassung

Since the introduction of Next Generation Sequencing (NGS) technology, the amount of data produced by genome sequencing has been growing particularly fast. The availability of these data in turn opens up new fields in molecular and cellular biology as well as genetics, which again generate new data. The computational power available in data centers, on the other hand, grows only linearly. In recent years, however, new special-purpose hardware has been finding its way into data centers, in particular graphics processing units (GPUs) and, less commonly, field-programmable gate arrays (FPGAs). Driven by the need for performance, developers began porting standard software to these new systems in order to exploit their special capabilities. Systems that use GPUs and FPGAs together, however, are rarely found.
Particular challenges here are, on the one hand, the deep expertise required in two diametrically different programming paradigms and, on the other, the engineering knowledge needed to handle the communication bottleneck between the participating devices that is typical of heterogeneous systems. For this work, two algorithms from bioinformatics were selected for implementation on a new hybrid-parallel computer architecture and software platform that exploits the strengths of GPUs, FPGAs, and CPUs in a particularly efficient way. It is shown that such a development is not only possible, but that it surpasses the computational power of homogeneous FPGA or GPU systems of comparable size while requiring less energy. Both methods are used to evaluate case/control data from association studies and to analyze them for interactions between two or three genes. Especially in the latter case, the newly gained computational power makes it possible for the first time to examine larger data sets without occupying entire data centers for weeks. The success of the architecture finally led to the development of a high-performance computer based on the concept presented here.

Abstract

Since the advent of Next Generation Sequencing (NGS) technology, the amount of data from whole genome sequencing has been rising fast. In turn, the availability of these resources led to the tapping of whole new research fields in molecular and cellular biology, producing even more data. On the other hand, the available computational power is only increasing linearly. In recent years though, special-purpose high-performance devices started to become prevalent in today's scientific data centers, namely graphics processing units (GPUs) and, to a lesser extent, field-programmable gate arrays (FPGAs). Driven by the need for performance, developers started porting regular applications to GPU frameworks and FPGA configurations to exploit the special operations only these devices can perform in a timely manner. However, applications using both accelerator technologies are still rare. Major challenges in joint GPU/FPGA application development include the required deep knowledge of the associated programming paradigms and the efficient communication between the two types of devices.

In this work, two algorithms from bioinformatics are implemented on a custom hybrid-parallel hardware architecture and a highly concurrent software platform. It is shown that such a solution is not only feasible to develop but also able to outperform implementations on similarly sized GPU or FPGA clusters in terms of both performance and energy consumption. Both algorithms analyze case/control data from genome-wide association studies, using different methods to find interactions between two or three genes. Especially in the latter case, the newly available computational power and method enable, for the first time, analyses of large data sets without occupying whole data centers for weeks. The success of the hybrid-parallel architecture proposal led to the development of a high-end array of FPGA/GPU accelerator pairs to provide even better runtimes and more possibilities.
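To make the computational task concrete: pairwise interaction tests of the kind described above are built on genotype contingency tables collected per SNP pair from the case/control data. The following C++ fragment is a minimal sketch of that counting kernel; it is not code from the dissertation, all names are illustrative, and the actual implementation distributes this work across FPGA/GPU accelerator pairs.

    #include <array>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // counts[a][b][c]: a = genotype of the first SNP (0/1/2),
    // b = genotype of the second SNP (0/1/2), c = 0 control, 1 case.
    using ContingencyTable =
        std::array<std::array<std::array<std::uint32_t, 2>, 3>, 3>;

    // genotypes[snp][sample] holds a genotype coded as 0, 1, or 2.
    ContingencyTable countPair(
        const std::vector<std::vector<std::uint8_t>>& genotypes,
        const std::vector<bool>& isCase,
        std::size_t snpA, std::size_t snpB) {
        ContingencyTable counts{};  // value-initialized to all zeros
        for (std::size_t s = 0; s < isCase.size(); ++s) {
            ++counts[genotypes[snpA][s]][genotypes[snpB][s]][isCase[s] ? 1 : 0];
        }
        return counts;
    }

A test statistic is then evaluated on each table. Because every SNP pair (or triple, in the third-order case) can be counted independently, the workload parallelizes naturally, which is what makes accelerator arrays attractive for it.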
Acknowledgments

I would not have been able to complete this work without the guidance and support of a lot of people, and I would like to take this opportunity to express my gratitude.

Probably the most important person during my academic career was my supervisor, mentor, and leader of the Computer Engineering Group, Manfred Schimmler. From my second semester on, I regularly visited his tutoring lessons and got to know an enthusiastic lecturer who never failed to motivate me, and he obviously didn't fail this time either. Besides his deep knowledge and his readiness to provide funds for prototype systems of uncertain value, this is probably the quality that matters most to me: thank you.

On the administrative side, I wish to thank Brigitte Scheidemann, who did what can best be summarized as "always knowing what to do, even if or especially when nobody else did." I further wish to thank Bertil Schmidt for his continued guidance in publishing articles and for fruitful discussions about new technology, and Andre Franke and David Ellinghaus for supporting my research even though I probably should have been doing other things in my working time.

Every scientist in academia is only as good as his team, and this is especially the case with me. Thank you for being my colleagues, Vasco Grossmann, Sven Koschnicke, and Christoph Starke, for listening to my sometimes confused ideas and for providing me with chocolate and sweets of all kinds. Special thanks go to my tutor, mentor, roommate, and true friend (in temporal order) Lars Wienbrandt. A rather outstanding property of having him as a colleague and friend is the ability to discuss problems, both personal and professional, in a unique, in-depth, sometimes emotional and sometimes rough way, with him always being open and forgiving. Beyond our relationship, he was the one who pulled me into working with FPGAs and eventually hybrid computers. Thanks for what you have done so far.

On the non-academic side, I would like to express my gratitude to my family, who always covered my back, especially to my wife Patricia and my daughter Freya, who had to spend an awful lot of time without me, even when I was technically present, but never complained and always bore with me when things didn't go well. I will never forget that.

Contents

1 Introduction
  1.1 Motivation
  1.2 Related Work
  1.3 Structure of This Work
2 Architecture Essentials
  2.1 Architectures Overview
    2.1.1 Central Processing Units (CPUs)
    2.1.2 Graphics Processing Units (GPUs)
    2.1.3 Field Programmable Gate Arrays (FPGAs)
    2.1.4 Comparison and Suitability Assessment
  2.2 Data Transfer and Communication
    2.2.1 Direct Memory Access
    2.2.2 PCI Express
  2.3 Parallel Computing
    2.3.1 Parallelism and Machine Models
    2.3.2 Systolic Arrays
3 Applications
  3.1 A Primer on Bioinformatics