Parallelization in Co-Compilation for Configurable Accelerators

A Host / Accelerator Partitioning Compilation Method

J. Becker
Microelectronics Systems Institute
Technische Universitaet Darmstadt (TUD)
D-64283 Darmstadt, Germany
Phone: +49 6151 16-4337, Fax: +49 6151 16-4936
e-mail: [email protected]
http://www.microelectronic.e-technik.tu-darmstadt.de/becker/becker.html

R. Hartenstein, M. Herz, U. Nageldinger
Computer Structures Group, Informatik
University of Kaiserslautern
D-67653 Kaiserslautern, Germany
Fax: +49 631 205 2640, Home fax: +49 7251 14823
e-mail: [email protected]
http://xputers.informatik.uni-kl.de

Abstract— The paper introduces a novel co-compiler and its "vertical" parallelization method, including a general model for co-operating host/accelerator platforms and a new parallelizing compilation technique derived from it. Small examples are used for illustration. The paper explains how different levels of parallelism are exploited to achieve optimized speed-ups and hardware resource utilization. Section II introduces the novel vertical parallelization techniques achieved for configurable accelerators, involving parallelism exploitation at four different levels (task, loop, statement, and operation level). Finally, the results are illustrated by a simple application example. But first the paper summarizes the fundamentally new dynamically reconfigurable hardware platform underlying the co-compilation method.

I. INTRODUCTION

Tsugio Makimoto has observed cycles of changing mainstream focus in semiconductor circuit design and application [1] (fig. 1). Makimoto's model obviously assumes that each new wave is triggered by a paradigm shift. The second wave was triggered by the shift from hardwired logic to the programmable microprocessor. The third wave will be triggered by the shift to reconfigurable hardware platforms as the basis of a new computational paradigm.

Makimoto's third wave takes into account that hardware has become soft. Emanating from field-programmable logic (FPL, also see [2]) and its applications, the awareness of the new paradigm of structural programming is growing. Commercially available FPGAs make use of RAM-based reconfigurability, where the functions of circuit blocks and the structure of their interconnect are determined by bit patterns downloaded into "hidden RAM" inside the circuit. Modern FPGAs are reconfigurable within seconds or milliseconds, even partially or incrementally. Such "dynamically reconfigurable" circuits may even reconfigure themselves: an active circuit segment programs another, idling segment. So we have two programming paradigms, programming in time and programming in space, distinguishing two kinds of "software":

• sequential software (code downloaded to RAM): procedural programming, i.e. computing in time
• structural software (downloaded to hidden RAM): structural programming, i.e. computing in space

But Makimoto's third wave is heavily delayed. FPGAs are available, but they are mainly used for a tinkertoy approach rather than for a new paradigm.

Fig. 1: Makimoto's wave: summarizing the history of paradigm shifts in semiconductor markets (alternating standardized and customized waves, from standardized memory and transistors through customized logic (ASIC) to customized accelerators; the paradigm shifts run from fixed algorithms on fixed resources, to variable algorithms on fixed resources, to variable algorithms on variable resources).
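As an illustration of the parallelism levels listed in the abstract, consider the following minimal C sketch (hypothetical: the function, the arrays, and the comment annotations are invented for this text and are not the co-compiler's actual input syntax) of the kind of source fragment a host/accelerator co-compiler partitions:

```c
/* Hypothetical input fragment for a host/accelerator co-compiler.
 * The comments only mark the four levels of parallelism named in
 * the abstract; they are not real compiler directives. */
#define N 1024

void example(const int a[N], const int b[N], int y[N], int z[N], int w[N])
{
    /* Task level: this loop nest and the one below share no data,
     * so they can run concurrently, e.g. one on the host and one
     * on the accelerator. */
    for (int i = 0; i < N; i++) {   /* Loop level: iterations are   */
                                    /* independent and can be spread */
                                    /* over parallel datapath units. */
        /* Statement level: the two assignments are independent. */
        y[i] = a[i] * 3 + b[i];     /* Operation level: '*' and '+'  */
        z[i] = a[i] - b[i];         /* can occupy separate ALUs in a */
    }                               /* pipelined chain.              */

    for (int i = 0; i < N; i++)     /* independent second task */
        w[i] = b[i] >> 1;
}
```

A vertically parallelizing co-compiler would, for instance, keep the control-intensive parts as sequential software (computing in time) and map such loop nests as structural software onto the accelerator (computing in space).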
Is it realistic to believe that Makimoto's third wave will come? If yes, what is the reason for its delay? Although FPGA integration density has passed that of microprocessors, the evolution of dynamically reconfigurable circuits is approaching a dead end. For a change, new solutions are needed for some fundamental issues [3]. This paper analyzes the state of the art and introduces a fundamentally new approach, which has to cope with:

• a hardware gap
• a software gap
• a modeling gap
• an education gap

A. The Hardware Gap

Comparing the Gordon Moore curve of integrated memory circuits with that of microprocessors and other logic circuits (fig. 2) shows an increasing integration density gap, currently of about two orders of magnitude. We believe that the predictions in fig. 2 [4] are more realistic than the more optimistic ones of Semicon's "road map" [5] (also [6]).

Fig. 2: The Gordon Moore curve and the microprocessor curve, with the design gap [4] (memory density grows by about x100 per decade, i.e. about x1.6 per year, while the microprocessor transistor count lags behind by a growing design gap).

A main reason for this gap is the difference in design style [7]. The high density of memory circuits mainly relies on full custom style, including wiring by abutment. Microprocessors, however, include major chip areas defined by standard cell and similar styles based on "classical" placement and routing methods. This is a main reason of the density gap, which is really a design gap. Another indication of the increasing limitations of microprocessors is the rapidly growing usage of add-on accelerators, both boards and integrated circuits.

Both standard cell based ASICs and FPGAs ([2], [8]) are usually highly area-inefficient, because the usual placement algorithms use only flat wiring netlist statistics, which are much less relevant than needed for good optimization results. A much better placement strategy would be based on detailed data dependency information directly extracted from a high-level application specification, as in the synthesis of systolic arrays ([9], [10], [11]), where it is derived directly from a mathematical equation system or a high-level program ("very high level synthesis"), as sketched below.
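To make that difference tangible, here is a minimal, self-contained C sketch (an illustration invented for this text, not the authors' synthesis tool; the equation system and all names are hypothetical) that extracts producer/consumer dependence edges directly from a small equation system, precisely the information a dependency-driven placer would use instead of flat netlist statistics:

```c
/* Toy dependence extraction: each equation becomes a node, and an
 * edge x -> y means equation y consumes the value produced by x,
 * so a placer should map x and y onto neighboring PEs. */
#include <stdio.h>
#include <string.h>

#define NEQ 4
#define NIN 2

typedef struct {
    const char *out;        /* value produced by this equation    */
    const char *in[NIN];    /* values it consumes (NULL = unused) */
} Equation;

/* e.g.  t1 = a + b;  t2 = a * c;  t3 = t1 - t2;  y = t3 << 1; */
static const Equation sys[NEQ] = {
    { "t1", { "a",  "b"  } },
    { "t2", { "a",  "c"  } },
    { "t3", { "t1", "t2" } },
    { "y",  { "t3", NULL } },
};

int main(void)
{
    /* Emit dependence edges between equations: producer -> consumer. */
    for (int c = 0; c < NEQ; c++)
        for (int k = 0; k < NIN; k++)
            for (int p = 0; p < NEQ; p++)
                if (sys[c].in[k] && !strcmp(sys[c].in[k], sys[p].out))
                    printf("%s -> %s\n", sys[p].out, sys[c].out);
    return 0;
}
```

Each printed edge marks a pair of operators that should land on adjacent processing elements, which is exactly the regularity that systolic array synthesis exploits.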
Due to its full custom design style, FPGA integration density (fig. 3) grows very fast, at a rate as high as that of memory chips, and has already exceeded that of general purpose microprocessors [12]. But in FPGAs the reconfigurability overhead is very high (fig. 4): published figures indicate that 200 physical transistors are needed per logical transistor ([13], [14]), or that only about 1% of the chip area is available for pure application logic [15]. Routing takes up to hours of computation time and uses only part of the logic elements, in some cases even only about 50%. So FPGAs would hardly be the basis of the mainstream paradigm shift to dynamically reconfigurable computing, such as predicted by Makimoto's wave [1] (also see the analysis in [7]). The reason for this immense FPGA area inefficiency is the need for configuration memory and the extensive use of reconfigurable routing channels, both being physical reconfigurability overhead artifacts.

Fig. 3: The high growth rate of FPGA integration density, compared to memory and microprocessors (the "roadmap" prediction for Xilinx is too optimistic). FPGA pre-design is of full custom style; its progress runs parallel to the Gordon Moore curve and exceeds that of the microprocessor.

Fig. 4: The hardware gap of reconfigurability: physical versus logical integration density of FPGAs, compared to microprocessors and memory chips. Logical chip area uses only 1% of physical chip area [DeHon]; one logical transistor costs about 200 physical transistors [Tredennick].

B. Closing the Hardware Gap

An alternative dynamically reconfigurable platform is the KressArray [16], which is much less overhead-prone and more area-efficient than FPGAs, by about three orders of magnitude (fig. 5). (This high density may be a reason why low-power design methods are needed [18].) The KressArray integration density is also growing a little faster than that of memories (fig. 5). The high logical area efficiency is obtained by using multiplexers inside the PEs (processing elements) instead of routing channels. Fig. 6 illustrates a 4 by 8 KressArray example, and fig. 8 illustrates the mapping (fig. 8b) of an application (fig. 8a: a system of 8 equations) onto this array. The KressArray is a generalization of the systolic array, the most area-efficient and throughput-efficient datapath design style.

Fig. 5: KressArray: the logical integration density is larger than that of FPGAs by about 3 orders of magnitude.

The mapping problem has thus been mainly reduced to a placement problem. Only a small residual routing problem goes beyond nearest-neighbor interconnect, which uses a few PEs also as routing elements. DPSS includes a data scheduler to organize and optimize data streams for host/array communication, a separate algorithm carried out after placement [16]. Instead of the hours known from FPGA tools, DPSS needs only a few seconds of computation time. By permitting alternative solutions through multiple turn-arounds within minutes, the KressArray tools support experimental very rapid prototyping, as well as profiling methods known from hardware/software co-design (also see section II ff.).
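The flavor of such a placement-centric mapper can be conveyed by a generic toy (an assumption-laden sketch: DPSS's actual algorithm, data structures, and cost function are not reproduced in this excerpt; plain simulated annealing stands in for them here). It swaps operators on a 4 by 8 mesh and counts how far each dependence edge exceeds a nearest-neighbor link:

```c
/* Generic annealing placement on a 4 x 8 mesh (NOT the published
 * DPSS algorithm).  Cost = sum of Manhattan distances of all
 * dependence edges; distance 1 is a free nearest-neighbor link,
 * anything longer must be routed through intermediate PEs. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define ROWS 4
#define COLS 8
#define NOPS (ROWS * COLS)
#define NEDGES 3

static int place[NOPS];               /* operator index -> PE index */
static const int edge[NEDGES][2] = {  /* hypothetical dependence edges */
    {0, 2}, {1, 2}, {2, 3}
};

static int dist(int pe_a, int pe_b)   /* Manhattan distance on the mesh */
{
    return abs(pe_a / COLS - pe_b / COLS) + abs(pe_a % COLS - pe_b % COLS);
}

static int cost(void)
{
    int c = 0;
    for (int e = 0; e < NEDGES; e++)
        c += dist(place[edge[e][0]], place[edge[e][1]]);
    return c;
}

int main(void)
{
    for (int i = 0; i < NOPS; i++) place[i] = i;   /* trivial start */

    double temp = 10.0;
    for (int step = 0; step < 100000; step++, temp *= 0.9999) {
        int a = rand() % NOPS, b = rand() % NOPS;
        int before = cost();
        int t = place[a]; place[a] = place[b]; place[b] = t;  /* swap */
        int delta = cost() - before;
        /* keep a worsening swap only with probability exp(-delta/temp) */
        if (delta > 0 && exp(-delta / temp) < (double)rand() / RAND_MAX) {
            t = place[a]; place[a] = place[b]; place[b] = t;  /* undo */
        }
    }
    printf("final wire cost: %d\n", cost());
    return 0;
}
```

In the paper's terms, any edge still longer than distance 1 after placement is part of the small residual routing problem and is resolved by routing the connection through intermediate PEs; the data scheduler then runs as a separate pass after placement.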