Design of Application Specific Processor Architectures

Total Page:16

File Type:pdf, Size:1020Kb

Design of Application Specific Processor Architectures Design of Application Specific Processor Architectures Rainer Leupers RWTH Aachen University Software for Systems on Silicon [email protected] Institute for Integrated Signal Processing Systems Overview: Geography Berlin Aachen 2005 © R. Leupers 2 About ISS ¾ RWTH Aachen is a top-ranked technical university in Germany ¾ ISS institute 3 professors (Meyr, Ascheid, Leupers) 18 Ph.D. students 5 staff ¾ Research on wireless communication systems tight industry cooperations origin of several EDA spin-off companies (e.g. Cadis, Axys, LISATek) SSS group focuses on embedded processor design tools 2005 © R. Leupers 3 Overview 1. Introduction 2. ASIP design methodologies 3. Software tools 4. ASIP architecture design 5. Case study 6. Advanced research topics 2005 © R. Leupers 4 1. Introduction 2005 © R. Leupers 5 Embedded system design automation ¾ Embedded systems Special-purpose electronic devices Very different from desktop computers ¾ Strength of European IT market Telecom, consumer, automotive, medical, ... Siemens, Nokia, Bosch, Infineon, ... ¾ New design requirements Low NRE cost, high efficiency requirements Real-time operation, dependability Keep pace with Moore´s Law 2005 © R. Leupers 6 What to do with chip area ? 2005 © R. Leupers 7 Example: wireless multimedia terminals ¾ Multistandard radio UMTS GSM/GPRS/EDGE WLAN Bluetooth UWB … ¾ Multimedia standards Key issues: MPEG-4 Key issues: MP3 ••TimeTime toto marketmarket((≤≤1212 months)months) AAC • Flexibility (ongoing standard GPS • Flexibility (ongoing standard updates) DVB-H updates) … ••EfficiencyEfficiency(battery(batteryoperation)operation) 2005 © R. Leupers 8 Application specific processors (ASIPs) „As the performance of conventional microprocessors improves, they first meet and then exceed the requirements of most computing applications. Initially, performance is key. But eventually, other factors, like customization, become more important to the customer...“ [M.J. Bass, C.M. Christensen: The Future of the Microprocessor Business, IEEE Spectrum 2002] design budget = (semiconductor revenue) × (% for R&D) growth ≈ 15% ≈ 10% # IC designs = (design budget) / (design cost per IC) growth ≈ 15% growth ≈ 50-100% [Keutzer05] → Customizable application specific processors as reusable, programmable platforms 2005 © R. Leupers 9 Efficiency and flexibility General Purpose Digital Signal Processors Application Processors Specific Instruction Set Processors Field SW Programmable 6 Devices WhyDesignuse ASIPs? . 10 • Higher efficiency for given range Application 5 D I S S I P A T I O N of applications Specific 10 ICs • IP protection • Cost reduction (no HWroyalties) Design Physically Log F L E X I B I L I T Y Optimized • Product differentiation ICs Log P O W E R Log P E R F O R M A N C E 103 . 104 Source: T.Noll, RWTH Aachen 2005 © R. Leupers 10 Standard-CPU vs. ASIP 2005 © R. Leupers 11 2. ASIP design methodologies 2005 © R. Leupers 12 ASIP architecture exploration Application initial processor architecture Profiler Compiler Simulator Assembler optimized Linker processor architecture 2005 © R. Leupers 13 Expression (UC Irvine) 2005 © R. Leupers 14 Tensilica Xtensa/XPRES Source: Tensilica Inc. 2005 © R. Leupers 15 MIPS CorXtend/CoWare CorXpert Synthesize HW and profile with Replace MIPSsim Profile critical code and and 2 with special extensions H identify instruction User Defined Instruction o custom 3 t instructions + s p CorExtend Module o 1 t User Defined Instruction 2005 © R. Leupers 16 CoWare LISATek ASIP architecture exploration ¾ Integrated embedded processor development environment ¾ Unified processor model in LISA 2.0 architecture description language (ADL) ¾ Automatic generation of: SW tools HW models 2005 © R. Leupers 17 LISA operation hierarchy main Reflects hierarchical organization of ISAs decode addr cond opcode opnds imm linear cycl control arithm move short long add sub mul and or 2005 © R. Leupers 18 LISA operations structure LISA operation References to other operations DECLARE Binary coding CODING Assembly syntax SYNTAX Resource access, e.g. registers EXPRESSION Initiate “downstream” operations in pipe ACTIVATION Computation and processor state update BEHAVIOR C compiler generation SEMANTICS 2005 © R. Leupers 19 LISA operation example OPERATION ADD { DECLARE { GROUP src1, src2, dest = { Register } } CODING { 0b1011 src1 src2 dest } SYNTAX { “ADD” dest “,” src1 “,” src2 } BEHAVIOR { dest = src1 + src2; } C/C++ Code } OPERATION Register { DECLARE { LABEL index; ADD } CODING { index } src1 src2 dest SYNTAX { “R” index } EXPRESSION{ R[index] } } Register Register Register 2005 © R. Leupers 20 Exploration/debugger GUI ••ApplicationApplicationssimulationimulation ••DebuggingDebugging ••ProfilingProfiling ••ResourceResourceutilizationutilizationaanalysisnalysis ••PipelinePipeline analysisanalysis ••ProcessorProcessormodelmodelddebuggingebugging ••MemoryMemoryhierarchyhierarchyeexplorationxploration ••CodeCode coveragecoverageanalysisanalysis ••...... 2005 © R. Leupers 21 Some available LISA 2.0 models ¾ DSP: •VLIW: Texas Instruments – Texas Instruments TMS320C54x TMS320C6x Analog Devices – STMicroelectronics ADSP21xx ST220 Motorola 56000 •µC: ¾ RISC: –MHS80C51 MIPS32 4K • ASIP: ESA LEON SPARC 8 – Infineon PP32 NPU ARM7100 – Infineon ICore ARM926 – MorphICs DSP 2005 © R. Leupers 22 3. Software tools 2005 © R. Leupers 23 Tools generated from processor ADL model Application Profiler Compiler Simulator Assembler Linker 2005 © R. Leupers 24 Instruction set simulation Interpretive: •flexible Decode Execute Application Instruction • slow (~ 100 KIPS) Memory RunRun--TimTimee Compiled: • fast (> 10 MIPS) Instruction Behavior Simulation • inflexible Application Compiler Execute Compiled Program • high memory Simulation Memory consumption CCoompilmpilee--TimeTime RunRun--TimTimee JIT-CCS™: • „just-in-time“ Instruction Instruction Behavior Application Decode Compiled Execute compiled Program Simulation Memory Cache • SW simulation cache RunRun--TimTimee • fast and flexible 2005 © R. Leupers 25 JIT-CC simulation performance • Dependent on simulation cache size • 95% of compiled simulation performance @ 4096 cache blocks (10% memory consumption of compiled sim.) • Example: ST200 VLIW DSP 9 100 8 90 C ] 80 a S 7 c h IP 70 6 eM [M 60 i 5 s ce sR n 50 a 4 a m 40 r t i o 3 o 30 rf [ 2 % Pe 20 ] 1 10 0 0 d e 8 6 6 2 4 8 6 2 4 8 e iv 1 32 64 9 9 il t 128 25 51 02 38 76 p e 1 204 40 81 m rpr 16 32 Co te In Cache size [records] 2005 © R. Leupers 26 Why care about C compilers? ¾ Embedded SW design becoming predominant manpower factor in system design ¾ Cannot develop/maintain millions of code lines in assembly language ¾ Move to high-level programming languages 2005 © R. Leupers 27 Why care about compilers? ¾ Trend towards heterogeneous multiprocessor systems-on- chip (MPSoC) ¾ Customized application specific instruction set processors (ASIPs) are key MPSoC components ¾ How to achieve efficient compiler support for ASIPs? ASICASIC CPUCPU ASIPASIP ASICASIC CPUCPU ASIPASIP CPUCPU ASIPASIP MemMem MemoryMemory MemoryMemory MemoryMemory 2005 © R. Leupers 28 C compiler in the exploration loop „Compiler/Architecture Co-Design“ ¾ Efficient C-compilers cannot be designed for ARBITRARY architectures! Application Compiler Processor Results Software ¾ Compiler and processor form a UNIT that needs to be optimized! ¾ “Compiler-friendliness“ needs to be taken into account during the architecture exploration! 2005 © R. Leupers 29 Retargetable compilers Classical compiler Retargetable compiler source source processor code code model Compiler processor Compiler model Compiler asm asm code code 2005 © R. Leupers 30 GNU C compiler (gcc) • Probably the most widespread retargetable compiler • Mostly used as a native Unix/Linux compiler, but may operate as a cross-compiler, too • Support for C/C++, Java, and other languages • Comes with comprehensive support software, e.g. runtime and standard libraries, debug support • Portable to new architectures by means of machine description file and C support routines “The“The main main goal goal of of GCC GCC was was to to make make a a good, good, fast fast compiler compiler for for machinesmachines in in the the clas classs that that the the GN GNUU system system aims aims to to run run on: on: 32-bit 32-bit macmachhinineses that that address address 8-bit 8-bit bytes bytes and and have have several several general general registers. registers. Elegance,Elegance, theoretical theoretical power power and and simplicity simplicity are are only only secondary.” secondary.” 2005 © R. Leupers 31 LANCE (Univ. of Dortmund/RWTH Aachen) ¾ Modular retargetable C compiler system ¾ www.lancecompiler.com LANCE library LANCE tools C frontend header file lance2.h IR optimization 1 IR-C used by C++ library IR optimization n liblance2.a backend interface 2005 © R. Leupers 32 Executable C based IR in the LANCE compiler C source code int A[10]; char *t1,*t3; symbol void f() int i,t2,t5,t6,t7,t8; table { int *t4; int i,A[10]; t3 = (char *)A; // cast base to char* i = A[2]++ t2 = 2 * 4; // compute offset > 1 ? t1 = t3 + t2; // compute eff addr t4 = (int *)t1; // cast back to int* 2 : 3; t5 = *t4; // load value from memory } array t6 = t5 + 1; // increment access *t4 = t6; // store back into A[2] increment t7 = t5 > 1; // compare if (t7) goto L1; // jump if > t8 = 3; // load 3 if <= conditional goto L2; // goto join point L1: t8 = 2; // load 2 if > expression L2: i = t8; // move result into i 2005 © R. Leupers 33 CoSy compiler system (ACE) • Universal
Recommended publications
  • Efficient Checker Processor Design
    Efficient Checker Processor Design Saugata Chatterjee, Chris Weaver, and Todd Austin Electrical Engineering and Computer Science Department University of Michigan {saugatac,chriswea,austin}@eecs.umich.edu Abstract system works to increase test space coverage by proving a design is correct, either through model equivalence or assertion. The approach is significantly more efficient The design and implementation of a modern micro- than simulation-based testing as a single proof can ver- processor creates many reliability challenges. Design- ify correctness over large portions of a design’s state ers must verify the correctness of large complex systems space. However, complex modern pipelines with impre- and construct implementations that work reliably in var- cise state management, out-of-order execution, and ied (and occasionally adverse) operating conditions. In aggressive speculation are too stateful or incomprehen- our previous work, we proposed a solution to these sible to permit complete formal verification. problems by adding a simple, easily verifiable checker To further complicate verification, new reliability processor at pipeline retirement. Performance analyses challenges are materializing in deep submicron fabrica- of our initial design were promising, overall slowdowns tion technologies (i.e. process technologies with mini- due to checker processor hazards were less than 3%. mum feature sizes below 0.25um). Finer feature sizes However, slowdowns for some outlier programs were are generally characterized by increased complexity, larger. more exposure to noise-related faults, and interference In this paper, we examine closely the operation of the from single event radiation (SER). It appears the current checker processor. We identify the specific reasons why advances in verification (e.g., formal verification, the initial design works well for some programs, but model-based test generation) are not keeping pace with slows others.
    [Show full text]
  • Three-Dimensional Integrated Circuit Design: EDA, Design And
    Integrated Circuits and Systems Series Editor Anantha Chandrakasan, Massachusetts Institute of Technology Cambridge, Massachusetts For other titles published in this series, go to http://www.springer.com/series/7236 Yuan Xie · Jason Cong · Sachin Sapatnekar Editors Three-Dimensional Integrated Circuit Design EDA, Design and Microarchitectures 123 Editors Yuan Xie Jason Cong Department of Computer Science and Department of Computer Science Engineering University of California, Los Angeles Pennsylvania State University [email protected] [email protected] Sachin Sapatnekar Department of Electrical and Computer Engineering University of Minnesota [email protected] ISBN 978-1-4419-0783-7 e-ISBN 978-1-4419-0784-4 DOI 10.1007/978-1-4419-0784-4 Springer New York Dordrecht Heidelberg London Library of Congress Control Number: 2009939282 © Springer Science+Business Media, LLC 2010 All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights. Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com) Foreword We live in a time of great change.
    [Show full text]
  • Understanding Performance Numbers in Integrated Circuit Design Oprecomp Summer School 2019, Perugia Italy 5 September 2019
    Understanding performance numbers in Integrated Circuit Design Oprecomp summer school 2019, Perugia Italy 5 September 2019 Frank K. G¨urkaynak [email protected] Integrated Systems Laboratory Introduction Cost Design Flow Area Speed Area/Speed Trade-offs Power Conclusions 2/74 Who Am I? Born in Istanbul, Turkey Studied and worked at: Istanbul Technical University, Istanbul, Turkey EPFL, Lausanne, Switzerland Worcester Polytechnic Institute, Worcester MA, USA Since 2008: Integrated Systems Laboratory, ETH Zurich Director, Microelectronics Design Center Senior Scientist, group of Prof. Luca Benini Interests: Digital Integrated Circuits Cryptographic Hardware Design Design Flows for Digital Design Processor Design Open Source Hardware Integrated Systems Laboratory Introduction Cost Design Flow Area Speed Area/Speed Trade-offs Power Conclusions 3/74 What Will We Discuss Today? Introduction Cost Structure of Integrated Circuits (ICs) Measuring performance of ICs Why is it difficult? EDA tools should give us a number Area How do people report area? Is that fair? Speed How fast does my circuit actually work? Power These days much more important, but also much harder to get right Integrated Systems Laboratory The performance establishes the solution space Finally the cost sets a limit to what is possible Introduction Cost Design Flow Area Speed Area/Speed Trade-offs Power Conclusions 4/74 System Design Requirements System Requirements Functionality Functionality determines what the system will do Integrated Systems Laboratory Finally the cost sets a limit
    [Show full text]
  • DSP56303 Product Brief, Rev
    Freescale Semiconductor DSP56303PB Product Brief Rev. 2, 2/2005 DSP56303 24-Bit Digital Signal Processor 16 6 6 3 Memory Expansion Area The DSP56303 is intended Triple X Data HI08 ESSI SCI PrograM Y Data for use in telecommunication Timer RAM RAM RAM 4096 × 24 2048 × 24 2048 × 24 applications, such as multi- bits bits bits line voice/data/ fax (default) (default) (default) processing, video Peripheral Expansion Area conferencing, audio PM_EB XM_EB PIO_EB YAB YM_EB applications, control, and Address External 18 Generation XAB Address general digital signal Unit PAB Bus Address Six-Channel DAB Switch processing. DMA Unit External 24-Bit Bus 13 Bootstrap DSP56300 Interface ROM and Inst. Core Cache Control Control DDB 24 Internal YDB External Data XDB Data Bus Bus PDB Switch Data Switch GDB EXTAL Power Clock Management Data ALU 5 Generator Program Program Program × + → XTAL Interrupt Decode Address 24 24 56 56-bit MAC JTAG PLL Two 56-bit Accumulators Controller Controller Generator 56-bit Barrel Shifter OnCE™ DE 2 MODA/IRQA MODB/IRQB RESET MODC/IRQC PINIT/NMI MODD/IRQD Figure 1. DSP56303 Block Diagram The DSP56303 is a member of the DSP56300 core family of programmable CMOS DSPs. Significant architectural features of the DSP56300 core family include a barrel shifter, 24-bit addressing, instruction cache, and DMA. The DSP56303 offers 100 million multiply-accumulates per second (MMACS) using an internal 100 MHz clock at 3.0–3.6 volts. The DSP56300 core family offers a rich instruction set and low power dissipation, as well as increasing levels of speed and power to enable wireless, telecommunications, and multimedia products.
    [Show full text]
  • CSE 141L: Design Your Own Processor What You'll
    CSE 141L: Design your own processor What you’ll do: - learn Xilinx toolflow - learn Verilog language - propose new ISA - implement it - optimize it (for FPGA) - compete with other teams Grading 15% lab participation – webboard 85% various parts of the labs CSE 141L: Design your own processor Teams - two people - pick someone with similar goals - you keep them to the end of the class - more on the class website: http://www.cse.ucsd.edu/classes/sp08/cse141L/ Course Staff: 141L Instructor: Michael Taylor Email: [email protected] Office Hours: EBU 3b 4110 Tuesday 11:30-12:20 TA: Saturnino Email: [email protected] ebu 3b b260 Lab Hours: TBA (141 TA: Kwangyoon) Æ occasional cameos in 141L http://www-cse.ucsd.edu/classes/sp08/cse141L/ Class Introductions Stand up & tell us: -Name - How long until graduation - What you want to do when you “hit the big time” - What kind of thing you find intellectually interesting What is an FPGA? Next time: (Tuesday) Start working on Xilinx assignment (due next Tuesday) - should be posted Sat will give a tutorial on Verilog today Check the website regularly for updates: http://www.cse.ucsd.edu/classes/sp08/cse141L/ CSE 141: 0 Computer Architecture Professor: Michael Taylor UCSD Department of Computer Science & Engineering RF http://www.cse.ucsd.edu/classes/sp08/cse141/ Computer Architecture from 10,000 feet foo(int x) Class of { .. } application Physics Computer Architecture from 10,000 feet foo(int x) Class of { .. } application An impossibly large gap! In the olden days: “In 1942, just after the United States entered World War II, hundreds of women were employed around the country as Physics computers...” (source: IEEE) The Great Battles in Computer Architecture Are About How to Refine the Abstraction Layers foo(int x) { .
    [Show full text]
  • Low Power Asynchronous Digital Signal Processing
    LOW POWER ASYNCHRONOUS DIGITAL SIGNAL PROCESSING A thesis submitted to the University of Manchester for the degree of Doctor of Philosophy in the Faculty of Science & Engineering October 2000 Michael John George Lewis Department of Computer Science 1 Contents Chapter 1: Introduction ....................................................................................14 Digital Signal Processing ...............................................................................15 Evolution of digital signal processors ....................................................17 Architectural features of modern DSPs .........................................................19 High performance multiplier circuits .....................................................20 Memory architecture ..............................................................................21 Data address generation .........................................................................21 Loop management ..................................................................................23 Numerical precision, overflows and rounding .......................................24 Architecture of the GSM Mobile Phone System ...........................................25 Channel equalization ..............................................................................28 Error correction and Viterbi decoding ...................................................29 Speech transcoding ................................................................................31 Half-rate and enhanced
    [Show full text]
  • Rlsc & DSP Advanced Microprocessor System Design
    Purdue University Purdue e-Pubs ECE Technical Reports Electrical and Computer Engineering 3-1-1992 RlSC & DSP Advanced Microprocessor System Design; Sample Projects, Fall 1991 John E. Fredine Purdue University, School of Electrical Engineering Dennis L. Goeckel Purdue University, School of Electrical Engineering David G. Meyer Purdue University, School of Electrical Engineering Stuart E. Sailer Purdue University, School of Electrical Engineering Glenn E. Schmottlach Purdue University, School of Electrical Engineering Follow this and additional works at: http://docs.lib.purdue.edu/ecetr Fredine, John E.; Goeckel, Dennis L.; Meyer, David G.; Sailer, Stuart E.; and Schmottlach, Glenn E., "RlSC & DSP Advanced Microprocessor System Design; Sample Projects, Fall 1991" (1992). ECE Technical Reports. Paper 302. http://docs.lib.purdue.edu/ecetr/302 This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. RISC & DSP Advanced Microprocessor System Design Sample Projects, Fall 1991 John E. Fredine Dennis L. Goeckel David G. Meyer Stuart E. Sailer Glenn E. Schmottlach TR-EE 92- 11 March 1992 School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 RlSC & DSP Advanced Microprocessor System Design Sample Projects, Fall 1991 John E. Fredine Dennis L. Goeckel David G. Meyer Stuart E. Sailer Glenn E. Schrnottlach School of Electrical Engineering Purdue University West Lafayette, Indiana 47907 Table of Contents Abstract ...................................................................................................................
    [Show full text]
  • Introduction to Digital Signal Processors
    INTRODUCTION TO Accumulator architecture DIGITAL SIGNAL PROCESSORS Memory-register architecture Prof. Brian L. Evans in collaboration with Niranjan Damera-Venkata and Magesh Valliappan Load-store architecture Embedded Signal Processing Laboratory The University of Texas at Austin Austin, TX 78712-1084 http://anchovy.ece.utexas.edu/ Outline n Signal processing applications n Conventional DSP architecture n Pipelining in DSP processors n RISC vs. DSP processor architectures n TI TMS320C6x VLIW DSP architecture n Signal and image processing applications n Signal processing on general-purpose processors n Conclusion 2 Signal Processing Applications n Low-cost embedded systems 4 Modems, cellular telephones, disk drives, printers n High-throughput applications 4 Halftoning, radar, high-resolution sonar, tomography n PC based multimedia 4 Compression/decompression of audio, graphics, video n Embedded processor requirements 4 Inexpensive with small area and volume 4 Deterministic interrupt service routine latency 4 Low power: ~50 mW (TMS320C5402/20: 0.32 mA/MIP) 3 Conventional DSP Architecture n High data throughput 4 Harvard architecture n Separate data memory/bus and program memory/bus n Three reads and one or two writes per instruction cycle 4 Short deterministic interrupt service routine latency 4 Multiply-accumulate (MAC) in a single instruction cycle 4 Special addressing modes supported in hardware n Modulo addressing for circular buffers (e.g. FIR filters) n Bit-reversed addressing (e.g. fast Fourier transforms) 4Instructions to keep the
    [Show full text]
  • Processor Architecture Design Using 3D Integration Technology
    Processor Architecture Design Using 3D Integration Technology Yuan Xie Pennsylvania State University Computer Science and Engineering Department University Park, PA, 16802, USA [email protected] Abstract (4)Smaller form factor, which results in higher pack- ing density and smaller footprint due to the addition of The emerging three-dimensional (3D) chip architec- a third dimension to the conventional two dimensional tures, with their intrinsic capability of reducing the wire layout, and potentially results in a lower cost design. length, is one of the promising solutions to mitigate This tutorial paper first presents the background on the interconnect problem in modern microprocessor de- 3D integration technology, and then reviews various ap- signs. 3D memory stacking also enables much higher proaches to design future 3D microprocessors, which memory bandwidth for future chip-multiprocessor de- leverage the benefits of fast latency, higher bandwidth, sign, mitigating the “memory wall” problem. In addi- and heterogeneous integration capability that are offered tion, heterogenous integration enabled by 3D technol- by 3D technology. The challenges for future 3D archi- ogy can also result in innovation designs for future mi- tecture design are also discussed in the last section. croprocessors. This paper serves as a survey of various approaches to design future 3D microprocessors, lever- 2. 3D Integration Technology aging the benefits of fast latency, higher bandwidth, and The 3D integration technologies [25,26] can be clas- heterogeneous integration capability that are offered by sified into one of the two following categories. (1) 3D technology. 1 Monolithic approach. This approach involves sequen- tial device process. The frontend processing (to build the 1.
    [Show full text]
  • Processor Design:System-On-Chip Computing For
    Processor Design Processor Design System-on-Chip Computing for ASICs and FPGAs Edited by Jari Nurmi Tampere University of Technology Finland A C.I.P. Catalogue record for this book is available from the Library of Congress. ISBN 978-1-4020-5529-4 (HB) ISBN 978-1-4020-5530-0 (e-book) Published by Springer, P.O. Box 17, 3300 AA Dordrecht, The Netherlands. www.springer.com Printed on acid-free paper All Rights Reserved © 2007 Springer No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. To Pirjo, Lauri, Eero, and Santeri Preface When I started my computing career by programming a PDP-11 computer as a freshman in the university in early 1980s, I could not have dreamed that one day I’d be able to design a processor. At that time, the freshmen were only allowed to use PDP. Next year I was given the permission to use the famous brand-new VAX-780 computer. Also, my new roommate at the dorm had got one of the first personal computers, a Commodore-64 which we started to explore together. Again, I could not have imagined that hundreds of times the processing power will be available in an everyday embedded device just a quarter of century later.
    [Show full text]
  • Design Software & Development Kit Selector Guide
    ® Design Software & Development Kit Selector Guide April 2004 Table of Contents Quartus II Design Software Leadership ................................ page 2 Selecting a Design Software Product ................................... page 5 System Design Technology Recommended System Configurations ................................. page 8 Altera’s Quartus II software is the industry’s only design Altera Programming Hardware........................................... page 8 environment that supports Intellectual Property (IP)-based Altera Development Kits................................................... page 10 system design—including complete and automated system definition and implementation—without requiring lower- Third-Party Solutions........................................................ page 10 level hardware description language (HDL) or schematics. This capability enables designers to turn their concepts into working systems in minutes. The Quartus II system design tools include: Quartus II Design Software SOPC Builder: A system development tool that automates Leadership adding, parameterizing, and linking IP cores—including Altera’s Quartus® II software leads the embedded processors, co-processors, peripherals, memories, industry as the most comprehensive and user-defined logic—without requiring lower-level HDL environment available for FPGA, CPLD, or schematics. (See Figure 1.) and structured ASIC designs, delivering DSP Builder: Shortens digital signal processing (DSP) design unmatched performance, efficiency, and ease-of-use.
    [Show full text]
  • Physical Design of a 3D-Stacked Heterogeneous Multi-Core Processor
    Physical Design of a 3D-Stacked Heterogeneous Multi-Core Processor Randy Widialaksono, Rangeen Basu Roy Chowdhury, Zhenqian Zhang, Joshua Schabel, Steve Lipa, Eric Rotenberg, W. Rhett Davis, Paul Franzon Department of Electrical and Computer Engineering North Carolina State University, Raleigh, NC, USA frhwidial, rbasuro, zzhang18, jcledfo3, slipa, ericro, wdavis, [email protected] Abstract—With the end of Dennard scaling, three dimensional TSV (I/O pads) stacking has emerged as a promising integration technique to Bulk improve microprocessor performance. In this paper we present Active First metal layer a 3D-SIC physical design methodology for a multi-core processor using commercial off-the-shelf tools. We explain the various Metal High-performance core flows involved and present the lessons learned during the design Last metal layer process. The logic dies were fabricated with GlobalFoundries Face-to-face micro-bumps 130 nm process and were stacked using the Ziptronix face-to- Metal Low-power core face (F2F) bonding technology. We also present a comparative analysis which highlights the benefits of 3D integration. Results indicate an order of magnitude decrease in wirelengths for critical Active inter-core components in the 3D implementation compared to 2D Bulk implementations. I. INTRODUCTION Fig. 1. Cross-section of the face-to-face bonded 3D-IC stack. As performance benefits from technology scaling slows The primary advantage of 3D-stacking comes from reduced down, computer architects are looking at various architectural wirelengths leading to an improvement in routability and signal techniques to maintain the trend of performance improvement, delays. On the other hand, going 3D also increases design while meeting the power budget.
    [Show full text]