Design of Application Specific Processor Architectures

Design of Application Specific Processor Architectures Rainer Leupers RWTH Aachen University Software for Systems on Silicon [email protected] Institute for Integrated Signal Processing Systems Overview: Geography Berlin Aachen 2005 © R. Leupers 2 About ISS ¾ RWTH Aachen is a top-ranked technical university in Germany ¾ ISS institute 3 professors (Meyr, Ascheid, Leupers) 18 Ph.D. students 5 staff ¾ Research on wireless communication systems tight industry cooperations origin of several EDA spin-off companies (e.g. Cadis, Axys, LISATek) SSS group focuses on embedded processor design tools 2005 © R. Leupers 3 Overview 1. Introduction 2. ASIP design methodologies 3. Software tools 4. ASIP architecture design 5. Case study 6. Advanced research topics 2005 © R. Leupers 4 1. Introduction 2005 © R. Leupers 5 Embedded system design automation ¾ Embedded systems Special-purpose electronic devices Very different from desktop computers ¾ Strength of European IT market Telecom, consumer, automotive, medical, ... Siemens, Nokia, Bosch, Infineon, ... ¾ New design requirements Low NRE cost, high efficiency requirements Real-time operation, dependability Keep pace with Moore´s Law 2005 © R. Leupers 6 What to do with chip area ? 2005 © R. Leupers 7 Example: wireless multimedia terminals ¾ Multistandard radio UMTS GSM/GPRS/EDGE WLAN Bluetooth UWB … ¾ Multimedia standards Key issues: MPEG-4 Key issues: MP3 ••TimeTime toto marketmarket((≤≤1212 months)months) AAC • Flexibility (ongoing standard GPS • Flexibility (ongoing standard updates) DVB-H updates) … ••EfficiencyEfficiency(battery(batteryoperation)operation) 2005 © R. Leupers 8 Application specific processors (ASIPs) „As the performance of conventional microprocessors improves, they first meet and then exceed the requirements of most computing applications. Initially, performance is key. But eventually, other factors, like customization, become more important to the customer...“ [M.J. Bass, C.M. Christensen: The Future of the Microprocessor Business, IEEE Spectrum 2002] design budget = (semiconductor revenue) × (% for R&D) growth ≈ 15% ≈ 10% # IC designs = (design budget) / (design cost per IC) growth ≈ 15% growth ≈ 50-100% [Keutzer05] → Customizable application specific processors as reusable, programmable platforms 2005 © R. Leupers 9 Efficiency and flexibility General Purpose Digital Signal Processors Application Processors Specific Instruction Set Processors Field SW Programmable 6 Devices WhyDesignuse ASIPs? . 10 • Higher efficiency for given range Application 5 D I S S I P A T I O N of applications Specific 10 ICs • IP protection • Cost reduction (no HWroyalties) Design Physically Log F L E X I B I L I T Y Optimized • Product differentiation ICs Log P O W E R Log P E R F O R M A N C E 103 . 104 Source: T.Noll, RWTH Aachen 2005 © R. Leupers 10 Standard-CPU vs. ASIP 2005 © R. Leupers 11 2. ASIP design methodologies 2005 © R. Leupers 12 ASIP architecture exploration Application initial processor architecture Profiler Compiler Simulator Assembler optimized Linker processor architecture 2005 © R. Leupers 13 Expression (UC Irvine) 2005 © R. Leupers 14 Tensilica Xtensa/XPRES Source: Tensilica Inc. 2005 © R. Leupers 15 MIPS CorXtend/CoWare CorXpert Synthesize HW and profile with Replace MIPSsim Profile critical code and and 2 with special extensions H identify instruction User Defined Instruction o custom 3 t instructions + s p CorExtend Module o 1 t User Defined Instruction 2005 © R. Leupers 16 CoWare LISATek ASIP architecture exploration ¾ Integrated embedded processor development environment ¾ Unified processor model in LISA 2.0 architecture description language (ADL) ¾ Automatic generation of: SW tools HW models 2005 © R. Leupers 17 LISA operation hierarchy main Reflects hierarchical organization of ISAs decode addr cond opcode opnds imm linear cycl control arithm move short long add sub mul and or 2005 © R. Leupers 18 LISA operations structure LISA operation References to other operations DECLARE Binary coding CODING Assembly syntax SYNTAX Resource access, e.g. registers EXPRESSION Initiate “downstream” operations in pipe ACTIVATION Computation and processor state update BEHAVIOR C compiler generation SEMANTICS 2005 © R. Leupers 19 LISA operation example OPERATION ADD { DECLARE { GROUP src1, src2, dest = { Register } } CODING { 0b1011 src1 src2 dest } SYNTAX { “ADD” dest “,” src1 “,” src2 } BEHAVIOR { dest = src1 + src2; } C/C++ Code } OPERATION Register { DECLARE { LABEL index; ADD } CODING { index } src1 src2 dest SYNTAX { “R” index } EXPRESSION{ R[index] } } Register Register Register 2005 © R. Leupers 20 Exploration/debugger GUI ••ApplicationApplicationssimulationimulation ••DebuggingDebugging ••ProfilingProfiling ••ResourceResourceutilizationutilizationaanalysisnalysis ••PipelinePipeline analysisanalysis ••ProcessorProcessormodelmodelddebuggingebugging ••MemoryMemoryhierarchyhierarchyeexplorationxploration ••CodeCode coveragecoverageanalysisanalysis ••...... 2005 © R. Leupers 21 Some available LISA 2.0 models ¾ DSP: •VLIW: Texas Instruments – Texas Instruments TMS320C54x TMS320C6x Analog Devices – STMicroelectronics ADSP21xx ST220 Motorola 56000 •µC: ¾ RISC: –MHS80C51 MIPS32 4K • ASIP: ESA LEON SPARC 8 – Infineon PP32 NPU ARM7100 – Infineon ICore ARM926 – MorphICs DSP 2005 © R. Leupers 22 3. Software tools 2005 © R. Leupers 23 Tools generated from processor ADL model Application Profiler Compiler Simulator Assembler Linker 2005 © R. Leupers 24 Instruction set simulation Interpretive: •flexible Decode Execute Application Instruction • slow (~ 100 KIPS) Memory RunRun--TimTimee Compiled: • fast (> 10 MIPS) Instruction Behavior Simulation • inflexible Application Compiler Execute Compiled Program • high memory Simulation Memory consumption CCoompilmpilee--TimeTime RunRun--TimTimee JIT-CCS™: • „just-in-time“ Instruction Instruction Behavior Application Decode Compiled Execute compiled Program Simulation Memory Cache • SW simulation cache RunRun--TimTimee • fast and flexible 2005 © R. Leupers 25 JIT-CC simulation performance • Dependent on simulation cache size • 95% of compiled simulation performance @ 4096 cache blocks (10% memory consumption of compiled sim.) • Example: ST200 VLIW DSP 9 100 8 90 C ] 80 a S 7 c h IP 70 6 eM [M 60 i 5 s ce sR n 50 a 4 a m 40 r t i o 3 o 30 rf [ 2 % Pe 20 ] 1 10 0 0 d e 8 6 6 2 4 8 6 2 4 8 e iv 1 32 64 9 9 il t 128 25 51 02 38 76 p e 1 204 40 81 m rpr 16 32 Co te In Cache size [records] 2005 © R. Leupers 26 Why care about C compilers? ¾ Embedded SW design becoming predominant manpower factor in system design ¾ Cannot develop/maintain millions of code lines in assembly language ¾ Move to high-level programming languages 2005 © R. Leupers 27 Why care about compilers? ¾ Trend towards heterogeneous multiprocessor systems-on- chip (MPSoC) ¾ Customized application specific instruction set processors (ASIPs) are key MPSoC components ¾ How to achieve efficient compiler support for ASIPs? ASICASIC CPUCPU ASIPASIP ASICASIC CPUCPU ASIPASIP CPUCPU ASIPASIP MemMem MemoryMemory MemoryMemory MemoryMemory 2005 © R. Leupers 28 C compiler in the exploration loop „Compiler/Architecture Co-Design“ ¾ Efficient C-compilers cannot be designed for ARBITRARY architectures! Application Compiler Processor Results Software ¾ Compiler and processor form a UNIT that needs to be optimized! ¾ “Compiler-friendliness“ needs to be taken into account during the architecture exploration! 2005 © R. Leupers 29 Retargetable compilers Classical compiler Retargetable compiler source source processor code code model Compiler processor Compiler model Compiler asm asm code code 2005 © R. Leupers 30 GNU C compiler (gcc) • Probably the most widespread retargetable compiler • Mostly used as a native Unix/Linux compiler, but may operate as a cross-compiler, too • Support for C/C++, Java, and other languages • Comes with comprehensive support software, e.g. runtime and standard libraries, debug support • Portable to new architectures by means of machine description file and C support routines “The“The main main goal goal of of GCC GCC was was to to make make a a good, good, fast fast compiler compiler for for machinesmachines in in the the clas classs that that the the GN GNUU system system aims aims to to run run on: on: 32-bit 32-bit macmachhinineses that that address address 8-bit 8-bit bytes bytes and and have have several several general general registers. registers. Elegance,Elegance, theoretical theoretical power power and and simplicity simplicity are are only only secondary.” secondary.” 2005 © R. Leupers 31 LANCE (Univ. of Dortmund/RWTH Aachen) ¾ Modular retargetable C compiler system ¾ www.lancecompiler.com LANCE library LANCE tools C frontend header file lance2.h IR optimization 1 IR-C used by C++ library IR optimization n liblance2.a backend interface 2005 © R. Leupers 32 Executable C based IR in the LANCE compiler C source code int A[10]; char *t1,*t3; symbol void f() int i,t2,t5,t6,t7,t8; table { int *t4; int i,A[10]; t3 = (char *)A; // cast base to char* i = A[2]++ t2 = 2 * 4; // compute offset > 1 ? t1 = t3 + t2; // compute eff addr t4 = (int *)t1; // cast back to int* 2 : 3; t5 = *t4; // load value from memory } array t6 = t5 + 1; // increment access *t4 = t6; // store back into A[2] increment t7 = t5 > 1; // compare if (t7) goto L1; // jump if > t8 = 3; // load 3 if <= conditional goto L2; // goto join point L1: t8 = 2; // load 2 if > expression L2: i = t8; // move result into i 2005 © R. Leupers 33 CoSy compiler system (ACE) • Universal

Design of Application Specific Processor Architectures

Efficient Checker Processor Design

Three-Dimensional Integrated Circuit Design: EDA, Design And

Understanding Performance Numbers in Integrated Circuit Design Oprecomp Summer School 2019, Perugia Italy 5 September 2019

DSP56303 Product Brief, Rev

CSE 141L: Design Your Own Processor What You'll

Low Power Asynchronous Digital Signal Processing

Rlsc & DSP Advanced Microprocessor System Design

Introduction to Digital Signal Processors

Processor Architecture Design Using 3D Integration Technology

Processor Design：System-On-Chip Computing For

Design Software & Development Kit Selector Guide

Physical Design of a 3D-Stacked Heterogeneous Multi-Core Processor