Cadence Tensilica Fusion G3 DSP Core

Total Page:16

File Type:pdf, Size:1020Kb

Cadence Tensilica Fusion G3 DSP Core An Independent Evaluation of the Cadence Tensilica Fusion G3 DSP Core By the staff of October 2016 OVERVIEW The recently announced Cadence Tensilica Fusion G3 DSP IP core is a high-performance licensable programmable digital signal processor core targeting diverse signal processing applications such as communications, audio and industrial applications. BDTI, a technology analysis firm, benchmarked the Fusion G3 core on several typical digital signal processing functions, comparing the performance of Fusion G3 against Texas Instruments’ flagship C66x DSP core. BDTI also compared the Fusion G3’s FFT performance to that of the ARM Cortex-A57 CPU core. Finally, BDTI implemented and optimized a custom DSP function from scratch on the Fusion G3 and compared the resulting performance to that of the TI C66x. This report presents BDTI’s independent evaluation of the Fusion G3 core’s performance and ease of software development. The Fusion G3 DSP core’s wide SIMD (single-instruction, multiple-data) operations and VLIW (very long instruction word) instruction set provide excellent cycle efficiency on many DSP tasks, and yield performance that surpasses that of TI’s flagship C66x DSP core. Fusion G3 is also noteworthy for its double- precision floating-point support for precision-critical tasks. Cadence provides robust software development tools and DSP function libraries to help users effectively realize the core’s performance potential. © 2016 Berkeley Design Technology, Inc. Page 1 Contents targets a similar range of applications, is well known in the industry, and has readily-available 1. Introduction ....................................................... 2 tools and optimized libraries that we could 2. About the Cadence Fusion G3 Core ............. 3 leverage for benchmarking. We used existing, 3. About the Benchmarks .................................... 3 optimized software library functions available 4. Benchmark Results ........................................... 4 from Cadence and TI, respectively, to compare 5. BDTI’s Fusion G3 Software Development the performance of the Fusion G3 and C66x. User Experience ................................................ 6 We also compare the Fusion G3’s execution 6. Conclusions........................................................ 7 speed on floating-point FFTs against ARM’s Appendix: Fusion G3 vs. ARM Cortex-A57 Cortex-A57 core leveraging ARM NEON SIMD Benchmarking Considerations ................................. 7 instructions. The Cortex-A57 is a licensable 64-bit high-performance CPU core. Comparing Fusion 1. Introduction G3 performance against that of the Cortex-A57 The Cadence Tensilica Fusion G3 DSP is a illustrates the benefits of including a dedicated high-performance programmable digital signal DSP core in a SoC design. However, CPU cores processor core from Cadence, targeting a wide such as Cortex-A57 are typically used with range of signal processing workloads in different system-level hardware architectures, applications such as communications, audio, and different software architectures, and different industrial equipment. This report presents BDTI’s programming practices compared with DSP cores, independent evaluation of the Cadence Fusion G3 making performance comparisons between these DSP core’s performance and ease of software two classes of processors challenging and prone to development. misinterpretation. Therefore, we consider the We compare Fusion G3’s execution speed and comparison of Fusion G3 and Cortex-A57 to be a cycle-efficiency against the Texas Instruments’ rough first-order comparison. flagship C66x high-performance DSP core, used Finally, we discuss BDTI’s software in many of TI’s DSP chips. BDTI chose the TI development and optimization experience with C66x for this competitive evaluation because it Figure 1 Block Diagram of Fusion G3 Architecture © 2016 Berkeley Design Technology, Inc. Page 2 the Fusion G3 core and toolchain. In addition to We computed execution time for each target by using DSP functions from Cadence’s DSP library, dividing the measured number of cycles by the BDTI implemented and optimized on the Fusion processor’s clock rate. For the Cadence Fusion G3 G3 a median filter common in video processing we measured cycle counts using a cycle-accurate applications. In Section 5 we describe and simulator, and for the C66x we used an evaluation comment on our experience with function board. optimization on Fusion G3 and with Cadence The benchmark functions were chosen based software libraries and tools. on relevance to a variety of signal processing tasks, and also based on availability in a 2. About the Cadence Fusion G3 comparable form on both the Fusion G3 and Core C66x platforms. The Cadence and Texas Cadence positions the Fusion G3 DSP core as Instruments optimized DSP function libraries a multi-purpose DSP for compute-intensive both contain multiple implementations of some applications. The Fusion G3 is suitable for diverse algorithm kernel functions such as FIR filters. workloads including communications, audio, Different implementations impose different imaging, radar and industrial control, among restrictions on function parameters, for example others. A block diagram of the Fusion G3 requiring that the number of filter taps be a architecture is shown in Figure 1. multiple of four. Such restrictions represent Fusion G3 natively supports 8-bit, 16-bit, and tradeoffs between function generality and optimal 32-bit fixed-point arithmetic via 128-bit wide vectorized implementation. BDTI made best SIMD (single-instruction, multiple-data) vector efforts to choose the fastest available function for registers, performing parallel operations on each target processor, while also ensuring that the sixteen, eight, or four data elements at a time, restrictions imposed by the respective respectively. An optional vector floating-point unit implementations are reasonably comparable. adds support for SIMD operations on 32-bit Thus, the requirement of availability of single-precision and 64-bit double-precision comparable implementations on both targets floating-point data, compliant with IEEE 754 restricted the choice of algorithm kernels standards. somewhat. A VLIW (very long instruction word) instruction set architecture allows the Fusion G3 Benchmark Parameters Data Types core to execute up to four SIMD vector operations per cycle, including (with some 16-bit fixed-point, restrictions) up to two multiplier/ALU operations Complex FFT 512 points and two loads or one load and one store. Flexible 32-bit floating-point operation predication support allows vectorized 32 taps, 16-bit fixed-point, Real FIR Filter code to maintain high throughput by avoiding 1024 points 32-bit floating-point conditional branches. Real Vector Dot 16-bit fixed-point, 1024 points 3. About the Benchmarks Product 32-bit floating-point Common DSP Functions 3×3 kernel, BDTI measured the performance of the output one 2D Median Filter 8-bit unsigned Cadence Tensilica Fusion G3 DSP core and the scan line, Texas Instruments C66x DSP core on several 1280 pixels functions from the optimized software libraries Matrix LU available from Cadence and Texas Instruments. 32×32 matrix 64-bit floating-point We believe that these software libraries are widely Decomposition used, and their performance generally reflects real- world application performance. Table 1 DSP functions selected for benchmarking BDTI benchmarked the DSP algorithm kernel the Fusion G3 and C66x. For each function, BDTI functions listed in Table 1. For each function, we benchmarked an optimized implementation for each data type listed. measured the number of processor cycles taken by the Fusion G3 and C66x to execute the function. © 2016 Berkeley Design Technology, Inc. Page 3 The LU matrix decomposition algorithm Evaluating Ease of Software kernel was chosen specifically to highlight the Development support for double-precision floating-point The median filter function was chosen for the arithmetic on the Fusion G3 core. Matrix purpose of evaluating software development user decomposition and inversion functions often experience for the Fusion G3. This function was require double-precision floating-point arithmetic. attractive because it is not included in the Cadence Of the algorithm kernels provided in the Cadence function library—thus requiring custom and TI libraries that commonly require double- implementation and optimization—and because precision floating-point support, the LU there is a median filter implementation available in decomposition kernel functions have the most the Texas Instruments image processing software closely comparable APIs between the Fusion G3 library for the C66x core, enabling performance and C66x DSP libraries. Note, however, that while comparison against the C66x. BDTI created and the TI matrix LU decomposition function optimized a median filter implementation for generates the permutation matrix as one of its Fusion G3 that is designed to be meaningfully outputs, the corresponding Cadence function does comparable to the median filter function found in not. BDTI estimated the additional number of the Texas Instruments library. Therefore, the cycles that would be required by the Cadence median filter benchmark result presented in this function to output the permutation matrix. The document reflects a fair comparison of the Fusion number of cycles for the Fusion G3 on the matrix G3 and C66x core architectures, rather than the LU decomposition reported here is computed by fastest possible implementation
Recommended publications
  • The RISC-V Instruction Set Manual, Volume I: User- Level ISA, Version 2.0
    The RISC-V Instruction Set Manual, Volume I: User- Level ISA, Version 2.0 Andrew Waterman Yunsup Lee David A. Patterson Krste Asanovic Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2014-54 http://www.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-54.html May 6, 2014 Copyright © 2014, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. The RISC-V Instruction Set Manual Volume I: User-Level ISA Version 2.0 Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanovi´c CS Division, EECS Department, University of California, Berkeley fwaterman|yunsup|pattrsn|[email protected] May 6, 2014 Preface This is the second release of the user ISA specification, and we intend the specification of the base user ISA plus general extensions (i.e., IMAFD) to remain fixed for future development. The following changes have been made since Version 1.0 [25] of this ISA specification. • The ISA has been divided into an integer base with several standard extensions. • The instruction formats have been rearranged to make immediate encoding more efficient. • The base ISA has been defined to have a little-endian memory system, with big-endian or bi-endian as non-standard variants.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Embedded Linux Primer: a Practical Real-World Approach
    Embedded Linux Primer: A Practical, Real-World Approach By Christopher Hallinan ............................................... Publisher: Prentice Hall Pub Date: September 18, 2006 Print ISBN-10: 0-13-167984-8 Print ISBN-13: 978-0-13-167984-9 Pages: 576 Table of Contents | Index Comprehensive Real-World Guidance for Every Embedded Developer and Engineer This book brings together indispensable knowledge for building efficient, high-value, Linux-based embedded products: information that has never been assembled in one place before. Drawing on years of experience as an embedded Linux consultant and field application engineer, Christopher Hallinan offers solutions for the specific technical issues you're most likely to face, demonstrates how to build an effective embedded Linux environment, and shows how to use it as productively as possible. Hallinan begins by touring a typical Linux-based embedded system, introducing key concepts and components, and calling attention to differences between Linux and traditional embedded environments. Writing from the embedded developer's viewpoint, he thoroughly addresses issues ranging from kernel building and initialization to bootloaders, device drivers to file systems. Hallinan thoroughly covers the increasingly popular BusyBox utilities; presents a step-by-step walkthrough of porting Linux to custom boards; and introduces real-time configuration via CONFIG_RT--one of today's most exciting developments in embedded Linux. You'll find especially detailed coverage of using development tools to analyze
    [Show full text]
  • The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2
    The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2 Editors: Andrew Waterman1, Krste Asanovi´c1;2 1SiFive Inc., 2CS Division, EECS Department, University of California, Berkeley [email protected], [email protected] May 7, 2017 Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Krste Asanovi´c,Rimas Aviˇzienis,Jacob Bachmeyer, Christopher F. Batten, Allen J. Baum, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, David Chisnall, Paul Clayton, Palmer Dabbelt, Stefan Freudenberger, Jan Gray, Michael Hamburg, John Hauser, David Horner, Olof Johansson, Ben Keller, Yunsup Lee, Joseph Myers, Rishiyur Nikhil, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Colin Schmidt, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Ray VanDeWalker, Megan Wachs, Andrew Waterman, Robert Wat- son, and Reinoud Zandijk. This document is released under a Creative Commons Attribution 4.0 International License. This document is a derivative of \The RISC-V Instruction Set Manual, Volume I: User-Level ISA Version 2.1" released under the following license: c 2010{2017 Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanovi´c. Creative Commons Attribution 4.0 International License. Please cite as: \The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 2.2", Editors Andrew Waterman and Krste Asanovi´c,RISC-V Foundation, May 2017. Preface This is version 2.2 of the document describing the RISC-V user-level architecture. The document contains the following versions of the RISC-V ISA modules: Base Version Frozen? RV32I 2.0 Y RV32E 1.9 N RV64I 2.0 Y RV128I 1.7 N Extension Version Frozen? M 2.0 Y A 2.0 Y F 2.0 Y D 2.0 Y Q 2.0 Y L 0.0 N C 2.0 Y B 0.0 N J 0.0 N T 0.0 N P 0.1 N V 0.2 N N 1.1 N To date, no parts of the standard have been officially ratified by the RISC-V Foundation, but the components labeled \frozen" above are not expected to change during the ratification process beyond resolving ambiguities and holes in the specification.
    [Show full text]
  • Instruction Set Extension with Shadow Registers for Configurable Processors
    Instruction Set Extension with Shadow Registers for Configurable Processors Jason Cong, Yiping Fan, Guoling Han, Ashok Jagannathan, Glenn Reinman, Zhiru Zhang Computer Science Department, University of California, Los Angeles Los Angeles, CA 90095, USA {cong, fanyp, leohgl, ashokj, reinman, zhiruz}@cs.ucla.edu ABSTRACT and the speedup (and power savings) offered by an application- Configurable processors are becoming increasingly popular for specific hardware accelerator. Generally, there are two ways to modern embedded systems (especially for the field-programmable couple the reconfigurable fabric with the microprocessor [5]. system-on-a-chip). While steady progress has been made in the Loosely coupled, a reconfigurable fabric can be used as a co- tools and methodologies of automatic instruction set extension for processor [21][11]. Co-processors perform more complicated configurable processors, the limited data bandwidth available in tasks independently without the constant supervision of the main the core processor (e.g., the number of simultaneous accesses to processor. The main processor sends the necessary data to the co- the register file) becomes a potential performance bottleneck. In processor at the initialization stage. With the internal state this paper we first present a quantitative analysis of the data registers, the co-processor does not need to transfer data during bandwidth limitation in configurable processors, and then propose the computation period. On the contrary, application-specific a novel low-cost architectural extension and associated instruction-set processors (ASIPs) tightly integrate the compilation techniques to address the problem. The application of reconfigurable fabric as additional application-specific function our approach results in a promising performance improvement.
    [Show full text]
  • The What, Why, and How of Customizable Processors Meeting Performance, Cost, and Power Objectives While Reducing ASIC Design Risk and Increasing Design Flexibility
    The What, Why, and How of Customizable Processors Meeting performance, cost, and power objectives while reducing ASIC design risk and increasing design flexibility Customizable processors that perform intensive data processing are designed to provide programmability in the performance-intensive dataplane of the system-on-chip (SoC) design. Not only do they combine the capabilities of a DSP and a CPU, but they can be customized to maximize efficiency for your target application. Introduction Contents While processors are often used for the control functions in system-on-chip Introduction ......................................1 (SoC) designs, designers turn to RTL blocks for many data-intensive functions Getting More Performance the Old that control processors can’t handle. However, RTL blocks take a long time to Way #1—Higher Clock Speed ...........2 design and even longer to verify, and they are not programmable to handle Getting More Performance the Old multiple standards or designs. Way #2—RTL Acceleration ................2 What is a Customizable Processor? ....3 The most common embedded microprocessor architectures—such as the ARM®, MIPS, and PowerPC processors—were developed in the 1980s for Processor Design Cycle ......................4 stand-alone microprocessor chips. These general-purpose processor architec- Achieving Lower Energy Consumption tures, or CPUs, are good at executing a wide range of algorithms with a focus with the Xtensa Processor .................5 on control code, but SoC designers often need more performance in
    [Show full text]
  • Diamond Standard Processor Cores
    WHITE PAPER Diamond Standard Processor Cores A comprehensive family of software-compatible, preconfigured RISC controller cores for your next ASIC or SOC design Tensilica’s Diamond Standard Series is a broad family of preconfigured 32-bit microprocessor and DSP Intellectual Property (IP) cores based on Tensilica’s Xtensa Instruction Set Architecture (ISA). All of these code-compatible processor cores employ a common set of software/firmware development tools making it easy for development teams to move from one processor core to another as design needs change or when your design needs multiple cores to execute different tasks. The base Xtensa ISA employs 24-bit, general-purpose RISC instructions that target a wide range of embedded applications. Most common instructions also have a 16-bit narrow encoding to minimize code footprint and the Diamond Series architecture allows modeless switching between 16 and 24-bit instructions. Consequentially, Diamond Standard Series processor cores achieve the highest code densities among all 32-bit RISC processors while delivering industry-leading performance. Introduction Tensilica’s Diamond Standard Series is a family of code-compatible, preconfigured 32-bit microprocessor and DSP Intellectual Property (IP) cores based on Tensilica’s Xtensa Instruction Set Architecture (ISA). The base Xtensa ISA uses 24 bit instructions that target a wide range of embedded applications. Most common instructions in the Xtensa ISA also have 16-bit narrow encodings as well, and the architecture allows modeless switching between 16 and 24-bit instructions so that the compiler is free to pick the smallest possible instruction at any time with no performance penalty. As a result, the Diamond Series processors achieve the highest code densities among all 32-bit RISC processors.
    [Show full text]
  • Xtensa LX Microprocessor Overview Handbook Iii Contents
    Xtensa ® LX Microprocessor Overview Handbook A Summary of the Xtensa® LX Microprocessor Data Book For Xtensa® LX Processor Cores Tensilica, Inc. 3255-6 Scott Blvd Santa Clara, CA 95054 (408) 986-8000 fax (408) 986-8919 www.tensilica.com © 2004 Tensilica, Inc. Printed in the United States of America All Rights Reserved This publication is provided “AS IS.” Tensilica, Inc. (hereafter “Tensilica”) does not make any warranty of any kind, either ex- pressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Information in this document is provided solely to enable system and software developers to use Tensilica processors. Unless specifically set forth herein, there are no express or implied patent, copyright or any other intellectual property rights or licens- es granted hereunder to design or fabricate Tensilica integrated circuits or integrated circuits based on the information in this document. Tensilica does not warrant that the contents of this publication, whether individually or as one or more groups, meets your requirements or that the publication is error-free. This publication could include technical inaccuracies or typo- graphical errors. Changes may be made to the information herein, and these changes may be incorporated in new editions of this publication. Tensilica and Xtensa are registered trademarks of Tensilica, Inc. The following terms are trademarks of Tensilica, Inc.: FLIX, OSKit, Sea of Processors, Vectra, Xplorer, and XPRES. All other trademarks and registered trademarks are the property of their respective companies. Issue Date: 9/2004 PD-04-2508-10-00 Tensilica, Inc. 3255-6 Scott Blvd Santa Clara, CA 95054 (408) 986-8000 fax (408) 986-8919 www.tensilica.com Contents Contents Introducing the Xtensa LX Processor Generator ...............................................................
    [Show full text]
  • The RISC-V Instruction Set Manual Volume I: Unprivileged ISA Document Version 20191213
    The RISC-V Instruction Set Manual Volume I: Unprivileged ISA Document Version 20191213 Editors: Andrew Waterman1, Krste Asanovi´c1;2 1SiFive Inc., 2CS Division, EECS Department, University of California, Berkeley [email protected], [email protected] December 13, 2019 Contributors to all versions of the spec in alphabetical order (please contact editors to suggest corrections): Arvind, Krste Asanovi´c,Rimas Aviˇzienis,Jacob Bachmeyer, Christopher F. Bat- ten, Allen J. Baum, Alex Bradbury, Scott Beamer, Preston Briggs, Christopher Celio, Chuanhua Chang, David Chisnall, Paul Clayton, Palmer Dabbelt, Ken Dockser, Roger Espasa, Shaked Flur, Stefan Freudenberger, Marc Gauthier, Andy Glew, Jan Gray, Michael Hamburg, John Hauser, David Horner, Bruce Hoult, Bill Huffman, Alexandre Joannou, Olof Johansson, Ben Keller, David Kruckemyer, Yunsup Lee, Paul Loewenstein, Daniel Lustig, Yatin Manerkar, Luc Maranget, Mar- garet Martonosi, Joseph Myers, Vijayanand Nagarajan, Rishiyur Nikhil, Jonas Oberhauser, Stefan O'Rear, Albert Ou, John Ousterhout, David Patterson, Christopher Pulte, Jose Renau, Josh Scheid, Colin Schmidt, Peter Sewell, Susmit Sarkar, Michael Taylor, Wesley Terpstra, Matt Thomas, Tommy Thorn, Caroline Trippel, Ray VanDeWalker, Muralidaran Vijayaraghavan, Megan Wachs, Andrew Waterman, Robert Watson, Derek Williams, Andrew Wright, Reinoud Zandijk, and Sizhuo Zhang. This document is released under a Creative Commons Attribution 4.0 International License. This document is a derivative of \The RISC-V Instruction Set Manual, Volume I: User-Level ISA Version 2.1" released under the following license: ⃝c 2010{2017 Andrew Waterman, Yunsup Lee, David Patterson, Krste Asanovi´c. Creative Commons Attribution 4.0 International License. Please cite as: \The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Document Version 20191213", Editors Andrew Waterman and Krste Asanovi´c,RISC-V Foundation, December 2019.
    [Show full text]
  • Xtensa Instruction Set Architecture (ISA) Reference Manual Iii Contents
    Xtensa® Instruction Set Architecture (ISA) Reference Manual For All Xtensa Processor Cores Tensilica, Inc. 3255-6 Scott Blvd. Santa Clara, CA 95054 (408) 986-8000 fax (408) 986-8919 www.tensilica.com © 2010 Tensilica, Inc. Printed in the United States of America All Rights Reserved This publication is provided “AS IS.” Tensilica, Inc. (hereafter “Tensilica”) does not make any warranty of any kind, either ex- pressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Information in this document is provided solely to enable system and software developers to use Tensilica processors. Unless specifically set forth herein, there are no express or implied patent, copyright or any other intellectual property rights or licens- es granted hereunder to design or fabricate Tensilica integrated circuits or integrated circuits based on the information in this document. Tensilica does not warrant that the contents of this publication, whether individually or as one or more groups, meets your requirements or that the publication is error-free. This publication could include technical inaccuracies or typo- graphical errors. Changes may be made to the information herein, and these changes may be incorporated in new editions of this publication. Tensilica and Xtensa are registered trademarks of Tensilica, Inc. The following terms are trademarks of Tensilica, Inc.: FLIX, OSKit, Sea of Processors, TurboXim, Vectra, Xenergy, Xplorer, and XPRES. All other trademarks and registered trademarks are the property of their respective companies. Issue Date: 4/2010 RC-2010.1 Release PD-09-0801-10-01 Tensilica, Inc. 3255-6 Scott Blvd.
    [Show full text]
  • Soft-Core Processors for Embedded Systems
    RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR SoftSoft--CoreCore ProcessorsProcessors ForFor EmbeddedEmbedded SystemsSystems Jason G. Tong and Ian D. L. Anderson Supervisor: Dr. M.A.S. Khalid ICM’06, December 2006 Research Center for Integrated Microsystems Department of Electrical and Computer Engineering University of Windsor 18th International Conference on Microelectronics – December 16th – 19th, 2006 Slide 1 RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR OUTLINE • Introduction • A Survey of Soft-Core Processors • Commerical Cores and Tools • Open-source Cores • Some Example Applications • Comparison of Soft-Core Processors • Conclusions and Future Work 18th International Conference on Microelectronics – December 16th – 19th, 2006 Slide 2 RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR Introduction 18th International Conference on Microelectronics – December 16th – 19th, 2006 Slide 3 RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR Embedded Systems •An embedded system: a system that utilizes custom Embedded System hardware and software to carry out specific tasks Embedded CPU • Digital Hardware: Software running – Microprocessor or µC on CPU – Application-specific Memory hardware generally used & I/O for accelerating time- Application- critical tasks specific • Embedded software running hardware on the μP or μC 18th International Conference on Microelectronics – December 16th – 19th, 2006 Slide 4 RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY
    [Show full text]
  • RISC-V OFFERS SIMPLE, MODULAR ISA New CPU Instruction Set Is Open and Extensible
    RISC-V OFFERS SIMPLE, MODULAR ISA New CPU Instruction Set Is Open and Extensible By David Kanter (March 28, 2016) ................................................................................................................... RISC-V is a new general-purpose instruction-set archi- when Tegra 2 omitted the Neon vector extensions, it created tecture (ISA) that’s BSD licensed, extensible, and royalty substantial consternation in the Android ecosystem. free. It’s clean and modular with a 32-, 64-, or 128-bit inte- To address this issue, RISC-V defines a guaranteed ger base and various optional extensions (e.g., floating base integer instruction set of 32, 64, or 128 bits, a family of point). RISC-V is easier to implement than some alterna- optional and predefined extensions, and a mechanism for tives—minimal RISC-V cores are roughly half the size of creating variable-length extensions. The base ISA is cleanly equivalent ARM cores—and the ISA has already gathered architected to simplify implementation. For example, the some support from the semiconductor industry. instruction encoding is highly regular, and the memory The ambitious goal of RISC-V (pronounced “risk model is straightforward and lacks complicated memory five”) is to create a “universal” instruction set that is free instructions. The benefit is that minimal RISC-V cores are and open to all users while providing the broad software much smaller than similar ARM or x86 cores, although the support of successful commercial ISAs (e.g., x86 and ARM). difference is not noticeable for more powerful cores (e.g., Consider a modern smartphone processor, which typically superscalar, out-of-order). To enable extensions, portions of comprises a dozen or more cores with different software the instruction encoding space have already been set aside stacks (e.g., an ARM application CPU, a GPU, three to five for future use.
    [Show full text]