Presentation

Total Page:16

File Type:pdf, Size:1020Kb

Presentation A VLIW Processor with Hardware Functions: Increasing Performance While Reducing Power Profs. Ray Hoare and Alex Jones Dara Kusic, Joshua Fazekas, Gayatri Mehta, and John Foster Objective High Performance PC to Supercomputer Performance Low Power PDA Power Rapid Design Cycle “Compile Software into Hardware” Unmodified C Source Code No Hardware Design Experience Required Automatic Integration with Processor Target Technologies For all devices, we want to Maximize Performance while Minimizing Power consumption and Design Time. Our Approach VLIW Architecture Up to 4x performance improvement for software alone Full 32-bit Processor Real applications Altera NIOS II Instruction Set Architecture “Hardware Functions” Convert critical software kernels into hardware 9x to 332x Performance improvement 42x to 418x Power reduction Productivity through Design Automation Source: Unmodified C programs “Compile” software into hardware functions VLIW Processor: Exploit Instruction Level Parallelism Full NIOS II 32-bit ISA Real processor, real apps Expanded to 4 ALUs for up to 4x speedup Shared Register File Shared Instruction Stream Trimaran Compiler Instr RAM Instruction Cust Cust Cust Cust Decoder ALU Instr ALU Instr ALU Instr ALU Instr MUX MUX MUX MUX Controller Hardware Functions: Exploit Hardware Acceleration Kernels converted to Hardware Data shared in Reg. File Control flow in Software VLIW Exploits Instruction Parallelism Hardware Provides Acceleration Kernels High Performance and converted Low Power to Hardware Data shared No Data movement in Reg. File Control flow Programmable in Software Benchmark Codes Speech Coding ADPCM Encoder (4:1 compression) ADPCM Decoder G.721 (cordless phones) GSM (wireless phones, 8:1 compression) Communications Spherical Decoder (M tx, N rx antennas) (Written in vectorized Matlab) Multimedia Decoding MPEG2 Decoder (IDCT) Performance Results Fully Implemented Processors FPGA Comparison: Soft-core processor vs. HW Function in 90nm FPGA pNIOS: our sequential processor VLIW-4: our 4-wide VLIW Hardware Functions vs. Equivalent Software Estimations based on a shared register file with HW Functions Intel Strong ARM 206MHz vs 160nm ASIC HW Functions Intel XScale 733MHz vs 160nm ASIC HW Functions Performance Results Acceleration Perspective A speedup of 2x turns 100MHz system into 200MHz of performance 10x provides 1GHz performance 100x provides 10GHz performance Kernel Acceleration: 7x to 345x Software vs. HW-Accelerated 1,000.00 100.00 Times Improvement Times 10.00 1.00 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark pNiosII VLIW-4 StrongARM@206MHz XScale@733MHz Application Acceleration: 1.8x to 154x Software vs. HW-Accelerated 1,000.00 100.00 Times Improvement 10.00 1.00 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark pNiosII + HW VLIW-4 + HW StrongARM@206MHz+HW XScale@733MHz+HW Application Acceleration: pNIOS 18 16 14 12 10 8 Times Improved 6 4 2 0 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark VLIW over pNIOS pNIOS+Hw over pNIOS VLIW4+HW over pNIOS Effective MHz for pNIOS 3000 2500 2000 1500 MHz 1000 500 0 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark pNIOS VLIW4 pNIOS+HW VLIW4+HW Application Acceleration: StrongARM 1,000.00 100.00 Times Improved 10.00 1.00 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark StrongARM+HW over StrongARM Effective MHz for StrongARM 27,226 31,750 1000 900 800 700 600 500 MHz 400 300 200 100 0 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark StrongARM StongARM+HW Application Acceleration: XScale 100.00 10.00 Times Improved Times 1.00 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark Xscale+HW over Xscale Effective MHz for XScale 24,906 27,700 3000 2500 2000 1500 MHz 1000 500 0 ADPCM Encode ADPCM Decode G.721 Decode GSM Decode MPEG2 Decode Benchmark Xscale Xscale+HW Power/Energy Savings Software on 4 wide VLIW in 160nm ASIC vs. Hardware Functions in 160nm ASIC Power Reduction: 42x to 418x Energy Reduction: 1,700x to 60,000x If (bufferstep) { delta = inputbuffer & 0xf; (a) } else { inputbuffer = *inp++ Automatic Conversion delta = (inputbuffer >> 4) & 0xf; } of Software Functions bufferstep into Hardware != 0x0 If Basic Block Functions (b) *inp inp inputbuffer Specified hardware inputbuffer 0xF 0x4 0xF >> functions are converted & ++ & delta inp automatically into delta hardware Then Basic Block Else Basic Block Part of the “compiler” inp *inp inputbuffer Prototype compiler is bufferstep inputbuffer 0xF 0xF >> 0x4 (c) working in laboratory ++ != 0x0 & & T/1 F/0 T/ 1 F/ 0 2:1 MUX 2:1 MUX inp delta ADPCM - CDFG Control Flow for 'adpcm_code r' Basic Block 3 Basic Block 5 Basic Block 6 Basic Block 9 Basic Block 11 Basic Block 13 Basic Block 14 Basic Block 16 Basic Block 17 Basic Block 19 Basic Block 20 Basic Block 22 Basic Block 23 Basic Block 24 Basic Block 25 Basic Block 26 Basic Block 27 Basic Block 28 Basic Block 29 Basic Block 31 Basic Block 34 Basic Block 35 Basic Block 37 Basic Block 38 Basic Block 39 Basic Block 40 BB3 2 inp 8 0 0 suif_tmp0 di ff di ff st ep 3 0 vpdi ff step di ff 4 st ep 1 vpdi ff st ep di ff delta step 1 vpdi ff step delt a sign 0 valpre d vpdi ff valp re d vpdi ff 32 767 valp re d 32 767 valpre d -3 2768 delt a sign -3 2768 0 88 index 88 buf fe rstep 0 stepsizeT able inde x delta 4 delta bu ffer step 0 BB5 BB6 + load suif_tmp suif_tmp0 suif_tmp0 != sign *-1 <= >> delt a + - delta diff >> + - conver t 2 di ff >> + conver t 1 != - + < valp red < conv ert conver t valpre d inde x < inde x != + << conver t 15 != 0 BB9 valpre d in p conver t eval di ff eval vpdi ff vpdi ff di ff <= st ep vpdiff diff | <= st ep vpdiff | eval valp red valp re d eval eval | eval eval load conver t 240 & == BB11 0 - val eval conver t eval conver t indexT able conver t step & conver t outputbuf fe r buf fe rs tep BB13 < di ff delt a delt a + delt a conver t conver t conver t BB14 eval inde x load outputbuf fer | BB16 0 + 1 out p conver t BB17 < inde x + addr suif_tmp conver t BB19 eval outp stor e BB20 BB22 BB23 BB24 BB25 BB27 BB26 BB29 BB28 BB31 BB34 BB35 BB37 BB38 BB39 BB40 Control Flow for 'adpcm_code r ' BB3 ADPCM - CFG BB5 BB6 BB9 BB1 1 BB1 3 Control flow dominated BB1 4 BB1 6 if-then-else BB1 7 BB1 9 ILP < 2 BB2 0 Branch prediction is poor BB2 2 BB2 3 BB2 4 Poor inherent parallelism BB2 5 BB2 7 DFGs contain variable BB2 6 BB2 9 BB2 8 length critical paths BB3 1 BB3 4 Cycle time for RTL BB3 5 conforms to longest DFG BB3 7 BB3 8 BB3 9 BB4 0 Basic Block 3 2 in p + load valpred in p convert 0 - val ADPCM - SDFG < 8 0 0 0 mux *-1 != st e p 3 != >> mux convert 1 + <= - 4 0 >> mux mux mux Single SDFG 1 + <= - convert 2 >> mux mux | Removal of clock + <= convert mux mux Combinational/Data - + convert 1 32767 -32768 mux | Flow < < -32768 convert 32767 mux mux Cycle compression mux convert valpred | Significant increase in 4 convert indexTable << convert 15 + parallelism conver t 240 & index load 0 bufferstep 0 & outputbuffer convert 0 + != 0 != convert 1 outp convert convert < 0 == mux + mux | 88 mux bufferstep outputbuffer mux suif_tm p convert < 88 outp addr convert st epsizeTable mux store + load st e p MPEG II (IDCT) - CDFG Two nested loops within fast_IDCT function idct_row and idct_col In this example, control flow is segregated from most of the work Control Flow for 'Fast_IDC T ' Basic Bloc k 0 Basic Bloc k 1 Basic Bloc k 5 Basic Bloc k 6 BB0 1 0 8 1 0 8 8 i fo r_ st ep 2 i fo r_ st ep BB5 fo r_ st ep i fo r _ lb < fo r_ub fo r_ st ep i fo r _ lb < fo r_ub fo r _ub * + 2 bloc k * fo r _ub + BB1 ev a l ev a l < bloc k i * + < i BB6 ev a l + idctco l ev a l BB3 idctro w MPEG II (IDCT) - CDFG (row) Basic Bloc k 0 0 blk return 0 convert 3 convert 4 convert 2 convert 6 convert 0 convert 4 convert 2 convert 7 convert 3 convert 1 convert 1 convert 5 convert 5 convert 6 convert 7 convert + + + + + + + + + + + + + + + + load load load load load load load load 1568 convert 11 convert convert 37 8 4 convert 3406 convert 4017 convert convert convert 2276 799 * + 11 0 8 << 12 8 * 11 * * + 565 + 2408 * * addr addr addr * + << * * addr + + - - - - + - + - - + + + - - 8 + 8 x7 x8 + - 8 x0 18 1 x3 x6 - x1 x5 - 8 + 18 1 >> >> >> * 12 8 >> * 12 8 convert convert convert + 8 convert + 8 stor e store store >> store >> + 8 x4 + 8 - 8 - x2 8 >> addr >> addr >> >> convert convert convert addr addr convert store store store store MPEG II (IDCT) - CDFG (col) Basic Bloc k 0 0 blk return convert 16 convert 0 convert 24 convert 8 convert 32 convert 56 convert 16 convert 32 convert 56 convert 24 convert 40 convert 48 convert 8 convert 40 convert 48 + + + + + + + + + + + + + + + convert 0 load load load load load load load 1568 + convert 3406 convert 4017 convert convert convert convert 2276 799 37 8 4 * load * 11 0 8 + * 565 + + 2408 * * * convert 8 * 4 * 4 * 4 addr << 819 2 convert 8 + + + 3 + + << - 3 - 3 + 3 - 3 - >> addr + - >> >> >> >> addr + - addr + + - - 14 + + 14 x7 iclp - 14 x8 - 14 x6 18 1 x1 x5 - + 18 1 >> >> convert convert conver t convert >> >> 3 12 8 * * 12 8 + + + + >> 8 + + 8 load load load load - >> + >> convert store + 14 store store store x0 + 14 - x3 14 x4 - 14 x2 convert >> addr convert >> convert >> >> addr + + + + addr load load addr load load st ore st ore store st ore A VLIW Processor with Hardware Functions: Increasing Performance While Reducing Power For FPGA, Structured ASIC and ASIC technologies VLIW Architecture Up to 4x performance improvement for software alone Full 32-bit Processor Real applications Altera NIOS II Instruction Set Architecture “Hardware Functions” Convert critical software kernels into hardware 9x to 332x Performance improvement 42x to 418x Power reduction Productivity through Design Automation Source: Unmodified C programs “Compile” software into hardware functions Contact Prof.
Recommended publications
  • Intel® Strongarm® SA-1110 High- Performance, Low-Power Processor for Portable Applied Computing Devices
    Advance Copy Intel® StrongARM® SA-1110 High- Performance, Low-Power Processor For Portable Applied Computing Devices PRODUCT HIGHLIGHTS ■ Innovative Application Specific Standard Product (ASSP) delivers leadership performance, integration and low power for palm-size devices, PC companions, smart phones and other emerging portable applied computing devices As businesses and individuals rely increasingly on portable applied ■ High-speed 100 MHz memory bus and a computing devices to simplify their lives and boost their productivity, flexible memory these devices have to perform more complex functions quickly and controller that adds efficiently. To satisfy ever-increasing customer demands to support for SDRAM, communicate and access information ‘anytime, anywhere’, SMROM, and variable- manufacturers need technologies that deliver high-performance, robust latency I/O devices — provides design functionality and versatility while meeting the small-size and low-power flexibility, scalability and restrictions of portable, battery-operated products. Intel designed the high memory bandwidth SA-1110 processor with all of these requirements in mind. ■ Rich development The Intel® SA-1110 is a highly integrated 32-bit StrongARM® environment enables processor that incorporates Intel design and process technology along leading edge products with the power efficiency of the ARM* architecture. The SA-1110 is while reducing time- to-market software compatible with the ARM V4 architecture while utilizing a high-performance micro-architecture that is optimized to take advantage of Intel process technology. The Intel SA-1110 provides the performance, low power, integration and cost benefits of the Intel SA-1100 processor plus a high speed memory bus, flexible memory controller and the ability to handle variable-latency I/O devices.
    [Show full text]
  • Comparison of Contemporary Real Time Operating Systems
    ISSN (Online) 2278-1021 IJARCCE ISSN (Print) 2319 5940 International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 11, November 2015 Comparison of Contemporary Real Time Operating Systems Mr. Sagar Jape1, Mr. Mihir Kulkarni2, Prof.Dipti Pawade3 Student, Bachelors of Engineering, Department of Information Technology, K J Somaiya College of Engineering, Mumbai1,2 Assistant Professor, Department of Information Technology, K J Somaiya College of Engineering, Mumbai3 Abstract: With the advancement in embedded area, importance of real time operating system (RTOS) has been increased to greater extent. Now days for every embedded application low latency, efficient memory utilization and effective scheduling techniques are the basic requirements. Thus in this paper we have attempted to compare some of the real time operating systems. The systems (viz. VxWorks, QNX, Ecos, RTLinux, Windows CE and FreeRTOS) have been selected according to the highest user base criterion. We enlist the peculiar features of the systems with respect to the parameters like scheduling policies, licensing, memory management techniques, etc. and further, compare the selected systems over these parameters. Our effort to formulate the often confused, complex and contradictory pieces of information on contemporary RTOSs into simple, analytical organized structure will provide decisive insights to the reader on the selection process of an RTOS as per his requirements. Keywords:RTOS, VxWorks, QNX, eCOS, RTLinux,Windows CE, FreeRTOS I. INTRODUCTION An operating system (OS) is a set of software that handles designed known as Real Time Operating System (RTOS). computer hardware. Basically it acts as an interface The motive behind RTOS development is to process data between user program and computer hardware.
    [Show full text]
  • SIMD Extensions
    SIMD Extensions PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 12 May 2012 17:14:46 UTC Contents Articles SIMD 1 MMX (instruction set) 6 3DNow! 8 Streaming SIMD Extensions 12 SSE2 16 SSE3 18 SSSE3 20 SSE4 22 SSE5 26 Advanced Vector Extensions 28 CVT16 instruction set 31 XOP instruction set 31 References Article Sources and Contributors 33 Image Sources, Licenses and Contributors 34 Article Licenses License 35 SIMD 1 SIMD Single instruction Multiple instruction Single data SISD MISD Multiple data SIMD MIMD Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism. History The first use of SIMD instructions was in vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a vector of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector-processing architectures are now considered separate from SIMD machines, based on the fact that vector machines processed the vectors one word at a time through pipelined processors (though still based on a single instruction), whereas modern SIMD machines process all elements of the vector simultaneously.[1] The first era of modern SIMD machines was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These machines had many limited-functionality processors that would work in parallel.
    [Show full text]
  • Generic Pipelined Processor Modeling and High Performance
    Generic Pipelined Processor Modeling and High Performance Cycle-Accurate Simulator Generation Mehrdad Reshadi, Nikil Dutt Center for Embedded Computer Systems (CECS), Donald Bren School of Information and Computer Science, University of California Irvine, CA 92697, USA. {reshadi, dutt}@cecs.uci.edu simulators were more limited or slower than their manually generated Abstract counterparts. Detailed modeling of processors and high performance cycle- Colored Petri Net (CPN) [1] is a very powerful and flexible accurate simulators are essential for today’s hardware and software modeling technique and has been successfully used for describing design. These problems are challenging enough by themselves and parallelism, resource sharing and synchronization. It can naturally have seen many previous research efforts. Addressing both capture most of the behavioral elements of instruction flow in a simultaneously is even more challenging, with many existing processor. However, CPN models of realistic processors are very approaches focusing on one over another. In this paper, we propose complex mostly due to incompatibility of a token-based mechanism for the Reduced Colored Petri Net (RCPN) model that has two capturing data hazards. Such complexity reduces the productivity and advantages: first, it offers a very simple and intuitive way of modeling results in very slow simulators. In this paper, we present Reduced pipelined processors; second, it can generate high performance cycle- Colored Petri Net (RCPN), a generic modeling approach for accurate simulators. RCPN benefits from all the useful features of generating fast cycle-accurate simulators for pipelined processors. Colored Petri Nets without suffering from their exponential growth in RCPN is based on CPN and reduces the modeling complexity by complexity.
    [Show full text]
  • IXP400 Software's Programmer's Guide
    Intel® IXP400 Software Programmer’s Guide June 2004 Document Number: 252539-002c Intel® IXP400 Software Contents INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. The Intel® IXP400 Software v1.2.2 may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of MPEG CODECs, or MPEG enabled platforms may require licenses from various entities, including Intel Corporation. This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation.
    [Show full text]
  • Comparative Architectures
    Comparative Architectures CST Part II, 16 lectures Lent Term 2006 David Greaves [email protected] Slides Lectures 1-13 (C) 2006 IAP + DJG Course Outline 1. Comparing Implementations • Developments fabrication technology • Cost, power, performance, compatibility • Benchmarking 2. Instruction Set Architecture (ISA) • Classic CISC and RISC traits • ISA evolution 3. Microarchitecture • Pipelining • Super-scalar { static & out-of-order • Multi-threading • Effects of ISA on µarchitecture and vice versa 4. Memory System Architecture • Memory Hierarchy 5. Multi-processor systems • Cache coherent and message passing Understanding design tradeoffs 2 Reading material • OHP slides, articles • Recommended Book: John Hennessy & David Patterson, Computer Architecture: a Quantitative Approach (3rd ed.) 2002 Morgan Kaufmann • MIT Open Courseware: 6.823 Computer System Architecture, by Krste Asanovic • The Web http://bwrc.eecs.berkeley.edu/CIC/ http://www.chip-architect.com/ http://www.geek.com/procspec/procspec.htm http://www.realworldtech.com/ http://www.anandtech.com/ http://www.arstechnica.com/ http://open.specbench.org/ • comp.arch News Group 3 Further Reading and Reference • M Johnson Superscalar microprocessor design 1991 Prentice-Hall • P Markstein IA-64 and Elementary Functions 2000 Prentice-Hall • A Tannenbaum, Structured Computer Organization (2nd ed.) 1990 Prentice-Hall • A Someren & C Atack, The ARM RISC Chip, 1994 Addison-Wesley • R Sites, Alpha Architecture Reference Manual, 1992 Digital Press • G Kane & J Heinrich, MIPS RISC Architecture
    [Show full text]
  • Demystifying Internet of Things Security Successful Iot Device/Edge and Platform Security Deployment — Sunil Cheruvu Anil Kumar Ned Smith David M
    Demystifying Internet of Things Security Successful IoT Device/Edge and Platform Security Deployment — Sunil Cheruvu Anil Kumar Ned Smith David M. Wheeler Demystifying Internet of Things Security Successful IoT Device/Edge and Platform Security Deployment Sunil Cheruvu Anil Kumar Ned Smith David M. Wheeler Demystifying Internet of Things Security: Successful IoT Device/Edge and Platform Security Deployment Sunil Cheruvu Anil Kumar Chandler, AZ, USA Chandler, AZ, USA Ned Smith David M. Wheeler Beaverton, OR, USA Gilbert, AZ, USA ISBN-13 (pbk): 978-1-4842-2895-1 ISBN-13 (electronic): 978-1-4842-2896-8 https://doi.org/10.1007/978-1-4842-2896-8 Copyright © 2020 by The Editor(s) (if applicable) and The Author(s) This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this book are included in the book’s Creative Commons license, unless indicated otherwise in a credit line to the material.
    [Show full text]
  • IXP43X Product Line of Network Processors Specification Update December 2008 2 Order Number: 316847; Revision: 005US Contents
    Intel® IXP43X Product Line of Network Processors Specification Update December 2008 Order Number: 316847; Revision: 005US INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families. See http://www.intel.com/products/processor_number for details. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.
    [Show full text]
  • Arm C Language Extensions Documentation Release ACLE Q1 2019
    Arm C Language Extensions Documentation Release ACLE Q1 2019 Arm Limited. Mar 21, 2019 Contents 1 Preface 1 1.1 Arm C Language Extensions.......................................1 1.2 Abstract..................................................1 1.3 Keywords.................................................1 1.4 How to find the latest release of this specification or report a defect in it................1 1.5 Confidentiality status...........................................1 1.5.1 Proprietary Notice.......................................2 1.6 About this document...........................................3 1.6.1 Change control.........................................3 1.6.1.1 Change history.....................................3 1.6.1.2 Changes between ACLE Q2 2018 and ACLE Q1 2019................3 1.6.1.3 Changes between ACLE Q2 2017 and ACLE Q2 2018................3 1.6.2 References...........................................3 1.6.3 Terms and abbreviations....................................3 1.7 Scope...................................................4 2 Introduction 5 2.1 Portable binary objects..........................................5 3 C language extensions 7 3.1 Data types................................................7 3.1.1 Implementation-defined type properties............................7 3.2 Predefined macros............................................8 3.3 Intrinsics.................................................8 3.3.1 Constant arguments to intrinsics................................8 3.4 Header files................................................8
    [Show full text]
  • AASP Brief 031704F.Pdf
    servicebrief Avnet Avenue Service Provider Program Avnet Design Services has teamed up with the top design service companies in North America to provide you with superior component, board and system level solutions. In cooperation with Avnet Design Services, you can access these pre-screened and certified design service providers. WHAT is the Avnet Avenue Service Provider Program? A geographically dispersed and technical diverse network of design service providers available to fulfill your design service needs Avnet's seven partners are the top design service companies in North America The program compliments Avnet Design Services' ASIC and FPGA design service offerings by providing additional component, board and system-level design services WHY use an Avnet Avenue Service Provider? Time to Market The program provides additional technical resources to assist you in meeting your time to market requirements Value All Providers are selected based on their ability to provide cost competitive solutions Experience All Providers have proven experience completing a wide array of projects on time and within budget Less Risk All Providers are pre-screened and certified to ensure your success Technology The program provides you with single source access to a broad range of services and technical expertise Scale All Providers are capable of supporting the full range of design service requirements from very large to small HOW do I access the Avnet Avenue Service Provider Program? Contact your local Avnet Representative or call 1-800-585-1602 so that
    [Show full text]
  • Intel® Itanium® Architecture Assembly Language Reference Guide
    Intel® Itanium® Architecture Assembly Language Reference Guide Copyright © 2000 - 2003 Intel Corporation. All rights reserved. Order Number: 248801-004 World Wide Web: http://developer.intel.com Intel(R) Itanium(R) Architecture Assembly Lanuage Reference Guide Page 2 Disclaimer and Legal Information Information in this document is provided in connection with Intel products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications. This Intel® Itanium® Architecture Assembly Language Reference Guide as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]