Cache Aware Optimization of Stream Programs

Total Page:16

File Type:pdf, Size:1020Kb

Cache Aware Optimization of Stream Programs Cache Aware Optimization of Stream Programs Janis Sermulins William Thies Rodric Rabbah Saman Amarasinghe Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology fjaniss, thies, rabbah, [email protected] Abstract invariably expensive and complex. For example, superscalar pro- Effective use of the memory hierarchy is critical for achieving cessors resort to out-of-order execution to hide the latency of cache high performance on embedded systems. We focus on the class of misses. This results in large power expenditures (unfit for embed- streaming applications, which is increasingly prevalent in the em- ded systems) and also increases the cost of the system. Compilers bedded domain. We exploit the widespread parallelism and regular have also employed computation and data reordering to improve communication patterns in stream programs to formulate a set of locality, but this requires a heroic analysis due to the obscured par- cache aware optimizations that automatically improve instruction allelism and communication patterns in traditional languages such and data locality. Our work is in the context of the Synchronous as C. Dataflow model, in which a program is described as a graph of For performance-critical programs, the complexity inevitably independent actors that communicate over channels. The commu- propagates all the way to the application developer. Programs are nication rates between actors are known at compile time, allowing written to explicitly manage parallelism and to reorder the compu- the compiler to statically model the caching behavior. tation so that the instruction and data working sets fit within the We present three cache aware optimizations: 1) execution scal- cache. For example, the inputs and outputs of a procedure might be ing, which judiciously repeats actor executions to improve instruc- arrays that are specifically designed to fit within the data cache on tion locality, 2) cache aware fusion, which combines adjacent ac- a given architecture; loop bodies are written at a level of granular- tors while respecting instruction cache constraints, and 3) scalar ity that matches the instruction cache. While manual tuning can be replacement, which converts certain data buffers into a sequence effective, the end solutions are not portable. They are also exceed- of scalar variables that can be register allocated. The optimizations ingly difficult to understand, modify, and debug. are founded upon a simple and intuitive model that quantifies the The recent emergence of streaming applications represents an temporal locality for a sequence of actor executions. Our imple- opportunity to mitigate these problems using simple transforma- mentation of cache aware optimizations in the StreamIt compiler tions in the compiler. Stream programs are rich with parallelism and yields a 249% average speedup (over unoptimized code) for our regular communication patterns that can be exploited by the com- streaming benchmark suite on a StrongARM 1110 processor. The piler to automatically tune memory performance. Streaming codes optimizations also yield a 154% speedup on a Pentium 3 and a encompass a broad spectrum of applications, including embedded 152% speedup on an Itanium 2. communications processing, multimedia encoding and playback, compression, and encryption. They also range to server applica- Categories and Subject Descriptors D.3.4 [Programming Lan- tions, such as HDTV editing and hyper-spectral imaging. It is nat- guages]: Processors—Optimization; code generation; compilers; ural to express a stream program as a high-level graph of inde- D.3.2 [Programming Languages]: Language Classifications— pendent components, or actors. Actors communicate using explicit Concurrent, distributed, and parallel languages; Data-flow lan- FIFO channels and can execute whenever a sufficient number of guages items are available on their input channels. In a stream graph, actors General Terms Languages, Design, Performance can be freely combined and reordered to improve caching behav- ior as long as there are sufficient inputs to complete each execution. Keywords Stream Programing, StreamIt, Synchronous Dataflow, Such transformations can serve to automate tedious approaches that Cache, Cache Optimizations, Fusion, Embedded are performed manually using today’s languages; they are too com- plex to perform automatically in hardware or in the most aggressive 1. Introduction of C compilers. This paper presents three simple cache aware optimizations for Efficiency and high performance are of central importance within stream programs: (i) execution scaling, (ii) cache aware fusion, and the embedded domain. As processor speeds continue to increase, (iii) scalar replacement. These optimizations represent a unified ap- the memory bottleneck remains a primary impediment to attain- proach that simultaneously considers the instruction and data work- ing performance. Current practices for hiding memory latency are ing sets. We also develop a simple quantitative model of caching behavior for streaming workloads, providing a foundation to rea- son about the transformations. Our work is done in the context of the Synchronous Dataflow [13] model of computation, in which each actor in the stream graph has a known input and output rate. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed This is a popular model for a broad range of signal processing and for profit or commercial advantage and that copies bear this notice and the full citation embedded applications. on the first page. To copy otherwise, to republish, to post on servers or to redistribute Execution scaling is a transformation that improves instruction to lists, requires prior specific permission and/or a fee. locality by executing each actor in the stream graph multiple times LCTES'05, June 15–17, 2005, Chicago, Illinois, USA. before moving on to the next actor. As a given actor usually fits Copyright c 2005 ACM 1-59593-018-3/05/0006. $5.00. within the cache, the repeated executions serve to amortize the cost of loading the actor from off-chip memory. However, as our float->float filter FIR_Filter (int N, float[] weights) { work push 1 pop 1 peek N { cache model will show, actors should not be scaled excessively, as float sum = 0; their outputs will eventually overflow the data cache. We present a for (int i = 0; i < N; i++) { simple and effective algorithm for calculating a scaling factor that sum += peek(i) * weights[i]; respects both instruction and data constraints. } pop(); Prior to execution scaling, cache aware fusion combines adja- push(sum); cent actors into a single function. This allows the compiler to opti- } mize across actor boundaries. Our algorithm is cache aware in that } it never fuses a pair of actors that will result in an overflow of the instruction cache. Figure 1. StreamIt code for an FIR filter As actors are fused together, new buffer management strategies become possible. The most aggressive of these, termed scalar re- placement, serves to replace an array with a series of local scalar splitter variables. Unlike array references, scalar variables can be regis- stream ter allocated, leading to large performance gains. We also develop joiner a new buffer management strategy (called “copy-shift”) that ex- stream st am st am tends scalar replacement to sliding-window computations, a do- stream stream re re stream main where complex indexing expressions typically hinder com- splitter piler analysis. stream Our cache aware optimizations are implemented as part of joiner StreamIt, a language and compiler infrastructure for stream pro- gramming [21]. We evaluate the optimizations on three architec- tures. The StrongARM 1110 represents our primary target; it is an (a) pipeline (b) splitjoin (c) feedback loop embedded processor without a secondary cache. Our other targets are the Pentium 3 (a superscalar) and the Itanium 2 (a VLIW pro- Figure 2. Hierarchical streams in StreamIt. cessor). We find that execution scaling, cache aware fusion, and scalar replacement each offer significant performance gains, and the most consistent speedups result when all are applied together. float -> float pipeline Main() { Source Compared to unoptimized StreamIt code, our cache optimizations add Source(); // code for Source not shown yield a 249% speedup on the StrongARM, a 154% speedup on add FIR(); FIR the Pentium 3, and a 152% speedup on Itanium 2. These numbers add Output(); // code for Output not shown u u represent averages over our streaming benchmark suite. } O tp t This paper is organized as follows. Section 2 gives background information on the StreamIt language. Section 3 lays the foundation Figure 3. Example pipeline with FIR filter. for our approach by developing a quantitative model of caching be- havior for any sequence of actor executions. Section 4 describes execution scaling and cache aware scheduling. Section 5 evaluates N weights buffer management strategies, including scalar replacement. Sec- (e.g., and ) are equivalent to parameters passed to a tion 6 contains our experimental evaluation of these techniques in class constructor. the StreamIt compiler. Finally, Section 7 describes related work and In StreamIt, the application developer focuses on the hierarchi- Section 8 concludes the paper. cal assembly of the stream graph and its communication topology, rather than on the explicit management of the data buffers between filters. StreamIt provides
Recommended publications
  • Intel® Strongarm® SA-1110 High- Performance, Low-Power Processor for Portable Applied Computing Devices
    Advance Copy Intel® StrongARM® SA-1110 High- Performance, Low-Power Processor For Portable Applied Computing Devices PRODUCT HIGHLIGHTS ■ Innovative Application Specific Standard Product (ASSP) delivers leadership performance, integration and low power for palm-size devices, PC companions, smart phones and other emerging portable applied computing devices As businesses and individuals rely increasingly on portable applied ■ High-speed 100 MHz memory bus and a computing devices to simplify their lives and boost their productivity, flexible memory these devices have to perform more complex functions quickly and controller that adds efficiently. To satisfy ever-increasing customer demands to support for SDRAM, communicate and access information ‘anytime, anywhere’, SMROM, and variable- manufacturers need technologies that deliver high-performance, robust latency I/O devices — provides design functionality and versatility while meeting the small-size and low-power flexibility, scalability and restrictions of portable, battery-operated products. Intel designed the high memory bandwidth SA-1110 processor with all of these requirements in mind. ■ Rich development The Intel® SA-1110 is a highly integrated 32-bit StrongARM® environment enables processor that incorporates Intel design and process technology along leading edge products with the power efficiency of the ARM* architecture. The SA-1110 is while reducing time- to-market software compatible with the ARM V4 architecture while utilizing a high-performance micro-architecture that is optimized to take advantage of Intel process technology. The Intel SA-1110 provides the performance, low power, integration and cost benefits of the Intel SA-1100 processor plus a high speed memory bus, flexible memory controller and the ability to handle variable-latency I/O devices.
    [Show full text]
  • Comparison of Contemporary Real Time Operating Systems
    ISSN (Online) 2278-1021 IJARCCE ISSN (Print) 2319 5940 International Journal of Advanced Research in Computer and Communication Engineering Vol. 4, Issue 11, November 2015 Comparison of Contemporary Real Time Operating Systems Mr. Sagar Jape1, Mr. Mihir Kulkarni2, Prof.Dipti Pawade3 Student, Bachelors of Engineering, Department of Information Technology, K J Somaiya College of Engineering, Mumbai1,2 Assistant Professor, Department of Information Technology, K J Somaiya College of Engineering, Mumbai3 Abstract: With the advancement in embedded area, importance of real time operating system (RTOS) has been increased to greater extent. Now days for every embedded application low latency, efficient memory utilization and effective scheduling techniques are the basic requirements. Thus in this paper we have attempted to compare some of the real time operating systems. The systems (viz. VxWorks, QNX, Ecos, RTLinux, Windows CE and FreeRTOS) have been selected according to the highest user base criterion. We enlist the peculiar features of the systems with respect to the parameters like scheduling policies, licensing, memory management techniques, etc. and further, compare the selected systems over these parameters. Our effort to formulate the often confused, complex and contradictory pieces of information on contemporary RTOSs into simple, analytical organized structure will provide decisive insights to the reader on the selection process of an RTOS as per his requirements. Keywords:RTOS, VxWorks, QNX, eCOS, RTLinux,Windows CE, FreeRTOS I. INTRODUCTION An operating system (OS) is a set of software that handles designed known as Real Time Operating System (RTOS). computer hardware. Basically it acts as an interface The motive behind RTOS development is to process data between user program and computer hardware.
    [Show full text]
  • IXP400 Software's Programmer's Guide
    Intel® IXP400 Software Programmer’s Guide June 2004 Document Number: 252539-002c Intel® IXP400 Software Contents INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel Corporation may have patents or pending patent applications, trademarks, copyrights, or other intellectual property rights that relate to the presented subject matter. The furnishing of documents and other materials and information does not provide any license, express or implied, by estoppel or otherwise, to any such patents, trademarks, copyrights, or other intellectual property rights. Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. The Intel® IXP400 Software v1.2.2 may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. MPEG is an international standard for video compression/decompression promoted by ISO. Implementations of MPEG CODECs, or MPEG enabled platforms may require licenses from various entities, including Intel Corporation. This document and the software described in it are furnished under license and may only be used or copied in accordance with the terms of the license. The information in this document is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation.
    [Show full text]
  • Comparative Architectures
    Comparative Architectures CST Part II, 16 lectures Lent Term 2006 David Greaves [email protected] Slides Lectures 1-13 (C) 2006 IAP + DJG Course Outline 1. Comparing Implementations • Developments fabrication technology • Cost, power, performance, compatibility • Benchmarking 2. Instruction Set Architecture (ISA) • Classic CISC and RISC traits • ISA evolution 3. Microarchitecture • Pipelining • Super-scalar { static & out-of-order • Multi-threading • Effects of ISA on µarchitecture and vice versa 4. Memory System Architecture • Memory Hierarchy 5. Multi-processor systems • Cache coherent and message passing Understanding design tradeoffs 2 Reading material • OHP slides, articles • Recommended Book: John Hennessy & David Patterson, Computer Architecture: a Quantitative Approach (3rd ed.) 2002 Morgan Kaufmann • MIT Open Courseware: 6.823 Computer System Architecture, by Krste Asanovic • The Web http://bwrc.eecs.berkeley.edu/CIC/ http://www.chip-architect.com/ http://www.geek.com/procspec/procspec.htm http://www.realworldtech.com/ http://www.anandtech.com/ http://www.arstechnica.com/ http://open.specbench.org/ • comp.arch News Group 3 Further Reading and Reference • M Johnson Superscalar microprocessor design 1991 Prentice-Hall • P Markstein IA-64 and Elementary Functions 2000 Prentice-Hall • A Tannenbaum, Structured Computer Organization (2nd ed.) 1990 Prentice-Hall • A Someren & C Atack, The ARM RISC Chip, 1994 Addison-Wesley • R Sites, Alpha Architecture Reference Manual, 1992 Digital Press • G Kane & J Heinrich, MIPS RISC Architecture
    [Show full text]
  • Arm C Language Extensions Documentation Release ACLE Q1 2019
    Arm C Language Extensions Documentation Release ACLE Q1 2019 Arm Limited. Mar 21, 2019 Contents 1 Preface 1 1.1 Arm C Language Extensions.......................................1 1.2 Abstract..................................................1 1.3 Keywords.................................................1 1.4 How to find the latest release of this specification or report a defect in it................1 1.5 Confidentiality status...........................................1 1.5.1 Proprietary Notice.......................................2 1.6 About this document...........................................3 1.6.1 Change control.........................................3 1.6.1.1 Change history.....................................3 1.6.1.2 Changes between ACLE Q2 2018 and ACLE Q1 2019................3 1.6.1.3 Changes between ACLE Q2 2017 and ACLE Q2 2018................3 1.6.2 References...........................................3 1.6.3 Terms and abbreviations....................................3 1.7 Scope...................................................4 2 Introduction 5 2.1 Portable binary objects..........................................5 3 C language extensions 7 3.1 Data types................................................7 3.1.1 Implementation-defined type properties............................7 3.2 Predefined macros............................................8 3.3 Intrinsics.................................................8 3.3.1 Constant arguments to intrinsics................................8 3.4 Header files................................................8
    [Show full text]
  • AASP Brief 031704F.Pdf
    servicebrief Avnet Avenue Service Provider Program Avnet Design Services has teamed up with the top design service companies in North America to provide you with superior component, board and system level solutions. In cooperation with Avnet Design Services, you can access these pre-screened and certified design service providers. WHAT is the Avnet Avenue Service Provider Program? A geographically dispersed and technical diverse network of design service providers available to fulfill your design service needs Avnet's seven partners are the top design service companies in North America The program compliments Avnet Design Services' ASIC and FPGA design service offerings by providing additional component, board and system-level design services WHY use an Avnet Avenue Service Provider? Time to Market The program provides additional technical resources to assist you in meeting your time to market requirements Value All Providers are selected based on their ability to provide cost competitive solutions Experience All Providers have proven experience completing a wide array of projects on time and within budget Less Risk All Providers are pre-screened and certified to ensure your success Technology The program provides you with single source access to a broad range of services and technical expertise Scale All Providers are capable of supporting the full range of design service requirements from very large to small HOW do I access the Avnet Avenue Service Provider Program? Contact your local Avnet Representative or call 1-800-585-1602 so that
    [Show full text]
  • Computer Architectures an Overview
    Computer Architectures An Overview PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Sat, 25 Feb 2012 22:35:32 UTC Contents Articles Microarchitecture 1 x86 7 PowerPC 23 IBM POWER 33 MIPS architecture 39 SPARC 57 ARM architecture 65 DEC Alpha 80 AlphaStation 92 AlphaServer 95 Very long instruction word 103 Instruction-level parallelism 107 Explicitly parallel instruction computing 108 References Article Sources and Contributors 111 Image Sources, Licenses and Contributors 113 Article Licenses License 114 Microarchitecture 1 Microarchitecture In computer engineering, microarchitecture (sometimes abbreviated to µarch or uarch), also called computer organization, is the way a given instruction set architecture (ISA) is implemented on a processor. A given ISA may be implemented with different microarchitectures.[1] Implementations might vary due to different goals of a given design or due to shifts in technology.[2] Computer architecture is the combination of microarchitecture and instruction set design. Relation to instruction set architecture The ISA is roughly the same as the programming model of a processor as seen by an assembly language programmer or compiler writer. The ISA includes the execution model, processor registers, address and data formats among other things. The Intel Core microarchitecture microarchitecture includes the constituent parts of the processor and how these interconnect and interoperate to implement the ISA. The microarchitecture of a machine is usually represented as (more or less detailed) diagrams that describe the interconnections of the various microarchitectural elements of the machine, which may be everything from single gates and registers, to complete arithmetic logic units (ALU)s and even larger elements.
    [Show full text]
  • Network Processors: Building Block for Programmable Networks
    NetworkNetwork Processors:Processors: BuildingBuilding BlockBlock forfor programmableprogrammable networksnetworks Raj Yavatkar Chief Software Architect Intel® Internet Exchange Architecture [email protected] 1 Page 1 Raj Yavatkar OutlineOutline y IXP 2xxx hardware architecture y IXA software architecture y Usage questions y Research questions Page 2 Raj Yavatkar IXPIXP NetworkNetwork ProcessorsProcessors Control Processor y Microengines – RISC processors optimized for packet processing Media/Fabric StrongARM – Hardware support for Interface – Hardware support for multi-threading y Embedded ME 1 ME 2 ME n StrongARM/Xscale – Runs embedded OS and handles exception tasks SRAM DRAM Page 3 Raj Yavatkar IXP:IXP: AA BuildingBuilding BlockBlock forfor NetworkNetwork SystemsSystems y Example: IXP2800 – 16 micro-engines + XScale core Multi-threaded (x8) – Up to 1.4 Ghz ME speed RDRAM Microengine Array Media – 8 HW threads/ME Controller – 4K control store per ME Switch MEv2 MEv2 MEv2 MEv2 Fabric – Multi-level memory hierarchy 1 2 3 4 I/F – Multiple inter-processor communication channels MEv2 MEv2 MEv2 MEv2 Intel® 8 7 6 5 y NPU vs. GPU tradeoffs PCI XScale™ Core MEv2 MEv2 MEv2 MEv2 – Reduce core complexity 9 10 11 12 – No hardware caching – Simpler instructions Î shallow MEv2 MEv2 MEv2 MEv2 Scratch pipelines QDR SRAM 16 15 14 13 Memory – Multiple cores with HW multi- Controller Hash Per-Engine threading per chip Unit Memory, CAM, Signals Interconnect Page 4 Raj Yavatkar IXPIXP 24002400 BlockBlock DiagramDiagram Page 5 Raj Yavatkar XScaleXScale
    [Show full text]
  • Strongarm™ SA-1100 Microprocessor for Portable
    StrongARM™ SA-1100 Microprocessor for Portable Applications Brief Datasheet Product Features The StrongARM™ SA-1100 Microprocessor (SA-1100) is a device targeted to provide portable applications with high-end computing performance without requiring users to sacrifice available battery time. The SA-1100 incorporates a 32-bit StrongARM™ RISC processor with instruction and data cache, memory-management unit (MMU), and read/write buffers running at 133/190 MHz. In addition, the SA-1100 provides system support logic, multiple serial communication channels, a color/gray scale LCD controller, PCMCIA support for up to two sockets, and general-purpose I/O ports. ■ High performance ■ 208-pin thin quad flat pack (LQFP) —150 Dhrystone 2.1 MIPS @ 133 MHz ■ 256 mini-ball grid array (mBGA) —220 Dhrystone 2.1 MIPS @ 190 MHz ■ Low power (normal mode)† ■ 32-way set-associative caches —<230 mW @1.5 V/133 MHz —16 Kbyte instruction cache —<330 mW @1.5 V/190 MHz —8 Kbyte write-back data cache ■ Integrated clock generation ■ 32-entry MMUs —Internal phase-locked loop (PLL) —Maps 4 Kbyte, 8 Kbyte, or 1 Mbyte —3.686-MHz oscillator —32.768-kHz oscillator ■ Power-management features ■ Write buffer —Normal (full-on) mode —8-entry, between 1 and 16 bytes each —Idle (power-down) mode —Sleep (power-down) mode ■ Big and little endian operating modes ■ Read buffer —4-entry, 1, 4, or 8 words ■ 3.3-V I/O interface ■ Memory bus —Interfaces to ROM, Flash, SRAM, and DRAM —Supports two PCMCIA sockets † Power dissipation, particularly in idle mode, is strongly dependent on the details of the system design Order Number: 278087-002 November 1998 Information in this document is provided in connection with Intel products.
    [Show full text]
  • Sok: Introspections on Trust and the Semantic Gap
    SoK: Introspections on Trust and the Semantic Gap Bhushan Jain, Mirza Basim Baig, Dongli Zhang, Donald E. Porter, and Radu Sion Stony Brook University fbpjain, mbaig, dozhang, porter, [email protected] Abstract—An essential goal of Virtual Machine Introspection representative legacy OS (Linux 3.13.5), and a representative (VMI) is assuring security policy enforcement and overall bare-metal hypervisor (Xen 4.4), as well as comparing the functionality in the presence of an untrustworthy OS. A number of reported exploits in both systems over the last 8 fundamental obstacle to this goal is the difficulty in accurately extracting semantic meaning from the hypervisor’s hardware- years. Perhaps unsurprisingly, the size of the code base and level view of a guest OS, called the semantic gap. Over the API complexity are strongly correlated with the number of twelve years since the semantic gap was identified, immense reported vulnerabilities [85]. Thus, hypervisors are a much progress has been made in developing powerful VMI tools. more appealing foundation for the trusted computing base Unfortunately, much of this progress has been made at of modern software systems. the cost of reintroducing trust into the guest OS, often in direct contradiction to the underlying threat model motivating This paper focuses on systems that aim to assure the func- the introspection. Although this choice is reasonable in some tionality required by applications using a legacy software contexts and has facilitated progress, the ultimate goal of stack, secured through techniques such as virtual machine reducing the trusted computing base of software systems is introspection (VMI) [46].
    [Show full text]
  • The New Intel® Xscale™ Microarchitecture
    Session 5: Application Specific Processors The new Intel® Xscale™ Microarchitecture Nuno Ricardo Carvalho de Sousa Departamento de Informática, Universidade do Minho 4710 - 057 Braga, Portugal [email protected] Abstract. In embedded systems, performance and power consumption are the most important criteria to define a good processor chip. The new Intel® Xscale™ microarchitecture, an evolution from StrongARM™ microarchitecture, combines these two features, as will be detailed in this communication. We will also see the advanced techniques used by this microarchitecture core to achieve a high level of efficiency. 1 Introduction Nowadays, most microprocessors are in embedded systems, not in PC’s. Embedded products become part of our everyday items: cellular phones, video games, Personal Digital Assistants (PDA) and much more. Although PC processors seem to generate much of all the excitement in the press, it is the other 98 percent – the embedded processors – that are technologically leading the way. This required a new design of microprocessors. The performance of these embedded microprocessors rivals that of PC’s of just few years ago. With clock frequencies up to 400 MHz, these chips offer performance, with a very economical electrical consumption [1]. A microprocessor’s architecture defines the instruction set and programmer’s model for any processor that will be based on that architecture. Different processor implementations may be built to comply with the architecture. Each processor may vary in performance and features, and be optimized to target different applications. In this document we will see with more detail Intel’s 80200, the first microprocessor that use Xscale, and the new Intel PXA250 application processor.
    [Show full text]
  • Tornado-Releasenotes
    Tornado® 2.2 RELEASE NOTES Copyright 2002 Wind River Systems, Inc. ALL RIGHTS RESERVED. No part of this publication may be copied in any form, by photocopy, microfilm, retrieval system, or by any other means now known or hereafter invented without the prior written permission of Wind River Systems, Inc. AutoCode, Embedded Internet, Epilogue, ESp, FastJ, IxWorks, MATRIXX, pRISM, pRISM+, pSOS, RouterWare, Tornado, VxWorks, wind, WindNavigator, Wind River Systems, WinRouter, and Xmath are registered trademarks or service marks of Wind River Systems, Inc. or its subsidiaries. Attaché Plus, BetterState, Doctor Design, Embedded Desktop, Emissary, Envoy, How Smart Things Think, HTMLWorks, MotorWorks, OSEKWorks, Personal JWorks, pSOS+, pSOSim, pSOSystem, SingleStep, SNiFF+, VSPWorks, VxDCOM, VxFusion, VxMP, VxSim, VxVMI, Wind Foundation Classes, WindC++, WindManage, WindNet, Wind River, WindSurf, and WindView are trademarks or service marks of Wind River Systems, Inc. or its subsidiaries. This is a partial list. For a complete list of Wind River trademarks and service marks, see the following URL: http://www.windriver.com/corporate/html/trademark.html Use of the above marks without the express written permission of Wind River Systems, Inc. is prohibited. All other trademarks, registered trademarks, or service marks mentioned herein are the property of their respective owners. Corporate Headquarters Wind River Systems, Inc. 500 Wind River Way Alameda, CA 94501-1153 U.S.A. toll free (U.S.): 800/545-WIND telephone: 510/748-4100 facsimile: 510/749-2010 For additional contact information, please visit the Wind River URL: http://www.windriver.com For information on how to contact Customer Support, please visit the following URL: http://www.windriver.com/support Tornado Release Notes, 2.2 15 Aug 02 Part #: DOC-14291-ZD-01 Contents 1 Introduction .............................................................................................................
    [Show full text]