3D Graphics Optimizations for Arm

3D GRAPHICS OPTIMIZATIONS FOR ARM ARCHITECTURE

Gopi K. Kolli†, Stephen Junkins‡, Haim Barad†
† Handheld Computing Division
‡ Emerging Platforms Lab
Intel Corporation

Presented at GDC <http://www.gdconf.com/>.

Introduction
Floating-Point Systems Vs Fixed-Point Systems
Floating-Point Systems
Hardware Coprocessor
Floating-point Library
Fixed-Point System
Arithmetic Operations
Dynamic Range and Precision
Error Checking
Arithmetic Approximation Routines
Trigonometric functions
Integer Divide
Branching and Predication
Branching
Predication
Invoking Predication
Loops
“If” statements
Relational or Boolean expression
Register Allocation
Pointer Aliasing
Function call overhead
Memory-Based Optimizations
Conclusion
References

Introduction

Embedded and handheld computing devices are rapidly becoming ubiquitous. They are evolving in usage, performance and features and are becoming capable of supporting 3D graphics. The computational performance and display capabilities of these consumer devices are evolving rapidly.
Compaq’s iPaq 3800 handheld computing device, for example, has a 206 MHz Intel StrongARM microprocessor and a 16-bit QVGA display. With such capabilities, handheld computing devices, set-top boxes and even cell phones can now be programmed to support software rendering of immersive 3D worlds. Combined with wireless connectivity and the growing ubiquity of mobile computing devices, a 3D rendering solution provides an exciting new opportunity for 3D game developers.

Many mobile devices such as cell phones, personal digital assistants and handheld gaming devices use ARM-based processors. The ARM architecture is a 16/32-bit RISC architecture designed to allow very small yet high-performance implementations for low-power devices, and it is becoming a standard architecture for handheld multimedia computing. Though ARM processor instruction throughput has recently become quite attractive, other aspects of the architecture challenge implementers of software 3D rendering systems. Specifically:

• Many commercial ARM-based devices do not include dedicated floating-point hardware because of the extra cost and power consumption.
• The ARM architecture has no integer divide instruction.
• For most ARM implementations, on-chip caches are quite small relative to PC cache sizes.
• Display hardware is small and very simple; usually the LCD controller memory-maps system memory.
• 2D and 3D rasterization hardware is not commonplace in embedded devices. Cost and power consumption will likely limit the acceptance of dedicated hardware in the future, especially for cell phones. However, leading-edge PDAs might accept it for a premium price.

Given these architectural challenges, careful optimization of the 3D engine is the key to achieving rendering performance sufficient for 3D games on ARM-based platforms.
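To make the integer-divide point in the list above concrete: on a core without a divide instruction, a C `/` is compiled into a call to a runtime helper routine rather than a single instruction. The following is a minimal sketch, not taken from the paper, of the classic shift-and-subtract (restoring) algorithm that such a helper could use. The name `soft_udiv32` is hypothetical, and production runtime libraries use more heavily tuned code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (name is illustrative, not from the paper):
 * unsigned 32-bit division by shift-and-subtract.
 * Returns the quotient and stores the remainder in *rem.
 * Precondition: den must be non-zero. */
uint32_t soft_udiv32(uint32_t num, uint32_t den, uint32_t *rem)
{
    assert(den != 0);

    uint32_t quot = 0;
    uint32_t r = 0;

    /* Walk the dividend from its most significant bit downwards. */
    for (int bit = 31; bit >= 0; --bit) {
        /* Bring the next dividend bit into the running remainder. */
        r = (r << 1) | ((num >> bit) & 1u);

        /* If the divisor fits, subtract it and set this quotient bit. */
        if (r >= den) {
            r -= den;
            quot |= (uint32_t)1 << bit;
        }
    }

    *rem = r;
    return quot;
}
```

For example, `soft_udiv32(1000, 7, &r)` returns 142 and leaves 6 in `r`. The loop always runs 32 iterations, each with a shift, a compare and a conditional subtract, which gives a rough sense of why division is expensive on such cores and why the paper treats integer divide as a separate optimization topic.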
In this paper, we explore these challenges and suggest performance optimization strategies that enable game developers to build software 3D rendering solutions for ARM-based embedded devices.

Floating-Point Systems Vs Fixed-Point Systems

Flexible 3D engines require a real-number representation of coordinate spaces to support many of the fundamental algorithms of 3D rendering. Real-number representation is especially relevant for the implementation of transform, lighting, clipping, and culling, as they require broad dynamic range and a high degree of precision. Floating-point representation of real numbers is preferred to integer representation because it provides large dynamic range and very high precision.

Floating-Point Systems

Floating-point support can be provided in ARM-based systems either in hardware or in software.

Hardware Coprocessor

A hardware floating-point implementation typically consists of a floating-point coprocessor and provides very good performance. However, the additional silicon and power consumption it requires are prohibitive for a commercial system. Additionally, the coprocessor's maximum clock speed limits the performance of the ARM code. Therefore, this implementation is not currently preferred in commercial ARM-based systems.

Floating-point Library

A software floating-point implementation typically consists of a floating-point library. Floating-point operations can be fully implemented in a software library using ARM instructions. When compiling floating-point application code, the compiler generates function calls to this software library rather than floating-point instructions. Therefore, the application code

• Remains unaffected by the future inclusion of floating-point hardware in the system.
• Can instantly take advantage of any improvement in the ARM core.

Choice of the floating-point support in the system depends on various