3D GRAPHICS OPTIMIZATIONS FOR

ARM ARCHITECTURE

Gopi K. Kolli† Stephen Junkins‡ Haim Barad†
[email protected] [email protected] [email protected]

† Handheld Computing Division ‡ Emerging Platforms Lab Intel Corporation

Presented at GDC.

Contents

Introduction
Floating-Point Systems vs. Fixed-Point Systems
    Floating-Point Systems
        Hardware Coprocessor
        Floating-point Library
    Fixed-Point System
Arithmetic Operations
    Dynamic Range and Precision
    Error Checking
Arithmetic Approximation Routines
    Trigonometric functions
    Integer Divide
Branching and Predication
    Branching
    Predication
    Invoking Predication
        Loops
        “If” statements
        Relational or Boolean expression
Register Allocation
Pointer Aliasing
Function call overhead
Memory-Based Optimizations
Conclusion
References

Introduction

Embedded and handheld computing devices are rapidly becoming ubiquitous. They are evolving in usage, performance and features, and are becoming capable of supporting 3D graphics. The computational performance and display capabilities of these consumer devices are advancing rapidly: Compaq’s iPaq 3800 handheld device has a 206 MHz Intel StrongARM processor and a 16-bit QVGA display. With such capabilities, handheld computing devices, set-top boxes and even cell phones can now be programmed to support software rendering of immersive 3D worlds. A 3D rendering solution, coupled with wireless connectivity and the growing ubiquity of mobile computing devices, provides an exciting new opportunity for 3D game developers.

Many mobile devices such as cell-phones, personal digital assistants and handheld gaming devices use ARM-based processors. ARM architecture is a 16/32-bit RISC architecture designed to allow very small, yet high-performance implementations for low power devices and is becoming an architecture standard for handheld, multi-media computing. Though ARM processor instruction throughput has recently become quite attractive, other aspects of the architecture challenge implementers of software 3D Rendering systems. Specifically:

• Many commercial ARM-based devices do not include dedicated floating-point hardware, due to its extra cost and power consumption.
• The ARM architecture does not support integer divide.
• For most ARM implementations, on-chip caches are quite small relative to PC cache sizes.
• Display hardware is small and very simple; usually the LCD controller memory-maps system memory.
• 2D and 3D rasterization hardware is not commonplace in embedded devices. Cost and power consumption will likely limit the acceptance of dedicated hardware in the future, especially for cell phones, though leading-edge PDAs might accept it at a premium price.

Given these architectural challenges, careful optimization of the 3D engine is the key to achieving rendering performance sufficient for 3D games on ARM-based platforms. In this paper, we will explore these challenges and suggest performance optimization strategies to enable game developers to build software 3D Rendering solutions for ARM-based embedded devices.

Floating-Point Systems vs. Fixed-Point Systems

Flexible 3D engines require real number representation of coordinate space systems to support many of 3D Rendering’s fundamental algorithms. Real number representation is especially relevant for implementation of transform, lighting, clipping, and culling, as they require broad dynamic range and a high degree of precision. Floating-point representation of real numbers is preferred to integer representation due to its ability to provide large dynamic range and very high precision.

Floating-Point Systems

Floating-point support can be provided in ARM-based systems either in hardware or in software.

Hardware Coprocessor

Hardware floating-point support typically consists of a floating-point coprocessor and provides very good performance. However, the additional silicon and power consumption costs are prohibitive for most commercial systems. Additionally, the coprocessor’s maximum clock speed can limit the performance of the ARM core. Therefore, this implementation is currently not preferred in commercial ARM-based systems.

Floating-point Library

Software floating-point implementation typically consists of a floating-point library. Floating-point operations can be fully implemented in a software library using ARM instructions. While compiling the floating-point application code, the compilers generate function calls to this software library rather than floating-point instructions. Therefore, the application code

• Remains unaffected by the future inclusion of floating-point hardware in the system.
• Can instantly take advantage of any improvement in the ARM core.

The choice of floating-point support in a system depends on various factors such as performance, system cost and system flexibility. ARM Ltd. recommends the use of a floating-point library in embedded systems [2].

However, a better approach would be to replace floating-point operations with integer operations by using fixed-point representation of real numbers. If the dynamic range and precision requirements can be constrained, fixed-point arithmetic can be more accurate and much faster than floating-point arithmetic.

Fixed-Point System

Fixed-point is a way to represent a floating-point number in an integer format with an imaginary decimal point dividing the integer and fractional part. The bits to the right of the imaginary point comprise the fractional portion of the value being represented, and these bits act as weights for negative powers of 2. The bits to the left of the imaginary point comprise the integer portion of the value being represented, and these bits act as weights for positive powers of 2. [4]

For example, a 16.16 fixed-point format specifies that there are 16 bits before and 16 bits after the imaginary point. Therefore, for signed numbers, the dynamic range spans the half-open interval [-2^15, 2^15) and the precision is 2^-16. The value represented by this fixed-point format is given by adding up the weighted powers of 2:

value = -b31*2^15 + b30*2^14 + … + b17*2^1 + b16*2^0 + b15*2^-1 + … + b2*2^-14 + b1*2^-15 + b0*2^-16, where bi ∈ {0, 1} are the binary bits.

Similarly, for unsigned numbers, the dynamic range spans the half-open interval [0, 2^16) and the precision is 2^-16. The value represented by this fixed-point format is given by adding up the weighted powers of 2:

value = b31*2^15 + b30*2^14 + … + b17*2^1 + b16*2^0 + b15*2^-1 + … + b2*2^-14 + b1*2^-15 + b0*2^-16, where bi ∈ {0, 1} are the binary bits.
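To make the 16.16 representation concrete, here is a minimal conversion sketch. The names (FIX_SHIFT, FIX_ONE, fix_from_double, fix_to_double) are illustrative, not from the paper, and portable C99 `int32_t` stands in for the compiler-specific types used elsewhere in this text:

```c
#include <stdint.h>

/* 16.16 fixed point: scale factor is 2^16. */
#define FIX_SHIFT 16
#define FIX_ONE   (1 << FIX_SHIFT)   /* 1.0 in 16.16 format */

/* Convert a real number to 16.16 fixed point (truncating toward zero). */
static int32_t fix_from_double(double x) { return (int32_t)(x * FIX_ONE); }

/* Convert 16.16 fixed point back to a real number. */
static double fix_to_double(int32_t x) { return (double)x / FIX_ONE; }
```

Values that fit exactly in the format, such as 2.5 (0x28000) or -1.0 (-0x10000), round-trip without loss; values needing more than 16 fractional bits are truncated.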

Arithmetic Operations

Arithmetic operations on fixed-point numbers require certain numerical adjustments. Addition and subtraction are simple. However, the result must be scaled down after multiplication and scaled up before division. Consider the following example:

a = 5.0 * 2^16
b = 2.5 * 2^16

Multiplying a and b gives us

a * b = (5.0 * 2^16) * (2.5 * 2^16) = (5.0 * 2.5 * 2^16) * 2^16

So, to properly format the result in 16.16 format, the result should be divided by 2^16.

Dividing a by b gives us

a / b = (5.0 * 2^16) / (2.5 * 2^16) = 2.0

So, to properly format the result in 16.16 format, the result should be multiplied by 2^16.

Also, it is important to note that the intermediate values may extend beyond the 32-bit space, causing possible data loss.
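The divide-side scaling can be sketched the same way as the multiply: pre-scale the dividend into a 64-bit intermediate before dividing, so the quotient lands back in 16.16 format. This is a sketch, not the paper's library code; `fix_div` is an illustrative name and C99 `int64_t` stands in for `__int64`:

```c
#include <stdint.h>

/* 16.16 fixed-point divide: widen the dividend to 64 bits and pre-scale
   it by 2^16 so the quotient is again in 16.16 format. */
static int32_t fix_div(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a << 16) / b);
}
```

For example, dividing 5.0 (5 * 2^16) by 2.5 (163840) yields 2.0 (2 * 2^16), matching the worked example above.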

Dynamic Range and Precision

In fixed-point representation of real numbers, dynamic range and precision trade off against each other: within a fixed word size, widening one narrows the other. This poses a unique problem during arithmetic operations.

Consider a multiplication between two unsigned numbers in a 16.16 fixed-point format to get a result in a 16.16 format. Both the multiplicands and the result can be represented in integer space. However, during multiplication, a 64-bit intermediate result in a 32.32 format is produced. The lower 32 bits of the result correspond to the fractional part of the result and the higher 32 bits correspond to the integer part. The ARM architecture provides long multiply instructions (SMULL, SMLAL, UMULL, UMLAL, etc.) that produce 64-bit results. Therefore, care should be taken to use proper data types in C and C++ that notify the compiler to invoke these special instructions.

int Mul(int a, int b)
{
    return (int)(((__int64)a * (__int64)b) >> 16);
}

While 64-bit intermediate results can prevent loss of data, they cannot increase the dynamic range and precision of the numbers. In the previous example, the maximum value that can be represented in the result is in a 16.16 format i.e., the sum of the integer bits of the multiplicands cannot exceed 16. This further brings down the range of the multiplicands. The programmer should constantly keep track of the changes in the range and precision of the dataset to avoid any risk of over-flows and under-flows.

Therefore, the developer should determine the balance between dynamic range and precision based upon the context of the application and the complexity of the 3D content. Also, in a 3D engine, overflow of dynamic range can be avoided by:

• Constraining the world inputs.
• Using localized spaces or normalizing data.
• Allocating less dynamic range during the rasterization stage, as it is constrained to a small screen space.

Similarly, underflow of precision can be avoided by:

• Normalizing small vectors before subsequent operations.
• Using alternate precision formats for depth buffers and for the lookup tables used by math approximation routines.

Error Checking

3D graphics applications are data-intensive, making underflow and overflow of data frequent. Also, due to the nature of fixed-point representation, the exception handling supported by embedded operating systems might not be suitable. Therefore, an error checking method with status flags should be provided to handle these situations. Error checking should preferably be done on the 64-bit intermediate values.

Consider the following example of a multiplication between two numbers a and b in 16.16 format. Any data in bits 48-63 of the 64-bit product (other than sign extension) indicates overflow, and any data in bits 0-15 indicates underflow.

a:     <31 ------ 16>|<15 ------ 0>
b:     <31 ------ 16>|<15 ------ 0>

a x b: <- Overflow ->|<47 ------ 16>|<- Underflow ->

int Mul(int a, int b)
{
    __int64 nVal = (__int64)a * (__int64)b;

#ifdef _DEBUG
    if (nVal > (pow(2, 47) - 1))
        return ERR_OVER_FLOW;
    if (nVal < (1 - pow(2, 47)))
        return ERR_UNDER_FLOW;
#endif

    return (int)(nVal >> 16);
}

Error checking in fixed-point representation is therefore expensive, due to the requirement for 64-bit intermediate values.
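One alternative to returning status codes, sketched here for illustration (this is not the paper's library; `fix_mul_sat` is an assumed name), is to saturate the 64-bit intermediate to the representable 32-bit range. Clamping avoids mixing error values with valid results, at the cost of silently losing magnitude:

```c
#include <stdint.h>

/* 16.16 multiply with saturation: compute the full 64-bit product,
   shift back to 16.16, then clamp anything outside the 32-bit range. */
static int32_t fix_mul_sat(int32_t a, int32_t b)
{
    int64_t n = ((int64_t)a * (int64_t)b) >> 16;
    if (n > INT32_MAX) return INT32_MAX;   /* overflow: clamp high */
    if (n < INT32_MIN) return INT32_MIN;   /* overflow: clamp low  */
    return (int32_t)n;
}
```

Whether clamping or flagging is appropriate depends on the stage: clamped colors are usually acceptable, clamped coordinates usually are not.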

Arithmetic Approximation Routines

Arithmetic operations such as division and square root are widely used in 3D graphics programming. They are computationally intensive and are normally supported by calls to a C function library. Considering the limited display properties of embedded devices, approximate versions of these arithmetic functions are adequate in certain stages of the graphics pipeline. An efficient software implementation of these approximation routines should:

• Consume fewer CPU cycles while providing high accuracy.
• Avoid lookup tables where possible, thereby reducing memory accesses.
• Minimize code size, increasing the likelihood of remaining in the instruction cache (IC).

The following comparative data shows the CPU cycle consumption for floating-point, 64-bit integer, 32-bit integer and optimized hand-coded ARM assembly versions of the most common arithmetic operations in 3D graphics. It should be noted that 32-bit integer division is not acceptable for fixed-point division, since either the entire dynamic range or the entire precision is lost, depending upon whether the scaling is done before or after the operation.

                 Floating-Point   64-bit    32-bit    Optimized ARM
                 S/W Emulation    Integer   Integer   Assembly Library
  Division            524           650       180          235
  Square Root        4855          4855      4867          215

Note: Data for the floating-point S/W emulation, 64-bit and 32-bit integer versions was collected using the ARM compiler in the MS Embedded Visual C++ 3.0 IDE. Data for the optimized ARM library version was collected using Intel® Graphics Performance Primitives 1.00 for the Intel® XScale™ microarchitecture.

[Chart: CPU cycles for Division and Square Root across the four implementations, visualizing the table above.]

Trigonometric functions

Trigonometric functions, even if approximated, are expensive operations. However, they can be efficiently implemented using lookup tables. A single lookup table covering a complete sine cycle can be used for both sine and cosine computation. The cycle can be broken into a power-of-2 number of intervals for easy masking. The size of the table depends upon the resolution requirements. For example, a table of 1024 entries has a resolution of about 0.0061 radians and requires about 4K of memory.

void CosSin(int theta, int *Cos, int *Sin)
{
    theta &= (TRIG_TABLE_SIZE - 1);
    *Sin = tblSin[theta];
    theta = (theta + (TRIG_TABLE_SIZE >> 2)) & (TRIG_TABLE_SIZE - 1);
    *Cos = tblSin[theta];
}

As a memory-based optimization, the table size can be further reduced by storing only the values for the first quadrant and using symmetry logic to compute the values for the other quadrants. Also, for better accuracy, the values in the lookup table can be stored in a higher-precision format such as 4.28.

Integer Divide

The ARM instruction set does not provide an integer divide instruction. Therefore, compilers implement division in application code by calling a C-library function. Division is an expensive operation and should be avoided where possible.

Consider the following example of vector normalization in 16.16 format:

v.x = (int)(((__int64)v.x << 16) / (__int64)length);
v.y = (int)(((__int64)v.y << 16) / (__int64)length);
v.z = (int)(((__int64)v.z << 16) / (__int64)length);

If more than one number is divided by the same value, the reciprocal of the divisor should be computed once and the numbers multiplied by it:

inv_len = (int)(((__int64)1 << 32) / length);   /* 16.16 reciprocal of a 16.16 length */
v.x = (int)(((__int64)v.x * (__int64)inv_len) >> 16);
v.y = (int)(((__int64)v.y * (__int64)inv_len) >> 16);
v.z = (int)(((__int64)v.z * (__int64)inv_len) >> 16);

Replacing the additional divisions with multiplications will improve the performance dramatically. However, it will also introduce an additional rounding error.

When a number is to be divided by a known constant, rewrite the division as a constant multiplication. Consider the following example of computing the centroid of a triangle in 16.16 format:

v.x = (v1.x + v2.x + v3.x) / 3;
v.y = (v1.y + v2.y + v3.y) / 3;
v.z = (v1.z + v2.z + v3.z) / 3;

This can be rewritten as a multiplication by 0x5555 (approximately 1/3 in 16.16 format):

v.x = (int)(((__int64)(v1.x + v2.x + v3.x) * (__int64)0x5555) >> 16);
v.y = (int)(((__int64)(v1.y + v2.y + v3.y) * (__int64)0x5555) >> 16);
v.z = (int)(((__int64)(v1.z + v2.z + v3.z) * (__int64)0x5555) >> 16);

Also, where possible, use powers of two as divisors, since the division then becomes a shift operation. A rounding adjustment is required for signed division, though.
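The signed adjustment can be sketched as a bias before the shift (`div_pow2` is an illustrative name): an arithmetic right shift alone rounds toward negative infinity, while C division truncates toward zero, so negative dividends need a bias of 2^n - 1 first. This is the pattern compilers themselves emit for division by a power-of-two constant:

```c
#include <stdint.h>

/* Signed divide by 2^n via shift, matching C's truncate-toward-zero.
   Assumes arithmetic right shift of signed values (true on ARM and
   all mainstream compilers, though implementation-defined in ISO C). */
static int32_t div_pow2(int32_t x, int n)
{
    if (x < 0)
        x += (1 << n) - 1;   /* bias so the shift truncates toward zero */
    return x >> n;
}
```

Without the bias, -7 >> 2 gives -2, whereas -7 / 4 in C is -1.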

Perspective correction requires a division operation for each pixel. Considering the overhead of a division operation and the display characteristic features of embedded devices, care should be taken to avoid perspective correction whenever possible. For instance, the impact of perspective correction is negligible in small triangles.

Branching and Predication

Branching

Instruction-level parallelism improves the performance of the processors. However, branches limit it for several reasons:

• Branches cause pipeline stalls.
• Branch misprediction can hinder performance.
• Branches impose control dependencies.
• Branches complicate compiler optimization and scheduling.

Predication

Predicated execution is an architectural feature used to exploit instruction-level parallelism in the presence of control flow. It refers to conditional execution of instructions based on the Boolean value of a source operand called ‘predicate’. Compiling for predicated execution involves converting program control flow into conditional instructions. When all instructions that are control-dependent on a branch are predicated using the same condition as the branch, that branch can legally be removed. This reduces branch control dependencies and possible misprediction penalties. [1]

Predication support can be categorized as follows:

• Full Predication: architectures (IA-64, HP-PA, ARM, etc.) in which all instructions can be predicated.
• Partial Predication: architectures (ELF, etc.) in which only one or two predicated instructions are available, such as a conditional move.

Consider the following example:

if (a > b)
    c = a;
else
    c = b;

The code generated for the if-else portion of this code segment using branches is:

CMP R0, R1
BLE _LessThan
MOV R2, R0
B _NextInstruction
_LessThan:
MOV R2, R1
_NextInstruction:

Here is the same code generated using conditional statements:

CMP R0, R1
MOVGT R2, R0
MOVLE R2, R1

Conditional statements reduce the number of branches and labels in the code. However, depending upon the operations in the branches, they can increase code size. Therefore, compilers use various heuristics to balance performance improvement against code size. The simplest heuristic: if the “if” and “else” blocks have equal probability of executing and the total cycle count is less than the branch misprediction penalty, conditional instructions can be used.

Invoking Predication

Loops

3D Graphics applications spend a significant amount of time in loops, especially when the 3D engine supports multi-pass vertex processing. The loop termination condition can cause significant overhead in a loop and can be optimized as follows:

Consider the following example of transforming an object using an incrementing loop:

for (int i = 0; i < nPoly; i++)
{
    ...
}

MOV R0, #0
MOV R1, nPoly
Transform_NextPolygon:
...
ADD R0, R0, #1
CMP R0, R1
BLT Transform_NextPolygon

Using a decrementing loop, the loop-exit condition checks against the value 0:

for (int i = nPoly; i != 0; i--)
{
    ...
}

MOV R0, nPoly
Transform_NextPolygon:
...
SUBS R0, R0, #1
BNE Transform_NextPolygon

Therefore, a slight change in the loop logic eliminated the compare instruction, since SUBS updates the condition flags directly, and improved performance.

“If” statements

Conditional execution is applied mostly in the body of “if” statements. It is therefore beneficial to keep the bodies of “if” and “else” statements as simple as possible.

Relational or Boolean expression

A common Boolean expression during the rasterization stage checks whether a screen coordinate lies within the display limits:

bool PointInRect(Point p)
{
    return (p.x >= XMIN && p.x < XMAX &&
            p.y >= YMIN && p.y < YMAX);
}

This can be optimized by rewriting (x >= min && x < max) as (unsigned)(x - min) < (max - min):

bool PointInRect(Point p)
{
    return ((unsigned)(p.x - XMIN) < (XMAX - XMIN) &&
            (unsigned)(p.y - YMIN) < (YMAX - YMIN));
}

This is especially beneficial if XMIN is zero. [3]
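The equivalence behind the unsigned-compare trick can be checked exhaustively over a window around the bounds. XMIN/XMAX here are illustrative screen limits; the point is that a negative x - min wraps to a huge unsigned value, failing the single comparison exactly when the signed lower-bound check fails:

```c
enum { XMIN = 0, XMAX = 320 };   /* illustrative display bounds */

/* Two signed comparisons vs. one unsigned comparison. */
static int in_range_naive(int x)    { return x >= XMIN && x < XMAX; }
static int in_range_unsigned(int x) { return (unsigned)(x - XMIN) < (unsigned)(XMAX - XMIN); }

/* Exhaustive check over a window straddling both bounds. */
static int trick_matches(void)
{
    for (int x = -1000; x <= 1000; x++)
        if (in_range_naive(x) != in_range_unsigned(x))
            return 0;
    return 1;
}
```

The trick halves the comparison count per axis, which matters in a per-pixel clip test.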

Register Allocation

ARM has a fixed set of registers. If there are more variables than registers available, some of the variables will be stored to memory temporarily. Register allocation is a compiler optimization process that allocates variables to ARM registers, rather than to memory. Efficient register allocation minimizes register-memory swaps, thereby reducing code size and improving performance.

Integers, pointer types, fields of structures and complete structures can be allocated to registers if they are declared locally or passed as function parameters and if their addresses are not taken. [3]

Pointer Aliasing

If the address of a variable is taken, the compiler assumes that the variable can be changed by any assignment through a pointer or by any function call, making it impossible to put it into a register. Instead, the compiler allocates the variable on the stack, resulting in extra loads and stores if the variable is used extensively.

If a function uses global variables in a critical loop, it is beneficial to copy the global variables into local variables so that they can be assigned to registers. Similarly, for variables passed by reference, it is better to create a local copy of the variable and pass the address of that copy. This allows the local variable to be allocated to a register, which reduces memory traffic.
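A minimal sketch of the global-copy pattern (`g_scale` and `scale_all` are illustrative names): copying the global into a local removes the aliasing hazard, so the compiler can keep the value in a register across the loop instead of reloading it each iteration:

```c
static int g_scale = 3;   /* illustrative global used inside a hot loop */

/* Scale n elements of v in place. */
static void scale_all(int *v, int n)
{
    int scale = g_scale;          /* local copy: register-allocatable */
    for (int i = 0; i < n; i++)
        v[i] *= scale;            /* no per-iteration reload of g_scale */
}
```

Without the copy, the compiler must assume the store to v[i] could modify g_scale (they might alias), forcing a reload on every iteration.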

Function call overhead

Under the ARM Procedure Call Standard, up to four words of arguments can be passed to a function in registers. Subsequent arguments, if needed, are passed on the stack. This incurs the additional cost of storing these arguments in the calling function and reloading them in the called function. To minimize the overhead of passing parameters to functions:

• Keep functions small and simple, and ensure that they take four or fewer arguments.
• Avoid 64-bit parameters (__int64, long long, double, etc.) because they take two argument words.
• Avoid functions with a variable number of parameters, because these are always passed on the stack.
• If a function needs more than four arguments, consider grouping the related arguments in a structure and passing a pointer to the structure. This also increases readability.
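The struct-pointer suggestion can be sketched as follows (`LineArgs` and its fields are illustrative): five logical arguments travel as a single pointer in one register, instead of spilling the fifth argument to the stack:

```c
/* Group related arguments so the call passes one pointer, not five words. */
typedef struct { int x0, y0, x1, y1, color; } LineArgs;

/* Squared length of the line segment described by a. */
static int line_length_sq(const LineArgs *a)
{
    int dx = a->x1 - a->x0;
    int dy = a->y1 - a->y0;
    return dx * dx + dy * dy;
}
```

The trade-off is one extra memory indirection inside the callee, which is usually cheaper than the caller's stack store plus the callee's reload.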

Memory-Based Optimizations

Embedded devices are characterized by limited memory, and current ARM implementations have smaller caches than PC processors. Therefore, most of the memory-based 3D graphics optimizations developed in the PC domain, such as single-pass vertex processing, software tiling architectures and deferred rendering, are very appropriate for these devices. These approaches take full advantage of the cache management provided by the processor. Memory hint instructions, where supported by the processor, can be used to bring data required for subsequent operations into the cache. Smaller texture sizes and formats can lower bandwidth requirements. Depth buffers can be avoided wherever possible by using techniques such as depth sorting of geometry.

Conclusion

In this paper, we presented various strategies for performance optimizations given the challenges presented by the ARM architecture. First, we addressed the issue of fixed-point arithmetic as a path for the “traditionally floating-point” parts of the 3D pipeline. Fixed point presents some trade-offs for the programmer between precision and dynamic range, but the rewards in performance over floating point emulation certainly make it worth the effort.

The same approach should be taken with the integer divide. First of all, given the form factor and resolution of the target platform, the programmer should carefully evaluate the requirement for perspective correct rasterization. For situations where it is still critical, we presented approximate methods and code-based optimizations.

We also covered other code-based optimization areas for using new architectural features such as code predication. Register allocation and function call overhead present other issues to consider in the code.

Last and certainly not least were the memory-based optimizations. The limited memory throughput of ARM-based platforms makes memory strategies even more important than they are in PC 3D engine design.

References

[1] D. I. August, W. W. Hwu, and S. A. Mahlke, "A framework for balancing control flow and predication," in Proceedings of the 30th Annual International Symposium on Microarchitecture, pp. 92-103, December 1997.

[2] Floating-Point Performance. Document number: ARM DAI 0055A. Advanced RISC Machines Ltd (ARM) 1998.

[3] Writing Efficient C for ARM. Document number: ARM DAI 0034A. Advanced RISC Machines Ltd. (ARM) 1998.

[4] Intel® Graphics Performance Primitives for the Intel® PXA250 and Intel® PXA210 Applications Processors. Intel Corporation. 2002.

About the Author

Gopi K. Kolli is a senior software engineer at Intel Corporation. He is responsible for performance analysis and optimization of 3D graphics applications for the Intel® PXA250 Applications Processor with Intel® XScale™ technology. He is also responsible for formulating, developing and distributing optimized 3D graphics libraries to support game development on XScale™ microarchitecture-based handheld products. Gopi joined Intel in 1999 after receiving his M.S. degree in Computer Science and Engineering from Arizona State University. He has authored technical publications in the fields of 3D graphics and performance optimization. You may reach Gopi at [email protected].