Performance and Power Consumption Characterization of 3D Mobile Games

Xiaohan Ma+, Mian Dong*, Lin Zhong*, and Zhigang Deng+
+ Department of Computer Science, University of Houston
* Department of Electrical and Computer Engineering, Rice University

Abstract: This paper describes a preliminary study characterizing the performance and power consumption of 3D mobile games. We choose Quake3 and XRace as the game benchmarks and study them on TI OMAP3430, Qualcomm Snapdragon S2, and NVIDIA Tegra 2 (three mainstream mobile System-on-Chip architectures) by selectively disabling different graphics pipeline stages at the source code level. Our characterization results show that the geometry stage is the leading bottleneck and that the game logic (application) accounts for a significant portion of power consumption.

Keywords: Smartphone, mobile games, graphics, performance analysis, energy characterization

1 INTRODUCTION

Gaming applications have become prevalent on smartphones in recent years [2]. For example, iPhone games currently account for ∼20% of all the applications (or over 13,000) in the App Store. However, 3D games are still a rarity on smartphones, although they have been popular on PCs for decades. The reasons are twofold. First, the mobile and desktop segments are two distinct areas with different feature and performance requirements. For instance, rendering artifacts such as aliasing and imperfect shading degrade perceived image quality more seriously on smartphones, since the smartphone screen is held closer to the observer’s eyes than a PC display [6]. Second, 3D graphics and games are power-hungry applications in general, which is a major concern for battery-powered smartphones. For example, running Quake3 locally on the Motorola Droid consumes ∼70% more power than video playback (1.7 Watt versus 1 Watt) based on our empirical measurements.
To improve the performance and energy efficiency of 3D mobile games, it is crucial to identify and quantify the bottlenecks of their 3D graphics pipeline. Considerable effort has been devoted to characterizing the performance and power consumption of desktop 3D games [5]. The same task on smartphones, however, is much more challenging because, unlike PCs, mobile architectures are designed for small die size, low hardware cost, and low power consumption. As such, smartphones have graphics hardware integrated into a system-on-chip (SoC) known as the Application Processor, together with the core processor, digital signal processor (DSP), and many peripheral controllers. This chip integration makes physical measurement of the graphics hardware infeasible. Moreover, since graphics hardware on smartphones is less capable than that on desktop PCs, the general-purpose processor (e.g., the 32-bit ARM core on the Snapdragon SoC) still plays a substantial role in mobile graphics computing. As a result, many parts of the SoC are involved in 3D graphics computing, which further complicates the task of isolating and quantifying the bottlenecks.

To tackle the above challenges, we first use a logical, abstracted graphics pipeline, similar to the OpenGL ES 2.0 graphics pipeline, to model stage isolation, and we disable the target stages (i.e., application, geometry, texture fetching, fragment shading, and pixel processing) at the source code level. Then, based on the empirically measured performance and power consumption data of three chosen mainstream smartphones (Motorola Droid equipped with the TI OMAP3430 SoC, HTC EVO equipped with the Qualcomm Snapdragon S2 SoC, and Motorola Atrix 4G equipped with the NVIDIA Tegra 2 SoC), we perform an in-depth quantitative bottleneck analysis of performance and power consumption across the five stages of the mobile graphics pipeline.
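As an illustration only, the stage-disabling idea can be read as a differential measurement: the cost attributed to a stage is the drop in frame time (or power) observed when that stage is disabled. The sketch below assumes this simple subtraction-based attribution; the function names and the sample numbers are hypothetical and are not taken from the measurements in this paper.

```python
# Hypothetical sketch: estimate each stage's share of a baseline measurement
# (frame time in ms, or power in W) from runs with that stage disabled.
# The subtraction-based attribution is an assumption made for illustration.

def stage_shares(baseline, with_stage_disabled):
    """Return, per stage, the fraction of the baseline saved by disabling it."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    shares = {}
    for stage, measured in with_stage_disabled.items():
        # Clamp at zero so measurement noise cannot yield a negative share.
        saving = max(baseline - measured, 0.0)
        shares[stage] = saving / baseline
    return shares


def leading_bottleneck(shares):
    """Return the stage whose removal saves the largest fraction."""
    return max(shares, key=shares.get)


# Made-up frame times in milliseconds, purely to exercise the functions.
example = stage_shares(40.0, {"geometry": 25.0, "texture_fetching": 36.0})
print(example, leading_bottleneck(example))
```

Because pipeline stages can overlap in time, shares computed this way need not sum to one; they are best read as a ranking of stages rather than an exact breakdown.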
2 BACKGROUND AND RELATED WORK

2.1 Mobile Graphics Architecture

The smartphones chosen in this study represent three mainstream smartphone SoCs on the market: the Open Multimedia Application Platform (OMAP3430) developed by Texas Instruments (TI) [11], the Qualcomm Snapdragon S2 [9], and the NVIDIA Tegra 2 [8]. Technical specifications of the three chosen smartphone platforms are shown in Table 1.

SoC Model      CPU                      GPU             CPU Clock  GPU Clock  GPU Memory   Polygon Fillrate  Pixel Fillrate
                                                        (MHz)      (MHz)      Clock (MHz)  (MTriangles/sec)  (MPixel/sec)
OMAP3430       ARM Cortex-A8            PowerVR SGX530  600        200        200          ∼90               ∼250
Snapdragon S2  QSD8650                  Adreno 200      800        256        128          ∼70               ∼133
Tegra 2        Dual-core ARM Cortex-A9  GeForce ULV     1000       333        600          ∼80               ∼1200

Table 1: Summary of technical specifications of the three chosen smartphone SoCs.

The PowerVR GPU architecture in the TI OMAP consists of three main modules: a tile accelerator (storing the scene data and dividing the screen into tiles), an image synthesis processor (performing hidden surface removal to determine visible pixels), and a texturing and shading processor (having a unified shader architecture with a programmable function pipeline). The Adreno GPU in the Qualcomm Snapdragon offers a similar programmable graphics pipeline. However, the memory controller in the PowerVR GPU is a 32-bit LPDDR1 interface that can run at up to 200MHz, which offers a 56% increase in memory bandwidth compared to the Adreno GPU’s 128MHz. Moreover, although the streaming texture unit of the Adreno GPU can combine video and images with 3D graphics, the Adreno GPU has only two such texture units. By contrast, the PowerVR GPU has four such texture units. Therefore, the theoretical fillrate of the texture units in the Adreno GPU, 133-250 million texels/sec, is lower than that of the PowerVR GPU, which is 250-300 million texels/sec.

The GeForce ULV GPU in NVIDIA’s Tegra 2 has a different architecture from the other two GPUs.
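As a side note on the PowerVR and Adreno comparison, the 56% memory bandwidth figure follows from the GPU memory clocks listed in Table 1. The short check below assumes that both memory interfaces have the same bus width, so that bandwidth scales with clock alone; the text states a 32-bit interface only for the PowerVR GPU, so this equal-width assumption and the helper name are illustrative.

```python
# Illustrative check of the 56% figure, using the GPU memory clocks from
# Table 1. Assumption: equal bus width, so bandwidth scales with clock only.

def relative_increase(new, old):
    """Fractional increase of `new` over `old`."""
    return (new - old) / old


POWERVR_MEMORY_CLOCK_MHZ = 200  # OMAP3430, from Table 1
ADRENO_MEMORY_CLOCK_MHZ = 128   # Snapdragon S2, from Table 1

increase = relative_increase(POWERVR_MEMORY_CLOCK_MHZ, ADRENO_MEMORY_CLOCK_MHZ)
print(f"{increase:.0%}")  # prints 56%
```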
First, the GeForce ULV GPU does not employ a tile accelerator; instead, it uses an immediate mode renderer. Second, it does not have a unified shader architecture but uses completely separate vertex and pixel cores with different core architectures. Immediate mode renderers and tile-based renderers both have their application spaces: immediate mode renderers have dominated PC applications, while tile-based renderers have been prevalent in embedded and low-power devices. Tile-based renderers fetch only the visible texels, while immediate mode renderers need to fetch texels for every pixel in a polygon. Tile-based renderers also require only a single frame buffer access to output the final color, while immediate mode renderers need multiple buffer accesses. However, as geometry complexity in games increases, especially in PC applications, immediate mode renderers have proven to be the better option, because tile-based renderers need to process all the polygons in a tile for each pixel, while immediate mode renderers relieve this issue by using explicit Z-buffering. For a complete review of tile-based renderers and immediate mode renderers, please refer to [7].

2.2 Characterization of 3D Games and Graphics Computing

Considerable effort has been devoted to characterizing the performance and power consumption of 3D desktop games, including dynamic workload characterization on graphics architecture features [5], 3D graphics performance modeling [12], and power modeling of 3D graphics architecture [4, 10]. As described above, modern smartphones employ significantly different graphics hardware from desktop computers. As a result, findings from the above studies cannot be directly applied to smartphones without considerable effort.

Figure 1: (Left) The abstracted mobile graphics pipeline used in this work. (Right) Experimental setup.

Mochocki et al.
[6] quantitatively analyzed the power consumption of the mobile 3D graphics pipeline. However, their work did not measure both performance and power consumption in real-world 3D games or graphics applications. Instead, it employed a traditional CPU-based power model to estimate the power consumption of runtime benchmarks on an abstracted, three-stage mobile 3D graphics pipeline (i.e., geometry, triangle setup, and rendering). Recently, Carroll et al. [3] presented a detailed analysis of the power consumption of a smartphone. They conducted fine-grained instrumentation of a smartphone in order to break down power at the system level (CPU, RAM, GSM, WiFi, SD card, etc.). In contrast, our work focuses on the performance and energy characterization of mobile graphics in the context of modern mobile SoC architectures. The three chosen mainstream mobile SoC architectures represent the state of the art in mobile architecture design.

3 CHARACTERIZATION DESIGN

3.1 Methodology

Mobile graphics pipeline abstraction: Inspired by OpenGL for Embedded Systems (OpenGL ES) 2.0, we use a logical, abstracted graphics pipeline to describe the mobile graphics architecture in this work, as illustrated in Figure 1. The abstracted graphics pipeline contains the following stages, executed one after another.

(i) Application: In this stage, the 3D graphics application is executed on the CPU. This stage also involves the graphics system driver, which runs on the CPU and translates OpenGL API calls into actions executed on the GPU.
(ii) Geometry: Vertex attributes and positions within the 3D scene are calculated according to the scene organization. This stage includes multiplying vertices by the modelview and projection matrices, executing vertex shaders, etc.
(iii) Texture fetching: This stage involves texture data fetch workloads.
(iv) Fragment shading: This stage executes various pixel shaders to process fragments and screen pixels. It also involves operations that use textures.
(v) Pixel processing:
