Characterizing the Performance and Power Consumption of 3D Mobile Games

RESEARCH FEATURE Characterizing the Performance and Power Consumption of 3D Mobile Games Xiaohan Ma and Zhigang Deng, University of Houston Mian Dong and Lin Zhong, Rice University A preliminary study using the Quake 3 and XRace games as benchmarks on three mainstream mobile system-on-chip architectures reveals that the geometry stage is the main bottleneck in 3D mobile games and confirms that game logic significantly affects power consumption. espite the phenomenal growth of mobile applica- is integrated into a system on chip (SoC) consisting of the tions in novel categories, such as education and core processor, the digital signal processor, and many lifestyle, gaming applications remain the most peripheral controllers. This tight chip integration makes D popular category by a wide margin, according it infeasible to physically isolate and measure the SoC’s to the latest Apple App Store statistics (http://148apps.biz/ graphics hardware. app-store-metrics). Moreover, because smartphone graphics hardware is Although 3D entertainment is gaining a market foothold less competent relative to desktop PCs, the smartphone’s in other media, including desktop computing, 3D games general-purpose processor still plays a substantial role in are still a relative rarity on smartphones, in part because mobile graphics computing. As a result, many aspects of mobile and desktop segments differ considerably in feature the SoC are involved in 3D graphics computing, which ex- and performance requirements. For example, rendering acerbates the task of isolating and quantifying graphics effects such as aliasing and imperfect shading can result bottlenecks on smartphones. in reduced image quality because the smartphone screen To tackle these challenges, we used a five-stage, is viewed closer to the observer’s eyes compared with abstracted graphics pipeline, inspired by the OpenGL a PC.1 Also, 3D graphics and games are power-hungry for Embedded Systems (OpenGL ES 2.0) graphics applications in general, which can be a major limitation for pipeline, to measure the performance of two game battery-powered smartphones. benchmarks. We compared five game versions—one To improve the performance and energy efficiency of 3D corresponding to the original source code and the mobile games, a clear research priority is to identify and other four corresponding to four conditions in which quantify bottlenecks in the games’ 3D graphics pipeline. we selectively disabled one or more pipeline stages. Intensive efforts have attempted to characterize the per- Then, using empirically measured performance and formance and power consumption of desktop 3D games,2 power consumption data for three mainstream smart- but undertaking the same task for smartphones is much phones—the Motorola Droid, HTC EVO, and Motorola more challenging. Unlike PCs, mobile architectures are Atrix 4G—we performed an in-depth quantitative designed for small dies, low hardware cost, and low power bottleneck analysis on performance and power con- consumption. As such, smartphones’ graphics hardware sumption among the pipeline’s five stages. 76 COMPUTER Published by the IEEE Computer Society 0018-9162/13/$31.00 © 2013 IEEE Table 1. Technical specifications summary for three smartphone SoC models. Specification OMAP 3430 Snapdragon S2 Tegra 2 CPU ARM Cortex A8 QSD 8650 Dual-core ARM Cortex 9 GPU PowerVR SGX 530 Adreno 200 GeForce ULV CPU clock (MHz) 600 800 1,000 GPU clock (MHz) 200 256 128 GPU memory (MHz) 200 128 600 Polygon fill rate (millions of triangles per sec) ~90 ~70 ~80 Pixel fill rate (megapixels per sec) ~250 ~133 ~1,200 Our analysis revealed that the geometry stage is the units, relative to the PowerVR GPU’s four. Therefore, the main bottleneck in 3D mobile games and confirmed that theoretical fill rate of the texture units in the Adreno GPU game logic significantly affects power consumption. is lower than that in the PowerVR GPU (133 to 250 million versus 250 to 300 million texels per second). SMARTPHONE PLATFORMS The GeForce ULV (ultralow voltage) GPU in Nvidia’s Tegra 2 Efforts to characterize the performance and power con- has a different architecture than either the PowerVR or sumption of 3D desktop games include dynamic workload Adreno GPU. First, it uses an immediate mode renderer characterization on graphics architecture features,2 3D graph- instead of a tile accelerator. Second, it uses completely ics performance modeling,3 and the power modeling of 3D separate vertex and pixel cores with different core archi- graphics architecture.4,5 Modern smartphones use signifi- tectures, rather than a unified shader architecture. cantly different graphics hardware, which makes it extremely Immediate mode renderers have dominated PC appli- difficult to apply these study results to smartphones. cations, while tile-based renderers have been prevalent in Unlike previous work that has attempted to break down embedded or low-power devices. Relative to immediate power at the system level,6 our work focuses on the per- mode GPU architectures, tile-based renderers fetch only the formance and energy characterization of mobile graphics visible texels, while immediate mode renderers fetch texels in the context of mobile SoC architectures that represent for every pixel in a polygon. Tile-based renderers require state-of-the-art design. only one frame-buffer access to output final color; immedi- Table 1 lists the technical specifications of the three ate mode renderers need multiple buffer accesses to output smartphone platforms in our study: the Droid was equipped final color. However, as geometric complexity increases, with Texas Instruments’ Open Multimedia Application Plat- immediate mode renderers are the better option because form (OMAP 3430), the EVO with Qualcomm’s Snapdragon tile-based renderers must process all the polygons in a S2, and the Atrix with Nvidia’s Tegra 2. tile for each pixel. Immediate mode renderers use explicit The OMAP PowerVR GPU architecture consists of three Z-buffering to resolve this issue.7 main modules: CHARACTERIZATION DESIGN • a tile accelerator, which stores the scene data and di- Similar to the OpenGL ES 2.0 pipeline, we use a logical, vides the screen into tiles; abstracted graphics pipeline to describe mobile graphics ar- • an image synthesis processor, which performs hidden chitecture. Unlike previous efforts to quantitatively analyze surface removal to determine visible pixels; and power consumption using a mobile 3D graphics pipeline,1 • a texturing and shading processor, which uses a uni- we measure both performance and power consumption in fied shader architecture with programmable functions. real-world 3D games or graphics applications. As Figure 1 shows, the abstracted graphics pipeline con- The Snapdragon Adreno GPU offers a similar program- tains five stages, consecutively executed: mable graphics pipeline. However, the memory controller in the PowerVR GPU is a 32-bit low-power double-data-rate • Application. The CPU executes the 3D graphics ap- interface (LPDDRI) and can run at up to 200 MHz, which plication. This stage also involves the graphics system offers a 56 percent increase in memory bandwidth, com- driver running on the CPU and calling OpenGL APIs pared to the Adreno GPU’s 128 MHz. Moreover, although the into actions that the GPU executes. Adreno GPU’s streaming texture unit can combine video • Geometry. The processor calculates the vertex attri- and images with 3D graphics, it has only two such texture butes and positions within the 3D scene according to APRIL 2013 77 RESEARCH FEATURE 3D application Driver (game logic) V = 3.7V; I = 0.2 A Data stream Vertex Assembled Pixel stream polygons stream Host PC Data acquisition system Smartphone Primitive Primitive Rasterizer Frame processing assembly buer Pretransformed Rasterized pixels Updated pixels vertices LIT vertices Fragment shading Vertex shader Texture fetching Pixel processing (a) (b) Figure 1. (a) Abstracted mobile graphics pipeline, which consists of five stages, and (b) experimental setup. The pipeline is designed to measure both performance and power consumption in actual 3D games by isolating pipeline stages and analyzing them to identify bottlenecks. The study used the Quake 3 and XRace 3D games as benchmarks the scene organization. Tasks include multiplying ver- performs fetching, mapping, and filtering operations tices by model view and projection matrices, executing on texture data only on that size, which minimizes the vertex shaders, and so on. texturing workload. • Texture fetching. The processor performs texture data • Disable fragment shading. Instead of running every fetch workloads. complex shader for each pixel, as in the original graph- • Fragment shading. The processor executes various ics program, we run only a simple stub shader, thus pixel shaders to process fragments and screen pixels, removing most of the pixel-shading workload. as well as operations that use textures. • Pixel processing. The processor performs the fixed We performed all these isolation and disabling trans- function operations to further process pixels after formations without using profiler tools, primarily because fragment shading, such as reading and writing color profiler tools typically have their own associated runtime components, reading and writing depth and stencil and energy consumption overhead, which is extremely buffers, and alpha blending. difficult to eliminate in data postprocessing. Disabling pipeline stages Experimental setup and procedure Similar to graphics performance profilers

Load more