
Performance and Power Consumption Characterization of 3D Mobile Games

Xiaohan Ma+, Mian Dong*, Lin Zhong*, and Zhigang Deng+ + Department of Computer Science, University of Houston * Department of Electrical and Computer Engineering, Rice University

Abstract: This paper describes a preliminary study characterizing the performance and power consumption of 3D mobile games. We choose Quake3 and XRace as the game benchmarks and study them on TI OMAP3430, Qualcomm Snapdragon S2, and NVIDIA Tegra 2 (three mainstream mobile System-on-Chip architectures) by selectively disabling different graphics pipeline stages at the source code level. Our characterization results show that the geometry stage is the leading bottleneck and that the game logic (application) takes a significant portion of power consumption.

Keywords: Smartphone, mobile games, graphics, performance analysis, energy characterization

1 INTRODUCTION

Gaming applications have become prevalent on smartphones in recent years [2]. For example, iPhone games currently account for ∼20% of all the applications (over 13,000) in the App Store. However, 3D games are still a rarity on smartphones, although they have been popular on PCs for decades. The reasons are twofold. First, the mobile and desktop segments are two distinct areas with different feature and performance requirements. For instance, rendering imperfections such as aliasing can degrade image quality more seriously on smartphones, since the smartphone screen is held closer to the observer’s eyes than a PC screen [6]. Second, 3D graphics and games are power-hungry applications in general, which can be a major limiting factor for battery-powered smartphones. For example, running Quake3 locally on a smartphone consumes ∼70% more power than video playback (1.7 Watt versus 1 Watt) based on our empirical measurements.

To improve the performance and energy efficiency of 3D mobile games, it is crucial to identify and quantify the bottlenecks of their 3D graphics pipeline. Considerable effort has been devoted to characterizing the performance and power consumption of desktop 3D games [5]. The same task on smartphones, however, is much more challenging because, unlike PCs, mobile architectures are designed for small die size, low hardware cost, and low power consumption. As such, smartphones have graphics hardware integrated into a system-on-chip (SoC) known as the Application Processor, together with the core processor, digital signal processor (DSP), and many peripheral controllers. This chip integration makes physical measurement of the graphics hardware infeasible. Moreover, since graphics hardware on smartphones is less capable than that on desktop PCs, the general-purpose processor (e.g., the 32-bit ARM core on the Snapdragon SoC) still plays a substantial role in mobile graphics computing. As a result, many parts of the SoC are involved in 3D graphics computing, which further complicates the task of isolating and quantifying the bottlenecks.

To tackle the above challenges, we first use a logical, abstracted graphics pipeline, similar to the OpenGL ES 2.0 graphics pipeline, to model the stage isolation, and we disable the target stages (i.e., application, geometry, texture fetching, fragment shading, and pixel processing) at the source code level. Then, based on the empirically measured performance and power consumption data of three mainstream smartphones (Motorola Droid equipped with the TI OMAP3430 SoC, HTC EVO equipped with the Qualcomm Snapdragon S2 SoC, and Motorola Atrix 4G equipped with the NVIDIA Tegra 2 SoC), we perform an in-depth quantitative bottleneck analysis of performance and power consumption among the five stages of the mobile graphics pipeline.

2 BACKGROUND AND RELATED WORK

2.1 Mobile Graphics Architecture

The smartphones chosen in this study represent three mainstream smartphone SoCs on the market: the Open Multimedia Application Platform (OMAP3430) developed by Texas Instruments (TI) [11], the Qualcomm Snapdragon S2 [9], and the NVIDIA Tegra 2 [8]. Technical specifications of the three chosen smartphone platforms are shown in Table 1.

SoC Model      CPU                      GPU             CPU Clock  GPU Clock  GPU Memory   Polygon           Pixel Fillrate
                                                        (MHz)      (MHz)      Clock (MHz)  (MTriangles/sec)  (MPixel/sec)
OMAP3430       ARM Cortex-A8            PowerVR SGX530  600        200        200          ∼90               ∼250
Snapdragon S2  QSD8650                  Adreno 200      800        256        128          ∼70               ∼133
Tegra 2        Dual-core ARM Cortex-A9  GeForce ULV     1000       333        600          ∼80               ∼1200

Table 1: Summary of technical specifications of the three chosen smartphone SoCs.

The PowerVR GPU architecture in the TI OMAP consists of three main modules: a tile accelerator (storing the scene data and dividing the screen into tiles), an image synthesis processor (performing hidden surface removal to determine visible pixels), and a texturing and shading processor (having a unified architecture with a programmable function pipeline). The Adreno GPU in the Qualcomm Snapdragon offers a similar programmable graphics pipeline. However, the memory controller in the PowerVR GPU is a 32-bit LPDDR1 interface that can run at up to 200 MHz, which offers a ∼56% increase in memory bandwidth compared to the Adreno GPU’s 128 MHz. Moreover, although the streaming texture unit of the Adreno GPU can combine video and images with 3D graphics, it has only two such texture units. By contrast, the PowerVR GPU has four such texture units. Therefore, the theoretical fillrate of the texture units in the Adreno GPU, 133-250 million texels/sec, is lower than that of the PowerVR GPU, 250-300 million texels/sec.

The GeForce ULV GPU in NVIDIA’s Tegra 2 has a different architecture from the other two GPUs. First, it does not employ a tile accelerator; instead, the GeForce ULV GPU uses an immediate mode renderer. Second, the GeForce ULV GPU does not have a unified shader architecture but uses completely separate vertex and pixel cores with different core architectures.

Immediate mode renderers and tile-based renderers both have their application spaces: immediate mode renderers have dominated PC applications, while tile-based renderers have been prevalent in embedded or low-power devices. Tile-based renderers fetch only the visible texels, whereas immediate mode renderers need to fetch texels for every pixel in a polygon. Tile-based renderers require only a single frame buffer access to output the final color, whereas immediate mode renderers need multiple buffer accesses. However, as geometry complexity in games increases, especially in PC applications, immediate mode renderers have proved to be the better option because tile-based renderers need to process all the polygons in a tile for each pixel, whereas immediate mode renderers relieve this issue by using explicit Z-buffering. For a complete review of tile-based renderers and immediate mode renderers, please refer to [7].
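The ∼56% memory bandwidth figure quoted above can be reproduced with a short calculation. The sketch below assumes that both memory interfaces have the same bus width and transfer the same amount of data per clock, so that bandwidth scales linearly with clock frequency; that assumption, and the function name, are ours and are not stated in the text.

```python
def relative_bandwidth_gain(clock_a_mhz: float, clock_b_mhz: float) -> float:
    """Fractional bandwidth gain of interface A over interface B.

    Assumes equal bus width and equal transfers per clock (our assumption),
    so bandwidth is proportional to clock frequency.
    """
    return clock_a_mhz / clock_b_mhz - 1.0


# Clock values taken from Table 1: PowerVR memory clock versus Adreno memory clock.
gain = relative_bandwidth_gain(200.0, 128.0)
print(f"PowerVR over Adreno: {gain:.2%}")
```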

2.2 Characterization of 3D Games and Graphics Computing

Considerable effort has been devoted to characterizing the performance and power consumption of 3D desktop games, including dynamic workload characterization on graphics architecture features [5], 3D graphics performance modeling [12], and power modeling of 3D graphics architecture [4, 10]. As described above, modern smartphones employ significantly different graphics hardware from desktop computers. As a result, findings from the above studies cannot be directly applied to smartphones without considerable effort.

Figure 1: (Left) The abstracted mobile graphics pipeline used in this work. (Right) Experimental setup.

Mochocki et al. [6] quantitatively analyzed the power consumption of the mobile 3D graphics pipeline. However, their work did not measure both performance and power consumption in real-world 3D games or graphics applications. Instead, it employed a traditional CPU-based power model to estimate the power consumption of runtime benchmarks on an abstracted, three-stage mobile 3D graphics pipeline (i.e., geometry, triangle setup, and rendering). Recently, Carroll et al. [3] presented a detailed analysis of the power consumption of a smartphone. They conducted fine-grained instrumentation of a smartphone in order to break down power at the system level (CPU, RAM, GSM, WiFi, SD card, etc.). Different from their work, our work focuses on the performance and energy characterization of mobile graphics in the context of modern mobile SoC architectures. The three chosen mainstream mobile SoC architectures represent the state-of-the-art mobile architecture design.

3 CHARACTERIZATION DESIGN

3.1 Methodology

Mobile graphics pipeline abstraction: Inspired by OpenGL for Embedded Systems (OpenGL ES) 2.0, we use a logical, abstracted graphics pipeline to describe the mobile graphics architecture in this work, as illustrated in Figure 1. The abstracted graphics pipeline contains the following stages, executed one after another.

(i) Application: In this stage, the 3D graphics application is executed on the CPU. It also involves the graphics system driver, which runs on the CPU and translates OpenGL API calls into actions executed on the GPU.
(ii) Geometry: Vertex attributes and positions within the 3D scene are calculated according to the scene organization. This stage includes multiplying vertices by the modelview and projection matrices, executing vertex shaders, etc.
(iii) Texture fetching: This stage involves texture data fetch workloads.
(iv) Fragment shading: This stage executes various pixel shaders to process fragments and screen pixels. It also involves operations that use textures.
(v) Pixel processing: After fragment shading, there are more fixed-function operations for further pixel processing: reading and writing color components, reading and writing depth and stencil buffers, performing alpha blending, etc.

Similar to graphics performance profilers (e.g., gDEBugger), in this work we use the following scheme to disable target graphics pipeline stages.

(1) Disable all the pipeline stages: We disable all the OpenGL commands that push vertices or texture data into the graphics pipeline.
(2) Disable all stages after the geometry stage: We disable rasterization operations by forcing all W-coords in the scene to be negative (i.e., changing W to −|W|) after the modelview and projection transformations. Thus, all the geometric operations before rasterization are performed as usual. Then, all the vertices are culled since they have negative W-coords, and rasterization and all follow-up operations are eliminated as well.
(3) Disable the texture fetching stage: We disable the texture fetching stage by forcing OpenGL to use 2x2 pixel stub textures. With the stub textures, fetching, mapping, and filtering operations on texture data are performed only on a 2x2 (4 pixels) texture. Thus, the texturing workload is largely removed.
(4) Disable the fragment shading stage: We disable the fragment shading stage by using a simple stub pixel shader in place of all the shaders used in the original graphics program. Thus, instead of running a complex shader for each pixel, only a simple stub shader is executed, which removes the majority of the pixel shading workload.

The above graphics pipeline stage isolation and disabling are performed at the source code level. It is noteworthy that we do not use any profiler tools, e.g., gDEBugger, to isolate target pipeline stages in this study. The main reason is that profiler tools typically have their own runtime and energy consumption overheads, and it is extremely difficult to eliminate those overheads through data postprocessing.
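To illustrate why step (2) removes all rasterization work, the following minimal sketch applies the standard OpenGL clip-volume test (a clip-space vertex is kept only if −w ≤ x, y, z ≤ w) before and after replacing W with −|W|. It is an illustration of the principle only, not the code used in the modified games; the function names and vertex values are hypothetical, and real hardware clips whole primitives rather than testing vertices one by one.

```python
def inside_clip_volume(x, y, z, w):
    """OpenGL-style clip test: keep the vertex only if -w <= x, y, z <= w."""
    return all(-w <= c <= w for c in (x, y, z))


def force_negative_w(vertex):
    """Replace W with -|W|, as in step (2), after the modelview/projection transforms."""
    x, y, z, w = vertex
    return (x, y, z, -abs(w))


# Hypothetical clip-space vertices (illustrative values only, all with nonzero W).
vertices = [(0.2, -0.1, 0.5, 1.0), (1.5, 0.0, 0.3, 2.0), (0.0, 0.0, 0.0, 0.5)]

visible_before = [inside_clip_volume(*v) for v in vertices]
visible_after = [inside_clip_volume(*force_negative_w(v)) for v in vertices]

print(visible_before)
print(visible_after)
```

With a negative W, the interval [−w, w] is empty, so no coordinate can satisfy the test and every vertex falls outside the clip volume, while the transform work that produced the vertex has already been done.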

3.2 Experimental Setup and Procedure

Benchmarks: We chose Quake3 and XRace as the representative 3D game benchmarks in this study. Quake3 is a first-person shooting game and XRace is a car race game; we chose them because these two game categories are the most popular among 3D mobile games (e.g., 8 out of the top 10 Android 3D games in 2010 are shooting and car race games). Quake3 is one of the most successful 3D games of the past decades and it uses OpenGL/OpenGL ES as its graphics engine [1]. Also, the latest release of Quake3 predominantly uses state-of-the-art vertex and pixel programs, which extensively exploit the capability of modern mobile graphics processors. XRace is a 3D car race game for the mobile phone platform that uses OpenGL ES 2.0 for rendering, and its graphics design is optimized for modern mobile SoCs. Thus, characterization of Quake3 and XRace on mobile platforms should shed significant light on how real-world 3D graphics and game applications run on modern mobile platforms.

Figure 2: Quake3 (Top) and XRace (Bottom) snapshots with different graphics pipeline stages disabled.

Based on the original Quake3 and XRace source code, we generated five different versions of each game: one corresponds to the original, and the other four correspond to the above four stage-disabled conditions (refer to Section 3.1). To make our characterization process repeatable, for each version we ran a traced game demo on the smartphones in order to measure its performance and power consumption (refer to Figure 2). A game demo is a recorded sequence of game playing that can be reproduced on different runs. For example, the selected game demo for Quake3 in this study, FOUR.dm 89, was chosen from the original package file on the Quake3 CD. Each frame in the recorded game sequence (containing map type, character positions, weapon information, number of enemies, etc.) was first read out. Then, these data were reconstructed in the 3D scene and rendered on the screen using the mobile CPU and GPU. After that, the next frame was read and rendered. In this study, while the game demo was running, we made the game engine use all available CPU and GPU resources in order to run the demo as fast as possible without dropping any frame. Hence, completion time is a sound measure of its performance.

Apparatus: We measured the power consumption of each of the chosen smartphones (Motorola Droid, HTC EVO, and Motorola Atrix 4G in this study) using the following scheme. We modified the battery of the smartphone and inserted a sensing resistor to measure the current drawn from the battery. We used a data acquisition (DAQ) system to measure the voltage and current of the battery simultaneously and used a host PC to store all the data collected by the DAQ system. The sampling rate of the DAQ system was set to 1 kHz. Instantaneous power consumption is calculated by multiplying the voltage and current every millisecond. Then, we compute the average over every 100 milliseconds to reduce random noise and errors potentially introduced by the measuring process.

Procedure: For each version (5x2 in total) of Quake3 and XRace, we ran it on the three chosen smartphones, measured its demo running time, and recorded its associated power trace. To eliminate any caching effect of the system, we rebooted the smartphones every time before measurement. We repeated the measurement of each version of Quake3 and XRace ten times and computed the average.
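The power computation described under Apparatus can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' acquisition software: the trace is synthetic, the sensing resistor value is hypothetical, the current is derived from the voltage drop across the resistor via Ohm's law, and the averaging uses consecutive non-overlapping 100 ms windows.

```python
SAMPLE_RATE_HZ = 1000   # 1 kHz DAQ sampling rate, as stated in the text
WINDOW_MS = 100         # averaging window, as stated in the text


def current_from_sense_resistor(v_drop_volts, r_sense_ohms):
    """Ohm's law: current flowing through the sensing resistor."""
    return v_drop_volts / r_sense_ohms


def instantaneous_power_mw(voltages_v, currents_a):
    """Per-sample power in milliwatts: P = V * I."""
    return [v * i * 1000.0 for v, i in zip(voltages_v, currents_a)]


def windowed_average(samples, window):
    """Mean of each consecutive, non-overlapping window of samples."""
    return [
        sum(samples[start:start + window]) / window
        for start in range(0, len(samples) - window + 1, window)
    ]


# Synthetic 300 ms trace (made-up values, not measured data).
R_SENSE_OHMS = 0.1      # hypothetical resistor value
battery_v = [3.7] * 300
sense_drop_v = [0.02 if n % 2 == 0 else 0.03 for n in range(300)]
current_a = [current_from_sense_resistor(v, R_SENSE_OHMS) for v in sense_drop_v]

window = SAMPLE_RATE_HZ * WINDOW_MS // 1000   # 100 samples per window
power_mw = instantaneous_power_mw(battery_v, current_a)
averaged_mw = windowed_average(power_mw, window)
print(averaged_mw)
```

Averaging over non-overlapping windows trades time resolution (one value per 100 ms) for lower sample-to-sample noise; a sliding window would be an alternative design.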

4 CHARACTERIZATION RESULTS

The top panel of Table 2 shows the raw running time and power measured with the different code transformations. The running time and power consumption of each pipeline stage were then calculated through simple subtraction operations between the original and the stage-disabled versions, or between two different stage-disabled versions. The bottom panel of Table 2 and Figure 3 show the performance and power consumption breakdown as well as the bottlenecks obtained in our study. From these results, we can derive the following key findings.

Smartphone         Measures     Original          Disable           Disable           Disable texture   Disable fragment
                                                  graphics          raster            fetching          shading
                                Q3       XR       Q3       XR       Q3       XR       Q3       XR       Q3       XR
Motorola Droid     Time (s)     44.9     15.5     20.3     6.3      31.7     10.1     39.2     13.4     39.0     13.5
                   Power (mW)   944.2    460.9    417.2    260.5    653.6    378.2    869.4    411.1    815.0    447.3
HTC EVO            Time (s)     64.7     24.5     20.2     6.8      45.2     15.2     53.0     19.4     59.0     21.9
                   Power (mW)   1005.7   490.1    402.5    273.5    669.9    366.6    892.7    401.4    873.8    466.4
Motorola Atrix 4G  Time (s)     40.8     14.6     18.6     5.8      31.4     8.9      38.4     12.8     38.7     13.6
                   Power (mW)   918.9    377.2    432.4    216.5    648.9    302.9    846.6    291.7    833.4    367.8

Smartphone         Measures            Application   Geometry      Texture       Fragment      Pixel         System
                                                                   fetching      shading       processing
                                       Q3     XR     Q3     XR     Q3     XR     Q3     XR     Q3     XR     Q3     XR
Motorola Droid     Time (s)            20.3   6.3    11.4   3.8    5.7    2.1    5.9    2.0    1.6    1.3    -      -
                   Time Percent. (%)   -      -      46.3   41.3   23.2   22.8   23.9   21.7   6.5    14.3   -      -
                   Power (mW)          417.2  260.5  236.4  117.7  74.8   59.8   129.2  13.6   86.6   9.3    466.6  466.6
                   Power Percent. (%)  -      -      44.6   58.7   14.2   29.8   24.5   6.8    16.4   4.6    -      -
HTC EVO            Time (s)            20.2   6.8    25.0   8.4    11.7   5.1    5.7    2.6    2.1    1.6    -      -
                   Time Percent. (%)   -      -      56.2   47.5   26.3   28.8   12.8   14.7   4.7    9.0    -      -
                   Power (mW)          402.5  273.5  267.4  93.1   113.0  88.7   131.9  23.7   90.9   11.1   419.9  419.9
                   Power Percent. (%)  -      -      44.3   42.9   18.7   41.0   21.9   11.0   15.0   5.1    -      -
Motorola Atrix 4G  Time (s)            18.6   5.8    12.8   3.1    2.4    1.8    2.1    1.0    4.9    2.9    -      -
                   Time Percent. (%)   -      -      57.7   35.2   10.8   20.5   9.5    11.4   22.1   33.0   -      -
                   Power (mW)          432.4  216.5  213.5  86.4   72.3   52.1   85.5   9.4    115.2  12.8   420.4  420.4
                   Power Percent. (%)  -      -      43.9   53.8   14.9   32.4   17.6   5.8    23.7   8.0    -      -

Table 2: Top: Raw running time and power consumption with the different code transformations. Bottom: Running time and power consumption of each pipeline stage. Here Q3 and XR are the abbreviations for Quake3 and XRace.
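As a worked example of the subtraction scheme, the sketch below derives the per-stage running times for Quake3 on the Motorola Droid from the raw values in the top panel of Table 2. Which versions are subtracted from which is our reading, inferred from the fact that it reproduces the bottom-panel numbers; the dictionary keys and function names are hypothetical.

```python
# Raw Quake3 running times (seconds) on the Motorola Droid, from Table 2 (top).
RAW_TIME_S = {
    "original": 44.9,
    "disable_graphics": 20.3,
    "disable_raster": 31.7,
    "disable_texture": 39.2,
    "disable_fragment": 39.0,
}


def stage_breakdown(raw):
    """Per-stage cost obtained by subtracting measurements (inferred scheme)."""
    application = raw["disable_graphics"]                      # only game logic runs
    geometry = raw["disable_raster"] - raw["disable_graphics"]
    texture = raw["original"] - raw["disable_texture"]
    fragment = raw["original"] - raw["disable_fragment"]
    # Pixel processing is what remains after all other stages are accounted for.
    pixel = raw["original"] - application - geometry - texture - fragment
    stages = {
        "application": application,
        "geometry": geometry,
        "texture_fetching": texture,
        "fragment_shading": fragment,
        "pixel_processing": pixel,
    }
    return {name: round(value, 1) for name, value in stages.items()}


def graphics_share_percent(stages):
    """Share of each graphics stage, excluding the application stage."""
    graphics = {k: v for k, v in stages.items() if k != "application"}
    total = sum(graphics.values())
    return {k: round(100.0 * v / total, 1) for k, v in graphics.items()}


stages = stage_breakdown(RAW_TIME_S)
shares = graphics_share_percent(stages)
print(stages)
print(shares)
```

The resulting times match the Motorola Droid row of the bottom panel; recomputed percentages may differ from the table by about a tenth of a percent because the table values were presumably computed from unrounded measurements.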

Figure 3: (Top) Performance breakdown and bottlenecks of the test games. (Bottom) Power consumption breakdown and bottlenecks of the test games. Here the System component stands for the baseline power when all the graphics pipeline stages are disabled.

(i) The geometry stage consumes a major part of power and computing time on all the three smartphone platforms. The power consumption and computing time statistics of the geometry stage are shown in Table 2. According to the percentages in Table 2, the geometry stage consumes > 35% of computing time and > 40% of power on all the test platforms. We argue that the scene complexity of real-world 3D games such as Quake 3 makes the geometry stage a performance and/or energy consumption bound, even though scene organization techniques such as Binary Space Partitioning (BSP) have been employed in Quake 3 and XRace.

(ii) The fragment shading stage consumes a larger portion of computing time and power than the pixel processing stage on both Motorola Droid and HTC EVO. The computing time and power consumption statistics of the fragment shading and pixel processing stages are shown in Table 2. On modern mobile GPUs, such as the PowerVR and Adreno studied in this work, pixel processing workloads are dispatched to dedicated hardware units on chip. Meanwhile, these GPUs employ a programmable shader architecture to allow graphics/game developers to create more realistic and richer shading effects through software implementation. A hardware implementation avoids most of the instruction fetching, decoding, and execution overheads of a software implementation, which results in power consumption and performance gains. This explains why the fragment shading stage consumes a larger portion of power and running time than the pixel processing stage on these two platforms. Interestingly, on the Motorola Atrix 4G platform, the pixel processing stage consumes a larger portion of computing time and power than the fragment shading stage (refer to Table 2). One plausible explanation is that the GeForce ULV GPU used on Motorola Atrix 4G, unlike the other two GPUs, does not have a tile-based graphics architecture. In other words, the GeForce ULV GPU has to perform pixel processing operations over the entire framebuffer for every frame, while the PowerVR and Adreno GPUs process pixels one tile of the framebuffer at a time, which makes the pixel processing stage consume a larger portion of power and computing time than the fragment shading stage on Motorola Atrix 4G.

(iii) Compared to game logic computation in the application stage, the graphics part consumes a larger portion of power on the Snapdragon architecture than on the OMAP architecture, and the graphics performance of the OMAP architecture is better than that of the Snapdragon architecture. The Adreno GPU on Snapdragon offers a programmable function pipeline similar to the PowerVR GPU on OMAP. However, the memory controller of the PowerVR GPU is a 32-bit LPDDR1 interface that can run at up to 200 MHz, which offers a ∼56% increase in memory bandwidth compared to the Adreno GPU (128 MHz). From the perspective of IC design, moving data within the memory system costs measurable power. Hence, the faster PowerVR memory controller leads to better energy efficiency in the texturing and pixel shading stages, where texture memory and other pixel buffers need to be accessed frequently. Moreover, although the streaming texture units of the Adreno GPU can combine video and images with 3D graphics, the Adreno GPU has only two such texture units, while there are four texture units on the PowerVR and GeForce GPUs. Therefore, the theoretical texture fillrate of the Adreno GPU is 133∼250 million texels/sec, which is significantly lower than the PowerVR GPU’s 250∼300 million texels/sec. We observe that on HTC EVO, the texturing stage takes a larger share of power (41.0% versus 18.7%) and computing time (28.8% versus 26.3%) for XRace than for Quake 3 (refer to Table 2); as such, it is one of the major performance and power consumption bottlenecks. Also, compared to the Adreno GPU, the hardware implementation of hidden surface removal on PowerVR significantly increases the fragment shading and pixel processing capability; in addition, the removed pixels do not cost any energy in the fragment shading stage.

(iv) Different from existing characterization results of commodity desktop graphics, the game logic takes a significant portion of power consumption on the three smartphone platforms. As shown in Figure 3, the game logic (mainly the CPU workload) of Quake 3 takes on average about 30% of the power consumption on the three chosen mobile platforms. On a commodity desktop (e.g., Athlon II dual-core 2.8 GHz CPU plus AMD HD 4200 GPU), we observe that the same game logic takes only 15.12% of the consumed power. We argue that the CPU might still be the bound for mobile graphics computing. For example, the OMAP CPU (ARM Cortex-A8) studied in this work is able to reach a 1 GHz clock frequency; however, compared with multi-core desktop CPUs, the relative amount of on-die resources dedicated to the CPU is still limited for computation-intensive game logic such as optimal path search.

5 CONCLUSIONS

We present a preliminary study on performance and power consumption characterization of 3D mobile games. Our main characterization results show that the geometry stage is the leading bottleneck and that the game logic (application) takes a significant portion of power consumption. However, the amount of power dissipated in a pipeline stage may be a function of the scene complexity: given a game with lower primitive complexity, the energy breakdown may be substantially different. Thus, one of the future directions of this work is to perform more studies using various 3D games (with high primitive complexity and with low primitive processing requirements) and to further break down the geometry stage in order to better understand the performance and power bottlenecks.

Another limitation of this work is that the impact of disabling stages on the GPU interconnects/bus in the graphics pipeline has not been removed. For example, when the fragment shading stage is disabled, the output fragment color could be different from the normal case since we use stub shaders. Thus, the bit-toggling on the bus would be affected, and the resultant running time and power of the fragment shading stage would not be perfectly accurate. Moreover, only two 3D game benchmarks (Quake 3 and XRace) were investigated, which may not expose all the game characteristics relevant to mobile GPU architectures. However, we believe that the methodology proposed in the current work can be straightforwardly applied to other 3D mobile games. In the future, besides studying more games, we will use microbenchmarks such as GLBenchmark to expose the performance and energy characteristics of specific units in the graphics pipeline. In addition, flash video games that can be played in browsers are not studied in the current work. It would be useful to study the performance and energy characterization of flash video games along this direction.

Acknowledgments

This work is supported in part by Texas NHARP 003652-0058-2007, NSF IIS-0914965, NSF-0751173, NSF-0713249, and NSF-0923479. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

References

[1] I. Antochi, B. Juurlink, S. Vassiliadis, and P. Liuha. Graalbench: a 3D graphics benchmark suite for mobile phones. In Proceedings of ACM SIGPLAN/SIGBED'04, pages 1–9, 2004.

[2] T. Capin, K. Pulli, and T. Akenine-Möller. The state of the art in mobile graphics research. IEEE Comput. Graph. Appl., 28:74–84, 2008.

[3] A. Carroll and G. Heiser. An analysis of power consumption in a smartphone. In Proceedings of the 2010 USENIX conference on USENIX annual technical conference, USENIXATC’10, pages 21–21, 2010.

[4] X. Ma, M. Dong, L. Zhong, and Z. Deng. Statistical power consumption analysis and modeling for gpu-based computing. In HotPower'09: Proc. of ACM SOSP Workshop on Power Aware Computing and Systems 2009, Big Sky, MT, Oct 2009.

[5] T. Mitra and T. Chiueh. Dynamic 3D graphics workload characterization and the architectural implications. In Proc. Intl. Symp. on Microarchitecture (MICRO-32), pages 62–71, 1999.

[6] B. Mochocki, K. Lahiri, and S. Cadambi. Power analysis of mobile 3D graphics. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 502–507, 2006.

[7] S. Molnar, M. Cox, D. Ellsworth, and H. Fuchs. A sorting classification of parallel rendering. In ACM SIGGRAPH ASIA 2008 courses, SIGGRAPH Asia ’08, pages 35:1–35:11, 2008.

[8] NVIDIA Tegra 2. http://www.nvidia.com/object/tegra-2.html, 2011.

[9] Qualcomm Snapdragon Processor. http://www.qualcomm.com/snapdragon, 2011.

[10] J. W. Sheaffer, D. Luebke, and K. Skadron. A flexible simulation framework for graphics architectures. In HWWS '04: Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pages 85–94, 2004.

[11] The OMAP. http://www.ti.com/omap, 2011.

[12] M. Wimmer and P. Wonka. Rendering time estimation for real-time rendering. In Proc. of Eurographics Symposium on Rendering, pages 118–128, 2003.

Authors’ Bios and Contact

Xiaohan Ma is currently a Ph.D. student in the Department of Computer Science at the University of Houston. His research interests include , computer animation, and GPU computing. Ma received his B.S. and M.S. in Computer Science from the Zhejiang University in 2005 and 2007. His email is [email protected].

Ma's Postal Mailing address: Xiaohan Ma PGH 309 Department of Computer Science University of Houston Houston, TX 77204-3010

Mian Dong is currently a Ph.D. student at Rice University, Houston, TX. He received his B.S. and M.S. in Electronic Engineering from Tsinghua University, Beijing, China, in 2003 and 2006, respectively. His research interests include energy efficient graphics and display systems, power characterization and management of mobile systems, and architecture & CAD for computing based on emerging nanometer devices. His email address is [email protected].

Dong's Postal Mailing address: Mian Dong Rice University, Electrical and Computer Engineering Department, 6100 Main St. MS-380, Houston, TX 77005

Lin Zhong received his B.S. and M.S. from Tsinghua University in 1998 and 2000, respectively. He received his Ph.D. from Princeton University in September 2005. Currently, he is with the Department of Electrical & Computer Engineering, Rice University, as an Associate Professor. His research interests include mobile & embedded system design, human-computer interaction, and nanoelectronics. His email address is [email protected].

Zhong's Postal Mailing address: Lin Zhong Rice University, Electrical and Computer Engineering Department, 6100 Main St. MS-380, Houston, TX 77005

Zhigang Deng is currently an Assistant Professor of Computer Science and the Director of the Computer Graphics and Interactive Media Lab at the University of Houston. His research interests include computer graphics, computer animation, and visualization. He earned his Ph.D. from the University of Southern California in 2006, his M.S. from Peking University in 2000, and his B.S. from Xiamen University in 1997. His email address is [email protected].

Deng’s Contact Information: Postal Mailing address: University of Houston Zhigang Deng Department of Computer Science, PGH 501, 4800 Calhoun Road, Houston, TX 77204-3010
