Received: 25 September 2015 Revised: 15 March 2017 Accepted: 21 March 2017 DOI: 10.1002/stvr.1635

SPECIAL ISSUE PAPER

Detecting display energy hotspots in Android apps

Mian Wan Yuchen Jin Ding Li Jiaping Gui Sonal Mahajan William G. J. Halfond

Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Correspondence
William G. J. Halfond, Department of Computer Science, University of Southern California, Los Angeles, California, USA.
Email: [email protected]

Funding information
National Science Foundation, Grant/Award Number: CCF-1321141 and CCF-1619455

Summary
The energy consumption of mobile apps has become an important consideration for developers as the underlying mobile devices are constrained by battery capacity. Display represents a significant portion of an app’s energy consumption, up to 60% of an app’s total energy consumption. However, developers lack techniques to identify the user interfaces in their apps for which energy needs to be improved. This paper presents a technique for detecting display energy hotspots, that is, user interfaces of a mobile app whose energy consumption is greater than optimal. The technique leverages display power modeling and automated display transformation techniques to detect these hotspots and prioritize them for developers. The evaluation of the technique shows that it can predict display energy consumption to within 14% of the ground truth and accurately rank display energy hotspots. Furthermore, the approach found 398 display energy hotspots in a set of 962 popular Android apps, showing the pervasiveness of this problem. For these detected hotspots, the average power savings that could be realized through better user interface design was 30%. Taken together, these results indicate that the approach represents a potentially impactful technique for helping developers to detect energy related problems and reduce the energy consumption of their mobile apps.

KEYWORDS display, energy, mobile applications, optimization, power

1 INTRODUCTION

In less than six years, mobile apps have gone from zero downloads to over 35 billion downloads [1,2]. Simultaneously, smartphones have achieved a nearly 31% penetration rate [3]. Smartphones and apps have become so popular, in part, because they combine sensors and data access to provide many useful services and a rich user experience. However, the usability of mobile devices is inherently limited by their battery power, and the use of popular features, such as the camera and network, can quickly deplete a device’s limited battery power. Therefore, energy consumption has become an important concern. For the most part, major reductions in energy consumption have come about through a focus on developing better batteries, more efficient hardware, and better operating system level resource management. However, software engineers have become increasingly aware of the way an app’s implementation can impact its energy consumption [4-7]. This realization has motivated the development of software-level techniques that can identify energy bugs and provide more insights into the energy related behaviors of an application.

An important observation is that the display component of a smartphone consumes a significant portion of the device’s total battery power. This problem has only grown as smartphone display sizes have increased from an average of 2.9 inches in 2007 to 4.8 inches in 2014 [8]. Studies show that display can now consume up to 60% of the total energy expended by a mobile app [9,10]. Traditionally, optimizing display power has been seen as outside of the control of software developers. This is true for LCD screens, for which energy consumption is based on the display’s brightness and is controlled by either the end user or the Operating System (OS) performing opportunistic dimming of the display. However, most modern smartphones, such as the Samsung Galaxy S7, are powered by a new generation of screen technology, the organic light-emitting diode (OLED). For this type of screen, brightness is still important [11]; however, the colors that are displayed also become important. Because of the underlying technology, this type of screen consumes less energy when displaying darker colors (eg, black) than lighter ones (eg, white). The use of these screens means there are enormous energy savings to be realized at the software level by optimizing the colors and layouts of the user interfaces (UIs) displayed by the smartphone. In fact, prior studies have shown that savings of over 40% can be achieved by this method [6,9,10].

Softw Test Verif Reliab. 2017;27:e1635. wileyonlinelibrary.com/journal/stvr Copyright © 2017 John Wiley & Sons, Ltd. https://doi.org/10.1002/stvr.1635

Despite the high impact of focusing on display energy, developers lack techniques that can help them identify where in their apps such savings can be realized. For example, the well-known Android battery monitor only provides device level display energy consumption and cannot isolate the display energy per app or per UI screen. Other energy-related techniques have focused on surveys to identify patterns of energy consumption [5], design refactoring techniques that improve energy consumption [7,12], programming language level constructs to make implementation more energy aware [13], energy visualization techniques [14], or energy prediction techniques [15]. Although helpful, the mentioned techniques do not account for display energy nor are they able to isolate display related energy. Existing work on display energy has focused on techniques that can transform the colors in a UI (eg, Nyx [6] and Chameleon [10]). But these techniques do not guide developers as to where they should be applied, so they must be either (1) used automatically for the entire app, which means that although colors will be transformed automatically into more energy efficient equivalents, the color transformation may be less aesthetically pleasing than a developer guided one; or (2) applied based solely on developers’ intuition as to where they would be most effective, which means that some energy-inefficient UIs may be missed.

This paper presents a novel approach to assist developers in identifying the UIs of their apps that can be improved with respect to energy consumption. To do this, the approach combines display energy modeling and color transformation techniques to identify a display energy hotspot (DEH), that is, a UI of a mobile app whose energy consumption is higher than that of an energy-optimized but functionally equivalent UI. The approach is fully automated and does not require software developers to use power monitoring equipment to isolate the display energy, which, as explained in Section 3, requires extensive infrastructure and technical expertise. The approach to identify DEHs performs three general steps. First, the approach traverses the UIs of an app and takes screenshots of the app’s UIs when they change in response to different user actions. Second, for each screenshot, the approach calculates an estimate of how much energy and power could be saved by using a color optimized version of the screenshot. Finally, the approach ranks the UIs based on the magnitude of these differences. The approach reports these results, along with detailed power and energy information, to the developer, who can target the most impactful UIs for energy-optimizing transformations.

The paper also presents the results of an empirical evaluation of the approach on a collection of real-world and popular mobile apps. The results showed that the approach was able to accurately estimate display power consumption to within 14% of the measured ground truth and identify the most impactful DEHs. Furthermore, the results generated by the approach can be generalized from one hardware platform to others. The approach was also used to investigate 962 Android market apps; the investigation showed that 41% of these apps have DEHs. Overall, these results indicate that the approach can accurately identify DEHs and can be useful to assist developers in reducing the display energy of their mobile applications.

The rest of this paper is organized as follows: Section 2 describes the approach for detecting DEHs. Section 3 describes how to build the display power model for a device. The results of the evaluation are in Section 4. Related work is discussed in Section 5. Finally, the conclusions and contributions are summarized in Section 6.

2 APPROACH

The goal of the approach is to assist developers in identifying UIs that can be improved with respect to energy consumption. More specifically, the approach detects DEHs, which are UIs that consume more display energy than their energy-optimized versions would. To detect these, the approach automatically scans each UI of a mobile app and then determines if a more energy efficient version could be designed. It is important to note that DEHs are not necessarily energy bugs, as the DEHs may not be caused by a fault in the traditional sense. Instead, DEHs represent points where code is energy inefficient with respect to an optimized alternative. After detecting the DEHs, the UIs are ranked in order of the potential energy improvement that could be realized via energy optimization and then reported to the developers.

To achieve complete automation and not require developers to have power monitoring equipment, there are two significant challenges to be addressed. The first challenge is to determine how much display energy will be consumed by an app at runtime without physical measurements. To address this, the insight is that power consumption can be estimated by a display power model that takes UI screenshots as input. The second challenge is to determine whether a more energy-efficient version of a UI exists and to quantify the difference between these two versions. To address this, the insight is that automated energy-oriented color transformation techniques can be used to recolor the screenshots and then calculate the difference between the original and the more efficient version. Based on these two insights, the approach can automatically detect DEHs without requiring power monitoring equipment.

An overview of the approach is shown in Figure 1. The approach requires three inputs: (1) a description of the workload for which the developers want to evaluate the UIs, (2) a display energy profile (DEP) that describes the hardware characteristics of the target platform, and (3) the mobile app to be analyzed. Using these inputs, the approach performs the detection in five steps. In the first step, the approach instruments the apps to record runtime information about the UI layout. This information is used to identify certain types of components, such as ads, that should not be part of the DEH identification. The second step is to run the app based on the workload description and capture screenshots of the different UIs displayed. In the third step, the approach processes these screenshots and generates energy-efficient versions via a color transformation technique. Next, in the fourth step, a hardware model based on the DEP is used to predict the display energy that would be consumed by each of the screenshots and their energy-optimized versions. Finally, the fifth step compares the energy consumption of each UI with that of its optimized version and gives the developers a list of UIs ranked according to the potential energy impact of transforming each UI. Each of these steps is now explained in more detail.

FIGURE 1 Overview of the approach. UI, user interface

2.1 Step 1: gather UI layout information

The goal of the first step is to facilitate the detection of content displayed in the app’s UI that should not be considered for the purpose of detecting DEHs. This type of content is called Excluding Content (EC) and, broadly, it includes UI elements whose appearance will vary between executions. ECs are very common in mobile apps. For example, mobile ads are present in over 50% of all apps [16]. Since the colors present in an EC should not or cannot be changed by a developer, yet they could occupy a potentially significant amount of the screenspace of a UI, they must be identified and removed from the screenshots to preserve the usefulness of the calculated display energy for a UI.

The primary challenge in detecting ECs is that they are mostly indistinguishable from static content. For example, there is no visual difference between an image that is static versus one that is dynamically loaded, nor is there a difference in terms of the APIs used to display them. A notable exception is mobile ads, which invoke special APIs to visually render themselves in the UI. Based on this insight, the approach instruments the app so that when the workload is executed during step 2, the ads’ size and location are recorded. The process to identify ad related information is as follows. First, the approach identifies invocations in the app’s bytecode that call the app’s ad network API. Then, certain ad related event handlers and callbacks are instrumented. The exact set of invocations to be instrumented varies by ad network. For example, for the ad network AdMob, this set would include onAdLoaded and onReceiveAd, which can be defined by an ad listener, and loadAd, which is defined in the ad library and can be called by an Activity. At runtime, the instrumentation records timestamps, position, and size information about each of the ads displayed. This information is used to populate a sequence of tuples F, in which each tuple is of the form ⟨t, a⟩, where t is the time at which the content was displayed and a represents the location and size of the occupied area.

The approach provides a mechanism by which other types of ECs can be excluded from the DEH detection as well. Tuples may be manually added to F. This allows developers to specify image or text areas that they know to be dynamic and that should be excluded from the energy and power analysis in step 3.

2.2 Step 2: workload execution and screenshot capture

The goal of the second step is to convert the workload description into a set of screenshots that can drive the display energy analysis in the subsequent steps. Strictly speaking, a workload description is not a necessary input to the approach since an automated UI crawler (eg, PUMA [17]) could navigate an app and execute a fixed or random distribution of actions over the UI elements. However, the use of a workload description allows the approach to analyze the app using realistic user behaviors or a particular workload of interest to the developers. For example, developers could collect execution traces of real users interacting with their app and use this to define a workload for the energy evaluation.

The inputs to this step are the target app and its workload description. The app A is the Android Package Kit (APK) file that can be executed on a mobile device. The workload W is represented as a sequence of event tuples in which each tuple is of the form ⟨e, t⟩, where e is a representation of the event (eg, “OK button pressed”) and t is a timestamp of when the event occurs relative to the first event (ie, t = 0 for the first event e_1). The approach does not impose a specific format or syntax on W except that it must be reproducible. In other words, it must be specified in a way that allows for some mechanism to replay the workload. In the current implementation of the approach (Section 4.2), the RERAN tool [18] is used to record and then replay a workload description, so the exact syntax and format of W is dictated by that tool.

Given W and A, the approach captures screenshots of the different UIs displayed on a device’s screen during the execution of W on A. The general process is as follows. The replay mechanism executes each event at its specified time. A monitor mechanism executes in the background of the device and captures a screenshot of the display every time it changes. This is done by hooking into the refresh and repaint events of the underlying device. The execution of the workload continues until all event tuples have been executed. Once the screenshots have been captured, developers may manually analyze the screenshots to identify areas (in addition to the EC areas that are automatically identified) that should be excluded from the DEH identification. The output of the second step is a sequence of tuples S, in which each tuple is of the form ⟨s, t⟩, where s is a screenshot and t is the time at which the screenshot was taken (ie, when the display changed), and F, which contains the EC information collected via the mechanisms described as part of step 1 and the areas marked by the developer.

To capture screenshots, the implementation uses a modified version of an existing tool called Ashot. Ashot periodically captures screenshots of the currently displayed UIs. Ashot has a maximum sampling frequency that is fast enough to catch user-speed events (eg, clicks) but will not sample videos or animations at their full refresh rate. This sampling frequency does not affect the accuracy of the approach; it only reduces the overall number of screenshots captured. Furthermore, to reduce storage overhead, Ashot drops consecutive screenshots that are identical. The use of Ashot did not introduce any observable delay in the execution of W. Note that a necessary condition of both the replay and screenshot capture mechanisms is that their use does not alter the functionality of the app or the UI’s appearance when it is rendered on the device. Both of these conditions were met by RERAN and Ashot.
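The tuple sequences introduced in steps 1 and 2 can be pictured with a small data model. The sketch below is only an illustration under stated assumptions: the class names `Event`, `Shot`, and `ECArea`, the rectangle encoding of an area a, and the function `drop_consecutive_duplicates` are hypothetical and are not taken from dLens, RERAN, or Ashot. It mirrors the description that consecutive identical screenshots are dropped.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]           # (R, G, B), each in 0..255
Image = Tuple[Tuple[Pixel, ...], ...]  # rows of pixels; tuples compare by value


@dataclass(frozen=True)
class Event:
    """One workload tuple <e, t>: event description and time since the first event."""
    e: str
    t: float


@dataclass(frozen=True)
class Shot:
    """One screenshot tuple <s, t> of the sequence S."""
    image: Image
    t: float


@dataclass(frozen=True)
class ECArea:
    """One tuple <t, a> of F; the area a is assumed to be an axis-aligned rectangle."""
    t: float
    x: int
    y: int
    width: int
    height: int


def drop_consecutive_duplicates(shots: Sequence[Shot]) -> List[Shot]:
    """Keep a screenshot only if its pixels differ from the previously kept one."""
    kept: List[Shot] = []
    for shot in shots:
        if not kept or kept[-1].image != shot.image:
            kept.append(shot)
    return kept
```

Because only consecutive duplicates are removed, a UI that reappears later in the workload is kept again with its own timestamp, which preserves the information needed to compute how long each screenshot was displayed.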

2.3 Step 3: generate energy-efficient alternative UIs

The goal of the third step is to generate an optimized version of each screenshot tuple in S so that the fourth step can calculate estimates of the energy consumption for each screenshot and its optimized version. However, this optimized alternative does not exist, so the approach must first generate a reasonable approximation of what such an alternative would look like. A guiding insight is that prior work has shown that darker colors on OLED screens are more energy efficient than lighter colors [9]. To take advantage of this insight, one could invert the colors of UIs with a white-colored background or systematically shift colors to make them darker and then use this transformation as the optimized version. However, these approaches neglect the fact that both color inversion and linear color shifts do not maintain color difference, which is the visual relationship that humans perceive when they look at a colored display [6]. Therefore, although the color-adjusted UIs would be more energy efficient, they would not represent a reasonable approximation of optimized UIs as the resulting UIs would not be aesthetically pleasing.

To address this challenge, the approach leverages a color transformation technique, Nyx, that was developed in prior work [6,19]. A key aspect of Nyx is that the color scheme it generates represents a reasonably aesthetically pleasing new color scheme. Nyx statically analyzes the structure of the HTML pages of a web application and generates a color transformation scheme (CTS) that represents a more energy-efficient color scheme for the web application. Nyx does this by first creating a color conflict graph (CCG), where each node in the graph is a color that appears in a web page and each edge represents the type of visual relationship (eg, “next to”, “enclosing”, or “not touching”) that any two colors in the CCG have. The edges in the CCG are weighted by the type of visual relationship, with higher weights given to edges so that “enclosing” > “next to” > “not touching”. Then, Nyx solves for a recoloring of the CCG that is energy efficient and also maintains, as much as possible, the color distances between colors in the original page that have a visual relationship. The weighting allows Nyx to prioritize maintaining certain types of color distances over others. Empirical studies show that the resulting color schemes can reduce display power consumption of web apps by over 40%. Additionally, user studies of the UIs generated by Nyx and other similar color transformation techniques [6,9,10,20] have shown that the transformed UIs have high end-user and developer acceptance while only minimally affecting the resulting UIs’ aesthetics.

The approach adapts the CTS generation process of Nyx. There are three primary challenges to be addressed to conduct this adaptation: generation of the CCG, accounting for areas in the screenshots occupied by EC, and scalability. Nyx generates the CCG by statically analyzing the server-side code that dynamically generates web pages. In contrast, the approach only has screenshots available; therefore, the adapted CCG models the color relationships between adjacent pixels, which are identified by analyzing each pixel of each screenshot and identifying its color and the colors of its surrounding pixels. To handle ECs, the approach only builds a CCG for the pixels of the screenshot that are not in an area defined in F. Entries in F can be matched to screenshots by matching the time intervals specified by the timestamps. Using the screenshot to construct the CCG leads to the third challenge, scalability. The recoloring of the CCG is an NP-hard problem with respect to the number of colors. The rendering kits of mobile devices use antialiasing and color shading to smooth lines and curves. This means that even a simple image, such as a black circle over a white background, would be rendered with many additional colors, such as grays, to smooth out transitions between adjacent colors. Because of these extra colors, the time needed to generate a CTS would make the approach’s analysis time impractical.

To address this scalability challenge, the approach maps each color in a screenshot to the closest of the 140 standardized UI colors [21] and then uses the resulting reduced set of colors to create the CCG. The approach then converts the color with the largest cluster to black, which is the most energy-efficient color, and solves the CCG recoloring problem using a simulated-annealing algorithm to find the CTS [6]. Guided by the newly generated CTS, the approach recolors the original screenshot, except for the areas in F, so that every color in the cluster is replaced with its corresponding color in the CTS.

This process is repeated for every screenshot tuple ⟨s, t⟩ ∈ S. For each such s, the approach generates an s′, which is the alternate version of the screenshot recolored as described above. The output of this step is a function O that maps each s to its corresponding s′.

Since the approach uses an approximation algorithm, the generated CTS may not reflect the optimal recoloring. Instead, the recolored UI represents a lower bound on the potential savings a color optimization could achieve. Additionally, the use of clustering means the approach performs its analysis on simpler versions of the screenshots with fewer colors. This can also introduce inaccuracy into the power estimation of the color-optimized screenshot. However, unless the screenshots differ significantly in the amount of antialiasing used, this inaccuracy is small. To confirm this, the power consumption of a set of screenshots was compared with that of the versions of the screenshots using the clustered colors, and the average difference was found to be below 2%, indicating that the simplified version was a reasonable proxy for the full-color version.

2.4 Step 4: display energy prediction

The fourth step of the approach computes the display power and energy of the screenshots and their energy-efficient alternatives. The approach does this by analyzing each screenshot obtained in the second step and its optimized version generated in the third step with cost functions that estimate the energy consumption based on the colors used in the screenshot. The inputs to this step are F, populated in the second step; the screenshot tuples, S, generated by the second step; O, generated by the third step; and the cost function, C, provided by the DEP. (The development of the cost function provided by the DEP is explained in Section 3, and an evaluation of its accuracy is in Section 4.3.) The outputs of this step are two functions that map each screenshot tuple in S or O to its power (P) and energy (E).

$$P(s, t) = \sum_{k \in |s|} C(R_k, G_k, B_k) - \sum_{a \in F(t)} \sum_{k \in |a|} C(R_k, G_k, B_k), \tag{1}$$

$$E(s, t_s, t_e) = P(s) \ast (t_e - t_s). \tag{2}$$
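Equations 1 and 2 can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the dLens implementation: a screenshot is assumed to be a list of rows of (R, G, B) tuples, an EC area is assumed to be a rectangle `(x, y, width, height)`, EC areas are assumed not to overlap, and the function names `power` and `energy` are hypothetical. The cost function C is passed in as a callable so that any DEP can be plugged in.

```python
from typing import Callable, Iterable, Sequence, Tuple

Pixel = Tuple[int, int, int]
Rect = Tuple[int, int, int, int]  # (x, y, width, height): assumed encoding of an EC area
CostFn = Callable[[int, int, int], float]


def power(image: Sequence[Sequence[Pixel]],
          ec_areas: Iterable[Rect],
          cost: CostFn) -> float:
    """Equation 1: cost of every pixel minus the cost of pixels inside EC areas."""
    total = sum(cost(r, g, b) for row in image for (r, g, b) in row)
    excluded = 0.0
    for (x, y, w, h) in ec_areas:
        # Slice out the rectangle; assumes EC areas do not overlap each other.
        for row in image[y:y + h]:
            for (r, g, b) in row[x:x + w]:
                excluded += cost(r, g, b)
    return total - excluded


def energy(p_watts: float, t_s: float, t_e: float) -> float:
    """Equation 2: power multiplied by the time the screenshot was displayed."""
    return p_watts * (t_e - t_s)
```

For an optimized screenshot O(s), the same two functions would be called with the recolored image and the same EC areas and timestamps.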

The formulas for calculating the output functions are shown in Equations 1 and 2. Here, s can be replaced with O(s) as needed. To calculate P(s, t) for all ⟨s, t⟩ ∈ S, the approach first sums the power cost of each pixel in s, which is calculated by the cost function C that takes the values associated with the red (R), green (G), and blue (B) values of the pixel’s color. From the calculated power value, the approach subtracts the power values calculated for each of the EC areas contained in s. As in step 3, the approach identifies the EC areas corresponding to the screenshots using the timestamp information, represented as F(t), and then uses C to calculate the power of each pixel in each EC area a. The sum of the power for all of the EC areas is subtracted from the screenshot’s overall power value. The value returned by P is in Watts. Recall that energy is equal to power multiplied by time. Therefore, E is equal to the power associated with the screenshot (P(s)) multiplied by the amount of time the screenshot is displayed. The display time is calculated by subtracting the time the screenshot is displayed (t_s) from the time the next screenshot is displayed (t_e), or in other words, subtracting the timestamp associated with screenshot s_i from the timestamp associated with screenshot s_{i+1}, which would be of the form E(s_i, t_i, t_{i+1}). The value returned by E is in Joules.

2.5 Step 5: prioritizing the UIs

The goal of the fifth step is to rank the UIs in order of their potential power and energy reduction. To do this, the approach calculates the power and energy of each color-transformed screenshot and compares it to the power and energy of the original screenshot. The inputs to the fifth step are S, P, E, and O.

$$D_P(s) = P(s) - P(O(s)), \tag{3}$$

$$D_E(s, t_s, t_e) = E(s, t_s, t_e) - E(O(s), t_s, t_e). \tag{4}$$

Given these inputs, the approach calculates the power and energy difference according to the formulas shown in Equations 3 and 4. For the difference in power (D_P), the approach subtracts the power of the corresponding O(s_i) from that associated with each s_i ∈ S. The resulting number is in Watts and represents the power that could be saved by using s′_i, the color-optimized version of s_i. For the difference in energy (D_E), the approach subtracts the energy of the corresponding O(s_i) from that associated with each s_i ∈ S. The resulting number is in Joules and represents the energy that could be saved by using s′_i instead of s_i.

The output of the fifth step is two sequences, R_P and R_E. Each sequence consists of tuples ⟨s, D⟩, where s is the screenshot and D is either the difference in power (D_P) or the difference in energy (D_E). By choosing the metric D_P or D_E, the developers can choose whether or not to take the time spent on each screenshot into consideration. The sequences are ordered by each tuple’s D value from highest to lowest. This ranking is the output of the approach and represents a prioritization of the screenshots that appear during the workload’s execution in order of their potential power and energy reduction if they were to be color optimized.

2.6 Discussion of usage scenarios

The output of the approach allows developers to identify the UIs of their app that could save the most energy with color optimization. Although the technique provides developers with a CTS, it does not automatically transform the app to use these colors. Nonetheless, the color mapping information can be useful. Developers may choose to use this CTS, build on it as a starting point for graphic designers to create a new palette, or leverage other automated techniques for identifying energy efficient and aesthetically pleasing color schemes [20]. To close the loop and use the new color scheme, developers must modify points in the code where colors are defined and/or modify the Android UI layout XML file color specifications so that the app uses the new colors in places where the old colors would have been used.

3 THE DISPLAY ENERGY PROFILE

The DEP provides a pixel-based power cost function for a target mobile device. The use of the DEP allows the approach to analyze display power for multiple devices by simply providing a different DEP as input. It is expected that, in the future, a DEP will be developed and provided as part of a device’s software development kit. However, this is currently not common in practice, so this section discusses the steps required to develop a DEP.

At a high level, the DEP provides a cost function that can predict how much power an OLED screen will consume when displaying a particular UI. Prior research work has shown that the power consumption of a pixel in an OLED screen is based on its color [22]. Therefore, the input to the cost function is the RGB value that defines a pixel’s color. The output of the cost function is the amount of Watts that will be consumed by the display of the pixel on the target device.

$$C(R, G, B) = rR + gG + bB + c \tag{5}$$

The general form of the cost function is shown in Equation 5. R, G, and B represent the red, green, and blue components of a pixel’s color, respectively. The coefficients r, g, b, and c represent empirically determined constants. The value for each constant varies by mobile device. Note that the power model does not account for screen brightness. This is generally controlled by the user or OS, not the software developer. Furthermore, savings incurred by adjusting brightness would apply uniformly across all UIs. Display energy profiles were constructed for four mobile devices: a 2.83" μOLED-32028-P1T display (μOLED) from 4D Systems, a Samsung Galaxy SII (S2), a Samsung Galaxy Nexus (Nexus), and a Samsung Galaxy S5 (S5). For all of these displays, power consumption was measured using the Monsoon Power Monitor (MPM) from Monsoon Solutions Inc. [23]. The MPM allows voltage to be held constant while supplying a current that may be varied from its positive and negative terminals. The MPM samples the voltage and the current supplied and outputs the power consumption with a frequency of 5 kHz. This sampling frequency is sufficient for the development of the DEP, since the average duration of screenshots is on the order of seconds.

Each power model was built by roughly following the process outlined by Dong and colleagues [22]. First, the power consumption of a completely black screen was measured to define a baseline power usage for an active screen. To determine the parameters for each RGB component, the power consumption of the screen was measured while displaying solid-colored pages. The intensity of each color component was varied while holding the other two components at zero, and data points for 16 intensities of each component (R, G, and B) were collected. In total, 48 data points for each device were obtained.

TABLE 1 Subject application information

Name                              Size, MB   Screenshots   Time, s
Facebook                          23.7       116           554
Facebook Messenger                12.9       55            268
FaceQ                             17.9       96            470
Instagram                         9.7        93            429
Pandora internet radio            8.0        75            278
Skype                             19.9       65            254
Snapchat                          8.8        142           465
Super-Bright LED Flashlight       5.1        20            51
Twitter                           13.7       101           388
WhatsApp Messenger                15.3       65            242
Arcus Weather                     4.0        36            143
Drudge Report                     2.8        25            105
English for kids learning free    7.9        43            181
Retro Camera                      29.0       29            106
Restaurant Finder                 5.3        41            148

After taking measurements for each color component, the baseline power usage was subtracted from these measurements to isolate the power consumption of each R, G, and B component. The relationship between the power consumption and the RGB value is non-linear due to a gamma encoding of the screen. Gamma encoding is a digital image editing process that defines the relationship between pixel values and the colors’ luminance. It allows for human eyes to correctly perceive the shades of color of images that are captured digitally and displayed on monitors. To account for this encoding, the RGB values were raised to the 2.2 power to decode the image. While the gamma value can vary from 1.8 to 2.6, 2.2 is the standard image gamma of screens adopted by industry. After gamma decoding of the RGB values, linear regression was used to determine the coefficients for Equation 5. The linear relationship between the RGB values and the power consumption was very strong. The average R^2 value for the four models was 0.99288. The detailed coefficients for each device can be found in the project web page [24].

One particular problem with the S2, Nexus, and S5 was that the measured energy also included the energy consumed by background processes and other hardware components of the smartphone. To properly isolate the display energy, two measurements were taken. For the first, the flex cable that provided power, as well as data and signals, between the display and the CPU was disconnected. This offered a baseline measurement of the power consumption of the phone without the display. By disconnecting the cable, the phone still maintains its background processes instead of suspending them and going into sleep mode. This baseline value was subtracted from the second measurement, the power of the phone with the display cable attached, to calculate the display power for each of the colored pages.

4 EVALUATION

This section presents the results of an evaluation of the approach. The approach was implemented in a tool called dLens, and it was used to answer the following research questions:

RQ 1: How accurate is the dLens analysis?
RQ 2: How generalizable are the dLens results across devices?
RQ 3: What is the impact of ads on the rankings?
RQ 4: How long does it take to perform the dLens analysis?
RQ 5: What is the potential impact of the dLens analysis?

... app. The average duration of the workloads was 272 seconds, and each workload resulted in an average of 67 captured screenshots. Information about the apps is listed in Table 1. For each app, the size of its APK file, the number of screenshots captured as part of its workload, and the time duration (in seconds) of the recorded workload are reported. The first ten rows contain information about the top ten apps, and the remaining five rows contain information about the unobfuscated apps.

4.2 Implementation

The approach was implemented in a prototype tool called dLens. The implementation of dLens leveraged several other libraries and tools. To gather the mobile ad UI layout information, apktool [25] and the dex2jar [26] tool were used to reverse engineer the APK. The apps were instrumented using the ASM library [27]. Workloads were recorded and replayed using the RERAN tool [18]. The Ashot [28] tool was used to record screenshots of the different UIs displayed. Ashot was also modified to associate a timestamp with each generated screenshot. As described in Section 2.3, the CTS generation was based on the code developed in the Nyx project [6,19]. Nyx was adapted to build CCGs using color information obtained from screenshots instead of static analysis. The energy consumption of each screenshot was measured on an MPM. Finally, the experiments were performed on four different platforms: a 2.83" μOLED-32028-P1T display (μOLED) from 4D Systems, a Samsung Galaxy SII smartphone (S2), a Samsung Galaxy Nexus smartphone (Nexus), and a Samsung Galaxy S5 smartphone (S5).

4.1 Subject applications 4.3 RQ1: accuracy

The top ten most popular free Android market apps in the United This research question deals with the accuracy of the dLens approach. States, as of August 2014, were selected as subject applications. How- The accuracy of dLens was evaluated with two metrics. First, the error ever,since these apps either did not contain ads or their ad invocations estimation rate (EER), which is the accuracy of the power estimate pro- could not be instrumented due to obfuscation, another five apps that duced by dLens for a given screenshot, was calculated. Note that the used mobile ads without any obfuscation were added to the evalua- EER is different from the accuracy reported in Section 3, which is the tion. The subject applications are from different developers and have Pearson coefficient that expresses the closeness of the fit between different features. For each of the subjects, a workload was manually the power measurements of the solid-color screenshots and the generated by exercising the primary features and functions of each regression-based model. In contrast, the EER evaluates the closeness WAN ET AL. 7of15

of the ground truth of the screenshots from the subject applications with dLens's estimates. The second accuracy metric was Device Ranking Accuracy (DRA), which is the accuracy of the UI screenshot ranking provided by dLens compared to the UI screenshot ranking provided by the ground truth power measurements. Note that if the EER was 0, the ranking of the screenshots would be 100% accurate. However, since the approach is dealing with physical systems, a certain amount of estimation error is to be expected. Therefore, the measurement of the DRA reflects the ability of the dLens approach to rank the UI screenshots accurately despite the underlying EER of the approach. The accuracy metrics were only calculated for power since the corresponding energy measurements could simply be obtained by multiplying the power estimates by the length of time the screenshot was displayed.

The EER was calculated by the following process. First, dLens was run on each of the subject applications to generate the list of screenshots ranked by power. To compute the EER, five screenshots and their corresponding color-transformed versions were randomly selected from each app. For each pair, their ground truths were measured using the MPM on each of the four devices. The screenshots were sampled instead of completely measured because the process for isolating a screenshot's power and energy (see Section 3) was manual and, therefore, time-intensive. Then the power values estimated by dLens were compared to the ground truths for the selected screenshots. Figure 2 shows, for each of the subject applications and for each of the four devices, the average EER for the five screenshots. Across all of the applications, the average EER for the μOLED, S2, Nexus, and S5 was 3%, 5%, 8%, and 8%, respectively. The accuracy for the corresponding color-transformed versions was 5%, 7%, 6%, and 8%. Overall, the μOLED had a lower EER than the other three devices. A possible reason for this is that the μOLED is only a display device and therefore does not have any noise introduced into its power measurements and models by background components or processes that would be experienced by the three smartphones.

FIGURE 2 The error estimation rate of the power model

To compute the DRA, a process similar to that of calculating the EER was followed. However, instead of a random sample, the top-five ranked screenshots were chosen. The top five were used because this represented a group of screenshots that would likely be similar in terms of power consumption, and therefore, their relative ranking accuracy would be more sensitive to the underlying EER than a random subset. Also, the top five represented the set most likely to be used to guide the developers and therefore is the most representative of the accuracy that an end user of dLens would experience in real usage. For these top five, the ground truths of each screenshot and its color-transformed version were calculated and then ranked by their power consumption. Then this ranking was compared against the ranking computed by the dLens tool. To compare the rankings, each ranking of the five screenshots was treated as a vector and Spearman's rank correlation coefficient was used to compute the closeness of the vectors. The coefficient gives a −1 or 1 when the two rankings are perfectly negatively or positively correlated and is 0 when the two rankings are not correlated. Across all of the applications, the average DRA for the μOLED, S2, Nexus, and S5 was 0.83, 0.72, 0.6, and 0.89, respectively. These results indicate that the rankings were not an exact match. The cases where the ranking was not an exact match were investigated and were found to be likely caused by the closeness of the power consumption measurements of the screenshots in the top five. For example, it was typical to see the top five separated by about a 2% difference in their overall power consumption. This was well within the possible error range that was measured for the EER for each device and was likely the reason for this ranking variation.

Overall, the results for RQ1 show that the dLens tool is very accurate. For the EER, the estimated power was within 14% of the ground truth for all devices. Even with a non-zero EER, dLens was able to accurately rank the UIs as verified by the ground truth measurements. The minor variations seen in the rankings could be attributed to the small size of the EER. Both of these accuracy measurements are important for developers, as they indicate that their design changes can be made with a high degree of confidence in both the actual estimates and the relative ranking of the UI screenshots.

4.4 RQ2: generalizability

This research question evaluates the generalizability of the results computed by dLens. This research question addresses a potential limitation to the usage of the approach. Namely, the screenshots captured by the approach and the corresponding DEP reflect the power and energy usage of UIs displayed on a particular set of devices. In practice, this set would be the device(s) the developer has available for testing purposes. However, other devices are likely to vary in terms of screen resolution and power consumption characteristics. To better understand this potential limitation, this research question evaluates how well the rankings for one device match the rankings that would be computed for other devices using their own DEP. If the results of the dLens approach are generalizable across mobile devices, developers can use the results from their own devices as a proxy for other or similar devices.

To answer this research question, the similarity of the rankings generated by dLens for each of the mobile devices was compared. For this experiment, dLens was run for each device (ie, using its own DEP) on the set of all screenshots for each app. For each app, its screenshot rankings were compared against those computed for the other devices. To compare the rankings, the Spearman's rank correlation coefficient, explained in Section 4.3, was used. A pair-wise comparison of the rankings generated for each of the four devices was performed. The average Spearman's correlation coefficient across all apps for each of the devices is shown in Table 2.

TABLE 2 Average Spearman's correlation coefficient of rankings between devices

Base Device   μOLED    S2       Nexus    S5
μOLED         -        0.9874   0.9849   0.9888
S2            0.9874   -        0.9985   0.9990
Nexus         0.9849   0.9985   -        0.9942
S5            0.9888   0.9990   0.9942   -

The similarity of the top n entries of each ranking was also investigated, since developers may only check the top entries instead of the entire list. To do this, the top n of each device's ranking were treated as a set and its overlap with that of the other devices was computed. For each such comparison, the cardinality of the intersection of the two sets was computed. This result is shown in Table 3 for n equal to 5 and 10. The results show a high similarity in the top n of the rankings. The average overlap between rankings is over 4 for the top 5 and over 9 for the top 10, demonstrating that this similarity applies to the portion of the ranking most likely to be used by developers.

TABLE 3 Average common screenshots in top 5 and top 10 between devices

                 Top 5   Top 10
μOLED vs S2      4.33    9.47
μOLED vs Nexus   .6      9.67
μOLED vs S5      4.47    9.2
S2 vs Nexus      4.47    9.33
S2 vs S5         4.07    9
Nexus vs S5      4.27    9

Overall, the results show that the rankings generated by dLens for each of the mobile devices were, in fact, highly similar. The implications of this finding are that the results of running dLens on one device are similar to those for other devices. More broadly, this indicates that energy-reducing redesigns undertaken by developers based on results from one device are likely to also reduce energy on other devices.

4.5 RQ3: ad impact

This research question evaluates the impact of mobile ads on the rankings reported by the dLens approach. Essentially, it is measuring the impact on the rankings of the mechanism to exclude advertisements described in step 1 (Section 2.1). To measure this impact, the dLens approach was run twice, the first time excluding the ad portions of the screenshots and the second time including the ad portions. The rankings generated by the two variations of the dLens approach were then compared. The results of this analysis are shown in Table 4. First, the rankings of all of the screenshots were compared using the Spearman's rank correlation coefficient. This is shown in the table as "Rank correlation." The amount of overlap in the top n of the rankings was also computed by treating the top n screenshots as a set and computing the cardinality of the intersection of the two rankings. The results of this comparison are shown in the columns labeled "Top N overlap," where N was set to 5 and 10. The results show that for all apps and comparisons except for one (top 5 for Drudge Report), ads could affect the rankings. The magnitude of this impact varied significantly. For Arcus Weather and Retro Camera, the impact was significantly higher than for the other apps, such as Drudge Report and Restaurant Finder. The results differed due to a number of reasons, including the prevalence of ads and the appearance of the rest of the UI. These results indicate that although ads account for a small portion of the UI display, for some apps, they can have a significant impact on the rankings of the DEHs and thus it is useful to have a mechanism to exclude them from the DEH calculations.

TABLE 4 The differences between rankings with and without excluding ads

                                 Top 5 overlap   Top 10 overlap   Rank correlation
Arcus Weather                    4               6                0.8263
Drudge Report                    5               9                0.9687
English for kids learning free   3               9                0.9822
Retro Camera                     2               7                0.3908
Restaurant Finder                4               8                0.9745

4.6 RQ4: analysis time

This research question addresses the time needed to analyze an app using the dLens approach. The overall time needed to run the dLens approach and the time for three of the steps (instrumentation, power estimation, and color transformation) was measured. Note that for this RQ, the time to replay the workload for each app was not included as this time is under developer control.

The analysis times are shown in Table 5. Only the analysis time for the S2 DEP was reported because the results were very similar for all four devices. The time for the color transformation is shown as "TC" and the time for power estimation is shown as "TE." For the five apps with ads, which required instrumentation, the required time is shown as "TI." The total time is shown as "All UIs," which also includes time for file processing, ranking, etc. Since each app varies in terms of the number of screenshots in its workload, the time measurement was also normalized by dividing the total time by the number of screenshots captured

in each app's workload. This value is shown in the column labeled "Per UI." All time measurements are shown in seconds.

TABLE 5 Analysis time of the dLens approach

Name                             TC, s   TE, s   TI, s   All UIs, s   Per UI, s
Facebook                         1470    7       -       1477         12
Facebook Messenger               997     3       -       1001         18
FaceQ                            1145    5       -       1151         12
Instagram                        2799    6       -       2806         30
Pandora internet radio           1418    4       -       1423         19
Skype                            871     3       -       875          13
Snapchat                         1444    8       -       1453         10
Super-Bright LED Flashlight      863     1       -       865          43
Twitter                          1316    6       -       1323         13
WhatsApp Messenger               897     3       -       901          13
Arcus Weather                    879     2       1       883          25
Drudge Report                    1192    2       1       1195         48
English for kids learning free   1377    4       1       1382         32
Retro Camera                     1899    3       1       1903         66
Restaurant Finder                1320    3       1       1324         32

Abbreviation: UI, user interface.

The average time for dLens to analyze an app was 22 minutes and ranged from 14 to 46 minutes. Although this amount can be considered high, it was directly dependent on the overall size of the set of screenshots, which ranged from 20 to 142. Therefore, the per screenshot number, which ranged from 12 to 66 seconds, is informative. For each screenshot, it was observed that most of the time was taken by the generation of the CTS. The runtime of the CTS algorithm is exponential with respect to the number of colors present in a screenshot. The Nyx approach uses an approximation algorithm to solve this problem. Therefore, this aspect of the approach can be sped up, if needed, by accepting lower quality approximations. However, this may have a trade-off in terms of accuracy of the computed results and rankings.

4.7 RQ5: potential impact

This research question investigated the potential impact of dLens in two ways. The first was by determining how many market apps contained DEHs, and the second was by computing the amount of energy savings that could be realized by transforming the apps that contained DEHs. This analysis was performed on both the subject apps listed in Table 1 and on a much larger sample of 1082 random Android market apps.

Because of the large number of apps, the evaluation process was fully automated. The execution of the apps and the capture of their initial home page was automated by using the adb [29] tool from the Android software development kit. One challenge was to automatically detect when the initial page of an app was valid (ie, finished loading). This problem was solved with the heuristic that the screenshot was captured five seconds after the app started. For almost all of the apps, this was sufficient time for the initial UI to load and display. To ensure that all the apps had been executed successfully, all screenshots were manually checked, and invalid screenshots, such as those representing crashed apps, were removed. In total, screenshots of 962 apps were valid and dLens was run on this set.

For the subject apps, the potential savings are shown in Figure 3. As the figure shows, the savings varied from 0% to 28%. Flashlight's potential savings were low because it has an almost all black background, which means it was already close to optimal and there were only minuscule improvements to be realized. For the market apps, dLens found that 398 of the 962 apps contained DEHs. That means that for 41% of the examined apps, their main page consumes more display power than the optimized version. On average, the optimized versions would consume 30% less energy than the original versions. For some apps, this number was as large as 50%. For the apps with DEHs, Figure 4 shows how much energy could be saved by using the optimized version. In this figure, a point (X,Y) on the line means that there are X apps that could consume at least Y% less energy than their original version.

The app with the largest potential energy saving was "Bible Study." This app's original design and optimized design are shown in Figure 5. In Bible Study, the original design uses a significant amount of white as the background color, which consumes more energy than other colors. However, by using black as the background color, as shown in Figure 5B, a significant amount of energy could be saved. Information about the top 10 most energy-inefficient apps is shown in Table 6.

The category distribution of the apps, per their app store category, was also analyzed. The categories that involved extensive reading or presented textual information, such as Communication and News & Magazines, tended to have a higher ratio of apps with DEHs (above 43%). The categories that contained more video and graphical information tended to have fewer apps with DEHs (below 20%). A possible explanation for this is that developers of text or reading oriented apps prefer to use light colored backgrounds (eg, white) to mimic the traditional print media colors (eg, newspapers) or because this color combination is considered more readable. The increased readability of light-colored apps is supported by evidence from user studies in prior work [6], but this same study also showed that users would prefer energy savings over a small decrease in readability. This suggests that developers of these types of apps might improve user satisfaction by optimizing the color of their apps' UIs and explaining to end users the energy related benefits of the revamped design.

The UI color choices for the apps were also investigated. The apps that did not contain DEHs tended to use black (#000000) as the main color. Here, a color is considered the main color of a screenshot if it covers the most pixels in the screenshot. On average, black covered 79.3% of the pixels of the screenshots of those apps. Dimgray (#696969), darkslategray (#2F4F4F), darkgray (#A9A9A9), and gray (#808080) were also commonly used as secondary colors in these apps. These colors covered 8%, 5%, 3%, and 2% of the pixels, respectively, of the screenshots. The color combination black:darkgray:gray:white:dimgray with the ratio 794:32:13:10:6 was the most commonly used combination in apps without DEHs. It was used in 23% of apps without DEHs and on average comprised 86% of the screenshots. In the apps with DEHs, white, dimgray, and whitesmoke (#F5F5F5) were the three most popular main colors. They were used as the main colors in 42%, 12%, and 10% of the apps, respectively. On average, each of these colors covered 65%, 56%, and 55% of the pixels on the screen.

The results of this RQ showed that many apps in the Android market are not optimized in terms of display energy efficiency and that significant savings could be realized through their optimization. The results

FIGURE 3 The average estimated power savings of the subject apps
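The savings plotted in Figure 3 compare the estimated display power of each screenshot with that of its color-transformed version. The Python sketch below illustrates one way such a comparison could be computed, assuming a per-pixel model that is linear in the gamma-decoded R, G, and B channels (gamma 2.2, as described in Section 3). The coefficient values, the baseline, and the function names are hypothetical placeholders for illustration; they are not the fitted values or the code used by dLens.

```python
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) values

GAMMA = 2.2  # standard display gamma used for decoding

# Hypothetical per-pixel coefficients (watts per unit of decoded intensity)
# and a hypothetical black-screen baseline; NOT the values fitted in the paper.
COEFFS = (2.0e-6, 3.0e-6, 5.0e-6)
BASELINE_W = 0.2


def decode(channel: int) -> float:
    """Gamma-decode an 8-bit channel value to a linear intensity in [0, 1]."""
    return (channel / 255.0) ** GAMMA


def estimate_power(pixels: Iterable[Pixel]) -> float:
    """Estimate display power (W) as the baseline plus a linear sum over pixels."""
    r_c, g_c, b_c = COEFFS
    total = BASELINE_W
    for r, g, b in pixels:
        total += r_c * decode(r) + g_c * decode(g) + b_c * decode(b)
    return total


def savings_percent(original: Iterable[Pixel], transformed: Iterable[Pixel]) -> float:
    """Percentage of estimated power saved by the transformed screenshot."""
    p_orig = estimate_power(original)
    p_new = estimate_power(transformed)
    return 100.0 * (p_orig - p_new) / p_orig


if __name__ == "__main__":
    white_ui = [(255, 255, 255)] * 10_000  # toy all-white screenshot
    dark_ui = [(0, 0, 0)] * 10_000         # toy all-black transformed version
    print(f"{savings_percent(white_ui, dark_ui):.1f}% estimated savings")
```

Because the savings are a ratio of two estimates from the same model, any fixed baseline term reduces the reported percentage, which is why a mostly black UI such as the Flashlight app shows little room for improvement.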

FIGURE 4 The number of apps with display energy hotspots
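The curve in Figure 4 reads as follows: a point (X, Y) means that X apps could consume at least Y% less energy than their original version. A minimal Python sketch of how such points could be derived from a list of per-app savings is given here; the function names and the sample values are assumptions made for illustration and do not come from the evaluation data.

```python
from typing import List, Sequence, Tuple


def savings_curve(savings: Sequence[float]) -> List[Tuple[int, float]]:
    """Return (X, Y) points where X apps save at least Y percent.

    Sorting the savings in descending order means the app at rank X has the
    X-th largest saving, so exactly X apps save at least that much
    (assuming no ties at the boundary).
    """
    ordered = sorted(savings, reverse=True)
    return [(rank, value) for rank, value in enumerate(ordered, start=1)]


def apps_saving_at_least(savings: Sequence[float], threshold: float) -> int:
    """Count the apps whose potential saving is at least `threshold` percent."""
    return sum(1 for s in savings if s >= threshold)


if __name__ == "__main__":
    sample = [12.0, 50.0, 5.0, 30.0, 49.0]  # hypothetical per-app savings (%)
    for x, y in savings_curve(sample):
        print(f"{x} app(s) could save at least {y}%")
```

Sorting once and enumerating keeps the construction linear after the sort, which is sufficient for a set of several hundred apps.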

FIGURE 5 Transformed and original screenshots of the most energy-inefficient app (panels A and B)
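The color analysis in RQ5 treats a color as the main color of a screenshot when it covers the most pixels. The Python sketch below shows one way to compute that color and its pixel coverage; representing a screenshot as a flat list of RGB tuples and the function name are assumptions for illustration, not part of dLens.

```python
from collections import Counter
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int]  # 8-bit (R, G, B) values


def main_color(pixels: Sequence[Pixel]) -> Tuple[str, float]:
    """Return the most frequent color as a hex string and its share of pixels."""
    if not pixels:
        raise ValueError("screenshot has no pixels")
    color, count = Counter(pixels).most_common(1)[0]
    hex_code = "#{:02X}{:02X}{:02X}".format(*color)
    return hex_code, count / len(pixels)


if __name__ == "__main__":
    # Toy screenshot: mostly black with some white text pixels.
    screenshot = [(0, 0, 0)] * 8 + [(255, 255, 255)] * 2
    print(main_color(screenshot))
```

Counting exact RGB values is the simplest reading of the definition; grouping near-identical shades into named colors, as the reported color names suggest, would need an additional quantization step that is not sketched here.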

TABLE 6 The ten apps with the largest DEHs

App Package Name                                                 Potential Power Savings (%)
biblereader.olivetree                                            50
com.amirnaor.happybday                                           49
com.adp.run.mobile                                               49
com.airbnb.android                                               49
com.darkdroiddevs.blondJokes                                     49
com.chapslife.nyc.traffic.free                                   49
com.al.braces                                                    48
appinventor.ai_vishnuelectric.notextwhiledriving2_checkpoint1    48
appinventor.ai_freebies_freesamples_coupons.StoreCoupons         48
bazinga.emoticon                                                 48

Abbreviation: DEH, display energy hotspot.

of this RQ also revealed basic trends and UI design patterns in both apps with and without DEHs. These results show that dLens can generate useful and actionable information that can have a large potential impact in terms of helping developers optimize the display energy for their apps above and beyond the detection of DEHs.

4.8 Threats to validity

A possible threat to the external validity of the results is the selection of only the most popular apps and the workload generation. For app selection, although the most popular apps may not be representative of all apps, choosing subjects in this way eliminated the threat of selection bias for the subjects and made it possible to argue that the approach is useful for even well-engineered apps. Furthermore, the results of RQ5 show the general applicability of the approach to a wide range of other apps. Even though the workloads were generated by this paper's authors, the workloads used the primary features of the apps, which were easy to identify in all cases. It is important to note that the usefulness of the approach does not depend on accurately identifying typical or representative workloads, as the approach can be used for any workload of interest.

A threat to the internal validity of the results could exist if the mechanism of establishing ground truth or the approximation of the CTS generated by the Nyx-based method was inaccurate. To ensure the accuracy of the ground truth measurements, the protocols were developed and tested to ensure that they reliably and accurately captured power measurements. The protocols are also based on prior approaches [6,22,30]. The accuracy of the Nyx-based CTS has been established in prior work [6].

A threat to internal validity is that the screenshots may contain additional ECs whose colors should not be optimized. To determine the magnitude of this threat to validity, a study was performed to quantify the potential impact of not excluding dynamic images and text. In this study, the screenshots of nine apps (representing 298 screenshots) were manually modified to remove the display area associated with EC from the display energy calculation. Then the difference in power between the original and modified versions of the screenshots was calculated. The median power difference was 0.81% and the average difference was 1.8%. Although these numbers are small, they are high enough to affect the rankings of some apps. Therefore, a failure to properly exclude EC could lead to misranked screenshots.

A final threat to internal validity is that the approach assumes that the CTSs generated by Nyx represent an aesthetically acceptable version of the UIs. This assumption is reasonable based on prior studies that measured end users' assessment of the aesthetics of the generated color schemes. In an end user study of the color schemes generated by Nyx, it was found that although the readability and attractiveness of the transformed UIs were rated slightly lower than the originals, users overwhelmingly preferred the new color schemes when made aware of the energy trade-offs [6]. More recent approaches that incorporate additional aesthetic constraints into the generation of color schemes report even better preference results [20]. Taken together, these results indicate that the automatically generated color schemes represent reasonable proxies for an aesthetically acceptable version of the analyzed UIs.

For RQ1, a threat to validity is that the accuracy of only the top-five ranked screenshots and a random sample of the remaining screenshots were measured. Sampling was used since the screenshot isolation process described in Section 3 is very time consuming and it was not feasible to measure the power and energy of every screenshot. Nonetheless, the experiment evaluation included over 1800 such measurements to evaluate the accuracy of the four devices for all of the experiments required by the RQs. The subset was chosen at random to eliminate any potential selection bias, and there is not a reason to believe that the results would differ with a larger subset. The raw data is available via the dLens project page [31].

For RQ2, there are two threats to validity. The first of these is a threat to criterion validity for the selection of ranking as the metric that was used to evaluate consistency across devices. An alternative would have been to use power measurements. However, ranking is the more appropriate metric since developers would be likely to prioritize their work based on rank instead of actual power differences. Second, a threat to external validity for RQ2 is that only four power models were used, and three of them are for devices manufactured by Samsung. Despite the results that showed generalizability across devices, it is likely that some phones may have DEP cost functions that will result in different rankings due to different underlying hardware mechanisms. Therefore, the results will not always generalize to all devices. Future investigations into what enables more reliable result generalizability will enable stronger conclusions to be made on this aspect.

For RQ3, a threat to validity is that only the ranking differences for five apps were computed. This was due to the limitations of reverse engineering tools and the use of obfuscation techniques in the apps. However, these five apps cover different app categories, and it is likely that the results will be similar when considering a larger set of apps containing ads.

5 RELATED WORK

This paper extends prior work [32] by the authors in the following ways: (1) dLens was enhanced to be able to exclude dynamic content or content that should not be considered in the identification of the DEHs. For advertisements, a fully automated mechanism was developed for identifying and excluding the portion of the screen they occupy. This mechanism also allows developers to manually specify other areas to exclude from the analysis. (2) The size of the evaluation was expanded by adding five additional apps and adding an additional mobile phone platform, the Samsung Galaxy S5, to evaluate the approach. Overall, these extensions improve the accuracy of the technique and the generalizability of the results.

The approach to building the DEP is based on research work performed by Dong and colleagues [22,30]. In their work they constructed a power model for a commercial QVGA OLED display module. In their power model, they demonstrated the linear relationship between power consumption and sRGB value and achieved 90% accuracy in the display power estimation. Other power models for μOLED screens have also been proposed and could also be used to implement the DEP. Zhang [11] built a quadratic model for OLED screens, which increased estimation accuracy. Kim and colleagues [33] modified Dong and colleagues' model by considering the brightness and sum of RGB values for AMOLED screens. Mittal and colleagues [34] refined a power model based on threshold of RGB component values. However, these works only focused on developing new techniques for modeling the power consumption of μOLED screens and did not detect DEHs or give optimization guidance to developers.

Many approaches have been proposed to help developers reduce the energy consumption of UIs by manipulating their colors. The dLens approach builds on prior work in the area, Nyx [6,19,35], to conduct step 3 of the approach. Linares-Vásquez and colleagues [20] showed that the same problem could be solved using a multi-objective genetic algorithm-based approach. Prior to Nyx, Dong and colleagues [22,36] proposed a method to generate CTSs and then later used this to build a color transforming browser that could reduce the energy of the web pages it displayed [9,10]. On the basis of a screen space variant energy model, Chuang and colleagues [37] presented an approach to generate energy-efficient color designs by using iso-lightness colors. Wang and colleagues [38] proposed a technique to find an energy-saving color scheme for sequential data on OLED screens. Kamijoh and colleagues reduced the display energy of the IBM Wristwatch by reducing the number of white pixels [39]. All of these techniques assume that a DEH has been located. As such, these techniques can be seen as complementary to the approach described in this paper. Once a DEH has been identified by dLens, these techniques can help guide developers to choose new color schemes that can reduce energy while maintaining the aesthetics of the app's UI.

Other proposed ways to save display energy include darkening the unimportant parts of a screen. Iyer and colleagues [40] proposed a method to reduce the energy consumption of OLED screens by darkening the user-unfocused areas. Wee and colleagues [41] designed an approach to reduce the power consumption of gaming on OLED displays by dimming noninteresting parts. Tan and colleagues [42] also proposed a tool called Focus to reduce OLED display power by dimming less important regions. Chen and colleagues [43] reduced OLED display power by using a dimming scheme to eliminate undesired details. Lin and colleagues [44] provided an OLED power saving technique by distorting image regions according to visual attention level. For LCD displays, energy optimization is mainly achieved by reducing external brightness [45,46] or refresh rates [47] and adapting the content to the brightness change [48].

Sampson and colleagues [49] developed a tool called WebChar to evaluate browser performance and the energy consumption of different code features in HTML and CSS, which guides developers to optimize browsers or web applications. This work focused on the impact of HTML and CSS code structure on energy consumption, whereas dLens focuses on UI color design.

Many approaches have also been proposed to model the power consumption of other components in mobile devices (ie, nondisplay components). Shye and colleagues [50] obtained power models for all components of an Android G1 based on measurements of a logger they developed. Negri and colleagues [51] treated applications as finite-state machines (FSMs) and built power models through measurement of selected states. Several approaches [11,52,53] acquired power models automatically based on the battery behavior of mobile devices. Other techniques estimate energy consumption based on different models. Previous work [14,15,54] estimated energy consumption at the source line level. Tiwari and colleagues [55,56] modeled the CPU energy of hardware instructions. Eprof [57] modeled energy with a state machine. Wang and colleagues [58] estimated the power consumption of mobile applications with profile-based battery traces. Li and colleagues [59] proposed Bugu, an application level power profiler and analyzer for mobile phones. Tsao and colleagues [60] estimated the energy consumption of I/O requests in application processes. Xiao and colleagues [61] proposed a methodology to build system-level power models for different components of mobile devices without measurements.

In addition, many other researchers have empirically investigated the energy consumption of Android apps [4], and the energy impacts of mobile advertisement [62,63], software changes [64,65,66,67], user choice of applications [68], blocking advertisements [69], refactoring [70,71], obfuscation [72,73], and test suite selection [74]. Other researchers have proposed new techniques to detect different kinds of energy bugs. Pathak and colleagues defined energy bugs [75] and proposed an automatic technique to detect energy bugs on smartphones [7]. However, they only focused on identifying "wakelock bugs." Linares-Vásquez [76] studied energy greedy API usage in Android apps. Liu and colleagues [77] proposed an automated approach to detect energy bugs that did not deactivate sensors and misused sensor data. Other researchers have focused on optimization of mobile apps. Kwon and colleagues [78] optimized the energy consumption by offloading partial app functionality to the cloud. Li and colleagues [79,80] used a proxy server to optimize the HTTP requests of mobile apps. For Green Mining, Romansky and colleagues [81] introduced a search-based approximation method to accelerate harvesting energy profiles while minimizing the accuracy loss. In addition, Hindle and colleagues [82] proposed a dedicated hardware Mining Software Repositories (MSR)-based test harness called Green Miner to facilitate the research of Green Mining.

6 CONCLUSIONS

This paper presented a new technique for detecting DEHs in mobile apps. A DEH is defined as a UI of a mobile app whose energy consumption is higher than that of an energy-optimized but functionally equivalent UI. The approach detects DEHs with five steps. First, the approach processes the target app and instruments mobile ads so that their location can be identified at runtime. Second, the approach executes a developer specified workload for the target app on a mobile device and captures screenshots of the display and ad related information. Third, the approach generates a CTS for each captured screenshot and produces an approximation of an energy-efficient version. Fourth, it analyzes each screenshot and its alternative version to determine their expected display energy. Finally, the approach calculates the

International Conference on Software Engineering, ICSE 2014, ACM, New York, NY, USA, 2014, pp. 527–538.
7. A. Pathak et al., What is keeping my phone awake? Characterizing and detecting no-sleep energy bugs in smartphone apps, Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys '12, ACM, New York, NY, USA, 2012, pp. 267–280.
8. A. Barredo, A comprehensive look at smartphone screen size statistics and trends. Medium.
9. M. Dong and L. Zhong, Chameleon: a color-adaptive web browser for mobile OLED displays, Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys '11, ACM, New York, NY, USA, 2011, pp. 85–98.
10. M. Dong and L. Zhong, Chameleon: a color-adaptive web browser for mobile OLED displays, IEEE Trans. Mobile Comput. 11 (2012), no. 5, 724–738.
11. L. Zhang, Power, performance modeling and optimization for mobile system and applications, Ph.D. Thesis, 2013.
12. C. Sahin et al., Initial explorations on design pattern energy usage, Proceedings of the First International Workshop on Green and Sustainable Software, GREENS '12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 55–61.
13. I. Manotas, L. Pollock, and J. Clause, SEEDS: A software engineer's energy-optimization decision support framework, Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, ACM, New York, NY, USA, 2014, pp. 503–514.
14. D. Li et al., Calculating source line level energy information for android applications, Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, ACM, New York, NY, USA, 2013, pp. 78–89.
15. S.
Hao et al., Estimating mobile application energy consumption using power and energy difference between the two versions of screenshots program analysis, Proceedings of the 2013 International Conference on and provides developers with a list of DEHs. The approach was imple- Software Engineering, ICSE ’13, IEEE Press, Piscataway, NJ, USA, 2013, pp. 92–101. mented in a prototype tool, dLens, and evaluated on real-world popular 16. I. J. M. Ruiz et al., Impact of Ad libraries on ratings of android mobile apps, mobile apps. The results of the evaluation showed that the approach IEEE Software 31 (2014), no. 6, 86–92. was able to accurately estimate display power consumption and rank 17. S. Hao et al., PUMA: Programmable UI-automation for large-scale screenshots with DEHs. Furthermore, the technique was able to detect dynamic analysis of mobile apps, Proceedings of the 12th Annual Interna- DEHs in 398 Android market apps. Overall, the results are very promis- tional Conference on Mobile Systems, Applications, and Services, MobiSys ’14, ACM, New York,NY,USA, 2014, pp. 204–217. ing and indicate that the technique is a viable and potentially impactful 18. L. Gomez et al., RERAN: Timing- and touch-sensitive record and replay technique for helping developers to reduce the energy consumption of for android, Proceedings of the 2013 International Conference on Software their mobile apps. Engineering, ICSE ’13, IEEE Press, Piscataway,NJ, USA, 2013, pp. 72–81. 19. D. Li, A. H. Tran, and W. G. J. Halfond, Nyx: A Display Energy Optimizer for Mobile Web Apps, Proceedings of the 2015 10th Joint Meeting on ACKNOWLEDGEMENTS Foundations of Software Engineering, ESEC/FSE 2015, ACM, New York, This work was supported, in part, by the National Science Foundation NY,USA, 2015, pp. 958–961. under grant numbers CCF-1321141 and CCF-1619455. 20. M. 
Linares-Vásquez et al., Optimizing energy consumption of GUIs in android apps: A multi-objective approach, Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, ACM, New York,NY,USA, 2015, pp. 143–154. REFERENCES 21. Recognized color keyword names. http://www.w3.org/TR/SVG/types. 1. Inc A., Press Releases. http://www.apple.com/pr/library. html. 2. C. Warren, Hits 1 Million Apps. Mashable. Retrieved, 4 June 22. M. Dong and L. Zhong, Power modeling and optimization for OLED dis- 2014. plays, IEEE Trans. Mobile Comput. 11 (2012), no. 9, 1587–1599. 3. T.Ahonen, Lets Do 2014 Numbers for the Mobile Industry: Now we are at 23. Inc. Monsoon Solutions, Power Monitor. https://www.msoon.com/ 100% Mobile Subscription Penetration Rate Per Capita Globally, Retrieved LabEquipment/PowerMonitor/. May 12, 2014. 24. DEP coefficients. https://sites.google.com/site/dlensproject/ 4. D. Li et al., An empirical study of the energy consumption of android dep-coefficients. applications, Software Maintenance and Evolution (ICSME), 2014 25. A tool for reverse engineering Android apk files. https://ibotpeaches. IEEE International Conference on, 2014, Washington, DC, USA, pp. github.io/Apktool/. 121–130. 26. dex2jar. https://github.com/pxb1988/dex2jar. 5. D. Li and W.G. J. Halfond, An investigation into energy-saving program- ming practices for android smartphone app development, Proceedings 27. ASM. http://asm.ow2.org/. of the 3rd International Workshop on Green and Sustainable Software, 28. Android Screenshots and Screen Capture. http://sourceforge.net/ GREENS 2014, ACM, New York,NY,USA, 2014, pp. 46–53. projects/ashot/. 6. D. Li, A. H. Tran, and W. G. J. Halfond, Making web applications 29. G. Inc., Android Debug Bridge. http://developer.android.com/tools/help/ more energy efficient for OLED smartphones, Proceedings of the 36th adb.html. 14 of 15 WAN ET AL.

30. M. Dong, Y-SK Choi, and L. Zhong, Power modeling of graphical user interfaces on OLED displays, Proceedings of the 46th Annual Design Automation Conference, DAC ’09, ACM, New York, NY, USA, 2009, pp. 652–657.
31. dLens. https://sites.google.com/site/dlensproject/.
32. M. Wan et al., Detecting display energy hotspots in android apps, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), Washington, DC, USA, 2015, pp. 1–10.
33. D. Kim, W. Jung, and H. Cha, Runtime power estimation of mobile AMOLED displays, Design, Automation Test in Europe Conference Exhibition (DATE), 2013, Washington, DC, USA, 2013, pp. 61–64.
34. R. Mittal, A. Kansal, and R. Chandra, Empowering developers to estimate app energy consumption, Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Mobicom ’12, ACM, New York, NY, USA, 2012, pp. 317–328.
35. D. Li, A. H. Tran, and W. G. J. Halfond, Optimizing display energy consumption for hybrid android apps (invited talk), Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, ACM, New York, NY, USA, 2015, pp. 35–36.
36. M. Dong, Y-SK Choi, and L. Zhong, Power-saving color transformation of mobile graphical user interfaces on OLED-based displays, Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED ’09, ACM, New York, NY, USA, 2009, pp. 339–342.
37. J. Chuang, D. Weiskopf, and T. Möller, Energy aware color sets, Comput. Graphics Forum 28 (2009), no. 2, 203–211.
38. J. Wang, X. Lin, and C. North, GreenVis: energy-saving color schemes for sequential data visualization on OLED displays, 2012.
39. N. Kamijoh et al., Energy trade-offs in the IBM wristwatch computer, Proceedings of the 5th IEEE International Symposium on Wearable Computers, ISWC ’01, IEEE Computer Society, Washington, DC, USA, 2001, pp. 133–140.
40. S. Iyer et al., Energy-adaptive display system designs for future mobile environments, Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys ’03, ACM, New York, NY, USA, 2003, pp. 245–258.
41. T. K. Wee and R. K. Balan, Adaptive display power management for OLED displays, Proceedings of the First ACM International Workshop on Mobile Gaming, MobileGames ’12, ACM, New York, NY, USA, 2012, pp. 25–30.
42. K. W. Tan et al., FOCUS: A usable & effective approach to OLED display power management, Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’13, ACM, New York, NY, USA, 2013, pp. 573–582.
43. H. Chen et al., An image-space energy-saving visualization scheme for OLED displays, Comput. Graphics (2014).
44. C. H. Lin, C-K Kang, and P. C. Hsiu, Catch your attention: Quality-retaining power saving on mobile OLED displays, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), New York, NY, USA, 2014, pp. 1–6.
45. H. Shim, N. Chang, and M. Pedram, A backlight power management framework for battery-operated multimedia systems, IEEE Design Test of Comput. 21 (2004), no. 5, 388–396.
46. A. Iranli and M. Pedram, DTM: Dynamic tone mapping for backlight scaling, Proceedings. 42nd Design Automation Conference, 2005, New York, NY, USA, pp. 612–616.
47. A. K. Bhowmik and R. J. Brennan, System-level display power reduction technologies for portable computing and communications devices, Portable Information Devices, 2007. Portable07. IEEE International Conference on, Washington, DC, USA, 2007, pp. 1–5.
48. A. Iranli, H. Fatemi, and M. Pedram, HEBS: Histogram equalization for backlight scaling, Proceedings of the Conference on Design, Automation and Test in Europe - Volume 1, DATE ’05, IEEE Computer Society, Washington, DC, USA, 2005, pp. 346–351.
49. A. Sampson et al., Automatic discovery of performance and energy pitfalls in HTML and CSS, Workload Characterization (IISWC), 2012 IEEE International Symposium on, Washington, DC, USA, 2012, pp. 82–83.
50. A. Shye, B. Scholbrock, and G. Memik, Into the wild: Studying real user activity patterns to guide power optimizations for mobile architectures, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, ACM, New York, NY, USA, 2009, pp. 168–178.
51. L. Negri, D. Barretta, and W. Fornaciari, Application-level power management in pervasive computing systems: A case study, Proceedings of the 1st Conference on Computing Frontiers, CF ’04, ACM, New York, NY, USA, 2004, pp. 78–88.
52. L. Zhang et al., Accurate online power estimation and automatic battery behavior based power model generation for smartphones, Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, 2010, New York, NY, USA, pp. 105–114.
53. M. Dong and L. Zhong, Self-constructive high-rate system energy modeling for battery-powered mobile systems, Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys ’11, ACM, New York, NY, USA, 2011, pp. 335–348.
54. S. Hao et al., Estimating android applications’ CPU energy usage via bytecode profiling, Proceedings of the First International Workshop on Green and Sustainable Software, GREENS ’12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 1–7.
55. V. Tiwari, S. Malik, and A. Wolfe, Power analysis of embedded software: A first step towards software power minimization (1994), 384–390.
56. V. Tiwari et al., Instruction level power analysis and optimization of software, VLSI Design, 1996. Proceedings., Ninth International Conference on, Washington, DC, USA, 1996, pp. 326–328.
57. A. Pathak, Y. C. Hu, and M. Zhang, Where is the energy spent inside my app?: fine grained energy accounting on smartphones with Eprof, Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys ’12, ACM, New York, NY, USA, 2012, pp. 29–42.
58. C. Wang et al., Power estimation for mobile applications with profile-driven battery traces, Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on, Washington, DC, USA, 2013, pp. 120–125.
59. Y. Li, H. Chen, and W. Shi, Power behavior analysis of mobile applications using bugu, Sustainable Comput. Inf. Syst. (2014).
60. S-L Tsao, C-K Yu, and Y-H Chang, Architecture of Computing Systems – ARCS 2013: 26th International Conference, Prague, Czech Republic, February 19-22, 2013. Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 195–206.
61. Y. Xiao et al., A system-level model for runtime power estimation on mobile devices, Green Computing and Communications (GREENCOM), 2010 IEEE/ACM Int’l Conference on Int’l Conference on Cyber, Physical and Social Computing (CPSCom), Washington, DC, USA, 2010, pp. 27–34.
62. J. Gui et al., Truth in advertising: the hidden cost of mobile Ads for software developers, Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE ’15, IEEE Press, Piscataway, NJ, USA, 2015, pp. 100–110.
63. J. Gui et al., Lightweight measurement and estimation of mobile Ad energy consumption, Proceedings of the 5th International Workshop on Green and Sustainable Software, GREENS ’16, ACM, New York, NY, USA, 2016, pp. 1–7.
64. A. Hindle, Green mining: Investigating power consumption across versions, 2012 34th International Conference on Software Engineering (ICSE), Washington, DC, USA, 2012, pp. 1301–1304.
65. A. Hindle, Green mining: A methodology of relating software change to power consumption, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), Washington, DC, USA, 2012, pp. 78–87.
66. K. Aggarwal et al., The power of system call traces: Predicting the software energy consumption impact of changes, Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, CASCON ’14, IBM Corp., Riverton, NJ, USA, 2014, pp. 219–233.

67. A. Hindle, Green Mining: a Methodology of Relating Software Change and Configuration to Power Consumption, Empirical Software Eng. 20 (2015), no. 2, 374–409.
68. C. Zhang, A. Hindle, and D. M. German, The impact of user choice on energy consumption, IEEE Software 31 (2014), no. 3, 69–75.
69. K. Rasmussen, A. Wilson, and A. Hindle, Green mining: Energy consumption of advertisement blocking methods, Proceedings of the 3rd International Workshop on Green and Sustainable Software, GREENS 2014, ACM, New York, NY, USA, 2014, pp. 38–45.
70. C. Sahin, L. Pollock, and J. Clause, How do code refactorings affect energy usage?, Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’14, ACM, New York, NY, USA, 2014, pp. 36:1–36:10.
71. W. G. da Silva et al., Evaluation of the impact of code refactoring on embedded software efficiency, Proceedings of the 1st Workshop de Sistemas Embarcados, Porto Alegre, Brazil, 2010, pp. 145–150.
72. C. Sahin et al., How Does Code Obfuscation Impact Energy Usage? Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, ICSME ’14, IEEE Computer Society, Washington, DC, USA, 2014, pp. 131–140.
73. C. Sahin et al., How does code obfuscation impact energy usage? J. Software: Evol. Process (2016).
74. D. Li et al., Integrated energy-directed test suite optimization, Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, ACM, New York, NY, USA, 2014, pp. 339–350.
75. A. Pathak, Y. C. Hu, and M. Zhang, Bootstrapping energy debugging on smartphones: A first look at energy bugs in mobile devices, Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X, ACM, New York, NY, USA, 2011, pp. 5:1–5:6.
76. M. Linares-Vásquez et al., Mining energy-greedy API usage patterns in android apps: An empirical study, Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, ACM, New York, NY, USA, 2014, pp. 2–11.
77. Y. Liu et al., GreenDroid: automated diagnosis of energy inefficiency for smartphone applications, IEEE Trans. Software Eng. 40 (2014), no. 9, 911–940.
78. Y-W Kwon and E. Tilevich, Reducing the energy consumption of mobile applications behind the scenes, Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 170–179.
79. D. Li and W. G. J. Halfond, Optimizing energy of HTTP requests in android applications, Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, ACM, New York, NY, USA, 2015, pp. 25–28.
80. D. Li et al., Automated energy optimization of HTTP requests for mobile applications, Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, ACM, New York, NY, USA, 2016, pp. 249–260.
81. S. Romansky and A. Hindle, On improving green mining for energy-aware software analysis, Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, CASCON ’14, IBM Corp., Riverton, NJ, USA, 2014, pp. 234–245.
82. A. Hindle et al., GreenMiner: A hardware based mining software repositories software energy consumption framework, Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, ACM, New York, NY, USA, 2014, pp. 12–21.

How to cite this article: Wan M, Jin Y, Li D, Gui J, Mahajan S, Halfond WGJ. Detecting display energy hotspots in Android apps. Softw Test Verif Reliab. 2017;27:e1635. https://doi.org/10.1002/stvr.1635