
App Parameter Energy Profiling: Optimizing App Energy Drain by Finding Tunable App Parameters

Qiang Xu (Purdue University), Y. Charlie Hu (Purdue University), Abhilash Jindal (Mobile Enerlytics, LLC)

arXiv:2009.12156v1 [cs.SE] 22 Sep 2020

ABSTRACT
In this paper, we observe that modern mobile apps come with a large number of parameters that control the app behavior and indirectly affect the app energy drain, and that using incorrect or non-optimal values for such app parameters can lead to app energy deficiency or even energy bugs. We argue that conventional app energy optimization using an energy profiler, which pinpoints energy hotspot code segments in the app source code, may be ineffective in detecting such parameter-induced app energy deficiency. We propose app parameter energy profiling, which identifies tunable app parameters that can reduce app energy drain without affecting app functions, as a potentially more effective solution for debugging such app energy deficiency. We present the design and implementation of Medusa, an app parameter energy profiling framework. Medusa overcomes three key design challenges: how to filter out and narrow down candidate parameters, how to pick alternative parameter values, and how to perform reliable energy drain testing of app versions with mutated parameter values. We demonstrate the effectiveness of Medusa by applying it to a set of Android apps, for which it successfully identifies tunable energy-reducing parameters.

1 INTRODUCTION
Most mobile apps come with a large number of parameters that control various aspects of the apps. For example, apps that use the GPS service have parameters controlling the location update frequency, apps that offer search functionalities have parameters controlling the number of search suggestions, and all apps containing images have some parameters controlling the image caching policy.

In addition to controlling apps' functionalities, app parameters can also affect the energy consumption of the apps. In particular, incorrectly configured parameters can cause unnecessarily high energy drain of an app. For example, an app may render the map unnecessarily frequently and induce a high energy drain when the location is updated too frequently. Hence understanding the energy impact of app parameters is important to optimizing app energy drain.

The conventional development cycle for optimizing the energy drain of mobile apps is similar to that for optimizing the running time of traditional software -- iterating the process of (1) finding code segments in the app source code that contribute a significant portion of the total app energy drain using an energy profiler (e.g., [21, 23]), and then (2) determining whether and how the energy hotspots can be restructured to drain less energy.

We observe that this conventional profiling-based energy optimization process can be ineffective in detecting app energy bugs or deficiency due to poorly configured parameters. This is because a conventional energy profiler can only pinpoint the energy hotspots in the app source code, but the energy hotspots are usually controlled by the parameters indirectly, e.g., causally data-dependent or control-dependent on the parameters in the complex app source code.

In this paper, we argue that the huge number of parameters in an app provides a potential opportunity to attack the type of energy inefficiency that can be hard to detect with conventional energy profilers yet can be more easily addressed by simply tuning relevant parameters. In particular, we propose app parameter energy profiling, which profiles the energy impact of an app parameter, as a potentially effective technique to optimize app energy drain. The effectiveness of such an approach rests on the answers to two fundamental questions:
(1) Are parameter-induced energy inefficiencies common among apps?
(2) Can developers routinely reduce energy drain by tuning parameters in their apps?
To the best of our knowledge, this is the first effort to study the energy impact of app parameters.

The prerequisite to such an approach, however, is a tool that can accurately identify app parameters whose values can significantly affect the app energy drain.

A principled way of developing such a tool is via static analysis and runtime energy drain measurement. Such an approach has two steps. First, at compile time, we can perform static analysis to extract the code segments that are data-dependent or control-dependent on each parameter and instrument such code segments to be surrounded by energy measurement start and stop API calls. We then run the instrumented app on an actual phone to measure the energy drain of the parameter-dependent code segments.

However, such an approach faces a number of challenges. First, the dependencies between an app parameter and the relevant code segments are often hard to track, or require a number of ad hoc customizations to achieve good coverage. For example, we observed that many dependencies are across programming language boundaries, or involve communication with other processes or even remote servers. Second, it is also hard to determine the right granularity of the code segments for parameter dependence analysis. Dependence analysis at the level of branches has the advantage that it is easier to analyze the effect of different parameter values that control branch conditions. However, we observed that many parameters affect app energy consumption in ways other than controlling branch conditions. For example, app energy consumption can be affected by controlling the timer duration or thread count. Alternatively, we can perform the analysis at the granularity of methods by tracking the dependency between parameters and method call arguments. However, this approach has the disadvantage that the relation between the parameter value and the method energy consumption, which depends on many other factors, is often opaque, if there is any relation at all.

In this paper, we propose a conceptually straightforward approach to developing such a parameter energy profiling tool. In particular, we present a framework that automatically identifies app parameters whose value, if changed, can have a significant impact on the app energy drain. We denote such app parameters as valid tunable parameters, or valid parameters for short.

Developing such a framework for Android apps faces several challenges. First, in contrast with conventional server software (e.g., databases and web servers), which explicitly exposes its configuration parameters in a central place [34], parameters in Android apps are scattered all around the source code.[1] To this end, our framework starts with all the constants in an app in order to have high coverage.

Second, covering all possible constants in the app code poses another challenge of having too many candidate parameters, making it intractable to measure the energy impact of all of them. To overcome this challenge, we develop a combination of filtering techniques to significantly reduce the number of candidate parameters, by exploiting a key observation that most of the candidates either are not tunable or do not affect the app energy drain.

Finally, for each remaining candidate parameter, the framework needs to automatically mutate its default value up and down and measure the energy drain change compared to the app energy drain under the default parameter. Reliably identifying app energy drain changes due to parameter value changes, with minimal false positives and false negatives, while faithfully reflecting the effect of the parameters, is challenging, as app energy drain can be perturbed by many factors including variations in UI interactions, test data (e.g., videos being watched), initial data download, server state, and background system activities. We overcome this challenge by using a suite of techniques, including power model-based energy measurement, back-to-back interleaved unmodified and modified app test runs, and hypothesis testing on the measured energy drain from multiple runs.

We have developed a prototype implementation of the Medusa framework on Pixel 2 and applied it to 5 popular apps from Google Play. The framework automatically narrows down the total number of candidate parameters from 19761 to 1078, out of which its automated energy testing identifies 3 energy-reducing parameters. Our manual inspection of the three parameters shows that one of them is indeed a valid tunable parameter, while the other two are false positives as they fail various components of the apps.

In summary, our study makes the following contributions:
* We propose app parameter energy profiling, i.e., automatically identifying and tuning tunable, energy-reducing parameters, as a promising new direction for app energy optimization.
* We present the design and prototyping of Medusa, an app parameter profiling framework that overcomes a number of key challenges in automatic filtering of candidate parameters and automated energy drain testing.
* We present an evaluation of Medusa on a set of 5 popular apps to validate its effectiveness.

To our best knowledge, Medusa is the first app parameter energy profiling framework, and we plan to open source it to help foster further research on this important app energy optimization approach.

[1] We speculate that this may be due to the fact that the target users for server software are specialized operators, while apps are mostly designed for users who don't have much technical exposure.

2 TERMINOLOGIES
Although we consider all constants in an app, only a portion of them are "real" parameters that are actually tunable. For example, values representing failure types should not be changed arbitrarily, as otherwise the app may report the wrong kind of failure. On the other hand, some parameters may affect energy consumption, while others do not. For example, changing the log level of a logger is unlikely to cause a noticeable energy difference. To make the rest of the discussion clearer, in this section we define the different types of parameters that concisely embody the discussion above.

We define a parameter to be a constant in app source code that can be changed by app developers. This includes all numeric and boolean constants, enum references, as well as values in the XML resource files (Figure 1). This simple yet broad definition of parameter encompasses a substantial range of values that are statically tunable by an app developer.

    Bitmap.createBitmap(
        320, 240,
        Bitmap.Config.ARGB_8888);
(a) Two numeric parameters and one enum parameter.
(b) A parameter in XML resource files. [XML snippet not recoverable from the extraction.]
Figure 1: Parameter examples.

Table 1: Parameter syntactic categories.
Category                      Example
Initialization & assignment   int PARAM = 25
Method call argument          foo(320, 240)
Condition                     if (a.size() > 0)
Array index                   a[0]
Return value                  return 0

Next we introduce a few more definitions that cover specific subsets of parameters.

A tunable parameter is a parameter that, in addition, does not affect app functionality when set to some other values. All of the app's components should still function properly, with minimal impact on user experience. This can typically be determined by examining the corresponding source code and understanding the functional effect of the parameter. Parameters that only function properly within a limited range of values are also considered tunable.

An energy-reducing parameter is a parameter that can reduce app energy consumption when set to certain values. Note that an energy-reducing parameter may not be tunable. For example, changing a parameter may cause an app component that was originally energy heavy to crash and result in reduced energy drain.

Finally, a valid parameter is the intersection of the last two types, i.e., a parameter that is both tunable and can reduce app energy drain. This is the exact type of parameter that we are looking for.

3 METHODOLOGY
3.1 Overview
To validate whether there are configuration parameters in an app that reduce app energy consumption, we devise a framework that mutates every potential parameter and checks if the change can result in energy reduction. The mutate-and-test process goes through several steps, as shown in Figure 2.

[Figure 2: Workflow of the proposed mutate-and-test process. Parameters -> Preliminary Filtering -> Filtered Parameters (Parameter 1..n) -> Parameter Mutation -> Mutated App 1..n -> Automated Test -> Result 1..n -> Energy-Reducing Parameters -> Manual Validation -> Valid Parameters.]

3.2 Preliminary Filtering
Given the huge number of parameters each app has (an app typically has more than a thousand parameters), testing each and every parameter is prohibitively time consuming. To keep the test time manageable, we would like to first eliminate the parameters that are unlikely to be valid. We employ two effective filtering techniques that are based on coverage and heuristic rules, respectively.

3.2.1 Parameter Coverage. We write test scripts to exercise certain major features of the app. As not all features are exercised, and much code is only designed for certain environments, it is expected that only a fraction of the entire code base is executed. The parameters that reside in code that is never executed will certainly not cause any difference when their values are changed.

Based on this insight, we use existing coverage analysis tools to track code execution at line granularity. If a parameter appears in a line that is executed during the test, we pass this parameter to future filtering steps; otherwise we do not consider this parameter anymore.

However, class fields and local variables are initialized as long as the corresponding class/method is ever used, and thus the parameters used for field or variable initialization will be marked as used, even if the field/local variable is never used after initialization. For example, in Figure 3, the field PARAM is initialized as long as the class Foo is ever used, regardless of whether the method bar() is ever called or not (assuming bar() is the only place where PARAM is used).

To this end, we derive the coverage information for fields and local variables in a different way: the field or variable is considered covered if and only if any line that uses this field or variable is executed. Going back to our example in Figure 3, line 2 is covered if and only if line 5 is ever executed.

Note that even though coverage analysis slows down the app and negatively affects energy consumption, the coverage analysis only needs to be run once, and the results can be reused for all subsequent runs, as our test script repeats exactly the same operations every time, and the execution paths are also mostly similar.
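As a concrete illustration of this coverage rule, the following is a minimal sketch (not the actual Medusa code) of how covered fields can be derived from line-coverage data produced by a JaCoCo-like tool; the data structures and the usesOf map are hypothetical placeholders, with use sites assumed to come from a separate static analysis.

    import java.util.Map;
    import java.util.Set;

    class FieldCoverage {
        /**
         * A field counts as covered iff at least one line that *uses* it
         * was executed, regardless of whether its initializer line ran.
         *
         * @param executedLines per-file set of executed line numbers
         *                      (e.g., exported from a line-coverage report)
         * @param usesOf        per-field set of (file, line) use sites
         */
        static boolean isCovered(String field,
                                 Map<String, Set<Integer>> executedLines,
                                 Map<String, Set<UseSite>> usesOf) {
            for (UseSite use : usesOf.getOrDefault(field, Set.of())) {
                Set<Integer> lines = executedLines.get(use.file());
                if (lines != null && lines.contains(use.line())) {
                    return true;  // some line using the field was executed
                }
            }
            return false;  // at most the initializer ran: not covered
        }

        record UseSite(String file, int line) {}
    }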

3.2.2 Parameter Syntactic Categorization. To systematically eliminate unwanted parameters, we first partition the parameters into 5 different categories, based on the code construct the parameter appears in, as shown in Table 1. There are other code constructs that are used less frequently, and they are merged into the major categories. For example, constructor call arguments are also considered as method call arguments, while the literals appearing in switch cases are considered as conditions.

We manually examined all the covered numeric parameters of AntennaPod, an open source podcast app, partitioned them into the 5 categories, and labeled each of them as tunable or non-tunable, as shown in Table 2. This categorization exhaustively partitions all the parameters, and we observed that most tunable parameters are from the first two categories.

Table 2: Categorization of AntennaPod numeric parameters.
Category                      # parameters   # tunable parameters
Initialization & assignment   140            19
Method call argument          134            17
Condition                     86             1
Array index                   58             0
Return value                  23             0

With this categorization, we can ignore all parameters in the last 3 categories, significantly reducing the number of parameters to be tested, while only losing a minimal chance of discovering valid parameters from those categories. Note that as the categorization only relies on programming constructs that can be easily recognized, the categorization is fully automated.

1 class Foo {
2   static final int PARAM = 20;
3   void bar() {
4     for (int i = 0; i < n; i++)
5       myTask(PARAM);
6   }
7 }
Figure 3: Coverage example.

3.2.3 Heuristic Rules. If we look again at Table 2 and compare the total number of covered numeric parameters against the number of hand-labeled tunable parameters, most parameters in the first two categories are still non-tunable. Since manually labeling parameters as tunable is labor intensive, we would like to perform further filtering of non-tunable parameters in the first two categories in an automatic manner. To this end, we propose to mine the patterns shared by the non-tunable parameters. If a number of the non-tunable parameters share a same pattern, we can use this pattern to filter them out as non-tunable. For example, the variable initialization int i = 0 inside a for loop (line 4 of Figure 3) is typically not tunable, and we can extract this pattern and filter out all initializations of this type.

We systematically mine the heuristic rules by going through the following steps:
(1) Examine the few files with the most candidate parameters,
(2) Manually classify the parameters in those files as tunable or non-tunable,
(3) Identify the patterns (if any) that are shared by a subset of non-tunable parameters,
(4) Add the identified patterns to the rule set,
(5) Remove the matched parameters from the candidate parameter set and repeat the process above.

As the patterns are more pertinent to the programming language being used, rather than individual developer habits, the heuristic rules can be shared across apps, and we only need to invest this initial effort once.

By going through the process for a few apps, we derived the list of heuristic rules shown in Table 3. Applying these filtering rules, we can further automatically reduce the candidate parameters of AntennaPod from 274 to 174.

Table 3: Heuristic rules.
Param type   Rule                          Example
Num          Comparison with 0 or 1       a.size() > 0
Num          Plus 1 or 2 or minus 1       a.length() - 1
Num          For loop initialization      for (int i = 0; ...)
Num          Ignored methods              s.substring(0, 4)
Bool         One-argument method call     item.setVisible(true)
Enum         Time unit                    convert(5, TimeUnit.DAYS)
All          Multiple writes to variable  Figure 4

Table 5: Parameter semantic categories. class Foo { int counter = 0; Category Example void count() { counter++; } void reset() { counter = 0; } Color FG COLOR = 0xFFFAFAFA } Count DEFAULT HISTORY MAX ITEMS = 10 Layout setMargins(12, 3, 3, 3) Storage BUFFER SIZE = 8 * 1024 Figure 4: An example for the last heuristic rule. ‡e Œread SCHED EX POOL SIZE = 1 variable counter is updated at multiple places, thus Time CONTROL SOCKET TIMEOUT = 60000 the two numeric parameters with value 0 are not con- Uncategorized E.g., volume, vibration frequency sidered tunable.

For an integer parameter with original value x greater than 1, we choose x * 8 and max(x / 8, 1) as the two new values to be tried, complying with the following guidelines:
* Choose values from different sides, as parameters from most categories are expected to be monotonic in terms of energy effects;
* A factor of 8 should be sufficient to trigger most energy effects;
* Avoid 0, which may be illegal for some parameters.
However, this scheme does not work for colors, as many colors are represented as 8-digit hex numbers and are not monotonic. But as only a small fraction of parameters with original values greater than 1 are colors, this inaccuracy is still acceptable.

For integer parameters with original value 1, we still follow the same guidelines. While we choose 8 as one of the new values, 0 is the only value on the other side, although it is illegal for 5 of the 11 parameters.

Parameters with original value 0 are versatile in terms of their semantics. Thus, we select values for each of the semantic categories and try all of them, so that for any parameter with original value 0, there is at least one new value that matches the parameter category. In the end we choose three new values: 0xffffffff for colors in the 8-digit hex number format, 255 for colors that represent one color component, and 8 for time in days.

For float parameters, regardless of their categories, we observed that if the original value is between 0 and 1 (including 0), it is likely that the valid value range is between 0 and 1. Thus, following the same guideline as for integer parameters, we choose 1 - (1 - x)/8 and x/8 for a float parameter with original value x satisfying 0 < x < 1, making sure that the new values still lie in the same range. For 0, we choose 0.5 and 1.0. We do not have such restrictions for float parameter values greater than or equal to 1, thus we choose x * 8 and x / 8, following the same guidelines as for integer parameters.

We summarize our original-value-to-new-value mapping in Table 6.

Table 6: New values for numeric parameters based on the original value.
Original value          New values
x (> 1, int)            x * 8, max(x / 8, 1)
1                       8, 0
0                       0xffffffff, 255, 8
x (>= 1, float)         x * 8, x / 8
x (0 < x < 1, float)    1 - (1 - x)/8, x/8
0.0                     0.5, 1.0

As we do not have similar insights for enum parameters, we randomly select at most 3 values that are different from the original value.
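The mapping in Table 6 is mechanical enough to state as code. The following is a minimal sketch of the selection logic for the integer cases (the float and enum cases follow the same shape); the method name is ours, not Medusa's.

    import java.util.List;

    class NewValuePicker {
        /** Candidate replacement values for an integer parameter (Table 6). */
        static List<Long> forInt(long original) {
            if (original > 1) {
                // Mutate from both sides by a factor of 8; never go below 1.
                return List.of(original * 8, Math.max(original / 8, 1));
            } else if (original == 1) {
                // 0 is the only value on the "smaller" side.
                return List.of(8L, 0L);
            } else {
                // Value 0: one representative value per semantic category:
                // 8-digit hex color, single color component, time in days.
                return List.of(0xffffffffL, 255L, 8L);
            }
        }
    }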

3.4 Automated Testing
The high-level idea of automated testing is simple: we drive each app with a deterministic UI automation script, measure energy drain for both the unmodified app and the app modified with a new parameter value, and finally compare the results to see if the parameter of interest reduces the app energy drain. However, to make sure that the test is reproducible, statistically solid, has minimal false positives and false negatives, and faithfully reflects the effect of the parameters, every step needs to be carefully thought out.

3.4.1 Test Script. We design one test script for each app and each test scenario. One major advantage of test automation scripts is reproducibility. The script performs exactly the same operations every time. While it is relatively easy to write a test automation script that works when you run it for a couple of times, it requires extra caution to design a test script that runs thousands of times and ensures everything is reproducible at the same time. In addition to the more obvious considerations such as cleaning the app cache, there are a number of less obvious considerations.

Make sure the interactions are really deterministic. That the same test script code is executed every time does not guarantee that the same interactions are received by the app. For example, in Figure 5, we keep swiping on a scrollable list until we find the desired item. We will swipe the same number of times only if the list always has the same items each time, and they are always sorted in the same way. The interactions become non-deterministic if any of these conditions is not met. A good practice is to avoid such loops with non-deterministic termination conditions.

    while (!onScreen(item))
        swipe();
Figure 5: Non-deterministic test interactions.
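One defensive variant, sketched below under the assumption of the same hypothetical onScreen/swipe helpers as Figure 5, is to bound the loop and fail the run loudly instead of silently swiping a different number of times:

    // Bounded version of Figure 5: if the list content ever changes, the
    // run aborts (and can be retried) instead of silently diverging.
    static final int MAX_SWIPES = 10;  // known upper bound for the fixed test data

    void scrollTo(Item item) {
        for (int i = 0; i < MAX_SWIPES; i++) {
            if (onScreen(item)) return;
            swipe();
        }
        throw new IllegalStateException("test data changed: " + item);
    }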
Make sure the test data are also deterministic. Apps are not only driven by human interactions, but also by the data fed into them. Watching different videos or reading different posts may consume different amounts of energy. This issue is easier to tackle for the apps that mostly work locally, e.g., gallery apps or file managers, but harder for the apps that heavily rely on remote content. For example, the most popular threads on Reddit are highly dynamic and constantly changing. Our approach to overcome this issue is to keep the content as static as possible. In the case of the Reddit client, instead of fetching the trending threads, we change the app preference so that the most popular threads of all time are fetched, which are relatively old and static.

Save app data to save bandwidth and time. Many apps need to download a large amount of data when freshly installed. For example, the map app OsmAnd needs to download an offline map of about 100MB before it is functional. The open source app store F-Droid also needs to download tens of megabytes of metadata for all apps in the store. While this is acceptable when testing the app a few times, downloading the data thousands of times can easily trigger protective measures on the server side, such as serving with reduced bandwidth or even complete blockage. On the other hand, preparing an app for energy drain tests may be very time consuming. For example, for a realistic test scenario of a password manager app, the app must have at least dozens of password entries in its database. However, populating the database with so many entries after each installation is time consuming and slows down the test. The solution for both scenarios is to utilize the data export functionality provided by many apps. The downloading or database bootstrapping only needs to be done once, and all subsequent tests only need to load the exported data after app installation.

Pay attention to the server state. For many apps requiring account login, a part of the state is stored on the server side. Even though we have struggled to make sure that the app has the same data loaded and the same local state every time the app starts, the server may not. The server of the instant messenger app Conversations recognizes the phone as a new device every time the app reinstalls. In the end, the server maintains thousands of "devices" that have used this account, which drastically changes the app behavior. This type of problem is hard to diagnose, as servers are often viewed as a black box. The solutions are also often app specific.

[Figure 6: Energy consumption of the unmodified AntennaPod app. Each data point consists of 5 runs. (a) Energy consumption over a period of 3 days; the error bars represent the standard deviation of the 5 runs. (b) The CDF of the normalized standard deviations of the data in (a).]

3.4.2 Back-to-Back Testing. For many apps that require access to the Internet, due to time-dependent network conditions and server load, the app energy consumption also changes over time. Figure 6a shows the energy consumption of the AntennaPod app over a period of 3 days. While adjacent data points often have similar energy consumption, the energy consumption of distant data points can differ by as much as 14%.

Thus, to ensure our comparison is not affected by such time-dependent energy consumption drifts, we run the unmodified app and the modified app back-to-back whenever we want to try a new parameter value.

3.4.3 Hypothesis Testing. To determine whether a parameter value reduces energy consumption or not, we compare the energy consumption of 5 runs of the unmodified app (denoted as B1, ..., B5) and 5 runs of the modified app (denoted as P1, ..., P5). To make sure the energy reduction (if any) is statistically significant, we perform Student's t-test on the two groups of energy data with a significance level of 0.05, against the following hypotheses:
* Null hypothesis: mean(Pi) = mean(Bi)
* Alternative hypothesis: mean(Pi) < mean(Bi)
We accept the parameter as an energy-reducing parameter if the null hypothesis is rejected in favor of the alternative hypothesis.

However, when applying the above hypothesis testing scheme, we noticed that many cases that passed the t-test were actually caused by small fluctuations of the measured energy consumption, even though we obtained the two groups of results by running the apps back-to-back. The energy differences caused by these energy fluctuations are relatively small (1-3%). To reduce the number of false positives caused by such energy fluctuations, we set a threshold t_d for each app (the magnitude of energy fluctuation is different for different apps) and filter out the results with energy difference smaller than the threshold. Specifically, we modify the hypothesis testing procedure above to incorporate the threshold: instead of performing the t-test on Pi and Bi, we now perform the t-test on Pi and B'i, where B'i = (1 - t_d) * Bi. Since the two groups of data Pi and B'i have different variances, we switch to Welch's t-test instead of Student's t-test.
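As a concrete illustration of this test, below is a minimal sketch using Apache Commons Math for the Welch test (our library choice for illustration; the paper does not name the statistics library Medusa uses):

    import org.apache.commons.math3.stat.StatUtils;
    import org.apache.commons.math3.stat.inference.TTest;

    class EnergyTest {
        /**
         * Returns true if the mutated app drains significantly less energy
         * than the baseline shrunk by the fluctuation threshold td.
         */
        static boolean reducesEnergy(double[] baseline, double[] mutated, double td) {
            // B'_i = (1 - td) * B_i : discount the baseline by the
            // app-specific fluctuation threshold.
            double[] shrunk = new double[baseline.length];
            for (int i = 0; i < baseline.length; i++) {
                shrunk[i] = (1 - td) * baseline[i];
            }
            // tTest() on two samples performs the unequal-variance (Welch)
            // two-sided t-test; halve the p-value for the one-sided
            // alternative mean(P) < mean(B').
            double p = new TTest().tTest(shrunk, mutated) / 2;
            boolean lower = StatUtils.mean(mutated) < StatUtils.mean(shrunk);
            return lower && p < 0.05;
        }
    }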

3.4.4 Stability Threshold. We addressed the false positives caused by energy fluctuations in the last section. However, energy fluctuations can also introduce false negatives. Figure 6b plots the CDF of the normalized standard deviations (standard deviation divided by mean) using the same data as Figure 6a. Although the standard deviation is very low in most cases (less than 3%), in some cases the standard deviation can be as high as 16% of the mean. As the chance of passing the t-test is negatively correlated with the standard deviation, such a high standard deviation makes it hard to detect any energy reduction lower than 16%.

As a countermeasure, because the standard deviation is low most of the time, we can choose another threshold t_s for each app (0.03 for Figure 6b), consider any group of results with normalized standard deviation higher than t_s as abnormal, and re-run the tests for that group.
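In code form, the re-run check is a short computation over each group of 5 runs; this sketch uses our own helper names:

    class StabilityCheck {
        /** Normalized standard deviation (sample stdev divided by mean). */
        static double normalizedStdev(double[] runs) {
            double mean = 0;
            for (double r : runs) mean += r;
            mean /= runs.length;
            double var = 0;
            for (double r : runs) var += (r - mean) * (r - mean);
            var /= (runs.length - 1);
            return Math.sqrt(var) / mean;
        }

        /** A group of runs is abnormal (and gets re-run) if its spread exceeds ts. */
        static boolean needsRerun(double[] runs, double ts) {
            return normalizedStdev(runs) > ts;
        }
    }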
3.5 Manual Validation
Since our parameter filtering process is based on heuristic rules and hence is not perfect, there are still non-tunable parameters that are mutated and tested. Mutated non-tunable parameters may also reduce energy consumption by causing some components of the app to crash without crashing the entire app or test script, and hence result in false positives in our process. Since these cases are usually app specific and do not have common symptoms, it is hard to detect them automatically. To this end, we manually examine the code related to every energy-reducing parameter to make sure that the parameter does not affect app functionality. Fortunately, as each app only has a few energy-reducing parameters, the cost of manual validation is manageable.

4 IMPLEMENTATION
We implemented the parameter mutation and test framework in 3.5 KLoC. The parameter analysis and mutation are built upon Spoon [24], a Java source code analysis and transformation framework. The test automation is achieved using the test automation framework Appium [1]. The coverage analysis is achieved using JaCoCo [2].

As testing a single app can last for a few days, we optimized the test framework in terms of speed and reliability. Every time a parameter is mutated, the app needs to be rebuilt. As the build process is time consuming, we make it asynchronous by pushing all build tasks to a dedicated thread. Our framework can even distribute build tasks to multiple remote servers to further speed up the build process. With all the optimizations for app builds, the tests performed on the phone become the new bottleneck. To this end, we make our test framework fully reentrant, and thus tests for different apps can run on different phones independently.

In reality, a day-long test can be interrupted by various causes, e.g., network outages or unhandled exceptions. Thus we enhance the test framework with a pause-and-resume feature, where tests can always resume at the first untested parameter. UI test automation is known to be fragile, due to timing or changing external conditions [19]. Thus, the test can fail in unexpected ways. However, the conditions causing the test to fail are often transient, and thus we give it another chance before marking the test as failed. The framework further helps diagnose these transient issues by keeping the app log and a screenshot taken right before the failure.

5 LIMITATIONS
The parameters are only modified statically, as we do this by modifying the source code. However, there are parameters that do not have a single best value throughout their life [5]. We can also have a finer tuning granularity when tuning parameters at runtime. For example, if we want to adjust the initialization value of a field, the field value of different class instances can be adjusted independently when adjusted at runtime, but the field initialization can only be adjusted at the class level when modifying the source code.

We only focus on parameters located in the app's source code, without looking at those in 3rd-party libraries and the Android framework. However, app developers only have limited control over the parameters in those libraries. Developers typically control those parameters indirectly via API calls (e.g., setters). We can still control such parameters if the developers ever call the corresponding APIs, as we can then change the method argument.

Since our implementation is based on Spoon, which only supports analysis of Java source code, we are restricted to open source apps that are mainly written in Java.

6 EVALUATION
Our evaluation tests 5 open source apps. The evaluation focuses on numeric parameters. The tests are performed on a Pixel 2 phone, and the energy consumption is measured using an empirical power model.

6.1 App Selection
To select the apps to be tested in a clear and objective way, we start with the open source apps listed in the Wikipedia page [31]. Next, we filter out apps that meet any of the following conditions:
* The app cannot be successfully analyzed by Spoon [24];
* The app contains >30% non-Java code;
* The app uses hardware components that do not have power models (e.g., hardware codec, GPS);
* The app is no longer in Google Play or open source; or
* The app energy consumption is stochastic under the same test scenario.

After removing the non-compliant apps, we sort the remaining apps by Google Play installs. Finally, to increase diversity, we select the top 5 apps that are in different categories, as shown in Table 7. We test against the latest version of each app. We see that the 5 apps have a total of 19761 parameters -- mutating and testing the energy impact of all of them is simply impractical.

6.2 Test Configuration
The test scenarios for each app are listed in Table 7. All test cases are 40-50 seconds long. The stability threshold t_s is determined by first running the experiments with the threshold set to 1 (no effect), drawing the CDF of the normalized standard deviations as in Figure 6b, and selecting the turning point in the CDF curve as t_s. To determine the other threshold t_d, we re-test those parameters that pass the t-test with t_d set to 0 (no effect), do the t-test again, and treat those that only pass the t-test in the first round as caused by energy fluctuations instead of the parameter itself. Then we choose the minimum value for t_d that filters out all those false positives.

6.3 Power Model
We calculate the app energy consumption using power models. We built power models for CPU, GPU and WiFi. We use well-established utilization-based power models [8, 9] to calculate the CPU and GPU energy. In a nutshell, the models take the status and frequency of each CPU core and the GPU, and output the power.

As the CPU of Pixel 2 uses the big.LITTLE architecture, the power of all CPU cores can be calculated as

    P_cpu = P_base + sum over i in active little cores of P_little(f_i)
                   + sum over i in active big cores of P_big(f_i)

where f_i is the frequency of core i; P_base and the mappings P_little and P_big can be found in Table 8. To use this power model, we log each CPU core's activeness and frequency changes using ftrace [3].
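Evaluated over a trace, this model is a straightforward table lookup per sample. The sketch below assumes the Table 8 mappings have been loaded into maps keyed by frequency in kHz; the 4 LITTLE / 4 big core split matches the Pixel 2's Snapdragon 835.

    import java.util.Map;

    class CpuPowerModel {
        static final double P_BASE = 24.28;  // mA, Table 8
        final Map<Integer, Double> pLittle;  // freq (kHz) -> power (mA)
        final Map<Integer, Double> pBig;     // freq (kHz) -> power (mA)

        CpuPowerModel(Map<Integer, Double> pLittle, Map<Integer, Double> pBig) {
            this.pLittle = pLittle;
            this.pBig = pBig;
        }

        /** Instantaneous CPU power given per-core state from ftrace.
         *  freqKhz[i] is core i's frequency; active[i] says whether it is
         *  online; cores 0-3 are LITTLE and 4-7 are big. */
        double power(int[] freqKhz, boolean[] active) {
            double p = P_BASE;
            for (int i = 0; i < freqKhz.length; i++) {
                if (!active[i]) continue;
                Map<Integer, Double> table = (i < 4) ? pLittle : pBig;
                p += table.get(freqKhz[i]);
            }
            return p;
        }
    }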

Table 7: Tested apps and test configuration.
App                 Category            Installs   Version   Test Scenario                  # Parameters   # Parameters       # Selected Tunable   ts     td
                                                                                                           after Filtering    Parameters
ConnectBot          SSH client          4M+        1.9.7     View 6 Python files using vi   12402          451                128                  0.03   0.03
KeePassDroid        Password manager    2M+        2.5.12    Copy 8 password entries        1218           131                8                    0.03   0.03
AntennaPod          Podcast client      591K+      1.8.3     View 6 episode descriptions    2233           165                30                   0.03   0.03
Conversations       Instant messenger   108K+      2.8.9     Send 10 random messages        3128           299                73                   0.04   0.01
Wikimedia Commons   Image sharing       49K+       2.13      View 4 images                  780            32                 14                   0.08   0.00

Table 8: CPU power model. The base power P_base (for all cores) is 24.28mA.
Little Core                      Big Core
Frequency (kHz)   Power (mA)     Frequency (kHz)   Power (mA)
300000            4.4            300000            15.55
364800            5.28           345600            17.37
441600            6.29           422400            21.58
518400            7.4            499200            23.83
595200            8.45           576000            27.23
672000            9.47           652800            30.64
748800            10.52          729600            35.36
825600            11.61          806400            39.3
883200            12.27          902400            40.93
960000            14.72          979200            46.07
1036800           15.37          1056000           50.54
1094400           16.72          1132800           54.92
1171200           17.48          1190400           60.55
1248000           19.73          1267200           63.97
1324800           22.82          1344000           68.77
1401600           24.04          1420800           75.99
1478400           25.08          1497600           86.31
1555200           26.35          1574400           95.8
1670400           31.07          1651200           103.29
1747200           35.1           1728000           112.05
1824000           41.97          1804800           127.25
1900800           43.89          1881600           153.31
                                 1958400           161.44
                                 2035200           202.04
                                 2112000           213.76
                                 2208000           251.34
                                 2265600           291.63
                                 2323200           342.14
                                 2342400           350.92
                                 2361600           356.06
                                 2457600           364.09

Similar to CPU, we calculate the GPU power using the following equation:

    P_gpu = P(s, f)

where the mapping from the GPU state s and frequency f can be read from Table 9, and the state and frequency are obtained using ftrace.

Table 9: GPU power model (power in mA).
Freq (MHz)   SLUMBER   NAP      AWARE    ACTIVE
257          0         107.25   107.25   157.79

We use Finite State Machine (FSM)-based modeling [12, 22, 25, 27, 32] for WiFi. The state machine of the power model is shown in Figure 7. The FSM needs to be driven by the network transmission log, which is obtained using ftrace. The power in states TAIL and IDLE is fixed (115.39mA and 0mA, respectively), while the power in the ACTIVE state is calculated as

    P_active = a * throughput + b

where a = 3.10mA/bps and b = 116.48mA. The throughput includes both upload and download traffic.

[Figure 7: The state machine of the WiFi power model. An active transmit moves the radio to ACTIVE; the end of active transmits moves it to TAIL; being idle for 210ms moves it to IDLE. The condition "active transmit" is met whenever two frames are transmitted within 210ms.]
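Below is a deliberately simplified rendering of the FSM in Figure 7, driving the three states from a log of frame timestamps; the field and method names are ours, and the real implementation consumes ftrace network events rather than a pre-built array.

    class WifiPowerModel {
        static final double P_TAIL = 115.39;        // mA
        static final double A = 3.10, B = 116.48;   // P_active = A * throughput + B
        static final long WINDOW_MS = 210;

        /**
         * Energy (mA*ms) for a log of frame timestamps (ms) at a given
         * throughput. Two frames within 210ms keep the radio ACTIVE; after
         * the last frame of a burst the radio pays a 210ms TAIL, then
         * idles at 0mA until the next frame.
         */
        static double energy(long[] frameTimesMs, double throughputBps) {
            double pActive = A * throughputBps + B;
            double energy = 0;
            for (int i = 0; i < frameTimesMs.length - 1; i++) {
                long gap = frameTimesMs[i + 1] - frameTimesMs[i];
                if (gap <= WINDOW_MS) {
                    energy += pActive * gap;       // stays ACTIVE within a burst
                } else {
                    energy += P_TAIL * WINDOW_MS;  // TAIL, then IDLE at 0mA
                }
            }
            energy += P_TAIL * WINDOW_MS;          // tail after the last frame
            return energy;
        }
    }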

Table 10: Summary of test results.
App                 # Tunable   # Energy-reducing   # Valid
ConnectBot          128         1                   0
KeePassDroid        8           0                   0
AntennaPod          30          0                   0
Conversations       73          2                   1
Wikimedia Commons   14          0                   0

6.4 Results
Table 10 shows a summary of the test results. Our rule-based filtering was highly effective and reduced the number of candidate parameters for the 5 apps from 19761 down to 1078. As our heuristic rules are not perfect and still let through some non-tunable parameters, to save testing time, we manually selected tunable parameters from the initialization & assignment and method call argument categories. Out of the 253 tunable parameters selected for the set of apps, three are energy-reducing, out of which only one is valid; the other two achieve energy reduction by failing various components of the app. We discuss these three parameters in detail below.

6.4.1 The Valid Parameter. Figure 8 shows the test results of the Conversations app. Due to the stability parameter t_s, all standard deviations are restricted to 4% of the corresponding mean. We can observe that most parameters do not affect energy consumption. The only three data points that have a statistically significant energy difference are those in the dashed box (5.8%, 4.8%, 8.2%). While the two data points on the left correspond to the two energy-reducing parameters, the rightmost one is likely caused by transient energy fluctuation and was never reproduced successfully.

[Figure 8: For each parameter of the app Conversations, the energy consumption of the modified app with the value consuming the least energy, compared to back-to-back runs of the unmodified app. The error bars represent the standard deviation of 5 runs.]

6.4.2 Two Invalid Parameters.
ConnectBot. The energy-reducing parameter in the SSH client ConnectBot sets the initial height of the emulated terminal. Although the height of the terminal changes instantly after initialization to fit the screen size, the initial height also affects the size of the character buffer in intricate ways. The changed buffer size triggers a bug that was not triggered with the original buffer size, breaking the terminal scrolling functionality, and in this way reduces energy consumption.

Conversations. When measuring energy consumption, we leave out the initialization phase of the app and only measure the energy consumption of our target test scenario. However, as we do not know exactly when all initializations will be done, there may be some lingering initialization tasks after we enter the test scenario. This is not a problem as long as we start the test scenario after the same delay. But the first energy-reducing parameter of Conversations delays the start time of the test scenario: the UI automation framework will only perform actions when the app is in a relatively idle state, and this modified parameter causes the app to stay busy in its initialization phase for a longer time. When the test finally starts in the modified app, all initializations have been done, causing the apparent energy reduction.

The other energy-reducing parameter (the valid one) controls the refresh rate of various UI elements. By reducing the refresh rate from once every 500ms to once every 4000ms, the app energy consumption is reduced by 4.8%. However, the energy reduction comes at the cost of degraded user experience: when sending a message, the message appears in the conversation view only seconds after it has disappeared from the text editing box.

7 RELATED WORK
7.1 Configuration Tuning
Tuning software configuration parameters for better performance is a common practice across many fields. Recently, many automatic configuration tuning systems have been proposed, either for arbitrary configurable systems or for a specific type of application. The systems can be either offline, meaning that the recommended parameter values are optimal for a certain workload and environment, or online, with the capability to reconfigure the target system dynamically to adapt to environment and workload changes. Many of the automatic tuners are powered by machine learning algorithms.

General Purpose Tuning. Œese automatic tuners are de- 7.2 Performance Model signed for arbitrary tunable systems. BestCon€g [34] is an of- ƒine tuner that uses heuristic searching algorithms to search Many systems [13, 14, 26, 29] focus on building a perfor- for a good set of parameter values. SmartConf [30] is an on- mance model for a certain application and workload. A line tuner that builds a control theoretic performance model performance model is a mathematical function where the using pro€ling data. It then uses this model to guide the domain is the con€guration parameters and the codomain is optimization online. As these general purpose tuners can- the performance. Performance optimization or other tasks not utilize domain speci€c knowledge, the con€gurations can be further performed based on the performance model. they recommend are o‰en not as good as those application Œese systems mostly rely on sampling, and they generate speci€c tuners. beŠer performance model by sampling in a more ecient way. GPS Tuning. Aenea [5] has the same goal as our work, i.e., tune app parameters in order to reduce smartphone 7.3 Con€guration Related Performance energy consumption. However, they only focus on a speci€c Bugs category of applications - GPS navigation apps. It optimizes X-ray [4] performs taint analysis from inputs and con€gura- 3 GPS related parameters dynamically using reinforcement tions to branch conditions, so as to aŠribute branch decisions learning to ensure that the app satis€es the prede€ned energy to inputs and con€gurations, which assists developers to an- budget. In contrast, our goal is to automatically identify app alyze performance bugs related to inputs and con€gurations. parameters of any app that can reduce app energy drain. Han et al. [10] perform an empirical study on performance Database Tuning. OŠerTune [28], CDBTune [33], and QTune bugs that manifest only under certain con€gurations, and [17] all perform o„ine database tuning. While OŠerTune show that most so‰ware is only tested under the default con- utilizes supervised learning, which requires a large num- €guration. CP-Detector [11] proposes an ecient so‰ware ber of high quality tuning samples, CDBTune and QTune testing scheme to detect such performance bugs. are based on reinforcement learning. Both CDBTune and QTune make use of the internal metrics commonly exist in 7.4 Our work various database systems, which leads to more informed de- Œe major di‚erence between our work and all the systems cision compared to general purpose tuners. Sophia [20] is an above, is that they all rely on existing con€guration param- online database tuner that focuses on orchestrating recon- eters explicitly exported to a central place by the so‰ware €gurations of a database cluster to maintain its availability developers. For example, most database systems have their during recon€guration. con€gurations parameters managed in a single table. In con- trast, our goal is to discover important app parameters that File System Tuning. Cao et al. [7] compared and analyzed are scaŠered around the app source code which are o‰en various heuristic searching algorithms using Linux €le sys- overlooked even by the developers. tems as the test bed, and found that search based algorithms do not scale and may not give near optimal con€gurations. Carver [6] uses a sampling method to eciently determine 8 CONCLUSION the important parameters for a certain environment-workload- In this paper, we proposed app parameter energy pro€ling metric combination. 
CAPES [18] optimizes the con€gura- which identi€es tunable app parameters that can reduce app tions of the distributed €le system Lustre using a similar energy drain without a‚ecting app functions as a potentially approach as CDBTune and QTune. more e‚ective solution for debugging such app energy de€- ciency than conventional energy pro€ling. We presented the Kernel Tuning. Cozart [15] focuses on kernel debloating design of such an app parameter energy pro€ling framework by tuning the kernel build con€guration system - Kcon€g. called Medusa that overcomes three key design challenges: Œe Kcon€g parameters are mostly binary and the choice how to €lter out and narrow down candidate parameters, is only dependent on whether the feature is needed by the how to pick alternative parameter values, and how to per- application or not. Œus, instead of referring to machine form reliable energy drain testing of app versions with mu- learning algorithms, Cozart uses low level tracing to deter- tated parameter values, and demonstrated its e‚ectiveness mine whether a feature is used. On the other hand, Lupine using a set of Android apps. To our best knowledge, Medusa [16] hand selects Kcon€g parameters to build a Linux kernel is the €st app parameter energy pro€ling framework and we 11 plan to open source it to help foster further research on this [12] Junxian Huang, Feng Qian, Alexandre Gerber, Z. Morley Mao, Sub- important app energy optimization approach. habrata Sen, and Oliver Spatscheck. 2012. A Close Examination of Performance and Power Characteristics of 4G LTE Networks. In Pro- ceedings of the 10th International Conference on Mobile Systems, Appli- Acknowledgement. Œis work was supported in part by cations, and Services (Low Wood Bay, Lake District, UK) (MobiSys ’12). NSF grant CSR-1718854. Association for Computing Machinery, New York, NY, USA, 225€“238. hŠps://doi.org/10.1145/2307636.2307658 [13] Pooyan Jamshidi, Miguel Velez, Christian Kastner,¨ and Norbert Sieg- REFERENCES mund. 2018. Learning to Sample: Exploiting Similarities across En- vironments to Learn Performance Models for Con€gurable Systems. [1] [n.d.]. Appium. hŠp://appium.io/ In Proceedings of the 2018 26th ACM Joint Meeting on European So‡- [2] [n.d.]. JaCoCo Java Code Coverage Library. hŠps://www.eclemma. ware Engineering Conference and Symposium on the Foundations of org/jacoco/ So‡ware Engineering (Lake Buena Vista, FL, USA) (ESEC/FSE 2018). [3] 2020. Using ‡race. hŠps://source.android.com/devices/tech/debug/ Association for Computing Machinery, New York, NY, USA, 71€“82. ‰race hŠps://doi.org/10.1145/3236024.3236074 [4] Mona AŠariyan, MIchael Chow, and Jason Flinn. 2012. X-ray: Au- [14] C. Kaltenecker, A. Grebhahn, N. Siegmund, J. Guo, and S. Apel. 2019. tomating Root-Cause Diagnosis of Performance Anomalies in Produc- Distance-Based Sampling of So‰ware Con€guration Spaces. In 2019 tion So‰ware. In Presented as part of the 10th USENIX Symposium on IEEE/ACM 41st International Conference on So‡ware Engineering (ICSE). Operating Systems Design and Implementation (OSDI 12). USENIX, Hol- 1084–1094. hŠps://doi.org/10.1109/ICSE.2019.00112 lywood, CA, 307–320. hŠps://www.usenix.org/conference/osdi12/ [15] Hsuan-Chi Kuo, Jianyan Chen, Sibin Mohan, and Tianyin Xu. 2020. technical-sessions/presentation/aŠariyan Set the Con€guration for the Heart of the OS: On the Practicality of [5] Anthony Canino, Yu David Liu, and Hidehiko Masuhara. 2018. Sto- Operating System Kernel Debloating. Proc. ACM Meas. Anal. Comput. 
[12] Junxian Huang, Feng Qian, Alexandre Gerber, Z. Morley Mao, Subhabrata Sen, and Oliver Spatscheck. 2012. A Close Examination of Performance and Power Characteristics of 4G LTE Networks. In MobiSys '12. ACM, New York, NY, USA, 225-238. https://doi.org/10.1145/2307636.2307658
[13] Pooyan Jamshidi, Miguel Velez, Christian Kästner, and Norbert Siegmund. 2018. Learning to Sample: Exploiting Similarities across Environments to Learn Performance Models for Configurable Systems. In ESEC/FSE 2018. ACM, New York, NY, USA, 71-82. https://doi.org/10.1145/3236024.3236074
[14] C. Kaltenecker, A. Grebhahn, N. Siegmund, J. Guo, and S. Apel. 2019. Distance-Based Sampling of Software Configuration Spaces. In ICSE 2019. 1084-1094. https://doi.org/10.1109/ICSE.2019.00112
[15] Hsuan-Chi Kuo, Jianyan Chen, Sibin Mohan, and Tianyin Xu. 2020. Set the Configuration for the Heart of the OS: On the Practicality of Operating System Kernel Debloating. Proc. ACM Meas. Anal. Comput. Syst. 4, 1, Article 03 (May 2020), 27 pages. https://doi.org/10.1145/3379469
[16] Hsuan-Chi Kuo, Dan Williams, Ricardo Koller, and Sibin Mohan. 2020. A Linux in Unikernel Clothing. In EuroSys '20. ACM, New York, NY, USA, Article 11, 15 pages. https://doi.org/10.1145/3342195.3387526
[17] Guoliang Li, Xuanhe Zhou, Shifu Li, and Bo Gao. 2019. QTune: A Query-Aware Database Tuning System with Deep Reinforcement Learning. Proc. VLDB Endow. 12, 12 (Aug. 2019), 2118-2130. https://doi.org/10.14778/3352063.3352129
[18] Yan Li, Kenneth Chang, Oceane Bel, Ethan L. Miller, and Darrell D. E. Long. 2017. CAPES: Unsupervised Storage Performance Tuning Using Neural Network-Based Deep Reinforcement Learning. In SC '17. ACM, New York, NY, USA, Article 42, 14 pages. https://doi.org/10.1145/3126908.3126951
[19] Jonathan Lipps. [n.d.]. Making Your Appium Tests Fast and Reliable, Part 3: Waiting for App States. https://appiumpro.com/editions/21-making-your-appium-tests-fast-and-reliable-part-3-waiting-for-app-states
[20] Ashraf Mahgoub, Paul Wood, Alexander Medoff, Subrata Mitra, Folker Meyer, Somali Chaterji, and Saurabh Bagchi. 2019. SOPHIA: Online Reconfiguration of Clustered NoSQL Databases for Time-Varying Workloads. In 2019 USENIX Annual Technical Conference (USENIX ATC 19). USENIX Association, Renton, WA, 223-240. https://www.usenix.org/conference/atc19/presentation/mahgoub
[21] Radhika Mittal, Aman Kansal, and Ranveer Chandra. 2012. Empowering Developers to Estimate App Energy Consumption. In MobiCom '12. ACM, New York, NY, USA, 317-328. https://doi.org/10.1145/2348543.2348583
[22] Ana Nika, Yibo Zhu, Ning Ding, Abhilash Jindal, Y. Charlie Hu, Xia Zhou, Ben Y. Zhao, and Haitao Zheng. 2015. Energy and Performance of Smartphone Radio Bundling in Outdoor Environments. In WWW '15. 809-819. https://doi.org/10.1145/2736277.2741635
[23] Abhinav Pathak, Y. Charlie Hu, and Ming Zhang. 2012. Where is the Energy Spent inside My App? Fine Grained Energy Accounting on Smartphones with Eprof. In EuroSys '12. ACM, New York, NY, USA, 29-42. https://doi.org/10.1145/2168836.2168841
[24] Renaud Pawlak, Martin Monperrus, Nicolas Petitprez, Carlos Noguera, and Lionel Seinturier. 2016. SPOON: A library for implementing analyses and transformations of Java source code. Software: Practice and Experience 46, 9 (2016), 1155-1179. https://doi.org/10.1002/spe.2346
[25] Feng Qian, Zhaoguang Wang, Alexandre Gerber, Zhuoqing Mao, Subhabrata Sen, and Oliver Spatscheck. 2011. Profiling Resource Usage for Mobile Applications: A Cross-Layer Approach. In MobiSys '11. ACM, New York, NY, USA, 321-334. https://doi.org/10.1145/1999995.2000026
[26] Daniele Rogora, Antonio Carzaniga, Amer Diwan, Matthias Hauswirth, and Robert Soulé. 2020. Analyzing System Performance with Probabilistic Performance Annotations. In EuroSys '20. ACM, New York, NY, USA, Article 43, 14 pages. https://doi.org/10.1145/3342195.3387554
[27] L. Sun, R. K. Sheshadri, W. Zheng, and D. Koutsonikolas. 2014. Modeling WiFi Active Power/Energy Consumption in Smartphones. In 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS). 41-51.
[28] Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-Scale Machine Learning. In SIGMOD '17. ACM, New York, NY, USA, 1009-1024. https://doi.org/10.1145/3035918.3064029
[29] Miguel Velez, Pooyan Jamshidi, Florian Sattler, Norbert Siegmund, Sven Apel, and Christian Kästner. 2020. ConfigCrusher: Towards White-box Performance Analysis for Configurable Systems. Automated Software Engineering (Aug. 2020). https://doi.org/10.1007/s10515-020-00273-8
[30] Shu Wang, Chi Li, Henry Hoffmann, Shan Lu, William Sentosa, and Achmad Imam Kistijantoro. 2018. Understanding and Auto-Adjusting Performance-Sensitive Configurations. In ASPLOS '18. ACM, New York, NY, USA, 154-168. https://doi.org/10.1145/3173162.3173206
[31] Wikipedia contributors. 2020. List of free and open-source Android applications — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=List_of_free_and_open-source_Android_applications&oldid=975713813. [Online; accessed 19-September-2020].
[32] Fengyuan Xu, Yunxin Liu, Qun Li, and Yongguang Zhang. 2013. V-edge: Fast Self-constructive Power Modeling of Smartphones Based on Battery Voltage Dynamics. In 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13). USENIX Association, Lombard, IL, 43-55. https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/xu_fengyuan
[33] Ji Zhang, Yu Liu, Ke Zhou, Guoliang Li, Zhili Xiao, Bin Cheng, Jiashu Xing, Yangtao Wang, Tianheng Cheng, Li Liu, et al. 2019. An End-to-End Automatic Cloud Database Tuning System Using Deep Reinforcement Learning. In SIGMOD '19. ACM, New York, NY, USA, 415-432. https://doi.org/10.1145/3299869.3300085
[34] Yuqing Zhu, Jianxun Liu, Mengying Guo, Yungang Bao, Wenlong Ma, Zhuoyue Liu, Kunpeng Song, and Yingchun Yang. 2017. BestConfig: Tapping the Performance Potential of Systems via Automatic Configuration Tuning. In SoCC '17. ACM, New York, NY, USA, 338-350. https://doi.org/10.1145/3127479.3128605
