The Pennsylvania State University
The Graduate School

ENERGY OPTIMIZATION FOR WIRELESS COMMUNICATIONS

ON MOBILE DEVICES

A Dissertation in Computer Science and Engineering by Yi Yang

© 2018 Yi Yang

Submitted in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

May 2018

The dissertation of Yi Yang was reviewed and approved∗ by the following:

Guohong Cao Professor of Computer Science and Engineering Dissertation Advisor, Chair of Committee

George Kesidis Professor of Computer Science and Engineering Professor of Electrical Engineering

Sencun Zhu Associate Professor of Computer Science and Engineering

Dinghao Wu Associate Professor of Information Sciences and Technology

Mahmut Taylan Kandemir Professor of Computer Science and Engineering Graduate Program Chair of Computer Science and Engineering

∗Signatures are on file in the Graduate School.

Abstract

Mobile devices such as smartphones and smartwatches are becoming increasingly popular, accompanied by a wide range of apps. These apps usually require data communications through wireless interfaces, which can drain the battery quickly. Thus, it is of great value to characterize the energy consumption of wireless communications and to propose energy saving solutions. The goal of this dissertation is to optimize the energy consumption of wireless communications on mobile devices. Specifically, this dissertation has four foci.

First, we propose network quality aware prefetching algorithms to save energy for in-app advertising. The cellular interface on smartphones continues to consume a large amount of energy after a data transmission (referred to as the long tail problem), so periodically fetching ads through the cellular network may lead to significant battery drain on smartphones. To reduce the tail energy, we can predict the number of ads needed in the future and prefetch those ads together. However, prefetching unnecessary ads may waste both energy and cellular bandwidth, and this problem becomes worse when the network quality is poor. To solve this problem, we propose network quality aware prefetching algorithms. We first design a prediction algorithm that generates a set of prefetching options with various probabilities, and then propose two prefetching algorithms that select the best prefetching option by considering the effect of network quality.

Second, we generalize the prefetching problem, where the goal is to find a prefetching schedule that minimizes the energy consumption of the data transmissions under the current network condition. To solve the formulated nonlinear optimization problem, we first propose a greedy algorithm, and then a discrete algorithm with better performance.

Third, we consider context information when offloading tasks from wearable devices.
Considering the low energy consumption of Bluetooth data transmissions, wearable devices usually offload computationally intensive tasks to the connected smartphone via Bluetooth. However, existing smartphones cannot properly allocate CPU resources to these offloaded tasks due to the lack of context information, resulting in either energy waste on smartphones or high interaction latency on wearable devices. To address this issue, we propose a context-aware task offloading framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud based on their context, aiming to achieve a balance between good user experience on wearable devices and energy saving on the smartphone.

Finally, we characterize and optimize Bluetooth energy consumption on smartwatches. Bluetooth is used for data communications between smartwatches and smartphones, but its energy consumption has rarely been studied. We first establish a Bluetooth power model and then perform an in-depth investigation of the background data transfers on smartwatches. We found that those data transfers consume a large amount of energy due to the adverse interaction between the data transfer pattern (i.e., frequently transferring small data) and the Bluetooth energy characteristics (i.e., the tail effect). Based on these findings, we propose four techniques to save Bluetooth energy for smartwatches.

Table of Contents

List of Figures

List of Tables

Acknowledgments

Chapter 1  Introduction
  1.1 Motivation
  1.2 Challenges
  1.3 Focus of This Dissertation
    1.3.1 Energy-Aware Advertising through Quality-Aware Prefetching on Smartphones
    1.3.2 Prefetch-Based Energy Optimization on Smartphones
    1.3.3 Context-Aware Task Offloading for Wearable Devices
    1.3.4 Characterizing and Optimizing Background Data Transfers on Smartwatches
  1.4 Organization

Chapter 2  Energy-Aware Advertising through Quality-Aware Prefetching on Smartphones
  2.1 Introduction
  2.2 Related Work
  2.3 Preliminaries
    2.3.1 Background: In-app Advertising
      2.3.1.1 In-app Ad Ecosystem
      2.3.1.2 In-app Ad Format and Size
      2.3.1.3 What Ads to Prefetch
    2.3.2 Design Considerations and Basic Ideas
  2.4 Network Quality Aware Prefetching
    2.4.1 Prediction Based on a Series of Probabilities
      2.4.1.1 Feature Selection
      2.4.1.2 App Usage Record
      2.4.1.3 Partitioning App Usage Records into Clusters
      2.4.1.4 Classifying Current App Usage to a Cluster
      2.4.1.5 Generating Prediction Results for a Cluster
    2.4.2 Prefetching Algorithms
      2.4.2.1 Energy Model
      2.4.2.2 Network Quality
      2.4.2.3 Energy-aware Prefetching Algorithm
      2.4.2.4 Energy-and-data Aware Prefetching Algorithm
  2.5 Performance Evaluation
    2.5.1 Evaluation Setup
      2.5.1.1 App Usage Trace
      2.5.1.2 Measuring Network Quality
    2.5.2 Evaluation Result (Measurement-based Network Quality)
  2.6 Testbed Development and Evaluation
    2.6.1 Testbed Development
    2.6.2 Experimental Results
  2.7 Conclusion

Chapter 3  Prefetch-Based Energy Optimization on Smartphones
  3.1 Introduction
  3.2 Related Work
  3.3 Preliminaries
    3.3.1 Energy Model
    3.3.2 Motivations
  3.4 Prefetch-Based Energy Optimization
    3.4.1 Problem Formulation
    3.4.2 Greedy Algorithm
    3.4.3 Discrete Algorithm
      3.4.3.1 Value of the segment size A
    3.4.4 Greedy Algorithm vs. Discrete Algorithm
    3.4.5 Discussions
  3.5 Performance Evaluations: In-app Advertising
    3.5.1 In-app Advertising
    3.5.2 Ad Prefetching Algorithm
    3.5.3 App Usage and Throughput Traces
    3.5.4 Parameter Setup and Algorithm Training
    3.5.5 Trace-Driven Simulations
    3.5.6 Testbed Development and Evaluation
  3.6 Performance Evaluations: Mobile Video Streaming
    3.6.1 Mobile Video Streaming
    3.6.2 Mobile Video Streaming Algorithms
    3.6.3 Video Viewing Traces
    3.6.4 Trace-Driven Simulations
      3.6.4.1 Energy Consumption
      3.6.4.2 Data Wastage
    3.6.5 Testbed Development and Evaluation
      3.6.5.1 Parameter Setup
      3.6.5.2 Evaluation Results
      3.6.5.3 Compatibility with DASH
  3.7 Conclusions

Chapter 4  Context-Aware Task Offloading for Wearable Devices
  4.1 Introduction
  4.2 Preliminaries
    4.2.1 Task Execution in Android
    4.2.2 big.LITTLE Architecture on Smartphones
    4.2.3 big.LITTLE Support in Android
  4.3 The Motivation for Context-Aware Task Offloading
  4.4 Context-Aware Task Offloading (CATO)
    4.4.1 CATO Overview
    4.4.2 Client and Server Proxies
    4.4.3 Profiler
      4.4.3.1 CPU Profiling
      4.4.3.2 Network Profiling
    4.4.4 Solver
      4.4.4.1 Estimating Latency and Energy Cost
      4.4.4.2 Offload Decision
  4.5 Implementation
    4.5.1 CATO API
      4.5.1.1 Preparing a Task
      4.5.1.2 Offloading a Task
    4.5.2 Applications
      4.5.2.1 Speech Recognition
      4.5.2.2 Smart Alarm
  4.6 Performance Evaluation
    4.6.1 Experimental Setup
    4.6.2 Speech Recognition
    4.6.3 Smart Alarm
    4.6.4 CATO Overhead
  4.7 Related Work
  4.8 Conclusion

Chapter 5  Characterizing and Optimizing Background Data Transfers on Smartwatches
  5.1 Introduction
  5.2 Related Work
  5.3 Preliminaries
    5.3.1 Bluetooth Overview
    5.3.2 Bluetooth Modes
    5.3.3 Bluetooth on Android Smartwatches
  5.4 Bluetooth Power Model
    5.4.1 Methodology
    5.4.2 Power Model
    5.4.3 Model Validation
  5.5 Background Data Transfers on Smartwatches
    5.5.1 Packet Traces
    5.5.2 Origins of Background Data Transfers
    5.5.3 Energy Impact
      5.5.3.1 Energy breakdown for each application
      5.5.3.2 Total energy impact
  5.6 Energy Optimizing Techniques
    5.6.1 Fast Dormancy
    5.6.2 Phone Initiated Polling
    5.6.3 Two-stage Sensor Processing
      5.6.3.1 Sleep as Android
      5.6.3.2 Cinch
    5.6.4 Context-Aware Pushing
  5.7 Performance Evaluations
    5.7.1 Traffic Optimization for Individual Applications
      5.7.1.1 Evaluation results
    5.7.2 Case Study
  5.8 Conclusion

Chapter 6  Conclusions and Future Work
  6.1 Summary
  6.2 Future Directions

Bibliography

List of Figures

2.1 Cumulative distribution of how many ads are displayed
2.2 Time interval between two consecutive usages of the same app
2.3 Partitioning app usage records into clusters by K-means
2.4 Naive Bayes classifier
2.5 Power level of using the LTE cellular interface to download an ad
2.6 Measuring LTE downlink throughput with different amounts of data
2.7 Measurement-based throughput traces. Throughput is randomly collected when walking inside and outside the building.
2.8 Energy ratio for measurement-based throughput (Trace 1)
2.9 Energy ratio for measurement-based throughput (Trace 2)
2.10 CDF plots of the number of ads prefetched according to the energy-aware prefetching algorithm (30 seconds ad refresh interval)
2.11 Average data ratio
2.12 Energy consumption in an app

3.1 Power and downlink throughput with different signal strength, where -90 dBm represents good signal strength, -100 dBm represents average signal strength, -115 dBm represents bad signal strength, and -125 dBm is considered as the boundary condition for losing the LTE signal. The power consumption is measured when the screen is on.
3.2 Viewing content distribution of a video (watched by multiple users). The figure is generated according to the video viewing traces described in Section 3.6.3.
3.3 Different prefetching options under different network quality

3.4 Energy ratio based on trace-driven simulations. The energy consumption of the non-pref algorithm is used as the benchmark to calculate the energy ratio of other algorithms. The postfix "AGGR" after the algorithm name indicates that the algorithm is trained by the aggregated model. The postfix "USER" indicates that the algorithm is trained by the user-specific model.
3.5 Total energy consumed by the ad-enabled app for all tests. Each ad prefetching algorithm is trained using the user-specific model.
3.6 CDF plots of the number of ads prefetched according to the discrete algorithm
3.7 Total energy consumption of downloading videos for all video viewings under different network quality. The segment size A in our discrete algorithm is set to 50 KB.
3.8 CDF plots of the energy consumption for video viewing sessions using Trace 1. The simulations are performed for each video. The first half of the video viewing trace is used to train the algorithm and the video viewing records longer than 3 minutes in the second half (Trace 1) are used for evaluations.
3.9 CDF plots of the energy consumption for video viewing sessions using Trace 2. The simulations are performed for each video. The first half of the video viewing trace is used to train the algorithm and the video viewing records shorter than (or equal to) 3 minutes in the second half (Trace 2) are used for evaluations.
3.10 CDF of the fraction of wasted data for video viewing sessions. The fraction of wasted data is calculated by dividing the amount of video content downloaded but unwatched by the amount of watched video content.
3.11 Impact of the time constraint τ on the segment size A. The calculation of A is based on the method described in Section 3.4.3.1.
3.12 Energy consumption of different algorithms. The power consumption is measured using a Monsoon Power Monitor.
3.13 CDF plot of the prefetched data size according to our discrete algorithm

4.1 Android executes all tasks (components) of the same app in the same process.
4.2 If a foreground app depends on another app, both apps will be run in foreground processes.

4.3 Average power, latency and total energy consumption of a little core and a big core to run the same workload at different frequencies on …
4.4 Wearable app A (foreground) and B (background) offload tasks to the smartphone by invoking service A and B, respectively. Since the smartphone does not know the context of these two tasks, it cannot properly run service A in a foreground process and service B in a background process.
4.5 Overview of the CATO architecture
4.6 Measured and KNN-predicted latency
4.7 Flex PCB based battery interceptor for the Nexus 5X smartphone to be connected with the Monsoon Power Monitor
4.8 Latency and energy consumption of the speech recognition task with speech input of different lengths. CATO executes the task in a foreground process. Compared to executing in a background process, CATO-local can reduce latency by one third. To further reduce latency, according to the algorithm described in Section 4.4.4.2, CATO-LTE and CATO-WiFi offload tasks to the cloud when the speech length is longer than 3 seconds and 1 second, respectively. Offloading through LTE consumes much more energy than through WiFi, since the LTE interface consumes lots of energy (i.e., tail energy) after a data transmission.
4.9 The CPU load of executing the speech recognition task in a background process or a foreground process. A background process only uses a little core (i.e., cpu0), while a foreground process can use the big core to accelerate the task processing.
4.10 Energy consumption and latency of different tasks from the smart alarm app. CATO executes the task in a background process. Compared to running in a foreground process, CATO-local can reduce energy by half. CATO-LTE never offloads the task to the cloud, so CATO-LTE and CATO-local have the same energy consumption and delay. CATO-WiFi offloads task 1 and task 2 to the cloud to further save energy.

5.1 Slots in an ACL link. A slave (smartwatch) can send data to the master (phone) in slave-to-master slots, and receive data from the master in master-to-slave slots.
5.2 Illustration of the anchor points in the sniff mode with a tsniff interval of six slots. M→S represents master-to-slave slots.
5.3 Transition between the active mode and the sniff mode

5.4 Main ways to transfer data between a smartwatch and the paired (connected) phone. A single ACL link is shared among all the upper-layer network connections.
5.5 Flex PCB based battery interceptor for the LG Urbane smartwatch to be connected with the Monsoon Power Monitor
5.6 State transitions of using the Bluetooth interface to transfer data
5.7 Power consumption of the Bluetooth radio while transferring 500 bytes of data (measured on the LG Urbane smartwatch)
5.8 Energy consumption of background data transfers generated by each application (service) based on isolated packet traces
5.9 Energy breakdown of the LG Urbane smartwatch in S2. The base energy is obtained from S1. The Bluetooth energy is calculated based on the packet trace and the Bluetooth power model.
5.10 An illustration of the traditional polling scheme. In each polling, two packets (i.e., one request and one response) are transferred between the smartwatch and the phone.
5.11 An illustration of the phone-initiated polling scheme. The phone is responsible for polling and only sends responses to the smartwatch if updates are detected. In each polling, at most one response needs to be transferred between the smartwatch and the phone.
5.12 The two-stage sensor processing framework. A preprocessing module is deployed on the smartwatch to check the application requirement or the effectiveness of the sensor data. An offloading decision is made adaptively based on the preprocessing result.
5.13 Performance of fast dormancy with different settings
5.14 Performance evaluations of the optimization techniques for all background data transfers

List of Tables

2.1 Mutual information between app duration and features
2.2 Confusion matrix
2.3 Power consumption of the LTE cellular interface

3.1 Mobile Devices and Network Types
3.2 Power consumption of LTE
3.3 The Process of the Greedy Algorithm

4.1 Power consumption of various network interfaces
4.2 Features of executing a speech recognition task
4.3 Error of predicting the latency of executing a speech recognition task on the smartphone
4.4 CATO API used for preparing a task and offloading the task to the smartphone
4.5 Tasks of the smart alarm app with different configurations

5.1 Mobile devices and system versions
5.2 Power level of major components on the LG Urbane smartwatch
5.3 Background data transfers generated by the system
5.4 Applications generating periodic background data transfers
5.5 Battery life of smartwatches under different scenarios
5.6 Traffic optimization for individual applications

Acknowledgments

I would like to thank all the people who have helped and inspired me during my doctoral study.

I especially would like to express my deepest gratitude to my advisor, Prof. Guohong Cao, for his research advice, patience, and encouragement. In the past six years, he has spent considerable time and effort on my research. He has not only guided me through his expertise and insights on important research problems, but also trained me to think independently and critically. I appreciate all his contributions and have learned a lot from him. Without his support and encouragement, I could not have completed my Ph.D. dissertation.

I would like to express my gratitude to the other members of my doctoral committee, Prof. George Kesidis, Prof. Sencun Zhu, and Prof. Dinghao Wu. They gave me many constructive and insightful comments, which helped me improve my research and kept my dissertation in the right direction. I am very grateful for the help and encouragement of these professors in bringing my dissertation to its present form.

My gratitude also extends to my co-authors Yeli Geng, Wenjie Hu, and Li Qiu for their inspiring thoughts and valuable work in solving the research problems we met. I also want to thank all my labmates and friends in the MCN lab for sharing research experiences with me and accompanying me throughout my Ph.D. journey. Their friendship made my life at Penn State very enjoyable and memorable.

I want to give special thanks to my family. First, I would like to thank my parents and my parents-in-law for their unconditional love and generous support. I also want to thank my son, Jayden, whose birth brought lots of joy into my life. Last but not least, I would like to express my deepest gratitude to my wife, Nan Yu, for her continuous encouragement during my doctoral study and the contributions she made for the whole family.

This work was supported in part by the National Science Foundation (NSF) under grants CNS-1421578 and CNS-1526425.
Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).

Dedication

To my wife and son.

Chapter 1

Introduction

Mobile devices such as smartphones and smartwatches have quickly gained popularity in the past few years. There are various apps available on mobile devices, such as email, video streaming, and health monitoring. These apps usually rely on data access through wireless interfaces, such as LTE, WiFi, and Bluetooth, which can drain the battery quickly [1, 2, 3]. Considering the limited battery life of mobile devices, it is important to characterize the energy consumption of wireless communications and propose energy saving solutions.

We first focus on saving energy on smartphones. Most apps on smartphones are associated with in-app advertising. Fetching ads through the cellular network may lead to significant battery drain on smartphones [4], because the cellular interface needs to stay at a high-power state for more than 10 seconds after a data transmission (also referred to as the long tail problem) [5, 6]. Since ads are fetched periodically, the long tail problem occurs frequently and lots of energy is wasted [7]. Prefetching those ads together can reduce the tail energy. However, prefetching unnecessary ads may waste energy, and this energy waste varies depending on the network quality. We will study how to determine the number of ads to prefetch based on the network condition to save energy. Moreover, prefetching can be used in many other mobile apps such as video streaming, so it is necessary to formalize the prefetching problem to provide general solutions. We also found that smartphones may waste energy on offloaded tasks due to the lack of context, so context information should be considered when offloading tasks.

We then consider saving energy on smartwatches. Bluetooth [8] is adopted by smartwatches to communicate with the connected smartphone. Apps on smartwatches may generate a large amount of Bluetooth data transfers, which may consume lots of energy.
Although some research has been done to investigate Bluetooth performance [9, 10], none of it focuses on energy. Thus, it is of great value to characterize the energy consumption of the Bluetooth interface on smartwatches and propose energy saving solutions.

This dissertation focuses on optimizing energy for wireless communications on mobile devices. Specifically, we focus on saving energy on smartphones and smartwatches.

1.1 Motivation

Periodic data transfers through the cellular interface consume a large amount of energy on smartphones [11, 12]. For example, apps associated with in-app advertising need to fetch ads periodically, so the long tail problem occurs frequently and lots of energy is wasted [7]. A recent measurement study of the top 15 ad-supported apps shows that fetching in-app ads consumes 23% of an app's total energy and 65% of its total communication energy [13]. To reduce the tail energy, we can predict the number of ads needed in the future and prefetch those ads together; then only one tail is generated instead of the multiple tails incurred by fetching ads periodically. However, since it is impossible to accurately predict the exact number of ads that will be used in the future, the prefetched ads may be more or fewer than necessary. If the prefetched ads are fewer than needed, more prefetches will be required, resulting in extra long tail problems. On the other hand, if the prefetched ads are more than necessary, energy and bandwidth will be wasted. This problem becomes worse when the network quality is poor, since transmitting the same amount of data takes much longer and consumes more energy [14]. Thus, we propose to adaptively adjust the number of ads to prefetch based on the network quality.

Besides in-app advertising, prefetching can also be used to reduce the tail energy in many other mobile apps. For example, in the YouTube app, small chunks of video are periodically downloaded for users to watch, which wastes lots of tail energy. By predicting how long the user will watch the video and then prefetching that video content together, lots of tail energy can be saved. Thus, we generalize the prefetching problem to provide general solutions.

To save energy, wearable devices usually offload computationally intensive tasks to the connected smartphone [15].
Existing Android smartphones allocate CPU resources to a task according to its performance requirement, which is determined by the context of the task [16]. However, due to the lack of context information, smartphones cannot properly allocate resources to tasks offloaded from wearable devices. Allocating too few resources to urgent tasks (related to user interaction) may cause high interaction latency on wearable devices, while allocating too many resources to unimportant tasks (unrelated to user interaction) may lead to energy waste on the smartphone. To address this problem, we propose a context-aware task offloading framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud according to their context.

Data communications between smartwatches and smartphones are enabled by Bluetooth. Although Bluetooth is designed primarily to achieve low-power data transmissions, a large amount of Bluetooth data transfers may still cause energy issues on smartwatches. There is some research on Bluetooth, such as characterizing its performance [9] and enhancing its functionalities [10], but little study has been done to investigate Bluetooth power consumption. In our work, we first establish the Bluetooth power model, and then investigate the energy impact of background data transfers on smartwatches and propose energy optimization techniques.
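The long tail problem described above can be made concrete with a toy energy model. The power levels, tail duration, ad size, and throughput below are illustrative assumptions for the sketch, not measurements from this dissertation:

```python
def transfer_energy(data_kb, throughput_kbps, p_tx=1.0, p_tail=0.8, tail_s=10.0):
    """Energy (J) of one cellular transfer: transmission time at power p_tx (W),
    plus a fixed high-power tail after the transfer (the long tail problem)."""
    return p_tx * (data_kb / throughput_kbps) + p_tail * tail_s

# Fetching 10 ads (5 KB each) one at a time pays the ~8 J tail ten times;
# prefetching them in one burst pays it only once.
periodic = 10 * transfer_energy(5, 500)   # ten transfers, ten tails
batched = transfer_energy(10 * 5, 500)    # one transfer, one tail
```

With these numbers, the batched schedule consumes roughly a tenth of the energy of the periodic one. The saving shrinks as throughput drops and transmission time dominates, which is why the network quality must enter the prefetching decision.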

1.2 Challenges

The proposed energy saving solutions for mobile devices have to overcome many challenges.

The first challenge is how to consider the context when generating prefetching options (based on historical app usage). For a certain app, a user may use it in some specific context in which it uses different amounts of data than in other contexts. To generate prefetching options, we should only consider app usage in a similar context. However, it is very difficult to partition app usage based on context for two reasons. First, it is hard to know whether a user runs an app in some specific context or just runs it randomly. Second, even if such contexts exist, it is hard to identify them because of the diversity of user behaviors. Thus, there is no simple rule to identify these contexts for the prediction.

Second, it is challenging to formulate and solve the prefetching problem. To formulate the problem, we need to consider the tradeoff between saving the tail energy and the energy wasted on downloading unneeded data. Moreover, the effect of the network quality should be considered. Since there are numerous ways of prefetching data, it is impossible to solve the prefetching problem by brute force, i.e., evaluating all prefetching options to find the best one. The algorithm to search for the prefetching option should be lightweight and efficient. There are also other challenges, such as how to measure the network quality in real time and how to make the prefetching scheme compatible with upper layer services such as DASH.

Third, the realization of the context-aware task offloading framework has several challenges. First, the framework should not require any modification to the underlying Android OS. Second, the framework needs to be backward compatible with traditional context-unaware offloading solutions to avoid unexpected failures when offloading tasks.
That is, wearable devices should be able to detect whether the connected smartphone supports the framework, and if not, switch back to traditional ways to offload tasks. Third, to further offload tasks from the smartphone to the cloud, several problems need to be addressed, such as how to profile the latency and energy cost of executing a task, how to predict the cost of a new task by considering task inputs and the execution environment, and how to consider the task context when making offload decisions.

Finally, optimizing the energy consumption of Bluetooth data transfers on smartwatches has the following challenges. First, it is difficult to build the Bluetooth power model, since the power consumption of smartwatches cannot be directly measured and the Bluetooth implementation is not standardized. Second, we need to collect Bluetooth packet traces to analyze the origins and the energy impact of Bluetooth data transfers. Third, to optimize Bluetooth data transfers, we need to consider many factors, such as the unique characteristics of the Bluetooth interface, the cooperation between a smartwatch and the connected phone, and the origins of Bluetooth data transfers.

1.3 Focus of This Dissertation

The goal of this dissertation is to propose energy saving solutions for mobile devices. Specifically, we focus on four aspects: quality-aware energy optimization for in-app advertising, prefetch-based energy optimization on smartphones, context-aware task offloading, and energy optimization of Bluetooth data transfers on smartwatches. We briefly explain them in the following four subsections.

1.3.1 Energy-Aware Advertising through Quality-Aware Prefetching on Smartphones

In-app advertising provides a monetization solution for free apps, but it also consumes lots of energy due to the long tail problem, where the cellular interface has to stay in the high-power state for some time after each data transmission. To reduce the tail energy, a viable solution is to predict the number of ads needed in the future and then prefetch those ads together instead of fetching them periodically [17, 13]. However, prefetching unnecessary ads may waste both energy and cellular bandwidth, and this problem becomes worse when the network quality is poor.

We address this problem by adjusting the number of ads to prefetch based on the network quality. Generally speaking, more ads should be prefetched when the network quality is good, and fewer ads should be prefetched when the network quality is poor. Although redundant ads may be prefetched under good network quality, doing so avoids other possible long tail problems. Similarly, when the network quality is poor, prefetching fewer ads avoids the energy waste of prefetching unneeded ads. To achieve this, different from traditional data-mining based prediction algorithms which only generate one option (i.e., the number of ads to prefetch), the proposed prediction algorithm generates a set of options with various probabilities. With these prefetching options, we propose two prefetching algorithms that reduce the energy consumption by considering the effect of network quality: the energy-aware prefetching algorithm aims to minimize the energy consumption, and the energy-and-data aware prefetching algorithm also considers the data usage to achieve a tradeoff between energy and data usage.
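A minimal sketch of how such a selection could work. The demand distribution, cost constants, and the pessimistic assumption that every shortfall ad pays its own tail are all illustrative, not the dissertation's actual algorithm or measurements:

```python
def expected_energy(n_prefetch, demand_probs, throughput_kbps,
                    ad_kb=5.0, p_tx=1.0, p_tail=0.8, tail_s=10.0):
    """Expected energy (J) of prefetching n_prefetch ads now, given
    demand_probs[k] = P(k ads will be displayed this session).
    Any shortfall is fetched on demand, each fetch paying its own tail
    (a pessimistic simplification of the on-demand case)."""
    burst = p_tx * n_prefetch * ad_kb / throughput_kbps + p_tail * tail_s
    on_demand = p_tx * ad_kb / throughput_kbps + p_tail * tail_s
    return burst + sum(p * max(0, k - n_prefetch) * on_demand
                       for k, p in demand_probs.items())

probs = {2: 0.5, 5: 0.3, 10: 0.2}  # hypothetical prediction output
best_good = min(range(1, 11), key=lambda n: expected_energy(n, probs, 500.0))
best_poor = min(range(1, 11), key=lambda n: expected_energy(n, probs, 1.0))
# best_good == 10 (fast link: over-prefetching is cheap, avoiding tails dominates)
# best_poor == 5  (slow link: each prefetched ad is expensive, so hedge)
```

This captures the qualitative rule stated above: prefetch aggressively under good network quality and conservatively under poor network quality.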

1.3.2 Prefetch-Based Energy Optimization on Smartphones

Cellular networks enable pervasive data access, but they also increase the power consumption of smartphones due to the long tail problem. Prefetching has been widely used to reduce the tail energy in many apps [17, 18, 19, 20]. However, existing works are limited to certain mobile apps under certain conditions and do not consider the effect of network quality. We propose to decide how much data to prefetch according to the network quality. Specifically, we generalize and formulate the prefetching problem as a nonlinear optimization problem, where the goal is to find a prefetching schedule that minimizes the energy consumption of the data transmissions under the current network condition. Since it is impractical to find the optimal solution on smartphones due to the high computation overhead, we propose heuristic based solutions. We first propose a greedy algorithm, which iteratively decides how much data to prefetch based on the current network quality. Then, we propose a better solution by converting the problem into a discrete problem and solving it using dynamic programming.

1.3.3 Context-Aware Task Offloading for Wearable Devices

Wearable devices such as smartwatches do not have enough power and computation capability to process computationally intensive tasks. As a solution, wearable devices can offload computationally intensive tasks to the connected smartphone. Previous research has investigated how to make offload decisions on wearable devices [21, 22, 15], but none of them considers how to execute offloaded tasks on smartphones. We found that the existing Android system allocates CPU resources to a task according to its performance requirement, which is determined by the context of the task, i.e., whether the task is related to user interaction. A task related to user interaction has a high performance requirement, while a task unrelated to user interaction has a low performance requirement. For modern smartphones equipped with big.LITTLE cores [23], tasks related to user interaction can run on big cores (high performance) to accelerate their execution, while other unimportant tasks are allocated to little cores (high energy efficiency) to save energy. However, due to the lack of context information, tasks offloaded from wearable devices cannot be properly executed on smartphones. Current Android smartphones simply execute all offloaded tasks on little cores. Then, tasks related to user interaction cannot be processed promptly, resulting in high interaction latency on wearable devices. To avoid this problem, smartphones may execute offloaded tasks on big cores, but then energy may be wasted on unimportant tasks. To address this issue, we propose a context-aware task offloading (CATO) framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud based on their context, aiming to achieve a balance between good user experience on wearable devices and energy saving on the smartphone.

1.3.4 Characterizing and Optimizing Background Data Transfers on Smartwatches

Smartwatches are quickly gaining popularity, but their limited battery life remains an important factor that adversely affects user satisfaction [24]. To provide full functionality, smartwatches are usually connected to smartphones via Bluetooth [8]. Although there is some research on Bluetooth, such as characterizing its performance [9] and enhancing its functionalities [10], none of it focuses on the energy consumption. To fill this gap, we first build the Bluetooth power model. We found that the Bluetooth interface on smartwatches is put into the high-power mode when transferring data, and switches to the low-power mode to save energy when there is no data traffic. The mode transition is controlled by an inactivity timer, and the Bluetooth interface may continue to consume a substantial amount of energy before the timer expires (referred to as the tail effect), even when there is no network traffic. Based on the observed power characteristics, the Bluetooth power model is established. Then we perform the first in-depth investigation of background data transfers on smartwatches, and find that they are prevalent and consume a large amount of energy. For example, our experiments show that the smartwatch's battery life can be reduced to one third (or even less) due to background data transfers. Such high energy cost is caused by many unnecessary data transfers and by the energy inefficiency attributed to the adverse interaction between the data transfer pattern (i.e., frequently transferring small data) and the Bluetooth energy characteristics (i.e., the tail effect). Based on the identified causes, we propose four energy optimization techniques: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing. The first aims to reduce tail energy for delay-tolerant data transfers. The latter three are designed for the specific applications that are responsible for most background data transfers.

1.4 Organization

The remainder of the dissertation is organized as follows. Chapter 2 presents the energy-aware advertising approach considering the network quality. Chapter 3 focuses on prefetch-based energy optimization. Chapter 4 presents our context-aware task offloading framework. Chapter 5 describes our solution to characterize and optimize background data transfers on smartwatches. Finally, we conclude the dissertation and discuss future work in Chapter 6.

Chapter 2

Energy-Aware Advertising through Quality-Aware Prefetching on Smartphones

2.1 Introduction

With the proliferation of smartphones, people spend a large amount of time on mobile apps. There are two kinds of mobile apps: free apps and paid apps. According to recent studies, free apps accounted for over 90% of all apps downloaded from the market in 2015 [25, 26]. To pay for the cost of app development, free apps are usually associated with in-app advertising [4]. Although users enjoy using free apps, the energy consumption of fetching in-app ads through the cellular network may lead to significant battery drain on smartphones. For example, a recent measurement study based on the top 15 ad-supported apps shows that fetching in-app ads consumes 23% of the app's total energy and 65% of the app's total communication energy [13].

In cellular networks, the release of radio resources is controlled by multiple timers, and the timeout value can be more than 10 seconds [11, 5, 6, 27]. Thus, it is possible that the cellular interface continues to consume a large amount of energy (also referred to as the long tail problem) before the timer expires, even when there is no network traffic. Although it takes less than one second for the cellular interface to fetch an ad, due to the long tail problem, much more energy may be wasted. This problem becomes worse since ads are fetched periodically, and then the long tail problem occurs frequently [7].

To reduce the energy related to the long tail problem, techniques such as fast dormancy [28] have been proposed. Fast dormancy can reduce the tail time by switching the cellular interface into the low power state immediately after the data transmission. However, this requires support from both mobile devices and cellular carriers. Furthermore, the system may not know when the next data transmission will happen. If the next data transmission happens quickly, fast dormancy may waste energy and introduce extra delay when switching the smartphone out of the low power state.
As another solution to the long tail problem, we can predict the number of ads needed in the future and then prefetch those ads together. Then, only one tail is generated instead of the multiple tails generated by the traditional way of fetching ads periodically. Although prefetching multiple ads can reduce the tail energy, its potential cost is also high. This is because the prediction may be inaccurate, and prefetching unnecessary ads may waste both energy and cellular bandwidth. This problem becomes worse when the network quality is poor, since it then takes much longer to transmit the same amount of data and consumes more energy. Thus, prefetching should also be aware of the network condition.

Since it is hard to know the exact number of ads to prefetch, which is app-dependent and user-dependent, we have to adjust the number of ads to prefetch based on the network quality. Generally speaking, more ads should be prefetched when the network quality is good, and fewer ads should be prefetched when the network quality is poor. Although redundant ads may be prefetched under good network quality, doing so avoids other possible long tail problems. Similarly, when the network quality is poor, prefetching fewer ads can avoid the energy waste of prefetching unneeded ads.

In this chapter, we propose network quality aware prefetching algorithms. Different from traditional data-mining based prediction algorithms which only generate one option (i.e., the number of ads to prefetch), the proposed prediction algorithm generates a set of options with various probabilities. With these prefetching options, we propose two prefetching algorithms to reduce the energy consumption by considering the effect of network quality: the energy-aware prefetching algorithm aims to minimize the energy consumption, and the energy-and-data aware prefetching algorithm also considers the data usage to achieve a tradeoff between energy and data usage. The contributions of this chapter include:

• We propose a prediction algorithm to generate a set of prefetching options with various probabilities, so that different options are adopted based on the network quality to save energy.

• With multiple prefetching options, we propose two prefetching algorithms: the energy-aware prefetching algorithm aims to minimize the energy consumption, and the energy-and-data aware prefetching algorithm achieves a tradeoff between energy and data usage.

• We have implemented and evaluated the proposed prefetching algorithms under different network qualities. Evaluation results show that, compared to the traditional way of fetching ads periodically, our energy-aware prefetching algorithm can save 80% of the energy, and our energy-and-data aware prefetching algorithm can achieve similar energy savings with less data usage.

The rest of this chapter is organized as follows. Section 2.2 discusses related work. Section 2.3 presents some preliminaries. Section 2.4 introduces network quality aware prefetching algorithms. We evaluate the proposed algorithms in Section 2.5 and test them in a real app in Section 2.6. Section 2.7 concludes the chapter.

2.2 Related Work

Recently, a lot of research has been done on the energy consumption of fetching in-app ads. A detailed measurement of ad volume has shown that 50% of Android users spend more than 5% of their total network traffic on ads [29]. Although this may not seem large, fetching these ads generates periodic data transmissions, which contribute to 30% of the total radio energy consumption due to the long tail problem [7]. A case study in [30] found that 70% of the energy consumed by Angry Birds (a game app) on Android devices is related to a third-party ad library.

Prefetching has been adopted to solve the long tail problem in cellular networks [31, 18]. Parate et al. [18] proposed to predict what app will be used next, and then prefetch that app. Although they solve the problem of determining what data to prefetch, they do not consider how long an app will be used and how much data (ads) to prefetch, which is the focus of this chapter. In [31], a system was built to support informed prefetch, in which developers can use an API to provide a hint for prefetching. However, in practice it is hard for developers to know how much data will be needed in the future, which is user-dependent (e.g., how many ads are needed depends on how long an app is used). Some prefetching algorithms [32, 33] prefetch all the data when a fast network connection (e.g., WiFi, or LTE with good network quality) is available. However, in reality it is hard to know whether such a fast network connection will become available before the data is needed (e.g., in an indoor area with poor network quality and no WiFi). In our work, we do not make this assumption, and we try to find the most energy efficient way to prefetch under the current network quality.

Our work is most closely related to [13, 17], which attempt to reduce the energy consumption of fetching in-app ads. The CAMEO framework [17] provides a middleware to prefetch context-dependent ads for all apps.
It predicts what apps will be used in the future to determine ad contexts and prefetches a fixed number of ads for each app. However, the number of ads to prefetch is app-dependent and user-dependent, and thus not fixed. Our work can be considered complementary to CAMEO, as it focuses on predicting the number of ads to prefetch for a certain app and user under the current network quality. Mohan et al. [13] proposed an overbooking model at the ad server/network to ensure that all ads can be displayed before their deadline, and predicted how many ads to prefetch by using the 80th percentile value of the number of ads displayed in historical records. Different from using a simple rule to predict the number of ads to prefetch, which may consume much more energy under some network conditions, our prefetching algorithms determine the number of ads to prefetch by considering the effect of network quality.

2.3 Preliminaries

In this section, we first provide some background of how in-app advertising works, and then present our design considerations and basic ideas.

2.3.1 Background: In-app Advertising

We first briefly describe the current in-app ad ecosystem, then introduce the in-app ad format and size, and finally discuss what ads to prefetch.

2.3.1.1 In-app Ad Ecosystem

The current in-app advertising ecosystem involves three main parties: Apps, Ad Networks, and Advertisers. Apps on smartphones rely on the embedded ad library, which is usually provided by the same company that provides the corresponding ad network, to fetch and display ads. Advertisers can register with the ad network and initiate ad campaigns. An ad campaign is a contract between advertisers and the ad network (e.g., delivering 10,000 ads within a day). The responsibility of the ad network is to complete all the ad campaigns. Current ad networks usually deploy real-time bidding (RTB) strategies [34] to display the most valuable and relevant ads to mobile users. Although the bidding price of an ad may change in an RTB-based system, recent studies [13] have shown that the bidding price is stable over a short period of time (e.g., several hours), which is longer than the running time of most apps.

2.3.1.2 In-app Ad Format and Size

In-app ad formats include banners, rich media, and video. According to recent market analysis and predictions [35], rich media is becoming the most popular ad format. Thus, we consider prefetching rich media ads in this chapter. When prefetching a rich media ad, the whole file of the ad needs to be fetched to avoid extra network activities, although some parts of the file are only useful after certain user interactions. According to ad specifications [36, 37, 38, 39], the whole file size of a rich media ad ranges from 50 KB to 200 KB.

Figure 2.1. Cumulative distribution of how many ads are displayed (x-axis: number of ads displayed; y-axis: cumulative probability (%))

2.3.1.3 What Ads to Prefetch

Ads to be displayed are determined by app-dependent contexts such as app category [17], so it is easier to decide what ads to prefetch for a specific app. Doing so for all apps would be much harder, because different apps may have different contexts and then different types of ads would have to be prefetched. Thus, ads are only prefetched for a certain app, where we know what app is under consideration and what ads to prefetch. Specifically, ads are prefetched when new ads are needed in the app, and unused ads are discarded when the app is closed.

2.3.2 Design Considerations and Basic Ideas

The energy consumption of prefetching ads is affected by the prediction accuracy. If more ads are prefetched than necessary, prefetching the unneeded ads may waste a lot of energy. On the other hand, if fewer ads are prefetched than needed, more prefetches will be required, resulting in extra long tail problems.

In cellular networks, the energy consumption of prefetching ads varies with the network quality. For example, as we measured in an LTE network, downloading a 100 KB ad consumes 2 joules when the network quality is poor, but only 0.1 joules when the network quality is good. The energy consumption of a long tail remains about 10 joules under different network qualities. Traditional data-mining based algorithms do not consider the network quality, and only generate one option (i.e., the number of ads to prefetch), which may be too high or too low and thus waste energy. To further reduce the energy consumption, we have to consider the network quality by designing adaptive algorithms which can adjust the number of ads to prefetch accordingly. To achieve this, we generate multiple options based on the historical app usage, and choose the one with the least energy consumption according to the current network quality.

For example, Fig. 2.1 shows the cumulative distribution of the number of ads displayed for an app according to our trace. The largest number of ads displayed is 60 and the median is 15. We can prefetch 60 ads, or we can prefetch 15 ads with a 50% probability that 45 more ads will be prefetched in the future. Under poor network quality (i.e., downloading an ad consumes 2 joules), prefetching 60 ads consumes 2 × 60 + 10 = 130 joules. As the second option, we can prefetch 15 ads, and then the expected energy consumption is 2 × 15 + 10 + 50% × (2 × 45 + 10) = 90 joules. In this case, prefetching fewer ads (the second option) wins.
Under good network quality (i.e., downloading an ad consumes 0.1 joules), prefetching 60 ads consumes 0.1 × 60 + 10 = 16 joules. As the second option, we can prefetch 15 ads, and then the expected energy consumption is 0.1 × 15 + 10 + 50% × (0.1 × 45 + 10) = 18.75 joules. In this case, prefetching more ads (the first option) wins.

The above example shows the basic idea of our network quality aware prefetching algorithms. We first design a prediction algorithm which generates multiple prefetching options (i.e., the number of ads to prefetch) with detailed information to estimate the probability of future prefetches. Then, we estimate the energy consumption of each option by considering the effect of network quality and choose the best one accordingly.
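The comparison above can be reproduced with a small sketch. This is our illustrative calculation rather than code from the dissertation; the per-ad and tail energy values are the measured example values quoted in the text.

```python
# Expected energy (joules) of a prefetching option, following the worked
# example: prefetch n ads now, and with probability alpha the remaining
# (n_max - n) ads must be fetched later, incurring one more tail.
def expected_energy(e_ad, e_tail, n, n_max, alpha):
    prefetch_now = e_ad * n + e_tail
    prefetch_later = alpha * (e_ad * (n_max - n) + e_tail)
    return prefetch_now + prefetch_later

# Poor network quality (2 J per ad): the smaller prefetch wins.
poor_all = expected_energy(2.0, 10.0, 60, 60, 0.0)     # 130 J
poor_median = expected_energy(2.0, 10.0, 15, 60, 0.5)  # 90 J
# Good network quality (0.1 J per ad): prefetching everything wins.
good_all = expected_energy(0.1, 10.0, 60, 60, 0.0)     # 16 J
good_median = expected_energy(0.1, 10.0, 15, 60, 0.5)  # 18.75 J
```

Comparing these expected values per option, rather than using a single fixed rule, is exactly what lets the prefetching decision flip as the per-ad download energy changes with network quality.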

2.4 Network Quality Aware Prefetching

In this section, we first present the prediction algorithm which can generate multiple prefetching options with various probabilities, and then describe two prefetching algorithms to determine the number of ads to prefetch according to the network quality.

2.4.1 Prediction Based on a Series of Probabilities

Our prediction algorithm aims to generate multiple prefetching options for how many ads to prefetch. Specifically, our goal is to generate a set of options of the form (α, n), where α is a predefined probability, and n is the predicted number of ads corresponding to α, such that the probability of displaying more than n ads in an app is α. Since the number of ads can be calculated by dividing the app duration, which spans from the app being opened to being closed, by the ad refresh interval, which is fixed and easy to find, the goal becomes predicting the app duration corresponding to α.

For a certain app, the prediction can be based on the percentile app duration, which can be obtained from the app usage records. For example, the 20th percentile app duration is the time value t below which 20% of the observed app durations fall. That is, the probability of using the app longer than t is 80%, and the corresponding prefetching option is (α = 80%, n = t / ad refresh interval). Thus, to predict the app duration for a certain value of α, we only need to calculate the corresponding percentile app duration.

It may not be a good idea to consider all app usage records for prediction, because some of them may be misleading at the current time and location. For a certain app, a user may use it in some specific context, which has a different app duration from other contexts. For example, a student may read the newspaper at school during a class break or at home. Due to the time limitation of the class break, the student may spend much less time reading the newspaper at school than at home. As a result, app usage records generated at school are not suitable for predicting the app duration at home, and vice versa. Thus, to predict the app duration, we should only consider app usage records generated in a similar context. Then, we need to identify the context of the app, and partition the app usage records based on the context.
It is very difficult to partition app usage based on context for two reasons. First, it is hard to know whether a user runs an app in some specific context or just runs it randomly. Second, even if such contexts exist, it is hard to identify them because of the diversity of user behaviors. For example, a user may run one app at particular locations, while using another app at particular times and locations. Thus, there is no simple rule to identify these contexts. To address this problem, we adopt the clustering technique, which groups app usage records together in such a way that app usage records in the same group (called a cluster) have more similar features (e.g., time and location) than those in other groups.

Table 2.1. Mutual information between app duration and features
  total # of app usage records          170K
  total # of apps                       2K
  H(app duration)                       5.59 bits
  I(app duration, time)                 1.18 bits
  I(app duration, location)             0.71 bits
  I(app duration, app category)         0.13 bits
  I(app duration, last app name)        0.08 bits
  I(app duration, last app duration)    0.04 bits
  I(app duration, recent call-SMS)      0.02 bits

If these app usage records are found tightly grouped (clustered), it means that this app is used in specific contexts, which can be represented by the centroids of the clusters. Then, we can classify the current app usage to a cluster, and only use the app usage records in that cluster for prediction.

2.4.1.1 Feature Selection

It is important to carefully select features that affect the app duration. In information theory, for a discrete random variable X, the entropy H(X) measures the uncertainty of X. The mutual information between random variables X and Y, denoted as I(X, Y), represents the mutual dependence between X and Y. A higher mutual information between X and Y suggests that X and Y are more relevant to each other. Thus, features that have the highest mutual information with the app duration should be selected. In our experiment based on the LiveLab dataset [40], we discretize the app duration into 50 values, and calculate the mutual information between it and different features. From the results shown in Table 2.1, time and location have the highest mutual information with the app duration, so they are chosen for clustering.
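This feature-scoring step can be sketched as follows, assuming discretized duration and feature values; the function names and toy data are ours, not the dissertation's code.

```python
from collections import Counter
from math import log2

def entropy(xs):
    # H(X) over the empirical distribution of the samples xs.
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    # I(X; Y) = H(X) + H(Y) - H(X, Y), using per-sample pairing of xs and ys.
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Toy records: duration is fully determined by location, independent of weekday.
durations = [0, 0, 1, 1, 2, 2]
locations = ['home', 'home', 'office', 'office', 'bus', 'bus']
weekdays = [0, 1, 0, 1, 0, 1]
```

On the toy data, `mutual_information(durations, locations)` equals `entropy(durations)` (location fully predicts duration), while `mutual_information(durations, weekdays)` is zero, which is the pattern used to rank candidate features.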

2.4.1.2 App Usage Record

An app usage record includes information about app duration, time, and location. An interesting observation from the LiveLab dataset is that most time intervals between two consecutive usages of the same app are either very short (less than 5 minutes) or very long (more than 100 minutes), as shown in Fig. 2.2, where the time interval is counted as 100 minutes if it exceeds 100 minutes. By analyzing

Figure 2.2. Time interval between two consecutive usages of the same app: (a) distribution of time interval; (b) distribution of time interval less than 10 minutes (x-axis: time interval (min); y-axis: number of records)

Figure 2.3. Partitioning app usage records into clusters by K-means: (a) gap statistic to determine the number of clusters; (b) app duration distributions in two clusters

user behaviors within those short time intervals, we find that most app usages are interrupted because the user needs to reply to a message or quickly check some information such as the clock or weather. Thus, app usages whose time interval is less than 5 minutes are considered as one usage. Based on this observation, we do not discard unused ads immediately when the app is closed. Instead, we wait five more minutes before discarding them. If the app is reused within 5 minutes, it is considered the same usage as the last one, and those ads can still be used for display.

As mentioned above, time and location are selected as features for clustering. In an app usage record, time has two scales: hour of the day, and day of the week. Location is represented by the cell id. Although cell id based localization incurs errors as high as 500 meters, it is sufficient to recognize coarse-grained locations like “home” or “office”. Unlike GPS based localization, the cell id based approach is more energy efficient.
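The five-minute merge rule can be sketched as follows; the record format and names are our assumptions, not the dissertation's implementation.

```python
# Merge consecutive usages of the same app into one usage when the gap
# between them is under five minutes, mirroring the observation above.
# Records are (start, end) timestamps in seconds, sorted by start time.
GAP_THRESHOLD = 5 * 60  # seconds

def merge_sessions(records):
    merged = []
    for start, end in records:
        if merged and start - merged[-1][1] < GAP_THRESHOLD:
            # Short gap: extend the previous session instead of opening a new one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

For example, `merge_sessions([(0, 100), (200, 400), (1000, 1200)])` merges the first two usages (100-second gap) but keeps the third separate (600-second gap), yielding `[(0, 400), (1000, 1200)]`.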

2.4.1.3 Partitioning App Usage Records into Clusters

In order to group app usage records together, different kinds of clustering algorithms can be applied. We choose the most commonly used clustering algorithm, K-means, because of its simplicity. The K-means algorithm partitions app usage records into k clusters in such a way that every app usage record is assigned to the cluster with the nearest centroid. k is assumed to be less than 5; otherwise, the app usage records in each cluster may be insufficient to calculate the percentile app duration for prediction. To determine the value of k, we use the gap statistic [41], which is one of the best cluster validity methods for determining the number of clusters for an unsupervised clustering algorithm like K-means. For example, in Fig. 2.3a, the gap statistic shows a peak at k = 2, which means that the app usage records can be most tightly grouped into two clusters. After partitioning the app usage records into two clusters (C1 and C2), we can see a notable difference between the distributions of app duration in these two clusters, as shown in Fig. 2.3b. We update the clusters after every ten app usage records are added. As measured on a Samsung Galaxy S6 phone, running the clustering algorithm takes 86 ms on average and consumes less than 0.1 joules, which is negligible compared to the energy consumption of downloading ads.
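The clustering step can be illustrated with a simplified one-dimensional sketch. The dissertation clusters on time and location and uses the full gap statistic of [41]; this toy version clusters scalar values and compares against uniform reference sets, and all names are ours.

```python
import random
from math import log

def kmeans(points, k, iters=20, seed=0):
    # Plain 1-D K-means: assign each point to the nearest center,
    # then recompute centers, for a fixed number of iterations.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

def within_dispersion(centers, clusters):
    # Sum of squared distances of points to their cluster center.
    return sum((p - c) ** 2 for c, cl in zip(centers, clusters) for p in cl)

def gap_statistic(points, k, n_refs=10, seed=1):
    # Gap(k) = E[log W_ref(k)] - log W(k); a peak over k suggests k clusters.
    rng = random.Random(seed)
    lo, hi = min(points), max(points)
    w = within_dispersion(*kmeans(points, k))
    ref_logs = []
    for _ in range(n_refs):
        ref = [rng.uniform(lo, hi) for _ in points]
        ref_logs.append(log(within_dispersion(*kmeans(ref, k))))
    return sum(ref_logs) / len(ref_logs) - log(w)
```

For two well-separated groups of durations, `gap_statistic(points, 2)` exceeds `gap_statistic(points, 1)`, mirroring the peak at k = 2 in Fig. 2.3a.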

2.4.1.4 Classifying Current App Usage to a Cluster

We use the naive Bayes classifier [42] to classify the current app usage to an identified cluster, and then use the app usage records in that cluster for prediction. The structure of the naive Bayes classifier is shown in Fig. 2.4. To evaluate its accuracy, we randomly select some app usage records which have been assigned to a cluster, and then test whether they can be classified to the right cluster. The classification result is represented by a confusion matrix. As an example shown in Table 2.2, our classifier successfully classifies app usage records in cluster C1 for 198 times and those in cluster C2 for 100 times without any error. The average classification accuracy over all clusters is 99.8%.

(Diagram: a Cluster node connected to the feature nodes Location, Hour of the day, and Day of the week.)

Figure 2.4. Naive Bayes classifier

Table 2.2. Confusion matrix
        C1    C2
  C1   198     0
  C2     0   100
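The classification step can be sketched with a minimal discrete naive Bayes classifier, assuming the features from Fig. 2.4 (location, hour of the day, day of the week) are discretized; the names, toy data, and smoothing details are ours.

```python
from collections import Counter, defaultdict
from math import log

def train(records):
    # records: list of (cluster_label, feature_tuple) pairs.
    priors = Counter(label for label, _ in records)
    likelihood = defaultdict(Counter)  # (label, feature_index) -> value counts
    for label, feats in records:
        for i, value in enumerate(feats):
            likelihood[(label, i)][value] += 1
    return priors, likelihood

def classify(priors, likelihood, feats):
    total = sum(priors.values())
    best_label, best_score = None, float('-inf')
    for label, count in priors.items():
        score = log(count / total)
        for i, value in enumerate(feats):
            counts = likelihood[(label, i)]
            # Laplace smoothing so unseen feature values keep nonzero probability.
            score += log((counts[value] + 1) /
                         (sum(counts.values()) + len(counts) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A usage record such as `('home', 'evening')` is then classified to the cluster whose prior and per-feature likelihoods give the highest log score.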

2.4.1.5 Generating Prediction Results for a Cluster

After classifying the current app usage to a cluster, we use the app usage records in that cluster to predict the app duration. First, we define a series of probabilities {α_1, α_2, ..., α_M}, where 100% > α_1 > α_2 > ... > α_M = 0%. Then, for each α_i, the app duration is predicted as the corresponding percentile app duration A_i observed in that cluster, such that the probability of running the app for more than A_i seconds in the future is α_i, where A_M is the longest app duration observed in that cluster. Finally, we divide A_i by the ad refresh interval, which is a fixed value, to obtain the number of ads n_i to prefetch. The final set of prefetching options is

R = {(α_i, n_i) | i = 1, 2, ..., M},

where n_M is the largest number of ads displayed in history. We set M = 10 and α_i = 90%, 80%, ..., 10%, 0% to generate prefetching options.
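This option-generation step can be sketched as follows; the variable names and the nearest-rank percentile choice are our assumptions, and a real implementation could use a different percentile method.

```python
# Turn a cluster's app-duration records (seconds) into prefetching
# options (alpha_i, n_i), one per predefined probability alpha_i.
def prefetch_options(durations, ad_refresh_interval,
                     alphas=(0.9, 0.8, 0.7, 0.6, 0.5,
                             0.4, 0.3, 0.2, 0.1, 0.0)):
    ds = sorted(durations)
    options = []
    for alpha in alphas:
        # A_i is the (1 - alpha) percentile duration: the probability of
        # running the app longer than A_i is alpha.
        idx = min(int((1 - alpha) * len(ds)), len(ds) - 1)
        n_i = max(1, round(ds[idx] / ad_refresh_interval))
        options.append((alpha, n_i))
    return options
```

With durations of 60, 120, ..., 600 seconds and a 30-second ad refresh interval, the option for α = 0% uses the longest observed duration (600 s, i.e., 20 ads), and the option for α = 50% uses the median region (12 ads).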

2.4.2 Prefetching Algorithms

To reduce the energy consumption of prefetching, the network quality is taken into account to determine how many ads to prefetch among the options provided by R. In this section, we first provide an overview of the energy model of the LTE network, and then discuss how to define and obtain the network quality. Finally, we give a detailed description of the two prefetching algorithms: the energy-aware prefetching algorithm, and the energy-and-data aware prefetching algorithm.

Figure 2.5. Power level of using the LTE cellular interface to download an ad (x-axis: time (sec); y-axis: power (mW); annotated states: promotion, data transmission, tail, idle)

2.4.2.1 Energy Model

In cellular networks, there are different network states, known as the radio resource control (RRC) states. From a user perspective, various RRC states have different latency and energy characteristics. As LTE has been widely deployed, its energy model is used to formulate the energy consumption of prefetching. There are two main RRC states for LTE: CONNECTED, a high-power state where the network resource is reserved, and IDLE, a low-power state where the network resource is released. The cellular interface must be in CONNECTED to send or receive data, and it takes some time to change the state from IDLE to CONNECTED, known as the promotion delay. To avoid unnecessary promotion delay, the cellular interface remains in CONNECTED for several seconds before switching back to IDLE. As a result, the cellular interface has to stay in the high-power state for some time (i.e., the long tail) after a data transmission. To reduce the tail energy, LTE deploys the Discontinuous Reception (DRX) technique when it is in CONNECTED and there is no data transmission. The goal of DRX is to save the tail energy: the cellular interface is only active for a small fraction of time to monitor the downlink control channel and deliver small control messages [3]. Thus, the power level of the tail is much lower than that during the data transmission, although it is still much higher than IDLE [43]. Note that the smartphone and the backbone network are still connected during the tail time, so no promotion process is needed before starting to transmit data.

The power consumption of the LTE cellular interface can be generalized into three states: promotion, data transmission, and tail, as shown in Fig. 2.5. The power consumption of these three states is denoted as P_pro, P_cell, and P_tail, respectively.

Table 2.3. Power consumption of LTE cellular interface
  State           Power (mW)      Duration (s)
  Idle (GS6)      498 ± 35.4      -
  LTE Promotion   1286.3 ± 36.5   0.5 ± 0.1
  LTE Data        1959.2 ± 42.1   -
  LTE Tail        1192.4 ± 31.4   11.5 ± 0.9

The promotion time T_pro and the tail time T_tail are fixed. The data transmission time depends on the size of the data and the network condition when the data is requested. To prefetch n ads, it takes time

T_pref(n) = (n × S_ad) / t_d,    (2.1)

where S_ad is the ad size and t_d is the downlink throughput. Putting them together, the total energy of prefetching n ads through LTE can be formulated as

E_pref(n) = P_pro × T_pro + P_cell × T_pref(n) + P_tail × T_tail.    (2.2)

To determine the power consumption of the LTE cellular interface, similar to previous work [44], we use the Monsoon power monitor as the power supply for our Samsung Galaxy S6 phone (GS6) to measure the average instant power. The results are summarized in Table 2.3, where the power consumption is measured with the screen on.
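As a concrete sketch, Eqs. (2.1) and (2.2) with the Galaxy S6 numbers from Table 2.3 can be written as follows. The units (watts, seconds, bits) and the function names are our own choices for illustration, not part of the original model.

```python
# Sketch of the LTE prefetching energy model of Eqs. (2.1)-(2.2), using the
# Galaxy S6 measurements from Table 2.3. Power is in watts, time in seconds,
# and data sizes in bits; variable names mirror the symbols in the text.

P_PRO, T_PRO = 1.2863, 0.5      # promotion power (W) and duration (s)
P_CELL = 1.9592                 # data-transmission power (W)
P_TAIL, T_TAIL = 1.1924, 11.5   # tail power (W) and duration (s)

def t_pref(n, s_ad, t_d):
    """Eq. (2.1): time to prefetch n ads of size s_ad (bits) at throughput t_d (bits/s)."""
    return n * s_ad / t_d

def e_pref(n, s_ad, t_d):
    """Eq. (2.2): total energy (joules) to prefetch n ads over LTE."""
    return P_PRO * T_PRO + P_CELL * t_pref(n, s_ad, t_d) + P_TAIL * T_TAIL

# Example: 10 ads of 100 KB each at 5.5 Mbps downlink throughput.
energy = e_pref(10, 100 * 8 * 1024, 5.5e6)
```

Note that even a zero-byte prefetch costs the fixed promotion and tail energy, which is exactly why batching ads into one prefetch saves energy.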

2.4.2.2 Network Quality

The downlink throughput, td, is used as the indicator of network quality. However, the downlink throughput cannot be directly obtained from the smartphone. There are two methods to estimate it. The first is based on signal strength information such as the signal-to-noise ratio (SNR) or the signal-to-interference-plus-noise ratio (SINR) [33]. However, this method is inaccurate and is only used by simulators such as ns-3. The second method is active probing, which measures the downlink throughput by downloading some data [45, 46]. Fig. 2.6a shows the measured LTE downlink throughput with different amounts of data downloaded over a TCP connection. As shown in the figure, a large amount of data (i.e., more than 1 MB) is required to perform such a measurement. This is because the time to download a small amount of data may depend on factors other than the downlink throughput, such as the initial congestion window, the slow-start mechanism of TCP, etc. Since TCP slow-start only happens at the beginning of the data transmission, an enhanced method can divide the data in half and measure the throughput over only the second half of the download. As shown in Fig. 2.6b, with the enhanced method, 200 KB of data is enough to accurately measure the throughput.

Figure 2.6. Measuring LTE downlink throughput with different amounts of data. (a) 1000 KB of data is required to accurately measure the downlink throughput of 5.5 Mbps. (b) By using the second half of the data, 200 KB of data is required to accurately measure the downlink throughput of 5.5 Mbps.
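The "second half" estimator described above can be sketched as follows, assuming the app records (timestamp, cumulative bytes) samples while the probe file downloads. The sampling interface and the function name are illustrative, not part of the original implementation.

```python
# Sketch of the enhanced active-probing estimator: the first half of the
# download absorbs TCP slow-start, so throughput is timed over the second
# half only. Input is a list of (timestamp_s, cumulative_bytes) samples
# taken during the download, in increasing order.

def second_half_throughput(samples):
    """Estimate downlink throughput (bits/s) from the second half of a download."""
    total = samples[-1][1]
    # Find the first sample at or past the halfway point of the transfer.
    for i, (t, b) in enumerate(samples):
        if b >= total / 2:
            break
    t_mid, b_mid = samples[i]
    t_end, b_end = samples[-1]
    return (b_end - b_mid) * 8 / (t_end - t_mid)
```

On a synthetic trace where slow-start dominates the first half, only the steady-state rate of the second half is reported, matching the behavior shown in Fig. 2.6b.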

2.4.2.3 Energy-aware Prefetching Algorithm

Given the multiple prefetching options provided by R and the current downlink throughput, the energy-aware algorithm aims to minimize the energy consumption of prefetching ads. To describe the algorithm, we introduce two variables: α0 = 100% and n0 = 0, meaning that with probability 100% more than 0 ads will be displayed during an app usage. There are three cases when running the algorithm: 1) the algorithm runs for the first time, and n0 ads have been displayed; 2) nj (1 ≤ j < M) ads have been displayed but more are needed (i.e., the prefetching option (αj, nj) was chosen by the last run of the algorithm); 3) the number of displayed ads is equal to or larger than nM, the largest number of ads displayed in history.

The first two cases can be considered together. Given that np (0 ≤ p < M) ads have been displayed, the algorithm calculates the per-ad energy consumption E^p_per-ad(i) for each prefetching option (αi, ni) with i > p, and chooses the one with the minimum E^p_per-ad(i). To calculate E^p_per-ad(i), we first calculate the number of remaining ads, which is (nM − np). Then, we estimate the total energy consumption of prefetching the remaining ads, E^p_total(i). Finally, we have

E^p_per-ad(i) = E^p_total(i) / (nM − np).    (2.3)

To calculate E^p_total(i), two parts of energy consumption need to be considered.

The first part is the energy consumption of prefetching (ni − np) ads. This part of energy will be consumed immediately after the prefetching option (αi, ni) is chosen, and can be calculated by Epref (ni − np) according to the current downlink throughput. The second part is the energy consumption of prefetching additional ads in the future, if and only if more than ni ads are needed. This part of energy cannot be accurately calculated, because future prefetching behaviors are hard to predict.

To simplify the calculation, we assume that if more than ni ads are needed, one additional prefetch downloads all the remaining ads, i.e., (nM − ni) ads. Since users usually stay in one place while using an app, the network quality should not change much, so this energy consumption can be calculated by Epref(nM − ni). With the energy consumption of the current prefetch and the future prefetch, we have

E^p_total(i) = Epref(ni − np) + pro × Epref(nM − ni),    (2.4)

where pro = αi/αp is the probability of displaying more than ni ads given that np ads have already been displayed. If no ad has been displayed before (i.e., p = 0), pro = αi.

In the third case, since the number of displayed ads is equal to or larger than the largest number displayed in history, we cannot decide how many ads to prefetch based on historical app usage. Thus, we prefetch a fixed number of ads. A complete description of the energy-aware prefetching algorithm is shown in Algorithm 1.

Algorithm 1: Energy-aware prefetching algorithm

Input:  R = {(αi, ni) | i = 1, 2, ..., M}, number of ads already displayed nd, α0 = 100%, n0 = 0
Output: number of ads to prefetch npref

 1: npref ← 10
 2: Emin ← +∞
 3: if nd ≥ nM then
 4:     return npref
 5: else
 6:     find p s.t. np = nd
 7: end if
 8: for i ← p + 1 to M do
 9:     n ← ni − np
10:     pro ← αi/αp
11:     E^p_total(i) ← Epref(n) + pro × Epref(nM − ni)
12:     E^p_per-ad(i) ← E^p_total(i)/(nM − np)
13:     if E^p_per-ad(i) < Emin then
14:         npref ← n
15:         Emin ← E^p_per-ad(i)
16:     end if
17: end for
18: return npref
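Algorithm 1 can be transcribed almost line for line into Python. In this sketch, the energy model Epref is passed in as a function so the current downlink throughput is captured at call time; the option list R and the sentinel (α0, n0) = (100%, 0) follow the text, while the function and variable names are our own.

```python
# A direct transcription of Algorithm 1. R is the list of prefetching options
# [(alpha_i, n_i), ...] sorted by n_i; e_pref(n) is the LTE energy model of
# Eq. (2.2) evaluated at the current downlink throughput. As in the text,
# the displayed count n_d is assumed to match one of the n_i values.

DEFAULT_PREFETCH = 10  # fixed count used when history gives no guidance (case 3)

def energy_aware_prefetch(R, n_displayed, e_pref):
    """Return the number of ads to prefetch, minimizing per-ad energy (Eq. 2.3)."""
    options = [(1.0, 0)] + list(R)          # prepend the sentinel (alpha_0, n_0)
    M = len(options) - 1
    n_max = options[M][1]                   # n_M, largest count seen in history
    if n_displayed >= n_max:                # case 3: beyond all history
        return DEFAULT_PREFETCH
    p = next(i for i, (_, n) in enumerate(options) if n == n_displayed)
    best_n, e_min = DEFAULT_PREFETCH, float("inf")
    for i in range(p + 1, M + 1):
        alpha_i, n_i = options[i]
        n = n_i - options[p][1]
        pro = alpha_i / options[p][0]       # probability of needing > n_i ads
        e_total = e_pref(n) + pro * e_pref(n_max - n_i)   # Eq. (2.4)
        e_per_ad = e_total / (n_max - options[p][1])      # Eq. (2.3)
        if e_per_ad < e_min:
            best_n, e_min = n, e_per_ad
    return best_n
```

For example, with a toy energy model of 10 J tail plus 1 J per ad and options {(50%, 4), (20%, 8)}, prefetching all 8 ads up front is cheaper per ad than prefetching 4 and risking a second tail.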

2.4.2.4 Energy-and-data Aware Prefetching Algorithm

Our energy-and-data aware prefetching algorithm also considers the effect of prefetch- ing on the cellular data usage. In LTE, energy efficiency is preferred over data saving since big data plans are becoming more and more affordable. However, it is still necessary to provide a tradeoff between energy and data usage, because it is unreasonable to waste a large amount of data usage in exchange for a small improvement of energy efficiency. The energy-and-data aware prefetching algorithm considers both per-ad data usage and per-ad energy consumption. Given that np (0 ≤ p < M) ads have been displayed, for each prefetching option (αi, ni) such that i > p, the per-ad data p usage, Sper ad(i), can be calculated as

S^p_per-ad(i) = ((ni − np) × Sad) / N^p_e(i),    (2.5)

where N^p_e(i) = [ Σ_{j=p+1..i} (nj − np) × (αj−1 − αj) ] / (αp − αi) is the expected number of ads that can be displayed among the prefetched (ni − np) ads.

Algorithm 2: Energy-and-data aware prefetching algorithm

Input:  R = {(αi, ni) | i = 1, 2, ..., M}, number of ads already displayed nd, α0 = 100%, n0 = 0, β
Output: number of ads to prefetch npref

 1: npref ← 10
 2: Cmin ← +∞
 3: if nd ≥ nM then
 4:     return npref
 5: else
 6:     find p s.t. np = nd
 7: end if
 8: for i ← p + 1 to M do
 9:     n ← ni − np
10:     pro ← αi/αp
11:     E^p_total(i) ← Epref(n) + pro × Epref(nM − ni)
12:     E^p_per-ad(i) ← E^p_total(i)/(nM − np)
13:     N^p_e(i) ← [ Σ_{j=p+1..i} (nj − np) × (αj−1 − αj) ]/(αp − αi)
14:     S^p_per-ad(i) ← [(ni − np) × Sad]/N^p_e(i)
15:     C^p_per-ad(i) ← E^p_per-ad(i) + β × S^p_per-ad(i)
16:     if C^p_per-ad(i) < Cmin then
17:         npref ← n
18:         Cmin ← C^p_per-ad(i)
19:     end if
20: end for
21: return npref

After obtaining S^p_per-ad(i), the per-ad cost, C^p_per-ad(i), can be calculated by adding S^p_per-ad(i) and E^p_per-ad(i) together with a parameter β to perform the tradeoff:

C^p_per-ad(i) = E^p_per-ad(i) + β × S^p_per-ad(i).    (2.6)

In the energy-and-data aware prefetching algorithm, the prefetching option that yields the smallest value of C^p_per-ad(i) is chosen. A complete description of the energy-and-data aware prefetching algorithm is shown in Algorithm 2.
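The two quantities Algorithm 2 adds on top of Algorithm 1, the expected number of displayed ads N^p_e(i) of Eq. (2.5) and the per-ad cost of Eq. (2.6), can be sketched as follows. The function names and the `options` list layout (with the sentinel (α0, n0) = (1.0, 0) prepended) are illustrative choices.

```python
# Sketch of the extra bookkeeping in Algorithm 2. `options` is the list
# [(alpha_0, n_0), ..., (alpha_M, n_M)] with (1.0, 0) prepended, as in the
# energy-aware algorithm; p is the index of the current option and i > p.

def expected_displays(options, p, i):
    """N_e^p(i): expected number of ads displayed among the (n_i - n_p) prefetched."""
    alpha_p, n_p = options[p]
    alpha_i = options[i][0]
    total = sum((options[j][1] - n_p) * (options[j - 1][0] - options[j][0])
                for j in range(p + 1, i + 1))
    return total / (alpha_p - alpha_i)

def per_ad_cost(e_per_ad, options, p, i, s_ad, beta):
    """Eq. (2.6): trade per-ad energy against per-ad data usage with weight beta."""
    n_i = options[i][1]
    n_p = options[p][1]
    s_per_ad = (n_i - n_p) * s_ad / expected_displays(options, p, i)  # Eq. (2.5)
    return e_per_ad + beta * s_per_ad
```

Setting β = 0 recovers the pure energy-aware choice; a larger β penalizes options that prefetch many ads unlikely to be displayed.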

2.5 Performance Evaluation

In this section, we first introduce the app usage traces and the network quality traces, and then use trace-driven simulations to evaluate the performance of our prefetching algorithms.

2.5.1 Evaluation Setup

In the evaluation, the ad refresh interval is set to two values: 30 seconds, which represents an aggressive way to display ads, and 60 seconds, which is usually recommended by ad platforms like [38]. Three ad sizes are considered based on the discussion in Section 2.3.1.2: 50 KB, 100 KB, and 200 KB. We compare four algorithms: "Non-prefetch" is the traditional way of periodically fetching ads, used as the benchmark. "80th Percentile" is a prefetching algorithm that uses the 80th percentile of the number of ads displayed in historical records to determine how many ads to prefetch, based on [13]. "Max" is a prefetching algorithm that prefetches the largest number of ads displayed in history. "Perfect" is a prefetching algorithm that prefetches the exact number of ads that will be used by the app. Note that the perfect algorithm does not exist in practice; it only provides a performance upper bound.

2.5.1.1 App Usage Trace

We use all 34 users, and 20 apps for each of them, from the LiveLab dataset to generate app usage traces. Each app usage trace contains the usage records of one app by one user. To train our prediction algorithm, each app usage trace is split equally into two parts: half of the trace is used as training data and the other half is used for evaluation.

2.5.1.2 Measuring Network Quality

We use our Samsung Galaxy S6 phone to collect throughput traces over an LTE network by downloading files from a mock server. Trace 1 (Fig. 2.7a) is an outdoor trace with an average downlink throughput of 5.5 Mbps. Trace 2 (Fig. 2.7b) is an indoor trace with an average downlink throughput of 2.1 Mbps. The throughput of these two traces is much lower than that measured by speed test tools like Speedtest [47], which can reach 13 Mbps in the outdoor environment. The reasons are as follows. First, in our measurement, we use a single HTTP connection over TCP, the same way ads are fetched, to download files. A single TCP connection cannot fully utilize the LTE bandwidth, because its congestion control cannot quickly adapt to changes in network speed. Second, speed test tools usually optimize connection parameters and use multiple threads to saturate the network connection, so they report a higher-than-average throughput.

Figure 2.7. Measurement-based throughput traces (measured vs. median throughput per sample). Throughput is randomly collected when walking inside and outside the building. (a) Trace 1. (b) Trace 2.

Figure 2.8. Energy ratio for measurement-based throughput (Trace 1), with 50 KB, 100 KB, and 200 KB ads. (a) 30-second ad refresh interval. (b) 60-second ad refresh interval.

Figure 2.9. Energy ratio for measurement-based throughput (Trace 2), with 50 KB, 100 KB, and 200 KB ads. (a) 30-second ad refresh interval. (b) 60-second ad refresh interval.

Figure 2.10. CDF plots of the number of ads prefetched by the energy-aware prefetching algorithm (30-second ad refresh interval). (a) Fewer ads are prefetched as the ad size increases from 50 KB to 200 KB (Trace 1). (b) More ads are prefetched when the network quality is better (100 KB ad).

2.5.2 Evaluation Result (Measurement-based Network Quality)

Fig. 2.8 and Fig. 2.9 show the energy performance of the different prefetching algorithms on the two measurement-based throughput traces (i.e., Trace 1 and Trace 2). The energy consumption of the non-prefetch algorithm is used as the benchmark to calculate the energy ratio of the other algorithms. Based on the results, we have three observations. First, all prefetching algorithms save less energy when the ad size increases from 50 KB to 200 KB or when the network quality is poor (i.e., Trace 2). This is because prefetching algorithms usually download more ads than needed; as the ad size increases or under poor network quality, more energy is wasted on downloading those unnecessary ads. Second, when the ad refresh interval is shorter (i.e., 30 seconds), more ads are displayed and there is more opportunity for prefetching to save energy. Third, our prefetching algorithms are comparable to the perfect algorithm in energy consumption, and outperform all other algorithms under various scenarios.

To save energy, our algorithms adjust the number of ads to prefetch according to the energy consumption of downloading an ad, which is influenced by the ad size and the network quality. Fig. 2.10a plots the CDF of the number of ads prefetched with different ad sizes by our energy-aware prefetching algorithm. As can be seen, fewer ads are prefetched as the ad size increases from 50 KB to 200 KB, because more energy may be wasted on prefetching unneeded ads as the ad size becomes larger. Fig. 2.10b shows the number of ads prefetched under different network quality. Under good network quality (i.e., Trace 1), prefetching unneeded ads consumes less energy, so our algorithms save energy by prefetching more ads to avoid additional long tails.

Since our prefetching algorithms can adaptively adjust the number of ads to prefetch, they exhibit steady energy performance under various scenarios. In contrast, the 80th percentile algorithm and the max algorithm cannot adjust the number of ads to prefetch according to the ad size and the network quality, so they consume much more energy under certain scenarios (e.g., 200 KB ads and poor network quality).

The average data usage of the different prefetching algorithms is shown in Fig. 2.11, with the non-prefetch algorithm's data usage as the benchmark for calculating the data ratio. As can be seen, the perfect algorithm has the least data usage because it fetches only the exact number of ads needed. Compared with the energy-aware prefetching algorithm, whose data ratio is around 200%, the energy-and-data aware prefetching algorithm has a lower data usage and reduces its data ratio to around 170%. The data usage of the 80th percentile algorithm is acceptable, while the max algorithm uses an unaffordable amount of data, leading to a 1224% data ratio.

Figure 2.11. Average data ratio: Max 1224.3%, 80th Percentile 249.0%, Energy 207.8%, Energy&data 174.5%, Perfect 100.0%.

2.6 Testbed Development and Evaluation

To understand how much energy can be saved by our prefetching algorithms in a real app, we have implemented the proposed algorithms and measured their performance in a book reader app on smartphones.

Figure 2.12. Energy consumption in an app, broken down into app, ad, and algorithm energy. (a) 50 KB ad. (b) 100 KB ad. (c) 200 KB ad.

2.6.1 Testbed Development

We have developed three versions of a book reader app: ad-enabled, ad-disabled, and no-ad. The ad-enabled app embeds an ad client which can download ads from a mock server using different prefetching algorithms. The ad-disabled app embeds a tailored ad client which runs the prefetching algorithms without downloading ads. The no-ad app has no ad client. We distributed the app to seven students in our department and collected app usage traces for one month. For each trace, half is used as training data and the other half is used for evaluation. We replay the traces using the different versions of the app on a Samsung Galaxy S6 phone with a 4G data plan, and use a Monsoon power monitor to measure the power consumption. The ad refresh interval is set to 60 seconds in our experiment. For the ad-enabled and ad-disabled apps, we run multiple tests with different prefetching algorithms. Each test is repeated five times, and the average energy consumption is reported.

2.6.2 Experimental Results

The energy consumption of the ad-enabled app includes the energy consumed to prefetch ads (ad energy), the energy consumption of the prefetching algorithms, and the app energy (including CPU and display). The energy consumption of the ad-disabled app includes the energy of the prefetching algorithms and the app energy. The energy consumption of the no-ad app includes only the app energy. By considering them together, we can break down the energy consumption of the ad-enabled app. As shown in Fig. 2.12, the energy consumption of our prefetching algorithms is insignificant compared to the ad energy. The ad energy of the non-prefetch algorithm (i.e., fetching ads periodically) accounts for almost half of the total energy consumed by the app. Compared to the non-prefetch algorithm, our prefetching algorithms save 70% to 80% of the ad energy and 30% to 40% of the total energy across the different ad sizes. Compared to the 80th percentile algorithm and the max algorithm, our prefetching algorithms reduce the ad energy by at least half. When the ad size is 200 KB, the ad energy of our algorithms is one-fourth of that of the max algorithm, and two-fifths of that of the 80th percentile algorithm.

2.7 Conclusion

In this chapter, we proposed network quality aware prefetching algorithms to reduce the energy of fetching in-app ads. First, in contrast to traditional data-mining based prediction algorithms, which only generate one option, we designed a prediction algorithm that generates a set of prefetching options with various probabilities. Second, with these prefetching options, we estimated the energy consumption of each one by considering the effect of network quality and chose the best one accordingly. Two prefetching algorithms were proposed: the energy-aware prefetching algorithm aims to minimize the energy consumption, and the energy-and-data aware prefetching algorithm also considers data usage to achieve a tradeoff between energy and data usage. Evaluation results show that our prefetching algorithms can save 80% of energy compared to the traditional way of fetching ads periodically, and outperform existing prefetching algorithms under various network qualities.

Chapter 3

Prefetch-Based Energy Optimization on Smartphones

3.1 Introduction

As cellular networks such as 3G and LTE have been widely deployed, people with smartphones can access various kinds of data services anytime, anywhere. However, data transmission through cellular networks also consumes a large amount of energy due to the long tail problem [2, 48, 49, 11, 12]. For example, in the last chapter, we found that in-app advertising may lead to significant battery drain on smartphones, since ads are fetched periodically and the long tail problem then occurs frequently [7, 13]. To avoid unnecessary long tails, we can predict the number of ads needed in the future and then prefetch those ads together.

Besides in-app advertising, prefetching can also be used to reduce the tail energy in many other mobile apps. For example, in the YouTube app, small chunks of video are periodically downloaded for users to watch, which wastes a lot of tail energy. By predicting how long the user will watch the video and prefetching that video content together, a lot of tail energy can be saved. Thus, it is important to generalize the prefetching problem and provide general solutions.

The effectiveness of using prefetching to save energy relies on the accuracy of predicting how much data will be used in the future. Not prefetching enough data may result in extra tail energy, while prefetching too much data may waste energy on downloading unnecessary data. This problem becomes worse when the network quality is poor, as it takes much longer to download the same amount of data and consumes more energy. Thus, we need to carefully decide how much data to prefetch according to the network quality.

In this chapter, we generalize and formulate the prefetch-based energy optimization problem, where the goal is to find a prefetching schedule that minimizes the energy consumption of the data transmissions under the current network quality.
To solve the formulated nonlinear optimization problem, we first propose a greedy algorithm, which iteratively decides how much data to prefetch based on the current network quality. Then, we propose a discrete algorithm that converts the problem to a discrete one and solves it using dynamic programming. To verify the effectiveness of the proposed algorithms, we have implemented and evaluated them in two apps: in-app advertising and mobile video streaming. Evaluation results show that our algorithms save more energy and reduce bandwidth wastage (i.e., bandwidth used to download unnecessary data) compared to existing prefetching algorithms. To summarize, our contributions are as follows:

• We generalize and formulate the prefetch-based energy optimization problem as a nonlinear optimization problem.

• We propose heuristic based algorithms to solve the problem, and find its performance bound. The proposed algorithms can adaptively adjust the amount of data to prefetch based on the network quality.

• We have implemented and evaluated our algorithms in two apps, and the results demonstrate that our algorithms can save more energy compared to existing algorithms.

This chapter focuses on energy optimization for LTE. Besides LTE, Wi-Fi also provides wireless access, though with relatively small coverage. Since Wi-Fi is more energy efficient than LTE, it should be used to download data when available. However, in many areas there is no Wi-Fi available, and then LTE is used to download data. Our work considers the long tail problem in LTE and uses prefetching to save energy. Energy optimization for Wi-Fi is another research topic, which is orthogonal to our work.

The rest of this chapter is organized as follows. We present related work in Section 3.2, and some preliminaries in Section 3.3. Section 3.4 formalizes the prefetch-based energy optimization problem and introduces two heuristic-based algorithms. Section 3.5 and Section 3.6 evaluate the performance of our algorithms using two apps, in-app advertising and mobile video streaming, respectively. Section 3.7 concludes the chapter.

3.2 Related Work

Recently, an extensive amount of research has been done to reduce the energy consumption of the cellular interface. In [50, 30, 11, 27], researchers found that the cellular interface has to stay in a high-power state for some time after each data transmission, which accounts for almost 50% of the total energy of a typical LTE data transmission [3]. Prefetching can be applied to reduce the tail energy in cellular networks. One important issue in prefetching is determining what data to prefetch, and there is existing work in this area [17, 18, 19, 20]. For example, Higgins et al. [31] built a system in which developers can provide hints on what data will be used in the future, and those data are then prefetched. In CAMEO [17], in-app ads are prefetched based on app contexts (e.g., app category), and all relevant ads are prefetched. EarlyBird [32] predicts the embedded URLs in social websites that a user may click based on user history, and then prefetches the embedded content. FALCON [51] predicts future app launches according to contexts such as temporal and spatial access patterns. In all these works, all data predicted to be used in the future is prefetched to reduce the tail energy. However, the prediction may not be accurate, and some unnecessary data may be prefetched, wasting a significant amount of energy. This problem becomes worse when the network quality is poor, where much more energy is consumed to transmit the same amount of data [14, 33]. Different from these existing works, we address this problem by adaptively adjusting the amount of data to prefetch based on the network quality.

In mobile video streaming, the amount of data (video) to prefetch is determined by the local buffer size. Researchers have proposed various techniques to prefetch video content [52, 53, 54, 6]. For example, in the ON-OFF model [54, 6], a fixed-size buffer is used and video is prefetched until the buffer is full. Then downloading stops and the cellular interface is turned off to save energy. When the buffer is almost empty, the cellular interface is turned on to prefetch video again. In GreenTube [53], the authors first predict the remaining time the user will watch the video, and then decide the buffer size from some candidate values accordingly. The effectiveness of GreenTube is affected by the prediction accuracy and the selection of the candidate values. Although those candidate values can be empirically determined for mobile video streaming under certain network conditions, the solution cannot be generalized to other mobile services under different network conditions. eSchedule [52] reduces the energy consumption of prefetching based on the viewing history of specific videos. It only considers the current prefetch for optimization but ignores future prefetches, wasting a significant amount of energy on downloading unneeded video content. Different from these existing works, which are limited to certain mobile services under certain conditions, our work generalizes the prefetch-based energy optimization problem considering various constraints. When the proposed algorithms are applied to specific mobile services such as in-app advertising or mobile video streaming, they significantly outperform existing solutions.

3.3 Preliminaries

In this section, we first introduce the energy model of the LTE network, and then give a motivating example.

3.3.1 Energy Model

We use the energy model of LTE to formulate the prefetch-based energy optimization problem. Currently, LTE is the fastest widely deployed wireless network. The power consumption of using the LTE cellular interface to download data can be generalized into three states: promotion, data transmission, and tail. The power consumption of these three states is denoted as Ppro, Pcell, and Ptail, respectively.

Table 3.1. Mobile Devices and Network Types
Device              Provider   Network
Samsung Galaxy S6   AT&T       LTE
Nexus 5x            AT&T       LTE
Samsung Galaxy S4   Verizon    LTE

The energy consumption of prefetching data can be modeled as follows. Suppose a prefetch downloads xi amount of data, and it starts when the last prefetch's data (xj) has been completely consumed. Then the time interval between the end of the last prefetch's data transmission and the start of the current prefetch is denoted as ∆T, where ∆T = xj/r − xj/td, td is the downlink throughput, and r is the data consuming rate. Since data is consumed after being downloaded, r is strictly less than td, and thus ∆T ≥ 0. There are two cases when computing the energy consumption of the current prefetch. 1) If ∆T is larger than the tail timer Ttail, i.e., the cellular interface is in the IDLE state before the current prefetch starts, then besides the data transmission energy, extra promotion and tail energy will be consumed. 2) If ∆T is smaller than Ttail, part of the tail energy (between the last prefetch and the current prefetch) is consumed, but no promotion energy.

 xi Ppro × Tpro + Pcell × + Ptail × Ttail,  td   if ∆T ≥ Ttail; Epref (x) = (3.1) xi Pcell × + Ptail × ∆T,  td   Otherwise. We measure the power level of the LTE cellular interface using three types of smartphones (Samsung Galaxy S6, Nexus 5x, and Samsung Galaxy S4) from two cellular providers (AT&T and Verizon). A brief description of the devices and networks are shown in Table 3.1. Similar to [5], the Monsoon power monitor is used as the power supply to measure the power consumption of the smartphone. The experiment is performed when the screen is on and all background services are stopped. Packet traces are collected using tcpdump to make sure there is no background traffic throughout the measurement. In the experiment, we first ran our test app without transferring any data to measure the Idle (base) power (i.e., the LTE interface is at IDLE state). Then we downloaded a file from a remote server via LTE using the test app, and kept measuring the phone’s power 38

Table 3.2. Power consumption of LTE State Power (mW) Duration (s) Idle (base) 498 ± 35.4 - Galaxy S6 LTE Promotion 1286.3 ± 36.5 0.5 ± 0.1 (AT&T) LTE Data 1959.2 ± 42.1 - LTE Tail 1192.4 ± 31.4 11.3 ± 0.8 Idle (base) 979.5 ± 23.3 - Nexus 5x LTE Promotion 1586.3 ± 27.6 0.5 ± 0.1 (AT&T) LTE Data 2286.4 ± 54.3 - LTE Tail 1265.3 ± 38.2 11.3 ± 0.8 Idle (base) 474 ± 32.8 - Galaxy S4 LTE Promotion 1567.9 ± 47.8 0.3 ± 0.1 (Verizon) LTE Data 2224.7 ± 53.1 - LTE Tail 1757.5 ± 97.5 3.4 ± 0.1

*The power consumption is measured when the screen is on. The Idle power is the baseline power, which is measured when no data is transferring. All power readings in this table include the Idle (base) power. consumption until the power level reduces to Idle to cover the whole tail. Based on the power measurement trace and the start/end time from the packet trace, we can get the power consumption at different LTE states. All experiments are repeated five times to reduce measurement error. The results are summarized in Table 3.2, where all power readings include the base power. The power level for each LTE state can be determined by subtracting the base power from the total. According to the results, we find that AT&T and Verizon have different LTE parameters. For example, the tail time is 11.3 seconds for AT&T (measured by Samsung Galaxy S6 and Nexus 5x), while the tail time is 3.4 seconds for Verizon (measured by Samsung Galaxy S4). The power level of the LTE cellular interface varies from device to device, so different energy model is built for each type of smartphones. We also verify whether the power consumption of data downloading depends on the signal strength, by using Samsung Galaxy S6 and Nexus 5x to download data under various network conditions. As shown in Fig. 3.1, the power consumption does not change when the signal strength is within the normal range from -90 dBm (good signal) to -115 dBm (bad signal). As the signal becomes poor, the network throughput becomes lower. Thus, it takes much longer time to download the same amount of data and then consumes more energy. When the signal strength is extremely bad (-125 dBm), which is considered as the boundary condition for 39

4000 15 4000 12 LTE Power LTE Power 10 Throughput Throughput 3500 3500 10 8 3000 3000 6

Power (mW) 2500 5 Power (mW) 4 2500 Throughput (Mbps) Throughput (Mbps) 2 2000 0 2000 0 -90 -100 -115 -125 -90 -100 -115 -125 RSRP (dBm) RSRP (dBm) (a) Samsung Galaxy S6 (b) Nexus 5x Figure 3.1. Power and downlink throughput with different signal strength, where - 90 dBm represents good signal strength, -100 dBm represents average signal strength, -115 dBm represents bad signal strength, and -125 dBm is considered as the boundary condition for losing the LTE signal. The power consumption is measured when the screen is on. losing the LTE signal and switching back to UMTS/GSM, the power consumption becomes significantly higher and the throughput is only 100∼300 kbps, which is too low for normal data communications. In our work, we consider the signal strength to be within the normal range, where the power consumption of data downloading does not change.
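As a small sketch, Eq. (3.1) with the Galaxy S6 (AT&T) numbers from Table 3.2 can be written as follows. The units (watts, seconds, bits) and the function interface are our assumptions for illustration.

```python
# Sketch of the per-prefetch energy model of Eq. (3.1), using the Galaxy S6
# (AT&T) parameters from Table 3.2. x_i is the amount of data to prefetch
# (bits); dt is the gap between the end of the previous transmission and the
# start of this one.

P_PRO, T_PRO = 1.2863, 0.5      # promotion power (W) and duration (s)
P_CELL = 1.9592                 # data-transmission power (W)
P_TAIL, T_TAIL = 1.1924, 11.3   # tail power (W) and tail timer (s)

def delta_t(x_j, r, t_d):
    """Gap before the next prefetch: the previous x_j bits are consumed at
    rate r but were downloaded at throughput t_d (r < t_d, so the gap >= 0)."""
    return x_j / r - x_j / t_d

def e_pref(x_i, t_d, dt):
    """Eq. (3.1): energy (J) of prefetching x_i bits at throughput t_d (bits/s)."""
    if dt >= T_TAIL:
        # Case 1: radio dropped to IDLE, so pay promotion, transmission,
        # and a full tail.
        return P_PRO * T_PRO + P_CELL * x_i / t_d + P_TAIL * T_TAIL
    # Case 2: radio still in the tail, so pay transmission plus the
    # partial tail between the two prefetches.
    return P_CELL * x_i / t_d + P_TAIL * dt
```

This makes the tradeoff explicit: prefetching sooner (small dt) pays only a partial tail, while waiting past the tail timer pays the promotion and full tail again.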

3.3.2 Motivations

Since it is impossible to accurately predict the exact amount of data that will be used in the future, the prefetched data may be more or less than necessary. If the prefetched data is more than necessary, energy and bandwidth are wasted. On the other hand, if the prefetched data is less than needed, future prefetches will be required, resulting in extra tail energy. In cellular networks, the energy consumption of downloading data varies with the network quality. For example, as measured at two locations with poor and good network quality, 5 joules are needed to download 1 MB of data when the network quality is poor, but only 1 joule when the quality is good; the tail energy remains about 10 joules under both. Thus, to save energy, different prefetching strategies may be taken depending on the network quality. Suppose we have a video streaming app, and some amount of data (video) is prefetched to save energy. The amount of data to prefetch is determined based on the viewing history and the network quality. Fig. 3.2 shows

the viewing content distribution of the video, generated according to the video viewing traces described in Section 3.6.3. As can be seen, the largest amount of video watched is 6 MB and the median is 2 MB. There are then two prefetching options: prefetch 6 MB of data at once, or prefetch 2 MB of data with a 50% probability that 4 MB more is prefetched in the future. Figure 3.3 illustrates the power consumption of these two options. With poor network quality, the first option (prefetching 6 MB) consumes 5 × 6 + 10 = 40 joules, while the second option (prefetching 2 MB first, then another 4 MB with 50% probability) consumes 5 × 2 + 10 + 50% × (5 × 4 + 10) = 35 joules; in this case, the second option is more energy efficient. With good network quality, the first option consumes 16 joules but the second consumes 19 joules, so the first option wins. Thus, the optimal prefetching schedule varies as the network quality changes, and it is important to design adaptive algorithms that adjust the amount of data to prefetch based on the network quality.

Figure 3.2. Viewing content distribution of a video (watched by multiple users). The figure is generated according to the video viewing traces described in Section 3.6.3.

Figure 3.3. Different prefetching options under different network quality: (a) Prefetching Option 1, (b) Prefetching Option 2.
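As a sanity check, the arithmetic in the example above can be reproduced with a short script. The per-MB energies (5 J and 1 J) and the 10 J tail are the measured values quoted in the text; the function itself is an illustrative sketch, not code from this dissertation.

```python
# Illustrative sketch: expected energy of a prefetching option, where each
# prefetch costs (size_mb * energy_per_mb + tail) joules and occurs with
# some probability. The numbers below are the measurements quoted above.
def expected_energy(prefetches, energy_per_mb, tail=10.0):
    """prefetches: list of (size_mb, probability) pairs."""
    return sum(p * (mb * energy_per_mb + tail) for mb, p in prefetches)

for quality, e_mb in [("poor", 5.0), ("good", 1.0)]:
    one_shot = expected_energy([(6, 1.0)], e_mb)            # 6 MB at once
    two_step = expected_energy([(2, 1.0), (4, 0.5)], e_mb)  # 2 MB, then 4 MB w.p. 0.5
    print(f"{quality}: option 1 = {one_shot} J, option 2 = {two_step} J")
```

This reproduces the 40 J vs. 35 J comparison under poor quality and the 16 J vs. 19 J comparison under good quality.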

3.4 Prefetch-Based Energy Optimization

In this section, we first formalize the prefetch-based energy optimization problem, and then propose two prefetching algorithms.

3.4.1 Problem Formulation

We generalize and formulate the prefetch-based energy optimization problem. Let X be a random variable which denotes the amount of data to be prefetched in an app. According to historical records, we can obtain the empirical CDF of X, F_X(x).

Suppose x_0 is the amount of data already being used, and max is the largest amount of data used in history. An n-step prefetching schedule is defined as S = {x_1, x_2, …, x_n}, where x_0 < x_1 < x_2 < … < x_n = max. According to S, we prefetch (x_1 − x_0) amount of data at the current step, (x_2 − x_1) amount of data if more data is needed, (x_3 − x_2) amount of data if still more is needed, and so on. The goal is to find the S that minimizes the energy consumption.

arg min_S  Σ_{i=1}^{n} p_i^S E_i^S        (3.2)

s.t.  x_0 < x_1 < x_2 < … < x_n        (3.3)

      x_n = max        (3.4)

where p_i^S = (F_X(x_i) − F_X(x_{i−1})) / (1 − F_X(x_0)) is the probability that i prefetches are needed to download the data based on the prefetching schedule S, and E_i^S = Σ_{j=1}^{i} E_pref(x_j − x_{j−1}) is the energy consumption of i prefetches. The objective function (3.2) is to find a prefetching schedule that minimizes the expected energy consumption of prefetching all the needed data. Constraints (3.3) and (3.4) ensure that {x_1, x_2, …, x_n} is a valid prefetching schedule according to its definition.

The formulated problem is a nonlinear optimization problem, which is difficult to solve. Even if we could find the optimal solution, it would be impractical to implement on smartphones due to the high computation overhead. Thus, we propose heuristic-based solutions: we first propose a greedy algorithm, and then propose a better solution by changing the problem to a discrete problem.
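The objective (3.2) can be evaluated directly from the empirical CDF. The sketch below assumes F is a callable for F_X and e_pref(d) returns E_pref(d), the download-plus-tail energy of one prefetch of size d; both names are placeholders for illustration, not the dissertation's implementation.

```python
# Sketch of the objective in Eq. (3.2): the expected energy of an n-step
# prefetching schedule [x1, ..., xn], given that x0 is already being used.
def expected_schedule_energy(schedule, x0, F, e_pref):
    xs = [x0] + list(schedule)          # requires x0 < x1 < ... < xn = max
    denom = 1.0 - F(x0)                 # condition on x0 already being used
    total = cumulative = 0.0
    for i in range(1, len(xs)):
        p_i = (F(xs[i]) - F(xs[i - 1])) / denom  # exactly i prefetches needed
        cumulative += e_pref(xs[i] - xs[i - 1])  # E_i^S: energy of i prefetches
        total += p_i * cumulative
    return total
```

For instance, with a uniform F on [0, 10] and e_pref(d) = d + 10 (a 10 J tail), the 1-step schedule [10] costs 20 J in expectation, while [5, 10] costs 22.5 J, so the 1-step schedule would be preferred under that CDF.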

Table 3.3. The Process of the Greedy Algorithm

Round i | Remaining data | S_i         | S_greedy
--------|----------------|-------------|---------------------------
1       | max − x_0      | {x_1, max}  | {x_1}
2       | max − x_1      | {x_2, max}  | {x_1, x_2}
3       | max − x_2      | {x_3, max}  | {x_1, x_2, x_3}
…       | …              | …           | …
n       | max − x_{n−1}  | {max}       | {x_1, x_2, …, x_{n−1}, max}

3.4.2 Greedy Algorithm

Instead of directly obtaining an n-step prefetching schedule, we can iteratively decide how much data to prefetch at each step. The heuristic is as follows: at each step, find a 2-step prefetching schedule that minimizes the energy consumption of prefetching the remaining data. Specifically, the remaining data is (max − x_0) at the beginning. At each step i, we find a 2-step prefetching schedule S_i = {x_i, max} to prefetch the remaining data. Then x_i amount of data is prefetched at step i, and the remaining data becomes (max − x_i). At the next step i + 1, we find a 2-step prefetching schedule S_{i+1} = {x_{i+1}, max}, and so on. This is repeated until there is no remaining data to prefetch.

Let S_greedy denote the prefetching schedule obtained from the algorithm. The procedure to construct S_greedy is shown in Table 3.3 and described as follows.

1. Initially, i = 1, and S_greedy is empty.

2. For the remaining (max − x_{i−1}) amount of data, find a 2-step prefetching schedule S_i = {x_i, max} that minimizes the objective function (3.2), which can be written as

p_1 × E_pref(x_i − x_{i−1}) + p_2 × (E_pref(x_i − x_{i−1}) + E_pref(max − x_i)),        (3.5)

where p_1 = (F_X(x_i) − F_X(x_{i−1})) / (1 − F_X(x_{i−1})) is the probability of using less than x_i amount of data, so that only one prefetch is needed, and p_2 = (1 − F_X(x_i)) / (1 − F_X(x_{i−1})) is the probability of using more than x_i amount of data, so that two prefetches are needed, given that x_{i−1} amount of data is already being used. Note that E_pref(x_i − x_{i−1}) may have a different tail energy according to Eq. 3.1.

3. If such an S_i can be found with x_{i−1} < x_i < max, add x_i to S_greedy, update i = i + 1, and go back to step 2. Otherwise, x_i = max minimizes the value of function (3.5); add max to S_greedy and stop.

Since there is only one variable in the function (3.5), the greedy algorithm is easy to compute.
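A minimal sketch of the greedy procedure follows, searching candidate split points on a fixed grid. F, e_pref, and the grid step are illustrative assumptions for the sketch, not the dissertation's implementation.

```python
# Sketch of the greedy algorithm: at each step, pick the x_i that minimizes
# the 2-step objective in (3.5), then repeat on the remaining data.
def greedy_schedule(x0, x_max, F, e_pref, step=1.0):
    schedule, x_prev = [], x0
    while x_prev < x_max:
        denom = 1.0 - F(x_prev)
        best_x, best_e = None, None
        candidate = x_prev + step
        while candidate <= x_max + 1e-9:
            x = min(candidate, x_max)
            p1 = (F(x) - F(x_prev)) / denom   # one prefetch suffices
            p2 = (1.0 - F(x)) / denom         # a second prefetch is needed
            e = p1 * e_pref(x - x_prev) + p2 * (e_pref(x - x_prev)
                                                + e_pref(x_max - x))
            if best_e is None or e < best_e:
                best_e, best_x = e, x
            candidate += step
        schedule.append(best_x)               # prefetch up to best_x at this step
        x_prev = best_x
    return schedule
```

Since the objective at each step has a single free variable, each iteration is a one-dimensional search, which matches the observation above that the greedy algorithm is easy to compute.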

3.4.3 Discrete Algorithm

Although the greedy algorithm provides a solution to the prefetch-based energy optimization problem, it is hard to find its performance bound. In this subsection, we propose a better solution by changing the optimization problem to a discrete problem and solving it using dynamic programming. The discrete problem is constructed by discretizing the prefetched data into segments. Let A denote the segment size. Then we can use a random variable K = ⌈X/A⌉ to denote the number of segments to be prefetched.

According to historical records, we can obtain the empirical CDF of K, F_K(k) = F_X(k × A). Suppose k_0 is the number of segments already being used, and m is the largest number of segments used in history. An n-step discrete prefetching schedule is denoted as S_disc = {k_1, k_2, …, k_n}, where k_0 < k_1 < k_2 < … < k_n = m. According to S_disc, (k_1 − k_0) segments of data are prefetched at the current step, (k_2 − k_1) segments if more data is needed, (k_3 − k_2) segments if still more is needed, and so on. The problem is to find the S_disc that minimizes the energy consumption.

arg min_{S_disc}  Σ_{i=1}^{n} p_i^{S_disc} E_i^{S_disc}        (3.6)

s.t.  k_0 < k_1 < k_2 < … < k_n        (3.7)

      k_n = m        (3.8)

where p_i^{S_disc} = (F_K(k_i) − F_K(k_{i−1})) / (1 − F_K(k_0)) is the probability that i prefetches are needed to download the data based on S_disc, and E_i^{S_disc} = Σ_{j=1}^{i} E_pref((k_j − k_{j−1}) × A) is the energy consumption of i prefetches. Given a certain value of A, the objective function (3.6) is to find a discrete prefetching schedule that minimizes the expected energy consumption of prefetching all the needed data. Constraints (3.7) and (3.8) ensure that {k_1, k_2, …, k_n} is a valid discrete prefetching schedule according to its definition.

Compared to the original problem, the solution space is limited in the discrete optimization problem. The discrete problem can be solved using dynamic programming, as it can be divided into smaller subproblems. Generally speaking, we choose the number of segments to prefetch at one step, and then solve a subproblem if more data is needed. At step i (i ≥ k_0), i segments have been prefetched and (m − i) segments have not, so there are (m − i) ways to prefetch the remaining data at this step. Suppose E[i] is the expected energy consumed after step i. The goal is to pick the prefetching choice that minimizes E[i], as shown in Eq. 3.9, where p_i^k = (1 − F_K(i + k)) / (1 − F_K(i)) is the probability of using more than (i + k) segments after already using i segments:

E[i] = min_{1 ≤ k ≤ m−i} ( E_pref(k × A) + p_i^k × E[i + k] ),  for all i = k_0, k_0 + 1, k_0 + 2, …, m.        (3.9)

Since E[m] = 0, E[i] can be computed backwards, i.e., from i = m down to k_0 iteratively. Given that the values E[j] (i < j ≤ m) are known, E[i] can be calculated using Eq. 3.9. This is repeated until E[k_0] is obtained, and the segments that minimize E[k_0] form the optimal discrete prefetching schedule. The algorithm uses (m − k_0) iterations to calculate E[i], with i going from m down to k_0; in each iteration, (m − i) different ways to prefetch the remaining data are compared. Thus, the algorithm has a time complexity of O(m²). The optimal discrete prefetching schedule may consume more energy than the optimal prefetching schedule, since a whole segment must be prefetched even when less than one segment of data is actually needed.
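The backward computation of Eq. (3.9) can be sketched as a short dynamic program. F_K, e_pref, and the parameter names are illustrative assumptions; the nested loops make the O(m²) complexity explicit.

```python
# Sketch of the dynamic program in Eq. (3.9): compute E[i] backwards from m,
# recording the best number of segments k to prefetch at each i, then walk
# the recorded choices forward to recover the schedule {k1, ..., kn}.
def discrete_schedule(k0, m, F_K, A, e_pref):
    E = [0.0] * (m + 1)                # E[m] = 0: nothing left to prefetch
    choice = [0] * (m + 1)
    for i in range(m - 1, k0 - 1, -1):
        best = None
        for k in range(1, m - i + 1):
            p = (1.0 - F_K(i + k)) / (1.0 - F_K(i))  # need > i+k segments
            e = e_pref(k * A) + p * E[i + k]
            if best is None or e < best:
                best, choice[i] = e, k
        E[i] = best
    schedule, i = [], k0               # follow choices to build S_disc
    while i < m:
        i += choice[i]
        schedule.append(i)
    return schedule, E[k0]
```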

Theorem 1. Compared to the optimal prefetching schedule, the extra energy consumed by the optimal discrete prefetching schedule is no more than the energy of downloading one segment of data.

Proof. Assume the optimal prefetching schedule is S* = {x*_1, x*_2, …, x*_n}. Then there exists a discrete prefetching schedule S'_disc = {k'_1, k'_2, …, k'_n} such that 0 ≤ k'_i × A − x*_i < A (1 ≤ i ≤ n) for any given segment size A. According to the definition of a (discrete) prefetching schedule, compared to S*, the extra data downloaded by S'_disc is no more than one segment of data, and the extra energy consumed is no more than the energy of downloading one segment of data, no matter how much data is actually needed. Since the optimal discrete prefetching schedule consumes no more energy than S'_disc, we have proven that, compared to S*, the extra energy consumed by the optimal discrete prefetching schedule is no more than the energy of downloading one segment of data.

According to Theorem 1, the upper bound of the extra energy wastage of the optimal discrete prefetching schedule can be calculated as E_ub(A) = P_cell × A / t_d, which is the energy of downloading one segment of data.

3.4.3.1 Value of the segment size A

Energy may be wasted if the value of A is not carefully chosen. If A is too small, m becomes very large and the algorithm runs longer to calculate the prefetching schedule, which will increase the CPU energy consumption on smartphones. If A is too large, the potential energy wastage of the obtained prefetching schedule may be high. Thus, we need to consider both issues when setting the value of A. To estimate the CPU energy consumption, a CPU power model is needed. Similar to the literature [55], we adopt the utilization based CPU power model where the CPU power consumption (Pcpu) is proportional to the CPU utilization:

P_cpu = α × u_cpu,        (3.10)

where u_cpu is the CPU utilization of the running process, which can be calculated based on the information in the file /proc/[pid]/stat on Android, and α is the CPU power consumption at 100% CPU load. To obtain the value of α, we measured the power consumption of the smartphone when the CPU is idle with all background services stopped; we then increased the workload to 100% CPU load and measured the power consumption again. α is the difference between these two measured values. As measured with a Samsung Galaxy S6 phone, the value of α is 1423 mW.

To estimate the running time, the algorithm is run once with a certain segment size

A_0, and its running time T_0 is recorded. For any value of A, the running time is

T(A) = (A_0 / A)² × T_0.        (3.11)

Putting them together, the CPU energy consumption is

E_cpu(A) = P_cpu × T(A).        (3.12)

Then the total energy wastage can be calculated by adding the CPU energy consumption to the potential energy wastage of the obtained prefetching schedule, i.e., E_cpu(A) + E_ub(A). Besides energy, we also consider a time constraint T(A) < τ, where τ is a threshold, since some real-time apps have time constraints. We should choose the A that minimizes the total energy wastage while satisfying the time constraint. When a larger τ is used, a better A may be found and more energy can be saved; in practice, τ is set to the maximum time a specific application allows our algorithm to run.
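Putting Eqs. (3.10)–(3.12) and the bound E_ub(A) together, the segment size can be chosen by a simple search over candidate values. The candidate list, P_cell, t_d, A_0, and T_0 below are illustrative assumptions; α = 1.423 W is the Galaxy S6 value measured above.

```python
# Sketch of the segment-size selection in Sec. 3.4.3.1: pick the A that
# minimizes E_cpu(A) + E_ub(A) subject to the time constraint T(A) < tau.
def choose_segment_size(candidates, A0, T0, u_cpu, P_cell, t_d, tau,
                        alpha=1.423):
    best_A, best_waste = None, None
    for A in candidates:
        T = (A0 / A) ** 2 * T0        # Eq. (3.11): runtime scales as 1/A^2
        if T >= tau:                  # time constraint violated, skip
            continue
        E_cpu = alpha * u_cpu * T     # Eqs. (3.10) and (3.12)
        E_ub = P_cell * A / t_d       # energy of downloading one segment
        if best_waste is None or E_cpu + E_ub < best_waste:
            best_A, best_waste = A, E_cpu + E_ub
    return best_A
```

A tighter τ rules out the smaller candidates (whose runtime is longer), which mirrors the tradeoff described above.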

3.4.4 Greedy Algorithm vs Discrete Algorithm

The greedy algorithm is easy to compute, but it may waste energy downloading unneeded data. Consider a 2-step prefetching schedule S = {x_1, x_2 = max} and an n-step prefetching schedule S' = {x'_1, x'_2, …, x'_n = max} that both minimize the objective function (3.2). It can be easily proven that x_1 is larger than or equal to x'_1. Since the greedy algorithm iteratively constructs its solution from 2-step prefetching schedules, it may prefetch too much data at each step. In contrast, the discrete algorithm is more complex to compute, but it obtains the optimal discrete prefetching schedule, and we can prove its performance bound. Thus, compared to the greedy algorithm, the discrete algorithm saves more energy on data transmission but consumes more CPU energy. We will evaluate and compare these two algorithms in Sections 3.5 and 3.6, and discuss the impact of the segment size on the performance of the discrete algorithm.

3.4.5 Discussions

A prefetching schedule is only used for the current prefetch. Every time new data is needed, a throughput estimation is performed and a new prefetching schedule is calculated based on the newly estimated throughput. The throughput estimation is performed by downloading a small amount of data (e.g., an ad, or a 5-second chunk of video); more accurate measurements of network throughput can be found in [56, 57], which are out of the scope of this chapter. Based on the estimated throughput, the proposed algorithms calculate an n-step prefetching schedule S = {x_1, x_2, …, x_n}, and (x_1 − x_0) amount of data is prefetched at the current step, where x_0 is the amount of data already being used. During the calculation, our algorithms consider both the current prefetch and possible future prefetches, achieving a tradeoff between the energy consumed downloading unneeded data (if the prefetched data is more than necessary) and the extra tail energy (if the prefetched data is less than needed and future prefetches are required). Thus, our algorithms can save energy by adaptively adjusting the amount of data to prefetch according to the network quality, as illustrated by the example in Section 3.3.2. In contrast, traditional data-mining based algorithms do not consider the network quality and generate only one option for the amount of data to prefetch, which may be too high or too low and thus waste energy.

Our optimization depends on the actual energy model, which may vary among phone models and carriers. Due to limited experimental resources, we measured three popular phones from two major carriers in the U.S. and used them to show representative results. For other phone models and carriers, although the power consumption and network settings may differ, the optimization should be similar as long as a significant long tail problem exists.

3.5 Performance Evaluations: In-app Advertising

To evaluate the performance of our algorithms, we have implemented the greedy algorithm and the discrete algorithm, and compared them with other existing algorithms.

3.5.1 In-app Advertising

Most free apps are associated with in-app advertising [4]. In these apps, ads are usually displayed periodically with a fixed ad refresh interval. Popular ad formats include banners, rich media, and video. Based on ad specifications [38, 39, 36], the whole file size of an ad varies from 50 KB to 200 KB. Although ads are small in terms of data size, a recent study has shown that in-app ads are, on average, responsible for 23% of an app's total energy for the top 15 ad-supported apps [58]. This is because in-app ads are periodically downloaded and thus generate a large amount of tail energy.

3.5.2 Ad Prefetching Algorithm

Our greedy algorithm and discrete algorithm can be used for prefetching ads. Assume the amount of data to prefetch is x in the greedy algorithm (k × A in the discrete algorithm). The number of ads to prefetch can then be calculated as ⌈x / ad_size⌉ (or ⌈(k × A) / ad_size⌉). Since the actual amount of data to prefetch must be a multiple of the ad size, this constraint is added to the greedy algorithm to simplify the computation; in the discrete algorithm, only values of A that are multiples of the ad size are considered when determining A. Four ad fetching/prefetching algorithms are used for comparison:

• Non-pref: The traditional way to periodically fetch ads, which is used as benchmark.

• 80th: An algorithm based on [58], which uses the 80th percentile value of the number of ads displayed in historical records to determine how many ads to prefetch.

• Max: An algorithm that prefetches the largest number of ads needed in history, which is used in [17].

• Optimal: Only prefetch the exact number of ads needed. Note that the optimal algorithm does not exist in practice; it only provides a performance upper bound.
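The ceiling conversion described at the start of this section can be sketched in a few lines; the 100 KB ad size below is an illustrative value.

```python
import math

# Convert a prefetch amount x (in KB) into a whole number of ads, as
# described above. The default ad size is an illustrative assumption.
def ads_to_prefetch(x_kb, ad_size_kb=100):
    return math.ceil(x_kb / ad_size_kb)
```

For example, a 450 KB prefetch amount maps to 5 ads of 100 KB each.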

3.5.3 App Usage and Throughput Traces

We distributed a pdf reader app to ten people in our department for two months, and collected app usage and network throughput traces for each person. The app logs the app duration, from the app being opened to being closed. The number of ads displayed can then be calculated by dividing the app duration by the ad refresh interval, and the data needed can be calculated by multiplying the number of ads by the ad size. On average, the app is used for 11 minutes each time and three times a day (200 records in total for each user). Once the app is opened, we also log the signal strength and measure the downlink throughput by downloading a file from a mock server every time an ad needs to be displayed (i.e., every 60 seconds based on the ad refresh interval). The TelephonyManager class of Android is used to obtain the network type, and data is only saved when the network type is LTE.

3.5.4 Parameter Setup and Algorithm Training

The ad refresh interval is set to 60 seconds, which is recommended by ad platforms like AdMob [38], and three different ad sizes are considered: 50 KB, 100 KB, and 200 KB. The algorithms are trained using the app usage traces in two models:

• User-specific model: For each user, the first part of the trace is used to train the algorithms, and the second part is used for evaluation. The train- ing dataset is updated when new records (used for evaluation) are added. Specifically, the first 50 records are used to train the algorithms initially, and the remaining records are used for evaluation. After 10 records are added, the training dataset is updated by replacing the 10 oldest records with these 10 new records.

• Aggregated model: For each user, the app usage traces of all other users are used to train the algorithms, and this user’s trace is used for evaluation. In a real app, the aggregated model is used when not enough user-specific historical data have been gathered.

We use the method described in Section 3.4.3.1 to find the optimal A, where τ equals the ad refresh interval (60 seconds), which is the largest value allowed in the in-app advertising scenario. Based on experiments on our Samsung Galaxy S6, the optimal A is the data size of one ad, and it takes less than 10 ms for our discrete algorithm to calculate the prefetching schedule using this A. Since τ is much larger than the running time of our algorithm using the optimal A, it

is unnecessary to consider the change of A affected by τ. In the next section, we will discuss how τ can affect the selection of A and how the change of A affects the performance of our algorithm in mobile video streaming.

Figure 3.4. Energy ratio based on trace-driven simulations. The energy consumption of the non-pref algorithm is used as the benchmark to calculate the energy ratio of the other algorithms. The postfix "AGGR" after an algorithm name indicates that the algorithm is trained with the aggregated model; the postfix "USER" indicates that it is trained with the user-specific model.

3.5.5 Trace-Driven Simulations

Based on the collected throughput traces described in Section 3.5.3, we evaluate the performance of our algorithms. All ad prefetching algorithms are trained with both the aggregated model and the user-specific model. For each app usage record in the evaluation traces, we first calculate the actual number of ads to be displayed, and then run each algorithm to prefetch ads and calculate the energy consumption according to the throughput information obtained from the throughput traces. The energy consumption of prefetching ads is calculated using the energy model of the Samsung Galaxy S6 described in Section 3.3.1.

Fig. 3.4 shows the energy performance of each prefetching algorithm based on the throughput traces. The energy consumption of the non-pref algorithm is used as the benchmark to calculate the energy ratio of the other algorithms. Note that different benchmarks are used for algorithms trained with the aggregated model and the user-specific model, since different evaluation traces are considered. As can be seen, all prefetching algorithms consume more energy as the ad size increases from 50 KB to 200 KB. This is because prefetching algorithms usually download more ads than needed; as the ad size increases, more energy is wasted on downloading those unnecessary ads. The algorithms trained with the aggregated model consume more energy than the same algorithms trained with the user-specific model. Our algorithms (i.e., Greedy-AGGR, Discrete-AGGR, Greedy-USER, and Discrete-USER) are comparable to the optimal algorithm and outperform all other algorithms.

3.5.6 Testbed Development and Evaluation

We have developed a testbed to evaluate the performance of the ad prefetching algorithms. To differentiate the ad downloading energy from that consumed by other smartphone components such as the CPU and display, we have implemented three versions of the pdf reader app: 1) a no-ad app; 2) an ad-disabled app, which runs (trains) the ad prefetching algorithms without actually downloading ads; and 3) an ad-enabled app, which runs the ad prefetching algorithms and downloads ads from a mock server using these algorithms. All three apps are installed on our Samsung Galaxy S6 phone with an LTE data plan from AT&T.

In the experiment, all ad prefetching algorithms are trained online using the user-specific model. We upload the app usage traces to the phone and perform trace-driven evaluations using the installed apps. Specifically, a control program on the phone decides when to start and stop the apps according to the traces; once the apps are opened, both the ad-disabled app and the ad-enabled app run the ad prefetching algorithms, but only the ad-enabled app downloads ads. The power consumption of the phone is measured using a Monsoon Power Monitor. The experiments are run with different ad prefetching algorithms, and all experiments are repeated five times to reduce measurement error.

Fig. 3.5 shows the total energy consumed by the ad-enabled app for all tests, where the app energy refers to the energy consumed by the CPU and display, and the ad energy refers to the energy consumed by ad downloading. As can be seen, the ad energy represents almost half of the total energy if prefetching is not used (shown as non-pref). Our greedy algorithm and discrete algorithm have comparable performance to the optimal, and outperform all other algorithms. Compared with non-pref, our algorithms can save 70% to 80% of the ad energy and 30% to

40% of the total energy. Compared with the 80th and max algorithms, our algorithms reduce the ad energy at least by half. When the ad size is 200 KB, the ad energy of our algorithms is one fourth of that of the max algorithm and two fifths of that of the 80th algorithm. This result is consistent with the simulation results shown in the last subsection.

Figure 3.5. Total energy consumed by the ad-enabled app for all tests, for (a) 50 KB, (b) 100 KB, and (c) 200 KB ads. Each ad prefetching algorithm is trained using the user-specific model.

To save energy, our algorithms adjust the number of ads to prefetch based on the ad size and the network quality. Fig. 3.6a plots the CDF of the number of ads prefetched with different ad sizes in the discrete algorithm. As can be seen, fewer ads are prefetched as the ad size increases from 50 KB to 200 KB, because more energy may be wasted prefetching unneeded ads when the ad size becomes larger. To see the impact of the network quality, 50 KB ads are prefetched at three locations with average downlink throughputs of 2 Mbps, 5 Mbps, and 8 Mbps, respectively. As shown in Fig. 3.6b, more ads are prefetched at locations with higher throughputs, because less energy is wasted prefetching unneeded ads when the network quality becomes better.

Figure 3.6. CDF plots of the number of ads prefetched by the discrete algorithm: (a) fewer ads are prefetched as the ad size increases from 50 KB to 200 KB; (b) more ads are prefetched at locations with higher throughputs.

The CPU energy consumed to run our algorithms can be calculated as the energy consumption of the ad-disabled app minus that of the no-ad app. Since the maximum number of ads needed is not large (less than 100), our algorithms can calculate the prefetching schedule very quickly and consume little CPU energy.
For the discrete algorithm, the CPU energy is only about 0.3% of the energy consumed to prefetch ads.

3.6 Performance Evaluations: Mobile Video Streaming

In this section, we use mobile video streaming to evaluate the performance of our algorithms.

3.6.1 Mobile Video Streaming

A video can be encoded with various coding standards. The most commonly used coding standard for streaming video is H.264, which is recommended by YouTube [59]. In general, a higher bitrate indicates higher video quality. At the same bitrate, a newer coding standard like H.264 achieves substantially better video quality than an older one like H.263. The amount of data to download can be calculated by multiplying the viewing time by the bitrate. For example, viewing a 400 Kbps video for 60 seconds requires 24 Mb (3 MB) of data.
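The bitrate-times-time arithmetic above can be checked directly; this is a trivial sketch using the 400 Kbps / 60 s values from the example.

```python
# Data needed for a viewing session: bitrate (Kbps) x time (s) gives bits,
# converted to megabytes (1 MB = 8,000,000 bits here, matching the text).
def video_data_mb(bitrate_kbps, seconds):
    return bitrate_kbps * 1000 * seconds / 8e6

video_data_mb(400, 60)  # 3.0 MB, as in the example above
```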

3.6.2 Mobile Video Streaming Algorithms

To save energy during mobile video streaming, many algorithms have been proposed to prefetch (download) video content. We compare our Greedy and Discrete algorithms with five other algorithms, briefly described as follows:

• Whole video: This algorithm downloads the whole video at once.

• ON-OFF: An algorithm used by YouTube that uses a fixed-size buffer: the network connection is closed when the buffer is full and reopened when the buffer is almost empty. In the experiment, the buffer is set to accommodate a 60-second chunk of video.

• GreenTube: GreenTube [53] first predicts the remaining time a user is willing to watch the video, and then chooses the buffer size from a set of candidate values accordingly. After determining the buffer size, it performs similarly to the ON-OFF algorithm. The effectiveness of GreenTube is affected by the prediction accuracy and the selection of candidate values, which are empirically determined. The candidate values are set to {1, 3, 5, 10, 20, 50} MB in this chapter.

• eSchedule: Similar to our algorithms, eSchedule [52] also considers the viewing time distribution of a video. Based on the distribution, it decides how much data to prefetch by minimizing the expected energy wastage for the current prefetch only. Different from eSchedule, our algorithms further consider future prefetches when deciding how much data to prefetch currently. Specifically, our algorithms calculate an n-step prefetching schedule S = {x_1, x_2, …, x_n} to prefetch all the remaining data, and prefetch (x_1 − x_0) amount of data at the current step, where x_0 is the amount of data already being used.

• Optimal: Optimal knows the viewing time of the video and downloads the needed video content at once. Note that the optimal algorithm does not exist, and it only provides a performance upper bound.

3.6.3 Video Viewing Traces

We collected video viewing traces from an online course platform, covering September 2014 to January 2015 (130 days in total). There are 102 different videos, each watched between two hundred and three thousand times by different users. The videos are around 35 minutes long, and the median viewing time for each video ranges from 3 to 6 minutes. The bitrate of these videos varies from 400 Kbps to 1 Mbps; to simplify the simulation, we set the bitrate to 400 Kbps for all videos. Fig. 3.2 shows an example of the viewing content distribution for one video (watched by different users) based on the traces.

Figure 3.7. Total energy consumption of downloading videos for all video viewings under different network quality (3, 5, and 10 Mbps downlink throughput). The segment size A in our discrete algorithm is set to 50 KB.

3.6.4 Trace-Driven Simulations

We wrote C++ code to simulate the video streaming process for each algorithm and calculate the energy consumption of downloading videos using the energy model of the Samsung Galaxy S6 described in Section 3.3.1. Three downlink throughputs are considered: 3 Mbps, 5 Mbps, and 10 Mbps. The segment size A in the discrete algorithm is set to 50 KB. The CPU energy consumed to run the algorithm is not counted in the simulation, but will be evaluated in the next subsection.

3.6.4.1 Energy Consumption

In the first round of simulations, for each video, we use the first half of the video viewing trace to train the algorithms and the second half for evaluation. Fig. 3.7 shows the total energy consumption of downloading videos for all algorithms. Under various network quality, our Discrete algorithm always has the lowest energy consumption. It saves 5% to 10% of the total energy compared to Greedy, and 15% to 25% compared to eSchedule and GreenTube. Greedy is the second best approach, saving 10% to 15% of energy compared to eSchedule and GreenTube. ON-OFF and Whole video consume much more energy, because both the videos and the average viewing times are long. In the second round of simulations, for each video, we use the first half of the video viewing trace to train the algorithms and split the second half into two

Figure 3.8. CDF plots of the energy consumption for video viewing sessions using Trace 1. The simulations are performed for each video. The first half of the video viewing trace is used to train the algorithm, and the video viewing records longer than 3 minutes in the second half (Trace 1) are used for evaluation. (a) 3 Mbps throughput; (b) 5 Mbps throughput; (c) 10 Mbps throughput.

Figure 3.9. CDF plots of the energy consumption for video viewing sessions using Trace 2. The simulations are performed for each video. The first half of the video viewing trace is used to train the algorithm, and the video viewing records shorter than (or equal to) 3 minutes in the second half (Trace 2) are used for evaluation. (a) 3 Mbps throughput; (b) 5 Mbps throughput; (c) 10 Mbps throughput.

traces (i.e., Trace 1 and Trace 2) for evaluation. Trace 1 contains the video viewing records that are longer than 3 minutes, and Trace 2 contains the records that are shorter than or equal to 3 minutes. Fig. 3.8 and Fig. 3.9 plot the CDFs of the energy consumption for video viewing sessions using Trace 1 and Trace 2, respectively. Under various network conditions, eSchedule and GreenTube are close to our algorithms when the viewing time is long (Trace 1), but consume much more energy than ours when the viewing time is short (Trace 2). The reason is as follows. GreenTube only considers several candidate values to determine the buffer size. Those candidate values may be too high or too low under the current network quality, so some energy may be wasted. eSchedule minimizes the expected energy wastage for the current prefetch, so it may prefetch video too aggressively in order to save the tail energy. When the viewing time is short, a lot of energy is wasted on downloading unneeded video content. In contrast, our algorithms

Figure 3.10. CDF of the fraction of wasted data for video viewing sessions. The fraction of wasted data is calculated by dividing the amount of video content downloaded but unwatched by the amount of watched video content. (a) 3 Mbps throughput; (b) 5 Mbps throughput; (c) 10 Mbps throughput.

calculate an n-step prefetching schedule S = {x1, x2, . . . , xn} to decide how much data to prefetch, achieving a better tradeoff between the tail energy and the energy consumed on downloading unneeded data. Discrete performs better than Greedy, because Discrete calculates a discrete optimal solution, while Greedy is based on a 2-step prefetching schedule and prefetches more aggressively at each step. When the viewing time is long, both algorithms prefetch aggressively and have similar performance. ON-OFF consumes less energy than ours when the viewing time is short (Trace 2), but much more when the viewing time is long (Trace 1). This is because ON-OFF prefetches a 60-second chunk of video at a time. When the viewing time is very short, ON-OFF only needs to prefetch a few times and wastes little tail energy. In contrast, our algorithms prefetch more data, considering that users usually watch the video for a long time (i.e., the median viewing time is more than 3 minutes), and waste more energy downloading some unnecessary data. When the viewing time is long, ON-OFF prefetches many times and wastes a lot of tail energy, much more than the energy wastage of our algorithms, and hence underperforms them.

3.6.4.2 Data Wastage

We also record the amount of prefetched video content in the first round of simulations. The fraction of wasted data is calculated by dividing the amount of video content downloaded but unwatched by the amount of watched video content. As shown in Fig. 3.10, downloading the video at once (Whole video) leads to a lot of data being wasted. ON-OFF has the least amount of data wastage, but as mentioned above, it consumes too much energy. Compared with eSchedule and GreenTube, our algorithms waste much less data under different network quality. To further limit the amount of data being wasted, we can specify an upper limit for the estimated data wastage of each prefetch. For example, in the discrete algorithm, after i segments of data have been used, k segments of data are prefetched only if the expected data wastage of prefetching that much data, calculated by Σ_{x=i+1}^{k} [(F_K(x) − F_K(x−1)) / (F_K(k) − F_K(i))] × ((k − x) × A), is under the limit.

3.6.5 Testbed Development and Evaluation

We have implemented all the video streaming algorithms in a modified version of the VLC video player [60] on our Samsung Galaxy S6 phone (with an LTE data plan from AT&T) to evaluate the performance of our algorithms. In the experiment, we upload the video viewing traces to the phone and perform trace-driven evaluations. For each video, the first half of the video viewing trace is used to train the algorithms, and the second half is used for evaluation. A control program on the phone decides when to start and stop playing videos according to the traces, and the video player runs each video streaming algorithm to download the video content. The power consumption of the phone is measured using a Monsoon Power Monitor. To separate the CPU energy consumed to run the algorithms from the data transmission energy consumed to download videos, we run each test twice. In the first round, we measure only the CPU energy by disabling the cellular interface. In the second round, we measure the total energy with the cellular interface enabled. To estimate the downlink throughput, a small chunk of video is downloaded and used for estimation. According to our measurements, a 5-second chunk of video (i.e., 5 × 400 Kbps / 8 = 250 KB) is large enough to estimate the downlink throughput. Thus, every time new video content is needed, we first download a 5-second chunk of video to estimate the throughput, and then calculate the prefetching schedule based on the measured throughput to decide how much data to prefetch.

Figure 3.11. Impact of the time constraint τ on the segment size A. The calculation of A is based on the method described in Section 3.4.3.1.

3.6.5.1 Parameter Setup

Our algorithms prefetch new video content before the downloaded video runs out. This is because in LTE, there is a promotion delay of up to 1 second for the cellular interface to promote from IDLE to CONNECTED before it can start downloading data. Besides the promotion delay, it takes some time to download and decode video content before users can watch it. If new video content is not prefetched before the buffer runs out, the video will be interrupted by rebuffering [61]. Thus, our algorithms prefetch new video content five seconds before the buffer becomes empty, where the five-second value is based on [62]. One more second is added as the time constraint (τ) to calculate the prefetching schedule in our discrete algorithm. We use the method described in Section 3.4.3.1 to find the optimal segment size A in our discrete algorithm. Based on experiments on our Samsung Galaxy S6, the optimal A is 69.5 KB if the time constraint is ignored, and it takes 0.18 seconds for our discrete algorithm to calculate the prefetching schedule using the optimal A. Fig. 3.11 shows the obtained value of A under different time constraints (τ). As can be seen, when τ is less than 0.18 seconds, a larger A is chosen in order to meet the time constraint.

3.6.5.2 Evaluation Results

As shown in Fig. 3.12a, in which the total energy includes both the data transmission energy and the CPU energy, Discrete (using the optimal segment size) saves 10% of energy compared to Greedy, and 20% and 25% compared to eSchedule and GreenTube, respectively. Greedy saves 10% and 15% of energy compared to eSchedule and GreenTube, respectively. ON-OFF and Whole video consume much more energy than the others. This result is consistent with the simulation results in the last subsection.

Figure 3.12. Energy consumption of different algorithms. The power consumption is measured using a Monsoon Power Monitor. (a) Total energy consumption of the video player for all video viewings; the optimal A (69.5 KB) is used in the discrete algorithm. (b) Impact of A on the energy consumption of the discrete algorithm; the red star indicates the optimal A (A = 69.5 KB).

Fig. 3.12b shows the total energy consumption of Discrete with different values of the segment size A. As can be seen, if A is small (e.g., 5 KB), the CPU energy consumed to run the algorithm is significant. If A is large (e.g., 50 MB), although the CPU energy is lower, the data transmission energy consumed to download videos increases significantly. This is because at least A amount of data is prefetched even when less data is actually needed; as A becomes larger, more energy is wasted on downloading unneeded video. Thus, A should be set to a value that balances the CPU energy and the data transmission energy. The red star (A = 69.5 KB) in the figure is the value calculated by our method described in Section 3.4.3.1. It leads to very low CPU energy while not increasing the data transmission energy significantly compared to smaller values (e.g., 5 KB). To maximize the energy saving, enough data should be prefetched until the potential tail energy saving is offset by the energy wasted on prefetching unnecessary data. Considering the high tail energy and the relatively low data transmission energy in LTE, it is better to prefetch more data (video). To verify this intuition, we keep track of the amount of prefetched data in our discrete algorithm.
As shown in Fig. 3.13, the prefetched data size ranges from 10 MB to 50 MB. With the average bitrate of about 400 Kbps, the prefetched video lasts from 200 seconds to 1000 seconds, which is much longer than the tail time.

Figure 3.13. CDF plot of the prefetched data size according to our discrete algorithm.

3.6.5.3 Compatibility with DASH

In the experiments, the downlink throughput is consistently higher than the video bitrate (i.e., 400 Kbps). However, the downlink throughput may drop below the video bitrate when the network quality is poor. In this case, the video cannot be downloaded in time and rebuffering occurs frequently. All video streaming algorithms will keep downloading and there is no opportunity for prefetching. Dynamic Adaptive Streaming over HTTP (DASH) [56, 63, 64] is a popular solution to this problem, where a lower video bitrate is selected based on the current downlink throughput. After a new bitrate is selected by DASH, our algorithms can be used to recalculate the prefetching schedule.

3.7 Conclusions

In this chapter, we generalized and formulated the prefetch-based energy optimization problem, where the goal is to find a prefetching schedule that minimizes the energy consumption of data transmissions under the current network quality. To solve the formulated nonlinear optimization problem, we first proposed a greedy algorithm, which iteratively decides how much data to prefetch based on the current network quality. Then, we proposed a discrete algorithm to improve its performance and derived its performance bound. We have implemented and evaluated the proposed algorithms in two apps: in-app advertising and mobile video streaming. Evaluation results show that: in in-app advertising, our algorithms can save more energy than existing algorithms by adaptively adjusting the number of ads to be prefetched according to the ad size and the network quality; in mobile video streaming, our algorithms can save energy by 15% to 25% compared to the best existing algorithms (i.e., eSchedule and GreenTube) under various network conditions.

Chapter 4

Context-Aware Task Offloading for Wearable Devices

4.1 Introduction

Wearable devices like smartwatches or glasses are becoming increasingly popular, accompanied by a wide range of novel wearable apps, such as health monitoring [65, 66], augmented reality [15, 67, 68], and gesture recognition [69, 70]. Most of these emerging apps involve computationally intensive tasks. However, due to their size limitation, wearable devices do not have enough power and computation capability to process these tasks. As a solution, wearable devices can offload computationally intensive tasks to the connected smartphone [71]. Previous research has investigated how to make offload decisions on wearable devices [15, 21, 22], but none of it considers how to execute the offloaded tasks on the smartphone. The existing Android system allocates CPU resources to a task according to its performance requirement, which is determined by the context of the task, i.e., whether the task is related to user interaction. A task related to user interaction has a high performance requirement and should be processed as fast as possible. Thus, Android executes such a task in a foreground process, which can obtain all CPU resources and thus runs faster. In contrast, a task unrelated to user interaction has a low performance requirement. Thus, Android executes such a task in a background process, which is restricted to limited CPU resources and thus runs slower. However, due to the lack of task context information, tasks offloaded from wearable devices cannot be properly executed on smartphones. Current Android smartphones simply execute all offloaded tasks in background processes. Then, tasks related to user interaction cannot be processed promptly, resulting in high interaction latency on wearable devices. To avoid this problem, smartphones could execute all offloaded tasks in foreground processes, but resources may then be wasted on unimportant tasks.
This problem becomes more severe for modern smartphones equipped with the ARM big.LITTLE architecture [23], where high-performance, high-power big CPU cores are combined with energy-efficient but low-performance little CPU cores. On such smartphones, foreground processes can run on big cores to accelerate the processing of tasks related to user interaction, while background processes only run on a little core to save energy. Since these two types of CPU cores differ significantly in power consumption and computation performance, executing all offloaded tasks in background processes will incur significant interaction latency on wearable devices, while executing them all in foreground processes will waste a large amount of energy on smartphones. To solve this problem, we propose a Context-Aware Task Offloading (CATO) framework, where offloaded tasks can be executed based on their context. Specifically, tasks related to user interaction are executed in foreground processes to reduce the interaction latency on wearable devices, while those unrelated to user interaction are executed in background processes to save energy on the smartphone. CATO has three salient features. First, CATO does not require any modification to the underlying Android OS. Second, CATO is backward compatible with traditional context-unaware offloading solutions to avoid unexpected failures when offloading tasks to a CATO-disabled smartphone. That is, CATO enables wearable devices to detect whether the connected smartphone supports CATO, and if not, to switch back to traditional ways of offloading tasks. Third, CATO explores opportunities to further offload tasks from the smartphone to the cloud. Different from prior approaches that use offline profiling [72, 5, 73, 74], we use online profiling to estimate the latency and energy cost of executing a task, and consider the task context when making offload decisions. We have implemented CATO on Android platforms.
To evaluate its performance, we have developed two wearable apps: a speech recognition app with active user interaction, and an activity and health monitoring app without user interaction. Both apps use CATO to offload tasks to the smartphone. To measure the power consumption of the smartphone, we designed and fabricated a Flex PCB based battery interceptor to connect the smartphone to an external power monitor. Experimental results show that CATO can reduce latency by at least one third for the task related to user interaction (compared to executing the task in a background process on the smartphone), and reduce energy by more than half for the task unrelated to user interaction (compared to executing the task in a foreground process on the smartphone). Our contributions are as follows.

• We find that smartphones cannot properly execute tasks offloaded from wearable devices due to the lack of task context information.

• We design a context-aware task offloading framework, in which offloaded tasks can be properly executed on the smartphone or further offloaded to the cloud according to their context, aiming to achieve a balance between good user experience on wearable devices and energy saving on the smartphone.

• We implement the CATO framework on the Android platform and develop two applications to demonstrate its effectiveness.

The rest of this chapter is organized as follows. Section 4.2 presents some preliminaries. Section 4.3 provides the motivation for context-aware task offloading. We present the design and implementation of the task offloading framework in Section 4.4 and Section 4.5, respectively. The performance of CATO is evaluated in Section 4.6. Section 4.7 discusses related work, and Section 4.8 concludes the chapter.

4.2 Preliminaries

In this section, we first provide some background of how tasks are executed in Android, and then introduce the emerging ARM big.LITTLE architecture and its support in Android. 66

Figure 4.1. Android executes all tasks (components) of the same app in the same process.

Figure 4.2. If a foreground app depends on another app, both apps will be run in foreground processes.

4.2.1 Task Execution in Android

In Android, a component is a unique building block that performs a specific task. The most common components are the activity and the service, where an activity provides a screen of user interface (UI), and a service performs a long-running task. An Android app usually consists of multiple components performing various tasks. For example, in a text translation app, an activity collects the user input and outputs the result, while a service is responsible for the real task of translating the text. Since tasks of the same app usually depend on each other, Android assumes these tasks have the same performance requirement and executes them in the same process (Fig. 4.1). For a foreground app (i.e., an app with an active activity that the user interacts with), its tasks are related to user interaction and have high performance requirements. Thus, Android executes them in a foreground process, which can obtain all system resources and thus runs faster. For a background app, its tasks are unrelated to user interaction and have low performance requirements. Thus, Android executes them in a background process, which is restricted to limited system resources and runs slower. Sometimes a foreground app may call another app. For example, to convert the translated text into speech, the text translation app may ask the text-to-speech app for help (Fig. 4.2). The text-to-speech app should then also run in a foreground process to reduce the interaction latency. To handle this, Android keeps track of the relationships among running apps and manages their processes accordingly.

Figure 4.3. Average power, latency, and total energy consumption of a little core and a big core running the same workload at different frequencies on the Nexus 5X. (a) Average power; (b) Latency; (c) Energy consumption.

4.2.2 big.LITTLE Architecture on Smartphones

The ARM big.LITTLE architecture [23] is becoming increasingly popular on modern smartphones, where big cores are designed for maximum computation performance while little cores are designed for maximum energy efficiency. With this architecture, a task can be dynamically allocated to a big or little core according to its performance requirement. By allocating urgent tasks (related to user interaction) to big cores, the smartphone can accelerate their processing and thus reduce the interaction latency. By allocating other unimportant tasks to little cores, the smartphone can save energy. The most popular commercial implementations of big.LITTLE on smartphones include Samsung Exynos 5/7/8 and Qualcomm Snapdragon 808/810. For example, the LG Nexus 5X smartphone uses the Qualcomm Snapdragon 808 with a quad-core Cortex-A53 (little) and a dual-core Cortex-A57 (big). The little and big cores can operate in a frequency range from 384 MHz to 1.44 GHz, and from 384 MHz to 1.82 GHz, respectively. To understand the difference between big and little cores in terms of computation performance and energy consumption, we ran the same workload on a little core and a big core at different frequencies on a Nexus 5X smartphone. The workload is a number of iterations that generates 100% CPU load. The CPU frequency is manually controlled by writing files in the /sys/devices/system/cpu/[cpu#]/cpufreq virtual file system with root privilege. To specify the CPU core that runs the workload, the thread affinity is set by calling the sched_setaffinity system call through the Java Native Interface (JNI) in Android. Fig. 4.3 shows the measured results. The average power of the little core (554 mW) at its highest frequency of 1.44 GHz is only one fifth of the power of the big core (2468 mW) at its highest frequency of 1.82

GHz. Compared to the little core at the same CPU frequency, although the big core can reduce the latency (execution time) by one third (Fig. 4.3b), due to its high power level, the big core increases the total energy consumption by a factor of two (Fig. 4.3c). In summary, big cores are much more powerful than little cores, but also consume significantly more energy.

4.2.3 big.LITTLE Support in Android

Android versions before 6.0 do not differentiate between big and little cores, so there is no guarantee that important tasks can be executed on big cores when necessary. As big.LITTLE becomes increasingly prevalent on smartphones, Android 6.0 Marshmallow, the latest Android version released in October 2015 [16], brings better support for big.LITTLE through cpusets. A cpuset defines a list of CPU cores on which a process can execute. By moving processes into different cpusets, Android can control the usage of big and little cores. For example, if processes are moved to a cpuset containing only a little core, they can only run on that little core. The cpusets are configured in the device-specific init.rc file and set up by the init process during system startup. On most devices, such as the Nexus 5X, two cpusets are created: one for foreground processes, which contains all CPU cores, and another for background processes, which only contains a little core, i.e., cpu0. As introduced above, in Android, foreground processes execute tasks related to user interaction, and background processes execute other unimportant tasks. Thus, by separating foreground and background processes using these two cpusets, Android is able to allocate urgent tasks to big cores to reduce latency, and unimportant tasks to little cores to extend battery life.

4.3 The Motivation for Context-Aware Task Offloading

The smartphone knows the context of local tasks, so it can allocate them to different CPU cores according to their performance requirements. However, for a task offloaded from a wearable device, the smartphone is unaware of its context. Executing offloaded tasks without any context information will either

waste more energy on the smartphone or cause extra delay on wearable devices. For example, Fig. 4.4 shows two wearable apps that offload tasks to the smartphone. App A is a foreground app with active user interaction, while app B is a background app without user interaction. App A and app B offload task A and task B to the smartphone by remotely invoking service A and service B, respectively. If the smartphone knew the context of each task, task (service) A would be executed in a foreground process, since it is related to user interaction. Similarly, task (service) B would be executed in a background process, since it does not involve user interaction. However, current Android devices offload tasks without context information. That is, the invocation of the remote service is implemented by simply sending a message using the Google MessageApi, and that message contains no information about whether the offloading app is a foreground app or a background app. Current Android smartphones simply execute all offloaded tasks in background processes. Then important tasks like task A are executed on a little core, resulting in significant interaction latency on wearable devices. To avoid this problem, smartphones could execute all offloaded tasks in foreground processes, but energy would be wasted if unimportant tasks like task B were executed on big cores. Thus, to better utilize big.LITTLE cores on the smartphone, it is necessary to inform the smartphone about the context of offloaded tasks. The smartphone can then execute them on proper CPU cores according to their performance requirements.

Figure 4.4. Wearable apps A (foreground) and B (background) offload tasks to the smartphone by invoking services A and B, respectively. Since the smartphone does not know the context of these two tasks, it cannot properly run service A in a foreground process and service B in a background process.

Smartphones can further offload tasks to the cloud.
In most existing work, researchers only consider a single objective when offloading tasks to the cloud. For example, in [75], the goal is to minimize latency in order to provide good user experience. However, reducing latency may come at the cost of consuming more energy. For example, the LTE interface on the smartphone consumes a large amount of energy (i.e., tail energy) after a data transmission. Thus, although offloading a task through LTE can sometimes reduce latency, it may consume more energy than running the task on the smartphone. Such energy waste is unnecessary for tasks unrelated to user interaction, which do not need to be processed very quickly. To achieve a balance between good user experience and energy saving, we should consider the task context when making offload decisions. That is, tasks related to user interaction should be offloaded if latency can be reduced, and tasks unrelated to user interaction should be offloaded if energy can be saved.

Figure 4.5. Overview of the CATO architecture.

4.4 Context-Aware Task Offloading (CATO)

In this section we introduce the design of our offloading framework CATO. We first give an overview of CATO and then describe its major components.

4.4.1 CATO Overview

CATO enables tasks to keep their context when offloaded from wearable devices to the smartphone. Based on the task context, CATO achieves a balance between good user experience on wearable devices and energy saving on the smartphone by minimizing the latency of UI-related tasks and maximizing the energy saving of other unimportant tasks. The architecture of CATO is shown in Fig. 4.5. A client proxy runs on the wearable device; it is responsible for discovering whether the connected smartphone is CATO enabled, attaching context information to offloaded tasks, and offloading tasks to the smartphone. On the smartphone, a server proxy is responsible for receiving offloaded tasks. Based on measurements from past task executions collected by a profiler, a solver estimates the latency and energy cost of executing a new task, and then makes the offload decision.

4.4.2 Client and Server Proxies

It is important to provide a mechanism for the wearable device to discover whether the connected smartphone has CATO enabled. If CATO is not supported, tasks can be offloaded in traditional context-unaware ways to avoid unexpected failures of wearable apps. The CATO discovery service is implemented using the node capability mechanism supported in Android, which enables the wearable device to find connected nodes with a specific capability. In the server proxy, we declare the CATO capability for the smartphone as

CATO_capability

The client proxy can detect and connect to a smartphone with the CATO capability. If no such smartphone is found, it immediately notifies the wearable app to use traditional ways to offload tasks. Once a CATO-enabled smartphone is found, the client proxy attaches context information to tasks and sends them to the server proxy. To obtain the context of a task, we query the activity manager in Android to check the running status of the app that offloads the task. If the app is a foreground app (i.e., it runs in a foreground process), the offloaded task is considered related to user interaction and has a high performance requirement. Otherwise, the task is unrelated to user interaction and has a low performance requirement. Note that the running status of an app changes over time. For example, a foreground app could move to the background if the user opens another app. Therefore, the context information of an offloaded task is retrieved in real time.

4.4.3 Profiler

The profiler collects two types of data: CPU profiling data and network profiling data. CPU profiling focuses on the latency and CPU energy consumption of task executions on the smartphone, and network profiling focuses on the network characteristics of the wireless environment, which are used to calculate the latency and energy cost of offloading a task to the cloud.

4.4.3.1 CPU Profiling

Prior work [72, 75, 5] only considers specific tasks, and uses offline or instrumentation-based methods to profile the CPU energy consumption on the smartphone. Different from existing work, we adopt online profiling, which can be used for any task without modifying the source code. To simplify the profiling, we assume that the service that executes the task runs in its own process. If the app's process includes other components besides the service, we can use the service's android:process attribute in the app manifest to let the service run in a separate process. The goal is then to profile the CPU energy consumption of the process hosting the service, and we have the following two approaches.

The first approach utilizes the battery fuel gauge on smartphones, which provides whole-phone power readings (in the form of battery voltage and discharge current) through files in /sys/class/power_supply/battery. We can use the power reading before running a process as the baseline to estimate the process's power consumption during its execution. However, on most smartphones, the fuel gauge does not provide instantaneous power readings. Instead, it reports the average battery power over a long time interval; on the Nexus 5X smartphone, for example, this interval is around thirty seconds. Thus, the power readings are too coarse-grained to accurately profile the process in real time.

The second approach (adopted by our profiler) builds power models based on the hardware components of the smartphone, and then estimates the energy consumption of a process based on the utilization of each hardware component. The CPU power model is inspired by [76]. We first measured the power consumption of different CPU cores at 100% utilization under different frequencies using an external power monitor, as shown in Fig. 4.3a. We then built a utilization-based CPU power model, in which the CPU power consumption is linear in the CPU utilization for a given frequency. To use this model, we log the process's CPU utilization (from /proc/[pid]/stat) and the frequencies of the CPU cores (from /sys/devices/system/cpu/[cpu#]/cpufreq) every half second during the process's execution. We then estimate the process's CPU energy over each logging interval and sum these values to obtain the total CPU energy consumption of the process. The accuracy of CPU profiling is evaluated by comparing the estimated CPU energy consumption of executing a task on the smartphone with the actual energy measured by a power monitor. The mean error is 7% and the standard deviation is 5.8%. The profiling overhead is negligible, since the profiler only runs during the task execution and reads system files periodically.

Table 4.1. Power consumption of various network interfaces

            State       Power (mW)       Duration (s)
            IDLE*       979.5 ± 23.3     -
  WiFi      Sending     2529.6 ± 53.1    -
            Receiving   2247.4 ± 32.9    -
  LTE       Sending     2643.2 ± 49.5    -
            Receiving   2286.4 ± 54.3    -
            Promotion   1586.3 ± 27.6    0.5 ± 0.1
            Tail        1265.3 ± 38.2    11.3 ± 0.8

*: IDLE indicates the power consumption of the smartphone when all network interfaces are in the IDLE state and the screen is on.
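The utilization-based CPU energy estimation can be sketched as follows (an illustrative Python sketch; the per-frequency power values are made-up placeholders, not the measurements from Fig. 4.3a):

```python
# Power (mW) of one core at 100% utilization, per frequency in kHz.
# These numbers are illustrative placeholders only.
POWER_AT_FULL_LOAD = {
    600_000: 120.0,
    1_440_000: 350.0,
    1_824_000: 600.0,
}

def interval_energy(samples, dt=0.5):
    """Estimate a process's CPU energy (mJ) from periodic logs.

    samples: list of (freq_khz, utilization) pairs, one per logging
    interval of dt seconds. Per the model in the text, power is
    assumed linear in utilization for a given frequency; per-interval
    energies are summed to get the total.
    """
    energy_mj = 0.0
    for freq, util in samples:
        power_mw = POWER_AT_FULL_LOAD[freq] * util  # linear in utilization
        energy_mj += power_mw * dt                  # mW x s = mJ
    return energy_mj

# Two seconds at half load on the middle frequency (4 intervals of 0.5 s):
e = interval_energy([(1_440_000, 0.5)] * 4)
```

In practice the utilization and frequency samples would come from the /proc and /sys files named above.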

4.4.3.2 Network Profiling

To calculate the latency and energy cost of offloading a task, we need to know the network characteristics of the wireless environment, such as the bandwidth and power consumption. Each time CATO offloads a task to the cloud, the profiler keeps track of the amount of data transmitted (from /proc/uid_stat/[uid#]) and the transfer duration to calculate the bandwidth. The current bandwidth is estimated as the average of the five most recent measurements. To determine the power consumption of the wireless interfaces, we use the Monsoon power monitor as the power supply for our Nexus 5X smartphone and measure the average instantaneous power. The results are summarized in Table 4.1, where the power consumption is measured when the screen is on.

Table 4.2. Features of executing a speech recognition task

  Category                 Features
  Task input               Speech length, Language, Audio sample rate
  Execution environment    Foreground / background
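The bandwidth estimation described in Section 4.4.3.2, an average over the five most recent transfers, can be sketched as follows (illustrative Python):

```python
from collections import deque

class BandwidthEstimator:
    """Average throughput over the five most recent transfers."""

    def __init__(self, window=5):
        # deque(maxlen=...) automatically drops the oldest sample.
        self.samples = deque(maxlen=window)

    def record(self, bytes_moved, duration_s):
        # One measurement per offloaded task: bytes moved / time taken.
        self.samples.append(bytes_moved / duration_s)

    def bandwidth(self):
        # Returns None until at least one transfer has been observed.
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)

est = BandwidthEstimator()
for size, dur in [(1e6, 1.0), (2e6, 1.0), (3e6, 1.0)]:
    est.record(size, dur)
# average of 1, 2, and 3 MB/s
```

Separate estimators would be kept for the uplink and downlink directions.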

4.4.4 Solver

When a new task arrives, the solver first estimates the latency and energy cost of executing the task based on the profiling data, and then makes the offload decision.

4.4.4.1 Estimating Latency and Energy Cost

The latency and energy cost of executing a task on the smartphone can vary significantly with different inputs and execution environments. For example, it may take different amounts of time and energy to process a text translation task with input text of different lengths or in different languages. Based on the CPU profiling data, CATO constructs a prediction model that considers the task inputs as well as the execution environment to predict the latency and energy consumption of executing a new task on the smartphone. We compared the following three predictors, and chose the one with the smallest prediction error in terms of the mean error (mean absolute error divided by mean) and the standard deviation of errors.

• NAIVE returns the mean value of the latency and energy consumption measured from past executions of the task.

• KNN, k-nearest neighbors (k = 5), finds the k past task executions (neighbors) that are closest in distance (measuring the similarity of features) to the new execution. The prediction is a weighted average of these k neighbors, where the weight of each neighbor is based on its distance to the new execution.

• RTREE constructs a regression tree based on the task features for the prediction.
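The distance-weighted KNN predictor described above can be sketched as follows (illustrative Python; the distance metric and tie-breaking are implementation choices not fixed by the text):

```python
import math

def knn_predict(history, features, k=5):
    """Predict latency from past executions of the same task type.

    history: list of (feature_vector, measured_latency) pairs.
    features: feature vector of the new execution.
    The prediction is a weighted average of the k nearest neighbors,
    weighted by inverse Euclidean distance (epsilon avoids division
    by zero on an exact feature match).
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    neighbors = sorted(history, key=lambda h: dist(h[0], features))[:k]
    weights = [1.0 / (dist(f, features) + 1e-9) for f, _ in neighbors]
    return sum(w * y for (_, y), w in zip(neighbors, weights)) / sum(weights)

# Past speech recognition runs: (speech length in s,) -> latency in s
history = [((1,), 0.9), ((2,), 1.8), ((4,), 3.9), ((8,), 8.2), ((10,), 10.1)]
pred = knn_predict(history, (3,), k=2)  # between the 2 s and 4 s neighbors
```

Categorical features such as language or foreground/background would need to be encoded numerically (and scaled) before entering the distance computation.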

Table 4.3. Error of predicting the latency of executing a speech recognition task on the smartphone

           Mean error (%)   Standard deviation (%)
  NAIVE    40.5             26.6
  KNN      4.5              8.1
  RTREE    6.4              11.8

Figure 4.6. Measured and KNN predicted latency

In the experiment, we ran a speech recognition task two hundred times with different inputs on the smartphone. We measured the accuracy of each predictor in predicting the latency of executing the task. The features used by KNN and RTREE are summarized in Table 4.2. Every time a test finishes, the test result is added to the training dataset to retrain the predictors. Table 4.3 shows the prediction error for each predictor. The high prediction error of NAIVE (40.5% mean error, 26.6% standard deviation) indicates that the latency of executing the same task varies significantly. KNN outperforms RTREE with a smaller prediction error (4.5% vs. 6.4% mean error, 8.1% vs. 11.8% standard deviation). Fig. 4.6 shows the measured and KNN predicted latency for each test. The predicted latency closely follows the measured latency after the initial 20 tests. Based on these experimental results, KNN is chosen as the predictor.

For each type of task, a predictor P is trained using the task's CPU profiling data. When a new task T_i arrives, we use P(f_1^i, f_2^i, ..., f_n^i) to predict the latency (L_local^i) and energy cost (E_local^i) of executing T_i on the smartphone, where f_1^i, f_2^i, ..., f_n^i are T_i's input features and execution environment features (i.e., foreground / background).

To estimate the cost of offloading T_i to the cloud, we first estimate the current uplink bandwidth (B_up) and downlink bandwidth (B_down) according to the network profiling. Then the latency of sending T_i's input data d_in^i is L_send^i = d_in^i / B_up, and

Algorithm 3: Algorithm to make offload decisions
Input: a task T_i with features f_1^i, f_2^i, ..., f_n^i
Result: how to execute the task
1  predict L_local^i and E_local^i using P(f_1^i, f_2^i, ..., f_n^i);
2  calculate L_send^i, E_send^i, L_rcv^i, and E_rcv^i based on the current bandwidth;
3  L_off^i ← L_send^i + L_rcv^i + L_cloud^i;
4  E_off^i ← E_send^i + E_rcv^i;
5  if T_i is related to user interaction then   // T_i is offloaded by a foreground app
6      if L_off^i < L_local^i then
7          offload T_i to the cloud;
8      else
9          execute T_i in a foreground process;
10     end
11 else
12     if E_off^i < E_local^i then
13         offload T_i to the cloud;
14     else
15         execute T_i in a background process;
16     end
17 end

the energy consumption is E_send^i = P_up × L_send^i, where P_up is the power consumption of sending data through the wireless interface. Similarly, the latency of receiving T_i's output data d_out^i is L_rcv^i = d_out^i / B_down, and the energy consumption is E_rcv^i = P_down × L_rcv^i, where P_down is the power consumption for downloading data. In LTE networks, we also consider the promotion delay [27] when calculating the latency and the tail energy [14] when calculating the energy consumption. Finally, the latency of offloading T_i to the cloud is

L_off^i = L_send^i + L_rcv^i + L_cloud^i,    (4.1)

where L_cloud^i is the estimated execution time of T_i on the cloud. The energy consumption of offloading T_i is

E_off^i = E_send^i + E_rcv^i.    (4.2)
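Combining the cost estimates in Eqs. (4.1) and (4.2) with the decision rule of Algorithm 3, the solver's logic can be sketched as follows (an illustrative Python sketch; the LTE promotion delay and tail energy are folded in as simple additive terms, and all parameter values in the example are made up):

```python
def offload_decision(task, bw_up, bw_down, p_up, p_down,
                     l_local, e_local, l_cloud,
                     promo_delay=0.0, tail_energy=0.0):
    """Return 'cloud', 'foreground', or 'background' for task T_i.

    task: dict with 'd_in', 'd_out' (bytes) and 'interactive' (bool).
    bw_*: bandwidths (bytes/s); p_*: radio power (W);
    l_local / e_local: predicted on-phone latency (s) and energy (J);
    l_cloud: estimated cloud execution time (s).
    promo_delay / tail_energy crudely model the LTE state machine.
    """
    l_send = task["d_in"] / bw_up
    l_rcv = task["d_out"] / bw_down
    e_send = p_up * l_send
    e_rcv = p_down * l_rcv
    l_off = promo_delay + l_send + l_rcv + l_cloud   # Eq. (4.1) + promotion
    e_off = e_send + e_rcv + tail_energy             # Eq. (4.2) + tail

    if task["interactive"]:                          # minimize latency
        return "cloud" if l_off < l_local else "foreground"
    return "cloud" if e_off < e_local else "background"  # minimize energy

# An interactive speech task over WiFi: offloading wins on latency.
t = {"d_in": 200_000, "d_out": 1_000, "interactive": True}
choice = offload_decision(t, bw_up=2e6, bw_down=5e6, p_up=2.53, p_down=2.25,
                          l_local=3.0, e_local=4.0, l_cloud=0.5)
```

For a small background task over LTE, the same function keeps the task local because the tail energy dominates the offload cost.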

4.4.4.2 Offload Decision

Table 4.4. CATO API used for preparing a task and offloading the task to the smartphone

  Class        Method                              Description
  CATOTask     CATOTask(Input)                     Constructs a CATO task.
               addFeature(Key, Value)              Adds features for the prediction of the latency and energy cost of executing the task.
  OffloadApi   isConnected()                       Returns whether CATO is enabled.
               offload(CATOTask)                   Offloads a task to CATO.
               setOffloadResultCallback(Callback)  Sets a callback to be called when the offload result is ready.

The offload decision engine decides how to execute a task T_i according to its context. If T_i is related to user interaction (offloaded by a foreground app), we aim to minimize its latency in order to provide a better user experience. We can execute T_i in a foreground process on the smartphone, or offload it to the cloud if the latency can be further reduced (i.e., L_off^i < L_local^i). If T_i is unrelated to user interaction (offloaded by a background app), we aim to minimize the smartphone's energy consumption. We can execute T_i in a background process, or offload it to the cloud if energy can be saved (i.e., E_off^i < E_local^i). The algorithm to make offload decisions is described in Algorithm 3. In this algorithm, we only consider tasks originating from wearable devices, but it can also be used for tasks generated on the smartphone. Once the decision is made, CATO invokes the corresponding service to execute the task on the smartphone or offload it to the cloud. Since CATO runs in the background, the invoked service runs in a background process by default. We therefore need to move the background process into the foreground when a foreground process is required to execute the task. If CATO has the root privilege, this can easily be done by adding the process id to the file /dev/cpuset/foreground/tasks. However, most smartphones are unrooted, and users may have privacy concerns about granting the root privilege. As an alternative, we find that when a service issues a notification by calling the method startForeground(Notification), its hosting process becomes a foreground process. This approach does not require the root privilege, but it causes minor interference (i.e., a notification) to the user. We have implemented the second approach in CATO.
In the future, we plan to integrate CATO into the Android system and use the first approach.

4.5 Implementation

In this section, we introduce the application programming interface (API) of CATO and two apps built on CATO.

4.5.1 CATO API

CATO is implemented as an Android library that developers use in a wearable app to prepare a task with context included and offload the task to the smartphone. Its API, shown in Table 4.4, is designed similarly to the Google MessageApi for ease of use.

4.5.1.1 Preparing a Task

CATO provides the CATOTask class for developers to construct a task. Although CATO can extract input features such as data size from the task input for predicting the latency and energy cost of executing the task (see Section 4.4.4), it is very hard to obtain all useful features (e.g., audio sample rate) automatically. Thus, we provide the addFeature() method for developers to optionally add features to the task.

4.5.1.2 Offloading a Task

As introduced in Section 4.4.2, CATO runs a client proxy on the wearable device to offload tasks to the smartphone. The client proxy exposes interfaces, which are defined by Android Interface Definition Language (AIDL), for other apps to use. To simplify the usage of these interfaces, CATO provides the OffloadApi class, in which the isConnected() method returns whether CATO is enabled (CATO discovery), the offload() method offloads a task from the wearable device to the smartphone, and the setOffloadResultCallback() method allows developers to register a callback function to process the offload result.

4.5.2 Applications

We develop two apps on top of CATO: a speech recognition app, and an activity and health monitoring app (smart alarm).

4.5.2.1 Speech Recognition

Most wearable devices like smartwatches do not have keyboards due to the very limited screen size. The speech recognition app provides an alternative way to get the user's input via voice by converting the user's speech into text. It collects the user's speech on the wearable device, and then uses CATO to offload the speech recognition task to the smartphone (or further to the cloud), where the task is processed using the open-source Pocketsphinx library [77]. Other apps can invoke the speech recognition app to obtain the voice typing capability. For example, an SMS app can ask users to speak the reply message via this app. Since this app is either in the foreground or related to a foreground app, CATO executes its task in a foreground process on the smartphone to reduce interaction latency.

Table 4.5. Tasks of the smart alarm app with different configurations

          Sensor set   Interval (sec)   Data size (KB)
  Task 1  full         60               240
  Task 2  key          60               95
  Task 3  full         30               120
  Task 4  key          30               48
  Task 5  full         15               60
  Task 6  key          15               24

4.5.2.2 Smart Alarm

Wearable sensors are extremely useful in providing accurate and reliable information on the user's activities and health conditions. We use the ideas reviewed in [78] to detect the user's abnormal situations in order to enable rapid assistance. For example, we can detect injurious falls and automatically alert preassigned people such as family members or friends. The smart alarm app does not have a UI. It continuously collects sensor data on the wearable device, and periodically offloads the data processing task to the smartphone (or further to the cloud) via CATO. This app is a background app, so CATO executes its task in a background process to save energy on the smartphone.

4.6 Performance Evaluation

In this section, we run experiments to evaluate the benefits of CATO in terms of latency reduction for foreground apps and energy saving for background apps.

Figure 4.7. Flex PCB based battery interceptor for the Nexus 5X smartphone to be connected with the Monsoon Power Monitor

4.6.1 Experimental Setup

We run the speech recognition app and the smart alarm app on a Sony SmartWatch 3 paired with a Nexus 5X smartphone, both running Android 6.0.1. The cloud server is a desktop with a 2.83 GHz dual-core CPU running Windows 7. For the speech recognition app, we generate input speeches of the same language (English) but with different lengths varying from 1 second to 20 seconds. These speeches are sampled at 8 KHz and quantized to 16 bits. We test the smart alarm app with different configurations. Two different sets of wearable sensors are used: the full set contains all available sensors on the smartwatch, and the key set contains five key sensors (i.e., accelerometer, magnetometer, gyroscope, gravity sensor, and pedometer). We also consider different intervals for offloading sensor data; with a larger interval, there is more data to process in one offload (task). Six different tasks are generated using these configurations, as shown in Table 4.5.

We test CATO under various scenarios. In the evaluation, we use "CATO-local" to indicate that CATO cannot offload tasks from the smartphone to the cloud, so all tasks are executed on the smartphone. "CATO-WiFi" and "CATO-LTE" indicate that WiFi and LTE, respectively, are available for CATO to offload tasks from the smartphone to the cloud.

To measure the power consumption, we modify the smartphone's battery connection and use an external power monitor as the power supply for the smartphone, which eliminates the measurement error introduced by the phone battery. Different from old phone models, the battery connector of modern smartphones like the Nexus 5X is very tiny and cannot be directly connected to an external power monitor. To solve this problem, we design a battery interceptor based on Flex PCB, a very thin circuit board that can be easily bent or flexed. The interceptor is connected to the smartphone's mainboard and the battery through the corresponding battery connector (Hirose BM22-4 for the Nexus 5X), and uses a customized circuit to modify the battery connection. As shown in Fig. 4.7, we use this interceptor to connect the Nexus 5X smartphone with the Monsoon Power Monitor to measure the power consumption.

Figure 4.8. Latency and energy consumption of the speech recognition task with speech input of different lengths. CATO executes the task in a foreground process. Compared to executing in a background process, CATO-local can reduce latency by one third. To further reduce latency, according to the algorithm described in Section 4.4.4.2, CATO-LTE and CATO-WiFi offload tasks to the cloud when the speech length is longer than 3 seconds and 1 second, respectively. Offloading through LTE consumes much more energy than through WiFi, since the LTE interface consumes substantial energy (i.e., tail energy) after a data transmission.

Figure 4.9. The CPU load of executing the speech recognition task in a background process or a foreground process. A background process only uses a little core (i.e., cpu0), while a foreground process can use the big core to accelerate the task processing.

4.6.2 Speech Recognition

The speech recognition app should run in a foreground process to reduce latency. However, due to the lack of context information, it will run in a background process when offloaded from wearable devices. Thus, we compare CATO with the "background" strategy, which runs tasks offloaded from wearable devices in background processes. Compared to running in a background process, executing the task in a foreground process (i.e., CATO-local) reduces the latency by one third (Fig. 4.8). Since the goal here is to reduce the delay and provide a better user experience, CATO-local achieves this goal at the cost of consuming more energy. This performance difference is due to the usage of different CPU cores. As shown in Fig. 4.9, a background process only uses a little core (i.e., cpu0), while a foreground process can use the big core. We also observe that the speech recognition library, Pocketsphinx, cannot fully utilize the multi-core capability of modern smartphones, since only one CPU core runs at 100% load at any given time (Fig. 4.9b). This is probably because Pocketsphinx puts all computationally heavy jobs on a single thread. If those jobs were distributed across multiple threads, the library could better utilize multiple cores and further accelerate speech recognition. When LTE or WiFi is available, CATO can significantly reduce latency by offloading the task to the cloud (Fig. 4.8a). According to the algorithm described in Section 4.4.4.2, the task is offloaded when the speech length is longer than 3 seconds for LTE and 1 second for WiFi, respectively. Offloading through WiFi takes much less time than through LTE, because 1) the network speed of WiFi is higher than that of LTE, and 2) the LTE interface takes some time (0.5 seconds) to promote from the low-power state to the high-power state before it is ready for the required data transmission, known as the promotion delay.
Although energy is not considered when making the offload decision for this task, in most cases offloading the task also saves energy, due to the relatively high CPU energy consumption of computing the task on the smartphone (Fig. 4.8b). LTE consumes much more energy than WiFi. The main reason is that the LTE interface remains in the high-power state for several seconds before switching back to the low-power state to avoid unnecessary promotion delays [6]. As a result, a large amount of energy (also referred to as the tail energy) is wasted after a data transmission.

Figure 4.10. Energy consumption and latency of different tasks from the smart alarm app. CATO executes the task in a background process. Compared to running in a foreground process, CATO-local can reduce energy by half. CATO-LTE never offloads the task to the cloud, so CATO-LTE and CATO-local have the same energy consumption and delay. CATO-WiFi offloads task 1 and task 2 to the cloud to further save energy.

4.6.3 Smart Alarm

CATO executes the task of the smart alarm app in a background process in order to save energy. However, due to the lack of context information, it may run in a foreground process when offloaded from wearable devices. Thus, we compare CATO with the "foreground" strategy, which runs tasks offloaded from wearable devices in foreground processes. Compared to running in a foreground process, executing the task in a background process (i.e., CATO-local) reduces the energy consumption by half (Fig. 4.10). CATO offloads the task to the cloud if energy can be saved. Since the goal here is to save energy, CATO-local achieves this goal at the cost of increased delay. Since LTE consumes a large amount of energy (including the tail energy) to transmit small pieces of data, CATO does not offload the task through LTE; this is why CATO-LTE and CATO-local have the same energy consumption and delay. When WiFi is available, task 1 and task 2 are offloaded, while the other tasks are still executed on the smartphone because of the high energy efficiency of the little core.

4.6.4 CATO Overhead

Running CATO may incur extra delay, so we measure this overhead as the time from when the offload request is received by CATO to when the task starts executing. We found that CATO takes 5 ms on average to make the offload decision and invoke the corresponding service to process the task. The latency overhead is mainly caused by training the prediction model. Since the latency overhead is very short, we can infer that the energy overhead is also very small.

4.7 Related Work

In the past several years, task offloading from the smartphone to the cloud has been extensively studied. Much research has focused on which tasks should be offloaded in order to save energy or reduce delay [72, 5, 73, 74, 75, 79, 80]. MAUI [72] decides which methods should be offloaded, driven by a runtime optimizer that uses integer linear programming, to maximize energy savings on the smartphone. Odessa [75] adaptively makes offload decisions and adjusts the level of data or pipeline parallelism for mobile interactive perception apps to improve their performance. Geng et al. [5] further consider the characteristics of cellular networks in making offload decisions. Gao et al. [73] investigate the dynamic executions of apps, i.e., apps may have different execution paths at runtime, and solve the offload decision problem based on probabilistic analysis. All these works rely on offline profiling or require specific programming environments (e.g., Odessa is built on Sprout, a distributed stream processing environment). Different from them, we use online profiling and make no assumptions about the programming environment on the smartphone. As wearable devices become increasingly popular, task offloading from wearable devices to the smartphone has received some attention [15, 21, 22]. In [15], a case study has been done based on three augmented reality apps developed on . Experimental results show that all computationally intensive tasks should be offloaded to the nearby smartphone in order to reduce the app's latency and save energy for wearable devices. Cheng et al. [22] introduce a three-layer offload framework, i.e., from wearable devices in the first layer to the smartphone in the second layer and further to the cloud in the third layer. Based on this framework, they propose an offload algorithm to minimize the delay on wearable devices.
All these existing works focus on how to make offload decisions, but ignore the smartphone-side problem of how to execute the offloaded tasks, which is the focus of our work. We find that the smartphone cannot properly execute offloaded tasks due to the lack of context, and we propose a context-aware task offloading framework to solve this problem.

4.8 Conclusion

In this chapter, we found that smartphones based on the big.LITTLE architecture cannot execute tasks offloaded from wearable devices on the proper CPU cores due to the lack of task context, resulting in either energy waste if unimportant tasks are executed on big cores or high interaction latency if urgent tasks are executed on little cores. Based on this finding, we proposed CATO, a task offloading framework that keeps the context of offloaded tasks so that they can be executed properly on the smartphone according to their performance requirements. CATO also explores opportunities to further offload tasks to the cloud, aiming to further save energy for unimportant tasks and reduce delay for urgent tasks. To validate our design, we implemented CATO on the Android platform and developed two applications on top of it. Experimental results show that CATO can reduce latency by at least one third for tasks related to user interaction (compared to executing them in a background process), and reduce energy by more than half for tasks unrelated to user interaction (compared to executing them in a foreground process).

Chapter 5

Characterizing and Optimizing Background Data Transfers on Smartwatches

5.1 Introduction

Smartwatches have become the most popular wearable devices, bringing mobile applications and important notifications straight to the user's wrist [81]. However, they still suffer from limited battery life [24]. For example, based on our experience, a fully charged LG Urbane smartwatch often cannot last a whole day when simply used for reading notifications and emails. Thus, it is of great value to characterize the energy consumption on smartwatches and propose energy saving solutions. To provide full functionality, smartwatches need to be paired with a phone, and the communication between them is enabled by Bluetooth [8]. However, little study has been done to investigate the Bluetooth power characteristics and the energy impact of Bluetooth data traffic. To address this issue, we first build a Bluetooth power model based on extensive measurements and a thorough examination of the Bluetooth implementation on Android smartwatches. We found that the Bluetooth interface on smartwatches can operate in two modes with different power levels. In particular, the Bluetooth interface is put into the high-power active mode when transferring data, and switches to the low-power sniff mode to save energy when there is no data traffic. The mode transition is controlled by an inactivity timer, whose timeout value can be as high as several seconds. Thus, the Bluetooth interface may continue to consume a substantial amount of energy before the timer expires (referred to as the tail effect), even when there is no network traffic. Based on the observed power characteristics, the Bluetooth power model is established. We perform an in-depth study of the energy impact of background data transfers, which are usually delay-tolerant (unrelated to user interactions) and can be optimized more aggressively.
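The cost of the tail effect can be illustrated with a simple two-state radio model (an illustrative Python sketch; the power, timer, and traffic values are made-up placeholders, not the measured values reported later in this chapter):

```python
def radio_energy(n_transfers, xfer_len, active_power, sniff_power,
                 tail_timeout, period):
    """Energy (J) over n_transfers * period seconds of a two-state radio.

    After every transfer (lasting xfer_len s) the radio stays in the
    high-power active mode until the inactivity timer (tail_timeout s)
    expires -- the tail effect -- then drops to the sniff mode.
    Transfers are assumed spaced far enough apart (period s) that each
    one pays a full tail.
    """
    active_time = n_transfers * (xfer_len + tail_timeout)
    sniff_time = n_transfers * period - active_time
    return active_power * active_time + sniff_power * sniff_time

# Six small transfers, one every 20 s, with a 5 s inactivity timer:
# most of the active-mode time is tail.
frequent = radio_energy(6, xfer_len=0.2, active_power=0.15,
                        sniff_power=0.01, tail_timeout=5.0, period=20.0)

# Batching the same data into one transfer pays the tail only once.
batched = radio_energy(1, xfer_len=1.2, active_power=0.15,
                       sniff_power=0.01, tail_timeout=5.0, period=120.0)
```

Paying the tail once, by batching transfers or by cutting the tail short, is the intuition behind the optimizations proposed later in this chapter.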
By collecting and analyzing smartwatch packet traces, we found that background data transfers are prevalent and are generated for multiple purposes, such as polling for updates, offloading sensor data to the phone for processing, and pushing notifications. According to our experiments, the smartwatch's battery life can be shortened to one third by running two very popular applications that generate background data transfers. The high energy cost is due to the following reasons. First, some applications generate small data transfers too frequently when running in the background. For example, Sleep as Android, a popular sleep monitoring application, offloads sensor data to the phone every twenty seconds, which is too aggressive given that there is no user interaction during sleep. Second, transferring small data frequently leads to serious energy inefficiency due to the tail effect.

Based on the above findings, we propose four techniques to optimize background data transfers: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing. Fast dormancy is a technique widely used in cellular networks [28] to reduce tail energy for delay-tolerant data transfers. We adopt this idea and implement it on smartwatches to reduce the tail energy of Bluetooth. The latter three techniques target specific applications that are responsible for most background data transfers, i.e., polling, data offloading, and pushing. Phone-initiated polling leverages the cooperation between a smartwatch and the paired phone to fulfill the polling task while reducing the need to transfer data between them. Two-stage sensor processing enables sensor-based applications to make smarter offloading decisions by preprocessing the sensor data on smartwatches, in order to reduce the offloading traffic. Context-aware pushing leverages the smartwatch's screen status as an indicator to adaptively adjust the periodicity of the pushing traffic to save energy. We evaluate the proposed techniques based on trace-driven simulations and case studies. Evaluation results show that jointly using these techniques can save 70.6% of the Bluetooth energy, and the latter three techniques can significantly reduce the data transfer volume for specific applications to save energy. To summarize, our contributions are as follows:

• We thoroughly examine the Bluetooth implementations on Android smartwatches and characterize the tail effect in Bluetooth. We then build the Bluetooth power model.

• We perform the first characterization of background data transfers on smartwatches. We found that background data transfers are prevalent, and many unnecessary small data transfers result in serious energy inefficiency due to the tail effect.

• We propose four optimization techniques: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing. The first saves tail energy for delay-tolerant data transfers. The latter three optimize specific applications that generate most background data transfers. Evaluation results show that jointly using these techniques can save 70.6% of the Bluetooth energy.

The rest of this chapter is organized as follows. Section 5.2 discusses related work. Section 5.3 presents some preliminaries. Section 5.4 introduces the Bluetooth power model. We characterize background data transfers in Section 5.5 and introduce energy optimization techniques in Section 5.6. Section 5.7 evaluates the performance of the proposed techniques. Section 5.8 concludes the chapter.

5.2 Related Work

The energy characteristics of the wireless interfaces (i.e., WiFi, 3G and LTE) on smartphones have been extensively studied [5, 6]. Based on these energy characteristics, much research has been done to analyze the energy impact of data traffic on smartphones and to propose energy saving solutions [7, 76, 82, 83, 84]. Huang et al. [82] found that off-screen data traffic accounts for 58.5% of the total radio energy consumption and proposed to use fast dormancy and batching to save energy. Chen et al. [76] performed extensive measurements and analysis of the energy drain of 1520 smartphones. They found that 3G/LTE cellular data traffic accounts for 11.7% of the total energy drain, and a significant portion of the cellular energy drain (10.5% out of 11.7%) is tail energy. Qian et al. [7] identified the energy inefficiency of periodic transfers in mobile applications and investigated various traffic shaping and resource control algorithms. Many researchers proposed to defer and aggregate data traffic to save energy [11, 14]. Compared to the extensive work on smartphones, little effort has been made toward measuring and optimizing the energy consumption of Bluetooth data transfers on smartwatches. Recently some work has been done to improve the battery life of smartwatches [24, 85]. Min et al. [24] studied the practices for smartwatch battery use and management based on a combination of online surveys and a user study involving 17 Android smartwatch users. In [85], a systematic study was done to characterize the performance of the Android Wear OS, focusing on CPU execution inefficiency and its impact on energy consumption. Other studies proposed to offload tasks from the smartwatch to the phone in order to save CPU energy [15, 22, 44, 21]. In [15], a case study was done based on three augmented reality apps.
Experimental results show that all computationally intensive tasks should be offloaded to the phone in order to reduce the app's latency and save energy for wearable devices. Although there is some research on Bluetooth, such as characterizing its performance [9] and enhancing its functionality [10], none of it focuses on energy consumption.

5.3 Preliminaries

We first give an overview of the Bluetooth technology, and then introduce how the Bluetooth interface operates in different modes. At the end, we discuss the Bluetooth implementation on Android smartwatches.

Figure 5.1. Slots in an ACL link. A slave (smartwatch) can send data to the master (phone) in slave-to-master slots, and receive data from the master in master-to-slave slots.

Figure 5.2. Illustration of the anchor points in the sniff mode with a tsniff interval of six slots. M→S represents master-to-slave slots.

5.3.1 Bluetooth Overview

Bluetooth [8] is a wireless technology standard for exchanging data over short distances, which is widely adopted by mobile devices to communicate with each other. The latest version, Bluetooth 4.0 (also known as Bluetooth Smart or Bluetooth Low Energy), and its updates 4.1 and 4.2 are designed primarily to achieve considerably reduced power consumption for devices that remain connected to each other and frequently exchange data. The main Bluetooth protocols used by mobile devices are introduced as follows. The Asynchronous Connection-Less (ACL) protocol is the most popular baseband layer communication protocol used to carry general data frames. The Logical Link Control and Adaptation Layer Protocol (L2CAP) is layered over the baseband protocol and resides in the data link layer. One of its core functions is to multiplex multiple upper layer connections over a single link. On top of L2CAP, there are two main protocols: Radio Frequency Communication (RFCOMM) and the Attribute Protocol (ATT). RFCOMM is a transport protocol which provides a reliable data stream for general data communications. ATT is a protocol designed for the Generic Attribute (GATT) profile, which provides a hierarchical data structure to define attributes that can be discovered and transferred between connected devices.

Figure 5.3. Transition between the active mode and the sniff mode.

5.3.2 Bluetooth Modes

Bluetooth protocols are driven by a system clock with a frequency of 3.2 kHz, which yields a period of 312.5 µs. In an ACL link, communications between devices are based on a period of 625 µs (twice the period of the basic clock), which is known as a slot. As shown in Figure 5.1, a slave (smartwatch) can send data to the master (phone) in slave-to-master slots, and receive data from the master in master-to-slave slots. In the active mode, the slave listens for packets in every ACL master-to-slave slot. To reduce energy consumption, the Bluetooth core specification [8] defines three low-power modes: sniff mode, hold mode and park mode. The hold mode and the park mode will not be discussed in this chapter since they are not enabled on smartwatches. In the sniff mode, the number of master-to-slave slots that the slave listens to is reduced in order to save energy. Specifically, the slave can only receive data in specified slots, known as anchor points, which are spaced regularly with an interval of tsniff (illustrated in Figure 5.2). If a packet is received at an anchor point, the slave does not need to wait for the next anchor point to receive the following packets.

Instead, it will continue to listen in the next Ntimeout master-to-slave slots, and the Ntimeout counter restarts every time a packet is received. Thus, the sniff mode can support the same data rate as the active mode; the only difference is that there is up to tsniff delay before receiving the first data packet in the sniff mode. Sniff sub-rating provides a means to further reduce power consumption by increasing the time between sniff anchor points (i.e., tsniff).
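The slot and sniff-interval timings can be checked with simple arithmetic. The following sketch (ours, not from the dissertation's code) converts a sniff interval measured in baseband slots to milliseconds, which gives the worst-case delay before the slave can receive a first packet:

```python
# Slot timing arithmetic for the sniff interval.
SLOT_US = 625            # one ACL slot = twice the 312.5 us base clock period
T_SNIFF_SLOTS = 798      # sniff interval observed in our packet traces

t_sniff_ms = T_SNIFF_SLOTS * SLOT_US / 1000  # convert us -> ms
print(t_sniff_ms)        # 498.75, i.e., up to ~0.5 s worst-case receive delay
```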

According to our packet traces from smartwatches, the value of tsniff in the sniff mode is the same as that in the sniff sub-rating mode: 798 baseband slots (498.75 ms). In Bluetooth, the Host Controller Interface (HCI) provides a command interface to the baseband controller and link manager. The transition between the active mode and the sniff mode is controlled by HCI commands, depending on the implementation of upper layer protocols. As shown in Figure 5.3, the HCI command Sniff_Mode is used to place the link into the sniff mode, and the HCI command Exit_Sniff_Mode is used to end the sniff mode and switch the link to the active mode.

Figure 5.4. Main ways to transfer data between a smartwatch and the paired (connected) phone. A single ACL link is shared among all the upper layer network connections.

5.3.3 Bluetooth on Android Smartwatches

Modern smartwatches need to be paired with a phone to provide full functionality. In Android, the Android Wear application is the default application used by phones to manage the pairing procedure and control the paired Android smartwatches. The pairing process is initiated the first time a smartwatch is turned on, and requires user authorization (by confirming the pin code displayed on the phone that the smartwatch requests to pair with). Once the pairing process is completed, smartwatches will automatically connect to the paired phone when they are within Bluetooth range of each other. There are three main ways to transfer data between a smartwatch and the paired (connected) phone, as shown in Figure 5.4. First, the Android Wear application constantly maintains an RFCOMM connection, which is used for data transfers generated by the system or Google APIs. Second, developers can use the BluetoothSocket class to establish an RFCOMM connection for data communications. Third, GATT services can be created to exchange attributes (usually small data in certain formats). All these network connections share a single ACL link, which remains open until all upper

Table 5.1. Mobile devices and system versions

             Device Name         Android Version   Bluetooth Version
Smartwatch   LG Urbane           Wear 2.0          4.1
             Sony Smartwatch 3   Wear 1.5          4.0
Phone        Nexus 5x            Android 7.0       4.2
             LG G4               Android 5.1       4.1

layer connections are closed. Since the network connection created by the Android Wear application is constantly maintained, the underlying ACL link is never closed (when the smartwatch and the paired phone are within Bluetooth range).

Figure 5.5. Flex PCB based battery interceptor for the LG Urbane smartwatch to be connected with the Monsoon Power Monitor.

5.4 Bluetooth Power Model

We first introduce the methodology used to measure the power consumption of the smartwatch, and then build the Bluetooth power model. A brief description of the devices used in our work is shown in Table 5.1, where LG Urbane is paired with Nexus 5x and Sony Smartwatch 3 is paired with LG G4.

5.4.1 Methodology

To measure the power consumption, we need to intercept the battery connection and use an external power monitor to provide the power supply for the smartwatch. Different from phones, the battery connector of smartwatches like the LG Urbane is very tiny, and cannot be directly connected to an external power monitor. To solve this

Table 5.2. Power level of major components on the LG Urbane smartwatch

                              Power (mW)   Duration (sec)
Sleep (base)                  9.5 ± 0.4    -
Wake up (CPU on)              40.3 ± 1.2   -
Display (lowest brightness)   60.8 ± 3.5   -
Bluetooth idle                9.6 ± 0.4    -
Bluetooth tail                40.5 ± 0.8   9.7 (ATT), 4.5 (RFCOMM)
Bluetooth data                83.1 ± 0.7   -
Bluetooth demotion            82.3 ± 2.8   0.3 ± 0.1

The base power is included in all power readings except the display.


problem, we design a battery interceptor based on Flex PCB, which is a very thin circuit board that can be easily bent or flexed. The interceptor is connected with the smartwatch's mainboard and the battery through the corresponding battery connector (Hirose BM22-4 for the LG Urbane), and uses a customized circuit to modify the battery connection. As shown in Fig. 5.5, we use this interceptor to connect the LG Urbane smartwatch with the Monsoon Power Monitor to measure the power consumption.

Figure 5.6. State transitions of using the Bluetooth interface to transfer data.

5.4.2 Power Model

Table 5.2 summarizes the power level of major components on the LG Urbane smartwatch. The energy consumption of transferring data via Bluetooth depends on the implementation of upper layer protocols.

Figure 5.7. Power consumption of the Bluetooth radio while transferring 500 bytes of data (measured on the LG Urbane smartwatch): (a) L2CAP/ATT connection; (b) L2CAP/RFCOMM connection.

Figure 5.6 shows the state transitions of using the Bluetooth interface to transfer data. When there is no data transfer, the Bluetooth interface stays in the low-power sniff mode and consumes very little energy (0.1 mW after subtracting the base power). Once a data request arrives, the Bluetooth interface is immediately turned into the active mode to transfer data. An inactivity timer ttail is triggered to control when to switch back to the sniff mode, and is reset every time a packet is sent/received. The value of ttail varies across upper layer implementations: it is 9.7 seconds for the L2CAP/ATT connection and 4.5 seconds for the L2CAP/RFCOMM connection. As a result, the Bluetooth interface stays in the high-power active mode for several seconds (i.e., the tail) after the data transfer finishes. The purpose of the tail is to avoid the latency of receiving data (up to tsniff in the sniff mode) if the next phone-to-smartwatch data request arrives before ttail expires. However, a large fraction of energy may be wasted in the tail relative to that used for data transmissions.

When ttail expires, the Bluetooth interface starts the transition from the active mode to the sniff mode. During this process, the smartwatch exchanges control packets (i.e., LMP_sniff_req and LMP_accepted) with the phone to negotiate the sniff parameters (e.g., tsniff). We refer to this process as the demotion process, and to its duration as the demotion time (tdmt), which is 0.3 seconds according to our measurements. There exists a similar promotion process for the transition from the sniff mode to the active mode. The promotion process is simpler and happens simultaneously with the data transmission (i.e., the data transmission can start in the sniff mode and does not have to wait for the promotion process to finish). Thus, the promotion process does not introduce extra delay for data transmissions and has a negligible impact on energy. As shown in Figure 5.7, the power consumption of using the Bluetooth interface to transfer data can be generalized into three parts: data, tail and demotion, whose power levels are denoted as Pdata, Ptail and Pdmt, respectively. For a data transfer Ti, its energy consumption, E(Ti), can be modeled as follows.

Suppose Ti starts at ti and lasts for di time, and the next data transfer is Tj. The interval between data transfer Ti and Tj is ∆t = tj − ti − di. Then E(Ti) can be calculated depending on ∆t, as shown in Equation 5.1.

E(T_i) =
\begin{cases}
P_{data} \times d_i + P_{tail} \times \Delta t, & \text{if } \Delta t < t_{tail};\\
P_{data} \times d_i + P_{tail} \times t_{tail} + P_{dmt} \times (\Delta t - t_{tail}), & \text{if } t_{tail} \le \Delta t < t_{tail} + t_{dmt};\\
P_{data} \times d_i + P_{tail} \times t_{tail} + P_{dmt} \times t_{dmt}, & \text{otherwise.}
\end{cases}
\quad (5.1)
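Equation 5.1 can be sketched in code as follows. This is our illustrative implementation, not the dissertation's: the default constants come from Table 5.2 and the L2CAP/RFCOMM tail timer, and since power is in mW and time in seconds, the result is in millijoules.

```python
def transfer_energy_mj(d_i, delta_t,
                       p_data=83.1, p_tail=40.5, p_dmt=82.3,   # mW, Table 5.2
                       t_tail=4.5, t_dmt=0.3):                 # s (RFCOMM tail)
    """Energy of one data transfer per Equation 5.1 (mW x s = mJ).

    d_i: duration of the data transmission; delta_t: gap before the next
    transfer starts.
    """
    data = p_data * d_i
    if delta_t < t_tail:                     # next transfer arrives in the tail
        return data + p_tail * delta_t
    if delta_t < t_tail + t_dmt:             # arrives during the demotion
        return data + p_tail * t_tail + p_dmt * (delta_t - t_tail)
    return data + p_tail * t_tail + p_dmt * t_dmt   # full tail + full demotion
```

For example, a one-second transfer followed by a long idle gap costs about 83.1 + 40.5 × 4.5 + 82.3 × 0.3 ≈ 290 mJ, most of it tail energy.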

5.4.3 Model Validation

The Bluetooth power model is validated as follows. We perform 1 KB, 2 KB, and 10 KB data transfers using the L2CAP/ATT protocol and the L2CAP/RFCOMM protocol, respectively. For each data transfer, we vary the distance between the smartwatch (LG Urbane) and the phone (Nexus 5x) from 0.5 meters to 20 meters (the smartwatch becomes disconnected at distances beyond 20 meters). The accuracy of the Bluetooth power model is evaluated by comparing the estimated energy consumption against the actual energy measured by a power monitor. According to the experimental results, we have the following observations: 1) the distance has little impact on the power consumption and the data rate of Bluetooth; 2) the estimation error of the Bluetooth power model is under 5% for all tests.

5.5 Background Data Transfers on Smartwatches

This section characterizes background data transfers using packet traces collected on smartwatches, and analyzes their energy impact based on the power model introduced in the last section.

5.5.1 Packet Traces

We collected Bluetooth packet traces on the LG Urbane using the HCI (Host Controller Interface) snoop log, a tool that captures all Bluetooth HCI packets. We tested 34 popular applications from the Google Play store by running them in the background (opening the application and then pressing the power button). According to the traces, we found that all packets are transferred using the L2CAP/RFCOMM connection created and maintained by the Android Wear application.

5.5.2 Origins of Background Data Transfers

Understanding the origins of background data transfers is important to determine how to optimize these data transfers without affecting application functionality. Since all packets are transferred through the same network connection, the network parameters (e.g., channel/port number) cannot be used to distinguish the data traffic of each application. For packets generated by the system, we manually inspect their content to identify their origins. We found that most packets are associated with (pre-installed) services provided by Google, and a keyword based approach can be used to identify packet sessions with different purposes. For example, a Google synchronization session starts with a packet containing the keyword "WearableSync", followed by a group of packets containing the synchronization data (e.g., fitness data, derived speed, location). The keyword "now-cards" can be used to find the leading packets of the Google now card service. Table 5.3 lists the main system services that generate background data transfers, where Google sync is performed every thirty minutes to synchronize account information and service data (e.g., fitness data), the Google now card service

Table 5.3. Background data transfers generated by the system

Service name      Bytes per session (KB)   Session length (sec)   # of sessions per day
Google sync       6.4                      1.8                    48
Google play       52.2                     6.9                    23
Google now card   4.2                      1.0                    27

Table 5.4. Applications generating periodic background data transfers

Origins of data transfers   Application name   Bytes per session (KB)   Periodicity (sec)
Polling                     Telegram           0.2                      60
                            Accuweather        4.8                      900
Offloading                  Sleep as Android   0.9                      20, 120*
                            Cinch              0.3                      5
Pushing                     Endomondo          0.4                      1
                            Runkeeper          0.3 (1.2)†               1 (5)†

*: the periodicity observed is not fixed. †: two types of data (fitness status and location information) are transferred separately with different periodicities.

pushes information like news and weather from the phone to the smartwatch, and the Google play service automatically downloads updates for software. These services do not generate data transfers often, but each data transfer session usually involves a large burst of data. We also observed that when the paired phone loses access to the Internet, a connection retry is performed every five seconds, repeated hundreds of times for each connection request. To characterize the data traffic generated by user applications, we collected packet traces for each application individually to analyze their traffic patterns. We found that packets are transferred in sessions with a leading packet containing the application name, followed by a group of packets containing the application data. Thus, searching based on application names can be used to recognize the data transfers generated by each application. Table 5.4 lists the applications that generate most background data transfers, and their purposes are summarized as follows. Polling is performed periodically by some applications to detect updates. For example, Telegram, a popular SMS application with more than 100 million downloads, polls a remote server (located at 149.154.175.50:80) every minute to check for updates. Using this kind of traditional polling scheme, many polling requests/responses are transferred between the smartwatch and the phone. In the next section, we propose a phone-initiated polling scheme to reduce the need for transferring data between them in order to save energy on the smartwatch. Data offloading: Many wearable applications leverage the rich sensors on smartwatches to monitor user behaviors.
Considering the limited CPU and battery capability of smartwatches, these applications usually send the raw sensor data to their phone-side counterparts for processing. For example, Sleep as Android, a sleep monitoring application with more than 10 million downloads, sends the collected sensor data to the phone for processing every twenty seconds (sometimes every two minutes), which is too frequent as there is no user interaction during sleep. As another example, Cinch, an application to track weight loss, sends heart rate data to the phone every five seconds, which keeps the Bluetooth interface in the high-power active mode and drains the smartwatch's battery very quickly. Pushing: Google provides a set of APIs for applications to push notifications to the smartwatch. Some applications aggressively use this feature to update their status. For example, two popular fitness applications, Endomondo (10 million downloads) and Runkeeper, use the phone to track fitness status (e.g., running distance and duration) and push the status to the smartwatch every second. Runkeeper also pushes the location information (estimated by the phone) to the smartwatch every five seconds. By doing so, the latest status can be displayed as soon as the user opens the smartwatch. However, this kind of aggressive pushing generates a huge number of data transfers and causes energy problems.

5.5.3 Energy Impact

We leverage the Bluetooth power model established in Section 5.4 to compute the energy consumption of data transfers. We first calculate the energy consumption for each application based on isolated traces, and then perform experiments to show the overall impact of background data transfers on the smartwatch's battery life.

Figure 5.8. Energy consumption of background data transfers generated by each application (service) based on isolated packet traces.

5.5.3.1 Energy breakdown for each application

We collect a three-hour packet trace for each application (service) and calculate the energy consumption. According to the results (Figure 5.8), we have the following observations. First, applications that generate data transfers with small periodicities, such as Runkeeper, Endomondo and Cinch, are extremely energy hungry. For example, Runkeeper consumes 600 joules in three hours, which drains over ten percent of the battery (410 mAh) on a typical smartwatch. Note that these applications are expected to run for a very long time (if not constantly) in the background. Running several such applications simultaneously will cause severe energy drain. Second, tail energy accounts for a large proportion of the total energy, which could be optimized. Third, the system traffic (Google sync, Google play and Google now card) does not consume much energy due to its small volume and traffic pattern (transferring large bursts of data).
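The "over ten percent" claim can be checked with back-of-the-envelope arithmetic. The 3.8 V nominal battery voltage below is our assumption (a typical Li-ion value), not a number from the measurements:

```python
RUNKEEPER_J = 600      # Runkeeper's energy over three hours (Figure 5.8)
BATTERY_MAH = 410      # typical smartwatch battery capacity, as stated above
NOMINAL_V = 3.8        # assumed Li-ion nominal voltage (our assumption)

capacity_j = BATTERY_MAH / 1000 * 3600 * NOMINAL_V   # mAh -> A*s -> joules
print(round(RUNKEEPER_J / capacity_j * 100, 1))      # ~10.7 (% of the battery)
```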

5.5.3.2 Total energy impact

To show the overall impact of background data transfers on energy, we compared the smartwatch's battery life under two scenarios. In scenario one (S1), a factory reset is performed before the experiment and the Bluetooth interface is turned off, so only the base power is consumed. In scenario two (S2), we install Sleep as Android and Telegram on the smartwatch and run them in the background. These two applications were selected because of their popularity. Then, besides the

Table 5.5. Battery life of smartwatches under different scenarios

                    Scenario   Battery Life (hours)
LG Urbane           S1         110
                    S2         34
Sony Smartwatch 3   S1         225
                    S2         78

S1: a factory reset is performed before the experiment and the Bluetooth is turned off. Only the base power is consumed. S2: two applications (Sleep as Android and Telegram) are installed and running in the background.

base power, energy is also consumed by transferring data (Bluetooth) and running background tasks (CPU and sensor). In both scenarios, the smartwatch is kept in the sleep mode with the screen turned off. Table 5.5 summarizes the results for the LG Urbane and the Sony Smartwatch 3. The battery life of both smartwatches in S1 is three times the battery life in S2. Note that in S2 we do not select the most energy hungry applications (shown in Figure 5.8). If those applications were chosen, the smartwatch's battery life would become significantly shorter.

Figure 5.9. Energy breakdown of the LG Urbane smartwatch in S2. The base energy is obtained from S1. The Bluetooth energy is calculated based on the packet trace and the Bluetooth power model.

Figure 5.9 shows the energy breakdown of the LG Urbane in S2, where the base energy is obtained from S1, the Bluetooth energy is calculated based on the packet trace and the Bluetooth power model, and the remaining part is the CPU and sensor energy consumed by background tasks. As can be seen, the Bluetooth energy consumption of background data transfers accounts for 43% of the total energy, which is more than the base energy (31%) and the energy consumed by CPU and sensor (27%).

5.6 Energy Optimizing Techniques

Background data transfers on smartwatches are prevalent and can drain the battery quickly. There exists a lot of work on optimizing background traffic for phones, which communicate with remote servers via cellular networks [82, 76, 12]. However, these techniques cannot be easily adopted on smartwatches due to the following three factors. 1) Bluetooth has some unique characteristics different from cellular networks, which should be considered. 2) Smartwatches need to leverage the connected phone as a proxy to communicate with remote servers; thus, more optimization opportunities can be found by exploring the cooperation between a smartwatch and the connected phone. 3) The sources of background data transfers are different. For example, a large portion of data transfers on smartwatches originate from sensor based applications, which frequently offload sensor data to the phone for processing. Optimizations targeting these applications can significantly reduce the background data traffic. By considering the above factors, we propose four techniques to optimize background data transfers on smartwatches: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing.

5.6.1 Fast Dormancy

Fast dormancy [28] is a technique widely used in cellular networks to reduce tail energy by switching the cellular interface into the low-power state immediately after the data transmission. We adopt a similar idea to reduce the tail energy of Bluetooth, i.e., actively switching the Bluetooth interface into the sniff mode instead of waiting for the tail timer to expire. Although fast dormancy can save tail energy, it may waste extra energy due to the cost of the demotion process (see Section 5.4) if performed too aggressively. Another disadvantage of fast dormancy is that cutting the tail may introduce additional latency for data transmissions (i.e., up to tsniff latency for receiving data in the sniff mode). Thus, fast dormancy should be used to optimize delay-tolerant data transfers, including the background data transfers discussed in this chapter. We have implemented fast dormancy on our LG Urbane smartwatch. A fast dormancy timer (shorter than ttail) is used to control when to switch from the

active mode to the sniff mode, and is reset every time a packet is sent/received. According to our packet traces, 99.7% of intra-burst intervals are less than 0.5 seconds, so the timer should be set to more than 0.5 seconds to avoid unnecessary demotion energy cost. The packet sent/received events are monitored using hcidump [86], a tool that records all HCI events in real time. Once the fast dormancy timer expires, a Sniff_Mode command is sent via hcitool [87] to start the demotion process of switching the Bluetooth interface into the sniff mode.

Figure 5.10. An illustration of the traditional polling scheme: (a) message flow; (b) energy consumption of the smartwatch. In each polling, two packets (i.e., one request and one response) are transferred between the smartwatch and the phone.
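To illustrate the trade-off, the following sketch (ours; the trace and timer values are hypothetical) replays a list of packet timestamps and computes how long the radio would stay in the high-power active mode when the link is demoted a fixed fast dormancy timeout after the last packet of each burst. It ignores the small demotion energy itself:

```python
def active_time_with_fast_dormancy(packet_times, fd_timeout):
    """Total seconds in the active mode, given sorted packet timestamps (s)
    and a fast dormancy timeout that resets on every packet."""
    if not packet_times:
        return 0.0
    total = 0.0
    burst_start = prev = packet_times[0]
    for t in packet_times[1:]:
        if t - prev > fd_timeout:                 # timer expired: demote
            total += (prev - burst_start) + fd_timeout
            burst_start = t                       # a new burst begins
        prev = t
    total += (prev - burst_start) + fd_timeout    # close the last burst
    return total
```

For example, for packets at 0 s, 0.2 s and 10 s, a 1 s fast dormancy timer keeps the radio active for 2.2 s, versus 9.2 s with the default 4.5 s RFCOMM tail timer.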

5.6.2 Phone Initiated Polling

Polling is a basic method for applications to check updates from remote servers, which may result in many data transfers if performed periodically. In the traditional polling scheme (Figure 5.10), a polling request is initiated by the smartwatch and sent to the remote server through the phone, and the phone then forwards the received response to the smartwatch. In each polling, two packets (i.e., one request and one response) are transferred between the smartwatch and the phone. To reduce the data transfer volume and save energy on smartwatches, we propose the phone-initiated polling scheme (Figure 5.11), in which the phone is responsible for polling and only sends responses to the smartwatch if updates are detected. Then at most

one packet (i.e., a response with update information) needs to be transferred in each polling between the smartwatch and the phone, and thus energy can be saved on smartwatches. We implement the logic of the phone-initiated polling scheme as an Android library, which provides a configuration file to set parameters (e.g., polling periodicity), and exposes two main interfaces for developers to provide implementations for 1) communicating with the remote server and 2) parsing the response message to check for updates.

Figure 5.11. An illustration of the phone-initiated polling scheme: (a) message flow; (b) energy consumption of the smartwatch. The phone is responsible for polling and only sends responses to the smartwatch if updates are detected. In each polling, at most one response needs to be transferred between the smartwatch and the phone.
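The two developer-facing interfaces can be sketched as follows. This is our Python rendering of the idea only; the actual library is an Android artifact and the class and method names here are hypothetical:

```python
from abc import ABC, abstractmethod

class PhoneSidePoller(ABC):
    """Phone-side polling loop: the phone polls the server and forwards a
    response to the watch only when an update is found."""

    def __init__(self, period_s):
        self.period_s = period_s     # polling periodicity from the config file

    @abstractmethod
    def fetch(self):
        """Developer-provided: contact the remote server, return the response."""

    @abstractmethod
    def has_update(self, response):
        """Developer-provided: parse the response, report whether it changed."""

    def poll_once(self, send_to_watch):
        response = self.fetch()
        if self.has_update(response):
            send_to_watch(response)  # at most one watch-bound packet per poll
            return True
        return False                 # no update: nothing crosses the BT link
```

A developer subclasses `PhoneSidePoller`, implements `fetch` and `has_update`, and the library invokes `poll_once` every `period_s` seconds.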

5.6.3 Two-stage Sensor Processing

Smartwatches provide rich sensor resources that can be used for various purposes such as health monitoring or activity tracking. These sensor based applications usually keep running in the background to collect sensor data and offload the data to their phone-side counterparts for processing. An offloading timer is used to determine when to send the data (e.g., every minute). To meet the real-time requirements, the offloading timer should be short enough so that the sensor data can be offloaded and processed quickly. However, using a short timer also leads to heavy background data traffic and thus wastes energy on smartwatches.

Figure 5.12. The two-stage sensor processing framework. A preprocessing module is deployed on the smartwatch to check the application requirement or the effectiveness of the sensor data. An offloading decision is made adaptively based on the preprocessing result.

To reduce the offloading traffic, we propose a two-stage sensor processing framework, which enables sensor based applications to make smarter offloading decisions by preprocessing the sensor data on smartwatches. The idea is based on the following intuition: usually a simple method can be used to check the application requirement and the effectiveness of the sensor data (i.e., whether the data are duplicated or have little variation), and then an adaptive offloading decision can be made based on the result. The framework is illustrated in Figure 5.12, where the preprocessing module should be simple enough to avoid consuming too much CPU energy on the smartwatch. We rewrite two applications (Sleep as Android and Cinch) based on this framework.
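The framework's decision flow can be sketched as follows. This is an illustrative Python rendering under our own naming; the real smartwatch-side module is Android code, and the three-way decision (offload, delay, discard) mirrors Figure 5.12:

```python
from enum import Enum

class Action(Enum):
    OFFLOAD = 1   # send to the phone now
    DELAY = 2     # buffer for a later batch
    DISCARD = 3   # drop (e.g., duplicated data)

def two_stage(batch, preprocess, send, buffer):
    """Stage 1 runs a lightweight preprocess() on the watch; stage 2
    (full processing) happens on the phone only for offloaded data."""
    action = preprocess(batch)
    if action is Action.OFFLOAD:
        send(buffer + batch)   # flush anything delayed earlier, plus this batch
        buffer.clear()
    elif action is Action.DELAY:
        buffer.extend(batch)   # keep locally; no Bluetooth traffic
    # Action.DISCARD: drop silently
    return action
```

The `preprocess` callback is where an application plugs in its cheap check, such as a duplicate test or a time-window condition.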

5.6.3.1 Sleep as Android

This sleep monitoring application has been downloaded more than 10 million times from the Google Play store. In the application, the smartwatch-side module keeps running in the background to collect accelerometer data and offload it to the phone-side module every twenty seconds. Based on the sensor data, the phone analyzes the user's sleep phases and draws graphs to display the analysis results. Since the graphs are displayed only when the phone is opened and there is usually no user interaction during sleep, a much longer offloading timer (i.e., larger than twenty seconds) can be used to reduce the amount of data transfers. Delaying offloading the sensor data may violate the real-time requirements of the application. By reverse engineering the source code and reading the design documents, we found that the only task that has real-time requirements is smart wake up, which allows the user to specify a time window during which the user wants to be woken up, and looks for light sleep phases to trigger the alarm. Our two-stage sensor processing framework can be easily adopted to reduce the offloading traffic while satisfying the real-time requirements. We added several lines of code in the smartwatch-side module to check whether the current time falls in the wake up time window. If the condition is met, the offloading timer is set to twenty seconds (i.e., the same as the original timer) to satisfy the real-time requirements of the smart wake up task. Otherwise, the sensor data are offloaded every thirty minutes or when the screen turns on, which is an indicator that the application may be opened.
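Our modification amounts to a conditional choice of the offloading timer, roughly like this sketch (the timer values come from the text above; the function name and the zero-second "offload immediately" encoding are ours):

```python
def offload_interval_s(now, wake_start, wake_end, screen_on):
    """Pick the sensor-offloading timer for the Sleep as Android change.

    Inside the smart wake up window the original 20 s timer is kept to meet
    the real-time requirement; otherwise data is offloaded every 30 minutes,
    or immediately (0) when the screen turns on.
    """
    if wake_start <= now <= wake_end:
        return 20           # original timer: real-time smart wake up
    if screen_on:
        return 0            # offload now: the app may be opened
    return 30 * 60          # relaxed timer outside the window
```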

5.6.3.2 Cinch

This application continuously monitors the user's heart rate based on the heart rate sensor equipped on modern smartwatches. The heart rate value is read every five seconds and offloaded to the phone for analysis. The phone issues an alert if an abnormal condition is found (e.g., the heart rate is out of the normal range). Due to the low sensitivity of the sensor, most heart rate readings are duplicates. We rewrite this application based on our two-stage sensor processing framework. At the smartwatch side, code is added to check for duplicated values and only offload a reading whose value differs from the previous one.
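The duplicate check added on the watch side amounts to the following filter (a minimal Python sketch; the real code runs inside the Android watch module):

```python
def dedup_heart_rate(readings):
    """Yield only readings that differ from the previous one, so
    duplicated heart rate values are never offloaded to the phone."""
    previous = None
    for value in readings:
        if value != previous:
            yield value
            previous = value

# Runs of identical readings collapse to a single offloaded value.
assert list(dedup_heart_rate([72, 72, 72, 75, 75, 72])) == [72, 75, 72]
```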

5.6.4 Context-Aware Pushing

Some applications aggressively push notifications to update their status on smartwatches with a very small periodicity (e.g., one second used by Endomondo). We propose the context-aware pushing technique to reduce the pushing traffic based on the smartwatch's screen status, i.e., whether the screen is on or off. When the screen is on, a smaller periodicity is used in order to display the latest application status to the user in a timely manner. When the screen is off, a larger periodicity is used to reduce the pushing traffic. Additional messages are needed to reset the periodicity when the screen status changes.
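The screen-dependent periodicity and the extra reset message on each state change can be sketched as follows. This Python sketch uses illustrative one-second/one-minute periods and our own function names; on the watch, the state change is delivered by the system rather than read from a trace.

```python
SCREEN_ON_PERIOD = 1     # second: timely updates while the user is looking
SCREEN_OFF_PERIOD = 60   # seconds: relaxed pushing while the screen is off

def push_period(screen_on):
    return SCREEN_ON_PERIOD if screen_on else SCREEN_OFF_PERIOD

def pushes_sent(screen_trace):
    """Count pushes and reset messages over a per-second trace of
    screen states (True = on).  Each state change sends one reset
    message and re-arms the push timer."""
    pushes, resets, prev, countdown = 0, 0, None, 0
    for state in screen_trace:
        if prev is not None and state != prev:
            resets += 1
            countdown = 0
        prev = state
        if countdown == 0:
            pushes += 1
            countdown = push_period(state)
        countdown -= 1
    return pushes, resets

# One minute on, one minute off: 60 pushes while on, 1 at the switch.
assert pushes_sent([True] * 60 + [False] * 60) == (61, 1)
```

Without the technique the same two minutes would cost 120 pushes, which is why the screen-off savings dominate in practice.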

The implementation is straightforward. The change of screen status is detected by registering a BroadcastReceiver in a background service for the intents ACTION_SCREEN_ON and ACTION_SCREEN_OFF. The communication is implemented using the Google MessageApi. This implementation is used as a prototype to evaluate the overhead (i.e., running additional code and generating extra messages).

5.7 Performance Evaluations

In this section, we first evaluate the performance of the proposed techniques for individual applications and then perform a case study to show the overall effectiveness of these techniques for all background data transfers.

5.7.1 Traffic Optimization for Individual Applications

We collect packet traces for each application. To evaluate the energy saving of an optimization technique, we modify the packet traces according to the optimization results and calculate the energy consumption using the Bluetooth power model established in Section 5.4. The assumptions and experimental setups are listed below. 1) The fast dormancy timer is set to 0.5 seconds. 2) For Telegram, we assume all polling responses contain update information and need to be transferred to the smartwatch, so the evaluation results show the lower bound of the energy saving potential of the phone initiated polling scheme. 3) The wake up time window in Sleep as Android is assumed to be from 6 am to 8 pm. We leave the data transfers within the time window unmodified, and aggregate other data transfers every thirty minutes. 4) We collect a trace of screen status changes during walking, and use it to evaluate the context-aware pushing technique. The periodicity is set to the original value (i.e., one second) when the screen is on and to one minute when the screen is off. In the evaluation, the Bluetooth energy consumption is calculated based on the power model of the LG Urbane smartwatch. The following four metrics are considered, each normalized as a percentage of the original Bluetooth energy consumption (calculated using the unmodified traces).

• ∆E: the Bluetooth energy change (excluding the energy overhead).

Table 5.6. Traffic optimization for individual applications

Application       Techniques applied                             ∆E (%)   ∆T (%)   ∆D (%)   ∆O (%)
Sleep as Android  Fast dormancy                                  -56.1    -56.3    +0.2     -
                  Two-stage sensor processing                    -71.1    -45.1    -6.0     +2.1
                  Fast dormancy + Two-stage sensor processing    -87.3    -61.4    -5.9     +2.1
Cinch             Fast dormancy                                  -69.4    -69.5    +0.1     -
                  Two-stage sensor processing                    -76.4    -58.8    -7.8     +3.0
                  Fast dormancy + Two-stage sensor processing    -92.8    -75.2    -7.8     +3.0
Telegram          Fast dormancy                                  -44.1    -49.5    +5.4     -
                  Phone initiated polling                        -36.0    -18.4    -0.1     -
                  Fast dormancy + Phone initiated polling        -72.1    -54.5    -0.1     -
Endomondo         Fast dormancy                                  +28.3    -9.4     +37.7    -
                  Context-aware pushing                          -85.6    -40.1    +1.1     +1.5
Runkeeper         Fast dormancy                                  +22.8    -6.4     +29.2    -
                  Context-aware pushing                          -87.6    -34.2    +1.0     +1.5

Note that ∆E = ∆T + ∆D if only fast dormancy is applied.

• ∆T : the change of the tail energy.

• ∆D: the change of the demotion energy.

• ∆O: the energy overhead, which is consumed by running additional code on smartwatches and transferring extra messages. The CPU energy overhead is captured by locally running the code without transferring data.
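Concretely, the four metrics can be computed from the per-component energies of the original and modified traces. The sketch below uses our own component names and illustrative numbers, not the measured power model:

```python
def normalized_metrics(orig, opt):
    """Express the evaluation metrics as percentages of the original
    Bluetooth energy.  `orig` and `opt` map energy components (in
    joules) to values; the component names here are illustrative."""
    total = sum(orig.values())                       # original Bluetooth energy
    dT = 100.0 * (opt["tail"] - orig["tail"]) / total
    dD = 100.0 * (opt["demotion"] - orig["demotion"]) / total
    dX = 100.0 * (opt["transfer"] - orig["transfer"]) / total
    dE = dT + dD + dX                                # excludes the overhead dO
    dO = 100.0 * opt.get("overhead", 0.0) / total
    return dE, dT, dD, dO

# If only fast dormancy is applied, the transfer energy is unchanged,
# so dE reduces to dT + dD, as noted above.
orig = {"tail": 50.0, "demotion": 10.0, "transfer": 40.0}
opt = {"tail": 20.0, "demotion": 12.0, "transfer": 40.0}
dE, dT, dD, dO = normalized_metrics(orig, opt)
assert abs(dE - (dT + dD)) < 1e-9 and dO == 0.0
```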

5.7.1.1 Evaluation results

Table 5.6 summarizes the evaluation results. We observe that fast dormancy does not always save energy due to the existence of the demotion process. For applications with a very small periodicity of one second (Endomondo and Runkeeper), although fast dormancy can help reduce some tail energy (∆T = −9% ∼ −6%), a large amount of demotion energy is wasted (∆D = +29% ∼ +37%), and thus the overall energy increases (∆E = +22% ∼ +28%). Techniques designed for specific applications can significantly reduce the data transfer volume for those applications to save energy. For example, two-stage sensor processing saves more than 70% energy for Sleep as Android and Cinch. Phone initiated polling saves at least 36% energy (lower bound) for Telegram. Context-aware pushing brings more than 85% energy saving for Endomondo and Runkeeper.

Figure 5.13. Performance of fast dormancy with different settings

Figure 5.14. Performance evaluations of the optimization techniques for all background data transfers

5.7.2 Case Study

To understand how much energy can be saved on a real smartwatch, we evaluate the proposed techniques using a packet trace that captures all background data transfers on an LG Urbane smartwatch. In the experiment, two applications (i.e., Telegram and Sleep as Android) are installed on the smartwatch due to their popularity. The packet trace contains background data transfers generated by these two applications and the Android system. Figure 5.13 shows the results for fast dormancy with different settings. As can be seen, ∆D decreases as the fast dormancy timer becomes longer. A big reduction of ∆D can be observed between 0.1 seconds and 0.5 seconds. According to the packet trace, the intra-burst interval of over 60% of the data transmissions falls in the range from 0.1 seconds to 0.5 seconds, so using a 0.1-second timer triggers a lot of demotion processes and wastes energy. As the timer grows from 0.5 seconds to 4 seconds, the overall energy change (∆E = ∆T + ∆D) is dominated by the tail energy part (∆T). Fast dormancy achieves its best performance with the timer set to 0.5 seconds. Figure 5.14 shows the performance of the different optimization techniques. For each individual technique, the energy saving rate is 47.6% (fast dormancy with a 0.5-second timer), 9.5% (phone initiated polling) and 40.6% (two-stage sensor processing). Jointly using these techniques leads to 70.6% energy saving.
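The shape of Figure 5.13 can be reproduced with a toy tail/demotion model: each inter-burst gap contributes tail energy until the timer fires or the next burst arrives, and a fired timer costs one fixed demotion. All power and time constants below are illustrative and are not the measured LG Urbane power model.

```python
def bluetooth_energy(gaps, timer, p_tail=0.1, t_demote=0.5, p_demote=0.2):
    """Idle-period energy under a fast dormancy timer (illustrative
    watts/seconds).  A very short timer fires a costly demotion inside
    almost every burst; a very long timer wastes tail energy."""
    energy = 0.0
    for gap in gaps:
        energy += min(gap, timer) * p_tail      # tail until timer or next burst
        if gap > timer:
            energy += t_demote * p_demote       # demotion cost when timer fires
    return energy

# Many short intra-burst gaps plus a few long idle periods, mirroring
# the trace where most intra-burst intervals fall in 0.1-0.5 s.
gaps = [0.3] * 6 + [5.0] * 4
e01, e05, e40 = (bluetooth_energy(gaps, t) for t in (0.1, 0.5, 4.0))
assert e05 < e01 and e05 < e40   # a 0.5 s timer balances the two costs
```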

5.8 Conclusion

In this chapter, we established the Bluetooth power model and performed the first in-depth study of the energy impact of background data transfers on smartwatches. We found that background data transfers are prevalent and generated for multiple purposes, many of which are unnecessary and result in serious energy inefficiency due to the tail effect. Based on these findings, we proposed four energy optimization techniques: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing. The first one can save tail energy for delay-tolerant data transfers. The latter three optimize specific applications which generate most background data transfers. The proposed techniques are evaluated based on trace-driven simulations and case studies. Evaluation results show that jointly using all techniques can save 70.6% of the Bluetooth energy, and the latter three techniques can significantly reduce the data transfer volume for specific applications. Our work is an important step towards understanding the data traffic on smartwatches, and determining better optimization strategies to extend the battery life.

Chapter 6

Conclusions and Future Work

6.1 Summary

In this dissertation, we design various energy saving solutions for mobile devices such as smartphones and smartwatches. We summarize these solutions as follows.

In Chapter 2, we proposed network quality aware prefetching algorithms to save energy for fetching in-app ads. First, in contrast to traditional data-mining based prediction algorithms, which only generate one option, a prediction algorithm was proposed to generate multiple prefetching options with various probabilities. Then, with these prefetching options, we estimated the energy consumption of each option by considering the effect of network quality and chose the best one accordingly. We proposed two prefetching algorithms, where the energy-aware prefetching algorithm chooses a prefetching option that minimizes the energy consumption, and the energy-and-data aware prefetching algorithm also considers the data usage to achieve a tradeoff between energy and data usage. Evaluation results show that our prefetching algorithms can save 80% of energy compared to traditional ways of fetching ads periodically, and outperform existing prefetching algorithms under various network qualities.

In Chapter 3, we generalized and formulated the prefetch-based energy optimization problem, where the goal is to find a prefetching schedule that minimizes the energy consumption of data transmissions under the current network quality. The formulated problem is a nonlinear optimization problem. To solve it, we first proposed a greedy algorithm, which iteratively decides how much data to prefetch based on the current network quality, and then we proposed a discrete algorithm to improve its performance and derived its performance bound. We have implemented and evaluated the proposed algorithms in two apps: in-app advertising and mobile video streaming.
Evaluation results show that: in in-app advertising, our algorithms can adaptively adjust the number of ads to be prefetched according to the network quality and save more energy than existing algorithms; in mobile video streaming, our algorithms can save 15% to 25% energy compared to the best existing algorithms (i.e., eSchedule and GreenTube) under various network conditions.

In Chapter 4, we found that smartphones equipped with big.LITTLE cores cannot execute tasks offloaded from wearable devices on the proper CPU cores due to a lack of task context, resulting in either energy waste if unimportant tasks are executed on big cores or high interaction latency if urgent tasks are executed on little cores. To solve this problem, a task offloading framework, CATO, was proposed to keep the context of offloaded tasks so that the smartphone can properly execute them according to their performance requirements. CATO also explores opportunities to further offload tasks to the cloud, aiming to further save energy for unimportant tasks and reduce delay for urgent tasks. We have implemented CATO on the Android platform and evaluated its performance based on two apps. Evaluation results show that CATO can reduce latency by at least one third for tasks related to user interaction (compared to executing them in a background process), and reduce energy by more than half for tasks unrelated to user interaction (compared to executing them in a foreground process).

In Chapter 5, we established the Bluetooth power model and performed an in-depth investigation of the energy impact of background data transfers on smartwatches. We found that background data transfers are prevalent and generated for multiple purposes, many of which are unnecessary and cause serious energy inefficiency due to the tail effect.
Based on these findings, four energy optimization techniques were proposed: fast dormancy, phone-initiated polling, two-stage sensor processing, and context-aware pushing. Fast dormancy is used to save tail energy for delay-tolerant data transfers. The latter three techniques optimize specific applications which are responsible for most background data transfers. We have evaluated the proposed techniques based on trace-driven simulations and case studies. Evaluation results show that jointly using all techniques can save 70.6% of the Bluetooth energy, and the latter three techniques can significantly reduce the data transfer volume for specific applications. Our work is the first step towards understanding the Bluetooth data traffic and determining better optimization strategies to save energy for smartwatches.

6.2 Future Directions

Our work to date has provided a series of energy saving solutions for mobile devices. However, these solutions are only a first step towards energy optimization, and there are many other directions for future research in this area. We outline several interesting directions below.

• Energy optimization for sensor-based apps: Many apps leverage the rich sensor resources on mobile devices to monitor user behavior in order to provide personalized services like healthcare or fitness tracking. However, continuously collecting data from wake-up sensors [88] prevents the CPU from entering a power-saving state and thus leads to energy waste. To reduce energy, sensor batching can be used, which buffers sensor events in a queue and reports them together once the queue is full. The CPU then only needs to wake up to process the grouped sensor events together and can sleep the rest of the time. To save more energy, a bigger queue can be used, but this causes higher reporting latency, which may violate the real-time requirements of some apps. Thus, the tradeoff between energy consumption and latency should be considered, and future research can optimize energy by adaptively adjusting the queue size according to the latency requirement.
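For instance, if events are buffered and the app tolerates a bounded reporting latency, the largest queue that still meets the requirement follows directly from the sampling period. This is a sketch with our own naming; real sensor batching is configured through the platform's sensor API.

```python
def max_batch_size(period_ms, latency_budget_ms):
    """Largest sensor-event queue that still meets the latency
    requirement: the oldest buffered event waits (size - 1) * period
    milliseconds before the batch is reported."""
    return latency_budget_ms // period_ms + 1

# 50 Hz sampling (20 ms period) with a 1 s latency budget.
assert max_batch_size(20, 1000) == 51
```

An adaptive scheme would recompute this bound whenever the app's latency requirement changes and resize the queue accordingly.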

• Energy optimization for event-driven apps: We have studied how to save energy by prefetching data in Chapter 2 and Chapter 3. However, it is hard to predict what data will be used in the future for event-driven apps. In those apps, different types of data may be needed for different events (e.g., clicking a certain button). If data are prefetched for an event which is never triggered, prefetching these data wastes both energy and bandwidth. Thus, for event-driven apps, we need to investigate how to predict future events and how to use prefetching to save energy based on the prediction results.

• Energy optimization for different components on smartwatches: The smartwatch is a complicated system containing many components. As a first step, we have characterized and optimized the energy consumption of the Bluetooth interface in Chapter 5. However, there are many other components, such as the display, CPU and sensors, which may also consume a large amount of energy on smartwatches. Thus, future research can profile and optimize the energy consumption of different components on smartwatches. Moreover, some components on smartwatches are controlled by specific parameters, which affect the component's power consumption and performance. For example, the connection interval of the Bluetooth interface controls how often packets can be transmitted. A smaller connection interval reduces the transmission latency but increases the energy consumption of the Bluetooth interface. As another example, a higher CPU frequency means more computing capacity but higher power consumption. These tradeoffs should be considered to optimize the energy consumption of different components.

Bibliography

[1] Balasubramanian, N., A. Balasubramanian, and A. Venkataramani (2009) “Energy consumption in mobile phones: a measurement study and implications for network applications,” in ACM IMC, pp. 280–293.

[2] Mittal, R., A. Kansal, and R. Chandra (2012) “Empowering developers to estimate app energy consumption,” in ACM MobiCom, pp. 317–328.

[3] Huang, J., F. Qian, A. Gerber, Z. M. Mao, S. Sen, and O. Spatscheck (2012) “A close examination of performance and power characteristics of 4G LTE networks,” in ACM MobiSys, pp. 225–238.

[4] Nath, S. (2015) “MAdScope: Characterizing Mobile In-App Targeted Ads,” in ACM MobiSys.

[5] Geng, Y., W. Hu, Y. Yang, W. Gao, and G. Cao (2015) “Energy-Efficient Computation Offloading in Cellular Networks,” in IEEE ICNP, pp. 145–155.

[6] Hu, W. and G. Cao (2015) “Energy-aware video streaming on smartphones,” in IEEE INFOCOM, pp. 1185–1193.

[7] Qian, F., Z. Wang, Y. Gao, J. Huang, A. Gerber, Z. Mao, S. Sen, and O. Spatscheck (2012) “Periodic Transfers in Mobile Applications: Network-wide Origin, Impact, and Optimization,” in ACM WWW.

[8] “Bluetooth Low Energy,” https://www.bluetooth.com.

[9] Friedman, R., A. Kogan, and Y. Krivolapov (2013) “On Power and Throughput Tradeoffs of WiFi and Bluetooth in Smartphones,” IEEE Transactions on Mobile Computing, 12(7), pp. 1363–1376.

[10] Levy, A. A., J. Hong, L. Riliskis, P. Levis, and K. Winstein (2016) “Beetle: Flexible Communication for Bluetooth Low Energy,” in ACM MobiSys.

[11] Zhao, B., W. Hu, Q. Zheng, and G. Cao (2015) “Energy-Aware Web Browsing on Smartphones,” IEEE Trans. Para. Dist. Syst., 26(3), pp. 761–774.

[12] Zhang, T., X. Zhang, F. Liu, H. Leng, Q. Yu, and G. Liang (2015) “eTrain: Making Wasted Energy Useful by Utilizing Heartbeats for Mobile Data Transmissions,” in IEEE ICDCS, pp. 113–122.

[13] Mohan, P., S. Nath, and O. Riva (2013) “Prefetching Mobile Ads: Can Advertising Systems Afford It?” in ACM EuroSys.

[14] Hu, W. and G. Cao (2014) “Quality-aware traffic offloading in wireless networks,” in ACM MobiHoc, pp. 277–286.

[15] Shi, B., J. Yang, Z. Huang, and P. Hui (2015) “Offloading Guidelines for Augmented Reality Applications on Wearable Devices,” in ACM MM.

[16] “Android 6.0 Marshmallow,” https://www.android.com/versions/marshmallow-6-0.

[17] Khan, A. J., K. Jayarajah, D. Han, A. Misra, R. Balan, and S. Seshan (2013) “CAMEO: A middleware for mobile advertisement delivery,” in ACM MobiSys, pp. 125–138.

[18] Parate, A., M. Böhmer, D. Chu, D. Ganesan, and B. M. Marlin (2013) “Practical prediction and prefetch for faster access to applications on mobile phones,” in ACM UbiComp, pp. 275–284.

[19] Yin, L. and G. Cao (2004) “Adaptive power-aware prefetch in wireless networks,” IEEE Trans. Wireless Commun., 3(5), pp. 1648–1658.

[20] Master, N., A. Dua, D. Tsamis, J. P. Singh, and N. Bambos (2016) “Adaptive Prefetching in Wireless Computing,” IEEE Trans. Wireless Commun., 15(5), pp. 3296–3310.

[21] Huang, D., L. Yang, and S. Zhang (2015) “Dust: Real-Time Code Offloading System for Wearable Computing,” in IEEE GLOBECOM.

[22] Cheng, Z., P. Li, J. Wang, and S. Guo (2015) “Just-in-Time Code Offloading for Wearable Computing,” IEEE Transactions on Emerging Topics in Computing, 3(1), pp. 74–83.

[23] “ARM big.LITTLE Technology,” http://www.thinkbiglittle.com.

[24] Min, C., S. Kang, C. Yoo, J. Cha, S. Choi, Y. Oh, and J. Song (2015) “Exploring Current Practices for Battery Use and Management of Smartwatches,” in ACM ISWC.

[25] “The history of app pricing and why most apps are free,” http://blog.flurry.com/bid/99013/The-History-of-App-Pricing-And-Why-Most-Apps-Are-Free.

[26] “Download distribution of Android apps,” http://www.appbrain.com/stats/android-app-downloads.

[27] Hu, W. and G. Cao (2014) “Energy optimization through traffic aggregation in wireless networks,” in IEEE INFOCOM, pp. 916–924.

[28] “Configuration of fast dormancy. rel. 8,” http://www.3gpp.org.

[29] Vallina-Rodriguez, N., J. Shah, A. Finamore, Y. Grunenberger, K. Papagiannaki, H. Haddadi, and J. Crowcroft (2012) “Breaking for Commercials: Characterizing Mobile Advertising,” in ACM IMC.

[30] Pathak, A., Y. C. Hu, and M. Zhang (2012) “Where is the energy spent inside my app?: fine grained energy accounting on smartphones with eprof,” in ACM EuroSys, pp. 29–42.

[31] Higgins, B. D., J. Flinn, T. J. Giuli, B. Noble, C. Peplin, and D. Watson (2012) “Informed mobile prefetching,” in ACM MobiSys, pp. 155–168.

[32] Wang, Y., X. Liu, D. Chu, and Y. Liu (2015) “EarlyBird: Mobile Prefetching of Social Network Feeds via Content Preference Mining and Usage Pattern Analysis,” in ACM MobiHoc, pp. 67–76.

[33] Schulman, A., V. Navda, R. Ramjee, N. Spring, P. Deshpande, C. Grunewald, K. Jain, and V. N. Padmanabhan (2010) “Bartendr: a practical approach to energy-aware cellular data scheduling,” in ACM MobiCom, pp. 85–96.

[34] Chen, Y., P. Berkhin, B. Anderson, and N. R. Devanur (2011) “Real-time Bidding Algorithms for Performance-based Display Ad Allocation,” in ACM SIGKDD.

[35] “Three trends in mobile advertising,” http://mobiledevmemo.com/three-trends-in-mobile-advertising.

[36] “Millennial media ad specs,” http://www.millennialmedia.com/ad-specs.

[37] “Yahoo ad specs,” https://adspecs.yahoo.com/adformats/mobile.

[38] “AdMob ad specs,” https://support.google.com/admob.

[39] “iAd ad specs,” https://developer.apple.com/news-publisher/iad/Creative-Specifications.pdf.

[40] Shepard, C., A. Rahmati, C. Tossell, L. Zhong, and P. Kortum (2011) “LiveLab: Measuring Wireless Networks and Smartphone Users in the Field,” SIGMETRICS Perform. Eval. Rev., 38(3), pp. 15–20.

[41] Tibshirani, R., G. Walther, and T. Hastie (2001) “Estimating the number of clusters in a dataset via the Gap statistic,” Journal of the Royal Statistical Society B, 63(2), pp. 411–423.

[42] Neapolitan, R. E. (2003) Learning Bayesian Networks, Prentice-Hall, Inc.

[43] Koc, A. T., S. C. Jha, R. Vannithamby, and M. Torlak (2014) “Device Power Saving and Latency Optimization in LTE-A Networks Through DRX Configuration,” IEEE Trans. Wireless Commun., 13(5), pp. 2614–2625.

[44] Yang, Y., Y. Geng, L. Qiu, W. Hu, and G. Cao (2017) “Context-Aware Task Offloading for Wearable Devices,” in IEEE ICCCN.

[45] Huang, J., F. Qian, Y. Guo, Y. Zhou, Q. Xu, Z. M. Mao, S. Sen, and O. Spatscheck (2013) “An In-depth Study of LTE: Effect of Network Protocol and Application Behavior on Performance,” in ACM SIGCOMM.

[46] Li, W., R. K. P. Mok, D. Wu, and R. K. C. Chang (2015) “On the accuracy of smartphone-based mobile network measurement,” in IEEE INFOCOM.

[47] “Ookla Speedtest,” http://www.speedtest.net/mobile.

[48] Nika, A., Y. Zhu, N. Ding, A. Jindal, Y. C. Hu, X. Zhou, B. Y. Zhao, and H. Zheng (2015) “Energy and Performance of Smartphone Radio Bundling in Outdoor Environments,” in ACM WWW, pp. 809–819.

[49] Qian, F., Z. Wang, A. Gerber, Z. M. Mao, S. Sen, and O. Spatscheck (2010) “Top: Tail optimization protocol for cellular radio resource allocation,” in IEEE ICNP, pp. 285–294.

[50] Cui, Y., S. Xiao, X. Wang, M. Li, H. Wang, and Z. Lai (2014) “Performance-aware energy optimization on mobile devices in cellular network,” in IEEE INFOCOM, pp. 1123–1131.

[51] Yan, T., D. Chu, D. Ganesan, A. Kansal, and J. Liu (2012) “Fast app launching for mobile devices using predictive user context,” in ACM MobiSys, pp. 113–126.

[52] Hoque, M. A., M. Siekkinen, and J. K. Nurminen (2013) “Using crowdsourced viewing statistics to save energy in wireless video streaming,” in ACM MobiCom, pp. 377–388.

[53] Li, X., M. Dong, Z. Ma, and F. C. Fernandes (2012) “GreenTube: power optimization for mobile video streaming via dynamic cache management,” in ACM MM, pp. 279–288.

[54] Rao, A., A. Legout, Y.-s. Lim, D. Towsley, C. Barakat, and W. Dabbous (2011) “Network characteristics of video streaming traffic,” in ACM CoNEXT, pp. 25:1–25:12.

[55] Rivoire, S., P. Ranganathan, and C. Kozyrakis (2008) “A Comparison of High-Level Full-System Power Models,” in ACM HotPower, pp. 3–3.

[56] Xie, X., X. Zhang, S. Kumar, and L. E. Li (2015) “piStream: Physical Layer Informed Adaptive Video Streaming Over LTE,” in ACM MobiCom, pp. 413–425.

[57] Kumar, S., E. Hamed, D. Katabi, and L. Erran Li (2014) “LTE Radio Analytics Made Easy and Accessible,” in ACM SIGCOMM, pp. 211–222.

[58] Mohan, P., S. Nath, and O. Riva (2013) “Prefetching mobile ads: Can advertising systems afford it?” in ACM EuroSys, pp. 267–280.

[59] “Recommended upload encoding settings,” https://support.google.com/youtube/answer/1722171?hl=en.

[60] “VLC media player,” http://www.videolan.org/.

[61] Huang, T.-Y., R. Johari, N. McKeown, M. Trunnell, and M. Watson (2014) “A buffer-based approach to rate adaptation: Evidence from a large video streaming service,” in ACM SIGCOMM, pp. 187–198.

[62] Wang, B., J. Kurose, P. Shenoy, and D. Towsley (2008) “Multimedia Streaming via TCP: An Analytic Performance Study,” ACM Trans. Multimedia Comput. Commun. Appl., 4(2), pp. 16:1–16:22.

[63] Spiteri, K., R. Urgaonkar, and R. K. Sitaraman (2016) “BOLA: Near-optimal bitrate adaptation for online videos,” in IEEE INFOCOM, pp. 1–9.

[64] Bethanabhotla, D., G. Caire, and M. J. Neely (2016) “WiFlix: Adaptive Video Streaming in Massive MU-MIMO Wireless Networks,” IEEE Trans. Wireless Commun., 15(6), pp. 4088–4103.

[65] Parate, A., M.-C. Chiu, C. Chadowitz, D. Ganesan, and E. Kalogerakis (2014) “RisQ: Recognizing Smoking Gestures with Inertial Sensors on a Wristband,” in ACM MobiSys.

[66] Chowdhury, A. R., B. Falchuk, and A. Misra (2010) “MediAlly: A provenance-aware remote health monitoring middleware,” in IEEE PerCom.

[67] Ha, K., Z. Chen, W. Hu, W. Richter, P. Pillai, and M. Satyanarayanan (2014) “Towards Wearable Cognitive Assistance,” in ACM MobiSys.

[68] Anam, A. I., S. Alam, and M. Yeasin (2014) “Expression: A dyadic conversation aid using Google Glass for people who are blind or visually impaired,” in ACM MobiCASE.

[69] Wang, H., T. T.-T. Lai, and R. Roy Choudhury (2015) “MoLe: Motion Leaks Through Smartwatch Sensors,” in ACM MobiCom.

[70] Shen, S., H. Wang, and R. Roy Choudhury (2016) “I Am a Smartwatch and I Can Track My User’s Arm,” in ACM MobiSys.

[71] Yang, Y. and G. Cao (2017) “Characterizing and optimizing background data transfers on smartwatches,” in IEEE ICNP.

[72] Cuervo, E., A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and P. Bahl (2010) “MAUI: Making Smartphones Last Longer with Code Offload,” in ACM MobiSys.

[73] Gao, W., Y. Li, H. Lu, T. Wang, and C. Liu (2014) “On Exploiting Dynamic Execution Patterns for Workload Offloading in Mobile Cloud Applications,” in IEEE ICNP.

[74] Guo, S., B. Xiao, Y. Yang, and Y. Yang (2016) “Energy-efficient dynamic offloading and resource scheduling in mobile cloud computing,” in IEEE INFOCOM.

[75] Ra, M.-R., A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan (2011) “Odessa: Enabling Interactive Perception Applications on Mobile Devices,” in ACM MobiSys.

[76] Chen, X., N. Ding, A. Jindal, Y. C. Hu, M. Gupta, and R. Vannithamby (2015) “Smartphone Energy Drain in the Wild: Analysis and Implications,” in ACM SIGMETRICS.

[77] Huggins-Daines, D., M. Kumar, A. Chan, A. W. Black, M. Ravishankar, and A. I. Rudnicky (2006) “Pocketsphinx: A free, real-time continuous speech recognition system for hand-held devices,” in IEEE ICASSP.

[78] Mukhopadhyay, S. C. (2015) “Wearable Sensors for Human Activity Monitoring: A Review,” IEEE Sensors Journal, 15(3), pp. 1321–1330.

[79] Xiang, L., S. Ye, Y. Feng, B. Li, and B. Li (2014) “Ready, Set, Go: Coalesced offloading from mobile devices to the cloud,” in IEEE INFOCOM.

[80] Geng, Y., Y. Yang, and G. Cao (2018) “Energy-Efficient Computation Offloading for Multicore-based Mobile Devices,” in IEEE INFOCOM.

[81] Rawassizadeh, R., B. A. Price, and M. Petre (2014) “Wearables: Has the Age of Smartwatches Finally Arrived?” Commun. ACM, 58(1), pp. 45–47.

[82] Huang, J., F. Qian, Z. M. Mao, S. Sen, and O. Spatscheck (2012) “Screen-off Traffic Characterization and Optimization in 3G/4G Networks,” in ACM IMC.

[83] Yang, Y., Y. Geng, and G. Cao (2017) “Energy-Aware Advertising through Quality-Aware Prefetching on Smartphones,” in IEEE MASS.

[84] Yang, Y. and G. Cao “Prefetch-Based Energy Optimization on Smartphones,” IEEE Trans. Wireless Commun.

[85] Liu, R. and F. X. Lin (2016) “Understanding the Characteristics of Android Wear OS,” in ACM MobiSys.

[86] “hcidump,” http://www.linuxcommand.org/man_pages/hcidump8.html.

[87] “hcitool,” http://linuxcommand.org/man_pages/hcitool1.html.

[88] “Wake-up sensors,” https://source.android.com/devices/sensors/suspend-mode#wake-up_sensors.

Vita

Yi Yang

Yi Yang received the B.S. and M.S. degrees in Computer Science from the Beijing University of Posts and Telecommunications, Beijing, China, in 2009 and 2012, respectively. He enrolled in the Ph.D. program in Computer Science and Engineering at The Pennsylvania State University in August 2012. He is a student member of IEEE.

Publications during the Ph.D. study:

• Y. Geng, W. Hu, Y. Yang, W. Gao, and G. Cao, “Energy-Efficient Compu- tation Offloading in Cellular Networks,” IEEE ICNP, 2015.

• Y. Yang and G. Cao, “Characterizing and optimizing background data trans- fers on smartwatches,” in IEEE ICNP, 2017.

• Y. Yang, Y. Geng, and G. Cao, “Energy-aware advertising through quality- aware prefetching on smartphones,” in IEEE MASS, 2017.

• Y. Yang, Y. Geng, L. Qiu, W. Hu, and G. Cao, “Context-aware task offload- ing for wearable devices,” in IEEE ICCCN, 2017.

• Y. Geng, Y. Yang, and G. Cao, “Energy-Efficient Computation Offloading for Multicore-based Mobile Devices,” IEEE INFOCOM, 2018.

• Y. Yang and G. Cao, “Prefetch-Based Energy Optimization on Smartphones,” IEEE Transactions on Wireless Communications (TWC), to appear.