Dynamic Analyses for Privacy and Performance in Mobile Applications
Mingyuan Xia
Doctor of Philosophy
School of Computer Science
McGill University, Montreal, Quebec
August 14, 2016
A Thesis Submitted to the Faculty of Graduate Studies and Research in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Copyright © 2016 Mingyuan Xia

DEDICATION
To my beloved family
ACKNOWLEDGMENTS
First and foremost, I deeply appreciate my supervisor Dr. Xue Liu for his patience and advice during my graduate study. I am also very fortunate to have Dr. Laurie Hendren and Dr. David Lie provide their invaluable feedback to improve my thesis work. I want to thank Dr. Zhengwei Qi and Dr. Yi Gao for their collaboration in various research projects. At McGill, I would like to thank all members of CPSLab, the staff of the School of Computer Science, and Ron Simpson. I also enjoyed the fun days with friends from MTSA and the SJTU alumni. I appreciate Howard Wang’s great French skills and Nos Thés for brewing the best milk tea. At IBM Almaden, Dr. Pin Zhou and Dr. Mohit Saxena provided the greatest mentorship during my internship. Finally, I want to acknowledge the IBM Ph.D. Fellowship, the McGill Lorne Trottier Fellowship and NSERC for financially supporting my graduate career.
ABSTRACT
Mobile applications (also called apps) have greatly extended and innovated users’ daily tasks. The mobile programming model features event-driven execution, rapidly changing APIs (about three generations per year) and ubiquitous access to users’ personal data. These features enrich app functionalities but also give rise to many new software problems that impact performance or damage user privacy, many of which are not occasional programming mistakes. In this thesis, we systematically study these problems and develop dynamic program analyses to effectively detect, diagnose and fix them. We start by researching the sensitive data leakage problem in apps. Since mobile apps can access various sensitive user data stored on the device, data leaks become a great concern for both end users and app market operators. Existing leak detecting approaches rely on static analysis, which does not perform well on real-world apps of growing complexity, further limiting their adoption for real usage. We propose AppAudit, which embodies a novel dynamic analysis that can execute part of the app code while tracking the dissemination of sensitive data. AppAudit also uses a static analysis to shrink the analysis scope and boost analysis performance. The synergy of the two analyses achieves higher detection accuracy, runs 8.3× faster and uses 90% less memory on real-world Android apps as compared to previous approaches. Based on the analysis building blocks from AppAudit, we further develop binary instrumentation to profile and improve app performance. We study 115 thousand apps and common performance anti-patterns from the existing literature. Based on these understandings, we propose AppInspector, which instruments apps to profile a small set of methods while collecting various app runtime diagnostic data. These profiling data are transformed into a graph
structure, from which AppInspector programmatically diagnoses three common performance anti-patterns. We also develop AppSwift, based on AppInspector, which transforms app code to automatically fix some performance anti-patterns and improve app performance. Both tools instrument app code automatically. Instrumented apps can run on unmodified Android OSes and are thus readily deployable to existing test environments. With extensive tests on real-world apps, AppInspector uncovers 22 performance issues per app, with detailed analysis results to guide developers in fixing them; AppSwift automatically eliminates about 5 of these issues without any code modification from the app developer. We believe that the analysis methodologies, frameworks and tools developed in this thesis can assist developers in debugging various performance problems and better protecting user privacy.
ABRÉGÉ
Les applications mobiles (également appelées apps) ont considérablement étendu et innové les tâches quotidiennes des utilisateurs. Le modèle de programmation mobile se caractérise par une exécution événementielle, des API en évolution rapide (environ trois générations par an) et un accès omniprésent aux données personnelles de l’utilisateur. Ces fonctionnalités enrichissent les apps, mais donnent aussi lieu à de nombreux nouveaux problèmes logiciels qui nuisent aux performances ou portent atteinte à la vie privée des utilisateurs, dont beaucoup ne sont pas des erreurs de programmation occasionnelles. Dans cette thèse, nous étudions systématiquement ces problèmes et développons des analyses dynamiques de programmes pour détecter, diagnostiquer et corriger efficacement ces nouveaux problèmes. Nous commençons par étudier le problème des fuites de données sensibles dans les apps. Comme les applications mobiles peuvent accéder à diverses données sensibles de l’utilisateur stockées sur l’appareil, les fuites de données deviennent une grande préoccupation pour les utilisateurs finaux comme pour les opérateurs de marchés d’applications. Les méthodes de détection de fuites existantes s’appuient sur l’analyse statique, qui ne fonctionne pas bien sur les applications du monde réel d’une complexité croissante. Nous proposons AppAudit, qui incarne une nouvelle analyse dynamique capable d’exécuter une partie du code de l’app tout en suivant la diffusion des données sensibles. AppAudit possède également une analyse statique pour réduire la portée de l’analyse et en accélérer l’exécution. La synergie des deux analyses permet d’obtenir une plus grande précision de détection, s’exécute 8.3× plus rapidement et utilise 90% de mémoire en moins sur des applications Android du monde réel par rapport aux approches précédentes. Sur la base des blocs de construction de l’analyse d’AppAudit, nous développons ensuite une instrumentation binaire pour profiler et améliorer les performances des applications. Nous étudions 115 mille applications ainsi que les anti-patrons de performance courants issus de la littérature existante. Sur la base de ces connaissances, nous proposons AppInspector, qui instrumente les applications afin de profiler un petit ensemble de méthodes tout en recueillant diverses données de diagnostic à l’exécution. Ces données de profilage sont transformées en une structure de graphe, sur laquelle AppInspector diagnostique programmatiquement trois anti-patrons de performance courants. Nous développons également AppSwift, basé sur AppInspector, qui transforme le code de l’application pour corriger automatiquement certains anti-patrons de performance et améliorer les performances. Les deux outils instrumentent le code de l’application automatiquement. Les applications instrumentées peuvent fonctionner sur des systèmes Android non modifiés et sont donc facilement déployables dans des environnements de test existants. Avec des tests approfondis sur des applications du monde réel, AppInspector découvre 22 problèmes de performance par application, accompagnés de résultats d’analyse détaillés pour guider les développeurs dans leur correction; AppSwift élimine automatiquement environ 5 de ces problèmes sans aucune modification de code de la part du développeur. Nous croyons que les méthodologies d’analyse, les cadres et les outils développés dans cette thèse peuvent aider les développeurs à déboguer divers problèmes de performance et à mieux protéger la vie privée des utilisateurs.
TABLE OF CONTENTS
DEDICATION ...... ii
ACKNOWLEDGMENTS ...... iii
ABSTRACT ...... iv
ABRÉGÉ ...... vi
LIST OF TABLES ...... xi
LIST OF FIGURES ...... xiii
1 Introduction ...... 1
  1.1 Contributions ...... 3
  1.2 Thesis Organization ...... 5
2 Background ...... 6
  2.1 Android System Hierarchy ...... 6
  2.2 Android Applications ...... 7
    2.2.1 Code, Manifest, and Resources ...... 8
    2.2.2 Execution Model and Performance ...... 10
    2.2.3 Permission and Privacy ...... 11
3 AppAudit: Analyzing and Detecting Data Leaks ...... 14
  3.1 The Information Flow Problem Revisited ...... 14
  3.2 Related Work ...... 17
    3.2.1 Static Analysis ...... 17
    3.2.2 Dynamic Analysis ...... 19
    3.2.3 Compiler Techniques ...... 19
  3.3 The Synergy of Two Analyses ...... 19
  3.4 API Usage Analysis ...... 21
    3.4.1 Call Graph Extensions ...... 21
    3.4.2 API Usage Analysis ...... 24
  3.5 Approximated Execution ...... 25
    3.5.1 Object and Taint Representation ...... 26
    3.5.2 Basic Execution Flow ...... 27
    3.5.3 Complete Execution Rules ...... 30
    3.5.4 Tainting Rules ...... 31
    3.5.5 Execution Extensions and Optimizations ...... 32
    3.5.6 Approximation Mode ...... 32
    3.5.7 False Positive Analysis: Execution Path Validation ...... 35
    3.5.8 False Negative Analysis: Tainting Validation ...... 38
    3.5.9 Infinity Avoidance ...... 40
  3.6 Evaluation ...... 41
    3.6.1 Implementation ...... 42
    3.6.2 Evaluation Methodology ...... 44
    3.6.3 Completeness of Static API Analysis ...... 45
    3.6.4 Detection Accuracy ...... 46
    3.6.5 Usability ...... 50
    3.6.6 Characterization of Data Leaks in Real Apps ...... 54
4 AppInspector: Programmatically Diagnosing Performance Issues ...... 59
  4.1 Introduction ...... 59
  4.2 Performance Issue Characterization ...... 62
    4.2.1 Lengthy Operation ...... 62
    4.2.2 Over Asynchrony ...... 64
    4.2.3 Memory Bloat ...... 65
  4.3 AppInspector Design ...... 69
    4.3.1 Bytecode Instrumentation ...... 69
    4.3.2 Profile Graph ...... 78
    4.3.3 Diagnosing Performance Issues ...... 79
    4.3.4 Implementation ...... 83
  4.4 Evaluation ...... 83
    4.4.1 Methodology ...... 84
    4.4.2 Overall Results ...... 86
    4.4.3 Diagnose Slow App Initialization ...... 87
    4.4.4 Pinpoint Lengthy API Calls in Slow Event Handlers ...... 88
    4.4.5 Reveal Inefficient User-defined Functions ...... 89
    4.4.6 Find Colliding AsyncTasks ...... 90
    4.4.7 Comparison with Static Analysis ...... 91
    4.4.8 Overhead Analysis ...... 92
  4.5 Related Work ...... 93
5 AppSwift: Automatically Enhancing App UI Performance ...... 96
  5.1 Introduction ...... 96
  5.2 Motivation ...... 99
  5.3 AppSwift Design ...... 101
    5.3.1 Overview ...... 102
    5.3.2 Bitmap Cache ...... 103
    5.3.3 Simplest API transformation ...... 105
    5.3.4 API-assisted API transformation ...... 106
    5.3.5 Data-flow assisted API transformation ...... 107
    5.3.6 Transformation correctness validation and complexity reduction ...... 109
    5.3.7 Logging and Inspecting ...... 110
    5.3.8 Implementation ...... 110
  5.4 Evaluation ...... 111
    5.4.1 Methodology ...... 112
    5.4.2 Experiment Results ...... 115
    5.4.3 Case Study ...... 119
  5.5 Discussions ...... 122
    5.5.1 Generalization ...... 122
    5.5.2 App Rewriting vs. OS Upgrade ...... 123
    5.5.3 Dalvik vs. ART ...... 124
  5.6 Related Work ...... 124
6 Conclusion and Future Work ...... 127
Appendix: PATDroid ...... 131
References ...... 134
LIST OF TABLES

3–1 Trigger APIs and extended function calls ...... 24
3–2 The execution rules. κ is a series of evaluation functions that perform real calculation when values are known. PTS denotes primitive types ...... 30
3–3 The SLOCs for different components ...... 42
3–4 Evaluation datasets ...... 43
3–5 The breakdown of detection accuracy on the Android malware genome dataset ...... 48
3–6 App auditing use cases and requirements ...... 52
3–7 Free apps that spread certain personal information identified by AppAudit. For the “Privacy Policy” column, a “lib” means that the privacy policy does not cover the kind of data spread by advertising libraries ...... 55
4–1 A list of studied memory-related bug reports ...... 66
4–2 Selected apps and evaluation workloads. Category and downloads are collected from Google Play Store. All apps have a user review score between 4.2 and 4.7 (out of 5.0) ...... 85
4–3 Performance issues detected by AppInspector ...... 86
4–4 The analysis capability of AppInspector (dynamic) vs. PerfChecker (static). ⊆ means the detected problems form a subset of actual problems while ⊇ indicates a superset relation. UDFs denote User-Defined Functions ...... 91
5–1 Benchmark image loading on three devices. All three devices load a full-screen-resolution RGB bitmap image ...... 99
5–2 Different bitmap origins and their identifiers. “App private” means the image cannot be accessed by other apps. “Read-only” means the image data cannot be mutated. FILEPATH refers to the relative file path to the app root folder. LMT stands for last modified time of the file. RESID stands for a numeric identifier for the resource generated by the compiler ...... 103
5–3 Selected apps and the workload for evaluation. All data are collected from Google Play Store ...... 113
LIST OF FIGURES

2–1 Android architecture ...... 6
2–2 The APK file format and the brief Android build process. AndroidManifest.xml* denotes the binary form of this XML file ...... 8
2–3 The event-driven execution of an Android app ...... 11
2–4 The permissions request dialog when the user installs an app ...... 12
3–1 AppAudit use cases: AppAudit protects app developers from using data-leaking 3rd-party libraries; AppAudit helps app markets to detect data-leaking apps uploaded by untrusted app developers; AppAudit helps mobile users to prevent installing problematic apps from untrusted app markets ...... 15
3–2 AppAudit architecture and workflow ...... 20
3–3 An extended call graph. Each vertex stands for a function. Solid lines represent traditional call relationships and dashed lines stand for extended calls. Grey vertices are the marked suspicious functions. BRs stand for BroadcastReceivers that can receive system events ...... 23
3–4 AppAudit approximated executor state machine ...... 25
3–5 Four basic control flow structures and their compiled bytecode streams ...... 33
3–6 The overall true positives on the Android malware genome dataset (99.3%) ...... 47
3–7 The average analysis time per app for AppAudit and two static analysis tools. Note that FlowDroid only finishes 61% of the samples (due to OutOfMemory exceptions and 10-minute timeout). Its average time only includes successful cases ...... 51
3–8 The venues of data leaking ...... 56
3–9 The types of leaked data ...... 57
4–1 Breakdown of memory objects involved in memory leak reports ...... 67
4–2 AppInspector workflow ...... 68
4–3 AppInspector instrumentation details. Lightweight Profiling: 1. event handlers, life cycle methods and UI callbacks; 2. asynchronous functions; Complementary Tracing: 3. time-consuming API calls; 4. GC pause time; 5. stack sampling; Tracking Asynchrony: 6. asynchronous calls ...... 70
4–4 Generating and visualizing a profile graph ...... 77
4–5 A case of colliding asynchronous functions in the time-thread view of a real profile graph. Dashed lines are asynchronous calls ...... 81
4–6 The cumulative distribution function of execution time of long-running methods in the UI thread ...... 87
5–1 The cumulative distribution function for API occurrence in 115 thousand apps. The top four APIs related to bitmap are selected, used in more than 80% of the apps ...... 100
5–2 The architecture of AppSwift. Four shaded components are runtime components of AppSwift, running with the rewritten app ...... 102
5–3 AppSwift’s performance enhancements on 30 real-world apps ...... 116
5–4 FaceQ main window, showing the avatar picture currently being made and a list of available style pictures. Each style picture is loaded and displayed asynchronously ...... 121
Chapter 1
Introduction

In the past ten years, mobile devices have witnessed growth and tremendous success in the consumer electronics market. Smart devices refer to a broad spectrum of portable electronics, powered by similar software platforms like Google Android and Apple iOS. These include wearable devices like smart watches (<1.8 diagonal inches), hand-held devices such as smartphones (2.45 to 5.1 diagonal inches), phablets (5.1 to 6.99 inches) and tablets (>7 inches). Modern TVs and cars are also integrated with smart hardware and software. As of 2015, the worldwide shipment of smartphones alone has reached more than 1.4 billion units [81].

Mobile applications, also termed “apps”, are the mobile software that directly interacts with users on mobile platforms. These apps offer a wide range of functionalities, such as location-based services, health monitoring and mobile gaming, which have greatly enriched daily lives. Mobile users generally obtain apps from app markets. As of 2015, the Google Android app marketplace provides over 1.5 million apps, with cumulative downloads in the tens of billions [40]. As mobile computing becomes ubiquitous and users continue to use more and more apps, app quality becomes the core factor that impacts user satisfaction. App quality has many aspects; in this thesis, privacy and performance problems in apps are the primary study targets.
Detecting Privacy Threats. Mobile devices nowadays store a wide range of personal user data, and apps can use these data to improve service quality (e.g., targeted search results, location-based recommendations). However, this also attracts attackers that abuse apps to collect sensitive user information for unfair advertising revenue, phishing and other malicious activities. Low-quality apps are also growing quickly, such as adware that contains only ads, greyware that interferes with and hijacks normal apps, and repackaged apps that tamper with popular apps to steal advertising revenue. App market operators as well as mobile users need tools to detect privacy threats in apps and prevent harmful apps. Meanwhile, various app privacy studies gradually reveal that some apps and advertising libraries extensively collect and leak sensitive user data [71, 62, 65]. App developers also need tools to understand the potential privacy threats caused by including 3rd-party libraries. Recently the research community has been developing static analyses [48, 121, 103] and dynamic approaches [126, 61] to track sensitive information flow and detect data leaks. However, because of the event-driven nature and a rapidly growing code base, such tools dramatically degrade in detection precision and consume a considerable amount of time (from several minutes to hours per app). We realize that controlling the cost (memory consumption and analysis time) of program analysis is important for making these analyses practical for app markets, mobile users and app developers.
Diagnosing Performance Problems. Mobile apps, like other graphical user interface (GUI) applications, are interactive applications that constantly receive user input and update their UI. The key performance index for interactive apps is staying responsive to user input. Method tracing (TraceView [30] in the Android SDK) instruments apps to time method execution and reveal long-running methods, and is the primary tool available for debugging app performance problems. However, the event-driven execution of apps can spawn multiple threads working asynchronously to serve one user input. Also, the slowness observed in one particular method might be caused by contending for resources with another thread or by a particular execution ordering. Thus method tracing often cannot reveal the root causes of performance problems, and extensive tracing also incurs considerable runtime overhead (up to 2.5× slowdown [74]) and produces a large bulk of logs that are not easy to understand. As a result, according to a recent study of GitHub-hosted mobile app projects [110], profiling tools are not effective for diagnosing performance problems, and developers generally choose to manually time certain methods to detect performance problems.

Recent app debugging research aims to develop more effective performance measurement methods [122, 101, 108, 112, 78] for mobile apps. However, some approaches require changes in the operating system, while others are non-trivial for the Android platform, where apps can make reflection calls, use complicated inter-process message passing and use a wide range of APIs. We realize that relying on a specialized OS is not realistic for an ever-growing number of OEM Android devices, and the complexity of the Android platform must be tackled by a practical diagnosing tool. Meanwhile, researchers are developing mechanisms to improve app performance [102, 92, 69, 56]. However, these tools currently rely on app developers to use new APIs or refactor the existing code base. It remains interesting yet challenging to explore methods that can transform app code and apply performance enhancements automatically.

1.1 Contributions
In this thesis, we develop various program analysis techniques and tools to automatically deal with privacy and performance problems in mobile apps. These techniques could be used by app developers to improve app quality, by app market operators to remove low-quality apps and by users to vet app behavior. The contributions can be divided into three parts:
• AppAudit: we develop a dynamic binary code analysis to detect data leaks in Android apps. This dynamic analysis can execute a part of the app and track the dissemination of user data. We combine this analysis with a static analysis to detect leaks more precisely and more efficiently (both in terms of time and memory consumption). This part of the work is published at the IEEE Symposium on Security and Privacy (S&P’15) [115].
• AppInspector is a tool that diagnoses performance problems in Android apps. It reuses most analysis building blocks from AppAudit and extends them to perform static instrumentation on app binary code. AppInspector collects timing and diagnostic data at runtime and analyzes these data to reveal responsiveness problems and their causes. AppInspector only needs to instrument app code and thus is readily deployable to existing test environments. The preliminary results have been published at the Workshop on Power-Aware Computing and Systems (HotPower’13) [116] and the full paper is in submission, under review [117].
• AppSwift is a code transformation tool built on top of AppInspector. With AppInspector, we find that app UIs comprise many large bitmap images and that many app responsiveness problems are caused by inefficiently loading these images. We develop AppSwift, which automatically transforms app code to fix common code patterns that cause performance problems. We demonstrate that AppSwift can remove a considerable number of UI responsiveness problems by transforming the app code and retrofitting the uses of various image loading APIs. This part of the work is currently in submission, under review [118].
1.2 Thesis Organization
The rest of the thesis begins with a brief introduction of the Android software stack and its app programming model. Then three chapters elaborate on the methods and tools we developed to address app performance issues and privacy leakage. Finally, a summary of my current research results and future work is provided. The appendix briefly introduces an open-source part of our analysis framework that is used across the three tools built in this thesis.
Chapter 2
Background

Android is an open-source mobile platform developed by multiple companies, led by Google, and first released in September 2008 [2]. Nowadays, Android is a major mobile platform powering hundreds of millions of devices worldwide. Every day, about one million new Android devices are activated. The Android platform supports a diverse range of devices, from devices as small as smart watches and TV boxes, to medium-sized smartphones, phablets and tablets, to large connected vehicles. The Play Store, the Google-operated Android application marketplace, offers over one million different apps as of 2016. In this chapter, we introduce the Android software stack with a focus on the composition and the execution model of Android applications.

2.1 Android System Hierarchy
Figure 2–1: Android architecture. [The figure shows a four-layer stack: Applications (apps); Application Framework (Views, Managers, Content Providers); Libraries (WebKit, OpenGL, SQLite, libc, …) alongside the Android Runtime (Dalvik VM, ART); and Linux (drivers, inter-process communication).]
The Android software stack is divided into four layers, as shown in Figure 2–1.
Linux. At the very bottom of the software stack, Android runs a tailored Linux operating system, with drivers for various on-board hardware components, such as the WiFi module, GPS module, accelerometer, camera, etc.
Android Runtime and Native Libraries. On top of Linux, Android has the Android runtime and native libraries. The Android runtime is a fully functional Java virtual machine, known as the Dalvik VM [20], which can execute programs in the Dalvik bytecode format. Newer Android also has a new runtime called ART [11], which further compiles Dalvik bytecode to native device binaries for improved execution performance. ART and Dalvik are compatible. Native libraries include an SQL-style database, fonts, OpenGL libraries and WebKit, which are fundamental to modern applications.
Application Framework. The layer above is the Android Application Framework, which appears as several jar files dynamically linked to all Android applications. The framework provides Java classes to access various hardware services (e.g. acquiring GPS locations, reading phone state, etc). It also provides standard Android UI widgets (namely Views) such as buttons and text boxes. This layer also contains a set of designated application component classes, which should be inherited by the application to perform program logic.
Android Applications. Finally, the topmost layer is the Android applications (or apps for short), developed in Java. Android apps are packed into a single distributable file, called the Android Application Package (APK).

2.2 Android Applications
Android apps are the entities that directly interact with mobile users. Apps can provide various complicated functionalities, such as photo editing,
web browsing, gaming, etc. Analyzing and improving the performance and security of apps is the main focus of this thesis. In this section, we elaborate further details about apps to understand the problems within the scope of the thesis.

2.2.1 Code, Manifest, and Resources
Figure 2–2: The APK file format and the brief Android build process. AndroidManifest.xml* denotes the binary form of this XML file. [The figure shows: .java sources compiled by javac into .class files, which dx converts together with .jar libraries into a single .dex file; AndroidManifest.xml compiled by aapt into its binary form; .xml, .png and .wav resources compiled by aapt into res/; native libraries placed under lib/*.so; and jarsigner producing META-INF/ — all packed into the APK (zip).]
Figure 2–2 outlines the composition of an APK file and the build process that generates such files. An APK file is a standard zip file, comprising app code (.dex), a manifest file with metadata describing the app (AndroidManifest.xml), binary linkable libraries (lib/*.so), image/sound/UI design files (res/) and cryptographic signatures (META-INF/).
Dalvik bytecode (DEX). The app Java source code files are first compiled with the standard JDK javac compiler. Then the dx utility of the Android build tools converts multiple class files (with Oracle Java bytecode) and libraries into one single DEX file with Dalvik bytecode. Dalvik bytecode [20] is register-based bytecode, which is more compact in space. The single .dex
file contains all classes of the app and 3rd-party libraries. To analyze app bytecode, tools such as smali [72] and PATDroid [27] are essential.
AndroidManifest File. The AndroidManifest file is an XML file containing various app metadata. For example, an app is not a single-entry-point Java program. Instead, the app developer should inherit designated classes (i.e., app components) in the application framework to implement app logic. The names of the inherited app components are provided in the manifest file such that the Android system knows how to properly launch an app.
Resources and Layout XML. Android also allows app developers to pack images, sounds, animations and UI design files (namely layout XML files) in an app. These multimedia files (so-called Resources) are compiled by the aapt utility of the Android build tools, and the app can access them via framework-provided APIs. The XML snippet below shows a simple UI design with one text box and a button. The framework has a range of built-in UI widgets and pre-defined properties for each widget.
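A minimal layout of this kind, with one text box and a button (file name and widget identifiers chosen for illustration), could look like:

```xml
<!-- res/layout/main.xml (hypothetical): one text box and one button -->
<LinearLayout xmlns:android="http://schemas.android.com/apk/res/android"
    android:layout_width="match_parent"
    android:layout_height="match_parent"
    android:orientation="vertical">

    <!-- Built-in text box widget -->
    <EditText
        android:id="@+id/textBox"
        android:layout_width="match_parent"
        android:layout_height="wrap_content" />

    <!-- Built-in button widget; its onClick handler is attached in code -->
    <Button
        android:id="@+id/btn1"
        android:layout_width="wrap_content"
        android:layout_height="wrap_content"
        android:text="OK" />
</LinearLayout>
```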
2.2.2 Execution Model and Performance
Android apps are GUI applications that have UI elements interacting with the user in an event-driven manner. Android apps do not have a single main function as the entry point. Instead, Android defines a set of app component classes in the framework, each representing a particular app window, background service, or a response to certain system events. The app developer must inherit from these app component classes and implement app logic in designated member functions. These app components are declared in the manifest file of the app. When the user launches the app, the Android system reads the manifest file of the app, creates certain app components and calls into designated methods.
App Components and Life Cycle Methods. App components are classes defined in the application framework that each serve a particular purpose of the program. Android has three main components. An Activity manages a window with UI elements interacting with the user. A Service runs long-running tasks in the background without user intervention. A Broadcast Receiver responds to certain system-wide events, such as an incoming text message notification. App developers should extend these app components to perform app logic. Each app component contains a set of methods that are called when the app transitions to a certain state. Figure 2–3 depicts the workflow for the system to launch an app. When the user clicks on the app icon, the system identifies the app to be launched, reads its manifest file for the main Activity class, creates this activity and finally calls its onCreate() method to start its life cycle. The app developer is expected to create UI elements (e.g. a button) in the onCreate() method. After onCreate() is completed, the system continues with the onStart() method of the activity, which is expected to attach event handlers to UI elements. The onClick handler of the button created in onCreate() will be attached to the button object at this time. Before the window becomes visible to the user, onResume() is called to start the UI (e.g., starting the animation). After onResume(), the window and its UI elements are created and visible to the user. When the user clicks on the button, the system intercepts this event and calls the onClick handler registered to this button.

Figure 2–3: The event-driven execution of an Android app. [The figure shows the system process starting the app; the app process running onCreate() (create button), onStart() (set handler) and onResume() (become visible); and a later user click on the screen dispatched to btn1.onClick().]
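The launch sequence described above can be sketched in plain Java. The classes below are simplified stand-ins for the Android framework types (no real SDK involved); only the fixed call order matters:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the event-driven launch sequence described above.
// MainActivity and Button are simplified stand-ins for framework classes.
public class LifecycleSketch {
    static final List<String> log = new ArrayList<>();

    static class Button {
        Runnable onClick;                                   // handler attached in onStart()
        void click() { if (onClick != null) onClick.run(); }
    }

    static class MainActivity {
        Button btn1;
        void onCreate() { log.add("onCreate"); btn1 = new Button(); }                      // create UI elements
        void onStart()  { log.add("onStart");  btn1.onClick = () -> log.add("onClick"); }  // set handlers
        void onResume() { log.add("onResume"); }                                           // become visible
    }

    // The "system process" drives the life cycle in this fixed order,
    // then dispatches a later user tap to the registered handler.
    static String run() {
        log.clear();
        MainActivity a = new MainActivity();
        a.onCreate();
        a.onStart();
        a.onResume();
        a.btn1.click();
        return String.join(" -> ", log);
    }

    public static void main(String[] args) {
        System.out.println(run()); // onCreate -> onStart -> onResume -> onClick
    }
}
```

The point of the sketch is that the developer never calls these methods directly; the system does, in this order, which is why heavy work placed inside them delays the window becoming visible.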
Single UI Thread. Like other GUI applications [17], an Android app has a single UI thread that processes all user input events, UI updates and life cycle methods of app components. To avoid blocking the UI thread, apps should not perform heavy tasks on it but instead spawn worker threads for such tasks. If certain operations are running on the UI thread, the thread is unable to accept further user input or UI updates for a while. Lengthy operations on the UI thread are common causes of user-perceptible delays and hurt UI responsiveness. Such performance issues are the top-ranked performance bugs found in real-world apps [93].

2.2.3 Permission and Privacy
Android defines a set of permissions, each corresponding to the access to a certain system service or sensitive user data. Each app must declare the permissions it needs in the manifest file. Before Android 6.0, when the app is
11 Figure 2–4: The permissions request dialog when the user installs an app. installed, Android will display a permission request dialog like Figure 2–4 to inform the user of these permissions. The app will only be installed when the user presses the “accept” button. After Android 6.0, only ordinary permis- sions would be requested at installation time while sensitive permissions (e.g., access to user photos) will be requested when the app performs an operation that needs the permission in question. The current Android already has nu- merous permissions and new permissions are being added to newer Android. Permissions are the base line for protecting the user’s security and privacy. Permissions can effectively restrict app’s access to sensitive user data (e.g., call log, message history, contacts) but not enough for preventing an app with sufficient permissions to transmit user data out of the devices. Such threats
are also called data leaks, a classic information flow problem further complicated by Android’s execution model. In this thesis, I propose the AppAudit system to analyze data leaks and effectively detect them in real-world Android apps.
Chapter 3
AppAudit: Analyzing and Detecting Data Leaks

In recent years, mobile devices have gained unprecedented success and become the most popular personal consumer electronics. Users store all kinds of personal data on these devices, e.g., text messages, call logs, locations, and browsing history. Mobile applications (or apps for short) can deliver rich functionalities and improve services by properly using these personal data. However, recent studies unveil many abuses of these data, which lead to data leaks either intentionally [71, 62] (e.g., for improper advertising revenue) or unintentionally (e.g., exposing these data in plain text over public networks [62]).

3.1 The Information Flow Problem Revisited
Data leaks tamper with user privacy, driving users to abandon apps and harming app developers as well as the app market. To address this crucial problem, market operators have been actively developing techniques to analyze and identify data-leaking apps, i.e., app auditing. Static program analysis [121, 48, 60, 128, 126] can comprehensively examine program data flows and reveal data-leaking code paths, and is the de facto technique for app auditing. However, static analysis is generally inefficient (time- and memory-consuming) and produces false alarms. Market operators have to spend considerable computing power to run such analyses and further invest human effort to validate the results. In this work, we propose AppAudit, a program analysis framework that can analyze apps efficiently (in real time) and effectively (reporting actual data leaks). Figure 3–1 demonstrates the three use cases of AppAudit. First, AppAudit can be integrated into IDEs to check apps for developers before
Figure 3–1: AppAudit use cases: AppAudit protects app developers from using data-leaking 3rd-party libraries; AppAudit helps app markets to detect data-leaking apps uploaded by untrusted app developers; AppAudit helps mobile users to prevent installing problematic apps from untrusted app markets.

release. This helps to identify problematic 3rd-party modules, which are the main causes of data leaks [71]. Second, AppAudit can be deployed as an automatic app auditing service at app markets. AppAudit’s high accuracy helps market operators eliminate human involvement in validating analysis results and thus fully automates the app auditing procedure. AppAudit’s high efficiency greatly reduces the waiting time for developers to get auditing feedback from the market after they upload apps. Third, AppAudit can be installed on mobile devices to check apps before installation. As Android allows users to install apps from any market and developer, AppAudit can protect users against data-leaking apps from untrusted sources or app markets that lack an auditing service.
To achieve these goals, AppAudit relies on the synergy of a new dynamic analysis and a lightweight static analysis. AppAudit works in two stages. In the first stage, AppAudit performs an efficient but over-estimating static API analysis to sift out suspicious functions. The static analysis is lightweight at the cost of reporting false positives. In the second stage, we propose approximated execution, a dynamic analysis that can simulate the execution of a program while performing customized checks at each program state. The dynamic analysis executes every suspicious function, monitors the dissemination of sensitive data and reports data leaks that can happen in real execution. AppAudit relies on this analysis to prune false positives from the static stage. Previous dynamic analysis systems [61] only explore code paths that have been executed, which fails to automatically explore code paths in depth. Our dynamic analysis can automatically execute a part of the program in the presence of unknown values. It achieves this with an innovative object model to represent unknown values and mechanisms to handle execution with unknowns. Our contribution is three-fold:
• We propose approximated execution, a novel dynamic analysis that can execute a part of a program while performing customized checks on its program state at each step. The executor can faithfully simulate actual program execution and function in the presence of unknowns.
• We present AppAudit, an Android app auditing tool that can check apps effectively and efficiently. AppAudit embodies an API analysis to select suspicious functions and then relies on the approximated executor to prune false positives. Our experiments show that AppAudit achieves comparable code coverage with static analysis and produces no false positives, with significantly less time and memory.
• We apply AppAudit to examine more than 400 free Android apps collected from various markets. Our tool successfully identifies 30 data leaks in these apps and their containing modules. We also uncover that 3rd-party advertising libraries are the major causes of data leaks and HTTP requests are the most prominent leaking venue.
The rest of this chapter is organized as follows. Section 3.2 introduces the related work. Section 3.3 presents the design overview. Section 3.4 elaborates our static API analysis. Section 3.5 elaborates our innovative execution engine for dynamic analysis. Section 3.6 evaluates the accuracy and performance of AppAudit and presents our findings on real free apps.

3.2 Related Work
In this section, we introduce the related work on analyzing mobile applications. The related approaches can be divided into two categories. Static analysis produces analysis results by statically analyzing various files associated with the application. Dynamic analysis runs with the application on real devices and reports problems when they happen. The synergy of static and dynamic analysis is exploited by AppAudit as well as by existing work for various purposes [64].

3.2.1 Static Analysis
We discuss existing techniques in terms of their analysis granularity.
Permission-based analysis. Android defines permissions for an application to access various resources and system services. Every application is required to declare the permissions it uses in its manifest file. Permissions can serve as an approximation of application behavior, which has been leveraged to identify malicious apps [129]. Kirin [63] checks application permission usage against a set of security policies. However, permission analysis cannot distinguish whether the application is abusing permissions. For example, having access to personal information and network capability does not imply that the application will leak personal information via the network. As a result, permission-based analysis normally faces a high false-positive rate when used to analyze personal information leakage [70].
API analysis. To complement permission analysis, existing approaches attempt to analyze the API usage of applications. Stowaway [65] extracts the APIs used by an application and checks if the app is over-demanding permissions. RiskRanker [70] uses API analysis to quickly identify applications that have higher security and privacy risks. API analysis refines the granularity of permission-based analysis. However, API analysis does not consider the runtime states of the application, so it cannot justify privacy leakage with a detailed dynamic code path.
Dataflow analysis. Dataflow analysis is a classic program analysis, used for information flow validation and data reachability tests. PiOS [60] performs reachability dataflow analysis on iOS apps to identify potential privacy leaks. ContentScope [128] applies dataflow analysis to detect unwanted information leakage from personal information databases to third-party Android applications. In general, dataflow analysis provides more accurate and informative results than the previous analyses. However, event-driven programming paradigms (e.g., for GUI applications) admit many possible function execution orders and thus greatly slow down dataflow analysis [121].
Symbolic Execution. Symbolic execution is an alternative code analysis for finding information leakage. Symbolic execution faces the fundamental challenge of path explosion, especially with event-driven GUI programs. AppIntent [121] aims to reduce the number of paths to be executed based on
Android intent propagation rules to improve the performance of symbolic execution. Nevertheless, AppIntent still requires minutes to hours to examine an app, making it hard to deploy to mobile users.

3.2.2 Dynamic Analysis
Dynamic analysis is implanted into the mobile operating system and monitors applications at runtime. TaintDroid [61] applies dynamic taint analysis to various components of the Android OS, which tracks sensitive information flows and reports to the user when sensitive information leaves the device. AppFence [77] retrofits the Android OS to provide fake sensitive information to applications upon the user’s request. VetDroid [126] dynamically records the permission usage of untrusted applications, which is then analyzed offline to reveal malicious behavior.
Compared with static analysis, dynamic analysis only reports suspicious behavior that occurs at runtime. This feature avoids false alarms and is appealing to the end user. However, for other use cases (market-level vetting, detailed code analysis), dynamic analysis can be limited by low code coverage.

3.2.3 Compiler Techniques
The approximations in AppAudit are largely inspired by analysis techniques used in just-in-time compilers. Many of the design decisions in our execution engine are inspired by improvements to symbolic execution, e.g., prefix symbolic execution [53] and directed symbolic execution [94]. Our object representation is also inspired by the object representation previously used for alias analysis [68].

3.3 The Synergy of Two Analyses
The app auditing service intends to find code paths that leak sensitive user data. Mobile apps nowadays grow larger and more complicated, with
Figure 3–2: AppAudit architecture and workflow.

many 3rd-party libraries and thousands of functions. Static analysis can encounter scalability problems on large code bases, because of non-scalable analysis structures (such as precise flow graphs) or heavy analyses (such as points-to analysis and symbolic execution). As a result, static analysis is generally time-consuming, especially on large real applications. Meanwhile, static analysis can generate false alarms because some analyzed code paths can never happen in real execution. These limitations greatly confine the use cases of static analysis.
To tackle false positives and analysis efficiency, we start with a very lightweight static API analysis and rely on a dynamic analysis to prune its false positives, as shown in Figure 3–2. The API analysis aims to sift out suspicious functions and narrow down the analysis scope. AppAudit then largely depends on the dynamic analysis, which executes the bytecode of each suspicious function to confirm actual data leaks. Multiple suspicious functions can be examined in parallel to improve performance. Compared with pure static analysis solutions, dynamic analysis can produce the specific code path that leaks data by executing the program. The detected code paths can in turn be used to generate the user inputs that trigger the leak, which greatly reduces false positive cases. However, the major challenge of dynamic analysis comes from the various unknown values encountered during the analysis. When dynamic analysis meets unknowns, it can hardly explore deeply into code paths, which causes false negatives.
To overcome this, we design a novel object model to represent and propagate unknowns. We also design several execution mechanisms to increase the depth of our analysis and avoid false negatives.

3.4 API Usage Analysis
The goal of the static API analysis is to find functions that can potentially cause data leaks. Overall, the static analysis is over-estimating, and AppAudit relies on a dynamic analysis to prune its false positives. In this section, we focus on tuning the static API analysis for improved performance.

3.4.1 Call Graph Extensions
A conventional call graph models the calling relationships between functions. A function can reach a particular API if there exists a path from the function to the API. To leak data, a function must be able to reach a source API that retrieves personal data and a sink API that transmits data out of the device. Thus, finding data leaks is equivalent to finding one path from the function to a source API and another to a sink API.
Dynamic Java language features and the Android programming model can result in missing paths in a conventional call graph. Thus, AppAudit incorporates a series of call graph extensions to capture the following cases:
Java Virtual Calls and Reflection Calls. In Java, a virtual call can have many call targets (base class methods or derived class methods) and a reflection call can essentially reach an arbitrary function in the program. In both cases, the actual call target depends on the runtime calling context, which is not visible to static analysis. In our static call graph, we assume that a virtual call can reach any matching method from all inherited classes, while a reflection call is directly marked suspicious. This is a simple (thus efficient) but over-estimating heuristic. Though more precise heuristics
exist [47], AppAudit aims to postpone fine-grained assessment to the dynamic analysis.
Static Fields as Intermediates. It is common for two functions to exchange sensitive data via a static field. In such cases, one function will indirectly call a source API and the other will call a sink API. To complete this colluding procedure, there must be a third function that calls both in order. Thus, in the call graph, this third function will be marked suspicious and examined by the dynamic analysis.
Android Life Cycle Methods. An Android app interacts with the system by exposing a set of life cycle methods. When the user navigates across the app, the Android system invokes these life cycle methods in a particular order. In our call graph, we create a dummy node that simulates these ordered function invocations. If the app leaks data via life cycle methods, this dummy node will be marked as suspicious and the dynamic analysis can examine the life cycle methods in order.
Multi-threading. Multi-threading is a common programming practice in Android apps. A common idiom is to retrieve some data in the main thread and then spawn a child thread to send it via the network. In a conventional call graph, the retrieving function does not directly call the sending function. To tackle this discontinuity, we treat the function that registers a callback as calling the callback directly. In addition to standard Java multi-threading support, we also apply this technique to two Android-specific asynchrony constructs (AsyncTask and Handler).
GUI Event Callbacks. Android apps heavily utilize all kinds of GUI widgets. These widgets rely on various callback functions to respond to different user actions. We apply the technique used in the case of multi-threading to handle these GUI callback functions.
Figure 3–3: An extended call graph. Each vertex stands for a function. Solid lines represent traditional call relationships and dashed lines stand for extended calls. Grey vertices are the marked suspicious functions. BRs stand for BroadcastReceivers that can receive system events.
Android Remote Procedure Call (RPC). Android provides a system-wide RPC mechanism to notify apps of various system events. Apps can send messages to each other through the same mechanism. Messages are encapsulated in intents. Some intents may contain sensitive user data. For example, when receiving an incoming SMS, the Android system will generate an intent with the content of the SMS and send it to apps of interest. An app declares a special class called BroadcastReceiver in its manifest file to receive intents. In our analysis framework, we treat all BroadcastReceivers that can handle sensitive intents as calling a dummy source API to retrieve sensitive data.
The first three cases are handled in an ad-hoc manner when constructing the call graph. The remaining three cases all involve callback functions, so we design a unified mechanism. We define the APIs that can register callback functions as trigger APIs. Each trigger API can register a specific type of callback. In our call graph, if a function calls a trigger API, then this function is treated as calling all possible callback functions of that type.
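The trigger-API extension and the source/sink reachability test described above can be sketched as follows. This is a minimal sketch in Python; the function names, and the source, sink and trigger lists, are hypothetical stand-ins, not AppAudit's actual API lists.

```python
from collections import defaultdict

# Hypothetical source/sink/trigger lists for illustration only.
SOURCES = {"TelephonyManager.getDeviceId"}
SINKS = {"HttpClient.execute"}
TRIGGERS = {"Thread.start": ["Worker.run"]}  # trigger API -> callbacks it can register

edges = defaultdict(set)

def add_call(caller, callee):
    edges[caller].add(callee)
    # Extension: a call to a trigger API is treated as directly calling
    # every possible callback of the type that the trigger registers.
    for cb in TRIGGERS.get(callee, []):
        edges[caller].add(cb)

add_call("Main.onCreate", "TelephonyManager.getDeviceId")  # reads personal data
add_call("Main.onCreate", "Thread.start")                  # spawns a Worker thread
add_call("Worker.run", "HttpClient.execute")               # sends data off the device

def reaches(fn, targets):
    # Graph search from fn; returns True if any target API is reachable.
    seen, work = set(), [fn]
    while work:
        f = work.pop()
        if f in seen:
            continue
        seen.add(f)
        if f in targets:
            return True
        work.extend(edges[f])
    return False

def suspicious(fn):
    # A function is suspicious iff it can reach both a source and a sink API.
    return reaches(fn, SOURCES) and reaches(fn, SINKS)

# A conventional graph lacks the Thread.start -> Worker.run edge, so only
# the extended graph can mark Main.onCreate as suspicious.
print(suspicious("Main.onCreate"), suspicious("Worker.run"))
```

With the extended edge, Main.onCreate reaches both the source and (through Worker.run) the sink, while Worker.run alone reaches no source and is not marked.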
Category         | Trigger API                  | Extended function calls
-----------------|------------------------------|---------------------------------------------
Android RPC      | Context.startService()       | u.onCreate(), ∀u extends Service
                 | Context.startActivity()      | u.onCreate(), ∀u extends Activity
                 | Context.sendBroadcast()      | u.onReceive(), ∀u extends BroadcastReceiver
                 | AlarmManager.setRepeating()  | all the three above
                 | ... and 4 more               |
GUI Callbacks    | setOnClickListener()         | u.onClick(), ∀u extends OnClickListener
                 | ... and 180 more [6]         |
Multi-threading  | Thread.start()               | u.run(), ∀u extends Thread
                 | AsyncTask.execute()          | u.doInBackground(), ∀u extends AsyncTask
                 | ... and 14 more              |

Table 3–1: Trigger APIs and extended function calls.
Table 3–1 provides a partial list of the trigger APIs currently used in AppAudit. For example, Context.startService() registers a callback with the Android system to invoke the life cycle functions of a Service class. Thus, if a function calls startService(), we treat it as calling the onCreate() function of all classes that inherit from the Service class.

3.4.2 API Usage Analysis
Checking whether a given function is suspicious is equivalent to finding a path from the function to a source API and a path to a sink API. We construct an extended call graph from the program bytecode, which combines a standard call graph with dummy functions and extra calling relationships according to the above-mentioned cases. To accelerate the construction algorithm, we omit Android library functions except for source, sink and trigger APIs: we want to focus on application functions and avoid analyzing the Android runtime library. After the extended call graph is constructed, we perform a breadth-first search to mark all suspicious functions. For example, with the extended call graph in Figure 3–3, the static API analysis reveals four suspicious functions (BR1, f1, f7 and f3) while a conventional call graph can only reveal f3.
Overall, the extended call graph is an over-approximated call graph containing calling relationships that will not happen in real execution. Consequently,
Figure 3–4: The AppAudit approximated executor state machine.

our static API analysis can mark “good” functions as suspicious in exchange for analysis performance. While previous work [47] employs more complicated analyses to achieve better heuristics at the cost of performance, AppAudit takes the opposite direction and relies on dynamic analysis to prune false positives.

3.5 Approximated Execution
The static API analysis is over-approximating, which can result in false positives. We use a dynamic analysis to confirm actual data leaks and prune false positives.
The approximated executor is a dynamic analysis that executes the bytecode instructions of a suspicious function and reports if sensitive data could be leaked during the execution. The executor has a typical register set, a program counter (pc) and a call stack as its execution context. It relies on a novel object model to represent application memory objects. The executor has three working modes, as shown in Figure 3–4. It starts in “execution (exec)” mode, where it interprets bytecodes and performs operations. Source APIs can generate sensitive data objects, which we mark as “tainted”. Tainted objects
propagate with the execution and taint any object that is derived from them. Whenever the executor encounters a sink API, it changes to “check” mode to check the parameters of the sink API. If tainted objects are found, the executor reports the leak and terminates (the “end” final state). Otherwise, it reverts back to the normal execution mode. When a certain bytecode instruction cannot be executed due to unknown operands (e.g., a conditional jump instruction with an unknown condition), the executor switches to “approximation (approx)” mode and applies approximations to continue the execution. If the approximations fail, commonly due to too many unknowns or insufficient execution context, the calling context is incomplete for executing the current function. The executor will then skip executing the current function and start executing one of its caller functions (the “leap” final state). The caller function is expected to provide a more concrete execution context for the incomplete execution. During the execution of the caller function, the incomplete function will be re-analyzed with a more concrete calling context.

3.5.1 Object and Taint Representation
The executor starts from the function entry in the absence of its calling context (the values of parameters and global variables). We design an object model to represent and tackle unknowns. A memory object of the application is represented as a tuple φ(x) := ⟨φT(x), φK(x), φV(x)⟩. φT(x) specifies its type, which can be a Java primitive type (e.g., int, long, char) or a class type (e.g., String, StringBuilder). AppAudit introduces the object kind φK(x) to distinguish known values from unknowns. φK(x) can be one of the following cases: 1) a concrete object (C) that is created during the execution process, e.g., an object created by the new instruction; 2) a prior unknown (PU), which exists prior to the execution process and contains no values known to the executor, e.g., a global variable; 3) a derived unknown (DU), which was a
prior unknown but has been changed during the execution process. DUs mix known values and unknowns; for instance, a DU can have some known fields and some unknown fields. φV(x) stores the known value(s) of the object. For primitive types, φV(x) reflects its known value, e.g., an integer of value 5 is represented as φV(x) = {val ↦ 5}. If the value is unknown, φV(x) = ∅.
For class types, φV(x) stores all its known fields, e.g., φV(x) = {field1 ↦ φ(y)} represents x.field1 == y. Unknown fields do not appear. Arrays are special objects with indices as fields, e.g., an array of two elements is represented as φV(x) = {0 ↦ φ(y), 1 ↦ φ(z), length ↦ 2}. φV(x)[field] queries a particular field of an object x. If the field is known, this query returns the known object. Otherwise, it returns a prior unknown.
In addition to our object representation, AppAudit also tracks taints on objects, similar to dynamic taint analysis [107, 106]. For each memory object x, we define τ(x) as its tainting state. Each source API generates a specific type of taint, representing a particular type of personal data (e.g., text message, location, etc.). Taints propagate along with the object: any object derived from a tainted object will also be tainted. If a sink API meets a tainted object, our executor reports a leak. We explain our tainting rules in detail after introducing the execution rules.

3.5.2 Basic Execution Flow
We use five examples to demonstrate the basic workflow of the executor and the expressiveness of our object representation. We assume that the source() API generates a tainted integer (denoted as taint) and the sink() API checks if its parameter is tainted. All parameters and global variables (static class fields) are prior unknowns when the execution starts, whose values are unknown to the executor.
1. In foo1 shown below, c is first assigned a new concrete object with no known fields, i.e., ⟨T, C, ∅⟩. Then c.f is assigned and c becomes ⟨T, C, {f ↦ taint}⟩. Finally, sink() checks c.f, and since it is a taint, the executor reports a leak.
foo1(T x, T y) {
    c = new T();
    c.f = source();
    sink(c.f);
}
2. In foo2 shown below, x starts as a PU. Then x.f is assigned a concrete object (the taint), which changes x from a prior unknown to a derived unknown ⟨T, DU, {f ↦ taint}⟩. A derived unknown implies that this object was unknown (a PU) but some known values have been assigned to it during the execution. Therefore, when the concrete object c gets x.f, it gets the known value (the taint) previously assigned to x.f. Finally, the executor successfully reports the leak at sink().
foo2(T x, T y) {
    x.f = source();
    c = new T();
    c.f = x.f;
    sink(c.f);
}
3. In foo3 shown below, the condition checks if a concrete object c is equal to a prior unknown x. By definition, a prior unknown is created before execution while a concrete object is created afterwards. Thus the executor can safely evaluate the condition to false and no leak will be reported.
foo3(T x, T y) {
    c = new T();
    if (c == x)
        sink(source());
}
4. In foo4 shown below, the condition compares two prior unknowns. Since the executor does not know if x and y refer to the same object, this condition ends up unknown. The branching depends on an unknown condition and thus the executor reverts to the approximation mode, which will be discussed in detail later.
foo4(T x, T y) {
    if (x != y)
        sink(source());
}
5. In foo5 shown below, x changes to a derived unknown with a concrete field but y is still a prior unknown when its field is checked. Thus, the executor also needs to revert to approximations.
foo5(T x, T y) {
    x.f = source();
    sink(y.f);
}
These examples illustrate how our object representation keeps a record of both known and unknown objects and tracks their propagation to reflect the flow of personal data.
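The object representation in these examples can be sketched in a few lines. This is a minimal Python sketch of the ⟨type, kind, value⟩ tuples with a taint bit, replaying the foo2 example; the class and helper names are ours, not AppAudit's implementation.

```python
# Minimal sketch of the <type, kind, value> object model with a taint bit.
class Obj:
    def __init__(self, typ, kind, fields=None, tainted=False):
        self.typ = typ              # phi_T: static type
        self.kind = kind            # phi_K: "C" (concrete), "PU", or "DU"
        self.fields = fields or {}  # phi_V: known fields only
        self.tainted = tainted      # tau: tainting state

def source():
    # A source API yields a tainted concrete value.
    return Obj("int", "C", {"val": 42}, tainted=True)

def put_field(obj, name, val):
    # Writing a known field turns a prior unknown into a derived unknown.
    if obj.kind == "PU":
        obj.kind = "DU"
    obj.fields[name] = val

def get_field(obj, name):
    # An unknown field reads back as a fresh prior unknown.
    return obj.fields.get(name) or Obj("?", "PU")

def sink(obj):
    # Check mode: a leak is reported iff the argument is tainted.
    return obj.tainted

# foo2(T x, T y): x.f = source(); c = new T(); c.f = x.f; sink(c.f);
x = Obj("T", "PU")                  # parameter: prior unknown
put_field(x, "f", source())         # x becomes a derived unknown
c = Obj("T", "C")                   # c = new T()
put_field(c, "f", get_field(x, "f"))
leak = sink(get_field(c, "f"))
print(x.kind, leak)
```

Replaying foo2 this way, x ends up as a derived unknown and the taint stored in x.f flows through c.f into the sink, so the leak is reported.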
#        Instruction        Execution semantics
(1)¹     x = 12             φ(x) ← ⟨int, C, {val ↦ 12}⟩
(2)²     x = new T()        φ(x) ← ⟨T, C, ∅⟩
(3)¹²    x = y              φ(x) ← φ(y)
(4)²     x.f = y            φ(x) ← ⟨φT(x), DU, φV(x) ∪ {f ↦ φ(y)}⟩, if φK(x) = PU
                            φV(x) ← φV(x) ∪ {f ↦ φ(y)}, otherwise
(5)²     x = y.f            φ(x) ← φV(y)[f]
(6)¹     x = source()       φ(x) ← taint
(7)¹     sink(x)            switch to check mode
(8)¹²    call fn()          assign parameters according to Rule (3)
(9)¹     x = y binop z      φ(x) ← ⟨φT(y), C, κop(φV(y), φV(z))⟩, if φK(y) = φK(z) = C
                            φ(x) ← ⟨φT(y), PU, ∅⟩, otherwise
(10)¹    x = unop y         φ(x) ← ⟨φT(y), C, κop(φV(y))⟩, if φK(y) = C
                            φ(x) ← ⟨φT(y), PU, ∅⟩, otherwise
(11)¹²   x = y cmp-op z     φ(x) ← ⟨Bool, C, κcmp(φV(y), φV(z))⟩, if φK(y) = φK(z) = C
                            φ(x) ← ⟨Bool, C, {val ↦ false}⟩, if φK(y) ≠ φK(z) ∧ φT(y) ∉ PTS
                            φ(x) ← ⟨Bool, PU, ∅⟩, otherwise
(12)¹²   x = a[i]           φ(x) ← φV(a)[φV(i)[val]], if φK(a) = C ∧ φK(i) = C
                            φ(x) ← ⟨ELEMT, DU, ∅⟩, otherwise
(13)¹²   a[i] = x           φV(a) ← φV(a) ∪ {φV(i)[val] ↦ φ(x)}, if φK(a) = φK(i) = C
                            φ(a) ← ⟨φT(a), DU, φV(a) ∪ {φV(i)[val] ↦ φ(x)}⟩, if φK(a) ≠ C ∧ φK(i) = C
(14)     jmp-op cond, l     pc ← κjmpop(φV(cond), pc, l), if φK(cond) = C
                            switch to approx mode, otherwise
(15)     jmp l              pc ← l
¹ this bytecode accepts primitive types   ² this bytecode accepts class types

Table 3–2: The execution rules. κ is a series of evaluation functions that perform the real calculation when values are known. PTS denotes the primitive types.
3.5.3 Complete Execution Rules
Table 3–2 lists the complete execution rules used in the AppAudit executor. Rules (1) to (7) have been covered by the above-mentioned examples. The rest handle other bytecode instructions:
Function Call. Rule (8) shows that our dynamic analysis is naturally inter-procedural. When a function call is made during execution, the executor steps into the function and passes the parameters accordingly.
Arithmetic Operations. Rules (9) and (10) outline how to evaluate binary and unary arithmetic expressions. These expressions take only primitive types. Basically, the executor computes concrete values when both
operands have concrete values. When unknowns are present, the result is unknown accordingly.
Comparison Operations. Rule (11) tackles comparison expressions. Similar to arithmetic operations, if both operands are concrete (known), the result can be evaluated naturally. When unknowns are present, the result is also unknown, except for one case: if one operand is a concrete object while the other is an unknown (PU or DU), the result is evaluated to false. This is because an unknown object exists before the execution, so its address is definitely different from that of a concrete object created during execution. Note that if the program compares an unknown internalized string (String.intern()) with a known one, the result will still be unknown.
Array Operations. Rules (12) and (13) handle array operations, which are similar to rules (4) and (5). Changing an array element can also change a prior unknown to a derived unknown.
Branching Operations. Rules (14) and (15) handle branching instructions. For conditional jumps with unknown conditions, the executor reverts to the approximation mode.

3.5.4 Tainting Rules
Personal data is marked as tainted, and taint is propagated during execution. The taint tracking capability of AppAudit is largely similar to the taint propagation rules used in dynamic taint analysis [107, 61]. In our rules, rules (1) and (6) set taints explicitly. Rules (9) and (10) taint the result as long as one of the operands is tainted. Rule (12) taints x as long as i is tainted; this handles encryption libraries that perform substitution to encrypt data, so tainted inputs lead to tainted outputs.
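The taint propagation in rules (9) and (12) can be sketched as follows. This is a minimal sketch assuming values are modeled as (data, tainted) pairs; the helper names are illustrative, not AppAudit's code.

```python
# Sketch of taint propagation; a value is modeled as a (data, tainted) pair.
def binop(op, y, z):
    # Rule (9): the result is tainted if either operand is tainted.
    return (op(y[0], z[0]), y[1] or z[1])

def array_load(arr, idx):
    # Rule (12): a load is tainted if the array or the index is tainted,
    # so substitution tables (e.g. in encryption code) still propagate
    # taint from their input index to their output element.
    data, arr_tainted = arr
    val, idx_tainted = idx
    return (data[val], arr_tainted or idx_tainted)

secret = (7, True)                                   # tainted source value
mixed = binop(lambda a, b: a + b, secret, (1, False))  # arithmetic on the taint
subst_table = (list(range(256)), False)              # untainted lookup table
encrypted = array_load(subst_table, mixed)           # indexed by a tainted value
print(mixed[1], encrypted[1])
```

Both the arithmetic result and the table lookup stay tainted, which is exactly why encrypting tainted data cannot launder the taint.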
3.5.5 Execution Extensions and Optimizations
Our executor contains several extensions and optimizations that accelerate the execution speed while maintaining the same instruction semantics.
Dynamic Dispatch (Reflection and Virtual Calls). Java virtual calls and reflection calls are dynamically dispatched. During execution, these call targets are resolved.
Inlining Callback Functions. As mentioned before, callback functions are widely used and hide implicit data flows. Thus, when the executor encounters a trigger API, it will execute the callback function being registered after the current function finishes.
Exception Control Flow. Exceptions can affect control flow. Some instructions and APIs can generate exceptions (e.g., array indexing instructions and file-related APIs). Currently our executor supports only plain exceptions (no nested exceptions). Unhandled exceptions are ignored during execution.
Library Emulation. Library functions contain large bodies of instructions and many calls into other library functions. The executor emulates some library functions for improved analysis performance. To emulate a particular library function, the executor manipulates its object representations directly to achieve the same effect as the emulated function. For example, to emulate swap(x,y), the executor swaps the object representations directly without executing its bytecode. Library emulation is commonly used to accelerate calls to standard Java library functions.

3.5.6 Approximation Mode
As shown in the execution rules, unknown values can be stored, propagated and evaluated with our object model. However, when a conditional
[Figure panels: (a) conditional statement, (b) for loop, (c) while loop, (d) do-while loop]
Figure 3–5: Four basic control flow structures and their compiled bytecode streams.

jump instruction meets unknown values, the executor cannot make the control flow decision. In this case, the executor changes to approximation mode.
Unknown Branching Approximation. The executor relies on this approximation to continue when it encounters branching instructions with unknown conditions. This approximation is designed to skip unknown loops, since such loops cannot derive useful known information from unknowns. Figure 3–5 shows the four basic control flow structures compiled by an Android compiler. For the three looping structures, the branching approximation always chooses not to take the conditional branch, thereby skipping the loop. However,
as we cannot distinguish ifs from loops, this approximation explores only the "then" branch of unknown if-else structures. This bias could cause a loss of code coverage when analyzing one function, but it is benign because the function is re-analyzed when analyzing its callers. Consider the following program:

foo() {
    T a = new T();
    T b = new T();
    bar(a,a);
    bar(a,b);
}

bar(T x, T y) {
    if (x == y)
        return;
    else
        sink(source());
}
In this example, bar will be executed first. When executing bar, both x and y are prior unknowns, which triggers the approximation and guides the executor to explore only the "then" branch, so no leak is reported. Due to insufficient calling contexts, the "else" branch is not explored when analyzing bar alone. Then, according to our API analysis, foo will also be analyzed once bar has been analyzed, as foo is a caller of bar. When analyzing foo, the executor analyzes bar again with two concrete calling contexts. Under the bar(a,a) context, the condition evaluates to true and no leak is reported. Under the bar(a,b) context, the condition evaluates to false and the leaking "else" branch is explored.
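The caller-driven re-analysis just described can be sketched as a simple worklist. This is a hypothetical sketch: the call-graph representation, method names and scheduling policy are illustrative, not AppAudit's actual API analysis.

```java
import java.util.*;

// Hypothetical sketch of caller-driven re-analysis: when a function had to be
// analyzed with unknown (insufficient) calling contexts, its callers are
// scheduled so that it is later re-executed with concrete arguments.
class ReanalysisWorklist {
    private final Map<String, List<String>> callers = new HashMap<>();
    private final Deque<String> worklist = new ArrayDeque<>();
    private final Set<String> scheduled = new HashSet<>();

    void addCallEdge(String caller, String callee) {
        callers.computeIfAbsent(callee, k -> new ArrayList<>()).add(caller);
    }

    // Called after analyzing fn. If unknown arguments forced branching
    // approximations, enqueue each caller; analyzing a caller re-executes
    // the callee with concrete calling contexts.
    void onAnalyzed(String fn, boolean hadUnknownArgs) {
        if (!hadUnknownArgs) return;
        for (String c : callers.getOrDefault(fn, Collections.emptyList())) {
            if (scheduled.add(c)) worklist.add(c);
        }
    }

    String next() { return worklist.poll(); } // null when nothing is pending
}
```

For the program above, analyzing bar with unknown arguments enqueues its caller foo, which then supplies the two concrete contexts bar(a,a) and bar(a,b).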
As this case shows, the unknown branching approximation only affects functions with insufficient calling contexts (bar). The approximation results in fewer code paths being explored. But the executor will later reach the callers (foo) of such a function and re-analyze it (bar) with more concrete calling contexts from its callers. If the program contains leaking paths, then at least one of these calling contexts is sufficient to reach the leaking point, so the bias introduced earlier is amortized. If the program does not contain leaking paths, then the approximation skips some code paths, but none of them leak data. In short, the unknown branching approximation does not miss leaking code paths.

3.5.7 False Positive Analysis: Execution Path Validation
We provide a methodology to validate that an approximated execution path can happen in real execution, i.e., to verify that all detected leaks are true positives. The approximated executor starts from the entry of a function and terminates either when a leak is detected or when the end of the function is reached. During execution, the executor records the sequence of operations executed and approximations made. This sequence is called the approximated execution path. We use an example to demonstrate the entire validation procedure. The snippet below depicts an example Java program with two functions. Assume the approximated execution starts from foo, where x is a prior unknown.

void foo(int x) {
    int y = x * x;
    if (y < 0) {
        bar(x);
    } else {
        return;
    }
}

void bar(int x) {
    if (x > 0) {
        sink(source());
    }
}
The approximated execution starts at the first line of foo and terminates at the second line of bar, where a leak is found. Since x in foo is a prior unknown, y is also unknown. Altogether, the executor makes two branch approximations, at if (y < 0) in foo and at if (x > 0) in bar. Both branching operations depend on unknowns and are approximated to true to continue execution on the "then" branch. Below is the approximated execution path.
// x is input, symbolic, prior unknown
int y = x * x;
if (y < 0)        // approximated to be true
x2 = x;           // implicit, due to function call bar(x)
if (x2 > 0)       // approximated to be true
sink(source());   // report
We have three symbolic variables in the path: x, y and x2. Note that due to the function call, we use x2 to represent the argument x of bar. Validating whether this approximated execution path could happen in real execution amounts to validating that all approximated branchings could be true in real execution. This problem reduces to testing whether the predicate y < 0 ∧ x2 > 0 is satisfiable. Substituting y = x * x and x2 = x, this predicate becomes x ∗ x < 0 ∧ x > 0, which depends only on the symbolic input x. With x being an integer (ignoring integer overflow for x ∗ x), this predicate is solved to false, indicating
that the original predicate is not satisfiable and this approximated execution path can never happen in reality, i.e., the detected data leak is a false positive. This validation process can be used to validate all approximated execution paths and prune false positives of AppAudit. It can be summarized as follows, with reference to satisfiability modulo theories (SMT) [51]:

1. Obtain the approximated execution path and all branching approximations.
2. Construct a predicate P with all branching approximations joined by conjunctions (AND).
3. Repeatedly replace variables in P with their definitions, until the predicate contains only the symbolic input parameters. The resulting predicate is Pˆ(x ··· ), where x ··· are the input parameters of the starting function.
4. Use an SMT solver (e.g., Z3 [58]) to simplify Pˆ.
5. If Pˆ is shown to be infeasible, the approximated execution path is not possible in real execution.
6. Otherwise, the simplified Pˆ is the condition triggering the data leak.

When the validation process certifies that Pˆ is feasible, the approximated execution path is possible in real execution, and the simplified Pˆ is the condition that triggers the leak. Again consider the altered case and its approximated execution path.

void foo(int x) {
    int y = x * x;
    if (y > 0) {     // changed
        bar(x);
    } else {
        return;
    }
}

void bar(int x) {
    if (x > 0) {
        sink(source());
    }
}

// x is input, symbolic, prior unknown
int y = x * x;
if (y > 0)        // approximated to be true
x2 = x;           // implicit, due to function call bar(x)
if (x2 > 0)       // approximated to be true
sink(source());   // report
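As a toy illustration of the feasibility check in the procedure above, the predicates of the two examples can be compared. The real implementation uses an SMT solver such as Z3; the sketch below merely samples integer inputs, which is not a decision procedure, only an illustration of what "Pˆ is feasible" means.

```java
import java.util.function.IntPredicate;

// Toy stand-in for the SMT feasibility check. Sampling cannot prove
// unsatisfiability in general; it only illustrates feasibility for these
// two single-variable predicates.
class PathValidationSketch {
    static boolean satisfiableBySampling(IntPredicate p, int lo, int hi) {
        for (int x = lo; x <= hi; x++) {
            if (p.test(x)) return true; // found a witness input
        }
        return false; // no witness in the sampled range
    }

    // P^ for the original program: x*x < 0 AND x > 0 (never satisfiable).
    static boolean originalPath(int x) { return (long) x * x < 0 && x > 0; }

    // P^ for the altered program: x*x > 0 AND x > 0 (reduces to x > 0).
    static boolean alteredPath(int x) { return (long) x * x > 0 && x > 0; }
}
```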
The predicate is y > 0 ∧ x2 > 0. After replacement, Pˆ = x ∗ x > 0 ∧ x > 0. After simplification, Pˆ becomes x > 0. This indicates that the leak happens whenever the input parameter x is greater than 0.

3.5.8 False Negative Analysis: Tainting Validation
Since our executor performs dynamic analysis, we would like to ensure that it does not miss important (leaking) code paths. When running in execution mode, the executor faithfully reproduces the actual path of the real execution. When the executor turns to approximation mode, it explores only a few possible code paths, which could lead to false negatives. We have analyzed the side effect of the unknown branching approximation: when the application contains leaking paths, the executor will eventually have a proper calling context for execution regardless of this approximation. Thus this approximation only misses non-leaking paths and is benign to the overall accuracy. In addition, our executor embodies a taint analysis to track the dissemination of personal data. Thus the correctness of the taint rules also affects the
accuracy of the executor. We identify the following cases that could affect the accuracy of a dynamic taint analysis:

Taint Sanitization. Currently, our tainting rules only add and propagate taints but never remove them. This could lead to inaccuracy and false positives. One typical case concerns rule (9) and rule (10) in Table 3–2. Currently, the result of an arithmetic operation is tainted as long as one operand is tainted. However, some arithmetic operations always return the same result regardless of the values of the operands. For example, x = y ⊕ y always returns zero. For such cases, the taint on the result should be removed. Our current implementation does not perform taint sanitization. Nevertheless, although these cases are possible in hand-crafted applications, the standard Android Java compiler never generates such idioms.

Array Indexing. For an array operation x = a[i], x is currently tainted if i is tainted. This is because encryption functions usually use an array to map plain-text inputs to encrypted outputs, so the taint on the output depends on the input. This rule is commonly employed by other taint analyses [107, 61] to deal with encryption libraries. However, if the array is zero-valued, then x will always be zero regardless of the index i. Again, the current propagation rule over-taints the result and could lead to false positives.

Control Flow Dependent Taints. This is a well-acknowledged drawback of most taint analyses [107, 61, 105].

if (x == 1)
    y = 1;
else if (x == 2)
    y = 2;
In this case, the values of the two variables are correlated, and so should be their taint status. However, because the correlation is expressed through control flow, the executor is unaware of it and always produces an untainted y. ScrubDroid [105, 33] presents more attack cases against standard taint analyses. We expect to integrate a more powerful code structure recognition module to detect such cases in the future.

3.5.9 Infinity Avoidance
During execution (in both the exec and approx modes), our executor can run indefinitely due to infinite loops and recursion in the application.
Infinite Loop Prevention. An app could have infinite loops in its code logic. For example, the application can spawn a thread that uses an infinite loop to check for network updates. In real execution, this thread could be interrupted by the user exiting the application; in approximated execution, however, this infinite loop would never end. To ensure that the executor always terminates, we introduce an infinite loop detector in the executor to detect infinite loops and terminate them (changing the execution state to the leap state).

To support infinite loop prevention, the executor keeps an instruction counter recording the number of instructions that have been executed. Every time the executor meets a branching instruction with an unknown condition, it records the current instruction counter, so each branching instruction accumulates a list of associated instruction counter values. When a certain branching instruction has been approximated more than a threshold number of times, infinite loop prevention is triggered. Moreover, each iteration of an infinite loop executes the same number of instructions, so the detector counts iterations with the same number of elapsed instructions to detect infinite iteration.

We obtain these two thresholds through empirical experiments. First we turn on instruction tracing so that every instruction executed by AppAudit is logged. Then we gradually increase the threshold until the
infinity avoidance mechanism no longer cuts short any code paths. Finally, we double this fixed point as our final threshold to ensure that the thresholds also work for other real apps.

Alternative infinite loop detectors [86, 54] check the program memory state instead of the instruction counter. Such methods could complement the current detection approach for better accuracy. Nevertheless, infinite loop detection is equivalent to the halting problem, which is theoretically undecidable [109]; the approaches adopted by AppAudit are thus essentially approximations.
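The instruction-counter heuristic can be sketched as follows. The two thresholds and all names here are illustrative stand-ins, not AppAudit's empirically derived values.

```java
import java.util.*;

// Sketch of the instruction-counter heuristic described above. Thresholds
// are hypothetical placeholders, not AppAudit's empirically derived values.
class InfiniteLoopDetector {
    static final int MAX_APPROXIMATIONS = 1000;       // hypothetical threshold
    static final int MAX_SAME_STRIDE_ITERATIONS = 64; // hypothetical threshold

    private long instructionCounter = 0;
    // For each branch site: the counter values at which it was approximated.
    private final Map<Integer, List<Long>> history = new HashMap<>();

    void onInstruction() { instructionCounter++; }

    // Called when a branch with an unknown condition is approximated.
    // Returns true when the branch looks like an infinite loop and the
    // executor should cut the path short (entering the leap state).
    boolean onUnknownBranch(int branchPc) {
        List<Long> counters = history.computeIfAbsent(branchPc, k -> new ArrayList<>());
        counters.add(instructionCounter);
        if (counters.size() > MAX_APPROXIMATIONS) return true;
        // An infinite loop executes the same number of instructions per
        // iteration: count consecutive visits with an identical stride.
        int sameStride = 0, n = counters.size();
        for (int i = n - 1; i >= 2; i--) {
            long s1 = counters.get(i) - counters.get(i - 1);
            long s2 = counters.get(i - 1) - counters.get(i - 2);
            if (s1 == s2) sameStride++; else break;
        }
        return sameStride >= MAX_SAME_STRIDE_ITERATIONS;
    }
}
```

A loop whose body executes a constant number of instructions per iteration trips the same-stride check long before the raw approximation budget is exhausted, while irregular branch sites are left alone.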
Infinite Recursion Prevention. For infinite recursions, we limit the stack size of the executor. When the app stack size exceeds this limit, the executor treats this as a stack overflow exception and terminates the execution. The stack size limit varies between Android versions; it is defined as kDefaultStackSize in the Dalvik VM source file dalvik/vm/Thread.h [37]:

• API 3 (Android 1.5) = 8KB
• API 4-10 (Android 1.6 - Android 2.3.7) = 12KB
• API 14-17 (Android 4.0 - Android 4.2.2) = 16KB
• API 19 (Android 4.4.4) = 32KB [36]

The approximated executor itself has a larger stack limit than the app stack limits stated above, which ensures that the executor can always terminate infinite recursions in the app and complete properly.

3.6 Evaluation
In this section, we evaluate AppAudit in terms of its accuracy and usability. We demonstrate the three use cases of AppAudit, with regard to market
operators, app developers and mobile users. We also present a characterization study of data-leaking apps, providing guidance for designing effective data leak prevention tools.

Component      Percent  Description
preprocessors  11.2%    Disassembler and manifest parser
emulation      28.5%    Library and device emulation
core engine    15.8%    Core approximated executor
objmodel       20.2%    Object representation
apianalysis    7.0%     Call graph based API analysis
util           17.3%    Utility
Total          100%     10,559 lines of code

Table 3–3: The SLOCs for different components.

3.6.1 Implementation
The AppAudit prototype is implemented in Java and reflectively loads the Android SDK for API signatures. Table 3–3 presents the breakdown of source lines of code for the different components. We leverage dex2jar [14] for disassembling and APKParser [46] for manifest parsing. The API analysis accounts for a relatively small portion of the entire code base. The approximated executor is the main contributor to the code base; it implements an Android Dalvik [8] bytecode virtual machine.

Portability. Our current prototype is implemented in Java and can run on different platforms. We have an optimized version for server configurations and an Android port with a simple GUI.

API emulation efforts. As shown in Table 3–3, API emulation accounts for 28.5% of our code base. Currently API emulation is done manually. We have emulated 54 classes and 130 functions, which are the most frequently used in the apps in our evaluation datasets. API emulation is a tedious task and
we are exploring automated ways to generate emulated code for all standard Java library APIs.

Dataset     # Samples  Description
droidbench  56         A micro-benchmark [15] that stresses the completeness of taint analysis
malware     1005       Android malware genome project [127]
freeapp     428        Popular free apps from the official market

Table 3–4: Evaluation datasets.

Device emulation. We emulate a Samsung Galaxy Nexus (i9250) smartphone running Android 4.0.3, with WiFi and cellular connections. The specific model number, serial, OS version code and CPU type are dumped from a real phone. This information is exposed to the app through the standard Android class android.os.Build. We also emulate a basic /proc file system to present low-level information about the emulated device.

Parallelized Execution of Multiple Approximated Executions. To further improve analysis speed on multi-core platforms, AppAudit executes multiple (four by default) code paths concurrently. Each code path is executed in a separate execution context and shares no state with the others. The dynamic analysis is thus fully parallelizable, and the parallelism can be adjusted for different use cases.

Native code. Some Android apps link and call into native libraries. Currently, our executor does not execute native code and simply returns an unknown when it meets a native function. We expect a binary executor to provide fine-grained data flow information about native functions.
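The parallelized execution of multiple approximated code paths can be sketched with a fixed-size thread pool. This is a sketch only; the task type and names are illustrative, since each of AppAudit's execution contexts is isolated and shares no state.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of executing several approximated code paths concurrently. Each
// path runs in its own isolated execution context and shares no state, so
// the tasks need no synchronization. Names are illustrative.
class ParallelPathsSketch {
    static List<String> analyzeAll(List<Callable<String>> paths, int parallelism)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every code path has completed.
            for (Future<String> f : pool.invokeAll(paths)) {
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the contexts are share-nothing, raising the pool size (four by default in AppAudit) scales the analysis without locking.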
3.6.2 Evaluation Methodology
Our evaluation contains four parts. First, we use a micro-benchmark suite to validate the completeness of our static API analysis. Second, we use malware samples to evaluate the accuracy of AppAudit. In particular, we want to answer two questions: 1) Can our dynamic analysis guarantee no false positives? 2) Can AppAudit provide code coverage comparable to static analysis (a low false negative rate)? Third, we use real-world apps to evaluate the usability as well as usefulness of AppAudit. Our real-app evaluation aims to answer the following questions: 1) What is the analysis time and memory consumption? 2) How could AppAudit be used in different use cases? Fourth, we present a characterization study of data-leaking apps uncovered by AppAudit, showing the common properties among these apps so as to provide guidance for designing effective prevention tools.

Evaluation datasets. Table 3–4 summarizes the datasets used in our evaluation.

1. DroidBench [15] dataset. DroidBench contains a suite of hand-crafted Android applications that exploit various features of the language and programming model to bypass taint analysis. We use DroidBench to validate the completeness of our static API analysis.

2. Malware dataset. Our malware dataset contains 1,005 samples from the Android malware genome dataset. We select the ones related to data leaking based on extensive reference of studies from mobile security companies and labs [25, 13, 35, 26, 34, 18, 44]. Malware samples have well-understood malicious behavior [127], which serves as a good accuracy index for data leak detection tools.

3. Free app dataset. Real apps are normally much larger and more complicated than malware. Thus, we choose these samples to evaluate the
analysis performance and usefulness of program analysis tools. Our initial sampling began around March 2013, when half of the dataset was collected. We noticed some online user feedback about data leaks, so we collected newer versions of these apps around January 2014 to outline how developers respond to these reported data leaks. The collected apps comprise not only top free apps but also apps newly uploaded during the sampling period.

Evaluation candidates. We compare AppAudit with two state-of-the-art pure static analysis tools.

FlowDroid [48] leverages a precise flow graph to find leaking data flows; it achieves high precision by accurately modeling the runtime behavior of Android applications with this graph. On the contrary, AppAudit largely relies on executing bytecode to reproduce and confirm leaks as in real execution. FlowDroid is open-source, so we can compare the results of both tools across all three evaluation datasets.

AppIntent [121] is a static analysis based on symbolic execution. Its main goal is to prune false positives while optimizing the performance of symbolic execution. AppAudit also leverages a dynamic analysis to reduce false positives, which naturally makes it a competitive approach for the same purpose. AppIntent is not publicly available, and we only have its results on the malware dataset.

3.6.3 Completeness of Static API Analysis
AppAudit adopts a two-stage design where the static API analysis narrows down the analysis scope for the dynamic analysis. The static stage should therefore completely include all possible data leaks. We use DroidBench to evaluate the completeness of our static API analysis. FlowDroid is the only previous
approach compared in this analysis, since AppIntent is not available to test on this benchmark.

DroidBench contains 65 test cases in total. We exclude 9 unsupported cases and use the remaining 56 for our completeness evaluation. Four excluded cases are related to control flow dependent taints (see Section 3.5.8). This problem is itself an interesting and hard research topic, which is currently supported neither by FlowDroid [48] (the state-of-the-art static analysis) nor by AppAudit. Three excluded cases are because AppAudit does not yet treat password input widgets as source APIs. Two excluded cases declare GUI callbacks via XML files, which is not fully supported by AppAudit.

Among the remaining 56 DroidBench tests, AppAudit produces no false positives and two false negatives. As a comparison, FlowDroid has four false positives and two false negatives. Overall, AppAudit achieves fewer false positives and as few false negatives as FlowDroid. AppAudit eliminates all false positives with its dynamic analysis, which executes only possible code paths; false positives caused by impossible code paths are thus pruned. The two false negative cases of AppAudit both leak data when particular user inputs happen in a particular order. AppAudit fails to report these leaks because it cannot model the infinite possibilities of user input orderings. Previous work [121] argues that some particular orderings of user inputs might imply user awareness of the data leak, indicating that a detection tool should not report such leaks.

3.6.4 Detection Accuracy
Our malware dataset contains 23 malware families, covering a wide range of malicious data-stealing behavior.
Figure 3–6: The overall true positives on the Android malware genome dataset (AppAudit: 99.3%), compared with AVG, Avast, Sophos, F-Secure, Kingsoft, McAfee, AppVerify, AppIntent, Microsoft, Symantec, FlowDroid, Kaspersky and BitDefender. [figure omitted]
We compare AppAudit with both AppIntent and FlowDroid. We do not consider existing dynamic analyses like TaintDroid [61] for accuracy comparison because 1) existing dynamic analyses require user inputs and can hardly be automated; 2) static analysis can achieve better code path coverage than existing dynamic analyses.

We also compare AppAudit with a collection of commercial solutions, including off-the-shelf anti-virus software and the Google Application Verification Service (AppVerify) shipped with Android 4.2 [16]. The results of commercial anti-malware are obtained from VirusTotal [45], a website that scans submitted mobile apps with the latest mobile anti-virus solutions. For AppVerify, we reference the results from an existing study [16].

Overall Detection Accuracy. Figure 3–6 shows the comparison of overall detection accuracy (true positives plus true negatives) among all analysis tools and anti-virus solutions. AppAudit outperforms the two state-of-the-art static analysis tools and a number of commercial solutions with a detection accuracy of 99.3%. FlowDroid fails on 6 samples due to memory exhaustion.
Malware family    TP+TN  FP  FN  Sample #
AnserverBot       187    0   0   187
Badnews           2      0   0   2
BeanBot           1      0   7   8
BgServ            9      0   0   9
DroidDreamLight   46     0   0   46
DroidKungFu1      34     0   0   34
DroidKungFu2      30     0   0   30
DroidKungFu3      309    0   0   309
DroidKungFu4      96     0   0   96
Endofday          1      0   0   1
Geinimi           69     0   0   69
GGTracker         1      0   0   1
GingerMaster      4      0   0   4
GoldDream         47     0   0   47
jSMSHider         16     0   0   16
KMin              52     0   0   52
DroidKungfuSapp   3      0   0   3
LoveTrap          1      0   0   1
NickyBot          1      0   0   1
Pjapps            58     0   0   58
Plankton          11     0   0   11
RogueSPPush       9      0   0   9
SndApps           10     0   0   10
Spitmo            1      0   0   1
Total             998    0   7   1005

TP: True Positive, TN: True Negative; FP: False Positive, FN: False Negative

Table 3–5: The breakdown of detection accuracy on the Android malware genome dataset.
Table 3–5 provides a breakdown of the false positive and false negative cases for AppAudit.

False Positives. Overall, AppAudit reports no false positives, while FlowDroid reports one false positive from the DroidDreamLight samples. We inspected this case to understand the reason. Generally, DroidDreamLight samples collect personal data and then send them to a list of remote servers. These samples decrypt a configuration string with a hard-coded DES key to obtain a list of target servers. However, this particular sample has a malformed configuration string, so no target servers are obtained and no data leak actually happens. The decryption performs many substitutions with array operations, which are hard for static analysis to model correctly. With its dynamic analysis, AppAudit can faithfully perform the complete decryption and obtain the decrypted string. Consequently, AppAudit validates that the leaking code snippet in this case is actually dead code due to the malformed configuration, and successfully prunes this false positive.

False Negatives. AppAudit reports seven false negatives on the malware dataset, all from the BeanBot family. Our manual decompilation and inspection reveal that BeanBot retrieves personal data and then sends a text message to a cellular number to obtain the service code of the carrier. Once it receives the response text message, it leaks the user data [82, 67]. This is a typical case where the sending behavior depends on external inputs.

Since AppAudit cannot predict the content of the incoming text message, it cannot firmly report this case as a data leak. This false negative scenario outlines the major difference between static and dynamic analysis in handling data leaks that depend on external inputs: a leak will be triggered given some external inputs (input-sensitive leaks). Dynamic analysis would tend not to report this as a
leak, because the analysis cannot firmly ensure that the leaking path will be visited. On the contrary, static analysis tends to treat the path as leaking as long as it finds one possible input that could lead to a leak. Under such circumstances, both analyses are guessing whether the leaking path will be visited (i.e., whether it is dead code) based on unknown external inputs. A better indicator for this situation is to determine whether the data leak is user-intended or not [121]. If the input is a user input, then the user probably agrees to let the app send the data, and such input-sensitive leaks should not be labeled as actual leaks. If the input is a message from an untrusted source (as with BeanBot), then such input-sensitive leaks are more likely to be actual leaks. Modeling such inputs would be an interesting future direction for AppAudit.

3.6.5 Usability
Real Android applications are generally much larger and more complicated than malware. To examine the practicality of various app auditing tools, we conducted an experiment comparing the analysis time and memory consumption of existing tools and AppAudit. Our performance experiment runs on a desktop PC equipped with a quad-core 3.4GHz i7-3770 processor and 8GB memory, running 64-bit CentOS 7 and Oracle Java 7.

Analysis Time. Figure 3–7 compares the average analysis time per app of the three candidate program analysis tools when examining real apps. Since AppIntent is not publicly available, we only reference its results for malware samples. FlowDroid and AppAudit both have two working modes: the single mode reports only one data leak and the full mode reports all data leaks. We report the analysis time in single mode, which is the most efficient mode for both tools.
Figure 3–7: The average analysis time per app for AppAudit and two static analysis tools (AppIntent: over 1 hour on malware; FlowDroid: 40.54s on freeapp, 11.44s on malware; AppAudit: 4.87s on freeapp, 0.61s on malware). Note that FlowDroid only finishes 61% of the samples (due to OutOfMemory exceptions and a 10-minute timeout); its average time only includes successful cases. [figure omitted]
As shown, AppAudit performs much faster than the static analysis tools. Specifically, AppAudit runs 8.3x faster than FlowDroid, the best-performing static analysis so far. With long analysis times, static tools are generally not acceptable for mobile users; meanwhile, longer analysis time requires market operators to spend more resources to run the analysis. To further improve AppAudit's performance, we measured the breakdown of its analysis time to locate the time-consuming parts. The breakdown shows that around 30% to 40% of the analysis time is spent on disassembling; we plan to adopt a multi-threaded implementation to accelerate this phase. We also discovered that some functions are executed repeatedly during the dynamic analysis, so return value caching could be another direction for optimization.

Memory Consumption. Memory footprint is also an important constraint of program analysis tools when examining complicated and large real
applications. AppIntent requires 32GB, and FlowDroid needs about 2GB to 4GB of memory by default. Static analysis tools generally require large memory because they need to accommodate huge data structures (such as flow graphs and symbolic representations). The space complexity of these data structures is proportional to the size of the application code base, which makes static analysis memory-consuming for analyzing large real apps.

On the contrary, dynamic analysis is more memory efficient. AppAudit only requires a heap size of 256MB and can run on mobile devices, PCs and servers. According to our measurements, the peak memory consumption of AppAudit is only 10% of FlowDroid's in most tested cases. In our implementation, we apply several optimizations to control the overall memory consumption of AppAudit. First, we trigger a manual garbage collection after the API analysis to keep only the minimum analysis data structures in memory after the static stage (e.g., the bytecode of the app and its class hierarchy). These analysis structures take around 2MB to 20MB of memory space according to our measurements. Second, when executing the target app, the memory consumed by the executor is proportional to the memory consumption of the target app. When memory objects are no longer needed by the target app, AppAudit also dereferences them so that they can be automatically garbage collected by the JVM hosting AppAudit.

Use cases and requirements. We discuss the use cases of app auditing and elaborate the requirements imposed on the auditing tool for each case.

Requirements        Market operators  Developers  Mobile users
Platform            server            desktop     mobile device
Analysis time       days              minutes     real-time
Memory              < 100G            < 16G       < 1G
Result granularity  brief/complete    complete    brief

Table 3–6: App auditing use cases and requirements.
Table 3–6 summarizes the three cases and their requirements. We derive the memory constraints from the memory capacity of the individual platform that runs app auditing.

First, app market operators demand an app auditing tool to identify data-leaking apps uploaded to the market. This use case demands low false positive and false negative rates but does not have strict requirements on time, memory or result granularity, since the analysis commonly runs on powerful cluster servers. AppAudit stands out among other solutions for this case for its high detection accuracy and low resource consumption, which ensures detection quality while greatly saving the resource investment for automatic app auditing.

The second use case of app auditing is to allow app developers to check their apps before publishing. In this case, developers demand that the tool report all possible data leak problems within the capability of a development machine, such as a desktop PC. AppAudit and FlowDroid can both report all data leaks found in an app. When working in the full mode, both tools require more time than in the single mode shown in Figure 3–7. Our measurements show that AppAudit runs 4.7x to 7.8x slower on individual apps, while FlowDroid encounters more OutOfMemory exceptions and exhibits similar slowdowns. Nevertheless, AppAudit still manages to finish analysis within one minute and is a competitive choice for this use case.

The final use case is to run auditing tools on mobile devices and help users avoid installing data-leaking apps. In this case, the analysis has strict memory and time constraints, but it is only expected to provide brief auditing results, sometimes just whether the app will leak data or not. Figure 3–7 shows the analysis time on a desktop PC, which suggests that AppAudit is the only tool that can fulfill this task on mobile devices. Other tools
require memory that is unrealistic even for today's high-end devices. We ported AppAudit to Android and ran it to check apps installed on an LG Nexus 5 smartphone, a late-2013 model with a quad-core 2.3GHz CPU and 2GB RAM. We then measured the analysis time again with the mobile version of AppAudit. The results show a 1.5x to 2.3x slowdown compared to Figure 3–7.

3.6.6 Characterization of Data Leaks in Real Apps
AppAudit uncovers 30 data leaks in the 400 real apps we collected. We manually confirm all detected data leaks by decompiling the related apps and examining the leaking code paths. Based on the cases reported by AppAudit, we can characterize data leaks in terms of the leaking component (simply the class name), the leaking sources and the venues. Table 3–7 summarizes our characterization results. For this table, we crawled the number of downloads to highlight the number of affected users. We also crawled the privacy policies of these leaking apps to clarify whether the data leaks are made clear to users. Our characterization shows the following interesting findings:

Finding 1: Most data leaks are caused by 3rd-party advertising libraries. From Table 3–7, we found that 28 out of the 30 (93.3%) detected data leaks are caused by 3rd-party advertising libraries. As previous research [65, 71, 62] has pointed out, 3rd-party advertising modules aggressively request application permissions to access various personal data. If an advertising library leaks data, it can potentially affect many apps. Meanwhile, hackers have started to exploit advertising libraries to spy on users [19]. We believe that privilege separation [98, 89, 125] and fine-grained privilege control will help to prevent the threats caused by these problematic libraries. From the perspective of app developers, AppAudit can help check
Name                            Component       Source          Venue      Privacy Policy  Installs (M for millions)
Texas Poker v4.0.1              App             Location        HTTP GET   ×               10M-50M
Word Search v1.14               Mobfox          Location, IMEI  HTTP GET   app,lib         0.5M-1M
Speedtest v2.09                 Mobfox          Location, IMEI  HTTP GET   app,lib         10M-50M
Brightest Flashlight v2.3.3     MDotm, Mobclix  IMEI, IMSI      HTTP GET   app,lib         50M-100M
Weather Underground v2.1.2      App, Mobclix    Location, IMEI  HTTP GET   app,lib         1M-5M
Fruit Ninja (2 samples)         Mobclix         IMEI            HTTP GET   app,lib         100M-500M
Angry Birds (14 samples)        Jumptap         IMSI            HTTP GET   app,lib         300M-900M
Bad Piggies (3 samples)         Jumptap         IMSI            HTTP GET   app,lib         10M-50M
Tap Tap Revenge v4.3.3 v4.4.5   Tapjoy          IMEI            HTTPS GET  ×               0.1M
Logo Quiz v8.8                  Tapjoy          IMEI            HTTPS GET  ×               10M-50M
Trial Extreme v1.28 & v2.83     Tapjoy          IMEI            HTTPS GET  ×               5M-10M
Big Win Basketball v2.0.4       Tapjoy          IMEI            HTTPS GET  app,lib         5M-10M
Solitaire v2.1.5                Tapjoy          IMEI            HTTPS GET  app,lib         50M-100M
Talking Tom 2 v2.0.3            Tapjoy          IMEI            HTTPS GET  app,lib         100M-500M
Table 3–7: Free apps that spread certain personal information, identified by AppAudit. For the "Privacy Policy" column, a "lib" means that the privacy policy does not cover the kind of data spread by advertising libraries.
Figure 3–8: The venues of data leaking.

their apps before publishing to the market, which could effectively detect data leaks beforehand and avoid accidentally using data-leaking 3rd-party modules. Finding 2: HTTP requests are the most prominent leaking venues. Figure 3–8 presents the leaking venues for all data-leaking cases in the malware and free app datasets. HTTP(S) transmission turns out to be the most popular venue to leak data, since HTTP servers can be easily configured. This suggests that mobile application confinement tools [104, 89, 79, 120, 52] can focus on HTTP traffic to effectively confine data leaks. Nevertheless, eight reported free apps transmit personal information in plain-text form (HTTP GET requests). Consequently, some important personal information (locations and identity) can be easily obtained by traffic sniffing in public. To make things worse, some of these reported apps do not have a clear privacy policy statement, which leaves users unaware of the potential risks. Finding 3: Tracking is universal. Figure 3–9 presents the breakdown of leaked data found in the malware and free app datasets. We discover that the IMEI number and the phone number are the most commonly leaked information. The phone number is commonly sent by malware for follow-up SMS phishing.
Figure 3–9: The types of leaked data.
The IMEI serves as the phone's identity and is widely used to track users for targeted advertising. Nowadays, each free app is bundled with a couple of advertising libraries [71] and a user interacts with a number of apps. IMEIs are to mobile devices what cookies are to web browsers. Cookies are bound to individual websites, i.e., one website cannot access the cookies of another. However, IMEI tracking is not bound to individual apps but to individual advertising libraries. Thus, if two apps use the same advertising library, the advertiser can accurately track a user's transition from one app to another. If the data transmission between the library and its server is unencrypted, the trace can be acquired and used to predict user habits and launch social engineering attacks. Finding 4: Apps and advertising libraries are gaining awareness of user privacy. We find that apps (Word Search and Speedtest) are gaining awareness of privacy by removing problematic advertising libraries. We believe that AppAudit, when integrated with IDEs, could well assist developers for this purpose. On the other hand, we discover that advertising libraries are gaining privacy awareness as well. For example, a newer version of the Tapjoy advertising library hashes the IMEI before sending it to the advertising
server. Given that hashing is cryptographically hard to invert, hashing effectively avoids leaking plain-text IMEIs. The newer versions of Trial Extreme and Big Win Basketball benefit from this simple hashing and no longer leak data through Tapjoy. These advancements demonstrate the improved awareness of user privacy among both apps and advertising libraries.
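The hashing mitigation described above can be sketched in plain Java. The method name and the choice of SHA-256 are illustrative assumptions; the exact algorithm Tapjoy uses is not documented here.

```java
import java.math.BigInteger;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ImeiHasher {
    // Hash a device identifier before transmission so that the
    // plain-text IMEI never leaves the device (illustrative sketch).
    static String hashImei(String imei) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(imei.getBytes());
        // Render the 32-byte digest as zero-padded lowercase hex.
        return String.format("%064x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        String hashed = hashImei("490154203237518"); // a sample IMEI
        System.out.println(hashed.length());                           // 64
        System.out.println(hashed.equals(hashImei("490154203237518"))); // true
    }
}
```

Note that a deterministic hash still lets the advertiser use the digest as a stable pseudonymous identifier; it only prevents recovery of the plain-text IMEI by sniffers.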
Chapter 4
AppInspector: Programmatically Diagnosing Performance Issues

Mobile apps are highly interactive with users, and thus performance is a crucial factor for user satisfaction. In this work, I revisit the performance problems in released apps and the limitations of current performance debugging tools for Android. Based on these understandings and the analysis building blocks from AppAudit, I develop AppInspector, which statically instruments app code to collect performance data and diagnose performance problems.
4.1 Introduction
Mobile app markets now serve billions of users worldwide with over one million apps. In such a competitive environment, app developers need effective tools to improve app quality. Performance is a crucial factor of app quality but is also an aspect that is hard to debug. According to a recent survey of 485 app and library developers on GitHub [110], more than 53% of Android developers still rely heavily on manual testing and analysis of user reviews to locate performance issues. Though previous performance bug studies have summarized common performance anti-patterns that lead to performance issues, these analyses only provide hints when developers debug apps, and no automated tools make use of these patterns. Profiling [30, 10] is the most common technique to debug app performance. However, profiling every method can incur significant runtime overhead (up to 2.5× slowdown [74]), resulting in artificial problems that conceal real performance issues. Developers have to manually shrink the profiling scope to locate actual performance issues. Meanwhile, method profiling can
not track asynchronous execution, which is a prominent programming pattern for mobile apps [101, 122]. Previous research tools propose finer-grained tracking of asynchronous calls, but require a modified OS [122] or a specialized programming framework [101]. In the Android ecosystem, there were over 1,294 device manufacturers and 24,093 distinct devices in 2015 [43], along with a wide distribution of OS versions [29]. Tools that require modification to the OS and/or the programming framework are not practical due to the large number of customized Android variants in the wild. In this work, we reuse the analysis functionality from AppAudit (e.g., finding event handlers or particular receivers of system events) and extend it with bytecode manipulation capability. The extended version of AppAudit can not only analyze app code but also instrument the app to monitor program execution and collect various information at runtime. Based on AppAudit, we develop AppInspector, a lightweight profiling and performance diagnosis tool for Android apps. AppInspector first instruments the app binary code to collect diagnostic information at runtime. AppInspector embodies a novel tracing mechanism that can track intra-thread, inter-thread and inter-process asynchronous calls without modifying the OS or the programming framework. AppInspector also greatly reduces the number of methods being profiled and instead collects complementary stack sampling information to reduce runtime overhead. The profiling data is converted into a graph structure, which allows app developers to programmatically search for performance anti-patterns and diagnose their causes. AppInspector embeds built-in analyses to diagnose two popular performance anti-patterns. The first diagnosis detects long-running methods in the UI thread, which cause user-perceptible delay. This diagnosis reveals the lengthy operations that lead to the detected long latency.
The second diagnosis detects colliding asynchronous functions, which cause congestion in worker threads
and slow down asynchronous execution. Overall, AppInspector has low runtime overhead and does not introduce artificial performance issues. Also, with precise asynchronous call tracking, AppInspector can reveal more issues and provide more detailed diagnoses. Finally, AppInspector does not require modification to the OS or programming framework, which enables it to function on unmodified commodity devices. We apply AppInspector to 20 real-world apps collected from the Google app market and profile these apps under their common usage scenarios. AppInspector reports 440 performance issues in total, with a diverse range of causes, such as over-complicated UI design, inefficient implementation in user-defined functions, hidden slow paths in Android APIs, etc. These reports come with the specific source code locations of the involved app components, and we demonstrate that developers could easily resolve these issues based on AppInspector reports. The contribution of this work is three-fold.
• We develop a mechanism that can track asynchronous calls in Android apps by instrumenting only the app binary code. This mechanism does not require OS or programming framework support and thus allows the instrumented app to run with developers' original test toolchains and commodity devices.
• We develop AppInspector, a profiling and performance diagnosis tool for Android apps. AppInspector traces asynchronous calls and profiles apps at low runtime overhead. The profiling data is converted into a graph representation that enables app developers to programmatically detect performance anti-patterns. We demonstrate two analyses that diagnose long-running methods in the UI thread and colliding asynchronous functions.
• We apply AppInspector to real-world apps and detect 440 performance issues in these well-tuned released apps. We demonstrate with case studies that AppInspector can uncover non-trivial performance issues and precisely pinpoint the problematic components to help developers fix them.
The rest of this chapter is organized as follows. Section 4.2 characterizes various performance issues in mobile apps. This study highlights issues specific to mobile platforms and outlines the code patterns that lead to performance problems. Based on the characterization study, we further design AppInspector to automatically detect and diagnose performance issues at runtime. Section 4.3 introduces the design of AppInspector and also covers various implementation details and optimizations. Section 4.4 presents our evaluation methodology and results. Section 4.5 introduces related work.
4.2 Performance Issue Characterization
In this part, we explain the causes and consequences of the three top performance issues found in real-world apps [93]. We also perform API analysis on over 115,000 app releases from the Google Play app marketplace to characterize the wide existence of these issues.
4.2.1 Lengthy Operation
Android apps are like other GUI applications in that they are highly interactive. An app has a single UI thread to process user input events (e.g., clicks on a button). To respond to a user input event, an app first gathers user input data, such as the text content entered into a text box UI element. Second, the app may collect auxiliary information for processing the input, such as reading data from local files or performing network communication to gather such information. Third, the app computes the output from the data generated in the previous steps. Finally, the output is updated to the UI and becomes visible
to the user. Though steps 1 and 4 are typically lightweight, steps 2 and 3 can be time-consuming. If all four steps are done in the UI thread, they could occupy the UI thread for an excessively long period. During this period, the app will not be able to respond to new user inputs or render UI updates (e.g., an animation). If this period is long enough to be perceptible to the user (typically beyond 100ms to 200ms [95]), the user observes the slowness, which hurts user satisfaction. To avoid responsiveness issues, heavy tasks must be moved off the UI thread and completed in worker threads in an asynchronous manner. Previous work has studied performance bugs found in open-source apps [93], which shows that UI responsiveness issues are common and account for 75% of the total problems. Many reasons contribute to this phenomenon. First, writing asynchronous logic is generally harder than writing synchronous logic. When the app logic becomes complicated, finding heavy tasks becomes non-trivial and refactoring such tasks with an asynchronous design becomes even harder. Second, legacy code using inefficient or even deprecated Android APIs is still very common in released apps. For example, the setImageResource API is known to perform bitmap decoding in the UI thread, which could cause a latency spike if the bitmap image is large. Android suggests using decodeResource and performing image loading asynchronously. However, a static API usage analysis of these two image loading APIs in 115 thousand apps shows that the problematic setImageResource is used 4× more often than its efficient replacement decodeResource. Such usages are found not only in app code but also in included library code. These statistics suggest that synchronous APIs are generally easier to use and thus have higher adoption than asynchronous APIs. With the use of synchronous APIs, performance issues are inevitable. Third, tools for debugging and diagnosing performance bugs are still very
primitive. According to a recent study [110] of how 485 open-source app developers debug and fix performance bugs, many of them still depend on manual code reviews or blind profiling tools. There are few tools that can reproduce real performance bugs, measure performance data to reveal these issues, or provide insightful root cause analysis.
4.2.2 Over Asynchrony
Running lengthy operations in the main thread demonstrates a lack of asynchrony, and asynchronizing execution is the typical way to avoid such cases. However, over asynchrony can also increase the complexity of control flows and slow down the app in practice. There are two types of performance issues that are commonly caused by over asynchrony.
Colliding AsyncTasks. AsyncTask is the asynchronous construct provided by the Android programming framework and is the most commonly used asynchronous pattern in Android apps. AsyncTask aims to simplify the UI-centric design of Android apps: it offloads heavy tasks to a worker thread and encapsulates three phases of app execution. The first phase (onPreExecute) executes in the main thread and mainly collects input data from UI elements. The second phase (doInBackground) runs the heavy tasks, such as network communication and database operations, in a worker thread. The last phase (onPostExecute) runs again in the main thread, after the completion of doInBackground(), to update the results to the UI. An AsyncTask can be started from the UI thread, and an AsyncTask can further start new AsyncTasks. To avoid common data races among concurrently running AsyncTasks, the default Android API (AsyncTask.execute) executes all AsyncTasks on one single designated worker thread [32]. However, this also implicitly enforces that multiple AsyncTasks execute serially, which potentially degrades parallelism. An app encounters the colliding AsyncTasks
problem when multiple independent AsyncTasks run serially and complete much slower overall. To characterize how many apps face the colliding AsyncTask problem, we statically count the occurrences of the problematic API AsyncTask.execute in our app database. We notice that more than 23% of the apps (about 26,000) only use AsyncTask.execute to start AsyncTasks. This suggests that colliding AsyncTasks will always happen in these apps whenever multiple AsyncTasks are executed. Also, the adoption of the replacement API AsyncTask.executeOnExecutor() is significantly lower (about 73%) than that of the problematic AsyncTask.execute(). The wide use of these problematic APIs indicates that the colliding AsyncTask problem is prevalent in real-world apps. In this work, we specifically design a mechanism to capture runtime colliding AsyncTasks and inform the app developer for further improvements.
4.2.3 Memory Bloat
Android apps are developed in Java, which has a garbage collector that automatically reclaims memory that is no longer needed by the app. Garbage collection simplifies memory management and greatly reduces common memory safety issues such as use-after-free. However, memory bloat (simply, the app using too much memory) still exists. On mobile devices, memory bloat issues can quickly exhaust the limited memory space. There are two consequences of memory bloat. First, if the app memory usage reaches the heap limit, the app will be terminated with an OutOfMemory exception. Second, with increased memory usage, garbage collection is triggered more frequently, which pauses app execution and degrades performance. Notably, memory leaks are the extreme case of memory bloat, where the app keeps references to memory objects that are no longer needed
Category        Affected component                            Issue ID
Framework       java.text.SimpleDateFormat                    37607
                android.view.View                             20724, 18273
                android.view.inputmethod.InputMethodManager   34731
                android.database.sqlite.SQLiteDatabase        22794
                android.app.WallpaperManager                  40552
                android.widget.SearchView                     25442
                android.util.SparseArray                      29884
                Listener                                      15170
                Layout XML                                    18001
Built-in apps   Local (location based service app)            39821
                Maps (map app)                                39819
                Talk (phone call and text messaging app)      39818
                DigitalClock (clock app)                      17015
                Gmail (email client)                          28524, 21965
                Latitude (location service app)               21189
                LatinIME (input method)                       17903
Dalvik VM       libdvm                                        29306
Available at: http://code.google.com/p/android/issues/detail?id=

App name                  Issue ID   Description
anttek                    2          System utility
mapview-overlay-manager   21         Map widget
mapsforge                 72         Map app
eyes-free                 100        Speech recognition
osmdroid                  265        Map widget
android-rcs-ims-stack     107        Communication standards
roboguice                 102        Dev libs
mconf                     251        Web conferencing
robotium                  331        Testing framework
libgdx                    460        Gaming lib
adwhirl                   71         Advertising lib
gmaps-api-issues          4766       Map lib
Available at: http://code.google.com/p/

Table 4–1: A list of studied memory-related bug reports.
(a) Framework objects (activity, listener, view and others) vs. non-framework objects. (b) Further breakdown of non-framework objects.
Figure 4–1: Breakdown of memory objects involved in memory leak reports.

and thus cannot be garbage collected. Memory bloat is reported to account for 11.4% of common performance bugs [93]. In this part, we collect and study 31 memory-related bug reports from the Android platform and open-source app projects to characterize real-world memory bloat issues. Table 4–1 summarizes all the bug reports we studied. As shown, memory bloat is widely found in the Android programming framework and in apps (both built-in and open-source). We further investigate the memory objects concerned in these reports. Figure 4–1 presents the breakdown of the concerned memory objects. Figure 4–1a highlights the percentage of framework objects versus non-framework objects. As can be seen, most memory bloat involves memory objects from the Android programming framework, where Activity is the most mentioned object. An Activity object represents a UI window and further references all UI elements in the window, totaling a large amount of memory. As mentioned previously, Activities have life cycle methods, and the app developer should release all references to an Activity when its life
cycle ends (i.e., when onDestroy() is called). Most of the bugs we studied mention Activity leaks, where implicit memory references keep an old Activity in memory. Since an Activity accounts for a large part of memory usage, such leaks quickly exhaust the heap space and cause an OutOfMemory exception that terminates the app. Compared to framework objects, non-framework objects, especially user-defined objects, are mentioned less often. From these analyses, we conclude that memory bloat is most commonly associated with objects that are not created by the app developers themselves.

Figure 4–2: AppInspector workflow.
Correlation with lengthy operations. Memory bloat (and leaks) inevitably increases app heap usage and triggers garbage collection more frequently. On a real device, one garbage collection operation can pause the app for up to tens of milliseconds. Thus, when memory issues happen, they indirectly cause more lengthy operations that result in user-perceptible slowness. Later in this work, we treat memory bloat issues as one of the causes of lengthy operations.
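The Activity-leak pattern described above can be illustrated with plain Java stand-ins (the class names here are hypothetical, not Android APIs): a long-lived registry holds implicit references to short-lived "window" objects, so they can never be reclaimed even after they are logically destroyed.

```java
import java.util.ArrayList;
import java.util.List;

public class LeakSketch {
    // A long-lived singleton, analogous to a static listener registry
    // that outlives individual Activities.
    static final List<Object> LISTENERS = new ArrayList<>();

    // Stand-in for an Activity: a large object that registers itself.
    static class Window {
        final byte[] uiState = new byte[1 << 16]; // pretend UI data
        Window() { LISTENERS.add(this); }         // implicit reference escapes
        void destroy() { /* forgot: LISTENERS.remove(this); */ }
    }

    public static void main(String[] args) {
        // The user opens and closes ten windows; each one is "destroyed"
        // but remains reachable through the registry, so heap usage grows.
        for (int i = 0; i < 10; i++) {
            new Window().destroy();
        }
        System.out.println(LISTENERS.size()); // 10 windows still retained
    }
}
```

Removing the registration in destroy() (or holding the windows through weak references) would let the collector reclaim each window, which is exactly the fix most of the studied Activity-leak reports converge on.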
4.3 AppInspector Design
Profiling every method in the app incurs high runtime overhead that causes artificial problems and hinders the detection of real problems. Also, asynchronous calls are prominent in mobile apps, and developers need asynchronous call causality to understand app execution, which is missing from method profiling. This section presents how AppInspector overcomes these shortcomings. Figure 4–2 shows the workflow of AppInspector. First, AppInspector instruments the app bytecode to profile a small set of functions, periodically samples the thread stack to detect lengthy operations, and tracks asynchronous calls. Such instrumentation allows AppInspector to collect crucial information regarding app UI performance with low runtime overhead and to run with an unmodified programming framework and OS. Second, the instrumented app is profiled, typically by running through the developer's existing test cases on the original test devices. Third, the collected runtime log is transformed into a graph structure that facilitates the understanding of app execution and the detection of performance anti-patterns. Finally, AppInspector performs two diagnoses based on this graph to reveal long-running methods in the UI thread and colliding asynchronous functions.
4.3.1 Bytecode Instrumentation
AppInspector has two instrumentation primitives. The "injection" primitive searches for particular methods (e.g., all methods that can override a given method of a base class) and injects code at the entry and exit of these methods. It is usually used to profile methods of interest in the app code (including library code). The "interception" primitive searches for particular function call instructions and replaces them with calls to another method, typically a hook function. This primitive is often used to trace calls to specific system APIs.

Figure 4–3: AppInspector instrumentation details. Lightweight Profiling: 1. event handlers, life cycle methods and UI callbacks; 2. asynchronous functions. Complementary Tracing: 3. time-consuming API calls; 4. GC pause time; 5. stack sampling. Tracking Asynchrony: 6. asynchronous calls.

AppInspector instruments apps at six profiling points, as shown in Figure 4–3. Profiling points 1 and 2 are intended for lightweight profiling, which traces methods that are directly called by the Android programming framework. Profiling points 3 to 5 are intended for complementary tracing, which intercepts certain time-consuming API calls and samples the UI thread stack to provide complementary diagnostic information. Profiling point 6 tracks asynchronous calls, linking methods that are called asynchronously.
Lightweight Profiling
In Android, apps do not start from a main method. Instead, app developers are expected to inherit classes from the Android programming framework and override certain methods to implement app logic. The framework then calls these methods when the app changes states or receives user inputs. AppInspector only traces app methods that are directly called by the Android programming framework or the Java runtime. Tracing these methods allows AppInspector to be informed once the control flow is transferred to the app code. These methods include life cycle methods, event handlers and UI callbacks running in the UI thread (profiling point 1), as well as asynchronous functions
running in worker threads (profiling point 2). Life cycle methods are called by the framework whenever the app changes states (e.g., android.app.Activity.onResume() is called when the app window gains focus). Event handlers are bound to a UI widget and get called when the widget receives certain user inputs, e.g., android.view.Button.onClickListener.onClick(). UI callbacks are functions called when certain system tasks complete (e.g., a worker thread has finished its job). Finally, the app can perform asynchronous calls to start running functions in a worker thread, i.e., asynchronous functions. All of these methods are profiled by instrumenting their entries and exits. Profiling such a small set of methods provides coarse-grained information about which app code is executing in which thread at very low runtime overhead. Complementary tracing provides finer-grained information to diagnose lengthy operations performed by these profiled methods.
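The effect of the "injection" primitive on such a method can be sketched in plain Java. The logEnter/logExit hooks below are hypothetical stand-ins for the code AppInspector would inject; the real tool rewrites Dalvik bytecode rather than source.

```java
import java.util.ArrayList;
import java.util.List;

public class InjectionSketch {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical injected runtime hooks.
    static void logEnter(String m) {
        LOG.add("enter " + m + " @" + Thread.currentThread().getName());
    }
    static void logExit(String m) { LOG.add("exit " + m); }

    // An event handler as it would look after instrumentation:
    // the original body is wrapped between injected entry/exit calls.
    static void onClick() {
        logEnter("onClick");
        try {
            // ... original handler body ...
        } finally {
            logExit("onClick"); // exit is logged even if the body throws
        }
    }

    public static void main(String[] args) {
        onClick();
        System.out.println(LOG.size());  // 2
        System.out.println(LOG.get(1));  // exit onClick
    }
}
```

Pairing each enter record with its exit record yields the wall-clock time spent in the method and the thread it ran on, which is all the coarse-grained information lightweight profiling needs.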
Complementary Tracing
The goal of complementary tracing is to break down the execution of profiled methods and uncover lengthy operations that would cause UI performance issues.
Time-consuming Android API calls (profiling point 3). Calling time-consuming APIs in the UI thread is often the cause of UI responsiveness problems, as reported in previous app bug studies [93, 110]. AppInspector intercepts calls to four types of time-consuming Android APIs to detect potential lengthy operations in the UI thread. Bitmap operations typically involve high CPU cost for decoding bitmap image data as well as memory allocation cost. Network APIs are blocking operations with latency that varies with network conditions. Storage APIs involve storage I/O and consistency coordination that can be slow. Layout APIs are used to create the tree of app
UI elements and become time-consuming as the UI complexity grows. The set of intercepted APIs can be easily extended when new APIs are added in future versions of the Android programming framework.
Garbage collection pause time (profiling point 4). Garbage collection can happen almost at any point during app execution and needs to pause the app while marking unused memory objects. This pausing phase can last several milliseconds and can affect app performance when triggered frequently. Previous reports show that memory bloat (and memory leaks) can hurt UI responsiveness by causing too much GC overhead. However, garbage collection is transparently invoked by the Dalvik Virtual Machine [20], and instrumenting app code cannot capture such events. Nevertheless, GC events are logged to Android's system-wide logging facility (logcat). AppInspector collects GC events from logcat and merges them with its own runtime log.
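Extracting pause times from logcat can be sketched with a small parser. The log line below follows the Dalvik-era GC message style ("GC_CONCURRENT freed ..., paused 2ms+3ms"); the exact fields vary across Android versions, so treat this as an illustrative sketch rather than a complete parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogParser {
    // Matches the "paused 2ms+3ms" or "paused 17ms" part of a
    // Dalvik-style GC log line (format varies across Android versions).
    static final Pattern PAUSED = Pattern.compile("paused (\\d+)ms(?:\\+(\\d+)ms)?");

    // Returns the total GC pause in milliseconds, or -1 if not a GC line.
    static int totalPauseMs(String logLine) {
        Matcher m = PAUSED.matcher(logLine);
        if (!m.find()) return -1;
        int total = Integer.parseInt(m.group(1));
        if (m.group(2) != null) total += Integer.parseInt(m.group(2));
        return total;
    }

    public static void main(String[] args) {
        String line = "GC_CONCURRENT freed 1012K, 63% free 3213K/8567K, paused 2ms+3ms";
        System.out.println(totalPauseMs(line));                  // 5
        System.out.println(totalPauseMs("unrelated log line"));  // -1
    }
}
```

Each parsed pause, stamped with the logcat timestamp, can then be merged into AppInspector's own runtime log by time order.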
UI thread stack sampling (profiling point 5). AppInspector periodically dumps the stack of the UI thread to reveal lengthy operations that could affect UI responsiveness. The sampling interval is determined by the desired frame rate (e.g., > 45Hz for games and > 24Hz for other apps). Stack sampling happens in a separate thread and thus does not affect the execution of other app threads.
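Sampling another thread's stack needs no OS support; plain Java's Thread.getStackTrace is enough to sketch the idea. The thread names, the 20ms interval and the busyWork method are illustrative, not AppInspector's actual implementation.

```java
public class StackSampler {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the UI thread: stuck in a long-running method.
        Thread ui = new Thread(StackSampler::busyWork, "ui");
        ui.start();

        // Sampler thread: dump the "UI" thread's stack periodically.
        int hits = 0;
        for (int i = 0; i < 20; i++) {
            for (StackTraceElement e : ui.getStackTrace()) {
                if (e.getMethodName().equals("busyWork")) hits++;
            }
            Thread.sleep(20); // ~50Hz sampling interval
        }
        ui.interrupt();
        ui.join();
        // busyWork shows up in the samples, flagging it as a lengthy operation.
        System.out.println(hits > 0);
    }

    static void busyWork() {
        // Spin until interrupted, simulating a lengthy operation.
        while (!Thread.currentThread().isInterrupted()) { }
    }
}
```

Methods that appear in many consecutive samples are, with high probability, the ones occupying the thread, which is the standard inference behind sampling profilers.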
Tracking Asynchronous Call Causality
Asynchronous execution is common in mobile apps. Typically the app should quickly respond to user inputs in the UI thread and spawn worker threads to perform heavy tasks off the UI thread. When the worker thread finishes computation, it makes another asynchronous call back to the UI thread
to update the results to the UI. The snippet below demonstrates such a scenario (an inter-thread asynchronous call), where the event handler onClick running in the UI thread asynchronously calls DownloaderThread.run() to download a picture:

void onClick() { // event handler in the UI thread
    Thread t = new DownloaderThread();
    t.start(); // asynchronous call
}
class DownloaderThread extends Thread {
    void run() { // asynchronous function
        // download a picture
    }
}
A developer needs to understand the causality between onClick() and DownloaderThread.run() to understand how the user input is processed. AppInspector aims to track all asynchronous calls happening in the app code so as to present this important information to the app developer. Previously, AppInsight [101] presented a tracking technique for Windows Phone apps developed in C# with the Silverlight programming framework. In C#, all asynchronous functions are enclosed in a special "callback closure" object, and AppInsight wraps its tracking code in this construct to achieve its tracking purposes. Unfortunately, such a construct does not exist in the Dalvik VM and Android, which complicates asynchronous call tracking. Current approaches on Android are very primitive, either requiring developers to manually instrument their app source code [122] or requiring changes to the Android OS [112]. AppInspector introduces an instrumentation-only solution for tracking asynchronous calls in Android apps.
In the above example, there are two steps to pair the asynchronous call (t.start()) and the asynchronous function (DownloaderThread.run()). First, a unique matching token must be generated to represent this entire asynchronous call process. Second, the token must be passed along with the asynchronous call to the asynchronous function. The first step is trivial (e.g., with a global non-decreasing counter) while the second is not. The asynchronous function can only access its this pointer and its parameters. The parameter list is mostly fixed for asynchronous functions (defined by Java) and cannot be extended to pass the token, while piggybacking the token on the this pointer requires adding new fields to existing classes, which is not always possible.
Matching inter-thread asynchronous calls. We notice that in the above example, the Thread object is shared between the UI thread and the worker thread. AppAudit therefore advocates using the shared object's address as the matching token. Using the memory address is safe because the address of an object is unique until the object is reclaimed, and the shared object stays alive in memory for the duration of the asynchronous call process. Using the object address as the matching token also avoids the complication of adding new fields. Below is the instrumented version of the example, demonstrating this tracking technique in AppInspector:

void onClick() {
    Thread t = new MyThread();
    HookCall(t); // replace async call with hook call
}
void HookCall(Thread _this) {
    // log object address as the matching token
    logAsyncCall(System.identityHashCode(_this));
    _this.start();
}
class MyThread extends Thread {
    void run() { // asynchronous function
        // log again to pair
        logAsyncCall(System.identityHashCode(this));
        ...
    }
}
AppInspector uses the "interception" primitive to replace the original asynchronous call with an invocation of the hook function HookCall. The hook function logs the address of the thread object and then starts the thread. AppInspector then uses its "injection" primitive to inject code at the beginning of the asynchronous function MyThread.run(), which again logs the address of the thread object. In the collected execution log, the causality between the asynchronous call and the asynchronous function can be inferred from this matching memory address. Using the memory object address for matching works not only for Java constructs like Thread but also for some Android constructs such as AsyncTask and Handler.
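This matching scheme can be exercised end-to-end in plain Java. The logAsyncCall method and the log list below are illustrative stand-ins for AppInspector's runtime logging; the join() exists only so the sketch can inspect the log deterministically.

```java
import java.util.ArrayList;
import java.util.List;

public class AsyncMatchSketch {
    static final List<Integer> LOG = new ArrayList<>();

    static void logAsyncCall(int token) { LOG.add(token); }

    static class MyThread extends Thread {
        @Override public void run() { // asynchronous function
            logAsyncCall(System.identityHashCode(this)); // log again to pair
        }
    }

    // Hook injected in place of the original t.start() call.
    static void hookCall(Thread t) throws InterruptedException {
        logAsyncCall(System.identityHashCode(t)); // matching token
        t.start();
        t.join(); // only for this sketch: wait so the log is complete
    }

    public static void main(String[] args) throws InterruptedException {
        hookCall(new MyThread());
        // The call-site entry and the asynchronous-function entry carry
        // the same token, so the causal edge can be rebuilt offline.
        System.out.println(LOG.size());                    // 2
        System.out.println(LOG.get(0).equals(LOG.get(1))); // true
    }
}
```

Strictly speaking, System.identityHashCode is a stable per-object value rather than the raw address, but it serves the same role: it is fixed for the lifetime of the shared object, which is all the pairing requires.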
Matching intra-thread and inter-process Android intent calls. Android provides a system-wide Remote Procedure Call (RPC) mechanism called intent calls. Intents are objects that store the name of the target method (later called reflectively) and optional parameters, similar to an RPC parcel. Intent calls allow apps to expose reusable functionalities and are common in Android apps. An app can make an intent call to run its life cycle method on the same thread (intra-thread). Also, the OS can make an intent call to a receiver method of an app to deliver a system-wide notification (inter-process). The example below demonstrates a typical intra-thread intent call:

void onClick() { // event handler in the UI thread
    // asynchronous call
    Intent src = new Intent(MyReceiver.class);
    sendBroadcast(src);
}
class MyReceiver extends BroadcastReceiver {
    public void onReceive(Intent i) { // asynchronous function
    }
}
Similar to the Java thread example, sendBroadcast() is the asynchronous call and MyReceiver.onReceive is the asynchronous function. onClick, which runs in the UI thread, causes MyReceiver.onReceive to run in the same thread later. However, for intent calls, the address of the Intent object cannot be used as the matching token for two reasons. First, the Intent object received by the asynchronous function is a clone of the source object, so their addresses differ. Second, the same Intent object can be used in multiple intent calls. Fortunately, an Intent object can store extra objects, which are passed along with the intent call. Thus, AppInspector uses the Linux epoch time at the moment of the asynchronous call as the matching token, which gives each intent call a globally unique token for matching. The token is piggybacked in the Intent object and passed to the asynchronous function, which logs the token to match the asynchronous call. This strategy works for all intent-based asynchronous calls in Android.
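The token-piggybacking scheme can be modeled with a small sketch (Python; the Intent class, extra key, and function names below are illustrative stand-ins for the Android APIs, not the real ones):

```python
import time

class Intent:
    """Models an Android Intent: it is cloned on delivery and can
    carry extra key/value pairs alongside the call."""
    def __init__(self, target):
        self.target = target
        self.extras = {}
    def clone(self):
        c = Intent(self.target)
        c.extras = dict(self.extras)
        return c

def send_broadcast(intent, log):
    # Instrumented asynchronous call: piggyback a (practically unique)
    # token -- the thesis uses the epoch time at the call -- in the intent.
    token = time.time_ns()
    intent.extras["__token"] = token   # hypothetical extra key
    log.append(("async_call", token))
    return intent.clone()   # the receiver sees a clone, not the source

def on_receive(intent, log):
    # Instrumented asynchronous function: log the same token to match.
    log.append(("async_func", intent.extras["__token"]))

log = []
delivered = send_broadcast(Intent("MyReceiver"), log)
on_receive(delivered, log)
assert log[0][1] == log[1][1]  # call and function share one token
```

The clone in send_broadcast is the reason the object address fails here: the address changes across delivery, but the piggybacked extra survives the copy.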
Figure 4–4: Generating and visualizing a profile graph. (a) Generating a profile graph from six profiling points. (b) From stack dumps to the profile graph. (c) Time-thread view of the profile graph in (a).
4.3.2 Profile Graph
Since AppInspector only instruments app code to enable runtime tracing, the instrumented app can run with existing test cases on unmodified test devices. This feature allows AppInspector to seamlessly integrate with developers' original testing environments (tool-chain, devices, etc.). The instrumented app logs while being tested. The runtime log is then converted into a graph structure, namely the profile graph, for better developer comprehension and easier detection of performance anti-patterns.

Overall, the profile graph is a context-sensitive dynamic call graph with extra asynchronous call edges linking methods in different threads. In a profile graph, vertices are methods and edges are either synchronous or asynchronous call relationships. Generating this graph is straightforward, as shown in Figure 4–4a. Each paired log reported from profiling points 1 and 2 in Figure 4–3 represents a method and generates a vertex in the profile graph. A log for profiling point 3 indicates a synchronous call to a system API and thus generates an API vertex and a synchronous call edge in the graph. A log for profiling point 4 indicates that a GC pause phase happens, which pauses all running threads at that moment. AppInspector treats this situation as if all running methods synchronously call the GC, and thus generates one GC vertex and edges from all running methods toward this vertex. Logs for profiling point 5 are the stack traces at each sampling moment. By comparing consecutive samples, AppInspector can infer which method is being executed at a particular moment. Figure 4–4b shows an example, where profiling point 1 reports the execution of onClick between t1 and t5. Three stack dumps are then performed at t2, t3 and t4, respectively. From these three samples, AppInspector can infer that the foo method is called by onClick and runs at least between t2 and t4. Similarly, bar is called by foo and runs between t2 and t3. Thus the partial call graph of these three methods is generated (as shown in the bottom part of Figure 4–4b) and appended to the entire profile graph. Finally, AppInspector introduces a "User" vertex to the profile graph and adds asynchronous call edges from this vertex to every method that has no incoming edges (e.g., Btn1.onClick), as if the user triggered these initial methods.

The generated profile graph is a single-root (the user vertex) directed acyclic graph (DAG). Many graph search algorithms can be applied programmatically on such a graph to search for performance anti-patterns. Figure 4–4a presents an entity-relationship view of the profile graph. The profile graph can also be visualized in a time-thread manner for better comprehension by app developers, as shown in Figure 4–4c.

4.3.3 Diagnosing Performance Issues
Once the profile graph is generated, finding performance issues is equivalent to finding particular graph patterns in the profile graph. App developers can programmatically search for known anti-patterns in the profile graph. AppInspector has two built-in diagnoses to demonstrate this process. The first diagnosis finds long-running methods in the UI thread, reportedly the most common performance bug [93]. These long-running methods block the UI thread from updating the UI or responding to new user inputs for a user-perceptible period of time. In the worst case, this can trigger an Application Not Responding (ANR) error and lead to app termination. Compared with the Android system's threshold-based detection of extremely lengthy operations (ANR), AppInspector's diagnosis can further break down the lengthy operations to discover the specific user-defined methods or API calls that contribute most to the overtime. The second diagnosis finds colliding asynchronous functions, where a worker thread is congested with too many asynchronous functions. For both diagnoses, AppInspector not only finds the problematic method but also collects causes from the profile graph.
Long-running Methods in the UI Thread
The snippet below demonstrates an example report for a long-running method in the UI thread (refer to Figure 4–4a):
Long-running method (event handler) detected in the UI thread:
Btn1.onClick lasts 695ms
  1. lengthy bitmap API calls: decodeBitmap lasts 445ms
     * garbage collection (30ms)
  2. user-defined function foo (>120ms)
     * user-defined function bar (>80ms)
The first line pinpoints the problematic method and its execution time. The following items break down its delay composition, namely lengthy API calls, lengthy user-defined functions, or garbage collection. Such a report is straightforward, and with a modern IDE app developers can easily navigate to the problematic methods and the lengthy operations. To find long-running methods in the UI thread, AppInspector traverses all vertices directly linked with the "User" vertex in the profile graph. AppInspector filters these methods with an execution time threshold. According to previous research [95], a response time over 100 milliseconds is perceptible to users. Every method over this limit is diagnosed as problematic and results in a diagnosis report. For each problematic method, AppInspector further searches for the methods that are synchronously called by it. These methods are the delay contributors to the long-running method. For example in Figure 4–4a, assume Btn1.onClick is a long-running method. The delay composition of onClick consists of foo and decodeBitmap in the first layer and bar and GC in the second layer.
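The traversal just described can be sketched in Python (matching the thesis's Python-based graph tooling; the adjacency-list layout and names are illustrative assumptions, not AppInspector's actual data structures):

```python
# Find long-running methods in the UI thread: visit the vertices
# directly linked from the "User" vertex, filter by the 100 ms
# perceptibility threshold, and collect transitively synchronous
# callees as the delay breakdown.
THRESHOLD_MS = 100

def diagnose_ui_lags(graph, duration):
    """graph: vertex -> list of (callee, kind), kind in {"sync","async"};
    duration: vertex -> execution time in ms."""
    reports = []
    for handler, _ in graph.get("User", []):
        if duration[handler] < THRESHOLD_MS:
            continue
        # Walk synchronous edges only: an asynchronously called
        # method does not delay the handler itself.
        breakdown, frontier = [], [handler]
        while frontier:
            v = frontier.pop()
            for callee, kind in graph.get(v, []):
                if kind == "sync":
                    breakdown.append((callee, duration[callee]))
                    frontier.append(callee)
        reports.append((handler, duration[handler], breakdown))
    return reports

graph = {
    "User": [("Btn1.onClick", "async")],
    "Btn1.onClick": [("foo", "sync"), ("decodeBitmap", "sync"),
                     ("DownloaderThread.run", "async")],
    "foo": [("bar", "sync"), ("GC", "sync")],
}
duration = {"Btn1.onClick": 695, "foo": 120, "decodeBitmap": 445,
            "bar": 80, "GC": 30, "DownloaderThread.run": 300}
for handler, ms, parts in diagnose_ui_lags(graph, duration):
    print(handler, ms, parts)
```

As in Figure 4–4a, DownloaderThread.run is reached only through an asynchronous edge, so it is correctly excluded from the breakdown of Btn1.onClick.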
Figure 4–5: A case of colliding asynchronous functions in the time-thread view of a real profile graph. Dashed lines are asynchronous calls.
DownloaderThread.run is called asynchronously and thus its execution time does not factor into onClick and will not appear in the report.
Colliding Asynchronous Functions
App developers are encouraged to offload heavy tasks to worker threads by asynchronously executing functions. The colliding asynchronous functions problem happens when the app starts too many asynchronous functions in a short time and congests one or more worker threads. Consequently, these asynchronous functions have to execute serially and some of them will have long completion times (from the time a task is submitted to the time it completes). Though some asynchronous functions may perform background tasks that are not visible to the user (e.g., downloading updates), colliding asynchronous functions introduce a potentially congested time window for asynchronous tasks. Figure 4–5 visualizes this problem in a time-thread view of the profile graph from a real app. As shown, the main thread is mostly idle but one worker thread is congested with many asynchronous functions, all started from the UI thread within a short period of time (dashed lines). This problem is not visible to developers with previous profiling techniques due to the lack of asynchronous call information. The diagnosis aims to find colliding functions and report them to the developer, who can then review the data dependencies among the colliding functions and run parallelizable functions in different worker threads.

To diagnose colliding asynchronous functions, AppInspector searches for all asynchronous functions and their paired asynchronous calls. AppInspector represents each pair as a tuple of a thread ID and three timestamps: (TID, t_submit, t_start, t_end). t_submit is the timestamp of the asynchronous call, which indicates when the asynchronous function is submitted to run. t_start is the timestamp when the asynchronous function actually starts to run in the worker thread, and t_end is the timestamp when it completes. By causality, t_submit < t_start < t_end. A colliding case happens when another asynchronous function is submitted before a previous asynchronous function in the same thread has completed.
That is to say, an asynchronous function (TID′, t′_submit, t′_start, t′_end) collides with another asynchronous function (TID, t_submit, t_start, t_end) if TID′ = TID and either t_submit < t′_submit < t_end or t′_submit < t_submit < t′_end.

Duplication Elimination
Since the profile graph is context-sensitive, if a problematic method is executed multiple times, the profile graph will contain many identical subgraphs related to that method. This results in duplicates in the diagnosis report. AppInspector incorporates a mechanism to detect and eliminate such duplicates. For example in Figure 4–4a, if Btn1.onClick() is a long-running method in the UI thread, then the subgraph with the onClick, foo, decodeBitmap, bar and GC vertices constitutes a diagnosis report. If onClick is executed more than once, the same subgraph repeats and AppInspector would generate many similar reports. AppInspector therefore binds each diagnosis report to a subgraph. After all reports are generated, AppInspector removes the reports with isomorphic subgraphs.

4.3.4 Implementation
AppInspector has three main components: a binary instrumenter, the instrumentation payload, and the diagnosis components. The binary instrumenter is based on Smali [72] and PATDroid [27], supporting standard Dalvik dex bytecode as well as optimized dex. The instrumentation payload contains the tracing code, which is injected into the target app by the instrumenter. In the current prototype, AppInspector captures 88 methods executable in the UI thread (event handlers, life cycle methods and UI callbacks), 26 bitmap APIs (android.graphics.BitmapFactory and UI widget operations such as ImageView.setImageResource()), 60 database APIs (java.io, android.database.sqlite.SQLiteDatabase, android.content.SharedPreferences and android.content.ContentProvider), 7 layout APIs (Context.setContentView() and LayoutInflater.inflate()), 15 network APIs (java.net.* and org.apache.http.*) and 1 garbage collection API (System.gc()), totaling 4,880 lines of Java code. The size of the compiled instrumentation payload is about 80KB, a negligible addition to the APK file. The profile graph related parts are implemented in Python, totaling 1,133 SLOC. Given the complexity of Android APIs, all instrumented APIs are currently selected manually according to the Android API specifications, with reference to a previous performance bug study [93].

4.4 Evaluation
In this section, we profile 20 real-world apps under common usage scenarios and demonstrate how AppInspector detects performance issues in these apps and how its reports help developers locate the problems in source code. We also compare AppInspector with previous static analysis tools in terms of detection capability. Finally, we measure the runtime overhead of AppInspector profiling to characterize its runtime performance impact.

4.4.1 Methodology
We sample apps from 11 popular app categories in Google's Android app store, selecting several apps per category from the top 50 popular and new apps (in terms of number of downloads). These apps have complicated program logic, potential code obfuscation and a wide range of program behavior. All 20 selected apps have more than one million downloads and have been available in the market for over one year, so they can be assumed to have had sufficient resources and mature methodology for performance optimization. We apply AppInspector to find non-trivial performance problems in these well-tuned apps. Table 4–2 summarizes the selected apps and evaluation workloads.

For each app, we follow the AppInspector workflow shown in Figure 4–2. First, we instrument the app with AppInspector. Then the instrumented app is installed on unmodified Cyanogenmod 11 (Android 4.4.4 KitKat) on a commodity test device (a Google Nexus 7 tablet with a quad-core 1.2GHz Cortex-A9 CPU and 1GB RAM). We aim to cover most of the app functionality when profiling an app. The collected runtime log is then converted to a profile graph and the AppInspector diagnoses run on these graphs. We present the diagnosis reports for long-running methods in the UI thread and for colliding asynchronous functions.
Verifiability. We use commercial software [22] to record touchscreen inputs while profiling each app. All test apps and their input traces used in the evaluation are publicly available [41]. These traces can be replayed with the same recording software to reproduce the workloads of the evaluated apps.
App name | Category | Downloads | Workload length (s) | Workload description
Bible | Book | 50M–100M | 204 | Read Bible chapters, related events and videos
FaceQ | Entertainment | 10M–50M | 432 | Create several cartoon avatar pictures
CocoPPa | Entertainment | 10M–50M | 227 | Browse and set home screen wallpapers; download new wallpapers and themes from the website
PetShop | Game | 10M–50M | 117 | Collect and feed pets, visit the pet shop, take and save screenshots
Akinator | Game | 10M–50M | 259 | Select game language and play one round of the guessing game
Horoscope | Life style | 10M–50M | 156 | Check various lucky indexes of my horoscope; visit the chat room
Pregnancy+ | Life style | 1M–5M | 147 | Learn some pregnancy knowledge and check the custom schedule
Ganna | Media | 10M–50M | 217 | Browse and listen to some music recommended by the app
FIFA | Sports | 10M–50M | 205 | Browse the latest FIFA sport news, scores and championship
PicsPlay | Photo | 5M–10M | 159 | Select a picture, rotate and crop; change the scene, chroma and exposure
Photoshop Express | Photo | 50M–100M | 132 | Select a picture, change scene, rotate, crop, change border style, apply auto-enhancement
Dingtone | Social | 5M–10M | 190 | Make a call and send some text messages
Business Calendar | Tool | 5M–10M | 175 | Create, view, edit and delete several calendar events
ESFileManager | Tool | 100M–500M | 228 | Create a folder and a file; modify local text files; compress, hide and move files; view pictures
Gtasks | Tool | 1M–5M | 121 | Create daily tasks and task lists; view tasks and settings; delete tasks
Planner Classic | Tool | 1M–5M | 180 | Create, view, edit and delete several calendar events
Schedule St. | Tool | 1M–5M | 258 | Create, view, edit and delete several calendar events; add and delete text notes and voice memos
makemytrip | Travel | 5M–10M | 302 | Search for flights, hotels, railways and buses
MTR Mobile | Travel | 1M–5M | 194 | Check subway and light rail routes; check bus schedule
1Weather | Weather | 10M–50M | 147 | Set current location; view weather information; view and change settings

Table 4–2: Selected apps and evaluation workloads. Category and downloads are collected from the Google Play Store. All apps have a user review score between 4.2 and 4.7 (out of 5.0).
4.4.2 Overall Results
App name | Life cycle methods | Event handler | UI callback | Duplicates | Colliding cases | Unique problems
Bible | 8 | 3 | 0 | 4 | 13 | 24
FaceQ | 5 | 1 | 2 | 4 | 29 | 37
CocoPPa | 6 | 4 | 18 | 5 | 2 | 30
PetShop | 7 | 7 | 6 | 13 | 0 | 20
Akinator | 9 | 2 | 5 | 58 | 2 | 18
Horoscope | 16 | 1 | 6 | 4 | 2 | 25
Pregnancy+ | 3 | 1 | 12 | 12 | 2 | 18
Ganna | 8 | 1 | 11 | 10 | 1 | 21
FIFA | 10 | 0 | 9 | 2 | 0 | 19
PicsPlay | 8 | 2 | 5 | 8 | 0 | 15
Photoshop Express | 3 | 2 | 4 | 0 | 0 | 9
Dingtone | 14 | 9 | 3 | 1 | 5 | 31
Business Calendar | 12 | 2 | 0 | 5 | 0 | 14
ESFileManager | 7 | 1 | 1 | 0 | 0 | 9
Gtasks | 5 | 6 | 0 | 1 | 1 | 12
Planner Classic | 4 | 16 | 5 | 10 | 0 | 25
Schedule St. | 15 | 4 | 2 | 16 | 0 | 21
MakeMyTrip | 29 | 12 | 9 | 20 | 4 | 54
MTR Mobile | 12 | 1 | 6 | 3 | 0 | 19
1Weather | 3 | 4 | 10 | 11 | 2 | 19
Total | 184 | 79 | 114 | 187 | 63 | 440

Table 4–3: Performance issues detected by AppInspector.
Table 4–3 presents the overall diagnosis results. In total, AppInspector finds 440 unique problems in the 20 evaluated apps (an average of 22 problems per app). Some of the detected problems occur more than once in the evaluation, and 187 duplicates are successfully removed. We notice that over 85% of the detected problems are methods in the UI thread performing lengthy operations, causing the app to be unresponsive to user inputs for a noticeable length of time (>100ms), commonly referred to as "UI lags". This percentage is in accord with previous research on reported performance bugs in apps [93]. As shown in Figure 4–6, the detected lags have different severities, ranging from 100ms up to 5 seconds. In the following parts, we analyze the problematic methods in the decompiled code of these apps and demonstrate how developers can benefit from AppInspector's diagnosis.
Figure 4–6: The cumulative distribution function of execution time of long-running methods in the UI thread (event handlers, life cycle methods, UI callbacks).
4.4.3 Diagnose Slow App Initialization
According to Table 4–3, AppInspector reports 187 cases of long-running life cycle methods, mostly the Activity.onCreate() method. This life cycle method is responsible for initializing a window and its UI. When it runs over 100ms, the user observes a blank window with no content. AppInspector's reports help developers locate the causes of slow app initialization; here we present two representative reports:
Over-complicated UI. Below is a report showing that the setContentView API call is lengthy and causes the long latency of onCreate.
* com.digidust.elokence.akinator.activities.HomeActivity.onCreate lasts 716ms
  * Layout API calls (1 occurrence, totaling 606ms), consider decomposing the layout XML and loading fragments asynchronously
    * android.app.Activity.setContentView lasts 606ms
  * db API calls (16 occurrences, totaling 34ms), consider AsyncTask
The setContentView API constructs a tree of UI elements from a UI design file (i.e., a layout XML file). The execution time of this API is proportional to the complexity of the app's UI. By pinpointing the problem at setContentView, AppInspector reveals that the app UI is over-complicated and prolongs the app initialization time. Common solutions and mitigations include reducing the number of UI elements, reducing the depth of the UI hierarchy, and decomposing the single monolithic design file into multiple fragments [7, 73].
Heavy tasks when initializing UI. We also observe reports showing apps performing many other lengthy operations while initializing their UI. For example, the report below for the Akinator game app shows that the app performs many lengthy bitmap API calls at initialization time.

* com.digidust.elokence.akinator.activities.QuestionActivity.onCreate lasts 1155ms
  * Layout API calls (1 occurrence, totaling 100ms)
    * android.app.Activity.setContentView lasts 100ms
  * db API calls (14 occurrences, totaling 8ms), consider AsyncTask
  * bitmap API calls (27 occurrences, totaling 982ms), consider AsyncTask
This report suggests that the developer consider asynchronously loading some of the bitmap images after app initialization to reduce the initialization time.

4.4.4 Pinpoint Lengthy API Calls in Slow Event Handlers
Event handlers execute in the UI thread in response to UI inputs. Developers are advised to offload heavy tasks to worker threads and keep the UI thread responsive. The most straightforward diagnosis result of AppInspector pinpoints lengthy API calls that cause an overtime event handler or UI callback. For example, the report below shows an example problem for FaceQ (an app to create customized cartoon avatar pictures). A similar problem is detected in PetShop (a pet-raising game) as well.
* com.miantan.myoface.EditorActivity.onClick lasts 624ms
  * [69.6%] bitmap API calls (2 occurrences, totaling 447ms), consider AsyncTask
    * android.graphics.Bitmap.compress lasts 406ms
    * android.graphics.Bitmap.createBitmap lasts 41ms
  * [ 5.9%] GC operations (1 occurrence, totaling 37ms), consider pre-alloc or object pool
Both apps provide the capability to save a screenshot to a file. AppInspector reports that compressing the screenshot bitmap image in the UI thread is lengthy, causing the button click handler to block the UI thread. By pinpointing the exact lengthy API call, AppInspector greatly reduces developers' effort to resolve this problem.

4.4.5 Reveal Inefficient User-defined Functions
We also notice that AppInspector can pinpoint not only lengthy API calls but also lengthy user-defined functions. For example, AppInspector captures a long-running UI callback, com.mtr.mtrmobile.MTRMobileActivity$2(Handler).handleMessage, in the MTR Mobile app (a Hong Kong metro line query app). The report pinpoints that this excessive latency is caused by a lengthy user-defined function, Tools.setupStationData. We inspect this function and quickly find five nested loops doing string matching. The value of this diagnosis report is in precisely pinpointing a user-defined method with high computational complexity.
Reveal hidden slow path in Android APIs. We also find an AppInspector report uncovering a hidden slow path in an Android API. The report below shows that an event handler is overtime because of calling the Android API android.media.SoundPool.play, which lasts for over 4 seconds.
* com.storm8.dolphin.activity.GameActivityBase.showMainMenu lasts 4098ms
  * custom function com.storm8.base.RootAppBase.playSound(RootAppBase.java:265) lasts over 4040ms
    * android.media.SoundPool.play(SoundPool.java:232) > 4040ms
SoundPool is an Android abstraction for a collection of sound tracks, and the play API plays a specific sound track. We notice that this API is not in AppInspector's time-consuming API list; the report comes from the stack dump analysis. By searching online [38] for this specific API, we find that the first call of this API causes the system to load the entire sound library, which is a hidden slow path. We conclude that AppInspector's report successfully and precisely pinpoints an API with an unusual slow path.

4.4.6 Find Colliding AsyncTasks
AsyncTask is the Android construct for asynchronous functions. Colliding AsyncTasks occur when multiple AsyncTasks are started in a short period. Due to Android's scheduling policy, one AsyncTask cannot start until a previous one has completed, which results in multiple AsyncTasks running serially. However, these AsyncTasks may have totally independent program logic that could run in parallel for better performance. Colliding AsyncTasks often prolong the total execution time of many asynchronous tasks. In our evaluation, we detect 63 colliding cases in total. For example, in Akinator, one AppInspector diagnosis result shows two colliding AsyncTasks:
* com.tapjoy.internal.y$a
* com.digidust.elokence.akinator.activities.SplashscreenActivity$1 (AsyncTask)

Causes of long-running methods | Lengthy APIs | Lengthy UDFs | GC | Colliding cases
AppInspector (dynamic) | ⊆ | ⊆ | ⊆ | ⊆
PerfChecker (static) | ⊇ | | |

Table 4–4: The analysis capability of AppInspector (dynamic) vs. PerfChecker (static). ⊆ means the detected problems form a subset of actual problems while ⊇ indicates a superset relation. UDFs denote user-defined functions.
The first is a regular background routine of an advertising library (Tapjoy), while the other is an app task starting its in-app billing functionality. We conclude that these two AsyncTasks are logically independent and could execute in parallel [39]. Showing which AsyncTasks collide helps the app developer quickly decide whether parallel execution can be applied to improve performance.

4.4.7 Comparison with Static Analysis
Table 4–4 compares AppInspector with PerfChecker [93], a static analysis based performance diagnosis tool. Since AppInspector detects problems in a dynamic call graph generated from profiling data, it always underestimates the possible code paths, so the detected problems form a subset of the actual problems. Note that AppInspector's dynamic call graph grows with the test coverage; a developer's tests would cover the most frequently used functionalities of the app, so AppInspector's analysis covers the major problems accordingly.

In contrast, PerfChecker builds a static call graph and checks whether a method running in the UI thread could reach a time-consuming Android API. This analysis over-estimates (finding more problems than actually exist) because 1) a static call graph over-approximates virtual method calls, and some methods may not be reachable at runtime; and 2) the execution time of a lengthy Android API depends on its dynamic parameters (e.g., decodeBitmap performs differently according to the size of the input bitmap). Due to these inherent limitations, static analysis reports false alarms when diagnosing lengthy APIs in long-running UI methods. It is also hard for static analysis to diagnose lengthy user-defined functions (UDFs) in long-running UI methods, because estimating the execution time of UDFs without concrete user inputs is non-trivial. The same limitations apply to detecting GC-induced problems and colliding asynchronous functions, as predicting GC occurrences and the overlapping of asynchronous functions is not easy statically. We view AppInspector's dynamic approach as complementary rather than competitive to static tools like PerfChecker. With reasonable test cases from the app developer, AppInspector can capture a broad range of problems without reporting false alarms, while static tools can continue to provide supplementary warnings.

4.4.8 Overhead Analysis
AppInspector collects six kinds of diagnostic information by instrumenting an app, as shown in Figure 4–3. Overall, we observe that the logs produced across the 20 apps range from 0.14KB/s to 6.7KB/s. Among these cases, case 4 (GC information) is captured from the Dalvik VM's log and does not involve additional runtime overhead. Case 5 (UI thread stack sampling) is performed periodically in a separate thread and thus does not factor into app execution performance. For the remaining four cases, AppInspector instruments logging code in the app to collect information and thus incurs overhead on app execution. We measure the runtime overhead for these cases on the Nexus 7 device to understand AppInspector's impact on app performance.

Case 1 (methods in the UI thread), case 2 (asynchronous functions) and case 3 (API calls) are alike: all require logging twice, at the entry and exit of certain methods or function calls. The main overhead in these cases comes from JSON logging, which constructs and serializes a JSON log object containing the collected information. According to our measurement, JSON logging takes 0.5ms ±3% on the Nexus 7, which is negligible since 90% of the profiled methods run over 45ms. For case 6 (asynchronous calls), AppInspector needs to log once and retrieves the call site (source code file name and line number) from the topmost stack trace element. Call site retrieval costs 0.4ms ±5%, so case 6 costs 0.9ms on average, similar to cases 1 to 3. Overall, these four cases incur similar runtime overhead of about 1ms on average on the Nexus 7 device. Also, the Nexus 7 is a 2013 model, and newer devices with more powerful CPUs and RAM are expected to have even lower overhead.

4.5 Related Work
AppInspector is motivated by previous works on mobile app performance study. The methodology and mechanisms used in AppInspector are generally related to previous approaches to measuring, predicting and improving the performance of mobile applications.
Performance bugs/issues study. Performance studies of mobile applications highlight the most common bug types and impacts, and thus motivate potential debugging and fixing methods. Liu et al. [93] present an empirical study of performance bugs in open-source smartphone applications and how these bugs can be addressed. AppInspector is motivated by these observations to implant the proper instrumentation in apps to detect similar bugs. A study of 485 GitHub-hosted open-source Android app projects [110] seeks to understand how developers find and fix performance bugs. This work highlights that effective tools are in great need to assist developers in detecting and locating bugs. AppInspector aims to fill this gap with better bug presentation as well as proposed solutions.
Instrumentation-based performance measurement. Profiling is the most fundamental technique for understanding program performance. However, profiling on a per-function basis incurs significant runtime overhead yet produces an overwhelming amount of information that conceals the real problems. Thus, new metrics are being developed and experimented with on apps. User-perceived response time [101, 108, 122, 99] is a recent measurement metric for mobile app performance. AppInsight [101] leverages instrumentation to measure user-perceived time for Windows Phone apps developed with the Silverlight framework. AppInspector's async-call tracking is inspired in part by AppInsight, yet differs mainly due to the differences between the Android and Windows Phone platforms. On Android, Panappticon [122] is closely related to AppInspector; it manually instruments apps to measure user-perceived delays for whole-system and application diagnosis. However, Panappticon largely relies on developers to instrument apps and requires a modified Android OS to support its measurements. AppInspector overcomes these difficulties by fully automating instrumentation-based measurements. ProfileDroid [112] is another system that performs extensive profiling across the Android OS, system services and apps to detect behavioral inconsistencies.
Performance prediction. Mantis [87] and Proteus [119] are representative works for this purpose. Mantis can predict app performance based on given user inputs and its fine-grained performance models. Proteus focuses on predicting network performance for real-time mobile apps such as VOIP apps.
Performance improving approaches. A few tools optimize mobile app performance with different focuses. Timecard [102] mainly tackles network delays and aims to enforce deadlines for network requests. Asynchronizer [92, 91] targets asynchrony and assists app developers in refactoring code to move lengthy operations off the UI thread. AppInspector helps app developers detect performance issues in the first place, so these tools can be better utilized to resolve the detected issues. FlipFlop [69] and CloneCloud [56] both propose leveraging a cloud counterpart to assist and improve app execution on resource-limited devices.
Event tracing based performance diagnosis for other systems. Event tracing/logging is a common technique for studying system performance. AppInspector is also inspired by various event tracing mechanisms developed for other systems, e.g., the overhead reduction techniques of Log2 [59]. Previous research on general Java GUI applications [85] also provides valuable references for AppInspector.
Chapter 5 AppSwift: Automatically Enhancing App UI Performance

AppInspector reveals that many app performance problems are caused by common inefficient code patterns. In this work, we explore reusing the static instrumentation capability of AppInspector to automatically retrofit app code and remove inefficient code patterns. We design AppSwift to help app developers improve app performance without manually refactoring app source code.

5.1 Introduction
The mobile app marketplace is highly competitive, and optimizing app performance is an important mission for app developers to reduce negative user reviews [80]. However, with a fast growing code base, it becomes difficult for developers to apply new performance enhancements throughout the app. Some third-party libraries used in the apps could also hide performance issues. Thus, optimizing performance incurs considerable code refactoring effort and is sometimes incomplete when the source code of embedded libraries is not available. According to our large-scale analysis of real-world apps, inefficient APIs and code patterns widely exist in these apps. Consequently, performance issues are common in released apps and repeatedly occur in app code [93, 101, 92, 85, 99]. According to our AppInspector results and a related performance bug study [93], loading large bitmap images on the main thread is a common cause of long latencies perceptible to users. As a representative case, apps often encounter performance issues when handling bitmap images [93]. Meanwhile, device hardware, especially screen resolution, continues to upgrade over the years,
which motivates app developers to use higher quality bitmap images in the UI for better display. For example, the Samsung Galaxy S6 released in early 2015 has about 10× the screen pixels of its ancestor, the Galaxy S2, released four years earlier. Using these bitmaps consumes a large amount of app memory, and loading them requires longer storage IO time and CPU decoding time (a full-screen RGB bitmap for the Samsung Galaxy S6 takes about 14MB). Loading large bitmap images not only accounts for many UI lag bugs, but also causes a large fraction of memory bloat problems [93]. Various performance enhancement libraries and best practices exist. However, their adoption in real-world apps is relatively low or incomplete (especially in third-party libraries). We propose AppSwift, a static instrumentation tool that applies a set of designated performance enhancements automatically. AppSwift reuses the static instrumentation capability of AppInspector, which can automatically operate on app code without knowledge of the app source code or annotations of any kind from the developer. This automation allows app developers to apply the latest performance enhancements throughout app and library code with zero effort. AppSwift’s enhancements include:
• retrofitting inefficient image handling APIs;
• a dynamic analysis that helps to retrofit some non-trivial inefficient APIs;
• a global bitmap cache that accelerates repeated bitmap loading;
• execution logging that allows developers to analyze app execution traces, debug UI performance and understand AppSwift improvements.
Rewriting app bytecode is more practical and effective than upgrading other parts of the mobile software stack (e.g., the mobile OS). On one hand, it gives app developers the flexibility to optimize app code so that it performs consistently on a range of devices and OS versions. On the other hand, it
exposes the flexibility to mobile users, who can optimize downloaded apps automatically before installation, without having to upgrade the entire OS for performance patches. We evaluate AppSwift with real-world apps collected from the Google app marketplace. Rewriting complicated real-world apps is a challenging task, as apps can contain different languages, support a diverse range of devices and possibly be obfuscated. AppSwift can successfully rewrite complicated real-world apps and only adds a 71kB footprint to the app binary file. We then pick 30 apps (apps with rich UI elements from 14 Google Play store categories, excluding canvas-based games and background or utility apps) from representative categories for finer-grained studies to understand the performance improvements brought by AppSwift. For each picked app, we first use most of its functionalities and record a trace of user inputs during the process. Then we rewrite the app with AppSwift and replay the user input trace on the rewritten version. By collecting and comparing the execution logs, we show that AppSwift can reduce the bitmap loading time of these apps by 37.4% on average and reduce storage IOs by 48.9%. These savings also eliminate some lengthy operations on the UI thread and improve UI responsiveness for some apps. We also manually analyze the decompiled code along with the execution logs and reveal some interesting findings about these apps. For instance, AppSwift can help apps detect and remove redundant bitmap memory objects, which reduces app memory consumption. We believe that AppSwift is a practical and useful tool for app developers and mobile users to automatically enhance app UI performance. The rest of this chapter is organized as follows. Section 5.2 presents the observations from real-world apps that motivate AppSwift. Section 5.3 presents the design and implementation of AppSwift. Section 5.4 presents
our case study evaluation and performance analysis. Section 5.6 presents the related work.

Device                 Galaxy Nexus     Nexus 5          Nexus 6
Vendor                 Samsung          LG               Motorola
Resolution             720x1280         1080x1920        1440x2560
CPU                    Dual-core 1.2GHz Quad-core 2.3GHz Quad-core 2.7GHz
Release year           Late 2011        Late 2013        Late 2014
Loading time           245ms            279ms            278ms
Storage IOs            1.6MB            3.4MB            4.2MB
Physical display size  4.7"             4.95"            5.96"
Loading time per inch  52ms             59ms             46ms
Table 5–1: Benchmark image loading on three devices. All three devices load a full-screen-resolution RGB bitmap image.

5.2 Motivation
In this section, we highlight some observations regarding bitmap handling in real-world apps to motivate the optimization of bitmap loading in apps.
Loading bitmaps is resource intensive. Loading a bitmap has two time-consuming phases: first reading a large amount of data from storage, then decoding the image data in memory (CPU time). Table 5–1 shows the amount of IO and CPU time for loading a full-screen RGB bitmap image on three devices released over the past four years. We notice that although device CPUs improve over the years, the bitmap image size also increases because of the higher screen resolution. As a result, the time to load an image that fills a fixed-size physical display area (loading time per inch) does not change much across different generations of devices.
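The bitmap sizes discussed above follow directly from screen geometry: an uncompressed bitmap in Android's default ARGB_8888 configuration stores four bytes per pixel. A minimal sketch of the arithmetic (the class and method names are ours, not part of AppSwift):

```java
// Rough memory footprint of an uncompressed full-screen bitmap.
// ARGB_8888 uses 4 bytes per pixel.
public class BitmapMath {
    public static long fullScreenBytes(int widthPx, int heightPx) {
        return (long) widthPx * heightPx * 4; // 4 bytes per pixel (ARGB_8888)
    }

    public static void main(String[] args) {
        // Samsung Galaxy S6: 1440x2560 pixels -> about 14MB, the figure cited earlier
        long bytes = fullScreenBytes(1440, 2560);
        System.out.println(bytes + " bytes = " + bytes / (1024.0 * 1024.0) + " MB");
    }
}
```

Decoding and holding even a handful of such bitmaps quickly dominates an app's heap budget, which is why repeated loading is worth caching.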
(Figure: CDF curves for setImageResource, decodeStream, decodeResource and decodeFile; x-axis: API usage count, y-axis: cumulative distribution function.)
Figure 5–1: The cumulative distribution function for API occurrence in 115 thousand apps. The top four APIs related to bitmap are selected, used in more than 80% of the apps.
Android apps have a single UI thread to process user inputs and perform UI updates. Processing heavy tasks on the UI thread can cause the app to become unresponsive to user input and freeze the UI. This problem is commonly known as lengthy operations on the UI thread, which is the most common cause of bad UI performance [93]. According to our measurements in Table 5–1, loading even a half-screen bitmap on the UI thread takes over 100ms and can cause user-perceptible latency [95]. Thus how bitmaps are handled on the UI thread becomes critical to app UI performance.
Inefficient APIs are still widely used. To understand how real-world apps load bitmaps, we conduct a static API usage analysis over 115 thousand apps collected from the Google Play Store. For each app, we disassemble its code and count the number of calls to the image loading APIs provided by Android. Figure 5–1 presents the cumulative distribution function of the top four popular image loading APIs used in these apps.
Among the four, the three decode-series APIs load an image from different sources (file, stream and resource), and all are callable from both the UI thread and worker threads. However, the most used API, setImageResource, must be called from the UI thread; it loads a resource (a read-only bitmap packed into the app) into an ImageView (the standard UI widget for an image/bitmap). This API is a convenient way of displaying images in old API versions, but it is found to have bad performance impacts and is discouraged in newer API versions [5].
AppSwift motivations and challenges. The wide use of inefficient image loading APIs and the potentially long time of loading images can result in severe UI performance issues. AppSwift is motivated by these observations to optimize bitmap image loading. Nevertheless, AppSwift faces a range of challenges. First, most performance issues are found in legacy code using legacy APIs, which developers do not want to refactor. AppSwift should optimize bitmap image loading automatically, without requiring developer intervention. Second, when transforming app code, AppSwift should cover as many legacy APIs as possible and should not introduce correctness problems.

5.3 AppSwift Design
Performance bugs related to inefficient bitmap handling are common in real-world apps and affect app UI performance. However, fixing these bugs requires significant refactoring cost and is sometimes incomplete. To address this problem, AppSwift rewrites app binary code to automatically apply performance enhancements to bitmap handling. In this section, we elaborate the binary rewriting procedure and the enhancements introduced by AppSwift. We also discuss implementation tips for handling complicated real-world apps.
(Architecture diagram: the binary rewriter transforms bitmap loading APIs in the application; the rewritten app runs with AppSwift's cache-aware UI widgets and APIs, uDDFT, bitmap cache and logger, on top of the Android application framework, built-in UI widgets and libraries, and the Android OS; a log analyzer processes the logs offline.)
Figure 5–2: The architecture of AppSwift. Four shaded components are run- time components of AppSwift, running with the rewritten app.
5.3.1 Overview
Figure 5–2 shows the architecture of AppSwift: four components running with the target app, a binary rewriter and a log analyzer. Given an Android app, AppSwift first adds a bitmap image cache (Section 5.3.2) to the app code. To properly use image caching, AppSwift provides a set of cache-aware image loading APIs to replace the vanilla Android APIs. The binary rewriter then checks every function call instruction, finds calls to the original Android image loading APIs and rewrites them to instead use the corresponding cache-aware APIs provided by AppSwift. After this rewriting stage, whenever the app loads an image, it uses the AppSwift-provided APIs that have image caching enabled. AppSwift further needs lightweight dynamic data-flow tracking (uDDFT) to enable some of its cache-aware APIs that require runtime data flow information. Finally, all cache-aware APIs of AppSwift generate runtime logs for the offline log analyzer to analyze caching effectiveness and benefits.
Bitmap origin    App private   Read-only   Identifier
Resource         ✓             ✓           RESID
Asset file       ✓             ✓           FILEPATH
Data file        ✓                         FILEPATH + LMT
Cache file       ✓                         FILEPATH + LMT
App-generated                              (not cached)
External file                              (not cached)
Network data                               (not cached)
Table 5–2: Different bitmap origins and their identifiers. “App private” means the image can not be accessed by other apps. “Read-only” means the image data can not be mutated. FILEPATH refers to the relative file path to the app root folder. LMT stands for the last modified time of the file. RESID stands for a numeric identifier for the resource generated by the compiler.
Transforming Android APIs has different difficulties depending on the behavior of the API. AppSwift provides three forms of transformation to handle different Android APIs. The simplest transformation (Section 5.3.3) implements the caching logic using just the original Android API. The API-assisted transformation (Section 5.3.4) requires other helper APIs to implement the correct caching semantics for some Android APIs. The data-flow assisted transformation (Section 5.3.5) is the most complicated transformation and requires dynamic data flow information to help transform APIs.

5.3.2 Bitmap Cache
A bitmap cache keeps some bitmaps in memory so that if a bitmap is loaded again (repeated loading), the bitmap stored in the cache can be re-used. Cache hits save the IOs for loading bitmaps from storage as well as the CPU time for decoding image data. In principle, AppSwift only caches bitmaps that have a high probability of repeated loading. Table 5–2 summarizes the various sources of bitmaps that an Android app can load. AppSwift introduces a unique bitmap identifier to identify bitmaps. Resources (binary data packed into the app) and asset files are the most popular ways of storing static bitmaps; both are private to the app and read-only. Bitmaps from these two origins are often loaded repeatedly because they constitute the app GUI, which makes them the main targets of caching. A resource is uniquely identified by a numeric number, and asset files are identified by a path relative to the root app asset folder. Thus, the resource ID and the asset file path serve as the bitmap identifiers for bitmaps from these two origins. Data files and cache files (temporary files deletable by the system) are mutable files that persist app data generated at runtime. These files have unique file paths under designated folders, but their content can change from time to time. Thus the identifier for these two origins consists of both the file path and the last modified time of the file. Android also allows an app to generate a bitmap from scratch (e.g., a painting app). App-generated bitmaps are highly mutable and are not cached by AppSwift. External files (e.g., files on the SD card) and network data contain data from external sources (other processes, users or remote servers). Their contents vary from time to time, and each time the app loads bitmaps from these origins, they should be loaded from scratch. Thus AppSwift does not cache bitmaps from these sources and does not assign them bitmap identifiers. Nevertheless, it is quite common that an app first writes network data to a cache file [31] and then loads the bitmap from the file. In such a case, AppSwift is able to cache this bitmap loaded from the cache file.
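The per-origin identifier scheme can be sketched as follows. The helper names mirror the `genIdentifier...` helpers used in the later code snippets, but the concrete string formats (the `res:`, `asset:` and `file:` prefixes) are our illustrative assumption, not AppSwift's actual encoding:

```java
import java.io.File;

// Sketch of bitmap-identifier construction per origin (cf. Table 5-2).
public class BitmapId {
    // Resources: identified by the compiler-generated numeric resource ID (RESID).
    public static String fromResource(int resid) {
        return "res:" + resid;
    }

    // Asset files: identified by the path relative to the app asset folder.
    public static String fromAsset(String relativePath) {
        return "asset:" + relativePath;
    }

    // Data/cache files are mutable, so the identifier combines the
    // file path (FILEPATH) with the last-modified time (LMT).
    public static String fromFile(File f) {
        return "file:" + f.getPath() + "@" + f.lastModified();
    }

    // App-generated, external and network bitmaps get no identifier
    // and are therefore never cached.
    public static String fromExternal() {
        return null;
    }
}
```

Returning null for uncacheable origins lets every downstream cache lookup treat "no identifier" uniformly as a forced miss.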
5.3.3 Simplest API transformation
The simplest API transformation implements a simple “check-cache” logic with only the help of the original Android API. We demonstrate this transformation with the example of transforming the android.graphics.BitmapFactory.decodeFile API. This API has a file path parameter. Its functionality is to read the image file specified by the path and return a bitmap object decoded from the image file. The snippet below shows the transformed cache-aware decodeFile:
Bitmap transformed(String path) {
    // 1. construct the bitmap identifier
    String id = genIdentifierFromFilePath(path);
    // 2. check cache
    if (cache.has(id)) {
        // 3. skip decoding and return directly
        return cache.get(id);
    } else {
        // 4. fallback to the old API
        return decodeFile(path);
    }
}
The transformed API has a fast path (1+2+3) that directly serves the image object from the AppSwift cache and a slow path (1+2+4) that calls the original API to decode the file. Step 1 is executed according to the rules in Table 5–2. Step 2 is standard cache behavior. On step 3 (cache hit), this simple API only needs the bitmap object, which is directly obtained from the cache. On step 4 (cache miss), the transformed API simply falls back to the original API. The fast path is expected to run much faster than decoding a bitmap. The transformation of decodeFile is a template for transforming some other simple Android APIs.
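The `cache` object used by the transformed APIs only needs a has/get/put interface. A minimal strongly-referenced LRU variant can be sketched as below; the class name, the generic value type (used here instead of Android's Bitmap so the sketch stays platform-independent) and the count-based capacity are our simplifications, since a real bitmap cache would bound total byte size instead:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal strongly-referenced LRU cache with the has/get/put interface
// used by the transformed APIs.
public class SimpleLruCache<V> {
    private final LinkedHashMap<String, V> map;

    public SimpleLruCache(final int capacity) {
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.map = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > capacity;
            }
        };
    }

    public boolean has(String id) { return id != null && map.containsKey(id); }

    public V get(String id) { return map.get(id); }

    // A null identifier marks an uncacheable origin, so it is ignored.
    public void put(String id, V value) { if (id != null) map.put(id, value); }
}
```

Treating a null identifier as "never cache" directly encodes the policy for app-generated, external and network bitmaps from Section 5.3.2.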
5.3.4 API-assisted API transformation
The API-assisted API transformation needs the assistance of other Android APIs to transform some image loading APIs. We demonstrate this transformation with android.widget.ImageView.setImageResource. This Android API is a member method of the class ImageView, a UI element that displays an image. setImageResource loads an image from an Android resource and updates the UI element to display the image. This API involves image decoding and a UI update, must be called on the UI thread, and forces the UI thread to absorb the latency of image decoding. With caching, AppSwift has the potential to eliminate the image decoding step and shorten the execution time. The snippet below demonstrates the transformation.

void transformed(ImageView _this, int resid) {
    // 1. construct bitmap identifier
    String id = genIdentifierFromResource(resid);
    // 2. check cache
    if (cache.has(id)) {
        // 3. skip decoding and directly update UI
        _this.setImageBitmap(cache.get(id));
    } else {
        // 4. fallback
        _this.setImageResource(resid);
    }
}
The fast path skips image decoding and updates the UI with a cached bitmap object. On step 3, this transformation needs an assistance API (setImageBitmap in this case) to fulfill the same functionality as setImageResource when the bitmap object can be obtained from the cache. For some APIs, the assistance API can be as simple as a one-liner, as in this example. But for some other APIs, the assistance API might not be readily available and has to be implemented according to the API behavior.

5.3.5 Data-flow assisted API transformation
The most complicated transformation needs dynamic data flow information. We demonstrate this with the example of transforming android.graphics.BitmapFactory.decodeStream. This API reads from an InputStream object and returns a decoded bitmap object. Despite the simple API behavior, its input parameter (the input stream) could be a file, a network stream, a resource or even an in-memory buffer. Thus, AppSwift needs to identify the bitmap origin in order to perform caching correctly. The snippet below demonstrates the transformation:
Bitmap transformed(InputStream s) {
    // 1. construct the bitmap identifier
    String id = genIdentifierFromFlowBuf(s);
    // 2. check cache
    if (cache.has(id)) {
        // 3. skip decoding and return directly
        return cache.get(id);
    } else {
        // 4. fallback to the old API
        return decodeStream(s);
    }
}
The non-trivial task in this transformation is step 1, where AppSwift needs to identify what the input InputStream represents (a file, network data or memory data). This step consults a flow buffer maintained by the dynamic data flow tracking module of AppSwift.
Dynamic data flow tracking. To retrofit APIs like this, AppSwift incorporates a lightweight dynamic data-flow tracking module (uDDFT) to gain data flow information. uDDFT maintains a flow buffer that tags certain memory objects with a bitmap identifier. uDDFT creates the bitmap identifier whenever a file is opened (e.g., new File()) and tags the File object with this identifier. Then, uDDFT propagates the bitmap identifier to all intermediate objects (e.g., FileInputStream, BufferedInputStream, etc.) obtained from this File object. This is achieved by interposing on the constructors of these intermediate objects. Finally, when AppSwift intercepts a bitmap loading API like decodeStream, it retrieves the bitmap identifier from the flow buffer to identify the source of the bitmap being loaded. Data-flow tracking is performed at app runtime and thus its overhead slows down app execution. AppSwift implements two optimizations to reduce the tracking and bookkeeping overhead.
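The tag-and-propagate bookkeeping can be sketched as a map from objects to identifiers, updated by the interposed constructors. The interposition itself is done by the binary rewriter; the sketch below shows only the bookkeeping, and all method names are our illustration:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Sketch of uDDFT's flow buffer: objects are tagged with the bitmap
// identifier of the file they originate from, and the tag is propagated
// to intermediate wrapper objects.
public class FlowBuffer {
    // Identity semantics: two distinct streams must never share an entry.
    private final Map<Object, String> tags = new IdentityHashMap<>();

    // Called when a file is opened (e.g., interposed on `new File(...)`).
    public void tag(Object obj, String bitmapId) {
        tags.put(obj, bitmapId);
    }

    // Called from interposed wrapper constructors, e.g.
    // `new FileInputStream(file)` or `new BufferedInputStream(in)`:
    // the new object inherits the source object's identifier.
    public void propagate(Object source, Object derived) {
        String id = tags.get(source);
        if (id != null) tags.put(derived, id);
    }

    // Called by transformed loading APIs such as decodeStream.
    public String lookup(Object obj) {
        return tags.get(obj);
    }

    // Called when a stream is closed and no longer refers to the file.
    public void untag(Object obj) {
        tags.remove(obj);
    }
}
```

A lookup that returns null simply means the stream's origin is unknown, and the transformed API falls back to the original one.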
Optimization 1: limiting tracking scope. Extensively tagging memory objects during execution could incur considerable runtime overhead. Thus uDDFT only tags bitmap identifiers on Java file and Android resource related objects (InputStreams, Readers and File objects under the java.io package) and some similar Android constructs (e.g., AssetInputStream). This optimization makes the analysis lightweight and reduces the interposition overhead.
Optimization 2: locality-aware tracking buffer. Though AppSwift only tracks a limited scope of objects, the tracking buffer could still grow rapidly and without bound during app execution. A large tracking buffer slows down lookups and increases the bookkeeping overhead (e.g., GC overhead). Thus, AppSwift limits the size of the tracking buffer and only tracks the most recently created intermediate objects. This is in accord with app execution locality, where the app keeps accessing only a small set of recent objects. In our implementation, the tracking buffer is a fixed-size open-addressing hash table, indexed with the object's identity hash (obtained by System.identityHashCode()).
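The bounded buffer can be sketched as a fixed-size linear-probing table keyed on the identity hash. The eviction policy shown (overwriting an occupied slot when the table is full) is our simplification of "keep only recent objects"; it is safe precisely because tracking is best-effort, so a dropped mapping only costs a cache miss:

```java
// Sketch of the bounded, locality-aware tracking buffer: a fixed-size
// open-addressing (linear-probing) table keyed on the object's identity hash.
public class TrackingBuffer {
    private final Object[] keys;
    private final String[] ids;

    public TrackingBuffer(int capacity) {
        keys = new Object[capacity];
        ids = new String[capacity];
    }

    private int slot(Object key, boolean forInsert) {
        int cap = keys.length;
        int start = (System.identityHashCode(key) & 0x7fffffff) % cap;
        for (int i = 0; i < cap; i++) {
            int s = (start + i) % cap;
            if (keys[s] == key) return s;               // existing entry
            if (keys[s] == null && forInsert) return s; // free slot
        }
        // Table full and key absent: overwrite the home slot (best-effort drop).
        return forInsert ? start : -1;
    }

    public void put(Object key, String id) {
        int s = slot(key, true);
        keys[s] = key;
        ids[s] = id;
    }

    public String get(Object key) {
        int s = slot(key, false);
        return s < 0 ? null : ids[s];
    }
}
```

The fixed arrays keep the buffer's memory footprint and lookup cost constant regardless of how many streams the app creates.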
Best-effort data flow tracking. AppSwift aims to provide best-effort data flow information instead of complete information. This is because when a data flow mapping is not available, the affected API always falls back to the original API, which provides the same functionality. The data flow mapping is only intended to improve caching opportunities. By dropping some data flow information and interceptions, AppSwift keeps the runtime overhead of data flow tracking controllable. We observe that app code has strong locality in API usage, and thus best-effort data flow tracking is a practical strategy.

5.3.6 Transformation correctness validation and complexity reduction
When transforming Android APIs, the essential requirement is transformation correctness. Considering the complexity and diversity of Android APIs, AppSwift approaches the correctness problem in two stages. First, AppSwift transforms and manually validates a small set of simple APIs, like the examples mentioned above. These APIs have fewer parameters, fewer corner cases and lower overall complexity. Second, for complicated APIs (e.g., APIs with numerous parameters and different working modes), AppSwift transforms these cases using the validated transformed APIs. These validated transformed APIs already have cache-awareness and can be leveraged to reduce the complexity of transforming complicated APIs.
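The two-stage idea can be sketched as delegation: a complicated API reuses an already-validated cache-aware transformation for its common case and falls back to the original API otherwise. Everything below is illustrative; the String return type stands in for Bitmap, and the interfaces stand in for decodeFile and BitmapFactory.Options:

```java
// Sketch of complexity reduction: a complicated, multi-parameter API is
// transformed by delegating to a validated cache-aware transformation.
public class ComposedTransform {
    public interface Loader { String load(String path); }   // stand-in for decodeFile
    public interface Options { boolean isDefault(); }       // stand-in for decode options

    private final Loader validatedCacheAware; // validated transformed API (Section 5.3.3)
    private final Loader original;            // vanilla Android API

    public ComposedTransform(Loader validatedCacheAware, Loader original) {
        this.validatedCacheAware = validatedCacheAware;
        this.original = original;
    }

    // Transformed "decodeFile(path, options)": reuse the validated
    // single-argument transformation when the options take the default path;
    // uncommon modes keep the original semantics untouched.
    public String decodeFile(String path, Options opts) {
        if (opts == null || opts.isDefault()) {
            return validatedCacheAware.load(path);
        }
        return original.load(path);
    }
}
```

Because the uncommon modes go straight to the original API, the composed transformation can never be less correct than no transformation at all.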
5.3.7 Logging and Inspecting
AppSwift has a global logger that records various activities related to AppSwift, including caching results, data-flow states, cache state, etc. This information helps to visualize the app execution (in particular the UI thread occupancy) and understand the effects of AppSwift (cache hit rate and various savings). We design two working modes for the logger for different usage scenarios. The concise logging mode only logs the essential information for understanding caching benefits, e.g., the hit state per bitmap loading request and the dimensions of the returned bitmap. Also in this mode, the overhead recorded excludes the logging overhead. This mode is largely intended to minimize the logging overhead and only outline the performance gains of AppSwift. The inspecting mode logs the execution time of methods running on the UI thread to find lengthy operations that exceed the user-perceptible latency limit. The inspecting mode serves to understand the caching benefits and how these benefits improve UI responsiveness.

5.3.8 Implementation
The AppSwift rewriter is developed based on smali [72], PATDroid [27] and apktool [113], totaling 2,983 lines of Java code. The runtime components contain 3,743 lines of Java code. Here we introduce some practical implementation considerations for successfully rewriting, and running with, complicated real-world apps.
Strong references vs. soft references. The most straightforward way to implement caching is a strongly-referenced fixed-size LRU cache. In that case, bitmaps are strongly referenced and reside in memory even if they are no longer needed by the rest of the app. Thus the garbage collector will not reclaim the memory used by the cache, even if the app is experiencing memory pressure. If the app memory usage plus the cache memory requirement exceeds the heap limit, the app will crash due to OOM (an OutOfMemory exception). Alternatively, AppSwift provides a soft-referenced cache that only keeps soft references to cached objects. The garbage collector will reclaim all softly referenced memory objects before throwing an OOM exception. Thus, with a soft-referenced cache, caching does not increase the risk of OOM.
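The soft-referenced variant can be sketched as follows; the class name and the stale-entry cleanup are our illustration, but the core mechanism (SoftReference entries that the collector may clear under memory pressure, before an OutOfMemoryError) is standard Java behavior:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of the soft-referenced cache: entries hold SoftReferences, so the
// garbage collector may reclaim cached values under memory pressure, and
// caching cannot by itself push the app over its heap limit.
public class SoftCache<V> {
    private final Map<String, SoftReference<V>> map = new HashMap<>();

    public void put(String id, V value) {
        if (id != null) map.put(id, new SoftReference<>(value));
    }

    public V get(String id) {
        SoftReference<V> ref = map.get(id);
        if (ref == null) return null;
        V v = ref.get();               // null if the GC reclaimed the value
        if (v == null) map.remove(id); // drop the stale entry
        return v;
    }

    public boolean has(String id) { return get(id) != null; }
}
```

The trade-off is that a reclaimed entry simply becomes a cache miss: the transformed API falls back to decoding, so correctness is preserved either way.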
Handling reflection calls. Apps occasionally use Java reflection to invoke functions. Reflection calls can reach any function in the app, bypassing some rewriting rules. Thus, by using reflection calls, an app could invoke a vanilla bitmap loading API or create a vanilla image widget. For these cases, AppSwift does not enforce any special treatment, and the app simply uses the non-cached APIs or widgets with no benefits from AppSwift. From the perspective of data-flow tracking, however, reflection calls could bypass certain tracking rules and leave incorrect data-flow state in the tracking buffer. For example, a call to InputStream.close() removes the InputStream from the tracking buffer, because it no longer points to the file we are tracking after close(). However, if the close() method is called reflectively, the app would bypass this rule and produce incorrect tracking results. To overcome this uncommon but possible case, AppSwift interposes on reflection calls, checks the call target at runtime and invokes the proper tracking rule if necessary.

5.4 Evaluation
To understand the usefulness of AppSwift, we apply AppSwift to 30 real-world Android apps from various categories in a real app marketplace. For each app, we use a replay testing methodology to feed the same user inputs to the original and the rewritten version of the app. Then we compare the execution logs and outline the time and IO savings, the overhead and the memory pressure introduced by AppSwift. Furthermore, we also decompile two cases and study the decompiled code along with the AppSwift logs. We report our findings about how the AppSwift enhancements improve the UI performance of real apps.

5.4.1 Methodology
In this section, we elaborate how we select apps for evaluation, the replay testing methodology, the metrics used to evaluate AppSwift and our testing environment.
Dataset and Selected Apps. We have collected about 115 thousand apps from the Google Play Store, the official Android app marketplace that hosts more than one million apps and serves billions of active smartphone users worldwide. We first run the API usage analysis mentioned in Section 5.2 and exclude the 20% of apps that make no use of the APIs we are interested in. These apps are typically OpenGL apps or very simple apps with a pre-loaded UI design. From the remaining 80% of the apps, we choose 30 apps for the AppSwift evaluation, as summarized in Table 5–3. Our app selection process aims to retain the diversity of real app marketplaces and evaluate AppSwift on different types of apps. As shown, the selected apps cover major app categories such as games, entertainment, sports, social networks, travel, etc. We favor apps with more than one million downloads, as they are expected to be more mature and have a better quality control process. These well-optimized apps better testify to the actual usefulness of AppSwift. We also consider the review score when selecting apps. The chosen apps have Google Play review scores spanning from 3.7 to 4.6 (out of 5). Since the chosen apps are real-world app releases, we naturally have no access to their source code, we have no prior knowledge of the app structure or their contained third-party libraries, and the apps are mostly obfuscated.
App name           Category       Review  Downloads  Trace length (secs)  Trace description
FaceQ              Entertainment  4.3     10M-50M    432   Create several cartoon avatar pictures
CocoPPa            Entertainment  4.2     10M-50M    227   Browse and set home screen wallpapers
Akinator           Game           4.2     10M-50M    259   Play several rounds of the game, change language and visit the store
Triviados          Game           3.7     5M-10M     376   Play several rounds of the QA game
PetShop            Game           4.1     10M-50M    117   Feed the pet and check the pet shop
Starbucks          Life style     4.3     5M-10M     356   Check payment cards, available rewards and latest gift cards
Pregnancy+         Life style     4.3     1M-5M      147   Learn some pregnancy knowledge and check the custom schedule
Horoscope          Life style     4.6     10M-50M    156   Check various lucky index of my horoscope
Camera360          Photo          4.3     100M-500M  190   Take a picture and add some effects
Cymera             Photo          4.3     100M-500M  154   Take a picture, browse pictures and add effects
CandyCamera        Photo          4.3     50M-100M   115   Take a selfie picture and add effects
NHL                Sports         3.8     1M-5M      314   Select favorite teams and browse the latest sports news
FIFA               Sports         4.1     10M-50M    205   Browse the latest FIFA sport news
Ganna              Media          4.1     10M-50M    217   Browse and listen to some music recommended by the app
QuickRemote        Media          4.0     5M-10M     275   Browse TV programs, check out the detailed information and casts
WeatherChannel     Weather        4.3     50M-100M   288   Browse the weather news and check out the weather for the city
Dingtone           Social         4.2     5M-10M     190   Make a call and send some text messages
oovoo              Social         4.3     50M-100M   210   Search for friends and check messages
makemytrip         Travel         4.1     5M-10M     302   Search flights and hotels
CheapOair Flights  Travel         4.3     1M-5M      255   Search flights, hotels and car rental information
WeCal              Tool           4.3     1M-5M      108   Create some calendar entries with local pictures, text notes
To-Do Calendar     Tool           4.2     1M-5M      180   Add events with photos, check my calendar
OfficeSuite+       Business       4.2     50M-100M   300   Edit a local document and presentation slides
DocsToGo           Business       4.0     50M-100M   163   Edit a local document and presentation slides
Line               Communication  4.0     100M-500M  269   Send messages, browse and pick some new emojis and themes
GoSMS              Communication  4.4     100M-500M  229   Send some text messages, emojis and pictures on the device
Kids Doodle        Education      4.0     10M-50M    132   Make a doodle from scratch and check the gallery
50Languages        Education      4.3     1M-5M      167   Learn English and take some quizzes
Kobo eBooks        Book           4.2     10M-50M    382   Browse and read some book pages
Table 5–3: Selected apps and the workload for evaluation. All data are collected from the Google Play Store.
Replay test. For each selected app, we first install the original app from the market. Then we emulate a user and use the app. During this process, we record all user inputs on the touchscreen with a commercial input recording app [22]. Then we rewrite the app with AppSwift and obtain the rewritten app file. We install the rewritten version and replay the recorded user input trace to operate it. After the replay is finished, we collect the AppSwift logs and analyze various metrics related to AppSwift. We use the record-and-replay method to ensure that both the original and rewritten apps encounter the same user input events at the same timing. We also prefer human-provided input traces over synthetic traces generated by UI automation tests, because our selected apps have great diversity in functionality and UI layout. Human-provided traces cover a wide range of the major functionality provided by each app and, at the same time, generate events at a realistic pace. All of our traces are publicly available at [41]. This record-and-replay method has also been recommended by other work [112, 100, 66].
Metrics. With replay testing, we are able to observe the app behavior under the same user inputs. Here we introduce the metrics used to understand the improvements and overhead introduced by AppSwift:
• IO saving. When a repeated loading is cached, AppSwift saves the IOs needed to load the data from storage. The IO saving ratio for an app is calculated as the IOs saved for repeated bitmap loading in the rewritten app relative to the total IOs for bitmap loading in the original app.
• Time saving. For each bitmap loading request, AppSwift incurs a runtime overhead to check if the bitmap is present in the cache. On a cache hit, AppSwift saves the time to load this bitmap. Thus for each bitmap loading request, the time saving ratio is calculated as (hit × decodingTime − overhead) / decodingTime. For an app, the overall time saving ratio is averaged across all requests.
• UI responsiveness improvement. Bitmap loading savings affect both the UI thread and worker threads. To characterize how AppSwift improves app UI responsiveness, we identify all lengthy operations on the UI thread that could cause user-perceptible latency [95]. Lengthy operations are the most severe responsiveness issues that are directly perceptible to the user: the app becomes temporarily unresponsive to new user inputs and the user observes the app UI freeze for a while. We report the number of lengthy operations that are removed because of AppSwift's bitmap loading improvement.
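As a concrete illustration of how these metrics can be computed from replay logs, the following is a minimal sketch; the class, field and method names are ours, not AppSwift's:

```java
import java.util.List;

// Hypothetical sketch of the evaluation metrics; names are ours.
class SwiftMetrics {
    // One logged bitmap-loading request from a replay run.
    public static class LoadRequest {
        final long ioBytes;       // storage IO needed to load this bitmap
        final double decodeMs;    // time to load and decode from storage
        final double overheadMs;  // time spent probing the cache
        final boolean cacheHit;   // served from the AppSwift cache?
        final boolean onUiThread; // issued on the UI thread?
        public LoadRequest(long ioBytes, double decodeMs, double overheadMs,
                           boolean cacheHit, boolean onUiThread) {
            this.ioBytes = ioBytes; this.decodeMs = decodeMs;
            this.overheadMs = overheadMs; this.cacheHit = cacheHit;
            this.onUiThread = onUiThread;
        }
    }

    // IO saving ratio: IOs avoided by cache hits over total bitmap IOs.
    public static double ioSavingRatio(List<LoadRequest> reqs) {
        long saved = 0, total = 0;
        for (LoadRequest r : reqs) {
            total += r.ioBytes;
            if (r.cacheHit) saved += r.ioBytes;
        }
        return total == 0 ? 0 : (double) saved / total;
    }

    // Per-request time saving: (hit * decodingTime - overhead) / decodingTime,
    // averaged across all requests to obtain the app-level ratio.
    public static double timeSavingRatio(List<LoadRequest> reqs) {
        double sum = 0;
        for (LoadRequest r : reqs) {
            double hit = r.cacheHit ? 1.0 : 0.0;
            sum += (hit * r.decodeMs - r.overheadMs) / r.decodeMs;
        }
        return reqs.isEmpty() ? 0 : sum / reqs.size();
    }
}
```

Note that a cache miss contributes a small negative term (−overhead/decodingTime), which is why a positive overall ratio implies the savings outweigh the cache-checking cost.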
Experiment Device and Configuration. During our experiments, we record all reflection call targets to check how many image loading requests could bypass AppSwift via reflection calls. All tests are performed on a Google Nexus 7 (2013) tablet with a 7-inch screen and a resolution of 1200x1920 pixels (323 ppi pixel density). The device is equipped with a quad-core 1.2GHz Cortex-A9 CPU and 1GB RAM. The operating system is CyanogenMod 11 (Android 4.4.4 KitKat) with the Dalvik VM as the runtime. The test environment also includes a WiFi connection at approximately 22Mbps. For all tested apps, AppSwift is configured to run with a soft reference cache as well as a strong reference cache with a capacity of 1/8 of the Java heap size (the heap is 128MB on our device). 5.4.2 Experiment Results
Loading time and IO saving. When the app loads a bitmap that exists in the AppSwift cache, AppSwift reuses the in-memory bitmap and thus avoids storage IOs and image data decoding. We observe that apps in our dataset never call image-loading APIs via reflection, so all image loading requests are intercepted and processed by AppSwift. Figure 5–3a summarizes the two saving ratios (IO and loading time) for the 30 selected apps.
(a) IO and loading time saving ratios
(b) Lengthy operations removed
Figure 5–3: AppSwift’s performance enhancements on 30 real-world apps.
On average, AppSwift reduces bitmap loading time by 37.4% and storage IOs by 48.9%. Apps from the game, entertainment, travel and productivity tool categories benefit most from AppSwift, since these apps generally have rich UI elements with a large set of static bitmaps packed in the app file. On the other hand, apps from the weather, book and photo categories benefit the least. These apps either load most bitmaps from the network (weather apps) or focus mainly on user content (ebook viewers or photo viewers/editors). With fewer UI elements backed by local bitmaps, AppSwift has a smaller effect on these apps.
UI responsiveness improvement. Each lengthy operation on the UI thread makes the app unresponsive to new user inputs and freezes the UI (stalled animations) for a user-perceptible duration. In our evaluation, we use 100ms as the latency limit [95]. Lengthy operations are the most severe UI responsiveness problem. Figure 5–3b presents the number of lengthy operations that are removed after AppSwift optimizes the
original app. Though there are various causes of lengthy operations [93], we demonstrate that AppSwift can still automatically remove a considerable number of them. One notable lengthy operation is in the app Akinator, a question-and-guess game. We notice that Akinator stalls for about two seconds when a new game is started. Through code analysis, we discover that the game window initialization method (re)loads the entire set of app background images for each new game. With AppSwift, these background images are cached in memory and the image loading time for a new game is significantly reduced. We believe that this automatic fix is important for a mobile game. Beyond lengthy operations, we also observe that for some apps, AppSwift reduces bitmap loading time more on worker threads than on the UI thread. Thus, even for apps with almost no lengthy operations, AppSwift's loading time saving still has a positive impact on user experience. We further explain this with the FaceQ case study.
Runtime overhead. AppSwift's runtime overhead has two parts. First, every time the rewritten app loads a bitmap, AppSwift checks whether the bitmap can be found in the cache. As shown in Figure 5–3a, all tested apps experience a positive time saving ratio, which indicates that the saved loading time always outweighs the time spent checking the cache. Second, AppSwift performs dynamic data-flow tracking to improve cache effectiveness. This overhead mainly comes from the instrumentation code that maintains the data-flow table. According to our measurement, this instrumentation executes infrequently: for the 30 tested apps, the data-flow tracking overhead is 0.3ms to 232ms per app, over a test period of several hundred
seconds. We conclude that data flow tracking has a negligible impact on app performance.
Memory pressure. With caching, AppSwift naturally increases the runtime memory usage of apps. With a soft reference cache, AppSwift does not increase the risk of OOM exceptions (see Section 5.3.8). We further measure the peak memory usage of the original and instrumented apps. The results show that AppSwift changes app peak memory usage by -30% to 59%, with an average of 11%. We notice quite a few negative cases where AppSwift actually helps the app save memory. These apps load the same bitmap more than once and thus obtain multiple bitmap memory objects with exactly the same content. With caching, AppSwift identifies these duplicates and stores only one copy of the bitmap in memory, which is semantically correct. This "memory de-duplication effect" introduced by AppSwift can help apps save memory as a side effect. A previous study [88] also confirms that apps have duplicated memory pages in their Java heaps. Another potential negative impact of increased memory usage is GC cost. However, according to our measurement, the GC overhead of the instrumented apps decreases by 17% on average. This is because the total GC pause time is the sum of the pause times of individual GC operations. On one hand, AppSwift increases the number of live memory objects, so each GC operation pauses longer. On the other hand, increased memory usage also increases the heap size, which reduces the total number of GC operations. Combined, these two effects can either increase or decrease total GC pause time.
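The soft-plus-strong reference design described above can be sketched as follows. This is a minimal sketch under our own naming; byte arrays stand in for real bitmaps, and the soft tier may be reclaimed by the GC under memory pressure, which is what keeps the cache OOM-safe:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch of a two-tier bitmap cache: a bounded strong-reference
// LRU tier that demotes evicted entries into a soft-reference tier.
class TwoTierBitmapCache {
    private final Map<String, byte[]> strong;
    private final Map<String, SoftReference<byte[]>> soft = new HashMap<>();

    public TwoTierBitmapCache(final int strongCapacity) {
        // an access-ordered LinkedHashMap gives LRU eviction for free
        this.strong = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                if (size() > strongCapacity) {
                    // demote to the soft tier instead of dropping outright
                    soft.put(e.getKey(), new SoftReference<>(e.getValue()));
                    return true;
                }
                return false;
            }
        };
    }

    public void put(String key, byte[] bitmap) { strong.put(key, bitmap); }

    public byte[] get(String key) {
        byte[] b = strong.get(key);
        if (b != null) return b;
        SoftReference<byte[]> ref = soft.remove(key);
        if (ref != null && (b = ref.get()) != null) {
            strong.put(key, b); // promote back to the strong tier
            return b;
        }
        return null; // miss: the caller decodes the bitmap and calls put()
    }
}
```

Because soft references are cleared before an OutOfMemoryError is thrown, the soft tier grows opportunistically without adding OOM risk, matching the configuration described above.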
Other usability considerations. We apply AppSwift to all our app samples and verify that the rewritten apps run on our experiment device. During this process, we also measure the rewriting time and the APK footprint
added by AppSwift. AppSwift adds a negligible 71KB to the original app file, mainly containing its runtime components. The rewriting time is measured on a desktop PC with a quad-core i7-3770 CPU running at 3.4GHz. The rewriting time for most apps is under one minute, which is acceptable if AppSwift is adopted as an optimization phase during app development. We also advocate porting AppSwift to Android to optimize any app being installed on a user's device. AppSwift can run on Android since all its components are developed in Java. Furthermore, we are working on optimizing the rewriting procedure to make rewriting faster. 5.4.3 Case Study
We pick two cases, decompile these apps and analyze the decompiled code along with the AppSwift logs to understand how AppSwift's enhancements help these apps.
FIFA: AppSwift beneath a cache-aware app. The FIFA app provides the latest FIFA news to users. We notice that the app already uses a caching library called Picasso [28] to handle part of its bitmap loading. The app is heavily obfuscated, so we performed best-effort manual de-obfuscation to extract useful information. In this case, AppSwift saves 40.4% of IOs and 36.3% of bitmap loading time. We want to understand how AppSwift improves an already cache-aware app. Our findings are as follows: • Through static API usage analysis, we capture 249 occurrences of the setImageResource API in 28 classes. This API is known to be inefficient: it loads and decodes a bitmap on the UI thread. Picasso provides a replacement for this particular API, but we conclude that the Picasso APIs are not applied consistently across the entire code base, so some bitmap loading requests cannot be properly cached. AppSwift's
advantage of completely retrofitting all inefficient APIs is quite useful in this case. • At runtime, the execution log captures 716 calls to setImageResource, of which 669 (93.4%) are served by the AppSwift cache. Since setImageResource always performs image loading on the UI thread, 14 lengthy operations previously caused by this inefficient API are automatically fixed by AppSwift. • AppSwift caches some bitmaps loaded from the app's /cache folder. We discover that the Picasso library saves some images fetched from the network in this temporary folder. Though AppSwift does not cache network image data directly, it does cache the bitmaps subsequently loaded from the /cache folder. From this case, we conclude that in real apps it is quite difficult to apply caching consistently and completely across the entire code base. AppSwift can effectively handle all inefficient APIs in the app and achieve a notable performance gain by covering these cases. Overall, the cached operations have a direct positive impact on UI responsiveness.
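The retrofitting in this case can be illustrated with a small simulation. Decoder and SwiftImageApi are hypothetical names of our own; on Android, the retrofitted call site would be ImageView.setImageResource, and the decoder would be the real storage-IO-plus-decoding path:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged simulation of API retrofitting: rewritten call sites invoke a
// cache-aware wrapper instead of the original load-and-decode API.
class SwiftImageApi {
    // Stand-in for the expensive decode path (storage IO + decoding).
    public interface Decoder { byte[] decode(int resId); }

    private final Map<Integer, byte[]> cache = new HashMap<>();
    private final Decoder decoder;
    private int decodeCalls = 0;

    public SwiftImageApi(Decoder decoder) { this.decoder = decoder; }

    // The rewritten call site invokes this instead of the original API.
    public byte[] setImageResource(int resId) {
        byte[] bitmap = cache.get(resId);
        if (bitmap == null) {           // miss: fall back to decoding
            bitmap = decoder.decode(resId);
            decodeCalls++;
            cache.put(resId, bitmap);
        }
        return bitmap;                  // hit: no IO, no decoding
    }

    public int decodeCalls() { return decodeCalls; }
}
```

Because the wrapper sits at every rewritten call site, it covers the code paths that a partially-applied library like Picasso misses.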
FaceQ: beyond lengthy operation fixes. FaceQ is an app for making custom avatar pictures. From the results, we observe over 23.9% saving in loading time, but the trace shows only four lengthy operations before, and none after, AppSwift optimization. We analyze the decompiled code of this app to understand how the saving benefits it. The main window of the app offers a list of available styles (e.g. hair style, face shape, etc.), as shown in Figure 5–4. Each style is a PNG bitmap image resource loaded asynchronously with an AsyncTask, as simplified below:
Figure 5–4: FaceQ main window, showing the avatar picture currently being made and a list of available style pictures. Each style picture is loaded and displayed asynchronously.

// com.miantan.myoface.EditorActivity
// A custom image loading task running on worker
// threads to load images from Resources
class BitmapWorkerTask extends AsyncTask {
    // ... (body omitted)
}
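Bitmaps loaded through InputStreams, as in this task, offer no stable cache key by themselves. A hedged sketch of how a data-flow table can recover one follows; the names are ours, not AppSwift's, and byte arrays stand in for decoded bitmaps:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// The rewriter also instruments the sites that *open* streams, recording
// which resource each stream object carries, so that decodeStream can be
// made cache-aware.
class StreamFlowTracker {
    // data-flow table: stream object -> resource identity (keyed by object
    // identity, since streams have no value equality)
    private final Map<InputStream, String> flowTable = new IdentityHashMap<>();
    private final Map<String, byte[]> bitmapCache = new HashMap<>();
    private int decodes = 0;

    // instrumented stream-opening site: tag the stream with its origin
    public InputStream openResource(String resourceName, byte[] rawData) {
        InputStream in = new ByteArrayInputStream(rawData);
        flowTable.put(in, resourceName);
        return in;
    }

    // retrofitted decodeStream: use the tag as the cache key when present
    public byte[] decodeStream(InputStream in) {
        String key = flowTable.get(in);
        if (key != null && bitmapCache.containsKey(key)) {
            return bitmapCache.get(key); // cache hit: skip decoding
        }
        byte[] bitmap;
        try {
            bitmap = in.readAllBytes();  // stand-in for real decoding
        } catch (java.io.IOException e) {
            throw new UncheckedIOException(e);
        }
        decodes++;
        if (key != null) bitmapCache.put(key, bitmap);
        return bitmap;
    }

    public int decodes() { return decodes; }
}
```

Two different stream objects opened from the same resource thus map to the same cache entry, which is exactly the repeated-load pattern observed below.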
This AsyncTask uses the decodeStream API to load bitmaps from InputStreams. At runtime, we capture 3,577 calls to this API, which dominate the image loading time of the entire app. Each style picture is loaded and displayed asynchronously, so few lengthy operations are found on the UI thread. AppSwift converts decodeStream to be cache-aware with the help of dynamic data-flow tracking. When a style picture is loaded twice, it can be served from the cache and displayed faster. We observe that users commonly navigate between styles, so style pictures get loaded repeatedly. According to the AppSwift results, 1,567 of the 3,577 calls to decodeStream are cached, which indicates that about 43% of style picture loads become faster as a result of AppSwift optimization. Thus, though the app does not have significant lengthy operation problems, AppSwift still improves the UI performance of apps like FaceQ. FaceQ also emphasizes the importance of data flow tracking in AppSwift. As we have explained in Section 5.3.5, APIs like decodeStream must be transformed with the help of runtime data-flow information. With data flow tracking, AppSwift successfully transforms decodeStream and observes a notable increase in caching opportunities. 5.5 Discussions
We discuss how to generalize the current AppSwift technique to a wider range of APIs, as well as several alternative deployments of AppSwift, to understand how to better utilize AppSwift on modern Android systems. 5.5.1 Generalization
During our experiments, we identify other causes of lengthy operations in real-world apps beyond bitmap loading. These causes include but are not limited to excessive IO operations, memory bloat, poor implementation, or plain programming mistakes. AppSwift demonstrates an API retrofitting methodology that effectively reduces bitmap loading time and thus eliminates lengthy operations caused by excessive bitmap loading. Though exhaustively solving all real-world lengthy operations is beyond the scope of AppSwift, the API retrofitting methodology can be generalized to address such problems. For example, we observe that some apps experience lengthy operations because of excessive expensive IO operations (SQLite database operations and file operations). As with the bitmap APIs, one could address these problems by rewriting storage-IO-related APIs and implementing caching, or even a memory-backed file system, to boost performance. For memory bloat, one could intercept memory allocation sites to reuse old memory objects and avoid expensive GC pauses (i.e., implement a slab allocator [24]). Overall, AppSwift introduces an automatic API retrofitting methodology, capable of encoding various efficient programming practices and retrofitting apps to use them by means of binary rewriting. 5.5.2 App Rewriting vs. OS Upgrade
AppSwift demonstrates that improved API implementations can effectively boost app performance. The most straightforward way of updating an API implementation is to upgrade the mobile operating system. Though mobile operating systems update quickly, user adoption of new OS versions lags far behind. As of April 2016, according to Google's statistics [21], 56.9%, 35.8% and 4.6% of Android users are on Android 4.x, 5.x and 6.x, respectively. This phenomenon greatly limits API implementation improvements.
In contrast, AppSwift retrofits API implementations by means of binary rewriting, which is more lightweight than upgrading the entire OS and thus offers more deployment flexibility. As we have demonstrated, app developers can rewrite apps before release to ensure efficient API implementations across app and library code. Mobile users (especially those with older OSes and less powerful devices) can install AppSwift as a system-wide service that rewrites APKs at installation time for optimization. The fact that AppSwift can retrofit API implementations more cheaply than an OS upgrade greatly extends its usefulness in delivering cutting-edge, customizable API optimizations to a diverse range of real-world app users. 5.5.3 Dalvik vs. ART
ART [9] has replaced Dalvik as the default runtime of Android since 5.0 (2014). Android apps are still distributed in the APK format with code in Dalvik bytecode; ART further compiles the Dalvik bytecode to native code at app installation time to boost runtime performance. Thus, as long as AppSwift rewrites the APK before ART compilation, all the benefits of bitmap caching are preserved. 5.6 Related Work
Mobile devices have rapidly become the most used consumer electronics, and the number of mobile apps has surged over the past few years. Substantial recent research effort has been spent on the performance of mobile apps. In this section, we summarize the research related to AppSwift. AppSwift is either inspired by the facts and observations outlined in these previous works or complements their approaches to addressing performance issues.
Measurements and bug detection. Like other software bugs, performance bugs can have various causes, including carelessness. Several research works aim to measure performance metrics
during app execution in order to reveal these bugs. AppInsight [101] and Panappticon [122] track the code paths that respond to user inputs, aiming to reveal the main bottlenecks that cause long user-perceived delays. Lag Hunting [85] also focuses on measuring user-perceptible delay with event tracing. Mantis [87] and Proteus [119] extract features from an app and predict its future performance. Previous research also leverages multi-layer profiling [112] or innovative testing methods [90, 55, 99] to reveal bugs (not limited to performance bugs) in apps. Meanwhile, Liu et al. [93] study known performance bugs to reveal their root causes and common patterns. These works and their findings have motivated AppSwift to tackle performance bugs in mobile apps.
Binary rewriting tools. Binary rewriting and instrumentation is a popular technique [74, 57, 101, 50] for measuring or improving the performance of mobile apps. AppSwift leverages its own binary rewriter, which simplifies binary rewriting with high-level rules. Compared with previous tools [57, 74], the rewriter does not require its users to be aware of bytecode details when writing rewriting rules. The AppSwift rewriter can also rewrite app UI layout files and the app manifest file, further extending the scope and effect of binary rewriting.
Performance improving tools. A few works aim to improve app performance, either by enforcing deadlines for network delays [102] or by refactoring apps for better asynchrony [92]. There are also various approaches [69] that accelerate app execution by offloading part of the computation to an entity with more computing power, such as a cloud. Most of these works require some developer involvement, either refactoring code or following certain guidelines. In contrast, AppSwift improves performance but
operates in a fully automatic way. AppSwift also operates at the bytecode level, which further extends its usage scenarios.
Performance bugs in other languages. AppSwift is also inspired by techniques for detecting and fixing performance bugs in other programming languages [83, 97, 111, 84]. In particular, Android applications are in general Java applications, so previous studies on Java GUI frameworks and general Java program bugs also apply to Android apps. For example, previous work on detecting cacheable data in traditional Java applications [96] guides us in finding what data should be cached in an Android app. Previous work on Java GUI problems [123, 124] also helps us understand some performance bugs in Android apps.
Chapter 6 Conclusion and Future Work Mobile computing hardware has witnessed tremendous success over the past decade. Mobile applications running on these devices have greatly innovated daily lives with mobile gaming, always-connected software, location-optimized services, tracking of daily physical activities and health conditions, etc. The mobile programming model has evolved rapidly over the years to enable and support these innovations in apps. However, the growing programming complexity also gives rise to new software problems in apps. These problems can affect app performance or harm user privacy. Effective methods and tools are needed to detect, diagnose and fix these problems. This thesis proposes three technologies to analyze and improve mobile applications. AppInspector is a developer tool that assists app developers in discovering performance issues and diagnosing their root causes. It does so by automatically instrumenting an app to collect performance and diagnostic data at runtime. AppAudit provides program analyses to detect and analyze data leaks in apps. The dynamic analysis proposed in AppAudit boosts both the precision and performance of the detection and greatly extends the use cases of data leak detection: AppAudit allows app developers to perform data leakage checks before releasing apps, helps app market operators detect and remove data-leaking apps, and lets mobile users run the analysis on their devices to avoid data-leaking apps. AppSwift is an automatic developer tool that aims to eliminate performance issues caused by inefficient bitmap image handling. It automatically transforms app binaries to retrofit an app and remove the use of inefficient image loading APIs.
During the development of these technologies, we also synthesized a toolkit for analyzing Android apps, namely PATDroid. We open-source this toolkit to boost further related research; it is described in the appendix. Our experience with the three technologies, along with discussions with other researchers working in the same field, has motivated several interesting directions for future work:
Test Generation and Automation for Dynamic Analyses. We chose to evaluate AppInspector and AppSwift with real-world apps to demonstrate their usefulness. To ensure that the evaluation is meaningful, we have the following requirements for the user inputs fed to each app. First, we need to generate user inputs that cover most of the app functionalities, i.e., the coverage requirement. Second, we must automatically feed the same user inputs to different versions of the app to scale up our experiments, i.e., the automation requirement. Third, we need to compare different versions of the app (e.g., to measure the overhead of our instrumentation), which requires the user inputs to be fed at exactly the same pace, i.e., the timing requirement. These requirements for user inputs are not specific to evaluating our tools. Many related research tools and systems that perform dynamic analysis have the same requirements. For example, dynamic taint analysis systems such as TaintDroid [61] can detect more hidden data leaks when input coverage is high. Performance measuring systems [101] and bug detectors [78] can discover more problems if user inputs reach more code paths. All of these analyses would benefit from automated app testing. In our evaluation, we use a record-and-replay methodology. This methodology first records a user input trace with timing and touchscreen position information. The trace is then automatically replayed to different versions of
the app for experiment purposes. This testing achieves the timing requirement. Coverage is ensured by the user who contributes the input trace. The entire testing is semi-automated: the trace is recorded once and replayed multiple times. This methodology is good enough to evaluate our systems and has been advocated by related research [66]. However, we also recognize several limitations of current record-and-replay tools, which could be explored in the future. First, a touchscreen point (an x,y pair) is specific to the test device model and cannot be reused on another device. To overcome this, many tools [75, 42, 76, 114] are exploring the semantics of user inputs (e.g. which UI element is clicked, instead of the x,y coordinates of the touched point on the screen). If the recorded trace contained semantic user inputs, it could be replayed on more devices, scaling up the experiments. Second, randomness is in many cases inevitable in apps. For example, a mobile game app may randomize game settings every time a new game is started. Also, some advertising libraries display an ad (a popup dialog) whenever the ad finishes downloading. Such randomness can easily invalidate replayed user inputs. Thus we suggest that record-and-replay tools detect random seeds and events inside apps and control the randomness during replay.
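The timed replay step of this methodology can be sketched as follows; the event format and the injector interface are our own assumptions, not those of any specific recording tool:

```java
import java.util.List;

// Hedged sketch of record-and-replay: events carry recorded offsets and
// are re-injected at the same pace, satisfying the timing requirement.
class InputReplayer {
    public static class Event {
        final long atMs;  // offset from trace start, as recorded
        final int x, y;   // touchscreen coordinates (device-specific)
        public Event(long atMs, int x, int y) {
            this.atMs = atMs; this.x = x; this.y = y;
        }
    }

    // Abstraction over the input injection mechanism (e.g. an input driver).
    public interface Injector { void tap(int x, int y); }

    // Replay the trace, sleeping so each event fires at its recorded offset.
    public static void replay(List<Event> trace, Injector injector) {
        long start = System.currentTimeMillis();
        for (Event e : trace) {
            long wait = e.atMs - (System.currentTimeMillis() - start);
            if (wait > 0) {
                try {
                    Thread.sleep(wait);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return; // abort replay if interrupted
                }
            }
            injector.tap(e.x, e.y);
        }
    }
}
```

The raw x,y pairs here are exactly the device-specific representation criticized above; a semantic trace would instead record which UI element was tapped.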
Characterizing the Impact of API Behavior Differences. Mobile APIs are evolving rapidly, with about three generations per year for Android. Maintaining backwards compatibility is a crucial consideration during API evolution; it ensures that apps developed with old APIs can still run normally on new devices. Backwards compatibility requires the new API generation to preserve all old APIs and make sure they work as before (even if they are marked deprecated in the new generation). More specifically, the API signature (function prototype) should not change and the APIs should perform the
same functionalities. Though Android ensures unchanged API signatures over its evolution, we notice that some APIs change behavior in new API generations [3, 4]. Though many of these changes are subtle, app developers have reported performance and correctness problems caused by API behavior differences [12]. We advocate a comprehensive analysis of how API behavior differences impact app performance and correctness. Furthermore, we advocate the development of potential solutions for such issues, which could include the following: • Add Lint checks for the use of APIs with behavior differences; • Develop code transformation tools (similar to AppSwift) to evolve apps toward new APIs; • Develop API compatibility libraries to patch old apps and maintain the behavior of old APIs.
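The first proposed solution, a Lint-style check, could be sketched as follows. The table of behavior-changed APIs is hypothetical and would need to be curated from the Android behavior-change notes [3, 4]:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hedged sketch of a Lint-style check flagging APIs whose behavior
// changed between Android versions.
class BehaviorDiffLint {
    // hypothetical curated table: API signature -> API level of the change;
    // e.g. AsyncTask.execute switched to serial scheduling on HONEYCOMB (11)
    private static final Map<String, Integer> CHANGED = new HashMap<>();
    static {
        CHANGED.put("android.os.AsyncTask.execute", 11);
    }

    // Flag every invoked API that behaves differently at/after targetApi.
    public static List<String> check(List<String> invokedApis, int targetApi) {
        List<String> warnings = new ArrayList<>();
        for (String api : invokedApis) {
            Integer changedAt = CHANGED.get(api);
            if (changedAt != null && targetApi >= changedAt) {
                warnings.add(api + " changed behavior at API " + changedAt);
            }
        }
        return warnings;
    }
}
```

Such a check would only warn; deciding whether a flagged call site is actually affected still requires the comprehensive behavior-difference analysis advocated above.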
Appendix: PATDroid
PATDroid [27] is an open-source program analysis toolkit for analyzing and transforming Android applications. It is the foundation of AppInspector, AppAudit and AppSwift, and was developed and refined during the course of this research. PATDroid comprises a Java library and a set of Python tool scripts. The Java library is intended mainly for code analysis and transformation, while the Python scripts target analyzing and/or transforming resource XML files.
PATDroid Java Library. The Java library provides the data structures modeling classes, methods, fields, instructions and permissions of an Android app. PATDroid can leverage the smali disassembler [72] to extract these data structures from an APK file. The code snippet below demonstrates a simple example of using PATDroid to 1) load all API 19 framework classes; 2) load a given APK file; 3) find a certain method in a certain class by name; 4) print the instructions of that method.

import patdroid.core.ClassInfo;
import patdroid.core.MethodInfo;
import patdroid.dalvik.Instruction;
import patdroid.smali.SmaliClassDetailLoader;

// load all framework classes, choosing an installed API level
SmaliClassDetailLoader.getFrameworkClassLoader(19).loadAll();
// pick an apk
ZipFile apkFile = new ZipFile(new File("path/to/your/apk"));
// load all classes, methods, fields and instructions from the apk;
// we are using smali as the underlying engine
new SmaliClassDetailLoader(apkFile, true).loadAll();
// get the class representation for the MainActivity class in the apk
ClassInfo c = ClassInfo.findClass("com.example.MainActivity");
// find all methods named "onCreate"; most likely there is only one
MethodInfo[] m = c.findMethodsHere("onCreate");
// print all instructions
int counter = 0;
for (Instruction ins : m[0].insns) {
    System.out.println("[" + counter + "] " + ins.toString());
    counter++;
}
These code components are the building blocks for more advanced program analysis data structures (e.g. call graphs, control-flow graphs, execution traces) and are the foundation of the tools built in this thesis (AppSwift for code transformation, AppInspector for instrumentation and AppAudit for dynamic analysis). In addition to these constructs, PATDroid also integrates with existing Android security research [49, 65] to provide a mapping data structure to query which permission is needed by a given Android API.
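A hedged sketch of such a permission-mapping query follows; the class and method names are ours, and the actual PATDroid API and mapping contents (derived from PScout-style research [49, 65]) may differ:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal sketch: map framework API signatures to the permissions
// they require, then answer point queries during analysis.
class PermissionMap {
    private final Map<String, Set<String>> apiToPermissions = new HashMap<>();

    public void add(String apiSignature, String permission) {
        apiToPermissions
            .computeIfAbsent(apiSignature, k -> new HashSet<>())
            .add(permission);
    }

    // Query which permissions a given framework API requires
    // (empty set means no permission is known to be needed).
    public Set<String> permissionsFor(String apiSignature) {
        return apiToPermissions.getOrDefault(apiSignature,
                                             Collections.emptySet());
    }
}
```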
PATDroid Python Tools. An APK file is an RSA-signed zip file. Analyzing an APK involves not only code analysis but also the analysis of manifest XML files and layout XMLs, as well as zip file (re-)signing. These tasks depend on external tools like jarsigner and aapt [1]. We simplify these common tasks by wrapping them with Python. The PATDroid Python tool set has a single command-line entry point and is extensible: each tool is an independent Python script plugged into the main command-line interface. For example, the parse layout tool can read through all layout XML files of an APK
and extract potential reflection call targets embedded in layout XMLs [23]. This information can complement code analysis tools for better accuracy.
References
[1] aapt: Android asset packaging tool. http://elinux.org/Android_aapt.
[2] About android. http://developer.android.com/about/android.html.
[3] Android 5.0 behavior changes. http://developer.android.com/about/versions/android-5.0-changes.html.
[4] Android 6.0 behavior changes. http://developer.android.com/preview/behavior-changes.html.
[5] Android api: Imageview.setimageresource. http://developer.android.com/reference/android/widget/ImageView.html#setImageResource(int).
[6] Android callbacks (flowdroid project). https://github.com/secure-software-engineering/soot-infoflow-android/blob/develop/AndroidCallbacks.txt.
[7] Android developer's guide: Improving layout performance. http://developer.android.com/training/improving-layouts/index.html.
[8] Android dalvik virtual machine: Bytecode instruction set. https://source.android.com/devices/tech/dalvik/dalvik-bytecode.html.
[9] Android runtime: Art and dalvik. https://source.android.com/devices/tech/dalvik/.
[10] Android systrace tool. https://developer.android.com/studio/profile/systrace.html.
[11] Art: New android runtime. https://source.android.com/devices/tech/dalvik/index.html.
[12] Asynctask parallel execution over different api levels. http://stackoverflow.com/questions/4068984/running-multiple-asynctasks-at-the-same-time-not-possible.
[13] Avg mobile security. http://www.avgmobilation.com/.
[14] dex2jar: Tools to work with android .dex and java .class files. https://code.google.com/p/dex2jar/.
[15] Droidbench, an open test suite for evaluating the effectiveness of taint-analysis. https://github.com/secure-software-engineering/DroidBench.
[16] An evaluation of the application verification service in android 4.2. http://www.cs.ncsu.edu/faculty/jiang/appverify/.
[17] Event dispatching thread. https://en.wikipedia.org/wiki/Event_dispatching_thread.
[18] Fortiguard virus encyclopedia. http://www.fortiguard.com/encyclopedia/.
[19] Free apps used to spy on millions of phones: Flashlight program can be used to secretly record location of phone and content of text messages. http://www.dailymail.co.uk/news/article-2808007/Free-apps-used-spy-millions-phones-Flashlight-program-used-secretly-record-location-phone-content-text-messages.html.
[20] Garbage collection in art and dalvik. https://source.android.com/devices/tech/dalvik/.
[21] Google play install stats: Os versions. http://developer.android.com/intl/zh-cn/about/dashboards/index.html.
[22] Hiromacro auto-touch macro app. https://play.google.com/store/apps/details?id=com.prohiro.macro.
[23] How exactly does the android:onclick xml attribute differ from setonclicklistener? http://stackoverflow.com/questions/4153517/how-exactly-does-the-androidonclick-xml-attribute-differ-from-setonclicklistene.
[24] Linux slab memory allocator. https://www.kernel.org/doc/gorman/html/understand/understand011.html.
[25] Lookout mobile security. https://www.lookout.com/.
[26] Nviso apkscan. http://apkscan.nviso.be/.
[27] PATDroid: A program analysis toolkit for android. https://github.com/mingyuan-xia/PATDroid.
[28] Picasso: A powerful image downloading and caching library for android. http://square.github.io/picasso/.
[29] Platform versions for running Android devices. https://developer.android.com/about/dashboards/index.html.
[30] Profiling with Traceview and dmtracedump. http://developer.android.com/tools/debugging/debugging-tracing.html.
[31] Redundant downloads are redundant. http://developer.android.com/training/efficient-downloads/redundant_redundant.html.
[32] Running multiple AsyncTasks at the same time not possible? http://stackoverflow.com/questions/4068984/running-multiple-asynctasks-at-the-same-time-not-possible.
[33] ScrubDroid/AntiTaintDroid project. http://gsbabil.github.io/AntiTaintDroid/.
[34] Symantec mobile security. http://www.symantec.com/mobile-security.
[35] Sophos security. http://www.sophos.com/en-us.aspx.
[36] Stack limit in Android ART runtime. http://developer.android.com/guide/practices/verifying-apps-art.html#Stack_Size.
[37] Stack limit in Dalvik VM. http://stackoverflow.com/questions/16843357/what-is-the-android-ui-thread-stack-size-limit-and-how-to-overcome-it.
[38] Stack Overflow: How to properly use SoundPool on a game? http://stackoverflow.com/questions/7437505/how-to-properly-use-soundpool-on-a-game.
[39] Stack Overflow: Running AsyncTasks in parallel. http://stackoverflow.com/questions/13910508/running-parallel-asynctask.
[40] Statistics and market data on mobile internet and apps. http://www.statista.com/markets/424/topic/538/mobile-internet-apps/.
[41] Test traces and apps for AppSwift. Link to repository removed for double-blind review.
[42] Testing UI for multiple apps. http://developer.android.com/training/testing/ui-testing/uiautomator-testing.html.
[43] This is what Android fragmentation looks like in 2015. http://thenextweb.com/insider/2015/08/05/this-is-what-android-fragmentation-looks-like-in-2015/.
[44] TrendLabs security intelligence blog. http://blog.trendmicro.com/trendlabs-security-intelligence/.
[45] VirusTotal: Free online virus, malware and URL scanner. https://www.virustotal.com/.
[46] XML APK parser. https://code.google.com/p/xml-apk-parser/.
[47] Karim Ali and Ondřej Lhoták. Application-only call graph construction. In Proceedings of the 26th European Conference on Object-Oriented Programming, ECOOP '12, pages 688–712, Berlin, Heidelberg, 2012. Springer-Verlag.
[48] Steven Arzt, Siegfried Rasthofer, Christian Fritz, Eric Bodden, Alexandre Bartel, Jacques Klein, Yves Le Traon, Damien Octeau, and Patrick McDaniel. FlowDroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 259–269, New York, NY, USA, 2014. ACM.
[49] Kathy Wain Yee Au, Yi Fan Zhou, Zhen Huang, and David Lie. PScout: Analyzing the Android permission specification. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS '12, pages 217–228, New York, NY, USA, 2012. ACM.
[50] Md. Tanzirul Azim, Iulian Neamtiu, and Lisa M. Marvel. Towards self-healing smartphone software via automated patching. In Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, ASE '14, pages 623–628, New York, NY, USA, 2014. ACM.
[51] Clark W. Barrett, Roberto Sebastiani, Sanjit A. Seshia, and Cesare Tinelli. Satisfiability modulo theories. Handbook of Satisfiability, 185:825–885, 2009.
[52] Sven Bugiel, Stephan Heuser, and Ahmad-Reza Sadeghi. Flexible and fine-grained mandatory access control on Android for diverse security and privacy policies.
In Proceedings of the 22nd USENIX Conference on Security, SEC '13, pages 131–146, Berkeley, CA, USA, 2013. USENIX Association.
[53] William R. Bush, Jonathan D. Pincus, and David J. Sielaff. A static analyzer for finding dynamic programming errors. Softw. Pract. Exper., 30(7):775–802, June 2000.
[54] Michael Carbin, Sasa Misailovic, Michael Kling, and Martin C. Rinard. Detecting and escaping infinite loops with Jolt. In Proceedings of the 25th European Conference on Object-Oriented Programming, ECOOP '11, pages 609–633, Berlin, Heidelberg, 2011. Springer-Verlag.
[55] Qi Alfred Chen, Haokun Luo, Sanae Rosen, Z. Morley Mao, Karthik Iyer, Jie Hui, Kranthi Sontineni, and Kevin Lau. QoE Doctor: Diagnosing mobile app QoE with automated UI control and cross-layer analysis. In Proceedings of the 2014 Conference on Internet Measurement Conference, IMC '14, pages 151–164, New York, NY, USA, 2014. ACM.
[56] Byung-Gon Chun, Sunghwan Ihm, Petros Maniatis, Mayur Naik, and Ashwin Patti. CloneCloud: Elastic execution between mobile device and cloud. In Proceedings of the Sixth Conference on Computer Systems, EuroSys '11, pages 301–314, New York, NY, USA, 2011. ACM.
[57] Benjamin Davis and Hao Chen. RetroSkeleton: Retrofitting Android apps. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 181–192, New York, NY, USA, 2013. ACM.
[58] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS '08/ETAPS '08, pages 337–340, Berlin, Heidelberg, 2008. Springer-Verlag.
[59] Rui Ding, Hucheng Zhou, Jian-Guang Lou, Hongyu Zhang, Qingwei Lin, Qiang Fu, Dongmei Zhang, and Tao Xie. Log2: A cost-aware logging mechanism for performance diagnosis. In Proceedings of the 2015 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC '15, pages 139–150, Berkeley, CA, USA, 2015. USENIX Association.
[60] Manuel Egele, Christopher Kruegel, Engin Kirda, and Giovanni Vigna.
PiOS: Detecting privacy leaks in iOS applications. In Proceedings of the 18th Network and Distributed System Security Symposium, NDSS '11, 2011.
[61] William Enck, Peter Gilbert, Byung-Gon Chun, Landon P. Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N. Sheth. TaintDroid: An information-flow tracking system for realtime privacy monitoring on
smartphones. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI '10, pages 1–6, Berkeley, CA, USA, 2010. USENIX Association.
[62] William Enck, Damien Octeau, Patrick McDaniel, and Swarat Chaudhuri. A study of Android application security. In Proceedings of the 20th USENIX Conference on Security, SEC '11, pages 21–21, Berkeley, CA, USA, 2011. USENIX Association.
[63] William Enck, Machigar Ongtang, and Patrick McDaniel. On lightweight mobile phone application certification. In Proceedings of the 16th ACM Conference on Computer and Communications Security, CCS '09, pages 235–245, New York, NY, USA, 2009. ACM.
[64] Michael D. Ernst. Static and dynamic analysis: Synergy and duality. In WODA 2003: ICSE Workshop on Dynamic Analysis, pages 24–27, Portland, OR, May 9, 2003.
[65] Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, and David Wagner. Android permissions demystified. In Proceedings of the 18th ACM Conference on Computer and Communications Security, CCS '11, pages 627–638, New York, NY, USA, 2011. ACM.
[66] Lorenzo Gomez, Iulian Neamtiu, Tanzirul Azim, and Todd Millstein. RERAN: Timing- and touch-sensitive record and replay for Android. In Proceedings of the 2013 International Conference on Software Engineering, ICSE '13, pages 72–81, Piscataway, NJ, USA, 2013. IEEE Press.
[67] Lu Gong and Mingyuan Xia. BeanBot analysis report. https://github.com/mingyuan-xia/AppAudit/wiki/BeanBot-analysis-report.
[68] Michael Gorbovitski, Yanhong A. Liu, Scott D. Stoller, Tom Rothamel, and Tuncay K. Tekle. Alias analysis for optimization of dynamic languages. In Proceedings of the 6th Symposium on Dynamic Languages, DLS '10, pages 27–42, New York, NY, USA, 2010. ACM.
[69] Mark S. Gordon, David Ke Hong, Peter M. Chen, Jason Flinn, Scott Mahlke, and Zhuoqing Morley Mao. Accelerating mobile applications through flip-flop replication.
In Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '15, pages 137–150, New York, NY, USA, 2015. ACM.
[70] Michael Grace, Yajin Zhou, Qiang Zhang, Shihong Zou, and Xuxian Jiang. RiskRanker: Scalable and accurate zero-day Android malware detection. In Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys '12, pages 281–294, New York, NY, USA, 2012. ACM.
[71] Michael C. Grace, Wu Zhou, Xuxian Jiang, and Ahmad-Reza Sadeghi. Unsafe exposure analysis of mobile in-app advertisements. In Proceedings of the Fifth ACM Conference on Security and Privacy in Wireless and Mobile Networks, WISEC '12, pages 101–112, New York, NY, USA, 2012. ACM.
[72] Ben Gruver. Smali: An assembler/disassembler for the dex format used by Dalvik. https://github.com/JesusFreke/smali.
[73] Romain Guy. Android performance case study. http://www.curious-creature.com/2012/12/01/android-performance-case-study/.
[74] Shuai Hao, Ding Li, William G.J. Halfond, and Ramesh Govindan. SIF: A selective instrumentation framework for mobile applications. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 167–180, New York, NY, USA, 2013. ACM.
[75] Shuai Hao, Bin Liu, Suman Nath, William G.J. Halfond, and Ramesh Govindan. PUMA: Programmable UI-automation for large-scale dynamic analysis of mobile apps. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '14, pages 204–217, New York, NY, USA, 2014. ACM.
[76] Xiaocong He. Python wrapper of the Android uiautomator test tool. https://github.com/xiaocong/uiautomator.
[77] Peter Hornyack, Seungyeop Han, Jaeyeon Jung, Stuart Schechter, and David Wetherall. These aren't the droids you're looking for: Retrofitting Android to protect data from imperious applications. In Proceedings of the 18th ACM Conference on Computer and Communications Security, CCS '11, pages 639–652, New York, NY, USA, 2011. ACM.
[78] Gang Hu, Xinhao Yuan, Yang Tang, and Junfeng Yang. Efficiently, effectively detecting mobile app bugs with AppDoctor. In Proceedings of the Ninth European Conference on Computer Systems, EuroSys '14, pages 18:1–18:15, New York, NY, USA, 2014. ACM.
[79] Yan Huang, Peter Chapman, and David Evans. Privacy-preserving applications on smartphones.
In Proceedings of the 6th USENIX Conference on Hot Topics in Security, HotSec '11, pages 4–4, Berkeley, CA, USA, 2011. USENIX Association.
[80] S. Ickin, K. Wac, M. Fiedler, L. Janowski, Jin-Hyuk Hong, and A.K. Dey. Factors influencing quality of experience of commonly used mobile applications. Communications Magazine, IEEE, 50(4):48–56, April 2012.
[81] IDC Research, Inc. Worldwide smartphone market will see the first single-digit growth year on record. http://www.idc.com/getdoc.jsp?containerId=prUS40664915.
[82] Xuxian Jiang. Security alert: New BeanBot SMS trojan discovered. http://www.csc.ncsu.edu/faculty/jiang/BeanBot/.
[83] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. Understanding and detecting real-world performance bugs. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '12, pages 77–88, New York, NY, USA, 2012. ACM.
[84] Guoliang Jin, Linhai Song, Wei Zhang, Shan Lu, and Ben Liblit. Automated atomicity-violation fixing. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '11, pages 389–400, New York, NY, USA, 2011. ACM.
[85] Milan Jovic, Andrea Adamoli, and Matthias Hauswirth. Catch me if you can: Performance bug detection in the wild. In Proceedings of the 2011 ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '11, pages 155–170, New York, NY, USA, 2011. ACM.
[86] Michael Kling, Sasa Misailovic, Michael Carbin, and Martin Rinard. Bolt: On-demand infinite loop escape in unmodified binaries. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '12, pages 431–450, New York, NY, USA, 2012. ACM.
[87] Yongin Kwon, Sangmin Lee, Hayoon Yi, Donghyun Kwon, Seungjun Yang, Byung-Gon Chun, Ling Huang, Petros Maniatis, Mayur Naik, and Yunheung Paek. Mantis: Automatic performance prediction for smartphone applications. In Proceedings of the 2013 USENIX Conference on Annual Technical Conference, USENIX ATC '13, pages 297–308, Berkeley, CA, USA, 2013. USENIX Association.
[88] Byeoksan Lee, Seong Min Kim, Eru Park, and Dongsu Han. MemScope: Analyzing memory duplication on Android systems.
In Proceedings of the 6th Asia-Pacific Workshop on Systems, APSys '15, pages 19:1–19:7, New York, NY, USA, 2015. ACM.
[89] Sangmin Lee, Edmund L. Wong, Deepak Goel, Mike Dahlin, and Vitaly Shmatikov. πBox: A platform for privacy-preserving apps. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation, NSDI '13, pages 501–514, 2013.
[90] Chieh-Jan Mike Liang, Nicholas D. Lane, Niels Brouwers, Li Zhang, Börje F. Karlsson, Hao Liu, Yan Liu, Jun Tang, Xiang Shan, Ranveer Chandra, and Feng Zhao. Caiipa: Automated large-scale mobile app testing through contextual fuzzing. In Proceedings of the 20th Annual International Conference on Mobile Computing and Networking, MobiCom '14, pages 519–530, New York, NY, USA, 2014. ACM.
[91] Yu Lin, Semih Okur, and Danny Dig. Study and refactoring of Android asynchronous programming (T). In Proceedings of the 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), ASE '15, pages 224–235, Washington, DC, USA, 2015. IEEE Computer Society.
[92] Yu Lin, Cosmin Radoi, and Danny Dig. Retrofitting concurrency for Android applications through refactoring. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, FSE 2014, pages 341–352, New York, NY, USA, 2014. ACM.
[93] Yepang Liu, Chang Xu, and Shing-Chi Cheung. Characterizing and detecting performance bugs for smartphone applications. In Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, pages 1013–1024, New York, NY, USA, 2014. ACM.
[94] Kin-Keung Ma, Khoo Yit Phang, Jeffrey S. Foster, and Michael Hicks. Directed symbolic execution. In Proceedings of the 18th International Conference on Static Analysis, SAS '11, pages 95–111, Berlin, Heidelberg, 2011. Springer-Verlag.
[95] Brad A. Myers. The importance of percent-done progress indicators for computer-human interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '85, pages 11–17, New York, NY, USA, 1985. ACM.
[96] Khanh Nguyen and Guoqing Xu. Cachetor: Detecting cacheable data to remove bloat. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2013, pages 268–278, New York, NY, USA, 2013. ACM.
[97] Oswaldo Olivo, Isil Dillig, and Calvin Lin.
Static detection of asymptotic performance bugs in collection traversals. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2015, pages 369–378, New York, NY, USA, 2015. ACM.
[98] Paul Pearce, Adrienne Porter Felt, Gabriel Nunez, and David Wagner. AdDroid: Privilege separation for applications and advertisers in Android. In Proceedings of the 7th ACM Symposium on Information,
Computer and Communications Security, ASIACCS '12, pages 71–72, New York, NY, USA, 2012. ACM.
[99] Michael Pradel, Parker Schuh, George Necula, and Koushik Sen. EventBreak: Analyzing the responsiveness of user interfaces through performance-guided test generation. In Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, OOPSLA '14, pages 33–47, New York, NY, USA, 2014. ACM.
[100] Lenin Ravindranath, Suman Nath, Jitendra Padhye, and Hari Balakrishnan. Automatic and scalable fault detection for mobile applications. In Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '14, pages 190–203, New York, NY, USA, 2014. ACM.
[101] Lenin Ravindranath, Jitendra Padhye, Sharad Agarwal, Ratul Mahajan, Ian Obermiller, and Shahin Shayandeh. AppInsight: Mobile app performance monitoring in the wild. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI '12, pages 107–120, Berkeley, CA, USA, 2012. USENIX Association.
[102] Lenin Ravindranath, Jitendra Padhye, Ratul Mahajan, and Hari Balakrishnan. Timecard: Controlling user-perceived delays in server-based mobile applications. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP '13, pages 85–100, New York, NY, USA, 2013. ACM.
[103] Jingjing Ren, Ashwin Rao, Martina Lindorfer, Arnaud Legout, and David Choffnes. ReCon: Revealing and controlling PII leaks in mobile network traffic. In Proceedings of the 14th International Conference on Mobile Systems, Applications and Services, 2016.
[104] Giovanni Russello, Arturo Blas Jimenez, Habib Naderi, and Wannes van der Mark. FireDroid: Hardening security in almost-stock Android. In Proceedings of the 29th Annual Computer Security Applications Conference, ACSAC '13, pages 319–328, New York, NY, USA, 2013. ACM.
[105] Golam Sarwar, Olivier Mehani, Roksana Boreli, and Dali Kaafar.
On the effectiveness of dynamic taint analysis for protecting against private information leaks on Android-based devices. In Proceedings of the 10th International Conference on Security and Cryptography, pages 461–467, July 2013.
[106] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In Proceedings of the
2010 IEEE Symposium on Security and Privacy, SP '10, pages 317–331, Washington, DC, USA, 2010. IEEE Computer Society.
[107] Dawn Song, David Brumley, Heng Yin, Juan Caballero, Ivan Jager, Min Gyung Kang, Zhenkai Liang, James Newsome, Pongsin Poosankam, and Prateek Saxena. BitBlaze: A new approach to computer security via binary analysis. In Proceedings of the 4th International Conference on Information Systems Security, ICISS '08, pages 1–25, Berlin, Heidelberg, 2008. Springer-Verlag.
[108] Wook Song, Nosub Sung, Byung-Gon Chun, and Jihong Kim. Reducing energy consumption of smartphones using user-perceived response time analysis. In Proceedings of the 15th Workshop on Mobile Computing Systems and Applications, HotMobile '14, pages 20:1–20:6, New York, NY, USA, 2014. ACM.
[109] Alan Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, 1936.
[110] Mario Linares-Vásquez, Christopher Vendome, Qi Luo, and Denys Poshyvanyk. How developers detect and fix performance bottlenecks in Android apps. In Rainer Koschke, Jens Krinke, and Martin P. Robillard, editors, 2015 IEEE International Conference on Software Maintenance and Evolution, ICSME 2015, Bremen, Germany, September 29 - October 1, 2015, pages 352–361. IEEE, 2015.
[111] Xi Wang, Zhenyu Guo, Xuezheng Liu, Zhilei Xu, Haoxiang Lin, Xiaoge Wang, and Zheng Zhang. Hang analysis: Fighting responsiveness bugs. In Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008, EuroSys '08, pages 177–190, New York, NY, USA, 2008. ACM.
[112] Xuetao Wei, Lorenzo Gomez, Iulian Neamtiu, and Michalis Faloutsos. ProfileDroid: Multi-layer profiling of Android applications. In Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, MobiCom '12, pages 137–148, New York, NY, USA, 2012. ACM.
[113] Ryszard Wiśniewski and Connor Tumbleson. Apktool: A tool for reverse engineering Android APK files.
https://ibotpeaches.github.io/Apktool/.
[114] Michelle Y. Wong and David Lie. IntelliDroid: A targeted input generator for the dynamic analysis of Android malware. In Proceedings of the Network and Distributed System Security Symposium, NDSS '16, 2016.
[115] Mingyuan Xia, Lu Gong, Yuanhao Lyu, Zhengwei Qi, and Xue Liu. Effective real-time Android application auditing. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 899–914. IEEE Computer Society, 2015.
[116] Mingyuan Xia, Wenbo He, Xue Liu, and Jie Liu. Why application errors drain battery easily? A study of memory leaks in smartphone apps. In Proceedings of the Workshop on Power-Aware Computing and Systems, HotPower 2013, Farmington, Pennsylvania, USA, November 3-6, 2013, pages 2:1–2:5. ACM, 2013.
[117] Mingyuan Xia, Xue Liu, Zhuocheng Ding, Hanyang Ma, Xiaohui Zhao, Chengcheng Xiang, Zhengwei Qi, and Xue Liu. Programmatically diagnosing performance issues in Android apps. In The 39th International Conference on Software Engineering (under review), ICSE '17.
[118] Mingyuan Xia, Chengcheng Xiang, Zhengwei Qi, and Xue Liu. Automatically enhancing UI performance of Android applications. In The 15th Annual International Conference on Mobile Systems, Applications, and Services (under review), MobiSys '17.
[119] Qiang Xu, Sanjeev Mehrotra, Zhuoqing Mao, and Jin Li. Proteus: Network performance forecast for real-time, interactive mobile applications. In Proceedings of the 11th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '13, pages 347–360, New York, NY, USA, 2013. ACM.
[120] Rubin Xu, Hassen Saïdi, and Ross Anderson. Aurasium: Practical policy enforcement for Android applications. In Proceedings of the 21st USENIX Conference on Security Symposium, SEC '12, pages 27–27, Berkeley, CA, USA, 2012. USENIX Association.
[121] Zhemin Yang, Min Yang, Yuan Zhang, Guofei Gu, Peng Ning, and X. Sean Wang. AppIntent: Analyzing sensitive data transmission in Android for privacy leakage detection. In Proceedings of the 2013 ACM Conference on Computer and Communications Security, CCS '13, pages 1043–1054, New York, NY, USA, 2013. ACM.
[122] Lide Zhang, D. R. Bild, R. P. Dick, Z. M.
Mao, and P. Dinda. Panappticon: Event-based tracing to measure mobile application and platform performance. In Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013 International Conference on, pages 1–10, September 2013.
[123] Sai Zhang, Hao Lü, and Michael D. Ernst. Finding errors in multithreaded GUI applications. In Proceedings of the 2012 International Symposium on Software Testing and Analysis, ISSTA 2012, pages 243–253, New York, NY, USA, 2012. ACM.
[124] Sai Zhang, Hao Lü, and Michael D. Ernst. Automatically repairing broken workflows for evolving GUI applications. In Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, pages 45–55, New York, NY, USA, 2013. ACM.
[125] Xiao Zhang, Amit Ahlawat, and Wenliang Du. AFrame: Isolating advertisements from mobile applications in Android. In Proceedings of the 29th Annual Computer Security Applications Conference, ACSAC '13, pages 9–18, New York, NY, USA, 2013. ACM.
[126] Yuan Zhang, Min Yang, Bingquan Xu, Zhemin Yang, Guofei Gu, Peng Ning, X. Sean Wang, and Binyu Zang. Vetting undesirable behaviors in Android apps with permission use analysis. In Proceedings of the 2013 ACM Conference on Computer and Communications Security, CCS '13, pages 611–622, New York, NY, USA, 2013. ACM.
[127] Yajin Zhou and Xuxian Jiang. Dissecting Android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP '12, pages 95–109, Washington, DC, USA, 2012. IEEE Computer Society.
[128] Yajin Zhou and Xuxian Jiang. Detecting passive content leaks and pollution in Android applications. In Proceedings of the 20th Network and Distributed System Security Symposium, NDSS '13, 2013.
[129] Yajin Zhou, Zhi Wang, Wu Zhou, and Xuxian Jiang. Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets. In Proceedings of the 19th Network and Distributed System Security Symposium, NDSS '12, 2012.