Specifying Callback Control Flow of Mobile Apps Using Finite Automata

Total Page:16

File Type:pdf, Size:1020Kb

Specifying Callback Control Flow of Mobile Apps Using Finite Automata 1 Specifying Callback Control Flow of Mobile Apps Using Finite Automata Danilo Dominguez Perez and Wei Le Abstract—Given the event-driven and framework-based architecture of Android apps, finding the ordering of callbacks executed by the framework remains a problem that affects every tool that requires inter-callback reasoning. Previous work has focused on the ordering of callbacks related to the Android components and GUI events. But the execution of callbacks can also come from direct calls of the framework (API calls). This paper defines a novel program representation, called Callback Control Flow Automata (CCFA), that specifies the control flow of callbacks invoked via a variety of sources. We present an analysis to automatically construct CCFAs by combining two callback control flow representations developed from the previous research, namely, Window Transition Graphs (WTGs) and Predicate Callback Summaries (PCSs). To demonstrate the usefulness of our representation, we integrated CCFAs into two client analyses: a taint analysis using FLOWDROID, and a value-flow analysis that computes source and sink pairs of a program. Our evaluation shows that we can compute CCFAs efficiently and that CCFAs improved the callback coverages over WTGs. As a result of using CCFAs, we obtained 33 more true positive security leaks than FLOWDROID over a total of 55 apps we have run. With a low false positive rate, we found that 22.76% of source-sink pairs we computed are located in different callbacks and that 31 out of 55 apps contain source-sink pairs spreading across components. Thus, callback control flow graphs and inter-callback analysis are indeed important. Although this paper mainly uses Android, we believe that CCFAs can be useful for modeling control flow of callbacks for other event-driven, framework-based systems. Index Terms—event-driven, framework-based, mobile, program analysis, information flow F 1 INTRODUCTION With more than 1.4 billion Android devices sold [1] and of control, such as external events and API methods, and be over 2.5 million apps in the official Android app’s market directly usable by analysis and testing tools. [2], Android has become one of the leading platforms for In this paper, we propose a novel program repre- smartphone devices. However, the complex event-driven, sentation, namely Callback Control Flow Automata (CCFA), framework-based architecture that developers use to im- to specify control flow of callbacks for event-driven and plement apps imposes new challenges on the quality of framework-based applications. The design of CCFA is based Android apps, creating new kinds of code smells and bugs on the Extended Finite State Machine (EFSM) [15], which ex- [3], [4], [5], [6]. Due to the impact of the Android apps on tends the Finite State Machine (FSM) by labeling transitions the market and the complexity of developing apps, there is using information such as guards. Using the CCFAs, we aim a need for programming tools for analyzing Android apps. to specify four types of callback control flow existing in an One of the foundational challenges for developing these app: 1) a callback B is invoked synchronously after another tools is the sequencing or ordering of callback methods. callback A, 2) B is invoked asynchronously after A, meaning Even for a small subset of callbacks, it has been shown that B is put in the event queue and invoked eventually after the current state-of-the-art tools fail to generate sequences A is invoked, 3) during an execution of A, B is invoked of callbacks that match the runtime behavior of Android synchronously by an API call, and 4) during an execution of apps [7]. A, B is invoked asynchronously by an API call. Previous tools have used manual models to obtain se- In the CCFAs, a state indicates whether the execution quences of callbacks. For instance, Pathak et al. [8] and path enters or exits a callback. The transition from one state Arzt et al. [9] used the lifecycle of components to represent to another represents the transfer of control flow between control flow between callbacks. This approach can miss callbacks. The transitions are labeled with guards, the condi- sequences of callbacks invoked in API methods such as tions under which a transition is enabled. Such a condition AsyncTask.execute and LoaderManager.initLoader. can be an external event (e.g. click the button) or an API call. Other approaches modeled a subset of callbacks, specifi- The input alphabet of the CCFA is composed by the names cally the callbacks invoked from external events including of callbacks implemented in the app. Specifically, we use GUI [10], [11], [12] and sensors’ events [13]. Another ap- callbacknameentry and callbacknameexit as input symbols proach automatically summarize the Android framework so we can specify the case where one callback is invoked to identify the control flow of callbacks implemented in during the execution of the other (see Type 3 and Type 4 the Android API methods [14]. Nevertheless, there is not of control flow discussed above). Using this approach, the a representation that integrates different sources of changes language recognized by a CCFA is a set of paths of callbacks, or possible sequences of callbacks that can be invoked at • Danilo Dominguez Perez and Wei Le are with the Department of Com- runtime for an app. puter Science, Iowa State University, Ames, Iowa, 50011. To compute CCFAs, we designed a static analysis algo- f g E-mail: danilo0,weile @iastate.edu rithm that traverses Windows Transition Graph (WTG) [10] to 2 obtain the control flow of callbacks related to the windows The rest of the paper is organized as follows. In Section 2, (activities, dialogs and menus) and GUI events, and also, in- we present an overview of our work. In Sections 3 and 4, tegrates Predicate Callback Summaries (PCSs) [14] for callback we present the definition of CCFAs and the algorithms for sequences invoked in the Android API methods. These API computing CCFAs. In Section 5 we show how to use CCFAs methods include the callbacks of the Android components’ for inter-callback analysis. The evaluation of our work is lifecycles, which are modeled manually in the previous presented in Section 6, followed by a discussion in Section 7 work. Our approach is able to automatically handle them. and the related work in Section 8. Finally, we present the In addition, we successfully added the callbacks for 9 ad- conclusions and future work in Section 9. ditional external events: GPS location updates, enabling and disabling of GPS services, sensor events, two camera external events (when the camera shutters and when the 2 AN OVERVIEW camera takes a picture), network notifications (through the Given the event-driven, framework-based architecture of ConnectivityManager), a web client and a web chrome Android apps, most of an app’s implementation is done by client that receive notifications from a WebView. The models defining callbacks that are invoked by the Android frame- for these events are constructed manually, but we show work. To perform inter-callback static analysis or testing that CCFA can be easily extended to specify external events the apps beyond unit level, it is important to discover beyond the window and GUI events. and represent the potential orders of the callbacks. In this In this paper, we show two different approaches of inte- section, we use an example to provide an overview of what grating CCFAs into static analyses. In the first approach, we is a CCFA and also the approach of computing the CCFA. extend an existing inter-procedural, context-sensitive anal- ysis to inter-callback dataflow analysis using CCFAs, and 2.1 What is a CCFA also instantiate it to compute inter-callback source and sink pairs of a program. In the second approach, we generate a In Figure 1a, we show a simple Android app that contains program using the callback invocation paths provided by one activity class called A. This activity implements the CCFAs to directly integrate CCFAs with FLOWDROID [9]. callbacks onCreate (line 3), onStart (line 7) and onStop We implemented the construction and applications of (line 15). In the callback onCreate at line 5, the app sets CCFAs using Soot and Spark. The WTGs were computed an object of type CList as the event listener for the click using GATOR [16] and the summaries for Android API event of the button b1. It also invokes the API method methods were provided by Lithium [14]. lm:initLoader at line 12 to load data, where this API call Our results show that the construction of CCFAs is uses the class L (lines 17-22) to create a loader. very efficient, reporting 29 seconds per app on average. Figure 1b shows the CCFA we constructed for Figure 1a. Compared to WTGs, CCFAs improved callback coverage The representation is designed based on the Extended Finite 10.72% on average. In the best cases, we improved the app State Machine (EFSM), and each transition is labeled with a heregps’s coverage from 50% to 100%, and for the app pair (x; g) where x is an input symbol and g is a guard. MyTracks, 78 more callbacks were integrated, improving The input symbol x can represent a callback’s entry point the callback coverage 15.30%. Applying CCFAs, our tool (see A:onCreateentry on the transition from state q1 to q2), found 33 more information leaks than FLOWDROID for 55 a callback’s exit point (see A:onCreateexit on the transition apps we ran. We also experimented on 60 apps from Droid- from state q2 to q3), or an empty input represented by (see Bench that contain inter-callback security leaks, and our tool the transition from state q4 to state q6).
Recommended publications
  • Object-Oriented Programming with ANSI-C
    ___________________________________________________________________________v Preface No programming technique solves all problems. No programming language produces only correct results. No programmer should start each project from scratch. Object-oriented programming is the current cure-all — although it has been around for much more then ten years. At the core, there is little more to it then finally applying the good programming principles which we have been taught for more then twenty years. C++ (Eiffel, Oberon-2, Smalltalk ... take your pick) is the New Language because it is object-oriented — although you need not use it that way if you do not want to (or know how to), and it turns out that you can do just as well with plain ANSI-C. Only object-orientation permits code reuse between pro- jects — although the idea of subroutines is as old as computers and good program- mers always carried their toolkits and libraries with them. This book is not going to praise object-oriented programming or condemn the Old Way. We are simply going to use ANSI-C to discover how object-oriented pro- gramming is done, what its techniques are, why they help us solve bigger prob- lems, and how we harness generality and program to catch mistakes earlier. Along the way we encounter all the jargon — classes, inheritance, instances, linkage, methods, objects, polymorphisms, and more — but we take it out of the realm of magic and see how it translates into the things we have known and done all along. I had fun discovering that ANSI-C is a full-scale object-oriented language.
    [Show full text]
  • The Construction and Applications of Callback Control Flow Graphs for Event-Driven and Framework-Based Mobile Apps Danilo Dominguez Perez Iowa State University
    Iowa State University Capstones, Theses and Graduate Theses and Dissertations Dissertations 2019 The construction and applications of callback control flow graphs for event-driven and framework-based mobile apps Danilo Dominguez Perez Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/etd Part of the Computer Sciences Commons Recommended Citation Dominguez Perez, Danilo, "The onc struction and applications of callback control flow graphs for event-driven and framework-based mobile apps" (2019). Graduate Theses and Dissertations. 17004. https://lib.dr.iastate.edu/etd/17004 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Graduate Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. The construction and applications of callback control flow graphs for event-driven and framework-based mobile apps by Danilo Dominguez Perez A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Major: Computer Science Program of Study Committee: Wei Le, Major Professor Hridesh Rajan Samik Basu Robyn Lutz Wensheng Zhang The student author, whose presentation of the scholarship herein was approved by the program of study committee, is solely responsible for the content of this dissertation. The Graduate College will ensure this dissertation is globally accessible and will not permit alterations after a degree is conferred. Iowa State University Ames, Iowa 2019 Copyright c Danilo Dominguez Perez, 2019.
    [Show full text]
  • An Object-Oriented Event Calculus Jeremiah S
    Computer Science Technical Reports Computer Science 7-2002 An Object-Oriented Event Calculus Jeremiah S. Patterson Iowa State University Follow this and additional works at: http://lib.dr.iastate.edu/cs_techreports Part of the Programming Languages and Compilers Commons Recommended Citation Patterson, Jeremiah S., "An Object-Oriented Event Calculus" (2002). Computer Science Technical Reports. 276. http://lib.dr.iastate.edu/cs_techreports/276 This Article is brought to you for free and open access by the Computer Science at Iowa State University Digital Repository. It has been accepted for inclusion in Computer Science Technical Reports by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. An Object-Oriented Event Calculus Abstract Despite the rising popularity and usefulness of events, or implicit invocation, in software design, the availability of general-purpose event mechanisms are rare. Further, most event mechanisms available for software design are implemented as libraries or sets of macros that are constrained by the language in which they are used; making such mechanisms inconvenient to use as well as error-prone. Event mechanisms that are part of a programming language can do away with such constraints; thus making events easier to use. However, there a few languages that offer built-in the events and even fewer languages that have a built-in general-purpose event mechanism. In order to promote the study of implicit invocation programming languages, this thesis presents a formal programming language foundation for events. This thesis expands the, object based, imps-caluculus to create a calculus for objects and events, the rws-calculus.
    [Show full text]
  • N3985: a Proposal to Add Coroutines to the C++ Standard Library
    Document number: N3985 Date: 2014-05-22 Project: Programming Language C++, SG1 Reply-to: Oliver Kowalke (oliver dot kowalke at gmail dot com) Nat Goodspeed ( nat at lindenlab dot com) A proposal to add coroutines to the C++ standard library (Revision 1) Introduction........................................................1 Motivation.........................................................2 Impact on the Standard.................................................. 11 Design Decisions...................................................... 11 Proposed Wording..................................................... 18 References.......................................................... 32 A. jump-operation for SYSV ABI on x86_64...................................... 32 Revision History This document supersedes N3708. A new kind of coroutines - std::symmetric_coroutine<T> - is introduced and additional examples (like recursive SAX parsing) are added. A section explains the benfits of using coroutines in the context of event-based asynchronous model. Introduction This proposal suggests adding two first-class continuations to the C++ standard library: std::asymmetric_coroutine<T> and std::symmetric_coroutine<T> . In computer science routines are defined as a sequence of operations. The execution of routines forms a parent- child relationship and the child terminates always before the parent. Coroutines (the term was introduced by Melvin Conway1) are a generalization of routines (Donald Knuth2). The principal difference between coroutines and routines is
    [Show full text]
  • Object-Oriented Programming: Some History, and Challenges for the Next fifty Years
    Object-oriented programming: Some history, and challenges for the next fifty years Andrew P. Black Portland State University, Portland, Oregon, USA Abstract Object-oriented programming is inextricably bound up with the pioneering work of Ole-Johan Dahl and Kristen Nygaard on the design of the Simula language, which started at the Norwegian Computing Center in the Spring of 1961. However, object orientation, as we think of it today | fifty years later | is the result of a complex interplay of ideas, constraints and people. Dahl and Nygaard would certainly recognize it as their progeny, but might also be amazed at how much it has grown up. This article is based on a lecture given on 22nd August 2011, on the occasion of the scientific opening of the Ole-Johan Dahl Hus at the University of Oslo. It looks at the foundational ideas from Simula that stand behind object-orientation, how those ideas have evolved to become the dominant programming paradigm, and what they have to offer as we approach the challenges of the next fifty years of informatics. Keywords: 1. Introduction On 22nd August 2011, a public event was scheduled to open both the 18th International Symposium on Fundamentals of Computation Theory and the Ole-Johan Dahl hus, the new building that is home to the University of Oslo's Department of Informatics, and which is shown in Figure 1. The morning Email address: [email protected] (Andrew P. Black) URL: http://www.cs.pdx.edu/~black (Andrew P. Black) Preprint submitted to Information and Computation May 16, 2012 session opened with an Introduction by Morten Dæhlen, which was followed by two invited talks, one by Andrew Black and one by Jose Meseguer, and a discussion panel on the future of object-orientation and programming lan- guages, which was chaired by Arne Maus, and comprised Andrew P.
    [Show full text]
  • Tackling the Awkward Squad for Reactive Programming
    Tackling the Awkward Squad for Reactive Programming: The Actor-Reactor Model Sam Van den Vonder Software Languages Lab, Vrije Universiteit Brussel, Belgium Thierry Renaux Software Languages Lab, Vrije Universiteit Brussel, Belgium Bjarno Oeyen Software Languages Lab, Vrije Universiteit Brussel, Belgium Joeri De Koster Software Languages Lab, Vrije Universiteit Brussel, Belgium Wolfgang De Meuter Software Languages Lab, Vrije Universiteit Brussel, Belgium Abstract Reactive programming is a programming paradigm whereby programs are internally represented by a dependency graph, which is used to automatically (re)compute parts of a program whenever its input changes. In practice reactive programming can only be used for some parts of an application: a reactive program is usually embedded in an application that is still written in ordinary imperative languages such as JavaScript or Scala. In this paper we investigate this embedding and we distill “the awkward squad for reactive programming” as 3 concerns that are essential for real-world software development, but that do not fit within reactive programming. They are related to long lasting computations, side-effects, and the coordination between imperative and reactive code. To solve these issues we design a new programming model called the Actor-Reactor Model in which programs are split up in a number of actors and reactors. Actors and reactors enforce a strict separation of imperative and reactive code, and they can be composed via a number of composition operators that make use of data streams.
    [Show full text]