Interactively Verifying Absence of Explicit Information Flows in Android Apps
Osbert Bastani    Saswat Anand    Alex Aiken
Stanford University
[email protected]    [email protected]    [email protected]

Abstract

App stores are increasingly the preferred mechanism for distributing software, including mobile apps (Google Play), desktop apps (Mac App Store and Ubuntu Software Center), computer games (the Steam Store), and browser extensions (Chrome Web Store). The centralized nature of these stores has important implications for security. While app stores have unprecedented ability to audit apps, users now trust hosted apps, making them more vulnerable to malware that evades detection and finds its way onto the app store. Sound static explicit information flow analysis has the potential to significantly aid human auditors, but it is handicapped by high false positive rates. Instead, auditors currently rely on a combination of dynamic analysis (which is unsound) and lightweight static analysis (which cannot identify information flows) to help detect malicious behaviors.

We propose a process for producing apps certified to be free of malicious explicit information flows. In practice, imprecision in the reachability analysis is a major source of false positive information flows that are difficult to understand and discharge. In our approach, the developer provides tests that specify what code is reachable, allowing the static analysis to restrict its search to tested code. The app hosted on the store is instrumented to enforce the provided specification (i.e., executing untested code terminates the app). We use abductive inference to minimize the necessary instrumentation, and then interact with the developer to ensure that the instrumentation only cuts unreachable code. We demonstrate the effectiveness of our approach in verifying a corpus of 77 Android apps—our interactive verification process successfully discharges 11 out of the 12 false positives.

Categories and Subject Descriptors F.3.2 [Semantics of Programming Languages]: Program analysis; F.3.1 [Specifying and Verifying and Reasoning about Programs]: Mechanical verification

Keywords interactive verification; abductive inference; specifications from tests

1. Introduction

Android malware has become increasingly problematic as the popularity of the platform has skyrocketed in the past few years [67]. App stores currently identify malware using a two-step process: first, they use an automated malware detection pipeline to flag suspicious apps, and then a human auditor manually reviews flagged apps. The detection pipeline typically combines dynamic analysis (e.g., dynamic information flow [20]) and static analysis (e.g., identifying execution of dynamically loaded code [27]); subsequent manual examination is necessary both because the static analysis may be imprecise and because determining if suspicious behavior is malicious requires understanding the behavior's purpose (e.g., a map app should not send location data over SMS, but it may be okay for a location sharing app to do so). For example, Google uses a combination of dynamic analysis [7] and manual analysis [6] to audit Android apps on Google Play.

Important families of malware include apps that leak location, device ID, or contact information, and malware that covertly sends SMS messages to premium numbers [24, 66]. Static and dynamic analyses for finding explicit information flows (i.e., flows arising only from data dependencies [52]) can be used to identify such malware [8, 20, 21, 24, 25].
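The enforcement mechanism proposed in the abstract, where an instrumented app terminates as soon as it executes code not covered by the developer's tests, can be sketched concretely. The following is a minimal illustrative model, not the paper's implementation: the statement labels, the state-as-dictionary representation, and the SpecViolation exception are our own assumptions.

```python
class SpecViolation(Exception):
    """Execution reached code not covered by the developer's tests."""

def instrument(program, tested):
    """program: {label: function(state) -> state}; tested: set of labels
    covered by the developer's tests. Wraps every statement so that
    reaching an untested one terminates the app (here: raises)."""
    def guard(label, stmt):
        def guarded(state):
            if label not in tested:
                raise SpecViolation(label)  # enforce the reachability spec
            return stmt(state)
        return guarded
    return {lbl: guard(lbl, stmt) for lbl, stmt in program.items()}

def run(program, trace):
    """Execute a sequence of statement labels against an empty state."""
    state = {}
    for label in trace:
        state = program[label](state)
    return state

# A toy app: reading the device ID is tested; sending it over SMS is not.
app = {
    "read_id": lambda s: {**s, "id": "device-42"},
    "show_map": lambda s: s,
    "send_sms": lambda s: {**s, "leaked": s["id"]},
}
safe = instrument(app, tested={"read_id", "show_map"})
```

In this toy model, the tested trace runs normally, while any trace reaching the untested send_sms statement is cut short, which is what allows a static analysis to soundly ignore untested code.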
While explicit information flows do not describe all malicious behaviors, we focus on them because they can be used to characterize a significant portion (60% according to [21]) of existing malware, especially in conjunction with additional light-weight static analyses [24, 27].

The drawback of dynamic explicit information flow analysis is the potential for false negatives. Malware can easily avoid detection by making the relevant information flow difficult to trigger. For example, the malware can detect that the execution environment is an emulator and disable malicious behaviors [61]. While static explicit information flow analysis avoids this problem, it can have a high false positive rate. Because of their global nature, false positive information flows can be time-consuming for a human auditor to understand and discharge. Furthermore, the rate at which the auditor can evaluate suspicious applications is likely the bottleneck for finding malware, implying that very few benign apps should be flagged for manual review—i.e., the malware detection pipeline must have a low false positive rate. As a consequence, app stores currently rely on approaches that achieve low false positive rates (possibly at the expense of higher false negative rates), thereby excluding static explicit information flow analysis.

Our goal is to design a sound verification process for enforcing the absence of malicious explicit information flows that guarantees no false negatives and maintains a low false positive rate. Our first key observation is that for static explicit information flow analysis, a large number of false positives are flows through unreachable code. Such false positives result from imprecision in the static reachability analysis, which can be a major problem for static analyses [9, 16]. In our setting, this imprecision is caused both by an imprecise callgraph (due to virtual method calls) and by the lack of path sensitivity. Using sound assumptions about possible entry points of an Android app can also lead to imprecision. In our experiments, 92% of false positives were flows through unreachable code. Oftentimes, the unreachable code is found in large third-party libraries used by the app.

Of course, a malware developer can attempt to evade detection by specifying that the malicious code is unreachable. Our solution is simple: we enforce the developer-provided specifications by instrumenting the app to terminate if code not specified to be reachable (e.g., not covered by any of the developer-provided tests) is actually reached during runtime. The instrumented app is both consistent with the developer's specifications, and statically verified to be free of explicit information flows.

In practice, it may be difficult for developers to provide tests covering all reachable code. Therefore, we take an iterative approach to obtaining tests. To enforce the security policy, it is only necessary to terminate the app if it reaches untested code that may also lead to a malicious explicit information flow. Rather than instrument all untested program statements, we find a minimum size set of untested statements (called a cut) such that instrumenting these statements to terminate execution produces an app that is free of explicit information flows, and then propose this cut to the developer. If the developer finds the cut unsatisfactory, then the developer can provide new tests (or other reachability information) and repeat the process; otherwise, if the developer finds the cut satisfactory, then the cut is enforced via instrumentation as before. This process repeats until either a satisfactory cut is produced, or no satisfactory cut exists (in which case the auditor must manually review the app). We call this process interactive verification.
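A brute-force sketch may clarify the cut computation in this loop. Here each flow is modeled as the set of statements along one potential source-to-sink path, and we search for a smallest set of untested statements intersecting every flow. The flow sets and the min_cut helper are hypothetical illustrations of ours; the paper computes cuts via abductive inference rather than by enumeration.

```python
from itertools import chain, combinations

def min_cut(flows, tested):
    """flows: each a set of statements making up one potential
    source-to-sink information flow; tested: statements specified to be
    reachable. Returns a smallest set of *untested* statements whose
    instrumentation interrupts every flow, or None if some flow lies
    entirely within tested code. Brute force, for illustration only."""
    candidates = sorted(set(chain.from_iterable(flows)) - tested)
    for k in range(len(candidates) + 1):
        for cut in combinations(candidates, k):
            # A candidate cut works iff it hits every flow.
            if all(set(cut) & flow for flow in flows):
                return set(cut)
    return None
```

If min_cut returns None, some flow lies entirely within tested code, so no instrumentation can discharge it and the app must go to the auditor; otherwise the proposed cut is shown to the developer, who either accepts it or supplies tests covering the cut statements and reruns the computation.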
Our second key observation is that currently the burden of identifying and discharging false positives is placed entirely on the auditor, despite the fact that the developer is most familiar with the app's code. Our approach shifts some of this burden onto the developer: we require that the developer specify which methods are reachable in the app. These reachability specifications allow the static analysis to restrict its search space to reachable code, thereby reducing the false positive rate.

In practice, we envision that the developer will provide reachability specifications by supplying tests that exercise the app code—the specification we extract is that only tested code is reachable. We use tests to avoid the scenario where a developer insists (either maliciously or to avoid effort) that everything is reachable, thereby wasting auditor time and eliminating the benefits of our approach. Tests are executable, which means that the auditor can verify the correctness of the specifications. Using tests as specifications has a number of additional advantages. First, developers routinely write tests, so this approach both leverages existing tests and gives developers a familiar interface to the specification process. Second, concrete test cases can benefit the auditor in case the app must be manually examined.

If the developer allows (accidentally or maliciously) reachable code to be instrumented, then it may be possible for the app to terminate during a benign execution. To make the process more robust against such failures, we can produce multiple, disjoint cuts. We then instrument the program to terminate only if at least one statement from every cut is reached during an execution of the app.

More formally, our goal is to infer a predicate λ (the statements in the cut) such that the security policy φ (lack of explicit information flows) holds provided λ holds. We compute λ using abductive inference [18]: given facts χ (extracted from the app P) and security policy φ, abductive inference computes a minimum-size predicate λ such that (i) χ ∧ λ ⊨ φ (i.e., λ together with the known program facts χ suffice to prove the security policy φ) and (ii) SAT(χ ∧ λ) (i.e., λ is consistent with the known program facts χ). In our setting, we augment χ with facts extracted from the tests. Then the app P is instrumented to ensure that λ holds, producing a verified app P′. Finally, we extend this process to infer multiple disjoint cuts λ1, …, λn, and instrument P to terminate only if every λi fails.

We propose a novel algorithm for solving the abductive inference problem for properties formulated in terms of
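The abductive inference formulation above, χ ∧ λ ⊨ φ together with SAT(χ ∧ λ), can be illustrated by brute-force model enumeration. We model each propositional variable as "statement s is reached" and λ as a set of statements assumed unreachable; the abduce helper and the toy policy below are our own sketch, not the paper's algorithm.

```python
from itertools import combinations, product

def abduce(chi, phi, variables, candidates):
    """Brute-force abductive inference sketch. chi and phi are predicates
    over assignments {variable: bool}. Returns a smallest lam subset of
    candidates such that assuming every statement in lam is unreachable
    (i) makes chi entail phi and (ii) remains consistent with chi."""
    assigns = [dict(zip(variables, bits))
               for bits in product([False, True], repeat=len(variables))]
    for k in range(len(candidates) + 1):
        for lam in combinations(sorted(candidates), k):
            # Models of chi ∧ lam: chi holds and no lam statement is reached.
            consistent = [m for m in assigns
                          if chi(m) and not any(m[v] for v in lam)]
            # SAT(chi ∧ lam) and chi ∧ lam ⊨ phi.
            if consistent and all(phi(m) for m in consistent):
                return set(lam)
    return None

# Toy policy: a flow exists iff both "read_id" and "send_sms" execute.
phi = lambda m: not (m["read_id"] and m["send_sms"])
chi = lambda m: True  # no extra program facts in this tiny example
```

Asking for a cut among the untested statements, say candidates = {"send_sms"}, then yields the single-statement cut {"send_sms"}; if χ itself forces both statements to execute, no consistent λ exists and abduce returns None, which corresponds to sending the app to the auditor.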