CRYPTOGUARD: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Projects

Sazzadur Rahaman1, Ya Xiao1, Sharmin Afrose1, Fahad Shaon2, Ke Tian1, Miles Frantz1, Danfeng (Daphne) Yao1, Murat Kantarcioglu2

1Computer Science, Virginia Tech, Blacksburg, VA; 2Computer Science, University of Texas at Dallas, Dallas, TX
{sazzad14,yax99,sharminafrose,ketian,frantzme,danfeng}@vt.edu, {Fahad.Shaon,muratk}@utdallas.edu

ABSTRACT

Cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification, seriously threaten software security. The vision of automatically screening cryptographic API calls in massive-sized (e.g., millions of LoC) Java programs is not new. However, hindered by the practical difficulty of reducing false positives without compromising analysis quality, this goal has not been accomplished. State-of-the-art crypto API screening solutions are not designed to operate on a large scale. Our technical innovation is a set of fast and highly accurate slicing algorithms. Our algorithms refine program slices by identifying language-specific irrelevant elements. The refinements reduce false alerts by 76% to 80% in our experiments. Running our tool, CRYPTOGUARD, on 46 high-impact large-scale Apache projects and 6,181 Android apps generates many security insights. Our findings helped multiple popular Apache projects to harden their code, including Spark, Ranger, and Ofbiz. We have also made substantial progress towards the science of analysis in this space, including: i) manually analyzing 1,295 Apache alerts and confirming 1,277 true positives (98.61% precision), ii) creating a benchmark with 38 basic unit cases and 74 advanced unit cases, and iii) performing an in-depth comparison with leading solutions including CrySL, SpotBugs, and Coverity. We are in the process of integrating CRYPTOGUARD with the Software Assurance Marketplace (SWAMP).

KEYWORDS

accuracy, cryptographic API misuses, static program analysis, false positive, benchmark

1 INTRODUCTION

Cryptographic algorithms offer provable security guarantees in the presence of adversaries. However, vulnerabilities and deficiencies in low-level cryptographic implementations seriously reduce the guarantees in practice [15, 24, 27, 35, 36]. Researchers also found that misusing cryptographic APIs is not unusual in application-level code [31]. Causes of these vulnerabilities are multi-fold, including complex APIs [12, 55], the lack of cybersecurity training [52], the lack of tools [14], and insecure and misleading forum posts (such as on StackOverflow) [13, 52]. Some aspects of security libraries (such as JCA, JCE, and JSSE1) are difficult for developers to use correctly, e.g., certificate verification [37] and cross-language encryption and decryption [52].

1JCA, JCE, and JSSE stand for Java Cryptography Architecture, Java Cryptography Extension, and Java Secure Socket Extension, respectively.

In this work, we focus on the goal of screening massive-sized Java projects for cryptographic API misuses. Specifically, we aim to design a static analysis tool that has no or few false positives (i.e., false alarms) and can be routinely used by developers.

Efforts to screen cryptographic APIs have been previously reported in the literature, including static analysis (e.g., CrySL [44], FixDroid [57], CogniCrypt [43], CryptoLint [31]) and dynamic analysis (e.g., SMV-Hunter [65] and AndroSSL [34]), as well as manual code inspection [37]. Static and dynamic analyses have their respective pros and cons. Static methods do not require the execution of programs. They scale up to a large number of programs, cover a wide range of security rules, and are unlikely to have false negatives (i.e., missed detections). Dynamic methods, in comparison, require one to trigger and detect specific misuse symptoms at runtime (e.g., misconfigurations of SSL/TLS). The advantage of dynamic approaches is that they tend to produce fewer false positives (i.e., false alarms) than static analysis. Deployment-grade code screening tools need to be scalable with wide coverage. Thus, the static program analysis approach is favorable. However, existing static analysis-based tools (e.g., [31, 43, 44, 57]) are not optimized to operate on the scale of massive-sized Java projects (e.g., millions of LoC), which we explain later.

Existing static analysis tools are also limited in detecting SSL/TLS API misuses and are not designed to detect complex misuse scenarios. For example, MalloDroid [33] uses a list of known insecure implementations of HostnameVerifier and TrustManager to screen apps. Google Play recently deployed an automatic app checking mechanism for SSL/TLS hostname verifier and certificate verification vulnerabilities [11]. However, the inspection appears to only target obvious misuse scenarios, e.g., return true in the verify method or an empty body in checkServerTrusted [4].

We made substantial progress toward building a high-accuracy and low-runtime static analysis solution for detecting cryptographic and SSL/TLS API misuse vulnerabilities. Our tool, CRYPTOGUARD, is built on specialized forward and backward program slicing techniques. These slicing algorithms are implemented using flow-, context-, and field-sensitive data-flow analysis.

Although program slicing is a well-known technique for identifying the set of instructions that influence or are influenced by a program variable, its direct application to screening cryptographic implementations has several problems, which are explained next.
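For concreteness, the two "obvious" misuse patterns mentioned above look roughly like the following sketch; the class and field names are illustrative and not taken from any particular app.

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;
import javax.net.ssl.X509TrustManager;
import java.security.cert.X509Certificate;

// Hypothetical container for the two patterns Google Play's screening targets.
class TrustAllExamples {

    // Obvious misuse 1: a HostnameVerifier whose verify method simply returns true,
    // so every hostname is accepted.
    static final HostnameVerifier ACCEPT_ALL = new HostnameVerifier() {
        @Override
        public boolean verify(String hostname, SSLSession session) {
            return true; // no hostname check at all
        }
    };

    // Obvious misuse 2: an X509TrustManager whose checkServerTrusted body is empty,
    // so every certificate chain is accepted.
    static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }
        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }
        @Override
        public X509Certificate[] getAcceptedIssuers() { return new X509Certificate[0]; }
    };
}

More sophisticated variants of these patterns, which such syntactic screening misses, are discussed in Sections 3 and 6.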

Detection accuracy. A challenging problem that has not been solved by prior work is the excessive number of false positives that basic static analysis (including slicing) generates. Several types of detection require one to search for constants or values from predictable APIs, e.g., passwords, seeds, or initialization vectors (IVs). However, benign constants or irrelevant parameters may be mistaken as violations (e.g., array/collection bookkeeping constants). Another source of detection inaccuracy comes from the assumption that all the system and runtime libraries are present during the analysis. This assumption holds for Android apps (e.g., CrySL [44], CryptoLint [31]), but not necessarily for Java projects.

A feature of our solution CRYPTOGUARD is a set of refinement algorithms that systematically discard false alerts. These refinement insights are derived from empirical observations of common programming idioms and language restrictions, and they remove irrelevant resource identifiers, arguments about states of operations, constants on infeasible paths, and bookkeeping values. For eight of our rules, these refinement algorithms reduce the total number of alerts by 76% in Apache and 80% in Android (Figure 3). Our manual analysis shows that CRYPTOGUARD has a precision of 98.61% on Apache.

Runtime overhead and coverage. Existing flow-, context-, and field-sensitive analysis techniques build a super control-flow graph of the entire program, which has a significant impact on runtime. In contrast, our on-demand slicing algorithms run much faster: they start from the slicing criteria and only propagate to the methods that have the potential to impact security. Hence, a large portion of the code base is not touched. For the Apache projects we evaluated, CRYPTOGUARD took around 3.3 minutes on average.

More importantly, our lightweight analysis building blocks enable us to address complex API misuse scenarios. CRYPTOGUARD covers more cryptographic properties than CrySL [44], Coverity [1], and SpotBugs [2] combined. Our most complex analysis (for Rule 15 on insecure RSA/ECC key sizes) involves multiple rounds of forward and backward slicing.

Our technical contributions are summarized as follows.

• We designed and implemented a set of new analysis algorithms for detecting cryptographic and SSL/TLS API misuses. Our static code checking tool, CRYPTOGUARD, is designed for developers to use routinely on large Java projects. Besides open-sourcing CRYPTOGUARD2, we are currently integrating it with the Software Assurance Marketplace (SWAMP) [30], a well-known free software security analysis platform.

• We gained numerous security insights from screening 46 Apache projects. For 15 of our rules, we observed violations in Apache projects (Table 6). 39 out of the 46 projects have at least one type of cryptographic misuse, and 33 projects have at least two. We reported our security findings to Apache, some of which have been promptly fixed. In Section 7, we share our experience of interacting with Apache teams and their pragmatic constraints, e.g., backward compatibility and operation in humanless settings.

• Our evaluation on 6,181 Android apps shows that around 95% of the total vulnerabilities come from libraries that are packaged with the application code. Some libraries are from Google, Facebook, Apache, Umeng, and Tencent (Table 5). We observe violations in most of the categories, including hardcoded KeyStore passwords; e.g., notasecret is used in multiple Google libraries (Table 4). We also detected multiple SSL/TLS (MitM) vulnerabilities that Google Play's automatic screening seemed to have missed.

• We created a benchmark named CRYPTOAPI-BENCH with 112 unit test cases.3 CRYPTOAPI-BENCH contains basic intra-procedural instances, inter-procedural cases, field-sensitive cases, false positive tests, and correct API uses. Our evaluation on CRYPTOAPI-BENCH shows that CRYPTOGUARD achieves higher precision and recall than Coverity, SpotBugs, and CrySL [44], which is the state-of-the-art research solution. The benchmark also reveals false negatives that CRYPTOGUARD needs to improve on in the future.

Our key technical novelty and significance are summarized as follows. [Formulation of problems] We present the mappings between a number of cryptographic abstractions and concrete Java programming elements that can be statically enforced. The mapping strategy (including specific slicing criteria) is useful beyond CRYPTOGUARD (Section 3). [Methodology development] We specialize program slicing with new language-based contextual refinement algorithms and show a significant reduction of false alarms (related to constants and predictable values). This is a substantial advancement over general-purpose slicing and state-of-the-art solutions (Section 5). [New security capabilities] Our lightweight algorithm design enables CRYPTOGUARD to check more rules than existing solutions, while maintaining high precision. [New security findings] CRYPTOGUARD enables us to report a number of alarming cryptographic coding issues in open-source Apache projects and Android (Sections 6.1 and 6.2). [Science of security] Our CRYPTOAPI-BENCH will motivate researchers to improve the accuracy, coverage, scalability, and transparency of their tools, collectively advancing the science of security (Section 6.3).

2 THREAT MODEL, CHALLENGES, AND OVERVIEW

We describe our threat model and discuss the technical challenges associated with detecting these threats with static program analysis. For each challenge, we briefly overview our solution.

2.1 Threat Model

We summarize the vulnerabilities that CRYPTOGUARD aims to detect below and in Table 1. We also rank their severity.

1. Vulnerabilities due to predictable secrets. Software with predictable cryptographic keys and passwords is inherently insecure [31]. Here, we consider the use of any constants, as well as values that are derived from constants or from API calls with predictable outputs (e.g., DeviceID, timestamps), to be insecure.

2. Vulnerabilities from MitM attacks on SSL/TLS. Improper customization of Java Secure Socket Extension (JSSE) APIs may result in man-in-the-middle (MitM) vulnerabilities [33, 37]. CryptoLint [31] does not detect these vulnerabilities.
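As a minimal sketch of vulnerability category 1 (class, constant, and method names are hypothetical), a compile-time constant used as an AES key ships with the binary and is therefore predictable:

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: the key is a string literal, so anyone with the binary can decrypt.
class HardcodedKeyExample {
    private static final String KEY = "0123456789abcdef"; // predictable, hardcoded secret

    static byte[] encrypt(byte[] plaintext) throws Exception {
        SecretKeySpec key = new SecretKeySpec(KEY.getBytes("UTF-8"), "AES");
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(plaintext);
    }
}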

2Available at https://github.com/CryptoGuardOSS/cryptoguard under GPL v3.0.
3Our benchmark is available at https://github.com/CryptoGuardOSS/cryptoapi-bench.

Figure 1(a) (reconstructed code listing; the line numbers are referenced in the text):

 1  class PasswordEncryptor {
 2
 3    Crypto crypto;
 4
 5    public PasswordEncryptor(){
 6      String passKey = PasswordEncryptor.getKey("pass.key");
 7      crypto = new Crypto(passKey);
 8    }
 9
10    byte[] encPass(String [] arg){
11      return crypto.encrypt(arg[0], arg[1]);
12    }
13
14    static String getKey(String src){
15      String key = Context.getProperty(src);
16      if (key == null){
17        key = "defaultkey";
18      }
19      return key;
20    }
21  }
22  class Crypto {
23
24    String ALGO = "AES";
25    String ALGO_SPEC = "AES/CBC/NoPadding";
26    String defaultKey;
27    Cipher cipher;
28
29    public Crypto(String defKey){
30      cipher = Cipher.getInstance(ALGO_SPEC);
31      defaultKey = defKey; // assigning field
32    }
33
34    byte[] encrypt(String txt, String key){
35      if (key == null){
36        key = defaultKey;
37      }
38      byte[] keyBytes = key.getBytes("UTF-8");
39      byte[] txtBytes = txt.getBytes();
40      SecretKeySpec keySpc = new SecretKeySpec(keyBytes, ALGO);
41      cipher.init(Cipher.ENCRYPT_MODE, keySpc);
42      return cipher.doFinal(txtBytes);}}

Figure 1(b) is a partial data dependency graph for the keyBytes variable; its nodes include "pass.key", getKey(..), Context.getProperty(..), "defaultkey", PasswordEncryptor(..), passKey, Crypto(..), defKey, Crypto:defaultKey, encPass(..), arg, "UTF-8", getBytes(..), encrypt(..), and keyBytes, with edges annotated as parameter influence, field influence, orthogonal invocations, and orthogonal influence.

Figure 1: (a) An example demonstrating various features of CRYPTOGUARD. The Crypto class is used for generic AES encryption and the PasswordEncryptor class uses Crypto for password encryption. f indicates influence through fields and p indicates influence through method parameters. (b) Partial data dependency graph for the keyBytes variable.

3. Vulnerabilities from predictable PRNGs. The predictability of pseudorandom number generators (PRNGs) has been a major source of vulnerabilities [19, 38, 39]. The use of java.util.Random as a PRNG is insecure [6, 42]. In addition, seeds for java.security.SecureRandom [7] should not be predictable.

4. Vulnerabilities from CPA. Ciphertexts should be indistinguishable under chosen plaintext attacks (CPA) [31]. Static salts make dictionary attacks easier on password-based encryption (PBE). In addition, static initialization vectors (IVs) in cipher block chaining (CBC) and electronic codebook (ECB) modes are insecure [18, 46].

5. Vulnerabilities from feasible bruteforce attacks. MD5 and SHA1 are susceptible to hash collision [66, 67] and pre-image [8, 26] attacks. In addition, bruteforce attacks are feasible for 64-bit symmetric ciphers (e.g., DES, 3DES, IDEA, Blowfish) [20]. 1024-bit RSA/DSA/DH and 160-bit ECC are also weak [3]. RFC 8018 recommends at least 1,000 iterations for PBE [53].

How severe are these vulnerabilities? Each case has specific attack scenarios documented in the literature. To prioritize alerts, we categorize their severity into high, medium, and low, based on i) the attacker's gain and ii) the attack difficulty. Vulnerabilities from predictable secrets and SSL/TLS MitM are immediately exploitable and substantially benefit attackers. In Android, an application can only access its own KeyStore. Hence, hard-coded passwords are less harmful in Android. However, privilege escalation attacks bypass this restriction, which has been demonstrated [69]. Commercially available rainbow tables allow attackers to easily obtain pre-images of MD5 and SHA1 hashes for typical passwords [9]. Hash collisions for these algorithms enable attackers to forge digital signatures or break the integrity of any messages [21, 66]. Therefore, these vulnerabilities are classified as high risks. Vulnerabilities from predictability and CPA provide substantial advantages to attackers by significantly reducing attack efforts. They are medium-level risks. Brute-forcing ciphers, which requires non-trivial effort, is low risk.

2.2 Technical Challenges and Solution Overview

The task of screening millions of lines of code for cryptographic API misuses poses a set of technical challenges.

Technical Challenge I: False positives.

1. False positives due to phantom methods. A method is phantom if its body is not available during analysis. Unlike Android apps, Java web applications have phantom libraries. A non-system library that is not packaged with the project binaries is referred to as a phantom library. Existing cryptographic misuse vulnerability solutions (e.g., CryptoLint [31], CrySL [44]) are not designed to handle phantom libraries, which may cause false positives. For example, in Figure 1(a), if the class Context is a member of a phantom library, then the getProperty method (Line 15) is a phantom method. The data-flow diagram in Figure 1(b) shows that a straightforward def-use analysis would likely report pass.key as a hard-coded key, since it cannot explore the getProperty method at Line 15.

Our solution is a set of new algorithms to refine slicing outputs (Section 5). For example, examining the context reveals that pass.key is used as an identifier of a key and has no security influence on keyBytes. Thus, it can be safely discarded.

2. False positives due to data structures. Constants for bookkeeping data structures are another major source of false positives that is largely uncovered in the existing literature (e.g., [31, 44]). The most frequently used data structures include lists, maps, and arrays. For example, a data-structure-unaware analysis would likely report 1 from Line 11 (Figure 1(a)) as a hard-coded key, as it influences the key parameter of the encrypt method (Figure 1(b)). Our refinement algorithms track and discard any kind of data-structure-bookkeeping constant (Section 5).

Technical Challenge II: precision vs. runtime tradeoff. For a project with millions of LoC (e.g., Hadoop has around 2.5 million LoC), building a super-CFG is costly and unnecessary. Cryptographic functionality is often confined within a small fraction of the project. However, most flow-, context-, and field-sensitive analysis-based tools (e.g., [31, 44]) appear to build a super control-flow graph, e.g., by superimposing the project's call graph over the control-flow graphs of methods, adding call edges between invoke instructions, method entries, and exits. In contrast, we adopt the following more scalable approaches.

1. Demand-driven analysis. Our flow- and context-sensitive analysis is demand driven. Initially, it only creates a call graph. During the analysis, it performs on-demand inter-procedural backward data-flow analysis to perform backward slicing, where the analysis starts from the slicing criteria and propagates upward and orthogonally on demand. For example, in Figure 1(a), a propagation from the encrypt method to the encPass method is an upward propagation. Propagations to the orthogonal method invocations at Lines 6 and 38 are orthogonal propagations. Our field sensitivity is also demand driven. Field sensitivity is applied to a field if the field is directly or indirectly used in our inter-procedural backward slices. A field's influence is considered indirect if the field is accessed using orthogonal method invocations (i.e., getter methods). We refer to this field sensitivity as data-only class field-sensitivity.

2. Control the depth of orthogonal explorations. Most of our cryptographic vulnerabilities involve finding constants. A distinguishing feature of constants is that they require little or no processing before use. Generally, processing is done by orthogonal method invocations. Thus, we explore the trade-off between the depth of orthogonal explorations and the runtime/accuracy of the analysis, by clipping orthogonal explorations at a predetermined level with a low likelihood of causing false negatives. (We set the depth to 1 in our experiments.) We then use techniques similar to our phantom-method handling to reduce the false positives introduced by clipping.

3. Subproject awareness. Code in large Java projects is usually organized into reusable subprojects, packaged as separate .jars. One may create a fat jar by including all subprojects to resolve inter-subproject dependencies, which would unnecessarily increase the call graph generation time and size. In contrast, CRYPTOGUARD creates and consults a directed acyclic graph (DAG) representing subproject dependencies. It helps i) exclude unnecessary subprojects and ii) analyze independent subprojects concurrently.

3 MAPPING VULNERABILITIES TO PROGRAM ANALYSIS

It is important to map cryptographic properties to concrete Java programming elements that can be statically enforced. We break the detection plan into one or more abstract steps so that each step can be mapped to a single round of static analysis. In this section, we illustrate the process of mapping cryptographic vulnerabilities to concrete program analysis tasks. This mapping process is manual and only needs to be performed once for each vulnerability. In what follows, we use rule i to refer to the detection of vulnerability i in Table 1.

For example, in Rule 4 we detect the abuse of the HostnameVerifier interface. Ideally, an implementation of HostnameVerifier must use the javax.net.ssl.SSLSession parameter of the verify method to verify the hostname. Using the return statement as the slicing criterion, we perform intra-procedural backward slicing of the verify method to implement this rule.

Rule 5 is to detect the abuse of the X509TrustManager interface. We reduce the task to detecting three concrete cases: i) throwing no exception after validating a certificate in checkServerTrusted, ii) an unpinned self-signed certificate with only an expiration check, and iii) not providing a valid list of certificates in getAcceptedIssuers. For Case i), intuitively, our program analysis needs to search for the occurrences of throw or a propagated exception. throw is the slicing criterion in the (intra-procedural) backward slicing. Simple parsing is inadequate, as the analysis needs to learn the type of the thrown exception.

Rule 6 is to detect whether any method uses SSLSocket directly without performing hostname verification. Intuitively, to detect this vulnerability, we need to track whether an SSLSocket created from an SSLSocketFactory influences the SSLSession parameter of a verify method (of a HostnameVerifier) invocation. In addition, we also need to check whether the return value of the verify method is used in a condition-checking statement (e.g., if). For detection, we use forward program slicing to identify all the instructions that are influenced by the SSLSocketFactory instance. Among these instructions, we examine three cases: i) an SSLSocket is created, ii) an SSLSession is created and used in verify, and iii) the return value of the verify method is used to make decisions. These three cases represent a correct use of SSLSocket with proper hostname verification.

Rule 15 is to detect insecure asymmetric cipher configurations (e.g., 1024-bit RSA). A more concrete goal is to detect the use of an insecure default key size and an explicit definition of an insecure key size. The tasks of program analysis are to determine a) whether the key size is defined explicitly or by default, b) the statically defined key size, and c) the key generation algorithm. For Task a), our analysis uses forward slicing to determine whether the initialize method is invoked to set the key size of a key-pair generator. For Tasks b) and c), we use two rounds of backward program slicing to determine the key size and algorithm, respectively. We also employ on-demand field sensitivity for data-only classes in Task b). The analyses for Rule 15 are the most complex in CRYPTOGUARD.

Mappings for other rules are relatively straightforward and can be deduced from Table 1. For example, ↑ in Rules 1 and 2 means these rules are implemented using inter-procedural backward slicing, and ↓ indicates that inter-procedural forward slicing is used for on-demand data-only class field sensitivity. We also list the slicing criteria used for each rule in Tables 8, 9, and 10 in the Appendix.
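To make the Rule 15 mapping described above concrete, the two code shapes it distinguishes, an explicitly configured weak key size versus an unconfigured provider default, might look like the following sketch (class and method names are illustrative):

import java.security.KeyPair;
import java.security.KeyPairGenerator;

// Illustrative only: Rule 15 must handle both shapes.
class KeySizeExamples {
    static KeyPair explicitWeakKey() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        kpg.initialize(1024);          // explicitly defined, insecure key size
        return kpg.generateKeyPair();
    }

    static KeyPair defaultKeySize() throws Exception {
        KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
        // initialize(...) is never called: whether the provider default is acceptable
        // must be decided by the analysis (forward slicing checks for this call).
        return kpg.generateKeyPair();
    }
}

Forward slicing from the generator instance decides whether initialize is called at all; backward slicing then recovers the key-size constant and the algorithm string.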

Table 1: Cryptographic vulnerabilities, properties, and static analysis methods used. High, medium, and low risk levels are denoted by H/M/L, respectively. CPA stands for chosen plaintext attack, MitM for man-in-the-middle, and C/I/A for confidentiality, integrity, and authenticity, respectively. ↑ means backward slicing and ↓ means forward slicing. Slicing is inter-procedural unless otherwise specified (e.g., intra, both). Refinement insights are applied to all inter-procedural backward slicing.

No | Vulnerabilities | Attack Type | Crypto Property | Severity | Our Analysis Method
1 | Predictable/constant cryptographic keys | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing
2 | Predictable/constant passwords for PBE | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing
3 | Predictable/constant passwords for KeyStore | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing
4 | Custom Hostname verifiers to accept all hosts | SSL/TLS MitM | C/I/A | H | ↑ slicing (intra)
5 | Custom TrustManager to trust all certificates | SSL/TLS MitM | C/I/A | H | ↑ slicing (intra)
6 | Custom SSLSocketFactory w/o manual Hostname verification | SSL/TLS MitM | C/I/A | H | ↓ slicing (intra)
7 | Occasional use of HTTP | SSL/TLS MitM | C/I/A | H | ↑ slicing
8 | Predictable/constant PRNG seeds | Predictability | Randomness | M | ↑ slicing & ↓ slicing
9 | Cryptographically insecure PRNGs (e.g., java.util.Random) | Predictability | Randomness | M | Search
10 | Static Salts in PBE | CPA | Confidentiality | M | ↑ slicing & ↓ slicing
11 | ECB mode in symmetric ciphers | CPA | Confidentiality | M | ↑ slicing
12 | Static IVs in CBC mode symmetric ciphers | CPA | Confidentiality | M | ↑ slicing & ↓ slicing
13 | Fewer than 1,000 iterations for PBE | Brute-force | Confidentiality | L | ↑ slicing & ↓ slicing
14 | 64-bit block ciphers (e.g., DES, IDEA, Blowfish, RC4, RC2) | Brute-force | Confidentiality | L | ↑ slicing
15 | Insecure asymmetric ciphers (e.g., RSA, ECC) | Brute-force | C/A | L | ↑ slicing & ↓ slicing (both)
16 | Insecure cryptographic hash (e.g., SHA1, MD5, MD4, MD2) | Brute-force | Integrity | H | ↑ slicing

4 CRYPTO-SPECIFIC SLICING

We specialize static def-use analysis [71] and forward and backward program slicing [49] for detecting Java cryptographic API misuses. We break the detection strategy into one or more steps, so that each step can be realized with a single round of program slicing. After performing the slicing, each program slice is analyzed to find the presence of a vulnerability. Our 16 categories of vulnerabilities require different program analysis methods for detection. Table 1 summarizes the slicing techniques used to detect each of the vulnerabilities. General-purpose slicing alone is inadequate; thus, we explain our solution for overcoming the accuracy challenge in Section 5.

A definition of variable v is a statement that modifies v (e.g., declaration, assignment). A use of variable v is a statement that reads v (e.g., a method call with v as an argument). Def-use data-flow analysis, or def-use analysis, identifies the definition and use statements and describes their dependency relations. Given a slicing criterion, which is a statement or a variable in a statement (e.g., a parameter of an API), backward program slicing computes the set of program statements that affect the slicing criterion in terms of data flow. Given a slicing criterion, forward program slicing computes the set of program statements that are affected by the slicing criterion in terms of data flow. Given a program and a slicing criterion, a program slicer returns a list of program slices. Intra-procedural program slicing mechanisms use def-use analysis to compute slices. To confine inter-procedural backward slicing within security code regions, the analysis starts from cryptographic APIs and follows their influences recursively. This approach effectively skips the bulk of the functional code and substantially speeds up the analysis.

4.1 Slicing Criteria and Backward Slicing

We give the intuition behind selecting slicing criteria and then present our backward slicing techniques. The complete list of our slicing criteria and corresponding APIs is shown in Tables 8, 9, and 10 in the Appendix.

Slicing criteria. The choice of slicing criterion directly impacts the analysis outcomes. We choose slicing criteria based on several factors, including relevance to the vulnerability, simplicity of the checking rules, and whether they are shared across multiple projects. For inter-procedural backward slicing, the slicing criteria are defined as the parameters of a target method's invocation. For example, to find predictable secrets (in Rules 1-3), we use the key parameter of the constructors of SecretKeySpec as the slicing criterion. For intra-procedural backward slicing, we define three types of slicing criteria: i) parameters of a method, ii) assignments, and iii) throw and return. For example, to detect insecure hostname verifiers that accept all hosts (in Rule 4), we use the return statement in the verify method as the slicing criterion.

Intra-procedural backward slicing. The purpose of intra-procedural backward slicing is two-fold: it is used independently to enforce security rules as well as a building block of inter-procedural backward program slicing. The intra-procedural program slicing utilizes the def-use property of a statement to decide whether the statement should be included in a slice or not. Our implementation utilizes the worklist algorithm from the intra-procedural data-flow analysis framework of Soot. During this process, if any orthogonal method invocations are encountered, it recursively slices them to collect the arguments and statements that influence any field or return statements within those orthogonal methods. To reduce runtime overhead, such orthogonal method explorations are clipped at a pre-configurable depth (1 in our experiments). In this procedure, we use the refinement insights presented in Section 5 to exclude security-irrelevant instructions that basic use-def analysis cannot identify.

On-demand inter-procedural backward slicing. The main responsibility of this algorithm is the upward propagation of the analysis. Our inter-procedural backward slicing builds on intra-procedural backward slicing. The major steps of the algorithm are as follows.

$r1.setText("mytext"); 5 REFINEMENT FOR FP REDUCTION $r1.setKey("mykey"); We design a set of refinement algorithms to exclude security irrele- ... vant instructions to reduce false alarms. These refinement insights key = $r1.getKey(); (RI) are deduced by observing common programming idioms and Figure 2: Indirect field access using orthogonal invocations on language restrictions. We also discuss the possibility of false nega- data-only class object $r1. tives (i.e., missed detection).

i) We build a caller-callee relationship graph of all the methods of the program. The call-graph construction uses class-hierarchy analysis. ii) We identify all the callsites of the method specified in the slicing criterion. A callsite refers to a method invocation. iii) For all the callsites, we obtain all the inter-procedural backward slices by invoking intra-procedural slicing recursively to follow the caller chain. iv) Our procedure is field sensitive. Typical field initialization statements are assignments. After encountering a field assignment, the analysis follows the influences through fields, recursively.

4.2 Forward Slicing

Some of our analyses demand forward slicing, which inspects the statements occurring after the slicing criterion.

Intra-procedural forward slicing. We design intra-procedural forward slicing for Rules 6 (SSLSocketFactory w/o hostname verification) and 15 (weak asymmetric crypto). The operation of intra-procedural forward slicing is similar to that of intra-procedural backward slicing. In forward slicing, we choose assignments as the slicing criteria. The traversal follows the order of execution, i.e., going forward. Because the problematic code regions for Rules 6 and 15 are confined within a method, their forward slicing analyses do not need to be inter-procedural.

Inter-procedural forward slicing. Given an assign instruction or a constant as the slicing criterion, we perform inter-procedural forward slicing to identify the instructions that are influenced by the slicing criterion in terms of def-use relations. Our version of inter-procedural forward slicing operates on the slices obtained from inter-procedural backward program slicing. Our inter-procedural backward slicing produces an ordered collection of instructions combined from all visited methods.

We define a class as a data-only class if its fields are only visible within orthogonal method invocations. We use inter-procedural forward slicing for on-demand field sensitivity of data-only classes, as the field sensitivity during upward propagation (inter-procedural backward slicing) does not cover them. In Figure 2, $r1 is an object of a data-only class, where its fields are accessed indirectly with an orthogonal method (i.e., getKey) invocation. Given a constant, using inter-procedural forward slicing, CRYPTOGUARD determines whether the constant influences any field of a data-only class object and records it. Later on, when it encounters an assign invocation on the same object and observes that the previously recorded field influences the return statement, it reports the constant. Through this on-demand field sensitivity for data-only classes, CRYPTOGUARD knows that the constant mytext (Figure 2) is not a hard-coded key. ↓ in Table 1 represents the use of forward slicing for on-demand data-only class field sensitivity4.

4The current prototype uses this field sensitivity for 8 rules.

5 REFINEMENT FOR FP REDUCTION

We design a set of refinement algorithms that exclude security-irrelevant instructions to reduce false alarms. These refinement insights (RI) are deduced by observing common programming idioms and language restrictions. We also discuss the possibility of false negatives (i.e., missed detections).

5.1 Overview of Refinement Insights (RI)

Eight of our rules (1, 2, 3, 8, 10, 12, 13, and 15) require identifying constants/predictable values in a program slice. The purpose is to ensure that no data (e.g., cryptographic keys, passwords, IVs, and seeds) is hardcoded or solely derived from any hardcoded values. Use of any predictable values (e.g., Timestamp, DeviceID) is also insecure for Rules 1, 2, 3, and 8. However, there are many constant/predictable values that do not impact security. We refer to them as pseudo-influences. Pseudo-influences are a major source of false positives.

Based on empirical observations of common programming idioms and language restrictions, we invent five strategies to systematically remove irrelevant constants/predictable values from slices and reduce pseudo-influences, which are summarized next. For eight of our rules, these refinement insights yield a 76% reduction in total alerts for Apache projects and an 80% reduction for Android applications (Section 5.4). In CRYPTOGUARD, rule checkers apply these refinements on constants/predictable values in program slices to remove pseudo-influences.

• RI-I: Removal of state indicators. We discard constants/predictable values that are used to describe the state of a variable during an orthogonal method invocation.

• RI-II: Removal of resource identifiers. We discard constants/predictable values that are used as the identifier of a value source during an orthogonal method invocation.

• RI-III: Removal of bookkeeping indices. We discard constants/predictable values that are used as the index or size of any data structure. Specifically, RI-III discards any influences on i) the size parameter of an array or collection instantiation, ii) indices of an array, and iii) indices of a collection.

• RI-IV: Removal of contextually incompatible constants. We discard constants/predictable values if their types are incompatible with the analysis context. For example, a boolean variable cannot be used as a key, IV, or salt.

• RI-V: Removal of constants in infeasible paths. Some constant initializations are updated along the path to the slicing criterion. We need to discard the initializations that do not have a valid path of influence to the criterion.

RI-I, RI-II, and RI-IV are used to handle the clipping of orthogonal method explorations, which can occur due to phantom method invocations or pre-configured clipping at a certain depth. RI-III is used to achieve data structure awareness, and RI-V is used to compensate for path insensitivity. The breakdown of the total reduction of false alarms in our experiments shows that RI-II and RI-III are most effective in Apache and Android apps (Figure 4). In the next two subsections, we highlight the details of the two refinement insights based on removing state indicators and resource identifiers. Details for the other RIs can be found in Section 10.1 of the Appendix.
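As a small, hedged illustration of RI-III and RI-IV (the class and variable names are hypothetical), the following constants would appear in a backward slice of the SecretKeySpec key parameter but carry no secret material:

import java.util.List;
import javax.crypto.spec.SecretKeySpec;

// Illustrative only: the constants 0, 16, and true influence statements near the key,
// but none of them is a secret, so RI-III and RI-IV discard them.
class PseudoInfluenceExamples {
    static SecretKeySpec build(List<String> algorithms, byte[] keyMaterial) {
        String algorithm = algorithms.get(0);   // RI-III: 0 is a collection index
        byte[] buffer = new byte[16];           // RI-III: 16 only sizes the array
        boolean strict = true;                  // RI-IV: a boolean cannot be a key, IV, or salt
        if (strict && keyMaterial.length < buffer.length) {
            throw new IllegalArgumentException("key material too short");
        }
        System.arraycopy(keyMaterial, 0, buffer, 0, buffer.length);
        return new SecretKeySpec(buffer, algorithm);
    }
}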

5.2 RI-I: Removal of State Indicators

Clipping of orthogonal method exploration can cause false positives if an argument of the clipped method is used to describe the state of a variable. Consider UTF-8 in Line 38 of Figure 1(a). Its Jimple5 representation is as follows, where $r2 represents variable key, $r4 represents keyBytes, and virtualinvoke is for invoking a non-static method of a class.

$r4 = virtualinvoke $r2.<java.lang.String: byte[] getBytes(java.lang.String)>("UTF-8")

If the analysis is clipped so that it cannot explore the getBytes method, then a def-use analysis shows that constant UTF-8 influences the value of $r4 (i.e., keyBytes). Thus, a straightforward detection method would report UTF-8 as a hardcoded key. However, UTF-8 only describes the encoding of $r2 and can be safely ignored. We refer to this type of constant as a state-indicator pseudo-influence.

The use of refinement insights has a direct impact on analysis outcomes. For example, discarding arguments of virtualinvoke may generate false negatives. Suppose virtualinvoke is used to set a key in a KeyHolder instance with some constant: virtualinvoke $r5.<...>("abcd"). Constant abcd needs to be flagged. On the contrary, we observe that arguments of virtualinvoke appearing in assign statements are typically used to describe the state of a variable and can be ignored. In summary, RI-I states that i) arguments of any virtualinvoke method invocation in an assignment instruction can be regarded as pseudo-influences, and ii) any constants that influence these arguments can also be discarded.

5.3 RI-II: Removal of Source Identifiers

Another type of pseudo-influence due to the clipping of orthogonal method exploration is the identifier of a value source. We use an example to illustrate the importance of this insight. For the code below, a straightforward analysis would flag the constant ENCRYPT_KEY. However, it is an identifier for retrieving a value from a Java Map data structure, and thus a false positive.

$r30 = interfaceinvoke r29.<java.util.Map: java.lang.Object get(java.lang.Object)>("ENCRYPT_KEY")

i) Retrieving values from an external source. Static method invocations (staticinvoke in Jimple) in assign statements are typically used to read values from external sources, e.g., Line 15 in Figure 1(a):

$r4 = staticinvoke <Context: java.lang.String getProperty(java.lang.String)>(src)

Variable src refers to the identifier, not the actual value of the key. Thus, it is a pseudo-influence. To avoid such pseudo-influences, RI-II discards any arguments of staticinvoke that appear in an assignment. Although staticinvoke may be used to transform a value from one representation to another, it is unlikely that staticinvoke is used to transform a constant.

Figure 3: Reduction of false positives with refinement insights in 46 Apache projects (94 root-subprojects) and 6,181 Android apps. Top 6 rules with maximum reductions are shown.

Figure 4: Breakdown of the reduction of false positives due to five of our refinement insights.

5.4 Evaluation of Refinement Methods

We compared the numbers of reported alerts before and after employing the five refinement algorithms for 46 Apache projects and 6,181 Android apps. Our experiments show that the refinement algorithms reduce the total alerts by 76% in Apache and 80% in Android, and manual checking confirms that the removed alerts are indeed false positives. All constant-related rules (including 1, 2, 3, and 12) greatly benefit from the refinements and have a significant reduction of irrelevant alerts. Results for the top six rules with maximum reductions are shown in Figure 3. The detailed breakdown is shown in Figure 4; the most effective refinement insights are RI-II and RI-III.

With refinements enabled, there are a total of 1,295 alerts for the 46 Apache projects. Our careful manual source-code analysis6 confirms that 1,277 alerts are true positives, resulting in a precision of 98.61%. Out of the 18 false positives, 1 case is due to path insensitivity and 17 to clipping orthogonal explorations (discussed in Section 7). All experiments reported in the next section were conducted with refinements enabled.

5Jimple is an intermediate representation (IR) of a Java program.
6Regarding the validity of the manual analysis, the manual confirmation of alerts was conducted by a second-year Ph.D. student with a prior Master's degree in cybersecurity (the second author), under the close guidance of a professor and a senior Ph.D. student (the first author).

6 SECURITY FINDINGS AND BENCHMARK EVALUATION

Our experimental evaluation aims to answer the following questions.

• What are the security findings in Apache projects? Do Apache projects have too many high-risk vulnerabilities, such as hardcoded secrets or MitM vulnerabilities? (Section 6.1)

• What are the security findings in Android apps? Do third-party libraries have high-risk vulnerabilities? (Section 6.2)

• How does CRYPTOGUARD compare with CrySL, SpotBugs, and the free trial version of Coverity on benchmarks or real-world projects? (Section 6.3)

Selection and pre-processing of programs. We selected 46 popular Apache projects that have crypto API uses. The popularity is measured with the numbers of stars and forks on GitHub. The maximum, minimum, and average lines of code (LoC) are around 2,571K (Hadoop), 1.1K (Commons Crypto), and 402K, respectively. We perform subproject dependency analysis to build DAGs by parsing build scripts. Subproject dependency analysis was automated for Gradle and Maven, and was manual for Ant. We identified the root-subprojects, which are subprojects that have no incoming edges in the subproject dependency DAG. We analyzed 94 root-subprojects in total7. We downloaded 6,181 high-popularity Android apps from the Google app market covering 58 categories. The median number of apps per category is 120. We used Soot to decompile .apk files to Java bytecode in order to interface with CRYPTOGUARD. We used an online APK decompiler8 to obtain human-readable source code for manual verification.

CRYPTOGUARD runtime. We ran 4 concurrent instances of CRYPTOGUARD on an Intel Xeon(R) X5650 server (2.67GHz CPU and 32GB RAM). Runtime increases with the use of cryptography APIs. For Apache projects, the average runtime was 3.3 minutes with a median of around 1 minute. For Android apps, we terminated unfinished analyses after 10 minutes. The average runtime was 3.2 minutes with a median of 2.85 minutes, including the cut-off ones. 552 (9%) of the 6,181 apps' analyses did not finish within 10 minutes, on which CRYPTOGUARD generated partial results.9

Table 2: Breakdown of accuracy in Apache projects. Duplicates are handled at the root-subproject level (82 root-subprojects in total). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

Rules | Total Alerts | # True Positives | Precision
(1,2) Predictable Keys | 264 | 248 | 94.14%
(3) Hardcoded Store Pass | 148 | 148 | 100%
(4) Dummy Hostname Verifier | 12 | 12 | 100%
(5) Dummy Cert. Validation | 30 | 30 | 100%
(6) Used Improper Socket | 4 | 4 | 100%
(7) Used HTTP | 222 | 222 | 100%
(8) Predictable Seeds | 0 | 0 | 0%
(9) Untrusted PRNG | 142 | 142 | 100%
(10) Static Salts | 112 | 112 | 100%
(11) ECB mode for Symm. Crypto | 41 | 41 | 100%
(12) Static IV | 41 | 40 | 97.56%
(13) <1000 PBE iterations | 43 | 42 | 97.67%
(14) Broken Symm. Crypto Algorithm | 86 | 86 | 100%
(15) Insecure Asymm. Crypto | 12 | 12 | 100%
(16) Broken Hash | 138 | 138 | 100%
Total | 1,295 | 1,277 | 98.61%

Figure 5: Example code in Apache Cxf that disables hostname verification checks by default. (a) A portion of https-cfg-client.xml. (b) A portion of SSLUtils.java:

} else if (tlsClientParameters.isDisableCNCheck()) {
    verifier = new AllowAllHostnameVerifier();
}

Figure 6: Example code in Apache Ofbiz that enables trusting all certificates by default while invoking the UPS service. (a) A portion of UpsServices.java:

public static String sendUpsRequest(...) {
    ...
    http.setAllowUntrusted(true);
    ... }

(b) A portion of SSLUtil.java:

SSLContext getSSLContext(String alias, boolean trustAny) {
    ...
    TrustManager[] tm;
    if (trustAny) {
        tm = SSLUtil.getTrustAnyManagers(); } ... }

6.1 Security Findings in Apache Projects

Out of the 46 Apache projects, 39 projects have at least one type of cryptographic misuse and 33 projects have at least two types. Table 6 summarizes our security findings in screening Apache projects. Predictable keys (Rules 1 and 2), HTTP URLs (Rule 7), insecure hash functions (Rule 16), and insecure PRNGs (Rule 9) are the most common types of vulnerabilities in Apache. As predictable values, we only observed constants for all these rules. We did not observe any predictable seeds under Rule 8.

6.1.1 Vulnerabilities from Predictable Secrets. 16 Apache projects (37 root-subprojects) have hardcoded keys (Rules 1, 2). Three of them (Meecrowave, Kylin, and Cloudstack) use hardcoded symmetric keys (Rule 1). Meecrowave uses DESede (i.e., Triple DES10) for obfuscation purposes. Unfortunately, deterministic keys make it trivial to break the obfuscation. Kylin (635 forks, 1,325 stars) uses AES to encrypt user passwords. However, using hardcoded keys makes these passwords vulnerable. In Apache Cloudstack, it appears that hardcoded keys are used in test code, which is accidentally packaged with the production code.

7We exclude 15 test root-subprojects.
8http://www.javadecompilers.com/apk
9Most of them missed results from Rule 7, which CRYPTOGUARD runs last.
10Triple DES itself is considered insecure. OpenSSL removed the support of Triple DES. NIST recommended moving to AES as soon as possible [68].

For Rule 2, we found that most of the hardcoded passwords in PBE serve as defaults. The most common default password for PBE is masterpassphrase (e.g., Ambari and Knox). Manifoldcf uses NowIsTheTime. Setting PBE code to take default hardcoded passwords without sufficient warnings is risky. Distributions using the default configuration are susceptible to the recovery of the plaintext password by an attacker who has access to the PBE ciphertext. Apache Ranger (165 forks, 155 stars) uses a hardcoded password as the default for PBE for all distributions. Its installation Wiki does not mention anything about it. System administrators unaware of this setup are likely not to change the default. This coding practice significantly weakens the security guarantee of PBE.

For Rule 3, the most common hardcoded passwords for KeyStores (for storing private keys) are changeit (e.g., Tomcat, Knox, Juddi, Ofbiz, and Wss4j) and none (e.g., Knox, Hive, and Hadoop). Most of them are set as defaults. There are 9 projects that have both predictable keys (Rules 1, 2) and hardcoded KeyStore passwords (Rule 3), indicating persistent insecure coding styles.

Insecure common practices. During manual analysis, we found three types of insecure common practices in Apache projects for storing secrets: i) hard-coding default keys or passwords in the source code, ii) storing plaintext keys or passwords in configuration files, and iii) storing encrypted passwords in configuration files with the decryption keys in plaintext in source code or configuration. Java provides special security APIs (e.g., Callback and CallbackHandler) to prompt users for secrets (e.g., passwords). However, none of these projects provides any code to support this option. Sysadmins are forced to store plaintext passwords in the filesystem unless they personally modify the code. The biggest danger that these insecure secret-storage practices bring to users is probably the inflated sense of security and not being able to see the actual risks.

6.1.2 Vulnerabilities from SSL/TLS MitM. Man-in-the-Middle (MitM) vulnerabilities are high risk in our threat model. 5 Apache projects (8 root-subprojects) have dummy hostname verifiers that accept any hostnames (Rule 4), including Spark (15,086 forks, 16,324 stars), Ambari (814 forks, 778 stars), Cxf (706 forks, 398 stars), Ofbiz, and Meecrowave. 6 Apache projects have dummy trust managers that trust any certificates (Rule 5), including Spark, Ambari, Cloudstack, Qpid-broker, Jclouds, and Ofbiz. It appears that most projects offer them as an additional connectivity option. Our manual analysis reveals that some projects set this insecure implementation as the default (e.g., Figure 5 and Figure 6). In Figure 6, we see that Ofbiz uses insecure SSL/TLS configurations by default while using the UPS (a shipping company) service. When plain sockets are used, it is recommended to verify the hostname manually. We found 3 projects that do not follow this rule and accept any arbitrary hostnames. We also found 7 projects (24 root-subprojects) that occasionally use the HTTP protocol for communication.

6.1.3 Medium and Low Severity Vulnerabilities. It is important to be aware of the medium- and low-risk vulnerabilities in the system and to recognize that the risk levels may increase under different adversarial models.

We found hardcoded salts in 4 projects including Apache Ranger, Manifoldcf, Juddi, and Wicket. We also observe the use of ECB mode in AES in 5 projects and predictable IVs in 2 projects, with a total of 40 occurrences. We found 5 projects that use PBE with fewer than 1,000 iterations (Rule 13). The Ranger and Wicket projects use 17 iterations for PBE; the Incubator-Taverna-Workbench and Juddi projects use 20 iterations, much fewer than the required 1,000.

Listing 1 shows a code snippet from Ranger, which has multiple issues. The number of iterations is proportional to the password size (Line 6), which is far less than 1,000. In addition, this code offers a timing side channel. An adversary capable of measuring PBE execution time (e.g., in multi-tenant environments) may learn the length of the password. This information can substantially decrease the difficulty of dictionary attacks. Another issue is that the salt is computed as the MD5 hash of the password (Lines 2-3). An adversary obtaining the salt may quickly recover the password. The salt's dependence on the password itself also breaks the indistinguishability requirement of PBE under chosen plaintext attack.

We found various occurrences of the Blowfish, DES, and RC4 ciphers for Rule 14. Under Rule 15, we found 3 occurrences of using the default key size of 1024 bits and 9 other occurrences that explicitly initialize the key size to 1024. 23 projects use java.util.Random as a PRNG (Rule 9), and two of them set static seeds for java.util.Random. We do not observe any deterministic seed to a java.security.SecureRandom (Rule 8).

Listing 1: A vulnerable code snippet from Apache Ranger that demonstrates various security issues

1 PBEKeySpec getPBEParameterSpec(String password) throws Throwable {
2   MessageDigest md = MessageDigest.getInstance(MD_ALGO); // MD5
3   byte[] saltGen = md.digest(password.getBytes());
4   byte[] salt = new byte[SALT_SIZE];
5   System.arraycopy(saltGen, 0, salt, 0, SALT_SIZE);
6   int iteration = password.toCharArray().length + 1;
7   return new PBEKeySpec(password.toCharArray(), salt, iteration);
8 }

Listing 2: An example of only checking the expiration (checkValidity) of self-signed certificates in the Yahoo Finance (TWStock) Android app. The base package name (com.softmobile) of this class indicates that the vulnerable code comes from a third-party library.

1 void checkServerTrusted(X509Certificate[] chain, String str){
2   if (chain == null || chain.length != 1) {
3     this.f7654a.checkServerTrusted(chain, str);
4   } else {
5     // Lack of signature verification and others
6     chain[0].checkValidity();}}

Listing 3: An example of ignoring exceptions in checkServerTrusted in the Sina Finance Android app.

1 void checkServerTrusted(X509Certificate[] chain, String str){
2   try{
3     this.f7427a.checkServerTrusted(chain, str);
4   } catch (CertificateException e) {}} // Ignores exception

Listing 4: The use of SSLSocket without manual hostname verification in the ProTaxi Driver Android app.

1 try{
2   SSLContext instance = SSLContext.getInstance("TLS");
3   ...
4   this.webSocketClient
5     .setSocket(instance.getSocketFactory().createSocket());
6 } catch (Throwable e) { ... }
7 this.webSocketClient.connect();
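For contrast with Listing 4, a minimal sketch of the correct-use pattern that Rule 6 looks for (an SSLSocket created from an SSLSocketFactory, its SSLSession passed to HostnameVerifier.verify, and the return value actually acted upon) is shown below; only the JSSE APIs are real, the surrounding class is illustrative.

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;
import java.io.IOException;

class ManualHostnameCheck {
    static SSLSocket connect(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(host, port); // case i)
        socket.startHandshake();
        HostnameVerifier verifier = HttpsURLConnection.getDefaultHostnameVerifier();
        SSLSession session = socket.getSession();                        // case ii)
        if (!verifier.verify(host, session)) {                           // case iii)
            socket.close();
            throw new IOException("Hostname verification failed for " + host);
        }
        return socket;
    }
}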

Table 3: Experimental results on the CRYPTOAPI-BENCH basic and CRYPTOAPI-BENCH advanced benchmarks with CrySL, Coverity, SpotBugs and CRYPTOGUARD. GTP stands for the ground truth positives. TP, FP, and FN are the number of true positives, false positives, false negatives in a tool’s output, respectively. Pre. and Rec. represent precision and recall, respectively. Tools are evaluated on 6 common rules (out of our 16 rules), i.e., the maximum common subset of all tools. For these 6 rules, there are 6 correct cases (i.e., true negatives) in basic and 3 correct cases in advanced, which are used for computing FPRs. Total alerts = TP + FP.

CRYPTOAPI-BENCH: Basic (GTP: 14)

Tools | TP | FP | FN | FPR | FNR | Pre. | Rec.
CrySL [44] | 10 | 4 | 4 | 40.00 | 28.57 | 71.43 | 71.43
Coverity [1] | 13 | 0 | 1 | 0.00 | 7.14 | 100.0 | 92.86
SpotBugs [2] | 13 | 0 | 1 | 0.00 | 7.14 | 100.0 | 92.86
CRYPTOGUARD | 14 | 0 | 0 | 0.00 | 0.00 | 100.0 | 100.0

CRYPTOAPI-BENCH: Advanced (TP/FP/FN per category, then summary)

Tools | Inter-Proc. (Two), GTP: 13 | Inter-Proc. (Multiple), GTP: 13 | Field Sensitive, GTP: 13 | FP Test/Correct Uses, GTP: 3 | FPR | FNR | Pre. | Rec.
CrySL [44] | 10/3/3 | 0/12/13 | 0/1/13 | 0/2/3 | 85.71 | 76.19 | 35.71 | 23.81
Coverity [1] | 3/0/10 | 3/0/10 | 1/0/12 | 0/0/3 | 0.00 | 83.33 | 100.0 | 16.67
SpotBugs [2] | 0/0/13 | 3/10/10 | 0/0/13 | 0/0/3 | 76.92 | 92.86 | 23.08 | 7.14
CRYPTOGUARD | 13/0/0 | 13/0/0 | 13/0/0 | 3/0/0 | 0.00 | 0.00 | 100.0 | 100.0
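To make the benchmark categories in Table 3 concrete, a hypothetical test case in the spirit of the "Inter-Proc. (Two)" column (not an actual CRYPTOAPI-BENCH case) could look like this:

import javax.crypto.spec.SecretKeySpec;

// The hardcoded key reaches the crypto API only after crossing a method boundary,
// so purely intra-procedural checkers miss it.
class InterProceduralCase {
    static SecretKeySpec entryPoint() {
        String key = "defaultkey";      // ground-truth positive: hardcoded secret
        return buildKey(key);           // value flows through a second method
    }

    static SecretKeySpec buildKey(String key) {
        return new SecretKeySpec(key.getBytes(), "AES");
    }
}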

6.2 Security Findings in Android Apps Table 5: Violations in 5 popular libraries (manually confirmed). Violations in apps or in libraries? We distinguished app’s Package name Violated rules own code from libraries by using the package information 11 com.google.api 3, 4, 5, 7 from AndroidManifest.xml. Android also uses it during com.umeng.analytics 7, 9, 12, 16 R.java file generation (robust against obfuscation). We found com.facebook.ads 5, 9, 16 that on average 95% of the detected vulnerabilities come from org.apache.commons 5, 9, 16 com.tencent.open 2, 7, 9 libraries (Table 4). This result extends the observation from 7 types of vulnerabilities (reported in [17]) to 16. Table 4 shows the distribution of vulnerability sources for each Compared with Apache projects, Android apps have higher rule. For hardcoded KeyStore passwords (Rule 3), all violations percentages of SSL/TLS API misuses (Rules 4, 5 and 6) and come from libraries. Most frequent hardcoded KeyStore pass- HTTP use (Rule 7). For example, 25.30% of Android apps have word is notasecret, which is used to access certificates and dummy trust manager (Rule 5), which is more than 2 times of the keys in Google libraries (e.g., *.googleapis.GoogleUtils, number in Apache (11.70%) as shown in Table 6 in Appendix. *.googleapis.*.GoogleCredential). Our analysis can detect sophisticated cases that Google Play’s built-in screening is likely to miss. We give code snippets for such Table 4: Distribution of vulnerabilities in Android apps. cases (Listing 2, 3, 4). CRYPTOGUARD detects a case where devel- opers allow unpinned self-signed certificates with a mere expiration Library Library App Itself Total check, as shown in Listing 2. Another case is where developers ig- (Total) (Unique) nore the exception in checkServerTrusted method as shown (1,2) Predictable Keys 11,634 (93.4%) 5,940 823 (6.6%) 12,457 (3) Hardcoded Store Password 431 (94.1%) 170 27 (5.8%) 458 in Listing 3. In addition, CRYPTOGUARD detects 271 occurrences (4) Dummy Hostname Verifier 1,148 (99.3%) 51 7 (0.7%) 1,155 of improper use of SSLSocket without manual Hostname verifi- (5) Dummy Cert. Validation 3,715 (96.3%) 1,317 141 (3.7%) 3,856 (6) Used Improper Socket 270 (99.6.4%) 13 1 (0.4%) 271 cation in 210 apps. One such example is shown in Listing 4, where (7) Used HTTP 7,687 (92.5%) 2,105 623 (7.5%) 8,321 SSLSocket is used in WebSocketClient without manually (8) Predictable Seeds 522 (96.0%) 101 22 (4.0%) 544 12 (9) Untrusted PRNG 26,312 (91.7%) 8,679 2,393 (8.3%) 36,223 verifying the hostname . In comparison, Google Play’s inspection (10) Predictable Salts 1,638 (93.2%) 774 119 (6.8%) 1,757 appears to only detect obvious misuses [4]. (11) ECB in Symm. Crypto 1657 (93.1%) 682 123 (6.9%) 1,780 (12) Predictable IVs 11,357 (94.2%) 6,048 692 (5.8%) 12,089 Grouping security violations by app popularity or category did (13) <1000 PBE iterations 294 (94.2%) 129 18 (57.8%) 312 not show substantial differences across groups. (14) Broken Symm. Crypto 1,668 (95.8%) 753 74 (4.2%) 1,742 (15) Insecure Asymm. Crypto 4 (3.6%) 3 107 (96.4%) 111 (16) Broken Hash 49,257 (99.0%) 7509 496 (1.0%) 49,769 Total 117,594 (95.40%) 34,274 5,666 (4.60%) 130,845 6.3 Comparison with Existing Tools We compare the accuracy and runtime of CRYPTOGUARD with three Besides Google, other high-profile library sources include Face- existing tools, i.e., CrySL [44], Coverity [1], and SpotBugs [2]13. book, Apache, Umeng, and Tencent (Table 5). 
Besides Google, other high-profile library sources include Facebook, Apache, Umeng, and Tencent (Table 5). These libraries frequently appear in different applications. We distinguished these libraries using base packages and ignored obfuscations.

Overview of other Android findings. We found exposed secrets, similar to the Apache projects. Table 6 summarizes the discovered vulnerabilities in Android applications. The categories of untrusted PRNG (Rule 9) and broken hash (Rule 16) have the most violations. Interestingly, we observed 544 cases of predictable seeds (Rule 8); 13 of them used time-stamps obtained from API calls.

6.3 Comparison with Existing Tools

We compare the accuracy and runtime of CRYPTOGUARD with three existing tools, i.e., CrySL [44], Coverity [1], and SpotBugs [2]. (CryptoLint's code is unavailable.) In our experiments, we used CrySL 1.0 (commit id 10e86fdb) and SpotBugs 3.0.1 (from SWAMP); the Coverity results were obtained before Jan 07, 2019.

Benchmark preparation. First, we had to construct CRYPTOAPI-BENCH, a comprehensive benchmark for comparing the quality of cryptographic vulnerability detection tools. (The benchmark design was led by the third author, who is not the person who implemented CRYPTOGUARD, the first author.) Regarding the existing benchmark DroidBench [16]: i) DroidBench does not cover cryptographic APIs, and ii) the free web version of Coverity requires source code, whereas DroidBench only contains APK binaries.

CRYPTOAPI-BENCH covers all 16 cryptographic rules specified in Table 1. There are 38 basic test cases and 74 advanced test cases. The basic benchmark contains 25 straightforward API misuses and 13 correct API uses (i.e., true negative cases). The advanced cases have more complex scenarios, including 42 inter-procedural cases (21 involving two methods and 21 involving more than two methods), 20 field-sensitive cases, 9 false positive test cases (for evaluating the ability to recognize irrelevant elements), and 3 correct API uses (i.e., true negative cases). Figures 7 and 8 in the Appendix show the distributions of test cases per rule and per API, respectively. Augmenting the benchmark with more test cases is our ongoing work. See GitHub for the most updated version: https://github.com/CryptoGuardOSS/cryptoapi-bench.
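To illustrate what an inter-procedural case looks like, the sketch below is our own example written in the spirit of the benchmark (it is not an actual CRYPTOAPI-BENCH test case; class and method names are hypothetical). The hardcoded key is defined in one method but only reaches the cryptographic API in another, so a single-method (intra-procedural) scan cannot tell that the key is constant:

import javax.crypto.spec.SecretKeySpec;

public class InterProceduralKeyExample {
    // The constant key material is defined here ...
    public SecretKeySpec buildKey() {
        String keyMaterial = "abcdef0123456789"; // hardcoded key (Rule 1 misuse)
        return toKeySpec(keyMaterial);
    }

    // ... but only reaches the crypto API in a different method.
    private SecretKeySpec toKeySpec(String material) {
        return new SecretKeySpec(material.getBytes(), "AES");
    }
}

Detecting this requires an inter-procedural backward slice from the SecretKeySpec parameter back to the constant, which is what the inter-procedural columns of Table 3 measure.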
Benchmark comparison. To maintain fairness in our comparison, we only report the benchmark results for the six shared rules (1, 2, 3, 11, 14, 16) that are covered by all of the tools, i.e., CrySL [44], Coverity [1], SpotBugs [2], and ours. Due to the lack of documentation, we had to infer a tool's coverage based on whether or not it ever generates any alert in a category. We show the results in Table 3.

For the basic benchmark, SpotBugs and Coverity perform well, but not CrySL. Our investigation reveals that CrySL's false positives are mainly due to its rules being overly strict. For example, CrySL raises an alert if the password for PBE is derived from a String-typed variable, or if a symmetric key is not generated by a key generator. It cannot recognize 4 of the 9 correct API uses in the evaluation. The root cause for this overly specific definition of security is likely CrySL's language restrictions on constraint definitions. For the advanced benchmark, both CrySL and SpotBugs generate false positives when a variable is passed through multiple methods. For all cases, Coverity has zero false positives, likely because of its use of symbolic execution and/or path-sensitive analysis (Coverity is closed source, so we are unable to confirm). However, Coverity misses multiple advanced vulnerability scenarios, even for rules that it does cover in the basic benchmark.

Table 7 in the Appendix presents the comparison for all 16 rules (not just the 6 common rules above). When testing all 16 rules, CRYPTOGUARD failed to report 11 misuses (i.e., false negatives). We discuss the causes in Section 7.

Runtime comparison. We ran CrySL and CRYPTOGUARD on 10 randomly selected Apache root-subprojects. Unfortunately, CrySL crashed and exited prematurely for 7 of them. For the 3 completed projects, CrySL is slower but comparable on 2 of them (5 vs. 3 seconds, 25 vs. 19 seconds). However, it is 3 orders of magnitude slower than CRYPTOGUARD on kerberos-codec. (Reported runtimes are the average of three runs.) We chose not to compare with SpotBugs – the comparison would not be meaningful, because its analysis is mostly based on syntactic matching of source code against known bug patterns [40, 62]. For the free web version of Coverity, we are unable to obtain its runtime.

Summary of experimental findings.
• Our refinement algorithms are effective. They bring a 76% reduction in alarms for Apache projects and an 80% reduction for Android applications. We manually confirmed that all the removed alerts are indeed false positives. Manually examining the remaining 1,295 Apache alerts (after refinements) confirms our precision of 98.61%.
• 39 out of the 46 Apache projects have at least one type of cryptographic misuse and 33 have at least two types. There is a widespread insecure practice of storing plaintext passwords in code or in configuration files. Insecure uses of SSL/TLS APIs are set as the default configuration in some cases.
• 5,596 (91%) out of the 6,181 Android apps have at least one type of cryptographic misuse and 4,884 (79%) apps have at least two types. 95% of the vulnerabilities come from libraries that are packaged with the applications. Some libraries are from large software firms. CRYPTOGUARD's detection of SSL/TLS API misuses is more comprehensive than the built-in screening offered by Google Play.
• In terms of detecting complex misuses, CRYPTOGUARD outperforms all leading solutions (Table 3). It substantially outperforms CrySL in terms of robustness and runtime.

7 DISCUSSION

Code correction. Most of the Apache developers' responses to our vulnerability disclosure reports were prompt and insightful. We highlight the feedback from some projects. One project promised to remove the support of the dummy hostname verifier and trust store. Ofbiz promised to fix the reported issues of constant IVs and KeyStore passwords. Apache Ranger already fixed our reports of constant default values for PBE [10] and insecure cryptographic primitives [5]. Regarding MD5, Apache Hadoop justifies that its MD5 use is for the per-block checksums that maintain consistency in the Hadoop file system (HDFS), and that the setup does not assume the presence of active adversaries.

For some cases, developers explained that certain operational constraints (e.g., backward compatibility for clients) prevent them from fixing the problems. For example, one project's server has to use MD5 in its digest authentication code, because major browsers do not support secure hash functions there (as defined in RFC 7616). However, digest authentication is rarely used in the wild (see https://security.stackexchange.com/questions/152935/why-is-there-no-adoption-of-rfc-7616-http-digest-auth).

The thorniest issue is secret storage. One justification for developers' choice of storing plaintext passwords or keys in file systems is to support humanless environments (e.g., automated scripts to manage services). However, first, not all deployment scenarios are server farms in a humanless environment. Projects should also provide the secure option, which is to use a Java callback to prompt human operators for passwords that can be used to unlock or generate other passwords or keys on the fly. Second, not properly disclosing and documenting the insecure configurations does a great disservice to the project's users.
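The following is a minimal sketch of the "prompt a human operator" option described above (our own illustration using the standard java.io.Console and PBKDF2 APIs; the class name, iteration count, and key size are illustrative, not a prescription from this paper):

import java.io.Console;
import java.security.spec.KeySpec;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class OperatorPasswordPrompt {
    public static byte[] deriveKey(byte[] salt) throws Exception {
        Console console = System.console();
        if (console == null) {
            throw new IllegalStateException("No interactive console available");
        }
        // Read the password from the operator instead of a config file or constant.
        char[] password = console.readPassword("KeyStore password: ");
        try {
            // Derive key material on the fly rather than storing it on disk.
            KeySpec spec = new PBEKeySpec(password, salt, 100000, 256);
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } finally {
            java.util.Arrays.fill(password, '\0'); // wipe the password from memory
        }
    }
}

For service daemons without an attached console, the same idea can be realized with javax.security.auth.callback.PasswordCallback handlers.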
Our limitations. No static analysis tool is perfect, and CRYPTOGUARD is no exception. We discuss the detection limitations of CRYPTOGUARD and future improvements.

False positives. One source of false positives is path insensitivity. For example, CRYPTOGUARD raises an alert if the variable iteration is assigned a value of 0, as in the following code snippet (from the project jackrabbit-oak). However, this alert is a false positive, since the assignment lies on an infeasible path.

int iteration = 0;
...
if (iteration < NO_ITERATION) { // NO_ITERATION = 1
    iteration = DEFAULT_ITERATION;
}

Another source of false positives is the clipping of orthogonal exploration, which can be substantially reduced by re-configuring the prototype to explore deeper. However, this will impact the runtime. In addition, CRYPTOGUARD detects the existence of API misuses in a code base but does not verify that the vulnerable code will be triggered at runtime. This issue is a general limitation of static program analysis. Apache Spark confirmed insecure PRNG uses, but stated that the affected code regions are not security critical. (It is unclear why Spark chose to use an insecure PRNG, even for non-security purposes.) However, eliminating this type of alert is difficult, if possible at all, as the analysis would need to be aware of custom-defined security criteria (e.g., what constitutes critical security) with in-depth knowledge of project semantics.

False negatives. For the full benchmark evaluation in Table 7 in the Appendix, CRYPTOGUARD has 11 false negatives (i.e., missed detections). All of these cases are due to our refinements after clipping orthogonal explorations.
For example, RI-II would ignore 6A5B7C8A as a pseudo-influence from the following instruction if the orthogonal exploration of the parseHexBinary method is clipped: byte[] key = DatatypeConverter.parseHexBinary("6A5B7C8A"). These false negatives can be avoided by increasing the depth of the orthogonal exploration. However, such conversions are mostly required to absorb values from external sources (e.g., the file system or the network); conversions of static values under the rules of Table 1 are highly unlikely. Outside the benchmark, we did not observe any such cases during our manual investigation of the Apache alerts. We also manually investigated 5 randomly selected Apache projects for false negatives from within orthogonal invocations (due to clipping) and did not observe any.

CRYPTOGUARD runs intra-procedural forward slicing for Rules 6 and 15, where inter-procedural forward slicing could potentially improve the coverage. For Rule 15, this change might not make much difference, as KeyPairGenerator creation and its initialization usually occur in the same method. For Rule 6, our current implementation ignores the direct sub-classes of SSLSocketFactory to avoid false positives. Inter-procedural slicing could extend the analysis to the sub-classes.

Another limitation of our work is that the coverage of our benchmark needs further improvement, in terms of incorporating path-sensitive test cases and increasing the diversity of the APIs involved. Our ongoing work includes enhancing CRYPTOAPI-BENCH.
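As background for the Rule 6 scenario discussed above, the sketch below shows the manual hostname verification step that is needed when an SSLSocket is created directly from an SSLSocketFactory (our own illustration using standard JSSE APIs; it is not CRYPTOGUARD's detection logic). An SSLSocket by itself does not check that the certificate matches the intended host:

import java.io.IOException;
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLHandshakeException;
import javax.net.ssl.SSLSession;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class VerifiedSslSocket {
    public static SSLSocket open(String host, int port) throws IOException {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        SSLSocket socket = (SSLSocket) factory.createSocket(host, port);
        socket.startHandshake();

        // SSLSocket does not verify the hostname by itself; verify it explicitly.
        HostnameVerifier verifier = HttpsURLConnection.getDefaultHostnameVerifier();
        SSLSession session = socket.getSession();
        if (!verifier.verify(host, session)) {
            socket.close();
            throw new SSLHandshakeException("Hostname " + host + " does not match the certificate");
        }
        return socket;
    }
}

On Java 7 and later, an alternative is to enable endpoint identification via SSLParameters.setEndpointIdentificationAlgorithm("HTTPS") before the handshake, which makes the JSSE layer perform the check.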
8 RELATED WORK

Tools to detect cryptographic misuses. Cryptographic misuse detection tools typically fall into two broad groups, i.e., static analysis (e.g., CryptoLint [31], MalloDroid [33], FixDroid [57], CogniCrypt [43], and CrySL [44]) and dynamic analysis (e.g., SMV-Hunter [65], AndroSSL [34], and K-Hunt [47]). For example, MalloDroid [33] uses a list of known insecure implementations of HostnameVerifier and TrustManager to screen Android apps. In [41], the authors showed that generating false positives is one of the most significant barriers to the adoption of static analysis tools. This false positive problem also exists in anomaly and intrusion detection systems [48, 72]. When screening large projects, virtually all static slicing solutions in this space (e.g., [31]) might generate a non-negligible amount of false positives. Contextual refinements similar to CRYPTOGUARD's are necessary to achieve high precision in practice. In terms of coverage, CRYPTOGUARD covers more rules than CryptoLint [31], CrySL [44], and MalloDroid [33] combined.

Other misuse detection tools (e.g., FixDroid [57] and CogniCrypt [43]) were mainly built for user-experience studies, with the goal of making detection tools developer-friendly, as opposed to being deployment-quality screening solutions. For example, FixDroid focuses on providing real-time feedback to developers. CogniCrypt's [43] focus is on code generation (in the Eclipse IDE) for several common cryptographic tasks (e.g., data encryption).

Dynamic analysis tools are complementary to static analysis ones. Most of them use a simple static analysis to first narrow down the number of potential apps for dynamic analysis. For example, SMV-Hunter [65] looks for apps that contain any custom implementation of X509TrustManager or HostnameVerifier for initial screening.

Other static analysis tools. TaintCrypt [61] uses static taint analysis to discover library-level cryptographic implementation issues in C/C++ cryptographic libraries (e.g., OpenSSL). It uses symbolic-execution-based path exploration to reduce false alarms, which is usually costly. Researchers found that misusing non-cryptographic APIs in Android also has serious security implications. These APIs include APIs to access sensitive information (such as location, IMEI, and passwords) [56], APIs for fingerprint protection [23], and cloud service APIs for information storage [74]. The methodology described in this paper can be applied to address these APIs. Recently, data-driven techniques to identify API misuses have been proposed [58, 73]. These techniques use lightweight static analysis to infer rules from examples, which can then be used for detection. In [54], the authors proposed a Bayesian framework for automatically learning correct API uses that can be used for anomaly-based API misuse detection. Efforts on automatically repairing insecure code have also been reported [50, 51, 60]. Static code analysis has been extensively used for other related software problems as well, including malware analysis and detection [32, 59, 70], vulnerability discovery [22, 45], and data-leak detection [25]. In [28], Chi et al. presented a system to infer client behaviors by leveraging symbolic execution of client-side code; they used such knowledge to filter anomalous traffic. Fuzzing has been demonstrated to automatically discover software vulnerabilities [29, 63, 64]. These techniques aim to find input-guided vulnerabilities that result in immediately observable behaviors (e.g., triggering program crashes [64] or anomalous protocol states [29, 63]). It is unclear how to use fuzzing to detect cryptographic vulnerabilities (e.g., predictable IVs/secrets, legacy primitives) that do not exhibit easily observable anomalous behaviors.

9 CONCLUSIONS AND AN OPEN PROBLEM

We described our effort to produce a deployment-quality static analysis tool, CRYPTOGUARD, that detects cryptographic misuses in Java programs and that developers can routinely use. This effort led to significant new technical contributions, including language-specific contextual refinements for FP reduction, on-demand flow-sensitive, context-sensitive, and field-sensitive program slicing, and benchmark comparisons of leading solutions. We also obtained a trove of security insights into Java secure coding practices. An open research problem is designing a compiler that automatically transforms a cryptographic vulnerability or rule into a static-analysis-based code-screening algorithm, similar to what CrySL partially provides, but with much higher expressiveness, precision, and recall.

REFERENCES
[1] Coverity Static Application Security Testing (SAST).
[2] SpotBugs: Find Bugs in Java Programs.
[3] Cryptographic Key Length Recommendation. https://www.keylength.com/en/4/, 2016. [Online; accessed 29-Jan-2018].
[4] Google Play Warning: How to fix incorrect implementation of HostnameVerifier? https://stackoverflow.com/questions/41312795/google-play-warning-how-to-fix-incorrect-implementation-of-hostnameverifier, 2016. [Online; accessed 29-Jan-2018].
[5] Change the default Crypt Algo to use stronger cryptographic algo. https://issues.apache.org/jira/browse/RANGER-1644, 2017. [Online; accessed Jan 26, 2018].
[6] Class Random. https://docs.oracle.com/javase/8/docs/api/java/util/Random.html, 2017. [Online; accessed 29-Jan-2018].
[7] Class SecureRandom. https://docs.oracle.com/javase/8/docs/api/java/security/SecureRandom.html, 2017. [Online; accessed 29-Jan-2018].
[8] Lifetimes of cryptographic hash functions. http://valerieaurora.org/hash.html, 2017. [Online; accessed 29-Jan-2018].
[9] List of Rainbow Tables. http://project-rainbowcrack.com/table.htm, 2017. [Online; accessed 29-Jan-2018].
[10] Update Doc/Wiki to provide details on using custom encryption key and salt for encryption of credentials. https://issues.apache.org/jira/browse/RANGER-1645, 2017. [Online; accessed Jan 26, 2018].
[11] Google rejected app because of HostnameVerifier issue. https://stackoverflow.com/questions/48420530/google-rejected-app-because-of-hostnameverifier-issue, 2018. [Online; accessed Jan 26, 2018].
[12] Y. Acar, M. Backes, S. Fahl, S. Garfinkel, D. Kim, M. L. Mazurek, and C. Stransky. Comparing the Usability of Cryptographic APIs. In IEEE S&P'17, pages 154–171, 2017.
[13] Y. Acar, M. Backes, S. Fahl, D. Kim, M. L. Mazurek, and C. Stransky. You Get Where You're Looking For: The Impact of Information Sources on Code Security. In IEEE S&P'16, pages 289–305, 2016.
[14] Y. Acar et al. Developers Need Support, Too: A Survey of Security Advice for Software Developers. In IEEE Secure Development Conference (SecDev), 2017.
[15] D. Adrian et al. Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice. In ACM CCS'15, pages 5–17, 2015.
[16] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. L. Traon, D. Octeau, and P. D. McDaniel. FlowDroid: Precise Context, Flow, Field, Object-sensitive and Lifecycle-aware Taint Analysis for Android Apps. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 259–269, 2014.
[17] M. Backes, S. Bugiel, and E. Derr. Reliable Third-Party Library Detection in Android and its Security Applications. In ACM CCS'16, pages 356–367, 2016.
[18] G. V. Bard. The Vulnerability of SSL to Chosen Plaintext Attack. IACR Cryptology ePrint Archive, 2004:111, 2004.
[19] D. J. Bernstein, Y. Chang, C. Cheng, L. Chou, N. Heninger, T. Lange, and N. van Someren. Factoring RSA Keys from Certified Smart Cards: Coppersmith in the Wild. In ASIACRYPT'13, pages 341–360, 2013.
[20] K. Bhargavan and G. Leurent. On the Practical (In-)Security of 64-bit Block Ciphers: Collision Attacks on HTTP over TLS and OpenVPN. In ACM CCS'16, pages 456–467, 2016.
[21] K. Bhargavan and G. Leurent. Transcript Collision Attacks: Breaking Authentication in TLS, IKE and SSH. In NDSS'16, 2016.
[22] A. Bianchi, Y. Fratantonio, A. Machiry, C. Kruegel, G. Vigna, S. Chung, and W. Lee. Broken Fingers: On the Usage of the Fingerprint API in Android. In NDSS'18, 2018.
[23] A. Bianchi, Y. Fratantonio, A. Machiry, C. Kruegel, G. Vigna, S. P. H. Chung, and W. Lee. Broken Fingers: On the Usage of the Fingerprint API in Android. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, 2018.
[24] D. Boneh. Twenty Years of Attacks on the RSA Cryptosystem. Notices of the AMS, 46(2), 1999.
[25] A. Bosu, F. Liu, D. D. Yao, and G. Wang. Collusive Data Leak and More: Large-scale Threat Analysis of Inter-app Communications. In ACM AsiaCCS'17, pages 71–85, 2017.
[26] D. Chang, A. Jati, S. Mishra, and S. K. Sanadhya. Cryptanalytic Time-Memory Tradeoff for Password Hashing Schemes. IACR Cryptology ePrint Archive, 2017:603, 2017.
[27] S. Checkoway, J. Maskiewicz, C. Garman, J. Fried, S. Cohney, M. Green, N. Heninger, R. Weinmann, E. Rescorla, and H. Shacham. A Systematic Analysis of the Juniper Dual EC Incident. In ACM CCS'16, pages 468–479, 2016.
[28] A. Chi, R. A. Cochran, M. Nesfield, M. K. Reiter, and C. Sturton. A System to Verify Network Behavior of Known Cryptographic Clients. In USENIX NSDI'17, pages 177–195, 2017.
[29] J. de Ruiter and E. Poll. Protocol State Fuzzing of TLS Implementations. In USENIX Security'15, pages 193–206, 2015.
[30] Welcome to the SWAMP. https://continuousassurance.org, 2018.
[31] M. Egele, D. Brumley, Y. Fratantonio, and C. Kruegel. An Empirical Study of Cryptographic Misuse in Android Applications. In ACM CCS'13, pages 73–84, 2013.
[32] K. O. Elish, X. Shu, D. D. Yao, B. G. Ryder, and X. Jiang. Profiling User-Trigger Dependence for Android Malware Detection. Computers & Security, 49:255–273, 2015.
[33] S. Fahl, M. Harbach, T. Muders, M. Smith, L. Baumgärtner, and B. Freisleben. Why Eve and Mallory Love Android: An Analysis of Android SSL (In)Security. In ACM CCS'12, pages 50–61, 2012.
[34] F. Gagnon, M. Ferland, M. Fortier, S. Desloges, J. Ouellet, and C. Boileau. AndroSSL: A Platform to Test Android Applications Connection Security. In FPS'15, pages 294–302, 2015.
[35] C. P. García, B. B. Brumley, and Y. Yarom. "Make Sure DSA Signing Exponentiations Really are Constant-Time". In ACM CCS'16, pages 1639–1650, 2016.
[36] C. Garman, M. Green, G. Kaptchuk, I. Miers, and M. Rushanan. Dancing on the Lip of the Volcano: Chosen Ciphertext Attacks on Apple iMessage. In USENIX Security'16, pages 655–672, 2016.
[37] M. Georgiev, S. Iyengar, S. Jana, R. Anubhai, D. Boneh, and V. Shmatikov. The Most Dangerous Code in the World: Validating SSL Certificates in Non-Browser Software. In ACM CCS'12, 2012.
[38] I. Goldberg and D. Wagner. Randomness and the Netscape Browser. Dr Dobb's Journal-Software Tools for the Professional Programmer, 21(1):66–71, 1996.
[39] N. Heninger, Z. Durumeric, E. Wustrow, and J. A. Halderman. Mining Your Ps and Qs: Detection of Widespread Weak Keys in Network Devices. In USENIX Security'12, pages 205–220, 2012.
[40] D. Hovemeyer and W. Pugh. Finding Bugs is Easy. SIGPLAN Notices, 39(12):92–106, 2004.
[41] B. Johnson et al. Why Don't Software Developers Use Static Analysis Tools to Find Bugs? In ICSE'13, pages 672–681, 2013.
[42] H. Krawczyk. How to Predict Congruential Generators. In CRYPTO'89, pages 138–153, 1989.
[43] S. Krüger et al. CogniCrypt: Supporting Developers in Using Cryptography. In IEEE/ACM ASE'17, pages 931–936, 2017.
[44] S. Krüger, J. Späth, K. Ali, E. Bodden, and M. Mezini. CrySL: An Extensible Approach to Validating the Correct Usage of Cryptographic APIs. In ECOOP'18, pages 10:1–10:27, 2018.
[45] Y. Kwon, B. Saltaformaggio, I. L. Kim, K. H. Lee, X. Zhang, and D. Xu. A2C: Self Destructing Exploit Executions via Input Perturbation. In NDSS'17, 2017.
[46] D. Lazar, H. Chen, X. Wang, and N. Zeldovich. Why Does Cryptographic Software Fail? A Case Study and Open Problems. In APSys'14, 2014.
[47] J. Li, Z. Lin, J. Caballero, Y. Zhang, and D. Gu. K-Hunt: Pinpointing Insecure Cryptographic Keys from Execution Traces. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October 15-19, 2018, pages 412–425, 2018.
[48] R. Lippmann and R. K. Cunningham. Improving Intrusion Detection Performance Using Keyword Selection and Neural Networks. Computer Networks, 34(4):597–603, 2000.
[49] A. D. Lucia. Program Slicing: Methods and Applications. In IEEE International Workshop on Source Code Analysis and Manipulation (SCAM'01), pages 144–151, 2001.
[50] S. Ma, D. Lo, T. Li, and R. H. Deng. CDRep: Automatic Repair of Cryptographic Misuses in Android Applications. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2016, pages 711–722, 2016.
[51] S. Ma, F. Thung, D. Lo, C. Sun, and R. H. Deng. VuRLE: Automatic Vulnerability Detection and Repair by Learning from Examples. In Computer Security - ESORICS 2017 - 22nd European Symposium on Research in Computer Security, pages 229–246, 2017.
[52] N. Meng, S. Nagy, D. Yao, W. Zhuang, and G. A. Argoty. Secure Coding Practices in Java: Challenges and Vulnerabilities. In ACM ICSE'18, Gothenburg, Sweden, May 2018.
[53] K. Moriarty, B. Kaliski, and A. Rusch. PKCS #5: Password-Based Cryptography Specification Version 2.1. 2017.
[54] V. Murali, S. Chaudhuri, and C. Jermaine. Bayesian Specification Learning for Finding API Usage Errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, pages 151–162, 2017.

[55] S. Nadi, S. Krüger, M. Mezini, and E. Bodden. Jumping Through Hoops: Why Do Java Developers Struggle with Cryptography APIs? In ICSE'16, pages 935–946, 2016.
[56] Y. Nan, Z. Yang, X. Wang, Y. Zhang, D. Zhu, and M. Yang. Finding Clues for Your Secrets: Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February 18-21, 2018, 2018.
[57] D. C. Nguyen et al. A Stitch in Time: Supporting Android Developers in Writing Secure Code. In ACM CCS'17, pages 1065–1077, 2017.
[58] R. Paletov, P. Tsankov, V. Raychev, and M. T. Vechev. Inferring Crypto API Rules from Code Changes. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pages 450–464, 2018.
[59] X. Pan, X. Wang, Y. Duan, X. Wang, and H. Yin. Dark Hazard: Learning-based, Large-Scale Discovery of Hidden Sensitive Operations in Android Apps. In NDSS'17, 2017.
[60] N. H. Pham, T. T. Nguyen, H. A. Nguyen, and T. N. Nguyen. Detection of Recurring Software Vulnerabilities. In ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, pages 447–456, 2010.
[61] S. Rahaman and D. Yao. Program Analysis of Cryptographic Implementations for Security. In IEEE Secure Development Conference (SecDev), 2017, pages 61–68, 2017.
[62] N. Rutar, C. B. Almazan, and J. S. Foster. A Comparison of Bug Finding Tools for Java. In 15th International Symposium on Software Reliability Engineering (ISSRE 2004), pages 245–256, 2004.
[63] S. Sivakorn, G. Argyros, K. Pei, A. D. Keromytis, and S. Jana. HVLearn: Automated Black-Box Analysis of Hostname Verification in SSL/TLS Implementations. In IEEE S&P'17, pages 521–538, 2017.
[64] J. Somorovsky. Systematic Fuzzing and Testing of TLS Libraries. In ACM CCS'16, pages 1492–1504, 2016.
[65] D. Sounthiraraj, J. Sahs, G. Greenwood, Z. Lin, and L. Khan. SMV-Hunter: Large Scale, Automated Detection of SSL/TLS Man-in-the-Middle Vulnerabilities in Android Apps. In NDSS'14, 2014.
[66] M. Stevens, E. Bursztein, P. Karpman, A. Albertini, and Y. Markov. The First Collision for Full SHA-1. In CRYPTO'17, 2017.
[67] M. Stevens, A. K. Lenstra, and B. de Weger. Chosen-Prefix Collisions for MD5 and Colliding X.509 Certificates for Different Identities. In EUROCRYPT'07, pages 1–22, 2007.
[68] Update to Current Use and Deprecation of TDEA, 2017. https://csrc.nist.gov/news/2017/update-to-current-use-and-deprecation-of-tdea.
[69] V. van der Veen et al. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms. In ACM CCS'16, pages 1675–1689, 2016.
[70] K. Xu, D. Yao, B. Ryder, and K. Tian. Probabilistic Program Modeling for High-Precision Anomaly Classification. In CSF'15, July 2015.
[71] H. Y. Yang, E. D. Tempero, and H. Melton. An Empirical Study into Use of Dependency Injection in Java. In Australian Software Engineering Conference (ASWEC'08), pages 239–247, 2008.
[72] D. Yao, X. Shu, L. Cheng, and S. J. Stolfo. Anomaly Detection as a Service: Challenges, Advances, and Opportunities. In Information Security, Privacy, and Trust Series. Morgan & Claypool, 2017.
[73] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim. Are Code Examples on an Online Q&A Forum Reliable? A Study of API Misuse on Stack Overflow. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, pages 886–896, 2018.
[74] C. Zuo, Z. Lin, and Y. Zhang. Why Does Your Data Leak? Uncovering the Data Leakage in Cloud from Mobile Apps. In IEEE S&P'19, 2019.

10 APPENDIX

10.1 Other Refinement Insights

RI-III: Removal of bookkeeping indices. Consider the following Java statement:

byte[] iv = new byte[] {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0};

After transformation into the Jimple representation, this statement becomes the following list of instructions:

$r15 = newarray (byte)[8]
$r15[0] = 0
$r15[1] = 0
$r15[2] = 0
$r15[3] = 0
$r15[4] = 0
$r15[5] = 0
$r15[6] = 0
$r15[7] = 0
$r2 = $r15

The hard-coded size and the indices of an array can be regarded as pseudo-influences. To address these false positives, we discard all constants that influence an array index. Similarly, any constant that influences the size or the index parameter of a collection can also be regarded as a pseudo-influence. We regard List and Set as collections.

RI-IV: Removal of contextually incompatible constants. Clipping of orthogonal invocations that do not appear in an assign statement can also cause false positives. To reduce false alarms further, we also discard some constants based on their type and context. Consider a class named PBEInfo that is used to store an iteration count and a salt, and suppose the analysis cannot explore the PBEInfo class. A basic use-def analysis would report 5 as a salt from the following invoke instruction: specialinvoke r1.<PBEInfo: void <init>(Integer, String)>(5, "5341453"). However, a standalone Boolean or Integer constant is unlikely to be used as a key, IV, or salt, since the corresponding APIs only accept byte arrays. Also, any hard-coded size parameter (e.g., the number of iterations in PBE (Rule 13) or the key size for insecure asymmetric crypto (Rule 15)) is unlikely to have any type other than Integer. Therefore, it is possible to discard some of the pseudo-influences by considering a constant's type in its context.

RI-V: Removal of constants in infeasible paths. Some constant initializations are overwritten along the path to the point of interest. Counting such constants, whose influence is infeasible, results in false positives. Empty strings and nulls are typically used for initialization purposes, and most often these initializations are replaced with other values. To avoid false positives in this case, depending on the rule and the slicing criterion, we discard null and empty strings. For example, SecretKeySpec prohibits keys from being null or empty, IvParameterSpec does not allow null as an IV, and PBEParameterSpec does not allow the salt to be null.
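A minimal sketch of the RI-V pattern follows (our own illustration; the configuration lookup is hypothetical). The null placeholder is always overwritten before the key reaches SecretKeySpec, so counting it as a predictable key would be a false positive:

import javax.crypto.spec.SecretKeySpec;

public class PlaceholderInitialization {
    public SecretKeySpec loadKey(java.util.Map<String, byte[]> config) {
        byte[] keyBytes = null;            // placeholder initialization (discarded by RI-V)
        keyBytes = config.get("aes.key");  // the value that actually reaches the API
        if (keyBytes == null) {
            throw new IllegalStateException("Missing key material");
        }
        return new SecretKeySpec(keyBytes, "AES");
    }
}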
10.2 Other Evaluation Results

Figure 7: Test cases per Rule in CRYPTOAPI-BENCH.
Table 6: The number of alerts in Apache (total 94 root-subprojects) and Android applications (6,181). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

Rules | Apache: # of Root-subprojects | Apache: # of Alerts Per Rule | Android: # of Applications | Android: # of Alerts Per Rule
(1,2) Predictable Keys | 37 (39.36%) | 264 | 1,617 (26.16%) | 12,457
(3) Hardcoded Store Password | 29 (30.85%) | 148 | 218 (3.52%) | 458
(4) Dummy Hostname Verifier | 8 (8.51%) | 12 | 800 (12.94%) | 1,155
(5) Dummy Cert. Validation | 11 (11.70%) | 30 | 1,564 (25.30%) | 3,856
(6) Used Improper Socket | 4 (4.25%) | 4 | 210 (3.39%) | 271
(7) Used HTTP | 24 (29.62%) | 222 | 2,486 (40.22%) | 8,321
(8) Predictable Seeds | 0 (0%) | 0 | 80 (1.29%) | 544
(9) Untrusted PRNG | 33 (35.10%) | 142 | 5,194 (84.03%) | 36,223
(10) Static Salts | 21 (22.34%) | 112 | 199 (3.21%) | 1,757
(11) ECB mode for Symm. Crypto | 16 (17.02%) | 41 | 882 (14.26%) | 1,780
(12) Static IVs | 4 (4.25%) | 41 | 913 (14.77%) | 12,089
(13) <1000 PBE Iterations | 25 (26.59%) | 43 | 151 (2.44%) | 312
(14) Broken Symm. Crypto Algorithms | 29 (30.85%) | 86 | 701 (11.34%) | 1,742
(15) Insecure Asymm. Crypto | 9 (10.98%) | 12 | 108 (1.74%) | 111
(16) Broken Hash | 42 (44.68%) | 138 | 5,272 (85.29%) | 49,769

Table 7: Benchmark comparison of CrySL, Coverity, SpotBugs, and CryptoGuard on all 16 rules with CRYPTOAPI-BENCH’s 112 test cases. There are 16 secure API use cases (13 in basic and 3 in advanced), which a tool should not raise any alerts on. CRYPTOGUARD successfully passed these 16 test cases. GTP stands for ground truth positive, which is the number of positives in the benchmark. CRYPTOGUARD has 11 false negatives, which we reported in Section 6 and discussed in Section 7.

No. | Rules | GTP | CrySL (TP, FP) | Coverity (TP, FP) | SpotBugs (TP, FP) | CryptoGuard (TP, FP)
1 | Predictable Cryptographic Key | 5 | 0, 4 | 3, 0 | 2, 0 | 5, 0
2 | Predictable Password for PBE | 6 | 0, 2 | 5, 0 | 3, 0 | 6, 0
3 | Predictable Password for KeyStore | 5 | 0, 5 | 3, 0 | 2, 0 | 5, 0
4 | Dummy Hostname Verifier | 1 | –, – | 1, 0 | 1, 0 | 1, 0
5 | Dummy Cert. Validation | 1 | –, – | 1, 0 | 1, 0 | 1, 0
6 | Used Improper Socket | 4 | –, – | 4, 0 | –, – | 4, 0
7 | Use of HTTP | 4 | –, – | –, – | –, – | 4, 0
8 | Predictable Seed | 10 | –, – | 1, 0 | –, – | 5, 0
9 | Untrusted PRNG | 1 | –, – | –, – | 1, 0 | 1, 0
10 | Static Salt | 5 | 5, 1 | –, – | –, – | 3, 0
11 | ECB in Symm. Crypto | 4 | 2, 1 | 1, 0 | 1, 1 | 4, 0
12 | Static IV | 6 | 0, 6 | –, – | 6, 0 | 4, 0
13 | <1000 PBE Iteration | 5 | 2, 1 | –, – | –, – | 4, 0
14 | Broken Symm. Crypto | 20 | 10, 5 | 4, 0 | 5, 5 | 20, 0
15 | Insecure Asymm. Crypto | 3 | 2, 1 | –, – | 0, 1 | 2, 0
16 | Broken Hash | 16 | 8, 4 | 4, 0 | 4, 4 | 16, 0
Total | | 96 | 29, 30 | 27, 0 | 26, 11 | 85, 0

Table 8: Rules that use intra-procedural backward program slicing to slice implemented methods of standard Java APIs and their corresponding slicing criteria.

No. | Method to Slice | Rule | Criterion
4.1 | javax.net.ssl.HostnameVerifier: boolean verify(String,SSLSession) | 4 | return
5.1 | void checkServerTrusted(X509Certificate[],String) | 5 | checkValidity()
5.2 | void checkServerTrusted(X509Certificate[],String) | 5 | throw
5.3 | java.security.cert.X509Certificate[] getAcceptedIssuers() | 5 | return
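As an illustration of criterion 4.1 (slicing backward from the return value of verify), the following is a minimal sketch (our own example, not from the paper) of the insecure pattern such a slice exposes: verify unconditionally returns true, so any hostname is accepted (Rule 4).

import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

// Anti-pattern: the backward slice from the return statement shows that the
// returned value is the constant true, independent of the session being checked.
public class AcceptAllHostnameVerifier implements HostnameVerifier {
    @Override
    public boolean verify(String hostname, SSLSession session) {
        return true; // disables hostname verification
    }
}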

Table 9: Java APIs used as slicing criteria in our intra-procedural forward program slicing and their corresponding security rules.

No. | Slicing Criterion for Intra-Procedural Forward Program Slicing | Rule | Semantic
6.1 | javax.net.ssl.SSLSocketFactory: SocketFactory getDefault() | 6 | Create SocketFactory
6.2 | javax.net.ssl.SSLContext: SSLSocketFactory getSocketFactory() | 6 | Create SocketFactory
15.1 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(java.lang.String) | 15 | Create KeyPairGenerator
15.2 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,String) | 15 | Create KeyPairGenerator
15.3 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,Provider) | 15 | Create KeyPairGenerator

Table 10: Java APIs used as slicing criteria in our inter-procedural backward slicing and their corresponding security rules. Boldface indicates the parameter of interest.

No. | API | Rule | Semantic
1.1 | javax.crypto.spec.SecretKeySpec: void <init>(byte[],String) | 1 | Set key
1.2 | javax.crypto.spec.SecretKeySpec: void <init>(byte[],int,int,String) | 1 | Set key
2.1 | javax.crypto.spec.PBEKeySpec: void <init>(char[]) | 2 | Set password
2.2 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int,int) | 2 | Set password
2.3 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int) | 2 | Set password
3.1 | java.security.KeyStore: void load(InputStream,char[]) | 3 | Set password
3.2 | java.security.KeyStore: void store(OutputStream,char[]) | 3 | Set password
3.3 | java.security.KeyStore: void setKeyEntry(String,Key,char[],Certificate[]) | 3 | Set password
3.4 | java.security.KeyStore: Key getKey(String,char[]) | 3 | Set password
7.1 | java.net.URL: void <init>(String) | 7 | Set URL
7.2 | java.net.URL: void <init>(String,String,String) | 7 | Set URL
7.3 | java.net.URL: void <init>(String,String,int,String) | 7 | Set URL
7.4 | okhttp3.Request$Builder: Request$Builder url(String) | 7 | Set URL
7.5 | retrofit2.Retrofit$Builder: Retrofit$Builder baseUrl(String) | 7 | Set URL
8.1 | java.security.SecureRandom: void <init>(byte[]) | 8 | Set seed
8.2 | java.security.SecureRandom: void setSeed(byte[]) | 8 | Set seed
8.3 | java.security.SecureRandom: void setSeed(long) | 8 | Set seed
10.1 | javax.crypto.spec.PBEParameterSpec: void <init>(byte[],int) | 10 | Set salt
10.2 | javax.crypto.spec.PBEParameterSpec: void <init>(byte[],int,AlgorithmParameterSpec) | 10 | Set salt
10.3 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int,int) | 10 | Set salt
10.4 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int) | 10 | Set salt
11.1 | javax.crypto.Cipher: Cipher getInstance(String) | 11, 14 | Select cipher
11.2 | javax.crypto.Cipher: Cipher getInstance(String,String) | 11, 14 | Select cipher
11.3 | javax.crypto.Cipher: Cipher getInstance(String,Provider) | 11, 14 | Select cipher
12.1 | javax.crypto.spec.IvParameterSpec: void <init>(byte[]) | 12 | Set IV
12.2 | javax.crypto.spec.IvParameterSpec: void <init>(byte[],int,int) | 12 | Set IV
13.1 | javax.crypto.spec.PBEParameterSpec: void <init>(byte[],int) | 13 | Set iterations
13.2 | javax.crypto.spec.PBEParameterSpec: void <init>(byte[],int,AlgorithmParameterSpec) | 13 | Set iterations
13.3 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int,int) | 13 | Set iterations
13.4 | javax.crypto.spec.PBEKeySpec: void <init>(char[],byte[],int) | 13 | Set iterations
15.1 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String) | 15 | Select generator
15.2 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,String) | 15 | Select generator
15.3 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,Provider) | 15 | Select generator
15.4 | java.security.KeyPairGenerator: void initialize(int) | 15 | Set key size
15.5 | java.security.KeyPairGenerator: void initialize(int,java.security.SecureRandom) | 15 | Set key size
15.6 | java.security.KeyPairGenerator: void initialize(AlgorithmParameterSpec) | 15 | Set key size
15.7 | java.security.KeyPairGenerator: void initialize(AlgorithmParameterSpec,SecureRandom) | 15 | Set key size
16.1 | java.security.MessageDigest: MessageDigest getInstance(String) | 16 | Select hash
16.2 | java.security.MessageDigest: MessageDigest getInstance(String,String) | 16 | Select hash
16.3 | java.security.MessageDigest: MessageDigest getInstance(String,Provider) | 16 | Select hash
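To connect the table to concrete code, the single constructor call below (our own example; the constants are illustrative) is a slicing criterion for three rules at once, since the PBEKeySpec constructor carries the password, the salt, and the iteration count (entries 2.2, 10.3, and 13.3):

import javax.crypto.spec.PBEKeySpec;

public class PbeCriterionExample {
    public PBEKeySpec buildSpec() {
        char[] password = "changeit".toCharArray();                      // Rule 2: predictable password
        byte[] salt = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08}; // Rule 10: static salt
        int iterations = 100;                                            // Rule 13: fewer than 1000 PBE iterations
        return new PBEKeySpec(password, salt, iterations, 256);
    }
}

A backward slice from each of the first three parameters reaches a constant, so one call site can yield alerts under three different rules.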

Figure 8: Test cases per API in CRYPTOAPI-BENCH. A test case can cover one or more APIs (e.g., test cases for Rule 15). APIs corresponding to the labels can be found in Tables 10, 8, and 9.