From Theory to Practice: Deployment-grade Tools and Methodologies for Software Security

Sazzadur Rahaman

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Science and Applications

Danfeng (Daphne) Yao, Chair
Naren Ramakrishnan
Patrick R. Schaumont
Gang Wang
David Evans

July 20, 2020
Blacksburg, Virginia

Keywords: Program Analysis, Static Analysis, Cryptographic Program Analysis, Taint Analysis, Program Slicing, Payment Card Industry, Data Security Standard, Internet Measurement, Website Scanning, Data Breach, Web Security

Copyright 2020, Sazzadur Rahaman

From Theory to Practice: Deployment-grade Tools and Methodologies for Software Security

Sazzadur Rahaman

(ABSTRACT)

Following proper guidelines and recommendations is crucial in software security, yet doing so is mostly obstructed by accidental human errors. Automatic screening tools have great potential to reduce the gap between theory and practice. However, the goal of scalable automated code screening is largely hindered by the practical difficulty of reducing false positives without compromising analysis quality. To enable compile-time security checking of cryptographic vulnerabilities, I developed highly precise static analysis tools (CRYPTOGUARD and TAINTCRYPT) that developers can use routinely. The main technical enabler for CRYPTOGUARD is a set of detection algorithms that refine program slices by leveraging language-specific insights, whereas TAINTCRYPT relies on symbolic execution-based path-sensitive analysis to reduce false positives. Both CRYPTOGUARD and TAINTCRYPT uncovered numerous vulnerabilities in real-world software, which demonstrates their effectiveness. Oracle has implemented our cryptographic code screening algorithms for Java in its internal code analysis platform, Parfait, and detected numerous previously unknown vulnerabilities. I also designed a specification language named SPANL to easily express rules for automated code screening. SPANL enables domain experts to create domain-specific security checks. Unfortunately, tools and guidelines are not sufficient to ensure baseline security in internet-wide ecosystems. I found that the lack of proper compliance checking induced a huge gap in the payment card industry (PCI) ecosystem. I showed that none of the 6 PCI scanners we tested is fully compliant with the guidelines; they issue certificates to merchants that still have major vulnerabilities. Consequently, 86% of the 1,203 e-commerce websites we tested are non-compliant. To improve the testbeds in the light of our work, the PCI Security Council shared a copy of our PCI measurement paper with the dedicated companies that host, manage, and maintain the PCI certification testbeds.

From Theory to Practice: Deployment-grade Tools and Methodologies for Software Security

Sazzadur Rahaman

(GENERAL AUDIENCE ABSTRACT)

Automatic screening tools have great potential to reduce the gap between the theory and the practice of software security. However, the goal of scalable automated code screening is largely hindered by the practical difficulty of reducing false positives without compromising analysis quality. To enable compile-time security checking of cryptographic vulnerabilities, I developed highly precise static analysis tools (CRYPTOGUARD and TAINTCRYPT) that developers can use routinely.

Both CRYPTOGUARD and TAINTCRYPT uncovered numerous vulnerabilities in real-world software, which demonstrates their effectiveness. Oracle has implemented our cryptographic code screening algorithms for Java in its internal code analysis platform, Parfait, and detected numerous previously unknown vulnerabilities. I also designed a specification language named SPANL to easily express rules for automated code screening. SPANL enables domain experts to create domain-specific security checks. Unfortunately, tools and guidelines are not sufficient to ensure baseline security in internet-wide ecosystems. I found that the lack of proper compliance checking induced a huge gap in the payment card industry (PCI) ecosystem. I showed that none of the 6 PCI scanners we tested is fully compliant with the guidelines; they issue certificates to merchants that still have major vulnerabilities. Consequently, 86% of the 1,203 e-commerce websites we tested are non-compliant. To improve the testbeds in the light of our work, the PCI Security Council shared a copy of our PCI measurement paper with the dedicated companies that host the PCI certification testbeds.

Dedication

To my parents (Shajeda Akter and Md. Helal Uddin), who traded their dreams to afford my education.

Acknowledgments

In underprivileged settings, education loses its priority over the hurdles of “living”. I grew up with a first-hand experience of such hardship. Thanks to my mother’s determination to keep my education going, I am the first person to achieve a graduate degree from my locality. I deeply appreciate the sacrifices of my parents, grandparents, and the other family members for their unconditional love and support throughout my life. Without that, I might not have come this far.

I would like to thank my Ph.D. advisor Dr. Danfeng (Daphne) Yao, who herself made her mark, sailing against countless adversities. Her advice, patience and understanding as a mentor and help as a friend made my Ph.D. journey easier, enjoyable and rewarding. Thanks to Dr. Naren Ramakrishnan, Dr. Patrick R. Schaumont, Dr. Gang Wang, and Dr. David Evans for serving on my Ph.D. committee and for their guidance over the years. I am also thankful to all my mentors, collaborators and fellow labmates, including Dr. Murat Kantarcioglu, Dr. Gang Wang, Dr. Omar Chowdhury, Dr. Haipeng Cai, Dr. Jung-Min (Jerry) Park, Fahad Shaon, Long Cheng, Sharmin Afrose, Mazharul Islam, Ya Xiao, Ke Tian, Miles Frantz, He Li, Md Salman Ahmed, Xiaodong Yu, Fang Liu, Hannah Roth, Alexander Kedrowitsch, Xiaokui Shu and Jamie Davis.

I express my gratitude to my dearest friends including Bushra Tawfiq Chowdhury, Archi Dasgupta, Sajal Dash, Rubayet Elahi, Khan Mizan, Adrina Haldar, Rakibul Hasan, Sajal Iskandar, Reeba Khan, Subrato Kuri, Sufian Latif, Tahmid Nabi, Nabil Nowak, Fabiha Nowshin, Mazharul Islam Rakeb, Tonmoy Roy, Asif Salekin, Maruf Samu, Fahad Shaon, Farin Siddique, Munawwar Mahmud Sohul and many more. Without their cordial support and inspiration the journey would have been harder.

A very special thanks goes to my partner in crime, Ipsita Hamid Trisha. Her positive energy fueled the progress in the darkest hours. Beyond her constant support, her attitude towards life always motivates and challenges me to become a better person.

Funding Acknowledgment: My work was supported in part by ONR Grant ONR-N00014-17-1-2498, NSF grants CNS-1929701, CNS-1717028, CNS-1750101, SBIR-1647681, SBIR-1758628 and OAC-1541105, Virginia Tech Pratt Fellowship and Bitshare Fellowship.

Declaration of Collaboration: In addition to my advisor Danfeng (Daphne) Yao, the research presented in this dissertation benefited from several collaborators:

• Ya Xiao (VT), Sharmin Afrose (VT), Fahad Shaon (UTD), Ke Tian (VT), Miles Frantz (VT) and Murat Kantarcioglu (UTD) contributed to the work included in Chapter 3.

• Haipeng Cai (WSU) and Omar Chowdhury (UI) contributed to the work in Chapter 4.

• Gang Wang (UIUC) contributed to the work included in Chapter 5.

List of Publications From This Thesis

1. [CCS’19] Sazzadur Rahaman, Gang Wang, Danfeng (Daphne) Yao. Security Certification in Payment Card Industry: Testbeds, Measurements, and Recommendations. ACM Conference on Computer and Communications Security (CCS’19). London, United Kingdom. November 2019.

2. [CCS’19] Sazzadur Rahaman, Ya Xiao, Sharmin Afrose, Fahad Shaon, Ke Tian, Miles Frantz, Murat Kantarcioglu, Danfeng (Daphne) Yao. CryptoGuard: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Java Projects. ACM Conference on Computer and Communications Security (CCS’19). London, United Kingdom. November 2019.

3. [TDSC’19] Sazzadur Rahaman, Haipeng Cai, Omar Chowdhury and Danfeng (Daphne) Yao. From Theory to Code: Identifying Logical Flaws in Cryptographic Implementations. IEEE Transactions on Dependable and Secure Computing (TDSC). 2019.

4. [SecDeV’19] Sharmin Afrose, Sazzadur Rahaman, Danfeng (Daphne) Yao. CryptoAPI-Bench: A Comprehensive Benchmark on Java Cryptographic API Misuses. 2019 IEEE Secure Development Conference. McLean, VA. September 2019.

5. [SecDeV’17] Sazzadur Rahaman, Danfeng (Daphne) Yao. Toward Automatic Program Analysis of Cryptography Implementations for Security. 2017 IEEE Secure Development Conference. Cambridge, MA, USA. September, 2017.

Contents

List of Figures
List of Tables

1 Introduction
1.1 Need for Deployment-grade Solutions
1.2 Contribution
1.3 Organization of the Report

2 Literature Review
2.1 Tools to detect cryptographic vulnerabilities
2.2 Other analysis tools
2.3 Measuring ecosystem-wide security non-compliance

3 Cryptographic API Misuse Detection
3.1 Introduction
3.2 Threat Model and Overview
3.2.1 Threat Model
3.2.2 Technical Challenges and Solution Overview
3.3 Map Vulnerabilities to Analysis
3.4 Crypto-specific Slicing
3.5 Refinement for FP Reduction
3.5.1 Overview of Refinement Insights (RI)
3.5.2 RI-I: Removal of State Indicators
3.5.3 RI-II: Removal of Source Identifiers
3.5.4 RI-III: Removal of bookkeeping indices
3.5.5 RI-IV: Removal of contextually incompatible constants
3.5.6 RI-V: Removal of constants in infeasible paths
3.5.7 Evaluation of Refinement Methods
3.6 Security Findings and Evaluation
3.6.1 Security Findings in Apache Projects
3.6.2 Security Findings in Android Apps
3.6.3 Comparison with Existing Tools
3.7 Domain-specific Static Security Validation
3.7.1 Motivation
3.7.2 Overview
3.7.3 IR code of the SPANL
3.7.4 SPANL Language
3.7.5 Expressiveness of the language
3.8 Limitations and Discussion
3.9 Summary

4 Program Analysis of Cryptographic Implementations
4.1 Introduction
4.2 Motivation and Threat Model
4.2.1 Motivating Examples
4.2.2 Threat Model
4.3 Cryptographic Vulnerabilities
4.3.1 Chosen-plaintext attacks on IVs
4.3.2 Attacks on PRNG
4.3.3 Use of Legacy Ciphers
4.3.4 Padding Oracles
4.3.5 Side-Channel Exploitations
4.3.6 State Machine Vulnerabilities
4.3.7 Programming Errors
4.4 Security Rules and Enforcement
4.4.1 Mapping Rules with Program Properties
4.4.2 System Overview
4.5 Evaluation
4.5.1 Controlled Experiments
4.5.2 Limitations
4.6 Summary

5 Security in Payment Card Industry
5.1 Introduction
5.2 Background on PCI and DSS
5.2.1 Payment Card Ecosystem
5.2.2 PCI Council and Data Security Standard
5.2.3 Our Threat Model and Method Overview
5.3 Measurement Methodology
5.3.1 Security Test Cases
5.3.2 Testbed Architecture and Implementations
5.3.3 Research Ethics
5.4 Evaluation of PCI Scanners
5.4.1 Comparison of Scanner Performance
5.4.2 Impacts of Premature Certification
5.4.3 Evaluation of Website Scanners
5.5 Measurement of Compliant Websites
5.5.1 Implementation Details of PCICHECKERLITE
5.5.2 Findings of E-commerce Website Compliance
5.6 Disclosure and Discussion
5.7 Summary

6 Conclusion and Future Work

Appendices
Appendix A Cryptographic Code Screening
A.1 Example intermediate codes
A.2 Other Evaluation Results
Appendix B Security in Payment Card Industry

Bibliography

List of Figures

3.1 (a) An example demonstrating various features of CRYPTOGUARD. Crypto class is used for generic AES encryption and PasswordEncryptor class uses Crypto for password encryption. f indicates influence through the fields and p indicates influence through the method parameters. (b) Partial data dependency graph for keyBytes variable.

3.2 Indirect field access using orthogonal invocations on data-only class object $r1.

3.3 Reduction of false positives with refinement insights in 46 Apache projects (94 root-subprojects) and 6,181 Android apps. Top 6 rules with maximum reductions are shown.

3.4 Breakdown of the reduction of false positives due to five of our refinement insights.

3.5 The impact of the orthogonal exploration depth on F1 scores and the number of discovered constants in (a), runtime in (b), and analysis properties in (c) for 8 rules.

3.6 Disabled hostname verification in Apache Cxf.

3.7 Trusting all certificates in Apache Ofbiz.

3.8 Test cases per Rule in CRYPTOAPI-BENCH (as of April 2019).

3.9 Test cases per API in CRYPTOAPI-BENCH (as of April 2019). A test case can cover one or more APIs (e.g., test cases for Rule 15). APIs corresponding to the labels can be found in Tables A6, A4, and A5.

3.10 Runtime comparison in log scale of CryptoGuard and CrySL on 30 Apache root-subprojects in (a) and 30 Android applications in (b), ordered by decreasing lines of code (LoC). * indicates crash. CryptoGuard successfully completed all tasks. The LoCs are shown in Table A2.

3.11 System components of SPANL.

3.12 Grammar of the intermediate code

3.13 Grammar for parsing the API section

3.14 Grammar for parsing the Operation section

3.15 Grammar for parsing the Emit section

3.16 Grammar for parsing the Constraint section

3.17 Grammar for parsing the execution section

3.18 Type judgements for execution section.

4.1 Excerpt from a real-world cryptographic program (the core ScreenOS 6.2 PRNG functions [100]), where prng_temporary and prng_output_index are global variables. When prng_reseed is called (Line 16), the loop variable prng_output_index in function prng_generate is set to 32, causing prng_generate to output sensitive data prng_seed at Line 5.

4.2 Function-level size metrics statistics of OpenSSL as an example large-scale cryptographic software.

4.3 Finite state machine (FSM) of taint analysis.

4.4 Finite state machine (FSM) and transition function table (δ) to detect insecure cryptographic primitives.

4.5 Finite state machine (FSM) to detect unsanitized inputs from external sources.

4.6 DFA to detect (a) early termination in a loop, (b) non-generic error messages by traversing program dependence graphs. Inputs of these DFA are the nodes of the graph. Here, represents the generic triggering node. For example, EVP_DecryptFinal_ex can serve as a trigger for rule 4. We labeled the inputs for only the state changing transitions.

4.7 Overview of the process flow of our analysis approach TAINTCRYPT, including its input (cryptographic programs), output (security report), and three key technical components in between (Clang preprocessing, symbolic execution, and taint checking).

4.8 An example of TAINTCRYPT detecting the use of vulnerable functionality (MD5) in OpenSSL, which violates the security rule against using broken hash functions (Row 23 of Table 4.1). In this example, our analysis correctly identified the violation by reporting the invocation of a vulnerable hash function EVP_MD5().

4.9 An example of TAINTCRYPT detecting the memory disclosure vulnerability in OpenSSL-1.0.1f (Row 22 of Table 4.1), hence the violation of rule 22 against that vulnerability. Here the use of external data in variable payload without proper sanitization causes disclosure of memory of an arbitrary size.

4.10 An example of TAINTCRYPT facilitating the enforcement of security rule 11, which concerns secured random number generation in OpenSSL. In this case, as there exists a path from the source (a) to the sink that avoids the filter ssl_fill_hello_random (b), TAINTCRYPT will generate a warning reporting the rule being violated.

4.11 An example illustrating the ability of our analysis in detecting and reporting an instance of data leak in Juniper Network. In this case, the sensitive data source in variable prng_seed as the first 8 bytes of variable prng_temporary reaches the sink print_number, which violates our security rule 19.

4.12 An example case of our technique being used to detect a double-free vulnerability in OpenSSL, which constitutes an instance of violation of our security rule 12. Here TAINTCRYPT reports the double-free incident with variable parms.

5.1 Overview of the payment card ecosystem.

5.2 Illustration of the baseline scanning and the certified version. A PCI scanner iteratively scans the testbed. The initial scan (baseline) is on the original testbed with all 35 vulnerabilities. The certified version is the testbed version where the testbed successfully passes the scanning after we iteratively fix a minimal set of vulnerabilities in the testbed. In Table 5.3, we report the scanning results on both versions of the testbed for each scanner.

5.3 An example of wrong hostname in the certificate. The domain (a*****.***) uses a certificate that is issued for a different domain name (*.n*****.***).

5.4 Self-signed certificate used by (r*****.***), a website that accepts payment cards for donations.

5.5 (u*****.***) uses expired certificates by default and redirects users to a secure sub-domain with proper certificate during payment.

5.6 A sample question from the Self-Assessment Questionnaire D (SAQ D) [52]. “Yes with CCW” means “the expected testing has been performed, the requirement has been met with the assistance of a compensating control, and a Compensating Control Worksheet (CCW) is required to be submitted along with the questionnaire”.

5.7 Self-Assessment Questionnaires (SAQs) for different types of e-commerce merchants.

B1 Homepage of www.rwycart.com.

B2 Customer login page of www.rwycart.com.

List of Tables

3.1 Cryptographic vulnerabilities, properties, and static analysis methods used. High, medium, and low risk levels are denoted by H/M/L, respectively. CPA stands for chosen plaintext attack, MitM for man-in-the-middle, C/I/A for confidentiality, integrity, and authenticity, respectively. ↑ means backward slicing and ↓ means forward slicing. Slicing is inter-procedural unless otherwise specified (e.g., intra, both). Refinement insights are applied for all the inter-procedural backward slicing.

3.2 The impact of clipping orthogonal explorations at various depths on runtime across 30 Apache root-subprojects. STD is computed across projects. Variations across multiple runs (3 runs) are negligible.

3.3 Breakdown of accuracy in Apache projects. Duplicates are handled at root-subproject level (total 82 root-subprojects). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

3.4 Experimental results on the CRYPTOAPI-BENCH basic and CRYPTOAPI-BENCH advanced benchmarks (as of April 2019) with CrySL, Coverity, SpotBugs and CRYPTOGUARD. GTP stands for the ground truth positives. TP, FP, and FN are the number of true positives, false positives, false negatives in a tool’s output, respectively. Pre. and Rec. represent precision and recall, respectively. Tools are evaluated on 6 common rules (out of our 16 rules), i.e., the maximum common subset of all tools. For these 6 rules, there are 6 correct cases (i.e., true negatives) in basic and 3 correct cases in advanced, which are used for computing FPRs. Total alerts = TP + FP.

3.5 Distribution of vulnerabilities in Android apps.

3.6 Violations in 5 popular libs (manually confirmed).

3.7 Summary of average runtime (in seconds) across all completed runs for CrySL and CRYPTOGUARD. We evaluated 30 Apache root-subprojects and 30 Android apps, each with 3 runs. Incmpl stands for the number of incomplete analyses. Standard deviations (std) are computed across projects/apps. Variations across runs are negligible.

4.1 Enforceable security rules in different cryptographic implementations. (*) indicates a rule focusing on data integrity and (#) indicates a rule focusing on data secrecy protection. Here, CPA and CCA stand for chosen plaintext attack and chosen ciphertext attack, respectively.

4.2 Transition functions (δ) of the finite state machine (FSM) presented in Figure 4.3.

4.3 Overview of TAINTCRYPT evaluation.

5.1 PCI Compliance levels and their evaluation criteria.

5.2 Prices of PCI scanners and the actual costs.

5.3 Testbed scanning results. “Baseline” indicates the scanning results on our testbed when all the 35 vulnerabilities are active. “Certified” indicates the scanning results after fixing the minimum number of vulnerabilities in order to be compliant. “#”, “G#”, “ ” means severity level of low, medium, and high respectively according to the scanners. “” mean “undetected”, “” means “fixed in the compliant version”, “∗” means “fixed as a side-effect of another case”. The “website scanners” represent a separate experiment to determine whether website scanners can help to improve coverage. We ran the website scanners on test cases that were not detected by the PCI ASV scanners. “N/A” means “not testable by a scanner”. “-” means “testable but do not need testing”. The "Must Fix" column shows the vulnerabilities that must be fixed by the e-commerce websites in order to be certified as PCI DSS compliant.

5.4 Number of e-commerce websites that have at least one vulnerability and those that have at least one “must-fix” vulnerability. In total, 1,203 sites are tested including 810 sites chosen from different web categories, and 393 sites chosen from different Alexa ranking ranges.

5.5 Testing results on 1,203 real-world websites that accept payment card transactions as of May 3, 2019. We reuse the index numbers of the test cases from Table 5.3.

5.6 Comparison between PCICHECKERLITE and the customized w3af on 100 randomly chosen websites and the Buggycart testbed. We report the number of vulnerable websites detected and the false positives (FP) among them.

A1 The number of alerts in Apache (total 94 root-subprojects) and Android applications (6,181). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

A2 Lines of code (LoC) of 30 Apache root-subprojects with their dependencies and 30 Android applications.

A3 Benchmark comparison of CrySL, Coverity, SpotBugs, and CryptoGuard on all 16 rules with CRYPTOAPI-BENCH’s 112 test cases (as of April 2019). There are 16 secure API use cases (13 in basic and 3 in advanced), which a tool should not raise any alerts on. CRYPTOGUARD successfully passed these 16 test cases. GTP stands for ground truth positive, which is the number of positives in the benchmark. CRYPTOGUARD has 11 false negatives, which we reported in Section 3.6 and discussed in Section 3.8.

A4 Rules that use intra-procedural backward program slicing to slice implemented methods of standard Java APIs and their corresponding slicing criteria.

A5 Java APIs used as slicing criteria in our intra-procedural forward program slicing and their corresponding security rules.

A6 Java APIs used as slicing criteria in our inter-procedural backward slicing and their corresponding security rules. Boldface indicates the parameter of interest.

B1 Specifications defined by the PCI Security Standard Council (SSC) along with their targets, evaluators, assessors and whether it is enforced by SSC. “COTS" stands for Commercial Off-The-Shelf.

B2 A summary of the guidelines for ASV scanners [41]. In the fourth column, we show the categories that are required to be fixed. “∗" means that in the SSL/TLS category, all the vulnerabilities are required to be fixed, except case 18.

B3 PCI DSS requirements are presented with expected testing (from SAQ D-Mer) and the potential test-cases that can be used to evaluate the ASV scanning.

Chapter 1

Introduction

1.1 Need for Deployment-grade Solutions

Provably secure cryptographic protocols are not sufficient to ensure software security. While implementing these cryptographic protocols, one hopes to replicate the security guarantees provided by their theoretical counterparts, which have been proven secure. This seemingly straightforward goal of implementing applications with provably secure guarantees, however, is often unaccomplished, as evident in the recent high-profile outbreaks of cryptography-related vulnerabilities in widely used network libraries and tools (e.g., the Heartbleed vulnerability in OpenSSL [15] and seed leaking in Juniper Network [100]). Researchers also showed that incorrect uses of these library APIs across ecosystems (e.g., Ubuntu software packages [125], Android apps [112], etc.) are causing an enormous number of critical vulnerabilities affecting millions of users [112, 114, 120]. Surprisingly, the task of securing cryptographic implementations as well as their use is still in its infancy. This status is in sharp contrast with the multi-decade advancement of modern cryptography. The main challenge of producing deployment-grade techniques to automatically find vulnerabilities is to model security properties in terms of meta-level program properties so that (1) the modeling causes few or no false positives, and (2) low-cost checking (low runtime overhead, low false positives, low false negatives) is possible.

Beyond scalable vulnerability detection, ensuring the baseline security of an internet-wide ecosystem requires formal guidelines and their strict enforcement. For example, to standardize the payment card industry (PCI) security requirements at a global scale, major card brands (including Visa, MasterCard, American Express, Discover, and JCB) formed an alliance named Payment Card Industry Security Standards Council (PCI SSC). The council maintains, updates, and promotes the Data Security Standard (DSS) [51], which defines a comprehensive set of security requirements for payment systems. However, several recent high-profile data breaches [34, 174] have raised concerns about the security of the payment card ecosystem, especially for e-commerce merchants¹. A research report from Gemini Advisory [44] shows that 60 million US payment cards were compromised in 2018 alone. Among the merchants that experienced data breaches, many were known to be compliant with the PCI data security standards (PCI DSS). For example, in 2013, Target leaked information on 40 million payment cards due to insecure practices within its internal networks [174], even though Target was marked as PCI DSS compliant. These incidents raise important questions about how PCI DSS is enforced in practice.

To understand and improve the existing security status, this thesis aims to make security practices more democratized and affordable by producing deployment-grade tools for practitioners. It introduces new scalable solutions to detect software vulnerabilities statically and to measure ecosystem-wide security compliance. Specifically, it poses the following research questions.

• Automatic detection of cryptographic vulnerabilities. Is it possible to design low-cost (low FPs, low FNs, and low runtime overhead) static analysis solutions to detect a wide range of cryptographic vulnerabilities? (Chapters 3 and 4) Is it possible to develop a universal specification language to express a meta-level program property that can be detected by using a combination of various types of static data-flow analyses? (Chapter 3)

• Measuring the security non-compliance. How well are the payment card industry (PCI) data security standards enforced in practice? Do real-world e-commerce websites live up to PCI data security standards? (Chapter 5)

¹ Merchants that allow online payment card transactions for selling products and services are referred to as “e-commerce merchants”.

1.2 Contribution

In this section, we summarize the technical contributions of this thesis.

• Precise and scalable cryptographic misuse detection in Java. To detect cryptographic misuses in massive-sized Java applications, we propose CRYPTOGUARD. CRYPTOGUARD is a set of highly precise (98.61% precision) detection algorithms that refine program slices by identifying language-specific irrelevant elements. The refinements reduce false alerts by 76% to 80% in our experiments. Running our tool, CRYPTOGUARD, on 46 high-impact large-scale Apache projects and 6,181 Android apps generated many security insights. Our findings helped multiple popular Apache projects to harden their code, including Spark, Ranger, and Ofbiz. Our in-depth comparison with leading solutions, including CrySL, SpotBugs, and Coverity, shows that CRYPTOGUARD significantly outperforms all the existing solutions. We design a universal specification language named SPANL to model meta-level program properties that can be detected by using a combination of various forms of data-flow analyses. We then establish the expressiveness of the language both theoretically and empirically.

• Detection of cryptographic implementation flaws and misuse vulnerabilities in C/C++. We conducted an in-depth exploratory study of security vulnerabilities in cryptographic programs, which resulted in a taxonomy of 25 classes of exploitable vulnerabilities across 12 distinct types of security attacks. Then, we develop a deterministic finite automaton (DFA)-based specification language, which enables expressing taint-style meta-level program properties. We show that this specification language can be used to model 15 out of the 25 security issues we found. We further develop a tool called TAINTCRYPT to validate the rules written in this specification language at compile-time. We demonstrate the efficacy of TAINTCRYPT by analyzing open-source C/C++ cryptographic libraries (e.g., OpenSSL) and observe that TAINTCRYPT could have helped to avoid several high-profile implementation vulnerabilities. Our evaluation on 5 popular applications and libraries generated new security insights.

• Measuring the security non-compliance. To measure ecosystem-wide security non-compliance, we systematically evaluate the PCI DSS certification process for e-commerce websites. We develop an e-commerce web application testbed, BUGGYCART, which can flexibly add or remove 35 PCI DSS related vulnerabilities. Then we use the testbed to examine the capability and limitations of PCI scanners and the rigor of the certification process. We found that none of the 6 PCI scanners we tested is fully compliant with the PCI scanning guidelines; they issue certificates to merchants that still have major vulnerabilities. To further examine the compliance status of real-world e-commerce websites, we build a new lightweight scanning tool named PCICHECKERLITE and scan 1,203 e-commerce websites across various business sectors. The results confirm that 86% of the websites have at least one PCI DSS violation that should have marked them as non-compliant. Our in-depth accuracy analysis also shows that PCICHECKERLITE’s output is more precise than that of w3af. We reached out to the PCI Security Council to share our research results to improve the enforcement in practice.

This thesis has drawn attention from both industry and academia by reporting numerous vulnerabilities and open-sourcing various tools and benchmarks. It has a direct impact on improving the security of several high-profile Apache projects (e.g., Spark, Ranger, and Ofbiz). Oracle has implemented our cryptographic code screening algorithms in its internal code analysis platform, Parfait, and detected numerous vulnerabilities that were previously unknown. CRYPTOGUARD is open-sourced. To make CRYPTOGUARD more accessible to developers, we are in the process of integrating it with SWAMP and have created build tool plugins (Maven and Gradle). We reached out to the PCI Security Council to share our research results to improve the enforcement in practice. To improve the testbeds in light of our findings, the Security Council shared a copy of our paper with the dedicated companies that host the PCI certification testbeds. SAP and Mastercard asked for our code, indicating their intention to test our tools and gain more insights to improve their testbed and/or scanning tools. Our measurement on real-world websites found exploitable vulnerabilities. We responsibly disclosed the severe vulnerabilities to the website owners.

1.3 Organization of the Report

The structure of this thesis is as follows. In Chapter 2, I present related work on the automatic detection of cryptographic vulnerabilities and measuring security non-compliance. In Chapter 3, I present my work on detecting cryptographic misuses in Java. Here, I also present my ongoing work on building a universal language to detect security vulnerabilities by using static data-flow analysis. Chapter 4 presents my work on detecting cryptographic implementation and misuse flaws in C/C++. Chapter 5 presents the work on measuring security non-compliance in the payment card industry. Chapter 6 concludes this report.

Chapter 2

Literature Review

2.1 Tools to detect cryptographic vulnerabilities

Cryptographic misuse detection tools typically fall into two broad groups, i.e., static analysis (e.g., CryptoLint [112], MalloDroid [114], FixDroid [156], CogniCrypt [134] and CrySL [135]) and dynamic analysis (e.g., SMV-Hunter [177], AndroSSL [116] and K-Hunt [138]). For example, MalloDroid [114] uses a list of known insecure implementations of HostnameVerifier and TrustManager to screen Android apps. In [131], the authors showed that false positives are one of the most significant barriers to adopting static analysis tools. This problem also exists in anomaly and intrusion detection systems [140, 190]. When screening large projects, virtually all static slicing solutions in this space (e.g., [112]) might generate a non-negligible number of false positives. Contextual refinements similar to CRYPTOGUARD’s are necessary to achieve high precision in practice.

CrySL and CRYPTOGUARD have different-but-overlapping security capabilities. Based on CrySL code and documentation, we identified rules that CrySL supports, but CRYPTOGUARD does not cover. For example, CrySL covers rules to verify the correctness of Signature and MAC generation procedures. CrySL also reports non-crypto issues, e.g., variables not being used. On the flip side, CRYPTOGUARD supports rules that CrySL does not cover (Rules 4, 5, 7, 8, and 9), including dummy hostname verifier, dummy certificate validation, use of HTTP, predictable seeds, and untrusted PRNG. For fairness, we only compare the intersected portion of capabilities.

Other misuse detection tools (e.g., FixDroid [156] and CogniCrypt [134]) were mainly built for the user-experience study with the goal of making detection tools developer-friendly, as opposed to a deployment-quality screening solution. For example, FixDroid focuses on providing real-time feedback to developers. CogniCrypt’s [134] focus is on code generation (in Eclipse IDE) for several common cryptographic tasks (e.g., data encryption). Some dynamic analysis tools use a simple static analysis to first narrow down the number of potential apps for dynamic analysis. For example, SMV-Hunter [177] looks for apps that contain any custom implementation of X509TrustManager or HostNameVerifier for the initial screening.

The recent work SymCerts used a combination of concrete values and symbolic execution to detect missing checks in X.509 certificate verification code [99]. Concrete values are used to reduce the path exploration space.

In [184], the authors explored the capability of symbolic execution to detect decryption oracles and data authentication vulnerabilities in WPA2 implementations. The authors showed that mishandling data authentication may result in timing side channels or decryption oracles, which nicely complements our tool, TAINTCRYPT.

2.2 Other analysis tools

Researchers found that misusing non-cryptographic APIs in Android also has serious security implications. These APIs include APIs to access sensitive information (such as location, IMEI, and passwords) [153], APIs for fingerprint protection [89], and cloud service APIs for information storage [193]. Data-driven techniques to identify API misuses have been proposed [159, 191], which use lightweight static analysis to infer detection rules from examples. In [149], the authors proposed a Bayesian framework for automatically learning correct API uses. Efforts on automatically repairing insecure code have also been reported [143, 144, 163]. Static code analysis has been extensively used for other related software problems as well, including malware analysis and detection [113, 160, 188], vulnerability discoveries [89, 136], and data-leak detection [93].

In [103], Chi et al. presented a system to infer client behaviors by leveraging symbolic executions of client-side code. They used such knowledge to filter anomalous traffic.

Fuzzing has been demonstrated to automatically discover software vulnerabilities [105, 175, 176]. These techniques aim to find input-guided vulnerabilities that result in observable behaviors (e.g., triggering program crashes [176] or anomalous protocol states [105, 175]). It is unclear how to use fuzzing to detect cryptographic vulnerabilities (e.g., predictable IVs/secrets, legacy primitives) that do not exhibit easily observable anomalous behaviors. Another notable example of dynamic approaches is to validate runtime protocol behaviors with a verifier, which is capable of detecting invalid or inconsistent network messages [103]. Although the work in [103] uses symbolic execution to infer client behaviors, it requires executing the programs to detect anomalies.

Similar to these, other dynamic approaches also rely on concrete executions of the program. Accordingly, their results are subject to the availability and quality of the run-time inputs that drive the executions, which may not always be available in practical use scenarios. They are limited to finding only the input-guided vulnerabilities with externally visible behaviors (e.g., triggering program crashes [176] or anomalous protocol states [103, 105]). In addition, as pointed out by [99], fuzzing and other dynamic analysis techniques typically cannot guarantee coverage, which may result in missed detection.

In contrast, our approach to detecting cryptographic security rule violations is purely static and thus bypasses the above limitations of dynamic approaches. Moreover, static analysis has more potential to be scalable with wide coverage than dynamic analysis, as it does not require program execution.

2.3 Measuring ecosystem-wide security non-compliance

Website Scanning. The detection of web application vulnerabilities has been well studied by researchers [82, 107, 178]. In [82, 186], the authors measured the performance of several black-box web scanners and reported a low detection rate for XSS and SQL injection attacks. The main challenge is to exhaustively discover various web-app states by observing the input/output patterns. Duchene et al. [111] proposed an input fuzzer to detect XSS vulnerabilities. Doupé et al. [107] proposed to guide fuzzing based on the website’s internal states. In [161], the authors proposed a black-box method to detect logical flaws using network traffic. In [178], the authors used taint-tracking based detection of XSS vulnerabilities at the client side. In [162], the authors used dynamic execution trace-based behavioral models to detect CSRF vulnerabilities. Although most defenses against XSS and SQL injection attacks prescribe input sanitization [79, 127, 141], in [109], the authors proposed an application-agnostic rewrite technique to differentiate scripts from other HTML inputs. We argue that similar research efforts could make a positive impact on the PCI community by (1) producing and releasing high-quality open-sourced tools; and (2) customizing a non-intrusive version of the tool for testing production websites in the PCI DSS context.

Proactive Threat Measurements. Honeypots [154, 164] are useful to collect empirical data on attackers (or defenders). In [124], the authors measure attack behaviors by deploying vulnerable web servers waiting to be compromised. In [157], the authors deployed phishing websites to measure the timeliness of browsers’ blacklist mechanisms. In [97], the authors measure the capability of web hosting providers to detect compromised websites by deploying vulnerable websites within those web hosting services. Our testbed can be regarded as a specialized honeypot to assess the capability of PCI scanners.

Physical Card Frauds. Payment card frauds at ATM or point-of-sale (POS) machines have been studied for decades [72, 73, 91, 110, 151, 172, 173]. Most of these frauds stem from stealing payment card information during physical transactions [59, 72] and from cloning magnetic stripe cards [172, 173]. EMV cards are known to be resistant to card cloning, but are vulnerable to tampered terminals [110], protocol-level vulnerabilities [151], and implementation flaws [91]. Recently, researchers proposed mechanisms to detect magnetic card skimmers [88, 172].

Digital Card Frauds. In the online setting, the danger of using magnetic-stripe-like transactions has been known for years [20, 44]. Various methods (e.g., 3D-Secure [47], Tokenization framework [33]) have been proposed to fix it. Unfortunately, 3D-Secure is found to be inconvenient and easy to break [150]. The Tokenization framework offers a great alternative by replacing original card information with temporary tokens during a transaction. However, card information can still be stolen during the account setup phase at a poorly secured merchant. Other unregulated digital financial services are also reported to be insecure [169]. In [169], the authors showed that branchless banking apps that leverage cellular networks to send/receive cash are also vulnerable due to flaws such as skipping SSL/TLS certificate validation and using insecure cryptographic primitives.

Chapter 3

Cryptographic API Misuse Detection

3.1 Introduction

Cryptographic algorithms offer provable security guarantees in the presence of adversaries. However, vulnerabilities and deficiencies in low-level cryptographic implementations seriously reduce the guarantees in practice [92, 100, 104, 118, 119, 181]. Researchers also found that misusing cryptographic APIs is not unusual in application-level code [112]. The causes of these vulnerabilities are multi-fold, and include complex APIs [64, 152], the lack of cybersecurity training [145], the lack of tools [66], and insecure and misleading forum posts (such as on StackOverflow) [65, 145]. Some aspects of security libraries (such as JCA, JCE, and JSSE¹) are difficult for developers to use correctly, e.g., certificate verification [121] and cross-language encryption and decryption [145].

In this work, we focus on the goal of screening massive-sized Java projects for cryptographic API misuses. Specifically, we aim to design a static analysis tool that has no or few false positives (i.e., false alarms) and can be routinely used by developers.

Efforts to screen cryptographic APIs have been previously reported in the literature, including static analysis (e.g., CrySL [135], FixDroid [156], CogniCrypt [134], CryptoLint [112]) and dynamic analysis (e.g., SMV-Hunter [177], and AndroSSL [116]), as well as manual code inspection [121]. Static and dynamic analyses have their respective pros and cons. Static methods do not require the execution of programs. They scale up to a large number of programs, cover a wide range of security rules, and are unlikely to have false negatives (i.e., missed detections). Dynamic methods, in comparison, require one to trigger and detect specific misuse symptoms at runtime (e.g., misconfigurations of SSL/TLS). The advantage of dynamic approaches is that they tend to produce fewer false positives (i.e., false alarms) than static analysis. Deployment-grade code screening tools need to be scalable with wide coverage. Thus, a static program analysis approach is favorable. However, existing static analysis-based tools (e.g., [112, 134, 135, 156]) are not optimized to operate on the scale of massive-sized Java projects (e.g., millions of LoC).

¹ JCA, JCE, and JSSE stand for Java Cryptography Architecture, Java Cryptography Extension, and Java Secure Socket Extension, respectively.

Existing static analysis tools are also limited in detecting SSL/TLS API misuses and are not designed to detect complex misuse scenarios. For example, MalloDroid [114] uses a list of known insecure implementations of HostnameVerifier and TrustManager to screen apps. Google Play recently deployed an automatic app checking mechanism for SSL/TLS hostname verifier and certificate verification vulnerabilities [48]. However, the inspection appears to only target obvious misuse scenarios, e.g., return true in verify method or an empty body in checkServerTrusted [24].

We made substantial progress toward building a high accuracy and low runtime static analysis solution for detecting cryptographic and SSL/TLS API misuse vulnerabilities. Our tool, CRYPTOGUARD, is built on specialized forward and backward program slicing techniques. These slicing algorithms are implemented by using flow-, context- and field-sensitive data-flow analysis.

Although program slicing is a well-known technique for identifying the set of instructions that influence or are influenced by a program variable, its direct application to screening cryptographic implementations has several problems, which are explained next.

Detection accuracy. A challenging problem is the excessive number of false positives that basic static analysis (including slicing) generates. Several types of detection require one to search for constants or values from predictable APIs, e.g., passwords, seeds, or initialization vectors (IVs). However, benign constants or irrelevant parameters may be mistaken as violations (e.g., array/collection bookkeeping constants). Another source of detection inaccuracy comes from the assumption that all the system and runtime libraries are present during the analysis. This assumption holds for Android apps, but not necessarily for Java projects.

CRYPTOGUARD addresses the false positive problem with a set of refinement algorithms derived from empirical observations of common programming idioms and language restrictions. The refinements remove irrelevant resource identifiers, arguments about states of operations, constants on infeasible paths, and bookkeeping values. For eight of our rules, these refinement algorithms reduce the total number of alerts by 76% in Apache and 80% in Android (Figure 3.3). Our manual analysis shows that CRYPTOGUARD has a precision of 98.61% on Apache.

Efficiency and coverage. Analysis techniques that build a super control-flow graph of the entire program would incur significant memory and runtime overhead. In contrast, our on-demand slicing algorithms are lightweight, which start from the slicing criteria and only propagate to the methods that have the potential to impact security. Hence, a large portion of the code base is not touched.

Our technical contributions are summarized as follows.

• We designed and implemented a set of analysis algorithms for detecting cryptographic and SSL/TLS API misuses. Our static code checking tool, CRYPTOGUARD, is designed for developers to use routinely on large Java projects. Besides open-sourcing CRYPTOGUARD², we are currently integrating it with the Software Assurance Marketplace (SWAMP) [106], a well-known free software security analysis platform.

• We gained numerous security insights from screening 46 Apache projects. For 15 of our rules, we observed violations in Apache projects (Table A1). 39 out of the 46 projects have at least one type of cryptographic misuse, and 33 projects have at least two. We reported our security findings to Apache, some of which have been promptly fixed. In Section 3.8, we share our experience of disclosing to the Apache teams and their pragmatic constraints, e.g., backward compatibility.

² Available at https://github.com/CryptoGuardOSS/cryptoguard under GPL v3.0.

• Our evaluation on 6,181 Android apps shows that around 95% of the total vulnerabilities come from libraries that are packaged with the application code. Some libraries are from Google, Facebook, Apache, and Tencent (Table 3.6). We observe violations in most of the categories, including hardcoded keyStore passwords, e.g., notasecret is used in multiple Google libraries (Table 3.5). We also detected multiple SSL/TLS (MitM) vulnerabilities that Google Play’s automatic screening seemed to have missed.

• We created a benchmark named CRYPTOAPI-BENCH with 112 unit test cases.³ CRYPTOAPI-BENCH contains basic intra-procedural instances, inter-procedural cases, field-sensitive cases, false positive tests, and correct API uses.

• To enable domain-specific security validation, we design a universal specification language named SPANL. SPANL can be used to model meta-level program properties that can be detected by using a combination of various data-flow analyses. To demonstrate its expressiveness, we model various security rules related to cryptographic misuse in SPANL. We also model various classes of framework misconfiguration vulnerabilities that have been reported recently [129].

3.2 Threat Model and Overview

We describe our threat model and discuss the technical challenges associated with detecting these threats with static program analysis. For each challenge, we briefly overview our solution.

³ Our benchmark is available at https://github.com/CryptoGuardOSS/cryptoapi-bench.

(a)

 1 class PasswordEncryptor {
 2
 3   Crypto crypto;
 4
 5   public PasswordEncryptor(){
 6     String passKey = PasswordEncryptor.getKey("pass.key");
 7     crypto = new Crypto(passKey);
 8   }
 9
10   byte[] encPass(String[] arg){
11     return crypto.encrypt(arg[0], arg[1]);
12   }
13
14   static String getKey(String src){
15     String key = Context.getProperty(src);
16     if (key == null){
17       key = "defaultkey";
18     }
19     return key;
20   }
21 }
22 class Crypto {
23
24   String ALGO = "AES";
25   String ALGO_SPEC = "AES/CBC/NoPadding";
26   String defaultKey;
27   Cipher cipher;
28
29   public Crypto(String defKey){
30     cipher = Cipher.getInstance(ALGO_SPEC);
31     defaultKey = defKey; // assigning field
32   }
33
34   byte[] encrypt(String txt, String key){
35     if (key == null){
36       key = defaultKey;
37     }
38     byte[] keyBytes = key.getBytes("UTF-8");
39     byte[] txtBytes = txt.getBytes();
40     SecretKeySpec keySpc = new SecretKeySpec(keyBytes, ALGO);
41     cipher.init(Cipher.ENCRYPT_MODE, keySpc);
42     return cipher.doFinal(txtBytes);}}

(b) [Data dependency graph; the drawing is not recoverable from the extracted text. Node labels: "pass.key", getKey(..), "defaultkey", Context.getProperty(..), PasswordEncryptor(..), passKey, param0, param1, 1, defKey, Crypto(..), arg, encPass(..), Crypto:defaultKey, "UTF-8", key, getBytes(..), encrypt(..), keyBytes. Edge legend: param influence (p), field influence (f), orthogonal invocations, orthogonal influence.]

Figure 3.1: (a) An example demonstrating various features of CRYPTOGUARD. The Crypto class is used for generic AES encryption and the PasswordEncryptor class uses Crypto for password encryption. f indicates influence through the fields and p indicates influence through the method parameters. (b) Partial data dependency graph for the keyBytes variable.

3.2.1 Threat Model

We summarize the vulnerabilities that CRYPTOGUARD aims to detect below and in Table 3.1. We also rank their severity.

1. Vulnerabilities due to predictable secrets. Software with predictable cryptographic keys and passwords is inherently insecure [112]. Here, we consider the use of any constants, as well as values that are derived from constants or API calls with predictable outputs (e.g., DeviceID, Timestamps) to be insecure.

2. Vulnerabilities from MitM attacks on SSL/TLS. Improper customization of Java Secure Socket Extension (JSSE) APIs may result in man-in-the-middle (MitM) vulnerabilities [114, 121]. CryptoLint [112] does not detect these vulnerabilities.

3. Vulnerabilities from predictable PRNGs. Predictable pseudorandom number generators (PRNGs) are a major source of vulnerabilities [84, 122, 126]. Using java.util.Random as a PRNG is insecure [31, 133]. In addition, seeds for java.security.SecureRandom [32] should not be predictable.

4. Vulnerabilities from CPA. Ciphertexts should be indistinguishable under chosen plaintext attacks (CPA) [112]. Static salts make dictionary attacks easier on password-based encryption (PBE). In addition, static initialization vectors (IVs) in cipher block chaining (CBC) mode and the use of electronic codebook (ECB) mode are insecure [80, 137].

5. Vulnerabilities from feasible bruteforce attacks. MD5 and SHA1 are susceptible to hash collision [179, 180] and pre-image [36, 98] attacks. In addition, bruteforce attacks are feasible for 64-bit symmetric ciphers (e.g., DES, 3DES, IDEA, Blowfish) [86]. 1024-bit RSA/DSA/DH and 160-bit ECC are also weak [21]. RFC 8018 recommends at least 1000 iterations for PBE [148].
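As a small illustration of item 3 above (a sketch that is not part of CRYPTOGUARD; the class and method names are hypothetical), the following program shows why a constant seed makes java.util.Random predictable: two generators created with the same seed emit the same byte stream, so anyone who learns or guesses the seed can replay it. A self-seeded java.security.SecureRandom is shown for contrast.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Random;

final class PrngSketch {
    /** Draws n bytes from a java.util.Random created with the given seed. */
    static byte[] fromSeed(long seed, int n) {
        byte[] out = new byte[n];
        new Random(seed).nextBytes(out);
        return out;
    }

    /** Draws n bytes from a SecureRandom that seeds itself. */
    static byte[] fromSecureRandom(int n) {
        byte[] out = new byte[n];
        new SecureRandom().nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same stream: the "random" value is fully determined by the constant.
        System.out.println("seeded streams equal: "
                + Arrays.equals(fromSeed(1234L, 16), fromSeed(1234L, 16)));
        // Two independent self-seeded draws are expected to differ.
        System.out.println("secure streams equal: "
                + Arrays.equals(fromSecureRandom(16), fromSecureRandom(16)));
    }
}
```

The seed value 1234 is arbitrary; any constant or otherwise predictable seed has the same effect.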

How severe are these vulnerabilities? Each case has specific attack scenarios documented in the literature. To prioritize alerts, we categorize their severity into high, medium, and low, based on i) attacker’s gain and ii) attack difficulty. Vulnerabilities from predictable secrets and SSL/TLS MitM are immediately exploitable and substantially benefit attackers. In Android, an application can only access its own KeyStore. Hence, hard-coded passwords are less harmful in Android. However, privilege escalation attacks bypassing this restriction have been demonstrated [183]. Commercially available rainbow tables allow attackers to easily obtain pre-images of MD5 and SHA1 hashes for typical passwords [37]. Hash collisions for these algorithms enable attackers to forge digital signatures or break the integrity of any messages [87, 179]. Therefore, these vulnerabilities are classified as high risks. Vulnerabilities from predictability and CPA provide substantial advantages to attackers by significantly reducing attack efforts. They are medium-level risks. Brute-forcing ciphers, requiring non-trivial effort, is low risk.

3.2.2 Technical Challenges and Solution Overview

The task of screening millions of lines of code for cryptographic API misuses poses a set of technical challenges.

Technical Challenge I: False positives.

1. False positives due to phantom methods. A method is phantom if its body is not available during analysis. Unlike Android, Java web applications have phantom libraries. A non-system library that is not packaged with the project binaries is referred to as a phantom library. Existing cryptographic misuse vulnerability solutions (e.g., CryptoLint [112], CrySL [135]) are not designed to handle phantom libraries, which may cause false positives. For example, in Figure 3.1(a), if the class Context is a member of a phantom library, then the getProperty method (Line 15) is a phantom method. The data-flow diagram in Figure 3.1(b) shows that a straightforward def-use analysis would likely report pass.key as a hard-coded key, since it cannot explore the getProperty method at Line 15.

Our solution is a set of crypto-specific methods to refine slicing outputs (Section 3.5). For example, examining the context reveals that pass.key is used as an identifier of a key and has no security influence on keyBytes. Thus, it can be safely discarded.

2. False positives due to data structures. Constants for bookkeeping data structures are another major source of false positives that is largely unaddressed in the existing literature (e.g., [112, 135]). The most frequently used data structures include lists, maps, and arrays. For example, a data-structure-unaware analysis would likely report “1” from Line 11 (Figure 3.1(a)) as a hard-coded key, as it influences the key parameter of the encrypt method (Figure 3.1(b)). Our refinement algorithms track and discard any kind of data-structure-bookkeeping constant (Section 3.5).
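The effect of both refinements can be sketched on the example of Figure 3.1. The following toy program is not CRYPTOGUARD's implementation: the dependency edges are written by hand from Figure 3.1(b), and the edge kinds (DATA, PHANTOM_ARG, INDEX, STATE_ARG) as well as all class and method names are hypothetical labels introduced only for illustration. A plain backward traversal from keyBytes collects every constant it can reach, whereas a traversal that follows only DATA edges keeps the one constant that can actually become key material.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SliceRefinementSketch {
    /** How a value reaches its consumer; only DATA edges are assumed to carry key material. */
    enum Kind { DATA, PHANTOM_ARG, INDEX, STATE_ARG }

    static final class Edge {
        final String from;
        final Kind kind;
        Edge(String from, Kind kind) { this.from = from; this.kind = kind; }
    }

    /** Maps a node to the values it depends on (hand-written, simplified). */
    static final Map<String, List<Edge>> DEPS = new HashMap<>();

    static void dep(String node, String from, Kind kind) {
        DEPS.computeIfAbsent(node, k -> new ArrayList<>()).add(new Edge(from, kind));
    }

    static {
        dep("keyBytes", "key", Kind.DATA);                      // Line 38
        dep("keyBytes", "const:UTF-8", Kind.STATE_ARG);         // Line 38: charset name
        dep("key", "Crypto.defaultKey", Kind.DATA);             // Line 36: field
        dep("key", "arg", Kind.DATA);                           // Line 11: arg[1]
        dep("key", "const:1", Kind.INDEX);                      // Line 11: array index
        dep("Crypto.defaultKey", "defKey", Kind.DATA);          // Line 31
        dep("defKey", "passKey", Kind.DATA);                    // Line 7
        dep("passKey", "getKey.return", Kind.DATA);             // Line 6
        dep("getKey.return", "const:defaultkey", Kind.DATA);    // Line 17
        dep("getKey.return", "Context.getProperty", Kind.DATA); // Line 15
        dep("Context.getProperty", "const:pass.key", Kind.PHANTOM_ARG);
    }

    /** Collects constants that reach the criterion; with refine, non-DATA edges are ignored. */
    static Set<String> constants(String criterion, boolean refine) {
        Set<String> found = new TreeSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(criterion);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (!seen.add(node)) continue;
            if (node.startsWith("const:")) found.add(node.substring("const:".length()));
            for (Edge e : DEPS.getOrDefault(node, Collections.<Edge>emptyList())) {
                if (refine && e.kind != Kind.DATA) continue;
                work.push(e.from);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("unrefined: " + constants("keyBytes", false));
        System.out.println("refined:   " + constants("keyBytes", true));
    }
}
```

In this toy graph the unrefined traversal reports four constants (1, UTF-8, defaultkey, and pass.key), while the refined one reports only defaultkey, which mirrors the discussion of pass.key and “1” above.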

Technical Challenge II: precision vs. runtime tradeoff. For a large project with millions of LoC, building a super-CFG is costly and unnecessary. Cryptographic functionality is often confined within a small fraction of the project. However, most flow-, context- and field-sensitive analysis based tools (e.g., [112, 135]) appear to build a super control-flow graph, e.g., by superimposing the project’s call graph over control-flow graphs of methods, adding call edges between invoke instructions, method entries, and exits.

In contrast, we adopt the following more scalable approaches.

1. Control the depth of orthogonal explorations. Most of our cryptographic vulnerabilities involve finding constants. A distinguishing feature of constants is that they require little or no processing before use. Generally, processing is done by orthogonal method invocations. Clipping orthogonal explorations may impact the detection accuracy and runtime. Based on our experiments in Section 3.5.7, we set the depth of orthogonal exploration to 1 in our detection. We use techniques similar to those for phantom method handling to reduce the false positives introduced by clipping.

2. Demand-driven analysis. Our flow- and context-sensitive analysis is demand driven. We perform on-demand inter-procedural backward data-flow analysis to perform backward slicing, where the analysis starts from the slicing criteria and propagates upward and orthogonally on demand. For example, in Figure 3.1(a), a propagation from the encrypt method to the encPass method is an upward propagation. Propagations to the orthogonal method invocations at Lines 6 and 38 are orthogonal propagations. Our on-demand field sensitivity is applied to a field only if it is used in our inter-procedural backward slices. A field’s influence is considered indirect if the field is accessed using orthogonal method invocations (i.e., getter methods). We refer to this field sensitivity as data-only class field-sensitivity.

Table 3.1: Cryptographic vulnerabilities, properties, and static analysis methods used. High, medium, and low risk levels are denoted by H/M/L, respectively. CPA stands for chosen plaintext attack, MitM for man-in-the-middle, C/I/A for confidentiality, integrity, and authenticity, respectively. ↑ means backward slicing and ↓ means forward slicing. Slicing is inter-procedural unless otherwise specified (e.g., intra, both). Refinement insights are applied for all the inter-procedural backward slicing.

| No | Vulnerabilities | Attack Type | Crypto Property | Severity | Our Analysis Method |
|----|-----------------|-------------|-----------------|----------|---------------------|
| 1 | Predictable/constant cryptographic keys | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing |
| 2 | Predictable/constant passwords for PBE | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing |
| 3 | Predictable/constant passwords for KeyStore | Predictable Secrets | Confidentiality | H | ↑ slicing & ↓ slicing |
| 4 | Custom Hostname verifiers to accept all hosts | SSL/TLS MitM | C/I/A | H | ↑ slicing (intra) |
| 5 | Custom TrustManager to trust all certificates | SSL/TLS MitM | C/I/A | H | ↑ slicing (intra) |
| 6 | Custom SSLSocketFactory w/o manual Hostname verification | SSL/TLS MitM | C/I/A | H | ↓ slicing (intra) |
| 7 | Occasional use of HTTP | SSL/TLS MitM | C/I/A | H | ↑ slicing |
| 8 | Predictable/constant PRNG seeds | Predictability | Randomness | M | ↑ slicing & ↓ slicing |
| 9 | Cryptographically insecure PRNGs (e.g., java.util.Random) | Predictability | Randomness | M | Search |
| 10 | Static Salts in PBE | CPA | Confidentiality | M | ↑ slicing & ↓ slicing |
| 11 | ECB mode in symmetric ciphers | CPA | Confidentiality | M | ↑ slicing |
| 12 | Static IVs in CBC mode symmetric ciphers | CPA | Confidentiality | M | ↑ slicing & ↓ slicing |
| 13 | Fewer than 1,000 iterations for PBE | Brute-force | Confidentiality | L | ↑ slicing & ↓ slicing |
| 14 | 64-bit block ciphers (e.g., DES, IDEA, Blowfish, RC4, RC2) | Brute-force | Confidentiality | L | ↑ slicing |
| 15 | Insecure asymmetric ciphers (e.g., RSA, ECC) | Brute-force | C/A | L | ↑ slicing & ↓ slicing (both) |
| 16 | Insecure cryptographic hash (e.g., SHA1, MD5, MD4, MD2) | Brute-force | Integrity | H | ↑ slicing |

3. Subproject awareness. Code in large projects is usually organized into subprojects, packaged as separate .jars. CRYPTOGUARD creates and consults a directed acyclic graph (DAG) representing subproject dependencies. This approach i) excludes unnecessary subprojects and ii) analyzes independent subprojects concurrently.
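One way to derive a concurrent schedule from such a DAG is a level-by-level topological sort. The sketch below is only an illustration under our own assumptions: the text above does not describe how CRYPTOGUARD schedules subprojects, and the class name, method name, and subproject names (core, web, crypto-utils, app) are hypothetical. Subprojects placed in the same batch do not depend on one another, so they could be analyzed in parallel.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SubprojectScheduleSketch {
    /**
     * Groups subprojects into batches (Kahn's algorithm, one level at a time).
     * dependsOn maps a subproject to the subprojects it depends on.
     */
    static List<List<String>> batches(Map<String, List<String>> dependsOn) {
        Map<String, Integer> unmet = new TreeMap<>();          // unmet dependency counts
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            unmet.putIfAbsent(e.getKey(), 0);
            for (String d : e.getValue()) {
                unmet.putIfAbsent(d, 0);
                unmet.merge(e.getKey(), 1, Integer::sum);
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) {
                for (String user : dependents.getOrDefault(done, Collections.<String>emptyList())) {
                    if (unmet.merge(user, -1, Integer::sum) == 0) next.add(user);
                }
            }
            Collections.sort(next); // deterministic order within a batch
            ready = next;
        }
        if (scheduled != unmet.size()) {
            throw new IllegalArgumentException("dependency cycle: not a DAG");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("core", Collections.<String>emptyList());
        deps.put("web", Arrays.asList("core"));
        deps.put("crypto-utils", Arrays.asList("core"));
        deps.put("app", Arrays.asList("web", "crypto-utils"));
        System.out.println(batches(deps));
    }
}
```

For the four hypothetical subprojects in main, the schedule has three batches: core first, then crypto-utils and web together, then app.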

3.3 Map Vulnerabilities to Analysis

It is important to map cryptographic properties to concrete Java programming elements that can be statically enforced. We break down the detection plan into one or more abstract steps so that each step can be mapped to a single round of static analysis.

In this section, we illustrate the process of mapping cryptographic vulnerabilities to concrete program analysis tasks. This mapping process is manual and only needs to be performed once for each vulnerability. In what follows, we use rule i to refer to the detection of vulnerability i in Table 3.1. For example, in Rule 4, we detect the abuse of the HostnameVerifier interface. Ideally, an implementation of HostnameVerifier must use the javax.net.ssl.SSLSession parameter of the verify method to verify the hostname. Using the return statement as the slicing criterion, we perform intra-procedural backward slicing of the verify method to implement this rule.
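To make the Rule 4 pattern concrete, the sketch below contrasts two verifiers; the class names are hypothetical and the code is an illustration, not taken from CRYPTOGUARD or its benchmark. In AcceptAll, a backward slice from the return statement contains only the constant true, so the session never influences the result. In Delegating, the slice from the return statement reaches the session parameter.

```java
import javax.net.ssl.HostnameVerifier;
import javax.net.ssl.SSLSession;

final class HostnameVerifierSketch {
    /** Vulnerable: the returned value does not depend on the session, so every host passes. */
    static final class AcceptAll implements HostnameVerifier {
        @Override
        public boolean verify(String hostname, SSLSession session) {
            return true;
        }
    }

    /** The session parameter flows into the returned value through the delegate call. */
    static final class Delegating implements HostnameVerifier {
        private final HostnameVerifier delegate;

        Delegating(HostnameVerifier delegate) {
            this.delegate = delegate;
        }

        @Override
        public boolean verify(String hostname, SSLSession session) {
            return delegate.verify(hostname, session);
        }
    }

    public static void main(String[] args) {
        // No real TLS session is needed to see the difference in data flow.
        System.out.println(new AcceptAll().verify("attacker.example", null));
        System.out.println(new Delegating((h, s) -> s != null).verify("attacker.example", null));
    }
}
```

Whether the delegate performs a sound check is a separate question; the sketch only shows the data-flow difference that the slice exposes.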

Rule 5 is to detect the abuse of the X509TrustManager interface. We reduce the task to detecting 3 concrete cases: i) throwing no exception after validating a certificate in checkServerTrusted, ii) unpinned self-signed certificate with an expiration check, and iii) not providing a valid list of certificates in getAcceptedIssuers. For Case i), intuitively, our program analysis needs to search for the occurrences of throw or propagated exception. throw is the slicing criterion in the (intra-procedural) backward slicing. Simple parsing is inadequate, as the analysis needs to learn the type of the thrown exception.
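Case i) can be illustrated with a short sketch; the class names are hypothetical and the code is not taken from CRYPTOGUARD. TrustAll never throws from checkServerTrusted, so a backward slice seeded at throw statements is empty. ThrowsOnEmpty is deliberately not a real certificate validator; it only contains the kind of throw statement that serves as the slicing criterion.

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.X509TrustManager;

final class TrustManagerSketch {
    /** Vulnerable: no exception is ever thrown, so any server certificate chain is trusted. */
    static final class TrustAll implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0]; // no issuers are listed
        }
    }

    /** Illustrative only: rejects missing chains, performs no actual path validation. */
    static final class ThrowsOnEmpty implements X509TrustManager {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            checkServerTrusted(chain, authType);
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            if (chain == null || chain.length == 0) {
                throw new CertificateException("empty certificate chain");
            }
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    }

    /** Returns true if the manager accepts an empty chain without throwing. */
    static boolean acceptsEmptyChain(X509TrustManager manager) {
        try {
            manager.checkServerTrusted(new X509Certificate[0], "RSA");
            return true;
        } catch (CertificateException | RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptsEmptyChain(new TrustAll()));
        System.out.println(acceptsEmptyChain(new ThrowsOnEmpty()));
    }
}
```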

Rule 6 is to detect whether any method uses SSLSocket directly without performing hostname verification. Intuitively, to detect this vulnerability, we need to track whether an SSLSocket created from SSLSocketFactory influences the SSLSession parameter of a verify method (of a HostnameVerifier) invocation. In addition, we also need to check whether the return value of the verify method is used in a condition checking statement (e.g., if). For detection, we use forward program slicing to identify all the instructions that are influenced by the SSLSocketFactory instance. Among these instructions, we examine three cases: i) an SSLSocket is created, ii) an SSLSession is created and used in verify, and iii) the return value of the verify method is used to make decisions. These three cases represent a correct use of SSLSocket with proper hostname verification.

Rule 15 is to detect insecure asymmetric cipher configurations (e.g., 1024-bit RSA). A more concrete goal is to detect an insecure default key size use and an explicit definition of insecure key size. The tasks of program analysis are to determine a) whether the key size is defined explicitly or by default, b) the statically defined key size, and c) the key generation algorithm. For Task a), our analysis uses forward slicing to determine whether the initialize method is invoked to set the key size of a key-pair generator. For Tasks b) and c), we use two rounds of backward program slicing to determine the key size and algorithm, respectively. We also employ on-demand field sensitivity for data-only classes in Task b). The analyses for Rule 15 are the most complex in CRYPTOGUARD.
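The three tasks of Rule 15 can be related to source code with a small sketch. The class AsymmetricKeySizes and its method names are hypothetical; the choice of 2048 bits as the contrasting size is an assumption made for illustration, since the acceptable minimum depends on the policy in force.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.interfaces.RSAPublicKey;

class AsymmetricKeySizes {
    // Explicit insecure size: Task a) finds the initialize call (forward
    // slice from the generator), Task b) resolves the constant 1024, and
    // Task c) resolves the algorithm name "RSA".
    static KeyPair weakExplicit() {
        return generate(1024);
    }

    // Same shape with a larger constant (assumed acceptable for this sketch).
    static KeyPair largerExplicit() {
        return generate(2048);
    }

    // No initialize call at all: the key size is whatever the provider
    // defaults to, which is the "default key size" case of Rule 15.
    static KeyPair providerDefault() {
        try {
            return KeyPairGenerator.getInstance("RSA").generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    private static KeyPair generate(int bits) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(bits);
            return generator.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Helper to inspect the size of the generated RSA modulus.
    static int modulusBits(KeyPair pair) {
        return ((RSAPublicKey) pair.getPublic()).getModulus().bitLength();
    }
}
```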

Mappings for other rules can be deduced from Table 3.1. For example, ↑ in Rules 1 & 2 means these rules are implemented using inter-procedural backward slicing and ↓ indicates inter-procedural forward slicing is used for on-demand data-only class field sensitivity. We list the slicing criteria in Tables A4, A5 and A6 in Appendix A.

3.4 Crypto-specific Slicing

We specialize static def-use analysis [189] and forward and backward program slicing [142] for detecting Java cryptographic API misuses. We break down the detection strategy into one or more steps, so that a step can be realized with a single round of program slicing. After performing the slicing, each program slice is analyzed to find the presence of a vulnerability. Our 16 categories of vulnerabilities require different program analysis methods for detection. Table 3.1 summarizes slicing techniques to detect each of the vulnerabilities. General-purpose slicing alone is inadequate. Thus, we explain our solution for overcoming the accuracy challenge in Section 3.5.

A definition of variable v is a statement that modifies v (e.g., declaration, assignment). A use of variable v is a statement that reads v (e.g., a method call with v as an argument). Def-use data-flow analysis or def-use analysis identifies the definition and use statements and describes their dependency relations. Given a slicing criterion, which is a statement or a variable in a statement (e.g., a parameter of an API), backward program slicing is to compute a set of program statements that affect the slicing criterion in terms of data flow. Given a slicing criterion, forward program slicing is to compute a set of program statements that are affected by the slicing criterion in terms of data flow. Given a program and a slicing criterion, a program slicer returns a list of program slices. Intra-procedural program slicing mechanisms use def-use analysis to compute slices.
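The definitions above can be illustrated with a toy slicer for straight-line code. This is a minimal sketch under strong assumptions (no branches, no method calls, one defined variable per statement); the ToySlicer class and its Stmt representation are hypothetical and far simpler than a Soot-based implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ToySlicer {
    // One statement: its text, the variable it defines (or null), and the variables it uses.
    static final class Stmt {
        final String text;
        final String def;
        final Set<String> uses;

        Stmt(String text, String def, String... uses) {
            this.text = text;
            this.def = def;
            this.uses = new HashSet<>(Arrays.asList(uses));
        }
    }

    // Backward slice: statements that affect the variables used at the criterion.
    static List<Stmt> backwardSlice(List<Stmt> program, int criterion) {
        Set<String> relevant = new HashSet<>(program.get(criterion).uses);
        List<Stmt> slice = new ArrayList<>();
        slice.add(program.get(criterion));
        for (int i = criterion - 1; i >= 0; i--) {
            Stmt s = program.get(i);
            if (s.def != null && relevant.contains(s.def)) {
                relevant.remove(s.def);   // earlier definitions of s.def are killed
                relevant.addAll(s.uses);  // but whatever s reads is now relevant
                slice.add(0, s);
            }
        }
        return slice;
    }

    // Forward slice: statements affected by the variable defined at the criterion.
    static List<Stmt> forwardSlice(List<Stmt> program, int criterion) {
        Set<String> influenced = new HashSet<>();
        Stmt start = program.get(criterion);
        if (start.def != null) {
            influenced.add(start.def);
        }
        List<Stmt> slice = new ArrayList<>();
        slice.add(start);
        for (int i = criterion + 1; i < program.size(); i++) {
            Stmt s = program.get(i);
            boolean reads = false;
            for (String use : s.uses) {
                reads |= influenced.contains(use);
            }
            if (reads) {
                slice.add(s);
                if (s.def != null) {
                    influenced.add(s.def);
                }
            } else if (s.def != null) {
                influenced.remove(s.def); // overwritten by an unrelated value
            }
        }
        return slice;
    }

    // Hypothetical six-statement program used for illustration.
    static List<Stmt> sample() {
        return Arrays.asList(
            new Stmt("s = \"UTF-8\"", "s"),
            new Stmt("k = \"secret\"", "k"),
            new Stmt("n = 16", "n"),
            new Stmt("b = k.getBytes(s)", "b", "k", "s"),
            new Stmt("buf = new byte[n]", "buf", "n"),
            new Stmt("spec = new SecretKeySpec(b, \"AES\")", "spec", "b"));
    }
}
```

With the last statement as the criterion, the backward slice keeps the definitions of s, k, and b and drops n and buf; a forward slice from the definition of k reaches b and spec.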

To confine inter-procedural backward slicing within security code regions, the analysis starts from cryptographic APIs and follows their influences recursively. This approach effectively skips the bulk of the functional code and substantially speeds up the analysis.

Slicing Criteria. The choice of slicing criterion directly impacts the analysis outcomes. We choose slicing criteria based on several factors, including relevance to the vulnerability, simplicity of the checking rules, and whether the criterion is shared across multiple projects. Our slicing criteria and corresponding APIs are shown in Tables A4, A5, and A6 in Appendix A.

Backward Slicing. For inter-procedural backward slicing, the slicing criteria are defined as the parameters of a target method’s invocation. For example, to find predictable secrets (in Rules 1-3), we use the key parameter of the constructors of SecretKeySpec as the slicing criterion. For intra-procedural backward slicing, we define three types of slicing criteria: i) parameters of a method, ii) assignments, and iii) throw and return. For example, to detect insecure hostname verifiers that accept all hosts (in Rule 4), we use the return statement in the verify method as the slicing criterion.

Intra-procedural backward slicing. The purpose of intra-procedural backward slicing is two-fold. It is used independently to enforce security as well as a building block of inter-procedural backward program slicing. The intra-procedural program slicing utilizes the def-use property of a statement to decide whether a statement should be included in a slice or not. Our implementation utilizes the worklist algorithm from the intra-procedural data-flow analysis framework of Soot. During this process, if any orthogonal method invocations are encountered, it recursively slices them to collect the arguments and statements that influence any field or return statements within those orthogonal methods. To reduce runtime overhead, such orthogonal method explorations are clipped at a pre-configurable depth. We use the refinement insights in Section 3.5 to exclude security-irrelevant instructions that basic use-def analysis cannot identify.

On-demand Inter-procedural backward slicing. This algorithm performs the upward propagation of the analysis. Our inter-procedural backward slicing builds on intra-procedural backward slicing. Major steps of the algorithm are as follows. i) We build a caller-callee relationship graph of all the methods of the program. The call-graph construction uses class-hierarchy analysis. ii) We identify all the callsites of the method specified in the slicing criterion. A callsite refers to a method invocation. iii) For all the callsites, we obtain all the inter-procedural backward slices by invoking intra-procedural slicing recursively to follow the caller chain. iv) Our procedure is field sensitive. Typical field initialization statements are assignments. After encountering a field assignment, the analysis follows the influences through fields, recursively.
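Steps ii) and iii) above amount to following the caller chain upward from the method named in the slicing criterion. The sketch below shows only that traversal over a precomputed caller map; the CallerChains class, its map representation, and the method names in the sample graph are hypothetical, and the per-method intra-procedural slicing that the real analysis performs at each callsite is omitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CallerChains {
    // callers.get(m) lists the methods that contain a callsite of m.
    static List<List<String>> chains(Map<String, List<String>> callers, String target) {
        List<List<String>> result = new ArrayList<>();
        walk(callers, target, new ArrayDeque<>(), result);
        return result;
    }

    private static void walk(Map<String, List<String>> callers, String method,
                             Deque<String> path, List<List<String>> result) {
        if (path.contains(method)) {
            // Recursive call cycle: stop here and keep the chain found so far.
            result.add(new ArrayList<>(path));
            return;
        }
        path.addLast(method);
        List<String> above = callers.getOrDefault(method, Collections.emptyList());
        if (above.isEmpty()) {
            result.add(new ArrayList<>(path)); // reached a method nobody calls
        } else {
            for (String caller : above) {
                walk(callers, caller, path, result);
            }
        }
        path.removeLast();
    }

    // Hypothetical call graph: two services reach the SecretKeySpec constructor.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> callers = new HashMap<>();
        callers.put("SecretKeySpec.<init>", Arrays.asList("Crypto.encrypt"));
        callers.put("Crypto.encrypt", Arrays.asList("Service.save", "Service.load"));
        return callers;
    }
}
```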

Forward Slicing. Some of our analyses demand forward slicing, which inspects the statements occurring after the slicing criterion.

Intra-procedural forward slicing. We design intra-procedural forward slicing for Rules 6 (SSLSocketFactory w/o Hostname verification) and 15 (Weak asymmetric crypto). The operation of intra-procedural forward slicing is similar to that of intra-procedural backward slicing. In forward slicing, we choose assignments as the slicing criteria. The traversal follows the order of the execution, i.e., going forward. Because problematic code regions for Rules 6 and 15 are confined within a method, their forward slicing analyses do not need to be inter-procedural.

Inter-procedural forward slicing. Given an assign instruction or a constant as the slicing criterion, we perform the inter-procedural forward slicing to identify instructions that are influenced by the slicing criterion in terms of def-use relations. Our inter-procedural forward slicing operates on the slices obtained from inter-procedural backward slicing. The latter produces an ordered collection of instructions combined from all visited methods.

    $r1.setText("mytext");
    $r1.setKey("mykey");
    ...
    key = $r1.getKey();

Figure 3.2: Indirect field access using orthogonal invocations on data-only class object $r1.

We define a class as a data-only class if the fields of the class are only visible within orthogonal method invocations. We use inter-procedural forward slicing for on-demand field sensitivity of data-only classes, as the field sensitivity during upward propagation (inter-procedural backward slicing) does not cover them. In Figure 3.2, $r1 is an object of a data-only class, where its fields are accessed indirectly with an orthogonal method (i.e., getKey) invocation. Given a constant, using inter-procedural forward slicing, CRYPTOGUARD determines whether the constant influences any field of a data-only class object and records it. Later on, when it encounters an assign invocation on the same object and observes that the previously recorded field influences the return statement, it reports the constant. Through this on-demand field sensitivity for data-only classes, CRYPTOGUARD knows that constant mytext (Figure 3.2) is not a hard-coded key. ↓ in Table 3.1 represents the use of forward slicing for on-demand data-only class field sensitivity4.
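The record-then-resolve idea behind on-demand field sensitivity can be sketched as a small bookkeeping structure. The DataOnlyFieldTracker class is hypothetical, and it assumes that the mapping from a setter or getter to the field it touches is already known; in the actual analysis that mapping comes from slicing the orthogonal methods.

```java
import java.util.HashMap;
import java.util.Map;

class DataOnlyFieldTracker {
    // object name -> (field name -> constant that was observed to influence it)
    private final Map<String, Map<String, String>> recorded = new HashMap<>();

    // Forward slicing saw a constant flow into `field` of `object`, e.g. via a setter.
    void recordInfluence(String object, String field, String constant) {
        recorded.computeIfAbsent(object, key -> new HashMap<>()).put(field, constant);
    }

    // A later invocation on `object` returns `field`; report the recorded
    // constant, or null if no constant was recorded for that field.
    String resolveReturn(String object, String field) {
        Map<String, String> fields = recorded.get(object);
        return fields == null ? null : fields.get(field);
    }

    // Replays the statements of Figure 3.2 and returns the constant that reaches `key`.
    static String figureExample() {
        DataOnlyFieldTracker tracker = new DataOnlyFieldTracker();
        tracker.recordInfluence("$r1", "text", "mytext"); // $r1.setText("mytext")
        tracker.recordInfluence("$r1", "key", "mykey");   // $r1.setKey("mykey")
        return tracker.resolveReturn("$r1", "key");       // key = $r1.getKey()
    }
}
```

Because getKey returns only the key field, the constant recorded for the text field is never reported for key.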

3.5 Refinement for FP Reduction

We design a set of refinement algorithms to exclude security irrelevant instructions to reduce false alarms. These refinement insights (RI) are deduced by observing common programming idioms and language restrictions. We also discuss the possibility of false negatives (i.e., missed detection).

4 The current prototype uses this field sensitivity for 8 rules.

3.5.1 Overview of Refinement Insights (RI)

Eight of our rules (1, 2, 3, 8, 10, 12, 13 and 15) require identifying constants/predictable values in a program slice. The purpose is to ensure that no data (e.g., cryptographic keys, passwords, IVs, and seeds) is hardcoded or solely derived from any hardcoded values. Use of any predictable values (e.g., Timestamp, DeviceID) is also insecure for Rules 1, 2, 3 and 8. However, many constants do not impact security. We refer to them as pseudo-influences. Pseudo-influences are a major source of false positives. Based on empirical observations of common programming idioms and language restrictions, we have five strategies to systematically remove irrelevant constants/predictable values from slices and reduce pseudo-influences, which are summarized next.

• RI-I: Removal of state indicators. We discard constants/predictable values that are used to describe the state of a variable during an orthogonal method invocation.

• RI-II: Removal of resource identifiers. We discard constants/predictable values that are used as the identifier of a value source during an orthogonal method invocation.

• RI-III: Removal of bookkeeping indices. We discard constants/predictable values that are used as the index or size of any data structures. Specifically, RI-III discards any influences on i) size parameter of an array or a collection instantiation, ii) indices of an array, iii) indices of a collection.

• RI-IV: Removal of contextually incompatible constants. We discard constants/predictable values, if their types are incompatible with the analysis context. For example, a boolean variable cannot be used as a key, IV, or salt.

• RI-V: Removal of constants in infeasible paths. Some constant initializations are updated along the path to the slicing criterion. We need to discard the initializations that do not have a valid path of influence to the criterion.

RI-I, RI-II and RI-IV are used to handle the clipping of orthogonal method explorations, which can occur due to phantom method invocations or pre-configured clipping at a certain depth. RI-III is used to achieve data structure awareness, and RI-V is used to compensate for path insensitivity. Next, we highlight the details of our refinement insights.

3.5.2 RI-I: Removal of State Indicators

Clipping orthogonal method exploration can cause false positives if the arguments of a method are used to describe the state of a variable. Consider UTF-8 in Line 38 of Figure 3.1(a). Its Jimple5 representation is as follows, where $r2 represents variable key, $r4 represents keyBytes, and virtualinvoke is for invoking the non-static method of a class.

$r4 = virtualinvoke $r2.<java.lang.String: byte[] getBytes(java.lang.String)>("UTF-8")

If the analysis is clipped so that it cannot explore the getBytes method, then a def-use analysis shows that constant UTF-8 influences the value of $r4 (i.e., keyBytes). Thus, a straightforward detection method would report UTF-8 as a hardcoded key. However, UTF-8 is for describing the encoding of $r2 and can be safely ignored. We refer to this type of constants as state indicator pseudo-influence.

The use of refinement insights has a direct impact on analysis outcomes. For example, discarding arguments of virtualinvoke may generate false negatives. Suppose virtualinvoke is used to set a key in a KeyHolder instance with some constant: virtualinvoke $r5.("abcd"). Constant abcd needs to be flagged. In contrast, we observe that arguments of virtualinvoke appearing in assign statements are typically used to describe the state of a variable and can be ignored. Thus, RI-I states that i) arguments of any virtualinvoke method invocation in an assignment instruction can be regarded as pseudo-influences, and ii) any constants that influence these arguments can also be discarded.

5 Jimple is an intermediate representation (IR) of a Java program.

3.5.3 RI-II: Removal of Source Identifiers

Another type of pseudo-influences due to the clipping of orthogonal method exploration is the identifiers of value sources. We use an example to illustrate the importance of this insight. For the code below, a straightforward analysis would flag constant ENCRYPT_KEY. However, it is an identifier for retrieving a value from a Java Map data structure, and thus a false positive.

$r30 = interfaceinvoke r29.<java.util.Map: java.lang.Object get(java.lang.Object)>("ENCRYPT_KEY")

i) Retrieving values from an external source. Static method invocations (staticinvoke in Jimple) in assign statements are typically used to read values from external sources, e.g., Line 15 in Figure 3.1(a):

$r4 = staticinvoke (src)

Variable src refers to the identifier, not the actual value of the key. Thus, it is a pseudo-influence. To avoid such pseudo-influences, RI-II discards any arguments of staticinvoke that appear in an assignment. Although staticinvoke may be used to transform a value from one representation to another, it is unlikely to use staticinvoke to transform a constant.

3.5.4 RI-III: Removal of bookkeeping indices.

    byte[] iv = new byte[] {0x0, 0x0, 0x0,
        0x0, 0x0, 0x0, 0x0, 0x0}

Consider the Java statement above. After transformation into the Jimple representation, this statement looks like the following list of instructions.

    $r15 = newarray (byte)[8]
    $r15[0] = 0
    $r15[1] = 0
    $r15[2] = 0
    $r15[3] = 0
    $r15[4] = 0
    $r15[5] = 0
    $r15[6] = 0

    $r2 = $r15

The hard-coded size and the indices of an array can be regarded as pseudo-influences. To address these false positives, we discard all the constants that influence an array index. Any constant that influences the size or the index parameter of a collection can also be regarded as a pseudo-influence. We regard List and Set as collections.

3.5.5 RI-IV: Removal of contextually incompatible constants.

Clipping of orthogonal invocations that do not appear in an assign statement can also cause false positives. To reduce false alarms further, we also discard some constants based on their type and context. Consider a class named PBEInfo that is used to store an iteration count and a salt, and suppose the analysis cannot explore the PBEInfo class. A basic use-def analysis will report 5 as a salt from the following invoke instruction: specialinvoke r1.(Integer, String)>(5, "5341453"). However, a standalone Boolean or Integer constant is unlikely to be used as a key, IV or salt, since their corresponding APIs only allow byte arrays. Also, any hard-coded size parameter (e.g., number of iterations in PBE (Rule 13), key size for insecure asymmetric crypto (Rule 15)) is unlikely to have any type other than Integer. Therefore, it is possible to discard some of the pseudo-influences by considering the type of a constant based on its context.
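The type-based reasoning of RI-IV can be written down as a small predicate. This is a sketch of the stated heuristic only; the ContextualTypeFilter class and its two-valued Context enum are hypothetical simplifications, and a real implementation would work on Jimple constant types rather than boxed Java objects.

```java
class ContextualTypeFilter {
    // What the slicing criterion expects: secret material or a size-like parameter.
    enum Context { SECRET_MATERIAL, SIZE_PARAMETER }

    // Returns true if the constant should be discarded as a pseudo-influence.
    static boolean isPseudoInfluence(Object constant, Context context) {
        switch (context) {
            case SECRET_MATERIAL:
                // Keys, IVs, and salts are passed as byte arrays, so a standalone
                // Boolean or Integer constant is unlikely to be the secret itself.
                return constant instanceof Boolean || constant instanceof Integer;
            case SIZE_PARAMETER:
                // Iteration counts and key sizes are expected to be integers.
                return !(constant instanceof Integer);
            default:
                return false;
        }
    }
}
```

For the PBEInfo invocation discussed above, the constant 5 is discarded when looking for a salt but kept when looking for an iteration count, and the string constant is treated the other way around.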

3.5.6 RI-V: Removal of constants in infeasible paths.

Some constant initializations are overwritten along the path to the point of interest. Counting such constants with infeasible influences will result in false positives. Empty strings and nulls are commonly used for initialization, and these initial values are most often replaced with other values. To avoid false positives in this case, we discard nulls and empty strings, depending on the rule and the slicing criterion. For example, SecretKeySpec prohibits keys to be null or empty. IvParameterSpec does not allow null as IV. Also, PBEParameterSpec does not allow the salt to be null.

[Plot: number of alerts (log scale, 10^2 to 10^5) without and with RI, for Apache and Android. Rules shown: [1,2] Predictable Keys, [3] Hardcoded KeyStore Pass, [10] Predictable Salts, [12] Predictable IVs, [13] <1000 PBE Iterations.]

Figure 3.3: Reduction of false positives with refinement insights in 46 Apache projects (94 root-subprojects) and 6,181 Android apps. Top 6 rules with maximum reductions are shown.

3.5.7 Evaluation of Refinement Methods

We compared the numbers of reported alerts before and after employing the five refinement algorithms for 46 Apache projects and 6,181 Android apps.

Our experiments show that refinement algorithms reduce the total alerts by 76% in Apache and 80% in Android. For Apache projects, we manually confirmed that all the removed alerts are indeed false positives6. All constant-related rules (including 1, 2, 3, and 12) greatly benefit from the refinements and have significant reduction of irrelevant alerts. Results for the top six rules with maximum reductions are shown in Figure 3.3. The detailed breakdown is shown in Figure 3.4. The most effective refinement insight for both Apache and Android is RI-III (removal of array/collection bookkeeping information).

[Bar chart: reduction of alerts (%) for RI-I through RI-V, with one bar each for Apache and Android.]

Figure 3.4: Breakdown of the reduction of false positives due to five of our refinement insights.

With refinements enabled, there are a total of 1,295 alerts for the 46 Apache projects. Our careful manual source-code analysis confirms that 1,277 alerts are true positives, resulting in a precision of 98.61%. Out of the 18 false positives, 1 case is due to path insensitivity and 17 to clipping orthogonal explorations (discussed in Section 3.8). All experiments reported in the next section were conducted with refinements enabled. Refinements may cause false negatives, which we discuss in Section 3.8.

6 Regarding the validity of the manual analysis, the manual confirmation of alerts was conducted by a second-year Ph.D. student with a prior Master's degree in cybersecurity (the second author), under the close guidance of a professor and a senior Ph.D. student (the first author).

[Plots, each against orthogonal exploration depth 1 to 10: (a) F1 score, total discovered constants, and true positives; (b) runtime (s) for Spark, Hadoop, and Tomee; (c) total orthogonal invocations, total inter-procedural slices, and average length of slices.]

Figure 3.5: The impact of the orthogonal exploration depth on F1 scores and the number of discovered constants in (a), runtime in (b), and analysis properties in (c) for 8 rules.

Depth   Runtime in Sec. (STD)
1       37.23 (49.75)
2       35.5 (39.03)
3       35.75 (39.09)
4       35.82 (39.23)
5       36.1 (38.97)
6       36.7 (39.63)
7       36.83 (39.64)
8       37.99 (40.34)
9       38.54 (41.22)
10      38.87 (41.94)

Table 3.2: The impact of clipping orthogonal explorations at various depth on runtime across 30 Apache root-subprojects. STD is computed across projects. Variations across multiple runs (3 runs) are negligible.

Impact of Orthogonal Exploration Depth. To measure the impact of the orthogonal exploration depth, we conducted an experiment with 30 Apache root-subprojects and varied the clipping from depth 1 to 10. The depth of orthogonal exploration refers to the distance of an orthogonal method from the main slice. An orthogonal method at depth 1 is a method invoked by the main slice.

The results are shown in Figure 3.5. The total number of discovered constants across all projects increases slightly with the depth (Figure 3.5(a) right Y-axis). However, our manual analysis revealed that none of the new constants is a true positive, i.e., the new constants are false positives. Thus, the increase of the orthogonal exploration depth does not improve the recall in this specific experiment, causing a decrease in the F1 score (Figure 3.5(a) left Y-axis). Interestingly, the analysis runtime does not increase with the increasing depth (Figure 3.5(b)). The average runtime for the 30 root-subprojects is presented in Table 3.2. Figure 3.5(c) shows that the number of inter-procedural slices and their average sizes are drastically reduced when the depth increases from 1 to 2. When the analysis explores inside a method, influences on an argument of an orthogonal invocation might become irrelevant, causing this drastic reduction. Given these observations, we set the orthogonal exploration depth to 1 for the rest of our experiments, as it returns the fewest irrelevant constants.

3.6 Security Findings and Evaluation

Our experimental evaluation answers the following questions.

• What are the security findings in Apache Projects? Do Apache projects have any high-risk vulnerabilities such as hardcoded secrets or MitM vulnerabilities? (Section 3.6.1)

• What are the security findings in Android Apps? Do third-party libraries have any high-risk vulnerabilities? (Section 3.6.2)

• How does CRYPTOGUARD compare with CrySL, SpotBugs, and the free trial version of Coverity on benchmarks or real-world projects? (Section 3.6.3)

Selection and pre-processing of programs. We selected 46 popular Apache projects that have crypto API uses. The popularity is measured with the numbers of stars and forks on GitHub. The maximum, minimum, and average Lines of Code (LoC) are around 2,571K (Hadoop), 1.1K (Commons Crypto), and 402K, respectively. We perform subproject dependency analysis to build DAGs by parsing build scripts. Subproject dependency analysis was automated for Gradle and Maven, and was manual for Ant. We identified the root-subprojects, which are subprojects that have no incoming edges on the subproject dependency DAG. We analyzed 94 root-subprojects in total7.
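Identifying root-subprojects reduces to finding the nodes of the dependency DAG that have no incoming edges. The sketch below assumes that an edge points from a subproject to each subproject it depends on; the RootSubprojects class and the subproject names in the sample graph are hypothetical, and the parsing of build scripts is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class RootSubprojects {
    // dependsOn.get(a) lists the subprojects that a depends on (edges a -> b).
    static Set<String> roots(Map<String, List<String>> dependsOn) {
        Set<String> candidates = new TreeSet<>(dependsOn.keySet());
        Set<String> hasIncomingEdge = new HashSet<>();
        for (List<String> dependencies : dependsOn.values()) {
            candidates.addAll(dependencies);      // make sure every node is known
            hasIncomingEdge.addAll(dependencies); // something depends on these
        }
        candidates.removeAll(hasIncomingEdge);
        return candidates;
    }

    // Hypothetical four-subproject build.
    static Map<String, List<String>> sample() {
        Map<String, List<String>> dependsOn = new HashMap<>();
        dependsOn.put("web", Arrays.asList("core", "crypto"));
        dependsOn.put("cli", Arrays.asList("core"));
        dependsOn.put("crypto", Arrays.asList("core"));
        dependsOn.put("core", Collections.<String>emptyList());
        return dependsOn;
    }
}
```

Under this edge direction, analyzing only the roots still covers the subprojects they depend on.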

7 We exclude 15 test root-subprojects.

We downloaded 6,181 high popularity Android apps from the Google app market covering 58 categories. The median value of the number of apps per category is 120. We used Soot to decompile .apk files to Java bytecode in order to interface with CRYPTOGUARD. We use an online APK decompiler to obtain human-readable source code for manual verification.

We ran 4 concurrent instances of CRYPTOGUARD on an Intel Xeon(R) X5650 server (2.67GHz CPU and 32GB RAM). For Apache, the average runtime was 3.3 minutes with a median of around 1 minute. For Android, we terminated unfinished analyses after 10 minutes. The average runtime was 3.2 minutes with a median of 2.85 minutes, including the cutoff ones. The analysis of 552 (9%) of the 6,181 apps did not finish within 10 minutes; for these, CRYPTOGUARD generated partial results. Most of them missed results from Rule 7, which CRYPTOGUARD runs last.

Table 3.3: Breakdown of accuracy in Apache projects. Duplicates are handled at the root-subproject level (82 root-subprojects in total). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

Rules                                Total Alerts   # True Positives   Precision
(1,2) Predictable Keys               264            248                94.14 %
(3) Hardcoded Store Pass             148            148                100 %
(4) Dummy Hostname Verifier          12             12                 100 %
(5) Dummy Cert. Validation           30             30                 100 %
(6) Used Improper Socket             4              4                  100 %
(7) Used HTTP                        222            222                100 %
(8) Predictable Seeds                0              0                  0%
(9) Untrusted PRNG                   142            142                100 %
(10) Static Salts                    112            112                100 %
(11) ECB mode for Symm. Crypto       41             41                 100 %
(12) Static IV                       41             40                 97.56 %
(13) <1000 PBE iterations            43             42                 97.67 %
(14) Broken Symm. Crypto Algorithm   86             86                 100 %
(15) Insecure Asymm. Crypto          12             12                 100 %
(16) Broken Hash                     138            138                100 %
Total                                1,295          1,277              98.61 %

3.6.1 Security Findings in Apache Projects

Out of the 46 Apache projects, 39 projects have at least one type of cryptographic misuses and 33 projects have at least two types. Table A1 summarizes our security findings in screening Apache projects. Predictable keys (Rules 1 and 2), HTTP URL (Rule 7), insecure hash functions (Rule 16), and the insecure PRNGs (Rule 9) are the most common types of vulnerabilities in Apache. As predictable values, we only observed constants for all these rules. We did not observe any predictable seeds under Rule 8.

(a) A portion of https-cfg-client.xml

    ...

(b) A portion of SSLUtils.java

    } else if (tlsClientParameters.isDisableCNCheck()) {
        verifier = new AllowAllHostnameVerifier();
    }

Figure 3.6: Disabled hostname verification in Apache Cxf.

(a) A portion of UpsServices.java

    public static String sendUpsRequest(...){
        ...
        http.setAllowUntrusted(true);
        ...
    }

(b) A portion of SSLUtil.java

    SSLContext getSSLContext(String alias,
            boolean trustAny){
        ...
        TrustManager[] tm;
        if (trustAny) {
            tm = SSLUtil.getTrustAnyManagers();
        }
        ...
    }

Figure 3.7: Trusting all certificates in Apache Ofbiz.

Vulnerabilities from Predictable Secrets

16 Apache projects (37 root-subprojects) have hardcoded keys (Rules 1 and 2). Three of them (Meecrowave, Kylin, and Cloudstack) use hardcoded symmetric keys (Rule 1). Meecrowave uses DESede (i.e., Triple DES8) for obfuscation purposes. Unfortunately, deterministic keys make it trivial to break the obfuscation. Kylin (635 Forks, 1325 Stars) uses AES to encrypt user passwords. However, using hardcoded keys makes these passwords vulnerable. In Apache Cloudstack, it appears that hardcoded keys are used in the test code, which is accidentally packaged with the production code.

8 Triple DES itself is considered insecure. OpenSSL removed the support of Triple DES. NIST recommended moving to AES as soon as possible [182].

For Rule 2, most of the hardcoded passwords in PBE serve as the default. The most common default password for PBE is masterpassphrase (e.g., Ambari and Knox). Manifoldcf uses NowIsTheTime. Setting PBE to take the default hardcoded passwords without sufficient warnings is risky. Distributions using the default configuration are susceptible to the recovery of the plaintext password by an attacker who has access to the PBE ciphertext. Apache Ranger (165 forks, 155 stars) uses a hardcoded password as the default for PBE for all distributions. Its installation Wiki does not mention it. System administrators unaware of this setup are likely not to change the default. This coding practice significantly weakens the security guarantee of PBE.

For Rule 3, the most common hardcoded passwords for KeyStores (for storing private keys) are changeit (e.g., Tomcat, Knox, Juddi, Ofbiz and Wss4j) and none (e.g., Knox, Hive and Hadoop). Most of them are set as default. There are 9 projects that have both predictable keys (Rules 1 and 2) and hardcoded KeyStore passwords (Rule 3), indicating persistent insecure coding styles.

Insecure common practices. During manual analysis, we found three types of insecure common practices in Apache projects for storing secrets: i) hard-coding default keys or passwords in the source code, ii) storing plaintext keys or passwords in configuration files, and iii) storing encrypted passwords in configuration files with decryption keys in plaintext in source code or configuration. Java provides special security APIs (e.g., Callback and CallbackHandler) to prompt users for secrets (e.g., passwords). However, none of these projects support this option.
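As a minimal sketch of the callback-based alternative mentioned above, the following code obtains a password through a CallbackHandler instead of a hard-coded default. The PromptingSecrets class, the prompt text, and the supplier-based handler are hypothetical; in a deployment the supplier might wrap an interactive prompt, which is an assumption of this sketch rather than something the analyzed projects do.

```java
import java.io.IOException;
import java.util.function.Supplier;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;

class PromptingSecrets {
    // Builds a handler that answers PasswordCallback requests from `source`.
    static CallbackHandler handlerFrom(Supplier<char[]> source) {
        return callbacks -> {
            for (Callback callback : callbacks) {
                if (callback instanceof PasswordCallback) {
                    ((PasswordCallback) callback).setPassword(source.get());
                } else {
                    throw new UnsupportedCallbackException(callback);
                }
            }
        };
    }

    // Asks the handler for a password; no secret appears in the source code.
    static char[] obtainPassword(CallbackHandler handler) {
        PasswordCallback request = new PasswordCallback("KeyStore password: ", false);
        try {
            handler.handle(new Callback[] { request });
        } catch (IOException | UnsupportedCallbackException e) {
            throw new IllegalStateException(e);
        }
        char[] password = request.getPassword(); // a copy of the stored value
        request.clearPassword();                 // wipe the callback's own copy
        return password;
    }
}
```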

Sysadmins are forced to store plaintext passwords in the filesystem unless they personally modify the code. The biggest danger that these insecure secret-storage practices bring to users is probably the inflated sense of security and not being able to know the actual risks.

Vulnerabilities from SSL/TLS MitM

Man-in-the-Middle (MitM) vulnerabilities are high risk in our threat model. 5 Apache projects (8 root-subprojects) have dummy hostname verifiers that accept any hostnames (Rule 4), including Spark (15086 forks, 16324 stars), Ambari (814 forks, 778 stars), Cxf (706 forks, 398 stars), Ofbiz, and Meecrowave. 6 Apache projects have dummy trust managers that trust any certificates (Rule 5), including Spark, Ambari, Cloudstack, Qpid-broker, Jclouds, and Ofbiz. It appears that most projects offer them as an additional connectivity option.

Our manual analysis reveals that some projects set this insecure implementation as default (e.g., Figure 3.6 and Figure 3.7). In Figure 3.7, we see that Ofbiz uses insecure SSL/TLS configurations by default while using UPS (a shipping company) service. When plain sockets are used, it is recommended to verify the hostname manually. We found 3 projects that do not follow this rule and accept any arbitrary hostnames. We also found 7 projects (24 root-subprojects) that occasionally use the HTTP protocol for communication.
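For code that creates an SSLSocket directly, one way to obtain hostname verification is to ask JSSE to perform endpoint identification during the handshake, instead of calling a HostnameVerifier by hand. The sketch below shows that alternative technique, not the manual check described in the text; the VerifiedSocket class name is hypothetical, and the socket is left unconnected so that the example needs no network access.

```java
import java.io.IOException;
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

class VerifiedSocket {
    // Creates an unconnected SSLSocket that will check the server certificate
    // against the peer hostname (HTTPS-style rules) when the handshake runs.
    static SSLSocket unconnectedWithHostnameCheck() {
        try {
            SSLSocketFactory factory = SSLContext.getDefault().getSocketFactory();
            SSLSocket socket = (SSLSocket) factory.createSocket();
            SSLParameters parameters = socket.getSSLParameters();
            parameters.setEndpointIdentificationAlgorithm("HTTPS");
            socket.setSSLParameters(parameters);
            return socket;
        } catch (IOException | NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The caller would then connect the returned socket to a host-based address and start the handshake as usual.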

Medium and Low Severity Vulnerabilities

It is important to be aware of the medium and low-risk vulnerabilities in the system and to recognize that the risk levels may increase under different adversarial models.

We found hardcoded salts in 4 projects including Apache Ranger, Manifoldcf, Juddi, and Wicket. We also observe the use of ECB mode in AES in 5 projects and predictable IVs in 2 projects with a total of 40 occurrences. We found 5 projects that use PBE with less than 1,000 iterations (Rule 13). Ranger and Wicket projects use 17 iterations for PBE; and Incubator-Taverna-Workbench and Juddi projects use 20 iterations, much fewer than the required 1,000.

Listing 3.1: A vulnerable code snippet from Apache Ranger

1 PBEKeySpec getPBEParameterSpec(String password) throws Throwable {
2   MessageDigest md = MessageDigest.getInstance(MD_ALGO); // MD5
3   byte[] saltGen = md.digest(password.getBytes());
4   byte[] salt = new byte[SALT_SIZE];
5   System.arraycopy(saltGen, 0, salt, 0, SALT_SIZE);
6   int iteration = password.toCharArray().length + 1;
7   return new PBEKeySpec(password.toCharArray(), salt, iteration); }

Listing 3.1 shows a code snippet from Ranger, which has multiple issues. The number of iterations is proportional to the password size (Line 6), which is far less than the required 1,000. In addition, this code offers a timing side-channel. An adversary capable of measuring PBE execution time (e.g., in multi-tenant environments) may learn the length of the password. This information can substantially decrease the difficulty of dictionary attacks. Another issue is that the salt is computed as the MD5 hash of the password (Lines 2-3). An adversary obtaining the salt may quickly recover the password. The salt’s dependence on the password itself also breaks the indistinguishability requirement of PBE under chosen plaintext attack.
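For contrast, a version that avoids the issues discussed above could draw the salt from a secure random source and fix the iteration count independently of the password. This is a hedged sketch, not the fix adopted by Ranger; the SaferPbeSpec class, the 16-byte salt size, and the iteration count of 10,000 are assumptions chosen for illustration, and the salt would have to be stored alongside the ciphertext for later decryption.

```java
import java.security.SecureRandom;
import javax.crypto.spec.PBEKeySpec;

class SaferPbeSpec {
    static final int SALT_SIZE = 16;      // assumed salt length in bytes
    static final int ITERATIONS = 10_000; // assumed; above the 1,000 minimum

    // The salt is random and independent of the password.
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_SIZE];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // The iteration count is a constant, so it leaks nothing about the
    // password length through execution time.
    static PBEKeySpec specFor(char[] password, byte[] salt) {
        return new PBEKeySpec(password, salt, ITERATIONS);
    }
}
```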

Listing 3.2: Only checking the expiration (checkValidity) of self-signed certificates in Yahoo Finance (TWStock) app, due to (com.softmobile) library.

    void checkServerTrusted(X509Certificate[] chain, String str){
        if (chain == null || chain.length != 1) {
            this.f7654a.checkServerTrusted(chain, str);
        } else {
            // Lack of signature verification and others
            chain[0].checkValidity();
        }
    }

Listing 3.3: Ignoring exceptions in checkServerTrusted in Sina Finance app.

void checkServerTrusted(X509Certificate[] chain, String str) {
  try {
    this.f7427a.checkServerTrusted(chain, str);
  } catch (CertificateException e) {} // Ignores exception
}

We found various occurrences of Blowfish, DES, and RC4 ciphers for Rule 14. Under Rule 15, we found 3 occurrences of using default key size of 1024 and 9 other occurrences that explicitly initialize the key size to 1024. 23 projects use java.util.Random as a PRNG (Rule 9), where two of them set static seeds to java.util.Random. We do not observe any deterministic seed to a java.security.SecureRandom (Rule 8).
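For comparison with the violations of Rules 8, 9, and 15 described above, the following minimal Java sketch shows the corresponding safe usage: an explicitly initialized 2048-bit RSA key pair and random bytes drawn from an unseeded java.security.SecureRandom. The class and method names are hypothetical and serve only as an illustration.

```java
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.interfaces.RSAPublicKey;

final class SaferDefaults {
    /** Generates an RSA key pair with an explicit 2048-bit modulus (Rule 15). */
    static KeyPair newRsaKeyPair() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("RSA");
            kpg.initialize(2048); // do not rely on the provider's default key size
            return kpg.generateKeyPair();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("RSA is not available", e);
        }
    }

    /** Returns bytes from SecureRandom without supplying a static seed (Rules 8 and 9). */
    static byte[] randomBytes(int n) {
        byte[] out = new byte[n];
        new SecureRandom().nextBytes(out);
        return out;
    }

    public static void main(String[] args) {
        RSAPublicKey pub = (RSAPublicKey) newRsaKeyPair().getPublic();
        System.out.println("modulus bits = " + pub.getModulus().bitLength());
        System.out.println("random bytes = " + randomBytes(16).length);
    }
}
```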

Listing 3.4: SSLSocket without manual hostname verification in ProTaxi Driver app.

try {
  SSLContext instance = SSLContext.getInstance("TLS");
  ...
  this.webSocketClient
      .setSocket(instance.getSocketFactory().createSocket());
} catch (Throwable e) { ... }
this.webSocketClient.connect();

Table 3.4: Experimental results on the CRYPTOAPI-BENCH basic and CRYPTOAPI-BENCH advanced benchmarks (as of April 2019) with CrySL, Coverity, SpotBugs and CRYPTOGUARD. GTP stands for the ground truth positives. TP, FP, and FN are the number of true positives, false positives, false negatives in a tool's output, respectively. Pre. and Rec. represent precision and recall, respectively. Tools are evaluated on 6 common rules (out of our 16 rules), i.e., the maximum common subset of all tools. For these 6 rules, there are 6 correct cases (i.e., true negatives) in basic and 3 correct cases in advanced, which are used for computing FPRs. Total alerts = TP + FP.

CRYPTOAPI-BENCH: Basic (GT: 14)

Tool           TP  FP  FN    FPR    FNR   Pre.   Rec.
CrySL [135]    10   6   4  50.00  28.57  62.50  71.43
Coverity [3]   13   0   1   0.00   7.14  100.0  92.86
SpotBugs [5]   13   0   1   0.00   7.14  100.0  92.86
CRYPTOGUARD    14   0   0   0.00   0.00  100.0  100.0

CRYPTOAPI-BENCH: Advanced

               Inter-Pro. (Two)   Inter-Pro. (Multiple)   Field Sensitive   False Positive   Summary
               GT: 13             GT: 13                  GT: 13            GT: 3
Tool           TP  FP  FN         TP  FP  FN              TP  FP  FN        TP  FP  FN         FPR    FNR   Pre.   Rec.
CrySL [135]    10   3   3          0   2  13              10   2   3         0   6   3       81.25  52.38  60.61  47.62
Coverity [3]    3   0  10          3   0  10               1   0  12         0   0   3        0.00  83.33  100.0  16.67
SpotBugs [5]    0   0  13          0  12  13               0   0  13         0   0   3       80.00  100.0   0.00   0.00
CRYPTOGUARD    12   0   1         12   0   1              13   0   0         3   0   0        0.00   4.76  100.0  95.24
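The summary columns of Table 3.4 can be recomputed from the raw counts. The sketch below assumes the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), FPR = FP/(FP+TN), FNR = FN/(FN+TP)), which reproduce the reported figures; the class and method names are hypothetical. It uses CrySL's basic-benchmark row (TP = 10, FP = 6, FN = 4) together with the 6 true negatives stated in the caption.

```java
import java.util.Locale;

final class BenchMetrics {
    /** Precision = TP / (TP + FP), in percent. */
    static double precision(int tp, int fp) { return 100.0 * tp / (tp + fp); }

    /** Recall = TP / (TP + FN), in percent. */
    static double recall(int tp, int fn) { return 100.0 * tp / (tp + fn); }

    /** False positive rate = FP / (FP + TN), in percent. */
    static double fpr(int fp, int tn) { return 100.0 * fp / (fp + tn); }

    /** False negative rate = FN / (FN + TP), in percent. */
    static double fnr(int fn, int tp) { return 100.0 * fn / (fn + tp); }

    public static void main(String[] args) {
        // CrySL on the basic benchmark: TP = 10, FP = 6, FN = 4, TN = 6.
        System.out.printf(Locale.ROOT, "Pre. %.2f  Rec. %.2f  FPR %.2f  FNR %.2f%n",
                precision(10, 6), recall(10, 4), fpr(6, 6), fnr(4, 10));
    }
}
```

For the advanced benchmark, the counts are first summed over the four categories; for CrySL this gives TP = 20, FP = 13, and FN = 22, and with the 3 true negatives an FPR of 13/16 = 81.25%.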

3.6.2 Security Findings in Android Apps

Violations in apps or in libraries? We distinguished an app's own code from libraries by using the package information from AndroidManifest.xml.9 Android also uses it during R.java file generation (robust against obfuscation). We found that on average 95% of the detected vulnerabilities come from libraries (Table 3.5). This result extends the observation from 7 types of vulnerabilities (reported in [77]) to 16.

9 An .apk contains both the app code and the libraries.

Table 3.5 shows the distribution of vulnerability sources for each rule. For hardcoded KeyStore passwords (Rule 3), all violations come from libraries. The most frequent hardcoded KeyStore password is notasecret, which is used to access certificates and keys in Google libraries (e.g., *.googleapis.GoogleUtils, *.googleapis.*.GoogleCredential).

Table 3.5: Distribution of vulnerabilities in Android apps.

Rule                           Library (Total)    Library (Unique)   App Itself       Total
(1,2) Predictable Keys         11,634 (93.4%)     5,940              823 (6.6%)       12,457
(3) Hardcoded Store Password   431 (94.1%)        170                27 (5.8%)        458
(4) Dummy Hostname Verifier    1,148 (99.3%)      51                 7 (0.7%)         1,155
(5) Dummy Cert. Validation     3,715 (96.3%)      1,317              141 (3.7%)       3,856
(6) Used Improper Socket       270 (99.6%)        13                 1 (0.4%)         271
(7) Used HTTP                  7,687 (92.5%)      2,105              623 (7.5%)       8,321
(8) Predictable Seeds          522 (96.0%)        101                22 (4.0%)        544
(9) Untrusted PRNG             26,312 (91.7%)     8,679              2,393 (8.3%)     36,223
(10) Predictable Salts         1,638 (93.2%)      774                119 (6.8%)       1,757
(11) ECB in Symm. Crypto       1,657 (93.1%)      682                123 (6.9%)       1,780
(12) Predictable IVs           11,357 (94.2%)     6,048              692 (5.8%)       12,089
(13) <1000 PBE iterations      294 (94.2%)        129                18 (5.8%)        312
(14) Broken Symm. Crypto       1,668 (95.8%)      753                74 (4.2%)        1,742
(15) Insecure Asymm. Crypto    4 (3.6%)           3                  107 (96.4%)      111
(16) Broken Hash               49,257 (99.0%)     7,509              496 (1.0%)       49,769
Total                          117,594 (95.40%)   34,274             5,666 (4.60%)    130,845

Besides Google, other high-profile library sources include Facebook, Apache, Umeng, and Tencent (Table 3.6). These libraries frequently appear in different applications. We distinguished these libraries using base packages. CryptoGuard can detect API misuses in obfuscated packages, i.e., any violations from within the obfuscated code are also reported. However, we are unable to report the vendors of obfuscated libraries. Pinpointing the source of an obfuscated package is an active area of research [77].

Overview of other Android findings. We found exposed secrets, similar to Apache projects. Table A1 summarizes the discovered vulnerabilities in Android applications. The categories of untrusted PRNG (Rule 9) and broken hash (Rule 16) have the most violations. Interestingly,

Table 3.6: Violations in 5 popular libs (manually confirmed).

Package name          Violated rules
com.google.api        3, 4, 5, 7
com.umeng.analytics   7, 9, 12, 16
com.facebook.ads      5, 9, 16
org.apache.commons    5, 9, 16
com.tencent.open      2, 7, 9

we observed 544 cases of predictable seeds (Rule 8), 13 of which used timestamps obtained from API calls.

Compared with Apache projects, Android apps have higher percentages of SSL/TLS API misuses (Rules 4, 5 and 6) and HTTP use (Rule 7). For example, 25.3% of Android apps have a dummy trust manager (Rule 5), which is more than twice the percentage among Apache projects (11.7%), as shown in Table A1 in Appendix A.

Our analysis can detect sophisticated cases that Google Play's built-in screening is likely to miss. We give code snippets for such cases (Listings 3.2, 3.3, and 3.4). CRYPTOGUARD detects a case where developers allow unpinned self-signed certificates with a mere expiration check, as shown in Listing 3.2. Another case is where developers ignore the exception in the checkServerTrusted method, as shown in Listing 3.3. In addition, CRYPTOGUARD detects 271 occurrences of improper use of SSLSocket without manual hostname verification in 210 apps. One such example is shown in Listing 3.4, where SSLSocket is used in WebSocketClient without manually verifying the hostname10. In comparison, Google Play's inspection appears to only detect obvious misuses [24].
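To contrast with Listings 3.2 to 3.4, the following minimal Java sketch shows a trust manager that always defers to a real validator without swallowing its exceptions, and one way to request hostname checking on an SSLSocket. The class and method names are hypothetical. Setting the endpoint identification algorithm through SSLParameters is one option offered by JSSE; it is an assumption here that this option is available on the target platform, and the manual HostnameVerifier check is the alternative.

```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.X509TrustManager;

final class StrictTls {
    /** Trust manager that defers to a real validator and never swallows its exceptions. */
    static final class DelegatingTrustManager implements X509TrustManager {
        private final X509TrustManager delegate;

        DelegatingTrustManager(X509TrustManager delegate) {
            this.delegate = delegate;
        }

        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            delegate.checkClientTrusted(chain, authType);
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            // No special case for single-certificate chains and no empty catch block.
            delegate.checkServerTrusted(chain, authType);
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return delegate.getAcceptedIssuers();
        }
    }

    /** Asks JSSE to check the peer's hostname during the handshake. */
    static SSLParameters enableHostnameCheck(SSLParameters params) {
        params.setEndpointIdentificationAlgorithm("HTTPS");
        return params;
    }

    /** Applies the hostname check to a socket before the handshake starts. */
    static void harden(SSLSocket socket) {
        socket.setSSLParameters(enableHostnameCheck(socket.getSSLParameters()));
    }

    public static void main(String[] args) {
        SSLParameters p = enableHostnameCheck(new SSLParameters());
        System.out.println(p.getEndpointIdentificationAlgorithm());
    }
}
```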

Grouping security violations by app popularity or category did not show substantial differences across groups.

10 Guide for the correct use can be found at https://developer.android.com/training/articles/security-ssl#WarningsSslSocket.

3.6.3 Comparison with Existing Tools

We compare the accuracy and runtime of CRYPTOGUARD with three existing tools, i.e., CrySL [135], Coverity [3], and SpotBugs [5]11. We use CRYPTOGUARD 03.06.00 (commit id ea75a45) and SpotBugs 3.1.0 (from SWAMP). Results from Coverity online were obtained before March 30, 2019. For CrySL, we analyze Apache projects with CrySL 2.0 (commit id 5f531d1) and Android applications with CrySL-Android 1.0.0 (commit id 856b1da) [1].

[Bar chart: number of test cases (y-axis) per rule (x-axis, Rules 1 to 16), with separate bars for Basic, Advanced, and Total.]

Figure 3.8: Test cases per Rule in CRYPTOAPI-BENCH (as of April 2019).

Benchmark preparation. First, we12 had to construct CRYPTOAPI-BENCH, a comprehensive benchmark for comparing the quality of cryptographic vulnerability detection tools. The existing benchmark DroidBench [75] is unsuitable for two reasons: i) DroidBench does not cover cryptographic APIs, and ii) the free web version of Coverity requires source code, whereas DroidBench only contains APK binaries.

CRYPTOAPI-BENCH covers all 16 cryptographic rules specified in Table 3.1. As of April 2019, there are 38 basic test cases and 74 advanced test cases. The basic benchmark contains 25 straightforward API misuses and 13 correct API uses (i.e., true negative cases). The advanced cases have more complex scenarios, including 42 inter-procedural cases13, 20 field-sensitive cases, 9 false positive test cases (for evaluating the ability to recognize irrelevant elements), and 3 correct API uses (i.e., true negative cases). Figures 3.8 and 3.9 show the distributions of test cases per rule and per API, respectively. A more recent version of the benchmark with more diverse test cases can be found in [68]. See GitHub for the most up-to-date version: https://github.com/CryptoGuardOSS/cryptoapi-bench.

11 CryptoLint's code is unavailable.
12 The person (third author) who led the benchmark design is different from the person (first author) who implemented CRYPTOGUARD.

[Bar chart: number of test cases (y-axis) per crypto API (x-axis, API labels 1.1 to 16.1).]

Figure 3.9: Test cases per API in CRYPTOAPI-BENCH (as of April 2019). A test case can cover one or more APIs (e.g., test cases for Rule 15). APIs corresponding to the labels can be found in Tables A6, A4, and A5.

Benchmark comparison. To maintain fairness in our comparison, we only report the benchmark results for the six shared rules (1, 2, 3, 11, 14, 16) that are covered by all the tools, i.e., CrySL [135], Coverity [3], SpotBugs [5], and ours. Due to the lack of documentation, we had to infer a tool's coverage based on whether or not it ever generates any alert in that category. We show the results in Table 3.4. SpotBugs, Coverity, and CRYPTOGUARD perform well on the basic benchmark. For CrySL, its errors are partly due to its rule definitions being very specific. For example, CrySL raises an alert if a cryptographic key is not directly obtained from the key generator. However, in some cases, a previously generated cryptographic key can be used securely in the code without a key generator. For cryptographic passwords, CrySL raises an alert if it is derived from a String, likely because Java recommends using char[] so that a password can be wiped after use. However, this String-based policy would miss hardcoded passwords defined in char[], generating false negatives. For the advanced benchmark, both CrySL and SpotBugs generate false positives when a variable is passed through multiple methods. For all cases, Coverity has zero false positives, likely because of the use of symbolic execution and/or path-sensitive analysis14. However, Coverity misses multiple advanced vulnerability scenarios (for rules that it does cover in the basic benchmark).

13 21 cases involve two methods and 21 cases involve more than two methods.

[Two log-scale bar charts of runtime in seconds (y-axis) for CrySL and CryptoGuard: (a) 30 Apache root-subprojects, (b) 30 Android applications (x-axis: project or app name).]

Figure 3.10: Runtime comparison in log scale of CryptoGuard and CrySL on 30 Apache root-subprojects in (a) and 30 Android applications in (b), ordered by decreasing lines of code (LoC). * indicates crash. CryptoGuard successfully completed all tasks. The LoCs are shown in Table A2.

Table A3 in Appendix A presents the comparison for all 16 rules (not just the 6 common rules). When testing all 16 rules, CRYPTOGUARD failed to report 11 misuses (i.e., false negatives). We discuss the causes in Section 3.8.

Runtime comparison. We compare CrySL and CRYPTOGUARD on 30 randomly selected Apache root-subprojects (LoC ranging from 471K to 1K) and 30 Android applications (LoC ranging from 1,453K to 0.4K), with 3 runs each. The results are summarized in Table 3.7, with full runtime details sorted by LoC in Figure 3.10 and the LoC15 listed in Table A2. CRYPTOGUARD completed all tasks, demonstrating robust and efficient performance. CrySL exited prematurely for 18 Apache projects and 4 Android apps due to various errors (e.g., memory errors16). For Apache, CRYPTOGUARD exhibits better overall runtime performance than CrySL.

For Android, CrySL is faster, partly because CrySL only analyzes the code that is reachable by an app's life-cycle. In comparison, CRYPTOGUARD also covers third-party libraries (regardless of life-cycle reachability) and produces more valid alerts. For example, for Card_Maker_for_Pokemon, CRYPTOGUARD and CrySL generated 2 and 0 alerts, respectively. For Cartoon_Avatar_Maker, CRYPTOGUARD and CrySL generated 5 and 1 alerts, respectively. The 8 alerts are distinct true positives in libraries, which means CRYPTOGUARD has 1 false negative and CrySL has 7 false negatives. CRYPTOGUARD's false negative (an MD5 use) comes from the Android core library com.google.android, which CRYPTOGUARD currently does not analyze.

14 Coverity is closed source, so we are unable to confirm.
15 LoC is obtained using online Java and APK decompilers and the cloc command.
16 We increased the heap size to 10GB for CrySL, while CRYPTOGUARD ran with the default 4GB heap memory.

For the free web version of Coverity, we are unable to obtain its runtime. We choose not to compare the runtime with SpotBugs. The comparison would not be meaningful, as its analysis is mostly based on the syntactical matching of source code to known bug patterns [128, 171].

Table 3.7: Summary of average runtime (in seconds) across all completed runs for CrySL and CRYPTOGUARD. We evaluated 30 Apache root-subprojects and 30 Android apps, each with 3 runs. Incmpl. stands for the number of incomplete analyses. Standard deviations (std) are computed across projects/apps. Variations across runs are negligible.

         Apache root-subprojects            Android applications
Tool     Incmpl.   Avg. (std)    Median     Incmpl.   Avg. (std)      Median
CrySL    18        16.5 (18.0)   6.9        4         15.7 (44.1)     5.2
Ours     0         12.7 (14.2)   6.4        0         187.8 (488.3)   52.4

Summary of findings. Refinements bring a 76% reduction in alarms for Apache projects and an 80% reduction for Android applications. For Apache projects, we manually confirmed that all the removed alerts are indeed false positives. Manually examining the remaining 1,295 Apache alerts (after refinements) confirms our precision of 98.61%. 39 out of the 46 Apache projects have at least one type of cryptographic misuse and 33 have at least two types. There is a widespread insecure practice of storing plaintext passwords in code or in configuration files. Insecure uses of SSL/TLS APIs are set as the default configuration in some cases. 5,596 (91%) out of the 6,181 Android apps have at least one type of cryptographic misuse and 4,884 (79%) apps have at least two types. 95% of the vulnerabilities come from the libraries that are packaged with the applications. Some libraries are from large software firms. CRYPTOGUARD's detection for SSL/TLS API misuses is more comprehensive than the built-in screening offered by Google Play.

3.7 Domain-specific Static Security Validation

3.7.1 Motivation

There have been numerous efforts to detect these application-level vulnerabilities statically [67, 89, 112, 135, 153, 165]. Most of these efforts focused on either improving the detection capability [165] or detecting a new class of misuses [89]. Interestingly, most of these detection mechanisms use some form of data-flow analysis. Research on developing specification languages to express a security property that can be detected by static data-flow analyses is still largely unexplored. Such a universal language will systematize (and unify) the efforts on both improving the performance of static analysis and their applications in various domains. Kruger et al. [135] designed a specification language (named CrySL) to detect a class of cryptographic API misuse vulnerabilities. Our analysis found that CrySL's restrictive language structure makes it inappropriate to model various classes of API misuses (e.g., modeling to detect hardcoded keys). In this chapter, we tackle the following research question. Is it possible to develop a universal specification language to express a meta-level program property that can be detected by using a combination of various types of static data-flow analyses?

3.7.2 Overview

In this section, we give an overview of SPANL, which stands for Static Program vAlidatioN Language (Figure 3.11). SPANL offers a specification language-based rule expression for automatic security validation. SPANL performs this in two stages. First, SPANL parses and validates the rules written in the SPANL language. Then, the interpreter runs the analysis according to the parsed rule representation and produces the analysis results (Backend component in Figure 3.11). This specification language is guided by an extended Backus-Naur form (EBNF) grammar. EBNF is a collection of extensions to Backus-Naur form (BNF) [78] that is used in the design of modern compilers.

SPANL offers various types of static analysis, including inter- and intra-procedural versions of forward and backward dataflow analysis with various types of starting criteria.

3.7.3 IR code of the SPANL

Our IR code is a collection of the following five sections.

Figure 3.11: System components of SPANL.

• API section. The API section contains various API definitions that can be referenced from other sections of the IR code. These API definitions are reusable across various rules.

• Operation section. The operation section defines the static analysis operations that need to be performed in order to validate the rule. These operations are referenced by various instructions in the execution section.

• Emit section. Emit sets contain the guidelines for collecting information from each of the operations defined in the operation section. SPANL supports two types of emit sets, i) explicit and ii) implicit. Emit sets that are explicitly defined in the emit section are explicit emit sets. These emit sets can be used in various constraints in the constraint section or printed in the execution section. There are two types of implicit emit sets, i) simple and ii) compound. If an operation defined in the operation section does not have an explicitly defined emit set, then an implicit emit set is attached to it. Such an emit set is called a simple emit set, and it collects all the program points after the execution of the corresponding operation. Compound emit sets are created as a result of compound operations.

• Constraint section. This section defines various constraints that are used in the conditional instructions in the execution section. These constraints are defined by using emit sets, constant values, and API references.

• Execution section. The execution section contains the specific instructions that need to be executed to validate a rule. SPANL supports three types of instructions, i.e., operation, conditional, and print instructions. Operation instructions are used to execute the operations defined in the operation section. SPANL also supports some compound operations in the form of set union, subtraction, and join based on the results of other operations. Conditional and print instructions are used to support conditional branching and to print various types of messages to the standard output, respectively.

3.7.4 SPANL Language

In Figures 3.12 to 3.17, we present the formal language of SPANL.

SPANL IR ::= APIs: a  Operations: o  Emits: ϵ  Constraints: c  Exec: s
v, aid, oid, ϵid, cid ::= γ
Values x ::= n | s
Type Environment Γ ::= . | Γ, v : τ
γ : identifiers
n : Int
s : String
τ : Set of program-points

Figure 3.12: Grammar of the intermediate code

In Figure 3.12, we present the overall structure, various identifiers, values, and the type environment of the language. An intermediate code contains 5 code sections (discussed in Section 3.7.3) in order to express various security properties.

ApiGroup a ::= aid : a′ a | aid : a′
Api a′ ::= methodSig a′ | methodSig
methodSig ::= ret className: methodName(args)
ret ::= <name : Type> | <void>
args ::= args, <name : Type> | <name : Type>
methodName, name ::= v
Type ::= basicTypes | basicTypes[] | className | className[]
className ::= className.v | v
basicTypes ::= int | char | byte | boolean

Figure 3.13: Grammar for parsing the API section

The API section (Figure 3.13) defines a set of API groups tagged with unique identifiers (aid). The method signature of an API requires name and type pairs for both the method arguments and the return variable, which are used to define and uniquely identify various dataflow analyses (i.e., the starting point, the definition of the emit set, etc.).

Operation o ::= o | oid : o1 | oid : o2
o1 ::= inter byApi | inter byRegex | intra on byApi | intra on byParam | intra on byRegex
inter ::= inter-backward | inter-forward
intra ::= intra-backward | intra-forward
byApi ::= with aid and v
byParam ::= with v
byRegex ::= with x
o2 ::= iterate on
on ::= aid | ∗

Figure 3.14: Grammar for parsing the Operation section

Overall, SPANL supports 5 types of operations (Figure 3.14), i.e., inter- and intra-procedural forward and backward dataflow analysis, and iteration. An operation produces a set of program points (τ) after its execution. These operations are referenced from the execution section by their identifiers (oid).

Emitsets ϵ ::= ϵ | ϵid : ϵ′
ϵ′ ::= constants | instructions | invocations
constants ::= constants of-type Types | constants matches x
Types ::= Types, Type | Type
instructions ::= instructions matches x
invocations ::= invocations matches aid

Figure 3.15: Grammar for parsing the Emit section

Emit sets define the specification of which program points are to be collected during the execution of an operation. Explicit emit sets are of three types, i.e., constants, instruction, and API invocation based emit sets (Figure 3.15). The inclusion criteria for constants can be specified by types or a regular expression. All the program points containing a value of the specified type or matching the regular expression will be added to the emit set. In the instruction-based emit set, all the program points matching the regular expression are collected. Invocation-based emit sets are used to collect program points containing the specified API invocations. API groups are identified by aid.

Constraints c ::= c | cid : c′
c′ ::= in | empty
in ::= {vals} in {ϵid} | {vals} not in {ϵid}
empty ::= {ϵid} empty | {ϵid} not empty
vals ::= vals, x | x

Figure 3.16: Grammar for parsing the Constraint section

A single constraint in SPANL involves one emit set. Constraints support two kinds of conditions on emit sets, i.e., in and empty (Figure 3.16). An in constraint checks whether a set of values exists in the emit set or not. An empty constraint checks whether an emit set is empty or not.
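The two kinds of constraints can be sketched in Java as simple set predicates. This is only an illustration and not SPANL's implementation: the class name is hypothetical, an emit set is modeled as the set of constant values carried by the collected program points, and in is assumed to hold when at least one listed value occurs in the emit set (so that not in holds when none does).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class EmitSetConstraints {
    /** {vals} in {eid}: assumed to hold when at least one value occurs in the emit set. */
    static boolean in(Set<?> vals, Set<?> emitSet) {
        for (Object v : vals) {
            if (emitSet.contains(v)) {
                return true;
            }
        }
        return false;
    }

    /** {vals} not in {eid}: none of the values occurs in the emit set. */
    static boolean notIn(Set<?> vals, Set<?> emitSet) {
        return !in(vals, emitSet);
    }

    /** {eid} empty */
    static boolean empty(Set<?> emitSet) {
        return emitSet.isEmpty();
    }

    /** {eid} not empty */
    static boolean notEmpty(Set<?> emitSet) {
        return !emitSet.isEmpty();
    }

    public static void main(String[] args) {
        // Hypothetical emit set: the only key-size constant found by the analysis is 1024.
        Set<Integer> size = new HashSet<>(Arrays.asList(1024));
        Set<Integer> allowed = new HashSet<>(Arrays.asList(2048, 4096));
        System.out.println(notIn(allowed, size)); // true: no allowed key size was found
        System.out.println(empty(size));          // false: the emit set has one element
    }
}
```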

Stmt s ::= v := o | o | print(x) | if δ then s1 else s2 | for v do s | s

Operation o ::= (o) | o1 + o2 | o1 − o2 | o1 ⊕ o2 | v | oid

Conditional Expression δ ::= δ1 and δ2 | δ1 or δ2 | (δ) | not δ | cid

Figure 3.17: Grammar for parsing the execution section

Figure 3.17 shows that SPANL's execution section supports five types of instructions, i.e., assign, operation, print, if, and for. Assign statements assign the output of an operation (a set of program points) to a variable. In addition to the basic operations (invoked by oid), SPANL also supports some compound operations, i.e., set addition, subtraction, and join. The join o1 ⊕ o2 indicates the execution of o1 using the output program points of o2 as the starting points. We present the judgements of the type system for the execution section in Figure 3.18.

Γ ⊢ δ : boolean    Γ ⊢ s1 : τ    Γ ⊢ s2 : τ
-------------------------------------------- IF-OP
Γ ⊢ (if δ then s1 else s2) : τ

Γ ⊢ v : τ    Γ ⊢ s : τ
----------------------- FOR-OP
Γ ⊢ (for v do s) : τ

Γ ⊢ o1 : τ    Γ ⊢ o2 : τ
------------------------- ADD-OP
Γ ⊢ (o1 + o2) : τ

Γ ⊢ o1 : τ    Γ ⊢ o2 : τ
------------------------- SUB-OP
Γ ⊢ (o1 − o2) : τ

Γ ⊢ o2 : τ    Γ ⊢ o1 : τ
------------------------- JOIN-OP
Γ ⊢ (o1 ⊕ o2) : τ

Γ ⊢ o : τ
------------------ ASSIGN-OP
Γ ⊢ (v := o) : τ

Γ ⊢ x : Int, String
--------------------------- PRINT-OP
Γ ⊢ (print(x)) : boolean

Figure 3.18: Type judgements for execution section.
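The compound operations of the execution section (set addition, subtraction, and join) can be sketched in Java as follows. This is a minimal illustration rather than SPANL's interpreter: the class name is hypothetical, program points are modeled as strings, and a basic operation is modeled as a function from a set of starting points to a set of resulting program points.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

final class CompoundOps {
    /** o1 + o2: union of the program points produced by the two operations. */
    static <T> Set<T> union(Set<T> a, Set<T> b) {
        Set<T> out = new LinkedHashSet<>(a);
        out.addAll(b);
        return out;
    }

    /** o1 − o2: program points produced by o1 but not by o2. */
    static <T> Set<T> subtract(Set<T> a, Set<T> b) {
        Set<T> out = new LinkedHashSet<>(a);
        out.removeAll(b);
        return out;
    }

    /** o1 ⊕ o2: run o1 using the program points produced by o2 as starting points. */
    static <T> Set<T> join(Function<Set<T>, Set<T>> o1, Set<T> resultOfO2) {
        return o1.apply(resultOfO2);
    }

    public static void main(String[] args) {
        Set<String> fromO2 = new LinkedHashSet<>(Arrays.asList("p1", "p2"));
        // Toy stand-in for a dataflow analysis: each starting point reaches one new point.
        Function<Set<String>, Set<String>> o1 = starts -> {
            Set<String> reached = new LinkedHashSet<>();
            for (String p : starts) {
                reached.add(p + "'");
            }
            return reached;
        };
        System.out.println(join(o1, fromO2));
    }
}
```

All three helpers take and return sets of program points, mirroring the type τ that Figure 3.18 assigns to every operation.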

Listing 3.5: Rule to detect insecure RSA keys

NAME: assym_key_gen
APIS: kpg_apis
    java.security.KeyPairGenerator:
    getInstance()

APIS: kpg_init
    java.security.KeyPairGenerator:
    initialize()

OPERATIONS:
    o1: inter-backward with kpg_apis and algo
    o2: inter-forward with kpg_apis and kpg
    o3: inter-backward with kpg_init and size

EMITS:
    {kpg}: *
    {algo}: constants of-type java.lang.String
    {size}: constants of-type int, java.lang.Integer

CONSTRAINTS:
    c1: {kpg_init} in {kpg}
    c2: {"RSA"} in {algo}
    c3: {2048, 4096} not in {size}

EXEC:
    o1, o2 join-with o1
    if c2 and !c1:
        out "Must invoke initialize for RSA"
    if c1:
        o3 join-with o2
        if c2 and c3:
            out "Key size must be in {2048, 4096}"

3.7.5 Expressiveness of the language

In Listing 3.5, we model the rule to detect insecure usage of RSA (Rule 15) in the SPANL language. To further demonstrate the expressiveness, we model Rules 1, 4, and 5 in SPANL (Listings A.1, A.2, and A.3, respectively, in Appendix A). These rules encompass some of the most complex meta-level program properties that are required for detection. We also model two Spring security framework misconfiguration issues reported in [129], i.e., the disabling of CSRF protection (Listing A.4) and hardcoded JWT signing keys (Listing A.5) in Appendix A. In all cases, SPANL arguably made the expression easy and understandable.

3.8 Limitations and Discussion

No static analysis tool is perfect. CRYPTOGUARD is no exception. We discuss the detection limitations of CRYPTOGUARD and future improvements.

CRYPTOGUARD runs the intra-procedural forward slicing for Rules 6 and 15, where an inter-procedural forward slicing could potentially improve the coverage. For Rule 15, this change might not make much difference, as KeyPairGenerator creation and its initialization usually occur in the same method. For Rule 6, our current implementation ignores the direct sub-classes of SSLSocketFactory to avoid false positives. Inter-procedural slicing could extend the analysis to the sub-classes.

False positives. One source of false positives comes from the path insensitivity. For example, CRYPTOGUARD raises an alert if the variable iteration is assigned with a value of 0 for the following code snippet (from project jackrabbit-oak). However, this alert is a false positive, since this assignment is on an infeasible path.

int iteration = 0;
...
if (iteration < NO_ITERATION) { // NO_ITERATION = 1
  iteration = DEFAULT_ITERATION;
}
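The infeasible path can be made concrete with a small runnable Java sketch. It is an illustration under stated assumptions and not the jackrabbit-oak code: the class name is hypothetical, the elided lines are stood in for by an optional configured value, and the value of DEFAULT_ITERATION is hypothetical because the source does not state it. The point is that the constant 0 can never leave the method, which a path-insensitive analysis does not see.

```java
final class InfeasiblePathDemo {
    static final int NO_ITERATION = 1;          // value given in the snippet's comment
    static final int DEFAULT_ITERATION = 1000;  // hypothetical value, not given in the source

    /**
     * Mirrors the snippet. The configured parameter is a hypothetical stand-in
     * for the elided lines; pass null when nothing is configured.
     */
    static int effectiveIterations(Integer configured) {
        int iteration = 0;
        if (configured != null) {
            iteration = configured;
        }
        if (iteration < NO_ITERATION) {
            iteration = DEFAULT_ITERATION; // the initial 0 is always overwritten here
        }
        return iteration;
    }

    public static void main(String[] args) {
        System.out.println(effectiveIterations(null)); // falls back to DEFAULT_ITERATION
        System.out.println(effectiveIterations(0));    // 0 is also replaced by the default
        System.out.println(effectiveIterations(5000)); // a configured value is kept
    }
}
```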

CRYPTOGUARD detects the existence of API misuses in a code base but does not verify that the vulnerable code will be triggered at runtime. This issue is a general limitation of static program analysis. For example, the developers of one project confirmed insecure PRNG uses, but stated that the affected code regions are not security critical.17 However, eliminating this type of alert is difficult, as the analysis needs to be aware of custom-defined security criteria (e.g., what constitutes critical security) with in-depth knowledge about project semantics.

Another source of false positives is clipping orthogonal exploration. However, deeper exploration has impacts on both ends – eliminating some false positives while increasing the overall number of (irrelevant) constants discovered. As our experiment shows (in Figure 3.5), the net result of increasing the depth appears to be discovering more irrelevant constants (as opposed to reducing them).

False negatives due to refinements. Refinements may cause false negatives. For the full benchmark evaluation in Table A3 in Appendix A, CRYPTOGUARD has 11 false negatives (i.e., missed detection). All these cases are due to our refinements after clipping orthogonal explorations. For example, RI-II would ignore 6A5B7C8A as a pseudo-influence from the following instruction, if orthogonal explorations to explore parseHexBinary are clipped.

byte[] key = DatatypeConverter.parseHexBinary("6A5B7C8A");

However, these conversions are mostly required to process values from external sources (e.g., file system, network). Any such conversions of static values under the rules of Table 3.1 are highly unlikely. Indeed, outside the benchmark, we did not observe any such cases during our manual investigation of Apache alerts. Additionally, vulnerabilities originating from a clipped orthogonal method may also be missed by CRYPTOGUARD. Conceptually, all such false negatives could be avoided by increasing the depth of the orthogonal exploration (default depth is 1). Our results in Figure 3.5 on 30 Apache root-subprojects with varying depths of orthogonal explorations show that the increase of the depth does not necessarily discover new true positive cases or increase the F1 score.

17 It is unclear why Spark chose to use insecure PRNG, even for non-security purposes.

Vulnerability disclosure and feedback. We have heard back from a number of Apache projects regarding our vulnerability disclosure, including Tomcat, Hadoop, Hive, Spark, Ofbiz, Ambari, and Ranger. Apache Spark removed the support of dummy hostname verifier and dummy trust store. Apache Ranger fixed constant default values for PBE [42] and insecure cryptographic primitives [30]. Ofbiz promised to fix the reported issues of constant IVs and KeyStore passwords. Regarding MD5, one project justified its MD5 use as providing per-block checksums for the consistency of the Hadoop file system (HDFS), in a setup that does not assume the presence of active adversaries. For Android libraries, we have submitted vulnerability reports to Google. Google closed our issue with 4 misuses in Google libraries, citing the lack of concrete exploit demonstrations. Facebook and Tencent have similar requirements. We also received similar feedback from an Apache Software Foundation administrator demanding concrete exploit demonstrations before more reported issues can be examined.

Some developers explained that certain operational constraints (e.g., backward compatibility for clients) prevent them from fixing the problems. For example, a server has to use MD5 in its digest authentication code, because major browsers do not support secure hash functions (as defined in RFC 7616). However, digest authentication is rarely used in the wild18. The thorniest issue is secret storage. One justification for developers' choice of storing plaintext passwords or keys in file systems is for supporting humanless environments (e.g., automated scripts to manage services). However, not all deployment scenarios are server farms in a humanless environment. Projects should provide the secure option by prompting human operators for passwords that can be used to unlock/generate other passwords or keys on the fly. Not properly disclosing and documenting the insecure configurations does a great disservice to the project's users.

3.9 Summary

We described our effort of producing a deployment-quality static analysis tool, CRYPTOGUARD, that developers can routinely use to detect cryptographic misuses in Java programs. This effort led to several crypto-specific contributions, including language-specific contextual refinements for FP reduction, on-demand flow-sensitive, context-sensitive, and field-sensitive program slicing, and benchmark comparisons of leading solutions. We also obtained a trove of security insights into Java secure coding practices. Finally, we designed a compiler that automatically transforms a cryptographic vulnerability or rule into a static-analysis-based code-screening algorithm, similar to what CrySL provides, but with much higher precision and expressiveness.

Footnote 18: https://security.stackexchange.com/questions/152935/why-is-there-no-adoption-of-rfc-7616-http-digest-auth

Chapter 4

Program Analysis of Cryptographic Implementations

4.1 Introduction

Cryptographic protocols/constructs are often used as the building blocks for providing robust security guarantees in many applications (e.g., HTTPS [7], DNSSEC [14], SMTP-over-TLS [8]). While implementing or employing these cryptographic protocols, one hopes to replicate the security guarantees provided by their theoretical cryptographic counterparts that have been proven to be secure. This seemingly straightforward goal of implementing applications with provably secure guarantees, however, is often unaccomplished, as is evident in the recent high-profile outbreaks of cryptography-related vulnerabilities in widely used network libraries and tools (e.g., the Heartbleed vulnerability in OpenSSL [15] and seed leaking in Juniper Network [100]).

The lack of provable security guarantees in applications relying on cryptography can often be attributed to a combination of the following reasons: (1) The application uses a vulnerable cryptographic library or an insecure cryptographic construct/parameter (e.g., MD5 hash function); (2) A cryptographic construct is used without satisfying its required precondition (e.g., initialization vector not being random); (3) The correct APIs of the underlying cryptographic library are not invoked at all, not invoked in the prescribed order, or invoked with improper arguments (e.g., hostname validation is not performed after X.509 certificate chain validation); (4) The application suffers from logical/runtime vulnerabilities (e.g., buffer overflow). The impact of such insecurity for a critical application can affect millions of devices that execute the vulnerable implementation, hence rendering millions of users vulnerable to adversarial attacks resulting in the loss of user privacy, vendor reputation, or even financial loss. The main objective of the chapter is to develop technologies for aiding developers to avoid such pitfalls in their applications.

The current chapter contributes to this overarching vision by promoting a new paradigm called cryptographic program analysis (CPA), which prescribes the use of program analysis approaches to develop compile-time insecurity checking and security enhancing solutions. Most of the existing work in this domain focuses either on precisely detecting cryptographic API misuses by applications [112, 115, 125, 134, 155, 166] or on identifying protocol-specific vulnerabilities in cryptographic libraries [85, 99, 103, 105, 175, 176]. These efforts, however, do not assist developers in avoiding other kinds of pitfalls, for instance, the use of insecure cryptographic constructs (e.g., ECB mode in symmetric ciphers) or parameters (e.g., RSA public-exponent 3 [9]).

The key insight that enables CPA to effectively aid developers in producing a robust implementation is that many of the aforementioned pitfalls can be mapped to violations of meta-level properties of the implementations. Such violations can not only be checked at compile time, but can also serve as sufficient evidence of the cryptographic flaws they encompass. To explain meta-level properties, let us take a fictitious application that processes commands from a client. To ensure the integrity of the submitted command, it uses an 8-byte message authentication code (MAC) scheme. Let us also assume that the MAC scheme enjoys the desired security property of resistance against existential forgery [139].

During its execution, whenever the application receives a plaintext command $m$ and its MAC $rmac_k^m$, it checks the validity of $rmac_k^m$ before processing $m$. When the MAC verification fails, it returns a MAC_FAIL warning message; otherwise, it returns OK. To verify the MAC, it first constructs the MAC of $m$ using its key $k$. Suppose the constructed MAC is $cmac_k^m$. It then checks whether $cmac_k^m \stackrel{?}{=} rmac_k^m$ by comparing each byte of $cmac_k^m$ with $rmac_k^m$, halting the comparison by sending MAC_FAIL when the first mismatch is observed. It is evident that, by observing the response message and its timing, an adversary who does not know the cryptographic key can forge the MAC of a new message with only $8 \times 256$ attempts instead of $256^8$. This flaw can be easily mapped to the following meta-level property violation: "No early termination during the comparison of cryptographic payloads". In a similar vein, the infamous Bleichenbacher padding oracle attack against RSA can be mounted due to the violation of another meta-level property: "The same, generic error message should be sent whenever the protocol experiences an error condition".
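To make the attack cost concrete: with early termination the adversary can confirm one byte at a time, so at most $8 \times 256 = 2{,}048$ guesses are needed, whereas guessing the whole tag blindly takes up to $256^8 = 2^{64}$ attempts. The following C fragment is a minimal sketch of the two comparison styles; the function names and the `MAC_LEN` constant are hypothetical and are not taken from TAINTCRYPT or from any particular library. The first function returns how many bytes it inspected, which models the information that leaks through timing.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_LEN 8 /* 8-byte MAC, as in the fictitious application */

/* Early-exit comparison. Returns the number of bytes inspected before the
 * verdict was reached; *equal is set to 1 on a match and 0 otherwise.
 * The returned count depends on the position of the first mismatch. */
size_t mac_compare_early_exit(const uint8_t *cmac, const uint8_t *rmac, int *equal)
{
    for (size_t i = 0; i < MAC_LEN; i++) {
        if (cmac[i] != rmac[i]) {
            *equal = 0;
            return i + 1; /* work done reveals the first mismatching byte */
        }
    }
    *equal = 1;
    return MAC_LEN;
}

/* Comparison without early termination: every byte is always inspected and
 * the differences are accumulated before a single verdict is produced. */
int mac_equal_no_early_exit(const uint8_t *cmac, const uint8_t *rmac)
{
    uint8_t diff = 0;
    for (size_t i = 0; i < MAC_LEN; i++) {
        diff |= (uint8_t)(cmac[i] ^ rmac[i]);
    }
    return diff == 0;
}
```

A guess that is wrong in the first byte is rejected after one inspected byte by the first function, while a guess that is wrong only in the last byte is rejected after eight; the second function does the same amount of work in both cases.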

As readers may have realized, many of the meta-level properties can be specific to cryptographic constructs/protocols. To enable the specification of such meta-level properties, we provide a deterministic finite automaton (DFA) based language. We also develop a tool dubbed TAINTCRYPT that leverages static information flow analysis to identify the violations of meta-level properties in C/C++ implementations. Our static analysis is both path- and context-sensitive, hence capable of enforcing a rich set of cryptographic properties precisely (i.e., with few false positives).

Our work targets and addresses the fundamental challenge of mapping theoretical cryptographic concepts to practical code structures and security-related behavioral properties, which can potentially enable a wide range of code-based security analysis for cryptographic software. This work thus serves as an essential first step towards performing systematic, automated analyses of cryptographic libraries and their applications of millions of lines of code. Although static information flow analysis itself has been studied as a general methodology for reasoning about code security, it remains untapped how this general technique can be exploited to secure cryptographic code in particular.

Contributions: In summary, this chapter makes the following contributions:

• We conducted an in-depth exploratory study of code-level security vulnerabilities in cryptographic programs, which resulted in a taxonomy of 25 classes of exploitable vulnerabilities in cryptographic implementations that boil down to 12 distinct types of security attacks.

• We derived 25 enforceable rules (meta-level properties) from our vulnerability study and taxonomy, which address 6 out of the total of 12 security attacks identified. We further showed that static analysis can be used for 23 of these rules to capture the sufficient condition for proving if a property holds or not.

• We identified compile-time security checking of cryptographic implementations as an unexplored problem in software security and proposed a deterministic finite automaton (DFA) based language to express meta-level cryptographic properties that can be checked using static analysis. Further, we demonstrated our technique by developing a tool named TAINTCRYPT that enforces 15 security rules we derived from our exploratory study.

• We implemented TAINTCRYPT for C/C++ programs as a practical tool based on LLVM and used the tool to evaluate our CPA technique against real-world cryptographic software. We demonstrated the effectiveness and efficiency of our technique and thus showed how static information flow analysis can be exploited to diagnose a large variety of cryptographic vulnerabilities in large-scale libraries like OpenSSL and critical software systems built on such libraries.

4.2 Motivation and Threat Model

To motivate this work, in this section, we present a few examples of cryptographic vulnerabilities from real-world software. Then, we present our threat model.

4.2.1 Motivating Examples

Like other types of security vulnerabilities, one of the common causes of vulnerable information flows in cryptographic implementations is their inclusion of basic programming errors.

Example 1. Consider the code snippet excerpted from the core ScreenOS 6.2 PRNG functions [100] in Figure 4.1. In this case, the shared use of global variables (prng_temporary and prng_output_index) causes the leak of sensitive data prng_seed (Line 5 in prng_reseed) in the immediate post-seed (Line 16) output of function prng_generate. As another case of this kind, a memory disclosure vulnerability called Heartbleed in OpenSSL (CVE-2014-0160) had the potential of leaking sensitive information (e.g., cryptographic keys and PRNG seeds). In fact, vulnerabilities and security threats rooted in similar coding errors are commonly found in real-world cryptographic software. Dealing with these issues can be particularly challenging, as oftentimes a bug fix for one problem introduces other problems [118]. As an essential step towards overcoming such challenges, we will show how static information flow analysis can be employed to detect various types of sensitive data leakage in cryptographic code (Section 4.4).

```c
 1 void prng_reseed(void){
 2   ...
 3   if (dualec_generate(prng_temporary, 32) != 32)
 4     error_handler("ERROR: unable to reseed", 11);
 5   memcpy(prng_seed, prng_temporary, 8);
 6   prng_output_index = 8;
 7   memcpy(prng_key, &prng_temporary[prng_output_index], 24);
 8   prng_output_index = 32;
 9 }
10
11 void prng_generate(int is_one_stage) {
12   ...
13   prng_output_index = 0;
14
15   if (!one_stage_rng(is_one_stage)) {
16     prng_reseed();
17   }
18
19   for (; prng_output_index <= 0x1F; prng_output_index += 8) {
20     // FIPS checks...
21     x9_31_generate_block(time, prng_seed, prng_key, prng_block);
22     // FIPS checks...
23     memcpy(&prng_temporary[prng_output_index], prng_block, 8);
24   }
25 }
```

Figure 4.1: Excerpt from a real-world cryptographic program (the core ScreenOS 6.2 PRNG functions [100]), where prng_temporary and prng_output_index are global variables. When prng_reseed is called (Line 16), the loop variable prng_output_index in function prng_generate is set to 32, causing prng_generate to output sensitive data prng_seed at Line 5.

[Figure 4.2, panel (a): LoC distribution over all 7,157 functions. Panel (b): distribution of the number of basic blocks over 40 randomly selected functions with LoC >= 200.]

Figure 4.2: Function-level size metrics statistics of OpenSSL as an example large-scale cryptographic software.

Example 2. The first example, shown in Figure 4.1, illustrates a case in which the vulnerable cryptographic information flows could be manually inspected. In large-scale projects that involve hundreds of developers, however, it is very difficult or impractical to check each of the information flow paths manually. For example, OpenSSL (as of commit 5748e4dc) consists of 7,157 functions, totaling 325,000 lines of code (LoC). In particular, 390 of these functions have more than 100 LoC and 90 of them have over 200 LoC. We randomly selected 40 functions of at least 200 LoC and counted the number of basic blocks (BBs) in each. We found that these 40 functions have 96 BBs on average. Figure 4.2 depicts the distribution of LoC over all functions in the studied version of OpenSSL (a) and the distribution of the number of BBs over the 40 chosen functions (b). In addition, this software project involved 323 collaborating contributors. We argue that, for sizable cryptographic software, manual approaches are not feasible and automated security defense/enforcement mechanisms are mandatory.

4.2.2 Threat Model

In this chapter, we focus on two main categories of threats associated with cryptographic code that are relevant to vulnerable information flows. These threats thus could be addressed through properly designed static information flow analyses.

• Library-level coding vulnerabilities. Coding errors (such as premature exit of a for loop due to an incorrect loop condition [100]), vulnerabilities (such as key leakage), configuration issues and misuses (such as insecure storage of secrets) in third-party cryptographic libraries are particularly dangerous, as the code is usually widely used, oftentimes in commercial systems. As a consequence, security threats at this level can potentially put at risk a range of applications that are built on vulnerable libraries.

• Application-level implementation vulnerabilities. Application code comes in a large variety (e.g., Android apps, client-side software, web applications). Vulnerabilities in this category are mostly related to API misuses, for example, erroneously invoking or configuring SSL library APIs [120], using obsolete crypto primitives, or intentionally disabling security mechanisms.

We assume that developers of both third-party cryptographic libraries and higher-level cryptographic applications could write vulnerable code. Our approach aims at helping inexperienced developers, as well as developers who are more experienced but may still write insecure code (e.g., PRNG seed leakage [167] or Heartbleed [15]). We do not target side-channel vulnerabilities (e.g., RSA padding oracle [90]), since our analysis is static in nature and static analysis is inherently limited in dealing with such intrinsically dynamic issues.

4.3 Cryptographic Vulnerabilities

In this section, we present different state-of-the-art cryptographic vulnerabilities. We also categorize them into several broader groups. Further, in Table 4.1, we present a set of security rules that ought to be enforced to defend crypto implementations against these vulnerabilities.

4.3.1 Chosen-plaintext attacks on IVs

Electronic Codebook (ECB) mode encryption is not semantically secure [137]. Bard et al. [80] showed that the determinism of initialization vectors (IVs) can make cipher block chaining (CBC) mode encryption insecure too. However, the vulnerability remained merely hypothetical until late 2011, when Duong and Rizzo [13] demonstrated a live attack (known as BEAST) against PayPal by exploiting it. Row 1 of Table 4.1 corresponds to the security enforcement rule to avoid the use of ECB mode ciphers, and Row 2 corresponds to the attacks related to the predictability of IVs in CBC mode encryption. In Section 4.4, we present static information flow analysis based mechanisms to detect these vulnerabilities.
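The determinism behind Rows 1 and 2 can be illustrated with a small sketch. It uses a toy keyed byte permutation in place of a real block cipher (an assumption made only to keep the example self-contained; it provides no security), and all function names are hypothetical. It shows only the simpler deterministic cases, namely that ECB maps equal plaintext blocks to equal ciphertext blocks and that CBC with a repeated IV produces repeated ciphertexts; it does not reproduce the BEAST attack itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 8 /* toy block size in bytes */

/* Toy keyed permutation on one block. NOT a real cipher; illustration only. */
static void toy_encrypt_block(const uint8_t key[BLK], const uint8_t in[BLK],
                              uint8_t out[BLK])
{
    for (size_t i = 0; i < BLK; i++) {
        uint8_t x = (uint8_t)(in[i] ^ key[i]);
        /* rotate the byte left by one bit and move it three positions on */
        out[(i + 3) % BLK] = (uint8_t)((x << 1) | (x >> 7));
    }
}

/* ECB: every block is encrypted independently of the others. */
void toy_ecb(const uint8_t key[BLK], const uint8_t *pt, size_t nblocks, uint8_t *ct)
{
    for (size_t b = 0; b < nblocks; b++) {
        toy_encrypt_block(key, pt + b * BLK, ct + b * BLK);
    }
}

/* CBC: each plaintext block is XORed with the previous ciphertext block
 * (or with the IV for the first block) before encryption. */
void toy_cbc(const uint8_t key[BLK], const uint8_t iv[BLK], const uint8_t *pt,
             size_t nblocks, uint8_t *ct)
{
    uint8_t chain[BLK];
    uint8_t mixed[BLK];
    memcpy(chain, iv, BLK);
    for (size_t b = 0; b < nblocks; b++) {
        for (size_t i = 0; i < BLK; i++) {
            mixed[i] = (uint8_t)(pt[b * BLK + i] ^ chain[i]);
        }
        toy_encrypt_block(key, mixed, ct + b * BLK);
        memcpy(chain, ct + b * BLK, BLK);
    }
}
```

Encrypting two identical plaintext blocks with `toy_ecb` yields two identical ciphertext blocks, which is exactly the pattern leakage that Rule 1 forbids. With `toy_cbc` the two blocks differ, but calling it twice with the same IV still yields the same ciphertext, which is why Rule 2 requires IVs to be generated randomly.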

4.3.2 Attacks on PRNG

Historically, random number generators have been a major source of cryptographic information flow vulnerabilities [84, 122, 126]. The reason is that many cryptographic schemes rely on a cryptographically secure random number generator for key and cryptographic nonce generation (Row 11 of Table 4.1). A random number generator can be exploited so that its behavior becomes predictable. Vulnerabilities such as the use of predictable seeds (Row 9) and backdoor-able PRNGs (Row 10) can then serve an attacker as a backdoor to break the security of the cryptographic applications that use the random numbers produced by the PRNG.
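As a hedged illustration of Row 9, the sketch below uses the C standard library's `srand`/`rand` purely as a stand-in for a PRNG whose seed comes from a guessable value (for example, a timestamp). The helper names and the seed window are hypothetical. Because the output is a deterministic function of the seed, an attacker who knows roughly when the generator was seeded can search the window and reproduce the output.

```c
#include <stdlib.h>

/* First value produced by the C library generator for a given seed. */
int first_output_for_seed(unsigned int seed)
{
    srand(seed);
    return rand();
}

/* Searches the inclusive seed window [lo, hi] for a seed that reproduces one
 * observed output. Returns 1 and stores a matching seed in *found, or
 * returns 0 if no seed in the window matches. */
int recover_seed(int observed, unsigned int lo, unsigned int hi,
                 unsigned int *found)
{
    for (unsigned int s = lo;; s++) {
        if (first_output_for_seed(s) == observed) {
            *found = s;
            return 1;
        }
        if (s == hi) {
            break; /* stop before s wraps around */
        }
    }
    return 0;
}
```

A window of about a thousand candidate seeds, such as the seconds in a quarter of an hour, is searched almost instantly, which is why seeds must come from an unpredictable source.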

The NIST standard for PRNG, referred to as Dual EC PRNG, has been considered both biased and backdoor-able by the security community [101]. Researchers have shown that the backdoor-ability of Dual EC PRNG was the main reason behind the Juniper incident in 2015 [100], and they also revealed how the cascade of multiple vulnerabilities due to programming errors led to the leak of PRNG seeds in Juniper Network (Row 19). We will demonstrate how the proposed static program analysis can be leveraged to detect such vulnerabilities (Section 4.4).
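The index-sharing flaw of Figure 4.1 can be reproduced in a small, self-contained model. This is a simplified sketch, not the ScreenOS code: the two `stub_` functions are hypothetical stand-ins for dualec_generate and x9_31_generate_block, the reseed decision is passed in as a parameter, and error handling is omitted. The model keeps the essential structure, in which both functions share the global index and the global output buffer.

```c
#include <stddef.h>
#include <string.h>

/* Globals shared by both functions, as in Figure 4.1. */
static unsigned char prng_temporary[32];
static unsigned char prng_seed[8];
static unsigned char prng_key[24];
static unsigned char prng_block[8];
static size_t prng_output_index;

/* Hypothetical stand-in for dualec_generate(): fills buf with a fixed pattern. */
static size_t stub_dualec_generate(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        buf[i] = (unsigned char)(0xA0 + i);
    }
    return len;
}

/* Hypothetical stand-in for x9_31_generate_block(): writes a constant block. */
static void stub_x9_31_generate_block(unsigned char *out)
{
    memset(out, 0x11, 8);
}

static void model_reseed(void)
{
    stub_dualec_generate(prng_temporary, 32);
    memcpy(prng_seed, prng_temporary, 8);
    prng_output_index = 8;
    memcpy(prng_key, &prng_temporary[prng_output_index], 24);
    prng_output_index = 32; /* the shared index is left at 32 */
}

/* Returns the number of post-processing iterations that actually ran. */
int model_generate(int reseed)
{
    int iterations = 0;
    prng_output_index = 0;
    if (reseed) {
        model_reseed();
    }
    for (; prng_output_index <= 0x1F; prng_output_index += 8) {
        stub_x9_31_generate_block(prng_block);
        memcpy(&prng_temporary[prng_output_index], prng_block, 8);
        iterations++;
    }
    return iterations;
}

/* Returns 1 if the output buffer still begins with the raw seed bytes. */
int output_exposes_seed(void)
{
    return memcmp(prng_temporary, prng_seed, 8) == 0;
}
```

In this model a call with a reseed runs zero loop iterations, because the reseed routine leaves the shared index at 32, so the output buffer still holds the raw generator output from which the seed was copied. A call without a reseed runs four iterations and overwrites the whole buffer.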

Table 4.1: Enforceable security rules in different cryptographic implementations. (*) indicates a rule focusing on data integrity and (#) indicates a rule focusing on data secrecy protection. Here, CPA and CCA stand for chosen plaintext attack and chosen ciphertext attack, respectively.

| Attack Type | Enforceable Rule | Crypto property | Static Analysis Tool |
|---|---|---|---|
| CPA | (1) Should not use ECB mode in symmetric ciphers* | Secrecy | Taint Analysis |
| CPA | (2) IVs in CBC mode should be generated randomly* | Secrecy | Taint Analysis |
| CCA | (3) Validity of ciphertexts should not be revealed in symmetric ciphers | Secrecy | Program Dependence Analysis |
| CCA | (4) Validity of ciphertexts should not be revealed in RSA | Authentication | Program Dependence Analysis |
| CCA | (5) Should not use export grade or broken asymmetric ciphers* | Authentication | Taint Analysis |
| CCA | (6) Should not use 64 bit block ciphers (e.g., DES, Blowfish)* | Secrecy | Taint Analysis |
| CCA | (7) Should not allow early termination (timing side channels) | Secrecy | Program Dependence Analysis |
| CCA | (8) Should not allow cache-based side channels | Secrecy | – |
| Predictability | (9) PRNG seeds should not be predictable* | Randomness | Taint Analysis |
| Predictability | (10) Should not use untrusted PRNGs* | Randomness | Taint Analysis |
| Predictability | (11) Nonces should be generated randomly* | Randomness | Taint Analysis |
| Memory Corruption | (12) Should not allow double "free()" exploit* | Determinism | Taint Analysis |
| Memory Corruption | (13) Should not have type truncation (e.g., 64 to 32 bit integers) | Determinism | Data Flow Analysis |
| Memory Corruption | (14) Should not leave any wild or dangling pointers | Determinism | Data Flow Analysis |
| Memory Corruption | (15) Should guard against integer overflow* | Determinism | Taint Analysis |
| Memory Corruption | (16) Should not write to a memory (buffer) beyond its length* | Determinism | Taint Analysis |
| Crash | (17) Should check return values of untrusted code/libraries* | Availability | Taint Analysis |
| Crash | (18) Divisions should not be exposed to arbitrary inputs* | Availability | Taint Analysis |
| Data Leak | (19) Should not leak sensitive data# | Secrecy | Taint Analysis |
| Key Leak | (20) Should not use predictable/constant cryptographic keys | Secrecy | Data Flow Analysis |
| Memory Leak | (21) Should not leave allocated memory without freeing | Availability | Data Flow Analysis |
| Memory Disclosure | (22) Should not read to a memory beyond its length (heartbleed)* | Secrecy | Taint Analysis |
| Hash Collision | (23) Should not use broken hash functions* | Integrity | Taint Analysis |
| Stack Overflow | (24) Cyclic function calls should not depend on untrusted inputs | Availability | Program Dependence Analysis |
| State machine bugs | (25) Should detect illegal transitions in protocol state machines | Authentication | – |

4.3.3 Use of Legacy Ciphers

There are several attacks based on the use of legacy ciphers, where cryptanalysis is feasible. For example, the Logjam attack [181] allows a man-in-the-middle attacker to downgrade vulnerable TLS connections to 512-bit export-grade cryptography. In [86], the authors demonstrated the recovery of a secret session cookie by eavesdropping on HTTPS connections. Prior research also demonstrated that the use of weak hash functions (e.g., MD5 or SHA-1 in TLS, IKE, and SSH) might cause almost-practical impersonation and downgrade attacks in TLS 1.1, IKEv2, and SSH-2 [87]. These attacks are characterized in Table 4.1: Row 5 (asymmetric cipher), 6 (symmetric cipher), and 23 (hash functions). The corresponding vulnerabilities can be detected using our static analysis as described in Section 4.4.

4.3.4 Padding Oracles

Padding oracle vulnerabilities can be categorized into two classes: (1) padding oracles in symmetric ciphers and (2) padding oracles in asymmetric ciphers.

Padding oracles in symmetric ciphers. Vaudenay et al. [185] presented a decryption oracle out of the receiver's reaction on a ciphertext in the case of valid/invalid padding of CBC mode encryption. In the SSL/TLS protocol, the receiver may send a decryption failure alert if invalid padding is encountered. By exploiting this information leaked from the server and cleverly changing the ciphertext, an attacker is able to decrypt a ciphertext without any knowledge of the key. "POODLE" [147] is a padding oracle attack that targets CBC-mode ciphers in SSLv3. "Lucky Thirteen" [69] is also a padding oracle attack on CBC-mode ciphers, exploiting the timing side channel vulnerabilities in victims that do not check the MAC for badly padded ciphertexts. In Row 3 of Table 4.1, we summarize padding oracle attacks on CBC mode encryptions.

Padding oracles in asymmetric ciphers. In [90], Bleichenbacher presented a stealthy attack on RSA based SSL cipher suites. The author utilized the strict structure of the PKCS#1 v1.5 format and showed that it is possible to decrypt the PreMasterSecret in a reasonable amount of time. There are numerous examples of using the "Bleichenbacher padding oracle" to recover the RSA private key in different settings [76, 81, 130, 132], some of which use timing side channels to distinguish between properly-formed and malformed ciphertexts [146, 192].

We characterize Bleichenbacher padding oracle attacks in Row 4 of Table 4.1.

We believe that padding oracles due to early terminations or problem-specific error messages can be detected using program dependence analysis, as presented in Rows 3 and 4 of Table 4.1. Yet, whether such approaches can be employed to identify more sophisticated padding oracles (e.g., timing-based side channels) remains to be explored, which is part of our future work.
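The rule behind Rows 3 and 4, that the validity of a ciphertext should not be revealed, can be sketched as a record check that evaluates both the padding and the MAC result and reports a single generic status. The sketch below assumes PKCS#7-style padding with a 16-byte block and takes the MAC verification result as a parameter; the names `padding_is_valid`, `check_record`, and the status codes are hypothetical. It illustrates the "no early termination" and "same generic error" properties only, and makes no claim of being constant-time on any particular compiler or platform.

```c
#include <stddef.h>
#include <stdint.h>

enum record_status { RECORD_OK = 0, RECORD_ERROR = 1 };

/* Checks PKCS#7-style padding on a decrypted buffer (block size 16).
 * Returns 1 if the padding is well formed. The whole last block is scanned
 * regardless of where a mismatch occurs. */
static int padding_is_valid(const uint8_t *buf, size_t len)
{
    if (len == 0 || len % 16 != 0) {
        return 0; /* length is public information in this sketch */
    }
    uint8_t pad = buf[len - 1];
    uint8_t bad = (uint8_t)(pad == 0 || pad > 16);
    for (size_t i = 0; i < 16; i++) {
        uint8_t in_pad = (uint8_t)(i < pad); /* 1 for bytes that must equal pad */
        bad |= (uint8_t)(in_pad & (buf[len - 1 - i] != pad));
    }
    return bad == 0;
}

/* mac_ok is the result of a MAC verification performed by the caller
 * (hypothetical; it is assumed to be computed even for bad padding). */
enum record_status check_record(const uint8_t *plaintext, size_t len, int mac_ok)
{
    int pad_ok = padding_is_valid(plaintext, len);
    /* Both results are combined; the caller sees one generic error code. */
    return (pad_ok & (mac_ok != 0)) ? RECORD_OK : RECORD_ERROR;
}
```

Because a padding failure and a MAC failure both map to `RECORD_ERROR`, the response message alone does not tell the sender which check failed.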

4.3.5 Side-Channel Exploitations

We categorize side-channel attacks in cryptographic implementations into two classes: (1) timing-based and (2) cache-based side-channel attacks.

Timing-based side-channel attacks. Brumley et al. [95] presented a timing-based side-channel attack on OpenSSL's implementation of RSA decryption. In [94], the authors identified vulnerabilities to a timing attack in OpenSSL's ladder implementation for curves over binary fields. Exploiting these vulnerabilities, the authors demonstrated stealing the private key of a TLS server that authenticates with ECDSA signatures. Timing side channels are hard to detect in general. However, timing side channels due to early termination can be detected using program dependence analysis. Row 7 of Table 4.1 summarizes such timing-based side-channel attacks.

Cache-based side-channel attacks. After the introduction of cache-based side channels [158], researchers demonstrated the existence of side channels in various cryptographic implementations (e.g., AES [83] and DSA [118]). In [118], the authors presented a cache-based side channel to compromise OpenSSL's implementation of the DSA signature scheme and recovered keys in the TLS and SSH cryptographic protocols. Row 8 of Table 4.1 characterizes cache-based side-channel attacks in cryptographic implementations.

Unfortunately, verifying constant-time implementations to eliminate these side-channel exploitations is notoriously difficult, because of its indirect and complex dependency on program control flows [70]. As a result, building feasible static program analysis based techniques to verify constant-time implementation is still an open research problem.

4.3.6 State Machine Vulnerabilities

Attacks exist which exploit vulnerabilities in the protocol state machines of different cryptographic protocols [85, 105]. For example, the CCS injection attack [16] on OpenSSL's ChangeCipherSpec processing vulnerability allows malicious intermediate nodes to intercept encrypted data and decrypt them while forcing SSL clients to use weak keys that are exposed to the malicious nodes.

Different cipher suites in TLS use different message sequences. In SKIP-TLS [85], TLS implementations incorrectly allow some messages to be skipped even though they are required for the selected cipher suite. The FREAK attack [17] has led to a server impersonation exploit against several mainstream browsers (including Safari and OpenSSL-based browsers on Android). Like most of the exploits of this category, FREAK also targets a class of deliberately chosen weak, export-grade cipher suites. These attacks are summarized in Row 25 of Table 4.1.

Most of the techniques [85, 105, 176] that detect vulnerabilities due to state machine exploitations use fuzzing-based input generation mechanisms based on dynamic program analyses. In contrast, building practical static analysis based detection mechanisms has yet to be investigated. A key challenge towards static detection lies in the fact that as the protocol's internal states increase, the computational complexity rises exponentially.

4.3.7 Programming Errors

Programming errors have been a major source of vulnerabilities in C/C++ security software [102]. These vulnerabilities range from improper memory use to improper memory management. Examples of improper memory use include memory over-read (e.g., the Heartbleed attack [15]) (Row 22 of Table 4.1), memory over-write (e.g., buffer overflow [22, 25]) (Row 16), integer overflow [102] (Row 15), type truncation (Row 13), and stack overflow (see footnote 1) (Row 24). Example vulnerabilities that boil down to improper memory management are malloc without free (Row 21), double free [23] (Row 12), and dangling pointers (Row 14).

In addition, prior studies [112, 137] have shown that other programming errors, such as those that are due to careless handling of cryptographic keys (e.g., using hard-coded keys), are also prevalent in the wild (Row 20). In Section 4.4, we present how static program analysis can be used to detect cryptographic vulnerabilities that are induced by various programming errors.
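As a concrete instance of the memory over-read and over-write rules (Rows 22 and 16), consider an echo-style handler in which the request carries a length field controlled by the sender. The following is a minimal sketch with hypothetical names; it is not the OpenSSL heartbeat code. The two guards compare the untrusted length against the number of bytes actually received and against the capacity of the reply buffer before any copy takes place.

```c
#include <stddef.h>
#include <string.h>

/* Copies an echo payload into reply.
 *   claimed_len : length taken from the untrusted request header
 *   actual_len  : number of payload bytes that were actually received
 * Returns the number of bytes copied, or -1 if the request is rejected. */
long echo_payload(unsigned char *reply, size_t reply_cap,
                  const unsigned char *payload, size_t actual_len,
                  size_t claimed_len)
{
    if (claimed_len > actual_len) {
        return -1; /* would read past the received payload (Row 22) */
    }
    if (claimed_len > reply_cap) {
        return -1; /* would write past the reply buffer (Row 16) */
    }
    memcpy(reply, payload, claimed_len);
    return (long)claimed_len;
}
```

In taint-analysis terms, `claimed_len` originates from a source and the `memcpy` length argument is a sink; the two comparisons play the role of a filter between them.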

4.4 Security Rules and Enforcement

In this section, we first present the enforceable security rules we derived against the various cryptographic code security vulnerabilities described in Section 4.3. Then we discuss how various types of static analysis techniques can be used to detect the violations of these rules. Once a technique is chosen, we elaborate on how these rules can be expressed in a deterministic finite automaton (DFA) based language for enforcement. In particular, we demonstrate how security-aware testing can be enabled to enforce these rules via static code analysis.

[Figure 4.3: state diagram over the states q0, q1, q2, and q3 with transitions labeled source(), propagator(), filter(), and sink().]

Figure 4.3: Finite state machine (FSM) of taint analysis.

Enforceable security rules.

By analyzing different genres of attacks, we have identified 25 categories of cryptographic vulnerabilities and corresponding security rules that should be enforced in a cryptographic program to ensure different security properties, as shown in Table 4.1. These 25 categories of attacks fall into 12 higher-level attack classes (e.g., memory corruption and data leak) listed in the first column. Note that the rules from memory corruption, crash, memory leak, and stack overflow are not specific to cryptographic programs, and hence are applicable to general program implementations. However, the violation of these rules in cryptography implementations causes violations of some of the cryptographic properties. To provide a one-stop service to secure cryptography implementations, we included these rules in our threat model.

Footnote 1: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0228

4.4.1 Mapping Rules with Program Properties

23 out of the 25 security rules are enforceable through static code analysis. To use static analysis effectively for enforcing cryptographic rules, we have to map the cryptographic rules (meta-level properties) to static analysis properties. This mapping aims to capture a sufficient condition for proving the violation of a cryptographic rule. Since such a violation might not imply an exploitable vulnerability, the mapping does not capture necessary conditions.

Detection with taint analysis

Next, we show that 15 out of these 23 rules can be enforced using static taint analysis, which requires mapping these rules to taint analysis properties. Specifically, we define the properties of taint analysis and map these rules with taint analysis properties.

Table 4.2: Transition functions (δ) of the finite state machine (FSM) presented in Figure 4.3.

| State | source() | propagator() | filter() | sink() |
|---|---|---|---|---|
| q0 | q1 | – | – | – |
| q1 | – | – | q2 | q3 |
| q2 | – | q1 | – | – |
| q3 | – | – | – | – |

Taint analysis typically works by identifying dangerous flows of untrusted inputs into sensitive destinations [187]. Generally, a taint analyzer refers to four types of functions to identify these flows: sources, propagators, filters and sinks. A source is a function that produces an untrusted input, while a sink is a function that consumes an untrusted input, sending it to a sensitive destination. A propagator is a function that propagates the untrusted data from one point of the program (via a variable) to another, while a filter is a function that purifies an untrusted variable and makes it trustworthy. Using taint analysis we can check the following two properties in a program.

• Integrity. The integrity property of a program concerns whether untrusted values (i.e., values generated from sources) can reach and modify trusted placeholders (i.e., sinks).

• Confidentiality. One may also be interested in the dual property of integrity, namely confidentiality (i.e., whether values generated from sensitive sources can reach untrusted sinks).

Formally, we can define taint analysis as a deterministic finite automaton (DFA) that can be represented by the tuple $(Q, \Sigma, \delta, q_0, F)$, where $Q = \{q_0, q_1, q_2, q_3\}$, $\Sigma = \{source(), propagator(), filter(), sink()\}$, $\delta$ is presented in Table 4.2, the initial state is $q_0$, and $F = \{q_3\}$. Figure 4.3 shows the finite state machine (FSM) representation. In the next section, we discuss how this DFA based language can be used to express cryptographic properties at the meta level so that taint analysis can be used to detect their violations.
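The tuple above translates directly into a table-driven recognizer. The following C fragment is a minimal sketch that encodes the transition function exactly as printed in Table 4.2 and treats a missing entry (–) as rejection; the identifier names are hypothetical and this is not TAINTCRYPT's implementation. A trace is a sequence of input symbols observed along one program path, and the trace is flagged when the automaton ends in the accepting state q3.

```c
#include <stddef.h>

enum { Q0, Q1, Q2, Q3, NUM_STATES };
enum taint_symbol { SOURCE, PROPAGATOR, FILTER, SINK, NUM_SYMBOLS };

#define NONE (-1) /* no transition defined ("–" in Table 4.2) */

/* Transition function delta, one row per state, one column per input symbol. */
static const int delta[NUM_STATES][NUM_SYMBOLS] = {
    /*          source  propagator  filter  sink */
    /* q0 */ {  Q1,     NONE,       NONE,   NONE },
    /* q1 */ {  NONE,   NONE,       Q2,     Q3   },
    /* q2 */ {  NONE,   Q1,         NONE,   NONE },
    /* q3 */ {  NONE,   NONE,       NONE,   NONE },
};

/* Returns 1 if the trace drives the DFA from q0 into the accepting state q3,
 * and 0 if it gets stuck on an undefined transition or ends elsewhere. */
int taint_dfa_accepts(const enum taint_symbol *trace, size_t n)
{
    int state = Q0;
    for (size_t i = 0; i < n; i++) {
        state = delta[state][trace[i]];
        if (state == NONE) {
            return 0;
        }
    }
    return state == Q3;
}
```

Under this encoding the trace source(), sink() is accepted, that is, reported as a violation, whereas source(), filter(), sink() is rejected because the filtered value in q2 has no sink() transition.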

Next, we discuss the mapping of cryptographic properties with taint analysis properties and prove that taint analysis can be used effectively to detect any violations of these cryptographic properties.

Use of insecure primitives.

Generally, cryptographic libraries provide a high-level interface to support a wide range of cryptographic functions (e.g., hashing, symmetric ciphers, asymmetric ciphers, and signatures), so that the coding style remains consistent regardless of the underlying algorithm or mode. Most of them provide a set of convenient functions to create the specification of a crypto primitive (e.g., MD5) that can be used to initialize a certain type of cryptographic operation (e.g., digest).

(a) Finite state machine (FSM): q0 --source()--> q1 --sink()--> q3

(b) Transition function table (δ):

State | source() | sink()
q0    | q1       | –
q1    | –        | q3
q3    | –        | –

Figure 4.4: Finite state machine (FSM) and transition function table (δ) to detect insecure cryptographic primitives.

To identify the use of insecure cryptographic primitives (Rules 1, 6, 10, and 23 in Table 4.1), one needs to identify that an insecure cryptographic primitive is used to initialize a cryptographic operation. If we define the creation of insecure cryptographic primitives as source()s and the initialization of cryptographic operations as sink()s, then the detection of such cases can be represented by a DFA tuple (Q, Σ, δ, q0, F), where Q = {q0, q1, q3}, Σ = {source(), sink()}, δ is presented in Figure 4.4(b), q0 is the initial state, and F = {q3}. The finite state machine is presented in Figure 4.4(a). If we discard the input symbols {propagator(), filter()} from the FSM of Figure 4.3, the FSMs in Figure 4.3 and Figure 4.4(a) become equivalent. This means that taint analysis can detect all such uses of insecure crypto primitives.
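The equivalence argument can be checked mechanically. The sketch below assumes the full transition function from Table 4.2, drops every transition on a discarded input symbol, and then keeps only the states still reachable from q0; the helper name restrict is hypothetical.

```python
# Full transition function, transcribed from Table 4.2.
FULL_DELTA = {
    ("q0", "source"): "q1",
    ("q1", "filter"): "q2",
    ("q1", "sink"): "q3",
    ("q2", "propagator"): "q1",
}


def restrict(delta, alphabet, start="q0"):
    """Restrict a DFA to a smaller input alphabet and prune unreachable states."""
    # Keep only transitions whose input symbol survives the restriction.
    kept = {(s, a): t for (s, a), t in delta.items() if a in alphabet}
    # Collect the states that remain reachable from the start state.
    reachable, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for (src, _symbol), dst in kept.items():
            if src == state and dst not in reachable:
                reachable.add(dst)
                frontier.append(dst)
    reduced = {key: dst for key, dst in kept.items() if key[0] in reachable}
    return reduced, reachable


reduced_delta, reduced_states = restrict(FULL_DELTA, {"source", "sink"})
```

With the alphabet restricted to source() and sink(), the remaining states are q0, q1, and q3, and the remaining transitions are those listed in Figure 4.4(b).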

[Figure 4.5 diagram: states q0, q1, q2, and q3, with transitions labeled source(), filter(), and sink().]

Figure 4.5: Finite state machine (FSM) to detect unsanitized inputs from external sources.

Filtering data from external sources.

Data from external sources should be filtered before use (Rules 15, 16, 17, 18, and 22 of Table 4.1). If we define external sources as source(), functions that are sensitive to any external data as sink(), and any sanitizing or filtering function as filter(), then the FSM in Figure 4.5 can be used to detect any flow from external sources to a sensitive sink that avoids the filters. Discarding the input symbol propagator() from the FSM of Figure 4.3 yields the FSM in Figure 4.5. Thus, taint analysis can detect all such violations.

Mappings for other rules can be deduced similarly.

[Figure 4.6 diagrams: (a) and (b) each show a DFA with states q0, q1, q2, and q3.]

Figure 4.6: DFA to detect (a) early termination in a loop and (b) non-generic error messages by traversing program dependence graphs. The inputs of these DFAs are the nodes of the graph. A dedicated symbol represents the generic triggering node; for example, EVP_DecryptFinal_ex can serve as a trigger for rule 4. We labeled the inputs for only the state-changing transitions.

Detection with other techniques.

It is possible to identify an early termination or non-generic error message using program dependence graph analysis (Rules 3, 4, and 7). The technique to detect such cases can also be expressed in a deterministic finite automaton (DFA) based language, as shown in Figure 4.6. A similar approach can be used to detect cyclic function calls on untrusted inputs (Rule 24). Type truncation (Rule 13), dangling pointers (Rule 14), and memory leaks (Rule 21) can be detected using forward data flow analysis. The use of constant keys can be detected using backward data flow analysis (Rule 20). Once the technique is fixed, expressing these rules in the DFA-based language is relatively straightforward.

4.4.2 System Overview

[Figure 4.7 diagram: a cryptographic program and a user configuration enter TaintCrypt, which consists of Clang preprocessing, symbolic execution, and taint checking, and which produces a security report.]

Figure 4.7: Overview of the process flow of our analysis approach TAINTCRYPT, including its input (cryptographic programs), output (security report), and three key technical components in between (Clang preprocessing, symbolic execution, and taint checking).

* https://github.com/franchiotta/taintchecker

In Section 4.4.1, we saw that most of the rules (15 out of 23) can be enforced using taint analysis. Hence, to demonstrate the effectiveness of our methodology, we chose to build a static taint analysis based system (named TAINTCRYPT) that can be used to automatically enforce all these 15 rules. TAINTCRYPT is built as a checker on top of the Clang static analyzer [74]. Figure 4.7 gives an overview of the process flow of TAINTCRYPT: it takes the cryptographic program under checking as input and outputs a security report that lists the cryptographic vulnerabilities detected in the program.

Specifically, TAINTCRYPT analyzes the input program in three key steps corresponding to the three technical components shown in the figure:

• Clang preprocessing, which transforms the given program written in C to its control flow graph (CFG).

• Symbolic execution, which explores the program symbolically and produces symbolic values for program states on the CFG. The execution is path-sensitive, and every possible path through the program is explored. The explored execution traces are represented with an ExplodedGraph object. Each node of the graph is an ExplodedNode, which consists of a ProgramPoint and a ProgramState.

• Taint checking, which performs the static information flow analysis on the ExplodedGraph of a given program to identify cryptographic vulnerabilities.

Figure 4.8: An example of TAINTCRYPT detecting the use of vulnerable functionality (MD5) in OpenSSL, which violates the security rule against using broken hash functions (Row 23 of Table 4.1). In this example, our analysis correctly identified the violation by reporting the invocation of a vulnerable hash function, EVP_md5().

To accommodate varied application scenarios, TAINTCRYPT reads a configuration file where users can specify taint sources, sinks, propagators, and filters as functions used by the taint checking module (Figure 4.7). Note that TAINTCRYPT has some built-in taint propagation rules. For example, (1) a variable becomes tainted if a tainted value is assigned to it, (2) if the input of a built-in value-transforming function (e.g., atoi, atol, gets, toupper, tolower) is tainted, then its return value is also marked as tainted, and (3) if the input to a memory-copying function (e.g., memcpy, strcpy) is tainted, then its return value is also marked as tainted.
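The three built-in propagation rules can be illustrated on a toy program representation. This is a minimal sketch, assuming a straight-line program given as (target, function, arguments) triples; the representation, the propagate helper, and the source name read_input are hypothetical and do not reflect how TAINTCRYPT or the Clang static analyzer model programs.

```python
# Function lists taken from the examples in the text.
TRANSFORMS = {"atoi", "atol", "gets", "toupper", "tolower"}  # rule (2)
MEM_COPIES = {"memcpy", "strcpy"}                            # rule (3)


def propagate(statements, sources):
    """Return the set of tainted variables after a straight-line program.

    Each statement is (target, function_or_None, argument_names);
    function None denotes a plain assignment.
    """
    tainted = set()
    for target, func, args in statements:
        has_tainted_arg = any(arg in tainted for arg in args)
        if func in sources:
            tainted.add(target)                      # configured taint source
        elif func is None and has_tainted_arg:
            tainted.add(target)                      # rule (1): assignment
        elif func in TRANSFORMS and has_tainted_arg:
            tainted.add(target)                      # rule (2): value transform
        elif func in MEM_COPIES and has_tainted_arg:
            tainted.add(target)                      # rule (3): memory copy
    return tainted


program = [
    ("raw", "read_input", []),          # hypothetical source function
    ("copy", None, ["raw"]),            # copy = raw
    ("n", "atoi", ["copy"]),            # n = atoi(copy)
    ("dst", "memcpy", ["buf", "copy"]), # dst = memcpy(buf, copy, ...)
    ("k", "atoi", ["clean"]),           # untainted input stays untainted
]
tainted_vars = propagate(program, sources={"read_input"})
```

In this sketch, raw, copy, n, and dst end up tainted, while k does not, because its only argument was never tainted.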

4.5 Evaluation

We evaluate TAINTCRYPT by conducting a controlled experiment on known cryptographic vulnerabilities. Specifically, our evaluation answers the following questions.

• Can TAINTCRYPT detect known cryptographic vulnerabilities in popular libraries and tools? (Section 4.5.1)

• In which scenarios can TAINTCRYPT produce false positives? (Section 4.5.2)

Figure 4.9: An example of TAINTCRYPT detecting the memory disclosure vulnerability in OpenSSL-1.0.1f, which violates rule 22 (Row 22 of Table 4.1). Here, the use of external data in the variable payload without proper sanitization causes the disclosure of memory of an arbitrary size.

Table 4.3: Overview of TAINTCRYPT evaluation.

Property | Rule | Software | Version | # Violations | Similar Rules
Deprecated function invocation | (23) Insecure Hash | OpenSSL | 1.0.1f | 7 | (1) ECB mode; (6) Insecure block ciphers; (10) Insecure PRNG
Mandatory function invocation | (11) Random nonce | OpenSSL | 1.0.1f | 1 | (2) Random IV; (9) Non-predictable PRNG seed; (20) Non-predictable keys
Untrustworthy inputs | (22) Memory disclosure | OpenSSL | 1.0.1f | 7 | (15) Integer overflow; (16) Buffer overflow; (17) Checking return values; (18) Divide-by-zero
Unwanted call sequence | (12) Double free() | OpenSSL | 1.1.0-stable | 1 | –
Sensitive data leak | (19) Leak of PRNG seeds | ScreenOS | 6.2.0r1 | 1 | –

4.5.1 Controlled Experiments

The purpose of our evaluation is to demonstrate how TAINTCRYPT can be used effectively to enforce the 15 security rules that are enforceable through static taint analysis. In Table 4.3, we show the overview of our controlled experimental evaluation.

Use of Insecure Primitives

The enforcement of the security rules in Rows 1, 5, 6, 10, and 23 of Table 4.1 requires programmers to avoid or deprecate insecure cryptographic functionalities. For these cases, the user of TAINTCRYPT can specify the instantiation of insecure crypto primitives (e.g., EVP_md5) as sources and the initialization of any cryptographic operations (e.g., EVP_DigestInit_ex) as sinks, and run the tool with this source/sink configuration. If there exists any information flow path between one of the listed sources and one of the specified sinks in the given code, TAINTCRYPT will identify and report it. As discussed in Section 4.4.1, if TAINTCRYPT reports at least one such path, the requirement of deprecating the specified vulnerable functions is violated.

Figure 4.8 illustrates our technique detecting an instance of violation of the security rule concerning uses of broken hash functions. We show with the example how the use of MD5 is detected and warned in OpenSSL².

Filtering Data From External Sources

Using data from external sources may be unavoidable, yet for security, such data should be filtered or sanitized before use. To enforce these types of security rules (in Rows 15, 16, 17, 18, and 22 of Table 4.1), the user of TAINTCRYPT would specify three types of functions in its configuration: (1) untrusted data sources (source functions), (2) their relevant sinks (sink functions), and, most importantly, (3) data filters or sanitizers (filter functions). If there exists any path from any source to any sink that bypasses all the filters, TAINTCRYPT reports an instance of violation of the rules.

As an example, Figure 4.9 illustrates the violation of rule 22 due to the Heartbleed memory disclosure vulnerability in OpenSSL-1.0.1f. The source (function n2s) produces untrusted input data in the variable payload at Line 1464, which potentially reaches the invocation of the sink (function memcpy) at Line 1487 and hence leads to memory disclosure. In this case, no sanitizer is found on any potential flow from the source to the sink; thus, TAINTCRYPT will report that rule 22 is violated.
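The "path that bypasses all the filters" criterion can be sketched as a search over a small flow graph. The example below is a simplified illustration, assuming the program's information flow is already available as an adjacency map between function names; n2s and memcpy are taken from the example above, while bounds_check is a hypothetical sanitizer and unfiltered_paths is a hypothetical helper.

```python
def unfiltered_paths(flow, source, sink, filters):
    """Enumerate acyclic source-to-sink paths that pass through no filter."""
    found = []

    def walk(node, path):
        # Stop at filters (the flow is sanitized) and at already-visited nodes.
        if node in filters or node in path:
            return
        path = path + [node]
        if node == sink:
            found.append(path)
            return
        for successor in flow.get(node, []):
            walk(successor, path)

    walk(source, [])
    return found


# Vulnerable flow: the source reaches the sink both directly and via a check.
vulnerable_flow = {
    "n2s": ["bounds_check", "memcpy"],
    "bounds_check": ["memcpy"],
}
# Hypothetical patched flow: every route to the sink goes through the check.
patched_flow = {
    "n2s": ["bounds_check"],
    "bounds_check": ["memcpy"],
}

vulnerable = unfiltered_paths(vulnerable_flow, "n2s", "memcpy", {"bounds_check"})
patched = unfiltered_paths(patched_flow, "n2s", "memcpy", {"bounds_check"})
```

Only the direct n2s to memcpy route is reported for the vulnerable flow, and nothing is reported once every route passes through the assumed filter.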

Ensuring Certain Function Invocations

To enforce the security rules 2, 9, and 11 (in corresponding rows of Table 4.1), we need to ensure the invocation of certain functions. To do that, the TAINTCRYPT user may specify these functions as filters as part of the configuration and then run the tool. This way, our analysis will report any dangerous path from a source to a sink that avoids these filters. If at least one such path is reported, the analysis indicates that the requirement of invoking the specified functions is violated.

² This vulnerability existed before commit f8547f62.

[Figure 4.10 panels: (a) The source call_dummy_source_client_random produces untrusted input s->s3->client_random. (b) WPACKET_memcpy is a potential sink and ssl_fill_hello_random a potential filter.]

Figure 4.10: An example of TAINTCRYPT facilitating the enforcement of security rule 11, which concerns secure random number generation in OpenSSL. In this case, as there exists a path from the source (a) to the sink that avoids the filter ssl_fill_hello_random (b), TAINTCRYPT will generate a warning reporting that the rule is violated.

To illustrate the ability of our analysis in detecting such violations, Figure 4.10 shows that the function call_dummy_source_client_random, as a source, produces an untrusted input³ held in the variable s->s3->client_random. This untrusted data would flow to the sink WPACKET_memcpy if the function ssl_fill_hello_random, as a potential filter, is not invoked. In this example, at least one path from the source to the sink exists that bypasses the filter. Since the existence of such a path is unexpected (with respect to the requirement that the filter should be invoked), TAINTCRYPT will produce a warning indicating the security violation.

Closer inspection of this example reveals that the reported path actually reuses previously generated values of the s->s3->client_random variable and thus cannot be considered a security violation. This indicates that TAINTCRYPT might produce false positives when a violation of the defined cryptographic property does not directly translate to a security violation.

To whitelist such special value propagations, one should mark them as filter(s).

³ The present TAINTCRYPT prototype only accepts functions as sources. To accommodate the cases in which variables act immediately as taint sources, we use a dummy call that takes the variable as an argument to adapt to the current capabilities of TAINTCRYPT.

[Figure 4.11 panels: (a) Sensitive source. (b) Untrusted sink.]

Figure 4.11: An example illustrating the ability of our analysis in detecting and reporting an instance of data leak in Juniper Networks' ScreenOS. In this case, the sensitive data source in variable prng_seed, as the first 8 bytes of variable prng_temporary, reaches the sink print_number, which violates our security rule 19.

Preventing Data Leaks

Our analysis can also be used to detect violations of security rules against data leaks (in particular, rule 19 in Table 4.1). To do so, the user would indicate in the configuration of TAINTCRYPT the sensitive data producers as sensitive sources and potential mole functions (e.g., functions writing data to the filesystem or network) as untrusted sinks. The information flow analysis with respect to this configuration will detect and report any path from one of these sources to one of the sinks. In [100], the authors showed that Juniper Networks' ScreenOS was leaking seeds due to programming errors.

Figure 4.11 illustrates how our analysis with TAINTCRYPT can be employed to detect violations of rule 19. In this example, the data subject to potential leakage is a PRNG seed held in the variable prng_seed. The data is first tainted at Line 67 and later leaked at Line 98 via the sink print_number through five major steps highlighted in light yellow.

Figure 4.12: An example case of our technique being used to detect a double-free vulnerability in OpenSSL, which constitutes an instance of violation of our security rule 12. Here TAINTCRYPT reports the double-free incident with variable parms.

Avoiding Double-Free Vulnerabilities

Our technique can also accommodate the need for enforcing security rule 12, which prevents double-free vulnerabilities in the given program. Specifically, the user of TAINTCRYPT may specify deallocation functions (e.g., the free function in C programs) as both sources and sinks, and then track the taint flow from the sources to the sinks. If a variable is passed as an argument to a deallocation function acting as a source, it gets tainted. If the taint then propagates to a subsequent invocation of a deallocation function acting as a sink (i.e., the variable is freed a second time), an instance of violation of rule 12 will be reported.
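The idea of using a deallocation function as both source and sink can be sketched over a linear trace of operations. This is an illustrative simplification, assuming a single execution trace of (line, operation, arguments) triples and assuming that reassigning a variable to an untainted value clears its taint; the trace format and find_double_frees are hypothetical, while the variable parms and Lines 74 and 96 follow the Figure 4.12 example.

```python
def find_double_frees(trace, dealloc=("free",)):
    """Report (line, variable) pairs where an already-freed variable is freed again."""
    tainted = set()  # variables already passed to a deallocation function
    reports = []
    for lineno, op, args in trace:
        if op == "assign":
            target, value = args
            if value in tainted:
                tainted.add(target)      # taint follows the assignment
            else:
                tainted.discard(target)  # assumption: a fresh value clears taint
        elif op in dealloc:
            (var,) = args
            if var in tainted:           # deallocation acting as a sink
                reports.append((lineno, var))
            tainted.add(var)             # deallocation acting as a source
    return reports


# The variable is freed at Line 74 and again at Line 96.
double_free = find_double_frees([
    (74, "free", ("parms",)),
    (96, "free", ("parms",)),
])
# Benign trace: the pointer is reassigned to a new block between the two frees.
benign = find_double_frees([
    (10, "free", ("p",)),
    (11, "assign", ("p", "new_block")),
    (12, "free", ("p",)),
])
```

The first trace yields one report at Line 96, and the second yields none under the stated assumption about reassignment.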

In Figure 4.12, the example illustrates the use scenario of TAINTCRYPT in which our analysis is applied to detect double-free vulnerabilities in OpenSSL⁴. In this example, the violation of rule 12 due to the variable parms being double-freed is detected by tracking information flow via this variable from the first deallocation site at Line 74 to the second at Line 96.

4.5.2 Limitations

Since our analysis aims to capture sufficient conditions on meta-level properties, it has the potential to generate false positives. However, statically capturing the necessary conditions to prove cryptographic vulnerabilities is still an open problem.

Also, static code analysis trades precision for soundness and scalability, in general. Symbolic execution based path-sensitive analysis takes exponential time [96]. Therefore, for scalability, the loop unrolling mechanism of the SMT solver used to model symbolic execution in the Clang static analyzer is bounded by a constant. Thus, like any other static analysis-based approach, our technique suffers from imprecision as well.
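To make the cost of path sensitivity concrete, the following sketch counts paths in a toy setting. It assumes a program whose only control flow is a sequence of independent two-way branches, which is an illustrative simplification and not how the Clang static analyzer represents programs; enumerate_paths is a hypothetical helper.

```python
from itertools import product


def enumerate_paths(num_branches):
    """List every path as one taken/not-taken choice per branch."""
    return list(product((True, False), repeat=num_branches))


# The number of paths doubles with each additional independent branch.
path_counts = {n: len(enumerate_paths(n)) for n in (1, 2, 10)}
```

Under this assumption, n branches give 2^n paths, so a loop body containing one branch contributes 2^k paths when unrolled k times, which is why a constant bound on unrolling keeps the exploration tractable at the cost of precision.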

Currently, our analysis only accepts functions as sources, sinks, filters, and propagators. As a result, in real-world cryptographic software where constraints are commonly used as filters (e.g., using predicates to screen untrustworthy inputs), our analysis can produce additional false positives. On the other hand, the comprehensiveness of these four lists of functions in the configuration directly affects the soundness of our technique: missing some of these functions would lead to false negatives. For some of the rules, these configurations are standard across different code bases (e.g., Rules 1, 5, 6, 10, 12, 23), while for other rules developers need to specify these configurations.

Another limitation of TAINTCRYPT lies in its current implementation. TAINTCRYPT is built on the static taint checker in Clang, which by default does not support analysis across translation units. Thus, currently, our tool does not track taint propagation out of a translation unit. If a taint source and a taint sink are located in different translation units, TAINTCRYPT would not be able to detect the security violation even when there is actually an information flow path from the source to the sink. Note, however, that this is an implementation limitation rather than a limitation of our technique itself.

⁴ This vulnerability existed until commit a34ac5b8.

4.6 Summary

While recent years have seen great advances in cryptography research, the security of cryptographic software implementations has received little attention. However, as in other application domains, even a small programming error in these implementations can lead to dangerous security vulnerabilities that have a severe and broad impact on end-user devices and services. In this chapter, we aim to fill this gap by investigating real-world security threats in cryptographic code. The result of this study is a categorization of 25 different types of cryptographic security vulnerabilities, along with associated defending rules that are practically enforceable. We show that 23 out of 25 rules are enforceable using static analysis techniques. To help developers enforce these rules in their cryptographic coding practice, we have further developed an information flow analysis technique, TAINTCRYPT, and implemented a prototype for C programs. We have demonstrated with a controlled evaluation how our technique can be applied to varied use scenarios for identifying violations of 15 of our security rules, thus helping developers avoid the corresponding vulnerabilities.

As future work, we plan to make TAINTCRYPT capable of detecting vulnerable cryptographic information flows across multiple translation units, with respect to the LLVM framework and Clang frontend on which our tool is built. Another part of future work is to improve the efficiency of TAINTCRYPT configuration by automatically discovering comprehensive lists of sources and sinks. Finally, supporting non-function sources and sinks would make our technique applicable in a broader application scope.

Chapter 5

Security in Payment Card Industry

5.1 Introduction

Payment systems are critical targets that attract financially driven attacks. Major card brands (including Visa, MasterCard, American Express, Discover, and JCB) formed an alliance named the Payment Card Industry Security Standards Council (PCI SSC) to standardize the security requirements of the ecosystem at a global scale. The PCI Security Standards Council maintains, updates, and promotes the Data Security Standard (DSS) [51], which defines a comprehensive set of security requirements for payment systems. PCI DSS certification has established itself as a global trademark for secure payment systems. According to PCI DSS [51],

“PCI DSS applies to all entities involved in payment card processing – including merchants, processors, acquirers, issuers, and service providers. PCI DSS also applies to all other entities that store, process, or transmit cardholder data and/or sensitive authentication data.”

The PCI Security Standards Council plays a major role in evaluating the security and compliance status of the payment card industry participants and supervises a set of entities that are responsible for performing compliance assessments, such as Qualified Security Assessors (QSA) and Approved Scanning Vendors (ASV). All entities in the PCI ecosystem, including merchants, issuers, and acquirers, need to comply with the standards. PCI standards specify that entities need to obtain their compliance reports from the PCI authorized entities (e.g., QSA and ASV) and periodically submit the reports in order to maintain their status. For example, a merchant needs to submit its compliance report to the acquirer bank to keep its business account active within the bank. Similarly, card issuer and acquirer banks need to submit their compliance reports to the payment brands (e.g., Visa, MasterCard, American Express, Discover, and JCB) to maintain their membership status [51].

However, several recent high-profile data breaches [34, 174] have raised concerns about the security of the payment card ecosystem, especially for e-commerce merchants¹. A research report from Gemini Advisory [44] shows that 60 million US payment cards were compromised in 2018 alone. Among the merchants that experienced data breaches, many were known to be compliant with the PCI data security standards (PCI DSS). For example, in 2013, Target leaked the information of 40 million payment cards due to insecure practices within its internal networks [174], even though Target was marked as PCI DSS compliant. These incidents raise important questions about how PCI DSS is enforced in practice.

In this chapter, we ask: how well are the PCI data security standards enforced in practice? Do real-world e-commerce websites live up to the PCI data security standards? These questions have not been experimentally addressed before. We first design and develop testbeds and tools to quantitatively measure the degree of PCI DSS compliance of PCI scanners and e-commerce merchants. PCI scanners are commercial security services that perform external security scans on merchants’ servers and issue certificates to those who pass the scan. By setting up our testbed, i.e., an e-commerce website with configurable vulnerabilities, we empirically measure the capability of PCI scanners and the rigor of the certification process.

Our results show that the detection capabilities of PCI scanners vary significantly, where even PCI-approved scanners fail to report serious vulnerabilities in the testbed. For 5 of the 6 scanners evaluated, the reports are not compliant with the PCI scanning guidelines [41]. All 6 scanners issued certificates to web servers that still have major vulnerabilities (e.g., sending sensitive information over HTTP). Even if major vulnerabilities are detected (e.g., remotely accessible MySQL), which should warrant an “automatic failure” according to the guideline [41], some PCI scanners still proceed with certification regardless.

¹ Merchants that allow online payment card transactions for selling products and services are referred to as “e-commerce merchants”.

Given the weak scanner performance, it is possible that real-world e-commerce websites still have major vulnerabilities. For validation, we build a new lightweight scanning tool and perform empirical measurements on 1,203 real-world e-commerce websites. Note that for independent researchers or third parties, scanning in the PCI context imposes a new technical challenge, namely the non-intrusive low-interaction constraint. The low-interaction constraint, necessary for testing live production sites, makes it difficult to test certain vulnerabilities externally. Traditional penetration testing (pentest) tools are not suitable for testing live websites in production environments. For example, pentest tools such as w3af [6] have brute-force based tests that require intense URL fuzzing (e.g., a prerequisite for SQL injection and XSS) or sending disruptive payloads. The feedback from the PCI Security Council during our disclosure (Section 5.6) also confirmed this challenge.

Our technical contributions and findings are summarized below.

• We design and develop an e-commerce web application testbed called BUGGYCART, where we implant 35 PCI-related vulnerabilities such as server misconfiguration (e.g., SSL/TLS and HTTPS misconfigurations), programming errors (e.g., CSRF, XSS, SQL Injection), and noncompliant practices (e.g., storing plaintext passwords, PAN, and CVV). BUGGYCART allows us to flexibly configure vulnerabilities in the testbed for measuring the capabilities and limitations of PCI scanners.

• Using BUGGYCART, we evaluated 6 PCI scanning services, ranging from more expensive scanners (e.g., $2,995/Year) to low-end scanners (e.g., $250/Year). The results showed an alarming gap between the specifications of the PCI data security standard and its real-world enforcement. For example, most of the scanners choose to certify websites with serious SSL/TLS and server misconfigurations. None of the PCI-approved scanning vendors detect SQL injection, XSS, and CSRF. 5 out of the 6 scanners are not compliant with the ASV scanning guidelines (Section 5.4).

• We further evaluated 4 generic web scanners (not designed for PCI DSS), including two commercial scanners and two open-source academic solutions (w3af [6], ZAP [4]). We examine whether they can detect the web-application vulnerabilities missed by PCI scanners. Unfortunately, most of these vulnerabilities still remain undetected (Section 5.4).

• We conducted empirical measurements to assess the (in)security of real-world e-commerce websites. We carefully designed and built a lightweight vulnerability scanner called PCICHECKERLITE. Our solution to the non-intrusiveness challenge centers on minimizing the number of requests that PCICHECKERLITE issues per test case while maximizing the test case coverage. It also involves a collection of lightweight heuristics that merge multiple security tests into one request. Using PCICHECKERLITE, we evaluated 1,203 e-commerce websites across various business categories. We showed that 94% of the websites have at least one PCI DSS violation, and 86% of them contain violations that should have disqualified them as non-compliant (Section 5.5). Our in-depth accuracy analysis also showed that PCICHECKERLITE’s outputs have fewer false positives than the w3af counterpart (Table 5.6).
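The idea of merging multiple security tests into one request can be sketched as several independent checks that all read the same HTTP response. The example below is a hypothetical illustration only: the three checks, the check_response helper, and the sample headers are assumptions chosen for brevity and are not PCICHECKERLITE's actual test cases or implementation.

```python
def check_response(headers):
    """Run several independent checks against the headers of one HTTP response."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []
    # Check 1 (assumed): HTTPS responses should carry an HSTS header.
    if "strict-transport-security" not in h:
        findings.append("missing HSTS header")
    # Check 2 (assumed): a version number in the Server header leaks information.
    if any(ch.isdigit() for ch in h.get("server", "")):
        findings.append("server version disclosed")
    # Check 3 (assumed): session cookies should carry the Secure flag.
    cookie = h.get("set-cookie", "").lower()
    if cookie and "secure" not in cookie:
        findings.append("cookie without Secure flag")
    return findings


# One response, already fetched with a single request, feeds all three checks.
weak = check_response({
    "Server": "Apache/2.4.1",
    "Set-Cookie": "sid=abc; HttpOnly",
})
hardened = check_response({
    "Strict-Transport-Security": "max-age=31536000",
    "Server": "nginx",
    "Set-Cookie": "sid=abc; Secure",
})
```

Because every check consumes the same response, adding a check in this style increases coverage without issuing another request to the production site.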

Based on our results, we further discuss how various PCI stakeholders, including the PCI council, scanning providers, banks, and merchants, as well as security researchers, can collectively improve the security of the payment card ecosystem (Section 5.6). We open-sourced our BUGGYCART² testbed and PCICHECKERLITE³ on GitHub, which also includes a pre-installed Docker image. We are in the process of sharing BUGGYCART with the PCI security council (Section 5.3).

5.2 Background on PCI and DSS

We start by describing the background for the security practices, workflow, and standards of the current PCI ecosystem that involves banks, store-front and e-commerce vendors, and software providers. Then, we focus on how merchants obtain security certifications and establish trust with the banks. We discuss how the certification process is regulated and executed.

² Available at https://github.com/sazzad114/buggycart

³ Available at https://github.com/sazzad114/pci-checker

[Figure 5.1 diagram labels: Merchant, Payment Gateway, E-Commerce, User, Merchant POS, Acquirer POS, Issuer Bank, Payment Network, Acquirer Bank; interactions are numbered as steps 1 through 10.]

Figure 5.1: Overview of the payment card ecosystem.

5.2.1 Payment Card Ecosystem

The Payment Card Industry (PCI) has established a working system that allows merchants to accept user payment via payment cards, and complete the transactions with the banks in the backend. Figure 5.1 shows the relationships between the key players in the ecosystem, including users, merchants, and banks. The user and the merchant may use different banks. The issuer bank issues payment cards to the user and manages the user’s credit or debit card accounts (step 1). Users use the payment card at various types of merchants (steps 2, 5, and 7). The acquirer bank manages an account for the merchant to receive and route the transaction information (steps 4, 6, and 8). The acquirer bank ensures that funds are deposited into the merchant’s account once the transaction is complete via the payment network (steps 9 and 10). The payment network, also known as the card brands (e.g., Visa, MasterCard), bridges between the acquirer and the issuer banks.

There are different types of merchants. Merchants that run an e-commerce service (i.e., all transactions are made online) usually interact with the acquirer bank via a payment gateway (e.g., Stripe, Square), which eases the payment processing and integration (step 3). Merchants that have a physical storefront use point-of-sale (POS) devices, i.e., payment terminals, to collect and transfer user card information to the acquirer bank. They can use either the acquirer bank’s POS (step 7) or their own POS (step 5). The key difference is that an acquirer POS directly transfers the card information to the bank without storing the information within the merchant. A merchant POS, however, may store the card information.

Due to the fact that e-commerce websites and merchant POSes need to store card information, the merchants need to prove to the bank that they are qualified to securely handle the information processing. The acquirer bank requires these merchants to obtain PCI security certifications in order to maintain accounts with the bank [28]. Next, we introduce the security certification process.

5.2.2 PCI Council and Data Security Standard

The Payment Card Industry Security Standards Council manages a number of specifications to ensure data security across the extremely complex payment ecosystem. Among all the specifications, only the Data Security Standard (DSS) and Card Production and Provisioning (CPP) are required. All the other specifications (shown in Table B1 in Appendix B) are recommended (i.e., optional). CPP is designed to regulate card issuers and manufacturers. The Data Security Standard (DSS) is the most important specification; issuer banks, acquirer banks, and all types of merchants and e-commerce sites, i.e., all systems that process payment cards, are required to comply with it. Our work is focused on DSS compliance.

In the PCI Data Security Standard specifications [51], there are 12 requirements that an organization must follow to protect user payment card data. These requirements cover various aspects ranging from network security to data protection policies, vulnerability management, access control, testing, and personnel management. In total, there are 79 more detailed items under the 12 high-level requirements. We summarize them in Appendix B (Table B3).

DSS applies to all players in the ecosystem, including all merchants and acquirer/issuer banks. Merchants need to prove their compliance to the acquirer bank to open an account for their business. Acquirer and issuer banks need to prove their compliance to the card brands (e.g., Visa, MasterCard) to be eligible for membership.

Table 5.1: PCI Compliance levels and their evaluation criteria.

Compliance Level   Transactions Per Year   Self-report with SAQ   Sec Scans by ASV   Sec Audits by QSA
Level 1            Over 6M                 Quarterly              Quarterly          Required
Level 2            1M – 6M                 Quarterly              Quarterly          Required/Optional
Level 3            20K – 1M                Quarterly              Quarterly          Not Required
Level 4            Less than 20K           Quarterly              Quarterly          Not Required

We use the merchant as an example to illustrate how DSS compliance is assessed. First, the PCI Security Standards Council provides the specifications and self-assessment questionnaires (SAQ) [28]. Merchants self-assess their DSS compliance and attach the questionnaires to their reports. Second, the merchant must pass the security tests and audits from external entities such as Approved Scanning Vendors (ASVs) and Qualified Security Assessors (QSAs). The PCI council approves a list of ASVs and QSAs [43] for the assessment.

Security scanning is conducted by certified scanners (Approved Scanning Vendors or ASVs) on card processing entities. Security scanning is performed remotely without the need for on-site auditing. Not all the requirements can be automatically verified by the remote scanning (see Table B1 in Appendix B). The PCI council provides an ASV scanning guideline [41], which details the responsibilities of the scanners (see Table B2 in Appendix B).

Self-assessment questionnaires (SAQs) allow an organization to self-evaluate its security compliance [53]. In SAQs, all the questions are closed-ended. More SAQ analysis is presented in Section 5.6.

Security audits are carried out by Qualified Security Assessors (QSAs). They require on-site auditing (e.g., checking network and database configurations, examining software patches, and interviewing employees). As security scanning cannot verify all of the DSS properties, on-site audits cover the remaining aspects.

The level of compliance varies for different organizations. The compliance level is usually determined by the number of annual financial transactions handled by the organization. Each acquirer bank (or card brand) has its own program for compliance and validation levels. In Table 5.1, we show the tentative compliance levels that roughly match most of the payment brands [28, 46]. The self-assessment questionnaires (SAQs) and security scanning are required quarterly regardless of the compliance level. Only large organizations that handle over 1 million transactions a year are required to have the on-site audit (by a QSA). The majority of merchants are small businesses (e.g., 85% of merchants all over the world have less than 1 million USD in web sales [35]). Thus, most online merchants rely on ASV scanners and self-reported questionnaires for compliance assessment.
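The level assignment in Table 5.1 can be read as a simple threshold lookup. The following is a minimal sketch in Python, not part of the original study. The function names are hypothetical, and the handling of the exact boundary values (6M, 1M, and 20K transactions) is an assumption, since the table does not say which side of each boundary is inclusive.

```python
def pci_compliance_level(transactions_per_year: int) -> int:
    """Return the tentative compliance level (1-4) following Table 5.1.

    Assumption: exactly 6M and exactly 1M map to Level 2, and exactly 20K
    maps to Level 3. The table does not specify boundary behavior.
    """
    if transactions_per_year < 0:
        raise ValueError("transaction count cannot be negative")
    if transactions_per_year > 6_000_000:
        return 1
    if transactions_per_year >= 1_000_000:
        return 2
    if transactions_per_year >= 20_000:
        return 3
    return 4


def assessment_requirements(level: int) -> dict:
    """Summarize the evaluation criteria that Table 5.1 lists for a level."""
    qsa_audit = {
        1: "Required",
        2: "Required/Optional",
        3: "Not Required",
        4: "Not Required",
    }
    return {
        "saq": "Quarterly",        # self-report, same for every level
        "asv_scan": "Quarterly",   # remote scan, same for every level
        "qsa_audit": qsa_audit[level],
    }


if __name__ == "__main__":
    level = pci_compliance_level(50_000)
    print(level, assessment_requirements(level))
```

The sketch makes the point of the paragraph concrete: for Levels 3 and 4, the only external check is the quarterly ASV scan.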

5.2.3 Our Threat Model and Method Overview

Threat Model. The certification process is designed as an enforcement mechanism for merchants to hold a high security standard to protect user data from external adversaries. If the certification process is not well executed, it would allow merchants with security vulnerabilities to store payment card data and interact with banks. In addition, such security certification may also create a false sense of security for merchants. We primarily focus on the automatic server screening by PCI scanners, given that all merchants need to pass the scanning. We also briefly analyze the Self-Assessment Questionnaires (SAQs). Our analysis does not cover on-site audits, because i) an on-site audit is not required for the vast majority of the merchants and ii) it is impossible to conduct analysis experiments on on-site audits without partnerships with service providers.

Methodology Overview. To systematically measure and compare the rigor of the compliance assessment process, our methodology is to build a semi-functional e-commerce website as a testbed and order commercial PCI scanning services to screen and certify the website. The testbed allows us to easily configure website instances by adding or removing key security vulnerabilities that PCI DSS specifies. We leverage this testbed to perform controlled measurements on the certification process of a number of PCI scanners. In addition to the controlled experiments, we also empirically measure the security compliance of real-world e-commerce websites with a focus on a selected set of DSS requirements. In the following, we describe our detailed measurement methodologies and findings.

5.3 Measurement Methodology

In this section, we describe our measurement methodology for understanding how PCI scanners perform data security standard (DSS) compliance assessment and issue certificates to merchants. The core idea is to build a re-configurable testbed where we can add or remove key security vulnerabilities related to DSS and generate test cases. By ordering PCI scanning services to scan the testbed, we collect incoming network traffic as well as the security compliance reports from the scanning vendors. In the following, we first describe the list of vulnerabilities that our testbed covers, and then introduce the key steps to set up the e-commerce frontend.

5.3.1 Security Test Cases

The testbed contains a total of 35 test cases, where each test case represents a type of security vulnerability. Running a PCI scanner to scan the testbed could reveal vulnerabilities that the scanner can detect, as well as those that the scanner fails to report. We categorize the 35 security test cases i1–i35 into four categories, namely network security, system security, application security, and secure storage. Note that the 29 test cases in the first three categories are within the scope of ASV scanners (i.e., ASV testable cases). The other 6 cases under “secure storage” cannot be remotely verified. We include these cases to illustrate the limits of ASV scanners.

1. Network security (14 test cases). These test cases are related to network security properties, including firewall status (i1), access to critical software from the network (i2–i4), default passwords (i5–i6), the use of HTTP to transmit sensitive data (e.g., customer or admin login information) (i7), and SSL/TLS misconfigurations (i12–i18).

2. System security (7 test cases). These test cases are related to system vulnerabilities, including vulnerable software (i19–i20), server misconfigurations (i29–i32), and HTTP security headers (i33).

3. Web application security (8 test cases). These test cases are related to application-level problems including SQL injections (i21–i22), not following secure password guidelines (i23–i24), the integrity of Javascript code loaded from external sources (i25), revealing crash reports (i26), XSS (i27), and CSRF (i28).

4. Secure storage (6 test cases). Secure storage is impossible to verify through external scans. Thus, DSS does not require PCI scanners to test these properties, such as storing sensitive user information (i8), storing and showing PAN in plaintext (i9–i11), and insecure ways of storing passwords (i34–i35). In PCI DSS, merchants need to fill out the self-assessment questionnaire about how they handle sensitive data internally. We choose to include these vulnerabilities in the testbed to highlight the fundamental limitations of external scans on some important aspects of server security.

Must-fix Vulnerabilities. These test cases are designed following the official ASV scanning guideline [41] and the PCI data security standard (DSS) [51]. Among the 35 cases, 29 are within the scope (responsibility) of ASV scanners and can be remotely tested. After vulnerabilities are detected, website owners are required to fix any vulnerabilities that have a CVSS score ≥ 4.0, and any vulnerabilities that are marked as mandatory in PCI DSS. CVSS (Common Vulnerability Scoring System) measures the severity of a vulnerability (score 0 to 10). The CVSS scores in Table 5.3 are calculated using the CVSSv3.0 calculator [2]. Vulnerabilities that have no CVSS score are marked as “N/A”. If the website owner fails to resolve the “must-fix” vulnerabilities, a scanner should not issue the compliance certification. As shown in Table 5.3, 26 out of the 29 testable cases are required to be fixed. Three cases (cases 3, 4, and 18) are not mandatory to fix. For example, exposing OpenSSH to the Internet (case 3) does not mean immediate danger as long as the access is well protected by strong passwords or SSH keys.
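The must-fix rule described above can be sketched as a small predicate. This is an illustrative Python sketch, not the scanners' actual logic. The function names are hypothetical, and the `mandatory_in_dss` flags used in the usage example are inferred from the “Must Fix?” column of Table 5.3 rather than taken from the DSS text.

```python
from typing import Iterable, Optional, Tuple

CVSS_MUST_FIX_THRESHOLD = 4.0  # threshold stated in the text


def must_fix(cvss_score: Optional[float], mandatory_in_dss: bool) -> bool:
    """A finding must be fixed if DSS marks it mandatory or CVSS >= 4.0.

    `cvss_score` is None when the case has no CVSS score ("N/A").
    """
    if mandatory_in_dss:
        return True
    return cvss_score is not None and cvss_score >= CVSS_MUST_FIX_THRESHOLD


def may_certify(unresolved: Iterable[Tuple[Optional[float], bool]]) -> bool:
    """A scanner should certify only if no unresolved finding is must-fix."""
    return not any(must_fix(score, mandatory) for score, mandatory in unresolved)


if __name__ == "__main__":
    # (CVSS score, mandatory flag); flags are an assumption, see lead-in.
    firewall = (None, True)      # case 1: no CVSS score, must fix
    openssh = (None, False)      # case 3: no CVSS score, not mandatory
    tls10 = (3.7, False)         # case 18: below the threshold
    http_trace = (4.3, False)    # case 31: above the threshold
    print(may_certify([openssh, tls10]))        # only optional findings remain
    print(may_certify([firewall, http_trace]))  # must-fix findings remain
```

Under this rule, a site with only cases 3, 4, and 18 outstanding could still be certified, which matches the three non-mandatory cases named in the text.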

Completeness and Excluded Cases. When building our BuggyCart testbed and the PCICHECKERLITE prototype, we exclude five mandatory ASV scanning cases: i) backdoors or malware, ii) DNS server vulnerabilities, e.g., unrestricted DNS zone transfer, iii) vulnerabilities in mail servers, iv) vulnerabilities in hypervisor and virtualization components, and v) vulnerabilities in wireless access points. Most of them (namely, ii, iii, iv, and v) are not relevant, as they involve servers or devices outside our testbed or an application server. For the first case (i), it is difficult to design a generic network-based test case. We also exclude the non-mandatory cases (shown in the last 4 rows of Table B2 in Appendix B).

Note that ASV testable cases only represent a subset of PCI DSS specifications [51] because some specifications are not remotely verifiable. There are specifications related to organization policies, which are impossible to verify externally, e.g., “restricting physical access to cardholder data” (DSS req. 9), and “documenting the key management process” (DSS req. 3.6). They can only be assessed by onsite audits, which unfortunately are not applicable to the majority of e-commerce websites and small businesses (see Table 5.1). We will discuss this further in Section 5.6.

Our PCICHECKERLITE prototype in Section 5.5 scans 17 test cases in Table 5.5, which are a subset of the 29 externally scannable rules in Table 5.3. When scanning live production websites, we have to eliminate cases that require intrusive operations such as web crawling, URL fuzzing, or port scanning.

5.3.2 Testbed Architecture and Implementations

A key challenge of measuring PCI scanners is to interact with PCI scanners like a real e-commerce website does, in order to obtain reliable results. This requires the testbed to incorporate most (if not all) of the e-commerce functionality to interact with PCI scanners and reflect the scanners’ true performance. For this reason, we choose OpenCart [50] as the base to build our testbed. OpenCart is a popular open source PHP-based e-commerce solution used by real-world merchants to build their websites. This allows us to interact with PCI scanners in a realistic manner to ensure the validity of measurement results.

Testbed Frontend. The frontend of our testbed supports core e-commerce functionality, such as account registration, shopping cart management, checkout, and payment with credit cards. The code of the website (footnote 4) is based on OpenCart. We rewrote the OpenCart system by integrating all 35 security vulnerabilities and test cases. We deployed the website using the Apache HTTP server and a MySQL database. Our testbed automatically spawns a website instance following a pre-defined configuration. We used OpenSSH as the remote access software and Phpmyadmin to remotely manage the MySQL database. We hosted our website on Amazon AWS in a single t2.medium server instance with Ubuntu 16.04. We obtained a valid SSL certificate from Let’s Encrypt [49] to enable HTTPS.

Screenshots of the website are provided in Figures B1 and B2 in Appendix B. We set up the website solely for research experiment purposes. Thus, it does not have a real payment gateway. Instead, we set up a dummy payment gateway that imitates the real gateway Cardconnect [45]. The website forwarded credit card transactions to this dummy payment gateway. The dummy endpoint for Cardconnect is implemented using the flask-restful framework. We modified the /etc/hosts file of our web server to redirect the requests to the dummy endpoint. During our experiments, our server did not receive any real payment transaction requests. We further discuss research ethics in Section 5.3.3.

Implementing Security Test Cases. Next, we describe the implementation details of the 35 security test cases in Table 5.3.

For the network security category, we implement test cases i1 to i3 by changing inbound traffic configurations within the Amazon AWS security group. Test case i4 (administrative access over the Internet) is implemented by changing the phpmyadmin configuration. For test case i5 (default SQL password), we do not set any password for “root” and enable access from any remote host. Test case i6 is implemented by configuring phpmyadmin (no password for user “root”). Test case i7 is set to keep port 80 (HTTP) open without a redirection to port 443 (HTTPS). Test cases i12, i14, i16, and i17 are implemented by using default certificates from Apache. Test cases i13 and i18 are implemented by changing SSLCipherSuite and SSLProtocol of the Apache server. For test case i15, we configure the Apache server to use a valid certificate but with a wrong domain name.

Footnote 4: The URL was www.rwycart.com. We took the site offline after the experiment.

For the system security category, we implement test cases i19–i20 by installing software that is known to be vulnerable. For test case i19, we use OpenSSH 7.2, which is vulnerable to privilege escalation and timing side channel attacks. For test case i20, we used phpmyadmin 4.8.2, which is known to be vulnerable to XSS. We implemented test cases i29 to i33 by changing the configurations of the Apache server. For test case i33 (HTTP security header) in particular, we consider X-Frame-Options, X-XSS-Protection, X-Content-Type-Options, and Strict-Transport-Security.
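A check for test case i33 can be sketched as a comparison between the headers of an HTTP response and the four headers listed above. The following Python sketch is illustrative only. The function name is hypothetical, and the check only tests for the presence of each header, not whether its value is a secure one.

```python
# The four headers the text considers for test case i33.
REQUIRED_SECURITY_HEADERS = (
    "X-Frame-Options",
    "X-XSS-Protection",
    "X-Content-Type-Options",
    "Strict-Transport-Security",
)


def missing_security_headers(response_headers):
    """Return the required headers absent from a response, in list order.

    `response_headers` is any mapping or iterable of header names.
    HTTP header names are case-insensitive, so the comparison is too.
    """
    present = {name.lower() for name in response_headers}
    return [h for h in REQUIRED_SECURITY_HEADERS if h.lower() not in present]


if __name__ == "__main__":
    headers = {"Content-Type": "text/html", "x-frame-options": "DENY"}
    print(missing_security_headers(headers))
```

In practice the header mapping would come from an HTTP client response. Here it is passed in directly so the sketch stays free of network access.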

For the web application security category, we implement test cases i21 to i28 by modifying the OpenCart source code [50]. Regarding secure password guidelines, we disable password retry restrictions for both users and administrators (test case i23) and disable the length checking of passwords (test case i24). For SQL injection, we modify the admin login (test case i21) and customer login (test case i22) code to implement SQL injection vulnerabilities. For admin login, we simply concatenate user inputs without sanitization for the login query. For the customer login, we leave an SQL injection vulnerability at the login form. Given that the user password is stored as unsalted MD5 hashes, we run the login query by concatenating the MD5 hash of the user-provided password, which is known to be vulnerable to SQL injection [12]. For XSS and CSRF, we implant an XSS vulnerability in the page for editing customer profiles, by allowing HTML content in the “first name” field (test case i27). By default, OpenCart does not have any protection against CSRF (test case i28). For test case i26 (displaying errors), we configure OpenCart to reveal crash reports (an insecure practice, which gives away sensitive information). OpenCart by default does not check the integrity of Javascript code loaded from external sources (test case i25).
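The admin login vulnerability (test case i21) comes from concatenating user input into the login query. The following sketch reproduces the pattern in Python with an in-memory SQLite database. It is a stand-in for illustration only: the testbed itself modifies OpenCart's PHP code running on MySQL, and the table layout, credentials, and function names here are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one hypothetical admin account."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE admin (username TEXT, password TEXT)")
    conn.execute("INSERT INTO admin VALUES ('admin', 's3cret')")
    return conn


def login_vulnerable(conn, username: str, password: str) -> bool:
    """Build the query by string concatenation (the vulnerable pattern)."""
    query = (
        "SELECT COUNT(*) FROM admin WHERE username = '" + username
        + "' AND password = '" + password + "'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_parameterized(conn, username: str, password: str) -> bool:
    """Pass user input as bound parameters so it is never parsed as SQL."""
    query = "SELECT COUNT(*) FROM admin WHERE username = ? AND password = ?"
    return conn.execute(query, (username, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    payload = "admin' --"  # closes the string and comments out the password check
    print(login_vulnerable(conn, payload, "wrong"))
    print(login_parameterized(conn, payload, "wrong"))
```

The payload turns the concatenated query into one that never evaluates the password condition. The parameterized variant treats the same payload as an ordinary (non-matching) username.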

For the secure storage category, we modify the Cardconnect extension to store CVV in our database (test case i8) and the full PAN (instead of the last 4 digits) in the database in plaintext (test case i10). We add an option to encrypt PANs before storing, but the encryption key is hardcoded (test case i11). We also update the customers’ order history page to show the unmasked PAN for each transaction (test case i9). Finally, the testbed stores the raw unsalted MD5 hash of passwords for customers (test case i34) and plaintext passwords for admins (test case i35).
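For contrast with test cases i9 and i10, the expected behavior is to keep or display only the last 4 digits of the PAN. The following Python sketch shows one way to mask a PAN under that assumption. The function name is hypothetical, and the card number in the usage example is a widely used test value, not a real account.

```python
def mask_pan(pan: str) -> str:
    """Mask a PAN so that only its last 4 digits remain visible.

    Spaces and dashes are dropped before masking. Assumption: showing the
    last 4 digits is the intended display format, as the text suggests.
    """
    digits = "".join(ch for ch in pan if ch.isdigit())
    if len(digits) < 4:
        raise ValueError("PAN must contain at least 4 digits")
    return "*" * (len(digits) - 4) + digits[-4:]


if __name__ == "__main__":
    print(mask_pan("4111 1111 1111 1111"))
```

Test case i9 corresponds to skipping this step on the order history page, and test case i10 to persisting the unmasked value.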

5.3.3 Research Ethics

We have taken active steps to ensure research ethics for our measurement on PCI scanners (Section 5.4). Given that our testbed is hosted on the public Internet, we aim to prevent real users from accidentally visiting the website (or even putting down credit card information). First, we only put the website online shortly before the scanning experiment. After each scanning, we immediately take down the website. Second, the website domain name is freshly registered. We never advertise the website (other than giving the address to the scanners). Third, we closely monitor the HTTP log of the server. Any requests (e.g., for account registration or payment) that do not originate from the scanners are dropped. Network traffic from PCI scanners is easy to distinguish (based on IP and User-Agent) from real user visits. We did not observe any real user requests or payment transactions during our experiments.

All PCI scanners run automatically without any human involvement from the companies. We order and use the scanning services just like regular customers. We never actively generate traffic to the scanning service, and thus our experiments do not cause any interruptions. Our experiments follow the terms and conditions specified by the scanning vendors, which we carefully examined. We choose to anonymize the PCI scanners’ names since some scanning vendors strictly forbid publishing any benchmark results. We argue that publishing our work with anonymized scanner names is sufficient for the purpose of describing the current security practice in the payment card industry, as the security issues reported are likely industry-wide, not unique to the individual scanners evaluated. In addition, anonymization would help alleviate the bias toward individual scanners and potential legal issues [117].

[Figure 5.2 diagram: a PCI scanner scans the baseline version of the BuggyCart testbed and returns reports; after fixing a minimal set of vulnerabilities to get PCI DSS certified, the scanner scans the certified version.]

Figure 5.2: Illustration of the baseline scanning and the certified version. A PCI scanner iteratively scans the testbed. The initial scan (baseline) is on the original testbed with all 35 vulnerabilities. The certified version is the testbed version where the testbed successfully passes the scanning after we iteratively fix a minimal set of vulnerabilities in the testbed. In Table 5.3, we report the scanning results on both versions of the testbed for each scanner.

In Section 5.5, we also carefully design our experiments when evaluating the compliance of 1,203 websites. The experiment is designed in a way that generates minimal footprints and impact on the servers, in terms of the number of connection requests to the servers. Our client is comparable to a normal client and does not cause any disruption to the servers. For example, we quickly close the connection after finding out whether or not an important port is open. More details are presented in Section 5.5.
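The low-footprint port check mentioned above can be sketched with a single TCP connection that is closed as soon as it succeeds. This Python sketch is an illustration under that reading, not the PCICHECKERLITE implementation. The function name and the default timeout are assumptions.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established.

    The connection is closed immediately on success and no payload is
    sent, so the server sees one short-lived connection per check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Checks the local machine only; 3306 is the MySQL port from Table 5.3.
    print(is_port_open("127.0.0.1", 3306, timeout=1.0))
```

A single connection attempt per port keeps the number of requests per server small, in line with the stated goal of minimal impact.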

5.4 Evaluation of PCI Scanners

Our first set of experiments is focused on evaluating PCI scanners to answer the following research questions. Later in Section 5.5, we will introduce our second set of experiments on measuring the security compliance of real-world e-commerce websites.

• How do various PCI scanners compare in terms of their detection capabilities? (Section 5.4.1)

• What are the security consequences of inadequate scanning and premature certification? (Section 5.4.2)

• How do web scanners (commercial or open-source ones) compare with PCI scanners in terms of detection capabilities? (Section 5.4.3)

We selected 8 U.S.-based PCI DSS scanners as shown in Table 5.2. The selection process is as follows. From the list of approved vendors [43] (footnote 5), we found that 85 operate globally. Out of these 85, we aimed to identify a set of ASVs that appear to be of high quality (e.g., judging from the companies’ reputations and websites) and somewhat affordable (due to our limited funding), while also covering different price ranges. We identified 6 such scanners. For 3 of them, the prices are publicly available. For the other 3 scanners, we emailed them through our rwycart.com email addresses. Two of them (Scanner7 and Scanner8) did not provide their price quotations, which forced us to drop them from our evaluation (due to our organization policies). During our search, we also found that some website owners used non-ASV scanners. Thus, we also included 2 non-ASVs that have good self-reported quality. Non-approved scanners offer commercial PCI scanning services, but are not on the ASV list [43] of the PCI council (footnote 6). Because of the legal constraints imposed by the terms and conditions of scanners, we cannot reveal scanners’ names. Researchers who wish to reproduce or extend our work for scientific purposes without publishing scanner names are welcome to contact the authors.

We conducted experiments successfully with 6 of the scanners (without Scanner7 and Scanner8 for the reason mentioned above). We use the email address ([email protected]) associated with the testbed e-commerce website to register accounts at the scanning vendors. Table 5.2 shows the prices of these 6 vendors. For Scanner2 and Scanner1, we completed our experiments within the trial period (60 days for Scanner2 and 30 days for Scanner1). The trial version and the paid version offer the same features and services.

Footnote 5: As of April 30, 2019, 97 companies are approved by the PCI Council as approved scanning vendors (ASVs) [43].

Footnote 6: To become an ASV, a scanner service needs to pay a fee and go through a testbed-based approval evaluation supervised by the PCI Council.

Table 5.2: Prices of PCI scanners and the actual costs.

PCI Scanners   Price          Spent Amount   PCI SSC Approved?
Scanner1       $2,995/Year    $0 (Trial)     Yes
Scanner2       $2,190/Year    $0 (Trial)     Yes
Scanner3       $67/Month      $335           No
Scanner4       $495/Year      $495           Yes
Scanner5       $250/Year      $250           Yes
Scanner6       $59/Quarter    $118           No
Scanner7       Unknown        N/A            Yes
Scanner8       $350/Year      N/A            Yes
Total          -              $1,198         -

Iterative Test Design. Given a PCI scanner, we carry out the evaluation in two high-level steps shown in Figure 5.2. Every scanner first runs on the same baseline testbed with all the vulnerabilities built in. Then we remove a minimal set of vulnerabilities to get the testbed certified for PCI DSS compliance. The final certified instance of the testbed may be different for different scanners, as high-quality scanners require more vulnerabilities to be fixed, leaving fewer remaining (undetected) vulnerabilities on the testbed.

1. Baseline Test. We spawn a website instance where all 35 vulnerabilities are enabled (29 of them are remotely verifiable). Then we order a PCI scanning service for this testbed. During the scanning, we monitor the incoming network traffic. We obtain the security report from the scanner, once the scanning is complete.

2. Certified Instance Test. After the baseline scanning, we modify the web server instance according to the obtained report. We perform all the fixes required by the PCI scanner and order another round of scanning. The purpose of this round of scanning is to identify the minimal set of vulnerabilities that need to be fixed in order to pass the PCI DSS compliance certification.

Table 5.3: Testbed scanning results. “Baseline” indicates the scanning results on our testbed when all the 35 vulnerabilities are active. “Certified” indicates the scanning results after fixing the minimum number of vulnerabilities in order to be compliant. In the per-scanner columns, each cell gives the severity level (low, medium, or high) according to the scanner, or marks the case as “undetected”, “fixed in the compliant version”, or “fixed as a side-effect of another case” (∗). The “website scanners” represent a separate experiment to determine whether website scanners can help to improve coverage. We ran the website scanners on test cases that were not detected by the PCI ASV scanners. “N/A” means “not testable by a scanner”. “-” means “testable but do not need testing”. The “Must Fix” column shows the vulnerabilities that must be fixed by the e-commerce websites in order to be certified as PCI DSS compliant.

Rq.   No.   Test Case                            Vul. Location   In ASV Scope?   CVSS Score   Must Fix?
1.1   1     Firewall detection                   OS              Y               N/A          Y
1.2   2     Mysql port (3306) detection          OS              Y               N/A          Y
1.2   3     OpenSSH detected                     OS              Y               N/A          N
1.2   4     Remote access to Phpmyadmin          Apache          Y               N/A          N
2.1   5     Default Mysql user/password          Mysql           Y               8.8          Y
2.1   6     Default Phpmyadmin passwords         Apache          Y               8.8          Y
2.3   7     Sensitive information over HTTP      Apache          Y               8.1          Y
3.2   8     Store CVV in DB                      Webapp          N               N/A          N/A
3.3   9     Show unmasked PAN                    Webapp          N               N/A          N/A
3.4   10    Store plaintext PAN                  Webapp          N               N/A          N/A
3.5   11    Hardcoded key for encrypting PAN     Webapp          N               N/A          N/A
4.1   12    Use untrusted selfsigned cert.       Apache          Y               9.8          Y
4.1   13    Insecure block cipher (Sweet32)      Apache          Y               7.5          Y
4.1   14    Expired SSL cert.                    Apache          Y               5.3          Y
4.1   15    Wrong domain names in cert.          Apache          Y               5.3          Y
4.1   16    DH modulus <= 1024 Bits              Apache          Y               5.3          Y
4.1   17    Weak Hashing in SSL cert.            Apache          Y               5.3          Y
4.1   18    TLS 1.0 supported                    Apache          Y               3.7          N
6.1   19    Vulnerable OpenSSH (7.2)             OS              Y               7.8          Y
6.1   20    Vulnerable Phpmyadmin (4.8.3)        Apache          Y               6.5          Y
6.5   21    Sql inject in admin login            Webapp          Y               9.8          Y
6.5   22    Sql inject in customer login         Webapp          Y               9.8          Y
6.5   23    Disable password retry limit         Webapp          Y               5.3          Y
6.5   24    Allow passwords with len <8          Webapp          Y               5.3          Y
6.5   25    Javascript source integrity check    Webapp          Y               9.8          Y
6.5   26    Don’t hide program crashes           Webapp          Y               6.5          Y
6.5   27    Implant XSS                          Webapp          Y               6.1          Y
6.5   28    Implant CSRF                         Webapp          Y               8.8          Y
6.6   29    Extraction of server info.           Apache          Y               5.3          Y
6.6   30    Browsable web directories            Apache          Y               7.5          Y
6.6   31    HTTP TRACE/TRACK enabled             Apache          Y               4.3          Y
6.6   32    phpinfo() statement is enabled       Apache          Y               5.3          Y
6.6   33    Missing security headers in HTTP     Apache          Y               6.1          Y
8.4   34    Store unsalted customer passwords    Webapp          N               N/A          N/A
8.4   35    Store plaintext passwords            Webapp          N               N/A          N/A

[Per-scanner result columns (Baseline and Certified for Scanner2, Scanner5, Scanner4/Scanner1, Scanner6 (not approved), and Scanner3 (not approved); website scanners ZAP, W3af, Scanner2W, and Scanner5W): the cell symbols are not recoverable from the extracted text.]

Scanner                   Baseline: #Vul. Detected (Total detectable: 29)   Certified: #Vul. Remaining (#Vul. detected, but no need to fix)
Scanner2                  21                                                7 (0)
Scanner5                  16                                                15 (3)
Scanner4 / Scanner1       17                                                18 (7)
Scanner6 (not approved)   16                                                20 (7)
Scanner3 (not approved)   7                                                 25 (4)

In summary, we perform the following steps for each scanner: i) implant vulnerabilities under each test case in the testbed, ii) run the PCI scanning, iii) fix all the vulnerabilities that the scanner mandates to fix in order to be PCI DSS compliant, iv) run the scanning again, and v) record the certified version of the testbed.
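The five steps above amount to a scan, fix, and rescan loop. The sketch below simulates that loop in Python for a toy scanner model. It is not a model of any scanner in Table 5.3: the function name is hypothetical, the detected and mandated sets in the usage example are invented for illustration, and side-effect fixes (one fix removing another vulnerability) are not modeled.

```python
def certify(baseline_cases, detected, mandated):
    """Simulate the baseline scan followed by fix-and-rescan rounds.

    baseline_cases: vulnerabilities implanted in the testbed (step i)
    detected:       cases this hypothetical scanner is able to report
    mandated:       cases it requires to be fixed before certifying
    Returns (remaining_vulnerabilities, number_of_scans).
    """
    active = set(baseline_cases)
    detected, mandated = set(detected), set(mandated)
    scans = 0
    while True:
        scans += 1                              # steps ii and iv: run a scan
        to_fix = active & detected & mandated   # findings the scanner insists on
        if not to_fix:
            return active, scans                # step v: certified version
        active -= to_fix                        # step iii: apply required fixes


if __name__ == "__main__":
    # Toy scanner: reports five of the 29 remotely testable cases and
    # mandates fixing three of them. These sets are purely illustrative.
    remaining, scans = certify(range(1, 30), {1, 2, 3, 12, 21}, {1, 12, 21})
    print(len(remaining), scans)
```

The number of vulnerabilities left in the certified version depends on both what a scanner detects and what it mandates, which is the distinction drawn in Section 5.4.1.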

5.4.1 Comparison of Scanner Performance

We found that the security scanning capabilities vary significantly across scanners, in terms of i) the vulnerabilities they can detect and ii) the vulnerabilities they require one to fix in order to pass the certification process. Once passed, the website becomes PCI DSS compliant. The experimental results are presented in Table 5.3.

Scanner2. Scanner2 is the most effective PCI scanner in our evaluation, and successfully detected 21 out of the 29 externally detectable cases. The most important case that Scanner2 missed is the use of the HTTP protocol to transmit sensitive information (test case 7). We fixed 21 vulnerabilities in our testbed to become PCI compliant under Scanner2. Most of the fixes are intuitive, except fixing Javascript source integrity checking (Case 25) and CSRF (Case 28). We added Javascript integrity checking for scripts that are loaded from external sources (Case 25). We used a dynamic instrumentation-based plugin to protect OpenCart against CSRF attacks (Case 28). This plugin instruments code for generating and checking CSRF tokens in OpenCart forms. Sometimes, fixing one vulnerability effectively eliminates another vulnerability that Scanner2 fails to detect. For example, Scanner2 did not detect default usernames and passwords for Phpmyadmin (Case 6); however, this vulnerability no longer exists after we disable the remote access to Phpmyadmin (Case 4).
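One common way to add integrity checking for externally loaded scripts (Case 25) is the browser's Subresource Integrity mechanism, where the script tag carries a hash of the expected file. The text does not name the mechanism used in the testbed, so treating it as Subresource Integrity is an assumption. The Python sketch below computes such a hash; the function names and the URL are hypothetical.

```python
import base64
import hashlib


def sri_hash(script_bytes: bytes) -> str:
    """Return a Subresource Integrity value: 'sha384-' + base64(SHA-384)."""
    digest = hashlib.sha384(script_bytes).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def script_tag(url: str, script_bytes: bytes) -> str:
    """Build a script tag that pins the external script to its hash."""
    return (
        f'<script src="{url}" integrity="{sri_hash(script_bytes)}" '
        'crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    # Hypothetical script content and URL, for illustration only.
    print(script_tag("https://cdn.example/lib.js", b"console.log('hello');"))
```

With such an attribute in place, a browser refuses to run the script if the fetched content no longer matches the pinned hash.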

Scanner5. In the baseline test (i.e., when all the vulnerabilities were in place), Scanner5 detected 16 out of the 29 cases. To obtain a Scanner5 compliant version, we had to fix 13 vulnerabilities. Two of the vulnerabilities (Test cases 5 and 17) are fixed as a side effect of fixing other vulnerabilities (Test cases 2 and 12). Scanner5 failed to report the use of a certificate with the wrong hostname, which is a serious vulnerability exploitable by hackers to launch man-in-the-middle attacks. Scanner5 did not report the use of HTTP to transmit sensitive information (i.e., login and register forms in rwycart). Interestingly, Scanner5 detected the use of HTTP to log on to PhpMyAdmin. In addition, Scanner5 did not report the use of scripts from external sources (Case 25).

Scanner1 and Scanner4. Scanner4 uses Scanner1’s scanning infrastructure for ASV scanning, so we present the experimental results of both scanners under the same column. Scanner1 detects 17 vulnerabilities. However, it only requires fixing 10 of them to be PCI DSS compliant. Some of the high and medium severity vulnerabilities are not required to be fixed, including remotely accessible Mysql (Case 2), certificates with wrong hostnames (Case 15), and missing security headers (Case 33). The vulnerability of weak hashing in SSL/TLS certificates (Case 17) was fixed as a side effect of using a real certificate from Let’s Encrypt (Case 12).

Scanner6 and Scanner3. Scanner6 and Scanner3 are not on the approved scanning vendors (ASVs) list [43] provided by the PCI council. Compared with the approved scanners, they detected fewer vulnerabilities. Scanner6 detected 16 vulnerabilities, whereas Scanner3 detected 7. We fixed 9 of the vulnerabilities for Scanner6 and 3 for Scanner3 in order to be compliant. Both Scanner6 and Scanner3 detected remotely accessible Mysql (Case 2), but did not require us to fix it. Scanner3 missed all the SSL/TLS and certificate related vulnerabilities (Test cases 12-18), while Scanner6 detected most of them. However, Scanner6 did not require us to fix certificates with wrong hostnames (Test case 15). We cannot conclude that unapproved scanners perform worse than approved scanners, due to the small sample size.

A Case Study of False Positives. During our experiment, we found that Scanner2 produced a false positive under the SQL injection test. Scanner2 recently incorporated an experimental module to detect blind SQL injection by sending specially crafted parameters to the web server. If the server returns different responses, then it determines that the server has accepted and processed the parameter (i.e., that it is vulnerable). However, this detection procedure fails on a common e-commerce scenario: supporting multiple currencies. OpenCart allows users to select the currency for a product. If a currency is clicked, it updates the currency of the current page. The server records all the parameters of the current page in a hidden field so that it can recreate the page later (Listing 5.1). Note that Scanner2’s specially crafted parameters are also recorded, which makes Scanner2 believe that the output differs under different values of the parameter. This is actually a false positive. Nevertheless, we fixed it by changing the BUGGYCART code in order to be certified by Scanner2.

Listing 5.1: The difference in the output after injecting a parameter named name with an empty value “” vs. “yy”.
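The failure mode described above can be reproduced with a toy model. The following Python sketch is illustrative only and is not Scanner2's module: `render_page` is a hypothetical stand-in for an OpenCart-style page that echoes every request parameter into a hidden field (an assumed layout), and `naive_differs` mimics a differential check using the two probe values named in Listing 5.1.

```python
from html import escape
from urllib.parse import urlencode


def render_page(params: dict) -> str:
    """Toy page: the visible content never changes, but every request
    parameter is echoed into a hidden field (assumed layout)."""
    echoed = escape(urlencode(params))
    return ('<form><input type="hidden" name="redirect" '
            f'value="{echoed}"></form><p>Product list</p>')


def naive_differs(param: str, a: str, b: str) -> bool:
    """Differential heuristic: different responses are taken to mean
    that the parameter was processed by the server."""
    return render_page({param: a}) != render_page({param: b})


def differs_ignoring_echo(param: str, a: str, b: str) -> bool:
    """Same comparison after blanking the field that reflects the probe."""
    def masked(value: str) -> str:
        params = {param: value}
        reflected = f'value="{escape(urlencode(params))}"'
        return render_page(params).replace(reflected, 'value=""')
    return masked(a) != masked(b)
```

In this model, `naive_differs("name", "", "yy")` reports a difference even though no query is affected by the parameter, whereas the masked comparison reports none.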

Network Traffic Analysis. We collected the incoming network requests from each of the scanners using the access log of our testbed. During the baseline experiment, Scanner5, Scanner6, and Scanner3 sent 23,912, 39,836, and 31,583 requests, respectively, and finished within an hour. Scanner4 and Scanner1 sent 147,038 requests and took more than 3 hours to finish. Scanner2 sent 64,033 requests within 2.5 hours. We received such a high traffic volume because the PCI scanners were attempting to detect vulnerabilities such as XSS and SQL injection, which require intensive URL fuzzing, crawling, and parameter manipulation. This confirms that the PCI scanners at least attempted to detect such vulnerabilities but were unsuccessful.

5.4.2 Impacts of Premature Certification

Some scanners choose to simply report vulnerabilities without marking the e-commerce website as non-compliant. Below, we discuss the security consequences of premature certifications. Some of the incomplete scanning and premature certification issues can be prevented, if the scanners follow the ASV guidelines [41].

Network Security Threats. According to the ASV scanning guideline, SSL/TLS vulnerabilities (Test cases 12–17) should lead to automatic failure of certification, which is clearly necessary due to the man-in-the-middle threats. Only Scanner2 detected all these cases. Scanner3 does not detect any of these SSL/TLS vulnerabilities. In addition, a website should be marked as non-compliant if sensitive information is communicated over HTTP (Test case 7). However, none of the ASV scanners detected this issue in our testbed. This vulnerability can be avoided by configuring the server to automatically redirect all the HTTP traffic to HTTPS. Because none of the 6 scanners detected this vulnerability, it is likely that this HTTP issue exists on real-world e-commerce websites. Our later evaluation on 1,203 websites that process online payment shows 169 of them do not redirect their HTTP traffic to HTTPS (Section 5.5).

Our Test case 2 embeds a database access vulnerability, allowing the database to be accessible from the Internet. All the scanners detected this vulnerability. However, only Scanner2 and Scanner5 mark this issue as an automatic failure (i.e., non-compliant). The other scanners report it as “low/information”, not as a required fix, even though the ASV scanning guideline [41] recommends marking it as non-compliant. Our later evaluation on websites that accept payment card transactions shows that 59 out of 1,203 websites opened the Mysql port (3306) to the Internet (Section 5.5).

System Security Threats. The ASV scanning guideline [41] suggests testing for and reporting vulnerable remote access software. 4 out of the 6 scanners detected vulnerable OpenSSH software (Test case 19). Under Test case 20, only Scanner2 detected vulnerable phpmyadmin, while the others failed. Although all scanners noticed Test case 29 (extracted server information), only Scanner2 required a fix for compliance. The ASV scanning guideline [41] also recommends reporting automatic failure if browsable web directories are found (Test case 30). All scanners detected this vulnerability. Scanner6 detected missing security headers (Test case 33), but did not ask us to fix it, while Scanner3 failed to detect it.

Web Application Threats. The scanners’ performance is particularly weak for this category. Out of the 8 test cases, only 2 were detected by Scanner2 (tampered Javascript and CSRF). None of the cases was detected by other PCI scanners.

5.4.3 Evaluation of Website Scanners

The above results suggest that some web application vulnerabilities are difficult to detect. The follow-up question is, can specialized website scanners detect these vulnerabilities? To answer this question, we ran four website scanners on our BUGGYCART testbed, including two commercial ones (from Scanner2 and Scanner5) and two open source scanners (w3af [6] and ZAP [4]). w3af and ZAP are state-of-the-art open source web scanners that are actively maintained and often used in academic research [107, 108, 168]. The two commercial web scanners are from reputable companies. Scanner2W and Scanner2 are from the same company, which markets the website scanner as a separate product from its PCI scanner. The same holds for Scanner5W and Scanner5.

We conducted the baseline test for the four website scanners. Note that these web scanners do not produce certificates. The results are shown in the last four columns of Table 5.3. Since they are website scanners, we only expect them to cover web application vulnerabilities (Test cases 7, 21, 22, 23, 24, 26, 27). Unfortunately, none of the commercial scanners detected these web application vulnerabilities. w3af reported the use of the HTTP protocol to communicate sensitive information (case 7), but missed the other web application vulnerabilities. ZAP detected the SQL injection vulnerability in the customer login page (case 22), but missed the SQL injection vulnerability in the admin login page (case 21). Notably, ZAP also missed the XSS vulnerability we implanted (case 27).

Summary of Testbed Findings. The detection capabilities vary significantly across scanners. Our experiments show that 5 out of 6 PCI scanners are not compliant with the ASV scanning guidelines [41] because they ignore detected vulnerabilities instead of marking them as “automatic failures”. Most of the common web application vulnerabilities (e.g., SQL injection, XSS, CSRF) are not detected by the 6 PCI scanners (only Scanner2 detected CSRF), despite the requirements of the PCI guidelines. Out of the 4 website scanners, only ZAP detected one of the two SQL injection cases.

Admittedly, black-box detection of vulnerabilities such as XSS and SQL injection is difficult. Typical reasons for missed detection are i) failure to locate the page due to incomplete discovery and/or ii) that detection heuristics are limited or easily bypassed. In our testbed, SQL injection vulnerabilities (21, 22) are placed in the login pages. CSRF vulnerabilities are present in all forms. The scanners we tested used web crawling with URL fuzzing to detect hidden pages, URLs, and functions. Often, we are unable to pinpoint the exact reasons why the tools fail in these cases. Novel detection techniques, such as guided fuzzing [107] and taint tracking [178], have been proposed by the research community. Future work is needed to evaluate their applicability in the specific PCI context.

5.5 Measurement of Compliant Websites

The alarming security deficiencies in how PCI scanners conduct the compliance certification motivate us to ask the following questions: How secure are e-commerce websites? What are the main measurable vulnerabilities in e-commerce websites? As such, we designed another set of real-world experiments where we aim to measure the security of e-commerce websites with respect to the PCI DSS guideline. To do so, we need to address several technical questions.

What Tools to Use? The key enabler of this measurement is a new tool we developed named PCICHECKERLITE. We use basic Linux tools (e.g., nc, openssl) and Java net URL APIs to implement the system. Below, we focus on the key design concepts that allow PCICHECKERLITE to work with live websites.

What Security Properties to Check? A key requirement of this experiment is to make sure that we do not disrupt or negatively impact the websites being tested. Out of the 29 externally verifiable cases in Table 5.3, we choose a subset of 17 cases for this experiment, as shown in Table 5.5.

The sole reason for selecting these cases for PCICHECKERLITE is that we could implement these tests in a non-intrusive manner, leaving a minimum footprint, i.e., having a minimum impact on the servers. We categorize these cases into high, medium, and low severity based on i) the attacker’s gain and ii) the attack difficulty. Cases that are immediately exploitable by any arbitrary attacker to cause large damages are highly severe, for example, the use of default passwords (Test case 5), insecure communications (Test cases 7, 12, 13, 16, 17), vulnerable remote access software (Test case 19), browsable web directories (Test case 30), and supporting the HTTP TRACE method (Test case 31). Cases that substantially benefit any arbitrary attacker but require some effort to exploit are marked as medium severity, e.g., Test cases 2, 14, 15, 25, 29, and 33. For example, scripts loaded from external sources can steal payment card data (Test case 25), but attackers need to craft the malicious scripts [56]. Low-risk issues are marked as low severity (Test cases 3, 18). The categories are consistent with Table 5.3, as high- and medium-severity cases correspond to “must-fix” vulnerabilities. The two low-severity cases are not required to be fixed to be PCI-compliant.
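The relationship between severity and “must-fix” status can be written down compactly. The sketch below is an illustration in Python, not part of PCICHECKERLITE (which is built from Linux tools and Java APIs); the case numbers and severities are copied from Table 5.5, while the names `SEVERITY`, `must_fix`, and `is_compliant` are hypothetical.

```python
# Severity per test case, as listed in Table 5.5.
SEVERITY = {
    2: "medium", 3: "low", 5: "high", 7: "high", 12: "high", 13: "high",
    14: "medium", 15: "medium", 16: "high", 17: "high", 18: "low",
    19: "high", 25: "medium", 29: "medium", 30: "high", 31: "high",
    33: "medium",
}


def must_fix(failed_cases: set) -> set:
    """Return the failed cases that are high or medium severity."""
    return {c for c in failed_cases if SEVERITY[c] in ("high", "medium")}


def is_compliant(failed_cases: set) -> bool:
    """A site with only low-severity findings is still treated as compliant."""
    return not must_fix(failed_cases)
```

For example, a site that only exposes OpenSSH and supports TLSv1.0 (cases 3 and 18) has findings but no must-fix vulnerability under this mapping.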

Implementing PCICHECKERLITE. Our goal is to minimize the number of requests that PCICHECKERLITE issues per test case, while maximizing the test case coverage. It involves a collection of lightweight heuristics that merge multiple tests into a single request. For example, for most of the HTTP-related tests, we reuse a single response from the server. Test cases 25, 29, and 33 are covered and resolved by one single HTTP request to retrieve the main page and analyzing the response header. Test cases 12–18 are covered by one certificate fetching. For case 30 (browsable directories enabled) PCICHECKERLITE conducts a code-guided probe and avoids crawling web pages. It discovers file paths in the code of the landing page and then probes the server with requests for accessing path prefixes. The implementation details are given in Section 5.5.1.

Table 5.4: Number of e-commerce websites that have at least one vulnerability and those that have at least one “must-fix” vulnerability. In total, 1,203 sites are tested including 810 sites chosen from different web categories, and 393 sites chosen from different Alexa ranking ranges.

E-commerce Websites         # of Vulnerable Websites
                            Must-fix Vul.    All Vul.
Category (810)
  Business (122)            106              113
  Shopping (163)            135              143
  Arts (78)                 66               76
  Adults (65)               61               65
  Recreation (84)           70               75
  Computer (57)             53               56
  Games (42)                38               42
  Health (60)               54               55
  Home (102)                82               93
  Kids & Teens (37)         31               36
Ranking (393)
  Top (288)                 235              277
  Bottom (105)              100              104
Total (1,203)               1,031 (86%)      1,135 (94%)

Table 5.5: Testing results on 1,203 real-world websites that accept payment card transactions as of May 3, 2019. We reuse the index numbers of the test cases from Table 5.3.

                                             Category (810)                                              Ranking (393)
Test Cases                          Severity  Biz. Shop.  Arts Adlt. Recr. Comp. Game. Hlth. Home. Kids.   Top  Btm.       Total  Reqs.
                                             (122) (163)  (78)  (65)  (84)  (57)  (42)  (60) (102)  (37) (288) (105)     (1,203)
2. Mysql port (3306) detection      Medium       3     6     4     2     6     2     3     2     4     0     0    27     59 (5%)    1.2
3. OpenSSH available                Low          6    15    11     4    13     6     7     8    12     1     6    27   116 (10%)    2.1
5. Default Mysql user/passwd        High         0     0     0     0     0     0     0     0     0     0     0     0      0 (0%)    2.3
7. Sensitive info over HTTP         High        12    10    12    10    17    10     8     6    10     5    47    22   169 (14%)
12. Selfsigned cert presented       High         0     0     3     0     1     0     0     1     1     0     0     3      9 (1%)
13. Weak Cipher Supported           High         0     0     0     0     0     0     0     0     0     0     0     0      0 (0%)
14. Expired cert presented          Medium       0     0     2     0     2     0     0     1     0     0     0     2      7 (1%)    4.1
15. Wrong hostname in cert          Medium       3     1     3     0     6     2     0     2     4     1     0    10     32 (3%)
16. Insecure Modulus                High         0     0     0     0     0     0     0     0     0     0     0     1    1 (0.1%)
17. Weak hash in cert               High         0     0     0     0     0     0     0     0     0     0     0     0      0 (0%)
18. TLSv1.0 Supported               Low         67    73    53    42    41    40    28    30    67    16   216    71   744 (62%)    6.1
19. OpenSSH vulnerable              High         6    14    11     4    13     6     6     8    11     1     6    26    112 (9%)    6.5
25. Missing script integrity check  Medium      92   109    54    44    44    32    27    42    66    21   154    75   760 (63%)
29. Server Info available           Medium      26    34    17    17    22    15    17    17    25    11    33    22   256 (21%)
30. Browsable Dir Enabled           High         0     0     0     0     0     0     0     0     0     0     0     0      0 (0%)    6.6
31. HTTP TRACE supported            High         6     4     3     3     2     5     2     2     6     0     4     6     43 (4%)
33. Security Headers missing        Medium      18    38     9    12    14    21     9     7    14     7   114    13   276 (23%)

How to Determine Whether a Website is PCI Compliant? It is not easy to directly confirm whether a website is DSS compliant or not, unless the website actively advertises this information. While some cloud and service providers (e.g., Google Cloud [63], Amazon Connect [29], Shopify [62], and Akamai [61]) advertise their PCI compliance status, not all of them disclose such information. However, as e-commerce websites need to show their DSS compliance in order to work with acquirer banks (described in Section 5.2), it is reasonable to assume that most websites we evaluated have successfully passed the external scanning.

Website Selection. We use two different ways to select websites to increase diversity. First, we downloaded 2,000 Alexa top websites under 10 categories (200 websites per category) to observe security compliance differences based on categories. In Table 5.4, we show the category-wise breakdown. Among them, we manually identified 810 websites that make payment card transactions. This step is time-consuming and usually requires manually visiting multiple pages (e.g., one needs to visit multiple pages to get to the payment page on nytimes.com). Second, to cover websites of different popularity levels, we further select the top 500 and bottom 500 websites (1,000 in total) from the Alexa top 1 million website list. We found 288 websites from the top list and 105 websites from the bottom list that accept payment card information (and do not overlap with the previous 810 websites). In total, 1,203 websites that accept payment cards are selected for scanning by PCICHECKERLITE.

5.5.1 Implementation Details of PCICHECKERLITE

PCICHECKERLITE follows a series of rules for vulnerability testing. The index of each rule matches the corresponding test case discussed in this chapter. For ethical reasons, we only focus on the subset of test cases that do not disrupt or negatively impact the remote servers. The implementation details are as follows.

Rule 2. Database port detection. For database port detection, we choose to probe for the Mysql port⁷. The reasons for choosing the Mysql port are that i) Mysql is among the top three (Mysql, Oracle, Microsoft SQL Server) most popular databases in the world [58]; ii) Mysql is free; and iii) it supports a wide range of programming languages. Access to the Mysql port (e.g., 3306) is disabled by default. It is very dangerous to enable remote access to a Mysql database for an arbitrary client. We check the Mysql port using nc [60], which is a Unix utility that reads and writes data across network connections using the TCP or UDP protocol.

⁷ We do not probe for multiple ports to avoid suspicions of possible port scanning. However, a similar technique can be used to probe for other databases.
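A single-port reachability check of this kind amounts to one TCP connection attempt. The sketch below is a Python stand-in for the nc-based probe, not the authors' implementation; the function name `port_open` and the default timeout are assumptions.

```python
import socket


def port_open(host: str, port: int = 3306, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Only one port is contacted and the connection is closed immediately,
    so no data is exchanged with the remote service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

A call such as `port_open("shop.example")` (hypothetical host) would report whether the default Mysql port accepts connections from the caller's network.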

Rule 5. Default Mysql user/password detection. If the Mysql database of a website is remotely accessible, we further check for the default username and password. A typical Mysql installation has a user “root” with an empty password, unless it is otherwise customized or disabled. As such, we run a Mysql client to connect to the remote host using the default username and password. PCICHECKERLITE terminates the connection immediately and raises an alert if the attempt is successful.

Rules 3 & 19. Checking OpenSSH’s availability and version. We use nc [60] to connect to port 22 of the remote OpenSSH server. If OpenSSH runs on port 22, then it will return the server information (e.g., OpenSSH version, OS type, OS version). We parse the returned information to determine the version of the OpenSSH server. We consider any installed version earlier than OpenSSH_7.6 to be vulnerable.
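The version check can be illustrated by parsing the banner string that the server sends. This Python sketch is not the authors' parser; the regular expression and the function names are assumptions, and the 7.6 threshold is the one stated above.

```python
import re


def parse_openssh_version(banner: str):
    """Extract (major, minor) from a banner such as
    'SSH-2.0-OpenSSH_7.2p2'; return None for other SSH servers."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_vulnerable_openssh(banner: str, minimum=(7, 6)) -> bool:
    """Flag OpenSSH versions earlier than the given minimum (7.6 here)."""
    version = parse_openssh_version(banner)
    return version is not None and version < minimum
```

Comparing `(major, minor)` tuples avoids the pitfall of comparing version strings lexically, where "7.10" would sort before "7.6".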

Rules 29 & 33. Checking HTTP header information. Extracting HTTP information does not require rich browser functionality. We use Java net URL APIs to open HTTP connections for extracting HTTP headers. For case 29, we raise a warning only if we detect that the “Server” header contains the server name and version. For case 33, we raise a warning if any of the four security headers (i.e., X-Frame-Options, X-XSS-Protection, Strict-Transport-Security, X-Content-Type-Options) is missing.
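Both header checks can operate on one already-fetched response. The Python sketch below is illustrative (the tool itself uses Java net URL APIs); the pattern used to decide that a “Server” value carries a version, a product token followed by a slash and a digit, is an assumed heuristic rather than the authors' rule.

```python
import re

SECURITY_HEADERS = (
    "X-Frame-Options",
    "X-XSS-Protection",
    "Strict-Transport-Security",
    "X-Content-Type-Options",
)


def missing_security_headers(headers: dict) -> list:
    """Case 33: list the security headers absent from the response."""
    present = {name.lower() for name in headers}
    return [h for h in SECURITY_HEADERS if h.lower() not in present]


def server_header_leaks_version(headers: dict) -> bool:
    """Case 29: warn only if 'Server' holds a name and a version,
    e.g. 'Apache/2.4.29' (assumed pattern: token, slash, digit)."""
    server = next(
        (v for k, v in headers.items() if k.lower() == "server"), "")
    return re.search(r"[A-Za-z][\w.-]*/\d", server) is not None
```

Header names are compared case-insensitively because HTTP header field names are case-insensitive.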

Rule 7. Sensitive information over HTTP. We tested whether all the HTTP traffic is redirected to HTTPS by default. We open an HTTP connection with the server and follow the redirection chain. If the server doesn’t redirect to HTTPS, we raise an alert. We use Java net URL APIs to implement this test case.
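Following a redirection chain by hand makes the decision rule explicit. The sketch below is a Python illustration, not the Java implementation; `head`, `redirects_to_https`, the use of HEAD requests, and the hop limit are all assumptions. The `fetch` parameter is injectable so that the logic can be exercised without network access.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url: str, timeout: float = 5.0):
    """Send one HEAD request without following redirects;
    return (status, Location header or None)."""
    parts = urlsplit(url)
    cls = (http.client.HTTPSConnection if parts.scheme == "https"
           else http.client.HTTPConnection)
    conn = cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirects_to_https(start_url: str, fetch=head, max_hops: int = 5) -> bool:
    """Follow the chain from an http:// URL; True if it reaches https://."""
    url = start_url
    for _ in range(max_hops):
        if urlsplit(url).scheme == "https":
            return True
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return False  # the server answered over plain HTTP
        url = urljoin(url, location)  # resolve relative Location values
    return urlsplit(url).scheme == "https"
```

An alert would be raised when `redirects_to_https("http://" + host + "/")` returns False.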

Rules 18 & 13. TLSv1.0 and weak cipher negotiation. We use OpenSSL’s s_client tool to establish an SSL/TLS connection using the TLSv1.0 protocol. PCICHECKERLITE raises a warning if the connection is successful. We also use s_client to negotiate the ciphersuite with the remote server. PCICHECKERLITE raises a warning if we successfully negotiate a ciphersuite that contains a weak cipher (i.e., IDEA, DES, MD5).
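The two steps can be sketched as a command line for the probe and a classifier for the negotiated ciphersuite name. This Python fragment is an assumption-laden illustration: the exact s_client invocation used by PCICHECKERLITE is not given in the text, `-tls1` may be unavailable in OpenSSL builds that disable TLSv1.0, and treating any name component containing one of the three tokens (including 3DES) as weak is a simplification.

```python
WEAK_TOKENS = ("IDEA", "DES", "MD5")


def tls10_probe_command(host: str, port: int = 443) -> list:
    """Build an `openssl s_client` command that offers only TLSv1.0.
    The caller would run it (e.g., with subprocess) and inspect
    whether the handshake completes."""
    return ["openssl", "s_client", "-connect", f"{host}:{port}",
            "-servername", host, "-tls1"]


def is_weak_ciphersuite(name: str) -> bool:
    """Return True if any component of the suite name contains a weak
    token, e.g. 'DES-CBC3-SHA' or 'RC4-MD5'."""
    parts = name.upper().replace("_", "-").split("-")
    return any(tok in part for part in parts for tok in WEAK_TOKENS)
```

Splitting the name into components before matching keeps the check from depending on where the token appears in OpenSSL-style or IANA-style suite names.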

Rules 12, 14, 15, 16 & 17. Retrieving and examining the certificate. We use OpenSSL’s s_client tool to retrieve the SSL certificate of a remote server. To parse the certificate, we use APIs from the java.security.cert package. To check whether a certificate is self-signed (Case 12), we use the public key of the certificate to verify the certificate itself. To check whether the certificate is expired, we use the checkValidity() method of the X509Certificate API (Case 14). If the subject domain name (DN) or any alternate DN of a certificate doesn’t match the server domain name, then PCICHECKERLITE raises an alert (Case 15). Regarding the public key sizes for the factoring modulus (e.g., RSA, DSA), the discrete logarithm (e.g., Diffie-Hellman), and the elliptic curve (e.g., ECDSA) based algorithms, NIST recommends at least 2048, 224, and 224 bits, respectively [57]. PCICHECKERLITE raises an alert if the key size is smaller than what is recommended (Case 16). If the signing algorithm uses any of the weak hashing algorithms (e.g., MD5, SHA, SHA1, SHA-1), PCICHECKERLITE raises warnings (Case 17).
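Once the certificate fields are extracted, Cases 15 to 17 reduce to simple predicates. The Python sketch below is not the authors' Java code: the thresholds follow the grouping stated above, the signature algorithm names are assumed to be in the Java style (e.g., "SHA256withRSA"), and the hostname rule (alert only when no certificate name matches, with a wildcard covering exactly one left-most label) is an assumed reading of Case 15.

```python
# Minimum key sizes in bits, grouped as in the text above.
MIN_KEY_BITS = {"RSA": 2048, "DSA": 2048, "DH": 224, "EC": 224}
WEAK_DIGESTS = ("MD5", "SHA", "SHA1", "SHA-1")


def key_too_small(algorithm: str, bits: int) -> bool:
    """Case 16: key shorter than the recommended minimum."""
    return bits < MIN_KEY_BITS[algorithm.upper()]


def uses_weak_hash(sig_alg: str) -> bool:
    """Case 17: digest part of a name like 'SHA1withRSA' is weak."""
    digest = sig_alg.upper().split("WITH")[0]
    return digest in WEAK_DIGESTS


def name_matches(cert_name: str, host: str) -> bool:
    cert_name, host = cert_name.lower(), host.lower()
    if cert_name.startswith("*."):
        # The wildcard stands for exactly one left-most label.
        return "." in host and host.split(".", 1)[1] == cert_name[2:]
    return cert_name == host


def hostname_mismatch(cert_names: list, host: str) -> bool:
    """Case 15 (assumed reading): no subject or alternate name matches."""
    return not any(name_matches(n, host) for n in cert_names)
```

Under this reading, a certificate issued for `*.example.com` is accepted for `shop.example.com` but not for `example.com` itself.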

Rule 25. Script source integrity check. A website is expected to check the integrity of any JavaScript code that is loaded externally to the browser. To enable the script source integrity check, a server can use the “integrity” attribute of the script tag. In the “integrity” attribute, the server should mention the hashing algorithm and the hash value of the script that should be used to check the integrity. PCICHECKERLITE downloads the index page of a website. After that, it collects all the script tags, and checks if the script tags contain any external URL (excluding the website’s CDN URLs). Then it looks for the integrity attribute for the scripts loaded from external URLs, and raises an alert if the integrity attribute is missing. We only perform this test for the index page (instead of all the pages) of a website to keep the test lightweight. The number of vulnerable websites detected by this test can only be interpreted as a lower bound.
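The tag inspection can be sketched with Python's standard HTML parser. This is an illustration rather than the tool's code; `ScriptCollector`, `external_scripts_without_integrity`, and the `own_hosts` parameter (standing in for the site's own CDN hosts, which the text excludes) are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class ScriptCollector(HTMLParser):
    """Collect (src, has_integrity) for every <script src=...> tag."""

    def __init__(self):
        super().__init__()
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            attr = dict(attrs)
            if attr.get("src"):
                self.scripts.append(
                    (attr["src"], bool(attr.get("integrity"))))


def external_scripts_without_integrity(html, page_url, own_hosts=()):
    """Return the src values of scripts loaded from other hosts
    that carry no integrity attribute."""
    parser = ScriptCollector()
    parser.feed(html)
    trusted = {urlsplit(page_url).hostname, *own_hosts}
    flagged = []
    for src, has_integrity in parser.scripts:
        host = urlsplit(urljoin(page_url, src)).hostname
        if host not in trusted and not has_integrity:
            flagged.append(src)
    return flagged
```

A site would be flagged for Test case 25 when the returned list is non-empty for its index page.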

Rule 30. Checking for browsable directories. We check whether the directories of a website are browsable. To avoid redundant traffic, we reuse the JavaScript URLs collected for case 25. We then examine the common parent directory of all the internal URLs. Finally, we send a GET request to fetch the content of the directory. If directory browsing is enabled, the server will return a response with code 200 and a page containing the listing of files and directories of the specified path. Otherwise, it should return an error response code (e.g., 404 - Not Found, 403 - Forbidden). This test only determines whether a directory is browsable. We never store any of the returned pages during the test.
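Deriving the probe target from already-collected script URLs can be sketched as follows. This Python fragment is illustrative and not the authors' implementation; the function names are hypothetical, and recognising a listing page by the phrase "Index of /" is an assumed heuristic that matches the default autoindex pages of common servers.

```python
import posixpath
from urllib.parse import urljoin, urlsplit


def common_parent_directory(page_url: str, script_srcs: list):
    """Return the URL of the deepest directory shared by all same-host
    script URLs, or None if the page has no internal scripts."""
    page_host = urlsplit(page_url).hostname
    dirs = []
    for src in script_srcs:
        parts = urlsplit(urljoin(page_url, src))
        if parts.hostname == page_host:
            dirs.append(posixpath.dirname(parts.path) or "/")
    if not dirs:
        return None
    parent = posixpath.commonpath(dirs)
    return urljoin(page_url, parent.rstrip("/") + "/")


def looks_like_listing(status: int, body: str) -> bool:
    """Code 200 plus a listing marker (assumed: 'Index of /')."""
    return status == 200 and "Index of /" in body
```

Only one GET request to the returned URL is then needed, and the response body can be discarded as soon as `looks_like_listing` has been evaluated.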

Rule 31. HTTP TRACE supported. The HTTP TRACE method is used for diagnostic purposes. If it is enabled, the web server will respond to a request by echoing in its response the exact request that it has received. In [123], the author has shown that HTTP TRACE can be used to steal sensitive information (e.g., cookies, credentials). To examine the HTTP TRACE configuration, we send an HTTP request with the method set to TRACE. If the TRACE method is enabled by the server, the server will echo the request in the response with a code 200.
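The TRACE probe needs a single request and a check that the request was echoed back. The Python sketch below is illustrative only; the marker header `X-Probe`, its value, and the function names are assumptions introduced to make the echo easy to recognise.

```python
import http.client


def trace_probe(host: str, port: int = 80,
                marker: str = "pcicheck-probe", timeout: float = 5.0):
    """Send one TRACE request carrying a marker header;
    return (status, first 4 KiB of the body as text)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("TRACE", "/", headers={"X-Probe": marker})
        resp = conn.getresponse()
        return resp.status, resp.read(4096).decode("latin-1")
    finally:
        conn.close()


def trace_enabled(status: int, body: str,
                  marker: str = "pcicheck-probe") -> bool:
    """TRACE is considered enabled if the server answers 200 and the
    response body echoes the marker sent in the request."""
    return status == 200 and marker in body
```

Requiring the echoed marker in addition to the status code guards against servers that answer 200 to unknown methods with an ordinary page.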

5.5.2 Findings of E-commerce Website Compliance

68 websites fully passed our PCICHECKERLITE test, including the aforementioned cloud providers (Google Cloud, Amazon Connect, Shopify). Our results also confirm that a number of actively operating websites do not comply with the PCI Data Security Standard. As shown in Table 5.4, out of the 1,203 websites, 1,135 (94%) have at least one vulnerability. More importantly, 1,031 (86%) sites have at least one “must-fix” vulnerability, which should have disqualified them as non-compliant. Among them, 520 (43%) sites even have two or more must-fix vulnerabilities.

As shown in Table 5.4, the shopping category has the lowest percentage (87%) of vulnerable websites, while all other categories have a percentage of over 90%. We found several types of high-risk and medium-risk vulnerabilities, including leaving the Mysql port (3306) open, using self-signed or expired certificates, wrong hostnames in the certificate, enabling the HTTP TRACE method, and using vulnerable OpenSSH (7.5 or earlier). Supporting TLSv1.0 (low-risk level) is among the most common vulnerabilities we detected (Test case 18), likely due to the need for backward compatibility. SSLv3.0 and TLSv1.0 are known to have multiple man-in-the-middle vulnerabilities [181] and the PCI standard recommends that all web servers and clients must transition to TLSv1.1 or above.

The vulnerabilities in these websites suggest the PCI scanners used by the websites are inadequate and failed to detect the vulnerabilities during the certification scans. Another possibility is that the acquiring banks did not sufficiently examine the merchants’ quarterly security reports, allowing merchants to operate without sending adequate security reports to banks as required.

Figure 5.3: An example of wrong hostname in the certificate. The domain (a*****.***) uses a certificate that is issued for a different domain name (*.n*****.***).

Figure 5.4: Self-signed certificate used by (r*****.***), a website that accepts payment cards for donations.

Figure 5.5: (u*****.***) uses expired certificates by default and redirects users to a secure subdomain with proper certificate during payment.

Vulnerable Websites. Below, we highlight some interesting findings without explicitly mentioning the names of vulnerable sites.

Mysql open ports. 59 websites expose the Mysql service for remote access. For example, two Slovenian websites that sell healthcare products and car components and a Russian website that sells furnaces and stoves all have this vulnerability. We did not detect any use of the default user (root) with an empty password.

Insecure certificates (self-signed, expired, and insecure modulus). The use of certificates with wrong hostnames (Figure 5.3) is an issue that appears in 3% of the websites. For some websites, the root cause is that HTTPS is not properly configured. For example, one website accepts payment for donations. Since it does not correctly set up HTTPS, it uses a default certificate⁸ for HTTPS (Figure 5.4). In some cases, the websites use HTTPS for payment only, while other sensitive content (i.e., items and the cart) is still sent over HTTP. Because the original domain is not properly configured to use HTTPS, it presents the default expired certificate (Figure 5.5).

⁸ A self-signed certificate comes with the webserver installation.

Comparison with Existing Tool. Finally, we experimentally compared PCICHECKERLITE with a state-of-the-art web scanner. Note that existing scanners typically have aggressive pentesting components that are not suitable for testing live websites. For this experiment, we choose w3af and adapt it to a “non-intrusive low-interactive” version. More specifically, we modify w3af to 1) block intrusive tests (e.g., XSS, SQL injections), 2) disable URL fuzzing, and 3) disable the liveliness testing. For scalability, we also utilized w3af’s programmable APIs (w3af_console) to discard the graphic user interface. We call this version customized w3af. For comparison, we ran PCICHECKERLITE and the customized w3af on 100 websites randomly chosen from the 1,203 sites (in Table 5.5). For reference, we also ran both tools on our BUGGYCART.

Table 5.6: Comparison between PCICHECKERLITE and the customized w3af on 100 randomly chosen websites and the Buggycart testbed. We report the number of vulnerable websites detected and the false positives (FP) among them.

                                      100 Random websites        Buggycart
Vulnerabilities                       Ours (FP)    w3af (FP)     Ours    w3af
2. Mysql port (3306) detection        5 (0)        0 (0)
3. OpenSSH available                  10 (0)       0 (0)
5. Default Mysql user/pass            0 (0)        0 (0)
7. Sensitive info over HTTP           12 (0)       27 (17)
12. Selfsigned cert presented         2 (0)        2 (0)
13. Weak Cipher Supported             0 (0)        0 (0)
14. Expired cert presented            0 (0)        3 (3)
15. Wrong hostname in cert            3 (0)        2 (1)
16. Insecure Modulus                  0 (0)        0 (0)
17. Weak hash in cert                 0 (0)        0 (0)
18. TLSv1.0 Supported                 63 (0)       0 (0)
19. OpenSSH vulnerable                10 (0)       0 (0)
25. Missing script integrity check    72 (1)       55 (10)
29. Server Info available             19 (0)       81 (62)
30. Browsable Dir Enabled             0 (0)        0 (0)
31. HTTP TRACE supported              6 (0)        6 (6)
33. Security Headers missing          30 (0)       0 (0)

The results are shown in Table 5.6. First, we observe that our system outperforms w3af on Buggycart by detecting all the vulnerabilities. Second, on the 100 real-world websites, our system also detected more truly vulnerable websites. Even though w3af flagged more websites (e.g., Test cases 7, 29), manual analysis shows that a large portion of the alerts are false positives. For example, under Test case 7, w3af flags a website if Port 80 is open, while PCICHECKERLITE reports a website only if the request is not automatically redirected to Port 443 (HTTPS). This design of w3af produces 17 false positives. Under Test case 15, w3af flags a website that uses the certificate for its subdomains (which is not a violation). For Test case 29, w3af flags websites that expose non-critical information, whereas we only flag the exposure of exploitable information (e.g., server and framework version numbers). Note that among all vulnerabilities, we only have one FP, under Test case 25. This website is flagged by PCICHECKERLITE for loading Javascript from a third-party domain without an integrity check. Manual analysis shows that the third-party domain and the original website are actually owned by the same organization. Technically, such information is beyond what PCICHECKERLITE can collect.

5.6 Disclosure and Discussion

Responsible Disclosure. We have fully disclosed our findings to the PCI Security Standard Council. In May 2019, we shared our work with the PCI SSC, and successfully got in touch with an experienced member of the Security Council. Through productive exchanges with them, we gained useful insights. i) The Security Council shared a copy of our work with the dedicated companies that host the PCI certification testbeds, who are now aware of our findings; ii) Preventing scanners from gaming the test is one of their priorities, for example, by constantly updating their testbeds and changing the tests; iii) Low interaction constraints make it difficult to test some vulnerabilities externally (which we also experienced and aimed to address in our work); iv) The Security Council routinely removes scanners from the ASV list or warns scanners based on the feedback sent by ASV consumers; v) Their testbeds exclude vulnerabilities whose CVSS scores are lower than 4; vi) Payment brands and acquirer banks need efficient (and automatic) solutions to inspect PCI DSS compliance reports. Insights ii), iii), and vi) present interesting research opportunities. In addition, we are in the process of contacting vulnerable websites. Some notifications have been sent out to those that failed test case 2 (open Mysql port) or 19 (vulnerable OpenSSH). Incidentally, we found that a few websites have already fixed their problems; for example, Netflix upgraded the vulnerable SSH-2.0-OpenSSH_7.2p2 (the current Netflix.com server does not show a version number).

Is Improving PCI Certification a Practical Task? From the economics point of view, the concept of for-profit security certification companies may seem like an oxymoron. Intuitively, a scanning vendor might make more money if its scanner is less strict, allowing websites to easily pass the DSS certification test. On the contrary, a company offering rigorous certification scanning might lose customers when they become frustrated from failing the certification test. Phenomena with misaligned incentives widely exist in many security domains (e.g., ATM security, network security) [71]. Fortunately, unlike the decentralized Internet, PCI security is centrally supervised by the PCI Security Council. Thus, the Council, governing the process of screening and approving scanner vendors, is a strong point of quality control. The enforcement can be strengthened through technical means. Thus, improving the PCI security certification, unlike deploying Internet security protocols [170], is a practical goal that is very reachable in the near future.

Gaming-resistant Self-evolving Testbeds and Open-Source PCI Scanners. A testbed needs to constantly evolve, incorporating new types of vulnerabilities and relocating existing vulnerabilities over time. A fixed testbed is undesirable, as scanners may gradually learn about the test cases and trivially pass the test without conducting a thorough analysis. Automating this process and creating self-evolving testbeds are interesting open research problems.

Competitive open-source PCI/web scanners from non-profit organizations could drive up the quality of commercial vendors, forcing the entire scanner industry to catch up and giving merchants an alternative way to run sanity checks on their services. Currently, there are not many high-quality, open-source, and deployable web scanners; w3af and ZAP are among the very few available.

[Figure 5.6 shows a sample SAQ D question, “The card verification code or value (three-digit or four-digit number printed on the front or back of a payment card) is not stored after authorization?”, with the response options Yes, Yes with CCW, No, N/A, and Not Tested.]

Figure 5.6: A sample question from the Self-Assessment Questionnaire D (SAQ D) [52]. “Yes with CCW” means “the expected testing has been performed, the requirement has been met with the assistance of a compensating control, and a Compensating Control Worksheet (CCW) is required to be submitted along with the questionnaire”.

Automate the Workload at Payment Brands and Acquirer Banks. Payment brands and acquirer banks are the ultimate gatekeepers in the PCI DSS enforcement chain. Manually screening millions of scanning reports and questionnaires every quarter is not efficient (and is likely not being done well in practice). Indeed, our real-world experiments suggest that the gatekeeping at the acquirer banks and payment brands appears weak. Thus, automating the report processing for scalable enforcement is urgently needed.

| E-commerce Type | Relevant SAQ |
|---|---|
| Doesn't Touch Cardholder Data | Self-assessment questionnaire A (SAQ A) |
| Doesn't Store Cardholder Data | Self-assessment questionnaire A-EP (SAQ A-EP) |
| Process or Store Cardholder Data | Self-assessment questionnaire D-Mer (SAQ D-Mer) |

Figure 5.7: Self-Assessment Questionnaires (SAQs) for different types of e-commerce merchants.

Scanning vs. Self-assessment Questionnaires. There are four major types of Self-assessment Questionnaires or SAQs (A to D) [53]. The different SAQs are designed for different types of merchants, as illustrated in Figure 5.7. In SAQs, all the questions are closed-ended, i.e., multiple choice (Figure 5.6). For the vast majority of merchants, the current compliance checking largely relies on trusting a merchant’s honesty and capability of maintaining a secure system. This observation is derived from our analysis of the 340 questions in the self-assessment questionnaire (SAQ) D-Mer, which is an SAQ designed for merchants that process or store cardholder data. Consequently, it is the most comprehensive questionnaire.

We manually went through all the questions in the Self-Assessment Questionnaire (SAQ) D-Mer and categorized them into five major groups: network security, system security, application security, application capability, and company policies. 271 of the 340 questions fall under the categories of company policies and application capability, none of which can be automatically verified by an external entity (e.g., ASV/web scanners). Only 31 of the remaining 69 questions on network, system, and application security are automatically verifiable by a PCI scanner.
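The proportions implied by these counts can be worked out directly. The short sketch below only restates the numbers given in the text; the variable names are ours, and the percentages are derived here rather than quoted from the questionnaire.

```python
# Counts reported in the text for SAQ D-Mer.
TOTAL_QUESTIONS = 340
POLICY_AND_CAPABILITY = 271   # company policies + application capability
SECURITY_QUESTIONS = 69       # network, system, and application security
SCANNER_VERIFIABLE = 31       # automatically verifiable by a PCI scanner

def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# The two groups partition the questionnaire.
assert POLICY_AND_CAPABILITY + SECURITY_QUESTIONS == TOTAL_QUESTIONS

print(share(POLICY_AND_CAPABILITY, TOTAL_QUESTIONS))  # not externally verifiable
print(share(SCANNER_VERIFIABLE, TOTAL_QUESTIONS))     # verifiable, of all questions
print(share(SCANNER_VERIFIABLE, SECURITY_QUESTIONS))  # verifiable, of security questions
```

In other words, roughly four in five questions rest entirely on the merchant's own answers, and fewer than one in ten can be checked by an external scanner.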

Legal Consequences of Cheating in PCI Certification. The PCI DSS standard is not required by U.S. federal law. Some state laws do refer to PCI DSS (e.g., Nevada, Minnesota, Washington) [11], stating that merchants must be PCI compliant. However, there is no mention of any legal consequences of cheating in the PCI DSS certification process. Thus, it appears that being untruthful when filling out the self-assessment questionnaire would not have any direct legal consequences. The only potential penalty would be an “after effect”. For example, a merchant may be fined by the card brand if a data breach happens due to its non-compliance [10].

Limitations. Our work has a few limitations. First, we only tested 6 PCI scanners and 4 web scanners. Given the high expense of ordering PCI and web scanning services, it is unlikely that such an experiment can truly scale up. For PCI scanning, we have tried to increase the diversity of scanner selection by selecting from different price ranges. The website scanners are added to further increase diversity. Second, our work primarily focuses on the PCI compliance certification of e-commerce websites. Although we did not evaluate the compliance of banks (which report to card brands), we argue that it is the same set of approved PCI scanners that provide the compliance reports for both merchants and banks. The problem revealed in our study should be generally applicable. Third, we did not test vulnerabilities that are not yet covered by the current Data Security Standards (DSS). Future work can further study the comprehensiveness of DSS. Finally, in Section 5.5, we only tested 1,203 e-commerce websites because it requires manual effort to verify whether a website accepts payment card information. It is difficult to automate the verification process since one often needs to register an account and visit many pages before finding the payment page. We argue that our experiment already covers websites from various categories and ranking ranges, which is sufficient to demonstrate the prevalence of the problem.

5.7 Summary

Our study shows that the PCI data security standard (PCI DSS) is comprehensive, but there is a big gap between the specifications and their real-world enforcement. Our testbed experiments revealed that the vulnerability screening capabilities of some approved scanning vendors (ASV) are inadequate. 5 of the 6 PCI scanners are not compliant with the ASV scanning guidelines. All 6 PCI scanners would certify e-commerce websites that remain vulnerable. Our measurement on 1,203 e-commerce websites shows that 86% of the websites have at least one type of vulnerability that should disqualify them as non-compliant. Our future work is to design a minimum-footprint black-box scanning method.

Chapter 6

Conclusion and Future Work

Deployment-grade security screening has a crucial role in reducing the gap between the theory and the practice in software security. This thesis proposed several scalable methods for detecting cryptographic vulnerabilities in software. I described our effort of producing a deployment-quality static analysis tool, CRYPTOGUARD, to detect cryptographic misuses in Java programs that developers can routinely use. This effort led to several crypto-specific contributions, including language-specific contextual refinements for FP reduction, on-demand flow-sensitive, context-sensitive, and field-sensitive program slicing, and benchmark comparisons of leading solutions. We also obtained a trove of security insights into Java secure coding practices. I also designed a universal specification language named SPANL to model meta-level program properties that can be detected by using a combination of various forms of data-flow analyses. We also demonstrated its expressiveness by modeling various security rules in SPANL. I also presented TAINTCRYPT to detect cryptographic implementation and misuse vulnerabilities in C/C++. Our experiment on 5 popular applications and libraries generated new security insights.

Scalable vulnerability detection solutions and security guidelines are not enough to secure internet-wide software ecosystems. In the second part, this thesis focuses on measuring security non-compliance in the payment card industry (PCI). Our study shows that the PCI data security standard (PCI DSS) is comprehensive, but there is a big gap between the specifications and their real-world enforcement. Our testbed experiments revealed that 5 of the 6 PCI scanners are not compliant with the ASV scanning guidelines. All 6 PCI scanners would certify e-commerce websites that remain vulnerable. Our measurement on 1,203 e-commerce websites shows that 86% of the websites have at least one type of vulnerability that should disqualify them as non-compliant.

Building universal frameworks with multi-language support for scalable secure code generation, vulnerability detection, and fixing would be an interesting research direction. Also, understanding developer and user expectations towards secure coding is vital to stimulating secure coding. Measuring the non-compliance of various security, protocol, and software specifications and requirements is also an interesting research direction. For example, non-compliant implementations of various public and proprietary internet, payment system, and cyber-physical system protocols, as well as smart contracts, are inherently vulnerable. There are two fundamental challenges involved in checking compliance and implementation correctness: i) modeling the specification, and ii) the scalability of proving the compliance (or non-compliance).

Appendices

Appendix A

Cryptographic Code Screening

A.1 Example intermediate codes

Listing A.1: Rule to detect insecure symmetric keys

    NAME: const_sym_key
    API: sk_apis
        javax.crypto.spec.SecretKeySpec:
            void SecretKeySpec(, )

    OPERATIONS:
        o1: inter-backward with sk_apis and keyBytes

    EMITS:
        {keyBytes}: constants of-type java.lang.String, byte[]

    CONSTRAINTS:
        c1: {keyBytes} not empty

    EXEC:
        o1
        if c1:
            out "Keys must not be derived from constants"
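To make the semantics of this rule more concrete, the following toy model mimics what the `inter-backward` operation and constraint `c1` express: follow definitions backward from the key argument, across a call boundary, and report when any constant is reached. It is a minimal sketch and not the CRYPTOGUARD or SPANL implementation; the dictionary-based IR, the function names, and the sample programs are all hypothetical.

```python
# Toy IR (hypothetical): per function, each variable maps to its defining expression.
# ("const", value) | ("copy", var) | ("call", callee) | ("input",)
HARDCODED = {
    "main": {"raw": ("call", "load_key"), "key_bytes": ("copy", "raw")},
    "load_key": {"s": ("const", "hardcoded-secret"), "ret": ("copy", "s")},
}
RANDOM = {
    "main": {"key_bytes": ("call", "random_key")},
    "random_key": {"ret": ("input",)},  # e.g. bytes from a secure random source
}

def backward_constants(defs, func, var, seen=None):
    """Collect constants that can reach `var` by walking definitions backward."""
    seen = set() if seen is None else seen
    if (func, var) in seen:
        return set()
    seen.add((func, var))
    kind, *rest = defs[func].get(var, ("input",))
    if kind == "const":
        return {rest[0]}
    if kind == "copy":
        return backward_constants(defs, func, rest[0], seen)
    if kind == "call":
        # Inter-procedural step: continue from the callee's return value.
        return backward_constants(defs, rest[0], "ret", seen)
    return set()

def check_const_sym_key(defs, func, key_var):
    """Mirror of EXEC: run o1, then emit the alert if c1 ({keyBytes} not empty) holds."""
    key_bytes = backward_constants(defs, func, key_var)
    if key_bytes:
        return "Keys must not be derived from constants"
    return None

print(check_const_sym_key(HARDCODED, "main", "key_bytes"))
print(check_const_sym_key(RANDOM, "main", "key_bytes"))
```

The real analysis additionally has to handle fields, branches, and the refinements discussed in Chapter 3; the sketch only shows why a constant found on the backward slice is what triggers the alert.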

Listing A.2: Rule to detect improper hostname verifier

    NAME: impropr_hostname

    APIS: host_name_apis
        javax.net.ssl.HostnameVerifier:
            verify(,)

    OPERATIONS:
        o1: intra-backward host_name_apis with "return"

    EMITS:
        {ret}: *

    CONSTRAINTS:
        c1: "@parameter1: javax.net.ssl.SSLSession" not in {ret}

    EXEC:
        o1
        if c1:
            out "verify method is not properly implemented!"
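The constraint in this rule amounts to a membership test on the backward slice of the `return` statement: if no instruction on the slice refers to the `SSLSession` parameter, the verifier cannot have inspected the session. The sketch below illustrates that test on two hand-written slices. The instruction strings are only Jimple-like placeholders written for this example, and the function name is hypothetical.

```python
SESSION_PARAM = "@parameter1: javax.net.ssl.SSLSession"

def violates_hostname_rule(return_slice):
    """True when no instruction on the slice of `return` mentions the SSLSession parameter."""
    return not any(SESSION_PARAM in instruction for instruction in return_slice)

# Hypothetical slice of a verifier that unconditionally accepts every host.
dummy_verifier = ["return 1"]

# Hypothetical slice of a verifier whose result depends on the session.
checking_verifier = [
    "session := @parameter1: javax.net.ssl.SSLSession",
    "ok = delegate.verify(host, session)",
    "return ok",
]

for name, sliced in [("dummy", dummy_verifier), ("checking", checking_verifier)]:
    if violates_hostname_rule(sliced):
        print(name, "-> verify method is not properly implemented!")
    else:
        print(name, "-> no alert")
```

Note that the check is a necessary condition only: a verifier may touch the session and still validate it incorrectly, which this rule does not attempt to detect.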

Listing A.3: Rule to detect improper TrustManager

    NAME: impropr_trustmgr
    APIS: trustmgr_apis
        javax.net.ssl.TrustManager:
            checkServerTrusted(,
            )

    OPERATIONS:
        o1: intra-backward trustmgr_apis with "throw"
        o2: intra-backward trustmgr_apis with "checkValidity()"
        o3: iterate trustmgr_apis

    EMITS:
        {o1_out}: *
        {o2_out}: *
        {o3_out}: instructions matches "throws CertificateException"

    CONSTRAINTS:
        c1: {o1_out} not empty & {o2_out} empty & {o3_out} not empty

    EXEC:
        o1, o2, o3
        if c1:
            out "TrustManager is not properly implemented!"

Listing A.4: Rule to detect disabling CSRF protection in Spring Security

    NAME: disabling_csrf
    APIS: spring_sec_apis
        annotation.web.configuration.WebSecurityConfigurerAdapter:
            configure()

    OPERATIONS:
        o1: intra-forward spring_sec_apis with http

    EMITS:
        {http}: instructions matches "disable()"

    CONSTRAINTS:
        c1: {http} not empty

    EXEC:
        o1
        if c1:
            out "Disabled CSRF protection!"

Listing A.5: Rule to detect hardcoded JWT token signing keys

    NAME: const_sym_key
    API: jwt_signing_apis
        io.jsonwebtoken.Jwts:
            signWith(, )

    OPERATIONS:
        o1: inter-backward with jwt_signing_apis and key

    EMITS:
        {key}: constants of-type java.lang.String

    CONSTRAINTS:
        c1: {key} not empty

    EXEC:
        o1
        if c1:
            out "Keys must not be derived from constants"

A.2 Other Evaluation Results

Table A1: The number of alerts in Apache (total 94 root-subprojects) and Android applications (6,181). For Rules 1, 2, 3, 8, 10, 12, each constant/predictable value of an array/collection is considered as an individual violation.

| Rules | Apache: # of Root-subprojects | Apache: # of Alerts Per Rule | Android: # of Applications | Android: # of Alerts Per Rule |
|---|---|---|---|---|
| (1,2) Predictable Keys | 37 (39.36%) | 264 | 1,617 (26.16%) | 12,457 |
| (3) Hardcoded Store Password | 29 (30.85%) | 148 | 218 (3.52%) | 458 |
| (4) Dummy Hostname Verifier | 8 (8.51%) | 12 | 800 (12.94%) | 1,155 |
| (5) Dummy Cert. Validation | 11 (11.70%) | 30 | 1,564 (25.30%) | 3,856 |
| (6) Used Improper Socket | 4 (4.25%) | 4 | 210 (3.39%) | 271 |
| (7) Used HTTP | 24 (29.62%) | 222 | 2,486 (40.22%) | 8,321 |
| (8) Predictable Seeds | 0 (0%) | 0 | 80 (1.29%) | 544 |
| (9) Untrusted PRNG | 33 (35.10%) | 142 | 5,194 (84.03%) | 36,223 |
| (10) Static Salts | 21 (22.34%) | 112 | 199 (3.21%) | 1,757 |
| (11) ECB mode for Symm. Crypto | 16 (17.02%) | 41 | 882 (14.26%) | 1,780 |
| (12) Static IVs | 4 (4.25%) | 41 | 913 (14.77%) | 12,089 |
| (13) <1000 PBE Iterations | 25 (26.59%) | 43 | 151 (2.44%) | 312 |
| (14) Broken Symm. Crypto Algorithms | 29 (30.85%) | 86 | 701 (11.34%) | 1,742 |
| (15) Insecure Asymm. Crypto | 9 (10.98%) | 12 | 108 (1.74%) | 111 |
| (16) Broken Hash | 42 (44.68%) | 138 | 5,272 (85.29%) | 49,769 |

Table A2: Lines of code (LoC) of 30 Apache root-subprojects with their dependencies and 30 Android applications.

| No. | Apache Project | LoC | Android Applications | LoC |
|---|---|---|---|---|
| 1 | hive | 471k | Square_Point_of_Sale | 1,453k |
| 2 | meecrowave-runner | 395k | Alipay | 1,400k |
| 3 | fop | 185k | Amazon_Kindle | 1,256k |
| 4 | spark | 155k | Perfect365_Makeover | 1,231k |
| 5 | hadoop | 151k | Manga_Reader | 1,032k |
| 6 | kylin | 151k | Free_Bitcoin_Spinner | 872k |
| 7 | tomee | 118k | AICoin | 790k |
| 8 | jackrabbit-oak | 102k | LINE_WEBTOON | 749k |
| 9 | airavata | 89k | Sephora_Shop_Makeup | 726k |
| 10 | nifi | 76k | Audiobooks_from_Audible | 711k |
| 11 | qpid-jms-amqp-0-x | 66k | Mint_Budget_Bills_Finance | 673k |
| 12 | juddi | 63k | Money_Lover | 627k |
| 13 | wss4j | 42k | Ulta_Beauty | 570k |
| 14 | santuario-java | 41k | Daily_Bible_Journey | 449k |
| 15 | plugin-yarn | 35k | iqboxyinc | 429k |
| 16 | embeddedwebserver | 34k | Facebook_Pages_Manager | 424k |
| 17 | abdera | 31k | Dictionary_Merriam_Webster | 398k |
| 18 | cloudstack | 28k | Tiny_Scanner | 371k |
| 19 | directory-server | 23k | misa.sothuchi | 365k |
| 20 | manifoldcf | 19k | receipts | 353k |
| 21 | tika | 17k | Auto_Makeup | 338k |
| 22 | wicket | 16k | JW_Library | 320k |
| 23 | taverna-workbench | 10k | Card_Maker_for_Pokemon | 296k |
| 24 | shindig | 10k | Cartoon_Avatar_Maker | 277k |
| 25 | activemq-artemis | 7k | Clairol_MyShade | 227k |
| 26 | deltaspike | 6k | UPS_Mobile | 220k |
| 27 | knox | 6k | ADP_Mobile_Solutions | 203k |
| 28 | shiro | 2k | ebook_Renta | 105k |
| 29 | meecrowave-core | 1k | jits.mobile.aya | 3k |
| 30 | geronimo-gshell | 1k | zhangdan | 0.4k |

Table A3: Benchmark comparison of CrySL, Coverity, SpotBugs, and CryptoGuard on all 16 rules with CRYPTOAPI-BENCH’s 112 test cases (as of April 2019). There are 16 secure API use cases (13 in basic and 3 in advanced), which a tool should not raise any alerts on. CRYPTOGUARD successfully passed these 16 test cases. GTP stands for ground truth positive, which is the number of positives in the benchmark. CRYPTOGUARD has 11 false negatives, which we reported in Section 3.6 and discussed in Section 3.8.

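| No. | Rules | GTP | CrySL TP | CrySL FP | Coverity TP | Coverity FP | SpotBugs TP | SpotBugs FP | CryptoGuard TP | CryptoGuard FP |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Predictable Cryptographic Key | 5 | 0 | 4 | 3 | 0 | 2 | 0 | 5 | 0 |
| 2 | Predictable Password for PBE | 6 | 0 | 2 | 5 | 0 | 3 | 0 | 6 | 0 |
| 3 | Predictable Password for KeyStore | 5 | 0 | 5 | 3 | 0 | 2 | 0 | 5 | 0 |
| 4 | Dummy Hostname Verifier | 1 | – | – | 1 | 0 | 1 | 0 | 1 | 0 |
| 5 | Dummy Cert. Validation | 1 | – | – | 1 | 0 | 1 | 0 | 1 | 0 |
| 6 | Used Improper Socket | 4 | – | – | 4 | 0 | – | – | 4 | 0 |
| 7 | Use of HTTP | 4 | – | – | – | – | – | – | 4 | 0 |
| 8 | Predictable Seed | 10 | – | – | 1 | 0 | – | – | 5 | 0 |
| 9 | Untrusted PRNG | 1 | – | – | – | – | 1 | 0 | 1 | 0 |
| 10 | Static Salt | 5 | 5 | 1 | – | – | – | – | 3 | 0 |
| 11 | ECB in Symm. Crypto | 4 | 2 | 1 | 1 | 0 | 1 | 1 | 4 | 0 |
| 12 | Static IV | 6 | 0 | 6 | – | – | 6 | 0 | 4 | 0 |
| 13 | <1000 PBE Iteration | 5 | 2 | 1 | – | – | – | – | 4 | 0 |
| 14 | Broken Symm. Crypto | 20 | 10 | 5 | 4 | 0 | 5 | 5 | 20 | 0 |
| 15 | Insecure Asymm. Crypto | 3 | 2 | 1 | – | – | 0 | 1 | 2 | 0 |
| 16 | Broken Hash | 16 | 8 | 4 | 4 | 0 | 4 | 4 | 16 | 0 |
| | Total | 96 | 29 | 30 | 27 | 0 | 26 | 11 | 85 | 0 |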
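The Total row of Table A3 is enough to derive aggregate precision and recall for each tool. The sketch below does this arithmetic; it assumes recall is measured against all 96 ground-truth positives, which also counts rules that a tool does not cover (the “–” cells) as misses. The function and variable names are ours, and the derived percentages are a reading aid rather than additional reported results.

```python
# Totals copied from the last row of Table A3: tool -> (TP, FP).
GTP_TOTAL = 96
TOTALS = {
    "CrySL": (29, 30),
    "Coverity": (27, 0),
    "SpotBugs": (26, 11),
    "CryptoGuard": (85, 0),
}

def precision_recall(tp, fp, gtp=GTP_TOTAL):
    """Precision = TP / (TP + FP); recall = TP / GTP (assumption: all 96 cases count)."""
    precision = tp / (tp + fp)
    recall = tp / gtp
    return precision, recall

for tool, (tp, fp) in TOTALS.items():
    p, r = precision_recall(tp, fp)
    print(f"{tool}: precision={p:.2%} recall={r:.2%} false negatives={GTP_TOTAL - tp}")
```

For CRYPTOGUARD, the false-negative count computed this way (96 - 85) agrees with the 11 false negatives stated in the caption.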

Table A4: Rules that use intra-procedural backward program slicing to slice implemented methods of standard Java APIs and their corresponding slicing criteria.

| No. | Method to Slice | Rule | Criterion |
|---|---|---|---|
| 4.1 | javax.net.ssl.HostnameVerifier: boolean verify(String,SSLSession) | 4 | return |
| 5.1 | void checkServerTrusted(X509Certificate[],String) | 5 | checkValidity() |
| 5.2 | void checkServerTrusted(X509Certificate[],String) | 5 | throw |
| 5.3 | java.security.cert.X509Certificate[] getAcceptedIssuers() | 5 | return |

Table A5: Java APIs used as slicing criteria in our intra-procedural forward program slicing and their corresponding security rules.

| No. | Slicing Criterion for Intra-Procedural Forward Program Slicing | Rule | Semantic |
|---|---|---|---|
| 6.1 | javax.net.ssl.SSLSocketFactory: SocketFactory getDefault() | 6 | Create SocketFactory |
| 6.2 | javax.net.ssl.SSLContext: SSLSocketFactory getSocketFactory() | 6 | Create SocketFactory |
| 15.1 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(java.lang.String) | 15 | Create KeyPairGenerator |
| 15.2 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,String) | 15 | Create KeyPairGenerator |
| 15.3 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,Provider) | 15 | Create KeyPairGenerator |

Table A6: Java APIs used as slicing criteria in our inter-procedural backward slicing and their corresponding security rules. Boldface indicates the parameter of interest.

| No. | API | Rule | Semantic |
|---|---|---|---|
| 1.1 | javax.crypto.spec.SecretKeySpec: void (byte[],String) | 1 | Set key |
| 1.2 | javax.crypto.spec.SecretKeySpec: void (byte[],int,int,String) | 1 | Set key |
| 2.1 | javax.crypto.spec.PBEKeySpec: void (char[]) | 2 | Set password |
| 2.2 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int,int) | 2 | Set password |
| 2.3 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int) | 2 | Set password |
| 3.1 | java.security.KeyStore: void load(InputStream,char[]) | 3 | Set password |
| 3.2 | java.security.KeyStore: void store(OutputStream,char[]) | 3 | Set password |
| 3.3 | java.security.KeyStore: void setKeyEntry(String,Key,char[],Certificate[]) | 3 | Set password |
| 3.4 | java.security.KeyStore: Key getKey(String,char[]) | 3 | Set password |
| 7.1 | java.net.URL: void (String) | 7 | Set URL |
| 7.2 | java.net.URL: void (String,String,String) | 7 | Set URL |
| 7.3 | java.net.URL: void (String,String,int,String) | 7 | Set URL |
| 7.4 | okhttp3.Request$Builder: Request$Builder url(String) | 7 | Set URL |
| 7.5 | retrofit2.Retrofit$Builder: Retrofit$Builder baseUrl(String) | 7 | Set URL |
| 8.1 | java.security.SecureRandom: void (byte[]) | 8 | Set seed |
| 8.2 | java.security.SecureRandom: void setSeed(byte[]) | 8 | Set seed |
| 8.3 | java.security.SecureRandom: void setSeed(long) | 8 | Set seed |
| 10.1 | javax.crypto.spec.PBEParameterSpec: void (byte[],int) | 10 | Set salt |
| 10.2 | javax.crypto.spec.PBEParameterSpec: void (byte[],int,AlgorithmParameterSpec) | 10 | Set salt |
| 10.3 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int,int) | 10 | Set salt |
| 10.4 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int) | 10 | Set salt |
| 11.1 | javax.crypto.Cipher: Cipher getInstance(String) | 11, 14 | Select cipher |
| 11.2 | javax.crypto.Cipher: Cipher getInstance(String, String) | 11, 14 | Select cipher |
| 11.3 | javax.crypto.Cipher: Cipher getInstance(String, Provider) | 11, 14 | Select cipher |
| 12.1 | javax.crypto.spec.IvParameterSpec: void (byte[]) | 12 | Set IV |
| 12.2 | javax.crypto.spec.IvParameterSpec: void (byte[],int,int) | 12 | Set IV |
| 13.1 | javax.crypto.spec.PBEParameterSpec: void (byte[],int) | 13 | Set iterations |
| 13.2 | javax.crypto.spec.PBEParameterSpec: void (byte[],int,AlgorithmParameterSpec) | 13 | Set iterations |
| 13.3 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int,int) | 13 | Set iterations |
| 13.4 | javax.crypto.spec.PBEKeySpec: void (char[],byte[],int) | 13 | Set iterations |
| 15.1 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String) | 15 | Select generator |
| 15.2 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,String) | 15 | Select generator |
| 15.3 | java.security.KeyPairGenerator: KeyPairGenerator getInstance(String,Provider) | 15 | Select generator |
| 15.4 | java.security.KeyPairGenerator: void initialize(int) | 15 | Set key size |
| 15.5 | java.security.KeyPairGenerator: void initialize(int,java.security.SecureRandom) | 15 | Set key size |
| 15.6 | java.security.KeyPairGenerator: void initialize(AlgorithmParameterSpec) | 15 | Set key size |
| 15.7 | java.security.KeyPairGenerator: void initialize(AlgorithmParameterSpec,SecureRandom) | 15 | Set key size |
| 16.1 | java.security.MessageDigest: MessageDigest getInstance(String) | 16 | Select hash |
| 16.2 | java.security.MessageDigest: MessageDigest getInstance(String, String) | 16 | Select hash |
| 16.3 | java.security.MessageDigest: MessageDigest getInstance(String, Provider) | 16 | Select hash |

Appendix B

Security in Payment Card Industry

Table B1: Specifications defined by the PCI Security Standard Council (SSC) along with their targets, evaluators, assessors and whether it is enforced by SSC. “COTS” stands for Commercial Off-The-Shelf.

| PCI Specifications | Target(s) | Evaluator(s) | Assessor(s): Type | Required? |
|---|---|---|---|---|
| Data Security Standard (DSS) [51] | Merchant, Acquirer Bank, Issuer Bank, Token Service Provider, Service Provider | Acquirer, Payment Brand | QSA: Manual; ASV: Automated | Yes |
| Card Production and Provisioning (CPP) [39, 40] | Card Issuer, Card Manufacturer, Token Service Provider | Payment Brand | CPP-QSA: Manual | Yes |
| Payment Application DSS (PA DSS) [26] | PA Vendors | PA-QSA | PA-QSA: Manual | Optional |
| Point-to-Point Encryption (P2PE) [18] | POS Device Vendors | P2PE-QSA | P2PE-QSA: Manual | Optional |
| PIN Transaction Security (PTS) [27, 54] | PIN Pad Vendors | PTS Labs | PTS Labs: Manual | Optional |
| 3-D Secure (3DS) [38] | 3DS Server, 3DS Directory Server, 3DS Access Control Server | Payment Brand | 3DS-QSA: Manual | Optional |
| Software-Based PIN Entry on COTS (SPoC) [55] | PIN-based Cardholder verification (CVM) Apps | SPoC Labs | SPoC Labs: Manual | Optional |
| Token Service Provider (TSP) [19] | Token Service Providers | P2PE-QSA | P2PE-QSA: Manual | Optional |

Table B2: A summary of the guidelines for ASV scanners [41]. In the fourth column, we show the categories that are required to be fixed. “∗” means that in the SSL/TLS category, all the vulnerabilities are required to be fixed, except case 18.

| Target Component | Expectation | Test-cases | Must fix? |
|---|---|---|---|
| Firewalls and Routers | 1. Must scan all network devices such as firewalls and external routers. 2. Must test for known vulnerabilities and patches. | 1 | Yes |
| Operating Systems | 1. Must scan to determine the OS type and version. 2. An unsupported OS must be marked as an automatic failure. | - | Yes |
| Database Servers | 1. Must test for open access to databases from the Internet. 2. If found - must be marked as an automatic failure (Req. 1.3.6) | 2 | Yes |
| Web Servers | 1. Must be able to test for all known vulnerabilities and configuration issues. 2. Report if directory browsing is observed. | 30 | Yes |
| Application Servers | 1. Must be able to test for all known vulnerabilities and configuration issues. | 29, 33 | Yes |
| Common Web Scripts | 1. Must be able to find common web scripts (e.g., CGI, e-commerce, etc.). | - | Yes |
| Built-in Accounts | 1. Default username/passwords in routers, firewalls, OS and server apps. 2. Must be marked as an automatic failure. (Req 2.1) | 5, 6 | Yes |
| DNS and Mail Servers | 1. Must be able to detect the presence 2. Must test for known vulnerabilities and configuration issues 3. Report observed vulnerability (failure for DNS server vulnerabilities). | - | Yes |
| Virtualization components | 1. Must be able to test for all known vulnerabilities | - | Yes |
| Web Applications | Must find common vulnerabilities including the following: 1. Unvalidated parameters that might lead to SQL injection. 2. Cross-site scripting (XSS) flaws 3. Directory traversal vulnerabilities 4. HTTP response splitting/header injection 5. Information leakage: phpinfo(), Insecure HTTP methods, detailed error msg 6. If found any of the above must be marked as an automatic failure | 21, 22, 23, 24, 25, 26, 27, 28, 31, 32 | Yes |
| Other Applications | 1. Must test for known vulnerabilities and configuration issues | 20 | Yes |
| Common Services | 1. Must test for known vulnerabilities and configuration issues | 19 | Yes |
| Wireless Access Points | 1. Must be able to detect wireless access points 2. Must test and report known vulnerabilities and configuration issues | - | Yes |
| Backdoors/Malware | 1. Must test for remotely detectable backdoors/malware 2. Report automatic failure if found one | - | Yes |
| SSL/TLS | Must find: 1. Various version of crypto protocols 2. Detect the encryption algorithms and encryption key strengths 3. Detect signing algorithms used for all server certificates 4. Detect and report on certificate validity 5. Detect and report on whether CN matches the hostname 6. Mark as failure if supports SSL or early versions of TLS. | 12-18 | Yes∗ |
| Anonymous Key agreement Protocol | 1. Must identify protocols allowing anonymous/non-authenticated cipher suites 2. Report if found one | - | Yes |
| Remote Access | 1. Must be able to detect remote access software 2. Must report if one is detected. 3. Must test and report known vulnerabilities and configuration issues | 3, 4, 19, 20 | Yes |
| Point-of-sale (POS) Software | 1. Should look for POS software 2. If found - ask for justification | - | No |
| Embedded links or code from out-of-scope domains | 1. Should look for out-of-scope links/code 2. If found - ask for justification | - | No |
| Insecure Services/industry-deprecated protocols | 1. If found one - ask for justification | - | No |
| Unknown services | 1. Should look for unknown services and report if found | - | No |

Table B3: PCI DSS requirements are presented with expected testing (from SAQ D-Mer) and the potential test-cases that can be used to evaluate the ASV scanning.

| No. | Requirement | Expected Testing | Testcase |
|---|---|---|---|
| 1.1 | Formalize testing when firewall configurations change | 1. Review current network diagram 2. Examine network configuration | N/A |
| 1.2 | Build a firewall to restrict "untrusted" traffic to cardholder data environment | 1. Review firewall and router config 2. Examine firewall and router config | 1. Enable/disable firewall. 2. Expose Mysql to the Internet |
| 1.3 | Prohibit direct public access between Internet and cardholder data environment | 1. Examine firewall and router config | 3. SSH over public Internet 4. Remote access to PhpMyadmin |
| 1.4 | Install a firewall on computers that have connectivity to the Internet and organization’s network | 1. Examine employee owned-devices | N/A |
| 2.1 | Always change vendor-supplied defaults before installing a System on the network | 1. Examine vendor documentations 2. Observe system configurations | 5. Use default DB user/password 6. Use default Phpmyadmin user/password |
| 2.2 | Develop a configuration standards for all system components that address all known security vulnerabilities. | 1. Examine vendor documentations 2. Observe system configurations | N/A |
| 2.3 | Encrypt using Strong cryptography all non-console administrative access such as browser/web-based management tools | 1. Examine system components 2. Examine system configurations 3. Observe an administrator log on | 7. Sensitive information over HTTP |
| 2.4 | Shared hosting providers must also comply with PCI DSS requirements | 1. Examine system inventory | N/A |
| 3.1 | Establish cardholder data retention and disposal policies | 1. Review data retention and disposal policies | N/A |
| 3.2 | Do not store sensitive authentication data (even it is encrypted) | 1. Examine system configurations 2. Examine deletion processes | 8. Store CVV in DB |
| 3.3 | Mask PAN when displayed | 1. Examine system configurations 2. Observe displays of PAN | 9. Show unmask PAN |
| 3.4 | Render PAN unreadable anywhere it is stored | 1. Examine data repositories 2. Examine removable media 3. Examine audit logs | 10. Store plain-text PAN (OpenCart) |
| 3.5 | Secure keys that are used to encrypt stored cardholder data or other keys | 1. Examine system configurations 2. Examine key storage locations | 11. Use hardcoded key for encrypting PAN |
| 3.6 | Document all key-management process | 1. Review key-management procedures | N/A |
| 4.1 | Use strong cryptography and security protocols during transmission of cardholder data. | 1. Review system configurations | 12. Use self-signed certificate 13. Use insecure block cipher 14. Use Expired certificate 15. Use cert. with wrong hostname 16. Use 1024 bit DH modulus. 17. Use weak hash in SSL certificate 18. Use TLSv1.0 |
| 4.2 | Never send PAN over unprotected user messaging technologies. | 1. Review policies and procedures | N/A |
| 5.1 | Deploy anti-virus software on all systems | 1. Examine system configurations 2. Interview personnel | N/A |
| 5.2 | Ensure all anti-virus mechanisms are current, running and generating audit log | 1. Examine anti-virus configurations 2. Review log retention process 3. Examine system configurations | N/A |
| 6.1 | Ensure that all system components are protected from known vulnerabilities | 1. Examine system components 2. Compare the list of security patches | 19. Use vulnerable of OpenSSH 20. Use vulnerable PhpMyadmin |
| 6.2 | Establish a process to identify and assign risk to newly discovered security vulnerabilities | 1. Review policies and procedures | N/A |
| 6.3 | Develop software applications in accordance with PCI DSS and industry best practices | 1. Review software development process | N/A |
| 6.4 | Follow change control processes and procedures for all changes to system components | 1. Review change control process | N/A |
| 6.5 | Develop applications based on secure coding guidelines and review custom application code | 1. Review software-development policies | 21. Implant SQL injection in admin login 22. Implant SQL injection in customer login 23. Disable password retry limit 24. Disable restriction on password length. 25. Use JS from external source insecurely 26. Do not hide program crashes 27. Implant XSS 28. Implant CSRF |
| 6.6 | Ensure all public-facing applications are protected against known attacks | 1. Examine system configuration | 29. Present server info in security Headers. 30. Browsable web directories. 31. Enable HTTP Trace/Track 32. Enable phpinfo() 33. Disable security headers |
| 7 | Restrict access to cardholder data based on roles | 1. Examine access control policy 2. Review vendor documentation 3. Examine system configuration 4. Interview personnel | N/A |
| 8.4 (see note) | Render all passwords unreadable during storage and transmission for all system components | 1. Examine system configuration | 34. Store unsalted customer passwords 35. Store plaintext passwords |
| 9 | Restrict physical access to cardholder data | 1. Observe process 2. Review policies and procedures 3. Interview personnel | N/A |
| 10 | Track and monitor all access to network resource and cardholder data | 1. Interview personnel 2. Observe audit logs 3. Examine audit log settings | N/A |
| 11 | Regularly test security systems and processes | 1. Interview personnel 2. Examine scope of testing 3. Review results of ASV scans | N/A |
| 12 | Maintain a policy that addresses information security for all personnel | 1. Review formal risk assessment 2. Review security policy 3. Interview personnel. | N/A |

Note: Other requirements under 8 are not testable.

Figure B1: Homepage of www.rwycart.com.

Figure B2: Customer login page of www.rwycart.com.

Bibliography

[1] CogniCrypt_SAST for Android.

[2] Common Vulnerability Scoring System Calculator Version 3. https://nvd.nist.gov/vuln-metrics/cvss/v3-calculator. [Online; accessed 28-Aug-2019].

[3] Coverity Static Application Security Testing (SAST).

[4] The OWASP Zed Attack Proxy (ZAP). https://www.zaproxy.org/.

[5] SpotBugs: Find Bugs in Java Programs.

[6] W3af. http://w3af.org/.

[7] RFC 2818 - HTTP over TLS. https://tools.ietf.org/html/rfc2818, 2000. [Online; accessed 15-Oct-2018].

[8] RFC 3207 - SMTP service extension for secure SMTP over transport layer security. https://www.ietf.org/rfc/rfc3207.txt, 2002. [Online; accessed 15-Oct-2018].

[9] Bleichenbacher’s RSA signature forgery based on implementation error. https://www.ietf.org/mail-archive/web/openpgp/current/msg00999.html, 2006. [Online; accessed 15-Oct-2018].

[10] The legal implications, risks and problems of the PCI data security standard. https://www.infolawgroup.com/2008/02/articles/pci-1/the-legal-implications-risks-and-problems-of-the-pci-data-security-standard/, 2008.

[11] FAQ on Washington state's PCI law. https://www.infolawgroup.com/2010/03/articles/payment-card-breach-laws/faq-on-washington-states-pci-law/, 2010.

[12] SQL injection with raw MD5 hashes (leet more ctf 2010 injection 300). http://cvk.posthaven.com/sql-injection-with-raw-md5-hashes, 2010.

[13] BEAST. https://vnhacker.blogspot.co.uk/2011/09/beast.html, 2011. [Online; accessed 3-May-2017].

[14] RFC 6944 - applicability statement: DNS security (DNSSEC) DNSKEY algorithm implementation status. https://tools.ietf.org/html/rfc6944, 2013. [Online; accessed 15-Oct-2018].

[15] The Heartbleed Bug. http://heartbleed.com/, 2014. [Online; accessed 3-May-2017].

[16] CCS injection vulnerability. http://ccsinjection.lepidum.co.jp/, 2015. [Online; accessed 4-May-2017].

[17] The FREAK attack. https://censys.io/blog/freak, 2015. [Online; accessed 4-May-2017].

[18] Payment Card Industry (PCI) Point-to-Point Encryption: Solution Requirements and Testing Procedures. https://www.pcisecuritystandards.org/documents/P2PE_v2_r1-1.pdf, 2015.

[19] Payment Card Industry (PCI) Token Service Providers: Additional Security Requirements and Assessment Procedures for Token Service Providers (EMV Payment Tokens). https://www.pcisecuritystandards.org/documents/PCI_TSP_Requirements_v1.pdf, 2015.

[20] All About Fraud: How Crooks Get the CVV. https://krebsonsecurity.com/2016/04/all-about-fraud-how-crooks-get-the-cvv/, 2016. [Online; accessed 8-Jan-2019].

[21] Cryptographic Key Length Recommendation. https://www.keylength.com/en/4/, 2016. [Online; accessed 29-Jan-2018].

[22] Fixed potential stack corruption in mbedtls_x509write_crt_der(). https://github.com/ARMmbed/mbedtls/blob/development/ChangeLog#L118, 2016. [Online; accessed 4-May-2017].

[23] Fixed pthread implementation to avoid unintended double initialisations and double frees. https://github.com/ARMmbed/mbedtls/blob/development/ChangeLog#L154, 2016. [Online; accessed 4-May-2017].

[24] Google Play Warning: How to fix incorrect implementation of HostnameVerifier? https://stackoverflow.com/questions/41312795/google-play-warning-how-to-fix-incorrect-implementation-of-hostnameverifier, 2016. [Online; accessed 29-Jan-2018].

[25] libevent (stack) buffer overflow in evutil_parse_sockaddr_port(). https://github.com/libevent/libevent/issues/318, 2016. [Online; accessed 4-May-2017].

[26] Payment Card Industry (PCI) Payment Application Data Security Standard: Requirements and Security Assessment Procedures. https://www.pcisecuritystandards.o rg/documents/PA-DSS_v3-2.pdf, 2016.

[27] Payment Card Industry (PCI) PIN Transaction Security (PTS) Hardware Security Module (HSM): Modular Security Requirements. https://www.pcisecuritystandards .org/documents/PCI_HSM_Security_Requirements_v3_2016_final.p

df, 2016.

[28] PCI Self-Assessment Questionnaire Instructions and Guidelines. version 3.2. https: //www.pcisecuritystandards.org/documents/SAQ-InstrGuidelines

-v3_2.pdf, 2016.

[29] Amazon Connect is Now PCI DSS Compliant. https://aws.amazon.com/about -aws/whats-new/2017/07/amazon-connect-is-now-pci-dss-complia

nt/, 2017. Sazzadur Rahaman Bibliography 141

[30] Change the default Crypt Algo to use stronger cryptographic algo. "https://issues.a pache.org/jira/browse/RANGER-1644", 2017. [Online; accessed Jan 26, 2018].

[31] Class Random. https://docs.oracle.com/javase/8/docs/api/java/uti l/Random.html, 2017. [Online; accessed 29-Jan-2018].

[32] Class SecureRandom. https://docs.oracle.com/javase/8/docs/api/ja va/security/SecureRandom.html, 2017. [Online; accessed 29-Jan-2018].

[33] EMV Payment Tokenisation Specification: Technical Framework. https://www.emvc o.com/terms-of-use/?u=/wp-content/uploads/documents/EMVCo-P ayment-Tokenisation-Specification-Technical-Framework-v2.0-1

.pdf, 2017.

[34] Giant equifax data breach: 143 million people could be affected. https://money.cn n.com/2017/09/07/technology/business/equifax-data-breach/ind

ex.html, 2017.

[35] How many e-commerce companies are there? whats the global e-commerce market size? http://blog.pipecandy.com/e-commerce-companies-market-size/, 2017.

[36] Lifetimes of cryptographic hash functions. http://valerieaurora.org/hash.h tml, 2017. [Online; accessed 29-Jan-2018].

[37] List of Rainbow Tables. http://project-rainbowcrack.com/table.htm, 2017. [Online; accessed 29-Jan-2018].

[38] Payment Card Industry 3-D Secure (PCI 3DS): Security Requirements and Assessment Pro- cedures for EMV 3-D Secure Core Components: ACS, DS, and 3DS Server. https: //www.pcisecuritystandards.org/documents/PCI-3DS-Core-Securi

ty-Standard-v1.pdf, 2017. 142 Bibliography Sazzadur Rahaman

[39] Payment Card Industry (PCI) Card Production and Provisioning: Logical Security Require- ments. https://www.pcisecuritystandards.org/documents/PCI_Car d_Production_Logical_Security_Requirements_v2.pdf, 2017.

[40] Payment Card Industry (PCI) Card Production and Provisioning: Physical Security Require- ments. https://www.pcisecuritystandards.org/documents/PCI_Car d_Production_Physical_Security_Requirements_v2.pdf, 2017.

[41] Payment Card Industry (PCI) Data Security Standard Approved Scanning Vendor. program guide. version 3.1. https://www.pcisecuritystandards.org/documents/A SV_Program_Guide_v3.1.pdf, 2017.

[42] Update doc/wiki to provide details on using custom encryption key and salt for encryption of credentials. https://issues.apache.org/jira/browse/RANGER-1645, 2017. [Online; accessed Jan 26, 2018].

[43] Approved scanning vendors. https://www.pcisecuritystandards.org/ass essors_and_solutions/approved_scanning_vendors, 2018.

[44] Card Fraud on the Rise, Despite National EMV Adoption. https://geminiadviso ry.io/card-fraud-on-the-rise/, 2018. [Online; accessed 8-Jan-2019].

[45] Cardconnect: A new wave of payment processing. https://cardconnect.com/, 2018.

[46] A comprehensive guide to pci dss merchant levels. https://semafone.com/blog/ a-comprehensive-guide-to-pci-dss-merchant-levels/, 2018.

[47] EMV 3-D Secure: Protocol and Core Functions Specification. https://www.emvco. com/wp-content/uploads/documents/EMVCo_3DS_Spec_v220_122018.

pdf, 2018. Sazzadur Rahaman Bibliography 143

[48] Google rejected app because of HostnameVerifier issue. "https://stackoverflow. com/questions/48420530/google-rejected-app-because-of-hostna

meverifier-issue", 2018. [Online; accessed Jan 26, 2018].

[49] Let’s Encrypt. https://letsencrypt.org/, 2018.

[50] Opencart. https://www.opencart.com/, 2018.

[51] Payment Card Industry (PCI) Data Security Standard: Requirements and security assess- ment procedures. https://www.pcisecuritystandards.org/documents/P CI_DSS_v3-2-1.pdf, 2018.

[52] Payment Card Industry (PCI) Data Security Standard Self-Assessment Questionnaire D and Attestation of Compliance for Merchants: All other SAQ-Eligible Merchants. https: //www.pcisecuritystandards.org/documents/PCI-DSS-v3_2_1-SAQ

-D_Merchant.pdf?agreement=true&time=1557603304233, 2018.

[53] Payment Card Industry (PCI) Data Security Standard Self-Assessment Questionnaire: In- structions and Guidelines. https://finance.ubc.ca/sites/finserv.ubc.c a/files/banking-leases/PCI_DSS_SAQ_Instructions_Guidelines.

pdf, 2018.

[54] Payment Card Industry (PCI) PIN Transaction Security (PTS) Point of Interaction (POI): Modular Security Requirements. https://www.pcisecuritystandards.org/d ocuments/PCI_PTS_POI_SRs_v5-1.pdf, 2018.

[55] Payment Card Industry (PCI) Software-based PIN Entry on COTS: Security Requirements. https://www.pcisecuritystandards.org/documents/SPoC_Securit

y__Requirements_v1.0.pdf, 2018.

[56] Who’s In Your Online Shopping Cart? https://krebsonsecurity.com/2018/1 1/whos-in-your-online-shopping-cart/, 2018. 144 Bibliography Sazzadur Rahaman

[57] BlueCrypt: Cryptographic Key Length Recommendation. https://www.keylength. com/en/4/, 2019.

[58] DB-Engines Ranking. https://db-engines.com/en/ranking, 2019.

[59] Insert Skimmer + Camera Cover PIN Stealer. https://krebsonsecurity.com/ 2019/03/insert-skimmer-camera-cover-pin-stealer/, 2019. [Online; accessed 20-Mar-2019].

[60] netcat. https://en.wikipedia.org/wiki/Netcat, 2019.

[61] PCI DSS Compliance. https://www.akamai.com/us/en/resources/pci-d ss-compliance.jsp, 2019.

[62] Shopify meets all 6 categories of PCI standards. https://www.shopify.ca/secur ity/pci-compliant, 2019.

[63] Standards, Regulations & Certifications. https://cloud.google.com/securit y/compliance/pci-dss/, 2019.

[64] Y. Acar, M. Backes, S. Fahl, S. Garfinkel, D. Kim, M. L. Mazurek, and C. Stransky. Com- paring the usability of cryptographic apis. In Proceedings of the 38th IEEE Symposium on Security and Privacy, 2017.

[65] Y. Acar, M. Backes, S. Fahl, D. Kim, M. L. Mazurek, and C. Stransky. You Get Where You’re Looking for: The Impact of Information Sources on Code Security. In IEEE S&P’16, pages 289–305, 2016.

[66] Y. Acar et al. Developers Need Support, Too: A Survey of Security Advice for Software Developers. In IEEE Secure Development Conference SecDev, 2017.

[67] S. Afrose, S. Rahaman, and D. Yao. Cryptoapi-bench: A comprehensive benchmark on java cryptographic API misuses. In 2019 IEEE Cybersecurity Development, SecDev 2019, Tysons Corner, VA, USA, September 23-25, 2019, pages 49–61, 2019. Sazzadur Rahaman Bibliography 145

[68] S. Afrose, S. Rahaman, and D. D. Yao. CryptoAPI-Bench: A Comprehensive Benchmark on Java Cryptographic API Misuses. In IEEE Secure Development Conference (SecDev), September 2019.

[69] N. J. AlFardan and K. G. Paterson. Lucky Thirteen: Breaking the TLS and DTLS Record Protocols. In IEEE S&P 2013, pages 526–540, 2013.

[70] J. B. Almeida, M. Barbosa, G. Barthe, F. Dupressoir, and M. Emmi. Verifying Constant- Time Implementations. In USENIX Security 2016, pages 53–70, 2016.

[71] R. Anderson and T. Moore. The economics of information security. Science, 314(5799):610–613, 2006.

[72] R. J. Anderson. Why cryptosystems fail. In Proceedings of the ACM Conference on Com- puter and Communications Security (CCS), 1993.

[73] R. J. Anderson and S. J. Murdoch. EMV: why payment systems fail. Commun. ACM, 57(6):24–28, 2014.

[74] M. Arroyo, F. Chiotta, and F. Bavera. An user configurable clang static analyzer taint checker. In 35th International Conference of the Chilean Computer Science Society, SCCC 2016, Valparaíso, Chile, October 10-14, 2016, pages 1–12, 2016.

[75] S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, A. Bartel, J. Klein, Y. L. Traon, D. Octeau, and P. D. McDaniel. FlowDroid: precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android apps. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’14, pages 259–269, 2014.

[76] N. Aviram, S. Schinzel, J. Somorovsky, N. Heninger, M. Dankel, J. Steube, L. Valenta, D. Adrian, J. A. Halderman, V. Dukhovni, E. Käsper, S. Cohney, S. Engels, C. Paar, and Y. Shavitt. DROWN: breaking TLS Using SSLv2. In USENIX Security 2016, pages 689– 706, 2016. 146 Bibliography Sazzadur Rahaman

[77] M. Backes, S. Bugiel, and E. Derr. Reliable third-party library detection in android and its security applications. In ACM CCS’16, pages 356–367, 2016.

[78] J. W. Backus. The syntax and semantics of the proposed international algebraic language of the zurich ACM-GAMM conference. In Information Processing, Proceedings of the 1st International Conference on Information Processing, UNESCO, Paris 15-20 June 1959, pages 125–131, 1959.

[79] D. Balzarotti, M. Cova, V. Felmetsger, N. Jovanovic, E. Kirda, C. Kruegel, and G. Vigna. Saner: Composing static and dynamic analysis to validate sanitization in web applications. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2008.

[80] G. V. Bard. The Vulnerability of SSL to Chosen Plaintext Attack. IACR Cryptology ePrint Archive, 2004:111, 2004.

[81] R. Bardou, R. Focardi, Y. Kawamoto, L. Simionato, G. Steel, and J. Tsay. Efficient Padding Oracle Attacks on Cryptographic Hardware. In CRYPTO 2012, pages 608–625, 2012.

[82] J. Bau, E. Bursztein, D. Gupta, and J. C. Mitchell. State of the art: Automated black-box web application vulnerability testing. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2010.

[83] D. J. Bernstein. Cache-timing attacks on AES. http://cr.yp.to/papers.html#c achetiming, 2005. [Online; accessed 4-May-2017].

[84] D. J. Bernstein, Y. Chang, C. Cheng, L. Chou, N. Heninger, T. Lange, and N. van Someren. Factoring RSA Keys from Certified Smart Cards: Coppersmith in the wild. In ASIACRYPT 2013, pages 341–360, 2013.

[85] B. Beurdouche, K. Bhargavan, A. Delignat-Lavaud, C. Fournet, M. Kohlweiss, A. Pironti, P. Strub, and J. K. Zinzindohoue. A messy state of the union: Taming the composite state Sazzadur Rahaman Bibliography 147

machines of TLS. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 535–552, 2015.

[86] K. Bhargavan and G. Leurent. On the Practical (In-)Security of 64-bit Block Ciphers: Col- lision Attacks on HTTP over TLS and OpenVPN. In ACM CCS 2016, pages 456–467, 2016.

[87] K. Bhargavan and G. Leurent. Transcript Collision Attacks: Breaking Authentication in TLS, IKE and SSH. In NDSS 2016, 2016.

[88] N. Bhaskar, M. Bland, K. Levchenko, and A. Schulman. Please pay inside: Evaluating bluetooth-based detection of gas pump skimmers. In Proceedings of the 28th USENIX Security Symposium (USENIX SEC), 2019.

[89] A. Bianchi, Y. Fratantonio, A. Machiry, C. Kruegel, G. Vigna, S. P. H. Chung, and W. Lee. Broken fingers: On the usage of the fingerprint API in android. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California, USA, February

18-21, 2018, 2018.

[90] D. Bleichenbacher. Chosen Ciphertext Attacks Against Protocols Based on the RSA En- cryption Standard PKCS #1. In CRYPTO ’98, pages 1–12, 1998.

[91] M. Bond, O. Choudary, S. J. Murdoch, S. P. Skorobogatov, and R. J. Anderson. Chip and skim: Cloning EMV cards with the pre-play attack. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2014.

[92] D. Boneh. Twenty Years of Attacks on the RSA Cryptosystem. NOTICES OF THE AMS, 46(2), 1999.

[93] A. Bosu, F. Liu, D. D. Yao, and G. Wang. Collusive Data Leak and More: Large-scale Threat Analysis of Inter-app Communications. In AsiaCCS 2017, pages 71–85, 2017. 148 Bibliography Sazzadur Rahaman

[94] B. B. Brumley and N. Tuveri. Remote Timing Attacks Are Still Practical. In ESORICS 2011, pages 355–371, 2011.

[95] D. Brumley and D. Boneh. Remote Timing Attacks Are Practical. In USENIX Security 2003, 2003.

[96] C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In USENIX OSDI 2008, pages 209– 224, 2008.

[97] D. Canali, D. Balzarotti, and A. Francillon. The role of web hosting providers in detecting compromised websites. In Proceedings of the International World Wide Web Conference (WWW), 2013.

[98] D. Chang, A. Jati, S. Mishra, and S. K. Sanadhya. Cryptanalytic Time-Memory Tradeoff for Password Hashing Schemes. IACR Cryptology ePrint Archive, 2017:603, 2017.

[99] S. Y. Chau, O. Chowdhury, M. E. Hoque, H. Ge, A. Kate, C. Nita-Rotaru, and N. Li. Sym- Certs: Practical Symbolic Execution for Exposing Noncompliance in X.509 Certificate Val- idation Implementations. In IEEE S&P 2017, pages 503–520, 2017.

[100] S. Checkoway, J. Maskiewicz, C. Garman, J. Fried, S. Cohney, M. Green, N. Heninger, R. Weinmann, E. Rescorla, and H. Shacham. A systematic analysis of the Juniper Dual EC incident. In ACM CCS 2016, pages 468–479, 2016.

[101] S. Checkoway, R. Niederhagen, A. Everspaugh, M. Green, T. Lange, T. Ristenpart, D. J. Bernstein, J. Maskiewicz, H. Shacham, and M. Fredrikson. On the practical exploitability of dual EC in TLS implementations. In USENIX Security 2014, pages 319–335, 2014.

[102] S. Chen, J. Xu, and E. C. Sezer. Non-Control-Data Attacks Are Realistic Threats. In USENIX Security 2005, 2005. Sazzadur Rahaman Bibliography 149

[103] A. Chi, R. A. Cochran, M. Nesfield, M. K. Reiter, and C. Sturton. A system to verify net- work behavior of known cryptographic clients. In 14th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2017, Boston, MA, USA, March 27-29, 2017, pages 177–195, 2017.

[104] J. Clark and P. C. van Oorschot. SoK: SSL and HTTPS: revisiting past challenges and evaluating certificate trust model enhancements. In 2013 IEEE Symposium on Security and Privacy, SP 2013, Berkeley, CA, USA, May 19-22, 2013, pages 511–525, 2013.

[105] J. de Ruiter and E. Poll. Protocol State Fuzzing of TLS Implementations. In USENIX Security 2015, pages 193–206, 2015.

[106] Welcome to the SWAMP. https://continuousassurance.org, 2018.

[107] A. Doupé, L. Cavedon, C. Kruegel, and G. Vigna. Enemy of the state: A state-aware black- box web vulnerability scanner. In Proceedings of the 21th USENIX Security Symposium, Bellevue, WA, USA, August 8-10, 2012, pages 523–538, 2012.

[108] A. Doupé, M. Cova, and G. Vigna. Why johnny can’t pentest: An analysis of black-box web vulnerability scanners. In Detection of Intrusions and Malware, and Vulnerability Assessment, 7th International Conference, DIMVA 2010, Bonn, Germany, July 8-9, 2010.

Proceedings, pages 111–131, 2010.

[109] A. Doupé, W. Cui, M. H. Jakubowski, M. Peinado, C. Kruegel, and G. Vigna. dedacota: toward preventing server-side XSS via automatic code and data separation. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2013.

[110] S. Drimer and S. J. Murdoch. Keep your enemies close: Distance bounding against smart- card relay attacks. In Proceedings of the USENIX Security Symposium (USENIX SEC), 2007. 150 Bibliography Sazzadur Rahaman

[111] F. Duchene, S. Rawat, J. Richier, and R. Groz. Kameleonfuzz: evolutionary fuzzing for black-box XSS detection. In Proceedings of the ACM Conference on Data and Application Security and Privacy (CODASPY), 2014.

[112] M. Egele, D. Brumley, Y. Fratantonio, and C. Kruegel. An empirical study of cryptographic misuse in Android applications. In ACM CCS 2013, pages 73–84, 2013.

[113] K. O. Elish, X. Shu, D. D. Yao, B. G. Ryder, and X. Jiang. Profiling user-trigger dependence for Android malware detection. Computers & Security, 49:255–273, 2015.

[114] S. Fahl, M. Harbach, T. Muders, M. Smith, L. Baumgärtner, and B. Freisleben. Why Eve and Mallory love Android: an analysis of Android SSL (in)Security. In ACM CCS’12, pages 50–61, 2012.

[115] N. Ferguson, B. Schneier, and T. Kohno. Cryptography Engineering - Design Principles and Practical Applications. Wiley, 2010.

[116] F. Gagnon, M. Ferland, M. Fortier, S. Desloges, J. Ouellet, and C. Boileau. AndroSSL: A Platform to Test Android Applications Connection Security. In FPS’15, pages 294–302, 2015.

[117] A. Gamero-Garrido, S. Savage, K. Levchenko, and A. C. Snoeren. Quantifying the pressure of legal risks on third-party vulnerability research. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017.

[118] C. P. García, B. B. Brumley, and Y. Yarom. “Make Sure DSA Signing Exponentiations Really are Constant-Time". In ACM CCS 2016, pages 1639–1650, 2016.

[119] C. Garman, M. Green, G. Kaptchuk, I. Miers, and M. Rushanan. Dancing on the Lip of the Volcano: Chosen Ciphertext Attacks on Apple iMessage. In USENIX Security’16, pages 655–672, 2016. Sazzadur Rahaman Bibliography 151

[120] M. Georgiev, S. Iyengar, S. Jana, R. Anubhai, D. Boneh, and V. Shmatikov. The most dangerous code in the world: validating ssl certificates in non-browser software. In Pro- ceedings of the 2012 ACM conference on Computer and communications security, pages 38–49. ACM, 2012.

[121] M. Georgiev, S. Iyengar, S. Jana, R. Anubhai, D. Boneh, and V. Shmatikov. The most dangerous code in the world: validating SSL certificates in non-browser software. In ACM CCS’12, 2012.

[122] I. Goldberg and D. Wagner. Randomness and the Netscape browser. Dr Dobb’s Journal- Software Tools for the Professional Programmer, 21(1):66–71, 1996.

[123] J. Grossman. Cross site tracing (xst). https://www.cgisecurity.com/whiteh at-mirror/WH-WhitePaper_XST_ebook.pdf.

[124] X. Han, N. Kheir, and D. Balzarotti. Phisheye: Live monitoring of sandboxed phishing kits. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2016.

[125] B. He, V. Rastogi, Y. Cao, Y. Chen, V. N. Venkatakrishnan, R. Yang, and Z. Zhang. Vetting SSL usage in applications with SSLINT. In 2015 IEEE Symposium on Security and Privacy, SP 2015, San Jose, CA, USA, May 17-21, 2015, pages 519–534, 2015.

[126] N. Heninger, Z. Durumeric, E. Wustrow, and J. A. Halderman. Mining your Ps and Qs: Detection of Widespread Weak Keys in Network Devices. In USENIX Security 2012, pages 205–220, 2012.

[127] P. Hooimeijer, B. Livshits, D. Molnar, P. Saxena, and M. Veanes. Fast and precise sanitizer analysis with BEK. In Proceedings of the USENIX Security Symposium (USENIX SEC), 2011.

[128] D. Hovemeyer and W. Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92–106, 2004. 152 Bibliography Sazzadur Rahaman

[129] M. Islam, S. Rahaman, N. Meng, B. Hassanshahi, P. Krishnan, and D. D. Yao. Coding practices and recommendations of spring security for enterprise applications. In 2020 IEEE Cybersecurity Development, SecDev 2020, 2020. (to appear).

[130] T. Jager, S. Schinzel, and J. Somorovsky. Bleichenbacher’s Attack Strikes again: Breaking PKCS#1 v1.5 in XML Encryption. In ESORICS 2012, pages 752–769, 2012.

[131] B. Johnson et al. Why don’t software developers use static analysis tools to find bugs? In ICSE’13, pages 672–681, 2013.

[132] V. Klíma, O. Pokorný, and T. Rosa. Attacking RSA-based Sessions in SSL/TLS. In Cryp- tographic Hardware and Embedded Systems - CHES 2003, 5th International Workshop,

Cologne, Germany, September 8-10, 2003, Proceedings, pages 426–440, 2003.

[133] H. Krawczyk. How to Predict Congruential Generators. In CRYPTO’89, pages 138–153, 1989.

[134] S. Krüger, S. Nadi, M. Reif, K. Ali, M. Mezini, E. Bodden, F. Göpfert, F. Günther, C. Wein- ert, D. Demmler, and R. Kamath. Cognicrypt: supporting developers in using cryptography. In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, ASE 2017, Urbana, IL, USA, October 30 - November 03, 2017, pages 931– 936, 2017.

[135] S. Krüger, J. Späth, K. Ali, E. Bodden, and M. Mezini. CrySL: An Extensible Approach to Validating the Correct Usage of Cryptographic APIs. In ECOOP’18, pages 10:1–10:27, 2018.

[136] Y. Kwon, B. Saltaformaggio, I. L. Kim, K. H. Lee, X. Zhang, and D. Xu. A2c: Self destructing exploit executions via input perturbation. In NDSS 2017, 2017.

[137] D. Lazar, H. Chen, X. Wang, and N. Zeldovich. Why does cryptographic software fail?: a case study and open problems. In APSys 2014, pages 7:1–7:7, 2014. Sazzadur Rahaman Bibliography 153

[138] J. Li, Z. Lin, J. Caballero, Y. Zhang, and D. Gu. K-Hunt: Pinpointing Insecure Crypto- graphic Keys from Execution Traces. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS 2018, Toronto, ON, Canada, October

15-19, 2018, pages 412–425, 2018.

[139] Y. Lindell and J. Katz. Introduction to modern cryptography. Chapman and Hall/CRC, 2014.

[140] R. Lippmann and R. K. Cunningham. Improving intrusion detection performance using keyword selection and neural networks. Computer Networks, 34(4):597–603, 2000.

[141] B. Livshits and S. Chong. Towards fully automatic placement of security sanitizers and declassifiers. In Proceedings of the ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL), 2013.

[142] A. D. Lucia. Program Slicing: Methods and Applications. In IEEE International Workshop on Source Code Analysis and Manipulation SCAM’01, pages 144–151, 2001.

[143] S. Ma, D. Lo, T. Li, and R. H. Deng. CDRep: Automatic Repair of Cryptographic Misuses in Android Applications. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, AsiaCCS 2016, pages 711–722, 2016.

[144] S. Ma, F. Thung, D. Lo, C. Sun, and R. H. Deng. VuRLE: Automatic Vulnerability Detection and Repair by Learning from Examples. In Computer Security - ESORICS 2017 - 22nd European Symposium on Research in Computer Security, pages 229–246, 2017.

[145] N. Meng, S. Nagy, D. Yao, W. Zhuang, and G. A. Argoty. Secure Coding Practices in Java: Challenges and Vulnerabilities. In ACM ICSE’18, Gothenburg, Sweden, May 2018.

[146] C. Meyer, J. Somorovsky, E. Weiss, J. Schwenk, S. Schinzel, and E. Tews. Revisiting SSL/TLS Implementations: New Bleichenbacher Side Channels and Attacks. In USENIX Security 2014, pages 733–748, 2014. 154 Bibliography Sazzadur Rahaman

[147] B. Möller, T. Duong, and K. Kotowicz. This POODLE bites: exploiting the SSL 3.0 fall- back, 2014.

[148] K. Moriarty, B. Kaliski, and A. Rusch. Pkcs# 5: Password-Based Cryptography Specifica- tion Version 2.1. 2017.

[149] V. Murali, S. Chaudhuri, and C. Jermaine. Bayesian specification learning for finding API usage errors. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2017, Paderborn, Germany, September 4-8, 2017, pages 151–162, 2017.

[150] S. J. Murdoch and R. J. Anderson. Verified by visa and mastercard securecode: Or, how not to design authentication. In Proceedings of the International Conference on Financial Cryptography and Data Security (FC), 2010.

[151] S. J. Murdoch, S. Drimer, R. J. Anderson, and M. Bond. Chip and PIN is broken. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2010.

[152] S. Nadi, S. Krüger, M. Mezini, and E. Bodden. “Jumping Through Hoops": Why do Java Developers Struggle with Cryptography APIs? In Software Engineering (ICSE), 2016 IEEE/ACM 38th International Conference on, pages 935–946. IEEE, 2016.

[153] Y. Nan, Z. Yang, X. Wang, Y. Zhang, D. Zhu, and M. Yang. Finding clues for your se- crets: Semantics-driven, learning-based privacy discovery in mobile apps. In 25th Annual Network and Distributed System Security Symposium, NDSS 2018, San Diego, California,

USA, February 18-21, 2018, 2018.

[154] J. Nazario. PhoneyC: A virtual client honeypot. In Proceedings of the USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET), 2009.

[155] D. Nguyen, D. Wermke, Y. Acar, M. Backes, C. Weir, and S. Fahl. A stitch in time: Sup- porting android developers in writingsecure code. In Proceedings of the 2017 ACM SIGSAC Sazzadur Rahaman Bibliography 155

Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, Oc-

tober 30 - November 03, 2017, pages 1065–1077, 2017.

[156] D. C. Nguyen et al. A Stitch in Time: Supporting Android Developers in Writing Secure Code. In ACM CCS’17, pages 1065–1077, 2017.

[157] A. Oest, Y. Safaei, A. Doupé, G.-J. Ahn, B. Wardman, and K. Tyers. Phishfarm: A scalable framework for measuring the effectiveness of evasion techniques against browser phishing blacklists. In Proceedings of the IEEE Symposium on Security and Privacy (S&P), 2019.

[158] D. Page. Theoretical use of cache memory as a cryptanalytic side-channel. IACR Cryptology ePrint Archive, 2002:169, 2002.

[159] R. Paletov, P. Tsankov, V. Raychev, and M. T. Vechev. Inferring crypto API rules from code changes. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2018, pages 450–464, 2018.

[160] X. Pan, X. Wang, Y. Duan, X. Wang, and H. Yin. Dark Hazard: Learning-based, Large- Scale Discovery of Hidden Sensitive Operations in Android Apps. In NDSS’17, 2017.

[161] G. Pellegrino and D. Balzarotti. Toward black-box detection of logic flaws in web applica- tions. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2014.

[162] G. Pellegrino, M. Johns, S. Koch, M. Backes, and C. Rossow. Deemon: Detecting CSRF with dynamic analysis and property graphs. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), 2017.

[163] N. H. Pham, T. T. Nguyen, H. A. Nguyen, and T. N. Nguyen. Detection of recurring soft- ware vulnerabilities. In ASE 2010, 25th IEEE/ACM International Conference on Automated Software Engineering, pages 447–456, 2010. 156 Bibliography Sazzadur Rahaman

[164] N. Provos. A virtual honeypot framework. In Proceedings of the 13th USENIX Security Symposium (USENIX SEC), 2004.

[165] S. Rahaman, Y. Xiao, S. Afrose, F. Shaon, K. Tian, M. Frantz, M. Kantarcioglu, and D. D. Yao. Cryptoguard: High precision detection of cryptographic vulnerabilities in massive- sized java projects. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, CCS 2019, London, UK, November 11-15, 2019, pages 2455– 2472, 2019.

[166] S. Rahaman, Y. Xiao, K. Tian, F. Shaon, M. Kantarcioglu, and D. Yao. RIGORITYJ: deployment-quality detection of java cryptographic vulnerabilities. CoRR, abs/1806.06881, 2018.

[167] S. Rahaman and D. Yao. Program analysis of cryptographic implementations for security. In IEEE Cybersecurity Development, SecDev 2017, Cambridge, MA, USA, September 24-26,

2017, pages 61–68, 2017.

[168] S. H. Ramos, M. T. Villalba, and R. Lacuesta. MQTT Security: A Novel Fuzzing Approach. Wireless Communications and Mobile Computing, 2018, 2018.

[169] B. Reaves, N. Scaife, A. Bates, P. Traynor, and K. R. B. Butler. Mo(bile) money, mo(bile) problems: Analysis of branchless banking applications in the developing world. In Pro- ceedings of the USENIX Security Symposium (USENIX SEC), 2015.

[170] Routing security for policymakers: An Internet society white paper, October 2018. Internet Society. https://www.manrs.org/wp-content/uploads/2018/10/Routi ng-Security-for-Policymakers-EN.pdf.

[171] N. Rutar, C. B. Almazan, and J. S. Foster. A comparison of bug finding tools for Java. In 15th International Symposium on Software Reliability Engineering (ISSRE 2004), pages 245–256, 2004. Sazzadur Rahaman Bibliography 157

[172] N. Scaife, C. Peeters, and P. Traynor. Fear the Reaper: Characterization and Fast Detection of Card Skimmers. In Proceedings of the USENIX Security Symposium (USENIX SEC), 2018.

[173] N. Scaife, C. Peeters, C. Velez, H. Zhao, P. Traynor, and D. P. Arnold. The cards aren’t alright: Detecting counterfeit gift cards using encoding jitter. In Proceedings of the IEEE Symposium on Security and Privacy, (S&P), 2018.

[174] X. Shu, K. Tian, A. Ciambrone, and D. Yao. Breaking the target: An analysis of target data breach and lessons learned. CoRR, abs/1701.04940, 2017.

[175] S. Sivakorn, G. Argyros, K. Pei, A. D. Keromytis, and S. Jana. HVLearn: Automated Black- Box Analysis of Hostname Verification in SSL/TLS Implementations. In IEEE S&P 2017, pages 521–538, 2017.

[176] J. Somorovsky. Systematic Fuzzing and Testing of TLS Libraries. In ACM CCS 2016, pages 1492–1504, 2016.

[177] D. Sounthiraraj, J. Sahs, G. Greenwood, Z. Lin, and L. Khan. SMV-Hunter: Large Scale, Automated Detection of SSL/TLS Man-in-the-Middle Vulnerabilities in Android Apps. In NDSS’14, 2014.

[178] M. Steffens, C. Rossow, M. Johns, and B. Stock. Don’t trust the locals: Investigating the prevalence of persistent client-side cross-site scripting in the wild. In 26th Annual Net- work and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA,

February 24-27, 2019, 2019.

[179] M. Stevens, E. Bursztein, P. Karpman, A. Albertini, and Y. Markov. The First Collision for Full SHA-1. In CRYPTO’17, 2017.

[180] M. Stevens, A. K. Lenstra, and B. de Weger. Chosen-Prefix Collisions for MD5 and Collid- ing X.509 Certificates for Different Identities. In EUROCRYPT’07, pages 1–22, 2007. 158 Bibliography Sazzadur Rahaman

[181] D. A. et al. Imperfect Forward Secrecy: How Diffie-Hellman Fails in Practice. In ACM CCS 2015, pages 5–17, 2015.

[182] Update to current use and deprecation of TDEA, 2017. https://csrc.nist.gov/ news/2017/update-to-current-use-and-deprecation-of-tdea.

[183] V. van der Veen et al. Drammer: Deterministic rowhammer attacks on mobile platforms. In ACM CCS’16, pages 1675–1689, 2016.

[184] M. Vanhoef and F. Piessens. Symbolic execution of security protocol implementations: Handling cryptographic primitives. In 12th USENIX Workshop on Offensive Technologies, WOOT 2018, Baltimore, MD, USA, August 13-14, 2018., 2018.

[185] S. Vaudenay. Security Flaws Induced by CBC Padding - Applications to SSL, IPSEC, WTLS ... In EUROCRYPT 2002, pages 534–546, 2002.

[186] M. Vieira, N. Antunes, and H. Madeira. Using web security scanners to detect vulnerabilities in web services. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2009.

[187] P. Vogt, F. Nentwich, N. Jovanovic, E. Kirda, C. Krügel, and G. Vigna. Cross Site Scripting Prevention with Dynamic Data Tainting and Static Analysis. In NDSS 2007, 2007.

[188] K. Xu, D. Yao, B. Ryder, and K. Tian. Probabilistic Program Modeling for High-Precision Anomaly Classification. In CSF’15, July 2015.

[189] H. Y. Yang, E. D. Tempero, and H. Melton. An Empirical Study into Use of Dependency Injection in Java. In Australian Software Engineering Conference ASWEC’08, pages 239– 247, 2008.

[190] D. Yao, X. Shu, L. Cheng, and S. J. Stolfo. Anomaly Detection as a Service: Challenges, Advances, and Opportunities. In Information Security, Privacy, and Trust Series. Morgan & Claypool., 2017. Sazzadur Rahaman Bibliography 159

[191] T. Zhang, G. Upadhyaya, A. Reinhardt, H. Rajan, and M. Kim. Are code examples on an online Q&A forum reliable?: a study of API misuse on stack overflow. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, pages 886–896, 2018.

[192] Y. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Cross-Tenant Side-Channel Attacks in PaaS Clouds. In ACM CCS 2014, pages 990–1003, 2014.

[193] C. Zuo, Z. Lin, and Y. Zhang. Why does your data leak? uncovering the data leakage in cloud from mobile apps. In IEEE S&P’16, 2019.