
Analyzing Use of Cryptographic Primitives by Machine Learning Approaches


Masaryk University Faculty of Informatics

Analyzing use of cryptographic primitives by machine learning approaches

Ph.D. Thesis Proposal

Mgr. Adam Janovský

Advisor: prof. RNDr. Václav Matyáš, M.Sc., Ph.D.

Brno, Spring 2020

Abstract

Systems that employ cryptographic primitives and protocols are being improved continuously on a multitude of layers. Modern cryptographic primitives have been introduced to replace the weak ones. Also, the implementations of the systems get frequently audited or even standardized. Still, the security community has failed many times at implementing secure systems, often falling into known pitfalls, or even repeating the same patterns of mistakes. Even if the systems prove to be well designed by security specialists, their implementation is no less important w.r.t. security. Often, developers untrained in security implement these systems, neglecting essential aspects of security. While individual failures give valuable lessons, the security community should strive to learn where the systems fail systematically.

In this thesis proposal, a systematic analysis of the usage of cryptographic primitives is outlined for three related domains: sources of RSA keys, malicious Android applications, and security devices certified under the Common Criteria framework. For each of the mentioned domains, the proposal reviews relevant literature and presents the research plan. In addition, some already achieved results are presented, together with a re-print of two selected publications. The objective of our research is to reveal individual vulnerabilities and to point to weak elements in the systems, but most importantly to raise the awareness of the security situation in the studied domains.

In the area of RSA keys, we built a model for the classification of private RSA keys and identified the possible (but also ruled out the improbable) sources of GCD-factorable keys found on the Internet. Concerning malicious Android applications, more than 250 thousand samples were automatically processed and nearly 1 million cryptographic API call sites were collected. Analysis of these call sites revealed several interesting trends in usage by malware authors. For instance, we discovered widespread use of weak hash functions, growth in the use of public-key cryptography, and a progressive decrease in the use of cryptographic API in malware. These insights may help to prevent some future threats.

Contents

1 Introduction

2 State of art
   2.1 Bias in RSA key generation
   2.2 Cryptographic API in Android malware
       2.2.1 Inferring rules of cryptographic API misuse
       2.2.2 Evaluation of cryptographic API misuse
       2.2.3 Attribution of cryptographic API misuse
       2.2.4 Automatic cryptographic API repairs
       2.2.5 Study of cryptography in Android malware
   2.3 Cryptographic primitives in certified devices

3 Aims of thesis
   3.1 Bias in RSA keys
   3.2 Cryptographic API in Android malware
   3.3 Cryptography in certified devices
   3.4 Limitations
   3.5 Publication venues
   3.6 Tentative schedule

4 Achieved results
   4.1 Biased RSA private keys: Origin attribution of GCD-factorable keys
   4.2 Cryptographic API in Android malware
   4.3 Bringing kleptography to real-world TLS
   4.4 List of publications

Appendices

A Attached publications of author
   A.1 Biased RSA private keys: Origin attribution of GCD-factorable keys
   A.2 Bringing kleptography to real-world TLS

1 Introduction

Systems that employ cryptographic primitives and protocols are being improved continuously on a multitude of layers. Modern cryptographic primitives have been introduced to replace the weak ones [1, 2, 3]. Also, the implementations of the systems get frequently audited or even standardized. Still, the security community has failed many times at implementing secure systems, often falling into known pitfalls, or even repeating the same patterns of mistakes [4, 5]. Even if the systems prove to be well designed by security specialists, their implementation is no less important w.r.t. security. Often, developers untrained in security implement these systems, neglecting essential aspects of security [6, 7]. While individual failures give valuable lessons, the security community should strive to learn where the systems fail systematically.

We offer three main perspectives that affect the security of a system w.r.t. cryptographic primitives: (i) strong, previously unattacked primitives are chosen; (ii) whatever is used, it is generated and leveraged in a proper way; (iii) these primitives get then composed into well-designed and implemented protocols. A plethora of individual failures across these levels was witnessed in the past. To mention a few of them, consider the efficient cryptanalysis of the A5/1 cipher used in GSM [8, 9], or the key-scheduling weaknesses discovered in the RC4 cipher by Fluhrer et al. [10], which was utilized in the Wired Equivalent Privacy (WEP) protocol. These two works represent the problem on level (i), where insecure primitives were chosen for the protocols. The ROCA [11] vulnerability constitutes a case when a potentially secure primitive, i.e., an RSA key, was generated improperly, resulting in practically factorable keys. Last but not least, multiple flaws in the Transport Layer Security (TLS) protocol were discovered along the way, either due to design errors – e.g., weak Diffie-Hellman (DH) keys in the Logjam attack [12] – or due to implementation mistakes – e.g., Lucky Thirteen [13].

We conjecture that whole domains of systems that employ cryptography would benefit from automatic scrutiny, especially on levels (ii) and (iii). Examples of such domains may be mobile applications or certified devices. Such automation can even provide insight beyond the discovery of individual vulnerabilities. For instance, knowing how malware authors use cryptography on mobile platforms can help us prevent future threats. Similarly, understanding the selection of cryptographic primitives for certified black-box devices could guide future inventors in their design process. Also, a focus on temporal trends may help to evaluate whether the level of bit security outpaces the growth of computational power. We claim that awareness of the landscape of cryptographic primitives provides valuable insight for the security community and contributes to the evaluation of the security of the studied systems.

Many domains of cryptography-driven systems exist that are worth examining. Due to the limited time frame for the dissertation thesis, we concentrate only on three specific ecosystems. In these domains, we plan to systematically analyze their usage of cryptographic primitives on multiple layers.

Firstly, we expand the work [14, 15] that revealed a bias in RSA public keys. We spotted that private keys had not been studied, creating a gap in knowledge that we aim to fill. Having a model capable of unveiling the probable source of private keys allowed us to analyze a dataset of GCD-factored keys in the wild. Simultaneously, the analysis of private keys provided much more precise results that can serve for internal audits of the keys.

Secondly, this thesis concentrates on malicious applications on mobile platforms. Since the Android platform is the most prevalent among mobile platforms [16] and is also continuously threatened by malware [17], we focus our attention on Android. Our research on the Android platform involves two main steps. Currently, we are mapping the popularity of cryptographic primitives among malware authors, describing the ecosystem as a whole. In the near future, we plan to systematically search for weaknesses in malware implementations w.r.t. cryptography that could potentially allow one to block some functions of the malware. Additionally, the patterns of cryptographic API misuse in Android malware can be directly compared to the results obtained from the landscape of benign applications [18, 19], and can be critical for securing the systems and preventing future threats.

Thirdly, we aim to systematically scrutinize the certification documents of cryptographic devices – certified via the Common Criteria (CC) framework [20] – and describe the trends in cryptography usage in those certificates.

The objective of this proposal is to introduce a long-term research plan that will eventually lead to a dissertation thesis. The rest of this document is structured as follows. Chapter 2 reviews the state of the art relevant to the studied problems. Then, in Chapter 3, we precisely formulate our research goals and dissect them into multiple stages. At the same time, the chapter provides a tentative schedule for the outlined work. Finally, in Chapter 4, we review our already achieved results accepted or published at conferences. As an appendix, two already published papers are included in the form of full re-prints.

2 State of art

In this chapter, we summarize the existing research and ongoing efforts of the security community w.r.t. the cryptographic security of selected ecosystems. In Section 2.1, we review the bias in RSA keys. We specifically concentrate on the impact of bias on key fingerprinting and vulnerability discovery. Next, in Section 2.2, we focus on work that maps the cryptographic API landscape in Android applications. Even though malicious applications are the main subject of our research, most of the existing research concerns only benign applications. Still, the methodology of the related research applies both to benign and malicious applications. Finally, in Section 2.3, we introduce the Common Criteria framework and revisit several papers that touch upon the cryptographic primitives in certified devices.

2.1 Bias in RSA key generation

This section is adopted from [21].

Fingerprinting of devices based on their physical characteristics, exposed interfaces, behaviour in non-standard or undefined situations, errors returned, and a wide range of other side-channels forms a well-researched area. Experience shows that finding a case of non-standard behaviour is usually possible, while making a group of devices indistinguishable from each other is very difficult due to an almost infinite number of observable characteristics, resulting in an arms race between device manufacturers and fingerprinting observers. Yet, having a device fingerprinted is helpful for better understanding complex ecosystems, e.g., for quantifying the presence of interception middle-boxes on the Internet [22], the types of connected clients, or the versions of operating systems. Differences may even help to point out subverted supply chains or counterfeit products.

When applied to the study of cryptographic keys and cryptographic libraries, researchers devised a range of techniques to analyze the fraction of encrypted connections, the prevalence of particular cryptographic algorithms, and the chosen key lengths or cipher suites [23, 24, 25, 26, 27, 28, 29]. In the rest of this section, we concentrate on the bias found specifically in RSA keys. Recall that the public RSA key is a large integer n that is in fact a product of two randomly generated prime numbers p, q. When we write about bias in an RSA key, we in fact mean that the primes p and q were not selected uniformly from the pool of possible primes. Instead, the developers of key generation algorithms have various motivations to modify their algorithms to choose primes differently, as discussed in the following paragraphs.

The first evidence of bias in RSA keys was detected in 2012 independently by two teams of researchers. Both Lenstra et al. [30] and Heninger et al. [31] demonstrated that a non-trivial fraction (0.5%) of RSA keys used on publicly reachable TLS servers is generated insecurely and is practically factorable. This is because the affected network devices were found to independently generate RSA keys that share a single prime or both primes. While an efficient factorization algorithm for RSA moduli is unknown, when two keys accidentally share one prime, efficient factorization is possible using the Euclidean algorithm to find their GCD¹. Still, the number of public keys obtained from crawling TLS servers was too high to allow for the investigation of all possible pairs. However, the distributed GCD algorithm made it possible to analyze hundreds of millions of keys efficiently. That led to two papers showing that even in 2016, almost 1% of RSA keys on the Internet remained weak and practically factorable [32, 25].

In reaction to Lenstra et al., Mironov [33] revealed that RSA keys from the OpenSSL library intentionally avoid small factors of p − 1, which in fact creates a bias in the distribution of primes. Further work by Švenda et al. [14] made this method generic.
In their work, the authors analyzed over 60 million public keys from 38 different sources, showing that many other libraries produce biased keys, which enables origin attribution. As a result, both separate keys and large datasets could be analyzed for their origin libraries. Measurements on large datasets were presented in [15], leading to an accurate estimation of the fraction of cryptographic libraries used in large datasets like IPv4-wide TLS.

1. Note that the keys sharing both primes are not susceptible to this attack but reveal their private keys to all other owners of the same RSA key pair.

Švenda et al. identified several sources of bias in RSA keys that were later confirmed by our work [34]:

1. Performance optimizations, e.g., the most significant bits of primes set to a fixed value to obtain RSA moduli of a defined length.

2. Type of primes: probable, strong, and provable primes:
   • For probable primes, whether candidate values for primes are chosen randomly or a single starting value is incremented until a prime is found.
   • When generating candidates for probable primes, small factors are avoided in the value of p − 1 by multiple implementations without explanation.
   • Blum integers are sometimes used for RSA moduli – both RSA primes are congruent to 3 modulo 4.
   • For strong primes, the size of the auxiliary prime factors of p − 1 and p + 1 is biased.
   • For provable primes, the recursive algorithm can create new primes of double to triple the binary length of a given prime; usually one version of the algorithm is chosen.

3. Ordering of primes: are the RSA primes in the private key ordered by size?

4. Proprietary algorithms, e.g., the well-documented case of the Infineon fast prime key generation algorithm [11].

5. Bias in the output of a PRNG: often observable only from a large number of keys from the same source.

6. Natural properties of primes that do not depend on the implementation.

Insights from [14] allowed Nemec et al. [11] to notice an especially strong fingerprint that enabled attribution of keys from the Infineon CJTOP 80K smartcard with 100% accuracy. The researchers noticed that the keys are of a very specific form that rapidly decreases the entropy of the key. With such information at hand, Nemec et al. were able to mount an adjusted version of Coppersmith's factorization method [35], showing that the respective keys of common lengths (1024 or 2048 bits) are practically factorable. The resulting vulnerability was named Return of the Coppersmith Attack (ROCA) [36]. The simple test for the ROCA vulnerability in public RSA keys made it possible to measure the fraction of citizens of Estonia who held an electronic ID powered by a vulnerable smartcard, by inspecting the public repository of eID certificates [11]. The fingerprinting of keys from smartcards was also used to detect that some private keys were generated outside of the card and injected later into the eIDs, despite the policy mandating that all keys be generated on-card [37].

After the detection of GCD-factorable keys, the question of their origin naturally followed. Previous research addressed it using two principal approaches: 1) an analysis of the information extractable from the certificates of GCD-factorable keys, and 2) matching specific properties of the factored primes with primes generated by a suspected library – OpenSSL. The first approach led to the detection of a range of network routers that seeded their PRNG shortly after boot without enough entropy, which caused them to occasionally generate a prime shared with another device. These routers contained a customized version of the OpenSSL library, which was confirmed with the second approach, since OpenSSL code intentionally avoids small factors of p − 1 as shown by [33].

While this suite of routers was clearly the primary source of the GCD-factorable keys, are they the sole source of insecure keys? The paper [32] identified 23 router/device vendors that used the code of OpenSSL (using a specific OpenSSL fingerprint based on the avoidance of small factors in p − 1 and on information extracted from the certificates). Eight other vendors (DrayTek, Fortinet, Huawei, Juniper, Kronos, Siemens, Xerox, and ZyXEL) produced keys without such an OpenSSL fingerprint, and the underlying libraries remained unidentified. We stress that the origin of weak factorable keys needs to be identified in order to notify the maintainers of the code to fix the underlying issues. This may be possible via direct analysis of the private keys obtained from the GCD factorization.
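The ROCA fingerprint discussed above can be illustrated with a simplified check: a vulnerable modulus is congruent to a power of 65537 modulo a product of small primes, so n mod r must fall into the multiplicative subgroup generated by 65537 for every small prime r. The prime set below is a toy choice for illustration, not the production detector:

```python
# Simplified ROCA-style fingerprint (illustrative; the real test uses a
# larger, carefully chosen set of small primes).
def in_subgroup(n: int, r: int) -> bool:
    """Is n (mod r) a power of 65537 modulo the small prime r?"""
    elem, seen = 1, set()
    while elem not in seen:        # enumerate the cyclic subgroup <65537> mod r
        seen.add(elem)
        elem = elem * 65537 % r
    return n % r in seen

def roca_fingerprint(n: int, small_primes=(11, 13, 17, 19, 37)) -> bool:
    return all(in_subgroup(n, r) for r in small_primes)

# A modulus constructed as a power of 65537 triggers the fingerprint,
# while a nearby integer almost surely does not:
assert roca_fingerprint(65537**3)
assert not roca_fingerprint(65537**3 + 2)
```

Because the subgroups are small, a random modulus passes all the checks only with tiny probability, which is what makes the fingerprint so reliable in practice.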

2.2 Cryptographic API in Android malware

This section is adopted from [38].

The Android operating system has spread consistently worldwide during the last decade, reaching over 3 billion users in 2020 [16]. At the same time, security threats against Android have dramatically multiplied. In particular, recent reports showed a continually increasing number of malicious samples and their variants, with more than 10 000 infection attempts per day [17]. Hence, malware detection has become particularly critical to ensure data protection on Android devices.

Various efforts have been made to provide reliable protection against Android malware. In particular, both industry and academia have been focusing on developing efficient static and dynamic analysis approaches that could also predict new attacks. A significant part of the research has been carried out to develop, among others, machine learning-based techniques, de-obfuscation tools, and permission and bytecode analyzers [39, 40, 41, 42, 43, 44, 45, 46]. Overall, considerable progress has been made in developing tools that can be easily used by end-users and that provide reliable protection against most threats.

However, the problem of detection is only one part of a much more complex ecosystem. While malware detection is undoubtedly a critical problem (especially from the perspective of the end-user), other useful elements can be extracted from malicious samples. Notably, many samples belong to organized campaigns and exploit kits, and may feature characteristics that can provide information about their creators. These aspects, related to threat intelligence, are crucial for properly preventing and analyzing future threats.

The use of cryptography is one of the critical aspects that can provide valuable information about the functionality of malware and its interaction with external resources (such as network connections and external websites). While cryptography is widely used to ensure confidentiality, integrity, and availability in benign applications, it can also be employed in a plethora of artful ways to serve an adversary's malevolent objectives. For instance, cryptography equips the attacker with the ability to fingerprint the parameters of an infected device, to encrypt users' media files, or to establish a secure connection with a command-and-control server.


The notorious difficulty of implementing cryptography securely [4, 5] can also shed light on the skills of the malware creators. It can be expected that advanced and organized adversaries will exhibit a solid understanding of cryptography and will prefer more secure primitives, while unskilled authors may fall into various security pitfalls.

Most of the research on the use of cryptographic API in Android has focused on the analysis of benign applications. In the ecosystem of benign Android applications, the ultimate goal of the research community is to mitigate cryptographic API misuse. Several steps are needed to achieve this goal, and the respective works usually treat one or two steps at a time. We can summarize these steps as follows: (i) inferring the rules of cryptographic API misuse; (ii) evaluation of cryptographic API misuse; (iii) attribution of cryptographic API misuse; (iv) automatic cryptographic API repairs. We discuss the related research for all these steps in the following subsections.

2.2.1 Inferring rules of cryptographic API misuse

In the area of inferring the rules of cryptographic API misuse, the goal is to create a list of specifications for developers and researchers that imply the insecure use of cryptography. Such rules can be crafted manually, as done in [18, 47, 48]. However, this approach does not scale well, leading to the works [49, 50] that attempt to infer these rules from git commits, based on the assumption that newly introduced commits typically eliminate security vulnerabilities from the code. Surprisingly, Paletov et al. [49] reported success with this approach, whereas the chronologically later work [50] cautions against relying on the initial assumption.

An inherent aspect of any set of rules for cryptographic API misuse is the presence of false positives. Indeed, each work presumes that cryptographic API is used to secure some form of communication. While using the MD5 hash function for integrity protection is surely insecure, it can represent a viable choice for non-cryptographic applications. Yet, static rules cannot tell those applications apart. Beyond the Android platform, the OWASP organization [51] maintains a comprehensive list of static analysis tools for security scrutiny.


2.2.2 Evaluation of cryptographic API misuse

After having obtained a set of rules that suggest security violations, it is vital to explore these violations in the Android application market. While the more powerful dynamic analysis is employed in [47, 48] to show that more than half of the examined applications violate the static set of rules, the application dataset is rather small (size < 100). In contrast, the static analysis approach used by Egele et al. in [18] allowed the authors to examine a large dataset of 145 thousand benign applications, revealing that 10.4% of them use some form of cryptography. Overall, 88% of applications that employ cryptography violate at least one rule of secure cryptographic API usage. The authors used six rules that imply insecure usage of cryptography. In fact, they claim that any application violating any of the following rules cannot be secure:

1. Do not use ECB mode for encryption [52].

2. Do not use constant initialization vectors (IVs) for CBC mode of encryption [52, 53].

3. Do not use constant encryption keys.

4. Do not use constant salts for password-based encryption (PBE) [54].

5. Do not use fewer than 1 000 iterations of the underlying key-derivation function for PBE [54].

6. Do not use static seeds to initialize SecureRandom(), a pseudorandom number generator in Java.
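To make the rules concrete, a toy static check over source-like strings might look as follows. This is an illustrative sketch only: tools such as the one by Egele et al. operate on Dalvik bytecode, and the regular expressions below are our simplifications, not their rules engine:

```python
import re

# Toy static checks inspired by the six rules above (illustrative only;
# real analyses work on Dalvik bytecode, not on source text).
RULES = {
    "ECB mode": re.compile(r'Cipher\.getInstance\("[^"]*?/ECB/'),
    "constant IV": re.compile(r'new\s+IvParameterSpec\(\s*new\s+byte\[\]'),
    "constant key": re.compile(r'new\s+SecretKeySpec\(\s*new\s+byte\[\]'),
    "static PRNG seed": re.compile(r'\.setSeed\(\s*(?:\d|new\s+byte\[\])'),
}

def violations(java_source: str) -> list[str]:
    """Return the names of rules the snippet appears to violate."""
    return [name for name, pattern in RULES.items()
            if pattern.search(java_source)]

snippet = 'Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");'
print(violations(snippet))   # only the ECB rule fires
```

A pattern-based checker like this also demonstrates the false-positive problem discussed earlier: it flags every ECB transformation string, regardless of whether the cipher protects anything security-relevant.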

The identical set of rules was later utilized by Muslukhov et al. [55]. The authors gathered a new dataset of 109 thousand APKs that contain at least one cryptographic API call. At the same time, they concentrated on how misuse evolved between 2012 (studied in [18]) and 2016. Overall, they reported a ratio of applications misusing cryptographic API similar to that of 2012. Furthermore, the authors also covered the attribution of the misuse (see Section 2.2.3).

These static rules were later superseded by a more sophisticated definition language in [56]. The authors analyzed 10 thousand Android applications and detected misuses in more than 95% of cases.


Concentrating on the TLS protocol, the work [57] analyzed 13 thousand Android applications to reveal inadequate TLS usage in 8% of cases. The authors also managed to launch 41 MitM attacks against selected applications. Outside the Android platform, Anderson et al. [58] discovered a bias in the parameters of TLS connections established by malicious applications. Such a bias can allow for malware fingerprinting just by observing the parameters of the respective TLS connections.

2.2.3 Attribution of cryptographic API misuse

When aiming at the attribution of the misuse, one must know how to detect the usage of third-party libraries before deciding whether they contain insecure code that is being called. This problem has been addressed in [59, 60, 19], where the authors proposed complex matching algorithms to reliably detect third-party libraries. Later, Muslukhov et al. [55] relied on the package names and showed that 90% of cryptographic misuse cases originate from approximately 600 third-party libraries.

2.2.4 Automatic cryptographic API repairs

More distant from our research are papers that concentrate on automatic repairs of cryptographic API misuse. Such research attempts not only to discover cryptographic vulnerabilities in the source code, but also to suggest or apply automatically generated patches for such problems. From this area of research, we refer the reader to [61, 62, 63].

2.2.5 Study of cryptography in Android malware

We point out that none of the aforementioned research on the Android platform concentrated on malicious applications. To the best of our knowledge, there is no research that studies the usage of cryptographic API specifically in malicious Android applications. Moreover, the extraction and analysis of cryptographic primitives embedded in Android applications has not been studied either. Our work aims to fill these gaps in knowledge.

2.3 Cryptographic primitives in certified devices

With the start of the ubiquitous use of cryptography, governments and related institutions demanded the use of products that underwent some standardized process of specification and evaluation of their security. The word product usually denotes any piece of hardware, firmware, or software. In the respective standards, the product or device may also be called a cryptographic module or, more generally, a target of evaluation. This proposal uses these terms interchangeably.

The era of our interest began in 1994 when the first version of Common Criteria for Information Technology Security Evaluation – abbreviated as Common Criteria or CC – was released. At the core, the CC standard is very generic and is defined by several concepts:

• Target of evaluation – the subject of standardization.

• Protection profile – a document submitted by a user or a community that specifies the threats, security requirements, and assurance requirements for the desired class of products. More generally, the protection profile rigorously describes the security context of a class of products. For instance, the CC website [20] lists 21 protection profiles for Products for Digital Signatures.

• The Evaluation Assurance Level (EAL) – a number between 1 and 7 that specifies which controls were checked when evaluating the corresponding device. For instance, in order to evaluate a device against a protection profile that is EAL7 compliant, the whole source code of the device must be investigated. The level to which a target may be validated is specified in the corresponding protection profile.

The evaluation of the devices in question is run by independent parties that must themselves comply with specific standards. The certificates are then recognized across many countries around the world. Also, note that the CC framework does not state how the evaluation should be conducted, which leaves a degree of freedom to the evaluators.


Nearly in parallel to CC, FIPS 140² was released with complementary objectives. In the rest of this proposal, we concentrate on CC. As of September 2020, the most recent version of CC is Release 5 of v3.1 [20], and 4356 certified products are documented in the CC database [64]. Individual case studies exist in which various parties qualifying for certification [65, 66, 67, 68] share their lessons from the process or comment on the CC framework in general. As stated in the cited works, the evaluation process is often costly and can take months or even years to complete.

The fact that a target of evaluation passed the certificate validation does not imply the desired level of security. It merely shows that some independent party validated its security against a standardized specification. On many occasions, vulnerabilities in certified products were discovered. As an example, we mention the recent Minerva vulnerability [69] affecting the Athena IDProtect smartcard, validated both under FIPS 140-2 and at the EAL4+ level in the CC framework.

To the best of our knowledge, no research exists that would methodically evaluate the content of the CC certification documents. Two studies, however, exist with a close connection to such an idea. In 2019, Vassilev [70] studied the possibilities of developing neural networks for sentiment analysis of documents that accompany the certification process of FIPS 140-2. While the paper reports positive results on security-unrelated datasets, the application to FIPS certificates is left for future work. Still, the author emphasizes the need for automatic processing of the certification documents. A methodologically very similar paper – although with a different application domain – was written in 2018 by Harkous et al. [71]. In their work, the authors trained a neural network to automatically process arbitrary privacy policy documents³. The authors learned a privacy-centric language model and built a bot capable of answering user queries about privacy policies, providing helpful answers in more than 80% of cases.
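As a first step toward such automatic processing, even simple pattern matching can surface which primitives a certification report mentions. The sketch below is illustrative only – the pattern set and the sample sentence are our toy choices, not a validated extractor:

```python
import re
from collections import Counter

# Toy patterns for primitive mentions in certification reports (illustrative).
PRIMITIVE_RE = re.compile(
    r"\b(AES|3DES|DES|RSA|ECDSA|ECDH|SHA-?(?:1|224|256|384|512))"
    r"(?:[ -](\d{3,4}))?\b"
)

def count_primitives(report_text: str) -> Counter:
    """Count primitive mentions, keeping key/digest sizes when present."""
    counts = Counter()
    for name, size in PRIMITIVE_RE.findall(report_text):
        counts[f"{name}-{size}" if size else name] += 1
    return counts

sample = "The TOE implements AES-256 and RSA 2048; SHA-1 is used for legacy."
print(count_primitives(sample))
```

Aggregating such counts per certification year would already expose the kind of temporal trends in primitive popularity that our research targets.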

2. Security requirements for cryptographic modules, a part of the Federal Information Processing Standards managed by the National Institute of Standards and Technology (NIST). The newest version, 140-3, is available from https://csrc.nist.gov/publications/detail/fips/140/3/final.
3. The results of their research are now offered as a service at https://pribot.org/.

3 Aims of thesis

Our research aims to examine the usage of cryptographic primitives in three selected ecosystems. More specifically, we plan to:

1. Expand the study of bias in RSA keys to the private keys. We use the acquired classification models to unveil possible sources of practically factorable keys from the Internet.

2. Map the usage patterns of cryptographic API in Android malware. We also plan to learn what ratio of malicious applications misuses the cryptographic API.

3. Shed light on the popularity of cryptographic primitives in certified devices.

Due to the volume of the analyzed data in the corresponding domains, our methodology is driven by data analysis and simple machine learning techniques. Yet, we by no means aim to advance the field of machine learning; we merely use it as a way to tackle problems in computer security. The following paragraphs explain our specific goals in each of the fields of focus.

3.1 Bias in RSA keys

The problem of attributing the RSA keys to their origin library got opened by [14]. Despite the follow-up works [15, 11], little is still known about fingerprinting possibilities with private RSA keys at hand. We aim to expand on previous research to answer the following questions:

1. How much can the origin attribution of RSA keys be improved when the primes p, q are studied instead of the modulus n? How many distinct groups of keys are recognizable?

2. What models can be built to address this problem, and what performance can they achieve when classifying a single key, or even a batch of keys from the same source at once?


3. What is the origin of the yet unclassified part of GCD-factorable keys that are periodically collected as a part of the Rapid7 dataset [72]?

The methodology of our research is planned as follows. The goal is to expand the dataset from [14] to at least 150 million RSA keys. Using such a large dataset, we aim to revisit the sources of bias in the primes and expand the representatives of the bias identified in [14]. With the help of the keys' biased attributes, we plan to build a classification model capable of attributing the keys to their origin library. The performance of the model is to be evaluated. We will also measure the performance when the domain of possible sources is limited, e.g., to smartcards or to keys found in wide TLS scans. Furthermore, we will utilize the Rapid7 dataset [72] to obtain the primes of GCD-factorable keys on the Internet, as of autumn 2019. Finally, those private keys will be classified and their origin revealed, possibly uncovering new sources of factorable keys. Our objective is to release our source code together with the dataset as an open-source repository.
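The intended classification approach can be illustrated on a toy scale. The sketch below trains a naive Bayes classifier on categorical features of the primes; the concrete features (top bits of p and q, residues modulo a small prime) only mimic the kinds of bias reported in [14], and the class and feature names are illustrative, not the actual model:

```python
import math
from collections import Counter, defaultdict

def key_features(p, q):
    # Toy features mimicking reported bias: the 4 most significant bits of
    # each prime and residues modulo 3. The real feature set is richer.
    return [p >> (p.bit_length() - 4), q >> (q.bit_length() - 4), p % 3, q % 3]

class KeySourceClassifier:
    """Naive Bayes over categorical key features, with add-one smoothing."""

    def fit(self, keys, labels):
        self.priors = Counter(labels)
        self.feat_counts = defaultdict(lambda: defaultdict(Counter))
        for (p, q), lib in zip(keys, labels):
            for i, value in enumerate(key_features(p, q)):
                self.feat_counts[lib][i][value] += 1
        return self

    def log_posterior(self, lib, p, q):
        total = sum(self.priors.values())
        logp = math.log(self.priors[lib] / total)
        for i, value in enumerate(key_features(p, q)):
            counts = self.feat_counts[lib][i]
            # Smoothed per-feature likelihood P(feature = value | lib).
            logp += math.log((counts[value] + 1) /
                             (self.priors[lib] + len(counts) + 1))
        return logp

    def predict(self, p, q):
        return max(self.priors, key=lambda lib: self.log_posterior(lib, p, q))
```

Trained on real keys, such a model attributes an unseen key to the library whose bias profile it matches best; libraries with identical distributions over the feature set remain indistinguishable.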

3.2 Cryptographic API in Android malware

Our research intentions on the Android platform are divided into two phases. Firstly, we aim to answer the following questions:

1. Are system cryptographic libraries preferred over third-party libraries or home-brewed cryptography by the malware authors?

2. What categories of cryptographic primitives are most utilized in Android malware (e.g., symmetric cryptography, hash functions, algorithms, etc.)?

3. How does the usage of cryptographic API differ between benign and malicious applications?

4. Are there any temporal trends w.r.t. cryptographic API usage in Android malware?

To conduct systematic research on Android malware, we developed an open-source tool for static analysis of Android binaries. Specifically, the tool can automatically download malicious samples from the Androzoo dataset (where over 1 million samples are available). Furthermore, the tool uses the jadx decompiler to obtain the Java source code of the application. On the source code, regular expressions are matched to answer our research questions. Possibly, a program-slicing method may enhance the tool's abilities. Our objective is to analyze at least 250 000 applications.

Answers to the questions above will provide a deep understanding of the cryptographic API landscape in Android malware. Once they are found, we would like to concentrate on the misuse of cryptographic API by malware developers. Specifically, we would like to study the following:

1. How prevalent is the misuse of cryptographic API in Android malware? How does it compare to the known misuse in benign applications (as discussed in Chapter 2)?

2. What is the distribution of hard-coded cryptographic primitives?

3. If RSA keys can be retrieved from the applications, how were they generated?

4. Can the misuse possibly be exploited to block some malware functionality?

We would like to highlight our aim to acquire RSA keys generated by the malware authors and use our previous research to unveil their origin. Naturally, a prerequisite is that static RSA keys are widely adopted by the malware authors.
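The regex-matching step of the tool can be sketched as follows; the three categories and the patterns below are illustrative placeholders, not the tool's actual rule set, which covers the Android cryptographic API much more broadly:

```python
import re

# Illustrative patterns for Java cryptographic call sites in decompiled
# source. The capture group extracts the requested algorithm string.
CRYPTO_CALLS = {
    "cipher": re.compile(r'Cipher\.getInstance\(\s*"([^"]+)"'),
    "digest": re.compile(r'MessageDigest\.getInstance\(\s*"([^"]+)"'),
    "keygen": re.compile(r'KeyPairGenerator\.getInstance\(\s*"([^"]+)"'),
}

def extract_call_sites(java_source):
    """Return (category, algorithm) pairs found in one decompiled .java file."""
    sites = []
    for category, pattern in CRYPTO_CALLS.items():
        for match in pattern.finditer(java_source):
            sites.append((category, match.group(1)))
    return sites
```

Running such an extractor over every decompiled file of an APK yields the list of call sites that is later aggregated across the whole dataset.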

3.3 Cryptography in certified devices

There are two research questions in the area of CC certified devices that we attempt to answer:

1. What is the landscape of cryptographic primitives and applied security mechanisms w.r.t. the achieved evaluation assurance level?

2. How prevalent are the dependencies between the targets of evaluation, and how do they affect the impact of CVEs?


Naturally, other questions may arise during the analysis of the processed certificates. We stress that our methodology can be easily adapted to other certification schemes, like FIPS 140. Yet, to fit the estimated thesis schedule, we limit ourselves to CC. We argue that many aspects of the ecosystem of certified devices may benefit from our work. At the core of the validation process are PDF files, which are non-trivial to process automatically and extract information from. This is partly why the transparency of validation is virtually non-existent and the amount of information readily available to a security analyst is severely limited. Without the ability to query an information base about certified devices (or products undergoing a validation), several challenges for the community arise:

• Dependencies between devices (which manifest as references in the certification documents) are hard to search. For instance, when a CVE directly impacts a certified device, it is difficult to search for other systems that may be indirectly affected (e.g., by employing the affected device as a component). This is especially a concern due to the arms race between those who fix bugs and those who attempt to exploit them.

• Guidance for those who apply for the certificates is missing. Based on the selection of primitives and defenses that previous authors utilized, the candidates could learn precious insights that would make the evaluation process quicker, cheaper, and the certified devices more secure.

• The validation process is destined to remain mostly human work. As such, it may take a lot of time, and the total number of certified products corresponds with the manpower involved in their validation1. Furthermore, a human-driven review may produce errors that computers would not make. Consider, for example, a case when a single device is being validated by two different teams of reviewers who decide inconsistently about the device.

1. Recent efforts from NIST attempt to automate some of the tasks during module evaluation w.r.t. FIPS 140-2. Some are demonstrated in the Automated Cryptographic Validation Testing (ACVT) programme [73] that strives to automatically validate implemented cryptographic algorithms. This is, however, only the tip of the iceberg.
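To illustrate the dependency-search challenge, a reference graph between certificates can be recovered from the extracted document texts and queried for transitively affected devices. The sketch below assumes a single simplified certificate-ID scheme (real IDs vary by issuing country) and is not part of any existing tool:

```python
import re
from collections import defaultdict

# Hypothetical ID pattern; real German CC IDs look like "BSI-DSZ-CC-1051-2019".
CERT_ID = re.compile(r"BSI-DSZ-CC-\d{4}-\d{4}")

def reference_graph(cert_texts):
    """cert_texts: {cert_id: text extracted from the certificate's PDFs}.
    Returns {cert_id: set of other certificate IDs referenced in it}."""
    graph = defaultdict(set)
    for cert_id, text in cert_texts.items():
        graph[cert_id] = set(CERT_ID.findall(text)) - {cert_id}
    return graph

def affected_by(graph, vulnerable_id):
    """All certificates that directly or transitively reference a vulnerable
    device, i.e., candidates for indirect impact of a CVE."""
    hit, stack = set(), [vulnerable_id]
    while stack:
        current = stack.pop()
        for cert, refs in graph.items():
            if current in refs and cert not in hit:
                hit.add(cert)
                stack.append(cert)
    return hit
```

An analyst could then start from the certificate of a device hit by a CVE and immediately enumerate every certified product built on top of it.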

3.4 Limitations

The presented research proposal naturally has some limitations that have to be discussed. In the area of biased RSA keys, the following are worth noting: our classifier will fail to reliably attribute keys from previously unseen sources (as they are not a part of the dataset that trains the model). In addition, it is not possible to distinguish between libraries that produce identically distributed keys w.r.t. our feature set. Last but not least, the private key must be available to run the analysis, which may not always be possible for the user, especially when tamper-resistant devices, from which the key cannot be exported, are considered.

Considering the research performed on Android malware, it is worth noting that the whole analysis is fully automated and static. Consequently, we do not thoroughly explore how cryptographic primitives are employed in the context of the applications (e.g., to send SMS, encrypt data, et cetera). Such an analysis is extremely complex due to the variety of application contexts, and it is hardly feasible with static analysis. Furthermore, static analysis inherently produces a small percentage of false results that must be manually verified, or it may miss some findings. Similarly, when operating in the landscape of obfuscated malware applications, some instances are nearly impossible to process with automated methods due to strong obfuscation techniques employed by the authors (such as those concerning application encryption and dynamic code loading [41]).

In the area of certified systems, similar issues as with Android malware arise. If a list of rules is used to process the PDF files, some important facts may be missed. On the contrary, relying on machine learning to process the documents introduces uncertainty and the problem of explaining the machine learning decisions. In addition, this proposal does not aim to fix the root cause of the problem. Indeed, adjusting the standard so that it mandates machine-readable documents would be a much more effective, yet at the moment unrealistic, solution.

3.5 Publication venues

Research in the area of computer security is mostly published at conferences, whereas journals are more suitable for long-term investigations. Consequently, even though our publications are closely related to each other, they are better suited for conferences. For publishing our results, we strive for the following venues:

• USENIX Security Symposium, CORE A* rank.

• Annual Computer Security Applications Conference (ACSAC), CORE A rank.

• International Symposium on Research in Attacks, Intrusions and Defenses (RAID), CORE A rank.

• European Symposium on Research in Computer Security (ESORICS), CORE A rank.

• International Conference on Applied Cryptography and Network Security (ACNS), CORE B rank.

• Financial Cryptography and Data Security (FC), CORE B rank.

3.6 Tentative schedule

The past and expected future timeline of the proposed research is:

• Autumn 2018. I worked on a publication about kleptography that was accepted and presented at the WISTP conference.

• Spring 2019. I initiated the analysis of bias in RSA private keys; I was mostly studying the related work.

• Autumn 2019. We finalized the analysis of bias in RSA private keys; we also classified the practically factorized keys. In parallel, I was analyzing cryptographic API in Android malware and formulating the proposed paper’s objectives.


• Spring 2020. I finished the work on the paper studying cryptographic API in Android malware. Consequently, I was able to formulate this thesis proposal.

• Autumn 2020. I plan to submit the paper on cryptographic API in Android malware. Apart from that, I plan to perform the data analysis on the dataset of documents that accompany the evaluation of certified devices. Also, the results of our work on RSA private keys have been accepted for the ESORICS conference and presented there.

• Spring 2021. I intend to finish writing the paper about certified devices. In parallel, I plan to perform the analysis of the misuse of cryptographic API in Android malware.

• Autumn 2021. I want to submit a paper about the misuse of cryptographic API in Android malware. I intend to polish the software artifacts of the paper that treats cryptography in certified devices.

• Winter 2021. I plan to prepare the software artifacts of the misuse of cryptographic API in malware and start writing a dissertation.

• Summer or Autumn 2022. I hope to finish and submit my dissertation thesis.

4 Achieved results

This chapter summarizes our research that we consider finished. That is, the respective publications are either published [34, 21], or are just entering the submission process [38]. In Section 4.1, we sum up our research on bias in RSA keys, published at ESORICS 2020 [21]. Next, our findings about the landscape of cryptographic primitives in Android malware are reviewed in Section 4.2. The respective work was finished as of summer 2020 and is currently entering the peer-review process. Finally, in Section 4.3, we discuss kleptographically subverted primitives in the TLS protocol, as published in [34]. The rest of this chapter is adopted directly from the respective publications [21, 38, 34] that are listed at the end of the chapter, together with a description of my personal contribution.

4.1 Biased RSA private keys: Origin attribution of GCD-factorable keys

In our publication [21] we provide what we believe is the first broad examination of the properties of RSA keys with the goal of attributing a private key to its origin library. The attribution applies in multiple scenarios, e.g., to the analysis of GCD-factorable keys in the TLS domain. We investigated the properties of keys generated by 70 cryptographic libraries, identified biased features in the primes produced, and compared three models based on Bayes classifiers for the private key attribution.

The information available in private keys significantly increases the classification performance compared to the results achieved on public keys [14]. Our work makes it possible to distinguish 26 groups of sources (compared to 13 with public keys only) while achieving accuracy more than twice that of random guessing. When 100 keys are available for the classification, the correct result is almost always provided (> 99%) for 19 out of 26 groups.

Finally, we designed a method usable also for a batch of keys (from the same source) where all keys share a single prime. Such primes are


found in GCD-factorable TLS keys, where one prime was generated with insufficient randomness and would introduce a high classification error in the unmodified method. As a result, we can identify libraries responsible for the production of these GCD-factorable keys, showing that only three groups are a relevant source of such keys. The accurate classification can be easily incorporated in forensic and audit tools. In total, our model can recognize keys coming from 9 distinct groups, as can be seen in Table 4.1.

Group name    Sources of keys
Group 1       OpenSSL
Group 2       OpenSSL (8-bit fingerprint)
Group 3       Sage Blum, Sage Provable
Group 4       Mocana
Group 5|13    mbedTLS, Nettle 2.0, Sage Default
Group 6       Bouncy Castle 1.53, SunRsaSign
Group 7|11    Bouncy Castle 1.54, Mocana, Thales
Group 8|9|10  Libgcrypt, Libgcrypt FIPS, OpenSSL FIPS, WolfSSL, SafeNet, cryptlib, Botan, LibTomCrypt, Nettle 3.2, Nettle 3.3
Group 12      Utimaco HSM

Table 4.1: Performance comparison of different models on the dataset with all libraries. Note that the precision of a random guess classifier is 3.8% when considering 26 groups.
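When a whole batch of keys is known to come from a single source, the per-key evidence can simply be combined before picking the most likely group. A minimal sketch of this batch-evaluation idea (the group names and probability values are made up for illustration; the published model additionally handles the shared-prime case):

```python
import math

def classify_batch(per_key_likelihoods, priors):
    """Attribute a batch of keys from one unknown source by summing
    per-key log-likelihoods and adding the log-prior of each group.

    per_key_likelihoods: list of {group: P(key | group)} dicts, one per key.
    priors: {group: prior probability}.
    """
    scores = {}
    for group, prior in priors.items():
        scores[group] = math.log(prior) + sum(
            math.log(lik[group]) for lik in per_key_likelihoods)
    return max(scores, key=scores.get)
```

Summing log-likelihoods is what makes batch classification markedly more accurate than single-key classification: weak per-key evidence accumulates across the batch.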

We applied the fastgcd tool [74] on the Rapid7 dataset [72] to obtain private keys with more than 82 thousand primes, divided into 2511 batches. While each batch has at least 10 keys, the median of the batch size is 15. Among the batches, 88.8% exhibit the OpenSSL fingerprint. This number well confirms the previous finding by [32] that also captured the OpenSSL-specific fingerprint in a similar fraction of keys. Furthermore, we attribute 3 batches as coming from OpenSSL (8-bit fingerprint), an OpenSSL library compiled to test and avoid divisors of p − 1 only up to 251. Importantly, slightly more than 11% of the batches were generated by some library from groups 8, 9, or 10, which are not mutually distinguishable when only a single prime is available. There are also negative results to report. With the accuracy over 80%


(for a batch size of 15) and no batches attributed to any of groups 3, 4, 6, 12, 5|13, or 7|11, it is very improbable that any GCD-factorable keys originate from the respective sources in these libraries.
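For context, the GCD-factorable keys themselves are obtained by computing GCDs across the collected moduli: two keys sharing a prime are factored instantly. The quadratic sketch below only illustrates the principle; the fastgcd tool [74] scales this to millions of keys using a quasilinear product-tree batch-GCD algorithm:

```python
from math import gcd

def shared_prime_factors(moduli):
    """Naive pairwise GCD over RSA moduli. Any two moduli that share a
    prime factor reveal it as their GCD, which immediately factors both.
    Returns {modulus: (p, q)} for every modulus factored this way."""
    factored = {}
    for i, n in enumerate(moduli):
        for m in moduli[i + 1:]:
            g = gcd(n, m)
            if 1 < g < n:  # shared prime found
                factored[n] = (g, n // g)
                factored[m] = (g, m // g)
    return factored
```

Once a modulus is factored this way, its primes can be fed to the classifier described above to attribute the weak key to its origin library.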

While the bias in the keys usually does not help with factorization, the cryptographic libraries should approach their key generation design with great care, as strong bias can lead to weak keys [11]. We recommend following a key generation process with as little bias as possible.

4.2 Cryptographic API in Android malware

In our study, we provide a large-scale analysis of how cryptographic API was typically employed in Android malware in 2012–2018. We analyzed more than 250 000 applications and extracted nearly 1 million call sites that we then extensively analyzed. Each of the applications was collected in the form of an Android application package (APK) that encompasses the complete artifacts of an Android application. We analyzed the APKs by means of static analysis. The APKs are processed in parallel by our system, and each APK traverses the following pipeline (depicted in Figure 4.1):

1. Pre-processor. This module decompiles the APKs to obtain information embedded in their codebase. Then, the third-party packages of the APKs are identified, and the whole Java source code of the APK is extracted.

2. Crypto-extractor. This module extracts and analyzes the cryp- tographic function call sites. Accordingly, it also detects both Java and native third-party cryptographic libraries.

3. Evaluator. This module stores and organizes the information retrieved from the analyzed APK into a JSON record.

The whole dataset of APKs accordingly produces a JSON report containing information about cryptographic API usage. Finally, we automatically processed the report. The attained results showed the following interesting trends:


[Figure 4.1 pipeline: INPUT (configuration file and APKs) → PRE-PROCESSOR (download from Androzoo or load APK; decompile; detect 3rd-party packages) → CRYPTO-EXTRACTOR (collect 3rd-party cryptographic libs; collect Android cryptographic API) → EVALUATOR (generate cryptography usage report) → OUTPUT (cryptography usage report)]

Figure 4.1: The system architecture diagram. The samples can be loaded from disk or downloaded from the Androzoo dataset [75]. They are processed in parallel and each APK traverses the depicted pipeline separately, contributing to the final report.

1. Very limited use of third-party cryptographic libraries. Our analysis showed that malware authors favor the use of system-based libraries to perform cryptographic operations.

2. Use of weak hash functions. The majority of malicious applications featuring crypto routines resorted to using the weak MD5 hash function. According to the results obtained for other hash functions, we speculate that such wide adoption is meant for operations that do not require strong integrity protection.

3. Progressive growth of public-key cryptography in malware. The attained results exhibit a constant prevalence of symmetric cryptography in malware between 2012 and 2018. In comparison, public-key cryptography, represented by the RSA scheme, shows a rapid rise in popularity in 2015, possibly caused by the rise of ransomware in Android.

4. Progressive decrease in the use of cryptography. Interestingly, the relative number of malicious applications that employ cryptographic API is decreasing over time. We speculate that this may be related to a switch of the adversary's goals from information theft to, for example, directly locking the device.

5. Contrast between malicious and benign usage of cryptography. Our study shows that, in general, cryptographic API is much more employed in malware than in benign samples (considering the proportions of released APKs).

6. Late migration from DES to AES. It is worth pointing out one aspect of the comparison with benign samples especially. In the category of symmetric encryption, malware authors exhibited a late migration from the insecure DES to the modern AES. While AES was the most popular cipher in benign samples already in 2012, it was only in 2015 that AES overtook DES in the malicious dataset.
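Trends such as these are computed by aggregating the per-APK JSON records produced by the evaluator. A toy aggregation over hash-function call sites might look like the sketch below; the record field names (`year`, `hash_calls`) are assumptions for illustration, not the tool's actual schema:

```python
import json
from collections import Counter, defaultdict

def hash_usage_per_year(report_lines):
    """Aggregate per-APK JSON records (one JSON object per line) into
    yearly counts of hash-algorithm call sites."""
    per_year = defaultdict(Counter)
    for line in report_lines:
        record = json.loads(line)
        for algo in record["hash_calls"]:
            per_year[record["year"]][algo] += 1
    return per_year
```

Plotting such per-year counters is how temporal observations like the MD5 prevalence or the DES-to-AES migration become visible.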

4.3 Kleptography in real-world TLS

Albeit distant to our thesis aims, yet not unrelated, is our study of kleptography in the TLS protocol. Kleptography, as defined by Young and Yung in 1996, is the art of stealing information securely and subliminally from cryptographic devices. The cornerstone of kleptography is the so-called asymmetric backdoor. Such a backdoor gets implanted into the desired protocol and weakens the protocol exclusively for the attacker. For other parties, the protocol remains as secure as without the backdoor. Additionally, it should not be possible to discover the backdoor without inspecting the internals of the infected device, e.g., by looking only at the outputs of the protocol. This property, however, cannot always be fulfilled. In our paper [34], we answered the question of whether such a backdoor could be practically implemented in a TLS library. As shown by multiple studies [76, 77, 78, 79], even RSA keys may be backdoored similarly. This fact underlines the need for a study of bias in

25 4. Achieved results

the RSA keys, as well as of the feasibility of implementing such backdoors into popular libraries.

Our efforts resulted in a design of an asymmetric backdoor for all versions of the TLS protocol. Such a backdoor can be used to exfiltrate session keys from a captured handshake by a passive eavesdropper, leading to a loss of confidentiality and authenticity of the whole session. We also demonstrated that it is fairly simple to implement the backdoor into an open-source TLS library while maintaining a reasonable performance of the library. Furthermore, our efforts show that, under the assumption that AES is a pseudorandom function, the backdoored cryptographic primitives cannot be distinguished from clean primitives in polynomial time. Consequently, it is unreasonable to attempt to detect such a backdoor based on biased primitives, as may be possible for RSA keys.

We stress that to install our backdoor, the adversary must have access to the target device. In such cases, other dangerous scenarios arise; we mention ransomware as an example. However, the important property of our backdoor is that it may stay unnoticed for a long time on the target device. Also, it may be embedded into particular hardware by its manufacturer or by organizations with sufficient resources. We also showed that timing analysis might prove an effective defense, depending on the powers of the defender.
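For intuition only, the sketch below shows a symmetric-key toy variant of subliminal exfiltration through a "random" protocol field. It is not the construction from our paper, which uses an asymmetric backdoor (so that extracting one infected device does not endanger others) embedded in TLS primitives; here a keyed PRF pad merely illustrates why the field looks uniformly random to everyone except the attacker:

```python
import hashlib
import hmac
import secrets

# Symmetric key embedded in the (hypothetical) infected device. A real
# kleptographic backdoor would embed the attacker's PUBLIC key instead.
ATTACKER_KEY = secrets.token_bytes(32)

def backdoored_nonce(session_secret, counter):
    """Emit a 32-byte 'nonce' that hides session_secret. Without
    ATTACKER_KEY the output is indistinguishable from random bytes
    (assuming HMAC-SHA256 is a secure PRF)."""
    pad = hmac.new(ATTACKER_KEY, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(pad, session_secret))

def attacker_recover(nonce, counter):
    """The attacker regenerates the pad and unmasks the session secret."""
    pad = hmac.new(ATTACKER_KEY, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return bytes(a ^ b for a, b in zip(pad, nonce))
```

Note the weakness of this toy variant compared to the asymmetric design: anyone who extracts ATTACKER_KEY from one device can decrypt the exfiltrated secrets of all devices.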

4.4 List of publications

I am the main author of the following publications:

• A. Janovsky, M. Nemec, P. Svenda, P. Sekan, and V. Matyas, “Biased RSA Private Keys: Origin Attribution of GCD-Factorable Keys,” in European Symposium on Research in Computer Security (ESORICS). Springer, 2020. My contribution forms approximately 45% of the work. I designed, implemented, and evaluated the classification models and classified the dataset of GCD-factorized keys.

• A. Janovsky, J. Krhovjak, and V. Matyas, “Bringing Kleptography to Real-World TLS,” in IFIP International Conference on Information Security Theory and Practice. Springer, 2018, pp. 15–27. This paper is almost solely my own work. My contribution forms approximately 90% of the work, and I received help especially during the editing phase.

In addition, we are about to submit (September 2020) the following paper:

• A. Janovsky, D. Maiorca, G. Giacinto, and V. Matyas, “A Large-Scale Exploratory Study of Cryptographic API in Android Malware,” conference yet to be decided. My contribution forms approximately 60% of the work. I designed and implemented the scripts for the collection of cryptographic API from malware binaries. Furthermore, I helped to evaluate the results and shaped a large portion of the text.

If the reviewers happen to need this paper to complete a review of this proposal (or are just interested in reading it), please contact the author, who will promptly provide the text.

Bibliography

[1] National Institute of Standards and Technology (NIST), “Announcing the ADVANCED ENCRYPTION STANDARD (AES),” 2001, [cit. 2020-09-03]. Available from https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf.

[2] National Institute of Standards and Technology (NIST), “SHA-3 standard: Permutation-based hash and extendable-output functions,” 2015, [cit. 2020-09-03]. Available from https://doi.org/10.6028/NIST.FIPS.202.

[3] D. J. Bernstein, “Curve25519: New Diffie-Hellman speed records,” 2005, [cit. 2020-09-03]. Available from https://cr.yp.to/ecdh/curve25519-20060209.pdf.

[4] R. Anderson, “Why cryptosystems fail,” in Proceedings of the 1st ACM Conference on Computer and Communications Security, 1993, pp. 215–227.

[5] D. Lazar, H. Chen, X. Wang, and N. Zeldovich, “Why does cryptographic software fail? A case study and open problems,” in Proceedings of the 5th Asia-Pacific Workshop on Systems, 2014, pp. 1–7.

[6] S. Nadi, S. Krüger, M. Mezini, and E. Bodden, “Jumping through hoops: Why do Java developers struggle with cryptography APIs?” in Proceedings of the 38th International Conference on Software Engineering, 2016, pp. 935–946.

[7] Y. Acar, M. Backes, S. Fahl, D. Kim, M. L. Mazurek, and C. Stransky, “You get where you’re looking for: The impact of information sources on code security,” in 2016 IEEE Symposium on Security and Privacy (SP). IEEE, 2016, pp. 289–305.

[8] E. Biham and O. Dunkelman, “Cryptanalysis of the A5/1 GSM stream cipher,” in International Conference on Cryptology in India. Springer, 2000, pp. 43–51.


[9] A. Biryukov, A. Shamir, and D. Wagner, “Real time cryptanalysis of A5/1 on a PC,” in International Workshop on Fast Software Encryption. Springer, 2000, pp. 1–18.

[10] S. Fluhrer, I. Mantin, and A. Shamir, “Weaknesses in the key scheduling algorithm of RC4,” in International Workshop on Selected Areas in Cryptography. Springer, 2001, pp. 1–24.

[11] M. Nemec, M. Sys, P. Svenda, D. Klinec, and V. Matyas, “The return of Coppersmith’s attack: Practical factorization of widely used RSA moduli,” in 24th ACM Conference on Computer and Communications Security (CCS’2017). ACM, 2017, pp. 1631–1648.

[12] D. Adrian, K. Bhargavan, Z. Durumeric, P. Gaudry, M. Green, J. A. Halderman, N. Heninger, D. Springall, E. Thomé, L. Valenta et al., “Imperfect forward secrecy: How Diffie-Hellman fails in practice,” in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, pp. 5–17.

[13] N. J. Al Fardan and K. G. Paterson, “Lucky thirteen: Breaking the TLS and DTLS record protocols,” in 2013 IEEE Symposium on Security and Privacy. IEEE, 2013, pp. 526–540.

[14] P. Svenda, M. Nemec, P. Sekan, R. Kvasnovsky, D. Formanek, D. Komarek, and V. Matyas, “The million-key question — Investigating the origins of RSA public keys,” in Proceedings of the USENIX Security Symposium, 2016, pp. 893–910.

[15] M. Nemec, D. Klinec, P. Svenda, P. Sekan, and V. Matyas, “Measuring popularity of cryptographic libraries in Internet-wide scans,” in Proceedings of the 33rd Annual Computer Security Applications Conference. ACM, 2017, pp. 162–175.

[16] Statista, “Android - statistics & facts,” 2020, [cit. 2020-07-13]. Available from https://www.statista.com/topics/876/android/.

[17] McAfee Labs, “McAfee labs threats report, august 2019,” 2019, [cit. 2020-04-25]. Available from https://www.mcafee.com/enterprise/en-us/threat-center/mcafee-labs/reports.html.

[18] M. Egele, D. Brumley, Y. Fratantonio, and C. Kruegel, “An empirical study of cryptographic misuse in Android applications,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security - CCS ’13. NY, United States: ACM Press, 2013, pp. 73–84.

[19] M. Backes, S. Bugiel, and E. Derr, “Reliable third-party library detection in Android and its security applications,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. NY, United States: ACM Press, 2016, pp. 356–367.

“Common Criteria for Information Technology Security Evaluation,” 2020, version 3.1, revision 5. [cit. 2020-09-03]. Available from https://www.commoncriteriaportal.org/cc/.

[21] A. Janovsky, M. Nemec, P. Svenda, P. Sekan, and V. Matyas, “Biased RSA private keys: Origin attribution of GCD-factorable keys,” in European Symposium on Research in Computer Security (ESORICS). Springer, 2020.

[22] Z. Durumeric, Z. Ma, D. Springall, R. Barnes, N. Sullivan, E. Bursztein, M. Bailey, J. A. Halderman, and V. Paxson, “The security impact of HTTPS interception,” in Network and Distributed Systems Symposium. The Internet Society, 2017.

[23] Z. Durumeric, J. Kasten, M. Bailey, and J. A. Halderman, “Analysis of the HTTPS certificate ecosystem,” in Proceedings of the 2013 ACM Internet Measurement Conference. ACM, 2013, pp. 291–304.

[24] Electronic Frontier Foundation, “The EFF SSL Observatory,” 2010, [cit. 2020-07-13]. Available from https://www.eff.org/observatory.

[25] M. Barbulescu, A. Stratulat, V. Traista-Popescu, and E. Simion, “RSA weak public keys available on the Internet,” in International Conference for Information Technology and Communications. Springer-Verlag, 2016, pp. 92–102.

[26] M. R. Albrecht, J. P. Degabriele, T. B. Hansen, and K. G. Paterson, “A surfeit of SSH cipher suites,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 1480–1491.


[27] J. Gustafsson, G. Overier, M. Arlitt, and N. Carlsson, “A first look at the CT landscape: Certificate Transparency logs in practice,” in Proceedings of the 18th Passive and Active Measurement Conference. Springer-Verlag, 2017, pp. 87–99.

[28] F. Cangialosi, T. Chung, D. Choffnes, D. Levin, B. M. Maggs, A. Mislove, and C. Wilson, “Measurement and analysis of private key sharing in the HTTPS ecosystem,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2016, pp. 628–640.

[29] B. VanderSloot, J. Amann, M. Bernhard, Z. Durumeric, M. Bailey, and J. A. Halderman, “Towards a complete view of the certificate ecosystem,” in Proceedings of the 2016 ACM on Internet Measurement Conference. ACM, 2016, pp. 543–549.

[30] A. K. Lenstra, J. P. Hughes, M. Augier, J. W. Bos, T. Kleinjung, and C. Wachter, “Ron Was Wrong, Whit Is Right,” Cryptology ePrint Archive, Report 2012/064, 2012, [cit. 2020-07-13]. Available from https://eprint.iacr.org/2012/064.

[31] N. Heninger, Z. Durumeric, E. Wustrow, and J. A. Halderman, “Mining your Ps and Qs: Detection of widespread weak keys in network devices,” in Proceedings of the USENIX Security Symposium. USENIX, 2012, pp. 205–220.

[32] M. Hastings, J. Fried, and N. Heninger, “Weak keys remain widespread in network devices,” in Proceedings of the 2016 ACM on Internet Measurement Conference. ACM, 2016, pp. 49–63.

[33] I. Mironov, “Factoring RSA Moduli II.” 2012, [cit. 2020-07-13]. Available from https://windowsontheory.org/2012/05/17/factoring-rsa-moduli-part-ii/.

[34] A. Janovsky, J. Krhovjak, and V. Matyas, “Bringing kleptography to real-world TLS,” in WISTP International Conference on Information Security Theory and Practice. Springer, 2018, pp. 15–27.

[35] D. Coppersmith, “Finding a small root of a bivariate integer equation; factoring with high bits known,” in International Conference on the Theory and Applications of Cryptographic Techniques. Springer, 1996, pp. 178–189.

[36] “CVE-2017-15361,” NVD NIST, CVE-ID CVE-2017-15361, 2017, [cit. 2020-08-30]. Available from https://nvd.nist.gov/vuln/detail/CVE-2017-15361.

[37] A. Parsovs, “Estonian electronic identity card: Security flaws in key management,” in 29th USENIX Security Symposium. USENIX Association, 2020.

[38] A. Janovsky, D. Maiorca, G. Giacinto, and V. Matyas, “A large-scale exploratory study of cryptographic API in Android malware,” to be submitted at a security conference.

[39] D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, and K. Rieck, “Drebin: Efficient and explainable detection of Android malware in your pocket,” in Proceedings of the 2014 Network and Distributed System Security Symposium. San Diego, CA: The Internet Society, 2014, pp. 23–26.

[40] N. Andronio, S. Zanero, and F. Maggi, “HelDroid: Dissecting and detecting mobile ransomware,” in Recent Advances in Intrusion Detection (RAID). Cham, Switzerland: Springer, 2015, pp. 382–404.

[41] D. Maiorca, D. Ariu, I. Corona, M. Aresu, and G. Giacinto, “Stealth attacks: An extended insight into the obfuscation effects on Android malware,” Computers & Security, vol. 51, no. C, pp. 16–31, Jun. 2015.

[42] K. Tam, S. J. Khan, A. Fattori, and L. Cavallaro, “CopperDroid: Automatic reconstruction of Android malware behaviors,” in Proc. 22nd Annual Network & Distributed System Security Symposium (NDSS). San Diego, United States: The Internet Society, 2015, pp. 1–15.

[43] S. Chen, M. Xue, Z. Tang, L. Xu, and H. Zhu, “StormDroid: A streaminglized machine learning-based system for detecting Android malware,” in Proceedings of the 11th ACM on Asia Conference on

32 BIBLIOGRAPHY

Computer and Communications Security, ser. ASIA CCS ’16. New York, NY, USA: ACM, 2016, pp. 377–388.

[44] A. Demontis, M. Melis, B. Biggio, D. Maiorca, D. Arp, K. Rieck, I. Corona, G. Giacinto, and F. Roli, “Yes, machine learning can be more secure! A case study on Android malware detection,” IEEE Transactions on Dependable and Secure Computing, vol. 16, no. 4, pp. 711–724, 2017.

[45] J. Garcia, M. Hammad, and S. Malek, “Lightweight, obfuscation-resilient detection and family identification of Android malware,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 26, no. 3, pp. 11:1–11:29, Jan. 2018.

[46] M. Melis, D. Maiorca, B. Biggio, G. Giacinto, and F. Roli, “Explaining black-box Android malware detection,” in 26th European Signal Processing Conference, EUSIPCO 2018. Rome, Italy: IEEE, 2018, pp. 524–528.

[47] A. Chatzikonstantinou, C. Ntantogian, G. Karopoulos, and C. Xenakis, “Evaluation of cryptography usage in Android applications,” in Proceedings of the 9th EAI International Conference on Bio-Inspired Information and Communications Technologies (Formerly BIONETICS), S. Junichi, N. Tadashi, and H. Henry, Eds. NY, United States: ACM Press, 2016, pp. 83–90.

[48] S. Shuai, D. Guowei, G. Tao, Y. Tianchang, and S. Chenjie, “Modelling analysis and auto-detection of cryptographic misuse in Android applications,” in 2014 IEEE 12th International Conference on Dependable, Autonomic and Secure Computing. Dalian: IEEE, 2014, pp. 75–80.

[49] R. Paletov, P. Tsankov, V. Raychev, and M. Vechev, “Inferring crypto API rules from code changes,” in Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation - PLDI 2018. Philadelphia, PA, USA: ACM Press, 2018, pp. 450–464.

[50] J. Gao, P. Kong, L. Li, T. F. Bissyande, and J. Klein, “Negative results on mining crypto-API usage rules in Android apps,” in 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR). QC, Canada: IEEE, 2019, pp. 388–398.

[51] OWASP, “Source code analysis tools,” 2020, [cit. 2019-09-06]. Available from https://owasp.org/www-community/Source_Code_Analysis_Tools.

[52] A. J. Menezes, J. Katz, P. C. Van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996.

[53] B. Moeller, “Security of CBC ciphersuites in SSL/TLS: Problems and countermeasures,” 2001, [cit. 2020-09-02]. Available from https://www.openssl.org/~bodo/tls-cbc.txt.

[54] M. Bellare, T. Ristenpart, and S. Tessaro, “Multi-instance security and its application to password-based cryptography,” in Annual Cryptology Conference. Springer, 2012, pp. 312–329.

[55] I. Muslukhov, Y. Boshmaf, and K. Beznosov, “Source attribution of cryptographic API misuse in Android applications,” in Proceedings of the 2018 on Asia Conference on Computer and Communications Security - ASIACCS ’18. Incheon, Republic of Korea: ACM Press, 2018, pp. 133–146.

[56] S. Krüger, J. Späth, K. Ali, E. Bodden, and M. Mezini, “CrySL: An extensible approach to validating the correct usage of cryptographic APIs,” in 32nd European Conference on Object-Oriented Programming (ECOOP 2018), ser. Leibniz International Proceedings in Informatics (LIPIcs), T. Millstein, Ed., vol. 109. Dagstuhl, Germany: Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, 2018, pp. 10:1–10:27.

[57] S. Fahl, M. Harbach, T. Muders, M. Smith, L. Baumgärtner, and B. Freisleben, “Why Eve and Mallory love Android: An analysis of Android SSL (in)security,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security - CCS ’12. NY, United States: ACM Press, 2012, pp. 50–61.

[58] B. Anderson, S. Paul, and D. McGrew, “Deciphering malware’s use of TLS (without decryption),” Journal of Computer Virology and Hacking Techniques, vol. 14, no. 3, pp. 195–211, 2018.


[59] H. Wang, Y. Guo, Z. Ma, and X. Chen, “WuKong: A scalable and accurate two-phase approach to Android app clone detection,” in Proceedings of the 2015 International Symposium on Software Testing and Analysis - ISSTA 2015. Baltimore, MD, USA: ACM Press, 2015, pp. 71–82.

[60] Z. Ma, H. Wang, Y. Guo, and X. Chen, “LibRadar: Fast and accurate detection of third-party libraries in Android apps,” in Proceedings of the 38th International Conference on Software Engineering Companion - ICSE ’16. Austin, Texas: ACM Press, 2016, pp. 653–656.

[61] S. Ma, D. Lo, T. Li, and R. H. Deng, “CDRep: Automatic repair of cryptographic misuses in Android applications,” in Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security - ASIA CCS ’16. Xi’an, China: ACM Press, 2016, pp. 711–722.

[62] X. Zhang, Y. Zhang, J. Li, Y. Hu, H. Li, and D. Gu, “Embroidery: Patching vulnerable binary code of fragmentized Android devices,” in 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). Shanghai: IEEE, 2017, pp. 47–57.

[63] L. Singleton, R. Zhao, M. Song, and H. Siy, “FireBugs: Finding and repairing bugs with security patterns,” in 2019 IEEE/ACM 6th International Conference on Mobile Software Engineering and Systems (MOBILESoft). Montreal, QC, Canada: IEEE, 2019, pp. 30–34.

[64] “Common Criteria: Certified products list - statistics,” 2020, [cit. 2020-09-03]. Available from https://www.commoncriteriaportal.org/products/stats/.

[65] F. Keblawi and D. Sullivan, “Applying the Common Criteria in systems engineering,” IEEE Security & Privacy, vol. 4, no. 2, pp. 50–55, 2006.

[66] M. S. Merkow and J. Breithaupt, Computer Security Assurance Using the Common Criteria. Cengage Learning, 2004.

[67] G. Bossert and F. Guihery, “Security evaluation of communication protocols in Common Criteria,” in Proceedings of the IEEE International Conference on Communications, Ottawa, ON, Canada, 2012, pp. 10–15.

[68] E. Venson, X. Guo, Z. Yan, and B. Boehm, “Costing secure software development: A systematic mapping study,” in Proceedings of the 14th International Conference on Availability, Reliability and Security, 2019, pp. 1–11.

[69] J. Jancar, V. Sedlacek, P. Svenda, and M. Sys, “Minerva: The curse of ECDSA nonces; systematic analysis of lattice attacks on noisy leakage of bit-length of ECDSA nonces,” in Conference on Cryptographic Hardware and Embedded Systems (CHES) 2020. Ruhr-University Bochum, Transactions on Cryptographic Hardware and Embedded Systems, 2020.

[70] A. Vassilev, “Bowtie - a deep learning feedforward neural network for sentiment analysis,” in International Conference on Machine Learning, Optimization, and Data Science. Springer, 2019, pp. 360–371.

[71] H. Harkous, K. Fawaz, R. Lebret, F. Schaub, K. G. Shin, and K. Aberer, “Polisis: Automated analysis and presentation of privacy policies using deep learning,” in 27th USENIX Security Symposium (USENIX Security 18), 2018, pp. 531–548.

[72] Rapid7, “Rapid 7 Sonar SSL Full IPv4 Scan,” 2019, [cit. 2020-07-13]. Available from https://opendata.rapid7.com/sonar.ssl/.

[73] National Institute of Standards and Technology (NIST), “Automated cryptographic validation testing (ACVT),” 2016, [cit. 2020-09-03]. Available from https://csrc.nist.gov/Projects/Automated-Cryptographic-Validation-Testing.

[74] N. Heninger and J. A. Halderman, “Fastgcd,” 2015, [cit. 2020-07-13]. Available from https://github.com/sagi/fastgcd.

[75] K. Allix, T. F. Bissyandé, J. Klein, and Y. Le Traon, “AndroZoo: Collecting millions of Android apps for the research community,” in Proceedings of the 13th International Workshop on Mining Software Repositories - MSR ’16. Texas, USA: ACM Press, 2016, pp. 468–471.


[76] A. Young and M. Yung, “An elliptic curve asymmetric backdoor in OpenSSL RSA key generation,” 2005.

[77] A. Young and M. Yung, “A space efficient backdoor in RSA and its applications,” in International Workshop on Selected Areas in Cryptography. Springer, 2005, pp. 128–143.

[78] A. Young and M. Yung, “A timing-resistant elliptic curve backdoor in RSA,” in International Conference on Information Security and Cryptology. Springer, 2007, pp. 427–441.

[79] A. V. Markelova, “Embedding asymmetric backdoors into the RSA key generator,” Journal of Computer Virology and Hacking Techniques, pp. 1–10, 2020.

Appendices

A Attached publications of the author

The full re-prints of the author’s publications are attached here. Specifically, the publication Biased RSA private keys: Origin attribution of GCD-factorable keys can be found in Appendix A.1, and the article Bringing kleptography to real-world TLS can be found in Appendix A.2.

A.1 Biased RSA private keys: Origin attribution of GCD-factorable keys

The paper starts at the next page.

Biased RSA private keys: Origin attribution of GCD-factorable keys

Adam Janovsky1,2, Matus Nemec3, Petr Svenda1, Peter Sekan1, and Vashek Matyas1

1 Masaryk University, Czech Republic [email protected]
2 Invasys, Czech Republic
3 Linköping University, Sweden

Abstract. In 2016, Švenda et al. (USENIX 2016, The Million-key Question) reported that the implementation choices in cryptographic libraries allow for qualified guessing about the origin of public RSA keys. We extend the technique to two new scenarios when not only public but also private keys are available for the origin attribution – analysis of a source of GCD-factorable keys in IPv4-wide TLS scans and forensic investigation of an unknown source. We learn several representatives of the bias from the private keys to train a model on more than 150 million keys collected from 70 cryptographic libraries, hardware security modules and cryptographic smartcards. Our model not only doubles the number of distinguishable groups of libraries (compared to public keys from Švenda et al.) but also improves more than twice in accuracy w.r.t. random guessing when a single key is classified. For a forensic scenario where at least 10 keys from the same source are available, the correct origin library is identified with an average accuracy of 89%, compared to the 4% accuracy of a random guess. The technique was also used to identify libraries producing GCD-factorable TLS keys, showing that only three groups are the probable suspects.

Keywords: Cryptographic library · RSA factorization · Measurement · RSA key classification · Statistical model.

1 Introduction

The ability to attribute a cryptographic key to the library it was generated with is a valuable asset providing direct insight into cryptographic practices. The slight bias found specifically in the primes of RSA private keys generated by the OpenSSL library [14] allowed to track down the devices responsible for keys found in TLS IPv4-wide scans that were in fact factorable by the distributed GCD algorithm. Further work [23] made the method generic and showed that many other libraries produce biased keys allowing for the origin attribution. As a result, both separate keys, as well as large datasets, could be analyzed for their origin libraries. The first-ever explicit measurement of cryptographic library popularity was introduced in [18], showing the increasing dominance of the OpenSSL library on the market. Furthermore, very uncommon characteristics of the library used by Infineon smartcards allowed for their entirely accurate classification. Importantly, this led to a discovery that the library is, in fact, producing practically factorable keys [19]. Consequently, more than 20 million eID certificates with vulnerable keys were revoked in Europe alone. The same method allowed to identify keys originating from unexpected sources in Estonian eIDs. Eventually, the unexpected keys were shown to be injected from outside instead of being generated on-chip as mandated by the institutional policy [20].

While properties of RSA primes were analyzed to understand the bias detected in public keys, no previous work addressed the origin attribution problem with the knowledge of private keys. The reason may sound understandable – while the public keys are readily available in most usage domains, the private keys shall be kept secret, therefore unavailable for such scrutiny. Yet there are at least two important scenarios for their analysis: 1) tracking sources of GCD-factorable keys from large TLS scans and 2) a forensic identification of black-box devices with the capability to export private keys (e.g., an unknown smartcard, a remote key generation service, or an in-house investigation of cryptographic services). The mentioned case of unexpected keys in Estonian eIDs [20] is a practical example of a forensic scenario, but with the use of public keys only.

⋆ Full details, datasets and paper supplementary material can be found at https://crocs.fi.muni.cz/papers/privrsa_esorics20
The analysis based on private keys can spot even a smaller deviance from the expected origin as the bias is observed closer to the place of its inception. This work aims to fill this gap in knowledge by a careful examination of both scenarios. We first provide a solid coverage of RSA key sources used in the wild by expanding upon the dataset first released in [23]. During our work, we more than doubled the number of keys in the dataset, gathered from over 70 distinct cryptographic software libraries, smartcards, and hardware security modules (HSMs). Benefiting from 158.8 million keys, we study the bias affecting the primes p and q. We transform known biased features of public keys to their private key analogues and evaluate how they cluster sources of RSA keys into groups. We use the features in multiple variants of Bayes classifier that are trained on 157 million keys. Subsequently, we evaluate the performance of our classifiers on further 1.8 million keys isolated from the whole dataset. By doing so, we establish the reliability results for the forensic case of use, when keys from a black-box system are under scrutiny. On average, when looking at just a single key, our best model is able to correctly classify 47% of cases when all libraries are considered and 64.6% of keys when the specific sub-domain of smartcards is considered. These results allow for much more precise classification compared to the scenario when only public keys are available. Finally, we use the best-performing classification method to analyze the dataset of GCD-factorable RSA keys from the IPv4-wide TLS scan collected by Rapid7 [21].
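GCD-factorable keys arise when two TLS moduli share a prime factor, so their greatest common divisor reveals that prime. A minimal stand-alone sketch of the idea (naive pairwise GCDs over toy primes; real internet-wide analyses use a Bernstein-style product-tree batch-GCD to scale to millions of moduli):

```python
from math import gcd

def pairwise_factor(moduli):
    """Return {modulus: shared prime} for moduli sharing a factor.

    Naive O(n^2) illustration of GCD-factorable keys; production
    analyses use a product-tree batch-GCD instead.
    """
    factored = {}
    for i, n in enumerate(moduli):
        for m in moduli[i + 1:]:
            g = gcd(n, m)
            if 1 < g < n:        # shared prime => both moduli factor
                factored[n] = g
                factored[m] = g
    return factored

# Toy primes standing in for 1024-bit ones; 101 is reused by two keys.
keys = [101 * 103, 101 * 107, 109 * 113]
print(pairwise_factor(keys))     # → {10403: 101, 10807: 101}
```

Once the shared prime g is known, the cofactor n // g completes the factorization, which is what makes the private primes of such keys available for classification.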

The main contributions of this paper are:
– A systematic mapping of biased features of RSA keys evaluated on a more exhaustive set of cryptographic libraries, described in Section 2. The dataset (made publicly available for other researchers) leads to 26 total groups of libraries distinguishable based on the features extracted from the value of RSA private key(s).
– Detailed evaluation of the dataset on Bayes classifiers in Section 3 with an average accuracy above 47% where only a single key is available, and almost 90% when ten keys are available.
– An analysis of the narrow domain of cryptographic smartcards and libraries used for TLS results in an even higher accuracy, as shown in Section 4.
– Practical analysis of real-world sources of GCD-factorable RSA keys from public TLS servers obtained from internet-wide scans in Section 5.

The paper roadmap has been partly outlined above; Section 7 then shows related work and Section 8 concludes our paper.

2 Bias in RSA keys

Various design and implementation decisions in the algorithms for generating RSA keys influence the distributions of produced RSA keys. A specific type of bias was used to identify OpenSSL as the origin of a group of private keys [17]. Systematic studies of a wide range of libraries [23,18] described more reasons for biases in RSA keys in a surprising number of libraries. In the majority of cases, the bias was not strong enough to help factor the keys more efficiently. Previous research [23] identified multiple sources of bias that our observations from a large dataset of private RSA keys confirm:

1. Performance optimizations, e.g., most significant bits of primes set to a fixed value to obtain RSA moduli of a defined length.
2. Type of primes: probable, strong, and provable primes:
   – For probable primes, whether candidate values for primes are chosen randomly or a single starting value is incremented until a prime is found.
   – When generating candidates for probable primes, small factors are avoided in the value of p − 1 by multiple implementations without explanation.
   – Blum integers are sometimes used for RSA moduli – both RSA primes are congruent to 3 modulo 4.
   – For strong primes, the size of the auxiliary prime factors of p − 1 and p + 1 is biased.
   – For provable primes, the recursive algorithm can create new primes of double to triple the binary length of a given prime; usually one version of the algorithm is chosen.
3. Ordering of primes: are the RSA primes in the private key ordered by size?
4. Proprietary algorithms, e.g., the well-documented case of the Infineon fast prime key generation algorithm [19].
5. Bias in the output of a PRNG: often observable only from a large number of keys from the same source.
6. Natural properties of primes that do not depend on the implementation.
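Several of the bias sources above can be made concrete in code. The sketch below (illustrative only, not any library’s actual implementation) generates a probable prime exhibiting three of the listed biases: top bits fixed for a defined modulus length, small odd factors of p − 1 avoided, and the Blum condition p ≡ 3 (mod 4):

```python
import random

def is_prime(n, rounds=20):
    """Miller-Rabin probabilistic primality test (stdlib-only sketch)."""
    if n < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % sp == 0:
            return n == sp
    d, r = n - 1, 0
    while d % 2 == 0:
        d, r = d // 2, r + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def biased_prime(bits, blum=True, avoid_small=(3, 5)):
    """Prime with three biases from the list: two fixed top bits (#1),
    small odd factors of p-1 avoided (#2), and p = 3 mod 4 so the
    modulus becomes a Blum integer (#2)."""
    while True:
        p = random.getrandbits(bits) | (0b11 << (bits - 2)) | 1
        if blum and p % 4 != 3:
            continue
        if any((p - 1) % f == 0 for f in avoid_small):
            continue
        if is_prime(p):
            return p

print(biased_prime(48) % 4)   # → 3
```

Keys built from such primes carry exactly the fingerprints that the classification below exploits: their top bits, residues modulo 4, and divisors of p − 1 are all non-uniform.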

2.1 Dataset of RSA keys

We collected, analyzed, and published the largest dataset of RSA keys with a known origin, covering 70 sources (43 open-source libraries, 5 black-box libraries, 3 HSMs, 19 smartcards). We both expanded the datasets from previous work [23,18] and generated new keys from additional libraries for the sake of this study. We processed the keys into a unified format and made them publicly available. Where possible, we analyzed the source code of the cryptographic library to identify the basic properties of key generation according to the list above. We are primarily interested in 2048-bit keys, which is the most commonly used key length for RSA. As in previous studies [23,18], we also generate shorter keys (512 and 1024 bits) to speed up the process, while verifying that the chosen biased features are not influenced by the key length. This makes the keys of different sizes interchangeable for the sake of our study. We assume that repeatedly running the key generation locally approximates the distributed behaviour of many instances of the same library. This model is supported by the measurements taken in [18], where distributions of keys collected from the Internet exhibited the same biases as locally generated keys.

2.2 Choice of relevant biased features

We extended the features used in previous work on public keys to their equivalent properties of private keys:

Feature ‘5p and 5q’: Instead of the most significant bits of the modulus, we use the five most significant bits of the primes p and q. The modulus is defined by the primes, and the primes naturally provide more information. We chose 5 bits based on the observed bias of the high bits. Further bits are typically not biased, and reducing the size of this feature prevents an exponential growth of the feature space.

Feature ‘blum’: We replaced the feature of the second least significant bit of the modulus by the detection of Blum integers. Blum integers can be directly identified using the two prime factors. When only the modulus is available, we can rule out the usage of Blum integers, but not confirm it.

Feature ‘mod’: Previous work used the result of the modulus modulo 3. It was known that primes can be biased modulo small primes (due to avoiding small factors of p − 1 and q − 1). The authors only used the value 3, because it is possible to rule out that 3 is being avoided as a factor of p − 1 when the modulus equals 2 modulo 3 [23]. It is not possible to rule out higher factors from just a single modulus. With access to the primes, we can directly check for this bias for all factors. We detected four categories of such bias, each avoiding all small odd prime factors up to a threshold. We use these categories directly by looking at small odd divisors of p − 1 and q − 1 and note if none were detected: 1) up to 17863, 2) up to 251, 3) up to 5, 4) none – at least one value is divisible by 3.

Feature ‘roca’: We use a specific fingerprint of factorable Infineon keys published in [19].
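The features above can be computed mechanically from a private key. A sketch (the category boundaries 17863/251/5 follow the text; the vendor-specific ‘roca’ fingerprint test is omitted, and the function name is ours):

```python
def rsa_private_key_features(p, q, top_bits=5):
    """Extract the '5p', '5q', 'blum', and 'mod' features from the
    private primes p and q of an RSA key."""
    msb_p = p >> (p.bit_length() - top_bits)   # feature '5p'
    msb_q = q >> (q.bit_length() - top_bits)   # feature '5q'
    blum = (p % 4 == 3) and (q % 4 == 3)       # feature 'blum'

    def smallest_odd_divisor(x, limit=17863):
        # The first odd divisor of x-1 found is necessarily prime,
        # since any composite divisor has a smaller prime factor.
        for d in range(3, limit + 1, 2):
            if (x - 1) % d == 0:
                return d
        return None

    hits = [d for d in (smallest_odd_divisor(p), smallest_odd_divisor(q))
            if d is not None]
    m = min(hits) if hits else None
    if m is None:
        mod_cat = 1    # all odd prime factors up to 17863 avoided
    elif m > 251:
        mod_cat = 2    # factors up to 251 avoided
    elif m > 5:
        mod_cat = 3    # factors 3 and 5 avoided
    else:
        mod_cat = 4    # no avoidance detected
    return msb_p, msb_q, blum, mod_cat

# Tiny 11-bit primes for illustration (real keys use 512+ bit primes):
print(rsa_private_key_features(1567, 1667))   # → (24, 26, True, 4)
```

Each key thus maps to a small discrete feature vector, which is what the per-source distributions and the classifiers below operate on.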

[Figure 1: dendrogram for all groups (x-axis: Manhattan distance). Leaves: Group 1 (Infineon JTOP 80K); Group 2 (G&D SmartCafe 4.x, G&D SmartCafe 6.0); Group 3 (G&D SmartCafe 3.2); Group 4 (PGP SDK FIPS); Group 5 (Mocana); Group 6 (PGP SDK); Group 7 (Libgcrypt, Libgcrypt FIPS); Group 8 (PuTTY, mbedTLS); Group 9 (Bouncy Castle 1.53, SunRsaSign); Group 10 (Bouncy Castle 1.54, Mocana, Thales); Group 11 (Feitian JavaCOS A22, Feitian JavaCOS A40, Oberthur Cosmo 64, SafeNet, cryptlib); Group 12 (Taisys SIMoME VAULT); Group 13 (Gemalto GXP E64); Group 14 (Crypto++, Microsoft); Group 15 (Athena IDProtect, Botan, Gemalto GCX4 72K, LibTomCrypt, Nettle 3.2, Nettle 3.3, OpenSSL FIPS, WolfSSL); Group 16 (Utimaco); Group 17 (Cryptix, FlexiProvider, Nettle 2.0, Sage Default); Group 18 (Sage Blum, Sage Provable); Group 19 (GNU Crypto); Group 20 (Oberthur Cosmo Dual 72K); Group 21 (NXP J2A080, NXP J2A081, NXP J3A081, NXP JCOP 41 V2.2.1); Group 22 (NXP J2D081, NXP J2E145G (fingerprint 131)); Group 23 (NXP J2D081, NXP J2E145G (fingerprint 251)); Group 24 (OpenSSL); Group 25 (OpenSSL (8-bit fingerprint)); Group 26 (G&D StarSign).]

Fig. 1. How the keys from various libraries differ can be depicted by a dendrogram. It tells us, w.r.t. our feature set, how far from each other the probability distributions of the sources are. We can then hierarchically cluster the sources into groups that produce similar keys. The blue line at 0.085 highlights the threshold of differentiating between two sources/groups. This threshold yields 26 groups using our feature set.

2.3 Clustering of sources into groups

Since it is impossible to distinguish sources that produce identically distributed keys, we introduce a process of clustering to merge similar sources into groups. We cluster two sources together if they appear to be using identical algorithms based on the observation of the key distributions. We measure the difference in the distributions using the Manhattan distance.⁴ The absolute values of the distances depend on the actual distributions of the features. Large distances correlate with significant differences in the implementations. Note that very small observed distances may be only the result of noise in the distributions instead of a real difference, e.g., due to a smaller number of keys available.

We attempt to place the clustering threshold as low as possible, maximizing the number of meaningful groups. If we are not able to explain why two clusters are separated based on the study of the algorithms and distributions of the features, the threshold needs to be moved higher to join these clusters. We worked with distributions that assume all features correlated (as in [23]). The resulting classification groups and the dendrogram are shown in Figure 1. We placed the threshold value at 0.085. By moving it higher than 0.154, we would lose the ability to distinguish groups 11 and 12. It would be possible to further split group 14, as there is a slight difference in the prime selection intervals used by Crypto++ and Microsoft [23]. However, the difference manifests less than the level of noise in other sources, requiring the threshold to be put at 0.052, which would create several false groups. We use the same clustering throughout the paper, although the value of the threshold would change when the features change. Note that different versions of the same library may fall into different groups, mostly because of the algorithm changes between these versions. This, for instance, is the case of Bouncy Castle 1.53 and 1.54.

⁴ We experimented with Euclidean distance and fractional norms. While Euclidean distance is a proper metric, our experiments showed that it is more sensitive to the noise in the data, creating separable groups out of sources that share the same key generation algorithms. On the other hand, fractional norms did not highlight differences between sources that provably differ in the key generation process.
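The grouping step can be sketched as follows: compute Manhattan (L1) distances between per-source feature distributions and merge sources closer than the threshold. The distributions and library names below are hypothetical; the paper builds a full dendrogram over the complete feature set:

```python
from itertools import combinations

def l1(dist_a, dist_b):
    """Manhattan distance between two discrete feature distributions."""
    keys = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(k, 0.0) - dist_b.get(k, 0.0)) for k in keys)

def cluster(sources, threshold=0.085):
    """Single-linkage agglomeration: merge any two groups containing
    sources whose distributions are closer than the threshold."""
    groups = [{name} for name in sources]
    merged = True
    while merged:
        merged = False
        for g1, g2 in combinations(groups, 2):
            if any(l1(sources[a], sources[b]) < threshold
                   for a in g1 for b in g2):
                groups.remove(g1)
                groups.remove(g2)
                groups.append(g1 | g2)
                merged = True
                break
    return sorted(sorted(g) for g in groups)

# Hypothetical distributions of one feature (the 'mod' category):
sources = {
    "lib_a": {1: 0.00, 4: 1.00},
    "lib_b": {1: 0.02, 4: 0.98},   # same algorithm, sampling noise
    "lib_c": {1: 0.97, 4: 0.03},   # avoids small factors of p-1
}
print(cluster(sources))   # → [['lib_a', 'lib_b'], ['lib_c']]
```

With the 0.085 threshold, the small L1 gap between lib_a and lib_b (0.04) is treated as noise and the two are merged, while lib_c stays separate.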

3 Model selection and evaluation

How accurately we can classify the keys depends on several factors, most notably on: the libraries included in the training set, the number of keys available for classification, the features extracted from the classified keys, and the classification model. In this section, we focus on the last factor.

3.1 Model selection

As generating the RSA keys is internally a stochastic process, we choose the family of probabilistic models to address the source attribution problem. Since there is no strong motivation for complex machine learning models, we utilize simple classifiers. More sophisticated classifiers could be built based on our findings when the goal is to reach higher accuracy or to more finely discriminate sources within a group. The rest of this subsection describes the chosen models.

Naïve Bayes classifier. The first investigated model is a naïve Bayes classifier, called naïve because it assumes that the underlying features are conditionally independent. Using this model, we apply the maximum-likelihood decision rule and predict the label as $\hat{y} = \arg\max_y P(X = x \mid y)$. Thanks to the naïve assumption, we may decompose this computation into $\hat{y} = \arg\max_y \prod_{i=1}^{n} P(x_i \mid y)$ for the feature vector $x = (x_1, \dots, x_n)$.

Bayes classifier. We continue to develop the approach originally used in [23] that used the Bayes classifier without the naïve assumption. Several reasons motivate this. First, it allows to evaluate how much the naïve Bayes model suffers from the violated independence assumption (on this specific dataset). Secondly, it enables us to access more precise probability estimates that are needed to classify real-world GCD-factorable keys. Additionally, we can directly compare the classification accuracy of private keys with the case of the public keys from [23]. However, one of the main drawbacks of the Bayes classifier is that it requires exponentially more data with the growing number of features. Therefore, when striving for high accuracy achievable by further feature engineering, one should consider the naïve Bayes instead.

Naïve Bayes classifier with cross-features.
The third investigated option is the naïve Bayes classifier, but with selected features that are known to be correlated merged into a single feature. In particular, we merged the features of the most significant bits (of p, q) into a single cross-feature. Subsequently, the naïve Bayes approach is used. This enables us to evaluate whether merging clearly interdependent features into one will affect the performance of the naïve Bayes classifier w.r.t. this specific dataset.
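The naïve Bayes decision rule reduces to an argmax over per-group log-likelihood sums once per-feature probability tables exist. A minimal sketch with hypothetical tables for two features (the ‘mod’ category and the ‘blum’ flag) and hypothetical group names; log-probabilities keep the products numerically stable:

```python
import math

def classify(key_features, model):
    """Naive Bayes: pick the group maximizing sum_i log P(x_i | group).

    `model` maps group -> list of per-feature probability tables.
    Unseen feature values get a tiny floor probability.
    """
    def loglik(group):
        tables = model[group]
        return sum(math.log(tables[i].get(x, 1e-9))
                   for i, x in enumerate(key_features))
    return max(model, key=loglik)

# Hypothetical tables; the paper estimates them from 157M training keys.
model = {
    "group_a": [{3: 0.9, 4: 0.1}, {True: 0.05, False: 0.95}],
    "group_b": [{3: 0.1, 4: 0.9}, {True: 0.99, False: 0.01}],
}
print(classify((3, False), model))   # → group_a
```

Dropping the independence assumption (the full Bayes classifier) means replacing the per-feature tables with one joint table over complete feature vectors, which is why its data requirements grow exponentially with the number of features.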

3.2 Model evaluation

Methodology of classification and metrics. Our training dataset contains 157 million keys and the test set contains 1.8 million keys. We derived the test set by discarding 10 thousand keys of each source from the complete dataset before clustering. This assures that each group has a test set with at least 10 thousand keys. Accordingly, since the groups differ in the number of sources involved, the resulting test dataset is imbalanced. For this reason, we employ the metrics of precision and recall when possible. However, we represent the model performance by the accuracy measure in the tables and in more complex classification scenarios. For group X, the precision can be understood as the fraction of correctly classified keys from group X divided by the number of keys that were marked as group X by our classifier. Similarly, the recall is the fraction of correctly classified keys from group X divided by the total number of keys from group X [11]. We also evaluate the performance of the models under the assumption that the user has a batch of several keys from the same source at hand. This scenario can arise, e.g., when a security audit is run in an organization and all keys are being tested. Furthermore, to react to some often misclassified groups, we additionally provide the answer “this key originates from group X or group Y” to the user (and we evaluate the confidence of these answers).
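The per-group metrics can be restated in code. A sketch of the definitions above, mapping an undefined precision (no keys predicted for the group) to 0, as the text does:

```python
def precision_recall(y_true, y_pred, group):
    """Per-group precision and recall over parallel label lists."""
    tp = sum(t == group and p == group for t, p in zip(y_true, y_pred))
    predicted = sum(p == group for p in y_pred)   # classifier said 'group'
    actual = sum(t == group for t in y_true)      # truly from 'group'
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual if actual else 0.0
    return precision, recall

y_true = ["X", "X", "X", "Y", "Y", "Z"]
y_pred = ["X", "X", "Y", "Y", "Y", "X"]
print(precision_recall(y_true, y_pred, "X"))
```

For group X above, two of the three keys predicted as X and two of the three actual X keys are correct, so both metrics equal 2/3.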

Comparison of the models. The overall comparison of all three models can be seen in Table 1. If the precision for some group is undefined, i.e., no key is allegedly originating from this group, we say that the precision is 0. We evaluate the naïve Bayes classifier on the same features that were used for the Bayes classifier to measure how much classification performance is lost by introducing the feature independence assumption. A typical example of interdependent features is that the most significant bits of primes p and q are intentionally correlated to preserve the expected length of the resulting modulus n. Pleasantly, the observed precision (recall) decrease is only 2.3% (1.4%) when compared to the Bayes classifier. Accordingly, this suggests that a larger number of different features than usable with the Bayes classifier (due to exponential growth in complexity) can be considered when the naïve Bayes classifier is used. As a result, further improvement of the performance might be achieved, despite ignoring the dependencies among features. Overall, the Bayes classifier shows the best results. When a single key is classified, the average success rate for the 26 groups is captured by a precision of 43.2% and a recall of 47.6%. Still, there is a wide variance between the performance in specific groups. A detailed table of results together with a discussion is presented in Appendix A.

Model                        Avg. precision   Avg. recall
Bayes classifier             43.2%            47.6%
Naïve Bayes classifier       40.9%            46.2%
Cross-feature naïve Bayes    41.7%            47.6%

Table 1. Performance comparison of different models on the dataset with all libraries. Note that the precision of a random guess classifier is 3.8% when considering 26 groups.

4 Classification with prior information

Section 2 outlined the process of choosing a threshold value that determines the critical distance for distinguishing between distinct groups. Inevitably, the same threshold value directly influences the number of groups after the clustering task. As such, the threshold introduces a trade-off between the model performance and the number of discriminated groups. The smaller the difference between group distributions is, the more similar they are, and the model performance is lower as more misclassification errors occur. The objective of this section is to examine the classification scenario when some prior knowledge is available to the analyst, limiting the origin of keys to only a subset of all libraries or increasing the likelihood of some. Since Section 3 showed that the Bayes classifier provides the best performance, this section considers only this model. Prior knowledge can be introduced into the classification process in multiple ways, e.g., by using a prior probability vector that considers some groups more prevalent. We also note that the measurement method of [18] can be used to obtain such prior information, but a relatively large dataset (around 10^5 private keys) is required that may not be available. Our work, therefore, considers a different setting where some sources of the keys are ruled out before the classifier is constructed. Such a scenario arises, e.g., when the analyst knows that the scrutinized keys were generated in an unknown cryptographic smartcard. In such a case, HSMs and other sources of keys can be omitted from the model altogether, which will arguably increase the performance of the classification process. Another example is leaving out libraries that were released after the classified data sample was collected.
We present the classification performance results for three scenarios with a limited number of sources – 1) cryptographic smartcards (Section 4.1), 2) sources likely to be used in the TLS domain (Section 4.2), and 3) a specific case of GCD-factorable keys from the TLS domain, where only one of the two primes can be used for classification (see Section 4.3 for more details). The comparison of models for these scenarios can be seen in Table 2.

Dataset                    Avg. precision   Avg. recall   Random guess (baseline)
All libraries                       43.2%         47.6%                      3.8%
Smartcards domain                   61.9%         64.6%                      8.3%
TLS domain                          45.5%         42.2%                      7.7%
Single-prime TLS domain             28.8%         36.2%                     11.1%

Table 2. Bayes classifier performance on the analyzed partitionings of the dataset: the complete dataset with all libraries (All libraries), smartcards only (Smartcards domain), libraries and HSMs expected to be used for TLS (TLS domain), and the specific subset of the TLS domain where only a single prime is available due to the nature of the results obtained by the GCD factorization method (Single-prime TLS domain). A comparison with a random guess is provided as a baseline (for the random guess, accuracy equals precision and recall).

To compute these models, we first discard the sources that cannot be the origin of the examined keys according to the prior knowledge of the domain (e.g., smartcards are not expected in TLS). Next, we re-compute the clustering task to obtain fewer groups than on the dataset with all libraries. Finally, we compute the classification tables for the reduced domain and evaluate the performance.
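As an illustration of this reduced-domain workflow, the following sketch builds a discrete naïve Bayes table only over the sources permitted by prior knowledge. All source names, feature values, and counts are invented for the example; the paper's real features and datasets are far richer.

```python
from collections import Counter

# Hypothetical sketch of the reduced-domain workflow: source names, feature
# values, and counts below are invented; they are not the paper's data.
def build_classification_table(keys_by_source, allowed_sources):
    """Estimate P(feature vector | source) for the allowed sources only."""
    table = {}
    for source, feature_vectors in keys_by_source.items():
        if source not in allowed_sources:
            continue  # ruled out by prior knowledge of the domain
        counts = Counter(feature_vectors)
        total = sum(counts.values())
        table[source] = {fv: c / total for fv, c in counts.items()}
    return table

def classify(table, feature_vector, priors=None):
    """Return the source maximizing P(source) * P(feature vector | source)."""
    def score(source):
        prior = priors[source] if priors else 1.0 / len(table)
        return prior * table[source].get(feature_vector, 0.0)
    return max(table, key=score)

toy_data = {
    "OpenSSL": [("msb=10", "mod=avoided")] * 90 + [("msb=11", "mod=avoided")] * 10,
    "mbedTLS": [("msb=11", "mod=none")] * 80 + [("msb=10", "mod=none")] * 20,
    "CardX":   [("msb=10", "mod=none")] * 100,  # a smartcard, excluded for TLS
}
tls_table = build_classification_table(toy_data, allowed_sources={"OpenSSL", "mbedTLS"})
print(classify(tls_table, ("msb=10", "mod=avoided")))  # OpenSSL
```

Excluding a source from the table (rather than giving it a zero prior) also removes its groups from the re-clustering step, which is what allows the reduced domain to form fewer, better-separated groups.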

4.1 Performance in the smartcards domain

[Figure 2: dendrogram for the smartcard domain (Manhattan distance). The 12 groups are: Group 1 (Infineon JTOP 80K); Group 2 (G&D SmartCafe 4.x, G&D SmartCafe 6.0); Group 3 (G&D SmartCafe 3.2); Group 4 (NXP J2D081, NXP J2E145G (fingerprint 131)); Group 5 (NXP J2D081, NXP J2E145G (fingerprint 251)); Group 6 (Feitian JavaCOS A22, Feitian JavaCOS A40, Oberthur Cosmo 64); Group 7 (Taisys SIMoME VAULT); Group 8 (Gemalto GXP E64); Group 9 (Athena IDProtect, Gemalto GCX4 72K); Group 10 (Oberthur Cosmo Dual 72K); Group 11 (NXP J2A080, NXP J2A081, NXP J3A081, NXP JCOP 41 V2.2.1); Group 12 (G&D StarSign).]

Fig. 2. The clustering of smartcard sources yields 12 separate groups.

The clustering task in the smartcards domain yields 12 recognizable groups for 19 different smartcard models, as shown in Figure 2. The training set for this limited domain contains 20.6 million keys, whereas the test set contains 340 thousand keys. On average, 61.9% precision and 64.6% recall is achieved. Moreover, 8 out of 12 groups achieve > 50% precision. Additionally, the classifier exhibits 100% recall on 3 specific groups: a) Infineon smartcards (before 2017, with the ROCA vulnerability [19]), b) G&D Smartcafe 4.x and 6.0, and c) newer G&D Smartcafe 7.0. Figure 3 shows the so-called confusion matrix, where each row corresponds to the percentage of keys in an actual group, while each column represents the percentage of keys in a predicted group.

Janovsky et al.

[Figure 3: 12x12 confusion matrix of the single-key classifier in the smartcards domain; rows are true groups 1-12, columns are predicted groups 1-12, cell values are relative frequencies. For example, group 1 is classified correctly in 100% of cases, group 2 in 82%, and group 3 is misclassified as group 2 in 33% of cases.]

Fig. 3. The confusion matrix for the classifier of a single private key generated in the smartcards domain. A given row corresponds to a vector of observed relative frequencies with which keys generated by a specific group (True group) are misclassified as generated by other groups (Group predicted by our model). For example, group 1 and group 2 have no misclassifications (high accuracy), while keys of group 3 are misclassified as keys from group 2 in 33% of cases. On average, we achieve 64.6% accuracy. The darker the cell, the higher the number it contains; this holds for all figures in this paper.

As expected, the results represent an improvement compared to the dataset with all libraries. When one has ten keys from the same card at hand, the expected recall is over 90% for 10 out of 12 groups. The full table of results can be found in the project repository. Interestingly, 512- and 1024-bit keys generated by the same NXP J2E145G card (similarly also for NXP J2D081) fall into different groups.⁵ The main difference is in the modular fingerprint (avoidance of small factors in p−1 and q−1). We hypothesize that on-card key generation avoids more small factors for larger keys. Such behaviour was not observed for other libraries, but it highlights the necessity of collecting different key lengths in the training dataset when one analyzes black-box proprietary devices or closed-source software libraries.
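The gain from having several keys of the same (unknown) card can be sketched by combining per-key likelihoods: assuming the keys are independent, a Bayes classifier multiplies them, i.e., sums their logarithms. The per-key likelihood values below are toy numbers invented for the illustration, not measured data.

```python
import math

# Hypothetical sketch: combining evidence from several keys known to come
# from the same (unknown) source. The likelihood values are invented.
def classify_batch(likelihoods_per_key, sources):
    """likelihoods_per_key: list of dicts mapping source -> P(key | source)."""
    log_scores = {s: 0.0 for s in sources}
    for lik in likelihoods_per_key:
        for s in sources:
            log_scores[s] += math.log(lik.get(s, 1e-12))  # smooth zeros
    return max(log_scores, key=log_scores.get)

sources = ["GroupA", "GroupB"]
# A single ambiguous key slightly favours GroupB...
one_key = [{"GroupA": 0.45, "GroupB": 0.55}]
# ...but ten keys drawn mostly from GroupA's distribution settle it.
ten_keys = [{"GroupA": 0.6, "GroupB": 0.4}] * 8 + [{"GroupA": 0.3, "GroupB": 0.7}] * 2
print(classify_batch(one_key, sources))   # GroupB
print(classify_batch(ten_keys, sources))  # GroupA
```

This is why the expected recall climbs above 90% once ten keys from the same card are available: per-key ambiguities average out across the batch.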

⁵ This is an exception to the observation that the selected features behave independently of key length. Otherwise, keys of different lengths can be used interchangeably.

To summarize, the classification of private keys generated by smartcards is very accurate due to the significant differences among the proprietary, embedded implementations of the different vendors. The observed differences likely result from the requirement of a smaller footprint on low-resource devices.

4.2 Performance in the TLS domain

For the TLS domain, we excluded all libraries and devices unlikely to be used to generate keys for TLS servers. All smartcards are excluded, together with highly outdated or purpose-specific libraries like PGP SDK 4. All hardware security modules (HSMs) are kept, as they may be used as TLS accelerators or high-security key storage. In summary, we started with 17 separate cryptographic libraries and HSMs, inspected in a total of 134 versions. The clustering resulted in 13 recognizable groups, as shown in Figure 4. The domain training set contains 121.8 million keys and the test set contains 1.3 million keys. On average, the classifier achieves 45.5% precision and 42.2% recall. The decrease in average recall compared to the full domain may look surprising, but averaging is deceiving in this context. In fact, recall improved for 10 out of 13 groups present both in the full set and the TLS domain set, with precision improving for 9 groups. The mean values of the full dataset are uplifted by a generally better performance of the model outside the TLS domain. Five groups have > 50% precision. OpenSSL (by far the most popular library used by servers for TLS [18]) has 100% recall, making the classification of OpenSSL keys very reliable. Complete results can be found in the project repository. To summarize, we correctly classify more keys in the more specific TLS domain than with the full dataset classifier. Additionally, the user can be more confident about the decisions of the TLS-specific classifier.

[Figure 4: dendrogram for the TLS domain (Manhattan distance). The 13 groups are: Group 1 (OpenSSL); Group 2 (OpenSSL (8-bit fingerprint)); Group 3 (Sage Blum, Sage Provable); Group 4 (Mocana); Group 5 (mbedTLS); Group 6 (Bouncy Castle 1.53, SunRsaSign); Group 7 (Bouncy Castle 1.54, Mocana, Thales); Group 8 (SafeNet, cryptlib); Group 9 (Libgcrypt, Libgcrypt FIPS); Group 10 (Botan, LibTomCrypt, Nettle 3.2, Nettle 3.3, OpenSSL FIPS, WolfSSL); Group 11 (Crypto++, Microsoft); Group 12 (Utimaco); Group 13 (Nettle 2.0, Sage Default).]

Fig. 4. The clustering of the sources from the TLS domain yields 13 separate groups.

4.3 Performance in the single-prime TLS domain

The rest of this section is motivated by a setting where one wants to analyze a batch of correlated keys. Specifically, we assume a case of k ≥ 1 keys (p1, q1), ..., (pk, qk) generated by the same source, where p1 = p2 = ... = pk. This scenario emerges in Section 5 and cannot be addressed by the previously considered classifiers. If applied, the results would be drastically skewed, since the classifier would consider each of the pi separately, putting half of the weight on the shared prime. For that reason, we train a classifier that works on single primes rather than on complete private keys. Instead of feeding the classifier with a batch of k private keys, we supply it with a batch of k + 1 unique primes from those keys. The selected features were modified accordingly: we extract the 5 most significant bits of the unique prime and its second least significant bit, and compute the ROCA and modular fingerprints for the single prime. We trained the classifier on the learning set limited to the TLS domain, as in Section 4.2. On average, we achieve 28.8% precision and 36.2% recall when classifying a single prime. Table 3 shows the accuracy results in more detail. It should, however, be stressed that this classifier is meant to be used on batches of many keys at once. When considering a batch of k ≥ 10 primes, the accuracy is more than 77%. The decrease in accuracy compared to Section 4.2 can be explained by the loss of information from the second prime. The features mod and blum are much less reliable when using only one prime. Since we can compute the most significant bits only from a single prime at a time, we lose the information about the ordering of primes (the features 5p and 5q are correlated). These facts result in only nine separate groups of libraries being distinguishable. The following groups from the TLS domain are no longer mutually distinguishable: 5 and 13, 7 and 11, 8 and 9 and 10.
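A sketch of this single-prime feature extraction might look as follows. The bit positions follow the text above; the truncated list of small primes tested for the modular fingerprint and the helper name are our own choices for the illustration.

```python
# Illustrative sketch of single-prime feature extraction; the small-prime
# list is truncated and the exact fingerprint encoding is our own choice.
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]

def single_prime_features(p):
    top5 = p >> (p.bit_length() - 5)   # 5 most significant bits of the prime
    second_lsb = (p >> 1) & 1          # 0 iff p = 1 (mod 4), 1 iff p = 3 (mod 4)
    # modular fingerprint: small primes dividing p - 1; libraries that
    # deliberately avoid small factors of p - 1 leave this tuple empty
    mod_fp = tuple(q for q in SMALL_PRIMES if (p - 1) % q == 0)
    return top5, second_lsb, mod_fp

p = 2**61 - 1  # a Mersenne prime used purely as a demonstration value
top5, second_lsb, mod_fp = single_prime_features(p)
print(top5, second_lsb)  # 31 1 (all top bits set; p = 3 mod 4)
```

Note that the second least significant bit is exactly the blum-style feature mentioned above (it distinguishes p ≡ 3 mod 4 from p ≡ 1 mod 4), which is why its reliability drops when only one prime of the pair is available.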

4.4 Methodology limitations

The presented methodology has several limitations:

Number of primes in a batch       1        10        20        30       100
Group 1                      100.0%    100.0%    100.0%    100.0%    100.0%
Group 2                       42.8%     99.7%    100.0%    100.0%    100.0%
Group 3                       78.0%    100.0%    100.0%    100.0%    100.0%
Group 4                       47.5%     90.3%     95.8%     98.7%    100.0%
Group 5|13                     1.8%     30.8%     43.7%     51.8%     74.7%
Group 6                        5.2%     48.9%     61.0%     64.8%     76.7%
Group 7|11                     0.0%     67.3%     92.3%     97.4%    100.0%
Group 8|9|10                  37.9%     99.9%    100.0%    100.0%    100.0%
Group 12                      12.8%     61.8%     77.7%     83.9%     97.2%
Average                       36.2%     77.6%     85.6%     88.5%     94.3%

Table 3. Classification accuracy for single-prime features evaluated on the TLS domain.

Classification of an unseen source. Not all existing sources of RSA keys are present in our dataset for the clustering analysis and classification. This means that attempting to classify a key from a source not considered in our study will bring unpredictable results. The new source may either populate some existing group or have a unique implementation, thus creating a new group. In both cases, the behaviour of the classifier is unpredictable.

Granularity of the classifier. There are multiple libraries in a single group. The user is therefore not shown the exact source of the key, but the whole group instead. This limitation has two main reasons: 1) some sources share the same implementation and thus cannot be told apart; 2) the list of utilized features is narrow. There are, in principle, infinitely many possible features, and some may hide valuable information that could further improve the model performance. Nevertheless, the proposed methodology allows for an automatic evaluation of features using the naïve Bayes method, which shall be considered in future work.

Human factor. The clustering task in our study requires human knowledge. Specifically, the value of the threshold that splits the libraries into groups (for a particular feature) is established only semi-automatically. We manually confirmed the threshold when we could explain the difference between the libraries, or moved it otherwise. In summary, this complicates a fully automatic evaluation on a large number of potential features. Once solved, the relative importance of the individual features could be measured.

5 Real-world GCD-factorable keys origin investigation

Previous research [16,14,13,2] demonstrated that a non-trivial fraction of RSA keys used on publicly reachable TLS servers is generated insecurely and is practically factorable. This is because the affected network devices were found to independently generate RSA keys that share a single prime or both primes. While an efficient factorization algorithm for RSA moduli is unknown, when two keys accidentally share one prime, efficient factorization is possible using the Euclidean algorithm to find their GCD.⁶ Still, the current number of public keys obtained from crawling TLS servers is too high to allow for the investigation of all possible pairs. However, the distributed GCD algorithm [15] allows analyzing hundreds of millions of keys efficiently. Its performance was sufficient to analyze all keys collected from IPv4-wide TLS scans [21,5] and resulted in almost 1% of factorable keys in the scans collected at the beginning of the year 2016.

After the detection of GCD-factorable keys, the question of their origin naturally followed. Previous research addressed it using two principal approaches: 1) an analysis of the information extractable from the certificates of GCD-factorable keys, and 2) matching specific properties of the factored primes with primes generated by a suspected library, OpenSSL. The first approach allowed the detection of a range of network routers that seeded their PRNG shortly after boot without enough entropy, which caused them to occasionally generate a prime shared with another device. These routers contained a customized version of the OpenSSL library, which was confirmed by the second approach, since the OpenSSL code intentionally avoids small factors of p−1, as shown by [17].

⁶ Note that the keys sharing both primes are not susceptible to this attack but reveal their private keys to all other owners of the same RSA key pair.

While this suite of routers was clearly the primary source of the GCD-factorable keys, are they the sole source of insecure keys? The paper [13] identified 23 router/device vendors that used the code of OpenSSL (using the specific OpenSSL fingerprint based on the avoidance of small factors in p−1 and information extracted from the certificates). Eight other vendors (DrayTek, Fortinet, Huawei, Juniper, Kronos, Siemens, Xerox, and ZyXEL) produced keys without such an OpenSSL fingerprint, and the underlying libraries remained unidentified. In the rest of this section, we build upon the prior work to identify probable sources of the GCD-factorable keys that do not originate from the OpenSSL library.

Two assumptions must be met to employ the classifier studied in Section 4.3. First, we assume that when a batch of GCD-factored keys shares a prime, they were all generated by sources from a single classification group. This conjecture is suggested in [13,14] and supported by the fact that when distinct libraries differ in their prime generation algorithm, they will produce different primes even when initialized from the same seed. On the other hand, when they share the same generation algorithm, they inevitably fall into the same classification group. Second, we assume that if the malformed keys share only a single prime, the PRNG was reseeded with enough entropy before the second prime was generated. This is suggested by the failure model studied for OpenSSL in [14] and implies that the second prime is generated as it normally would be.

Leveraging these conjectures, the rest of this section tracks the libraries responsible for GCD-factorable keys while not relying on the information in the certificates. First, we describe the dataset gathering process, as well as the factorization of the RSA public keys. Later, the successfully factored keys are analyzed, followed by a discussion of the findings.
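The core of the attack on keys sharing a prime can be demonstrated in a few lines; the tiny primes below are illustrative only, not realistic key sizes.

```python
import math

# Why a shared prime is fatal: two RSA moduli that accidentally share p
# are both factored by a single gcd computation.
p = 61            # shared prime (insufficient boot-time entropy scenario)
q1, q2 = 53, 59   # second primes, generated after proper reseeding
n1, n2 = p * q1, p * q2

shared = math.gcd(n1, n2)
assert shared == p
print(n1 // shared, n2 // shared)  # recovers q1 and q2, i.e. both private keys
```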

6 Datasets of GCD-factorable TLS keys

The input dataset with public RSA keys (both secure and vulnerable ones) was obtained from the Rapid7 archive. All scans between October 2013 and July 2019 (mostly with a one- or two-week period) were downloaded and processed, resulting in slightly over 170 million certificates. Only public RSA keys were extracted and duplicates removed, resulting in 112 million unique moduli. On this dataset, the fastgcd [15] tool based on [3] was used to factorize the moduli into private keys. A detailed methodology of this procedure is discussed in Appendix B.
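Running a GCD for every pair would require on the order of n²/2 computations; fastgcd instead implements a batch GCD based on product and remainder trees [3,15]. A compact (and, as Appendix B illustrates, memory-hungry) sketch of the idea:

```python
import math

def product_tree(xs):
    """Bottom-up tree of partial products; the last level holds prod(xs)."""
    tree = [xs]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([level[i] * level[i + 1] if i + 1 < len(level) else level[i]
                     for i in range(0, len(level), 2)])
    return tree

def batch_gcd(moduli):
    """For each modulus n, return gcd(n, product of all the other moduli)."""
    tree = product_tree(moduli)
    rems = tree.pop()  # the root: product of all moduli
    while tree:
        level = tree.pop()
        # push the remainder down the tree, reducing modulo n^2 at each node
        rems = [rems[i // 2] % (n * n) for i, n in enumerate(level)]
    return [math.gcd(n, (r // n) % n) for r, n in zip(rems, moduli)]

# two moduli sharing the prime 61, plus one unrelated modulus
print(batch_gcd([61 * 53, 61 * 59, 101 * 103]))  # [61, 61, 1]
```

A result of 1 means no shared factor, while a result equal to the modulus itself indicates a key whose both primes appear elsewhere (the case of footnote 6); only the intermediate results yield a proper factorization.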

6.1 Batching of GCD-factorable keys

If the precision and recall of our classifier were 100%, one could process the factored keys one by one, establish their origin library, and thus detect all sources of insecure keys. But since the classification accuracy of the single-prime TLS classifier⁷ with a single key is only 36%, we apply three adjustments: 1) we batch the GCD-factorable keys sharing the same prime (believed to be produced by the same library); 2) we analyze only the batches with at least 10 keys (therefore with high expected accuracy); 3) we limit the set of the libraries considered for classification to the single-prime TLS domain. Since the keys from the OpenSSL library were already extensively analyzed by [13], we use the mod feature to reliably mark and exclude them from further analysis. By doing so, we concentrate primarily on the non-OpenSSL keys that were not yet attributed. The exact process for the classification of factored keys in batches is as follows:

1. Factorize public keys from a target dataset (e.g., Rapid7) using the fastgcd tool.
2. Form batches of factored keys that share a prime and assume that they originate from the same classification group.
3. Select only the batches with at least k keys (e.g., 10).
4. Separate the batches of keys that all carry the OpenSSL fingerprint. As a control experiment, they should classify only to a group with the OpenSSL library.
5. Separate the batches without the OpenSSL fingerprint. This cluster contains the yet unidentified libraries.
6. Classify the non-OpenSSL cluster using the single-prime TLS classifier.
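Steps 2-5 amount to grouping the factored keys by their shared prime and splitting the batches on the fingerprint test. A minimal sketch, with `has_openssl_fingerprint` as a hypothetical stand-in for the mod-feature test:

```python
from collections import defaultdict

def form_batches(factored_keys, min_size=10):
    """factored_keys: iterable of (shared_prime, other_prime) pairs."""
    batches = defaultdict(list)
    for shared_prime, other_prime in factored_keys:
        batches[shared_prime].append(other_prime)
    # keep only batches large enough for a reliable classification
    return {p: qs for p, qs in batches.items() if len(qs) >= min_size}

def split_by_fingerprint(batches, has_openssl_fingerprint):
    """Separate batches whose every member carries the OpenSSL fingerprint."""
    openssl, unknown = {}, {}
    for p, qs in batches.items():
        (openssl if all(map(has_openssl_fingerprint, qs)) else unknown)[p] = qs
    return openssl, unknown

keys = [(61, 100 + i) for i in range(12)] + [(7, 11)]  # toy "factored keys"
batches = form_batches(keys)
print(sorted(batches))  # [61]  (the lone key sharing prime 7 is dropped)
```

The non-OpenSSL dictionary returned by `split_by_fingerprint` is then what step 6 hands to the single-prime TLS classifier.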

6.2 Source libraries detected in GCD-factorable TLS keys

Group(s)                                       # batches
1 (OpenSSL)                                         2230
2 (8-bit OpenSSL)                                      3
8|9|10 (various libraries, see Figure 4)             278
3; 4; 6; 12; 5|13; 7|11                     0 (improbable)

Table 4. Keys that share a prime factor belong to the same batch. Classification of most batches resulted in OpenSSL as the likely source. The rest of the batches were likely generated by libraries in the combined group 8|9|10.

In total, we analyzed more than 82 thousand primes divided into 2511 batches. While each batch has at least 10 keys in it, the median batch size is 15. Among the batches, 88.8% exhibit the OpenSSL fingerprint. This number well confirms the previous finding by [13], which also captured the OpenSSL-specific fingerprint in a similar fraction of keys. We attribute three other batches as coming from OpenSSL (8-bit fingerprint), an OpenSSL library compiled to test and avoid divisors of p−1 only up to 251. Importantly, slightly more than 11% of the batches were generated by some library from groups 8, 9, or 10, which

are not mutually distinguishable when only a single prime is available. There are also negative results to report. With accuracy over 80% (for a batch size of 15) and no batches attributed to any of groups 3, 4, 6, 12, 5|13, or 7|11, it is very improbable that any GCD-factorable keys originate from the respective sources in these libraries.

⁷ Note that without using the single-prime model, the results would be biased, as the shared prime would be considered multiple times in the classification process.

7 Related work

The fingerprinting of devices based on their physical characteristics, exposed interfaces, behaviour in non-standard or undefined situations, errors returned, and a wide range of other side-channels is a well-researched area. Experience shows that finding a case of non-standard behaviour is usually possible, while making a group of devices indistinguishable is very difficult due to an almost infinite number of observable characteristics, resulting in an arms race between device manufacturers and fingerprinting observers.

Having a device fingerprinted helps to better understand a complex ecosystem, e.g., by quantifying the presence of interception middle-boxes on the Internet [9], the types of connected clients, or the versions of operating systems. Differences may help point out subverted supply chains or counterfeit products. When applied to the study of cryptographic keys and cryptographic libraries, researchers devised a range of techniques to analyze the fraction of encrypted connections, the prevalence of particular cryptographic algorithms, and the chosen key lengths or cipher suites [8,10,2,1,12,4,24]. Information about a particular key is frequently obtained from the metadata of its certificate.

Periodic network scans allow assessing the impact of security flaws in practice. The population of OpenSSL servers with the Heartbleed vulnerability was measured and monitored by [7], and real attempts to exploit the bug were surveyed. If the necessary information is coincidentally collected and archived, even a backward introspection of a vulnerability in time might be possible. The simple test for the ROCA vulnerability in public RSA keys allowed measuring the fraction of citizens of Estonia who held an electronic ID supported by a vulnerable smartcard, by inspecting the public repository of eID certificates [19].
The fingerprinting of keys from smartcards was used to detect that private keys were generated outside of the card and injected later into the eIDs, despite the requirement to have all keys generated on-card [20]. The attribution of a public RSA key to its origin library was analyzed by [23]. Measurements on large datasets were presented in [18], leading to an accurate estimation of the fraction of cryptographic libraries used in large datasets like IPv4-wide TLS. While both [23] and [18] analyze public keys, private keys can also be obtained under certain conditions, e.g., from a faulty random number generator [16,6,13,14,22]. The origin of weak factorable keys needs to be identified in order to notify the maintainers of the code to fix the underlying issues; for this, a combination of key properties and values from certificates was used.

8 Conclusions

We provide what we believe is the first wide examination of the properties of RSA keys with the goal of attributing a private key to its origin library. The attribution is applicable in multiple scenarios, e.g., to the analysis of GCD-factorable keys in the TLS domain. We investigated the properties of keys as generated by 70 cryptographic libraries, identified biased features in the primes produced, and compared three models based on Bayes classifiers for the private key attribution. The information available in private keys significantly increases the classification performance compared to the results achieved on public keys [23]. Our work distinguishes 26 groups of sources (compared to 13 on public keys) while more than doubling the accuracy w.r.t. random guessing. When 100 keys are available for the classification, the correct result is almost always provided (> 99%) for 19 out of 26 groups.

Finally, we designed a method usable also for a dataset of keys where one prime is significantly correlated. Such primes are found in GCD-factorable TLS keys where one prime was generated with insufficient randomness and would introduce a high classification error in the unmodified method. As a result, we can identify the libraries responsible for the production of these GCD-factorable keys, showing that only three groups are a relevant source of such keys. The accurate classification can be easily incorporated into forensic and audit tools.

While the bias in the keys usually does not help with factorization, cryptographic libraries should approach their key generation design with great care, as strong bias can lead to weak keys [19]. We recommend following a key generation process with as little bias as possible.

Acknowledgements. The authors would like to thank the anonymous reviewers for their helpful comments. P. Svenda and V. Matyas were supported by Czech Science Foundation project GA20-03426S.
Some of the tools used and other people involved were supported by the CyberSec4Europe Competence Network. Computational resources were supplied by the project e-INFRA LM2018140.

References

1. Albrecht, M.R., Degabriele, J.P., Hansen, T.B., Paterson, K.G.: A surfeit of SSH cipher suites. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. pp. 1480–1491. ACM (2016)
2. Barbulescu, M., Stratulat, A., Traista-Popescu, V., Simion, E.: RSA weak public keys available on the Internet. In: International Conference for Information Technology and Communications. pp. 92–102. Springer-Verlag (2016)
3. Bernstein, D.J.: How to find smooth parts of integers (2004), [cit. 2020-07-13]. Available from http://cr.yp.to/papers.html#smoothpart
4. Cangialosi, F., Chung, T., Choffnes, D., Levin, D., Maggs, B.M., Mislove, A., Wilson, C.: Measurement and analysis of private key sharing in the HTTPS ecosystem. In: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. pp. 628–640. ACM (2016)
5. Censys: Censys TLS Full IPv4 443 Scan (2015), [cit. 2020-07-13]. Available from https://censys.io/data/443-https-tls-full_ipv4/historical

6. Batch-GCDing Github SSH Keys (2015), [cit. 2020-07-13]. Available from https://cryptosense.com/batch-gcding-github-ssh-keys/
7. Durumeric, Z., Kasten, J., Adrian, D., Halderman, J.A., Bailey, M., Li, F., Weaver, N., Amann, J., Beekman, J., Payer, M., et al.: The matter of Heartbleed. In: Proceedings of the 2014 Conference on Internet Measurement Conference. pp. 475–488. ACM (2014)
8. Durumeric, Z., Kasten, J., Bailey, M., Halderman, J.A.: Analysis of the HTTPS certificate ecosystem. In: Proceedings of the 2013 ACM Internet Measurement Conference. pp. 291–304. ACM (2013)
9. Durumeric, Z., Ma, Z., Springall, D., Barnes, R., Sullivan, N., Bursztein, E., Bailey, M., Halderman, J.A., Paxson, V.: The security impact of HTTPS interception. In: Network and Distributed Systems Symposium. The Internet Society (2017)
10. Electronic Frontier Foundation: The EFF SSL Observatory (2010), [cit. 2020-07-13]. Available from https://www.eff.org/observatory
11. Flach, P.: Machine Learning: The Art and Science of Algorithms that Make Sense of Data, chap. 2, pp. 57–58. Cambridge University Press (2012)
12. Gustafsson, J., Overier, G., Arlitt, M., Carlsson, N.: A first look at the CT landscape: Certificate Transparency logs in practice. In: Proceedings of the 18th Passive and Active Measurement Conference. pp. 87–99. Springer-Verlag (2017)
13. Hastings, M., Fried, J., Heninger, N.: Weak keys remain widespread in network devices. In: Proceedings of the 2016 ACM on Internet Measurement Conference. pp. 49–63. ACM (2016)
14. Heninger, N., Durumeric, Z., Wustrow, E., Halderman, J.A.: Mining your Ps and Qs: Detection of widespread weak keys in network devices. In: Proceedings of the USENIX Security Symposium. pp. 205–220. USENIX (2012)
15. Heninger, N., Halderman, J.A.: Fastgcd (2015), [cit. 2020-07-13]. Available from https://github.com/sagi/fastgcd
16. Lenstra, A.K., Hughes, J.P., Augier, M., Bos, J.W., Kleinjung, T., Wachter, C.: Ron was wrong, Whit is right. Cryptology ePrint Archive, Report 2012/064 (2012), [cit. 2020-07-13]. Available from https://eprint.iacr.org/2012/064
17. Mironov, I.: Factoring RSA Moduli II. (2012), [cit. 2020-07-13]. Available from https://windowsontheory.org/2012/05/17/factoring-rsa-moduli-part-ii/
18. Nemec, M., Klinec, D., Svenda, P., Sekan, P., Matyas, V.: Measuring popularity of cryptographic libraries in internet-wide scans. In: Proceedings of the 33rd Annual Computer Security Applications Conference. pp. 162–175. ACM (2017)
19. Nemec, M., Sys, M., Svenda, P., Klinec, D., Matyas, V.: The Return of Coppersmith's Attack: Practical Factorization of Widely Used RSA Moduli. In: 24th ACM Conference on Computer and Communications Security (CCS'2017). pp. 1631–1648. ACM (2017)
20. Parsovs, A.: Estonian electronic identity card: Security flaws in key management. In: 29th USENIX Security Symposium. USENIX Association (2020)
21. Rapid7: Rapid 7 Sonar SSL full IPv4 scan (2019), [cit. 2020-07-13]. Available from https://opendata.rapid7.com/sonar.ssl/
22. Software in the Public Interest: DSA-1571-1 openssl – predictable random number generator (2008), [cit. 2020-07-13]. Available from https://www.debian.org/security/2008/dsa-1571
23. Svenda, P., Nemec, M., Sekan, P., Kvasnovsky, R., Formanek, D., Komarek, D., Matyas, V.: The million-key question — Investigating the origins of RSA public keys. In: Proceedings of the USENIX Security Symposium. pp. 893–910 (2016)

24. VanderSloot, B., Amann, J., Bernhard, M., Durumeric, Z., Bailey, M., Halderman, J.A.: Towards a complete view of the certificate ecosystem. In: Proceedings of the 2016 ACM on Internet Measurement Conference. pp. 543–549. ACM (2016)

A Detailed discussion of classifier results

Some groups are accurately classified and rarely misclassified even with a single key available: namely, group 1 (Infineon prior to 2017, distinct because of the ROCA fingerprint), group 2 (Giesecke&Devrient SmartCafe 4.x and 6.0), group 24 (standard OpenSSL without the FIPS module enabled), and group 26 (Giesecke&Devrient SmartCafe 7.0) are all classified with more than 96% recall. Groups 1, 2, and 26 are rarely reported incorrectly as the origin library (false positives).

The keys from group 25 (OpenSSL avoiding only 8-bit small factors in p−1) are misclassified as group 24 (standard OpenSSL) in 31.6% of cases, which still identifies the origin library correctly and only misidentifies the OpenSSL compile-time configuration.

In contrast, keys from groups 7, 10, 11, 14, 15, and 17 are almost always misclassified (less than 8% recall, some even less than 1%). However, as discussed in the next section, if some additional information is available and can be considered, this misclassification can be largely remediated.

Keys from group 7 (Libgcrypt) are mostly misclassified as group 6 (PGP SDK 4, 64.5%) or group 13 (Gemalto GXP E64, 20.2%). As Libgcrypt is a commonly used library while groups 6 and 13 correspond to a very old library and card, this case demonstrates the possibility for further classifier improvement when some prior knowledge is available. E.g., for the TLS domain, groups corresponding to old smartcards or non-TLS libraries can be ruled out from the process.

Group 10 (Bouncy Castle since 1.54, Mocana 7.x, or HSM Thales nShieldF3) is misclassified as group 12 (smartcard Taisys SIMoME, 36.3%) or group 5 (Mocana 6.x, 21.0%). Additional information can improve the classification accuracy, as the Taisys smartcard is an unlikely source for most usage domains. If the Mocana library actually generated the key, only the identified version is incorrect.
Group 11 (cryptlib, Safenet HSM Luna SA-1700, and Feitian and Oberthur cards) is misclassified as group 12 (smartcard Taisys, 50.2%) or group 20 (Oberthur Cosmo Dual, 20.4%). This is a very similar case to that of group 10.

Group 14 (Microsoft and Crypto++, a prevalent group) is misclassified as group 6 (PGP SDK 4, 23.9%), group 12 (card Taisys, 20.1%), group 13 (card Gemalto GXP E64, 13.5%), or group 5 (Mocana 6.x, 10.7%). Again, for the TLS domain, the only real misclassification problem is with the Mocana 6.x library.

Group 15 (a large group with multiple frequently used libraries) is misclassified as group 12 (card Taisys, 27.2%), group 13 (card Gemalto GXP E64, 18.1%), group 20 (card Oberthur, 11.7%), or group 6 (PGP SDK 4, 32.3%). For the TLS domain, none of these misclassified groups is likely.

Group 17 (Nettle, Cryptix, FlexiProvider) is misclassified as multiple other groups, where only groups 5 (Mocana 6.x) and 9 (Bouncy Castle prior to 1.54 and SunRsaSign OpenJDK 1.8) cannot be ruled out as unlikely for the TLS domain.

(all values in %)
                         Top 1 match                          Top 2 match                          Top 3 match
#keys in batch     1     2     3     5    10  |     1     2     3     5    10  |     1     2     3     5    10
Group 1        100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0
Group 2        100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0
Group 3         86.3  98.1  99.8 100.0 100.0  |  98.2 100.0 100.0 100.0 100.0  |  98.2 100.0 100.0 100.0 100.0
Group 4         92.7  99.3  99.9 100.0 100.0  |  94.8  99.7 100.0 100.0 100.0  |  96.4  99.9 100.0 100.0 100.0
Group 5         60.8  76.3  79.8  90.7  96.6  |  71.5  90.1  93.6  98.7  99.9  |  73.0  91.3  97.6  98.8 100.0
Group 6         73.0  88.1  88.5  83.5  69.8  |  92.8  92.8  97.7  98.2  99.9  |  96.5  97.0  99.5  99.8 100.0
Group 7          7.6  18.9  30.0  47.9  73.6  |  77.3  95.5  98.8  99.9 100.0  |  92.7  99.3  99.9 100.0 100.0
Group 8         16.3  33.5  44.2  54.6  62.8  |  27.5  56.2  73.5  91.3  99.2  |  38.6  63.9  81.7  94.2  99.5
Group 9         12.8  28.3  38.9  50.9  61.1  |  37.7  65.7  79.1  90.4  99.0  |  48.3  75.9  87.8  96.8  99.8
Group 10         0.0  24.7  47.7  67.9  92.0  |  18.4  44.1  60.8  79.8  96.1  |  52.7  87.6  92.5  98.5 100.0
Group 11         6.9  21.8  34.2  51.6  63.1  |  56.7  87.2  95.9  99.4 100.0  |  73.2  95.2  99.2 100.0 100.0
Group 12        54.9  75.4  78.2  71.5  65.8  |  72.2  85.0  95.4  98.1 100.0  |  89.5  95.7  99.0  99.8 100.0
Group 13        47.2  57.0  69.6  84.8  96.3  |  52.9  68.6  80.9  93.8  99.5  |  66.4  82.9  91.4  98.0  99.8
Group 14         6.9  22.4  40.8  70.5  93.6  |   7.7  41.0  69.7  90.8  99.3  |  12.4  53.7  78.9  95.4  99.9
Group 15         0.2  28.0  52.7  80.0  96.5  |   2.5  43.4  65.4  90.2  99.4  |  28.2  64.6  81.0  94.4  99.7
Group 16        31.4  63.6  79.4  91.1  99.4  |  40.9  70.6  85.4  96.5 100.0  |  48.3  80.0  92.1  98.8 100.0
Group 17         5.1  28.6  50.2  78.0  97.6  |  18.3  51.2  71.9  92.0  99.7  |  37.7  73.0  89.0  98.1 100.0
Group 18        12.2  55.1  70.5  78.5  84.7  |  45.2  91.0  98.2 100.0 100.0  |  76.3  96.1  99.4 100.0 100.0
Group 19        44.0  54.4  59.7  67.3  78.5  |  54.5  88.3  97.3  99.9 100.0  |  62.1  93.8  99.1 100.0 100.0
Group 20        81.5  95.2  98.7  99.9 100.0  |  97.2 100.0 100.0 100.0 100.0  |  98.9 100.0 100.0 100.0 100.0
Group 21        53.0  77.9  88.4  97.0  99.9  |  95.2  99.7 100.0 100.0 100.0  |  97.6 100.0 100.0 100.0 100.0
Group 22        14.6  39.2  53.5  72.5  92.3  |  78.0  98.2  99.8 100.0 100.0  |  97.2  99.9 100.0 100.0 100.0
Group 23        77.4  98.0  99.9 100.0 100.0  |  96.8  99.9 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0
Group 24        96.8  99.9 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0
Group 25        58.3  86.7  96.1  99.7 100.0  |  87.6  97.9  99.6 100.0 100.0  |  93.9  99.7 100.0 100.0 100.0
Group 26       100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0  | 100.0 100.0 100.0 100.0 100.0
Average         47.7  64.2  73.1  82.2  89.4  |  66.3  83.3  90.9  96.9  99.7  |  76.1  90.4  95.7  98.9 100.0

Table 5. The average classification accuracy of the best performing Bayes classifier. In the i-th column group we consider the classifier successful if the true source of the key is among the i best guesses of our model. Similarly, for each of the three column groups we evaluate the success rate when 1, 2, 3, 5, or 10 keys from the same group are available.
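The top-i accuracy reported in Table 5 can be computed from a classifier's ranked guesses in a few lines. The sketch below is our own illustration with toy data (the function name and labels are not part of the evaluation pipeline); the batching over 2, 3, 5, or 10 keys, which would combine per-key posteriors before ranking, is omitted for brevity.

```python
def top_k_accuracy(ranked_guesses, true_labels, k):
    """Fraction of samples whose true label appears among the k best guesses.

    ranked_guesses: one list per sample, sorted from most to least likely group.
    true_labels: the actual source group of each sample.
    """
    hits = sum(truth in guesses[:k]
               for guesses, truth in zip(ranked_guesses, true_labels))
    return hits / len(true_labels)

# Toy example with three keys and three candidate groups:
ranked = [["g1", "g2", "g3"], ["g2", "g1", "g3"], ["g3", "g2", "g1"]]
truth = ["g1", "g1", "g2"]
top1 = top_k_accuracy(ranked, truth, 1)  # only the first sample is a top-1 hit
top2 = top_k_accuracy(ranked, truth, 2)  # all three samples are top-2 hits
```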

B Obtaining dataset of GCD-factorable keys

The fastgcd [15] tool based on [3] was used to perform the search for the GCD-factorable keys. Only valid RSA keys were considered⁸. Running the fastgcd tool for a high number of keys (around 112 million for the Rapid7 dataset) requires an extensive amount of RAM. Running the tool on a machine with 500 GB of RAM resulted in only a few factored keys, all sharing just tiny factors, while the tool did not produce any errors or warnings. The same computation on a subset of 10 million keys revealed a substantial number of large factors. Likely, the fastgcd tool requires even more RAM to function correctly with such a large number of keys. To solve the problem, we partitioned the time-ordered dataset into two subsets of 50 and 62 million keys, with an additional third subset of 50 million keys that partially overlapped both previous partitions. By doing so, we miss GCD-factorable keys that appeared in the dataset separated by a considerable time distance (2–3 years). We hypothesise that if a prevalent source starts producing GCD-factorable keys, we capture a sufficiently large batch of them within a single subset. In total, we have acquired 114 thousand unique factors from the whole dataset.

⁸ The factorization occasionally finds small prime factors up to 2^16, likely because the public key (certificate) was damaged, e.g., by a bit flip.

A. Attached publications of the author

A.2 Bringing kleptography to real-world TLS

The paper starts at the next page.

Bringing kleptography to real-world TLS

Adam Janovsky¹, Jan Krhovjak², and Vashek Matyas¹

¹ Masaryk University, [email protected]
² Invasys, a.s.

Abstract. Kleptography is the study of stealing information securely and subliminally from black-box cryptographic devices. The stolen information is exfiltrated from the device via a backdoored algorithm inside an asymmetrically encrypted subliminal channel. In this paper, the kleptography setting for the TLS protocol is addressed. While earlier proposals of asymmetric backdoors for TLS lacked the desired properties or were impractical, this work shows that a feasible asymmetric backdoor can be derived for TLS. First, the paper revisits the existing proposals of kleptographic backdoors for TLS of version 1.2 and lower. Next, advances of the proposal by Gołębiewski et al. are presented to achieve better security and indistinguishability. Then, the enhanced backdoor is translated both to TLS 1.2 and 1.3, achieving the first practical solution. Properties of the backdoor are proven and its feasibility is demonstrated by implementing it as a proof-of-concept into the OpenSSL library. Finally, the performance of the backdoor is evaluated and its detection via a side channel is studied.

Keywords: Asymmetric backdoor · Kleptography · TLS

1 Introduction

Tamper-proof devices were proposed as a remedy for many security-related problems. Their advantage is undeniable since they are protected from physical attacks and it is difficult to change the executed code. However, they inherently introduce trust in the manufacturer. It was shown that such devices are theoretically vulnerable to the presence of so-called subliminal channels [20]. Such channels can be used to exfiltrate private information from the underlying system covertly, inside cryptographic primitives. As a consequence, malware introduced by a manufacturer or a clever third-party adversary can utilise subliminal channels to break the security of black-box devices.

This paper concerns kleptography – the art of stealing information securely and subliminally – for the TLS protocol. The field of kleptography was established in the 1990s by Yung and Young [18]. Kleptographic backdoors for many protocols and primitives have been proposed ever since; for instance, we mention the RSA key generation protocol [18] and the Diffie-Hellman (DH) protocol [19]. Using kleptography, an attacker can subvert a target to deny confidentiality and authenticity of transferred data. Thus, it is important to explore the feasibility of kleptographic backdoors for various protocols, alongside methods for defeating such backdoors.

Several challenges arise when inventing a kleptographic backdoor. First, one must assure that such a backdoor cannot be detected by looking at the inputs and outputs of an infected device. Furthermore, the exploited channel is often narrow-band and the computing performance of the device should not be overly affected. Last but not least, one must prove the security of both the original cryptosystem and the encrypted subliminal channel.

Previous work on kleptography in the TLS protocol [9,21] showed that it is possible to utilise a single random nonce to exfiltrate session keys.
Both backdoors exploit a random field inside the ClientHello message, a 32-byte nonce that is sent to the server by the client. The proposal [9] was rather a sketch of an asymmetric backdoor and it lacked a few key properties. The work [21] was an important theoretical result and a proven asymmetric backdoor for the TLS protocol. Nonetheless, it remains impractical to implement. Also, neither of the papers addressed TLS of version 1.3. More insight into the related work follows in Section 5.

In this work we make the following contributions:
– We modify the backdoor [9] to achieve better security of the backdoor and also indistinguishability in the random oracle model.
– We prove that our proposal is an asymmetric backdoor for all versions of the TLS protocol, including TLS 1.3.
– We implement the backdoor as a proof-of-concept into the OpenSSL library, confirming its feasibility.
– We evaluate the performance of the backdoor and discuss its detectability.

The remainder of the paper is organized as follows. Section 2 gives basic background on kleptography. In Section 3 we show the design of our backdoor. Section 4 comments on how we implemented the backdoor and gives the exact results of our performance tests. It also shows how a timing channel can be used to detect our backdoor. Section 5 reviews related work and, finally, Section 6 concludes the paper.

2 Kleptography background

The work on kleptography utilises cryptology and virology and naturally extends the study of subliminal channels [20]; those are further encrypted and embedded into the devices, creating so-called asymmetric backdoors. As of 2018, the secretly embedded backdoor with universal protection (SETUP) is the supreme (and only) tool in the field of kleptography. One could therefore say that kleptography studies the development of asymmetric backdoors and possible defenses against them at the same time. Kleptography concerns the black-box environment exclusively, as in the white-box setting scrutiny allows one to detect such a channel. The aim of this section is to introduce the techniques involved in the design of an asymmetric backdoor. We begin with a formal description of an asymmetric backdoor, adopted from [20].

Definition 1. Assume that C is a black-box cryptosystem with a publicly known specification. A SETUP mechanism is an algorithmic modification made to C to obtain C′ such that:

1. The input of C′ agrees with the public specifications of the input of C.

2. C′ computes efficiently using the attacker's public encryption function E (and possibly other functions) contained within C′.

3. The attacker's private decryption function D is not contained within C′ and is known only to the attacker.

4. The output of C′ agrees with the public specifications of the output of C. At the same time, it contains published bits (of the user's secret key) which are easily derivable by the attacker (the output can be generated during key generation or during system operation like message sending).

5. Furthermore, the outputs of C and C′ are polynomially indistinguishable to everyone except the attacker.

6. After the discovery of the specifics of the SETUP algorithm and after discovering its presence in the implementation (e.g., reverse engineering of a hardware tamper-proof device), users (except the attacker) cannot determine past (or future) keys.

Consider that an asymmetric backdoor can itself be a subject of cryptanalysis. That is why the resulting subliminal channel must be encrypted according to good cryptographic practice. To keep the notation unambiguous, we call the person who attacks the backdoor an inquirer. We say that an asymmetric backdoor has an (m, n) leakage scheme if it leaks m keys/secret messages over n outputs of the cryptographic device. The desired leakage bandwidth that an asymmetric backdoor should achieve is (m, m), meaning that the whole private information is leaked within one execution of the protocol. Further, a publicly displayed value that also serves as an asymmetric backdoor is denoted a kleptogram.

2.1 Example of asymmetric backdoor

The paper continues with an example of the RSA key generation SETUP [18] to illustrate the concept of asymmetric backdoors. The backdoor allows for efficient factorization of the RSA modulus by evil Eve. First, Eve generates her public RSA key (N, E) and embeds it into the contaminated device of Alice, together with a subverted key-generation algorithm:

1. The device selects two distinct primes p, q, computes the product pq = n and Euler's function ϕ(n) = (p − 1)(q − 1).
2. The public exponent is derived as e = p^E (mod N). If e is not invertible modulo ϕ(n), a new p is generated.
3. The private exponent is computed as d = e^(−1) (mod ϕ(n)).
4. The public key is (n, e), the private key is (n, d).

After obtaining the kleptogram e contained in the public key (n, e), Eve can use her private key (N, D) to compute

e^D = p^(E·D) = p (mod N).

Thus, Eve can factorize the modulus n and compute the private exponent d just by eavesdropping on the public key. The reader may notice that the backdoor requires e to be uniformly distributed on the group (otherwise the backdoor can be detected), which leaves it unsuitable for real use. Yet, this toy example illustrates the concept of asymmetric backdoors beautifully. It is not difficult to prove that the backdoor fulfils all conditions of SETUP if e is to be picked uniformly in the clean system. Also, we note that this backdoor exhibits the ideal leakage bandwidth (m, m). However, this backdoor lacks perfect forward secrecy. Indeed, when the attacker's private key (N, D) is compromised, an inquirer can factorize all past and future keys from the particular key-generating device.
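The toy SETUP above fits in a few lines of code. The following sketch is our own illustration with deliberately tiny, insecure parameters (the helper names are ours): Eve's key (N, E) is embedded in the device, the device hides p inside the public exponent e, and Eve recovers p from the public key alone.

```python
import math
import random

def gen_prime(bits, rng):
    """Tiny trial-division prime generator -- adequate for toy-sized parameters."""
    while True:
        c = rng.randrange(1 << (bits - 1), 1 << bits) | 1
        if all(c % f for f in range(3, math.isqrt(c) + 1, 2)):
            return c

rng = random.Random(1)

# Eve's (attacker's) RSA key pair, hard-coded into the device.
pA, qA = gen_prime(12, rng), gen_prime(12, rng)
N = pA * qA                       # N exceeds any 16-bit device prime p
E = 65537
D = pow(E, -1, (pA - 1) * (qA - 1))

def subverted_keygen():
    """The contaminated key generation: e = p^E (mod N) leaks the prime p."""
    while True:
        p, q = gen_prime(16, rng), gen_prime(16, rng)
        if p == q:
            continue
        n, phi = p * q, (p - 1) * (q - 1)
        e = pow(p, E, N)          # the kleptogram
        if e < 2 or math.gcd(e, phi) != 1:
            continue              # step 2: pick a fresh p
        d = pow(e, -1, phi)
        return (n, e), (n, d), p

(public, private, true_p) = subverted_keygen()
n, e = public

# Eve factors n by eavesdropping on the public key only:
p_recovered = pow(e, D, N)        # e^D = p^(E*D) = p (mod N), since p < N
q_recovered = n // p_recovered
```

As the text notes, e produced this way is not uniformly distributed, so this variant is detectable; the sketch only demonstrates the (m, m) leakage.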

2.2 Kleptography in the wild

Recall that an asymmetric backdoor is a modification of an already established algorithm. Detection of such a modification therefore proves its malicious nature. In contrast, when an algorithm is designed to be kleptographic initially, its malevolence cannot be decided so easily. To illustrate this aspect, we briefly revisit the DUAL_EC_DRBG pseudorandom number generator invented by the NSA³. The generator was standardized in NIST SP 800-90A [2]. Later, a bit predictor with advantage 0.0011 was presented in [8]. Despite this serious flaw we concentrate on a different problem. In particular, the paper [2] shows potential kleptographic tampering. To be exact, the generator requires two constants on an elliptic curve,

i.e., P, Q ∈ E(F_p), and the security of the internal state relies on the intractability of the discrete logarithm problem for these constants. Consequently, if one is able to find a scalar k such that P = kQ, they are able to compute the inner state of the generator efficiently based on its output. This naturally breaks the security of the generator. Despite the fact that arbitrary P, Q can be used for the generator, the NIST standard forces the use of fixed constants of unknown origin. Naturally, the NSA is alleged to have provided the backdoored constants. The practical exploitability of backdoored constants was shown in [17]. However, no one except the NSA can prove or disprove that the constants in the standard are backdoored. This aspect suggests that not many SETUPs are likely to appear in the wild; rather, delicate modifications of otherwise secure algorithms are to be expected, such that their sensitivity to efficient cryptanalysis can be viewed as a coincidence.

³ National Security Agency.
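The trapdoor mechanism just described can be demonstrated on a toy curve. The sketch below is our own illustration, not the standardized generator: it uses the 28-point curve y² = x³ + x + 1 over F₂₃ (far too small for any security) and a simplified, untruncated variant of the state update (next state x(s·P), output x(s·Q)). Anyone who knows d with P = dQ can lift one output back to a curve point and jump to the generator's next state.

```python
p, a, b = 23, 1, 1          # toy curve y^2 = x^3 + x + 1 over F_23 (28 points)

def ec_add(P1, P2):
    """Affine point addition; None denotes the point at infinity."""
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p)
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P1):
    """Double-and-add scalar multiplication."""
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P1)
        P1 = ec_add(P1, P1)
        k >>= 1
    return R

Q = (0, 1)             # innocuous-looking public constant
d = 5                  # the designer's trapdoor scalar -- never published
P = ec_mul(d, Q)       # second public constant, P = d*Q

def prng_step(s):
    """Simplified Dual-EC-style step: next state x(s*P), output x(s*Q)."""
    return ec_mul(s, P)[0], ec_mul(s, Q)[0]

def attacker_next_state(output):
    """Lift the output to a curve point and apply the trapdoor d."""
    y = pow((output**3 + a * output + b) % p, (p + 1) // 4, p)  # sqrt, p = 3 mod 4
    return ec_mul(d, (output, y))[0]    # x(d*s*Q) = x(s*P) = next state

# Generate three outputs from a secret initial state (chosen so that no
# intermediate state hits zero on this toy curve).
s, outputs = 3, []
for _ in range(3):
    s, r = prng_step(s)
    outputs.append(r)

# From the FIRST output alone, the attacker recovers the following state
# and predicts every later output exactly.
s_rec = attacker_next_state(outputs[0])
predicted = []
for _ in range(2):
    s_rec, r = prng_step(s_rec)
    predicted.append(r)
```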

3 Attack design

Our proposal is based on [9] by Gołębiewski et al. However, several drawbacks are eliminated by our construction, and the properties of the backdoor are treated more rigorously. The needed improvements w.r.t. the proposal [9] were:
– to achieve indistinguishability of the kleptogram from a random bit string,
– to ensure that reverse engineering of the infected device will not compromise the security of any session,
– to allow the attacker to recover the master secret even if she fails to eavesdrop on some sessions.

An additional goal was to minimize the computational overhead introduced by the backdoor to avoid possible detection by timing analysis.

3.1 Backdoor description

During the TLS handshake, assuming no pre-shared key is involved and a new session is to be established, all traffic keys are derived from a pre-master secret and publicly available values. Thus, for an attacker, it suffices to obtain the pre-master secret to decrypt the whole session. Additionally, the pre-master secret can be derived via the DH method⁴, or via the RSA method in the case of TLS 1.2 and lower. (The removal of the RSA key exchange method from TLS 1.3 makes it more difficult for the industry to debug or inspect encrypted connections – for example in datacenters or in intrusion detection systems. This led to two RFC drafts [10, 11] that would mitigate this issue. The former allows for an opt-in mechanism that lets a TLS client and server explicitly grant access to the TLS session plaintext. The latter relies on introducing a static DH key exchange method to TLS 1.3. Naturally, both drafts inherently weaken the TLS 1.3 protocol and did not become part of the final TLS 1.3 RFC. A question arises whether the stated motivation behind introducing such drafts was honest, as debugging and inspection of traffic is possible even with ephemeral DH, only at the cost of adjusting the infrastructure.)

In the case of the DH method, a shared secret is established between the server and the client. In the case of the RSA method, the server's certificate is required and used to encrypt random bytes generated on the client device. Those bytes then serve as the pre-master secret. We exploit the 32-byte random nonce sent by the client in the ClientHello message to derive a secret only the attacker can obtain. We further sanitize that secret and use it as a seed in the function that creates the client's contribution to the pre-master secret.

⁴ By DH method we refer to DH modulo a prime. Nowadays, Diffie-Hellman over elliptic curves (ECDH) is mostly used in TLS connections. We stick to the modular case, even though our method can be translated to elliptic curves easily.

We begin with the presentation of the original kleptographic construction by Gołębiewski et al. The authors suggest to hardcode a DH public key Y = g^X on an infected device. During the first handshake on the device, a random value k is selected and g^k is published as the ClientHello random nonce. During subsequent executions, the ClientHello random nonce is not subverted, but the PMS is derived deterministically from H(Y^k, i), where H denotes a hash function and i is a counter ensuring that the secrets differ across sessions. Notice that when an attacker fails to eavesdrop on the first handshake, she will not be able to recover any of the subsequent sessions. At the same time, if an inquirer manages to capture the first handshake and either k or X, she can decrypt all previous and subsequent sessions. Also, the value Y^k must be stored in non-volatile memory on the infected device and is prone to reverse engineering, thus violating condition 6 of SETUP. Last but not least, the published nonce g^k can be distinguished from a random bit string since it is an element of a group, see [6] – this violates condition 5 of SETUP. Our improvements aim to eliminate all of the presented drawbacks.

The exact design of our proposal is as follows. Prior to deployment, a designer generates a DH key pair on the X25519 curve, denoted Y = g^X. The public key Y is then hard-coded into the infected device, together with a 128-bit AES key, denoted K, and a counter. The initial value of the counter is 1 and it is incremented by 2 after each execution. Suppose that the infected device connects to the server and the handshake is initiated. When construction of the ClientHello message is triggered, the infected device generates 32 random bytes denoted k and computes the public key g^k. The value g^k is then encrypted with AES-CTR into C = E_K(g^k) and published as a kleptogram inside the ClientHello random nonce. Meanwhile, the value S = Y^k is derived on the device as a shared secret between the attacker and the device. When the attacker eavesdrops the value C, she is able to derive g^k = D_K(C) and then S = g^(kX) using

Algorithm 1: Generate kleptogram and seed
Input: A public key Y, AES-CTR key K with counter
Output: The kleptogram C and seed S
  k ← 32 random bytes
  C ← E_K(g^k)
  S ← Y^k
  delete the value k securely
  return (C, S)

Algorithm 2: Generate pre-master secret
Input: Key exchange method, DH parameters and public key of the server if needed, seed S
Output: Pre-master secret PMS
  if key exchange method is RSA then
      PMS ← PRF(S, 46 bytes)
  else
      l ← length of the DH prime p in bits
      Z ← DH public key of the server
      x ← S
      do
          x ← PRF(x, l bits)
      while x = 0 or x = 1 or x ≥ p
      PMS ← Z^x
  end
  return PMS

her private key X. After obtaining S, the attacker can replicate the computation of the infected device. When the pre-master secret is to be derived, we differentiate two cases:

1. If the RSA method is used, the value S is stretched to 46 bytes by the TLS 1.2 pseudorandom function (PRF) and sent as the pre-master secret.

2. If the DH method is used, the server first sends the DH parameters to the client, including the prime p. The value S is then stretched by the TLS 1.2 PRF to a string of the same length as the prime p. This bit string is checked to fulfill the requirements for a DH private key (not being 0, 1, or ≥ p) and is used as the client's private key. If the requirements are not met, the output of the PRF is repeatedly fed back into the PRF until a proper key is generated.

Once the pre-master secret is generated, the handshake continues ordinarily.

The backdoor is described by Algorithms 1 and 2. Algorithm 1 generates the ClientHello random nonce and the seed S. The latter is further processed by Algorithm 2 to derive the pre-master secret.
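Algorithms 1 and 2 can be condensed into a short, self-contained sketch. The code below is our own illustration only: it substitutes DH modulo a toy (insecure) prime for X25519 and an HMAC-SHA256 keystream for both AES-CTR and the TLS PRF; all names are hypothetical, not OpenSSL's.

```python
import hashlib
import hmac
import secrets

# Toy group parameters -- a real deployment would use X25519 (Section 3.1).
P_DH = (1 << 61) - 1    # Mersenne prime; INSECURE, illustration only
G = 3

def stream(key, counter, n):
    """HMAC-SHA256 keystream; stands in for AES-CTR and for the TLS PRF."""
    out, block = b"", 0
    while len(out) < n:
        out += hmac.new(key, counter.to_bytes(8, "big") +
                        block.to_bytes(4, "big"), hashlib.sha256).digest()
        block += 1
    return out[:n]

def xor(x, y):
    return bytes(u ^ v for u, v in zip(x, y))

# Hard-coded into the infected device: attacker's public key Y and key K.
X = secrets.randbelow(P_DH - 2) + 1   # attacker's private key (kept offline)
Y = pow(G, X, P_DH)
K = secrets.token_bytes(16)

def device_client_hello(counter):
    """Algorithm 1: emit the kleptogram C, keep the seed S, discard k."""
    k = secrets.randbelow(P_DH - 2) + 1
    C = xor(pow(G, k, P_DH).to_bytes(32, "big"), stream(K, counter, 32))
    S = pow(Y, k, P_DH)               # shared secret with the attacker
    return C, S

def rsa_premaster(S):
    """Algorithm 2, RSA branch: stretch the seed to 46 pre-master bytes."""
    return stream(S.to_bytes(32, "big"), 0, 46)

# Attacker side: recover S (hence the pre-master secret) from C alone.
def attacker_seed(C, counter):
    gk = int.from_bytes(xor(C, stream(K, counter, 32)), "big")
    return pow(gk, X, P_DH)

C, S_device = device_client_hello(counter=1)
S_attacker = attacker_seed(C, counter=1)
```

The DH branch of Algorithm 2 (iterating the PRF until a valid private key appears) is omitted for brevity; it would reuse `stream` in the same way.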

The paper [14] proves that the counter mode (CTR) is polynomially indistinguishable from a random bit string under the assumption that the underlying cipher is a pseudorandom function (PRF). This holds when the value of the counter never repeats. We have selected AES in CTR mode with a 128-bit key to achieve indistinguishability. Some properties of this selection must be further discussed. First, NIST recommends [2] limiting the number of calls to a pseudorandom number generator (PRNG) keyed with a hard-coded value to 2^48 blocks. Since AES-CTR is essentially a PRNG, this recommendation should be respected. Consider that birthday collisions are likely to appear only after 2^64 bits of output, so they are trivially treated by the NIST recommendation. Since the backdoor uses two blocks of AES-CTR per handshake, this limits the functioning of the backdoor to 2^47 handshakes. It is emphasized that once the backdoor is reverse engineered and the symmetric key is obtained, the indistinguishability is broken.

3.2 Properties of SETUP proposal

Theorem 1. Under the following assumptions, our proposal is a SETUP:
– A random oracle is used to generate the values k and to sanitize the values C instead of a TLS PRF.
– AES is a random permutation.
– The computational DH assumption holds.

Proof. The reader can easily verify that properties 1–3 of SETUP hold for our backdoor. To prove property 4, we show how the attacker can obtain the seed S by eavesdropping on the handshake traffic. Notice that once the seed S is known, anyone can replicate the computation of the device that leads to the pre-master secret. When the attacker obtains the ClientHello nonce, she can decrypt it with the AES key K, obtaining the public key of the device g^k. The attacker can further utilise her private key X to compute the shared value S = g^(kX) = Y^k. Consider that when the DH key exchange method is used, the attacker must also eavesdrop the public key of the server and the parameters of the exchange; but those are sent in plaintext. To conclude, property 4 holds as well. We proceed with property 5. If AES is a random permutation, then AES-CTR produces output indistinguishable from a uniformly distributed bit string, as shown in [14]. Thus, one cannot distinguish between the kleptogram C and random bit strings (unless collisions occur, which was discussed earlier). Further, the resulting pre-master secrets (both for the RSA and the DH method) are uniformly distributed too, since we utilize the random oracle to sanitize C. Recall that the non-volatile memory of the infected device contains the AES key K with the counter and the public key Y. Only the key Y is relevant to the confidentiality of the shared secret S. Notice that obtaining the shared secret g^(kX) from g^k and g^X is equivalent to solving the ECDH problem. We therefore conclude that property 6 holds. ⊓⊔

To summarize, properties 1, 2, 3, 4 hold unconditionally for our proposal. Property 6 requires the computational DH assumption. Moreover, property 5 requires a random oracle and AES being a random permutation. This seems sufficient, as for practical deployment, speed is more pressing than provable indistinguishability.
Regarding the perfect forward secrecy, an inquirer is not able to recover past session secrets if she obtains the private key of the device, k. However, after obtaining the key X, the inquirer can break all past (and future) sessions.

4 Attack implementation

We have implemented the asymmetric backdoor into the OpenSSL library, version 1.1.1-pre2. Choosing this library allowed us to reveal whether the backdoor can pose as regular malware, without the requirements of a black-box environment. When one designs malware in the black-box setting, she is allowed to change both the implementation and the header files of the infected library. On the contrary, only the compiled binaries are infected in the case of regular malware. OpenSSL does not expose many low-level cryptographic functions to the high-level functions that are used in the TLS handshake. Our work shows that the proposal can be embedded into the compiled binary, leaving the header files untouched.

The pre-release version of the library was chosen because it provides certain functions for computations on the X25519 curve that are not available in previous releases. X25519 is implemented in a different way than other elliptic curves in OpenSSL, and older releases did not allow for the creation of specific keys on this curve; only random keys could be created. We decided to expose some low-level functions for direct use to achieve a simpler implementation. This resulted in a modification of the header files. Nevertheless, high-level interfaces could be used instead and the backdoor could be deployed as a compiled library. We faced no serious obstacles that would prevent the installation of the backdoor into the library.

4.1 Attack detection

We have also studied possible detection of the infected library via side channels. As we worked in the desktop environment, we limit our attention to the timing channel. Nonetheless, a power side channel could be a viable detection mechanism on different platforms. Several code snippets of both the infected and the clean version of OpenSSL were isolated, and their performance was evaluated and compared. This creates a possible detection mechanism for the backdoor, yet with certain limitations. As expected, the backdoor performs slower than the clean version. Nevertheless, this does not necessarily create a distinguisher. Suppose that all devices of a certain kind are infected. Then there exists no reference for how the uninfected version should perform. The inquirer must therefore somehow guess the expected performance and measure deviations based on this estimate. Also, the library could be used on various hardware and outperform clean versions when running on faster hardware.

Three code snippets were measured for the infected algorithm. In particular, those were the RSA pre-master secret generation snippet, the DH private key generation snippet, and the ClientHello nonce generation snippet. The last snippet was measured in two versions. The first version contained only one exponentiation (computation of the public key presented as the kleptogram). The second version was expanded by the shared-secret derivation between the attacker and the device. As the backdoor does not require any initialization except for loading the AES counter (other keys can be hard-coded in the binary), this aspect was ignored in the experiments.

Code snippet                       Average computation time in µs
Timer overhead                     0.312
g^k on X25519                      171.132
ClientHello clean                  4.726
ClientHello subverted              176.293
ClientHello subverted a)           246.843
RSA PMS clean                      4.887
RSA PMS subverted                  11.185
DH private key gen. clean          3178.587
DH private key gen. subverted      3218.496
TLS context builder                374.5

Table 1. Average execution times of code snippets. a) Version with the shared-secret precomputation.

4.2 Average execution times

Our measurements show that the whole subverted version runs 0.248 ms (0.282 ms) slower than the clean version when the RSA (DH) key exchange method is used. The subverted RSA key exchange method runs slower by a factor of 2.28 compared to the clean version. In contrast, the subverted DH key exchange method runs slower only by a factor of 1.01. This is because in the case of RSA, the newly introduced computations take relatively more time compared to the overall cost of the key exchange method. It can also be seen that such an increase in time cannot be spotted just by using the device. The interaction over the network creates an opportunity for obfuscating the computation times. The exponentiations, or even parts of them, could be precomputed once the handshake is initiated, stored, and only loaded from memory when needed. The more complex the underlying protocol is, the larger is the space for obfuscation. It is also questionable whether the inquirer would be able to isolate the corresponding snippets on a tamper-proof device to obtain precise measurements. To conclude, the timing-channel method is not reliable and most likely could be evaded by a skilled adversary. Nevertheless, the proposal can be detected when a clean version of OpenSSL is at hand and benchmarking is available on the same hardware on which the suspected version is running. As the highest increase in time is seen when the ClientHello nonce is generated, this could be the sweet spot for malware detection. Recall that the execution time of the ClientHello nonce function should correspond to the time in which the library generates 32 random bytes. If the function takes a substantially larger amount of time, the backdoor is likely to be present. We do not release the source code because it could easily be misused as malware.
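The timing distinguisher described above can be sketched with a simple benchmark harness. The snippet is our own illustration, not the paper's measurement code: the "subverted" nonce generation is modeled by adding one large modular exponentiation to the 32 random bytes, mirroring the extra public-key operation the backdoor introduces (all parameters are arbitrary).

```python
import os
import statistics
import time

def median_runtime(fn, runs=30):
    """Median wall-clock time of fn over several runs, in seconds."""
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

def clean_nonce():
    """What ClientHello nonce generation should cost: 32 random bytes."""
    return os.urandom(32)

# Model of the subverted path: one extra ~2048-bit modular exponentiation
# (stands in for computing g^k and Y^k on the device; values are arbitrary).
MOD = (1 << 2048) - 1942289
EXP = (1 << 2047) + 9

def subverted_nonce():
    pow(3, EXP, MOD)
    return os.urandom(32)

t_clean = median_runtime(clean_nonce)
t_subverted = median_runtime(subverted_nonce)
ratio = t_subverted / t_clean   # a large ratio suggests a backdoor is present
```

As the section argues, this only works as a distinguisher when a trusted baseline (`t_clean` on the same hardware) is available.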

5 Related work

Our work is based on the concept of [9]. In contrast to the original backdoor, our solution fulfils conditions 5 and 6 of the SETUP mechanism. The paper [22] by Young and Yung presents how to generate a shared secret with ECDH that is polynomially indistinguishable from a random bit string. Furthermore, they also mention its applicability to the TLS protocol and provide the first asymmetric backdoor for TLS. However, the proposal is rather impractical, as more than 300 ECDH key exchanges are required to execute a single backdoored handshake. Injective mappings of strings to elliptic curve points [1,6] could have interesting applications for kleptography, as their inversion could map ECDH keys to strings that are polynomially indistinguishable from random strings. Regarding the detection of backdoors via side channels, the paper [12] presents a method that studies variance in execution times of functions, which might reveal exponentiations newly introduced into the protocol – a common facet of kleptography.

In recent years, major advances came in the field of defenses against kleptographic adversaries. Most of them were published in [16]. The work achieves a general technique for preserving semantic security of a cryptosystem put into the kleptographic setting. Also, the paper classifies the already proposed defenses into three categories:
– abandoning randomness in favour of deterministic computation [3–5],
– use of a trusted module that can re-randomize subverted primitives [7,13],
– hashing the subverted randomness [15].

6 Conclusions

TLS is an essential protocol for securing data on the transport layer. As such, TLS is omnipresent in the era of computer networks, having applications in HTTPS, VPNs, payment gateways and many others. The widespread use of TLS motivated us to study its vulnerability in the kleptographic setting. We aimed to answer whether a kleptographic backdoor can be practically implemented into TLS libraries.

Our efforts resulted in a design of an asymmetric backdoor for all versions of the TLS protocol. Such a backdoor can be used to exfiltrate session keys from a captured handshake by a passive eavesdropper, leading to a denial of confidentiality and authenticity of the whole session. We also demonstrated that it is fairly simple to implement the backdoor into an open-source TLS library while maintaining a reasonable performance of the library. We stress that to install our backdoor, an adversary must have access to the target device. In such cases, other dangerous scenarios arise – we mention ransomware as an example. However, the important property of our backdoor is that it may stay unnoticed for a long time on the target device. Also, it may be embedded into particular hardware by its manufacturer or by organizations with sufficient resources. We also showed that timing analysis may prove to be an effective defense, depending on the powers of an inquirer.

For future work, we suggest studying whether an effective defense could be derived for TLS at the protocol level, for instance, in the form of a protocol extension. Regarding offensive techniques, if the mappings [1,6] could be combined with a cryptographic key, they would allow for ECDH secrets indistinguishable from random noise.

References

1. Aranha, D.F., Fouque, P.A., Qian, C., Tibouchi, M., Zapalowicz, J.C.: Binary elligator squared. In: Selected Areas in Cryptography – SAC 2014. LNCS, vol. 8781, pp. 20–37. Springer, Cham (2014)
2. Barker, E.B., Kelsey, J.M.: Recommendation for Random Number Generation Using Deterministic Random Bit Generators. Tech. rep. (2015)
3. Bellare, M., Hoang, V.T.: Resisting randomness subversion: Fast deterministic and hedged public-key encryption in the standard model. In: Advances in Cryptology – EUROCRYPT 2015. LNCS, vol. 9056, pp. 627–656. Springer, Berlin, Heidelberg (2015)
4. Bellare, M., Jaeger, J., Kane, D.: Mass-surveillance without the state: Strongly undetectable algorithm-substitution attacks. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security – CCS '15. pp. 1431–1440. ACM, New York (2015)
5. Bellare, M., Paterson, K.G., Rogaway, P.: Security of symmetric encryption against mass surveillance. In: Advances in Cryptology – CRYPTO 2014. LNCS, vol. 8616, pp. 1–19. Springer-Verlag, Berlin, Heidelberg (2014)
6. Bernstein, D.J., Hamburg, M., Krasnova, A., Lange, T.: Elligator: Elliptic-curve points indistinguishable from uniform random strings. In: Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security – CCS '13. pp. 967–980. ACM, New York (2013)
7. Dodis, Y., Mironov, I., Stephens-Davidowitz, N.: Message transmission with reverse firewalls – secure communication on corrupted machines. In: Advances in Cryptology – CRYPTO 2016. LNCS, vol. 9814, pp. 341–372. Springer-Verlag, Berlin, Heidelberg (2016)
8. Gjøsteen, K.: Comments on Dual-EC-DRBG/NIST SP 800-90, draft December 2005. Tech. rep.
9. Gołębiewski, Z., Kutyłowski, M., Zagórski, F.: Stealing secrets with SSL/TLS and SSH – kleptographic attacks. In: Cryptology and Network Security – CANS '06. LNCS, vol. 4301, pp. 191–202. Springer-Verlag, Berlin, Heidelberg (2006)
10. Green, M., Droms, R., Housley, R., Turner, P., Fenter, S.: Data Center use of Static Diffie-Hellman in TLS 1.3. RFC Draft (2017), https://tools.ietf.org/html/draft-green-tls-static-dh-in-tls13-01
11. Housley, R., Droms, R.: TLS 1.3 Option for Negotiation of Visibility in the Datacenter. RFC Draft (2018), https://tools.ietf.org/html/draft-rhrd-tls-tls13-visibility-01
12. Kucner, D., Kutyłowski, M.: Stochastic kleptography detection. In: Public-Key Cryptography and Computational Number Theory. pp. 137–149. De Gruyter (2001)
13. Mironov, I., Stephens-Davidowitz, N.: Cryptographic reverse firewalls. In: Advances in Cryptology – EUROCRYPT 2015. LNCS, vol. 9056, pp. 657–686. Springer, Berlin, Heidelberg (2015)
14. Rogaway, P.: Evaluation of some blockcipher modes of operation. Tech. rep., Cryptography Research and Evaluation Committees (CRYPTREC) for the Government of Japan (2011)
15. Russell, A., Tang, Q., Yung, M., Zhou, H.S.: Cliptography: Clipping the power of kleptographic attacks. In: Advances in Cryptology – ASIACRYPT 2016. LNCS, vol. 10032, pp. 34–64. Springer-Verlag, Berlin, Heidelberg (2016)
16. Russell, A., Tang, Q., Yung, M., Zhou, H.S.: Generic semantic security against a kleptographic adversary. In: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security – CCS '17. pp. 907–922. ACM, New York (2017)
17. Checkoway, S., et al.: On the practical exploitability of Dual EC in TLS implementations. In: Proceedings of the 23rd USENIX Security Symposium – SEC '14. pp. 319–335 (2014)
18. Young, A., Yung, M.: The dark side of "black-box" cryptography or: Should we trust Capstone? In: Advances in Cryptology – CRYPTO '96. LNCS, vol. 1109, pp. 89–103. Springer-Verlag, Berlin, Heidelberg (1996)
19. Young, A., Yung, M.: Kleptography: Using cryptography against cryptography. In: Advances in Cryptology – EUROCRYPT '97. LNCS, vol. 1233, pp. 62–74. Springer, Berlin, Heidelberg (1997)
20. Young, A., Yung, M.: Malicious Cryptography: Exposing Cryptovirology. Wiley, Hoboken, NJ (2004)
21. Young, A., Yung, M.: Space-efficient kleptography without random oracles. In: Information Hiding: 9th International Workshop, IH 2007. LNCS, vol. 4567, pp. 112–129. Springer-Verlag, Berlin, Heidelberg (2007)
22. Young, A., Yung, M.: Kleptography from standard assumptions and applications. In: Security and Cryptography for Networks. LNCS, vol. 9841, pp. 271–290. Springer International Publishing (2010)