A Semantic‑Based Analysis of Android Malware for Detection, Generation, and Trend Analysis
This document is downloaded from DR-NTU (https://dr.ntu.edu.sg), Nanyang Technological University, Singapore.

Meng, G. (2017). A semantic-based analysis of Android malware for detection, generation, and trend analysis. Doctoral thesis, Nanyang Technological University, Singapore.
http://hdl.handle.net/10356/72122
https://doi.org/10.32657/10356/72122

NANYANG TECHNOLOGICAL UNIVERSITY

A SEMANTIC-BASED ANALYSIS OF ANDROID MALWARE FOR DETECTION, GENERATION, AND TREND ANALYSIS

GUOZHU MENG
School of Computer Science and Engineering

A thesis submitted to the Nanyang Technological University in partial fulfilment of the requirements for the degree of Doctor of Philosophy

2017

THESIS ABSTRACT

A Semantic-based Analysis of Android Malware for Detection, Generation, and Trend Analysis
by GUOZHU MENG
Doctor of Philosophy
School of Computer Science and Engineering
Nanyang Technological University, Singapore

Android has grown to be the most popular mobile operating system since its release in 2008. Due to its openness and ease of use, it attracts thousands of vendors and developers working on Android application development. Millions of apps provide a variety of functionalities to Android users, such as online shopping, instant messaging, gaming, and map services. However, its prevalence has also made Android a prime target for cybercriminals. According to Symantec's 2016 security report, the number of Android malware samples reached 13 million in 2015. Cybercriminals upload Android malware to either the official Google market or unofficial markets every day, which puts users at high risk. The malware may steal users' sensitive information, escalate privileges, remotely control devices, and encrypt users' files for ransom. It is non-trivial to understand these risks and develop effective mitigations against them.
Malware is a critical and non-trivial issue in Android security. To prevent malware from attacking users, we need a better understanding of Android malware and its behaviors, which can facilitate the extraction of representative features from malware and thereby enhance malware detection. Malware and anti-malware tools keep evolving as they compete with each other. Therefore, it is valuable to learn the characteristics of evolving malware and the weaknesses of existing anti-malware tools. Moreover, sustained malware analysis and security assessment are lacking for the Android world. To address these problems, we propose a semantic-based malware analysis on these topics, with the following achievements in this thesis:

1. We propose a precise semantic model of Android malware based on Deterministic Symbolic Automaton (DSA) for the purpose of malware comprehension, detection, and classification. Based on DSA, we develop an automatic analysis framework, named SMART, which learns DSA by detecting and summarizing semantic clones from malware families, and then extracts semantic features from the learned DSA to classify malware according to the attack patterns. We conduct experiments on both a malware benchmark and 223,170 real-world apps. The results show that SMART builds meaningful semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection. SMART identifies 4583 new malware samples in real-world apps that are missed by most anti-virus tools. The classification step further identifies new malware variants and unknown families.

2. We first propose a meta model for Android malware to capture the common attack features and evasion features in the malware. Based on this model, we develop a framework, MYSTIQUE, to automatically generate malware covering four attack features and two evasion features, by adopting the software product line engineering approach.
With the help of MYSTIQUE, we conduct experiments to 1) understand Android malware and the associated attack features as well as evasion techniques; 2) evaluate and compare 57 off-the-shelf anti-malware tools, 9 academic solutions, and 4 Android market vetting processes in terms of accuracy in detecting attack features and capability in addressing evasion. Last but not least, we provide a benchmark of Android malware with proper labeling of the contained attack and evasion features. Moreover, we extend this work to MYSTIQUE-S to explore the capabilities of anti-malware tools in detecting malware with dynamic code loading. MYSTIQUE-S automatically selects attack features under various user scenarios and delivers the corresponding malicious payloads at runtime. Relying on dynamic code binding (via service) and loading (via reflection) techniques, MYSTIQUE-S enables the dynamic execution of payloads on user devices at runtime. Experimental results on real-world devices show that existing Anti-Malware Tools (AMTs) are incapable of detecting most of our generated malware. Finally, we propose some enhancements for existing anti-malware tools.

3. We propose a systematic approach to study Android malware, unveil security issues, obtain insightful conclusions and highlights, and predict the future trend for research. We have collected 4,267,178 Android apps from a variety of Android marketplaces, among which 1,004,550 malware variants are identified and analyzed. Unlike previous works, this work focuses on the differences and evolution of apps' characteristics, and identifies multiple security-related issues of concern to both academia and industry. To provide a comprehensive view of these issues, we propose four analyses, on individual apps, malware families, malware authors, and markets, to conduct our study and guide the analysis.
Furthermore, we propose six dimensions to cluster apps for different analysis tasks to achieve efficiency and accuracy in the large-scale analysis. Some of the key findings reflect the characteristics of attacks, and the weaknesses in protection, which can benefit all stakeholders.

Contents

1 Introduction
  1.1 Motivations and Goals
  1.2 Main Works and Contributions
  1.3 Thesis Outline
  1.4 Publication List

2 Background and Preliminaries
  2.1 Android System
  2.2 Android Malware
  2.3 Android Defense
    2.3.1 Brief on Anti-Malware Techniques
    2.3.2 Detection Mechanism
      2.3.2.1 Evidence Collection
      2.3.2.2 Knowledge-base Detection
    2.3.3 Summary
  2.4 Android Malware Dataset

3 Semantic Modelling of Android Malware for Malware Detection
  3.1 Introduction
  3.2 Related Work
  3.3 Modelling Android Malware
  3.4 The SMART Framework
  3.5 Learning Malicious Behaviors
    3.5.1 Bytecode Clone Detection
    3.5.2 Bytecode Differencing
    3.5.3 Semantic Model Construction
      3.5.3.1 DSA Construction
      3.5.3.2 Object-based Action Extraction
  3.6 Malware Detection and Classification
    3.6.1 Machine Learning Based Detection
    3.6.2 DSA based Detection and Classification
  3.7 Evaluation
    3.7.1 RQ1: Evaluation of the Semantic Model
    3.7.2 RQ2: Malware Detection based on ML
    3.7.3 RQ3: Evaluation on Real World Apps
    3.7.4 RQ4: Resilience to Malware Variants
    3.7.5 RQ5: Scalability & Efficiency
  3.8 Discussion
  3.9 Conclusion

4 Evolving Android Malware for Auditing Anti-Malware Tools
  4.1 Introduction
  4.2 Mystique Overview
    4.2.1 Mystique Overview
    4.2.2 Technical Challenge
  4.3 Feature-oriented Domain Analysis of Android Malware
    4.3.1 Attack Feature
    4.3.2 Evasion Feature
  4.4 Multi-objective Guided Malware Generation
    4.4.1 Feature Selection via IBEA
    4.4.2 Construction of Malicious Behaviors
    4.4.3 Code Assembly
    4.4.4 Evasion Application
    4.4.5 Objective Evaluation
  4.5 Malware and AMT Evaluation
    4.5.1 Hypothesis of Anti-malware Tools & Research Questions
    4.5.2 Evaluation Subjects
    4.5.3 RQ1: Validity of Generated Malware
    4.5.4 RQ2: Auditing of AMTs
    4.5.5 RQ3: Representative Malware and Usefulness of Mystique
      4.5.5.1 The Usefulness of MYSTIQUE
      4.5.5.2 The Rapid Acquisition of Optimal Malware
  4.6 Discussion
    4.6.1 Threats to Validity
    4.6.2 Extensibility of Mystique
    4.6.3 Countermeasure for Generated Malware
    4.6.4 Evasion vs. Obfuscation
  4.7 Related Work