Analysis and Classification of Android Malware
Total Page:16
File Type:pdf, Size:1020Kb
Analysis and Classification of Android Malware Kimberly Kim-Chi Tam Information Security Department Royal Holloway, University of London A thesis submitted for the degree of Doctor of Philosophy 2016 Acknowledgements First I would like to acknowledge my supervisor and advisor, Dr. Lorenzo Cavallaro and Professor Cid Carlos, for the amazing opportunity to do my PhD at Royal Holloway University of London. It would not have been possible without you two. I would also like to acknowledge my fellow Royal Holloway University of London lab members and external collaborators for their partnership in collaborative work. In particular I would like to thank Salahuddin Khan for his technical expertise, his professionalism, and his friendship. Similarly, I feel much gratitude towards Professor Kevin Jones and Dr. War- rick Cooke for their extensive advice for both fine-tuning this thesis in detail and the overall story and flow. Parts of this research was funded by UK EPSRC grant EP/L022710/1. I would also like to thank Hewlett-Packard in Bristol for the internship op- portunity during my PhD as well as all of their support since then. Finally, I want to acknowledge my family and friends scattered across the globe. Organizing opportunities to connect has been a real challenge over the last few years, but we always managed it. So thank you all for the fun and help throughout the good times and bad. I could not have gotten this far without your awesomeness. “One does not discover new lands without consenting to lose sight of the shore for a very long time.” Andre Gide 2 Abstract With the integration of mobile devices into our daily lives, smartphones are privy to increasing amounts of sensitive information. As of 2016, Android is the leading smartphone in popularity with sophisticated mobile malware targeting its data and services. Thus this thesis attempts to determine how accurate and scalable Android malware analysis and classification methods can be developed to robustly withstand frequent, and substantial, changes within the Android device and in the Android malware ecosystem. First, the author presents a comprehensive survey on leading Android mal- ware analysis and detection techniques, and their effectiveness against evolv- ing malware. Through the systematized survey, the author identifies under- developed areas of research which lead to the development of the novel Android malware analysis and classification solutions within in this thesis. This thesis considers the usefulness and feasibility of reconstructing high- level behaviours via system calls intercepted while running Android apps. Previously, this method had only been rudimentarily implemented. How- ever, the author was able to remedy this and developed a robust, novel, framework, to automatically and completely reconstructs all Android mal- ware behaviours by thoroughly analysing dynamically captured system calls. Next, the author investigates the efficacy of using our reconstructed be- havioural profiles, at different levels of abstractions, to classify Android malware into families. Experiments in this thesis show our reconstructed behaviours to be more effective, and efficient, than raw system call traces. To classify malware, we utilized support vector machines to achieve high accuracy, precision and recall. Deviating from previous methods, we fur- ther apply statistical classification to achieve near-perfect accuracies. Finally, the author explores an alternative Android malware analysis method using memory forensics. By extrapolating from these experiments, the au- thor theorizes how to use this method to assist in capturing behaviours our previous methods could not, and how they could assist classification. 3 Contents 1 Introduction 12 1.1 Android Malware Threat . 12 1.2 Desirable Solution Traits . 13 1.3 Organization of the Thesis . 14 1.4 Declaration of Authorship for Co-Authored Work . 15 2 Background 17 2.1 Introduction . 18 2.1.1 Evolution of Malware . 19 2.1.2 Android Architecture . 22 2.1.3 Notable Android Malware . 23 2.1.4 Statistics for Android Malware Evolution . 24 2.2 Taxonomy of Mobile Malware Analysis Approaches . 27 2.2.1 Static Analysis . 28 2.2.2 Dynamic Analysis . 32 2.2.3 Hybrid Analysis . 36 2.2.4 Analysis Techniques . 36 2.2.5 Feature Selection . 42 2.2.6 Building on Analysis . 43 2.3 Malware Evolution to Counter Analysis . 44 2.3.1 Trivial Layout Transformations . 44 2.3.2 Transformations that Complicate Static Analysis . 44 2.3.3 Transformations that Prevent Static Analysis . 46 2.3.4 Anti-Analysis and VM-Aware . 46 2.4 Discussion . 47 2.4.1 Impact and Motivation . 47 2.4.2 Mobile Security Effectiveness . 51 2.5 General Areas for Future Research . 53 4 Analysis and Classification of Android Malware 2.5.1 Hybrid Analysis and Multi-Levelled Analysis . 53 2.5.2 Code Coverage . 53 2.5.3 Hybrid Devices and Virtualization . 54 2.5.4 Datasets . 55 2.6 Thesis Goals and Contributions . 56 3 Automatic Reconstruction of Android Behaviours 57 3.1 Introduction . 58 3.2 Relevant Background Information . 60 3.2.1 Android Applications . 60 3.2.2 Inter-Process Communications and Remote Procedure Calls . 61 3.2.3 Android Interface Definition Language . 63 3.2.4 Native Interface . 63 3.3 Overview of CopperDroid . 64 3.3.1 Independent of Runtime Changes . 64 3.3.2 Tracking System Call Invocations . 65 3.4 Automatic IPC Unmarshalling . 66 3.4.1 AIDL Parser . 66 3.4.2 Concept for Reconstructing IPC Behaviours . 67 3.4.3 Unmarshalling Oracle . 68 3.4.4 Recursive Object Exploration . 73 3.4.5 An Example of Reconstructing IPC SMS . 74 3.5 Observed Behaviours . 76 3.5.1 Value-Based Data Flow Analysis . 78 3.5.2 App Stimulation . 79 3.6 Evaluation . 81 3.6.1 Effectiveness . 81 3.6.2 Performance . 84 3.7 Limitations and Threat to Validity . 86 3.8 Related Work . 87 3.9 Summary . 90 4 Classifying Android Malware 91 4.1 Introduction . 92 4.2 Relevant Machine Learning Background . 94 4.2.1 Support Vector Machines (SVM) . 94 4.2.2 Conformal Prediction (CP) . 96 5 Analysis and Classification of Android Malware 4.3 Novel Hybrid Prediction: SVM with CP . 97 4.4 Obtaining CopperDroid Behaviours . 98 4.4.1 System Architecture . 98 4.4.2 Modes and Thresholds . 99 4.4.3 Parsing CopperDroid JSONs and Meta data . 100 4.4.4 Behaviour Extraction . 100 4.5 Multi-Class Classification . 102 4.5.1 Behaviour Feature Vectors for SVM . 103 4.5.2 Accuracy and Limitations of Our Traditional SVM . 105 4.5.3 Enhancing SVM with Conformal Predictions . 105 4.6 Statistics and Results . 106 4.6.1 Dataset, Precision, and Recall . 107 4.6.2 Classification Using SVM . 108 4.6.3 Coping with Sparse Behavioural Profiles . 113 4.6.4 Hybrid Prediction: SVM with Selective CP . 115 4.7 Limitations and Threat to Validity . 118 4.8 Related Works . 119 4.9 Summary . 122 5 Supplemental Analysis Approach 123 5.1 Android Memory Image Forensics . 124 5.2 Relevant Background . 125 5.2.1 Memory Forensics . 126 5.2.2 The Android System Recap . 126 5.3 Specific Examples of Malware Behaviours . 128 5.3.1 Root Exploits . 129 5.3.2 Stealing User and Device Data . 132 5.4 Design and Implementation . 133 5.4.1 Android SDK and Kernel . 133 5.4.2 LiME . 134 5.4.3 Volatility . 135 5.4.4 Stimuli and Modifications . 136 5.5 Memory Analysis . 136 5.5.1 Libraries Artefacts . 137 5.5.2 Linux Filesystem Artefacts . 138 5.5.3 Process Tree and ID Artefacts . 140 6 Analysis and Classification of Android Malware 5.5.4 Extras: Miscellaneous String . 142 5.5.5 Theory: Memory Fingerprints . 143 5.6 Case Studies . 144 5.6.1 BaseBridge A . 144 5.6.2 DroidKungFu A . 151 5.7 Limitations and Threat to Validity . 158 5.8 Related Works . 159 5.9 Summary . 161 6 Conclusions and Future Work 163 6.1 Restating Research Problems and Research Goals . 164 6.2 Research Contributions and Distribution of Work . 164 6.2.1 CopperDroid . 165 6.2.2 Android Malware Classification . 165 6.2.3 Memory Forensics . 166 6.2.4 Goals Met . 167 6.3 Future Directions . 167 6.4 Concluding Remarks . 170 A Comparison of Related Works 171 B Overall Results on McAfee dataset 174 C Classification Feature Statistics 176 D Malware Manifests 178 Bibliography 184 7 List of Figures 2.1 Worldwide Smartphone Sales by Operating System (OS) from 2006 to end of 2014. 21 2.2 Overview of the Android Operating System (OS) Architecture. 23 2.3 Evolution of Android malware using obfuscating techniques (e.g., cryp- tography APIs). ..