A Detection Framework for Android Financial Malware
Total Page:16
File Type:pdf, Size:1020Kb
A Detection Framework for Android Financial Malware by Andi Fitriah Abdul Kadir Master of Computer Science, IIUM, 2013 Bachelor of Computer Science, IIUM, 2009 A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy In the Graduate Academic Unit of Faculty of Computer Science Supervisor(s): Ali A. Ghorbani, Ph.D., Computer Science, Natalia Stakhanova, Ph.D., Computer Science Examining Board: Bradford Nickerson, Ph.D., Faculty of Computer Science, Rongxing Lu, Ph.D., Faculty of Computer Science, Constantine Passaris, Ph.D., Faculty of Arts External Examiner: Nur Zincir-Heywood, Ph.D., Faculty of Computer Science, Dalhousie This dissertation is accepted Dean of Graduate Studies THE UNIVERSITY OF NEW BRUNSWICK December, 2018 ©Andi Fitriah Abdul Kadir, 2019 Abstract As attempts to thwart cybercrime have intensified, so have innovations in how cybercrimi- nals provision their infrastructure to sustain their activities. Consequently, what motivates cybercriminals today has changed: from ego and status, to interest and money; the cyberse- curity researchers today have turned their attention to financial-related malware, especially on the Android platform. There are some major issues faced by researchers in detecting Android financial malware. Firstly, what constitutes Android financial malware is still ambiguous. There is a disparity in labelling the type of malware where most of the cur- rent detection systems emphasize the recognition of generic malware types (e.g., Trojan, Worm) rather than indicating its capabilities (e.g., banking malware, ransomware). With- out knowing what constitutes financial malware, the detection systems are not capable of providing an accurate recognition of an advanced and sophisticated financial-related mal- ware. Secondly, most of the current anomaly-based detection systems via machine learn- ing suffer from inaccurate evaluation and comparison due to the lack of adequate datasets, which result in unreliable outputs for real-world deployment. Due to time-consuming processes, most of the available datasets are crafted mainly for static analysis, and those created for dynamic analysis are installed on an emulator or sandbox. Sophisticated mal- ware can bypass these approaches; malware authors have employed obfuscation methods and included a wide range of anti-emulator techniques, where the malware programs at- tempt to hide their malicious activities by detecting the emulator. These deficiencies are some of the major reasons why Android financial malware is able to avoid detection. ii A comprehensive understanding of the existing Android financial malware attacks supported by a unified terminology and high-quality dataset is required for the deploy- ment of reliable defence mechanisms against these attacks. Therefore, we seek to under- stand trends and relationships between Android malware families and devise a taxonomy of Android financial malware attacks. In addition, a systematic approach to generate the required datasets is presented to address the need to use physical platforms instead of em- ulators. In this regard, an automated dynamic analysis system running on smartphones is developed to generate the desired dataset in a testbed environment. In order to correlate the generated dataset and the proposed taxonomy, a hybrid framework for malware detection is presented. We propose a novel combination of both static and dynamic analysis based specifically on features derived from the string literal (statically via reverse engineering) and network flow (dynamically on smartphones). This combination can assist security analysts in recognizing the threats effectively. We employ five common classifiers to con- struct the best model to identify malware at four levels: detecting malicious Android apps, classifying Android apps with respect to malware category and sub-category, and charac- terizing Android apps according to malware family. Specifically, a dataset containing over 5,000 samples is used to evaluate the performance of the proposed method. The experi- mental results show that the proposed method with a Random Forest classifier achieves an accuracy of over 90% with a very low false positive rate of 4% on average. iii Dedication O Lord, increase me in knowledge while keeping me humble For my parents, Andi Abdul Kadir & Niswa Mohd Noor, who gave me wings. For my late grandparents, Andi Arfah Andi Saleh & Andi Dinar Andi Bombang, Mohd Noor Usman & Siti Dinar Saide, who taught me how to spread my wings. For family, teachers, & friends, who encouraged me to fly. rrr iv Acknowledgements My heartfelt appreciation to both of my supervisors, Dr. Stakhanova and Dr. Ghorbani for their dedicated guidance, encouragement, and warm support throughout my studies. It was a privilege to have been given the opportunity to study at the Canadian Institute for Cybersecurity (CIC). I gratefully acknowledge the help of my examining board, especially Dr. Lu and Dr. Nickerson for taking the time to provide me with constructive comments in completing my dissertation. I am indebted to my parents for their unconditional love and support. Without my father’s encouraging words and my mother’s caring nature, I would not have been resilient enough to make it through my four years here. A very special thanks goes to my siblings, relatives, teachers, and friends for standing by my side through the good and bad times. They are indeed my sunshine. I thank as well my fellow colleagues at CIC, especially Hugo, Vaibhavi, Yan Li, Amir, Abdullah, Elaheh, Xi Chen, Alina, Samaneh, Laya, Nasim, Iman, Rasool, and Saeed. It was a great pleasure working with them, and I appreciate their ideas and good humour. I would like to thank all the people at the Faculty of Computer Science who help directly or indirectly, espe- cially to Dr. Lashkari, Dr. Al-Hadidi, Dr. Mamun, Dr. Evans, the system administrator (Hamidreza, John, and Ivan), the Post-Docs (Ratinder and Marco), the Co-Op Students (Riley and Merghan), and the administration staffs (Ali, Pamela, Jodi, and Brenda). Last but not least, I graciously acknowledge the funding from the International Islamic University Malaysia (IIUM) and Malaysia’s Ministry of Education for granting me this opportunity to pursue higher education. v Table of Contents Abstract ii Dedication iv Acknowledgmentsv Table of Contents ix List of Tables xii List of Figures xv 1 Introduction1 1.1 Motivation.................................. 4 1.2 Research Question and Objective...................... 5 1.3 Contributions ................................ 7 1.4 Thesis Organization............................. 8 2 Background and Literature Review 10 2.1 Overview .................................. 10 2.2 Background................................. 11 2.2.1 Understanding Android Applications................ 12 2.2.2 Understanding Android Security.................. 14 2.2.3 Understanding Android Malware.................. 16 2.3 Literature Review.............................. 19 vi 2.3.1 Android Malware Dataset...................... 26 2.3.2 Android Malware Detection .................... 29 2.3.3 Android Sandbox and Virtual Environment ............ 31 2.4 Summary .................................. 35 3 Proposed Taxonomy 36 3.1 Overview .................................. 36 3.2 Motivating Example............................. 36 3.3 Android Financial Malware Definition................... 38 3.4 Android Financial Malware Classification................. 39 3.4.1 Mobile service exploitation (SMS malware)............ 41 3.4.2 Mobile usage restriction (Ransomware) .............. 44 3.4.3 Mobile apps exploitation (Banking Malware, Scareware, Adware) 47 3.5 Summary .................................. 51 4 Proposed Framework 52 4.1 Overview .................................. 52 4.2 Conceptual Framework........................... 53 4.2.1 Stage 1: Collector.......................... 55 4.2.2 Stage 2: Filter............................ 55 4.2.3 Stage 3: Analytics Engine ..................... 56 4.2.3.1 Behavioral analysis.................... 56 4.2.3.2 Feature Representation.................. 59 4.2.4 Stage 4: Detector.......................... 60 4.2.4.1 Decision-Making Module................ 61 4.2.4.2 Detection Module .................... 61 4.3 Summary .................................. 62 5 Implementation 63 vii 5.1 Overview .................................. 63 5.2 Dataset Description............................. 63 5.3 System Configuration............................ 66 5.3.1 Network Architecture........................ 67 5.3.2 Learning Parameter......................... 68 5.3.3 Evaluation methodology and metrics................ 69 5.4 Static Module Implementation ....................... 73 5.5 Dynamic Module Implementation ..................... 74 5.6 Summary .................................. 80 6 Result and Discussion 81 6.1 Overview .................................. 81 6.2 Experiment and Results........................... 82 6.2.1 Data evaluation based on proposed taxonomy........... 85 6.2.2 The effectiveness of static vs. dynamic modules.......... 98 6.2.3 Analysis of feature engineering................... 102 6.2.4 Threshold analysis of decision making module . 106 6.2.5 Evaluation of detection accuracy of the proposed framework . 108 6.2.6 Comparative analysis of state-of-the-art Copperdroid . 111 6.3 Case Study and Results........................... 114 6.3.1 The case of Android.Spy.277 . 116 6.3.2 The Dendroid Android botnet case . 118 6.4 Evaluation and Results ........................... 122