Classifying Advanced Malware Into Families Based on Instruction Link Analysis
Alsa Tabatabaei
School of Computing, Science, and Engineering
University of Salford, Manchester, UK
Submitted in Partial Fulfilment of the Requirements of the Degree of Master of Philosophy, 2018

Table of Contents

Acknowledgement
Declaration
Abstract

Chapter One
1 Overview
1.1 Background
1.2 Research Problem
1.3 Research Hypothesis
1.4 Research Questions
1.5 Research Motivation
1.6 Research Challenges
1.7 Justification, Aims, and Objectives
1.8 Research Aims
1.9 Research Objectives
1.10 Significance of the Study
1.11 Research Scope and Limitations
1.12 Research Methodology and Research Methods
  1.12.1 Research Methodology
  1.12.2 Research Methods
1.13 Research Overview and Structure

Chapter Two: Related Literature Review
2 Overview
2.1 Characterisation of Malware
2.2 Understanding Advanced Malware
  2.2.1 Understanding Advanced Persistent Threats (APTs)
2.3 An Overview of Static and Dynamic Analysis
2.4 Machine Learning (ML)
  2.4.1 Supervised Machine Learning
  2.4.2 Unsupervised Machine Learning
2.5 Data Mining
  2.5.1 Association Rules in Data Mining
  2.5.2 Mining Opcode Relevance
2.6 Analysis to Detect Malware
2.7 Techniques for Malware Detection
  2.7.1 Classification of Malware
  2.7.2 Clustering of Malware
2.8 Dealing with Advanced Persistent Threats (APTs)
  2.8.1 Common Techniques to Detect Advanced Persistent Threats (APTs)
2.9 Summary and Remarks

Chapter Three: Methodology
3 Overview
3.1 Fundamental Techniques and the Proposed Models
3.2 Expectation Maximization (EM) Clustering
3.3 K-means and K-medoids Clustering
3.4 Hierarchical Clustering
3.5 Why EM?
3.6 Obtaining and Dealing with Data
  3.6.1 Data Collection
  3.6.2 Data Preparation
  3.6.3 Feature Extraction
  3.6.4 Data Cleaning
  3.6.5 Feature Construction
  3.6.6 Feature Selection

Chapter Four: Design and Implementation of Research Case Studies
4 Overview
4.1 Opcode Mining
4.2 Kaggle Case Study
  4.2.1 Dataset Characteristics and Pre-processing
  4.2.2 Experimental Setup and Reports
4.3 APTs Case Study
  4.3.1 Dataset Characteristics and Pre-processing
  4.3.2 Experimental Setup and Reports
4.4 Summary