Identifying Polymorphic Malware Variants Using Biosequence Analysis Techniques
By Vijay Naidu

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at Auckland University of Technology, Auckland, New Zealand

June 2018

© Copyright by Vijay Naidu, 2018

Abstract

Modern antivirus systems (AVSs) cannot detect new polymorphic malware variants until after those variants have emerged, even when the signatures of one or more variants belonging to the same polymorphic malware family are already known. Polymorphic malware can transform itself into functionally identical variants. To avoid signature-based detection, polymorphism typically changes the order of the viral code rather than the code itself. Current AVSs detect malware using signatures derived from the most essential parts of a known virus, such as execution traces and instruction sequences. Virus writers exploit this weakness of signature databases by generating new variants with the same engine used by an existing polymorphic malware family.

This thesis presents virus detection and signature extraction techniques developed from string matching techniques traditionally employed in biosequence analysis. The main contribution of these techniques is the extraction of syntactic patterns (i.e., conserved regions or sequences) from semantically rich polymorphic hex code. The extracted syntactic patterns act as signatures and are used to identify polymorphic malware variants belonging to the same family. Moreover, they can help identify newly generated variants that differ from known ones only by simple alterations. The string matching approaches presented in this thesis may revolutionise our understanding of polymorphic variant generation and give rise to a new era of string-based syntactic AVSs.
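To make the idea of extracting a conserved region by local alignment concrete, the following is a minimal, hypothetical Python sketch of textbook Smith-Waterman local alignment (the algorithm named in Section 1.5.3) applied directly to two hex strings. It is not the thesis's implementation. The scoring values (MATCH, MISMATCH, GAP), the function name smith_waterman, and the choice to align raw hex characters are all assumptions. The thesis's hex-to-DNA conversion step is not described in this section, so it is deliberately left out here.

```python
"""Minimal Smith-Waterman local alignment over hex-dump strings.

Hypothetical sketch: the scoring values and function name are assumptions,
not the thesis's actual parameters or implementation.
"""

MATCH = 2      # assumed reward for identical hex symbols
MISMATCH = -1  # assumed penalty for differing hex symbols
GAP = -1       # assumed linear gap penalty


def smith_waterman(a: str, b: str) -> tuple[int, str, str]:
    """Return (best local score, aligned region of a, aligned region of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j] = best score of any local alignment ending at a[i-1] and b[j-1]
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            up = h[i - 1][j] + GAP    # a[i-1] aligned against a gap in b
            left = h[i][j - 1] + GAP  # b[j-1] aligned against a gap in a
            # Clamping at 0 lets a fresh local alignment start at any cell.
            h[i][j] = max(0, diag, up, left)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        score = h[i][j]
        sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if score == h[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score == h[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two made-up hex strings sharing the run "DEADBEEF".
    print(smith_waterman("1234DEADBEEF99", "77DEADBEEF0"))
```

Under these assumed weights, `smith_waterman("1234DEADBEEF99", "77DEADBEEF0")` returns `(16, "DEADBEEF", "DEADBEEF")`: the shared run is recovered regardless of the differing bytes around it. Because the alignment is local, unrelated flanking code does not lower the score of the conserved region. This is the property that makes such aligned regions plausible candidates for family-level signatures.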
Table of Contents

Abstract
Table of Contents
List of Figures
List of Tables
Attestation of Authorship
Acknowledgements
Chapter 1 Introduction
  1.1 Motivation
  1.2 Background and Related Work
  1.3 Syntactic and Semantic Approaches
  1.4 Problem Statements, Research Objectives and Questions
    1.4.1 Problem Statements
    1.4.2 Research Objectives
    1.4.3 Research Questions
  1.5 Hypothesis and Proposed Approach
    1.5.1 Drawbacks of Previous Approaches
    1.5.2 Hypothesis
    1.5.3 Smith-Waterman Algorithm (SWA)
    1.5.4 NNge
    1.5.5 Limitations of Proposed Approach and Possible Solutions
  1.6 Thesis Description
    1.6.1 Thesis Contribution
    1.6.2 Thesis Structure
    1.6.3 Publications
Chapter 2 Malware, Polymorphic Malware, and their Detection Approaches
  2.1 Classification of Malware and Recent Research into Malware Detection
    2.1.1 Virus
    2.1.2 Previous Research into Malware Detection
    2.1.3 Classification of Viruses by Masking Strategies
    2.1.4 Polymorphism
    2.1.5 Classification of Polymorphism
    2.1.6 Levels of Polymorphism
    2.1.7 Mutation Engine
    2.1.8 Polymorphic Decryptor (The decryption routine)
    2.1.9 Metamorphism
  2.2 Malware Detection Techniques
    2.2.1 Machine Learning/Data Mining Approach
    2.2.2 Normalisation Approach
    2.2.3 Scan Engine (Signature based Approach)
    2.2.4 Cryptanalysis
    2.2.5 Heuristic Approach
  2.3 History of Malware – Timeline
  2.4 Tool Validation
    2.4.1 Predictive Validation
    2.4.2 Triangulation Approach
  2.5 Summary
Chapter 3 Research Design
  3.1 Research Design
  3.2 Identifying and analysing the problem
  3.3 Defining research objectives and questions
  3.4 Designing the proposed approach and conducting experiments
  3.5 Discussion of Results and Evidence
  3.6 Analysis and Evaluation
  3.7 Overview of thesis
  3.8 Summary
Chapter 4 A String-Based Method for Syntactically Identifying Polymorphic Virus Variants
  4.1 Introduction
  4.2 String-Based Syntactic Detection of Polymorphic Malware Variants Method: An Overview
  4.3 String-Based Syntactic Detection of Polymorphic Malware Variants Method: Systems and Methods
    4.3.1 Hex Dump Extraction
    4.3.2 Hex to DNA Code Conversion
    4.3.3 Process of Pairwise Local Sequence Alignment
    4.3.4 Meta-Signature Virus Testing
  4.4 Experimental Results
  4.5 Summary
Chapter 5 Exploring Advanced Sequence Alignment Techniques in