Improving the Effectiveness of Behaviour-based Malware Detection

Mohd Fadzli Marhusin

BSc. Information Studies (Hons) (Information Systems Management), UiTM, Malaysia
Master of Information Technology (Computer Science), UKM, Malaysia

A thesis submitted in partial fulfilment of the requirements

for the degree of Doctor of Philosophy at the

School of Engineering and Information Technology

University of New South Wales

Australian Defence Force Academy

© Copyright 2012 by Mohd Fadzli Marhusin

THE UNIVERSITY OF NEW SOUTH WALES Thesis/Dissertation Sheet

Surname or Family name: MARHUSIN

First name: MOHD FADZLI Other name/s:

Abbreviation for degree as given in the University calendar: PhD (Computer Science)

School: School of Engineering and Information Technology (SEIT) Faculty:

Title: Improving the Effectiveness of Behaviour-based Malware Detection

Abstract (350 words maximum):

Malware is software code which has malicious intent but can only do harm if it is allowed to execute and propagate. Detection based on signatures alone is not the answer, because new malware with new signatures cannot be detected. Thus, behaviour-based detection is needed to detect novel malware attacks. Moreover, malware detection is challenging because most modern malware employs protection and evasion techniques. In this study, we present a malware detection system that addresses both propagation and execution. Detection is based on monitoring session traffic for propagation, and API call sequences for execution. For malware detection during propagation, we investigate the effectiveness of signature-based detection, anomaly-based detection and the combination of both. The decision-making relies upon a collection of recent signatures of session-based traffic data collected at the endpoint level. Patterns in terms of port distributions and the frequency or session rates of the signatures are observed; if an abnormality is found, it often signifies worm behaviour. A knowledge base consisting of recent traffic data, which is used to predict future traffic patterns, helps to reverse the incorrect flagging of suspected worms. For detection based on execution, we analyse sequences of API calls grouped into n-grams which are compared with benign and malware profiles. A decision is made based on a statistical measure which indicates how close the behaviour represented in the n-grams is to each of the profiles. The main contributions of this thesis are: the proposal and evaluation of a framework for detecting malware that considers both propagation and execution in a systematic way; detection methods that are based on information which is simpler to process than in other proposals in the literature, yet still achieve very high detection accuracy; and the ability to recognise malware correctly early in its execution. The experimental results show that our framework is promising in terms of effective behaviour-based detection that can detect malware and protect our computer networks from future zero-day attacks.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.

I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).

Signature                          Witness                          Date

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.



COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.’

Signed ……………………………………………......

Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………......

Abstract

Malware is software code that has malicious intent. In recent years, there have been huge changes in the threat landscape. As our dependence on the Internet for social information sharing and work increases, so does the number of possible threats to which we are susceptible. Attacks may come from individuals or organisations, or from nation-states resorting to cyber warfare to infiltrate and sabotage an enemy's operations. Hence, the need for a secure and dependable cyber defence is relevant at all levels.

Malware can only do harm if it is allowed to propagate and execute without being detected. Detection based on signatures alone is not the answer, because new malware with new signatures cannot be detected. Thus, behaviour-based detection is needed to detect novel malware attacks. Moreover, malware detection is challenging because most modern malware employs protection and evasion techniques. In this study, we present a malware detection system that addresses both propagation and execution. Detection is based on monitoring session traffic for propagation, and API call sequences for execution. Our approach is inspired by two theories of the human immune system: the Self/Non-self Theory and the Danger Theory.

For malware detection during propagation, we investigate the effectiveness of signature-based detection, anomaly-based detection and the combination of both. The decision-making relies upon a collection of recent signatures of session-based traffic data collected at the endpoint (single computer) level. Patterns in terms of port distributions and the frequency or session rates of the signatures are observed. If an abnormality is found, it often signifies worm behaviour. A knowledge base consisting of recent traffic data, which is used to predict future traffic patterns, helps to reverse the incorrect flagging of suspected worms. It maintains only recent data because the usage pattern of a computer changes over time.


Our proposed system includes several detectors, the operations of which are governed by several parameters. We study how these parameters affect the results, and the performance achieved when different detectors are or are not included. We find that the detectors produce inconsistent results when used independently but, when used together, achieve promising detection rates. In addition, we identify which worms are consistently detected by the system, and the characteristics of those the system cannot detect well.

For detection based on execution, we analyse sequences of API calls grouped into n-grams which are compared with benign and malware profiles. A decision is made based on a statistical measure, which indicates how close the behaviour represented in the n-grams is to each of the profiles. Experiments show that the system is capable of correctly detecting malware early in its execution.

The main contributions of this thesis are: the proposal and evaluation of a framework for detecting malware that considers both propagation and execution in a systematic way; detection methods that are based on information which is simpler to process than in other proposals in the literature, yet still achieve very high detection accuracy; and the ability to recognise malware correctly early in its execution.

The experimental results show that our framework is promising in terms of effective behaviour-based detection that can detect malware and protect our computer networks from future zero-day attacks.


Keywords

Malware Detection System

Intrusion Detection System

Self-propagating Worm

Session-based Detection

API Calls based Detection


Acknowledgement

The highest gratitude to God, who gives me strength and patience. This thesis is the result of lengthy work that would not have been possible without the continuous support, guidance and assistance of many people. Thanks to the Ministry of Higher Education, Malaysia, and the Islamic Science University of Malaysia for their sponsorship, and to UNSW@ADFA for providing me with the great opportunity to undertake my PhD on this fascinating research topic.

I am thankful to the people who have guided me from the beginning: Dr David Cornforth and Dr Henry Larkin, whose constant support has reduced the complexity of my PhD journey; and Dr Chris Lokan, who was on the supervisory board and later became my principal supervisor. I appreciate their spending their precious time as critics of my thesis and papers and using their expertise to help me in many respects.

I wish to express my deep gratitude to the following special people in my life.

My wonderful wife, Dr Rossilawati Sulaiman, who has provided me with constant support when I have been in pain and hardship. Her trust and faith in my ability will always be remembered.

My joyful boy, Amin Haris who became a source of replenishment for my drained mind. His character, attitude and learning capabilities have fascinated me since he was born during my second year of research.

My parents, Marhusin and Hamsah, and my eight siblings, for their moral support and frequent phone calls regarding the progress of my doctorate. My friends, with whom I spent time taking breaks and having fascinating discussions about research and life-related matters.

Also, I appreciate the comments I have received from Ms Denise Russell on my thesis, and from a few anonymous reviewers of the papers I submitted to several conferences, which helped me greatly.

I hereby acknowledge that any opinions, results and discussions presented in this thesis are based solely on the work I have undertaken in preparation for its submission.


Dedication

To My Family

Originality Statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis.

I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.

Signed

Date


List of Publications

Conference Articles

1. Marhusin, M. F., D. Cornforth, and H. Larkin (2008). Malicious Code Detection Architecture Inspired by Human Immune System. Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD'08): pp. 312-317.

2. Marhusin, M. F., D. Cornforth, and H. Larkin (2008). An overview of recent advances in intrusion detection. 8th IEEE International Conference on Computer and Information Technology (CIT'08): pp. 432-437.

3. Marhusin, M. F., H. Larkin, C. Lokan, and D. Cornforth (2008). An Evaluation of API Calls Hooking Performance. IEEE International Conference on Computational Intelligence and Security (CIS'08): pp. 315-319.

4. Marhusin, M. F., C. Lokan, D. Cornforth, and H. Larkin (2009). A Data Mining Approach for Detection of Self-Propagating Worms. Third International Conference on Network and System Security (NSS'09): pp. 24-29.


Contents

Abstract ...... ii
Keywords ...... iv
Acknowledgement ...... v
Dedication ...... vii
Originality Statement ...... viii
List of Publications ...... ix
Contents ...... x
List of Figures ...... xiv
List of Tables ...... xv
List of Acronyms ...... xvii
Chapter 1 Introduction ...... 1
1.1 Overview of Malware ...... 1
1.2 Roles of Malware Detection Systems ...... 2
1.2.1 Intrusion Detection Systems ...... 2
1.2.2 Antivirus Software ...... 3
1.3 Aim of Thesis ...... 4
1.3.1 Problem Formulation ...... 4
1.3.2 Approaches to Problem ...... 5
1.4 Research Question ...... 5
1.5 Contributions to Scientific Knowledge ...... 5
1.6 Organisation of Thesis ...... 6
Chapter 2 Background of the Study ...... 9
2.1 Chapter Objectives ...... 9
2.2 Types of Malware ...... 9
2.3 Intrusion Detection Systems ...... 11
2.3.1 IDS Detection Methods and Techniques ...... 11
2.3.2 IDS Architecture and Detection Sources ...... 12
2.3.3 Issues and Challenges of IDS ...... 14
2.4 Malware Detection Systems ...... 19
2.4.1 Malware Detection Techniques ...... 19
2.4.2 Issues and Challenges of Malware Detection Systems ...... 25
2.5 Related Work ...... 35
2.5.1 Approaches to Propagation-based Detection ...... 36
2.5.2 Approaches to Execution-based Detection ...... 39
2.5.3 Concluding Remarks on Detection Approaches ...... 47
2.6 Influence of Human Immune System (HIS) ...... 48
2.7 Conclusions ...... 50
Chapter 3 Malicious Code Detection Architecture Inspired by Human Immune System ...... 51
3.1 Chapter Objectives ...... 51
3.2 System Architecture ...... 51
3.3 Two-phase Malware Detection ...... 54
3.3.1 Adolescent Phase ...... 54
3.3.2 Mature Phase ...... 56
3.4 Threat Model ...... 57
3.4.1 What the System is Designed to Detect ...... 57
3.4.2 What the System May not be Able to Detect ...... 58
3.4.3 Possible Threats and Responses ...... 59
3.5 Remarks ...... 60
3.6 Challenges and Conclusions ...... 60
Chapter 4 Towards Effective Detection of Malware at Propagation ...... 63
4.1 Chapter Objectives ...... 63
4.2 Related Work ...... 64
4.3 The Architecture of the Detection System ...... 65
4.3.1 Detection Buffer ...... 68
4.3.2 Port-based Detector ...... 68
4.3.3 Frequency-based Detector ...... 69
4.3.4 Session Rate-based Detector ...... 70
4.3.5 Signature-based Detector ...... 71
4.3.6 Detection based on Activation ...... 74
4.3.7 Integration of Detectors ...... 75
4.3.8 How Immune System inspired Our Work ...... 76
4.3.9 Performance Measures ...... 77
4.3.10 Summary ...... 77
4.4 Experimental Setup ...... 77
4.4.1 Dataset ...... 78
4.4.2 Results and Discussion ...... 81
4.4.3 Port Alone ...... 83
4.4.4 R Alone ...... 87
4.4.5 Integrating Detectors PTRTT ...... 90
4.4.6 Integrating Detectors PFRFT ...... 93
4.4.7 Execution Performances ...... 95
4.5 Comparisons with Other Results ...... 96
4.6 Issues ...... 103
4.7 Conclusions ...... 103
Chapter 5 Evaluation of Performance of API Calls Hooking ...... 105
5.1 Chapter Objectives ...... 105
5.2 Introduction ...... 105
5.3 Problem Statement ...... 106
5.4 Experimental Planning and Operation ...... 107
5.5 Results and Discussion ...... 109
5.6 Conclusion ...... 113
Chapter 6 Towards Effective Detection of Malware at Execution ...... 114
6.1 Chapter Objectives ...... 114
6.2 Detection System ...... 115
6.2.1 Detection Architecture ...... 116
6.2.2 Benign and Malware Profiles ...... 117
6.2.3 Detection Process ...... 118
6.3 Experiment I ...... 120
6.3.1 Features Selection and Data Reduction ...... 124
6.3.2 n-grams ...... 125
6.3.3 k-fold Cross-validation ...... 128
6.3.4 Plan and Objectives ...... 129
6.3.5 Performance Measures ...... 130
6.3.6 Hardware ...... 130
6.3.7 Results and Discussion ...... 130
6.4 Experiment II ...... 139
6.4.1 Performance Measures ...... 141
6.4.2 Hardware ...... 141
6.4.3 Results and Discussion ...... 142
6.5 Comparison of Results ...... 147
6.6 Issues and Challenges ...... 148
6.7 Conclusion ...... 150
Chapter 7 Conclusions ...... 151
7.1 Summary of Research Done ...... 151
7.2 Limitations ...... 152
7.3 Future Research Directions ...... 154
7.4 Closing Remarks ...... 155
References ...... 157


List of Figures

Figure 2.1: Common mechanics of malware threats via web site ...... 30
Figure 3.1: Detection architecture of the system ...... 52
Figure 3.2: Processes in Adolescent Phase ...... 55
Figure 3.3: Processes in Mature Phase ...... 56
Figure 4.1: System architecture ...... 66
Figure 4.2: Sample worm tree structure ...... 72
Figure 4.3: Pseudo-code for integration of detectors ...... 75
Figure 6.1: Detection architecture ...... 116
Figure 6.2: General structure of API calls-trapping on single executable ...... 117
Figure 6.3: NK values and performances ...... 131
Figure 6.4: S values and performances ...... 132
Figure 6.5: NK values in absence of S and performances ...... 132
Figure 6.6: Detections based on 1st to 1000th n-grams with NK and S ...... 135
Figure 6.7: Detections based on 1st to 1000th n-grams without S ...... 136
Figure 6.8: Detection graph showing discrimination lines between malware and benign at 113th n-gram ...... 138
Figure 6.9: Detections based on 1st to 300th n-grams with NK and S ...... 143
Figure 6.10: Detection graph showing discrimination lines between malware and benign at 118th n-gram ...... 145
Figure 6.11: Comparison of detection accuracies of our and other algorithms ...... 148


List of Tables

Table 4.1: List of notations used ...... 67
Table 4.2: Profiles of benign signatures as per endpoints ...... 80
Table 4.3: Worms and total signatures ...... 81
Table 4.4: Performances of PT/worms ...... 84
Table 4.5: Performances of PT/endpoints ...... 85
Table 4.6: Detection performances of PF/worms ...... 86
Table 4.7: Detection performances of PF/endpoints ...... 86
Table 4.8: Detection performances of RT/worms ...... 87
Table 4.9: Detection performances of RT/endpoints ...... 88
Table 4.10: Detection performances of RF/worms with static t ...... 89
Table 4.11: Detection performances of PTRTT/worms ...... 92
Table 4.12: Detection performances of PTRTT/endpoints ...... 92
Table 4.13: Detection performances of PFRFT/worms ...... 93
Table 4.14: Detection performances of PFRFT/endpoints ...... 94
Table 4.15: Mean TPs & FPs of RVNS, DCA and RT ...... 96
Table 4.16: Accuracy comparison of all detection algorithms (%) ...... 99
Table 4.17: Without MyDoom-A and Rbot-AQJ ...... 102
Table 4.18: Without MyDoom-A, Rbot-AQJ and Zotob.G ...... 102
Table 5.1: Execution times for first run of programs ...... 110
Table 5.2: Average execution times over 30 runs (in seconds) ...... 112
Table 6.1: Notations used for algorithm ...... 118
Table 6.2: API classes in MSDN Library (Microsoft Corporation 2010c) ...... 121
Table 6.3: API classes (Microsoft Corporation 2010c) evaluated in study ...... 122
Table 6.4: API calls used by benign programs and malware executables in dataset ...... 122
Table 6.5: Shared or exclusive API call sequences in benign and malware ...... 123
Table 6.6: Repetitive/non-repetitive API call sequences used by benign and malware ...... 124
Table 6.7: Insight into reduction process for API classes ...... 125
Table 6.8: Unique n-grams of benign vs malware ...... 127
Table 6.9: Percentage of n-gram reduction from the removal of the Memory Management class APIs ...... 128
Table 6.10: Performances for full execution with NK and S ...... 134
Table 6.11: Performances for full execution without S ...... 134
Table 6.12: 113th n-gram as percentage of total execution ...... 137
Table 6.13: Times taken to perform detection ...... 138
Table 6.14: Additional malware included in dataset ...... 139
Table 6.15: Malware dataset distribution into folds ...... 141
Table 6.16: Performance for full execution with NK and S ...... 142
Table 6.17: List of undetected malware ...... 143
Table 6.18: Performances at 29th n-gram block ...... 144
Table 6.19: Performances at 118th n-gram block ...... 144
Table 6.20: Details of Forbot-FU detection ...... 146
Table 6.21: Times taken to perform detections ...... 147


List of Acronyms

ADS Alternate Data Stream
API Application Programming Interface
ANFIS Adaptive Neuro Fuzzy Inference System
BIOS Basic Input Output System
CC Cloud Computing
CF Computer Forensics
CLR Common Language Runtime
COFF Common Object File Format
CPU Central Processing Unit
DARPA Defense Advanced Research Projects Agency
DCA Dendritic Cell Algorithm
DCs Dendritic Cells
DDoS Distributed Denial of Service
DIDS Distributed Intrusion Detection System
DLL Dynamic Link Library
DoS Denial of Service
xattr Extended Attributes
FBI Federal Bureau of Investigation
FIPA-OS Foundation for Intelligent Physical Agent-Open Source
FP False Positive
GAs Genetic Algorithms
GUI Graphical User Interface
HIDS Host-based Intrusion Detection System
HIS Human Immune System
HMMs Hidden Markov Models
IBk Instance-based Learner
ICMP Internet Control Message Protocol
ID Intrusion Detection
IDS Intrusion Detection System
IL Intermediate Language
IO Input Output
IP Internet Protocol
IPS Intrusion Prevention System
KDD Knowledge Discovery and Data Mining
KL Kullback-Leibler
LCS Learning Classifier System
Malware Malicious Software
MAS Multi-agent Systems
MD Misuse Detection
ME Maximum Entropy
MHC Major Histocompatibility Complex
MSDN Microsoft Developer Network
NB Naïve Bayes
NE New Executable
NIDS Network-based Intrusion Detection System
NK Natural Killer
NNs Neural Networks
NTFS New Technology File System
OS Operating System
PAMPs Pathogen Associated Molecular Patterns
PC Personal Computer
PE Portable Executable
RAM Random Access Memory
RE Reverse Engineering
RCE Reverse Code Engineering
RL Rate Limiting
RVNS Real Valued Negative Selection
SVM Support Vector Machine
TCP Transmission Control Protocol
TRW Threshold Random Walk
TP True Positive


Chapter 1

Introduction

In the early stages of the computer age, malware was created with limited objectives. More recently, however, malware has also been driven by motives of espionage and of gaining profit and information. This thesis is concerned with detecting malware before it can cause significant damage.

1.1 Overview of Malware

The term ‘malicious code’, or simply ‘malware’, refers to threats posed by code whose execution damages a system or renders its security useless (Szor 2005). Malware can be categorised into many different types, including virus, worm, logic bomb, trojan horse, germ, exploit, downloader, dialer, dropper, injector, auto rooter, virus generator, spammer program, flooder, key logger, rootkit and spyware (Ford 2005; Szor 2005). A full list is presented in Chapter 2.

Malware creators do not necessarily restrict their code to one type and usually have one or more objectives. Hence, especially at present, a piece of malware might be created with features from more than one of these types.

In today’s Internet-connected networks, many services are provided over the Internet with a global reach that increases the risk of threats from unpredictable sources. According to the Australian Computer Emergency Response Team, 8,240 websites were compromised and identified as serving malware to their visitors in 2011. In China, the National Computer Network Emergency Response Technical Team/Coordination Center received 15,366 incident reports, of which 36.3% were categorised as vulnerabilities and 23.4% as malware. The Indian Computer Emergency Response Team received 1,706 reports of web defacement and 2,765 reports of malware incidents during 2011, with 4,394 cases identified as compromised websites that became malware propagation sources. Bach Khoa Internetwork Security Center in Vietnam identified that 64.2 million computers were infected by malware in 2011, with 38,961 classified as brand-new families. Based on their honeypot records, W32..PE infected approximately 4.2 million computers worldwide. The number of reported incidents has generally shown an increasing trend in many countries over the past decade (US-CERT 2004; CERT 2008; APCERT 2011).

1.2 Roles of Malware Detection Systems

Intrusion in a computer system means illegal or unauthorised access to computer resources. The person who does this typically uses a set of software tools and malicious code. As these software tools are used with malicious intent, we can also classify them as malicious software.

Objectives of defence against malware may involve attempting to: 1) detect the malware itself; 2) comprehend its risks and damage; and/or 3) detect the beneficiaries or owners of the malware attacks. Most research in the literature on this topic, including the work in this thesis, has the first objective.

In the literature, there are many approaches to combating malware and intrusion, as discussed in the following sections.

1.2.1 Intrusion Detection Systems

An Intrusion Detection System (IDS) is a program used to detect an intrusion when it happens and to prevent a system from being compromised. Its purpose is to detect or prevent electronic threats to computer systems. A great deal of research on IDSs has been published over several decades. Nevertheless, IDSs may still be defeated by malware that works in new ways.


1.2.1.1 Types of IDS

In the field of computer security, there are several types of intrusion/malware detection systems (Sundaram 1996; Sekar, Gupta, Frullo, Shanbhag, Tiwari, Yang and Zhou 2002; Ning and Jajodia 2003). Each type uses a different detection mechanism, with its own pros and cons. Anomaly-based detection (Ning and Jajodia 2003; Lakhina, Crovella and Diot 2005; Patcha and Park 2007) depends on the pattern of computer usage: it flags any computer activity that differs from the accepted profile. Although it has the ability to detect novel attacks, it tends to generate high rates of false alarms. It may also fail to recognise a completely new legitimate activity.

A signature-based IDS (Sundaram 1996; Uppuluri and Sekar 2001; Ning and Jajodia 2003) relies on identifying known signatures. However, this technique is susceptible to slight variations in an attack signature, and it cannot detect an unknown attack. Snort (Sourcefire 2011) is an open-source signature-based IDS which many researchers have used as a benchmark against which to assess their IDSs.

Specification-based detection can rely on the frequency of input data based on system calls, traffic or other sources. This technique requires the definition of valid program behaviours. Its strength can be its low level of false positives. However, it requires careful design to avoid missing some types of attack. Gill et al. (Gill, Smith and Clark 2006) suggested that it is, in fact, a suitable technique within anomaly-based detection.

The combination of two detection types is commonly referred to as hybrid detection, which attempts to overcome the weaknesses of individual categories and forms a strong detection mechanism.

1.2.2 Antivirus Software

In the intrusion sense, besides the detection paradigm, there has been another broad area of research traditionally known as antivirus. Most antivirus software employs signature-based and specification-based detection mechanisms. There is also antivirus software that employs a behaviour-based detection component (AVG Technologies 2012; Symantec Corporation 2012). Because the term ‘virus’ has long been used colloquially to refer to all types of malware, a usage popularised in the early computer age, antivirus and anti-malware software can be viewed as synonymous.

Currently, computer users can use free or commercial antivirus packages, many of which have the ability to monitor dynamic system behaviour. They also spend a lot of time scanning files for known malware and examining whether a given program should or should not be allowed access to some areas of the computer, e.g., the registry and operating system (OS) directory.

Nowadays, malware writers usually combine more than one malware type so that the life and effectiveness of the malware meet their wider objectives. Malware could be designed to remain in a machine for a while without communicating with other machines. With the help of malware that is already inside a victim's host, other malware located at remote locations can subsequently be downloaded onto the host. When a full-scale intrusion is initiated, it may already be too late to avoid the damage caused by its malicious activities. These threats demand faster detection of unknown attacks and the ability to immunise computers affected by the first wave of attacks. A new generation of attacks could cause severe damage to the entire global network, leaving behind major challenges for finding better solutions.

1.3 Aim of Thesis

1.3.1 Problem Formulation

This research aims to improve the effectiveness of malware detection. Malicious and hacking activities, which intend to infiltrate a computer system and steal information, begin with reconnaissance attempts. If a detection system looks for evidence only at the network level (propagation), it will miss some useful information at the system call level (execution).

Therefore, a detection system must look for traces at both these levels, analyses of which would yield a strong defence mechanism on a computer system. We present a malware detection system which focuses on the two stages of malware: 1) propagation;

and 2) execution.

1.3.2 Approaches to Problem

Our approach to malware detection is inspired by human immune system (HIS) theories. We explore and adapt the Self/Non-self Theory and the Danger Theory, and propose algorithms based on the execution paradigm by means of API call sequences. For detection during propagation, our algorithms are based on session-based traffic.

1.4 Research Question

This thesis addresses the following major question.

Can we rely upon the dynamics of session-based traffic and the API call sequences of executables to detect malicious code as it propagates or executes?

We address this question by investigating the following sub-questions:

• Do anomaly-based and signature-based detectors perform better when working in tandem?

• Can we rely upon the concept of data recency to recognise the difference between self and non-self?

• Can we identify the different characteristics of malware that we can or cannot detect?

• What set of API calls can be effectively used to detect malware during execution?

• What is the best size of the n-gram of an API call sequence for the detection system?

• Can we detect malware early in its execution? If so, in what range of n-grams could detection be best?

1.5 Contributions to Scientific Knowledge

This thesis contributes to scientific knowledge in the following ways.


A framework for the detection of malware during execution and propagation We adopt the idea of detection at two stages: propagation and execution. We investigate how these two layers of defence can detect malware and how their processes could work in a real environment. We observe that the system can detect malware as it propagates and/or executes and that some worms weakly detected during propagation are well detected during execution.

A session-based detection of self-propagating worms We analyse the performances of anomaly-based and signature-based detectors working both in tandem and independently on session-based data. Here, we introduce simple algorithms for each detector. In addition, we highlight the detectors which do not need an initial learning dataset and, therefore, could be self-learning and self-adjusting, detecting as they go based on knowledge learnt from recent traffic.

Performance evaluation on API call hooking We evaluate the overhead incurred in the execution performance of common software when a set of OS API functions are “hooked” (monitored at execution time). This provides some insight into how many API functions we could hook without significantly delaying the execution of the software.
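
The sketch below is a minimal Python illustration of the overhead idea only: every call to a "hooked" function passes through a wrapper that records the call before forwarding it, and the same workload is timed with and without the wrapper. It is not the Windows API hooking instrumentation evaluated in Chapter 5; the workload and function names are hypothetical.

```python
import time

def hook(func, log):
    """Wrap a function so that every call is recorded, mimicking an API hook."""
    def wrapper(*args, **kwargs):
        log.append(func.__name__)      # the interception cost paid on every call
        return func(*args, **kwargs)
    return wrapper

def fake_write_file(i):
    """Stands in for a real OS API function such as WriteFile."""
    return i

def measure(func, calls=100_000):
    start = time.perf_counter()
    for i in range(calls):
        func(i)
    return time.perf_counter() - start

log = []
baseline = measure(fake_write_file)
hooked = measure(hook(fake_write_file, log))
print(f"baseline: {baseline:.4f}s  hooked: {hooked:.4f}s  "
      f"overhead: {100 * (hooked - baseline) / baseline:.1f}%")
```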

A pre-emptive detection of malware executables using sequences of API calls We develop an algorithm for the detection of malware using sequences of API calls and observe that we can detect malware early in its execution.
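
As a rough illustration of this style of detection (not the thesis's actual algorithm or its statistical measure), the Python sketch below extracts n-grams from an API call trace and scores the trace by how many of its n-grams appear in benign and malware profiles; the API names and traces are invented.

```python
def ngrams(calls, n=3):
    """Return the n-grams (as tuples) of an API call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def build_profile(traces, n=3):
    """A profile here is simply the set of n-grams seen in a collection of traces."""
    profile = set()
    for trace in traces:
        profile.update(ngrams(trace, n))
    return profile

def score(trace, benign, malware, n=3):
    """Fraction of the trace's n-grams found in each profile; a higher malware score suggests flagging."""
    grams = ngrams(trace, n)
    if not grams:
        return 0.0, 0.0
    b = sum(g in benign for g in grams) / len(grams)
    m = sum(g in malware for g in grams) / len(grams)
    return b, m

# Illustrative traces; real traces would come from hooked Windows API calls.
benign_profile = build_profile([["CreateFile", "ReadFile", "CloseHandle"] * 5])
malware_profile = build_profile([["CreateFile", "WriteFile", "CreateProcess", "RegSetValue"] * 5])
b, m = score(["CreateFile", "WriteFile", "CreateProcess", "RegSetValue", "CreateProcess"],
             benign_profile, malware_profile)
print("benign match:", round(b, 2), "malware match:", round(m, 2))
```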

A chronological evaluation of the detection system Using API call-based detection, we provide an evaluation of the system against malware sampled chronologically to simulate the real-world problem of malware threats. The system uses information about earlier malware to detect future ones and it demonstrates its preparedness for possible zero-day attacks.

1.6 Organisation of Thesis

In Chapter 2, we focus on the background to the problem, that is, intrusion and the involvement of malicious code. We explore solutions in the area of intrusion detection which exist in the literature. We survey some of the techniques used and architectures adopted in IDSs. We explore existing solutions to malware detection or intrusion using sequences of API calls. We also discuss the backgrounds to the two underlying theories we adapt into our detection systems: the Self/Non-self Theory and the Danger Theory.

In Chapter 3, we introduce our system architecture which includes detection systems during execution and propagation. We discuss how this architecture works theoretically in a production environment and the two phases of executables in our system’s architecture, the Adolescent and Mature Phases.

In Chapter 4, we present and evaluate our approach to malware detection during propagation. An experiment is conducted using windows of recent session traffic; both a fixed-size window and a window spanning a fixed time are investigated. They are evaluated using the session-based dataset used in (Khayam, Radha and Loguinov 2008; Shafiq, Khayam and Farooq 2008b). We present both anomaly-based and signature-based detectors, and discuss our proposed detection algorithms and some of their parameters. The results for this self-adjusting system are presented and compared against some results in the literature.

In Chapter 5, we shift our focus to malware detection during execution by means of API call sequences. We investigate the impact of API hooking on some API functions of the OS. Some commonly used programs are selected for the experiment and time differences among applications with and without API hooks are evaluated.

In Chapter 6, we discuss our approach to detection during execution by means of API call sequences and introduce the API call dataset for the study. Inspired by the human immune system (HIS), we discuss a statistical approach for the algorithm proposed in the system, as well as some important issues relating to dataset cleaning and feature selection. Our results are presented and compared with those in the literature. Using the proposed algorithm, we extend our API call-based detection experiment by organising the dataset in chronological order, with the aim of simulating the readiness of the system to respond to malware threats and possible zero-day attacks. We also introduce an additional dataset derived from several worms mentioned in Chapter 4. The results are presented and compared against some of those in the literature.

In Chapter 7, we summarise the contributions of this thesis, highlighting what we have achieved. Regarding our research questions, we assess whether they have been answered, and identify limitations of this study. Finally, we also suggest directions for future research.


Chapter 2

Background of the Study

Part of this work is based on Marhusin, M. F., D. Cornforth, and H. Larkin (2008). An overview of recent advances in intrusion detection. 8th IEEE International Conference on Computer and Information Technology (CIT'08): pp. 432-437.

2.1 Chapter Objectives

Intrusion detection systems (IDSs) and malware detection systems are among the most essential defence tools for computer network security. In this chapter, we present a survey of work in the literature in the area of IDSs and malware detection systems, including their existing types, techniques and architectures. We survey the issues and challenges in computer security related to IDSs and malware detection systems. We also review a few HIS theories which exert a dominant influence on the designs of detection architectures and their algorithms.

2.2 Types of Malware

Below is the list of malware categories described in (Szor 2005):

Virus – Generally requires an event, such as a user click, to trigger its payload, which will then be embedded in files and spread as the files are opened.
Worm – Exists in the form of a file and may not require other files to exist. Some types do not require a trigger for execution. Some special kinds are known as mailer, mass mailer, rabbit and octopus. Mailer and mass mailer exploit mailing services to send malicious mail messages. Rabbit is a worm that can move from one computer to another. Octopus is a worm that acts as a component and works with other components that exist on multiple hosts in a computer network.
Logic bomb – A set of code with malicious intent that is only executed at a certain point in time.
Trojan horse – Opens a back door to an intruder for further penetration.
Germ – The first version of a virus, designed not to attack the writer's host but, upon execution, to create a virus.
Dropper – An installer used to install malware.
Injector – Similar to a dropper but used to inject malware into memory or a remote host.
Exploit – A special code used to take advantage of a vulnerability in software. A successful attack will usually give administrative privileges to the attacker to further compromise the victim's host.
Downloader – A program used to download and install other malware from a remote source onto a victim's host.
Dialer – Commonly associated with programs that dial up a user's modem to remote hosts. It causes the user to be charged for a connection made over a telephone line.
Auto rooter – A program that has a set of exploits to gain administrative access to a remote machine.
Virus generator – A tool that can be used to generate viruses with selected features.
Spammer – Distributes fake information, usually via email messages or web pages (phishing), and aims to collect users' private data or distribute misleading information.
Flooder – Used to cause a denial of service (DoS) to services provided by a target site so that legitimate users can no longer access them. A more sophisticated and large-scale attack is known as a distributed denial of service (DDoS). Usually, it is difficult to trace the attacker, who may have launched such an attack using handlers as intermediaries to control zombies.
Key logger – Used to capture a user's keystrokes and mouse movements over a period of time.
Rootkit – Used to hide the existence of malware in a computer. It can prevent the operating system from showing evidence of the rootkit's existence to the end-user.
Spyware – Usually legitimate software that, at the same time, collects an end-user's private information. Such software is usually used for marketing purposes.

2.3 Intrusion Detection Systems

The purpose of an IDS is to detect and/or prevent electronic threats to computer systems. A great deal of research on IDSs has been published over several decades. Initially, the prime focus was on the security of servers and mainframes; later, personal computers (PCs) gained tremendous popularity. However, today's Internet-connected networks expose computer systems to another level of threat from unpredictable sources, as many services provided over the Internet have a global reach.

2.3.1 IDS Detection Methods and Techniques

Many of the techniques used in attempts to detect intrusion are reviewed in (Patcha and Park 2007). The most common are summarised below.

Neural networks (NNs) can be trained to recognise arbitrary patterns in input data and associate them with an outcome that can be a binary indication of whether an intrusion has occurred. Such models are only as accurate as the data used to train them.

A state transition table describes a sequence of actions performed by an intruder in the form of a state transition diagram. When the behaviour of a system matches those states, an intrusion is detected.
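
A minimal sketch of the state-transition idea, with an invented attack signature; a real system would derive such tables from expert-defined attack scenarios rather than hard-coding them.

```python
# Transition table for a hypothetical attack signature:
# download a tool, elevate privilege, then modify a system file.
TRANSITIONS = {
    ("start", "download_tool"): "staged",
    ("staged", "gain_root"): "elevated",
    ("elevated", "modify_system_file"): "intrusion",
}

def matches_intrusion(actions):
    state = "start"
    for action in actions:
        state = TRANSITIONS.get((state, action), state)  # unrelated actions leave the state unchanged
        if state == "intrusion":
            return True
    return False

print(matches_intrusion(["login", "download_tool", "gain_root", "modify_system_file"]))  # True
print(matches_intrusion(["login", "download_tool", "read_mail"]))                        # False
```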

An artificial immune system mimics the natural immunology observed in biology. A crucial part of its process is distinguishing self cells from non-self cells. Several models exist (Dasgupta 2006), the most well known being negative selection.

Matzinger's Danger Theory (Matzinger 1994) challenged the Self/Non-self Theory, claiming that cells can sense not only evidence of the presence of antigens but also danger signals. If a cell is damaged due to pathogenic infection or other cell stress, these signals are released, whereas in apoptosis the cell is dismantled in a controlled manner.
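
The negative-selection model mentioned above can be sketched in a few lines: candidate detectors are generated at random and discarded if they match any "self" sample, so the survivors can only ever match non-self. The bit-string encoding and the r-contiguous matching rule below are illustrative simplifications, not the specific scheme used in any of the cited works.

```python
import random

random.seed(0)
ALPHABET = "01"
L, R = 8, 4                      # string length and contiguous-match length

def matches(detector, sample, r=R):
    """r-contiguous matching: detector and sample agree on some r consecutive positions."""
    return any(detector[i:i + r] == sample[i:i + r] for i in range(L - r + 1))

def generate_detectors(self_set, count=50):
    detectors = []
    while len(detectors) < count:
        candidate = "".join(random.choice(ALPHABET) for _ in range(L))
        if not any(matches(candidate, s) for s in self_set):   # the negative-selection step
            detectors.append(candidate)
    return detectors

self_set = {"00000000", "00001111", "11110000"}                # patterns of normal behaviour
detectors = generate_detectors(self_set)

def is_nonself(sample):
    return any(matches(d, sample) for d in detectors)

print(is_nonself("00000000"))   # False: a detector matching self would have been discarded
print(is_nonself("10101010"))   # typically True; coverage depends on the random detectors
```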


Genetic algorithms (GAs) mimic nature’s system of reproduction in which only the fittest individuals in a generation are reproduced in subsequent generations, after undergoing recombination and random change. The application of GAs in IDS research, which appeared as early as 1995 (Crosbie and Spafford 1995), involves evolving a signature that indicates intrusion. A related technique is the Learning Classifier System (LCS) in which binary rules that collectively recognise patterns of intrusion are evolved.

Hidden Markov Models (HMMs) are stochastic versions of the state transition techniques discussed above, in which the states and transition probabilities are modelled as a Markov process with unknown parameters estimated from the input data through a learning phase.

Fuzzy logic is a set of concepts and approaches designed to handle vagueness and imprecision. A set of rules can be created to describe a relationship between the input and output variables which may indicate whether an intrusion has occurred. Fuzzy logic uses membership functions to evaluate degrees of truthfulness (El-Semary, Edmonds, Gonzalez and Papa 2005).

Most early IDSs implemented detection mechanisms by comparing the total output value against a static threshold (Kruegel, Mutz, Robertson and Valeur 2003). The problem with this strategy is that the threshold might not be able to adapt to changes in legitimate computer traffic. Later research indicated interest in bio-inspired and adaptive systems (Zou, Duffield, Towsley and Gong 2006; Shafi and Abbass 2007; Shafi 2008). In this thesis, we implement a dynamic threshold in a detection mechanism, called a self-adjusting system, which is described in detail in Chapter 4.
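
The difference between a static and a self-adjusting threshold can be sketched as follows. This is only an illustration of the general idea, not the detectors defined in Chapter 4; the window size, multiplier and traffic figures are invented.

```python
from collections import deque
from statistics import mean, stdev

STATIC_THRESHOLD = 50                 # sessions per interval, chosen once and never updated

def static_alert(rate):
    return rate > STATIC_THRESHOLD

class AdaptiveDetector:
    """Flag an observation that deviates from what recent traffic suggests is normal."""
    def __init__(self, window=20, k=3.0):
        self.recent = deque(maxlen=window)
        self.k = k

    def alert(self, rate):
        if len(self.recent) >= 2:
            threshold = mean(self.recent) + self.k * stdev(self.recent)
            alarm = rate > threshold
        else:
            alarm = False             # not enough history yet
        self.recent.append(rate)      # the knowledge base keeps only recent data
        return alarm

detector = AdaptiveDetector()
traffic = [5, 7, 6, 8, 5, 6, 7, 9, 6, 80]     # the last value mimics a worm scanning burst
print([detector.alert(r) for r in traffic])   # adaptive: alarms only on the final burst
print([static_alert(r) for r in traffic])     # static: depends entirely on the fixed value chosen
```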

2.3.2 IDS Architecture and Detection Sources

An IDS can be classified as a host-based or network-based intrusion detection system (HIDS or NIDS, respectively). A HIDS generally monitors the dynamic behaviour and state of a computer internally rather than the network packets. The internal sources of data it monitors could be session traffic, application-level and system-level logs, API calls, files and file system alterations, etc. Examples of work related to HIDS can be found in (Anderson 1980; Anderson, Lunt, Javitz, Tamaru and Valdes 1995; Ilgun, Kemmerer and Porras 1995; Marchette 1999; Jian, Xin and Ge 2004; Zhang, Li and Gu 2004; OSSEC 2011).

A NIDS is usually a platform-independent system that monitors the traffic of one or more computers on a network. It can be designed to detect intrusions by analysing network traffic at each individual computer or at intermediary devices, such as proxy servers and gateways. Several high-end network devices such as network taps, network switches and routers can also tap network traffic. A network tap is a device with at least three ports. All inbound and outbound traffic use the first two ports while the third simply provides a copy of all traffic passing through the first two. A network switch and router also implement a similar mechanism which is usually referred to as port mirroring. A device with a port-mirroring feature allows a third party to listen to the same traffic. Several works related to NIDSs can be found in the literature (Anderson, Lunt, Javitz, Tamaru and Valdes 1995; Marchette 1999; Sourcefire 2011).

The use of multiple IDSs (HIDSs or NIDSs) forms a larger intrusion detection architecture known as a distributed IDS (DIDS) (Porras and Neumann 1997; Jemili, Zaghdoud and Ahmed 2007). Work on DIDSs draws attention to the fact that data can be collected from several computers on a network. Each participating host may first analyse the data, and then the aggregated data can be analysed at a centralised or dedicated host. Using a DIDS approach, detection systems can gather better quality information about the state of a computer network.

Centralised analysis, which is implemented in many DIDSs, is prone to several weaknesses, as highlighted in (Kannadiga and Zulkernine 2005). Firstly, the addition of a new host increases the load on the centralised server that performs analysis, which raises a scalability issue. Secondly, communications with the centralised server can overload a network. Thirdly, some IDS clients contain platform-specific components. These problems have led many researchers to introduce a multi-agent approach to IDS architectural design. The features of multi-agent systems (MASs), such as being proactive, reactive, social, truthful, benevolent, adaptive, autonomous and rational (Bellifemine, Caire and Greenwood 2007), are the reasons for their adoption in IDSs. As a MAS supports a multi-platform environment, an agent can be added or removed with minimal impact on the system.

The work of (Balasubramaniyan, Garcia-Fernandez, Isacoff, Spafford and Zamboni 1998; Kannadiga and Zulkernine 2005) implemented the aggregation and correlation of gathered information, which can reduce the amount of information exchanged among hosts. Kannadiga and Zulkernine introduced a corrupted host list that can be used as a reference for an agent to ascertain the right hosts to visit (Kannadiga and Zulkernine 2005). This idea reduces the amount of communication traffic and avoids corruption of the defence. Chan and Wei (Chan and Wei 2002) used mobile agents that perform analysis at the least busy host and can stop an attack at the gateway using a pre-emptive attack strategy. Peddireddy and Vidal (Peddireddy and Vidal 2002) designed a multi-agent IDS based on the FIPA-OS (Foundation for Intelligent Physical Agent-Open Source) environment. Their design assumes that each agent has limited knowledge about other hosts' situations; thus, each agent needs to negotiate with every other agent to acquire some information. Work by (Ghosh and Sen 2004; Xiao, Zheng, Wang and Xue 2005) used a voting method for an agent to obtain other agents' views about a suspicious attack when the agent is not fully confident in its own decision. Simulation was performed in (Xiao, Zheng, Wang and Xue 2005) to measure the traffic load and latency effects that occur during the voting process. Mosqueira-Rey et al. (Mosqueira-Rey, Alonso-Betanzos, del Río and Piñeiro 2007) integrated the Snort rules with a detection agent, which they compared with Snort in terms of rule lookup performance (as Snort is an open-source IDS, its engine source code and detection rules are publicly available).

2.3.3 Issues and Challenges of IDS

Implementing an IDS in a high-speed network requires special consideration as there will be some cost incurred in terms of network speed. Zaidi et al. outlined three possible implementations of an IDS (Zaidi, Kenaza and Agoulmine 2010): 1) it can selectively analyse some traffic, focusing on some part of an attack while possibly ignoring others; 2) it can use buffering techniques to analyse the buffered traffic; and 3) it can capture and redirect traffic to suitable sensors, which can be dedicated computers or special hardware, in a multi-sensor environment. Nevertheless, the volume of signatures used in the detection algorithm always has an effect on the speed of a network's traffic (Kruegel, Valeur, Vigna and Kemmerer 2002).

Evasion attacks are a major threat to the effectiveness of an IDS. A mimicry attack (Wagner and Soto 2002; Kruegel, Balzarotti, Robertson and Vigna 2007) blends malicious traffic patterns to resemble benign ones so that, if the attack traffic looks like the benign traffic in terms of its pattern, it could possibly evade detection. This is challenging when an attacker has access to the detection signature or rules because mimicry attacks could then be crafted (Parampalli, Sekar and Johnson 2008).

Regardless of the effectiveness and efficiency of existing IDSs, one work (Jeong, Choi and Kim 2005) addressed the issue of the effective placement of detection systems across large-scale networks. Their algorithm shows an encouraging optimisation in reducing the number of IDS placements while maintaining a low impact of attacks.

The majority of past research has employed analysis based on data sourced from audit trails, system calls and network traffic. However, as analysing audit trails is prone to several weaknesses, such as requiring a lot of space and causing performance degradation, many organisations do not enable audit trails on their IT systems (Inspector General for Audit 2004). Also, as they indicate only past actions, if an intrusion is detected after the compromise has occurred, it may be too late because the damage may have already been done. Although evidence of that intrusion might be useful for the victim to prevent the same attack in future, it may already have been subjected to reversal and/or deletion by the intruder who gained access to the system.

For network traffic, most research studies have analysed the packet header, which is prone to Internet Protocol (IP) address spoofing. Some others have concentrated on the payload, which is prone to data encryption. While there are some advantages in implementing detection by analysing traffic against some kinds of attacks, it is vulnerable to any traffic with an unusual rate, e.g., high rates of benign traffic generated by computers running multimedia-rich applications and online games against a very low rate of malign traffic. Furthermore, relying on the traffic alone will not provide a clear picture of the status of some activities occurring in the computer itself.

Another issue concerns the dataset used to evaluate research. The Defense Advanced Research Projects Agency (DARPA) (MIT Lincoln Labs 1999) and knowledge discovery and data mining (KDD) datasets (KDD Cup 1999) have enabled many researchers to concentrate on improving detection algorithms and comparing their work with that of others. The DARPA dataset was known to be the origin of the KDD Cup dataset (Tavallaee, Bagheri, Wei and Ghorbani 2009).

However, Sabhnani and Serpen (Sabhnani and Serpen 2003) claimed that the KDD dataset was not suitable for use with several well-known machine learning algorithms. Their tests revealed that using it as training data achieved no more than 30% detection for user-to-root and remote-to-local attack types, due to the low representation of attack samples. Both McHugh (McHugh 2000) and Mahoney and Chan (Mahoney and Chan 2003) criticised the dataset's validity. McHugh questioned the procedure for its preparation, which he said was unclear to other researchers. Mahoney and Chan claimed that it was full of erroneous information and, in many aspects, did not look like real traffic; this claim was supported by experiments on several IDSs using the KDD dataset and their own real network dataset. They concluded that, as a model with low false alarms created from the KDD dataset tended to generate high false alarms in a real environment, no good model could be drawn from the KDD dataset. Shafi (Shafi 2008) used that dataset with two machine learning algorithms, the UCS and extended UCS classifiers, and found that the detection results for user-to-root and remote-to-local attack types were poor. The extended UCS classifier also failed to show any significant improvement over its counterpart. Despite the criticism of this dataset, some research studies (Zaidi, Kenaza and Agoulmine 2010; Mabu, Chen, Lu, Shimada and Hirasawa 2011) still used it.

Due to some limitations and weaknesses of the above datasets, (Tavallaee, Bagheri, Wei and Ghorbani 2009) produced another called the NSL-KDD (NSL-KDD 2009). Although some weaknesses indicated by (McHugh 2000) could not be resolved, they were optimistic that the NSL-KDD could be a benchmark dataset against which researchers could compare their detection methods.

As making a dataset available to the public requires the owner to transform its sensitive data to some other label, there is a risk of wrong labelling and data loss during the process.

A collection of Internet traffic datasets is available in (Danzig, Mogul, Paxson and Schwartz 2009). However, Tavallaee et al. have commented that, as they lack proper documentation, their reliability is questionable (Tavallaee, Stakhanova and Ghorbani 2010).

The DARPA/KDD datasets have been used for over a decade. More than a decade ago, McHugh (McHugh 2000) raised the idea of a ‘possible’ commercial baseline for continuously ensuring that the performance of detection methods produced by researchers reflects their actual performance when dealing with real, current traffic. Over the past decade, Internet traffic has changed, due mainly to technological advances in hardware and software. Unfortunately, many software vulnerabilities, malware and hacking tools have emerged. This has created a gap between the research community, which has developed many anomaly detection methods, and industry, where most security vendors still rely heavily on signature-based detection approaches (Tavallaee, Stakhanova and Ghorbani 2010). Therefore, there is an issue concerning the availability of a reliable dataset for research aimed at providing answers to current problems.

Tavallaee et al. conducted a credibility survey of publications on anomaly-based detection from 2000 to 2008 (Tavallaee, Stakhanova and Ghorbani 2010). They analysed three major components of research publications which they thought were crucial for a fair evaluation and unbiased comparison: 1) the datasets used; 2) detailed criteria of the experiments; and 3) the method used to evaluate performance. They made several interesting discoveries. The main point they wanted to highlight was the importance of producing a good publication which would allow other researchers to replicate, validate and compare the work. One of the issues they raised was the use of current datasets; while acknowledging the contribution made by the DARPA/KDD datasets, they also highlighted their problems. Their survey revealed that 50% of publications used these datasets and that, due to the unavailability of a better one, many researchers used synthetic datasets, which leads to credibility issues.

There are also a number of strategies which apply intrusion detection to specific applications. Given the nature of grid computing, which utilises a group of machines working together, the technology requires an IDS to provide protection against exploitation of and intrusion into the grid itself. In (Choon and Samsudin 2003), the authors proposed a framework with a monitoring component that enforces an access policy on the resources in a grid, in which a correlation and aggregation of active profiles from the computers are compared with recorded profiles. Hu and Panda (Hu and Panda 2004) proposed an IDS for a database system using a data-mining approach. Their solution assumes that a legitimate transaction for a particular record must follow a valid sequence of reads and writes to related records; updating a record without following the right sequence would be flagged as an intrusive update. However, this solution is only effective for data that is dependent on other records. Although the authors created special database logs for the purpose of their experiments, the issue remains the availability of a reliable dataset for research purposes.
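
A toy sketch of that dependency idea, under our reading of it and with an invented policy: a write to a record is accepted only when the transaction's preceding operations contain the expected reads in the expected order.

```python
# Hypothetical dependency policy: updating 'balance' must be preceded by
# reading 'balance' and then 'transaction_log'.
VALID_PREFIX = [("read", "balance"), ("read", "transaction_log")]

def is_legitimate(update, history):
    """Accept a write only if the tail of the transaction history matches the
    expected read sequence for the record being written."""
    if update != ("write", "balance"):
        return True                        # no policy defined for other writes
    return history[-len(VALID_PREFIX):] == VALID_PREFIX

ok_history = [("read", "balance"), ("read", "transaction_log")]
bad_history = [("read", "balance")]
print(is_legitimate(("write", "balance"), ok_history))    # True: follows the valid sequence
print(is_legitimate(("write", "balance"), bad_history))   # False: flagged as an intrusive update
```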

The term Intrusion Prevention System (IPS) shadows the IDS terminology. Early IDS research studies focused merely on detection. However, later works also suggested prevention mechanisms (Gong 2003), such as disconnecting network access, shutting down computers, updating security policy and other measures. Thus, an IPS can be described as an extension of an IDS. Many security vendors have commercialised IDS-related security products. IDSs have been associated with antivirus programs (Xiao, Zheng, Wang and Xue 2005) used to prevent unauthorised modification to a specific data store or file structure in a system, and diverse features, including prevention functionality, hardware/software-based components and strategic deployment places, have been added. This indicates the emerging role of IDSs.

Although a malware detector and an IDS are sometimes used synonymously, Idika and Mathur suggested that a malware detector is just a component of a complete IDS (Idika and Mathur 2007). There are some situations handled by an IDS but not by antivirus software; for example, the detection of acts by a masquerader, misfeasor or clandestine user. A masquerader is a user who uses another person's account to access resources. A misfeasor refers to an authorised user who attempts to gain access to restricted resources or misuses the authorisation and trust given. A clandestine user refers to a person without any authorisation at all who gains unauthorised access to a system (Lowery 2002).

The diverse aspects of research on IDSs are sometimes seen to overlap with research on malware detection, but they may actually complement each other. We can see security software such as IDSs and behaviour-based malware detectors becoming as essential a component of a security system as existing commercial antivirus software is at present.

2.4 Malware Detection Systems

The main purpose of a malware detection system is to detect the presence of malware which, once found, can be cleaned, quarantined, blocked or deleted, these being relatively trivial tasks compared with the detection itself.

2.4.1 Malware Detection Techniques

Several solutions to malware threats have been attempted, as mentioned in (Wang, Deng, Fan, Jaw and Liu 2003; Ford 2005). Some of the common approaches can be categorised into static analysis, dynamic analysis, computer forensics (CF), OS hardening, file integrity checking (checksum and integrity shell), trusted code (certification), and others.

Reverse Engineering (RE) in software, sometimes called Reverse Code Engineering (RCE) (Peikari and Chuvakin 2004), is a technique for obtaining a sufficient design-level understanding of a particular system for its later modification or enhancement (Chikofsky and Cross 1990). Although, in some circumstances, the effort might be difficult, a malware analyst can use RCE to develop an understanding of the logic flow of a particular malware. Two complementary analysis techniques are associated with RCE: 1) static code analysis, which analyses a particular executable without executing it; and 2) dynamic code analysis, which analyses a particular executable's behaviour by executing it (Brand 2007; Zeltser 2012).

Static analysis can be further sub-divided into techniques such as disassembly, profile gathering, symbol table generation, and decompilation (The Honeynet Project 2004; Brand 2007). Disassembly is a technique for disassembling, debugging and monitoring the binary information of a program, e.g., by using IDA Pro (Hex-Rays SA 2012). The same or other tools can also reveal important profiles and generate the symbol table of an executable. Through decompilation, the revealed binary can be transformed into high-level programming code for an enhanced analysis (Brand 2007). Static analysis to obtain a binary signature is very effective when we want to know exactly what a malware will do, as this information helps antivirus software reverse or clean up damage inflicted by malware. Using this technique, an antivirus analyst can trace, down to a line-by-line level, which operations have been executed by a brand new malware, thereby gaining intelligence, in terms of the binary signature, as to how the malware creator managed to evade existing antivirus detection (Kruegel, Robertson and Vigna 2004; Moser, Kruegel and Kirda 2007b). The disadvantage of this technique is that the analyst needs access to the malware's source code or binary file, which may sometimes be difficult to obtain (Cornell 2008). In addition, relying on static analysis or binary signature-based detection is a tough challenge in the face of evasion attack techniques. Moser et al. (Moser, Kruegel and Kirda 2007b) demonstrated that relying on static analysis alone is not enough to combat malware.

Dynamic analysis can be categorised into techniques such as sandbox analysis, black-box analysis, and system call/API call tracing (The Honeynet Project 2004; Brand 2007). Sandbox analysis requires an analyst to isolate code execution in certain ways, such as in a honeynet or virtual environment. The honeypot is a popular strategy for detecting worms and intrusive activities. It involves deploying one or more machines as a tool for a larger detection system to detect, analyse and gather intelligence about attack activities; for example, HoneyStat (Dagon, Qin, Gu, Lee, Grizzard, Levine and Owen 2004). Although a honeypot is a useful technique for tracking the activities of hackers and malware, its obvious weakness is that it is only able to monitor the bad activities interacting with it (Spitzner 2003).

There are a number of automated dynamic analysis tools, such as Anubis (International Secure Systems Lab 2012), GFI SandBox (GFI Software 2012), Norman Sandbox (Norman ASA 2012) and ThreatExpert (ThreatExpert Ltd 2012). Researchers can use these tools to analyse the behaviour of programs and malware running on a machine within a sandboxed environment.

Newsome and Song (Newsome and Song 2005) proposed a dynamic taint analysis technique called TaintCheck to detect and analyse software exploits automatically, so that signatures of those exploits can be generated more quickly. The aim of TaintCheck is to detect the exploitation of jump addresses, format strings, system call arguments, etc. TaintCheck is based on the premise that anyone who wants to illegitimately alter the execution of a piece of software must change a value, or a reference to a value, originally defined in that software to input of their own. If an input is detected as coming from an external or illegitimate source, it is marked (tainted) and the system then keeps track of the use of that input as the software is running. The software is deemed to be under attack if the taint pattern violates the system's security policy. The technique, which can be used as a single tool or combined with a honeypot or OS randomisation, shows promising performance. However, it is susceptible to an attacker who changes a target value or reference without it being tainted.

Using a virtual environment enables a malware analyst to take a snapshot of an OS execution state and restore the image at a later time for multiple types of analyses. However, applying virtual machines requires more than one OS (Hollebeek and Berrier 2001) and setting up the environment is costly and suitable only for servers or dedicated computers, not ordinary end-user machines.

A black-box analysis (Brand 2007) is a technique which can be used to analyse a particular executable without the need to know its source code. The program is executed and its behaviour and reactions can be traced by monitoring changes in the OS and the network packets it generates. An antivirus analyst can use dynamic analysis to see the variations in execution behaviour by applying different execution environments. RCE tools (e.g., OllyDbg) can be used to monitor behavioural changes caused by malware.

System calls/API calls tracing (Forrest, Hofmeyr and Somayaji 1998; Ahmed, Hameed, Shafiq and Farooq 2009; Sami, Yadegari, Rahimi, Peiravian, Hashemi and Hamze 2010) is a technique generally used to provide an insight into which system calls are invoked by a given process. A malware analyst may use a sequence of system calls/API calls to build signature profiles as a means of recognising the difference between benign and malign calls.

Hardening the OS code (Shanmugam 2008) can prevent malicious exploitation of system files and resources, and disabling executables and scripting ensures that no malware can run. However, this solution is largely infeasible, as applications will have fewer capabilities available to them. Furthermore, many legacy applications will fail to run with significantly disabled APIs on an OS. Given these restrictions, this technique is presumably suitable for servers but not for the high-powered machine of an end-user, on which diverse applications and programs are used.

The checksum/file hashing technique (Sophos Labs 2010) uses a mathematical algorithm to extract a digital signature from a file's contents. If the file contents have changed, there is a very high probability that the checksum will no longer match. Although this technique is useful for checking for rootkits on an existing OS (Kruegel, Robertson and Vigna 2004), it only provides change detection and is not feasible for checking frequently updated files or new files downloaded from other sources. There are also weaknesses in hashing algorithms such as MD4, MD5 and SHA1 that allow an attacker to craft a different version of a file with the same hash value (Hartley 2007).
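
To illustrate the principle only, the following Python sketch computes a file digest and compares it with a previously recorded value. The file path and the recorded baseline are hypothetical, and SHA-256 is used here in place of the weaker MD4/MD5/SHA1 algorithms mentioned above.

import hashlib

def file_digest(path, algorithm="sha256", chunk_size=8192):
    """Return the hex digest of a file's contents, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage: record a digest while the file is known to be clean;
# any later change to the file's contents changes the digest.
recorded_digest = file_digest("C:/Windows/System32/notepad.exe")
# ... later ...
if file_digest("C:/Windows/System32/notepad.exe") != recorded_digest:
    print("File contents have changed since the baseline was recorded.")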

The integrity shell method checks for evidence of alteration. It is a resident program, a layer that checks the integrity of an object just before it is executed. It is only suitable for managing read-only files, such as an application's executable and DLL files. A machine emulator is suitable for tracing the effect of a suspicious program: at the end of its execution, files are compared with their states before the execution took place (Wang, Deng, Fan, Jaw and Liu 2003). However, although these techniques set restrictions on certain files, they do not provide protection against real-time threats and are not suitable for the high-powered machines of end-users on which a variety of applications and programs are used.

The string checker method reads binary data in files, looking for known readable strings. A study based on this method (Lai 2008) showed that malware can be distinguished from benign programs. This technique requires processes of string selection and filtration before a unique set of strings for recognising malware can be obtained. Obviously, as this technique provides no mechanism for observing what a program does while it is running, its execution risk remains unknown.

A network sniffer, protocol analyser or packet sniffer (Marchette 1999) is a technique for protecting a computer network against several kinds of attack. As it is possible that some attacks generate similar traffic patterns over time or match existing attack signatures, observing any traffic that deviates from normal provides a degree of detection capability. This technique provides a useful defence against propagating worms, trojan horses, backdoors, etc. However, it may struggle with malware that propagates at a very slow rate, and it cannot detect malware that does not propagate via the network.

Software certification is a solution for Microsoft applications and third-party vendor applications (Whittaker and Vivanco 2002). Certified applications gain end-users' trust as they are from legitimate vendors. However, this technique is not feasible for small or non-profit organisations because not all software houses or programmers can afford its high cost. Also, many programs are designed without certification in mind. Furthermore, the credibility of this technique is now questionable as, recently, there were malware executables bearing indications that they had been signed with a Microsoft certificate which was, in fact, fake (Hypponen 2012; Reavey 2012).

According to Civie et al. (Civie and Civie 1998), a working definition of Computer Forensics (CF) is the action of pursuing elemental evidence from a computer related to a case in a manner admissible for a court proceeding. They highlighted that, to qualify as CF, the evidence must not simply be uncovered through simple observation. A more up-to-date definition of CF is that it is a discipline that draws on knowledge of the law and computer science to gather and analyse information from computer systems, networks, wireless communications and storage devices in a manner which ensures it is admissible for a court proceeding (US-CERT 2008a). CF has been gaining popularity among the research community for more than a decade and, therefore, a number of commercial CF tools are available. CF tools are used by forensic investigators to conduct investigations (Garfinkel 2006) and can basically be categorised into persistent and volatile data tools. The former is used to analyse data that has been written to a computer's storage area, e.g., a disk. The latter is used to analyse data in volatile areas such as a computer's memory and network packets, which obviously could be lost if the computer is turned off. Among popular CF tools are The Sleuth Kit, The ATA Forensics Tool, The Coroner's Toolkit and EnCase (Hartley 2007).

Many traditional approaches to malware detection focus on binary signature-based detection, in which a malware is detected by recognising common features in a file while it is stationary. This method assumes that malicious code always shares the same or similar binary patterns, since most malware use a limited number of attack techniques, and its detection is based on known signatures. Its main challenges are how quickly the signatures can be obtained and how much of the malware they can detect. If the signatures are created to detect malware in too specific a way, they may fail against malware with a new binary pattern. At present, many binary scan-based antivirus components are highly efficient when dealing with known malware.

The present patch model provided by many software manufacturers offers continual software and security updates for their software and OSs against vulnerabilities and malware threats. However, this model seems to fail, especially when dealing with large-scale and fast-spreading attacks, as proven by major worm attacks such as Code Red, Code Red II, Melissa, Witty, Nachi, Nimda, Santy, SQL.Slammer and MyDoom (Yegneswaran, Barford and Ullrich 2003; Ford 2005). Worse still, sophisticated malware, such as Stuxnet, DuQu and Flame, indicate how commercial antivirus software cannot be relied on to provide protection against targeted attacks (Hypponen 2012).

As the price of computer storage drops, most end-users can afford hard disks with terabyte (TB) capacities, which allows the storage of more computer files than ever before. Therefore, the challenge is that any approach for detecting malware files, or malware-infected files, while they are stationary would have to be exhaustive, given the huge number of files users may have on their hard disks.

Independent tests conducted in 2010 on binary signature-based detection in all the major on-demand antivirus software revealed that the best achieved as high as 99.6% detection, and the worst only 81.8%. The test set consisted of about 1.2 million malware samples collected from several sources. It was also reported that these products produced false positive results (Anti-Virus Comparative 2010). Currently, effective threat prevention requires a combination of approaches, such as the functionality of an IDS/IPS, anti-spyware and a firewall, that take various infection vectors into consideration. One or more of the available detectors aims to provide protection against malware threats and usually monitors files, emails, web pages, etc. These detectors may use binary signatures, API calls, logs, traffic information, etc., as one or more sources of data for monitoring.

2.4.2 Issues and Challenges of Malware Detection Systems

Although commercial antivirus products have gained the trust of the public and organisations worldwide, there are some cases where the product itself has failed and caused the relevant computer system to crash (Surisetty and Kumar 2010).

There have been allegations that several antivirus companies work together with virus writers to keep the business alive. However, John Hawes, a technical consultant at Virus Bulletin, has denied these rumours and claimed that the antivirus business is built on trust among competitors and their customers (Virus Bulletin 2011). From a business point of view, an antivirus company's financial performance relies on malware incidents: during a period with no, or only a few, malware cases, sales will usually drop. Harrald et al. investigated this issue in three major antivirus software companies and found a direct relationship between the number of malware incidents and the antivirus companies' financial performances on the stock market (Harrald, Schmitt and Shrestha 2004).

The challenge of creating a malware detection system is like a cat-and-mouse chase: detection systems are continually improved in terms of their mechanisms and effectiveness, while malware writers are always seeking new ways of evading detection. Malware writers have the upper hand as they can test their code against numerous commercial and free antivirus products to evaluate whether their malware can evade current defences before launching an attack. If not, they have ample time to alter the program code until the signatures of the modified version no longer match the signatures stored in the antivirus software; once successfully modified, the malware can be released. This kind of approach is investigated in a recent work by (Ramilli and Prandini 2010). The only way of eliminating the malware writer's advantage is to use customised or non-publicly available antivirus software. However, as not all organisations can afford a dedicated security team to develop and maintain such security software, this is impractical.

The emergence of new devices and technologies often opens new dimensions for malware to spread (Ford 2005) as they become harder to detect (Lawton 2002). In the early age of PCs and floppy disk drives, malware exploited sector 0 of a floppy disk as a “special zone” where code would run automatically on being booted by the Basic Input Output System (BIOS). Some malware target particular types of documents, e.g., files relating to word processing software which people use to work with their documents. The distribution of infectious files via various sources, such as shared directories and/or portable storage/devices, has simultaneously triggered the spread of malware. When mailing systems were established, malware writers took advantage of them to propagate malware to users as email attachments or links, thereby breaking geographical boundaries.


An evasion attack is a major threat to an effective malware detection system. A dangerous malware payload can exist in forms other than an ordinary binary file, e.g., in a registry key or in random access memory (RAM). Using evasion attack techniques, binary file signatures can be made partially or fully dynamic with the help of code obfuscation techniques, such as encryption, polymorphism or packing, so that a signature will not match the existing antivirus signature database (Christodorescu and Jha 2003; Moser, Kruegel and Kirda 2007b; Fitzgerald 2010). A polymorphic malware is able to hide its file signature, which it changes upon every replication (Szor and Ferrie 2001). With the help of a built-in interpreter, its malicious payload can be excluded from its parent file, and downloaded and executed at a later time (Moser, Kruegel and Kirda 2007b; Zdrnja 2010). A highly sophisticated modular-based malware is demonstrated by Clampi, a trojan (Fitzgerald 2010).

There are code protection techniques (DeMarines 2008) that malware writers can implement to deter reverse engineering attempts, or at least render their effort infeasible in terms of complexity and affordability. Malware writers can use binary obfuscations and polymorphism to scramble a file’s logical structure so that it will be difficult to analyse. Using polymorphic encryption, a malicious code payload is packed and then just-in-time decrypting and/or unpacking is used to restore the packed payload, together with the unpacked section of the executable, into memory just before the malware is executed. A more sophisticated layer of code obfuscation is performed with the help of embedded virtual machine protectors which usually comprise a compiler, interpreter and handler (Yan, Zhang and Ansari 2008). To counter an RE attempt, a malware writer can use a rootkit, anti-debugging code, etc., to look for hardware emulators and kernel tools.

Using dynamic analysis, some difficult issues, such as the packing/unpacking process, can possibly be traced, depending on which particular technique is used. However, this technique may fail to capture all execution paths when the malware is designed with the capability to hide its functionality if contained or analysed (Cavallaro, Saxena and Sekar 2007; Comparetti, Salvaneschi, Kirda, Kolbitsch, Kruegel and Zanero 2010; www.f-secure.com 2010). Comparetti et al. proposed a tool that could help to detect dormant functionality in malware programs (Comparetti, Salvaneschi, Kirda, Kolbitsch, Kruegel and Zanero 2010). Although the dormant functionality in a malware can be obtained using dynamic analysis, the use of a binary signature to detect a malware may still be subject to further evasion attacks at a later time.

There are issues relating to the techniques which hackers use to hide malicious code in a computer, and the limited capabilities of CF tools to search for evidence of it. Hartley distinguished between the concepts of anti-forensics and counter forensics (Hartley 2007), describing the former as the application of technologies to make legitimate evidence disputable, e.g., altering file timestamps, hiding files, and secure-deleting sensitive data. The application of encryption, steganography and secure deletion in anti-forensics tools frustrates the efforts of forensic investigators in terms of their investigation constraints, i.e., time, cost and resources (Dahbur and Mohammad 2011). Hartley described counter forensics as the application of technologies to prevent the processes of collecting and analysing any evidence so that all CF efforts lose credibility. Numerous anti-forensic tools, such as necrofile, klimafile, timestomp, slaker, transmogrify, Sam Juicer, Rune FS, Waffen FS, KY FS, Data Mule FS and FragFS, have been described (The Grugq 2002; grugq 2004; Thompson and Monroe 2006; Hartley 2007). There have also been advances in anti-forensics techniques for smart phones (Azadegan, Yu, Liu, Sistani and Acharya 2012).

Krenhuber et al. explored and described in detail the tricks used for hiding data (Krenhuber and Niederschick 2007). The Host/Hidden Protected Area (HPA) is a reserved storage area on a computer hard disk designed specifically for computer manufacturers to store important tools and recovery images for after-sales support and the computer's configuration. However, as software such as hdat2 (Cabla 2012) can modify this originally protected area, it is subject to malicious use. The Device Configuration Overlay (DCO) is also a reserved storage area on a hard disk that can be exploited by hackers to hide malicious code; it was originally used to make hard disks of different sizes report the same number of sectors.

Moreover, there is slack space on a hard disk due to the design structure of a logical disk. Any unallocated space in a hard disk's cluster or volume is subject to exploitation (Dillon 2006; Berghel, Hoelzer and Sthultz 2008), as files can be secretly stored in space the OS considers to be bad sectors. The key to this trick is to alter the Master File Table (MFT) of a New Technology File System (NTFS) (Wee 2006). Data can also be hidden in additional clusters allocated to a file; this is called an Alternate Data Stream (ADS) in Windows and Extended Attributes (xattr) in Linux (Wee 2006). Another trick for hiding a file is to simply delete it; as this does not destroy the data, such a file can be recovered unless its allocated area is overwritten (Mallery 2006). However, it can be very difficult for forensic investigators to recover data if a hacker uses secure deletion algorithms to ensure the total deletion of a file (Gutmann 1996).

Covert channelling techniques can be used to frustrate forensic investigators at the network level. Two popular types are protocol bending and packet crafting (Berghel, Hoelzer and Sthultz 2008). Using covert channelling techniques, an attacker can use an existing network protocol, e.g., the Internet Control Message Protocol (ICMP), for more than its original purpose, such as carrying out malicious activities. With this kind of exploitation, the attacker might easily evade an existing network-level defence, such as a firewall and/or IDS, because the security defences monitor application-layer data and assume that the ICMP is always safe.

Data contraption is another emerging technique in which no trail is left on the hard disk of the target computer by an attacker. Everything is carried out from the attacker's machine by directly accessing the live memory of the running processes on the target computer (Hartley 2007).

Cloud Computing (CC) is an emerging option for enterprises to expand their IT business and investigate possibilities without investing much in their IT infrastructure. CC infrastructure can include servers stored at multiple locations with redundancy features, and the hardware may have different types of ownership (Dahbur and Mohammad 2011). It is quite a complex task for forensic investigators to collect and analyse forensic information in this kind of environment.

Both dubious and legitimate, but hacked, websites have become a target for malware distribution. An infection's magnitude is proportional to the number of visitors to the site. As illustrated in Figure 2.1, a web user's machine can be infected via the use of plug-ins, Java applets, DOM objects, XMLHttpRequests, cookies, JavaScript, ActiveX or other software required by a user who assumes that whatever is on the site is legitimate and safe. However, as a web browser's plug-in shares the same memory address space as the web browser itself, this allows an evasion attack to be carried out via a benign application which exploits that address space (Cavallaro, Saxena and Sekar 2007; Saxena, Sekar, Iyer and Puranik 2008; Saiedian and Broyle 2011). It is also possible that, without the user's permission, an exploit could be embedded to penetrate the vulnerable browser or its extensions, such as Adobe Flash and Microsoft Silverlight (Ford, Cova, Kruegel and Vigna 2009; Constantin 2011; Saiedian and Broyle 2011). Once the machine is successfully penetrated, further downloads are possible as the malware remains on the machine, collecting or stealing information for various purposes (Cascadia Labs 2008; Viega 2011). The same risk also applies to peer-to-peer file-sharing sites (Berns and Jung 2008).

Figure 2.1: Common mechanics of malware threats via a web site (Source: inspired by the original diagram illustrated in a review report in (Cascadia Labs 2008))

Due to the rise in the number of social websites, such as Facebook, MySpace and Friendster, they have become the next target of malware writers. As the most popular, i.e., Facebook, stores the profiles of its users, which contain a great deal of information about the users and lists of their friends, a third party could write a Facebook application and gain access to that information. Patsakis et al. (Patsakis, Asthenidis and Chatzidimitriou 2009) proved that, using a Facebook application, a malware writer could include malicious code to harvest details of a user's machine, including its list of available ports. Having that information and using an available exploit against a vulnerable application could allow an attacker to gain access to the console of the target machine, potentially causing further damage or information theft. This malware could also be spread unknowingly by a user to his or her friends as, usually, people tend to trust a program shared on Facebook (Fan and Yeung 2010).

Embedded malicious code can also exist in free-to-download files, such as system utilities, Windows gadgets, add-on programs and screensavers. The use of pirated software is also risky because it could already be infected by malware. Some studies (Li, Stolfo, Stavrou, Androulaki and Keromytis 2007; Stolfo, Wang and Li 2007) have indicated that the signature-based antivirus software used at the time their experiments were conducted could not detect embedded malware even though its signatures were already known. Shafiq et al. (Shafiq, Khayam and Farooq 2008a) reaffirmed the claim that commercial antivirus software is not effective against embedded malware threats stored in the form of files, e.g., JPEG images. Moreover, it has been discovered that a knowledgeable hacker can apply a suitable anti-forensic dithering signal to image files (Valenzise, Nobile, Tagliasacchi and Tubaro 2011). Surfers of dubious or hacked websites containing altered images could trigger the transfer of such images, containing a malicious payload, to their computers. These images could then become the arsenal for malware or hackers to launch a sophisticated attack and counter forensics, thereby leaving no traces of evidence for forensic investigators. A malware could also be divided into several pieces embedded in one or more legitimate files which are then brought into a computer in stages (Ramilli and Bishop 2010). It could be integrated with the host file for automatic execution upon opening of the host (Li, Stolfo, Stavrou, Androulaki and Keromytis 2007) or be executed at a later time by other malicious programs (Stolfo, Wang and Li 2007).

‘Wireless fidelity’, or simply Wi-Fi, is a name for the 802.11 products which are the dominant alternative to the wired Local Area Network (LAN). Using Wi-Fi, people have more freedom to move physically while remaining connected to the existing LAN. However, there are still a number of security issues associated with this technology. Although free Wi-Fi has grown in popularity in public places, restaurants and hotels, there are potential risks of a user's machine being compromised by hackers or malware, as Wi-Fi without encryption and authentication settings is accessible by anyone (Bowei 2009) and a connected machine is vulnerable to hacking. Also, a fake Wi-Fi hotspot could be set up with the requirement that a visitor pass through a default page and provide some sensitive information (Bee 2011), whereby a vulnerable web browser could easily be hacked, as per the case illustrated in Figure 2.1. A visitor could also be tricked into the mandatory installation of a malicious program prior to obtaining the Internet connection.

While some cryptographic algorithms used in Wi-Fi are fairly secure, the most widely used security protocol, Wired Equivalent Privacy (WEP), is known to have flaws, as discussed by (Fluhrer, Mantin and Shamir 2001), who showed that a long WEP key could be discovered within a negligible amount of time. Domenico et al. surveyed Wi-Fi vulnerabilities and investigated their proximities and origins (Domenico, Giorgio and Antonio 2007). Other works have highlighted similar concerns, noting that there are several key technical and business-related challenges which must be addressed before we can safely take advantage of this technology (Borisov, Goldberg and Wagner 2001; Henry and Hui 2002).

In addition, a vulnerable OS or application software also faces arbitrary code injection. Hackers can penetrate machines with vulnerabilities by using appropriate tools or via worm propagation, which directly exploits these vulnerabilities, thereby allowing further penetration. Metasploit is an example of a framework which demonstrates the risks faced by vulnerable OSs and application software (Metasploit LLC 2011). Administrators of IT enterprises could use this framework to assess and evaluate their networks against such threats.

CDs and DVDs can execute programs automatically upon insertion of their disks (Microsoft Corporation 2011a). When a CD-ROM drive's auto-run property is enabled, an inserted CD/DVD containing malicious content can cause a computer to be fully compromised. Although this feature can be disabled, the default setting for the device may still be enabled, as several versions of OSs have been shown to fail to disable it completely even when it was set to be disabled (US-CERT 2008b).

A USB drive is a popular medium for personal file transfer and backup because it is small, reliable and handy. A USB drive with U3 technology has one section recognised as a CD-ROM and the rest as a flash drive. With little effort, anyone could use tweaking tools to customise the CD-ROM portion, resulting in full control over the application set to execute automatically upon insertion of the USB drive into a USB port (Al-Zarouni 2006). There are at least two programs involved in the tweaking process: the first removes the original ISO image file located in the CD-ROM portion of the USB drive, after which another executable is created and transformed into an ISO image to replace the original one; the second is used to install an original ISO image, downloaded from the original vendor's website, into the CD-ROM portion of the USB drive. However, the URL of the target vendor's website can be made local by modifying the hosts file of the machine, thereby causing the program to retrieve the bogus ISO image created by the user.

Educational organisations and cyber cafes are among the most volatile computer networks (Goebel, Holz and Willems 2007; Venkataram, Pitt, Babu and Mamdani 2008; Parrish 2010). Also, government officials, bodies and agencies have become targets of political, economic and military espionage (Tewari 2008; Constantin 2009; Hodge and Entous 2011; Hypponen 2012) and, based on news reports, many claims have been made that attacks have been carried out by coordinated cyber-terrorist groups. However, a report from the Federal Bureau of Investigation concluded that 70% of all security breaches are carried out from within an organisation (Macleod 2007). Stuxnet, DuQu and Flame are evidence of how intelligence agencies make use of malware to execute politically motivated covert operations (Hypponen 2012).

Malware has also now spread to mobile phone devices, as mid-range to high-end ones are commonly equipped with Bluetooth technology. Self-propagating malware can use an active Bluetooth connection on one mobile device to infect other nearby devices within the range of the Bluetooth network. Also, a self-propagating malware can simply request to connect to nearby devices and, through interactions with the owners of the target devices, be executed (Zyba, Voelker, Liljenstam, Mehes and Johansson 2009). The first attack of this kind was performed by the Cabir worm, which targeted the Symbian OS used in some models of Siemens, Panasonic and Samsung mobile phones (Jamaluddin, Zotou and Coulton 2004). The use of USB, Bluetooth and Secure Digital cards on mobile phone devices means that the source of a malware infection may come from different devices (Dai, Liu, Wang, Wei and Zou 2010). Given the large number of Android-based phones, these are the most frequently targeted mobile OS environment. DroidKungFu is an example of a malware which targets Android-based phones by exploiting the privilege and trust obtained from app updates (Zimry, Irene and Yeh 2011). Although the initial versions of these apps do not contain the malware, once they are installed, users are subsequently tricked by the apps into performing software updates which include it.

In today’s highly Internet-connected working environment, reliance on computers and the Internet is vital to individuals, and the public and private sectors. A serious penetration into computer networks could prevent users from performing their tasks and a larger-scale impact could even cause economic disaster.

The present major security threats in the Internet-connected environment are Denial of Service attacks, viruses and worms (Shannon 2007). In 2007, there was a report that several Canadian websites were hit by Storm-driven DDoS attacks. Even though the attack failed, the bot demonstrated its ability to attack, involving the use of a botnet of 1.7 million machines (Gaudin 2007).

The present patch model provided by many software manufacturers seems to fail, especially when dealing with large-scale and fast-spreading attacks. There is growing concern regarding a new generation of attacks which could cause severe damage to the entire global network and which pose major challenges for future solutions, such as the demand for faster detection of unknown attacks and the ability to immunise computers affected by the first wave of attacks against malicious changes to their data.


A recent trend in malware creation has shown a very significant, almost fourfold, increase since 2007, with an average of 60,000 new pieces of malware identified per day (McAfee 2010a). Another report (Cluley 2009) noted that an independent antivirus testing body had accumulated over 22 million malware samples.

In summary, as computer users are not all educated about, or aware of, computer security issues and how to use their computers safely, many are vulnerable to malware threats. Although hardware and software continue to advance, they still face security problems that have been exploited by malware, as outlined in this chapter. As there are too many malware to handle by means of binary/file signatures, it is inefficient to try to detect them by relying on binary signatures that are too specific to one or two types and might not be useful for countering others, especially zero-day attacks from brand new malware. Therefore, it is desirable to use common features capable of detecting most of these malware by their actions or behaviours rather than their binary signature patterns. This is one of the aims of this research.

2.5 Related Work

This research aims to improve the effectiveness of behaviour-based malware detection. Malware can only do harm if allowed to execute and propagate. We present a behaviour-based malware detection system which focuses on the detection of malware as they execute and/or propagate. Essentially, there are some decision processes in the implementation of this system related to the use of anomaly-based and signature-based techniques.

Chapter 3 will discuss the overall architecture of our system. In Chapter 4 we present our approach for detecting malware during propagation, and in Chapter 6 we present our approach for detecting malware during execution. In this section, we survey some related work in the literature on propagation-based and execution-based detection approaches.


2.5.1 Approaches to Propagation-based Detection

Malware that have the malicious intent of stealing information while remaining stealthy, and/or want to spread to other machines, have to propagate. Malware that propagate or self-propagate are known as worms.

Weaver et al. (Weaver, Paxson, Staniford and Cunningham 2003) provided a taxonomy of computer worms based on several factors, such as target discovery, carrier, activation, payloads and attackers. Kienzle and Elder (Kienzle and Elder 2003) provided a high-level overview of worms and their trends from 1998 to the first quarter of 2003. They reported only the novel worms which appeared throughout that period, and several trends in worm attacks were presented from different perspectives. However, there remains a need for a robust, multi-layer solution to these worm threats.

Threshold Random Walk (TRW) (Jung, Paxson, Berger and Balakrishnan 2004) is an algorithm for detecting port scanners based on whether a given connection from a remote machine to a newly-visited local IP address is successful or not. TRW makes an early decision as the traffic arrives by observing the ratio of two conditional probability mass functions. The ratio is then compared with upper and lower thresholds. If the ratio is less than or equal to the lower threshold, the remote machine is deemed benign, whereas if it is greater than or equal to the upper threshold, it is deemed to be a scanner. If the ratio lies between the two thresholds, the algorithm waits for the next connection attempt, recalculates the ratio and performs the evaluation again. The results of their experiments show that TRW performs faster, and with higher accuracy, than other techniques.
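
The following Python sketch illustrates the sequential decision idea described above, not the exact TRW formulation: a likelihood ratio is updated for each connection outcome and compared with two thresholds. The success probabilities and thresholds are illustrative assumptions, not values from the cited paper.

def trw_decide(outcomes, theta_benign=0.8, theta_scanner=0.2,
               lower=0.01, upper=100.0):
    """Sequential likelihood-ratio test in the spirit of TRW.

    outcomes: iterable of booleans, True if a connection attempt to a
    newly-visited local address succeeded. The probabilities and
    thresholds are illustrative values only.
    """
    ratio = 1.0
    for success in outcomes:
        if success:
            ratio *= theta_scanner / theta_benign          # successes push towards 'benign'
        else:
            ratio *= (1 - theta_scanner) / (1 - theta_benign)  # failures push towards 'scanner'
        if ratio <= lower:
            return "benign"
        if ratio >= upper:
            return "scanner"
    return "undecided"   # wait for more traffic

print(trw_decide([False, False, False, False, False]))   # repeated failures -> 'scanner'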

A comparative study involving TRW and an extended TRW algorithm called credit-based TRW was conducted in (Ashfaq, Robert, Mumtaz, Ali, Sajjad and Khayam 2008). The credit-based TRW's increase/decrease algorithm is used to slow down computers that are experiencing unsuccessful connections. It is claimed that credit-based TRW has a lower complexity than TRW and is thus more suitable for deployment at endpoints.

The propagation of worms requires network communication involving IP addresses and ports. Entropy was used in (Lakhina, Crovella and Diot 2005) as a tool for detecting anomalous traffic based on the features of IP and port distributions. They stated that using feature distributions could naturally distinguish anomalous from benign traffic and uncover new anomaly types, and that such distributions are a key element in achieving promising performance.

However, Shafiq et al. (Shafiq, Khayam and Farooq 2008b) claimed that entropy cannot clearly detect low-rate attacks, which contradicts some of the findings in (Lakhina, Crovella and Diot 2005). They suggested that there should be intelligent consideration of several traffic features, such as sudden bursts of session arrivals, spikes in traffic volume, entropies of destination IP addresses and divergences in port distributions.
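
As a minimal illustration of the entropy-of-feature-distribution idea, the following Python sketch computes the Shannon entropy of a destination-port distribution; the port lists are invented for the example. Traffic concentrated on a few ports yields low entropy, while scanning traffic spread across many ports yields high entropy.

import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical destination ports seen in two observation windows.
normal_window = [80, 443, 80, 53, 443, 80, 993, 80]
scan_window = list(range(1024, 1032))          # a different port per connection

print(entropy(normal_window))   # lower: traffic concentrated on a few ports
print(entropy(scan_window))     # higher: ports spread out, as in scanning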

They evaluated several types of detection classifiers against a dataset, including two well-known bio-inspired anomaly detectors, the Real-Valued Negative Selection (RVNS) and the Dendritic Cell Algorithm (DCA). The non-bio-inspired detectors included the Adaptive Neuro-Fuzzy Inference System (ANFIS), the Support Vector Machine (SVM), the Rate Limiting (RL) detector and the Maximum Entropy (ME) detector. A later section provides some details of these detectors, as we directly compare their results with ours.

Shafiq et al. (Shafiq, Khayam and Farooq 2008b) concluded that the use of the right features will help to detect malware regardless of the classifiers used. They evaluated a number of popular classifiers, some involving complex algorithms, and most yielded excellent results using intelligent features.

Another similar work used the same benign data, but seeded it with 100 infections of each worm (Khayam, Radha and Loguinov 2008). The authors proposed the use of the Kullback-Leibler (KL) divergence measure to quantify perturbations in port distributions in time-based windows. KL divergence is an information-theoretic measure of the closeness of two probability distributions. The KL values obtained from the training data were then used by an SVM to classify the infected data. Although they achieved 100% accuracy at most endpoints, their technique suffered from worms that propagated at very low rates.
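
The following Python sketch, intended only as an illustration of the measure, computes the KL divergence between an observed port distribution and a baseline one, with simple smoothing so that ports absent from the baseline do not make the divergence infinite. The port lists and the smoothing constant are assumptions made for the example.

import math
from collections import Counter

def kl_divergence(observed, baseline, eps=1e-6):
    """KL divergence D(P_observed || P_baseline) over destination ports.

    Both arguments are lists of ports; `eps` smooths ports absent from
    one of the windows so that the divergence stays finite.
    """
    ports = set(observed) | set(baseline)
    p = Counter(observed)
    q = Counter(baseline)
    n_p, n_q = len(observed), len(baseline)
    total = 0.0
    for port in ports:
        pi = (p[port] + eps) / (n_p + eps * len(ports))
        qi = (q[port] + eps) / (n_q + eps * len(ports))
        total += pi * math.log2(pi / qi)
    return total

baseline_ports = [80, 443, 80, 53, 443, 80]          # hypothetical benign window
observed_ports = [445, 445, 445, 139, 445, 445]      # hypothetical worm-like window
print(kl_divergence(observed_ports, baseline_ports)) # large value -> strong perturbation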

The SVM (Burges 1998) is a popular machine learning classifier that has good generalisation ability, even when using a very small amount of training data. It is suitable for classification, regression and other tasks, including anomaly-based detection. For a small training dataset, each sample is represented as a point in a space, and the SVM constructs one or a set of hyperplanes that separate the regions of benign and worm data. Shafiq et al. (Shafiq, Khayam and Farooq 2008b) used the SVM as a benchmark in their comparative evaluation of classifiers, noting that the SVM was the best classifier in their work but that it suffers from high algorithmic complexity and extensive memory use. Thus, as the SVM is an impractical solution for the real world, we are interested in finding a simpler but still robust detection technique.
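
A minimal sketch of how window-level features (for example, a KL value and a session rate) might be fed to an SVM classifier is given below; it assumes the scikit-learn library, and the feature values and labels are synthetic placeholders rather than data from the cited work.

from sklearn.svm import SVC

# Each row: [KL divergence of the port distribution, sessions per window].
X_train = [[0.10, 12], [0.20, 15], [0.15, 10],     # benign windows (label 0)
           [2.50, 300], [3.10, 420], [2.80, 350]]  # worm-infected windows (label 1)
y_train = [0, 0, 0, 1, 1, 1]

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

print(clf.predict([[0.12, 14], [2.90, 390]]))   # expected output: [0 1]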

The RL detection technique for endpoints was introduced by Williamson (Williamson 2002), who assumed that malware will actively attempt to contact many computers in a network, which will usually cause a burst in the intensity of requests to make outgoing connections. He claimed that malware tends to establish connections to new non-local addresses at a certain connection rate, whereas benign connections are mostly related to local addresses. The assumption is that the detection of malware can be achieved by having the RL control connections to new endpoints. However, using this technique, worms with low scan rates would remain undetected and legitimate home endpoints with complex usage patterns would be incorrectly flagged, as shown in (Shafiq, Khayam and Farooq 2008b).
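
The following Python sketch captures the working-set idea behind this kind of throttle, not Williamson's exact mechanism: connections to recently contacted destinations pass immediately, while connections to new destinations are queued and released at a fixed rate. The working-set size and release rate are illustrative assumptions.

from collections import deque

class ConnectionThrottle:
    """Rough sketch of a rate-limiting connection throttle (illustrative only)."""

    def __init__(self, working_set_size=4, releases_per_tick=1):
        self.working_set = deque(maxlen=working_set_size)  # recently contacted hosts
        self.delay_queue = deque()                         # pending new destinations
        self.releases_per_tick = releases_per_tick

    def request(self, destination):
        if destination in self.working_set:
            return "allowed"                 # recently seen, pass immediately
        self.delay_queue.append(destination)
        return "queued"                      # new destination, rate limited

    def tick(self):
        """Called once per time unit: release a limited number of new connections."""
        released = []
        for _ in range(min(self.releases_per_tick, len(self.delay_queue))):
            destination = self.delay_queue.popleft()
            self.working_set.append(destination)
            released.append(destination)
        return released

throttle = ConnectionThrottle()
print(throttle.request("10.0.0.2"))   # 'queued' (new destination)
print(throttle.tick())                # ['10.0.0.2'] released into the working set
print(throttle.request("10.0.0.2"))   # 'allowed' (recently contacted)
# A delay queue that keeps growing would indicate worm-like connection behaviour.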

Using the ME, Gu et al. (Gu, McCallum and Towsley 2005) divided packets of network traffic into a set of two-dimensional packet classes. The first dimension dealt with traffic from the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP); the TCP-based packets were further grouped into SYN or RST. The second dimension grouped the packets into 587 classes of destination ports. The ME estimated the packet distributions, using the benign traffic as a baseline, which were then used with relative entropy to detect anomalous traffic. In this method, acquiring benign training traffic which covers a wide variety of end-user types is a requirement. We would like to eliminate this requirement by not relying on specific training data.

The type of window indicates the way in which data is organised and maintained in the detection process. In the literature (Khayam, Radha and Loguinov 2008), a time window (a window covering a fixed time period) shows impressive outcomes. However, the number of signatures within a time window can fluctuate within a wide range, and may grow very large while still fitting within the window; therefore, the cost of analysing its data can be high. In contrast, a fixed window (a window that holds a constant number of signatures) has no chance of extreme spikes in its data volume, as can happen in a time window. Its focus is more on the pattern in the data, which may affect the amount of knowledge gained from it. Regardless of the window type, the nature of the data will also contribute to the results.
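
The contrast between the two window types can be illustrated with the following Python sketch, in which the timestamps and session signatures are invented: a time window may retain a widely varying number of signatures, whereas a fixed window always retains the same number.

from collections import deque

def time_window(events, now, span_seconds):
    """Keep every (timestamp, signature) pair seen within the last `span_seconds`.
    The number of retained signatures can fluctuate widely."""
    return [(t, s) for (t, s) in events if now - t <= span_seconds]

def fixed_window(events, size):
    """Keep only the most recent `size` signatures, regardless of arrival times."""
    window = deque(maxlen=size)
    for event in events:
        window.append(event)
    return list(window)

# Hypothetical (timestamp_in_seconds, session_signature) pairs.
events = [(0, "tcp:80"), (5, "tcp:443"), (6, "tcp:445"),
          (7, "tcp:445"), (8, "tcp:445"), (9, "tcp:445")]

print(time_window(events, now=10, span_seconds=5))  # may hold few or many items
print(fixed_window(events, size=3))                 # always holds exactly 3 items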

2.5.2 Approaches to Execution-based Detection

In API call research, it is believed that malware generate sequences of API calls that are different from those of benign programs, involving actions such as creating, reading, writing, deleting and changing files, directories or special resources of the OS. Most malware are also capable of communicating with other hosts (although this is not always necessary) for replication attempts, accessing resources on other hosts and sending information to others. Some other types of malware, such as worms, utilise a special facility in a certain OS so that, upon logon, the malware automatically runs again, thereby preventing the end-user from removing it, as exhibited by the Brontok worm (www.f-secure.com 2010).

A seminal work by Forrest et al. (Forrest, Perelson, Allen and Cherukuri 1994) fragmented the long system call traces of UNIX processes into shorter system call signatures. They used a sequence length of 10, based on their empirical observation of the unique n-gram sequences. When deciding the size of the n-gram, they raised two issues: (1) if the size of n is large, the size of the storage database will also be large; and (2) if the size of n is too small, it can be very difficult to discriminate between benign software and malware.
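
A minimal sketch of this fragmentation step is shown below; for brevity it uses n = 3 rather than the length of 10 used by Forrest et al., and the call names are invented placeholders.

def ngrams(trace, n):
    """Fragment a call trace into overlapping fixed-length n-grams."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

# Hypothetical trace of system/API call names.
trace = ["open", "read", "mmap", "mmap", "open", "read", "close"]

profile = set(ngrams(trace, 3))        # unique n-grams form a behaviour profile
print(sorted(profile))

# At detection time, n-grams absent from the benign profile count as mismatches.
observed = ["open", "read", "write", "unlink", "close"]
mismatches = [g for g in ngrams(observed, 3) if g not in profile]
print(mismatches)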

Wespi et al. (Wespi, Dacier and Debar 2000) enhanced the above technique by proposing a detection using variable-length sequences of API calls as an alternative to fixed-length sequences.

39 Chapter 2. Background of the Study

Mutz et al. (Mutz, Valeur, Vigna and Kruegel 2006) proposed a technique that observes the arguments of system calls and uses them in multiple detection models, so that the outcomes can be observed from multiple perspectives. These outcomes were combined in a Bayesian network to perform an overall classification. They argued that the use of the Bayesian technique, rather than a simple aggregate score, improved detection accuracy and was resilient to evasion attempts.

Ye et al. (Ye, Wang, Li and Ye 2007) recorded all attempted system calls and generated system call signatures to obtain detailed information about a process, including the total number of functions it called. They looked into the level of support and confidence evidenced for a particular sequence falling into the malicious category.
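
As an illustration of these two measures in their general association-rule sense (the counts below are invented, not taken from the cited work), support and confidence for a call sequence can be computed as follows.

def support_and_confidence(sequence_count, malicious_count, total_traces):
    """Support: fraction of all traces containing the sequence.
    Confidence: fraction of traces containing the sequence that are malicious.
    (General association-rule definitions; the counts used below are invented.)"""
    support = sequence_count / total_traces
    confidence = malicious_count / sequence_count
    return support, confidence

# Hypothetical: a sequence appears in 40 of 1000 traces, 36 of them malicious.
s, c = support_and_confidence(sequence_count=40, malicious_count=36, total_traces=1000)
print(f"support={s:.3f}, confidence={c:.3f}")   # support=0.040, confidence=0.900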

Whittaker and Vivanco (Whittaker and Vivanco 2002) monitored several potential APIs used by malicious code, such as the registry, file system, scripting host, and system and communication APIs. Changes made by programs were tracked and recorded so that they could be undone if malicious API calls were detected.

On the Windows platform, each executable loads kernel32.dll, and other API DLLs are loaded at a later time. Of these, advapi32.dll is the one that allows manipulation of, and access to, the registry, which is a common target for the automatic execution of certain events. Winsock.dll, winsock32.dll and ws2_32.dll allow access to other hosts; a particular malware may load and use one of these to establish communication with other hosts.

There are several techniques for injecting a DLL into a Windows environment (Richter 1999; Hoglund and Butler 2005). One, implemented in (Abimbola, Munoz and Buchanan 2006), replaced the browser's real DLL with a fake one which relayed messages sent and received by the browser to the original DLL. By doing this, the message patterns of the safe system calls could be analysed and logged. Microsoft's Detours (Microsoft 2012c) was the first library to provide Win32 API interception by rewriting target function images rather than by static binary rewriting (Hunt and Brubacher 1999). DeMarines noted that, as Detours provides a stealthy implementation of API interception, it has gained popularity among security researchers (DeMarines 2008).

There is a special component called the Filter Manager (Microsoft 2012e), created by Microsoft to facilitate the integration of mini-filters (Microsoft 2012f). Developers can create mini-filters for various purposes, e.g., as a security tool to monitor program execution by logging, via the filter manager, information on various I/O operations (e.g., CreateFile, ReadFile, WriteFile, etc.) and transaction operations (e.g., transaction ID and transaction context object, transaction callback, etc.) (Christiansen, Thind, Pudipeddi, Groff, Cargille and Dewey 2011).

A recent work (Alazab, Venkataraman and Watters 2010) demonstrated that a sequence of API calls can be extracted from the binary code of a program, and revealed the file-related APIs commonly invoked by malware: search file, copy file, delete file, get file information, move file, read file, write file and change file attributes. However, there was little explanation of how benign programs differ in their use of the above-mentioned set of APIs.

Shuaifu et al. experimented with API call-based detection on mobile phone devices (Dai, Liu, Wang, Wei and Zou 2010) by intercepting WinCE kernel-level API calls. Their results suggested that this approach effectively detects stealthy malware variants that cannot be detected by commercial antivirus programs.

Tokhtabayev et al. (Tokhtabayev, Skormin and Dolgikh 2008) used API calls with the Petri Net technique to recognise a worm’s propagation engines. They claimed that worms were tied to a limited number of propagation engines, the common behaviour of which they captured. While this approach is good for detecting malware that are propagating, it may suffer from malware that are not. Therefore, it would be better to have a detection mechanism that detects not only worms, but also other malware types.

The specification-based detection approach in (Fredrikson, Jha, Christodorescu, Sailer and Yan 2010) used automatic extraction of malware behaviour. Their technique was based on graph mining and concept analysis, and aimed at the identification of near-optimal discriminative behaviour based on system calls. They produced good True Positive (TP) rates with a 0% False Positive (FP) rate. They compared their approach with a few commercial behaviour-based and signature-based antivirus products, and the results indicated that their method performed better than the others.

Detections based on API call sequences are related to the use of spatial and temporal information in the API call sequences. Ahmed et al. (Ahmed, Hameed, Shafiq and Farooq 2009) captured both spatial and temporal information in the sequences of API calls and used statistical features to improve detection with the aim of identifying the best features for optimal results.

A significant work by Kolbitsch et al. (Kolbitsch, Comparetti, Kruegel, Kirda, Zhou and Wang 2009) also utilised both spatial and temporal information in API call sequences. They systematically collected malware behaviour in a controlled environment from a list of malware samples. They built a behaviour graph to detect malicious patterns of system call invocations, which allowed detection regardless of the order of the calls, as part of their attempt to overcome the problem of evasion attacks.

Bayer et al. (Bayer, Comparetti, Hlauschek, Kruegel and Kirda 2009) proposed an approach for classifying malware samples so that malware with similar behaviour can be grouped together. Considering that the amount of malware today is too huge to handle in a short time via manual work, an efficient and automated technique is needed to overcome this challenge. Using a tainting technique, they extracted behavioural profiles, i.e., system calls and their related dependencies. The profiles were then linked to several OS objects, and statistical associations were derived to classify the malware samples. The results indicated that they classified more than 75 thousand malware samples within a few hours rather than weeks.

Another work by Bayer et al. (Bayer, Kirda and Kruegel 2010) proposed an approach to improve the efficiency of a dynamic malware analysis tool. They believe that most new malware are just mutations of existing ones; therefore, improving the analysis would help malware analysts cut the time needed to analyse the ever-increasing volume of malware. They created binary and behaviour profiles of polymorphic malware, relying only on a small sample of the polymorphic malware within each cluster. Each selected malware sample is compared with the profile using approximation and distance relationship measures. Even though their approach shows a huge reduction in analysis time, because only a small set of malware is analysed when deciding for the full set, the technique is susceptible to malware designed with the capability to hide or delay its malicious activities.

2.5.2.1 Executable in Windows

In computing, an executable file is a computer program which performs indicated tasks according to encoded instructions (Merriam-Webster 2012). In Windows OSs, an executable file is a variant of the Common Object File Format (COFF) (Microsoft 2012b), a specification of a format for executable and object files used on Unix-based systems (Texas Instruments 2009). An executable in Windows carries the EXE file extension, originally used for DOS executables. The 16-bit MZ DOS executable was the early format for DOS; the file begins with the characters MZ in ASCII and can be executed in DOS and 9x-based OSs.

There are several types of EXEs in Microsoft Windows OSs (Microsoft Corporation 2004a) such as:

• the 16-bit New Executable (NE);
• the 32-bit Portable Executable (PE); and
• the 64-bit PE (PE32+).

The 16-bit NE file begins with NE in ASCII and can be executed under 16-bit and 32-bit Microsoft OSs. The 32-bit PE file contains both PE and MZ in ASCII at its beginning, which means this type of EXE can run in 32-bit Windows and in DOS. The 64-bit PE file is intended for 64-bit Windows OSs. Usually the older types of executable can run in newer OSs because of the support or special environment provided in that version of the OS. For example, under a 64-bit Microsoft OS, a 32-bit PE can still run (Microsoft Corporation 2010a) because support is provided, but a 16-bit NE cannot; hence, the latter cannot be run in a 64-bit OS. Besides PE on Microsoft Windows, several other executable formats are commonly used, such as ELF on Linux/Unix (TIS Committee 1995) and Mach-O on Mac OS X (Apple Inc 2009).

The PE format is a data structure that describes dynamic library references for linking, API export and import tables, resource management data and thread-local storage data. The PE format is used not only for executables but also for DLL, SYS, OCX and CPL files (Pietrek 2002a). A PE file is composed of several parts, such as the MS-DOS 2.0 header, the PE header, the section table and the import pages (Pietrek 2002a; Pietrek 2002b; Microsoft Corporation 2010b).

The MS-DOS 2.0 header is used for compatibility purposes only and usually displays a special error message, i.e., when an executable that is supposed to run in Windows is executed in a DOS environment. The PE header contains data describing the file itself, including the PE characters in ASCII and the size of the optional data it has. If the file is a 64-bit PE, both 32-bit and 64-bit header versions are provided. The section table is an array of section headers, each containing information about a section, such as its location and size. The import pages consist of several parts, among them the imports section, exports section, base relocations and resources section.

A PE generated by a .NET compiler contains a slight variation in its format, which includes a Common Language Runtime (CLR) Header and a CLR Data section. When executing a .NET PE, the OS loader passes execution to the CLR via a reference in the PE’s Import Table section, which subsequently loads the PE’s CLR Header and Data sections. What is unique in a .NET PE is that the CLR Data section has two essential parts, i.e., metadata and Intermediate Language (IL) code. IL is not machine code, but is similar to Java bytecode. Regardless of the language used to write a .NET program, whether C#, VB.NET, J# or another supported language, the business logic in the program is transformed into common IL (Pietrek 2002b). For a comprehensive discussion of PE, readers may refer to (Microsoft Corporation 2010b).
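As an illustration of how these header fields distinguish the executable types described above, the following Python sketch inspects the MZ signature, the e_lfanew pointer at offset 0x3C and the optional header magic. It is a minimal, simplified check rather than the parsing used in this thesis, and the example path is hypothetical.

import struct

def identify_executable(path):
    """Classify a Windows executable by inspecting its headers.

    Returns a label based on the magic values described in the
    PE/COFF specification (MZ, NE, PE32 or PE32+).
    """
    with open(path, 'rb') as f:
        data = f.read()

    if data[:2] != b'MZ':                       # every DOS/Windows EXE starts with 'MZ'
        return 'not an MZ executable'

    # Offset 0x3C (e_lfanew) points to the new-format header, if any.
    e_lfanew = struct.unpack_from('<I', data, 0x3C)[0]
    if data[e_lfanew:e_lfanew + 2] == b'NE':
        return 'NE (16-bit New Executable)'
    if data[e_lfanew:e_lfanew + 4] != b'PE\x00\x00':
        return 'MZ (plain DOS executable)'

    # The optional header follows the 20-byte COFF header; its magic
    # distinguishes 32-bit (0x10B, PE32) from 64-bit (0x20B, PE32+) images.
    magic = struct.unpack_from('<H', data, e_lfanew + 4 + 20)[0]
    return 'PE32+ (64-bit)' if magic == 0x20B else 'PE32 (32-bit)'

print(identify_executable(r'C:\Windows\notepad.exe'))   # path is illustrative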


There are issues concerning what is executed in a PE. OSs usually control access to computer resources by forcing executables to invoke system calls/API calls in order to access those resources (Stallings 2011). However, the execution of rootkits or malware with kernel-level operations may jeopardise this restriction. Native applications in Windows also have the EXE extension. They are called native because they are developed based on the Native API rather than the Windows API (Russinovich 2006). Hence, their execution is not bound to the security context of the OS environment and usually involves kernel-level operations (Coombs 2005). If this happens, additional privilege escalation is possible (Rouse 2010; Nichols 2007).

2.5.2.2 APIs in Windows

APIs in Windows are dynamic link libraries (Microsoft 2012d), which are essential core components of the OSs (Microsoft 2012h). There are two types of API in Windows OSs: the Windows API and the Native API. The Windows API is a set of functions exposed to Windows-based application developers at Microsoft and third-party software vendors. Its purpose is to ease software development within the Windows OS environment so that applications are developed in a way that is consistent with the OS operations and its standard user interface (Microsoft 2012a). Furthermore, it helps the OS control access to resources more efficiently.

The Windows API contains many functions and they are packaged into several functional categories (Microsoft 2010).

Administration and Management functions consist of APIs related to installation, configuration and service facilities for applications and the OS.

Diagnostics functions assist programmers to trace and correct problems in applications or in the OS. They also provide facilities to monitor performance.

Graphics and Multimedia functions contain support for applications to integrate multimedia elements and generate graphical output (by using the graphics device interface (GDI)) to media such as displays, physical devices (e.g., printers) and logical devices (e.g., memory devices).

Networking functions enable applications to communicate and to access network resources such as shared directories, printers and other network objects across computers in the network. They also provide support for various technologies such as Windows Sockets, Remote Procedure Call (RPC), Simple Network Management Protocol (SNMP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Gopher, the Domain Name System (DNS), etc.

Security functions provide the implementation for various security related issues such as authentication, authorisation and auditing on computer and network resources.

System Services functions offer a vast set of functions for applications to access computer resources and the internal components of the underlying OS, including files, directories, input and output (I/O) devices, the registry, processes, threads, etc.

Windows User Interface functions enable developers to create applications with a common appearance and behaviour which are stable and consistent with the OS environment and other applications. Some of these functions facilitate access to objects provided by the Windows shell and also provide control of mouse and keyboard events as inputs for processing. Developers can quickly include in their applications the standard common dialog boxes for opening and saving files by using the common dialog box library, and advanced controls such as status bars, progress bars and tabs by using the common control library.

For a comprehensive explanation on Windows API, readers may refer to the following source (Microsoft 2010).

Meanwhile, the Native API is primarily used to develop applications beyond the scope and control of the underlying OS, e.g., programs that run during system boot. Some Windows APIs also use the Native API internally. The Native API has around 250 functions, of which only around 25 are documented while the rest are not widely known (Russinovich 2006). Native APIs are grouped into several functional categories, and readers may refer to (Russinovich 2004; Chen 2009) for some useful explanations of the topic.

A problem with the Native API is that a malware writer with knowledge of these APIs could create a trapdoor which subsequently allows arbitrary code to be injected or common processes to be bypassed (Lack 2003; Rogers 2003). When this happens, the protection rings mechanism could be compromised. Protection rings are implemented in hardware (i.e. the CPU) to provide protection against hardware access faults or malicious access to hardware resources (Shinagawa, Kono and Masuda 2000). For instance, Windows XP’s kernel mode is in ring 0 and user mode is in ring 3; access to kernel mode requires privileged mode (Russinovich and Solomon 2005). The default execution of a Native API call is in ring 3.

With the introduction of the .NET Framework on 13 February 2002, Microsoft aimed to replace the Windows API with the .NET Framework API. For most of the Windows API, newer implementations are provided in the .NET Framework API (Microsoft Corporation 2004b). Applications developed in .NET are encouraged to use the .NET Framework API, although the Windows API can still be used within the .NET development environment through Platform Invocation Services (Getz 2002; Microsoft 2012g).

2.5.3 Concluding Remarks on Detection Approaches

The demand for a new, reliable and dependable detection approach has been voiced in many publications (Ford 2005; Tavallaee, Stakhanova and Ghorbani 2010; Vega 2011) which have stressed the danger of present and future malware, as well as the gaps in existing detection approaches. To date, we have not seen any work that systematically addresses both the propagation and execution of malware. We would like to discover how malware missed by one detection approach could be detected by the other. As we share the vision outlined by (Tavallaee, Stakhanova and Ghorbani 2010), we provide details of these two approaches for future replication.


2.6 Influence of Human Immune System (HIS)

In this section, we briefly describe the mechanisms of the HIS (Delves, Martin, Burton and Roitt 2006) which have inspired many researchers to adopt similar characteristics in computer defence. There are a number of other research studies, including those of (Kim, Greensmith, Twycross and Aickelin 2005; Fu, Yuan and Hu 2007), which have attempted to explore HIS theories (Matzinger 1994; Dasgupta 2006), including the Self/Non-self Theory and the Danger Theory, and apply their essential processes in malicious code detection systems.

In the HIS, B cells are white blood cells that play a large role in the humoral immune response, whereas T cells have roles in the cell-mediated immune response, which is part of the adaptive immune system. The major task performed by B cells is to make antibodies.

Natural Killer (NK) cells, a variety of lymphocyte, recognise a pathogen when the Major Histocompatibility Complex (MHC) molecules shown on a cell's surface are detected as non-self. Damaged or infected cells tend to show unusual levels of MHC.

Forrest et al. (Forrest, Perelson, Allen and Cherukuri 1994) adapted this idea for their Self/Non-self Theory. Benign cells may show low levels of MHC for reasons such as the cell being too old or damaged. The NK cell brings these cells to lysis (destruction by rupture) or apoptosis (dismantlement for complete and safe destruction).

NK cells are cell killers activated when they receive one of the following signals.

• Cytokines: A stressed cell may release uric acid to inhibit a pathogen entering through its cell wall. NK cells detect this acid and respond against pathogens situated in the surrounding area of the cell. Matzinger's Danger Theory (Matzinger 1994) adopted this process.
• FC-Receptor: At the site of infection, a large number of white cells engulf pathogens and repair infected cells.
• Activating and inhibitory receptors: NK cells have receptors that connect to nearby cells to regulate their destructive activities.

The HIS’s defence has inspired many researchers to adopt similar characteristics in computer security, and there are several models of the Self/Non-self Theory (Dasgupta 2006). Pioneering works by Forrest et al. (Forrest, Perelson, Allen and Cherukuri 1994; Forrest, Hofmeyr and Somayaji 1998) used the nature of peptides to allow the recognition of self and non-self, using the input vector as analogous to the peptide. Their negative selection algorithm involves two stages: generation and detection. In the first, a normal profile is recorded under the assumption that there are no intrusive activities; once the normal profile is sufficiently developed, candidate vectors are compared with the self-sample, any candidate that matches self is discarded, and the remaining (non-self) vectors become detectors. In the detection stage, the detector compares these recorded vectors with the incoming vectors and any matching pattern is considered anomalous.
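As an illustration of the two-stage negative selection idea described above, the following Python sketch uses fixed-length string vectors and an exact-match rule; the original algorithm of Forrest et al. uses an r-contiguous-bits matching rule, and the vectors here are purely illustrative.

import random

def generate_detectors(self_set, vector_len, alphabet, n_detectors):
    """Generation stage: random candidate vectors that match nothing in
    the self profile are retained as detectors (negative selection)."""
    detectors = set()
    while len(detectors) < n_detectors:
        candidate = ''.join(random.choice(alphabet) for _ in range(vector_len))
        if candidate not in self_set:          # discard anything that matches self
            detectors.add(candidate)
    return detectors

def detect(detectors, incoming):
    """Detection stage: any incoming vector matched by a detector is
    flagged as anomalous (non-self)."""
    return [v for v in incoming if v in detectors]

self_profile = {'AAB', 'ABA', 'BAA'}           # recorded during intrusion-free operation
detectors = generate_detectors(self_profile, vector_len=3, alphabet='AB', n_detectors=5)
print(detect(detectors, ['AAB', 'BBB', 'ABB']))   # flags the two non-self vectors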

There are a number of other research studies, including those of (Kim, Greensmith, Twycross and Aickelin 2005; Fu, Yuan and Hu 2007), which have attempted to explore ideas of mapping between malicious code detection and the Danger Theory. In (Kim, Greensmith, Twycross and Aickelin 2005), the authors adopted the functionality of dendritic cells (DCs). Firstly, the DCs forward a collected protein (antigen) together with its environmental context to effector T cells. When passed to the lymph node, a DC displays an antigen with context signals, and T cells that have a complementary receptor for the antigen are activated for immunisation. If a cell is stressed because danger is present within a particular tissue, nearby DCs will produce inflammatory cytokines and the cell will undergo lysis or apoptosis. Additionally, the authors included the idea of pattern recognition receptors, which are available on DCs and can detect certain well-known pathogens, such as bacteria, that have particular proteins called pathogen-associated molecular patterns (PAMPs) learnt over a long time. Mature DCs activate the immune response and semi-mature DCs suppress it. These activation and suppression processes regulate and balance the immune response activity.

In the work of (Kim, Greensmith, Twycross and Aickelin 2005), a PAMP can be regarded as a security policy violation. The Safe Signal corresponds to normal behaviour, whereas the Danger Signal is equivalent to a harmful symptom, such as a sharp spike in memory or Central Processing Unit (CPU) usage. A cytokine is equivalent to a system’s load average, which can change as a result of one or more symptoms. An antigen is regarded as an exploited system call.

There are several models of the Self/Non-self Theory and many of their algorithms have improved since the pioneering works of Forrest et al. (Forrest, Perelson, Allen and Cherukuri 1994; Forrest, Hofmeyr and Somayaji 1998) which can also be seen in (González, Dasgupta and Niño 2003; Ji and Dasgupta 2004).

2.7 Conclusions

The IDS has been an active research area for several decades. However, it has become more popular since the evolution of the PC and subsequent easy access to the Internet, through which high numbers of intrusion incidents are continually being reported. This research area will remain relevant as intruders continuously update their intrusion techniques and tools. In this chapter, we surveyed trends in malware detection and IDS research, as well as a few theories relating to the HIS which have inspired the development of detection techniques in IDS and malware research. We also outlined a number of research issues in the related areas.


Chapter 3

Malicious Code Detection Architecture Inspired by Human Immune System

Part of this work is based on Marhusin, M. F., D. Cornforth, and H. Larkin (2008). Malicious Code Detection Architecture Inspired by Human Immune System. Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD'08): pp. 312-317.

3.1 Chapter Objectives

Malicious code is an evolving threat to computer systems worldwide. It presents challenges for attackers to improve their techniques, and for researchers and security specialists to improve their detection accuracy. We present a novel architecture for an effective defence against malicious code attack, inspired by the human immune system (HIS). The next section describes our architecture in depth and, before concluding this chapter, we outline several ways in which our idea could fit a number of situations. We also highlight the challenges this architecture might face in order to achieve its objectives.

3.2 System Architecture

Malware can only do harm if it is allowed to execute and propagate. In this study, we present a malware detection system which focuses on two paradigms, that is, detecting malware during its 1) execution and 2) propagation; accordingly, there are two main detectors in it. The first monitors executables as they execute and the second monitors their propagation activities.

This proposed malicious code detection system is inspired by the theories associated with the HIS. In execution, we are interested in pattern recognition using vectors derived from API calls rather than from the code signature itself, as suggested by Forrest et al. (Forrest, Perelson, Allen and Cherukuri 1994; Forrest, Hofmeyr and Somayaji 1998). In propagation, we are interested in session traffic as implemented by (Khayam, Radha and Loguinov 2008; Shafiq, Khayam and Farooq 2008b).

Figure 3.1: Detection architecture of the system

Figure 3.1 illustrates the general components in our proposed architecture. All programs that run are monitored by the two main detectors. The first monitors the activities of each individual executable: as an executable executes, an instance or agent of the first detector will hook itself with the executable with the aim of finding anomalies in the API call sequences of each program. The second monitors the session activities of all the programs in the computer, looking for anomalies in the session traffic moving in or out of the computer. Thus, when an executable starts, its API calls and session-based traffic are monitored. The monitoring of API calls is localised to that individual process/program. However, all session-based traffic generated by processes/programs is combined as a stream of all sessions and this stream is monitored for anomalies.


As well as malware and benign profiles, the first detector has another type of benign profile which is unique to each individual program. Although at a glance it appears too complex to have two types of benign profiles, we explain when each of them is used in Section 3.3.1 and Section 3.3.2. The second detector also has malware and benign profiles.

Analogous to the way the thymus provides immunity for a newborn baby, the initial malware and benign profiles for the first detector are obtained and loaded beforehand. This could be done by running as many existing malware samples as possible within a test bed to collect API call patterns; API call patterns of common user applications on a specific platform are also obtained. In a commercial environment, although it may be costly, an antivirus company could achieve this by collecting API calls of common benign software used worldwide. Whether these API calls could be used to generate a global benign profile that can be discriminated from worldwide malware profiles is an open question; it requires a great deal of research effort and is beyond the scope of this thesis.

Therefore, the program profiles that need to be included depend on the programs installed or available on the computer. The refinement of profiles is crucial for reducing their sizes and profile-matching overhead. This results in two profiles, which are used for detection of malware during its execution. For detection at the propagation level, the malware and benign profiles are built based on the concept of ‘recency’, which assumes that normal traffic could change over time and that differences from normal recent traffic could be indicators of malware (the details are provided in Chapter 4).

We introduce two phases of program detection: 1) Adolescent; and 2) Mature. The two main detectors are operational in both phases. The propagation level detector functions the same way in both the Adolescent and Mature Phases, but the execution level detector works slightly differently. This two-phase detection architecture is designed to define the practical uses of the two main detectors, particularly the execution level detector, in a real environment. Each program must separately and independently participate in these two phases.


3.3 Two-phase Malware Detection

Our proposed detection model consists of two phases, Adolescent and Mature. In the former, all programs are constantly watched for malware activities during their execution and propagation. Mature programs, which are considered safe, are switched to the Mature Phase, which is a more relaxed form of monitoring; any anomalous behaviour moves them back to the Adolescent Phase.

Any program that survives the Adolescent Phase is expected to eventually enter the Mature Phase (provided the application is continually used and not run only once). The occurrence of a deviation is analogous to the process of an NK cell recognising a pathogen when the cell's fragmented peptides shown on its surface are detected as non- self. The following two sub-sections describe these two phases.

3.3.1 Adolescent Phase

In the Adolescent Phase (see Figure 3.2), the first detector (at the execution level) monitors every individual executable as it begins its execution. An instance or agent of the first detector hooks itself to the executable, which allows the retrieval of the API call sequences triggered by the executable. These long sequences of API calls are organised into n-grams and the detector looks for anomalies in those of each program. The program’s n-grams are compared with the malware and benign profiles, both of which can be updated from time to time when necessary, either automatically or manually. If the n-grams suggest the executable is more likely to be malware, it is blocked, deleted or quarantined so that further actions can be taken. The signature obtained from the matched executable can be used to scan for related files in directories, which allows for the complete removal of the malware; this process is similar to the technique used by existing commercial signature-based antivirus software. If the n-grams suggest a benign execution, the program is allowed to execute and the Profile Builder collects and updates the program’s own profile. The process of building a profile is explained in Section 6.3.2.
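To make the n-gram comparison concrete, the following Python sketch extracts overlapping n-grams from an API call trace and scores them against benign and malware profiles. The scoring rule (the fraction of the trace's n-grams present in each profile) is a simplification of the statistical measure used later in this thesis, and the API names and profiles are illustrative.

def ngrams(calls, n=4):
    """Overlapping n-grams over a sequence of API call names."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]

def classify(calls, benign_profile, malware_profile, n=4):
    """Compare a trace against both profiles; a simple closeness score is
    the fraction of the trace's n-grams present in each profile."""
    grams = ngrams(calls, n)
    if not grams:
        return 'unknown'
    benign_score = sum(g in benign_profile for g in grams) / len(grams)
    malware_score = sum(g in malware_profile for g in grams) / len(grams)
    return 'malware' if malware_score > benign_score else 'benign'

trace = ['NtOpenFile', 'NtReadFile', 'NtWriteFile', 'NtClose', 'NtCreateProcess']
benign_profile = set(ngrams(['NtOpenFile', 'NtReadFile', 'NtWriteFile', 'NtClose'], 4))
malware_profile = set(ngrams(['NtOpenFile', 'NtWriteFile', 'NtCreateProcess', 'NtClose'], 4))
print(classify(trace, benign_profile, malware_profile))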

The second detector (at the propagation level) monitors the session traffic generated

by all programs in the computer, checking for anomalies. The session traffic is organised into a certain window setting and compared against the profiles. Normal traffic is allowed to pass, but anomalous traffic is blocked and similar actions to those of the first detector can be taken.

The process of upgrading each program from Adolescent to Mature is conducted on a program-by-program basis. A program will move from Adolescent to Mature when its program profile becomes mature, which depends on the following:

• The cumulative amount of time the program has been running.
• The spread of an application's executed code. Naturally, not all code within an application always runs.
• The number of failed attempts to add a new n-gram of API call sequences or new session-based traffic exceeding a threshold, which means that harvesting new patterns has become exhausted.

Each of the above rules relies on its own threshold. However, the process for determining these thresholds is beyond the scope of this thesis.
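A minimal sketch of the maturity check implied by the three rules above is given below; the threshold names and values are placeholders, since their determination is outside the scope of this thesis.

def is_mature(runtime_hours, code_coverage, failed_new_patterns,
              min_hours=20.0, min_coverage=0.6, min_failed_attempts=50):
    """Upgrade a program from Adolescent to Mature only when all three
    maturity rules are satisfied (threshold values are placeholders)."""
    return (runtime_hours >= min_hours and
            code_coverage >= min_coverage and
            failed_new_patterns >= min_failed_attempts)

print(is_mature(runtime_hours=35.0, code_coverage=0.7, failed_new_patterns=80))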

Figure 3.2: Processes in Adolescent Phase


3.3.2 Mature Phase

The processes of the detection system in the Adolescent and Mature Phases are very much alike, except for detection at the execution level (see Figure 3.3).

Figure 3.3: Processes in Mature Phase

Once an application has entered the Mature Phase, the system does not compare the n-grams of its API call sequences with the full malware profile, nor is the program profile updated. Consider this analogy: a person meets new people each day and, as that person forms a longer relationship and the spread of situations that the two are involved in increases (i.e., the more aspects of the other's life a person witnesses), the more that person is trusted. Once people are trusted, they are in the Mature Phase. Less checking of them is conducted and more trust is placed in them to execute potentially dangerous actions (such as giving them a spare key to a house). If they start behaving differently from expected, more wariness of them is displayed and they become less trusted. Naturally, someone trusted could still harm a person, and the same applies to applications in the Mature Phase. All that can be done in this case is to roll back damaged files where possible and manually repair other damage (such as malicious code utilising an email address list to propagate a viral payload to others).

The second detector still works the same way as it does in the first phase. It monitors the session activities of all programs in the computer, and anomalous program activities cause them to be blocked. If the program is on the local machine, it is deleted

or quarantined so that further actions can be taken.

3.4 Threat Model

Signature-based detection is incapable of dealing with current high-end and evasive malware threats. It is believed that behaviour-based detection could overcome this problem (Cavallaro, Saxena and Sekar 2007; Moser, Kruegel and Kirda 2007b). We have already described the aim of the detection system in Chapter 1, that is, to detect malware as they try to propagate and execute by monitoring their behaviour at those times. In this section, we describe the threats this detection system is capable of detecting, and those which it may not be able to detect. Lastly, we discuss the possible threats that might be faced by the detection system itself and how to react to them.

3.4.1 What the System is Designed to Detect

As has been previously mentioned, if a detection system looks for evidence only at the propagation level (network), it misses useful information at the execution level (API calls), because evidence of an attack might not be visible on both levels (Kruegel, Mutz, Robertson and Valeur 2003). A classical virus problem is that it executes its payload and renders a computer unusable; a good detection system at the system call level is needed to detect its presence. Conversely, a brand new malware does not necessarily damage the computer but may establish connections with remote computers for various purposes (www.f-secure.com 2010); in this case, detection at the network level is required if we want to detect the malware while it propagates. Having two detectors, one at each of the execution and propagation levels, we need only one of them to detect the presence of malware. Further actions, such as 1) cleaning up the malware, 2) identifying the individuals or organisations involved and 3) implicating the perpetrators, fall under programming issues and computer forensics (CF) techniques.

Although the experiments conducted in Chapters 4 and 6 use datasets of mostly binary executables, in principle the proposed detection architecture could detect malware in other forms, such as scripts, compiled Java JARs and macros. The first

detector could monitor the respective runtime environments, interpreters and/or applications that run them. Any anomalous behaviour can then be linked to its respective source; for instance, a VBScript file is executed by a specific program (Microsoft 2012i). The first detector could monitor the executable and identify the script file being executed, so that a malicious activity detected in the executable causes the script to be quarantined. However, a dedicated evaluation of these types of malware is beyond the scope of this thesis.

A program that performs malicious activities can be detected as early as its first run in the Adolescent Phase, when the n-grams of its API call sequences or its session-based traffic are compared with established profiles. Some programs may start showing dangerous behaviour at a certain point in time, when an attacker uses an existing benign application as a means of executing a payload within the benign application’s execution context (Cavallaro, Saxena and Sekar 2007; Cavallaro, Saxena and Sekar 2008). As it is unusual for a normal program to have malign features, such as key logging, downloading executables or mass mailing, unless it is installed for that purpose, recognising malware is possible at the time the malicious behaviour appears. If a program starts showing anomalous behaviour in the second phase, the first detector switches the executable back into the first phase; if malicious behaviour is confirmed, the malign executable is blocked and processed for further actions, and any affected files are restored to their original condition. Similar action is taken by the second detector if anomalous traffic is detected in the second phase.

3.4.2 What the System May not be Able to Detect

It might be possible for a malware writer to circumvent this detection model; this remains subject to further evaluation. As the upper hand is always on the malware writer’s side, a malicious payload can be crafted in various cunning ways so that its presence is hidden. In relation to our approach of detection during propagation, evasion could be possible using a technique called a mimicry attack (Wagner and Soto 2002; Kruegel, Balzarotti, Robertson and Vigna 2007), which causes the malicious traffic pattern to resemble benign ones. If the attack traffic looks just like benign traffic

in terms of its pattern, it could possibly evade detection.

In relation to our approach of detection during execution by means of API call sequences, the attacker can re-order the sensitive parts of the sequences to make new ones which appear unlikely to be malicious (Kolbitsch, Comparetti, Kruegel, Kirda, Zhou and Wang 2009).

Also, a malware writer could initially avoid detection at the system call level by using the syscall proxying technique, in which a program executes on one computer while all related system calls are invoked on a remote computer (Garfinkel 2006). However, this is likely to generate a high volume of anomalous network traffic, which would be picked up by the detector at the propagation level.

It is also possible for malware to evade detection if its execution involves only an insignificant number of API call invocations. A worst-case scenario is a malware designed to execute without invoking any API calls at all. Although we have yet to discover such malware in current OS environments, it could possibly be crafted.

3.4.3 Possible Threats and Responses

A potential threat to this architecture is that, if a rootkit (Hoglund and Butler 2005) infects a computer, we cannot guarantee that detection at both the network and API call levels could not be compromised or altered by it. It is an uphill challenge to determine if a particular malware is not a rootkit, unless its signature has been discovered via manual work by a virus analyst. It is important for a detection system to seriously consider whether a computer has been compromised with rootkits. If so, cleaning them up might require restoration of the previous image on a computer’s hard disk or, in an older fashion, formatting of the hard disk and reinstallations of the OS, drivers and application software.

Without concrete experiments, it is difficult to gauge whether a system can detect malware because, after all, it is subject to the quality of the detection algorithm, a good

selection of data features for analysis, and knowledge of the system’s ability to adapt to new advancements in evasion techniques in the future.

3.5 Remarks

There will be cases in which a legitimate program is misclassified as malware. Such programs are often related to hacking and penetration tools, and an end-user may be able to manually insert a program into the Mature Phase at their own risk.

Changes and deletions to files are logged. A file deletion causes the file to be backed up first; if the delete operation is not part of a malware pattern, the backup copy is also deleted, otherwise the original file is restored. We propose the use of a versioning file system, which uses a copy-on-write approach to file modification, as the underlying format for disks. It stores the original contents of files for a certain period so that unwanted changes can be rolled back. Newer OSs offer similar technology (Shadow Copy has been available in Microsoft Windows since Windows XP SP1, and Mac OS X Leopard has Time Machine), which makes a rollback after malicious code has executed an easy task.

The proposed architecture of our system is a generic model for a platform- independent computer defence against malicious code attack. Although we use the term ‘API call’ which refers to a Windows environment, a similar concept for other OSs is still applicable.

3.6 Challenges and Conclusions

There are vast resources available on the Internet for malicious code writers to take advantage of in creating new malware. However, if a new malware exhibits behaviour already known in the system’s malware profile, its execution will be detected.

In this chapter, we introduced our overall architecture for an effective solution against malware. Inspired by the HIS, our system detects malware, even if it was not detected earlier, by looking at patterns in the program’s API call sequences or the session-based traffic.


Our architecture poses the following challenges.

• The ability to learn from past malware and to anticipate completely new malware behaviour; profiling the first variant of a malware family may help mitigate its later variants.
• How to minimise the impact on a computer's performance of hooking its OS files to obtain as many relevant system calls as possible. Unhooking selected API calls on OS processes could be a solution, on the assumption that these processes are clean, until a desirable performance is achieved; however, this assumption can be wrong when using a pirated copy of software or a modified installer. Together with an optimised classifier, reductions in the profiles could also enhance the system's speed.
• The ability of the system to detect propagation at a very low rate, or propagation that looks like a benign pattern. This is a common problem faced by anomaly-based detection of propagating malware (Khayam, Radha and Loguinov 2008; Shafiq, Khayam and Farooq 2008b). A careful algorithmic design of session-based or API-based detection could possibly overcome this issue.
• It is possible that a benign executable has been compromised to execute malware, which may happen in both the Adolescent and Mature Phases. If so, it is very difficult to decide whether malware really exists in the benign process, or whether it is just a false alarm, which we know will annoy end-users and cause the detection system to be considered unreliable. In this case, an end-user would usually assume that such an exploited benign application would always execute benign instructions and allow the process to continue. The question is how to convince an end-user that a benign executable has been compromised.

No experiment was conducted to evaluate the effectiveness of this architecture as a fully fledged behaviour-based malware detection system in a dedicated environment. However, we model and test the detectors, which detect malware at the execution and propagation levels separately, in subsequent chapters. A malware that self-propagates poses a much greater danger to a network than one that does not. If it is a novel type and manages to evade present detection systems, it can spread easily, reach most Internet-connected computers around the world and fulfil its objectives, as demonstrated by Code Red, Code Red II, Melissa, Witty, Nimda, Nachi, Santy, SQL.Slammer, MyDoom, Stuxnet, DuQu and Flame (Yegneswaran, Barford and Ullrich 2003; Ford 2005; Hypponen 2012). Hence, we begin our investigation into detection at the propagation level and then look at detection at the execution level.


Chapter 4

Towards Effective Detection of Malware at Propagation

Part of this work has previously appeared in

Marhusin, M. F., C. Lokan, D. Cornforth, and H. Larkin (2009). A Data Mining Approach for Detection of Self-Propagating Worms. Third International Conference on Network and System Security (NSS'09): pp. 24-29.

4.1 Chapter Objectives

There are cases in which malware possesses features of combined categories; however, most malware that propagates is categorised as a worm. Since the objective of the detector discussed in this chapter is to detect malware that propagates, the focus is mainly on worms. In this chapter, we evaluate our system against several worms, i.e., Blaster, Dloader-NY, Forbot-FU, MyDoom-A, Rbot-CCC, Rbot-AQJ, Sdbot-AFR, CodeRed II, Witty, SoBig.E, Zotob.G and a simulated worm. The main contribution is that we provide an insight into the importance of several detectors working together to achieve a good mechanism for detection at a malware’s propagation stage. As malware types vary, this project also outlines which worms can and cannot be fully detected by identifying the characteristics of their behaviours. The main aim of this malware detection at the propagation level is to achieve high True Positive (TP) detection rates while minimising the False Positive (FP) rates. In other words, we want the system to reliably detect worms while not raising too many false alarms, which are situations in which benign software is wrongly detected as malware.

This detection is based on maintaining a window of recent session-based traffic, against which a new session is compared. We test the detection technique from the perspectives of a time window (that is, the window may contain all traffic in the last N seconds and the number of sessions may vary) and a fixed window (that is, the number of sessions is fixed but the time span each covers may vary). In both techniques, we report some baseline and detectors’ performances, and compare the results with related work in the literature.

4.2 Related Work

The work in this chapter is based on a similar technique in the literature (Khayam, Radha and Loguinov 2008), which used a KL divergence measure to capture profiles of both benign and worm traffic. It took into account port values and observed the divergence of port histograms between observed traffic and recorded profiles. Although this technique achieved 100% accuracy at most endpoints, it suffered from worms that propagated at very low rates. Its best result was achieved by using SVM but, as SVM requires more memory than other techniques and is algorithmically complex, it is not really a practical solution (Burges 1998; Shafiq, Khayam and Farooq 2008b).
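As an illustration of the kind of measure used in that work, the following Python sketch computes the KL divergence between an observed port histogram and a recorded benign profile; the bins, smoothing constant and traffic values are assumptions for the example and do not reproduce the referenced implementation.

import math
from collections import Counter

def port_histogram(ports, bins):
    """Normalised histogram of port usage over a fixed set of bins."""
    counts = Counter(ports)
    total = sum(counts.get(b, 0) for b in bins) or 1
    return {b: counts.get(b, 0) / total for b in bins}

def kl_divergence(observed, profile, eps=1e-6):
    """D_KL(observed || profile); eps avoids log of, or division by, zero."""
    return sum(p * math.log((p + eps) / (profile.get(b, 0.0) + eps))
               for b, p in observed.items() if p > 0)

bins = [80, 135, 139, 443, 445]
benign_profile = port_histogram([80, 443, 443, 80, 445], bins)
observed = port_histogram([135, 135, 135, 445, 135], bins)   # worm-like scanning burst
print(kl_divergence(observed, benign_profile))               # large value signals divergence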

A similar work by (Shafiq, Khayam and Farooq 2008b) suggested that quantifying several traffic features in a fixed time window of 30 seconds could improve detection. Among the features they considered were: 1) the burstiness of session arrivals because worms usually have constant burst patterns; 2) the fact that worms usually cause traffic volumes to greatly increase; 3) the entropies of destination IP addresses, as worm attacks are likely to target several IP addresses from the same set within a window; and 4) the divergence of port distributions which, when worm attacks occur, increases in terms of diversity and intensity.

Their approach achieved very high TP rates (at least 98.7% at one of the 13 endpoints) and very low FP rates (the worst was 0.03%). Although this time-based window approach showed impressive outcomes, the cost of analysing a collection of ports may be much greater than using a fixed window, especially when there is a lot of

traffic. Therefore, we evaluate both fixed and time windows in our detection system.

In the data mining field (Han and Kamber 2006), a window-based technique called the sliding window model is ideal for streamed data, because it observes patterns in recent data, rather than in the entire stream seen to date, based on which a decision can be made. As a long stream of traffic contains patterns that change over time, using the entire stream to make a decision about a current pattern is unreliable. In our study, we adopt the concept of ‘recency’ in benign traffic, which assumes that the traffic could change over time but in a different way from changes caused by malware activities. Also, since the detection of anomalous traffic means the system is in danger (Aickelin and Cayzer 2002), a part of our system then enacts a more thorough detection.

We provide a comparison with recent studies in the literature, in many of which a great deal of emphasis was placed on improving detection accuracies, as described earlier. Some algorithms yielded very high accuracy but were very costly in terms of their execution performances, which we present as a benchmark for future work on this subject.

4.3 The Architecture of the Detection System

The detection system is depicted in Figure 4.1. All incoming sessions are checked by three key detectors, namely the port-based detector (P), the frequency-based/session rate-based detector (which uses session traffic in a time window and in a fixed window, respectively) and the signature-based detector. When a signature in question (q) is to be evaluated, the signature, together with a collection of the most recent signatures, is added into a detection buffer (E). This detection buffer is updated as each new signature emerges.


Figure 4.1: System architecture (the new signature q and the recent signatures E feed the port-based detector P, the frequency-based/session rate-based detector R and the signature-based detector T, whose outputs are combined into a decision)

To simplify the explanations of, and references to, these detectors and other components, we use the notations listed in Table 4.1.

The port-based detector makes its decision by observing the use of ports in the detection buffer. Port perturbation is detected when the current port value belonging to q happens to be the value with the highest frequency in E. A danger alarm is raised, but the final decision still depends on the other detectors’ outcomes. The role of the frequency-based/session rate-based detector is to recognise whether the signature that causes a stress alarm is self or non-self. It monitors the patterns of the signatures in E and raises a danger alarm if the detection buffer has a frequency or session rate higher than the threshold value, which is dynamically updated each time a set of sessions passes. This dynamic update process is described in Sections 4.3.3 and 4.3.4 and its pseudo-code is depicted in Figure 4.3 in Section 4.3.7.

The signature-based detector uses only q to derive its decision, based on its knowledge of recent known benign and worm signatures, which it develops and updates over time by looking at the self and non-self scores. If the benign score is higher than the worm score, the detector reverses the alarms, if any, raised by the first two detectors. There is also an activation-based detection mechanism, activated for only a short time when worms are present, which involves the last two detectors performing additional checking on q. The final decision made by the system is updated as new knowledge is added into this detector. Details of each detector are provided in the following sections.

Table 4.1: List of notations used

P: Port-based detector, in which PT is in the time window and PF in the fixed window; anomaly-based detection of port patterns.

R: Frequency-based/session rate-based detector, in which RT is in the time window and RF in the fixed window.

T: Signature-based detector, a tree-based structure containing a worm tree and a benign tree, in which: TB is the benign tree and TW the worm tree; T returns a value as a result of comparing the five parts of its signature with q and has a threshold of 4/5 for a positive result; TAge defines their maximum sizes; TBval is an integer value returned by the benign tree after searching for its q pattern; TWval is an integer value returned by the worm tree after searching for its q pattern; and TMin is the minimum return value required to allow the negation of alarms.

D: Total number of matches returned by a tree, ranging from zero to five, as a result of comparing the five parts of its signature with q.

E: Detection buffer containing recent signatures, including q.

F: Benign buffer containing recent benign signatures.

q: Signature in question.

t: Detection threshold.

W: Size of the time or fixed window, used to control the size of E.

B: Maximum number of threshold candidates for t, in which BT is in the time window and BF in the fixed window.

A: Detection based on activation.


4.3.1 Detection Buffer

The detection buffer (E) provides information to the other detectors in the system; the decision made is based mainly on the information in E at the time the detection is made. Every time a new signature q is to be evaluated, q is added into E together with a collection of the most recent signatures in an overlapping window. Each time a decision is about to be made on a new q, the previous q acts only as the most recent signature in E. As these signatures are organised in a list according to their sequence of arrival, the newest signature is in the most recent row. There are many ways of establishing E, but the most popular use fixed and time windows.

In the time window approach, the size of E is constrained by the parameter W, which is defined as the time span between the newest and oldest signatures. There can be any number of signatures in E, provided the interval between the newest and oldest is less than or equal to W. When a new signature is added into E, zero, one or more of the existing signatures in E are removed.

In the fixed window technique, signatures are organised into overlapping fixed windows and stored in E, with their total number not exceeding the maximum allowed by the parameter W. Owing to this constraint, as a new signature is added into E, the oldest is removed when, and only when, the total number of signatures in E reaches its limit.
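The following Python sketch illustrates how E could be maintained under the two window constraints; the data structure and parameter values are illustrative rather than those used in our implementation.

from collections import deque

def update_time_window(buffer, signature, timestamp, W):
    """Time window: keep only signatures whose age relative to the newest
    is at most W seconds (zero, one or more old entries may be evicted)."""
    buffer.append((timestamp, signature))
    while buffer and timestamp - buffer[0][0] > W:
        buffer.popleft()

def update_fixed_window(buffer, signature, W):
    """Fixed window: keep at most W signatures; evict the oldest only when
    the buffer is full."""
    if len(buffer) == W:
        buffer.popleft()
    buffer.append(signature)

E_time, E_fixed = deque(), deque()
for t, sig in enumerate(['s1', 's2', 's3', 's4']):
    update_time_window(E_time, sig, timestamp=t * 10.0, W=25.0)
    update_fixed_window(E_fixed, sig, W=3)
print(list(E_time), list(E_fixed))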

A further explanation of W, including its values, is in Section 4.4.

4.3.2 Port-based Detector

The port-based detector (P) aims to detect anomalies through port patterns, using the signatures in E to reach a decision; it is referred to as PT in a time window and PF in a fixed window. The idea behind this detector is that a worm propagating at certain rates using the same or adjacent port numbers within a time or fixed window may signify illegitimate traffic. Port perturbation is detected when the current port value is grouped in a bin that has the highest frequency. The signatures are arranged in their correct sequence and q is the last signature in the list.


The bins are divided into source and destination ports; two or more signatures with the same port value are added into the same bin, while a different port value goes into a different bin. Port collections with high cardinality create a large number of bins and those with low cardinality a small number. We normalise the port values to achieve low cardinality, e.g., if there are signatures with port values of 20, 21, 22 and 23, all of them are treated as 20. A comparison of normalised (to the nearest 10) and un-normalised port values is performed in a later section to ascertain whether there is a significant difference in the detection outcomes.

A perturbation is detected if a local port value, remote port value or both belonging to q turn out to be in a bin with the highest frequency.
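A simplified sketch of this detector is given below; it normalises port values to the nearest ten, bins the local and remote ports of the signatures in E, and flags a perturbation when q falls into a most frequent bin. The bin width and signature fields are assumptions for illustration.

from collections import Counter

def normalise(port, width=10):
    """Group adjacent port values (e.g. 20-29 -> 20) to keep cardinality low."""
    return (port // width) * width

def port_perturbation(buffer, q, width=10):
    """Raise an alarm if q's local or remote port falls in the most
    frequent bin of the detection buffer E."""
    local_bins = Counter(normalise(s['local_port'], width) for s in buffer)
    remote_bins = Counter(normalise(s['remote_port'], width) for s in buffer)
    top_local = local_bins.most_common(1)[0][0]
    top_remote = remote_bins.most_common(1)[0][0]
    return (normalise(q['local_port'], width) == top_local or
            normalise(q['remote_port'], width) == top_remote)

E = [{'local_port': 1034, 'remote_port': 445},
     {'local_port': 1036, 'remote_port': 445},
     {'local_port': 1039, 'remote_port': 445}]
q = {'local_port': 1041, 'remote_port': 445}
print(port_perturbation(E + [q], q))     # True: remote port 445 dominates E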

4.3.3 Frequency-based Detector

The frequency-based detector, denoted RT, is based on time windows. It monitors the frequency of signatures in E, aiming to recognise the heavy bursts of similar traffic in which many worms tend to propagate. The idea behind this detector is that, after a long period of benign traffic, a burst of unfamiliar traffic could be malware. Hence, we need a mechanism that can distinguish between traffic generated by benign software and traffic generated by malware. Unfortunately, various applications generate similar bursting scenarios, such as P2P tools, video-related software and online games. An abrupt change in traffic patterns may also happen when a user runs a new application for the first time. It is a challenge for any detector to differentiate between traffic generated by such benign software and illegitimate traffic.

This detector monitors the arrival rate of the traffic within a time window, and compares the frequency rate with a frequency threshold denoted as t which is a highly dynamic value that self-adjusts as benign signatures pass. t is a threshold with a double data type.

t is calculated based on previously detected benign signatures, regardless of whether these detections are correct, and RT returns true only when the frequency rate in E is higher than t (i.e., R >= t). When a benign signature is detected, it is added into a benign buffer called F. The threshold value (a new value of t) is only recalculated if the computer is turned off or not in use for a long period; unless that happens, F continues receiving the next benign signatures.

When the calculation begins, the signatures in F are grouped into one or more non- overlapping time windows, the sizes of which are represented by W. The result is that we may have zero, one or many windows containing signatures from F and the number in each is added to a common list.

The maximum value derived from the list is added into BT, which is defined as BT = (1, 10, 30, 60), i.e., BT = 10 indicates that there are 10 possible threshold values inserted at the times of their original appearance. BT is a list with a double data type for each of its elements, used to store a number of threshold candidates for t. The latest t is derived from BT and is always picked at the 98th percentile of the total elements in the list; BT is sorted before t is chosen. The latest threshold is not taken at the 100th percentile because, if that value were chosen, the threshold would tend to become too extreme and result in false alarms.

When the system is first deployed, BT is empty. Then, it stores a maximum value taken from the common list when an existing session ends and adds another maximum value whenever a newer session ends.
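The following sketch illustrates how the threshold candidates in BT and the resulting t could be computed; the percentile index rule and the timestamps are assumptions, not the exact procedure of our implementation.

import math

def update_candidates(B, F, W, B_max):
    """Group the benign buffer F (a list of timestamps) into non-overlapping
    time windows of size W and record the busiest window as a new threshold
    candidate, keeping at most B_max candidates."""
    if not F:
        return
    counts = {}
    start = F[0]
    for ts in F:
        key = int((ts - start) // W)
        counts[key] = counts.get(key, 0) + 1
    B.append(float(max(counts.values())))
    if len(B) > B_max:
        B.pop(0)                      # drop the oldest candidate

def current_threshold(B):
    """t = value at the 98th percentile of the sorted candidate list."""
    ranked = sorted(B)
    idx = min(len(ranked) - 1, math.ceil(0.98 * len(ranked)) - 1)
    return ranked[idx]

B_T = []
update_candidates(B_T, F=[0.1, 0.4, 0.9, 5.2, 5.3, 5.4, 5.6], W=1.0, B_max=10)
print(B_T, current_threshold(B_T))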

4.3.4 Session Rate-based Detector

The session rate-based detector (RF) actively monitors the session rate of the signatures in E, aiming to recognise bursts of similar heavy traffic, in which many worms tend to propagate, within a fixed window. The idea behind this detector is that normal users rarely establish a high number of sessions within a fixed window; if a high rate of sessions is established within a fixed window, it may indicate activity triggered by worms. This rate is always compared against t, a highly dynamic value that self-adjusts as benign signatures pass. t is a threshold with a double data type. This detector faces the same challenge as that noted for RT in Section 4.3.3, because legitimate applications also generate heavy bursts of traffic.


RF returns true when the frequency rate in E is higher than t (i.e., R > t). Also, a self-adjusted t is calculated based on previously detected benign signatures, regardless of whether these detections are correct, so that, as a signature is detected, if it is benign it is added into F. t is only recalculated if the computer is turned off or not in use for a long period; unless that happens, F continues to receive the next benign signatures.

When the calculation begins, the signatures are grouped into one or more overlapping fixed windows, the sizes of which are equal to that used by E, and their session rates are added into bins. The maximum value of these bins is added into BF, which is heuristically defined as BF = (1, 10, 30, 60), i.e., BF = 10 indicates that there are 10 possible threshold values. BF is a list with a double data type for each of its elements, used to store a number of threshold candidates for t. The latest t is derived from BF and is always picked at the 98th percentile of the total elements in the list.

4.3.5 Signature-based Detector

The signature-based detector (T) is built from recent known signatures and maintains knowledge about self and non-self traffic by storing it in two separate non-binary trees. The idea behind T is the assumption that new traffic will be at least partially, if not fully, similar to recent traffic. This is in line with human behaviour when using computers for routine tasks or entertainment. A signature is composed of five parts, and the detector bases its decision on how closely a new signature matches signatures previously known or thought to be benign or malware, by considering how many parts of a particular signature match and which particular parts match.

4.3.5.1 Tree Structure for Signature-based Detector

In this section, we describe how signatures are maintained in the self and non-self profiles. We design and build an n-ary tree, using the generic framework developed by Vanderboom (Vanderboom 2008). The ID3 algorithm (Rojach and Maimon 2008) and its variants for decision-tree induction are well developed to operate according to generalised rules; however, we need a customised n-ary tree structure to store the five parts of a given signature, in which each node keeps one part. A simple flat file, database or hash table (Microsoft Corporation 2009) could also be used, but consideration of this is a subject for future work.

Two trees are used, one representing benign traffic (TB) and the other worm traffic (TW), each providing knowledge of recent known signatures.

Our tree-based knowledge structure is an object-oriented type that allows a rich set of properties to be attached to its nodes, thereby making it highly extensible for future work. The lowest nodes can also be merged when two or more have numbers in consecutive order, with the lowest value becoming the lower bound (LB) and the highest the upper bound (UB), to ensure that the size of the tree is always minimised at the lowest-level nodes. This merging further ensures that the total number of nodes in the tree remains optimal; e.g., given the four rules (4.6.1034.1.1158), (4.6.1034.1.1168), (4.6.1034.1.1169) and (4.6.1034.1.1170), we can construct the tree in Figure 4.2. Combining the upper-level nodes will be our future work.

Worm Tree Root
Level 1: Node Type = Direction, Value = 1 (e.g., 4)
Level 2: Node Type = Protocol, Value = 2 (e.g., 6)
Level 3: Node Type = Destination Port, Value = 16 (e.g., 1034)
Level 4: Node Type = Session ID, Value = 4 (e.g., 1)
Level 5: Node Type = Source Port, Value = 8 (merged leaves LB:1158/UB:1158 and LB:1168/UB:1170)

Figure 4.2: Sample worm tree structure

For efficiency, we want the field that varies least to appear at the top of the tree and the fields that vary most at the lower nodes. For the existing dataset, we find that the following ordering is best: direction, protocol, destination port, session ID and source port; this sequence minimises the total number of nodes in the trees. One drawback of a tree-based structure is that two or more child nodes can only be joined through one or more parent nodes, e.g., two rules that differ in their first fields must be stored entirely at their respective nodes since there are no nodes they can share. Although another drawback is the cost of decoding a signature into a tree, this overhead is quickly offset by speedier detection (Abbes, Bouhoula and Rusinowitch 2004).
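As a concrete illustration of this storage scheme, the following is a minimal Python sketch (not the actual implementation, which builds on Vanderboom's generic tree framework) of an n-ary tree that stores the five signature parts in the order above and merges consecutive source-port values at the leaves into [LB, UB] ranges.

```python
class SignatureTree:
    """Illustrative n-ary tree: one level per signature part, leaf ranges merged."""

    def __init__(self):
        self.root = {}  # nested dicts: direction -> protocol -> dst_port -> session_id -> leaf ranges

    def insert(self, direction, protocol, dst_port, session_id, src_port):
        node = self.root
        for part in (direction, protocol, dst_port, session_id):
            node = node.setdefault(part, {})
        ranges = node.setdefault('src', [])
        for r in ranges:                              # merge consecutive source ports into [LB, UB]
            if r[0] - 1 <= src_port <= r[1] + 1:
                r[0], r[1] = min(r[0], src_port), max(r[1], src_port)
                return
        ranges.append([src_port, src_port])

    def match(self, direction, protocol, dst_port, session_id, src_port):
        """True if the full five-part signature is present in the tree."""
        node = self.root
        for part in (direction, protocol, dst_port, session_id):
            if part not in node:
                return False
            node = node[part]
        return any(lb <= src_port <= ub for lb, ub in node.get('src', []))


# The four example rules from the text collapse into two leaf ranges, as in Figure 4.2.
tree = SignatureTree()
for src in (1158, 1168, 1169, 1170):
    tree.insert(4, 6, 1034, 1, src)
print(tree.match(4, 6, 1034, 1, 1169))  # True: 1169 falls inside the merged [1168, 1170] leaf
```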

4.3.5.2 The Detection and Decision

If a particular signature is detected as benign, it is added into the benign tree TB; otherwise it is added into the worm tree TW. We limit the sizes of TB and TW by setting an age threshold for their nodes, denoted TAge = [benign, worm], where the values of benign and worm represent the limit each respective tree can hold. When a new signature is added into a tree, the oldest dies if the total number of nodes in the tree exceeds its TAge value. Because the age threshold is used to reduce or maintain the size of the profiles, it is the least recently used signature that is most likely to be removed. All nodes become older as a new node is added and, if the new signature fully or partially matches existing nodes in the tree, those nodes become the youngest in the group.
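A minimal sketch of this age-based pruning is given below, assuming, for simplicity, a flat collection of leaf signatures rather than the actual tree; the class and method names are illustrative only.

```python
from collections import OrderedDict

class AgedProfile:
    """LRU-style profile: a full or partial re-match refreshes a signature's age."""

    def __init__(self, max_size):
        self.max_size = max_size        # the benign or worm limit taken from TAge
        self.entries = OrderedDict()    # key = signature tuple, ordered oldest -> youngest

    def add(self, signature):
        if signature in self.entries:
            self.entries.move_to_end(signature)       # matched again: becomes the youngest
        else:
            self.entries[signature] = True
            if len(self.entries) > self.max_size:
                self.entries.popitem(last=False)      # the oldest signature 'dies'

benign = AgedProfile(max_size=100)   # e.g. TAge = [100, 15] caps the benign profile at 100
benign.add((4, 6, 1034, 1, 1158))
```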

A new signature is compared with the data in each of the two trees to see how well it matches. A five-bit binary value is returned, which identifies which parts of the signature match a path in the tree (direction=1, protocol=2, session ID=4, source port=8 and destination port=16; see Figure 4.2). These scores indicate the significance of each part of the signature, e.g., the destination port is the most significant. When the searching is completed, a value is returned by each of TB and TW, denoted TBval and TWval respectively. If a signature search returns TBval > TWval, the signature is likely to be benign.

The difference between TBval and TWval partially determines the overall decision of a session's signature-based detection. If TBval > TWval, this negates the stress alarms triggered by the P and R, but the negation is only effective when TBval > TWval and TBval >= TMin, where TMin = 24; that is, a match from the tree search must at least include matching destination and source ports (16+8=24).
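The per-part scores can be read as bit weights. The short sketch below, with illustrative names only, shows how a match value could be assembled from a tree lookup and how the TBval > TWval and TBval >= TMin condition negates a stress alarm.

```python
# Bit weights taken from Figure 4.2: the destination port is the most significant part.
WEIGHTS = {'direction': 1, 'protocol': 2, 'session_id': 4, 'src_port': 8, 'dst_port': 16}
T_MIN = 24  # destination port (16) + source port (8) must both match

def match_value(profile_match):
    """profile_match maps each signature part to True/False from a tree lookup (illustrative)."""
    return sum(w for part, w in WEIGHTS.items() if profile_match.get(part, False))

def negates_alarm(tb_val, tw_val):
    """True when the signature-based detector overrides the P and R stress alarms."""
    return tb_val > tw_val and tb_val >= T_MIN

# Example: the benign tree matches everything except the session ID.
tb_val = match_value({'direction': True, 'protocol': True, 'dst_port': True,
                      'session_id': False, 'src_port': True})
print(tb_val, negates_alarm(tb_val, tw_val=3))   # 27 True
```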


When a stress alarm is raised by the R (i.e. when the frequency-based detector R finds that the new signature matches a sufficient number of signatures in E, which indicates that the signature might be a worm) in the absence of one from the P (i.e. when the port-based detector P does not find that the ports in the new signature match the most common ports in E, which indicates that it might be safe), the T imposes stricter rules before claiming the session as positive: the value returned by the TB must be larger than TMin, which means that a match from the benign tree search must include at least the destination port, the source port and one other node. This kind of detection is only executed upon activation.

4.3.6 Detection based on Activation

Detection based on activation aims to provide thorough security checking when an incident occurs, in this case when anomalous traffic is found. The idea is that, when an anomaly is found, the system needs to be particularly suspicious for a while until the situation settles down. This mechanism, which we refer to as A, works to negate a collective decision made by the P and R when both suggest a signature is benign, but it only operates under limited conditions. When a worm signature is detected, regardless of its validity, the detector is activated for a period of A, which is heuristically defined as A = (5, 10, 15); e.g., A=5 means the activation lasts for the subsequent five signatures irrespective of the detection outcomes, and the next round of activation only begins once the present round reaches the limit set by A.

A consists of T and R with a few changes in their conditions. In activation-based detection, unlike the negation made by the regular T, this T expects the returned values to satisfy TW > TB and TW > TMin. Also, unlike the regular R, true is returned when the frequency/session rate is higher than or equal to t. The reason behind this stricter rule is that, once the presence of a worm is detected, the chance of a false alarm is lower within the activation period, so the decision to flag the next suspicious signature can be set at a value closer to the detection threshold t (i.e., R>=t). Once the activation period is over, the chance of a false alarm is higher, so the decision needs to move slightly away from the detection threshold t (i.e., R>t).


4.3.7 Integration of Detectors

Having explained each detector's functionality in earlier sections, we now present them in simplified form. Figure 4.3 shows the pseudo-code of the detection framework for all detectors. Line 1 encapsulates the general detection algorithm. Each new signature is added into E and E is updated as necessary, such as by removing old signatures (line 2). Line 3 indicates a decision: if the port-based detector detects port perturbation (P==true) and the frequency-based/session rate-based detector detects anomalous traffic (R==true), the signature-based detector can override the decision. As in line 4, if (T==true), the signature-based detector assumes the signature is benign because it detects that TB returns a higher value than TW. If (T==false), the signature is deemed a worm and detection based on activation A is turned on (lines 5 and 6).

Both the port-based detector detecting port perturbation (P==true) and the frequency-based/session rate-based detector detecting anomalous traffic (R==true) are required before the signature-based detector makes its ultimate decision. If either returns false, i.e., (P==false) or (R==false), the algorithm assumes the signature is benign (line 15). However, if detection based on activation A is on (line 9), the decision is reversed when (T==true) and (R==true) (line 12). Note that, for T and R to return true here, the stricter decisions outlined in Section 4.3.6 apply. Lines 10 and 11 indicate how A is eventually turned off. Lines 17 and 18 indicate that, if the PC is rebooted or turned off, or a session has no activity for 10 minutes (Shafiq, Khayam and Farooq 2008b), the system updates the benign buffer F and the threshold t.

Figure 4.3: Pseudo-code for integration of detectors

1:  For Each ComputerInUse
2:  {   Update E
3:      if ((P==true) && (R==true))
4:      {   if (T==true) return ItIsBenign
5:          A is turned on
6:          return ItIsWorm
7:      }
8:      Else
9:      {   if (A is turned on)
10:         {   A = A - 1
11:             if (A==0) A is turned off
12:             if ((T==true) && (R==true)) return ItIsWorm
13:             return ItIsBenign
14:         }
15:         return ItIsBenign
16:     }
17:     if (no_Session_Over_N_Time or PC_Reboot)
18:         Update F & t
19: }
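For readability, the following is a minimal Python rendering of the control flow in Figure 4.3; the detector callables and state dictionary are hypothetical stand-ins, not the actual implementation.

```python
from types import SimpleNamespace

def classify(signature, E, d, state):
    """Sketch of Figure 4.3. d bundles the detector callables P, R, T and their
    stricter activation variants; state holds the activation counter A."""
    E.append(signature)                       # line 2: update the recent-signature window E

    if d.P(signature, E) and d.R(E):          # line 3: port perturbation and anomalous rate agree
        if d.T(signature):                    # line 4: benign tree outweighs worm tree
            return 'benign'
        state['A'] = state['A_length']        # line 5: turn detection based on activation on
        return 'worm'                         # line 6

    if state['A'] > 0:                        # line 9: activation period still running
        state['A'] -= 1                       # lines 10-11: count down; A is off at zero
        if d.T_strict(signature) and d.R_strict(E):
            return 'worm'                     # line 12: stricter T and R rules (Section 4.3.6)
        return 'benign'                       # line 13
    return 'benign'                           # line 15
    # Lines 17-18 (refreshing F and t on reboot or 10 minutes of inactivity)
    # would be handled outside this per-signature routine.

# Example wiring with trivial stand-in detectors
d = SimpleNamespace(P=lambda s, E: False, R=lambda E: False, T=lambda s: True,
                    T_strict=lambda s: False, R_strict=lambda E: False)
print(classify(('sig',), [], d, {'A': 0, 'A_length': 5}))   # 'benign'
```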

4.3.8 How Immune System inspired Our Work

Studies in the literature inspired by the dendritic cells have usually adapted the four signals (dangerous, safe, inflammatory cytokines and PAMPs) using certain behaviours, patterns and conditions; for instance, Greensmith et al. (Greensmith, Aickelin and Twycross 2006) associated those signals with packets’ transmission rates, a network packet’s inverse rate of change, high system activities and failed network connections, respectively.

The T detector mirrors the feature of T cells that provides immunity for the human body. Signatures during the initial period of operation are considered immature T cells because the T detector does not yet have fully developed TB and TW. As the latter develops more slowly than the former, when the TB is fully developed the TW may not yet be substantially established and will poorly detect non-self signatures. Benign signatures stored in the TB and worm signatures stored in the TW represent the ability of T cells to recognise self and non-self cells, respectively.

The length of time required for the T detector to reach maturity, which is when the TB and TW are fully developed, is subjective. It gains its maturity when the counts of the leaf nodes in both the TB and TW are equivalent to the values defined in TAge and can fairly be used for self and non-self detection.


The P and R detectors generate stress signals which activate the T which, in response, evaluates the TBval and TWval values. A TWval higher than TBval indicates that an abnormal level of MHC checking is performed by the NK cells, which implies that the signature is more likely to be non-self. Specifically, this task is performed by the T detector.

As a non-self cell is detected, a danger alarm is raised and the activities of the T cells within a particular range of the affected area become more intense. As the alarm vanishes, these intensities gradually decrease and return to normal, which is the T’s basic process for detection based on activation.

4.3.9 Performance Measures

We evaluate the performance of the classification system based on the following rates:

• true positive (TP): the percentage of worm traffic correctly classified as worm;
• false positive (FP): the percentage of benign traffic wrongly classified as worm.
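For clarity, the two rates can be computed from a stream of predictions and labels as in the short sketch below (illustrative only).

```python
def rates(predictions, labels):
    """predictions and labels are parallel lists of 'worm' / 'benign'."""
    tp = sum(p == 'worm' and l == 'worm' for p, l in zip(predictions, labels))
    fp = sum(p == 'worm' and l == 'benign' for p, l in zip(predictions, labels))
    worms = labels.count('worm') or 1
    benign = labels.count('benign') or 1
    return 100.0 * tp / worms, 100.0 * fp / benign

print(rates(['worm', 'benign', 'worm'], ['worm', 'benign', 'benign']))  # (100.0, 50.0)
```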

4.3.10 Summary

This section described the architectures of the detectors and the ideas behind them, the overall decision process, the basic dataset and the performance measures. The following sections describe the experiment used to evaluate the proposed system, including determining the effectiveness of each component, the overall effectiveness of the system and the effect of varying some of its parameters.

4.4 Experimental Setup

The aim of this experiment is to evaluate the dataset, in both fixed and time windows, as a long stream of session traffic. In this experiment, recent traffic is always treated as the 'training set' used to implement detection.

In the time window, the size of E is constrained by a parameter denoted W = (10, 20, 30) whereas, in the fixed window, it is set to an allowable number of signatures not exceeding a parameter denoted W = (13). Why only 13? After completing the experiment that uses the time window technique, we examined the recorded states of E corresponding to all values of W and discovered that their median value is exactly 13.

Also, we impose limits of TAge =([100, 15], [200, 30] and [300, 40]) so that, in total, we have hundreds of test cases, each covering one of those unique parameter settings. We test the stream of that traffic (25,023,865 signatures) against these test cases.

We present an algorithm based on self-adjusted scan rates which relies only on recent signatures, without specifying a training set. In this work, we adapt the role played by T cells: benign profiles are treated as immature T cells and worm profiles as not fully developed ones that are able to reach maturity within short sequences of signatures, and together they detect self and non-self signatures in the traffic. The detectors generate stress signals when worm behaviour is detected, which in turn triggers 'T cell' checking. Upon the successful detection of a worm, a danger signal is raised and intense checking begins for a period before gradually decreasing.

The dataset used is passed directly into the detection system as a one-year-long stream of session traffic, just as in a real environment, as opposed to implementations in (Khayam, Radha and Loguinov 2008) and (Shafiq, Khayam and Farooq 2008b) which required training and testing sets.

4.4.1 Dataset

In our study, we observe most of the published guidelines of (Tavallaee, Stakhanova and Ghorbani 2010) and obtain the dataset from (nexginrc.org 2008) used by (Khayam, Radha and Loguinov 2008) and (Shafiq, Khayam and Farooq 2008b). This dataset records session-based network profiles collected from 13 endpoints over 12 months. An endpoint refers to an individual computer system or device that acts as a network client and serves as a workstation or personal computing device (EndpointSecurity.org 2008). These endpoints’ records consist of the session traffic of home users, research students, technical staff and administrative staff. Various types of software are used, including peer-to-peer file sharing, multimedia, games and database clients.


A session is defined as a two-way communication between two IP addresses. Subsequent communications between the same hosts which involve the use of different ports are still considered the same session.

Only the information obtained from the first packet of data is recorded and used, and a session is assumed ended if no packet is sent or received in 10 minutes. This session data has the following six fields (Shafiq, Khayam and Farooq 2008b).

• Session ID: 20-byte SHA-1 hash of the sub-names of the host and the remote IP address.
• Direction: a byte flag of four types - outgoing/incoming unicast and outgoing/incoming broadcast.
• Protocol: the packet's transport layer protocol.
• Source port: the packet's source port.
• Destination port: the packet's destination port.
• Timestamp: millisecond resolution of session initiation.
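To make the record format concrete, the following is a minimal sketch of one session signature with hypothetical field values; only the six fields listed above are represented, and the hashing helper is illustrative rather than the dataset's exact construction.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class SessionSignature:
    session_id: str    # 20-byte SHA-1 over the local and remote host identifiers
    direction: int     # outgoing/incoming unicast or broadcast flag
    protocol: int      # transport-layer protocol number
    src_port: int
    dst_port: int
    timestamp_ms: int  # session initiation time, millisecond resolution

def make_session_id(local_host: str, remote_ip: str) -> str:
    """Illustrative only: a SHA-1 digest of the two endpoints' identifiers."""
    return hashlib.sha1(f"{local_host}|{remote_ip}".encode()).hexdigest()

sig = SessionSignature(make_session_id("endpoint-3", "192.0.2.10"),
                       1, 6, 1158, 1034, 1_620_000_000_000)
print(sig.dst_port)
```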

The benign dataset has a total of 1,881,234 benign session signatures. Table 4.2 lists the profiles of the benign signatures at each endpoint.


Table 4.2: Profiles of benign signatures as per endpoints

EndpointID⁵   Use Type           Total
1             office             33487
2             office             21066
3             home               373009
4             home               444345
5             home/university    27872
6             university         60979
7             university         171601
8             university         41809
9             university         235133
10            university         152048
11            university         207187
12            home/university    100702
13            university         11996
Total Benign Signatures          1881234

The worm dataset has 188,389 signatures from 12 different types of worms. These are: 1) Blaster, 2) Dloader-NY, 3) Forbot-FU, 4) MyDoom-A, 5) Rbot-AQJ, 6) Rbot-CCC, 7) Sdbot-AFR, 8) Simulated 3_57scan, 9) CodeRed II, 10) Witty, 11) SoBig.E and 12) Zotob.G. Table 4.3 provides the breakdown for each worm.

5 The endpointIDs (Nexginrc.org 2008) are slightly different from the descriptions in their publication but the total numbers of signatures for each Use Type (home/university/office) remain consistent, and we correct IDs as necessary.


Table 4.3: Worms and total signatures

Worm Name     Release Date   Ratio (%)   Total
Blaster       Aug-03         5           9438
Dloader-NY    Jul-05         22.4        42244
Forbot-FU     Sep-05         14.5        27261
MyDoom-A      Jan-06         0.1         157
Rbot-CCC      Aug-05         2.4         4534
Rbot-AQJ      Oct-05         0.3         584
Sdbot-AFR     Jan-06         13.4        25286
3_57scan      Simulated      1.7         3225
CodeRed II    Jul-04         2.4         4455
Witty         Mar-04         10.6        20000
SoBig.E       Jun-03         8.6         16124
Zotob.G       Jun-03         18.6        35081
Total Worm Signatures                    188389

We use the dataset already prepared by (Khayam, Radha and Loguinov 2008), as described in their papers and on the web site providing the data, which allows us to compare our results directly with theirs. In it, the malware traffic is interleaved into the benign dataset; the benign signatures constitute 91% of the total data. Each worm's signatures were interleaved by Khayam et al. into each endpoint at random insertion points. With 13 host datasets and 12 worms, the insertions produce 156 unique datasets totalling 25,023,865 signatures of benign traffic mixed with that of worms.

4.4.2 Results and Discussion

In this section, we discuss the performances of the systems comprising the R, P and T detectors. Prior to that, we assess the performances of the R and P independently, but not that of the T: its knowledge is supplied by the R and P, without which it would not be able to decide which signatures are positive and negative.

An analysis of variance is performed on 144 test cases using PTRTT and 48 using PFRFT to identify which parameters make significant differences. Port normalisation in the time window indicates that there are increases in the TP but also the FP rates.


Endpoint 10 (university) produces a significant decrease in the FP rate but the other endpoints do not show any significant difference ([F(1, 5) = 2.807E+28, p < 2.2E-16]). Dloader-NY and MyDoom-A are affected by this parameter ([ F(1, 286) = 6.625, p < 1.056E-02] and [ F(1, 286) = 21.312, p < 5.903E-06], respectively), with the latter using a commonly registered local port value and most remote port values incrementally. However, due to the low number of signatures with low scan rates in Dloader-NY and MyDoom-A, their detection rates remain lower than those of the other worms.

A similar observation is noted in the fixed window, in which the TP rates increase but so do the FP rates, although the changes are insignificant. The only significant effects on FP performance are seen at endpoints 1 [F(1, 94) = 5.973, p < 1.639E-02], 2 [F(1, 94) = 11.984, p < 8.092E-04] and 4 [F(1, 94) = 5.27, p < 2.392E-02]. Only the Dloader-NY signatures, which use local port values in ranges up to 5000 and remote port values up to 1000, are significantly affected [F(1, 102) = 6.539, p < 1.203E-02].

In the time window setting, W and B show significant differences in many situations. While W has mixed responses, when B is set to a smaller value, there are increases in the average and median TP rates but also increases in the FP rates.

However, they are negligible. TB and TW show significant differences in the FP rates only at endpoints 1 ([F(1, 1726) = 6.782, p < 9.289E-03] and [F(1, 1726) = 7.152, p < 7.560E-03], respectively), 3 ([F(1, 1726) = 88.244, p < 2.2E-16] and [F(1, 1726) = 90.4, p < 2.2E-16], respectively) and 4 ([F(1, 1726) = 1759.2, p < 2.2E-16] and [F(1, 1726) = 2119.2, p < 2.2E-16], respectively). The use of A seems to make a significant difference to the FP rates only at endpoints 1 [F(1, 1726) = 46.033, p < 1.593E-11] and 3 [F(1, 1726) = 4.649, p < 3.120E-02]. Overall, it appears that B and W make significant differences to the TP rates across all endpoints ([F(1, 22392) = 61.854, p < 3.865E-15] and [F(1, 22392) = 25.674, p < 4.075E-07], respectively). It is also evident that, generally, B, W, TB and TW produce significant differences in the FP rates ([F(1, 22392) = 4503.262, p < 2.2E-16], [F(1, 22392) = 85.810, p < 2.2E-16], [F(1, 22392) = 48.302, p < 3.754E-12] and [F(1, 22392) = 6.543, p < 1.054E-02], respectively). A cross-examination of the worms reveals that B and/or W make a difference ([F(1, 22392) = 61.854, p < 3.865E-15] and [F(1, 22392) = 25.674, p < 4.075E-07], respectively) to most of the worms, including Zotob.G, which uses very random port numbers involving local ports spread well apart up to 6000 and remote ports covering all port ranges.

In the fixed window setting, only B turns out to make significant differences to the results ([ F(1, 6828) = 175.003, p < 2.2E-16] for TP and [ F(1, 6828) = 2399.662, p < 2E-16] for FP) with the exception of the TP rates at endpoints 1 [F(1, 526) = 3.570E-06, p < 0.999] and 2 [F(1, 526) = 9.170E-02, p < 0.762] whereas TB and TW only make a significant difference ([ F(1, 526) = 5.105, p < 2.427E-02] and [ F(1, 526) = 4.571, p < 3.297E-02], respectively) to the FP rates at endpoint 1 (office). Overall, B shows strong significant differences in the TP [F(1, 6828) = 175.003, p < 2.2E-16] and FP [F(1, 6828) = 4684.403, p < 2E-16] outcomes while no other parameters appear to have a strong effect on the overall results. A cross-checking of each worm reaffirms that overall, B has a significant effect on the results ([ F(1, 6828) = 175.003, p <2.2E-16] for TP and [ F(1, 6828) = 2399.662, p <2E-16] for FP) but none of the other parameters show any significant differences.

4.4.3 Port Alone

We evaluate detector PT alone to see how well it performs in the time window without normalising the port values, as its lowest FP performance comes from this setting (W=10, TAge=null, BT=null, A=null, port values un-normalised). A setting with a 'null' value indicates that the corresponding mechanism is simply turned off; this applies to the rest of the settings mentioned in this thesis. The PT records average rates of TP=89.0% and FP=22.9% with standard deviations of 17.9 and 0, respectively. A consistent FP of 22.9% is recorded, while the TP rates show mixed responses, with some dropping, some increasing but many not changing significantly, as detailed in Table 4.4.


Table 4.4: Performances of PT/worms

Worm                 TP Rate (%)   FP Rate (%)
Blaster              99.8          22.9
Dloader-NY           67.6          22.9
Forbot-FU            98.7          22.9
MyDoom-A             44.6          22.9
Rbot-CCC             99.3          22.9
Rbot-AQJ             79.6          22.9
Sdbot-AFR            99.9          22.9
3_57scan             100           22.9
CodeRed II           100           22.9
Witty                100           22.9
SoBig.E              99.8          22.9
Zotob.G              78.7          22.9
Average              89.0          22.9
Standard Deviation   17.9          0.0

As an insight into the PT’s ability to detect self signatures, it is seen that its FP performances per endpoint are not consistent (see Table 4.5).

It finds the university and home/university endpoints among the hardest, with its standard deviations for the TP and FP rates being 0.0 and 12, respectively, and their means 89.0% and 22.9%, respectively. Thus, we conclude that detector PT alone performs reasonably well at detecting non-self signatures using the simple algorithm described earlier, but responds weakly in terms of recognising self signatures.


Table 4.5: Performances of PT/endpoints

Endpoint   Use Type          TP Rate (%)   FP Rate (%)
1          office            89.0          11.6
2          office            89.0          7.6
3          home              89.0          19.3
4          home              89.0          6.7
5          home/university   89.0          18.2
6          university        89.0          30.6
7          university        89.0          30.7
8          university        89.0          24.0
9          university        89.0          12.7
10         university        89.0          27.9
11         university        89.0          48.7
12         home/university   89.0          34.9
13         university        89.0          24.8
Average                      89.0          22.9
Standard Deviation           0.0           12.0

The P is also inspected in the fixed window, where it is denoted PF. Here, normalising the port values makes no difference at all, which appears to be because 13 signatures in E provide insufficient information for port normalisation to have an effect.

As indicated in Table 4.6, the PF detects worms far better than the PFRFT, with even the top three hardest worms being a little better detected. The average TP and FP rates are 91.5% and 49.3%, respectively, and the standard deviations 17.2 and 0.04, respectively. The setting for this configuration is (W=13, TAge=null, BT=null, A=null, port values normalised).

There is also a clear pattern among the endpoints (see Table 4.7). Only the fourth (home) shows a low FP rate of 6%, with the rest failing to recognise self signatures, giving an average FP rate of 49.3%. The standard deviations across the endpoints are 1 for the TP and 18.1 for the FP rates. Nevertheless, this is sufficient to conclude that the PF alone is weak at recognising self signatures because some port values used by worms are also regularly used by benign applications. Some of these port values, e.g., the registered ports 80, 20, 21, 22, 443, etc., tend to appear in E with higher frequencies.


Table 4.6: Detection performances of PF/worms

Worm                 TP Rate (%)   FP Rate (%)
Blaster              99.5          49.3
Dloader-NY           99.3          49.3
Forbot-FU            98.5          49.2
MyDoom-A             41.4          49.3
Rbot-CCC             98            49.3
Rbot-AQJ             87.2          49.3
Sdbot-AFR            99.2          49.3
3_57scan             98.8          49.3
CodeRed II           99.8          49.3
Witty                99.9          49.3
SoBig.E              99.2          49.2
Zotob.G              77.5          49.3
Average              91.5          49.3
Standard Deviation   17.2          0.0

Table 4.7: Detection performances of PF/endpoints

Endpoint   Use Type          TP Rate (%)   FP Rate (%)
1          office            91.8          42.9
2          office            92.2          33.9
3          home              89.1          24.3
4          home              91.2          6.0
5          home/university   91.8          56.6
6          university        92.3          61.1
7          university        91.4          52.5
8          university        92.4          58.2
9          university        92.5          54.4
10         university        90.4          55.5
11         university        92.0          70.5
12         home/university   90.5          63.9
13         university        92.3          60.9
Average                      91.5          49.3
Standard Deviation           1.0           18.1

There is also no adequate mechanism in the P for classifying benign and worm signatures. The threshold that achieves this (t) is only available in the R, which is the strongest reason for the P's FP rates being high. In conclusion, the P does not make a good detector when used alone, regardless of whether it is in a time or fixed window; thus, information about the ports alone is not sufficient to make an accurate detection.

4.4.4 R Alone

In this section, we evaluate the performances of RT and RF alone, that is, worm detection based purely on the rate-based detector in a time window and the frequency-based detector in a fixed window, respectively. It is essential that the conditions of E (the window of recent signatures) and t (the threshold described in Sections 4.3.3 and 4.3.4) are inspected in the R. If E produces a frequency that goes beyond t, the new signature q is deemed a worm, and the frequencies of the signatures accumulated in E are observed. We evaluate t in the time window both with and without its ability to self-adjust its value over time. For the R without a self-adjusted t, we test several fixed values, setting t = (10, 20, 30, 40).

Table 4.8: Detection performances of RT/worms

Worm                 TP Rate (%)   FP Rate (%)
Blaster              99.7          9.4
Dloader-NY           99.9          9.4
Forbot-FU            99.9          9.4
MyDoom-A             0.0           9.4
Rbot-CCC             99.2          9.4
Rbot-AQJ             0.0           9.4
Sdbot-AFR            99.8          9.4
3_57scan             99.1          9.4
CodeRed II           99.3          9.4
Witty                99.9          9.4
SoBig.E              99.7          9.4
Zotob.G              99.9          9.4
Mean                 83.0          9.4
Standard Deviation   38.8          0.0

The system produces 100% TP rates at a cost of 100% FP rates at t=40, which means all worm and benign signatures are bounded by t=40. The RT also produces slightly better TP rates when t is equal to 10 than when it is 20 or 30. As shown in Table 4.8, although FP=9.4% appears to be the best with t fixed at 30, only an 83.0% TP rate is achieved in return, with their respective standard deviations being 38.8 and 0, which is a clear indication that the presence of various worms at each endpoint does not significantly change the ability of the R to recognise self signatures. All worms are detected at more than 99% TP rates, except MyDoom-A and Rbot-AQJ, which remain completely undetected.

Table 4.9: Detection performances of RT/endpoints

Endpoint   Use Type          TP Rate (%)   FP Rate (%)
1          office            83.0          0.0
2          office            83.0          0.0
3          home              83.0          26.9
4          home              83.0          93.7
5          home/university   83.0          0.6
6          university        83.0          0.0
7          university        83.0          0.0
8          university        83.0          0.0
9          university        83.0          0.1
10         university        83.0          0.1
11         university        83.0          0.7
12         home/university   83.0          0.0
13         university        83.0          0.0
Average                      83.0          9.4
Standard Deviation           0.0           26.4

The FP rates are particularly high at the two home endpoints, as indicated in Table 4.9, because home users use programs very differently from the ways they are used in offices or in mixed mode; for example, various types of applications other than upload/download activities are less likely to be used in an office or university environment. It seems that closing the loophole at these two home endpoints requires the presence of detector P, as is proven later in the presence of PT and T: they record FP rates of 3.0% and 1.9%, respectively, while the average FP rate over all endpoints is 1.6%.


When t is changed to a self-adjusted threshold, the R behaves differently than when t is static. Tested with W = (10, 20, 30), A=false and BT = (1, 10, 30, 60), the R produces 100% detection rates for both TP and FP. Its early decisions affect its subsequent detection quality, in that the frequency value of E is always greater than the value of t. However, when the R works in tandem with the T and P, the value of t changes dynamically over time.

For the R alone with t set to a static value, the FP rate decreases as t increases. Tested on t = (5, 10, 20, 30, 40), the average TP rate is 6.9% but the average FP rate is 56.3%, with respective standard deviations of 15 and 0.1, based on the data listed in Table 4.10. Strangely, MyDoom-A is moderately detected at 51.1% while the others are detected at extremely low percentages.

Table 4.10: Detection performances of RF/worms with static t

Worm                 TP Rate (%)   FP Rate (%)
Blaster              0.8           56.3
Dloader-NY           0.3           56.3
Forbot-FU            0.3           56.0
MyDoom-A             51.1          56.3
Rbot-CCC             3.8           56.3
Rbot-AQJ             20.0          56.3
Sdbot-AFR            0.5           56.3
3_57scan             2.2           56.2
CodeRed II           2.5           56.3
Witty                0.5           56.3
SoBig.E              0.4           56.2
Zotob.G              0.2           56.3
Mean                 6.9           56.3
Standard Deviation   15.0          0.1

Our assessment is that this outcome is impractical because the FP rate remains very high. It seems that the R detector simply classifies most of the signatures as worms, which automatically increases the detection rate. The reason the TP rates of only two worms, MyDoom-A and Rbot-AQJ, are relatively high is that they have many fewer signatures than the others, only 0.083% and 0.31%, respectively, so one correctly classified signature carries a higher impact on the TP rate. On an endpoint basis, the TP rates are weak and, as the t value increases, the FP rates decrease. Therefore, it is concluded that the R alone, with t set to a fixed threshold value, cannot recognise self and non-self signatures well.

We change t to self-adjusted (t is described in Sections 4.3.3 and 4.3.4) and test the R with settings of BF = (1, 10, 30, 60) and W=13 which, surprisingly, returns perfect TP rates for all worms but suffers badly in terms of the FP rates. The mean of the FP rates is 34.2% and its standard deviation, across both worms and endpoints, is almost zero.

We conclude that the RF alone with a self-adjusted t shows a robust ability to detect worms but is still weak in terms of recognising self signatures. With an average FP rate of 34.2%, there are approximately 642,629 benign signatures wrongly positioned beyond threshold t.

In conclusion, the R does not make a good detector when used alone, regardless of whether it relies on the time or fixed window of E with t set to either the fixed mode or self-adjusted. This happens because information about only the rates or frequencies is not sufficient to make an accurate detection.

4.4.5 Integrating Detectors PTRTT

In this section, we present the results of the detection system when the three detectors are combined and work together in the time window. Normalising the port values appears to increase the average TP rate by 1.7%, while the corresponding 0.2% decrease in the average FP rate is insignificant. We test this on all values of W, T and B with A=10, and the results reported subsequently are for normalised port values.

Detections are achieved with at least 99% TP and less than 1% FP rates when the setting is W = (10, 20, 30), TAge = ([100, 15], [200, 30], [300, 40]) and BT = (10, 30, 60), regardless of the A setting. Details of these settings are discussed as follows: 1) W in Section 4.3.1; 2) TAge in Section 4.3.5.2; 3) BT in Section 4.3.3; and 4) A in Section 4.3.6. This scenario occurs at endpoints 1, 2, 3, 7, 8, 9 and 10 and involves one or more worms other than Forbot-FU, Sdbot-AFR and SoBig.E. However, this does not mean that the detection rates for these worms are low.

The best performance using A is with setting (W=30, TAge=[300, 40], BT=10, A=15), which achieves TP and FP rates of 84.5% and 2.4%, respectively; the differences from the corresponding rates without A are an insignificant 0.02% (TP) and 0.04% (FP). The best performance without A is with setting (W=30, TAge=[300, 40], BT=10, A=null).

The presence or absence of activation detection produces very similar results, with the differences being equally insignificant. Considering the averages of all settings with A (TP=83.5%, FP=3.4%) and without A (TP=83.4%, FP=3.3%), there is no significant difference between them, so it is concluded that the presence of A only weakly increases the system's ability to detect worms, at the cost of a similar percentage of benign traffic being flagged.

If the objective is to have the setting that produces the lowest FP rate, the best performance across all worms at all endpoints occurs for setting (W=10, TAge=[300, 40], BT=60, A=null), under which the TP and FP rates are 81.0% and 1.7% with standard deviations of 35.2 and 0, respectively.

As shown in Table 4.11, the system is very strong at recognising self signatures, with an average of at least 98.3%, and most non-self signatures. However, it finds it difficult to detect those of MyDoom-A and Rbot-AQJ .

As detection based on the port patterns and session rates of the MyDoom-A and Rbot-AQJ worms is, in the main, similar to that of benign signatures, the question arises as to what is an alternative way of detecting such worms.


Table 4.11: Detection performances of PTRTT/worms

Worm                 TP Rate (%)   FP Rate (%)
Blaster              99.6          1.7
Dloader-NY           97.1          1.6
Forbot-FU            98.4          1.6
MyDoom-A             6.7           1.7
Rbot-CCC             95.1          1.7
Rbot-AQJ             8.3           1.7
Sdbot-AFR            99.3          1.7
3_57scan             99.2          1.7
CodeRed II           99.4          1.7
Witty                99.8          1.7
SoBig.E              97.7          1.6
Zotob.G              71.2          1.7
Mean                 81.0          1.7
Standard Deviation   35.2          0.0

Table 4.12: Detection performances of PTRTT/endpoints

Endpoint   Use Type          TP Rate (%)   FP Rate (%)
1          office            89.9          0.5
2          office            83.3          0.4
3          home              78.2          3.0
4          home              79.5          1.9
5          home/university   80.1          0.6
6          university        79.8          0.6
7          university        80.3          2.0
8          university        79.8          0.3
9          university        82.5          0.9
10         university        79.6          0.9
11         university        80            4.9
12         home/university   79.9          0.7
13         university        79.9          4.9
Average                      81.0          1.7
Standard Deviation           3.0           1.6

We look at the details of what happens at each endpoint and find that, regardless of the endpoint used, the detection system recognises self signatures extremely well at almost all endpoints with the setting aimed at the lowest FP rate, as shown in Table 4.12.

4.4.6 Integrating Detectors PFRFT

In this section, we present the results of the detection system when the three detectors are combined and work together in a fixed window. At least 99% of Blaster, Sdbot-AFR, 3_57scan, CodeRed II and Witty signatures are successfully detected at endpoints 3 (home), 7 (university), 11 (university) and 12 (home/university) regardless of the parameter values used, but these high TP rates are accompanied by inconsistent FP performances. Only Blaster and CodeRed II produce TP and FP rates of 99% and 2%, respectively, for all settings excluding BF=1.

The conclusion is that the lower the BF, the better the performance. Although the lowest FP rate achieved is 4.7% with setting (W=12, TAge=[200, 30], BT=1, A=5) (see Table 4.13), the TP rate is also very low (5.3%), and varying the other parameters seems unable to improve it significantly.

Table 4.13: Detection performances of PFRFT/worms

Worm                 TP Rate (%)   FP Rate (%)
Blaster              8.2           4.6
Dloader-NY           0.2           4.7
Forbot-FU            0.2           4.4
MyDoom-A             18.7          4.4
Rbot-CCC             1.9           4.7
Rbot-AQJ             16.5          4.7
Sdbot-AFR            0.3           4.7
3_57scan             4.4           4.8
CodeRed II           12.4          4.5
Witty                0.4           5.0
SoBig.E              0.2           4.6
Zotob.G              0.1           4.7
Mean                 5.3           4.7
Standard Deviation   6.9           0.2


Looking at the performances across endpoints (see Table 4.14), the system is reasonably strong at recognising self signatures. Despite the average FP rate of 4.7% and standard deviation of 0.2, the worms are very hard to detect, with only an average TP rate of 5.3% and standard deviation of 6.2.

Table 4.14: Detection performances of PFRFT/endpoints

Endpoint   Use Type          TP Rate (%)   FP Rate (%)
1          office            11.8          99.3
2          office            5.4           97.9
3          home              9             97.1
4          home              6.9           98.3
5          home/university   0.6           98.0
6          university        1.6           95.8
7          university        22.2          98.5
8          university        0.4           97.7
9          university        1.7           98.9
10         university        2             98.5
11         university        3.9           79.0
12         home/university   2.4           89.2
13         university        0.6           91.2
Average                      5.3           4.7
Standard Deviation           6.2           5.8

Overall, the system based on a fixed window is not feasible because, besides being very slow to execute detection, it also performs badly. In addition, the constant number of signatures in its detection buffer causes time delays at a constant rate, and the volume of signatures in the buffer fails to provide sufficient knowledge to the P and R whenever there are worm signatures in the traffic.

Comparatively, the PTRTT outperforms the PFRFT, as the detectors integrated in it lead to substantial improvements regardless of the settings imposed.

When the system is set to PTRTT, the detectors complement each other and t is correctly adjusted, mainly because of the higher level of intelligence provided by E in the time window than in the fixed window (PFRFT).


4.4.7 Execution Performances

In this section, we report the total time required to process the dataset as a one-year-long stream of session traffic, representing 13 different endpoints and interleaved with worms, as prepared by (Khayam, Radha and Loguinov 2008). Note that a decision as to whether a signature is benign or worm is made on every single signature as it passes. In practice, upon the detection of the first signature of a particular worm, the worm is already considered detected; in our experiment, the detection system continues to detect the presence of worm signatures in the subsequent traffic. Full TP and FP rates are only obtained upon completion of the entire dataset.

For simplicity, we summarise the average performances of all normalised port test cases involved in the PTRTT and PFRFT and inspect the effects of all values of W when A is turned off.

Based on observations of the PTRTT, the processing time increases on average by 12 and 13 minutes as the W value increases. When we inspect all W values with A=15, the results show a decreasing pattern, from 7 to 5 minutes, as W increases. However, we believe this is not a straight-line trend, as the endpoints' usage patterns change over time and subsequently affect the qualities of the T and t, which then affect the processing speed. The best setting for FP in the PTRTT executes in 14 hours 35 minutes and 42 seconds. In the PFRFT, most settings complete within 4 days, and the best setting for FP executes in 4 days 15 hours 54 minutes and 1 second.

We also test the time required to process all those signatures without our algorithm codes. For nine rounds of execution, the average is 13:22:14 (HH:MM:SS) with a standard deviation of 00:14:33 (HH:MM:SS). In the time window, it is evident that the smaller the W value, the larger the throughput. Looking at the time difference between the time and fixed window settings, it is clear that the algorithm performs much faster and has greater detection accuracy in the former. It is also known that using all possible values in the parameters does not produce significant differences in terms of speed.


4.5 Comparisons with Other Results

Shafiq et al. (Shafiq, Khayam and Farooq 2008b) evaluated several types of detection classifiers against the same dataset, including two well-known bio-inspired anomaly detectors called Real Valued Negative Selection (RVNS) and DCA, and some non-bio-inspired detectors, including the ANFIS, SVM, RL and ME detectors. They evaluated the RVNS algorithm with fixed-size detectors using Euclidean matching rules. Similar to (Greensmith, Aickelin and Twycross 2006), Shafiq et al. tested the DCA but used the variances in destination IP addresses as PAMPs and the inverse of the average inter-arrival session times as the inflammation signals. In contrast, we use the session rates as danger signals and the self-adjusted t as defining the safe signals or self area. Unlike their scheme, we implement the R in an incrementally overlapping time window as well as in fixed windows, and without any data set aside for training.

Even though the RT had a better TP rate than the RVNS and DCA, the results alone are not convincing, as summarised in Table 4.15, because of the inability of the detection algorithm to detect MyDoom-A and Rbot-AQJ, and signatures from endpoints 3 (home) and 4 (home). Removing those two worms and endpoints revealed the strengths and weaknesses of setting (W=10, TAge=null, BT=30, A=null). The idea is that, if we can understand what the system can and cannot detect, we can devise another detection mechanism, based on the characteristics of the worms the system missed, to catch them; this is an aim of Chapter 6. The average TP and FP rates are 99.6% and 0.2%, with respective standard deviations of 0 and 0.3.

Table 4.15: Mean TPs & FPs of RVNS, DCA and RT

Classification   RVNS (%)   DCA (%)   RT (%)
TP               53.50      61.60     83.00
FP               8.00       5.60      9.40

Based on our results, we agree with the argument of (Shafiq, Khayam and Farooq 2008b) that the definition of self is not stable and changes over time due to users' pseudo-random behaviours. Relying on the session rates alone is naïve: worms with low scan rates could completely bypass the detection system, and the system suffers high FP rates because home users tend to generate unstable rates of computer traffic, as shown by endpoints 3 and 4, which differ from those in an office and reflect the broad freedom in how these endpoints are used.

Earlier, we demonstrated that various parameters produce mixed results. Besides the results based on the lowest FP rates, we also present those based on the best TP rates and balanced performances. Above all, we suggest that (W=10, TAge=[300, 40], BT=60, A=null), the setting that produces the lowest FP rate, should be proposed as the preferred system, as it minimises the level of disturbance to a computer's normal day-to-day use yet consistently detects most worms. Without removing any endpoints, we take a snapshot of the final results when the two very low scan-rate worms, MyDoom-A and Rbot-AQJ, are removed. The system records 95.695% TP and 1.6% FP rates, with respective standard deviations of 0.6 and 1.6. Zotob.G turns out to be the third most difficult worm, at a 71.213% TP rate, although its scan rates are not as low as those of the other two. Should these three worms be detected by another detection mechanism, the overall average performances become 98.415% TP and 1.7% FP rates, with respective standard deviations of 0.6 and 1.6.

Shafiq et al. (Shafiq, Khayam and Farooq 2008b) incorporated four features for the unbiased evaluation of the RVNS, DCA and SVM algorithms using the same dataset with each feature using a 30-second time window of session blocks.

The first feature was traffic burstiness, as worm traffic tends to have different burst patterns from other traffic. The authors believed that worm traffic produced constant, artificial bursts rather than the bursts observed in benign traffic. For each session block, they classified the sessions into bins of different sizes, 0.001, 1, 2, 3 and 4 seconds, to provide information on low and high rates of attack, from which the Gilbert model was used to produce a feature. Although it is believed that benign sessions produce low feature values and worm sessions high ones, this technique is susceptible to high false negative rates when worm activities occur in the presence of benign traffic.

The second feature was based on multi-resolution session rates. Similar to the first technique, the sessions were classified into bins of different sizes, 0.001, 1, 2, 3 and 4 seconds, and their rates recorded for each bin. The feature was calculated as a first-order derivative of the sum of the bins minus the sum of the bins excluding the last one. The weakness of this technique is that worms with low scan rates can evade detection.

The third feature was obtained by measuring the entropy rates of the destination IP addresses, grouping the same addresses within a time window into bins and using the Tsallis entropy measure to capture the entropy variances in the sessions. Although it is claimed that worms tend to generate higher entropy values because their scan activities increase the number of IP addresses used, the weakness of this technique is that worms with low scan rates or low propagating activities can evade detection.
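As a brief illustration of this third feature (not the authors' exact implementation), the Tsallis entropy of a destination-address distribution can be computed as in the sketch below; the entropic index q used here is an assumption.

```python
from collections import Counter

def tsallis_entropy(destinations, q=2.0):
    """S_q = (1 - sum_i p_i^q) / (q - 1) over the destination-IP distribution."""
    counts = Counter(destinations)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# A scanning worm touching many distinct addresses yields a higher value.
print(tsallis_entropy(["10.0.0.1"] * 8 + ["10.0.0.2"] * 2))          # low diversity: 0.32
print(tsallis_entropy([f"10.0.0.{i}" for i in range(10)]))           # high diversity: 0.90
```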

The fourth technique looked at the divergence in port distributions, using the resistor-average (RA) divergence (Cover and Thomas 1991) to quantify the difference between two probability distributions, and established two histograms for detection. Similar to the second technique, this feature was calculated from the port histogram of all the ports in a time window and the port histogram of all the ports in the same window without the last one. Based on our experience, this technique works effectively for most worms except those with low scan rates, as their port numbers are under-represented in the histograms. Nevertheless, a low scan rate is not the only reason, because Zotob.G used very random port numbers involving local ports well spread up to 6000 and remote ports covering all port ranges. Another weakness is that benign traffic uses some registered ports more frequently than others, and some worms use these same ports.
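For illustration, the resistor-average divergence between two port histograms is commonly defined as the 'parallel combination' of the two Kullback-Leibler divergences; the sketch below follows that common definition, with simple additive smoothing added as an assumption to avoid zero-probability bins, and is not the authors' exact implementation.

```python
import math
from collections import Counter

def _kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def resistor_average(ports_a, ports_b, smoothing=1e-6):
    """RA(P,Q) = 1 / (1/KL(P||Q) + 1/KL(Q||P)) over smoothed port histograms."""
    keys = sorted(set(ports_a) | set(ports_b))
    if not keys:
        return 0.0
    ca, cb = Counter(ports_a), Counter(ports_b)
    p = [ca[k] + smoothing for k in keys]
    q = [cb[k] + smoothing for k in keys]
    sp, sq = sum(p), sum(q)
    p = [v / sp for v in p]
    q = [v / sq for v in q]
    kl_pq, kl_qp = _kl(p, q), _kl(q, p)
    if kl_pq <= 0 or kl_qp <= 0:
        return 0.0
    return 1.0 / (1.0 / kl_pq + 1.0 / kl_qp)

# Hypothetical histograms: mostly web traffic versus traffic touching unusual ports
print(resistor_average([80, 80, 443, 53], [80, 4444, 4444, 135]))
```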

We classify these four intelligent techniques into three categories. Although the first two used different algorithms, both generally worked in relation to the session rates. The third observed distribution patterns of destination addresses and the fourth observed distribution patterns in terms of ports. This classification inspired us to come up with three detectors that correspond to these three categories.

Unlike the work of (Shafiq, Khayam and Farooq 2008b), which relied upon a training set to train their classifiers and a separate testing set for evaluation, we choose to use the dataset as a stream of session traffic, with our classification system using only the recent knowledge it has as the session traffic passes. In this way, we assume that the detection performances of our system presented in this chapter more closely resemble a realistic deployment environment.

Table 4.16: Accuracy comparison of all detection algorithms (%). The table reports, per endpoint, the TP and FP rates of the iRVNS, iDCA, iANFIS, iSVM, RL and ME classifiers together with our settings (W=10, TAge=[300,40], BT=60, A=null), (W=30, TAge=[300,40], BT=10, A=null) and (W=30, TAge=[100,15], BT=1, A=5); the last three columns of each row correspond to these three settings.

Endpoint 1     TP: 95.1  94.8  96.2  99.4  91.0  83.0  89.9  91.8  92.0
               FP: 0.3   0.1   0.4   0.0   0.0   0.0   0.5   0.8   0.0
Endpoint 2     TP: 95.6  94.5  96.5  98.6  92.0  83.0  83.3  89.5  90.0
               FP: 0.4   0.1   0.5   0.1   0.0   0.0   0.4   0.6   0.0
Endpoint 3     TP: 95.9  95.0  96.9  99.5  85.0  84.0  78.2  78.2  83.0
               FP: 0.4   0.1   0.5   0.1   11.0  22.0  3.0   3.6   10.0
Endpoint 4     TP: 94.8  94.6  96.2  98.7  83.0  83.0  79.5  74.3  82.0
               FP: 1.3   0.3   1.7   0.3   7.0   33.0  1.9   1.8   0.0
Endpoint 5     TP: 95.0  94.8  96.3  99.5  83.0  83.0  79.8  85.5  91.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   0.6   2.7   20.0
Endpoint 6     TP: 95.2  94.8  96.4  99.4  83.0  83.0  80.3  83.4  87.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   2.0   2.3   0.0
Endpoint 7     TP: 94.6  94.6  96.2  99.4  83.0  83.0  79.8  82.6  89.0
               FP: 0.2   0.0   0.3   0.0   0.0   0.0   0.3   0.6   10.0
Endpoint 8     TP: 94.6  94.6  96.2  99.4  83.0  83.0  82.5  86.7  91.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   0.9   1.7   0.0
Endpoint 9     TP: 94.6  94.6  96.2  99.4  83.0  83.0  79.6  79.8  90.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   0.9   1.4   10.0
Endpoint 10    TP: 93.9  94.0  96.0  99.0  83.0  83.0  80.0  82.0  90.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   4.9   4.6   10.0
Endpoint 11    TP: 94.6  94.6  96.2  99.3  83.0  83.0  80.1  87.8  91.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   0.6   1.1   10.0
Endpoint 12    TP: 94.7  94.7  96.2  99.3  83.0  83.0  79.9  86.7  90.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   0.7   1.2   10.0
Endpoint 13    TP: 94.7  94.7  96.2  99.4  83.0  83.0  79.9  90.3  91.0
               FP: 0.1   0.0   0.2   0.0   0.0   0.0   4.9   8.9   10.0
Mean           TP: 94.9  94.6  96.3  99.3  84.5  83.1  81.0  84.5  89.0
               FP: 0.3   0.0   0.4   0.0   1.4   4.2   1.6   2.4   6.9
Std Deviation  TP: 0.5   0.2   0.2   0.3   3.1   0.3   2.9   4.9   3.0
               FP: 0.3   0.1   0.4   0.1   3.3   10.2  1.6   2.2   6.1

Table 4.16 shows the comparative detection performances of all classifiers, in which the values of our results are based on the average of all worms per endpoint. As stated in (Shafiq, Khayam and Farooq 2008b), their results are calculated using an overall 95% confidence level using the t distribution.

It is clear that setting (W=10, TAge=[300,40], BT=60, A=null) produces a lower average TP rate than the iRVNS, iDCA, iANFIS, ME and RL classifiers but at least outperforms the ME for the average FP rate. Critically, this setting performs far better than the ME and RL classifiers on the home-based endpoints 3 and 4, and its FP rate at endpoint 2 is also similar to that of the iRVNS.

It is also clear that setting (W=30, TAge=[300,40], BT=10, A=null) produces a lower average TP rate than the iRVNS, iDCA and iANFIS classifiers and slightly better overall TP performance than setting (W=10, TAge=[300,40], BT=60, A=null), but suffers a slight increase in its FP rates. It shows almost equal performance to the RL and indeed performs better than the ME classifier and, unlike both of them, which are weak on home endpoints, achieves good results at all endpoints, even 3 and 4, except that endpoint 13 is a little high at an 8.9% FP rate.

Although setting (W=30, TAge=[100,15], BT=1, A=5) outperforms the average TP performances of the RL, ME, (W=10, TAge=[300,40], BT=60, A=null) and (W=30, TAge=[300,40], BT=10, A=null) settings, its FP rates are inconsistent across endpoints: for some they are 0.0%, including home endpoint 4, but for several they are 10% or 20%.

Looking at the performances of settings (W=10, TAge=[300,40], BT=60, A=null), (W=30, TAge=[300,40], BT=10, A=null) and (W=30, TAge=[100,15], BT=1, A=5), we would like to pinpoint the cause of their false detections. Certainly, we acknowledge that, as stated in past publications, the home endpoints are the type that is difficult to recognise, as some behaviour patterns appear anomalous. Similar to the findings in (Shafiq, Khayam and Farooq 2008b), we also notice that worms with low scan rates, such as MyDoom-A and Rbot-AQJ, are the two hardest to detect in this comparative evaluation. Although Zotob.G, the third such worm, is hardly detected by our detection system, we present its performance because the system still recognises the sessions belonging to the benign traffic and the remaining worms.

Table 4.17 shows that, without MyDoom-A and Rbot-AQJ, all three settings produce average TP rates of around 95%. However, setting (W=10, TAge=[300,40], BT=60, A=null) produces the lowest FP rates, with more than half its endpoints recording FP rates lower than 1%. We exclude both worms on the basis that, if they are hard to detect at propagation, we should extend the detection tasks to execution.


Table 4.17: Without MyDoom-A and Rbot-AQJ

                     (W=10, TAge=[300,40],   (W=30, TAge=[300,40],   (W=30, TAge=[100,15],
                      BT=60, A=null)          BT=10, A=null)          BT=1, A=5)
                      TP      FP              TP      FP              TP      FP
Mean                  95.7    1.7             95.3    2.4             95.7    7.2
Standard Deviation    0.6     1.6             1.9     2.4             2.8     4.6

With Zotob.G removed, the average TP rates for all settings rise to more than 98% while the average FP rates show almost no difference, as detailed in Table 4.18.

Table 4.18: Without MyDoom-A, Rbot-AQJ and Zotob.G

                     (W=10, TAge=[300,40],   (W=30, TAge=[300,40],   (W=30, TAge=[100,15],
                      BT=60, A=null)          BT=10, A=null)          BT=1, A=5)
                      TP      FP              TP      FP              TP      FP
Mean                  98.4    1.6             98.1    2.4             98.1    7.2
Standard Deviation    0.6     1.6             2.2     2.4             3.1     4.7

It is evident that most worms are successfully detected with very high accuracy. Those with low scan rates, such as MyDoom-A and Rbot-AQJ, are hard to detect using our algorithm due to their lack of representation in the traffic. However, we maintain that our detection system uses session-based data of very high granularity and requires minimal data analysis compared with other approaches, such as detection based on a packet-by-packet mechanism. A challenge to this argument is that a worm with a higher scan rate, such as Zotob.G, bypasses our algorithm in a substantial percentage of cases, but this is due to its port pattern being very well distributed at both local and remote ports, with the former up to port 5000 and the latter across the maximum number of possible port values.

Analysing the results in detail indicates that setting (W=10, TAge=[300,40], BT=60, A=null) produces more stable and robust worm detection without compromising the need for a minimal FP rate. Even without removing the MyDoom-A, Rbot-AQJ and Zotob.G worms, the FP rates stay the same; only the TP rates are lower, due to the inability of the system to detect those three worms. Thus, we suggest that the setting (W=10, TAge=[300,40], BT=60, A=null) is better than the others.

4.6 Issues

For the dataset we used, we found that, most of the time, the time window's detection buffer held low volumes of data whereas the fixed window's buffer held a constant volume. This explains why, in our experiments, a test case in the time window completed in approximately 18 hours whereas one in a fixed window took around 5 days. Thus, testing too many possible parameter values in a fixed window would be too costly.

Some bias can be seen in the results. Because the TP and FP rates are presented as percentages, at a glance a 1% FP rate appears smaller than a 10% TP rate. However, the benign signatures make up 91% of the total: the TP and FP results are based on 25,023,865 signatures, of which only 2,449,057 are worm signatures, so equal-looking percentages correspond to very different absolute numbers of signatures.

Another issue is that, while the Dloader-NY worm reached half a million signatures, representing 22.60% of the whole dataset when detection performances were measured, MyDoom-A had only 2041, that is, 0.11%, which indicates that the MyDoom-A signatures in this dataset may not be as representative as those of Dloader-NY. This tells us that each Dloader-NY signature had a very low influence, while each MyDoom-A signature had a very high impact, on detection accuracy.

4.7 Conclusions

In this chapter, we presented the results of our experiment, which employed detection based on session rates/frequencies, patterns in port distribution and self and non-self signatures, all based on recent data in the traffic. We explored the performance of the system by testing several different settings and discovered that many parameter choices consistently produced good results. We also found that relying on one detector alone was inadequate, but that integrating the detectors to work in tandem formed a strong and stable detection mechanism.

The results showed that reliance on recent session-based network traffic was sufficient to construct a robust worm detection system. Although a few worm types were not fully detected, our algorithm proved to be very stable for detecting benign and most worms’ traffic and the characteristics of the worms that could and could not be fully detected were clearly identified.

The main outcome of this chapter is that combining more than one detection scheme produced very promising results. While most worms were detected by the system, those that generated low scan rates or used highly random ports require other detection mechanisms. This points the way towards the next layer of malware detection.

An SVM is an impractical solution for real-world deployment. Shafiq et al. (Shafiq, Khayam and Farooq 2008b) proposed the use of intelligent features, which require complex algorithms to improve detection accuracy. We presented a simpler but still robust detection technique.

All of the techniques used suffer from two main issues: 1) worms with low scan rates remain difficult to detect; and 2) legitimate home endpoints with complex usage patterns suffer from high false alarm rates. The point we want to highlight is that, regardless of how complex the detection algorithm is, these two issues remain unsolved if detection of this type of malware is confined to network-level, session-based traffic alone. If we can identify the limits of what one technique can detect, malware beyond those limits needs to be detected by other types of detection technique. With this trade-off, detection algorithms of lesser complexity can be effective if they work together. In this chapter we have recognised what types of malware we can and cannot detect at propagation; we aim to address the unsolved issues in Chapter 6.


Chapter 5

Evaluation of Performance of API Calls Hooking

Part of this work has previously appeared in Marhusin, M. F., H. Larkin, C. Lokan, and D. Cornforth (2008). An Evaluation of API Calls Hooking Performance. IEEE International Conference on Computational Intelligence and Security (CIS'08): pp. 315- 319.

5.1 Chapter Objectives

An open research question in malware detection is how to accurately and reliably distinguish between malware and benign programs running on the same machine. In contrast to code signatures, which are commonly used in commercial protection software, signatures derived from system calls have the potential to form the basis of a much more flexible defence mechanism. In this chapter, we report on our experimental analysis of implementing API hooking to capture sequences of API calls. The loading times of 10 common programs are benchmarked under three different settings: a ‘plain’ computer; one with antivirus software; and one with API hooking. The results suggest that hooking an executable onto a number of API functions does not cause a noticeable delay in the use of common application software. Therefore, the use of API call sequences as a means of distinguishing between benign and malware code execution is a viable possibility for investigation.

5.2 Introduction

In intrusion detection and malware detection research, there have been quite a number of studies using system calls/API calls as a source of data. It is believed that malware generates different sequences of API calls than benign programs. In addition, it is assumed that, for a set of malware sharing common malicious behaviour and methods of propagation, there are concise sets of sequences of API calls generated by one malware sample that could be used to detect them all. The accuracy of a detection mechanism is subject to the effectiveness of its algorithm.

In this chapter, we investigate the speed penalty imposed on a computer by an API hooking mechanism. This is an initial step in our overall work towards improving the effectiveness of malware detection techniques using sequences of API calls.

5.3 Problem Statement

There are many research studies that employ API calls in an IDS or malware detection system. In order to enable a system to make use of these calls, its OS must have a mechanism capable of hooking executables as they begin to execute.

As reported on the Microsoft Developer Network site (Microsoft Developer Network 2008), hooking all of an OS’s APIs would hamper its performance. Thus, many studies have suggested minimising the number of hooks to achieve the hooking goal while preserving a reasonable computer performance. However, there is still a lack of evidence on what performance penalty is incurred as a result of implementing API hooking.

The aim of this chapter is to report on a computer system’s performance when it has a mechanism for trapping the API calls invoked by programs, compared with its performance under two other settings. The computer’s three settings are:

1. plain: no antivirus software and no API hooking implemented;
2. antivirus: a free edition of the AVG antivirus (AVG Technologies 2008) installed; and
3. API hook: a program that actively monitors the launch of a new executable and immediately hooks its running process.

We run 10 programs on our test PC for each type of system and record their loading times, with the objective of determining if API hooking hampers the computer’s performance. We also measure the difference in performance between a computer that has antivirus installed and one that has API hooking. This chapter attempts to answer the following questions.

1. Does a hooked program run more slowly than a plain program?
2. Does a program execute more slowly in a computer which has a signature and heuristic-based detection antivirus than in one which does not?
3. Does a program with API hooking execute more slowly than one with an antivirus program?

5.4 Experimental Planning and Operation

As API hooking adds processing to each intercepted call, we assume that a hooking mechanism will cause hooked programs to run more slowly.

We choose the following 10 widely used programs.

• Ms Word 2003
• Ms Excel 2003
• Ms Outlook 2003
• Mozilla Firefox 3.0
• Internet Explorer 7 (IE)
• Windows Media Player 11 (M.Player)
• Winamp 5.54
• Yahoo Messenger 8.1 (YM)
• Windows Live Messenger 8.5 (Live)
• The Age of The Empire II (AOE2)

The environment used for the test lab is simple. The machine has an Intel(R) Pentium(R) M processor at 1600 MHz with 512 MB RAM and a 40 GB hard disk. Windows XP SP2 is installed on the machine and we use Ms Virtual PC for the test environment (Microsoft Corporation 2011b).

Prior to running every experiment, the PC’s and virtual PC’s hard disks are defragmented; this is essential to avoid fragmentation itself causing performance degradation. To enable the hooking mechanism, we write our hooking program in C#.NET in Visual Studio 2008 and use AutoIt v3 (Bennett 2008) to automate its execution steps.

For each program, we measure the time taken in seconds from its execution until its graphical user interface (GUI) becomes an active window. Each program is executed thirty times at intervals of several seconds. We are interested in recording the time differences occurring throughout the thirty rounds for the three established settings.
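As an illustration of this timing procedure, the following is a minimal sketch (not the AutoIt scripts actually used) of how a launch-to-GUI interval could be measured on Windows; the executable path and the window title are assumptions for the example.

```python
import subprocess
import time
import ctypes

user32 = ctypes.windll.user32  # Windows-only; used to look for the program's main window


def measure_launch_time(exe_path, window_title, timeout=60.0, poll=0.05):
    """Return seconds from process launch until a window with the given title exists."""
    start = time.perf_counter()
    proc = subprocess.Popen([exe_path])
    while time.perf_counter() - start < timeout:
        # FindWindowW returns a non-zero handle once the GUI window has been created
        if user32.FindWindowW(None, window_title):
            return time.perf_counter() - start
        time.sleep(poll)
    proc.kill()
    raise TimeoutError(f"{exe_path} did not show '{window_title}' within {timeout}s")


if __name__ == "__main__":
    # Hypothetical path and title; the experiments used AutoIt scripts and 30 rounds per program
    times = [measure_launch_time(r"C:\Program Files\Winamp\winamp.exe", "Winamp")
             for _ in range(30)]
    print(f"mean launch time = {sum(times) / len(times):.2f}s")
```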

As mentioned earlier, we examine a computer’s performance when running the same programs under three different settings, namely plain, with antivirus and with an API hooking program (which logs only the API functions’ names into text files and discards the other parameters). For each program, we prepare a script to launch it and close it when its GUI is ready.

At this stage, we opt to hook only kernel32.dll, which contains the OS’s core functions for many basic operations, as our interest is only in the functions related to create, open, move, delete and replace file operations. Although we plan to monitor socket-based API calls, which enable network communications, and registry operations, whose functions reside in advapi32.dll and whose keys are commonly hooked by malware, these operations are not included in this experiment for simplicity. The following is a list of the functions of interest:

Kernel32.dll

• CopyFileA
• CopyFileExA
• CopyFileExW
• CopyFileW
• CreateFileA
• CreateFileW
• DeleteFileA
• DeleteFileW
• MoveFileA
• MoveFileExA
• MoveFileExW
• MoveFileW
• MoveFileWithProgressA
• MoveFileWithProgressW
• OpenFile
• ReadFile
• ReadFileEx
• ReadFileScatter
• ReplaceFile
• ReplaceFileA
• ReplaceFileW
• WriteFile
• WriteFileEx
• WriteFileGather

It should be noted that the above list of API titles is picked heuristically and, in Chapter 6, we reveal a longer list used in later experiments. We observe that those programs invoke some of these functions at times throughout their execution periods, including within the execution period required for this experiment.
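To make the logging step concrete, below is a minimal sketch, under our own assumptions about the log format, of how a hook callback could record only the names of the monitored kernel32.dll functions and discard their parameters; it is not the C#.NET hooking agent used in the experiments.

```python
# Watch-list of the kernel32.dll file-operation functions monitored in this experiment
WATCHED = {
    "CopyFileA", "CopyFileExA", "CopyFileExW", "CopyFileW",
    "CreateFileA", "CreateFileW", "DeleteFileA", "DeleteFileW",
    "MoveFileA", "MoveFileExA", "MoveFileExW", "MoveFileW",
    "MoveFileWithProgressA", "MoveFileWithProgressW", "OpenFile",
    "ReadFile", "ReadFileEx", "ReadFileScatter",
    "ReplaceFile", "ReplaceFileA", "ReplaceFileW",
    "WriteFile", "WriteFileEx", "WriteFileGather",
}


def log_call(log_file, function_name, *args):
    """Hook callback: keep only the function name, drop all parameters."""
    if function_name in WATCHED:
        log_file.write(function_name + "\n")


if __name__ == "__main__":
    # Example: replaying an intercepted call stream into a per-process text log
    intercepted = [("CreateFileW", r"C:\temp\a.txt"), ("HeapAlloc", 4096), ("WriteFile", 42)]
    with open("api_trace_example.txt", "w") as log:
        for name, *params in intercepted:
            log_call(log, name, *params)
```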

5.5 Results and Discussion

The tests could not avoid outliers occurring. We define an outlier as a performance record in a particular round that deviates markedly from those in other rounds. The appearance of an outlier may have been due to an interruption by another program or the PC’s health. When we encounter an outlier, we simply rerun the thirty rounds for that program.


Table 5.1: Execution times for first run of programs

Program      Plain exec. time (s)   Antivirus exec. time (s)   Antivirus overhead (%)   Hook exec. time (s)   Hook overhead (%)
Ms Word      1.71                   4.88                       185.38                   4.65                  171.93
Ms Excel     1.22                   1.74                       42.62                    1.67                  36.89
Ms Outlook   3.86                   7.31                       89.38                    6.71                  73.83
Firefox      3.19                   4.89                       53.29                    4.09                  28.21
IE           2.82                   5.04                       78.72                    3.90                  38.30
M.Player     1.27                   3.12                       145.67                   1.49                  17.32
Winamp       2.66                   4.93                       85.34                    2.27                  -14.66
YM           2.30                   4.36                       89.57                    2.84                  23.48
LIVE         2.42                   3.22                       33.06                    2.44                  0.83
AOE2         2.19                   2.76                       26.03                    2.26                  3.20
Mean         2.36                   4.23                       82.91                    3.23                  37.93
Std Dev      0.83                   1.57                       50.30                    1.61                  53.04

One quite interesting outcome of the experiment, seen by comparing Table 5.1 with Table 5.2, is that, regardless of whether the computer has the plain, antivirus or API hooking setting, its first run of a program tends to take longer than later runs. Also, a computer with either antivirus or hooking has a much longer start-up delay than one with the plain setting, except for Winamp under the API hooking setting, which executes slightly faster. In the first run of all programs, the antivirus causes more delay (an 82.91% execution overhead) than the API hooking (a 37.93% execution overhead), an overhead difference of 44.97 percentage points (see Table 5.1).

We analyse the results using the paired-samples Wilcoxon test to check for statistically significant differences. At the .05 significance level, we conclude that the execution times for the first runs of the programs for the plain and antivirus settings are significantly different [V = 0, p-value = 1.953E-03], as are those for the plain and API hooking settings [V = 4, p-value = 1.367E-02] and those for the antivirus and API hooking settings [V = 55, p-value = 1.953E-03].
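As a rough illustration, assuming SciPy is available, the same paired comparison can be reproduced from the first-run times in Table 5.1. Note that R's wilcox.test reports V (the sum of positive ranks), whereas scipy.stats.wilcoxon reports the smaller of the two rank sums, so the statistics differ but the exact two-sided p-values agree.

```python
from scipy.stats import wilcoxon

# First-run execution times (seconds) from Table 5.1, per program
plain     = [1.71, 1.22, 3.86, 3.19, 2.82, 1.27, 2.66, 2.30, 2.42, 2.19]
antivirus = [4.88, 1.74, 7.31, 4.89, 5.04, 3.12, 4.93, 4.36, 3.22, 2.76]
hook      = [4.65, 1.67, 6.71, 4.09, 3.90, 1.49, 2.27, 2.84, 2.44, 2.26]

for label, a, b in [("plain vs antivirus", plain, antivirus),
                    ("plain vs hook", plain, hook),
                    ("antivirus vs hook", antivirus, hook)]:
    stat, p = wilcoxon(a, b)  # paired, two-sided by default
    print(f"{label}: statistic={stat}, p-value={p:.4g}")
```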


We conclude that a computer does not perform at full speed when it runs a program for the first time after a reboot, as it apparently requires extra time to launch the program. To illustrate: when, after a reboot, we run Word, it takes more time to execute the first time than subsequently; then, without rebooting the computer, we run Excel, which also experiences a delay during its first run. This is consistent across all settings for the software tested in the experiments. We also conduct a manual run of the software (without the help of the AutoIt3 script) involving thirty rounds with 30-minute intervals between them. The first runs of most programs consistently take longer than the remaining twenty-nine, which run at a constant speed. It is likely that caching mechanisms (Stallings 2011) implemented in the OS, RAM, CPU or hard disk have some influence.

Only for the Winamp and IE programs do the tests show that, throughout the thirty rounds, the antivirus setting incurs greater delays than the API hooking setting.

Also, Winamp does not run at a constant speed under the antivirus setting, and the same happens for the AOE2 and YM programs under the hooking setting. This can be seen from their standard deviations relative to their respective means.

As we treat the plain setting as the baseline for benchmarking, the delays of the computer under the antivirus and hooking settings are measured relative to it.

Throughout thirty runs, we observe the running performances of the computer using those three settings (see Table 5.2).

111 Chapter 5. Evaluation of Performance of API Calls Hooking

Table 5.2: Average execution times over 30 runs (in seconds)

Program      Plain mean (s)   Plain std dev   Antivirus mean (s)   Antivirus std dev   Antivirus overhead (%)   Hook mean (s)   Hook std dev   Hook overhead (%)
Word         0.42             0.25            0.98                 0.74                133.33                   1.06            0.68           152.38
Excel        0.37             0.16            0.40                 0.25                8.11                     0.94            0.14           154.05
Outlook      0.84             0.57            1.33                 1.13                58.33                    1.33            0.99           58.33
Firefox      0.79             0.45            0.88                 0.76                11.39                    1.83            0.46           131.65
IE           0.78             0.39            1.89                 0.60                142.31                   1.28            0.50           64.10
M.Player     0.41             0.16            0.48                 0.50                17.07                    0.89            0.12           117.07
Winamp       0.56             0.40            2.95                 0.51                426.79                   1.12            0.22           100.00
YM           0.66             0.32            0.75                 0.68                13.64                    1.57            0.39           137.88
LIVE         0.61             0.34            0.66                 0.48                8.20                     1.18            0.24           93.44
AOE2         0.42             0.25            0.98                 0.74                133.33                   1.06            0.68           152.38
Mean         0.59             0.33            1.13                 0.64                95.25                    1.23            0.44           116.13
Std Dev      0.18             0.13            0.77                 0.23                129.69                   0.29            0.28           35.91

From both tables above, we can see that, for the first run, the plain setting is the fastest, followed by those with API hooking and antivirus (see Table 5.1). However, over all thirty runs, the pattern is not as consistent (see Table 5.2). Although the plain setting is still the fastest, the API hooking is slower than the antivirus in 8 cases and faster in 2 (IE and Winamp). Overall, API hooking causes programs to incur an average of a 116.13% increase in execution time and the antivirus 95.25%. This suggests that the penalties incurred by the API hooking and antivirus settings on programs’ execution times are high.

We analyse the results using the paired-samples Wilcoxon test to check for statistically significant differences. At the .05 significance level, we conclude that the average execution times over 30 runs of the programs for the plain and antivirus settings are significantly different [V = 0, p-value = 5.857E-03], as are those for the plain and API hooking settings [V = 0, p-value = 5.857E-03], whereas those for the antivirus and API hooking settings are not significantly different [V = 15, p-value = 0.407].

The speeds recorded are specific to the software versions and computer specifications used. If another person replicates these tests, slight variations in the recorded performances might be obtained, and replication with different computer specifications or software versions would also be affected by any additional features introduced by the respective vendors. However, we are confident that the average ratios among these three test settings (plain, with antivirus and with API hook) would be close to our results.

Although Winamp under the API hooking setting performs better in its first runs, we cannot rule out that the complexity of the test environment has some effect; for example, the locations of the Winamp program and the virtual PC’s hard disk in the plain setting might require longer access times (Stallings 2011) than those in the API hooking setting. We conclude that optimised API hooking does impose some overhead on computer performance; however, on average the overhead is close to that imposed by antivirus software. The overhead could be further reduced by careful selection of a suitable set of API calls to hook, which means that API hooking could be a suitable mechanism for malware detection.

5.6 Conclusion

We have investigated overheads associated with hooking API calls. Our objective is to understand the relative performance of this method by comparing it with two other scenarios: a plain computer, and a computer with antivirus software. We conclude that the performance penalty due to API hooking is no worse than that imposed by antivirus software.

This means that it is feasible to consider using API calls to determine whether a program is malware or not. This is the subject of the next chapter.


Chapter 6

Towards Effective Detection of Malware at Execution

6.1 Chapter Objectives

The aim of this study is to detect malware as it begins to execute and, in this chapter, we propose a data mining approach for malware detection using sequences of API calls in a Windows environment.

Most early API calls-based research on malware detection was evaluated in the UNIX environment. However, recently, research in the Windows environment has been gaining momentum.

Although there are a number of convincing solutions to the detection of malware, such as those of (Kolbitsch, Comparetti, Kruegel, Kirda, Zhou and Wang 2009) and (Ahmed, Hameed, Shafiq and Farooq 2009), little research effort has been focused on pre-emptive detection. Also, there has been a lack of systematic research in the literature on how an existing malware dataset could be used to simulate a defence mechanism against zero-day attack scenarios.

Malware detection based on API call sequences uses spatial and temporal information in these sequences. Ahmed et al. (Ahmed, Hameed, Shafiq and Farooq 2009) captured both spatial and temporal information and used statistical features to improve detection. They aimed to identify the best features for achieving best results. They evaluated their data against several machine-learning algorithms (Witten and Frank 2005), namely the instance-based learner (IBk), decision tree (J48), Naïve Bayes (NB), inductive rule learner (RIPPER) and SVM.

As previously discussed, most emphasis in the literature has been on improving detection accuracy. Some algorithms yield very high accuracy but are very costly, so they are not suitable in practice for commercial deployment.

Regarding execution performance, we note that SVM is a complex algorithm, and hooking Memory Management-related API calls is likely to be expensive since they are very commonly executed. Therefore, we are interested in proposing a scheme that involves simpler computations than SVM, and which is not based on memory-related API calls.

There are two main experiments discussed in this chapter. The first uses the same dataset as in (Ahmed, Hameed, Shafiq and Farooq 2009) for comparison purposes, and uses k-fold ( k=10) cross-validation to evaluate accuracy. The second adds several worm samples to this dataset, including those mentioned in Chapter 4 as difficult to detect; and treats this dataset chronologically instead of using cross-validation.

The remainder of this chapter is organised as follows. Section 6.2 describes the detection system, covering its architecture, profiles and detection process, and presents two experiments. Section 6.3 describes the first experiment, including details of its dataset, feature selection and reduction, n-grams, application of a k-fold cross- validation, plan and objectives, performance measures and hardware used, as well as its results. Experiment II is described in a similar way to experiment I in Section 6.4, and its results are compared with those from experiment I and the literature in Section 6.5. Before providing our conclusions in Section 6.7, we highlight the potential issues and challenges in Section 6.6.

6.2 Detection System

In this section, we discuss our detection system in three sub-sections: its architecture and algorithm, and how it could perform in a real environment; its malware and benign profiles, and how they should be managed; and, the most critical part, how the detection algorithm works, including descriptions of some of its parameters and the values obtained for them, and the results and parameters involved when each particular executable is monitored.

6.2.1 Detection Architecture

The architecture of our detection system in a production environment is illustrated in Figure 6.1. This system is designed to trace the API calls of a given process as soon as the program is executed. This could be done by tracing its thread creations in memory and by hooking each program independently via separate hooking agents.

Figure 6.1: Detection architecture

In our study, we form an algorithm based on the Self/Non-self Theory. We have malware and benign profiles, and identify a threshold that distinguishes between them. To dynamically modify the threshold, we adapt the roles of the NK cell, which identifies the preferred threshold, and the Suppressor, which moderates its setting so that it does not over-react. We explain the profiles, the threshold and its self-adjustment process in Sections 6.2.2 and 6.2.3.

Figure 6.2 shows the general structure of API call trapping for an executable. While an executable is running, the system captures its sequences of API calls, which are processed and transformed before being passed to the decision component as a block of n-grams. Statistical data is retrieved from this block, describing the n-grams’ degrees of closeness to both the malware and benign profiles. The decision component relies greatly on the information learned from past data contained in these profiles, and uses a threshold to make the decision. If the executable’s score is above this threshold, it is deemed malware; otherwise, it is deemed benign. The process of selecting the threshold value is inspired by the roles of the NK and Suppressor cells.


Figure 6.2: General structure of API call trapping on a single executable (components: API calls stream → sequences of n-grams → decision, using the benign and malware profiles, with profile updates)

Although many programs can be hooked, some mature or trusted ones can be treated more leniently or ignored, thereby avoiding the need to hook too many safe executables, which may include protected executables and standard services of the OS.

6.2.2 Benign and Malware Profiles

Benign and malware profiles maintain a system’s knowledge about its self and non-self. They are built based on knowledge of known benign and malware detected in the past.

In a commercial environment, an antivirus company can obtain these two required profiles by collecting the API calls of common software used by users worldwide. Which program profiles need to be included would then depend on the programs installed or available on the relevant computers. Its malware profile could be generated from its existing malware collection or from several websites that offer already-detected malware. The results are two common profiles for use in detection during execution, updates to which can be made, as necessary, under the supervision of the antivirus company. Within this framework, each machine will have a unique profile of itself, very similar to the HIS, which is unique to an individual, but updates could also be standardised by the antivirus company. Determining whether this approach will work requires further research.

6.2.3 Detection Process

Details of the proposed algorithm are presented in this section. Throughout this chapter we use the notations listed in Table 6.1:

Table 6.1: Notations used for algorithm

Notation   Description
B          All benign files in the training dataset
M          All malware files in the training dataset
Bn         n-grams collected from all benign training files
Mn         n-grams collected from all malware training files, with any n-grams that also appear in Bn removed (so Mn does not overlap Bn)
m          Mimics the indicator for the MHC level in a cell; the process for obtaining the m value is described on the next page
m'         New value of m after adjustment influenced by NK and S
X          Current file to be evaluated, taken from the testing data
Xn         n-grams of the testing file X
y          Score value returned when Xn is parsed against Mn and Bn
NK         Natural Killer cell
S          Suppressor cell

As an executable runs, its n-grams are monitored. Then, to make a decision, the collection of n-grams seen so far (Xn) is compared with the total sets of n-grams from the known malware (Mn) and known benign programs (Bn). Three counts are obtained: the numbers of n-grams in common with malware, with benign and with neither. From this, the ratio y is computed as y = (n-grams in common with Mn) / (n-grams in common with Mn + n-grams in common with Bn); if it is at or above a threshold (m), the executable is deemed to be malware.

The main question is the reliability of the decision. Another question of interest is how soon during execution can a reliable decision be made.

The initial value of m is found by computing the ratio y for each known malware in the training set (using all its n-grams) and selecting the median of all of the y values. The value of m can then be adjusted.

Generally, the detection process undergoes the following two main stages.

Stage 1: Preparation

The n-grams of the malware ( Mn ) and benign programs (Bn ) in the training set are obtained and gathered as two separate collections, with those that appear in both being removed from the Mn profile but retained in the Bn profile.

Mn and Bn are used for two purposes: as part of the process for determining the decision threshold ( m); and for describing whether each test executable is malware or benign.

Then, for each file in M, its n-grams (Xn) are obtained and compared with those of Mn and Bn. The idea is to determine how each malware compares with other malware and with benign programs. The outcome from that comparison is a y value for each file in M: y = (total n-grams in common with Mn) / (total n-grams in common with Mn + total n-grams in common with Bn). These y values are sorted and their median is the initial value for m, using which benign programs will most likely be correctly detected. However, this initial value of m is clearly too high: by definition, many malware will fall below this threshold and thus not be detected. Therefore, before the testing phase begins, m is adjusted by the rule: if (m > S) then m = NK * S, else m = NK * m, giving the new value of m.

Stage 2: Evaluation of testing set

For each file (X) in the testing set, Xn is parsed into the n-grams matching Bn and the n-grams matching Mn. From this, the ratio y is computed as y = (total n-grams in common with Mn) / (total n-grams in common with Mn + total n-grams in common with Bn) and, if it is at or above the threshold (m), the executable is deemed malware.
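The following is a minimal sketch of the two stages described above, assuming n-grams are already extracted as tuples of API names; the NK and S defaults shown are the values reported later in this chapter, and the helper names are our own.

```python
from statistics import median


def build_profiles(benign_ngram_sets, malware_ngram_sets):
    """Stage 1: build Bn and Mn (n-grams shared with benign are removed from Mn)."""
    Bn = set().union(*benign_ngram_sets)
    Mn = set().union(*malware_ngram_sets) - Bn
    return Bn, Mn


def score(Xn, Bn, Mn):
    """Ratio y: closeness of the observed n-grams to the malware profile."""
    in_m = len(Xn & Mn)
    in_b = len(Xn & Bn)
    return in_m / (in_m + in_b) if (in_m + in_b) else 0.0


def initial_threshold(malware_ngram_sets, Bn, Mn, NK=0.847, S=0.061):
    """Median y over the training malware, then the NK/S adjustment rule."""
    m = median(score(set(xn), Bn, Mn) for xn in malware_ngram_sets)
    return NK * S if m > S else NK * m


def is_malware(Xn, Bn, Mn, m):
    """Stage 2: flag the file if its score reaches the (adjusted) threshold."""
    return score(set(Xn), Bn, Mn) >= m
```

For pre-emptive detection, the same score function can simply be applied to the n-grams observed so far (for example, only the first 113).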


6.3 Experiment I

Currently, there are tools, such as the apimonitor software program (www.apimonitor.com 2010), Deviare framework (Nektra Advanced Computing 2010) and Detours software package (Microsoft 2012c), which can assist developers and researchers to intercept API calls of Windows’ OSs. These could be used to collect benign and malware profiles.

We use an existing dataset, available at (nexginrc.org 2010), to enable comparison with the results published in the work of Ahmed et al. (Ahmed, Hameed, Shafiq and Farooq 2009) who used the same dataset.

This dataset consists of sequences of API calls from 98 benign programs and 416 malware executables. The latter comprise 117 trojans, 165 viruses and 134 worms, and include a number of malware that make use of obfuscation techniques. Based on our checking of online databases describing malware (Symantec 2007; Kaspersky Lab 2010; Symantec 2010), some malware implement polymorphism or encryption engines, e.g., Virus.Win32.Alman.a , Virus.Win32.Dream.4916 , Virus.Win32.Crypto , Virus.Win32.Chop.3808 and Virus.Win32.Aris , and others packing and unpacking engines, e.g., Worm.Win32.Lioten , Trojan.Win32.AVKill.a , Trojan.Win32.AntiNOD.b , Trojan.Win32.Ajim , Mytob and Zotob . We believe that the inclusion of these malware will provide some insight into the capability of our detection system to fight evasion attacks. As described by the original authors, this malware collection was obtained from (VX Heavens 2010), and proprietary software (www.apimonitor.com 2010) was used to record the API calls of its benign programs and malware executables. The sizes of the executables used varied: in the benign category, the minimum and maximum were 4KB and 104588KB, respectively, and the average 1263KB; whereas in the malware category, the maximums of the trojan, virus and worm types were comparatively small, being only 9,277KB, 5,832KB and 1,301KB, respectively, with an average of around 266.7KB and a minimum of approximately 2.7KB. Apparently, benign programs generate longer API sequences than malware. Below is an example of API call

sequences generated by the trojan horse Win32.Bancos.j.apm [6]:

“…GlobalFree, RegOpenKeyExA, RegOpenKeyExW, HeapAlloc, HeapFree, RegQueryValueExW, HeapAlloc, HeapFree, RegCloseKey, GlobalSize…”.

We examined the dataset and API classes on the Microsoft Developer Network (MSDN) website (Microsoft Corporation 2010c) and noted that the APIs appearing in the dataset fall into the 16 classes listed in Table 6.2.

Table 6.2: API classes in MSDN Library (Microsoft Corporation 2010c)

No   API Function/Routine Classes
1    Registry
2    Network
3    Network Share Management
4    Windows Networking Functions
5    Memory Management
6    Windows Native System Services Routine (Windows Driver Kit)
7    File Management
8    Directory Management
9    Volume Management
10   Disk Management
11   Large Integer Functions
12   Winsock
13   Winsock Service Provider Interface (Winsock SPI)
14   Process and Thread
15   Process Status API (PSAPI) Functions
16   Dynamic Link Libraries

We grouped them into seven classes: 1) registry; 2) network; 3) memory; 4) file directories and special functions; 5) socket; 6) process and thread; and 7) dynamic link libraries, as listed in Table 6.3.

[6] The names of most of the malware in the dataset are based on the classification in http://www.vxheavens.com.


Table 6.3: API classes (Microsoft Corporation 2010c) evaluated in study

Class ID   API Function/Routine Classes
1          Registry
2          Network, Network Share Management and Windows Networking Functions
3          Memory Management
4          File Management, Directory Management, Volume Management, Disk Management and Large Integer Functions
5          Socket (Winsock and Winsock SPI)
6          Process and Thread, and Process Status API Functions
7          Dynamic Link Libraries

Our exploratory analysis shows that there is a total of 237 unique API calls generated by benign programs and malware executables, of which the benign use only 166 and the malware 195. There are 71 API calls used only by malware executables of which 33 (46%) appear in all malware classes while 12.6%, 11.2% and 19.7% appear exclusively in trojan, virus and worm, respectively, as listed in Table 6.4.

Table 6.4: API calls used by benign programs and malware executables in dataset

Category                                             Total
Total APIs                                           237
Total in benign                                      166
Total in malware                                     195
Shared in malware and benign                         124
Exclusive in benign                                  42
Exclusive in malware                                 71
Mutually exist in trojan, virus and worm but not benign   33
Total in trojan                                      159
Total in virus                                       147
Total in worm                                        140
Exclusive in trojan                                  9
Exclusive in virus                                   8
Exclusive in worm                                    14

The API counts above consider each distinct API name only once. A real API call sequence involves repeated invocations of functions, from the initial execution of the executable until it stops. Table 6.5 shows the total numbers of

appearances of API calls in the dataset. Although there are fewer benign programs than malware executables, they carry 56.8% of the total API call invocations, followed by trojan (16.3%), virus (15.7%) and worm (11.1%). The ratios of malware calls appearing exclusively in trojan, virus and worm to the total malware calls are very small, being 0.03%, 0.01% and 0.02%, respectively.

The benign programs in the dataset generate relatively more API calls than the malware. The calls used by benign, malware and both are shown in Table 6.5, in which it can be seen that only a small proportion of the API calls, 0.09%, appears exclusively in benign programs’ executions. Trojan, virus and worm types use APIs which do not exist in any benign application; calls to such APIs account for 5.7% of trojan calls, 1.58% of virus calls and 1.6% of worm calls.

Table 6.5: Shared or exclusive API call sequences in benign and malware

Category                                                       Benign      Trojan    Virus     Worm
Total executables                                              98          117       165       134
Total calls                                                    2,210,786   635,989   612,808   433,554
Appear only in benign                                          2,061       N/A       N/A       N/A
Appear in benign & malware                                     2,208,725   599,533   603,124   426,604
Appear in malware but not in benign                            N/A         36,456    9,684     6,950
Malware calls appearing exclusively in trojan, virus or worm   N/A         222       59        87

We also investigate whether these same APIs are invoked repetitively and find that most are not in either malware or benign as they represent only a very small proportion of the API call sequences. However, they appear very infrequently in certain categories of malware classes, as shown in Table 6.6.


Table 6.6: Repetitive/non-repetitive API call sequences used by benign and malware

Type     Use       Single      Multi     Single (%)   Multi (%)
Benign   Shared    1,843,884   364,841   83.40        16.50
Benign   Benign    1,935       126       0.09         0.01
Trojan   Shared    525,388     74,145    82.61        11.66
Trojan   Malware   35,812      644       5.63         0.10
Virus    Shared    519,988     83,136    84.85        13.57
Virus    Malware   8,928       756       1.46         0.12
Worm     Shared    378,917     47,687    87.40        11.00
Worm     Malware   6,674       276       1.54         0.06

In summary, to discriminate benign from malware programs is challenging because a large number of APIs are used by both, that is, 99.91% in benign, 94.27% in trojan, 98.42% in virus and 98.4% in worm. Based on this information, we expect that viruses will contain more n-grams similar to benign, followed by worms and trojans.

6.3.1 Features Selection and Data Reduction

An API call sequence contains a number of features depending on the names of its functions. A sequence of API calls captured using the (www.apimonitor.com 2010) tool can contain comprehensive information, such as the executable’s profile, and the function’s name and its associated parameters. Although spatial and temporal information could be retrieved from a collection of API call sequences, using too many features will usually involve more complex detection algorithms in order to associate them and produce aggregated data. Therefore, we use only function names as a feature for detection purposes.

We investigate the need to ignore certain API classes in the dataset and find that some which appear in benign, malware or both seem to have high concentrations of one type, as can be observed from the APIs in Classes 3, 4 and 7 shown in Table 6.7.


Table 6.7: Insight into reduction process for API classes

Class ID   API Function/Routine Classes                                                                             Appear in Benign   Appear in Both   Appear in Malware
1          Registry                                                                                                 15                 34               n/a
2          Network, Network Share Management and Windows Networking Functions                                       1                  4                10
3          Memory Management                                                                                        1                  33               1
4          File Management, Directory Management, Volume Management, Disk Management and Large Integer Functions    n/a                n/a              50
5          Socket (Winsock and Winsock SPI)                                                                         20                 16               6
6          Process and Thread, and Process Status API Functions                                                     5                  32               4
7          Dynamic Link Libraries                                                                                   n/a                5                n/a

As too many API calls are invoked from Memory Management, with many at a high frequency and yet co-existing in high proportions in benign and malware, we propose that memory-related API calls not be used as a source of data due to their high frequency of variable declaration, invocation and re-invocation in modern programs. Based on this, and further evidence provided in the next section, we remove all memory-related API calls in the dataset.

It is noted that the benign data does not use any API calls from Class 4, which suggests that the benign API calls were only partially collected, perhaps covering each executable only from its initial execution up to a certain time or condition (e.g., when the GUI is ready), and not including the calls generated once an end-user started interacting with the program.

6.3.2 n-grams

An n-gram is a technique used in data mining (Witten and Frank 2005) and can be defined as a sub-sequence of n items from a given stream or sequence of data which can come from various sources, such as text, graphic, audio and video. Concerning the dataset we use in this chapter, the term sequence or stream refers to the API calls

invoked by a running executable, and an item refers to any API function within the chosen classes. Hence, an n-gram of an API call sequence refers to a sequence of functions whose length is given by the value of n.

n-grams may be extracted as overlapping or non-overlapping sub-sequences of items. We apply non-overlapping sequences of API calls; for example, a string of API call sequences generated by a trojan horse named Win32.Bancos.j.apm is:

“…GlobalFree, RegOpenKeyExA, RegOpenKeyExW, HeapAlloc, HeapFree, RegQueryValueExW, HeapAlloc, HeapFree, RegCloseKey, GlobalSize …”.

If the size of n=5, it can be transformed into:

n-gram 1 = GlobalFree, RegOpenKeyExA, RegOpenKeyExW, HeapAlloc, HeapFree

n-gram 2 = RegQueryValueExW, HeapAlloc, HeapFree, RegCloseKey, GlobalSize

Further, these n-grams can be transformed into the simpler format of:

n-gram 1 = 1, 2, 3, 4, 5

n-gram 2 = 6, 4, 5, 7, 8
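As an illustration of this transformation, the following is a minimal sketch, under our own naming, of non-overlapping n-gram extraction and the integer re-encoding shown above.

```python
def ngrams(calls, n=5):
    """Split an API-call stream into non-overlapping n-grams (any trailing remainder is dropped)."""
    return [tuple(calls[i:i + n]) for i in range(0, len(calls) - n + 1, n)]


def encode(calls):
    """Map each distinct API name to a small integer, in order of first appearance."""
    ids = {}
    return [ids.setdefault(name, len(ids) + 1) for name in calls]


# The Win32.Bancos.j.apm fragment quoted above
stream = ["GlobalFree", "RegOpenKeyExA", "RegOpenKeyExW", "HeapAlloc", "HeapFree",
          "RegQueryValueExW", "HeapAlloc", "HeapFree", "RegCloseKey", "GlobalSize"]

print(ngrams(stream, 5))           # two 5-grams of API names
print(ngrams(encode(stream), 5))   # [(1, 2, 3, 4, 5), (6, 4, 5, 7, 8)]
```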

Determining actual n-gram sizes is very important as two issues arise, as highlighted in (Forrest, Perelson, Allen and Cherukuri 1994) and (Forrest, Hofmeyr and Somayaji 1998). If n is small, the n-gram sequences will find it difficult to discriminate between benign and malware. Inversely, a large n will create a large number of unique sequences as they form a larger combinational matrix of APIs.

We investigate this issue and evaluate n sizes of 1 to 10. As shown in Table 6.8, the best setting appears to be n=5, as values of n greater than five yield diminishing additional reduction while producing larger n-gram cardinality. Therefore, we set n=5 in this experiment, which means that there are five function names in every n-gram.

The dataset contains 48472 benign and 19732 malware n-grams when n=5. Removing the Memory Management class greatly reduces these numbers to only 14220 benign and 7552 malware n-grams, reductions of 70.66% and 61.73%, respectively, from their original values.


Table 6.8 shows the number of n-grams in the benign and malware profiles based on all the 10-fold data for n-gram sizes of 1 to 10, including numbers with and without the Memory Management class APIs. The last column shows the number of n-grams in the malware profile after removing redundant n-grams appearing in the benign dataset.

Table 6.8: Unique n-grams of benign vs malware

n size   Remove Memory Management Class?   Benign   Malware (Redundant)   Malware (Unique)
1        No                                166      195                   71
1        Yes                               132      161                   70
2        No                                2217     2791                  1412
2        Yes                               1237     1850                  1188
3        No                                10877    10680                 5481
3        Yes                               4460     5173                  3385
4        No                                27969    21821                 12443
4        Yes                               9477     8434                  5651
5        No                                48472    30761                 19732
5        Yes                               14220    10423                 7552
6        No                                65876    35837                 25590
6        Yes                               17530    11071                 8621
7        No                                77911    38160                 29245
7        Yes                               19080    11116                 9135
8        No                                84058    38224                 30856
8        Yes                               19408    10819                 9209
9        No                                86351    37921                 31899
9        Yes                               19423    10568                 9190
10       No                                85630    36876                 31787
10       Yes                               18854    10104                 8929

We analyse the results in Table 6.8 using the paired-samples Wilcoxon test to see if there is evidence of a real difference with and without removing the Memory Management class APIs and if there are real differences between the three benign/malware columns. At the .05 significance level, we conclude that, with the Memory Management class APIs, there is a significant difference for the number of n-grams in the benign and malware (redundant) categories [V = 49, p-value = 2.734E-02]

as are those for the benign and malware (unique) categories [V = 55, p-value = 1.953E-03] and for the malware (redundant) and malware (unique) categories [V = 55, p-value = 1.953E-03]. Similar results are also obtained when we remove the Memory Management class APIs for the number of n-grams in the benign and malware (redundant) categories [V = 51, p-value = 1.367E-02], as are those for the benign and malware (unique) categories [V = 55, p-value = 1.953E-03] and for the malware (redundant) and malware (unique) categories [V = 55, p-value = 1.953E-03].

Table 6.9: Percentage of n-gram reduction from the removal of the Memory Management class APIs

n size               Benign (%)   Malware (Redundant) (%)   Malware (Unique) (%)
1                    20.5         17.4                      1.4
2                    44.2         33.7                      15.9
3                    59.0         51.6                      38.2
4                    66.1         61.4                      54.6
5                    70.7         66.1                      61.7
6                    73.4         69.1                      66.3
7                    75.5         70.9                      68.8
8                    76.9         71.7                      70.2
9                    77.5         72.1                      71.2
10                   78.0         72.6                      71.9
Average              64.2         58.7                      52.0
Standard Deviation   18.6         19.0                      25.3

Statistical analysis shows that the removal of the Memory Management class APIs significantly reduces the number of benign n-grams [F(1, 18) = 9.934, p < 5.520E-03], as it does for the malware (redundant) [F(1, 18) = 11.88, p < 2.870E-03] and malware (unique) [F(1, 18) = 8.734, p < 8.470E-03] categories. Table 6.9 shows that the amount of n-gram reduction increases with the size of n.

6.3.3 k-fold Cross-validation

Cross-validation (Refaeilzadeh, Tang and Liu 2009) is used to assess the results of a

learning algorithm by dividing a dataset into two parts called training and testing sets. It is a common technique known as k-fold cross-validation, for which k is usually set to 10. Its objective is to provide each partition of a dataset with an equal chance of both validating other partitions and being validated. Thus, the k-fold aims to make use of all the data for both training and testing while avoiding the over-fitting that can arise if the same data is used in full for both training and testing.

We distribute the 98 benign and 416 malware files equally into 10 folds, following the standard k-fold ( k=10) cross-validation scheme. The former are sorted alphabetically, with each sequentially placed in one fold, thereby resulting in 8 folds with 10 benign programs and 2 with 9. For the malware files, we first group them according to their malware types and sort them alphabetically. Then, those in the trojan cluster are sequentially added to folds followed by those in the virus and worm clusters.
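A minimal sketch of this fold assignment, under our own reading and helper names, is shown below: file names are sorted (benign alphabetically; malware grouped by type and sorted within each group) and then dealt round-robin into the ten folds.

```python
def round_robin_folds(files, k=10):
    """Deal file names into k folds, one at a time, preserving the given order."""
    folds = [[] for _ in range(k)]
    for i, name in enumerate(files):
        folds[i % k].append(name)
    return folds


# Benign: 98 programs sorted alphabetically -> 8 folds of 10 and 2 folds of 9
benign_folds = round_robin_folds(sorted(f"benign_{i:03d}" for i in range(98)))

# Malware: grouped by type (placeholder names), sorted within each group, then dealt in turn
trojans = sorted(f"trojan_{i:03d}" for i in range(117))
viruses = sorted(f"virus_{i:03d}" for i in range(165))
worms   = sorted(f"worm_{i:03d}" for i in range(134))
malware_folds = round_robin_folds(trojans + viruses + worms)

print([len(f) for f in benign_folds])   # [10, 10, 10, 10, 10, 10, 10, 10, 9, 9]
```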

In each of these 10-fold data sets, one fold is used as testing data and the other 9 form the training set. As one round of the k-fold begins, we pass the training set to the detection system so it can begin learning it to find the values for Bn, Mn and m. Then, the testing folds are loaded for testing to begin. For each X file, the y value is obtained and compared against m which is moderated by NK and S. The TP and FP rates are calculated for each round of the folds and the normalised and averaged results representing the entire folds are obtained at the end of the experiment.

6.3.4 Plan and Objectives

In this experiment, as NK and S represent important parameters that influence the operation of the system, we begin by ascertaining their best settings, as described in Section 6.3.7.1. Then, we fully evaluate the system, as in Section 6.3.7.2, with the NK and S parameters set to their optimal values.

Evaluation of the system’s performances involves several aspects, including:

1. the effectiveness of NK with and without S (Section 6.3.7.1). Note that, in immunology, when NK detects the presence of pathogens, it dismantles them for complete and safe destruction. Without S, this process causes autoimmune disease, a

situation where safe cells are also overly dismantled or destroyed. With S, this process can be regulated. In our experiments, NK is used to adjust the detection threshold; only when a condition is triggered is S used to fine-tune the detection threshold;

2. the effectiveness of the full execution-based (Section 6.3.7.2) and pre-emptive- based detections (Section 6.3.7.3); and

3. the execution performances of the settings (Section 6.3.7.5).

6.3.5 Performance Measures

We evaluated the performance of the detection system using:

• TP: the percentage of malware executables correctly classified as malware; and
• FP: the percentage of benign programs wrongly classified as malware.

6.3.6 Hardware

For experiment I, approximately 20 machines, each equipped with an Intel(R) Pentium(R) M processor at 1600 MHz, 512 MB of RAM and a 40 GB hard disk, and with Windows XP SP2 installed, are used for all scenarios. The performance overhead estimates are obtained using a computer equipped with an Intel(R) Core(TM) 2 Duo P8600 at 2.40 GHz, 4 GB of RAM and an 80 GB hard disk, with Windows Vista Ultimate SP1 installed.

6.3.7 Results and Discussion

6.3.7.1 NK and S

The crucial part of this preliminary experimentation is the determination of the optimal NK and S values: NK is multiplied by m in order to adjust m, and S also moderates the value of m, indirectly affecting the final value produced by the multiplication with NK.

We explore the possible values for NK and S over the entire search space defined by the training sets. This is achieved by heuristically obtaining a number approximating the optimal figure. Then, we perform the k-fold cross-validation test using the predicted NK value to reveal the potential of that value. When performing the test, we record the profile of each file during detection, including the state of the detection system's parameters corresponding to the detection status of the file. Using the recorded states, we then repeat the entire round and, with the states fixed, test every single increment of the NK value one by one, with the TP and FP rates calculated automatically.

Having explained how the value of NK is explored alone, when dealing with NK and S together the heuristic search is two-staged: we first search for the optimal NK using a default value of S; once we obtain the optimal value of NK, we fix it and then search for the optimal value of S. This S value is used in turn to seek and validate the optimal value of NK.
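The sketch below illustrates this two-stage search in simplified form; evaluate_tp_fp is a hypothetical stand-in for the recorded-state k-fold evaluation described above, and the default S, the ranges and the step size are our own assumptions.

```python
def sweep(evaluate_tp_fp, fixed, vary, lo, hi, step=0.001):
    """Scan one parameter at a 0.001 granularity and keep the best (TP, -FP) trade-off."""
    best_val, best_score = lo, (-1.0, 0.0)
    v = lo
    while v <= hi:
        params = dict(fixed, **{vary: v})
        tp, fp = evaluate_tp_fp(**params)   # TP/FP rates from the recorded k-fold states
        if (tp, -fp) > best_score:
            best_val, best_score = v, (tp, -fp)
        v = round(v + step, 3)
    return best_val


def search_nk_s(evaluate_tp_fp, default_s=0.1):
    nk = sweep(evaluate_tp_fp, {"S": default_s}, "NK", 0.0, 3.0)   # stage 1: NK with a default S
    s  = sweep(evaluate_tp_fp, {"NK": nk},       "S",  0.0, 0.2)   # stage 2: S with NK fixed
    nk = sweep(evaluate_tp_fp, {"S": s},         "NK", 0.0, 3.0)   # re-validate NK with the new S
    return nk, s
```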

When testing the performance of the system at every 0.001 point, we find that NK = 0.847 and S = 0.061 yield the most promising outcome. Figure 6.3 shows the overall detection performance in relation to the dynamics of the NK values when S = 0.061. Note that there is a range of best values for NK: from NK = 0.847 to 0.906, the average TP rate is 96.86% and the average FP rate 0%. The standard deviation of the TP rates is only 0.8.

Figure 6.3: NK values and performances (TP and FP rates, %, against NK values)

Figure 6.4 shows the overall detection performance in relation to the dynamics of the S values, with NK set to its best value of 0.847. A range of S values, from S = 0.057 to 0.065, has average TP and FP rates of 97.7% and 0.45%, respectively, with standard deviations of 1.1 and 0.5, while the average FP rate from S = 0.061 to 0.065 is 0%.


Figure 6.4: S values and performances

Figure 6.5: NK values in absence of S and performances

We also evaluate the performance of the system with NK but without S. The overall results, shown in Figure 6.5, indicate that NK helps the system produce its best detection when set to 0.431. Across the range 0.399 to 0.443, evaluated at every 0.001 increment of the NK value, good results are observed, with average rates within that range of 98.02% for TP and 0.77% for FP and respective standard deviations of 0.8 and 0.5. It is interesting to note that, from NK = 0.431 onwards, the FP rates are 0%.

The sensitivities of the NK and S values are low, which means that small changes in their optimal values do not greatly affect the system’s performance.

We also evaluate the values of NK and S when the malware are grouped separately into their respective categories: trojan, virus and worm. NK retains the same value, while S moves to 0.0435 for trojan, 0.0605 for virus and 0.0465 for worm. This indicates that, relative to the original S value of 0.061, trojans are traced to more n-grams belonging to the malware category and are therefore the easiest to detect, followed by worm and virus. However, the issue here is that relying on the n-grams of trojan alone will most likely cause higher FP rates on other types of malware, and virus seems to be the most stable training data. Overall, the combination of all these malware types produces a set of robust results.

In this section, we demonstrate our approach for searching for optimal values of NK and S in the search space. To achieve that, we record the state of detection for every file in the testing sets of the k-fold ( k=10). We explore the sensitivities of the two parameters, which indicates that fluctuations in the values within a wide range do not dramatically affect the detection outcome. Supporting this claim, we also demonstrate that the NK value does not change even when the training set used is separated into trojan, virus and worm classes but that the S value changes by up to 0.0175 points for trojan, 0.0005 for virus and 0.0145 for worm.

6.3.7.2 Full Execution-based Detection

In this experiment, we perform an evaluation based on the full execution of each executable in the testing data, with detection based on the entire set of n-grams in its execution. The result using both NK and S yields rates of TP = 98.56% and FP = 0%, which shows that, even though some malware cannot be detected in some k-folds, the system achieves perfect recognition of benign programs, as detailed in Table 6.10.


Table 6.10: Performances for full execution with NK and S

k-fold               TP Rate (%)   FP Rate (%)
1                    97.6          0
2                    100           0
3                    100           0
4                    100           0
5                    100           0
6                    100           0
7                    97.6          0
8                    97.6          0
9                    95.2          0
10                   97.6          0
Mean                 98.6          0.0
Standard Deviation   1.7           0.0

When S is removed, NK is readjusted to 0.431 and, with that setting, the TP rate drops slightly to 98.08% while the FP rate is retained at 0%, as shown in Table 6.11.

Table 6.11: Performances for full execution without S

k-fold               TP Rate (%)   FP Rate (%)
1                    97.6          0
2                    100           0
3                    100           0
4                    100           0
5                    97.6          0
6                    100           0
7                    97.6          0
8                    97.6          0
9                    92.9          0
10                   97.6          0
Mean                 98.1          0.0
Standard Deviation   2.2           0.0

We conclude that, when the system has NK with S, its results show better detection performance but that, even with NK alone, the system is capable of maintaining a 0% FP rate. However, we have some issues regarding the meaning of ‘full execution’ and the suitability of the dataset, which we highlight in Section 6.6.


6.3.7.3 Pre-emptive-based Detection

This section highlights one of the most significant experiments in this chapter. Although it is useful if a detection system can correctly detect malware, it is more valuable if it can determine whether an executable is malware while it executes rather than having to wait until its execution has completed. The aim of this experiment is therefore to provide insight into the capability of the system to recognise malware and benign programs pre-emptively, which is useful for combating most malware since they take the form of a single file or are embedded in a small program. However, it is not intended to detect malware which is specially crafted and embedded within a large software application. In this pre-emptive-based detection, we attempt to determine how many n-grams are needed to reliably detect malware.

First, we test the system's performance over a range of values, from the first to the 1000th n-gram, and then identify which percentage point of the malware execution, based on the n-gram sequences, generates the best results.

The size of each executable varies due to the operations programmed in it and its invocations of different patterns of API call sequences. While most executables in the dataset contain fewer than 1000 n-grams, some have many more. However, we assume that evaluating the files beyond the 1000 th n-gram is unnecessary as the results are predictable much earlier, as can be seen in Figure 6.6 and Figure 6.7.

Figure 6.6: Detections based on the 1st to 1000th n-grams with NK and S

It can be seen that pre-emptive detection achieves good performance at a consistent rate from the 29th n-gram onwards, yielding averages of 95.19% for TP and 1.02% for FP, with respective standard deviations of 3.0 and 3.2. More precisely, most TP rates of the 10-fold tests reach at least 92.7%, and the sixth fold 100%; the system records 0% FP rates on all folds except the ninth, which is 10%.

When NK and S work in tandem, the peak performances appear in several places from the 113th to the 135th n-gram, most of which record accuracy rates of 100% for TP and 0% for FP. This indicates that the system is capable of recognising benign programs and malware executables quite early in their execution, even after only 113 n-grams, with high accuracy.

We also evaluate the performance of the system with NK alone and find that good performance still starts at the 29th n-gram. Figure 6.7 shows the overall performance of the detection system with this setting. The peak performances again appear in several places from the 113th to the 135th n-gram. Within this range, most results record accuracy rates of at least 99.76% for TP and 0% for FP, which, at the 113th n-gram, reach 100% and 0%, respectively. Overall, the system performs slightly better with NK and S integrated than with NK alone.

Figure 6.7: Detections based on the 1st to 1000th n-grams without S

How far through a malware’s execution is the 113th n-gram? Table 6.12 shows, for each k-fold, the median and mean of the corresponding percentage of each malware file's total execution; for example, for the malware in fold 1, when 113 n-grams is expressed as a percentage of the total malware execution, the median value is 12.4%.


Table 6.12: 113th n-gram as percentage of total execution

k-fold               Median of % Range for Malware   Mean of % Range for Malware
1                    12.4                            28.7
2                    10.0                            28.8
3                    5.0                             30.5
4                    10.3                            31.9
5                    5.8                             32.6
6                    10.3                            32.3
7                    14.7                            34.2
8                    9.2                             35.7
9                    8.5                             37.2
10                   10.6                            39.4
Average              9.7                             33.1
Standard Deviation   2.9                             3.5

6.3.7.4 Distinguishing Benign from Malware

As can be seen in Figure 6.8, several benign programs are close to the decision threshold. We identify these files and find that MSN Messenger is the closest, followed by FreeCell, Word and Counter Strike. This indicates that, by following our API scheme, these executables generate n-gram patterns closer to malware profiles than do the rest of the benign programs. Programs or executables that produce y values too close to the m line are at risk of being misclassified.


Figure 6.8: Detection graph showing the discrimination lines (y for malware, m, y for benign) between malware and benign at the 113th n-gram

6.3.7.5 Execution Speeds

The overall speed performances of the algorithms for pre-emptive detection and full execution do not show any significant difference. Table 6.13 lists the average times taken to perform the training and testing tasks for the 10-fold data with their respective standard deviations, with and without the detection algorithm.

Table 6.13: Times taken to perform detection

Setting                       Average/Fold   Standard    Difference without Detection   Standard
                              (seconds)      Deviation   Algorithm/Fold (seconds)       Deviation
NK + S at 29th                149.84         5.71        3.63                           5.71
NK + S at 113th               150.88         5.27        1.39                           6.48
NK + S on full execution      166.39         5.86        16.94                          4.74
NK alone at 29th              146.37         5.98        0.26                           5.37
NK alone at 113th             149.66         4.63        1.44                           6.41
NK alone on full execution    149.41         5.67        1.37                           5.94


6.4 Experiment II

Experiment I used the malware samples as training and testing sets without considering their chronology. This raises the question of whether the detection system’s strong performance is a result of mixing older and newer malware samples, so that newer samples help detect older ones. To try to resolve this, we arrange the dataset differently.

The first change is to add some more worms. In Chapter 4, we report that our system can efficiently detect propagating worms, except Rbot-AQJ, MyDoom-A and Zotob.G. As we hope to develop a system that can detect malware at execution if it is not detected at propagation, it is important to include these worms. We are able to obtain executable forms of Rbot-CCC, Dloader-NY, Forbot-FU, MyDoom-A, SoBig.E and Zotob.G from (Offensive Computing 2010) or (VX Heavens 2010), and API call sequences for them are captured by a program called API Monitor (www.apimonitor.com 2010) in a virtual environment using Microsoft Virtual PC (Microsoft Corporation 2011b).
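Once a trace has been captured and cleaned, it is converted into n-grams of five consecutive API calls, the n-gram size used in Section 6.2.3. The sketch below is illustrative only: the call names are placeholders, and API Monitor’s real log format would first need to be parsed and encoded.

    # Illustrative sketch: turning a cleaned API-call trace into 5-grams.
    def to_ngrams(api_calls, n=5):
        """Return the list of consecutive n-grams over a sequence of API calls."""
        return [tuple(api_calls[i:i + n]) for i in range(len(api_calls) - n + 1)]

    trace = ["LoadLibraryA", "GetProcAddress", "WSAStartup", "Send", "Sleep",
             "GetProcAddress", "HeapSize"]
    for gram in to_ngrams(trace):
        print(gram)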

Table 6.14 shows the sizes of the worms’ executables, the total n-grams captured based on related API calls and their detection rates obtained in Chapter 4.

Table 6.14: Additional malware included in dataset

Name         Size (KB)   Total API Calls Captured   Detection in Chapter 4 (% TP)
Rbot-CCC     114         911                        95.1
Dloader-NY   11          3215                       97.1
Forbot-FU    96          19812                      98.4
MyDoom-A     22          159                        6.7
SoBig.E      85          21                         97.7
Zotob.G      78          3611                       71.2

With the inclusion of these additional worms, the dataset contains 520 executables.

We sort the malware files according to their detection or release dates. We obtain the malware dates from the Kaspersky database web page (Kaspersky Lab 2010), which has the most records and uses the same malware naming scheme. If they are not available there, we obtain them from (Symantec 2010), (Mcafee 2010b), (CA 2010), (Trend Micro 2010), (Panda Security 2010), (Sunbelt Security 2010), (ThreatExpert 2010), (PestPatrol 2005) and (Spyware Terminator 2010).

A perfect treatment of chronology would ensure that the training set has all the malware executables and benign programs available up to a particular date, and the testing set would involve only the malware executables and benign programs created or released on that date. Since this would need to be repeated on a daily basis over the whole time span of the dataset, such a strategy would be prohibitively expensive given our limited resources and the unavailability of the creation dates of benign executables. Therefore, an approximation is used to partition the datasets: the malware is ordered by date and then divided into 11 equally sized folds, and the benign programs are divided into 10 folds, following the same distribution technique mentioned in Section 6.3.3. The idea is that the first fold of malware is the training data for the second fold; the first two folds of malware are the training data for the third fold; and so on. Eleven malware folds are needed to provide ten testing folds. However, we use only 10 folds of benign samples, as in Experiment I, because the release dates of benign applications are unknown; each benign fold therefore has an equal chance of serving as training or testing data.

Table 6.15 shows how we distribute the malware equally into 11 folds while the distribution of benign files ensures that their representations are almost equal but random. This unique distribution of malware files ensures that the detection system is sufficiently tested in terms of accuracy and robustness while simulating the threats we face in a real Internet-connected computer environment.

The second subset of malware and first fold of 10 benign samples are detected using only malware from the first subset, plus the remaining benign training data. The third subset of malware and second fold of benign samples are detected using only malware from the first and second subsets, plus the benign training data, and so on. It should be noted that the knowledge gained from detecting the malware in each fold is incorporated into the malware profiles for the detection of subsequent folds. This provides the malware profile at a later fold with more knowledge about malware than the previous fold(s). This process is repeated until all testing sets are evaluated.
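A minimal sketch of this cumulative, chronological fold arrangement is given below. Here build_profile and detect are hypothetical stand-ins for the system’s actual profile construction and detection routines, and the handling of benign training data is left inside them for brevity.

    # Illustrative sketch: malware fold k is tested against a profile built
    # from folds 1..k-1, and the knowledge gained is carried forward.
    def chronological_evaluation(malware_folds, benign_folds, build_profile, detect):
        results = []
        training_malware = list(malware_folds[0])      # fold 1 is training only
        for k in range(1, len(malware_folds)):         # folds 2..11 are tested
            profile = build_profile(training_malware)
            results.append(detect(profile, malware_folds[k], benign_folds[k - 1]))
            training_malware.extend(malware_folds[k])  # carry knowledge forward
        return results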


Table 6.15: Malware dataset distribution into folds

Fold ID   Trojan   Virus   Worm   Total   Date Range
1         12       20      6      38      3-Mar-00 to 30-Nov-00
2         12       17      9      38      * to 9-Aug-01
3         16       13      9      38      * to 10-Jan-02
4         15       23      0      38      * to 10-Jan-02
5         2        25      11     38      * to 15-Mar-02
6         8        16      14     38      * to 4-Jul-02
7         15       13      10     38      * to 10-Jan-03
8         10       12      17     39      * to 02-Jul-03
9         2        5       32     39      * to 13-Oct-03
10        13       15      11     39      * to 28-Oct-04
11        12       6       21     39      * to 7-Sep-09
* (to date)

In Experiment II, we evaluate the performance of the system from several aspects, including:

1. the effectiveness of the full execution-based (Section 6.4.3.1) and pre-emptive-based detections (Section 6.4.3.2); and

2. the execution performances of the settings (Section 6.4.3.4).

6.4.1 Performance Measures

We evaluate the performances of the detection system using:

• TP: the percentage of malware executables correctly classified as malware; and
• FP: the percentage of benign programs wrongly classified as malware.
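A minimal sketch of these two measures follows; the example counts are hypothetical.

    # Minimal sketch of the two performance measures used in Experiment II.
    def tp_rate(malware_flagged, total_malware):
        """Percentage of malware executables correctly classified as malware."""
        return 100.0 * malware_flagged / total_malware

    def fp_rate(benign_flagged, total_benign):
        """Percentage of benign programs wrongly classified as malware."""
        return 100.0 * benign_flagged / total_benign

    print(tp_rate(36, 38), fp_rate(0, 10))   # e.g. 94.7% TP, 0.0% FP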

6.4.2 Hardware

For Experiment II, approximately 20 machines, each an HP Z400 workstation equipped with an Intel(R) Xeon(R) W3520 CPU at 2.67 GHz, 8 GB of RAM and a 500 GB hard disk running Windows 7 Enterprise, are used for all scenarios.

6.4.3 Results and Discussion

6.4.3.1 Full Execution-based Detection

In this experiment, we perform an evaluation based on the full execution of each executable in the testing data. The detection for each file is based on the entire set of n-grams in it. Using both NK = 0.847 and S = 0.061 yields TP = 96.6% and FP = 0%, and the standard deviation of the TP rates is 2.5. The results show that, even though some malware cannot be detected in some k-folds, the system achieves perfect recognition of benign executables, as detailed in Table 6.16.
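The exact NK and S statistics are defined earlier in the chapter; purely to illustrate the thresholding step, the sketch below treats them as simple profile-match ratios compared with the values quoted above. The matching scheme is an assumption for illustration, not the thesis’s actual scoring formula.

    # Hypothetical stand-in for the NK/S decision step.
    NK_THRESHOLD = 0.847   # threshold associated with the malware profile (NK)
    S_THRESHOLD = 0.061    # threshold associated with the benign profile (S)

    def classify(file_ngrams, malware_profile, benign_profile):
        """Flag a file as malware based on assumed profile-match ratios."""
        ngrams = set(file_ngrams)
        if not ngrams:
            return False
        malware_match = len(ngrams & malware_profile) / len(ngrams)
        benign_match = len(ngrams & benign_profile) / len(ngrams)
        return malware_match >= NK_THRESHOLD and benign_match <= S_THRESHOLD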

Table 6.16: Performance for full execution with NK and S

k-fold               TP Rate (%)   FP Rate (%)
1                    92.1          0
2                    97.4          0
3                    100.0         0
4                    97.4          0
5                    97.4          0
6                    97.4          0
7                    92.3          0
8                    97.4          0
9                    97.4          0
10                   97.4          0
Mean                 96.6          0.0
Standard Deviation   2.5           0.0

Table 6.17 lists the malware undetected under the full execution-based detection, and the reasons for some malware being undetected under the pre-emptive-based detection are discussed in Section 6.4.3.3.


Table 6.17: List of undetected malware

k-fold   Malware
1        Trojan.Win32.Sweet.apm
1        Virus.Win32.HLLP.Sloc.apm
1        Virus.Win32.HLLW.Acoola.a.apm
2        Trojan.Win32.Anakha.apm
4        Virus.Win32.HLLP.Text.a.apm
5        Worm.Win32.Netres.c.apm
6        Trojan.Win32.AutoAccepter.apm
7        Trojan.Win32.Camking.apm
7        Virus.Win32.HLLP.Kiro.apm
7        Virus.Win32.HLLW.Starfil.apm
8        Worm.Win32.Randex.d.apm
9        Worm.Win32.Padobot.gen.apm
10       Worm.Win32.Forbot-FU

6.4.3.2 Pre-emptive-based Detection

Again, the aim of pre-emptive-based detection is to determine how many n-grams are required to reliably detect malware. This time, based on Experiment I, which showed that 1000 n-grams are not necessary, we investigate only up to 300 n-grams.
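A minimal sketch of this pre-emptive evaluation is shown below: the detector is applied to only the first k n-grams of each file for k = 1..300. Here classify stands for the NK/S decision sketched earlier, files is a list of n-gram sequences and labels is the matching list of ground-truth classes; all three are assumptions for illustration.

    # Illustrative sketch: TP/FP rates when only the first k n-grams are used.
    def preemptive_sweep(files, labels, classify, max_block=300):
        """Return a list of (TP rate, FP rate) pairs, one per n-gram block index."""
        n_mal = sum(1 for lab in labels if lab == "malware")
        n_ben = len(labels) - n_mal
        rates = []
        for k in range(1, max_block + 1):
            flagged = [classify(ngrams[:k]) for ngrams in files]
            tp = sum(f and lab == "malware" for f, lab in zip(flagged, labels))
            fp = sum(f and lab == "benign" for f, lab in zip(flagged, labels))
            rates.append((100.0 * tp / n_mal, 100.0 * fp / n_ben))
        return rates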

Figure 6.9: Detections based on 1st to 300th n-grams with NK and S (TP and FP rates, %, per n-gram block)

As shown in Figure 6.9, pre-emptive detection, evaluated over the range of n-gram blocks, consistently achieves good detection performances from the 29th n-gram, which yields average TP and FP rates of 94% and 1% respectively, with standard deviations of 3.3 and 3.2, as detailed in Table 6.18.


Table 6.18: Performances at the 29th n-gram block

k-fold               TP Rate (%)   FP Rate (%)
1                    92.1          0.0
2                    89.5          0.0
3                    92.1          0.0
4                    97.4          0.0
5                    89.5          0.0
6                    97.4          0.0
7                    92.3          0.0
8                    94.9          0.0
9                    97.4          10.0
10                   97.4          0.0
Mean                 94.0          1.0
Standard Deviation   3.3           3.2

Table 6.19: Performances at the 118th n-gram block

k-fold               TP Rate (%)   FP Rate (%)
1                    94.7          0.0
2                    100           0.0
3                    100           0.0
4                    100           0.0
5                    100           0.0
6                    100           0.0
7                    100           0.0
8                    100           0.0
9                    100           0.0
10                   97.4          0.0
Mean                 99.2          0.0
Standard Deviation   1.8           0.0

Peak performances appear in several places from the 118th (see Table 6.19) to the 135th n-grams, which produce an average TP rate of 99.2% and an average FP rate of 0%. Performances in the range of the 58th to 140th n-grams average TP rates of at least 98% while still maintaining 0% average FP rates. This indicates a pattern similar to our previous findings in Section 6.3.7.3, in which peak performances are seen in the range of the 113th to 135th n-grams.
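The peak block ranges quoted above can be located directly from the per-block results; the sketch below is an illustrative helper that scans the (TP, FP) pairs produced for each n-gram block (for example, by the preemptive_sweep sketch earlier) and reports the blocks meeting a chosen criterion.

    # Illustrative sketch: find the n-gram blocks meeting a performance criterion.
    def peak_blocks(rates, min_tp=99.0, max_fp=0.0):
        """rates: list of (tp_rate, fp_rate) pairs indexed from block 1."""
        return [block for block, (tp, fp) in enumerate(rates, start=1)
                if tp >= min_tp and fp <= max_fp]

    # With the Experiment II results this would return blocks around 118-135.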

6.4.3.3 Distinguishing Benign from Malware

Figure 6.10: Detection graph showing discrimination lines between malware and benign at the 118th n-gram (y value against File ID for malware and benign files, with the m threshold line)

As shown in Figure 6.10, there are several benign programs close to the m line. We identify these files and find that Checker is the closest, followed by FreeCell, MSN Messenger, Counter Strike and Word. This indicates that, by following our API scheme, these executables generate n-gram patterns closer to the malware profiles than do the rest of the benign programs. The figure contains dots representing the malware (triangles) and benign (circles) files; the triangles are clustered into 10 groups representing the respective folds, followed by a group of benign dots.

Three malware executables, Trojan.Win32.Sweet.apm, Virus.Win32.HLLP.Sloc.apm and Worm.Win32.Forbot-FU, are below the m line; this means they are not detected as malware by the system. The first two are from the first fold, which means that the malware profile (Mn) used to parse their n-grams is based solely on the first of the 11 malware folds; at this stage, it seems there is not enough data to recognise them as malware. The third, Worm.Win32.Forbot-FU, is also not detected. This may be explained with reference to Table 6.20.

Table 6.20: Details of Forbot-FU detection

Function             Class                                        Hit Count   Ratio (%)   Without Memory Management
                                                                                           class APIs (%)
HeapAlloc            Memory Management                            9915        50.05       -
HeapFree             Memory Management                            9723        49.08       -
GetProcAddress       Dynamic Link Libraries                       157         0.79        90.8
LoadLibraryA         Dynamic Link Libraries                       11          0.06        6.4
GetModuleFileNameA   Process and Thread, and Process Status API   1           0.01        0.6
Sleep                Process and Thread, and Process Status API   1           0.01        0.6
Send                 Socket (Winsock and Winsock SPI)             1           0.01        0.6
WSAStartup           Socket (Winsock and Winsock SPI)             1           0.01        0.6
GetStartupInfoA      Process and Thread, and Process Status API   1           0.01        0.6
HeapSize             Memory Management                            1           0.01        -

Although this worm produces many invocations of HeapAlloc and HeapFree, as we ignore all API calls related to the Memory Management class, only 173 API calls are considered for processing; GetProcAddress represents 90.8% of the API call invocations observed by the system. API calls from the Memory Management class represent 99.12% of the total API calls invoked by the worm. As LoadLibraryA is invoked 11 times and each of the remaining functions only once, the malware is likely to produce a unique pattern of API call sequences not present in either the malware or benign profiles. It is also possible that the worm’s n-grams are insufficient for a correct detection.
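The figures in Table 6.20 can be checked directly from the hit counts: once the Memory Management class is excluded, 173 calls remain and GetProcAddress dominates them. The short worked check below reproduces these ratios (small differences from the quoted percentages are rounding).

    # Worked check of the Forbot-FU figures in Table 6.20.
    counts = {"HeapAlloc": 9915, "HeapFree": 9723, "HeapSize": 1,
              "GetProcAddress": 157, "LoadLibraryA": 11, "GetModuleFileNameA": 1,
              "Sleep": 1, "Send": 1, "WSAStartup": 1, "GetStartupInfoA": 1}
    memory_mgmt = {"HeapAlloc", "HeapFree", "HeapSize"}

    total = sum(counts.values())                                   # 19812 calls
    kept = {k: v for k, v in counts.items() if k not in memory_mgmt}
    kept_total = sum(kept.values())                                # 173 calls

    print(f"Memory Management share: {100 * (total - kept_total) / total:.1f}%")
    print(f"GetProcAddress share of remaining calls: "
          f"{100 * kept['GetProcAddress'] / kept_total:.1f}%")     # 90.8%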

Although this worm is not detected in execution based on its API calls, it is detected during propagation in our system.

6.4.3.4 Execution Speeds

There is no significant difference in the overall speed performances of the pre-emptive detection and full execution. Table 6.21 lists their average times taken, with their respective standard deviations, to perform each set of training and testing tasks.

Table 6.21: Times taken to perform detections

Setting          Average/Fold   Standard    Difference without Detection   Standard
                 (seconds)      Deviation   Algorithm (seconds)            Deviation
at 29th          26.20          11.99       0.50                           1.43
at 118th         26.25          11.27       0.65                           1.75
full execution   26.41          12.27       2.03                           1.50

We see a broad trend of improvement in performance as the amount of malware training data increases. Compared with Experiment I’s random k-fold allocation of malware and benign samples to training sets, performance here is worse early in the sequence because the amount of malware training data is small; only late in the sequence does the training data include relatively complete malware information. Nonetheless, as the system’s performance on the early testing sets is quite good, even at the very early stages of program execution, we conclude that the proposed system appears to perform well at detecting zero-day malware.

6.5 Comparison of Results

In Figure 6.11, we compare our best performance results, at the 113th n-gram for Experiment I and the 118th n-gram for Experiment II, with the results in (Ahmed, Hameed, Shafiq and Farooq 2009). The accuracy is calculated, for each fold, as (malware correctly detected + benign correctly detected) / (total malware + total benign), averaged over the folds. Ahmed et al. (Ahmed, Hameed, Shafiq and Farooq 2009) concluded that, by monitoring classes including Memory Management and Input Output (IO), their system achieved an accuracy of 98%, and that API call sequences taken from only these classes produced an optimised result with an accuracy of 97%. However, our evaluation yields better results without the Memory Management class, in addition to significant reductions in the numbers of n-grams used.
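The per-fold accuracy measure used for this comparison is shown below; the example counts are hypothetical fold sizes used only for illustration.

    # The accuracy measure used for the comparison in Figure 6.11, per fold.
    def fold_accuracy(malware_correct, benign_correct, total_malware, total_benign):
        return 100.0 * (malware_correct + benign_correct) / (total_malware + total_benign)

    # e.g. a fold in which all 38 malware and all 10 benign test files are
    # classified correctly yields 100% accuracy (the benign count is hypothetical).
    print(fold_accuracy(38, 10, 38, 10))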

For Experiment I, which is based on the dataset distributed randomly rather than chronologically, the results show that we achieve 100% accuracy, whereas for Experiment II, in which the dataset, together with several additional worms, is tested chronologically according to the dates of malware discovery, the system achieves 99.4% accuracy. Experiment I performs slightly better because its training and testing data are drawn from the whole time span; in Experiment II, despite the amount of knowledge added to the malware profile during the early folds being significantly smaller than that during the later ones, the detection results at most folds still reach perfection. Therefore, the results from Experiment II are more likely to demonstrate the ability of the detection system in a real environment, especially for dealing with zero-day attacks from brand-new malware.

Figure 6.11: Comparison of detection accuracies (%) of our approach (Experiment I and Experiment II) and other algorithms (IBk, J48, NB, RIPPER, SVM)

6.6 Issues and Challenges

The patterns in API call sequences are greatly influenced by the flow of the program code written for an executable. While some executables run automatically, others require end-user interaction, such as those in application software, many of which provide menus, shortcuts and special tools that generate dynamic sequences of API calls.

The benign dataset provides only a single-run copy of each API call sequence, recorded only up to a certain point of execution. However, features of malware, such as the ability to produce polymorphic offspring or to execute differently under various conditions, also generate dynamic sequences of API calls. A very good detection performance in a first run therefore may not guarantee the same in subsequent runs; however, implementing an experiment with multi-run copies of API call sequences is costly and requires knowledge of all of a malware’s execution paths. Nonetheless, we detect the malware in our dataset, including samples employing UPX packing, encryption and polymorphism (as described in Section 6.3), during their first runs without compromising the usability of the benign programs. Moser et al. (Moser, Kruegel and Kirda 2007a) produced a tool that can explore multiple execution paths of malware; although it has limitations and is difficult to implement, it is useful for helping an antivirus analyst produce a detailed report on the behaviour of a malware sample.

The malware evaluated in this study are only in the form of binary executables, as the dataset does not include any in other forms, such as script, compiled Java or macro. As detection of these malware should be linked to, or embedded with, the respective executables that run them, this could be an area of future work.

The API calls used in the experiments are limited, based on the information available in the dataset. However, it is known that there are many undocumented API functions used by Microsoft’s OSs. The usefulness of these functions for API-call-based detection is still unclear (Russinovich 2004).

We present the time performances of the system in several settings, but it remains uncertain whether they would hold when all aspects are packaged together to work in a commercial environment. Our dataset contains recorded API call sequences which require further cleaning and encoding. In a commercial environment, the system should capture the sequences of API calls and encode the related data directly. Although this work is based on sequences of API calls in the Windows XP environment, we are confident that the outcomes will remain relevant to later OSs as long as no critical changes are made to the set of APIs involved.

6.7 Conclusion

In this chapter, we presented a malware detection approach that can correctly distinguish between malware and benign programs. Using a data mining technique inspired by the NK and Suppressor cells of the immune system, the results showed that the system is robust. Its careful selection of API call sequences and numbers of n-grams also helped to achieve promising results.

The results, which were compared with the work in the literature, suggested that we could effectively detect most of the malware executables and benign programs as early as the 29th n-gram. We obtained stable performances over a range of n-gram blocks, and the peak and perfect performances were seen soon after the first hundred, i.e., as early as the 113th n-gram. We evaluated our system for detecting malware using two datasets, one organised according to a k-fold (k=10) cross-validation, and the other organised chronologically to assess the robustness of the system against possible zero-day attacks.


Chapter 7

Conclusions

This chapter briefly describes the research carried out in this thesis. It also discusses its findings and conclusions, and indicates some possible future research directions.

7.1 Summary of Research Done

The main contribution of this thesis is in the area of malware detection based on session and API call sequences. We concluded that employing malware detection during both propagation and execution shows a strong potential for future research. Following is a summary of the contributions of this thesis.

A framework for the detection of malware during propagation and execution
We began by proposing a two-stage detection system aimed at detecting malware during both propagation and execution. We investigated how these two layers of defence could detect malware, and identified how this system could work in a real environment. Good results were achieved. Some worms which were weakly detected during propagation were detected well during execution, while one which could not be detected during execution was detected during its propagation.

A session-based detection of self-propagating worms
We presented anomaly-based and signature-based detectors, working both separately and in tandem, and analysed their performance on session-based data. The detectors did not necessarily require a prior set of training data; they could be self-learning and self-adjusting, detecting as they ran based on the knowledge learnt from recent traffic. We evaluated the dataset in fixed and time windows (Chapter 4).

Performance evaluation of API call hooking
We evaluated the overheads incurred in the execution performance of common software when a set of API functions of the OS was hooked. We created a program that utilised an API hooking framework to measure loading speeds, in terms of the time differences for a variety of software executables with and without API hooking implemented. This provided us with an insight into how many API functions we could hook without significantly delaying the execution speed of the software.
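The measurement itself amounts to timing the same executable with and without hooks and comparing the averages. The sketch below is a rough, hypothetical harness, not the thesis tool (which used an API hooking framework); it simply times process launch to completion, a simplification of loading speed, and "app.exe" is a placeholder name.

    # Hypothetical timing harness for comparing launch times with/without hooks.
    import subprocess
    import time

    def average_launch_time(command, runs=5):
        """Average wall-clock time to launch and complete 'command'."""
        timings = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(command, check=False)   # waits for the process to exit
            timings.append(time.perf_counter() - start)
        return sum(timings) / len(timings)

    # overhead = average_launch_time(["app.exe"]) measured with hooks installed,
    #            minus the same measurement with hooks removed.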

A pre-emptive-based detection of malware executables using sequences of API calls
We developed an algorithm using sequences of API calls, which detected malware as they executed. The most effective malware detection could be seen as early as the 29th n-gram block, and optimal results were achieved in the low 100s of n-gram blocks (specifically, at the 113th n-gram block).

A chronological evaluation of the detection system
Using API-calls-based detection, we evaluated the system against malware sampled chronologically, so as to simulate the real-world problem of malware threats. The system demonstrated its readiness against possible zero-day attacks. The most effective malware detection could be seen as early as the 29th n-gram block, and the optimal results were in the early 100s of n-gram blocks (specifically, in this study, at the 118th n-gram block).

7.2 Limitations

Several limitations of the work presented in this thesis are as follows.

• The detection system consisted of two sub-systems: 1) a session-based system, which detected malware as they propagated; and 2) an API call-based system, which detected malware as they executed. Both sub-systems were only prototypes, as a commercial version of the system, as envisioned in Chapter 3, was beyond the scope of this thesis.

• The session-based system was evaluated using a standard online dataset. The validity of our results is limited to how representative that data set is of what would be seen in a production environment now.


• The same limitation of the session-based system discussed above also applied to the API call-based system. Nevertheless, we introduced a small dataset for the evaluation of the API call-based system, as reported in Chapter 6.

• Although a performance evaluation of the impact of API hooking was conducted in Chapter 5, the actual impact if the set of API functions identified in Chapter 6 were hooked is still an open question. Following the development of a fully fledged, nearly commercial, version of the system which captures API calls on the run and performs detection, a more meaningful performance evaluation could be obtained.

• The API call-based system we evaluated covered only binary executables. Malware in other forms, which run using a specific run-time environment or interpreter, such as script, macro, .NET and Java programs, were not included.

• As described in Section 6.2.3, each n-gram contains five API calls. Comparisons of Xn against Mn and Bn were made based on the exact sequence of API calls. We could break the sequential interdependency of the API calls within each n-gram by using a tree-based structure, as proposed in Chapter 4, to follow a similar approach to (Kolbitsch, Comparetti, Kruegel, Kirda, Zhou and Wang 2009). However, we believed the effort was unnecessary because it would increase the monitoring and searching cost for individual executables. It should be noted that, for a single executable, there could have been hundreds to thousands of n-grams, considering the amount of API calls monitored by our detection system and the size of the executables themselves. Nevertheless, in Chapter 4, we did consider a partial matching for each of the n-grams. Although all our strategies were also aimed at combatting evasion attacks, we did not conduct experiments relating to such attacks on the system. Nevertheless, the detection system architecture conceptually covers this issue, as described in Section 3.3.


• Evasion attacks are still a major challenge for malware researchers when the attacker has access to the detection signatures or rules, because imitative attacks can still be produced (Parampalli, Sekar and Johnson 2008). We described what the system might not be able to detect in Section 3.4.2. Although we did not specifically examine evasion attacks using our detection system, some of the malware in our dataset employed certain evasive techniques (see Section 6.3) and the system detected them well. In addition, our detection system architecture conceptually keeps such attacks on its radar, as described in Section 3.3.

• The detection system was designed to detect the presence of malware threats only while they were propagating and executing. Nowadays, cleaning an infected computer can be done by simply restoring its images, unlike 10 years ago when formatting a computer and reinstalling its application software and drivers was very time consuming. If a thorough investigation is required, that is, an evaluation of the extent of damage caused by the malware threat in order to implicate the criminal, or an analysis of all the actions of the malware, reverse code engineering and computer forensic techniques are needed.

7.3 Future Research Directions

This thesis demonstrates our novel evaluation of two-way malware detection which detects malware as it both executes and propagates. There are several areas that could be improved through further research.

• The detection system consists of two sub-systems: 1) session-based, which detects malware as they propagate; and 2) API-calls-based, which detects malware as they execute. An evaluation of the system in a production environment, as envisioned in Chapter 3, with the two sub-systems integrated, is our next research study; in it, the session-based data and the API call sequences of running executables will be fed into the system as detection is in progress. Prior to this, independent evaluations of the two sub-systems need to be conducted using data collected at the host level in real time.

• Another extension is to enhance the API-calls-based system to cover the detection of malware in other forms that run using specific run-time environments or interpreters, such as script, macro, .NET and Java programs.

• The next step in our research aims to build a detection system that provides a self-protection capability for an autonomic system (IBM 2011) against malware threats. We envision developing an agent-based system with rules and profiles of malware and benign software that are not limited to a single host but are also built from aggregated data collected from the machines within the reach of the autonomic system.

• We implemented our system in a Windows environment. It might be interesting to evaluate its performance in Linux/UNIX-based OS environments.

7.4 Closing Remarks

No single detection mechanism can be effective in countering all malware threats, and many security specialists agree that there is no silver bullet for solving this problem (Harley 2010; MalwareCity 2011; Vega 2011).

Although behaviour-based malware detection research is on the rise and the one-to-one binary signature has its limitations, as described in Section 2.4.1, binary signature-based detection cannot be ignored (Olzak 2008; Vega 2011): although it cannot detect new malware, it guarantees the detection of malware, or variants thereof, that have been detected previously or elsewhere. This requires that someone reports the existence of a malware sample so that its signature can be produced and distributed to binary signature-based antivirus software. Despite this weakness, the technique provides an excellent level of detection of known malware.


In this thesis, we have proposed an architecture for malware detection which aims to detect malware by recognising their behaviours as they execute and propagate. We have used two sub-systems, one monitoring API call sequences as malware execute and the other monitoring session-based traffic as they propagate. Each sub-system is inspired by the mechanisms of the HIS, and features the profiling of self and non-self signatures and self-adjusted parameters which make the system adaptive to its environment. The terms ‘self-adjusted’ and ‘adaptive’ are essential for ensuring that our system is capable of adapting to changes over time, as end-users’ usage patterns are highly variable.

Even though we have shown in this thesis that our system offers promising capabilities for countering malware threats, it is still lacking or untested in several areas. Some of these issues have been highlighted in Section 7.2.

Before a fully operational detection system could be realised, extensive experimentation to refine the algorithms would be needed and the development of the first prototype for a production environment would be desirable. We believe our work in this thesis provides a promising basis for future researchers interested in the area of the behaviour-based detection of malware.


References

Abbes, T., A. A. Bouhoula and M. Rusinowitch (2004). Protocol Analysis in Intrusion Detection Using Decision Tree. Proceedings of the International Conference on Information Technology, Coding and Computing (ITCC’04), IEEE Computer Society: pp. 404-408. Abimbola, A. A., J. M. Munoz and W. J. Buchanan (2006). "NetHost-sensor: Monitoring a Target Host's Application via System Calls." Information Security Technical Report 11(4): pp. 166-175. Ahmed, F., H. Hameed, M. Z. Shafiq and M. Farooq (2009). Using Spatio-temporal Information in API Calls with Machine Learning Algorithms for Malware Detection. Proceedings of the 2nd ACM Workshop on Security and Artificial Intelligence, Chicago, Illinois, USA, ACM: pp. 55-62. Aickelin, U. and S. Cayzer (2002). The Danger Theory and Its Application to Artificial Immune Systems. 1st International Conference on Artificial Immune Systems, Canterbury, UK: pp. 141–148. Al-Zarouni, M. (2006). The Reality of Risks from Consented Use of USB Devices. Proceedings of the 4th Australian Information Security Conference, Edith Cowan University: pp. 5–15. Alazab, M., S. Venkataraman and P. Watters (2010). Towards Understanding Malware Behaviour by the Extraction of API Calls. Second Cybercrime and Trustworthy Computing Workshop (CTC'10): pp. 52-59. Anderson, D., T. F. Lunt, H. Javitz, A. Tamaru and A. Valdes (1995). Detecting Unusual Program Behavior Using the Statistical Components of NIDES, SRI International. SRI-CSL-95-06. p. 86. Anderson, J. P. (1980). Computer Security Threat Monitoring and Surveillance. Fort Washington, Pennsylvania, James P Anderson Co.

Anti-Virus Comparative (2010). On-demand Detection of Malicious Software, www.av- comparatives.org . No. 25. p. 10. APCERT (2011). APCERT Annual Report, Asia Pacific Computer Emergency Team. p. 194. Apple Inc. (2009). "OS X ABI Mach-O File Format Reference." Retrieved 3 August 2012, from https://developer.apple.com/library/mac/#documentation/DeveloperTools/Concept ual/MachORuntime/Reference/reference.html#//apple_ref/doc/uid/20001298- BAJIHABI. Ashfaq, A., M. J. Robert, A. Mumtaz, M. Q. Ali, A. Sajjad and S. A. Khayam (2008). A Comparative Evaluation of Anomaly Detectors under Portscan Attacks. Recent Advances in Intrusion Detection: pp. 351-371. AVG Technologies. (2008). "AVG Anti-Virus FREE." Retrieved 1 January 2008, from http://free.avg.com . AVG Technologies. (2012). "Sana Security is Now Owned by AVG Technologies." Retrieved 1 February 2012, from http://www.avg.com/my-en/sana-security . Azadegan, S., W. Yu, H. Liu, M. Sistani and S. Acharya (2012). Novel Anti-forensics Approaches for Smart Phones. 45th Hawaii International Conference on System Science (HICSS'12): pp. 5424-5431. Balasubramaniyan, J. S., J. O. Garcia-Fernandez, D. Isacoff, E. Spafford and D. Zamboni (1998). An Architecture for Intrusion Detection Using Autonomous Agents. Proceedings of the 14th Annual Computer Security Applications Conference: pp. 13-24. Bayer, U., P. M. Comparetti, C. Hlauschek, C. Kruegel and E. Kirda (2009). Scalable, Behavior-Based Malware Clustering. Proceedings of the 16th Annual Network and Distributed System Security Symposium (NDSS'09): p. 18. Bayer, U., E. Kirda and C. Kruegel (2010). Improving the efficiency of dynamic malware analysis. Proceedings of the ACM Symposium on Applied Computing, Sierre, Switzerland, ACM: pp. 1871-1878. Bee, S. (2011). "Beware the Dangers of Free Wi-Fi." Retrieved 19 March 2011, from http://malwareresearchgroup.com/2011/01/beware-the-dangers-of-free-wi-fi/ .


Bellifemine, F., G. Caire and D. Greenwood (2007). Developing Multi-Agent Systems with JADE, Wiley. Bennett, J. (2008). "AutoIt3." Retrieved 1 June 2008, from http://www.autoitscript.com . Berghel, H., D. Hoelzer and M. Sthultz (2008). "Data Hiding Tactics for Windows and Unix File Systems." 74: p. 17. Berns, A. D. and E. Jung (2008). Searching for Malware in BitTorrent, University of Iowa. Computer Science Technical Report UICS-08-05. pp. 1-10. Borisov, N., I. Goldberg and D. Wagner (2001). Intercepting Mobile Communications: the Insecurity of 802.11. Proceedings of the 7th Annual International Conference on Mobile Computing and Networking (MOBICOM'01), Rome, Italy, ACM: pp. 180-189. Bowei, S. (2009). The Spread of Malware on the WiFi Network: Epidemiology Model and Behaviour Evaluation. 1st International Conference on Information Science and Engineering (ICISE'09): pp. 1916-1918. Brand, M. (2007). Forensic Analysis Avoidance Techniques of Malware. The 5th Australian Digital Forensics Conference, Edith Cowan University, Perth Western Australia. Burges, C. J. C. (1998). "A Tutorial on Support Vector Machines for Pattern Recognition." Data Mining and Knowledge Discovery 2(2): pp. 121-167. CA. (2010). "Virus Info." Retrieved 1 June 2010, from http://gsa.ca.com/virusinfo/ . Cabla, L. (2012). "HDAT2." Retrieved 17/5/2012, from http://www.hdat2.com/ . Cascadia Labs (2008). Protecting Against Evolving Web Threats, Cascadia Labs. 9. Cavallaro, L., P. Saxena and R. Sekar (2007). Anti-taint-analysis: Practical Evasion Techniques Against Information Flow based Malware Defense, Stony Brook University, Stony Brook, New York. Cavallaro, L., P. Saxena and R. Sekar (2008). On the Limits of Information Flow Techniques for Malware Analysis and Containment. Proceedings of the 5th International Conference on Detection of Intrusions and Malware and Vulnerability Assessment, Paris, France, Springer-Verlag: pp. 143-163. CERT. (2008). "CERT Statistics." Retrieved 10 March, 2011, from


http://www.cert.org/stats . Chan, P. C. and V. K. Wei (2002). Preemptive Distributed Intrusion Detection Using Mobile Agents. Proceedings of Eleventh IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WET ICE'02): pp. 103-108. Chen, R. (2009). "What does the "Zw" prefix mean?" The Old New Thing Retrieved 8 August 2012, from http://blogs.msdn.com/b/oldnewthing/archive/2009/06/03/9687937.aspx . Chikofsky, E. J. and J. H. Cross, II (1990). "Reverse Engineering and Design Recovery: a Taxonomy." IEEE Software 7(1): pp. 13-17. Choon, O. T. and A. Samsudin (2003). Grid-based Intrusion Detection System. The 9th Asia-Pacific Conference on Communications (APCC'03). 3: pp. 1028-1032. Christiansen, N. R., R. S. Thind, R. V. Pudipeddi, D. D. Groff, J. M. Cargille and B. K. Dewey (2011). File system filters and transactions. U. S. Patent, Microsoft Corporation. Christodorescu, M. and S. Jha (2003). "Static Analysis of Executables to Detect Malicious Patterns." USENIX Security Symposium: p. 18. Civie, V. and R. Civie (1998). Future Technologies from Trends in Computer Forensic Science. IEEE Information Technology Conference: pp. 105-108. Cluley, G. (2009). "AV-Test.org's Malware Count Exceeds 22 Million." Retrieved 18 March 2011, from http://nakedsecurity.sophos.com/2009/07/24/avtestorgs- malware-count-exceeds-22-million/ . Comparetti, P. M., G. Salvaneschi, E. Kirda, C. Kolbitsch, C. Kruegel and S. Zanero (2010). Identifying Dormant Functionality in Malware Programs. IEEE Symposium on Security and Privacy: pp. 61-76. Constantin, L. (2009). "Chinese Hackers Target Australia's Prime Minister." Retrieved 19 March 2011, from http://news.softpedia.com/news/Chinese-Hackers-Target- Australia-039-s-Prime-Minister-108633.shtml . Constantin, L. (2011). "Two Zero-day Vulnerabilities Found in Flash Player." Retrieved 13 June 2012, from http://www.computerworld.com/s/article/9222546/Two_zero_day_vulnerabilities_


found_in_Flash_Player . Coombs, J. (2005). "Win32 API Obscurity for I/O Blocking and Intrusion Prevention." Retrieved 8 August 2012, from http://www.drdobbs.com/win32-api-obscurity-for- io-blocking-and/184406098 . Cornell, D. (2008). Static Analysis Techniques for Testing Application Security. Open Web Application Security Project. San Antonio, Denim Group. Cover, T. M. and J. A. Thomas (1991). Elements of Information Theory, Wiley- Interscience. Crosbie, M. and E. Spafford (1995). Applying Geneting Programming to Intrusion Detection. Proceedings of The AAAI Fall Symposium Series, Association for the Advancement of Artificial Intelligence (AAAI): pp. 1-8. Dagon, D., X. Qin, G. Gu, W. Lee, J. Grizzard, J. Levine and H. Owen (2004). HoneyStat: Local Worm Detection Using Honeypots. Recent Advances in Intrusion Detection: pp. 39-58. Dahbur, K. and B. Mohammad (2011). The Anti-forensics Challenge. Proceedings of the International Conference on Intelligent Semantic Web-Services and Applications. Amman, Jordan, ACM: pp. 1-7. Dai, S., Y. Liu, T. Wang, T. Wei and W. Zou (2010). Behavior-Based Malware Detection on Mobile Phone. 6th International Conference on Wireless Communications Networking and Mobile Computing (WiCOM'10): pp. 1-4. Danzig, P., J. Mogul, V. Paxson and M. Schwartz. (2009). "The Internet Traffic Archive." Retrieved 19 March 2011, from http://ita.ee.lbl.gov/ . Dasgupta, D. (2006). Advances in Artificial Immune Systems. IEEE Computational Intelligence Magazine. 1: pp. 40-49. Delves, P., S. Martin, D. Burton and I. Roitt (2006). Roitt's Essential Immunology (Essentials), Wiley-Blackwell. DeMarines, V. (2008). "Obfuscation - How to Do It and How to Crack It." 2008(7): pp. 4-7. Dillon, S. (2006). Hide and Seek: Concealing and Recovering Hard Disk Data. Infosec Techreport, Department of Computer Science, James Madison University. JMU- INFOSEC-TR-2006-002. p. 17.


Domenico, A. M., C. Giorgio and L. Antonio (2007). "Dependability in Wireless Networks: Can We Rely on WiFi?" IEEE Security & Privacy 5(1): pp. 23-29. El-Semary, A., J. Edmonds, J. Gonzalez and M. Papa (2005). A Framework for Hybrid Fuzzy Logic Intrusion Detection Systems. The 14th IEEE International Conference on Fuzzy Systems: pp. 325-330. EndpointSecurity.org. (2008). "Endpoint Security Homepage." Retrieved 21 January 2009, from http://www.endpointsecurity.org/Documents/What_is_endpointsecurity.pdf . Fan, W. and K. H. Yeung (2010). Virus Propagation Modeling in Facebook. International Conference on Advances in Social Networks Analysis and Mining (ASONAM'10), Odense, Denmark, IEEE Computer Society Press: pp. 331-335. Fitzgerald, P. (2010). "Inside the Jaws of Trojan.Clampi." Retrieved 12 April 2012, from http://www.symantec.com/connect/blogs/inside-jaws-trojanclampi . Fluhrer, S. R., I. Mantin and A. Shamir (2001). Weaknesses in the Key Scheduling Algorithm of RC4. Revised Papers from the 8th Annual International Workshop on Selected Areas in Cryptography, Springer-Verlag: pp. 1-24. Ford, R. (2005). Malcode Mysteries Revealed [Computer Viruses and Worms]. IEEE Security & Privacy Magazine. 3: pp. 72-75. Ford, S., M. Cova, C. Kruegel and G. Vigna (2009). Analyzing and Detecting Malicious Flash Advertisements. Annual Computer Security Applications Conference (ACSAC'09): pp. 363-372. Forrest, S., S. A. Hofmeyr and A. Somayaji (1998). "Intrusion Detection Using Sequences of System Calls." Journal of Computer Security 6(3): pp. 151-180. Forrest, S., A. S. Perelson, L. Allen and R. Cherukuri (1994). Self-nonself Discrimination in a Computer. Proceedings of the IEEE Symposium on Research in Security and Privacy, Computer Society: pp. 202-212. Fredrikson, M., S. Jha, M. Christodorescu, R. Sailer and X. Yan (2010). Synthesizing Near-optimal Malware Specifications from Suspicious Behaviors. IEEE Symposium on Security and Privacy: pp. 45-60. Fu, H., X. Yuan and L. Hu (2007). Design of a Four-Layer Model Based on Danger Theory and AIS for IDS. International Conference on Wireless Communications,


Networking and Mobile Computing: pp. 6331-6334. Garfinkel, S. (2006). Anti-Forensics: Techniques, Detection and Countermeasures. Monterey, CA, USA, Naval Postgraduate School. Gaudin, S. (2007). " Behind Canadian DoS Attack." Retrieved 18 March 2011, from http://www.informationweek.com/news/internet/showArticle.jhtml?articleID=201 500196 . Getz, K. (2002). "Replacing API Calls with .NET Framework Classes." Retrieved 2 August 2012, from http://msdn.microsoft.com/en-us/library/ms973912.aspx . GFI Software. (2012). "Malware Analysis with GFI SandBox (formerly CWSandbox)." Retrieved 25/12/2012, from http://www.gfi.com/malware-analysis-tool . Ghosh, A. and S. Sen (2004). Agent-based Distributed Intrusion Alert System. Distributed Computing: pp. 240-251. Gill, R., J. Smith and A. Clark (2006). Specification-based Intrusion Detection in WLANs. 22nd Annual Computer Security Applications Conference (ACSAC'06): pp. 141-152. Goebel, J., T. Holz and C. Willems (2007). Measurement and Analysis of Autonomous Spreading Malware in a University Environment. Proceedings of the 4th International Conference on Detection of Intrusions and Malware and Vulnerability Assessment, Lucerne, Switzerland, Springer-Verlag: pp. 109-128. Gong, F. (2003). Next Generation Intrusion Detection Systems (IDS), McAfee Inc: p. 14. González, F., D. Dasgupta and L. Niño (2003). A Randomized Real-Valued Negative Selection Algorithm. Artificial Immune Systems: pp. 261-272. Greensmith, J., U. Aickelin and J. Twycross (2006). Articulation and Clarification of the Dendritic Cell Algorithm. Artificial Immune Systems: pp. 404-417. grugq, T. (2004). "The Art of Defiling." Defeating Forensic Analysis on Unix File Systems Retrieved 1 May 2012, from http://www.phrack.org/issues.html?issue=59&id=6 . Gu, Y., A. McCallum and D. Towsley (2005). Detecting Anomalies in Network Traffic Using Maximum Entropy Estimation. Proceedings of the 5th ACM SIGCOMM


Conference on Internet Measurement. Berkeley, CA, USENIX Association: pp. 345-350. Gutmann, P. (1996). Secure Deletion of Data from Magnetic and Solid-State Memory. the Sixth USENIX Security Symposium Proceedings, San Jose, California: p. 8. Han, J. and M. Kamber (2006). Data Mining Concepts and Techniques. San Francisco, Morgan Kaufmann. Harley, D. (2010). "Ten Ways to Dodge Cyber-Bullets." Retrieved 28 March 2011, from http://blog.eset.com/2010/02/10/ten-ways-to-dodge-cyber-bullets-part-8. Harrald, J. R., S. A. Schmitt and S. Shrestha (2004). The Effect of Occurrence and Virus Threat Level on Antivirus Companies' Financial Performance. Proceedings of the IEEE International Engineering Management Conference. 2: pp. 780-784. Hartley, W. M. (2007). "Current and Future Threats to Digital Forensics." ISSA Journal: pp. 12-14. Henry, P. S. and L. Hui (2002). "WiFi: What's Next?" IEEE Communications Magazine 40(12): pp. 66-72. Hex-Rays SA. (2012). "IDA: About." Retrieved 8/5/2012, from http://www.hex- rays.com/products/ida/index.shtml . Hodge, N. and A. Entous. (2011). "Oil Firms Hit by Hackers from China, Report Says." Retrieved 19 March 2011, from http://online.wsj.com/article/SB1000142405274870371690457613466111151886 4.html . Hoglund, G. and J. Butler (2005). Rootkits: Subverting the Windows Kernel, Addison- Wesley Professional. Hollebeek, T. and D. Berrier (2001). Interception, Wrapping and Analysis Framework for Win32 Scripts. Proceedings of the DARPA Information Survivability Conference & Exposition II. 2: pp. 222-229. Hu, Y. and B. Panda (2004). A Data Mining Approach for Database Intrusion Detection. Proceedings of the ACM Symposium on Applied Computing, Nicosia, Cyprus, ACM: pp. 711-716. Hunt, G. and D. Brubacher (1999). Detours: Binary Interception of Win32 Functions.


Proceedings of the 3rd Conference on USENIX Windows NT Symposium, Seattle, Washington, USENIX Association. 3: pp. 1-9. Hypponen, M. (2012). "Why Antivirus Companies Like Mine Failed to Catch Flame and Stuxnet." Retrieved 5 June 2012, from http://www.wired.com/threatlevel/2012/06/internet-security-fail/ . IBM. (2011). "Autonomic Computing." Retrieved 1 July 2010, from http://www.research.ibm.com/autonomic/ . Idika, N. and A. P. Mathur (2007). A Survey of Malware Detection Techniques. SERC Technical Reports, Software Engineering Research Center. SERC-TR-286. p. 48. Ilgun, K., R. A. Kemmerer and P. A. Porras (1995). "State Transition Analysis: a Rule- based Intrusion Detection Approach." IEEE Transactions on Software Engineering 21(3): pp. 181-199. Inspector General for Audit (2004). The Use of Audit Trails to Monitor Key Networks and Systems Should Remain Part of the Computer Security Material Weakness. WASHINGTON, D.C., Department of the Treasury 2004-20-131. p. 35. International Secure Systems Lab. (2012). "Anubis: Analyzing Unknown Binaries." Retrieved 27/7/2012, from http://anubis.iseclab.org/ . Jamaluddin, J., N. Zotou and P. Coulton (2004). Mobile Phone Vulnerabilities: a New Generation of Malware. IEEE International Symposium on Consumer Electronics: pp. 199-202. Jemili, F., M. Zaghdoud and M. B. Ahmed (2007). A Framework for an Adaptive Intrusion Detection System Using Bayesian Network. IEEE Intelligence and Security Informatics: pp. 66-70. Jeong, S. B., Y. W. Choi and S. Kim (2005). An Effective Placement of Detection Systems for Distributed Attack Detection in Large Scale Networks. Information Security Applications: pp. 204-210. Ji, Z. and D. Dasgupta (2004). Real-Valued Negative Selection Algorithm with Variable-Sized Detectors. Genetic and Evolutionary Computation (GECCO'04): pp. 287-298. Jian, G., L. D. Xin and C. B. Ge (2004). An Induction Learning Approach for Building Intrusion Detection Models Using Genetic Algorithms. Fifth World Congress on


Intelligent Control and Automation (WCICA'04). 5: pp. 4339-4342 Jung, J., V. Paxson, A. W. Berger and H. Balakrishnan (2004). Fast Portscan Detection Using Sequential Hypothesis Testing. Proceedings of the IEEE Symposium on Security and Privacy: pp. 211-225. Kannadiga, P. and M. Zulkernine (2005). DIDMA: a Distributed Intrusion Detection System Using Mobile Agents. Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks pp. 238-245. Kaspersky Lab. (2010). "Malware Profile Search." Retrieved June 2010, from http://www.kaspersky.com/find? KDD Cup. (1999). "KDD Cup Datasets." Retrieved 28/8/07, from http://kdd.ics.uci.edu/databases/ . Khayam, S. A., H. Radha and D. Loguinov (2008). Worm Detection at Network Endpoints Using Information-Theoretic Traffic Perturbations. IEEE International Conference on Communications (ICC'08): pp. 1561-1565. Kienzle, D. M. and M. C. Elder (2003). Recent Worms: a Survey and Trends. Proceedings of the ACM Workshop on Rapid Malcode, Washington, DC, USA, ACM: pp. 1-10. Kim, J., J. Greensmith, J. Twycross and U. Aickelin (2005). Malicious Code Execution Detection and Response Immune System Inspired by the Danger Theory. Adaptive and Resilient Computing Security Workshop. Santa Fe, USA. Kolbitsch, C., P. M. Comparetti, C. Kruegel, E. Kirda, X. Zhou and X. Wang (2009). Effective and Efficient Malware Detection at the End Host. Proceeding of the USENIX Security Symposium, USENIX Association: pp. 351-366. Krenhuber, A. and A. Niederschick (2007). Forensic and Anti-forensic on Modern Computer Systems, Johannes Kepler Universität Linz. p. 11. Kruegel, C., D. Balzarotti, W. Robertson and G. Vigna (2007). Improving Signature Testing Through Dynamic Data Flow Analysis. 23rd Annual Computer Security Applications Conference: pp. 53-63. Kruegel, C., D. Mutz, W. Robertson and F. Valeur (2003). Bayesian Event Classification


for Intrusion Detection. Proceedings of the 19th Annual Computer Security Applications Conference: pp. 14-23. Kruegel, C., W. Robertson and G. Vigna (2004). Detecting Kernel-level Rootkits Through Binary Analysis. 20th Annual Computer Security Applications Conference: pp. 91-100. Kruegel, C., F. Valeur, G. Vigna and R. Kemmerer (2002). Stateful Intrusion Detection for High-speed Network's. IEEE Symposium on Security and Privacy: pp. 285- 293. Lack, L. (2003). Using the Bootstrap Concept To Build an Adaptable and Compact Subversion Artifice. Naval Postgraduate School. Monterey, California, United States Naval Academy. Master of Science in Computer Science: p. 90. Lai, Y. (2008). A Feature Selection for Malicious Detection. Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD'08): pp. 365-370. Lakhina, A., M. Crovella and C. Diot (2005). Mining Anomalies Using Traffic Feature Distributions. Proceedings of the 2005 Conference on Applications, Technologies, Architectures and Protocols for Computer Communications, Philadelphia, Pennsylvania, USA, ACM: pp. 217-228. Lawton, G. (2002). "Virus Wars : Fewer Attacks, New Threats." IEEE Computer 35(12): pp. 22-24. Li, W. J., S. Stolfo, A. Stavrou, E. Androulaki and A. D. Keromytis (2007). A Study of Malcode-bearing Documents. Detection of Intrusions and Malware and Vulnerability Assessment, Springer-Verlag Berlin Heidelberg. 4579: pp. 231-250. Lowery, J. C. (2002). "Computer System Security: A Primer." Retrieved 18 March 2011, from http://www.dell.com/content/topics/global.aspx/power/en/ps1q02_lowery?c=us&l =en . Mabu, S., C. Chen, N. Lu, K. Shimada and K. Hirasawa (2011). "An Intrusion- Detection Model Based on Fuzzy Class-Association-Rule Mining Using Genetic Network Programming." IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 41(1): pp. 130-139.


Macleod, C. (2007). "Is That a Hacker Next to You?" Communications Engineer 5(1): pp. 36-37. Mahoney, M. V. and P. K. Chan (2003). An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection. Recent Advances in Intrusion Detection: pp. 220-237. Mallery, J. R. (2006). Secure File Deletion: Fact or Fiction?, SANS Institute. p. 25. MalwareCity. (2011). "Night Dragon." Retrieved 18 March 2011, from http://www.malwarecity.com/community/index.php?app=blog&module=display& section=blog&blogid=23&showentry=4427 . Marchette, D. (1999). A Statistical Method for Profiling Network Traffic. Proceedings of the Workshop on Intrusion Detection and Network Monitoring, USENIX Association: pp. 119-128. Matzinger, P. (1994). "Tolerance, Danger and the Extended Family." Annual Review in Immunology 12: pp. 991-1045. McAfee. (2010a, November 17 2010). "Q3 2010 Threats Report: Average Daily Malware Growth at an All Time High." Retrieved 1 February 2011, from http://www.outlookseries.com/A0999/Security/3752_McAfee_Q3_2010_Threats_ Report_Average_Daily_Malware_Growth_High.htm . Mcafee. (2010b). "Threat Intelligence." Retrieved 1 June 2010, from http://www.mcafee.com/threat-intelligence/ . McHugh, J. (2000). "Testing Intrusion Detection Systems: a Critique of the 1998 and 1999 DARPA Intrusion Detection System Evaluations as Performed by Lincoln Laboratory." ACM Transaction on Information System Security 3(4): pp. 262-294. Merriam-Webster. (2012). "Merriam-Webster's Online Dictionary." Retrieved 3 August 2012, from http://www.merriam- webster.com/dictionary/executable?show=0&t=1343962043 . Metasploit LLC. (2011). "Metasploit Framework." Retrieved 14 March 2011, 2007, from http://www.metasploit.com/framework/download/ . Microsoft. (2010). "Overview of the Windows API." Retrieved 3 August 2012, from http://msdn.microsoft.com/en-us/library/Aa383723 . Microsoft. (2012a). "API Basics." Retrieved 2 August 2012, from


http://msdn.microsoft.com/en-us/library/aa165081%28v=office.10%29.aspx . Microsoft. (2012b). "Common Object File Format (COFF)." Retrieved 3 August 2012, from http://support.microsoft.com/?id=121460 . Microsoft. (2012c). "Detours - Microsoft Research." Retrieved 21/5/2012, from http://research.microsoft.com/en-us/projects/detours/ . Microsoft. (2012d). "Dynamic-Link Libraries." Retrieved 2 August 2012, from http://msdn.microsoft.com/en-us/library/ms682589.aspx . Microsoft. (2012e). "File System Filter Drivers." Retrieved 2 August 2012, from http://msdn.microsoft.com/en-us/library/windows/hardware/gg462968.aspx . Microsoft. (2012f). "minispy Minifilter Sample." Retrieved 2 August 2012, from http://msdn.microsoft.com/en- us/library/windows/hardware/ff549778%28v=vs.85%29.aspx . Microsoft. (2012g). "Platform Invocation Services." Retrieved 10 August 2012, from http://msdn.microsoft.com/en-us/library/aa712982%28v=vs.71%29.aspx . Microsoft. (2012h). "Walkthrough: Calling Windows APIs (Visual Basic)." Retrieved 2 August 2012, from http://msdn.microsoft.com/en-us/library/172wfck9.aspx . Microsoft. (2012i). "What Is VBScript?" Retrieved 2 August 2012, from http://msdn.microsoft.com/en-us/library/1kw29xwf.aspx . Microsoft Corporation. (2004a). "Executable-File Header Format." Retrieved 1 August 2012, from http://support.microsoft.com/kb/65122 . Microsoft Corporation. (2004b). "Microsoft Win32 to Microsoft .NET Framework API Map." Retrieved 4 August 2012, from http://msdn.microsoft.com/en- us/library/aa302340.aspx . Microsoft Corporation. (2009). "Hashtable Class." Retrieved 1 April 2009, from http://msdn.microsoft.com/en-us/library/system.collections.hashtable.aspx . Microsoft Corporation. (2010a). "64-bit Applications." Retrieved 1 August 2012, from http://msdn.microsoft.com/en-us/library/ms241064 . Microsoft Corporation. (2010b). "Microsoft Portable Executable and Common Object File Format Specification." Retrieved 3 August 2012, from http://msdn.microsoft.com/library/windows/hardware/gg463125 . Microsoft Corporation. (2010c). "MSDN Library." Retrieved May 2010, from


http://msdn.microsoft.com/en-us/library . Microsoft Corporation. (2011a). " http://msdn.microsoft.com/en- us/library/cc144204%28VS.85%29.aspx. " Retrieved 14 March, 2011, from http://windows.microsoft.com/en-US/windows-vista/Whats-the-difference- between-AutoPlay-and-autorun . Microsoft Corporation. (2011b). "Windows Virtual PC." Retrieved 26 March 2011, from http://www.microsoft.com/windows/virtual-pc/ . Microsoft Developer Network. (2008). "About Hooks." Retrieved 24 June 2008, from http://msdn.microsoft.com/en-us/library/ms644959(VS.85).aspx . MIT Lincoln Labs. (1999). "DARPA Intrusion Detection Data Sets." 19 March 2011, from http://www.ll.mit.edu/mission/communications/ist/corpora/ideval/data/index.html . Moser, A., C. Kruegel and E. Kirda (2007a). Exploring Multiple Execution Paths for Malware Analysis. IEEE Symposium on Security and Privacy: pp. 231-245. Moser, A., C. Kruegel and E. Kirda (2007b). Limits of Static Analysis for Malware Detection. 23rd Annual Computer Security Applications Conference: pp. 421-430. Mosqueira-Rey, E., A. Alonso-Betanzos, B. del Río and J. Piñeiro (2007). A Misuse Detection Agent for Intrusion Detection in a Multi-agent Architecture. Agent and Multi-Agent Systems: Technologies and Applications: pp. 466-475. Mutz, D., F. Valeur, G. Vigna and C. Kruegel (2006). "Anomalous System Call Detection." ACM Transactions on Information and System Security 9(1): pp. 61- 93. Nektra Advanced Computing. (2010). "Deviare API." Retrieved May 2010, from http://www.nektra.com/products/deviare-api-hook-windows/ . Newsome, J. and D. X. Song (2005). Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. Proceedings of the Network and Distributed System Security Symposium (NDSS'05), San Diego, California, USA: p. 17. nexginrc.org. (2008). "Endpoint Worm Scan Dataset." Retrieved 1 Nov 2009, from http://www.nexginrc.org/index.php?option=com_content&view=category&layout =blog&id=7&Itemid=24 .

nexginrc.org. (2010). "API Call Dataset." Retrieved 10 February 2010, from http://nexginrc.org/Datasets/Default.aspx.
Nichols, S. (2007). "Kernel-level malware on the rise." Retrieved 1 August 2012, from http://www.v3.co.uk/v3-uk/news/2010144/kernel-level-malware-rise.
Ning, P. and S. Jajodia (2003). Intrusion Detection Techniques. The Internet Encyclopedia. H. Bidgoli, John Wiley & Sons: pp. 2–6.
Norman ASA. (2012). "SandBox Information Center." Retrieved 25 July 2012, from http://www.norman.com/security_center/security_tools.
NSL-KDD. (2009). "NSL-KDD Data Set." Retrieved 19 March 2011, from http://nsl.cs.unb.ca/NSL-KDD/.
Offensive Computing. (2010). "Malware Dataset." Retrieved 10 February 2010, from http://www.offensivecomputing.net/?q=node/602.
OllyDbg. (2012). "OllyDbg." Retrieved 8 May 2012, from http://www.ollydbg.de/.
Olzak, T. (2008). "Behavior-based AV Solutions Cannot Stand Alone." Retrieved 10 March 2011, from http://www.techrepublic.com/blog/security/behavior-based-av-solutions-cannot-stand-alone/531.
OSSEC. (2011). "OSSEC: Open Source Host-based Intrusion Detection System." Retrieved 18 March 2011, from http://www.ossec.net/main/downloads.
Panda Security. (2010). "Malware Encyclopedia." Retrieved 1 June 2010, from http://www.pandasecurity.com/homeusers/security-info/about-malware/encyclopedia/results?
Parampalli, C., R. Sekar and R. Johnson (2008). A Practical Mimicry Attack Against Powerful System-call Monitors. Proceedings of the 2008 ACM Symposium on Information, Computer and Communications Security, Tokyo, Japan, ACM: pp. 156-167.
Parrish, A. (2010). "Hackers Plant Pages on University Websites." Retrieved 19 March 2011, from http://personalmoneystore.com/moneyblog/2010/06/30/hackers-plant-pages-on-university-websites/.
Patcha, A. and J. M. Park (2007). "An Overview of Anomaly Detection Techniques: Existing Solutions and Latest Technological Trends." Computer Networks 51(12): pp. 3448-3470.

Patsakis, C., A. Asthenidis and A. Chatzidimitriou (2009). Social Networks as an Attack Platform: Facebook Case Study. Eighth International Conference on Networks (ICN'09): pp. 245-247.
Peddireddy, T. D. and J. M. Vidal (2002). Multiagent Network Security System Using FIPA-OS. Proceedings of IEEE SoutheastCon: pp. 229-233.
Peikari, C. and A. Chuvakin (2004). Security Warrior, O'Reilly Media.
PestPatrol. (2005). "Trojan.Win32.Sweet." Retrieved 1 June 2010, from http://www.pestpatrol.com/zks/pestinfo/t/trojan_win32_sweet.asp.
Pietrek, M. (2002a). "An In-Depth Look into the Win32 Portable Executable File Format." Retrieved 3 August 2012, from http://msdn.microsoft.com/en-us/magazine/cc301805.aspx.
Pietrek, M. (2002b). "An In-Depth Look into the Win32 Portable Executable File Format, Part 2." Retrieved 3 August 2012, from http://msdn.microsoft.com/en-us/magazine/cc301808.aspx.
Porras, P. A. and P. G. Neumann (1997). EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances. Proceedings of the 20th National Information Systems Security Conference: pp. 353-365.
Ramilli, M. and M. Bishop (2010). Multi-stage Delivery of Malware. 5th International Conference on Malicious and Unwanted Software (MALWARE): pp. 91-97.
Ramilli, M. and M. Prandini (2010). "Always the Same, Never the Same." IEEE Security & Privacy 8(2): pp. 73-75.
Reavey, M. (2012). "Microsoft releases Security Advisory 2718704." Retrieved 5 June 2012, from http://blogs.technet.com/b/msrc/archive/2012/06/03/microsoft-releases-security-advisory-2718704.aspx?Redirected=true.
Refaeilzadeh, P., L. Tang and H. Liu (2009). Cross-Validation. Encyclopedia of Database Systems. L. Liu and M. T. Özsu: pp. 532-538.
Richter, J. (1999). Programming Applications for Microsoft Windows, Microsoft Press.
Rogers, D. T. (2003). A Framework for Dynamic Subversion. Naval Postgraduate School. Monterey, California, United States Naval Academy. Master of Science in Computer Science: p. 128.
Rojach, L. and O. Maimon (2008). Data Mining With Decision Tree, World Scientific Publishing.
Rouse, M. (2010). "Privilege escalation attack." Retrieved 1 August 2012, from http://searchsecurity.techtarget.com/definition/privilege-escalation-attack.
Russinovich, M. (2004). "Inside the Native API." Retrieved 3 July 2012, from http://netcode.cz/img/83/nativeapi.html.
Russinovich, M. (2006). "Inside Native Applications." Retrieved 8 August 2012, from http://technet.microsoft.com/en-us/sysinternals/bb897447.aspx.
Russinovich, M. E. and D. A. Solomon (2005). Microsoft Windows Internals, Microsoft Press.
Sabhnani, M. and G. Serpen (2003). Application of Machine Learning Algorithms to KDD Intrusion Detection Dataset within Misuse Detection Context. Proceedings of International Conference on Machine Learning: Models, Technologies and Applications: pp. 23-26.
Saiedian, H. and D. Broyle (2011). "Security Vulnerabilities in the Same-Origin Policy: Implications and Alternatives." Computer 44(9): pp. 29-36.
Sami, A., B. Yadegari, H. Rahimi, N. Peiravian, S. Hashemi and A. Hamze (2010). Malware Detection based on Mining API Calls. ACM Symposium on Applied Computing, Sierre, Switzerland, ACM: pp. 1020-1025.
Saxena, P., R. Sekar, M. Iyer and V. Puranik (2008). A Practical Technique for Containment of Untrusted Plug-ins, Secure Systems Lab, Stony Brook University. p. 14.
Sekar, R., A. Gupta, J. Frullo, T. Shanbhag, A. Tiwari, H. Yang and S. Zhou (2002). Specification-based Anomaly Detection: a New Approach for Detecting Network Intrusions. Proceedings of the 9th ACM Conference on Computer and Communications Security. Washington, DC, USA, ACM: pp. 265-274.
Shafi, K. (2008). An Online and Adaptive Signature-based Approach for Intrusion Detection Using Learning Classifier Systems. School of Engineering and Information Technology. Canberra, University of New South Wales. Ph.D Thesis: p. 282.
Shafi, K. and H. A. Abbass (2007). Biologically-inspired Complex Adaptive Systems approaches to Network Intrusion Detection, Elsevier Advanced Technology Publications. 12: pp. 209-217.
Shafiq, M., S. Khayam and M. Farooq (2008a). Embedded Malware Detection Using Markov n-Grams. Detection of Intrusions and Malware and Vulnerability Assessment. D. Zamboni, Springer-Verlag Berlin Heidelberg: pp. 88-107.
Shafiq, M. Z., S. A. Khayam and M. Farooq (2008b). Improving Accuracy of Immune-inspired Malware Detectors by Using Intelligent Features. Proceedings of the 10th Annual Conference on Genetic and Evolutionary Computation. Atlanta, USA, ACM Press: pp. 119-126.
Shanmugam, B. (2008). "Hardening Windows Security." Retrieved 12 December 2008, from http://www.malwarehelp.org/malware-prevention-hardening-windows-security1.html.
Shannon, C. (2007). Current Network Security Threats: DoS, Viruses, Worms, Botnets. TERENA Networking Conference, Cooperative Association for Internet Data Analysis (CAIDA'07): pp. 1-43.
Shinagawa, T., K. Kono and T. Masuda (2000). Exploiting Segmentation Mechanism for Protecting against Malicious Mobile Code, Department of Information Science, Faculty of Science, University of Tokyo. 00-02. p. 16.
Sophos Labs. (2010). "Dangers Of Virus Signature Checksum." Retrieved 18 March 2011, from http://nakedsecurity.sophos.com/2010/01/17/dangers-virus-signature-checksum/.
Sourcefire. (2011). "Snort Downloads." Retrieved 17 March 2011, from http://www.snort.org/snort-downloads.
Spitzner, L. (2003). "Honeypot Farms." Retrieved 19 March 2011, from http://www.symantec.com/connect/articles/honeypot-farms.
Spyware Terminator. (2010). "Worm.Fozer." Retrieved 1 June 2010, from http://www.spywareterminator.com/it/item/6976/WormFozer.html.
Stallings, W. (2011). Operating Systems: Internals and Design Principles, Prentice Hall.
Stolfo, S. J., K. Wang and W. J. Li (2007). Towards Stealthy Malware Detection. Malware Detection, Springer US. 27: pp. 231-249.
Sunbelt Security. (2010). "Trojan.Win32.Younmac Information and Removal." Retrieved 1 June 2010, from http://www.sunbeltsecurity.com/ThreatDisplay.aspx?name=Trojan.Win32.Younmac&tid=1421927&cs=7B440D54F8A629D4C23E010679C052B7.
Sundaram, A. (1996). An Introduction to Intrusion Detection, ACM. 2: pp. 3-7.
Surisetty, S. and S. Kumar (2010). Is McAfee SecurityCenter/Firewall Software Providing Complete Security for Your Computer? Fourth International Conference on Digital Society (ICDS'10): pp. 178-181.
Symantec. (2007). "W32.Almanahe.A." Retrieved 1 June 2010, from http://www.symantec.com/business/security_response/writeup.jsp?docid=2007-041317-4330-99.
Symantec. (2010). "Malware Profile Search." Retrieved 1 June 2010, from http://searchg.symantec.com/search?
Symantec Corporation. (2012). "Norton™ AntiVirus 2012." Retrieved 8 June 2012, from https://store.norton.com/estore/productsdetailsmoreinfo/productskucode/21172370/sourcepagetype/landingproductfeatures/parentcartid/0/pricegroupid/master_cons_pl/asoociationtype/0/slotno/4.html.
Szor, P. (2005). The Art of Computer Virus Research and Defense, Addison-Wesley Professional.
Szor, P. and P. Ferrie (2001). Hunting for Metamorphic. Virus Bulletin Conference (VB'01): pp. 123-144.
Tavallaee, M., E. Bagheri, L. Wei and A. A. Ghorbani (2009). A Detailed Analysis of the KDD CUP 99 Data Set. IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA'09): pp. 1-6.
Tavallaee, M., N. Stakhanova and A. A. Ghorbani (2010). "Toward Credible Evaluation of Anomaly-Based Intrusion-Detection Methods." IEEE Transactions on Systems, Man and Cybernetics, Part C: Applications and Reviews 40(5): pp. 516-524.
Tewari, M. (2008). "Cyber Attack on 10 Government Websites." Retrieved 19 March 2011, from http://www.dnaindia.com/india/report_cyber-attack-on-10-govt-websites_1169339.
Texas Instruments (2009). Common Object File Format, Texas Instruments. SPRAAO8. p. 15.

The Grugq. (2002). "Defeating Forensic Analysis on Unix." Phrack 59. Retrieved 18 May 2012, from http://www.phrack.org/issues.html?issue=59&id=6.
The Honeynet Project (2004). Know Your Enemy: Learning about Security Threats, Addison-Wesley Professional.
Thompson, I. and M. Monroe. (2006). "FragFS: An Advanced Data Hiding Technique." Retrieved 1 May 2012, from http://www.blackhat.com/presentations/bh-federal-06/BH-Fed-06-Thompson/BH-Fed-06-Thompson-up.pdf.
ThreatExpert. (2010). "Browse/Search All Reports." Retrieved 1 June 2010, from http://www.threatexpert.com/reports.aspx?find=Sneaker&x=0&y=0.
ThreatExpert Ltd. (2012). "Welcome to ThreatExpert." Retrieved 25 July 2012, from http://www.threatexpert.com.
TIS Committee. (1995). "Executable and Linking Format (ELF)." Retrieved 3 August 2012, from http://refspecs.linuxbase.org/elf/elf.pdf.
Tokhtabayev, A. G., V. A. Skormin and A. M. Dolgikh (2008). Detection of Worm Propagation Engines in the System Call Domain Using Colored Petri Nets. IEEE International Conference on Performance, Computing and Communications (IPCCC'08), Austin, Texas: pp. 59-68.
Trend Micro. (2010). "Threat Encyclopedia." Retrieved 1 June 2010, from http://about-threats.trendmicro.com/threatencyclopedia.aspx?language=us&tab=malware.
Uppuluri, P. and R. Sekar (2001). Experiences with Specification-based Intrusion Detection. Recent Advances in Intrusion Detection. W. Lee, L. Mé and A. Wespi, Springer. 2212: pp. 172-189.
US-CERT. (2004). "Statistics on Federal Incident Reports." Retrieved April 2007, from http://www.us-cert.gov/federal/statistics.
US-CERT. (2008a). "Computer Forensics." Retrieved 18 May 2012, from http://www.us-cert.gov/reading_room/forensics.pdf.
US-CERT. (2008b). "Microsoft Windows Fails to Properly Handle the NoDriveTypeAutoRun Registry Value." Retrieved 14 March 2011, from http://www.kb.cert.org/vuls/id/889747.
Valenzise, G., V. Nobile, M. Tagliasacchi and S. Tubaro (2011). Countering JPEG Anti-forensics. 18th IEEE International Conference on Image Processing (ICIP'11): pp. 1949-1952.
Vanderboom, D. (2008). "Tree<T>: Implementing a Non-Binary Tree in C#." Retrieved 1 January 2009, from http://dvanderboom.wordpress.com/2008/03/15/treet-implementing-a-non-binary-tree-in-c/.
Vega, M. D. (2011). "From RSA 2011: Last Nail in the Coffin for Signature-based AV." Retrieved 18 March 2011, from http://blog.trendmicro.com/from-rsa-2011-last-nail-in-the-coffin-for-signature-based-av.
Venkataram, P., J. Pitt, B. S. Babu and E. Mamdani (2008). An Intelligent Proactive Security System for Cyber Centres Using Cognitive Agents, Inderscience Publishers. 2: pp. 235-249.
Viega, J. (2011). "Reality Check." IEEE Security & Privacy 9(1): pp. 3-4.
Virus Bulletin. (2011). "Chinese Whispers of Malware Writing and Bribery in the Industry." Retrieved 18 March 2011, from http://www.virusbtn.com/news/2010/12_14.xml?rss.
VX Heavens. (2010). "Virus collection." Retrieved 1 March 2008, from http://vx.netlux.org/faq.php#whole.
Wagner, D. and P. Soto (2002). Mimicry Attacks on Host-based Intrusion Detection Systems. Proceedings of the 9th ACM Conference on Computer and Communications Security, Washington, DC, USA, ACM: pp. 255-264.
Wang, J. H., P. S. Deng, Y. S. Fan, L. J. Jaw and Y. C. Liu (2003). Virus Detection Using Data Mining Techniques. Proceedings of the IEEE 37th Annual International Carnahan Conference on Security Technology: pp. 71-76.
Weaver, N., V. Paxson, S. Staniford and R. Cunningham (2003). A Taxonomy of Computer Worms. Proceedings of the ACM Workshop on Rapid Malcode, Washington, DC, USA, ACM: pp. 11-18.
Wee, C. K. (2006). Analysis of Hidden Data in NTFS File System, Edith Cowan University.
Wespi, A., M. Dacier and H. Debar (2000). Intrusion Detection Using Variable-Length Audit Trail Patterns. Proceedings of the Third International Workshop on Recent Advances in Intrusion Detection, Springer-Verlag: pp. 110-129.
Whittaker, J. A. and A. D. Vivanco (2002). Neutralizing Windows-based Malicious Mobile Code. Proceedings of the 2002 ACM Symposium on Applied Computing, Madrid, Spain, ACM: pp. 242-246.
Williamson, M. M. (2002). Throttling Viruses: Restricting Propagation to Defeat Malicious Mobile Code. Proceedings of the 18th Annual Computer Security Applications Conference: pp. 61-68.
Witten, I. H. and E. Frank (2005). Data Mining: Practical Machine Learning Tools and Techniques. USA, Morgan Kaufmann.
www.apimonitor.com. (2010). "API Monitor." Retrieved May 2010, from http://apimonitor.com/order.html.
www.f-secure.com. (2010). "Brontok.N." Retrieved May 2010, from http://www.f-secure.com/v-descs/brontok_n.shtml.
Xiao, K., J. Zheng, X. Wang and X. Xue (2005). A Novel Peer-to-Peer Intrusion Detection System. Sixth International Conference on Parallel and Distributed Computing, Applications and Technologies: pp. 441-445.
Yan, W., Z. Zhang and N. Ansari (2008). "Revealing Packed Malware." IEEE Security & Privacy: pp. 72-76.
Ye, Y., D. Wang, T. Li and D. Ye (2007). IMDS: Intelligent Malware Detection System. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, ACM: pp. 1043-1047.
Yegneswaran, V., P. Barford and J. Ullrich (2003). Internet Intrusions: Global Characteristics and Prevalence. Proceedings of the 2003 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems. San Diego, CA, USA, ACM: pp. 138-147.
Zaidi, A., T. Kenaza and N. Agoulmine (2010). IDS Adaptation for an Efficient Detection in High-Speed Networks. Proceedings of the Fifth International Conference on Internet Monitoring and Protection, IEEE Computer Society: pp. 11-15.
Zdrnja, B. (2010). "Malware Modularization and AV Detection Evasion." Retrieved 12 April 2012, from https://isc.sans.edu/diary.html?storyid=8857.
Zeltser, L. (2012). "Reverse-engineering Malware Cheat Sheet." Retrieved 8 May 2012, from http://zeltser.com/reverse-malware/reverse-malware-cheat-sheet.html.

Zhang, G. Y., J. Li and G. C. Gu (2004). Research on Defending DDoS Attack - an Expert System Approach. IEEE International Conference on Systems, Man and Cybernetics. 4: pp. 3554-3558.
Zimry, Irene and Yeh. (2011). "DroidKungFu Utilizes an Update Attack." Retrieved 5 June 2012, from http://www.f-secure.com/weblog/archives/00002259.html.
Zou, C. C., N. Duffield, D. Towsley and W. Gong (2006). "Adaptive Defense Against Various Network Attacks." IEEE Journal on Selected Areas in Communications 24(10): pp. 1877-1888.
Zyba, G., G. M. Voelker, M. Liljenstam, A. Mehes and P. Johansson (2009). Defending Mobile Phones from Proximity Malware. Proceedings of the IEEE INFOCOM Conference: pp. 1503-1511.
