Improving the Effectiveness of Behaviour-Based Malware Detection
Mohd Fadzli Marhusin
BSc. Information Studies (Hons) (Information Systems Management), UiTM, Malaysia
Master of Information Technology (Computer Science), UKM, Malaysia

A thesis submitted in partial fulfilment of the requirements for the degree of Doctor of Philosophy at the School of Engineering and Information Technology, University of New South Wales, Australian Defence Force Academy

Copyright 2012 by Mohd Fadzli Marhusin

THE UNIVERSITY OF NEW SOUTH WALES
Thesis/Dissertation Sheet

Surname or Family name: MARHUSIN
First name: MOHD FADZLI
Other name/s:
Abbreviation for degree as given in the University calendar: PhD (Computer Science)
School: School of Engineering and Information Technology (SEIT)
Faculty:
Title: Improving the Effectiveness of Behaviour-based Malware Detection

Abstract 350 words maximum:

Malware is software code which has malicious intent but can only do harm if it is allowed to execute and propagate. Detection based on signature alone is not the answer, because new malware with new signatures cannot be detected. Thus, behaviour-based detection is needed to detect novel malware attacks. Moreover, malware detection is a challenging task when most of the latest malware employs some protection and evasion techniques. In this study, we present a malware detection system that addresses both propagation and execution. Detection is based on monitoring session traffic for propagation, and API call sequences for execution. For malware detection during propagation, we investigate the effectiveness of signature-based detection, anomaly-based detection and the combination of both. The decision-making relies upon a collection of recent signatures of session-based traffic data collected at the endpoint level. Patterns in terms of port distributions and frequency or session rates of the signatures are observed. If an abnormality is found, it often signifies worm behaviour.
A knowledge base consisting of recent traffic data, which is used to predict future traffic patterns, helps to reverse the incorrect flagging of suspected worms. For detection based on execution, we analyse sequences of API calls grouped into n-grams which are compared with benign and malware profiles. A decision is made based on a statistical measure, which indicates how close the behaviour represented in the n-grams is to each of the profiles. The main contributions of this thesis are: the proposal and evaluation of a framework for detecting malware that considers both propagation and execution in a systematic way; detection methods based on information that is simpler to process than that used by other proposals in the literature, yet which still achieve very high detection accuracy; and the correct recognition of malware early in its execution. The experimental results show that our framework is promising in terms of effective behaviour-based detection that can detect malware and protect our computer networks from future zero-day attacks.

Declaration relating to disposition of project thesis/dissertation

I hereby grant to the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or in part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all property rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only).
Signature: ........................ Witness: ........................ Date: ........................

The University recognises that there may be exceptional circumstances requiring restrictions on copying or conditions on use. Requests for restriction for a period of up to 2 years must be made in writing. Requests for a longer period of restriction may be considered in exceptional circumstances and require the approval of the Dean of Graduate Research.

FOR OFFICE USE ONLY
Date of completion of requirements for Award:

THIS SHEET IS TO BE GLUED TO THE INSIDE FRONT COVER OF THE THESIS

COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.’

Signed: ........................
Date: ........................

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed: ........................
Date: ........................

Abstract

Malware is software code that has malicious intent. In recent years, there have been huge changes in the threat landscape. As our dependency on the Internet for social-related information sharing and work increases, the number of possible threats is huge and we are indeed susceptible to them. Attacks may range from the individual or organisational level to nation-states resorting to cyber warfare to infiltrate and sabotage enemies' operations. Hence, the need for a secure and dependable cyber defence is relevant at all levels.

Malware can only do harm if it is allowed to propagate and execute without being detected. Detection based on signature alone is not the answer, because new malware with new signatures cannot be detected. Thus, behaviour-based detection is needed to detect novel malware attacks. Moreover, malware detection is a challenging task when most of the latest malware employs some protection and evasion techniques.

In this study, we present a malware detection system that addresses both propagation and execution. Detection is based on monitoring session traffic for propagation, and API call sequences for execution. Our approach is inspired by the human immune system theories known as the Self/Non-self Theory and the Danger Theory.

For malware detection during propagation, we investigate the effectiveness of signature-based detection, anomaly-based detection and the combination of both. The decision-making relies upon a collection of recent signatures of session-based traffic data collected at the endpoint (single computer) level. Patterns in terms of port distributions and frequency or session rates of the signatures are observed. If an abnormality is found, it often signifies worm behaviour. A knowledge base consisting of recent traffic data, which is used to predict future traffic patterns, helps to reverse the incorrect flagging of suspected worms.
It maintains only recent data as the usage pattern of a computer changes over time.

Our proposed system includes several detectors, the operations of which are governed by several parameters. We study both how these parameters affect the results and how the system performs when different detectors are included or excluded. We find that the detectors produce inconsistent results when used independently but achieve promising detection rates when used together. In addition, we identify which worms are consistently detected by the system, and the characteristics of those the system cannot detect well.

For detection based on execution, we analyse sequences of API calls grouped into n-grams which are compared with benign and malware profiles. A decision is made based on a statistical measure, which indicates how close the behaviour represented in the n-grams is to each of the profiles. Experiments show that the system is capable of correctly detecting malware early in its execution.

The main contributions of this thesis are: the proposal and evaluation of a framework for detecting malware that considers both propagation and execution in a systematic way; detection methods based on information that is simpler to process than that used by other proposals in the literature, yet which still achieve very high detection accuracy; and the correct recognition of malware early in its execution. The experimental results show that our framework is promising in terms of effective behaviour-based detection that can detect malware and protect our computer networks from future zero-day attacks.

Keywords

Malware Detection System
Intrusion Detection System
Self-propagating Worm
Session-based Detection
API Calls based Detection

Acknowledgement

The highest gratitude to the God