Mitigating Concept Drift in Data Mining Applications for Intrusion Detection Systems

Koutrouki Evgenia
SID: 3307160006
Supervisor: Prof. Georgios Ioannou

SCHOOL OF SCIENCE & TECHNOLOGY
A thesis submitted for the degree of Master of Science (MSc) in Communications and Cybersecurity

DECEMBER 2017
THESSALONIKI – GREECE

ABSTRACT

The phenomenon of concept drift is defined as the unexpected behavior of a data stream under changing environments. It is considered one of the major problems of data mining applications. Intrusion detection systems are closely tied to data mining because they collect, monitor and analyze data, so, as expected, they also experience concept drift. By analyzing the different types of data mining, machine learning, intrusion detection systems and concept drift, a new method is developed with the aim of mitigating this phenomenon in data mining applications for intrusion detection systems.

Koutrouki Evgenia
31/12/2017

Contents

ABSTRACT
CHAPTER 1 INTRODUCTION
  1.1 Context
  1.2 Problem Statement
    1.2.1 Concept Drift
    1.2.2 Machine Learning and Data Mining
    1.2.3 Intrusion Detection Systems
  1.3 Aims & Objectives
  1.4 Dissertation Layout
CHAPTER 2 LITERATURE REVIEW
  2.1 An overview of data mining techniques
    2.1.1 Classification
    2.1.2 Regression
    2.1.3 Association
    2.1.4 Clustering
  2.2 Data Mining Techniques in Intrusion Detection Systems
    2.2.1 k-Nearest Neighbor Algorithm
    2.2.2 Naïve Bayes Algorithm
    2.2.3 Decision Tree Algorithm
  2.3 Developed Methods for Mitigating Concept Drift in Intrusion Detection Systems
    2.3.1 MineClass Algorithm
    2.3.2 Early Drift Detection Method
    2.3.3 DWM Algorithm
    2.3.4 FAE Algorithm
CHAPTER 3 METHODOLOGY
  3.1 Software Setup
    3.1.1 Waikato Environment for Knowledge Analysis (WEKA)
    3.1.2 Massive Online Analysis (MOA)
    3.1.3 Eclipse
    3.1.4 Operating System
  3.2 Experimental Dataset
  3.3 Algorithm Implementation
    3.3.1 Objective
    3.3.2 The New Method Hypothesis
    3.3.3 Classifier Behavior
  3.4 Data Analysis
    3.4.1 Output Information
    3.4.2 Testing Method
    3.4.3 Evaluation Methods
CHAPTER 4 RESULTS AND ANALYSIS
  4.1 DATASET PRESENTATION
  4.2 RESULTS IN WEKA ENVIRONMENT
    4.1.1 Hoeffding Tree
    4.1.2 Naïve Bayes
    4.1.3 J48
    4.1.3 lazy IBk
  4.3 RESULTS IN MOA ENVIRONMENT
CHAPTER 5 CONCLUSIONS
REFERENCES
APPENDIX

CHAPTER 1 INTRODUCTION

This first chapter introduces data mining, machine learning and intrusion detection systems. It also presents and analyzes the problem of concept drift and gives a preview of the methodology used.

1.1 Context

In discussions of technology, the term machine learning comes up constantly. The two are paired because, as technology evolves and becomes a large part of people's everyday lives, automating processes has become a critical need. Numerous techniques have been developed to automate both systems and communications, but new problems have appeared along with them.

Securing sensitive data is also related to automated processes. Different types of systems are built to detect and prevent attacks that target sensitive data. Such systems are called intrusion detection systems (IDS). Intrusion detection systems use machine learning algorithms to become smarter and faster at detecting malicious activity.

Data that are received continuously over a long period of time are known as data streams. As these sequential data arrive, they are processed and analyzed by the algorithms so that useful information can be acquired [1].

Data streams can be processed either offline or online. In offline training, the whole training dataset is available while the model is being trained. Once training has finished, the model can be used for predictions. In online training, the data keep arriving and are processed sequentially. The training dataset is never complete during training; the model keeps training as new training data arrive. The trained model is used for predictions
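The difference between offline and online training described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method developed in this thesis: the CountingClassifier, the helper functions and the toy records (a protocol name paired with a label) are all hypothetical. The online loop follows a test-then-train order, in which each record is first used to check the current model and only afterwards used to update it.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (protocol, label). Not taken from the thesis dataset.
RECORDS = [
    ("tcp", "normal"), ("icmp", "attack"), ("tcp", "normal"),
    ("tcp", "normal"), ("icmp", "attack"), ("udp", "normal"),
]


class CountingClassifier:
    """Hypothetical learner: predicts the label seen most often for a feature value."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # feature value -> label counts
        self.overall = Counter()            # label counts over all records seen

    def learn_one(self, x, y):
        # Incremental update: one record at a time, no stored dataset needed.
        self.counts[x][y] += 1
        self.overall[y] += 1

    def predict_one(self, x):
        if x in self.counts:
            return self.counts[x].most_common(1)[0][0]
        if self.overall:
            # Unseen feature value: fall back to the most frequent label so far.
            return self.overall.most_common(1)[0][0]
        return None  # nothing learned yet


def train_offline(records):
    """Offline: the whole training set is available before any prediction."""
    model = CountingClassifier()
    for x, y in records:
        model.learn_one(x, y)
    return model


def train_online(stream):
    """Online: records arrive one by one; predict first, then learn."""
    model = CountingClassifier()
    correct = total = 0
    for x, y in stream:
        if model.predict_one(x) == y:
            correct += 1
        total += 1
        model.learn_one(x, y)
    return model, correct, total


if __name__ == "__main__":
    offline_model = train_offline(RECORDS)
    print("offline prediction for icmp:", offline_model.predict_one("icmp"))

    online_model, correct, total = train_online(iter(RECORDS))
    print("online accuracy so far:", correct, "of", total)
```

In the offline case the model is queried only after it has seen every record, whereas in the online case its running accuracy reflects predictions made with whatever had been learned up to that point in the stream.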
