Machine Learning Algorithms for the Analysis and Detection of Network Attacks
MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS AND DETECTION OF NETWORK ATTACKS

by

Maryam Mousaarab Najafabadi

A Dissertation Submitted to the Faculty of The College of Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Florida Atlantic University
Boca Raton, FL
August 2017

Copyright 2017 by Maryam Mousaarab Najafabadi

ACKNOWLEDGEMENTS

First and foremost, I would like to acknowledge my graduate advisor and mentor, Dr. Taghi M. Khoshgoftaar. His unwavering support, patience, and knowledge have helped me to become the researcher I am today. I would also like to thank Dr. Bassem Alhalabi, Dr. Xingquan Zhu, and Dr. Hanqi Zhuang, for being on my PhD supervisory committee.

I want to thank the members of FAU security analytic group, Mr. Richard Zuech and especially Mr. Chad Calvert and Mr. Clifford Kemp for their data collection efforts. I would like to acknowledge NSF I/UCRC which provided a framework for interaction between FAU faculty and industry, as well as the LexisNexis company for supporting my research assistantship position during my PhD studies.

I want to thank Mr. Richard Bauder, from FAU Data Mining and Machine Learning Laboratory, for his constructive reviews of this dissertation. My thanks also go to the other members of the FAU Data Mining and Machine Learning Laboratory at Florida Atlantic University for their continued feedback and support.

I also gratefully acknowledge partial support by the National Science Foundation, under grant number CNS-1427536. Any opinions, findings, and conclusions or recommendations expressed in this dissertation are those of the author and do not necessarily reflect the views of the National Science Foundation.

ABSTRACT

Author: Maryam Mousaarab Najafabadi
Title: Machine learning algorithms for the analysis and detection of network attacks
Institution: Florida Atlantic University
Dissertation Advisor: Dr. Taghi M. Khoshgoftaar
Degree: Doctor of Philosophy
Year: 2017

The Internet and computer networks have become an important part of our organizations and everyday life. With the increase in our dependence on computers and communication networks, malicious activities have become increasingly prevalent. Network attacks are an important problem in today’s communication environments. The network traffic must be monitored and analyzed to detect malicious activities and attacks to ensure reliable functionality of the networks and security of users’ information.

Recently, machine learning techniques have been applied toward the detection of network attacks. Machine learning models are able to extract similarities and patterns in the network traffic. Unlike signature based methods, there is no need for manual analyses to extract attack patterns. Applying machine learning algorithms can automatically build predictive models for the detection of network attacks.

This dissertation reports an empirical analysis of the usage of machine learning methods for the detection of network attacks. For this purpose, we study the detection of three common attacks in computer networks: SSH brute force, Man In The Middle (MITM) and application layer Distributed Denial of Service (DDoS) attacks. Using outdated and non-representative benchmark data, such as the DARPA dataset, in the intrusion detection domain, has caused a practical gap between building detection models and their actual deployment in a real computer network. To alleviate this limitation, we collect representative network data from a real production network for each attack type. Our analysis of each attack includes a detailed study of the usage of machine learning methods for its detection. This includes the motivation behind the proposed machine learning based detection approach, the data collection process, feature engineering, building predictive models and evaluating their performance.
We also investigate the application of feature selection in building detection models for network attacks. Overall, this dissertation presents a thorough analysis on how machine learning techniques can be used to detect network attacks. We not only study a broad range of network attacks, but also study the application of different machine learning methods including classification, anomaly detection and feature selection for their detection at the host level and the network level.

To my beloved family, my parents, Mostafa and Manije, my sisters, Anna and Elahe and my brother, Ehsan. You are the greatest treasure in my life. None of my success would be possible without your endless love and support.

MACHINE LEARNING ALGORITHMS FOR THE ANALYSIS AND DETECTION OF NETWORK ATTACKS

List of Tables
List of Figures
1 Introduction
  1.1 Motivation
    1.1.1 Intrusion Detection
    1.1.2 Machine Learning for the Detection of Network Attacks
    1.1.3 Lack of Proper Intrusion Detection Public Data
  1.2 Contributions
  1.3 Dissertation Structure
2 Methodology
3 SSH Brute Force Attacks
  3.1 Background and Motivation
  3.2 Related Work
  3.3 Data Collection
  3.4 Detection of SSH Brute Force Attacks
  3.5 Experimental Results and Analysis
  3.6 Chapter Summary
4 Man In The Middle Attacks
  4.1 Background and Motivation
  4.2 Related Work
  4.3 Data Collection
  4.4 Detection of Man In The Middle Traffic
    4.4.1 General Approach
    4.4.2 Selecting a Subset of Packet Header Fields
  4.5 Experimental Results and Analysis
  4.6 Chapter Summary
5 Application Layer DDoS Attacks
  5.1 Background and Motivation
  5.2 Related Work
  5.3 Data Collection
  5.4 User Behavior Anomaly Detection
    5.4.1 Defining User Behavior
    5.4.2 Detecting Anomalous User Behaviors
  5.5 Experimental Results and Analysis
    5.5.1 Results for PCA-subspace Anomaly Detection
    5.5.2 Results for One-class SVM Anomaly Detection
    5.5.3 Comparison analysis
  5.6 Chapter Summary
6 Feature Selection
  6.1 Background and Motivation
  6.2 Related Work
  6.3 Methodology
    6.3.1 Evaluating Feature Selection Methods for Detection of Network Attacks
    6.3.2 Ensemble of Feature Selection Methods for Analyzing Important Features
  6.4 Experimental Results and Analysis
    6.4.1 Results for Evaluating Feature Selection Methods for Detection of Network Attacks
    6.4.2 Results for Ensemble of Feature Selection Methods for Analyzing Important Features
  6.5 Chapter Summary
7 Conclusion and Future Works
  7.1 Conclusions
    7.1.1 Data Collection
    7.1.2 Machine Learning Methods for The Detection of Network Attacks
    7.1.3 Future Work
Bibliography

LIST OF TABLES

3.1 Description of features extracted from aggregated data
3.2 Cross validation results
4.1 Collected data information
4.2 Selected header fields for each attack and the whole data
4.3 Selected header fields for each attack and the whole data when checksum and length-related fields are removed
4.4 Performance results for different subsets of packet header fields and different datasets
5.1 Attack variants
5.2 Number of instances in each dataset
5.3 AUC and number of instance for each attack type
6.1 Details of the dataset
6.2 Description of features extracted from sessions
6.3 AUC values for different combination of feature selection and classification methods
6.4 ANOVA Results
6.5 Selected features by the ensemble of rankers
6.6 Cross validation results on the whole feature set
6.7 Cross validation results on the selected feature set with 7 features
6.8 ANOVA Results

LIST OF FIGURES

2.1 General methodology
2.2 Network topology
4.1 Man In The Middle (MITM) Attack
4.2 Half-duplex Man In The Middle
4.3 Full-duplex Man In The Middle
4.4 Plotted TPR results for Table 4.4
5.1 Distributed Denial of Service