
2020 International Conference on Computational Science and Computational Intelligence (CSCI)

Network Intrusion Detection with XGBoost and Deep Learning Algorithms: An Evaluation Study

Amr Attia, Miad Faezipour, Abdelshakour Abuzneid
Computer Science & Engineering, University of Bridgeport, CT 06604, USA
[email protected], [email protected], [email protected]

Abstract— This paper introduces an effective Network Intrusion Detection System (NIDS) framework that deploys incremental statistical damping features of the packets along with state-of-the-art machine/deep learning algorithms to detect malicious patterns. A comprehensive evaluation study is conducted between eXtreme Gradient Boosting (XGBoost) and Artificial Neural Networks (ANN), where feature selection and/or feature dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are also integrated into the models to decrease the system complexity for achieving fast responses. Several experimental runs confirm how powerful machine/deep learning algorithms are for intrusion detection on known attacks when combined with the appropriate extracted features. To investigate unknown attacks, the models were trained on a subset of the attack datasets, while a different dataset (with a different attack type) was kept aside for testing. The decent results achieved further support the belief that, through supervised learning, the model could additionally detect unknown attacks.

Keywords- NIDS; Machine Learning; ANN; XGBoost; LDA; PCA.

I. INTRODUCTION

A. Background

The criticality of intrusion detection has been increasing significantly, especially in the era of big data, where a huge amount of information is continuously transferred at high data rates. Moreover, the COVID-19 pandemic has drastically increased the urgent need to transfer data digitally and provide all possible workflows online. This surge also raises the pressing need for more secure Internet usage. It is therefore very important to devise intelligent network intrusion detection systems (NIDS) using state-of-the-art technology. There are two types of network intrusion detection systems: i) signature-based and ii) anomaly-based detection, where machine learning is widely deployed [1].

B. Related Work

Applying effective machine/deep learning in network intrusion detection systems has become increasingly popular and significantly crucial due to the rising demand for Internet deployment in every aspect of our lives. For this purpose, NIDS engineers need to come up with model(s) that can efficiently protect against and detect all known attacks, as well as unknown attacks that have not been previously observed.

In the KitNET model introduced in [2], an unsupervised technique is presented for anomaly-based intrusion detection. Incremental statistical features extracted from the packets are passed through ensembles of autoencoders with a predefined threshold. The model calculates the Root Mean Square (RMS) error to detect anomalous behavior: the higher the calculated RMS at the output, the higher the probability of suspicious activity.

Supervised learning has achieved very decent results with algorithms such as Random Forest, ZeroR, J48, AdaBoost, LogitBoost, and Multilayer Perceptron [3]. Machine/deep learning-based algorithms for NIDS have been extensively studied in the literature. Some models manage imbalanced datasets [4, 5], while others mainly focus on dimensionality reduction techniques implemented using Principal Component Analysis (PCA) [6], autoencoders [5], or sparse autoencoders in conjunction with well-known classifiers such as Random Forest [7]. The mentioned techniques have mostly been applied to the CICIDS2017 dataset [8]. On the other hand, very few evaluations have been carried out on the Kitsune family dataset [2] for NIDS.

C. Contribution

Reliable and effective NIDS depend heavily on accurate and fast detection of attacks. In this regard, creating less complex models while achieving 100% detection is highly desirable [9]. This paper introduces a framework which deploys popular machine learning algorithms such as eXtreme Gradient Boosting (XGBoost) and deep learning models such as Artificial Neural Networks (ANN), implemented solely or along with feature selection and dimensionality reduction techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA), to achieve the lowest complexity possible while maintaining high-performance intrusion detection results. What makes both algorithms extremely powerful and able to achieve very decent results is that they are applied to incremental statistical features. Accordingly, the framework proposed here can learn the patterns more efficiently and detect potentially known and unknown attacks.

We managed to achieve great results by applying this concept. Initially, we applied XGBoost and deep learning on the Kitsune family datasets [2] one by one. We then merged all 9 datasets into one very large dataset with more than 21 million instances, to test the efficiency of detecting various attacks using the same trained model. Then, we considered the extreme case in supervised learning by building a model trained on 8 datasets and keeping one aside (considered as the unknown attack) for testing, to study the effectiveness of the model in detecting unknown attacks via capturing and learning the common patterns of the attacks. This also gives an idea, to some extent, as to how different attacks share similar characteristics. This allowed us to develop a model that can detect known attacks and other unknown attacks using supervised learning techniques rather than unsupervised learning.

Figure 1. Proposed Framework for NIDS: Input Data (Packet) → Incremental Statistical Features → Feature Selection / Dimension Reduction → Classifiers (XGBoost / ANN)
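For concreteness, the flow of Figure 1 can be sketched in a few lines of scikit-learn-style Python. This is an illustrative sketch only: the file names, split ratio, and hyperparameters below are assumptions, not necessarily the exact settings used in our experiments.

```python
# Sketch of the Figure 1 pipeline (file names and hyperparameters are assumptions).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

# The Kitsune CSVs already contain the 115 incremental statistical features.
X = pd.read_csv("mirai_features.csv").values        # hypothetical file names
y = pd.read_csv("mirai_labels.csv").values.ravel()

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Optional dimension-reduction stage, followed by the classifier.
model = Pipeline([
    ("scale", StandardScaler()),       # mainly needed for the ANN path
    ("pca", PCA(n_components=15)),     # or LDA / XGBoost feature selection
    ("clf", XGBClassifier(n_estimators=100, eval_metric="logloss")),
])
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```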

II. METHODOLOGY AND PROCEDURE

A. Proposed Idea

In this paper, we introduce a framework with XGBoost as a classifier applied to the incremental statistical damping features of the intrusion data, and compare its performance with ANN. Different dimensionality reduction techniques are applied to reduce the complexity of the model and are compared to the specific selection of features by XGBoost. The proposed hybrid use of machine learning algorithms with incremental statistical features for NIDS has not been investigated earlier in the literature on the Kitsune family dataset. Figure 1 presents the flow chart of our machine learning model for NIDS.

The rationale behind using the selected machine learning, dimensionality reduction, and/or classifier techniques is noticeable from the details presented hereafter:

1) Principal Component Analysis (PCA) is used to reduce the dimensions of the features by searching for the orthogonal vectors that carry the most important information of the original features [10, 11].

2) Linear Discriminant Analysis (LDA) is a supervised algorithm also used as a dimension reduction technique. It searches for the direction of maximum discriminability in the space [12, 13].

3) XGBoost (eXtreme Gradient Boosting) is a machine learning system using tree boosting algorithms. We implemented XGBoost both for feature selection and as a classifier [14].

4) Artificial Neural Network (ANN) is applied as a binary classifier using supervised training. A neural network, in general, consists of input, hidden, and output layers, and is a very powerful tool for pattern classification [15]. In this work, we applied different structures, starting from deep neural networks with 3 layers and above, and also implemented simple neural networks using only one layer.

B. Kitsune Dataset

The dataset family employed in this paper is cited in [2] and can be accessed publicly. It was created via a real IP camera video surveillance network. Features were extracted by applying incremental statistics to capture the behavior of the data stream while reducing the weight of the past instances [2]. Table I shows a summary of the characteristics of the Kitsune family dataset used for this work, with 9 attack datasets.

TABLE I. CHARACTERISTIC SUMMARY OF KITSUNE FAMILY DATASET

| Attack Name | Attack Type | Dataset Size | # of Features | True Negative Percentage |
|---|---|---|---|---|
| Mirai | Botnet Malware | 764,136 | 115 | 84.08%: Imbalanced |
| SSL Renegotiation | Denial of Service | 2,207,570 | 115 | 4.19%: Heavily Imbalanced |
| SSDP Flood | Denial of Service | 4,077,265 | 115 | 35.31%: Semi-balanced |
| SYN DoS | Denial of Service | 2,771,275 | 115 | 0.25%: Extremely Imbalanced |
| OS_SCAN | Recon | 1,697,850 | 115 | 3.87%: Heavily Imbalanced |
| ARP MitM | Man in the Middle | 2,504,266 | 115 | 45.73%: Semi-balanced |
| Video Injection | Man in the Middle | 2,472,400 | 115 | 4.145%: Heavily Imbalanced |
| Active Wiretap | Man in the Middle | 2,278,688 | 115 | 40.5%: Semi-balanced |
| Fuzzing | Recon | 2,244,138 | 115 | 19.285%: Imbalanced |
| Total | — | 21,017,588 | 115 | 23.08%: Imbalanced |
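The damped incremental statistics behind the 115 features of Table I can be maintained in constant time and memory per packet. The sketch below illustrates the general idea from [2], using the five decay rates reported there; the exact bookkeeping of the Kitsune/AfterImage implementation may differ.

```python
# Sketch of a damped incremental statistic (mean/std that "forget" old packets).
import math

class DampedStat:
    def __init__(self, lam):
        self.lam = lam                     # decay rate: larger = shorter memory
        self.w = self.s1 = self.s2 = 0.0   # damped count, sum, and sum of squares
        self.t_last = None

    def update(self, x, t):
        if self.t_last is not None:        # down-weight everything seen so far
            d = math.pow(2.0, -self.lam * (t - self.t_last))
            self.w, self.s1, self.s2 = self.w * d, self.s1 * d, self.s2 * d
        self.t_last = t
        self.w += 1.0
        self.s1 += x
        self.s2 += x * x

    def mean(self):
        return self.s1 / self.w

    def std(self):
        return math.sqrt(max(self.s2 / self.w - self.mean() ** 2, 0.0))

# One statistic per time window; the five decay rates come from [2].
stats = [DampedStat(lam) for lam in (5, 3, 1, 0.1, 0.01)]
```

Concatenating such statistics over several traffic aggregates and across the five decay windows yields the 115 features used throughout this paper.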
III. RESULTS AND DISCUSSION

A. Experimental Setup

In the proposed NIDS framework, the applied hybrid algorithms are implemented using PCA and LDA as dimension reduction techniques, followed by ANN and XGBoost as classifiers. Feature selection using XGBoost is alternatively used to lower the complexity of the model and achieve fast detection responses with high accuracies.

We have applied and tested more than 250 different models on the 9 different datasets. For illustration, we mostly demonstrate the models that achieved accuracy higher than 99.9%. However, for the merged datasets and the trials on unknown attacks, we demonstrate most of the trials, due to the challenges of experimenting with and detecting unknown attacks through supervised learning.
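Throughout Tables II-V, accuracy (Acc.), True Positive Rate (TPR), Positive Predictive Value (PPV), and F1-score are all derived from the binary confusion matrix. A small helper of the following form (our own illustration) reproduces the reported columns:

```python
# Metrics reported in Tables II-V, computed from a binary confusion matrix.
from sklearn.metrics import confusion_matrix

def nids_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)
    tpr = tp / (tp + fn)                 # recall / detection rate
    ppv = tp / (tp + fp)                 # precision
    f1 = 2 * tpr * ppv / (tpr + ppv)
    return {"Acc.": acc, "TPR": tpr, "PPV": ppv, "F1": f1,
            "TN": tn, "FP": fp, "FN": fn, "TP": tp}
```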

B. Single Known Attacks

XGBoost achieved very decent results in most of the datasets, with an accuracy (Acc.) of 100% on SSDP and Fuzzing (as seen in Tables II and III), even though some of the datasets are extremely imbalanced. For the Mirai dataset, the detection was a bit challenging when using ANN (Table II). This might be because the features are highly correlated, due to the way they are extracted using the incremental damping statistical method, as well as the fact that they are on different scales. Accordingly, for ANN, we standardized the features for all trials; otherwise, the results were very poor. On the other hand, XGBoost can perfectly handle both the highly correlated features and features on different scales, while achieving very high accuracy. This was not the case for ANN, where the highly correlated features could hinder its performance. For ANN, we have tested different structures to improve the model performance and overcome this challenge.

TABLE II. RESULTS OF XGBOOST AND ANN ON DATASETS, PART 1 (1:5)

| Dataset | Algorithm | Acc. | Confusion Matrix | TPR | PPV | F1 Score |
|---|---|---|---|---|---|---|
| Mirai | XGBoost | 99.99% | TN=24309, FP=8, FN=7, TP=128504 | 0.999 | 0.99 | 0.999 |
| Mirai | XGBoost + Features 0:92 | 99.99% | TN=24307, FP=10, FN=8, TP=128503 | 0.999 | 0.99 | 0.999 |
| Mirai | ANN (100, 50, 10, 10) + SC | 95.299% | TN=22621, FP=1696, FN=5489, TP=123022 | 0.957 | 0.98 | 0.97 |
| Active Wiretap | XGBoost | 99.9996% | TN=271214, FP=0, FN=2, TP=184522 | 0.999 | 1 | 0.999 |
| Active Wiretap | ANN (60, 10, 60, 10, 60) + SC | 99.986% | TN=271189, FP=25, FN=40, TP=184484 | 0.999 | 0.99 | 0.999 |
| Active Wiretap | ANN (10) + SC | 99.984% | TN=271155, FP=59, FN=16, TP=184508 | 0.999 | 0.99 | 0.999 |
| Active Wiretap | ANN (50, 10, 10, 5) + SC | 99.987% | TN=271187, FP=27, FN=32, TP=184492 | 0.999 | 0.99 | 0.999 |
| Active Wiretap | ANN (50, 10, 10, 5, 2) | 99.984% | TN=271168, FP=46, FN=25, TP=184499 | 0.999 | 0.99 | 0.999 |
| ARP MitM | XGBoost (all features) | 99.999% | TN=271319, FP=0, FN=3, TP=229531 | 0.999 | 1 | 0.999 |
| ARP MitM | ANN (50, 10, 10, 5) + SC | 99.976% | TN=271258, FP=61, FN=60, TP=229474 | 0.999 | 0.99 | 0.999 |
| ARP MitM | ANN (50, 10, 10, 5, 2) + SC | 99.979% | TN=271290, FP=29, FN=76, TP=229458 | 0.999 | 0.99 | 0.999 |
| OS Scan | XGBoost (all features) | 99.999% | TN=326481, FP=2, FN=2, TP=13085 | 0.999 | 0.99 | 0.999 |
| OS Scan | ANN (10) + SC | 99.996% | TN=326482, FP=1, FN=11, TP=13076 | 0.999 | 0.99 | 0.999 |
| OS Scan | ANN (50, 10, 10, 5, 2) + SC | 99.997% | TN=326481, FP=2, FN=8, TP=13079 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | XGBoost (all features) | 99.9995% | TN=423116, FP=1, FN=1, TP=18396 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | XGBoost (features 23:46) | 99.9995% | TN=423116, FP=1, FN=1, TP=18396 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | XGBoost (features 11:34) | 99.9995% | TN=423116, FP=1, FN=1, TP=18396 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | XGBoost + PCA=15 | 99.99% | TN=423094, FP=23, FN=20, TP=18377 | 0.998 | 0.99 | 0.999 |
| SSL Renegotiation | XGBoost (features 4, 3, 0, 8, 6, 28: most important) | 99.995% | TN=423099, FP=18, FN=3, TP=18394 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | ANN (100, 50, 10, 10) + SC | 99.99% | TN=423104, FP=13, FN=20, TP=18377 | 0.999 | 0.99 | 0.999 |
| SSL Renegotiation | ANN (50, 10, 10) + SC | 99.99% | TN=423091, FP=26, FN=9, TP=18388 | 0.999 | 0.99 | 0.999 |

An interesting observation noted with XGBoost is that using only the first 23 features (the most recent window only) produces very decent results, with 100% accuracy on the SSDP dataset (Table III). We also tried XGBoost with the second and third damped windows (features 23 to 46 and 46 to 69, separately), and still achieved very promising results, allowing for early detection with less dimensionality, without the need to use all 5 windows as mentioned in the Kitsune paper [2]. Nevertheless, we encourage building a model to detect different types of attacks (not only one specific attack) and suggest using all 115 features. Therefore, while creating a model for detecting unknown attacks, we preferred to consider all 115 features, in order not to lose any important information that might help our model recognize the common patterns of different attacks.

The results we achieved through our proposed models encouraged us to take it to the next level and apply dimensionality reduction to the point where only one dimension is considered (only one feature is selected), while still achieving very good results. Using LDA [16] as a dimensionality reduction technique hugely reduces the dimensionality space of the system to a single dimension, while achieving very decent results with XGBoost on SSDP. As can be seen from Table III, 99.9998% accuracy is achieved with very negligible false positives and false negatives. Another interesting observation is that by testing the model using only the most important features, we could still achieve very high accuracy and F-score measures. This implies that the most important features selected by XGBoost carry a lot of information on the packets, allowing the algorithm to learn the pattern of the attacks (see Tables II and III).

We also observed that the importance of features with XGBoost is different when applied before and after StandardScaler (standardization). For XGBoost, there is no real need to standardize the features, so applying most-important-feature selection without standardization is preferable. Thus, we recommend selecting the most important features after standardization only if they are going to be used in a model like ANN, to preserve the compatibility of the algorithm. Another point we noted was that, for the Mirai dataset, features from different windows might carry more information than features from a single window. In one of our experiments, although we selected the features randomly from different windows, we were able to achieve better results than applying the 23 features from the same window. One other observation is that the 4th damped window has the best results and the 5th has the worst. So, we tested the model again excluding the 5th window for Mirai and achieved a high accuracy of 99.988% (Table II).

We have also noticed that for SSDP, the first window (first 23 features) achieves 100% accuracy without the need for the rest of the windows (Table III). We assume that this extra information would only add to the complexity of the model and could be ignored in case we are aiming to build a model for a specific attack. We also noticed that the oldest window (the 5th), which has the least damped weight, is expected to carry the least information. However, our trials show that it achieves slightly better results than the second and third windows. Yet, they all achieve very decent performance, close to 100%.

In most cases, selecting the most important features via XGBoost gives us better results than applying PCA. So, we excluded from the tables the results of models using PCA that are less than 99.9%.
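The window-slicing and most-important-feature experiments described above can be reproduced along the following lines. The 23-features-per-window layout follows the dataset description; the estimator settings and helper names are our assumptions.

```python
# Sketch: train XGBoost on a single damped window, or select its top-k features.
import numpy as np
from xgboost import XGBClassifier

# X is assumed to be an (n, 115) array laid out as five 23-feature windows.
WINDOWS = {1: slice(0, 23), 2: slice(23, 46), 3: slice(46, 69),
           4: slice(69, 92), 5: slice(92, 115)}

def fit_on_window(X_tr, y_tr, w):
    clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
    clf.fit(X_tr[:, WINDOWS[w]], y_tr)
    return clf

def top_k_features(X_tr, y_tr, k=11):
    # No standardization here: XGBoost handles raw scales, and the importance
    # ranking shifts if a scaler is applied first (as noted above).
    clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
    clf.fit(X_tr, y_tr)
    return np.argsort(clf.feature_importances_)[::-1][:k]
```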

TABLE III. RESULTS OF XGBOOST AND ANN ON DATASETS, PART 2 (5:9)

| Dataset | Algorithm | Acc. | Confusion Matrix | TPR | PPV | F1 Score |
|---|---|---|---|---|---|---|
| SSDP Flood | XGBoost (all features) | 100% | TN=527785, FP=0, FN=0, TP=287668 | 1 | 1 | 1 |
| SSDP Flood | XGBoost (first 10 features) | 99.9997% | TN=527784, FP=1, FN=1, TP=287667 | 0.99 | 0.99 | 0.999 |
| SSDP Flood | XGBoost (features 0:23) | 100% | TN=527785, FP=0, FN=0, TP=287668 | 1 | 1 | 1 |
| SSDP Flood | XGBoost (features 5, 1, 8, 58, 2, 0: most important) | 99.9998% | TN=527784, FP=1, FN=0, TP=287668 | 1 | 0.99 | 0.999 |
| SSDP Flood | XGBoost + LDA | 99.9998% | TN=527783, FP=2, FN=0, TP=287668 | 1 | 0.99 | 1 |
| SSDP Flood | ANN (10) + SC | 100% | TN=527785, FP=0, FN=0, TP=287668 | 1 | 1 | 1 |
| SYN DoS | XGBoost (all features) | 99.9996% | TN=552862, FP=0, FN=2, TP=1391 | 0.99 | 1 | 0.999 |
| SYN DoS | XGBoost + PCA=15 and Normalization | 99.991% | TN=552852, FP=10, FN=38, TP=1355 | 0.97 | 0.99 | 0.98 |
| SYN DoS | ANN (60, 10, 60, 10, 60) + SC | 99.9000% | TN=552503, FP=359, FN=193, TP=1200 | 0.86 | 0.77 | 0.81 |
| SYN DoS | ANN (50, 10, 10, 5) + SC, 1st trial | 99.9350% | TN=552676, FP=186, FN=177, TP=1216 | 0.873 | 0.867 | 0.87 |
| Video Injection | XGBoost (all features) | 99.9990% | TN=473974, FP=0, FN=3, TP=20503 | 0.99 | 1 | 0.999 |
| Video Injection | ANN (10) + SC | 99.9980% | TN=473966, FP=8, FN=3, TP=20503 | 0.99 | 1 | 1 |
| Video Injection | ANN (100, 50, 25, 12, 6, 3, 1) | 99.9950% | TN=473955, FP=19, FN=4, TP=20502 | 0.99 | 0.99 | 0.999 |
| Video Injection | ANN (50, 10, 10, 5, 2) + SC | 99.9960% | TN=473974, FP=0, FN=20, TP=20486 | 0.999 | 1 | 1 |
| Fuzzing | XGBoost (all features) | 100% | TN=362022, FP=0, FN=0, TP=86806 | 1 | 1 | 1 |
| Fuzzing | ANN (60, 10, 60, 10, 60) + SC | 99.9820% | TN=361955, FP=67, FN=16, TP=86790 | 0.999 | 0.999 | 0.999 |
| Fuzzing | ANN (10) + SC | 99.7090% | TN=360723, FP=1299, FN=9, TP=86797 | 0.99 | 0.98 | 0.992 |

Figure 2. PCA components importance score in Mirai using XGBoost

The only exception here is the SYN attack, as PCA with 15 components achieves an accuracy of 99.991%, a recall or True Positive Rate (TPR) of 0.97, and a precision or Positive Predictive Value (PPV) of 0.99. We also found that when applying feature importance on PCA components using XGBoost, the features' importance weights do not have the linear correlation one would expect with the PCA rankings (Figure 2). For example, the last PCA component, which should carry the least information from the principal component point of view, is ranked as the 3rd most important feature by XGBoost, with a feature-information weight of 381, while the 4th PCA component is ranked as the least important feature, with a feature-information weight of 204, as shown in Figure 2. For Mirai, applying the most important feature (79) does not yield better results compared to features selected randomly, like feature zero, which has a very low information weight of 10. However, combining the 10 most important features achieves very decent results. In general, it is recommended to apply standardization before PCA, but this was not the case for the SYN DoS and SSL Renegotiation datasets. The best results achieved by PCA are seen on SSL Renegotiation with normalization or standardization of the features (yielding 99.99% accuracy according to Table II). For SYN DoS, normalized (rather than standardized) features followed by PCA and XGBoost yielded better results, with an accuracy of 99.99% (Table III).

Both the XGBoost and ANN models achieve very promising performances. For datasets other than SYN DoS, we recommend feature selection by XGBoost rather than using PCA. For the SSL and SSDP datasets, the 6 most important features have better performance compared to PCA-15 with XGBoost. In general, our experimental results reveal that XGBoost does a better job than ANN in binary classification for NIDS and achieves 100% accuracy in most cases.

For the Mirai and SSL Renegotiation datasets, we used the validation dataset for testing and compared it with the results obtained from the separated validation and test datasets, where we obtained the same results. This could be interpreted by the fact that XGBoost had not memorized the validation dataset while training the model. The same also applies to SSL.
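The experiment behind Figure 2 (ranking PCA components by their XGBoost importance) can be sketched as follows, reusing X_tr and y_tr from the earlier sketch. The normalization choice and component count mirror the discussion above; the remaining settings are assumptions.

```python
# Sketch of the Figure 2 experiment: PCA components ranked by XGBoost importance.
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from xgboost import XGBClassifier

# Normalization (rather than standardization) worked better for SYN DoS here.
Z = MinMaxScaler().fit_transform(X_tr)
comps = PCA(n_components=15).fit_transform(Z)

clf = XGBClassifier(n_estimators=100, eval_metric="logloss").fit(comps, y_tr)
for i, w in sorted(enumerate(clf.feature_importances_), key=lambda t: -t[1]):
    print(f"PCA component {i}: importance {w:.3f}")
# The ranking need not follow the explained-variance order of the components.
```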

As for the structure of the ANN, determining the number of hidden layers and neurons was challenging. For OS_Scan, configuring the ANN with 3 hidden layers achieved the worst accuracy, but with 1 or 5 hidden layers we achieved very decent results. We assumed that the network was suffering from a high correlation among the features. So, we tried 4 hidden layers of the same size (10, 10, 10, 10), and this surprisingly achieved much better results, supporting our assumption. We then also tried the (50, 10, 10, 5, 2) structure on OS_Scan, yielding an accuracy of 99.997% with a TPR of 0.999 and PPV of 0.9998, which beats (50, 10, 10), which resulted in a PPV of 0.498, and (100, 50, 10, 10), which resulted in a TPR of 0.

The aim here is to allow the network to learn how to extract new features that have less correlation with one another. What encourages us to try many structures with this intuition is the great results achieved by XGBoost. Such results could also be achieved by ANN if it can overcome the highly-correlated-features issue. We initially assumed that the model was underfitting, since the network could not learn enough from the features, especially since the training error and validation error were too close to each other. Then, we realized that what hinders the network performance is the high correlation of the features, and what can help improve the network efficiency is to learn how to decode these correlations by creating a special structure. Adding more cells and making the network more complex did not help; for the Mirai dataset, we tried a hidden layer of 2000 neurons, which did not produce much better results compared to 10 cells.
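The ANN structures quoted throughout, e.g., "ANN (50, 10, 10, 5, 2) + SC", denote hidden-layer widths preceded by a StandardScaler (SC). A minimal Keras sketch follows; the activations, optimizer, epochs, and batch size are our assumptions, as only the layer widths are fixed by the experiments above.

```python
# Sketch of an "ANN (50, 10, 10, 5, 2) + SC" binary classifier
# (activations, optimizer, and training budget are assumptions).
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

def build_ann(hidden=(50, 10, 10, 5, 2), n_features=115):
    model = keras.Sequential([keras.Input(shape=(n_features,))])
    for width in hidden:
        model.add(keras.layers.Dense(width, activation="relu"))
    model.add(keras.layers.Dense(1, activation="sigmoid"))   # benign vs. attack
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

scaler = StandardScaler().fit(X_tr)     # "SC": standardization, needed for ANN
ann = build_ann()
ann.fit(scaler.transform(X_tr), y_tr, epochs=5, batch_size=1024, verbose=0)
```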

C. Multiple Known Attacks

To detect multiple known attacks (from multiple datasets), we moved to the next set of experiments in the framework and merged all the Kitsune family datasets (9 datasets) into one dataset. This has been very challenging, as the merged dataset has more than 21 million instances and is semi-imbalanced. Nonetheless, we were still able to achieve decent results, and almost all the attacks have been detected. With the 9 merged datasets, our proposed framework using XGBoost achieves very decent results and beats ANN. When considering all 115 features, we trained on only 30% of the data, due to the limitations of our CPU power and memory. Even so, with our proposed model, we managed to achieve a very high accuracy of 99.997% with a TPR of 0.9999 and PPV of 0.99996 (Table IV). When training using the eleven most important features with XGBoost, we also achieve 99.68% accuracy (Table IV). We anticipate achieving almost 100% if we can train on 70% and test on the remaining 30%. On the other hand, the highest accuracy achieved by ANN is 99.16%, with a TPR of 0.979 and PPV of 0.9843.

TABLE IV. RESULTS OF XGBOOST AND ANN ON ALL DATASETS

| Algorithm | Acc. | Confusion Matrix | TPR | PPV | F1 Score |
|---|---|---|---|---|---|
| XGBoost (11 most important features), train 80% / test 20% | 99.685% | TN=3221822, FP=11510, FN=1747, TP=968437 | 0.998 | 0.98 | 0.993 |
| XGBoost, train 30% / test 70% | 99.997% | TN=11317195, FP=150, FN=271, TP=3394690 | 0.9999 | 0.99 | 0.9999 |
| ANN (10), 11 most important features, train 80% / test 20% | 94.418% | TN=3141141, FP=92191, FN=142447, TP=827737 | 0.8532 | 0.8998 | 0.876 |
| ANN (50, 10, 10, 5, 2), 11 most important features, train 80% / test 20% | 98.96% | TN=3221840, FP=11492, FN=32228, TP=937956 | 0.9668 | 0.98 | 0.9772 |
| ANN (50, 10, 10), 11 most important features, train 80% / test 20% | 98.857% | TN=3206151, FP=27181, FN=20746, TP=949438 | 0.9784 | 0.97 | 0.9751 |
| ANN (10), train 30% / test 70% | 97.304% | TN=11118436, FP=198909, FN=197792, TP=3197169 | 0.9417 | 0.94 | 0.9416 |
| ANN (50, 10, 10, 5, 2), train 30% / test 70% | 99.128% | TN=11256993, FP=60352, FN=67994, TP=3326967 | 0.9800 | 0.98 | 0.9811 |
| ANN (50, 10, 10, 5), train 30% / test 70% | 98.873% | TN=11229228, FP=88117, FN=77619, TP=3317342 | 0.9771 | 0.97 | 0.9756 |
| ANN (50, 10, 10), train 30% / test 70% | 99.166% | TN=11264471, FP=52874, FN=69754, TP=3325207 | 0.9795 | 0.98 | 0.9819 |

D. Unknown Attacks

According to the very promising results of our proposed model on known attacks, we took our framework one step further and designed a model that could recognize the patterns of unknown attacks using supervised learning, rather than the common practice of unsupervised learning. By training the model on the known attacks, we trained it on 8 datasets out of nine; the 9th dataset was kept for testing the model. In other words, we set aside each dataset in turn as an unknown attack for testing, while the remaining 8 datasets were used for training. We achieved promising results. To ensure consistency of the results and that they carry meaningful insights, the experiments were run 18 times: once for each of the 9 different attacks with each of the two proposed models, XGBoost and ANN, as illustrated in Table V.

TABLE V. RESULTS OF XGBOOST AND ANN ON UNKNOWN ATTACKS (TRAINED ON ALL DATASETS EXCEPT ONE)

| Algorithm | Acc. | TPR | PPV | F1 Score |
|---|---|---|---|---|
| XGBoost - Mirai | 13.847% | 0.020 | 0.310 | 0.038 |
| XGBoost - SSL Renegotiation | 95.823% | 0.005 | 0.942 | 0.010 |
| XGBoost - SSDP Flood | 64.711% | 0.001 | 0.851 | 0.001 |
| XGBoost - SYN DoS | 99.790% | 0.177 | 0.989 | 0.300 |
| XGBoost - OS_SCAN | 94.222% | 0.998 | 0.401 | 0.572 |
| XGBoost - ARP MitM | 80.466% | 0.573 | 0.99961 | 0.729 |
| XGBoost - Video Injection | 63.109% | 0.660 | 0.072 | 0.129 |
| XGBoost - Active Wiretap | 85.568% | 0.676 | 0.955 | 0.791 |
| XGBoost - Fuzzing | 62.870% | 0.981 | 0.340 | 0.505 |
| ANN (10) - Mirai | 76.231% | 0.775 | 0.931 | 0.846 |
| ANN (10) - SSL Renegotiation | 97.936% | 0.603 | 0.864 | 0.710 |
| ANN (10) - SSDP Flood | 65.014% | 0.016 | 0.701 | 0.031 |
| ANN (10) - SYN DoS | 99.318% | 0.117 | 0.061 | 0.080 |
| ANN (10) - OS_SCAN | 93.571% | 0.980 | 0.374 | 0.541 |
| ANN (10) - ARP MitM | 55.108% | 0.019 | 0.960 | 0.038 |
| ANN (10) - Video Injection | 95.788% | 0.005 | 0.195 | 0.010 |
| ANN (10) - Active Wiretap | 61.976% | 0.112 | 0.688 | 0.193 |
| ANN (10) - Fuzzing | 48.206% | 0.981 | 0.269 | 0.422 |
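The leave-one-attack-out protocol of Table V is straightforward to express in code. In the sketch below, load_attack() is a hypothetical helper returning the feature matrix and labels of one Kitsune dataset, and nids_metrics() is the helper sketched in Section III-A.

```python
# Sketch of the leave-one-attack-out protocol behind Table V.
import numpy as np
from xgboost import XGBClassifier

ATTACKS = ["Mirai", "SSL Renegotiation", "SSDP Flood", "SYN DoS", "OS_SCAN",
           "ARP MitM", "Video Injection", "Active Wiretap", "Fuzzing"]

for held_out in ATTACKS:
    parts = [load_attack(a) for a in ATTACKS if a != held_out]  # 8 known attacks
    X_tr = np.vstack([X for X, _ in parts])
    y_tr = np.concatenate([y for _, y in parts])
    X_te, y_te = load_attack(held_out)       # treated as the "unknown" attack

    clf = XGBClassifier(n_estimators=100, eval_metric="logloss")
    clf.fit(X_tr, y_tr)
    print(held_out, nids_metrics(y_te, clf.predict(X_te)))
```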

Astonishingly, ANN did the better job here, achieving the highest F1-score on Mirai, not XGBoost as was the case for known attacks. Interestingly, XGBoost achieved TPRs of 0.998 and 0.981 on the OS_Scan and Fuzzing datasets, respectively. This gives us hope that the algorithm could detect the unknown attacks and not just give a random guess. This holds under the assumption that benign packets share the same pattern and characteristics, and anomalous behavior can be determined for values above a predetermined threshold. This means that the intrusions also have something in common that could be captured by the model.

The training was done on a limited percentage of the dataset (only 30% of the training set, due to the limitations of memory size and CPU power), and this is while some datasets (e.g., SYN DoS) are extremely imbalanced. When training on all the datasets except one kept for testing (the unknown attack), ANN beats XGBoost significantly, with F1-scores of 0.846 and 0.71 on Mirai and SSL Renegotiation, respectively (Table V). On the other hand, XGBoost achieved very low F1-scores of 0.038 and 0.010 on Mirai and SSL Renegotiation, respectively. This could be because XGBoost cannot extrapolate to feature values outside the range it has been trained on, whereas ANN can anticipate new ranges of features that exist in the unknown attacks and can yield a better prediction. In other words, ANN can learn the common pattern of different attacks better under supervised learning when trained on a large-scale dataset. However, XGBoost does much better on the OS_Scan, ARP MitM, Active Wiretap, and Fuzzing datasets. We achieved a TPR of 0.9983 for XGBoost on OS_Scan. This gives us hope that we could reach a supervised model that can detect unknown attacks.

One more observation related to the pattern of attacks: if we assume that the intrusions have similar patterns that can be detected from the incremental damping statistics extracted earlier, then we could create a tremendous number of fake attacks that carry the same pattern and similar characteristics. We can then train our model on fake attacks along with real ones. In this way, a robust model can be created that detects known and unknown attacks, even when built fundamentally on a supervised learning model. This supports the belief behind the Kitsune ensemble model that an unsupervised system can learn the healthy pattern of attack-free packets, as they share similar features, which in turn supports the observation that intrusions share similar features.

IV. CONCLUSION AND FUTURE DIRECTIONS

Applying XGBoost and ANN along with feature reduction/selection for NIDS achieves very promising and reliable results. In the future, we plan to build on top of our prior work [17], in addition to the framework introduced in this paper, for devising effective and practical network intrusion detection systems. Moreover, Generative Adversarial Networks (GAN) and Variational Autoencoders (VAE) could be used to create more attacks on which the predictive models (ANN and XGBoost) can be trained in a supervised manner, to increase the efficiency of detecting unknown attacks. This aligns with our conclusion from this work that intrusions share similar characteristics that can be captured through supervised learning from known attacks.

REFERENCES

[1] F. Anjum, D. Subhadrabandhu, and S. Sarkar, "Signature based intrusion detection for wireless ad-hoc networks: A comparative study of various routing protocols," in 2003 IEEE 58th Vehicular Technology Conference (VTC 2003-Fall), 2003, vol. 3, pp. 2152-2156.
[2] Y. Mirsky, T. Doitshman, Y. Elovici, and A. Shabtai, "Kitsune: An ensemble of autoencoders for online network intrusion detection," arXiv preprint arXiv:1802.09089, 2018.
[3] R. Abdulhammed, M. Faezipour, A. Abuzneid, and A. Alessa, "Effective features selection and machine learning classifiers for improved wireless intrusion detection," in 2018 International Symposium on Networks, Computers and Communications (ISNCC), 2018, pp. 1-6.
[4] R. Abdulhammed, M. Faezipour, A. Abuzneid, and A. AbuMallouh, "Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic," IEEE Sensors Letters, vol. 3, no. 1, pp. 1-4, 2018.
[5] R. Abdulhammed, H. Musafer, A. Alessa, M. Faezipour, and A. Abuzneid, "Features dimensionality reduction approaches for machine learning based network intrusion detection," Electronics, vol. 8, no. 3, p. 322, 2019.
[6] R. Abdulhammed, M. Faezipour, H. Musafer, and A. Abuzneid, "Efficient network intrusion detection using PCA-based dimensionality reduction of features," in 2019 International Symposium on Networks, Computers and Communications (ISNCC), 2019, pp. 1-6.
[7] H. Musafer, A. Abuzneid, M. Faezipour, and A. Mahmood, "An enhanced design of sparse autoencoder for latent features extraction based on trigonometric simplexes for network intrusion detection systems," Electronics, vol. 9, no. 2, p. 259, 2020.
[8] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in ICISSP, 2018, pp. 108-116.
[9] P. Garcia-Teodoro, J. Diaz-Verdejo, G. Maciá-Fernández, and E. Vázquez, "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, vol. 28, no. 1-2, pp. 18-28, 2009.
[10] M. Abuzneid and A. Mahmood, "Performance improvement for 2-D face recognition using multi-classifier and BPN," in 2016 IEEE Long Island Systems, Applications and Technology Conference (LISAT), 2016, pp. 1-7.
[11] J. Shlens, "A tutorial on principal component analysis," arXiv preprint arXiv:1404.1100, 2014.
[12] S. Balakrishnama and A. Ganapathiraju, "Linear discriminant analysis - a brief tutorial," Institute for Signal and Information Processing, vol. 18, pp. 1-8, 1998.
[13] S. J. Prince and J. H. Elder, "Probabilistic linear discriminant analysis for inferences about identity," in 2007 IEEE 11th International Conference on Computer Vision, 2007, pp. 1-8.
[14] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785-794.
[15] Z. Zhang, J. Li, C. Manikopoulos, J. Jorgenson, and J. Ucles, "HIDE: A hierarchical network intrusion detection system using statistical preprocessing and neural network classification," in Proc. IEEE Workshop on Information Assurance and Security, 2001, pp. 85-90.
[16] E. Alexandre-Cortizo, M. Rosa-Zurera, and F. Lopez-Ferreras, "Application of Fisher linear discriminant analysis to speech/music classification," in EUROCON 2005 - The International Conference on "Computer as a Tool", 2005, vol. 2, pp. 1666-1669.
[17] A. Attia, M. Faezipour, and A. Abuzneid, "Comparative study of hybrid machine learning algorithms for network intrusion detection," in Advances in Security, Networks, and Internet of Things, Springer, July 2020.
