Enhanced Intrusion Detection System Based on Bat Algorithm-support Vector Machine

Adriana-Cristina Enache and Valentin Sgârciu
Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, Bucharest, Romania

Keywords: Intrusion Detection, SVM, Bat Algorithm, Binary Bat Algorithm, Lévy Flights.

Abstract: As new security intrusions arise, so does the demand for viable intrusion detection systems. These solutions must deal with huge data volumes and high-speed network traffic, and must counter new and varied types of security threats. In this paper we combine existing technologies to construct an anomaly-based Intrusion Detection System. Our approach improves the Support Vector Machine classifier by exploiting the advantages of a recent swarm intelligence algorithm inspired by the echolocation of microbats (Bat Algorithm). The main contribution of our paper is a novel feature selection model based on the Binary Bat Algorithm with Lévy flights. To test our model we use the NSL-KDD data set and show empirically that Lévy flights can improve the exploration of the standard Binary Bat Algorithm. Furthermore, our approach succeeds in enhancing the default SVM classifier and we obtain good performance measures in terms of accuracy (90.06%), attack detection rate (95.05%) and false alarm rate (4.4%) for unknown attacks.

Enache A. and Sgârciu V. Enhanced Intrusion Detection System Based on Bat Algorithm-support Vector Machine. DOI: 10.5220/0005015501840189. In Proceedings of the 11th International Conference on Security and Cryptography (SECRYPT-2014), pages 184-189. ISBN: 978-989-758-045-1. Copyright © 2014 SCITEPRESS (Science and Technology Publications, Lda.)

1 INTRODUCTION

Many of our activities involve using the Internet (online payments, internet banking, social networks or searching for information) and almost all government or private organizations store critical data over networks. This increasing usage and the growing speed of network connections have led to the proliferation of various threats. In this context security systems have become vital components. Despite recent advances, security incidents are on the rise. IDS have become an indispensable component of almost every security infrastructure, mainly because they provide a wall of defense and resist external attacks effectively where other traditional security systems cannot perform well.

Intrusion Detection Systems (IDS) monitor the activities and events occurring in a system and decide whether these are intrusive actions or normal usage of the system. In general, IDS are classified on the basis of their data analysis technique as misuse detection or anomaly detection. The misuse method is very accurate in detecting known attacks based on their signatures, which are stored in a database. The anomaly detection approach automatically constructs a profile of the normal behavior of the system. This latter method can detect new attacks, but it can also generate many false alarms (Kukielka and Kotulski, 2014).

Since 1980, when Anderson introduced the first IDS model, multiple techniques have been proposed to improve these systems. Many researchers have focused their attention on anomaly-based IDS and designed several approaches based on machine learning algorithms (decision trees, neural networks or genetic algorithms). There are many challenges to consider when implementing an IDS, such as offering real-time responses with a high attack detection rate and a low false alarm rate. Also, the large number of features and the difficulty of recognizing the complex relationships between them make classification a difficult task. In order to address these issues, we propose a Network Anomaly Intrusion Detection Model based on Support Vector Machines (SVM) with Swarm Intelligence (SI). Our approach has two main components: feature selection and an enhanced SVM classifier.

Support Vector Machine (SVM) has many advantages that make it a suitable solution for intrusion detection, such as good generalization performance and learning ability on high-dimensional or noisy datasets. Furthermore, SVM does not suffer from local minima and it executes quickly. However, a drawback of this classifier is that its performance depends on the selection of the right parameters.

Recently, SI algorithms have attracted great interest, mainly because they are simple, flexible (they can be applied to a variety of problems such as optimization, data mining and so forth) and robust (the algorithm will function even if some individuals fail to perform their tasks). Currently, there is a variety of SI algorithms, such as ant colony optimization, particle swarm optimization (PSO), the artificial bee colony algorithm, the firefly algorithm, cuckoo search or the bat algorithm. These algorithms have been combined with SVM to construct improved IDS models and in most cases are used for two optimization processes: feature selection and the selection of SVM parameters. Wang et al. (Wang et al., 2009) present an IDS based on PSO-SVM. They used Binary PSO to determine the best feature subset and Standard PSO to search for optimal SVM parameters. In 2010, a novel Artificial Bee Colony (ABC)-SVM approach was proposed (Wang et al., 2010) and experiments on the KDD-Cup99 dataset showed that ABC-SVM can obtain a better accuracy rate than PSO-SVM or GA-SVM. Pu et al. (Pu et al., 2012) improve SVM parameters with the Ant Colony Algorithm by defining the input parameters as the ant's position.

In this paper we use a relatively new metaheuristic to improve the SVM classifier, the Bat Algorithm (BA), which shows promising results (Yang, 2010a), (Yang and He, 2013). Furthermore, we introduce an innovative feature selection model that combines the Binary Bat Algorithm (BBA) with Lévy flights and empirically demonstrate that it can outperform the standard BBA when coupled with SVM.

The rest of this paper is organized as follows: first we introduce the algorithms that will be used to construct our IDS model. Next, we describe our approach, define the performance measures and show our test results. We also compare our model with other methods and indicate that it can obtain better results. Finally, we present the conclusions and future work.

2 BACKGROUND OVERVIEW

2.1 Support Vector Machine

Support Vector Machine (SVM) is a binary supervised learning algorithm that searches for the optimum hyperplane to separate the two classes. This linear classifier applies the structural risk minimization principle of statistical learning theory by selecting a number of parameters based on the requirement of the margin that separates the data points (Dua and Du, 2011b). The hyperplane is defined as $f(x) = w \cdot x + b$, where $w$ is the weight vector and $b$ is the bias. Our classifier defines a hyperplane and linearly separates the two classes: anomaly and normal traffic. The points closest to the hyperplane are called support vectors and the distance between them is called the margin.

On the other hand, the dataset is not always linearly separable. This issue is solved by introducing a slack variable and defining a soft margin. Furthermore, using kernel functions, we can transform the nonlinear SVM into a linear problem by mapping the dataset into a higher-dimensional feature space. For our model we will use SVM with a radial basis function (RBF), where the kernel function is:

$$K(x_i, x) = \exp\left(-\frac{1}{2\sigma^2}\lVert x - x_i\rVert^2\right) \qquad (1)$$

The RBF kernel function is a good solution because it has fewer controllable parameters and an excellent nonlinear forecasting performance. In the following we define the SVM controllable parameters and explain their influence.

• The Regularization Parameter (C) controls the "flexibility" of the hyperplane's margins. In other words, a smaller C allows softer margins and thus permits greater errors. A larger C produces a more accurate model with harder margins, but the generalization performance of the classifier is worse.

• The Kernel Parameter (σ) is the constant from the kernel function. This parameter reflects the correlation among the support vectors that define the hyperplane and may cause overfitting or underfitting of the classifier.

2.2 Bat Algorithm

The Bat Algorithm (BA) was developed by Yang in 2010 (Yang, 2010a) and was inspired by the echolocation of bats. These microbats emit a loud sound pulse and change their pulse rate as the obstacle or prey gets closer. In order to define a smart bat algorithm, three idealized rules have been formulated:

• All bats use echolocation to sense distance and to tell the difference between an obstacle and prey.

• Bats fly randomly and their movement is defined by their position in space ($x_i$) and velocity ($v_i$). These parameters are computed based on a varying wavelength ($\lambda$), frequency ($freq_{min}$) and loudness ($A_0$) to search for prey. Moreover, bats can modify the frequency of their emitted pulses and the rate of pulse emission ($r \in [0,1]$), depending on the closeness of their target.

• The loudness can vary in multiple ways, but we assume that it varies from a large value ($A_0$) to a minimum constant value ($A_{min}$).

BA is a swarm intelligence algorithm which performs searches using a population of agents. For SVM parameter selection, BA will search for the best C and σ based on the accuracy of SVM.
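Because BA searches over C and σ, it can help to see how σ enters the kernel of equation (1). The following is a minimal sketch, not the authors' Weka-based implementation; the function name `rbf_kernel` and the toy vectors are illustrative assumptions.

```python
import math

def rbf_kernel(x, x_i, sigma):
    """Equation (1): K(x_i, x) = exp(-||x - x_i||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Two toy 2-feature records (hypothetical values, not taken from NSL-KDD).
a, b = [0.0, 0.0], [3.0, 4.0]            # squared distance is 25
narrow = rbf_kernel(a, b, sigma=0.5)     # small sigma: similarity decays quickly
wide = rbf_kernel(a, b, sigma=5.0)       # large sigma: distant points still look similar
```

A small σ makes the kernel value drop to almost zero for distant points, which tends toward overfitting, while a large σ flattens the kernel, which tends toward underfitting. Many SVM libraries expose γ = 1/(2σ²) instead of σ, so a σ found by a search may need converting before it is passed to such a library.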


Each agent $i$ has a current position $x_i^t = (x_{i,1}, x_{i,2}, \ldots, x_{i,d})$ and a current flying velocity $v_i^t = (v_{i,1}, v_{i,2}, \ldots, v_{i,d})$, where $d$ is the problem dimension. To find the optimal position, each agent (or bat) updates its position and velocity according to the following equations:

$$freq_i = freq_{min} + (freq_{max} - freq_{min})\,\beta \qquad (2)$$

$$v_{i,j}^{t} = v_{i,j}^{t-1} + \left(x_{i,j}^{t-1} - x_{best,j}\right) freq_i \qquad (3)$$

$$x_{i,j}^{t} = x_{i,j}^{t-1} + v_{i,j}^{t} \qquad (4)$$

where $\beta \in [0,1]$ is a random vector drawn from a uniform distribution. As stated earlier, the bat will decrease its loudness ($A_i$) and increase its pulse emission rate ($r_i$) when it is closer to the target:

$$A_i^{t+1} = \alpha A_i^{t} \qquad (5)$$

$$r_i^{t+1} = r_i^{0}\left[1 - e^{-\gamma t}\right] \qquad (6)$$

where $\alpha$ ($0 < \alpha < 1$) and $\gamma$ ($\gamma > 0$) are constants. At each iteration, the fitness value is improved, while $A \to 0$ and $r \to r^0$. In order to improve the variability of the discovered solutions, Yang uses random walks to generate new solutions:

$$x_{new} = x_{old} + \delta A_t^{*} \qquad (7)$$

where $\delta \in [-1,1]$ is a random number and $A_t^{*}$ is the average loudness of all bats at iteration $t$.

In some ways BA is similar to the popular PSO. The solution is denoted by the position of the particle, and each individual from the swarm has its own position and velocity, which are updated according to its fitness value. There are also some significant differences, as BA uses random walks for exploration (or diversification) and varies the loudness and pulse rate in order to exploit the solution. For PSO, exploitation is controlled by the use of the global best and individual best solutions, while exploration is done using two learning parameters (Yang, 2012).

2.3 Feature Selection

Feature selection methods can be divided into scalar methods (which are less complex and select features individually) and vector methods (which select a subset of features based on a mutual relation between features) (Dua and Du, 2011a). In the following we introduce the Binary Bat Algorithm and explain how it can be adapted to construct a vector-approach feature selection.

2.3.1 Binary Bat Algorithm

The Binary Bat Algorithm (BBA) (Mirjalili et al., 2013) is a modified version of BA that describes the bat's motion in a d-dimensional binary space. Therefore, the position of a bat is defined as a vector of binary coordinates and the bat can move across the corners of a hypercube. Given bat $i$, its coordinates are computed using a sigmoid function as follows:

$$x_{i,j} = \begin{cases} 1 & \text{if } S(v_{i,j}) > \delta \\ 0 & \text{otherwise} \end{cases} \qquad (8)$$

where the sigmoid function is:

$$S(v_{i,j}) = \frac{1}{1 + e^{-v_{i,j}}} \qquad (9)$$

and $\delta \in U(0,1)$. Hence, the bat's position can be seen as a string of binary numbers. For the feature selection process we want to determine the best feature subset, the one that will enhance the performance of our classifier. In order to adapt BBA for feature selection we can consider the bat's position as the subset of features and the bat's coordinates as the presence (if the coordinate is one) or absence (if the coordinate is zero) of a feature. The fitness function for BBA will be the accuracy of SVM after it has been trained with the subset of features represented by the bat's position.

In order to improve the exploration of BBA we use Lévy flights for randomization. Lévy flights are a random walk whose step length is drawn from a Lévy distribution. Recent studies have shown that Lévy flights can improve searches in uncertain environments. Lévy flights have many applications and have been observed in the foraging patterns of spider monkeys and albatrosses (Yang, 2010b). This distribution has been combined with other computational intelligence algorithms, such as cuckoo search (Yang and Deb, 2009), the firefly algorithm (Yang, 2009) or the bat algorithm (Xie et al., 2013). Here we exploit this distribution with BBA for feature selection. Therefore, we substitute equation (7) with:

$$x_{new} = x_{old} + t^{-\eta} A_t^{*} \qquad (10)$$

where $t^{-\eta}$ is the Lévy flights distribution and $1 < \eta \le 3$ is a constant.

BBA feature selection tries to optimize the feature subset at each iteration, such that the classifier's accuracy improves. The main steps of this algorithm are given below:

a. Each bat has a position that defines the subset of features. Based on this subset, the bat trains and evaluates the SVM classifier.

b. After all bats have been evaluated, the global optimum fitness value of the swarm is determined.

c. Each bat improves its position and updates its pulse rate and loudness as it approaches the best solution. For this:

– the bat uses Lévy flights to randomize the solution, in which case it computes equation (10) and the sigmoid function is applied to the position, $S(x_{i,j})$.

– otherwise, if the fitness value has not improved (compared to the global best), the bat updates its frequency and velocity. The sigmoid function is given by equation (9).

– finally, if the new fitness value is better than the global best, the bat increases its pulse rate and decreases its loudness. The global optimum is updated.

d. All the steps from c. are carried out by all bats from the swarm and are repeated until the maximum number of iterations is reached.

3 PROPOSED MODEL

Our model has three main stages: first we apply BBA with Lévy flights (BBAL) to determine the best feature subset for SVM, next we use BA to determine the best parameters for SVM. Finally, SVM detects network attacks using the best parameters calculated above.

3.1 Data Set

In this paper we experimented with the NSL-KDD data set, which contains network attacks. We chose this dataset because it is publicly available and is an improved version of KDD-Cup (Tavallaee et al., 2009). The simulated attacks from NSL-KDD fall into one of the following categories: Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L) or Probing. Furthermore, there are two different groups of files: training and testing. It is important to mention that the test data includes attack types that are not in the training data and therefore allows us to evaluate the classifier for unknown attacks.

We randomly select 9,500 records from the 20% training file and 4,500 records from the test file. Each record from the dataset has 41 features and is labeled as either normal or an attack. These features can be classified into three categories: content based (13 features), connection based (9 features) and time based (19 features). In order to improve the prediction ability of the classifier, we convert the symbolic values into integer values, as follows: protocol_type ∈ [0,2], service ∈ [0,69] and flag ∈ [0,10]. We also replace the class label with 0 for normal and 1 for attack.

3.2 Evaluation

To evaluate our model we find the best subset of features with BBAL and compare it with BBA. Also, we train the SVM classifier with BA using the determined feature subset and compare it with PSO. This comparison is relevant because PSO is a popular swarm intelligence algorithm that has been widely used for optimization problems.

3.2.1 Performance Measures

We estimate the effectiveness of our IDS model by calculating three performance measures: attack detection rate (ADR), which shows the model's capability of detecting attacks; false alarm rate (FAR), which measures how many false alarms the model generates; and accuracy, which reveals the model's capability of raising proper alarms.

3.2.2 Model Setup

All experiments were performed using an Intel Core 2 Duo 2.8 GHz processor with 2 GB of RAM under Ubuntu 10.04.4. We implemented the Swarm Intelligence algorithms (BBA, BBAL, BA and PSO) in Java on Eclipse. For the SVM classifier we used Weka version 3.6.10 (Hall et al., 2009).

Table 1: Optimum feature subset selected by BBA-SVM.

protocol_type
service
flag
src_bytes
land
urgent
logged_in
num_compromised
num_root
num_file_creations
num_shells
num_access_files
num_outbound_cmds
count
serror_rate
rerror_rate
same_srv_rate
dst_host_count
dst_host_same_srv_rate
dst_host_diff_srv_rate
dst_host_same_src_port_rate
dst_host_serror_rate
dst_host_rerror_rate
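A feature subset such as the one in Table 1 is the decoded form of a bat's binary position. The sketch below illustrates equations (8) and (9) and the decoding step under stated assumptions: the helper names, the five-feature list and the velocities are hypothetical, and it does not reproduce the authors' Java code.

```python
import math
import random

def sigmoid(v):
    """Equation (9): S(v) = 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def binarize(velocity, rng):
    """Equation (8): bit j is 1 when S(v_j) exceeds a fresh delta drawn from U(0, 1)."""
    return [1 if sigmoid(v) > rng.random() else 0 for v in velocity]

def selected_features(position, names):
    """Decode a binary position into the feature names it keeps."""
    return [name for bit, name in zip(position, names) if bit == 1]

# Hypothetical 5-feature example; the positions in the paper have 41 bits.
names = ["duration", "protocol_type", "service", "flag", "src_bytes"]
rng = random.Random(0)                       # seeded only to make the run repeatable
position = binarize([6.0, -6.0, 0.3, 2.0, -1.5], rng)
subset = selected_features(position, names)
```

In a full run, `subset` would be used to train and score the SVM, and that accuracy would serve as the bat's fitness.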


Table 3: Result comparison with the determined feature subsets.

System     Nb. of feat.  Accuracy  ADR     FAR     Test time (s)  Accuracy (training dataset)
SVM        41            78.933%   89.81%  7.288%  12.51          97.74%
BBA-SVM    23            79.022%   89.86%  7.377%  7.49           98.04%
BBAL-SVM   18            79.666%   91.22%  6.311%  6.82           98.31%
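The paper names the three measures reported in Table 3 but does not print their formulas. The sketch below assumes the usual confusion-matrix definitions (ADR as the true positive rate, FAR as the false positive rate); the function name and the counts are hypothetical and are not the paper's confusion matrix.

```python
def ids_metrics(tp, tn, fp, fn):
    """Return (accuracy, ADR, FAR) in percent from confusion-matrix counts.

    tp: attacks flagged as attacks        fn: attacks that were missed
    tn: normal records passed as normal   fp: normal records flagged as attacks
    """
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total
    adr = 100.0 * tp / (tp + fn)      # share of real attacks that were detected
    far = 100.0 * fp / (fp + tn)      # share of normal records that raised an alarm
    return accuracy, adr, far

# Hypothetical counts for a 4,500-record test sample.
acc, adr, far = ids_metrics(tp=2000, tn=2100, fp=150, fn=250)
```

Under these definitions a classifier can raise its accuracy and ADR while its FAR also rises slightly, which is why the three measures are reported side by side.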

Table 4: Result comparison with the subset of features selected by BBA-SVM.

System    C            σ          Accuracy  ADR     FAR     Accuracy (training dataset)
SVM       1.0          0.5        79.022%   89.86%  7.377%  98.04%
PSO-SVM   110.9106619  5.945653   89%       92.26%  7.2%    98.89%
BA-SVM    65.488216    4.6051845  89.28%    92.38%  7.11%   99.05%
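The C and σ values in Table 4 are positions reached by a swarm search. As a rough illustration of how one bat moves through the two-dimensional (C, σ) space, the sketch below applies equations (2) to (6) with the frequency range, α, γ, A0 and r0 quoted in the model setup. The function names are hypothetical, the fitness evaluation (training the SVM) is left out, and clamping to the search ranges is an assumption, since the paper does not say how bounds are enforced.

```python
import math
import random

BOUNDS = [(1.0, 500.0), (0.0001, 50.0)]   # search ranges for C and sigma from the model setup

def move_bat(x, v, best, freq_min, freq_max, rng):
    """Equations (2)-(4): draw a frequency, then update velocity and position."""
    freq = freq_min + (freq_max - freq_min) * rng.random()              # eq. (2), beta ~ U(0, 1)
    new_v = [vj + (xj - bj) * freq for vj, xj, bj in zip(v, x, best)]   # eq. (3), sign as printed
    new_x = [xj + vj for xj, vj in zip(x, new_v)]                       # eq. (4)
    # Assumption: keep the position inside the search ranges by clamping.
    new_x = [min(max(xj, lo), hi) for xj, (lo, hi) in zip(new_x, BOUNDS)]
    return new_x, new_v

def cool_down(loudness, r0, t, alpha=0.9, gamma=0.1):
    """Equations (5)-(6): loudness decays while the pulse rate grows toward r0."""
    return alpha * loudness, r0 * (1.0 - math.exp(-gamma * t))

rng = random.Random(42)
# Start from an arbitrary point; the BA-SVM row of Table 4 stands in for the global best.
x, v = move_bat([100.0, 10.0], [0.0, 0.0], best=[65.488216, 4.6051845],
                freq_min=0.8, freq_max=1.0, rng=rng)
loudness, pulse = cool_down(loudness=10.0, r0=0.9, t=1)
```

A complete search would repeat `move_bat` for every bat, score each new (C, σ) pair by SVM accuracy, and call `cool_down` only for bats whose solution was accepted.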

Table 5: Result comparison with the subset of features selected by BBAL-SVM.

System    C            σ          Accuracy  ADR     FAR     Accuracy (training dataset)
SVM       1.0          0.5        79.666%   91.22%  6.311%  98.31%
PSO-SVM   55.3817619   14.178919  89.93%    94.94%  4.488%  99.29%
BA-SVM    264.386988   5.252083   90.06%    95.05%  4.4%    99.31%
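The size of the improvements in Tables 4 and 5 is easiest to read as percentage-point differences against the default SVM on the same feature subset. The short sketch below only redoes that arithmetic on the values printed in the two tables; the dictionary layout and the function name are illustrative.

```python
# Test-set (accuracy, ADR, FAR) in percent, copied from Tables 4 and 5.
results = {
    "BBA subset": {
        "SVM": (79.022, 89.86, 7.377),
        "PSO-SVM": (89.0, 92.26, 7.2),
        "BA-SVM": (89.28, 92.38, 7.11),
    },
    "BBAL subset": {
        "SVM": (79.666, 91.22, 6.311),
        "PSO-SVM": (89.93, 94.94, 4.488),
        "BA-SVM": (90.06, 95.05, 4.4),
    },
}

def gains(subset, system):
    """Percentage-point change of a tuned system relative to default SVM on the same subset."""
    base, tuned = results[subset]["SVM"], results[subset][system]
    return tuple(round(t - b, 3) for b, t in zip(base, tuned))

ba_bbal = gains("BBAL subset", "BA-SVM")    # accuracy, ADR and FAR deltas
pso_bba = gains("BBA subset", "PSO-SVM")
```

A negative third component means the false alarm rate went down, which is the desired direction.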

BBAL and BBA have a swarm of 60 bats, the maximum number of iterations is 200 and the problem dimension is 41 (the number of features in the dataset). After several tests we set the maximum loudness ($A_0 = 10$), the minimum pulse rate ($r^0 = 0.9$) and the parameters that control the convergence of the algorithm ($\gamma = 0.1$ and $\alpha = 0.9$), and let the frequency range between 0.8 and 1.0. We use $\eta = 0.1$ for Lévy flights. For BA and PSO the maximum number of iterations is 100, the problem dimension equals 2 (we need to optimize two parameters, C and σ) and the search ranges are $1 \le C \le 500$ and $0.0001 \le \sigma \le 50$. BA has a population of 10 individuals and the same parameter values as BBA, while PSO has a population size of 20 individuals, the learning factors are $c_1 = 2.3$ and $c_2 = 1.8$, and the inertia weight is reduced from 0.9 to 0.5.

Table 2: Optimum feature subset selected by BBAL-SVM.

duration
service
flag
wrong_fragment
urgent
num_compromised
root_shell
num_shells
num_access_files
num_outbound_cmds
srv_serror_rate
srv_rerror_rate
dst_host_count
dst_host_srv_count
dst_host_same_srv_rate
dst_host_serror_rate
dst_host_srv_serror_rate
dst_host_rerror_rate

3.2.3 Results and Analysis

Tables 1 and 2 list the best feature subsets determined with BBA-SVM and BBAL-SVM, respectively, with SVM parameters C = 1.0 and σ = 0.5. BBAL obtains the smallest number of features, keeping 18 of the 41 (43.9% of the original set, a reduction of 56.1%). BBA is more vulnerable to local optima than BBAL, but it still succeeds in reducing the number of features, to 23.

To evaluate these subsets we use the training dataset to build the model and the test dataset to evaluate it. We also perform a 3-fold cross validation on the training dataset, in order to show the model's accuracy for known attacks. Results from Table 3 reveal that the BBA-SVM subset gives slightly better performance than the initial dataset, leading SVM to a higher classification rate and attack detection rate with a shorter test time, although its false alarm rate (7.377%) is marginally higher than that of the full feature set (7.288%). On the other hand, the BBAL subset, having a smaller number of features, succeeds in enhancing the classification abilities of SVM on all three measures and offers better results than BBA.

Furthermore, we also try to improve the SVM classifier by selecting proper input parameters. For this we use two swarm intelligence algorithms (PSO and BA) that search for the best solution. In other words, each individual of the swarm attempts to reach a position, represented by the two parameters, that brings a higher accuracy for the classifier. The comparison between PSO and BA for the two subsets is shown in Tables 4 and 5. We can observe that both PSO and BA manage to boost the performance of the classifier by selecting adequate parameters for SVM. As expected, the enhanced models perform better with the BBAL subset. The attack detection rate for PSO-SVM with the BBA subset is comparable with that of the standard BBAL-SVM. Table 5 reveals that PSO-SVM and BA-SVM, with the BBAL subset, lower the false alarm rate by about 1.8 to 1.9 percentage points and raise the attack detection rate by about 3.7 to 3.8 percentage points. Results from Table 4 show that the false alarm rate is slightly lower, while the attack detection rate gains approximately 2.4 to 2.5 percentage points. On the other hand, the accuracy for unknown attacks improves by about 10 percentage points in all cases. This means that our new classifier is able to raise proper alarms. We must also remark that the difference between the results given by PSO and BA is quite small. Furthermore, for known attacks, all approaches show good accuracy results and the best is obtained by BA-SVM (99.31%).

4 CONCLUSIONS

In this paper we proposed a new NIDS model that combines SVM with a recent swarm intelligence algorithm, the Bat Algorithm. The main contribution of this paper is the novel feature selection method (BBAL), which succeeds in reducing the number of attributes in the dataset while improving the predictive accuracy, detection rate and false alarm rate of the SVM classifier. To evaluate the effectiveness of the proposed model we use the NSL-KDD network intrusion benchmark and compare it with the popular PSO for our two subsets of features. We showed that BBAL can improve on BBA for feature selection, but only when combined with SVM. Therefore, our future work will focus on combining BBAL with other classifiers and comparing it to other feature selection approaches in order to assess its quality.

ACKNOWLEDGEMENTS

The work has been funded by the Sectoral Operational Programme Human Resources Development 2007-2013 of the Ministry of European Funds through the Financial Agreement POSDRU/159/1.5/S/132395.

REFERENCES

Dua, S. and Du, X. (2011a). Classical machine-learning paradigms for data mining. In Data Mining and Machine Learning in Cybersecurity, pages 23–56. Auerbach Publications Taylor and Francis Group.

Dua, S. and Du, X. (2011b). Machine learning for anomaly detection. In Data Mining and Machine Learning in Cybersecurity, pages 85–114. Auerbach Publications Taylor and Francis Group.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009). The weka data mining software: an update. SIGKDD Explor. Newsl., 11:10–18.

Kukielka, P. and Kotulski, Z. (2014). New unknown attack detection with the neural network-based ids. In The State of the Art in Intrusion Prevention and Detection, pages 259–284. Auerbach Publications.

Mirjalili, S., Mirjalili, S., and Yang, X.-S. (2013). Binary bat algorithm. Neural Computing and Applications, pages 1–19.

Pu, J., Xiao, L., Li, Y., and Dong, X. (2012). A detection method of network intrusion based on svm and ant colony algorithm. In Proceedings of the National Conference on Information Technology and Computer Science, pages 153–156. Atlantis Press.

Tavallaee, M., Bagheri, E., Lu, W., and Ghorbani, A. A. (2009). A detailed analysis of the KDD CUP 99 data set. In Proceedings of the IEEE Symposium on Computational Intelligence in Security and Defense Applications, pages 1–6. IEEE.

Wang, J., Hong, X., and R. Ren, T. L. (2009). A real-time intrusion detection system based on pso-svm. In Proceedings of the International Workshop on Information Security and Application, pages 319–321. ACADEMY PUBLISHER.

Wang, J., Li, T., and Ren, R. (2010). A real time IDSs based on artificial bee colony-support vector machine algorithm. In Proceedings in the International Workshop on Advanced Computational Intelligence, pages 91–96. IEEE.

Xie, J., Zhou, Y., and Chen, H. (2013). A novel bat algorithm based on differential operator and Lévy flights trajectory. Computational Intelligence and Neuroscience, 2013.

Yang, X.-S. (2009). Firefly algorithm, Lévy flights and [...]. In Proceedings of the SGAI International Conference on Artificial Intelligence, pages 209–218.

Yang, X.-S. (2010a). A new bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), volume 284 of Studies in Computational Intelligence, pages 65–74. Springer Berlin Heidelberg.

Yang, X.-S. (2010b). Random walks and Lévy flights. In Nature-Inspired Metaheuristic Algorithms Second Edition, pages 11–20. Luniver Press.

Yang, X.-S. (2012). Swarm-based metaheuristic algorithms and no-free-lunch theorems. In Theory and New Applications of Swarm Intelligence. InTech.

Yang, X.-S. and Deb, S. (2009). Cuckoo search via Lévy flights. In Proceedings of the World Congress on Nature & Biologically Inspired Computing, pages 210–214. IEEE.

Yang, X.-S. and He, X. (2013). Bat algorithm: Literature review and applications. International Journal of Bio-Inspired Computation, 5(3):141–149.
