Enhanced Intrusion Detection System Based on Bat Algorithm-Support Vector Machine

Enhanced Intrusion Detection System Based on Bat Algorithm-support Vector Machine Adriana-Cristina Enache and Valentin Sgârciu Faculty of Automatic Control and Computer Science, University Politehnica of Bucharest, Bucharest, Romania Keywords: Intrusion Detection, SVM, Bat Algorithm, Binary Bat Algorithm, Lévy Flights. Abstract: As new security intrusions arise so does the demand for viable intrusion detection systems. These solutions must deal with huge data volumes, high speed network traffics and countervail new and various types of security threats. In this paper we combine existing technologies to construct an Anomaly based Intrusion Detection System. Our approach improves the Support Vector Machine classifier by exploiting the advantages of a new swarm intelligence algorithm inspired by the environment of microbats (Bat Algorithm). The main contribution of our paper is the novel feature selection model based on Binary Bat Algorithm with Lévy flights. To test our model we use the NSL-KDD data set and empirically prove that Lévy flights can upgrade the exploration of standard Binary Bat Algorithm. Furthermore, our approach succeeds to enhance the default SVM classifier and we obtain good performance measures in terms of accuracy (90.06%), attack detection rate (95.05%) and false alarm rate (4.4%) for unknown attacks. 1 INTRODUCTION to improve these systems. Many researchers have fo- cused their attention on the anomaly based IDS and Many of our activities imply using the Internet (on- designed several approaches based on machine learn- line payments, internet banking, social networks or ing algorithms ( decision trees, neural networks or ge- searching for informations) and almost all govern- netic algorithms). There are many challenges to be ment or private organizations store critical data over regarded when implementing an IDS such as offering the networks. This increasing usage and growing real-time responses with a high attack detection rate speed of network connections has determined the pro- and a low false alarm rate. Also, the large number of liferation of various threats. In this context security features and difficulty to recognize the complex rela- systems have become vital components. Despite the tionship between them makes classification a difficult recent advances, security incidents are on the rise. task. In order to address these issues, we propose a IDS have become an indispensable component of al- Network Anomaly Intrusion Detection Model based most every security infrastructure, mainly because on Support Vector Machines(SVM) with Swarm In- they provide a wall of defense and resist external at- telligence(SI). Our approach has two main compo- tacks effectively, where other traditional security sys- nents: feature selection and enhanced SVM classifier. tems cannot perform well. Support Vector Machine (SVM) has many advan- Intrusion Detection Systems (IDS) monitor the tages that make it a suitable solution for intrusion de- activities and events occurring in the systems and detection, such as: good generalization performances cide if these are intrusive actions or normal usage of and learning ability in high dimensional or noisy the system. In general, IDS are classified on the ba- datasets. Furthermore, SVM does not suffer from lo- sis of their data analysis technique as: misuse and cal minima and its execution time runs fast. However, anomaly detection. The misuse method is very ac- a draw-back of this classifier is that its performance curate in detecting known attacks based on their sig- depends on selection of the right parameters. natures that are stored in the database. The anomaly Recently, swarm intelligence (SI) algorithms have detection approach automatically constructs a normal attracted great interest, mainly because they are sim- behavior of the systems. This latter method can de- ple, flexible (can be applied to a variety of problems tect new attacks but, it can also generate many false such as optimization, data mining and so forth) and alarms(Kukielka and Kotulski, 2014). robust (the algorithm will function even if some indi- Since 1980, when Anderson introduced the first viduals fail to perform their tasks). Currently, there IDS model, multiple techniques have been proposed are a variety of SI algorithms such as: ant colony op- 184 Enache A. and Sgârciu V.. Enhanced Intrusion Detection System Based on Bat Algorithm-support Vector Machine. DOI: 10.5220/0005015501840189 In Proceedings of the 11th International Conference on Security and Cryptography (SECRYPT-2014), pages 184-189 ISBN: 978-989-758-045-1 Copyright c 2014 SCITEPRESS (Science and Technology Publications, Lda.) EnhancedIntrusionDetectionSystemBasedonBatAlgorithm-supportVectorMachine timization, particle swarm optimization (PSO), artifi- a slack variable and defining a soft margin. Further- cial bee colony algorithm, firefly algorithm, cuckoo more, using kernel functions, we can transform the search or bat algorithm. These algorithms have been nonlinear SVM into a linear problem by mapping the combined with SVM to contruct improved IDS mod- dataset into a higher-dimensional feature space. For els and in most cases are used for two optimization our model we will use SVM with radial basis func- processes: feature selection and electing SVM param- tion (RBF), where the kernel function is: eters. Wang et. al. (Wang et al., 2009) present an IDS 1 based on PSO-SVM. They used Binary PSO to deter- K(x ,x)= exp(− x − x2) (1) i 2σ2 i mine the best feature subset and Standard PSO to seek for optimal SVM parameters. In 2010, a novel Arti- The RBF kernel function is a good solution be- ficial Bee Colony(ABC)-SVM approach is proposed cause it has fewer controllable parameters and an ex- (Wang et al., 2010) and experiments on the KDD- cellent nonlinear forecasting performance. In the fol- Cup99 dataset proved that ABC-SVM can obtain bet- lowing we define the SVM controllable parameters ter accuracy rate than PSO-SVM or GA-SVM. Pu et. and explain their influence. al. (Pu et al., 2012) improve SVM parameters with • The Regularization Parameter (C) - controls the Ant Colony Algorithm by defining the input pa- the ”flexibility” of the hyperplane’s margins. In rameters as the ant’s position. other words, smaller C allows softer-margins thus, In this paper we use a relatively new metaheuris- permits greater errors. Larger C produces a more tic to improve the SVM classifier, the Bat Algorithm accurate model with harder margins but, the gen- (BA), which shows promising results (Yang, 2010a), eralization performance of the classifier is worse. (Yang and He, 2013). Furthermore, we introduce an • Kernel Parameter (σ) - is the constant variable innovative feature selection model that combines Bi- from the kernel function. This parameter reflects nary Bat Algorithm (BBA) with Levy´ flights and em- the correlation among support vectors that define pirically demonstrate that it can outperform the stan- the hyperplane and may cause overfitting or un- dard BBA when coupled with SVM. derfitting of the classifier. The rest of this paper is organized as follows: first we introduce the algorithms that will be used to con- 2.2 Bat Algorithm struct our IDS model. Next, we describe our approach, define the performance measures and show Bat Algorithm (BA) was developed by Yang in 2010 our test results. Also, we compare our model with (Yang, 2010a) and it was inspired by the echoloca- other methods and indicate that it can obtain better tion of bats. These microbats emit a loud sound pulse results. Finally, the conclusions and future work. and change their pulse rate as the obstacle or prey is closer. In order to define a smart bat algorithm, three generalization rules have been formulated: 2 BACKGROUND OVERVIEW • All bats use echolocation to approximate the dif- ference between an obstacle and a prey and sense distance. 2.1 Support Vector Machine • Bats fly randomly and their movement is defined by their position in space (xi) and velocity (vi). Support Vector Machine (SVM) is a binary super- These parameters are computed based on a vary- vised learning algorithm that searches for the opti- ing wavelength (λ), frequency ( freq ) and loud- mum hyperplane to separate the two classes. This min ness (A ) to search for prey. Moreover, bats can linear classifier conducts structural risk analysis of 0 modify the frequency of their emitted pulses and statistical learning theory by selecting a number of the rate of pulse emission (r ∈ [0,1]), depending parameters based on the requirement of the margin on the closeness of their target. that separates the data points (Dua and Du, 2011b). The hyperplane is defined as f (x)= w · x + b, where • The loudness can vary in multiple ways, but we w is the weight vector and b is the bias. Our clas- assume that it varies from a large value (A0)to a sifier defines a hyperplane and linearly separates the minimum constant value (Amin). two classes: anomaly and normal traffic. The points BA is a swarm intelligence algorithm which per- closest to the hyperplane are called support vectors forms searches using a population of agents. For and the distance between them is called the margin. SVM parameter selection, BA will search for the best On the other hand, the dataset is not always lin- C and σ based on the accuracy of SVM. Each agent t early separable. This issue is solved by introducing i has a current position xi = (xi,1,xi,2,...,xi,d) and a 185 SECRYPT2014-InternationalConferenceonSecurityandCryptography t current flying velocity vi = (vi,1,vi,2,...,vi,d) , where bat’s motion in a d-dimensional binary space. There- d is the problem dimension. To find the optimal po- fore, the position of a bat is defined as a vector of sition, each agent (or bat) updates its position and ve- binary coordinates and the bat can move across the locity according to the following equations: corners of a hypercube.

Load more