A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection

A Software Implementation of a Genetic Algorithm Based Approach to Network Intrusion Detection Ren Hui Gong, Mohammad Zulkernine, Purang Abolmaesumi School of Computing Queen’s University Kingston, Ontario, Canada K7L 3N6 {rhgong, mzulker, purang}@cs.queensu.ca Abstract technology has brought us, computer systems are exposed to increasing security threats that originate With the rapid expansion of Internet in recent externally or internally. Different but complementary years, computer systems are facing increased number technologies have been developed and deployed to of security threats. Despite numerous technological protect organizations’ computer systems against innovations for information assurance, it is still very network attacks, for example, anti-virus software, difficult to protect computer systems. Therefore, firewall, message encryption, secured network unwanted intrusions take place when the actual protocols, password protection, and so on. Despite software systems are running. Different soft computing different protection mechanisms, it is nearly based approaches have been proposed to detect impossible to have a completely secured system. computer network attacks. This paper presents a Therefore, intrusion detection is becoming an genetic algorithm (GA) based approach to network increasingly important technology that monitors intrusion detection, and the software implementation network traffic and identifies network intrusions such of the approach. The genetic algorithm is employed to as anomalous network behaviors, unauthorized derive a set of classification rules from network audit network access, and malicious attacks to computer data, and the support-confidence framework is utilized systems [15]. as fitness function to judge the quality of each rule. There are two general categories of intrusion The generated rules are then used to detect or classify detection systems (IDSs): misuse detection and network intrusions in a real-time environment. Unlike anomaly detection [16]. Misuse detection systems most existing GA-based approaches, because of the detect intruders with known patterns, and anomaly simple representation of rules and the effective fitness detection systems identify deviations from normal function, the proposed method is easier to implement network behaviors and alert for potential unknown while providing the flexibility to either generally detect attacks. Some IDSs integrate both misuse and anomaly network intrusions or precisely classify the types of detection and form hybrid detection systems. The IDSs attacks. Experimental results show the achievement of can also be classified into two categories depending on acceptable detection rates based on benchmark where they look for intrusions. A host-based IDS DARPA data sets on intrusions, while no other monitors activities associated with a particular host, complementary techniques or relevant heuristics are and a network-based IDS listens to network traffic. applied. A number of soft computing based approaches have been proposed for detecting network intrusions [1, 2, Keywords: Information assurance, misuse intrusion 3, 4, 6, 10]. Soft computing refers to a group of detection, genetic algorithms, support-confidence techniques that exploit the tolerance for imprecision, framework, software development. uncertainty, partial truth, and approximation to achieve robustness and low solution cost. The principle 1. Introduction constituents of soft computing are Fuzzy Logic (FL), Artificial Neural Networks (ANNs), Probabilistic The Internet and local area networks are expanding Reasoning (PR), and Genetic Algorithms (GAs) [10]. at an amazing rate in recent years. While we are When used for intrusion detection, soft computing benefiting from the convenience that the new techniques are often used in conjunction with rule- Proceedings of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN’05) 0-7695-2294-7/05 $20.00 © 2005 IEEE based expert systems acquiring expert knowledge [1, and the qualities of the individuals are gradually 4, 5, 6], where the knowledge is represented as a set of improved. During each generation, three basic genetic if-then rules. Despite different soft computing based operators are sequentially applied to each individual approaches having been proposed, the possibilities of with certain probabilities, i.e., selection, crossover, and using the techniques for intrusion detection are still mutation. First, a number of best-fit individuals are under-utilized. selected based on a user-defined fitness function. The In this paper, we present a GA-based approach to remaining individuals are discarded. Next, a number of network misuse detection. GA is chosen because of individuals are selected and paired with each other. some of its nice properties, e.g., robust to noise, no Each individual pair produces one offspring by gradient information is required to find a global partially exchanging their genes around one or more optimal or sub-optimal solution, self-learning randomly selected crossing points. At the end, a certain capabilities, etc. Using GAs for network intrusion number of individuals are selected and the mutation detection has proven to be a cost-effective approach [1, operations are applied, i.e., a randomly selected gene 2, 3, 7, 8, 9, 11]. In this work, we implement a of an individual abruptly changes its value. software based on the presented approach. The One extension of genetic algorithms, namely software is experimented using DARPA data sets on Genetic Programming (GP) [3, 8], is also commonly intrusions, which has become the de facto standard for used. It differs from GAs in the way of encoding testing intrusion detection systems. The experimental individuals. GAs use fixed length vectors to represent results show that our approach is effective, and it has individuals. In contrast, GP encodes each individual the flexibility to either generally detect network with a parse tree, where leaf nodes are genes and non- intrusions or precisely classify the types of misuses. leaf nodes are primitive functions (e.g., AND, OR, This is due to the use of both categorical and etc.). GP has the flexibility to represent very complex quantitative features of network audit data for deriving individuals. In the context of rule based expert the classification rules, and the use of the support- systems, GAs are often used to efficiently derive confidence framework as the GA fitness function. simple rules, and GP is used when more complex or accurate rules are required. Paper Organization. Thus far, we have discussed the motivation and a brief overview of the presented work. The rest of the paper is organized as follows. Section 2 initialization gives an overview of the genetic algorithm employed initial population in this work. Section 3 reviews the work relevant to this research, while some of the more closely related selection work are discussed in the relevant parts of this paper. new population Sections 4 and 5 describe in detail the proposed yes method and its software implementation. Section 6 quit? old population presents the experimental results, and Section 7 no concludes the paper with some future recommendations. crossover 2. Genetic Algorithms mutation Genetic algorithms [3, 12] employ metaphor from end biology and genetics to iteratively evolve a population of initial individuals to a population of high quality Figure 1. The operation of a generic GA. individuals, where each individual represents a solution of the problem to be solved and is composed When a GA is used for problem-solving, three of a fixed number of genes. The number of possible factors will have impact on the effectiveness of the values of each gene is called the cardinality of the algorithm, they are: 1) the selection of fitness function; gene. Figure 1 illustrates the operation of a general 2) the representation of individuals; and 3) the values genetic algorithm. The operation starts from an initial of the GA parameters. The determination of these population of randomly generated individuals. Then factors often depends on applications. In our the population is evolved for a number of generations implementation for network intrusion detection, the Proceedings of the Sixth International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing and First ACIS International Workshop on Self-Assembling Wireless Networks (SNPD/SAWN’05) 0-7695-2294-7/05 $20.00 © 2005 IEEE support-confidence framework was used as fitness and is able to generally detect or precisely classify function, a simple GA (rather than GP) was employed network intrusions. However, the use of GP makes to represent and derive rules, and appropriate GA implementation more difficult and more data or time parameters, including selection rate, crossing over are required to train the system. style, mutation rate, etc, were chosen based on a large Li [7] propose a GA-based method to detect number of experiments. anomalous network behaviors. Both quantitative and categorical features of network data are included when 3. Related Work deriving classification rules using GA. The inclusion of quantitative features may lead to increased detection This section briefly summarizes some of the rates. However, no experimental results are available applications of soft computing techniques for intrusion

Load more