Intrusion Detection Systems: Techniques, Datasets and Challenges Ansam Khraisat*, Iqbal Gondal, Peter Vamplew and Joarder Kamruzzaman

Khraisat et al. Cybersecurity (2019) 2:20 Cybersecurity https://doi.org/10.1186/s42400-019-0038-7 SURVEY Open Access Survey of intrusion detection systems: techniques, datasets and challenges Ansam Khraisat*, Iqbal Gondal, Peter Vamplew and Joarder Kamruzzaman Abstract Cyber-attacks are becoming more sophisticated and thereby presenting increasing challenges in accurately detecting intrusions. Failure to prevent the intrusions could degrade the credibility of security services, e.g. data confidentiality, integrity, and availability. Numerous intrusion detection methods have been proposed in the literature to tackle computer security threats, which can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This survey paper presents a taxonomy of contemporary IDS, a comprehensive review of notable recent works, and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges to counter such techniques so as to make computer systems more secure. Keywords: Malware, Intrusion detection system, NSL_KDD, Anomaly detection, Machine learning Introduction bank customers, robbing bank accounts or stealing The evolution of malicious software (malware) poses a credit cards (Symantec, 2017). However, the new gener- critical challenge to the design of intrusion detection ation of malware has become more ambitious and is tar- systems (IDS). Malicious attacks have become more so- geting the banks themselves, sometimes trying to take phisticated and the foremost challenge is to identify un- millions of dollars in one attack (Symantec, 2017). For known and obfuscated malware, as the malware authors that reason, the detection of zero-day attacks has be- use different evasion techniques for information con- come the highest priority. cealing to prevent detection by an IDS. In addition, there High profile incidents of cybercrime have demonstrated has been an increase in security threats such as zero-day the ease with which cyber threats can spread internation- attacks designed to target internet users. Therefore, ally, as a simple compromise can disrupt a business’ essen- computer security has become essential as the use of in- tial services or facilities. There are a large number of formation technology has become part of our daily lives. cybercriminals around the world motivated to steal infor- As a result, various countries such as Australia and the mation, illegitimately receive revenues, and find new tar- US have been significantly impacted by the zero-day at- gets. Malware is intentionally created to compromise tacks. According to the 2017 Symantec Internet Security computer systems and take advantage of any weakness in Threat Report, more than three billion zero-day attacks intrusion detection systems. In 2017, the Australian Cyber were reported in 2016, and the volume and intensity of Security Centre (ACSC) critically examined the different the zero-day attacks were substantially greater than pre- levels of sophistication employed by the attackers (Austra- viously (Symantec, 2017). As highlighted in the Data lian, 2017). So there is a need to develop an efficient IDS to Breach Statistics in 2017, approximately nine billion data detect novel, sophisticated malware. The aim of an IDS is records were lost or stolen by hackers since 2013 to identify different kinds of malware as early as possible, (Breach_LeveL_Index, 2017). A Symantec report found which cannot be achieved by a traditional firewall. With the that the number of security breach incidents is on the increasing volume of computer malware, the development rise. In the past, cybercriminals primarily focused on of improved IDSs has become extremely important. In the last few decades, machine learning has been * Correspondence: [email protected] used to improve intrusion detection, and currently there Internet Commerce Security Laboratory, Federation University Australia, is a need for an up-to-date, thorough taxonomy and Mount Helen, Australia © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Khraisat et al. Cybersecurity (2019) 2:20 Page 2 of 22 survey of this recent work. There are a large number of detection system and taxonomy by Axelsson (Axelsson, related studies using either the KDD-Cup 99 or DARPA 2000) classified intrusion detection systems based on the 1999 dataset to validate the development of IDSs; how- detection methods. The highly cited survey by Debar et ever there is no clear answer to the question of which al. (Debar et al., 2000) surveyed detection methods based data mining techniques are more effective. Secondly, the on the behaviour and knowledge profiles of the attacks. time taken for building IDS is not considered in the A taxonomy of intrusion systems by Liao et al. (Liao et evaluation of some IDSs techniques, despite being a crit- al., 2013a), has presented a classification of five sub- ical factor for the effectiveness of ‘on-line’ IDSs. classes with an in-depth perspective on their characteris- This paper provides an up to date taxonomy, together tics: Statistics-based, Pattern-based, Rule-based, State- with a review of the significant research works on IDSs based and Heuristic-based. On the other hand, our work up to the present time; and a classification of the pro- focuses on the signature detection principle, anomaly posed systems according to the taxonomy. It provides a detection, taxonomy and datasets. structured and comprehensive overview of the existing Existing review articles (e.g., such as (Buczak & Guven, IDSs so that a researcher can become quickly familiar 2016;Axelsson,2000; Ahmed et al., 2016; Lunt, 1988; with the key aspects of anomaly detection. This paper Agrawal & Agrawal, 2015)) focus on intrusion detection also provides a survey of data-mining techniques applied techniques or dataset issue or type of computer attack and to design intrusion detection systems. The signature- IDS evasion. No articles comprehensively reviewed intru- based and anomaly-based methods (i.e., SIDS and AIDS) sion detection, dataset problems, evasion techniques, and are described, along with several techniques used in each different kinds of attack altogether. In addition, the devel- method. The complexity of different AIDS methods and opment of intrusion-detection systems has been such that their evaluation techniques are discussed, followed by a several different systems have been proposed in the mean- set of suggestions identifying the best methods, depend- time, and so there is a need for an up-to-date. The up- ing on the nature of the intrusion. Challenges for the dated survey of the taxonomy of intrusion-detection current IDSs are also discussed. Compared to previous discipline is presented in this paper further enhances tax- survey publications (Patel et al., 2013; Liao et al., 2013a), onomies given in (Liao et al., 2013a; Ahmed et al., 2016). this paper presents a discussion on IDS dataset problems In view of the discussion on prior surveys, this article which are of main concern to the research community focuses on the following: in the area of network intrusion detection systems (NIDS). Prior studies such as (Sadotra & Sharma, 2016; Classifying various kinds of IDS with the major Buczak & Guven, 2016) have not completely reviewed types of attacks based on intrusion methods. IDSs in term of the datasets, challenges and techniques. In Presenting a classification of network anomaly IDS this paper, we provide a structured and contemporary, evaluation metrics and discussion on the importance wide-ranging study on intrusion detection system in terms of the feature selection. of techniques and datasets; and also highlight challenges Evaluation of available IDS datasets discussing the of the techniques and then make recommendations. challenges of evasion techniques. During the last few years, a number of surveys on intrusion detection have been published. Table 1 shows Intrusion detection systems the IDS techniques and datasets covered by this survey Intrusion can be defined as any kind of unauthorised ac- and previous survey papers. The survey on intrusion tivities that cause damage to an information system. This Table 1 Comparison of this survey and similar surveys: (✔: Topic is covered, ✖ the topic is not covered) Survey # of Intrusion Detection System Techniques Dataset citation issue SIDS AIDS Hybrid (as of IDS 6/1/ Supervised learning Unsupervised Semi-supervised learning Ensemble methods 2019) Lunt (1988) 219 ✔✖ ✖ ✖ ✖ ✖ ✖ Axelsson (2000) 1039 ✔✔ ✖ ✖ ✖ ✖ ✖ Liao, et al. (2013b) 505 ✔✔ ✔ ✖ ✖ ✔ ✖ Agrawal and Agrawal (2015) 108 ✔✔ ✔ ✔ ✔ ✔ ✖ Buczak and Guven (2016) 338 ✔✔ ✔ ✖ ✔ ✔ ✔ Ahmed, et al. (2016) 181 ✖✔ ✔ ✖ ✖ ✖ ✔ This survey ✔✔ ✔ ✔ ✔ ✔ ✔ Khraisat et al. Cybersecurity (2019) 2:20 Page 3 of 22 means any attack that could pose a possible threat to the Traditional approaches to SIDS examine network information confidentiality, integrity or availability will be packets and try matching against a database of signa- considered an intrusion. For example, activities that would

Intrusion Detection Systems: Techniques, Datasets and Challenges Ansam Khraisat*, Iqbal Gondal, Peter Vamplew and Joarder Kamruzzaman

The 5 Things You Need to Do to Protect Yourself from Ransomware

Internet Security

Internet Security Threat Report Volume 24 | February 2019

Cyber Risks to Public Safety: Ransomware, September 2020

Internet Security

Mcafee Potentially Unwanted Programs (PUP) Policy March, 2018

Internet Security Policy

Trojans and Malware on the Internet an Update

The Ethics of Cyberwarfare Randall R

(Malicious Software) Installed on Your Computer Without Your Consent to Monitor Or Control Your Computer Use

Lesson 6: Hacking Malware

CS 356 – Lecture 28 Internet Authentication