
Adaptive Real-Time Anomaly Detection with Improved Index and Ability to Forget∗

Kalle Burbeck and Simin Nadjm-Tehrani, Department of Computer and Information Science, Linköping University, SE–581 83 Linköping, Sweden

E-mail: {kalbu, simin}@ida.liu.se

∗ This work was supported by the European project Safeguard [13] IST–2001–32685. We would like to thank Thomas Dagonnier, Tomas Lingvall and Stefan Burschka at Swisscom for fruitful discussions and their many months of work with the test network. The work is currently supported by a grant from CENIIT, Center for Industrial Information Technology at Linköping University.

Abstract

Anomaly detection in IP networks, the detection of deviations from what is considered normal, is an important complement to misuse detection based on known attack descriptions. Performing anomaly detection in real time places hard requirements on the algorithms used. First, to deal with the massive data volumes one needs efficient data structures and indexing mechanisms. Secondly, the dynamic nature of today's information networks makes the characterization of normal requests and services difficult. What is considered normal during some time interval may be classified as abnormal in a new context, and vice versa. These factors make many proposed data mining techniques less suitable for real-time intrusion detection. In this paper we extend ADWICE, Anomaly Detection With fast Incremental Clustering. The accuracy of ADWICE classifications is improved by introducing a new grid-based index, and its ability to build models incrementally is extended by introducing forgetting. We evaluate the technique on the KDD data set as well as on data from a real (telecom) IP test network. The experiments show good detection quality and illustrate the usefulness of adapting to changing normality.

1. Introduction

Intrusion detection is an important part of computer system defence. Due to the increasing complexity of the intrusion detection task, the use of many IDS sensors to increase coverage, and the need for improved usability of intrusion detection, a recent trend is alert or event correlation [3, 6]. Correlation combines information from multiple sources to improve information quality. By correlation the strengths of different types of detection schemes may be combined, and their weaknesses compensated for.

The main detection scheme of most commercial intrusion detection systems is misuse detection, where known bad behaviours (attacks) are encoded into signatures. In anomaly detection the normal behaviour of users or of the protected system is modelled, often using machine learning or data mining techniques rather than given signatures. During detection new data is matched against the normality model, and deviations are marked as anomalies. Since no knowledge of attacks is needed to train the normality model, anomaly detection may detect previously unknown attacks.

Anomaly detection still faces many challenges, one of the most important being the relatively high rate of false alarms (false positives). The difficulty of capturing a complex normality makes a high rate of false positives intrinsic to anomaly detection for all but simple problems. We argue that the usefulness of anomaly detection increases if it is combined with further aggregation, correlation and analysis of alarms, thereby minimizing the number of false alarms propagated to the administrator (or automated response system) that further diagnoses the scenario. The fact that normality changes constantly makes the false alarm problem even worse. When normality changes over time, a model with an acceptable rate of false alarms may rapidly deteriorate in quality. To minimize this problem and the resulting additional false alarms, the anomaly detection model needs to be adaptable.

The training of the normality model for anomaly detection may be performed by a variety of techniques and many approaches have been evaluated. One important technique is clustering, where similar data points are grouped together into clusters using a distance function.
In this paper we develop a new indexing mechanism for ADWICE (Anomaly Detection With fast Incremental Clustering), an adaptive anomaly detection scheme inspired by the BIRCH clustering algorithm [15]. We improve an earlier version of the algorithm used for anomaly detection [2] in two ways: (1) a novel search index that increases precision and detection quality, and (2) new means to handle adaptation of the model (forgetting). This paper does not attempt to justify the quality of clustering within anomaly detection or to compare performance with other machine learning work. The interested reader is referred to the original report for that purpose [2].

The paper is organised as follows. In section 2 the motivation of this work is presented together with related work. Section 3 describes the indexing technique, and the evaluation is presented in section 4. The final section discusses the results and describes opportunities and challenges for future work.

2 Motivation and Related Work

One fundamental problem of intrusion detection research is the limited availability of good data for evaluation. Producing intrusion detection data is a labour-intensive and complex task involving generation of normal system data as well as attacks, and labelling the data to make evaluation possible. If a real network is used, the problem of producing good normal data is reduced, but then the data may be too sensitive to be released publicly to other researchers. Learning-based methods require data not only for testing and comparison but also for training, resulting in even higher data requirements.
The data used for training needs to be representative of the network to which the learning-based method will be applied, possibly requiring generation of new data for each deployment.

Classification-based methods [4, 9] require training data that contains normal data as well as good representatives of those attacks that should be detected, in order to separate attacks from normality. Producing good coverage of the very large attack space (including unknown attacks) is not practical for any network. The data also needs to be labelled and the attacks marked. One advantage of clustering-based methods [5, 10, 12, 14] is that they require no labelled training data set containing attacks, significantly reducing the data requirement. There exist at least two approaches.

When doing unsupervised anomaly detection [5, 12, 14] a model based on clusters of data is trained using unlabelled data, normal as well as attacks. If the underlying assumption holds (i.e. attacks are sparse in the data), attacks may be detected based on cluster sizes, where small clusters correspond to attack data. Unsupervised anomaly detection is a very attractive idea, but unfortunately the experience so far indicates that acceptable accuracy is very hard to obtain. Also, the assumption of unsupervised anomaly detection is not always fulfilled, making the approach unsuitable for attacks such as denial of service (DoS) and scanning.

In the second approach, which we simply denote (pure) anomaly detection in this paper, the training data is assumed to consist only of normal data. Munson and Wimer [10] used a cluster-based model (Watcher) to protect a real web server, proving anomaly detection based on clustering to be useful in real life. ADWICE uses pure anomaly detection to reduce the training data requirement of classification-based methods and to avoid the attack volume assumption of unsupervised anomaly detection. By including only normal data in the detection model the low accuracy of pure anomaly detection can be significantly improved.

In a real live network with a connection to the Internet, data can never be assumed to be free of attacks. Pure anomaly detection also works when some attacks are included in the training data, but those attacks will be considered normal during detection and therefore not detected. To increase detection coverage, attacks should be removed from the training data to as large an extent as possible, with a trade-off between coverage and data cleaning effort. Attack data can be filtered away from training data using updated misuse detectors, or multiple anomaly detection models may be combined by voting to reduce costly human effort.

One of the inherent problems of anomaly detection in general is the false positives rate. In most realistic settings normality is hard to capture and, even worse, is changing over time. Rather than making use of extensive periodical retraining sessions on stored off-line data to handle this change, ADWICE is fully incremental, making very flexible on-line adaptation of the model possible without destroying what is already learnt.

An intrusion detection system in a real-time environment needs to be fast enough to cope with the information flow, have explicit limits on resource usage, and adapt to changes in the protected network in real time. Many proposed clustering techniques require quadratic time for training [7], making real-time adaptation of a cluster-based model hard. They may also not be scalable, requiring all training data to be kept in main memory during training, which limits the size of the trained model. We argue that it is important to consider scalability and performance in parallel with detection quality when evaluating algorithms for intrusion detection. Most work on applications of data mining to intrusion detection considers those issues to a very limited degree or not at all. ADWICE is scalable since compact cluster summaries are kept in memory rather than the individual data points. ADWICE also has better performance compared to similar approaches due to the use of local clustering with an integrated improved search index.

3 The Anomaly Detection algorithm

This section describes how ADWICE handles training and detection; more details on the old index may be found in the original paper [2]. The present implementation requires data to be numeric. Data is therefore assumed to be transformed into numeric format by pre-processing.
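The paper does not prescribe the pre-processing itself, so the sketch below is only one possible illustration of what "transformed into numeric format" can mean: one-hot encoding of categorical fields and min-max scaling of numeric fields. Every field name and value range here is a made-up assumption, not taken from the data sets used later.

# Illustrative pre-processing of a raw record into a numeric vector.
# Field names, value sets and ranges are hypothetical.

CATEGORICAL_VALUES = {
    "protocol": ["tcp", "udp", "icmp"],
    "flag": ["ok", "error"],
}
NUMERIC_RANGES = {
    "duration": (0.0, 60000.0),
    "bytes_sent": (0.0, 1e9),
}

def to_numeric_vector(record: dict) -> list[float]:
    """Turn one raw record into a fixed-length numeric vector in [0, 1]."""
    vector = []
    for field, (lo, hi) in NUMERIC_RANGES.items():
        value = float(record.get(field, lo))
        vector.append((value - lo) / (hi - lo) if hi > lo else 0.0)
    for field, values in CATEGORICAL_VALUES.items():
        one_hot = [0.0] * len(values)
        if record.get(field) in values:
            one_hot[values.index(record[field])] = 1.0
        vector.extend(one_hot)
    return vector

# Example: to_numeric_vector({"duration": 120, "bytes_sent": 4096,
#                             "protocol": "tcp", "flag": "ok"})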
3.1 Basic concepts

The basic clustering concepts are presented in the original BIRCH paper [15] and the relevant parts are summarized here. An ADWICE model consists of a number of clusters, a number of parameters, and a tree index in which the leaves contain the clusters. A central idea of BIRCH inherited by ADWICE is to store only condensed information (a cluster feature) instead of all the data points of a cluster. A cluster feature is a triple CF = (n, S, SS) where n is the number of data points in the cluster, S is the linear sum of the n data points and SS is the square sum of all data points. From now on we represent clusters by their cluster features (CF).

Given n d-dimensional data vectors v_i, i = 1 ... n, in a cluster, the centroid v0 is defined as v0 = (sum_i v_i) / n, which may be computed given only the summary CF. The distance between a data point v and a cluster CF is the Euclidian distance between v and the centroid, denoted D(v, CF), while the distance between two clusters CF_i and CF_j is the Euclidian distance between their centroids, denoted D(CF_i, CF_j). If two clusters CF_i = (n_i, S_i, SS_i) and CF_j = (n_j, S_j, SS_j) are merged, the CF of the resulting cluster may be computed as (n_i + n_j, S_i + S_j, SS_i + SS_j). This also holds if one of the CFs is based on only one data point, making incremental update of CFs possible.

During training each data point v is inserted into the model by searching the index for the closest cluster. If v is close enough to that cluster, that is, if the threshold requirement governed by the threshold T is satisfied, v is merged into the cluster. Otherwise a new cluster is needed, and two cases arise depending on the current size of the model.

Case 1: If the model has not yet reached its maximum size M, v forms a new cluster of its own.

Case 2: If the size (number of clusters) of the model has reached the maximum M, we need to reduce the model size by allowing clusters to grow. The threshold T is increased, relaxing the threshold requirement, and the model is rebuilt. When rebuilding the model, the old cluster features are removed and reinserted into the model. Due to the increase of T, previously close (similar) clusters may be merged, thereby reducing the size of the model. The new data point v is then inserted into the new model.

Rebuilding the model requires much less effort than the initial insertion of data since only cluster features, rather than individual data points, are inserted. If the increase of T is too small, a new rebuild of the tree may be needed to reduce the size below M again. A heuristic described in the original BIRCH paper [15] may be used for increasing the threshold to minimize the number of rebuilds, but in this work we use a simple constant to increase T conservatively (to avoid influencing the result by the heuristic).

Below is an algorithmic description of the training phase of ADWICE, in which only the main points of the algorithm are presented and some simplifications are made to facilitate presentation. Note also that the index is abstracted away at this point.

1: procedure TRAIN(v, model)
2:   closestCF = findClosestCF(v, model)
3:   if thresholdRequirementOK(v, closestCF) then
4:     merge(v, closestCF)
5:   else
6:     if size(model) < M then ...
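Since the listing above breaks off and the index is abstracted away, the following is a minimal sketch of the cluster feature arithmetic and the two training cases as described in the prose. The threshold requirement is simplified here to a plain distance test against T, and all helper and parameter names are illustrative rather than ADWICE's own.

import math


class ClusterFeature:
    """Condensed cluster summary CF = (n, S, SS) as defined in section 3.1."""

    def __init__(self, point):
        self.n = 1
        self.S = list(point)                    # linear sum of the data points
        self.SS = sum(x * x for x in point)     # square sum of the data points

    def centroid(self):
        return [s / self.n for s in self.S]

    def distance_to(self, point):
        """Euclidian distance D(v, CF) between a point and the centroid."""
        return math.dist(point, self.centroid())

    def absorb(self, point):
        """Incrementally add a single data point to the summary."""
        self.n += 1
        self.S = [s + x for s, x in zip(self.S, point)]
        self.SS += sum(x * x for x in point)

    def merge(self, other):
        """Merge two clusters: (ni + nj, Si + Sj, SSi + SSj)."""
        self.n += other.n
        self.S = [a + b for a, b in zip(self.S, other.S)]
        self.SS += other.SS


def train(v, model, max_size_m, threshold_t, t_increase=1.1):
    """Sketch of TRAIN(v, model); returns the (possibly increased) threshold."""
    closest = min(model, key=lambda cf: cf.distance_to(v), default=None)
    if closest is not None and closest.distance_to(v) <= threshold_t:
        closest.absorb(v)                       # threshold requirement satisfied
        return threshold_t
    if len(model) < max_size_m:                 # Case 1: room for a new cluster
        model.append(ClusterFeature(v))
        return threshold_t
    # Case 2: the model is full. Increase T by a constant factor, rebuild the
    # model by reinserting the old cluster features (close clusters now merge),
    # then insert v into the rebuilt model.
    threshold_t *= t_increase
    old_cfs, model[:] = list(model), []
    for cf in old_cfs:
        near = min(model, key=lambda m: math.dist(m.centroid(), cf.centroid()),
                   default=None)
        if near is not None and math.dist(near.centroid(), cf.centroid()) <= threshold_t:
            near.merge(cf)
        else:
            model.append(cf)
    return train(v, model, max_size_m, threshold_t, t_increase)

With this sketch, building a model is simply starting from an empty list and calling train on every pre-processed training vector.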

Figure 1. Basic notions of the grid and grid tree. [Figure: each dimension (d1, d2) is divided into numbered slices of equal width (d1 at 0, 0.33, 0.67, 1; d2 at 0, 0.25, 0.5, 0.75, 1); clusters a-e are mapped to slice numbers via a hash mapping, and the grid tree consists of non-leaf nodes per dimension with leaf nodes holding the clusters (CFs) and data points v.]
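Reading the figure, each dimension is cut into equal-width slices, the corresponding coordinate of a cluster centroid is turned into a slice number, and a hash mapping from slice numbers to children leads down to the leaf holding the clusters. The sketch below only illustrates that slice-and-hash idea; the equal widths, the per-node dimension choice and all names are assumptions, not the paper's actual grid-tree implementation.

def slice_number(value: float, lo: float, hi: float, num_slices: int) -> int:
    """Map a coordinate in [lo, hi] to a slice number 1..num_slices."""
    width = (hi - lo) / num_slices
    index = int((value - lo) // width) + 1
    return min(max(index, 1), num_slices)       # clamp values on the boundary


class GridNode:
    """One level of a hypothetical grid tree: a hash from slice number to child."""

    def __init__(self, dimension: int, lo: float, hi: float, num_slices: int):
        self.dimension = dimension
        self.lo, self.hi, self.num_slices = lo, hi, num_slices
        self.children = {}                      # slice number -> subtree or leaf list

    def child_for(self, point):
        key = slice_number(point[self.dimension], self.lo, self.hi, self.num_slices)
        return self.children.setdefault(key, [])


# Example, matching the figure: with three slices on d1 over [0, 1],
# a centroid with d1-coordinate 0.4 falls into slice number 2.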

... approach that our current implementation adopts. An alternative or complementary approach to handle this situation is to rebuild the tree using a smaller width of the intervals for each dimension.

3.4 Detection

The basic detection procedure is the same for any choice of index and thus similar to that of the original index [2]. During detection the model is searched for the closest cluster CF_i (using the index). Then the distance D(v, CF_i) from the centroid of the cluster to the new data point v is computed. Informally, if D is small, i.e. lower than a limit, v is similar to data included in the normality model and should therefore be considered normal. If D is large, v is an anomaly.

Let the threshold T be the limit (L) used for detection. Using two parameters E1 and E2, MaxL = E1 * L and MinL = E2 * L may be computed. Then we compute the belief b that v is anomalous using the formula below:

    b = 0                                if D ≤ MinL
    b = (D - MinL) / (MaxL - MinL)       if MinL < D < MaxL          (1)
    b = 1                                if D ≥ MaxL

For the grid tree the search for the closest cluster during detection differs slightly from the search during training: when deciding the search width, MaxL rather than T needs to be used.

3.5 Adaptation of the normality model

Below we describe two scenarios in which it is very useful to have an incremental algorithm in order to adapt to changing normality.

Scenario 1: New cases of normality require the model to adapt incrementally. In some settings, it may be useful to let the normality model relearn autonomously. If normality drifts slowly, an incremental clustering algorithm may handle this in real time during detection by incorporating every test data point classified as normal with a certain confidence into the normality model. If a slower drift of normality is required, a subset of those data points, selected by sampling, could be incorporated into the normality model. Even if autonomous relearning is not allowed in a specific network, there is still a need for model adaptation. Imagine that the ADWICE normality model has been trained and is producing good detection results for a specific network during some time interval. However, in an extension of the interval the administrator recognizes that normality has changed and a new class of data needs to be included as normal; otherwise this new normality class produces false positives. Due to the incremental property, the administrator can incorporate this new class of data into the model.

Scenario 2: The opposite scenario is when the model of normality needs to shrink, that is, something that was considered normal earlier is now considered undesirable. In our approach this situation is adapted to by forgetting the segment of normality that is no longer considered normal. This too can be performed autonomously or manually. We have experimented with one autonomous forgetting process which checks the model periodically (CheckPeriod) to decide which clusters can be forgotten. If the difference between the current time and the time of last use is larger than a forgetting threshold (RememberPeriod), the cluster is removed from the model. The rest of the model is not influenced.

Each time a cluster is used, either during training or detection, forgetting of the cluster is postponed for an amount of time specified by the user. This is similar to the positive influence of repetition in the case of human learning.
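Equation (1) and the forgetting rule above translate directly into two small functions. The sketch below assumes that every cluster carries a last_used timestamp refreshed whenever it is touched during training or detection; E1, E2, CheckPeriod and RememberPeriod are the parameters named in the text, while the function and attribute names are illustrative.

def anomaly_belief(distance_d: float, limit_l: float, e1: float, e2: float) -> float:
    """Belief b that a data point is anomalous, following equation (1)."""
    max_l = e1 * limit_l
    min_l = e2 * limit_l
    if distance_d <= min_l:
        return 0.0
    if distance_d >= max_l:
        return 1.0
    return (distance_d - min_l) / (max_l - min_l)


def forget_old_clusters(model: list, now: float, remember_period: float) -> list:
    """Drop clusters whose last use lies further back than RememberPeriod."""
    return [cf for cf in model if now - cf.last_used <= remember_period]


# A maintenance loop would call forget_old_clusters once every CheckPeriod
# (the experiment in section 4.3 uses a 12 hour CheckPeriod and a
# RememberPeriod of 72 hours).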

4 Evaluation

In all of the following experiments ADWICE with a grid index is used unless otherwise stated.

We choose to start the evaluation of detection quality using the KDDCUP99 intrusion detection data set [11] to be able to compare the results with earlier evaluations [2]. Despite the shortcomings of the DARPA-related data sets [8], they have been used in at least twenty research papers and are unfortunately currently the only openly available network data commonly used for comparison purposes. The purpose of this first evaluation is a proof of concept for anomaly detection with ADWICE using a large feature set in real time, testing the grid index.

Since ADWICE needs only normal data for training, attacks in the KDD training data set are removed, resulting in 972 781 session records (41 fields summarizing a TCP session) used to train the ADWICE model. This model is evaluated on the KDD testing set consisting of 311 029 session records of normal data and many instances from 37 different attack types.

With a model size of 12 000 clusters, the mean processing time per testing instance was 0.5-0.6 ms in off-line experiments on a 1.8 GHz P3 computer with 512 MB RAM, illustrating the good performance of the old ADWICE model. Figure 2 shows that the primitive search operation of the new grid index (hashing) is 6-10 times faster than the primitive operation of the old index (Euclidian distance, D), depending on the number of dimensions (60-20 in Figure 2). Full performance evaluation of the new index depends on the parameters and the function getSplitDimension() and is left as future work. The use of 972 781 elements of 41-dimensional data in the normality model illustrates the scalability of ADWICE.

Figure 2. Relative performance of primitive index operations. [Figure: operations per ms for the hash-based grid index (HashMap) compared with the distance computation D of the old index at 60, 40 and 20 dimensions.]

To illustrate forgetting and incremental training in a more realistic setting, a data set generated by the telecom management test network in the European Safeguard project [13] was used, at the time of evaluation consisting of 50 real machines (now more than 100). A period of three days of data is used for training an initial model, while the following seven days of data are used for testing. The features include source and destination IP addresses and port numbers, time of day, connection length, bytes transferred and a flag indicating a parsing error.

4.1 Detection quality of ADWICE

In all we have evaluated three different versions of ADWICE, as shown in Figure 3. The trade-off between detection rate and false positives rate is realised by changing the detection parameter E. ADWICE-TRR [2] is the baseline, with a training phase very similar to the original clustering algorithm BIRCH. ADWICE-TRD [2] improves the first implementation while still using the original index. ADWICE-Grid uses the new grid index rather than the original BIRCH index, eliminating the index misses and thereby further improving the result. A comparison with a classification-based method and with unsupervised anomaly detection is presented in earlier work [2].

Figure 3. Detection quality of ADWICE on the KDD data set. [Figure: detection rate (0.75-1) versus false positives rate (0.015-0.035) for ADWICE-TRR, ADWICE-TRD and ADWICE-Grid.]
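The curves in Figure 3 come from varying the detection parameter and measuring detection rate against false positives rate on labelled test data. The loop below is only a schematic reconstruction of that measurement; the belief scores, labels and the exact role of the parameter E are placeholders.

def rates(beliefs_normal, beliefs_attack, decision_threshold):
    """False positives rate and detection rate for one parameter setting.

    beliefs_normal / beliefs_attack: anomaly beliefs computed for labelled
    normal and attack test records; an alarm is raised when the belief
    exceeds decision_threshold.
    """
    fpr = sum(b > decision_threshold for b in beliefs_normal) / len(beliefs_normal)
    detection_rate = sum(b > decision_threshold for b in beliefs_attack) / len(beliefs_attack)
    return fpr, detection_rate


# Sweeping the parameter traces out a trade-off curve like the one in Figure 3:
# for threshold in (0.1, 0.3, 0.5, 0.7, 0.9):
#     print(threshold, rates(normal_beliefs, attack_beliefs, threshold))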

Figure 4. Adapting to changing normality with incremental training. [Figure: number of alarms per 2-hour period from the start of testing, for the original model and after incremental training steps 1 and 2.]

Figure 5. Adapting to changing normality using forgetting. [Figure: number of alarms per 2-hour period from the start of testing, with and without forgetting.]

4.2 Incremental training

An important property of ADWICE is that the original model, known to truly reflect recent normality, does not need to be retrained as soon as new cases of normality are encountered. We evaluate this on data generated in the Safeguard test network. The three days of training data are used to train an initial model, which is then used for detection on the seven days of testing data. When certain types of traffic (new cases of normality) start producing false alarms, the administrator may tell ADWICE to incrementally learn the data causing those alarms, to avoid similar false alarms in the future. Figure 4 shows alarms for host x.x.202.183 in three scenarios. Period number 1 starts at time 2004-03-11 00:00 (the start of the testing data) and each two-hour period presents the sum of alarms related to host x.x.202.183 during the corresponding time interval. At 2004-03-14 12:00, corresponding to period number 43, the host is connected to the network.

In the first scenario, no incremental training is used, and the testing is performed on the original model. This corresponds to the first curve of Figure 4. We see that when the host connects, ADWICE starts to produce alarms and this continues until the testing ends at period 102.

In the second scenario the administrator recognizes the new class of alarms as false positives. She tells ADWICE to learn the data resulting in those false alarms at time 2004-03-14 17:55 (end of period 45). The second curve shows that many of the original false alarms are no longer produced. However, at regular intervals there are still many alarms. Those intervals correspond to non-working hours.

In the third scenario incremental training is done in two steps. After the first incremental training on 2004-03-14, a second incremental training is initiated at 2004-03-15 07:55 (end of period 52) when the administrator notices that false alarms related to host x.x.202.183 are still produced. Figure 4 shows how almost all alarms now disappear after the second incremental training period.

The need for the two-step incremental learning arose since the model differs between working hours and non-working hours. The alarms the administrator used for the initial incremental training were all produced during working hours (2004-03-14 12:00 - 2004-03-14 17:55).
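In terms of the earlier training sketch, the administrator's action in the second and third scenarios is simply a loop that feeds the vectors behind the false alarms back into train; the function below is a hypothetical wrapper around that sketch, not an interface of ADWICE itself.

def incorporate_false_alarms(false_alarm_vectors, model, max_size_m, threshold_t):
    """Extend the current normality model incrementally, without retraining
    from scratch, using the train() sketch from section 3.1."""
    for v in false_alarm_vectors:
        threshold_t = train(v, model, max_size_m, threshold_t)
    return threshold_t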
4.3 Forgetting

In this section we illustrate the use of forgetting. A model is trained on three days of data and is then used for detection, with and without forgetting, on the following seven days of data. Figure 5 shows alarms for one instance of traffic (host x.x.202.73, port 137) that ceases to be present in the (normal) testing data, making that kind of traffic anomalous. With forgetting, this fact is reflected in the normality model. In this experiment a CheckPeriod of 12 hours and a RememberPeriod of 3 days (72 hours) are used.

When traffic from host x.x.202.73 on port 137 is again visible in the data (periods 55-66) the traffic is detected as anomalous. Without forgetting these anomalies would go undetected.

5 Discussion and future work

We have developed and preliminarily evaluated an adaptive network anomaly detector. In related work, real-time detection and indexes for fast matching against the normality model are seldom considered together with the basic detection approach. We consider the index in the detection scheme from the start, since the index may influence not only performance, but also other properties such as adaptiveness.

Preliminary evaluations show that there may be a need for the administrator to tell the anomaly detector to learn not only the data producing an alarm, but also a generalisation of that data. For example, the two-step incremental training in section 4.2 would not have been necessary if the administrator could have told the system to learn that the data producing the alarms was normal both during working and non-working hours. Those experiences call for more intelligent tools, such as sophisticated model and data visualization mechanisms, to help the administrator use anomaly detection in a real environment. Also, systematic methods to decide on suitable parameter settings (e.g. M, numSlices) are needed. A full evaluation of how sensitive ADWICE is to minor changes of the parameters remains to be done.

Extensions of the work will consider cluster size and frequency of usage as criteria to decide whether a cluster ought to be forgotten. Large clusters, corresponding to very common normality in the past, should be very resistant against forgetting. With this approach, data resulting only in small clusters in the model could also be handled by forgetting those clusters. Another improvement would be to gradually decrease the influence of clusters over time, rather than forgetting them completely.

Full evaluation of forgetting and incremental training requires a long period of real data collection. We see the current work as a proof of concept. It is improbable that parts of normality should be forgotten already after a few days in a real network, and a suitable configuration of forgetting is network and policy specific. This is a trade-off between (1) performance (decreasing model size), (2) detection of (very) old normality as anomalies, and (3) false positives if the normality is forgotten too soon.
Producing good public data for intrusion detection evaluation, including anomaly detection and correlation, is still an important task for the intrusion detection community, and we think collaboration on this task is very important. Two approaches exist:

• A test network or simulator can be used for generation of data, thereby realising a fully controlled environment. Generation of more extensive data sets in the Safeguard test network is ongoing work but requires more resources, primarily to generate good normal data, but also to perform attacks.

• Data can be collected from real live networks. Here, normality is no longer a problem, but privacy is, and so is the issue of attack coverage. Emerging tools [1] can be used to sanitize data, with some limitations. Sanitizing data while keeping the relevant properties of the data, so that intrusion detection analysis is still valid, is not easy, especially for unstructured information, for example network packet content. Attacks may have to be inserted into the data off-line if the real attacks present in the data are not enough, since performing attacks in a real network is typically not a viable option.

We would like to explore both paths for evaluation in future work and possibly include not only network data, but also other types of data. This is possible since ADWICE is a general and flexible anomaly detection scheme.

References

[1] M. Bishop, B. Bhumiratana, R. Crawford, and K. N. Levitt. How to sanitize data. In Proceedings of the 13th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE 2004), pages 217–222, Modena, Italy, 2004. IEEE Computer Society.
[2] K. Burbeck and S. Nadjm-Tehrani. ADWICE – anomaly detection with real-time incremental clustering. In Proceedings of the 7th International Conference on Information Security and Cryptology, Seoul, Korea, 2004. Springer Verlag.
[3] T. Chyssler, S. Burschka, M. Semling, T. Lingvall, and K. Burbeck. Alarm reduction and correlation in intrusion detection systems. In Proceedings of Detection of Intrusions and Malware & Vulnerability Assessment, GI SIG SIDAR Workshop (DIMVA 2004), volume 46 of LNI, pages 9–24, Dortmund, Germany, July 2004. GI.
[4] C. Elkan. Results of the KDD'99 classifier learning. ACM SIGKDD Explorations, 1(2):63–64, 2000.
[5] Y. Guan, A. A. Ghorbani, and N. Belacel. Y-means: a clustering method for intrusion detection. In Canadian Conference on AI, volume 2671 of Lecture Notes in Computer Science, pages 616–617, Montreal, Canada, 2003. Springer.
[6] J. Haines, D. Kewley Ryder, L. Tinnel, and S. Taylor. Validation of sensor alert correlators. IEEE Security and Privacy, 1(1):46–56, 2003.
[7] J. Han and M. Kamber. Data Mining – Concepts and Techniques. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2001.
[8] M. V. Mahoney and P. K. Chan. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In Recent Advances in Intrusion Detection, volume 2820 of Lecture Notes in Computer Science, pages 220–237, Pittsburgh, PA, USA, 2003. Springer.
[9] S. Mukkamala, G. Janoski, and A. Sung. Intrusion detection using neural networks and support vector machines. In Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN '02), pages 1702–1707, Honolulu, HI, 2002. Institute of Electrical and Electronics Engineers Inc.
[10] J. Munson and S. Wimer. Watcher: the missing piece of the security puzzle. In Proceedings of the 17th Annual Computer Security Applications Conference, pages 230–239, New Orleans, LA, USA, 2001. IEEE Computer Society.
[11] University of California, Irvine. The UCI KDD archive, 2003. http://kdd.ics.uci.edu. Accessed February 2004.
[12] L. Portnoy, E. Eskin, and S. Stolfo. Intrusion detection with unlabeled data using clustering. In ACM Workshop on Data Mining Applied to Security, 2001.
[13] Safeguard. The Safeguard project, 2003. http://www.ist-safeguard.org/. Accessed May 2004.
[14] K. Sequeira and M. Zaki. ADMIT: anomaly-based data mining for intrusions. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 386–395, Edmonton, Alberta, Canada, 2002. ACM Press.
[15] T. Zhang, R. Ramakrishnan, and M. Livny. BIRCH: an efficient data clustering method for very large databases. In SIGMOD Record, Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, 25(2):103–114, 1996.