information security technical report 12 (2007) 56–67


Adaptive real-time anomaly detection with incremental clustering

Kalle Burbeck, Simin Nadjm-Tehrani*

Department of Computer and Information Science, Linköping University, Sweden

Published online 7 March 2007

Abstract

Anomaly detection in information (IP) networks, detection of deviations from what is considered normal, is an important complement to misuse detection based on known attack descriptions. Performing anomaly detection in real-time places hard requirements on the algorithms used. First, to deal with the massive data volumes one needs to have efficient data structures and indexing mechanisms. Secondly, the dynamic nature of today's information networks makes the characterisation of normal requests and services difficult. What is considered as normal during some time interval may be classified as abnormal in a new context, and vice versa. These factors make many proposed techniques less suitable for real-time intrusion detection. In this paper we present ADWICE, Anomaly Detection With fast Incremental Clustering, and propose a new grid index that is shown to improve detection performance while preserving efficiency in search. Moreover, we propose two mechanisms for adaptive evolution of the normality model: incremental extension with new elements of normal behaviour, and a new feature that enables forgetting of outdated elements of normal behaviour. These address the needs of a dynamic network environment such as a telecom management network. We evaluate the technique for network-based intrusion detection, using the KDD data set as well as data from a telecom IP test network. The experiments show good detection quality and act as proof of concept for adaptation of normality.

© 2007 Elsevier Ltd. All rights reserved.

1. Introduction

The threats to computer-based systems on which several critical infrastructures depend are alarmingly increasing, thereby increasing the need for technology to handle those threats (Yegneswaran et al., 2003). Increasing use of software components brings the benefit of rapid deployment and flexibility, together with the vulnerability to accidental and targeted misuse, making service availability the key challenge for many businesses and government agencies. The cycle from detection of some vulnerability in a software product to the creation and application of a patch in all critical systems is unfortunately too long. It is long enough for malicious actors to develop an attack tool, release it to a wider group, and launch massive attacks before the patches are in place. One example is the Zotob-A worm and its variants (F-secure, 2005). It took only four days from the release of the patch by Microsoft (9 August 2005) before worms exploiting the vulnerability were spreading on the Internet (13 August 2005).

Rapid patching is important but not enough (Lippmann et al., 2002). For a production system continuous patching may not be viable due to system complexity and diversity as well as compatibility requirements. For critical systems,

* Corresponding author. E-mail address: [email protected] (S. Nadjm-Tehrani).
doi:10.1016/j.istr.2007.02.004

defence in depth is needed, incorporating many different security technologies (Venter and Eloff, 2003). This includes firewalls at network boundaries and on individual hosts, removal of unused software and services, virus scanners, and so on; but it is increasingly evident that intrusion detection systems should be part of the hardened defence.

Intrusion Detection Systems (IDS) attempt to respond to this trend by applying knowledge-based techniques (typically realised as signature-based misuse detection) or behaviour-based techniques (e.g. applied for detection of anomalies). Also, due to the increasing complexity of the intrusion detection task, the use of many IDS sensors to increase coverage, and the need for improved usability of intrusion detection, a recent trend is alert or event correlation (Morin and Hervé, 2003; Haines et al., 2003; Chyssler et al., 2004). Correlation combines information from multiple sources to improve information quality. By correlation the strengths of different types of detection schemes may be combined, and weaknesses compensated for.

The main detection scheme of most commercial intrusion detection systems is misuse detection, where known bad behaviours (attacks) are encoded into signatures. Misuse detection faces problems unless the signature database is kept up to date, and even then it can only detect attacks that are well known and for which signatures have been written.

An alternative approach is anomaly detection, in which normal behaviour of users or the protected system is modelled, often using machine learning or data mining techniques. During detection, new data are matched against the normality model, and deviations are marked as anomalies. Since no knowledge of attacks is needed to train the normality model, anomaly detection may detect previously unknown attacks. If an attack tool is published before a patch is applied and before the attack signatures are developed or installed, the anomaly detection system may be the only remaining defence. Some attack types, including a subset of denial of service and scanning attacks, alter the statistical distribution of the system data when present. This implies that anomaly detection may be a general and perhaps the most viable approach to detect such attacks.

Anomaly detection still faces many challenges, one of the most important being the relatively high rate of false alarms (false positives). The problem of capturing a complex normality makes a high rate of false positives intrinsic to anomaly detection. We argue that the usefulness of anomaly detection is increased if it is combined with further aggregation, correlation and analysis of alarms, thereby minimizing the number of false alarms propagated to the administrator (or automated response system) that further diagnoses the scenario.

The fact that normality changes constantly makes the false alarm problem even worse. A model with acceptable rates of false alarms may rapidly deteriorate in quality when normality changes over time. To minimize this problem and the resulting additional false alarms, the anomaly detection model needs to be adaptable.

The training of the normality model for anomaly detection may be performed by a variety of techniques, and many approaches have been evaluated. One important technique is clustering, where similar data points are grouped together into clusters using a distance function. As a data mining technique, clustering fits anomaly detection very well, since no knowledge of the attack classes is needed during training. Contrast this with classification, where the classification algorithm needs to be presented with both normal and known attack data to be able to separate those classes during detection.

In this paper we present Anomaly Detection With fast Incremental Clustering (ADWICE), a novel adaptive anomaly detection scheme, inspired by the BIRCH clustering algorithm (Zhang et al., 1996) and extended with new capabilities. It is based on an earlier presented version of the algorithm (Burbeck and Nadjm-Tehrani, 2004) but complements the original technique with a novel search index that increases detection quality, and provides new means to handle adaptation (forgetting). We show the application of the new mechanisms in two settings: (1) an emulated test network at a major telecom operator in Europe (Swisscom) for evaluating the scalability and timeliness of the algorithm, and (2) a comparative analysis of the detection quality of the algorithm based on the only common (open) data source available – the KDD99 attack data. Data mining approaches to anomaly detection are a recent area of exploration, and more work needs to be done to clarify their strengths by studies in several application domains. Therefore, this paper does not attempt to justify the appropriateness of clustering as a whole or to compare performance with alternative machine learning work. The interested reader is referred to an earlier report (Burbeck, 2006) that covers some of this ground.

The paper is organised as follows. In Section 2 the motivation of this work and its perspective with respect to related work are provided. Next we present the context of ADWICE within the Safeguard agent architecture. Section 4 describes the technique used, and the evaluation is presented in Section 5. The final section discusses the results and presents some remaining challenges for future work.

2. Motivation

2.1. IDS data problems and dependencies

One fundamental problem of intrusion detection research is the limited availability of good data to be used for evaluation. Producing intrusion detection data is a labour intensive and complex task involving generation of normal system data as well as attacks, and labelling the data to make evaluation possible. If a real network is used, the problem of producing good normal data is reduced, but then the data may be too sensitive to be released for public research comparison.

For learning based methods, good data are not only necessary for evaluation and testing, but also for training. Thus applying a learning based method in the real world puts even harder requirements on the data. The data used for training need to be representative of the network where the learning based method will be applied, possibly requiring generation of new data for each deployment. Classification based methods (Elkan, 2000; Mukkamala et al., 2002), or supervised learning, require training data that contain normal data as well as good representatives of those attacks that should be

detected, to be able to separate attacks from normality. Complete coverage of even known and recent attacks would be a daunting task indeed due to the abundance of attacks encountered globally. Even worse, the attacks in the training data set need to be labelled with the attack class or classes. This is in contrast with clustering-based methods that require no labelled training data set containing attacks (Portnoy et al., 2001; Sequeira and Zaki, 2002; Guan et al., 2003), thereby reducing the training data requirement. There exist at least two approaches.

When doing unsupervised anomaly detection, a model based on clusters of data is trained using unlabelled data, normal as well as attacks. The assumption is that the relative volume of attacks in the training data is very small compared to normal data – a reasonable assumption that may or may not hold in the real world context for which it is applied. If this assumption holds, anomalies and attacks may be detected based on cluster sizes: large clusters correspond to normal data, and small clusters possibly correspond to attacks. A number of unsupervised detection schemes have been evaluated on the KDD data set with varying success (including the above named works). The accuracy is, however, relatively low, which reduces the direct applicability in a real network.

In the second approach, which we simply denote (pure) anomaly detection in this paper, training data are assumed to consist only of normal data. Munson and Wimer (2001) used a cluster based model (Watcher) to protect a real web server, proving anomaly detection based on clustering to be useful in real life. Acceptable accuracy of the unsupervised anomaly detection scheme may be very hard to obtain, even though the idea is very attractive. Pure anomaly detection, with more knowledge of the data used for training, may be able to provide better accuracy than the unsupervised approach. Pure anomaly detection, similar to unsupervised anomaly detection, avoids the coverage problem of classification techniques, and requires no labelling of training data. Generating training data in a highly controlled network now simply consists of generating normal data. This is the approach adopted in this paper, and the normality of the training data in our case is ensured by access to a 100-node test network built specifically for experimental purposes in the European Safeguard project.¹

In a real live network with a connection to the Internet, data can never be assumed to be free of attacks. Pure anomaly detection also works when some attacks are included in the training data, but those attacks will be considered normal during detection and therefore not detected. To increase detection coverage, attacks should be removed to as large an extent as possible, making coverage a trade-off with data cleaning effort. An efficient approach is to use existing misuse detectors with updated rule-bases in the preparatory phase, to reduce costly human effort. Updated signature-based systems should with high probability detect many of the currently known attacks, simplifying removal of most attacks in training data. A possible complementary approach is to train temporary models on different data sets and let them vote on normality during a pre-training phase to decide what data to use for the final normality model.

Certain attacks, such as Denial of Service (DoS) and scanning, can produce large amounts of attack data. On the other hand, some normal types of system activities might produce limited amounts of data, but still be desirable to incorporate into the detection model. Those two cases falsify the assumption of unsupervised anomaly detection and need to be handled separately. Pure anomaly detection such as ADWICE does not have those problems, since detection is not based on cluster sizes.

2.2. IDS management effort

One of the inherent problems of anomaly detection is the false positive rate. In most settings normality is not easy to capture; it changes constantly, due to changing user behaviour as well as hardware or software changes. An algorithm that can perfectly capture the normality of static test data will therefore not necessarily work well in a real life setting with changing normality. The anomaly detection model needs to be adaptable. When possible, and if security policy allows, it should be autonomously adaptive to minimize the human effort. Otherwise, if automatic updates are undesirable, an administrator should be able to update the anomaly model with simple means, without destroying what is already learnt. Also, the effort spent updating the model should be minimal compared to the effort of training the initial model.

ADWICE is fully incremental, supporting easy adaptation and extension of the normality model. By fully incremental, we mean that instead of extensive periodic retraining sessions on stored off-line data, ADWICE has on-line training capabilities without destroying what is already learnt. Also, when subsets of the model are no longer useful, those clusters can be forgotten.

2.3. Scalability and performance issues

For critical infrastructures or valuable company computer-based assets it is important that intrusions are detected in real-time with minimal time-to-detection, to minimize the consequences of the intrusion. An intrusion detection system in a real-time environment needs to be fast enough to cope with the information flow, have explicit limits on resource usage, and also needs to adapt to changes in the protected network in real-time.

Many proposed clustering techniques require quadratic time for training (Han and Kamber, 2001), making real-time adaptation of a cluster-based model hard. They may also not be scalable, requiring all training data to be kept in main memory during training, which limits the size of the trained model. We argue that it is important to consider scalability and performance in parallel with detection quality when evaluating algorithms for intrusion detection. Most work on applications of data mining to intrusion detection considers those issues to a very limited degree or not at all. ADWICE performance is linear in the number of input data, thereby reducing training time compared to other algorithms. Training time as well as detection time is further reduced by using an integrated search index.

¹ The Safeguard project was an IST FP5 European project running 2001–2004.

To sum up, when compared with similar approaches, ADWICE:

– is scalable, since it avoids keeping data in memory, representing clusters by compact summaries;
– has good performance, since it uses local clustering and an integrated tree index for searching the model.

3. The Safeguard context

Safeguard (2001–2004) was a European research project aiming to enhance the survivability of critical infrastructures by using agent technology. The Safeguard agent architecture is presented in Fig. 1. The agents should improve survivability of large complex critical infrastructures (LCCIs) by detecting and handling intrusions as well as faults in the protected systems. Much of the diagnosis and detection, however, builds upon existing sensors and (infosec) devices that are already in place in many such networks. The main contributions of Safeguard are the conceptualisation of the needed functions, the development of new techniques for anomaly detection, and the addition of correlation engines and automatic reaction mechanisms for recovery and self-healing. The parallel activities of these agents relieve the human operators, while at the same time allowing dynamic adaptation to new needs in terms of the situation awareness of operators.

The key to a generic solution applicable in many infrastructures is in the definition of roles for various agents. These are believed to be common for the defence of many infrastructures, but should be instantiated to more specific roles in each domain. The Safeguard project demonstrated the instantiation of this agent architecture in the electricity and telecom management domains.

ADWICE was developed as the anomaly detector engine for one instance of a Safeguard agent and demonstrated in the telecom domain. In recent years, the algorithm has been improved and further developed. Before presenting the details of ADWICE, it is relevant to describe the background against which it was developed. The Safeguard generic roles can be described as follows:

– Wrapper agents wrap standard infosec devices and existing LCCI diagnosis mechanisms, and provide their outputs, after some filtering and normalisation, for use by other agents. Examples are Snort, Syslog, Samhain and other host or network sensors.
– Topology agents gather dynamic network topology information, e.g. host types, operating system types, services provided, and known vulnerabilities.
– Hybrid detector agents utilise domain knowledge for a given infrastructure, but combine it with behavioural intrusion detection mechanisms.
– Correlation agents identify problems that are difficult to diagnose with one source of information in the network, by using several sources of information from wrapper, topology, or hybrid detector agents. They use the data sources to order, filter and focus on certain alarms, or to predict reduced availability of network critical services. An example is a correlation agent that performs adaptive filtering and aggregation of the alarms to reduce the (false) alarm rates.
– Action agents enable automatic and semi-automatic responses when a problem is definitely (or with high probability) identified.
– Negotiation agents communicate with agents in other LCCIs to request services and pass on information about major security alarms.
– HMI (Human–Machine Interface) agents provide an appropriate interface, including overview, for one or many system operators. An example element of such an interface is a security dashboard indicating the (application-dependent) health of the network.
– Actuators are wrappers for interacting with lower layer software and hardware (e.g. changing firewall rules, adjusting configurations of detection agents/sensors).

Note that several instances of each agent type may be present to provide resilience and the benefits of distribution.

In the context of a management network for telecom service providers the following needs were identified and addressed:

– Reducing information overload (Chyssler et al., 2004).
– Increasing coverage by providing new sources of information using e.g. anomaly detection (Burbeck and Nadjm-Tehrani, 2004, and the extensions in this paper; Bigham et al., 2003; Gamez et al., 2005).
– Increasing information quality by reducing false positives (Chyssler et al., 2004).
– Collating information, such as correlating alarms (Chyssler et al., 2004) and combining them with topology information (demonstrated in the final Safeguard demo).
– Presenting a global view of a network (demonstrated in the final Safeguard demo).

This paper presents one instance of a Hybrid detector agent. The notion of hybrid comes from combining ADWICE with a white list, i.e. a simple specification-based element (see Sekar et al., 2002), deployed to detect anomalies.
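The combination can be pictured with a minimal sketch (names and the white-list format are invented for illustration; the actual Safeguard white list and the ADWICE interface are richer than this):

```python
class WhiteList:
    """Toy specification-based element: a set of known-good (service, port)
    pairs. Stands in for the white list of the hybrid detector agent."""

    def __init__(self, allowed):
        self.allowed = set(allowed)

    def matches(self, event):
        return (event["service"], event["port"]) in self.allowed


def hybrid_detect(event, whitelist, anomaly_test):
    """White-listed events are accepted outright; only the remaining events
    reach the anomaly engine (an arbitrary callable standing in for ADWICE)."""
    if whitelist.matches(event):
        return "normal"
    return "anomaly" if anomaly_test(event) else "normal"


wl = WhiteList({("dns", 53), ("http", 80)})
flag_everything = lambda e: True  # stand-in anomaly engine that flags all input
print(hybrid_detect({"service": "dns", "port": 53}, wl, flag_everything))   # normal
print(hybrid_detect({"service": "smtp", "port": 25}, wl, flag_everything))  # anomaly
```

The white list takes precedence, so known-good traffic never generates anomaly alarms, which is one way such a specification-based element can reduce false positives.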

4. The anomaly detection algorithm ADWICE

This section describes how ADWICE handles training and detection.

Fig. 1 – The Safeguard agent architecture.

The present implementation of ADWICE requires data to be numeric. Non-numeric data are, therefore, assumed to be transformed into numeric format by pre-processing.

4.1. Basic concepts

An ADWICE model consists of a number of clusters, a number of parameters, and a tree index in which the leaves contain the clusters. We reuse the central idea of BIRCH, that is, to store only condensed information (a cluster feature) instead of all data points of a cluster. A cluster feature is a triple CF = (n, S, SS), where n is the number of data points in the cluster, S is the linear sum of the n data points, and SS is the square sum of all data points. Given n d-dimensional data vectors vi and a cluster CF representing {vi | i = 1, …, n}, the centroid v0 and radius R(CF) are defined as:

v0 = (Σ_{i=1..n} vi) / n    (1)

R(CF) = sqrt( (Σ_{i=1..n} (vi − v0)²) / n )    (2)

R is the average distance from member points in the cluster to the centroid and is a measure of the tightness of the cluster around the centroid.

Given the CF of a cluster, the centroid v0 and radius R may be computed. The distance between a data point vi and a cluster CF is the Euclidian distance between vi and the centroid, denoted D(vi, CF), while the distance between two clusters CFi and CFj is the Euclidian distance between their centroids, denoted D(CFi, CFj). If two clusters CFi = (ni, Si, SSi) and CFj = (nj, Sj, SSj) are merged, the CF of the resulting cluster may be computed as (ni + nj, Si + Sj, SSi + SSj). This also holds if one of the CFs is based on only one data point, making incremental update of CFs possible.

A leaf node contains at most LS (leaf space) entries, each of the form (CFi) where i ∈ {1, …, LS}. Each CFi of the leaf node must satisfy a threshold requirement (TR) with respect to the threshold value T, which is evaluated to see if a cluster may absorb more data points. Two different threshold requirements have been evaluated with ADWICE. The first threshold requirement, where R(CFi) <= T, corresponds to a threshold requirement suggested in the original BIRCH paper and is therefore used as the baseline in this work (ADWICE-TRR). A large cluster may absorb a small group of data points located relatively far from the cluster centre. This small group of data points may be better represented by their own cluster, since detection is based on distances. A second threshold requirement was therefore developed, where D(vi, CFi) <= T needs to hold to allow the cluster CFi to absorb the new data point vi. This version was evaluated as ADWICE-TRD.

4.2. Training

The algorithm for training works with a parameter M (maximum size of the model, measured in total number of clusters, thus restricting memory requirements) and a parameter LS (leaf size, denoting the maximum number of clusters in each leaf). Some additional parameters are specific to the type of index tree used, and those are explained together with the indices later.

Given a model and a new data vector v, a search for the closest cluster is performed. If the threshold requirement is fulfilled, the new data point may be merged with the closest cluster; otherwise a new cluster needs to be inserted. If the size (number of clusters) of the model has reached the maximum M, the threshold T is increased, the model rebuilt, and then v is inserted in the new model.

Below is an algorithmic description of the training phase of ADWICE, in which only the main points of the algorithm are presented and some simplifications are made to facilitate presentation. Note that we have abstracted away the index, and present the specific workings of the original BIRCH index as well as a grid based index in the next sections.

train(v, model)
  closestCF = findClosestCF(v, model)
  IF thresholdRequirementOK(v, closestCF) THEN
    merge(v, closestCF)
  ELSE
    IF size(model) >= M THEN
      increaseThreshold()
      model = rebuild(model)
      train(v, model)
    ELSE
      leaf = getLeaf(v, model)
      IF spaceInLeaf(leaf) THEN
        insert(newCF(v), leaf)
      ELSE
        splitLeaf(leaf, newCF(v))

Rebuilding the model requires much less effort than the initial insertion of data, since only clusters rather than individual data points are inserted, and the number of clusters is significantly less than the number of data points. If the increase of T is too small, a new rebuild of the tree may be needed to reduce the size below M again. A heuristic described in the original BIRCH paper may be used for increasing the threshold to minimize the number of rebuilds, but in this work we use a simple constant to increase T conservatively (to avoid influencing the result by the heuristic).

If it is possible to add clusters to the model (the size is still below M), we find the leaf where the new cluster should be included and insert the cluster if there is still space in the leaf. Otherwise we need to split the leaf, and insert the new cluster in the most suitable of the new leaves.

4.2.1. Using the original BIRCH index

The original BIRCH index consists of a CF tree, which is a height-balanced tree with four parameters: branching factor (B), threshold (T), maximum number of clusters (M), and leaf size (LS). Each non-leaf node contains at most B entries of the form (CFi, childi), where i ∈ {1, …, B} and childi is a pointer to the node's i-th child. Each CF at non-leaf level summarises all child CFs in the level below.
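The cluster-feature arithmetic of Section 4.1 can be made concrete with the following sketch, which maintains (n, S, SS) incrementally and derives centroid, radius and distances from the summary alone. It is an illustrative sketch, not the authors' implementation: the index tree is omitted and the stand-in detector simply scans a flat list of clusters.

```python
import math

class CF:
    """Cluster feature (n, S, SS): point count, per-dimension linear sum
    and per-dimension square sum. Enough to derive centroid and radius."""

    def __init__(self, v):
        self.n = 1
        self.S = list(v)
        self.SS = [x * x for x in v]

    def absorb(self, v):
        # Incremental update when the cluster absorbs a new data point.
        self.n += 1
        self.S = [s + x for s, x in zip(self.S, v)]
        self.SS = [ss + x * x for ss, x in zip(self.SS, v)]

    def merge(self, other):
        # Merging two clusters: (ni + nj, Si + Sj, SSi + SSj).
        self.n += other.n
        self.S = [a + b for a, b in zip(self.S, other.S)]
        self.SS = [a + b for a, b in zip(self.SS, other.SS)]

    def centroid(self):
        return [s / self.n for s in self.S]              # Eq. (1)

    def radius(self):
        # Eq. (2), expanded so only n, S, SS are needed:
        # sum((x - c)^2) = SS - 2*c*S + n*c^2, per dimension.
        c = self.centroid()
        sq = sum(ss - 2 * ci * si + self.n * ci * ci
                 for ss, si, ci in zip(self.SS, self.S, c))
        return math.sqrt(max(sq, 0.0) / self.n)

    def distance(self, v):
        # D(v, CF): Euclidian distance from the point to the centroid.
        return math.dist(v, self.centroid())


def is_anomaly(v, model, T):
    """Flat stand-in for detection (no index tree): v is anomalous if no
    cluster centroid lies within T of it (ADWICE-TRD style requirement)."""
    return all(cf.distance(v) > T for cf in model)


cf = CF([0.0, 0.0])
cf.absorb([2.0, 0.0])
print(cf.centroid())                         # [1.0, 0.0]
print(cf.radius())                           # 1.0
print(is_anomaly([4.0, 0.0], [cf], T=2.0))   # True
```

Note how the radius is computed without access to the member points, which is exactly what makes the condensed CF representation scalable.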

Finding the closest cluster is done by recursively descending from the root to the closest leaf, choosing at each step the child i such that D(v, CFi) < D(v, CFj) for every other child j. When inserting new data into the tree, all nodes along the path to the root need to be updated. In the absence of a split, the CFs along the path to the updated leaf need to be recomputed to include v, by incrementally updating the CFs. If a split occurred, we need to insert a new non-leaf entry in the parent node of the two new leaves and recompute the CF summary for the new leaves. If there is free space in the parent node (i.e. the number of children is below B), the new non-leaf CF is inserted. Otherwise the parent is split in turn. Splitting may proceed all the way up to the root, in which case the depth of the tree increases when a new root is inserted. When splitting a leaf, the two farthest CFs of the leaf are selected as seeds, and all other CFj from the old leaf are distributed between the two new leaves: each CFj is merged with the leaf with the closest seed.

Of the three parameters T, B and M, the threshold T is the simplest to set, as it may be initialised to zero. The branching factor B influences the training and detection time, but may also influence detection accuracy. The original paper suggests using a branching factor of 15, but of course it does not consider anomaly detection accuracy, since the original algorithm is not used for this purpose. To our knowledge this is the first use of the CF based approach to anomaly detection.

The M parameter needs to be decided using experiments. Since it is only an upper bound on the number of clusters produced by the algorithm, it is easier to set than the exact number of clusters required by other clustering algorithms. As M limits the size of the CF tree, it is an upper bound on the memory usage of ADWICE. Note that in general M needs to be set much lower than the number of data represented by the normality model to avoid over-fitting (i.e. training a model which is very good at the training data but fails to produce good results for testing data that differs more or less from the training data). M also needs to be set high enough so that the number of clusters is enough for representing normality.

4.2.2. Problems of the original BIRCH index

The original BIRCH index is not perfect. An intrinsic property of the basic search mechanism results in a sub-optimal search. That is, the search for the closest cluster sometimes selects the wrong path and ends up in the wrong end of the tree. This is due to the very condensed information available to guide the search in each non-leaf node. Index errors may influence the detection quality (and evaluation) of an algorithm. Though an initial evaluation of the algorithm showed interesting results (Burbeck and Nadjm-Tehrani, 2004), we are interested in pinpointing the extent of improvement possible if index errors are completely removed. Table 1 shows how an index error influences anomaly detection results in general when a testing data vector is compared to the model.

Table 1 – Consequences of index errors for anomaly detection

Data is | Model covers data | Expected evaluation | Evaluation with index error
Normal  | Yes               | True negative       | False positive
Normal  | No                | False positive      | False positive
Attack  | Yes               | False negative      | True negative
Attack  | No                | True positive       | True positive

If the type of data is not included in the trained normality model, an anomaly detection algorithm with index errors returns the correct result, since there is no close cluster to find anyway. However, there are two cases where index errors may produce erroneous results. If the new data point is normal, and the model includes a close cluster, an index error results in a false positive instead of a true negative, thereby decreasing detection quality. On the other hand, if the new data are anomalous and such data have previously been (erroneously) included in the model, or if the attack data are very similar to normal data, the index error may result in a true positive, improving detection quality. In other words, index errors make the behaviour of the detection scheme unpredictable, since quality may both increase and decrease. The index errors do not completely invalidate evaluation, since if training data are to a large extent normal, the second case of error (causing improvement) is improbable. Still, the result may be improved using an index not causing index errors. In this paper we address this issue by introducing a new index.

4.2.3. The Grid-index

Our third implementation of ADWICE aims to eliminate the index errors caused by the original BIRCH index, but maintain its incremental training capabilities.

There are two possibilities for handling the adaptability requirement. The index may be updated together with the clusters whenever a model change occurs (which is the BIRCH approach), or the index may be independent of changes to the model. ADWICE-Grid follows the second principle.

A subspace of d-dimensional space is defined by two vectors, Max and Min, specifying for each dimension an interval or slice. A grid is a division of space into subspaces. In two dimensions this results in a space divided into rectangles. The idea of the new grid index is to use a grid tree where each node specifies a subspace of a certain depth corresponding to the depth of the node. Leaves contain the clusters, and are the maximum depth of subspaces for each part of the tree. Fig. 2 shows a two-dimensional grid divided into subspaces together with the corresponding grid tree (the tree is further explained below). Note that not all subspaces contained by leaves need to be at the same level, and that empty subspaces have no corresponding leaf. Our intuition and experience tell us that such a grid is sparsely populated. This means that suitable primitives such as hash tables (rather than lists) should be used to avoid empty subspaces taking up space in the index tree.

Before starting with the implementation we tested the performance of the primitive operations used in an index tree. In the case of the BIRCH index, the primitive operation is the distance function computing the Euclidian distance in multi-dimensional space. A performance evaluation shown in Fig. 3 revealed that a hash function call is 6–15 times faster than one call to the distance function, depending on the number of dimensions (60–20). Since at each node of the CF tree of BIRCH up to B (normally set to 10–20) branches may exist, using a hash table could be 100 times faster when the distance function is applied multiple times during linear search of

Leaf subspaces Corresponding grid tree

Dimension x slices 1 3 0 0.33 0.67 1 0 x 0, 0.33 0.67,1

1

0.25 Slice number 2 4

slices 2 y y 0.25, 0.5 0.75,1 0.5

3 0.75 Dimension 4

1 123 Slice number

Fig. 2 – Illustration of the basic notions of the grid and grid tree.
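The cost gap between the two primitive operations is easy to reproduce with a small timing sketch. This is a hypothetical illustration, not the benchmark behind Fig. 3; the absolute ratio depends on hardware and implementation language.

```python
import math
import random
import timeit

def euclidean(a, b):
    # Distance primitive used when searching the original CF tree.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
dims = 20
a = [random.random() for _ in range(dims)]
b = [random.random() for _ in range(dims)]
table = {i: "child" for i in range(16)}  # hash table mapping slice -> child

t_dist = timeit.timeit(lambda: euclidean(a, b), number=10_000)
t_hash = timeit.timeit(lambda: table.get(7), number=10_000)
print(f"distance/hash cost ratio (20 dims): {t_dist / t_hash:.1f}")
```

On any common setup the distance computation is many times slower than the hash lookup, which is the observation that motivates the grid index.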

Our grid index consists of a grid tree, which is a sparse, possibly unbalanced tree with three parameters: threshold (T), leaf size (LS) and maximum number of clusters (M). Each dimension i of the d-dimensional space is assumed to have a maximum (Maxi) and a minimum (Mini). These are in practice realized during feature extraction for unbounded domains, and lead to the division of the current dimension i into a certain number of intervals/slices (NumSlicesi) with a certain width (SliceWidthi). A function getSliceDimension(depth) is devised which maps a node depth to one dimension. Each non-leaf node of depth j contains at most NumSlicesi children, where each child of a non-leaf node is an entry in a hash table. The hash table is a mapping from interval number to a child node.

To find the closest cluster of a new data item v in a node with depth j (starting with the root), we first compute the slice dimension i = getSliceDimension(j). In the current node we then only consider dimension i of v. Given the value of v in dimension i (v[i]), the number of the interval into which v fits is computed. This interval number is mapped to a child using the hash table. In this way we find our way down to a leaf, where the data are located. In this leaf we may then do linear search among the clusters to find the closest.

Unfortunately there is a complication that makes the index more complex. There is no guarantee that the closest cluster actually is located in the same leaf as the data point itself. Merging of a data point with the closest cluster CFi may be performed if D(v,CFi) < T. This means that the closest CFi may be located inside any of the subspaces reachable within a distance T of the data point v. Accordingly we possibly need to search multiple paths at each node in the tree, depending on the value of T. At each non-leaf node at depth i we compute a search width [v[i] - T, v[i] + T]. All slices that overlap the search width are searched to find the closest cluster. Since space is sparse, many slices are not occupied. Since only children for occupied intervals exist as hash table entries, empty intervals do not need to be searched.

During training, there are two cases when the nodes of the grid tree need to be updated:

- If no cluster is close enough to absorb the data point, v is inserted into the model as a new cluster. If there does not exist a leaf subspace where the new cluster fits, a new leaf is created. However, there is no need for any additional updates of the tree, since nodes higher up do not contain any summary of data below.
- When the closest cluster absorbs v, its centroid is updated accordingly. This may cause the cluster to move in space. A cluster may potentially move outside its current subspace. In this case, the cluster is removed from its current leaf and inserted anew in the tree from the root, since the path all the way up to the root may have changed. If the cluster was the only one of the original leaf, the leaf itself is removed to keep unused subspaces without leaf representations.

Compared to the continuously updated original BIRCH index, the need for grid index updates is very small, since the first case above requires only insertion of one new leaf, and the second case occurs infrequently.

In case of a split of a leaf, the original leaf is transformed into a non-leaf node. The new node computes its split dimension according to its depth in the tree, and the clusters of the original leaf are inserted in the new node, resulting in the creation of leaves as children to the node.

Fig. 3 – Performance of primitive index operations (operations/ms): a hash lookup compared to the distance function D( ) in 60, 40 and 20 dimensions.

In most cases not all dimensions need to be used for slicing (thus limiting the height of the tree), assuming that the function getSliceDimension(depth) is selected appropriately.
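The descent and multi-path search described above can be sketched as follows. This is a hypothetical simplification: getSliceDimension simply cycles through the dimensions by depth, values are assumed normalised to [0, 1], and clusters are plain centroid points rather than CFs.

```python
import math

def get_slice_dimension(depth, num_dims):
    # Hypothetical choice: cycle through the dimensions by depth.
    return depth % num_dims

def slice_number(value, num_slices):
    # Map a value in [0, 1] to its interval/slice number.
    return min(int(value * num_slices), num_slices - 1)

def search(node, v, width, depth=0, num_slices=4):
    """Collect candidate clusters from all leaves whose slices overlap
    the search width [v[i] - width, v[i] + width] at each level."""
    if isinstance(node, list):          # leaf: list of cluster centroids
        return node
    i = get_slice_dimension(depth, len(v))
    lo = slice_number(max(v[i] - width, 0.0), num_slices)
    hi = slice_number(min(v[i] + width, 1.0), num_slices)
    found = []
    for s in range(lo, hi + 1):
        child = node.get(s)             # hash table: slice number -> child
        if child is not None:           # empty slices have no entry at all
            found.extend(search(child, v, width, depth + 1, num_slices))
    return found

def closest(candidates, v):
    return min(candidates, key=lambda c: math.dist(c, v))

# A two-level grid tree over [0, 1]^2 with 4 slices per dimension.
tree = {0: {3: [(0.1, 0.9)]}, 3: {0: [(0.9, 0.1), (0.8, 0.2)]}}
v = (0.86, 0.15)
print(closest(search(tree, v, width=0.1), v))  # -> (0.9, 0.1)
```

Note how only occupied slices are visited: the dictionary lookup skips empty intervals entirely, which is the property the sparse grid relies on.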

However, if the situation arises that all dimensions have been used for slicing, and still the number of clusters to be inserted does not fit in one leaf due to the limited leaf size (LS), then the LS parameter for that leaf can be increased, affecting only that leaf locally. This is the approach that our current implementation adopts. An alternative or complementary approach to handle this situation is to rebuild the tree using a smaller width of the intervals for each dimension.

The function getSliceDimension(depth) should be defined manually or automatically according to properties of the input data to avoid a tree with a large depth. For example, if all data have the same or almost the same value in a certain dimension, this dimension is not useful for slicing, since the depth of the tree will increase without distributing the clusters into multiple children.

4.3. Detection

The detection procedure is the same for any choice of index. During detection the model is searched for the closest cluster CFi (using the index). Then the distance D(v,CFi) from the centroid of the cluster to the new data point v is computed. Informally, if D is small, i.e. lower than a limit, v is similar to data included in the normality model and v should therefore be considered normal. If D is large, v is an anomaly.

Let the threshold T be the limit (L) used for detection. Using two parameters E1 and E2, MaxL = E1 · L and MinL = E2 · L may be computed. Then we compute the belief that v is anomalous using the formula below:

  Belief = 0                            if D ≤ MinL
  Belief = 1                            if D ≥ MaxL                (3)
  Belief = (D - MinL) / (MaxL - MinL)   if MinL < D < MaxL

A belief threshold (BT) is then used to make the final decision. If belief ≥ BT, we consider v anomalous and raise an alarm. The belief threshold may be used by the administrator to change the sensitivity of the anomaly detection. For the rest of the paper, to simplify the evaluation, we set E1 = E2 = E so that v is anomalous if and only if D > MaxL.

For the grid tree the initial search for the closest cluster differs slightly from the search during training: when deciding the search width, MaxL rather than T needs to be used. Since the BIRCH index uses no notion of search width, search during detection is exactly the same as during training.
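The decision rule of Eq. (3) can be written directly as a small function. This is a sketch; the parameter values below are illustrative, not taken from the paper's experiments.

```python
def belief(d, limit, e1=2.0, e2=0.5):
    """Belief that a data point at distance d from its closest
    cluster centroid is anomalous, following Eq. (3)."""
    max_l = e1 * limit   # MaxL = E1 * L
    min_l = e2 * limit   # MinL = E2 * L
    if d <= min_l:
        return 0.0
    if d >= max_l:
        return 1.0
    return (d - min_l) / (max_l - min_l)

# With E1 = E2 = E, the rule degenerates to: alarm iff d > MaxL.
print(belief(0.4, limit=1.0))   # below MinL -> 0.0
print(belief(1.25, limit=1.0))  # between MinL=0.5 and MaxL=2.0 -> 0.5
print(belief(3.0, limit=1.0))   # above MaxL -> 1.0
```

An alarm would be raised whenever this belief reaches the administrator-chosen threshold BT.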
4.4. Adaptation of the normality model

As described earlier, agents need to be adaptable in order to cope with varying LCCI conditions, including changing normality. Here we describe two scenarios in which it is very useful to have an incremental algorithm in order to adapt to changing normality.

Scenario 1: New cases of normality require the model to adapt incrementally. In some settings, it may be useful to let the normality model relearn autonomously. If normality drifts slowly, an incremental clustering algorithm may handle this in real-time during detection by incorporating every test data classified as normal with a certain confidence into the normality model. If slower drift of normality is required, a subset of those data based on sampling could be incorporated into the normality model. Even if autonomous relearning is not allowed in a specific network setting, there is need for model adaptation. Imagine that the ADWICE normality model has been trained, and is producing good detection results for a specific network during some time interval. However, in an extension of the interval the administrator recognizes that normality has changed and a new class of data needs to be included as normal. Otherwise, this new normality class produces false positives. Due to the incremental property, the administrator can incorporate this new class without relearning the working fragment of the existing normality model. The administrator may interleave incremental training with detection, completely eliminating the need for downtime required by non-incremental approaches.

Scenario 2: The opposite scenario is when the model of normality needs to shrink. That is, something that was considered normal earlier is now considered as undesirable. In our approach this situation is adapted to by forgetting the segment of normality that is no longer considered as normal. This too can be performed autonomously or manually. The current autonomous forgetting process checks the model periodically (CheckPeriod) to decide what clusters can be forgotten. If the difference between the current time and the time of last use is larger than a forgetting threshold (RememberPeriod), the cluster is removed from the model. The rest of the model is not influenced.

Each time a cluster is used, either during training or detection, forgetting of the cluster is postponed, similar to the positive influence of repetition in case of human learning.
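The forgetting process of Scenario 2 can be sketched as follows. The data structure is hypothetical; the parameter names mirror the text, and the time unit is assumed to be hours.

```python
from dataclasses import dataclass

CHECK_PERIOD = 12      # hours between periodic checks of the model
REMEMBER_PERIOD = 72   # forgetting threshold in hours

@dataclass
class Cluster:
    centroid: tuple
    last_used: float   # time (hours) the cluster last absorbed/matched data

def touch(cluster, now):
    # Using a cluster (in training or detection) postpones its forgetting.
    cluster.last_used = now

def forget(model, now):
    """Drop clusters not used within RememberPeriod; the rest of the
    model is not influenced."""
    return [c for c in model if now - c.last_used <= REMEMBER_PERIOD]

model = [Cluster((0.1, 0.2), last_used=10.0),
         Cluster((0.8, 0.9), last_used=95.0)]
touch(model[1], now=100.0)
model = forget(model, now=100.0)
print(len(model))  # -> 1: the cluster last used at hour 10 is forgotten
```

In the real agent, forget would run every CheckPeriod hours rather than on demand.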
5. Evaluation

In all the following experiments ADWICE with a grid index is used unless otherwise stated. To make all data numeric, non-numeric features ranging over n values are made numeric by distributing the distinct values over the interval [0, 1]. However, two distinct values of such a feature (e.g. http, ftp) should be considered equally close, regardless of where in the [0, 1] interval they are placed. This intuition cannot be captured without extending the present ADWICE algorithm. Instead the non-numeric values with n > 2 distinct values are scaled with a weight w. If w/n > 1 this forces the algorithm to place two data points that differ in such non-numeric multi-valued attributes in different clusters. This builds on the assumption that the threshold should be significantly less than one (M ≫ the distinct number of combinations of multi-valued non-numeric attributes). This should be enforced since numerical values are scaled to [0, 1]. Otherwise, a large difference in numerical attributes will anyway cause data to end up in the same cluster, making the model too general. If those multi-valued attributes are equal, naturally the difference in the numerical attributes decides whether two data items end up in the same cluster.
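The encoding of multi-valued non-numeric features can be sketched as follows. The helper and the weight value are hypothetical illustrations of the scheme described above.

```python
def encode(value, values, w=4.0):
    """Spread n distinct non-numeric values evenly over [0, 1], then
    scale by weight w so that two differing values are pushed into
    different clusters (the text requires w/n > 1 for this)."""
    n = len(values)
    return w * (sorted(values).index(value) / (n - 1))

services = ["ftp", "http", "smtp"]
print(encode("ftp", services))   # -> 0.0
print(encode("http", services))  # -> 2.0
print(encode("smtp", services))  # -> 4.0
```

With w = 4 and n = 3 the condition w/n > 1 holds, so any two distinct service values differ by at least 2.0, far above a threshold that is significantly less than one.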

5.1. Data sets

Performing attacks in real networks to evaluate on-line anomaly detection is most often unrealistic. Our work deals with this inherently synthetic situation as follows. We choose to start evaluation of detection quality using the KDDCUP99 intrusion detection data set (Hettich and Bay, 1999), to be able to compare the results with earlier evaluations, and to test the scalability of ADWICE with respect to the number of features and real-time properties. Despite the shortcomings of the DARPA related data sets (Mahoney and Chan, 2003; McHugh, 2000), they have been used in at least twenty research papers and are unfortunately currently the only openly available data sets commonly used for comparison purposes. Since the KDD data set is already pre-processed into session records suitable for classification, it is an efficient way of evaluating a method over many attack types. The purpose of this evaluation is therefore a proof of concept for ADWICE, including the improved grid index. We then go on to evaluate the algorithm in the Safeguard test network, built with the aim of emulating a realistic telecom management network.

The original KDD training data set consists of almost five million session records, where each session record consists of 41 fields (e.g. IP flags set, service, content based features, traffic statistics) summarizing a TCP session or UDP connection. Three features (protocol, flag and service) are multi-valued non-numeric features and are transformed accordingly. Since ADWICE assumes all training data to be normal, attack data are removed from the KDD training data set and only the resulting normal data (972 781 records) are used for training. All 41 fields of the normal data are considered by ADWICE to build the (41-dimensional) model.

The testing data set consists of 311 029 session records, of which 60 593 are normal and the other 250 436 records belong to 37 different attack types ranging from IP sweeps to buffer overflow attacks. The use of almost one million data records for training and more than 300 000 data for testing in the evaluation presented below illustrates the scalability of ADWICE.

To illustrate forgetting and incremental training, a data set generated at the Safeguard test network was used. A period of three days' data (2004-03-08 00:00 until 2004-03-11 00:00) is used for training an initial model, while the following seven days of data (2004-03-11 00:00 until 2004-03-19 11:00) are used for testing. The features include source and destination IP and port, time of day, connection length, bytes transferred and a flag indicating a parsing error. At this point no features are based on content, to avoid degrading the performance of pre-processing (parsing tcpdump data and computing features) due to the real-time requirement of the Safeguard architecture. Fig. 4 shows how the Hybrid detection agent with ADWICE remotely accesses its data sources inside the Safeguard test network and after processing sends alarms to a correlation agent performing aggregation.

Fig. 4 – Remote data access in the Safeguard network.

5.2. Detection quality of ADWICE

We have evaluated three different versions of ADWICE, as shown in Fig. 5. The trade-off between detection rate and false positives rate is realised by changing the detection parameter E (from 5 on the leftmost measure to 1 on the rightmost). The difference between ADWICE-TRR and TRD is the threshold requirement (see Section 4.1). Both use the original CF-tree index. ADWICE-TRR is the variant that most closely resembles the BIRCH clustering algorithm. ADWICE-TRD improves detection quality by using distance not only for detection but also for the threshold requirement. ADWICE-Grid uses the distance based threshold requirement and the new grid index rather than the original BIRCH index, eliminating the index-misses and thereby further improving the result.

Fig. 5 – Detection quality of ADWICE on KDD data (detection rate versus false positives rate for ADWICE-TRR, ADWICE-TRD and ADWICE-Grid).

Significant reduction of ADWICE false alarms using timestamp-based aggregation at correlation level is possible. ADWICE-TRD was successfully tested with this technique, the results of which can be found in an earlier publication (Burbeck and Nadjm-Tehrani, 2004).

5.3. Incremental training

An important feature of this work is that the original model, known to reflect recent normality, does not need to be retrained as soon as new cases of normality are encountered. This is especially valuable in dynamic sectors like telecom, where additions of new services and operations are the norm. We evaluated this feature on three days of training data from the Safeguard test network for building an initial normality model. The model is then used for detection on the seven days of testing data. When certain types of traffic (new cases of normality) start producing false alarms, the administrator tells ADWICE to incrementally learn the data causing those alarms to avoid similar false alarms in the future. Fig. 6 shows alarms for host x.x.202.183 in three scenarios.
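The incremental learning step itself can be sketched in a few lines. This is a strong simplification of ADWICE: clusters are kept as centroid/count pairs in a flat list, the index is ignored, and the threshold value is an assumption for illustration.

```python
import math

def learn(model, v, threshold_t=0.3):
    """Absorb point v into the closest cluster if within threshold_t,
    otherwise add a new cluster. The rest of the model is untouched,
    so normality can be extended without retraining from scratch."""
    if model:
        centroid, count = min(model, key=lambda c: math.dist(c[0], v))
        if math.dist(centroid, v) < threshold_t:
            model.remove((centroid, count))
            new_centroid = tuple((c * count + x) / (count + 1)
                                 for c, x in zip(centroid, v))
            model.append((new_centroid, count + 1))
            return model
    model.append((tuple(v), 1))
    return model

model = []
for point in [(0.10, 0.10), (0.12, 0.10), (0.90, 0.90)]:
    learn(model, point)
print(len(model))  # -> 2: one cluster near (0.11, 0.10), one at (0.90, 0.90)
```

Incrementally learning the data behind a class of false alarms amounts to feeding exactly those records through this step while detection continues.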

Period number 1 starts at time 2004-03-11 00:00 (start of testing data) and each 2-h period presents the sum of alarms related to host x.x.202.183 during the corresponding time interval. At 2004-03-14 12:00, corresponding to period number 43, the host is connected to the network.

In the first scenario, no incremental training is used, and the testing is performed on the original model. This corresponds to the first curve of Fig. 6. We see that when the host connects, ADWICE starts to produce alarms and this continues until the testing ends at period 102.

In the second scenario the administrator recognizes the new class of alarms as false positives. She tells ADWICE to learn the data resulting in those false alarms at time 2004-03-14 17:55 (end of period 45). The second curve shows that many of the original false alarms are no longer produced. However, at regular intervals there are still many alarms. Those intervals correspond to non-working hours.

In the third scenario incremental training is done in two steps. After the first incremental training at 2004-03-14, a second incremental training is initiated at 2004-03-15 07:55 (end of period 52) when the administrator notices that false alarms related to host x.x.202.183 are still produced. Fig. 6 shows how almost all alarms now disappear after the second incremental training period.

The need for the two-step incremental learning arose since the model differs between working hours and non-working hours. The alarms the administrator used for initial incremental training were all produced during working hours (2004-03-14 12:00 to 2004-03-14 17:55).

Fig. 6 – Adapting to changing normality with incremental training (alarms for host x.x.202.183 per 2-h period: original model, incremental 1, incremental 2).

5.4. Forgetting

In this section we illustrate the use of forgetting. A model is trained on three days of data and is then used for detection, with and without forgetting, on the following seven days of data. Fig. 7 shows alarms for one instance of traffic (host x.x.202.73, port 137) that ceases to be present in the (normal) testing data, making that kind of traffic anomalous. With forgetting, this fact is reflected in the normality model. In this experiment a CheckPeriod of 12 h and a RememberPeriod of three days (72 h) are used.

When traffic from host x.x.202.73 on port 137 is again visible in the data (periods 55–66), the traffic is detected as anomalous. Without forgetting these anomalies would go undetected.

Fig. 7 – Adapting to changing normality using forgetting (alarms for host x.x.202.73, port 137, per 2-h period: with and without forgetting).

5.5. Timing and throughput studies

During our experiments we concluded that the agent was able to process 1400 to 1900 session records per second. This was the case when data were read from a local file, using a model size of 12 000 clusters. The agent was executing on a host with a Pentium 4 1.8 GHz processor and 512 MB memory.

In the Safeguard test network multiple network sniffers were used, as illustrated in Fig. 4. Since one agent needs information from multiple hosts, remote data access has to be used. We observed that if the hosts on which the sniffers executed became overloaded, the sniffers would start dropping packets and attacks could pass by without detection. An alternative was tested with the agent running on its own host, avoiding the danger of sharing the processing resources with a sniffer. The down side was the additional time required to process data. When accessing multiple files remotely we used encryption of traffic by SSH. This decreased performance, resulting in 500 session records processed per second.

6. Conclusions and future work

We have developed and evaluated an approach to fully adaptive anomaly detection and shown its feasibility and scalability with convincing results. Details of the implementation of a number of Safeguard agents, including the agent that uses ADWICE, details of selected IP packet features, and the telecom test network can be found in a recent thesis (Burbeck, 2006). This also includes a detailed review of other works on agents for intrusion detection, clustering-based anomaly detection, and representatives from the non-clustering-based approaches, excluded here due to space limitations.

In clustering-based works, real-time detection and indexes for fast matching against the normality model are not part of the basic detection approach. We think, however, that it is important to include the index in the detection scheme from the start, since the index may influence not only performance, but also other properties such as adaptiveness.

Our experience with incremental training indicates the need for new techniques to complement the anomaly detector. Julisch (2003) describes how clustering of alarms is used to identify root causes (e.g. misconfiguration of a system resulting in false alarms). This technique could be used to suggest to the administrator that large classes of similar alarms may possibly be false positives that should be included into the model by incremental training.

Preliminary evaluations show that there may be a need for the administrator to tell the anomaly detector to learn not only the data producing the alarm, but also a generalisation of the data. For example, the two-step incremental training in Section 5.3 would not have been necessary if the administrator could have told the system to learn that the data producing alarms was normal both during working and non-working hours. Those experiences call for intelligent tools to help the administrator maintain an anomaly detector in a real environment.

We have here evaluated two kinds of threshold requirements, using radius (TRR) or distance (TRD). Using distance produces better detection quality, but using the radius is more resistant to outliers in the data, raising the question whether a combination of these two could lead to the best of both worlds. Also forgetting may be used for handling outliers. Extensions of the work will consider cluster size and frequency of usage when deciding whether a cluster ought to be forgotten. Large clusters, corresponding to very common normality in the past, should be very resistant against forgetting. With this approach also outliers resulting in small clusters in the model could be handled by forgetting them. Another improvement would be to gradually decrease the influence of clusters over time, rather than forgetting them completely.

For full evaluation of forgetting and incremental training, a long period of data should be used. We see this work as a proof of concept; it is improbable that parts of normality should be forgotten already after a few days in a real network. Evaluation including anomaly detection and correlation is still an important task for the intrusion detection community, and we think collaboration on this task is very important. Two approaches exist:

- A test network or simulator can be used for generation of data, thereby realising a fully controlled environment. Generation of publicly available data sets in the Safeguard test network is ongoing, but requires more resources, primarily to generate good normal data, but also to perform attacks.
- Data can be collected from real live networks. Here, normality is no longer a problem, but privacy is, and so is the issue of attack coverage. Emerging tools (Bishop, 2004) can be used to sanitize data, with some limitations. Sanitizing data while keeping relevant properties of the data so that intrusion detection analysis is still valid is not easy, especially for unstructured information, for example network packet content. Attacks may have to be inserted into data off-line if enough real attacks are not present in the data, since performing attacks in a real network is typically not a viable option.

We would like to explore both paths for evaluation in future work. Finally, it is worth highlighting that ADWICE is a general anomaly detector that can be applied to multi-dimensional data from other critical infrastructures. Application of the algorithm in new domains, and in particular on sensor data from water management systems, is a topic of current study.

Acknowledgements

This work was supported by the European project Safeguard IST-2001-32685 and CENIIT (Centre for Industrial Information Technology at Linköping University). We would like to thank Thomas Dagonnier, Tomas Lingvall and Stefan Burschka at Swisscom for fruitful discussions and their many months of work with the test network. The Safeguard agent architecture has been developed with input from all the research nodes of the project, the cooperation of whom is gratefully acknowledged. The second author was partially supported by the University of Luxembourg during preparation of the final version of this manuscript.

references

Bigham J, Gamez D, Lu N. Safeguarding SCADA systems with anomaly detection. In: Proceedings of mathematical methods, models and architectures for computer network security. Lecture notes in computer science, vol. 2776. Springer Verlag; 2003. p. 171–82.

Bishop M. How to sanitize data. In: Proceedings of the international workshops on enabling technologies: infrastructures for collaborative enterprises (WETICE04), IEEE Computer Society; June 2004.

Burbeck K. Adaptive real-time anomaly detection for safeguarding critical networks. Linköping University, ISBN 91-85497-23-1; February 2006 [Licentiate thesis, number 1231]. Available at: .

Burbeck K, Nadjm-Tehrani S. ADWICE – anomaly detection with fast incremental clustering. In: Proceedings of the seventh international conference on security and cryptology (ICICS'04). Springer Verlag; December 2004.

Chyssler T, Nadjm-Tehrani S, Burschka S, Burbeck K. Alarm reduction and correlation in defence of IP networks. In: Proceedings of the international workshops on enabling technologies: infrastructures for collaborative enterprises (WETICE04), IEEE Computer Society; June 2004.

Elkan C. Results of the KDD'99 classifier learning. ACM SIGKDD Explorations January 2000;1(2):63–4.

F-secure. Available at: [accessed August 2005].

Gamez D, Nadjm-Tehrani S, Bigham J, Balducelli C, Chyssler T, Burbeck K. Safeguarding critical infrastructures. In: Diab HB, Zomaya AZ, editors. Dependable computer systems: paradigms, performance issues and applications. John Wiley and Sons; 2005 [chapter 18].

Guan Y, Ghorbani AA, Belacel N. Y-means: a clustering method for intrusion detection. In: Proceedings of the IEEE Canadian conference on electrical and computer engineering; May 2003.

Haines J, Kewley Ryder D, Tinnel L, Taylor S. Validation of sensor alert correlators. IEEE Security & Privacy January 2003;1(1):46–56.

Han J, Kamber M. Data mining – concepts and techniques. Morgan Kaufmann Publishers Inc.; 2001.

Hettich S, Bay SD. The UCI KDD archive, <http://kdd.ics.uci.edu>; 1999 [accessed 10th July 2006].

Julisch K. Clustering intrusion detection alarms to support root cause analysis. ACM Transactions on Information and System Security (TISSEC) November 2003;6(4):443–71.

Lippmann R, Webster SE, Stetson D. The effect of identifying vulnerabilities and patching software on the utility of network intrusion detection. In: Proceedings of recent advances in intrusion detection (RAID'02). Lecture notes in computer science, vol. 2516. Springer Verlag; 2002. p. 307–26.

Mahoney MV, Chan PK. An analysis of the 1999 DARPA/Lincoln Laboratory evaluation data for network anomaly detection. In: Proceedings of the recent advances in intrusion detection (RAID'03). Lecture notes in computer science, vol. 2822. Springer Verlag; September 2003. p. 220–37.

McHugh J. Testing intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security 2000;3(4):262–94.

Morin B, Hervé H. Correlation of intrusion symptoms: an application of chronicles. In: Proceedings of the recent advances in intrusion detection (RAID'03). Lecture notes in computer science, vol. 2822. Springer Verlag; September 2003. p. 220–37.

Mukkamala S, Janoski G, Sung A. Intrusion detection using neural networks and support vector machines. In: Proceedings of the international joint conference on neural networks (IJCNN '02); May 2002. p. 1702–7.

Munson JC, Wimer S. Watcher: the missing piece of the security puzzle. In: Proceedings of the 17th annual computer security applications conference; December 2001. p. 230–9. Available at: .

Portnoy L, Eskin E, Stolfo S. Intrusion detection with unlabeled data using clustering. In: ACM workshop on data mining applied to security; November 2001.

Sekar R, Gupta A, Frullo J, Shanbhag T, Tiwari A, Yang H, et al. Specification based anomaly detection: a new approach for detecting network intrusions. In: Proceedings of the ACM conference on computer and communications security; 25–29 October 2002. p. 265–74.

Sequeira K, Zaki M. ADMIT: anomaly-based data mining for intrusions. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining; 23–26 July 2002. p. 386–95.

Venter HS, Eloff JHP. A taxonomy for information security technologies. Computers and Security 2003;22(4):299–307.

Yegneswaran V, Barford P, Ullrich J. Internet intrusions: global characteristics and prevalence. In: Proceedings of the 2003 ACM SIGMETRICS international conference on measurement and modeling of computer systems; 9–14 June 2003. p. 138–47.

Zhang T, Ramakrishnan R, Livny M. BIRCH: an efficient data clustering method for very large databases. In: SIGMOD record, 1996 ACM SIGMOD international conference on management of data, vol. 25(2); 4–6 June 1996. p. 103–14.