Unsupervised network anomaly detection

Johan Mazel

To cite this version:

Johan Mazel. Unsupervised network anomaly detection. Networking and Internet Architecture [cs.NI]. INSA de Toulouse, 2011. English. tel-00667654

HAL Id: tel-00667654 https://tel.archives-ouvertes.fr/tel-00667654 Submitted on 8 Feb 2012


THÈSE

In view of obtaining the Doctorat de l'Université de Toulouse
Delivered by: Institut National des Sciences Appliquées de Toulouse (INSA de Toulouse)

Presented and defended by Johan Mazel on Monday, 19 December 2011

Title: Unsupervised network anomaly detection

Doctoral school: ED MITT, Domaine STIC : Réseaux, Télécoms, Systèmes et Architecture
Research unit: LAAS-CNRS
Thesis advisors: Philippe Owezarski, Yann Labit
Reviewers: Olivier Festor, Guy Leduc, Kensuke Fukuda
Other jury members: Christophe Chassot, Sandrine Vaton

Acknowledgements

I would like to first thank my PhD advisors for their help, their support, and for letting me steer my research in my own direction. Their input and comments along the stages of this thesis have been highly valuable. I especially want to thank them for having been able to dive into my work without drowning, and then provide me with useful remarks.

I owe special thanks to Pedro Casas-Hernandez, who worked with me inside the ECODE project at LAAS-CNRS. Working with him for a year and a half has been a real pleasure. I think that the extent of this thesis's contributions would not have been the same without Pedro's work.

I extend special thanks to Ion Alberdi. He introduced me to functional programming through the OCaml language and pointed me toward new and exciting territories in programming and computer science in general. Our discussions have been among the most valuable I have ever had.

I also want to thank all the people working at LAAS-CNRS for the help they provided me, especially David Gauchard and Christian Berty for their exceptional availability and reactivity.

I finally want to thank my family, friends and colleagues at LAAS-CNRS. I won't quote any names, but be assured that I am truly grateful for your support during these three and a half years.

Abstract

Anomaly detection has become a vital component of any network in today's Internet. Ranging from non-malicious unexpected events such as flash crowds and failures, to network attacks such as denials-of-service and network scans, network traffic anomalies can have serious detrimental effects on the performance and integrity of the network. The continual emergence of new anomalies and attacks creates a continuous challenge to cope with events that put the network integrity at risk. Moreover, the inherently polymorphic nature of traffic, caused, among other things, by a highly changing protocol landscape, complicates the task of anomaly detection systems. In fact, most network anomaly detection systems proposed so far employ knowledge-dependent techniques, using either signature-based misuse detection methods or anomaly detection relying on supervised-learning techniques. However, both approaches present major limitations: the former fails to detect and characterize unknown anomalies (leaving the network unprotected for long periods), while the latter requires training over labeled normal traffic, which is a difficult and expensive stage that needs to be updated on a regular basis to follow network traffic evolution. Such limitations impose a serious bottleneck on the problem presented above.

We introduce an unsupervised approach to detect and characterize network anomalies without relying on signatures, statistical training, or labeled traffic, which represents a significant step towards the autonomy of networks. Unsupervised detection is accomplished by means of robust data-clustering techniques, combining Sub-Space clustering with Evidence Accumulation or Inter-Clustering Results Association, to blindly identify anomalies in traffic flows. Correlating the results of several unsupervised detections is also performed to improve detection robustness. The correlation results are further used along with other anomaly characteristics to build an anomaly hierarchy in terms of dangerousness. Characterization is then achieved by building efficient filtering rules to describe a detected anomaly. The detection and characterization performances and sensitivities to parameters are evaluated over a substantial subset of the MAWI repository, which contains real network traffic traces. Our work shows that these techniques allow anomaly detection systems to isolate anomalous traffic without any previous knowledge. We think that this contribution constitutes a great step towards autonomous network anomaly detection.

This PhD thesis has been funded through the ECODE project by the European Commission under Framework Programme 7. The goal of this project is to develop, implement, and validate experimentally a cognitive routing system that meets the challenges experienced by the Internet in terms of manageability and security, availability and accountability, as well as routing system scalability and quality. The use case addressed inside the ECODE project is network anomaly detection.

Keywords: anomaly detection, network traffic, unsupervised learning

Résumé

La détection d'anomalies est une tâche critique de l'administration des réseaux. L'apparition continue de nouvelles anomalies et la nature changeante du trafic réseau compliquent de fait la détection d'anomalies. Les méthodes existantes de détection d'anomalies s'appuient sur une connaissance préalable du trafic : soit via des signatures créées à partir d'anomalies connues, soit via un profil de normalité. Ces deux approches sont limitées : la première ne peut détecter les nouvelles anomalies et la seconde requiert une constante mise à jour de son profil de normalité. Ces deux aspects limitent de façon importante l'efficacité des méthodes de détection existantes.

Nous présentons une approche non-supervisée qui permet de détecter et caractériser les anomalies réseaux de façon autonome. Notre approche utilise des techniques de partitionnement afin d'identifier les flux anormaux. Nous proposons également plusieurs techniques qui permettent de traiter les anomalies extraites pour faciliter la tâche des opérateurs. Nous évaluons les performances de notre système sur des traces de trafic réel issues de la base de trace MAWI. Les résultats obtenus mettent en évidence la possibilité de mettre en place des systèmes de détection d'anomalies autonomes et fonctionnant sans connaissance préalable.

Mots-clés : détection d'anomalies, trafic réseau, apprentissage non-supervisé

Contents

1 Introduction
  1.1 Motivating problems
    1.1.1 Changing traffic in an evolving Internet
      1.1.1.1 Changing traffic
      1.1.1.2 Mutating anomalies
    1.1.2 Weaknesses of current network anomaly detection approaches
    1.1.3 The need for an autonomous anomaly detection
    1.1.4 Summary
  1.2 Contributions
    1.2.1 Unsupervised anomaly detection
    1.2.2 Anomaly post-processing
    1.2.3 Summary
  1.3 Summary
  1.4 Context inside the ECODE project
    1.4.1 Overview
    1.4.2 Use cases
    1.4.3 Proposed architecture
    1.4.4 Summary
  1.5 Dissertation outline

2 Related work
  2.1 Unsupervised machine learning
    2.1.1 Blind signal separation
    2.1.2 Neural networks
    2.1.3 Clustering algorithms
      2.1.3.1 General purpose clustering algorithms
      2.1.3.2 Clustering data in high-dimensional space
      2.1.3.3 Clustering evaluations
  2.2 Network anomaly detection and intrusion detection
    2.2.1 Classic network anomaly and intrusion detection
      2.2.1.1 Misuse detection-based intrusion and network anomaly detection
      2.2.1.2 Anomaly detection-based network anomaly detection
    2.2.2 Machine learning applied to network anomaly detection and intrusion detection

      2.2.2.1 Misuse and anomaly-based supervised network anomaly detection
      2.2.2.2 Supervised intrusion detection
      2.2.2.3 Unsupervised intrusion detection
  2.3 Summary

3 Unsupervised network anomaly detection
  3.1 Change-Detection, Multi-resolution Flow Aggregation & Attribute computation
    3.1.1 Change detection
    3.1.2 Multi-resolution Flow Aggregation
    3.1.3 Attribute building
      3.1.3.1 Metric definition
      3.1.3.2 Metric building
      3.1.3.3 Attributes generation over packet sets
    3.1.4 Summary
  3.2 Finding anomalies with unsupervised learning
    3.2.1 Traffic representations in feature space
      3.2.1.1 Normal traffic representations in feature space
      3.2.1.2 Anomalies representations in feature space
    3.2.2 Why we create our own clustering recipe?
    3.2.3 General presentation
    3.2.4 Splitting with Sub-Space Clustering
    3.2.5 Combining step
      3.2.5.1 Evidence Accumulation
      3.2.5.2 Inter-Clustering Results Association
    3.2.6 Normal traffic representation redefinition
    3.2.7 Summary
  3.3 Computational time analysis
    3.3.1 UNADA execution time analysis
    3.3.2 Clustering time simulation
    3.3.3 Summary
  3.4 Summary

4 Anomaly post-processing
  4.1 Anomaly correlation
  4.2 Anomaly ranking
  4.3 Automatic anomaly characterization
    4.3.1 Rule building
      4.3.1.1 Relative rules
      4.3.1.2 Absolute rules
    4.3.2 Signature building
      4.3.2.1 Ranking relative rules
      4.3.2.2 Combining rules
  4.4 Summary

5 Evaluation
  5.1 Single detailed case study: one time-slot in one trace in MAWI dataset
    5.1.1 Combination step with Inter-Clustering Results Association (ICRA)
      5.1.1.1 Combination step for clustering ensemble Pl3
      5.1.1.2 Combination step for clustering ensemble Pl6
      5.1.1.3 Similarity graph analysis
      5.1.1.4 Traffic segments extraction
    5.1.2 Anomaly post-processing
      5.1.2.1 Anomaly correlation
      5.1.2.2 Anomaly characterization
    5.1.3 Summary
  5.2 Performance evaluation
    5.2.1 MAWILab: a ground-truth for the MAWI dataset
    5.2.2 Preliminary results
    5.2.3 UNADA sensitivity analysis towards parameters
      5.2.3.1 Time window size
      5.2.3.2 Subspace clustering
      5.2.3.3 Inter-Clustering Results Association parameters
      5.2.3.4 Normal traffic representation
    5.2.4 Global evaluation on the MAWI dataset
  5.3 Summary

6 Conclusion
  6.1 Contributions
    6.1.1 Unsupervised network anomaly detection
    6.1.2 Anomaly post-processing
    6.1.3 Evaluation
  6.2 Prospectives
    6.2.1 Hypotheses on traffic nature
    6.2.2 Clustering technique
    6.2.3 Anomaly post-processing
      6.2.3.1 Anomaly correlation
      6.2.3.2 Anomaly ranking
      6.2.3.3 Anomaly characterization
  6.3 Summary

Résumé en français

A Introduction
  A.1 Problèmes initiaux
    A.1.1 Trafic changeant dans l'Internet
      A.1.1.1 Trafic changeant
      A.1.1.2 Anomalies changeantes
    A.1.2 Pourquoi les systèmes de détection d'anomalies du trafic actuels sont déficients ?
    A.1.3 Détection autonome d'anomalies du trafic

    A.1.4 Résumé
  A.2 Contributions
    A.2.1 Détection non-supervisée des anomalies du trafic
    A.2.2 Traitement des anomalies
  A.3 Contexte au sein du projet ECODE
  A.4 Plan du manuscrit

B État de l'art
  B.1 Apprentissage non-supervisé
  B.2 Détection des anomalies du trafic réseau
  B.3 Résumé

C Détection non-supervisée des anomalies
  C.1 Détection de changement, agrégation en flux multi-résolution et construction d'attributs
  C.2 Comment détecter des anomalies du trafic via l'apprentissage non-supervisé ?
    C.2.1 Représentations du trafic réseau
      C.2.1.1 Représentations du trafic normal
      C.2.1.2 Représentations du trafic anormal
    C.2.2 Présentation générale
  C.3 Redéfinition de la représentation du trafic normal
  C.4 Temps de calcul et parallélisation
  C.5 Résumé

D Traitement des anomalies
  D.1 Corrélation d'anomalies
  D.2 Hiérarchisation des anomalies
  D.3 Caractérisation automatique des anomalies
  D.4 Résumé

E Évaluation
  E.1 Cas d'étude : une fenêtre temporelle dans une trace de la base de trace MAWI
  E.2 Évaluation des performances d'UNADA
    E.2.1 MAWILab : une documentation pour les anomalies contenues dans MAWI
    E.2.2 Résultats préliminaires
    E.2.3 Analyse de sensibilité
    E.2.4 Analyse globale
  E.3 Résumé

F Conclusion

Acronyms

Bibliography

Chapter 1

Introduction

Anomaly detection in networks has been of constant interest to every player in the networking game. From Internet Service Providers (ISPs) to security researchers, many players have been trying to find efficient techniques to detect network anomalies. Network anomalies can be defined as events that represent a deviation from the normal state of the network. On one hand, security researchers find interest in anomalies such as scans, which can sometimes be viewed as signs of incoming attacks such as Denial of Service (DoS) or Distributed Denial of Service (DDoS). On the other hand, ISPs see anomalies such as DoS as threats to their day-to-day operations, since they can impact customers' experience or, even worse, disrupt the offered service. The attack targeting Georgia's IT systems during the summer of 2008 proved how vulnerable such Information Technology (IT) systems are to DoS attacks 1. A more recent example of such events is the "Avenge Assange" campaign that took place in December 2010. This campaign was organized by a group of online activists calling themselves Anonymous. It targeted several corporations that refused to support donations for the Wikileaks website. 2 , 3 Anomalies can thus be roughly defined as events that lead a part of network traffic to deviate from its normal or usual state. These examples clearly motivate the need to detect such events.

Several techniques have been used to detect such events. The first one is host anomaly detection. In this case, a detection system analyzes the behavior of a host and tries to assess the abnormality of its actions. The second type of detection system is the network anomaly detection system. These systems instead rely on network traffic analysis in order to decide whether a specific part of network traffic is anomalous or not. We focus our work on this particular type of detection system.

Network anomaly detection systems however face many problems. The main challenge in automatically detecting traffic anomalies is that these are moving targets. In fact, it is extremely difficult to precisely and permanently define the set of possible anomalies that may arise, especially in the case of network attacks, because new attacks, as well as new variants of already known attacks, are continuously emerging. This problem is worsened by the fact that anomalies appear inside a traffic landscape that is itself changing. Network traffic is in fact continuously evolving under many influences: increase in volume, topology changes, changes in application uses, etc.

1http://www.nytimes.com/2008/08/13/technology/13cyber.html
2http://www.guardian.co.uk/media/2010/dec/08/operation-payback-mastercard-website-wikileaks
3http://pandalabs.pandasecurity.com/operationpayback-Broadens-to-operation-avenge-assange/

This dissertation aims at designing a general anomaly detection system that would be able to detect a wide range of anomalies with diverse structures, using the least amount of previous knowledge and information, ideally none. The remainder of this chapter is composed of two sections. The first one introduces the different reasons that lead us to think that network anomaly detection systems need to be more autonomous, and how we plan to achieve this goal. The second section presents our contributions towards the design of such systems.

1.1 Motivating problems

There are several motivations for the introduction of knowledge-independent network anomaly detection systems. First, legitimate and illegitimate network traffic displays several types of changes along time and space. Second, current anomaly detection systems experience difficulties in coping with these behaviors. The next two sections address these topics.

1.1.1 Changing traffic in an evolving Internet

We address the mutating nature of traffic from two points of view: the changes of network traffic in general and the changes in terms of network anomalies.

1.1.1.1 Changing traffic

Network traffic is changing. These changes can be classified into three distinct phenomena. We here provide a short taxonomy of these phenomena. First, network traffic evolves along time, and according to several time-scales, concerning its volume in terms of number of packets and number of bytes. We name this characteristic traffic evolution. The second phenomenon is traffic change due to "spatial" variation. Traffic at the ingress router of an Autonomous System (AS) is different from traffic on a backbone link. Traffic also varies between two physical links or regions (country, continent, etc.). We call this phenomenon traffic disparity. Third, the inner nature of traffic changes over time through the creation or mutation of applications, and the emergence of new data exchange paradigms. We call this phenomenon traffic mutability. The next paragraphs detail these three phenomena.

• The first evolving property of Internet network traffic is its variation in terms of volume along time. This variation can be divided into two distinct sub-phenomena: first, the cyclic oscillations, and second, the ever-increasing volume. The first of these sub-phenomena is the daily and weekly variation of traffic. In [1], Roughan et al. study Simple Network Management Protocol (SNMP) data from one of the largest operational Internet backbones in North America (AT&T). In [2], Papagiannaki et al. process SNMP measurements from an IP backbone network. In [3], Cho et al. analyze both aggregated traffic from several Japanese Internet Service Providers (ISPs) and detailed records from a single ISP. All the works quoted above observe the daily and weekly variations presented previously.

The second of these sub-phenomena is the constant increase of traffic in terms of volume. We investigate this by looking at network traffic of the WIDE repository [4]. We consider traffic captured at samplepoints B and F, which are located on a trans-Pacific transit link between Japan and the United States of America. We are thus able to assess the bandwidth increase on this particular link. A simple analysis reveals that the average throughput evolved from approximately 10 Mbps in January 2001 to more than 250 Mbps in September 2011. This demonstrates the increase of traffic along time. In [2], Papagiannaki et al. observe that, for 2 out of 3 Points of Presence (PoPs), traffic volume increased from October 2000 to July 2002. Traffic volume in the third Point of Presence (PoP) remained constant over the years. In [5], Cho et al. show that traffic at major Japanese Internet exchange points has constantly increased over the last ten years. From 2005 to the start of 2008, network traffic roughly doubled every two years. In [6], the Cisco company declares that traffic increased eightfold over the last five years. Two main reasons for this constant growth can be identified. On one hand, the number of Internet users constantly increases. Figure 1.1 4 , 5 clearly exposes this trend over the years.


Figure 1.1: Number of Internet users in the world (a) and proportion of households connected to the Internet in Europe (b) (EU15 stands for the 15 European Union members between 1995 and 2004, EU25 for the 25 members between 2004 and 2007, and EU27 for the 27 members from 2007 onwards).

On the other hand, a race towards greater bandwidth for the end-user is happening as we speak. In [5], Cho et al. show that the numbers of Fiber-To-The-Home (FTTH) and cable TV subscribers in Japan constantly increased from 2000 to 2008. The number of Asymmetric Digital Subscriber Line (ADSL) subscribers displays a steeper increase, but reaches a peak approximately located at the start of 2006 and shows a slow decay afterwards. Figure 1.2 6 displays the broadband penetration rate per inhabitant and per household for three sets of European Union members (cf. caption). Figure 1.3 7 displays the increase in the number of broadband users worldwide. Both figures show that the number of broadband connections increases along time in Europe and thus causes an increase in traffic.

4http://data.un.org/Data.aspx?q=internet&d=MDG&f=seriesRowID%3a608
5http://epp.eurostat.ec.europa.eu/portal/page/portal/information_society/data/
6http://epp.eurostat.ec.europa.eu/portal/page/portal/information_society/introduction


Figure 1.2: Broadband penetration rate in Europe per inhabitant (a) and per household (b) (EU15 stands for the 15 European Union members between 1995 and 2004, EU25 for the 25 members between 2004 and 2007, and EU27 for the 27 members from 2007 onwards).

Figure 1.3: Number of broadband users in the world.

This demonstrates that the traffic volume has constantly increased over the last ten years.

• The second phenomenon exposed in our taxonomy is disparity. It designates the difference in traffic when it is observed from different physical points of view. In [7], Ringberg et al. expose the difference in terms of average number of packets per flow between traffic aggregated by input links, ingress routers or Origin-Destination flows. They also advance that traffic variability is itself highly changing. In [2], Papagiannaki et al. find that traffic displays different evolutions between three PoPs: traffic increases in two PoPs while it remains constant in another one. Such a difference in terms of temporal evolution implicitly confirms that there are differences in traffic nature between these PoPs.

• Network traffic, in its inner nature, has been highly changing along Internet evolution. As exposed before, we call this phenomenon traffic mutation. We illustrate this through two examples: first, the emergence of new protocols or applications that modify the shape of traffic, and second, the impact of civil society on networks through legislation.

The first factor that causes change in Internet traffic is the emergence of new protocols or services. Their progression can be extremely fast, unpredictable and impacting on traffic structure. For example, the Netflix Video on Demand (VoD) service in Canada is a good illustration of this fact. It was launched on September 22nd 2010 and has experienced constant growth in its number of users [8]. By the end of March 2011, Netflix traffic consumed 13.5 percent of downstream traffic during evening peak hours in Canada [9]. This share of traffic appeared in only six months. This illustrates the highly changing nature of network traffic composition on a relatively short time scale. On a bigger scale, and for information purposes, Netflix accounts for almost a quarter of total bytes and nearly 30% of peak-period downstream traffic in North America [9]. Another example of the changing protocol landscape is the constant increase in the use of web radios and web televisions. Figure 1.4 8 shows the increase in the number of individuals using the Internet to listen to web radios or watch web televisions. This shows that the protocols associated with these (relatively new) uses have grown and continue to grow, thus changing the protocol landscape.

7http://data.worldbank.org/indicator

Figure 1.4: Percentage of individuals using the Internet for listening to web radio or watching web television (EU25 stands for the 25 European Union members between 2004 and 2007, and EU27 for the 27 members from 2007 onwards).

The exchange of copyrighted data and the rise of the associated techniques is another example of the fast mutation of network traffic. Up to now, these exchanges have been using peer-to-peer protocols. The first episode of this phenomenon is the rise and fall of Napster between June 1999 and July 2001. 9 Its lifetime was short, but it quickly came to represent a significant share of traffic, since its volume was close to that of Hypertext Transfer Protocol (HTTP) traffic in certain conditions [10].

Peer-to-peer protocols can also be used as an example of the second cause of the changing nature of traffic: the direct impact of civil society on networks through legislation. In fact, in the Napster case, the illegal exchange of copyrighted material through its infrastructure initiated a series of lawsuits that directly caused the end of Napster. It is the very centralized nature of Napster that caused its death, by offering a single easy target. The pressure on peer-to-peer traffic and protocols then directly induced the mutation of peer-to-peer protocols toward a new decentralized paradigm illustrated by Gnutella-like protocols [11]. As a consequence, port-based traffic classification tools were introduced to identify this second generation of peer-to-peer protocols through port number profiles. A third generation of peer-to-peer protocols then emerged to cope with port-based traffic classification. These protocols use random ports in order to avoid detection [11, 12]. During the past years, several countries have introduced new laws (Loi Création et Internet in France 10 and the amended Copyright Act in Japan 11) in order to drastically reduce the exchange of copyrighted data through peer-to-peer protocols. The probable consequence of these events has been a reduction of peer-to-peer traffic and an increase of HTTP traffic from one-click file hosting websites to end-users [13]. This evolution is corroborated by the fact that, in [14], Maier et al. analyze ADSL traffic and find that 15% of HTTP traffic is composed of RAR archive files that can be suspected of containing copyrighted data. During the night, this proportion increases while the proportions of all other HTTP traffic components decrease [14]. Furthermore, in [15], the Cisco company says that P2P file sharing decreased its worldwide share of overall traffic volume from 38% to 25%, which represents a relative decrease of 34%. Peer-to-peer protocols and traffic have thus followed a complicated path with multiple deep changes in their structure. Their use even seems to decrease under pressure from legislation.

8http://epp.eurostat.ec.europa.eu/portal/page/portal/information_society/data/

1.1.1.2 Mutating anomalies

Network anomalies can be of very different types. Some anomalies, such as scans, are mostly harmless by themselves. They are however generally signs of a possible incoming attack. On one hand, host scans target a single machine. On the other hand, network scans aim either at mapping a particular network and acquiring knowledge about its structure, or at finding machines sharing a particular vulnerability. Other anomalies, such as floods or DoS/DDoS, aim at degrading a particular service on a restricted set of targets. They saturate their target(s) by sending a huge number of packets that effectively prevents legitimate users from accessing the considered service. These anomalies are sometimes called attacks because of their deliberate intent to harm their target(s).

9http://www.businessweek.com/2000/00_33/b3694003.htm
10http://www.legifrance.gouv.fr/
11http://www.mext.go.jp/b_menu/houan/an/171/1251917.htm

Similarly to traffic itself, these anomalies also exhibit a mutating behavior. In their longitudinal analysis [12] of the MAWI traffic repository [4], Borgnat et al. state that some anomalies, like SYN scans and floods, are recurrent over the years. However, other anomalies are very localized in the time line of the repository: Network News Transfer Protocol (NNTP) anomalies during the first years, Secure Shell (SSH) anomalies since 2004 and Microsoft security holes from August 2003. Furthermore, some anomalies can be accurately located in time and sometimes represent the majority of network traffic. One can here quote two events. First, a continuous ping flood from August 2003 to December 2003 represented more than 50% of traffic volume from Japan to the USA in terms of number of packets. Second, several outbursts of traffic due to the Sasser worm in August 2004, December 2004 and March 2005 accounted for more than 50% of traffic volume from the USA to Japan regarding the number of packets. The constant emergence of worms, from the Sasser worm, and many before it, to the latest example of a spreading worm, the Morto worm 12, constitutes another example of anomaly mutation. All these examples demonstrate that anomalies are always present in traffic and mutate along time.

In [16], Allman et al. study scans targeting the Lawrence Berkeley National Laboratory (LBNL) network from June 1 1994 to December 23 2006. The authors show that scanning greatly evolved along the years. All large increases in terms of scanning connections coincide with the emergence of new worms. The increase in 2001 reflects the appearance of the CodeRed and Nimda worms. The scan surge of 2004 seems to be linked with the emergence of the MyDoom, Sasser, Welchia, Bobax and Gaobot worms. The targeted service or port number also evolves along time. For example, port 9898 started to be targeted when the Sasser worm appeared. This is coherent, since port 9898 was used by the Sasser worm as a backdoor.

Congestions and anomalies also have an impact on traffic characteristics. For example, they modify the protocol mix, which influences the marginal distributions of aggregated packet and byte counts as well as the long-range dependence, possibly estimated through the Hurst parameter [12]. Since the appearance and nature of such phenomena are not predictable, traffic characteristics for the considered statistical indexes are consequently unpredictable. This impact on traffic further amplifies the traffic mutation phenomenon explained above.

1.1.2 Weaknesses of current network anomaly detection approaches

The problem of network anomaly detection has been extensively studied during the last decade. Two different approaches are by far dominant in current research literature and commercial detection systems: signature-based detection and anomaly detection. Both approaches require some kind of guidance to work, hence they are generally referred to as knowledge-based approaches. Signature-based detection systems, such as Bro and Snort, are highly effective at detecting the anomalies which they are programmed to alert on. When a new anomaly is discovered, generally after its occurrence, the associated signature is coded by human experts and is then used to detect new occurrences of the same anomaly. Such a detection approach is powerful and very easy to understand, because the operator can directly relate the detected anomaly to its specific signature. However, these systems

12http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?Name=Worm%3AWin32%2FMorto.A

cannot defend the network against new anomalies, simply because they cannot recognize what they do not know. Furthermore, building new signatures is expensive, as it involves manual inspection by human experts. On the other hand, anomaly detection detects anomalies as patterns that deviate from a previously built model [17, 18]. Such methods can detect new kinds of anomalies never seen before, because these will naturally deviate from the baseline. In order to build their model, anomaly detection methods either require fine tuning so as to capture only anomalous events, or an automated model-building step as in [19]. This training however depends on the availability of purely anomaly-free traffic data-sets. Labeling traffic as anomaly-free is expensive and hard to achieve in practice, since it is difficult to guarantee that no anomalies are hidden inside the traffic. Additionally, it is not easy to maintain an accurate and up-to-date model for anomaly-free traffic, particularly when new services and applications are constantly emerging (cf. section 1.1.1). However, provided that anomaly-free traffic is available, supervised-learning and anomaly detection-based systems provide a first solution to cope with the changing nature of traffic.

Supervised learning has also been applied to signature-based detection systems. It uses labeled anomalies to train a model for anomalous traffic, as in [20]. Such systems however suffer from the same weaknesses as signature-based models: they cannot recognize unknown events. They nevertheless provide an automated technique to produce anomaly signatures.

Apart from detection, and in order to take accurate countermeasures, operators need to analyze and characterize network anomalies. However, network anomaly detection systems often only raise an alarm for an occurring anomaly without any further description. These systems force operators to perform a hard and time-consuming task of anomaly characterization. The analysis may become a particular bottleneck when new anomalies are detected, because the network operator has to manually dig into many traffic descriptors to understand their nature. The high number of alarms, and hence of possible anomalies, can also increase the amount of work. In the current traffic scenario, even expert operators can be quickly overwhelmed if further information is not provided to prioritize the time spent in the analysis.

1.1.3 The need for an autonomous anomaly detection

The original hypothesis used in anomaly detection is:

Hypothesis 1.1 The anomalous traffic is statistically different from normal traffic. [21, 22]

As we stated in section 1.1.2, current anomaly detection systems use two main detection approaches: signature-based (or misuse) detection and anomaly detection. Both of these approaches use hypothesis 1.1 and build models of either anomalous traffic (signature-based detection) or normal traffic (anomaly-based detection). However, as we showed in section 1.1.1, network traffic exhibits several types of changing behavior: evolution, disparity and mutability. Current anomaly detection systems have trouble dealing with these phenomena.

We therefore think that anomaly detection needs to cope with the changes in both anomalous and normal network traffic. Anomaly detection systems shall not use detailed knowledge about traffic or anomaly structure. They shall instead only use a single additional hypothesis to separate anomalous traffic from normal traffic:

Hypothesis 1.2 The majority of traffic representation instances is normal, which is equivalent to: only X% of traffic representation instances are malicious, with X < 50%. [23]

Efficient anomaly detection systems should therefore use hypotheses 1.1 and 1.2 in order to autonomously find anomalies. Unsupervised learning techniques offer us tools to create classes of similar patterns and to isolate classes of patterns different from each other without previous knowledge of the patterns. These techniques therefore allow anomaly detection systems to separate anomalous traffic from normal traffic provided that hypothesis 1.1 is verified. Once anomalous and normal traffic are separated, hypothesis 1.2 provides us with a criterion to identify normal traffic and, by extension, anomalous traffic.

This unsupervised approach allows us to autonomously extract abnormal traffic without any predefined model of either normal or abnormal traffic. However, this approach only works provided that both hypotheses are verified. Hypothesis 1.1 is a classic hypothesis of the network anomaly detection field, while hypothesis 1.2 has never been used in this context. This approach also needs an unsupervised technique able to efficiently perform the extraction. It is only the final evaluation of our tool that will enable us to assess the pertinence of the hypotheses and the efficiency of the chosen unsupervised method. We think that unsupervised anomaly detection is the scheme to follow in order to build knowledge-independent anomaly detection systems that are robust to network traffic evolution, disparity and mutability.
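To make the role of the two hypotheses concrete, here is a minimal Python sketch; the synthetic data, the choice of DBSCAN and its parameters are our own illustrative assumptions, not the method developed later in this thesis:

```python
# Minimal sketch: hypothesis 1.1 makes anomalies land away from the normal
# mass in feature space; hypothesis 1.2 lets us label the largest cluster
# as normal. Features, eps and min_samples are illustrative assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(42)
normal = rng.normal(0.0, 0.5, size=(950, 2))    # majority: normal traffic
anomalous = rng.normal(4.0, 0.3, size=(50, 2))  # minority: statistically different
X = np.vstack([normal, anomalous])

labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)

# Hypothesis 1.2: the biggest cluster is declared normal...
ids, sizes = np.unique(labels[labels >= 0], return_counts=True)
normal_cluster = ids[np.argmax(sizes)]
# ...and the remaining clusters plus DBSCAN noise (label -1) are suspicious.
suspicious = np.flatnonzero(labels != normal_cluster)
print(f"{len(suspicious)} suspicious instances out of {len(X)}")
```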

1.1.4 Summary

Normal and abnormal network traffic changes over time and across measuring points. We state that current anomaly detection approaches relying on a one-time-built model of legitimate or illegitimate traffic are inefficient regarding these phenomena. The building and maintenance costs of these models further confirm that these techniques should be avoided. We therefore think that modern anomaly detection systems should be able to autonomously cope with traffic evolution, disparity and mutability through unsupervised learning. Such use of unsupervised learning makes network anomaly detection systems immune to these phenomena, at the cost of relying on hypotheses about network traffic.

1.2 Contributions

Our contribution is twofold. The first part is an anomaly detection method relying on unsupervised learning. The second part is a post-processing step that correlates, ranks and characterizes the found anomalies.

1.2.1 Unsupervised anomaly detection

In the first part of the thesis, we first propose several techniques to process traffic and generate meaningful attributes. We then describe how anomalies behave with respect to the built attributes, provided that anomalies are statistically different from normal traffic (cf. hypothesis 1.1). We finally propose a robust method that relies on unsupervised learning to discover traffic flows containing network traffic of similar nature. By using hypothesis 1.2, we are able to determine the normal traffic and, as a corollary, the potentially anomalous traffic. Our method is based on clustering techniques, especially subspace clustering. This method allows us to blindly find anomalous traffic while using only the two hypotheses quoted above.

1.2.2 Anomaly post-processing

We present several methods to process the results of anomaly mining. We first propose a correlation technique in order to reduce the number of alarms. In order to do so, we use the sets of sources and destinations of an anomaly as its unique identifier. We then use the similarity between the sources and destinations of anomalies in order to assess anomaly similarity. After the correlation, we introduce a dangerousness assessment metric. This metric allows us to rank anomalies, easily discriminating harmless anomalies from dangerous ones. We also propose a method to characterize previously mined anomalies. This method is able to build signatures that isolate anomalous traffic from normal traffic. This step is critical in the sense that it allows operators to easily understand the nature of the considered anomalous traffic. The built signature also allows operators to quickly cope with the anomaly through deployment of the considered signature on any network security device (Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), firewalls, etc.), and hence greatly improves the global autonomy of our system.
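As a purely hypothetical illustration of what such a signature could look like once translated into a filtering rule, the sketch below encodes a scan-like pattern as a predicate over flow attributes; the attribute names and thresholds are invented for the example and are not the feature set used in this thesis:

```python
# Hypothetical filtering-rule signature for a scan-like anomaly: one source
# contacting many destinations with small, SYN-heavy packets. Attribute
# names and thresholds are illustrative assumptions.
def scan_signature(flow: dict) -> bool:
    return (flow["n_sources"] == 1
            and flow["n_destinations"] > 50
            and flow["avg_packet_size"] < 60
            and flow["syn_ratio"] > 0.9)

flow = {"n_sources": 1, "n_destinations": 300,
        "avg_packet_size": 48, "syn_ratio": 0.98}
print(scan_signature(flow))  # True: the flow matches the signature
```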

1.2.3 Summary

Our contribution is composed of two main components. The first part is an unsupervised anomaly detection technique relying on subspace clustering. The second part is a post-processing technique which allows us to correlate, rank and characterize the detected anomalies.

1.3 Summary

Network anomaly detection is a critical component of modern network management systems used by ISPs. Current anomaly detection systems use one-time-built and expensive-to-build models to assess the presence of an anomalous event. They are therefore extremely vulnerable to novelty, which is an inner characteristic of network traffic, as we show in section 1.1.1. We therefore think that anomaly detection should use the hypotheses presented in section 1.1.3 and unsupervised learning tools in order to isolate anomalies. We think that our proposition constitutes a major first step toward autonomous network anomaly detection.

1.4 Context inside the ECODE project

This thesis has been carried out in the framework of the ECODE project. This project is funded by the European Commission under grant FP7-ICT-2007-2/223936. It aims at introducing cognitive capacities in routing systems. The partners are: Alcatel-Lucent BELL, Université Catholique de Louvain, Université de Liège, the Interdisciplinair instituut voor BreedBand Technologie (IBBT), the Institut National de Recherche en Informatique et en Automatique (INRIA), Lancaster University (ULANC) and the Centre National de la Recherche Scientifique (CNRS). First, we present the goals of the ECODE project. Second, we address the use cases inside the project. Finally, we present the proposed architecture for future routing systems.

1.4.1 Overview

The goal of the ECODE project is to develop, implement, and validate experimentally a cognitive routing system that can meet the challenges experienced by the Internet in terms of manageability and security, availability and accountability, as well as routing system scalability and quality. By combining the networking and machine learning research fields, the resulting cognitive routing system fundamentally revisits the capabilities of the Internet networking layer so as to address these challenges altogether. For this purpose, the project investigates and elaborates novel, online, and distributed machine learning techniques that form the kernel of the cognitive routing system. During the building phase, the cognitive routing system is both designed and prototyped. In the second phase, three sets of use cases are experimented with to evaluate the benefits of the developed machine learning techniques. The experimentation and the validation of these techniques are carried out on physical (iLAB) and virtual (e.g. OneLab) experimental facilities.

1.4.2 Use cases

The ECODE project targets a wide set of routing problems: adaptive traffic sampling and management, path performance monitoring, intrusion and attack/anomaly detection, path availability, network recovery and resiliency, profile-based accountability, routing system scalability and quality. Our work is part of the intrusion and attack/anomaly detection use case. Our role as an ECODE partner was to devise new and innovative ways to use cognitive techniques such as supervised, semi-supervised or unsupervised machine learning in our field, i.e. network anomaly detection.

1.4.3 Proposed architecture

Figure 1.5 exposes the paradigm change proposed by the ECODE project concerning routing systems. The rationale is to complement existing routing and lower-level data collection and decision making with a cognitive engine. This cognitive engine would enable systems and networks to learn about their own behavior and environment over time, analyze problems, tune their operation and increase their functionality and performance.


Figure 1.5: High-level architecture of the proposed ECODE routing system.

1.4.4 Summary

This thesis has been conducted within the ECODE project, financed by the European Commission. Its goal is to introduce machine learning capabilities inside routing systems in order to improve their autonomy. It targets the whole functionality set of routing systems, including network anomaly detection.

1.5 Dissertation outline

This dissertation is composed of five chapters:

• Chapter 2 addresses related work in the unsupervised learning, network anomaly detection and intrusion detection fields.

• Chapter 3 addresses the design of our unsupervised learning-based anomaly detection method.

• Chapter 4 exposes several techniques that process the results of our detection method in order to ease the work of the network operator.

• Chapter 5 presents an evaluation of our algorithm on real traces.

• Chapter 6 concludes this dissertation by summarizing our contributions and proposing several ideas to improve our proposal in the future.

Chapter 2

Related work

This chapter introduces related work in the fields of unsupervised machine learning and network anomaly and intrusion detection. The first section introduces machine learning methods dedicated to unsupervised learning. The second section then addresses network anomaly detection systems and intrusion detection systems. These systems can be classified into two categories: knowledge-based detection systems, either using misuse detection or anomaly detection, and knowledge-independent or unsupervised detection systems. Another classification of network anomaly detection systems and intrusion detection systems is based upon the use, or lack of use, of machine learning techniques. We will use this second distinction to present network anomaly detection systems and intrusion detection systems. Note: references are presented in chronological order inside each section or item.

2.1 Unsupervised machine learning

Unsupervised machine learning aims at exposing structure inside data. Several techniques are commonly used in the literature to perform unsupervised learning. A non-exhaustive list of these techniques is: blind signal separation, neural network models and clustering. We present them in the next three sections, with a special emphasis on the technique that we chose: clustering.

2.1.1 Blind signal separation Blind Signal Separation (BSS) consists in extracting a set of signals from a set of mixed signals with very few or no information on the nature of the signals. The standard prac- tical use case of BSS is the cocktail party problem where one wants to extract the speech of several persons in the room from several observations. Methods that can be used for this goal include Independent Component Analysis, Principal Components Analysis, etc. Independent Component Analysis (ICA) makes the assumption that signals to isolate are independent. There is two main independence definitions: through minimization of mutual information or through the maximization of non-Gaussianity. Principal Compo- nent Analysis (PCA) [24, 25] is a mathematical procedure that maps a set of points to a new sets of axis. The underlying idea is that many of the original axis are correlated.

The method therefore aims at finding a reduced set of uncorrelated axes. The axes that PCA extracts are called principal axes or principal components. These principal components are oriented along the directions of maximum variance in the data. The principal components obtained by PCA are ranked by the amount of variance that they capture. Furthermore, the set of principal components forms an orthogonal basis. By using PCA, one is thus able to perform dimensionality reduction and project data into a more meaningful space.
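As a small illustration of the projection described above, the following sketch (our own example, not taken from [24, 25]) computes the principal components of synthetic 5-dimensional data through the SVD of the centered data matrix and projects the points onto the top two axes:

```python
# PCA sketch via SVD of the centered data: the right singular vectors are
# the principal axes, and squared singular values give the captured variance.
import numpy as np

rng = np.random.default_rng(0)
# 500 points in 5 dimensions whose variance mostly lives in a 2-D subspace
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 5))

Xc = X - X.mean(axis=0)                  # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)          # variance ratio per principal component
print(np.round(explained, 3))            # the first two components dominate

Z = Xc @ Vt[:2].T                        # dimensionality reduction to 2 axes
print(Z.shape)                           # (500, 2)
```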

2.1.2 Neural networks

Neural networks are also used to perform unsupervised learning. The term neural network actually has two different meanings: biological neural networks and artificial neural networks. Biological neural networks are constituted of real biological neurons in nervous systems. Artificial neural networks are mathematical or computational models that mimic the structure and/or functional aspects of biological neural networks. An artificial neural network is constituted of interconnected artificial neurons. Artificial neural networks are thus able to model relations between inputs and outputs and to find patterns in data. Among artificial neural network models, one can quote Self-Organizing Maps (SOM), which build a discretized representation of the input space. Adaptive Resonance Theory (ART) has also been used. ART is mainly used in pattern recognition. Its principle is to compare an expected observation to the real input. If the input is too far from the expected pattern according to a "vigilance parameter", a new expected observation is added. In our particular use case, a partition could be obtained by using clustering algorithms over SOMs, as presented in [26]. This partition would separate abnormal traffic from normal traffic.
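To give a feel for how a SOM discretizes the input space, here is a minimal self-contained sketch (our own illustration, independent of [26]; grid size and decay schedules are arbitrary choices):

```python
# Minimal Self-Organizing Map: a grid of prototype vectors is pulled toward
# the data, each update moving the best-matching unit and its grid neighbors.
import numpy as np

rng = np.random.default_rng(0)

def train_som(data, rows=8, cols=8, epochs=20, lr0=0.5, sigma0=3.0):
    n, d = data.shape
    weights = rng.random((rows, cols, d))
    # (row, col) coordinate of each neuron, used for neighborhood distances
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    t, t_max = 0, epochs * n
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = t / t_max
            lr = lr0 * (1.0 - frac)                 # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3    # shrinking neighborhood
            # best matching unit: neuron whose prototype is closest to x
            bmu = np.unravel_index(
                np.argmin(np.linalg.norm(weights - x, axis=2)), (rows, cols))
            # Gaussian neighborhood on the grid around the BMU
            h = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2)
                       / (2 * sigma ** 2))
            weights += lr * h[..., None] * (x - weights)
            t += 1
    return weights

# toy data: two blobs; the trained map discretizes the input space
data = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(3, 0.3, (200, 2))])
print(train_som(data).shape)  # (8, 8, 2): one 2-D prototype per neuron
```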

2.1.3 Clustering algorithms

Clustering algorithms aim at grouping instances that are similar to each other according to a similarity measure. Our presentation of clustering is split in two parts. First, we present several classic clustering algorithms. Then, we specifically address the challenges of clustering high-dimensional data and the algorithms designed to cope with these problems.

2.1.3.1 General purpose clustering algorithms

Clustering algorithms can be roughly classified into two main categories: hierarchical clustering and partitional clustering.

Hierarchical clustering aims at building a hierarchy between clusters. Two strategies can be used: either agglomerative or divisive. Agglomerative hierarchical clustering uses a bottom-up approach: each instance or group of instances is successively aggregated with its closest neighbor. Divisive hierarchical clustering is a top-down strategy: at the beginning of the procedure, the whole set of instances forms a single cluster 1 that is successively separated into smaller clusters, down to individual instances. The usual representation of the final result of these agglomerative or divisive strategies is a dendrogram (cf. Figure

1A cluster is a group of instances that are similar according to a similarity measure.

2.1 2 and [27]). The dendrogram in Figure 2.1b is built by successively grouping similar atomic instances (e.g. the grouping of Canada (CAN) and the USA), or by grouping atomic instances with previously grouped instances (e.g. the grouping of Russia (RUS) on one hand with China (CHN) and Japan (JPN) on the other hand). The final partition of instances is built over the dendrogram by simply keeping the aggregated instances at a specified level.


Figure 2.1: Examples of dendrograms obtained from hierarchical clustering.
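A minimal sketch of this cut-the-dendrogram workflow with SciPy (illustrative data and parameters of our choosing):

```python
# Agglomerative clustering: build the merge tree (dendrogram), then cut it
# at the level that yields the desired number of clusters.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (20, 2)), rng.normal(2, 0.2, (20, 2))])

Z = linkage(X, method="average")                 # bottom-up successive merges
labels = fcluster(Z, t=2, criterion="maxclust")  # keep the 2-cluster level
print(labels)                                    # cluster id per instance
```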

Partitional clustering is designed to build a partition of the space in a single pass, i.e. to separate instance groups according to a similarity measure between instances. Many algorithms have been proposed. Among these algorithms, one can quote the k-means clustering algorithm [28], which aims at partitioning instances into a previously fixed number of clusters k. Each instance is associated with a cluster center. The underlying hypothesis of k-means is that every cluster is a circle or an ellipse (depending on the used distance and on properties of the data such as normalization, zero-centering, etc.). DBSCAN [29] is another clustering algorithm that relies on the concept of density. Its parameters are the minimal number of points in a cluster and the distance that defines the density threshold.
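The contrast between the two parameterizations can be seen in a few lines (synthetic data and parameters chosen purely for illustration):

```python
# Partitional clustering: k-means needs the number of clusters k, whereas
# DBSCAN derives clusters from a density threshold (eps, min_samples).
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.2, (100, 2)), rng.normal(3, 0.2, (100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)  # label -1 marks noise
print(np.unique(km), np.unique(db))
```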

2.1.3.2 Clustering data in high-dimensional space

Clustering high-dimensional data is a special use case of clustering where the data to be processed contains many dimensions, ranging from several dozens to thousands. This number of dimensions creates a new set of challenges called the "curse of dimensionality".

2http://en.wikipedia.org/wiki/File:Hierarchical_clustering_simple_diagram.svg

According to Kriegel et al. [30], this so-called curse is actually constituted of four main problems.

• Multiple dimensions create problems that are very hard to comprehend. The high number of dimensions makes it impossible to directly tabularize, visualize or enumerate the whole feature space.

• The meaning of distance and neighborhood decreases with the increase in dimensionality. The relative distance between the farthest point and the nearest point converges to 0 when the dimensionality d increases. We thus have:

$$\lim_{d \to \infty} \frac{dist_{max} - dist_{min}}{dist_{min}} = 0$$

In other words, when one considers a point, the discrimination between its closest and its farthest neighbor becomes more and more tenuous as the number of considered dimensions increases (see the numerical sketch after this list).

• A cluster groups objects that are related regarding their attributes' values. However, the relevance of each attribute may vary for each possible cluster. This problem is known as "local feature relevance": different clusters might be found in different subspaces, so that a global attribute filtering is inefficient.

• High dimensional data means a high number of attributes. Therefore, it is likely that some attributes or some sets of attributes are correlated. Hence, clusters might exist in arbitrarily oriented subspaces.
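The distance-concentration effect stated in the second item can be checked numerically in a few lines (uniform synthetic data; dimensions and sample sizes are arbitrary choices):

```python
# The relative contrast (dist_max - dist_min) / dist_min between a reference
# point and a point cloud shrinks as the dimensionality d grows.
import numpy as np

rng = np.random.default_rng(3)
for d in [2, 10, 100, 1000]:
    points = rng.random((1000, d))
    dist = np.linalg.norm(points - rng.random(d), axis=1)
    print(f"d={d:5d}  relative contrast="
          f"{(dist.max() - dist.min()) / dist.min():.3f}")
```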

In order to cope with these challenges, several techniques have been introduced. Kriegel et al. classify techniques for high-dimensional data into three categories: clustering in axis-parallel subspaces, correlation clustering and pattern-based clustering. Clustering in axis-parallel subspaces aims at finding clusters in axis-parallel subspaces. Correlation clustering intends to find clusters in arbitrarily oriented subspaces. Pattern-based clustering targets clusters constituted of rows and/or columns. Each of these categories is presented in one of the following parts.

Axis-parallel clustering

Clustering in axis-parallel subspaces can itself be separated into four subcategories: projected clustering, soft-projected clustering, subspace clustering and hybrid clustering. Kriegel et al., as well as Parsons et al. in [31], also separate top-down and bottom-up algorithms. Top-down algorithms try to find clusters from the whole feature space, while bottom-up algorithms first search for clusters in low-dimensional spaces and aggregate them to form global clusters. We will not use this distinction to classify the algorithms of this category and just use the four subcategories listed above.

The first subcategory is projected clustering. As stated in [30], projected clustering aims at "finding an unique assignment of points to subspace clusters". Noise is also taken into account by some algorithms. They use a dedicated distance function in a clustering procedure applied on the whole space. PROCLUS [32] (improved later by FINDIT [33] and SSPC [34]) and PreDeCon (subspace PREference weighted DEnsity CONnected clustering) [35] are all projected clustering algorithms. PROCLUS

20 is a k-medoid-like 3 clustering algorithm. In the clustering phase, the subspace of each medoid is determined by minimizing the standard deviation of the distance of points in the neighborhood of the medoids to the corresponding medoid for each dimension. The algorithm is able to detect noise as points too far from medoids. FINDIT employs several heuristics that improve efficiency and accuracy. SSPC adds further improvements that enhance accuracy by using domain knowledge in the form of labeled objects and/or labeled attributes. PreDeCon applies DBSCAN to the whole feature space. However, it uses a specialized distance that captures the subspace of each cluster. In order to do so, the algorithm defines a subspace pref- erence for each point that is determined through the variance of the points located in the neighborhood of the considered point. The second subcategory is soft-projected clustering. It presents a k-means-like ap- proach that takes as parameter the expected number of clusters. It then tries to optimize a function that guarantees a pertinent clustering result regarding the spec- ified number of clusters by adjusting the attribute weights. The Locally Adaptive Clustering (LAC) [36] and Clustering Objects in Subsets of Attributes (COSA) [37] are both soft-projected clustering methods. LAC aims at finding k centroids and k sets of d weights for each cluster (d being the number of dimensions). The algo- rithm approximates k clusters by adjusting the centroids and weights. COSA uses an approach similar to PreDeCon. It aims at building a matrix that contains, for each point, a subspace preference. A simple clustering can then be used on the matrix to build the global partition. The third subcategory is subspace clustering. In [30], Kriegel et al. defines subspace clustering as “aiming at finding all clusters in all subspaces of the entire feature space”. The CLIQUE method [38] is the pioneering work in this subcategory. It uses a grid-based algorithm and proceed in a bottom-up fashion to aggregate dense unit. A dense unit is a part of the feature space, for one or several attributes, defined by the grid, that is dense regarding a threshold. Figure 2.2a displays such a grid. One can notice that each cell of this grid has a fixed size. We thus call grids used by CLIQUE, fixed grids. MAFIA [39] is an incremental version of CLIQUE that uses adaptive grids (cf. Figure 2.2b). In this type of grid, each cell dimensions are determined by analyzing density along dimensions. Each cell therefore contains an homogeneous density that is different in its neighbor(s). ENCLUS [40] is a variant of CLIQUE that searches for several clusters instead of dense units. SUBCLU [41] relies on DBSCAN [29] to find clusters in each subspace in a bottom- up manner. From a general point of view, the global density threshold used by grid- based algorithms or algorithms such as SUBCLU however induces a dimensional bias: a low threshold that performs well in low-dimensional space will induce error in subspaces of higher dimension and vice-versa. In order to cope with this problem, Dimensionality UnbiaSed Cluster model (DUSC) [42] has been later introduced. In [43], Liao et al. expose a grid-based clustering algorithm that copes with density variability in feature space which can lead to nested clusters. Their algorithm uses an Adaptive Mesh Refinement (AMR) technique that first applies a grid clustering

³ A medoid is a value that represents the elements of a cluster or of a subset of a set; it is always one of the elements of the original set.


Figure 2.2: Examples of grids: (a) fixed grid, (b) adaptive grid.

It then builds finer grids onto dense regions in order to find nested clusters. Each node of the produced hierarchical tree is a spatial subdomain of its father, and each child of a node represents a cluster located inside the considered node. This algorithm is useful for highly irregular datasets. Figure 2.3 displays an example of several nested clusters and their associated AMR tree containing several grids. In this particular case, a first grid, here called “grid 0”, is applied over the whole feature space. This grid contains two clusters. The first one is subdivided again into “grid 1”; this particular grid does not contain any cluster. The second cluster found in “grid 0” is then analyzed through “grid 2”. In this grid, the algorithm finds two clusters that are analyzed through “grid 3” and “grid 4”; these two grids however do not contain any other clusters. The tree on the right part of Figure 2.3 displays the cluster “genealogy”, i.e. the cluster inclusions obtained from the algorithm. In [44], Akodjenou-Jeannin et al. propose a flexible grid method similar to k-means in the sense that it needs to know the number k of clusters to find. Starting from a single cell containing the whole feature space, it successively builds random hyperplanes in the densest cell so far in order to split it into two cells. This process stops when M non-empty cells have been created. Their algorithm then builds a weighted graph in which each edge weight represents the similarity between two cells; this similarity takes into account both density and spatial similarity. The final partition is built by removing the lowest-weighted edges until k clusters remain. The fourth and last subcategory is hybrid clustering. Hybrid clustering algorithms aim neither at “finding a unique assignment of points to subspace clusters” nor at finding all clusters in all subspaces. Density-based Optimal projective Clustering (DOC) [45], MineClus [46, 47], Detecting Subspace cluster Hierarchies (DiSH) [48], Hierarchical approach with Automatic Relevant dimension selection for Projected clustering (HARP) [49], Support and Chernoff-Hoeffding bound-based Interesting Subspace Miner (SCHISM) [50], FIlter REfinement Subspace clustering (FIRES)

Figure 2.3: Examples of AMR: nested grids (grid 0 to grid 4) and their associated AMR tree (levels 0 to 2).

[51] and Projected Clustering via Cluster Cores (P3C) [52, 53] can all be classified as hybrid clustering algorithms. DOC uses a global density threshold defined by the side length w of a standard hypercube and the number of points α located in the hypercube. It also uses a third parameter β which specifies the balance between the number of points and the dimensionality of a cluster. DOC is randomly initialized and each of its runs may find a single cluster: if one wants to find k clusters, DOC has to be run at least k times. MineClus follows a similar idea but proposes a deterministic method to find an optimal projected cluster, given a sample seed point. MineClus addresses the problem as frequent item set mining through a modified frequent pattern tree growth. DiSH uses an approach close to PreDeCon in terms of attribute weighting; it however uses a hierarchical clustering algorithm that is able to find clusters with multiple inclusions. HARP is a single-link-like clustering algorithm that uses a modified distance and does not produce a hierarchy of subspace clusters. It successively merges points or clusters provided that the resulting cluster displays a minimum number of relevant attributes; a relevance index that takes into account the cluster size is thus introduced. SCHISM aims at finding interesting subspaces instead of pertinent clusters. In order to achieve this goal, it uses a grid-like discretization of the feature space and a depth-first search with backtracking to find pertinent subspaces. FIRES first computes cluster search in single dimensions. It then defines the similarity of clusters as the number of intersecting points; a global partition can then be obtained by clustering these values. P3C computes one-dimensional intervals that are likely to approximate higher-dimensional subspace clusters. These intervals are then merged through an APRIORI-like bottom-up search strategy. The search result is then refined, and the final output is a matrix that specifies the probability of membership of each point in each cluster. P3C thus allows points to belong to a single cluster or to several clusters.
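To make the grid-based, bottom-up search shared by CLIQUE, ENCLUS and MAFIA (third subcategory above) more concrete, the following OCaml sketch identifies the dense units of a fixed grid over one projection of the data. It is a minimal illustration written for this document, not code from the cited algorithms: the dense_units function and its cells and tau parameters are our own naming, and the data points are assumed normalized to [0, 1).

    (* Hypothetical sketch of the dense-unit search underlying grid-based
       subspace clustering: a fixed grid with [cells] cells per dimension is
       laid over the projection of the data on the attribute subset [dims];
       a unit is dense when it holds at least [tau] points. *)
    let dense_units ~cells ~tau (points : float array list) (dims : int list) =
      let tbl = Hashtbl.create 64 in
      List.iter
        (fun p ->
          (* Cell coordinates of the projection of point p on dims. *)
          let cell =
            List.map (fun d -> int_of_float (float_of_int cells *. p.(d))) dims
          in
          let count = try Hashtbl.find tbl cell with Not_found -> 0 in
          Hashtbl.replace tbl cell (count + 1))
        points;
      (* Keep only the cells whose population reaches the density threshold. *)
      Hashtbl.fold (fun cell c acc -> if c >= tau then cell :: acc else acc) tbl []

A bottom-up algorithm such as CLIQUE would run this search on one-dimensional projections first, and then only combine the subspaces whose units are dense, exploiting the monotonicity of density across subspaces.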

Correlation clustering The second category defined in [30] is correlation clustering.

The rationale of this category of clustering algorithms is that clusters can be found inside subspaces in which the correlations between the original attributes are removed. Such subspaces are thus built on arbitrarily oriented axes. Many of the algorithms belonging to this category use PCA. ORCLUS (arbitrarily ORiented projected CLUSter generation) [54] and 4C (Computing Correlation Connected Clusters) [55] both use PCA combined with other clustering algorithms (respectively k-means and density-based clustering) to tackle the considered problem. HiCO (HIerarchical Correlation Clustering) [56] uses a hierarchical approach according to a distance based on correlation dimensionality and subspace orientation; a final hierarchy of correlation clusters is built through hierarchical density-based clustering. Another approach is adopted by the CASH algorithm (Clustering in Arbitrary Sub-spaces based on the Hough transform) [57]. Other mathematical tools have also been used, such as self-similarity [58, 59, 60] and sampling [61, 62].

Pattern-based clustering The third category according to the classification of Kriegel et al. is pattern-based clustering, also called biclustering. A common representation for the data in a clustering problem is a matrix where rows represent objects and columns represent attributes. Pattern-based clustering algorithms seek clusters of instances/rows and/or attributes/columns, while classical approaches like clustering in axis-parallel subspaces and correlation clustering only attempt to group rows. The term biclustering comes from the search for both similar rows and similar columns. Four subcategories of clusters sought by pattern-based clustering or biclustering are identified: constant biclusters, biclusters with constant rows or columns, biclusters with coherent values and biclusters with coherent evolutions. Constant biclusters are clusters where several objects exhibit the same value over several attributes. The pioneering work for the search of this type of clusters has been accomplished by Hartigan et al. in [63]. Biclusters with constant rows consist of several objects exhibiting a constant value over several attributes, this value being different between the objects. Biclusters with constant columns have the same value for one or several attributes across several objects, the values possibly differing between attributes. In [64, 65, 66], the authors tackle the search for this particular type of clusters. Biclusters with coherent values exhibit a particular covariance between rows and columns. The first work in this field has been realized by Cheng and Church [67]. The FLOC (FLexible Overlapped Clustering) [68] and CoClus [69] algorithms try to find biclusters of this type: FLOC first chooses several seed clusters that are further optimized by randomly removing or adding a row or a column, while CoClus can be described as a k-means-like biclustering algorithm. Finally, biclusters with coherent evolutions are clusters exhibiting correlations in the way values change among all objects. In [70, 71], the authors try to find this type of biclusters.
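To make these bicluster definitions more concrete, the following sketch tests a candidate sub-matrix against two of them. This is a minimal illustration under our own conventions (the matrix is a float array array and the candidate bicluster is given by row and column index lists); it is not code from the cited works.

    (* Hypothetical sketch of two bicluster definitions: a candidate
       bicluster is a list of row indices and a list of column indices
       into a data matrix m. *)
    let entries m rows cols =
      List.map (fun r -> List.map (fun c -> m.(r).(c)) cols) rows

    let all_equal = function
      | [] -> true
      | x :: rest -> List.for_all (( = ) x) rest

    (* Constant bicluster: all selected entries share a single value. *)
    let is_constant_bicluster m rows cols =
      all_equal (List.concat (entries m rows cols))

    (* Bicluster with constant rows: each selected row is constant over the
       selected columns, the value possibly differing between rows. *)
    let has_constant_rows m rows cols =
      List.for_all all_equal (entries m rows cols)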

Table 2.1 summarizes this clustering algorithm classification and details each of the three categories presented and their respective subcategories.

Categories | Subcategories | Algorithms
Axis-parallel clustering | Projected clustering | PROCLUS [32], FINDIT [33], SSPC [34] and PreDeCon [35]
Axis-parallel clustering | Soft-projected clustering | LAC [36] and COSA [37]
Axis-parallel clustering | Subspace clustering | CLIQUE [38], MAFIA [39], ENCLUS [40] and SUBCLU [41]
Axis-parallel clustering | Hybrid clustering | DOC [45], MineClus [46, 47], DiSH [48], HARP [49], SCHISM [50], FIRES [51] and P3C [52, 53]
Correlation clustering | PCA-based | ORCLUS [54], 4C [55]
Correlation clustering | Other | HiCO [56], CASH [57]
Pattern-based clustering | Constant biclusters | Block clustering [63]
Pattern-based clustering | Biclusters with constant rows or columns | [64, 65, 66]
Pattern-based clustering | Biclusters with coherent values | δ-bicluster [67], FLOC [68] and CoClus [69]
Pattern-based clustering | Biclusters with coherent evolution | OP-Cluster [71]

Table 2.1: Clustering algorithm classification.

2.1.3.3 Clustering evaluations Many attempts have been made to compare and evaluate clustering algorithms. In [30], Kriegel et al. group and classify many clustering techniques; the global structure of our section 2.1.3.2 comes from this work. Clustering algorithm evaluation can however be difficult to carry out, especially over a single reference dataset: the main problem is that some of these algorithms target different problems and therefore cannot be compared (e.g. biclustering and axis-parallel clustering). Several evaluations of clustering algorithms exist in the literature. In [31], Parsons et al. compare several clustering approaches and classify them according to their nature: either top-down or bottom-up. They present CLIQUE, ENCLUS, MAFIA, CBF [72], CLTREE [73], DOC [45], PROCLUS, ORCLUS, FINDIT, δ-Clusters [68] and COSA, and evaluate MAFIA and FINDIT. They use a different taxonomy than [30]: they do not separate axis-parallel and arbitrarily-oriented clustering algorithms; they instead consider that “Subspace clustering algorithms localize the search for relevant dimensions allowing them to find clusters that exist in multiple, possibly overlapping subspaces”. MAFIA exhibits better overall performance than FINDIT. In [74], Müller et al. evaluate several clustering algorithms. They present and test the following algorithms: CLIQUE, DOC, MineClus, SCHISM, SUBCLU, FIRES, INSCY [75], PROCLUS, P3C and STATPC [76]. In [77], Moise et al. analyze and evaluate several axis-parallel clustering algorithms. Following the taxonomy of [30], the studied subspace clustering algorithms are MAFIA and DiSH, and the studied projected clustering algorithms are PROCLUS, SSPC, HARP, MineClus, P3C, PreDeCon and FIRES. They also include STATPC, “which propose a statistical-based approach to projected and subspace clustering”. They finally include several full-dimensional clustering algorithms: k-means [28], EM [78], BAHC [79], DIANA [80], CLARANS [81] and DBSCAN. Table 2.2 presents each of the clustering algorithm evaluation works quoted above.

Title | Authors | Reference | Presented algorithms | Evaluated algorithms
Evaluating Subspace Clustering Algorithms | Lance Parsons, Ehtesham Haque, Huan Liu | [31] | CLIQUE, ENCLUS, MAFIA, CBF, CLTREE, DOC, PROCLUS, ORCLUS, FINDIT, δ-Clusters, COSA | MAFIA, FINDIT
Evaluating clustering in subspace projections of high dimensional data | Emmanuel Müller, Stephan Günnemann, Ira Assent, Thomas Seidl | [74] | CLIQUE, DOC, MineClus, SCHISM, SUBCLU, FIRES, INSCY, PROCLUS, P3C, STATPC | All
Subspace and projected clustering: experimental evaluation and analysis | Gabriela Moise, Arthur Zimek, Peer Kröger, Hans-Peter Kriegel, Jörg Sander | [77] | k-means, DBSCAN, BAHC, DIANA, CLARANS, EM, MAFIA, DiSH, PROCLUS, SSPC, HARP, MineClus, P3C, PreDeCon, FIRES, STATPC | All

Table 2.2: Clustering evaluations.

2.2 Network anomaly detection and intrusion detection

The work presented here reflects advances in the design of network anomaly and intrusion detection techniques. Approaches targeting the correlation or combination of network anomaly detection systems, such as [82], [83] or [84], are not presented here because they intend to reuse the results of network anomaly detection techniques instead of directly monitoring traffic measurements (either packet-based or flow-based). As such, the problem they address is located at a higher abstraction level.

2.2.1 Classic network anomaly and intrusion detection This section addresses the two main classical approaches used by commercial and academic anomaly detection systems: misuse detection based on signatures and anomaly-based detection. A brief reminder of the notions addressed in section 1.1.2 is needed here. Anomaly-based detection relies on the detection of a deviation from a normal state or profile, while network anomaly detection designates the detection of network anomalies. The presence of the words “anomaly detection” in both terms may introduce confusion; the two terms must not be confused.

2.2.1.1 Misuse detection-based intrusion and network anomaly detection Misuse detection-based systems detect anomalous events through the comparison of occurring events with a set of anomaly signatures.

The two most famous IDS or IPS relying on misuse detection are Snort and Bro [85]. Snort relies on packet inspection to detect intrusions and provides an up-to-date, well-documented and tested set of rules or signatures. Bro performs a stateful inspection of traffic in order to rebuild 5-tuple flows (source and destination IP addresses, source and destination port numbers, and protocol) and to find intrusions among these flows. Both use signatures to detect and signal events to network operators. URCA [86] aims at finding the root cause of an anomaly. It can be plugged into any network anomaly detection algorithm. It uses 10 features: source and destination Internet Protocol (IP) addresses, source and destination port numbers, input and output router interfaces, previous and next-hop AS numbers, and source and destination AS numbers. The identification of anomalous flows is realized by finding flows sharing feature values that impact the used anomaly detection algorithm. Once anomalous flows are identified, the algorithm builds several new features or coordinates: the average flow size in packets; the average packet size in bytes; for each of the 10 flow features, the entropy of the distribution of packets per feature value; and, for each of the 10 flow features, the fraction of feature values in the full link traffic that also appear in the root-cause traffic. URCA then uses the feature space constituted by all these coordinates to project both known anomalies and the anomalies to classify, and uses hierarchical clustering to group unclassified anomalies with known ones.
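As an illustration of one of these derived coordinates, the sketch below computes the entropy of the distribution of packets per feature value from an occurrence table mapping feature values to packet counts. This is a generic Shannon entropy computation written for this document, not URCA's actual implementation.

    (* Hypothetical sketch: Shannon entropy (in bits) of the distribution of
       packets per feature value. [counts] maps a feature value to its
       packet count. *)
    let entropy counts =
      let total = Hashtbl.fold (fun _ n acc -> acc + n) counts 0 in
      if total = 0 then 0.
      else
        Hashtbl.fold
          (fun _ n acc ->
            if n = 0 then acc
            else
              let p = float_of_int n /. float_of_int total in
              acc -. (p *. log p /. log 2.))
          counts 0.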

2.2.1.2 Anomaly detection-based network anomaly detection In order to cope with unknown anomalies, researchers have designed anomaly detection-based systems relying on the search for a deviation from a normal state. In [87], Zhang et al. present a unified framework for anomaly detection. They propose to separate anomaly detection into two categories: systems using either temporal correlation (among links or nodes) or spatial correlation in order to identify normal traffic. We will follow the same classification.

• We subdivide temporal correlation methods into three families: methods relying on forecasting, techniques relying on signal processing, and systems using inner characteristics of traffic and associated hypotheses. The first family of temporal correlation methods associates anomalies with deviations from forecasted behaviors. The standard techniques include the following models: Auto-Regressive Integrated Moving Average (ARIMA) models and deltoids [88]. An ARIMA model is noted ARIMA(p,d,q); the values p, d and q are the autoregressive, integrated, and moving average orders of the three corresponding terms of ARIMA models. Famous ARIMA models include the Exponentially Weighted Moving Average (EWMA), which actually is ARIMA(0,1,1), and linear exponential smoothing, or Holt-Winters, which is equivalent to ARIMA(0,2,2). The seminal work in this family is [17], which applies Holt-Winters forecasting to byte counts. In [89], the authors apply several ARIMA models on traffic-based sketches to detect anomalies; in their work, the use of sketches is motivated by the need to avoid per-flow statistics and to only use ARIMA models on a few subspaces/sketches. In [90], Roughan et al. introduce a correlation mechanism that uses both SNMP and Border Gateway Protocol (BGP) time-series in order to find anomalies; they respectively apply Holt-Winters and EWMA over these data.

In [91], Soule et al. use Kalman filters to extract the normal traffic from the traffic matrix. They then use several other tools such as variance analysis, Cumulative Summation, the Generalized Likelihood Ratio test, multi-scale analysis using variance, and multi-scale variance shift. In [88], Cormode et al. introduce novel algorithms to find significant changes in network stream data. They introduce the concept of deltoids, which can measure absolute differences, relative differences and variational differences inside time-series of network metrics. In [92], the authors use a non-Gaussian multi-resolution model applied to sketches in order to detect anomalies on single-link traffic; their correlation technique is not spatial but among sketches. This multi-resolution model detects changes in the short-time correlation structure instead of the volume changes detected by change-based anomaly detection systems. In [93], Brauckhoff et al. associate anomalies with a variation of flow feature distributions in two successive time bins, the variation being measured through the Kullback-Leibler divergence. The used flow features are the source IP address, destination IP address, source port number, destination port number and flow size in packets. Once anomalies are detected, they are characterized with association rule mining [94]. The second family of techniques in the temporal correlation category relies on signal processing. Barford et al. in [95] use wavelet filters to detect anomalies on single-link network traffic. They first split traffic into three components (low, medium and high frequency bands) and then find anomalies in the medium and high frequency bands by applying a threshold over the local deviation of variance. A great number of works have also tried to use PCA [25, 24] in order to extract anomalous traffic. As explained in section 2.1.1, PCA is an unsupervised learning technique that aims at finding correlated signals. It uses linear combinations of the original axes of a space to generate new axes onto which uncorrelated signals (called principal components) are successively projected in order of decreasing variance. The characteristics of normal traffic can be captured by a fixed number of components built by PCA; anomalous traffic is then supposed to be located outside of these components. Methods relying on several measurements made from different points use spatial correlation to extract normal traffic; these methods are presented in the next item. The techniques presented here instead use mathematical tools to create different points of view from traffic captured on a single measurement point, and use the correlation between these “artificial” points of view to extract normal traffic. In [96], Brauckhoff et al. use a Karhunen-Loeve expansion to improve PCA in order to take into account temporal correlations on single-link traffic. In [97], Kanda et al. propose a PCA-based detector designed to work on single-link backbone traffic; the used metric is packet counts. Their method has four stages. First, they build N subparts of traffic through sketches using source IP addresses. Second, they use sketches to build a second level of N′ traffic subparts out of each of the N traffic subparts. Third, they apply PCA on the N groups of N′ traffic subparts. Fourth, they identify anomalous source IP addresses by taking the intersection of the sets of anomalous source IP addresses. They compare their algorithm with the one of Dewaele et al. [92] using ground truth provided by heuristics.
They show that their algorithm finds anomalies different from the ones found by Dewaele et al. They also introduce a heuristic to automatically determine the number of first components of PCA.

They choose the number of components that allows them to capture 70% of the cumulative proportion of the first components; the 70% threshold is empirically chosen. In [87], Zhang et al. present and compare several methods from these first two families: EWMA, Holt-Winters, Fourier analysis through the Fast Fourier Transform (FFT), wavelet analysis as in [98] and temporal PCA. They find that ARIMA and Sparsity-L1 are the best methods to respectively forecast traffic volumes and infer Origin-Destination flows (OD-flows) from the traffic matrix. The third family contains methods that detect anomalies by making hypotheses about the nature of traffic flows. ASTUTE (A Short-Timescale Uncorrelated-Traffic Equilibrium), presented in [99], relies on equilibrium properties of traffic flows. In the ASTUTE model, flows crossing a link of interest are generated by a discrete-time marked point process [100], where the mark process determines both the flow's duration and its volume per time bin. Considered flows are standard 5-tuple flows. The used equilibrium properties are the following: independence between flow characteristics (start time, duration and volumes), and stationarity over time of the distributions of the flow arrival process and the mark process. The former property is backed by Barakat et al. in [101] and Hohn et al. in [102], while the latter is experimentally verified on real traffic. ASTUTE is thus able to identify anomalies that methods such as Kalman filters or wavelets are unable to find.

• The other category of anomaly detection systems relies on the spatial correlation of traffic. The pioneering contributions in this category are the papers published by Lakhina, Crovella and Diot [18, 103, 104] on network-wide traffic anomaly detection. The rationale of this technique is that, while anomalies only affect a few links, normal traffic has the same characteristics across links. Thus, by applying PCA to the whole traffic matrix, the normal traffic characteristics are captured by the first components extracted by PCA. The number of first principal components used to represent normal traffic is determined through careful traffic analysis. The metrics or features successively used in these papers are: byte counts in [104]; packet counts, byte counts and IP flow counts in [103]; and entropy values for the distributions of several features (source IP address, destination IP address, source port, and destination port) in [18]. In [104], the authors show that this technique is able to detect anomalies, identify anomalous OD-flows and quantify them in terms of byte counts. In [18], they introduce an anomaly classification technique relying on clustering that enables them to group similar anomalous OD-flows. In [105], Li et al. extend this work by adding a sketch-based processing step that is capable of identifying anomalous IP flows. In [106], Chhabra et al. use a spatial correlation method to process byte counts at two levels: first, between the links on a router, and second, between adjacent routers. This avoids the use of a central processing point where the traffic matrix is processed. Anomalies are detected through generalized quantile sets and the false discovery rate, and are associated with outliers⁴. However, PCA exhibits several limitations. For example, “normal” principal components can be poisoned and thus allow anomalies to stay undetected [7, 107, 108].

⁴ An outlier belonging to a point set is a point isolated from the majority of the points of this point set.

Moreover, in [7], Ringberg et al. expose three other weaknesses of PCA-based anomaly detection. First, the false-positive rate⁵ is very sensitive to the number of normal components; the lack of reliable heuristics to determine this number of components increases the overall lack of robustness. Second, they state that PCA effectiveness is very sensitive to the traffic aggregation used: input links, OD-flows or ingress routers. Finally, they show that anomalous flow identification relying on PCA only is difficult because of the very nature of PCA. In [109], Rubinstein et al. introduce ANTIDOTE in order to improve poisoning resilience; it uses a modified version of the PCA-subspace method called PCA-Grid. Another attempt toward the same goal is presented in [110]: Abdelkefi et al. use Principal Component Pursuit and show that it is more robust than classic PCA and the extension presented in [96]. PCA is an unsupervised method per se (cf. section 2.1). In the case of anomaly detection, and as stated in [7], the number of principal components that represent normal traffic is a critical parameter. In [18, 103, 104], Lakhina et al. use an empirically determined number of principal components. In [97], Kanda et al. use the cumulative variance captured by the first principal components as a criterion to find this number. In the light of the sensitivity of PCA towards the number of principal components, an unsupervised use of PCA would require heuristics (such as the one used in [97]) to automatically determine this number. However, Ringberg et al. quote the cumulative variance captured by the first principal components as a criterion to avoid: according to them, this heuristic shall not be used because different networks exhibit different disparities. This result is consistent with the conclusions of [97]; in fact, Kanda et al. used this criterion on the MAWI dataset only. Cumulative variance may therefore be used to determine the number of principal components when the considered traffic is consistent. This allows one to keep the same threshold on the cumulative variance and thus use PCA in an unsupervised way. We therefore choose to classify PCA-based systems as anomaly-based network anomaly detection systems.
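To fix ideas on how such a subspace method flags anomalies once the number of principal components is chosen, the following sketch computes the residual of a measurement vector with respect to the normal subspace. It is a minimal illustration under our own assumptions: the k first principal components are supposed already extracted and orthonormal, and the threshold plays the role of the Q-statistic limit used in [104]. None of this code comes from the cited works.

    (* Hypothetical sketch of the detection step of the PCA subspace method:
       flag a measurement vector x when the squared norm of its residual,
       (I - P P^T) x, exceeds a threshold. [p] is an array of k orthonormal
       principal components spanning the "normal" subspace. *)
    let dot a b =
      let s = ref 0. in
      Array.iteri (fun i ai -> s := !s +. (ai *. b.(i))) a;
      !s

    let residual_sq_norm p x =
      let proj = Array.make (Array.length x) 0. in
      Array.iter
        (fun comp ->
          (* Add the projection of x on this component. *)
          let c = dot comp x in
          Array.iteri (fun i v -> proj.(i) <- proj.(i) +. (c *. v)) comp)
        p;
      let r = ref 0. in
      Array.iteri
        (fun i xi ->
          let d = xi -. proj.(i) in
          r := !r +. (d *. d))
        x;
      !r

    let anomalous threshold p x = residual_sq_norm p x > threshold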

2.2.2 Machine learning applied to network anomaly detection and intrusion detection Works in the anomaly detection and intrusion detection fields have tried to apply machine learning in order to improve the performance of classical approaches. The first of the next three sections covers supervised network anomaly detection. Compared to the network anomaly detection field, the intrusion detection field has seen a much more prolific application of machine learning: it embraced both supervised and unsupervised learning in order to build intrusion detection algorithms. Several papers provide comparisons between such algorithms. In [111], Sadoddin et al. provide a good introduction to machine learning applied to intrusion detection and compare several algorithms

⁵ The false positive rate is a statistical measure of the performance of a binary classification test. In the network anomaly detection case, it is the ratio of the number of normal instances falsely classified as anomalous over the total number of normal instances.

(K-means, C-means, Expectation Maximization (EM), Self-Organizing Maps (SOM), Y-means and Improved Competitive Learning Network (ICLN) [112]). This work also introduces the two possible modes of clustering techniques, the indirect and direct modes, which respectively correspond to the supervised and unsupervised modes. In [113], Gogoi et al. survey outlier detection techniques applied to anomaly detection-based intrusion detection. They address works using both supervised and unsupervised approaches, and present works from different authors, among them [114], which we also present later. It is interesting to note that every intrusion detection technique presented here is evaluated on the KDD dataset [115] (with the notable exception of [116], which targets Address Resolution Protocol (ARP) traffic). The KDD dataset contains both network and system call traces with ground-truth labeled anomalies and intrusions; it thus allows one to use both supervised and unsupervised techniques. The availability of such a dataset may explain the wide use of machine learning in the intrusion detection field. It also provides a possible cause for the emergence of unsupervised techniques in the intrusion detection field. The next sections successively address supervised learning-based network anomaly detection, supervised learning-based intrusion detection and unsupervised learning-based intrusion detection.

2.2.2.1 Misuse and anomaly-based supervised network anomaly detection The literature contains several examples of works applying machine learning to the network anomaly detection problem. In [19], Shon et al. apply several machine learning techniques to anomaly-based network anomaly detection systems. They use techniques such as Self-Organizing Feature Maps (SOFM), Genetic Algorithms (GA) and Support Vector Machines (SVM) in order to first build a profile of normal traffic and then detect novel anomalies deviating from this profile. In [20], Duffield et al. demonstrate that a misuse detection system using flow features and trained with Snort alarms can be as efficient as Snort itself without the expensive payload inspection. In [117], Zi et al. use a modified version of the k-means clustering algorithm called Modified Global K-Means (MGKM) to characterize DDoS attacks. They first inspect traffic in order to select the variables that will be used for the clustering. The selected variables are the entropy of source IP address and port number, the entropy of destination IP address and port number, the entropy of packet type, the number of packets, and the occurrence rate of packet types (Internet Control Message Protocol (ICMP), User Datagram Protocol (UDP), Transmission Control Protocol (TCP), SYN). They then apply the MGKM algorithm over the KDD dataset. Feature ranking is applied on the clustering results, knowing the expected result of the clustering algorithm; the results from this step are then used to recalculate the clusters in a more efficient manner. The system is thus able to find an optimal set of features that characterizes the targeted anomaly (here, DDoS). The use of machine learning for both subcategories of network anomaly detection (misuse and anomaly detection) proves that machine learning can be very useful to automate the tuning of the previously exposed algorithms. However, as stated in section 1.1.2, they all rely on the availability of training data for either normal or anomalous traffic, which in both cases is difficult to obtain.

2.2.2.2 Supervised intrusion detection Many contributions in the intrusion detection field use supervised learning in order to characterize both normal and anomalous traffic. The majority of works use a two-step technique: they first train their algorithm in order to build normal and anomalous profiles, and then apply the built profiles to the dataset under test. In [118], Kuang et al. apply an algorithm called Combined Strangeness and Isolation measure K-Nearest Neighbor (CSI-KNN) on normal and anomalous training sets. In [119, 120], the authors present a method called CCA-S (Clustering and Classification Algorithm - Supervised) to incrementally build models of anomalous and normal traffic. In [121], Burbeck et al. use a customized version of the clustering algorithm BIRCH [122] to first build a model of normality and then incrementally update this model in order to cope with normality changes. In [116], the authors train their algorithm, composed of k-means and an ID3 decision tree, over normal ARP traffic and then identify anomalous behaviors. The Audit Data Analysis and Mining (ADAM) system presented in [123] applies a slightly different method over the KDD dataset. Instead of using a two-step method relying on training and testing on normal or anomalous labeled datasets, it uses a three-step method: it first trains normal models on anomaly-free data, then builds anomalous models by using the normal model as a negative over real data, and finally tests the system on real traffic.

2.2.2.3 Unsupervised intrusion detection The other axis of machine learning use for intrusion detection is its direct application in the sense of [111], i.e. through unsupervised learning. The seminal works in this field are the articles by Portnoy et al. [23] and Eskin et al. [124]. In [23], the authors build a system trained on the unlabeled training data from KDD. The anomalousness of each data segment is assessed through hypothesis 1.2. This is the first fundamental advance towards unsupervised, training-independent IDS: it allows their system to work without expensive-to-produce labels. They then improve this first advance with a new system [124] that does not require training at all. In [124], they compare three different algorithms, cluster-based estimation, K-Nearest Neighbor and one-class SVM, to build partitions over test data on-the-fly. In [125, 126], the authors also attempt to apply unsupervised techniques to intrusion detection. In [126], Oldmeadow et al. adapt fixed-width clustering to time-varying traffic characteristics in order to cope with traffic variability; they also introduce a feature weighting mechanism based on manual inspection. In [125], Leung et al. modify the pMAFIA [39] clustering algorithm (previously presented in section 2.1.3.2) to create the fpMAFIA (frequent pattern in pMAFIA) algorithm; they thus provide greater scalability at the expense of accuracy. In [114], Zhang et al. use outlier detection to find intrusions inside the KDD dataset by adding an outlier detection capacity to their clustering algorithm.

2.3 Summary

Unsupervised learning techniques have been designed and studied for a very long time. They aim at extracting information from data while using little knowledge about its structure. Three families of such methods exist: blind signal separation, neural network models and clustering. We presented each of these families in this chapter, with a special emphasis on clustering since it is the method that we later chose to use; the following chapter further motivates this choice. We also presented a non-exhaustive picture of previously proposed network anomaly detection algorithms. We classified these systems into two categories: temporal correlation-based and spatial correlation-based. We also listed several works that apply machine learning techniques in the network anomaly detection field but also in the intrusion detection field. Machine learning applications in the intrusion detection field are interesting in the sense that, contrary to network anomaly detection, they use unsupervised techniques. The following chapter presents our own application of unsupervised learning to the network anomaly detection field.

Chapter 3

Unsupervised network anomaly detection

This chapter addresses the design of our network anomaly detection algorithm. We named it UNADA, which stands for Unsupervised Network Anomaly Detection Algorithm. As previously exposed in sections 1.1.3 and 1.2, it aims at finding or mining anomalies inside traffic without using any previous knowledge about its structure. In order to accomplish this task, it relies on unsupervised learning, especially clustering, and on the two simple hypotheses presented in section 1.1.3. The algorithm runs in three consecutive stages. First, a pre-processing step is applied on network traffic: a time-series-based change detection algorithm is applied on several simple metrics such as the number of bytes, packets or flows. Any time slot flagged as anomalous is then processed through multi-resolution flow aggregation and feature computation to build traffic features on aggregated flows. This first step is depicted on the upper part of Figure 3.1, inside the dotted frame with index 1. The unsupervised learning-based detection algorithm is depicted on the lower part of Figure 3.1 and is located inside the dotted frame with index 2; this second stage takes as input the set of flows built by the first step. The concepts presented in this chapter have been published or accepted in [127, 128, 129, 130, 131] and exposed in the ECODE deliverables D3.2 [132] and D3.3 [133]. The third step processes the anomalous flows extracted by the second step and is detailed in the next chapter; it is contained inside the dotted frame with index 3 on Figure 3.1.

• Section 3.1 describes change detection, flow aggregation and attribute computation (part 1 in Figure 3.1).

• Section 3.2 exposes our unsupervised algorithm to detect anomalies (part 2 in Figure 3.1).

• Section 3.3 presents an evaluation of the computational time of our algorithm.


Figure 3.1: High-level description of our approach.

3.1 Change-Detection, Multi-resolution Flow Aggregation & Attribute computation

This first stage of UNADA aims at finding evidence of an occurring anomaly and, when such evidence is found, at processing and formatting the traffic in order to provide semantically interesting data to the unsupervised detection step. The techniques described in this section are represented on Figure 3.2.


Figure 3.2: High-level description of our approach.

UNADA is designed to process single-link, packet-level traffic. Our algorithm works on traffic captured in consecutive time slots of fixed length, noted ∆T or ws. UNADA first takes the whole traffic as input and tries to find evidence that an anomaly may be occurring. In case of a detected anomaly, traffic is aggregated into flows at several aggregation levels. Finally, several features or attributes are built in order to provide insights on the nature of the flows. The next three sections present these three steps.

3.1.1 Change detection The first step of UNADA is the change detection step. To detect an anomalous time slot, time-series Z^i are constructed for simple traffic metrics such as the number of bytes, packets, and IP flows per time slot. Any generic change-detection algorithm F(.) based on time-series analysis (cf. section 2.2.1.2) is then used on Z^i, Z^i being the time-series for the metric of index i and Z^i_t being the value of metric i for the t-th time slot. At each new time slot, F(.) analyzes the different time-series associated with each metric. Time slot t0 is flagged as anomalous if F(Z^i_t0) triggers an alarm for any of the traffic metrics. In our implementation, the change-detection algorithm F(.) uses deltoids as they are defined in [88].
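The deltoid algorithm itself is involved; as a simpler illustration of what a generic F(.) may look like, the sketch below flags a time slot when the new value of a metric deviates from an exponentially weighted forecast by more than k standard deviations. This EWMA-based detector is a stand-in written for this document, not the deltoid implementation actually used by UNADA.

    (* Hypothetical EWMA-based stand-in for the generic change-detection
       function F(.): one state per metric time-series Z^i, updated at each
       time slot. This is not the deltoid algorithm of [88]. *)
    type ewma = { mutable mean : float; mutable var : float }

    let make_ewma z0 = { mean = z0; var = 0. }

    (* Update the state with Z^i_t and report whether the slot is anomalous
       for this metric: |Z^i_t - forecast| > k * sigma. *)
    let flag ?(alpha = 0.1) ?(k = 3.0) st zt =
      let dev = zt -. st.mean in
      let alarm = st.var > 0. && abs_float dev > k *. sqrt st.var in
      st.mean <- st.mean +. (alpha *. dev);
      st.var <- (1. -. alpha) *. (st.var +. (alpha *. dev *. dev));
      alarm

    (* A time slot is flagged when any of the metrics (e.g. bytes, packets,
       SYN packets) raises an alarm. *)
    let slot_flagged states values =
      List.exists2 (fun st z -> flag st z) states values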

3.1.2 Multi-resolution Flow Aggregation Once an alarm has been raised by the change detection stage, multi-resolution flow aggregation is applied on the traffic. IP flows can be aggregated at different flow-resolution levels. The flexibility of our multi-resolution aggregation system allows us to use many aggregation levels if needed, like source Network Prefixes (l1,2,3: IPsrc/8, /16, /24) and destination Network Prefixes (l4,5,6: IPdst/8, /16, /24).
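A minimal sketch of this aggregation step, assuming IPv4 addresses represented as int32 values and a hypothetical accessor returning the aggregation address of a packet: flows are built by masking the address to the desired prefix length.

    (* Hypothetical sketch of multi-resolution flow aggregation: packets are
       grouped into aggregated flows keyed by their source (or destination)
       address masked to a /8, /16 or /24 prefix. *)
    let mask_prefix (addr : int32) len =
      if len <= 0 then 0l
      else Int32.logand addr (Int32.shift_left (-1l) (32 - len))

    (* [addr_of] extracts the aggregation address (IPsrc or IPdst) of a packet. *)
    let aggregate addr_of prefix_len packets =
      let flows = Hashtbl.create 1024 in
      List.iter
        (fun pkt ->
          let key = mask_prefix (addr_of pkt) prefix_len in
          let flow = try Hashtbl.find flows key with Not_found -> [] in
          Hashtbl.replace flows key (pkt :: flow))
        packets;
      flows

    (* One aggregated-flow table per level l1..l3 (IPsrc/8, /16, /24). *)
    let multi_resolution addr_of packets =
      List.map (fun len -> (len, aggregate addr_of len packets)) [ 8; 16; 24 ]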

3.1.3 Attribute building The attribute computation step is depicted on the right of Figure 3.2. It takes as input all packets and aggregated flows in the flagged time slot. Its goal is to generate meaningful attributes that will further allow us to separate normal traffic from anomalous traffic. We propose a framework to generate attributes over traffic; its design aims at providing modularity and easy modification. The remainder of the section is structured as follows: we first define a metric, then we explain how metrics are built, and finally we describe the technique used to create attributes over metrics.

3.1.3.1 Metric definition We define two types of metrics over a set of packets. The first type of metric is called a simple metric and can only contain a single value; this type of metric can, for example, be used to store the total number of packets in the set. The second type is a distribution metric: it contains a distribution, i.e., a structure that stores several values and their respective numbers of occurrences. A distribution metric can, for example, be used to store the source IP addresses.

3.1.3.2 Metric building We identify two types of data inside network packets: meta-data and data contained in the packets. Meta-data is data that represents packet characteristics that are not embedded in the payload, e.g. the number of packets in the set or the size of a packet. Meta-data is described through simple metrics. Table 3.1 gives two examples of simple metrics related to meta-data.

Metric name | Type of metric
nb packets | Simple metric
size of packets | Simple metric

Table 3.1: Metrics built on meta-data.

The second type is the actual data contained in the payload of the packet. Inside the payload, we separate the fields of a packet header into two categories. First, fields that can only take two values (n = 2), usually meaning true or false: such a field is represented as a simple metric that counts the number of packets in the set having the field set to true. Second, fields that can take a number of values n > 2: in this case, each field is represented by a distribution metric. Table 3.2 presents the metrics used to characterize the packet payload; a minimal sketch of this metric-building step is given after the table.

Metric name | Type of field in header | Type of metric
src addr | Int32 | Distribution metric
dest addr | Int32 | Distribution metric
src port | Int | Distribution metric
dest port | Int | Distribution metric
nb icmp packet | Bit | Simple metric
nb echorequestreply packet | Bit | Simple metric
nb syn packet | Bit | Simple metric
nb rst packet | Bit | Simple metric

Table 3.2: Packet fields and their associated metrics used by our algorithm.
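As announced above, here is a minimal sketch of the metric-building step over a packet set. The packet record below is a hypothetical reduction of the fields of Tables 3.1 and 3.2: the two-valued SYN field is counted into a simple metric, while the multi-valued source address feeds a distribution metric.

    (* Hypothetical sketch of metric building over a set of packets. *)
    type packet = { src_addr : int32; dest_port : int; syn : bool; size : int }

    let build_metrics packets =
      let src_dist = Hashtbl.create 64 in           (* distribution metric *)
      let nb_packets = ref 0 and nb_syn = ref 0 in  (* simple metrics *)
      List.iter
        (fun p ->
          incr nb_packets;
          if p.syn then incr nb_syn;
          let c = try Hashtbl.find src_dist p.src_addr with Not_found -> 0 in
          Hashtbl.replace src_dist p.src_addr (c + 1))
        packets;
      (!nb_packets, !nb_syn, src_dist)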

3.1.3.3 Attributes generation over packet sets Once metrics are built over the aggregated flows¹, the UNADA algorithm builds attributes on top of them. Each metric (simple or distribution) generates intermediate attribute(s) that are further used to build attributes. A simple metric is transformed into a single intermediate attribute called value, which contains the value of the simple metric. A distribution metric generates two intermediate attributes. The first one is nb of elements: it represents the number of different values in the distribution. The second one is ratio bigst occur over nb occur: as its name states, it represents the ratio between the biggest occurrence count in the distribution and the total number of occurrences. This intermediate attribute can be related to entropy: a high entropy will give a ratio value close to 0, while a low entropy is equivalent to a value close to 1. The used attributes are listed in Table 3.3. The attribute nb destinations is built from the intermediate attribute nb of elements of the distribution metric dest addr; the same goes for every other attribute, with different simple or distribution metrics. The attributes presented in Table 3.3 are the ones used throughout this manuscript. It is interesting to notice that when flows are not aggregated, some of these attributes have a very low semantic value. For example, if one considers traffic flows aggregated according to the destination IP address, the attribute nb destinations is of poor interest. One can also note that adding a new attribute on a new metric is extremely simple thanks to the modularity and expressiveness of our framework. A minimal sketch of this attribute generation step is given after Table 3.3.

¹ In order to reduce the amount of processing, metrics are first built on atomic flows. The metrics of aggregated flows are later obtained by merging the metrics of the flows aggregated together.

Attribute | Description | Intermediate attribute used | Metrics used
nDests | # different destination IP addresses | nb of elements | dest addr
nSrcs | # different source IP addresses | nb of elements | src addr
nPkts/nDiffDestPort | # packets divided by # different destination ports | nb of elements | nb packets, dest port
nDiffSrcAddr/nDiffDestAddr | # different source IP addresses divided by # different destination IP addresses | nb of elements | src addr, dest addr
nICMP/nPkts | # ICMP packets divided by # packets | value | nb icmp packet, nb packets
nEchoReqReply/nPkts | # ICMP echo-request or reply packets divided by # packets | value | nb echorequestreply packet, nb packets
nSYN/nPkts | # SYN packets divided by # packets | value | nb syn packet, nb packets
nRST/nPkts | # RST packets divided by # packets | value | nb rst packet, nb packets
bgstDestPort/tNbOccuDestPort | # occurrences of the most occurring destination port divided by total # of destination port occurrences | ratio bigst occur over nb occur | dest port

Table 3.3: Attributes of aggregated flows derived from traffic.
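As announced above, here is a minimal sketch of the intermediate-attribute generation of section 3.1.3.3, written for this document with our own type names: a simple metric yields its value, while a distribution metric yields its number of distinct elements and the ratio between the biggest occurrence count and the total number of occurrences.

    (* Hypothetical sketch of intermediate-attribute generation. *)
    type metric =
      | Simple of float
      | Distribution of (int32, int) Hashtbl.t

    let intermediate_attributes = function
      | Simple v -> [ ("value", v) ]
      | Distribution d ->
          let biggest, total =
            Hashtbl.fold (fun _ n (big, tot) -> (max big n, tot + n)) d (0, 0)
          in
          let ratio =
            if total = 0 then 0.
            else float_of_int biggest /. float_of_int total
          in
          [ ("nb_of_elements", float_of_int (Hashtbl.length d));
            ("ratio_bigst_occur_over_nb_occur", ratio) ]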

3.1.4 Summary The first step of our algorithm is a pre-processing step which performs several successive tasks. We first try to find anomalous behaviors by looking at traffic volume time-series. Then, in case of a suspected anomaly, we use a multi-resolution aggregation scheme to build aggregated traffic flows. Finally, we create several attributes on traffic to characterize the previously built flows. Once a time slot has been flagged and attributes have been built, UNADA tries to find anomalies inside the traffic. In order to achieve this goal, it uses unsupervised learning techniques.

3.2 Finding anomalies with unsupervised learning

Following change-detection, multi-resolution flow aggregation and attribute generation, our algorithm uses unsupervised learning techniques to extract anomalies. Our problem can be formulated as follows: considering the hypotheses of section 1.1.3, we want to extract the normal and anomalous traffic parts. Let Y = {y1, .., yF} be the set of F aggregated flows in the flagged slot, considering a specific aggregation. We note H the number of hosts; the value of F is thus directly linked to H through F ≤ H. Each flow yf ∈ Y is described by a set of A traffic attributes or features (cf. section 3.1.3.3 and Table 3.3). Let xf ∈ R^A be the corresponding vector of traffic features describing flow yf, and X = {x1, .., xF} the complete matrix of features, referred to as the feature space. The unsupervised detection starts after the steps presented in section 3.1 and aims at assessing the abnormality of each flow yf using unsupervised learning.

Section 3.2.1 addresses how the anomalous and normal traffic parts look in the feature space. Section 3.2.2 explains why we chose to create our own clustering algorithm. Section 3.2.3 details our clustering approach. The unsupervised learning-based algorithm is then further detailed in the two final sections: section 3.2.4 exposes splitting through a subspace clustering algorithm and section 3.2.5 addresses combining through Evidence Accumulation and Inter-Clustering Results Association.

3.2.1 Traffic representations in feature space This section details the representations of the anomalous and normal traffic parts in the feature space. It is organized in two parts: we first detail how normal traffic is represented inside the feature space, and then address the anomaly representations.

3.2.1.1 Normal traffic representations in feature space Section 3.1.2 states that the traffic representation instances used are aggregated flows. Hypothesis 1.2 is hence refined in this context:

Hypothesis 3.1 The majority of aggregated flows are normal traffic. This is equivalent to: Y% of aggregated flows are normal, with Y > 50%. Hypothesis 3.1 states that the majority of the aggregated flows are normal. Hypothesis 1.1 says that anomalies are statistically different from normal traffic. Considering the whole feature space, normal traffic can then be viewed as one or several cluster(s) containing at least half of the aggregated traffic flows. This hypothesis is refined for our experiments in section 3.2.6 and is further discussed in section 6.2.1.

3.2.1.2 Anomalies representations in feature space This section describes how anomalies are represented inside the feature space in terms of clusters and outliers. Anomaly characteristics and their impact on anomaly representation inside the feature space are detailed in Table 3.4. Traffic anomalies can be roughly grouped into three different classes, depending on their spatial structure and number of impacted IP flows: 1-to-1 anomalies, 1-to-N anomalies and N-to-1 anomalies. 1-to-1 anomalies are point-to-point anomalies such as DoS or port scans. 1-to-N anomalies involve many IP flows from the same source towards different destinations; examples include network scans and spreading worms/viruses. On the other hand, N-to-1 anomalies involve IP flows from different sources towards a single destination; examples include DDoS attacks and flash-crowds. The distributed nature of anomalies has a great impact on their representations inside the feature space. In fact, 1-to-1 or point-to-point anomalies include a single flow and will therefore always constitute a single outlier, no matter which aggregation level li is used. Distributed anomalies (i.e. 1-to-N or N-to-1) behave differently. Suppose, for example, that a network scan targets several machines located in T /24 prefixes that are themselves contained in a single /16 prefix.

Then, if one looks at traffic aggregated by IPsrc/8, /16 or /24, there will be a single flow from the source of the scan; this flow will be represented inside the feature space as an outlier. If one looks at traffic aggregated by IPdst/8 or /16, one will also observe a single flow, targeting all the machines located in the same /16 prefix; this flow will also be an outlier. However, if one aggregates traffic through IPdst/24, the scan will be constituted of T flows, each targeting one of the T /24 prefixes; the network scan will therefore be represented as a cluster of T points or flows. Clusters, by definition, contain several points and can thus be considered as stronger evidence of anomalousness than outliers. Using the IPdst key permits finding clusters representing 1-to-N anomalies and thus highlighting them, while N-to-1 anomalies are more easily detected with the IPsrc key. The choice of both keys for the clustering analysis ensures that even highly distributed anomalies, which may possibly involve a large number of IP flows, can be represented as clusters or outliers.

Anomaly | Distributed nature | Aggregation type | Representation in feature space | Impact on traffic features
ICMP DoS | 1-to-1 | IPsrc/∗ ; IPdst/∗ | Outlier ; Outlier | nSrcs = nDsts = 1, nICMP/nPkts > λ1, nEchoReqReply/nPkts > λ2
SYN DoS | 1-to-1 | IPsrc/∗ ; IPdst/∗ | Outlier ; Outlier | nSrcs = nDsts = 1, nSYN/nPkts > θ1, bgstDestPort/tNbOccurenceDestPort > θ2
ICMP DDoS to several @IP/24 | N-to-1 | IPsrc/24 (l3) ; IPsrc/16 (l4) ; IPdst/∗ | Cluster ; Outlier ; Outlier | nDsts = 1, nSrcs > β1, nICMP/nPkts > β2, nEchoReqReply/nPkts > β3
SYN DDoS to several IP/24 | N-to-1 | IPsrc/24 (l3) ; IPsrc/16 (l4) ; IPdst/∗ | Cluster ; Outlier ; Outlier | nDsts = 1, nSrcs > δ1, nSYN/nPkts > δ2, bgstDestPort/tNbOccurenceDestPort > δ3
SYN DDoS attack response to several @IP/24 | 1-to-N | IPsrc/∗ ; IPdst/24 (l6) ; IPdst/16 (l7) | Outlier ; Cluster ; Outlier | nSrcs = 1, nDests > α1, nRST/nPkts > α2
Port scan | 1-to-1 | IPsrc/∗ ; IPdst/∗ | Outlier ; Outlier | nSrcs = nDsts = 1, nSYN/nPkts > ν1
Network scan to several @IP/24 | 1-to-N | IPsrc/∗ ; IPdst/24 (l6) ; IPdst/16 (l7) | Outlier ; Cluster ; Outlier | nSrcs = 1, nDests > ρ1, nSYN/nPkts > ρ2, bgstDestPort/tNbOccurenceDestPort > ρ3
Spreading worms to several @IP/24 | 1-to-N | IPsrc/∗ ; IPdst/24 (l6) ; IPdst/16 (l7) | Outlier ; Cluster ; Outlier | nSrcs = 1, nDsts > η1, nSYN/nPkts > η2, bgstDestPort/tNbOccurenceDestPort > η3

Table 3.4: Anomalies detailed according to their distributed nature, representations in feature space and impact on traffic features. Anomalies of distributed nature 1-to-N or N-to-1 involve several /24 (source or destination) prefixes contained in a single /16 prefix.

Table 3.4 describes the impact of each anomaly on several traffic attributes or features. All the thresholds used in the description are introduced to better explain the impact of an anomaly over some of these features. DoS/DDoS attacks are characterized by many packets sent from one or more source IPs towards a single destination IP. These attacks generally use particular packets such as TCP SYN or ICMP echo-reply, echo-request, or host-unreachable packets. Port scans involve packets from one source IP to a single destination IP, and are usually performed with SYN packets. Network scans are characterized by flows containing SYN packets sent from one source IP towards several destination IPs, on a single destination port. Spreading worms differ from network scans because they are directed towards a small specific group of ports for which there is a known vulnerability to exploit (e.g. Blaster on TCP port 135, Slammer on UDP port 1434, Sasser on TCP port 445).

Some of these attacks can use other types of traffic, such as FIN, PUSH or URG TCP packets, or small UDP datagrams.

3.2.2 Why we created our own clustering recipe The unsupervised learning technique that we use needs to be able to:

• Detect both outliers and clusters in the feature space.

• Use the original easy-to-understand attributes in order to ease the work of network operators.

According to our state of the art in section 2.1, three families of techniques are available. Techniques of the Blind Signal Separation family extract hidden signals from several mixed signals, relying on different hypotheses like correlation (PCA) or independence (ICA). Such techniques use linear combinations between dimensions or attributes to extract correlated or uncorrelated signals. Since we want to keep easy-to-understand attributes, we shall avoid this family of techniques, which uses linear combinations of the original features. Techniques belonging to the neural network family, like self-organizing maps, do not have a clear and established corpus of methods to extract clusters and outliers from the feature space, although works targeting this goal have been published, like [26]. On the other hand, clustering techniques explicitly try to group similar instances and thus perfectly fit our needs. Among the clustering techniques exposed in section 2.1, axis-parallel clustering methods such as PROCLUS, MineClus or MAFIA are the ones that best suit our constraints, since they are designed to build clusters while keeping the original attributes. Chronologically, we first quickly prototyped unsupervised learning in Matlab². We tested several classic algorithms, like the Matlab built-in k-means and a freely available Matlab implementation of DBSCAN³. We established that these techniques are not suited for clustering in high-dimensional data, but work well in low-dimensional spaces. At this point, two options were available to us: either re-use these results to build partitions of the whole feature space using our own algorithm, or use one of the techniques exposed in section 2.1. We chose the first solution, for several reasons. First, the vast majority (if not all) of the algorithms previously exposed are implemented inside the ELKI framework [134] or the WEKA framework [135]. These frameworks aim at easing the use of machine learning algorithms for non-specialists by providing simple interfaces to many machine learning techniques. They are both implemented in Java. On the other hand, our tool is implemented in OCaml⁴. This choice has been motivated by OCaml features: conciseness, strong typing, type inference, multi-paradigm programming (functional, imperative and class-oriented) and the fact that libpcap⁵ bindings already exist. A more complete advocacy for the use of OCaml can be found in [136]. OCaml-to-Java bindings are possible through CamlJava⁶; however, the use of Java introduces a potential performance problem.

² http://www.mathworks.fr/products/matlab/index.html
³ https://code.google.com/p/dmfa07/downloads/detail?name=DBSCAN.M
⁴ http://caml.inria.fr/
⁵ http://www.tcpdump.org/
⁶ http://forge.ocamlcore.org/projects/camljava/

Moreover, such an algorithm would have appeared to our code as a black box, and thus would have been hard to debug if needed, since we have limited knowledge about its internal implementation. A solution to these problems is to re-implement the empirically determined best-fitted clustering technique in OCaml, but this represents a huge amount of work. On the contrary, the re-implementation of the DBSCAN algorithm in OCaml (using the Matlab implementation that we worked with before as a blueprint), and the programming of some simple techniques to process its results, represent a smaller amount of work. We therefore decided to build our own system that uses the good performance of the DBSCAN algorithm in low-dimensional spaces in a splitting step, as exposed in section 3.2.4, and, on top of this, combines these clustering results with simple techniques based on the notion of clustering ensemble [137], as addressed in section 3.2.5. We however think that the algorithms presented in section 2.1 could be of great interest to address scalability issues (cf. section 6.2.2).

3.2.3 General presentation

UNADA is based on clustering techniques applied to the feature space X. X contains F rows, one for each of the aggregated flows in the flagged time slot. Each row is composed of A values that represent the A traffic features. The objective of clustering is to partition a set of unlabeled samples into homogeneous groups of similar characteristics, or clusters, based on a measure of similarity. Samples that do not belong to any of these clusters are classified as outliers. Our particular goal is to identify both those clusters and the outliers. The most appropriate approach to find outliers is, ironically, to properly identify clusters. We have developed a divide-and-conquer clustering approach, using the notions of clustering ensemble [137] and multiple clustering combinations. The idea is novel and appealing: why not take advantage of the information provided by multiple partitions of X to improve clustering robustness and the identification of clusters and outliers? A clustering ensemble P consists of a set of multiple partitions P_n produced for the same data. Each of these partitions provides different and independent evidence of the data structure, which can be combined to construct a global partition that reflects natural groupings, clusters, and outliers. There are different ways to produce a clustering ensemble. We use Sub-Space Clustering (SSC) to produce multiple data partitions, doing density-based clustering in N different sub-spaces X_n of the original space X. We refer the reader to the left part of Figure 3.3 to better understand this approach. Once these N partitions have been obtained, we apply a method of our own design in order to combine them and thus build a single general partition over the whole feature space. This method is depicted on the right of Figure 3.3.

3.2.4 Splitting with Sub-Space Clustering

In order to use unbiased attribute values for clustering, we normalize X into X_norm. Each of the N sub-spaces X_n ⊂ X_norm is obtained by selecting k features from the complete set of A attributes. To deeply explore the complete feature space, the number of sub-spaces N that are analyzed corresponds to the number of k-combinations obtained from A. Each partition P_n is obtained by applying DBSCAN [29] to sub-space X_n.


Figure 3.3: High-level scheme of the unsupervised learning technique used.

The choice of the clustering algorithm is motivated by the fact that the number of detected anomalies is partially linked to the number of found clusters (cf. anomaly representations in section 3.2.1). Clustering algorithms that need an alleged number of clusters, such as k-means, are therefore only able to find a predetermined number of anomalies. However, such a number is impossible to determine empirically because of the inherently unpredictable nature of network anomalies. DBSCAN, which does not need difficult-to-set a priori parameters such as the number of clusters to identify, thus perfectly fits our needs. DBSCAN is a powerful clustering algorithm that discovers clusters of arbitrary shapes and sizes [138], relying on a density-based notion of clusters: clusters are high-density regions of the space, separated by low-density areas. The results provided by applying DBSCAN to sub-space X_n are twofold: a set of p_n clusters {C^n_1, C^n_2, ..., C^n_{p_n}} and a set of q_n outliers {o^n_1, o^n_2, ..., o^n_{q_n}}. To set the number of dimensions k of each sub-space, we use a very useful monotonicity property of clustering sets, known as the downward closure property (cf. property 3.1).

Property 3.1 If a collection of elements is a cluster in a k-dimensional space, then it is also part of a cluster in any of the (k - 1)-dimensional projections of this space [38].

This property directly implies that, if there exists any interesting evidence of density in X_norm, it will certainly be present in its lowest-dimensional sub-spaces. Using small values for k provides several advantages: firstly, doing clustering in low-dimensional spaces is more efficient and faster than clustering in higher dimensions. Secondly, density-based clustering algorithms such as DBSCAN provide better results in low-dimensional spaces [38], because high-dimensional spaces are usually sparse, making it difficult to distinguish

between high and low density regions. Finally, clustering multiple low-dimensional sub-spaces provides a finer-grained analysis, which improves the ability of UNADA to detect anomalies of very different characteristics. We shall therefore use k = 2 for SSC. The number of subspaces N is equal to the number of k-combinations obtained from A, i.e. the binomial coefficient. This thus gives:

N = (A choose k) = A! / (k!(A - k)!) = A! / (2!(A - 2)!) = A(A - 1)(A - 2)...(A - (A - 1)) / (2(A - 2)...(A - (A - 1))) = A(A - 1) / 2

In our case, the number N of sub-spaces X_n is N = A(A - 1) / 2.
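For illustration, the following OCaml sketch (the function name is ours, not part of UNADA's published interface) enumerates these N two-dimensional sub-spaces as pairs of attribute indices; with the A = 9 features used later in this document, it yields 36 pairs.

(* Enumerate the N = A(A-1)/2 two-dimensional sub-spaces as pairs
   of attribute indices (0-based). *)
let subspace_pairs (a : int) : (int * int) list =
  let pairs = ref [] in
  for i = 0 to a - 2 do
    for j = i + 1 to a - 1 do
      pairs := (i, j) :: !pairs
    done
  done;
  List.rev !pairs

(* List.length (subspace_pairs 9) evaluates to 36. *)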

3.2.5 Combining step

Having produced the N partitions, the question now is how to use the information provided by these partitions. We here present the two strategies that we devised to achieve this goal. This step is depicted on the right of Figure 3.3.

3.2.5.1 Evidence Accumulation

A possible answer is provided in [139], where the authors introduce the idea of Evidence Accumulation Clustering (EAC). EAC uses the clustering results of multiple partitions P_n to produce a new inter-pattern similarity measure which better reflects natural groupings. The underlying assumption in EAC is that patterns belonging to a "natural" cluster are likely to be co-located in the same cluster in different partitions. The rationale of EAC is the same for outliers: patterns that are outliers in one subspace are likely to be outliers in other subspaces. The principles of EAC have been published in [129, 130]. We adapt the EAC algorithm to our particular problem of unsupervised anomaly detection. To do so, let us think about the particular structure of any general anomaly. According to section 3.2.1, an anomaly may consist of either outliers or small-size clusters, depending on the aggregation level of flows in Y. We implement two different EAC methods to isolate small-size clusters and outliers: EA for small-Clusters identification (EA4C), and EA for Outliers identification (EA4O). Algorithm 1 presents the pseudo-code for both methods.

Algorithm 1 EA4C & EA4O for Unsupervised Anomaly Detection
1: Initialization:
2: Set similarity matrix S to a null F × F matrix.
3: Set dissimilarity vector D to a null F × 1 vector.
4: for n = 1 : N do
5:   Update S(f,g), ∀ pair {x_f, x_g} ∈ C_d and ∀ C_d ∈ P_n:
6:     w_d ← e^(-γ (nb_n(d) - nbpDBS) / F)
7:     S(f,g) ← S(f,g) + w_d / N
8:   Update D(f), ∀ outlier o_f ∈ P_n:
9:     w_n ← F / ((F - nb_n^max) + ε)
10:    D(f) ← D(f) + d_M(o_f, C_n^max) · w_n
11: end for

In EA4C, we assign a stronger similarity weight when patterns are assigned to small-size clusters. Taking the membership of pairs of patterns to the same cluster as weights for their association, the N partitions are mapped into an F × F similarity matrix S, such that S(f,g) contains the similarity between patterns (here flows) f and g. The weighting function w_d(nb_n(d)) used to update S(f,g) at each iteration n takes bigger values for small values of nb_n(d), where nb_n(d) is the number of flows inside the d-th cluster, the cluster of subspace X_n that contains the pair {x_f, x_g}. We thus have 1 ≤ d ≤ p_n, p_n being the number of clusters in partition P_n. w_d(nb_n(d)) goes to zero for big values of nb_n(d). Note that if a pair of patterns {x_f, x_g} is assigned to the same cluster in each of the N partitions, then S(f,g) = 1, which corresponds to maximum similarity. The parameter nbpDBS specifies the minimum number of flows that can be classified as a cluster by the DBSCAN algorithm. The parameter γ permits to set the slope of w_d(nb_n(d)).

In the case of EA4O, we define a dissimilarity vector D where the distances from all the different outliers to the centroid of the biggest cluster identified in each partition P_n are accumulated. We shall use C_n^max as a reference to this cluster. The idea is to clearly highlight outliers that are far from the normal-operation traffic in the different partitions, statistically represented by C_n^max as stated in section 3.2.1. The weighting factor w_n takes bigger values when the size nb_n^max of C_n^max is closer to the total number of patterns F, meaning that outliers are rarer and consequently more important. The parameter ε is simply introduced to avoid numerical errors (ε = 10^-3). Finally, instead of using a simple Euclidean distance, we compute the Mahalanobis distance d_M(o_f, C_n^max) between the outlier and the centroid of C_n^max.
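For illustration, the EA4C update of Algorithm 1 (lines 5 to 7) can be sketched in OCaml as follows; the data layout and names are assumptions of ours, not UNADA's actual implementation.

(* Hedged sketch of the EA4C similarity update. [clusters] lists,
   for each cluster of the current partition P_n, the indices of the
   flows it contains; [s] is the F x F similarity matrix. *)
let ea4c_update (s : float array array) (clusters : int list list)
    ~(n : int) ~(f : int) ~(nbp_dbs : int) ~(gamma : float) : unit =
  List.iter
    (fun members ->
      let nb = List.length members in
      (* w_d = exp(-gamma * (nb_n(d) - nbpDBS) / F) *)
      let w = exp (-. gamma *. float_of_int (nb - nbp_dbs) /. float_of_int f) in
      List.iter
        (fun i ->
          List.iter
            (fun j ->
              if i <> j then
                (* S(f,g) <- S(f,g) + w_d / N *)
                s.(i).(j) <- s.(i).(j) +. w /. float_of_int n)
            members)
        members)
    clusters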

The Euclidean distance between two points p = (p_1, p_2, ..., p_n)^T and q = (q_1, q_2, ..., q_n)^T is:

D_E(p, q) = sqrt((q_1 - p_1)^2 + (q_2 - p_2)^2 + ... + (q_n - p_n)^2) = sqrt(Σ_{i=1}^{n} (q_i - p_i)^2)   (3.1)

The Mahalanobis distance of a vector x = (x_1, x_2, x_3, ..., x_N)^T is

D_M(x) = sqrt((x - µ)^T S^{-1} (x - µ))   (3.2)

where:
µ = (µ_1, µ_2, µ_3, ..., µ_N)^T is the vector containing the mean of the considered values along each of the N dimensions, and
S is the covariance matrix.   (3.3)

Figure 3.4 highlights the difference between the Euclidean and Mahalanobis distances. The Mahalanobis distance takes into account the variance of instances along a dimension. Its value is small when points are spread along the considered dimension (cf. Figure 3.4a). Conversely, the Mahalanobis distance is bigger when the majority of the points are concentrated in the feature space (cf. Figure 3.4b), while in both cases, i.e. in Figures 3.4a and 3.4b, the Euclidean distance stays the same.


Figure 3.4: Euclidean vs. Mahalanobis distance as a measure of dissimilarity.
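For illustration, equation 3.2 can be computed as in the following OCaml sketch, assuming the inverse covariance matrix has already been estimated; the names are ours, not UNADA's actual code.

(* Mahalanobis distance between x and the centroid mu, given a
   precomputed inverse covariance matrix s_inv. Estimating and
   inverting the covariance is left out of this illustration. *)
let mahalanobis (x : float array) (mu : float array)
    (s_inv : float array array) : float =
  let n = Array.length x in
  let d = Array.init n (fun i -> x.(i) -. mu.(i)) in
  let acc = ref 0.0 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      acc := !acc +. d.(i) *. s_inv.(i).(j) *. d.(j)
    done
  done;
  sqrt !acc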

In the final step, any clustering algorithm can be applied to the matrix S or to the vector D to obtain a final partition of X_norm that isolates both small-size clusters and outliers. As we are only interested in finding the smallest-size clusters and the most dissimilar outliers, the detection consists in finding the flows with the biggest similarity in S and the biggest dissimilarity in D. This is simply achieved by comparing the values in S and D to a variable detection threshold. This threshold is to be further tuned by the operator in order to obtain a satisfying trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR).
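For illustration, this final thresholding step amounts to the following OCaml sketch (names are ours); the same pattern applies to the similarities accumulated in S.

(* Keep the indices of the flows whose accumulated dissimilarity in D
   exceeds the operator-tuned threshold. *)
let flag_anomalous (d : float array) ~(threshold : float) : int list =
  let flagged = ref [] in
  Array.iteri (fun i v -> if v > threshold then flagged := i :: !flagged) d;
  List.rev !flagged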

3.2.5.2 Inter-Clustering Results Association

However, by reasoning over the similarities between patterns (here flows), EAC introduces several potential errors. A first possible error is linked to the use of a wrong parameter in a sensitive clustering algorithm applied over the values of S. Furthermore, the use of a threshold over S and/or D can decrease the system performance if a wrong value is used. Another potential error in the EA4C algorithm is due to its very design. Let us consider two flow sets, or clusters, C_d and C_e. If the cardinalities of these flow sets are close and if they are present in a similar number of sub-spaces, then EA4C will produce a very close similarity value for all patterns in both flow sets. They will then likely be falsely considered as belonging to the same cluster. This possibility is to be considered very

seriously as it can induce a huge error: different anomalies will be merged together in a single global cluster in P and will then likely be wrongly identified and, even more detrimental to our system, wrongly characterized.

In order to avoid the previously exposed sources of error, we devise another way of combining the clustering results obtained from sub-spaces: Inter-Clustering Results Association. The rationale of this new method is to address the partition combination problem in terms of cluster-of-flows and outlier similarities instead of pattern (or flow) similarity. Hence, we shift the similarity measure from the patterns to the clustering results (clusters and outliers). The underlying idea is straightforward: identify clusters or outliers present in different sub-spaces that contain the same flows. The problem can then be split in two sub-problems: correlate clusters through Inter-CLuster Association (ICLA), and correlate outliers through Inter-Outlier Association (IOA). These concepts have been published in [131]. In each case, cluster similarity (ICLA) or outlier similarity (IOA), a graph is used to express the similarity between either clusters or outliers. Each vertex represents a cluster or an outlier in any sub-space X_n, and each edge represents the fact that the two connected vertices, and hence clusters or outliers, are similar. Let CS be the cluster similarity measure between two clusters C_d and C_e:

CS(C_d, C_e) = |C_d ∩ C_e| / max(|C_d|, |C_e|)   (3.4)

where |C| is the cardinality of the cluster C, and C_d ∩ C_e the intersection of C_d and C_e. Each edge in the cluster similarity graph used for ICLA between two clusters C_d and C_e means CS(C_d, C_e) > tICSim, tICSim being the threshold on Inter-Cluster Similarity. The optimal value of tICSim is empirically evaluated on real network traffic further on (cf. section 5.2.3.3). IOA uses an outlier similarity graph built by linking every outlier to every other outlier that contains the same pattern.

We now need to devise a technique to analyze the similarity graphs. Let us first look at ICLA. The problem definition states that we need to find clusters, found in different subspaces X_n, that contain the same flow sets. Considering the cluster similarity graph, we need to find vertex sets where every vertex is linked to every other vertex. In graph theory, a clique in an undirected graph G = (V, E) is defined as a subset C ⊆ V of the vertex set, such that an edge connects every two vertices in C. The searched vertex sets in our use case are thus the cliques inside the cluster similarity graphs. Concerning IOA, the rationale is the same. The only difference is that similarity is processed over a single pattern/flow. This means that we do not use a similarity threshold for the outlier similarity graph. The previously exposed relation between cluster sets and cliques also applies here, to outlier sets and cliques in the outlier similarity graph. These cliques could easily be extracted from cluster and outlier similarity graphs by a simple heuristic using a maximum clique search algorithm: we could launch several successive executions of a maximum clique search algorithm and remove the edges of each found maximum clique from the considered similarity graph. This would allow us to find every maximum clique inside each similarity graph. However, the maximum clique search problem is NP-complete [140]. Existing solutions are thus too slow for our

application. We then make the hypothesis that a vertex can only be part of a single clique. The rationale behind this hypothesis is that it allows us to use a greedy algorithm to find maximum non-intersecting cliques inside the similarity graphs. Such greedy algorithms are much faster than classic algorithms.

We verify the pertinence of such greedy algorithms by checking that the previously proposed hypothesis holds with high probability over real data graphs obtained from several runs of our algorithm on real packet traces. First, we show that a majority of connected components inside similarity graphs are also cliques. In graph theory, the connected components of an undirected graph are the subgraphs where every vertex pair is connected through a path. When a connected component is a clique, every vertex located in it belongs to exactly one clique, namely the considered connected component. We analyze the cluster and outlier similarity graphs generated by several executions of UNADA. Our algorithm is applied to the traffic traces used for the sensitivity evaluation (i.e. one trace per month from January 2001 to December 2006 of the MAWI repository [4], cf. section 5.2.3). We here use two aggregation levels: source IP address with /24 (l3) and destination IP address with /24 (l6). We thus obtain two graphs for each aggregation level used, for every time slot flagged as anomalous by the change-detection step. UNADA therefore builds four graphs: the cluster similarity graph for l3, the cluster similarity graph for l6, the outlier similarity graph for l3 and the outlier similarity graph for l6. Figure 3.5a displays the proportion of connected components that are also cliques for each of the four generated graphs. It clearly shows that an overwhelming percentage of connected components are also cliques. One can note that connected components in the outlier similarity graphs (the two bar charts on the left) are always cliques. This phenomenon is due to the binary nature of outlier similarity: outliers contain exactly one flow, so a vertex representing an outlier is either similar or different from any vertex of a connected component in an outlier similarity graph. This first analysis guarantees us that a vast majority of connected components are also cliques. Every vertex in these connected components therefore belongs to a single clique, which verifies our initial hypothesis.

Second, we study the case of connected components which are not cliques. The classical solution would be an exhaustive search of the cliques inside the connected component to find maximum cliques. As said in the previous paragraphs, this solution is not possible because of its complexity. We therefore run a greedy algorithm over each connected component that is not a clique and analyze the ratio of the number of points contained in the found clique over the number of points in the considered connected component. Figure 3.5b is built upon data extracted from the dataset used in section 5.2. It shows that the mean ratio of the number of vertices in the maximum clique found inside each connected component over the number of vertices of the considered connected component is slightly over 0.85. This allows us to assess that, even if a connected component is not a clique, the vast majority of its vertices are captured by the maximum clique found by our greedy algorithm.
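For illustration, the greedy extraction of non-intersecting cliques can be sketched as follows in OCaml; the adjacency-list representation and the function name are ours, not UNADA's actual code.

(* Greedy extraction of non-intersecting cliques: adj.(v) lists the
   neighbours of vertex v, and each vertex ends up in at most one
   clique, per the hypothesis discussed above. *)
let greedy_cliques (adj : int list array) : int list list =
  let n = Array.length adj in
  let used = Array.make n false in
  let cliques = ref [] in
  for v = 0 to n - 1 do
    if not used.(v) then begin
      used.(v) <- true;
      let clique = ref [ v ] in
      List.iter
        (fun u ->
          (* add u only if it is adjacent to every vertex already kept *)
          if (not used.(u))
             && List.for_all (fun w -> List.mem w adj.(u)) !clique
          then begin
            used.(u) <- true;
            clique := u :: !clique
          end)
        adj.(v);
      cliques := !clique :: !cliques
    end
  done;
  !cliques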
The first of these two analyses verifies that the hypothesis quoted above (a vertex inside a similarity graph belongs to a single clique) holds with a very high probability. The second analysis addresses the scenario where this hypothesis is not verified, and shows that even if a connected component is not a clique per se, our algorithm manages to capture a vast majority of its vertices in a clique. Finally, similar flow sets are extracted from the cliques found inside the similarity graphs. Such flow sets are also called traffic segments. For cluster similarity graphs, the flow sets

contained in each cluster of a clique are intersected: the traffic segment is the intersection of all the flow sets present in the clusters of the clique. The rationale is the same for outlier similarity graphs; the only difference is that outliers contain exactly one flow, so we do not use flow set intersection for traffic segments extracted from outlier similarity graphs. Flows that belong neither to the cliques extracted from cluster similarity graphs, nor to those extracted from outlier similarity graphs, are considered as noise. They are not further processed by our algorithm since we cannot assess their normality or abnormality.

Figure 3.5: Proportion of connected components inside similarity graphs that are also cliques (a), and proportion of the vertices of a connected component that also belong to the biggest clique found inside this connected component, when the connected component is not a clique (b).

3.2.6 Normal traffic representation

In the light of the clustering technique internals exposed in sections 3.2.3, 3.2.4 and 3.2.5, and in order to ease the automation of the anomaly detection process, we redefine the representation of the normal traffic in the feature space defined in hypothesis 3.1 as follows:

Hypothesis 3.2 The majority of the aggregated flows are normal. These normal flows form a single cluster accounting for at least Y% of the aggregated flows.

The downward closure property also allows us to extend this definition of the normal traffic representation. Since we now assume that normal traffic is represented by one cluster containing at least half of the aggregated flows, the property guarantees us that this cluster will also be present in each subspace X_n. We therefore enforce that this property is verified at two points in our algorithm, and thus derive two hypotheses from the previous one.

Hypothesis 3.3 In each subspace X_n, the biggest cluster (in terms of number of aggregated flows) must contain at least Y% of the aggregated flows.

Hypothesis 3.4 In each feature space, the biggest traffic segment (in terms of number of aggregated flows) must account for at least Y% of the total number of aggregated flows.

If either one of these hypotheses is not verified, we abort the analysis and resume the execution of UNADA on the next time slot. The consequences of these hypotheses are discussed in section 6.2.1.

3.2.7 Summary

Our algorithm uses a novel clustering technique to extract anomalous flows from the whole traffic. Anomalies and normal traffic have specific representations inside the feature space. According to hypotheses 1.1 and 3.2, normal traffic is always a single cluster accounting for at least half of the aggregated flows; anomalies, on the other hand, can be represented as outliers or clusters according to their distributed nature and the aggregation level used. The clustering technique used relies on the notion of clustering ensemble. It applies the DBSCAN algorithm to several subspaces of the original feature space to produce several partitions. These partitions are then combined to build a global partition. The analysis of the computational time presented in the next section is then critical in order to assess the scalability of UNADA for real-world use.

3.3 Computational time analysis

This section provides insights into the behavior of our algorithm regarding the computational time. We analyze the Computational Time (CT) of UNADA. The SSC-EA/Inter-Clustering Results Association (ICRA)-based algorithm performs multiple clusterings in N(A) low-dimensional sub-spaces X_n ⊂ X_norm, A being the number of attributes used. These multiple computations impose scalability issues for the on-line detection of attacks in very-high-speed networks. Two key features of the algorithm are exploited to reduce the scalability problems in the number of features A and the number of aggregated flows F to analyze. Firstly, clustering is performed in very-low-dimensional sub-spaces, X_n ∈ R^2, which is faster than clustering in high-dimensional spaces [138]. Secondly, each sub-space can be clustered independently of the other sub-spaces, which is perfectly adapted to parallel computing architectures. Parallelization can be achieved in different ways: using a single multi-processor and/or multi-core machine, using network-processor cards 7 and/or Graphic Processor Unit (GPU) capabilities, using a distributed group of machines, or combining these techniques. We shall use the term "slice" as a reference to a single computational entity. We first expose an insight on UNADA execution time and speedup through parallelism, and then we provide a computational analysis of the clustering time of our algorithm.

7 http://www.tilera.com/products/platforms

3.3.1 UNADA execution time analysis

The UNADA implementation is made in OCaml. We use the Functory library [141] to implement the parallel distributed clustering of the N subspaces. This library provides a generic interface and allows the final user to easily switch between sequential execution, locally parallelized execution between several cores or processors on a single machine (called "Cores" in Functory) and parallelized execution on several machines (called "Network" in Functory). The "Network" execution mode uses two executables: one managing the processing, called "Master", and one processing the data, called "Worker". The "Network" execution mode includes three different scenarios: same executables for "Master" and "Worker", different executables but the same version of OCaml, and different executables and different versions of OCaml. The closer the "Master" and "Worker" executables are (in terms of OCaml version and source code), the easier it is for OCaml to serialize the data to be transferred between master and worker. These three scenarios offer different features to the user according to his experimental setup. In our case, we use the network setup with different executables for master and worker. A first machine uses an executable that runs the UNADA algorithm and manages the clustering step. Another machine clusters the data through a very simple executable when the first one needs it to. We chose to use two different executables for several reasons. First, UNADA uses many packages, and some are manually built libraries such as Admd (cf. section 5.2.1). The machine running the "Worker" executable being used by someone on a daily basis, we want to avoid useless installations of experimental libraries. Second, using a different executable for the master and the worker allows us to modify the master executable without updating the "Worker" code. This is especially important for debugging tasks.
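The sketch below illustrates, under our assumptions, what this switch looks like in practice with Functory's "Cores" mode; cluster_subspace is a hypothetical stand-in for the real DBSCAN step, and the whole block is an illustrative sketch, not the actual UNADA source.

(* Distribute the per-subspace clusterings over local cores. *)
let cluster_subspace (points : (float * float) array) : int =
  Array.length points  (* placeholder for the DBSCAN result *)

let () =
  Functory.Cores.set_number_of_cores 4;
  let subspaces =
    [ [| (0.1, 0.2); (0.15, 0.22) |]; [| (0.8, 0.9) |] ]
  in
  let results = Functory.Cores.map ~f:cluster_subspace subspaces in
  List.iter (Printf.printf "%d\n") results

Switching to the "Network" mode only requires replacing the Cores module with the corresponding networked one, which is precisely why we chose this library.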
The "Worker" machine that performs the clustering is a desktop machine with an Intel i7 860 processor (4 cores with Hyperthreading) and 8 GB of RAM.

Figure 3.6 shows the proportions of execution time spent clustering source and destination aggregated flows and on other processing, as a function of the number of slices or processes used. The execution time is defined as the processing of the time slot from the 1st to the 15th second of the trace of January 1st 2009 of the MAWI repository [4]. The clustering time of source aggregated flows is greater than the clustering time of destination aggregated flows for any number of processes used. This is caused by the fact that, for this particular time window, the number of source aggregated flows is greater than the number of destination aggregated flows: 8995 to 6160. This figure clearly shows that clustering is the most time-consuming task of the UNADA algorithm. It is therefore pertinent to focus our computational time analysis on the clustering time.

Figure 3.7 details the original and normalized clustering times for several executions using one to eight slices. The theoretical curve represents the optimal speed-up: the nominal clustering time (sequential clustering of the N subspaces) divided by the number of processes used. The speed-up is very close to the theoretical value for the first four points. This proves that parallel clustering on several slices is useful. However, several observations can be made. First, there is a slight difference for the first four points. This can be explained by the fact that the processor may become less efficient when more and more processes are used. Second, the speedup after the fourth point (i.e. when we use more than four processes) is far from what is expected. The most probable

reason for this behavior is the inability of Hyperthreading to provide two fully working cores (processes) out of a single one. The slight difference in the measured speedup may be caused by the idle time between atomic tasks induced by Functory's communications. In fact, if UNADA actually uses more tasks than the four free processes, this idle time is reduced: once a job is finished on one process, another job can immediately be processed, while it is not the case when the program only processes four or less tasks at a time.

Figure 3.6: Proportion of execution time for destination aggregated flows clustering (horizontal stripes), source aggregated flows clustering (diagonal stripes) and other processing (vertical stripes), as a function of the number of slices (processes) used.

Figure 3.7: Clustering time and normalized clustering time as a function of the number of processes used.


Figure 3.8: Computational Time as a function of the number of features and number of flows to analyze. The number of aggregated flows in (a) is F = 10000. The number of features and slices in (b) is A = 20 and S = 190 respectively.

3.3.2 Clustering time simulation

We here analyze the clustering time of UNADA. This analysis has been published in [127]. We use a large number of aggregated flows, F = 10^4, and two different numbers of slices, M = 40 and M = 100. The analysis is done with traffic from the WIDE network, combining different traces to attain the desired number of flows. To estimate the CT of SSC for given values of A and M, we proceed as follows: first, we separately cluster each of the N = A(A - 1)/2 sub-spaces X_n, and take the worst case of the obtained clustering times as a representative measure of the CT in a single sub-space, i.e., CT_wc^SSC = max_n CT(X_n). Then, if N ≤ M, we have enough slices to completely parallelize the SSC algorithm, and the total CT corresponds to the worst case CT_wc^SSC. On the contrary, if N > M, some slices have to cluster various sub-spaces, one after the other, and the total CT becomes (N // M + 1) times the worst case CT_wc^SSC, where // represents integer division.

Figure 3.8 depicts the CT of the SSC-based algorithm, both (a) as a function of the number of features A used to describe traffic flows and (b) as a function of the number of flows F to analyze. Figure 3.8a compares the CT obtained when clustering the complete feature space X_norm, referred to as CT(X_norm), against the CT obtained with SSC, varying A from 2 to 29 features. The first interesting observation from Figure 3.8a regards the increase of CT(X_norm) when A increases, going from about 8 seconds for A = 2 to more than 200 seconds for A = 29. As we said before, clustering in low-dimensional spaces is faster, which reduces the overhead of the multiple clustering computations. The second paramount observation is about parallelization: if the algorithm is implemented on a parallel computing architecture, it can be used to analyze large volumes of traffic using many traffic descriptors on an on-line basis; for example, if we use 20 traffic features and a parallel architecture with 100 slices, we can analyze 10000 aggregated flows in less than 20 seconds.
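As an illustration of the CT model just described, the following OCaml sketch (the function name is ours) computes the total CT from N, M and the worst-case per-subspace time.

(* Total clustering time: one worst-case round if N <= M, and
   (N // M + 1) rounds otherwise (// is integer division). *)
let total_ct ~(n : int) ~(m : int) ~(ct_wc : float) : float =
  if n <= m then ct_wc else float_of_int ((n / m) + 1) *. ct_wc

For instance, with N = 190 sub-spaces and M = 100 slices, the model gives (190 // 100 + 1) = 2 worst-case rounds.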

Figure 3.8b compares CT(X_norm) against CT_wc^SSC for an increasing number of flows F to analyze, using A = 20 traffic features and M = N = 190 slices (i.e., a completely parallelized implementation of the SSC-based algorithm). As before, we can appreciate the difference in CT when clustering the complete feature space vs. using low-dimensional sub-spaces: the difference is more than one order of magnitude, independently of the number of flows to analyze. Regarding the volume of traffic that can be analyzed with this 100% parallel configuration, the SSC-based algorithm can analyze up to 50000 flows with a reasonable CT, about 4 minutes in this experiment. In the presented evaluations, the number of aggregated flows in a time slot of ∆T = 20 seconds is around 2500 flows, which represents a value of CT_wc^SSC ≈ 0.4 seconds. For the A = 9 features that we have used (N = 36), and even without doing any parallelization, the total CT is N × CT_wc^SSC ≈ 14.4 seconds.

3.3.3 Summary

We first use the OCaml implementation of UNADA to experimentally prove that clustering is by far the most time-consuming task of our algorithm and thus the prime concern when studying its scalability. We then demonstrate that parallelism is possible and greatly improves the computational time of UNADA. We also show that our algorithm displays good scalability regarding the number of flows and the number of attributes.

3.4 Summary

In this chapter, we have presented our unsupervised anomaly detection algorithm. UNADA uses two hypotheses: the anomalous traffic is statistically different from the normal one, and the normal traffic is bigger than the anomalous one. Our work relies on these hypotheses to design an unsupervised learning algorithm able to isolate anomalous flows from normal ones. We state that both normal traffic and anomalous traffic display very specific and different characteristics in the feature space and can therefore be separated. We developed a novel approach based on clustering that produces a global partition of the whole feature space. Our algorithm uses two successive steps to build this global partition: it first splits the feature space into many subspaces in order to build reliable partitions; it then combines these partitions into a single global one. Finally, the computational time of our algorithm is theoretically and empirically measured. UNADA scales reasonably well regarding the number of flows and the number of attributes, and can thus be distributed with a very good speedup. Once anomalies are extracted, a post-processing step is needed in order to reduce the workload of the operator. This step is addressed in the next chapter and aims at reducing the number of anomalies, ranking anomalies by dangerousness and characterizing anomalies.

Chapter 4

Anomaly post-processing

The unsupervised clustering step exposed in chapter 3 allows us to extract anomalies from a feature space X obtained from F flows aggregated according to an aggregation level l_i. However, the potential number of detected anomalies can be high. The danger here is to overwhelm the operator with a storm of alarms. A post-processing step is thus crucial to help the operator prioritize his own work towards the most dangerous anomalies. More general help shall also be provided to the operator concerning tedious tasks such as the investigation of an anomaly's nature. Unfortunately, at this point, our algorithm does not provide any method to cope with these problems. We thus propose a three-step post-processing technique that reduces the amount of work required from the operator. First, we reduce the overall number of found anomalies by correlating anomalies extracted from aggregated flows obtained for different aggregation levels l_i. Second, we provide a technique that allows the operator to prioritize his work by ranking anomalies by dangerousness. Third, we present heuristics to build anomaly signatures that spare the operator the manual anomaly analysis. Each of these steps is depicted in Figure 4.1.


Figure 4.1: High-level scheme of the anomaly post-processing step.

4.1 Anomaly correlation

In the previous chapter, we showed that our unsupervised algorithm is capable of extracting anomalies from flows aggregated according to a specific aggregation level l_i. Our algorithm may be used on several different aggregation levels and will thus generate a huge number of alarms. The question that arises here is: why not try to correlate anomalies extracted from flows obtained from different aggregation levels l_i, in order to reduce the number of alarms while improving the overall reliability? We therefore follow this rationale by obtaining several clustering results, each of them built from feature spaces related to different aggregation levels, and devising a strategy to correlate the anomalies extracted from these clustering results. This anomaly correlation method has been published in [131]. In order to correlate anomalies found in different aggregation levels, we define two unique anomaly characteristics: its source and its destination. We refine this definition by associating an anomaly source with its source IP address set and an anomaly destination with its destination IP address set.

Let Sim_IP@(S^1_IP@, S^2_IP@) be the similarity between two IP address sets S^1_IP@ and S^2_IP@:

Sim_IP@(S^1_IP@, S^2_IP@) = |S^1_IP@ ∩ S^2_IP@| / max(|S^1_IP@|, |S^2_IP@|)   (4.1)

where |S| is the cardinality of the set S.

Two anomalies T_1 and T_2, with their associated source IP address sets S^1_srcIP@ and S^2_srcIP@, and destination IP address sets S^1_dstIP@ and S^2_dstIP@, are similar if the two following equations are true:

Sim_srcIP@(S^1_srcIP@, S^2_srcIP@) > t_TSAddrSim   (4.2)

Sim_dstIP@(S^1_dstIP@, S^2_dstIP@) > t_TSAddrSim   (4.3)

where t_TSAddrSim is the threshold on address set similarity. In other words, two anomalies are similar if both their sources and their destinations are similar with regard to the threshold t_TSAddrSim. In this work, we only correlate anomalies detected from two types of aggregation: IP source aggregation and IP destination aggregation. We hence avoid correlating anomalies located in aggregation levels of the same type (e.g. l1 and l2), which would potentially be contained in each other. Moreover, we only use the /24 aggregation level. Both of these choices are made in order to reduce the amount of processing. The aggregation levels used are thus l3 and l6 (cf. the aggregation level definitions in section 3.1.2). Section 6.2.3.1 provides perspectives concerning the use of more aggregation levels. The final set of correlated anomalies is obtained from each couple of similar anomalies.
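As an illustration of equations 4.1 to 4.3, the following OCaml sketch (module and function names are ours, not UNADA's actual interface) computes the address set similarity over /24 prefixes represented as strings.

(* IP address set similarity (equation 4.1). *)
module IPSet = Set.Make (String)

let sim_ip (s1 : IPSet.t) (s2 : IPSet.t) : float =
  let denom = max (IPSet.cardinal s1) (IPSet.cardinal s2) in
  if denom = 0 then 0.0
  else float_of_int (IPSet.cardinal (IPSet.inter s1 s2)) /. float_of_int denom

(* Two anomalies are similar when both source and destination address
   sets exceed the threshold t_TSAddrSim (equations 4.2 and 4.3). *)
let similar ~(t : float) (src1, dst1) (src2, dst2) =
  sim_ip src1 src2 > t && sim_ip dst1 dst2 > t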

4.2 Anomaly ranking

In order to increase his own efficiency, a network operator analyzing occurring anomalies needs to prioritize his work towards the most dangerous anomalies. In the field of network

anomaly detection, the more dangerous an anomaly is, the more critical its mitigation. Thus, task prioritization shall be realized by ranking anomalies according to their dangerousness. After the anomaly correlation step, many anomalies have been detected and some have also been correlated. Correlated anomalies provide us with prima facie evidence of the anomaly hierarchy regarding dangerousness. In fact, if an anomaly appears as such within several aggregation levels, it means that its flows are significantly different from the normal traffic in each of these aggregation levels, and thus potentially dangerous in each of them. Such anomalies can therefore be considered as more dangerous than anomalies that are not correlated. This binary classification between correlated and uncorrelated anomalies provides us with a simple scheme for a possible anomaly dangerousness ranking. In order to further improve this ranking, we introduce two other anomaly characteristics: its number of packets and its number of bytes. We thus build a dangerousness index that uses these two characteristics. We actually do not use the absolute numbers of packets and bytes but the proportion of traffic belonging to the anomaly in terms of number of packets and bytes. Let DI be the Dangerousness Index built according to the three previously exposed criteria: detection in multiple feature spaces, proportion of packet number, and proportion of byte number.

DI(Anom_1) = C ∗ (P(#pkts) + P(#bytes))   (4.4)

where:
C : the number of feature space(s)/aggregation level(s) where the anomaly has been detected
P(#pkts) = #pkts in anomaly / #pkts in time-slot
P(#bytes) = #bytes in anomaly / #bytes in time-slot   (4.5)

This formula defines a basic dangerousness criterion which is the mean between the proportion of traffic belonging to the considered anomaly in terms of number of packets and the proportion of traffic belonging to the considered anomaly in terms of number of bytes. The Dangerousness Index of uncorrelated anomalies uses this basic formula. The Dangerousness Index of correlated anomalies is then obtained by simply multiplying this base value by the number of times the considered anomaly has been correlated. This step is designed to reflect the potential dangerousness of an anomaly. From a general point of view, the bigger DI(Anom_1), the more dangerous the anomaly Anom_1. The Th threshold in Figure 4.1 is a threshold on the DI value of the considered anomaly. This method has been partially published in [131].
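For illustration, equations 4.4 and 4.5 translate directly into the following OCaml sketch; the record fields are hypothetical names, not UNADA's actual data structures.

(* Dangerousness Index of one anomaly (equations 4.4 and 4.5). *)
type anomaly = {
  levels_detected : int;   (* C *)
  pkts : float; bytes : float;           (* volume of the anomaly *)
  slot_pkts : float; slot_bytes : float  (* volume of the time slot *)
}

let dangerousness_index (a : anomaly) : float =
  float_of_int a.levels_detected
  *. ((a.pkts /. a.slot_pkts) +. (a.bytes /. a.slot_bytes))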

4.3 Automatic anomaly characterization

At this stage, the algorithm has identified and ranked several anomalies, each constituted by a set of traffic flows in Y far from the majority of the traffic. The network operator can thus choose which anomalies should be processed first. However, the manual anomaly analysis task remains to be done. We here propose a technique to extract semantically interesting

information about anomalies through filtering rules and signatures. This technique has been published in [130]. The first task is to automatically produce a set of K filtering rules f_k(Y), k = 1, ..., K, to correctly isolate and characterize the anomalous flows. On one hand, such filtering rules provide useful insights on the nature of the anomaly, easing the analysis task of the network operator. On the other hand, rules can be combined to construct a signature of the anomaly, which can be used to detect its occurrence in the future using a traditional signature-based detection system. Even more, this signature could eventually be compared against well-known signatures to automatically classify the anomaly [86, 142].

4.3.1 Rule building

In order to produce the filtering rules f_k(Y), the algorithm apprehends the anomaly representation, cluster or outlier, inside the whole feature space. We define two different classes of filtering rules: absolute rules FR_A and relative rules FR_R.

4.3.1.1 Relative rules

Relative filtering rules are used in the characterization of both kinds of anomalous traffic segments. They depend on the relative separation between anomalous and normal-operation flows. Basically, if the anomalous flows are well separated from the normal clusters in a certain partition P_n built on a certain subspace X_n, then the features of the corresponding subspace X_n are good candidates to define a relative filtering rule. A relative rule defined for feature a and the flow set Y_g has the form (|| is the logical OR operator):

FR_R(Y_g, a) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ || x_f(a) > λ}   (4.6)

In other words, a filtering rule FR_R is defined for a flow set Y_g on attribute a, Y_g being a subset of the global flow set Y. This filtering rule specifies that for each flow y_f belonging to Y_g, the value of attribute a for this particular flow y_f is greater or lower than a threshold λ.

We shall also define a covering relation between filtering rules. Let FR^1_R(Y_g, a) and FR^2_R(Y_g, a) be two filtering rules (|| is the logical OR operator):

FR^1_R(Y_g, a) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ_1 || x_f(a) > λ_2}   (4.7)

FR^2_R(Y_g, a) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ_3 || x_f(a) > λ_4}   (4.8)

We say that:

Rule FR^1_R(Y_g, a) covers rule FR^2_R(Y_g, a)
⇔ FR^2_R(Y_g, a) ⊂ FR^1_R(Y_g, a)   (4.9)
⇔ λ_1 > λ_3 || λ_2 < λ_4

In other words, let us consider the two filtering rules FR^1_R and FR^2_R. FR^1_R specifies that the flow values for attribute a are lower than λ_1 (resp. greater than λ_2). FR^2_R specifies that

the flow values for attribute a are lower than λ_3 (resp. greater than λ_4). FR^1_R covers FR^2_R if λ_1 is greater than λ_3 (resp. λ_2 is lower than λ_4). If two or more rules from different partitions P_n overlap (i.e., they are associated with the same feature), the algorithm keeps the one that covers the rest. The λ thresholds are built using data from the feature space. Let us consider a relative rule with a threshold λ_1 regarding a single attribute a. The λ_1 value is at half-distance between the representation of the normal traffic (a single cluster) and the representation of the abnormal traffic (i.e. either a cluster or an outlier). We actually use the mean of each representation: we take the mean of the values of the considered attribute a over all the flows located in each representation, and then determine λ_1 by taking the point located at half-distance between the two means.

4.3.1.2 Absolute rules

Contrary to relative rules, absolute rules do not depend on the separation between normal and anomalous flows. They instead correspond to the presence of dominant features in the anomalous flows. They are built upon every feature a for which no relative rule has been built. An absolute rule characterizing a certain flow set Y_g for a certain feature a has the form:

FR_A(Y_g, a) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) = λ}   (4.10)

One can also say that an absolute rule FR_A is defined for a flow set Y_g on attribute a, Y_g being a subset of the global flow set Y. This absolute rule specifies that for each flow y_f belonging to Y_g, the value of attribute a for this particular flow y_f is equal to λ. For example, in the case of an ICMP flooding attack, the vast majority of the associated flows use only ICMP packets, hence the absolute filtering rule {nICMP/nPkts = 1} makes sense, nICMP being the number of ICMP packets and nPkts the number of packets.

4.3.2 Signature building

Once the rules have been built, we format them in order to ease their understanding by the network operator. To do so, we devise a method to rank the relative rules. This section covers the signature building that uses as input the absolute and relative rules previously found.

4.3.2.1 Ranking relative rules

In order to construct a meaningful and simple-to-understand signature of the anomaly, we have to devise a procedure to rank and select the most discriminant filtering rules. Absolute rules are important because they define inherent characteristics of the anomaly. As regards relative rules, their relevance is directly tied to the degree of separation between normal and anomalous flows. We rank the degree of separation of anomalous traffic segments from the normal clusters using the well-known Fisher Score (FS), and select the top-K ranked rules. The FS measures the separation between clusters, relative to

the total variance within each cluster. Given two clusters C_1 and C_2, the Fisher Score for feature a can be computed as:

F(a) = (x̄_1(a) - x̄_2(a))^2 / (σ_1^2(a) + σ_2^2(a))

where x̄_d(a) and σ_d^2(a) are the mean and variance of feature a in cluster C_d. The farther a traffic segment is from the normal traffic, the more the behavior of its flows deviates from "normality". Furthermore, the more consistent the behavior of anomalous flows is inside a considered traffic segment, the more they are to be considered as dangerous. The same applies to normal flows. By taking into account the variance of both normal and anomalous flows, the Fisher Score perfectly suits the relative rule ranking task.
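As an illustration, the Fisher Score for one feature translates into the following OCaml sketch (function names are ours), given the values of that feature in the two clusters.

(* Fisher Score of one feature from its values in two clusters. *)
let mean (xs : float array) : float =
  Array.fold_left ( +. ) 0.0 xs /. float_of_int (Array.length xs)

let variance (xs : float array) : float =
  let m = mean xs in
  Array.fold_left (fun acc x -> acc +. ((x -. m) ** 2.0)) 0.0 xs
  /. float_of_int (Array.length xs)

let fisher_score (c1 : float array) (c2 : float array) : float =
  ((mean c1 -. mean c2) ** 2.0) /. (variance c1 +. variance c2)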

4.3.2.2 Combining rules

To finally construct the signature, the absolute rules and the relative rules are merged into a single inclusive predicate, using the covering relation in case of overlapping rules. Signatures thus include both relative and absolute rules in a compact way.

4.4 Summary

This chapter addresses several techniques that we developed in order to process anomalies after their discovery. We first reduce the number of extracted anomalies by correlating anomalies obtained from feature spaces built upon different aggregation levels l_i. We then reuse this result, along with two other simple anomaly characteristics, to build a dangerousness index that allows us to rank anomalies. We finally present the characterization method that builds a detailed signature for each anomaly, which tremendously helps the operator in understanding the nature of the threat. This whole set of techniques eases and reduces the work of the network operator. These different techniques are further partially detailed in chapter 5 through a use case where UNADA is applied to a real network traffic trace.

Chapter 5

Evaluation

The evaluation of a network anomaly detection algorithm is a critical step to assess the relevance and efficiency of the algorithm design. It is the step that allows authors and readers to assess the interest of a proposed method by revealing its shortcomings and advantages. This chapter addresses this problem. We evaluate the ability of our algorithm to detect anomalies located in real traffic by using traffic traces from the public MAWI repository of the WIDE project [4]. The WIDE operational network provides interconnection between different research institutions in Japan, as well as connections to different commercial ISPs and universities in the U.S. The traffic repository consists of 15 minutes-long raw packet traces collected daily since 1999. The network traffic we work on consists of traffic from one of the trans-Pacific links between Japan and the U.S., captured at two samplepoints named B and F. In order to correctly assess our algorithm's behavior, we run UNADA in an offline manner on the MAWI dataset. We first present an example of an execution of UNADA over a single time slot in order to help the reader apprehend the rationale and principles of our technique. We then carry out a more thorough analysis of UNADA on subsets of the MAWI repository, using ground truth from the MAWILab dataset [83]. Unless specified otherwise, the UNADA parameters used in the next sections are the following: ws (or ∆T) = 15 s, epsDBS (DBSCAN's neighborhood radius parameter) = 0.15, nbpDBS (DBSCAN's minimum number of points parameter) = 4, tICSim (cf. section 3.2.5.2) = 0.85 and tPropPtsNormC (or Y, cf. section 3.2.6) = 0.6.

5.1 Single detailed case study: one time-slot in one trace in MAWI dataset

In order to ease the reader's understanding of our algorithm, we expose the results of several processing steps of UNADA, from the combining step to the characterization step. This section presents several results of an execution of UNADA over a single time slot. We use the MAWI trace of April 1st 2004. The time slot lasts from the 405th to the 420th second of the trace. We execute UNADA as specified in section 4.1 in terms of aggregation levels used, and with the parameter set specified in the introduction of

chapter 5. We therefore only cluster flows aggregated through source IP address with /24 netmask (aggregation level l3) and destination IP address with /24 netmask (aggregation level l6). We build a feature space based on the A = 9 attributes presented in Table 3.3 for each used aggregation level (l3 and l6) and then cluster the aggregated flows into two clustering ensembles P^l3 and P^l6. We here present the UNADA steps after the subspace-clustering step. We first expose the results of the Inter-Clustering Results Association step. We then address the anomaly post-processing step with anomaly correlation, ranking and characterization.

5.1.1 Combination step with Inter-Clustering Results Association (ICRA)

This section presents the results of the combining step using ICRA. This step is applied over the clustering ensembles P^l3 (cluster ensemble obtained from aggregation according to source IP address /24, l3) in section 5.1.1.1, and P^l6 (cluster ensemble obtained from aggregation according to destination IP address /24, l6) in section 5.1.1.2.

5.1.1.1 Combination step for clustering ensemble P^l3

P^l3 contains the cluster or outlier affiliation of each of the F^l3 = 1015 flows found using the l3 aggregation level. The results of the Inter-CLuster Association (ICLA) phase over P^l3 are displayed in figure 5.1. Each vertex is a cluster found in any of the generated sub-spaces. The graph contains 76 vertices. Since each vertex represents a cluster, this value is equal to the number of clusters found in P^l3. It is therefore also equal to the size of the whole cluster set {C_0, C_1, ..., C_{p^l3_tot}} within the clustering ensemble P^l3. Each vertex index thus has a value between 0 and p^l3_tot = 75. Each edge means that the two linked clusters verify the similarity condition defined in equation 3.4. This condition states that the ratio of the cardinality of the intersection of the point sets of the two clusters over the maximum of the cardinalities of the two clusters is greater than a threshold called tICSim. The cluster similarity threshold (tICSim) used is 0.85. This threshold value is used for all the graphs of the current section. It has been chosen from the results of the sensitivity analysis presented in section 5.2.3.3. The normal traffic is here circled and is represented by the vertex group with the highest number of vertices.

Figure 5.2 depicts the outlier similarity graph obtained over P^l3 during the Inter-Outlier Association (IOA) phase. The graph contains 79 vertices. Since each vertex represents an outlier, this value is equal to the number of outliers found in P^l3. It is therefore also equal to the size of the whole outlier set {o_0, o_1, ..., o_{q^l3_tot}} within the clustering ensemble P^l3. Each vertex index thus has a value between 0 and q^l3_tot = 78.

5.1.1.2 Combination step for clustering ensemble P^l6

Similarly to the previous section, P^l6 contains the cluster or outlier affiliation of each of the F^l6 = 1229 flows found using the l6 aggregation level. The results of the ICLA phase over P^l6 are depicted in figure 5.3. Each vertex is a cluster found in any of the generated sub-spaces.

62 3 4 5 7

9 6 3 2 2 2 1 2 6 6 7 2 9 7 1 3 4 3 3 9 4 7 5 4 6 4 1 1 8 5 0 1 5 7 5 2 8

5 83 7 8 5 55 6 4 47 1 6 6 6 0 4 8 7 2 2 4 4 5 6 1 6 9 3 6 3 5 3 11 2 1 0 3 3 5 1 7 3 4 2 05 3 6 5 4 0 3 6 2 4 1 2 0 1 6 2 7 7 02 3 2 54 2 6 5 9 5 2 6 8 1 4 4 6 4 9 7 4 1 1 3 0 1 7 3 2 3 8

Figure 5.1: Cluster similarity graph for source address /24 aggregated data (aggregation level l3).

27 34

0 23 37 11 44

22 32 40 71 73 72 68 29 36 39 61 5 78 12 75 77 70 47 52 64 67 74 63 4 69 53 9 62 54 50 6 60 76 21 33 16 56 49 57 65 55 59 66 3 51 58 19 38

24 43 31 28 35 48

1 13 2 45 17

Figure 5.2: Outlier similarity graph for source address /24 aggregated data (aggregation level l3).

The graph contains 77 vertices. Since each vertex represents a cluster, this value is equal l3 l6 to the number of clusters found in P which is noted ptot. It is therefore also equal to the l6 size of the whole cluster set C1,C2, .., C l6 within the clustering ensemble P . Each { ptot} l6 vertex index thus has a value between 0 and ptot = 76. The normal traffic is here circled

63 and is represented by the vertex group with the highest number of vertices.

8 2 0

5 1 2 7 7 5 1 6 1 9 3 1 3 6 7 4 4 2 3 4 7 1 3 3 4 1 7 3 3 8 2 5 6 1 3 0 6 7

1 06 3 4 3 5 9 7 13 3 2 6 1 5 7 4 3 5 7 3 1 8 2 8 6 2 5 74 6 4 5 5 8 0 6 1 4 1 3 92 3 7 2 2 2 1 6 6 5 0 4 0 4 8 5 5 4 9 2 95 2 5 36 9 1 2 4 2 1 4 2 4 3 2 6 5 6 8 6 4 5 4 7 0 5 6

7 2 7 6 6 0

Figure 5.3: Cluster similarity graph for destination address /24 aggregated data (aggre- gation level l6).

Figure 5.4 represent the outlier similarity graph obtained over Pl6 during the IOA phase. The graph contains 124 vertices. Since each vertex represents an outlier, this l6 l6 value is equal to the number of outliers found in P which is noted qtot. It is therefore also equal to the size of the whole outlier set o1,o2, .., o l6 within the clustering ensemble { qtot} l6 l6 P . Each vertex index thus has a value between 0 and qtot = 123.

5.1.1.3 Similarity graph analysis One can note that, in these four figures, every connected component is a clique. The hypothesis presented in section 3.2.5.2 (this hypothesis specifies that each vertex belongs to a single clique) is here completely verified. Normal traffic can be identified in both cluster similarity graphs (cf. figure 5.1 and 5.3) as the circled vertex group. It is interesting to note that each of these two representations of normal traffic is a single clique that contains the biggest number of vertices. Anomalies are easily identified inside these graphs as small cliques. Inside outlier similarity graphs (cf. figures 5.2 and 5.4), every clique is an anomaly.

Figure 5.4: Outlier similarity graph for destination address /24 aggregated data (aggregation level l6).

5.1.1.4 Traffic segments extraction
Once all the graphs are built, here the four graphs depicted in figures 5.1, 5.2, 5.3 and 5.4, cliques are extracted and traffic segments are built upon each clique. Each traffic segment that contains less than (100 - Y)% of the aggregated flows is considered as a potential anomaly. Each potential anomaly is assigned an index. In order to ease the comprehension of readers, we chose to assign one hundred anomaly indices to each of the four similarity graphs (cluster/outlier and source/destination), as sketched below. This allows the reader to easily associate a traffic segment index located in a specific hundred to a specific similarity graph and thus a specific origin. This choice is made under the assumption that there are fewer than 100 traffic segments in each graph and can easily be overcome if needed. The anomaly index thus has a specific meaning. Values between 0 and 99 represent anomalies from clusters found in source IP address aggregated data (l3). Values between 100 and 199 are associated with anomalies from outliers found in source IP address aggregated data (l3). Values between 200 and 299 index anomalies from clusters found in destination IP address aggregated data (aggregation level l6). Values between 300 and 399 are linked to anomalies from outliers found in destination IP address aggregated data (aggregation level l6).
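A hypothetical helper reflecting this indexing convention; the offsets follow the text above, everything else is illustrative:

    # Offsets per similarity graph, assuming fewer than 100 traffic
    # segments per graph, as stated above.
    INDEX_OFFSETS = {("l3", "cluster"): 0, ("l3", "outlier"): 100,
                     ("l6", "cluster"): 200, ("l6", "outlier"): 300}

    def anomaly_index(aggregation_level, segment_kind, local_index):
        # E.g. the 6th outlier segment found in l6 data gets index 305.
        return INDEX_OFFSETS[(aggregation_level, segment_kind)] + local_index

    print(anomaly_index("l6", "outlier", 5))  # 305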

5.1.2 Anomaly post-processing
Once anomalies are extracted from the feature spaces as traffic segments by the unsupervised detection, we apply the anomaly post-processing step. We here only present the anomaly correlation and characterization steps. We do not show results of the anomaly dangerousness assessment step, and we only characterize correlated anomalies, in order not to overload the reader and to keep a concise and easy-to-understand outline.

5.1.2.1 Anomaly correlation
Anomaly correlation is employed in order to find flows that are different from the normal ones in both aggregation types: source IP address with netmask /24 (aggregation level l3) and destination IP address with netmask /24 (aggregation level l6). Figure 5.5 depicts the anomaly/traffic segment similarities as a graph. In this case, anomaly correlation extracts three edges from the graph. Each edge represents the link between two similar anomalies. Here, three correlated anomalies are present: 101 and 300, 111 and 305, and finally 100 and 205. The meaning of these indices is explained in section 5.1.1.4. First, the correlated anomaly between vertices 100 and 205 links a traffic segment built from outliers in l3 (index 100) with a traffic segment extracted from clusters in l6 (index 205). Second, the correlated anomaly between vertices 101 and 300 links a traffic segment obtained from outliers in l3 (index 101) with a traffic segment built upon outliers in l6 (index 300). Finally, the correlated anomaly between vertices 111 and 305 links a traffic segment yielded from outliers in l3 (index 111) with a traffic segment built upon outliers in l6 (index 305).

Figure 5.5: Anomaly similarity graph.
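As an illustration of this step, the sketch below pairs traffic segments found in the l3 and l6 feature spaces when their underlying flow sets overlap. It assumes each anomalous traffic segment can be mapped back to the set of raw flows it covers; the similarity measure and the threshold name are illustrative simplifications of the measure defined in chapter 4.

    def segment_similarity(seg_a, seg_b):
        # Same shape as the cluster similarity of equation 3.4, applied
        # to the raw-flow sets covered by two traffic segments.
        return len(seg_a & seg_b) / max(len(seg_a), len(seg_b))

    def correlate_anomalies(l3_segments, l6_segments, threshold=0.85):
        # l3_segments / l6_segments: {anomaly index -> set of raw-flow ids}.
        # Yields one edge of the anomaly similarity graph per similar pair.
        for idx_a, seg_a in l3_segments.items():
            for idx_b, seg_b in l6_segments.items():
                if segment_similarity(seg_a, seg_b) > threshold:
                    yield (idx_a, idx_b)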

5.1.2.2 Anomaly characterization
We then apply characterization over these three correlated anomalies. Table 5.1 details each anomaly with its type, the traffic segment indices extracted from ICLA and IOA and the two signatures detected from both source and destination aggregated data. The term “Few ICMP packets” means that these two anomalies contained just a few harmless ICMP packets.

Anomaly type     | Source segment index | Destination segment index | Source signature                        | Destination signature
Few ICMP packets | 111                  | 305                       | nSrcs = 1, nICMP/nPkts > λ1             | nSrcs = 1, nICMP/nPkts > λ2
Few ICMP packets | 112                  | 309                       | nSrcs = 1, nICMP/nPkts > α1             | nSrcs = 1, nICMP/nPkts > α2
Network scan     | 100                  | 205                       | nSrcs = 1, nDsts > β1, nSYN/nPkts > β2  | nSrcs = 1, nDsts > β3, nSYN/nPkts > β4

Table 5.1: Signatures of anomalies found.

Figures 5.6.(a,b) depict the results of the characterization phase for the network scan anomaly previously found. Each sub-figure represents a partition P^l6_n for which filtering rules were found. They involve the number of IP sources and destinations, and the fraction of SYN packets. Combining them produces a signature that can be expressed as (nSrcs = 1) ∧ (nDsts > λ1) ∧ (nSYN/nPkts > λ2), where λ1 and λ2 are two thresholds obtained by separating normal and anomalous clusters at half distance. In this particular case, we have λ1 = 120 and λ2 = 0.75. This signature makes perfect sense: the network scan uses SYN packets from a single attacking host to a large number of victims.

(a) SYN Network Scan (1/2): nSYN/nPkts versus nDsts. (b) SYN Network Scan (2/2): nSYN/nPkts versus nSrcs. Both panels show the normal clusters, the outliers/anomalous flows, and the absolute and relative filtering rules.

Figure 5.6: Filtering rules for characterization of the found network scan.
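A minimal sketch of the "half distance" threshold computation and of the resulting signature test, assuming the anomalous flows lie above the normal cluster along each attribute; the toy values are purely illustrative and merely chosen to reproduce λ1 = 120 and λ2 = 0.75.

    def half_distance_threshold(normal_values, anomalous_values):
        # Threshold halfway between the normal cluster and the
        # anomalous cluster along one attribute.
        return (max(normal_values) + min(anomalous_values)) / 2

    lam1 = half_distance_threshold([3, 10, 25], [215, 280])      # 120.0
    lam2 = half_distance_threshold([0.1, 0.3, 0.5], [1.0, 1.0])  # 0.75

    def matches_scan_signature(flow, lam1, lam2):
        # (nSrcs = 1) ∧ (nDsts > λ1) ∧ (nSYN/nPkts > λ2)
        return (flow["nSrcs"] == 1 and flow["nDsts"] > lam1
                and flow["nSYN"] / flow["nPkts"] > lam2)

    print(matches_scan_signature(
        {"nSrcs": 1, "nDsts": 250, "nSYN": 240, "nPkts": 250}, lam1, lam2))  # True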

5.1.3 Summary
This section details several internal steps of the UNADA algorithm, from the combination step inside the unsupervised detection to the final characterization step, in order to improve the reader's comprehension of our algorithm.

5.2 Performance evaluation

This section addresses the performance evaluation of UNADA over real traces from the MAWI dataset. The evaluation of a network anomaly detection algorithm is a critical step that allows one to assess the performance of the proposed algorithm. Our evaluation uses Receiver Operating Characteristic (ROC) curves as introduced by [91]. ROC curves are the standard statistical tool used to evaluate binary classifiers. They allow one to visualize the trade-off between the True Positive Rate (TPR) and the False Positive Rate (FPR) in a single curve. They are particularly convenient when a few different classifier results have to be compared; they thus perfectly fit our needs. ROC curve points are obtained by using several values for the threshold Th applied to the dangerousness index computed at the anomaly ranking step (cf. section 4.2). We first present the MAWILab dataset, which provides ground truth for the MAWI dataset, in section 5.2.1. In section 5.2.2, we expose a first set of results on a limited subset of the MAWI dataset in order to provide preliminary results. In section 5.2.3, we perform a sensitivity analysis of our algorithm towards several of its parameters on a small subset of the WIDE repository. In section 5.2.4, we finally evaluate our tool over a wider subset of the WIDE repository.
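A minimal sketch of how the ROC points of this section are obtained, assuming each traffic segment comes with its dangerousness index and a ground-truth label; the names are illustrative, not UNADA's actual code.

    import numpy as np

    def roc_points(dangerousness, labels, thresholds):
        # One (FPR, TPR) point per value of the threshold Th applied to
        # the dangerousness index: a segment is reported when its index
        # exceeds Th. Assumes both classes are present in the labels.
        d = np.asarray(dangerousness, dtype=float)
        y = np.asarray(labels, dtype=bool)  # True = anomalous in ground truth
        points = []
        for th in thresholds:
            reported = d > th
            tpr = np.sum(reported & y) / np.sum(y)
            fpr = np.sum(reported & ~y) / np.sum(~y)
            points.append((fpr, tpr))
        return points

    # Toy usage: [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
    print(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0], [0.0, 0.5, 1.0]))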

5.2.1 MAWILab: a ground truth for the MAWI dataset
The ground truth for our evaluation is provided by MAWILab [83]. In [83], Fontugne et al. combine four different anomaly detectors using the SCANN algorithm to produce a single set of anomalies. The anomaly detection algorithms used are: a PCA and sketch-based detector [97, 105], a detector relying on sketching and multi-resolution gamma modeling [92], a Hough transform-based detector [143] and a detector relying on traffic feature distribution changes [93]. Their ground truth is a set of heuristics that allows one to label anomalous traffic. The output of their method is composed of four different labels: anomalous, suspicious, notice and benign. Anomalous means that the considered traffic is abnormal. Traffic labeled as suspicious is suspected to be abnormal and is situated above an internal threshold of SCANN. The notice label means that the traffic is below the same threshold: traffic labeled as such has been identified as anomalous by one or several detectors but shall not be considered as anomalous. Benign traffic has not been flagged as anomalous by any of the anomaly detection algorithms.
The MAWILab website 1 offers access to the output of their method applied to MAWI traffic from samplepoints B (from December 2000 to June 2006) and F (from August 2006 to this day). The MAWILab output is itself divided into two files: one containing the traffic anomalies labeled as anomalous and suspicious, and one containing the traffic anomalies labeled as notice. Each file is an XML file generated through the Admd library 2. The Admd library provides a unified interface to export meta-data of anomaly detection results as XML files. Anomalies are described as events located in time and impacting a set of source and destination IP addresses, ports and protocols.
The MAWILab authors also provide a tool called “MAWILab Evaluate” that compares a reference XML file with another XML file containing the output of a detector to evaluate. MAWILab Evaluate provides the classic statistical test results: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). It provides these values for three different cases: flows, events and alarms. True positive flows represent the actual number of anomalous and suspicious IP flows identified as anomalous by a classifier; statistical results on flows thus consider each flow atomically. On the contrary, true positive events represent anomalies for which at least one flow has been detected as anomalous. The number of true positive events thus represents the number of flows belonging to one or several anomalies, provided that at least one flow of each anomaly has been detected as anomalous. The same applies to the last statistical test results, alarms. Statistical test results on alarms measure the number of alarms, and thus of anomalies detected, provided that one flow of the considered anomaly has been detected. There are no true negatives for this statistical test result; we therefore only use the sensitivity or True Positive Rate (TPR) for alarms.

1 http://www.fukuda-lab.org/mawilab/
2 http://admd.sourceforge.net/
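The three granularities can be summarized by the sketch below, which reflects our reading of the definitions above, not the actual MAWILab Evaluate code; a ground-truth anomaly is modeled as the set of flow identifiers it contains.

    def true_positive_counts(ground_truth_anomalies, detected_flows):
        # ground_truth_anomalies: list of sets of anomalous flow ids,
        # one set per documented anomaly; detected_flows: flagged flow ids.
        tp_flows = sum(len(a & detected_flows) for a in ground_truth_anomalies)
        # Event granularity: all flows of an anomaly count once at least
        # one of its flows has been detected.
        tp_events = sum(len(a) for a in ground_truth_anomalies
                        if a & detected_flows)
        # Alarm granularity: one unit per anomaly with >= 1 detected flow.
        tp_alarms = sum(1 for a in ground_truth_anomalies if a & detected_flows)
        return tp_flows, tp_events, tp_alarms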

5.2.2 Preliminary results
In order to expose the differences between events and flows, we first run UNADA on six MAWI traces: September 28th 2001, April 14th 2002, November 24th 2003, July 12th 2004, May 5th 2005 and October 13th 2006.
A perfect ROC curve would be shaped as a step; in other words, such a curve would link the three following points: (0, 0), (0, 1) and (1, 1). Such a ROC curve would offer a maximal TPR (i.e. TPR = 1) with a FPR equal to 0. The point at the top right corner represents the result where the TPR and the FPR are both equal to 1; such a result is achieved by labeling every flow as anomalous. Figure 5.7a displays a perfect ROC curve. On the other hand, the ROC curve obtained from a random classifier would be a diagonal, i.e. a curve that goes from (0, 0) to (1, 1). Figure 5.7b displays such a curve.


Figure 5.7: ROC curves for perfect and random classifiers.

In our case, UNADA always exhibits a consistently low false positive rate. One can thus compare our ROC curve(s) to a leg: the point at (0, 0) being the ankle and the point at (1, 1) being the hip. The main inflexion is the knee. One can thus say that the more bent the knee, i.e. the closer the knee is to the point (0, 1), the better the results. UNADA's results can also be evaluated by measuring how high the highest point located on the left part of the curve is or, in other words, how high and close to the top left corner the knee is. In fact, the FPR being always low, it is the position of this point that indicates how good our performance is in terms of TPR. This method is used throughout this section to assess UNADA's performances.
Sensitivity curves display the value of the sensitivity (or TPR) with respect to the value of the dangerousness index threshold. The shape of the sensitivity curve (the lower the threshold, the higher the TPR) is easily explained by the fact that the lower the threshold, the more anomalies are reported. Since UNADA has a consistently low false positive rate, high threshold values are of little interest because they exhibit a low FPR and a low TPR. However, since the FPR is consistently low, and since the lower the threshold, the higher the FPR, the main criterion to evaluate UNADA's performance is how high the point with the lowest threshold value is (i.e. the point located at the left of the sensitivity curve). Thus, the higher this point, the more anomalies are detected and the better the results.
Figures 5.8a and 5.8b expose the differences between events and flows. While MAWILab's events reflect the ability of an algorithm to find at least one flow in an anomaly, MAWILab's flows indicate the actual number of identified anomalous flows. Our algorithm here exhibits limited performance in identifying the whole sets of anomalous flows, since it is only able to find 12% of the anomalous flows. It however exhibits good performance in finding at least one flow in each anomaly by detecting 93% of the anomalous events. It is also able to find 89% of the alarms.


Figure 5.8: ROC curves for events and flows and sensitivity curve of alarms with change detection.

In order to understand why we obtain such performance in terms of overall detection of anomalous flows, we temporarily remove the change detection step (cf. section 3.1.1) from UNADA and hence launch the clustering and post-processing steps at each time slot. These results are displayed on figure 5.9. The comparison between figures 5.8a and 5.9a, and figures 5.8c and 5.9c, on one hand, and figures 5.8b and 5.9b on the other hand, leads to several observations. The amount of detected events does not significantly increase, since it only changes from 93% to 97%. The number of detected alarms increases from 89% to 98%. However, the number of detected flows nearly quadruples between figure 5.8b and 5.9b, from 12% to 45%. This change appears on the curve as a knee that is significantly higher in figure 5.9b than in figure 5.8b. This could be explained by the fact that some anomalies last more than one time slot and that the IP addresses involved are not active during the whole duration of the anomaly. Thus, when we analyze every time slot, we are able to detect anomalies inside a greater number of time slots. Another possible explanation for the lack of performance on flow detection is the use of a relatively short time slot length of 15 s. Anomalies that start at the end of a time slot and finish at the beginning of the next one may be harder to extract: their separation from normal traffic is reduced by the fact that their packets are located in two time slots, so the anomalies' attribute values are less extreme. We however did not investigate the impact of the time slot size nor the use of a sliding time window.

5.2.3 UNADA sensitivity analysis towards parameters
This section addresses the sensitivity of UNADA towards several of its parameters. Sensitivity analysis is critical in the sense that it allows one to verify that an algorithm is able to perform well when used with a wide range of settings. We here use a subset of the MAWI dataset generated by randomly selecting one trace for each month between January 2001 and December 2006. Contrary to the end of the previous section, we here use the change detection step as described in chapter 3. This step provides prima facie evidence of an occurring anomaly.

5.2.3.1 Time window size
We here study the influence of the time slot or time window size, noted ∆T or ws, over the performance of UNADA. Figure 5.10a shows that the two highest knees of all curves are those of the curves for ws = 15 and ws = 20. ws = 5 and ws = 10 provide slightly degraded performance, while ws = 30 provides the worst performance. Figure 5.10b shows that ws = 20 and ws = 30 offer the best performance in terms of flow detection; ws = 5, ws = 10 and ws = 15 offer slightly lower performance. Concerning alarms, the performances of all settings are close and follow this rule: the smaller the window, the better the performance. The best overall performance is obtained by either ws = 15 or ws = 20.

Figure 5.9: ROC curves of events and flows and sensitivity curve of alarms without change detection.

5.2.3.2 Subspace clustering
We here address the influence of the two parameters of the clustering algorithm used, DBSCAN [29], over our detection results. These parameters define the cluster density: the minimum number of points per cluster (nbpDBS) and epsilon (epsDBS).

Epsilon, or ε, defines the neighborhood of a point. In our case, the feature space is normalized for each attribute and the dimension of each subspace X_n is 2. Epsilon is thus bounded between 0 and √2. When ε = 0, each point is an outlier. On the contrary, since the feature space is normalized between 0 and 1 and the size k of each subspace is equal to 2, ε = √2 guarantees that each subspace contains a single cluster and no outliers.

Figure 5.10: ROC curves for events and flows and sensitivity curve of alarms regarding the time window size (∆T/ws).

Figure 5.11a shows that the highest knee of all curves is the one of the curve for epsDBS = 0.15, which thus provides the best event detection performance. epsDBS = 0.2 exhibits a slightly lower knee compared to epsDBS = 0.15 and thus slightly degraded performance. epsDBS = 0.1 and epsDBS = 0.3 display much worse results, since their knees are even lower. The observation of figure 5.11b does not allow one to differentiate the parameter values in terms of flow detection. Figure 5.11c exhibits results partially coherent with those of figure 5.11a concerning epsDBS = 0.15 and epsDBS = 0.3: epsDBS = 0.3 displays the worst results in terms of alarms and epsDBS = 0.15 the best. However, epsDBS = 0.1 here provides results in terms of alarms close to the best ones (i.e. epsDBS = 0.15), better than those of epsDBS = 0.2; this was not the case for anomalous events. This could be explained by the fact that epsDBS = 0.1 and epsDBS = 0.2 detect different anomalies: epsDBS = 0.1 could detect more anomalies than epsDBS = 0.2, but these anomalies could contain fewer flows.


Figure 5.11: ROC curves for events and flows and sensitivity curve of alarms regarding epsilon (epsDBS).
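To make the two DBSCAN parameters concrete, here is a minimal run on one normalized two-dimensional subspace, using scikit-learn's DBSCAN as a stand-in for our implementation; eps and min_samples play the roles of epsDBS and nbpDBS, and the data is synthetic.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    # One dense "normal traffic" cluster plus two isolated anomalous
    # flows, all attributes normalized in [0, 1] as in each subspace X_n.
    X = np.vstack([rng.normal(0.5, 0.02, size=(200, 2)),
                   [[0.95, 0.95], [0.05, 0.90]]])

    labels = DBSCAN(eps=0.15, min_samples=4).fit_predict(X)  # epsDBS, nbpDBS
    print(sorted(set(labels)))  # [-1, 0]: -1 marks outliers, 0 the cluster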

The minimum number of points per cluster (nbpDBS) represents the minimum number of points in the neighborhood of a considered point needed to form a cluster (this neighborhood being defined by epsDBS). This minimum number of points is bounded between 2 and the total number of points (aggregated flows in our case) inside the feature space. However, if we want to correctly find anomalous clusters, we shall use a small enough value in order to be able to find small clusters. We here chose to test values between 2 and 20. Figure 5.12 displays the lack of influence of the minimum number of points per cluster on detected events, flows and alarms, since all curves are close. There are several reasons that explain this behavior. First, the minimum number of points per cluster has no impact on the ability of UNADA to find a normal cluster in each subspace X_n. Such an impact would only be possible if the chosen values were high enough to cause the absence of a normal cluster as defined by the condition of hypothesis 3.2. The chosen values being bounded between 2 and 20, they never lead to the absence of a normal cluster, since there is always at least one cluster that contains more than 20 points (or flows). Second, the considered parameter has no influence on whether an anomaly is considered as detected by MAWILab Evaluate or not. Section 3.2.1 explains that anomalies can be represented either as clusters or as outliers. When the minimum number of points per cluster is too high, an anomaly does not contain enough flows to constitute a cluster (and thus a single traffic segment); in this case, the anomaly is represented by several outliers (and several traffic segments). But, from the point of view of MAWILab Evaluate, both cases are identical since the same flows are tagged as anomalous. These two points explain the lack of influence of the minimum number of points per cluster parameter on the detection performance.


Figure 5.12: ROC curves for events and flows and sensitivity curve of alarms regarding the minimum number of points per cluster (nbpDBS).

5.2.3.3 Inter-Clustering Results Association parameters
We here evaluate the influence of the cluster similarity threshold (tICSim) used in the ICRA step, and more precisely in the ICLA step (since this parameter is not needed for the IOA step, cf. section 3.2.5.2). The cluster similarity threshold (tICSim) used in ICLA is bounded between 0 and 1. 0 means that similar clusters may contain totally different aggregated flows; 1 means that the compared clusters must contain the exact same sets of aggregated flows. Figure 5.13 shows that the values tICSim = 0.7, tICSim = 0.85 and tICSim = 0.95 yield very similar performances, since their curves are very close. This is especially true for figure 5.13b. In figure 5.13a, the highest point on the left part of the curve for tICSim = 0.5 is slightly lower than the one of the other settings. Correspondingly, figure 5.13c shows that when tICSim = 0.5, UNADA detects slightly fewer correct alarms. tICSim = 0.5 thus exhibits a performance decrease and should therefore be avoided. Despite this slight degradation, the overall sensitivity of UNADA towards tICSim is low.

5.2.3.4 Normal traffic representation

We here examine the influence of the proportion Y of normal traffic in each subspace X_n and in the feature space (cf. hypothesis 3.2) over the performance of UNADA. The proportion of flows in the normal cluster/traffic segment (tPropPtsNormC) is bounded between 0 and 1. 0 means that, in each subspace, the biggest cluster will always be considered as the normal traffic (the same applies to the biggest traffic segment inside the feature space); 1 means that the biggest cluster must contain all the aggregated flows in the feature space (the same also applies to the biggest traffic segment). Figure 5.14 shows the impact of the proportion of flows in normal clusters/traffic segments (tPropPtsNormC), also called the proportion Y of normal traffic in hypothesis 3.2, on events, flows and alarms. The tPropPtsNormC values 0.5, 0.6 and 0.8 exhibit very similar and good performance, since their ROC curves on both figures 5.14a and 5.14b are close. The analysis of figure 5.14c reveals the same behavior for these three parameter values. tPropPtsNormC = 0.85 or 0.9 however present less bent knees on the ROC curves and a lower top-left point on the sensitivity curve. UNADA's performance for these parameter values is thus deteriorated and they shall be avoided.
This deterioration can be easily explained by the fact that the threshold on the proportion of normal traffic, tPropPtsNormC, is used to enforce hypothesis 3.2 (and thus hypotheses 3.3 and 3.4). Therefore, when, for example, a cluster or a traffic segment contains 82% of the aggregated flows, tPropPtsNormC = 0.85 or 0.9 fails to find normal traffic while tPropPtsNormC = 0.5, 0.6 or 0.8 manages to do so. In the former case, the execution stops and no anomaly is found in the current time slot, while in the latter, UNADA continues its execution and finds several anomalies. The impact of tPropPtsNormC on the ability of UNADA to find a representation of the normal traffic inside the feature space is thus critical. It is however interesting to note that the range of appropriate values for tPropPtsNormC is relatively wide; UNADA thus seems to be easily tunable regarding this parameter.
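The role of tPropPtsNormC can be summarized by this small sketch, a simplified reading of hypothesis 3.2 with illustrative names:

    def find_normal_representation(segment_sizes, t_prop_pts_norm_c):
        # Return the index of the biggest cluster/traffic segment if it
        # holds at least the proportion Y of the aggregated flows,
        # otherwise None (UNADA then stops for the current time slot).
        total = sum(segment_sizes)
        biggest = max(range(len(segment_sizes)), key=segment_sizes.__getitem__)
        if segment_sizes[biggest] / total >= t_prop_pts_norm_c:
            return biggest
        return None

    sizes = [820, 120, 60]                          # biggest holds 82% of flows
    print(find_normal_representation(sizes, 0.60))  # 0: normal traffic found
    print(find_normal_representation(sizes, 0.85))  # None: execution stops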


Figure 5.13: ROC curves for events and flows and sensitivity curve of alarms regarding the cluster similarity (tICSim).

Figure 5.14: ROC curves for events and flows and sensitivity curve of alarms regarding the proportion of flows in normal cluster/traffic segment (tPropPtsNormC).

5.2.4 Global evaluation on the MAWI dataset
In order to have a more thorough and reliable view of the global results of UNADA on the MAWI dataset, we increase the number of traces randomly selected from the MAWI dataset: we randomly pick four traces for each month from January 2001 to December 2006 (instead of one for each month as in section 5.2.3). The UNADA parameters are those which exhibit the best performance in the sensitivity analysis: ws (or ∆T) = 15 s, epsDBS = 0.15, nbpDBS = 4, tICSim = 0.85 and tPropPtsNormC (or Y) = 0.6.
The ROC curves (figure 5.15) show that UNADA is able to detect more than 95% of the anomalous events documented by MAWILab. Our algorithm however exhibits limited performance in terms of anomalous flow detection. UNADA also displays a constant and very low false positive rate across statistical tests (events and flows). This is especially important since it reduces the risk of alarm saturation for the operator.
We then proceed with an indirect comparison with the detectors used in MAWILab. In [83], Fontugne et al. state that the detector that detected the most anomalies in their study is the Kullback-Leibler-based one [93]; this detector contributed half of the anomalies reported in MAWILab. Since UNADA is able to detect at least one flow in more than 50% of the anomalies of MAWILab, UNADA has better performance than any single detector used in MAWILab in terms of number of detected anomalous events. This demonstrates the very good versatility of UNADA: it can detect a wide range of anomalies and obtain results close to those obtained in MAWILab through a strategy of combining the results of several anomaly detectors.


Figure 5.15: ROC curves for events and flows and sensitivity curve of alarms from 2001 to 2006 with four traces per month.

5.3 Summary

This chapter addresses the evaluation of UNADA. UNADA's performance is measured against the ground truth provided by the MAWILab work. We evaluate the influence of UNADA's parameters over the performance of our algorithm on a subset of the MAWI dataset. UNADA exhibits a low sensitivity towards two of its parameters: nbpDBS and tICSim. UNADA's sensitivity towards the parameters ws, epsDBS and tPropPtsNormC is greater; however, the range of appropriate values for these parameters is wide enough to offer easy tuning.
Our algorithm exhibits a consistently very low false positive rate. This is very interesting because it guarantees that the network operator will not be overwhelmed by false alarms. UNADA suffers from limited performance on anomalous flow identification. There are several factors that explain this lack of performance: first, we do not launch the unsupervised detection at every time slot, which leads us to detect only parts of long-lasting anomalies, and second, we use consecutive time slots, which causes short anomalous flows to be invisible from UNADA's point of view. Finally, UNADA presents consistent and very good performances in terms of alarm and event detection: it is able to detect more than 95% of anomalous events on a substantial subset of the MAWI dataset. This ensures that at least one flow is detected inside the vast majority of the anomalies documented by MAWILab. As a consequence, our algorithm only misses very few anomalies.

Chapter 6

Conclusion

Network traffic changes. These changes can be seen from several perspectives. A first perspective is spatial: traffic changes according to where one performs measurements. The second possible perspective is temporal: network traffic changes in two ways along time. On one hand, traffic displays a constant volume increase and cyclic variations. On the other hand, the nature of traffic evolves: new applications arise, macro-economic and political changes occur, etc. Network anomalies follow a similar path: they change along space and time. Network anomaly detection systems shall therefore seamlessly cope with these changes. Operators need to rely on tools that can autonomously cope with this novelty. We aim at designing a tool that fulfills this goal by using as little knowledge about traffic as possible. This chapter summarizes and comments on the contributions of our work. We address each contribution previously presented and expose its advantages and shortcomings.

6.1 Contributions

This section addresses the two major contributions that constitute our tool UNADA: unsupervised network anomaly detection and our anomaly post-processing techniques.

6.1.1 Unsupervised network anomaly detection
The goal of UNADA is to find anomalies inside network traffic using as little knowledge about traffic as possible. In order to do so, our algorithm only relies on two hypotheses: anomalous traffic is statistically different from normal traffic, and normal traffic represents the majority of traffic.
We first propose a representation of network anomalies according to the flow aggregation method used: either source or destination IP addresses, both with several netmasks. This representation allows us to model normal traffic and anomalies inside a feature space constituted by several attributes built on each aggregated flow.
We propose a clustering method based on density-based clustering and the notion of clustering ensemble. This method combines the clustering results in low-dimensional spaces to form a global partition of the feature space. This method is highly reliable for finding abnormal outliers and both abnormal and normal clusters, since it allows UNADA to detect the majority of the anomalous events present in MAWI and documented by MAWILab (cf. section 5.2.4).
We also show that clustering is the most time-consuming task in our current implementation. We therefore analyze the scalability of our approach and show that it scales reasonably well regarding the number of flows and the number of features or attributes used. We state that, since the clustering step of UNADA can be parallelized, it can be used in the real world provided that enough resources are available.

6.1.2 Anomaly post-processing
In order to process the anomalies found, we propose a corpus of techniques. First, we present an anomaly correlation step that aims at grouping similar anomalous traffic segments found in several feature spaces to reduce redundancy among alarms. We also propose an anomaly ranking technique that reuses the results of the anomaly correlation step and uses two other anomaly characteristics (the proportion of traffic belonging to the anomaly in terms of number of packets and bytes) to rank anomalies according to their dangerousness. We finally present a method that characterizes anomalies through filtering rules that are further combined to produce compact and easy-to-understand signatures.

6.1.3 Evaluation
Our work is evaluated over a subset of the MAWI dataset using the ground truth provided by the MAWILab dataset. UNADA exhibits limited performance concerning the overall detection of anomalous IP flows and a very good ability to detect anomalous events. In other words, UNADA is only able to detect a small fraction of the flow set of each anomaly, but it is able to find at least one flow in the majority of anomalies. We show that one of the reasons for this is the fact that the unsupervised detection and characterization is not launched at each time slot (cf. section 5.2.2).

6.2 Perspectives

We here present perspectives concerning several aspects of UNADA that ought to be improved and extended in the future.

6.2.1 Hypotheses on traffic nature
UNADA uses hypotheses 1.1 and 3.2 to separate normal traffic from anomalous traffic. Hypothesis 1.1 is widely used among network anomaly detection systems in the literature and is also widely accepted by the community. On the other hand, hypothesis 3.2 comes from the intrusion detection field and, to our knowledge, had never been used for network anomaly detection. It is a very restrictive hypothesis, since it forbids the existence of both several mid-size normal clusters inside each subspace and several mid-size normal traffic segments in the whole feature space. The evaluation carried out in section 5.2 however shows that, despite the use of this very constraining hypothesis, UNADA performs very well. This can be considered as an “a posteriori” validation of the hypothesis.

If one chooses not to use this hypothesis, the problem of the identification of normal traffic remains. This problem seems very hard to solve, since one has to devise a method to separate normal clusters from anomalous ones. Such heuristics may be based upon thresholds on the size and number of clusters; however, they may prove very difficult to tune properly when used in the real world.

6.2.2 Clustering technique
We think that future work shall aim at integrating state-of-the-art high-dimensional data clustering algorithms similar to those presented in [30] and evaluated in [74] and [77]. Some of the algorithms evaluated in these works seem very promising regarding their scalability towards the number of points (in our case aggregated flows) and attributes used. Such algorithms would allow us to use a much larger number of aggregated flows and a much wider set of attributes. This would improve both the unsupervised detection step, by easing the separation from normal traffic, and the characterization step, by providing a greater number of attributes to build rules on. This may also allow us to use unsupervised learning at each time slot at a reduced cost, thus improving our performance in terms of the number of anomalous flows detected (cf. section 5.2.2). A multi-resolution time-slot scheme would also be less expensive to use.
The use of such clustering algorithms and of new attributes may require the reexamination and adaptation of the hypothesis on the proportion of normal traffic, according to the behavior of the considered clustering algorithm.

6.2.3 Anomaly post-processing
This section presents perspectives on our three anomaly post-processing methods: anomaly correlation, anomaly ranking and anomaly characterization.

6.2.3.1 Anomaly correlation
In terms of functionality, perspectives about anomaly correlation lie in the extension to clustering results built upon aggregation levels different from the two presented in section 4.1: IP source /24 (l3) and IP destination /24 (l6). Clustering results built on all the aggregation levels presented in section 3.1.2, l1,2,3 (IPsrc /8, /16, /24) and l4,5,6 (IPdst /8, /16, /24), could for example be correlated with each other. One can also imagine creating new aggregation levels based on the port, or on the type of port (below 1024 and above 1024).
In terms of evaluation, anomaly correlation is only partially analyzed, through a single example, in section 5.1. A more thorough evaluation could have been led if we had been able to automatically analyze the anomaly groupings done by our anomaly correlation method. However, this is not possible, since MAWILab Evaluate cannot compare the actual inclusion between reference anomalies and the anomalies found by the tool under evaluation. Furthermore, and from a more general point of view, a partial manual analysis of the global results reveals that the alarms generated by our tool are different from the ones in the MAWILab dataset. This means that there is a high probability that alarms from either system are included in the alarms of the other system. Unfortunately, MAWILab does not provide such information yet. Future improvements of our anomaly correlation method and its evaluation shall therefore first find a way to obtain this information.

6.2.3.2 Anomaly ranking
Future work shall aim to improve anomaly ranking by adding, to the formula of the dangerousness index, an assessment of the separation between the anomalous traffic representations and the normal traffic in the feature space. Such a separation measure could for example reuse the Fisher score built upon traffic segments for the ranking of relative rules. The rationale would be to assign a greater anomalousness to anomalous traffic segments located far from the normal traffic.

6.2.3.3 Anomaly characterization
Future work in terms of anomaly characterization lies in the improvement of the filtering rule building method. Up to now, it uses very simple heuristics based on two types of rules: relative and absolute rules. Future work could reformalize the building of these rules through a supervised learning-based approach. In fact, since hypotheses 1.1 and 3.2 allow us to identify normal traffic, we are able to provide labels for normal and anomalous traffic. A supervised learning technique could then be applied to automatically find rules that separate anomalous flows from normal ones. Decision tree algorithms such as ID3 [144] or C4.5 [145] would perfectly fit this task.
Section 5.1 provides an insight into the results of our characterization method. However, we have not been able to thoroughly and experimentally assess the efficiency of our characterization step. Such an assessment could have been realized by comparing our generated signatures with signatures that match real anomalies detected by signature-based systems. The MAWILab repository however does not provide a classification of the detected anomalies. Its XML-based anomaly annotation method already provides a field called type; this field is however used to store the four different types of MAWILab output: anomalous, suspicious, notice and benign. A new field containing the anomaly nature could easily be added. We thus leave the evaluation of the characterization method to future work.
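As a sketch of the supervised rule-building idea proposed above, the example below trains a small decision tree on labeled flows; scikit-learn's CART is used as a stand-in for ID3/C4.5, and the feature values are purely illustrative.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Labels come for free from hypotheses 1.1 and 3.2: flows of the
    # normal traffic representation get 0, flows of an anomalous
    # segment get 1. Features: nSrcs, nDsts, nSYN/nPkts.
    X = [[1, 300, 0.90], [1, 250, 0.80],               # anomalous flows
         [40, 2, 0.10], [35, 3, 0.20], [50, 1, 0.15]]  # normal flows
    y = [1, 1, 0, 0, 0]

    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    # The learned splits read directly as filtering rules / a signature.
    print(export_text(tree, feature_names=["nSrcs", "nDsts", "nSYN/nPkts"]))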

6.3 Summary

UNADA constitutes an original and new application of unsupervised machine learning to the field of network anomaly detection. We use clustering techniques to extract anomalies from traffic while only relying on two hypotheses: anomalies statistically deviate from normal traffic, and they represent less than half of the traffic. We thus build a system that autonomously copes with traffic changes. We also propose a set of techniques to automatically and autonomously correlate, rank and characterize the anomalies found. We verify that our algorithm is able to find anomalies against documented network traces from the MAWI repository.
In the context of our work, we also want to emphasize the value of datasets that provide ground truth for networking problems (network anomalies with MAWILab 1, traffic classification through the UNIBS dataset 2, etc.). These datasets are critical to automatically and easily evaluate work, especially when the considered work includes machine learning techniques. We regard our work as a significant proof of the interest of using machine learning in networking. Our work especially emphasizes the interest of applying unsupervised learning to use cases that face changing environments and need to adapt autonomously.

1 http://www.fukuda-lab.org/mawilab/
2 http://www.ing.unibs.it/ntw/tools/traces/

Unsupervised network anomaly detection

Johan Mazel

26 January 2012
Summary in French

Chapter A

Introduction

Anomaly detection in today's Internet is a task as complicated as it is tedious. Denial of Service (DoS) attacks, Distributed Denial of Service (DDoS) attacks and scans threaten the integrity and the day-to-day operation of the network. The main difficulty related to the detection and analysis of these different anomalies in networks is their ability to evolve and to amplify [146]. The main problem of network anomaly detection systems is that these anomalies evolve: it is thus impossible to define a fixed set of existing anomalies, and new anomalies appear every day. This problem is accentuated by the fact that the legitimate background traffic also evolves over time. This thesis aims at designing a network anomaly detection system able to detect a wide range of anomalies while using as little information as possible. The remainder of this chapter is composed of two sections. The first section exposes the reasons that lead us to think that anomaly detection systems must be more autonomous, and how we reach this goal. The second section details our contributions.

A.1 Initial problems

A.1.1 Changing traffic in the Internet
A.1.1.1 Changing traffic
Network traffic changes. This change operates along three axes that we detail below. The first axis is the change of traffic over time, at several scales and across several metrics such as the number of packets or bytes. We call this change traffic evolution. These temporal changes are of two kinds: cyclic at several scales (daily, weekly, etc.) or coherent in terms of trends over time, such as the constant increase of traffic volume. Daily and weekly variabilities have for example been highlighted in [1] and [2]. In [2], Papagiannaki et al. observe that the traffic in 2 of the 3 Points of Presence (PoP) increases between October 2000 and July 2002. In [5], Cho et al. show that the traffic at the main Japanese Internet exchange points has constantly increased over the last ten years.
The second axis of change is disparity. This term covers the differences in traffic structure when it is observed from different viewpoints. This phenomenon is highlighted by several works. In [7], Ringberg et al. expose the difference in the average number of packets per flow according to the traffic aggregation method: by link, by ingress router or by origin-destination flow. In [2], Papagiannaki et al. show that the traffic evolution is different over the three observed PoPs.
The third and last axis of traffic change is the change of its nature. We call this phenomenon traffic mutation. A first example of this phenomenon is the sudden appearance of the Netflix service. The particular case of Netflix in Canada is a good example: Netflix was launched there on September 22nd 2010 and, by the end of March 2011, this service represented 13.5% of the downstream traffic at peak hours [9].

A.1.1.2 Changing anomalies
Known anomalies are numerous and of very different natures. Some anomalies are not dangerous per se but can be seen as clues of a future attack; port scans fall into this category. Anomalies aiming at denial of service, such as SYN floods or ICMP floods, are dangerous insofar as they seek to degrade the service offered by their target. In [12], Borgnat et al. carry out a longitudinal study of the trace repository of the WIDE project [4]. The authors show that some anomalies, such as SYN floods, are recurrent while other anomalies are more localized in time. In [16], Allman et al. analyze the incoming traffic of the LBNL network, and more particularly port scans, from June 1st 1994 to December 23rd 2006. The authors show that port scans evolve all along their study: they increase when new worms appear, and the port numbers, and thus the targeted services, evolve as well.

A.1.2 Why are current traffic anomaly detection systems deficient?
Two approaches exist in the literature and in existing tools: signature-based detection (misuse detection) and anomaly detection. Signature-based systems are very efficient at detecting anomalies whose signatures are known. However, they cannot defend networks against unknown anomalies. Moreover, building and updating the signatures is costly and tedious, as it requires human intervention.
Traffic anomaly detection systems based on anomaly detection use anomaly-free traffic in order to build a model of normal traffic [17, 18]. This model then makes it possible to isolate anomalies as instances deviating from the previously built model. This type of detection is able to expose new anomalies. However, building the normal traffic profile can be complicated, and it assumes that one has access to anomaly-free traffic. Moreover, the continuous evolution of traffic characteristics complicates the update of the normal traffic profile.

Supervised learning has sometimes been used to automate the construction of signatures [20] and of the normal traffic profile [19]. The methods presented in these works partially automate the acquisition of knowledge about traffic; they however depend on the existence of the data needed for learning.
Beyond the sole problem of traffic anomaly detection, these systems provide little information about the detected anomalies. The operator in charge of monitoring the network has no information about the hierarchy of the detected anomalies in terms of dangerousness. He must also carry out a tedious task of analysis and characterization of the detected anomalies. These two anomaly processing tasks are not handled by existing traffic anomaly detection systems.

A.1.3 Autonomous traffic anomaly detection
The original hypothesis used in traffic anomaly detection systems is the following:

Hypothesis A.1 Anomalous traffic is statistically different from normal traffic. [21, 22]
As stated in section A.1.2, existing anomaly detection systems use two approaches: signature-based detection and anomaly detection. Both approaches use hypothesis A.1 to build models of the anomalous traffic (signature-based detection) or of the normal traffic (anomaly detection). However, and as we saw in section A.1.1, network traffic is extremely changing, and traffic anomaly detection systems are vulnerable to these changes. We therefore think that traffic anomaly detection systems must remain efficient despite these changes. Such systems must not use detailed knowledge about the traffic or the structure of anomalies. They must rather use a single hypothesis to separate normal traffic from anomalous traffic:

Hypothesis A.2 The majority of the traffic is normal. This is equivalent to: only X% of the traffic representations are anomalous, with X < 50%. [23]
Efficient anomaly detection systems must therefore use hypotheses A.1 and A.2 in order to find traffic anomalies autonomously. Unsupervised learning techniques make it possible to extract groups of similar instances and to separate groups of instances that differ from each other. These techniques thus allow traffic anomaly detection systems to separate anomalous and normal traffic, provided that hypothesis A.1 is verified. Once normal and anomalous traffic are separated, hypothesis A.2 provides us with a criterion to identify the normal traffic and, by extension, the anomalous traffic. We think that this traffic anomaly detection principle is the way forward to build traffic anomaly detection systems that do not rely on acquired knowledge and that are robust with respect to the evolution, disparity and mutation of traffic.

A.1.4 Summary
Traffic changes over time and according to the measurement points used. We think that existing anomaly detection systems that rely on models of normal or anomalous traffic are inefficient with respect to these phenomena. The costs of developing and updating these models complicate their use in the real world. We therefore think that traffic anomaly detection systems must be able to adapt autonomously to traffic changes through unsupervised learning.

A.2 Contributions

Our contribution comprises two parts. The first part is the application of clustering techniques to anomaly detection. The second part consists of a set of anomaly processing techniques.

A.2.1 Unsupervised detection of traffic anomalies
This part first proposes several techniques to process network traffic: aggregating it into flows and building attributes. We then explain how the different parts of the traffic are represented within the space constituted by the previously built attributes. We finally propose a clustering technique that allows us to extract the anomalous flows from the rest of the traffic.

A.2.2 Anomaly processing
This part proposes different techniques that process the result of the previous step. The first of these techniques correlates the anomalies extracted during the unsupervised detection. The second technique uses the results of the correlation step and two characteristics of each anomaly to rank them. The third and last technique characterizes the extracted anomalies through the building of signatures.

A.3 Context within the ECODE project

This thesis was carried out in the context of the ECODE project. This project is funded by the European Commission under contract FP7-ICT-2007-2/223936. The goal of this project is to introduce cognitive capabilities into network equipment.

A.4 Outline of the manuscript

This manuscript is composed of five chapters:

• Chapter B presents a state of the art in the fields of unsupervised learning and traffic anomaly detection.

• Chapter C addresses the principle of our traffic anomaly detection technique based on unsupervised learning.

• Chapter D presents several techniques that process the results of our detection in order to ease the operator's work.

• Chapter E exposes the evaluation of our algorithm on real traffic.

• Chapter F concludes this manuscript by summarizing our contribution and by proposing several directions for the improvement of our algorithm.

Chapter B

State of the art

This chapter introduces different works carried out in the fields of unsupervised learning and anomaly detection. The first section addresses unsupervised learning techniques. The second section presents different algorithms dedicated to the detection of traffic anomalies.

B.1 Unsupervised learning

Unsupervised learning techniques aim at extracting a hidden structure from unknown data. Blind source separation methods, neural networks and clustering are techniques that enable unsupervised learning.

Blind source separation Blind source separation methods aim at extracting a set of signals from a set of mixed signals. Several techniques address this problem: Principal Component Analysis (PCA) and Independent Component Analysis (ICA).

Neural networks Neural networks constitute the second type of unsupervised learning methods presented in this manuscript. A non-exhaustive list of neural networks used for unsupervised learning is: Adaptive Resonance Theory (ART) and Self-Organizing Maps (SOM).

Clustering Clustering techniques aim at grouping similar elements together according to a similarity measure. Clustering techniques divide into two main categories. The first category contains hierarchical clustering algorithms. These algorithms aim at creating a hierarchy between the groups of instances (or clusters); the final partitioning is obtained by fixing a level in the hierarchy and keeping the groupings existing at the considered level. The second category gathers so-called “partitional” clustering algorithms, by opposition to hierarchical clustering. These algorithms aim at building a global partition in a single pass. Among the existing algorithms of this family, we can cite k-means [28], DBSCAN [29], etc. Several recent works have attempted to apply clustering to high-dimensional data; we can here cite MAFIA [39] or MineClus [46, 47]. In [30], Kriegel et al. provide a more detailed state of the art of these techniques. In [31], [77] and [74], the authors evaluate several algorithms of this category.

B.2 Network traffic anomaly detection

The problem of anomaly detection has given rise to a large number of publications over the last decade. Most existing approaches analyze statistical variations of simple metrics (e.g. number of packets, bytes or flows) and/or of other traffic attributes (e.g. distribution of IP addresses and ports), on data coming from a single link or from a whole network. Among the tools used, we can cite signal analysis techniques (e.g. Auto-Regressive Integrated Moving Average, wavelets) on single-link data [95, 17], PCA [104, 18] and Kalman filters [91] for network-wide detection, and sketches applied to IP flows [89, 92].
Our approach can be classified in the field of unsupervised detection. Most existing works using unsupervised techniques have addressed the field of intrusion detection through the KDD'99 database. Most unsupervised detection methods use clustering and outlier detection [23, 124, 125]. In [23], Portnoy et al. use single-linkage hierarchical clustering to partition the KDD'99 data using a Euclidean distance. In [124], Eskin et al. present improved results on the same data obtained with three clustering algorithms: fixed-width clustering, an optimized version of k-NN, and a one-class support vector machine. In [125], Leung et al. use a density-grid-based clustering algorithm to improve the scalability of the clustering and obtain equivalent results.

B.3 Summary

Unsupervised learning techniques have existed for a long time. They aim to extract information from data without any prior knowledge about that data. We have also presented the existing techniques for traffic anomaly detection, as well as several existing algorithms that use unsupervised learning in the field of intrusion detection. The next chapter presents our own application of unsupervised learning to the problem of traffic anomaly detection.

Chapter C

Unsupervised anomaly detection

This chapter presents our unsupervised traffic anomaly detection algorithm. As announced in sections A.1.3 and A.2, our system aims to extract anomalous flows while using little knowledge about the traffic structure. This goal is reached through clustering algorithms and the assumptions stated in section A.1.3. Our algorithm is divided into three steps. The first step is a data-processing stage applied directly to network traffic: a change-detection algorithm is first applied to traffic-based time series, and the time windows deemed anomalous are then processed to build flows and attributes on those flows. This part of our algorithm is located at the top of figure C.1, inside the dashed frame with index 1. Unsupervised detection is the second step of our algorithm; it is shown in the lower part of figure C.1, surrounded by the dashed frame with index 2. The concepts presented in this chapter have been published in [127, 128, 129, 130, 131] and described in the ECODE deliverables D3.2 [132] and D3.3 [133]. The third step of our algorithm processes the anomalous flows extracted by the unsupervised detection; it is presented in the next chapter and is located inside the frame with index 3 in figure C.1.

C.1 Change detection, multi-resolution flow aggregation and attribute building

The first phase of the analysis is change detection. To this end, time series Z^i are built for three metrics: the number of packets, the number of bytes and the number of SYN packets. An anomaly-detection algorithm F(·) based on time-series analysis is applied to the Z^i in order to identify anomalous windows. Z^i denotes the time series for metric i, and Z^i_t the value of metric i in window t. A window t_j is declared anomalous if F(Z^i_{tj}) raises an alarm. Here F(·) uses deltoids as defined by Cormode et al. in [88].
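The following is a minimal stand-in for the change-detection step: it flags a window when its variation exceeds k times the mean past variation. This is not the deltoid algorithm of Cormode et al. [88], only a simple illustration of the F(·) interface.

# Simple change detector over a per-window metric series (assumption:
# a window is anomalous when its variation is k times the mean past
# variation; NOT the deltoids of [88]).
from typing import List

def anomalous_windows(series: List[float], k: float = 3.0) -> List[int]:
    """Return the indices t where |Z_t - Z_{t-1}| exceeds k times the
    mean absolute variation seen so far."""
    flagged, diffs = [], []
    for t in range(1, len(series)):
        d = abs(series[t] - series[t - 1])
        if diffs and d > k * (sum(diffs) / len(diffs)):
            flagged.append(t)
        diffs.append(d)
    return flagged

# e.g. a packet-count series: the spike at window 4 (and its end at
# window 5) is flagged for further analysis.
print(anomalous_windows([100, 105, 98, 102, 900, 101]))  # -> [4, 5]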

[Figure C.1: network traffic monitoring produces the time series Z_t^{nPkts}, Z_t^{nBts} and Z_t^{nSYN}, which feed the change detection F(Z_t); anomalous windows drive multi-resolution flow aggregation and attribute building (Y, X) (frame 1). The sub-spaces X_1, .., X_N are clustered with density-based clustering, and the clustering results P_1, .., P_N are combined through Evidence Accumulation or Inter-Clustering Results Association (frame 2). Detected network anomalies then go through anomaly correlation, anomaly ranking against the operator's threshold Th and anomaly characterization (frame 3); the resulting signatures are exported to a network security device (IDS, IPS, firewall, etc.).]

Figure C.1 – Overall scheme of our approach.

The traffic is then aggregated into flows at 9 levels l_i: source IP address (l1: IPsrc), destination IP address (l2: IPdst), source network prefix (l3,4,5: IPsrc/24, /16, /8), destination network prefix (l6,7,8: IPdst/24, /16, /8), and traffic per time window (l9: tpTS). Attributes are then built for each flow in order to characterize it. Table C.1 lists these attributes.

Attribute — Description
nDests — # distinct destination IP addresses
nSrcs — # distinct source IP addresses
nPkts/nDiffDestPort — # packets divided by # distinct destination ports
nDiffSrcAddr/nDiffDestAddr — # distinct source IP addresses divided by # distinct destination IP addresses
nICMP/nPkts — # ICMP packets divided by # packets
nEchoReqReply/nPkts — # ICMP echo-request or reply packets divided by # packets
nSYN/nPkts — # SYN packets divided by # packets
nRST/nPkts — # RST packets divided by # packets
bgstDestPort/tNbOccuDestPort — # occurrences of the most frequent destination port divided by total # of destination ports

Table C.1 – Attributes of the aggregated flows derived from the traffic.
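As an illustration of the attribute-building step, the hypothetical sketch below computes a subset of the attributes of table C.1 for one aggregated flow. Packets are assumed to be dicts with src, dst, dport, proto and flags keys; the field names are assumptions, not the thesis implementation.

# Hypothetical attribute building for one aggregated flow.
def flow_attributes(packets):
    n = len(packets)
    n_syn = sum(1 for p in packets if "SYN" in p.get("flags", ()))
    n_icmp = sum(1 for p in packets if p["proto"] == "icmp")
    n_dports = len({p["dport"] for p in packets})
    return {
        "nDests": len({p["dst"] for p in packets}),
        "nSrcs": len({p["src"] for p in packets}),
        "nPkts/nDiffDestPort": n / max(n_dports, 1),
        "nSYN/nPkts": n_syn / n,
        "nICMP/nPkts": n_icmp / n,
    }

pkts = [{"src": "a", "dst": "b", "dport": 80, "proto": "tcp", "flags": ("SYN",)},
        {"src": "a", "dst": "c", "dport": 81, "proto": "tcp", "flags": ("SYN",)}]
print(flow_attributes(pkts))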

C.2 How can traffic anomalies be detected through unsupervised learning?

The second step of the algorithm consists in detecting and characterizing the anomalous flows in an unsupervised way. To do so, the algorithm uses the set Y = {y_1, .., y_F} of the F flows captured in the anomalous interval, aggregated at one of the levels l_i used in the first phase. Each flow y_f ∈ Y is described by a set of A traffic attributes on which the analysis is performed. x_f = (x_f(1), .., x_f(A)) ∈ R^A is the attribute vector describing the traffic flow y_f, and X = {x_1; ..; x_F} is the attribute matrix, also called the attribute space.

C.2.1 Representations of network traffic This section discusses how traffic is represented in the attribute matrix. The two subsections detail the representations of normal and anomalous traffic, respectively.

C.2.1.1 Representation of normal traffic Section C.1 specifies that the atomic representation of traffic is an aggregated flow. It is in this context that assumption A.2 is refined:

Hypothesis C.1 The majority of aggregated flows contain normal traffic. Equivalently: Y% of the aggregated flows are normal, with Y > 50%. Hypothesis C.1 states that the majority of aggregated flows are normal. Assumption A.1 states that anomalies are statistically different from normal traffic. Considering the whole attribute matrix, normal traffic can thus be seen as one or several cluster(s) containing at least half of the aggregated flows. This assumption is refined in section C.3.

C.2.1.2 Representation of anomalous traffic Here we assume that an anomaly is contained in a small number of flows. An anomaly may therefore appear either as an outlier (i.e., an isolated flow) or as a small cluster (a few similar flows), depending on the aggregation type and the network mask used. Table C.2 details the characteristics of different anomalies: distributed nature, aggregation type and network mask used, and the signature built. A SYN DDoS (SYN Distributed Denial of Service) anomaly targeting a single victim from a large number of IP addresses located in several /24 domains will form a cluster if the traffic is aggregated by source IP. Indeed, each /24 domain will yield a flow whose attribute values are very far from those of the majority of the traffic: a large number of packets sent, a single destination, and a very high proportion of SYN packets. The set of these flows will form a cluster. If the traffic is aggregated by destination IP address, whatever the mask, the single destination address (i.e., the target) will form an outlier, characterized in particular by a large number of sources and a high proportion of SYN packets.

C.2.2 General presentation We have developed a divide-and-conquer clustering approach based on the notions of clustering ensemble [137] and clustering combination. This approach combines density-based clustering [29], sub-space clustering (SSC) [31] and evidence accumulation (EA) [139]. We present here the general idea of the approach. Instead of directly clustering the whole attribute space X with a classical distance such as the Euclidean distance, the algorithm clusters N different sub-spaces X_n ⊂ X, thus obtaining N clustering results P_n of the flows of Y. Each sub-space X_n is built from R < A attributes.

Table C.2 – Attributes used for the detection of DoS, DDoS, port and network scans, and worm propagation. (NB: anomalies of a distributed nature involve here several @IP/24, themselves contained in a single @IP/16.)

Anomaly example / distributed nature — aggregation type and network mask → clustering result — impact on traffic attributes:

DoS (ICMP ∨ SYN), 1-to-1:
  IPsrc/∗ → Outlier; IPdst/∗ → Outlier.
  Impact: nSrcs = nDsts = 1, nPkts/sec > λ1, avgPktsSize < λ2, (nICMP/nPkts > λ3 ∨ nSYN/nPkts > λ4).
DDoS (ICMP ∨ SYN) from several @IP/24, N-to-1:
  IPsrc/24 (l3) → Cluster; IPsrc/16 (l4) → Outlier; IPdst/∗ → Outlier.
  Impact: nDsts = 1, nSrcs > α1, nPkts/sec > α2, avgPktsSize < α3, (nICMP/nPkts > α4 ∨ nSYN/nPkts > α5).
Port scan, 1-to-1:
  IPsrc/∗ → Outlier; IPdst/∗ → Outlier.
  Impact: nSrcs = nDsts = 1, nDstPorts > β1, avgPktsSize < β2, nSYN/nPkts > β3.
Network scan toward several @IP/24, 1-to-N:
  IPsrc/∗ → Outlier; IPdst/24 (l6) → Cluster; IPdst/16 (l7) → Outlier.
  Impact: nSrcs = 1, nDsts > δ1, nDstPorts > δ2, avgPktsSize < δ3, nSYN/nPkts > δ4.
Worm propagation over several @IP/24, 1-to-N:
  IPsrc/∗ → Outlier; IPdst/24 (l6) → Cluster; IPdst/16 (l7) → Outlier.
  Impact: nSrcs = 1, nDsts > η1, nDstPorts < η2, avgPktsSize < η3, nSYN/nPkts > η4.

This makes it possible to analyze the structure of X from R-out-of-A different viewpoints, with a much better resolution. To determine the number R of dimensions used per sub-space, we rely on a monotonicity property of clustering results known as the downward closure property: "if a set of points forms a cluster in a d-dimensional space, then this same set of points also forms a cluster in every (d−1)-dimensional projection of this space". This implies that if clusters exist in X, they will also exist in lower-dimensional sub-spaces. Using small values of R then offers several advantages: first, clustering is faster in low-dimensional spaces; second, density-based clustering algorithms are more effective in low-dimensional spaces [38]; finally, the clustering results are easier to visualize. These constraints led us to choose R = 2 dimensions per sub-space, and thus N(m) = C_A^R = A(A−1)/2.

Evidence Accumulation The information obtained from the different clustering results built on the X_n is then combined to produce a new similarity measure between the flows of Y. These values are stored in an F × F matrix S used to detect small clusters, and in a vector D used to rank outliers. The element S(o,p) represents the degree of similarity between flows o and p: it reflects the number of times flows o and p fall in the same cluster, and also takes the size of that cluster into account in order to favour small clusters. The element D(d) reflects the abnormality of an outlier, accounting for the number of times the flow was classified as an outlier and for the separation between this outlier and the rest of the traffic. This makes it possible to separate from the rest of the traffic both flow outliers and flow clusters simultaneously identified in different sub-spaces. These new similarity measures thus allow the anomalous flows to be extracted from the whole traffic. In other words, if we can extract flows that stand out from the majority of the traffic, these flows form an anomaly; otherwise, the alarm raised for level l_i on the considered time window Z_{tj} was a false alarm.
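The following is a minimal sketch of sub-space clustering with evidence accumulation, assuming scikit-learn's DBSCAN for the density-based clustering of each 2-D sub-space. The 1/|cluster| weighting that favours small clusters is a simplified stand-in for the similarity update described above, not the exact formula used by the system.

# Sub-space clustering + evidence accumulation (simplified sketch).
from itertools import combinations
import numpy as np
from sklearn.cluster import DBSCAN

def ssc_evidence(X, eps=0.15, min_samples=4):
    """X: (F, A) attribute matrix. Returns the F x F similarity matrix S
    accumulated over all N = A(A-1)/2 two-dimensional sub-spaces X_n."""
    F, A = X.shape
    S = np.zeros((F, F))
    for a, b in combinations(range(A), 2):                    # each 2-D view X_n
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X[:, [a, b]]).labels_
        for c in set(labels) - {-1}:                          # -1 marks outliers
            members = np.flatnonzero(labels == c)
            S[np.ix_(members, members)] += 1.0 / len(members)  # favour small clusters
    return S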

Inter-Clustering Results Association However, by reasoning on similarities between instances (here, flows), evidence accumulation introduces several sources of error. Consider two sets of instances P_i and P_j: if their cardinalities are close and they appear in a similar number of sub-spaces, evidence accumulation will produce very close similarity values for every flow belonging to either set. These flows will then be considered as belonging to the same cluster. This situation can be dangerous insofar as two anomalies could be grouped together and thus be misidentified and, later on, mischaracterized. Another type of error can occur if a clustering algorithm is applied to the values of S with poor parameters. Moreover, applying thresholds on S and/or D can significantly degrade the performance of our system if bad values are used. To avoid these problems, we propose a new approach for combining the clustering results obtained in each sub-space: Inter-Clustering Results Association. The idea here is to address the problem in terms of cluster and outlier similarity instead of similarity between instances (here, flows); we thus move the similarity measure from the instances to the clustering results. The problem can then be divided into two sub-problems: correlating clusters through Inter-CLuster Association (ICLA) and correlating outliers through Inter-Outlier Association (IOA). In each case, we express the similarity between clustering results through a graph. Each vertex is a cluster or an outlier from any of the sub-spaces X_n, and an edge between two vertices means that the clustering results represented by these vertices are similar. Once the graphs are built, we must analyze them to find sets of mutually similar clustering results. In terms of vertices, such sets translate into sets of vertices in which every vertex is connected to all the other vertices of the set; graph theory calls such a set of vertices a clique. The maximum-clique problem is NP-complete, and most existing solutions rely on an exhaustive search of sub-graphs within the graph, which is unfortunately too slow for our needs. We therefore use a greedy algorithm to identify cliques within the similarity graphs, as sketched below. Each resulting set of flows is called a traffic segment; it is built from the intersection of the flows contained in each cluster or outlier of the considered clique.
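The sketch below is a hedged illustration of such a greedy clique search: vertices stand for clusters or outliers coming from different sub-spaces, and similar(u, v) is an assumed predicate telling whether two of them are similar.

# Greedy clique search over a similarity graph (approximation: exact
# maximum-clique search is NP-complete).
def greedy_cliques(vertices, similar):
    """Greedily group vertices into cliques of mutually similar elements."""
    cliques, remaining = [], list(vertices)
    while remaining:
        clique = [remaining.pop(0)]
        for v in list(remaining):
            if all(similar(v, u) for u in clique):
                clique.append(v)
                remaining.remove(v)
        cliques.append(clique)
    return cliques

# Toy usage: integers are "similar" when they have the same parity.
print(greedy_cliques([1, 2, 3, 4], lambda u, v: (u - v) % 2 == 0))  # [[1, 3], [2, 4]]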

C.3 Redefining the representation of normal traffic

In light of the presentation of our clustering technique in section C.2.2, and to ease the automation of the anomaly-detection process, we redefine the representation of normal traffic given by assumption A.2:

Hypothesis C.2 The majority of aggregated flows are normal. These flows form a single cluster that contains at least Y% of the aggregated flows.

The downward closure property [38] allows us to extend the definition of the representation of normal traffic presented in section C.2.1. Given that normal traffic is represented by a single cluster containing at least half of the F flows, the property stated above guarantees that this cluster will also be present in each of the sub-spaces X_n. We therefore verify this observation at two points of our algorithm and make two new assumptions.

Hypothesis C.3 In each sub-space X_n, the biggest cluster (in number of aggregated flows) must contain at least Y% of the aggregated flows. Hypothesis C.4 In each attribute matrix, the biggest traffic segment (in number of aggregated flows) must contain at least Y% of the aggregated flows. If either of these two hypotheses is not verified, the detection stops and UNADA starts again on the next time window; a minimal sketch of this check follows.
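This short sketch illustrates the check behind hypotheses C.3 and C.4, assuming clustering labels in which -1 marks outliers; tPropPtsNormC is the Y threshold of the text.

# Normal-traffic check from hypotheses C.3/C.4 (sketch).
import numpy as np

def passes_normal_check(labels, tPropPtsNormC=0.6):
    """True when the biggest cluster holds at least Y% of all flows."""
    labels = np.asarray(labels)
    clusters = labels[labels != -1]          # drop outliers (-1)
    if clusters.size == 0:
        return False
    return np.bincount(clusters).max() >= tPropPtsNormC * labels.size

print(passes_normal_check([0, 0, 0, 0, 0, 0, 1, 1, -1, -1]))  # True: 6/10 >= 0.6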

C.4 Computation time and parallelization

We now discuss the computation time (CT) of the algorithm. Our algorithm performs several clusterings on N(m) low-dimensional sub-spaces X_n ⊂ X. These multiple computations raise a scalability issue for real-time detection on very high-speed network cores. Two characteristics of our algorithm help mitigate these scalability issues with respect to the number of attributes m and the number of aggregated flows n to analyze. On the one hand, clustering is performed in very low-dimensional sub-spaces, X_n ∈ R^R with R = 2 here (cf. section C.2.2), which is faster than clustering in a high-dimensional space [138]. On the other hand, each sub-space can be clustered independently of the others, which makes the sub-space clustering algorithm particularly well suited to parallelization. This parallelization can be achieved in different ways: on a multi-core and/or multi-processor machine, on network card(s) with built-in processing units, on GPU(s) (Graphic Processor Units), on several machines, or on any combination of these techniques. From now on we use the term slice to denote a computing entity. Figure C.2 shows the computation time of our algorithm as a function of the number of attributes m used to describe the flows (a) and of the number n of flows to analyze (b). Figure C.2.(a) compares the computation time obtained when clustering the whole attribute space X, denoted CT(X), with the computation time required by sub-space clustering, varying A from 2 to 29 attributes. A large number of flows, here 10^4, is analyzed using two numbers of slices, T = 40 and T = 100. To estimate the computation time of our sub-space clustering algorithm for given values of m and T, we proceed as follows.

[Figure C.2: computation time (s) of clustering in the full attribute space vs. parallelized sub-space clustering with 40, 100 and 190 slices. (a) Time vs. number of attributes. (b) Time vs. number of flows, on a log10 scale.]

Figure C.2 – Computation time as a function of the number of attributes and of flows to analyze. The number of flows in (a) is n = 10000. The number of attributes and slices for (b) is m = 20 and M = 190.

First, we cluster each of the N = m(m−1)/2 sub-spaces X_n, and the worst single-sub-space computation time, CT(X_SSCwc) = max_n CT(X_n), is kept as a reference. If N ≤ T, the number of slices covers all the clusterings to perform, so the algorithm runs in a fully parallel manner and the total computation time equals the worst case CT(X_SSCwc). If, on the contrary, N > T, some slices must cluster several sub-spaces, and the computation time becomes (N % T + 1) times the worst case CT(X_SSCwc), where % denotes integer division. The first interesting observation in figure C.2.(a) concerns the growth of CT(X) as m increases: CT is 8 seconds for m = 2 and more than 200 seconds for m = 29. The decrease of CT for small values of m partly compensates for the large number of clusterings performed. The second observation concerns parallelism: deployed on a parallel architecture, the algorithm can analyze large amounts of traffic with a large number of attributes in real time. For instance, analyzing 20 attributes on an architecture with 100 slices makes it possible to process 10000 flows in less than 20 seconds. Figure C.2.(b) compares CT(X) and CT(X_SSCwc) as a function of the number of flows n to analyze, using m = 20 attributes and M = N = 190 slices (i.e., our algorithm is fully parallelized). The difference in computation time between clustering in the whole space and clustering in the sub-spaces is still present: the gain is more than one order of magnitude, independently of the number of flows to analyze. Regarding the amount of traffic that can be analyzed in a fully parallel configuration, the algorithm is able to process more than 50000 flows within a relatively limited computation time of about 4 minutes. These evaluations were performed with about 2500 aggregated flows in a time window of duration ∆T = 20 s, which corresponds to a computation time CT(X_SSCwc) ≈ 0.4 seconds. Considering the m = 9 attributes used previously (N = 36), the total computation time without parallelization is N × CT(X_SSCwc) ≈ 14.4 seconds.
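The following worked sketch mirrors the computation-time model just stated; the numeric values are those given in the text.

# Computation-time model of this section: CT(X_SSCwc) when N <= T,
# otherwise (N % T + 1) x CT(X_SSCwc), where '%' is integer division.
def total_time(N, T, ct_worst):
    return ct_worst if N <= T else (N // T + 1) * ct_worst

print(total_time(190, 190, 1.0))  # fully parallel: one worst-case round
print(total_time(190, 100, 1.0))  # two rounds of clustering
print(36 * 0.4)                   # no parallelism: N x CT(X_SSCwc) ~ 14.4 s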

C.5 Summary

In this chapter we have presented our unsupervised detection algorithm. UNADA relies on two assumptions: anomalous traffic is different from normal traffic, and normal traffic outweighs anomalous traffic. We use clustering to isolate the normal and anomalous components of network traffic. The evaluation of our algorithm's execution time shows that it scales well and can therefore be efficiently parallelized. Once the anomalies have been extracted, a processing step is needed to reduce the operator's workload; this step is addressed in the next chapter.

Chapter D

Anomaly processing

The unsupervised clustering used in chapter C allows us to extract the anomalous flows among the F flows aggregated at a level l_i. However, this step can potentially extract a large number of flows, with the risk of overwhelming the operator with a large number of alarms. A processing step for the extracted anomalies is therefore fully justified, in order to help the operator deal with the most dangerous anomalies first. The operator must also be assisted in the analysis of the anomalies. We therefore propose a three-step anomaly-processing technique to reduce the operator's workload. First, we reduce the total number of anomalies by correlating the anomalies extracted from flows aggregated at several levels l_i. We then use a technique that ranks the anomalies in terms of dangerousness, thereby allowing the operator to prioritize his work. Finally, we apply several heuristics to build anomaly signatures. Each of these steps is depicted in figure D.1.

[Figure D.1: the network anomalies produced by the detection step go through anomaly correlation, anomaly ranking against the dangerousness threshold Th, and anomaly characterization, which produces signatures.]

Figure D.1 – Overview of the anomaly-processing stage.

D.1 Anomaly correlation

In the previous chapter we saw that our unsupervised algorithm is able to extract anomalies from flows aggregated at a given level l_i. Since the algorithm can be applied to several aggregation levels, using it in this way would generate a large number of alarms. A question therefore arises: why not correlate the anomalies extracted from flow sets built at different aggregation levels l_i, so as to reduce the number of alarms while improving the reliability of our results? We follow this reasoning by obtaining several clustering results, each built from a different aggregation level, and by devising a strategy to correlate the anomalies extracted from these clustering results. This method was published in [131]. To correlate the anomalies found at different aggregation levels, we define two unique characteristics of an anomaly: its set of source addresses and its set of destination addresses. Two anomalies are considered similar if these two address sets are similar. In this work, we restrict ourselves to correlating anomalies detected with two aggregation types: source IP address and destination IP address. We therefore do not use anomalies found at aggregation levels of the same type (l1 and l2 for instance), as such anomalies could be included in one another. Moreover, we restrict ourselves to a /24 network mask for the aggregation levels. Both choices were made to reduce the required amount of computation. The aggregation levels used are therefore l3 and l6 (cf. the aggregation levels in section C.1). The set of correlated anomalies is obtained from the pairs of similar anomalies.
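The sketch below illustrates this correlation step. The text only requires a set-similarity criterion between address sets; Jaccard similarity and the 0.85 threshold are assumptions made here for illustration.

# Anomaly correlation across aggregation levels (illustrative sketch).
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similar_anomalies(srcs1, dsts1, srcs2, dsts2, threshold=0.85):
    """Two anomalies are deemed similar when both their source and
    destination address sets are similar enough."""
    return (jaccard(srcs1, srcs2) >= threshold and
            jaccard(dsts1, dsts2) >= threshold)

print(similar_anomalies({"1.2.3.0/24"}, {"5.6.7.8"},
                        {"1.2.3.0/24"}, {"5.6.7.8"}))  # True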

D.2 Anomaly ranking

The operator needs to prioritize his different tasks to increase his efficiency. In the field of traffic anomaly detection, the more dangerous an anomaly is, the more critical its analysis. Task prioritization in this context thus requires ranking the anomalies according to their dangerousness. After the anomaly-correlation step, several anomalies have been correlated. These provide prima facie evidence of a dangerousness hierarchy among the anomalies: if an anomaly appears as such at several aggregation levels, its flows differ from normal traffic at each of these levels, and the anomaly is therefore potentially dangerous at each of them. Correlated anomalies can thus be considered more dangerous. We therefore introduce a dangerousness index based on the number of aggregation levels at which an anomaly is detected. This index also takes into account the proportion of traffic contained in the anomalous flows of the considered anomaly, in terms of numbers of packets and bytes. The threshold Th in figure D.1 is a threshold on the value of the dangerousness index of the considered anomaly. This method was partially published in [131].
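As an illustration only: the text states that the index combines the number of aggregation levels in which the anomaly is detected with its share of packets and bytes, so a linear combination is one plausible form. The weights below are assumptions, not the thesis formula.

# Illustrative dangerousness index (weights are assumptions).
def dangerousness(n_levels, pkt_share, byte_share, w=(0.5, 0.25, 0.25)):
    return w[0] * n_levels + w[1] * pkt_share + w[2] * byte_share

Th = 1.0  # operator-chosen threshold from figure D.1
score = dangerousness(n_levels=2, pkt_share=0.3, byte_share=0.4)
print(score, score >= Th)  # 1.175 True: anomaly reported with priority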

D.3 Automatic anomaly characterization

At the end of the previous step, the unsupervised algorithm has identified one or several flows in Y that are similar to each other and far from the majority of the traffic flows. The next task consists in automatically producing a set of filtering rules to isolate and characterize these flows. These filtering rules serve two purposes: obtaining an accurate picture of the nature of the anomaly, to ease analysis by the network operator, and building a signature of the anomaly by combining rules. The resulting signature can then be used to detect the anomaly with a very classical signature-based detection system. To produce the filtering rules, the algorithm keeps the sub-spaces X_n in which the anomalous flow(s) are far from the rest of the traffic. We define two different classes of filtering rules: absolute rules, denoted AFR(Y), and relative rules, denoted RFR(Y). Absolute rules do not depend on the separation between clusters; they correspond to the presence of dominant values in the attributes of the anomalous flows. An absolute rule on attribute a characterizing a set of flows Y_g ⊂ Y containing an anomaly is of the form AFR_a(Y_g) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) == λ}. For instance, in the case of an ICMP flooding attack, the majority of the flows contain only ICMP packets, so the absolute filtering rule {nICMP/nPkts == 1} applies. Relative filtering rules depend on the separation between normal and anomalous flows. If the anomalous flows are separated from the rest of the traffic in a clustering result P_n, then the attributes of the corresponding sub-space X_n can be used to define a relative rule. A relative rule on attribute a characterizing a set of flows Y_g ⊂ Y containing an anomaly is of the form RFR_a(Y_g) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ || x_f(a) > λ}. A covering relation between two relative rules is defined when both rules state a relation on the same attribute and on the same side of the threshold (lower or higher). E.g., given the rules f_a(Y_g) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ1} and f'_a(Y_g) = {∀ y_f ∈ Y_g ⊂ Y : x_f(a) < λ2}, the rule f_a(Y_g) covers the rule f'_a(Y_g) if and only if λ1 < λ2 (any flow satisfying f_a also satisfies f'_a). If two or more rules cover one another, the algorithm keeps the one that covers the others. To build a concise signature of the anomaly, the most discriminating filtering rules are selected. Absolute rules are important because they define characteristics inherent to the anomalies. The interest of relative rules depends on the degree of separation of the anomalous flows. In the case of outliers, we select the K attributes for which the normalized distance to normal traffic (represented by the biggest cluster in each sub-space) is among the K largest distances. In the case of small clusters, the degree of separation from the remaining clusters is ranked using the Fisher Score (FS) [147], and the K best-ranked rules are selected. The Fisher Score measures the separation between clusters relative to the total variance within each cluster. Finally, to build a signature, the absolute rules and the top-K relative rules are combined into a single predicate, using the covering relation in case of overlap.
This set of methods for building rules and composing a signature allows the system to characterize the detected anomalies effectively and to present them in a simple, concise and easy-to-understand form.
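The following minimal sketch illustrates absolute/relative filtering rules and the covering relation just defined; Rule is a hypothetical helper type, not the thesis code.

# Absolute/relative filtering rules and the covering relation (sketch).
from dataclasses import dataclass

@dataclass
class Rule:
    attr: str
    op: str      # '==' (absolute rule), '<' or '>' (relative rule)
    thr: float

    def covers(self, other: "Rule") -> bool:
        """self covers other when both rules constrain the same attribute
        on the same side and satisfying self implies satisfying other."""
        if (self.attr, self.op) != (other.attr, other.op) or self.op == "==":
            return False
        return self.thr < other.thr if self.op == "<" else self.thr > other.thr

r1 = Rule("nSYN/nPkts", ">", 0.9)
r2 = Rule("nSYN/nPkts", ">", 0.8)
print(r1.covers(r2))  # True: any flow with x > 0.9 also satisfies x > 0.8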

D.4 Summary

This chapter presents several techniques we developed to process anomalies once they have been discovered. We first reduce the number of anomalies by correlating those found at different aggregation levels l_i. We reuse this result, together with two other characteristics of the anomalies, to build a dangerousness index that allows us to rank them. Finally, we characterize the anomalies with a method that builds a detailed signature of each one, allowing the operator to easily understand its nature. This set of techniques eases and reduces the operator's work. These different techniques are partially detailed in chapter E through a practical case in which UNADA is applied to real network traffic.

Chapter E

Evaluation

We evaluate here the ability of our unsupervised algorithm to detect and characterize several anomalies contained in real traffic traces from the MAWI repository of the WIDE project [4]. The operational WIDE network interconnects research institutions in Japan, Internet service providers and universities in the United States. This repository contains 15-minute packet traces collected since 1999 and only partially documented [83].

E.1 Case study: a time window from a trace of the MAWI repository

We first manually analyzed an arbitrarily chosen WIDE trace. We detected and characterized a SYN network scan directed at several hosts belonging to the same /16 network. Packets are aggregated into IPdst/24 flows in Y. The length of the time window is ∆T = 15 seconds.

[Figure E.1: two clustering results plotting nSYN/nPkts against nDsts (a) and against nSrcs (b); clusters, outliers and the anomalous flows are marked, together with the absolute and relative filtering rules derived on each panel. (a) SYN Network Scan (1/2). (b) SYN Network Scan (2/2).]

Figure E.1 – Filtering rules for the characterization of the attacks in the WIDE trace.

Regarding the filtering rules and the signature associated with the anomaly, figures E.1a and E.1b show some of the clustering results P_n for which the absolute rules and the K relative rules were produced. The anomaly is represented here by a cluster.

The rules are built on the following attributes: number of sources, number of destinations, and proportion of SYN packets. Combining them yields the signature (nSrcs == 1) ∧ (nDsts > λ1) ∧ (nSYN/nPkts > λ2), where λ1 and λ2 are two thresholds obtained by separating the clusters halfway between the means of each cluster for the considered attribute. Using the mean provides an approximation of the cluster's values for the attribute in question. This signature and the clustering result are perfectly consistent with table C.2: the network scan forms a cluster, it sends packets from a single source toward a large number of targets, and the majority of the packets are SYN packets. The major novelty of this approach lies in the generation of this signature without any prior knowledge about the anomaly or the background traffic.
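The derived signature can be applied directly to flow records, as in the hedged sketch below; the threshold values are illustrative, not the halfway-between-cluster-means values actually computed by the system.

# Applying the derived signature
# (nSrcs == 1) AND (nDsts > lambda1) AND (nSYN/nPkts > lambda2).
def matches_scan_signature(flow, lambda1=50, lambda2=0.9):
    return (flow["nSrcs"] == 1
            and flow["nDsts"] > lambda1
            and flow["nSYN/nPkts"] > lambda2)

print(matches_scan_signature({"nSrcs": 1, "nDsts": 300, "nSYN/nPkts": 0.95}))  # True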

E.2 Evaluation of UNADA's performance

This section addresses the evaluation of our algorithm on real traffic traces extracted from the MAWI repository. This step is critical, as it measures the effectiveness of our algorithm.

E.2.1 MAWILab: a documentation of the anomalies contained in MAWI We use the MAWILab data [83] as the reference for evaluating our work. In [83], Fontugne et al. combine four traffic anomaly detection algorithms through the SCANN algorithm to obtain a single set of anomalies. The detection algorithms used are: an algorithm based on PCA and sketches [97, 105], an algorithm based on sketches and multi-resolution gamma modeling [92], an algorithm based on the Hough transform [143], and an algorithm based on change detection over distributions of traffic attributes [93]. The MAWILab website (1) gives access to the results of the method on the traffic of the MAWI repository for measurement points B (from December 2000 to June 2006) and F (from August 2006 to today). Each file is in XML format and was generated with the admd library (2). The MAWILab authors also provide a tool called "MAWILab Evaluate" that compares a reference XML file with another file containing the results of an algorithm under evaluation. MAWILab Evaluate provides the four classical statistical test outcomes: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The tool provides these four values for three different cases: events, flows and alarms. The statistical test results for flows consider each flow atomically. In contrast, the true positives for events represent anomalies for which at least one flow was detected as anomalous; the number of true-positive events thus represents a number of flows belonging to anomalies of which at least one flow is considered anomalous. The same holds for alarms, except that the numbers of true-positive alarms count alarms, not flows.

(1) http://www.fukuda-lab.org/mawilab/ (2) http://admd.sourceforge.net/

By definition, there is no true negative for alarms.

E.2.2 Preliminary results We carry out a preliminary evaluation on the MAWI repository using one trace per year from 2001 to 2006. Our algorithm exhibits a false-positive (FP) rate that remains consistently low throughout the experiments of this section. UNADA detects only 12% of the flows; it nevertheless detects 95% of the events and 89% of the alarms. We then show that if the change-detection step (cf. section C.1) is removed, so that the unsupervised detection runs on every time window, performance improves: UNADA is then able to detect 45% of the anomalous flows, 97% of the events and 98% of the alarms.

E.2.3 Sensitivity analysis We conduct a sensitivity analysis of UNADA with respect to the following parameters: epsDBS, nbpDBS, tICSim and tPropPtsNormC. epsDBS is a parameter of the DBSCAN clustering algorithm that defines the neighborhood of a point. The analysis of the results shows that epsDBS = 0.15 offers the best performance for the three criteria provided by MAWILab (events, flows and alarms). epsDBS = 0.1 performs badly in terms of events and moderately in terms of alarms. epsDBS = 0.2 performs moderately for events but somewhat worse for alarms. epsDBS = 0.3 performs badly for both events and alarms. All these settings perform similarly in terms of flows. nbpDBS is the other DBSCAN parameter defining the neighborhood. The tested values reveal a very low sensitivity of UNADA to this parameter, for two reasons. First, the tested values do not prevent the emergence of a cluster representing normal traffic (which would degrade performance by blocking the analysis of some time windows, cf. hypotheses C.3 and C.4). Second, MAWILab Evaluate has a global view of the identified anomalies: it does not account for possible inclusions and/or complementarities between detected elements. Since nbpDBS only determines how an anomaly is represented, either as a cluster or as several outliers (cf. section C.2.1), it does not influence the system's performance. tICSim is the threshold applied to the inter-cluster similarity criterion used by ICRA. The values tICSim = 0.7, tICSim = 0.85 and tICSim = 0.95 produce very close results; tICSim = 0.5 produces slightly lower results. The sensitivity of UNADA to tICSim is thus globally low. tPropPtsNormC, also denoted Y, is the threshold applied to the proportion of normal traffic (cf. hypotheses C.3 and C.4). Our analyses show that the values 0.5, 0.6 and 0.8 yield good performance, while 0.85 and 0.95 yield degraded performance. UNADA thus performs well over a wide range of possible settings.

E.2.4 Global analysis To get a more precise idea of the effectiveness of our algorithm, we use a larger subset of the MAWI repository: 4 traces per month from 2001 to 2006. The parameter values used are: epsDBS = 0.15, nbpDBS = 4, tICSim = 0.85 and tPropPtsNormC (or Y) = 0.6. The results show that UNADA detects a small proportion of the anomalous flows, but is able to detect more than 95% of the anomalous events.

E.3 Summary

This chapter presents the evaluation of UNADA, carried out using the data provided by the MAWILab project as a reference. UNADA's sensitivity is low with respect to the parameters nbpDBS and tICSim, and higher with respect to epsDBS and tPropPtsNormC. The range of settings yielding good results is wide enough to make UNADA easy to tune. Our algorithm exhibits a low false-positive rate, which is very valuable since it guarantees that the operator will not be overwhelmed with alarms. Our algorithm is moreover able to detect an overwhelming majority of the anomalous events reported by MAWILab.

Chapter F

Conclusion

Our unsupervised anomaly detection and characterization algorithm outperforms the methods previously proposed in this field, without requiring any prior knowledge of the anomalies or of the background traffic. Thanks to the proposed clustering techniques, the algorithm succeeds in extracting the anomalous flows; it is also able to correlate, rank and characterize the extracted anomalies. The resulting signatures can then be translated into the language of security devices (intrusion detection systems, firewalls, ...) and used there automatically. Our algorithm can thus be embedded in autonomous security systems in which unsupervised detection/characterization runs in parallel with a signature-based system, so as to identify unknown anomalous events and be able to detect them immediately afterwards. The perspectives of our work are vast. A first improvement would be to use a larger number of aggregation levels to improve the correlation phase. The anomaly-ranking techniques could also be improved by reusing the clustering results and incorporating the distance to normal traffic into the dangerousness index. The characterization step could be improved by using supervised learning algorithms (such as decision trees) in place of the current heuristics. Our work therefore seems to us to represent a first decisive step toward autonomous anomaly detection systems that can integrate seamlessly into autonomous network management systems.

Acronyms

ADSL Asymmetric Digital Subscriber Line. 6, 9

AMR Adaptive Mesh Refinement. 20

ARP Address Resolution Protocol. 30, 31

ART Adaptive Resonance Theory. 17

AS Autonomous System. 5, 26

BGP Border Gateway Protocol. 26

BSS Blind Signal Separation. 16

DDoS Distributed Denial of Service. 4, 9, 30, 38

DoS Denial of Service. 4, 9, 38

EA4C EA for small-Clusters identification. 43, 45

EA4O EA for Outliers identification. 43

EAC Evidence Accumulation Clustering. 43, 45

FFT Fast Fourier Transform. 28

FPR False Positive Rate. 45, 68

FTTH Fiber-To-The-Home. 6

GA Genetic Algorithms. 30

GPU Graphic Processor Unit. 49

HTTP Hypertext Transfer Protocol. 9

ICA Independent Component Analysis. 16, 40

ICLA Inter-CLuster Association. 46, 61, 65, 75

ICMP Internet Control Message Protocol. 30, 58

ICRA Inter-Clustering Results Association. 49, 61, 75

IDS Intrusion Detection Systems. 13, 25

IOA Inter-Outlier Association. 46, 63, 65, 75

IP Internet Protocol. 26–28, 30, 34–39, 47, 55, 61, 64, 65, 67, 70, 81, 82

IPS Intrusion Protection Systems. 13, 25

ISP Internet Service Provider. 4, 5, 13, 60

ISPs Internet Service Providers. 5

LBNL Lawrence Berkeley National Laboratory. 10

NNTP Network News Transfer Protocol. 10

OD-flows Origin-Destination flows. 28, 29

PCA Principal Component Analysis. 16, 17, 23, 24, 27–29, 40, 67

PoP Point of Presence. 6

PoPs Points of Presence. 6, 8

ROC Receiver operating characteristic. 66

SNMP Simple Network Management Protocol. 5, 26

SOFM Self-Organizing Feature Map. 30

SOM Self-Organizing Maps. 17

SSC Sub-Space Clustering. 41, 43, 49

SSH Secure Shell. 10

SVM Support Vector Machine. 30, 31

TCP Transmission Control Protocol. 30

TPR True Positive Rate. 45, 68

UDP User Datagram Protocol. 30

VoD Video on Demand. 8

Bibliography

[1] M. Roughan, A. Greenberg, C. Kalmanek, M. Rumsewicz, J. Yates, and Y. Zhang, “Experience in measuring backbone traffic variability: models, metrics, measure- ments and meaning,” in Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, IMW ’02, (New York, NY, USA), pp. 91–92, ACM, 2002.

[2] K. Papagiannaki, N. Taft, Z.-L. Zhang, and C. Diot, “Long-term forecasting of internet backbone traffic: observations and initial models,” in INFOCOM 2003. Twenty-Second Annual Joint Conference of the IEEE Computer and Communica- tions. IEEE Societies, vol. 2, pp. 1178 – 1188 vol.2, march-3 april 2003.

[3] K. Cho, K. Fukuda, H. Esaki, and A. Kato, “The impact and implications of the growth in residential user-to-user traffic,” in Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communica- tions, SIGCOMM ’06, (New York, NY, USA), pp. 207–218, ACM, 2006.

[4] K. Cho, K. Mitsuya, and A. Kato, “Traffic data repository at the wide project,” in Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC ’00, (Berkeley, CA, USA), pp. 51–51, USENIX Association, 2000.

[5] K. Cho, K. Fukuda, H. Esaki, and A. Kato, “Observing slow crustal movement in residential user traffic,” in Proceedings of the 2008 ACM CoNEXT Conference, CoNEXT ’08, (New York, NY, USA), pp. 12:1–12:12, ACM, 2008.

[6] Cisco, “Cisco Visual Networking Index: Forecast and Methodology, 2010–2015,” tech. rep., 6 2010.

[7] H. Ringberg, A. Soule, J. Rexford, and C. Diot, “Sensitivity of pca for traffic anomaly detection,” in Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, SIGMETRICS ’07, (New York, NY, USA), pp. 109–120, ACM, 2007.

[8] Sandvine, “Global internet phenomena report: Fall 2011,” 2011. http://www.sandvine.com/downloads/documents/2010%20Global%20Internet%20Phenomena%20Report.pdf.

[9] Sandvine, “Global internet phenomena spotlight: Netflix rising,” 2011. http://www.sandvine.com/downloads/documents/05-17-2011_phenomena/Sandvine%20Global%20Internet%20Phenomena%20Spotlight%20-%20Netflix%20Rising.pdf.

[10] D. Plonka, “Uw-madison napster traffic measurement,” 2000. http://net.doit.wisc.edu/data/Napster/.

[11] T. Karagiannis, A. Broido, N. Brownlee, K. Claffy, and M. Faloutsos, “Is p2p dying or just hiding? [p2p traffic measurement],” in Global Telecommunications Confer- ence, 2004. GLOBECOM ’04. IEEE, vol. 3, pp. 1532 – 1538 Vol.3, nov.-3 dec. 2004.

[12] P. Borgnat, G. Dewaele, K. Fukuda, P. Abry, and K. Cho, “Seven years and one day: Sketching the evolution of internet traffic,” in INFOCOM 2009, IEEE, pp. 711 –719, april 2009.

[13] K. Cho, Broadband Traffic Report, Traffic Shifting away from P2P File Sharing to Web Services, pp. 25–30. Internet Initiative, Japan, 2010.

[14] G. Maier, A. Feldmann, V. Paxson, and M. Allman, “On dominant characteristics of residential broadband internet traffic,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, (New York, NY, USA), pp. 90–102, ACM, 2009.

[15] Cisco, “Cisco Visual Networking Index: Usage,” tech. rep., 10 2010.

[16] M. Allman, V. Paxson, and J. Terrell, “A brief history of scanning,” in Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, IMC ’07, (New York, NY, USA), pp. 77–82, ACM, 2007.

[17] J. D. Brutlag, “Aberrant behavior detection in time series for network monitoring,” in Proceedings of the 14th USENIX conference on System administration, pp. 139– 146, 2000.

[18] A. Lakhina, M. Crovella, and C. Diot, “Mining anomalies using traffic feature dis- tributions,” in Proceedings of the ACM SIGCOMM 2005 conference, SIGCOMM ’05, (New York, NY, USA), pp. 217–228, ACM, 2005.

[19] T. Shon and J. Moon, “A hybrid machine learning approach to network anomaly detection,” Inf. Sci., vol. 177, pp. 3799–3821, September 2007.

[20] N. Duffield, P. Haffner, B. Krishnamurthy, and H. Ringberg, “Rule-based anomaly detection on ip flows,” in INFOCOM 2009, IEEE, pp. 424 –432, april 2009.

[21] D. E. Denning, “An intrusion-detection model,” IEEE Transactions on Software Engineering, vol. 13, pp. 222–232, feb. 1987.

[22] H. S. Javitz, A. Valdes, A. S. Ooie, and R. D. Patton, “The nides statistical com- ponent: Description and justification,” tech. rep., 1994.

[23] L. Portnoy, E. Eskin, and S. Stolfo, “Intrusion detection with unlabeled data using clustering,” in Proceedings of ACM CSS Workshop on Data Mining Applied to Security, DMSA-2001, pp. 5–8, 2001.

[24] I. Jolliffe, Principal component analysis. Springer series in statistics, Springer-Verlag, 2002.

[25] K. Pearson, “On lines and planes of closest fit to systems of points in space,” Philo- sophical Magazine, vol. 2, no. 6, pp. 559–572, 1901.

[26] J. Vesanto and E. Alhoniemi, “Clustering of the self-organizing map,” Neural Net- works, IEEE Transactions on, vol. 11, pp. 586 –600, may 2000.

[27] B. Mazzotta, “Prominence and polarity in international trade,” in Proceedings of the annual meeting of the International Studies Association Annual Conference ”Global Governance: Political Authority in Transition”, 03 2011.

[28] J. B. MacQueen, “Some methods for classification and analysis of multivariate ob- servations,” in Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability (L. M. L. Cam and J. Neyman, eds.), vol. 1, pp. 281–297, University of California Press, 1967.

[29] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for dis- covering clusters in large spatial databases with noise,” in Proc. of 2nd International Conference on Knowledge Discovery and Data Mining, KDD’1996, pp. 226–231, AAAI Press, 1996.

[30] H.-P. Kriegel, P. Kröger, and A. Zimek, “Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering,” ACM Trans. Knowl. Discov. Data, vol. 3, pp. 1:1–1:58, March 2009.

[31] L. Parsons, E. Haque, and H. Liu, “Subspace clustering for high dimensional data: a review,” SIGKDD Explor. Newsl., vol. 6, pp. 90–105, June 2004.

[32] C. C. Aggarwal, J. L. Wolf, P. S. Yu, C. Procopiuc, and J. S. Park, “Fast algorithms for projected clustering,” in Proceedings of the 1999 ACM SIGMOD international conference on Management of data, SIGMOD ’99, (New York, NY, USA), pp. 61– 72, ACM, 1999.

[33] K.-G. Woo, J.-H. Lee, M.-H. Kim, and Y.-J. Lee, “Findit: a fast and intelligent subspace clustering algorithm using dimension voting,” Information and Software Technology, vol. 46, no. 4, pp. 255 – 271, 2004.

[34] K. Y.-L. Yip, D. W. Cheung, and M. K. Ng, “On discovery of extremely low- dimensional clusters using semi-supervised projected clustering,” in Proceedings of the 21st International Conference on Data Engineering, ICDE ’05, (Washington, DC, USA), pp. 329–340, IEEE Computer Society, 2005.

[35] C. Böhm, K. Kailing, H.-P. Kriegel, and P. Kröger, “Density connected clustering with local subspace preferences,” in Proceedings of the Fourth IEEE International Conference on Data Mining, ICDM ’04, (Washington, DC, USA), pp. 27–34, IEEE Computer Society, 2004.

[36] C. Domeniconi, D. Papadopoulos, D. Gunopulos, and S. Ma, “Subspace clustering of high dimensional data,” in Proceedings of the Fourth SIAM International Conference on Data Mining, SIAM, April 2004.

[37] J. H. Friedman and J. J. Meulman, “Clustering objects on subsets of attributes,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 66, no. 4, pp. 815–849, 2004.

[38] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, “Automatic subspace clustering of high dimensional data for data mining applications,” SIGMOD ’98, pp. 94–105.

[39] H. Nagesh, S. Goil, and A. Choudhary, “Adaptive grids for clustering massive data sets,” in In 1st SIAM International Conference Proceedings on Data Mining, 2001.

[40] C.-H. Cheng, A. W. Fu, and Y. Zhang, “Entropy-based subspace clustering for mining numerical data,” in Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’99, (New York, NY, USA), pp. 84–93, ACM, 1999.

[41] K. Kailing, H.-P. Kriegel, and P. Kröger, “Density-connected subspace clustering for high-dimensional data,” in Proceedings of the 4th SIAM International Conference on Data Mining (SDM), pp. 246–257, 2004.

[42] I. Assent, R. Krieger, E. Müller, and T. Seidl, “Dusc: Dimensionality unbiased subspace clustering,” in Proceedings of the 2007 Seventh IEEE International Conference on Data Mining, (Washington, DC, USA), pp. 409–414, IEEE Computer Society, 2007.

[43] W.-k. Liao, Y. Liu, and A. Choudhary, “A grid-based clustering algorithm using adaptive mesh refinement,” 2004.

[44] M.-I. Akodjènou-Jeannin, K. Salamatian, and P. Gallinari, “Flexible grid-based clustering,” in Proceedings of the 11th European conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2007, (Berlin, Heidelberg), pp. 350–357, Springer-Verlag, 2007.

[45] C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. M. Murali, “A monte carlo algorithm for fast projective clustering,” in Proceedings of the 2002 ACM SIGMOD international conference on Management of data, SIGMOD ’02, (New York, NY, USA), pp. 418–427, ACM, 2002.

[46] M. L. Yiu and N. Mamoulis, “Frequent-pattern based iterative projected clustering,” in Proceedings of the Third IEEE International Conference on Data Mining, ICDM ’03, (Washington, DC, USA), pp. 689–, IEEE Computer Society, 2003.

[47] M. L. Yiu and N. Mamoulis, “Iterative projected clustering by subspace mining,” Knowledge and Data Engineering, IEEE Transactions on, vol. 17, pp. 176 – 189, feb. 2005.

[48] E. Achtert, C. Böhm, H.-P. Kriegel, P. Kröger, I. Müller-Gorman, and A. Zimek, “Detection and visualization of subspace cluster hierarchies,” in Proceedings of the 12th international conference on Database systems for advanced applications, DASFAA’07, (Berlin, Heidelberg), pp. 152–163, Springer-Verlag, 2007.

[49] K. Y.-L. Yip, D. Cheung, and M. Ng, “Harp: a practical projected clustering algo- rithm,” Knowledge and Data Engineering, IEEE Transactions on, vol. 16, pp. 1387 – 1397, nov. 2004.

[50] K. Sequeira and M. Zaki, “Schism: a new approach to interesting subspace mining,” Int. J. Bus. Intell. Data Min., vol. 1, pp. 137–160, December 2005.

[51] H.-P. Kriegel, P. Kröger, M. Renz, and S. Wurst, “A generic framework for efficient subspace clustering of high-dimensional data,” in Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM ’05, (Washington, DC, USA), pp. 250–257, IEEE Computer Society, 2005.

[52] G. Moise, J. Sander, and M. Ester, “P3c: A robust projected clustering algorithm,” in Proceedings of the Sixth International Conference on Data Mining, ICDM ’06, (Washington, DC, USA), pp. 414–425, IEEE Computer Society, 2006.

[53] G. Moise, J. Sander, and M. Ester, “Robust projected clustering,” Knowl. Inf. Syst., vol. 14, pp. 273–298, March 2008.

[54] C. C. Aggarwal and P. S. Yu, “Finding generalized projected clusters in high dimen- sional spaces,” in Proceedings of the 2000 ACM SIGMOD international conference on Management of data, SIGMOD ’00, (New York, NY, USA), pp. 70–81, ACM, 2000.

[55] C. Böhm, K. Kailing, P. Kröger, and A. Zimek, “Computing clusters of correlation connected objects,” in Proceedings of the 2004 ACM SIGMOD international conference on Management of data, SIGMOD ’04, (New York, NY, USA), pp. 455–466, ACM, 2004.

[56] E. Achtert, C. Böhm, P. Kröger, and A. Zimek, “Mining hierarchies of correlation clusters,” in Scientific and Statistical Database Management, 2006. 18th International Conference on, pp. 119–128, 2006.

[57] E. Achtert, C. Böhm, J. David, P. Kröger, and A. Zimek, “Robust clustering in arbitrarily oriented subspaces,” in Proceedings of the 8th SIAM International Conference on Data Mining (SDM), pp. 763–774, SIAM, 2008.

[58] D. Barbarà and P. Chen, “Using the fractal dimension to cluster datasets,” in Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’00, (New York, NY, USA), pp. 260–264, ACM, 2000.

[59] E. P. M. Sousa, C. Traina Jr., A. J. M. Traina, and C. Faloutsos, “How to use fractal dimension to find correlations between attributes,” in First Workshop on Fractals and Self-similarity in Data Mining: Issues and Approaches (in conjunction with the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), (Edmonton, Alberta, Canada), pp. 26–30, ACM Press, 2002.

[60] A. Gionis, A. Hinneburg, S. Papadimitriou, and P. Tsaparas, “Dimension induced clustering,” in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, KDD ’05, (New York, NY, USA), pp. 51–60, ACM, 2005.

[61] R. Haralick and R. Harpaz, “Linear manifold clustering,” in Proceedings of the 4th International Conference on Machine Learning and Data Mining in Pattern Recognition (P. Perner and A. Imiya, eds.), vol. 3587 of Lecture Notes in Computer Science, pp. 132–141, Springer Berlin / Heidelberg, 2005. doi:10.1007/11510888_14.

[62] R. Harpaz and R. Haralick, “Mining subspace correlations,” in Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 335–342, April 2007.

[63] J. A. Hartigan, “Direct clustering of a data matrix,” Journal of the American Statistical Association, vol. 67, no. 337, pp. 123–129, 1972.

[64] A. Califano, G. Stolovitzky, and Y. Tu, “Analysis of gene expression microarrays for phenotype classification,” in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 75–85, AAAI Press, 2000.

[65] Q. Sheng, Y. Moreau, and B. De Moor, “Biclustering microarray data by gibbs sampling,” Bioinformatics, vol. 19, suppl. 2, pp. ii196–ii205, 2003.

[66] E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller, “Rich probabilistic models for gene expression,” Bioinformatics, vol. 17, suppl. 1, pp. S243–S252, 2001.

[67] Y. Cheng and G. M. Church, “Biclustering of expression data,” in Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology, pp. 93–103, AAAI Press, 2000.

[68] J. Yang, W. Wang, H. Wang, and P. Yu, “δ-clusters: Capturing subspace correlation in a large data set,” in Proceedings of the 18th International Conference on Data Engineering, ICDE ’02, (Washington, DC, USA), pp. 517–, IEEE Computer Society, 2002.

[69] H. Cho, I. S. Dhillon, Y. Guan, and S. Sra, “Minimum sum squared residue co-clustering of gene expression data,” in Proceedings of the 4th SIAM International Conference on Data Mining (SDM), pp. 114–125, SIAM, 2004.

[70] A. Ben-Dor, B. Chor, R. Karp, and Z. Yakhini, “Discovering local structure in gene expression data: the order-preserving submatrix problem,” in Proceedings of the sixth annual international conference on Computational biology, RECOMB ’02, (New York, NY, USA), pp. 49–57, ACM, 2002.

[71] J. Liu and W. Wang, “Op-cluster: Clustering by tendency in high dimensional space,” in Proceedings of the Third IEEE International Conference on Data Mining, ICDM ’03, (Washington, DC, USA), pp. 187–, IEEE Computer Society, 2003.

[72] J.-W. Chang and D.-S. Jin, “A new cell-based clustering method for large, high-dimensional data in data mining applications,” in Proceedings of the 2002 ACM symposium on Applied computing, SAC ’02, (New York, NY, USA), pp. 503–507, ACM, 2002.

[73] B. Liu, Y. Xia, and P. S. Yu, “Clustering through decision tree construction,” in Proceedings of the ninth international conference on Information and knowledge management, CIKM ’00, (New York, NY, USA), pp. 20–29, ACM, 2000.

[74] E. Müller, S. Günnemann, I. Assent, and T. Seidl, “Evaluating clustering in subspace projections of high dimensional data,” Proc. VLDB Endow., vol. 2, pp. 1270–1281, August 2009.

[75] I. Assent, R. Krieger, E. Müller, and T. Seidl, “Inscy: Indexing subspace clusters with in-process-removal of redundancy,” in Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, (Washington, DC, USA), pp. 719–724, IEEE Computer Society, 2008.

[76] G. Moise and J. Sander, “Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering,” in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’08, (New York, NY, USA), pp. 533–541, ACM, 2008.

[77] G. Moise, A. Zimek, P. Kröger, H.-P. Kriegel, and J. Sander, “Subspace and projected clustering: experimental evaluation and analysis,” Knowl. Inf. Syst., vol. 21, pp. 299–326, November 2009.

[78] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.

[79] K. Y. Yip, P. Qi, M. H. Schultz, D. W. Cheung, and K.-H. Cheung, “Sembiosphere: A semantic web approach to recommending microarray clustering services,” in Pacific Symposium on Biocomputing, pp. 188–199, 2006.

[80] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley-Interscience, 1990.

[81] R. T. Ng and J. Han, “Efficient and effective clustering methods for spatial data mining,” in Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, (San Francisco, CA, USA), pp. 144–155, Morgan Kaufmann Publishers Inc., 1994.

[82] A. B. Ashfaq, M. Javed, S. A. Khayam, and H. Radha, “An information-theoretic combining method for multi-classifier anomaly detection systems,” in IEEE International Conference on Communications, ICC ’10, pp. 1–5, May 2010.

[83] R. Fontugne, P. Borgnat, P. Abry, and K. Fukuda, “Mawilab: combining diverse anomaly detectors for automated anomaly labeling and performance benchmarking,” in Proceedings of the 6th International Conference on emerging Networking EXperiments and Technologies, Co-NEXT ’10, (New York, NY, USA), pp. 1–12, ACM, 2010.

[84] S. Shanbhag and T. Wolf, “Accurate anomaly detection through parallelism,” IEEE Network, vol. 23, pp. 22–28, January 2009.

[85] V. Paxson, “Bro: a system for detecting network intruders in real-time,” Computer Networks, vol. 31, pp. 2435–2463, December 1999.

[86] F. Silveira and C. Diot, “Urca: pulling out anomalies by their root causes,” in Proceedings of the 29th IEEE International Conference on Computer Communications, INFOCOM ’10, (Piscataway, NJ, USA), pp. 722–730, IEEE Press, March 2010.

[87] Y. Zhang, Z. Ge, A. Greenberg, and M. Roughan, “Network anomography,” in Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, IMC ’05, (Berkeley, CA, USA), pp. 30–30, USENIX Association, 2005.

[88] G. Cormode and S. M. Muthukrishnan, “What’s new: finding significant differences in network data streams,” Networking, IEEE/ACM Transactions on, vol. 13, pp. 1219–1232, Dec. 2005.

[89] B. Krishnamurthy, S. Sen, Y. Zhang, and Y. Chen, “Sketch-based change detection: methods, evaluation, and applications,” in Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, IMC ’03, (New York, NY, USA), pp. 234–247, ACM, 2003.

[90] M. Roughan, T. Griffin, M. Mao, A. Greenberg, and B. Freeman, “Combining routing and traffic data for detection of ip forwarding anomalies,” in Proceedings of the joint international conference on Measurement and modeling of computer systems, SIGMETRICS ’04/Performance ’04, (New York, NY, USA), pp. 416–417, ACM, 2004.

[91] A. Soule, K. Salamatian, and N. Taft, “Combining filtering and statistical methods for anomaly detection,” in Proceedings of the 5th ACM SIGCOMM conference on Internet Measurement, IMC ’05, (Berkeley, CA, USA), pp. 331–344, USENIX Association, 2005.

[92] G. Dewaele, K. Fukuda, P. Borgnat, P. Abry, and K. Cho, “Extracting hidden anomalies using sketch and non gaussian multiresolution statistical detection procedures,” in Proceedings of the 2007 Workshop on Large Scale Attack Defense, LSAD ’07, (New York, NY, USA), pp. 145–152, ACM, 2007.

[93] D. Brauckhoff, X. Dimitropoulos, A. Wagner, and K. Salamatian, “Anomaly extraction in backbone networks using association rules,” in Proceedings of the 9th ACM SIGCOMM conference on Internet Measurement Conference, IMC ’09, (New York, NY, USA), pp. 28–34, ACM, 2009.

[94] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules in large databases,” in Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, (San Francisco, CA, USA), pp. 487–499, Morgan Kaufmann Publishers Inc., 1994.

[95] P. Barford, J. Kline, D. Plonka, and A. Ron, “A signal analysis of network traffic anomalies,” in Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurment, IMW ’02, (New York, NY, USA), pp. 71–82, ACM, 2002.

[96] D. Brauckhoff, K. Salamatian, and M. May, “Applying pca for traffic anomaly detection: Problems and solutions,” in Proceedings of the 28th IEEE International Conference on Computer Communications, pp. 2866–2870, April 2009.

[97] Y. Kanda, K. Fukuda, and T. Sugawara, “Evaluation of anomaly detection based on sketch and pca,” in GLOBECOM 2010, 2010 IEEE Global Telecommunications Conference, pp. 1–5, Dec. 2010.

[98] P. Barford and D. Plonka, “Characteristics of network traffic flow anomalies,” in Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, IMW ’01, (New York, NY, USA), pp. 69–73, ACM, 2001.

[99] F. Silveira, C. Diot, N. Taft, and R. Govindan, “Astute: detecting a different class of traffic anomalies,” in Proceedings of the ACM SIGCOMM 2010 conference, SIGCOMM ’10, (New York, NY, USA), pp. 267–278, ACM, 2010.

[100] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes: General theory and structure. Probability and its applications, Springer, 2007.

[101] C. Barakat, P. Thiran, G. Iannaccone, C. Diot, and P. Owezarski, “A flow-based model for internet backbone traffic,” in Proceedings of the 2nd ACM SIGCOMM Workshop on Internet measurment, IMW ’02, (New York, NY, USA), pp. 35–47, ACM, 2002.

[102] N. Hohn, D. Veitch, and P. Abry, “Cluster processes: a natural language for network traffic,” IEEE Transactions on Signal Processing, vol. 51, no. 8, pp. 2229–2244, 2003.

[103] A. Lakhina, M. Crovella, and C. Diot, “Characterization of network-wide anomalies in traffic flows,” in Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, IMC ’04, (New York, NY, USA), pp. 201–206, ACM, 2004.

[104] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” in Proceedings of the 2004 conference on Applications, technologies, architectures, and protocols for computer communications, SIGCOMM ’04, (New York, NY, USA), pp. 219–230, ACM, 2004.

[105] X. Li, F. Bian, M. Crovella, C. Diot, R. Govindan, G. Iannaccone, and A. Lakhina, “Detection and identification of network anomalies using sketch subspaces,” in Proceedings of the 6th ACM SIGCOMM conference on Internet measurement, IMC ’06, (New York, NY, USA), pp. 147–152, ACM, 2006.

[106] P. Chhabra, C. Scott, E. D. Kolaczyk, and M. Crovella, “Distributed spatial anomaly detection,” in Proceedings of the 27th IEEE International Conference on Computer Communications, pp. 1705–1713, April 2008.

[107] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-H. Lau, N. Taft, and J. D. Tygar, “Evading anomaly detection through variance injection attacks on pca,” in Proceedings of the 11th international symposium on Recent Advances in Intrusion Detection, RAID ’08, (Berlin, Heidelberg), pp. 394–395, Springer-Verlag, 2008.

[108] B. I. P. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-H. Lau, N. Taft, and J. D. Tygar, “Compromising pca-based anomaly detectors for network-wide traffic,” Tech. Rep. UCB/EECS-2008-73, EECS Department, University of California, Berkeley, May 2008.

[109] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-H. Lau, S. Rao, N. Taft, and J. D. Tygar, “Antidote: understanding and defending against poisoning of anomaly detectors,” in Proceedings of the 9th ACM SIGCOMM conference on Internet measurement conference, IMC ’09, (New York, NY, USA), pp. 1–14, ACM, 2009.

[110] A. Abdelkefi, Y. Jiang, W. Wang, A. Aslebo, and O. Kvittem, “Robust traffic anomaly detection with principal component pursuit,” in Proceedings of the ACM CoNEXT Student Workshop, CoNEXT ’10 Student Workshop, (New York, NY, USA), pp. 10:1–10:2, ACM, 2010.

[111] R. Sadoddin and A. A. Ghorbani, “A comparative study of unsupervised machine learning and data mining techniques for intrusion detection,” in Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition, MLDM ’07, (Berlin, Heidelberg), pp. 404–418, Springer-Verlag, 2007.

[112] J. Lei and A. Ghorbani, “Network intrusion detection using an improved competitive learning neural network,” in Communication Networks and Services Research, 2004. Proceedings. Second Annual Conference on, pp. 190–197, May 2004.

[113] P. Gogoi, D. Bhattacharyya, B. Borah, and J. K. Kalita, “A survey of outlier detection methods in network anomaly identification,” Computer Journal, vol. 54, pp. 570–588, April 2011.

[114] J. Zhang and M. Zulkernine, “Anomaly based network intrusion detection with unsupervised outlier detection,” in Communications, 2006. ICC ’06. IEEE International Conference on, vol. 5, pp. 2388–2393, June 2006.

[115] KDD99, “KDD Cup 1999 dataset,” 1999. http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

[116] Y. Yasami and S. P. Mozaffari, “A novel unsupervised classification approach for network anomaly detection by k-means clustering and id3 methods,” J. Supercomput., vol. 53, pp. 231–245, July 2010.

[117] L. Zi, J. Yearwood, and X.-W. Wu, “Adaptive clustering with feature ranking for ddos attacks detection,” in Proceedings of the 2010 Fourth International Conference on Network and System Security, NSS ’10, (Washington, DC, USA), pp. 281–286, IEEE Computer Society, 2010.

[118] L. Kuang and M. Zulkernine, “An anomaly intrusion detection method using the csi-knn algorithm,” in Proceedings of the 2008 ACM symposium on Applied computing, SAC ’08, (New York, NY, USA), pp. 921–926, ACM, 2008.

[119] X. Li and N. Ye, “Grid- and dummy-cluster-based learning of normal and intrusive clusters for computer intrusion detection,” Quality and Reliability Engineering International, vol. 18, no. 3, pp. 231–242, 2002.

[120] N. Ye and X. Li, “A scalable clustering technique for intrusion signature recognition,” in Proceedings of the 2001 IEEE Workshop on Information Assurance and Security, pp. 1–4, 2001.

[121] K. Burbeck and S. Nadjm-Tehrani, “Adwice - anomaly detection with real-time incremental clustering,” in Proceedings of the 7th International Conference on Information Security and Cryptology, Seoul, Korea, Springer Verlag, 2004.

[122] T. Zhang, R. Ramakrishnan, and M. Livny, “Birch: an efficient data clustering method for very large databases,” in Proceedings of the 1996 ACM SIGMOD international conference on Management of data, SIGMOD ’96, (New York, NY, USA), pp. 103–114, ACM, 1996.

[123] D. Barbará, J. Couto, S. Jajodia, L. Popyack, and N. Wu, “Adam: Detecting intrusions by data mining,” in Proceedings of the IEEE Workshop on Information Assurance and Security, pp. 11–16, 2001.

[124] E. Eskin, A. Arnold, M. Prerau, L. Portnoy, and S. Stolfo, “A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data,” in Applications of Data Mining in Computer Security, Kluwer, 2002.

[125] K. Leung and C. Leckie, “Unsupervised anomaly detection in network intrusion detection using clusters,” in Proceedings of the Twenty-eighth Australasian conference on Computer Science - Volume 38, ACSC ’05, (Darlinghurst, Australia), pp. 333–342, Australian Computer Society, Inc., 2005.

[126] J. Oldmeadow, S. Ravinutala, and C. Leckie, “Adaptive clustering for network intrusion detection,” in Advances in Knowledge Discovery and Data Mining (H. Dai, R. Srikant, and C. Zhang, eds.), vol. 3056 of Lecture Notes in Computer Science, pp. 255–259, Springer Berlin / Heidelberg, 2004. doi:10.1007/978-3-540-24775-3_33.

[127] P. Casas, J. Mazel, and P. Owezarski, “Steps towards autonomous network security: Unsupervised detection of network attacks,” in New Technologies, Mobility and Security (NTMS), 2011 4th IFIP International Conference on, pp. 1–5, Feb. 2011.

[128] P. Casas, J. Mazel, and P. Owezarski, “Unada: unsupervised network anomaly detection using sub-space outliers ranking,” in Proceedings of the 10th international IFIP TC 6 conference on Networking - Volume Part I, NETWORKING ’11, (Berlin, Heidelberg), pp. 40–51, Springer-Verlag, 2011.

[129] P. Casas, J. Mazel, and P. Owezarski, “Knowledge-independent traffic monitoring: Unsupervised detection of network attacks,” IEEE Network, special issue on network traffic monitoring and analysis, May/June 2012. Accepted for publication.

[130] J. Mazel, P. Casas, and P. Owezarski, “Sub-space clustering & evidence accumulation for unsupervised network anomaly detection,” in Proceedings of the Traffic Monitoring and Analysis Workshop, TMA ’11, April 2011.

[131] J. Mazel, P. Casas, Y. Labit, and P. Owezarski, “Sub-space clustering, inter-clustering results association & anomaly correlation for unsupervised network anomaly detection,” in 7th International Conference on Network and Service Management (CNSM 2011), CNSM ’11, October 2011.

[132] C. Barakat, A. Krifa, Y. Labit, I. Lassoued, J. Mazel, P. Owezarski, and K. Salamatian, “Deliverable D3.2: implementation of adaptive traffic sampling and management, path performance monitoring and management, and cooperative intrusion and attack/anomaly detection techniques,” tech. rep., October 2009.

[133] A. Krifa, I. Lassoued, C. Barakat, P. Casas Hernandez, P. Owezarski, J. Mazel, and Y. Labit, “Deliverable D3.3: experimental evaluation of T01,” tech. rep., November 2010.

[134] E. Achtert, H.-P. Kriegel, and A. Zimek, “Elki: A software system for evaluation of subspace clustering algorithms,” in Scientific and Statistical Database Management (B. Ludäscher and N. Mamoulis, eds.), vol. 5069 of Lecture Notes in Computer Science, pp. 580–585, Springer Berlin / Heidelberg, 2008. doi:10.1007/978-3-540-69497-7_41.

[135] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The weka data mining software: an update,” SIGKDD Explor. Newsl., vol. 11, pp. 10–18, November 2009.

[136] Y. Minsky, “Ocaml for the masses,” Queue, vol. 9, pp. 43:40–43:49, September 2011.

[137] A. Strehl and J. Ghosh, “Cluster ensembles — a knowledge reuse framework for combining multiple partitions,” Journal of Machine Learning Research, vol. 3, pp. 583–617, March 2003.

[138] A. K. Jain, “Data clustering: 50 years beyond k-means,” Pattern Recognition Let- ters, vol. 31, no. 8, pp. 651–666, 2010.

[139] A. L. N. Fred and A. K. Jain, “Combining multiple clusterings using evidence accumulation,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, pp. 835–850, June 2005.

[140] I. M. Bomze, M. Budinich, P. M. Pardalos, and M. Pelillo, “The maximum clique problem,” in Handbook of Combinatorial Optimization (supp. Vol. A), pp. 1–74, Kluwer Academic Publishers, 1999.

[141] J.-C. Filliâtre and K. Kalyanasundaram, “Functory: A distributed computing library for objective caml,” in Trends in Functional Programming, (Madrid, Spain), May 2011.

[142] P. Owezarski and G. Fernandes, “Automated classification of network traffic anomalies,” in Proceedings of the 5th International ICST Conference on Security and Privacy in Communication Networks, SecureComm ’09, pp. 91–100, 2009.

[143] R. Fontugne and K. Fukuda, “A hough-transform-based anomaly detector with an adaptive time interval,” in Proceedings of the 2011 ACM Symposium on Applied Computing, SAC ’11, (New York, NY, USA), pp. 471–477, ACM, 2011.

[144] J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, pp. 81–106, March 1986.

[145] J. R. Quinlan, C4.5: programs for machine learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1993.

[146] S. Hansman and R. Hunt, “A taxonomy of network and computer attacks,” Computers and Security, vol. 24, no. 1, pp. 31–43, 2005.

[147] T. S. Jaakkola and D. Haussler, “Exploiting generative models in discriminative classifiers,” in Proceedings of the 1998 conference on Advances in neural information processing systems II, (Cambridge, MA, USA), pp. 487–493, MIT Press, 1999.
