Unsupervised Network Anomaly Detection Johan Mazel
To cite this version: Johan Mazel. Unsupervised network anomaly detection. Networking and Internet Architecture [cs.NI]. INSA de Toulouse, 2011. English. tel-00667654

HAL Id: tel-00667654
https://tel.archives-ouvertes.fr/tel-00667654
Submitted on 8 Feb 2012

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

THESIS
Submitted for the degree of DOCTORATE OF THE UNIVERSITY OF TOULOUSE
Awarded by: Institut National des Sciences Appliquées de Toulouse (INSA de Toulouse)
Presented and defended by: Johan Mazel, on Monday 19 December 2011
Title: Unsupervised network anomaly detection
ED MITT: STIC domain: Networks, Telecoms, Systems and Architecture
Research unit: LAAS-CNRS
Thesis advisors: Philippe Owezarski, Yann Labit
Reviewers: Olivier Festor, Guy Leduc, Kensuke Fukuda
Other jury members: Christophe Chassot, Sandrine Vaton

Acknowledgements

I would first like to thank my PhD advisors for their help and support, and for letting me steer my research in my own direction. Their input and comments throughout the stages of this thesis have been highly valuable. I especially want to thank them for having been able to dive into my work without drowning and then provide me with useful remarks.

I extend special thanks to Pedro Casas-Hernandez, who worked with me on the ECODE project at LAAS-CNRS. Working with him for a year and a half has been a real pleasure.
I think that the extent of this thesis's contributions would not have been the same without Pedro's work.

I extend special thanks to Ion Alberdi. He introduced me to functional programming through the OCaml language and pointed me toward new and exciting territories in programming and computer science in general. Our discussions have been among the most valuable I have ever had.

I also want to thank all the people working at LAAS-CNRS for the help they provided me. I especially want to thank David Gauchard and Christian Berty for their exceptional availability and responsiveness.

Finally, I want to thank my family, friends and colleagues at LAAS-CNRS. I won't name names, but be assured that I am truly grateful for your support during these three and a half years.

Abstract

Anomaly detection has become a vital component of any network in today's Internet. Ranging from non-malicious unexpected events such as flash crowds and failures to network attacks such as denials of service and network scans, network traffic anomalies can have serious detrimental effects on the performance and integrity of the network. The continuous emergence of new anomalies and attacks creates a continuous challenge to cope with events that put network integrity at risk. Moreover, the inherently polymorphic nature of traffic, caused among other things by a rapidly changing protocol landscape, complicates the task of anomaly detection systems. In fact, most network anomaly detection systems proposed so far employ knowledge-dependent techniques, using either signature-based misuse detection methods or anomaly detection relying on supervised learning techniques.
However, both approaches present major limitations: the former fails to detect and characterize unknown anomalies (leaving the network unprotected for long periods), and the latter requires training over labeled normal traffic, which is a difficult and expensive stage that needs to be updated on a regular basis to follow network traffic evolution. Such limitations impose a serious bottleneck on solving the problem presented above.

We introduce an unsupervised approach to detect and characterize network anomalies without relying on signatures, statistical training, or labeled traffic, which represents a significant step towards the autonomy of networks. Unsupervised detection is accomplished by means of robust data-clustering techniques, combining Sub-Space clustering with Evidence Accumulation or Inter-Clustering Results Association, to blindly identify anomalies in traffic flows. The results of several unsupervised detections are also correlated to improve detection robustness. The correlation results are further used along with other anomaly characteristics to build an anomaly hierarchy in terms of dangerousness. Characterization is then achieved by building efficient filtering rules to describe a detected anomaly. The detection and characterization performance and the sensitivity to parameters are evaluated over a substantial subset of the MAWI repository, which contains real network traffic traces.

Our work shows that unsupervised learning techniques allow anomaly detection systems to isolate anomalous traffic without any previous knowledge. We think that this contribution constitutes a great step towards autonomous network anomaly detection.

This PhD thesis has been funded through the ECODE project by the European Commission under the Framework Programme 7.
The goal of this project is to develop, implement, and experimentally validate a cognitive routing system that meets the challenges experienced by the Internet in terms of manageability and security, availability and accountability, as well as routing system scalability and quality. The use case concerned inside the ECODE project is network anomaly detection.

Keywords: anomaly detection, network traffic, unsupervised learning

Summary

Anomaly detection is a critical task of network administration. The continuous appearance of new anomalies and the changing nature of network traffic make anomaly detection harder. Existing anomaly detection methods rely on prior knowledge of the traffic: either through signatures created from known anomalies, or through a normality profile. Both approaches are limited: the first cannot detect new anomalies, and the second requires its normality profile to be constantly updated. These two aspects significantly limit the effectiveness of existing detection methods.

We present an unsupervised approach that detects and characterizes network anomalies autonomously. Our approach uses clustering techniques to identify anomalous flows. We also propose several techniques for processing the extracted anomalies to ease the task of operators. We evaluate the performance of our system on real traffic traces from the MAWI trace repository. The results obtained highlight the possibility of building autonomous anomaly detection systems that operate without prior knowledge.

Contents

1 Introduction
  1.1 Motivating problems
    1.1.1 Changing traffic in an evolving Internet
      1.1.1.1 Changing traffic
      1.1.1.2 Mutating anomalies
    1.1.2 Weaknesses of current network anomaly detection approaches
    1.1.3 The need for an autonomous anomaly detection
    1.1.4 Summary
  1.2 Contributions
    1.2.1 Unsupervised anomaly detection
    1.2.2 Anomaly post-processing
    1.2.3 Summary
  1.3 Summary
  1.4 Context inside the ECODE project
    1.4.1 Overview
    1.4.2 Use cases
    1.4.3 Proposed architecture
    1.4.4 Summary
  1.5 Dissertation outline
2 Related work
  2.1 Unsupervised machine learning
    2.1.1 Blind signal separation
    2.1.2 Neural networks
    2.1.3 Clustering algorithms
      2.1.3.1 General purpose clustering algorithms
      2.1.3.2 Clustering data in high-dimensional space
      2.1.3.3 Clustering evaluations
  2.2 Network anomaly detection and intrusion detection
    2.2.1 Classic network anomaly and intrusion detection
      2.2.1.1 Misuse detection-based intrusion and network anomaly detection
      2.2.1.2 Anomaly detection-based network anomaly detection
    2.2.2 Machine learning applied to network anomaly detection and intrusion detection
      2.2.2.1 Misuse and anomaly-based supervised network anomaly detection
      2.2.2.2 Supervised intrusion detection
      2.2.2.3 Unsupervised intrusion detection
  2.3 Summary
3 Unsupervised network anomaly detection
  3.1 Change-Detection, Multi-resolution Flow Aggregation & Attribute computation
    3.1.1 Change detection
    3.1.2 Multi-resolution Flow Aggregation
    3.1.3 Attribute building
      3.1.3.1 Metric definition
      3.1.3.2 Metric building
      3.1.3.3 Attributes generation over packet sets
    3.1.4 Summary
  3.2 Finding anomalies with unsupervised