End-to-End Anomaly Detection in Stream Data

by Zahra Zohrevand

M.Sc., Sharif University of Technology
B.Sc., Bu-Ali Sina University

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

in the School of Computing Science
Faculty of Applied Sciences

© Zahra Zohrevand 2021 SIMON FRASER UNIVERSITY Spring 2021

Copyright in this work rests with the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.

Declaration of Committee

Name: Zahra Zohrevand

Degree: Doctor of Philosophy

Thesis title: End-to-End Anomaly Detection in Stream Data

Committee:

Chair: Diana Cukierman, University Lecturer, Computing Science

Uwe Glässer, Supervisor, Professor, Computing Science

Fred Popowich, Committee Member, Professor, Computing Science

Martin Ester, Examiner, Professor, Computing Science

David Skillicorn, External Examiner, Professor, School of Computing, Queen's University

Abstract

Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time for enhanced efficiency and quality of service delivery, as well as upgraded safety and security in private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black-boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing users to the most anomalous/malicious areas of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that leads to high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify observations and post-prune unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results based on concepts understandable by a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity and health.

Keywords: time series; anomaly detection; time series anomaly detection; stream data; explainable AI; cybersecurity

Dedication

In loving memory of my father

To my mother, for her everlasting love and support

To my love, Mehdi

Quotation

"The more I read, the more I acquire, the more certain I am that I know nothing."

—Voltaire

Acknowledgements

First and foremost, I offer my sincerest gratitude to my senior supervisor, Dr. Uwe Glässer, though words are powerless to express my appreciation for his inspiring enthusiasm, patience, and continuous support throughout my PhD life with his supervision, understanding, and resourcefulness. I am wholeheartedly grateful for all his invaluable contributions of time and insight to make my research productive, stimulating, and exciting. Not only did I learn a lot in my research from him, but he also taught me invaluable life lessons.

I would like to sincerely thank Dr. Fred Popowich for his precious advice, guidance, and efforts on enriching various technical and practical aspects of this research. I would also like to thank Dr. David Skillicorn and Dr. Martin Ester for their thorough examination of this thesis and their suggestions, corrections, and remarks.

I am forever indebted to all my co-authors, Dr. Uwe Glässer, Dr. Hamed Yaghoubi Shahir, Dr. Mohammad Ali Tayebi, Mehdi Shirmaleki, Ali Arab, Mehrdad Mansouri, Mahsa Keramati, Amir Yaghoubi Shahir, Saeed Arasteh, and Tilemachos Charalampous. My friends at Simon Fraser University made my studies more pleasant and enjoyable. Especially, I am grateful to Maryam, Sima, Nazanin, Mahsa, Shadab, Sedigheh, Golnaz, Hossein, Hamed, Payam, Hamid, and Kiarash.

Heartfelt thanks go to my parents and my wonderful brothers. I appreciate their endless love and support that made all these years a pleasant chapter of my life. My beloved family are always my sense of safety and calmness. A special thanks goes to my mother, a constant source of support and encouragement. Mum, you made an untold number of sacrifices for me; thank you for being so loving, caring, and supportive. Last but not least, I am immensely grateful to my spouse, Mehdi, for all his love, patience, and kindness. His determination, confidence, and courage were inspiring for me, and I could not have completed this work without his unfaltering love and support.

I would like to acknowledge the administrative and technical staff in the School of Computing Science for making this school a productive environment. I also wish to thank the Natural Sciences and Engineering Research Council of Canada (NSERC) and MDA Systems Ltd. for their financial support in the course of this research.

Table of Contents

Declaration of Committee ii

Abstract iii

Dedication iv

Quotation v

Acknowledgements vi

Table of Contents vii

List of Tables xi

List of Figures xii

1 Introduction 1
1.1 Motivations and Research Objectives ...... 2
1.2 Applications ...... 4
1.2.1 Cybersecurity application ...... 4
1.2.2 Generic applications ...... 5
1.3 Thesis Contributions ...... 7

2 Background 10
2.1 Time Series ...... 10
2.2 Anomaly Definition ...... 11
2.3 Anomaly Detection ...... 12
2.4 Challenges in Anomaly Detection ...... 13
2.4.1 Masking effect ...... 15
2.4.2 Swamping effect ...... 15
2.4.3 Variable frequency of anomalies ...... 16
2.4.4 Curse of dimensionality ...... 16
2.4.5 Lag of emergence ...... 17
2.4.6 Domain specific criteria ...... 17

2.5 Anomaly Detection Approaches ...... 18
2.5.1 Model-based and data-driven categorization ...... 18
2.5.2 Technique-based categorization ...... 18
2.5.3 Strategy-based categorization of anomaly detection techniques ...... 20
2.5.4 Anomalous subsequences in a test time series ...... 24
2.5.5 Anomalies within a given time series ...... 25
2.6 Advanced Temporal Anomaly Detection Techniques ...... 26
2.6.1 Deep learning based classification ...... 26
2.6.2 Deep learning based forecasting ...... 28
2.6.3 Deep anomaly detection ...... 29
2.6.4 for anomaly detection ...... 29
2.6.5 Hybrid methods ...... 30
2.7 Summary ...... 30

3 An Anomaly Detection Algorithm 34
3.1 Problem definition ...... 35
3.2 Background ...... 35
3.2.1 Anomaly detection in supervisory control systems ...... 35
3.2.2 HMM-based anomaly detection ...... 36
3.3 Addressed Challenges ...... 37
3.4 Proposed Method ...... 38
3.4.1 Behavior tracker ...... 39
3.4.2 Fine-grained anomaly tracker ...... 40
3.5 Dataset Characteristics ...... 43
3.6 Experimental Evaluation ...... 46
3.6.1 Experiments setup ...... 46
3.6.2 Evaluation of HMM-based models ...... 48
3.6.3 Baseline methods ...... 49
3.7 Conclusions ...... 51

4 A Novel Behavior Modeling Algorithm 53
4.1 Problem definition ...... 54
4.2 Background ...... 54
4.3 Addressed Challenges ...... 55
4.4 Proposed method ...... 56
4.4.1 Pre-learning ...... 56
4.4.2 Individual learning ...... 59
4.4.3 Consensus phase ...... 61
4.5 Experimental Evaluation ...... 61
4.5.1 Experimental setup ...... 61

4.5.2 Evaluated methods ...... 63
4.5.3 Experimental results ...... 64
4.5.4 Feature extraction findings ...... 67
4.6 Conclusions ...... 69

5 Dynamic Attack Scoring Using Distributed Local Detectors 71
5.1 Problem definition ...... 72
5.2 Background ...... 73
5.3 Addressed Challenges ...... 74
5.4 Proposed Method ...... 75
5.4.1 Behavior predictor ...... 76
5.4.2 Inference engine ...... 77
5.5 Data Characteristics ...... 80
5.5.1 Dataset Exploration ...... 81
5.6 Experiments ...... 86
5.6.1 Behavior predictor evaluation ...... 87
5.6.2 Log-based evaluation ...... 87
5.6.3 Event-based evaluation ...... 88
5.6.4 Discussion ...... 89
5.7 Conclusions ...... 90

6 Anomaly Detection Explanation 91
6.1 Introduction ...... 91
6.2 Background ...... 93
6.3 Dynamic Temporal Anomaly Detection Explainer ...... 96
6.3.1 Problem definition ...... 96
6.3.2 Addressed Challenges ...... 97
6.3.3 Interpretation approach ...... 97
6.3.4 Experimental Evaluation ...... 105
6.4 An Interpretable Variational to Find Cyberattacks ...... 111
6.4.1 Problem Definition ...... 111
6.4.2 Method Description ...... 112
6.4.3 Experimental results and Evaluations ...... 114
6.5 Heidegger: Interpretable Temporal Causal Discovery ...... 117
6.5.1 Problem Definition ...... 117
6.5.2 Addressed Challenges ...... 119
6.5.3 Method Description ...... 119
6.6 Conclusions ...... 121

7 Spatiotemporal Situational Awareness 122

7.1 Background ...... 122
7.1.1 Spatiotemporal Anomaly Detection ...... 122
7.1.2 Spatiotemporal Anomaly Tracking ...... 123
7.1.3 Trajectory Anomaly Detection ...... 124
7.2 Mining Vessel Trajectories for Illegal Fishing Detection ...... 125
7.2.1 Proposed framework ...... 126
7.2.2 Experimental results ...... 128
7.3 Fishing Vessels Activity Detection from Longitudinal AIS Data ...... 129
7.3.1 Proposed framework ...... 130
7.3.2 Experimental results ...... 134
7.4 Conclusions ...... 135

8 Lessons Learned and Future Work 137
8.1 Lessons Learned ...... 137
8.2 Future Work ...... 138

Bibliography 140

List of Tables

Table 2.1 Anomaly detection challenges ...... 14
Table 2.2 Table of summary - Anomaly detection in a time series database ...... 32
Table 2.3 Table of summary - Anomaly detection in a time series ...... 33

Table 3.1 Performance of variations of HMM using anomalous samples ...... 49
Table 3.2 Performance of variations of HMM using noisy test data ...... 50
Table 3.3 Performance of baseline methods ...... 50
Table 3.4 Performance of AnomalyTracker for different anomaly categories ...... 51

Table 4.1 MBPF variations and the evaluated baseline methods performance in hourly forecast ...... 66
Table 4.2 MBPF variations and the evaluated baseline methods performance in daily forecast ...... 66
Table 4.3 Comparison of denoising ...... 67
Table 4.4 VIF of water supply system dataset ...... 69

Table 5.1 Attack types and their target points ...... 81
Table 5.2 Behavior Predictor performance for standard scaled values ...... 87
Table 5.3 The log-based performance of AnomalyTracker ...... 88
Table 5.4 Event-based performance of AnomalyTracker ...... 89

Table 6.1 The performance evaluation of our method ...... 115

List of Figures

Figure 1.1 Schema of an end-to-end anomaly detection framework ...... 2
Figure 1.2 Cyber Situation Analysis Framework ...... 6

Figure 2.1 Example of linear and nonlinear TS ...... 10
Figure 2.2 A simple example of Masking and Swamping effects ...... 15
Figure 2.3 An example of rarity assumption based masking ...... 16
Figure 2.4 FSA-based anomaly detection (adapted from [52]) ...... 22
Figure 2.5 Window-based anomaly detection (adapted from [122]) ...... 23
Figure 2.6 Overall architecture of MCNN (adapted from [62]) ...... 27
Figure 2.7 A stack of dilated causal convolutional layers (adapted from [274]) ...... 29

Figure 3.1 Control Flow between Analytic Components ...... 39
Figure 3.2 Drawbacks of Log-likelihood and Mismatch Counting in HMM-based anomaly detection ...... 41
Figure 3.3 (a) Water level for the year of 2013; (b) Water level for the days of weeks of July 2013 ...... 43
Figure 3.4 (a) Water level for four different week days of the month July 2013; (b) Water level for four different Fridays of the month July 2013; (c) Water level for four different weekends of the month July 2013 ...... 44
Figure 3.5 (a) Opposite trend in comparison of water level of months in different years; (b) Opposite trends in comparison of water level of seasons in different years ...... 45
Figure 3.6 Time difference with respect to the water level in a sample day (July 25th) ...... 46
Figure 3.7 (a) Time difference of data points for the years 2011-2014; (b) Time difference of data points for the year 2013 ...... 47

Figure 4.1 Structural architecture of the MBPF framework ...... 57
Figure 4.2 An example of fork component data augmentation in three levels of granularity ...... 58
Figure 4.3 Structure of each LSTM neuron ...... 60
Figure 4.4 Comparison of train and test on MAE ...... 65
Figure 4.5 Time series and its decomposition components ...... 67

Figure 4.6 (a) Transformed version of a time series; (b) Quantile-quantile normal plot of transformed version of time series; (c) Refined time series after removing ...... 68
Figure 4.7 Representation of denoising autoencoder on one sample of daily test data ...... 69

Figure 5.1 AttackTracker framework architecture ...... 76
Figure 5.2 The difference of scoring in the 3 and 4 local detectors ...... 80
Figure 5.3 (a) KS test result on scaled feature values of train vs. validation sets; (b) KS test result on scaled first order difference of train and validation sets ...... 82
Figure 5.4 (a) PCA clustering on scaled feature values of train vs validation; (b) PCA clustering on train vs normal cases in test set ...... 83
Figure 5.5 Feature AIT 201 in train and test set with red subsections as attack ...... 84
Figure 5.6 Delayed cascaded attack impact in the subsequent stages ...... 84
Figure 5.7 Real examples of recovery interval ...... 85
Figure 5.8 A t-SNE of normal and attack observations with attacks highlighted in red ...... 86
Figure 5.9 An example of a logical detector vs. a fluky detector ...... 87

Figure 6.1 TADExp Framework ...... 98
Figure 6.2 A representation of upsampling using step-wise bootstrapping ...... 103
Figure 6.3 Average absolute correlation of the provided explanation with the BB's results with respect to (a) the length of anomalous patterns (l); (b) the injected anomaly scale ratio (r) ...... 106
Figure 6.4 TADExp's robustness in presence of noisy features ...... 107
Figure 6.5 (a) The actual test signals; (b) Predicted signals by LSTM model; (c) Prediction squared error; (d) The final predicted anomaly status ...... 108
Figure 6.6 Temporal reference cases of T ...... 109
Figure 6.7 A running example of the isolation evaluation process in TADExp ...... 110
Figure 6.8 Schema of the proposed method ...... 112
Figure 6.9 (a) Interpretation performance for top-10 explained features; (b) Interpretations' robustness in presence of noisy features ...... 116
Figure 6.10 HEIDEGGER Framework: in each step of the graph search, the most statistically significant of the neighbors (green) of the current node (blue) is selected as the current node of the next step. If the current node is more significant than its neighbors (yellow), the run stops. For each node, to compute the significance level, the QED process is applied [197] ...... 119

Figure 7.1 (a) Heatmap of Going Dark Area Based on Received AIS Reports; (b) Heatmap of Fishing Density Based on Revised AIS Reports ...... 129
Figure 7.2 FishNET: Fishing activity detection framework ...... 131
Figure 7.3 Vessel motion: six degrees of freedom ...... 132

Figure 7.4 Temporal distribution of fishing effort ...... 135

Chapter 1

Introduction

Life did not start today and will not end right away. We know our history and take advantage of information collected over time in our current decisions and future resolutions. The same approach applies in the data analysis context, especially in the big data era, which provides easy access to massive volumes of data from demographics, social networks, heterogeneous sensor networks, climate data, and so on. Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time. Sets of observations that have been recorded in an orderly fashion and are correlated in time are called time series (TS) [30]. Time series analysis (TSA) comprises methods for analyzing time series data to extract meaningful knowledge and data characteristics that can be useful to track the targeted system's behavior and its evolution over time [126]. Keeping the end-users involved in this continuous analysis and monitoring process is time-consuming, expensive, and almost impossible. However, being aware of the changes and the events of interest is pivotal; this mission can be fulfilled utilizing anomaly detection (AD) in time series data, which examines anomalous behaviors across time [122]. Time series anomaly detection is essential and attracting significant interest in various event-sensitive scenarios such as causal disease discovery, robotic system monitoring, heterogeneous networks, and data center security [47,152,300]. For example, in the modern world of digital telecommunications, not only are our communications and information distributed through computer networks, but electric power grids, transportation systems, and public water utilities, like other critical infrastructure, also routinely rely on automated control through network connections for their continuous operation. However, they are vulnerable to attack and are increasingly targeted with the aim of exposing, altering, or stealing information. Autonomous monitoring and tracking of anomalous behaviors is thus crucial for situational awareness and detecting suspicious events [10, 152, 292, 300]. Temporal anomaly detection is one of the primary approaches that can contribute to finding and distinguishing anomalies in the trace of observations. While the general AD problem has a rich history within the statistical and machine learning communities [35, 205], and a variety of stand-alone as well as ensemble methods have been proposed to handle this problem [55, 76], it is still one of the pivotal topics in the context of TS analysis [287]. A unique quality of temporal data differentiating it from other data studied in the classical AD literature is the presence of dependencies among measurements induced by the temporal dimension. Further, the observed system behavior is usually stochastic, making modeling and anomaly detection a very tricky task and thus often requiring customized methods.

1.1 Motivations and Research Objectives

As aforesaid, the nature of large-scale systems, like critical infrastructure, demands situational awareness and understanding on the part of the decision-maker. Therefore, as illustrated in Figure 1.1, this study mainly focuses on providing awareness through an end-to-end framework of anomaly/event detection in stream data. Also, most practical and real-world applications of stream data lack supervised information about the anomalies (e.g., how many there are, where they are). Thus, approaches that require data labels, such as supervised classification-based methods, are naturally inappropriate for continuous learning and real-time anomaly detection [8]. Therefore, herein, we assume that the training data (or a majority of it) is normal and the AD model is supposed to discover future anomalies. The following main steps are required to realize our targeted situational awareness framework. In this process, we assume the system behavior can be represented using a multivariate time series X_T.

Figure 1.1: Schema of an end-to-end anomaly detection framework

Online behavior predictor. In the first place, we need to learn the knowns and normal behavior. Applying a predictor model [98,196] capable of capturing the latent complex patterns within the data will unquestionably contribute to higher quality anomaly detection methods. However, identifying hidden patterns and selecting an appropriate model that fits the historical observations well and also carries over to unobserved data is not a trivial task. Also, in critical applications, a model is required to process millions, or even billions, of time series data points in near real time. So, efficiency is one of the major prerequisites for online AD models. Traditional statistical models [233] can easily be adopted online, but their accuracy is not sufficient for industrial applications. Even though models with large time complexity are good at accuracy, they are often impractical in online stream data analysis. Thus, devising an efficient TS model which leads to high average accuracy and small error deviation is one of the key objectives of this research.

⇒ An online behavior predictor is required to predict the next observations of a specific stochastic time series:

TSF(x_{t−w}, ..., x_t) = x̂_{t+1}
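To make this interface concrete, the following minimal sketch (an illustrative assumption, not the predictor proposed in this thesis) shows an online one-step-ahead forecaster that maintains a sliding window of the last w multivariate observations and produces x̂_{t+1}; an exponentially weighted average stands in for a learned model such as an LSTM, and the class name, window length, and decay factor are made up for the example.

```python
from collections import deque
import numpy as np

class OnlineBehaviorPredictor:
    """Minimal sliding-window forecaster: predicts x_{t+1} from (x_{t-w}, ..., x_t).

    The exponentially weighted average is only a placeholder for a learned
    model (e.g., an LSTM or a statistical forecaster)."""

    def __init__(self, window: int = 24, decay: float = 0.8):
        self.window = deque(maxlen=window)   # keeps the last w observations
        self.decay = decay

    def update(self, x_t) -> None:
        """Append the newest observation to the sliding window."""
        self.window.append(np.asarray(x_t, dtype=float))

    def predict_next(self) -> np.ndarray:
        """Return x̂_{t+1}, a weighted average favouring recent observations."""
        history = np.stack(self.window)                       # shape: (w, m)
        weights = self.decay ** np.arange(len(history))[::-1]
        return (weights[:, None] * history).sum(axis=0) / weights.sum()

# usage: feed the stream one multivariate observation at a time
predictor = OnlineBehaviorPredictor(window=24)
for x_t in np.random.rand(100, 3):        # stand-in for a 3-dimensional stream
    predictor.update(x_t)
x_next = predictor.predict_next()
```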

Online anomaly detection. AD is beyond modelling normal behavior. Increasingly diverse data sources and associated stochastic processes make this vitally important research topic more complex than ever. Further, the restrictive closed-world assumption [284] and the fundamental issues AD faces (e.g., clustered anomalies, masking, swamping) can mislead the model, generating a high false alarm rate [260]. Conversely, due to the base-rate fallacy [221,260], a small false-alarm rate on the train set can lead to significant false-alarm rates during monitoring of real systems, occasionally large enough to make the model futile. Even a seemingly very small false-alarm rate can mean an overwhelming number of false-positive cases, especially for complex and critical systems. This is one of the notorious challenges that even state-of-the-art AI-based anomaly detection systems have not overcome, which may lead to serious short-term and long-term consequences. Because of the factors described, I target the false-alarm mitigation problem in an end-to-end framework and propose a higher-quality anomaly scoring technique that utilizes the related contexts to reclassify observations and post-prune unjustified events.

⇒ An online anomaly detection framework is needed to spot suspicious anomalous events:

AD(x_{t+1} | X_T) → [0, ∞)
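For intuition only, the following hedged sketch shows one way such a score can be derived from a behavior predictor's output: the scale-normalized prediction error is mapped to a non-negative value, and a threshold calibrated on normal data decides whether to raise an alarm. The scaling vector and threshold are assumed values for the example, not parameters from this thesis.

```python
import numpy as np

def anomaly_score(x_observed, x_predicted, scale) -> float:
    """Map a new observation to a non-negative anomaly score in [0, inf).

    An illustrative choice: mean squared prediction error, normalised by a
    per-feature scale (e.g., the standard deviation of past residuals), so the
    score is comparable across features of different magnitudes."""
    residual = (np.asarray(x_observed) - np.asarray(x_predicted)) / np.asarray(scale)
    return float(np.mean(residual ** 2))

# usage: flag the observation if its score exceeds a calibrated threshold
scale = np.array([0.5, 1.0, 2.0])          # assumed residual std per feature
score = anomaly_score([1.2, 0.4, 3.1], [1.0, 0.5, 2.9], scale)
is_suspicious = score > 3.0                # threshold tuned on normal data
```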

Online anomaly explanation. An AD process should be transparent, ethical, trustable, and verifiable. To this end, the model needs to provide the driving factors leading to the generated results based on concepts understandable by humans. Providing insights which enhance plausibility will keep users involved and let them trust a model's prediction. This especially matters when users should take action but are unsure of the associated risk. For example, it is not easy to interrupt or shut down a critical control system just because of some unknown and enigmatic risks based on vague threats predicted by normally accurate AI models, while it sometimes takes months or even years before organizations determine the exact causes behind cyberattacks. Most advanced AI methods, like deep neural networks, extract complex patterns without any explicit programming, turning them into intractable and uninterpretable black-boxes [78]. In other words, they imprison the knowledge and intuition and just provide us with the final results. In light of this, I formulate the explainability aspect of time series anomaly detection and also propose temporal explainers to provide easy-to-understand and precise explanations of anomalous behavior in multi-dimensional, real-valued datasets concerning sequential relations between stream data.

⇒ An online explainer is needed to identify the most significant, conspicuous, and ideally intuitive causal characteristics (i*) associated with an arbitrary anomaly detector:

TADExp(X_T, BB_AD) → i*
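The sketch below is not the TADExp method itself; it only illustrates, under simple assumptions, how a model-agnostic explainer can rank candidate causal features for a black-box anomaly scorer by occluding one feature at a time (replacing it with a baseline such as its mean under normal behavior) and measuring how much the anomaly score drops. The function names and the toy scorer are hypothetical.

```python
import numpy as np
from typing import Callable, Dict

def explain_by_occlusion(x_t: np.ndarray, baseline: np.ndarray,
                         score_fn: Callable[[np.ndarray], float]) -> Dict[int, float]:
    """Illustrative model-agnostic attribution for a black-box anomaly scorer.

    For each feature i, replace x_t[i] with a baseline value and record how
    much the anomaly score decreases. Larger drops indicate features that
    contribute more to the flagged anomaly."""
    original_score = score_fn(x_t)
    contributions = {}
    for i in range(len(x_t)):
        occluded = x_t.copy()
        occluded[i] = baseline[i]
        contributions[i] = original_score - score_fn(occluded)
    return contributions

# usage with a toy black-box scorer (squared deviation from the baseline)
baseline = np.zeros(4)
score_fn = lambda x: float(np.sum((x - baseline) ** 2))
ranked = sorted(explain_by_occlusion(np.array([0.1, 3.0, -0.2, 0.4]),
                                     baseline, score_fn).items(),
                key=lambda kv: kv[1], reverse=True)
print(ranked[:2])   # the top-ranked features play the role of i*
```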

1.2 Applications

The main focus of this thesis is novel/anomaly detection in stream data. However, anomalies of interest may, and often do, vary from one context to another. For example, differentiating anomalies of interest in a cybersecurity context, when dealing with zero-day exploits or other suspicious anomalous behavior indicating a potential security threat, from the set of those anomalies considered irrelevant to security turns out to be an even more intricate task than finding all the anomalies [260].

1.2.1 Cybersecurity application

To produce concrete results and evaluate the proposed algorithms' performance on complex systems, I consider tracing cyberattacks on critical infrastructure as the main application, specifically supervisory control and cyber-physical systems (CPS) used for the continuous operation of such infrastructure. Cyberattacks are becoming increasingly routine. Information security breaches frequently compromise the protection of sensitive data and information, exposing personal identities [166, 247], intellectual property, or financial assets, and potentially cause substantial damage that can ruin lives and businesses [1,166]. Even more troublesome, such attacks often go unnoticed for extended periods of time. Still, things can get a lot worse when cyberattacks target supervisory control systems and CPSs essential for operating critical infrastructure on which we all depend in our daily lives, such as water management systems, power grids, and transportation systems. Besides temporary disruption of critical services, sophisticated attacks may cause physical damage to vital system components, and ultimately even threaten human safety. Despite the similarities, CPSs come with their own unique threat spaces because of the discrepancies in their basis and application [28]. Since CPSs and simple embedded controllers for operating critical infrastructure interact with processes in their physical environment by reading data signals and generating control signals in response, there are plenty of latent stochastic, or semi-stochastic, external factors or behavior instabilities that can influence the control process and complicate identifying hidden patterns [301]. Also, anomalies can originate from various sources like cyber or physical attacks, errors in control or communication networks, and other malfunctions with system-wide or local effects. Since fully immunizing these systems against cyberattacks is not feasible, it is crucial for organizations operating in the public or private sector to invigorate detective, predictive, and preventive mechanisms to mitigate the risk of disruptions, resource damage, or loss. I therefore propose a methodical approach to cyber situation analysis for critical infrastructure by monitoring its behavior and raising a red flag in the presence of any suspicious activity.

Cyber Situation Analysis Framework (CSAF). This study proposes a high-level architecture for CSAF based on two existing frameworks: 1) STDF [167], which mainly focuses on information fusion and situation analysis processes; and 2) CSOC [217], which identifies generic yet crucial components for cyber situational awareness. The main aspects of each of these two frameworks are outlined below, followed by the proposed architecture.

• STDF (State Transition Data Fusion) framework—an extension of the JDL data fusion model [239]—provides a unification of both sensor and higher-level fusion across three principal levels [167]: object assessment, situation assessment, and impact assessment. Like the JDL model, STDF is a functional model, although less abstract than the JDL model, and seeks to demonstrate a unifying framework across the three levels.

• The CSOC (Cybersecurity Operations Centre) framework [217] is proposed as a model for security centers that comprise people who monitor ICT systems, infrastructure, applications, and services. In [217], the main responsibilities of the CSOC are split into three major categories: collection, analysis, and response. Generally speaking, a security operation centre (e.g., a CSOC or an MSOC, a maritime security operation centre) is a generic term describing the essential building blocks of a platform providing prediction, detection, and reaction services for security purposes [29]. Crucial elements of a CSOC framework include, but are not limited to: 1) data collection components along with a processed data repository; 2) incident analysis and knowledge base components; and 3) reaction and reporting components [29, 217].

Like STDF, our framework comprises three logically separated levels: object assessment, situation assessment, and impact assessment, where objects refer to individual evidence (e.g., logs from different sources), situation refers to the behavior of the target system in the real world, and impact is defined in terms of the severity level of threats or attacks. As illustrated in Figure 1.2, each level consists of a number of components. The first level is responsible for data collection from various sources. The second level is the analytic heart of CSAF and is responsible for modeling the normal behavior of the target system to provide anomaly detection and behavior prediction services. The third and last level is responsible for assessing the impact of the detected/predicted situation as well as reporting the incidents to domain experts.

1.2.2 Generic applications

Even though I exemplify and experimentally validate the proposed framework for the cybersecurity domain, the proposed algorithms can conceptually be applied to various real-world domains. Discovering unexpected events or rare patterns in sequential data is very important and popular in many other applications. For example, accurate online anomaly detection in industrial applications can trigger troubleshooting, help to avoid loss in revenue, and maintain the reputation and branding of a company [233].

Figure 1.2: Cyber Situation Analysis Framework

Some of the well-known applications of temporal anomaly detection can be described as follows [30, 51]:

• Fraud detection. It targets detecting criminal activities occurring in commercial organizations. The malicious activities may be caused by actual or disguised customers of the organization. Fraud is any type of unauthorized access or consumption of the organization's resources. Organizations are interested in the immediate detection of such frauds to prevent economic losses. Some of the specific applications of fraud detection are in credit cards, mobile phones, insurance claims, and insider trading.

• Medical and public health anomaly detection. The health domain is strongly entangled with patient records. Anomalies in this context can arise for several reasons, such as a disease outbreak, abnormal patient conditions, instrumentation errors, or recording errors. This is an extremely critical and challenging domain with a very high cost of misclassification.

• Industrial damage detection. Most industrial units, including CPSs, suffer various damages due to continuous usage and typical corrosion. Early detection of such damages based on sensor data helps to prevent cascading damages or losses and also enhances system health management. Fault detection in mechanical units and structural defect detection are two major specific applications in this domain.

• Anomaly detection in text data. Detecting novel topics, events, or new trends in social media, news articles, or other collections of documents is the main application of sequential anomaly detection in textual data. High dimensionality, sparsity, and the variety of documents are some of the challenges in this context.

• Other domains. There are a variety of other domains, such as vision, speech recognition, robotics, traffic monitoring, click-through protection, web applications, biological data, census data, computational criminology, and astronomical data, which take advantage of anomaly detection in their applications.

1.3 Thesis Contributions [1]

This research proposes novel algorithms in an end-to-end pipeline of temporal anomaly detection to address several crucial issues.

[1] Zahra Zohrevand has provided the main research and technical contributions of the referenced joint publications, for which she is listed as the lead author.

1. An interpretable anomaly detection algorithm. Interpretable and online anomaly detection algorithms are required to trace suspicious activities in critical infrastructures. Hidden Markov models (HMM) are one of the best-known families of techniques in the AD domain, but they suffer from several drawbacks. For example, applying log-likelihood to find anomalies is not robust enough to detect all adversarial attacks. Also, existing HMM-based AD techniques do not specify the exact source of the anomaly. In Chapter 3, I address these problems and propose a novel hierarchical hidden semi-Markov model (HSMM) framework. This research is published in [300]. Most significant contributions: 1) A novel hierarchical model to follow the system behavior at different granularity levels, 2) A robust anomaly scoring technique utilizing reference windows.

2. A novel behavior modeling algorithm. As mentioned, identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Devising a temporal forecasting model which leads to high average accuracy and small error deviation is one of the key objectives of this research. Beyond accuracy-related measures, this method is expected to produce high-quality results, meaning a reliable estimation of the expectations and anomalies in the presence of an unobserved/unknown event [35, 60, 96]. While current forecasting algorithms have reached high levels of accuracy, the next level of improvement can be realized by combining several models with different properties. For example, the limitations of data-intensive techniques, such as lacking a confidence interval or the possibility of generating off predictions, can be covered by model-driven techniques. In Chapter 4, I propose a new hybrid method capable of handling complicated TS comprising linear and non-linear components. These results are published in [301]. Most significant contributions: 1) A novel hybrid time series forecasting technique to limit the error interval, 2) A novel multi-branch data augmentation technique robust to noise.

3. Dynamic attack scoring using distributed local detectors. Not all novel or isolated events are anomalies of interest. This matter is domain-specific; for example, in the context of cybersecurity, differentiating suspicious anomalous behavior indicating a potential security threat from the set of those anomalies considered irrelevant to security is an even more intricate task than finding anomalies. The need to devise powerful anomaly scoring algorithms that maintain an acceptable trade-off between recall and false-alarm rates still exists [299]. To address this issue, in Chapter 5, I formulate the inference engine concept and propose a hierarchical platform to perform dynamic anomaly/attack scoring in distributed systems. This research is published in [298, 299]. Most significant contributions: 1) Devised a method to ignore contextual noise and reduce false-alarm rates, 2) Orchestrated a network of detectors to detect collective attacks.

4. Anomaly detection explanation. Having an effective anomaly detection method is of great significance, and the interpretability of the underlying decision-making process matters. Causality inference is an implicit but entangled part of this task, addressing questions like why a given point or collection of points in a data stream is considered anomalous compared to other data points. Meanwhile, the increasing diversity of data sources, the complexity of hidden factors, and the continuous evolution of normal behavior push advanced anomaly detection techniques to develop sophisticated decision logic, which turns them into mysterious and unexplainable black-boxes. This leads to the prime focus of Chapter 6, which is explaining why a time series is flagged as anomalous. I propose a few techniques, including model-agnostic and model-specific explainers, to provide the required interpretation for both convincing the human analyst about the isolating factors and highlighting the potential model limitations. One of these studies is published in [197] and the rest will be submitted. Most significant contributions: 1) Formulated the temporal anomaly detection explanation problem, 2) Proposed a novel explanation technique, 3) Overcame the fidelity-interpretability problem in finding the proximity measure of the explanation.

5. Spatiotemporal situational awareness. Spatiotemporal data differs from the previously analyzed time series in having both space and time as ubiquitous aspects of observations. This introduces another type of sequential dependency among measurements, induced by the spatial dimension, which comes with its own additional challenges that need to be dealt with. In Chapter 7, we present two projects focusing on the spatiotemporal aspect in the context of maritime domain awareness. These studies are published in [14, 253].

Thesis structure. The remainder of this thesis is organized as follows. Chapter 2 explains the basic concepts and formulates and justifies a collection of requirements which should be addressed in anomaly detection. Subsequently, Chapters 3, 4, 5, and 6 explain the four main phases of this study. For each phase, I first set out the problem definition and then provide an informal description of the background and the challenges targeted by the proposed method in that chapter. Then, Chapter 7 explores the world of spatiotemporal data and briefly explains our proposed analytical algorithms toward maritime domain awareness. Finally, Chapter 8 briefly highlights the key lessons learned and outlines directions for future research.

Chapter 2

Background

In this chapter, I present the basic definitions of time series (TS) and then drill down into different meanings of anomalies. Following this, I will outline anomaly detection (AD) techniques and describe other key concepts and requirements involved in anomaly scoring and false alarm mitigation for AD systems.

2.1 Time Series

It is important to have a formal definition of time series (TS) to present the properties and algorithms based on the same notation. A TS is interpreted as consecutive observations at relative time points: X = ⟨x_t : t ∈ T⟩, T ⊆ ℕ. Formally, each x_t is an m-dimensional vector, x_t ∈ ℝ^m, where x_t^k refers to the value of the k-th variable of x_t for 1 ≤ k ≤ m. Mathematically speaking, TS can be categorized as linear or non-linear (see Figure 2.1). A model for a TS (o_t) is linear if it can be expressed as o_t = α_0 + α_1 u_{1,t} + α_2 u_{2,t} + ... + α_m u_{m,t}; otherwise, it is non-linear.


Figure 2.1: Example of linear and nonlinear TS

Randomness may also be used to categorize TS as non-stochastic (i.e., deterministic) or stochastic (i.e., non-deterministic). Non-stochastic TS are a collection of non-random variables that can be described by explicit mathematical relations. Thus, predicting exact future values is possible within a non-stochastic TS. Non-stochastic TS can be categorized by periodicity:

• Periodic: A TS is composed of values that repeat over a regular period or interval.

• Non-periodic: There is no explicit pattern in the TS data. This can occur in two forms: almost periodic and transient non-periodic.

Conversely, stochastic TS are a collection of random variables. Each variable is uniquely associated with an element, and is indexed by time. Forecasting the exact value of a sequence is impossible due to randomness. Stochastic TS can further be categorized by stationarity:

• Stationary sequences are those in which statistical properties such as mean, variance, and/or autocorrelation stay constant over time. Most statistical methods are applicable to (almost) stationary TS.

• Non-stationary sequences are those in which statistical properties change over time. Most temporal, real-world data are highly random and mainly non-stationary due to the influence of external factors such as human interaction.

As aforementioned, most temporal real-world data obtained from monitoring systems, social networks, financial data, and other datasets of interest, which are influenced by external factors and by human interaction, are highly random and mainly non-stationary.
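As a concrete illustration of this distinction, the sketch below (an illustrative example, not part of the thesis experiments) checks a univariate series for stationarity using rolling statistics and the augmented Dickey-Fuller test from statsmodels; the window length and significance level are assumed values.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def check_stationarity(series: pd.Series, window: int = 30, alpha: float = 0.05):
    """Report rolling-statistic drift and an ADF test for a univariate series.

    Stationary series keep a roughly constant rolling mean/variance and yield
    a small ADF p-value; trending or drifting series do not."""
    rolling_mean = series.rolling(window).mean().dropna()
    rolling_std = series.rolling(window).std().dropna()
    _, p_value, *_ = adfuller(series.dropna())
    return {
        "rolling_mean_drift": float(rolling_mean.iloc[-1] - rolling_mean.iloc[0]),
        "rolling_std_drift": float(rolling_std.iloc[-1] - rolling_std.iloc[0]),
        "adf_p_value": float(p_value),
        "looks_stationary": p_value < alpha,
    }

# usage: white noise (stationary) vs. a random walk (non-stationary)
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(size=500))
walk = pd.Series(rng.normal(size=500)).cumsum()
print(check_stationarity(noise))
print(check_stationarity(walk))
```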

2.2 Anomaly Definition

A general definition of anomaly builds on Hawkins' statement in [128]: "An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism". This abstract view is ambiguous, and it can lead to different AD systems, based on the following interpretations:

1. Rare events, because anomalies do not happen frequently.

2. Distinct events, because anomalies are odd.

3. Abnormal events, because anomalies diverge from normal expectations.

However, these definitions are not comprehensive enough to account for all anomalies or suspicious events. Neither rarity, distinctness, nor abnormality is meaningful without adopting a threshold; however, an ill-defined or otherwise poorly estimated threshold may result in a huge rate of erroneous readings (i.e., false positives/negatives). Further, methods which focus on rarity scores are often incomparable because rarity is highly dependent on the AD algorithm. Other methods such as clustering [224], which focus on distance to compute anomaly scores (i.e., the distance of an anomaly to the closest clusters, or the sparsity of the cluster to which it belongs), assume distance or density are meaningful when this may not be the case. Also, the normal data would need to be labelled, which is not the case in many target domains like IDS [91].

An ideal definition should be applicable to all of the possible distributions without the need for additional stipulations [91]. Moreover, it should enable comparison of anomalousness across variables. Accordingly, I propose a new definition for anomaly within the context of AD: "An event is considered a suspicious anomaly if it is highly distinct in terms of feature values, or clustered but unobserved previously; and persistent or close to previous suspicious cases to reduce their noise likelihood." Anomaly categories. Anomalies are classified into the following three categories:

• Point anomalies, which refer to a data point that is obviously distinguishable from the other data points.

• Contextual anomalies, which refer to a data point that is anomalous only in a specific context and normal in all other contexts.

• Collective anomalies, which refer to data points that may not be anomalous individually, but are anomalous when they occur collectively.

However, these categorizations are not mutually exclusive. For example, a contextual anomaly may refer to a point anomaly, or a collective anomaly. Further, from a statistical point of view, the TS anomaly detection problem is formulated as finding data points which are different relative to some standard or usual signal. Traditionally, certain types of TS anomalies are more important, and include [4]:

• Additive anomalies, which refer to any unexpected growth within a short period, resembling a spike. These are comprised of point anomalies.

• Temporal changes, which refer to any short-term differences in target metric behaviours. These are comprised of collective anomalies, though they may comprise additive anomalies up to a point.

• Level or seasonal shifts, which refer to changes in the total value of the metric over a long period. These usually do not change the shape of the target signal.

2.3 Anomaly Detection

As the name implies, AD is concerned with detecting anomalies (i.e., outliers) within a data set. AD is also referred to as outlier detection, event detection, novelty detection, deviant discovery, changepoint detection, fault detection, intrusion detection, and misuse detection. While the aim of anomaly detection is straightforward, the definition is not transparent. Moreover, AD can be challenging for various reasons, such as lack of knowledge or an ambiguous normal region (see Table 2.1 and Section 2.4). Further, there are unique challenges associated with AD of temporal data (see Table 2.1). Multiple methods have been developed to perform AD, including classification, clustering, nearest neighbour, density, statistical methods, information theory, spectral decomposition, visualization, and signal processing. These techniques can be classified as:

• Supervised methods, in which all train labels are accessible. To this end, we can apply any classification method that handles imbalanced datasets, like Support Vector Machines (SVM) or Artificial Neural Networks (ANN). These methods are often impractical in real life because real-world data is dynamic and we are frequently not aware of anomaly cases.

• Semi-supervised methods, in which a majority of the data is labelled, while a minority is not. One popular model, one-class classification, presumes there are no anomalies in the training set. The model is trained on all normal cases, and anomalies are distinguished based on their deviation from the trained model. Other popular semi-supervised methods include autoencoders, one-class SVM, and probability density estimators, such as kernel density estimation or Gaussian mixture models.

• Unsupervised methods, in which no data label is available. This family of methods assigns anomaly scores to data points based on the configuration of the data's features. These include neighbourhood-based methods like the k-nearest neighbours algorithm; clustering-based methods like the cluster-based local outlier factor; statistical methods like histogram-based outlier detection; and subspace-based methods like robust principal component analysis.

Anomaly scoring. An anomaly detection system (ADS) aims to order anomalies based on their anomalous score. One basic method transfers the original order in feature space through a scoring function S : X → R+, which assigns smaller scores to more anomalous points [302].
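As an illustration of such a scoring function (a hedged example using scikit-learn, not a method proposed in this thesis), the sketch below fits a Local Outlier Factor model on presumably normal data and scores new points; with score_samples, smaller values correspond to more anomalous points, matching the ordering convention above (the exact range differs, so a monotone rescaling would be needed to land strictly in R+).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(42)
X_train = rng.normal(0.0, 1.0, size=(500, 2))               # assumed normal behavior
X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),      # normal-looking points
                    np.array([[6.0, 6.0], [-7.0, 5.0]])])   # obvious outliers

# novelty=True lets the fitted model score previously unseen observations
lof = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)

# smaller scores indicate more anomalous points
scores = lof.score_samples(X_test)
print(scores)   # the two injected outliers receive the smallest scores
```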

2.4 Challenges in Anomaly Detection

As mentioned earlier, anomaly detection and scoring are challenging for many reasons including high dimensional spaces, stochastic behavior, potential data drift, seasonality or highly irregular data observation rates, uncertain environments, mixed data types (e.g., continuous, integer, or lattice data), limited available data, varying lags in the emergence of anomalous behavior in one dimension compared to the others, and unknown hidden factors. Therefore, strong anomaly detection and scoring methods should be sufficiently capable and flexible to address these challenges using observed data [275]. AD methods are mainly evaluated according to their detection rate (i.e., the ratio of correctly detected anomalies to the total), and false-alarm rate (i.e., the ratio of misclassified normal data points to the total number of normal points). However, by focusing on meta-learning for AD algorithms [81, 82], I have devised and summarized criteria for anomaly scoring as well as corresponding measures to evaluate the strength and performance of these algorithms in addressing the specific circumstances of cyberattack detection and protection.

Challenges | Anomaly detection | TS anomaly detection
Open-ended definition | X | X
Being interpretable and explainable | X | X
Ambiguous definition of the normal regions (no precise boundary between normal and anomalies) | X | X
Robust to noises | X | X
Scale and dynamics | X | X
Challenging and ambiguous validation | X | X
Complexity and dynamicity of data | X | X
Lack of knowledge and noisy labels | X | X
Imbalanced nature of data (rare vs common) | X | X
Asymmetric error (application-based trade-off between precision and recall) | X | X
Considering time dependencies | | X
Handle data evolution (including trend drifts) | | X
Being time efficient in stream data analysis | | X
Managing distributed data streams | | X
Handling complex and nonlinear patterns | | X
Deriving confidence interval | | X
Handling massive scale of history | | X

Table 2.1: Anomaly detection challenges

2.4.1 Masking effect

One anomalous point masks a second anomaly if the latter can be considered an anomaly only by itself but not in the presence of the first point. Thus, a masking effect may occur if the estimated mean and covariance are skewed toward a cluster of outlying observations such that the outlying point is not sufficiently far from the mean to be detected [25]. A toy example of this scenario, which involves a method that dynamically updates its threshold value, is shown in Figure 2.2. Here, an attacker, who has probably figured out the strategy of this AD algorithm, adopts a gradual data poisoning strategy; in such an approach, the attacker intentionally feeds the system fake data that are similar enough to the normal process to be included in the confidence interval but slightly different, to make the system shift its threshold value. Thus, the attacker will subsequently be able to perform a masked attack (A) without being captured by the manipulated AD algorithm. In another example of masking in Figure 2.3, a rarity-based AD algorithm has missed A and B because of the limited number of anomalous points. In other words, if the method only looks for a very small rate of isolated cases, clustered anomalous points may influence the statistics so that none of them are declared anomalies [184]. One statistical representation of this phenomenon is when σ²_A ≤ σ²_N, where σ²_A and σ²_N are the normalized sample variances of the selected anomaly points and selected normal points, respectively.

Figure 2.2: A simple example of Masking and Swamping effects
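A small numeric illustration of masking (a hedged example; the data, cluster location, and threshold are made up for demonstration) is sketched below: a tight cluster of injected outliers inflates the estimated mean and standard deviation enough that a plain z-score rule no longer flags them, while robust estimates computed from the normal portion expose them.

```python
import numpy as np

rng = np.random.default_rng(7)
normal = rng.normal(10.0, 1.0, size=200)       # normal operating values
outliers = np.full(20, 18.0)                   # a clustered group of anomalies
data = np.concatenate([normal, outliers])

def zscore_flags(x: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Flag points more than `threshold` standard deviations from the mean."""
    z = np.abs(x - x.mean()) / x.std()
    return z > threshold

# The clustered outliers drag the mean up and inflate the std (masking):
print("flagged with outliers included:", zscore_flags(data)[-20:].sum())   # few or none
# Using estimates from the normal portion alone exposes them:
z_robust = np.abs(data - np.median(normal)) / normal.std()
print("flagged with robust estimates:  ", (z_robust[-20:] > 3.0).sum())    # all 20
```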

2.4.2 Swamping effect

The swamping effect is the reverse of the masking effect and occurs if the swamped data point can be considered an anomaly only in the presence of another data point(s). The false-positive cases B, A, and D, illustrated in Figure 2.2, which have been swamped by the orange observations, are examples of this phenomenon. If a group of outlying instances skews the mean and covariance estimates toward itself, it can swamp some non-outlying instances by increasing their distance from the shifted mean such that they are isolated as anomalies [25]. If a statistical test or AD algorithm overestimates the number of anomalies in a dataset, it might be influenced by the swamping effect. Therefore, any type of AD algorithm must be fine-tuned to avoid falling into such traps.

Figure 2.3: An example of rarity assumption based masking

2.4.3 Variable frequency of anomalies

If anomalies are highly rare (1–10% of all observations), rarity-based techniques can perform very well. However, in some difficult configurations, more than 30% of the data might be anomalous [154, 184], as with DoS attacks, which happen more frequently than some other specific individual normal scenarios. For example, in Scenario 1 shown in Figure 2.3, the AD algorithm has missed a group of anomalies only due to the rarity assumption. In such circumstances, the rarity assumption is a failed theory, so the group of methods that learn the boundary of normal behavior to highlight the anomalous cases might perform better. Here, the relative frequency corresponds to the contamination rate or plurality, i.e., the proportion of anomalous incoming data points [82]. The reliability of an algorithm under variable frequency conditions is measurable based on its tolerance under different levels of relative frequency without losing accuracy.

2.4.4 Curse of dimensionality

Having access to more features and detectors decreases the risk of losing influential information when performing the task of interest, but may cause other problems as follows [297]:

• Irrelevant features. A significant number of attributes may be irrelevant.

• Concentration of scores and distances. Numerical measures such as distance might be extremely similar.

• Incomparable and uninterpretable scores. Scores produced in each domain are incomparable with the others, resulting in a final score that is neither strong nor semantically meaningful.

• Exponential search space. A non-systematically analyzable search space leads to an enormous number of possible hypotheses for every observed significance.

Although there are existing methods that can resolve one or more of these problems, some challenges remain as open research questions. From a statistical perspective, each irrelevant feature increases the dimensionality of the space. This is a problem for (naïve) density estimation methods, whose required sample size scales exponentially with dimensionality and is not always available. Moreover, having irrelevant features decreases recall and precision. On the one hand, as the dimensionality of the data increases, anomalous data points may be covered by the similarity of the other unrelated and unimportant dimensions. On the other hand, the data domain space grows with dimensionality, which may cause normal cases to fall apart from the others and be labeled anomalous. In other words, from a statistical perspective, the possibility of having more tails increases, which might consequently push the normal points towards the tails of some unimportant feature distributions [81].
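The concentration-of-distances issue can be seen empirically; the short sketch below (an illustrative experiment with made-up sample sizes) draws random points in increasing dimensions and shows the ratio between the farthest and nearest distances shrinking toward 1, which is what erodes distance-based anomaly scores.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(dim: int, n_points: int = 500) -> float:
    """Ratio of max to min distance from a random query point to random points.

    As dimensionality grows this ratio approaches 1, i.e., 'far' and 'near'
    become nearly indistinguishable."""
    points = rng.uniform(size=(n_points, dim))
    query = rng.uniform(size=dim)
    dists = np.linalg.norm(points - query, axis=1)
    return float(dists.max() / dists.min())

for dim in (2, 10, 100, 1000):
    print(f"dim={dim:5d}  max/min distance ratio = {distance_contrast(dim):.2f}")
```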

2.4.5 Lag of emergence

Different parts of a complex system might react to an anomalous input very differently. For instance, some features might react to the occurring event sooner than others. Consequently, the AD algorithm might bombard the end-user with the same anomaly over the time span, which is not desirable. This phenomenon might happen for various reasons, say feature correlations and interactions, potentially influential hidden factors, or temporal relations. For instance, the response time of a computational system is highly correlated with a memory leak, through a causal relation based on process footprints. So, a memory leak event is first recognizable through a spike in the process footprint, and then through the increased response times caused by frequent process swapping. Also, having a very irregular rate of sampling in different dimensions is another potential reason for such lags. Thus, features with a smaller sampling rate (longer interval) might indicate extreme or anomalous events which have already been deciphered by analyzing other high-frequency features [275].

2.4.6 Domain specific criteria

Defining domain-specific measures may help evaluate ADS capabilities in very specific target domains. For example, [172] applies "burst detection rate (BDR)" to capture potential bursts which indicate attacks involving many network connections. This measure represents the ratio of the total number of intrusive network connections with a higher score than the threshold to the total number of intrusive network connections within attack intervals.

2.5 Anomaly Detection Approaches

The most important AD approaches are reviewed in the following sections. Each section corresponds to an AD category [2,122,264,271], and each category groups AD techniques according to criteria. These criteria include, though are not limited to, the type and distribution of input data; the flexibility of the learning phase; the availability of data and resource constraints introduced by the application domain; and the availability of target labels.

2.5.1 Model-based and data-driven categorization

At the macro level, TS anomaly detection methods can be classified into model-based and data-driven strategies. Model-based categorization refers to a group of methods that use an existing framework to estimate the data distribution. Data-driven categorization refers to methods that extract patterns from training data. Essentially, the model-based category represents a top-down approach while the data-driven category represents a bottom-up approach. One issue with this categorization is that the model-based and data-driven groups are not mutually exclusive. Rather than positioning these techniques in a binary relationship, they are more accurately represented as a spectrum. Thus, traditional model-based (e.g., parametric statistical) methods are at one end of the spectrum, while fully data-driven (e.g., deep learning) methods are at the other.

2.5.2 Technique-based categorization

Technique-based categorization is grounded in a combination of underlying techniques and measures. The main weakness of this categorization type is its lack of comprehensiveness; for example, several methods such as deep learning are not included in this categorization. Moreover, several methods can be included in more than one category. For example, there are many clustering-based AD methods that take advantage of the sliding window [140].

Statistical anomaly detection

These methods make some assumptions about normal observations, such as restrictions on the data distribution. Anomalies are therefore points with a low probability of being generated by the presumed overall distribution. Statistical anomaly detection methods can be divided into two major groups:

• Parametric methods, whose lack of knowledge about unobserved data and the high probability of behaviour change over time make them impractical for stream data.

• Non-parametric methods, in which the model learns the system's normal behaviour from the input data. These methods have greater potential for stream data, although not for high-dimensional scenarios.

Depth-based anomaly detection

This technique is a variant of statistical methods applicable to quantitative and low-dimensional data. Instead of considering a statistical distribution, depth-based anomaly detection highlights anomalies based on the border of the data space [242]. Each data point is represented in an n-dimensional space with an assigned depth. According to the assigned depth, data points are organized into convex hull layers, and points with shallow depth values are declared anomalies.

Distance-based anomaly detection

In this approach, anomalousness is scored based on the distance of data points to their k nearest neighbors in a time window of length w. This approach is limited to data for which distance computation is possible.
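The following is a minimal numpy sketch of this idea, assuming Euclidean distance within a single window; the function name and parameters are illustrative, not taken from any cited work.

```python
import numpy as np

def knn_distance_scores(window, k=5):
    """Score each point in a window by its mean distance to its k nearest
    neighbours (larger score = more anomalous). Illustrative sketch only."""
    window = np.asarray(window, dtype=float).reshape(len(window), -1)
    # Pairwise Euclidean distances within the window.
    diffs = window[:, None, :] - window[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)          # ignore self-distance
    nearest = np.sort(dists, axis=1)[:, :k]  # k smallest distances per point
    return nearest.mean(axis=1)

scores = knn_distance_scores(np.random.randn(100, 2), k=5)
print(scores.argmax())  # index of the most isolated point in the window
```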

Deviation-based anomaly detection

This approach marks as anomalous those points which do not fit the general characteristics of the given set, thereby minimizing variance by removing anomalies. For example, a histogram-based method may be applied to decrease deviation [142]; however, this approach is not applicable to multivariate or distributed data.

Density-based anomaly detection

Density around a data point is compared with the density around its local neighbours. Generally, any points far from dense areas are classified as anomalous. One method, Local Outlier Factor (LOF), computes the ratio of the local density of the target point to that of its neighbors. A related method, Incremental LOF [123], has the same performance as standard LOF and can detect changes in data behaviour, making it more suitable for stream data. However, this method cannot distinguish new normal behaviour (resulting from potential trends or pattern shifts) from a real anomaly [44]. Generally speaking, density-based methods are impractical in the presence of high-dimensional datasets, because the accuracy of density estimation decreases as the number of dimensions increases. Moreover, while they are more successful than distance-based methods, they are also more expensive and computationally complicated.
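As a hedged illustration of the LOF idea, the short sketch below uses scikit-learn's standard batch LocalOutlierFactor (not the incremental variant discussed above); the data and parameter values are arbitrary.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),     # dense normal cluster
               rng.normal(6, 0.1, size=(5, 2))])    # small far-away group

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                 # -1 marks points flagged as outliers
scores = -lof.negative_outlier_factor_      # larger value = more anomalous
print(np.argsort(scores)[-5:])              # indices of the five most anomalous points
```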

Sliding window-based anomaly detection

The aim of this approach is to maintain statistical information on past observations. Choosing the best window size is challenging: a large window may cover outdated information and dilute the importance of recent points, while an overly small window risks losing historical and statistical information such as seasonality and cyclic patterns.
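A minimal sketch of sliding-window scoring is shown below, assuming a simple z-score against the preceding window; the helper name and window size are illustrative and make the trade-off above concrete (a larger window smooths recent shifts, a smaller one forgets seasonality).

```python
import numpy as np

def rolling_zscores(series, window=50):
    """Score each point against the mean/std of the preceding `window` points."""
    series = np.asarray(series, dtype=float)
    scores = np.zeros_like(series)
    for t in range(window, len(series)):
        hist = series[t - window:t]
        std = hist.std() or 1e-9            # guard against zero variance
        scores[t] = abs(series[t] - hist.mean()) / std
    return scores

x = np.sin(np.linspace(0, 20, 500)) + np.random.normal(0, 0.1, 500)
x[300] += 3.0                               # injected point anomaly
print(rolling_zscores(x, window=50).argmax())   # likely near the injected index (~300)
```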

Clustering-based anomaly detection

Normal observations usually cluster into several groups, while anomalous cases do not belong to any cluster. Generally, anomalies are far from their closest centroid, or they belong to small or sparse clusters while normal observations belong to large and dense clusters. This approach faces two main challenges: (1) most clustering methods prioritize finding clusters rather than anomalies and are thus not fully optimized for AD; (2) most clustering algorithms require the number and shape of the target clusters to be specified, whereas the number of clusters is not fixed in stream data.
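The sketch below illustrates the distance-to-nearest-centroid scoring described above using scikit-learn's KMeans; the data, the number of clusters, and the percentile threshold are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (300, 3)), rng.normal(8, 1, (300, 3))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
# Anomaly score = distance to the closest centroid.
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
threshold = np.percentile(dists, 99)
print(np.where(dists > threshold)[0][:10])   # candidate anomalies
```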

2.5.3 Strategy-based categorization of anomaly detection techniques

Some studies argue that TS data are very similar to discrete sequential data and that shared methods can therefore be applied in both contexts; however, this is not the case given basic differences between these data types. TS consist of numeric data, while discrete sequences are symbols over time. Further, TS anomalies are usually detected based on forecasts, while discrete data can be handled in various ways, similar to distance-based anomalies. TS data are primarily explored using techniques like AR, VAR, and ARIMA models, while discrete sequences are explored using data mining techniques such as Markov models and deep learning. These datasets can be transformed into one another; for example, a TS can be discretized into a sequence to take advantage of sequence analysis methods. Moreover, techniques such as clustering can be used for both types of data. AD methods appropriate for temporal data and their main approaches are described here, ordered by fundamental strategy and by the type of data taken as input [2]. They can be grouped into two main families based on their input and goal: (1) Input: TS database, Goal: identify anomalous sequences or subsequences; (2) Input: one TS, Goal: identify anomalous points or subsequences.

Anomalies in TS databases

Several methods for AD in TS databases compute anomaly scores based on the entire sequence, while others aggregate scores over overlapping fixed-size windows. These differences are explained in what follows. Direct detection of anomalous TS. These AD approaches are predicated on the assumption that "anomalies are rare events." By training a model on an entire database, it learns normal behaviour and assigns a score to each TS, which can highlight anomalies. Multiple algorithms belong to this category, some of which are listed below. Unsupervised discriminative approaches. The defining features of these approaches are the similarity functions and the measures used to determine similarity. Discrimination is based on two principal rules: within-cluster similarity should be maximized, while between-cluster similarity should be minimized. The anomaly score of a test sequence is defined as the distance to the centroid (or medoid) of the closest cluster. The primary variations across approaches in this group are due to differences in

the similarity measures and/or the clustering mechanism. The overall algorithm can be summarized as:

1. Compute clustering space (features)

2. Define a similarity function to compare two TS

3. Cluster the sequences using clustering algorithms based on the data type.

4. Obtain anomaly score of each test sequence

The anomaly score is defined based on the distance of each test sequence to the closest centroid or medoid. Two popular similarity measures are the simple match count [169] and the normalized length of the longest common subsequence (LCS) [41]. The former is very efficient, while the latter is more computationally expensive but also applicable to segments of noisy sequences. Numerous studies have applied optimized versions of this technique for TS anomaly detection (a minimal sketch of the similarity measure appears after this list), including:

• nLCS with k-medoids [41, 42];

• nLCS with a variant of kNN [52], in which the anomaly score is computed based on the data point's distance to the k-th closest sequence;

• Count-based sequence similarity evaluation of the window-based subsequences [168];

• Window-based subsequences with K-Means [209];

• Phased K-Means [232];

• One-class SVM [88, 193] and one-class SVM with discretized data [268];

• Kernel-based feature maps of raw data with kNN or one-class SVM [86];

• Self-organizing maps with windowed subsequences.
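As referenced before the list, here is a minimal sketch of the normalized LCS (nLCS) similarity and the resulting medoid-distance anomaly score; the helper names, the toy medoids, and the normalization by the geometric mean of the lengths are assumptions for illustration.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, sa in enumerate(a, 1):
        for j, sb in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if sa == sb else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def nlcs(a, b):
    """Normalized LCS similarity between two symbol sequences (1 = identical)."""
    return lcs_length(a, b) / ((len(a) * len(b)) ** 0.5)

# Anomaly score of a test sequence = 1 - similarity to the closest medoid.
medoids = ["aabbccdd", "abcabcab"]        # hypothetical cluster medoids
test = "aazbccdz"
print(1 - max(nlcs(test, m) for m in medoids))
```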

Unsupervised parametric approaches. The overall process of methods in this category is based on computing the probability of each data point in the series from the values at the previous few time steps. The TS anomaly score is then a function of its data points' anomaly scores, so a point with a very low generation probability is marked as an anomaly. Three main high-level methods in this category are Finite State Automata (FSA), Markov Models, and Hidden Markov Models (HMM), which cover many AD studies and applications in areas such as intrusion detection and speech recognition [86, 281].

Finite state automata. These methods are applicable to fixed-size subsequences. After training the FSA, all possible subsequences of the same length are extracted from the test TS and fed into the FSA. If the FSA ends in a state that does not have an outgoing edge to the next value in the test sequence, the sequence is labeled as an anomaly (see Figure 2.4) [52]. It is important to note that generating an FSA is highly dependent on the application domain.

Figure 2.4: FSA-based anomaly detection (adapted from [52])

These approaches have been used in multiple studies [52, 201, 245, 288], and all follow these general steps:

• If the next observation matches the current state's characteristics, remain in the current state;

• If the next observation matches the next state's characteristics, transition to the next state;

• If the next observation matches neither the current state nor the next one, transition to an anomaly state.

Markov model-based anomaly detection. Markov models use generation probabilities to obtain the conditional probabilities of the observed symbols and their transitions, and thereby detect anomalous series. There are different approaches to producing and preserving this probability information; for example, probabilistic suffix trees (PSTs) are an efficient option [266]. Another advantage of Markov models is that they do not limit the size of the historical subsequences, so the history size may be fixed, variable, or selective. Selective-size Markovian techniques, also referred to as sparse Markovian techniques, compute the conditional probability of a symbol from l previous symbols; however, these need not be contiguous or the l items immediately preceding the current one [86].

HMM-based anomaly detection. An HMM is a probabilistic model in which the system being modeled is represented as a Markov process with hidden states. Unlike the simple Markov model, in an HMM the state is not explicitly visible, while the output depends on the state. Although stronger than the previous two methods, HMMs suffer from a few drawbacks: they are not easily scalable to real-world datasets, and their training usually needs significant manual intervention, experience with the data, and judicious selection of the model, its parameters, and the parameters' initialization values. Nevertheless, because they are highly interpretable and theoretically well-motivated, HMMs are one of the most widely used AD approaches and have been applied across various domains. Using HMMs for intrusion detection was first proposed by Warrender et al. [281], who introduced an HMM-based approach to detect anomalous program traces in operating system call data. Supervised approaches. As previously mentioned, supervised approaches are those in which all training labels are accessible. Thus, various methods can be applied, including: positional system call features with the RIPPER classifier [175]; subsequences of positive and negative strings of behavior as features with a string matching classifier [45, 111]; bag-of-system-calls features with decision trees, Naive Bayes, and SVMs [145]; classifier neural networks [66]; and Elman networks [107].

Window-based detection of TS anomalies

Window-based approaches break sequences down into same-size overlapping subsequences, as illustrated in Figure 2.5. The anomaly score of each test sequence is then computed by aggregating the anomaly scores of the underlying windows. There are advantages and disadvantages to these approaches: unlike the previously mentioned approaches, window-based methods can localize anomalies in smaller sequences and thereby avoid marking the entire sequence as an anomaly; however, this requires finding the best window length, which adds another layer of complexity.

Figure 2.5: Window-based anomaly detection (adapted from [122])

Windows may also be referred to as look-back windows, fingerprints, pattern fragments, n-grams, sliding windows, motifs, et cetera. Window-based methods can be divided into the following: 1. Preceding-based normal repository. The aim of these approaches is to extract and maintain a normal repository of subsequences of size w, based on the training set. The repository is generated from the preceding items. Anomaly scores are based on the degree to which each test sequence

matches existing subsequences in the repository. Sequence Time-Delay Embedding (STIDE) [133] is one such method and follows these general steps: (1) Divide the normal sequences into size-k overlapping windows; (2) Obtain the size-k subsequences of the test sequence; (3) Subsequences that do not occur in the normal database are considered mismatches. A test sequence with a large number of mismatches is marked as an anomaly; if a test subsequence is not in the database, its mismatch score is computed as the minimum Hamming distance between the window and any of the subsequences in the normal database, normalized by k. This method, with slight variations, has been applied in several other studies as well [45, 83, 99, 106, 107]. 2. Look-ahead based normal repository. Time-Delay Embedding (TIDE) [95] extracts patterns from look-ahead items and follows these general steps:

• Record the elements occurring at distances 1, 2, ..., k in the sequence, for every element in every normal sequence;

• Create a normal database of such occurrences;

• Given a test sequence, find a look-ahead of the same size k;

• Compute the number of mismatches by checking each pair of element occurrences against the normal database;

• Score anomalies based on the number of mismatches normalized by the total number of such occurrence pairs.

3. Negative/mixed pattern database approaches. These methods create a repository of anomalous sequences and evaluate test subsequences against this repository. They follow these approximate steps: (1) Extract all of the normal subsequences of length w from the input; (2) Find all subsequences of size w not in this set and label them as detectors, or negative cases [65, 66, 111]; (3) If a test sequence matches any detector in the repository, it is considered an anomaly. Detectors, which make up the anomalous subsequence repository, should be far enough from normal cases and maximize the coverage of the non-self space. Different approaches can be applied to find these detectors, including extracting subsequences that differ in at least r contiguous positions [65, 75]; having at least r different contiguous chunks [66]; or finding detectors in an n-dimensional hypersphere with r ≥ 1 relative to all normal cases [111].
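A minimal sketch of the normal-repository family (in the style of STIDE, described earlier) is shown below; the Hamming-distance refinement is omitted, and the example sequences and window size are illustrative assumptions.

```python
def windows(seq, k):
    """All overlapping length-k windows of a sequence."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]

def repository_mismatch_score(normal_seqs, test_seq, k=3):
    """Fraction of test windows that never occur in the normal repository
    (simplified sketch of the preceding-based normal repository idea)."""
    repository = set()
    for seq in normal_seqs:
        repository.update(windows(seq, k))
    test_windows = windows(test_seq, k)
    mismatches = sum(1 for w in test_windows if w not in repository)
    return mismatches / max(len(test_windows), 1)

normal = [s.split() for s in ["open read read write close", "open read write close"]]
test = "open read exec write close".split()
print(repository_mismatch_score(normal, test, k=3))   # 1.0: every window is novel
```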

2.5.4 Anomalous subsequences in a test time series

This is a collection of AD methods that aim to find anomalous patterns or subsequences in a given test TS. Given a previously observed TS, the frequency of an anomalous subsequence in the test series will be substantially different from its expected value. Much research has applied different methods based on this principal idea; for example, the TARZAN algorithm [151, 179, 180]:

• Builds suffix trees based on the discretized test and reference sequences;

• Calculates the adjusted frequency of occurrence in both reference and test sequences for each subsequence in the test string;

• Marks a subsequence as anomalous if the difference in frequencies is greater than the threshold;

• If a subsequence w of the test sequence does not occur in the reference, picks the longest set of subsequences from the reference that covers w and uses a Markovian method to estimate its frequency in the test sequence.

There are other methods with more relaxed matching criteria [17, 125] that evaluate and accept matches of any permutation of a subsequence.
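The sketch below illustrates the frequency-comparison idea behind this family on discretized symbol sequences; it is not TARZAN itself (no suffix tree, no Markovian estimate for unseen subsequences), and the subsequence length and threshold are arbitrary assumptions.

```python
from collections import Counter

def subseq_counts(symbols, length):
    """Occurrence counts of every fixed-length subsequence."""
    return Counter(tuple(symbols[i:i + length]) for i in range(len(symbols) - length + 1))

def surprising_subsequences(reference, test, length=3, threshold=0.9):
    """Flag test subsequences whose frequency deviates from the frequency
    expected from the reference series (scaled to the test length)."""
    ref, tst = subseq_counts(reference, length), subseq_counts(test, length)
    scale = (len(test) - length + 1) / max(len(reference) - length + 1, 1)
    return [s for s, c in tst.items() if abs(c - ref.get(s, 0) * scale) > threshold]

reference = list("abcabcabcabcabcabc")
test = list("abcabcxyzabcabc")          # contains an unseen "xyz" region
print(surprising_subsequences(reference, test))
```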

2.5.5 Anomalies within a given time series

Given a single TS, particular data points or subsequences within it can be identified as anomalies. The primary differences between this category and the previous one are the input source (i.e., a single rather than multiple TS) and, consequently, the algorithms. These methods identify anomalies at a finer granularity, as anomalous data points or subsequences, rather than labelling the whole sequence as an anomaly. Consequently, they are more suitable for finding anomalies in stream data. Proposed methods in this group can be categorized as: (1) point anomaly detection methods; (2) subsequence anomaly detection methods.

Point anomaly detection methods

These include multiple approaches that look for anomalous points in a TS. Prediction-based approaches compute the anomaly score of a point in the TS based on its deviation from the value predicted by a prediction model. Various prediction models have been applied, including median-based predictors [22]; average-based predictors; single-layer linear network predictors; multilayer perceptron (MLP) predictors [129]; deep learning predictors [98]; and support vector regression [192]. Profiling-based approaches maintain a normal profile and compare the value of the upcoming time point against this profile to identify an anomaly. For example, the normal profile can be tracked based on a smoothed version of the historical TS and a variance vector [283]; an anomaly score is assigned after a new point is compared with the normal profile and the variance vector. Others [256] have used an ML model to maintain the normal profile, with new points evaluated against the estimate of the next value. Deviant-based approaches find the points in a given TS whose removal from the series decreases the error bound of a representative histogram [142]. The number of buckets can be reduced to account for the separate storage of these deviant points. To solve the AD problem, a dynamic programming mechanism with a recursive relation, which breaks down the optimal set of k deviants into the deviants among the l highest or k − l lowest values, has been proposed [142]. Another approach is an approximation

method, which finds a partial solution for a few scattered indices of the TS rather than for each value [207].
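A minimal sketch of the prediction-based family is given below, using a moving-average predictor as a stand-in for any of the forecasters listed above; the function name, window size, and toy data are assumptions.

```python
import numpy as np

def prediction_residual_scores(series, window=10):
    """Predict each point as the mean of the preceding `window` values and
    score it by the size of the residual. Any forecaster (ARIMA, LSTM, ...)
    could replace the moving-average predictor in this sketch."""
    series = np.asarray(series, dtype=float)
    scores = np.zeros_like(series)
    for t in range(window, len(series)):
        prediction = series[t - window:t].mean()
        scores[t] = abs(series[t] - prediction)
    return scores

x = np.concatenate([np.random.normal(10, 0.5, 200), [18.0], np.random.normal(10, 0.5, 50)])
print(prediction_residual_scores(x, window=10).argmax())   # ~200, the injected spike
```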

Subsequence anomaly detection methods

In some instances, detecting anomalous subsequences is more important than focusing on specific points. This group of methods is highly practical for identifying collective anomaly patterns. Discord discovery is a method in which a subsequence s of length l is called a discord (or anomaly) if it has the greatest distance to its nearest non-overlapping match [150]. The brute-force solution considers all possible subsequences and computes the distance of each one to every other non-overlapping subsequence of the series S. There are several methods to decrease the time complexity of this technique, such as applying top-k pruning, or heuristic methods for reordering candidates using Symbolic Aggregate Approximation (SAX) [149] and locality sensitive hashing [282]. Wavelet and Augmented Trie (WAT) [40] applies the Haar wavelet transform and symbol word mapping to the TS and generates a prefix tree; the algorithm then applies a breadth-first search on the obtained tree to find an approximation of all candidate discord subsequences. Multi-scale anomaly detection aims to find anomalies at several scales. Some methods can handle irregular TS; for example, irregular cases are handled by defining a pattern as a subsequence of two consecutive points [282]. If a pattern is rare with respect to its slope and length relative to the other patterns, it is labelled an anomaly. Others [149] determine anomalous subsequences based on their level of similarity with previous parts of the sequence; this method uses SAX to produce symbols and then uses a chaos bitmap to evaluate similarity.
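The brute-force discord search described above can be sketched as follows; this is the quadratic baseline (no SAX reordering, hashing, or pruning), and the toy signal and subsequence length are illustrative.

```python
import numpy as np

def top_discord(series, length):
    """Brute-force discord: the subsequence whose distance to its nearest
    non-overlapping match is largest. O(n^2) sketch of the baseline."""
    series = np.asarray(series, dtype=float)
    n = len(series) - length + 1
    subs = np.array([series[i:i + length] for i in range(n)])
    best_idx, best_dist = -1, -np.inf
    for i in range(n):
        dists = np.linalg.norm(subs - subs[i], axis=1)   # distance to all candidates
        mask = np.abs(np.arange(n) - i) >= length        # exclude overlapping matches
        nearest = dists[mask].min()
        if nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist

x = np.sin(np.linspace(0, 40, 800))
x[400:420] += 1.5                      # injected anomalous bump
print(top_discord(x, length=20)[0])    # starting index near the bump
```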

2.6 Advanced Temporal Anomaly Detection Techniques

Since the origination of Artificial Neural Networks (ANN) in 1943 and the subsequent advancement of deep learning, advanced temporal AD techniques have solved various problems in diverse contexts such as robotic processing, object recognition, speech recognition, handwriting classification, and even real-time sign-language translation [98]. Common and popular deep structures include the "Elman and Jordan Neural Network", "Convolutional Neural Network (CNN)", "Fully Convolutional Network (FCN)", "Recurrent Neural Network (RNN)", and the "Long Short-Term Memory (LSTM)". Recently, deep learning has been applied in various domains of TS analysis, like classification, forecasting, and AD. Classification and forecasting can be considered fundamental tools for performing AD [298]. I briefly overview the most common deep-learning-based TS analysis approaches.

2.6.1 Deep learning based classification

CNNs are the most commonly used structures in classification-related tasks. One of the main challenges is how to find the best classifier features, which greatly influence the accuracy and performance of the final model. Historically, features manually engineered by domain experts were preferable given the shortcomings of the available methods. However, CNNs

overcame this challenge by learning and extracting important features without manual intervention, thereby mitigating the need for human experts [170]. One common approach to handling and classifying TS is flattening the data over the time dimension and treating each data point as one feature, which generates a very high-dimensional dataset. The intrinsic convolution-based structure of CNNs enables them to handle such high-dimensional flattened series. For example, CNNs have been applied to the classification of audio signals [173]. Others [5] have used features learned by CNNs as input to hidden Markov models, resulting in an improved error rate. However, most applications of CNNs in TS classification are based on univariate TS, which often does not match real-world contexts; some have proposed specific structures to solve this problem [296]. Another drawback of CNNs is the assumed independence of features, especially in relation to flattened TS. Tiled CNNs [280], which are primarily trained with Independent Component (IC) estimation techniques, address this problem. This structure forgoes the assumption that each component is statistically independent and attempts to find a topographic order between them [137]. Two CNN-based TS classification approaches are the multi-scale convolutional neural network (MCNN) and image-processing-based TS classification.

Multi-scale convolutional neural network (MCNN)

This framework improves TS classification by representing features at different time scales and handling noise [62], and includes three sequential stages (Figure 2.6): (1) the transformation stage applies various transformations to the input TS, such as identity mapping, down-sampling transformations, and spectral transformations in the frequency domain, and feeds the CNN with these branches; (2) the local convolution stage applies several parallel convolutional layers to extract the features of each branch separately; (3) the full convolution stage concatenates all extracted features and applies several more convolution/max-pooling layers, fully connected layers, and a softmax layer to generate the final output.

Figure 2.6: Overall architecture of MCNN (adapted from [62])

Image processing-based TS classification

Another classification method proposes transforming the TS into an image and applying CNNs to classify the TS image [280], given that CNNs have proven capabilities in image processing with a very high level of accuracy. Two suggested transformations are: 1) the Gramian Angular Field (GAF), and 2) the Markov Transition Field (MTF). Depending on the size of the TS, these transformations may generate prohibitively large images, and thus strategies are required to reduce the image size without losing too much information. A two-channel image obtained by combining both transformations outperforms learning from each individual image. However, this method is only applicable to one-dimensional TS. Other drawbacks of this technique relate to binning the TS values: 1) determining the optimal bin size can be challenging, as increasing it may result in a loss of detailed information while decreasing it results in massive images; 2) very large TS must be aggregated or broken down into smaller windows. Moreover, handling non-stationary cases is impossible, and the method is not very interpretable for end-users.
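A minimal numpy sketch of the Gramian Angular (summation) Field transformation is shown below; the binning and image-size reduction discussed above are omitted, and the function name and toy input are illustrative.

```python
import numpy as np

def gramian_angular_field(series):
    """GAF of a univariate series: rescale to [-1, 1], map values to angles,
    and build the pairwise cos(phi_i + phi_j) image."""
    x = np.asarray(series, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1     # rescale to [-1, 1]
    phi = np.arccos(np.clip(x, -1, 1))                  # polar encoding
    return np.cos(phi[:, None] + phi[None, :])          # n x n image

image = gramian_angular_field(np.sin(np.linspace(0, 6, 64)))
print(image.shape)   # (64, 64), ready to feed a 2-D CNN
```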

2.6.2 Deep learning based forecasting

Recently, various deep-learning methods have emerged for TS modeling and forecasting, several of which are very popular and repeatedly used in this context. For example, RNNs are effective structures for TS modeling and prediction, given their ability to unfold the network through time [241]. Additionally, RNNs keep a memory of previous inputs by constraining how learned weights change over the training process; however, a major drawback of RNNs is the vanishing gradient [26]. The LSTM introduces a complex block of computing units with specific gates to trap errors without losing the sequential properties of the input data [132]. The autoencoder is also a common structure for learning TS features: stacked denoising autoencoders (SDA) have been used to predict indoor temperature [238], and stacked autoencoders have been utilized to predict traffic flow and weather conditions [185, 191]. Deep belief networks (DBN), along with restricted Boltzmann machines (RBM), are other NN structures that work as generative models. These consist of several layers of connected latent variables without any connections inside each hidden layer, and have recently obtained promising results in weather forecasting [118]. Undecimated fully convolutional neural networks (UFCNN) are another variation of CNNs that has been used for TS forecasting [203]. Some argue that fully convolutional network (FCN) architectures are a type of wavelet transform simulation and are not robust to small translations in the input signals. To overcome this issue, they propose the UFCNN, since undecimated wavelet transforms are translation invariant. The aim of the UFCNN is to substitute both the upsampling and pooling operators of the FCN architecture with upsampling filters at the l-th resolution level by a factor of 2^(l−1) along the time dimension. DeepMind's WaveNet is another breakthrough in TS forecasting, demonstrating the ability of convolutions to capture recurring patterns like specific autocorrelation structures or seasonality [33, 274]. Their convolutional architecture effectively predicts the next observation in a TS, and has

outperformed state-of-the-art recursive networks. This ability comes from dilated causal convolution layers, which are able to track temporal aspects even in the presence of very long-term dependencies (Figure 2.7). Dilated convolutions enable receptive fields that grow exponentially with the depth of the convolution layer: instead of applying causal filters to all points in the sequence, the layer skips inputs at a constant dilation rate. The WaveNet architecture takes advantage of the efficiency of convolutional layers. Moreover, it addresses the problem of dependencies over very long time steps, which is a long-standing challenge even for RNNs and LSTMs.

Figure 2.7: A stack of dilated causal convolutional layers (adapted from [274])
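A minimal PyTorch sketch of such a stack of dilated causal convolutions is shown below; it is not the WaveNet architecture itself (no gated activations, residual or skip connections), and the layer count, channel width, and kernel size are arbitrary assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedCausalStack(nn.Module):
    """Stack of dilated causal 1-D convolutions: each layer is left-padded so
    outputs never see future inputs, and the dilation doubles per layer, so the
    receptive field grows exponentially with depth."""
    def __init__(self, channels=16, layers=4, kernel_size=2):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(1 if i == 0 else channels, channels,
                      kernel_size, dilation=2 ** i)
            for i in range(layers))

    def forward(self, x):                               # x: (batch, 1, time)
        for conv in self.convs:
            pad = (conv.kernel_size[0] - 1) * conv.dilation[0]
            x = torch.relu(conv(F.pad(x, (pad, 0))))    # causal left padding
        return x

net = DilatedCausalStack()
print(net(torch.randn(8, 1, 128)).shape)                # torch.Size([8, 16, 128])
```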

2.6.3 Deep anomaly detection

While studies that directly apply deep learning techniques for AD in TS data are not abundant, AD can be performed indirectly based on other main tasks (e.g., classification [171] and prediction [196]). In the second approach, the applied method models the TS behaviour and flags regions of data that are dissimilar to the predicted values as anomalies. The Stacked Denoising Autoencoder (SDA) approach of [90] is one example: raw trajectories are fed into an SDA to generate a less noisy, more robust, and more meaningful representation of the trajectories, which is then fed into a one-class SVM to detect anomalies.

2.6.4 Reinforcement learning for anomaly detection

Reinforcement Learning (RL) is another type of machine learning model that learns through the exploration and exploitation of an agent or trainer. RL agents' strategies are based on executing actions and observing feedback from the environment in the form of positive or negative rewards. After receiving a reward for the performed action, the agent observes any changes in the environment and updates its policy in order to maximize future rewards [267]. The AD problem can be formulated as a reinforcement learning problem, where an autonomous agent interacts with the environment, takes actions (such as allowing or denying access), gets rewards from the environment (positive or negative rewards for correct or wrong predictions of an anomaly, respectively), and over this process learns to predict anomalies with a high level of accuracy [286]. Currently, there are a few studies that utilize RL to address AD challenges. For example, [188] applies forward RL to detect anomalies and [70] utilizes inexplicability scores to

find goal-directed trajectories. Furthermore, [215] proposes sequential anomaly detection using inverse reinforcement learning (IRL) in order to determine the decision-making agent's underlying reward function that triggers its behavior; the agent's normal behavior is then characterized by the reward function inferred via IRL. Also, a deep-reinforcement-learning-based approach that seeks novel classes of anomalies beyond the labeled training data is proposed in [220]. It is worth noting that most of the designed RL-based AD methods still require a predefined notion of reward signals. Also, depending on the feedback loop and reward design, they may need more than the current data item in order to define states. In general, RL is a great solution to problems that require making decisions that influence the world; for example, RL-based AD helps where the target agent is intentionally malicious or goal-directed [215]. Also, RL may be more useful for hyperparameter learning or data collection than for deciding whether an observation is normal or anomalous.

2.6.5 Hybrid methods

Ensemble and hybrid methods utilize the information fusion concept in slightly different ways. Ensemble models combine multiple homogeneous, weak models and typically apply various merging methods at the level of their individual outputs [147]. Hybrid methods primarily combine heterogeneous models and various machine learning approaches. Both techniques can lead to considerably higher accuracy and better-performing solutions. Individual methods are grouped together because the existing models:

• Lack sufficient data to represent the data distribution;

• Provide locally optimal results due to the dependency on starting points;

• Cannot be modeled based on a single hypothesis given the functionality of the data, which is better approximated by a weighted sum of several hypotheses.

In the TS analysis context there are multiple studies and frameworks that can be categorized as ensemble or hybrid methods. Among these approaches, several have applied very interesting logic to combine model-driven and data-driven methods to cover the potential weaknesses of both. Others have considered ANNs as suitable candidates for the data-driven mode; for example, ANNs have been combined with ARIMA for TS forecasting [152], water quality forecasting [89], and photovoltaic power generation forecasting [80]. Another hybrid method combines SVM, ensemble empirical mode decomposition, and the partial autocorrelation function to forecast wind speed [135], while a hybrid fuzzy model takes advantage of the Fuzzy C-means method for fuzzification and an ANN for defuzzification [80].

2.7 Summary

In this chapter, a variety of TS analysis algorithms in the context of TS-based anomaly detection were reviewed. To this end, I categorized them by their main paradigm and highlighted their main

features and approaches. A summary of the reviewed algorithms and their specific characteristics is presented in Tables 2.2 and 2.3. Notwithstanding its very rich history, TS modeling and anomaly detection is still an ongoing topic in machine learning research. Nowadays, embedded control systems for operating critical infrastructure interact with their physical environment by reading data signals and generating control signals in response. Hence, latent stochastic, or semi-stochastic, external factors influence the system behavior and complicate such predictive problems. Reviewing the existing work provided enough insight to identify the open challenges and promising directions for future research to improve the state-of-the-art approaches.

Original task | Approach | Strategy | Pros | Cons
--- | --- | --- | --- | ---
Direct anomaly detection | Unsupervised discriminative methods | Cluster available sequences to find rare cases | Label is not required | Highly dependent on similarity evaluation function; not applicable to stream data
Direct anomaly detection | Unsupervised parametric methods | Isolates rare generation probabilities | Label is not required | Not flexible; not applicable to stream data
Direct anomaly detection | Supervised methods | Classifies data based on their data | All of the unbalanced classifiers are applicable | Label is required (not practical)
Window-based anomaly detection | Normal pattern repository | Rates sequences based on the historical normal observations | Finds localized anomalies | What is the proper window size?; needs discretization
Window-based anomaly detection | Negative patterns repository | Keeps track of anomalous patterns to rate the new observations | Needs a smaller repository | Not able to detect zero-day attacks; what is the proper window size?; needs discretization
Window-based anomaly detection | Mixed patterns repository | Keeps track of both normal and anomalous cases | Takes advantage of both repositories | Higher space complexity; needs discretization
Subsequence-based anomaly detection | Tracking all the possible subsequences | Finds rare subsequences utilizing pattern matching | Keeps track of all the possible sizes in the observed subsequences | Not applicable to non-stationary cases; needs discretization

Table 2.2: Table of summary - Anomaly detection in a time series database

Original task | Approach | Strategy | Pros | Cons
--- | --- | --- | --- | ---
Point anomaly detection | Prediction-based methods | Computes anomaly scores based on deviation from the expectation | Strong prediction methods | Difficulties in distinguishing noise from anomalies
Point anomaly detection | Profiling-based methods | Computes anomaly scores based on the variance vector of comparing a new point vs the normal profile | - | Not applicable to non-stationary TS; needs discretization
Point anomaly detection | Deviant-based methods | Finds the points whose absence decreases the TS histogram's error bound | Able to find anomalous points in the middle of a sequence | Not applicable to non-stationary data; needs discretization
Sequence anomaly detection | Discord discovery | Finds the subsequences with the largest distance to their non-overlapping subsequences | Able to find variable-size anomalies | Expensive in big data; not applicable to non-stationary data; needs discretization
Multi-scale anomaly detection | - | Compares obtained patterns of subsequent points vs the other patterns | Applicable to irregular time series | Needs discretized sequences

Table 2.3: Table of summary - Anomaly detection in a time series

Chapter 3

An Anomaly Detection Algorithm

The primary purpose of this chapter is to systematically identify and analyze possible vulnerabilities as a step toward enhanced risk awareness and management. In other words, the study addresses situational awareness and the proactive side of cybersecurity, rather than the reactive side dealing with emergency response and digital forensics to mitigate the impact of attacks and identify the source [51]. Cyber situational awareness and assessment as discussed here are based on Big Data analytics used for anomaly detection, specifically methods for automatic detection of suspicious abnormal behavior pointing to a cybersecurity breach or other illegal activities. Intuitively, anomalies are isolated events in data that do not conform to expected or normal behavior. Given some set of observations, the problem of anomaly detection is that of separating a small minority of anomalous data from a large majority of normal data. In the study of cybersecurity, the more sophisticated task is differentiating suspicious anomalous behavior (any behavior that points to a potential threat to cybersecurity) as anomalies of interest from the set of anomalies considered irrelevant to security. The Hidden Markov Model (HMM) family comprises well-established and interpretable methods for modeling and recognizing temporal patterns as well as differentiating between normal and anomalous behavior of a target system [269], especially in cases where one needs to identify the unobservable state of the system based on its external observable characteristics. HMM-based anomaly detection methods are mostly evaluated based on likelihood measures or entropy [141, 269], which summarize and represent the overall observable evidence (i.e., the observations). However, in some real-world problems, it is crucial to put a magnifier on the observation sequence for the purpose of situation analysis. In this chapter, I propose a novel methodical framework, called AnomalyTracker, which learns the normal behavior of a water supply system in order to trace anomaly scores for each data point. The proposed anomaly detection framework aims at advancing the state of the art in temporal anomaly detection by making the following contributions: 1) Proposing a multi-granular anomaly detection approach by applying a hierarchical model with the possibility of detecting anomalies in different sizes of test windows. This approach not only enhances the true positive rate but also reduces the false negative rate. 2) Using an improved HSMM-based anomaly detection method with the possibility of detecting the exact anomaly point in a given data sequence. The proposed approach outperforms

HSMM-based anomaly detection methods that use log-likelihood or entropy measures [141, 269] or the number of mismatches [281]. 3) Addressing collective and contextual anomalies of the system, as opposed to the point-based anomalies [51] generally addressed in the literature [208].

3.1 Problem definition

This project is focused on proposing an efficient and interpretable anomaly detection algorithm to trace suspicious activities in supervisory control systems of critical infrastructures. The considered system's behavior is monitored over some time interval $T$, comprising $t$ consecutive time steps, and forms a multivariate time series $X_T = \langle x_t : t \in T \rangle$, $T \subseteq \mathbb{N}$. Intuitively, each element of this time series is a multivariate observation with $m$ observed features ($x_t \in \mathbb{R}^m$), meaning $x_i = (x_i^1, x_i^2, \ldots, x_i^m)$. Thus, an online and interpretable anomaly detection framework (ADF) using a Hidden Markov Model (HMM) is desired to spot any suspicious anomalous event in near real time by continuously analyzing the stream data. The goal is to distinguish the anomalous events, which likely correspond to a cyberattack:

$(x_{t+1} \mid X_T) \rightarrow \{\text{Normal}, \text{Anomaly}\}$   (3.1)

3.2 Background

This section discusses published work on anomaly detection in supervisory control systems and HMM-based anomaly detection.

3.2.1 Anomaly detection in supervisory control systems

Nowadays, critical infrastructure widely used for municipal water management, energy supply, oil and gas pipelines, public transportation, and beyond typically depends on computer networks and automated control systems to operate and monitor routine processes. Supervisory control systems or Industrial Control Systems (ICS), such as Supervisory Control and Data Acquisition (SCADA) systems, improve these services and their reliability. However, this reliance on ICS leaves critical infrastructure potentially vulnerable to targeted cyberattacks or accidental cyber events [7]. Any interruption in water management services caused by a cyber or physical attack could erode public confidence or, even worse, have significant public health or economic consequences [7, 117]. In water management services in particular, supervisory control systems generally control various equipment and monitor water transport, distribution, and treatment. Analog and digital input and output modules track and measure different attributes, including water levels in large reservoirs and tanks, water pressure, and flow in pipes [117]. Improving the security of supervisory control systems is not a new topic but an important one. A large body of literature exists on enhancing SCADA safety and security to prevent physical destruction as well as economic losses and threats to humans. Most of these studies are related to network intrusion detection systems using supervised approaches [116]. These methods are able to

detect attacks that use simple, previously known mechanisms, while detecting new malicious actions is not easy for such methods. A major drawback of intrusion detection-based methods is that they require prior knowledge of different types of attacks for efficient anomaly detection. Another category of work related to the security of SCADA systems is anomaly detection approaches using machine learning methods [208]. In such methods, the periodicity of SCADA traffic is used to define normal behavior and detect attacks on SCADA protocols as anomalous behavior. Hidden Markov Models (HMMs) and their variants are among the popular machine learning approaches extensively used for anomaly detection in time series. In the next section, I briefly review HMM-based anomaly detection approaches.

3.2.2 HMM-based anomaly detection

An HMM is a probabilistic model in which the target system is represented as a Markov process with hidden states. Unlike the simple Markov model, in a hidden Markov model the state is not explicitly visible, while the output depends on the state. As expressed in [226], "An HMM is a doubly stochastic process with an underlying stochastic process that is not observable (it is hidden), but can only be observed through another set of stochastic processes that produce the sequence of observed symbols." An HMM is specified by a triple $\theta = (\pi, P, b)$, where $\pi$ is the distribution of the initial state, $P$ is the transition matrix of the underlying Markov chain, and $b$ is the emission distribution, i.e., the conditional distribution of the observed variables given the unobserved (or hidden) states [59]. In some real-world problems, it is unrealistic to assume that the sojourn time, or mean waiting time, is geometrically distributed; instead, the probability of a state change depends on the time spent in the current state. A possible solution is to estimate the duration density $d$, which produces a Hidden Semi-Markov Model (HSMM). An HSMM is specified by a quadruple $\theta = (\pi, P, b, d)$, where $\pi$ is the distribution of the initial state, $P$ is the transition matrix of the underlying Markov chain, $b$ is the emission distribution, and $d$ estimates the duration density [59]. The HMM is one of the most widely used traditional approaches for anomaly detection and has been applied across different domains. Using HMMs for intrusion detection was first proposed by Warrender et al. [281], who propose an HMM-based approach to detect anomalous program traces in operating system call data. In this work, an HMM is trained using the training sequences and tested using two methods. In the first method, the likelihood of a test sequence being generated by the learned HMM is computed using the Viterbi algorithm. In the second method, the underlying Finite State Automaton (FSA) of the HMM is used to count the number of times that the HMM makes an unlikely state transition or outputs a mismatch; the number of mismatches then indicates the anomaly score of the input sequence. In [294], a hierarchical HMM-based anomaly detection approach is proposed which decides based on the number of mismatches using a second-level HMM. In [225], the authors build a database of state transitions of normal sequences obtained from a fitted HMM. Given a test sample, the method then compares its state transitions to the normal ones

in the database. The test sample is considered an anomaly if the number of mismatches exceeds a threshold. In [141], the normal behavior of user browsing and network-wide traffic is modeled using an HSMM to detect rare sequences. The authors of [202] apply an HSMM in the same problem domain to detect anomalies based on the likelihood difference of every sequence in comparison with normal behavior. In this work, I present an HSMM-based multi-granular anomaly detection approach using a hierarchical model. The closest group of methods to this work, particularly in regard to computing the anomaly score based on the predicted FSA, are the two basic methods presented in [225, 281]. Unlike [225], which maintains a repository of normal sequences of states to evaluate the anomaly score of test windows regardless of contextual information, my method creates an online normal sequence of states based on the context of the test windows in order to test and identify their anomaly level. Therefore, my method is stronger at detecting contextual anomalies, as discussed in Section 3.4.2. Similar to [281], my method evaluates the anomaly score of each data point using an FSA. However, as I explain in Section 3.4.2, due to specific properties of the decoding phase, the number of mismatches cannot be a reliable criterion for determining the actual anomaly score. This issue can be addressed by evaluating the anomaly score of each point individually and switching to regression-based anomaly detection in cases where the HMM cannot follow the real trend of the data. Among the studies related to anomaly detection in supervisory control systems data, [208] addresses the same problem as I do and proposes an SVM-based classification approach for this purpose. In [208], a labeled dataset is used, which is not the case in this domain and usually leads us to apply unsupervised or semi-supervised approaches. The method proposed in [208] also ignores the temporal aspect of the data, affecting its performance and usability in detecting contextual anomalies.
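For reference, a generic log-space Viterbi decoder for a discrete-emission HMM with parameters $(\pi, P, b)$ can be sketched as follows; this is a textbook illustration, not the thesis implementation, and the toy parameter values are arbitrary.

```python
import numpy as np

def viterbi(observations, pi, P, B):
    """Most likely hidden-state path for a discrete-emission HMM (pi, P, B),
    computed in log space via dynamic programming."""
    logpi, logP, logB = np.log(pi), np.log(P), np.log(B)
    n_states, T = len(pi), len(observations)
    delta = np.zeros((T, n_states))
    psi = np.zeros((T, n_states), dtype=int)
    delta[0] = logpi + logB[:, observations[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] + logP          # (from_state, to_state)
        psi[t] = trans.argmax(axis=0)                 # best predecessor per state
        delta[t] = trans.max(axis=0) + logB[:, observations[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                     # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
P = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])                # rows: states, cols: symbols
print(viterbi([0, 0, 1, 1, 1], pi, P, B))             # most likely state path
```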

3.3 Addressed Challenges

Generally speaking, most of the previously proposed HMM-based techniques suffer from three major limitations, which are addressed in this study. First, to score and find anomalies, most related studies use measures such as log-likelihood [141] or entropy [202], while others use the frequency of mismatches [281] or a database of normal state transitions [225]. However, some types of anomalous scenarios cannot be efficiently detected using these measures (see Section 3.4). Particularly in adversarial settings, adversaries try to blend in with the distribution of normal points [82]. When the attacks (targets of interest) are not confined to extreme anomalies, or when the extreme anomalies are not true anomalies, as in the swamping phenomenon, the anomalies of interest will be confused with normal points or with uninteresting anomalies. Therefore, relying only on the sequence-based likelihood to find anomalous or suspicious activities would be insufficient to detect attacks. Second, none of the existing HMM-based techniques can address point-wise anomaly detection or specify the exact point of an anomaly. Likelihood-based techniques give an idea about the overall

condition of the test window, while sometimes it is very valuable to locate the exact timestamp of anomalous events in order to identify the criticality level. Third, finding the right size and granularity for the input window is a fundamental and highly challenging issue in statistical and ML models. The input window should be long enough to be informative and representative of a system's behavior, meaning that a window covering all of the change-points provides more knowledge. However, different methods have different capacities; sometimes, more information may misguide a method and diminish the role of closer intervals. The granularity level of the reconstructed trajectories also matters: a very fine-grained input may decrease the performance, whereas a coarse-grained sampling rate may limit the real-world practicability. The proposed method solves these challenges by utilizing a hierarchical model with the possibility of detecting anomalies in different sizes of test windows.

3.4 Proposed Method

AnomalyTracker is proposed to: 1) learn the behavior of water stations from historical data; 2) use the extracted patterns to detect anomalies at the current time; and 3) predict the future state of water stations. The definitions of test window and reference window are given below, followed by a high-level overview of AnomalyTracker, which comprises four main components (see also Figure 3.1). Basic Definitions:

• Test Window (TW) is defined as a period of time which is under investigation to detect any possible anomaly.

• Reference Window (RW) is defined as the most recent genuine (i.e., normal) window preceding a TW, which is also in approximately the same context (e.g., temperature and time interval) as the TW.

Approach Overview:

• Behavior Tracking: This step constructs a hierarchy of Markovian models representing the various-length baseline behaviour of the system based on historical data, to be used for anomaly detection in the test data.

• Threshold-based Filtering: This step applies a rule-based method on the test data to filter out outliers, i.e., values in the time series that fall outside the range of predefined thresholds.

• Anomalous Pattern Matching: Given a TW, a set of test sequences is constructed by extending the TW using its various-length prefixes. This way, the TW can be examined against existing various-length anomalous or misuse pattern profiles. If any of the test sequences matches an existing profile, the TW is flagged as suspicious and ranked based on the criticality level of the matching pattern.

Figure 3.1: Control Flow between Analytic Components

• Fine-grained Anomaly Tracking: If none of the test sequences of the TW matches any anomalous profile, then the TW must be analyzed against the baseline behavior, represented in the hierarchy of Markovian models, to detect any possible unknown anomalies. This step measures and compares the distance between a given TW and a set of RWs to decide on the existence of any possible anomaly and its criticality level.

3.4.1 Behavior tracker

The state of water stations changes with meaningful patterns. Analyzing water station states shows a long-range dependency property for different features; for instance, one can see such dependency at the scale of several minutes for the flow-level feature. Since there is a high dependency between the values of consecutive observations in the time series of water station states, I use an HMM to model the behavior of water stations. On the other hand, the state duration distribution is not uniform, making it difficult for a standard HMM to model water station behavior. Among the different variations of HMMs, I specifically employ HSMMs, which are able to successfully address the long-range dependency phenomenon and model state duration in continuous time series, a characteristic of water station states. The proposed behavior tracker customizes HSMMs to overcome challenges related to the water stations' characteristics, including the seasonality effect and irregularity, and to learn their behavior more efficiently. These two issues and the proposed solutions are described below.

Irregularity. The water station data are distributed unevenly in time, meaning that timestamps are not periodic, which results in so-called irregular time series. A standard HSMM is not able to handle such time series because it relies on the assumption that observations are recorded at evenly spaced time intervals [101]. However, in many real-world applications, capturing observations at exact timestamps is not feasible. To address this issue, I propose a different approach: adding a new derived feature, the time difference between every two consecutive observations, and training the model on the resulting multivariate time series. There is a latent correlation between the time-difference feature and the original features, such as the flow level, which helps the HSMM overcome the irregularity.
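A minimal sketch of this feature derivation is shown below; the function name, the toy timestamps, and the flow-level readings are illustrative assumptions, not data from the thesis.

```python
import numpy as np

def add_time_delta_feature(timestamps, values):
    """Augment an irregularly sampled series with the gap between consecutive
    observations, so a duration-aware model can be trained on the resulting
    multivariate series."""
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float).reshape(len(values), -1)
    deltas = np.diff(timestamps, prepend=timestamps[0])   # first gap is 0
    return np.column_stack([values, deltas])

ts = [0.0, 1.0, 1.5, 4.0, 4.2, 9.0]          # uneven sampling times (e.g., minutes)
flow = [3.1, 3.0, 3.2, 2.8, 2.9, 3.4]        # hypothetical flow-level readings
print(add_time_delta_feature(ts, flow))
```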

Seasonality Effect. The seasonality effect refers to the occurrence of any periodic variation in the time series data [79]. For instance, the covariance between temperature and flow level confirms the seasonal pattern that flow levels peak during the summer and decline toward the winter. Other seasonality patterns related to people's lifestyles can be seen, as described in the next section. It has been noted in the literature that no single predictive method remains continuously successful over a long period of time [113]. To address this issue, I propose an agglomerative hierarchical approach. Starting from a short time interval, which includes more similar data points, I train an HSMM representing the water station behavior for that interval. Time intervals are then merged into larger and larger intervals until all data points are in a single sequence, and during this merging procedure one HSMM is constructed in each step for each time interval. This way, patterns can be extracted for different time intervals, which may differ substantially from the overall pattern. The HSMMs learned from shorter time intervals are located in the lower layers of the hierarchy; they include a more detailed description of the water station behavior but are very context sensitive. In contrast, the HSMMs learned from longer time intervals are located in the upper levels of the hierarchy; they have a more global view of water station behavior and do not lose information in the transitions between shorter time intervals, although their performance is more easily affected by seasonality patterns. Combining HSMMs learned from shorter and longer time windows resolves the overfitting/underfitting problem.

3.4.2 Fine-grained anomaly tracker

Markovian models are well established for anomaly detection in different real-world domains. In this section, I propose an approach to detect anomalies of interest and infer anomaly scores using a combination of Markovian (e.g., HMM and HSMM) and regression models. The Viterbi decoding algorithm is utilized to define a metric for measuring the criticality level of detected anomalies. Viterbi is a dynamic programming algorithm that recursively generates the most likely sequence of hidden states for a given sequence of observations using the principle of optimality [24]. In many real-world scenarios, the decoded states of a normal observation sequence are expected to follow the time series trend with negligible differences; however, this statement is not always valid, especially in cases where the observed normal sequence diverges strongly with

respect to the baseline model.

Figure 3.2: Drawbacks of Log-likelihood and Mismatch Counting in HMM-based anomaly detection

Many HMM-based anomaly detection methods use measures such as log-likelihood [141], entropy [202], or the number of mismatches [281], while some types of anomalous scenarios cannot be efficiently detected using such measures. Let us briefly consider two scenarios highlighting the weakness of the log-likelihood measure. Figure 3.2a illustrates the first eight hours of a day, in which the normalized negative log-likelihood of the fitted model is 1.61 with a standard deviation of 0.2 for about 80% of observations, meaning observation values between 1.41 and 1.81 are normal. The normal trend is shown by the green dotted line, with an average negative log-likelihood of 1.501 and one improbable emission. The predicted state sequence is shown by the blue dashed line. The red line shows a possible anomalous scenario which starts before 4:00 AM and ends 10 minutes before 7:00 AM. The Viterbi algorithm decodes the red sequence with the order of states shown in purple, with a negative log-likelihood of 1.56 and one improbable transition after the anomaly interval. This example illustrates that some anomalous cases do not change the likelihood (entropy) in a way that helps anomaly detection. In Figure 3.2b, Viterbi does not detect any improbable transition or emission and the normalized negative log-likelihood is 1.68, which misleads toward a normal window. The weaknesses of the likelihood measure are:

• On the one hand, some anomalous scenarios do not change the likelihood much, particularly when parts of the scenario also appear in the overall normal behavior; on the other hand, likelihood is highly sensitive to very rare observations, which may simply be noise.

• Likelihood gives an idea about the overall condition of the test window, while it is sometimes very valuable to locate the exact timestamp of anomalous events in order to identify the criticality level.

Therefore, I propose a novel method, called the state-based sequence analyzer, to investigate a TW based on its related RWs. Considering the context (i.e., flow level and temperature) of the TW, the closest model in the leaf nodes of the hierarchy is selected to decode the most likely state sequence of the TW using the Viterbi decoder. Since patterns in real-world data are not identical, in order to increase the confidence factor, it is essential to expand the TW with prefixes of various lengths so that the Markovian models in the higher levels of the hierarchy can also be used. Based on the context of the TW, one of the higher-level models is used to generate the sequence of states corresponding to the observations in the larger window, as well as the overall sequence of transitions of the test window TW and all of the reference windows (RWs).

State-based sequence analyzer. The behavior tracker is expected to follow the overall trend of data in the RWs. This phase decodes and compares the most probable state sequences of RWs and TW based on the hierarchy of Markovian models to detect possible anomalies of interest.

Step 1. Since trained Markovian models can diverge from the real world over time, we first need to verify that these models are able to fit genuinely normal RWs. Therefore, the observation sequence of each RW is reconstructed, yielding a decoded sequence, using the Viterbi algorithm and the emission matrices, and the corresponding observations in the real and decoded sequences are compared. Then, the number of mismatched cases, in which the difference between the real and decoded observation is greater than a threshold, is counted, and the sequence is marked as unfitted if the number of mismatched cases is higher than an expected value.

Step 2. If the weighted rate of unfitted RWs is higher than expected, meaning that the Markovian models are not up to date and need to be retrained, then the regression-based sequence tracker should be used to compare the TW with the RWs; otherwise, the algorithm proceeds to the next step.

Step 3. For each sequence RW_k in the reference group, the sequence of states fitted on the TW is first compared with the predicted ones for RW_k to calculate the distance between the TW and the RWs as follows:

\forall o_i \in TW:\quad d_{ik} = |o_i - x_n| \quad \text{where } x_n \in RW_k \;\wedge\; n = \arg\min_j \left| \frac{T_{x_j} - T_1^k}{T_M^k - T_1^k} - \frac{T_{o_i} - T_1}{T_M - T_1} \right| \qquad (3.2)

T_1^k and T_M^k are the minimum and maximum timestamps in RW_k; T_1 and T_M are the minimum and maximum timestamps in the TW. The mismatched observations are then determined as follows:

d_{ik} \geq \text{confidence} \times (\sigma_{s_{o_i}} + \sigma_{s_{x_j}}) \;\Rightarrow\; map_{ik} = \text{false} \qquad (3.3)

Figure 3.3: (a) Water level for the year of 2013; (b) Water level for the days of the weeks of July 2013.

The number of false values of map_ik gives the mismatch rate, which is used in reporting the anomaly score of the TW sub-windows.
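To illustrate, the sketch below applies Equations 3.2 and 3.3 to compute the mismatch rate of a test window against one reference window. The temporal alignment follows Eq. 3.2 as reconstructed above, and all names, the confidence value, and the input layout are illustrative assumptions.

```python
import numpy as np

def mismatch_rate(tw_vals, tw_times, rw_vals, rw_times,
                  state_sigma_tw, state_sigma_rw, confidence=2.0):
    """Fraction of TW observations that do not match their aligned RW counterpart.
    *_vals: observation values, *_times: timestamps (NumPy arrays, same units),
    state_sigma_*: emission std of the decoded state for each observation."""
    t1, tM = tw_times[0], tw_times[-1]
    r1, rM = rw_times[0], rw_times[-1]
    # Relative position of each observation inside its own window (cf. Eq. 3.2).
    tw_rel = (tw_times - t1) / (tM - t1)
    rw_rel = (rw_times - r1) / (rM - r1)
    mismatches = 0
    for i, (o_i, rel_i) in enumerate(zip(tw_vals, tw_rel)):
        n = int(np.argmin(np.abs(rw_rel - rel_i)))      # temporally aligned RW point
        d_ik = abs(o_i - rw_vals[n])
        # Eq. 3.3: a mismatch if the distance exceeds the confidence-scaled sum of
        # the decoded states' emission standard deviations.
        if d_ik >= confidence * (state_sigma_tw[i] + state_sigma_rw[n]):
            mismatches += 1
    return mismatches / len(tw_vals)
```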

Regression-based sequence analyzer. This phase constructs and trains a weighted regression model on the RWs. The difference of each observation of the TW from the regression prediction is calculated to compute the Least Square Error (LSE) of the TW. If this error is higher than expected with respect to the training data, then the TW is reported as an anomaly, and its main anomalous sub-windows are highlighted.
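A minimal sketch of such a check is given below, using a weighted polynomial regression as a stand-in for the regression model; the polynomial degree, tolerance factor, and the number of highlighted sub-windows are illustrative assumptions rather than the thesis configuration.

```python
import numpy as np

def regression_based_check(rw_times, rw_vals, rw_weights, tw_times, tw_vals,
                           degree=2, tolerance=3.0, top_k=5):
    """Fit a weighted regression on the reference windows and flag the test window
    if its squared error is far above the error level seen on the training data."""
    coeffs = np.polyfit(rw_times, rw_vals, deg=degree, w=rw_weights)
    model = np.poly1d(coeffs)
    train_lse = np.mean((rw_vals - model(rw_times)) ** 2)
    tw_errors = (tw_vals - model(tw_times)) ** 2
    is_anomaly = np.mean(tw_errors) > tolerance * train_lse
    # The largest per-point errors point to the main anomalous sub-windows.
    worst_points = np.argsort(tw_errors)[::-1][:top_k]
    return is_anomaly, worst_points
```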

3.5 Dataset Characteristics

The experiments are performed on a dataset from the SCADA-based water supply system of the City of Surrey in BC, Canada, for the time period 2011–2014. This dataset contains the historical log files of various water stations, including drinking water, drainage, sanitary, and vacuum stations. For this study, I use the data of the water stations in the form of multivariate time series for the mentioned time period. For each water station, at a specific time, there are several features, including inflow, outflow, water level, temperature, and running status. Seasonal fluctuations can be a major reason for variations in residential water consumption. Through analyzing the dataset, I determined the seasonality patterns at various levels of aggregation, which contributes to detecting irregular patterns more efficiently. One can observe strong daily, weekly, and monthly as well as seasonal patterns in the water consumption.

Figure 3.4: (a) Water level for four different weekdays of July 2013; (b) Water level for four different Fridays of July 2013; (c) Water level for four different weekends of July 2013.

Figure 3.5: (a) Opposite trends in comparison of the water level of months in different years; (b) Opposite trends in comparison of the water level of seasons in different years.

For instance, each year can be divided into high-demand and regular-demand seasons. Weather, specifically the outside temperature, is the primary factor in determining the partitioning time. As expected, water demand increases with higher temperatures. As Figure 3.3a shows, the flow level in the high-demand season (May to September) is higher than in the rest of the year. In AnomalyTracker and the related experiments, I divide the training data into high and regular seasons. Additionally, as Figure 3.3b shows for the flow level of the four weeks of July 2013, one can observe different usage patterns for different days of one week but a similar pattern for the same days in all four weeks. Interestingly, there is a different pattern on Fridays compared to the other four weekdays. The weekends have their own specific patterns in terms of trend and peak times. These patterns are more obvious in Figures 3.4a, 3.4b and 3.4c, which show the similarity of patterns within each of the three categories (weekdays, Fridays, and weekends) as well as the significant differences when comparing one category with another. For instance, the peak times on weekdays are early morning and evening, while these are shifted on the weekends. Moreover, one can observe that the flow level at nighttime is lower than at daytime regardless of the day of the week. AnomalyTracker considers these extracted patterns to fit the best model to the data. Another important pattern is observable when comparing the pattern of changes in two different time intervals. For instance, in Figure 3.5a, the flow level in July 2013 exceeds the one in July 2012, while we see an opposite pattern for the month of January, where the flow level in January 2013 is less than in January 2012. A similar phenomenon is depicted in Figure 3.5b, where the trends of the flow level for the summer and winter seasons are different.

Figure 3.6: Time difference with respect to the water level on a sample day (July 25th)

Therefore, simply deciding based on the overall trend of the flow level over the years is not enough; rather, we need to consider the temporal context. Time series data are typically assumed to occur at regular time intervals, but the data used in this study are irregular, meaning that the measurements of the water station features did not happen at regular time intervals. We can see this irregular pattern in Figure 3.6, where the water flow level and the time difference between each pair of consecutive data points are shown. On the other hand, Figure 3.7a displays the periodic pattern of the time difference over four years, which indicates hidden patterns in the time difference sequences. Moreover, Figure 3.7b shows at a finer level that the distribution of time intervals over a regular-demand season is more scattered than over a high-demand season.

3.6 Experimental Evaluation

This section presents the experimental results for evaluating the proposed method in comparison with the baseline methods. To evaluate the performance of the anomaly detection methods, three measures are used: precision, recall, and F-measure.

3.6.1 Experiments setup

In this study, I use the records of the first three years for training the anomaly detection model and the last year to generate the test data. The dataset is unlabeled and does not include anomalous cases confirmed by domain experts. This set of experiments uses three main features of the water stations: the water flow level, the temperature of the water station, and the time difference between each pair of consecutive data points, denoted by FlowLevel, Temperature, and TimeDifference, respectively. The test samples are generated in three ways. Anomalous Samples. For evaluating the performance of AnomalyTracker, a set of possible anomaly scenarios should be defined. For each scenario, anomalous cases are generated and inserted into the normal data, and the augmented data is used as test data. In collaboration with domain experts, four types of out-of-distribution scenarios are identified:

Figure 3.7: (a) Time difference of data points for the years 2011-2014; (b) Time difference of data points for the year 2013.

• Maximum Flow. Every water station can produce a range of flow and pressure. Water flow exceeding the maximum flow capacity should be considered as an anomaly and a critical situation for the water network. To test the AnomalyTracker performance in detecting this type of anomaly, extreme noises with values greater than the maximum flow capacity are added.

• Minimum Flow. We are able to learn the minimum flow level of a water station from the historical data. If the water flow level falls below the predefined minimum flow level, it is an anomaly that can risk water network operation. To experiment on this type, distributed synthetic noises with values smaller than minimum flow levels are added.

• Continuous Overflow. Assuming that the flow level can be predicted successfully, we may continuously encounter cases with the real flow level greater than the expected value but less than the maximum flow capacity. Although this type of anomaly is not as harmful as exceeding the maximum capacity, it should be considered as an anomalous case.

• Frequent Overflow. This type of anomaly is similar to the continuous overflow anomaly but happens intermittently rather than continuously.

For each of the above categories, 10000 samples are extracted from the data of the year 2014 and noisy data are added to each of them as representative of anomalous behavior.
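To make the injection procedure concrete, the following is a simplified sketch of how the first two anomaly categories could be inserted into normal 2014 sequences; the injection rate and noise magnitudes are illustrative assumptions, not the exact values used in these experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def inject_max_flow(sample, max_capacity, rate=0.05):
    """Insert extreme values above the station's maximum flow capacity."""
    out = sample.copy()
    idx = rng.choice(len(out), size=max(1, int(rate * len(out))), replace=False)
    out[idx] = max_capacity * (1.0 + rng.uniform(0.1, 0.5, size=len(idx)))
    return out

def inject_min_flow(sample, min_level, rate=0.05):
    """Insert values below the minimum flow level learned from historical data."""
    out = sample.copy()
    idx = rng.choice(len(out), size=max(1, int(rate * len(out))), replace=False)
    out[idx] = min_level * rng.uniform(0.3, 0.8, size=len(idx))   # assumes min_level > 0
    return out
```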

Manipulated Samples. To create a real-data test sample, I extract a time series of records from the year 2014 (not used for training the model) and then remove a continuous subset of size 100 from the extracted time series. In other words, a manipulated sample is a subset of the 2014 records from which a continuous run of 100 records has been removed in order to generate an abnormal sequence. Although a manipulated sample does not belong to one of the above-mentioned anomaly types, one would expect a strong model to differentiate a manipulated sample from a normal sample. This set includes 1000 different samples. Noisy Test Data. For a more realistic evaluation of HMM-based methods in which log-likelihood is used to detect anomalies (these methods are not able to identify and differentiate between the four anomaly types mentioned above), a separate test set, called noisy test data, is generated. Each sample of this test data includes at least 15% contextual noisy points or out-of-range values representing an anomaly sequence. This test data includes 1000 samples.

3.6.2 Evaluation of HMM-based models

This section compares the performance of variations of HMMs. This experiment helps to select the strongest version of HMMs to be used in AnomalyTracker. Below is the list of evaluated HMM variations:

• U-HMM. This is a univariate HMM trained using the FlowLevel feature as the main feature representing the behavior of water supply system.

• U-HSMM. This method is a univariate HSMM trained using the FlowLevel feature.

• M-HMM. This is a multivariate HMM fitted using the FlowLevel, Temperature and TimeDifference features. The Temperature and TimeDifference features are supposed to provide more contextual knowledge about the reasons for seasonality in the water flow level.

• M-HSMM. This is a multivariate HSMM fitted using FlowLevel, Temperature and TimeDifference features.

• HM-HMM. This method is a hierarchical multivariate HMM working on three layers of daily, weekly, monthly time intervals.

• HM-HSMM. This is a hierarchical multivariate HSMM working on three layers of daily, weekly, monthly time intervals.

I define two experiments to study the performance of the HMM variations. Experiment 1. Each HMM variation is fitted to the training data, and their log-likelihoods are then compared for a set of test data sequences; the method with the smallest average negative log-likelihood fits the data best. To perform this experiment, the manipulated samples are used. As presented in Table 3.1, HM-HSMM outperforms the other methods.

The train-based interval approximates the interval of log-likelihood for a randomly selected group of training sequences under the fitted model. A less scattered distribution of log-likelihood shows that the fitted model follows the trend of the training data better. As shown in Table 3.1, the train-based interval of HM-HSMM is the densest among all studied methods. The superior performance of HM-HSMM has three main reasons: 1) Using HSMM instead of HMM: due to the long-range dependency of the data, the statistical distribution of state duration in the HSMM works better than the explicit self-transitions of the HMM. As the results show, even in the same setting, U-HSMM and M-HSMM outperform U-HMM and M-HMM, respectively. 2) Applying multivariate data instead of univariate: the Temperature and TimeDifference of the observed flow levels are used as contextual information for modeling the behavior of the water supply system. The stronger performance of the multivariate methods compared to their univariate counterparts shows the latent relation between the mentioned contextual features and the FlowLevel feature in the context of time. 3) Applying a hierarchical model: modeling data based on windows of different granularity gives the model the possibility of fitting the data better in the more relevant context. The experimental results in Table 3.1 justify this argument, where HM-HSMM and HM-HMM outperform the other methods. A small sketch of the scoring procedure is given after Table 3.1.

Table 3.1: Performance of variations of HMM using anomalous samples

Model      Negative Log-likelihood   Train-based Interval
U-HMM      1.843                     [1.508, 2.075]
U-HSMM     1.784                     [1.493, 1.998]
M-HMM      1.732                     [1.562, 1.984]
M-HSMM     1.691                     [1.426, 1.821]
HM-HMM     1.662                     [1.502, 1.825]
HM-HSMM    1.593                     [1.447, 1.709]
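The sketch below shows how such a normalized negative log-likelihood could be computed with the hmmlearn package. The thesis does not name its implementation, an HSMM would require a dedicated library (a Gaussian HMM stands in here), and the number of states and the train_windows/test_windows variables are illustrative assumptions.

```python
import numpy as np
from hmmlearn import hmm

def normalized_neg_log_likelihood(model, sequences):
    """Average per-observation negative log-likelihood over a set of sequences."""
    scores = [-model.score(seq) / len(seq) for seq in sequences]
    return float(np.mean(scores)), float(np.std(scores))

# Fit a multivariate Gaussian HMM on training windows (stand-in for M-HMM).
# train_windows / test_windows: assumed lists of (T_i, d) observation arrays.
train = np.vstack(train_windows)
lengths = [len(w) for w in train_windows]
model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=100)
model.fit(train, lengths)

mean_nll, std_nll = normalized_neg_log_likelihood(model, test_windows)
```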

Experiment 2. In the second experiment, the noisy test data is used to compare the performance of the HMM variations in detecting anomalies. As shown in Table 3.2, HM-HSMM outperforms all other methods in terms of precision, recall, and F-measure, and U-HMM has the worst performance. For the same reasons mentioned in Experiment 1, HM-HSMM achieves the superior performance. Using a hierarchical HMM results in a higher recall, since the anomaly score of each data point is verified by referring to higher-level models and the final decision is made based on more contextual knowledge. While in security-related data mining applications the primary goal is generally achieving high recall, HM-HSMM gains higher precision without losing recall.

3.6.3 Baseline methods

In this part of the experiment, AnomalyTracker is compared with the following baseline methods: ARIMA-based anomaly detection. ARIMA [51] is a linear time series forecasting model that works based on the linear dependency of the future values on the previous data.

Table 3.2: Performance of variations of HMM using noisy test data

Model      Precision   Recall   F-measure
U-HMM      70.61       71.6     71.10
U-HSMM     76.06       75.6     75.83
M-HMM      71.79       72.8     72.29
M-HSMM     75.28       79.2     77.19
HM-HMM     76.99       79       77.98
HM-HSMM    78.87       81.4     80.12

SARIMA-based anomaly detection. SARIMA is a variation of ARIMA that considers seasonality aspects [39]. Local Outlier Factor (LOF). LOF is a density-based method that compares the local density of a point and its neighbors to find the isolation degree of the point [37]. Log-likelihood-based HSMM. This method assigns an anomaly score to test windows based on the difference between the log-likelihood of their state sequence and a normal interval [141]. Mismatch-based HSMM. This method counts the number of mismatches and assigns an anomaly score to test windows based on the number of improbable emissions and transitions in their predicted state sequence [281].

For running this experiment, the anomalous samples are used. As presented in Table 3.3, AnomalyTracker significantly outperforms all baseline methods, improving on the overall second-best method by 19%, 8%, and 14% in precision, recall, and F-measure, respectively.

Table 3.3: Performance of baseline methods

Method                       Precision   Recall   F-measure
LOF                          67.09       79.76    72.86
ARIMA                        62.66       68.03    65.24
SARIMA                       65.21       76.3     70.32
Log-likelihood-based HSMM    63.78       84.37    72.65
Mismatch-based HSMM          55.75       65.76    60.34
AnomalyTracker               82.36       92.13    86.97

The stronger performance of AnomalyTracker has two important reasons: 1) hierarchical modeling gives AnomalyTracker the possibility of following the trend of the flow level in various contexts using relevant customized models; 2) the regression-based module provides it with more knowledge of the trend of the data, which addresses and covers important weaknesses of HSMM-based models. Using this module, AnomalyTracker is able to track the actual trend of the data in cases where the HSMM is not able to do so. The experimental results show that the combination of hierarchical HSMM and regression can follow the trend of the data in any specific context very well in comparison with the other methods. As discussed in Section 3.4.2, an important advantage of AnomalyTracker is detecting the exact point of

anomalies rather than only differentiating between anomalous and normal data sequences, which is not the case for the studied baseline methods. Log-likelihood-based HSMM outperforms mismatch-based HSMM, meaning that simply counting the number of mismatches [281] is not efficient in this problem domain and log-likelihood-based HSMM is a more reliable approach. Comparing SARIMA and ARIMA, we again notice the importance of handling the seasonality issue, as SARIMA outperforms ARIMA. The result of LOF shows that this method is not successful in our problem domain since it relies only on the statistical divergence of the data and is not able to handle the seasonality and trend of the data. The performance of AnomalyTracker for detecting the different anomaly types is presented in Table 3.4. AnomalyTracker is able to detect all maximum flow and minimum flow anomalies, while it has weaker but still good performance, with recalls of 89% and 85%, for the more complex anomaly types, continuous overflow and frequent overflow. Detecting these types of anomalies is more challenging since their behavior is very similar to normal sequences in the same context. A successful anomaly detection method for detecting these complicated anomalies should be very sensitive to small changes, which can increase the false positive ratio and consequently decrease the precision value. While in security-related data mining applications the focus is on increasing the recall value to detect as many positive samples as possible, AnomalyTracker is able to present a high precision value comparable to its recall value, which is a significant advantage of this method.

Table 3.4: Performance of AnomalyTracker for different anomaly categories

Anomaly Type          Precision   Recall   F-measure
Maximum Flow          100         100      100
Minimum Flow          100         100      100
Continuous Overflow   81.49       89.1     85.13
Frequent Overflow     77.91       84.76    81.19

3.7 Conclusions

This chapter proposed AnomalyTracker, a method that traces anomalies in SCADA-based systems using an improved variation of hierarchical HSMM. AnomalyTracker is able to detect various types of anomalies in the context of critical infrastructure protection. Considering the trend of transitions between data points boosts the performance of AnomalyTracker to significantly outperform the baseline methods in detecting collective anomalies. An important advantage of AnomalyTracker compared to the existing methods in the literature is the use of a multi-granular approach for verifying the anomaly score of the test windows under investigation. Furthermore, switching to the regression-based mode in cases where the HSMMs are not able to follow the trend of normal data is another novelty of the proposed method. To evaluate AnomalyTracker, real-world data from a municipal water supply system is used, and extensive experiments are performed in which AnomalyTracker outperforms

the best baseline method by 19%, 8%, and 14% for the precision, recall, and F-measure metrics, respectively. Cybersecurity research is a principal area of innovation to support our economy and the security of societies worldwide. Building a situational awareness platform that addresses the various elements of physical and cyber threats to critical infrastructure is crucial and cannot be overlooked. We believe the proposed situation analysis framework (CSAF) and the anomaly detection approach (AnomalyTracker) can open new perspectives in the cyber-physical security of common and widely deployed critical infrastructure.

Chapter 4

A Novel Behavior Modeling Algorithm

Nowadays, automation is essential for operating equipment and monitoring the condition of machinery, production processes, and plants; it enhances the efficiency and quality of service delivery, safe operation, and also the protection of critical assets in case of internal or external disruptions, for instance, those caused by system failures, physical attacks, or cyberattacks [10, 152, 292, 300]. Situational awareness requires continuous situation analysis on massive volumes of time series data from heterogeneous sensor networks. The system behavior being observed is usually stochastic, making next-value prediction, or Time Series Forecasting (TSF), a tricky and challenging task that often needs customized methods. Although the general TSF problem has a rich history within the data mining and statistical machine learning community [35, 205], and a variety of stand-alone as well as ensemble methods have been proposed to handle the problem [55, 76], it is still one of the pivotal topics in the context of time series analysis. Embedded control systems for operating critical infrastructure interact with their physical environment by reading data signals and generating control signals in response. Hence, latent stochastic, or semi-stochastic, external factors influence the system behavior and complicate the prediction problem. Devising a TSF method that has high average accuracy as well as small error deviation is the main goal of this chapter. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Beyond accuracy-related measures, the quality of the forecasting method is essential, meaning that for each data point there is a reliable estimation of the next value [35, 60, 96]. For example, anomaly detection is an important application of prediction that needs a qualified value estimation; otherwise, measures like MAE (Mean Absolute Error) are not helpful in setting any threshold for differentiating anomalies from normal observations, increasing the risk of high false alarm rates. The same scenario applies to quality-of-service applications. Advanced statistical methods address different characteristics of time series by transforming non-stationary time series into stationary ones, enabling process modeling and prediction [35, 67]. The main drawback of such parametric methods is their restriction to a predefined model, which is an ideal case that often does not reflect reality. System behavior may be affected by external factors that are difficult to capture; for example, the behavior of an overloaded device may deviate so far from

what is considered normal operation that it becomes impracticable to reason about the effective variables [195]. Further, unknown latent factors can lead to heteroscedasticity and non-Gaussian errors, contradicting the basic premise of parametric methods. Thus, under such circumstances, TSF is difficult for parametric methods, even for very near subsequent steps. This chapter proposes a novel Multi-Branch Predictor Framework (MBPF) for time series analysis and prediction as an ensemble of deep learning and a parametric method called TBATS, which stands for Trigonometric Box-Cox transform, ARMA errors, Trend, and Seasonal components [67]. Compared to nonparametric methods, parametric methods have certain advantages for time series prediction: they are more accurate in linear parts and also more robust against overfitting. In the first phase, both TBATS and the deep network are trained individually. To address the problem of possible overfitting and keep good generalization, the deep network is trained using transformed versions of the data based on various seasonality patterns and statistical properties. Then, in the second phase, the results obtained from deep learning and TBATS are exploited by applying a deep neural network (NN) as the backbone for voting between the two methods, taking advantage of its ability to discriminatively learn the mapping between inputs and outputs. Additionally, contextual and structural information about the dataset is used in training the network. This approach takes advantage of extracting features at different time scales to improve accuracy without compromising reliability in comparison with the state-of-the-art methods. The experimental evaluation is performed on real-world SCADA data from a municipal water management system and shows that the proposed method outperforms the baseline methods evaluated here. MBPF beats the other methods regarding accuracy as well as reliability.

4.1 Problem definition

Considering the formulated multivariate stream X_T, the main goal of this project is designing a Time Series Forecasting (TSF), or behavior predictor, technique to predict the next observation values X_{T^+} of the representative stochastic time series (where T^+ = {t + 1, ..., t + l | l >= 1}), based on the previously observed time points {t - w, ..., t}, where w defines the length of the lookback window. A reliable accuracy is desired, meaning a symmetrically distributed prediction error with limited standard deviation.

\text{TSF}(x_{t-w}, \dots, x_t) = X_{T^+} \qquad (4.1)
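As a concrete illustration of this windowed formulation, the sketch below turns a series into (lookback, target) training pairs; the function name and parameters are illustrative and not part of MBPF itself.

```python
import numpy as np

def make_supervised_windows(series, lookback, horizon=1):
    """Turn a (multivariate) time series of shape (T, d) into (X, y) pairs where
    X[j] holds the previous `lookback` observations and y[j] the next `horizon`
    values, mirroring the lookback-window formulation of Eq. 4.1."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        y.append(series[t:t + horizon])
    return np.asarray(X), np.asarray(y)
```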

4.2 Background

Time series forecasting has a long history, and a variety of parametric and nonparametric techniques such as ARMA, ARIMA, regression, functional coefficient models, the SVR family, and neural networks have been proposed for time series forecasting [205]. The applicability and efficiency of each of these methods depend on the problem domain and the properties of the time series data. We briefly recall

some important methods in this field. Box-Jenkins models [35] have been developed by statisticians as systematic methods applying the Auto-Regressive Integrated Moving Average (ARIMA) model to find the best fit of a time series model to previously observed values. These models use an iterative three-stage approach, including model identification and selection, parameter estimation, and model checking steps. To apply a forecasting model to a non-stationary time series, it needs to be transformed into a stationary one [195]. In this process, the time series data is decomposed into three components, trend, seasonality, and a random remainder, which can be interpreted as the stationary and predictable parts of the data. The TBATS model is the most generalized version of the traditional seasonal investigation models and is able to capture multiple seasonalities. Since TBATS can have a very large number of states, it is able to handle a huge number of values for seasonal patterns [67]. There are also other widely used and well-accepted traditional data-driven techniques like Support Vector Regression (SVR) [246], which maps time series data to a higher-dimensional feature space using a kernel function to learn from non-linear time series. However, these techniques face a variety of other challenges, like parameter learning and limited capacity, which restrict their applicability. Traditional algorithms are not scalable enough to handle complex behavioral patterns in big data. Besides, reducing dimensions or selecting the most important features out of thousands of dimensions is not a trivial task for traditional models, whereas neural network models [49, 53, 80, 135, 152] and deep models are qualified to address all of the mentioned challenges. For instance, Recurrent Neural Networks (RNNs) are effective structures for modeling time series data, owing to their ability to adapt the backpropagation algorithm by unfolding the network through time [241]. RNNs also keep a memory of previous inputs by limiting how the learned weights change. Since the vanishing gradient is the major drawback of RNNs [26], the LSTM introduces a complex block of computing units with specific gates to trap the errors without losing the sequential properties of the input data [98, 132]. Recently, many other TSF deep models have emerged [69, 228]; mentioning all of them is out of the scope of this document, and I refer the reader to the following research surveys [6, 164, 204].

4.3 Addressed Challenges

TS modeling and anomaly detection are still challenging and ongoing topics in machine learning research [287]. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Beyond accuracy-related measures, the quality and reliability of forecasting is essential for anomaly detection tasks. Even though both traditional model-based and advanced data-intensive methods have hugely contributed to this field, they suffer from some drawbacks. Extremely data-intensive models like deep learning have achieved very high accuracy in TS analysis and in handling non-linear complex patterns; but because they lack any confidence interval and are exposed to potential changes in the data,

they may provide biased and off-target predictions. In contrast, strong restrictions and assumptions help the traditional statistical methods to control forecasting and provide a realistic confidence interval, especially in the presence of linear sections of the TS. However, we know that the traditional statistical methods suffer from some fundamental limitations in detecting seasonality, trends, and dependencies among data points in complex time series [108, 136, 195]. Thus, this study takes advantage of both approaches and proposes a new hybrid model that is able to handle more complicated TS consisting of both linear and non-linear components. Also, the lack of sufficient data to represent the data distribution, as well as noisy data, are other problems; as a result, none of the existing standalone techniques can be considered an ideal predictor. This chapter proposes an approach that improves accuracy without compromising reliability by taking advantage of features extracted at different time scales.

4.4 Proposed method

I propose a novel Multi-Branch Predictor Framework (MBPF) to perform time series analysis and prediction. The main pillars of the proposed methodical framework, and the way it addresses the challenges of the time series prediction problem, are discussed here. MBPF captures the observed stochastic time series and predicts the next value by introducing two novel contributions: 1) applying distinct augmentations of the data to feed a deep network, which leads to higher accuracy and lower deviance; and 2) applying a special ensemble of the deep network and parametric components to increase the accuracy in potentially linear and semi-linear intervals. As illustrated in Figure 4.1, MBPF consists of three consecutive phases: 1) Pre-learning transforms and prepares the input data by extracting influential contextual factors, removing some outliers, and finding potential complex seasonality patterns; 2) Individual learning, which itself contains two main components: a) the multi-branch deep network and b) parametric learning. The former learns particular features of different variations of the input time series and the sequential relation of observations in local branches. Each local branch is considered a module based on one specific modality of the time series, and the joint representation is finally obtained through multimodal layers. The latter fits the best parametric model on the pre-processed time series; 3) Consensus obtains concordance through an ensemble component by training a Multi-Layer Perceptron (MLP) on the multi-branch deep network and parametric predictions as well as contextual and structural information. The following elaborates on the rationale behind the building blocks of each phase.

4.4.1 Pre-learning

This section explains the main components of the pre-learning phase.

1. Feature extraction. To extract certain meaningful features, the primary characteristics of the input dataset are analyzed and checked as explained below. The purpose is twofold: first, it leads us toward deciding whether or not the parametric method is sufficient for the prediction task; second, it helps in making the required adjustments for the subsequent levels.

Figure 4.1: Structural architecture of the MBPF framework

• Seasonality evaluation analyzes the type of relationship between the values observed over time. If the relationship is either nonlinear or inconsistent, the parametric model may perform poorly. To this end, a time series decomposition approach based on an additive decomposition model [190] is used.

The input time series can be represented as O_t = T_t + S_t + Z_t (t = 1, 2, ..., n), where T_t is the trend component, S_t is the seasonal component, and Z_t is the stationary random noise component (a small decomposition sketch follows this list).
• Curvilinearity adjustment. The parametric methods cannot handle non-periodic cycles; therefore, fading them out is preferable. Based on the (Partial) Auto-Correlation (PAC) analysis [190], the combination of appropriate transformation functions (e.g., d-lag differencing) can be determined. The residual of the transformed data is then analyzed to find artificially inflated standard errors. If the p-value of the normality test on the residual is lower than a threshold, parametric models are inappropriate for prediction.
• Outlier elimination. Another important analysis prior to modeling and prediction is removing solid exceptional phenomena as outliers. Although various methods show different sensitivity and robustness in the presence of outliers, empirical studies show that removing these kinds of definite abnormalities leads to a more accurate predictor, especially for data-driven methods [35]. In this method, in addition to decreasing the variance of the residual in the parametric learning component, it helps to select a more realistic variance for the Gaussian distribution that MBPF applies to generate the noisy sequence.
• Contextual feature analysis. Contextual features (explanatory variables in statistics) should be found and considered in the method. Correlation analyses between the main variable and other explanatory variables are performed by evaluating the following properties: 1) Heteroscedasticity: a specific parametric model is fitted based on the main variable and the heteroscedasticity test [190] is applied to the model to check the residuals. If the p-value is relatively small, then we reject the homoscedasticity assumption. 2) Multicollinearity: redundancy is a serious issue, and such variables should be removed from the model (or possibly be modified by creating interaction variables or increasing the sample size) to avoid having a biased, unstable, and unreliable model. The Variance Inflation Factor (VIF) test [190] is used for this purpose.
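The additive decomposition itself can be obtained with standard tooling; a minimal sketch with statsmodels follows, where the file name, column name, and seasonal period are illustrative assumptions.

```python
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive decomposition O_t = T_t + S_t + Z_t of a flow-level series.
df = pd.read_csv("flow_level.csv", index_col=0, parse_dates=True)   # hypothetical file
flow = df["FlowLevel"]                                               # hypothetical column name
decomposition = seasonal_decompose(flow, model="additive", period=24)  # e.g., daily cycle for hourly data
trend, seasonal, residual = decomposition.trend, decomposition.seasonal, decomposition.resid
```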

Figure 4.2: An example of fork component data augmentation in three levels of granularity

2. Fork component

Deep neural networks can easily become overfitted with limited accessible data. Therefore, a data augmentation approach should be used to fade out random noise. In real-world time series, an often inescapable phenomenon is random noise and deviations that affect discriminative and specific patterns in the data [62]. As Figure 4.2 illustrates, the dataset is augmented by preparing three variations of the input sequence to learn primary features (a combined sketch of this augmentation follows Eq. 4.3):
• Original sequence. The original version of the time series is useful to extract features directly.
• Smooth sequence. The idea behind augmenting the data with a smooth version of the time series is based on the assumption that the distribution of the random parts leads to a constant mean over time. A convolution transformation based on the Moving Average Box (MAB) approach is used to create a smooth sequence with lower frequency. However, instead of taking random parameters for the length of the MAB, the lengths of the potential levels of seasonality are used. Considering O_t, the convolution-based moving average transforms O_t into Smooth_t^l, where MAB^l = {(1/l, ..., 1/l) : l ∈ {multi-seasonality}}:

\text{Smooth}^l_t = (O * \text{MAB}^l)_t = \frac{1}{l} \sum_{i=0}^{l-1} o_{t-i} \qquad (4.2)

• Noisy sequence. Additive White Gaussian Noise (AWGN) is a basic noise model used in the field of information theory to mimic the effect of many random processes occurring in nature. Therefore, to prevent the network from getting overfitted, we can apply AWGN, which only increases the noisiness of the data but does not affect its overall distribution and pattern. The formula of noisy time series

is as follows, where Int_l is defined as max_i |Smooth^l_i - o_i|:

\text{Noisy}^l = \{(k_1, \dots, k_n) \mid k_i = \text{Smooth}^l_i + Z_i \sim \mathcal{N}(\text{Smooth}^l_i, \text{Int}_l)\} \qquad (4.3)
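A combined sketch of the fork component is given below. It follows Equations 4.2 and 4.3 as reconstructed above, treats Int_l as the noise variance, and uses seasonality lengths that are illustrative rather than those detected by the pre-learning phase.

```python
import numpy as np

def fork_component(series, seasonality_lengths):
    """Create the smooth and noisy variants of the original 1-D sequence (Eqs. 4.2-4.3).
    seasonality_lengths: MAB lengths, e.g. [7, 31, 90] for daily data."""
    rng = np.random.default_rng(0)
    variants = {"original": series}
    for l in seasonality_lengths:
        mab = np.ones(l) / l                                 # moving-average box of length l
        smooth = np.convolve(series, mab, mode="same")       # Smooth^l (Eq. 4.2)
        int_l = np.max(np.abs(smooth - series))              # Int_l = max |Smooth^l_i - o_i|
        noisy = smooth + rng.normal(0.0, np.sqrt(int_l), size=len(series))  # AWGN (Eq. 4.3)
        variants[f"smooth_{l}"] = smooth
        variants[f"noisy_{l}"] = noisy
    return variants
```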

3. Sub-interval creator

Analyzing all available log data in one step is usually not very helpful, mainly because the representative features can easily get lost in long sequences; therefore, the input should be broken into smaller intervals. However, the window size is a very fundamental concept in time series analysis (e.g., prediction), and finding the best window size is often problematic. On the one hand, there is a direct relation between the span of cycles, seasonality, and trend change-points and the lookback window, meaning that a lookback covering all of the change-points provides more knowledge. On the other hand, the appropriate size of the lookback window is highly dependent on the forecasting mechanism of the different methods. Sometimes, more information misguides the method and diminishes the role of closer intervals. Moreover, a longer lookback means more neurons, which in turn needs more training data to avoid overfitting. Therefore, the difficulty lies in a tradeoff between the effective score for an observation's significance and gaining more knowledge regarding change-points over time. The following formula is used to balance the number of hidden neurons, inputs, and the size of the training set, where α represents the degree of freedom in the formula and the generalization level [72]:

N_h = \frac{N_s}{\alpha \cdot (N_i + N_o)} \qquad (4.4)

4.4.2 Individual learning

This phase trains a deep network and a parametric model separately. The main reason for training these models separately is to allow each method to try its best to fit an independent predictor and to provide its best estimations. The final model forecasts the next value based on the results of two main components.

Deep network learning

This component is the heart of MBPF and trains a predictor for the next value. It is responsible for parallel local learning of different representations of each time series and also for finalizing the learning process through multi-modality learning. Each branch consists of an autoencoder and a Long Short-Term Memory (LSTM) network, which learn different representations of the input time series. As shown in Figure 4.1, there are three branches: 1) Original sequence-based branch: the autoencoder of this branch, which is a compression autoencoder, is trained with the original sequence as both input and output. The learned representation feeds the following LSTM. 2) Smooth sequence-based branch: for each smooth representation of the original data, the same process is followed: first, a denoising autoencoder is pre-trained, which maps the original sequence onto

Smooth^l_i. Then, the following LSTM learns the sequential relation based on the data representation that the autoencoder provides. As mentioned in Section 4.4.1, the number of learning sequences in this branch is equal to the number of identified levels of seasonality in the data. 3) Noisy sequence-based branch: it has the same structure as the previous one, except that in the pre-training of the denoiser, it maps the noisy sequences (Noisy^l) as input to the smooth version Smooth^l_i. Let us briefly go through the definition of and reasoning behind applying the following elements in our framework.
• Autoencoder. Empirical research studies [85] show that deep networks with more than a few layers do not outperform simple structures; especially when the training has been initialized with random weights, the result can be even worse. Pre-training techniques [131] overcome these challenges by supporting a better generalization of the training dataset through guiding the learning process towards a lower minimum of the empirical cost function. The autoencoder is an unsupervised pre-training technique that is very powerful in extracting patterns of data [85] based on minimizing different approximations of the log-likelihood.
• Denoising autoencoder. Based on [276], "A good representation is one that can be obtained robustly from a corrupted input and that will be useful for recovering the corresponding clean input." This theory uses two important points: 1) targeting the robustness of the higher-level representation, and 2) making the extracted features focus on the fundamental structure instead of the random part. From the mathematical point of view, the noisy parts are farther from the real ones, and a successful denoising method should be able to substitute low-probability points with high-probability ones.
• Long short-term memory. The LSTM, by applying a special neuron structure, is a recurrent neural network (RNN) that is very appropriate for extracting the sequential features of time series. Its memory cells have the ability to store information over an arbitrary period of time. As illustrated in Figure 4.3, three gates control the information flow of the memory cell. Each gate gets the same input as the input of the neuron; the forget gate decides what information should be thrown away, the input gate decides what new information should be stored in the cell, and finally the output gate determines the output of the cell.

Figure 4.3: Structure of each LSTM neuron

• Multimodal layers. After training each branch in parallel, the obtained features should be combined. The first layer combines the results of the noisy and smooth models, and the second layer concatenates the results of the parallel components, providing the network with the possibility of multimodal prediction, a state-of-the-art deep learning method [211]; a simplified sketch of one branch and the multimodal merge follows.
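The sketch below illustrates the branch structure and the final concatenation with tf.keras. It is a highly simplified stand-in: the per-branch denoising pre-training is only indicated in comments, and all layer sizes, branch names, and the loss are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

lookback, n_features, latent = 100, 1, 32   # illustrative sizes

def make_branch(name):
    """One branch: an autoencoder whose encoding feeds an LSTM predictor.
    In MBPF the smooth/noisy branches would pre-train a denoising autoencoder first."""
    inp = layers.Input(shape=(lookback, n_features), name=f"{name}_in")
    enc = layers.TimeDistributed(layers.Dense(latent, activation="relu"))(inp)
    dec = layers.TimeDistributed(layers.Dense(n_features))(enc)   # reconstruction head
    autoencoder = Model(inp, dec, name=f"{name}_ae")              # usable for pre-training
    feat = layers.LSTM(64)(enc)                                   # sequential features
    return inp, feat, autoencoder

branches = [make_branch(n) for n in ("original", "smooth", "noisy")]
inputs = [b[0] for b in branches]
merged = layers.concatenate([b[1] for b in branches])             # multimodal layer
out = layers.Dense(1)(layers.Dense(32, activation="relu")(merged))
mbdn = Model(inputs, out, name="multi_branch_deep_network")
mbdn.compile(optimizer="adam", loss="mae")
```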

Parametric learning

This component prepares the final result of the parametric model, which has been configured based on the extracted statistical properties of the input time series. As aforementioned, TBATS is applied, which has proved its efficiency in capturing complex seasonal patterns in comparison with other parametric methods. Going through the details of this method is out of the scope of this study. Relying on a formulation that has greatly reduced the computational burden of maximum likelihood estimation, and using exponential smoothing and trigonometric-based decomposition, are the key features of this method. After fitting the best TBATS model on the time series, its estimations for the training data are collected for the next phase.
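As an illustration, the following is a hedged sketch using the third-party tbats Python package (the thesis does not state which implementation was used); the seasonal periods and the train_series variable are assumptions.

```python
from tbats import TBATS  # third-party package implementing the TBATS model

# Illustrative: fit TBATS with daily and weekly seasonal periods on an hourly series
# and keep its in-sample estimates for the consensus phase.
estimator = TBATS(seasonal_periods=[24, 24 * 7])
tbats_model = estimator.fit(train_series)        # train_series: assumed 1-D training array
in_sample_estimates = tbats_model.y_hat          # fitted values on the training data
next_day = tbats_model.forecast(steps=24)        # out-of-sample forecast
```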

4.4.3 Consensus phase

This phase concatenates the results of the parametric model, in the presence of auxiliary information, with the results of the deep network and obtains the final output through a Multi-Layer Perceptron (MLP). Auxiliary information consists of the related contextual dimensions and the time-related properties of the next expected value. The reason for applying these structural and contextual properties as inputs to the second deep network (MLP) is to generate the best prediction based on the results obtained from both components with more information. This is another contribution of the proposed framework, distinguishing it from existing ensemble-based predictors. Instead of simply voting between the parametric and nonparametric components, a second deep network learns how to select between the two results. Therefore, it can potentially learn any arbitrary relationship between inputs and outputs.

4.5 Experimental Evaluation

This section presents the experimental evaluation of MBPF in comparison with the baseline methods.

4.5.1 Experimental setup

For the experimental evaluation, SCADA data from the public water supply system of the City of Surrey in B.C., Canada (total population > 500 thousand) for the time period 2011–2014 is used. The dataset contains historical data of various water stations, including drinking water, drainage, sanitary, and vacuum stations. For each water station, the SCADA data is a multivariate time series with one data point per minute, comprising several features including Inflow, Flow-Level, Air-Temperature, and Air-Humidity. Seasonality patterns indicate fluctuations in residential drinking water consumption, providing an important data characteristic. Seasonality patterns at various aggregation levels, which can contribute to detecting irregular patterns, are identified. We can observe strong daily, weekly, monthly, and seasonal patterns in the water consumption. Not surprisingly, Air-Temperature is the main factor in shaping the seasonality patterns.

We break down the prediction task into short-term and long-term forecasts. Specifically, we differentiate here between hourly forecasts and daily forecasts, predicting the average water Flow-Level for the next hour and the next day. Short-term prediction tends to be more relevant to safety and security objectives such as detecting cyberattacks, whereas long-term prediction is more geared towards improving water management and quality of service by predicting future trends based on regular patterns in historic data. As explained in Section 4.4, the noisy and smooth versions of the data are generated based on the detected levels of periodicity. For example, in the process of generating smooth versions of the time series for the daily forecast, we use a convolution transformation based on three MAB sizes, 7, 31, and 90, for the weekly, monthly, and seasonal periodicity, respectively. For each of the smooth versions, a corresponding noisy version is generated by applying additive white Gaussian noise. The variance of the added noise is set based on the maximum difference between the data points in the original version and the smooth ones. To compare the accuracy and reliability of the proposed method with the studied baseline methods, we use four different measures. The first two are the commonly adopted scale-dependent measures Mean Absolute Error (MAE) and Root Mean Square Error (RMSE), working based on absolute errors and squared errors:

\text{MAE}(O_t, O'_t) = \frac{1}{N} \sum_{i=1}^{N} |o_i - o'_i| \qquad (4.5)

\text{RMSE}(O_t, O'_t) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (o_i - o'_i)^2} \qquad (4.6)

where o_i is the predicted value and o'_i is the observed value. The RMSE measure is minimized by the conditional mean, while the MAE is minimized by the conditional median. Using these two similar measures helps us to recognize whether the learning method is fitted in an unbiased way. Moreover, since RMSE exaggerates error values, it is able to measure the reliability of methods by considering the limited deviations between the forecast and original values. The third measure we use for evaluation is Absolute Deviation (AbsDev). AbsDev measures the reliability and quality of a forecasting method based on the deviation of residuals; smaller values of AbsDev correspond to a more symmetric error distribution:

\text{AbsDev}(O_t, O'_t) = \frac{\frac{1}{N} \sum_{i=1}^{N} |o_i - o'_i|^2}{\sum_{i=1}^{N} o_i} \qquad (4.7)

Relative Error (ReErr) is another measure to evaluate the efficiency of forecast methods in comparison with the random walk model. For generating the next value of a time series, a random walk model adds a random error value with zero mean to the current value of the time series.

\text{ReErr}(O_t, O'_t) = \sqrt{\frac{\sum_{i=1}^{N} (o_i - o'_i)^2}{\sum_{i=1}^{N} (o_i - o_{i-1})^2}} \qquad (4.8)
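For reference, the four measures can be computed as in the sketch below. The MAE, RMSE, and ReErr forms are standard; the AbsDev form follows Eq. 4.7 as reconstructed above, and the observed series is used for the normalizing terms, which is an assumption on the original notation.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_pred - y_true))                          # Eq. 4.5

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))                  # Eq. 4.6

def abs_dev(y_true, y_pred):
    # Eq. 4.7 as reconstructed: mean squared absolute residual, scaled by the
    # total observed volume.
    return np.mean(np.abs(y_pred - y_true) ** 2) / np.sum(y_true)

def re_err(y_true, y_pred):
    # Eq. 4.8: error relative to a random-walk (persistence) forecast.
    return np.sqrt(np.sum((y_pred - y_true) ** 2) / np.sum(np.diff(y_true) ** 2))

# y_true: observed values, y_pred: predicted values (NumPy arrays of equal length).
```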

4.5.2 Evaluated methods

We use three sets of baseline methods in comparison with MBPF. The first set consists of long-established methods. The Auto-Regressive Integrated Moving Average (ARIMA) is one of the general classes of linear models fitted to "stationary" time series for better estimating and forecasting future data. TBATS extends these models to handle complex seasonal patterns [67]. Support Vector Regression (SVR) is an extension of SVM for regression based on the ε-insensitive loss function. This model computes a function in a high-dimensional feature space, mapping the input data by means of a nonlinear function [246]. The Multilayer Perceptron (MLP) is a feed-forward neural network that maps sets of input data to a set of respective outputs. MLP has been used widely in time series prediction [160]. The second set of baseline methods consists of RNN-based methods that memorize the temporal dependency of data points in a time series to predict the future. Stacked LSTM is a stack of two or more LSTM layers, which allows for greater model complexity similar to multi-layer networks [105, 146]. Regularized LSTM is an LSTM with dropout layers to decrease the chance of overfitting. The dropout operator corrupts the information carried by the units, pushing them toward performing their intermediate computations more robustly [293]. Regarding the last set of evaluated methods, we perform an ablation study based on different variations of MBPF to understand the effect of the MBPF components on its performance. The individual learning phase has two major components, parametric learning and a multi-branch deep network, which are unified through a consensus phase. MBPF-1BDN is equivalent to one autoencoder and one LSTM. It uses only the original sequence-based branch (see Section 4.4.2 for details) of the multi-branch deep network component, while ignoring the other branches. This is the simplest variation of MBPF and is comparable to the method proposed in [104]. MBPF-2BDN uses two branches (original and smooth) of the multi-branch deep network component, while ignoring the noisy branch. MBPF-MBDN uses all three branches of the multi-branch deep network component. It is noteworthy that the MBPF-1BDN, MBPF-2BDN, and MBPF-MBDN configurations all ignore the parametric learning component and the consensus phase. All of the methods are trained using a training set and are applied on a test set to predict the next observation. Therefore, all of the methods that use a lookback window in their prediction process may take advantage of the available data up to o_i to predict o_{i+1}.

63 4.5.3 Experimental results

This section compares the performance of MBPF with the evaluated baseline methods and explains how the elements of MBPF contribute to its performance. Comparison with baseline methods. Tables 4.1 and 4.2 show the best results obtained from different configurations of the baseline methods and MBPF with respect to the reference measures. These results show that MBPF outperforms all other baseline methods in terms of both accuracy and reliability. MBPF surpasses them by 10%, 11%, and 16% for the measures MAE, RMSE, and AbsDev, respectively. Further, the ReErr value of MBPF is about 0.2, which is significantly better than the best baseline method with a ReErr of 0.5. This experiment supports the inefficiency of parametric and nonparametric methods used separately for TSF. The main drawback of ARIMA and TBATS is the use of strict assumptions in the data fitting process, as our experiments indicate. Moreover, being sensitive to small changes in the training data can cause overfitting and a performance decrease in the test phase. This issue is observed in some of the evaluated methods like Stacked LSTM. The MAE of MBPF is improved by 14% and 6% for the hourly and daily forecasts, respectively, in comparison with the second-best baseline method. Achieving the lowest MAE value on both training and test data shows that the proposed method, using a multi-branch deep network, is able to efficiently learn the specific features of the time series without overfitting to random parts of the data. Moreover, MBPF has the lowest RMSE compared to the baseline methods. This measure is improved by 15% and 7% for the hourly and daily forecasts, respectively, compared to the second-best baseline method. The results of MBPF obtained for the daily and hourly forecasts are dissimilar due to the different sizes of the training samples in these two settings. While MBPF outperforms the baseline methods in both settings, it achieves a better performance for the hourly forecast. In critical infrastructure related applications, the reliability of forecasting is a vital issue. Reliability can be measured in terms of the deviation of forecast errors: if the forecast error follows a normal distribution, then the predicted values are more reliable. Achieving the best AbsDev shows that MBPF is a more reliable approach without significant skewness in the forecast values. Having close values for RMSE and MAE leads to the conclusion that the error values of different data points are close as well. This result strongly indicates the reliability of MBPF in the forecasting process. As discussed, close values of MAE and RMSE for a forecasting model, MAE ≈ RMSE, show that such a model is unbiased, another indication of the quality of prediction. MBPF works two times better than the second-best baseline method with respect to relative errors, which is a significant result. This result implies that MBPF is able to learn deep patterns of the time series data to forecast the next value. Comparing different variations of MBPF, as Figure 4.4 shows, MBPF-2BDN outperforms MBPF-1BDN on the training data but not on the test data, which can be a sign of model overfitting. On the other hand, MBPF-MBDN is able to significantly outperform both MBPF-2BDN and MBPF-1BDN on the training and test data. This comparison highlights two important points about MBPF.

Figure 4.4: Comparison of MBPF variations (MBPF-1BDN, MBPF-2BDN, MBPF-MBDN, MBPF) on train and test MAE

two important points about MBPF: 1) By taking advantage of smooth and noisy versions of the data, the multi-branch augmentation approach in MBPF is able to capture and memorize the main underlying structure of the time series; 2) Obtaining very close values of the performance measures on the test and training data shows that the augmentation approach is not overfitted to random parts of the observed data. Adding a noisy version of the data empowers MBPF to distinguish fundamental features of the data from random noise. Lookback window size effect. The time series window size, or lookback window, is an important parameter affecting a forecasting method's performance. The lookback window size determines the number of previous observations that will be used in forecasting the next value. Based on the relation given in Section 4.4.1, keeping the value of its parameter between 4 and 8, different window sizes, namely 20, 50, 100, 200, 500, and 1000, are tried to find the best lookback size for MBPF. Our experimental results show that MBPF works best with a lookback window of size 200 for hourly and 100 for daily forecasts. Even though the size of the lookback window is an influential factor in MBPF's performance, changing the window size has a similar effect on the values of all evaluation measures. For instance, the performance of hourly forecasts with a lookback window of size 200 is improved by 15%, 13%, 9%, and 26% for the MAE, RMSE, AbsDev, and ReErr measures, compared to the second-best results obtained with a lookback window of size 500. Autoencoder comparison. The performance of the different autoencoders is evaluated using the loss value and RMSE measures. The loss value computes the deviation of the input data from the fitted values; this measure evaluates the ability of an autoencoder to preserve and reconstruct the input data. The output of each encoder is used as the input of an LSTM network. Then, the RMSE measure is used to evaluate the prediction accuracy of each network to verify the effect of using the autoencoder on the performance of the LSTM network. Smaller values of RMSE show higher quality of

Table 4.1: Performance of MBPF variations and the evaluated baseline methods in hourly forecast

Method             MAE     RMSE    AbsDev   ReErr
ARIMA              9.617   9.619   30.84    10.991
TBATS              0.107   0.142   0.345    2.069
SVR                0.042   0.056   0.137    0.81
MLP                0.036   0.049   0.115    0.713
Stacked LSTM       0.054   0.077   0.142    0.895
Regularized LSTM   0.037   0.048   0.117    0.701

MBPF-1BDN          0.034   0.047   0.106    0.628
MBPF-2BDN          0.038   0.052   0.099    0.647
MBPF-MBDN          0.033   0.046   0.085    0.286
MBPF               0.029   0.040   0.079    0.193

Table 4.2: Performance of MBPF variations and the evaluated baseline methods in daily forecast

Method             MAE     RMSE    AbsDev   ReErr
ARIMA              9.639   9.639   33.19    176.827
TBATS              0.166   0.184   0.572    3.368
SVR                0.059   0.071   0.147    0.939
MLP                0.057   0.086   0.149    1.018
Stacked LSTM       0.058   0.078   0.150    0.988
Regularized LSTM   0.033   0.046   0.0901   0.45

MBPF-1BDN          0.042   0.055   0.133    0.797
MBPF-2BDN          0.051   0.075   0.141    0.802
MBPF-MBDN          0.035   0.047   0.103    0.282
MBPF               0.031   0.043   0.084    0.251

an autoencoder in obtaining an appropriate generalization of the input data without losing its underlying structure. While in many vision applications convolutional autoencoders outperform other types of autoencoders, in the time series analysis context the stacked autoencoder outperforms all others, as shown in Table 4.3. This result indicates that in our studied domain a convolutional autoencoder is not able to memorize the long-range dependency among data points and, consequently, the structure of the time series. Moreover, based on this result we can conclude that a 1-layer autoencoder is not strong enough to learn the complex and nonlinear aspects of the time series data. The result of applying the stacked autoencoder to one sequence of test data is illustrated in Figure 4.7. The input sequence, shown in red, is encoded into a sparser representation, displayed in green; the encoded version is then decoded into a smooth representation, shown in blue.

Table 4.3: Comparison of denoising autoencoders

Concretization        Train-Loss   Validation-Loss   Train-RMSE   Validation-RMSE
1-layer autoencoder   4.8285e-04   0.0011            0.06         0.09
Stacked autoencoder   7.1088e-05   5.8526e-04        0.04         0.05
LSTM autoencoder      2.1245e-05   7.5158e-05        0.06         0.06
Convolutional AE      0.0132       0.0029            0.11         0.1

Figure 4.5: Time series and its decomposition components (observed, trend, seasonal, random)

The matching between the inputs and outputs indicates that the stacked autoencoder is able to learn a suitable smoothing function for denoising the time series data.
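The sketch below illustrates a stacked denoising autoencoder of the kind compared in Table 4.3, built with Keras; the layer widths, noise level, and window length are assumptions chosen for illustration rather than the evaluated configuration.

```python
# A minimal sketch of a stacked denoising autoencoder (assumed sizes); not the
# exact architecture evaluated in Table 4.3.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, GaussianNoise

window = 100                      # hypothetical lookback window length
inp = Input(shape=(window,))
noisy = GaussianNoise(0.05)(inp)  # corrupts inputs during training so the model learns to denoise
enc = Dense(64, activation="relu")(noisy)
enc = Dense(32, activation="relu")(enc)          # sparser encoded representation
dec = Dense(64, activation="relu")(enc)
out = Dense(window, activation="linear")(dec)    # smooth reconstruction

autoencoder = Model(inp, out)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X_windows, X_windows, epochs=50, validation_split=0.1)
# An encoder model, Model(inp, enc), could then feed its output into an LSTM predictor.
```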

4.5.4 Feature extraction findings

This section briefly discusses important findings of MBPF's feature extraction step. As shown in Figure 4.5, the daily-recorded observations of the studied time series have a trend and seasonality. Moreover, the seasonality of this data remains after decomposition, creating a skewed error distribution. Because of this property, learning patterns from this time series requires a curvilinearity adjustment using an appropriate transformation. This process is generally performed using a combination of a befitting lag difference and a logarithmic transformation of the data. As Figure 4.6a shows, the transformation removes a significant part of the non-stationary properties. However, the normality test shows that the error values do not follow a normal distribution: in our case, the p-value of the normality test is 0.001, far below the conventional 0.05 significance threshold required to retain the normality hypothesis. This is illustrated in Figure 4.6b, which shows the quantile-quantile normal plot of the residuals. This finding and the results presented in Tables 4.1 and 4.2 both confirm the inefficiency of parametric methods, especially ARIMA, in modeling the studied time series. MBPF applies a local-distribution based outlier-detection method to identify the most probable outliers. As Figure 4.6c shows, MBPF substitutes the outliers (shown by the red points) with the coincident points in the smooth version of the data. Although removing outliers does not lead to a normal distribution of the error values, it can improve the variance of the normality test.
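A minimal sketch of this transformation and normality check follows; the file name, series, rolling-window length, and the use of a modified z-score as a simple stand-in for the local-distribution based outlier rule are all assumptions.

```python
# Sketch of the curvilinearity adjustment (log + lag difference), a normality
# test on the residuals, and a simple outlier substitution; not the thesis code.
import numpy as np
import pandas as pd
from scipy import stats

ts = pd.read_csv("daily_flow.csv", index_col=0).squeeze()   # hypothetical daily series

transformed = np.log(ts).diff().dropna()        # log transform + first-order lag difference

# Shapiro-Wilk normality test; a p-value below the significance level
# means the residuals are not normally distributed.
stat, p_value = stats.shapiro(transformed)
print(f"normality p-value: {p_value:.4f}")

# Outlier flagging via a modified z-score (one simple local rule), replacing
# flagged points with a smoothed (rolling-median) version of the data.
med = transformed.median()
mad = (transformed - med).abs().median()
modified_z = 0.6745 * (transformed - med) / mad
smooth = transformed.rolling(7, center=True, min_periods=1).median()
refined = transformed.where(modified_z.abs() <= 3.5, smooth)
```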

Figure 4.6: (a) Transformed version of a time series; (b) Quantile-quantile normal plot of the transformed time series; (c) Refined time series after removing outliers

Figure 4.7: Representation of the denoising autoencoder on one sample of daily test data

Table 4.4: VIF of water supply system dataset

Variable          VIF
Flow-Level        1.28
Air-Temperature   1.36
Air-Humidity      1.22

In the process of contextual feature analysis, MBPF applies the Durbin-Watson test [190] to find autocorrelation in the forecasting errors of the fitted model. The experiment shows that there is more heteroskedasticity in the analysis of the water flow level when the air humidity and air temperature features are not included. Based on the results of the Variance Inflation Factor (VIF) test, presented in Table 4.4, multicollinearity does not occur between the abovementioned features, because their VIF values are about one, too small to indicate any significant collinearity; VIFs between 5 and 10 are signs of serious multicollinearity.
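For reference, a sketch of the VIF computation with statsmodels is given below; the file name and column names mirror Table 4.4 and are assumptions.

```python
# Sketch of computing VIF for the contextual features (hypothetical file and
# column names); not the thesis code.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

df = pd.read_csv("water_supply.csv")  # hypothetical dataset with the three features
X = add_constant(df[["flow_level", "air_temperature", "air_humidity"]])

vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
    index=X.columns,
)
print(vif.drop("const"))  # VIF near 1 -> no multicollinearity; 5-10 -> serious
```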

4.6 Conclusions

This chapter proposes a novel deep learning based framework, MBPF, for analyzing the structure of a time series and predicting the next value. To address the respective drawbacks of using either parametric or nonparametric methods exclusively, the framework takes advantage of both parametric learning based on TBATS and multi-branch deep networks. MBPF is evaluated through extensive experiments on real-world data from the public water supply system of the City of Surrey, B.C.,

Canada. MBPF outperforms the best baseline methods by 10% for MAE and 16% for RMSE, confirming the efficiency of the framework. Higher accuracy and a smaller error deviation in predicting the next observation make MBPF a more reliable forecaster. Next-observation prediction is a principal area of innovation with many potential applications. Critical infrastructure protection, increasingly vital for the safety of our society and for our economy in dealing with a rapidly evolving threat landscape, is only one example. Beyond prediction, which is the main aim of this research, the method can potentially lead to a situational awareness platform that addresses various elements of physical and cyber threats. We believe the proposed framework and time series analysis approach provide new perspectives in analyzing the behavior of common and widely deployed critical infrastructure.

Chapter 5

Dynamic Attack Scoring Using Distributed Local Detectors

This chapter targets the challenging problem of anomaly scoring in distributed systems to filter out irrelevant anomalies and raise alerts for potential attacks. Cyber-physical systems (CPS) [114] integrate embedded computers, software, and networks in control loops routinely used in industrial automation to monitor and control physical processes [74]. The work presented here focuses on supervisory control [254], [9] of critical infrastructure and services, i.e., the overall operation of a complex physical system with multiple lower-level control functions and peripheral process controllers that interact directly with the physical system through actuators and sensors. This generally includes common control architectures like SCADA and distributed control systems (DCSs) [157]. For stream signal processing we use a combination of advanced time series analysis and forecasting methods [223] to dynamically detect and analyze abnormal system behavior in near real time. Contrary to these benefits, in the evolving cyber threat landscape, increasing reliance on automation also increases the attack surface for advanced persistent threats and amplifies the risk of cascading effects. In light of the cyber threats and existing vulnerabilities that expose critical infrastructure to a variety of adversarial scenarios, this work promotes anomaly detection based intrusion detection methods for cyber situational awareness in the analysis of automated control processes. Intrusions and security breaches are a reality that can compromise the protection of sensitive data and information, exposing personal identities, intellectual property, or financial assets, and causing damage that may ruin lives and businesses. Things can get much worse when attacks on critical infrastructure cripple vital system components physically, e.g., by disabling safety instrumented systems [43]. Beyond transient disruptions, the potential damage could be long-term, threatening human safety beyond comprehension. This situation calls for advanced intrusion detection solutions. Intrusion Detection. Intrusion detection systems (IDSs) are devices or software applications that monitor networks or systems for malicious activity or policy violations [19] and are essential for protecting industrial control systems and critical infrastructure [157]. Generally, protective measures are either reactive or proactive. The former deal with emergency response and digital forensics to

mitigate the impact of a security breach and identify the source of the attack; the latter enhance cyber situational awareness, e.g., by analyzing logs to detect suspicious activities pointing to an attack in progress [51, 300]. The rationale behind proactive measures, which are the main focus of our study, is mitigating risks (the probability of cyberattacks occurring multiplied by the potential damages that would result) by improving defense mechanisms. A first step towards enhanced risk awareness and management is developing an understanding of the attack space and the potential system failures [247]. This calls for systematic threat analysis to identify and assess adverse effects on the physical system and its subsystems. Analytic Framework. Considering the challenges mentioned above, we propose AttackTracker for tracing potential attacks using a hierarchical network of attack detectors operating across a distributed control architecture. In this framework, a local detector learns the latent patterns of the local control process it monitors using a Behavior Predictor and finds attacks by utilizing a dynamic anomaly scoring scheme. Detectors at higher levels aggregate knowledge obtained from lower-level detectors and communicate the system status and the severity of any attacks to the system operator. AttackTracker advances the state of the art in online cyberattack detection by making the following technical contributions:

1. Applying a dynamic scoring method which not only enhances the true positive (TP) ratio but, at the same time, also reduces the false positive (FP) ratio.

2. Improving attack detection through a distributed detector network based on local Behavior Predictors powered by a multivariate temporal convolutional network (TCN) model.

Our experiments and evaluations show AttackTracker can detect cyber threats in a challenging real-world dataset [109] and outperforms state-of-the-art methods. Since the framework proposed here is not specifically customized for this testbed, it can also be applied to analyze any given input sequence of events in other contexts such as network intrusion detection and anomaly detection in trajectories to determine complex collective anomalies [51]. This chapter extends our work published as a conference paper [298] by describing and discussing the data exploration, the cyberattack detection framework, and the experimental analysis and results in significantly more detail and depth.

5.1 Problem definition

A supervisory control system as considered here has multiple local control components (each associated with some subsystem). The history of system behavior monitored over some time interval

$T$, comprising $t$ consecutive time steps, forms a multivariate time series $X_T$. Intuitively, $X_T$ is collectively generated by $S$ local control processes, $S \geq 2$, one for each control component. For a given time series $X_T = \{x^{(1)}, x^{(2)}, \ldots, x^{(t)}\}$, each element at time $(i)$ is a multivariate data point $x^{(i)} \in \mathbb{R}^n$ composed of $S$ elements, $x^{(i)} = (x^{(i)}_1, x^{(i)}_2, \ldots, x^{(i)}_S)$. Each local control process $s$, for $s = 1, \ldots, S$, is a multivariate process with $n_s$ observed features, meaning $x^{(i)}_s = \{f^{(i)}_1, f^{(i)}_2, \ldots, f^{(i)}_{n_s}\}$, where the number of features may vary from one component to another.

An online ADF is supposed to spot any suspicious anomalous event in near real time by continuously analyzing the stream data. The goal is to assign a score to each data point in the stream data for the likelihood of being associated with an attack. Given the online setting, we do not assume an upper bound for attack scores but allow for any degree of anomaly in future observations.

$score(x_{t+1} \mid X_T) \rightarrow [0, \infty) \qquad (5.1)$

5.2 Background

Intrusion detection systems (IDS) [19] are vital for protecting industrial control systems and critical infrastructure [157]. Protective measures can be reactive or proactive. The former deal with emergency response and digital forensics to mitigate the impact of a security breach; the latter enhance cyber situational awareness, e.g., by analyzing logs to detect suspicious activities pointing to an attack [51,300]. Proactive measures, which are the main focus of our study, aim at mitigating risks (the probability of cyberattacks occurring multiplied by the potential damages that would result) by improving defense mechanisms. A first step towards enhanced risk awareness and management is developing an understanding of the attack space and the potential system failures [247]. Intrusion detection. Network security and protection has a very rich and well-defined background and has been addressed in extensive studies [19,100,204,248]. There are three major detection methods applied by IDSs. Signature-based (a.k.a. misuse detection) IDSs mainly focus on creating and tracking the signatures of previously detected attacks to capture them if they reoccur [248, 252]. An AD-based IDS, the main target of this study, aims to extract the normal traffic patterns so as to identify suspicious abnormalities as attacks [198, 204, 290]; as aforementioned, various approaches such as statistical, knowledge-based reasoning and ML are in principle applicable to find anomalies. The third group comprises specification-based techniques or hybrid systems, which take advantage of both signature and anomaly detection methods [100,204]. Lately, attacks against cyber infrastructure are growing so fast that the volume of new signatures exceeds the ability of human operators to add those signatures to traditional IDSs; therefore, AD-based IDSs are receiving increasing attention. Further, the patterns in complex stochastic processes are too complicated to be captured by rule-based or traditional ML methods, while studies on applying advanced ML algorithms to cybersecurity problems, including the detection of zero-day attacks or new variants of malware, have demonstrated significant improvements. On the other hand, the presumption of labeled datasets is too naive and, notably, the presumption that features are correlated is too strict for high-dimensional datasets obtained from dynamic real-world systems with hidden ongoing attacks. Thus, semi-supervised, profiling-based ML methods are increasingly desirable in the stream analysis and threat detection context. Intrusion detection in cyber-physical systems. Most of the existing research focuses on applying ML to perform computer network intrusion detection, while there are not enough studies related to CPS as a new, emerging, and growing target area for ML-based security algorithms [28]. Despite the

similarities, CPSs bring their own unique threat spaces because of the discrepancies in their basis and application. Since CPSs and simple embedded controllers operating critical infrastructure interact with processes in their physical environment by reading data signals and generating control signals in response, there are plenty of latent stochastic, or semi-stochastic, external factors or behavior instabilities that can influence the control process and complicate identifying hidden patterns [301]. Further, anomalies can originate from various sources like cyber or physical attacks, errors in control or communication networks, and other malfunctions with system-wide or local effects. Differentiating anomalies of interest, that is, suspicious anomalous behavior indicating a potential security threat, from the set of anomalies considered irrelevant to security turns out to be an even more intricate task than finding anomalies [260]. Many of the existing methods are inappropriate for CPS because they cannot handle correlated features in high-dimensional and sparse spaces, cannot differentiate noise from the original behavior and, last but not least, are not applicable to stream data [301]. Nowadays, the remarkable progress of deep learning methods in various contexts such as image analysis and vision [183, 240], object detection [127], IoT and stream analytics [204], as well as the advances in anomaly and intrusion detection [198,290], has distinguished these types of techniques as capable candidates for extracting hidden patterns in complex data and consequently demystifying potential anomalies [28]. Since an IDS should monitor stream data, we need to apply models that are able to analyze sequential and temporal data. For the past few years, long short-term memory (LSTM) networks, or more generally recurrent neural networks (RNNs), were known as one of the best fits for such data. But training them is very challenging because their optimization process suffers from two major inherent problems. First, they are almost unable to recall a very long history, which is highly desired in any historical data analysis context. Moreover, because of the sequential processing of inputs, RNNs cannot take full advantage of parallelization techniques like convolutional neural network models (CNNs). Therefore, temporal convolutional networks (TCNs) [21] have been proposed to address these challenges by being faster, highly memory efficient, and flexible in adjusting the historical window to longer input sequences. Accordingly, several AD methods have been proposed based on such powerful deep learning models [164] and some have boosted the accuracy of profile-based AD systems [204]. All in all, none of the existing techniques are off-the-shelf [270, 300] and fully transferable to a new context [12, 270] due to being highly dependent on the system properties and unique vulnerabilities. Thus, they might face a very high false alarm rate or low recall in the new context.

5.3 Addressed Challenges

There are a few challenges in applying an anomaly detection algorithm to detect intrusions, which this study tries to address. First and foremost, the distributed nature of control systems and their dependence on multiple latent stochastic factors may cause behavior instabilities which complicate customizing a model

to identify hidden patterns that also carry over to unobserved data in order to trace potential attacks in a complex system. Behavior modeling applied in the training of anomaly detection algorithms has received more attention than the decision-making process of post-pruning, threshold setting, anomaly scoring, and labeling. Relying on predictive models alone is very risky due to uncertain boundaries, continuously evolving behavior, and potential data drifts [260, 284]. In particular, real-world AD systems might need to handle a heterogeneous set of models based on different detectors. This makes the problem more complicated because a slightly mistuned threshold can unexpectedly lead to an unemployable system that produces a huge false alarm rate or misses most of the attacks by labeling them as normal [38]. And 'chasing ghosts' confuses data analysts and is, after all, a waste of valuable resources. Also, most research efforts are focused on applying machine learning (ML) to perform network intrusion detection, while CPS related attacks, although an emerging and growing area, have been almost neglected [28]. The attack space of CPS varies due to the discrepancies in their structure, platform, and applications. Thus, many of the well-known methods are not appropriate for CPS, as they do not handle correlated features in sparse and high-dimensional spaces, do not differentiate noise from the original behavior, and are not applicable to stream data [301]. Besides, anomalies can originate from various sources like cyber or physical attacks, errors in control or communication networks, and other malfunctions. Differentiating anomalies of interest, suspicious anomalous behavior indicating a potential security threat, from the set of anomalies considered irrelevant to security is often an even more intricate task than finding anomalies [260].

5.4 Proposed Method

AttackTracker is a scalable framework applicable to supervisory control systems. As Figure 5.1 illustrates, a hierarchy of attack detectors continuously monitors the operation of controllers at different levels of a control system. $(l_1)$-detectors monitor peripheral controllers such as PLC units at the local level, while at higher levels, $(l_i)$-detectors, $i = 2, 3, \ldots$, monitor the output of multiple detectors at level $(l_{i-1})$. At the top level, a single detector determines the global operational status of the entire system and reports attacks in progress in any one of the subsystems. A local detector continuously analyzes the operation of a subsystem as reflected by the state of its sensors and actuators to spot deviating observations in the data stream that form a collective anomaly associated with an attack targeting this subsystem. Each local detector consists of a Behavior Predictor (Section 5.4.1) feeding the 'expected' next observation values into an inference engine (see Section 5.4.2). The inference engine processes and labels observations, assigns attack scores, and raises red flags based on the deviation of the observed values from the predicted ones relative to a dynamically adjusted threshold. For $i \geq 2$, $(l_i)$-detectors aggregate data and information received from their lower-level detectors to identify the attack scope in the underlying levels. This way, detectors operating at the higher levels are able to distinguish distributed threats in addition to centralized attacks.

Figure 5.1: AttackTracker framework architecture

5.4.1 Behavior predictor

A Behavior Predictor learns hidden patterns from a history ($X_T$) of discrete observations in the form of a multivariate stochastic time series in order to predict the next observations $X_{T^+}$, where $T^+ = \{t+1, \ldots, t+l \mid l \geq 1\}$, with a reliable accuracy, meaning the error deviation is limited and symmetrically distributed. In the AttackTracker framework, each Behavior Predictor ($P^s$) uses a customized multivariate TCN (MTCN) model [21] to learn the normal behavior of a subsystem and forecast the next local feature values ($x^s_{t+1}$) based on the previous observations in the window $\{t-w, \ldots, t\}$, where $w$ defines the length of the lookback window.

$P^s(x^s_{t-w}, \ldots, x^s_t) = x^s_{t+1}, \quad s \in S \qquad (5.2)$

To be suitable for time series forecasting, TCN models apply a unique convolution, called causal convolution, where inputs can only be connected to future time-step outputs in a causal structure. Moreover, taking advantage of dilated filters and stacking the convolutions [21] empowers TCN models to handle the entangled properties of real-world CPS signals and find latent patterns (a minimal code sketch follows the list below):

• Long-term events. The scale and duration of latent patterns are highly different from one another, depending on seasonality, cycles and other independent stochastic factors. Therefore, a profiling method should be able to memorize observations and complex patterns occurring over a rather long historic window. There are two essential features which make TCN suitable to fulfill this requirement, while it converges faster with higher accuracy than other methods such as RNNs:

– Dilated filters provide the possibility of covering a larger area of input data by skipping fixed steps between every two adjacent selected points for the next level. Therefore,

based on the convolution layer depth, the neuron's receptive field increases exponentially.

– Stacking the dilated convolutions increases the receptive fields, while preserving the input resolution and computational efficiency.

• Noisy environment. Signals may include unrelated noise because of sensor faults, noisy networks, or other external factors. Hence, the profiling method should be able to avoid being overfitted to noise when learning the underlying hidden patterns. Because of the dilated filters property, a TCN is able to skip noise and memorize recurring patterns at different levels of granularity.
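The sketch below stacks dilated causal convolutions in Keras to illustrate the mechanism described above; the window length, feature count, filter and kernel sizes are assumptions for illustration and not the exact MTCN configuration used by AttackTracker.

```python
# A minimal sketch (assumed sizes) of stacked dilated causal convolutions, the
# building block that gives a TCN its exponentially growing receptive field.
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv1D, Dense, Lambda

lookback, n_features = 100, 5                 # hypothetical window and feature count

inp = Input(shape=(lookback, n_features))
x = inp
for d in (1, 2, 4, 8, 16):                    # each level skips d-1 steps between taps
    x = Conv1D(filters=24, kernel_size=8, dilation_rate=d,
               padding="causal",              # outputs never see future time steps
               activation="relu")(x)
x = Lambda(lambda t: t[:, -1, :])(x)          # keep the last time step's representation
out = Dense(n_features)(x)                    # forecast the next multivariate observation

tcn = Model(inp, out)
tcn.compile(optimizer="adam", loss="mse")
```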

Besides increasing the scalability of the framework, having a hierarchy of local predictors helps to accurately learn the behavior of each individual subsystem.

5.4.2 Inference engine

Cyberattack detection goes beyond profiling normal behavior and labeling any deviation from the expectation as an anomaly. The main aim of AD is finding isolated points or events that have not been observed before, whether real malicious events or any kind of noise or benign events [260]. Therefore, AttackTracker should filter out the unrelated anomalies to mitigate the false alarm rate, while keeping a very high accuracy level. Thus the inference engine is responsible for:

• Anomaly scoring. Assigning proper scores that imply the likelihood of an event being anomalous in comparison to the previous observations [91].

• Boundary approximation. Finding a suitable isolation boundary to differentiate potential suspicious events from normal ones, based on available information and observed discrepancies in the train and validation sets.

• Flagging. Analyzing the collected scores to filter out insignificant outliers and flag the collective events in case their behavior is suspicious.

Moreover, there are two major concerns that the inference engine is supposed to address: 1) Data-driven predictors may suffer from abrupt, unsteady results potentially caused by temporal predictor faults, or outliers due to noise or data drift; 2) Each predictor learns the subsystem behavior and is able to address potential feature correlations, but is oblivious to the subsystems handled by sibling detectors (other detectors at the same hierarchy level). Therefore, AttackTracker distinguishes real threats by applying a strategic aggregation and prioritization of alarms obtained from the network of detectors, considering available contextual information and subsystem correlations.

Individual scoring

This phase is performed in two steps, one offline and one online. The offline step prepares the decision-making logic of the inference engine, while the online step automatically detects the suspicious events. Offline step. The raw error vector $\bar{E}^s$, with $\bar{E}^s : e^s_t = |y^s_t - \hat{y}^s_t|$, $s \in S$, for the $S$ local subsystems, is normally not a reliable source for anomaly detection due to several reasons, such as possible deficiencies of the fitted model and the variance of its quality over the entire feature space. Moreover, the results of the predictors ($P^s$) across the local detectors (in $l_1$) refer to different scales. Each inference engine applies the modified z-score ($\bar{Z}^s$) [138] to the divergence error ($\bar{E}^s$) from the real-world observations to make them compatible for anomaly scoring. The modified z-score applies the median absolute deviation (MAD) and the median, which makes it more reliable in the presence of outliers compared to the standard z-score.

$\bar{Z}^s : z^s_t = 0.6745 \cdot (e^s_t - \mathrm{Median}^s_{(train)}) / \mathrm{MAD}^s_{(train)}, \quad e^s_t \in \bar{E}^s \qquad (5.3)$

Also, this way of scaling is highly insightful and interpretable because it presents the divergence degree of each single observation ($x^s_i$) in terms of the number of MADs by which it deviates from the expectation. Then, to find the isolation boundary, a log-based threshold ($\mathit{Thresh}$) on the anomaly scores is computed based on a normal reference set ($\mathit{Ref}$) which has not been observed by the Behavior Predictor. To loosen the threshold-based filtering, considering the potential trends in the near future, a confidence interval, called slack, is added to the threshold; the slack value is the maximum difference between the z-scores of the reference set and those of the train and validation sets.

$\mathit{Thresh}^s = \max\big(|\mathrm{MAD}(\bar{Z}^s(\mathit{Ref}))|\big) + \mathit{slack}^s \qquad (5.4)$

$\mathit{slack}^s = \max\big(|\mathrm{MAD}(\bar{Z}^s(\mathit{Ref})) - \mathrm{MAD}(\bar{Z}^s(\mathit{valid}))|,\ |\mathrm{MAD}(\bar{Z}^s(\mathit{Ref})) - \mathrm{MAD}(\bar{Z}^s(\mathit{train}))|\big) \qquad (5.5)$

Online step. Based on the knowledge extracted in the offline step, the online step is focused on flagging the suspicious events of the local systems. Attacks aimed at severely disrupting or damaging the system usually cause cascading effects which may impact the system behavior for a while; this means that an important indicator for filtering out unrelated noise is the duration of anomalous events. Hence, a persistent anomalous interval is more suspicious of being an attack than a single strike caused by sensor noise or predictor faults, while the rate of divergence also matters. Thus, AttackTracker considers a novel approach to perform dynamic flagging based on the tradeoff between deviation and persistence. To this end, it computes the moving average ($\bar{M}^s$) of the absolute z-score vectors ($\bar{Z}^s$) within the temporal window ($w_p$), selected as the average number of steps that an anomalous event is typically expected to last to be associated with an attack. This decision-making strategy helps AttackTracker to identify collective and correlated anomalies as one single attack.

Thus, the first observation ($x^s_i$) with a moving-average score exceeding the threshold is flagged as the start of a suspicious interval.
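A minimal numerical sketch of this scoring logic follows; the variable names, the univariate simplification, and the reading of Eqs. (5.3)-(5.5) are ours, not the reference implementation.

```python
# Sketch of the offline/online scoring steps described above (simplified,
# univariate reading of Eqs. 5.3-5.5); not AttackTracker's implementation.
import numpy as np

def mad(x):
    return np.median(np.abs(x - np.median(x)))

def modified_z(errors, ref_median, ref_mad):
    """Modified z-score of absolute prediction errors (Eq. 5.3)."""
    return 0.6745 * (errors - ref_median) / ref_mad

# --- offline step: learn a threshold from train / validation / reference errors
e_train, e_valid, e_ref = (np.abs(np.random.randn(n)) for n in (1000, 200, 200))
med, m = np.median(e_train), mad(e_train)
z_ref, z_valid, z_train = (modified_z(e, med, m) for e in (e_ref, e_valid, e_train))
slack = max(abs(mad(z_ref) - mad(z_valid)), abs(mad(z_ref) - mad(z_train)))
thresh = np.max(np.abs(z_ref)) + slack

# --- online step: moving average of absolute z-scores over a persistence window
w_p = 40                                   # assumed persistence window length
e_test = np.abs(np.random.randn(500))
z_test = np.abs(modified_z(e_test, med, m))
moving_avg = np.convolve(z_test, np.ones(w_p) / w_p, mode="same")
flags = moving_avg > thresh                # persistent divergence raises a red flag
```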

System-wide scoring

Since complex CPSs include interconnected subsystems, the higher-level detectors ($(l_i)$, $i = 2, 3, \ldots$) efficiently aggregate the results obtained from the individual detectors to point out all the system-wide attacks. For example, there are situations where the whole network is under attack but the individual detectors are unable to detect this scenario because they lose its trace in the distributed nature of the network. Further, simultaneous anomalies pointed out by more than one detector are another strong indicator of potential threats. For this reason, we need to calculate global log-based anomaly scores. The z-score scaling does not guarantee that the local processes follow the same distribution; in fact, their ranges of thresholds and score variations may vary considerably. To illustrate this aspect consider Figure 5.2, which shows the result of the local detectors belonging to the 3rd and 4th stages in a short interval of the test streams. As shown, the scores assigned by the local detectors to the same events are hugely disparate, because of different levels of sensitivity to the events. Moreover, some attacks might be missed by one or another detector due to not being aware of the global system view. To unify these values, each detector $(l_i)$ computes the rate of anomalies ($\bar{R}^s$) in its underlying subsystems as the anomaly scores scaled by their own threshold ($r^s_t = z^s_t / \mathit{Thresh}^s - 1$), so that scaled scores higher than zero represent the system-wide anomaly scores $\bar{A}$:

 1..B <0G(AC ) No anomaly subsystems ¯ : 0 = (5.7) C Í 9 9  9 (AC ),AC > 0 J out of S are anomalies  Finally, the inference engine at the global AD level uses a strategy similar to the one used by local inference engines. A red flag for a system-wide attack will be raised for observations with a moving average, computed based on a system-wide score, higher than zero ( 0).

Figure 5.2: The difference in scoring between the local detectors of stages 3 and 4

5.5 Data Characteristics

In this study, we target the fully operational secure water treatment (SWaT) testbed [109] provided by the Singapore University of Technology and Design. The SWaT data has been collected from a complex real-world facility targeted by a variety of realistic attacks on different parts of the system1. We briefly describe the testbed's general properties here and refer the reader to [109] for further details. The main goal of this plant is to supply drinking water purified through a six-stage process, comprising control and physical components. Each stage includes 1) sensors such as water level meters, flow meters, conductivity analyzers, pH analyzers, and pressure meters; and 2) actuators for operating valves to control the inflow and different types of pumps. The treatment process starts in P1 by taking in raw water and storing it in a tank. Then water quality assessment and chemical dosing are done in P2. Subsequently, P3 removes undesirable materials, and any remaining chlorine is destroyed in P4. Next, the water from P4 is pumped into P5 to reduce inorganic impurities. Eventually, P6 stores the treated water for distribution in a water distribution system.

1There is limited availability of operational datasets in the field of securing CPSs. Even though there are a few datasets, such as the DARPA Intrusion Detection Evaluation Dataset [182] and NSL-KDD99 [23] for IDSs, these datasets are: 1) primarily focused on network traffic and not suitable for CPS IDSs; 2) mostly easy and do not pose any real challenges (some even contain redundant records); 3) often synthetic and thus unable to reflect real-world behavior. Also, there is another set of publicly available CPS datasets provided by the Critical Infrastructure Protection Center at Mississippi State University. However, these datasets are found to contain some unintended patterns that can be used to easily distinguish attacks from non-attacks using machine learning algorithms [109]. Hence, to the best of our knowledge, there is no other publicly available realistic dataset of sufficient complexity from a modern CPS that contains both network traffic data and physical properties of the CPS.

Control units interoperate with the system through a layered communication network. Programmable logic controllers (PLCs) read sensors and trigger actuators at the local layer in each of the six stages, while the global layer combines the local views using a supervisory control and data acquisition (SCADA) system linked to each of the locally operating PLC units. In the data collection process, only network data through wired communications is collected. The data was collected over 11 consecutive days of continuous operation, one data point per second. The recorded logs represent normal operation over the first seven days. During the following four days, the system was targeted by 36 different attacks. Table 5.1 lists four different types of attack scenarios, each lasting from a few minutes to an hour and targeting one or several points in a single stage or multiple stages. The scenarios are referred to as single stage single point (SSSP), single stage multi point (SSMP), multi stage single point (MSSP), and multi stage multi point (MSMP), respectively.

Att. cat.   Freq.   Targets
SSSP        26      MV101, MV504, P102, P102, LT101, T202, LT301, PT301, T401, MV303, MV304, T504, P302, LT401
SSMP        4       (P203, P205), (P201, P203, P205), (LT401, P401), (MV101, LT101), (P101, P102), (P501, T502)
MSSP        2       (P101, LT301), (P302, LT401), (T402, T502), (T401, T502)
MSMP        4       (P602, T301, MV302), (UV401, T502, P501), (LT101, P101, MV201)

Table 5.1: Attack types and their target points

5.5.1 Dataset Exploration

We performed a comprehensive data exploration and analysis on the SWaT dataset to learn about its statistical characteristics, feature distributions, and their stability. In this section, we examine some aspects of the available test set; however, to adhere to a semi-supervised approach, we did not use the knowledge obtained from the test set in the method section. Also, since data collection started from an empty state and the system shows remarkable instability in the first few hours, we exclude this part of the training set. Further, we apply two basic outlier filtering techniques, fast Fourier transformation (FFT) and rolling mean, to the random features to remove short-term noise. Invariant variables. The training set includes ten actuators, mainly backup feeding and dosing pumps, whose status is steady; we exclude these zero-standard-deviation features from our predictive model because the Behavior Predictor would predict the same constant values for them. However, our system continuously monitors this set of features and logs any anomalous event that triggers a change in their constant status. For example, a few attacks in the SWaT dataset affect the NaCl and HCl levels, which in turn affect the settings of the corresponding dosing pumps (P201, P204). Feature distribution. Assuming that unseen data originate from the same distribution as the training data is a typical prerequisite for most predictive machine learning models, since

Figure 5.3: (a) KS test result on scaled feature values of train vs. validation sets, (b) KS test result on scaled first order difference of train and validation sets

unforeseeable statistical property changes may hugely influence the final results. A common method to evaluate the similarity of the training and test distributions is the KS test2 [153,244]; however, since this method needs to observe the whole test set beforehand to prune the drifted feature sets, it is not appropriate in an online stream analysis context. To make sure that the system behavior is predictable over the course of time, we applied the KS test on the training and the validation set. As Figure 5.3 (a) illustrates, the KS difference of many of the features is notable, with significant p-values. But, statistically speaking, this test is sensitive to differences in both the location and the shape of the cumulative distribution, so part of this difference is caused by incomplete temporal dependencies. Thus, we applied the Augmented Dickey-Fuller test3, which confirms that most of the features are non-stationary. In the next step, the Ljung-Box test4 revealed that they are highly auto-correlated. Hence, we applied the KS test on the scaled first order difference of the original features, which removed

2The Kolmogorov-Smirnov (KS) test is a nonparametric test of the equality of one-dimensional probability distributions that can be used to compare a sample with a reference probability distribution, or to compare two samples.

3The Augmented Dickey-Fuller test is a statistical test for the presence of a unit root, i.e., for how strongly a time series is driven by a trend.

4The Ljung-Box test is a statistical test which evaluates whether any group of autocorrelations of a time series is different from zero.

part of the auto-correlation. As Figure 5.3 (b) shows, the distributions of the transformed versions of the train and validation sets are quite similar, with an insignificant difference. Thereby, the soundness of the training set features is confirmed and there is no significant drift in the predictors to be filtered out. Applying a principal component analysis (PCA) clustering on the dataset also confirms that the validation set follows the same distribution as the training set: as Figure 5.4 (a) shows, PCA is not able to partition these two sets from each other, while, as part (b) shows, the normal cases in the test set are easily separable from the training set observations.
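A minimal sketch of these per-feature checks is given below; the file names and lag choice are assumptions, and scaling is omitted for brevity.

```python
# Sketch of the drift and stationarity checks described above (KS test on raw
# and first-order-differenced features, plus ADF and Ljung-Box tests); not the
# thesis code.
import pandas as pd
from scipy.stats import ks_2samp
from statsmodels.tsa.stattools import adfuller
from statsmodels.stats.diagnostic import acorr_ljungbox

train = pd.read_csv("swat_train.csv")          # hypothetical file names
valid = pd.read_csv("swat_valid.csv")

for col in train.select_dtypes("number").columns:
    ks_raw = ks_2samp(train[col], valid[col])
    ks_diff = ks_2samp(train[col].diff().dropna(), valid[col].diff().dropna())
    adf_p = adfuller(train[col].dropna())[1]                  # unit-root p-value
    lb_p = acorr_ljungbox(train[col].dropna(), lags=[20])["lb_pvalue"].iloc[0]
    print(f"{col}: KS raw p={ks_raw.pvalue:.3f}, KS diff p={ks_diff.pvalue:.3f}, "
          f"ADF p={adf_p:.3f}, Ljung-Box p={lb_p:.3f}")
```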

Figure 5.4: (a) PCA clustering on scaled feature values of train vs validation, (b) PCA clustering on train vs normal cases in test set

Therefore, some of the predictors in the test set should be fairly different from the given training set. By way of illustration, consider the feature AIT201, a conductivity analyzer which measures the NaCl level. As Figure 5.5 shows, it has a drastic dropping trend, which causes an entirely dissimilar test set distribution. If we scrutinize the sequence of this predictor, this divergence starts right after attacks 19 and 20, which target AIT504 and hugely affect the NaCl level. Such significant data drifts may influence the results and generate too many false alarms. But, since the test set is not available in stream analysis, all the features have been included in this study, and the applied method decides based on the overall picture of the subsystems, not one specific predictor. Correlated features with delayed cascading influence. SWaT has highly correlated features at the local and system levels, some of which are based on causal relationships. For example, some of the features (sensors) from Stage 5 are highly correlated. To avoid multicollinearity, excluding highly correlated features is a general practice in statistical models, but there are two major drawbacks that make this strategy undesirable in our context. First, if a filtered feature is under an ongoing attack, the model might be able to detect the threat based on the other correlated features, but would not be able to point at the targeted features; this problem is more severe in high-dimensional datasets. Second, since we presume that the majority of the available data in the training phase is normal, we are not sure how an attack may impact the system behavior. In other words, there is a slim chance that an attacker would be able to correctly identify all the other correlated features and force them to stay within the normal range. Hence, to capture potential attacks in the fastest time

Figure 5.5: Feature AIT201 in the train and test sets with attacked subsections in red

possible, simultaneous monitoring of all the signals is required. However, to mitigate the false alarm rate, the detector model should be advanced enough to handle the latent correlations. As an example, attack #28, presented in Figure 5.6, shows that a few seconds after P302 is closed by the attackers, the DPIT301 status changes, but its influence propagates to subsystems 4 and 5 with a delay of about 35 minutes. In contrast, for the same targeted stage in attack #23, the DPIT301 value is affected, but the problem does not cascade through the system because the duration of the threat is shorter than the cascading delay interval.

Figure 5.6: Delayed cascaded attack impact in the subsequent stages

Long-term Recovery Phases. Due to inter-system dependencies and the impact of events on the system, after each attack it takes a while for the system to recover and return to its normal operation.

For example, as Figure 5.7 illustrates, during the recovery period of two different attacks, MSMP30 and SSSP8, with {LIT101, P101, MV201} and {DPIT301} respectively as attack targets, the system does not behave normally and AttackTracker may keep the red flag on. While this is correct, it also causes false positives (FP) in the log-based analysis. Not raising a red flag in the recovery phase can be very challenging. The fact that some of the attacks have been launched while the system was unstable and recovering from a previous attack potentially increases the complexity of our problem.

Figure 5.7: Real examples of recovery interval

Unsupervised clustering. To determine the separability of under-attack intervals from normal operation, we applied t-SNE on the raw and window-based sequences of this dataset. Figure 5.8 represents the two principal t-SNE dimensions5 of attack versus normal data points. Some of the attacks, depending on their feature values and correlations, are easily separable from the normal cases, while most of the tricky cases are tightly interwoven and obscured within the normal observations. Being highly dependent on various external factors, the observed system behavior is stochastic, which makes finding potential attacks a tricky and challenging task requiring a customized analysis method.

5t-distributed stochastic neighbor embedding (t-SNE) is a non-linear technique for the visualization of high-dimensional datasets.
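A sketch of how such a projection could be obtained with scikit-learn follows; the file name, label column, and perplexity are assumptions.

```python
# Sketch of projecting scaled observations to two t-SNE dimensions to inspect
# attack vs. normal separability; not the thesis code.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE

df = pd.read_csv("swat_test.csv")                 # hypothetical labeled test set
X = StandardScaler().fit_transform(df.drop(columns=["label"]))
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# emb[:, 0] and emb[:, 1] can then be plotted, coloring points by df["label"],
# to reproduce a view like Figure 5.8 (attacks highlighted in red).
```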

Figure 5.8: A t-SNE of normal and attack observations with attacks highlighted in red

5.6 Experiments

In our experimental evaluation, we compare AttackTracker to the other research performed on the SWaT testbed. Since none of the techniques were fully reproducible or available to apply, the results reported in the corresponding literature are taken as the peer methods' performance. And, as in the other studies, we have ignored the attacks which have no impact on the physical system behavior. Experimental settings. In our experiments, each of the six local SWaT stages is considered a separate subsystem. We follow a 2-level hierarchy and train separate MTCN models on each local stage to predict the upcoming sensor values, aggregating the results at the system level; this configuration is called AttackTracker2. However, to find the best level of granularity for the whole system, two other model configurations are also analyzed: 1. AttackTracker1: the whole system is considered one subsystem and one Behavior Predictor is trained; 2. AttackTracker3: each feature is considered a subsystem and the model is trained at three hierarchy levels. Further, to verify the influence of our proposed scoring and labeling method, the attack detection experiments use a modified version of the three aforementioned settings, called MTCN1, MTCN2, and MTCN3, which apply a basic scaled prediction error and mean-based aggregation to detect attacks. As aforementioned, the SWaT signal set has been used in several cyberattack detection studies, the reported performance of which we compare to AttackTracker. Further, the Behavior Predictor hyper-parameters are optimized based on the loss value of the first validation set. The optimization process is based on a grid search, so there is potentially room for further improving the behavior prediction results. The lookback window size ($w$) is 100 and $w_p$ is set to 40. Some of the fixed hyper-parameters across all of the MTCNs include: kernel sizes set to the constant value of 8 with a filter size of 24 to avoid being very local, and dilation factors d = 1, 2, 4, 8, 16. Model parameters are optimized using the RMSE loss value through Stochastic Gradient Descent with ADAM step updates, and the activation

functions are normalized ReLU. Last but not least, two transformations are applied to the input data. The first-order difference is computed to remove trends and varying means. Then, these values are normalized (using a standard scaler) to stabilize the variance.
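A minimal sketch of these two input transformations follows; the file name and column selection are assumptions.

```python
# Sketch of the input preprocessing described above: first-order differencing
# followed by standard scaling; not the thesis code.
import pandas as pd
from sklearn.preprocessing import StandardScaler

raw = pd.read_csv("swat_train.csv")               # hypothetical sensor log
numeric = raw.select_dtypes("number")

diffed = numeric.diff().dropna()                  # remove trends / varying means
scaler = StandardScaler().fit(diffed)             # fit on training data only
scaled = pd.DataFrame(scaler.transform(diffed),
                      columns=diffed.columns, index=diffed.index)

# At inference time the same scaler (and the last raw value, to undo the
# differencing) would be applied to the incoming stream before the MTCN.
```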

5.6.1 Behavior predictor evaluation

First and foremost, we evaluate the performance of the Behavior Predictor to justify the use of MTCN. To this end, we utilize the two error measures of root mean square error (RMSE) and mean absolute error (MAE) as the success criteria for the comparison, which use the full spectrum of the outcomes. As Table 5.2 shows, MTCN outperforms the multivariate LSTM, support vector regressor (SVR), and random forest regressor (RF) by achieving 0.006 and 0.004 as RMSE and MAE, respectively.

Table 5.2: Behavior Predictor performance for standard scaled values

Model   Train MAE   Train RMSE   Validation MAE   Validation RMSE
RF      0.012       0.016        0.013            0.016
SVR     0.007       0.008        0.009            0.013
LSTM    0.003       0.004        0.006            0.009
MTCN    0.002       0.002        0.004            0.006

5.6.2 Log-based evaluation

Figure 5.9: An example of a logical detector vs. a fluky detector

This subsection presents the log-based evaluation of AttackTracker's performance in comparison with the related studies and peer methods. The evaluations are performed based on binary labels because a unified anomaly score for the ground truth is not available. As Table 5.3 shows, AttackTracker outperforms the other state-of-the-art techniques by achieving an outstanding F1-score with a very acceptable recall and precision tradeoff. Another key point to note is the difference in performance between the MTCN detectors and their respective AttackTracker configurations, which highlights the influence of the inference engine algorithm. This improvement is mainly due to the persistence-aware scoring computations, which filter out potential short-term spikes caused by noise or model errors. It is worth noting that the authors

of [161] report a very high F1-score based on 1D-CNN. However, the reported scores in their study are not log-based and are computed based on the attack entities, similar to the evaluation approach presented in Section 5.6.3, but on the basis of false positive (FP) and true positive (TP) metrics. They also ignored the recovery windows in the recorded results, which can hugely influence the reported scores.

Table 5.3: The log-based performance of AttackTracker

Model                 Precision   Recall   F-measure
TABOR [181]           0.862       0.788    0.823
DNN [139]             0.983       0.678    0.803
one-class SVM [139]   0.92        0.699    0.796

MTCN1                 0.709       0.917    0.801
MTCN2                 0.64        0.986    0.789
MTCN3                 0.691       0.982    0.811

AttackTracker1        0.91        0.813    0.859
AttackTracker2        0.947       0.855    0.898
AttackTracker3        0.869       0.843    0.856

5.6.3 Event-based evaluation

AD-based cyberattack detection goes beyond anomaly detection, and there is no value in achieving a very high precision and recall just by chance through dispersed labeling of observations, because security threats are mostly related to collective and correlated anomalies. As Figure 5.9 illustrates, a hypothetical detector, called A, could attain almost the same F1-score as B by coincidence. Obviously, the fluky detector A generates many scattered false alarms over the time span, which is not desired, while detector B systematically chases the event-based correlated and collective anomalies in order to raise significant and intuitive alarms. Besides missing the collective aspects of attacks when analyzing the labels assigned by AD methods, log-based evaluation is not appropriate in an unbalanced context, which is the basic property of attack-labeled datasets like SWaT. Moreover, when small randomness in the results is unavoidable or close calls are possible, which is highly probable in AD contexts, distinct labeling may not be a reliable source to measure the quality of one method against another. As aforementioned, the recovery of subsystems and features under attack may take some time. If the attack detection system properly keeps the red flag on during this period, that may result in a high volume of FPs in the log-based analysis. Hence, in [181], an event-based FP is defined as a detected subsequence that does not overlap with any attack scenario; the TPs are analyzed based on two detailed metrics, coverage proportion (CP) and penalty score (PS). CP evaluates the quality of detection coverage based on the fraction of overlap over the total length of the ground-truth scenarios (higher is better), while PS indicates the length of detection outside the overlapped interval (lower is better).
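A minimal sketch of how we read these two metrics follows, using hypothetical detection and ground-truth intervals expressed as (start, end) index pairs; it is our simplified reading of [181], not its reference implementation.

```python
# Sketch of the event-based metrics as described above. Intervals are
# (start, end) index pairs; overlapping intervals within one list are not handled.
def coverage_and_penalty(detections, attacks):
    """CP: overlapped length / total attack length; PS: detected length outside attacks."""
    def overlap(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    total_attack = sum(e - s for s, e in attacks)
    covered = sum(overlap(d, a) for d in detections for a in attacks)
    detected = sum(e - s for s, e in detections)
    cp = covered / total_attack
    ps = detected - covered          # detection length outside any attack interval
    return cp, ps

# Hypothetical example: one attack partially covered, one detection overshooting.
print(coverage_and_penalty(detections=[(100, 180), (400, 420)],
                           attacks=[(90, 200), (500, 560)]))
```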

We evaluate AttackTracker's delay in detecting attacks according to another metric, referred to as the average delay, which is a highly crucial factor in real-time AD but is missing from the other measures. As the results in Table 5.4 demonstrate, our method outperforms TABOR in CP and PS, while raising only a single false red flag. AttackTracker's average delay is 18 seconds; TABOR's delay is not mentioned in [181].

Table 5.4: Event-based performance of AttackTracker

Model            FP   CP (%)   PS (s)   Avg. delay (s)
TABOR [181]      0    19.5     831      NA
MTCN2            7    61       1543     4
AttackTracker2   1    37       342      18

5.6.4 Discussion

The high quality of AttackTracker rests on several principal factors: 1) It utilizes a highly accurate Behavior Predictor, which is able to precisely forecast subsequent observations in multivariate settings. To the best of our knowledge, our experiments demonstrate, for the first time, the forecasting capabilities of a multivariate TCN in the presence of highly correlated features. 2) It utilizes a novel inference engine for dynamic attack scoring. The anomaly scoring and labeling process has been overlooked in most profiling-based AD studies; the benefits of systematic anomaly scoring for improved accuracy are exemplified by the time-wise anomaly scoring of data points used by AttackTracker. 3) By utilizing a hierarchical structure, it can trace attacks at different levels of granularity. While each local detector learns the detailed behavior of its subsystem, the aggregation of their status provides a bird's-eye view of the system situation. Another applied contribution of our study is improving the accuracy of cyberattack detection on the challenging SWaT testbed. We are aware that the results of our study may have certain limitations, as follows. First, as for generalizability, one may argue that the experimental results obtained on SWaT may not carry over directly to other types of critical infrastructure and CPS, because most of the anomalies in SWaT arise from injected attacks and AttackTracker's performance has not been verified for other types of attacks. We believe that not making any specific presumption about the underlying dataset or the type of attacks dissociates AttackTracker from specific system characteristics. Still, we realize further experiments are needed to verify the performance of AttackTracker on a wider range of real-world CPS and IoT datasets to cover additional types of attacks and anomalies. Second, the current average attack detection delay is 18 steps, which might not be practical in some highly critical application domains. As we argued, in the context of the SWaT testbed and probably many similar CPSs, AttackTracker can be considered to operate in near real-time because of its cascading delay tolerance. Although the delay can be decreased by tuning the model, we aim to improve the scoring technique toward decreasing the delay in attack detection without losing precision.

Also, we are planning to enhance AttackTracker's performance in three other ways: 1) Improving the interpretability of the results by highlighting the attack target and its potential cascading influences; this insightful information helps the end-user to choose the best mitigative action. 2) Boosting the Behavior Predictor component to find potential drifts in the stream data and adapt itself so as not to be fooled by attacks. 3) Enhancing higher-level detectors by utilizing Behavior Predictors to trace the collective behavior of their underlying subsystems. Accordingly, an inference engine should decide based on a multi-modal view provided by its associated Behavior Predictor and the underlying detectors.

5.7 Conclusions

Behavior-based intrusion detection complements signature-based detection in dealing with increasingly common zero-day exploits. AttackTracker presents a novel, distributed online attack detection framework for processing supervisory control signals from the continuous operation of cyber-physical systems. Dynamic attack scoring boosts the analytic performance to efficiently detect collective attacks by orchestrating a network of detectors and reduces the false alarm rate by ignoring potential contextual noise and errors in the Behavior Predictors. Being able to handle regular spikes of observed 'anomalies' in cases where Behavior Predictors have not captured all the patterns of normal data is another advantage of our framework.

AttackTracker significantly outperforms the other state-of-the-art methods and studies applied to the SWaT testbed by sustaining a high rate of precision and recall, while achieving a 7% log-based F1-score improvement. Further, it raises only one false alarm event, while it is able to find all of the attacks that physically impact the system. Cybersecurity research is a principal area of innovation to protect our economy and enhance the security of societies worldwide. Building a cyber situational awareness platform that addresses vital elements of physical and cyber threats to critical infrastructure is a crucial problem that cannot be overlooked. We believe the proposed intrusion detection framework opens new perspectives for advanced cyber-physical security, not only in the operation of common and widely deployed critical infrastructure but also in other applications of IoT systems.

Chapter 6

Anomaly Detection Explanation

High-performing, complex, black-box models are prevalent and inevitable. Despite the outstanding successes of advanced methods in many areas, there are critical doubts about their reliability when they are exposed to adversarial attacks. In particular, by assigning nonzero weights to useless features or out-of-range values, deep learning models become well-known targets for attackers. Besides, achieving better numerical results without presenting the intuition behind the applied decisions is not a sufficient and worthy contribution. Acknowledging that such techniques are not transparent and fully mature, and that their transition into practice might take time, developing explainable algorithms to clarify the decision-making logic of such complex models is warranted, especially in large-scale and/or critical deployments.

However, it is highly challenging to explain temporal anomaly detection techniques. On the one hand, it faces the difficulties of anomaly detection, such as ambiguity and unclear definitions [299]. On the other hand, extracting patterns from highly auto-correlated and stochastic temporal sequences adds to the complexity of the problem [56, 298]. Furthermore, if the understanding stemming from a given explanation is not in any sense a true representative of the existing state of affairs, then the explanation is misguiding and not genuine. Therefore, this chapter describes a set of our studies that target the problem of temporal anomaly detection explainability from different perspectives. In Section 6.3, I propose a fully model-agnostic framework to explain any arbitrary model. In the next step, Section 6.4 is focused on explaining an autoencoder's results. Finally, in Section 6.5, I briefly explore the idea of a causal explainer on the basis of a causal profile discovery framework.

6.1 Introduction

Consider a multidimensional dataset of tabular features, such as signals continuously recorded by sensors in a highly interconnected system, which contains data points labeled as anomalous by an arbitrary black-box detector: (1) what is the best way to describe these anomalies and their differentiating factors to a human, and (2) how can the results be simultaneously succinct and effectively interpretable?

Anomaly detection (AD) has a rich history within the data mining and statistical machine learning communities [35, 205]. Various advanced standalone and ensemble time-aware, AI-based AD methods that learn data patterns and their evolution over time to precisely locate anomalous events in their temporal context with the lowest possible rate of false positives have recently been developed [50, 298]. Such key methods are urgently required for various event-sensitive scenarios such as causal disease discovery, robotic system monitoring, heterogeneous networks, and data center security [47, 152, 300]. However, the increasing diversity of data sources and associated stochastic processes is making this pivotal topic in data analysis more challenging than ever.

Considering the existing challenges and vulnerability to adversarial attacks, AD should be transparent and verifiable to justify trusting the outcome. Thus, a detector is expected to build trust by involving the human analyst in the predictive process. To this end, a model should provide meaningful insight into its decision-making process and reasoning based on its input features. Meanwhile, not clarifying the internal behavior of complex black-box models makes them appear mysterious to the user; such models are doomed to be abandoned even if only a few unpredictable and inexplicable mistakes occur. For example, it is not easy to interrupt or shut down a critical control system just because a normally accurate AI model predicts some unknown, enigmatic risk or threat. This challenge could be even more difficult if the user needs to take action but is unsure of the risk factor. This is why most advanced AI methods, such as deep neural networks that learn from knowledge and extract complex patterns without any explicit programming, are intractable and uninterpretable black boxes [78]. In other words, they imprison knowledge and intuition and only offer the final results.

In the last few years, the topic of explainable artificial intelligence (XAI) has been highly appreciated in AI communities, and various strategies, frameworks, and measures have been proposed to explain or assess the interpretability of black-box models [78]. Given that the black-box nature of such complex AI-based techniques is far from being interpretable [16], wrapper explainers, also called post-hoc explainers, are considered to bridge black-box machines and humans. This has been performed through rule-based and mathematical formalizations, feature attribution, visual explanations, etc. [16, 78]. Such explanations can be local or global [78], dependent on a specific model [257], or fully independent of the black-box [235].

While there is considerable prior research on both XAI [16, 120] and AD [50, 298], the literature on AD interpretation is comparably sparse. An XAI model for AD should be able to handle the complex challenges of the open-ended definition of anomalies [128] and the fundamental ambiguity about the actual nature of an anomaly and its potential boundaries with normal events. Anomalous events might originate from heterogeneous statistical or structural sources, result from various causes, and occur in very different and even difficult-to-predict ways [176]. Moreover, such contexts require an unsupervised or semi-supervised technique, which might be extremely challenging or impossible for existing XAI techniques, as a labeled training set is not available for their explanation process.
Given that many advanced AD models utilize sequential or temporal aspects [298], their feature space is highly autocorrelated and too complex to be explained by generic and model-agnostic XAI

methods [121, 235, 249]. In addition, increasing the size of the temporal window provides the AI model with extra knowledge of the data, but might cause a loss of information about the exact location of the attack and decrease the model's interpretability. This is because the alarm is not necessarily raised at the exact time at which the anomalous event is observed. Also, if a model misses a few of the actual anomalous points due to inconsistency, it might generate false alarms for subsequent normal events.

Motivation. To the best of our knowledge, within the literature on anomaly explanation or description, there are no reports of model-agnostic XAI for temporal AD, let alone for explaining anomalies in multi-dimensional stream data. Most of the generic explainability frameworks are based on simplistic presumptions such as independent and identically distributed (i.i.d.) random variables, balanced datasets, linearity, and supervised settings, which contradict the challenging nature of anomaly detection in stream data due to various factors such as non-linear and non-stationary processes, latent confounding causes, and potential noise.

6.2 Background

If an event with very rare properties is observed, an anomaly detection algorithm raises a red flag. The rarity of an event may be latent with respect to various properties or patterns associated with the feature values or their context. Nowadays, advanced methods such as complex deep learning models are powerful enough to discover such differences but are far from transparent. Anomaly detection faces fundamental challenges such as complex latent patterns, concept drift, and overfitting that can mislead the model and cause a high false alarm rate [260]. Furthermore, the restrictive closed-world assumption [284], which only specifies the normal profile and considers anything else a negative case, can increase the chance of false alarms. Therefore, not knowing a model's internal process deprives the end-user of the ability to become familiar with the model's logic and trust it. Moreover, such obscurities restrict developers from addressing the model's potential vulnerabilities.

Another major barrier to applying advanced anomaly detectors in critical systems is their vulnerability to adversarial scenarios. Well-performing complex models might be easily fooled by very slight perturbations that are imperceptible to humans [212, 291]. Thus, such approaches may not be effective or reliable against various types of attacks.

General Background. Deciphering the black-box of machine learning models is one of the most challenging, top-priority, and critical problems. This is not a new problem; one of the earliest black-box puzzles goes back to Dean Pomerleau's 1991 tussle with teaching a computer how to drive. Complexity-wise, that problem was simple compared to deep learning in the big data era, which has broad applications in different contexts such as security systems, self-driving cars, and creative systems [46]. The purpose of explaining a system or model can vary depending on the type of users addressed by the explainer. For instance, the work in [261] categorizes such purposes into five common types: transparency, learning, conceptualization, justification, and

relevance. This study focuses on the combination of transparency and relevance, but the results provided can also implicitly be applied toward justification of the system.

A few promising approaches have been proposed to improve the explainability of AI models. In order to deeply understand the key criteria and challenges for developing such an approach, we direct the reader to previous comprehensive studies [119, 120, 229] that have scrutinized the explainability context; herein, we review the major concepts and highly relevant techniques. The existing techniques can be classified according to several aspects. Concerning the coverage span, explainers may be global or local. A global explainer provides a generic explanation using interpretable models [61] or rulesets [165]. Meanwhile, a local explainer interprets a specific decision and is faithful to the black-box model, although there is no guarantee that the same explanation holds true throughout the whole domain space or even the proximate neighborhood [272]. On the other hand, these techniques can be categorized as either model-dependent or model-agnostic. The former are customized for a specific method—for example, attention models [20] or some other previously reported techniques [103, 194]. Being aware of the internal process helps to produce highly accurate and advanced explanations, although such techniques are inapplicable to an arbitrary black-box. Meanwhile, model-agnostic techniques can interpret an arbitrary AI model as a black-box and are transferable. Examples of such techniques are LIME [234] and LEMNA [121], which provide the set of essential features that influence the black-box model's decision in the local area.

Early XAI techniques focused on explaining black-box models on the basis of common rules [36, 199, 273] or visualizing the most important features [206], such as visualizing Bayesian models using nomograms. Later on, modern techniques such as LIME [234] and SHAP [189] aimed to find the most important features in the neighborhood of a specific data point by applying iterative perturbations. Therefore, these techniques may suffer from being too sensitive to non-influential features, which can consequently lead to unreliable explanations [279] and fail to provide faithful and local interpretations. Also, given its focus on authentication problems, LIME is customized to handle binary decisions such as image processing; therefore, it assumes independent and identically distributed (i.i.d.) data, which is not the case in most real-world scenarios. Moreover, LIME is unable to handle the "curse of dimensionality", and the surrogate model distributes the feature importance among all the features. On the other hand, SHAP [189] introduces Shapley values from coalitional game theory as the unique solution in the class of additive feature explanation methods. However, SHAP has some critical drawbacks: its exact computation is exponential, and substituting feature values with all previously observed values can violate the locality assumption.

Nevertheless, most of the aforementioned model-agnostic explainers are unable to handle anomaly detection in the stream analysis context for three reasons. First, unsupervised techniques are mainly focused on generative features instead of decision boundaries, which are used by the existing classification-based explainers (e.g., perturbation techniques). Second, they cannot handle

the unbalanced distribution of anomalies. Third, they ignore the impact of contrastive contexts on, or potential correlations with, the isolating factors in the provided interpretation.

Anomaly detection explanation. A great number of studies have aimed to improve the efficiency of anomaly detection algorithms, but explanations of the reasons behind the anomalousness of the detected instances and their distinguishing features have not received enough attention. To date, a few studies have investigated the explainability of the identified anomalies, especially in numerical applications, but they are not strong enough to explain an arbitrary semi-supervised AD model. One of the earliest studies in this context [158] investigated the type of knowledge required by the end-user to understand anomalies and their differences. Moreover, that study categorized anomalies as strong or weak according to distance-based algorithms. A Fisher linear discriminant classifier is applied by [64] to fully separate an anomaly from its neighboring samples determined using quadratic entropy. Another study applied classification accuracy to interpret the result obtained from an anomaly detection model [186]. However, balanced training sets created by generating artificial samples around the anomalous point were required. Meanwhile, an exceptionality score [13] is defined as a measure that highlights the extent of the difference in the feature distributions of outliers compared to those of inliers. Thus, this attribution can characterize the abnormality level of the given anomalous point or group. Based on the infrequency property of feature combinations in the outlier events, two main steps are taken to highlight outliers [13]: (1) generating base conditions, which includes conditions associated with each feature directing toward an anomaly, and (2) combining base conditions, which combines the conditions to form relevant explanations that represent a property of an individual. In other words, an infrequency property should be able to highlight the abnormality of an object if its combination of attributes is far enough from the normal distribution.

COIN has been proposed as a detector-agnostic explainer that interprets an outlier on the basis of three aspects: its outlier score, the attributes contributing to its abnormality, and its contextual neighborhood [186]. This algorithm clusters the normal points in the anomaly's neighborhood and achieves interpretation by solving a series of classification tasks, like LIME. The major drawback of COIN is that it performs synthetic upsampling to apply the classifiers; this may result in a loss of the correct isolating features. Sapling random forest is an ensemble of specifically trained binary classification trees that explains an anomaly on the basis of a subset of features in which the sample deviates most from the normal cases [159]. Sapling random forest recursively applies iForest and subsequently explains a specific data point by a conjunction of the atomic conditions defined by the extracted features. LOOKOUT is another method that aims to present interpretable pictorial explanations of outlying behavior to the limited attention of human analysts [124]. This model outputs a few plots from purposefully chosen feature subspaces on the basis of a plot-selection objective.
Meanwhile, model-specific attempts to explain differential temporal models, like attention techniques [134], estimating gradients of stochastic computations [87, 250], or reconstruction-based methods [265], are another line of XAI research, which is only focused on deep learning models.

Despite the considerable performance of some of these techniques, their application is limited and cannot explain an arbitrary temporal AD model. In summary, most of the existing anomaly detection explainers, configured for supervised settings, focus on explaining the anomalous observations distinguished by a classifier. However, this is not the case in the context of stream analysis using a semi-supervised model.

6.3 Dynamic Temporal Anomaly Detection Explainer

In light of the points developed in the previous sections, in this study we formulate the explainability aspect of temporal anomaly detection in order to address the abovementioned challenges. In addition, we propose a framework for providing precise and simple explanations of anomalous behavior in multidimensional, real-world datasets while respecting the sequential relations in stream data. The major contributions of the present study are as follows:

• The explainability problem of temporal collective anomaly detection, which is in demand in stream data analysis, is realized and formulated.

• The explanation algorithm reported herein is both domain- and detector-agnostic; furthermore, to our knowledge, this is the first model-agnostic study to explain anomaly scoring in temporal data.

• Case-based and counterfactual reasoning are combined to highlight the isolation level of the input.

• The proposed solution can be internally applied to score anomalies; in this condition, it can be considered as an in-place explanation (i.e., an interpretable technique).

6.3.1 Problem definition

In this section, we propose a model-agnostic explainer that dynamically generates interpretable explanations for unsupervised temporal anomaly detection problems. In general, we assume a multivariate data stream X = ⟨x_t : t ∈ T⟩ interpreted as consecutive observations at relative time points, T ⊆ ℕ. Formally, each x_t is an m-dimensional vector, x_t ∈ ℝ^m, where x_t^k refers to the value of the k-th variable of x_t for 1 ≤ k ≤ m. Consider a black-box method BB_AD trained on observations O ⊂ X to determine the anomaly status of X at the current time point i according to the sequence of observations within a given look-back window w_i = ⟨x_(i−p), ..., x_i⟩ of length p. The status is expressed as a scaled anomaly score, α_i, and an anomaly label, β_i, assigned to w_i:

BB_AD(w_i) → (α_i, β_i)    (6.1)

BB_AD calculates the anomaly status on the basis of its internal logic learned from the training set associated with X. Therefore, the model's input contains causes as the particular feature configurations for this instance that the model used for prediction. These causal relationships between the

assigned status (α_i, β_i) and the related observations from w_i are to be explained as interpretable characteristics in order to make the black-box intelligible. Such interpretable characteristics should enable a comprehensible causal chain of reasoning—for example, as in "having a spike bigger than the threshold isolates this observation from the expectation".

Let f_i be the set of all possible causal characteristics that can assign (α_i, β_i) to w_i. Then, the goal is to identify the most significant, conspicuous, and ideally intuitive causal characteristics f_i^*, f_i^* ⊆ f_i, associated with Expression 6.1. Specifically, if the distinguishing causal characteristics (or their underlying patterns) disappear from the input, the black-box's result may change significantly. This goal is referred to as the Temporal Anomaly Detection Explainer (TADExp). Thus, given w_i as input, f_i^* is supposed to be provided as follows:

TADExp(w_i, BB_AD) → f_i^*    (6.2)

Furthermore, the model's local reasoning may differ from one input w_i to another w_j owing to various significant latent patterns in local spaces (i.e., f_i^* may differ from f_j^* for j ≠ i).
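To make the assumed interface concrete, the following minimal Python sketch shows the kind of callable BB_AD that TADExp treats as a black-box; the class name, the wrapped model, and its score() method are illustrative placeholders rather than part of any specific detector.

import numpy as np

class BlackBoxAD:
    """Hypothetical wrapper for an arbitrary detector BB_AD.

    Given a look-back window w_i of shape (p + 1, m), it returns a scaled
    anomaly score alpha_i in [0, 1] and an anomaly label beta_i.
    """

    def __init__(self, model, threshold: float = 0.5):
        self.model = model          # any fitted forecaster/reconstructor (assumed to exist)
        self.threshold = threshold  # score above which the window is flagged

    def __call__(self, window: np.ndarray):
        alpha = float(self.model.score(window))   # assumed scoring method of the wrapped model
        beta = "Anomaly" if alpha >= self.threshold else "Normal"
        return alpha, beta

Any detector that maps a look-back window to a scaled score and a label can be wrapped this way, which is all TADExp requires.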

6.3.2 Addressed Challenges

The restrictive closed-world assumption [284] and other fundamental challenges that anomaly detection faces can mislead the model and generate a high false-alarm rate [260]. Moreover, the vulnerability of advanced anomaly detection techniques to adversarial scenarios is a major barrier to their application in critical systems. Well-performing complex models might be easily fooled by very slight, humanly imperceptible perturbations [212, 291]. Therefore, such approaches might not be completely effective or robust against various types of attacks. To the best of our knowledge, within the literature related to anomaly explanation or description, no model-agnostic anomaly detection XAI for time series has been reported, let alone an explanation of anomalies in multidimensional stream data. Most generic explainability frameworks are based on simplistic presumptions such as independent and identically distributed random variables, balanced datasets, linearity, and supervised settings, which are unrepresentative of the challenging nature of anomaly detection in stream data due to nonlinear and nonstationary processes, latent confounding causes, potential noise, etc. Moreover, anomaly detection explainers are configured for supervised settings and focus on explaining the anomalous observations distinguished by the classifier model. Meanwhile, this is not the case in the context of stream analysis using a semi-supervised model.

6.3.3 Interpretation approach

Theory. The outlying status of isolated events might occur for various reasons; notable anomalous scenarios are as follows:

• Potential short- or long-term anomalies, also called trend breakouts or changes, which are significant in the marginal distribution of univariate time series data.

P(X^i ∈ A) is unlikely, A ⊆ ℝ^i

• Multivariate changes in feature associations; in certain cases, the marginal distributions remain the same, but the joint feature distribution might differ (e.g., covariance changes).

P((X^j, ..., X^k) ∈ A) is unlikely, A ⊆ ℝ^(j..k)

• Multivariate changes of temporal aspect; some anomalies may be latent in the tails of the time dimension, for example, outside the temporal context or out-of-order events.

P((X^j_(i..l), ..., X^k_(i..l)) ∈ A) is unlikely, A ⊆ ℝ^(j..k) and {i..l} ∈ T

Some of the known temporal change scenarios may cause a variety of anomalies. For example, a trend mean-shift occurs because of sudden process changes. A trend ramp up/down is a gradual transition from one stable state to another, in contrast to the mean-shift. Pulses are mainly related to specific and sharp changes, such as the predictor state transitioning (i.e., increasing or decreasing) from its typical state for a while and then reverting to the normal status after a certain interval; this break can last from a very short peak/valley to a wider sinusoidal or stage peak.
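As an illustration of these change scenarios, the short Python sketch below injects a mean-shift, a ramp, and a pulse into a clean signal; the interval boundaries and scale factors are arbitrary choices for demonstration and are unrelated to the generator used later in Section 6.3.4.

import numpy as np

def inject_mean_shift(x, start, length, scale):
    # Sudden process change: shift the level of an interval.
    y = x.copy()
    y[start:start + length] += scale * x.std()
    return y

def inject_ramp(x, start, length, scale):
    # Gradual transition from one stable state to another.
    y = x.copy()
    y[start:start + length] += np.linspace(0.0, scale * x.std(), length)
    return y

def inject_pulse(x, start, length, scale):
    # Sharp excursion that reverts to the normal level after a short interval.
    y = x.copy()
    y[start:start + length] += scale * x.std() * np.hanning(length)
    return y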

Figure 6.1: TADExp Framework

Explanation definition

The main goal of explaining an anomaly detection technique, BB_AD, is to highlight the reasons behind its anomaly/normal claims about the given input w_i. An anomalous input stands out from the normal cases according to its isolating characteristics, but normal events should only be justified based on their normality in comparison to other normal events. Thus, a semi-supervised explanation can be formally defined based on BB_AD's normal expectation in the given temporal context, {w'_i | β'_i = Normal}. To this end, an intuitive and interpretable representation can be configured as follows:

• Difference analysis. To delineate why an input w_i is anomalous (β_i = Anomaly), a user needs to know how it could be converted into a normal case in its context ζ_i with minimum cost, w_i ⇒(Mindist) w'_i. Ideally, the corresponding normal observation(s) w'_i ∈ W'_i, which represent the model's expectations, should have the minimum distance relative to the given input.

• Similarity analysis. To delineate why an input w_i is normal (β_i = Normal), the user expects to see patterns similar to w_i in the cases w'_i previously observed by the model.

Therefore, w'_i should have the following property:

{w'_i | BB_AD(w'_i) → (α'_i, Normal)  &  w_i ⇒(Mindist) w'_i}    (6.3)

Interpretable data representations

It is important to distinguish between the model's input and interpretable data representations when explaining a black-box model. An interpretable explanation must employ a humanly comprehensible representation regardless of the actual features used by the model. Analyzing the raw time series is computationally expensive and unintuitive in terms of the features contributing to the task of interest. However, deeper-encoded theoretical concepts and measures derived from the shapes of temporal sub-sequences can enhance the quality of the provided explanations and extend our understanding of sequential and temporal patterns. For example, we might learn that suspicious cyberattacks have signal intervals with lower entropy, or include some fluctuations following the onset of a trend ramp-up. Therefore, TADExp utilizes a temporal feature generator (TFG) component to abbreviate each input window, w_i, into a set of artificial measures (ω_i = TFG(w_i)) as its unique signature by capturing both "local" and "global" characteristics. After thoroughly reviewing the literature [58, 73, 97, 230, 231, 277] on quantitative statistical features of time series, we utilize the following measures as the core engine to quantify a wide range of time-series properties: serial correlation, skewness, kurtosis, mean, median, standard deviation, chaos, energy, entropy or majority (for categorical features), periodicity, number of peaks, minimum, and maximum. Moreover, the same type of artificial measures is generated based on highly associated features. This measure set is extendable or replaceable with more complex functions [58] according to the user's interest. However, the user

should be aware of each representation's benefit in terms of the type of understanding that it can provide about a given problem. Next, given the feature correlation and prediction strength analysis, the final set of measures is summarized on the basis of multiple test procedures. Some advantages of this approach are that it (1) captures the most essential information independent of the window size, (2) is relatively insensitive to potential noise or missing values, and (3) is highly intuitive because of the type of the extracted interpretable features. It is worth noting that any choice of interpretable representations has some limitations. Especially in the presence of a complex BB_AD, the generated temporal features may not be sufficiently advanced to truly represent the black-box's behavior.
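As a minimal illustration of the TFG idea, the sketch below condenses a univariate window into a dictionary of some of the listed measures using NumPy and SciPy; the actual TFG also covers categorical, chaos, periodicity, and cross-feature measures, which are omitted here.

import numpy as np
from scipy import stats

def tfg_signature(window: np.ndarray) -> dict:
    """Summarize one window of a single feature into interpretable measures."""
    # count of local peaks: points larger than both of their neighbours
    n_peaks = int(np.sum((window[1:-1] > window[:-2]) & (window[1:-1] > window[2:])))
    return {
        "mean": float(np.mean(window)),
        "median": float(np.median(window)),
        "std": float(np.std(window)),
        "skewness": float(stats.skew(window)),
        "kurtosis": float(stats.kurtosis(window)),
        "serial_corr": float(np.corrcoef(window[:-1], window[1:])[0, 1]),
        "energy": float(np.sum(window ** 2)),
        "n_peaks": n_peaks,
        "min": float(np.min(window)),
        "max": float(np.max(window)),
    }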

Fidelity-interpretability trade-off

As mentioned earlier, TADExp acts over a set of interpretable characteristics, f, extracted from the original temporal inputs W. Since not every combination of f and TADExp may be simple enough to be understandable by humans, we clarify the existing fidelity–interpretability trade-off by following the same notation as previously reported [234]. In the following, we omit the index i for convenience. Let us consider π_w as a proximity measure between w and the instances in its local cluster, w' ∈ ζ(w). To make the explanation faithful, the loss of TADExp in approximating the causal reasoning of BB_AD in the locality of w, L(TADExp, BB_AD, π_w), should be minimized. Meanwhile, the complexity of the representative causal characteristics provided by TADExp, Ω(TADExp, f), should be minimized in order to be interpretable to humans. Thus, the provided explanation is defined according to the following trade-off:

ξ(w) = arg min_(TADExp, f) L(TADExp, BB_AD, π_w) + Ω(TADExp, f)    (6.4)

Various TADExp techniques, proximity measures π_w, complexity measures Ω(TADExp, f), and fidelity functions L can be substituted in this formula. However, in the present study, we configure TADExp based on a hybrid of case-based reasoning (CBR) and counterfactual explanation (CE), clustering-based weighted techniques to realize the local proximity π_w, and a set of statistical and structural characteristics to realize Ω(TADExp, f).

Clustering for local exploration

In order to have a model-agnostic explainer, the loss L(TADExp, BB_AD, π_w), considering its local neighborhood, must be minimized. However, sample perturbation is very tricky and can cause various anomalies in temporal anomaly detection configurations. Therefore, to realize π_w, a set of weighted reference samples W' ≈ W_R is drawn from the historical observations most similar to w. To this end, the train set O should be clustered to put similar cases in the same bin and find their cluster representatives for the explanation phase: O ⟶(cluster) {ζ_1, ζ_2, ..., ζ_K}.

However, to maintain a reasonable fidelity to BB_AD, it is preferable to base observation clustering on the associated anomaly scores. This comparison method reflects the observations' similarity from the black-box's perspective.

ζ_k = (W_k, A_k) = {(w_j, α_j) | W_k ⊂ segment(O)}    (6.5)

In general, this process can be realized with a regression clustering method that considers a family of limited functions Φ and a loss function e(·) ≥ 0 to solve the following minimization problem:

reg^opt = arg min_(reg ∈ Φ) Σ_(i=1)^N e(reg(TFG(w_i)), α_i)    (6.6)

While there are various approaches to approximate this regression clustering optimization, here we utilize a variational autoencoder (VAE)-based regression model [77, 295]. Thus, the cluster ζ(w) of a test observation w can be dynamically found according to the similarity of its embedded representation in the latent space to the embedded train set observations, O.

ζ(w) = {w'_j | encoder(w'_j) ∈ K closest neighbors of encoder(w)}

As VAEs are unsupervised generative techniques, they are expected to tackle the latent space irregularity problem and model the distribution of each cluster of training data in the compressed manifold of their latent space [77]. A VAE consists of a probabilistic encoder ENC : ℝ^m → ℝ^d and a decoder DEC : ℝ^d → ℝ^m. The encoder defines a distribution over latent codes q(z_i | w_i), typically in a two-step process of mapping w_i → z_i(μ_i, σ_i) followed by sampling z_i from a Gaussian distribution with these parameters. Next, the decoder maps the samples z_i into a mean value and standard deviation of the output variable and finally samples from the output variable's distribution to reconstruct the inputs, p(ŵ_i | z_i).

However, in VAE-based regression, the VAE is expected to associate each w_i with a latent representation z ∈ ℝ^d that is dependent on its anomaly score α_i. Thus, the modeling of latent representations differs from the traditional VAE. In this structure, instead of using a single Gaussian prior, z is explicitly conditioned on an anomaly score α, such that the conditional distribution p(z | α) captures an anomaly-score-specific prior over latent representations. p(z | α) can also be called a latent generator, from which latent representations can be sampled for a given anomaly score. The parameters of this generative model are optimized by maximum likelihood estimation, that is, by maximizing the sum of log likelihoods Σ_(i=1)^N log p(w_i). However, the standard procedure of variational inference is modified by introducing an auxiliary function q(z_i, α_i | w_i) to approximate the true posterior p(z_i, α_i | w_i) [295]. Thus, in addition to the probabilistic encoder q(z_i | w_i), the model includes a probabilistic regressor q(α_i | w_i). Hence, the latent representations are linked with the predicted anomaly scores, because (1) latent representations generated from the predicted α must

resemble the latent representation of the input w, and (2) score-linked variation in the latent space is encouraged to follow a direction associated with α.

Now, given this set of validated reference cases ζ(w) = W_R and their encoded difference with w, π_w is realized, so Equation 6.4 can be approximated using a systematic explainer to obtain the desired explanation, ξ(w).
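The following PyTorch sketch outlines one plausible way to couple a VAE with a score regressor so that the latent code is tied to the anomaly score, as described above; the layer sizes, the Gaussian reparameterization, and the loss weighting lam are assumptions, not the exact configuration used in our experiments.

import torch
import torch.nn as nn

class VAERegressor(nn.Module):
    def __init__(self, in_dim, latent_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, hidden), nn.Tanh())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, in_dim))
        self.reg = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 1))  # predicts the anomaly score

    def forward(self, w):
        h = self.enc(w)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(z), self.reg(z).squeeze(-1), mu, logvar

def loss_fn(w, alpha, w_hat, alpha_hat, mu, logvar, lam=1.0):
    recon = nn.functional.mse_loss(w_hat, w)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    score = nn.functional.mse_loss(alpha_hat, alpha)  # ties the latent space to the scores
    return recon + kl + lam * score

At explanation time, ζ(w) can then be obtained by encoding w and retrieving its K nearest training embeddings in this latent space.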

Hybrid explanation

Here, we describe a concrete instance of the proposed generic framework that addresses the challenges discussed above by comparing the anomalous input w_i with the collected reference cases W_(R,i) and highlighting the main characteristics that contribute to its isolation. This instance also presents the most similar entities among the reference cases W_(R,i) to explain why a given input is normal. As formulated in Algorithms 1 and 2, the explanation process includes one offline training stage that is executed only once to prepare the explanation platform, as well as an online stage for explaining the input. First, the required transformations are applied to handle non-stationarity of the normal train set. After segmenting the train set along with its corresponding anomaly score signals, the VAE-based regressor is fitted to model the latent space of the train set and the corresponding anomaly scores per Section 6.3.3. In the online stage, as Algorithm 2 describes, the test window, w_i, is transformed by passing through the same preprocessing steps. Then, its most relevant cases

W_(R,i) are found by utilizing the dynamic time warping (DTW) similarity measure [251]. The final inference about the model's reasoning is drawn on the basis of the test input and its own reference cases W_(R,i) from two main perspectives, similarity evaluation and isolation evaluation: the former asks, "how close are the most similar normal cases to the given input?"

(Sim(w_i, W_(R,i))), and the latter asks, "how different is the given input from the normal cases in its own context, and how significant are its differences?" (Diff(w_i, W_(R,i))). These results are ultimately explained and illustrated using interpretable plots.

Isolation evaluation. The main goal of isolation evaluation is to look up the characteristics that contribute to distinguishing this window from the reference cases W_(R,i); it is realized through a counterfactual explanation strategy in TADExp. The overall procedure is as follows:

1. Test window upsampling is applied using step-wise block bootstrapping (SWBB) to generate slightly different versions of the test window, W_(B,i), labeled as anomalous by the black-box model (see Figure 6.2).

2. By utilizing the TFG component, the interpretable characteristics for D = {w_i ∪ W_(R,i) ∪ W_(B',i)} are generated and subsequently filtered according to the most relevant characteristics.

3. Interpretable iforest models are trained based on D', and the distinguishing features that contribute to the black-box's decision about w_i are extracted; the iforest is iteratively retrained in the absence of the extracted features and their highly correlated and confounded features.

4. The extracted causal features are sorted by training a decision tree regressor based on the dataset D' pruned in the previous phase; thus, the order of contributing characteristics is found based on their correlation with the anomaly scores.

5. Finally, the anomalous sub-intervals are found; if there is a commonality among the replaced intervals in the subset of bootstrapped windows that are categorized to the same branch as w_i, it is highlighted as an anomaly center; any pattern-matching algorithm can be used in this step.

Figure 6.2: A representation of upsampling using step-wise bootstrapping

Step-wise block bootstrapping. Simple re-sampling does not work for time series anomaly detection because it destroys the dependence between successive values in the time series [222]. Thus, inspired by block bootstrapping [222], we have devised an effective step-wise algorithm that generates surrogate data with the same sort of dependence as the original latent space of the cluster. As described in Algorithm 3, in brief, it swaps random blocks of the full test window w_i with blocks of the same size and position in the reference windows W_(R,i). These replacements generate bootstrapped cases W_(B,i) consisting of random blocks from both the reference cases and the test window w_i. In the next step, W_(B',i) is generated by filtering and extracting a subset of anomalous perturbed windows whose embedded representations are close to both W_(R,i) and w_i.

Algorithm 1: Offline Training
Input: (O_train, BB_AD): (train set, temporal anomaly detection black-box)
Output: (encoder, Z_latent): (VAE encoder, embedded train set)
1  O'_train ← handleNonStationarity(O_train)
2  O_seg, α_seg ← segmentData(O'_train, BB_AD(O'_train))
3  VAE_regressor = (encoder, decoder)
4  train(VAE_regressor, (O_seg, α_seg))
5  Z_latent = encoder(O_seg)
6  return encoder, Z_latent

Algorithm 2: Online Explanation
Input: (w_i, Z_latent, BB_AD): (anomalous input, embedded train set, black-box model)
Output: f_i^*: sorted causal features
1  w_i ← transform(w_i)
2  W_(R,i) ← NN_DTW(encoder(w_i), Z_latent)
3  W_(B,i) ← SWBB(w_i, W_(R,i))
4  D ← TFG({w_i} ∪ W_(R,i) ∪ W_(B,i))
5  D' ← selectFeatures.transform(D)
6  f_i^* ← {}
7  Loop
8      est ← iforest(D')
9      γ ← extractIsolatingFeatures(est)
10     if γ is not isolating then
11         break
12     f_i^*.add(γ)
13     D' ← removeFeatures(D', γ)
14     D' ← pruneClusters(D')
15 DT ← decisionTreeRegressor(D'[f_i^*])
16 f_i^* ← getSortedFeatures(DT)
17 Λ ← findAnomalousIntervals(DT, w_i, W_(B,i))
18 return f_i^*, Λ

Algorithm 3: Step-wise Block Bootstrapping (SWBB)
Input: (w_i, W_(R,i), BB_AD): (anomalous input, reference train cases, black-box model)
Output: W_(B',i): bootstrapped input
1  W_(B,i) = {}
2  for p: 1 to P do
3      q ← random(max: len(w_i)/2)
4      st ← random() & end ← st + q
5      W_sub = {}
6      W_rand ← random.choice(W_(R,i))
7      for w_r ∈ W_rand do
8          W_sub.add(w_r[:st] + w_i[st:end] + w_r[end:])
9          W_sub.add(w_i[:st] + w_r[st:end] + w_i[end:])
10     W_(B,i).append(W_sub)
11 {A_(B,i), B_(B,i)} ← BB_AD(W_(B,i))
12 W_(B,i) ← {w_j ∈ W_(B,i) | β_j = 'anomaly'}
13 Z_(B,latent) = encoder(W_(B,i)), z_i ← encoder(w_i)
14 W_(B',i) ← NN_DTW(z_i, Z_(B,latent)) | DTW(Z_(B,latent), Z_(R,latent)) ≤ DTW(z_i, Z_(R,latent))
15 return W_(B',i)
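For readers who prefer executable code, the Python sketch below renders the block-swapping core of Algorithm 3; the filtering of the bootstrapped windows through the black-box and the latent-space closeness check are omitted, and the parameter n_rounds is an illustrative stand-in for P.

import random
import numpy as np

def swbb_blocks(w_i: np.ndarray, refs: list, n_rounds: int = 20, seed: int = 0):
    """Swap random, equally positioned blocks between the test window and references.

    Assumes every reference window in `refs` has the same length as w_i.
    """
    rng = random.Random(seed)
    boot = []
    for _ in range(n_rounds):
        q = rng.randint(2, max(2, len(w_i) // 2))      # block length
        st = rng.randint(0, len(w_i) - q)
        end = st + q
        w_r = refs[rng.randrange(len(refs))]
        # a reference window carrying a block of the test window, and vice versa
        boot.append(np.concatenate([w_r[:st], w_i[st:end], w_r[end:]]))
        boot.append(np.concatenate([w_i[:st], w_r[st:end], w_i[end:]]))
    return boot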

Similarity evaluation. By taking advantage of case-based reasoning, TADExp looks for the m closest cases to the test window among the reference cases W_(R,i). As a modified instance-based learning technique, our descriptive method goes beyond the generalization of the train set to an abstract model: it aims to explain the underlying nature of the target data on the basis of previously observed cases. According to some studies [63, 216], in many different contexts, users prefer to see very similar real cases instead of general rule-based explanations. In general, it is expected that a normal test window will be close to some of the samples in its cluster, whereas an anomalous one will be quite different.

Diff(w_a, W_(R,a)) ≫ Diff(w_n, W_(R,n))  &  Sim(w_a, W_(R,a)) ≪ Sim(w_n, W_(R,n))    (6.7)

This approach follows Hawkins' idea [128] that "a sample containing anomalies would show up such characteristics as large gaps between outlying and inlying observations and the deviation between outliers and the group of inliers, as measured on some suitable and standardized scale." If the contrary happens, the provided explanation might contribute to the model assessment and provide the required insight to improve it.
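A small sketch of how the Sim/Diff comparison can be realized is given below; it uses a plain quadratic-time DTW implementation and one simple choice of turning the closest-reference distance into a similarity, neither of which is prescribed by TADExp.

import numpy as np

def dtw_distance(a, b) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance for 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def similarity_report(w, refs):
    """Diff: DTW distance to the closest reference; Sim: its inverse, as one simple choice."""
    diff = min(dtw_distance(w, r) for r in refs)
    sim = 1.0 / (1.0 + diff)
    return sim, diff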

6.3.4 Experimental Evaluation

In our experimental evaluation, we need to answer the following questions corresponding to the claims made in Section 6.1:

1. What is TADExp’s performance in discovering the features contributing to the BB model’s decision? To this end, TADExp’s average performance is measured in identifying the significant characteristics.

2. Is TADExp able to detect complex causal patterns behind events? To answer this question, we measure how the explanation quality changes as the anomalies’ scale ratio or length of the collective anomalies, i.e. the number of temporal steps involved in the event, are modified.

3. How robust is TADExp concerning noisy data? To evaluate this property, we assess the method’s results in the presence of different noisy features embedded in the train set.

4. How strong is the explanation process? The internal validity of TADExp can be influenced both by the search performance in finding the reference cases W_(R,i), and consequently the representative characteristics, and by their fidelity to the black-box model's internal logic. Evaluating the impact of the factors detected by TADExp on the model's results addresses the latter property to some extent. Also, a quality analysis of the reference cases found by the clustering stage targets the former.

Assumptions. TADExp holds the following assumptions regarding the input stream data:

• The given time series represents a causal process; otherwise, it would not apply to real-time systems and our framework would not handle it.


Figure 6.3: Average absolute correlation of the provided explanation with the BB's results with respect to a) the length of anomalous patterns (l), b) the injected anomaly scale ratio (r).

• The underlying time series is regular, or almost regular; otherwise, it should be stationary.

• The training repository is updated systematically if the data is evolving and the future observations are very different from the previous ones.

• All the non-stationary time series should be time-aware so they can be decomposed.

Dataset Description

The underlying time series is a synthetic wave generated by adding up a few sine waves and some noise. We then utilized a modified version of a temporal anomaly generator [148] to add different types of anomalies to the test set of this time series. The anomaly generator simulates extreme, shift, trend, and variance-related anomalies. These anomalies can have different lengths l, frequencies n, or scale rates r. We set the default values of the length of causal anomalies (l) and the scale rate (r) to 10 and 6, respectively, and vary them in Section 6.3.4. In the end, we trained a semi-supervised LSTM model to find the anomalous cases based on their error range.

Impact evaluation approach. Since there is no specific ground truth for XAI tasks, one main challenge lies in how to evaluate the quality of TADExp's results and guarantee that the interpretations are indeed faithful to the original model. Some studies perturb the detected features to measure their impact on the model's decisions. However, the characteristics detected by TADExp are not the raw features observed by the model. So, we have devised the following strategy. Utilizing Algorithm

3, we generate a set of normal and anomalous perturbed samples, W_(B,i), which are local to both w_i and W_(R,i); this fills in the gaps between the references and the target of interest. Then, the association of the explained characteristics with the assigned anomaly scores and labels is verified using

Spearman's rank correlation coefficient (r_s) and the point-biserial correlation coefficient (r_pb), respectively. We believe a significant coefficient confirms the fidelity of the provided explanation. We perform these analyses iter times to reduce the influence of randomness.
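A sketch of this fidelity check is shown below; it assumes the values of one explained characteristic over the perturbed samples are available as a numeric vector aligned with the black-box's scores and binary labels.

import numpy as np
from scipy.stats import spearmanr, pointbiserialr

def fidelity_check(char_values, alpha_scores, beta_labels):
    """Correlate one explained characteristic with the black-box's outputs.

    char_values : values of the characteristic over the perturbed samples
    alpha_scores: anomaly scores assigned by the black-box
    beta_labels : binary anomaly labels (0 = normal, 1 = anomaly)
    """
    r_s, p_s = spearmanr(char_values, alpha_scores)
    r_pb, p_pb = pointbiserialr(np.asarray(beta_labels), np.asarray(char_values))
    return {"spearman": (r_s, p_s), "point_biserial": (r_pb, p_pb)}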

Robustness to anomaly lengths and scale rates

TADExp's average performance (iter = 20) with various lengths (5 ≤ l ≤ 15) is shown in Figure 6.3.a. The Spearman's rank and point-biserial correlations of the discovered causal characteristics are relatively stable, around 48% and 62%, for very short profile lengths, where differentiating the pattern offset is more challenging. However, these measures increase to above 71% and 85% for longer anomalies. The same analyses for various scale rates (2 ≤ r ≤ 15) are shown in Figure 6.3.b. The association of the discovered causal characteristics with the model's results is relatively stable, around 72% and 86%, for larger scale rates, but it falls off sharply, remaining above 41% and 48%, for smaller scale rates, where the anomalous cases are closer to the normal observations. The results show that TADExp's performance remains acceptable even for complex anomalous patterns such as very short anomalies or a limited range of scales.

Robustness to noisy datasets and features

To this end, we change the settings of the original data generator to create a multivariate synthetic dataset. However, we only embed the anomalies in the main feature {f_1} and add some white noise to the other two (irrelevant) features {f_2, f_3}. Then, we train three different semi-supervised anomaly detectors using an LSTM, an autoencoder (AE), and support vector regression (SVR). Based on our experiments, as Figure 6.4 illustrates, the percentage of times that artificial characteristics generated from the irrelevant features appear in TADExp's explanations is almost negligible.

Figure 6.4: TADExp's robustness in the presence of noisy features

Quality of local neighborhood

To have a high-quality explanation, we also need to ensure the locality of the detected reference cases and perturbed samples (considering the BB's reflection). Consider the generated test set shown in Figure 6.5 along with the anomaly scores and status predicted by an LSTM-based anomaly detector (LSTM_AD). For this specific experiment, TADExp's VAE achieves a mean squared error of 0.006 in predicting the anomalous samples, which is reasonable for values between 0 and 1.

Moreover, the average errors of 0.097 and 0.113 for reconstructing the normal validation and test sets, respectively, confirm the quality of the VAE's latent representation. The average distance of the k = 30 closest reference cases found by NN_DTW for all the normal test cases in the latent space is 0.12, while this value increases to 1.734 for anomalous test cases. All these statistics indicate our trained VAE's capability of embedding similar observations into the same neighborhood.

Figure 6.5: a) The actual test signals, b) Predicted signals by LSTM model, c) Prediction squared error, d) The final predicted anomaly status

Running example

Figure 6.7 illustrates a running example of the isolation evaluation process. Consider the temporal interval shown in Figure 6.7.a as the target instance (T). Figure 6.7.b shows the transformed representation of T encoded by the trained VAE, in blue, and a few of the nearest transformed reference samples (r) in this latent space. The original representation of the discovered reference samples is shown in Figure 6.7.c. As can be seen, TADExp has not observed any case similar to T, so it matches T with observations from two different scenarios: 1) normal cases that start from the same maximum peak and continue the decreasing trend toward the minimum value of the sinusoidal wave, and 2) opposite normal cases that have more in common with the second half of the target instance, but whose increasing trend starts from smaller values. Then, applying Algorithm 2 generates the perturbed instances illustrated in Figure 6.7.d. The t-SNE representation of these samples in Figure 6.7.e confirms that they keep their locality around our

target of interest (T) and fill the gap between this sample and the normal reference cases (W_R). Figure 6.7.f depicts a contour representation of the target input in the presence of the normal reference cases and similar perturbed anomalous cases. This plot provides some sense of the implicit boundary that isolates the anomalous input and the degree to which it differs from the normal observations. The tree-based representation in Figure 6.7.g illustrates the same explanation from a rule-based perspective. Figure 6.7.h represents a few of the approximated counterfactuals based on the most similar normal perturbed samples. This also contributes to specifying the main parts of the target instance that cause the abnormal status and need to be replaced to change the assigned label. As this case study shows, such an analogy-based explanation appeals to common sense; its reasoning is based on real-world, tangible evidence and is highly related to the specific problem domain. To validate the reference cases discovered by TADExp, we extracted a second set of normal cases utilizing the temporal context of the target instance (T) (i.e., considering the seasonality and trend), which match the normal trend that the anomalous T was expected to continue (see Figure 6.6). As we see, some of the normal reference cases found by TADExp (in Figure 6.7) are very similar to these temporal reference cases; this also demonstrates the quality of the local sample discovery process. The approximated counterfactuals therefore highlight the divergence from the real-world expectation, which can be especially helpful in detecting adversarial attacks.

Figure 6.6: Temporal reference cases of T

Figure 6.7: A running example of the isolation evaluation process in TADExp

6.4 An Interpretable Variational Autoencoder to Find Cyberattacks

Motivation. This section analyzes the existing challenges in the CPS context based on a real-world scenario. Thus, we aim to present an interpretable autoencoder model to distinguish attacks from natural events and analyze the events to find the main source of the attacks and their characteristics in order to address the vulnerabilities and prevent future incidents.

Autoencoders are specific types of unsupervised ANNs for efficiently learning embedded features of data. An autoencoder basically represents the normal data in a lower dimensionality and then reconstructs the data in the original space. One of the main applications of autoencoders is learning a more efficient representation of data for various purposes, for example, dimensionality reduction [155]. Recently, autoencoders have been widely adopted for unsupervised anomaly detection tasks [84, 243]. The overall process of system modeling and anomaly detection is founded on embedding the data and accurately reconstructing it. The autoencoder will generate larger reconstruction errors for different and unobserved inputs, thereby distinguishing anomalies.

Although autoencoder models are very efficient and popular in anomaly detection, they are far from being interpretable. In brief, such models decide based on the whole input and are unable to clarify the discovered patterns, determine the exact time point of an attack, or determine the features that contributed to the detected anomalies. Furthermore, to our knowledge, existing XAI methods are mainly proposed to explain supervised predictors; there are only very limited studies on explaining unsupervised anomaly detection methods, and they are typically model-specific or need access to the model's internal stages [227, 250]. For example, one line of such studies concerns techniques that try to explain the latent space by employing a particular interpretable model [162]. Also, combining graphical models with AEs to interpret unsupervised techniques [144, 259] looks promising. The main challenge of explaining unsupervised trained models, like autoencoders, is how to find the contributing features, especially because there is no explicit decision boundary to surrogate. Thus, we have developed an unsupervised autoencoder model together with a fast method to distinguish anomalous events representative of cyberattacks. Our experiments with real-world data confirm the performance and applicability of our proposed solution.

6.4.1 Problem Definition

This project aims to propose an advanced but interpretable unsupervised anomaly detection algorithm to trace suspicious activities in CPSs like the smart grid. The considered system's behavior is monitored over some time interval T, comprising t consecutive time steps, and forms a multivariate time series X_T = ⟨x_t : t ∈ T⟩, T ⊆ ℕ. Intuitively, each element of this time series is a multivariate process with m observed features (x_t ∈ ℝ^m), meaning x_i = (x_i^1, x_i^2, ..., x_i^m). Thus, our online framework is expected to learn the hidden patterns of the data, detect suspicious cases, and then interpret the results by highlighting the exact point of the anomalies.

6.4.2 Method Description

In light of the motivational points developed above and to address the challenges, we now propose our method for CPSs like the smart grid (Figure 6.8). This framework includes two main components: it first learns the latent space of the collected data using a VAE and then highlights the exact attack points using a combination of reconstruction-error-based and perturbation-based analyses.

Algorithm 4: Interpretable Attack Detection: Training

Input: X_train
Output: (AE, threshold): autoencoder and an optimized anomaly score threshold
1  encoder, decoder = GenerateNetworks()
2  AE = (encoder, decoder)
3  trainAE(AE, X_train)
4  error_reconst = AE(X_train) − X_train
5  Z_latent = encoder(X_train)
6  threshold = Max(error_reconst)
7  trainAE():
8      input = generateBatchData(X_train)
9      z_mean, z_var = encoder(input)
10     noise_z = sampleNoise()
11     z = z_mean + noise_z * z_var
12     output = decoder(z)
13     loss = computeLoss(input, output, z_mean, z_var)
14     backwardLoss()
15 computeLoss():
16     loss_reconst = MSE(input, output) * batch_size
17     loss_KL = −1/2 * (1 + z_var − z_mean^2 − exp(z_var))
18     loss_total = loss_reconst + loss_KL
19 return AE, threshold

Figure 6.8: Schema of the proposed method

Algorithm 5: Interpretable Attack Detection: Test
Input: x ∈ X_test, X_train, Z_latent
Output: (score_x, label_x, F_k): (attack score, status, set of contributive features)
1  x_reconst = AE(x)
2  score_x = |x_reconst − x|
3  F_k = {}
4  if score_x ≥ threshold then
5      label_x = "suspicious"
6      N_k = topKDivergedFeatures(x_reconst, x)
7      for f_i ∈ N_k do
8          M̂ = findNeighbours(encoder(x), Z_latent)
9          M̂' = replaceFeature(M̂, f_i)
10         contrib_i = AVG(|AE(M̂') − M̂'|)
11         F_k.add((f_i, contrib_i))
12 else
13     label_x = "benign"
14 return (score_x, label_x, Sorted(F_k))

Latent space modeling

In order to model the compressed manifold of the latent space of the dataset, we train a VAE as shown in Algorithm 4. The VAE model is used because it can tackle the problem of latent space irregularity [11]. A VAE is an unsupervised generative technique that comprises a probabilistic encoder ENC : X → ℝ^d and a decoder DEC : ℝ^d → X. The encoder defines a distribution over latent codes q(z | x), typically in a two-step process of mapping x → z(μ, σ) and then sampling z from a Gaussian distribution with these parameters. Therefore, the VAE encoder returns a distribution over the latent space instead of a single point. Moreover, to ensure a better organization of the latent space, a regularization term is added over the returned distribution in the loss function. This embedded stochasticity helps to obtain perturbed versions of the input. In this step, we also extract a criterion to determine the suspicion level of a future event. By presuming a normal error distribution for the training observations, the anomaly scores are fitted on the basis of the Q-function as the tail distribution function [110, 255]. Thus, Q(x) is the probability that a normal (i.e., Gaussian) random variable obtains a value larger than x standard deviations. Accordingly, the suspicion level of the observations is determined in comparison to the range of reconstruction errors observed in the training phase.
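A small sketch of this Q-function-based criterion is given below; it assumes the training reconstruction errors are roughly Gaussian, so the survival function of the standard normal plays the role of Q, and smaller returned values indicate more suspicious observations.

import numpy as np
from scipy.stats import norm

def q_value(error: float, train_errors: np.ndarray) -> float:
    """Probability that a normal observation produces a reconstruction error
    at least this many standard deviations above the training mean."""
    mu, sigma = train_errors.mean(), train_errors.std()
    z = (error - mu) / sigma
    return float(norm.sf(z))   # Q(z) = 1 - Phi(z); small values flag rare, suspicious errors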

Attack explanation

In the next step, in order to determine the exact attack point, we propose the procedure shown in Algorithm 5. In unsupervised techniques, the causal reasons behind a detected anomaly are only latent in the input values relative to the rest. Thus, the reconstruction error of autoencoders is a good source of knowledge to explain them. On this basis, the features whose reconstructed values differ from their original values are found first. To this end, the normal Mahalanobis distance for the features is calculated. Applying a distribution-based distance yields an initial ordering of the features that have contributed to the obtained reconstruction error difference. The next step permutes each candidate feature value to find its range of impact on the obtained anomaly score. For this purpose, the set $\hat{M}$ of closest neighbors of the test observation $x$ is found according to their embedded representations in the latent space, $encoder(x) \leftrightarrow Z_{latent}$. This facilitates a fair distance evaluation independent of unimportant features. Then, the impact of each feature $f_i$ is evaluated by inserting its value into the set $\hat{M}$ and evaluating the significance of its impact. Because of feature dependencies, two different scenarios might occur: (1) a contributive feature, which positively affects the predicted anomalous status; (2) a compensator feature, which negatively affects the predicted status (i.e., the feature's reconstruction difference is caused by feature relations).
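The following is an illustrative sketch of the perturbation step of Algorithm 5, assuming the training data, its latent codes, and two hypothetical helpers (`encode` for the trained encoder and `vae_reconstruct` for a full encode-decode pass) are available. It finds the latent-space neighbours of the test point, copies each candidate feature value into them, and measures how much the average reconstruction error moves.

```python
# Sketch only: helper names and parameter choices are assumptions.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def feature_contributions(x, x_recon, x_train, z_train, encode, vae_reconstruct,
                          top_k_features=10, n_neighbours=30):
    # Initial ordering: features with the largest reconstruction divergence
    divergence = np.abs(x_recon - x)
    candidates = np.argsort(divergence)[::-1][:top_k_features]

    # Neighbours of x in the latent space (independent of unimportant features)
    nn = NearestNeighbors(n_neighbors=n_neighbours).fit(z_train)
    _, idx = nn.kneighbors(encode(x).reshape(1, -1))
    M_hat = x_train[idx[0]]

    contributions = {}
    for f in candidates:
        M_pert = M_hat.copy()
        M_pert[:, f] = x[f]                    # insert the suspect feature value
        err = np.abs(vae_reconstruct(M_pert) - M_pert).mean()
        contributions[f] = err                 # contrib_f in Algorithm 5
    return sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
```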

6.4.3 Experimental results and Evaluations

This section evaluates the anomaly detection performance and the explanation quality of our proposed method.

Experimental settings. The dataset used in this study is an open-source smart grid attack repository [130] including both Information Technology (IT) and Operational Technology (OT) attributes [218]. We refer the reader to [130, 218] for further details about its architecture. This dataset includes 128 features, and its numerical feature values have been standard-scaled. In our experiments, the encoder and decoder are both single-hidden-layer networks with 80 dimensions; the latent dimension includes 40 nodes. We used fully-connected NNs in both the encoder and the decoder with tanh activation functions. The model parameters are optimized on the RMSE loss through stochastic gradient descent with ADAM step updates. Further, the optimization process is done by a grid search, so there is potentially room for improvement regarding the behavior prediction results.

Peer methods. A few studies have used this dataset in their experiments [71, 218, 219]. However, they have mostly relied on supervised learning to detect the attacks, whereas our research is focused on unsupervised modeling of the system to detect zero-day attacks. Nevertheless, we compare the performance of our method with some high-performing supervised and unsupervised techniques, as follows:

• XGBoost (XGB). Effective implementation of gradient tree boosting.

• Hybrid. A combination of multiple diverse models of Random forest, Extra trees, and K-neighbors optimized by the SuperLearner framework embedded in the Mlens package.

• Multi-layer perceptron (MLP). A feed-forward neural network.

• iforest. A tree-based anomaly detection algorithm.

• One-class SVM (OSVM). A modification of SVM to support one-class classification.

Performance evaluation

This section evaluates the performance of our proposed method in comparison with the peer supervised and unsupervised methods. As Table 6.1 shows, our interpretable VAE outperforms the other unsupervised baselines by achieving a higher ROC and F1-score together with an acceptable recall and precision tradeoff. Also, the performance of our unsupervised VAE is almost comparable with the optimized supervised hybrid method. This shows the capability of this technique in learning the latent space of normal data to reconstruct it effectively.

Table 6.1: The performance evaluation of our method

Model     Precision   Recall   F-measure   ROC     Unsupervised
XGB       0.92        0.96     0.939       0.951   No
Hybrid    0.986       0.964    0.975       0.983   No
MLP       0.93        0.95     0.969       0.98    No
iforest   0.763       0.851    0.804       0.818   Yes
OSVM      0.798       0.742    0.769       0.75    Yes
VAE       0.932       0.957    0.944       0.966   Yes

Interpretation examples

Explaining results, especially in the context of anomaly detection, is very new and differs across tasks and domains. Thus, there is no standard evaluation technique or measure to verify the soundness of explanations. This research evaluates our local interpretability approach from two different perspectives.

1) Interpretation correctness. This experiment is designed to verify the impact of perturbing the important features on the anomaly score. To this end, we selected a subset of ($k = 30$) normal observations in close proximity to the anomalous test case and perturbed their feature values one by one. Hence, we are able to find the impact of each feature based on the average change in the reconstruction error. Since this is a local explanation, we do not expect to observe the same ordering of feature importance for all the anomalous samples. Therefore, we measure the explanation quality and effectiveness using Normalized Discounted Cumulative Gain (NDCG) [278]. As Figure 6.9.a illustrates, the NDCG of the most important feature in the first rank is almost 81%, and this value does not drop below 46% for the top-10 features, which is a very acceptable rate for our interpretation technique.

2) Interpretation robustness. This experiment is designed to evaluate the robustness of the explanation. To this end, we created a synthetic dataset, labeled its records as normal/anomaly based only on a subset of features $\{f_1, \dots, f_5\}$, and filled out the other (irrelevant) features $\{f_6, \dots, f_{10}\}$ with the marginal distributions for each label. Then, we configured a VAE for this dataset and obtained the final feature importance for each sample through the same procedure. Figure 6.9.b illustrates the cumulative feature importance detected by our model for every single sample. As we see, the top-5 contributive features are almost stable through all the experiments, and the interpretation process has ignored the noisy features.

Figure 6.9: a) Interpretation performance for top-10 explained features, b) Interpretations' robustness in presence of noisy features.
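As a concrete illustration of the correctness check, the NDCG of an explainer's feature ranking can be computed against the perturbation-based relevance measured on nearby normal samples; the numbers below are hypothetical and only show the mechanics (scikit-learn's ndcg_score is one available implementation).

```python
# Illustrative only: relevance measured by perturbation vs. explainer scores.
import numpy as np
from sklearn.metrics import ndcg_score

true_relevance   = np.array([[0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.1, 0.05, 0.0]])
explainer_scores = np.array([[0.8, 0.75, 0.5, 0.45, 0.2, 0.25, 0.1, 0.05, 0.1, 0.0]])

for k in (1, 5, 10):
    print(f"NDCG@{k}: {ndcg_score(true_relevance, explainer_scores, k=k):.3f}")
```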

6.5 Heidegger: Interpretable Temporal Causal Discovery

Motivation. While machine learning and AI have made impressive progress in predictive and descriptive tasks, establishing causality has remained a challenge [262]. As mentioned before, XAI comprises various techniques that seek to improve the interpretability of AI results. Interpretability is much needed to demonstrate the accuracy and value of the results and to engender user trust. Multiple attempts have been made to develop XAI, but these are inadequately grounded in theories of explanation and range from non-causal to strongly causal explanations; however, non-causal variants of extra-mathematical and model explanations appear to be receiving greater attention than causal techniques. Some of these studies hold the view that "all scientific explanations are causal explanations" (i.e., causal imperialism) [32], while others assume that anything other than a causal explanation is a rationalization or fictionalization [32, 236] and thereby has no explanative value. Within the causal explanation framework, all non-causal explanations are attempts to articulate how possibly rather than the ideal standard of how actually [115]. In other words, correlative relationships are important and further our understanding; however, correlation does not imply causation. A dedicated class of methods is needed to address the problem of causal discovery.

Causal discovery strives to identify cause-effect relations between a set of candidate variables and an outcome variable of interest, where any change in a causal variable will result in a change in the outcome [262]. Various methods have been introduced to discover causal relations using observational data [263]. However, most of the existing techniques are limited as they are concerned with causality in a restricted sense and only demonstrate whether a treatment can change an outcome or not. The inability to determine causal profiles (i.e., a temporal pattern that results in the most significant change in an outcome variable) greatly limits the interpretability of the results; understanding the minimum duration of an event to be effective, treatment-effect persistence, or the time delay between a treatment and the outcome is critical. While some methods are flexible in terms of defining the hypothesis [31, 48, 258], a user must identify the true hypothesis, possibly resulting in erroneous conclusions or data dredging. Motivated by these issues, we believe our research in [197] can be modified to open a new perspective on providing causal explanations for temporal AD, illuminating the factors driving an event and the outcomes of counterfactual scenarios. This can significantly contribute to intervention activities (e.g., dementia risk reduction and/or prevention).

6.5.1 Problem Definition

We assume a given dataset with observations at discrete time points for a set of variables and samples. Among the variables, we distinguish the labels assigned by the black-box (BB), corresponding to the outcome variable $Y$, the set of candidate variables $X$, and the set of potentially confounding variables $Z$. Formally, $Y_s^t$ $(1 \le t \le T,\ 1 \le s \le S)$ is the value of the BB's outcome for sample $s$ at time $t$, where $T$ is the number of time points and $S$ is the number of samples; $X_{s,f}^t$ $(1 \le t \le T,\ 1 \le s \le S,\ 1 \le f \le M)$ is the value of the $f$-th candidate variable of sample $s$ at time $t$, where $M$ is the number of candidate variables; and $Z_{s,c}^t$ $(1 \le t \le T,\ 1 \le s \le S,\ 1 \le c \le C)$ is the value of the $c$-th confounding variable of sample $s$ at time $t$, where $C$ is the number of confounding variables. An instance, referred to as $(s,t)$, is a pair of sample $s$ and time point $t$. We define CP as the set of all temporal causal profiles.

The goal is to identify causal variables associated with the BB's outcome, denoted as $X^*$, and their corresponding temporal causal profiles, denoted by $CP^*$, such that elements in $X^*$ and their corresponding patterns in $CP^*$ provide the most significant causal association with the outcome when controlling for the effect of confounders and the other candidate variables. If a causal variable follows the pattern indicated in its corresponding causal profile, the outcome of the BB's model will experience a significant change. We refer to this problem as the Causal Profile Discovery (CPD) problem. Hence, each element of the causal profile set, CP, corresponding to a causal variable $x \in X$, represents a hypothesis, where the treatment group receives the treatment in the form of the identified causal profile. We define a causal profile, $cp \in CP$, based on the following two components:

1. Treatment Pattern: A vector of $\{0, 1\}$ values, denoted by $cp_p$, where 1 means receiving the treatment while 0 means not receiving the treatment.

2. Pattern Offset: A vector of time offsets, denoted by $cp_o$, where each offset indicates how many time points before the current time the corresponding element of the treatment pattern is applied.

Meanwhile, the corresponding control group of the hypothesis is defined based on no-treatment profiles. More specifically, each profile in this group has the same Pattern Offset as the causal profile, but its Treatment Pattern is a zero vector. For instance, the hypothesis that "if $x_1$ exists at time $t$ and does not exist at time $t-2$, then $y$ at time $t$ changes" (Figure 6.10) corresponds to the causal profile $(cp_p : [0\ 1],\ cp_o : [2\ 0])$ and the control profile $(cp_p : [0\ 0],\ cp_o : [2\ 0])$.
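A minimal sketch of this representation is shown below: a causal profile pairs a treatment pattern with its offsets, and checking whether an instance follows the profile reduces to comparing the candidate variable's values at the offset positions. The names are illustrative, not taken from the HEIDEGGER implementation in [197].

```python
# Illustrative representation of a causal profile and a matching check.
from dataclasses import dataclass
from typing import Sequence

@dataclass
class CausalProfile:
    pattern: Sequence[int]   # cp_p: 1 = treatment received, 0 = not received
    offsets: Sequence[int]   # cp_o: how many steps before t each element applies

    def matches(self, series, t):
        """True if one sample's candidate-variable series follows the profile at time t."""
        return all(series[t - off] == val
                   for val, off in zip(self.pattern, self.offsets)
                   if t - off >= 0)

# The hypothesis "x1 exists at time t and does not exist at time t-2":
causal = CausalProfile(pattern=[0, 1], offsets=[2, 0])
control = CausalProfile(pattern=[0, 0], offsets=[2, 0])  # same offsets, no treatment
series = [0, 1, 0, 1, 1]               # binary treatment indicator over time
print(causal.matches(series, t=4))     # True: no treatment at t-2, treatment at t
```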

Required Modifications. The original HEIDEGGER framework was focused on solving the causal profile discovery (CPD) problem based on a given dataset and was not associated with black-box explanations. However, we believe it can be easily modified to explain temporal anomaly detection black-boxes. To this end, we need to apply two modifications:

1. The labels predicted by the black-box need to be considered as the outcome variable. This increases the final explanation's fidelity;

2. A systematic data generator must be utilized to produce the required control profile sets. For each candidate causal profile, HEIDEGGER receives the corresponding control profiles from the data generator and labels them using the black-box. Following this, HEIDEGGER would be able to evaluate each explanation hypothesis utilizing the randomized-block design component and identify the main cause(s) leading to an anomalous prediction.

6.5.2 Addressed Challenges

Most existing XAI techniques are not able to determine causal profiles (i.e. a temporal pattern that results in the most significant changes in an outcome variable), undermining the interpretability of the results. We propose HEIDEGGER [197] to address this problem. HEIDEGGER goes beyond a temporal causal discovery technique and is capable of detecting underlying causal relations along with their corresponding causal profiles. Further, HEIDEGGER is robustly designed to accommodate noise and detect complex temporal treatment-outcome relations. In the context of XAI, we believe this is one of the earliest attempts that goes beyond rationalization and deals with the issue of temporal anomaly detection models by actually determining causal relations.

6.5.3 Method Description

Our CPD method, HEIDEGGER, overcomes the previously mentioned challenges by utilizing an efficient graph search algorithm to find potential causal variables and their corresponding causal profiles. A novel randomized block quasi-experimental design (QED) is applied to evaluate the significance of each hypothesis (i.e., the causal variable and associated causal profile). QED is an offline framework for estimating the causal impact of an intervention without random assignment by manipulating the independent variables.

Figure 6.10: HEIDEGGER framework: in each step of the graph search, the most statistically significant neighbor (green) of the current node (blue) is selected as the current node of the next step. If the current node is more significant than its neighbors (yellow), the run stops. For each node, the QED process is applied to compute the significance level [197].

For each candidate variable, the hypothesis search space is modeled as a graph, referred to as the profile graph, where each node indicates a potential causal profile (Figure 6.10). HEIDEGGER runs a heuristic search on the profile graph for one candidate variable at a time to identify the most significant profile and dynamically generates the local neighborhoods in the search space. Upon visiting each node, the corresponding causal profile is evaluated using a randomized block QED.

This consists of: 1) identification of treatments and controls; 2) matching and clustering of instances; 3) significance evaluation. The significance levels of the candidate variables and their causal profiles are adjusted for multiple hypothesis testing based on the false discovery rate criterion. A brief description of each phase is presented in the following sections.

Hypothesis search

The strength of HEIDEGGER is the traversal of the profile graphs to identify the most significant candidate profile. Each node represents a causal profile, and there is an edge between nodes $A$ and $B$ if the causal profile of node $A$ can be converted to the causal profile of node $B$ by changing a single element. Given this, the shortest distance between two nodes in the profile graph is equal to their minimum edit distance. The discovery process in the graph is guided by the assumption that the p-values across the network are cohesive (i.e., hypotheses with similar profiles are expected to have similar p-values). HEIDEGGER uses a simple but efficient strategy to build the search space; at every step, it expands the neighborhood of the current hypothesis and finds the most promising neighboring candidate to be processed by the QED (explained in Section 6.5.3). To expedite the search, an early pruning mechanism discards the causal profiles with permutation entropy above a given threshold because, in practice, causal profiles are expected to have low permutation entropy. For instance, a high-entropy causal profile such as $(cp_p : [1\ 0\ 1\ 1],\ cp_o : [8\ 5\ 3\ 1])$ is unreasonable, but a causal profile with a lower permutation entropy such as $(cp_p : [0\ 1],\ cp_o : [2\ 0])$ is acceptable.
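One common way to realize such a pruning criterion is the ordinal-pattern (Bandt-Pompe) permutation entropy of the profile's treatment pattern, sketched below; the exact entropy variant and threshold used by HEIDEGGER may differ, so this is only an illustration of the idea.

```python
# Illustrative permutation-entropy pruning; not HEIDEGGER's exact criterion.
import math
from collections import Counter

def permutation_entropy(x, order=2):
    """Normalized ordinal-pattern entropy of sequence x with embedding `order`."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k]))
        for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    probs = [c / total for c in patterns.values()]
    h = -sum(p * math.log(p) for p in probs)
    return h / math.log(math.factorial(order))   # normalize to [0, 1]

def keep_profile(pattern, threshold=0.9):
    # Discard profiles whose pattern is too "disordered"
    return permutation_entropy(pattern) <= threshold

print(keep_profile([0, 1]), keep_profile([1, 0, 1, 1]))   # True False
```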

Randomized-Block Design

HEIDEGGER needs to evaluate the significance of the adjacent hypotheses to find the most promising profile in the local neighborhood of the current node, which is likely to lead to the global solution. To this aim, our framework applies QED to find treatment and control groups for each hypothesis and to compute the significance of the difference in the outcome between the two groups. The specific QED used in HEIDEGGER is an extended version of the randomized block design that pairs instances based on their history of confounders, treatments, and the outcome, which guarantees that the effect of these variables is controlled across time. The randomized block design is adopted for its flexibility, lack of constraining assumptions, and its proven effectiveness in similar studies [15]; its major disadvantage is that confounders must be directly controlled, resulting in a loss of statistical power. However, HEIDEGGER compensates for this by considering combinations of time points and samples as instances, increasing the number of effective samples by a factor approximately equal to the number of time points. The process of finding and evaluating the treatment and control pairs is done in the following three main steps (a sketch of the final significance test follows the list):

• Identifying the treatment and control groups by comparing the values of the candidate variable of each instance with the causal profile and the no-treatment (control) profile, respectively.

• Clustering the union of treatment and control group instances into blocks, where all instances in a block are similar enough, and randomly matching treatment and control instances within each block.

• Performing a paired statistical test to evaluate whether there exists a statistically significant difference between the outcomes of the two paired groups.
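A minimal sketch of the third step, assuming instances have already been matched into pairs within blocks: the paired outcome differences are tested with a paired t-test (a Wilcoxon signed-rank test would be a non-parametric alternative). The exact statistic used by HEIDEGGER is detailed in [197]; the numbers here are hypothetical.

```python
# Illustrative paired significance test over matched treatment/control outcomes.
import numpy as np
from scipy import stats

def evaluate_hypothesis(treated_outcomes, control_outcomes, alpha=0.05):
    """treated_outcomes[i] and control_outcomes[i] come from the same block."""
    t_stat, p_value = stats.ttest_rel(treated_outcomes, control_outcomes)
    return p_value, p_value < alpha

treated = np.array([0.91, 0.85, 0.88, 0.95, 0.90])
control = np.array([0.42, 0.50, 0.47, 0.55, 0.49])
print(evaluate_hypothesis(treated, control))
```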

Multiple Comparison Adjustment

Evaluating a wide range of causal profiles for several candidate variables results in a large number of hypothesis tests. Considering that each hypothesis has a specified type (I) error probability, evaluating the set of hypotheses as a whole sharply increases the chance of some type (I) error. In other words, evaluating the causality of multiple candidate variables and causal profiles for an outcome variable produces a large number of hypotheses [27]. A standard approach to counteract the multiple comparisons problem is to adjust the p-values and the threshold for accepting them based on the false discovery rate. This is the most reasonable multiple hypothesis correction criterion in the context of CPD because it is not overly conservative for a large number of hypotheses and does not rely on tuning the test statistics [27]. HEIDEGGER uses the well-known Benjamini-Hochberg false discovery rate adjustment method [27] for this purpose.
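For illustration, the adjustment can be applied to the collected p-values with statsmodels' standard Benjamini-Hochberg implementation; the p-values below are hypothetical.

```python
# Benjamini-Hochberg FDR adjustment over all (variable, profile) hypotheses.
import numpy as np
from statsmodels.stats.multitest import multipletests

p_values = np.array([0.001, 0.008, 0.020, 0.041, 0.30, 0.74])   # illustrative
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
print(reject)       # which hypotheses survive the FDR-controlled threshold
print(p_adjusted)   # BH-adjusted p-values
```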

6.6 Conclusions

High-performing, complex, black-box models are prevalent and inevitable. Despite the outstanding success of complex methods in many areas, there are essential doubts about their precision when exposed to adversarial attacks. In particular, because they assign non-zero weights to useless features or out-of-range values, deep learning models are a known target of attackers. Besides, achieving better numerical results without presenting the required intuition about the reasoning behind decisions is not a sufficient and worthy contribution. Acknowledging that such techniques are not transparent and fully mature and that their transition may take time, it is wise to develop explainable algorithms to clarify the decision-making logic of such complex models, especially in large-scale and critical deployments. We believe our proposed methods can contribute to the field in two ways. Such explanations expose users to an intuitive presentation of the local decision-making process of temporal anomaly detection with respect to the time dimension. They can also help model developers find a model's vulnerabilities and improve it, which eventually may lead to higher resilience and precision alongside a lower false alarm rate. Explainability research is a principal area of innovation to support end-users. Building an explainable situational awareness platform that addresses the various suspicious elements is crucial and cannot be overlooked. Our proposed methods can certainly open new perspectives in various contexts like health and cybersecurity.

Chapter 7

Spatiotemporal Situational Awareness

Nowadays, increasingly large volumes of spatiotemporal data are collected and studied in diverse domains including epidemiology, transportation, and earth sciences [18]. Spatiotemporal data differs from the previously analyzed time series because both space and time are ubiquitous aspects of the observations. This introduces another type of sequential dependency among measurements, induced by the spatial dimension, which brings its own additional challenges that need to be dealt with. Having access to spatial information makes it possible to extract more interesting patterns from the stream data; however, it may increase the analytical challenges because of the dynamic nature of space. Real-world processes in these domains are thus expected to be inherently spatiotemporal in nature. To illustrate a few of the extra challenges caused by the space dimension, this chapter describes some of the relevant spatiotemporal projects to which I have technically contributed and on whose publications I am listed as a co-author. This topic does not correspond to any particular phase in the end-to-end pipeline of anomaly detection, but it is highly relevant when the end-to-end framework is used to detect anomalies in spatiotemporal stream data. The remainder of this chapter is organized as follows. Section 7.1 provides a concise review of spatiotemporal anomaly detection techniques. Then, Sections 7.2 and 7.3 provide summaries of two relevant spatiotemporal studies.

7.1 Background

To the best of my knowledge, there are different types of anomaly detection in spatiotemporal datasets which can be distinguished based on their goal. This section provides a summary of some of these techniques.

7.1.1 Spatiotemporal Anomaly Detection

The main goal of this task is to find spatiotemporal objects whose thematic (non-spatial and non-temporal) attributes are significantly different from those of the other objects in their spatial and temporal neighborhoods; such objects are called spatiotemporal anomalies (ST-anomalies) [2].

In [143], a set of temporal logic rules has been proposed to capture normal activities. This relation set includes: before, contains, overlaps, meets, starts, started-by, finishes, finished-by, and equals. The conditional probability of each event $Z$ given $Y$ can be computed based on these relations, and the likelihood of $Z$ is computed based on all the events observed up to that point in time, to obtain the final anomaly score as $1 - P(Z)$. Some of the other proposed methods in this group are explained in the succeeding parts.

Density-based spatiotemporal anomaly detection

The density-based methods focus on the density of the available data points in the location dimension. For example, the algorithm proposed in [163] includes the following main steps:

• Clusters the data using a modified DBSCAN. It supports the temporal aspect using a traversal tree to find the spatial and temporal neighbors of each object within a given radius

• Compares the newly arriving values with the average value of all the obtained clusters

• Detects the spatial anomalies by checking their spatial neighbors

• Verifies the temporal neighbors of the spatial anomalies to find the ST-anomalies [2].

Cluster differentiation based spatiotemporal anomaly detection

The algorithm proposed in [57] is based on cluster differentiation and includes the following four main steps:

• Applies classification or clustering to find the regions

• Reduces the spatial resolution of the data by aggregating them

• Compares the result obtained in two different granularity levels. The objects which are found in fine-grained regions, but not in aggregated regions, will be considered as potential spatial anomalies

• Evaluates the temporal neighbors of the suspected anomaly points to find the final set of spatiotemporal anomaly candidates.

7.1.2 Spatiotemporal Anomaly Tracking

This task focuses on tracking solid anomalies based on their volume over the time dimension within a specific spatial region. If the volume is higher than the average over the time dimension, the object is a solid anomaly [2].

Outstretch-based tracker

Outstretch [285] is proposed to track the anomaly movement patterns over the time periods. This method needs two inputs:

1. The top-K spatial anomalies for each year;

2. A variable $r$, the region stretch, which is the number of grids to 'stretch by' on each side of an anomaly.

For all the years, each of the anomalies from the current year is examined to see if it is framed by any of the stretched regions from the previous year. The anomalies that meet this condition are added to the end of the previous year's child list. As a result, all of the possible sequences over all years are stored in the anomaly tree and can be retrieved for analysis.

Spatial center-based tracker

In [187], a wavelet transformation is applied to the input data to extract its hidden distinct patterns. Then, edge detection with a competitive fuzzy classifier is extended to identify the boundaries of region anomalies. The center of an anomaly region is computed as the fuzzy-weighted average of the longitude and latitude of the boundary locations. In the end, the movement of anomalous regions can be traced by linking the centers of the anomalous regions in consecutive frames.

7.1.3 Trajectory Anomaly Detection

This task mainly aims to find anomalous trajectories, which is more challenging than analyzing static objects in the space and time dimensions. Until now, different clustering, classification, or distance-based methods have been proposed, a few of which are explained in the succeeding parts.

Distance-based trajectory anomaly detection

Trajectory Anomaly Detection (TRAOD) [174] proceeds in three main phases:

• Two-level partitioning phase. In the partitioning process, the distances between trajectory partitions, including perpendicular distance, parallel distance, and angle distance, are computed.

• Detection phase. The main process in this phase is finding outlying trajectory partitions that do not have enough similar neighbors.

• Anomaly labeling. A trajectory is marked as an anomaly if the summation of the outlying partitions of the trajectory meets the length-based threshold over all of its partitions.

Direction and density-based trajectory anomaly detection

TOP-EYE is proposed in [102] as a method that continuously computes an anomaly score for each trajectory in an accumulating way. This method uses a decay function to address the evolving computation of the accumulated outlying scores. It discretizes the space into several smaller grids. TOP-EYE considers two types of outlying trajectories:

• Direction-based anomalies. The direction information of trajectories in each grid is turned into a vector of eight values to indicate the probabilities of moving toward each direction.

• Density-based anomalies. The trajectory density of each grid is computed based on the number of trajectories that pass that grid.

Historical similarity-based trajectory anomaly detection

A method to detect temporal anomalies by emphasizing the historical similarity trends between data points is proposed in [178]. It records the historical similarity values as a temporal neighborhood vector at each road segment, which includes the similarity information of that road segment versus the other road segments. Radical changes in these vectors are marked as anomalies. The approach assigns penalties and rewards for each day based on the following two criteria:

• How similar is the road to the other road segments, historically?

• How similar is the road to the other road segments, instantaneously?

The final anomaly score is the summation of the obtained rewards and penalties. Because it considers the other road segments in the analysis, this method is robust to potential population shifts.

Motif-based trajectory anomaly detection

A trajectory anomaly detection method based on motion classifiers is proposed in [177] with the following main steps:

• Extracts hidden motifs, as triples of (feature, time, location), from object paths

• Clusters object movement fragments, based on the motif-based generalization

• Classifies objects using classifiers that can handle the very high-dimensional motif feature space, to distinguish anomalous trajectories from normal cases.

7.2 Mining Vessel Trajectories for Illegal Fishing Detection

Motivation. Maritime domain awareness is the effective understanding of activities associated with the maritime environment that could impact security, safety, the economy, or the environment. Identifying threats as early and as distant from shore as possible requires the ongoing analysis of the continuously changing picture of events in the maritime world to assess threat levels and enable early intervention in illicit activities. Illegal, unreported, and unregulated (IUU) fishing is one of the most serious threats to the sustainability of world fisheries [93, 94, 214] and a major threat to marine biodiversity, the sustainability and balance of marine ecosystems, fish populations worldwide [54], and global food security, as 4.3 billion people depend on fisheries for protein [93]. While the extent of IUU fishing taking place is not known in detail, it is estimated that IUU fishing accounts for up to 30 percent of all fishing activities worldwide [92, 94, 289]. Entities that engage in IUU fishing circumvent conservation and management measures, avoid the operational costs associated with sustainable fishing practices, and may derive economic benefit from exceeding harvesting limits [214]. Thus, in [253] we aimed at algorithmic methods for extracting geospatial and spatiotemporal activity patterns associated with dark fishing from vessel tracking data recorded over time periods of several years. The results provide input for intelligence-based profiling and ranking of known fishing vessels in relation to how frequently individual vessels are linked to observable behavior that diverts from routine operations. In the presence of uncertainty, predictive modeling using knowledge extracted from historical data reduces the size of the search space and increases the chances of disrupting IUU fishing. Even an increase of a few percentage points in the success rate means a lot in absolute numbers when dealing with an enormous fleet of ships and boats that operate across vast coastal areas and open oceans, considering that limited surveillance resources constrain maritime domain awareness and compromise seamless security coverage at all times [213].

Problem Definition. Given a set of fishing vessels $\{v_i\}$ and their history of trajectories $\{\tau_{v_i}\}$, we need to rank the vessels based on their suspiciousness value. This problem definition is novel, especially in its application to the domain of IUU fishing. In order to address the problem, we segment the trajectories into end-to-end trips and monitor the vessels' behavior to detect abnormal activity patterns.

7.2.1 Proposed framework

A summary of the proposed method to solve this problem is as follows. The input is historical tracking data from vessels routinely operating in coastal waters. The output is a ranking that places more suspicious vessels, those with more frequent abnormal activities, at a higher rank. Naturally, two or more vessels may be tied for the same anomaly score, so the ranking used here is based on a weak order. A weak ordering is a mathematical formalization of the intuitive notion of a ranking of a set where some members may be tied with each other [237]. Vessels need to report their operational status in real time, including fishing activity (by broadcasting Status = 7). A non-compliant vessel may misreport the status or may even go dark by turning off its AIS transmitter. Longer gaps in the transmission of AIS reports indicate that a vessel may intentionally hide its activity. All that said, we take a conservative approach in using AIS data to detect illegal fishing activities. The ranking differentiates vessels with a distinctive history of frequent abnormal (suspicious) activities from the majority of vessels normally complying with commercial fishing regulations. This is done in three main steps, described in more detail in the following.

Endpoint identification. Fishing vessels do not necessarily stop at major ports, especially when engaging in illicit activities. Thus, we develop our own method for identifying endpoints. This way, we do not miss any information about vessel movements between any two locations. The endpoint identification phase performs two consecutive clustering steps: 1) extraction of anchorage points, and 2) identification of endpoints among the extracted anchorage points. Anchorage points are locations where a vessel remains stationary (within close proximity) for longer periods of time. In the second step, we cluster anchorage points extracted from the trajectories of all fishing vessels to determine proper endpoints. These are common anchorage locations shared by a number of vessels. The list of endpoints includes well-known ports but also nonessential and widely unknown (hidden) harbors.

End-to-end trip segmentation. For a given trip between two identified endpoints, the related AIS reports render a trajectory by associating visited locations with time stamps. Because harvesters generally arrange their fishing activities as a trip, we expect to observe strong patterns across similar trips. AIS data does not directly provide information on vessel trips, and analyzing trajectories as a whole may miss patterns in shorter trips, especially when analyzing regions of interest [40]. Noise, corruption, and inconsistencies, as well as 'dark intervals' in AIS data, translate into inaccurate or missing reports. So an interpolation technique respecting the curvature of the earth is applied to each trajectory to complete missing reports for reasonably short gaps. Then, each interpolated trajectory is segmented into several end-to-end trips. Each such trip begins and ends at an anchorage point which is common among a number of vessels.

Ranking suspicious activity. The obtained trips represent the behavior of vessels at a proper granularity at which one can differentiate normal activities from suspicious ones. We compute two separate anomaly scores for each end-to-end trip: a global anomaly score and a local anomaly score. The global anomaly score is a measure of the deviation of a vessel's behavior for a given trip relative to other vessels' behavior for similar trips. Let us assume $T_{a,b}^{v_1}, T_{a,b}^{v_2}, \dots, T_{a,b}^{v_n}$ are complete end-to-end trips between endpoints $a$ and $b$. The length of a trip is the most important indicator of the type of a trip. We apply Kernel Density Estimation (KDE) clustering on the lengths of trips to divide them into a number of clusters, $C_{a,b} = \{c_{a,b}^1, c_{a,b}^2, \dots, c_{a,b}^m\}$, such that trips in the same cluster are more similar than those in other clusters. We emphasize that the input of the KDE method is only complete trips, as we can learn the most common trip types from full trajectories. The global anomaly score of trip $T_{a,b}^{v_1}$ is computed as

$$\mathrm{GAN}(T_{a,b}^{v_1}) = \operatorname*{argmin}_{c_{a,b}^i \in C_{a,b}} \mathrm{Distance}\bigl(L_{a,b}^{v_1},\, c_{a,b}^i\bigr)$$

where $\mathrm{Distance}(L_{a,b}^{v_1}, c_{a,b}^i)$ is the distance of data point $L_{a,b}^{v_1}$ from the center of cluster $c_{a,b}^i$. The local anomaly score measures the compliance of a vessel regarding the trip in question. Let us assume that in an end-to-end trip of vessel $v_1$ between endpoints $a$ and $b$, with length $L_{a,b}^{v_1}$, there are $m$ dark periods longer than the threshold $\delta_{dark}$. Each of these dark periods has length $l_i^{v_1}$, and $DP_{a,b}^{v_1} = \{l_1^{v_1}, l_2^{v_1}, \dots, l_m^{v_1}\}$. For such an end-to-end trip, the local anomaly score is

$$\mathrm{LAN}(T_{a,b}^{v_1}) = \lambda \times \frac{\sum_{i=1}^{m} l_i^{v_1}}{L_{a,b}^{v_1}}$$

If a vessel reports the navigational status of engaged in fishing more often than a defined threshold, we label the trip as a fishing trip. If a vessel goes dark in an end-to-end trip with fishing activity, this can be interpreted as more suspicious than a non-fishing trip. We reflect this factor in the local anomaly score through the parameter $\lambda$: we use $\lambda = 1$ for a non-fishing trip and $\lambda > 1$ for a fishing trip. Trip suspiciousness is simply the summation of the global and local anomaly scores,
$$\theta_{T_{a,b}^{v_1}} = \alpha \times \mathrm{LAN}(T_{a,b}^{v_1}) + \beta \times \mathrm{GAN}(T_{a,b}^{v_1})$$
where $\alpha$ and $\beta$ are parameters for adjusting the perceived significance of the local and global anomaly scores. Finally, for a given vessel $v_1$ with observed end-to-end trips $T_{a_1,b_1}, T_{a_2,b_2}, \dots, T_{a_k,b_k}$, the vessel suspiciousness of $v_1$ is computed as follows

$$\sigma_{v_1} = \sum_{i=1}^{k} \theta_{T_{a_i,b_i}^{v_1}} \times e^{-\frac{t - t_{a_i,b_i}}{\pi}}$$
where $t$ is the current time and $t_{a_i,b_i}$ is the start time of the end-to-end trip $T_{a_i,b_i}$. We amplify the impact of more recent trips compared to older trips, using the parameter $\pi$ to adjust how the trip time affects the suspiciousness score.
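A small numerical sketch of these formulas is given below. The parameter values (lambda, alpha, beta, and the decay constant) are illustrative only and are not the ones used in [253].

```python
# Numerical sketch of the scoring formulas above; all parameters illustrative.
import numpy as np

def local_anomaly(dark_periods, trip_length, lam=1.0):
    # LAN: fraction of the trip spent dark, scaled by lambda (>1 for fishing trips)
    return lam * sum(dark_periods) / trip_length

def trip_suspiciousness(lan, gan, alpha=1.0, beta=1.0):
    # theta: weighted combination of local and global anomaly scores
    return alpha * lan + beta * gan

def vessel_suspiciousness(trip_scores, trip_start_times, now, decay=30.0):
    # Exponential time decay: recent trips weigh more than older ones
    # (`decay` plays the role of the decay constant in the formula above)
    trip_scores = np.asarray(trip_scores, dtype=float)
    ages = now - np.asarray(trip_start_times, dtype=float)
    return float(np.sum(trip_scores * np.exp(-ages / decay)))

# A fishing trip (lam > 1) with two dark periods out of a 48-hour trip
lan = local_anomaly(dark_periods=[3.0, 1.5], trip_length=48.0, lam=2.0)
theta = trip_suspiciousness(lan, gan=0.4)
print(vessel_suspiciousness([theta, 0.2, 0.9], [100, 60, 10], now=120))
```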

7.2.2 Experimental results

In this section, we briefly describe spatial patterns of suspicious vessels detected by our method and refer the reader to [253] for thorough details. We study the spatial patterns of the more suspicious vessels and compare them to the behavior of normal vessels. To do so, we apply a grid to the area of interest, UTM zones 1 to 11, where each grid cell is roughly 10 km on each side. Figure 7.1-a shows the heatmap of the fishing effort of all vessels based on the status of the received AIS reports; each cell indicates the accumulated fishing effort of all vessels in that cell. Figure 7.1-b depicts the heatmap of the density of the going-dark phenomenon for fishing vessels; each cell shows the aggregate of the dark periods of all vessels. It is worth mentioning that the color density of each cell corresponds to the length of the dark periods that start in that cell. Considering the overall behavior of vessels, we divide the cells into three sets: white cells, gray cells, and black cells. White cells are the ones in which the majority of the observed vessels have normal behavior: a cell is considered white if more than 70% of the observed vessels always send their AIS messages. A cell is gray if the percentage of vessels sending their AIS messages is between 70% and 30%. And a cell is black if less than 30% of vessels always transmit their AIS messages; in other terms, in a black cell the majority of vessels have been dark at least once. A white cell is typically an area in which vessels have no excuse not to transmit AIS messages, since a large set of vessels was able to do so. In a gray cell, we see a mix of behaviors, where a meaningful percentage of vessels send their AIS messages and a similar percentage do not. A good example of black cells are areas with weak signal.


Figure 7.1: a) Heatmap of Going Dark Area Based on Received AIS Reports, b) Heatmap of Fishing Density Based on Revised AIS Reports

In each cell, we closely study vessels' activities considering their suspiciousness. To do so, we divide vessels into two groups based on their suspiciousness score: vessels in the top 50% of the list as more suspicious vessels and the remaining ones as less suspicious vessels. This can be considered a binary classification of vessels in terms of their suspiciousness score. In 82% of white cells, the more suspicious vessels show similar behavior by turning off their AIS transponder at least once; in other words, in 82% of white cells, each vessel that went dark was on the list of more suspicious vessels. This figure is 77% and 68% for the gray and black cells, respectively. This means that for the remaining vessels, the global anomaly score contributes more to their vessel suspiciousness score. Overall, on average in 79% of all cells, all of the more suspicious vessels go dark. This is a strong behavioral pattern, which shows that the proposed ranking algorithm was successful in distinguishing less and more suspicious vessels.

7.3 Fishing Vessels Activity Detection from Longitudinal AIS Data

Motivation. Maritime domain awareness calls for continuous monitoring and tracking of fisheries using data and information from maritime intelligence sources to detect illegal fishing activities. Vessel tracking services routinely gather data for identifying, locating, and capturing information about ships and boats operating in coastal waters and across open oceans. Marine traffic data in the form of multivariate time series is interpreted as trajectories of vessel movements over time periods ranging from a few hours to several years. The total data volume renders manual processing impossible, raising an immediate need for autonomous and smart systems to follow the vessels' footprints in near real-time and detect their basic activity types. As discussed in Section 7.2, fisheries authorities and regional fisheries management organizations face many difficulties in detecting and combating IUU fishing. One of the fundamental obstacles is the lack of systematic methods for measuring vessels' fishing effort. In the presence of uncertainty, predictive modeling, trained using expert-labelled data, reduces the size of the search space and increases the chances of disrupting IUU fishing.

Problem Definition. We define the fishing activity detection problem and the scope of the proposed solution in abstract terms as a binary classification model for interpreting trajectories extracted from marine traffic data. Let $v \in V$ denote a vessel with AIS equipment which can engage in fishing activities. Further, let $\tau_v$ denote a temporally ordered set of AIS reports, $\{r_{\tau_v}\}$, from a trajectory associated with vessel $v$. A report $r_{\tau_v}$ is a vector of values for the AIS features. We then formulate our problem as follows:

Given a set of fishing vessels $V = \{v_1, v_2, \dots, v_m\}$ and their corresponding trajectories $T = \{\tau_{v_1}, \tau_{v_2}, \dots, \tau_{v_m}\}$, fishing activity detection is the task of determining whether a vessel $v_i \in V$ at time $t$ was engaged in fishing or not by analysing $\delta$ consecutive reports from $\tau_{v_i}$. This problem can be extended to the identification of the whole trajectory representing an end-to-end fishing activity.

The notation $\tau_{v_i,t}$ denotes a specific time point $t$ in the trajectory $\tau$ of some vessel $v_i$.

7.3.1 Proposed framework

A summary of the proposed method, called FishNET [14], to solve the mentioned problem is described in this section. FishNET overcomes the trajectory classification challenges by utilizing a four-stage algorithm to differentiate fishing activities from non-fishing ones. The detection framework is illustrated in Figure 7.2. First, the Trajectory Reconstruction component identifies the vessels' footprints and generates a set of motion-related features to produce trajectories invariant to location and time. The next component, Segment Generation, prepares the classifier's input by utilizing a sliding-window segmentation to partition trajectories into short segments, which are labeled based on their likelihood of showing fishing activity. In the end, the vessel status at each temporal point is determined based on the majority of labels assigned to the segments that include the given point; this facilitates the end-to-end fishing trajectory identification process.

Trajectory reconstruction

The trajectory reconstruction process handles a variety of challenges we are facing in preparing the data for the predictive analysis. For example, the data is exposed to different noise sources, like infeasible speed or location, which are cleaned or replaced. Also, the time sampling for AIS reports is sometimes irregular. To reconstruct regular trajectories of the message sequences, we apply linear spatial interpolation relying on the constant velocity assumption for fishing attempts. However, the granularity level of reconstructed trajectories matters. Based on our analysis, resampling trajectories at a 10-minute time scale is logical, because it does not leave a big gap between two successive AIS reports, which helps to avoid potential interpolation errors.
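A simplified sketch of the resampling step is shown below using pandas. Note that the thesis applies an interpolation that respects the curvature of the earth, whereas this sketch interpolates values linearly in time purely for illustration; the column names are assumptions.

```python
# Simplified stand-in for the trajectory regularization step (10-minute grid).
import pandas as pd

def resample_trajectory(reports: pd.DataFrame) -> pd.DataFrame:
    """`reports` has a datetime index and columns like lat, lon, sog, cog."""
    reports = reports.sort_index()
    return (reports
            .resample("10min")            # regular 10-minute time scale
            .mean()                       # collapse duplicate reports in a bin
            .interpolate(method="time"))  # fill short gaps between reports

idx = pd.to_datetime(["2018-01-01 00:00", "2018-01-01 00:07", "2018-01-01 00:31"])
df = pd.DataFrame({"lat": [55.0, 55.01, 55.05], "lon": [12.0, 12.02, 12.10]}, index=idx)
print(resample_trajectory(df))
```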

Figure 7.2: FishNET: Fishing activity detection framework

A generalizable trajectory classifier is expected to find movement patterns and trajectory shapelets independent of the exact time and space. In other words, we expect that shipping trajectories share the same basic sub-patterns irrespective of geospatial characteristics. However, classifying trajectories based on the original features included in AIS reports might lead the model to develop undesired decision logic based on a vessel's location or fishing activity time. We thus transform the original features in the trajectory of each vessel, $\tau_{v,t}$, into a set of motion-related features $\zeta_{v,t}$ that are suitable for discovering dynamic behavioral patterns of fishing activities independent of spatiotemporal details:

$$\{\tau_{v,t}\}_{t=0}^{n} \Rightarrow \{\zeta_{v,t}\}_{t=0}^{n} \qquad (7.1)$$

Generally, vessel motions, as shown in Figure 7.3, are defined by six degrees of freedom: three components of translation (heave, sway, surge) and three components of rotation (pitch, roll, yaw) [210]. On this basis, the following set of performance and movement related characteristics can be extracted [200], as thoroughly described in [14]: distance ($D_{v,t}$), rectilinear speed ($S_{v,t}$), rectilinear acceleration ($A_{v,t}$), rectilinear jerk ($J_{v,t}$), and derivative of course ($\Psi_{v,t}$).

Thus, $\zeta_{v,t}$ is defined based on $\{D_{v,t}, S_{v,t}, A_{v,t}, J_{v,t}, \Psi_{v,t}\}$ as the moving behavioral characteristics of fishing vessels. Applying these intrinsic attributes as predictive features not only makes FishNET robust to differences in geographical locations and timestamps of trajectories but also eliminates the need for considering the fishing vessel types in the classification process.
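The transformation into motion-related features can be sketched as follows, deriving step distance (here via the haversine formula), rectilinear speed, acceleration, jerk, and the derivative of course from consecutive resampled positions. Column names and units are illustrative assumptions.

```python
# Illustrative derivation of the motion-related features from a resampled track.
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def motion_features(track: pd.DataFrame) -> pd.DataFrame:
    """`track` is a regularly resampled trajectory with lat, lon, cog columns."""
    dt_h = track.index.to_series().diff().dt.total_seconds() / 3600.0
    feats = pd.DataFrame(index=track.index)
    feats["distance"] = haversine_km(track["lat"].shift(), track["lon"].shift(),
                                     track["lat"], track["lon"])
    feats["speed"] = feats["distance"] / dt_h              # rectilinear speed
    feats["acceleration"] = feats["speed"].diff() / dt_h
    feats["jerk"] = feats["acceleration"].diff() / dt_h
    feats["d_course"] = track["cog"].diff() / dt_h         # derivative of course
    return feats.dropna()
```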

Segment generation

As discussed earlier, fishing activity routinely extends over considerable time periods, from several hours to days, depending on the fishing type [68]. Feeding a classifier model with such very long input sequences may not only misguide it and decrease the performance but will also increase the model's complexity and training time. Thus, to prepare the classifier's input, we extract trajectory fragments $w_t$ out of the transformed trajectory $\zeta_t$ using a sliding window of size $\delta$ that moves across the signals and generates a set of fixed-length sequences.

Figure 7.3: Vessel motion: six degrees of freedom

$$\{\zeta_{v,t}\}_{t=1}^{n} \Rightarrow \bigl\{\{w_t\}_{t=\delta}^{n} \;\big|\; w_{v,t} = \{\zeta_{v,i}\}_{i=t-\delta-1}^{t}\bigr\} \qquad (7.2)$$

Based on our experiments with multiple values of $\delta$, we found that the setting $\delta = 11$ helps the classifier achieve the highest accuracy level, which intuitively makes sense; a sequence of about 2 hours is long enough to be informative and representative of a vessel's movement behavior. Next, a majority voting scheme is applied to assign a class label to the generated sequences. However, as fishing vessels do not frequently switch their status, the majority of window sequences are pure fishing or non-fishing cases. To simplify labelling the entire sequence based on the majority rule, we considered odd values for $\delta$.
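A minimal sketch of the segmentation and majority-rule labelling, with delta = 11 as in the text; the window indexing below is a simplification of Equation 7.2.

```python
# Sliding-window segmentation with majority-vote labels (odd delta avoids ties).
import numpy as np

def make_segments(features: np.ndarray, labels: np.ndarray, delta: int = 11):
    """features: (n, d) motion-feature matrix; labels: (n,) 0/1 fishing flags."""
    segments, seg_labels = [], []
    for start in range(0, len(features) - delta + 1):
        window = features[start:start + delta]
        window_labels = labels[start:start + delta]
        segments.append(window)
        seg_labels.append(int(window_labels.sum() > delta // 2))  # majority vote
    return np.stack(segments), np.array(seg_labels)

X = np.random.rand(100, 5)                 # 100 time points, 5 motion features
y = (np.random.rand(100) > 0.5).astype(int)
segs, seg_y = make_segments(X, y)
print(segs.shape, seg_y.shape)             # (90, 11, 5) (90,)
```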

CNN-based classification

To classify the fixed-length multivariate segments generated in the previous phase, we design a one-dimensional convolutional neural network (1D-CNN). The output of this model is a binary label $\beta$ assigned to any arbitrary input segment $w$ to determine its fishing/non-fishing status:

$$\mathrm{Classifier}(w) \rightarrow \beta \qquad (7.3)$$

The main reason for using a 1D-CNN is its significant performance in recent studies and its fairly low computational complexity. The model consists of two main modules: convolution and classification. The convolution module itself includes two main types of layers: convolutional and pooling layers. The former extracts in-depth features from the input sequences, while the latter filters the extracted features to highlight the most salient elements. In the end, the classification module, containing a set of fully connected layers, learns from the constructed internal representation how to classify the input segments.

Thus, as the kernel, $\mathrm{conv1D}(\cdot,\cdot)$, in our ClassifierCNN slides along the 2D input $w$, it extracts the appropriate artificial features to distinguish fishing trajectories from the rest. As Equation 7.4 shows, the input is convolved with the kernels and passed through an activation function to produce the first layer's output feature map.

$$o_j^l = b_j^l + \sum_{i=1}^{N_{l-1}} \mathrm{conv1D}\bigl(k_{ij}^{l-1},\, w_i^l\bigr) \quad \big|\; l = 1 \qquad (7.4)$$

In general, the same process is followed through the next layers [34, 156]:

$$o_j^l = f\Bigl(\sum_{i \in M_j} o_i^{l-1} * k_{ij}^{l-1} + b_j^l\Bigr) \quad \big|\; l \in 2..L \qquad (7.5)$$

where $w_i^l$ is the $i$-th element of the input, $L$ is the number of convolutional layers, $b_j^l$ is the bias of the $j$-th neuron at layer $l$, $o_i^{l-1}$ is the output of the $i$-th neuron at layer $l-1$, and $k_{ij}^{l-1}$ is the kernel from the $i$-th neuron at layer $l-1$ to the $j$-th neuron at layer $l$. In the next step, the final output of the convolution module is fed into the fully connected layers, which map it to the final outputs of the network by determining the probability distribution over the set of target classes for the given input.

$$\mathrm{FCN}(o^l) \rightarrow \alpha \quad \big|\; l = L \qquad (7.6)$$

Finally, the class with the maximum probability will be selected as the segment's fishing/non-fishing status ($\max(\alpha) \rightarrow \beta$).
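The following is a minimal PyTorch sketch of such a segment classifier: a convolution module (convolutional and pooling layers) followed by a fully connected classification module, ending in a probability distribution over the fishing/non-fishing classes. Layer sizes and hyperparameters are illustrative and not the ones used in FishNET.

```python
# Illustrative 1D-CNN segment classifier; architecture details are assumptions.
import torch
import torch.nn as nn

class SegmentCNN(nn.Module):
    def __init__(self, n_features=5, n_classes=2):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classify = nn.Sequential(
            nn.Flatten(), nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, n_classes),
        )

    def forward(self, w):
        # w: (batch, delta, n_features) -> Conv1d expects (batch, channels, length)
        logits = self.classify(self.conv(w.transpose(1, 2)))
        return logits.softmax(dim=1)   # distribution over fishing / non-fishing

model = SegmentCNN()
dummy = torch.rand(8, 11, 5)           # a batch of 8 segments of length delta = 11
print(model(dummy).argmax(dim=1))      # predicted beta for each segment
```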

Fishing trajectory identification

Identifying the whole trajectory representing an end-to-end fishing activity is the ultimate goal of FishNET. First, point-wise labeling is performed based on the majority voting approach (Equation 7.7), which determines the fishing likelihood at the given point. The voting algorithm decides based on the labels assigned by the classifier to all the $\delta$ segments that include the given point.

$$\mathrm{CMV}(\zeta_{v,t}) = \operatorname*{argmax}_i \sum_{j=\delta-1}^{0} I\bigl(\mathrm{Classifier}(\zeta_{v,\{(t-j)..(t+\delta-j)\}})\bigr) \qquad (7.7)$$

Here, $\zeta_{v,t}$ is a specific point of a vessel trajectory, $\zeta_{v,t_1..t_2}$ are the generated segments which include $\zeta_{v,t}$, and $I(\cdot)$ is an indicator function. Then, each end-to-end fishing activity belonging to vessel $v$ is spotted based on uninterrupted fishing intervals.

$$\{\zeta_{v,i}, \dots, \zeta_{v,j} \in FT \mid \forall\, \mathrm{seq} = \zeta_{v,k}..\zeta_{v,m} \mid k \ge i,\ m \le j,\ \mathrm{size}(\mathrm{seq}) \ge 2*\theta \Rightarrow \operatorname{argmax}(\mathrm{CMV}(\zeta_{v,t}) \in \mathrm{seq}) == \mathrm{fishing}\} \qquad (7.8)$$

Here, $\theta$ indicates the length of permitted non-fishing gaps in the identified trajectory, and $FT$ denotes the set of end-to-end trajectories involved in fishing activities.

7.3.2 Experimental results

For this project, we use a labelled dataset [3] to train our model and two unlabelled datasets to run longitudinal analysis:

• U. S. AIS Dataset. The AIS data provided by the U. S. Coast Guard Services (available at www.marinecadastre.gov) covers all of the coastal waters of the United States and most of Canada from 2015 through 2018.

• Denmark AIS Dataset. The AIS data provided by the Danish shore-based AIS system (available at dma.dk) covers vessel movement in Denmark’s coastal waters for the period of 2015-2018.

The performance of FishNET is scrutinized in [14], and its results are compared with GFW, to the best of our knowledge the only state-of-the-art fishing activity detection method evaluated using large real-world data. As the analysis shows, FishNET is able to outperform GFW in detecting the fishing patterns of some vessel types. The following two subsections present the temporal and anomaly detection related analyses of FishNET.

Temporal distribution of fishing effort

Fisheries management is complex due to the heterogeneity of this phenomenon. In addition to spatial variability, temporal variability also plays an important role in catch size and fishing effort distribution. Analyzing vessels' activity at a high temporal resolution makes our picture of global fishing more complete. Figures 7.4a and 7.4b show the temporal distribution of fishing effort for the sea around Denmark and North America's coastal areas, respectively, for the studied period. Comparing the fishing effort pattern in both regions, we can see a similar trend: a lower level of fishing in the colder months of the year and a higher level of fishing effort in the warmer months, as expected. When comparing the yearly temporal fishing effort distributions of the sea around Denmark, we can see many similar but not identical patterns. Interestingly, the yearly fishing effort distributions of North America's coastal area show strong similarities.

Anomalous fishing activity detection

As a practical application of FishNET, we discuss here how successful fishing activity detection can be used to overcome this difficulty. The output of FishNET provides input for intelligence-based profiling and ranking of vessels linked to observable behavior that diverts from routine operations. A small improvement in the success rate of quantification and identification of IUU fishing means a lot in absolute numbers when dealing with an enormous fleet of ships and boats that operate across vast coastal areas and open oceans, considering that limited surveillance resources constrain maritime domain awareness and security [112]. The very first group of vessels susceptible to IUU fishing are the ones at the top of a list ordered by fishing hours. Here, we focus on the top 10% of most active fishing vessels. In the sea around Denmark, the top 10% of most active vessels fish 1,791 hours on average, six times more than an average vessel in this area. There are 153, 160, 177, and 142 vessels on this list for the years 2015, 2016, 2017, and 2018, respectively. Only 10 vessels are continuously active for all four years. Among these 10 vessels, 7 are industrial fishing vessels with a length of more than 20 meters, and more specifically 4 of them are trawlers. While continuous fishing activity is plausible for such vessels, the same phenomenon for a small vessel is strange and needs close investigation by monitoring agencies. Although the U.S. dataset contains fewer vessels, the average hours of fishing for the top 10% most active fishing vessels is 1,490 hours. There are 38, 35, 35, and 44 vessels on this list for the years 2015, 2016, 2017, and 2018, respectively; only three vessels are on this list for all four years. Fishing in marine protected areas can pose significant threats to already endangered species. To extract patterns of fishing in rare spots, we compute for each grid cell the ratio of the total fishing hours to the number of vessels in that cell. Cells with a higher ratio are the ones with more fishing done by fewer vessels and need closer investigation. In the sea around Denmark, we found 45 areas with a high ratio; seven of them are relatively far from the shoreline, which makes them even more suspicious. In North American coastal waters, we found 61 areas; four of these are farther from the shoreline, and 3 of them are around Alaska. As mentioned earlier, such analysis can be more meaningful and applicable in collaboration with experts monitoring fishing activity.

7.4 Conclusions

Spatiotemporal data differs from the time series analyzed in the previous sections and brings its own challenges. Our spatiotemporal projects target marine safety and maritime domain awareness challenges. Illegal, unreported, and unregulated fishing gravely threatens the sustainability of fisheries, marine ecosystems, and fish populations worldwide. Constrained by limited surveillance resources, enforcement agencies face immense challenges in their efforts to gain the upper hand in dealing with an enormous fleet of fishing boats and ships operating across vast coastal areas and open oceans, in total covering more than 70 percent of our planet's surface. The proposed techniques in this chapter for tracking dark fishing by profiling and ranking fishing vessels aim at enhancing efficiency through predictive models that utilize historical data and information, which is available in abundance. Narrowing down the search space by drawing on probability distributions for suspected dark fishing leads to more targeted procedures for checking and probing vessels and their catches.

Figure 7.4: Temporal distribution of fishing effort: (a) the seas around Denmark, (b) North America's coastal areas

Chapter 8

Lessons Learned and Future Work

This chapter summarizes the main findings and outlines directions for future work on the problems addressed in this dissertation.

8.1 Lessons Learned

The very nature of large-scale critical infrastructure necessitates advanced analytical methods that contribute to the situational awareness of decision-makers, specifically to ensure that they know what is happening in the problem domain. While it is impossible (and often unnecessary) to be aware of everything, especially in large-scale systems, it is nonetheless crucial to be aware of the events of interest. Therefore, the present study focuses on proposing novel algorithms in an end-to-end framework for anomaly/event detection in stream data. Accordingly, we exemplified and experimentally validated the proposed framework on critical cyber-physical systems; nevertheless, the proposed approaches are conceptually applicable to a variety of other real-world domains. Despite all existing techniques, several fundamental challenges related to situation analysis in stochastic stream data must be addressed. Thus, as illustrated in Figure 1.1, the present work explicates the steps required to achieve a scalable and interpretable anomaly detection framework, which are summarized below.

First, we proposed an efficient and interpretable anomaly-detection algorithm to trace suspicious activities in SCADA-based water supply systems. To this end, we solved the anomaly-scoring problem of existing HMM techniques and addressed the trade-off between selecting the right input window size and granularity by presenting a hierarchical multi-granular model. Furthermore, overcoming the limitations of behavioral modeling requires a time-series modeling method with high average accuracy and small error deviation. Accordingly, we proposed a new hybrid method capable of handling complicated time series consisting of both linear and nonlinear components. The proposed framework also handles the problem of noisy data through a novel multi-branch data augmentation scheme.
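To make the hybrid linear/nonlinear idea concrete, the following minimal sketch (an illustration only, not the exact architecture developed in this thesis) fits an ARIMA model to capture the linear component and a small neural network to the ARIMA residuals to capture the remaining nonlinear component; the library choices, the ARIMA order, and the lag length are assumptions.

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from statsmodels.tsa.arima.model import ARIMA

    N_LAGS = 10

    def fit_hybrid(y, order=(2, 0, 1)):
        # Linear component: ARIMA fitted to the raw series.
        linear = ARIMA(y, order=order).fit()
        resid = np.asarray(linear.resid)
        # Nonlinear component: an MLP fitted to lagged ARIMA residuals.
        X = np.column_stack([resid[i:len(resid) - N_LAGS + i] for i in range(N_LAGS)])
        target = resid[N_LAGS:]
        nonlinear = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
        nonlinear.fit(X, target)
        return linear, nonlinear, resid

    def forecast_next(linear, nonlinear, resid):
        # One-step-ahead forecast = linear forecast + predicted residual correction.
        linear_part = float(linear.forecast(steps=1)[0])
        nonlinear_part = float(nonlinear.predict(resid[-N_LAGS:].reshape(1, -1))[0])
        return linear_part + nonlinear_part

    # Toy example: a trend plus a nonlinear seasonal signal plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(500)
    y = 0.05 * t + np.sin(0.2 * t) ** 3 + rng.normal(scale=0.1, size=t.size)
    linear, nonlinear, resid = fit_hybrid(y)
    print(forecast_next(linear, nonlinear, resid))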

Then, to address challenges related to anomaly scoring, we devised an attack-detection model that functions through a distributed detector network. In particular, applying dynamic attack scoring boosts the analytic performance of the proposed method: it efficiently detects collective attacks by orchestrating a network of detectors and reduces false-alarm rates by ignoring potential contextual noise and errors in the applied behavioral predictors.

The ultimate outcome of the entire framework must satisfy the objectives and expectations of real-world domain experts. Given the immense importance of having an effective anomaly detection method, the interpretability of its decision-making process is likewise important. Therefore, a model-agnostic explainer is proposed to fill the black-box gap and provide the required interpretability, both to convince the human analyst about the isolating factors and to highlight the model's potential limitations (a generic, simplified illustration is given at the end of this section). Furthermore, some opportunities for model-specific explainers and causal profiling-based XAI are explored. This step not only builds trust by involving the human analyst in the predictive process, but it can also advance AI for the common good and help both critical and noncritical organizations ensure that the decisions made by their applied AI are ethical.

In addition, spatiotemporal data analysis was explored with a focus on maritime domain awareness. Accordingly, in a joint research effort, we proposed two novel methods for trajectory classification and anomaly detection to address the catastrophic issue of dark or illegal fishing.

It should be noted that this study was based on several years of collaborative, funded safety and security research projects with industry and academic partners. Accordingly, we strongly believe the concepts and findings reported herein can inspire new research and development directions with applications in cyber intelligence, stream data analysis, and anomaly detection, as well as decision-making support for other critical security operations. Nevertheless, several problems and challenges in the realm of end-to-end anomaly detection in stream data remain to be addressed.
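As a generic illustration of the model-agnostic explanation idea discussed above, the sketch below attributes a black-box anomaly score to individual features by replacing one feature at a time with values drawn from normal background data and measuring how much the score drops. The IsolationForest detector and the synthetic data are stand-ins, and this occlusion-style attribution is not the specific explainer proposed in this thesis.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(1000, 5))            # "normal" background behaviour
    detector = IsolationForest(random_state=0).fit(X_train)

    def anomaly_score(x):
        # score_samples is higher for normal points, so negate it:
        # larger value = more anomalous.
        return -detector.score_samples(x.reshape(1, -1))[0]

    def explain(x, background, n_samples=200):
        # Occlusion-style attribution: replace one feature at a time with values
        # drawn from the background data and measure how much the score drops.
        base = anomaly_score(x)
        attributions = np.zeros(x.shape[0])
        for j in range(x.shape[0]):
            idx = rng.integers(0, background.shape[0], size=n_samples)
            drops = []
            for i in idx:
                x_pert = x.copy()
                x_pert[j] = background[i, j]
                drops.append(base - anomaly_score(x_pert))
            attributions[j] = np.mean(drops)  # large value => feature j drives the anomaly
        return attributions

    x_anomalous = np.array([0.1, 6.0, -0.2, 0.0, 0.3])   # feature 1 is far from normal
    print(explain(x_anomalous, X_train).round(3))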

8.2 Future Work

Below, I briefly outline a number of directions for future research in the field of temporal anomaly detection.

• Improving anomaly scoring techniques. The efficacy of the methods introduced and proposed in the anomaly-scoring phase of this end-to-end framework can be improved and extended in numerous ways. It would be of interest to develop more sophisticated methods based on reinforcement learning that can verify the system's impact and enhance the scoring phase. Moreover, other algorithms and mechanisms for computing the confidence, significance, and impact of a suspicious event can be developed.

• Improving isolation explainability. There are a variety of future directions to improve isolation explainability. For example, it would be of interest to provide qualitative specifications and formal explanations of the quantitative specifications learned by the statistical models, using logical AI. The main advantages of formal logic are internal property transparency and the possibility of interpretable representation at different levels of abstraction. Other possibilities in this area include developing more sophisticated yet transparent situation-visualization and alert-generation mechanisms.

• Enhancing the test platform. Given the importance of validation and testing in model development, test platforms can be improved. The scarcity of realistic time series with labeled anomalies, and the difficulty of generating them, make time-series anomaly detection very challenging to evaluate. Thus, generative adversarial networks can be used to produce realistic but domain-dependent data with various challenging anomalies.

• Transferable time series anomaly detection. The present findings can be extended to transferable time-series anomaly detection. It is often difficult to collect large, high-quality, or labeled datasets to train advanced anomaly detection models from scratch. Thus, transfer learning and domain adaptation can reduce training time, enabling a high-performance model to be applied to a different domain or task (see the illustrative sketch below). Accordingly, transfer learning in the context of time-series anomaly detection is another promising area that has received little attention. Regardless, devising an effective transfer method will facilitate the analytical process and enable anomaly detection to be treated as a supervised learning problem.
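As a purely illustrative sketch of this direction (hypothetical, not a method developed in this thesis), the snippet below pretrains a small window autoencoder on an abundant source-domain series, freezes the encoder, fine-tunes only the decoder on scarce target-domain windows, and scores windows by reconstruction error. All architectural choices and the synthetic data are assumptions.

    import torch
    import torch.nn as nn

    WINDOW = 32

    def make_autoencoder():
        encoder = nn.Sequential(nn.Linear(WINDOW, 16), nn.ReLU(), nn.Linear(16, 4))
        decoder = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, WINDOW))
        return encoder, decoder

    def train(encoder, decoder, windows, epochs, lr=1e-3, train_encoder=True):
        params = list(decoder.parameters()) + (list(encoder.parameters()) if train_encoder else [])
        opt = torch.optim.Adam(params, lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            opt.zero_grad()
            loss = loss_fn(decoder(encoder(windows)), windows)
            loss.backward()
            opt.step()
        return loss.item()

    # Pretrain on abundant source-domain windows (a plain sine wave here).
    source = torch.sin(torch.linspace(0, 100, 2000)).unfold(0, WINDOW, 1)
    encoder, decoder = make_autoencoder()
    train(encoder, decoder, source, epochs=200)

    # Transfer: freeze the encoder and fine-tune only the decoder on scarce
    # target-domain windows (a shifted, rescaled sine wave here).
    for p in encoder.parameters():
        p.requires_grad = False
    target = 0.5 * torch.sin(torch.linspace(0, 20, 300) + 1.0).unfold(0, WINDOW, 1)
    train(encoder, decoder, target, epochs=100, train_encoder=False)

    # Score target windows by reconstruction error; high error suggests an anomaly.
    with torch.no_grad():
        errors = ((decoder(encoder(target)) - target) ** 2).mean(dim=1)
    print(errors.topk(3).values)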

Bibliography

[1] Global threat landscape report: A semiannual report by FortiGuard Labs. https://www.fortinet.com/content/dam/fortinet/assets/threat-reports/threat-report-h1-2020.pdf. Accessed: 2020-10-25.

[2] Outlier detection for temporal data. https://archive.siam.org/meetings/sdm13/gupta.pdf. Accessed: 2018-09-30.

[3] Sustainability through transparency. Global Fishing Watch. https://globalfishingwatch.org/.

[4] Time series anomaly detection algorithms. https://blog.statsbot.co/time-series-anomaly-detection-algorithms-1cef5519aef2. Accessed: 2018-10-20.

[5] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, and Gerald Penn. Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 4277–4280. IEEE, 2012.

[6] Aderemi O Adewumi and Andronicus A Akinyelu. A survey of machine-learning and nature-inspired based credit card fraud detection techniques. International Journal of System Assurance Engineering and Management, 8(2):937–953, 2017.

[7] United States Environmental Protection Agency. Cyber security 101 for water utilities. Accessed: July 2016.

[8] Subutai Ahmad, Alexander Lavin, Scott Purdy, and Zuha Agha. Unsupervised real-time anomaly detection for streaming data. Neurocomputing, 262:134–147, 2017.

[9] Musse Mohamud Ahmed and WL Soo. Supervisory control and data acquisition system (scada) based customized remote terminal unit (rtu) for distribution automation system. In 2008 IEEE 2nd International Power and Energy Conference, pages 1655–1660. IEEE, 2008.

[10] Aijun An, Christine Chan, Ning Shan, Nick Cercone, and Wojciech Ziarko. Applying knowledge discovery to predict water-supply consumption. IEEE Expert, 12(4):72–78, 1997.

[11] Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2:1–18, 2015.

[12] Jerone Andrews, Thomas Tanay, Edward J Morton, and Lewis D Griffin. Transfer representation-learning for anomaly detection. JMLR, 2016.

[13] Fabrizio Angiulli, Fabio Fassetti, and Luigi Palopoli. Discovering characterizations of the behavior of anomalous subpopulations. IEEE Transactions on Knowledge and Data Engineering, 25(6):1280–1292, 2012.

[14] Saeed Arasteh, Mohammad A. Tayebi, Zahra Zohrevand, Uwe Glässer, Amir Yaghoubi S., Parvaneh Saeedi, and Hans Wehn. Fishing vessels activity detection from longitudinal ais data. In 28th International Conference on Advances in Geographic Information Systems (SIGSPATIAL ’20), 2020.

[15] Barak Ariel and David P Farrington. Randomized block designs. In Handbook of quantitative criminology, pages 437–454. Springer, 2010.

[16] Alejandro Barredo Arrieta, Natalia Díaz-Rodríguez, Javier Del Ser, Adrien Bennetot, Siham Tabik, Alberto Barbado, Salvador García, Sergio Gil-López, Daniel Molina, Richard Benjamins, et al. Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai. Information Fusion, 58:82–115, 2020.

[17] Mikhail Atallah, Wojciech Szpankowski, and Robert Gwadera. Detection of significant sets of episodes in event sequences. In Fourth IEEE International Conference on Data Mining (ICDM’04), pages 3–10. IEEE, 2004.

[18] Gowtham Atluri, Anuj Karpatne, and Vipin Kumar. Spatio-temporal data mining: A survey of problems and methods. ACM Computing Surveys (CSUR), 51(4):1–41, 2018.

[19] Stefan Axelsson. Intrusion detection systems: A survey and taxonomy. Technical report, 2000.

[20] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[21] Shaojie Bai, J Zico Kolter, and Vladlen Koltun. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271, 2018.

[22] Sabyasachi Basu and Martin Meckesheimer. Automatic outlier detection for time series: an application to sensor data. Knowledge and Information Systems, 11(2):137–154, 2007.

[23] Stephen D Bay, Dennis Kibler, Michael J Pazzani, and Padhraic Smyth. The uci kdd archive of large data sets for data mining research and experimentation. ACM SIGKDD explorations newsletter, 2(2):81–85, 2000.

[24] Richard E Bellman and Stuart E Dreyfus. Applied dynamic programming. Princeton university press, 2015.

[25] Irad Ben-Gal. Outlier detection. In Data mining and knowledge discovery handbook, pages 131–146. Springer, 2005.

[26] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.

[27] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995.

[28] Daniel S Berman, Anna L Buczak, Jeffrey S Chavis, and Cherita L Corbett. A survey of deep learning methods for cyber security. Information, 10(4):122, 2019.

[29] Renaud Bidou. Security operation center concepts & implementation. Available at http://www.iv2-technologies.com, 2005.

[30] Ane Blázquez-García, Angel Conde, Usue Mori, and Jose A Lozano. A review on outlier/anomaly detection in time series data. arXiv preprint arXiv:2002.04236, 2020.

[31] Hans-Peter Blossfeld and Götz Rohwer. Causal inference, time and observation plans in the social sciences. Quality and quantity, 31(4):361–384, 1997.

[32] Alisa Bokulich. Searching for noncausal explanations in a sea of causes. Explanation Beyond Causation: Philosophical Perspectives on Non-Causal Explanations, page 141, 2018.

[33] Anastasia Borovykh, Sander Bohte, and Cornelis W Oosterlee. Conditional time series forecasting with convolutional neural networks. arXiv preprint arXiv:1703.04691, 2017.

[34] Jake Bouvrie. Notes on convolutional neural networks. In Practice, pages 47–60, 2006.

[35] George EP Box, Gwilym M Jenkins, Gregory C Reinsel, and Greta M Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.

[36] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[37] Markus M Breunig, Hans-Peter Kriegel, Raymond T Ng, and Jörg Sander. Lof: identify- ing density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data, pages 93–104, 2000.

[38] Robert A Bridges, Jessie D Jamieson, and Joel W Reed. Setting the threshold for high throughput detectors: A mathematical approach for ensembles of dynamic, heterogeneous, probabilistic anomaly detectors. arXiv preprint arXiv:1710.09422, 2017.

[39] Peter J Brockwell and Richard A Davis. Introduction to time series and forecasting. springer, 2016.

[40] Yingyi Bu, Tat-Wing Leung, Ada Wai-Chee Fu, Eamonn Keogh, Jian Pei, and Sam Meshkin. Wat: Finding top-k discords in time series database. In Proceedings of the 2007 SIAM International Conference on Data Mining, pages 449–454. SIAM, 2007.

[41] Suratna Budalakoti, Ashok N Srivastava, Ram Akella, and Eugene Turkov. Anomaly detection in large sets of high-dimensional symbol sequences. 2006.

[42] Suratna Budalakoti, Ashok N Srivastava, and Matthew Eric Otey. Anomaly detection and diagnosis algorithms for discrete symbol sequences with applications to airline safety. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 39(1):101–113, 2008.

[43] Sergey V Buldyrev, Roni Parshani, Gerald Paul, H Eugene Stanley, and Shlomo Havlin. Catastrophic cascade of failures in interdependent networks. Nature, 464(7291):1025–1028, 2010.

[44] J Peter Burman and Mark C Otto. Census bureau research project: Outliers in time series. Bureau of the Census, SRD Res. Rep. CENSUS/SRD/RR-88114, 1988.

[45] João BD Cabrera, Lundy Lewis, and Raman K Mehra. Detection and classification of intrusions and faults using sequences of system calls. Acm sigmod record, 30(4):25–34, 2001.

[46] Davide Castelvecchi. Can we open the black box of ai? Nature News, 538(7623):20, 2016.

[47] Francesco Cauteruccio, Giancarlo Fortino, Antonio Guerrieri, Antonio Liotta, Decebal Constantin Mocanu, Cristian Perra, Giorgio Terracina, and Maria Torres Vega. Short-long term anomaly detection in wireless sensor networks based on machine learning and multi-parameterized edit distance. Information Fusion, 52:13–30, 2019.

[48] Sezen Cekic, Didier Grandjean, and Olivier Renaud. Time, frequency, and time-varying granger-causality measures in neuroscience. Statistics in medicine, 37(11):1910–1931, 2018.

[49] Kanad Chakraborty, Kishan Mehrotra, Chilukuri K Mohan, and Sanjay Ranka. Forecasting the behavior of multivariate time series using neural networks. Neural networks, 5(6):961–970, 1992.

[50] Raghavendra Chalapathy and Sanjay Chawla. Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407, 2019.

[51] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3):15, 2009.

[52] Varun Chandola, Varun Mithal, and Vipin Kumar. Comparative evaluation of anomaly detection techniques for sequence data. In 2008 Eighth IEEE international conference on data mining, pages 743–748. IEEE, 2008.

[53] Rohitash Chandra and Mengjie Zhang. Cooperative coevolution of elman recurrent neural networks for chaotic time series prediction. Neurocomputing, 86:116–123, 2012.

[54] Ioannis Chapsos and Steve Hamilton. Illegal fishing and fisheries crime as a transnational organized crime in indonesia. Trends in Organized Crime, 22(3):255–273, 2019.

[55] Snehamoy Chatterjee, Ansuman Dash, and Sukumar Bandopadhyay. Ensemble support vector machine algorithm for reliability estimation of a mining machine. Quality and Reliability Engineering International, 31(8):1503–1516, 2015.

[56] Lu Chen, Hongya Qiu, Junhong Zhang, Vijay P Singh, Jianzhong Zhou, and Kangdi Huang. Copula-based method for stochastic daily streamflow simulation considering lag-2 autocorrelation. Journal of Hydrology, 578:123938, 2019.

[57] Tao Cheng and Zhilin Li. A multiscale approach for spatio-temporal outlier detection. Trans- actions in GIS, 10(2):253–263, 2006.

[58] Maximilian Christ, Nils Braun, Julius Neuffer, and Andreas W Kempa-Liehr. Time series feature extraction on basis of scalable hypothesis tests (tsfresh–a python package). Neurocomputing, 307:72–77, 2018.

[59] Jared Connell and Søren Højsgaard. Hidden semi markov models for multiple observation sequences: The mhsmm package for r. Journal of Statistical Software, 39(4):1–22, 2011.

[60] Jerome T Connor, R Douglas Martin, and Les E Atlas. Recurrent neural networks and robust time series prediction. IEEE transactions on neural networks, 5(2):240–254, 1994.

[61] Mark Craven and Jude W Shavlik. Extracting tree-structured representations of trained networks. In Advances in neural information processing systems, pages 24–30, 1996.

[62] Zhicheng Cui, Wenlin Chen, and Yixin Chen. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995, 2016.

[63] Pádraig Cunningham, Dónal Doyle, and John Loughrey. An evaluation of the usefulness of case-based explanation. In International Conference on Case-Based Reasoning, pages 122–130. Springer, 2003.

[64] Xuan Hong Dang, Barbora Micenková, Ira Assent, and Raymond T Ng. Local outlier detection with interpretation. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 304–320. Springer, 2013.

[65] Dipankar Dasgupta and Nivedita Sumi Majumdar. Anomaly detection in multidimensional data using negative selection algorithm. In Evolutionary Computation, 2002. CEC’02. Proceedings of the 2002 Congress on, volume 2, pages 1039–1044. IEEE, 2002.

[66] Dipankar Dasgupta and Fernando Nino. A comparison of negative and positive selection algorithms in novel pattern detection. In Smc 2000 conference proceedings. 2000 ieee international conference on systems, man and cybernetics.’cybernetics evolving to systems, humans, organizations, and their complex interactions’(cat. no. 0, volume 1, pages 125–130. IEEE, 2000.

[67] Alysha M De Livera, Rob J Hyndman, and Ralph D Snyder. Forecasting time series with complex seasonal patterns using exponential smoothing. Journal of the American Statistical Association, 106(496):1513–1527, 2011.

[68] Erico N de Souza, Kristina Boerder, Stan Matwin, and Boris Worm. Improving fishing pattern detection from satellite ais using data mining and machine learning. PloS one, 11(7):e0158248, 2016.

[69] Chirag Deb, Fan Zhang, Junjing Yang, Siew Eang Lee, and Kwok Wei Shah. A review on time series forecasting techniques for building energy consumption. Renewable and Sustainable Energy Reviews, 74:902–924, 2017.

[70] Hannah M Dee and David C Hogg. Detecting inexplicable behaviour. In BMVC, pages 1–10, 2004.

[71] Konstantinos Demertzis and Lazaros Iliadis. A computational intelligence system identifying cyber-attacks on smart energy grids. In Modern Discrete Mathematics and Analysis, pages 97–116. Springer, 2018.

[72] Howard B Demuth, Mark H Beale, Orlando De Jess, and Martin T Hagan. Neural network design. Martin Hagan, 2014.

[73] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. A time series forest for classification and feature extraction. Information Sciences, 239:142–153, 2013.

[74] UC Berkeley EECS Dept. Cyber-physical systems. cyberphysicalsystems.org, 2020. Accessed: 05-January-2020.

[75] Patrik D’haeseleer, Stephanie Forrest, and Paul Helman. An immunological approach to change detection: Algorithms, analysis and implications. In Security and Privacy, 1996. Proceedings., 1996 IEEE Symposium on, pages 110–119. IEEE, 1996.

[76] Thomas G Dietterich. Ensemble methods in machine learning. Multiple classifier systems, 1857:1–15, 2000.

[77] Carl Doersch. Tutorial on variational autoencoders. arXiv preprint arXiv:1606.05908, 2016.

[78] Mengnan Du, Ninghao Liu, and Xia Hu. Techniques for interpretable machine learning. Communications of the ACM, 63(1):68–77, 2019.

[79] James Durbin and Siem Jan Koopman. Time series analysis by state space methods. Oxford university press, 2012.

[80] Erol Egrioglu, Cagdas Hakan Aladag, and Ufuk Yolcu. Fuzzy time series forecasting with a novel hybrid approach combining fuzzy c-means and neural networks. Expert Systems with Applications, 40(3):854–857, 2013.

[81] Andrew Emmott, Shubhomoy Das, Thomas Dietterich, Alan Fern, and Weng-Keen Wong. A meta-analysis of the anomaly detection problem. arXiv preprint arXiv:1503.01158, 2015.

[82] Andrew F Emmott, Shubhomoy Das, Thomas Dietterich, Alan Fern, and Weng-Keen Wong. Systematic construction of anomaly detection benchmarks from real data. In Proceedings of the ACM SIGKDD workshop on outlier detection and description, pages 16–21. ACM, 2013.

[83] David Endler. Intrusion detection. applying machine learning to solaris audit data. In Proceedings 14th Annual Computer Security Applications Conference (Cat. No. 98EX217), pages 268–279. IEEE, 1998.

[84] Sarah M Erfani, Sutharshan Rajasegarar, Shanika Karunasekera, and Christopher Leckie. High-dimensional and large-scale anomaly detection using a linear one-class svm with deep learning. Pattern Recognition, 58:121–134, 2016.

[85] Dumitru Erhan, Pierre-Antoine Manzagol, Yoshua Bengio, Samy Bengio, and Pascal Vincent. The difficulty of training deep architectures and the effect of unsupervised pre-training. In Artificial Intelligence and Statistics, pages 153–160, 2009.

[86] Eleazar Eskin, Wenke Lee, and Salvatore J Stolfo. Modeling system calls for intrusion detection with dynamic window sizes. In DARPA Information Survivability Conference & Exposition II, 2001. DISCEX’01. Proceedings, volume 1, pages 165–175. IEEE, 2001.

[87] SM Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Geoffrey E Hinton, et al. Attend, infer, repeat: Fast scene understanding with generative models. In Advances in Neural Information Processing Systems, pages 3225–3233, 2016.

[88] Paul F Evangelista, Piero Bonnisone, Mark J Embrechts, and Boleslaw K Szymanski. Fuzzy roc curves for the 1 class svm: Application to intrusion detection. In Proceedings International Joint Conference on Neural Networks, IJCNN. Citeseer, 2005.

[89] Durdu Ömer Faruk. A hybrid neural network and arima model for water quality time series prediction. Engineering Applications of Artificial Intelligence, 23(4):586–594, 2010.

[90] Wenhui Feng and Chongzhao Han. A novel approach for trajectory feature representation and anomalous trajectory detection. In Information Fusion (Fusion), 2015 18th International Conference on, pages 1093–1099. IEEE, 2015.

[91] Erik M Ferragut, Jason Laska, and Robert A Bridges. A new, principled approach to anomaly detection. In Machine Learning and Applications (ICMLA), 2012 11th International Conference on, volume 2, pages 210–215. IEEE, 2012.

[92] Fisheries and Oceans Canada. The state of world fisheries and aquaculture 2018. food and agriculture organization of the united nations.

[93] Fisheries and Oceans Canada. The state of world fisheries and aquaculture. contributing to food security and nutrition for all. http://www.fao.org/3/a-i5555e.pdf. Accessed: August 2019.

[94] Fisheries and Oceans Canada. Fisheries and oceans canada. illegal, unreported and unregulated. http://www.dfo-mpo.gc.ca/international/isu-iuu-eng.htm, 2019. Accessed: August 2019.

[95] Stephanie Forrest, Steven A Hofmeyr, Anil Somayaji, and Thomas A Longstaff. A sense of self for unix processes. In Proc. IEEE Symp. Security Privacy, pages 120–128, 1996.

[96] Edward W Frees. Regression modeling with actuarial and financial applications. Cambridge University Press, 2009.

[97] Ben D Fulcher, Max A Little, and Nick S Jones. Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83):20130048, 2013.

[98] John Cristian Borges Gamboa. Deep learning for time-series analysis. arXiv preprint arXiv:1701.01887, 2017.

[99] Bo Gao, Hui-Ye Ma, and Yu-Hang Yang. Hmms (hidden markov models) based on anomaly intrusion detection method. In Proceedings. International Conference on Machine Learning and Cybernetics, volume 1, pages 381–385. IEEE, 2002.

[100] Pedro Garcia-Teodoro, Jesus Diaz-Verdejo, Gabriel Maciá-Fernández, and Enrique Vázquez. Anomaly-based network intrusion detection: Techniques, systems and challenges. computers & security, 28(1-2):18–28, 2009.

[101] Ashutosh Garg, Sreeram Balakrishnan, and Shivakumar Vaithyanathan. Asynchronous hmm with applications to speech recognition. In 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages I–1009. IEEE, 2004.

[102] Yong Ge, Hui Xiong, Zhi-hua Zhou, Hasan Ozdemir, Jannite Yu, and Kuo Chu Lee. Top-eye: Top-k evolving trajectory outlier detection. In Proceedings of the 19th ACM international conference on Information and knowledge management, pages 1733–1736. ACM, 2010.

[103] Timon Gehr, Matthew Mirman, Dana Drachsler-Cohen, Petar Tsankov, Swarat Chaudhuri, and Martin Vechev. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2018.

[104] André Gensler, Janosch Henze, Bernhard Sick, and Nils Raabe. Deep learning for solar power forecasting—an approach using autoencoder and lstm neural networks. In IEEE International Conference on Systems, Man, and Cybernetics (SMC), pages 002858–002865, 2016.

[105] Felix A Gers, Jürgen A Schmidhuber, and Fred A. Cummins. Learning to forget: Continual prediction with lstm. Neural Computation, 12(10):2451–2471, 2000.

[106] Anup K Ghosh and Aaron Schwartzbard. A study in using neural networks for anomaly and misuse detection. In USENIX security symposium, volume 99, page 12, 1999.

[107] Anup K Ghosh, Aaron Schwartzbard, and Michael Schatz. Learning program behavior profiles for intrusion detection. In Workshop on Intrusion Detection and Network Monitoring, volume 51462, pages 1–13, 1999.

[108] Eric Ghysels, Clive WJ Granger, and Pierre L Siklos. Is seasonal adjustment a linear or nonlinear data-filtering process? Journal of Business & Economic Statistics, 14(3):374–386, 1996.

[109] Jonathan Goh, Sridhar Adepu, Khurum Nazir Junejo, and Aditya Mathur. A dataset to support research in the design of secure water treatment systems. In International Conference on Critical Information Infrastructures Security, pages 88–99. Springer, 2016.

[110] Markus Goldstein and Seiichi Uchida. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PloS one, 11(4):e0152173, 2016.

[111] Fabio A González and Dipankar Dasgupta. Anomaly detection using real-valued negative selection. Genetic Programming and Evolvable Machines, 4(4):383–403, 2003.

[112] Dana A Goward. Maritime domain awareness: The key to maritime security. Legal challenges in maritime security. Leiden: Martinus Nijhoff, 2008.

[113] B Graham. The intelligent investor: The classic bestseller on value investing, 1986.

[114] Edward R Griffor, Christopher Greer, David A Wollman, and Martin J Burns. Framework for cyber-physical systems: Volume 1, overview. 2017.

[115] Christopher Grimsley. Causal and non-causal explanations of artificial intelligence. 2020.

[116] Philip Gross, Janak Parekh, and Gail Kaiser. Secure selecticast for collaborative intrusion detection systems. In Proceedings of the 3rd International Workshop on Distributed Event-Based Systems (DEBS04). IET, 2004.

[117] Water Sector Coordinating Council Cyber Security Working Group et al. Water security roadmap to secure control systems in the water sector, 2008.

[118] Aditya Grover, Ashish Kapoor, and Eric Horvitz. A deep hybrid model for weather forecasting. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 379–386. ACM, 2015.

[119] Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM computing surveys (CSUR), 51(5):93, 2019.

[120] David Gunning. Explainable artificial intelligence (xai). Defense Advanced Research Projects Agency (DARPA), nd Web, 2, 2017.

[121] Wenbo Guo, Dongliang Mu, Jun Xu, Purui Su, Gang Wang, and Xinyu Xing. Lemna: Explaining deep learning based security applications. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 364–379. ACM, 2018.

[122] Manish Gupta, Jing Gao, Charu C Aggarwal, and Jiawei Han. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering, 26(9):2250–2267, 2014.

[123] Manish Gupta, Jing Gao, Yizhou Sun, and Jiawei Han. Integrating community matching and outlier detection for mining evolutionary community outliers. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 859–867. ACM, 2012.

[124] Nikhil Gupta, Dhivya Eswaran, Neil Shah, Leman Akoglu, and Christos Faloutsos. Beyond outlier detection: Lookout for pictorial explanation. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 122–138. Springer, 2018.

[125] Robert Gwadera, Mikhail J Atallah, and Wojciech Szpankowski. Reliable detection of episodes in event sequences. Knowledge and Information Systems, 7(4):415–437, 2005.

[126] James Douglas Hamilton. Time series analysis. Princeton university press, 2020.

[127] Junwei Han, Dingwen Zhang, Gong Cheng, Nian Liu, and Dong Xu. Advanced deep-learning techniques for salient and category-specific object detection: a survey. IEEE Signal Processing Magazine, 35(1):84–100, 2018.

[128] Douglas M Hawkins. Identification of outliers, volume 11. Springer, 1980.

[129] David J Hill and Barbara S Minsker. Anomaly detection in streaming environmental sensor data: A data-driven modeling approach. Environmental Modelling & Software, 25(9):1014–1022, 2010.

[130] Raymond C Borges Hink, Justin M Beaver, Mark A Buckner, Tommy Morris, Uttam Adhikari, and Shengyi Pan. Machine learning for power system disturbance and cyber-attack discrimination. In 7th international symposium on resilient control systems (ISRCS), pages 1–8. IEEE, 2014.

[131] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimensionality of data with neural networks. science, 313(5786):504–507, 2006.

[132] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.

[133] Steven A Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion detection using sequences of system calls. Journal of computer security, 6(3):151–180, 1998.

[134] En-Yu Hsu, Chien-Liang Liu, and Vincent S Tseng. Multivariate time series early classification with interpretability using deep learning and attention mechanism. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 541–553. Springer, 2019.

[135] Jianming Hu, Jianzhou Wang, and Guowei Zeng. A hybrid forecasting approach applied to wind speed time series. Renewable Energy, 60:185–194, 2013.

[136] Svend Hylleberg. Modelling seasonality. Oxford University Press, 1992.

[137] Aapo Hyvärinen, Patrik O Hoyer, and Mika Inki. Topographic independent component analysis. Neural computation, 13(7):1527–1558, 2001.

[138] Boris Iglewicz and David Caster Hoaglin. How to detect and handle outliers, volume 16. Asq Press, 1993.

[139] Jun Inoue, Yoriyuki Yamagata, Yuqi Chen, Christopher M Poskitt, and Jun Sun. Anomaly detection for a water treatment system using unsupervised machine learning. In Data Mining Workshops (ICDMW), 2017 IEEE International Conference on, pages 1058–1065. IEEE, 2017.

[140] Hesam Izakian and Witold Pedrycz. Anomaly detection and characterization in spatial time series data: A cluster-centric approach. IEEE Transactions on Fuzzy Systems, 22(6):1612–1624, 2014.

[141] Vidya Jadhav and Prakash Devale. Anomaly detection on user browsing behaviors for prevention app_ddos. International Journal of Advances in Engineering & Technology, 1(5):492, 2011.

[142] HV Jagadish, Nick Koudas, and S Muthukrishnan. Mining deviants in a time series database. In VLDB, volume 99, pages 7–10, 1999.

[143] Vikramaditya Jakkula and Diane J Cook. Anomaly detection using temporal data mining in a smart home environment. Methods of information in medicine, 47(1):70–75, 2008.

[144] Matthew J Johnson, David K Duvenaud, Alex Wiltschko, Ryan P Adams, and Sandeep R Datta. Composing graphical models with neural networks for structured representations and fast inference. In Advances in neural information processing systems, pages 2946–2954, 2016.

[145] Dae-Ki Kang, Doug Fuller, and Vasant Honavar. Learning classifiers for misuse detection using a bag of system calls representation. In International Conference on Intelligence and Security Informatics, pages 511–516. Springer, 2005.

[146] Kazuya Kawakami. Supervised sequence labelling with recurrent neural networks. PhD thesis, 2008.

[147] Przemysław Kazienko, Edwin Lughofer, and Bogdan Trawiński. Hybrid and ensemble methods in machine learning j. ucs special issue. J Univers Comput Sci, 19(4):457–461, 2013.

[148] KDD-OpenSource. Anomaly generator on time series.

[149] Eamonn Keogh, Jessica Lin, and Ada Fu. Hot sax: Efficiently finding the most unusual time series subsequence. In null, pages 226–233. Ieee, 2005.

[150] Eamonn Keogh, Jessica Lin, Sang-Hee Lee, and Helga Van Herle. Finding the most unusual time series subsequence: algorithms and applications. Knowledge and Information Systems, 11(1):1–27, 2007.

[151] Eamonn Keogh, Stefano Lonardi, and Bill Chiu. Finding surprising patterns in a time series database in linear time and space. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 550–556. ACM, 2002.

[152] Mehdi Khashei and Mehdi Bijari. A novel hybridization of artificial neural networks and arima models for time series forecasting. Applied Soft Computing, 11(2):2664–2675, 2011.

[153] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pages 180–191. VLDB Endowment, 2004.

[154] JooSeuk Kim and Clayton D Scott. Robust kernel density estimation. Journal of Machine Learning Research, 13(Sep):2529–2565, 2012.

[155] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

[156] Serkan Kiranyaz, Turker Ince, Osama Abdeljaber, Onur Avci, and Moncef Gabbouj. 1-d convolutional neural networks for signal processing applications. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 8360–8364. IEEE, 2019.

[157] Eric D Knapp and Joel Thomas Langill. Industrial Network Security: Securing critical infrastructure networks for smart grid, SCADA, and other Industrial Control Systems. Syngress, 2014.

[158] Edwin M Knorr and Raymond T Ng. Finding intensional knowledge of distance-based outliers. In VLDB, volume 99, pages 211–222, 1999.

[159] Martin Kopp, Tomáš Pevný, and Martin Holena. Interpreting and clustering outliers with sapling random forests. In Information Technologies Applications and Theory Workshops, Posters, and Tutorials (ITAT), 2014.

[160] Timo Koskela, Mikko Lehtokangas, Jukka Saarinen, and Kimmo Kaski. Time series prediction with multilayer perceptron, fir and elman neural networks. In Proceedings of the World Congress on Neural Networks, pages 491–496. INNS Press San Diego, USA, 1996.

[161] Moshe Kravchik and Asaf Shabtai. Detecting cyber attacks in industrial control systems using convolutional neural networks. In Proceedings of the 2018 Workshop on Cyber-Physical Systems Security and PrivaCy, pages 72–83. ACM, 2018.

[162] Tejas D Kulkarni, William F Whitney, Pushmeet Kohli, and Josh Tenenbaum. Deep convolutional inverse graphics network. In Advances in neural information processing systems, pages 2539–2547, 2015.

[163] Alp Kut and Derya Birant. Spatio-temporal outlier detection in large databases. Journal of computing and information technology, 14(4):291–297, 2006.

[164] Donghwoon Kwon, Hyunjoo Kim, Jinoh Kim, Sang C Suh, Ikkyun Kim, and Kuinam J Kim. A survey of deep learning-based network anomaly detection. Cluster Computing, pages 1–13, 2017.

[165] Himabindu Lakkaraju, Ece Kamar, Rich Caruana, and Jure Leskovec. Interpretable & explorable approximations of black box models. arXiv preprint arXiv:1707.01154, 2017.

[166] Harjinder Singh Lallie, Lynsay A Shepherd, Jason RC Nurse, Arnau Erola, Gregory Epiphaniou, Carsten Maple, and Xavier Bellekens. Cyber security in the age of covid-19: a timeline and analysis of cyber-crime and cyber-attacks during the pandemic. arXiv preprint arXiv:2006.11929, 2020.

[167] Dale A Lambert. A blueprint for higher-level fusion systems. Information Fusion, 10(1):6–24, 2009.

[168] Terran Lane and Carla E Brodley. An application of machine learning to anomaly detection. In Proceedings of the 20th National Information Systems Security Conference, volume 377, pages 366–380. Baltimore, USA, 1997.

[169] Terran Lane and Carla E Brodley. Sequence matching and learning in anomaly detection for computer security. In AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, pages 43–49, 1997.

[170] Martin Längkvist. Modeling time-series with deep networks. PhD thesis, Örebro university, 2014.

[171] Martin Längkvist, Lars Karlsson, and Amy Loutfi. Sleep stage classification using unsuper- vised feature learning. Advances in Artificial Neural Systems, 2012:5, 2012.

[172] Aleksandar Lazarevic, Levent Ertoz, Vipin Kumar, Aysel Ozgur, and Jaideep Srivastava. A comparative study of anomaly detection schemes in network intrusion detection. In Proceedings of the 2003 SIAM International Conference on Data Mining, pages 25–36. SIAM, 2003.

[173] Honglak Lee, Peter Pham, Yan Largman, and Andrew Y Ng. Unsupervised feature learning for audio classification using convolutional deep belief networks. In Advances in neural information processing systems, pages 1096–1104, 2009.

[174] Jae-Gil Lee, Jiawei Han, and Xiaolei Li. Trajectory outlier detection: A partition-and-detect framework. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pages 140–149. IEEE, 2008.

[175] Wenke Lee and Salvatore Stolfo. Data mining approaches for intrusion detection. In Proc. 7th Conf. USENIX SSYM, pages 6–20, 1998.

[176] Dan Li, Dacheng Chen, Jonathan Goh, and See-kiong Ng. Anomaly detection with generative adversarial networks for multivariate time series. arXiv preprint arXiv:1809.04758, 2018.

[177] Xiaolei Li, Jiawei Han, and Sangkyum Kim. Motion-alert: automatic anomaly detection in massive moving objects. In International Conference on Intelligence and Security Informatics, pages 166–177. Springer, 2006.

[178] Xiaolei Li, Zhenhui Li, Jiawei Han, and Jae-Gil Lee. Temporal outlier detection in vehicle traffic data. In IEEE International Conference on Data Engineering, pages 1319–1322. IEEE, 2009.

[179] Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 2–11. ACM, 2003.

[180] Jessica Lin, Eamonn Keogh, Li Wei, and Stefano Lonardi. Experiencing sax: a novel symbolic representation of time series. Data Mining and knowledge discovery, 15(2):107–144, 2007.

[181] Qin Lin, Sridha Adepu, Sicco Verwer, and Aditya Mathur. Tabor: A graphical model-based approach for anomaly detection in industrial control systems. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, pages 525–536, 2018.

[182] Richard Lippmann, Joshua W Haines, David J Fried, Jonathan Korba, and Kumar Das. The 1999 darpa off-line intrusion detection evaluation. Computer networks, 34(4):579–595, 2000.

[183] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, Arnaud Arindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian, Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sánchez. A survey on deep learning in medical image analysis. Medical image analysis, 42:60–88, 2017.

[184] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413–422. IEEE, 2008.

[185] James NK Liu, Yanxing Hu, Yulin He, Pak Wai Chan, and Lucas Lai. Deep neural network modeling for big data weather forecasting. In Information Granularity, Big Data, and Computational Intelligence, pages 389–408. Springer, 2015.

[186] Ninghao Liu, Donghwa Shin, and Xia Hu. Contextual outlier interpretation. arXiv preprint arXiv:1711.10589, 2017.

[187] Chang-Tien Lu and Lily R Liang. Wavelet fuzzy classification for detecting and tracking region outliers in meteorological data. In Proceedings of the 12th annual ACM international workshop on Geographic information systems, pages 258–265. ACM, 2004.

[188] Huimin Lu, Yujie Li, Shenglin Mu, Dong Wang, Hyoungseop Kim, and Seiichi Serikawa. Motor anomaly detection for unmanned aerial vehicles using reinforcement learning. IEEE internet of things journal, 5(4):2315–2322, 2017.

[189] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems, pages 4765–4774, 2017.

[190] Helmut Lütkepohl. New introduction to multiple time series analysis. Springer Science & Business Media, 2005.

[191] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. Traffic flow prediction with big data: a deep learning approach. IEEE Transactions on Intelligent Transportation Systems, 16(2):865–873, 2015.

[192] Junshui Ma and Simon Perkins. Online novelty detection on temporal sequences. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 613–618. ACM, 2003.

[193] Junshui Ma and Simon Perkins. Time-series novelty detection using one-class support vector machines. In Proceedings of the International Joint Conference on Neural Networks, 2003., volume 3, pages 1741–1745. IEEE, 2003.

[194] Aravindh Mahendran and Andrea Vedaldi. Understanding deep image representations by inverting them. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5188–5196, 2015.

[195] Pankaj Malhotra, Anusha Ramakrishnan, Gaurangi Anand, Lovekesh Vig, Puneet Agarwal, and Gautam Shroff. Lstm-based encoder-decoder for multi-sensor anomaly detection. arXiv preprint arXiv:1607.00148, 2016.

[196] Pankaj Malhotra, Lovekesh Vig, Gautam Shroff, and Puneet Agarwal. Long short term memory networks for anomaly detection in time series. In Proceedings, page 89. Presses universitaires de Louvain, 2015.

[197] Mehrdad Mansouri, Ali Arab, Zahra Zohrevand, and Martin Ester. Heidegger: Interpretable temporal causal discovery. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1688–1696, 2020.

[198] Qian Mao, Fei Hu, and Qi Hao. Deep learning for intelligent wireless networks: A comprehensive survey. IEEE Communications Surveys & Tutorials, 20(4):2595–2621, 2018.

[199] David Martens, Bart Baesens, Tony Van Gestel, and Jan Vanthienen. Comprehensible credit scoring models using rule extraction from support vector machines. European journal of operational research, 183(3):1466–1476, 2007.

[200] M McDonald. SHF SATCOM terminal ship-motion study. Technical report, Naval Command Control and Ocean Surveillance Center, San Diego, CA, 1993.

[201] Christoph C Michael and Anup Ghosh. Two state-based approaches to program-based anomaly detection. In Proceedings 16th Annual Computer Security Applications Conference (ACSAC’00), pages 21–30. IEEE, 2000.

[202] Li Min and Yu Shun-Zheng. A network-wide traffic anomaly detection method based on hsmm. In Communications, Circuits and Systems Proceedings, 2006 International Conference on, volume 3, pages 1636–1640. IEEE, 2006.

[203] Roni Mittelman. Time-series modeling with undecimated fully convolutional neural networks. arXiv preprint arXiv:1508.00317, 2015.

[204] Mehdi Mohammadi, Ala Al-Fuqaha, Sameh Sorour, and Mohsen Guizani. Deep learning for iot big data and streaming analytics: A survey. IEEE Communications Surveys & Tutorials, 20(4):2923–2960, 2018.

[205] Douglas C Montgomery, Cheryl L Jennings, and Murat Kulahci. Introduction to time series analysis and forecasting. John Wiley & Sons, 2015.

[206] Martin Možina, Janez Demšar, Michael Kattan, and Blaž Zupan. Nomograms for visualization of naive bayesian classifier. In European Conference on Principles of Data Mining and Knowledge Discovery, pages 337–348. Springer, 2004.

[207] Shan Muthukrishnan, Rahul Shah, and Jeffrey Scott Vitter. Mining deviants in time series data streams. In Scientific and Statistical Database Management, 2004. Proceedings. 16th International Conference on, pages 41–50. IEEE, 2004.

[208] Patric Nader, Paul Honeine, and Pierre Beauseroy. Detection of cyberattacks in a water distribution system using machine learning techniques. In 2016 Sixth International Conference on Digital Information Processing and Communications (ICDIPC), pages 25–30. IEEE, 2016.

[209] Alexandre Nairac, Neil Townsend, Roy Carr, Steve King, Peter Cowley, and Lionel Tarassenko. A system for the analysis of jet engine vibration data. Integrated Computer-Aided Engineering, 6(1):53–66, 1999.

[210] John Nicholas Newman. The theory of ship motions. In Advances in applied mechanics, volume 18, pages 221–283. Elsevier, 1979.

[211] Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. Multimodal deep learning. In Proceedings of the 28th international conference on machine learning (ICML-11), pages 689–696, 2011.

[212] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 427–436, 2015.

[213] Joseph L Nimmich and Dana A Goward. Maritime domain awareness: the key to maritime security. International Law Studies, 83(1):6, 2007.

[214] U.S. National Oceanic and Atmospheric Administration (NOAA). Presidential task force on combating illegal, unreported, and unregulated fishing and seafood fraud. https://www.iuufishing.noaa.gov, 2018. Accessed: August 2019.

[215] Min-hwan Oh and Garud Iyengar. Sequential anomaly detection using inverse reinforcement learning. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1480–1490, 2019.

[216] Tomas Olsson, Daniel Gillblad, Peter Funk, and Ning Xiong. Case-based reasoning for explaining probabilistic machine learning. International Journal of Computer Science and Information Technology, 6(2):87–101, 2014.

[217] Cyril Onwubiko. Cyber security operations centre: Security monitoring for protecting business and supporting cyber defense strategy. In 2015 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (CyberSA), pages 1–10. IEEE, 2015.

[218] Shengyi Pan, Thomas Morris, and Uttam Adhikari. Developing a hybrid intrusion detection system using data mining for power systems. IEEE Transactions on Smart Grid, 6(6):3104–3113, 2015.

[219] Shengyi Pan, Thomas H Morris, and Uttam Adhikari. A specification-based intrusion detection framework for cyber-physical environment in electric power system. IJ Network Security, 17(2):174–188, 2015.

[220] Guansong Pang, Anton van den Hengel, Chunhua Shen, and Longbing Cao. Deep reinforcement learning for unknown anomaly detection. arXiv preprint arXiv:2009.06847, 2020.

[221] Rafał Pokrywka. Reducing false alarm rate in anomaly detection with layered filtering. In International Conference on Computational Science, pages 396–404. Springer, 2008.

[222] Dimitris N Politis and Halbert White. Automatic block-length selection for the dependent bootstrap. Econometric reviews, 23(1):53–70, 2004.

[223] David Stephen Geoffrey Pollock, Richard C Green, and Truong Nguyen. Handbook of time series analysis, signal processing, and dynamics. Elsevier, 1999.

[224] Leonid Portnoy. Intrusion detection with unlabeled data using clustering. PhD thesis, Columbia University, 2000.

[225] Yan Qiao, XW Xin, Yang Bin, and S Ge. Anomaly intrusion detection method based on hmm. Electronics letters, 38(13):1, 2002.

[226] Lawrence R Rabiner and Biing-Hwang Juang. An introduction to hidden markov models. ieee assp magazine, 3(1):4–16, 1986.

[227] Sreeraj Rajendran, Wannes Meert, Vincent Lenders, and Sofie Pollin. Saife: Unsupervised wireless spectrum anomaly detection with interpretable features. In 2018 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), pages 1–9. IEEE, 2018.

[228] Syama Sundar Rangapuram, Matthias W Seeger, Jan Gasthaus, Lorenzo Stella, Yuyang Wang, and Tim Januschowski. Deep state space models for time series forecasting. In Advances in neural information processing systems, pages 7785–7794, 2018.

[229] Gabriëlle Ras, Marcel van Gerven, and Pim Haselager. Explanation methods in deep learning: Users, values, concerns and challenges. In Explainable and Interpretable Models in Computer Vision and Machine Learning, pages 19–36. Springer, 2018.

[230] Teemu Räsänen and Mikko Kolehmainen. Feature-based clustering for electricity use time series data. In International conference on adaptive and natural computing algorithms, pages 401–412. Springer, 2009.

[231] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L Littman. Activity recognition from accelerometer data. In AAAI, volume 5, pages 1541–1546, 2005.

[232] Umaa Rebbapragada, Pavlos Protopapas, Carla E Brodley, and Charles Alcock. Finding anomalous periodic time series. Machine learning, 74(3):281–313, 2009.

[233] Hansheng Ren, Bixiong Xu, Yujing Wang, Chao Yi, Congrui Huang, Xiaoyu Kou, Tony Xing, Mao Yang, Jie Tong, and Qi Zhang. Time-series anomaly detection service at microsoft. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 3009–3017, 2019.

[234] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Model-agnostic interpretability of machine learning. arXiv preprint arXiv:1606.05386, 2016.

[235] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144. ACM, 2016.

[236] Collin Rice. Idealized models, holistic distortions, and universality. Synthese, 195(6):2795–2819, 2018.

[237] Fred Roberts and Barry Tesman. Applied combinatorics. CRC Press, 2009.

[238] Pablo Romeu, Francisco Zamora-Martínez, Paloma Botella-Rocamora, and Juan Pardo. Time-series forecasting of indoor temperature using pre-trained deep neural networks. In International Conference on Artificial Neural Networks, pages 451–458. Springer, 2013.

[239] Jean Roy and Steve Wark. Concepts, models, and tools for information fusion. Artech House, 2007.

[240] Javier Ruiz-del Solar, Patricio Loncomilla, and Naiomi Soto. A survey on deep learning methods for robot vision. arXiv preprint arXiv:1803.10862, 2018.

[241] David E Rumelhart, Geoffrey E Hinton, Ronald J Williams, et al. Learning representations by back-propagating errors. Cognitive modeling, 5(3), 1988.

[242] Ida Ruts and Peter J Rousseeuw. Computing depth contours of bivariate point clouds. Computational Statistics & Data Analysis, 23(1):153–168, 1996.

[243] Mayu Sakurada and Takehisa Yairi. Anomaly detection using autoencoders with nonlinear dimensionality reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, page 4. ACM, 2014.

[244] Christophe Salperwyck, Marc Boullé, and Vincent Lemaire. Concept drift detection using supervised bivariate grids. In 2015 International Joint Conference on Neural Networks (IJCNN), pages 1–9. IEEE, 2015.

[245] Stan Salvador and Philip Chan. Learning states and rules for detecting anomalies in time series. Applied Intelligence, 23(3):241–255, 2005.

[246] Nicholas I Sapankevych and Ravi Sankar. Time series prediction using support vector machines: a survey. IEEE Computational Intelligence Magazine, 4(2), 2009.

[247] O Sami Saydjari. Engineering trustworthy systems: a principled approach to cybersecurity. Communications of the ACM, 62(6):63–69, 2019.

[248] Karen Scarfone and Peter Mell. Guide to intrusion detection and prevention systems (idps). Technical report, National Institute of Standards and Technology, 2012.

[249] Udo Schlegel, Hiba Arnout, Mennatallah El-Assady, Daniela Oelke, and Daniel A Keim. Towards a rigorous evaluation of xai methods on time series. arXiv preprint arXiv:1909.07082, 2019.

[250] John Schulman, Nicolas Heess, Theophane Weber, and Pieter Abbeel. Gradient estimation using stochastic computation graphs. In Advances in Neural Information Processing Systems, pages 3528–3536, 2015.

[251] Pavel Senin. Dynamic time warping algorithm review. Information and Computer Science Department, University of Hawaii at Manoa, Honolulu, USA, 855(1-23):40, 2008.

[252] Kamran Shafi and Hussein A Abbass. An adaptive genetic-based signature learning system for intrusion detection. Expert Systems with Applications, 36(10):12036–12043, 2009.

[253] Amir Yaghoubi Shahir, Mohammad A Tayebi, Uwe Glässer, Tilemachos Charalampous, Zahra Zohrevand, and Hans Wehn. Mining vessel trajectories for illegal fishing detection. In 2019 IEEE International Conference on Big Data (Big Data), pages 1917–1927. IEEE, 2019.

[254] Thomas B Sheridan. Telerobotics, automation, and human supervisory control. MIT press, 1992.

[255] Dominique T Shipmon, Jason M Gurevitch, Paolo M Piselli, and Stephen T Edwards. Time series anomaly detection; detection of anomalous drops with limited features and sparse examples in noisy highly periodic data. arXiv preprint arXiv:1708.03665, 2017.

[256] G Silvestri, F Bini Verona, Mario Innocenti, and M Napolitano. Fault detection using neural networks. In Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), volume 6, pages 3796–3799. IEEE, 1994.

[257] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.

[258] Du Sizhen, Song Guojie, Han Lei, and Hong Haikun. Temporal causal inference with time lag. Neural Computing, 30:271–291, 2018.

[259] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. In Advances in neural information processing systems, pages 3483–3491, 2015.

[260] Robin Sommer and Vern Paxson. Outside the closed world: On using machine learning for network intrusion detection. In 2010 IEEE symposium on security and privacy, pages 305–316. IEEE, 2010.

[261] Frode Sørmo, Jörg Cassens, and Agnar Aamodt. Explanation in case-based reasoning–perspectives and goals. Artificial Intelligence Review, 24(2):109–143, 2005.

[262] Peter Spirtes, Clark N Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, and Thomas Richardson. Causation, prediction, and search. MIT press, 2000.

[263] Peter Spirtes and Kun Zhang. Causal discovery and inference: concepts and recent methodological advances. In Applied informatics, volume 3, page 3. SpringerOpen, 2016.

[264] SS Sreevidya. A survey on outlier detection methods. International Journal of Computer Science and Information Technologies, 5(6):8153–8156, 2014.

[265] Ya Su, Youjian Zhao, Chenhao Niu, Rong Liu, Wei Sun, and Dan Pei. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2828–2837, 2019.

[266] Pei Sun, Sanjay Chawla, and Bavani Arunasalam. Mining for outliers in sequential databases. In Proceedings of the 2006 SIAM International Conference on Data Mining, pages 94–105. SIAM, 2006.

[267] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.

[268] Boleslaw K Szymanski and Yongqiang Zhang. Recursive data mining for masquerade detection and author identification. In Proceedings from the Fifth Annual IEEE SMC Information Assurance Workshop, 2004., pages 424–431. IEEE, 2004.

[269] Xiaobin Tan and Hongsheng Xi. Hidden semi-Markov model for anomaly detection. Applied Mathematics and Computation, 205(2):562–567, 2008.

[270] André Teixeira, Daniel Pérez, Henrik Sandberg, and Karl Henrik Johansson. Attack models and scenarios for networked control systems. In Proceedings of the 1st International Conference on High Confidence Networked Systems, pages 55–64. ACM, 2012.

[271] Pooja Thakkar, Jay Vala, and Vishal Prajapati. Survey on outlier detection in data stream. International Journal of Computer Applications, 136:13–16, 2016.

[272] Erico Tjoa and Cuntai Guan. A survey on explainable artificial intelligence (XAI): towards medical XAI. arXiv preprint arXiv:1907.07374, 2019.

[273] Geoffrey G Towell and Jude W Shavlik. Extracting refined rules from knowledge-based neural networks. Machine learning, 13(1):71–101, 1993.

[274] Aäron Van Den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew W Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. In SSW, page 125, 2016.

[275] Thomas J Veasey and Stephen J Dodson. Anomaly detection in application performance monitoring data. In Proceedings of International Conference on Machine Learning and Computing (ICMLC), pages 120–126, 2014.

[276] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, and Pierre-Antoine Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research, 11(Dec):3371–3408, 2010.

[277] Xiaozhe Wang, Kate Smith, and Rob Hyndman. Characteristic-based clustering for time series data. Data Mining and Knowledge Discovery, 13(3):335–364, 2006.

[278] Yining Wang, Liwei Wang, Yuanzhi Li, Di He, Wei Chen, and Tie-Yan Liu. A theoretical analysis of NDCG ranking measures. In Proceedings of the 26th Annual Conference on Learning Theory (COLT 2013), volume 8, page 6, 2013.

[279] Yulong Wang, Hang Su, Bo Zhang, and Xiaolin Hu. Learning reliable visual saliency for model explanations. IEEE Transactions on Multimedia, 2019.

[280] Zhiguang Wang and Tim Oates. Encoding time series as images for visual inspection and classification using tiled convolutional neural networks. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, volume 1, 2015.

[281] Christina Warrender, Stephanie Forrest, and Barak Pearlmutter. Detecting intrusions using system calls: Alternative data models. In Proceedings of the 1999 IEEE Symposium on Security and Privacy (Cat. No. 99CB36344), pages 133–145. IEEE, 1999.

[282] Li Wei, Nitin Kumar, Venkata Nishanth Lolla, Eamonn J Keogh, Stefano Lonardi, and Chotirat (Ann) Ratanamahatana. Assumption-free anomaly detection in time series. In SSDBM, volume 5, pages 237–242, 2005.

[283] Andrew W Williams, Soila M Pertet, and Priya Narasimhan. Tiresias: Black-box failure prediction in distributed systems. In 2007 IEEE International Parallel and Distributed Processing Symposium (IPDPS 2007), pages 1–8. IEEE, 2007.

[284] Ian H Witten and Eibe Frank. Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Record, 31(1):76–77, 2002.

[285] Elizabeth Wu, Wei Liu, and Sanjay Chawla. Spatio-temporal outlier detection in precipitation data. In Knowledge discovery from sensor data, pages 115–133. Springer, 2010.

[286] Xin Xu. Sequential anomaly detection based on temporal-difference learning: Principles, models and case studies. Applied Soft Computing, 10(3):859–867, 2010.

[287] Qiang Yang and Xindong Wu. 10 challenging problems in data mining research. International Journal of Information Technology & Decision Making, 5(04):597–604, 2006.

[288] Nong Ye. A Markov chain model of temporal behavior for anomaly detection. In Proceedings of the 2000 IEEE Systems, Man, and Cybernetics Information Assurance and Security Workshop, volume 166, page 169. West Point, NY, 2000.

[289] Yimin Ye and Nicolas L Gutierrez. Ending fishery overexploitation by expanding from local successes to globalized solutions. Nature Ecology & Evolution, 1(7):0179, 2017.

[290] Chuanlong Yin, Yuefei Zhu, Jinlong Fei, and Xinzheng He. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access, 5:21954–21961, 2017.

[291] Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin Li. Adversarial examples: Attacks and defenses for deep learning. IEEE Transactions on Neural Networks and Learning Systems, 2019.

[292] Wojciech Zamojski, Jacek Mazurkiewicz, Jarosław Sugier, Tomasz Walkowiak, and Janusz Kacprzyk. Theory and engineering of complex systems and dependability. In Proceedings of the Tenth International Conference on Dependability and Complex Systems DepCoS-RELCOMEX. Springer, 2015.

[293] Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.

[294] Xiaoqiang Zhang, Pingzhi Fan, and Zhongliang Zhu. A new anomaly detection method based on hierarchical HMM. In Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2003), pages 249–252. IEEE, 2003.

[295] Qingyu Zhao, Ehsan Adeli, Nicolas Honnorat, Tuo Leng, and Kilian M Pohl. Variational autoencoder for regression: Application to brain aging analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 823–831. Springer, 2019.

[296] Yi Zheng, Qi Liu, Enhong Chen, Yong Ge, and J Leon Zhao. Time series classification using multi-channels deep convolutional neural networks. In International Conference on Web-Age Information Management, pages 298–310. Springer, 2014.

[297] Arthur Zimek, Erich Schubert, and Hans-Peter Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363–387, 2012.

[298] Zahra Zohrevand and Uwe Glässer. Dynamic attack scoring using distributed local detectors. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2020.

[299] Zahra Zohrevand and Uwe Glässer. Should I Raise The Red Flag? A comprehensive survey of anomaly scoring methods toward mitigating false alarms. arXiv preprint arXiv:1904.06646, 2019.

[300] Zahra Zohrevand, Uwe Glässer, Hamed Yaghoubi Shahir, Mohammad A Tayebi, and Robert Costanzo. Hidden Markov based anomaly detection for water supply systems. In 2016 IEEE International Conference on Big Data (Big Data), pages 1551–1560. IEEE, 2016.

[301] Zahra Zohrevand, Uwe Glässer, Mohammad A Tayebi, Hamed Yaghoubi Shahir, Mehdi Shirmaleki, and Amir Yaghoubi Shahir. Deep learning based forecasting of critical infrastructure data. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 1129–1138. ACM, 2017.

[302] Yijun Zuo and Robert Serfling. General notions of statistical depth function. Annals of Statistics, pages 461–482, 2000.
