Unsupervised Anomaly Detection in Time Series with Recurrent Neural Networks
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2019

Unsupervised anomaly detection in time series with recurrent neural networks

JOSEF HADDAD, CARL PIEHL

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

Bachelor in Computer Science
Date: June 7, 2019
Supervisor: Pawel Herman
Examiner: Örjan Ekeberg
School of Electrical Engineering and Computer Science
Swedish title: Oövervakad avvikelsedetektion i tidsserier med neurala nätverk

Abstract

Artificial neural networks (ANN) have been successfully applied to a wide range of problems. However, most ANN-based models do not attempt to model the brain in detail, though some do. One example of a biologically constrained ANN is Hierarchical Temporal Memory (HTM). This study applies HTM and Long Short-Term Memory (LSTM) to anomaly detection problems in time series in order to compare their performance on this task. The anomalies are restricted to point anomalies and the time series are univariate. Pre-existing implementations that utilise these networks for unsupervised anomaly detection in time series are used in this study. We primarily use our own synthetic data sets in order to assess the networks' robustness to noise and how they compare with respect to different characteristics of the time series. Our results show that both networks can handle noisy time series, and the difference in noise robustness is not significant for the time series used in the study. LSTM outperforms HTM in detecting point anomalies on our synthetic time series with a sine curve trend, but no conclusion can be drawn about which of the two networks performs best overall.

Sammanfattning

Artificial neural networks (ANN) have been applied to many problems.
However, most ANN models do not attempt to mimic the brain in detail. One example of an ANN constrained to mimic the brain is Hierarchical Temporal Memory (HTM). This study applies HTM and Long Short-Term Memory (LSTM) to anomaly detection problems in time series in order to investigate their strengths and weaknesses on this problem. The anomalies in this study are restricted to point anomalies and the time series are univariate. Pre-existing implementations that use these networks for unsupervised anomaly detection in time series are used in this study. We primarily use our own synthetic time series to investigate how the networks handle noise and how they handle different characteristics that a time series can have. Our results show that both networks can handle noise, and the difference in performance regarding noise robustness was not large enough to distinguish the models. LSTM performed better than HTM at detecting point anomalies in our synthetic time series that follow a sine curve, but no conclusion can be drawn about which network performs best overall.

Contents

1 Introduction
  1.1 Aims and Research Question
  1.2 Scope
  1.3 Outline
2 Background
  2.1 Time Series and Anomalies
  2.2 HTM
    2.2.1 HTM Neuron
    2.2.2 HTM Network
    2.2.3 HTM activation and learning
  2.3 ANNs
    2.3.1 Training
    2.3.2 RNNs
    2.3.3 LSTM
  2.4 Related Work
3 Method
  3.1 HTM Configuration
    3.1.1 Network structure
    3.1.2 Training
    3.1.3 Anomaly labelling
  3.2 LSTM configuration
    3.2.1 Network structure
    3.2.2 Training
    3.2.3 Anomaly labelling
  3.3 Differences between the used models
  3.4 Evaluation and Performance metrics
  3.5 Data sets used
    3.5.1 Data sets for testing noise robustness
    3.5.2 Data sets for testing time series characteristics
    3.5.3 Real-world time series used
4 Results
  4.1 Noise robustness
    4.1.1 Statistical hypothesis testing
  4.2 Trend/characteristics results
    4.2.1 Statistical hypothesis testing
  4.3 Results from real world data sets
    4.3.1 Occupancy t4013
    4.3.2 Ec2_request_latency_system_failure
5 Discussion
  5.1 Noise robustness
  5.2 Time series and anomaly characteristics
  5.3 Real world time series performance
  5.4 General comparison
  5.5 Limitations
  5.6 Biological approach
  5.7 Future Work
6 Conclusions
Bibliography
A Time series graphs
  A.1 Noise robustness graphs
  A.2 Synthetic time series graphs

Acronyms

ANN Artificial Neural Network.
BPTT Back-Propagation Through Time.
HTM Hierarchical Temporal Memory.
LSTM Long Short-Term Memory.
NAB Numenta Anomaly Benchmark.
RNN Recurrent Neural Network.
SDR Sparse Distributed Representation.

Chapter 1

Introduction

Advances in neuroscience have allowed for a greatly increased understanding of the structure and function of different parts of the brain. Simultaneously, great advancements have been made in the field of machine learning. Artificial Neural Networks (ANN), in particular, have been of great interest in the research community and have been successfully applied to a wide range of problems, from medical diagnosis [1] to playing games [2]. However, most ANN-based models do not attempt to model the brain in any detail [3]. These advancements are mostly driven by mathematically derived models devised to perform specific tasks, which do not utilise our increased understanding of the brain. They also often require extensive training in order to perform these tasks and cannot easily be generalised to other tasks. An alternative approach is to use biologically inspired models that try to mimic the way the human brain processes information. There are a few examples of brain-inspired methods; one of the more detailed is Hierarchical Temporal Memory (HTM).
HTM is an evolving attempt to model the structure and function of the neocortex, first introduced by Hawkins [4]. The model is based on Mountcastle's proposal that all regions of the neocortex, which makes up roughly 80% of the brain and is responsible for higher-order functions such as cognition and language, follow a similar neuroanatomical design [5]. It is hypothesised that the differences in functionality between regions arise mainly from different inputs. In theory, this implies that a faithful model of the neocortex could be trained to perform the many functions that are considered the backbone of intelligent behaviour. Hawkins claims that the neocortex achieves this by memorising patterns and constantly making predictions based on those memories. In this way, the neocortex can learn spatial and temporal patterns in its environment. When input is received, predictions are made based on learned patterns. Due to this ability to make inferences based on temporal patterns, it is of interest to test the performance of HTM on time series anomaly detection tasks.

Time series data, data captured over a period of time, occur naturally in many real-world scenarios, and the analysis of time series data has been of interest in fields such as engineering, economics and medicine [6]. Patterns in time series data that deviate from expected or normal behaviour are considered anomalies [6]. This means that a tool which can accurately predict future values in a time series can also be used as an anomaly detector. Time series anomaly detection can be useful in many areas, such as sleep monitoring [6], jet engine operation [7] and intrusion detection for computer networks [8]. Anomalies can be difficult to detect because it can be hard to determine whether a pattern in a time series should be considered normal or not, since an anomaly in one process can be considered normal behaviour in another [6].
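To make the predict-then-compare idea concrete, a minimal sketch (not the implementation evaluated in this thesis) is given below: a trailing moving average stands in for the learned one-step predictor, and a value whose prediction error is far outside the typical error scale is flagged as a point anomaly. The function name, window size and threshold factor are illustrative assumptions.

```python
from statistics import mean, stdev

def detect_point_anomalies(series, window=10, k=3.0):
    """Flag indices whose value deviates strongly from a one-step prediction.

    A trailing moving average stands in for the predictor here; in this
    thesis the predictor would instead be an LSTM or HTM network.
    """
    residuals = []
    for t in range(window, len(series)):
        prediction = mean(series[t - window:t])  # naive one-step forecast
        residuals.append((t, series[t] - prediction))
    spread = stdev([r for _, r in residuals])  # typical prediction-error scale
    return [t for t, r in residuals if abs(r) > k * spread]

# A flat series with one injected spike: only the spike is flagged.
series = [1.0] * 50
series[30] = 8.0
print(detect_point_anomalies(series))  # [30]
```

The same scheme works with any predictor: only the line computing `prediction` changes when the moving average is replaced by a trained network.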
Anomalous behaviour can be identified with simple threshold heuristics [9], but those often require knowledge of the data sets and implementation by a human with deep domain knowledge. The major challenge lies in capturing dependencies among multiple variables as well as identifying long-term and short-term repeating patterns. This is where traditional approaches, such as autoregressive methods, can fall short [10]. An alternative approach, which has seen increasing popularity as of late, is to use ANNs as anomaly detectors. Different types of ANNs have been applied to a wide range of time series problems, such as predicting flour prices and modelling the amount of littering in the North Sea [6].

1.1 Aims and Research Question

The aim of this thesis is to evaluate an HTM network on the task of detecting point anomalies in time series data. The strengths and weaknesses of the network are compared to a state-of-the-art ANN-based approach. The selected ANN is the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). RNNs are well suited to time series data since they can represent information from an arbitrarily long context window [11]. However, they have traditionally been difficult to train and perform worse on very long-term temporal dependencies [12]. Adding LSTM units to these networks has been shown to remedy some of these issues, allowing the networks to achieve state-of-the-art performance in time series anomaly detection tasks [6][13].

The research question for this thesis is: How does HTM compare to LSTM in time series point anomaly detection tasks?

1.2 Scope

HTM and LSTM are compared when performing unsupervised anomaly detection in single-variable streaming data. In particular, this study focuses on robustness to noise and the ability to recognise anomalies in time series with different characteristics.
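A minimal sketch of the kind of univariate synthetic data this scope implies is shown below: a sine trend with Gaussian noise and a few injected point anomalies. All parameter values are illustrative assumptions; the data sets actually used in this study are described in Chapter 3.

```python
import math
import random

def make_sine_series(length=400, period=50, noise_std=0.05,
                     n_anomalies=4, spike=3.0, seed=0):
    """Generate a univariate sine-trend series with Gaussian noise and a few
    injected point anomalies, together with the ground-truth anomaly indices.
    """
    rng = random.Random(seed)
    series = [math.sin(2 * math.pi * t / period) + rng.gauss(0, noise_std)
              for t in range(length)]
    anomaly_idx = sorted(rng.sample(range(length), n_anomalies))
    for i in anomaly_idx:
        series[i] += spike  # point anomaly: a single strongly deviating value
    return series, anomaly_idx

series, anomaly_idx = make_sine_series()
```

Because the ground-truth indices are returned alongside the series, a detector's output can be scored directly against them, which is what makes synthetic data convenient for comparing noise robustness across models.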