A Deep Neural Network for Unsupervised Anomaly Detection

The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data Chuxu Zhang,§∗ Dongjin Song,†∗ Yuncong Chen,† Xinyang Feng,‡∗ Cristian Lumezanu,† Wei Cheng,† Jingchao Ni,† Bo Zong,† Haifeng Chen,† Nitesh V. Chawla§ §University of Notre Dame, IN 46556, USA †NEC Laboratories America, Inc., NJ 08540, USA ‡Columbia University, NY 10027, USA §{czhang11,nchawla}@nd.edu, †{dsong,yuncong,lume,weicheng,jni,bzong,haifeng}@nec-labs.com, ‡[email protected] Abstract Nowadays, multivariate time series data are increasingly col- lected in various real world systems, e.g., power plants, wear- able devices, etc. Anomaly detection and diagnosis in multivariate time series refer to identifying abnormal status in certain time steps and pinpointing the root causes. Building such a system, however, is challenging since it not only requires to capture the temporal dependency in each time series, but also need encode the inter-correlations between different pairs of time series. In addition, the system should be robust to noise and provide operators with different levels of anomaly scores based upon the severity of different incidents. Despite the fact that a number of unsupervised anomaly detection algorithms Figure 1: (a) Unsupervised anomaly detection and diagnosis have been developed, few of them can jointly address these in multivariate time series data. (b) Different system signa- challenges. In this paper, we propose a Multi-Scale Con- ture matrices between normal and abnormal periods. volutional Recurrent Encoder-Decoder (MSCRED), to perform anomaly detection and diagnosis in multivariate time series data. Specifically, MSCRED first constructs multi-scale (resolution) signature matrices to characterize multiple levels A critical task in managing these systems is to detect anoma- of the system statuses in different time steps. Subsequently, lies in certain time steps such that the operators can take fur- given the signature matrices, a convolutional encoder is em- ther actions to resolve underlying issues. For instance, an ployed to encode the inter-sensor (time series) correlations and an attention based Convolutional Long-Short Term Mem- anomaly score can be produced based on the sensor data ory (ConvLSTM) network is developed to capture the tempo- and it can be used as an indicator of power plant failure ral patterns. Finally, based upon the feature maps which en- (Len, Vittal, and Manimaran 2007). An accurate detection code the inter-sensor correlations and temporal information, is crucial to avoid serious financial and business losses as a convolutional decoder is used to reconstruct the input sig- it has been reported that 1 minute downtime of an automo- nature matrices and the residual signature matrices are further tive manufacturing plant may cost up to 20, 000 US dollars utilized to detect and diagnose anomalies. Extensive empiri- (Djurdjanovic, Lee, and Ni 2003). In addition, pinpointing cal studies based on a synthetic dataset and a real power plant the root causes, i.e., identifying which sensors (system com- dataset demonstrate that MSCRED can outperform state-of- ponents) are causes to an anomaly, can help the system op- the-art baseline methods. erator perform system diagnosis and repair in a timely man- ner. In real world applications, it is common that a short Introduction term anomaly caused by temporal turbulence or system sta- Complex systems are ubiquitous in modern manufacturing tus switch may not eventually lead to a true system failure industry and information services. Monitoring the behav- due to the auto-recovery capability and robustness of mod- iors of these systems generates a substantial amount of mul- ern systems. Therefore, it would be ideal if an anomaly de- tivariate time series data, such as the readings of the net- tection algorithm can provide operators with different levels worked sensors (e.g., temperature and pressure) distributed of anomaly scores based upon the severity of various inci- in a power plant or the connected components (e.g., CPU us- dents. For simplicity, we assume that the severity of an in- age and disk I/O) in an Information Technology (IT) system. cident is proportional to the duration of an anomaly in this A A ∗ work. Figure 1(a) illustrates two anomalies, i.e., 1 and 2 This work was done when the first and fourth authors were marked by red dash circle, in multivariate time series data. summer interns at NEC Laboratories America. Dongjin Song is the The root causes are yellow and black time series, respec- corresponding author. A A Copyright c 2019, Association for the Advancement of Artificial tively. The duration (severity level) of 2 is larger than 1. Intelligence (www.aaai.org). All rights reserved. To build a system which can automatically detect and di- 1409 agnose anomalies, one main problem is that few or even and abnormal periods. Ideally, MSCRED cannot reconstruct no anomaly label is available in the historical data, which Mabnormal well as training matrices (e.g., Mnormal) are distinct makes the supervised algorithms (Gornitz¨ et al. 2013) from Mabnormal. To summarize, the main contributions of our infeasible. In the past few years, a substantial amount work are: of unsupervised anomaly detection methods have been • We formulate the anomaly detection and diagnosis prob- developed. The most prominent techniques include dis- lem as three underlying tasks, i.e., anomaly detection, tance/clustering methods (He, Xu, and Deng 2003; Hau- ¨ root cause identification, and anomaly severity (dura- tamaki, Karkka¨ ¨ınen, and Franti¨ 2004), probabilistic methods tion) interpretation. Unlike previous studies which inves- (Chandola, Banerjee, and Kumar 2009), density estimation tigate each problem independently, we address these is- methods (Manevitz and Yousef 2001), temporal prediction sues jointly. approaches (Chen et al. 2008; Gunnemann,¨ Gunnemann,¨ • and Faloutsos 2014), and the more recent deep learning We introduce the concept of system signature matrix, de- techniques (Qin et al. 2017; Zhou and Paffenroth 2017; velop MSCRED to encode the inter-sensor correlations Wu et al. 2018; Zong et al. 2018). Despite the intrinsic unsu- via a convolutional encoder, incorporate temporal pat- pervised setting, most of them may still not be able to detect terns with attention based ConvLSTM networks, and re- anomalies effectively due to the following reasons: construct signature matrix via a convolutional decoder. As far as we know, MSCRED is the first model that considers • There exists temporal dependency in multivariate time se- correlations among multivariate time series for anomaly ries data. Due to this reason, distance/clustering methods, detection and can jointly resolve all the three tasks. ¨ e.g., k-Nearest Neighbor (kNN) (Hautamaki, Karkka¨ ¨ınen, • We conduct extensive empirical studies on a synthetic and Franti¨ 2004)), classification methods, e.g., One-Class dataset as well as a power plant dataset. Our results SVM (Manevitz and Yousef 2001), and density estima- demonstrate the superior performance of MSCRED over tion methods, e.g., Deep Autoencoding Gaussian Mix- state-of-the-art baseline methods. ture Model (DAGMM) (Zong et al. 2018), may not perform well since they cannot capture temporal dependen- Related Work cies across different time steps. • Multivariate time series data usually contain noise in real Unsupervised anomaly detection on multivariate time series word applications. When the noise becomes relatively se- data is a challenging task and various types of approaches have been developed in the past few years. vere, it may affect the generalization capability of tempo- ¨ ral prediction models, e.g., Autoregressive Moving Av- One traditional type is the distance methods (Hautamaki, Karkka¨ ¨ınen, and Franti¨ 2004). For instance, the k-Nearest erage (ARMA) (Brockwell and Davis 2013) and LSTM ¨ encoder-decoder (Qin et al. 2017), and increase the false Neighbor (kNN) algorithm (Hautamaki, Karkka¨ ¨ınen, and Franti¨ 2004) computes the anomaly score of each data sam- positive detections. k • ple based on the average distance to its nearest neigh- In real world application, it is meaningful to provide oper- bors. Similarly, the clustering models (He, Xu, and Deng ators with different levels of anomaly scores based upon 2003) cluster different data samples and find anomalies via the severity of different incidents. The existing methods a predefined outlierness score. In addition, the classification for root cause analysis, e.g., Ranking Causal Anomalies methods, e.g., One-Class SVM (Manevitz and Yousef 2001), (RCA) (Cheng et al. 2016), are sensitive to noise and can- models the density distribution of training data and classi- not handle this issue. fies new data as normal or abnormal. Although these meth- In this paper, we propose a Multi-Scale Convolutional ods have demonstrated their effectiveness in various appli- Recurrent Encoder-Decoder (MSCRED) to jointly consider cations, they may not work well on multivariate time series the aforementioned issues. Specifically, MSCRED first con- since they cannot capture the temporal dependencies appro- structs multi-scale (resolution) signature matrices to charac- priately. To address this issue, temporal prediction methods, terize multiple levels of the system statuses across different e.g., Autoregressive Moving Average (ARMA) (Brockwell time steps. In particular,

A Deep Neural Network for Unsupervised Anomaly Detection

A Review on Outlier/Anomaly Detection in Time Series Data

Image Anomaly Detection Using Normal Data Only by Latent Space Resampling

Unsupervised Network Anomaly Detection Johan Mazel

Machine Learning and Extremes for Anomaly Detection — Apprentissage Automatique Et Extrêmes Pour La Détection D’Anomalies

Machine Learning for Anomaly Detection and Categorization In

CSE601 Anomaly Detection

Anomaly Detection in Data Mining: a Review Jagruti D

Anomaly Detection in Raw Audio Using Deep Autoregressive Networks

Detecting Anomalies in System Log Files Using Machine Learning Techniques

Anomaly Detection with Adversarial Dual Autoencoders

Backpropagated Gradient Representations for Anomaly Detection

Machine Learning for Automated Anomaly Detection In