
Multilevel Wavelet Decomposition Network for Interpretable Time Series Analysis

Jingyuan Wang, Ze Wang, Jianfeng Li, Junjie Wu
Beihang University, Beijing, China
{jywang,ze.w,leejianfeng,wujj}@buaa.edu.cn

arXiv:1806.08946v1 [cs.LG] 23 Jun 2018

ABSTRACT
Recent years have witnessed the unprecedented rising of time series data from almost all kinds of academic and industrial fields. Various types of deep neural network models have been introduced to time series analysis, but the important frequency information is yet lacking effective modeling. In light of this, in this paper we propose a wavelet-based neural network structure called multilevel Wavelet Decomposition Network (mWDN) for building frequency-aware deep learning models for time series analysis. mWDN preserves the advantage of multilevel discrete wavelet decomposition in frequency learning while enabling the fine-tuning of all parameters under a deep neural network framework. Based on mWDN, we further propose two deep learning models called Residual Classification Flow (RCF) and multi-frequency Long Short-Term Memory (mLSTM) for time series classification and forecasting, respectively. The two models take all or partial mWDN decomposed sub-series in different frequencies as input, and resort to the back propagation algorithm to learn all the parameters globally, which enables seamless embedding of wavelet-based frequency analysis into deep learning frameworks. Extensive experiments on 40 UCR datasets and a real-world user volume dataset demonstrate the excellent performance of our time series models based on mWDN. In particular, we propose an importance analysis method for mWDN based models, which successfully identifies those time-series elements and mWDN layers that are crucially important to time series analysis. This indeed indicates the interpretability advantage of mWDN, and can be viewed as an in-depth exploration of interpretable deep learning.

CCS CONCEPTS
• Computing methodologies → Neural networks; Supervised learning by classification; Supervised learning by regression.

KEYWORDS
Time series analysis, Multilevel wavelet decomposition network, Deep learning, Importance analysis

ACM Reference Format:
Jingyuan Wang, Ze Wang, Jianfeng Li, Junjie Wu. 2018. Multilevel Wavelet Decomposition Network for Interpretable Time Series Analysis. In KDD 2018: 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, August 19–23, 2018, London, United Kingdom. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3219819.3220060

KDD 2018, August 19–23, 2018, London, United Kingdom. © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-5552-0/18/08. https://doi.org/10.1145/3219819.3220060

1 INTRODUCTION
A time series is a series of data points indexed in time order. Methods for time series analysis could be classified into two types: time-domain methods and frequency-domain methods (see https://en.wikipedia.org/wiki/Time_series). Time-domain methods consider a time series as a sequence of ordered points and analyze correlations among them. Frequency-domain methods use transform algorithms, such as the discrete Fourier transform and the Z-transform, to transform a time series into a frequency spectrum, which could be used as features to analyze the original series.

In recent years, with the booming of deep learning, various types of deep neural network models have been introduced to time series analysis and achieved state-of-the-art performances in many real-life applications [28, 38]. Some well-known models include Recurrent Neural Networks (RNN) [40] and Long Short-Term Memory (LSTM) [14] that use memory nodes to model correlations of series points, and Convolutional Neural Networks (CNN) that use trainable kernels to model local shape patterns [42]. Most of these models fall into the category of time-domain methods and do not leverage the frequency information of a time series, although some begin to consider it in indirect ways [6, 19].

Wavelet decompositions [7] are well-known methods for capturing features of time series both in time and frequency domains. Intuitively, we can employ them as feature engineering tools for data preprocessing before deep modeling. While this loose coupling might improve the performance of raw neural network models [24], the two parts are not globally optimized and rely on independent parameter inference processes. How to integrate wavelet transforms into the framework of deep learning models remains a great challenge.

In this paper, we propose a wavelet-based neural network structure, named multilevel Wavelet Decomposition Network (mWDN), to build frequency-aware deep learning models for time series analysis. Similar to the standard Multilevel Discrete Wavelet Decomposition (MDWD) model [26], mWDN can decompose a time series into a group of sub-series with frequencies ranked from high to low, which is crucial for capturing frequency factors for deep learning. Different from MDWD with fixed parameters, however, all parameters in mWDN can be fine-tuned to fit the training data of different learning tasks. In other words, mWDN takes advantage of both wavelet based time series decomposition and the learning ability of deep neural networks.

Based on mWDN, two deep learning models, i.e., Residual Classification Flow (RCF) and multi-frequency Long Short-Term Memory (mLSTM), are designed for time series classification (TSC) and forecasting (TSF), respectively. The key issue in TSC is to extract as many representative features as possible from a time series. The RCF model therefore adopts the mWDN decomposed results in different levels as inputs, and employs a pipelined classifier stack to exploit features hidden in the sub-series through residual learning methods. For the TSF problem, the key issue turns to inferring the future states of a time series according to the hidden trends in different frequencies. Therefore, the mLSTM model feeds all mWDN decomposed sub-series in high frequencies into independent LSTM models, and ensembles all LSTM outputs for the final forecasting. Note that all parameters of RCF and mLSTM, including the ones in mWDN, are trained using the back propagation algorithm in an end-to-end manner. In this way, the wavelet-based frequency analysis is seamlessly embedded into deep learning frameworks.

We evaluate RCF on 40 UCR time series datasets for TSC, and mLSTM on a real-world user-volume time series dataset for TSF. The results demonstrate their superiority to state-of-the-art baselines and the advantages of mWDN with trainable parameters. As a nice try for interpretable deep learning, we further propose an importance analysis method for mWDN based models, which successfully identifies those time-series elements and mWDN layers that are crucially important to the success of time series analysis. This indicates the interpretability advantage of mWDN obtained by integrating wavelet decomposition for frequency factors.

2 MODEL
Throughout the paper, we use lowercase symbols such as a, b to denote scalars, bold lowercase symbols such as a, b to denote vectors, bold uppercase symbols such as A, B to denote matrices, and plain uppercase symbols such as A, B to denote constants.

Figure 1: The mWDN framework. (a) Illustration of the mWDN framework; (b) the approximative discrete wavelet decomposition functions.

2.1 Multilevel Discrete Wavelet Decomposition
Multilevel Discrete Wavelet Decomposition (MDWD) [26] is a wavelet based discrete signal analysis method, which can extract multilevel time-frequency features from a time series by decomposing the series into low and high frequency sub-series level by level.

We denote the input time series as x = {x_1, ..., x_t, ..., x_T}, and the low and high frequency sub-series generated in the i-th level as x^l(i) and x^h(i). In the (i+1)-th level, MDWD uses a low pass filter l = {l_1, ..., l_k, ..., l_K} and a high pass filter h = {h_1, ..., h_k, ..., h_K}, K ≪ T, to convolute the low frequency sub-series of the upper level as

  a^l_n(i+1) = \sum_{k=1}^{K} x^l_{n+k-1}(i) \cdot l_k ,
  a^h_n(i+1) = \sum_{k=1}^{K} x^l_{n+k-1}(i) \cdot h_k ,      (1)

where x^l_n(i) is the n-th element of the low frequency sub-series in the i-th level, and x^l(0) is set as the input series x. The low and high frequency sub-series x^l(i) and x^h(i) in the level i are generated from the 1/2 down-sampling of the intermediate variables a^l(i) = {a^l_1(i), a^l_2(i), ...} and a^h(i) = {a^h_1(i), a^h_2(i), ...}.

The sub-series set X(i) = {x^h(1), x^h(2), ..., x^h(i), x^l(i)} is called the i-th level decomposed results of x. Specifically, X(i) satisfies: 1) we can fully reconstruct x from X(i); 2) the frequency from x^h(1) to x^l(i) runs from high to low; 3) for different levels, X(i) has different time and frequency resolutions. As i increases, the frequency resolution increases and the time resolution, especially for the low frequency sub-series, decreases. Because the sub-series with different frequencies in X(i) keep the same order information as the original series x, MDWD is regarded as a time-frequency decomposition.
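To make Eq. (1) concrete, the following NumPy sketch performs one decomposition level with the Daubechies 4 filters listed later in Section 2.2. This is an illustration added here for clarity; the function names, the "valid" boundary handling, and the toy input are our own choices, not the authors' released code.

```python
import numpy as np

# Daubechies 4 low/high pass filter coefficients as listed in Section 2.2.
DB4_LOW  = np.array([-0.0106, 0.0329, 0.0308, -0.187,
                     -0.028, 0.6309, 0.7148, 0.2304])
DB4_HIGH = np.array([-0.2304, 0.7148, -0.6309, -0.028,
                      0.187, 0.0308, -0.0329, -0.0106])

def mdwd_level(x_low, low=DB4_LOW, high=DB4_HIGH):
    """One MDWD level: convolve the previous low-frequency sub-series with
    the low/high pass filters (Eq. 1) and apply 1/2 down-sampling."""
    K = len(low)
    # a_n = sum_k x_{n+k-1} * filter_k, computed as a sliding-window dot product
    a_low  = np.array([x_low[n:n + K] @ low  for n in range(len(x_low) - K + 1)])
    a_high = np.array([x_low[n:n + K] @ high for n in range(len(x_low) - K + 1)])
    # 1/2 down-sampling of the intermediate variables a^l(i), a^h(i)
    return a_low[::2], a_high[::2]

# Three-level decomposition of a toy series, mirroring X(3) in the paper.
x = np.sin(np.linspace(0, 8 * np.pi, 256)) + 0.1 * np.random.randn(256)
x_l, decomposed = x, []
for i in range(3):
    x_l, x_h = mdwd_level(x_l)
    decomposed.append(x_h)          # x^h(1), x^h(2), x^h(3)
decomposed.append(x_l)              # x^l(3)
```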

Figure 2: The RCF framework.

Figure 3: The mLSTM framework.

2.2 Multilevel Wavelet Decomposition Network
In this section, we propose the multilevel Wavelet Decomposition Network (mWDN), which approximatively implements MDWD under a deep neural network framework. The structure of mWDN is illustrated in Fig. 1. As shown in the figure, the mWDN model hierarchically decomposes a time series using the following two functions:

  a^l(i) = σ( W^l(i) x^l(i-1) + b^l(i) ),
  a^h(i) = σ( W^h(i) x^l(i-1) + b^h(i) ),      (2)

where σ(·) is a sigmoid activation function, and b^l(i) and b^h(i) are trainable bias vectors initialized as close-to-zero random values. We can see that the functions in Eq. (2) have similar forms as the convolutions in Eq. (1) for MDWD. Here x^l(i) and x^h(i) also denote the low and high frequency sub-series of x generated in the i-th level, which are down-sampled from the intermediate variables a^l(i) and a^h(i) using an average pooling layer as x^l_j(i) = ( a^l_{2j}(i) + a^l_{2j-1}(i) ) / 2.

In order to implement the convolution defined in Eq. (1), we set the initial values of the weight matrices W^l and W^h as

  W^l(i) =
  \begin{bmatrix}
  l_1 & l_2 & l_3 & \cdots & l_K & \epsilon & \cdots & \epsilon \\
  \epsilon & l_1 & l_2 & \cdots & l_{K-1} & l_K & \cdots & \epsilon \\
  \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
  \epsilon & \epsilon & \epsilon & \cdots & l_1 & \cdots & l_{K-1} & l_K \\
  \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
  \epsilon & \epsilon & \epsilon & \cdots & \cdots & \cdots & l_1 & l_2 \\
  \epsilon & \epsilon & \epsilon & \cdots & \cdots & \cdots & \epsilon & l_1
  \end{bmatrix},      (3)

  W^h(i) =
  \begin{bmatrix}
  h_1 & h_2 & h_3 & \cdots & h_K & \epsilon & \cdots & \epsilon \\
  \epsilon & h_1 & h_2 & \cdots & h_{K-1} & h_K & \cdots & \epsilon \\
  \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
  \epsilon & \epsilon & \epsilon & \cdots & h_1 & \cdots & h_{K-1} & h_K \\
  \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
  \epsilon & \epsilon & \epsilon & \cdots & \cdots & \cdots & h_1 & h_2 \\
  \epsilon & \epsilon & \epsilon & \cdots & \cdots & \cdots & \epsilon & h_1
  \end{bmatrix}.      (4)

Obviously, W^l(i) and W^h(i) ∈ R^{P×P}, where P is the size of x^l(i-1). The ε entries in the weight matrices are random values that satisfy |ε| ≪ |l| for all l ∈ l and |ε| ≪ |h| for all h ∈ h. We use the Daubechies 4 Wavelet [29] in our practice, where the filter coefficients are set as

  l = {−0.0106, 0.0329, 0.0308, −0.187, −0.028, 0.6309, 0.7148, 0.2304},
  h = {−0.2304, 0.7148, −0.6309, −0.028, 0.187, 0.0308, −0.0329, −0.0106}.

Using Eqs. (2) to (4), we thus use the deep neural network framework to implement an approximate MDWD. It is noteworthy that although the weight matrices W^l(i) and W^h(i) are initialized as the filter coefficients of MDWD, they are still trainable according to the real data distributions.

2.3 Residual Classification Flow
The task of TSC is to predict the unknown category label of a time series. A key issue of TSC is extracting distinguishing features from time series data. The decomposed results X of mWDN are natural time-frequency features that could be used in TSC. In this subsection, we propose a Residual Classification Flow (RCF) network to exploit the potential of mWDN in TSC.

The framework of RCF is illustrated in Fig. 2. As shown in the figure, RCF contains many independent classifiers. The RCF model connects the sub-series generated by the i-th mWDN level, i.e., x^h(i) and x^l(i), with a forward neural network as

  u(i) = ψ( x^h(i), x^l(i), θ^ψ ),      (5)

where ψ(·) could be a multilayer perceptron, a convolutional network, or any other type of neural network, and θ^ψ represents the trainable parameters. Moreover, RCF adopts a residual learning method [13] to join the u(i) of all classifiers as

  ĉ(i) = S( ĉ(i-1) + u(i) ),      (6)

where S(·) is a softmax classifier, and ĉ(i) is a predicted value of the one-hot encoding of the category label of the input series.

In the RCF model, the decomposed results of all mWDN levels, i.e., X(1), ..., X(N), are involved. Because the decomposed results in different mWDN levels have different time and frequency resolutions [26], the RCF model can fully exploit patterns of the input time series from different time/frequency resolutions. In other words, RCF employs a multi-view learning methodology to achieve high-performance time series classification.

Moreover, deep residual networks [13] were proposed to solve the problem that deeper network structures may result in great training difficulty. The RCF model also inherits this merit. In Eq. (6), the i-th classifier makes its decision based on u(i) and the decision made by the (i-1)-th classifier, so it can learn from u(i) the incremental knowledge that the (i-1)-th classifier does not have. Therefore, users could append residual classifiers one after another until the classification performance does not increase any more. A code sketch of one mWDN level together with the residual classification flow is given below.
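The following PyTorch sketch shows how Eqs. (2)–(6) fit together: a single mWDN level whose weights are initialized with the Daubechies 4 filters plus small noise, and a residual classification flow that feeds each level's sub-series into its own classifier. It is a minimal illustration under our own assumptions (MLP sub-classifiers, hidden sizes, the ε scale, and treating ĉ(0) as zero), not the authors' released implementation.

```python
import torch
import torch.nn as nn

DB4_LOW  = [-0.0106, 0.0329, 0.0308, -0.187, -0.028, 0.6309, 0.7148, 0.2304]
DB4_HIGH = [-0.2304, 0.7148, -0.6309, -0.028, 0.187, 0.0308, -0.0329, -0.0106]

def wavelet_init(size, coeffs, eps=1e-3):
    """Eq. (3)/(4): a P x P matrix whose row n carries the filter starting at
    column n, with all remaining entries filled by small random values."""
    W = eps * torch.randn(size, size)
    for row in range(size):
        for k, c in enumerate(coeffs):
            if row + k < size:
                W[row, row + k] = c
    return W

class MWDNLevel(nn.Module):
    """One mWDN level: Eq. (2) followed by 1/2 average pooling."""
    def __init__(self, in_len):
        super().__init__()
        self.low, self.high = nn.Linear(in_len, in_len), nn.Linear(in_len, in_len)
        with torch.no_grad():
            self.low.weight.copy_(wavelet_init(in_len, DB4_LOW))
            self.high.weight.copy_(wavelet_init(in_len, DB4_HIGH))
            self.low.bias.normal_(0.0, 1e-3)    # close-to-zero random bias
            self.high.bias.normal_(0.0, 1e-3)
        self.pool = nn.AvgPool1d(kernel_size=2)

    def forward(self, x_low):                   # x_low: (batch, P)
        a_l = torch.sigmoid(self.low(x_low))
        a_h = torch.sigmoid(self.high(x_low))
        x_l = self.pool(a_l.unsqueeze(1)).squeeze(1)   # x^l(i)
        x_h = self.pool(a_h.unsqueeze(1)).squeeze(1)   # x^h(i)
        return x_l, x_h

class RCF(nn.Module):
    """Residual Classification Flow (Eqs. (5)-(6)) with MLP sub-classifiers."""
    def __init__(self, series_len, n_levels, n_classes, hidden=64):
        super().__init__()
        self.levels, self.psis = nn.ModuleList(), nn.ModuleList()
        p = series_len
        for _ in range(n_levels):
            self.levels.append(MWDNLevel(p))
            p //= 2
            # psi takes [x^h(i), x^l(i)] and outputs class scores u(i)
            self.psis.append(nn.Sequential(
                nn.Linear(2 * p, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)))

    def forward(self, x):                       # x: (batch, series_len)
        c_hat, x_low, outputs = 0.0, x, []      # c_hat starts as \hat{c}(0) = 0
        for level, psi in zip(self.levels, self.psis):
            x_low, x_high = level(x_low)
            u = psi(torch.cat([x_high, x_low], dim=1))
            c_hat = torch.softmax(c_hat + u, dim=1)     # Eq. (6)
            outputs.append(c_hat)
        return outputs                          # \hat{c}(1), ..., \hat{c}(N)

# Example: a 3-level RCF for 128-point series with 4 classes.
model = RCF(series_len=128, n_levels=3, n_classes=4)
preds = model(torch.randn(8, 128))              # final prediction: preds[-1]
```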
2.4 Multi-frequency Long Short-Term Memory
In this subsection, we propose a multi-frequency Long Short-Term Memory (mLSTM) model based on mWDN for TSF. The design of mLSTM is based on the insight that the temporal correlations of points hidden in a time series have close relations with frequency. For example, large time scale correlations, such as long-term tendencies, usually lie in low frequency, while small time scale correlations, such as short-term disturbances and events, usually lie in high frequency. Therefore, we could divide a complicated TSF problem into many sub-problems of forecasting the sub-series decomposed by mWDN, which are relatively easier because the frequency components in the sub-series are simpler.

Given a time series with infinite length, we open a size-T slide window from the past to the time t as

  x = {x_{t-T+1}, ..., x_{t-1}, x_t}.      (7)

Using mWDN to decompose x, we get the low and high frequency component series in the i-th level as

  x^l(i) = {x^l_{t-T/2^i+1}(i), ..., x^l_{t-1}(i), x^l_t(i)},
  x^h(i) = {x^h_{t-T/2^i+1}(i), ..., x^h_{t-1}(i), x^h_t(i)}.      (8)

As shown in Fig. 3, the mLSTM model uses the decomposed results of the last level, i.e., the sub-series in X(N) = {x^h(1), x^h(2), ..., x^h(N), x^l(N)}, as the inputs of N+1 independent LSTM sub-networks. Every LSTM sub-network forecasts the future state of one sub-series in X(N). Finally, a fully connected neural network is employed to fuse the LSTM sub-networks as an ensemble for forecasting.

3 OPTIMIZATION
In TSC applications, we adopt a deep supervision method to train the RCF model [37]. Given a set of time series {x_1, x_2, ..., x_M}, we use cross-entropy as the loss metric and define the objective function of the i-th classifier as

  J̃^c(i) = − (1/M) \sum_{m=1}^{M} [ c_m^⊤ ln ĉ_m(i) + (1 − c_m)^⊤ ln(1 − ĉ_m(i)) ],      (9)

where c_m is the one-hot encoding of x_m's real category, and ĉ_m(i) is the softmax output of the i-th classifier with the input x_m. For an RCF with N classifiers, the final objective function is a weighted sum of all J̃^c(i) [37]:

  J^c = \sum_{i=1}^{N} (i/N) \, J̃^c(i).      (10)

The result of the last classifier, ĉ(N), is used as the final classification result of RCF.

In TSF applications, we adopt a pre-training and fine-tuning method to train the mLSTM model. In the pre-training step, we use MDWD to decompose the real value of the future state to be predicted into a set of wavelet components, i.e., y^p = {y^h(1), y^h(2), ..., y^h(N), y^l(N)}, and combine the outputs of all LSTM sub-networks as ŷ^p. The objective function of the pre-training step is then defined as

  J̃^f = (1/M) \sum_{m=1}^{M} ‖ y^p_m − ŷ^p_m ‖_F^2 ,      (11)

where ‖·‖_F is the Frobenius norm. In the fine-tuning step, we use the following objective function to train mLSTM based on the parameters learned in the pre-training step:

  J^f = (1/T) \sum_{t=1}^{T} ( ŷ_t − y_t )^2 ,      (12)

where ŷ is the future state predicted by mLSTM and y is the real value.

We use the error back propagation (BP) algorithm to optimize the objective functions. Denoting θ as the parameters of the RCF or mLSTM model, the BP algorithm iteratively updates θ as

  θ ← θ − η ∂J(θ)/∂θ ,      (13)

where η is an adjustable learning rate. The weight matrices W^h(i) and W^l(i) of mWDN are also trainable through Eq. (13). A problem of training parameters with preset initial values like W^l(i) and W^h(i) is that the model may "forget" the initial values [9] during training. To deal with this, we introduce two regularization terms into the objective function and therefore have

  J* = J(θ) + α \sum_i ‖ W^l(i) − W̃^l(i) ‖_F^2 + β \sum_i ‖ W^h(i) − W̃^h(i) ‖_F^2 ,      (14)

where W̃^l(i) and W̃^h(i) are the same matrices as W^l(i) and W^h(i) except that ε = 0, and α, β are hyper-parameters set to empirical values. Accordingly, the BP algorithm iteratively updates the weight matrices of mWDN as

  W^l(i) ← W^l(i) − η [ ∂J/∂W^l(i) + 2α ( W^l(i) − W̃^l(i) ) ],
  W^h(i) ← W^h(i) − η [ ∂J/∂W^h(i) + 2β ( W^h(i) − W̃^h(i) ) ].      (15)

In this way, the weights in mWDN will converge to a point near the wavelet decomposed prior, unless wavelet decomposition is far inappropriate for the task. A sketch of this prior-regularized update is given below.
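As a minimal sketch of Eq. (14), the snippet below adds the wavelet-prior penalty to an ordinary forecasting loss, reusing wavelet_init, DB4_LOW, DB4_HIGH and MWDNLevel from the earlier sketch; the optimizer then realizes the update of Eq. (15) automatically through autograd. The α/β values, the assumption that the forecaster exposes its mWDN levels as model.levels, and the choice of optimizer are ours.

```python
import torch

# Reuses wavelet_init, DB4_LOW, DB4_HIGH and MWDNLevel from the previous sketch.

def wavelet_prior_penalty(levels, alpha=1e-3, beta=1e-3):
    """Eq. (14): keep W^l(i), W^h(i) close to their wavelet priors (epsilon = 0)."""
    penalty = 0.0
    for lvl in levels:
        p = lvl.low.weight.shape[0]
        w_l_tilde = wavelet_init(p, DB4_LOW, eps=0.0)    # \tilde{W}^l(i)
        w_h_tilde = wavelet_init(p, DB4_HIGH, eps=0.0)   # \tilde{W}^h(i)
        penalty = penalty + alpha * (lvl.low.weight - w_l_tilde).pow(2).sum()
        penalty = penalty + beta * (lvl.high.weight - w_h_tilde).pow(2).sum()
    return penalty

def training_step(model, optimizer, x, y):
    """One fine-tuning step combining the task loss (Eq. 12) with Eq. (14).
    Assumes the mWDN-based forecaster exposes its levels as model.levels."""
    optimizer.zero_grad()
    y_hat = model(x)
    loss = torch.mean((y_hat - y) ** 2) + wavelet_prior_penalty(model.levels)
    loss.backward()          # gradients of J* drive the update of Eq. (15)
    optimizer.step()
    return loss.item()
```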
4 EXPERIMENTS
In this section, we evaluate the performance of the mWDN-based models in both the TSC and TSF tasks.

4.1 Task I: Time Series Classification
Experimental Setup. The classification performance was tested on 40 datasets of the UCR time series repository [4], with various competitors as follows:
• RNN and LSTM. Recurrent Neural Networks [40] and Long Short-Term Memory [14] are two kinds of classical deep neural networks widely used in time series analysis.
• MLP, FCN, and ResNet. These three models were proposed in [38] as strong baselines on the UCR time series datasets. They share the same framework: an input layer, followed by three hidden basic blocks, and finally a softmax output. MLP adopts a fully-connected layer as its basic block, while FCN and ResNet adopt a fully convolutional layer and a residual convolutional network, respectively, as their basic blocks.
• MLP-RCF, FCN-RCF, and ResNet-RCF. These three models use the basic blocks of MLP/FCN/ResNet as the ψ model of RCF in Eq. (5). We compare them with MLP/FCN/ResNet to verify the effectiveness of RCF.
• Wavelet-RCF. This model has the same structure as ResNet-RCF but replaces the mWDN part with a standard MDWD with fixed parameters. We compare it with ResNet-RCF to verify the effectiveness of the trainable parameters in mWDN.

Table 1: Comparison of Classification Performance on 40 UCR Time Series Datasets

Err Rate  RNN  LSTM  MLP  FCN  ResNet  MLP-RCF  FCN-RCF  ResNet-RCF  Wavelet-RCF
Adiac  0.233  0.341  0.248  0.143  0.174  0.212  0.155  0.151  0.162
Beef  0.233  0.333  0.167  0.25  0.233  0.06  0.03  0.06  0.06
CBF  0.189  0.118  0.14  0  0.006  0.056  0  0  0.016
ChlorineConcentration  0.135  0.16  0.128  0.157  0.172  0.096  0.068  0.07  0.147
CinCECGtorso  0.333  0.092  0.158  0.187  0.229  0.117  0.014  0.084  0.011
CricketX  0.449  0.382  0.431  0.185  0.179  0.321  0.216  0.297  0.211
CricketY  0.415  0.318  0.405  0.208  0.195  0.254  0.172  0.301  0.192
CricketZ  0.4  0.328  0.408  0.187  0.187  0.313  0.162  0.275  0.162
DiatomSizeReduction  0.056  0.101  0.036  0.07  0.069  0.013  0.023  0.026  0.028
ECGFiveDays  0.088  0.417  0.03  0.015  0.045  0.023  0.01  0.035  0.016
FaceAll  0.247  0.192  0.115  0.071  0.166  0.094  0.098  0.126  0.076
FaceFour  0.102  0.364  0.17  0.068  0.068  0.102  0.05  0.057  0.058
FacesUCR  0.204  0.091  0.185  0.052  0.042  0.15  0.087  0.102  0.087
50words  0.316  0.284  0.288  0.321  0.273  0.316  0.288  0.258  0.3
FISH  0.126  0.103  0.126  0.029  0.011  0.086  0.021  0.034  0.026
GunPoint  0.1  0.147  0.067  0  0.007  0.033  0  0.02  0
Haptics  0.594  0.529  0.539  0.449  0.495  0.480  0.461  0.473  0.476
InlineSkate  0.667  0.638  0.649  0.589  0.635  0.543  0.566  0.578  0.572
ItalyPowerDemand  0.055  0.072  0.034  0.03  0.04  0.031  0.023  0.034  0.028
Lighting2  0  0  0.279  0.197  0.246  0.213  0.145  0.197  0.162
Lighting7  0.288  0.384  0.356  0.137  0.164  0.179  0.091  0.177  0.144
MALLAT  0.119  0.127  0.064  0.02  0.021  0.058  0.044  0.046  0.024
MedicalImages  0.299  0.276  0.271  0.208  0.228  0.251  0.164  0.188  0.206
MoteStrain  0.133  0.167  0.131  0.05  0.105  0.105  0.076  0.032  0.05
NonInvasiveFatalECGThorax1  0.09  0.08  0.058  0.039  0.052  0.029  0.026  0.04  0.042
NonInvasiveFatalECGThorax2  0.069  0.071  0.057  0.045  0.049  0.056  0.028  0.033  0.048
OliveOil  0.233  0.267  0.6  0.167  0.133  0.03  0  0  0.012
OSULeaf  0.463  0.401  0.43  0.012  0.021  0.342  0.018  0.021  0.021
SonyAIBORobotSurface  0.21  0.309  0.273  0.032  0.015  0.193  0.042  0.032  0.052
SonyAIBORobotSurfaceII  0.219  0.187  0.161  0.038  0.038  0.092  0.064  0.083  0.072
StarLightCurves  0.027  0.035  0.043  0.033  0.029  0.021  0.018  0.027  0.03
SwedishLeaf  0.085  0.128  0.107  0.034  0.042  0.089  0.057  0.017  0.046
Symbols  0.179  0.117  0.147  0.038  0.128  0.126  0.04  0.107  0.084
TwoPatterns  0.005  0.001  0.114  0.103  0  0.070  0  0  0.005
uWaveGestureLibraryX  0.224  0.195  0.232  0.246  0.213  0.213  0.218  0.194  0.162
uWaveGestureLibraryY  0.335  0.265  0.297  0.275  0.332  0.306  0.232  0.296  0.241
uWaveGestureLibraryZ  0.297  0.259  0.295  0.271  0.245  0.298  0.265  0.204  0.194
wafer  0  0  0.004  0.003  0.003  0.003  0  0  0
WordsSynonyms  0.429  0.343  0.406  0.42  0.368  0.391  0.338  0.387  0.314
yoga  0.202  0.158  0.145  0.155  0.142  0.138  0.112  0.139  0.128
Winning  2  2  0  9  6  2  19  7  7
AVG arithmetic ranking  7.425  6.825  7.2  4.025  4.55  5.15  2.175  3.375  3.075
AVG geometric ranking  6.860  6.131  7.043  3.101  3.818  4.675  1.789  2.868  2.688
MPCE  0.039  0.043  0.041  0.023  0.025  0.028  0.017  0.021  0.019

For each dataset, we ran a model 10 times and returned the average classification error rate as the evaluation. To compare the overall performances on all the 40 datasets, we further introduced Mean Per-Class Error (MPCE) as the performance indicator for each competitor [38]. Let C_k denote the number of categories in the k-th dataset and e_k the error rate of a model on that dataset; the MPCE of a model is then defined as

  MPCE = (1/K) \sum_{k=1}^{K} e_k / C_k .      (16)

Note that the factor of category amount is wiped out in MPCE. A smaller MPCE value indicates a better overall performance.

Results & Analysis. Table 1 shows the experimental results, with the summarized information listed in the bottom rows. Note that the best performance for each dataset is highlighted in bold, and the second best is in italic. From the table, we have various interesting observations.
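To make Eq. (16) concrete, a small helper for computing MPCE over a set of datasets might look as follows (a sketch with illustrative array names and toy numbers).

```python
import numpy as np

def mpce(error_rates, n_classes):
    """Eq. (16): mean of per-dataset error rates normalized by class counts."""
    error_rates = np.asarray(error_rates, dtype=float)   # e_k for each dataset
    n_classes = np.asarray(n_classes, dtype=float)       # C_k for each dataset
    return float(np.mean(error_rates / n_classes))

# e.g. three datasets with 2, 4 and 37 classes
print(mpce([0.05, 0.20, 0.233], [2, 4, 37]))
```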

Figure 4: Comparison of prediction performance with varying period lengths (Scenario I): (a) comparison by MAPE; (b) comparison by RMSE.

Figure 5: Comparison of prediction performance with varying interval lengths (Scenario II): (a) comparison by MAPE; (b) comparison by RMSE.

Firstly, it is clear that among all the competitors, FCN-RCF achieves the best performance in terms of both the largest number of wins (the best on 19 out of 40 datasets) and the smallest MPCE value. While the baseline FCN itself also achieves a satisfactory performance, with the second largest number of wins at 9 and a rather small MPCE value at 0.023, the gap to FCN-RCF is still rather big, implying the significant benefit from adopting our RCF framework. This is actually not an individual case; from Table 1, MLP-RCF performs much better than MLP on 37 datasets, and the number for ResNet-RCF against ResNet is 27. This indicates that RCF is indeed a general framework compatible with different types of deep learning classifiers and can improve TSC performance sharply.

Another observation is from the comparison between Wavelet-RCF and ResNet-RCF. Table 1 shows that Wavelet-RCF achieved the second best overall performance on MPCE and AVG ranking, which indicates that the frequency information introduced by wavelet tools is very helpful for time series problems. It is clear from the table that ResNet-RCF outperforms Wavelet-RCF on most of the datasets. This strongly demonstrates the advantage of our RCF framework in adopting the parameter-trainable mWDN under a deep learning architecture, rather than directly using the wavelet decomposition as a feature engineering tool. More technically speaking, compared with Wavelet-RCF, the mWDN-based ResNet-RCF can achieve a good tradeoff between the frequency-domain prior and the likelihood of the training data. This well illustrates why RCF based models can achieve much better results in the previous observation.

Summary. The above experiments demonstrate the superiority of RCF based models to some state-of-the-art baselines in TSC tasks. The experiments also imply that the trainable parameters in a deep learning architecture and the strong priors from wavelet decomposition are two key factors for the success of RCF.

4.2 Task II: Time Series Forecasting
Experimental Setup. We tested the predictive power of mLSTM on a visitor volume prediction scenario [35]. The experiment adopts a real-life dataset named WuxiCellPhone, which contains user-volume time series of 20 cell-phone base stations located in the downtown of Wuxi city during two weeks. Detailed information about the cell-phone data can be found in [30, 31, 34]. The time granularity of a user-volume series is 5 minutes. In the experiments, we compared mLSTM with the following baselines:
• SAE (Stacked Auto-Encoders), which has been used in various TSF tasks [25].
• RNN (Recurrent Neural Networks) and LSTM (Long Short-Term Memory), which are specifically designed for time series analysis.
• wLSTM, which has the same structure as mLSTM but replaces the mWDN part with a standard MDWD.

We use two metrics to evaluate the performance of the models, Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), which are defined as

  MAPE = (1/T) \sum_{t=1}^{T} ( |x̂_t − x_t| / x_t ) × 100% ,
  RMSE = \sqrt{ (1/T) \sum_{t=1}^{T} ( x̂_t − x_t )^2 } ,      (17)

where x_t is the real value of the t-th sample in a time series, and x̂_t is the predicted one. The smaller the values of the metrics, the better the performance.

Results & Analysis. We compared the performance of the competitors in two TSF scenarios suggested in [33]. In the first scenario, we predicted the average user volumes of a base station in subsequent periods, with the length of the periods varied from 5 to 30 minutes. Fig. 4 compares the performance averaged over the 20 base stations in one week. As can be seen, while all the models experience a gradual decrease in prediction error as the period length increases, mLSTM achieves the best performance compared with the baselines. In particular, the performance of mLSTM is consistently better than wLSTM, which again justifies the introduction of mWDN for time series forecasting.

In the second scenario, we predicted the average user volumes in the 5 minutes after a given time interval varying from 0 to 30 minutes. Fig. 5 is a performance comparison between mLSTM and the baselines. Different from the trend we observed in Scenario I, the prediction errors in Fig. 5 generally increase along the x-axis with the increasing interval length. From Fig. 5 we can see that mLSTM again outperforms wLSTM and the other baselines, which confirms the observations from Scenario I.

Summary. The above experiments demonstrate the superiority of mLSTM to the baselines. The mWDN structure adopted by mLSTM again proves to be an important factor for the success.
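For reference, the two metrics of Eq. (17) are straightforward to compute; the helper below is a minimal sketch with illustrative names.

```python
import numpy as np

def mape(y_true, y_pred):
    """Eq. (17): mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_true) / y_true) * 100.0)

def rmse(y_true, y_pred):
    """Eq. (17): root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))
```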

5 INTERPRETATION
In this section, we highlight the unique advantage of our mWDN model: the interpretability. Since mWDN is embedded with a discrete wavelet decomposition, the outputs of the middle layers in mWDN, i.e., x^l(i) and x^h(i), inherit the physical meanings of wavelet decompositions. We here take two data sets for illustration: WuxiCellPhone used in Sect. 4.2 and ECGFiveDays used in Sect. 4.1. Fig. 6(a) shows a sample of the user number series of a cell-phone base station in one day, and Fig. 6(b) exhibits an electrocardiogram (ECG) sample.

Figure 6: Samples of time series. (a) Cell-phone user number; (b) ECG.

5.1 The Motivation
Fig. 7 shows the outputs of the mWDN layers in the mLSTM and RCF models fed with the two samples given in Fig. 6, respectively. In Fig. 7(a), we plot the outputs of the first three layers in the mLSTM model as different sub-figures. As can be seen, from x^h(1) to x^l(3), the outputs of the middle layers correspond to the frequency components of the input series running from high to low. A similar phenomenon could be observed in Fig. 7(b), where the outputs of the first three layers in the RCF model are presented. This phenomenon again indicates that the middle layers of mWDN inherit the frequency decomposition function of wavelets. Then here comes the question: can we evaluate quantitatively which layer or which frequency of a time series is more important to the final output of the mWDN based models? If possible, this can provide valuable interpretability to our mWDN model.

5.2 Importance Analysis
We here introduce an importance analysis method for the proposed mWDN model, which aims to quantify the importance of each middle layer to the final output of the mWDN based models. We denote the problem of time series classification/forecasting using a neural network model as

  p = M(x),      (18)

where M denotes the neural network, x denotes the input series, and p is the prediction. Given a well-trained model M, if a small disturbance ε to the i-th element x_i ∈ x can cause a large change to the output p, we say M is sensitive to x_i. Therefore, the sensibility of the network M to the i-th element x_i of the input series is defined as the partial derivative of M(x) with respect to x_i:

  S(x_i) = ∂M(x_i)/∂x_i = lim_{ε→0} [ M(x_i) − M(x_i − ε) ] / ε .      (19)

Obviously, S(x_i) is also a function of x_i for a given model M. Given a training set X = {x̃^1, ..., x̃^j, ..., x̃^J} with J training samples, the importance of the i-th element of the input series x to the model M is defined as

  I(x_i) = (1/J) \sum_{j=1}^{J} S(x̃^j_i),      (20)

where x̃^j_i is the value of the i-th element in the j-th training sample.

The importance definition in Eq. (20) can be extended to the middle layers in the mWDN model. Denoting a as an output of a middle layer in mWDN, the neural network M can be rewritten as

  p = M(a(x)),      (21)

and the sensibility of M to a is then defined as

  S_a(x) = ∂M(a(x))/∂a(x) = lim_{ε→0} [ M(a(x)) − M(a(x) − ε) ] / ε .      (22)

Given a training data set X = {x̃^1, ..., x̃^j, ..., x̃^J}, the importance of a w.r.t. M is calculated as

  I(a) = (1/J) \sum_{j=1}^{J} S_a(x̃^j).      (23)

The calculations of ∂M/∂x_i and ∂M/∂a in Eq. (19) and Eq. (22) are given in the Appendix for concision. Eq. (20) and Eq. (23) respectively define the importance of a time-series element and an mWDN layer to an mWDN based model.
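In practice, the sensitivities in Eqs. (19)–(23) can also be obtained with automatic differentiation instead of the manual chain-rule expansion of the Appendix. The sketch below averages input gradients over a batch of training samples; the use of torch.autograd, the forward hook, and the reduction of a possibly vector-valued output to a scalar by summation are our own simplifying assumptions, and the model is assumed to return a tensor.

```python
import torch

def element_importance(model, samples):
    """Eq. (20): average sensitivity of the model output to each input element."""
    grads = []
    for x in samples:                          # x: (series_len,) training sample
        x = x.clone().requires_grad_(True)
        out = model(x.unsqueeze(0)).sum()      # scalar proxy for M(x)
        out.backward()                         # parameter gradients are ignored here
        grads.append(x.grad.detach())
    return torch.stack(grads).mean(dim=0)      # I(x_i) for every position i

def layer_importance(model, samples, layer):
    """Eq. (23): average sensitivity w.r.t. a middle-layer output a(x),
    captured with a forward hook on the given sub-module."""
    captured = {}
    handle = layer.register_forward_hook(lambda m, i, o: captured.update(a=o))
    grads = []
    for x in samples:
        out = model(x.unsqueeze(0)).sum()
        g, = torch.autograd.grad(out, captured["a"])
        grads.append(g.detach().squeeze(0))
    handle.remove()
    return torch.stack(grads).mean(dim=0)      # I(a) averaged over the samples
```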

Figure 7: Sub-series generated by the mWDN model. (a) Cell-phone user number in different layers; (b) ECG in different layers.

Figure 8: Importance spectra of mLSTM on WuxiCellPhone. (a) Importance spectra of middle layers; (b) Importance spectra of inputs.

Figure 9: Importance spectra of RCF on ECGFiveDays. (a) Importance spectra of middle layers; (b) Importance spectra of inputs.

5.3 Experimental Results
Fig. 8 and Fig. 9 show the results of the importance analysis. In Fig. 8, the mLSTM model trained on WuxiCellPhone in Sect. 4.2 is used. Fig. 8(b) exhibits the importance spectrum of all the elements, where the x-axis denotes the increasing timestamps and the colors in the spectrum denote the varying importance of the features: the redder, the more important. From the spectrum, we can see that the latest elements are more important than the older ones, which is quite reasonable in the scenario of time series forecasting and justifies the time value of information.

Fig. 8(a) exhibits the importance spectra of the middle layers, listed from top to bottom in the increasing order of frequency. Note that for the sake of comparison, we resize the outputs to the same length. From the figure, we can observe that i) the lower frequency layers at the top have higher importance, and ii) only the layers with higher importance exhibit the time value of the elements as in Fig. 8(b). These imply that the low frequency layers in mWDN are crucially important to the success of time series forecasting. This is not difficult to understand, since the information captured by low frequency layers often characterizes the essential tendency of human activities and therefore is of great use in revealing the future.

Fig. 9 depicts the importance spectra of the RCF model trained on the ECGFiveDays data set in Sect. 4.1. As shown in Fig. 9(b), the most important elements are located in the range from roughly 100 to 110 of the time axis, which is quite different from that in Fig. 8(b). To understand this, recall from Fig. 6(b) that this range corresponds to the T-Wave of ECG, covering the period of the heart relaxing and preparing for the next contraction. It is generally believed that abnormalities in the T-Wave can indicate seriously impaired physiological functioning (https://en.m.wikipedia.org/wiki/T_wave). As a result, the elements describing the T-Wave are more important to the classification task.

Fig. 9(a) shows the importance spectra of the middle layers, also listed from top to bottom in the increasing order of frequency. It is interesting that the phenomenon is opposite to the one in Fig. 8(a); that is, the layers in high frequency are more important to the classification task on ECGFiveDays. To understand this, we should know that the general trends of ECG signals captured by low frequency layers are very similar for everyone, whereas the abnormal fluctuations captured by high frequency layers are the real distinguishable information for heart disease identification. This also indicates the difference between a time-series classification task and a time-series forecasting task.

Summary. The experiments in this section demonstrate the interpretability advantage of the mWDN model stemming from the integration of wavelet decomposition and our proposed importance analysis method. It can also be regarded as an in-depth exploration to solve the black box problem of deep learning.

6 RELATED WORKS
Time Series Classification (TSC). The target of TSC is to assign a time series pattern to a specific category, e.g., to identify a word based on a series of voice signals. Traditional TSC methods could be classified into three major categories: distance based, feature based, and ensemble methods [6]. Distance based methods predict the category of a time series by comparing the distances or similarities to other labeled series. The widely used TSC distances include the Euclidean distance and dynamic time warping (DTW) [2], and DTW with a KNN classifier has been the state-of-the-art TSC method for a long time [18]. A defect of distance based TSC methods is the relatively high computational complexity. Feature based methods overcome this defect by training classifiers on deterministic features and category labels of time series. Traditional methods, however, usually depend on handcrafted features as inputs, such as symbolic aggregate approximation and interval mean/deviation/slope [8, 22]. In recent years, automatic feature engineering was introduced to TSC, such as time series shapelets mining [11], attention [27] and deep learning based representation learning [20]. Our study also falls in this area but with frequency awareness. The well-known ensemble methods for TSC include PROP [23], COTE [1], etc., which aim to improve classification performance via knowledge integration. As reported by some latest works [6, 38], however, existing ensemble methods are yet inferior to some distance based deep learning methods.

Time Series Forecasting (TSF). TSF refers to predicting future values of a time series using past and present data, which is widely adopted in nearly all application domains [32, 36]. A classic model is the autoregressive integrated moving average (ARIMA) [3], with a great many variants, e.g., ARIMA with explanatory variables (ARIMAX) [21] and seasonal ARIMA (SARIMA) [39], to meet the requirements of various applications. In recent years, a tendency of TSF research is to introduce supervised learning methods, such as support vector regression [16] and deep neural networks [41], for modeling complicated non-linear correlations between past and future states of time series. Two well-known deep neural network structures for TSF are recurrent neural networks (RNN) [5] and long short-term memory (LSTM) [10]. These indicate that an elaborate model design is crucially important for achieving excellent forecasting performance.

Frequency Analysis of Time Series. Frequency analysis of time series data has been deeply studied by the signal processing community. Many classical methods, such as the Discrete Wavelet Transform [26], the Discrete Fourier Transform [12], and the Z-Transform [17], have been proposed to analyze the frequency patterns of time series signals. In existing TSC/TSF applications, however, transforms are usually used as an independent step in data preprocessing [6, 24], which has no interaction with model training and therefore might not be optimized for TSC/TSF tasks from a global view. In recent years, some research works, such as Clockwork RNN [19] and SFM [15], began to introduce the frequency analysis methodology into the deep learning framework. To the best of our knowledge, our study is among the very few works that embed wavelet time series transforms as a part of neural networks so as to achieve end-to-end learning.

7 CONCLUSIONS
In this paper, we aim at building frequency-aware deep learning models for time series analysis. To this end, we first designed a novel wavelet-based network structure called mWDN for frequency learning of time series, which can be seamlessly embedded into deep learning frameworks by making all parameters trainable. We further designed two deep learning models based on mWDN for time series classification and forecasting, respectively, and the extensive experiments on abundant real-world datasets demonstrated their superiority to state-of-the-art competitors. As a nice try for interpretable deep learning, we further proposed an importance analysis method for identifying important factors for time series analysis, which in turn verifies the interpretability merit of mWDN.

REFERENCES
[1] Anthony Bagnall, Jason Lines, Jon Hills, and Aaron Bostrom. 2015. Time-series classification with COTE: the collective of transformation-based ensembles. IEEE TKDE 27, 9 (2015), 2522–2535.
[2] Donald J Berndt and James Clifford. 1994. Using dynamic time warping to find patterns in time series. In KDD '94, Vol. 10. Seattle, WA, 359–370.
[3] George EP Box and David A Pierce. 1970. Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. Journal of the American Statistical Association 65, 332 (1970), 1509–1526.
[4] Yanping Chen, Eamonn Keogh, Bing Hu, Nurjahan Begum, Anthony Bagnall, Abdullah Mueen, and Gustavo Batista. 2015. The UCR Time Series Classification Archive. www.cs.ucr.edu/~eamonn/time_series_data/.
[5] Jerome T Connor, R Douglas Martin, and Les E Atlas. 1994. Recurrent neural networks and robust time series prediction. IEEE T NN 5, 2 (1994), 240–254.
[6] Zhicheng Cui, Wenlin Chen, and Yixin Chen. 2016. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995 (2016).
[7] Ingrid Daubechies. 1992. Ten lectures on wavelets. SIAM.
[8] Houtao Deng, George Runger, Eugene Tuv, and Martyanov Vladimir. 2013. A time series forest for classification and feature extraction. Information Sciences 239 (2013), 142–153.
[9] Robert M French. 1999. Catastrophic forgetting in connectionist networks. Trends in Cognitive Sciences 3, 4 (1999), 128–135.
[10] Felix A Gers, Douglas Eck, and Jürgen Schmidhuber. 2002. Applying LSTM to time series predictable through time-window approaches. In Neural Nets WIRN Vietri-01. Springer, 193–200.
[11] Josif Grabocka, Nicolas Schilling, Martin Wistuba, and Lars Schmidt-Thieme. 2014. Learning time-series shapelets. In KDD '14. ACM, 392–401.
[12] Fredric J Harris. 1978. On the use of windows for harmonic analysis with the discrete Fourier transform. Proc. IEEE 66, 1 (1978), 51–83.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR '16. 770–778.
[14] Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[15] Hao Hu and Guo-Jun Qi. 2017. State-Frequency Memory Recurrent Neural Networks. In International Conference on Machine Learning. 1568–1577.
[16] Young-Seon Jeong, Young-Ji Byon, Manoel Mendonca Castro-Neto, and Said M Easa. 2013. Supervised weighting-online learning algorithm for short-term traffic flow prediction. IEEE T ITS 14, 4 (2013), 1700–1707.
[17] Eliahu Ibraham Jury. 1964. Theory and Application of the z-Transform Method. (1964).
[18] Eamonn Keogh and Chotirat Ann Ratanamahatana. 2005. Exact indexing of dynamic time warping. Knowledge and Information Systems 7, 3 (2005), 358–386.
[19] Jan Koutnik, Klaus Greff, Faustino Gomez, and Juergen Schmidhuber. 2014. A clockwork RNN. In International Conference on Machine Learning. 1863–1871.
[20] Martin Längkvist, Lars Karlsson, and Amy Loutfi. 2014. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognition Letters 42 (2014), 11–24.
[21] Sangsoo Lee and Daniel Fambro. 1999. Application of subset autoregressive integrated moving average model for short-term freeway traffic volume forecasting. Transportation Research Record: Journal of the Transportation Research Board 1678 (1999), 179–188.
[22] Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. 2003. A symbolic representation of time series, with implications for streaming algorithms. In SIGMOD '03 Workshop on Research Issues in DMKD. ACM, 2–11.
[23] Jason Lines and Anthony Bagnall. 2015. Time series classification with ensembles of elastic distance measures. Data Mining and Knowledge Discovery 29, 3 (2015), 565–592.
[24] Hui Liu, Hong-qi Tian, Di-fu Pan, and Yan-fei Li. 2013. Forecasting models for wind speed using wavelet, wavelet packet, time series and Artificial Neural Networks. Applied Energy 107 (2013), 191–208.
[25] Yisheng Lv, Yanjie Duan, Wenwen Kang, Zhengxi Li, and Fei-Yue Wang. 2015. Traffic flow prediction with big data: A deep learning approach. IEEE T ITS 16, 2 (2015), 865–873.
[26] Stephane G Mallat. 1989. A theory for multiresolution signal decomposition: the wavelet representation. IEEE T PAMI 11, 7 (1989), 674–693.
[27] Yao Qin, Dongjin Song, Haifeng Cheng, Wei Cheng, Guofei Jiang, and Garrison Cottrell. 2017. A dual-stage attention-based recurrent neural network for time series prediction. arXiv preprint arXiv:1704.02971 (2017).
[28] Pranav Rajpurkar, Awni Y Hannun, Masoumeh Haghpanahi, Codie Bourn, and Andrew Y Ng. 2017. Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks. arXiv preprint arXiv:1707.01836 (2017).
[29] Alistair CH Rowe and Paul C Abbott. 1995. Daubechies wavelets and Mathematica. Computers in Physics 9, 6 (1995), 635–648.
[30] Xin Song, Yuanxin Ouyang, Bowen Du, Jingyuan Wang, and Zhang Xiong. 2017. Recovering Individual's Commute Routes Based on Mobile Phone Data. Mobile Information Systems 2017 (2017), 1–11.
[31] Jingyuan Wang, Chao Chen, Junjie Wu, and Zhang Xiong. 2017. No Longer Sleeping with a Bomb: A Duet System for Protecting Urban Safety from Dangerous Goods. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1673–1681.
[32] Jingyuan Wang, Fei Gao, Peng Cui, Chao Li, and Zhang Xiong. 2014. Discovering urban spatio-temporal structure from time-evolving traffic networks. In Proceedings of the 16th Asia-Pacific Web Conference. Springer International Publishing, 93–104.
[33] Jingyuan Wang, Qian Gu, Junjie Wu, Guannan Liu, and Zhang Xiong. 2016. Traffic speed prediction and congestion source exploration: A deep learning method. In Data Mining (ICDM), 2016 IEEE 16th International Conference on. IEEE, 499–508.
[34] Jingyuan Wang, Xu He, Ze Wang, Junjie Wu, Nicholas Jing Yuan, Xing Xie, and Zhang Xiong. 2018. CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence.
[35] Jingyuan Wang, Yating Lin, Junjie Wu, Zhong Wang, and Zhang Xiong. 2017. Coupling Implicit and Explicit Knowledge for Customer Volume Prediction. In Proceedings of the 31st AAAI Conference on Artificial Intelligence. 1569–1575.
[36] Jingyuan Wang, Yu Mao, Jing Li, Zhang Xiong, and Wen-Xu Wang. 2014. Predictability of road traffic and congestion in urban areas. PLOS ONE 10, 4 (2014), e0121825.
[37] Liwei Wang, Chen-Yu Lee, Zhuowen Tu, and Svetlana Lazebnik. 2015. Training deeper convolutional networks with deep supervision. arXiv preprint arXiv:1505.02496 (2015).
[38] Zhiguang Wang, Weizhong Yan, and Tim Oates. 2017. Time series classification from scratch with deep neural networks: A strong baseline. In IJCNN '17. IEEE, 1578–1585.
[39] Billy M Williams and Lester A Hoel. 2003. Modeling and forecasting vehicular traffic flow as a seasonal ARIMA process: Theoretical basis and empirical results. Journal of Transportation Engineering 129, 6 (2003), 664–672.
[40] Ronald J Williams and David Zipser. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1, 2 (1989), 270–280.
[41] G Peter Zhang. 2003. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 50 (2003), 159–175.
[42] Yi Zheng, Qi Liu, Enhong Chen, Yong Ge, and J Leon Zhao. 2016. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Frontiers of Computer Science 10, 1 (2016), 96–112.

APPENDIX
In a neural network model, the outputs of layer l are connected as the inputs of layer l+1. According to the chain rule, the partial derivative of the model M with respect to middle layer outputs can be calculated layer by layer as

  ∂M/∂a_i^{(l)} = \sum_j ( ∂M/∂a_j^{(l+1)} ) ( ∂a_j^{(l+1)}/∂a_i^{(l)} ),      (24)

where a_i^{(l)} is the i-th output of layer l. The proposed models contain three types of layers: convolutional, LSTM and fully connected layers, which are discussed below.

For convolutional layers, only the 1D convolutional operation is used in our cases. The output of layer l is a matrix of size L × 1 × C, which is connected to layer l+1 with a convolutional kernel of size k × 1 × C. The partial derivative of M with respect to the i-th output of layer l is calculated as

  ∂M/∂a_i^{(l)} = \sum_{n=0}^{k-1} ( ∂M/∂a_{i-n}^{(l+1)} ) ( ∂a_{i-n}^{(l+1)}/∂a_i^{(l)} ) = \sum_{n=0}^{k-1} δ_{i-n}^{(l+1)} \, w_n \, f'(a_i^{(l)}),

where w_n denotes the n-th element of the convolutional kernel, δ_i^{(l)} = ∂M/∂a_i^{(l)}, and f'(a_i^{(l)}) is the derivative of the activation function.

For LSTM layers, we denote the output of an LSTM unit in layer l+1 at time t as

  a_i^{t,(l+1)} = f( b^{t,(l)} ),

where b^{t,(l)} is calculated as

  b^{t,(l)} = \sum_i w_i^a a_i^{t,(l)} + \sum_i w_i^b b_i^{t-1,(l)} + \sum_i w_i^s s_i^{t-1,(l)} ,

and s_i^{t-1,(l)} is the history state saved in the memory cell. Therefore, the partial derivative of M with respect to a_i^{t,(l)} is calculated as

  ∂M/∂a_i^{t,(l)} = \sum_t ( ∂M/∂b^{t,(l)} ) ( ∂b^{t,(l)}/∂a_i^{t,(l)} ) = \sum_t δ^{t,(l+1)} f'(b^{t,(l)}) \, θ_i^{t,(l)} ,

where θ_i^{t,(l)} is given by

  θ_i^{t,(l)} = w_i^a + w_i^b ( ∂b^{t+1,(l)}/∂a_i^{t+1,(l)} ) + w_i^s ( ∂s^{t+1,(l)}/∂a_i^{t+1,(l)} ) .

The derivative ∂s^{t,(l)}/∂a_i^{t,(l)} in the above equation is calculated as

  ∂s^{t,(l)}/∂a_i^{t,(l)} = s^{t-1,(l)} ( ∂b^{t,(l)}/∂a_i^{t,(l)} ) + f(a_i^{t,(l)}) ( ∂b^{t,(l)}/∂a_i^{t,(l)} ) + b^{t,(l)} f'(a_i^{t,(l)}) .

For fully connected layers, the output is a_i^{(l)} = f( w_i a^{(l-1)} + b ). The partial derivative is then equal to

  ∂a_i^{(l)}/∂a^{(l-1)} = w_i f'( w_i a^{(l-1)} + b ) .