Learning Correlation Space for Time Series

Han Qiu, Massachusetts Institute of Technology, Cambridge, MA, USA ([email protected])
Hoang Thanh Lam, IBM Research, Dublin, Ireland ([email protected])
Francesco Fusco, IBM Research, Dublin, Ireland ([email protected])
Mathieu Sinn, IBM Research, Dublin, Ireland ([email protected])

ABSTRACT
We propose an approximation algorithm for efficient correlation search in time series data. In our method, we use the Fourier transform and a neural network to embed time series into a low-dimensional Euclidean space. The embedding space is learned such that time series correlation can be effectively approximated by the Euclidean distance between the corresponding embedded vectors. Therefore, search for correlated time series can be done using an index in the embedding space for efficient nearest neighbor search. Our theoretical analysis illustrates that our method's accuracy can be guaranteed under certain regularity conditions. We further conduct experiments on real-world datasets and the results show that our method indeed outperforms the baseline solution. In particular, for approximation of correlation, our method reduces the approximation loss by a half in most test cases compared to the baseline solution. For top-k highest correlation search, our method improves the precision from 5% to 20% while the query time is similar to that of the baseline approach.

KEYWORDS
Time series, correlation search, Fourier transform, neural network

ACM Reference Format:
Han Qiu, Hoang Thanh Lam, Francesco Fusco, and Mathieu Sinn. 2018. Learning Correlation Space for Time Series. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
Given a massive number of time series, building a compact index of time series for efficient correlated time series search queries is an important research problem [2, 8]. The classic solutions [2, 30, 42] in the literature use the Discrete Fourier Transformation (DFT) to transform time series into the frequency domain and approximate the correlation using only the first few coefficients of the frequency vectors. Indeed, it has been shown that using only the first 5 coefficients of the DFT is enough to approximate the correlation among stock indices with high accuracy [42].

Approximation of a time series using the first few coefficients of its Fourier transformation can be considered as a dimension-reduction method that maps long time series into a lower-dimensional space with minimal loss of information. An advantage of such dimension-reduction approaches is that they are unsupervised methods and are independent of use-cases; therefore, they can be used for many types of search queries simultaneously. However, they might not be ideal for applications on particular use-cases where the index is designed just for a specific type of query.

In practice, depending on the situation we may serve different types of search queries such as top-k highest correlation search [14], threshold-based (range) correlation search [2, 30], or even simple approximation of the correlation between any pair of time series. Different objectives might be needed to optimize performance for different query types. For instance, for the first two query types it is important to preserve the order of the correlations in the approximation, while for the third one it is more important to minimize the approximation error.

To overcome such problems, in this paper we propose a general framework to optimize query performance for a specific combination of dataset and query type. The key idea behind this framework is sketched in Figure 1. Denote s as a time series in the set S and 𝐬 = DFT(s) as the discrete Fourier transformation of s. We use a neural network f(·|θ) with parameters θ to map 𝐬 into a low-dimensional vector f(𝐬|θ). This neural network tries to approximate the correlation between a pair of time series s and r using the Euclidean distance ∥f(𝐬|θ) − f(𝐫|θ)∥₂ on the embedding space f(S). We can then utilize classic nearest neighbor search algorithms on this embedding space to index time series and improve query efficiency.

[Figure 1: Solution Framework]

We also notice that a specific loss function is needed in training the network to optimize performance for a specific type of query. In the experiments, we will see that using the right loss function for a specific type of query is crucial in achieving good results. Therefore, in this paper we also propose relevant loss functions for all the aforementioned search query types.

In summary, the contributions of this work are listed as follows:

• we propose a framework to approximate time series correlation using a time series embedding method;
• we do theoretical analysis to show the non-optimality of the baseline approach, which motivates this work. We propose an appropriate loss function for each type of query and provide theoretical guarantees for these approaches;
• we conduct experiments on real-world datasets to show the effectiveness of our approach compared to the classic solutions [2]: our approach reduces the approximation error at least by a half in the experiments, and improves the precision for top-k correlation search from 5% to 20%;
• we open the source code for reproducibility of our research¹.

The rest of the paper is organized as follows. We discuss the background of the paper in section 2. Three important problems are formulated in section 3. In section 4 we show the non-optimality of the baseline approach, which motivates this work. We also show theoretical guarantees of the proposed loss functions for each problem formulated in section 3. Sections 5 and 6 show how to build and train the embedding neural networks. Experimental results are discussed in section 8 and section 9 concludes the paper.

¹ Omitted for blind review

2 BACKGROUND
Table 1 summarizes the notations used throughout the paper.

Table 1: Notations
  s, r, u       time series
  𝐬             DFT of s
  s̄             mean value of s
  σ_s           standard deviation of s
  ŝ             normalized series of s
  N⁺            the set of all positive integers
  M             time series length
  m             embedding size
  k             selection set size
  η             selection threshold
  F             top-k selection set
  θ             network parameters
  f(𝐬|θ)        neural network with parameters θ
  ρ             query precision
  δ             query approximation gap
  ∥s − r∥₂      Euclidean distance between s and r
  d_m(𝐬, 𝐫)     Euclidean distance between 𝐬 and 𝐫 considering only the first m elements

Let s = [s_1, s_2, ..., s_M] and r = [r_1, r_2, ..., r_M] be two time series. The Pearson correlation between r and s is defined as:

    \mathrm{corr}(s, r) = \frac{\frac{1}{M}\sum_{j=1}^{M} s_j r_j - \bar{s}\bar{r}}{\sigma_s \sigma_r},    (1)

where \bar{s} = \sum_{j=1}^{M} s_j / M and \bar{r} = \sum_{j=1}^{M} r_j / M are the mean values of s and r, while \sigma_s = \sqrt{\sum_{j=1}^{M}(s_j - \bar{s})^2 / M} and \sigma_r = \sqrt{\sum_{j=1}^{M}(r_j - \bar{r})^2 / M} are the standard deviations. If we further define the l2-normalized vector of the time series s, ŝ = [ŝ_1, ŝ_2, ..., ŝ_M], as:

    \hat{s}_j = \frac{1}{\sqrt{M}} \cdot \frac{s_j - \bar{s}}{\sigma_s} = \frac{s_j - \bar{s}}{\sqrt{\sum_{j=1}^{M}(s_j - \bar{s})^2}},    (2)

we can rewrite corr(s, r) as ŝᵀr̂ = 1 − ∥ŝ − r̂∥₂²/2. That is, the correlation between two time series can be expressed as a simple function of the corresponding l2-normalized time series. Therefore, in the following discussion, we always consider the l2-normalized version ŝ of a time series s.

The scaled discrete Fourier transformation of the time series s, 𝐬 = [𝐬_1, 𝐬_2, ..., 𝐬_M], is defined as:

    \mathbf{s}_j = \frac{1}{\sqrt{M}} \sum_{l=0}^{M-1} s_l \, e^{-\frac{2 j l \pi i}{M}},    (3)

where i² = −1. We use "scaled" here for the factor 1/√M, which makes 𝐬 have the same scale as s. By Parseval's theorem [2, 34], we have sᵀr = 𝐬ᵀ𝐫* and ∥s∥₂ = ∥𝐬∥₂. We can then recover the following property from the literature [30, 42]:

• if corr(s, r) = sᵀr > 1 − ε²/2, then d_m(𝐬, 𝐫) < ε, where d_m is the Euclidean distance restricted to the first m dimensions, with m < M/2:

    d_m^2([\mathbf{s}_1, ..., \mathbf{s}_M], [\mathbf{r}_1, ..., \mathbf{r}_M]) = \sum_{j=1}^{m} \|\mathbf{s}_j - \mathbf{r}_j\|_2^2.    (4)

Based on this property, the authors of [2] develop a method to search for the highest correlated time series in a database utilizing d_m^2(𝐬, 𝐫) ≈ 2 − 2·corr(s, r). This method is considered as the baseline solution in this paper.

3 PROBLEM FORMULATION
Assume we are given a dataset of regular time series S ⊂ R^M, where M ∈ N⁺ is a large number. By "regular" we mean that the time series are sampled over the same period and with the same frequency (and therefore have the same length). Some correlation search problems can be formulated as follows:

Problem 1 (Top-k Correlation Search). Find a method F that for any given time series s ∈ S and count k ∈ N⁺, F(s, k|S) provides a subset of S that includes the k highest correlated time series with respect to s. More formally,

    F(s, k|S) = \arg\max_{\{S' \,:\, S' \subset S,\, |S'| = k\}} \sum_{r \in S'} \mathrm{corr}(s, r)
              = \arg\min_{\{S' \,:\, S' \subset S,\, |S'| = k\}} \sum_{r \in S'} \|s - r\|_2^2.    (5)
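As a concrete check of the identities above, the following NumPy sketch (ours, for illustration only; names and data are arbitrary) verifies that corr(s, r) = 1 − ∥ŝ − r̂∥₂²/2 and that the distance on the first few scaled DFT coefficients lower-bounds 2 − 2·corr(s, r):

```python
import numpy as np

def l2_normalize(s):
    # Eq. (2): subtract the mean and scale so the vector has unit l2 norm.
    s = np.asarray(s, dtype=float)
    centered = s - s.mean()
    return centered / np.linalg.norm(centered)

def scaled_dft(s):
    # Eq. (3): DFT scaled by 1/sqrt(M) so Parseval's theorem gives ||s||_2 = ||DFT(s)||_2.
    return np.fft.fft(s) / np.sqrt(len(s))

rng = np.random.default_rng(0)
s, r = rng.standard_normal(1000), rng.standard_normal(1000)
s_hat, r_hat = l2_normalize(s), l2_normalize(r)

corr = np.corrcoef(s, r)[0, 1]
# Identity below Eq. (2): corr(s, r) = 1 - ||s_hat - r_hat||^2 / 2
assert np.isclose(corr, 1 - np.linalg.norm(s_hat - r_hat) ** 2 / 2)

# Eq. (4): the distance on the first m DFT coefficients lower-bounds the full
# distance and approximates 2 - 2*corr(s, r) when energy sits in low frequencies.
m = 5
d_m_sq = np.sum(np.abs(scaled_dft(s_hat)[:m] - scaled_dft(r_hat)[:m]) ** 2)
print(d_m_sq, 2 - 2 * corr)  # d_m_sq <= 2 - 2*corr
```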

Problem 2 (Threshold Correlation Search). Find a method G that for any given time series s ∈ S and threshold value η ∈ (0, 1), G(s, η|S) provides a subset of S that consists of all time series r with corr(s, r) ≥ η. More formally,

    G(s, \eta|S) = \{r \in S \mid \mathrm{corr}(s, r) \ge \eta\}.    (6)

Problem 3 (Correlation Approximation). Find an embedding function f: R^M → R^m that minimizes the expected approximation error in correlations,

    f(\cdot|S) = \arg\min_{f} \; E_{s,r \in S} \left| \|f(s) - f(r)\|_2^2 - 2(1 - \mathrm{corr}(s, r)) \right|.    (7)

Solutions of these problems are related to each other. First, Problem 1 and Problem 2 are almost equivalent. For instance, if the set {corr(s, r) | r ∈ S} contains no identical elements, and if we let η(s, k|S) be the kth largest element in this set, we have G(s, η(s, k|S)|S) = F(s, k|S). Therefore, we only need to discuss one of them; in this paper we will focus on Problem 1.

Second, solutions of Problem 3 can lead to good approximations for Problem 1. If there is an f such that the objective value of Problem 3 is close to 0, i.e.,

    \|f(s) - f(r)\|_2^2 \approx 2(1 - \mathrm{corr}(s, r)),

we can consider the following approximation of Problem 1,

    \hat{F}(s, k|f, S) = \arg\min_{\{S' \,:\, S' \subset S,\, |S'| = k\}} \sum_{r \in S'} \|f(s) - f(r)\|_2^2,    (8)

combined with top-k nearest neighbor search structures on the embedding space f(S). If the dimension m of the embedded space is small, we can use a k-d tree [4]. With a balanced k-d tree, the complexity of searching the top k nearest neighbors among n candidates reduces to O(k log n) [19]; a sketch of such a query is given at the end of this section.

All problems defined above have several important applications. In time series forecasting, one can search for strongly correlated time series in a historical database and then use them to predict the target time series. In data storage, when the whole time series dataset is very large and does not fit the available memory, one can compress the time series significantly using the embedding, with a small trade-off for loss due to correlation approximation.
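A minimal sketch of the k-d tree query behind (8) follows; it is our own illustration (not the paper's code) using SciPy's cKDTree over precomputed embedded vectors:

```python
import numpy as np
from scipy.spatial import cKDTree

def top_k_query(query_embedding, database_embeddings, k):
    # Build (or reuse) a k-d tree over the m-dimensional embedded vectors f(S)
    # and return the indices of the k nearest neighbors, i.e. F_hat(s, k | f, S) in Eq. (8).
    tree = cKDTree(database_embeddings)
    _, indices = tree.query(query_embedding, k=k)
    return indices

# Illustration with random 16-dimensional embeddings for 10,000 series.
rng = np.random.default_rng(0)
database_embeddings = rng.standard_normal((10_000, 16))
query_embedding = rng.standard_normal(16)
print(top_k_query(query_embedding, database_embeddings, k=100))
```

In practice the tree is built once over f(S) and reused for every incoming query, so only the embedding of the query series has to be computed at query time.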
4 THEORETICAL ANALYSIS
This section discusses the theoretical analysis that supports our methods. We first show that DFT is not an optimal solution for Problem 1 and Problem 3 in general. This result motivates our work, as there is room for improvement over the baseline if the data has specific structure that our learning algorithm can leverage. Next, we show that in order to solve Problem 1, it suffices to approximate the l2 norm on S with the l2 norm on f(S). We then propose an appropriate loss function to solve Problem 1 effectively.

4.1 Fourier transform's non-optimality
In the literature [42], it has been shown that using only the first 5 DFT coefficients is enough to approximate solutions for Problem 1 on stock data with high accuracy. However, there is still much room for improvement over the methods based on DFT, as shown in the experimental section. This is because the DFT approximation blindly assumes that most of the energy (information) in the time series is in the low-frequency components. In the two following examples, we illustrate that the naive DFT method might neither accurately approximate the correlation function nor extract the most important information explaining dataset variability, if the information in the high-frequency components of the time series is non-negligible. Use-case-specific dimension reduction methods such as neural networks can avoid such bias and are therefore preferable.

Example 1. Consider a set S of real-valued time series s, where the corresponding DFT vector 𝐬 of each s has the following property:

    \mathbf{s}_i = \mathbf{s}_{M/4 + i}, \quad \forall i = 1, ..., M/4.

Recall that for the DFT vector 𝐬 of a real-valued time series s, we have the conjugacy 𝐬_i = 𝐬*_{M+1−i} for i = 1, ..., M. That implies 4 d_m^2(𝐬, 𝐫) = ∥s − r∥₂² = 2 − 2·corr(s, r) for m = M/4. The optimal embedding of size m for the correlation function approximation should therefore be f(s) = [2𝐬_1, 2𝐬_2, ..., 2𝐬_m], which differs from the embedding using DFT by a constant factor of 2.

Example 2. Consider a set S of similar time series s, where the corresponding DFT vector 𝐬 of each s is generated as follows:

    \mathbf{s}_i = \mu_i^0 + z_i \sigma_i^0, \quad \forall i = 1, ..., M,
    z_i \sim N(0, \varepsilon) \;\; \forall i = 1, ..., m, \qquad z_i \sim N(0, 1) \;\; \forall i = m+1, ..., M,    (9)

where ε ≪ 1, σ_i^0 ≠ 0, and the z_i are sampled independently. It is obvious that for Problem 1, the DFT method has almost the same precision as a random guess.
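To make Example 2 concrete, the following sketch (our own illustration; μ⁰, σ⁰ and all sizes are arbitrary assumptions, and the coefficients are treated as real for simplicity) generates DFT vectors according to (9) and shows that neighbor rankings based on the first m coefficients are nearly uncorrelated with the true rankings:

```python
import numpy as np

rng = np.random.default_rng(0)
M, m, N, eps = 64, 8, 500, 1e-3

# Arbitrary population parameters mu0, sigma0 (assumptions for illustration).
mu0 = rng.standard_normal(M)
sigma0 = rng.uniform(0.5, 1.5, M)

# Eq. (9): low variance (eps) in the first m coefficients, unit variance elsewhere.
scale = np.concatenate([np.full(m, np.sqrt(eps)), np.ones(M - m)])
Z = rng.standard_normal((N, M)) * scale
S = mu0 + Z * sigma0   # rows play the role of the DFT vectors of the series in S

query, rest = S[0], S[1:]
d_first_m = np.linalg.norm(rest[:, :m] - query[:m], axis=1)   # what DFT-based indexing sees
d_full = np.linalg.norm(rest - query, axis=1)                  # what actually determines corr(s, r)

# Rank agreement (Spearman-style, via ranks) between the two distances is near zero,
# i.e. indexing on the first m coefficients ranks neighbors almost like a random guess here.
ranks_m = np.argsort(np.argsort(d_first_m))
ranks_full = np.argsort(np.argsort(d_full))
print(np.corrcoef(ranks_m, ranks_full)[0, 1])
```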

4.2 Approximating top-k correlated series
In this subsection, we analyze the theoretical performance guarantee of approximation (8). Using the notation F(s, k|S) and F̂(s, k|f, S), Problem 1 is equivalent to maximizing the precision ρ of f given s and k,

    \rho(s, k, f|S) = \frac{1}{k} \left| \hat{F}(s, k|f, S) \cap F(s, k|S) \right|,    (10)

or to minimizing the approximation gap δ of f,

    \delta(s, k, f|S) = \frac{1}{k} \Big[ \sum_{r \in \hat{F}(s, k|f, S)} \|s - r\|_2^2 - \sum_{r \in F(s, k|S)} \|s - r\|_2^2 \Big].    (11)

If the mapping f preserves the metric well, that is, the gap between ∥s − r∥₂² and ∥f(s) − f(r)∥₂² is uniformly small, we can prove that the approximation gap δ is also small. Formally, we have:

Theorem 4.1. If for a function f: R^M → R^m we have

    \left| \|f(s) - f(r)\|_2^2 - \|s - r\|_2^2 \right| \le \varepsilon, \quad \forall s, r \in S,

where ε ∈ R⁺, then we also have

    \delta(s, k, f|S) \le 2\varepsilon, \quad \forall s \in S, \; k \in N^+.

Proof. In fact, we have

    \sum_{r \in \hat{F}(s, k|f, S)} \|s - r\|_2^2
        \le \sum_{r \in \hat{F}(s, k|f, S)} \|f(s) - f(r)\|_2^2 + k\varepsilon
        \le \sum_{r \in F(s, k|S)} \|f(s) - f(r)\|_2^2 + k\varepsilon
        \le \sum_{r \in F(s, k|S)} \|s - r\|_2^2 + 2k\varepsilon    (12)

for every s ∈ S and k ∈ N⁺. The second inequality above holds since F̂(s, k|f, S) solves the corresponding minimization problem (8). The conclusion then follows trivially. □

Theorem 4.1 provides us a sanity check that approximation (8) becomes more accurate (in terms of δ) as the function approximation f gets better. We can therefore design the following loss function on the parametric mapping f(·|θ) for optimization:

    L_\delta(\theta) = E_{s,r \in S} \left| \|f(s|\theta) - f(r|\theta)\|_2^2 - \|s - r\|_2^2 \right|.    (13)

Theorem 4.1 can also be extended to a more general distance d(·, ·) on R^m × R^m.

Theorem 4.2. Consider a distance d(·, ·) on R^m × R^m and the corresponding set approximation

    \hat{F}_d(s, k|f, S) = \arg\min_{\{S' \,:\, S' \subset S,\, |S'| = k\}} \sum_{r \in S'} d(f(s), f(r)).    (14)

If for a function f: R^M → R^m we have

    \left| d(f(s), f(r)) - \|s - r\|_2^2 \right| \le \varepsilon, \quad \forall s, r \in S,

where ε ∈ R⁺, then we also have

    \delta_d(s, k, f|S) \le 2\varepsilon, \quad \forall s \in S, \; k \in N^+,

where δ_d is defined as

    \delta_d(s, k, f|S) = \frac{1}{k} \Big[ \sum_{r \in \hat{F}_d(s, k|f, S)} \|s - r\|_2^2 - \sum_{r \in F(s, k|S)} \|s - r\|_2^2 \Big].    (15)

Proof. This theorem can be proved by proceeding with the same argument as in the proof of Theorem 4.1. □

Although we have confidence that a good metric-approximating f can lead to a low approximation gap δ, there is not much to say about the precision ρ. For instance, assume we are given a group of time series S and a target series s, and for each r ∈ S the correlation corr(s, r) is close to the others, i.e., |corr(s, r) − c_0| < ε, where c_0 and ε are fixed values. In this case, an f with low approximation error |∥f(s) − f(r)∥₂² − ∥s − r∥₂²|, and therefore low gap δ, might still confuse the relative order of the correlations corr(s, r) and the distances ∥s − r∥₂². This will result in a low precision ρ. To improve the practical approximation performance on ρ for Problem 1, we propose a loss function that tries to approximate the correlation order:

    L_\rho(\theta) = E_{s,r,u \in S} \left| \left( \|f(r|\theta) - f(s|\theta)\|_2^2 - \|f(r|\theta) - f(u|\theta)\|_2^2 \right) - \left( \|r - s\|_2^2 - \|r - u\|_2^2 \right) \right|.    (16)

The following corollary of Theorem 4.1 shows that we can also obtain a performance guarantee on δ for a network optimizing L_ρ.

Corollary 4.3. If for a function f: R^M → R^m we have

    \left| \left( \|f(r) - f(s)\|_2^2 - \|f(r) - f(u)\|_2^2 \right) - \left( \|r - s\|_2^2 - \|r - u\|_2^2 \right) \right| \le \varepsilon, \quad \forall r, s, u \in S,

where ε ∈ R⁺, then we also have

    \delta(s, k, f|S) \le 2\varepsilon, \quad \forall s \in S, \; k \in N^+.

Proof. This reduces to Theorem 4.1 if we set r = u. □

At the end of this section, we note that the uniform property on ϵ_{s,r} = ∥s − r∥₂² − d(f(s), f(r)) might be too restrictive, while in general a neural network can only attain some probabilistic bounds. Since we have

    \delta_d(s, k, f|S) \le \frac{1}{k} \Big[ \sum_{r \in \hat{F}_d(s, k|f, S)} \epsilon_{s,r} - \sum_{r \in F(s, k|S)} \epsilon_{s,r} \Big]
        \le \frac{1}{k} \Big| \sum_{r \in \hat{F}_d(s, k|f, S)} \epsilon_{s,r} \Big| + \frac{1}{k} \Big| \sum_{r \in F(s, k|S)} \epsilon_{s,r} \Big|,    (17)

we can apply concentration inequalities to obtain the following probabilistic bound.

Theorem 4.4. Assume that for any given s, the ϵ_{s,r} are independently and identically distributed zero-mean random variables on R. If we further assume

    E\epsilon_{s,r}^2 \le \varepsilon, \qquad |\epsilon_{s,r}| \le M \;\; a.s., \quad \forall r \in S,

then we have

    \Pr\left( \delta(s, k, f|S) \ge 2c\varepsilon \right) \le 4 \exp\left( \frac{-k c^2 \varepsilon}{2 + \frac{2}{3} M c} \right), \quad \forall c \ge 1.

Proof. We apply the Bernstein inequality to each part of the right-hand side of (17):

    \Pr\left( \delta(s, k, f|S) \ge 2c\varepsilon \right)
        \le \Pr\Big( \Big| \sum_{r \in \hat{F}_d(s, k|f, S)} \epsilon_{s,r} \Big| \ge k c \varepsilon \Big) + \Pr\Big( \Big| \sum_{r \in F(s, k|S)} \epsilon_{s,r} \Big| \ge k c \varepsilon \Big)
        \le 4 \exp\left( \frac{-(k c \varepsilon)^2}{2 k \varepsilon + \frac{2}{3} M k c \varepsilon} \right)
        \le 4 \exp\left( \frac{-k c^2 \varepsilon}{2 + \frac{2}{3} M c} \right). \qquad \square

Such a probabilistic bound is less ideal than the deterministic bound in Theorem 4.1. It is more relevant for the accuracy of threshold-based queries, since the corresponding k will be much greater than in the top-k queries and the bound will be much tighter.
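The precision in (10) and the gap in (11) are simple to compute once both top-k sets are available; a minimal NumPy sketch (ours, with an interface chosen purely for illustration) is:

```python
import numpy as np

def precision_and_gap(query, database, query_emb, database_emb, k):
    """Precision rho, Eq. (10), and approximation gap delta, Eq. (11).

    `database` holds the l2-normalized series and `database_emb` their embeddings
    f(s|theta); both are arrays with one row per series (our own interface).
    """
    # Exact top-k under the Euclidean distance on the series (equivalently, highest correlation).
    exact = set(np.argsort(np.linalg.norm(database - query, axis=1))[:k])
    # Approximate top-k under the Euclidean distance in the embedding space, Eq. (8).
    approx = set(np.argsort(np.linalg.norm(database_emb - query_emb, axis=1))[:k])

    rho = len(exact & approx) / k
    sq = lambda idx: sum(np.linalg.norm(database[i] - query) ** 2 for i in idx)
    delta = (sq(approx) - sq(exact)) / k
    return rho, delta
```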

5 DESIGN OF EMBEDDING STRUCTURE
In the last section, we showed that if we have an accurate embedding function f, the approximation of our correlation search will also be accurate. Nevertheless, it is not easy to find such an f. In this section we discuss several potential neural network structures that we considered.

5.1 Embedding on the time domain
For direct embedding of time series, a natural selection is the recurrent neural network (RNN). The RNN is a suitable model for sequential data, and recently it has been widely applied in sequence-related tasks such as machine translation [3] and video processing [17]. Specifically, given a sequence [X_1, ..., X_M], a (single-layer) RNN tries to encode the first m observations [X_1, ..., X_m] into a state h_m with the following dynamic process

    h_{m+1} = g(X_{m+1}, h_m),    (18)

where g is the same function across m and is called a "unit". Common choices of RNN units include the Long short-term memory (LSTM) [22] and the Gated recurrent unit (GRU) [12]. We can then take the last state h_M as the final embedding vector

    f(s) = h_M.    (19)

This approach has the disadvantage that the training algorithm is very slow when the time series is long, because of the recurrent computations. Moreover, in the experiments we observed that this method does not give good results for long time series.

5.2 Embedding on the frequency domain
To avoid the time-consuming and ineffective recurrent computations, we first use the DFT to transform a time series s into the frequency domain, 𝐬 = DFT(s). Next, we use a multi-layer fully connected (dense) network with ReLU activation functions,

    ReLU(x) = max{x, 0},

to transform the frequency vector 𝐬 into an embedded vector of the desired size. At last, we apply l2 normalization to the embedding vector to project it onto the unit sphere. This normalization step realigns the scale of the Euclidean distances between embedding vectors with the correlation we aim to approximate.
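A minimal sketch of this frequency-domain embedding network in TensorFlow/Keras follows. The hidden-layer width mirrors Table 3, but the helper names and the packing of the DFT into real and imaginary parts are our own assumptions rather than the paper's exact implementation:

```python
import numpy as np
import tensorflow as tf

def dft_features(series, num_coefficients):
    # Scaled DFT (Eq. 3); keep the first coefficients and split them into
    # real and imaginary parts so the dense layers see a real-valued vector.
    coeffs = np.fft.fft(series, axis=-1) / np.sqrt(series.shape[-1])
    coeffs = coeffs[..., :num_coefficients]
    return np.concatenate([coeffs.real, coeffs.imag], axis=-1).astype("float32")

def build_embedding_network(input_dim, embedding_size=16, hidden_size=1024):
    inputs = tf.keras.Input(shape=(input_dim,))
    hidden = tf.keras.layers.Dense(hidden_size, activation="relu")(inputs)
    embedding = tf.keras.layers.Dense(embedding_size)(hidden)
    # Project onto the unit sphere so that Euclidean distances in the embedding
    # space live on the same scale as 2 - 2*corr(s, r).
    embedding = tf.keras.layers.Lambda(
        lambda t: tf.math.l2_normalize(t, axis=-1))(embedding)
    return tf.keras.Model(inputs, embedding)
```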

6 TRAINING EMBEDDING NEURAL NETWORKS
In this section we describe how we train the embedding neural networks for different objective functions.

6.1 Approximation of correlation
For Problem 3, we sample a random batch of pairs of time series (s, r) and minimize the following loss function at each training iteration:

    L_{approximate} = \left| \|f(\mathbf{s}|\theta) - f(\mathbf{r}|\theta)\|_2^2 - 2(1 - \mathrm{corr}(s, r)) \right|.

We call this method the Chronos² Approximation algorithm in the experiments.

² Chronos in Greek means time.

6.2 Top-k correlation search
For Problem 1, we sample a random batch of triples of time series (s, r, u) and minimize the following loss function at each training iteration:

    L_{order} = \left| \left( \|f(\mathbf{r}|\theta) - f(\mathbf{s}|\theta)\|_2^2 - \|f(\mathbf{r}|\theta) - f(\mathbf{u}|\theta)\|_2^2 \right) - 2\left( \mathrm{corr}(r, u) - \mathrm{corr}(r, s) \right) \right|.

We call this method the Chronos Order algorithm in the experiments because it tries to approximate the correlation order. According to Corollary 4.3, if we optimize L_order we also approximate the solutions for Problem 1.

6.3 Symmetry correction
Recall that when s is a real-valued time series, its DFT 𝐬 is symmetric, i.e. 𝐬_i is the complex conjugate of 𝐬_{M−i}. For this reason, if M = 2m, we should have d_M(𝐬, 𝐫) = 2 d_m(𝐬, 𝐫) = 2 − 2·corr(s, r). Thanks to this symmetry property, one only needs m < M/2 coefficients to approximate the correlation; in that case the approximation of 2 − 2·corr(s, r) is corrected to 2 d_m(𝐬, 𝐫).

Therefore, in our implementation of the Chronos algorithms we use corrected versions of the loss functions as follows, to take the symmetry of the embedding into account:

    L_{approximate} = \left| 2\|f(\mathbf{s}|\theta) - f(\mathbf{r}|\theta)\|_2^2 - 2(1 - \mathrm{corr}(s, r)) \right|,

    L_{order} = \left| 2\left( \|f(\mathbf{r}|\theta) - f(\mathbf{s}|\theta)\|_2^2 - \|f(\mathbf{r}|\theta) - f(\mathbf{u}|\theta)\|_2^2 \right) - 2\left( \mathrm{corr}(r, u) - \mathrm{corr}(r, s) \right) \right|.
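For concreteness, the two corrected losses can be written in TensorFlow as follows (a sketch with our own naming; corr_sr, corr_rs and corr_ru denote the precomputed Pearson correlations of the sampled batch):

```python
import tensorflow as tf

def chronos_approximation_loss(f_s, f_r, corr_sr):
    # Corrected L_approximate: 2*||f(s)-f(r)||^2 should match 2*(1 - corr(s, r)).
    d2 = tf.reduce_sum(tf.square(f_s - f_r), axis=-1)
    return tf.reduce_mean(tf.abs(2.0 * d2 - 2.0 * (1.0 - corr_sr)))

def chronos_order_loss(f_r, f_s, f_u, corr_rs, corr_ru):
    # Corrected L_order: differences of squared embedding distances should match
    # differences of correlations, which preserves the correlation order.
    d2_rs = tf.reduce_sum(tf.square(f_r - f_s), axis=-1)
    d2_ru = tf.reduce_sum(tf.square(f_r - f_u), axis=-1)
    return tf.reduce_mean(tf.abs(2.0 * (d2_rs - d2_ru) - 2.0 * (corr_ru - corr_rs)))
```

During training, mini-batches of pairs (or triples) are sampled, the network is evaluated on their DFT features, and the mean loss is minimized with the Adam settings listed in Table 3.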

6.4 Datasets
We used four real-world datasets to validate our approaches. One of these datasets is the daily stock indices crawled from Yahoo Finance [1] using the public Python API. The other datasets were chosen from the UCR time series classification archive [11]. These datasets were chosen because each time series is long (at least 400 readings) and the number of time series is large enough (at least a few thousand). Overall, the characteristics of these datasets are described in Table 2.

We preprocessed the data by adding dummy timestamps to the UCR time series classification datasets since our scripts do not assume time series sampled with the same resolution. The size of the data reported in Table 2 is the disk size calculated with the du command in Linux.

Table 2: Datasets used in experiments
  Data             # time series   length   size
  Stock            19420           1000     400 MB
  Yoga             3300            426      26 MB
  UWaveGesture     4478            945      71 MB
  StarLightCurves  9236            1024     181 MB

6.5 Experiment settings
For each dataset, we randomly split the data into training, validation and test sets with the ratios 80-10-10%. We use the training set to train the embedding neural networks and validate them on the validation set. The test set is used to estimate the query accuracy. In particular, for the top-k highest correlation queries, each time series in the test set is considered as a target series, and all the other series in the training set are considered as the database from which we look for time series correlated with the target. For correlation approximation queries, we randomly permute the test series and create pairs of time series from the permutation. Since the test set is completely independent from training and validation, the reported results are a proper estimate of the query accuracy in practice.

We fix the parameters of the neural network to the default values described in Table 3. Although we did not fine-tune the network hyper-parameters or explore deeper neural network structures, the obtained results are already significantly better than the baseline algorithms. In practice, automatic network search and tuning can be done via Bayesian optimization [37]. Since the search for the optimal network structure and optimal network hyper-parameters is out of the scope of this work, we leave this problem as future work.

Table 3: Parameter settings
  parameter                 value
  optimization algorithm    ADAM [26]
  learning rate             0.01
  weight initialization     Xavier [21]
  number of hidden layers   1
  hidden layer size         1024
  mini-batch size           256
  training iterations       10000 mini-batches

The network was implemented in TensorFlow and run on a Linux system with two Intel cores and one NVIDIA Tesla K40 GPU. All the running times are reported on this system.
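Putting the pieces together, a training loop consistent with Table 3 might look as follows; this is a sketch that reuses the illustrative helpers from the earlier sketches (l2_normalize, dft_features, chronos_approximation_loss), none of which are the paper's actual code:

```python
import numpy as np
import tensorflow as tf

def train_chronos_approximation(train_series, model, iterations=10000, batch_size=256):
    # ADAM with learning rate 0.01, as in Table 3.
    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
    normalized = np.stack([l2_normalize(s) for s in train_series])
    features = dft_features(normalized, num_coefficients=model.input_shape[-1] // 2)

    for _ in range(iterations):
        idx_s = np.random.randint(len(normalized), size=batch_size)
        idx_r = np.random.randint(len(normalized), size=batch_size)
        # Pearson correlation of each sampled pair, via the l2-normalized series (Eq. 2).
        corr = np.sum(normalized[idx_s] * normalized[idx_r], axis=1).astype("float32")
        with tf.GradientTape() as tape:
            f_s = model(features[idx_s], training=True)
            f_r = model(features[idx_r], training=True)
            loss = chronos_approximation_loss(f_s, f_r, corr)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return model
```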

6.6 Results on correlation approximation
Figure 2 shows the comparison between Chronos approximation and DFT for approximated solutions to Problem 3. Each subplot in the figure corresponds to a dataset. The vertical axis shows the approximation loss on the test set, while the horizontal axis shows the embedding size m, varying from 2 to 16. As can be observed, our method outperformed DFT in all the test cases with a significant margin. In particular, the approximation error was reduced at least by a half, especially for small embedding sizes. These empirical results confirm our theoretical analysis on the sub-optimality of DFT for Problem 3.

It is interesting that the down-sample approach, although very simple, works very well on the UWave and Stock datasets. Both Chronos and down-sample outperformed DFT in all cases. One possible explanation is that DFT seems to preserve approximation accuracy for high correlation (see Lemma 2 in [42]) rather than optimize the correlation approximation for a randomly picked pair of time series.

6.7 Results on top-k correlation search
Figure 3 shows the comparison between Chronos order, Chronos approximation, DFT and down-sample for approximated solutions to Problem 1. Each subplot in the figure corresponds to a pair of dataset and query size k (varied from 10 to 100). The vertical axis shows the precision at k on the test set, while the horizontal axis shows the embedding size m, varying from 2 to 16.

Although Theorem 4.1 and Corollary 4.3 provide the same theoretical guarantee for the approximation bound of Chronos order and Chronos approximation on Problem 1, our results show that Chronos order outperforms Chronos approximation in terms of the precision. These algorithms differ only in the loss function used in training the networks: while Chronos approximation directly minimizes the loss of correlation approximation, Chronos order also tries to preserve order information. This result implies that using the right loss function for a given query type is crucial in achieving good results.

In all cases, the Chronos order algorithm outperforms the DFT baseline algorithm. On the other hand, except for the Stock dataset, the Chronos approximation algorithm also outperforms the DFT baseline algorithm. The overall improvement varies from 5% to 20% depending on the values of k and m. The down-sample algorithm, although it works very well on the Yoga dataset, has very low accuracy in the other cases.

6.8 Training & Query time
Another important evaluation metric for our methods' performance is the time efficiency in training the embedding network f(𝐬|θ) and in the subsequent function evaluation for incoming queries. Table 4 shows the training time of the Chronos Order method on the four datasets when k = 100 and m = 16. For other values of k and m, the results are similar. We can see that it only takes about 15 minutes to train the network with 10000 mini-batches.

Table 4: Training time (seconds) with 10000 mini-batches
  Data             Chronos order
  Stock            947
  Yoga             914
  UWaveGesture     894
  StarLightCurves  950

Table 5 shows the query time of the different algorithms when k = 100 and m = 16. Again, for other values of k and m, the results are similar. As can be seen, the query time is not significantly influenced by the additional task of evaluating the function f(𝐬|θ). Since the query time is dominated by the k-d tree traversal, the time for evaluating the function f(𝐬|θ) is negligible. The query time using DFT is slightly faster than the query time of Chronos order. Both methods have a very quick response time per query, less than one millisecond in all experiments.

Table 5: Average top-k query time (milliseconds)
  Data             Chronos order   DFT
  Stock            0.9             0.8
  Yoga             0.7             0.3
  UWaveGesture     0.7             0.3
  StarLightCurves  0.6             0.4

[Figure 2 (below): four panels, one per dataset (Star Light Curve, UWave Gesture, Yoga, Stock Indices); curves for DFT, chronos_app, and sample; x-axis: embedding size m; y-axis: average approximation loss.]

Figure 2: Results on correlation approximation

[Figure 3 (below): twelve panels, one per dataset (Star Light Curve, UWave Gesture, Yoga, Stock Indices) and query size k (10, 50, 100); curves for chronos_order, chronos_app, DFT, and sample; x-axis: embedding size m; y-axis: precision @ k.]

Figure 3: Results on top-k correlation search

7 RELATED WORKS
Correlation related queries on time series. During the past several decades, a massive amount of correlation-related query problems on time series datasets has been investigated in the literature. As our review will be far from exhaustive, here we rather provide several problem categories and some corresponding examples.

One major classification of these problems is whether a specific time series is given in the query. For example, some works [28, 30, 36, 42] are interested in computing the correlation among all sequence pairs or providing all pairs that are correlated (over a certain threshold), while others [18, 33] aim at finding the most correlated sequences in the dataset for a given one. There are also some papers addressing both sides [2]. In this paper, we focus on the latter category, in which a specific time series is assumed to be given in the query.

Another major distinction is whether the object considered is the whole sequence or a subsequence. For problems focusing on whole sequences [2, 30, 33, 42], not only the query time series but also all sequences from the dataset are assumed to have (almost) the same length. Our problems also make this assumption. For problems concerning subsequences [18], the query time series is generally much shorter than those in the dataset, which are also assumed to have arbitrary lengths.

Finally, an important distinction is whether the data is provided offline or collected online. On one hand, problems in the online setting [36, 42] typically address the varying nature of data streams and discover stationary submodules in the computation procedure to boost query efficiency. On the other hand, problems in the offline setting [30, 33], including ours, usually deal with the difficulties brought by the massive data size.

Dimension reduction. As far as we know, we are the first to suggest using function approximation, rather than algorithms for feature extraction, for dimension reduction in time series indexing and correlation search problems. The application of dimension reduction techniques to improve time series correlation search efficiency was first introduced by Agrawal et al. [2]. DFT and the discrete wavelet transform (DWT) are the two most commonly used methods. The usage of DFT was introduced by Agrawal et al. [2], who also constructed an R*-tree for indexing the transformed Fourier coefficients. Chan and Fu [8], on the other hand, were the first to use wavelet analysis for time series embedding. Regardless of the significant difference between the DFT and DWT approaches, it was shown in a later paper by Wu et al. [39] that they have similar performance on query precision. Therefore, in this paper we only consider DFT as our baseline method.

We also notice that since Agrawal et al. [2], researchers have suggested many different dimensionality reduction techniques to provide tighter approximation bounds for pruning. To name a few, there are Adaptive Piecewise Constant Approximation (APCA) [24], Piecewise Aggregate Approximation (PAA) [23], multi-scale segment mean (MSM) [29], Chebyshev polynomials (CP) [7], and Piecewise Linear Approximation (PLA) [10]. However, according to the comparison work by Ding et al. [15], none of these significantly outperform DFT and DWT.

Similarity measures. Though simple, Pearson correlation and Euclidean distance (l2-norm) are still the most commonly used similarity measures in the literature. We also focus on this measure in this paper. However, Euclidean distance suffers from drawbacks such as requiring a fixed time series length, and its applications in general are rather restricted. Therefore, in many applications, a more flexible similarity measure such as Dynamic time warping (DTW) [35] is more appropriate. DTW is an algorithm for similarity computation between general time series and can deal with time series with varying lengths and local time shifting. One of the major drawbacks of DTW is its computational complexity; therefore, most related research on DTW develops and discusses its scalable application to massive datasets [5, 25, 33, 40].

There are also other similarity measures such as the Longest Common Subsequence (LCSS) [6, 38], but rather than concerning generalizability, they either focus on a specific type of data or on problems related to the l2-norm and DTW. They therefore did not attract much attention in the past decade. Interested readers are again referred to the comparison work by [15] for more details on similarity measures.

Neural network approximation. With the recent successes in deep learning, neural networks have again become the top choice for function approximators. Recent work by Kraska et al. [27] further illustrates that neural networks, with proper training, can beat baseline database indexes in terms of speed and space requirements.

There are also more neural-network-based methods for time series related problems. Cui et al. [13] design a multi-scale convolutional neural network for time series classification. Zheng et al. [41] apply a convolutional neural network for distance metric learning and classification. Both works show that the designed structures can achieve baseline performance when data is sufficiently large. However, in these works the networks are supervised to learn the mapping from time series to the given labels, while in this paper the network is developed for unsupervised dimension reduction and feature extraction of the dataset structure.

Approximation and exact algorithms. When the data is small and fits in memory entirely, an exact algorithm such as MASS by Mueen et al. [31] can be used to efficiently retrieve the exact correlations. However, in applications like the ones running at the edge, when the data size is so big that it does not fit the available memory, we need a small index of the data which is several orders of magnitude smaller than the original size. One possible way to resolve the resource limitation issue is to down-sample the data to a desirable size that fits memory. This approach is similar to the embedding approach proposed in this paper, which trades off between accuracy and resource usage. However, as we will see in the experiments, down-sampling does not work well because it is not designed to optimize the objective of the problems of interest.

Metric learning for time series. Recently, there is a trend to use deep neural networks to learn metrics for different time-series mining tasks. For instance, Garreau et al. proposed to learn a Mahalanobis distance to perform alignment of multivariate time series [20], and Zhengping et al. [9] proposed a method for learning similarity between multivariate time series. Pei et al. [32] used recurrent neural networks to learn a time series metric for classification problems. Do et al. [16] proposed a metric learning approach for k-NN classification problems. Our method is tightly related to these works because it can be considered as a metric learning approach. However, the key difference between our work and the prior work is in the objective function we optimize. While the objectives in the prior arts are designed for classification or time-series alignment, our objectives are designed to approximate the correlation between time series in similarity search problems. Moreover, our significant contribution lies in the theoretical analysis of the approximation bounds for each specific type of similarity search query.

8 EXPERIMENTS
In this section, we discuss the experiment results. The DFT method is considered as the baseline and compared with the Chronos approximation and Chronos order methods on several different types of queries. Besides, we also compared the proposed approaches to the down-sample method, in which the time series were embedded into a small embedding space by down-sampling. The datasets and experimental settings were introduced in sections 6.4 and 6.5.

9 CONCLUSIONS AND FUTURE WORK
In this paper, we propose a general approximation framework for a variety of correlation search queries in time series datasets. In this framework, we use the Fourier transform and a neural network to embed time series into a low-dimensional Euclidean space, and construct nearest neighbor search structures in the embedding space for time series indexing to speed up queries. Our theoretical analysis illustrates that our method's accuracy can be guaranteed under certain regularity conditions, and our experiment results on real datasets indicate the superiority of our method over the baseline solution.

Several observations in this work are interesting and require more attention. First, the approximation accuracy of our method varies significantly across datasets. Therefore, it is crucial in later real-world applications to more systematically evaluate the applicability of our method and to design network structures for specific datasets. Second, the selection of the distance function d can be important in improving the performance of our embedding method, since it might be able to balance between the approximation capability of the neural network and the internal similarity within the datasets.

Our work can be further extended in several directions. First, our method can only deal with time series of equal length. However, a large number of real-world time series are sampled with irregular time frequency or during arbitrary time periods. Therefore, embedding methods that can deal with general time series should be explored in the future. Second, currently we only consider Pearson correlation as the evaluation metric for similarity. We can investigate the applicability of our framework to other types of similarity measures such as the dynamic time warping (DTW) measure. Finally, in some of the aforementioned applications such as time series forecasting, correlation search for the target time series might not provide the most essential information for improving the ultimate objectives. End-to-end systems explicitly connecting correlation search techniques and application needs might therefore be more straightforward and effective.

REFERENCES
[1] Yahoo finance. https://finance.yahoo.com/. Accessed: 2017-06-30.
[2] Agrawal, R., Faloutsos, C., and Swami, A. Efficient similarity search in sequence databases. In International Conference on Foundations of Data Organization and Algorithms (1993), Springer, pp. 69–84.
[3] Bahdanau, D., Cho, K., and Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014).
[4] Bentley, J. L. Multidimensional binary search trees used for associative searching. Communications of the ACM 18, 9 (1975), 509–517.
[5] Berndt, D. J., and Clifford, J. Using dynamic time warping to find patterns in time series. In KDD workshop (1994), vol. 10, Seattle, WA, pp. 359–370.
[6] Boreczky, J. S., and Rowe, L. A. Comparison of video shot boundary detection techniques. Journal of Electronic Imaging 5, 2 (1996), 122–129.
[7] Cai, Y., and Ng, R. Indexing spatio-temporal trajectories with Chebyshev polynomials. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (2004), ACM, pp. 599–610.
[8] Chan, K.-P., and Fu, A. W.-C. Efficient time series matching by wavelets. In Data Engineering, 1999. Proceedings., 15th International Conference on (1999), IEEE, pp. 126–133.
[9] Che, Z., He, X., Xu, K., and Liu, Y. Decade: A deep metric learning model for multivariate time series.
[10] Chen, Q., Chen, L., Lian, X., Liu, Y., and Yu, J. X. Indexable PLA for efficient similarity search. In Proceedings of the 33rd International Conference on Very Large Data Bases (2007), VLDB Endowment, pp. 435–446.
[11] Chen, Y., Keogh, E., Hu, B., Begum, N., Bagnall, A., Mueen, A., and Batista, G. The UCR time series classification archive, July 2015. https://www.cs.ucr.edu/~eamonn/time_series_data/.
[12] Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078 (2014).
[13] Cui, Z., Chen, W., and Chen, Y. Multi-scale convolutional neural networks for time series classification. arXiv preprint arXiv:1603.06995 (2016).
[14] Dallachiesa, M., Palpanas, T., and Ilyas, I. F. Top-k nearest neighbor search in uncertain data series. Proceedings of the VLDB Endowment 8, 1 (2014), 13–24.
[15] Ding, H., Trajcevski, G., Scheuermann, P., Wang, X., and Keogh, E. Querying and mining of time series data: experimental comparison of representations and distance measures. Proceedings of the VLDB Endowment 1, 2 (2008), 1542–1552.
[16] Do, C., Chouakria, A. D., Marié, S., and Rombaut, M. Temporal and frequential metric learning for time series kNN classification. In Proceedings of the 1st International Workshop on Advanced Analytics and Learning on Temporal Data, AALTD 2015, co-located with ECML PKDD 2015, Porto, Portugal, September 11, 2015 (2015).
[17] Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., and Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 2625–2634.
[18] Faloutsos, C., Ranganathan, M., and Manolopoulos, Y. Fast subsequence matching in time-series databases, vol. 23. ACM, 1994.
[19] Friedman, J. H., Bentley, J. L., and Finkel, R. A. An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software (TOMS) 3, 3 (1977), 209–226.
[20] Garreau, D., Lajugie, R., Arlot, S., and Bach, F. Metric learning for temporal sequence alignment. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 1817–1825.
[21] Glorot, X., and Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (2010), pp. 249–256.
[22] Hochreiter, S., and Schmidhuber, J. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.
[23] Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems 3, 3 (2001), 263–286.
[24] Keogh, E., Chakrabarti, K., Pazzani, M., and Mehrotra, S. Locally adaptive dimensionality reduction for indexing large time series databases. ACM SIGMOD Record 30, 2 (2001), 151–162.
[25] Keogh, E., and Ratanamahatana, C. A. Exact indexing of dynamic time warping. Knowledge and Information Systems 7, 3 (2005), 358–386.
[26] Kingma, D., and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
[27] Kraska, T., Beutel, A., Chi, E. H., Dean, J., and Polyzotis, N. The case for learned index structures. arXiv preprint arXiv:1712.01208 (2017).
[28] Li, Y., Yiu, M. L., and Gong, Z. Discovering longest-lasting correlation in sequence databases. Proceedings of the VLDB Endowment 6, 14 (2013), 1666–1677.
[29] Lian, X., Chen, L., Yu, J. X., Han, J., and Ma, J. Multiscale representations for fast pattern matching in stream time series. IEEE Transactions on Knowledge and Data Engineering 21, 4 (2009), 568–581.
[30] Mueen, A., Nath, S., and Liu, J. Fast approximate correlation for massive time-series data. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data (2010), ACM, pp. 171–182.
[31] Mueen, A., Zhu, Y., Yeh, M., Kamgar, K., Viswanathan, K., Gupta, C., and Keogh, E. The fastest similarity search algorithm for time series subsequences under Euclidean distance, August 2017. http://www.cs.unm.edu/~mueen/FastestSimilaritySearch.html.
[32] Pei, W., Tax, D. M. J., and van der Maaten, L. Modeling time series similarity with siamese recurrent networks. CoRR abs/1603.04713 (2016).
[33] Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., and Keogh, E. Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2012), ACM, pp. 262–270.

[34] Rudin, W., et al. Principles of Mathematical Analysis, vol. 3. McGraw-Hill, New York, 1964.
[35] Sakoe, H., and Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing 26, 1 (1978), 43–49.
[36] Sakurai, Y., Papadimitriou, S., and Faloutsos, C. Braid: Stream mining through group lag correlations. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (2005), ACM, pp. 599–610.
[37] Snoek, J., Larochelle, H., and Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems (2012), pp. 2951–2959.
[38] Vlachos, M., Kollios, G., and Gunopulos, D. Discovering similar multidimensional trajectories. In Data Engineering, 2002. Proceedings. 18th International Conference on (2002), IEEE, pp. 673–684.
[39] Wu, Y.-L., Agrawal, D., and El Abbadi, A. A comparison of DFT and DWT based similarity search in time-series databases. In Proceedings of the Ninth International Conference on Information and Knowledge Management (2000), ACM, pp. 488–495.
[40] Yi, B.-K., Jagadish, H., and Faloutsos, C. Efficient retrieval of similar time sequences under time warping. In Data Engineering, 1998. Proceedings., 14th International Conference on (1998), IEEE, pp. 201–208.
[41] Zheng, Y., Liu, Q., Chen, E., Zhao, J. L., He, L., and Lv, G. Convolutional nonlinear neighbourhood components analysis for time series classification. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (2015), Springer, pp. 534–546.
[42] Zhu, Y., and Shasha, D. StatStream: Statistical monitoring of thousands of data streams in real time. In VLDB '02: Proceedings of the 28th International Conference on Very Large Databases (2002), Elsevier, pp. 358–369.