Contextual Time Series Change Detection
Xi C. Chen⇤ Karsten Steinhaeuser⇤ Shyam Boriah⇤ Snigdhansu Chatterjee† Vipin Kumar⇤
Abstract scribed as a target time series behaving similarly to the Time series data are common in a variety of fields ranging related series for some period of time but then diverging from economics to medicine and manufacturing. As a result, from them. Fig. 1 and Fig. 2 illustrate several examples time series analysis and modeling has become an active of contextually changed and unchanged time series. In research area in statistics and data mining. In this paper, both of the figures, the black line indicates the target we focus on a type of change we call contextual time series time series and gray lines show the related time series. change (CTC) and propose a novel two-stage algorithm Fig. 1 shows three types of contextually changed time to address it. In contrast to traditional change detection series. In Type 1, the target series changes abruptly methods, which consider each time series separately, CTC around time step t = 100 while the context remains is defined as a change relative to the behavior of a group relatively stable; in Type 2, the context exhibits a col- of related time series. As a result, our proposed method is lective change whereas the target series remains stable; able to identify novel types of changes not found by other and in Type 3, although both the target time series and algorithms. We demonstrate the unique capabilities of our its context keep changing during the whole period, after approach with several case studies on real-world datasets t = 100, their behaviors are no longer similar – that is, from the financial and Earth science domains. they begin to diverge. The last two types of changes are uniquely addressed by CTC detection and cannot be 1 Introduction found by traditional time series change detection algo- rithms. For comparison, Fig. 2 shows two types of con- Time series data is ubiquitous in a wide range of textually unchanged time series. In the top panel, both applications from financial markets to manufacturing, the target time series and its context are stable while in from health care to the Earth sciences, and many the bottom panel, both of them change similarly during others. As a result, time series analysis and modeling the whole period. Since the target time series does not has become an active area of research in statistics diverge from its context in ether of these scenarios, they and data mining [1, 11, 14]. Of particular interest are considered to be contextually unchanged. within this realm are the problems of change detection CTC detection is useful in many real-world settings. [3, 6, 8, 9]. In this paper, we focus on a type of For example, consider a collection of sensors that mea- change we call contextual time series change (CTC) sure the temperature in a factory. It is very likely that and present a novel approach for addressing the CTC the sensors exhibit a diurnal pattern, with temperatures detection problem. Below we give some background on increasing during the morning hours and cooling in the change detection along with an intuitive description of evenings. Thus, a change in the sensor readings does a contextual change using a simple illustrative example; not necessarily constitute an event. However, if a single a formal definition of the problem follows in Section 2. sensor does not change similarly to the others, it may Traditional time series change detection is defined indicate a sensor failure or an unusual condition in the with respect to values in some portion of the same sensor environment. Similar situations arise in a wide time series. By contrast, contextual change refers to range of application settings. a deviation in behavior of an object (the target time The key contributions of this paper can be sum- series) with respect to its context. The context consists marized as follows: of a collection of time series that exhibit similar behavior to the target time series for some period of time. 1. We provide a formal definition of the problem we Simply put then, a contextual change can be de- call contextual time series change (CTC)detec-
Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php tion, which is distinct from traditional time series
⇤Department of Computer Science & Engineering, University change detection and is not properly addressed by of Minnesota. chen,ksteinha,sboriah,kumar @cs.umn.edu existing approaches. { } †School of Statistics, University of Minnesota. chatter- [email protected] 2. We propose a new similarity function called kth
503 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. Type 1 20 20 10
0
0 Value −10 Value −20 −20 0 50 100 150 Time 0 50 100 150 30 Time Type 2 20 10 Value 20 0
−10 0 50 100 150
Value 0 Time Figure 2: Illustration of contextually unchanged time −20 series. In both of the scenarios, the target time series 0 50 100 150 (black) behaves similarly to their context (grey) during the Time entire period. Note that the target time series in the lower Type 3 panel may be considered as changed by traditional change detection schemes. 20 concluding remarks and directions for future work. See 0 [7] for an extended version of this paper that contains
Value additional details not covered here due to space limita- −20 tion.
0 50 100 150 2 Problem Formulation & Definitions Time Figure 1: Illustration of di↵erent types of contextual In this section we formally define the problem of con- changes in time series. In general, the target time series textual time series change (CTC) detection, which can (black) behaves di↵erently from the context after t =100in be used on datasets with the following properties: all the scenarios. Type 2 and Type 3 are uniquely addressed by CTC change detection algorithms. 1. The input data is real-valued, i.e., we do not consider binary or symbolic data. order statistic distance. We also provide a method to estimate the best k assuming that the 2. All objects are observed at the same time steps. probability of a time step is an outlier can be The time steps do not need to be regularly spaced. estimated from domain knowledge. 3. Although the data volume may be large, we assume 3. We derive a new metric called time series area that new data is arriving at a rate such that it depth to measure the deviation of the target series can be stored o↵-line for subsequent processing, from the context group members. Time series area i.e., the one-pass streaming data setting is not depth–which is anti-correlated to the probability considered in the current problem formulation. In that the target object belongs to a group–is a good particular, it is explicitly assumed that all objects indicator of contextual change. and observations seen until the current time step are readily accessible. 4. We perform qualitative and quantitative evalua- tion of the proposed method with datasets from We define the following notation: O = o1,o2, is a target time series; X = x ,x , is any{ time···} se- two di↵erent real-world domains. { 1 2 ···} ries in the dataset except O; T = ti1 ,ti2 , indicates The remainder of this paper is organized as follows. a time interval; and O and X are{ subsequences···} of O Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php T T In Section 2, we formally define the CTC detection prob- and X during time period T ,respectively. lem. Section 3 presents the related work. Section 4 Generally speaking, an object changes contextually describes the technical approach, followed by an exper- if its behavior changes in a di↵erent way compared with imental evaluation in Section 5. Section 6 closes with its context. The behavior here is characterized by a
504 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. (univariate or multivariate) time series, which records [5]. The basic idea behind PGA is to check whether or the evolution of some property of the object over time. not a time step in a time series departs from an expected The context is defined by other “similar” time series pattern. The expected pattern is defined by values of in CTC detection. We consider both time series O and the peer group time series at that time step. Peer group group G as stochastic, and Tc and Ts as parameters. of a time series is defined as the k nearest neighbors of We call G the dynamic peer group of O, T the context the time series using the first n time steps (n 1). c construction period, and Ts the scoring period.InSince the goal of PGA is primarily fraud detection, definition 1 below we make a statement about the joint these methods focus on scoring a single time step in measure induced by O and G under di↵erent parametric the detection period1 [5, 16]. There are two major conditions. This is essentially a way of defining the di↵erences between our approach and PGA. First, in dynamic grouping, where in the context construction PGA the peer group of a time series is unchangeable period O stays in G almost surely, but may not stay throughout the analysis. Whereas in our scheme, peer within G in the scoring period. Formally, group is constructed dynamically at each time step, which allows our scheme to handle cases where the Definition 1. (Contextual Change) A time series O peer group of a time series changes with time. Second, changes contextually at time t if and only if the following because PGA methods focus on scoring a single time two conditions are met. step in the detection period, they are not as e↵ective 1. There exists a group of time series (G)andatime for detecting changes that are more reliably observed interval Tc = t k1,t k1 +1,...,t 1 for which over multiple consecutive time steps, especially in noisy { } data.
p(OTc GTc )=1 Temporal Outlier Detection (TOD) [10] also focuses 2 on detecting outliers within the context of other time 2. In the time interval Ts = t +1,t+2, ,t+ k2 series in the dataset. Instead of constructing a peer { ··· } group for each time series (as done in our approaches),
p(OTs GTs ) <✏ TOD maintains a temporal neighborhood vector that 2 records historical similarities of the target time series to where,p(O G ) and p(O G ) are the probabil- all other time series in the dataset. The outlier score of Tc 2 Tc Ts 2 Ts ity of O belonging to G in Tc and Ts, respectively. ✏ is the target time series at time step t is then given by the a user-defined threshold. L1 distance between the temporal neighborhood vector at time t and time t 1. Given the di↵erence in the To simplify the problem, we define G in such a way way that the target time series is compared with other that p(O G) 1. Since for kNN-based methods it Tc time series, TOD and our method are meant for finding is hard to2 control⇡ the quality of the group members, a entirely di↵erent types of patterns. For example, if two threshold-based definition is used. sets of time series have similar behavior over time within Definition 2. (Dynamic Peer Group) The group of each group, but their behavior as a group starts to di↵er time series X1, ,Xm constitutes a dynamic peer at a specific time, TOD may flag all these time series as { ··· } outliers, whereas, our scheme will consider all of these group G for a time series O in a time interval Tc if and only if for all j (1,m) time series as unchanged [7]. 2 dist(Xj ,O ) <✏ 4 The proposed approach Tc Tc g In this section, we present the details of our proposed dist(Xj ,O ) is an arbitrary distance metric that mea- Tc Tc approach for contextual time series change (CTC)de- sures the di↵erence between Xj and O . ✏ is a user- tection. The time series to be examined is called the Tc Tc g defined threshold. target time series. The proposed method is applied ex- haustively in every time step of every time series in the 3 Related Work dataset. A change score matrix is reported as the final The concept of analyzing the behavior of a target time output. The elements in the matrix indicate how much series in the context of a group of related series has been a specific time series has changed contextually in a given explored in prior researches [5, 10, 16], particularly for Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php the problems of fraud detection and temporal outlier 1In some schemes [5], original time series is transformed such detection. that the value at each new time step is a function of several preceding time steps. Then, each time step of this transformed Peer Group Analysis (PGA) is an unsupervised time series is scored individually, even though the score can be a fraud detection method proposed by Bolton and Hand function of multiple time steps of the original time series.
505 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. time step. A time series can be labeled as changed if as well. its score is larger than a threshold or its rank is smaller Most standard smoothing methods, such as moving than a threshold. average and Savitzky-Golay filter, cannot address this The proposed approach consists of two steps: con- problem. Instead of ignoring the incorrect information, text construction and scoring. Context construction is smoothing methods average the value of outliers into used to discover the dynamic peer group (G) for the several time steps. When the distance is calculated, a target time series (O) during the context construction fraction of the incorrect information is still used. In period (T ), ensuring that p(O G ) 1. In partic- order to overcome this problem, we propose a distance c Tc 2 Tc ⇡ ular, we build G for O in Tc based on a range query function as described below. method which naturally follows the definition of dy- namic peer group (see Definition 2). We propose a Definition 3. (kth order statistic distance) The kth function called kth order statistic distance (Section 4.1), order statistic distance between two subsequences OT which will be used in the range query method. The time and XT (Ok(OT ,XT )) can be calculated by the following series for which G is constructed successfully (satisfying steps. Condition 1 in Definition 1) are candidates for the sec- 1. Compute point-wise distance between O and X ond step. T T for each of time steps. The scoring mechanism provides a probabilistic
estimate of the extent to which the target time series 2. Rank all the time steps of OT and XT in the de- follows the behavior of its contextual neighbors in the creasing order according to the point-wise distance. scoring period (Ts). A non-parametric scoring function called time series area depth is used to measure the 3. Remove the first k 1 time steps in OT and XT . deviation of O from G in T (Section 4.2). This scoring s 4. O (O ,X ) is given by the predefined Minkowski method does not assume any particular distribution of k T T distance using the remaining time steps. the datasets and can be used in many di↵erent domains. The proposed method assumes that only a small The only parameter k a↵ects the performance of the number of time series in the dynamic peer group changes proposed context construction method in two di↵erent similarly as the target time series (O). For situations ways: Many time series are incorrectly included in G where this assumption cannot be met, we provide a when k is too large, while many real members in G are heuristic algorithm (Multimode Remover) to remove the excluded when k is too small. In other words, when k contextual neighbors that have similar behavior as O is smaller, the false positive rate of G (FPRG)islower (Section 4.3). but the true positive rate of G (TPRG) is larger. Thus, the performance of the proposed method is dependent 4.1 Context Construction The purpose of context on the choice of k. construction is to discover a dynamic peer group G In this paper, we provide a method to estimate the for a target time series O which can satisfy Definition best k assuming that the probability of a time step is an 2. Therefore, instead of attempting to obtain all the outlier can be estimated (in the extended version [7], we contextual neighbors of O, we aim to find enough time propose a supervised method based on cluster sampling series to adequately describe the context and ensure theory to estimate this probability). Specifically, we that p(O G ) 1. Although KNN-based methods Tc Tc propose a method to choose k which can minimize the easily find members2 ⇡ in G, it is di cult to control the FPRG given the condition that TPRG should be larger quality of the dynamic peer group and thus to meet the than a user-defined threshold. condition that p(O G ) 1. In this paper, we Tc 2 Tc ⇡ Assume that the observed time series X and O can choose a range query method, which is an intuitive way be modeled as to find G. The distance metric is the key component.
We use Minkowski distances in this paper to ensure all xi =ˆxi + bx olx + er members in G are similar to O in Euclidean space. · The existence of outliers is a major challenge when oi =ˆoi + bo olo + er using Minkowski distances. For example, L (used · 1 wherex ˆ ando ˆ are the true values of x and o . er is by Li et al. [10]) measures the largest distance of all i i i i weak white noise. ol and ol are the values of outliers. Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php the time steps during T and is thus highly sensitive to x o b and b are random variables indicating whether or outliers. Therefore, many members will incorrectly be x o not a time step is an outlier (1 indicates an outlier, 0 removed from G. Although the impact of outliers is not otherwise); they follow Bernoulli(px) and Bernoulli(po), as large as L , outliers a↵ect the L1 and L2 distance 1 respectively.
506 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. Lemma 1. Estimation of TPR is given by Lemma 2. When G is given, T AD(Ts) is inversely re- lated to p(OT GT ) under the following assumptions: k 1 s 2 s N N n n q (1 q) 1. If a time series O belongs to G,foranyi n n=1 ✓ ◆ i ,i ,... 2 X { 1 2 } where N is the number of time steps in T and q = oi = fi(G)+ i (1 p )(1 p ). Proof is available in [7]. x o where fi(G) is an arbitrary function of all the time series in G and = , , is a pure random Therefore, k is given by the smallest k which 1 2 process. { ···} satisfies k 1 N N n n 2. The probability for a given time series O still q (1 q) >thp n belongs to the dynamic peer group G at time ti Ts n=1 ✓ ◆ X is 2 4.2 Scoring Mechanism Change scoring is the sec- min( o ct , o cb ) ond major step in our CTC detection framework. In this exp 1 | i i| | i i | section, we introduce a new robust scoring mechanism, ct cb ! | i i | Time Series Area Depth (TAD), to measure the devia- p tion of a target object (O) from its dynamic peer group where all the notations are same as Definition 4. (G). Fundamentally, TAD is a scoring mechanism de- Proof is available in [7]. rived from statistical depth [17]. Next we briefly intro- Next, we will show that the two assumptions are duce the concept of statistical depth and then describe generally true. First, the temporal trend of O is similar TAD in detail. to the major trend of G that is constructed by the Statistical depth measures the position of a given proposed method. Since f (G) is defined as a function point relative to a data cloud, and evaluates how close i that represents the major trend of G at time step t , the given point is to the center of the data cloud. i f (G) can also be used as the trend of O. Considering We prefer a statistical depth function in the pro- i that the temporal trend is often responsible for most of posed method for three reasons. First, they are non- the temporal correlation in the time series, all the time parametric methods. Thus, we do not need to change steps in are nearly independent to each other (which the method for applications in di↵erent domains. Sec- is the first assumption). The probability given in the ond, they can be used in multivariate analysis. Al- second assumption is actually a variant of the Simplical though in the work presented in this paper we are Volume Depth (SVD). The properties of a statistical focused on univariate analysis, the ability to extend data-depth function are easily checked for this variant. the approach to multivariate analysis is under consid- In summary, the advantages to use TAD include: eration. Third, the property of a ne invariance en- sures robustness of our methodology with respect to the 1. TAD is inversely related to p(OTs GTs )under scale, rotation and location parameters of the underly- certain assumptions (See Lemma 2).2 ing probability distribution. In particular, it helps ad- dress the issue of unequal variance of the observations 2. TAD, although used in univariate analysis in this at di↵erent time steps. paper, can be easily expanded to multivariate datasets. Definition 4. (Time Series Area Depth) The Time Series Area Depth of a time series O given its dynamic 3. Experiments suggest the TAD is parsimonious, in the sense that it performs quite well even if the peer group G in Ts = ti1 , ,tiN is defined as { ··· } number of contextual neighbors available is not iN t b min( oi ci , oi ci ) larger than the dimension of the data (i.e., the T AD(Ts)= | | | | t b number of time steps in the scoring period). i=i ci ci X1 | | t b p th 4.3 Dealing with Multiple Modes In some cases, where ci (ci ) is the value of the top (bottom) m contextual neighbors may change similarly to the target percentile contour of G at ti. time series. Thus, the dynamic peer group may exhibit Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php m is the parameter to be chosen when using TAD. multimodal behavior after the time of change. In this Here, we choose m = 68 in general because it corre- situation, scores given by TAD will tend to underesti- sponds to one standard deviation if the dataset follows mate the separation. Figure 3a shows an example of normal distribution. this problem wherein 5 out 20 time series (gray) in the
507 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. 30 30 5 Experimental Results 25 25 In this section, we demonstrate the capabilities of the 20 20
15 15 proposed CTC detection algorithm on a variety of real- 10 10 world datasets from the financial and Earth science Value 5 Value 5 domains. 0 0
−5 −5 −10 −10 5.1 Event Detection based on Historical Stock −15 −15 0 50 100 150 0 50 100 150 Time Time Market Data We start with a case study using his- (a) Without Multimode Re- (b) With Multimode Remover torical stock market data. In particular, we consider mover algorithm. algorithm. the weekly closing stock prices of S&P 500 companies Figure 3: A target time series (black) together with its over the 10-year period from January 2000 through De- th dynamic peer group G (gray). The blue lines are the 68 cember 2009. This data is publicly available from the percentile contour of G after t = 100 found without (a) and Yahoo! Finance website. Since individual stock prices with (b) Multimode Remover algorithm. di↵er in scale and exhibit significant variance, we nor- malize the raw time series (R) such that dynamic peer group change similarly as the target time series (black) at t = 100. Thus, the 68th percentile con- r min(R) x = i tour (the upper and lower bounds of the contour are i max(R) min(R) shown as blues lines) is misleading. One potential method to address this problem is to where ri is the stock price at time ti, and max(R) and find the major mode (we assume that the behaviors of min(R) are the maximum and minimum values in R, the majority time series in G are similar to each other respectively. afterwards). We propose an iterative algorithm called In this experiment, we first show the di↵erences Multimode Remover to obtain the mth percentile con- between the contextual changes detected by the pro- tour for the largest mode. Essentially, we first remove posed method and the traditional changes detected by the extreme 10% of time series based on their distance CUSUM using a case study of Pinnacle West Capital to the center of G for each time step. This procedure Corporation. Then, we briefly discuss the performance is iteratively performed until the mean value of G sta- of the proposed CTC detection method on the historical bilizes. The detailed method is listed in Algorithm 1. stock market data. The parameters used in this exper- th Figure 3b shows the 68 percentile contour of the ma- iment is as below. Tc = 100, Ts = 100 and k = 1. jor mode (blue lines) reported by Multimode Remover for the same data used in Figure 3a. Comparing the two 5.1.1 Case Study of Pinnacle West Capital Cor- figures, we see that the Multimode Remover algorithm poration To show the di↵erence between contextual successfully identifies the major mode. changes and traditional changes detected in the stock market data, we compare the performance of our pro- posed method with CUSUM2 using the weekly clos- Algorithm 1 Multimode Remover ing stock prices of Pinnacle West Capital Corporation j1 j2 Input: G = X X and Ts. { ··· } (Symbol: PNW). Output: C (mth percentile contour of the major mode in Figure 4 shows the time series of the stock price G in T ). s together with its mean, CUSUM score, and its dy- for every ti Ts do 2 th namic peer group constructed by our proposed method. Gi contains the i time step of all the members in G. while 1 do CUSUM detects the point around which the mean of the time series shifts. For this specific example, the Calculate µ (the mean of Gi). Remove 10% values in Gi whose distance are changes detected by CUSUM are around the starting largest. point of the global financial crisis from 2007 to 2012. Calculate µ’ (the mean of the “new” Gi). Instead of considering whether or not the pattern of the if µ0 µ