Contextual Time Series Change Detection
Total Page:16
File Type:pdf, Size:1020Kb
Contextual Time Series Change Detection Xi C. Chen⇤ Karsten Steinhaeuser⇤ Shyam Boriah⇤ Snigdhansu Chatterjee† Vipin Kumar⇤ Abstract scribed as a target time series behaving similarly to the Time series data are common in a variety of fields ranging related series for some period of time but then diverging from economics to medicine and manufacturing. As a result, from them. Fig. 1 and Fig. 2 illustrate several examples time series analysis and modeling has become an active of contextually changed and unchanged time series. In research area in statistics and data mining. In this paper, both of the figures, the black line indicates the target we focus on a type of change we call contextual time series time series and gray lines show the related time series. change (CTC) and propose a novel two-stage algorithm Fig. 1 shows three types of contextually changed time to address it. In contrast to traditional change detection series. In Type 1, the target series changes abruptly methods, which consider each time series separately, CTC around time step t = 100 while the context remains is defined as a change relative to the behavior of a group relatively stable; in Type 2, the context exhibits a col- of related time series. As a result, our proposed method is lective change whereas the target series remains stable; able to identify novel types of changes not found by other and in Type 3, although both the target time series and algorithms. We demonstrate the unique capabilities of our its context keep changing during the whole period, after approach with several case studies on real-world datasets t = 100, their behaviors are no longer similar – that is, from the financial and Earth science domains. they begin to diverge. The last two types of changes are uniquely addressed by CTC detection and cannot be 1 Introduction found by traditional time series change detection algo- rithms. For comparison, Fig. 2 shows two types of con- Time series data is ubiquitous in a wide range of textually unchanged time series. In the top panel, both applications from financial markets to manufacturing, the target time series and its context are stable while in from health care to the Earth sciences, and many the bottom panel, both of them change similarly during others. As a result, time series analysis and modeling the whole period. Since the target time series does not has become an active area of research in statistics diverge from its context in ether of these scenarios, they and data mining [1, 11, 14]. Of particular interest are considered to be contextually unchanged. within this realm are the problems of change detection CTC detection is useful in many real-world settings. [3, 6, 8, 9]. In this paper, we focus on a type of For example, consider a collection of sensors that mea- change we call contextual time series change (CTC) sure the temperature in a factory. It is very likely that and present a novel approach for addressing the CTC the sensors exhibit a diurnal pattern, with temperatures detection problem. Below we give some background on increasing during the morning hours and cooling in the change detection along with an intuitive description of evenings. Thus, a change in the sensor readings does a contextual change using a simple illustrative example; not necessarily constitute an event. However, if a single a formal definition of the problem follows in Section 2. sensor does not change similarly to the others, it may Traditional time series change detection is defined indicate a sensor failure or an unusual condition in the with respect to values in some portion of the same sensor environment. Similar situations arise in a wide time series. By contrast, contextual change refers to range of application settings. a deviation in behavior of an object (the target time The key contributions of this paper can be sum- series) with respect to its context. The context consists marized as follows: of a collection of time series that exhibit similar behavior to the target time series for some period of time. 1. We provide a formal definition of the problem we Simply put then, a contextual change can be de- call contextual time series change (CTC)detec- Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php tion, which is distinct from traditional time series ⇤Department of Computer Science & Engineering, University change detection and is not properly addressed by of Minnesota. chen,ksteinha,sboriah,kumar @cs.umn.edu existing approaches. { } †School of Statistics, University of Minnesota. chatter- [email protected] 2. We propose a new similarity function called kth 503 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. Type 1 20 20 10 0 0 Value −10 Value −20 −20 0 50 100 150 Time 0 50 100 150 30 Time Type 2 20 10 Value 20 0 −10 0 50 100 150 Value 0 Time Figure 2: Illustration of contextually unchanged time −20 series. In both of the scenarios, the target time series 0 50 100 150 (black) behaves similarly to their context (grey) during the Time entire period. Note that the target time series in the lower Type 3 panel may be considered as changed by traditional change detection schemes. 20 concluding remarks and directions for future work. See 0 [7] for an extended version of this paper that contains Value additional details not covered here due to space limita- −20 tion. 0 50 100 150 2 Problem Formulation & Definitions Time Figure 1: Illustration of di↵erent types of contextual In this section we formally define the problem of con- changes in time series. In general, the target time series textual time series change (CTC) detection, which can (black) behaves di↵erently from the context after t =100in be used on datasets with the following properties: all the scenarios. Type 2 and Type 3 are uniquely addressed by CTC change detection algorithms. 1. The input data is real-valued, i.e., we do not consider binary or symbolic data. order statistic distance. We also provide a method to estimate the best k assuming that the 2. All objects are observed at the same time steps. probability of a time step is an outlier can be The time steps do not need to be regularly spaced. estimated from domain knowledge. 3. Although the data volume may be large, we assume 3. We derive a new metric called time series area that new data is arriving at a rate such that it depth to measure the deviation of the target series can be stored o↵-line for subsequent processing, from the context group members. Time series area i.e., the one-pass streaming data setting is not depth–which is anti-correlated to the probability considered in the current problem formulation. In that the target object belongs to a group–is a good particular, it is explicitly assumed that all objects indicator of contextual change. and observations seen until the current time step are readily accessible. 4. We perform qualitative and quantitative evalua- tion of the proposed method with datasets from We define the following notation: O = o1,o2, is a target time series; X = x ,x , is any{ time···} se- two di↵erent real-world domains. { 1 2 ···} ries in the dataset except O; T = ti1 ,ti2 , indicates The remainder of this paper is organized as follows. a time interval; and O and X are{ subsequences···} of O Downloaded 06/09/17 to 73.242.127.91. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php T T In Section 2, we formally define the CTC detection prob- and X during time period T ,respectively. lem. Section 3 presents the related work. Section 4 Generally speaking, an object changes contextually describes the technical approach, followed by an exper- if its behavior changes in a di↵erent way compared with imental evaluation in Section 5. Section 6 closes with its context. The behavior here is characterized by a 504 Copyright © SIAM. Unauthorized reproduction of this article is prohibited. (univariate or multivariate) time series, which records [5]. The basic idea behind PGA is to check whether or the evolution of some property of the object over time. not a time step in a time series departs from an expected The context is defined by other “similar” time series pattern. The expected pattern is defined by values of in CTC detection. We consider both time series O and the peer group time series at that time step. Peer group group G as stochastic, and Tc and Ts as parameters. of a time series is defined as the k nearest neighbors of We call G the dynamic peer group of O, T the context the time series using the first n time steps (n 1). c ≥ construction period, and Ts the scoring period.InSince the goal of PGA is primarily fraud detection, definition 1 below we make a statement about the joint these methods focus on scoring a single time step in measure induced by O and G under di↵erent parametric the detection period1 [5, 16]. There are two major conditions. This is essentially a way of defining the di↵erences between our approach and PGA. First, in dynamic grouping, where in the context construction PGA the peer group of a time series is unchangeable period O stays in G almost surely, but may not stay throughout the analysis. Whereas in our scheme, peer within G in the scoring period. Formally, group is constructed dynamically at each time step, which allows our scheme to handle cases where the Definition 1.