Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding

Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding Shohreh Deldari Daniel V. Smith [email protected] [email protected] School of Computing Technologies, RMIT University Data61, CSIRO Melbourne, VIC, Australia Hobart, TAS, Australia Hao Xue Flora D. Salim [email protected] [email protected] School of Computing Technologies, RMIT University School of Computing Technologies, RMIT University Melbourne, VIC, Australia Melbourne, VIC, Australia ABSTRACT 1 INTRODUCTION Change Point Detection (CPD) methods identify the times associ- The ubiquity of digital technologies along with the substantial ated with changes in the trends and properties of time series data processing power and storage capacity on offer means we cur- in order to describe the underlying behaviour of the system. For rently have an unprecedented ability to access and analyse data. instance, detecting the changes and anomalies associated with web The scale and velocity in which data is being stored and shared, service usage, application usage or human behaviour can provide however, means that we often lack the resources to utilise tradi- valuable insights for downstream modelling tasks. We propose a tional data curation processes. For instance, in supervised machine novel approach for self-supervised Time Series Change Point de- learning approaches, the data annotation process can be an ex- tection method based on Contrastive Predictive coding ()( − 퐶%2). pensive, unwieldy and inaccurate one. Consequently, this is why )( − 퐶%2 is the first approach to employ a contrastive learning self-supervised and unsupervised learning methods are currently strategy for CPD by learning an embedded representation that sep- hot topics in the machine learning community where the goal is to arates pairs of embeddings of time adjacent intervals from pairs maximise the value of raw data. of interval embeddings separated across time. Through extensive Change point detection (CPD), an analytical method to identify experiments on three diverse, widely used time series datasets, the times associated with abrupt transitions of a series can be used to we demonstrate that our method outperforms five state-of-the-art extract meaning from non-annotated data. Change points, whether CPD methods, which include unsupervised and semi-supervised they have been generated from video cameras, microphones, en- approaches. )( − 퐶%2 is shown to improve the performance of vironmental sensors or mobile applications can provide a critical methods that use either handcrafted statistical or temporal features understanding of the underlying behaviour of the system being by 79.4% and deep learning-based methods by 17.0% with respect modelled. For instance, change points can represent alterations in to the F1-score averaged across the three datasets. the system state that might require human attention, such as a system fault or an upcoming emergency. Furthermore, CPD methods CCS CONCEPTS can be employed in related problems of temporal segmentation, • Computing methodologies ! Machine learning; Unsuper- event detection and temporal anomaly detection. vised learning; Anomaly detection; Learning latent representations; • Information systems ! Data stream mining. representation KEYWORDS mismatch arXiv:2011.14097v5 [cs.LG] 5 Mar 2021 Unsupervised learning, Time series change point detection, Anom- aly detection, Contrastive learning ACM Reference Format: Shohreh Deldari, Daniel V. Smith, Hao Xue, and Flora D. Salim. 2021. Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding. In Proceedings of the Web Conference 2021 (WWW ’21), April 19– 23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages. https: //doi.org/10.1145/3442381.3449903 History Future This paper is published under the Creative Commons Attribution 4.0 International Xt Xt+1 (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution. WWW ’21, April 19–23, 2021, Ljubljana, Slovenia © 2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License. Figure 1: Overview of presented change point detection ap- ACM ISBN 978-1-4503-8312-7/21/04. https://doi.org/10.1145/3442381.3449903 proach based on predictive representation learning. WWW ’21, April 19–23, 2021, Ljubljana, Slovenia Shohreh Deldari, Daniel V. Smith, Hao Xue, Flora D. Salim CPD techniques have been applied to multivariate time series of the data making it applicable to a broad range of real-world data in a broad range of research areas including network traffic applications. The main contributions of our paper are as follows: analysis [27], IoT applications and smart homes [3], human activity • We leverage contrastive learning as an unsupervised objec- recognition (HAR) [5, 13, 29, 42, 46], human physiological and emo- tive function for the CPD task. To the best of our knowledge, tional analysis [13], factory automation [51], trajectory prediction we are the first to employ contrastive learning to the CPD [37], user authentication [22], life-logging [7], elderly rehabilitation problem. [28], and daily work routine studies [12]. In addition to time series, • We propose a representation learning framework to tackle CPD can applied to other data modalities with a temporal dimen- the problem of self-supervised CPD by capturing compact, sion, such as video, where it has been used for video captioning latent embeddings that represent historical and future time [14, 16] and video summarising [1, 47] applications. intervals of the times series. Change points are commonly estimated from one of a number of • We compare our proposed method against five state of the different properties of a time series, including its temporal continu- art CPD methods, which include deep learning and non deep ity, distribution or shape. Unsupervised CPD methods are generally learning based methods, investigate the benefits of each developed to identify changes based upon one particular property. through extensive experiments. For instance, FLOSS [18] was developed to detect changes in the • We investigate the performance impact of the hyperparame- temporal shape, whilst RuLSIF [30] and aHSIC [52] were devel- ters used within our self-supervised learning method includ- oped to identify changes in the statistical distribution. Current CPD ing batch size, code size, and window size. methods have failed to generalise effectively [13] as the semantic To make )( − 퐶%2 reproducible, all the code, data and experi- boundaries of different applications will usually be associated with ments are available in the project’s web page 1. different time series properties. For example, abnormalities inthe rhythm of the human heart are best characterised by changes in the 2 RELATED WORK AND BACKGROUND temporal shape pattern of an electrocardiogram (ECG), whereas changes in human posture (as measured with an RFID sensor sys- In this section, we review existing approaches for the CPD prob- tem) are best characterised by abrupt statistical changes. In this lem. Since we employ contrastive learning for our time series case, the detection performance degrades when a statistical CPD change point detection method, we also outline recent works on method is applied to the heart beat application, whilst shape based self-supervised contrastive learning. We will then review recent CPD methods will fail in the human posture application. Further- representation learning approaches, not only for time series data, more, for many applications in which data is continuously collected, but other data modalities as well. time series with be characterised by slowly varying temporal shape and statistical properties. The change points associated with such 2.1 Time series change point detection time series can be subtle and remain a challenge for CPD methods Although self-supervised learning methods have recently attracted to address. the interest of the deep learning community, current CPD methods In this work, we propose )( − 퐶%2 , a novel approach for self- are mostly based on non deep learning approaches yet. supervised Time Series Change Point detection method based on Existing approaches can be categorised based upon the features Contrastive Predictive coding. We pose the question of whether of the time series that they consider for CPD. Statistical methods self-supervised learning can be used to provide an effective, general often compute change points on the basis of identifying statistical representation for CPD. The intuition here is to exploit the local differences between adjacent short intervals of a time series. The correlation present within a time series by learning a representa- statistical differences between intervals are usually measured with tion that maximises the shared information between contiguous either parametric or non-parametric approaches. Parametric meth- time intervals, whilst minimising the shared information between ods use a Probability Density Function (PDF) such as [4] or auto- pairs of time intervals that are separated in time (i.e. pairs of time regressive model [53] to represent the time intervals, however, such intervals with less correlation). It is hypothesised that whenever the convenient representations limit the types of statistical changes learnt representation differs significantly between time adjacent that can be detected. Non-parametric methods offer a greater degree

Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support