
Neural Networks 19 (2006) 877–888

Unfolding preprocessing for meaningful time series clustering

Geoffroy Simon a,∗,1, John A. Lee a,2, Michel Verleysen a,b,3

a Machine Learning Group - DICE, Université Catholique de Louvain, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium
b SAMOS-MATISSE, UMR CNRS 8595, Université Paris I - Panthéon Sorbonne, Rue de Tolbiac 90, F-75634 Paris Cedex 13, France

∗ Corresponding author. E-mail address: [email protected] (G. Simon). URL: http://www.dice.ucl.ac.be/∼gsimon/ (G. Simon).
1 G. Simon is funded by the Belgian F.R.I.A.
2 J.A. Lee is a Scientific Research Worker of the Belgian F.N.R.S.
3 M. Verleysen is a Research Director of the Belgian F.N.R.S.

Abstract

Clustering methods are commonly applied to time series, either as a preprocessing stage for other methods or in their own right. In this paper it is explained why time series clustering may sometimes be considered meaningless. This problematic situation is illustrated for various raw time series. The unfolding preprocessing methodology is then introduced, and its usefulness is illustrated for various time series. The experimental results show the meaningfulness of the clustering when applied to adequately unfolded time series.

Keywords: Clustering; Time series; Self-organizing maps

1. Introduction

Time series analysis is a task encountered in many fields, for example mathematics, statistics, econometrics, system identification, physics and machine learning, to cite only a few. In all these fields there exist various methods of analyzing, describing, extracting features from or even forecasting time series. Among all these data analysis tools, the partitioning, vector quantization and similarity search methods can be grouped under the general heading of clustering methods.

Clustering is usually applied to time series in two different ways. In the context of time series prediction, for example, clustering can be applied to time series either as preprocessing for another time series prediction model ((Cottrell, Girard, & Rousset, 1997; Vesanto, 1997) or (Dablemont et al., 2003) for example) or as a prediction model in its own right ((Martinetz, Berkovich, & Schulten, 1993; Simon, Lendasse, Cottrell, Fort, & Verleysen, 2004; Walter, Ritter, & Schulten, 1990) to cite only a few references). In practice most of the clustering methods work with subsequences of the initial time series. Such subsequences are vectors of successive past values that can be extracted from the series using a sliding window. In the remainder of the text, vectors of successive past values are called regressors.

Despite the frequent use of these clustering methods on time series, very few works try to answer fundamental questions such as the meaningfulness of time series clustering. Recently it has been proved that in some cases clustering on time series is meaningless (Keogh, Lin, & Truppel, 2003). Some authors used this proof to conclude that subsequence or regressor clustering is always meaningless (Bagnall, Janakec, & Zhang, 2003; Hetland, 2003; Mahoney & Chan, 2005). It is shown in Struzik (2003) that time series clustering is not meaningless but rather difficult due to the auto-similarity of the series. It is also proved in Denton (2004) that kernel-based clustering methods are able to provide meaningful clustering for time series. In this paper it will be shown that clustering is indeed meaningful once the regressors are unfolded, i.e. when some adequate preprocessing is performed.

In order to show the meaningfulness of clustering, the unfolding preprocessing is introduced. This methodology is based on embedding (Sauer, Yorke, & Casdagli, 1991). The main idea of embedding is to use delayed coordinates in the regressors. Selection of the optimal delay can be done using either the autocorrelation function (Abarbanel, 1997; Kantz & Schreiber, 1997; Kaplan & Glass, 1995) or mutual information (Abarbanel, 1997; Fraser & Swinney, 1986; Kantz & Schreiber, 1997). The limit of these two common approaches is that the first lag is usually selected with great care, but the influence of the other delayed components (selected using multiples of this lag) is not taken into account.


In this paper a measure is proposed to determine whether the selected delay is sufficient to unfold the series efficiently in a p-dimensional space of regressors. The idea of the proposed criterion is to observe the mean distance of a data set to the principal diagonal of its space. The whole methodology allows creating unfolded regressors with much more informative content than those obtained using the simple sliding window technique. The meaningfulness of time series clustering on these unfolded time series will then be shown on artificial and real time series using a Self-Organizing Map (Kohonen, 1995). Self-Organizing Maps are used here as a clustering tool thanks to their vector quantization property. The meaningfulness of time series clustering will be observed using criteria that measure the relative differences between regressor distributions.

The paper is organized as follows. Notations and definitions are given in Section 2, which ends with a brief description of Self-Organizing Maps. The criterion to measure the meaningfulness of clustering is detailed in Section 3. The so-called 'meaninglessness' of time series clustering is shown in the same section using graphical representations in 2- and 3-dimensional spaces. The unfolding preprocessing methodology is presented and illustrated in Section 4. Experimental results are provided in Section 5. Section 6 concludes the paper.

2. Regressor clustering using Self-Organizing Maps

2.1. Time series and regressors: Definitions and notations

A time series S is a series of values $x_t$, with $1 \le t \le n$, measured from a time varying process. The $x_t$ values are ordered according to the time index and sampled with constant frequency. Note that, for reasons of simplicity, $x_t$ is considered as scalar here, though it could be a multidimensional vector.

In the context of time series prediction one has to build a model of the series. For example, regression models have at least one output and some inputs corresponding either to past values $x_t$ of the time series itself or to past values $u_t$ of other related processes. These inputs are grouped in a vector describing the state of the process at a given time t, defined as:

$x^t_{t-p+1} = \{x_t, x_{t-1}, \ldots, x_{t-p+1}, u_t, u_{t-1}, \ldots, u_{t-q+1}\}.$  (1)

The (p + q)-dimensional vector $x^t_{t-p+1}$ is called the regressor. For reasons of simplicity the regressors will be limited in this paper to p-dimensional vectors of past values:

$x^t_{t-p+1} = \{x_t, x_{t-1}, \ldots, x_{t-p+1}\}.$  (2)

Intuitively, the regressors can be seen as subsequences obtained from the initial time series using a sliding window.

The question regarding how to choose the length p of the regressor is decisive for the model. This question can be answered elegantly for linear models using various criteria (AIC, BIC, MDL, etc. (Ljung, 1999)), which may be extended in a nonlinear context (Kantz & Schreiber, 1997; Ljung, 1999). Model structure selection strategies using statistical resampling methods (like cross-validation, k-fold cross-validation, bootstrap (Efron & Tibshirani, 1993)) are also often used in order to find a correct value for the model order p. Another approach to choose p is to estimate the dimension of the time series using the correlation dimension, as proposed in Ding, Grebogi, Ott, Sauer, and Yorke (1993), Grassberger and Procaccia (1983) and Sauer et al. (1991) and illustrated in Babloyantz, Nicolis, and Salazar (1985) and Camastra and Colla (1999), to cite only a few. However, the question of choosing an adequate value of p is quite independent from the unfolding goal of this work; p will therefore be deemed to be fixed a priori throughout the rest of this paper.
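To make the regressor construction concrete, a minimal NumPy sketch of the sliding-window formulation of Eq. (2) could look as follows; the function name and interface are ours, not part of the paper.

```python
import numpy as np

def build_regressors(x, p):
    # Sliding-window regressors of Eq. (2): each row is
    # (x_t, x_{t-1}, ..., x_{t-p+1}) for t = p-1, ..., n-1.
    x = np.asarray(x, dtype=float)
    return np.array([x[t - p + 1:t + 1][::-1] for t in range(p - 1, len(x))])

# Example: 3-dimensional regressors of a toy series.
series = np.sin(0.1 * np.arange(1000))
R = build_regressors(series, p=3)  # shape (998, 3)
```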
2.2. Self-Organizing Maps

The Self-Organizing Map (SOM) algorithm is an unsupervised classification algorithm proposed in the 1980s (Kohonen, 1995). Since its introduction it has been applied in many areas, from classification to robotics, to cite only two (Kohonen, 1995). The intuitive graphical representations that can be obtained from the SOM have made this tool a very popular nonlinear clustering method. Theoretical aspects of the SOM have also been studied in detail (Cottrell, Fort, & Pagès, 1998).

The main concepts of the SOM algorithm can be summarized as follows. During the learning stage, the SOM units, or prototypes, are moved within the data distribution. Each prototype is linked to neighbouring ones according to a 1- or 2-dimensional grid whose shape and size are chosen a priori. The learning consists in presenting each datum and selecting the closest prototype with respect to some distance measure. This winning prototype is moved towards the data while its neighbours are also moved according to constraints given by the grid. At the end of the learning stage, the final prototype positions represent a discrete and rough approximation of the data density, which is therefore partitioned into clusters. Each cluster is associated with one of the prototypes. The two main properties of the SOM are vector quantization and topology preservation. In this paper, the SOM is used as a clustering tool. The data to be clustered are the p-dimensional regressors obtained from the time series.
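As an illustration of the learning scheme just described, here is a deliberately simple online SOM in NumPy. It is a sketch, not the implementation used by the authors: the 1-dimensional grid, the Gaussian neighbourhood and the linearly decaying schedules are illustrative assumptions.

```python
import numpy as np

def train_som(data, n_units, n_epochs=20, seed=0):
    # data: (n, p) array of regressors; returns (n_units, p) prototypes.
    rng = np.random.default_rng(seed)
    protos = data[rng.choice(len(data), n_units, replace=False)].copy()
    grid = np.arange(n_units)  # positions of the units on a 1-D grid
    for epoch in range(n_epochs):
        alpha = 0.5 * (1.0 - epoch / n_epochs)            # learning rate
        sigma = max(n_units * (1.0 - epoch / n_epochs) / 2, 1.0)
        for x in data[rng.permutation(len(data))]:
            # Winner: closest prototype to the presented datum.
            winner = np.argmin(np.linalg.norm(protos - x, axis=1))
            # Neighbours move too, weighted by grid distance to the winner.
            h = np.exp(-0.5 * ((grid - winner) / sigma) ** 2)
            protos += alpha * h[:, None] * (x - protos)
    return protos
```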

3. Why time series clustering may be considered as meaningless

How can the meaningfulness of time series clustering be observed? Consider two different time series. Once the regressors are formed for each series, one can reasonably expect that the two sets of regressors are different, as they describe two distinct processes. Applying a clustering method like the SOM algorithm on both sets of regressors reduces the information encoded in the regressors such that it is now contained in a limited number of prototypes. The meaningfulness of the clustering will then be measured from the two sets of prototypes. If the respective prototype distributions obtained by the SOM clustering tool on two distinct time series regressor sets are significantly different, it will lead to the conclusion that the clustering is meaningful. On the other hand, if the prototypes cannot be distinguished any more because the information contained in the regressor sets is lost by the clustering procedure, then the clustering will be considered meaningless (Simon, Lee, & Verleysen, 2005).

Fig. 1. Plot of the time series. From left to right: Sunspot, Lorenz and Random Walk.

3.1. Criterion for meaningfulness of regressor clustering

How can significant differences be detected between two sets of prototypes? Intuitively, the prototype distributions obtained on two distinct time series should differ much more than the prototype distributions obtained on a single time series in two different settings (two random initializations for example). The criterion for meaningfulness of clustering has thus to compare the prototype distributions or, more precisely, the prototype positions within the p-dimensional space of the regressors.

As time series may have different value ranges it is mandatory to scale them before computing the regressors, otherwise the prototype comparisons will obviously not be fair. In this paper, all time series are normalized according to $x'_t = (x_t - \mu_S)/\sigma_S$, where $\mu_S$ and $\sigma_S$ are the time series mean and standard deviation, respectively, and $x_t$ are the values of the original scalar time series.

Let us consider two (normalized) time series $S_1 = \{x_t \mid 1 \le t \le n\}$ and $S_2 = \{y_t \mid 1 \le t \le m\}$, where m is not necessarily equal to n. Applying the SOM algorithm on the respective sets of regressors, both in a p-dimensional space with fixed p, leads to two prototype sets denoted $P_1 = \{\bar{x}_i \mid 1 \le i \le I\}$ and $P_2 = \{\bar{y}_j \mid 1 \le j \le I\}$. $P_1$ and $P_2$ must obviously have the same number of prototypes, otherwise any comparison would not be fair.

To compare these sets of prototypes, the following measure can be used:

$\mathrm{position\,difference}(P_1, P_2) = \sum_{i=1}^{I} \min_{j}\big(\mathrm{dist}(\bar{x}_i, \bar{y}_j)\big), \quad \text{with } 1 \le j \le I.$  (3)

This measure sums, for each prototype in $P_1$, the distance to its closest corresponding prototype in $P_2$. As defined here, the measure is not a one-to-one relation between prototypes in $P_1$ and $P_2$: some prototypes in $P_2$ may be selected more than once and some may not be selected. Other measures respecting a one-to-one relation could also be used. However, experiments made with variants of the measure (3) lead to similar conclusions at the price of an increased computational burden. They will thus not be considered in this paper.

As already mentioned above, prototype sets obtained on the regressors from the same time series should be more or less identical. To avoid a possible influence of the initialization, the clustering is repeated K times, leading to sets of prototype sets:

$P_1 = \{P_1^k \mid 1 \le k \le K\}$, with $P_1^k = \{\bar{x}_i^k \mid 1 \le i \le I\}$, and  (4)

$P_2 = \{P_2^l \mid 1 \le l \le K\}$, with $P_2^l = \{\bar{y}_j^l \mid 1 \le j \le I\}$.  (5)

The difference between prototype sets obtained through many runs of the SOM clustering algorithm, on a single time series on one side and on two distinct ones on the other side, can now be evaluated through the two following criteria, respectively:

$\mathrm{within}(P_1) = \sum_{k=1}^{K} \sum_{l=1}^{K} \mathrm{position\,difference}(P_1^k, P_1^l)$, and  (6)

$\mathrm{between}(P_1, P_2) = \sum_{k=1}^{K} \sum_{l=1}^{K} \mathrm{position\,difference}(P_1^k, P_2^l)$.  (7)

These criteria have been measured on three time series and the results are given in Fig. 2. The three time series are a real one (the sunspot series from January 1749 to February 2005 (SIDC, 2005)), an artificial one (generated using the Lorenz differential equations (Alligood, Sauer, & Yorke, 1996; Kantz & Schreiber, 1997; Kaplan & Glass, 1995)) and a random walk (generated from Gaussian white noise). Fig. 1 shows a plot of these three series. Increasing numbers of prototypes have been used in the SOM in order to observe the impact of the cluster sizes on the meaningfulness of clustering. The number of prototypes in the SOM varies between 5 and 100 by increments of 5. The prototypes and the regressors are 3-dimensional vectors; the dimension has not been optimized for prediction purposes (it is not the goal of the study) and is common to all series for comparison purposes.

The values have been computed for both criteria using successively the three series as reference. Consider for example that Sunspot is the reference time series. In that case the values within(Sunspot), between(Sunspot, Lorenz) and between(Sunspot, Random Walk) are plotted. Then the Lorenz and Random Walk time series are considered as references; the results are plotted in Fig. 2 from left to right.
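Eqs. (3), (6) and (7) translate directly into code. The sketch below assumes Euclidean distances and represents each prototype set as an (I, p) NumPy array; `runs1` and `runs2` are lists of K such arrays coming from repeated SOM trainings. The names are our own.

```python
import numpy as np

def position_difference(P1, P2):
    # Eq. (3): for each prototype of P1, distance to its closest
    # prototype in P2 (not a one-to-one matching), summed over P1.
    d = np.linalg.norm(P1[:, None, :] - P2[None, :, :], axis=2)
    return d.min(axis=1).sum()

def within(runs):
    # Eq. (6): all pairwise differences among the K runs on one series.
    return sum(position_difference(a, b) for a in runs for b in runs)

def between(runs1, runs2):
    # Eq. (7): differences across the runs of two distinct series.
    return sum(position_difference(a, b) for a in runs1 for b in runs2)
```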

Fig. 2. Comparisons of the within(.) and between(., .) criteria on the three time series. From left to right, the Sunspot, Lorenz and Random Walk time series are successively used as reference. See text for details.

Fig. 3. Two-dimensional regressor distributions. From left to right: Sunspot, Lorenz and Random Walk.

Fig. 2 supports the meaningless character of regressor clustering. First, whatever the time series used as reference, the increase of the within(.) criterion with the number of prototypes remains limited. Second, the same phenomenon is noticeable for the between(., .) criteria. Finally, considering each plot separately, the number of prototypes does not really help in distinguishing the series. Indeed, when the number of prototypes increases it is expected that the clustering should be more accurate, as smaller clusters help to detect local differences. On the contrary, Fig. 2 shows that this property is not verified: the within(.) and between(., .) criteria drive apart at a rate simply proportional to the number of prototypes in the SOM instead of to the (much larger) increase of respective distances between prototypes. The conclusion of meaningless clustering confirms the results in Bagnall et al. (2003), Hetland (2003), Keogh et al. (2003) and Mahoney and Chan (2005).

3.2. Illustration of the regressor clustering limitations

To understand how such a counterintuitive conclusion has to be stated from the results illustrated in Fig. 2, it is instructive to observe the regressor distributions of the three time series. Fig. 3 shows the corresponding 2-dimensional regressors.

The regressor building according to Eq. (2) is indeed problematic. If the time series is smooth, the variations between successive values are small. The components of the regressors are thus highly correlated, so that the latter will concentrate around the diagonal, as shown in Fig. 3. This is clearly the case for the Sunspot and the Lorenz time series. For the Random Walk series two consecutive values are also highly correlated, by construction. It is thus not really surprising that the SOM provides a meaningless clustering, as noticed from Fig. 2. In fact any method will fail to differentiate the series from such regressors because the portion of the space filled by the respective regressors is more or less the same, and very limited with respect to the available space. This is even worse in higher dimensions, as illustrated in Fig. 4 for the 3-dimensional regressors. These regressors still concentrate around a line that is, more generally, a 1-dimensional object in the p-dimensional space. The relative part of space filled by the regressor distribution will thus decrease dramatically as the regressor space dimension increases. Comparisons of regressor distributions in higher dimensions will be even more difficult than in dimension two.

4. The unfolding preprocessing

From the illustrations in Figs. 3 and 4 the goal of the unfolding preprocessing methodology is now clear: to make the regressor distribution fill a much larger part of the p-dimensional space.

Intuitively, the high correlation between the successive regressor values is likely to stem from too high a sampling frequency. It is therefore suggested to use a subsampling preprocessing that will make the various regressor components as independent as possible. In other words, Eq. (2) should now be replaced by:

$x^t_{t-(p-1)\tau} = \{x_t, x_{t-\tau}, x_{t-2\tau}, \ldots, x_{t-(p-1)\tau}\},$  (8)

where τ is a fixed lag.
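A possible implementation of Eq. (8) generalizes the earlier sliding-window sketch with a lag parameter; with τ = 1 it reduces to Eq. (2). Again, the interface is our own illustration.

```python
import numpy as np

def build_unfolded_regressors(x, p, lag):
    # Delayed-coordinate regressors of Eq. (8):
    # (x_t, x_{t-lag}, x_{t-2*lag}, ..., x_{t-(p-1)*lag}).
    x = np.asarray(x, dtype=float)
    offsets = np.arange(p) * lag
    return np.array([x[t - offsets] for t in range((p - 1) * lag, len(x))])
```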

Fig. 4. Three-dimensional regressor distributions. From left to right: Sunspot, Lorenz and Random Walk.

The most difficult part of this preprocessing is the selection of an adequate value for the lag τ. In practice lag τ is selected as the lag closest to the first zero of the autocorrelation function. The autocorrelation function C(τ) is defined for real-valued time series as

$C(\tau) = \sum_{t=1}^{n} x_t x_{t+\tau} \quad \text{for } 0 \le \tau \le M,$  (9)

where n is the number of data in the time series and M ≤ n − 1 is the maximum time lag. Obviously, as the autocorrelation function differs from one time series to another, the selected value of τ∗ will differ too.

The main limitation of the autocorrelation function is that it only computes the correlation between two variables. Though extensions to three variables exist (Gautama, Mandic, & Van Hulle, 2004), they are still limited, as the goal is to use p-dimensional regressors with p possibly larger than 3. Another criterion is therefore proposed. This criterion measures the sum of the distances of the p-dimensional regressors to the diagonal in the p-dimensional space. This geometrical criterion does not measure correlation directly. Instead, it measures how much a regressor distribution differs from a straight line that represents a maximum of correlation. This sum of distances for a given time series S, denoted D(S), is computed as:

$D(S) = \sum_{t=1}^{n} \frac{\|v_1 - x_t\|^2 \, \|v_2 - v_1\|^2 - \big((v_1 - x_t)^{\mathrm{T}}(v_2 - v_1)\big)^2}{\|v_2 - v_1\|^2},$  (10)

where n is the total number of p-dimensional regressors $x_t$ that can be obtained from the original time series S, and $v_1$ and $v_2$ are two p-dimensional vectors defining the line. This formula, usually presented in 2 or 3 dimensions, can be generalized easily to any dimension as it is defined on vectors. In our case, as the line of concern is the diagonal of the space, $v_1$ and $v_2$ are the p-dimensional origin (0, 0, ..., 0) and unit vector (1, 1, ..., 1), respectively.
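Both criteria are straightforward to compute. The sketch below implements Eq. (9) and, for the diagonal through the origin with direction (1, ..., 1), Eq. (10); it assumes the regressors are supplied as an (n, p) array, for instance from `build_unfolded_regressors` above. As elsewhere, this is our transcription of the formulas, not the authors' code.

```python
import numpy as np

def autocorrelation(x, max_lag):
    # Eq. (9): C(tau) = sum_t x_t * x_{t+tau}, for tau = 0..max_lag.
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - tau], x[tau:]) for tau in range(max_lag + 1)])

def distance_to_diagonal(R):
    # Eq. (10) with v1 = (0,...,0) and v2 = (1,...,1): sum over all
    # regressors of their squared distance to the main diagonal.
    R = np.asarray(R, dtype=float)
    p = R.shape[1]
    sq_norms = np.sum(R ** 2, axis=1)   # ||v1 - x_t||^2 with v1 = 0
    proj = R.sum(axis=1)                # (v1 - x_t)^T (v2 - v1), up to sign
    return np.sum((sq_norms * p - proj ** 2) / p)
```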
The complete methodology for unfolding regressors is thus the following. For the selected dimension, regressors are calculated using Eq. (8). Then the graph of the distance to diagonal criterion with respect to the lag τ can be plotted. An adequate lag corresponds to a maximum in this graph. However, reaching the global maximum of the graph should not be considered as the unique objective. In practice, any lag that corresponds to a sufficiently high value in the graph could be considered. Local maxima or smaller lags in a plateau may be of interest too, as selecting lags that are too large may lead to difficulty of interpretation. A good rule of thumb is to limit the range of possible lags to a few times the one that would be selected traditionally, i.e. as the first zero of the autocorrelation function. Moreover, as the distance to diagonal criterion is a global criterion taking the whole regressor into account, it may happen that taking into consideration only possible maxima on this graph would lead to lags for which the autocorrelation function is not small. Therefore, in practice, the lag to be selected should correspond to a sufficiently high value of the distance to diagonal criterion; when this criterion gives several possible values, two additional criteria should be taken into account: (1) the value of the lag should be chosen as small as possible, and (2) it should correspond to a low value of the autocorrelation function.

The proposed methodology is applied to the three time series used in the previous section. First consider the Sunspot time series in 2-dimensional space. Fig. 5 presents from left to right the plots of the corresponding distance to diagonal, autocorrelation function and a zoom of the autocorrelation function for the first 400 lags. From this plot a lag of τ = 55 has been selected. This lag corresponds to a large value of the distance to diagonal criterion, and is smaller than twice the lag value the autocorrelation function would give (35). Choosing the latter value would, however, lead to a poor unfolding of the series; a larger value such as τ = 55 is thus preferred, despite the (small) increase of the autocorrelation (in absolute value).

The same methodology is applied to the Lorenz time series. The results are presented in Fig. 6. The distance to diagonal criterion suggests that a lag between 20 and 25 is adequate. The lag that would be selected according to the autocorrelation only would be larger (around 35), but it can be seen that the autocorrelation is not much different between 25 and 35. The lag selected in this case is the one satisfying both criteria, i.e. 25 in this case.

The methodology is applied again to the 2-dimensional regressors of the Random Walk time series. Note that by definition of a random walk there is no reason to select a lag larger than 1 for prediction purposes. However, we are not interested here in prediction, but in unfolding in the regressor space; obviously, a lag leading to a good unfolding will be much larger than 1, in order to have regressor components that are as independent as possible.
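The selection rule just described can be codified in many ways; one hypothetical automation, reusing `autocorrelation`, `distance_to_diagonal` and `build_unfolded_regressors` from the previous sketches, is shown below. The search window (three times the first zero of the autocorrelation), the 90% threshold on the distance to diagonal and the final tie-break on low autocorrelation are our illustrative choices, not values from the paper, which relies on visual inspection of the plots.

```python
import numpy as np

def select_lag(x, p, factor=3, frac=0.9):
    C = autocorrelation(x, max_lag=len(x) // 2)
    first_zero = max(int(np.argmax(C <= 0)), 1)   # first non-positive lag
    max_lag = min(factor * first_zero, len(C) - 1)
    lags = np.arange(1, max_lag + 1)
    # Mean distance to diagonal, so that values stay comparable across
    # lags that yield different numbers of regressors.
    D = np.array([distance_to_diagonal(build_unfolded_regressors(x, p, l)) /
                  (len(x) - (p - 1) * l) for l in lags])
    candidates = lags[D >= frac * D.max()]        # "sufficiently high" D
    # Among the candidates, prefer a low (absolute) autocorrelation.
    return int(candidates[np.argmin(np.abs(C[candidates]))])
```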

Fig. 5. From left to right: distance to diagonal, autocorrelation function and zoom of the autocorrelation function for the first 400 lags of the Sunspot time series.

Fig. 6. From left to right: distance to diagonal, autocorrelation function and zoom of the autocorrelation function for the first 400 lags of the Lorenz time series.

Fig. 7. Left: distance to diagonal; right: autocorrelation function of the Random Walk time series.

Fig. 8. Regressor distribution in a 2-dimensional space after unfolding. From left to right: Sunspot, Lorenz and Random Walk.

The values obtained for the two criteria are reported in Fig. 7. As a trade-off between the distance to diagonal criterion, the autocorrelation criterion and the objective to limit the value of the lag, τ = 100 is selected.

Having selected these lags, it is possible to observe how the regressor distributions in a 2-dimensional space are affected by the use of the unfolding preprocessing. This can be observed in Fig. 8, where the unfolded regressor distributions are plotted. The difference between Figs. 3 and 8 is clearly visible. Note that the same methodology has been applied to 3-dimensional regressors. The lags found for the Sunspot, Lorenz and Random Walk time series are respectively τ = 40, τ = 20 and τ = 100. The corresponding unfolded regressor distributions are presented in Fig. 9 and may be compared to Fig. 4. Obviously, regressor comparisons based on SOM clustering using the within(.) and between(., .) criteria will be much more conclusive with the regressor distributions from Figs. 8 and 9 than with those from Figs. 3 and 4.
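Putting the pieces together, a hypothetical end-to-end preprocessing for one series, following the steps of Sections 3.1 and 4 (normalize, choose the lag, unfold), might read:

```python
import numpy as np

def unfold_series(x, p):
    # Normalization of Section 3.1: x'_t = (x_t - mu_S) / sigma_S.
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    # Lag chosen with the distance to diagonal rule (sketch above);
    # the lags quoted in the text were picked with visual checks, so
    # an automated rule may return slightly different values.
    tau = select_lag(x, p)
    return build_unfolded_regressors(x, p, tau), tau
```

The unfolded regressors returned here are what the SOM is trained on in the experiments of the next section.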

Fig. 9. Regressor distribution in a 3-dimensional space after unfolding. From left to right: Sunspot, Lorenz and Random Walk.

The three series used in this section were aimed at illustrating why selecting a larger lag may lead to better unfolding of the regressor distribution. Such unfolding is necessary to allow comparisons of sets of regressor prototypes, and therefore to assess the meaningful character of time series clustering. However, it should be noted that although unfolding is a necessity for using regressors in a prediction context (it is a way to make the regressors as informative as possible), unfolding is not a sufficient condition for prediction; the last example (Random Walk) clearly illustrates this fact.

Note also that in some cases the distance to diagonal criterion gives similar lags to the autocorrelation criterion when 2-dimensional regressors are used. This is for example the case when simple AR/ARMA series are considered. However, the supplementary advantage of the distance to diagonal criterion, i.e. the possibility of using it on higher-dimensional regressors, remains.

5. Experimental results

In this section, we apply the unfolding methodology to time series before evaluating the within(.) and between(., .) criteria, in order to assess the meaningful character of time series clustering once the preprocessing is applied. Both real and artificial time series are used.

5.1. The time series

The real time series are the following.

• The Sunspot time series (SIDC, 2005) represents the monthly number of sunspots, from January 1749 to February 2005.

• The Santa Fe A time series (Weigend & Gershenfeld, 1994) was proposed in the Santa Fe Time Series Prediction and Analysis Competition in 1991. The data were collected from a Far-Infrared-Laser in a chaotic state.

• The Polish electrical consumption time series (Osowski, Siwek, & Tran Haoi, 2001) is composed of daily measures ranging from the end of 1990 to mid 1997.

In addition to these real-world series, artificial ones have also been generated using the following linear and nonlinear equations (the first three and the last four models, respectively); a generation sketch for two of these models is given after the list.

• Autoregressive model, AR(3):

$x_t = a_1 x_{t-1} + a_2 x_{t-2} + a_3 x_{t-3} + \varepsilon_t.$  (11)

2500 data have been generated; the first 500 data have been discarded to remove any bias due to initial conditions. The $a_i$ coefficients, with i = {1, 2, 3}, have been chosen randomly using a normal distribution with zero mean and unit variance. $\varepsilon_t$ has been generated using a normal distribution with zero mean and 0.1 variance.

• Autoregressive with moving average model, ARMA(4, 4):

$x_t = a_1 x_{t-1} + a_2 x_{t-2} + a_3 x_{t-3} + a_4 x_{t-4} + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + b_3 \varepsilon_{t-3} + b_4 \varepsilon_{t-4} + \varepsilon_t.$  (12)

Here again, 2500 data have been generated and the first 500 data have been discarded. Coefficients $a_i$ and $b_i$, with i = {1, 2, 3, 4}, have been chosen randomly using a normal distribution with zero mean and unit variance. $\varepsilon_t$ has been generated using a normal distribution with zero mean and 0.1 variance, as above.

• Autoregressive with moving average model with seasonality of 12, SARMA(3, 3):

$x_t = a_1 x_{t-1} + a_2 x_{t-2} + a_3 x_{t-12} + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + b_3 \varepsilon_{t-12} + \varepsilon_t.$  (13)

Once again, 2500 data have been generated and the first 500 data have been removed. To force the seasonality, coefficients $a_i$ and $b_i$, with i = {1, 2}, have been chosen randomly using a normal distribution with zero mean and 0.02 variance, while coefficients $a_3$ and $b_3$ were chosen using a normal distribution with zero mean and unit variance. $\varepsilon_t$ has been obtained as in the previous models using a normal distribution with zero mean and 0.1 variance.

• Lorenz system (Abarbanel, 1997; Kantz & Schreiber, 1997; Kaplan & Glass, 1995):

$\dot{x}_t = 10(y_t - x_t),$
$\dot{y}_t = 28 x_t - y_t - x_t z_t,$  (14)
$\dot{z}_t = x_t y_t - \tfrac{8}{3} z_t.$

4000 data have been generated using an integration step of 0.015. The first 2000 data have been discarded to remove any transient state and let the trajectory fall onto the attractor.

• Rossler system (Abarbanel, 1997; Kaplan & Glass, 1995):

$\dot{x}_t = -(y_t + z_t),$
$\dot{y}_t = x_t + 0.2 y_t,$  (15)
$\dot{z}_t = 0.2 + z_t(x_t - 5.7).$

In this case 4000 data have been generated using an integration step of 0.075. Here again the first 2000 data have been discarded.

Fig. 10. From left to right, top to bottom: Santa Fe A, Polish electrical, AR(3), ARMA(4, 4), SARMA(3, 3) with seasonality of 12, Rossler, Henon and Tinkerbell time series.

• Henon map (Alligood et al., 1996; Kantz & Schreiber, 1997; Kaplan & Glass, 1995):

$x_{t+1} = 1 - 1.4 x_t^2 + y_t,$  (16)
$y_{t+1} = 0.3 x_t.$

2000 data have been generated using these finite difference equations.

• Tinkerbell map (Alligood et al., 1996; Nusse & Yorke, 1998):

$x_{t+1} = x_t^2 - y_t^2 + 0.9 x_t - 0.6013 y_t,$  (17)
$y_{t+1} = 2 x_t y_t + 2 x_t + 0.5 y_t.$

For this system 2000 data have been generated too.

• Random walk:

$x_{t+1} = x_t + \varepsilon_t,$  (18)

where $\varepsilon_t$ are normally distributed random values. Here again, a series of 2000 data has been generated.

Fig. 1 shows plots of the Sunspot, the Lorenz, and the Random Walk time series, from left to right. Fig. 10 shows graphical representations of the eight other datasets, namely the Santa Fe A, the Polish electrical, the AR(3), the ARMA(4, 4), the SARMA(3, 3) with seasonality of 12, the Rossler, the Henon and the Tinkerbell time series.
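For reference, here is how two of the generators above could be coded. The AR(3) coefficients and noise follow the text (variance 0.1, i.e. standard deviation √0.1), while the plain Euler integrator for the Lorenz system is our assumption, as the paper states only the step size.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar3(n=2500, burn=500):
    # AR(3) of Eq. (11). Randomly drawn coefficients may give an
    # unstable model; re-draw if the generated series diverges.
    a = rng.standard_normal(3)
    noise_std = np.sqrt(0.1)
    x = np.zeros(n + burn)
    for t in range(3, n + burn):
        x[t] = a @ x[t - 3:t][::-1] + noise_std * rng.standard_normal()
    return x[burn:]

def lorenz(n=4000, burn=2000, dt=0.015):
    # Lorenz system of Eq. (14); burn-in lets the trajectory reach
    # the attractor before samples are kept, as described in the text.
    v = np.array([1.0, 1.0, 1.0])
    out = np.empty(n + burn)
    for t in range(n + burn):
        x, y, z = v
        v = v + dt * np.array([10 * (y - x),
                               28 * x - y - x * z,
                               x * y - 8.0 / 3.0 * z])
        out[t] = v[0]
    return out[burn:]
```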
5.2. Results

In this section, results obtained using time series generated using a single model are first presented. The aim is to prove that, in such a case, the within(.) and between(., .) criteria give similar values. Then, comparisons of the within(.) and between(., .) criteria values obtained from different artificial or real-world time series are presented in order to prove the meaningfulness of time series clustering.

A first experiment is performed to observe how the methodology behaves on time series generated artificially using the same model. As the series are obtained on the basis of the same equation, they are expected to be more or less indistinguishable even when unfolded: the regressor distributions should be more or less identical for each series. The example used here consists of ten Random Walk series obtained from Eq. (18). For each of these ten series, 2000 data have been generated. The lag has been chosen equal to 100, as discussed earlier. The unfolding preprocessing has been applied to all series. Fig. 12 shows the results obtained on two time series chosen from the ten generated ones; the results for the other series are similar.

From Fig. 12 two comments can be made. First, it can be observed that the within(.) and between(., .) criteria give distinct values. This is the result of $\varepsilon_t$ in (18), which makes all series different, even though they were generated by the same model. Second, Fig. 12 shows that the differences between the within(.) and between(., .) criteria decrease after unfolding. This means that, though the initial series differ (and so do their sets of regressors) because of their stochastic character and the $\varepsilon_t$ term, the sets of regressors after unfolding are much closer to each other. In other words, the unfolding preprocessing reduces the differences between series generated from the same model; it acts as a smoother, and produces sets of regressors that better represent the series model.

Note that similar experiments performed on the Rossler and ARMA(3, 3) models, not detailed here for lack of space, lead to the same conclusions: the unfolded regressor sets are less distinguishable than the original ones.

The remainder of this section is devoted to comparisons between different time series. For 3-dimensional regressors, the lags selected using the distance to diagonal criterion are summarized in Table 1. Fig. 9 shows the regressor distributions obtained using the unfolding methodology for the Sunspot, Lorenz and Random Walk time series, respectively, from left to right. Fig. 11 shows the corresponding unfolded regressor distributions for the eight other time series. A comparison of the values for the within(.) and between(., .) criteria is provided in Figs. 13 and 14. Note that the results for four time series only are provided due to space limitations; the conclusions are similar for all other series. The values of the criteria in Figs. 13 and 14 were obtained using regressors from Eq. (2) and unfolded regressors from Eq. (8), respectively.

Fig. 11. 3-dimensional regressor distributions obtained using the unfolding preprocessing. From left to right, top to bottom: the Santa Fe A, Polish electrical, AR(3), ARMA(4, 4), SARMA(3, 3) with seasonality of 12, Rossler, Henon and Tinkerbell time series.

Fig. 12. within(.) and between(., .) criteria for two out of ten time series. Leftmost figures: initial regressors; rightmost figures: unfolded regressors.

Fig. 13. within(.) and between(., .) criteria for non-preprocessed 3-dimensional regressors. From left to right: Lorenz, Tinkerbell, Random Walk and ARMA(4, 4) time series. Note that the scale is different from Fig. 2.

Table 1
Lags selected using the distance to diagonal criterion using 3-dimensional regressors

Time series    Lag    Time series    Lag
Sunspot        40     Rossler        20
Santa Fe A     2      Lorenz         20
Polish         3      Henon          1
AR(3)          1      Tinkerbell     1
ARMA(4, 4)     1      Random Walk    100
SARMA(3, 3)    1

From Figs. 13 and 14, the plots of the Lorenz and Random Walk cases show a clear increase in the distance between the within(.) and between(., .) criteria when the regressors are unfolded. Concerning the Tinkerbell and the ARMA(4, 4) time series, Table 1 shows that the selected lag is 1; it is thus not surprising that no significant difference is obtained between Figs. 13 and 14, as the regressors are adequately unfolded in both cases.

Similar results can be obtained using 5-dimensional regressors. The selected lags are presented in Table 2. A comparison of the values for the within(.) and between(., .) criteria is provided in Figs. 15 and 16 for four other time series. Here again, the same conclusions may be drawn for all the other cases that are not illustrated.

Fig. 14. within(.) and between(., .) criteria for 3-dimensional regressors preprocessed using the unfolding methodology. From left to right: Lorenz, Tinkerbell, Random Walk and ARMA(4, 4) time series.

Fig. 15. Values for the within(.) and between(., .) criteria for non-preprocessed 5-dimensional regressors. From left to right: Sunspot, Henon, Rossler and AR(3) time series.

Fig. 16. Values for the within(.) and between(., .) criteria for 5-dimensional regressors preprocessed using the unfolding methodology. From left to right: Sunspot, Henon, Rossler and AR(3) time series.

Table 2
Lags selected using the distance to diagonal criterion for the used time series, for the case of 5-dimensional regressors

Time series    Lag    Time series    Lag
Sunspot        20     Rossler        10
Santa Fe A     2      Lorenz         20
Polish         2      Henon          1
AR(3)          1      Tinkerbell     1
ARMA(4, 4)     1      Random Walk    50
SARMA(3, 3)    1

Table 3
Lags selected using the distance to diagonal criterion for the used time series, for the case of 10-dimensional regressors

Time series    Lag    Time series    Lag
Sunspot        10     Rossler        5
Santa Fe A     1      Lorenz         10
Polish         1      Henon          1
AR(3)          1      Tinkerbell     1
ARMA(4, 4)     1      Random Walk    20
SARMA(3, 3)    1

Figs. 15 and 16 lead to the same conclusion as in the 3-dimensional case, except that the within(.) and between(., .) values are even more distinct in this case; see for example the plots for the Sunspot, Rossler or AR(3) time series. Once again, approximately the same values are obtained with non-preprocessed and unfolded regressors for the Henon time series: in this case, the lag τ is also 1.

Finally, the same methodology is again applied to 10-dimensional regressors. The selected lags are presented in Table 3. Comparisons of the within(.) and between(., .) values for some time series are provided in Figs. 17 and 18. As for 3-dimensional and 5-dimensional regressors, the unfolding preprocessing of regressors in a 10-dimensional space allows an easier comparison of the within(.) and between(., .) criteria.

Four supplementary conclusions may be extracted from the experimental results. First, for time series obtained from simple linear models (e.g. AR, ARMA, SARMA), the distance to diagonal criterion gives similar lags to the autocorrelation.

Fig. 17. Values for the within(.) and between(., .) criteria for non-preprocessed 10-dimensional regressors. From left to right: Santa Fe A, Lorenz, Random Walk and SARMA(3, 3) time series.

Fig. 18. Values for the within(.) and between(., .) criteria for 10-dimensional regressors preprocessed using the unfolding methodology. From left to right: Santa Fe A, Lorenz, Random Walk and SARMA(3, 3) time series.

Second, the distance to diagonal criterion is valid whatever the dimension of the regressor; it takes into account the whole unfolding in the regressor space, rather than the correlation or another criterion between two successive values only. Third, in general the gaps between the within(.) and between(., .) criteria are larger when using higher-dimensional regressors; this is consistent with the fact that larger regressors usually better capture the dynamics of a series. Finally, it can be noticed from Tables 1–3 that the lag selected through the distance to diagonal criterion decreases (for a given time series) when the regressor dimension increases. This can be understood by the fact that, when using a large regressor, the necessary previous values will be included even if the lag τ is too small, just because the lag is multiplied by a large p in Eq. (8).

6. Conclusion

In this paper, the "distance to diagonal" criterion is introduced to measure how well a time series regressor distribution is unfolded. Such unfolding is necessary to preserve in the regressors the information contained in the series. The distance to diagonal criterion measures how much all components of the regressor contribute to useful information, unlike the autocorrelation function, which only takes two values into account.

Unfolding the regressor distribution is a necessary condition to keep the information in the set of regressors but does not constitute a sufficient condition, as illustrated by a random walk. However, it is expected that when prediction is possible with fixed-size regressors, choosing the lag according to the distance to diagonal criterion will help in keeping information for further processing. As having a low autocorrelation of the regressor components and keeping the lag small are also important objectives, the methodology detailed in this paper suggests using these two last criteria as subsidiary ones when the distance to diagonal criterion allows the choice of a range of lags.

While the methodology is presented here for fixed lags (equal time delays between regressor components), nothing prevents us from applying the same technique to regressors with variable lags. However, several values are to be set in this case, making the search for optimal lags a multi-dimensional search instead of a 1-dimensional one.

Recently, a set of papers has been published with arguments concerning the meaningless character of time series clustering. While explanations have been given too, it remains that these papers have led to suspicion about the usefulness of time series clustering in all situations. By using criteria that differentiate sets of regressor prototypes, this paper shows that time series clustering is meaningful, at least when adequate preprocessing is applied. Experimental results obtained using real and artificial time series indeed show that using adequate preprocessing leads to a clear distinction of the sets of SOM prototypes and thus to the conclusion of the meaningfulness of clustering.

To conclude, it can be said that using the within(.) and between(., .) criteria it has been possible to observe that, once correctly unfolded with the distance to diagonal criterion, regressors obtained from distinct time series are indeed distinguishable using a clustering technique. The latter is implemented here by Self-Organizing Maps. Similar experiments have been performed using the K-means and Competitive Learning algorithms; in both cases, it can be shown that time series clustering is also meaningful.

Acknowledgements

We would like to thank Professor Osowsky from Warsaw Technical University for providing us with the Polish electrical data used. The authors are grateful to the reviewers for their helpful comments and in-depth reading of the manuscript.

References

Abarbanel, H. D. I. (1997). Analysis of observed chaotic data. New York: Springer-Verlag.
Alligood, K. T., Sauer, T. D., & Yorke, J. A. (1996). Chaos: An introduction to dynamical systems. Berlin: Springer-Verlag.
Babloyantz, A., Nicolis, C., & Salazar, J. M. (1985). Evidence of chaotic dynamics of brain activity during the sleep cycle. Physics Letters A, 111, 152–156.
Bagnall, A. J., Janakec, G., & Zhang, M. (2003). Clustering time series from mixture polynomial models with discretised data. Technical report CMP-C03-17. School of Computing Sciences, University of East Anglia. Available from http://www2.cmp.uea.ac.uk/~ajb/PDF/CMP-C03-17.pdf.
Camastra, F., & Colla, A. M. (1999). Neural short-term prediction based on dynamics reconstruction. Neural Processing Letters, 9(1), 45–52.
Cottrell, M., Girard, B., & Rousset, P. (1997). Long term forecasting by combining Kohonen algorithm and standard prevision. In W. Gerstner, A. Germond, M. Hasler, & J. D. Nicoud (Eds.), Lecture notes in computer science: Vol. 1327. Proc. of int. conf. on artificial neural networks (pp. 993–998). Springer.
Cottrell, M., Fort, J.-C., & Pagès, G. (1998). Theoretical aspects of the SOM algorithm. Neurocomputing, 21, 119–138.
Dablemont, S., Simon, G., Lendasse, A., Ruttiens, A., Blayo, F., & Verleysen, M. (2003). Time series forecasting with SOM and local non-linear models - Application to the DAX30 index prediction. In Proc. of workshop on self-organizing maps (pp. 340–345).
Denton, A. (2004). Density-based clustering of time series subsequences. In Proc. of 3rd workshop on mining temporal and sequential data, in conjunction with 10th ACM SIGKDD int. conf. on knowledge discovery and data mining.
Ding, M., Grebogi, C., Ott, E., Sauer, T., & Yorke, A. (1993). Estimating correlation dimension from a chaotic time series: When does plateau onset occur? Physica D, 69(3–4), 404–424.
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. London: Chapman & Hall.
Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors from mutual information. Physical Review A, 33, 1134–1140.
Gautama, T., Mandic, D. P., & Van Hulle, M. M. (2004). The delay vector variance method for detecting determinism and nonlinearity in time series. Physica D, 190, 167–176.
Grassberger, P., & Procaccia, I. (1983). Measuring the strangeness of strange attractors. Physica D, 9, 189–208.
Hetland, M. L. (2003). Evolving sequence rules. Ph.D. thesis. Norwegian University of Computer and Information Science.
Kantz, H., & Schreiber, T. (1997). Cambridge Nonlinear Science Series: Vol. 7. Nonlinear time series analysis. Cambridge: Cambridge University Press.
Kaplan, D. T., & Glass, L. (1995). Understanding nonlinear dynamics. New York: Springer-Verlag.
Keogh, E., Lin, J., & Truppel, W. (2003). Clustering of time series subsequences is meaningless: implications for past and future research. In Proc. of the 3rd IEEE int. conf. on data mining (pp. 115–122).
Kohonen, T. (1995). Springer Series in Information Sciences. Self-organizing maps (2nd ed.). Berlin: Springer.
Ljung, L. (1999). System identification - theory for the user (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Mahoney, M. V., & Chan, P. K. (2005). Learning rules for time series anomaly detection. Computer science technical report CS-2005-04. Computer Sciences Department, Florida Institute of Technology, Melbourne, FL. Available from http://www.cs.fit.edu/~tr/cs-2005-04.pdf.
Martinetz, T., Berkovich, S., & Schulten, K. (1993). Neural-gas network for vector quantization and its application to time-series prediction. IEEE Transactions on Neural Networks, 4(4), 558–569.
Nusse, H. E., & Yorke, J. A. (1998). Dynamics: Numerical explorations (2nd ed.). New York: Springer-Verlag.
Osowski, S., Siwek, K., & Tran Haoi, L. (2001). Short term load forecasting using neural networks. In Proc. of III Ukrainian–Polish workshop (pp. 72–77).
Sauer, T., Yorke, J., & Casdagli, M. (1991). Embedology. Journal of Statistical Physics, 65, 579–616.
Simon, G., Lendasse, A., Cottrell, M., Fort, J.-C., & Verleysen, M. (2004). Double quantization of the regressor space for long-term time series prediction: Method and proof of stability. Neural Networks, 17(8–9), 1169–1181.
Simon, G., Lee, J. A., & Verleysen, M. (2005). On the need of unfolding preprocessing for time series clustering. In Proc. of workshop on self-organizing maps (pp. 251–258).
SIDC (2005). RWC Belgium, World Data Center for the Sunspot Index, Royal Observatory of Belgium. http://sidc.oma.be/index.php3.
Struzik, Z. (2003). Time series rule discovery: tough, not meaningless. In Lecture notes in artificial intelligence: Vol. 2871. Proc. of int. symp. on methodologies for intelligent systems (pp. 32–39). Springer-Verlag.
Vesanto, J. (1997). Using the SOM and local models in time-series prediction. In Proc. of workshop on self-organizing maps (pp. 209–214).
Walter, J., Ritter, H., & Schulten, K. (1990). Non-linear prediction with self-organizing maps. In Proc. of int. joint conf. on neural networks (pp. 589–594).
Weigend, A., & Gershenfeld, N. (1994). Time series prediction: Forecasting the future and understanding the past. Reading, MA: Addison-Wesley Publishing Company.