A Wavelet-Based Anytime Algorithm for K-Means Clustering of Time Series

Michail Vlachos Jessica Lin Eamonn Keogh Dimitrios Gunopulos

Computer Science & Engineering Department University of California - Riverside Riverside, CA 92521

{mvlachos, jessica, eamonn, dg}@cs.ucr.edu

ABSTRACT

The emergence of the field of data mining in the last decade has sparked an increasing interest in clustering of time series. Although there has been much research on clustering in general, most classic machine learning and data mining algorithms do not work well for time series due to their unique structure. In particular, the high dimensionality, very high feature correlation, and the (typically) large amount of noise that characterize time series data present a difficult challenge. In this work we address these challenges by introducing a novel anytime version of the k-Means clustering algorithm for time series. The algorithm works by leveraging off the multi-resolution property of wavelets. In particular, an initial clustering is performed with a very coarse resolution representation of the data. The results obtained from this "quick and dirty" clustering are used to initialize a clustering at a slightly finer level of approximation. This process is repeated until the clustering results stabilize or until the "approximation" is the raw data. In addition to casting k-Means as an anytime algorithm, our approach has two other very unintuitive properties. The quality of the clustering is often better than the batch algorithm, and even if the algorithm is run to completion, the time taken is typically much less than the time taken by the original algorithm. We explain, and empirically demonstrate, these surprising and desirable properties with comprehensive experiments on several publicly available real data sets.

Keywords

Time Series, Data Mining, Clustering, Anytime Algorithms

1. INTRODUCTION

The emergence of the field of data mining in the last decade has sparked an increase of interest in clustering of time series [12, 16, 20, 21, 22, 33]. Such clustering is useful in its own right as a method to summarize and visualize massive datasets [34]. In addition, clustering is often used as a subroutine in other data mining algorithms such as similarity search [26, 30], classification [22] and the discovery of association rules [9]. Applications of these algorithms cover a wide range of activities found in finance, meteorology, industry, medicine etc.

Although there has been much research on clustering in general [5], the unique structure of time series means that most classic machine learning and data mining algorithms do not work well for time series. In particular, the high dimensionality, very high feature correlation, and the (typically) large amount of noise that characterize time series data present a difficult challenge [21].

In this work we address these challenges by introducing a novel anytime version of the popular k-Means clustering algorithm [15, 27] for time series. Anytime algorithms are algorithms that trade execution time for quality of results [19]. Their utility for data mining has been documented at length elsewhere [5, 31].

The algorithm works by leveraging off the multi-resolution property of wavelets [11]. In particular, an initial clustering is performed with a very coarse representation of the data. The results obtained from this "quick and dirty" clustering are used to initialize a clustering at a finer level of approximation. This process is repeated until the clustering results stabilize or until the "approximation" is the original "raw" data. The clustering is said to stabilize when the objects do not change membership from the last iteration, or when the change of membership does not improve the clustering results. Our approach allows the user to interrupt and terminate the process at any level. In addition to casting the k-Means algorithm as an anytime algorithm, our approach has two other very unintuitive properties. The quality of the clustering is often better than the batch algorithm, and even if the algorithm is run to completion, the time taken is typically much less than the time taken by the batch algorithm. We explain, and empirically demonstrate, these surprising and desirable properties with comprehensive experiments on several publicly available real data sets.

The rest of this paper is organized as follows. In Section 2 we review related work, and introduce the necessary background on the wavelet transform and k-Means clustering. In Section 3, we introduce our algorithm. Section 4 contains a comprehensive comparison of our algorithm to classic k-Means on real datasets. In Section 5 we summarize our findings and offer suggestions for future work.

2. BACKGROUND AND RELATED WORK

Since our work draws on the confluence of clustering, wavelets and anytime algorithms, we provide the necessary background on these areas in this section.

2.1 Background on Clustering

One of the most widely used clustering approaches is hierarchical clustering, due to the great visualization power it offers [22]. Hierarchical clustering produces a nested hierarchy of similar groups of objects, according to a pairwise distance matrix of the objects. One of the advantages of this method is its generality, since the user does not need to provide any parameters such as the number of clusters. However, its application is limited to only small datasets, due to its quadratic (or higher order) computational complexity.

A faster method to perform clustering is k-Means [5, 27]. The basic intuition behind k-Means (and a more general class of clustering algorithms known as iterative refinement algorithms) is shown in Table 1:

Algorithm k-Means
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Decide the class memberships of the N objects by assigning them to the nearest cluster center.
4. Re-estimate the k cluster centers, by assuming the memberships found above are correct.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise goto 3.

Table 1: An outline of the k-Means algorithm.

The k-Means algorithm for N objects has a complexity of O(kNrD) [27], with k the number of clusters specified by the user, r the number of iterations until convergence, and D the dimensionality of the points. The shortcomings of the algorithm are its tendency to favor spherical clusters, and the fact that knowledge of the number of clusters, k, is required in advance. The latter limitation can be mitigated by placing the algorithm in a loop, and attempting all values of k within a large range. Various statistical tests can then be used to determine which value of k is most parsimonious. Since k-Means is essentially a hill-climbing algorithm, it is guaranteed to converge on a local but not necessarily global optimum. In other words, the choices of the initial centers are critical to the quality of results. Nevertheless, in spite of these undesirable properties, for clustering large datasets of time series, k-Means is preferable due to its faster running time.
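The iterative refinement of Table 1 is straightforward to express in code. The following is a minimal NumPy sketch (not the authors' implementation); the function name, the random initialization strategy, and the optional init argument, which anticipates the level-to-level seeding used by our algorithm in Section 3, are our own choices.

```python
import numpy as np

def kmeans(data, k, init=None, max_iter=100, seed=0):
    """Minimal k-Means (the iterative refinement of Table 1).
    data: (N, D) array of objects; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize the k cluster centers (randomly, if none are supplied).
    if init is None:
        centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    else:
        centers = np.asarray(init, dtype=float).copy()
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Step 3: assign every object to its nearest cluster center (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 5: if no object changed membership in the last iteration, exit.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: re-estimate each center as the mean of its current members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = data[labels == j].mean(axis=0)
    return centers, labels
```

Each iteration computes N x k distances over D dimensions, which is the source of the O(kNrD) cost quoted above.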

Figure 1: The Haar wavelet decomposition can be visualized as an attempt to approximate a time series with a linear combination of basis functions. In this case, time series A is transformed to B by the Haar wavelet decomposition, and the dimensionality is reduced from 512 to 8.

Figure 2: The Haar wavelet can represent data at different levels of resolution. Above we see a raw time series, with increasingly finer wavelet approximations (Haar 0 through Haar 7) below.

In order to scale the various clustering methods to massive datasets, one can either reduce the number of objects, N, by sampling [5], or reduce the dimensionality of the objects [1, 6, 14, 25, 29, 35, 36, 22, 23]. In the case of time series, the objective is to find a representation at a lower dimensionality that preserves the original information and describes the original shape of the time-series data as closely as possible. Many approaches have been suggested in the literature, including the Discrete Fourier Transform (DFT) [1, 14], Singular Value Decomposition [25], Adaptive Piecewise Constant Approximation [23], Piecewise Aggregate Approximation (PAA) [7, 36], Piecewise Linear Approximation [22] and the Discrete Wavelet Transform (DWT) [6, 29]. While all these approaches have shared the ability to produce a high quality reduced-dimensionality approximation of time series, wavelets are unique in that their representation of the data is intrinsically multi-resolution. This property is critical to our proposed algorithm and will be discussed in detail in the next section.

Although we choose the Haar wavelet for this work, the algorithm can generally utilize any wavelet basis. The preference for the Haar wavelet is mainly based on its simplicity and its wide usage in the data mining community.

2.2 Background on Wavelets

Wavelets are mathematical functions that represent data or other functions in terms of the averages and differences of a prototype function, called the analyzing or mother wavelet [11]. In this sense, they are similar to the Fourier transform. One fundamental difference is that wavelets are localized in time. In other words, some of the wavelet coefficients represent small, local subsections of the data being studied, as opposed to Fourier coefficients, which always represent global contributions to the data. This property is very useful for multi-resolution analysis of the data. The first few coefficients contain an overall, coarse approximation of the data; additional coefficients can be perceived as "zooming in" to areas of high detail. Figures 1 and 2 illustrate this idea.

The Haar wavelet decomposition works by averaging two adjacent values on the time series function at a given resolution to form a smoothed, lower-dimensional signal, and the resulting coefficients are simply the differences between the values and their averages [6]. The coefficients can also be computed by averaging the differences between each pair of adjacent values. The coefficients are crucial for reconstructing the original sequence, as they store the detailed information lost in the smoothed signal. For example, suppose we have a time series T = (2 8 1 5 9 7 2 6). Table 2 shows the decomposition at different resolutions. As a result, the Haar wavelet decomposition is the collection of the coefficients at all resolutions, with the overall average being its first component: (5 -1 1 2 -3 -2 1 -2). It is clear that the decomposition is completely reversible and the original sequence can be reconstructed from the coefficients. For example, to get the signal of the second level, we compute 5 ± (-1) = (4, 6).

Resolution   Averages             Differences (coefficients)
8            (2 8 1 5 9 7 2 6)
4            (5 3 8 4)            (-3 -2 1 -2)
2            (4 6)                (1 2)
1            (5)                  (-1)

Table 2: Haar wavelet decomposition of the time series (2 8 1 5 9 7 2 6).
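The decomposition and reconstruction described above are simple to implement. Below is a small illustrative sketch (the helper names are ours) that reproduces the coefficients of Table 2 and the second-level reconstruction 5 ± (-1) = (4, 6); it assumes the series length is a power of two and uses the un-normalized averaging/differencing convention of the example.

```python
import numpy as np

def haar_decompose(series):
    """Return the Haar coefficients, overall average first, as in Table 2.
    Assumes len(series) is a power of two."""
    x = np.asarray(series, dtype=float)
    details = []
    while len(x) > 1:
        avgs = (x[0::2] + x[1::2]) / 2.0    # smoothed, lower-resolution signal
        diffs = (x[0::2] - x[1::2]) / 2.0   # detail coefficients at this resolution
        details.append(diffs)
        x = avgs
    # coarsest average first, then details from coarsest to finest
    return np.concatenate([x] + details[::-1])

def haar_reconstruct(coeffs, level):
    """Rebuild the 2**(level-1)-point approximation used at a given level."""
    approx = coeffs[:1]
    for i in range(level - 1):
        d = coeffs[2**i : 2**(i + 1)]
        # each pair is (average + difference, average - difference)
        approx = np.stack([approx + d, approx - d], axis=1).ravel()
    return approx

print(haar_decompose([2, 8, 1, 5, 9, 7, 2, 6]))
# -> [ 5. -1.  1.  2. -3. -2.  1. -2.]
print(haar_reconstruct(haar_decompose([2, 8, 1, 5, 9, 7, 2, 6]), 2))
# -> [4. 6.]
```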
Recently there has been an explosion of interest in using wavelets for time series data mining. Researchers have introduced several non-Euclidean, wavelet-based distance measures [20, 33]. Chan and Fu [6] have demonstrated that Euclidean distance indexing with wavelets is competitive to Fourier-based techniques [14].

2.3 Background on Anytime Algorithms

Anytime algorithms are algorithms that trade execution time for quality of results [19]. In particular, an anytime algorithm always has a best-so-far answer available, and the quality of the answer improves with execution time. The user may examine this answer at any time, and choose to terminate the algorithm, temporarily suspend the algorithm, or allow the algorithm to run to completion.

The usefulness of anytime algorithms for data mining has been extensively documented [5, 31]. Suppose a batch version of an algorithm takes a week to run (not an implausible scenario in data mining massive data sets). It would be highly desirable to implement the algorithm as an anytime algorithm. This would allow a user to examine the best current answer after an hour or so as a "sanity check" of all assumptions and parameters. As a simple example, suppose the user had accidentally set the value of k to 50 instead of the desired value of 5. Using a batch algorithm the mistake would not be noted for a week, whereas using an anytime algorithm the mistake could be noted early on and the algorithm restarted with little cost.

The motivating example above could have been eliminated by user diligence! More generally, however, data mining algorithms do require the user to make choices of several parameters, and an anytime implementation of k-Means would allow the user to interact with the entire data mining process in a more efficient way.

2.4 Related Work

Bradley et al. [5] suggest a generic technique for scaling the k-Means clustering algorithm to large databases by attempting to identify regions of the data that are compressible, regions that must be retained in main memory, and regions that may be discarded. However, the generality of the method contrasts with our algorithm's explicit exploitation of the structure of the data type of interest.

Our work is more similar in spirit to the dynamic time warping similarity search technique introduced by Chu et al. [7]. The authors speed up linear search by examining the time series at increasingly finer levels of approximation.

3. OUR APPROACH – THE I-kMEANS ALGORITHM

As noted in Section 2.1, the complexity of the k-Means algorithm is O(kNrD), where D is the dimensionality of the data points (or the length of a sequence, as in the case of time series). For a dataset consisting of long time series, the D factor can burden the clustering task significantly. This overhead can be alleviated by reducing the data dimensionality.

Another major drawback of the k-Means algorithm derives from the fact that the clustering quality is greatly dependent on the choice of initial centers (i.e., line 2 of Table 1). As mentioned earlier, the k-Means algorithm guarantees local, but not necessarily global, optimization. Poor choices of the initial centers can therefore degrade the quality of the clustering solution and result in longer execution time (see [15] for an excellent discussion of this issue). Our algorithm addresses these two problems associated with k-Means, in addition to offering the capability of an anytime algorithm, which allows the user to interrupt and terminate the program at any stage.

We propose using the wavelet decomposition to perform clustering at increasingly finer levels of the decomposition, while displaying the gradually refined clustering results periodically to the user.
We compute the Haar wavelet decomposition for all time-series data in the database. The complexity of this transformation is linear to the dimensionality of each object; therefore, the running time is reasonable even for large databases. The process of decomposition can be performed off-line, and the time series data can be stored in the Haar decomposition format, which takes the same amount of space as the original sequence. One important property of the decomposition is that it is a lossless transformation, since the original sequence can always be reconstructed from the decomposition.

Once we compute the Haar decomposition, we perform the k-Means clustering algorithm, starting at the second level (each object at level i has 2^(i-1) dimensions) and gradually progress to finer levels. Since the Haar decomposition is completely reversible, we can reconstruct the approximation data from the coefficients at any level and perform clustering on these data. We call the new clustering algorithm I-kMeans, where I stands for "interactive." Figure 3 illustrates this idea.

Figure 3: k-Means is performed on each level on the reconstructed data from the Haar wavelet decomposition, starting with the second level.

The intuition behind this algorithm originates from the observation that the general shape of a time series sequence can often be approximately captured at a lower resolution. As shown in Figure 2, the shape of the time series is well preserved, even at very coarse approximations. Because of this desirable feature of wavelets, clustering results typically stabilize at a low resolution, thus saving time by eliminating the need to run at full resolution (the raw data). The pseudo-code of the algorithm is provided in Table 3.

The algorithm achieves the speed-up by doing the vast majority of reassignments (line 3 in Table 1) at the lower resolutions, where the costs of the distance calculations are considerably lower. As we gradually progress to finer resolutions, we already start with good initial centers (the choice of initial centers will be discussed later in this section). Therefore, the number of iterations r until convergence will typically be much lower.

The I-kMeans algorithm allows the user to monitor the quality of the clustering results as the program executes. The user can interrupt the program at any level, or wait until the execution terminates once the clustering results stabilize. One surprising and highly desirable finding from the experimental results (as shown in the next section) is that even if the program is run to completion (until the last level, with full resolution), the total execution time is generally less than that of clustering on the raw data.

Algorithm I-kMeans
1. Decide on a value for k.
2. Initialize the k cluster centers (randomly, if necessary).
3. Run the k-Means algorithm on the level_i representation of the data.
4. Use the final centers from level_i as the initial centers for level_{i+1}. This is achieved by projecting the k centers returned by the k-Means algorithm for the 2^i space into the 2^(i+1) space.
5. If none of the N objects changed membership in the last iteration, exit. Otherwise goto 3.

Table 3: An outline of the I-kMeans algorithm.
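To make the flow of Table 3 concrete, here is a minimal sketch (not the authors' implementation) that chains the illustrative kmeans() and haar_reconstruct() helpers from the earlier sketches; the generator structure, the function name i_kmeans, and the num_levels argument are our own choices.

```python
import numpy as np

def i_kmeans(coeffs, k, num_levels):
    """Sketch of the I-kMeans loop of Table 3.
    coeffs: (N, D) array with one row of Haar coefficients per time series."""
    centers, labels = None, None
    for level in range(2, num_levels + 1):
        # Reconstruct the 2**(level-1)-dimensional approximation of every object.
        data = np.array([haar_reconstruct(row, level) for row in coeffs])
        if centers is not None:
            # Step 4: project the previous final centers into the finer space
            # by doubling every coordinate.
            centers = np.repeat(centers, 2, axis=1)
        prev_labels = labels
        centers, labels = kmeans(data, k, init=centers)   # Step 3
        # Anytime behaviour: expose the best-so-far result after every level.
        yield level, centers, labels
        # Step 5: stop once memberships no longer change between levels.
        if prev_labels is not None and np.array_equal(labels, prev_labels):
            break
```

A caller could consume it as `for level, centers, labels in i_kmeans(coeffs, k=10, num_levels=10): ...`, examining the best-so-far clustering after each level and simply stopping the loop early, which realizes the anytime behaviour.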

As mentioned earlier, on every level except for the starting level (i.e. level 2), which uses random initial centers, the initial centers are selected based on the final centers from the previous level. More specifically, the final centers computed at the end of level i will be used as the initial centers on level i+1. Since the length of the data reconstructed from the Haar decomposition doubles as we progress to the next level, we project the centers computed at the end of level i onto level i+1 by doubling each coordinate of the centers. This way, they match the dimensionality of the points on level i+1. For example, if one of the final centers at the end of level 2 is (0.5, 1.2), then the initial center used for this cluster on level 3 is (0.5, 0.5, 1.2, 1.2). This approach resolves the dilemma associated with the choice of initial centers, which is crucial to the quality of clustering results [15]. It also contributes to the fact that our algorithm often produces better clustering results than the k-Means algorithm.
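As a concrete illustration of this projection, the example center above can be expanded with a single NumPy call (the values are the made-up ones from the text):

```python
import numpy as np

# Final center found at the end of level 2 (two dimensions).
center_level2 = np.array([[0.5, 1.2]])

# Doubling every coordinate yields the initial center for level 3 (four dimensions).
center_level3 = np.repeat(center_level2, 2, axis=1)
print(center_level3)   # [[0.5 0.5 1.2 1.2]]
```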

4. EXPERIMENTAL EVALUATION

To show that our approach is superior to the k-Means algorithm for clustering time series, we performed a series of experiments on publicly available real datasets. For completeness, we ran the I-kMeans algorithm for all levels of approximation, and recorded the cumulative execution time and clustering accuracy at each level. In reality, however, the algorithm stabilizes in early stages and can automatically terminate much sooner. We compare the results with those of k-Means on the original data. Since both algorithms start with random initial centers, we execute each algorithm 100 times with different centers. However, for consistency we ensure that for each execution, both algorithms are seeded with the same set of initial centers. After each execution, we compute the error (more details will be provided in Section 4.2) and the execution time of the clustering results. We compute and report the averages over all executions. We believe that by taking the average, we achieve better objectiveness than taking the best (minimum), since in reality, we would not have the knowledge of the correct clustering results, or the "oracle," to compare against our results (as is the case with one of our test datasets).

4.1 Datasets and Methodology

We tested on two publicly available, real datasets. The dataset cardinalities range from 1,000 to 8,000. The length of each time series has been set to 512 on one dataset, and 1024 on the other. Each time series is z-normalized to have a mean value of 0 and a standard deviation of 1.

• JPL: This dataset consists of readings from various inertial sensors from Space Shuttle mission STS-57. The data is particularly appropriate for our experiments since the use of redundant backup sensors means that some of the data is very highly correlated. In addition, even sensors that measure orthogonal features (i.e. the X and Y axes) may become temporarily correlated during a particular maneuver, for example, a "roll reversal" [13]. Thus, the data has an interesting mixture of dense and sparse clusters. To generate datasets of increasingly larger cardinalities, we extracted time series of length 512, at random starting points of each sequence from the original data pool.

• Heterogeneous: This dataset is generated from a mixture of 10 real time series data from the UCR Time Series Data Mining Archive [24]. Figure 4 shows how the data is generated, and Figure 5 shows the 10 time series we use as seeds. We produced variations of the original patterns by adding small time warping (2-3% of the series length), and interpolated Gaussian noise. Gaussian noisy peaks are interpolated using splines to create smooth random variations.

Figure 4: Generation of variations on the Heterogeneous data. We produced variations of the original patterns by adding small time warping (2-3% of the series length), and interpolated Gaussian noise. Gaussian noisy peaks are interpolated using splines to create smooth random variations.

In the Heterogeneous dataset, we know that the number of clusters (k) is 10. However, for the JPL dataset, we lack this information. Finding k is an open problem for the k-Means algorithm and is out of the scope of this paper. To determine the optimal k for k-Means, we attempted different values of k, ranging from 2 to 8. Nonetheless, our algorithm outperforms the k-Means algorithm regardless of k. In this paper we only show the results with k equal to 5. Figure 6 shows that our algorithm produces the same results as does the hierarchical clustering algorithm, which is generally more costly.

Figure 5: Real time series data from the UCR Time Series Data Mining Archive (koski_ecg, burst, earthquake, infrasound, power_data, random_walk, memory, ocean, sunspot, tide). We use these time series as seeds to create our Heterogeneous dataset.
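For illustration, the following sketch mimics the generation process of Figure 4 for one seed series; the helper name and all parameter values (shift range, number of Gaussian anchors, noise scale) are our guesses rather than the authors' exact settings, and a small random shift stands in for the time warping.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def make_variation(seed_series, warp_frac=0.03, n_anchors=10, noise_scale=0.1, rng=None):
    """Create one Heterogeneous-style variation of a seed series (cf. Figure 4):
    a small random shift (~2-3% of the length) plus smooth noise obtained by
    spline-interpolating Gaussian peaks. All parameter values are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(seed_series, dtype=float)
    n = len(x)
    # Small time shift, up to warp_frac * n samples in either direction.
    max_shift = max(1, int(warp_frac * n))
    shifted = np.roll(x, rng.integers(-max_shift, max_shift + 1))
    # Smooth noise: Gaussian values at a few random anchor points, spline-interpolated.
    anchors = np.sort(rng.choice(n, size=n_anchors, replace=False))
    peaks = rng.normal(0.0, noise_scale * x.std(), size=n_anchors)
    noise = CubicSpline(anchors, peaks)(np.arange(n))
    return shifted + noise
```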

4.2 Error of Clustering Results

In this section we compare the clustering quality of the I-kMeans and the classic k-Means algorithms. Since we generated the Heterogeneous datasets from a set of given time series data, we have the knowledge of the correct clustering results in advance. In this case, we can simply compute the clustering error by summing up the number of incorrectly classified objects for each cluster c and then dividing by the dataset cardinality:

\text{clustering error} = \frac{\sum_{c=1}^{k} \text{misclassified}_c}{|\text{data}|} \qquad (1)

The error is computed at the end of each level. The label of each final cluster is assigned according to the majority of objects that originally belonged to the same cluster. For example, if a final cluster consists of 490 objects from cluster A and 10 objects from cluster B, the latter objects are considered misclassified. However, we would like to emphasize that in reality, the correct clustering results would not be available in advance. The incorporation of such known results in our error calculation merely serves the purpose of demonstrating the quality of both algorithms.
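A small sketch of this error measure (Equation 1) appears below; it assumes integer-coded ground-truth labels and uses the majority-label rule just described. The function name is ours.

```python
import numpy as np

def clustering_error(true_labels, found_labels):
    """Equation (1): fraction of misclassified objects, where each discovered
    cluster takes the majority true label of its members (a sketch)."""
    true_labels = np.asarray(true_labels)
    found_labels = np.asarray(found_labels)
    misclassified = 0
    for c in np.unique(found_labels):
        members = true_labels[found_labels == c]
        majority = np.bincount(members).argmax()       # label of this final cluster
        misclassified += np.sum(members != majority)   # e.g. the 10 objects from B
    return misclassified / len(true_labels)
```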

Figure 6: On the left-hand side, we show three instances from each cluster discovered by the I-kMeans algorithm. We can visually verify that our algorithm produces intuitive results. On the right-hand side, we show that hierarchical clustering (using average linkage) discovers the exact same clusters. However, hierarchical clustering is more costly than our algorithm.

For the JPL dataset, we do not have prior knowledge of the correct clustering results (which conforms more closely to real-life cases). Lacking this information, we cannot use the above formula to determine the error.

In Figures 7 and 8, we show the errors/objective functions from the I-kMeans algorithm as a fraction of those obtained from the k-Means algorithm. As we can see from the plots, our algorithm stabilizes at early stages and consistently results in smaller error than the classic k-Means algorithm.

Since the k-Means algorithm seeks to optimize the objective function by minimizing the sum of squared intra-cluster errors, we evaluate the quality of clustering by using the objective functions. However, since the I-kMeans algorithm involves data with smaller dimensionality except for the last level, we have to compute the objective functions on the raw data, in order to compare with the k-Means algorithm. We show that the objective functions obtained from the I-kMeans algorithm are better than those from the k-Means algorithm. The results are consistent with the work of [8], in which the authors show that dimensionality reduction reduces the chances of the algorithm being trapped in a local minimum. Furthermore, even with the additional step of computing the objective functions from the original data, the I-kMeans algorithm still takes less time to execute than the k-Means algorithm.
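For concreteness, the sketch below shows one way this evaluation step could be realized: the memberships found at some level are scored by the sum of squared intra-cluster errors on the raw, full-resolution data. Recomputing the cluster centers on the raw data is our assumption about how the comparison is made, not a detail stated in the paper.

```python
import numpy as np

def kmeans_objective(raw_data, labels, k):
    """Sum of squared intra-cluster errors, evaluated on the raw (full-resolution)
    data so that I-kMeans levels are comparable to k-Means (a sketch)."""
    total = 0.0
    for j in range(k):
        members = raw_data[labels == j]
        if len(members):
            center = members.mean(axis=0)              # cluster center on the raw data
            total += np.sum((members - center) ** 2)   # squared distances to that center
    return total
```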

Figure 7: Error of the I-kMeans algorithm on the Heterogeneous dataset, presented as a fraction of the error from the k-Means algorithm. Our algorithm results in smaller error than k-Means after the second stage (i.e. 4 dimensions), and stabilizes typically after the third stage (i.e. 8 dimensions).

Figure 8: Objective functions of the I-kMeans algorithm on the JPL dataset, presented as a fraction of those from the k-Means algorithm. Again, our algorithm results in smaller objective functions (i.e. better clustering results) than k-Means, and stabilizes typically after the second stage (i.e. 4 dimensions).

4.3 Running Time

In Figures 9 and 10, we present the cumulative running time for each level of the I-kMeans algorithm as a fraction of the k-Means running time. The cumulative running time for any level i is the total running time from the starting level (level 2) to level i. In most cases, even if the I-kMeans algorithm is run to completion, the total running time is still less than that of the k-Means algorithm. We attribute this improvement to the good choices of initial centers for the successive levels after the starting level, since they result in very few iterations until convergence. Nevertheless, we have already shown in the previous section that the I-kMeans algorithm finds the best result at a relatively early stage and does not need to run through all levels.

Figure 9: Cumulative running time for the Heterogeneous dataset. Our algorithm typically cuts the running time by half, as it does not need to run through all levels to retrieve the best results.

Figure 10: Cumulative running time for the JPL dataset. Our algorithm typically takes only 30% of the time. Even if it is run to completion, the cumulative running time is still 50% less than that of the k-Means algorithm!

4.4 I-kMeans Algorithm vs. k-Means Algorithm

In this section (Figures 11 and 12), rather than showing the error/objective function on each level, as in Section 4.2, we present only the error/objective function returned by the I-kMeans algorithm when it stabilizes or, in the case of JPL, outperforms the k-Means algorithm in terms of the objective function. We also present the time taken for the I-kMeans algorithm to stabilize. We compare these results to those of the k-Means algorithm. From the figures we can observe that our algorithm achieves better clustering accuracy at a significantly faster response time.

Figure 11: I-kMeans vs. k-Means on the Heterogeneous dataset, in terms of error and running time. The I-kMeans algorithm is highly competitive with the k-Means algorithm. The errors and execution times are significantly smaller.

Figure 12: I-kMeans vs. k-Means in terms of objective function and running time for the JPL dataset. Our algorithm outperforms the k-Means algorithm. The running time remains small for all data sizes because the algorithm terminates at very early stages.

Figure 13 shows the average level at which the I-kMeans algorithm stabilizes or, in the case of JPL, outperforms the k-Means algorithm in terms of the objective function. Since the length of the time series data is 1024 in the Heterogeneous dataset, there are 11 levels. Note that the JPL dataset has only 10 levels, since the length of the time series data is only 512. We skip level 1, in which the data has only one dimension (the average of the time series) and is the same for all sequences, since the data has been normalized (zero mean). Each level i has 2^(i-1) dimensions. From the plot we can see that our algorithm generally stabilizes at levels 3-6 for the Heterogeneous dataset and at levels 2-4 for the JPL dataset.

In other words, the I-kMeans algorithm operates on data with a maximum dimensionality of 32 and 8, respectively, rather than 1024 and 512.

Figure 13: Average level until stabilization. The algorithm generally stabilizes between levels 3 and 6 for the Heterogeneous dataset, and between levels 2 and 4 for the JPL dataset. It therefore operates on data with a maximum dimensionality of 32 instead of 1024, and 8 instead of 512, respectively.

5. CONCLUSIONS AND FUTURE WORK

We have presented an approach to perform incremental clustering of time series at various resolutions using the Haar wavelet transform. Using k-Means as our clustering algorithm, we reuse the final centers at the end of each resolution as the initial centers for the next level of resolution. This approach resolves the dilemma associated with the choice of initial centers for k-Means and significantly improves the execution time and clustering quality. Our experimental results indicate that this approach yields faster execution time than the traditional k-Means approach, in addition to improving the clustering quality of the algorithm. Since it conforms with the observation that time series data can be described with coarser resolutions while still preserving a general shape, the anytime algorithm stabilizes at very early stages, eliminating the need to operate on high resolutions. In addition, the anytime algorithm allows the user to terminate the program at any stage.

In future work we plan to investigate the following:

• Extending our algorithm to other iterative refinement clustering techniques, especially the EM-algorithm.

• Extending our algorithm to other data types: for example, both images and audio can be successfully represented with wavelets [11, 33].

• Examining the possibility of re-using the results (i.e. the objective functions that determine the quality of the clustering results) from the previous stages to eliminate the need to re-compute all the distances.

• Generalizing our approach to a broader class of algorithms and decompositions. For example, even though we have used the wavelet decomposition in this work, our algorithm can be easily generalized to Fourier coefficients as well.

6. REFERENCES

[1] Agrawal, R., Faloutsos, C. & Swami, A. (1993). Efficient similarity search in sequence databases. In proceedings of the 4th Int'l Conference on Foundations of Data Organization and Algorithms. Chicago, IL, Oct 13-15. pp 69-84.

[2] André-Jönsson, H. & Badal, D. (1997). Using signature files for querying time-series data. In proceedings of Principles of Data Mining and Knowledge Discovery, 1st European Symposium. Trondheim, Norway, Jun 24-27. pp 211-220.

[3] Bay, S. (1999). UCI Repository of KDD databases [http://kdd.ics.uci.edu/]. Irvine, CA: University of California, Department of Information and Computer Science.

[4] Bozkaya, T., Yazdani, N. & Ozsoyoglu, Z. M. (1997). Matching and indexing sequences of different lengths. In proceedings of the 6th Int'l Conference on Information and Knowledge Management. Las Vegas, NV, Nov 10-14. pp 128-135.

[5] Bradley, P., Fayyad, U. & Reina, C. (1998). Scaling clustering algorithms to large databases. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. pp 9-15.

[6] Chan, K. & Fu, A. W. (1999). Efficient time series matching by wavelets. In proceedings of the 15th IEEE Int'l Conference on Data Engineering. Sydney, Australia, Mar 23-26. pp 126-133.

[7] Chu, S., Keogh, E., Hart, D. & Pazzani, M. (2002). Iterative deepening dynamic time warping for time series. In proceedings of the 2nd SIAM International Conference on Data Mining.

[8] Ding, C., He, X., Zha, H. & Simon, H. (2002). Adaptive dimension reduction for clustering high dimensional data. In proceedings of the 2nd IEEE International Conference on Data Mining. Maebashi, Japan, Dec 9-12. pp 147-154.

[9] Das, G., Gunopulos, D. & Mannila, H. (1997). Finding similar time series. In proceedings of Principles of Data Mining and Knowledge Discovery, 1st European Symposium. Trondheim, Norway, Jun 24-27. pp 88-100.
[10] Das, G., Lin, K., Mannila, H., Renganathan, G. & Smyth, P. (1998). Rule discovery from time series. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. pp 16-22.

[11] Daubechies, I. (1992). Ten Lectures on Wavelets. Number 61 in CBMS-NSF regional conference series in applied mathematics, Society for Industrial and Applied Mathematics, Philadelphia.

[12] Debregeas, A. & Hebrail, G. (1998). Interactive interpretation of Kohonen maps applied to curves. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. pp 179-183.

[13] Dumoulin, J. (1998). NSTS 1988 News Reference Manual. http://www.fas.org/spp/civil/sts/

[14] Faloutsos, C., Ranganathan, M. & Manolopoulos, Y. (1994). Fast subsequence matching in time-series databases. In proceedings of the ACM SIGMOD Int'l Conference on Management of Data. Minneapolis, MN, May 25-27. pp 419-429.

[15] Fayyad, U., Reina, C. & Bradley, P. (1998). Initialization of iterative refinement clustering algorithms. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. pp 194-198.

[16] Gavrilov, M., Anguelov, D., Indyk, P. & Motwani, R. (2000). Mining the stock market: which measure is best? In proceedings of the 6th ACM Int'l Conference on Knowledge Discovery and Data Mining. Boston, MA, Aug 20-23. pp 487-496.

[17] Ge, X. & Smyth, P. (2000). Deformable Markov model templates for time-series pattern matching. In proceedings of the 6th ACM SIGKDD Int'l Conference on Knowledge Discovery and Data Mining. Boston, MA, Aug 20-23. pp 81-90.

[18] Geurts, P. (2001). Pattern extraction for time series classification. In proceedings of Principles of Data Mining and Knowledge Discovery, 5th European Conference. Freiburg, Germany, Sept 3-5. pp 115-127.

[19] Grass, J. & Zilberstein, S. (1996). Anytime algorithm development tools. SIGART Artificial Intelligence, Vol 7, No. 2, April. ACM Press.

[20] Huhtala, Y., Kärkkäinen, J. & Toivonen, H. (1999). Mining for similarities in aligned time series using wavelets. Data Mining and Knowledge Discovery: Theory, Tools, and Technology, SPIE Proceedings Series, Vol. 3695. Orlando, FL, Apr. pp 150-160.

[21] Kalpakis, K., Gada, D. & Puttagunta, V. (2001). Distance measures for effective clustering of ARIMA time-series. In proceedings of the IEEE Int'l Conference on Data Mining. San Jose, CA, Nov 29-Dec 2. pp 273-280.

[22] Keogh, E. & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In proceedings of the 4th Int'l Conference on Knowledge Discovery and Data Mining. New York, NY, Aug 27-31. pp 239-241.

[23] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. (2001). Locally adaptive dimensionality reduction for indexing large time series databases. In proceedings of the ACM SIGMOD Conference on Management of Data. Santa Barbara, CA. pp 151-162.

[24] Keogh, E. & Folias, T. (2002). The UCR Time Series Data Mining Archive [http://www.cs.ucr.edu/~eamonn/TSDMA/index.html].

[25] Korn, F., Jagadish, H. & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large datasets of time sequences. In proceedings of the ACM SIGMOD Int'l Conference on Management of Data. Tucson, AZ, May 13-15. pp 289-300.

[26] Li, C., Yu, P. S. & Castelli, V. (1998). MALM: a framework for mining sequence database at multiple abstraction levels. In proceedings of the 7th ACM CIKM Int'l Conference on Information and Knowledge Management. Bethesda, MD, Nov 3-7. pp 267-272.
[27] McQueen, J. (1967). Some methods for classification and analysis of multivariate observation. L. Le Cam and J. Neyman (Eds.), 5th Berkeley Symp. Math. Stat. Prob., 1, pp 281-297.

[28] Ng, A., Jordan, M. & Weiss, Y. (2001). On spectral clustering: analysis and an algorithm. Advances in Neural Information Processing Systems 14.

[29] Popivanov, I. & Miller, R. J. (2002). Similarity search over time series data using wavelets. In proceedings of the 18th Int'l Conference on Data Engineering. San Jose, CA, Feb 26-Mar 1. pp 212-221.

[30] Qu, Y., Wang, C. & Wang, X. S. (1998). Supporting fast search in time series for movement patterns in multiple scales. In proceedings of the 7th ACM CIKM Int'l Conference on Information and Knowledge Management. Bethesda, MD, Nov 3-7. pp 251-258.

[31] Smyth, P. & Wolpert, D. (1997). Anytime exploratory data analysis for massive data sets. In proceedings of the 3rd Int'l Conference on Knowledge Discovery and Data Mining. Newport Beach, CA. pp 54-60.

[32] Shahabi, C., Tian, X. & Zhao, W. (2000). TSA-tree: a wavelet based approach to improve the efficiency of multi-level surprise and trend queries. In proceedings of the 12th Int'l Conference on Scientific and Statistical Database Management. Berlin, Germany, Jul 26-28. pp 55-68.

[33] Struzik, Z. & Siebes, A. (1999). The Haar wavelet transform in the time series similarity paradigm. In proceedings of Principles of Data Mining and Knowledge Discovery, 3rd European Conference. Prague, Czech Republic, Sept 15-18. pp 12-22.

[34] van Wijk, J. J. & van Selow, E. (1999). Cluster and calendar-based visualization of time series data. In proceedings of the 1999 IEEE Symposium on Information Visualization (InfoVis'99), IEEE Computer Society. pp 4-9.

[35] Wu, Y., Agrawal, D. & El Abbadi, A. (2000). A comparison of DFT and DWT based similarity search in time-series databases. In proceedings of the 9th ACM CIKM Int'l Conference on Information and Knowledge Management. McLean, VA, Nov 6-11. pp 488-495.

[36] Yi, B. & Faloutsos, C. (2000). Fast time sequence indexing for arbitrary Lp norms. In proceedings of the 26th Int'l Conference on Very Large Databases. Cairo, Egypt, Sept 10-14. pp 385-394.