Linear Dynamics: Clustering without identification

Chloe Ching-Yun Hsu†
University of California, Berkeley

Michaela Hardt†
Amazon

Moritz Hardt†
University of California, Berkeley

Abstract

Linear dynamical systems are a fundamental and powerful parametric model class. However, identifying the parameters of a linear system is a venerable task, permitting provably efficient solutions only in special cases. This work shows that the eigenspectrum of unknown linear dynamics can be identified without full system identification. We analyze a computationally efficient and provably convergent algorithm to estimate the eigenvalues of the state-transition matrix in a linear dynamical system.

When applied to time series clustering, our algorithm can efficiently cluster multi-dimensional time series with temporal offsets and varying lengths, under the assumption that the time series are generated from linear dynamical systems. Evaluating our algorithm on both synthetic data and real electrocardiogram (ECG) signals, we see improvements in clustering quality over existing baselines.

1 Introduction

The linear dynamical system (LDS) is a simple yet general model for time series. Many machine learning models are special cases of linear dynamical systems [Roweis and Ghahramani, 1999], including principal component analysis (PCA), mixtures of Gaussians, Kalman filter models, and hidden Markov models.

When the states are hidden, LDS parameter identification has provably efficient solutions only in special cases; see for example [Hazan et al., 2018, Hardt et al., 2018, Simchowitz et al., 2018]. In practice, the expectation–maximization (EM) algorithm [Ghahramani and Hinton, 1996] is often used for LDS parameter estimation, but it is inherently non-convex and can often get stuck in local minima [Hazan et al., 2018]. Even when full system identification is hard, is there still hope to learn meaningful information about linear dynamics without learning all system parameters? We provide a positive answer to this question.

We show that the eigenspectrum of the state-transition matrix of unknown linear dynamics can be identified without full system identification. The eigenvalues of the state-transition matrix play a significant role in determining the properties of a linear system. For example, in two dimensions, the eigenvalues determine the stability of a linear dynamical system. Based on the trace and the determinant of the state-transition matrix, we can classify a linear system as a stable node, a stable spiral, a saddle, an unstable node, or an unstable spiral.

To estimate the eigenvalues, we utilize a fundamental correspondence between linear systems and autoregressive-moving-average (ARMA) models. We establish bi-directional perturbation bounds to prove that two LDSs have similar eigenvalues if and only if their output time series have similar autoregressive parameters. Based on a consistent estimator for the autoregressive parameters of ARMA models [Tsay and Tiao, 1984], we propose a regularized iterated least-squares regression method to estimate the LDS eigenvalues. Our method runs in time linear in the sequence length T and converges to the true eigenvalues at the rate O_p(T^{-1/2}).

As one application, our eigenspectrum estimation algorithm gives rise to a simple approach for time series clustering: first use regularized iterated least-squares regression to fit the autoregressive parameters; then cluster the fitted autoregressive parameters.

This simple and efficient clustering approach captures similarity in eigenspectrums, assuming each time series comes from an underlying linear dynamical system.

† This work was done at Google.
It is a suitable similarity measure when the main goal of clustering is to characterize state-transition dynamics regardless of change of basis, which is particularly relevant when there are multiple data sources with different measurement procedures. Our approach bypasses the challenge of full LDS parameter estimation, while enjoying the natural flexibility to handle multi-dimensional time series with time offsets and partial sequences.

To verify that our method efficiently learns system eigenvalues on synthetic and real ECG data, we compare our approach to existing baselines, including model-based approaches built on LDS, AR, and ARMA parameter estimation and on PCA, as well as model-agnostic clustering approaches such as dynamic time warping [Cuturi and Blondel, 2017] and k-Shape [Paparrizos and Gravano, 2015].

Organization. We review LDS and ARMA models in Sec. 3. In Sec. 4 we discuss the main technical results around the correspondence between LDS and ARMA. Sec. 5 presents the regularized iterated regression algorithm, a consistent estimator of autoregressive parameters in ARMA models, with applications to clustering. We carry out eigenvalue estimation and clustering experiments on synthetic data and real ECG data in Sec. 6. In the appendix, we describe generalizations to observable inputs and multidimensional outputs, and include additional simulation results.

2 Related Work

Linear dynamical system identification. The LDS identification problem has been studied since the 60s [Kalman, 1960], yet the theoretical bounds are still not fully understood. Recent provably efficient algorithms [Simchowitz et al., 2018, Hazan et al., 2018, Hardt et al., 2018, Dean et al., 2017] require setups that are not best-suited for time series clustering, such as assuming observable states and focusing on prediction error instead of parameter recovery.

On recovering system parameters without observed states, Tsiamis et al. recently studied a subspace identification algorithm with a non-asymptotic O(T^{-1/2}) rate [Tsiamis and Pappas, 2019]. While our analysis does not provide finite sample complexity bounds, our simple algorithm achieves the same rate asymptotically.

Model-based time series clustering. Common model choices for clustering include Gaussian mixture models [Biernacki et al., 2000], autoregressive integrated moving average (ARIMA) models [Kalpakis et al., 2001], and hidden Markov models [Smyth, 1997]. All three are special cases of the more general linear dynamical system model [Roweis and Ghahramani, 1999].

Linear dynamical systems have been used to cluster video trajectories [Chan and Vasconcelos, 2005, Afsari et al., 2012, Vishwanathan et al., 2007], where the observed time series are higher dimensional than the hidden state dimension. Our work is motivated by the more challenging situation with a single or a few output dimensions, common in climatology, energy consumption, finance, medicine, etc.

Compared to ARMA-parameter based clustering, our method only uses the AR half of the parameters, which we show to enjoy more reliable convergence. We also differ from AR-model based clustering because fitting an AR model to an ARMA process results in biased estimates.

Autoregressive parameter estimation. Existing spectral analysis methods for estimating AR parameters in ARMA models include high-order Yule-Walker (HOYW), MUSIC, and ESPRIT [Stoica et al., 2005, Stoica et al., 1988]. Our method is based on iterated regression [Tsay and Tiao, 1984], a more flexible method that handles observed exogenous inputs (see Appendix C) in the ARMAX generalization.

3 Preliminaries

3.1 Linear dynamical systems

A discrete-time linear dynamical system (LDS) with parameters Θ = (A, B, C, D) receives inputs x_1, ..., x_T ∈ R^k, has hidden states h_0, ..., h_T ∈ R^n, and generates outputs y_1, ..., y_T ∈ R^m according to the time-invariant recursive equations

$$h_t = A h_{t-1} + B x_t + \zeta_t, \qquad y_t = C h_t + D x_t + \xi_t. \tag{1}$$

Assumptions. We assume that the stochastic noise terms ζ_t and ξ_t are diagonal Gaussians. We also assume the system is observable, i.e.
C, CA, CA^2, ..., CA^{n-1} are linearly independent. When the LDS is not observable, the ARMA model for the output series can be reduced to a lower AR order, and there is not enough information in the output series to recover the full eigenspectrum.

The model equivalence theorem (Theorem 4.1) and the approximation theorem (Theorem 4.2) do not require any additional assumptions and hold for any real matrix A. When additionally assuming that A only has simple eigenvalues in C, i.e. each eigenvalue has multiplicity 1, we give a better convergence bound.

Distance between linear dynamical systems. With the main goal to characterize state-transition dynamics, we view systems as equivalent up to change of basis, and use the ℓ2 distance of the spectrum of the transition matrix A, i.e.

$$d(\Theta_1, \Theta_2) = \|\lambda(A_1) - \lambda(A_2)\|_2,$$

where λ(A_1) and λ(A_2) are the spectra of A_1 and A_2 in sorted order. This distance satisfies non-negativity, identity, symmetry, and the triangle inequality.
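As a concrete illustration, here is a minimal NumPy sketch of this eigenspectrum distance. The lexicographic sorting convention and the helper names are our own assumptions; any fixed, consistent ordering of the spectrum serves the definition above.

```python
import numpy as np

def sorted_spectrum(A):
    """Eigenvalues of a transition matrix in a fixed sorted order
    (lexicographic by real, then imaginary part -- an assumed convention)."""
    lam = np.linalg.eigvals(A)
    return lam[np.lexsort((lam.imag, lam.real))]

def lds_distance(A1, A2):
    """l2 distance between the sorted spectra of two transition matrices."""
    return np.linalg.norm(sorted_spectrum(A1) - sorted_spectrum(A2))

# Two systems related by a change of basis have distance ~0.
A = np.array([[0.9, -0.3], [0.3, 0.9]])
P = np.array([[2.0, 1.0], [0.0, 1.0]])   # non-singular change of basis
print(lds_distance(A, np.linalg.inv(P) @ A @ P))  # ~1e-15
```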

Two very different time series could still have a small distance in eigenspectrum. This is by design, to allow flexibility for different measurement procedures, which mathematically correspond to different C matrices. When there are multiple data sources with different measurement procedures, our approach can compare the underlying dynamics of time series across sources.

Jordan canonical basis. Every square real matrix is similar to a complex block diagonal matrix known as its Jordan canonical form (JCF). In the special case of diagonalizable matrices, the JCF is the diagonal form. Based on the JCF, there exists a canonical basis {e_i} consisting only of eigenvectors and generalized eigenvectors of A. A vector v is a generalized eigenvector of rank µ with corresponding eigenvalue λ if (λI − A)^µ v = 0 and (λI − A)^{µ−1} v ≠ 0.

3.2 Autoregressive-moving-average models

The autoregressive-moving-average (ARMA) model combines the autoregressive (AR) model and the moving-average (MA) model. The AR part involves regressing the variable with respect to its lagged past values, while the MA part involves regressing the variable against past error terms.

Autoregressive model. The AR model describes how the current value in the time series depends on the lagged past values. For example, if the GDP realization is high this quarter, the GDP in the next few quarters is likely high as well. An autoregressive model of order p, denoted AR(p), depends on the past p steps,

$$y_t = c + \sum_{i=1}^{p} \varphi_i y_{t-i} + \epsilon_t,$$

where ϕ_1, ..., ϕ_p are autoregressive parameters, c is a constant, and ε_t is white noise.

When the errors are normally distributed, ordinary least squares (OLS) regression is a conditional maximum likelihood estimator for AR models, yielding optimal estimates [Durbin, 1960].
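For a pure AR(p) model, this fit is a single least-squares solve. A minimal NumPy sketch (the helper name `fit_ar_ols` is our own):

```python
import numpy as np

def fit_ar_ols(y, p):
    """OLS estimates of the AR(p) parameters (phi_1, ..., phi_p) and c."""
    y = np.asarray(y, dtype=float)
    t = np.arange(p, len(y))
    # Regress y_t on its past p values and an intercept column.
    X = np.column_stack([y[t - j] for j in range(1, p + 1)] + [np.ones(len(t))])
    beta, *_ = np.linalg.lstsq(X, y[t], rcond=None)
    return beta[:p], beta[p]
```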

Moving-average model. The MA model, on the other hand, captures the delayed effects of unobserved random shocks in the past. For example, changes in winter weather could have a delayed effect on food harvest in the next fall. A moving-average model of order q, denoted MA(q), depends on unobserved lagged errors in the past q steps,

$$y_t = c + \epsilon_t + \sum_{i=1}^{q} \theta_i \epsilon_{t-i},$$

where θ_1, ..., θ_q are moving-average parameters, c is a constant, and the errors ε_t are white noise.

ARMA model. The autoregressive-moving-average (ARMA) model, denoted ARMA(p, q), merges AR(p) and MA(q) models to consider dependencies both on past time series values and on past unpredictable shocks,

$$y_t = c + \epsilon_t + \sum_{i=1}^{p} \varphi_i y_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i}.$$

ARMAX model. ARMA can be generalized to the autoregressive-moving-average model with exogenous inputs (ARMAX),

$$y_t = c + \epsilon_t + \sum_{i=1}^{p} \varphi_i y_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \sum_{i=0}^{r} \gamma_i x_{t-i},$$

where {x_t} is a known external time series, possibly multidimensional. In case x_t is a vector, the parameters γ_i are also vectors.

Estimating ARMA and ARMAX models is significantly harder than estimating AR models, since the model depends on unobserved variables and the maximum likelihood equations are intractable [Durbin, 1959, Choi, 2012]. Maximum likelihood estimation (MLE) methods are commonly used for fitting ARMA and ARMAX [Guo, 1996, Bercu, 1995, Hannan et al., 1980], but have convergence issues. Although regression methods are also used in practice, OLS is a biased estimator for ARMA models [Tiao and Tsay, 1983].
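This bias is easy to see numerically. In the following minimal sketch (the parameter values are our own choice), the OLS regression of y_t on y_{t−1} for an ARMA(1,1) process converges to the lag-1 autocorrelation, which sits noticeably above the true ϕ_1 = 0.5:

```python
import numpy as np

rng = np.random.default_rng(0)
phi, theta, T = 0.5, 0.5, 200_000
eps = rng.standard_normal(T)
y = np.zeros(T)
for t in range(1, T):
    # ARMA(1,1): y_t = phi * y_{t-1} + eps_t + theta * eps_{t-1}
    y[t] = phi * y[t - 1] + eps[t] + theta * eps[t - 1]

phi_ols = (y[1:] @ y[:-1]) / (y[:-1] @ y[:-1])
print(phi_ols)  # approx. 0.71, biased away from the true phi = 0.5
```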

Lag operator. We also introduce the lag operator, a concise way to describe ARMA models [Granger and Morris, 1976], defined by L y_t = y_{t−1}. The lag operator can be raised to powers and can form polynomials; for example, L^3 y_t = y_{t−3}, and (a_2 L^2 + a_1 L + a_0) y_t = a_2 y_{t−2} + a_1 y_{t−1} + a_0 y_t. Lag polynomials can be multiplied or inverted. An AR(p) model can be characterized by

$$\Phi(L) y_t = c + \epsilon_t,$$

where Φ(L) = 1 − ϕ_1 L − ... − ϕ_p L^p is a polynomial of degree p in the lag operator L. For example, any AR(2) model can be described as (1 − ϕ_1 L − ϕ_2 L^2) y_t = c + ε_t.

Similarly, an MA(q) model can be characterized by a polynomial Ψ(L) = θ_q L^q + ... + θ_1 L + 1 of degree q,

$$y_t = c + \Psi(L) \epsilon_t.$$

For example, for an MA(2) model the equation would be y_t = c + (θ_2 L^2 + θ_1 L + 1) ε_t.

Merging the two and adding the dependency on exogenous inputs, we can write an ARMAX(p, q, r) model as

$$\Phi(L) y_t = c + \Psi(L) \epsilon_t + \Gamma(L) x_t, \tag{2}$$

where Φ, Ψ, and Γ are polynomials of degree p, q, and r. When the exogenous time series x_t is multidimensional, Γ(L) is a vector of degree-r polynomials.
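The lag-polynomial convention carries over directly to simulation code. For instance, statsmodels' `arma_generate_sample` takes the coefficients of Φ(L) and Ψ(L) including the zero-lag 1 (a sketch; the particular coefficient values are arbitrary):

```python
import numpy as np
from statsmodels.tsa.arima_process import arma_generate_sample

# ARMA(2,1) in lag-polynomial form:
#   (1 - 0.75 L + 0.25 L^2) y_t = (1 + 0.4 L) eps_t
ar = np.array([1.0, -0.75, 0.25])  # coefficients of Phi(L), zero lag first
ma = np.array([1.0, 0.4])          # coefficients of Psi(L), zero lag first
y = arma_generate_sample(ar, ma, nsample=1000)
```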

4 Learning eigenvalues without system identification

This section provides the theoretical foundations for learning LDS eigenvalues from autoregressive parameters without full system identification. While general model equivalence between LDS and ARMA(X) is known [Åström and Wittenmark, 2013, Kailath, 1980], we provide a detailed analysis of the exact correspondence between the LDS characteristic polynomial and the ARMA(X) autoregressive parameters, along with perturbation bounds.

4.1 Model equivalence

We show that the output series from any LDS can be seen as generated by an ARMAX model, whose AR parameters contain full information about the LDS eigenvalues.

Theorem 4.1. Let y_t ∈ R^m be the outputs from a linear dynamical system with parameters Θ = (A, B, C, D), hidden dimension n, and inputs x_t ∈ R^k. Each dimension of y_t can be generated by an ARMAX(n, n, n−1) model, whose autoregressive parameters ϕ_1, ..., ϕ_n recover the characteristic polynomial of A by

$$\chi_A(\lambda) = \lambda^n - \varphi_1 \lambda^{n-1} - \dots - \varphi_n.$$

In the special case where the LDS has no external inputs, the ARMAX model is an ARMA(n, n) model.

See Appendix A for the full proof. As a high-level proof sketch: we first analyze the hidden state projected onto (generalized) eigenvector directions in Lemma A.2. We show that for a (generalized) eigenvector e of the adjoint A* of the transition matrix, with eigenvalue λ and rank µ, the time series obtained from applying the lag operator polynomial (1 − λL)^µ to ⟨h_t, e⟩ can be expressed as a linear combination of the past k inputs x_t, ..., x_{t−k+1}. Since A is real-valued, A and its adjoint A* share the same characteristic polynomial χ_A.

We then consider the lag operator polynomial χ†_A(L) = L^n χ_A(L^{−1}), and show that the time series obtained from applying χ†_A(L) to any (generalized) eigenvector direction is a linear combination of the past k inputs. We use this on the Jordan canonical basis for A* that consists of (generalized) eigenvectors. From there, we conclude that χ†_A is the autoregressive lag polynomial that contains the autoregressive coefficients.

The converse of Theorem 4.1 also holds: an ARMA(p, q) model can be seen as a (p+q)-dimensional LDS where the state encodes the relevant past values and error terms.

Corollary 4.1. The output series of two linear dynamical systems have the same autoregressive parameters if and only if they have the same non-zero eigenvalues with the same multiplicities.

Proof. By Theorem 4.1, the autoregressive parameters are determined by the characteristic polynomial. Two LDSs of the same dimension have the same autoregressive parameters if and only if they have the same characteristic polynomials, and hence the same eigenvalues with the same multiplicities. Two LDSs of different dimensions n_1 < n_2 can have the same autoregressive parameters if and only if χ_{A_2}(λ) = λ^{n_2−n_1} χ_{A_1}(λ) and ϕ_{n_1+1} = ... = ϕ_{n_2} = 0, in which case they have the same non-zero eigenvalues with the same multiplicities.
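A quick numerical check of this correspondence (a sketch; `numpy.poly` returns the monic characteristic polynomial coefficients of a square matrix): reading the AR parameters off χ_A and taking roots recovers exactly the eigenvalues of A.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

# chi_A(lambda) = lambda^n - phi_1 lambda^{n-1} - ... - phi_n  (Theorem 4.1)
coeffs = np.poly(A)    # characteristic polynomial, highest degree first
phi = -coeffs[1:]      # autoregressive parameters phi_1, ..., phi_n

eig_from_ar = np.roots(np.concatenate(([1.0], -phi)))
print(np.sort_complex(eig_from_ar))
print(np.sort_complex(np.linalg.eigvals(A)))  # identical up to rounding
```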

It is possible for two LDSs with different dimensions to have the same AR coefficients if the higher-dimensional system has additional zero eigenvalues. Whether over-parameterized models indeed learn additional zero eigenvalues requires further empirical investigation.

4.2 Approximation theorems for LDS eigenvalues

We show that a small error in the AR parameter estimation guarantees a small error in the eigenvalue estimation. This implies that an effective estimation algorithm for the AR parameters in ARMAX models leads to effective estimation of LDS eigenvalues.

General (1/n)-exponent bound

Theorem 4.2. Let y_t be the outputs from an n-dimensional linear dynamical system with parameters Θ = (A, B, C, D), eigenvalues λ_1, ..., λ_n, and hidden inputs. Let Φ̂ = (ϕ̂_1, ..., ϕ̂_n) be the estimated autoregressive parameters for {y_t} with error ‖Φ̂ − Φ‖ = ε, and let r_1, ..., r_n be the roots of the polynomial 1 − ϕ̂_1 z − ... − ϕ̂_n z^n. Assuming the LDS is observable, the roots converge to the true eigenvalues with convergence rate O(ε^{1/n}). If all eigenvalues of A are simple (no multiplicity), then the convergence rate is O(ε).

Without additional assumptions, the (1/n)-exponent in the above general bound is tight. As an example, z^2 − ε has roots ±√ε. The general phenomenon that a root with multiplicity m can split into m roots at rate O(ε^{1/m}) is related to the regular splitting property [Hryniv and Lancaster, 1999, Lancaster et al., 2003].
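The splitting rate is easy to observe numerically (a minimal sketch): shrinking ε by a factor of 100 only shrinks the perturbed double root by a factor of 10.

```python
import numpy as np

for eps in [1e-2, 1e-4, 1e-6]:
    # z^2 - eps: the double root at 0 splits into +/- sqrt(eps).
    roots = np.roots([1.0, 0.0, -eps])
    print(eps, np.max(np.abs(roots)))  # prints 1e-1, 1e-2, 1e-3
```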

Linear bound for simple eigenvalues

Under the additional assumption that all the eigenvalues are simple (no multiplicity), we derive a better O(ε) bound instead of O(ε^{1/n}). We show that a small perturbation in AR parameters results in a small perturbation of the companion matrix, and a small perturbation of the companion matrix results in a small perturbation of the eigenvalues. We defer the full proofs to Appendix B, but describe the proof ideas here.

For a monic polynomial Φ(z) = z^n + ϕ_1 z^{n−1} + ... + ϕ_{n−1} z + ϕ_n, the companion matrix, also known as the controllable canonical form in control theory, is the square matrix

$$C(\Phi) = \begin{pmatrix} 0 & 0 & \dots & 0 & -\varphi_n \\ 1 & 0 & \dots & 0 & -\varphi_{n-1} \\ 0 & 1 & \dots & 0 & -\varphi_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & -\varphi_1 \end{pmatrix}.$$

The matrix C(Φ) is the companion in the sense that it has Φ as its characteristic polynomial. In relation to an autoregressive AR(p) model, the companion matrix corresponds to the transition matrix of the linear dynamical system when we encode the values from the past p lags as a p-dimensional state h_t = (y_{t−p+1}, ..., y_{t−1}, y_t)^T. If y_t = ϕ_1 y_{t−1} + ... + ϕ_p y_{t−p}, then

$$h_t = \begin{pmatrix} y_{t-p+1} \\ y_{t-p+2} \\ \vdots \\ y_{t-1} \\ y_t \end{pmatrix} = \begin{pmatrix} 0 & 1 & 0 & \dots & 0 \\ 0 & 0 & 1 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1 \\ \varphi_p & \varphi_{p-1} & \varphi_{p-2} & \dots & \varphi_1 \end{pmatrix} \begin{pmatrix} y_{t-p} \\ y_{t-p+1} \\ \vdots \\ y_{t-2} \\ y_{t-1} \end{pmatrix} = C(-\Phi)^T h_{t-1}. \tag{3}$$

We then use matrix eigenvalue perturbation theory results on the companion matrix for the desired bound.
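A small sketch of the companion construction in the layout displayed above (the helper `companion` is our own; SciPy's `scipy.linalg.companion` builds an equivalent matrix in a different layout): the eigenvalues of C(Φ) are exactly the roots of Φ.

```python
import numpy as np

def companion(phi):
    """Companion matrix of z^n + phi[0] z^{n-1} + ... + phi[n-1], with ones
    on the subdiagonal and negated coefficients in the last column."""
    n = len(phi)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)
    C[:, -1] = -np.asarray(phi)[::-1]
    return C

# Phi(z) = z^2 - 3z + 2 = (z - 1)(z - 2)
C = companion([-3.0, 2.0])
print(np.linalg.eigvals(C))        # [1., 2.] up to rounding and order
print(np.roots([1.0, -3.0, 2.0]))  # the same roots
```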

Lemma 4.1 (Theorem 6 in [Lancaster et al., 2003]). Let L(λ, ε) be an analytic matrix function with semisimple eigenvalue λ_0 at ε = 0 of multiplicity M. Then there are exactly M eigenvalues λ_i(ε) of L(λ, ε) for which λ_i(ε) → λ_0 as ε → 0, and for these eigenvalues

$$\lambda_i(\epsilon) = \lambda_0 + \lambda_i' \epsilon + o(\epsilon). \tag{4}$$

Explicit bound on condition number

When the LDS has all simple eigenvalues, we provide a more explicit bound on the condition number.

Theorem 4.3. In the same setting as Theorem 4.2, when all eigenvalues of A are simple, |r_j − λ_j| ≤ κε + o(ε), where the condition number κ is bounded by

$$\frac{1}{\prod_{k \neq j} |\lambda_j - \lambda_k|} \;\le\; \kappa \;\le\; \frac{\sqrt{n}\,\big(\max(1, |\lambda_j|)\big)^{n-1}\,\big(1 + \rho(A)^2\big)^{\frac{n-1}{2}}}{\prod_{k \neq j} |\lambda_j - \lambda_k|},$$

where ρ(A) is the spectral radius, i.e. the largest absolute value of its eigenvalues.

In particular, when ρ(A) ≤ 1, i.e. when the matrix is Lyapunov stable, the absolute difference between the root from the autoregressive method and the eigenvalue is bounded by

$$|r_j - \lambda_j| \;\le\; \frac{\sqrt{n}\,(\sqrt{2})^{n-1}}{\prod_{k \neq j} |\lambda_j - \lambda_k|}\,\epsilon + o(\epsilon).$$

In Appendix B, we derive the explicit formula in Theorem 4.3 by conjugating the companion matrix by a Vandermonde matrix to diagonalize it, and invoking the explicit inverse formula of Vandermonde matrices.

5 Estimation of ARMA autoregressive parameters

In general, learning ARMA models is hard, since the output series depends on unobserved error terms. Fortunately, for our purpose we are only interested in the autoregressive parameters, which are easier to learn since the past values of the time series are observed. The autoregressive parameters in an ARMA(p, q) model are not equivalent to the pure AR(p) parameters for the same time series. For AR(p) models, ordinary least squares (OLS) regression is a consistent estimator of the autoregressive parameters [Lai and Wei, 1983]. However, for ARMA(p, q) models, due to the serial correlation in the error term ε_t + Σ_{i=1}^q θ_i ε_{t−i}, the OLS estimates of the autoregressive parameters can be biased [Tiao and Tsay, 1983].

Regularized iterated regression. Iterated regression [Tsay and Tiao, 1984] is a consistent estimator for the AR parameters in ARMA models. While iterated regression is theoretically well-grounded, it tends to over-fit and results in excessively large parameters. To avoid over-fitting, we propose a slight modification with regularization, which keeps the same theoretical guarantees and yields better practical performance.

We also generalize the method to handle multidimensional outputs from the LDS and observed inputs by using ARMAX instead of ARMA models, as described in detail in Appendix C as Algorithm 2.

The i-th iteration of the regression only uses error terms from the past i lags. The initial iteration is an ARMA(n, 0) regression, the first iteration is an ARMA(n, 1) regression, and so forth until ARMA(n, n) in the last iteration.

Algorithm 1: Regularized iterated regression for autoregressive parameter estimation

Input: Time series {y_t}_{t=1}^T, target hidden state dimension n, and regularization coefficient α.
Initialize error term estimates ε̂_t = 0 for t = 1, ..., T.
For i = 0, ..., n:
    Perform ℓ2-regularized least squares regression to estimate ϕ̂_j, θ̂_j, and ĉ in
        y_t = Σ_{j=1}^n ϕ̂_j y_{t−j} + Σ_{j=1}^i θ̂_j ε̂_{t−j} + ĉ,
    with regularization strength α only on the θ̂_j terms.
    Update ε̂_t to be the residuals from the most recent regression.
Return ϕ̂_1, ..., ϕ̂_n.

Time complexity. The iterated regression involves n + 1 steps of least squares regression, each on at most 2n + 1 variables. Therefore, the total time complexity of Algorithm 1 is O(n^3 T + n^4), where T is the sequence length and n is the hidden state dimension.

Convergence rate. The consistency and the convergence rate of the estimator are analyzed in [Tsay and Tiao, 1984]. Adding regularization does not change the asymptotic properties of the estimator.

Theorem 5.1 ([Tsay and Tiao, 1984]). Suppose that y_t is an ARMA(p, q) process, stationary or not. The estimated autoregressive parameters Φ̂ = (ϕ̂_1, ..., ϕ̂_n) from iterated regression converge in probability to the true parameters with rate

$$\hat{\Phi} = \Phi + O_p(T^{-1/2}),$$

or more explicitly, convergence in probability means that for all ε, lim_{T→∞} Pr(T^{1/2} ‖Φ̂ − Φ‖ > ε) = 0.
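A minimal NumPy sketch of Algorithm 1 (our own rendition, with a plain ridge solve per iteration; a production version would also handle observed inputs, as in Algorithm 2):

```python
import numpy as np

def iterated_regression(y, n, alpha=0.01):
    """Regularized iterated regression (Algorithm 1 sketch).

    At iteration i, regress y_t on its past n values, the past i estimated
    error terms, and an intercept, penalizing only the theta coefficients.
    Returns the estimated AR parameters (phi_1, ..., phi_n).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    eps_hat = np.zeros(T)              # error term estimates, initialized to 0
    for i in range(n + 1):
        t = np.arange(n, T)            # rows with a full window of lags
        cols = [y[t - j] for j in range(1, n + 1)]          # AR lags
        cols += [eps_hat[t - j] for j in range(1, i + 1)]   # MA lags
        cols += [np.ones(len(t))]                           # intercept
        X = np.column_stack(cols)
        penalty = np.zeros(X.shape[1])
        penalty[n:n + i] = alpha       # ridge penalty on theta terms only
        beta = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y[t])
        eps_hat = np.zeros(T)
        eps_hat[t] = y[t] - X @ beta   # residuals become new error estimates
    return beta[:n]

# Eigenvalue estimates then follow from Theorem 4.1:
#   np.roots(np.concatenate(([1.0], -iterated_regression(y, n))))
```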

5.1 Applications to clustering

The task of clustering depends on an appropriate distance measure for the clustering purpose. When there are multiple data sources for time series with comparable dynamics but measured by different measurement procedures, one might hope to cluster the time series based only on the state-transition dynamics.

We can observe from the LDS definition (1) that two LDSs with parameters (A, B, C, D) and (A′, B′, C′, D′) are equivalent if A′ = P^{−1}AP, B′ = P^{−1}B, C′ = CP, and D′ = D, with h′_t = P^{−1}h_t, under change of basis by some non-singular matrix P. Therefore, to capture the state-transition dynamics while allowing flexibility in the measurement matrix C, the distance measure should be invariant under change of basis. We choose to use the eigenspectrum distance between state-transition matrices, a natural distance choice that is invariant under change of basis.

In the previous sections, our theoretical analysis shows that AR parameters in ARMA time series models can effectively estimate the eigenspectrum of the underlying LDSs. We therefore propose a simple time series clustering algorithm: 1) first use iterated regression to estimate the autoregressive parameters in ARMA models for each time series, and 2) then apply any standard clustering algorithm, such as K-means, on the distance between autoregressive parameters.

Our method is very flexible. It handles multi-dimensional data, as Theorem 4.1 suggests that all output series from the same LDS should share the same autoregressive parameters. It can also handle exogenous inputs, as illustrated in Algorithm 2 in Appendix C. It is scale, shift, and offset invariant, as the autoregressive parameters in ARMA models are. It accommodates missing values in partial sequences, as we can still perform OLS after dropping the rows with missing values. It also allows sequences to have different lengths, and could be adapted to handle sequences with different sampling frequencies, as the composition of multiple steps of LDS evolution is still linear.

6 Experiments

We experimentally evaluate the quality and efficiency of the clustering from our method and compare it to existing baselines. The source code is available online at https://github.com/chloechsu/ldseig.

6.1 Methods

• ARMA: K-means on AR parameters in an ARMA(n, n) model estimated by regularized iterated regression, as proposed in Algorithm 1.
• ARMA_MLE: K-means on AR parameters in an ARMA(n, n) model estimated by the MLE method using statsmodels [Seabold and Perktold, 2010].
• AR: K-means on AR parameters in an AR(n) model estimated by OLS using statsmodels.
• LDS: K-means on estimated LDS eigenvalues. We estimate the LDS eigenvalues with the pylds package [Johnson and Linderman, 2018], with 100 EM steps initialized by 10 Gibbs iterations.
• k-Shape: A shape-based time series clustering method [Paparrizos and Gravano, 2015], using the tslearn package [Tavenard, 2017].
• DTW: K-medoids on dynamic time warping distance, using the dtaidistance [Meert, 2018] and pyclustering [Novikov, 2019] packages.
• PCA: K-means on the first n PCA components, using sklearn [Pedregosa et al., 2011].
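The ARMA baseline above is essentially the two-step pipeline proposed in Sec. 5.1. A minimal sketch reusing the `iterated_regression` sketch from Section 5 (series may have different lengths, since each series is reduced to an n-vector of AR parameters):

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_series(series_list, n, k, alpha=0.01):
    """1) Fit AR parameters per series by iterated regression;
    2) run K-means on the fitted parameter vectors."""
    features = np.stack([iterated_regression(y, n, alpha) for y in series_list])
    return KMeans(n_clusters=k, n_init=10).fit_predict(features)
```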

# Clusters  Method    Adj. Mutual Info.   Adj. Rand Score     V-measure           Runtime (secs)
2           AR        0.06 (0.04-0.08)    0.07 (0.05-0.09)    0.07 (0.05-0.09)    1.09 (1.01-1.17)
2           ARMA      0.13 (0.11-0.16)    0.16 (0.13-0.19)    0.14 (0.11-0.16)    0.44 (0.40-0.47)
2           ARMA_MLE  0.02 (0.01-0.03)    0.02 (0.01-0.03)    0.03 (0.02-0.04)    70.64 (68.01-73.28)
2           DTW       0.02 (0.01-0.03)    0.02 (0.01-0.03)    0.03 (0.02-0.04)    6.60 (6.34-6.86)
2           k-Shape   0.03 (0.02-0.04)    0.03 (0.02-0.05)    0.04 (0.03-0.05)    28.58 (25.15-32.01)
2           LDS       0.09 (0.06-0.12)    0.09 (0.06-0.12)    0.10 (0.07-0.12)    341.08 (328.24-353.93)
2           PCA       -0.00 (-0.00-0.00)  -0.00 (-0.00-0.00)  0.02 (0.02-0.02)    0.45 (0.43-0.47)
3           AR        0.11 (0.09-0.12)    0.09 (0.07-0.10)    0.12 (0.11-0.14)    1.02 (0.93-1.10)
3           ARMA      0.18 (0.16-0.20)    0.16 (0.14-0.18)    0.19 (0.17-0.21)    0.42 (0.38-0.46)
3           ARMA_MLE  0.04 (0.03-0.05)    0.04 (0.03-0.05)    0.06 (0.05-0.07)    72.58 (69.63-75.52)
3           DTW       0.04 (0.03-0.05)    0.03 (0.02-0.03)    0.07 (0.06-0.08)    6.63 (6.37-6.90)
3           k-Shape   0.06 (0.05-0.07)    0.04 (0.04-0.05)    0.08 (0.07-0.09)    40.10 (34.65-45.55)
3           LDS       0.20 (0.18-0.23)    0.17 (0.15-0.20)    0.22 (0.19-0.24)    338.67 (325.66-351.67)
3           PCA       0.00 (-0.00-0.00)   0.00 (-0.00-0.00)   0.04 (0.04-0.04)    0.47 (0.45-0.49)
5           AR        0.17 (0.16-0.18)    0.11 (0.10-0.12)    0.22 (0.21-0.23)    0.91 (0.83-1.00)
5           ARMA      0.22 (0.21-0.23)    0.15 (0.14-0.16)    0.26 (0.25-0.28)    0.40 (0.35-0.45)
5           ARMA_MLE  0.08 (0.07-0.09)    0.05 (0.04-0.05)    0.14 (0.13-0.15)    74.00 (71.70-76.30)
5           DTW       0.05 (0.04-0.06)    0.03 (0.02-0.03)    0.11 (0.10-0.12)    6.20 (5.86-6.55)
5           k-Shape   0.08 (0.07-0.09)    0.05 (0.04-0.05)    0.14 (0.13-0.15)    97.83 (84.97-110.68)
5           LDS       0.25 (0.23-0.26)    0.17 (0.16-0.18)    0.29 (0.28-0.30)    321.40 (304.17-338.64)
5           PCA       0.01 (0.01-0.01)    0.00 (0.00-0.01)    0.08 (0.07-0.08)    0.52 (0.49-0.55)
10          AR        0.22 (0.21-0.22)    0.11 (0.11-0.12)    0.38 (0.37-0.38)    0.87 (0.77-0.98)
10          ARMA      0.24 (0.23-0.25)    0.14 (0.13-0.15)    0.39 (0.39-0.40)    0.42 (0.37-0.48)
10          ARMA_MLE  0.11 (0.10-0.12)    0.06 (0.05-0.06)    0.29 (0.28-0.30)    63.39 (60.49-66.30)
10          DTW       0.06 (0.06-0.07)    0.03 (0.02-0.03)    0.25 (0.24-0.25)    5.51 (5.03-5.98)
10          k-Shape   0.08 (0.07-0.08)    0.04 (0.03-0.04)    0.26 (0.26-0.27)    108.79 (93.41-124.18)
10          LDS       0.23 (0.22-0.24)    0.13 (0.12-0.13)    0.39 (0.38-0.40)    277.45 (253.38-301.51)
10          PCA       0.02 (0.01-0.02)    0.01 (0.00-0.01)    0.14 (0.13-0.15)    0.47 (0.44-0.51)

Table 1: Performance of clustering 100 random 2-dimensional LDSs based on their output series of length 1000, with 95% confidence intervals from 100 trials. AMI, Adj. Rand, and V-measure are the adjusted mutual information score, adjusted Rand score, and V-measure between ground truth cluster labels and learned clusters. The runtime is on an instance with 12 CPUs and 40 GB memory running Ubuntu 18.

Figure 1: Absolute ℓ2-error in eigenvalue estimation for 2-dimensional and 3-dimensional LDSs, with 95% confidence intervals from 500 trials.

Method    AMI                Adj. Rand Score    V-measure
ARMA      0.12 (0.10-0.13)   0.12 (0.11-0.14)   0.14 (0.12-0.16)
AR        0.10 (0.09-0.12)   0.10 (0.09-0.12)   0.13 (0.11-0.14)
PCA       0.04 (0.03-0.05)   0.02 (0.02-0.03)   0.07 (0.06-0.08)
LDS       0.09 (0.07-0.10)   0.11 (0.10-0.13)   0.10 (0.09-0.11)
k-Shape   0.08 (0.07-0.10)   0.09 (0.07-0.11)   0.10 (0.09-0.12)
DTW       0.03 (0.02-0.04)   0.02 (0.01-0.03)   0.06 (0.05-0.07)

Table 2: Clustering performance on electrocardiogram (ECG) data, separating segments of normal sinus rhythm from supraventricular tachycardia. 95% confidence intervals are from 100 bootstrapped samples of 50 series.

6.2 Metrics of Cluster Quality

We measure cluster quality using three metrics in sklearn: V-measure [Rosenberg and Hirschberg, 2007], adjusted mutual information [Vinh et al., 2010], and adjusted Rand score [Hubert and Arabie, 1985].
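For instance, given ground-truth labels `true` and predicted labels `pred`, the three scores are one call each (a sketch with made-up labels):

```python
from sklearn.metrics import (adjusted_mutual_info_score, adjusted_rand_score,
                             v_measure_score)

true = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 1, 2, 2, 2]
print(v_measure_score(true, pred),
      adjusted_mutual_info_score(true, pred),
      adjusted_rand_score(true, pred))
```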
6.3 Simulation

Dataset. We generate LDSs representing cluster centers with random matrices of i.i.d. Gaussian entries. From the cluster centers, we derive LDSs that are close to the cluster centers. From each LDS, we generate time series of length 1000 by drawing inputs from standard Gaussians and adding noise sampled from N(0, 0.01²) to the output. More details are in Appendix D.1.

Clustering performance. The iterated ARMA regression method and the LDS method yield the best clustering quality, while the iterated ARMA regression method is significantly faster. These results hold up across different cluster quality metrics and numbers of clusters.

Eigenvalue estimation. Good clustering results rely on good approximations of the LDS eigenvalue distance. Our analyses in Theorem 5.1 and Theorem 4.2 proved that the iterated ARMA regression algorithm can learn the LDS eigenvalues with convergence rate O_p(T^{−1/2}). In Figure 1, we see that the observed convergence rate in simulations roughly matches the theoretical bound.

ARMA and LDS have comparable eigenvalue estimation error for most configurations. While the pure AR approach also gives comparable estimation error on relatively short sequences, its estimation is biased, and the error does not go down as the sequence length increases.

Each EM step in LDS runs in O(n³T). When running a constant number of EM steps, LDS has the same total complexity as iterated ARMA. We chose 100 steps based on empirical evaluation of convergence for sequence length 1000. However, longer sequences may need more EM steps to converge, which would explain the increase in LDS eigenvalue estimation error for sequence length 50000 in Figure 1. Depending on implementation and initialization schemes, it is possible that the LDS performance can be further optimized.

6.4 Real-world ECG data

While our simulation results show the efficacy of our method, the data generation process satisfies assumptions that may not hold on real data. As a proof of concept, we also test our method on real electrocardiogram (ECG) data.

Dataset. The MIT-BIH [Moody and Mark, 2001] dataset in PhysioNet [Goldberger et al., 2000] is the most common dataset for evaluating algorithms on ECG data [De Chazal et al., 2004, Yeh et al., 2012, Özbay et al., 2006, Ceylan et al., 2009]. It contains 48 half-hour recordings collected at the Beth Israel Hospital between 1975 and 1979. Each two-channel recording is digitized at a rate of 360 samples per second per channel. 15 distinct rhythms, including abnormalities of cardiac rhythm (arrhythmias), are annotated in the recordings by two cardiologists.

Detecting cardiac arrhythmias has stimulated research and product applications such as Apple's FDA-approved detection of atrial fibrillation [Turakhia et al., 2018]. ECG data have been modeled with AR and ARIMA models [Kalpakis et al., 2001, Corduas and Piccolo, 2008, Ge et al., 2002], and more recently with convolutional neural networks [Hannun et al., 2019].

We bootstrap 100 samples of 50 time series; each bootstrapped sample consists of 2 labeled clusters: 25 series with supraventricular tachycardia and 25 series with normal sinus rhythm. Each series has length 500, which adequately captures a complete cardiac cycle. We set the ARMA ℓ2-regularization coefficient to 0.01, chosen based on our simulation results.

Results. Comparing the methods outlined in Section 6.1, Table 2 shows that our method achieves the best quality, closely followed by the AR and LDS methods, according to adjusted mutual information, adjusted Rand score, and V-measure, while being computationally efficient.

7 Conclusion

We give a fast, simple, and provably effective method to estimate linear dynamical system (LDS) eigenvalues based on system outputs. The algorithm combines statistical techniques from the 80s with our insights on the correspondence between LDSs and ARMA models.

As a proof of concept, we apply the eigenvalue estimation algorithm to time series clustering. The resulting clustering approach is flexible enough to handle varying lengths, temporal offsets, and multidimensional inputs and outputs. Our efficient algorithm yields high quality clusters in simulations and on real ECG data.

While LDSs are general models encompassing mixtures of Gaussians and hidden Markov models, they may not fit all applications. It would be interesting to extend the analysis to non-linear models, and to consider model overparameterization and misspecification.

Acknowledgments. We thank the anonymous reviewers for thoughtful comments, and Scott Linderman, Andrew Dai, and Eamonn Keogh for helpful guidance.

References

[Afsari et al., 2012] Afsari, B., Chaudhry, R., Ravichandran, A., and Vidal, R. (2012). Group action induced distances for averaging and clustering linear dynamical systems with applications to the analysis of dynamic scenes. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2208–2215. IEEE.

[Åström and Wittenmark, 2013] Åström, K. J. and Wittenmark, B. (2013). Computer-Controlled Systems: Theory and Design. Courier Corporation.

[Beauzamy, 1999] Beauzamy, B. (1999). How the roots of a polynomial vary with its coefficients: A local quantitative result. Canadian Mathematical Bulletin, 42(1):3–12.

[Bercu, 1995] Bercu, B. (1995). Weighted estimation and tracking for ARMAX models. SIAM Journal on Control and Optimization, 33(1):89–106.

[Biernacki et al., 2000] Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725.

[Ceylan et al., 2009] Ceylan, R., Özbay, Y., and Karlik, B. (2009). A novel approach for classification of ECG arrhythmias: Type-2 fuzzy clustering neural network. Expert Systems with Applications, 36(3):6721–6726.

[Chan and Vasconcelos, 2005] Chan, A. B. and Vasconcelos, N. (2005). Probabilistic kernels for the classification of auto-regressive visual processes. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05), volume 1, pages 846–851. IEEE.

[Choi, 2012] Choi, B. (2012). ARMA Model Identification. Springer Science & Business Media.

[Corduas and Piccolo, 2008] Corduas, M. and Piccolo, D. (2008). Time series clustering and classification by the autoregressive metric. Computational Statistics & Data Analysis, 52:1860–1872.

[Cuturi and Blondel, 2017] Cuturi, M. and Blondel, M. (2017). Soft-DTW: a differentiable loss function for time-series. In Proceedings of the 34th International Conference on Machine Learning, Volume 70, pages 894–903. JMLR.org.

[De Chazal et al., 2004] De Chazal, P., O'Dwyer, M., and Reilly, R. B. (2004). Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Transactions on Biomedical Engineering, 51(7):1196–1206.

[Dean et al., 2017] Dean, S., Mania, H., Matni, N., Recht, B., and Tu, S. (2017). On the sample complexity of the linear quadratic regulator. arXiv preprint arXiv:1710.01688.

[Durbin, 1959] Durbin, J. (1959). Efficient estimation of parameters in moving-average models. Biometrika, 46(3/4):306–316.

[Durbin, 1960] Durbin, J. (1960). Estimation of parameters in time-series regression models. Journal of the Royal Statistical Society: Series B (Methodological), 22(1):139–153.

[El-Mikkawy, 2003] El-Mikkawy, M. E. (2003). Explicit inverse of a generalized Vandermonde matrix. Applied Mathematics and Computation, 146(2-3):643–651.

[Ge et al., 2002] Ge, D., Srinivasan, N., and Krishnan, S. M. (2002). Cardiac arrhythmia classification using autoregressive modeling. BioMedical Engineering OnLine, 1:5.

[Ghahramani and Hinton, 1996] Ghahramani, Z. and Hinton, G. E. (1996). Parameter estimation for linear dynamical systems. Technical Report CRG-TR-96-2, University of Toronto, Dept. of Computer Science.

[Goldberger et al., 2000] Goldberger, A. L., Amaral, L. A. N., Glass, L., Hausdorff, J. M., Ivanov, P. C., Mark, R. G., Mietus, J. E., Moody, G. B., Peng, C.-K., and Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220.

[Granger and Morris, 1976] Granger, C. W. and Morris, M. J. (1976). Time series modelling and interpretation. Journal of the Royal Statistical Society: Series A (General), 139(2):246–257.

[Greenbaum et al., 2019] Greenbaum, A., Li, R.-C., and Overton, M. L. (2019). First-order perturbation theory for eigenvalues and eigenvectors. arXiv preprint arXiv:1903.00785.

[Guo, 1996] Guo, L. (1996). Self-convergence of weighted least-squares with applications to stochastic adaptive control. IEEE Transactions on Automatic Control, 41(1):79–89.

[Hannan et al., 1980] Hannan, E. J., Dunsmuir, W. T., and Deistler, M. (1980). Estimation of vector ARMAX models. Journal of Multivariate Analysis, 10(3):275–295.

[Hannun et al., 2019] Hannun, A. Y., Rajpurkar, P., Haghpanahi, M., Tison, G. H., Turakhia, M. P., Bourn, C., and Ng, A. Y. (2019). Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network. Nature Medicine, 15.

[Hardt et al., 2018] Hardt, M., Ma, T., and Recht, B. (2018). Gradient descent learns linear dynamical systems. The Journal of Machine Learning Research, 19(1):1025–1068.

[Hazan et al., 2018] Hazan, E., Lee, H., Singh, K., Zhang, C., and Zhang, Y. (2018). Spectral filtering for general linear dynamical systems. In Advances in Neural Information Processing Systems, pages 4634–4643.

[Hryniv and Lancaster, 1999] Hryniv, R. and Lancaster, P. (1999). On the perturbation of analytic matrix functions. Integral Equations and Operator Theory, 34(3):325–338.

[Hubert and Arabie, 1985] Hubert, L. and Arabie, P. (1985). Comparing partitions. Journal of Classification, 2(1):193–218.

[Ipsen and Rehman, 2008] Ipsen, I. C. and Rehman, R. (2008). Perturbation bounds for determinants and characteristic polynomials. SIAM Journal on Matrix Analysis and Applications, 30(2):762–776.

[Johnson and Linderman, 2018] Johnson, M. and Linderman, S. (2018). PyLDS: Bayesian inference for linear dynamical systems. https://github.com/mattjj/pylds.

[Kailath, 1980] Kailath, T. (1980). Linear Systems, volume 156. Prentice-Hall, Englewood Cliffs, NJ.

[Kalman, 1960] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1):35–45.

[Kalpakis et al., 2001] Kalpakis, K., Gada, D., and Puttagunta, V. (2001). Distance measures for effective clustering of ARIMA time-series. In Proceedings 2001 IEEE International Conference on Data Mining, pages 273–280. IEEE.

[Lai and Wei, 1983] Lai, T. and Wei, C. (1983). Asymptotic properties of general autoregressive models and strong consistency of least-squares estimates of their parameters. Journal of Multivariate Analysis, 13(1):1–23.

[Lancaster et al., 2003] Lancaster, P., Markus, A., and Zhou, F. (2003). Perturbation theory for analytic matrix functions: the semisimple case. SIAM Journal on Matrix Analysis and Applications, 25(3):606–626.

[Meert, 2018] Meert, W., et al. (2018). DTAIDistance. https://dtaidistance.readthedocs.io/en/latest/.

[Moody and Mark, 2001] Moody, G. and Mark, R. (2001). The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20:45–50.

[Novikov, 2019] Novikov, A. (2019). PyClustering: data mining library. Journal of Open Source Software, 4(36):1230.

[Özbay et al., 2006] Özbay, Y., Ceylan, R., and Karlik, B. (2006). A fuzzy clustering neural network architecture for classification of ECG arrhythmias. Computers in Biology and Medicine, 36(4):376–388.

[Paparrizos and Gravano, 2015] Paparrizos, J. and Gravano, L. (2015). k-Shape: Efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1855–1870. ACM.

[Pedregosa et al., 2011] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct):2825–2830.

[Rosenberg and Hirschberg, 2007] Rosenberg, A. and Hirschberg, J. (2007). V-measure: A conditional entropy-based external cluster evaluation measure. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).

[Roweis and Ghahramani, 1999] Roweis, S. and Ghahramani, Z. (1999). A unifying review of linear Gaussian models. Neural Computation, 11(2):305–345.

[Seabold and Perktold, 2010] Seabold, S. and Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In Proceedings of the 9th Python in Science Conference, volume 57, page 61. SciPy.

[Simchowitz et al., 2018] Simchowitz, M., Mania, H., Tu, S., Jordan, M. I., and Recht, B. (2018). Learning without mixing: Towards a sharp analysis of linear system identification. arXiv preprint arXiv:1802.08334.

[Smyth, 1997] Smyth, P. (1997). Clustering sequences with hidden Markov models. In Advances in Neural Information Processing Systems, pages 648–654.

[Stoica et al., 1988] Stoica, P., Friedlander, B., and Söderström, T. (1988). A high-order Yule-Walker method for estimation of the AR parameters of an ARMA model. Systems & Control Letters, 11(2):99–105.

[Stoica et al., 2005] Stoica, P., Moses, R. L., et al. (2005). Spectral Analysis of Signals. Pearson Prentice Hall, Upper Saddle River, NJ.

[Tavenard, 2017] Tavenard, R. (2017). tslearn: A machine learning toolkit dedicated to time-series data. https://github.com/rtavenar/tslearn.

[Tiao and Tsay, 1983] Tiao, G. C. and Tsay, R. S. (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. The Annals of Statistics, pages 856–871.

[Tsay and Tiao, 1984] Tsay, R. S. and Tiao, G. C. (1984). Consistent estimates of autoregressive parameters and extended sample autocorrelation function for stationary and nonstationary ARMA models. Journal of the American Statistical Association, 79(385):84–96.

[Tsiamis and Pappas, 2019] Tsiamis, A. and Pappas, G. J. (2019). Finite sample analysis of stochastic system identification. arXiv preprint arXiv:1903.09122.

[Turakhia et al., 2018] Turakhia, M., Desai, M., Hedlin, H., Rajmane, A., Talati, N., Ferris, T., Desai, S., Nag, D., Patel, M., Kowey, P., Rumsfeld, J., Russo, A. M., True Hills, M., Granger, C. B., Mahaffey, K. W., and Perez, M. (2018). Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: The Apple Heart Study. American Heart Journal, 207.

[Vinh et al., 2010] Vinh, N. X., Epps, J., and Bailey, J. (2010). Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11(Oct):2837–2854.

[Vishwanathan et al., 2007] Vishwanathan, S., Smola, A. J., and Vidal, R. (2007). Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. International Journal of Computer Vision, 73(1):95–119.

[Yeh et al., 2012] Yeh, Y.-C., Chiou, C. W., and Lin, H.-J. (2012). Analyzing ECG for cardiac arrhythmia using cluster analysis. Expert Systems with Applications, 39(1):1000–1010.