Linear Dynamics: Clustering without identification
A Proofs for model equivalence

In this section, we prove a generalization of Theorem 4.1 for both LDSs with observed inputs and LDSs with hidden inputs.

A.1 Preliminaries

Sum of ARMA processes  It is known that the sum of ARMA processes is still an ARMA process.

Lemma A.1 (Main Theorem in [Granger and Morris, 1976]). The sum of two independent stationary series generated by ARMA(p, m) and ARMA(q, n) is generated by ARMA(x, y), where $x \le p + q$ and $y \le \max(p + n, q + m)$. In shorthand notation, ARMA(p, m) + ARMA(q, n) = ARMA(p + q, max(p + n, q + m)).

When two ARMAX processes share the same exogenous input series, the dependency on the exogenous input is additive, and the above can be extended to ARMAX(p, m, r) + ARMAX(q, n, s) = ARMAX(p + q, max(p + n, q + m), max(r, s)).
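As a concrete illustration of the order bookkeeping in Lemma A.1 (a sketch we add, not part of the original proof), the AR lag polynomial of the sum of two independent AR processes is the product of the individual AR lag polynomials; `numpy.polymul` carries out the multiplication:

```python
import numpy as np

# AR lag polynomials in powers of L: (1 - a L) and (1 - b L).
a, b = 0.9, -0.5
ar1 = np.array([1.0, -a])
ar2 = np.array([1.0, -b])

# If y = u + v with (1 - aL)u = e_u and (1 - bL)v = e_v independent,
# then (1 - aL)(1 - bL) y = (1 - bL)e_u + (1 - aL)e_v: an ARMA(2, 1),
# matching AR order p + q = 2 and MA order max(p + n, q + m) = 1.
ar_sum = np.polymul(ar1, ar2)
print(ar_sum)  # coefficients of 1 - (a + b) L + a b L^2
```

The product polynomial is what the shorthand ARMA(1, 0) + ARMA(1, 0) = ARMA(2, 1) refers to.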
Jordan canonical form and canonical basis  Every square real matrix is similar to a complex block diagonal matrix known as its Jordan canonical form (JCF). In the special case of diagonalizable matrices, the JCF is the same as the diagonal form. Based on the JCF, there exists a canonical basis $\{e_i\}$ consisting only of eigenvectors and generalized eigenvectors of $A$. A vector $v$ is a generalized eigenvector of rank $\mu$ with corresponding eigenvalue $\lambda$ if $(\lambda I - A)^\mu v = 0$ and $(\lambda I - A)^{\mu - 1} v \ne 0$.

Relating the canonical basis to the characteristic polynomial, the characteristic polynomial can be completely factored into linear factors $\chi_A(\lambda) = (\lambda - \lambda_1)^{\mu_1} (\lambda - \lambda_2)^{\mu_2} \cdots (\lambda - \lambda_r)^{\mu_r}$ over $\mathbb{C}$. The complex roots $\lambda_1, \ldots, \lambda_r$ are the eigenvalues of $A$. For each eigenvalue $\lambda_i$, there exist $\mu_i$ linearly independent generalized eigenvectors $v$ such that $(\lambda_i I - A)^{\mu_i} v = 0$.

A.2 General model equivalence theorem

Now we state Theorem A.1, a more detailed version of Theorem 4.1.

Theorem A.1. For any linear dynamical system with parameters $\Theta = (A, B, C, D)$, hidden dimension $n$, inputs $x_t \in \mathbb{R}^k$, and outputs $y_t \in \mathbb{R}^m$, the outputs $y_t$ satisfy

$$\chi_A^\dagger(L) y_t = \chi_A^\dagger(L) \xi_t + \Gamma(L) x_t, \qquad (5)$$

where $L$ is the lag operator, $\chi_A^\dagger(L) = L^n \chi_A(L^{-1})$ is the reciprocal polynomial of the characteristic polynomial $\chi_A$ of $A$, and $\Gamma(L)$ is an $m$-by-$k$ matrix of polynomials of degree $n - 1$.

This implies that each dimension of $y_t$ can be generated by an ARMAX(n, n, n-1) model, where the autoregressive parameters are the characteristic polynomial coefficients in reverse order and in negative values.

To prove the theorem, we introduce a lemma to analyze the autoregressive behavior of the hidden state projected onto a generalized eigenvector direction.

Lemma A.2. Consider a linear dynamical system with parameters $\Theta = (A, B, C, D)$, hidden states $h_t \in \mathbb{R}^n$, inputs $x_t \in \mathbb{R}^k$, and outputs $y_t \in \mathbb{R}^m$ as defined in (1). For any generalized eigenvector $e_i$ of $A^*$ with eigenvalue $\bar\lambda$ and rank $\mu$, the lag operator polynomial $(1 - \lambda L)^\mu$ applied to the time series $h_t^{(i)} := \langle h_t, e_i \rangle$ results in

$$(1 - \lambda L)^\mu h_t^{(i)} = \text{linear transformation of } x_t, \ldots, x_{t - \mu + 1}.$$

Proof. To expand the LHS, first observe that

$$(1 - \lambda L) h_t^{(i)} = (1 - \lambda L) \langle h_t, e_i \rangle = \langle h_t, e_i \rangle - \lambda \langle h_{t-1}, e_i \rangle = \langle A h_{t-1} + B x_t, e_i \rangle - \langle h_{t-1}, \bar\lambda e_i \rangle = \langle h_{t-1}, (A^* - \bar\lambda I) e_i \rangle + \langle B x_t, e_i \rangle.$$

We can apply $(1 - \lambda L)$ again similarly to obtain

$$(1 - \lambda L)^2 h_t^{(i)} = \langle h_{t-2}, (A^* - \bar\lambda I)^2 e_i \rangle + \langle B x_{t-1}, (A^* - \bar\lambda I) e_i \rangle + (1 - \lambda L) \langle B x_t, e_i \rangle,$$

and in general we can show inductively that

$$(1 - \lambda L)^k h_t^{(i)} - \langle h_{t-k}, (A^* - \bar\lambda I)^k e_i \rangle = \sum_{j=0}^{k-1} (1 - \lambda L)^{k-1-j} L^j \langle B x_t, (A^* - \bar\lambda I)^j e_i \rangle,$$

where the RHS is a linear transformation of $x_t, \ldots, x_{t-k+1}$.

Since $(\bar\lambda I - A^*)^\mu e_i = 0$ by the definition of generalized eigenvectors, $\langle h_{t-\mu}, (A^* - \bar\lambda I)^\mu e_i \rangle = 0$, and hence $(1 - \lambda L)^\mu h_t^{(i)}$ itself is a linear transformation of $x_t, \ldots, x_{t-\mu+1}$. □
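Lemma A.2 can be sanity-checked numerically. The sketch below (ours, using an assumed 2-by-2 Jordan block with eigenvalue $\lambda = 0.5$ and the rank-2 generalized eigenvector $e = (1, 0)^\top$ of $A^\top$) verifies that applying $(1 - \lambda L)^\mu$ to the projected state erases all dependence on the initial hidden state: two trajectories driven by the same inputs but different $h_0$ agree on the filtered series.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5
A = np.array([[lam, 1.0], [0.0, lam]])   # single Jordan block, eigenvalue lam
b = np.array([1.0, 1.0])                 # input direction (B with k = 1)
e = np.array([1.0, 0.0])                 # rank-2 generalized eigenvector of A.T
mu = 2

def projected_series(h0, xs):
    """Simulate h_t = A h_{t-1} + b x_t and return h_t^{(i)} = <h_t, e>."""
    h, out = h0, []
    for x in xs:
        h = A @ h + b * x
        out.append(h @ e)
    return np.array(out)

xs = rng.normal(size=50)
s1 = projected_series(np.array([3.0, -2.0]), xs)
s2 = projected_series(np.array([-7.0, 11.0]), xs)

def filt(s):
    """Apply (1 - lam L)^mu, i.e. s_t - 2 lam s_{t-1} + lam^2 s_{t-2}."""
    return s[mu:] - 2 * lam * s[mu - 1 : -1] + lam**2 * s[: -mu]

# The filtered series depends only on the two most recent inputs,
# so it is identical for the two different initial states.
assert np.allclose(filt(s1), filt(s2))
```

The agreement is exactly the statement that $(1 - \lambda L)^\mu h_t^{(i)}$ is a linear transformation of $x_t, \ldots, x_{t - \mu + 1}$ alone.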
Chloe Ching-Yun Hsu, Michaela Hardt, Moritz Hardt

Proof for Theorem A.1  Using Lemma A.2 and the canonical basis, we can prove Theorem A.1.

Proof. Let $\lambda_1, \ldots, \lambda_r$ be the eigenvalues of $A$ with multiplicities $\mu_1, \ldots, \mu_r$. Since $A$ is a real-valued matrix, its adjoint $A^*$ has the same characteristic polynomial and eigenvalues as $A$. There exists a canonical basis $\{e_i\}_{i=1}^n$ for $A^*$, where $e_1, \ldots, e_{\mu_1}$ are generalized eigenvectors with eigenvalue $\lambda_1$, $e_{\mu_1 + 1}, \ldots, e_{\mu_1 + \mu_2}$ are generalized eigenvectors with eigenvalue $\lambda_2$, and so forth, and $e_{\mu_1 + \cdots + \mu_{r-1} + 1}, \ldots, e_{\mu_1 + \cdots + \mu_r}$ are generalized eigenvectors with eigenvalue $\lambda_r$.

By Lemma A.2, $(1 - \lambda_1 L)^{\mu_1} h_t^{(i)}$ is a linear transformation of $x_t, \ldots, x_{t - \mu_1 + 1}$ for $i = 1, \ldots, \mu_1$; $(1 - \lambda_2 L)^{\mu_2} h_t^{(i)}$ is a linear transformation of $x_t, \ldots, x_{t - \mu_2 + 1}$ for $i = \mu_1 + 1, \ldots, \mu_1 + \mu_2$; and so forth; $(1 - \lambda_r L)^{\mu_r} h_t^{(i)}$ is a linear transformation of $x_t, \ldots, x_{t - \mu_r + 1}$ for $i = \mu_1 + \cdots + \mu_{r-1} + 1, \ldots, n$.

We then apply the lag operator polynomial $\prod_{j \ne i} (1 - \lambda_j L)^{\mu_j}$ to both sides of each equation. The lag polynomial on the LHS becomes $(1 - \lambda_1 L)^{\mu_1} \cdots (1 - \lambda_r L)^{\mu_r} = \chi_A^\dagger(L)$. For the RHS, since $\prod_{j \ne i} (1 - \lambda_j L)^{\mu_j}$ is of degree $n - \mu_i$, it lags the RHS by at most $n - \mu_i$ additional steps, and the RHS becomes a linear transformation of $x_t, \ldots, x_{t - n + 1}$.

Thus, for each $i$, $\chi_A^\dagger(L) h_t^{(i)}$ is a linear transformation of $x_t, \ldots, x_{t - n + 1}$.

The outputs of the LDS are defined as $y_t = C h_t + D x_t + \xi_t = \sum_{i=1}^n h_t^{(i)} C e_i + D x_t + \xi_t$. By linearity, and since $\chi_A^\dagger(L)$ is of degree $n$, both $\chi_A^\dagger(L) \sum_{i=1}^n h_t^{(i)} C e_i$ and $\chi_A^\dagger(L) D x_t$ are linear transformations of $x_t, \ldots, x_{t-n}$. We can write any such linear transformation as $\Gamma(L) x_t$ for some $m$-by-$k$ matrix $\Gamma(L)$ of polynomials of degree $n - 1$. Thus, as desired,

$$\chi_A^\dagger(L) y_t = \chi_A^\dagger(L) \xi_t + \Gamma(L) x_t. \qquad \square$$
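Theorem A.1 can also be checked numerically. The sketch below (ours, with an assumed small random LDS) applies $\chi_A^\dagger(L)$ to the noiseless outputs; by the theorem (and, underneath, Cayley-Hamilton), the filtered series depends only on the last $n + 1$ inputs and therefore agrees across different initial hidden states.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 3, 2, 2
A = 0.3 * rng.normal(size=(n, n))
B = rng.normal(size=(n, k))
C = rng.normal(size=(m, n))
D = rng.normal(size=(m, k))

# Characteristic polynomial coefficients c_0 = 1, c_1, ..., c_n in
# decreasing powers of z; chi^dagger_A(L) = sum_j c_j L^j.
c = np.poly(A)

def outputs(h0, xs):
    """Noiseless LDS outputs y'_t = C h_t + D x_t."""
    h, ys = h0, []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h + D @ x)
    return np.array(ys)

xs = rng.normal(size=(30, k))
y1 = outputs(rng.normal(size=n), xs)
y2 = outputs(rng.normal(size=n), xs)

def filt(ys):
    """Apply chi^dagger_A(L): sum_j c_j y_{t-j}."""
    return sum(c[j] * ys[n - j : len(ys) - j] for j in range(n + 1))

# chi^dagger_A(L) y'_t = Gamma(L) x_t is independent of the initial state.
assert np.allclose(filt(y1), filt(y2))
```

The cancellation of the initial state happens because $\sum_j c_j A^{n-j} = \chi_A(A) = 0$.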
Assuming that there are no common factors in $\chi_A^\dagger$ and $\Gamma$, $\chi_A^\dagger$ is then the lag operator polynomial that represents the autoregressive part of $y_t$. This assumption is the same as saying that $y_t$ cannot be expressed as a lower-order ARMA process. The reciprocal polynomial has the same coefficients in reverse order as the original polynomial. Matching the lag operator polynomial on the LHS, $1 - \varphi_1 L - \varphi_2 L^2 - \cdots - \varphi_n L^n = \chi_A^\dagger(L)$ and $L^n - \varphi_1 L^{n-1} - \cdots - \varphi_n = \chi_A(L)$, so the $i$-th order autoregressive parameter $\varphi_i$ is the negative of the coefficient of order $n - i$ in the characteristic polynomial $\chi_A$.
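This coefficient correspondence is easy to verify in a few lines (our sketch): `numpy.poly` returns the characteristic polynomial coefficients of $A$ in decreasing order, and negating all but the leading one recovers $\varphi_1, \ldots, \varphi_n$.

```python
import numpy as np

A = np.array([[0.9, 0.3], [0.0, 0.4]])
coeffs = np.poly(A)        # [1, c_1, ..., c_n], chi_A in decreasing powers
phi = -coeffs[1:]          # phi_i = -(coefficient of z^{n-i})

# chi_A(z) = (z - 0.9)(z - 0.4) = z^2 - 1.3 z + 0.36, so phi = (1.3, -0.36).
assert np.allclose(phi, [1.3, -0.36])

# Sanity check: the roots of z^n - phi_1 z^{n-1} - ... - phi_n are the
# eigenvalues of A.
assert np.allclose(np.sort(np.roots(coeffs)), np.sort(np.linalg.eigvals(A)))
```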
A.3 The hidden input case as a corollary

The statement about LDSs without external inputs in Theorem 4.1 comes as a corollary to Theorem A.1, with a short proof here.

Proof. Define $y'_t = C h_t + D x_t$ to be the output without noise, i.e. $y_t = y'_t + \xi_t$. By Theorem A.1, $\chi_A^\dagger(L) y'_t = \Gamma(L) x_t$. Since we assume the hidden inputs $x_t$ are i.i.d. Gaussians, $y'_t$ is then generated by an ARMA(n, n-1) process with autoregressive polynomial $\chi_A^\dagger(L)$.

The output noise $\xi_t$ itself can be seen as an ARMA(0, 0) process. By Lemma A.1, ARMA(n, n-1) + ARMA(0, 0) = ARMA(n + 0, max(n + 0, n - 1 + 0)) = ARMA(n, n). Hence the outputs $y_t$ are generated by an ARMA(n, n) process as claimed in Theorem 4.1. It is easy to see in the proof of Lemma A.1 that the autoregressive parameters do not change when adding white noise [Granger and Morris, 1976]. □

B Proofs for eigenvalue approximation theorems

Here we restate Theorem 4.2 and Theorem 4.3 together, and prove them in three steps for 1) the general case, 2) the simple eigenvalue case, and 3) the explicit condition number bounds for the simple eigenvalue case.

Theorem B.1. Suppose $y_t$ are the outputs from an $n$-dimensional latent linear dynamical system with parameters $\Theta = (A, B, C, D)$ and eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $\hat\varphi = (\hat\varphi_1, \ldots, \hat\varphi_n)$ be the estimated autoregressive parameters with error $\|\hat\varphi - \varphi\| = \epsilon$, and let $r_1, \ldots, r_n$ be the roots of the polynomial $z^n - \hat\varphi_1 z^{n-1} - \cdots - \hat\varphi_n$. Assuming the LDS is observable, the roots converge to the true eigenvalues with convergence rate $O(\epsilon^{1/n})$. If all eigenvalues of $A$ are simple (i.e. of multiplicity 1), then the convergence rate is $O(\epsilon)$. If $A$ is symmetric, Lyapunov stable (spectral radius at most 1), and only has simple eigenvalues, then

$$|r_j - \lambda_j| \le \frac{\sqrt{n}\, 2^{n-1}}{\prod_{k \ne j} |\lambda_j - \lambda_k|}\, \epsilon + O(\epsilon^2).$$
B.1 General $(1/n)$-exponent bound

This is a known perturbation bound on polynomial root finding due to Ostrowski [Beauzamy, 1999].

Lemma B.1. Let $\varphi(z) = z^n + \varphi_1 z^{n-1} + \cdots + \varphi_{n-1} z + \varphi_n$ and $\psi(z) = z^n + \psi_1 z^{n-1} + \cdots + \psi_{n-1} z + \psi_n$ be two polynomials of degree $n$. If $\|\varphi - \psi\|_2 < \epsilon$, then the roots $(r_k)$ of $\varphi$ and the roots $(\tilde r_k)$ of $\psi$, under a suitable ordering, satisfy

$$|r_k - \tilde r_k| \le 4 C \epsilon^{1/n},$$

where $C = \max_{0 \le k \le n} \{1, |\varphi_k|^{1/n}, |\psi_k|^{1/n}\}$.

The general $O(\epsilon^{1/n})$ convergence rate in Theorem 4.2 follows directly from Lemma B.1 and Theorem 4.1.
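A small numerical illustration of Lemma B.1 (ours, not from the paper): perturb one coefficient of $z^2 - 1$ by $\epsilon$ and compare the root movement against the $4 C \epsilon^{1/n}$ bound.

```python
import numpy as np

phi = np.array([1.0, 0.0, -1.0])        # z^2 - 1, roots +/- 1
eps = 1e-2
psi = phi + np.array([0.0, 0.0, -eps])  # z^2 - 1 - eps, coefficient error eps

r = np.sort(np.roots(phi))
r_tilde = np.sort(np.roots(psi))

# Ostrowski: |r_k - r~_k| <= 4 C eps^{1/n}, with C the maximum over
# {1, |phi_k|^{1/n}, |psi_k|^{1/n}}.
n = 2
C = max(1.0, np.abs(phi[1:]).max() ** (1 / n), np.abs(psi[1:]).max() ** (1 / n))
bound = 4 * C * eps ** (1 / n)

assert np.abs(r - r_tilde).max() <= bound
```

Here the roots move by about $\epsilon/2$, well inside the $O(\epsilon^{1/2})$ bound; the bound is loose for simple roots, which is exactly what Section B.2 improves.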
B.2 Bound for simple eigenvalues

The $1/n$-exponent in the above bound might seem far from ideal, but without additional assumptions the $1/n$-exponent is tight. As an example, the polynomial $x^2 - \epsilon$ has roots $x = \pm\sqrt{\epsilon}$. This is a general phenomenon: a root with multiplicity $m$ can split into $m$ roots at rate $O(\epsilon^{1/m})$, which is related to the regular splitting property [Hryniv and Lancaster, 1999, Lancaster et al., 2003] in matrix eigenvalue perturbation theory.

Under the additional assumption that all the eigenvalues are simple (no multiplicity), we can prove a better bound using the following idea based on the companion matrix: a small perturbation of the autoregressive parameters results in a small perturbation of the companion matrix, and a small perturbation of the companion matrix results in a small perturbation of the eigenvalues.

Matrix eigenvalue perturbation theory  The perturbation bound on eigenvalues is a well-studied problem [Greenbaum et al., 2019]. The regular splitting property states that, for an eigenvalue $\lambda_0$ with partial multiplicities $m_1, \ldots, m_k$, an $O(\epsilon)$ perturbation of the matrix can split the eigenvalue into $M = m_1 + \cdots + m_k$ distinct eigenvalues $\lambda_{ij}(\epsilon)$ for $i = 1, \ldots, k$ and $j = 1, \ldots, m_i$, and each eigenvalue $\lambda_{ij}(\epsilon)$ is moved from the original position by $O(\epsilon^{1/m_i})$.

For semi-simple eigenvalues, geometric multiplicity equals algebraic multiplicity. Since the geometric multiplicity is the number of partial multiplicities while the algebraic multiplicity is the sum of partial multiplicities, for semi-simple eigenvalues all partial multiplicities $m_i = 1$. Therefore, the regular splitting property corresponds to the asymptotic relation in equation (6). It is known that regular splitting holds for any semi-simple eigenvalue even for non-Hermitian matrices.

Lemma B.2 (Theorem 6 in [Lancaster et al., 2003]). Let $L(\lambda, \epsilon)$ be an analytic matrix function with semi-simple eigenvalue $\lambda_0$ at $\epsilon = 0$ of multiplicity $M$. Then there are exactly $M$ eigenvalues $\lambda_i(\epsilon)$ of $L(\lambda, \epsilon)$ for which $\lambda_i(\epsilon) \to \lambda_0$ as $\epsilon \to 0$, and for these eigenvalues

$$\lambda_i(\epsilon) = \lambda_0 + \lambda'_i \epsilon + o(\epsilon). \qquad (6)$$
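The $x^2 - \epsilon$ example can be phrased as a matrix perturbation (our sketch): an $\epsilon$ entry in the corner of a $2 \times 2$ Jordan block splits the double eigenvalue $0$ into $\pm\sqrt{\epsilon}$, i.e. the eigenvalues move at rate $\epsilon^{1/2}$ rather than $\epsilon$.

```python
import numpy as np

eps = 1e-6
J = np.array([[0.0, 1.0], [0.0, 0.0]])          # Jordan block, double eigenvalue 0
J_pert = J + np.array([[0.0, 0.0], [eps, 0.0]])  # O(eps) perturbation

lams = np.sort(np.linalg.eigvals(J_pert))
# Characteristic polynomial is z^2 - eps, so the roots are +/- sqrt(eps):
# an eps = 1e-6 perturbation moves the eigenvalues by 1e-3.
assert np.allclose(lams, [-np.sqrt(eps), np.sqrt(eps)])
```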
Companion matrix  Matrix perturbation theory tells us how perturbations of matrices change eigenvalues, while we are interested in how perturbations of polynomial coefficients change roots. To apply matrix perturbation theory to polynomials, we introduce the companion matrix, also known as the controllable canonical form in control theory.

Definition B.1. For a monic polynomial $\varphi(z) = z^n + \varphi_1 z^{n-1} + \cdots + \varphi_{n-1} z + \varphi_n$, the companion matrix of the polynomial is the square matrix

$$C(\varphi) = \begin{pmatrix} 0 & 0 & \cdots & 0 & -\varphi_n \\ 1 & 0 & \cdots & 0 & -\varphi_{n-1} \\ 0 & 1 & \cdots & 0 & -\varphi_{n-2} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -\varphi_1 \end{pmatrix}.$$

The matrix $C(\varphi)$ is the companion in the sense that its characteristic polynomial is equal to $\varphi$.

In relation to a pure autoregressive AR(p) model, the companion matrix corresponds to the transition matrix of the linear dynamical system when we encode the values from the past $p$ lags as a $p$-dimensional state $h_t = (y_{t-p+1} \ \cdots \ y_{t-1} \ y_t)^\top$. If $y_t = \varphi_1 y_{t-1} + \cdots + \varphi_p y_{t-p}$, then

$$h_t = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ \varphi_p & \varphi_{p-1} & \varphi_{p-2} & \cdots & \varphi_1 \end{pmatrix} \begin{pmatrix} y_{t-p} \\ y_{t-p+1} \\ \vdots \\ y_{t-2} \\ y_{t-1} \end{pmatrix} = C(\varphi)^\top h_{t-1}, \qquad (7)$$

where $C(\varphi)$ here denotes the companion matrix of $z^p - \varphi_1 z^{p-1} - \cdots - \varphi_p$.
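A short sketch (ours) of Definition B.1 and equation (7): build the transition matrix of an assumed AR(2) recursion and confirm that its characteristic polynomial is $z^p - \varphi_1 z^{p-1} - \cdots - \varphi_p$, so its eigenvalues are the roots of the AR polynomial.

```python
import numpy as np

phi = np.array([1.2, -0.35])       # AR(2): y_t = 1.2 y_{t-1} - 0.35 y_{t-2}
p = len(phi)

# Transition matrix of equation (7): shifted identity plus the AR
# coefficients (phi_p, ..., phi_1) in the last row.
T = np.zeros((p, p))
T[:-1, 1:] = np.eye(p - 1)
T[-1, :] = phi[::-1]

# Its characteristic polynomial is z^p - phi_1 z^{p-1} - ... - phi_p.
assert np.allclose(np.poly(T), np.concatenate(([1.0], -phi)))

# One step of the recursion: h_t = T h_{t-1}.
h = np.array([0.5, 2.0])           # [y_{t-2}, y_{t-1}]
h_next = T @ h
assert np.isclose(h_next[1], phi[0] * h[1] + phi[1] * h[0])
```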
Proof of Theorem 4.2 for simple eigenvalues

Proof. Let $y_t$ be the outputs of a linear dynamical system $S$ with only simple eigenvalues, and let $\varphi = (\varphi_1, \ldots, \varphi_n)$ be the ARMAX autoregressive parameters for $y_t$. Let $C(\varphi)$ be the companion matrix of the polynomial $z^n - \varphi_1 z^{n-1} - \varphi_2 z^{n-2} - \cdots - \varphi_n$. The companion matrix is the transition matrix of the LDS described in equation (7). Since this LDS has the same autoregressive parameters and hidden state dimension as the original LDS, by Corollary 4.1 the companion matrix has the same characteristic polynomial as the original LDS, and thus also has simple (and hence also semi-simple) eigenvalues. The $O(\epsilon)$ convergence rate then follows from Lemma B.2 and Theorem 5.1, as the error in the ARMAX parameter estimation can be seen as a perturbation of the companion matrix. □

A note on the companion matrix  One might hope for a more general result that uses Lemma B.2 for all systems with semi-simple eigenvalues instead of restricting to matrices with simple eigenvalues. Unfortunately, even if the original linear dynamical system has only semi-simple eigenvalues, in general the companion matrix is not semi-simple unless the original linear dynamical system is simple. This is because the companion matrix always has its minimal polynomial equal to its characteristic polynomial, and hence has geometric multiplicity 1 for all eigenvalues. This also points to the fact that even though the companion matrix has the form of the controllable canonical form, in general it is not necessarily similar to the transition matrix of the original LDS.
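This observation is easy to check directly (our sketch): $A = I_2$ is semi-simple with eigenvalue 1 of algebraic multiplicity 2, yet the companion matrix of its characteristic polynomial $(z - 1)^2$ gives that eigenvalue geometric multiplicity 1, so the companion matrix is not diagonalizable and not similar to $A$.

```python
import numpy as np

A = np.eye(2)               # semi-simple: eigenvalue 1 with multiplicity 2
c = np.poly(A)              # (z - 1)^2 = z^2 - 2z + 1

# Companion matrix of (z - 1)^2, per Definition B.1.
C = np.array([[0.0, -c[2]],
              [1.0, -c[1]]])
assert np.allclose(np.poly(C), c)  # same characteristic polynomial as A

# Geometric multiplicity of eigenvalue 1 is 2 - rank(C - I) = 1 < 2,
# so C is not semi-simple even though A is.
assert np.linalg.matrix_rank(C - np.eye(2)) == 1
```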
B.3 Explicit bound for condition number

In this subsection, we write out explicitly the condition number for simple eigenvalues in the asymptotic relation $\lambda(\epsilon) = \lambda_0 + \kappa \epsilon + o(\epsilon)$, to show how it varies according to the spectrum. Here we use the notation $\kappa(C, \lambda)$ to denote the condition number for eigenvalue $\lambda$ in the companion matrix $C$.

Lemma B.3. For a companion matrix $C$ with simple eigenvalues $\lambda_1, \ldots, \lambda_n$, the eigenvalues $\lambda'_1, \ldots, \lambda'_n$ of the perturbed matrix $C + \delta C$ satisfy

$$|\lambda'_j - \lambda_j| \le \kappa(C, \lambda_j) \|\delta C\|_2 + o(\|\delta C\|_2), \qquad (8)$$

and the condition number $\kappa(C, \lambda_j)$ admits an explicit bound in terms of the spectrum.
The companion matrix can be diagonalized as $C = V^{-1} \operatorname{diag}(\lambda_1, \ldots, \lambda_n) V$, where the rows of the Vandermonde matrix $V$ are the row eigenvectors of $C$ and the columns of $V^{-1}$ are the column eigenvectors of $C$. Since the $j$-th row $V_{j,*}$ and the $j$-th column $V^{-1}_{*,j}$ have inner product 1 by the definition of the matrix inverse, the condition number is given by

$$\kappa(C, \lambda_j) = \|V_{j,*}\|_2 \, \|V^{-1}_{*,j}\|_2. \qquad (13)$$

Formula for the inverse of the Vandermonde matrix  The Vandermonde matrix is defined as

$$V = \begin{pmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{p-1} \\ 1 & \lambda_2 & \lambda_2^2 & \cdots & \lambda_2^{p-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \lambda_p & \lambda_p^2 & \cdots & \lambda_p^{p-1} \end{pmatrix}. \qquad (14)$$

The inverse of the Vandermonde matrix $V$ is given by [El-Mikkawy, 2003] using elementary symmetric polynomials:

$$(V^{-1})_{i,j} = \frac{(-1)^{i+j} S_{p-i,j}}{\prod_{k \ne j} (\lambda_j - \lambda_k)}, \qquad (15)$$

where $S_{p-i,j}$ denotes the elementary symmetric polynomial of degree $p - i$ in the variables $\{\lambda_k\}_{k \ne j}$.
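Equation (13) can be exercised numerically (our sketch, with assumed eigenvalues): compute $\kappa(C, \lambda_j)$ from the Vandermonde matrix of the eigenvalues, then check the first-order bound (8) on a small perturbation of the companion matrix.

```python
import numpy as np

lams = np.array([0.9, 0.5, -0.3])        # simple eigenvalues
p = len(lams)
V = np.vander(lams, increasing=True)     # rows (1, lam_j, lam_j^2, ...)
V_inv = np.linalg.inv(V)

# kappa(C, lam_j) = ||V_{j,*}||_2 ||V^{-1}_{*,j}||_2, equation (13).
kappa = np.linalg.norm(V, axis=1) * np.linalg.norm(V_inv, axis=0)

# Companion matrix with these eigenvalues (Definition B.1), then a
# small perturbation dC.
C = np.zeros((p, p))
C[1:, :-1] = np.eye(p - 1)
C[:, -1] = -np.poly(lams)[1:][::-1]
dC = 1e-8 * np.ones((p, p))

orig = np.sort(np.linalg.eigvals(C))
moved = np.sort(np.linalg.eigvals(C + dC))

# First-order bound (8): |lam'_j - lam_j| <~ kappa_j ||dC||_2
# (a 10% slack absorbs the o(||dC||_2) term).
assert np.abs(moved - orig).max() <= kappa.max() * np.linalg.norm(dC, 2) * 1.1
```

Eigenvalue condition numbers are always at least 1, and they grow as the eigenvalues cluster, which is exactly the $\prod_{k \ne j} |\lambda_j - \lambda_k|$ dependence in Theorem B.1.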