
Linear Dynamics: Clustering without identification

A Proofs for model equivalence

In this section, we prove a generalization of Theorem 4.1 for both LDSs with observed inputs and LDSs with hidden inputs.

A.1 Preliminaries

Sum of ARMA processes. It is known that the sum of ARMA processes is still an ARMA process.

Lemma A.1 (Main Theorem in [Granger and Morris, 1976]). The sum of two independent stationary series generated by ARMA(p, m) and ARMA(q, n) is generated by ARMA(x, y), where x ≤ p + q and y ≤ max(p + n, q + m).

In shorthand notation, ARMA(p, m) + ARMA(q, n) = ARMA(p + q, max(p + n, q + m)).

When two ARMAX processes share the same exogenous input series, the dependency on the exogenous input is additive, and the above can be extended to ARMAX(p, m, r) + ARMAX(q, n, s) = ARMAX(p + q, max(p + n, q + m), max(r, s)).

Jordan canonical form and canonical basis. Every square real matrix is similar to a complex block diagonal matrix known as its Jordan canonical form (JCF). In the special case of diagonalizable matrices, the JCF is the same as the diagonal form. Based on the JCF, there exists a canonical basis {e_i} consisting only of eigenvectors and generalized eigenvectors of A. A vector v is a generalized eigenvector of rank µ with corresponding eigenvalue λ if (λI − A)^µ v = 0 and (λI − A)^{µ−1} v ≠ 0.

Relating the canonical basis to the characteristic polynomial, the characteristic polynomial can be completely factored into linear factors χ_A(λ) = (λ − λ_1)^{µ_1} (λ − λ_2)^{µ_2} ⋯ (λ − λ_r)^{µ_r} over ℂ. The complex roots λ_1, ⋯, λ_r are the eigenvalues of A. For each eigenvalue λ_i, there exist µ_i linearly independent generalized eigenvectors v such that (λ_i I − A)^{µ_i} v = 0.

A.2 General model equivalence theorem

Now we state Theorem A.1, a more detailed version of Theorem 4.1.

Theorem A.1. For any linear dynamical system with parameters Θ = (A, B, C, D), hidden dimension n, inputs x_t ∈ ℝ^k, and outputs y_t ∈ ℝ^m, the outputs y_t satisfy

    χ_A†(L) y_t = χ_A†(L) ξ_t + Γ(L) x_t,    (5)

where L is the lag operator, χ_A†(L) = L^n χ_A(L^{−1}) is the reciprocal polynomial of the characteristic polynomial χ_A, and Γ(L) is an m-by-k matrix of polynomials of degree n − 1.

This implies that each dimension of y_t can be generated by an ARMAX(n, n, n − 1) model, where the autoregressive parameters are the characteristic polynomial coefficients in reverse order and with negated values.

To prove the theorem, we introduce a lemma to analyze the autoregressive behavior of the hidden state projected to a direction.

Lemma A.2. Consider a linear dynamical system with parameters Θ = (A, B, C, D), hidden states h_t ∈ ℝ^n, inputs x_t ∈ ℝ^k, and outputs y_t ∈ ℝ^m as defined in (1). For any generalized eigenvector e_i of A* with eigenvalue λ and rank µ, the lag operator polynomial (1 − λL)^µ applied to the time series h_t^{(i)} := ⟨h_t, e_i⟩ results in

    (1 − λL)^µ h_t^{(i)} = linear transformation of x_t, ⋯, x_{t−µ+1}.

Proof. To expand the LHS, first observe that

    (1 − λL) h_t^{(i)} = (1 − λL) ⟨h_t, e_i⟩
                       = ⟨h_t, e_i⟩ − λ ⟨h_{t−1}, e_i⟩
                       = ⟨A h_{t−1} + B x_t, e_i⟩ − λ ⟨h_{t−1}, e_i⟩
                       = ⟨h_{t−1}, (A* − λI) e_i⟩ + ⟨B x_t, e_i⟩.

We can apply (1 − λL) again similarly to obtain

    (1 − λL)² h_t^{(i)} = ⟨h_{t−2}, (A* − λI)² e_i⟩ + ⟨B x_{t−1}, (A* − λI) e_i⟩ + (1 − λL) ⟨B x_t, e_i⟩,

and in general we can show inductively that

    (1 − λL)^k h_t^{(i)} − ⟨h_{t−k}, (A* − λI)^k e_i⟩ = Σ_{j=0}^{k−1} (1 − λL)^{k−1−j} L^j ⟨B x_t, (A* − λI)^j e_i⟩,

where the RHS is a linear transformation of x_t, ⋯, x_{t−k+1}.

Since (λI − A*)^µ e_i = 0 by the definition of generalized eigenvectors, ⟨h_{t−µ}, (A* − λI)^µ e_i⟩ = 0, and hence (1 − λL)^µ h_t^{(i)} itself is a linear transformation of x_t, ⋯, x_{t−µ+1}.
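
Lemma A.2 is easy to check numerically in the simple-eigenvector case (µ = 1). The following is a minimal sketch, assuming NumPy and hypothetical dimensions; it uses an eigenvector v of A^T (a left eigenvector of A) and the bilinear form v^T h_t, which sidesteps conjugation conventions in the complex case.

```python
import numpy as np

# Minimal numerical check of Lemma A.2 for a rank-1 eigenvector (mu = 1).
rng = np.random.default_rng(0)
n, k, T = 4, 2, 50                      # hypothetical dimensions
A = rng.normal(size=(n, n)) / np.sqrt(n)
B = rng.normal(size=(n, k))

lam, V = np.linalg.eig(A.T)             # A^T v = lam v, i.e. v^T A = lam v^T
v, lam0 = V[:, 0], lam[0]

X = rng.normal(size=(T, k))             # inputs x_t
H = np.zeros((T, n))                    # hidden states h_t = A h_{t-1} + B x_t
for t in range(1, T):
    H[t] = A @ H[t - 1] + B @ X[t]

proj = H @ v                            # h_t^{(i)} = <h_t, v>
lhs = proj[1:] - lam0 * proj[:-1]       # (1 - lam L) h_t^{(i)}
rhs = (X @ B.T) @ v                     # <B x_t, v>
assert np.allclose(lhs, rhs[1:])        # depends only on x_t, as the lemma predicts
```

For a generalized eigenvector of rank µ > 1, the same check applies after applying (1 − λL) a total of µ times.
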
Proof for Theorem A.1. Using Lemma A.2 and the canonical basis, we can prove Theorem A.1.

Proof. Let λ_1, ⋯, λ_r be the eigenvalues of A with multiplicities µ_1, ⋯, µ_r. Since A is a real-valued matrix, its adjoint A* has the same characteristic polynomial and eigenvalues as A. There exists a canonical basis {e_i}_{i=1}^n for A*, where e_1, ⋯, e_{µ_1} are generalized eigenvectors with eigenvalue λ_1, e_{µ_1+1}, ⋯, e_{µ_1+µ_2} are generalized eigenvectors with eigenvalue λ_2, so on and so forth, and e_{µ_1+⋯+µ_{r−1}+1}, ⋯, e_{µ_1+⋯+µ_r} are generalized eigenvectors with eigenvalue λ_r.

By Lemma A.2, (1 − λ_1 L)^{µ_1} h_t^{(i)} is a linear transformation of x_t, ⋯, x_{t−µ_1+1} for i = 1, ⋯, µ_1; (1 − λ_2 L)^{µ_2} h_t^{(i)} is a linear transformation of x_t, ⋯, x_{t−µ_2+1} for i = µ_1+1, ⋯, µ_1+µ_2; so on and so forth; and (1 − λ_r L)^{µ_r} h_t^{(i)} is a linear transformation of x_t, ⋯, x_{t−µ_r+1} for i = µ_1 + ⋯ + µ_{r−1} + 1, ⋯, n.

We then apply the lag operator polynomial Π_{j≠i} (1 − λ_j L)^{µ_j} to both sides of each equation. The lag polynomial on the LHS becomes (1 − λ_1 L)^{µ_1} ⋯ (1 − λ_r L)^{µ_r} = χ_A†(L). For the RHS, since Π_{j≠i} (1 − λ_j L)^{µ_j} is of degree n − µ_i, it lags the RHS by at most n − µ_i additional steps, and the RHS becomes a linear transformation of x_t, ⋯, x_{t−n+1}.

Thus, for each i, χ_A†(L) h_t^{(i)} is a linear transformation of x_t, ⋯, x_{t−n+1}.

The outputs of the LDS are defined as y_t = C h_t + D x_t + ξ_t = Σ_{i=1}^n h_t^{(i)} C e_i + D x_t + ξ_t. By linearity, and since χ_A†(L) is of degree n, both Σ_{i=1}^n χ_A†(L) h_t^{(i)} C e_i and χ_A†(L) D x_t are linear transformations of x_t, ⋯, x_{t−n}. We can write any such linear transformation as Γ(L) x_t for some m-by-k matrix Γ(L) of polynomials of degree n − 1. Thus, as desired,

    χ_A†(L) y_t = χ_A†(L) ξ_t + Γ(L) x_t.

Assuming that there are no common factors in χ_A† and Γ, χ_A† is then the lag operator polynomial that represents the autoregressive part of y_t. This assumption is the same as saying that y_t cannot be expressed as a lower-order ARMA process. The reciprocal polynomial has the same coefficients in reverse order as the original polynomial. According to the lag operator polynomial on the LHS, 1 − φ_1 L − φ_2 L² − ⋯ − φ_n L^n = χ_A†(L), and L^n − φ_1 L^{n−1} − ⋯ − φ_n = χ_A(L), so the i-th order autoregressive parameter φ_i is the negative value of the (n − i)-th order coefficient of the characteristic polynomial χ_A.

A.3 The hidden input case as a corollary

The statement about LDSs without external inputs in Theorem 4.1 comes as a corollary to Theorem A.1, with a short proof here.

Proof. Define y′_t = C h_t + D x_t to be the output without noise, i.e. y_t = y′_t + ξ_t. By Theorem A.1, χ_A†(L) y′_t = Γ(L) x_t. Since we assume the hidden inputs x_t are i.i.d. Gaussians, y′_t is then generated by an ARMA(n, n − 1) process with autoregressive polynomial χ_A†(L).

The output noise ξ_t itself can be seen as an ARMA(0, 0) process. By Lemma A.1, ARMA(n, n − 1) + ARMA(0, 0) = ARMA(n + 0, max(n + 0, n − 1 + 0)) = ARMA(n, n). Hence the outputs y_t are generated by an ARMA(n, n) process as claimed in Theorem 4.1. It is easy to see in the proof of Lemma A.1 that the autoregressive parameters do not change when adding white noise [Granger and Morris, 1976].

B Proof for eigenvalue approximation theorems

Here we restate Theorem 4.2 and Theorem 4.3 together, and prove them in three steps for 1) the general case, 2) the simple eigenvalue case, and 3) the explicit condition number bounds for the simple eigenvalue case.

Theorem B.1. Suppose y_t are the outputs from an n-dimensional latent linear dynamical system with parameters Θ = (A, B, C, D) and eigenvalues λ_1, ⋯, λ_n. Let Φ̂ = (φ̂_1, ⋯, φ̂_n) be the estimated autoregressive parameters with error ‖Φ̂ − Φ‖ = ε, and let r_1, ⋯, r_n be the roots of the polynomial z^n − φ̂_1 z^{n−1} − ⋯ − φ̂_n. Assuming the LDS is observable, the roots converge to the true eigenvalues with convergence rate O(ε^{1/n}). If all eigenvalues of A are simple (i.e. of multiplicity 1), then the convergence rate is O(ε). If A is symmetric, Lyapunov stable (spectral radius at most 1), and only has simple eigenvalues, then

    |r_i − λ_i| ≤ (√n (√2)^{n−1} / Π_{k≠i} |λ_i − λ_k|) ε + O(ε²).

B.1 General (1/n)-exponent bound

This is a known perturbation bound on polynomial root finding due to Ostrowski [Beauzamy, 1999].

Lemma B.1. Let φ(z) = z^n + φ_1 z^{n−1} + ⋯ + φ_{n−1} z + φ_n and ψ(z) = z^n + ψ_1 z^{n−1} + ⋯ + ψ_{n−1} z + ψ_n be two polynomials of degree n. If ‖φ − ψ‖_2 < ε, then the roots (r_k) of φ and the roots (r̃_k) of ψ, under suitable ordering, satisfy

    |r_k − r̃_k| ≤ 4 C ε^{1/n},

where C = max_{0≤k≤n} {1, |φ_k|^{1/n}, |ψ_k|^{1/n}}.

The general O(ε^{1/n}) convergence rate in Theorem 4.2 follows directly from Lemma B.1 and Theorem 4.1.
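
The ε^{1/n} scaling in Lemma B.1 is easy to see numerically. The following sketch, assuming NumPy (the polynomial and the perturbation size are arbitrary choices), perturbs the constant coefficient of z^n by ε and watches a root of multiplicity n split by roughly ε^{1/n}.

```python
import numpy as np

# Root sensitivity behind Lemma B.1: an eps-size coefficient perturbation can move
# a root of multiplicity n by roughly eps**(1/n).
n, eps = 4, 1e-8
base = np.zeros(n + 1)
base[0] = 1.0                  # z**n, a single root 0 of multiplicity n
pert = base.copy()
pert[-1] = -eps                # z**n - eps

print(np.abs(np.roots(pert)))  # all n roots have modulus ~ eps**(1/n)
print(eps ** (1.0 / n))        # 0.01 for n = 4, eps = 1e-8
```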

B.2 Bound for simple eigenvalues

The 1/n-exponent in the above bound might seem far from ideal, but without additional assumptions the 1/n-exponent is tight. As an example, the polynomial x² − ε has roots x = ±√ε. This is a general phenomenon: a root with multiplicity m can split into m roots at rate O(ε^{1/m}), and it is related to the regular splitting property [Hryniv and Lancaster, 1999, Lancaster et al., 2003] in matrix eigenvalue perturbation theory.

Under the additional assumption that all eigenvalues are simple (no multiplicity), we can prove a better bound using the following idea based on the companion matrix: a small perturbation of the autoregressive parameters results in a small perturbation of the companion matrix, and a small perturbation of the companion matrix results in a small perturbation of the eigenvalues.

Matrix eigenvalue perturbation theory. The perturbation bound on eigenvalues is a well-studied problem [Greenbaum et al., 2019]. The regular splitting property states that, for an eigenvalue λ_0 with partial multiplicities m_1, ⋯, m_k, an O(ε) perturbation of the matrix can split the eigenvalue into M = m_1 + ⋯ + m_k distinct eigenvalues λ_{ij}(ε) for i = 1, ⋯, k and j = 1, ⋯, m_i, and each eigenvalue λ_{ij}(ε) is moved from the original position by O(ε^{1/m_i}).

For semi-simple eigenvalues, geometric multiplicity equals algebraic multiplicity. Since geometric multiplicity is the number of partial multiplicities while algebraic multiplicity is the sum of partial multiplicities, for semi-simple eigenvalues all partial multiplicities m_i = 1. Therefore, the regular splitting property corresponds to the asymptotic relation in equation 6. It is known that regular splitting holds for any semi-simple eigenvalue, even for non-Hermitian matrices.

Lemma B.2 (Theorem 6 in [Lancaster et al., 2003]). Let L(λ, ε) be an analytic matrix function with semi-simple eigenvalue λ_0 at ε = 0 of multiplicity M. Then there are exactly M eigenvalues λ_i(ε) of L(λ, ε) for which λ_i(ε) → λ_0 as ε → 0, and for these eigenvalues

    λ_i(ε) = λ_0 + λ′_i ε + o(ε).    (6)

Companion matrix. Matrix perturbation theory tells us how perturbations of matrices change eigenvalues, while we are interested in how perturbations of polynomial coefficients change roots. To apply matrix perturbation theory to polynomials, we introduce the companion matrix, also known as the controllable canonical form in control theory.

Definition B.1. For a monic polynomial φ(z) = z^n + φ_1 z^{n−1} + ⋯ + φ_{n−1} z + φ_n, the companion matrix of the polynomial is the square matrix

            ⎡ 0  0  ⋯  0  −φ_n     ⎤
            ⎢ 1  0  ⋯  0  −φ_{n−1} ⎥
    C(φ) =  ⎢ 0  1  ⋯  0  −φ_{n−2} ⎥
            ⎢ ⋮  ⋮  ⋱  ⋮     ⋮     ⎥
            ⎣ 0  0  ⋯  1  −φ_1     ⎦

The matrix C(φ) is the companion in the sense that its characteristic polynomial is equal to φ.

In relation to a pure autoregressive AR(p) model, the companion matrix corresponds to the transition matrix of the linear dynamical system obtained when we encode the values from the past p lags as a p-dimensional state h_t = [y_{t−p+1} ⋯ y_{t−1} y_t]^⊤. If y_t = φ_1 y_{t−1} + ⋯ + φ_p y_{t−p}, then

          ⎡ 0    1       0       ⋯  0   ⎤ ⎡ y_{t−p}   ⎤
          ⎢ 0    0       1       ⋯  0   ⎥ ⎢ y_{t−p+1} ⎥
    h_t = ⎢ ⋮    ⋮       ⋮       ⋱  ⋮   ⎥ ⎢    ⋮      ⎥ = C^⊤ h_{t−1},    (7)
          ⎢ 0    0       0       ⋯  1   ⎥ ⎢ y_{t−2}   ⎥
          ⎣ φ_p  φ_{p−1} φ_{p−2} ⋯  φ_1 ⎦ ⎣ y_{t−1}   ⎦

where C is the companion matrix of the polynomial z^p − φ_1 z^{p−1} − ⋯ − φ_p.

Proof of Theorem 4.2 for simple eigenvalues.

Proof. Let y_t be the outputs of a linear dynamical system S with only simple eigenvalues, and let Φ = (φ_1, ⋯, φ_n) be the ARMAX autoregressive parameters for y_t. Let C(Φ) be the companion matrix of the polynomial z^n − φ_1 z^{n−1} − φ_2 z^{n−2} − ⋯ − φ_n. The companion matrix is the transition matrix of the LDS described in equation 7. Since this LDS has the same autoregressive parameters and hidden state dimension as the original LDS, by Corollary 4.1 the companion matrix has the same characteristic polynomial as the original LDS, and thus also has simple (and hence also semi-simple) eigenvalues. The O(ε) convergence rate then follows from Lemma B.2 and Theorem 5.1, as the error in the ARMAX parameter estimates can be seen as a perturbation of the companion matrix.

A note on the companion matrix. One might hope for a more general result using Lemma B.2 for all systems with semi-simple eigenvalues, instead of restricting to matrices with simple eigenvalues. Unfortunately, even if the original linear dynamical system has only semi-simple eigenvalues, in general the companion matrix is not semi-simple unless the original linear dynamical system is simple. This is because the companion matrix always has its minimal polynomial equal to its characteristic polynomial, and hence has geometric multiplicity 1 for all eigenvalues. This also points to the fact that even though the companion matrix has the form of the controllable canonical form, in general it is not necessarily similar to the transition matrix of the original LDS.
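
A quick way to see Definition B.1 in action is to build the companion matrix numerically and confirm that its characteristic polynomial and eigenvalues recover the polynomial and its roots. A minimal NumPy sketch with hypothetical coefficients:

```python
import numpy as np

# Companion matrix of phi(z) = z**n + phi_1 z**(n-1) + ... + phi_n (Definition B.1):
# ones on the subdiagonal, the negated coefficients -phi_n, ..., -phi_1 in the last column.
phi = np.array([-1.1, 0.3, 0.5, -0.2])        # hypothetical coefficients phi_1..phi_n
n = len(phi)
C = np.zeros((n, n))
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -phi[::-1]

print(np.poly(C))                             # characteristic polynomial ~ [1, phi_1, ..., phi_n]
print(np.sort(np.linalg.eigvals(C)))          # eigenvalues of C ...
print(np.sort(np.roots(np.r_[1.0, phi])))     # ... equal the roots of phi
```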

B.3 Explicit bound for condition number

In this subsection, we write out explicitly the condition number for simple eigenvalues in the asymptotic relation λ(ε) = λ_0 + κ ε + o(ε), to show how it varies according to the spectrum. Here we use the notation κ(C, λ) to denote the condition number of eigenvalue λ in the companion matrix C.

Lemma B.3. For a companion matrix C with simple eigenvalues λ_1, ⋯, λ_n, the eigenvalues λ′_1, ⋯, λ′_n of the matrix perturbed by ΔC, i.e. C + ΔC, satisfy

    |λ′_j − λ_j| ≤ κ(C, λ_j) ‖ΔC‖_2 + o(‖ΔC‖_2),    (8)

and the condition number κ(C, λ_j) is bounded by

    1 / Π_{k≠j} |λ_j − λ_k| ≤ κ(C, λ_j) ≤ (√n / Π_{k≠j} |λ_j − λ_k|) (max(1, |λ_j|))^{n−1} (1 + ρ(C)²)^{(n−1)/2},    (9)

where ρ(C) is the spectral radius, i.e. the largest absolute value of its eigenvalues.

In particular, when ρ(C) ≤ 1, i.e. when the matrix is Lyapunov stable,

    |λ_j − λ′_j| ≤ (√n (√2)^{n−1} / Π_{k≠j} |λ_j − λ_k|) ‖ΔC‖_2 + o(‖ΔC‖_2).    (10)

Proof. For each simple eigenvalue λ of the companion matrix C with column eigenvector v and row eigenvector w*, the condition number of the eigenvalue is

    κ(C, λ) = ‖w‖_2 ‖v‖_2 / |w* v|.    (11)

This is derived by differentiating the eigenvalue equation C v = λ v and multiplying the differentiated equation by w*, which results in

    w* (ΔC) v + w* C (Δv) = w* (Δλ) v + w* λ (Δv),
    Δλ = w* (ΔC) v / (w* v).

Therefore,

    |Δλ| ≤ (‖w‖_2 ‖v‖_2 / |w* v|) ‖ΔC‖_2 = κ(C, λ) ‖ΔC‖_2.    (12)

The companion matrix can be diagonalized as C = V^{−1} diag(λ_1, ⋯, λ_n) V, where the rows of the Vandermonde matrix V are the row eigenvectors of C, while the columns of V^{−1} are the column eigenvectors of C. Since the j-th row V_{j,∗} and the j-th column V^{−1}_{∗,j} have inner product 1 by the definition of the matrix inverse, the condition number is given by

    κ(C, λ_j) = ‖V_{j,∗}‖_2 ‖V^{−1}_{∗,j}‖_2.    (13)

Formula for the inverse of the Vandermonde matrix. The Vandermonde matrix is defined as

        ⎡ 1  λ_1  λ_1²  ⋯  λ_1^{p−1} ⎤
    V = ⎢ 1  λ_2  λ_2²  ⋯  λ_2^{p−1} ⎥    (14)
        ⎢ ⋮   ⋮    ⋮    ⋱     ⋮     ⎥
        ⎣ 1  λ_p  λ_p²  ⋯  λ_p^{p−1} ⎦

The inverse of the Vandermonde matrix, V^{−1}, is given by [El-Mikkawy, 2003] using elementary symmetric polynomials:

    (V^{−1})_{i,j} = (−1)^{i+j} S_{p−i,j} / (Π_{k<j} (λ_j − λ_k) Π_{k>j} (λ_k − λ_j)),    (15)

where S_{p−i,j} = S_{p−i}(λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p).

Pulling out the common denominator, the j-th column vector of V^{−1} is

    ((−1)^j / (Π_{k<j} (λ_j − λ_k) Π_{k>j} (λ_k − λ_j))) · [ −S_{p−1},  S_{p−2},  −S_{p−3},  ⋯,  (−1)^{p−1} S_1,  (−1)^p ]^⊤,

where the elementary symmetric polynomials are over the variables λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p.

For example, if p = 4, then the 3rd column (up to scaling) would be

    [ −λ_1 λ_2 λ_4,   λ_1 λ_2 + λ_1 λ_4 + λ_2 λ_4,   −(λ_1 + λ_2 + λ_4),   1 ]^⊤ / ((λ_1 − λ_3)(λ_2 − λ_3)(λ_4 − λ_3)).

Bounding the condition number. As discussed before, the condition number for eigenvalue λ_j is

    κ(C, λ_j) = ‖V_{j,∗}‖_2 ‖V^{−1}_{∗,j}‖_2,

where V_{j,∗} is the j-th row of the Vandermonde matrix V and V^{−1}_{∗,j} is the j-th column of V^{−1}.
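
Formula (13) is straightforward to check numerically before continuing with the bound. The sketch below, assuming NumPy and hypothetical eigenvalues, computes κ(C, λ_j) from the Vandermonde rows and columns and compares it against the first-order eigenvalue movement in (8) under a small random perturbation.

```python
import numpy as np

# Eigenvalue condition numbers of a companion matrix via eq. (13):
# kappa(C, lam_j) = ||V_{j,:}||_2 * ||V^{-1}_{:,j}||_2, with V the Vandermonde matrix.
lams = np.array([0.9, 0.5, -0.3, -0.8])      # hypothetical simple eigenvalues
a = np.poly(lams)[1:]                        # monic polynomial with these roots
n = len(lams)
C = np.zeros((n, n))                         # companion matrix of Definition B.1
C[1:, :-1] = np.eye(n - 1)
C[:, -1] = -a[::-1]

V = np.vander(lams, increasing=True)         # row j is the row eigenvector [1, lam_j, ..., lam_j^{n-1}]
kappa = np.linalg.norm(V, axis=1) * np.linalg.norm(np.linalg.inv(V), axis=0)

# First-order check: a small perturbation dC moves lam_j by at most ~ kappa_j * ||dC||_2.
dC = 1e-7 * np.random.default_rng(1).normal(size=(n, n))
order = np.argsort(lams)
moved = np.sort(np.linalg.eigvals(C + dC).real)
print(np.abs(moved - lams[order]) / np.linalg.norm(dC, 2))   # observed movement rates ...
print(kappa[order])                                          # ... bounded by kappa (up to o(1))
```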

By definition V_{j,∗} = [1  λ_j  λ_j²  ⋯  λ_j^{p−1}], so

    ‖V_{j,∗}‖_2 = (Σ_{i=0}^{p−1} |λ_j|^{2i})^{1/2}.

Using the above explicit expression for V^{−1},

    ‖V^{−1}_{∗,j}‖_2 = (1 / Π_{k≠j} |λ_j − λ_k|) (Σ_{i=0}^{p−1} S_i(λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p)²)^{1/2}.

Therefore,

    κ(C, λ_j) = (1 / Π_{k≠j} |λ_j − λ_k|) (Σ_{i=0}^{p−1} |λ_j|^{2i})^{1/2} (Σ_{i=0}^{p−1} S_i(λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p)²)^{1/2}.    (16)

Note that both parts under (⋯)^{1/2} are greater than or equal to 1, so we can bound κ(C, λ_j) below by

    κ(C, λ_j) ≥ 1 / Π_{k≠j} |λ_j − λ_k|.

We can also bound the two parts from above. The first part can be bounded by

    (Σ_{i=0}^{p−1} |λ_j|^{2i})^{1/2} ≤ √p max(1, |λ_j|)^{p−1}.    (17)

For the second part, since |S_i(λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p)| ≤ (p−1 choose i) |λ|_max^i, we have that

    Σ_{i=0}^{p−1} S_i(λ_1, ⋯, λ_{j−1}, λ_{j+1}, ⋯, λ_p)² ≤ Σ_{i=0}^{p−1} (p−1 choose i) |λ|_max^{2i} = (1 + |λ|_max²)^{p−1}.    (18)

Combining equations 17 and 18 for the upper bound, and putting them together with the lower bound, we obtain

    1 / Π_{k≠j} |λ_j − λ_k| ≤ κ(C, λ_j) ≤ (√p / Π_{k≠j} |λ_j − λ_k|) (max(1, |λ_j|))^{p−1} (1 + ρ(C)²)^{(p−1)/2},    (19)

as desired.

Theorem 4.3 follows from Lemma B.3, because the estimation error on the autoregressive parameters can be seen as a perturbation of the companion matrix, and the companion matrix has the same eigenvalues as the original LDS.

C Iterated regression for ARMAX

Algorithm. We generalize Algorithm 1 to accommodate exogenous inputs. Since the exogenous inputs are explicitly observed, including them in the regression does not change the consistency property of the estimator.

Theorem A.1 shows that different output channels from the same LDS have the same autoregressive parameters in their ARMAX models. Therefore, we can leverage multidimensional outputs by estimating the autoregressive parameters in each channel separately and averaging the estimates. A minimal code sketch follows the algorithm below.

Algorithm 2: Regularized iterated regression for AR parameter estimation in ARMAX

Input: a time series {y_t}_{t=1}^T where y_t ∈ ℝ^m, an exogenous input series {x_t}_{t=1}^T where x_t ∈ ℝ^k, and a guessed hidden state dimension n.

for d = 1, ⋯, m do
    Let y_t^{(d)} be the projection of y_t to the d-th dimension;
    Initialize error term estimates ε̂_t = 0 for t = 1, ⋯, T;
    for i = 0, ⋯, n do
        Perform ℓ2-regularized least squares regression of y_t^{(d)} against lagged terms of y_t^{(d)}, x_t, and ε̂_t to solve for coefficients φ̂_j, η̂_j, and θ̂_j in the linear equation
            y_t^{(d)} = c + Σ_{j=1}^n φ̂_j y_{t−j}^{(d)} + Σ_{j=0}^{n−1} η̂_j^⊤ x_{t−j} + Σ_{j=1}^i θ̂_j ε̂_{t−j},
        with ℓ2-regularization only on the θ̂_j;
        Update ε̂_t to be the residuals from the most recent regression;
    end
    Record Φ̂^{(d)} = (φ̂_1, ⋯, φ̂_n);
end
Return the average estimate Φ̂ = (1/m)(Φ̂^{(1)} + ⋯ + Φ̂^{(m)}).

Again as before, the i-th iteration of the regression only uses error terms from the past i lags. In other words, the initial iteration is an ARMAX(n, 0, n − 1) regression, the first iteration is an ARMAX(n, 1, n − 1) regression, and so forth.
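
The following is a minimal single-channel NumPy sketch of the inner loop of Algorithm 2, not the authors' implementation; the regularization strength, the exact lag ranges, and the function name are assumptions made for illustration. Regularization is applied only to the error-term coefficients by augmenting the least squares system.

```python
import numpy as np

def armax_ar_estimate(y, x, n, alpha=1e-3):
    """Sketch of Algorithm 2 for one output channel.

    y: (T,) output series, x: (T, k) exogenous inputs, n: guessed hidden dimension.
    Returns the estimated AR parameters (phi_1, ..., phi_n); alpha penalizes only
    the error-term coefficients (an assumption about the regularization strength).
    """
    T, k = len(y), x.shape[1]
    eps = np.zeros(T)                         # error-term estimates, initialized to zero
    phi = np.zeros(n)
    for i in range(n + 1):                    # iteration i uses i lags of residuals
        rows, targets = [], []
        for t in range(n, T):
            lagged_y = [y[t - j] for j in range(1, n + 1)]
            lagged_x = list(x[t - n + 1:t + 1][::-1].ravel())   # x_t, ..., x_{t-n+1}
            lagged_e = [eps[t - j] for j in range(1, i + 1)]
            rows.append([1.0] + lagged_y + lagged_x + lagged_e)
            targets.append(y[t])
        Z, b = np.array(rows), np.array(targets)
        # l2-penalize only the error-term coefficients (the last i columns)
        S = np.zeros((i, Z.shape[1]))
        if i > 0:
            S[:, -i:] = np.sqrt(alpha) * np.eye(i)
        coef, *_ = np.linalg.lstsq(np.vstack([Z, S]), np.r_[b, np.zeros(i)], rcond=None)
        phi = coef[1:n + 1]                   # AR parameters after the intercept
        eps[n:] = b - Z @ coef                # residuals feed the next iteration
    return phi
```

Running this per output channel and averaging the returned vectors mirrors the outer loop over d in Algorithm 2.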

Time complexity. The iterated regression in each dimension involves n + 1 steps of least squares regression, each on at most n(k + 2) variables. Therefore, the total time complexity of Algorithm 2 is O(nm((nk)²T + (nk)³)) = O(mn³k²T + mn⁴k³), where T is the sequence length, n the hidden state dimension, m the output dimension, and k the input dimension.

D Additional simulation details

D.1 Synthetic data generation

First, we generate K cluster centers by generating LDSs with random matrices A, B, C of standard i.i.d. Gaussians. We assume that the output y_t only depends on the hidden state h_t but not on the input x_t, i.e. the matrix D is zero. When generating the random LDSs, we require that the spectral radius satisfies ρ(A) ≤ 1, i.e. all eigenvalues of A have absolute value at most 1, and regenerate a new random matrix if the spectral radius is above 1. Our method also applies to the case of arbitrary spectral radius; this requirement only serves to prevent numeric overflow in the generated sequences. We also require that the ℓ2 distance between the eigenvalue spectra of cluster centers, d(Θ_1, Θ_2) = ‖λ(A_1) − λ(A_2)‖_2, is at least 0.2.

Then, we generate 100 LDSs by randomly assigning them to the clusters. To obtain an LDS with assigned cluster center Θ = (A_c, B_c, C_c), we generate A′ by adding i.i.d. Gaussians to each entry of A_c, while B′ and C′ are new random matrices of i.i.d. standard Gaussians. The standard deviation of the i.i.d. Gaussians for A′ − A_c is chosen such that the average distance to the cluster center is less than half of the inter-cluster distance between centers.

For each LDS, we generate a sequence by drawing hidden inputs x_t ∼ N(0, 1) and adding output noise ξ_t ∼ N(0, 0.01²) to the outputs.

D.2 Empirical correlation between AR distance and LDS distance

Theorem 4.2 shows that LDSs with similar AR parameters also have similar eigenvalues. The converse of Theorem 4.2 is also true: dynamical systems with small eigenvalue distance have small autoregressive parameter distance, which follows from perturbation bounds for characteristic polynomials [Ipsen and Rehman, 2008]. Figure 2 shows simulation results where the AR parameter distance and the LDS eigenvalue distance are highly correlated.

Figure 2: The eigenvalue ℓ2 distance and the autoregressive parameter ℓ2 distance for 100 random linear dynamical systems with eigenvalues drawn uniformly at random from [−1, 1]. The two distance measures are highly correlated.
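
A rough way to reproduce the qualitative behavior in Figure 2 is sketched below; it assumes NumPy, pairs eigenvalues by sorting (a heuristic), and converts eigenvalues to AR parameters through the characteristic polynomial as derived in Appendix A.

```python
import numpy as np

# Rough reproduction of the comparison in Figure 2: draw eigenvalue sets uniformly
# from [-1, 1], convert to AR parameters (phi_i = negated characteristic-polynomial
# coefficient), and correlate the two distance measures.
rng = np.random.default_rng(0)
n = 3

def ar_params(eigs):
    return -np.poly(eigs)[1:]                # phi_1, ..., phi_n for prod_i (z - eig_i)

d_eig, d_ar = [], []
for _ in range(100):
    e1 = np.sort(rng.uniform(-1, 1, size=n))
    e2 = np.sort(rng.uniform(-1, 1, size=n))
    d_eig.append(np.linalg.norm(e1 - e2))                      # eigenvalue l2 distance
    d_ar.append(np.linalg.norm(ar_params(e1) - ar_params(e2))) # AR parameter l2 distance
print(np.corrcoef(d_eig, d_ar)[0, 1])                          # typically strongly positive
```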