Optimization and Engineering, https://doi.org/10.1007/s11081-021-09643-x

RESEARCH ARTICLE

Extracting a low-dimensional predictable time series

Yining Dong1 · S. Joe Qin2 · Stephen P. Boyd3

Received: 15 October 2020 / Revised: 16 March 2021 / Accepted: 10 May 2021 © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021

Abstract  Large scale multi-dimensional time series can be found in many disciplines, including finance, biomedical engineering, and industrial engineering systems. It has long been recognized that the time dependent components of the vector time series often reside in a subspace, leaving its complement independent over time. In this paper we develop a method for projecting the time series onto a low-dimensional time series that is predictable, in the sense that an auto-regressive model achieves low error. Our formulation and method follow ideas from principal component analysis, so we refer to the extracted low-dimensional time series as principal time series. In one special case we can compute the optimal projection exactly; in others, we give a heuristic method that seems to work well in practice. The effectiveness of the method is demonstrated on synthesized and real time series.

Keywords Time series · Dimension reduction · Feature extraction

* Yining Dong [email protected] S. Joe Qin [email protected] Stephen P. Boyd [email protected]

1 School of Science, City University of Hong Kong, Hong Kong, China 2 School of Data Science and Hong Kong Institute for Data Science, Centre for Systems Informatics Engineering, City University of Hong Kong, 83 Tat Chee Ave., Hong Kong, China 3 Department of Electrical Engineering, Stanford University, Stanford, USA


1 Introduction

High dimensional time series analysis and applications have become increasingly important in many different domains. In many cases, the high dimensional time series data are both cross-correlated and auto-correlated. Cross-correlations among different time series make it possible to use a set of lower dimensional time series to represent the original, high dimensional time series. For example, principal component analysis (PCA) (Connor and Korajczyk 1986; Bai and Ng 2008) and generalized PCA (Choi 2012) methods utilize the cross-correlations among different time series to extract lower dimensional factors that capture maximal variance. Although PCA has seen wide use as a dimension reduction method, it does not focus on modeling the auto-correlations or dynamics that can exist in time series data. Since auto-correlations make it possible to predict future values from past values, it is desirable to perform such low dimensional modeling of the dynamics.

In this work, a linear projection method is proposed to extract a lower dimensional, most predictable time series from a high-dimensional time series. The entries of the low dimensional time series are mutually uncorrelated, so that they capture as much of the dynamics as possible. The advantage of the proposed method is that it focuses on extracting principal time series with the most dynamics. Therefore, the dynamic features of the high dimensional data are concentrated in a set of lower dimensional time series, which makes the method very useful for data prediction, dynamic feature extraction, and visualization.

The proposed method has numerous potential applications, ranging from finance to industrial engineering. In finance, if the high dimensional time series consist of returns of some assets, applying the proposed method gives the most predictable portfolio. In chemical processes, oscillations are usually undesirable (Thornhill and Hägglund 1997; Thornhill et al. 2003), and applying the proposed method to the process measurements can help detect the unwanted oscillations. In biomedical engineering, electroencephalogram (EEG) data can be characterized by waves with different frequencies (Teplan 2002; Tatum 2014). Applying the proposed method to EEG data has the potential to detect the different waves.

In this work, the extraction of the principal time series and the VAR modeling of the principal time series are achieved simultaneously by solving one optimization problem (i.e., there is no need to refit a VAR model for the principal time series after they are extracted). In addition, the extracted principal time series are the most predictable from their past values and capture most of the dynamics. This property makes the proposed method very useful for prediction, dynamic feature extraction, and visualization.


2 The most predictable projected time series

2.1 Predictability of a time series

Consider a wide-sense stationary n-dimensional vector time series process $z_t \in \mathbf{R}^n$, $t \in \mathbf{Z}$, with
$$\mathbf{E}\, z_t = 0, \qquad \mathbf{E}\, z_t z_{t+\tau}^T = \Sigma_\tau, \quad \tau \in \mathbf{Z}. \tag{1}$$

Here $\Sigma_\tau$ is the auto-covariance matrix for lag $\tau$. The zero mean assumption is without loss of generality, since this can be arranged by subtracting the mean from the original process, if it is not zero. We refer to a time series with $\Sigma_0 = I$ as standardized.

Predictability measure. An M-memory auto-regressive (AR) predictor for $z_t$ has the form

$$\hat z_t = A_1 z_{t-1} + A_2 z_{t-2} + \cdots + A_M z_{t-M},$$

where $A_i \in \mathbf{R}^{n\times n}$, $i = 1, 2, \dots, M$ are the AR (matrix) coefficients. We define the (M-memory) (un-)predictability measure $\alpha$ for the time series as the smallest possible mean square AR prediction error,
$$\alpha = \min_{A_1,\dots,A_M} \mathbf{E}\,\|z_t - \hat z_t\|_2^2, \tag{2}$$
which has the same value for all t. To simplify the notation for the rest of the paper, we define $A = [A_1 \ \cdots \ A_M]$. We can easily evaluate the predictability measure $\alpha$. The objective can be expressed as

$$\mathbf{E}\,\|z_t - \hat z_t\|_2^2 = \mathbf{tr}\left(\Sigma_0 - 2\begin{bmatrix}\Sigma_1\\ \Sigma_2\\ \vdots\\ \Sigma_M\end{bmatrix}^T A^T + A\begin{bmatrix}\Sigma_0 & \Sigma_1^T & \cdots & \Sigma_{M-1}^T\\ \Sigma_1 & \Sigma_0 & \cdots & \Sigma_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ \Sigma_{M-1} & \Sigma_{M-2} & \cdots & \Sigma_0\end{bmatrix} A^T\right).$$
The optimal AR coefficients are readily found to be
$$A^T = \begin{bmatrix}\Sigma_0 & \Sigma_1^T & \cdots & \Sigma_{M-1}^T\\ \Sigma_1 & \Sigma_0 & \cdots & \Sigma_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ \Sigma_{M-1} & \Sigma_{M-2} & \cdots & \Sigma_0\end{bmatrix}^{-1}\begin{bmatrix}\Sigma_1\\ \Sigma_2\\ \vdots\\ \Sigma_M\end{bmatrix},$$
assuming the inverse of the symmetric positive semidefinite block Toeplitz matrix above exists. It follows that


$$\alpha = \mathbf{tr}\left(\Sigma_0 - \begin{bmatrix}\Sigma_1\\ \Sigma_2\\ \vdots\\ \Sigma_M\end{bmatrix}^T \begin{bmatrix}\Sigma_0 & \Sigma_1^T & \cdots & \Sigma_{M-1}^T\\ \Sigma_1 & \Sigma_0 & \cdots & \Sigma_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ \Sigma_{M-1} & \Sigma_{M-2} & \cdots & \Sigma_0\end{bmatrix}^{-1}\begin{bmatrix}\Sigma_1\\ \Sigma_2\\ \vdots\\ \Sigma_M\end{bmatrix}\right). \tag{3}$$
It can be shown that $0 \le \alpha \le \mathbf{tr}\,\Sigma_0$. A low value of $\alpha$ indicates high predictability; a high value of $\alpha$ indicates low predictability. The extreme case $\alpha = 0$ means that the AR model has zero residual, and the extreme case $\alpha = \mathbf{tr}\,\Sigma_0$ occurs when $z_t$ and $z_s$ are uncorrelated for $t \ne s$, so $\Sigma_\tau = 0$ for $\tau \ne 0$, and the optimal AR coefficients are all zero.
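To make the computation in (3) concrete, the following sketch evaluates the predictability measure with NumPy, assuming the auto-covariance matrices $\Sigma_0, \dots, \Sigma_M$ are already available as a list of arrays; the function name and interface are ours, not part of the paper.

```python
import numpy as np

def predictability(Sigmas, M):
    """(Un-)predictability measure of an M-memory AR predictor, cf. (2)-(3).

    Sigmas[tau] is the n x n auto-covariance matrix Sigma_tau for tau = 0, ..., M.
    Returns alpha and the stacked optimal coefficients A^T = [A_1 ... A_M]^T.
    """
    n = Sigmas[0].shape[0]
    # right-hand side [Sigma_1; ...; Sigma_M] and the block Toeplitz matrix of (3)
    R = np.vstack([Sigmas[tau] for tau in range(1, M + 1)])
    G = np.empty((M * n, M * n))
    for i in range(M):
        for j in range(M):
            blk = Sigmas[i - j] if i >= j else Sigmas[j - i].T
            G[i * n:(i + 1) * n, j * n:(j + 1) * n] = blk
    AT = np.linalg.solve(G, R)              # optimal AR coefficients, stacked as A^T
    alpha = np.trace(Sigmas[0] - R.T @ AT)  # equation (3)
    return alpha, AT
```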

2.2 The most predictable projected time series

We can obtain a lower-dimensional time series $x_t \in \mathbf{R}^m$ as a linear function of the original time series $z_t \in \mathbf{R}^n$, as $x_t = W^T z_t$, where $W \in \mathbf{R}^{n\times m}$, with $m < n$. We denote the auto-covariance matrices of $x_t$ as
$$S_\tau = \mathbf{E}\, x_t x_{t+\tau}^T = W^T \Sigma_\tau W, \quad \tau \in \mathbf{Z}.$$

Our goal is to choose W so that the series $x_t$ is predictable, i.e., has a low value of $\alpha$. We evidently need to normalize W to rule out the solution W = 0; we do this with the constraint
$$W^T \Sigma_0 W = S_0 = I.$$

This ensures that $\mathbf{E}\, x_t x_t^T = I$, i.e., the low-dimensional time series $x_t$ is standardized. The most predictable projected time series is found by solving the optimization problem
$$\begin{array}{ll} \text{maximize} & f(W) \\ \text{subject to} & S_0 = I, \end{array} \tag{4}$$
with variables W and A, where
$$f(W) = \mathbf{tr}\left(\begin{bmatrix}S_1\\ S_2\\ \vdots\\ S_M\end{bmatrix}^T \begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_1 & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ S_{M-1} & S_{M-2} & \cdots & S_0\end{bmatrix}^{-1}\begin{bmatrix}S_1\\ S_2\\ \vdots\\ S_M\end{bmatrix}\right).$$
The solution is evidently not unique; if Q is an orthonormal m × m matrix, then f(WQ) = f(W). In other words, all solutions differ by an orthonormal matrix.


2.3 Special case with exact solutions

Here we observe that when M = 1 and m = 1, the problem can be solved exactly. For projection, the problem is
$$\begin{array}{ll} \text{maximize} & \|W^T \Sigma_1 W\|_F^2 \\ \text{subject to} & W^T \Sigma_0 W = I, \end{array}$$
using $S_0 = I$ to simplify the objective. This is readily solved. Define $Z = \Sigma_0^{1/2} W$ and $Y = \Sigma_0^{-1/2}\Sigma_1\Sigma_0^{-1/2}$, so that the problem is to maximize $\|Z^T Y Z\|_F^2$ subject to $Z^T Z = I$. Let V denote the eigenvector of $Y + Y^T$ corresponding to the eigenvalue with the maximum magnitude. Then V satisfies the constraint and maximizes the objective value. Therefore, the optimal W is $W^\star = \Sigma_0^{-1/2} V$. Once an optimal $W^\star$ is obtained, the optimal $A^\star$ can be easily calculated as
$$A^\star = S_1^T S_0^{-1} = S_1^T = (W^\star)^T \Sigma_1^T W^\star.$$
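The M = 1, m = 1 construction above is easy to implement; a minimal sketch (our own naming, assuming NumPy and SciPy) is:

```python
import numpy as np
from scipy.linalg import sqrtm

def most_predictable_direction(Sigma0, Sigma1):
    """Exact solution of (4) for M = 1, m = 1: returns w with w^T Sigma0 w = 1
    and the scalar AR coefficient a, so that xhat_t = a * x_{t-1} for x_t = w^T z_t."""
    S_half = np.real(sqrtm(Sigma0))
    S_inv_half = np.linalg.inv(S_half)
    Y = S_inv_half @ Sigma1 @ S_inv_half
    evals, evecs = np.linalg.eigh(Y + Y.T)        # Y + Y^T is symmetric, so eigh applies
    v = evecs[:, np.argmax(np.abs(evals))]        # eigenvector with largest-magnitude eigenvalue
    w = S_inv_half @ v
    a = w @ Sigma1.T @ w                          # A* = S_1^T = w^T Sigma_1^T w for m = 1
    return w, a
```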

3 Algorithm

When M ≠ 1, there is no exact solution to (4), to the best of our knowledge. Problem (4) is essentially an optimization problem over the Grassmannian manifold. There has been research on how to address such optimization problems, including Absil et al. (2009), Usevich and Markovsky (2014), and Edelman et al. (1998). In this section, we give a heuristic method that seems to work well in practice.

We construct the columns of $W \in \mathbf{R}^{n\times m}$ sequentially. Each column is chosen to satisfy the constraint $S_0 = I$, while maximizing the predictability of the projected time series. In this section, we explain how to achieve this.

Assume that we have already constructed k columns of W, with $W^k \in \mathbf{R}^{n\times k}$ and $A^k \in \mathbf{R}^{k\times Mk}$. (We initialize k = 0, $W^0 = 0$, and $A^0 = 0$ to construct the first column.) Then, our goal is to choose $W^{k+1} = [W^k\ w]$, where $w \in \mathbf{R}^n$, such that the (k+1)-dimensional projected time series is most predictable. This is equivalent to the optimization problem
$$\begin{array}{ll} \text{maximize} & f(W^{k+1}) \\ \text{subject to} & W^{k+1} = [W^k\ w] \\ & S_0^{k+1} = I, \end{array} \tag{5}$$

where $W^k$ is fixed and $S_0^{k+1} = (W^{k+1})^T \Sigma_0 W^{k+1}$. We cannot find exact solutions to this problem. However, we know how to solve for $A^{k+1}$ exactly when $W^{k+1}$ is fixed, and how to solve for $W^{k+1}$ exactly when $A^{k+1}$ is fixed. We can iterate these two steps until convergence to obtain an approximate solution to (5).


3.1 Solving for $A^{k+1}$ with fixed w

When w is fixed, we can solve for $A^{k+1}$ exactly. According to Sect. 2.1, when $W^{k+1}$ is known, the solution for $A^{k+1}$ is
$$(A^{k+1})^T = \begin{bmatrix}S_0^{k+1} & (S_1^{k+1})^T & \cdots & (S_{M-1}^{k+1})^T\\ S_1^{k+1} & S_0^{k+1} & \cdots & (S_{M-2}^{k+1})^T\\ \vdots & \vdots & \ddots & \vdots\\ S_{M-1}^{k+1} & S_{M-2}^{k+1} & \cdots & S_0^{k+1}\end{bmatrix}^{-1}\begin{bmatrix}S_1^{k+1}\\ S_2^{k+1}\\ \vdots\\ S_M^{k+1}\end{bmatrix},$$
where $S_\tau^{k+1} = (W^{k+1})^T \Sigma_\tau W^{k+1}$, $\tau \in \mathbf{Z}$.

The time complexity for forming $S_1^{k+1}, \dots, S_M^{k+1}$ is $O(M(k+1)n^2)$. Once $S_\tau^{k+1}$, $\tau = 1, \dots, M$ are calculated, the time complexity for updating $A^{k+1}$ is dominated by the inversion step, which is $O(M^3(k+1)^3)$. Therefore, the overall time complexity for updating $A^{k+1}$ is $\max\{O(M(k+1)n^2), O(M^3(k+1)^3)\}$.

3.2 Solving for w with fixed $A^{k+1}$

When $A^{k+1}$ is fixed, we can solve for $W^{k+1}$ exactly. With some derivations, the optimization problem for w can be expressed as
$$\begin{array}{ll} \text{minimize} & f(w) = w^T B w - 2 c^T w \\ \text{subject to} & W^{k,T} \Sigma_0 w = 0 \\ & w^T \Sigma_0 w = 1, \end{array} \tag{6}$$
where $B \succ 0$, and B, c are known. The derivation and expressions for B and c can be found in Appendix A. The solution of this problem can be obtained explicitly as follows.

Let $U \in \mathbf{R}^{n\times(n-k)}$ be the orthogonal complement of $\Sigma_0^{1/2} W^k$, with $U^T U = I$. Denote the SVD of $U^T \Sigma_0^{-1/2} B \Sigma_0^{-1/2} U$ as $U^T \Sigma_0^{-1/2} B \Sigma_0^{-1/2} U = V \Lambda V^T$ with $\Lambda = \mathbf{diag}(\lambda_1, \dots, \lambda_{n-k})$, $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{n-k}$. Then, problem (6) can be transformed into
$$\begin{array}{ll} \text{minimize} & f(y) = y^T \Lambda y - 2 \beta^T y \\ \text{subject to} & y^T y = 1, \end{array} \tag{7}$$
where $\beta = [\beta_1\ \beta_2\ \cdots\ \beta_{n-k}]^T = V^T U^T \Sigma_0^{-1/2} c$ and $w = \Sigma_0^{-1/2} U V y$. The solution to problem (7) has the form $y = (\Lambda + \nu I)^{-1}\beta$, where $\nu$ can be obtained as the root of
$$\sum_{i=1}^{n-k} \frac{\beta_i^2}{(\lambda_i + \nu)^2} = 1$$
that satisfies $\nu > -\lambda_1$. After $\nu$ is found, the optimal w of problem (6) can be obtained as

$$w^\star = \Sigma_0^{-1/2}\, U V (\Lambda + \nu I)^{-1} \beta.$$
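A small numerical sketch of this last step, solving (7) for $\nu$ and y once $\Lambda$ and $\beta$ are available, is shown below. It assumes $\beta_1 \ne 0$ so that the root is bracketed on $(-\lambda_1, \infty)$; degenerate cases need extra care. The bracketing bound and function name are ours.

```python
import numpy as np
from scipy.optimize import brentq

def solve_problem_7(lam, beta, eps=1e-12):
    """Minimize y^T diag(lam) y - 2 beta^T y subject to ||y||_2 = 1.

    lam is sorted in ascending order. The optimal y = (diag(lam) + nu I)^{-1} beta,
    where nu > -lam[0] is the root of sum_i beta_i^2 / (lam_i + nu)^2 = 1.
    """
    g = lambda nu: np.sum(beta ** 2 / (lam + nu) ** 2) - 1.0
    lo = -lam[0] + eps                          # g -> +infinity here (assuming beta[0] != 0)
    hi = -lam[0] + np.linalg.norm(beta) + 1.0   # g < 1 here, so the root is bracketed
    nu = brentq(g, lo, hi)
    y = beta / (lam + nu)
    return y, nu
```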


According to the expressions for B and c in Appendix A, the time complexity for forming B and c is $O(M^2 n^2)$. Once B and c are calculated, the time complexity for updating w is dominated by the SVD step, which is $O(n^3)$. Therefore, the overall time complexity for updating w is $\max\{O(M^2 n^2), O(n^3)\}$.

3.3 Complete algorithm

We have discussed algorithms to solve for $A^{k+1}$ when $W^{k+1}$ is fixed and to solve for $W^{k+1} = [W^k\ w]$ when $A^{k+1}$ is fixed. The complete procedure to construct the (k+1)th column of W is to iterate these two steps until convergence. As discussed in Sects. 3.1 and 3.2, the time complexity for updating $A^{k+1}$ is $\max\{O(M(k+1)n^2), O(M^3(k+1)^3)\}$, and the time complexity for updating w is $\max\{O(M^2 n^2), O(n^3)\}$. In practice, it is often the case that $m \ll n$ and $M \ll n$. Therefore, the overall time complexity at each iteration step is $O(n^3)$.

Once $W^{k+1}$ and $A^{k+1}$ are obtained, the same procedure can be applied to construct the next column of W. The complete algorithm is given in Algorithm 1.

The proposed heuristic method has two advantages over directly solving the original problem (4). First, there is no uniqueness issue in this recursive method, because starting from the first column of W, each column of W is determined by the optimization problem (6). Second, the iterative procedure gives an indication of the dimension of the predictable vector time series. Let $\alpha_k$ denote the predictability measure for the k-dimensional projected time series, or the value of the objective function in (4) with optimal $W^k$ and $A^k$. Then, when extracting the (k+1)th scalar time series, we have the following upper bound for $\alpha_{k+1}$ of the (k+1)-dimensional time series,
$$\alpha_{k+1} \le \alpha_k + w^T \Sigma_0 w = \alpha_k + 1.$$

When the optimal $\alpha_{k+1}$ is very close to the upper bound $\alpha_k + 1$, we can draw two conclusions. First, adding another scalar time series does not improve the prediction of the k-dimensional vector time series. Second, no other self-predictable scalar time series can be extracted. These two facts suggest that all the predictable components in the time series data have already been extracted, and hence we can stop the iteration procedure.
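The overall procedure can be summarized in the sketch below. The inner steps `update_A` and `update_w` stand for the exact solutions of Sects. 3.1 and 3.2 and are passed in as callables (they are not implemented here); the starting column, tolerances, and iteration limits are our own choices rather than the paper's.

```python
import numpy as np

def extract_principal_series(Sigmas, M, m_max, init_w, update_A, update_w,
                             tol=1e-3, inner_tol=1e-8, max_inner=100):
    """Column-by-column heuristic of Sect. 3 (sketch).

    Sigmas[tau] are the auto-covariance matrices of z_t; init_w, update_A, update_w
    are callables, with update_A / update_w implementing Sects. 3.1 and 3.2.
    """
    n = Sigmas[0].shape[0]
    W = np.zeros((n, 0))
    alpha_prev = 0.0
    for k in range(m_max):
        w = init_w(Sigmas, W)                                   # feasible starting column
        for _ in range(max_inner):                              # alternate the two exact steps
            A = update_A(Sigmas, np.column_stack([W, w]), M)
            w_new = update_w(Sigmas, W, A, M)
            if np.linalg.norm(w_new - w) < inner_tol:
                w = w_new
                break
            w = w_new
        W_new = np.column_stack([W, w])
        alpha = projected_unpredictability(Sigmas, W_new, M)    # alpha_{k+1}, cf. (3)
        if alpha > alpha_prev + 1.0 - tol:                      # stopping rule of Sect. 3.3
            break                                               # no further predictable component
        W, alpha_prev = W_new, alpha
    return W

def projected_unpredictability(Sigmas, W, M):
    """Predictability measure (3) of the projected series x_t = W^T z_t."""
    S = [W.T @ Sig @ W for Sig in Sigmas]
    m = W.shape[1]
    R = np.vstack(S[1:M + 1])
    G = np.empty((M * m, M * m))
    for i in range(M):
        for j in range(M):
            G[i * m:(i + 1) * m, j * m:(j + 1) * m] = S[i - j] if i >= j else S[j - i].T
    return np.trace(S[0] - R.T @ np.linalg.solve(G, R))
```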


4 Examples

In this section, we test our method by applying it to a synthesized high-dimensional time series dataset and a quarterly GDP growth dataset. In both examples, we demonstrate advantages of our proposed method over scalar time series AR fitting.

4.1 Simulation dataset

The proposed method is first tested on a synthesized dataset generated from the model

$$x_t = B_1 x_{t-1} + B_2 x_{t-2} + v_t, \qquad y_t = P x_t + e_t, \qquad z_t = Q \Sigma_y^{-1/2} y_t, \tag{8}$$
where
$$B_1 = \begin{bmatrix} 1.1241 & 0.3045 & 0.3806\\ 0.3902 & -0.8169 & -0.3114\\ -0.7166 & -0.8630 & 1.0115\end{bmatrix}, \qquad B_2 = \begin{bmatrix} -0.2482 & 0.3676 & 0.0328\\ -0.4240 & 0.1101 & 0.0267\\ 0.6011 & -0.5975 & -0.3224\end{bmatrix}.$$
The matrix $P \in \mathbf{R}^{1000\times 3}$ is a random matrix with orthonormal columns, $Q \in \mathbf{R}^{1000\times 1000}$ is a random orthonormal matrix, $v_t$ is i.i.d. $\mathcal{N}(0, I)$, and $e_t$ is i.i.d. $\mathcal{N}(0, 0.02^2 I)$. $\Sigma_y$ is the empirical covariance of $y_t$, calculated as
$$\Sigma_y = \frac{1}{T}\sum_{t=1}^{T} y_t y_t^T,$$
where T is the number of samples. In this example, 10,000 consecutive samples are generated from the model. Hence, T = 10,000. The sample autocorrelation functions of the true underlying predictable time series $x_t$ are plotted in Fig. 1. We can see that there is strong temporal dependence in $x_t$.
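For reference, data from model (8) can be synthesized roughly as follows (a sketch with NumPy; the random seed and variable names are ours).

```python
import numpy as np

rng = np.random.default_rng(0)
T, n = 10_000, 1000

B1 = np.array([[ 1.1241,  0.3045,  0.3806],
               [ 0.3902, -0.8169, -0.3114],
               [-0.7166, -0.8630,  1.0115]])
B2 = np.array([[-0.2482,  0.3676,  0.0328],
               [-0.4240,  0.1101,  0.0267],
               [ 0.6011, -0.5975, -0.3224]])

P, _ = np.linalg.qr(rng.standard_normal((n, 3)))    # random matrix with orthonormal columns
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))    # random orthonormal matrix

x = np.zeros((T + 2, 3))
for t in range(2, T + 2):                           # 3-dimensional VAR(2) factor process
    x[t] = B1 @ x[t - 1] + B2 @ x[t - 2] + rng.standard_normal(3)
x = x[2:]

y = x @ P.T + 0.02 * rng.standard_normal((T, n))    # y_t = P x_t + e_t
Sigma_y = y.T @ y / T                               # empirical covariance of y_t
evals, evecs = np.linalg.eigh(Sigma_y)
Sy_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
z = y @ (Q @ Sy_inv_half).T                         # z_t = Q Sigma_y^{-1/2} y_t
```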

4.1.1 Scalar AR approach

We first fit a 2-memory AR model to each scalar time series in $z_t$ to check the predictability of each scalar time series. We use the mean squared error (MSE) to evaluate the prediction performance of the fitted AR predictors. We find that, of all 1000 scalar time series, the minimal MSE is 0.9984 and the maximum MSE is 1.0000. Since each scalar time series has approximately mean 0 and variance 1, this indicates that 2-memory scalar AR fitting fails to extract any significant predictability information from the data. The autocorrelation functions of the first 3 scalar time series in $z_t$ are plotted in Fig. 2. It is quite clear that no significant temporal dependence can be observed.
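The scalar baseline can be reproduced with a few lines of ordinary least squares; the sketch below (our own code, plain NumPy) fits a 2-memory AR model to each column of a data matrix and reports the in-sample one-step MSE.

```python
import numpy as np

def scalar_ar_mse(z, M=2):
    """Fit an M-memory scalar AR model to each column of z (T x n) by least squares
    and return the one-step-ahead in-sample MSE of every column."""
    T, n = z.shape
    mse = np.empty(n)
    for j in range(n):
        s = z[:, j]
        # regressor matrix [s_{t-1}, ..., s_{t-M}] aligned with targets s_t
        X = np.column_stack([s[M - i - 1:T - i - 1] for i in range(M)])
        y = s[M:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        mse[j] = np.mean((y - X @ coef) ** 2)
    return mse
```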


Fig. 1 Sample autocorrelation functions of the true underlying predictable time series $x_1$, $x_2$, $x_3$ (horizontal axis: lag 0 to 20; vertical axis: sample autocorrelation).

4.1.2 PCA approach

PCA has been widely utilized in the literature to extract latent factors from vector time series. In this example, it is clear from the model assumption that the empirical covariance of $z_t$ is approximately an identity matrix. Therefore, the PCA approach would extract the 3-dimensional latent time series as the first 3 scalar time series in $z_t$. As shown in Fig. 2, we can tell that this approach is not able to successfully identify the most predictable factors.


Fig. 2 Sample autocorrelation functions of the first 3 scalar time series $z_1$, $z_2$, $z_3$ in $z_t$ (horizontal axis: lag 0 to 20; vertical axis: sample autocorrelation).

4.1.3 Proposed approach

We apply the proposed method to this simulation dataset with 2-memory (M = 2) and m = 3. The auto-covariance matrices of $z_t$ are estimated as


Table 1 MSE of the extracted time series using different approaches

                             Proposed method   Canonical transformation   DiCCA
  1st extracted time series  0.0568            0.0876                     0.0797
  2nd extracted time series  0.0824            0.0787                     0.1092
  3rd extracted time series  0.0679            0.0974                     0.1527

Table 2 Eigenvalues of the VAR models extracted using different approaches

  True model          Proposed method     Canonical transformation   DiCCA
  -0.9560             -0.9522             -0.9535                    -0.9641
   0.9230              0.9302              0.9104                     0.0624
   0.7303 + 0.5533i    0.7117 + 0.5542i    0.7164 + 0.5574i           0.7072 + 0.5470i
   0.7303 - 0.5533i    0.7117 - 0.5542i    0.7164 - 0.5574i           0.7072 - 0.5470i
   0.6968 + 0.3303i   -0.2544             -0.2702                    -0.2069
   0.6968 - 0.3303i    0.1827              0.1943                     0.1113

$$\Sigma_\tau = \frac{1}{T-M}\sum_{t=1}^{T-M} z_t z_{t+\tau}^T, \quad \tau = 1, 2,$$
and
$$\Sigma_{-\tau} = \Sigma_\tau^T, \quad \tau = 1, 2.$$
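These sample auto-covariances are straightforward to compute from a data matrix whose rows are $z_t$; a small sketch (our own helper) is given below.

```python
import numpy as np

def estimate_autocovariances(z, M):
    """Empirical auto-covariance matrices [Sigma_0, ..., Sigma_M] of z (T x n),
    with Sigma_tau ~ (1/(T-M)) * sum_t z_t z_{t+tau}^T as in Sect. 4.1.3."""
    T, _ = z.shape
    return [z[:T - M].T @ z[tau:tau + T - M] / (T - M) for tau in range(M + 1)]
```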

Using the extracted VAR predictor, we find that the MSEs of the extracted 3-dimensional principal time series are as given in Table 1. Since each of the principal time series has approximately mean 0 and variance 1, the low MSEs indicate that the extracted VAR predictor is able to make highly accurate predictions. This is a significant improvement over the scalar AR predictors. The autocorrelation functions of the 3-dimensional principal time series extracted using the proposed method are plotted in Fig. 3. It can be seen that there is indeed strong temporal dependence in the extracted predictable time series.

In addition to predictability, we also want to check the similarity between the extracted VAR model and the true model. According to Sect. 2.2 and the relationships in (8), the recovered VAR model parameters $A_1$ and $A_2$ are equivalent to the true model parameters $B_1$, $B_2$ up to a similarity transformation. Therefore, we check how similar the two models are by comparing the eigenvalues of $\begin{bmatrix} A_1 & A_2 \\ I & 0 \end{bmatrix}$ and $\begin{bmatrix} B_1 & B_2 \\ I & 0 \end{bmatrix}$. The closer the two sets of eigenvalues are, the more similar the two models are. Table 2 lists the two sets of eigenvalues ordered in terms of magnitude. We can see that the two sets of eigenvalues are quite close to each other, which implies that the recovered VAR model is close to the true model. The high predictability of the extracted principal time series and the successfully recovered VAR model demonstrate the effectiveness of the proposed method.
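The eigenvalue comparison reported in Table 2 only requires forming the VAR(2) companion matrix; a minimal sketch (our own helper) is:

```python
import numpy as np

def companion_eigenvalues(A1, A2):
    """Eigenvalues of the VAR(2) companion matrix [[A1, A2], [I, 0]],
    sorted by decreasing magnitude (as compared in Table 2)."""
    m = A1.shape[0]
    C = np.block([[A1, A2], [np.eye(m), np.zeros((m, m))]])
    ev = np.linalg.eigvals(C)
    return ev[np.argsort(-np.abs(ev))]
```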


Fig. 3 Sample autocorrelation functions of the 3-dimensional extracted predictable time series using the proposed method (panels: extracted $x_1$, $x_2$, $x_3$; horizontal axis: lag 0 to 20; vertical axis: sample autocorrelation).

4.1.4 DiCCA approach

According to the method described in Dong and Qin (2018a), a dynamic-inner canonical correlation analysis (DiCCA) model with order 2 is built on the same dataset. A key difference between DiCCA and the proposed method is that DiCCA does not consider interactions among different entries in $x_t$. The MSEs of the extracted

3-dimensional time series are listed in Table 1, and the eigenvalues of the recovered VAR model are given in Table 2. Since DiCCA does not consider the predictabilities among different entries in $x_t$, it has higher MSE values and larger differences from the true eigenvalues.

4.1.5 Canonical transformation approach

According to the method described in Box and Tiao (1977), a full-dimensional VAR model with 2-memory is fit to $z_t$ first, and the covariance matrix $\Sigma(\hat z)$ of the prediction $\hat z_t$ is calculated. The linear transformation matrix $W_1$ is selected as the eigenvectors corresponding to the largest magnitude eigenvalues of the matrix $\Sigma_0^{-1}\Sigma(\hat z)$. The MSE values of the extracted 3-dimensional latent time series are listed in Table 1 as well. In fact, many methods that extract lower-dimensional predictable time series involve fitting a high-dimensional VAR model (which can be very time consuming when n and M are large), and do not consider the prediction models for the lower-dimensional predictable time series in the objective functions. An AR model is often fit subsequently, after extracting the lower-dimensional predictable time series, for prediction purposes. In order to examine how closely this method recovers the underlying model structure, a VAR model with 2-memory is fit to $W_1^T z_t$, and the eigenvalues are listed in Table 2. Compared to the proposed method, it recovers the small magnitude eigenvalues less accurately.
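For completeness, the canonical transformation baseline can be sketched as follows; this is our reading of the Box and Tiao (1977) procedure (a least-squares VAR fit plus a generalized eigenproblem), not code from the paper.

```python
import numpy as np
from scipy.linalg import eig

def box_tiao_weights(z, M=2, m=3):
    """Canonical transformation of Box and Tiao (1977) (sketch).

    Fit a full VAR(M) to z (T x n) by least squares, form the covariance of the
    one-step predictions, and return the m eigenvectors of Sigma_0^{-1} Sigma(zhat)
    with the largest-magnitude eigenvalues as the columns of W1.
    """
    T, n = z.shape
    X = np.hstack([z[M - i - 1:T - i - 1] for i in range(M)])   # [z_{t-1}, ..., z_{t-M}]
    Y = z[M:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)                # stacked VAR coefficients
    zhat = X @ coef
    Sigma0 = Y.T @ Y / (T - M)
    Sigma_hat = zhat.T @ zhat / (T - M)
    evals, evecs = eig(Sigma_hat, Sigma0)                       # Sigma_hat v = lambda Sigma0 v
    idx = np.argsort(-np.abs(evals))[:m]
    return np.real(evecs[:, idx])
```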

4.1.6 PFA approach and reduced rank AR approach

The predictable feature analysis (PFA) method developed in Richthofer and Wiskott (2015), without nonlinear expansion and regularization, was tested, as well as the reduced rank AR approach discussed in Velu et al. (1986). In fact, we can show that without special treatment, these two approaches are equivalent to the canonical transformation method in Box and Tiao (1977), and their results are the same as the results of the canonical transformation method.

4.2 GDP dataset

This dataset is composed of seasonally adjusted quarterly GDP growth data of 17 countries from 1961-Q2 to 2017-Q3. The 17 countries are selected based on the largest GDP according to the World Bank data in 2016, with complete GDP growth records from 1961-Q2 to 2017-Q3, and the data is downloaded from https://stats.oecd.org/index.aspx?queryid=350#. The 17 countries are United States, Japan, United Kingdom, France, Italy, Canada, South Korea, Australia, Spain, Mexico, Netherlands, Switzerland, Germany, Sweden, Belgium, Austria, and Norway.

Two approaches are applied to the first 135 samples from 1961-Q2 to 1994-Q4. The first approach is to fit a single variable AR model to each country's GDP data. The second approach is to use the proposed method in this work to extract the most predictable principal time series. For each approach, the 135 samples are preprocessed such that each variable has zero mean and unit variance, and an AR model with 1-memory is fitted to the preprocessed data. Table 3 summarizes the (un-)predictabilities of fitting a single variable AR model to two representative countries, Mexico and Belgium, the (un-)predictabilities of the one- and two-dimensional most predictable time series extracted by the proposed method, and the (un-)predictabilities of the naïve zero predictor. The naïve zero predictor is defined to always predict zero. In this case, since all the variables are preprocessed to have zero mean, the naïve zero predictor is essentially the same as the mean predictor, and the corresponding (un-)predictabilities can serve as a reference to evaluate the performance of the other predictors. The reason we pick Mexico and Belgium is that they are the two countries with the best predictabilities (lowest (un-)predictability values) using a single variable AR predictor.

We can see from Table 3 that both approaches result in lower (un-)predictabilities than the naïve zero predictor, indicating that both approaches give more efficient predictors than the naïve zero predictor. In addition, the improvements in the (un-)predictabilities of the principal time series extracted by the proposed method are much more obvious than the improvements in the (un-)predictabilities of the single variable AR predictor. This implies that the principal time series extracted by the proposed method are significantly more predictable than any individual scalar time series. Figure 4 shows the prediction results of the two approaches. We can see that the predictions of the principal time series extracted by our method are closer to their true values, compared to the predictions of Mexico and Belgium's scaled GDP using a single variable AR predictor. This again demonstrates the effectiveness of the proposed method.

The predictor obtained for the 1 factor (m = 1) case using our proposed method is

$$\hat x_t = 0.7382\, x_{t-1}.$$

The contribution or weight of each country is shown in Fig. 5. Five countries have relatively large contributions: Japan, Italy, Spain, Germany, and Belgium. The predictor obtained for the 2 factor (m = 2) case is
$$\hat x_t = \begin{bmatrix} 0.7382 & -0.0149 \\ 0.0149 & -0.6208 \end{bmatrix} x_{t-1}.$$

Table 3 (Un-)predictabilities of Mexico and Belgium using single variable AR fitting, (un-)predictabilities of the principal time series extracted by the proposed method, and (un-)predictabilities of the naïve zero predictor

                   (Un-)predictability   (Un-)predictability of naïve zero predictor
  Mexico, m = 1    0.8289                1
  Belgium, m = 1   0.7939                1
  M = 1, m = 1     0.4417                1
  M = 1, m = 2     1.0484                2


Fig. 4 Prediction results using the two approaches from 1961-Q3 to 1994-Q4 (panels: single variable AR fit to Mexico with lag 1; single variable AR fit to Belgium with lag 1; reduced rank AR fit with 1 factor; reduced rank AR fit with 2 factors, factors 1 and 2; vertical axes: pre-processed GDP or factor value). The solid blue line represents the true values and the dashed red line represents the predicted values.

The contribution or weight of each country is shown in Fig. 6. Six countries have relatively large contributions: Japan, France, Korea, Australia, Mexico, and Austria.

4.3 GDP dataset prediction

Next, we examine the prediction performance of the predictors found in Sect. 4.2 on unseen data. The data we use in this section are composed of data from 1994-Q4 to 2017-Q3 (samples 135 to 226 in the same dataset as in Sect. 4.2). We understand that the mean and variance of the variables in this dataset are considerably different from the mean and variance in the dataset analyzed in Sect. 4.2, especially due to the financial crisis around 2008.


Fig. 5 Contribution of each country to the most predictable factor when the memory is 1 and the number of factors is 1 (bar chart over USA, JPN, GBR, FRA, ITA, CAN, KOR, AUS, ESP, MEX, NLD, CHE, DEU, SWE, BEL, AUT, NOR).

However, we would still like to check whether the predictors found in Sect. 4.2 persist. First, we shift and scale each scalar time series with its corresponding mean and variance found in Sect. 4.2. Then, we apply the predictors found in Sect. 4.2 to this preprocessed dataset to make predictions. It is worth noting that this is a very challenging task as it covers the financial crisis period from 2007 to 2009. We use the MSE as a measure of how good the predictions are.

Table 4 summarizes the MSE when applying the single variable AR predictor to Mexico and Belgium, the MSE when applying the predictor to the principal time series obtained using our proposed method, and the MSE of the naïve zero predictor.

Table 4 MSE of Mexico and Belgium using single variable AR fitting, MSE of the principal time series extracted by the proposed method, and MSE of the naïve zero predictor

                   MSE      MSE of naïve zero predictor
  Mexico, m = 1    1.4072   1.8671
  Belgium, m = 1   0.3844   0.6536
  M = 1, m = 1     0.5009   1.3902
  M = 1, m = 2     0.6961   1.5015


Fig. 6 Contribution of each country to the most predictable factor when the memory is 1 and the number of factors is 2 (bar chart over USA, JPN, GBR, FRA, ITA, CAN, KOR, AUS, ESP, MEX, NLD, CHE, DEU, SWE, BEL, AUT, NOR).

The MSE of the naïve zero predictor can be used as a reference to evaluate the performance of the other two predictors. We can see from Table 4 that both the single variable AR predictors and our proposed predictors perform better than the naïve zero predictor. However, our proposed method provides dramatic improvements over the naïve zero predictor compared to the single variable AR predictors. Figure 7 shows the prediction results. We can see that the predictions made using the proposed method are more accurate than the predictions using the single variable AR predictor. In addition, the predictor found by our proposed method generalizes well even during the financial crisis period. The first factor captures the big drop around 2008.

5 Extensions and variations

There are several extensions of and variations on the problems that we describe in this paper.


Fig. 7 Prediction results using the two approaches from 1995-Q1 to 2017-Q3 (panels: single variable AR fit to Mexico with lag 1; single variable AR fit to Belgium with lag 1; reduced rank AR fit with 1 factor; reduced rank AR fit with 2 factors, factors 1 and 2; vertical axes: pre-processed GDP or factor value). The solid blue line represents the true values and the dashed red line represents the predicted values.

5.1 Regularization

We can add regularization on W and A to the objective. We denote the regularized problem as
$$\begin{array}{ll} \text{maximize} & f(W) + r(A) + \tilde r(W) \\ \text{subject to} & S_0 = I, \end{array} \tag{9}$$

where $r: \mathbf{R}^{mM\times m} \to \mathbf{R}$ and $\tilde r: \mathbf{R}^{n\times m} \to \mathbf{R}$. r and $\tilde r$ can be chosen to enforce certain properties in A and W. For example, regularization using $r(A) = \|A\|_F$ can be added to avoid overfitting of the AR model; $r(A) = \|A\|_1$ can be added to encourage a parsimonious structure of the AR model; $\tilde r(W) = \|W_i\|_1$ enforces the ith column of W to be sparse, so that the ith most predictable time series only depends on a few entries in the high dimensional time series.

5.2 Low rank structure

In many cases, high dimensional time series have low rank structure. We can use this low rank structure to reduce the computational complexity of our problem. When the high dimensional time series has low rank structure, the covariance matrix $\Sigma_0$ has many eigenvalues close to 0. Let $\tilde U$ denote the collection of all the eigenvectors of $\Sigma_0$ corresponding to the significantly non-zero eigenvalues; then the original inputs $\Sigma_\tau$, $\tau = 0, 1, \dots, M$ can be approximately transformed into $\Phi_\tau$, $\tau = 0, 1, \dots, M$, where $\Phi_\tau = \tilde U^T \Sigma_\tau \tilde U$. Since the dimension of $\Phi_\tau$ is much lower than the dimension of $\Sigma_\tau$, by working with the new input series $\Phi_\tau$ the computational complexity reduces significantly. Let $W'$ and $A'$ denote the solutions with the new inputs $\Phi_\tau$; then the solutions with the original inputs $\Sigma_\tau$ can be obtained as
$$W = \tilde U W', \qquad A = A'.$$
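A sketch of this reduction (our own helper, assuming NumPy and a list of auto-covariance matrices) is:

```python
import numpy as np

def reduce_inputs(Sigmas, tol=1e-8):
    """Low rank reduction of Sect. 5.2: keep the eigenvectors of Sigma_0 with
    eigenvalues above tol and transform every Sigma_tau into that subspace."""
    evals, evecs = np.linalg.eigh(Sigmas[0])
    U = evecs[:, evals > tol]                # significantly non-zero directions
    Phis = [U.T @ S @ U for S in Sigmas]     # Phi_tau = U^T Sigma_tau U
    return U, Phis

# A solution (W_prime, A_prime) computed with the reduced inputs Phis maps back via
# W = U @ W_prime and A = A_prime.
```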

5.3 Filtering

As an extension of the projection method, we can consider extracting a lower-dimensional time series $x_t$ using a finite impulse response (FIR) filter with length L:
$$x_t = W_1^T z_t + W_2^T z_{t-1} + \cdots + W_L^T z_{t-L+1}, \quad t \in \mathbf{Z},$$

where $W_1, \dots, W_L \in \mathbf{R}^{n\times m}$ are the filter coefficients. For L = 1, this reduces to the projection problem (4). We can write this as

$$x_t = W^T \begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-L+1} \end{bmatrix}, \quad t \in \mathbf{Z},$$
where

$$W = \begin{bmatrix} W_1 \\ W_2 \\ \vdots \\ W_L \end{bmatrix}.$$
Define the following auto-covariance matrices of $[z_t^T\ z_{t-1}^T\ \cdots\ z_{t-L+1}^T]^T$:


$$\Omega_0 = \mathbf{E}\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-L+1} \end{bmatrix}\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-L+1} \end{bmatrix}^T = \begin{bmatrix} \Sigma_0 & \Sigma_1^T & \cdots & \Sigma_{L-1}^T \\ \Sigma_1 & \Sigma_0 & \cdots & \Sigma_{L-2}^T \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_{L-1} & \Sigma_{L-2} & \cdots & \Sigma_0 \end{bmatrix},$$
$$\Omega_1 = \mathbf{E}\begin{bmatrix} z_t \\ z_{t-1} \\ \vdots \\ z_{t-L+1} \end{bmatrix}\begin{bmatrix} z_{t+1} \\ z_t \\ \vdots \\ z_{t-L+2} \end{bmatrix}^T = \begin{bmatrix} \Sigma_1 & \Sigma_0 & \cdots & \Sigma_{L-2}^T \\ \Sigma_2 & \Sigma_1 & \cdots & \Sigma_{L-3}^T \\ \vdots & \vdots & \ddots & \vdots \\ \Sigma_L & \Sigma_{L-1} & \cdots & \Sigma_1 \end{bmatrix}.$$
The goal is to choose W so that the series $x_t$ is most predictable by an M-memory AR predictor. Similar to the projection problem, we add the constraint $S_0 = W^T \Omega_0 W = I$ to rule out the trivial W = 0 case. This also ensures that the low-dimensional time series $x_t$ is standardized.

6 Related work

The general problem of extracting low-dimensional latent variables from high-dimensional time series has been studied for decades in many different research fields, from control systems and signal processing to economics. Many methods have been developed, and here we survey a subset of representative methods that are closely related to the proposed method.

6.1 Extracting predictable latent variables

The extraction of predictable latent variables can be traced back to the work of Box and Tiao (1977), where a canonical transformation is proposed to extract lower dimensional components ordered from least to most predictable. Mathematically, the transformation matrix W in Box and Tiao (1977) is selected as the eigenvectors corresponding to the m largest eigenvalues of $\Sigma_0^{-1}\Sigma(\hat z)$, where $\Sigma(\hat z) = \mathbf{E}\,\hat z_t \hat z_t^T$, and $\hat z_t$ is the one-step ahead prediction of $z_t$ found in (2). It is clear that the method in Box and Tiao (1977) is different from the proposed method unless M = 1 and m = 1. Later, in Pena and Box (1987), the $z_t$ vector is decomposed into one component that contains factors mixed up with noise, where the transformation matrix W is obtained by analyzing the eigenstructure of the VAR model coefficients of $z_t$, and one component that contains noise.

The slow feature analysis (SFA) method proposed in Wiskott and Sejnowski (2002) aims to extract a lower-dimensional "slowly varying" time series. Without nonlinear expansion, SFA can be treated as a special case of the proposed method if we restrict M = 1 and $A_1 = I$. Even though "slowly varying" time series are predictable, predictable time series do not necessarily have slow variations. To deal with this, the later work Richthofer and Wiskott (2015) proposed a more general predictable feature analysis (PFA) approach to extract lower-dimensional predictable time series. In the PFA method, the selection of the transformation matrix W involves an

eigen-decomposition of $\mathbf{E}(z_t - \hat z_t)(z_t - \hat z_t)^T$. As we can see, the PFA method is in fact closely related to the method in Box and Tiao (1977), and is only equivalent to the proposed method when M = 1 and m = 1.

In all the above mentioned methods except SFA, a VAR model needs to be fit for the original high-dimensional time series. There have also been methods developed that do not involve fitting a high-dimensional VAR model. For example, in Dong and Qin (2018b, 2018a), dynamic-inner principal component analysis (DiPCA) and DiCCA are developed to extract lower-dimensional most predictable latent variables. In both methods, the columns of W are extracted sequentially, with a scalar AR predictor for each entry in $x_t$. DiPCA extracts each entry in $x_t$ so that it has maximal covariance with its predicted value from an AR predictor, while DiCCA maximizes the correlation. In fact, it can be shown that DiCCA is a special case of the proposed method where $A_1, A_2, \dots, A_M$ are diagonal matrices.

Instead of using the expected mean squared error as a predictability measure, there have been methods developed using different predictability measures. For example, forecastable component analysis (ForeCA), proposed in Goerg (2013), utilizes the differential entropy as the predictability (forecastability) measure, which yields a lower bound on the expected squared loss of any predictor. Graph-based predictable feature analysis (GPFA) in Weghenkel et al. (2017) maximizes a predictability measure defined in terms of graph embedding. Dynamical component analysis in Clark et al. (2019) uses the mutual information between the past and future data. The method developed in Stone (2001) proposed to use a measure of temporal predictability for blind source separation. In atmospheric science, optimal persistence analysis (OPA) maximizes the decorrelation time (DelSole 2001), and average predictability time decomposition (APTD) maximizes the average predictability time (DelSole and Tippett 2009a, b).

6.2 Factor models

This work is also closely related to the extensively studied factor models in econometrics. Here we analyze some representative methods and compare them with the proposed method. The early work Brillinger (1981) is a frequency domain approach that extracts dynamic principal components (DPC) as linear combinations of the past and future observations to minimize the mean squared reconstruction error of the original high-dimensional time series. A similar structure exists in Peña and Yohai (2016), which is an approach that uses non-causal models. In the later work Peña et al. (2019), a causal model is utilized to extract one-sided dynamic principal components (ODPC) as linear combinations of the current and past values of the series that minimize the reconstruction mean squared error. All of these methods extract latent factors differently from the proposed method, where the extraction of $x_t$ only depends on the current data. In addition, these methods extract latent factors by minimizing the mean squared reconstruction error, while the proposed method minimizes the mean squared prediction error.

In Lam and Yao (2012) and Lam et al. (2011), for standardized high-dimensional time series, W is selected as the eigenvectors corresponding to the largest m

eigenvalues of $\sum_{i=1}^{M}\Sigma_i \Sigma_i^T$. In Pan and Yao (2008), the latent factors are identified via expanding the white noise space step by step. Compared to the proposed method, these methods provide little characterization of the lower-dimensional $x_t$. To make predictions of $x_t$, Lam et al. (2011) suggest subsequently building an AR predictor. However, in the proposed method, the extraction and prediction of $x_t$ are achieved simultaneously by solving one optimization problem. Much of the above analysis was also given in Qin et al. (2020), where Lam and Yao (2012) is linked to subspace identification. Refer to Stock and Watson (2006), Stock and Watson (2011), Forni et al. (2000), Amengual and Watson (2007), and Bai and Ng (2007) for more general discussions on factor models.

6.3 Reduced rank time series

Another related line of work is reduced rank time series modeling. Early work such as Reinsel (1983), Velu et al. (1986), Ahn and Reinsel (1988), and Wang and Bessler (2004) fits AR models to the vector time series with reduced rank coefficients. Later, the reduced rank time series models were generalized into the structured AR modeling problem (Basu et al. 2019; Alquier et al. 2020; Melnyk and Banerjee 2016; Barratt et al. 2021), where regularization terms on the AR model coefficients are imposed to encourage certain structures, such as low rank and sparsity. In summary, most of these papers focus on a parsimonious parametrization of the vector time series models, rather than extracting low-dimensional predictable time series.

6.4 State space models

The proposed method also has connections with state space models. There have been many ways to fit a state space model to vector time series, such as expectation-maximization (EM) (Shumway and Stoffer 1982), N4SID (Moonen et al. 1989; Van Overschee and De Moor 1993), and the CVA approach (Larimore 1983). In state space models, there have been methods developed to encourage sparse or low rank structures on the state transition matrix, such as She et al. (2018) and Chen et al. (2017). Instead of adding regularization on the state transition matrix, Angelosante et al. (2009) and Charles et al. (2011) directly regularize the latent state to be sparse or low rank. In addition, there have been approaches that consider more general settings. For example, Lin and Michailidis (2020) and Kost et al. (2018) consider the identification of linear dynamical systems with serially correlated output noise components. In the work of Qin et al. (2020), state space models are also compared with latent variable models, and it is pointed out that subspace identification does not naturally yield reduced dimensional models.

In summary, the key difference between the proposed method and many of the existing methods is that, first, it extracts low-dimensional predictable time series without fitting a full-dimensional VAR model; second, the extraction of the principal

time series and the VAR modeling of the principal time series are achieved simultaneously by solving one optimization problem.

7 Conclusion

In this paper we have described a new method to extract a low-dimensional most predictable time series from a high-dimensional time series, in the sense that an auto-regressive model achieves minimum prediction error. The method is heuristic, since the algorithm is not guaranteed to find a globally optimal solution. Numerical examples suggest, however, that the method works very well in practice.

Appendix A: Derivation of (6)

We show how to derive the expression (6) in this appendix. For simplicity, we drop the superscript k+1 in $A^{k+1}$, $A_i^{k+1}$, $i = 1, \dots, M$, and $S_\tau^{k+1}$, $\tau \in \mathbf{Z}$, and the superscript k in $W^k$. When A is fixed, we have
$$f(w) = \mathbf{tr}\left(-2\begin{bmatrix}S_1\\ S_2\\ \vdots\\ S_M\end{bmatrix}^T A^T + A\begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_1 & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ S_{M-1} & S_{M-2} & \cdots & S_0\end{bmatrix}A^T\right) = -2\sum_{i=1}^{M}\mathbf{tr}(A_i S_i) + \mathbf{tr}\left(\begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_1 & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ S_{M-1} & S_{M-2} & \cdots & S_0\end{bmatrix}\begin{bmatrix}A_1^T A_1 & A_1^T A_2 & \cdots & A_1^T A_M\\ A_2^T A_1 & A_2^T A_2 & \cdots & A_2^T A_M\\ \vdots & \vdots & \ddots & \vdots\\ A_M^T A_1 & A_M^T A_2 & \cdots & A_M^T A_M\end{bmatrix}\right).$$
We divide $A_i$, $i = 1, 2, \dots, M$ into the following submatrices,

$$A_i = \begin{bmatrix} A_{i,11} & A_{i,12} \\ A_{i,21} & A_{i,22} \end{bmatrix} \quad \text{for } i = 1, 2, \dots, M,$$

where $A_{i,11} \in \mathbf{R}^{k\times k}$, $A_{i,12} \in \mathbf{R}^{k\times 1}$, $A_{i,21} \in \mathbf{R}^{1\times k}$, $A_{i,22} \in \mathbf{R}$. With this notation, we can expand $\mathbf{tr}(A_i S_i)$ as
$$\mathbf{tr}(A_i S_i) = \mathbf{tr}\left(\begin{bmatrix} A_{i,11} & A_{i,12} \\ A_{i,21} & A_{i,22} \end{bmatrix}\begin{bmatrix} W^T \Sigma_i W & W^T \Sigma_i w \\ w^T \Sigma_i W & w^T \Sigma_i w \end{bmatrix}\right) = w^T (A_{i,22}\Sigma_i) w + \left(\Sigma_i W A_{i,12} + \Sigma_i^T W A_{i,21}^T\right)^T w + d,$$
where d is a constant. For the second term in f(w), we have

1 3 Y. Dong et al.

$$\begin{aligned} &\mathbf{tr}\left(\begin{bmatrix}S_0 & S_1^T & \cdots & S_{M-1}^T\\ S_1 & S_0 & \cdots & S_{M-2}^T\\ \vdots & \vdots & \ddots & \vdots\\ S_{M-1} & S_{M-2} & \cdots & S_0\end{bmatrix}\begin{bmatrix}A_1^T A_1 & A_1^T A_2 & \cdots & A_1^T A_M\\ A_2^T A_1 & A_2^T A_2 & \cdots & A_2^T A_M\\ \vdots & \vdots & \ddots & \vdots\\ A_M^T A_1 & A_M^T A_2 & \cdots & A_M^T A_M\end{bmatrix}\right) \\ &\quad = \mathbf{tr}\left(S_0 A_1^T A_1 + S_1^T A_2^T A_1 + \cdots + S_{M-1}^T A_M^T A_1\right) + \mathbf{tr}\left(S_1 A_1^T A_2 + S_0 A_2^T A_2 + \cdots + S_{M-2}^T A_M^T A_2\right) + \cdots + \mathbf{tr}\left(S_{M-1} A_1^T A_M + S_{M-2} A_2^T A_M + \cdots + S_0 A_M^T A_M\right) \\ &\quad = \sum_{i,j} \mathbf{tr}\left(S_{j-i} A_i^T A_j\right), \end{aligned}$$

where $\mathbf{tr}(S_{j-i} A_i^T A_j)$ can be expanded as
$$\begin{aligned} \mathbf{tr}(S_{j-i} A_i^T A_j) &= \mathbf{tr}\left(\begin{bmatrix} W^T \Sigma_{j-i} W & W^T \Sigma_{j-i} w \\ w^T \Sigma_{j-i} W & w^T \Sigma_{j-i} w \end{bmatrix}\begin{bmatrix} A_{i,11}^T A_{j,11} + A_{i,21}^T A_{j,21} & A_{i,11}^T A_{j,12} + A_{i,21}^T A_{j,22} \\ A_{i,12}^T A_{j,11} + A_{i,22} A_{j,21} & A_{i,12}^T A_{j,12} + A_{i,22} A_{j,22} \end{bmatrix}\right) \\ &= \left(A_{i,12}^T A_{j,11} + A_{i,22} A_{j,21}\right) W^T \Sigma_{j-i} w + \left(A_{i,11}^T A_{j,12} + A_{i,21}^T A_{j,22}\right)^T W^T \Sigma_{j-i}^T w \\ &\qquad + w^T \left(A_{i,12}^T A_{j,12} + A_{i,22} A_{j,22}\right) \Sigma_{j-i} w + d', \end{aligned}$$
where $d'$ is a constant.

Summing all terms, we obtain the following expression for f(w),
$$f(w) = w^T B w - 2 c^T w + d,$$
where d is a constant and
$$B = \sum_{1\le i,j\le M} \left(A_{i,12}^T A_{j,12} + A_{i,22} A_{j,22}\right)\Sigma_{j-i} \;-\; \sum_{i=1}^{M} A_{i,22}\left(\Sigma_i + \Sigma_i^T\right),$$
$$c = \sum_{i=1}^{M} \left(\Sigma_i W A_{i,12} + \Sigma_i^T W A_{i,21}^T\right) \;-\; \sum_{1\le i,j\le M} \Sigma_{j-i}^T W \left(A_{j,11}^T A_{i,12} + A_{i,22} A_{j,21}^T\right).$$

Acknowledgements We would like to express our appreciation to Professor Peter Stoica for his valuable and constructive suggestions during the preparation of this paper. We also thank Peter Nystrup for pointing us to related work.

References

Absil PA, Mahony R, Sepulchre R (2009) Optimization algorithms on matrix manifolds. Princeton University Press, Princeton
Ahn SK, Reinsel GC (1988) Nested reduced-rank autoregressive models for multiple time series. J Am Stat Assoc 83(403):849–856
Alquier P, Bertin K, Doukhan P, Garnier R (2020) High-dimensional VAR with low-rank transition. Stat Comput 30(4):1139–1153. https://doi.org/10.1007/s11222-020-09929-7


Amengual D, Watson MW (2007) Consistent estimation of the number of dynamic factors in a large N and T panel. J Bus Econ Stat 25(1):91–96
Angelosante D, Roumeliotis SI, Giannakis GB (2009) Lasso-Kalman smoother for tracking sparse signals. In: 2009 Conference record of the forty-third Asilomar conference on signals, systems and computers, IEEE, pp 181–185
Bai J, Ng S (2007) Determining the number of primitive shocks in factor models. J Bus Econ Stat 25(1):52–60
Bai J, Ng S (2008) Large dimensional factor analysis. Found Trends Econom 3(2):89–163
Barratt S, Dong Y, Boyd S (2021) Low rank forecasting. arXiv preprint arXiv:2101.12414
Basu S, Li X, Michailidis G (2019) Low rank and structured modeling of high-dimensional vector autoregressions. IEEE Trans Sig Process 67(5):1207–1222
Box GE, Tiao GC (1977) A canonical analysis of multiple time series. Biometrika 64(2):355–365
Brillinger DR (1981) Time series: data analysis and theory, expanded edn. Holden-Day Inc, New York
Charles A, Asif MS, Romberg J, Rozell C (2011) Sparsity penalties in dynamical system estimation. In: 2011 45th annual conference on information sciences and systems, IEEE, pp 1–6
Chen S, Liu K, Yang Y, Xu Y, Lee S, Lindquist M, Caffo BS, Vogelstein JT (2017) An M-estimator for reduced-rank system identification. Pattern Recognit Lett 86:76–81
Choi I (2012) Efficient estimation of factor models. Econ Theory 28(2):274–308
Clark DG, Livezey JA, Bouchard KE (2019) Unsupervised discovery of temporal structure in noisy data with dynamical components analysis. arXiv preprint arXiv:1905.09944
Connor G, Korajczyk RA (1986) Performance measurement with the arbitrage pricing theory: a new framework for analysis. J Financ Econ 15(3):373–394
DelSole T (2001) Optimally persistent patterns in time-varying fields. J Atmosph Sci 58(11):1341–1356
DelSole T, Tippett MK (2009a) Average predictability time: part i–theory. J Atmosph Sci 66(5):1172–1187
DelSole T, Tippett MK (2009b) Average predictability time: part ii–seamless diagnoses of predictability on multiple time scales. J Atmosph Sci 66(5):1188–1204
Dong Y, Qin SJ (2018a) Dynamic latent variable analytics for process operations and control. Comput Chem Eng 114:69–80
Dong Y, Qin SJ (2018b) A novel dynamic PCA algorithm for dynamic data modeling and process monitoring. J Process Control 67:1–11
Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J Matrix Anal Appl 20(2):303–353
Forni M, Hallin M, Lippi M, Reichlin L (2000) The generalized dynamic-factor model: identification and estimation. Rev Econ Stat 82(4):540–554
Goerg G (2013) Forecastable component analysis. In: International conference on machine learning, pp 64–72
Kost O, Duník J, Straka O (2018) Correlated noise characteristics estimation for linear time-varying systems. In: 2018 IEEE Conference on Decision and Control (CDC), IEEE, pp 650–655
Lam C, Yao Q (2012) Factor modeling for high-dimensional time series: inference for the number of factors. Ann Stat 40(2):694–726
Lam C, Yao Q, Bathia N (2011) Estimation of latent factors for high-dimensional time series. Biometrika 98(4):901–918
Larimore WE (1983) System identification, reduced-order filtering and modeling via canonical variate analysis. In: 1983 American Control Conference, IEEE, pp 445–451
Lin J, Michailidis G (2020) System identification of high-dimensional linear dynamical systems with serially correlated output noise components. IEEE Trans Sig Process 68:5573–5587
Melnyk I, Banerjee A (2016) Estimating structured vector autoregressive models. In: Proc. Intl. Conf. Machine Learning, pp 830–839
Moonen M, De Moor B, Vandenberghe L, Vandewalle J (1989) On- and off-line identification of linear state-space models. Int J Control 49(1):219–232
Pan J, Yao Q (2008) Modelling multiple time series via common factors. Biometrika 95(2):365–379
Pena D, Box GE (1987) Identifying a simplifying structure in time series. J Am Stat Assoc 82(399):836–843
Peña D, Yohai VJ (2016) Generalized dynamic principal components. J Am Stat Assoc 111(515):1121–1131
Peña D, Smucler E, Yohai VJ (2019) Forecasting multiple time series with one-sided dynamic principal components. J Am Stat Assoc. https://doi.org/10.1080/01621459.2018.1520117


Qin SJ, Dong Y, Zhu Q, Wang J, Liu Q (2020) Bridging systems theory and data science: a unifying review of dynamic latent variable analytics and process monitoring. Ann Rev Control 50:29
Reinsel G (1983) Some results on multivariate autoregressive index models. Biometrika 70(1):145–156
Richthofer S, Wiskott L (2015) Predictable feature analysis. In: 2015 IEEE 14th international conference on machine learning and applications (ICMLA), IEEE, pp 190–196
She Q, Gao Y, Xu K, Chan R (2018) Reduced-rank linear dynamical systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol 32
Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3(4):253–264
Stock JH, Watson M (2011) Dynamic factor models. Oxford handbook on economic forecasting
Stock JH, Watson MW (2006) Forecasting with many predictors. Handb Econ Forecast 1:515–554
Stone JV (2001) Blind source separation using temporal predictability. Neural Comput 13(7):1559–1574
Tatum WO (2014) Ellen R. Grass lecture: extraordinary EEG. Neurodiagnostic J 54(1):3–21
Teplan M (2002) Fundamentals of EEG measurement. Measurement Sci Rev 2(2):1–11
Thornhill NF, Hägglund T (1997) Detection and diagnosis of oscillation in control loops. Control Eng Pract 5(10):1343–1354
Thornhill NF, Huang B, Zhang H (2003) Detection of multiple oscillations in control loops. J Process Control 13(1):91–100
Usevich K, Markovsky I (2014) Optimization on a Grassmann manifold with application to system identification. Automatica 50(6):1656–1662
Van Overschee P, De Moor B (1993) Subspace algorithms for the stochastic identification problem. Automatica 29(3):649–660
Velu RP, Reinsel GC, Wichern DW (1986) Reduced rank models for multiple time series. Biometrika 73(1):105–118
Wang Z, Bessler DA (2004) Forecasting performance of multivariate time series models with full and reduced rank: an empirical examination. Int J Forecast 20(4):683–695
Weghenkel B, Fischer A, Wiskott L (2017) Graph-based predictable feature analysis. Mach Learn 106(9–10):1359–1380
Wiskott L, Sejnowski TJ (2002) Slow feature analysis: unsupervised learning of invariances. Neural Comput 14(4):715–770

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
