arXiv:1709.09606v3 [stat.ME] 3 Jul 2019

Bayesian Dynamic Tensor Regression*

Monica Billio¹, Roberto Casarin¹, Matteo Iacopini¹,², Sylvia Kaufmann³

¹ Ca' Foscari University of Venice
² Scuola Normale Superiore of Pisa
³ Study Center Gerzensee, Foundation of the Swiss National Bank

* We are grateful to Federico Bassetti, Sylvia Frühwirth-Schnatter, Christian Gouriéroux, Søren Johansen, Siem Jan Koopman, Gary Koop, André Lucas, Alain Monfort, Peter Phillips, Christian Robert and Mike West for their comments and suggestions. Also, we thank the seminar participants at CREST, University of Southampton, Vrije University of Amsterdam, London School of Economics, Maastricht University and the Polytechnic University of Milan, and the conference and workshop participants at: "ICEEE 2019" in Lecce, 2019, "CFENetwork 2018" in Pisa, 2018, "29th EC2 conference" in Rome, 2018, "12th RCEA Annual meeting" in Rimini, 2018, "8th MAF" in Madrid, 2018, "CFENetwork 2017" in London, 2017, "ICEEE 2017" in Messina, 2017, "3rd Vienna Workshop on High-dimensional Time Series in Macroeconomics and Finance" in Wien, 2017, "ESOBE" in Venice, 2016, "CFENetwork" in Seville, 2016, "BISP10" in Milan, 2016, and the "SIS Intermediate Meeting" of the Italian Statistical Society in Florence, 2017. This research used the SCSCF multiprocessor cluster system at Ca' Foscari University of Venice and is part of the project Venice Center for Risk Analytics (VERA).

Abstract

Tensor-valued data are becoming increasingly available in economics and this calls for suitable econometric tools. We propose a new dynamic linear model for tensor-valued response variables and covariates that encompasses some well-known econometric models as special cases. Our contribution is manifold. First, we define a tensor autoregressive process (ART), study its properties and derive the associated impulse response function. Second, we exploit the PARAFAC low-rank decomposition for providing a parsimonious parametrization and to incorporate sparsity effects. We also contribute to inference methods by developing a Bayesian framework which allows for including extra-sample information and for introducing shrinking effects. We apply the ART model to time-varying multilayer networks of international trade and capital stock and study the propagation of shocks across countries, over time and between layers.

Keywords: Tensor calculus; multidimensional autoregression; Bayesian statistics; sparsity; dynamic networks; international trade

1 Introduction

The increasing availability of long time series of complex-structured data, such as multidimensional tables ([8], [21]), multidimensional panel data ([11], [9], [24], [59]), multilayer networks ([3], [57]), EEG ([50]) and neuroimaging ([68]), has put forward some limitations of the existing multivariate econometric models. Tensors, i.e. multidimensional arrays, are the natural class to which this kind of complex data belongs.

A naïve approach to modelling tensors relies on reshaping them into lower-dimensional objects (e.g., vectors and matrices) which can then be easily handled using standard multivariate statistical tools. However, mathematical representations of tensor-valued data in terms of vectors have non-negligible drawbacks, such as the difficulty of accounting for the intrinsic structure of the data (e.g., cells of a matrix representing a geographical map or pairwise relations, contiguous pixels in an image). Neglecting this information in the modelling might lead to inefficient estimation and misleading results. Tensor-valued data entries are highly likely to depend on contiguous cells (within and between modes), and collapsing the data into a vector destroys this information. Thus, statistical approaches based on vectorization are unsuited for modelling tensor-valued data.

Tensors have been recently introduced in statistics and machine learning (e.g., [38], [48]) and provide a fundamental background for efficient algorithms in Big Data handling (e.g., [23]). However, a compelling statistical approach extending results for scalar random variables to multidimensional random objects beyond dimension 2 (i.e., matrix-valued random variables, see [37]) is lacking and constitutes a promising field of research. The development of novel statistical methods able to deal directly with tensor-valued data (i.e., without relying on vectorization) is currently an open field of research in statistics and econometrics, where such kind of data is becoming increasingly available. The main purpose of this article is to contribute to this growing literature by proposing an extension of standard multivariate econometric regression models to tensor-valued response and covariates.

Matrix-valued statistical models have been widely employed in time series econometrics over the past decades, especially for state space representations ([39]), dynamic linear models ([21], [63]), Gaussian graphical models ([20]), stochastic volatility ([61], [35], [34]), classification of longitudinal datasets ([62]), models for network data ([29], [70], [71]) and factor models ([22]). [27] proposed a bilinear multiplicative matrix regression model, which in vector form becomes a VAR(1) with restrictions on the covariance matrix. The main shortcoming in using bilinear models is the difficulty in introducing sparsity: imposing zero restrictions on a subset of the reduced form coefficients implies zero restrictions on the structural coefficients.

Recent papers dealing with tensor-valued data include [68] and [64], who proposed a generalized linear model to predict a scalar real or binary outcome by exploiting a tensor-valued covariate. Instead, [66], [67] and [43] followed a Bayesian nonparametric approach for regressing a scalar on a tensor-valued covariate. Another stream of the literature considers regression models with tensor-valued response and covariates. In this framework, [50] proposed a model for cross-sectional data where response and covariates are tensors, and performed sparse estimation by means of the envelope method and iterative maximum likelihood. [41] exploited a multidimensional analogue of the matrix SVD (the Tucker decomposition) to define a parsimonious tensor-on-tensor regression.

We propose a new dynamic linear regression model for tensor-valued response and covariates. We show that our framework admits as special cases Bayesian VAR models ([60]), Bayesian panel VAR models ([16]) and Multivariate Autoregressive Index models (i.e. MAI, see [19]), as well as univariate and matrix regression models. Furthermore, we exploit a suitable tensor decomposition to provide a parsimonious parametrization, thus making inference feasible in high-dimensional models.

One of the areas where these models can find application is network econometrics. Most statistical models for network data are static ([26]), whereas dynamic models may be more adequate for many applications (e.g., banking) where data on network evolution are becoming available. Few attempts have been made to model time-varying networks (e.g., [42], [47], [4]), and most of the contributions have focused on providing a representation and a description of temporally evolving graphs. We provide an original study of time-varying economic and financial networks and show that our model can be successfully used to carry out impulse response analysis in this multidimensional setting.

The remainder of this paper is organized as follows. Section 2 provides an introduction to tensor algebra and presents the new modelling framework. Section 3 discusses parametrization strategies and a Bayesian inference procedure. Section 4 provides an empirical application and Section 5 gives some concluding remarks. Further details and results are provided in the supplementary material.

2 A Dynamic Tensor Model

In this section, we present a dynamic tensor regression model and discuss some of its properties and special cases. We review some notions of tensor calculus which will be used in this paper, and refer the reader to Appendix A and the supplement for further details.

2.1 Tensor Calculus and Decompositions

The use of tensors is well established in physics and mechanics (e.g., see [7] and [2]), but few contributions have been made beyond these disciplines. For a general introduction to the algebraic properties of tensor spaces, see [38]. Noteworthy introductions to operations on tensors and tensor decompositions are [49] and [45], respectively.

A $N$-order real-valued tensor is a $N$-dimensional array $\mathcal{X} = (\mathcal{X}_{i_1,\ldots,i_N}) \in \mathbb{R}^{I_1\times\ldots\times I_N}$ with entries $\mathcal{X}_{i_1,\ldots,i_N}$, with $i_n = 1,\ldots,I_n$ and $n = 1,\ldots,N$. The order is the number of dimensions (also called modes). Vectors and matrices are examples of 1- and 2-order tensors, respectively. In the rest of the paper we will use lower-case letters for scalars, lower-case bold letters for vectors, capital letters for matrices and calligraphic capital letters for tensors. We use the symbol ":" to indicate selection of all elements of a given mode of a tensor. The mode-$k$ fiber is the vector obtained by fixing all but the $k$-th index of the tensor, i.e. the equivalent of rows and columns in a matrix. Tensor slices, and their generalizations, are obtained by keeping fixed all but two or more dimensions of the tensor.

It can be shown that the set of $N$-order tensors $\mathbb{R}^{I_1\times\ldots\times I_N}$ endowed with the standard addition $\mathcal{A}+\mathcal{B} = (\mathcal{A}_{i_1,\ldots,i_N} + \mathcal{B}_{i_1,\ldots,i_N})$ and scalar multiplication $\alpha\mathcal{A} = (\alpha\mathcal{A}_{i_1,\ldots,i_N})$, with $\alpha\in\mathbb{R}$, is a vector space. We now introduce some operators on the set of real tensors, starting with the contracted product, which generalizes the matrix product to tensors. The contracted product between $\mathcal{X}\in\mathbb{R}^{I_1\times\ldots\times I_M}$ and $\mathcal{Y}\in\mathbb{R}^{J_1\times\ldots\times J_N}$ with $I_M = J_1$ is denoted by $\mathcal{X}\times_M\mathcal{Y}$ and yields a $(M+N-2)$-order

tensor $\mathcal{Z}\in\mathbb{R}^{I_1\times\ldots\times I_{M-1}\times J_2\times\ldots\times J_N}$, with entries

$$\mathcal{Z}_{i_1,\ldots,i_{M-1},j_2,\ldots,j_N} = \big(\mathcal{X}\times_M\mathcal{Y}\big)_{i_1,\ldots,i_{M-1},j_2,\ldots,j_N} = \sum_{i_M=1}^{I_M}\mathcal{X}_{i_1,\ldots,i_{M-1},i_M}\,\mathcal{Y}_{i_M,j_2,\ldots,j_N}.$$

When $\mathcal{Y} = \mathbf{y}$ is a vector, the contracted product is also called mode-$M$ product. We denote by $\bar{\times}_N$ a sequence of contracted products between the $(K+N)$-order tensor $\mathcal{X}\in\mathbb{R}^{J_1\times\ldots\times J_K\times I_1\times\ldots\times I_N}$ and the $(N+M)$-order tensor $\mathcal{Y}\in\mathbb{R}^{I_1\times\ldots\times I_N\times H_1\times\ldots\times H_M}$. Entry-wise, it is defined as

$$\big(\mathcal{X}\,\bar{\times}_N\,\mathcal{Y}\big)_{j_1,\ldots,j_K,h_1,\ldots,h_M} = \sum_{i_1=1}^{I_1}\ldots\sum_{i_N=1}^{I_N}\mathcal{X}_{j_1,\ldots,j_K,i_1,\ldots,i_N}\,\mathcal{Y}_{i_1,\ldots,i_N,h_1,\ldots,h_M}.$$

Note that the contracted product is not commutative. The outer product $\circ$ between a $M$-order tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\ldots\times I_M}$ and a $N$-order tensor $\mathcal{Y}\in\mathbb{R}^{J_1\times\ldots\times J_N}$ is a $(M+N)$-order tensor $\mathcal{Z}\in\mathbb{R}^{I_1\times\ldots\times I_M\times J_1\times\ldots\times J_N}$ with entries $\mathcal{Z}_{i_1,\ldots,i_M,j_1,\ldots,j_N} = (\mathcal{X}\circ\mathcal{Y})_{i_1,\ldots,i_M,j_1,\ldots,j_N} = \mathcal{X}_{i_1,\ldots,i_M}\,\mathcal{Y}_{j_1,\ldots,j_N}$.

Tensor decompositions allow to represent a tensor as a function of lower dimensional variables, such as matrices or vectors, linked by suitable multidimensional operations. In this paper, we use the low-rank parallel factor (PARAFAC) decomposition, which allows to represent a $N$-order tensor in terms of a collection of vectors (called marginals). A $N$-order tensor is of rank 1 when it is the outer product of $N$ vectors. Let $R$ be the rank of the tensor $\mathcal{B}$, that is, the minimum number of rank-1 tensors whose linear combination yields $\mathcal{B}$. The PARAFAC($R$) decomposition is a rank-$R$ decomposition which represents a $N$-order tensor $\mathcal{B}$ as a finite sum of $R$ rank-1 tensors $\mathcal{B}_r$, each defined by the outer product of $N$ vectors (called marginals) $\beta_j^{(r)}\in\mathbb{R}^{I_j}$:

$$\mathcal{B} = \sum_{r=1}^{R}\mathcal{B}_r, \qquad \mathcal{B}_r = \beta_1^{(r)}\circ\ldots\circ\beta_N^{(r)}. \qquad (1)$$

The mode-$n$ matricization (or unfolding), denoted by $X_{(n)} = \operatorname{mat}_n(\mathcal{X})$, is the operation of transforming a $N$-dimensional array $\mathcal{X}$ into a matrix. It consists in re-arranging the mode-$n$ fibers of the tensor to be the columns of the matrix $X_{(n)}$, which has size $I_n\times I^*_{(-n)}$ with $I^*_{(-n)} = \prod_{i\neq n} I_i$. The mode-$n$ matricization of $\mathcal{X}$ maps the $(i_1,\ldots,i_N)$ element of $\mathcal{X}$ to the $(i_n, j)$ element of $X_{(n)}$, where $j = 1 + \sum_{m\neq n}(i_m - 1)\prod_{p=1,\,p\neq n}^{m-1} I_p$. For some numerical examples, see [45] and Appendix A. The mode-1 unfolding is of interest for providing a visual representation of a

tensor: for example, when $\mathcal{X}$ is a 3-order tensor, its mode-1 matricization $X_{(1)}$ is a $I_1\times I_2 I_3$ matrix obtained by horizontally stacking the mode-(1,2) slices of the tensor. The vectorization operator stacks all the elements in direct lexicographic order, forming a vector of length $I^* = \prod_i I_i$. Other orderings are possible, as long as they are consistent across the calculations. The mode-$n$ matricization can also be used to vectorize a tensor $\mathcal{X}$, by exploiting the relationship $\operatorname{vec}(\mathcal{X}) = \operatorname{vec}(X_{(1)})$, where $\operatorname{vec}(X_{(1)})$ stacks vertically into a vector the columns of the matrix $X_{(1)}$. Many product operations have been defined for tensors (e.g., see [49]), but here we constrain ourselves to the operators used in this work. For ease of notation, we will use the multiple-index summation for indicating the sum over all the corresponding indices.
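To fix ideas, the following minimal numpy sketch (ours, not part of the original paper; all sizes are illustrative) builds a 3-order tensor from PARAFAC marginals and checks the mode-1 matricization and vectorization identities above.

```python
import numpy as np

I1, I2, I3, R = 4, 3, 5, 2
rng = np.random.default_rng(0)
# Marginals beta_j^(r): one vector per mode j and per rank component r.
betas = [[rng.standard_normal(I) for I in (I1, I2, I3)] for _ in range(R)]

# PARAFAC(R): B = sum_r beta_1^(r) o beta_2^(r) o beta_3^(r) (outer products).
B = sum(np.einsum('i,j,k->ijk', b1, b2, b3) for b1, b2, b3 in betas)

# Mode-1 matricization: mode-1 fibers become the columns of a I1 x (I2*I3) matrix;
# column-major ('F') ordering reproduces the index map j = 1 + sum_m (i_m - 1) prod_p I_p.
B_1 = B.reshape(I1, I2 * I3, order='F')

# vec(B) = vec(B_(1)): vectorization through the mode-1 unfolding.
assert np.allclose(B.flatten(order='F'), B_1.flatten(order='F'))
```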

Remark 2.1. Consider a $N$-order tensor $\mathcal{B}\in\mathbb{R}^{I_1\times\ldots\times I_N}$ with a PARAFAC($R$) decomposition (with marginals $\beta_j^{(r)}$), a $(N-1)$-order tensor $\mathcal{Y}\in\mathbb{R}^{I_1\times\ldots\times I_{N-1}}$ and a vector $x\in\mathbb{R}^{I_N}$. Then

$$\mathcal{Y} = \mathcal{B}\times_N x \iff \operatorname{vec}(\mathcal{Y}) = B_{(N)}'\,x \iff \operatorname{vec}(\mathcal{Y})' = x' B_{(N)},$$

where $B_{(N)} = \sum_{r=1}^{R}\beta_N^{(r)}\operatorname{vec}\big(\beta_1^{(r)}\circ\ldots\circ\beta_{N-1}^{(r)}\big)'$.

2.2 A General Dynamic Tensor Model

Let $\mathcal{Y}_t$ be a $(I_1\times\ldots\times I_N)$-dimensional tensor of endogenous variables, $\mathcal{X}_t$ a $(J_1\times\ldots\times J_M)$-dimensional tensor of covariates, and let $S_y = \times_{j=1}^{N}\{1,\ldots,I_j\}\subset\mathbb{N}^N$ and $S_x = \times_{j=1}^{M}\{1,\ldots,J_j\}\subset\mathbb{N}^M$ be sets of $n$-tuples of integers. We define the autoregressive tensor model of order $p$, ART($p$), as the system of equations

$$\mathcal{Y}_{i,t} = \mathcal{A}_{i,0} + \sum_{j=1}^{p}\sum_{k\in S_y}\mathcal{A}_{i,k,j}\,\mathcal{Y}_{k,t-j} + \sum_{m\in S_x}\mathcal{B}_{i,m}\,\mathcal{X}_{m,t} + \mathcal{E}_{i,t}, \qquad \mathcal{E}_{i,t}\overset{iid}{\sim}\mathcal{N}(0,\sigma_i^2), \qquad (2)$$

$t = 1,2,\ldots$, with given initial conditions $\mathcal{Y}_{-p+1},\ldots,\mathcal{Y}_0\in\mathbb{R}^{I_1\times\ldots\times I_N}$, where $i = (i_1,\ldots,i_N)\in S_y$ and $\mathcal{Y}_{i,t}$ is the $i$-th entry of $\mathcal{Y}_t$. The general model in eq. (2) allows for measuring the effect of all the cells of $\mathcal{X}_t$ and of the lagged values of $\mathcal{Y}_t$ on each endogenous variable.

We give two equivalent compact representations of the multilinear system (2). The first one is used for studying the stability property of the process and is obtained through the contracted product, which provides a natural setting for multilinear forms, decompositions and inversions. From (2) one gets the tensor equation

$$\mathcal{Y}_t = \mathcal{A}_0 + \sum_{j=1}^{p}\tilde{\mathcal{A}}_j\,\bar{\times}_N\,\mathcal{Y}_{t-j} + \tilde{\mathcal{B}}\,\bar{\times}_M\,\mathcal{X}_t + \mathcal{E}_t, \qquad \mathcal{E}_t\overset{iid}{\sim}\mathcal{N}_{I_1,\ldots,I_N}(\mathcal{O},\Sigma_1,\ldots,\Sigma_N), \qquad (3)$$

where $\bar{\times}_{a,b}$ is a shorthand notation for the contracted product $\times_{a+1,\ldots,a+b}^{1,\ldots,a}$, $\mathcal{A}_0$ is a $N$-order tensor of the same size as $\mathcal{Y}_t$, the $\tilde{\mathcal{A}}_j$, $j=1,\ldots,p$, are $2N$-order tensors of size $(I_1\times\ldots\times I_N\times I_1\times\ldots\times I_N)$ and $\tilde{\mathcal{B}}$ is a $(N+M)$-order tensor of size $(I_1\times\ldots\times I_N\times J_1\times\ldots\times J_M)$. The error term $\mathcal{E}_t$ follows a $N$-order tensor normal distribution ([54]) with probability density function

$$f_{\mathcal{E}}(\mathcal{E}) = \frac{\exp\left(-\frac{1}{2}\,(\mathcal{E}-\mathcal{M})\,\bar{\times}_{N,0}\,\big(\circ_{j=1}^{N}\Sigma_j^{-1}\big)\,\bar{\times}_{N,0}\,(\mathcal{E}-\mathcal{M})\right)}{(2\pi)^{I^*/2}\,\prod_{j=1}^{N}|\Sigma_j|^{I^*_{-j}/2}}, \qquad (4)$$

where $I^* = \prod_i I_i$ and $I^*_{-j} = \prod_{i\neq j} I_i$, and $\mathcal{E}$ and $\mathcal{M}$ are $N$-order tensors of size $I_1\times\ldots\times I_N$. Each covariance matrix $\Sigma_j\in\mathbb{R}^{I_j\times I_j}$, $j=1,\ldots,N$, accounts for the dependence along the corresponding mode of $\mathcal{E}$.

The second representation of the ART($p$) in eq. (2) is used for developing inference. Let $\mathcal{K}_m$ be the $(I_1\times\ldots\times I_N\times m)$-dimensional commutation tensor such that $\mathcal{K}_m\,\bar{\times}_{N,0}\,\mathcal{K}_m^{\sigma} = \mathcal{I}_m$, where $\mathcal{K}_m^{\sigma}$ is the tensor obtained by flipping the modes of $\mathcal{K}_m$. Define the $(I_1\times\ldots\times I_N\times I^*)$-dimensional tensor $\mathcal{A}_j = \tilde{\mathcal{A}}_j\,\bar{\times}_N\,\mathcal{K}_{I^*}$ and the $(I_1\times\ldots\times I_N\times J^*)$-dimensional tensor $\mathcal{B} = \tilde{\mathcal{B}}\,\bar{\times}_N\,\mathcal{K}_{J^*}$, with $J^* = \prod_j J_j$. We obtain $\mathcal{A}_j\times_{N+1}\operatorname{vec}(\mathcal{Y}_{t-j}) = \tilde{\mathcal{A}}_j\,\bar{\times}_N\,\mathcal{Y}_{t-j}$ and the compact representation

$$\mathcal{Y}_t = \mathcal{A}_0 + \sum_{j=1}^{p}\mathcal{A}_j\times_{N+1}\operatorname{vec}(\mathcal{Y}_{t-j}) + \mathcal{B}\times_{N+1}\operatorname{vec}(\mathcal{X}_t) + \mathcal{E}_t, \qquad \mathcal{E}_t\overset{iid}{\sim}\mathcal{N}_{I_1,\ldots,I_N}(\mathcal{O},\Sigma_1,\ldots,\Sigma_N). \qquad (5)$$
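As an illustration of the noise term in eq. (4), a tensor normal draw can be generated through its vectorized form; the sketch below (ours, with arbitrary covariance matrices) uses the Kronecker characterization of the covariance that is made explicit in Remark 2.3.

```python
import numpy as np

rng = np.random.default_rng(1)
I = (4, 3, 2)
# Arbitrary SPD mode covariances Sigma_1, Sigma_2, Sigma_3 (illustrative only).
Sig = [np.cov(rng.standard_normal((n, 3 * n))) for n in I]

# vec(E) ~ N(0, Sigma_3 kron Sigma_2 kron Sigma_1), with the first index fastest.
K = np.kron(np.kron(Sig[2], Sig[1]), Sig[0])
vecE = np.linalg.cholesky(K) @ rng.standard_normal(np.prod(I))
E = vecE.reshape(I, order='F')  # back to a 4 x 3 x 2 tensor draw
```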

Let $\mathbb{T} = (\mathbb{R}^{I_1\times\ldots\times I_N\times I_1\times\ldots\times I_N}, \bar{\times}_N)$ be the space of $(I_1\times\ldots\times I_N\times I_1\times\ldots\times I_N)$-dimensional tensors endowed with the contracted product $\bar{\times}_N$. We define the identity tensor $\mathcal{I}\in\mathbb{T}$ to be the neutral element of $\bar{\times}_N$, that is, the tensor whose entries are $\mathcal{I}_{i_1,\ldots,i_N,i_{N+1},\ldots,i_{2N}} = 1$ if $i_k = i_{k+N}$ for all $k=1,\ldots,N$, and 0 otherwise. The inverse of a tensor $\mathcal{A}\in\mathbb{T}$ is the tensor $\mathcal{A}^{-1}\in\mathbb{T}$ satisfying $\mathcal{A}^{-1}\,\bar{\times}_N\,\mathcal{A} = \mathcal{A}\,\bar{\times}_N\,\mathcal{A}^{-1} = \mathcal{I}$. A complex number $\lambda\in\mathbb{C}$ and a nonzero tensor $\mathcal{X}\in\mathbb{R}^{I_1\times\ldots\times I_N}$ are called eigenvalue and eigenvector of the tensor $\mathcal{A}\in\mathbb{T}$ if they satisfy the multilinear equation $\mathcal{A}\,\bar{\times}_N\,\mathcal{X} = \lambda\mathcal{X}$. We define the spectral radius $\rho(\mathcal{A})$ of $\mathcal{A}$ to be the largest modulus of the eigenvalues of $\mathcal{A}$. We define a stochastic process to be weakly stationary if the first and second moments of its finite dimensional distributions are finite and constant in $t$. Finally, note that it is always possible to rewrite an

ART($p$) process as an ART(1) process on an augmented state space, by stacking the endogenous tensors along the first mode. Thus, without loss of generality, we focus on the case $p=1$. We use the definition of inverse tensor, spectral radius and the convergence of power series of tensors to prove the following results.

Lemma 2.1. Every $(I_1\times I_2\times\ldots\times I_N\times I_1\times I_2\times\ldots\times I_N)$-dimensional ART($p$) process $\mathcal{Y}_t = \sum_{k=1}^{p}\tilde{\mathcal{A}}_k\,\bar{\times}_N\,\mathcal{Y}_{t-k} + \mathcal{E}_t$ can be rewritten as a $(pI_1\times I_2\times\ldots\times I_N\times pI_1\times I_2\times\ldots\times I_N)$-dimensional ART(1) process $\mathcal{Y}_t = \mathcal{A}\,\bar{\times}_N\,\mathcal{Y}_{t-1} + \mathcal{E}_t$.

Proposition 2.1 (Stationarity). If $\rho(\tilde{\mathcal{A}}_1) < 1$ and the process $\mathcal{X}_t$ is weakly stationary, then the ART process in eq. (3), with $p=1$, is weakly stationary and admits the representation
$$\mathcal{Y}_t = (\mathcal{I}-\tilde{\mathcal{A}}_1)^{-1}\,\bar{\times}_N\,\mathcal{A}_0 + \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^{k}\,\bar{\times}_N\,\tilde{\mathcal{B}}\,\bar{\times}_M\,\mathcal{X}_{t-k} + \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^{k}\,\bar{\times}_N\,\mathcal{E}_{t-k}.$$

Proposition 2.2. The VAR($p$) in eq. (19) is weakly stationary if and only if the ART($p$) in eq. (3) is weakly stationary.
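In practice, by Proposition 2.2, weak stationarity can be checked on the vectorized form of the process. A hedged sketch (ours, with illustrative sizes) for an ART(1) with $N = 2$:

```python
import numpy as np

rng = np.random.default_rng(2)
I1, I2 = 3, 4
Istar = I1 * I2
A1 = rng.standard_normal((I1, I2, I1, I2))  # 2N-order coefficient tensor

# Matricize the coefficient and compute the spectral radius of the VAR form.
A_mat = A1.reshape(Istar, Istar, order='F')
rho = np.max(np.abs(np.linalg.eigvals(A_mat)))

# Rescale so that the spectral radius is below one, ensuring stationarity.
A1 *= 0.9 / rho
```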

2.3 Parametrization

The unrestricted model in eq. (5) cannot be estimated, as the number of parameters greatly outmatches the available data. We address this issue by assuming a PARAFAC($R$) decomposition for the tensor coefficients, which makes the estimation feasible by reducing the dimension of the parameter space. The models in eqq. (5) and (3) are equivalent, but assuming a PARAFAC decomposition for the coefficient tensors leads to different degrees of parsimony, as shown in the following remark.

Remark 2.2 (Alternative parametrization via contracted product). The two models (5) and (3), combined with the PARAFAC decomposition for the tensor coefficients, allow for different degrees of parsimony. To show this, without loss of generality, focus on the coefficient tensor $\tilde{\mathcal{A}}_1$ (a similar argument holds for $\tilde{\mathcal{A}}_j$, $j=2,\ldots,p$, and $\tilde{\mathcal{B}}$). By assuming a PARAFAC($R$) decomposition for $\tilde{\mathcal{A}}_1$ in (3) and for $\mathcal{A}_1$ in (5), we get, respectively,
$$\tilde{\mathcal{A}}_1 = \sum_{r=1}^{R}\tilde{\alpha}_1^{(r)}\circ\ldots\circ\tilde{\alpha}_N^{(r)}\circ\tilde{\alpha}_{N+1}^{(r)}\circ\ldots\circ\tilde{\alpha}_{2N}^{(r)}, \qquad \mathcal{A}_1 = \sum_{r=1}^{R}\alpha_1^{(r)}\circ\ldots\circ\alpha_N^{(r)}\circ\alpha_{N+1}^{(r)}.$$
The lengths of the vectors $\alpha_j^{(r)}$ and $\tilde{\alpha}_j^{(r)}$ coincide for each $j = 1,\ldots,N$. However, $\alpha_{N+1}^{(r)}$ has length $I^*$ while $\tilde{\alpha}_{N+1}^{(r)},\ldots,\tilde{\alpha}_{2N}^{(r)}$ have length $I_1,\ldots,I_N$, respectively. Therefore, the number of free parameters in the coefficient tensor $\mathcal{A}_1$ is $R(I_1+\ldots+I_N+\prod_{j=1}^{N}I_j)$, while it is $2R(I_1+\ldots+I_N)$ for $\tilde{\mathcal{A}}_1$. This highlights the greater parsimony granted by the use of the PARAFAC($R$) decomposition in model (3) as compared to model (5).

Remark 2.3 (Vectorization). There is a relation between the $(I_1\times\ldots\times I_N)$-dimensional ART($p$) and a $(I_1\cdot\ldots\cdot I_N)$-dimensional VAR($p$) model. The vector form of (5) is

$$\operatorname{vec}(\mathcal{Y}_t) = \operatorname{vec}(\mathcal{A}_0) + \sum_{j=1}^{p}\operatorname{mat}_{N+1}(\mathcal{A}_j)'\operatorname{vec}(\mathcal{Y}_{t-j}) + \operatorname{mat}_{N+1}(\mathcal{B})'\operatorname{vec}(\mathcal{X}_t) + \operatorname{vec}(\mathcal{E}_t),$$
that is,
$$y_t = \alpha_0 + \sum_{j=1}^{p}A_{(N+1),j}'\,y_{t-j} + B_{(N+1)}'\,x_t + \epsilon_t, \qquad \epsilon_t\sim\mathcal{N}_{I^*}(0,\Sigma_N\otimes\ldots\otimes\Sigma_1), \qquad (6)$$

where the constraint on the covariance matrix stems from the one-to-one relation between the tensor normal distribution for $\mathcal{X}$ and the distribution of its vectorization ([54]), given by $\mathcal{X}\sim\mathcal{N}_{I_1,\ldots,I_N}(\mathcal{M},\Sigma_1,\ldots,\Sigma_N)$ if and only if $\operatorname{vec}(\mathcal{X})\sim\mathcal{N}_{I^*}(\operatorname{vec}(\mathcal{M}),\Sigma_N\otimes\ldots\otimes\Sigma_1)$. The restriction on the covariance structure for the vectorized tensor provides a parsimonious parametrization of the multivariate normal distribution, while allowing both within and between mode dependence. Alternative parametrizations for the covariance lead to generalizations of standard models. For example, assuming an additive covariance structure results in the tensor ANOVA. This is an active field for further research.

Example 2.1. For the sake of exposition, consider the model in eq. (5), where $p=1$, the response is a 3-order tensor $\mathcal{Y}_t\in\mathbb{R}^{d\times d\times d}$ and the covariates include only a constant, with coefficient tensor $\mathcal{A}_0$. Define by $k_{\mathcal{E}}$ the number of parameters of the noise distribution. The total number of parameters to estimate in the unrestricted case is $d^{2N} + k_{\mathcal{E}} = O(d^{2N})$, with $N=3$ in this example. Instead, in an ART model defined via the mode-$n$ product in eq. (5), assuming a PARAFAC($R$) decomposition on $\mathcal{A}_0$, the total number of parameters is $\sum_{r=1}^{R}(dN + d^N) + k_{\mathcal{E}} = O(d^N)$. Finally, in the ART model defined by the contracted product in eq. (3) with a PARAFAC($R$) decomposition on $\tilde{\mathcal{A}}_0$, the number of parameters is $\sum_{r=1}^{R}2Nd + k_{\mathcal{E}} = O(d)$. A comparison of the different degrees of parsimony granted by the PARAFAC decomposition in all models is illustrated in Fig. 1.

Figure 1: Number of parameters in $\mathcal{A}_0$, in log-scale (vertical axis), as a function of the size $d$ of the $(d\times d\times d)$-dimensional tensor $\mathcal{Y}_t$ (horizontal axis) in an ART(1) model. In all plots: unconstrained model (solid line), PARAFAC($R$) parametrization with $R=10$ (dashed line) and $R=5$ (dotted line). Parametrizations: vectorized model (panel a), mode-$n$ product of (5) (panel b) and contracted product of (3) (panel c).
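The counts in Example 2.1 can be reproduced directly; a small sketch (ours), omitting the noise parameters $k_{\mathcal{E}}$:

```python
N = 3  # the response is a (d x d x d) tensor

def n_params(d, R):
    unrestricted = d ** (2 * N)          # vectorized model, O(d^{2N})
    parafac_eq5 = R * (N * d + d ** N)   # PARAFAC on the mode-n-product form (5), O(d^N)
    parafac_eq3 = R * (2 * N * d)        # PARAFAC on the contracted-product form (3), O(d)
    return unrestricted, parafac_eq5, parafac_eq3

for d in (5, 10, 20):
    print(d, n_params(d, R=5))
```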

The structure of the PARAFAC decomposition poses an identification problem for the marginals $\beta_j^{(r)}$, which may arise from three sources:

(i) scale identification, since $(\lambda_{jr}\beta_j^{(r)})\circ(\lambda_{kr}\beta_k^{(r)}) = \beta_j^{(r)}\circ\beta_k^{(r)}$ for any collection $\{\lambda_{jr}\}_{j,r}$ such that $\prod_{j=1}^{J}\lambda_{jr} = 1$;

(ii) permutation identification, since for any permutation of the indices $\{1,\ldots,R\}$ the outer product of the original vectors is equal to that of the permuted ones;

(iii) orthogonal transformation identification, since $(\beta_j^{(r)}Q)\circ(\beta_k^{(r)}Q) = \beta_j^{(r)}\circ\beta_k^{(r)}$ for any orthonormal matrix $Q$.

Note that in our framework these issues do not hamper the inference, since our object of interest is the coefficient tensor $\mathcal{B}$, which is exactly identified. The marginals $\beta_j^{(r)}$ have no interpretation, as the PARAFAC decomposition is assumed on the coefficient tensor for the sake of providing a parsimonious parametrization.

2.4 Important Special Cases

The model in eq. (5) is a generalization of several well-known econometric models, as shown in the following remarks. See the supplement for the proofs of these results.

Remark 2.4 (Univariate).

If $I_i = 1$ for $i = 1,\ldots,N$, then model (5) reduces to a univariate regression

$$y_t = \alpha_0 + \sum_{j=1}^{p}\alpha_j y_{t-j} + \beta'\operatorname{vec}(\mathcal{X}_t) + \epsilon_t, \qquad \epsilon_t\sim\mathcal{N}(0,\sigma^2), \qquad (7)$$

where the coefficients of (5) become $\mathcal{A}_j = \alpha_j\in\mathbb{R}$, $j=0,\ldots,p$, and $\mathcal{B} = \beta\in\mathbb{R}^{J^*}$.

Remark 2.5 (SUR).

If $I_i = 1$ for $i = 2,\ldots,N$, and denoting by $\mathbf{1}_n$ the unit vector of length $n$, then model (5) reduces to a Seemingly Unrelated Regression (SUR) model ([65])

$$y_t = \alpha_0 + B\times_2\operatorname{vec}(\mathcal{X}_t) + \epsilon_t, \qquad \epsilon_t\sim\mathcal{N}_m(0,\Sigma), \qquad (8)$$

where $I_1 = m$ and the coefficients of (5) become $\mathcal{A}_j = 0$, $j=1,\ldots,p$, $\mathcal{A}_0 = \alpha_0\in\mathbb{R}^m$ and $\mathcal{B} = B\in\mathbb{R}^{m\times J^*}$. Note that, by definition, $B\times_2\operatorname{vec}(\mathcal{X}_t) = B\operatorname{vec}(\mathcal{X}_t)$.

Remark 2.6 (VARX and Panel VAR).

Consider the setup of Remark 2.5. If $z_t = \operatorname{vec}(\mathcal{X}_t) = y_{t-1}$, then we obtain a VARX(1) model with restricted covariance matrix. Another vector of regressors $w_t = \operatorname{vec}(W_t)\in\mathbb{R}^q$ may enter the regression (8) pre-multiplied (along mode 3) by a tensor $D\in\mathbb{R}^{m\times n\times q}$. Therefore, model (5) encompasses as a particular case also the panel VAR models of [16], [18] and [17], provided that we make the same restriction on $\Sigma$.

Remark 2.7 (VECM). The model in eq. (5) generalises the Vector Error Correction Model (VECM) widely used in multivariate time series analysis (see [31], [58]). Consider a $K$-dimensional VAR(1) model

$$y_t = By_{t-1} + \epsilon_t, \qquad \epsilon_t\sim\mathcal{N}_K(0,\Sigma).$$

Defining $\Delta y_t = y_t - y_{t-1}$ and $\Pi = (B - I) = \alpha\beta'$, where $\alpha$ and $\beta$ are $K\times R$ matrices of rank $R < K$, we obtain the VECM representation

$$\Delta y_t = \alpha\beta' y_{t-1} + \epsilon_t. \qquad (9)$$

This is used for studying the cointegration relations among the components of $y_t$. Since $\Pi = \alpha\beta' = \sum_{r=1}^{R}\alpha_{:,r}\,\beta_{:,r}' = \sum_{r=1}^{R}\tilde{\beta}_1^{(r)}\circ\tilde{\beta}_2^{(r)}$, we can interpret the VECM model in eq. (9) as a particular case of the model in eq. (5) where the coefficient $\mathcal{B}$ is the matrix $\Pi = \alpha\beta'$. Furthermore, by writing $\Pi = \sum_{r=1}^{R}\tilde{\beta}_1^{(r)}\circ\tilde{\beta}_2^{(r)}$ we can interpret this relation as a rank-$R$ PARAFAC decomposition of $\mathcal{B}$. Following this analogy, the PARAFAC rank corresponds to the cointegration rank, the $\tilde{\beta}_1^{(r)}$ are the mean-reverting coefficients and $\tilde{\beta}_2^{(r)} = (\tilde{\beta}_{2,1}^{(r)},\ldots,\tilde{\beta}_{2,K}^{(r)})$ are the cointegrating vectors. See the supplement for details. This interpretation opens the way to reparametrizations of $\mathcal{B}$ based on tensor SVD representations, and to the application

of regularization methods in the spirit of [10]. This is beyond the scope of the paper, thus we leave it for further research. A small numerical illustration of the analogy is given below.
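A toy check (ours): the reduced-rank matrix $\Pi = \alpha\beta'$ is exactly the sum of $R$ outer products of the columns of $\alpha$ and $\beta$, i.e. a rank-$R$ PARAFAC of the 2-order tensor $\Pi$.

```python
import numpy as np

rng = np.random.default_rng(3)
K, R = 6, 2
alpha = rng.standard_normal((K, R))
beta = rng.standard_normal((K, R))

Pi = alpha @ beta.T
# Rank-R PARAFAC of the matrix Pi: sum of outer products of the columns.
Pi_parafac = sum(np.outer(alpha[:, r], beta[:, r]) for r in range(R))
assert np.allclose(Pi, Pi_parafac)
```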

Remark 2.8 (MAI of [19]). The multivariate autoregressive index model (MAI) of [19] is another special case of model (5). A MAI is a VAR model with a low-rank decomposition imposed on the coefficient matrix, as follows:

$$y_t = AB_0 y_{t-1} + \epsilon_t,$$
where $y_t$ is a $(n\times 1)$ vector, whereas $A$ and $B_0$ are $(n\times R)$ and $(R\times n)$ matrices, respectively. In [19], the authors assumed $R=1$. This corresponds to our parametrization using $R=1$ and defining $A = \beta_1^{(1)}$ and $B_0 = \beta_2^{(1)\prime}$, which leads us to $AB_0 = \beta_1^{(1)}\circ\beta_2^{(1)}$.

Remark 2.9 (Tensor autoregressive model (ART)). By removing all the covariates from eq. (5) except the lags of the dependent variable, we obtain a tensor autoregressive model of order $p$, or ART($p$):

$$\mathcal{Y}_t = \mathcal{A}_0 + \sum_{j=1}^{p}\mathcal{A}_j\times_{N+1}\operatorname{vec}(\mathcal{Y}_{t-j}) + \mathcal{E}_t, \qquad \mathcal{E}_t\overset{iid}{\sim}\mathcal{N}_{I_1,\ldots,I_N}(0,\Sigma_1,\ldots,\Sigma_N). \qquad (10)$$

Matrix autoregressive models (MAR) are another special case of (5), which can be obtained from eq. (10) when the dependent variable is a matrix. See the supplement for an example.

2.5 Impulse Response Analysis

In this section we derive two impulse response functions (IRF) for ART models, the block Cholesky IRF and the block generalised IRF, exploiting the relationship between ART and VAR models. Without loss of generality, we focus on the ART($p$) model in eq. (10), with $p=1$ and $\mathcal{A}_0 = 0$, and introduce the following notation. Let $y_t = \operatorname{vec}(\mathcal{Y}_t)$ and $\epsilon_t = \operatorname{vec}(\mathcal{E}_t)\sim\mathcal{N}_{I^*}(0,\Sigma)$ be the $(I^*\times 1)$ tensor response and noise term in vector form, respectively, where $\Sigma = \Sigma_N\otimes\ldots\otimes\Sigma_1$ is the $(I^*\times I^*)$ covariance of the model in vector form and $I^* = \prod_{k=1}^{N}I_k$. Partition $\Sigma$ in blocks as
$$\Sigma = \begin{pmatrix} A & B \\ B' & C \end{pmatrix}, \qquad (11)$$

where $A$ is $n\times n$, $B$ is $n\times(I^*-n)$ and $C$ is $(I^*-n)\times(I^*-n)$. Then, denoting by $S = C - B'A^{-1}B$ the Schur complement of $A$, the LDU decomposition of $\Sigma$ is
$$\Sigma = \begin{pmatrix} I_n & O_{n,I^*-n} \\ B'A^{-1} & I_{I^*-n} \end{pmatrix}\begin{pmatrix} A & O_{n,I^*-n} \\ O'_{n,I^*-n} & S \end{pmatrix}\begin{pmatrix} I_n & A^{-1}B \\ O'_{n,I^*-n} & I_{I^*-n} \end{pmatrix} = LDL'.$$
Hence $\Sigma$ can be block-diagonalised:

$$D = L^{-1}\Sigma(L')^{-1} = \begin{pmatrix} A & O_{n,I^*-n} \\ O'_{n,I^*-n} & S \end{pmatrix}. \qquad (12)$$
From the Cholesky decomposition of $D$ one obtains a block Cholesky decomposition
$$\Sigma = \begin{pmatrix} L_A & O_{n,I^*-n} \\ B'(L_A')^{-1} & L_S \end{pmatrix}\begin{pmatrix} L_A' & L_A^{-1}B \\ O'_{n,I^*-n} & L_S' \end{pmatrix} = PP',$$
where $L_A$, $L_S$ are the Cholesky factors of $A$ and $S$, respectively. Assume the

vectorised ART process admits an infinite MA representation, with $\Psi_0 = I_{I^*}$ and $\Psi_i = \operatorname{mat}_{(4)}(\mathcal{B})'\,\Psi_{i-1}$. Then, using the previous results, we get
$$y_t = \sum_{i=0}^{\infty}\Psi_i\epsilon_{t-i} = \sum_{i=0}^{\infty}(\Psi_i L)(L^{-1}\epsilon_{t-i}) = \sum_{i=0}^{\infty}(\Psi_i L)\eta_{t-i}, \qquad \eta_t\sim\mathcal{N}_{I^*}(0,D), \qquad (13)$$
where $\eta_t = L^{-1}\epsilon_t$ are the block-orthogonalised shocks and $D$ is the block-diagonal matrix in eq. (12). Denote with $E_n$ the $I^*\times n$ matrix that selects $n$ columns from a pre-multiplying matrix, i.e. $DE_n$ is a matrix containing $n$ columns of $D$, and denote with $\delta^*$ a $n$-dimensional vector of shocks. Using the properties of the multivariate normal distribution, and recalling that the top-left block of size $n$ of $D$ is $A$, we extend the generalised IRF of [46] and [56] by defining the block generalised IRF
$$\psi^G(h;n) = E\big[\operatorname{vec}(\mathcal{Y}_{t+h})\,\big|\,\operatorname{vec}(\mathcal{E}_t)' = (\delta^{*\prime}, 0'_{I^*-n}),\,\mathcal{F}_{t-1}\big] - E\big[\operatorname{vec}(\mathcal{Y}_{t+h})\,\big|\,\mathcal{F}_{t-1}\big] = (\Psi_h L)\,DE_n A^{-1}\delta^*, \qquad (14)$$
where $\mathcal{F}_t$ is the natural filtration associated to the stochastic process. Starting from eq. (13) we derive the block Cholesky IRF (OIRF) as
$$\psi^O(h;n) = E\big[\operatorname{vec}(\mathcal{Y}_{t+h})\,\big|\,\operatorname{vec}(\mathcal{E}_t)' = (\delta^{*\prime}, 0'_{I^*-n}),\,\mathcal{F}_{t-1}\big] - E\big[\operatorname{vec}(\mathcal{Y}_{t+h})\,\big|\,\operatorname{vec}(\mathcal{E}_t)' = 0'_{I^*},\,\mathcal{F}_{t-1}\big] = (\Psi_h L)\,PE_n\delta^*. \qquad (15)$$
Define with $e_j$ the $j$-th column of the $I^*$-dimensional identity matrix. The impact of a shock $\delta^*$ to the $j$-th variable on all $I^*$ variables is given below in eq. (16), whereas the impact of a shock to the $j$-th variable on the $i$-th variable is given in eq. (17):
$$\psi_j^G(h;n) = \Psi_h L D e_j D_{jj}^{-1}\delta^*, \qquad \psi_j^O(h;n) = \Psi_h L P e_j\delta^*, \qquad (16)$$
$$\psi_{ij}^G(h;n) = e_i'\Psi_h L D e_j D_{jj}^{-1}\delta^*, \qquad \psi_{ij}^O(h;n) = e_i'\Psi_h L P e_j\delta^*. \qquad (17)$$
Finally, denoting $\delta_j = e_j\delta^*$, we have the compact notation
$$\psi_j^G(h;n) = \Psi_h L D D_{jj}^{-1}\delta_j, \qquad \psi_j^O(h;n) = \Psi_h L P\delta_j,$$
$$\psi_{ij}^G(h;n) = e_i'\Psi_h L D D_{jj}^{-1}\delta_j, \qquad \psi_{ij}^O(h;n) = e_i'\Psi_h L P\delta_j.$$
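A hedged numerical sketch of eq. (15) (ours; the names block_cholesky and oirf are illustrative) builds the factor $P$ from the partition in eq. (11) and propagates a vector of shocks through the MA coefficients:

```python
import numpy as np

def block_cholesky(Sigma, n):
    """Block Cholesky factor P of Sigma, with top-left block of size n."""
    A, B = Sigma[:n, :n], Sigma[:n, n:]
    S = Sigma[n:, n:] - B.T @ np.linalg.solve(A, B)  # Schur complement of A
    LA, LS = np.linalg.cholesky(A), np.linalg.cholesky(S)
    P = np.zeros_like(Sigma)
    P[:n, :n], P[n:, :n], P[n:, n:] = LA, B.T @ np.linalg.inv(LA).T, LS
    return P  # P @ P.T reproduces Sigma

def oirf(B4, Sigma, delta, n, H):
    """psi^O(h; n) = Psi_h P delta for h = 0, ..., H, with Psi_i = B4' Psi_{i-1}."""
    P = block_cholesky(Sigma, n)
    psi, out = np.eye(Sigma.shape[0]), []
    for _ in range(H + 1):
        out.append(psi @ P @ delta)
        psi = B4.T @ psi
    return np.array(out)
```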

3 Bayesian Inference

In this section, without loss of generality, we present the inference procedure for a special case of the model in eq. (5), given by

$$\mathcal{Y}_t = \mathcal{B}\times_4\operatorname{vec}(\mathcal{Y}_{t-1}) + \mathcal{E}_t, \qquad \mathcal{E}_t\overset{iid}{\sim}\mathcal{N}_{I_1,I_2,I_3}(0,\Sigma_1,\Sigma_2,\Sigma_3). \qquad (18)$$

Here $\mathcal{Y}_t$ is a 3-order tensor response of size $I_1\times I_2\times I_3$, $\mathcal{X}_t = \mathcal{Y}_{t-1}$, and $\mathcal{B}$ is thus a 4-order coefficient tensor of size $I_1\times I_2\times I_3\times I_4$, with $I_4 = I_1 I_2 I_3$. This is a 3-order tensor autoregressive model of lag order 1, or ART(1), coinciding with eq. (10) for $p=1$ and $\mathcal{A}_0 = 0$. The noise term $\mathcal{E}_t$ has a tensor normal distribution, with zero mean and covariance matrices $\Sigma_1$, $\Sigma_2$, $\Sigma_3$ of sizes $I_1\times I_1$, $I_2\times I_2$ and $I_3\times I_3$, respectively, accounting for the covariance along each of the three dimensions of $\mathcal{Y}_t$. The specification of a tensor model with a tensor normal noise, instead of a vector model (like a Gaussian VAR), has the advantage of being more parsimonious. By vectorising (18), we get the equivalent VAR

$$\operatorname{vec}(\mathcal{Y}_t) = B_{(4)}'\operatorname{vec}(\mathcal{Y}_{t-1}) + \operatorname{vec}(\mathcal{E}_t), \qquad \operatorname{vec}(\mathcal{E}_t)\overset{iid}{\sim}\mathcal{N}_{I^*}(0,\Sigma_3\otimes\Sigma_2\otimes\Sigma_1), \qquad (19)$$

whose covariance has a Kronecker structure, which contains $(I_1(I_1+1)+I_2(I_2+1)+I_3(I_3+1))/2$ parameters (as opposed to $I^*(I^*+1)/2$ for an unrestricted VAR) and allows for heteroskedasticity. The choice of the Bayesian approach for inference is motivated by the fact that the large number of parameters may lead to an overfitting problem, especially when the sample size is rather small. This issue can be addressed by the indirect inclusion

of parameter restrictions through a suitable specification of the corresponding prior distributions. In the unrestricted model (18) it would be necessary to define a prior distribution on the 4-order tensor $\mathcal{B}$. The literature on tensor-valued distributions is limited to the elliptical family (e.g., [54]), which includes the tensor normal and tensor t. Both distributions do not easily allow for the specification of restrictions on a subset of the entries of the tensor, hampering the use of standard regularization prior distributions (such as shrinkage priors). The PARAFAC($R$) decomposition of the coefficient tensor provides a way to circumvent this issue. This decomposition allows to represent a tensor through a collection of vectors (the marginals), for which many flexible shrinkage prior distributions are available. Indirectly, this introduces a priori sparsity on the coefficient tensor.
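A quick arithmetic check (ours) of the covariance parameter counts in eq. (19), using the dimensions of the application of Section 4:

```python
I1, I2, I3 = 10, 10, 2
Istar = I1 * I2 * I3
kron_params = (I1 * (I1 + 1) + I2 * (I2 + 1) + I3 * (I3 + 1)) // 2  # Kronecker form
full_params = Istar * (Istar + 1) // 2                              # unrestricted VAR
print(kron_params, full_params)  # 113 versus 20100
```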

3.1 Prior Specification

The choice of the prior distribution on the PARAFAC marginals is crucial for recovering the sparsity pattern of the coefficient tensor and for the efficiency of the inference. Global-local prior distributions are based on scale mixtures of normal distributions, where the different components of the covariance matrix govern the amount of prior shrinkage. Compared to spike-and-slab distributions (e.g., [52], [33], [44]), which become infeasible as the parameter space grows, global-local priors have better scalability properties in high-dimensional settings. They do not provide automatic variable selection, which can nonetheless be obtained by post-estimation thresholding ([55]).

Motivated by these arguments, we define a global-local shrinkage prior for the marginals $\beta_j^{(r)}$ of the coefficient tensor $\mathcal{B}$, following the hierarchical prior specification of [36] (see also [13], [69]). For each $\beta_j^{(r)}$, we define a prior distribution as a scale mixture of normals centred at zero, with three components for the covariance. The global parameter $\tau$ governs the overall variance, the middle parameter $\phi_r$ defines the common shrinkage for the marginals in the $r$-th component of the PARAFAC, and the local parameter $W_{j,r} = \operatorname{diag}(w_{j,r})$ drives the shrinkage of each entry of each marginal. Summarizing, for $p = 1,\ldots,I_j$, $j = 1,\ldots,J$ ($J = 4$ in eq. (18)) and $r = 1,\ldots,R$, the hierarchical prior structure¹ for each vector of

the PARAFAC($R$) decomposition in eq. (1) is

$$\begin{aligned} \pi(\phi) &\sim \mathcal{D}ir(\alpha\mathbf{1}_R), & \pi(\tau) &\sim \mathcal{G}a(a_\tau, b_\tau), & \pi(\lambda_{j,r}) &\sim \mathcal{G}a(a_\lambda, b_\lambda), \\ \pi(w_{j,r,p}\,|\,\lambda_{j,r}) &\sim \mathcal{E}xp(\lambda_{j,r}^2/2), & \pi\big(\beta_j^{(r)}\,\big|\,W_{j,r},\phi,\tau\big) &\sim \mathcal{N}_{I_j}(0,\tau\phi_r W_{j,r}), \end{aligned} \qquad (20)$$

where $\mathbf{1}_R$ is the vector of ones of length $R$ and we assume $a_\tau = \alpha R$ and $b_\tau = \alpha R^{1/J}$. The conditional prior distribution of a generic entry $b_{i_1,\ldots,i_J}$ of $\mathcal{B}$ is the law of a sum of product Normals²: it is symmetric around zero, with fatter tails than both a standard Gaussian and a standard Laplace distribution (see the supplement for further details). Note that a product Normal prior promotes sparsity due to the peak at zero.

¹ We use the shape-rate formulation for the gamma distribution.
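One joint draw from the hierarchy in eq. (20) can be simulated directly; a hedged sketch (ours), with illustrative hyperparameter values and the shape-rate gamma convention of footnote 1:

```python
import numpy as np

rng = np.random.default_rng(4)
J, R, I = 4, 5, [10, 10, 2, 200]
alpha, a_tau, b_tau, a_lam, b_lam = 1.0, 3.0, 1.0, 3.0, 1.0

phi = rng.dirichlet(alpha * np.ones(R))
tau = rng.gamma(a_tau, 1.0 / b_tau)  # shape-rate: numpy's scale is 1/rate
beta = [[None] * R for _ in range(J)]
for j in range(J):
    for r in range(R):
        lam = rng.gamma(a_lam, 1.0 / b_lam)
        w = rng.exponential(2.0 / lam ** 2, size=I[j])  # Exp with rate lam^2 / 2
        beta[j][r] = rng.normal(0.0, np.sqrt(tau * phi[r] * w))  # N(0, tau phi_r W_jr)
```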

The following result characterises the conditional prior distribution of an entry of the coefficient tensor $\mathcal{B}$ induced by the hierarchical prior in eq. (20). See the supplement for the proof.

Lemma 3.1. Let $b_{ijkp} = \sum_{r=1}^{R}\beta_r$, where $\beta_r = \beta_{1,i}^{(r)}\beta_{2,j}^{(r)}\beta_{3,k}^{(r)}\beta_{4,p}^{(r)}$, and let $m_1 = i$, $m_2 = j$, $m_3 = k$ and $m_4 = p$. Under the prior specification in (20), the generic entry $b_{ijkp}$ of the coefficient tensor $\mathcal{B}$ has the conditional prior distribution
$$\pi(b_{ijkp}\,|\,\tau,\phi,W) = \big(p(\beta_1|-) * \ldots * p(\beta_R|-)\big),$$
where $*$ denotes the convolution and
$$p(\beta_r|-) = K_r\, G_{0,4}^{4,0}\left(\beta_r^2\prod_{h=1}^{4}(2\tau\phi_r w_{h,r,m_h})^{-1}\,\middle|\,\mathbf{0}\right),$$
with $G_{p,q}^{m,n}\big(x\,\big|\,{a\atop b}\big)$ a Meijer G-function and

$$G_{0,4}^{4,0}\left(\beta_r^2\prod_{h=1}^{4}(2\tau\phi_r w_{h,r,m_h})^{-1}\,\middle|\,\mathbf{0}\right) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\Gamma(s)^4\left(\beta_r^2\prod_{h=1}^{4}(2\tau\phi_r w_{h,r,m_h})^{-1}\right)^{-s} ds,$$
$$K_r = (2\pi)^{-4/2}\prod_{h=1}^{4}(2\tau\phi_r w_{h,r,m_h})^{-1/2}.$$

The use of Meijer G- and Fox H-functions is not new in econometrics (e.g., [1]), and they have been recently used for defining prior distributions in Bayesian statistics ([6], [5]).
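In practice, the induced prior on $b_{ijkp}$ needs no Meijer G-function evaluation: it can be explored by Monte Carlo as a sum over $r$ of products of four centred normals, as in this hedged sketch (ours, with fixed scales):

```python
import numpy as np

rng = np.random.default_rng(5)
R, n = 5, 100_000
tau, w = 1.0, 1.0
phi = np.full(R, 1.0 / R)

# b = sum_r beta_r, with beta_r a product of four N(0, tau phi_r w) draws.
draws = sum(np.prod(rng.normal(0.0, np.sqrt(tau * phi[r] * w), size=(4, n)), axis=0)
            for r in range(R))
# Heavy tails and a peak at zero relative to a Gaussian: large excess kurtosis.
print(np.mean(draws ** 4) / np.var(draws) ** 2 - 3)
```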

² A product Normal is the distribution of the product of $n$ independent centred Normal random variables.

Figure 2: Directed acyclic graph of the model in eq. (18) and prior structure in eqq. (20)-(21). Gray circles denote observable variables, white solid circles indicate parameters, white dashed circles indicate fixed hyperparameters. Directed edges represent the conditional independence relationships.

From eq. (4), we have that the covariance matrices Σj enter the likelihood in a multiplicative way, therefore separate identification of their scales requires further restrictions. [63] and [28] adopt independent hyper-inverse Wishart prior

distributions ([25]) for each $\Sigma_j$, then impose the identification restriction $\Sigma_{j,11} = 1$ for $j = 2,\ldots,J-1$. The hard constraint $\Sigma_j = I_{I_j}$ (where $I_{I_j}$ is the identity matrix of size $I_j$), for all but one mode, implicitly imposes that the dependence structure within different modes is the same, but that there is no dependence between modes. We follow [40], who suggests to introduce dependence between the Inverse Wishart

prior distribution of each Σj via a hyper-parameter γ affecting their prior scale. To account for marginal dependence, we add a level of hierarchy, thus obtaining

$$\pi(\gamma)\sim\mathcal{G}a(a_\gamma, b_\gamma), \qquad \pi(\Sigma_j\,|\,\gamma)\sim\mathcal{IW}_{I_j}(\nu_j,\gamma\Psi_j). \qquad (21)$$

Define $\Lambda = \{\lambda_{j,r} : j = 1,\ldots,J,\ r = 1,\ldots,R\}$ and $W = \{W_{j,r} : j = 1,\ldots,J,\ r = 1,\ldots,R\}$, and let $\theta$ denote the collection of all parameters. The directed acyclic graph (DAG) of the prior structure is given in Fig. 2. Note that our prior specification is flexible enough to include Minnesota-type restrictions or hierarchical structures as in [16].

3.2 Posterior Computation

Define $Y = \{\mathcal{Y}_t\}_{t=1}^{T}$, $I_0 = \sum_{j=1}^{J}I_j$, $\beta_{-j}^{(r)} = \{\beta_i^{(r)} : i\neq j\}$ and $\mathcal{B}_{-r} = \{B_i : i\neq r\}$, with $B_r = \beta_1^{(r)}\circ\ldots\circ\beta_4^{(r)}$. The likelihood function of model (18) is

$$L(Y\,|\,\theta) = \prod_{t=1}^{T}(2\pi)^{-\frac{I^*}{2}}\prod_{j=1}^{3}|\Sigma_j|^{-\frac{I^*_{-j}}{2}}\cdot\exp\left(-\frac{1}{2}\big(\mathcal{Y}_t-\mathcal{B}\times_4 y_{t-1}\big)\,\bar{\times}_{1\ldots3}\,\big(\circ_{j=1}^{3}\Sigma_j^{-1}\big)\,\bar{\times}_{1\ldots3}\,\big(\mathcal{Y}_t-\mathcal{B}\times_4 y_{t-1}\big)\right), \qquad (22)$$

where $y_{t-1} = \operatorname{vec}(\mathcal{Y}_{t-1})$. Since the posterior distribution is not tractable in closed form, we adopt an MCMC procedure based on Gibbs sampling. The technical details of the derivation of the posterior distributions are given in Appendix B. We articulate the sampler in three main blocks:

(I) sample the global and middle variance hyper-parameters of the marginals, from

$$p(\psi_r\,|\,\mathcal{B}, W, \alpha) \propto \mathcal{G}i\mathcal{G}\big(\alpha - I_0/2,\ 2b_\tau,\ 2C_r\big), \qquad (23)$$
$$p(\tau\,|\,\mathcal{B}, W, \phi) \propto \mathcal{G}i\mathcal{G}\Big(a_\tau - RI_0/2,\ 2b_\tau,\ 2\sum_{r=1}^{R}C_r/\phi_r\Big), \qquad (24)$$

where $C_r = \sum_{j=1}^{J}\beta_j^{(r)\prime}W_{j,r}^{-1}\beta_j^{(r)}$, then set $\phi_r = \psi_r/\sum_{l=1}^{R}\psi_l$. For improving the mixing, we sample $\tau$ with a Hamiltonian Monte Carlo (HMC) step ([53]).

(II) sample the hyper-parameters of the local variance component of the marginals and the marginals themselves, from

$$p\big(\lambda_{j,r}\,\big|\,\beta_j^{(r)},\phi_r,\tau\big) \propto \mathcal{G}a\Big(a_\lambda + I_j,\ b_\lambda + \big\|\beta_j^{(r)}\big\|(\tau\phi_r)^{-1/2}\Big), \qquad (25)$$
$$p\big(w_{j,r,p}\,\big|\,\lambda_{j,r},\phi_r,\tau,\beta_j^{(r)}\big) \propto \mathcal{G}i\mathcal{G}\Big(1/2,\ \lambda_{j,r}^2,\ \big(\beta_{j,p}^{(r)}\big)^2/(\tau\phi_r)\Big), \qquad (26)$$
$$p\big(\beta_j^{(r)}\,\big|\,\beta_{-j}^{(r)},\mathcal{B}_{-r},W_{j,r},\phi_r,\tau,Y,\Sigma_1,\ldots,\Sigma_3\big) \propto \mathcal{N}_{I_j}\big(\bar{\mu}_{\beta_j},\bar{\Sigma}_{\beta_j}\big). \qquad (27)$$

(III) sample the covariance matrices and the latent scale, respectively, from

$$p(\Sigma_j\,|\,\mathcal{B}, Y, \Sigma_{-j},\gamma) \propto \mathcal{IW}_{I_j}\big(\nu_j + I_j,\ \gamma\Psi_j + S_j\big), \qquad (28)$$
$$p(\gamma\,|\,\Sigma_1,\ldots,\Sigma_3) \propto \mathcal{G}a\Big(a_\gamma + \sum_{j=1}^{3}\nu_j I_j,\ b_\gamma + \sum_{j=1}^{3}\operatorname{tr}\big(\Psi_j\Sigma_j^{-1}\big)\Big). \qquad (29)$$
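For concreteness, a hedged sketch (ours) of the non-standard draws: a GiG($p$, $a$, $b$) variate, with density proportional to $x^{p-1}\exp(-(ax + b/x)/2)$, can be obtained from scipy's geninvgauss parametrization, and eq. (28) via invwishart; all numeric values below are placeholders, not estimates.

```python
import numpy as np
from scipy.stats import geninvgauss, invwishart

def gig(p, a, b, rng=None):
    # GiG(p, a, b) corresponds to geninvgauss(p, sqrt(a*b)) scaled by sqrt(b/a).
    return geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a), random_state=rng)

# Draw of psi_r as in eq. (23), with hypothetical alpha, b_tau, I_0 and C_r.
alpha, b_tau, I0, C_r = 1.0, 1.0, 222.0, 4.2
psi_r = gig(alpha - I0 / 2, 2 * b_tau, 2 * C_r)

# Draw of Sigma_j as in eq. (28), with hypothetical nu_j, gamma, Psi_j and S_j.
nu_j, I_j, gamma = 12, 10, 1.0
Psi_j, S_j = np.eye(I_j), np.eye(I_j)
Sigma_j = invwishart.rvs(df=nu_j + I_j, scale=gamma * Psi_j + S_j)
```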

4 Application to Multilayer Dynamic Networks

We apply the proposed methodology to study jointly the dynamics of international trade and credit networks. The international trade network has been previously studied by several authors (e.g., [32], [30]), but to the best of our knowledge, this is the first attempt to model the dynamics of two networks jointly. The bilateral trade data come from the COMTRADE database, whereas the data on bilateral outstanding capital come from the Bank of International Settlements database. Our sample of yearly observations for 10 countries runs from 2003 to 2016. At each time $t$, the 3-order tensor $\mathcal{Y}_t$ has size $(10, 10, 2)$ and represents a 2-layer node-aligned network (or multiplex) with 10 vertices (countries), where each edge is given by a bilateral trade flow or financial stock. See the supplement for the data description.

We estimate the tensor autoregressive model in eq. (18), using the prior structure described in Section 3, running the Gibbs sampler for $N = 100{,}000$ iterations after $30{,}000$ burn-in iterations. We retain every second draw for posterior inference.


Figure 3: Left: mode-4 matricization of the estimated coefficient tensor $\hat{B}_{(4)}$. Right: log-spectrum of $\hat{B}_{(4)}$, in decreasing order.


Figure 4: Estimated covariance matrices: $\hat{\Sigma}_1$ (left), $\hat{\Sigma}_2$ (center), $\hat{\Sigma}_3$ (right).

The mode-4 matricization of the estimated coefficient tensor, $\hat{B}_{(4)}$, is shown in the left panel of Fig. 3. The $(i,j)$-th entry of the matrix $\hat{B}_{(4)}$ reports the impact of edge $j$ on edge $i$ (in vectorised form³). The first 100 rows/columns correspond to the edges in the first layer. Hence, two rows of the matricized coefficient tensor are similar when two edges are affected by all the edges of the (lagged) network in a similar way, whereas two similar columns identify the situation where two edges impact the (next period) network in a similar way. The overall distribution of the estimated entries of $\hat{B}_{(4)}$ is symmetric around zero and leptokurtic, as a consequence of the shrinkage to zero of the estimated coefficients. The right panel of Fig. 3 shows the log-spectrum of $\hat{B}_{(4)}$. As all eigenvalues of $\hat{B}_{(4)}$ have modulus smaller than one, we conclude that the estimated ART(1) model is stationary⁴.

Fig. 4 shows the estimated covariance matrices. In all cases, the highest values correspond to individual variances, while the estimated covariances are lower in magnitude and heterogeneous. We also find evidence of heterogeneity in the dependence structure, since $\Sigma_1$, which captures the covariance between rows (i.e., exporting and creditor countries), differs from $\Sigma_2$, which describes the covariance between columns (i.e., importing and debtor countries). With few exceptions, estimated covariances are positive.

After estimating the ART(1) model (18), we may investigate shock propagation across the network by computing the generalised and orthogonalised impulse response functions presented in equations (14) and (15), respectively. Impulse responses allow us to analyze the propagation of shocks both across the network, within and across layers, and over time. For illustration, we study the responses to a shock in all edges of a country, by applying the block Cholesky factorisation to $\Sigma$, in such a way that the shocked country contemporaneously affects all others and not vice versa.⁵ Thus, the matrices $A$ and $C$ in eq. (11) reflect contemporaneous correlations across transactions of the shock-originating country and with transactions of all other countries, respectively. For expositional convenience, we report only statistically significant responses.

In the first analysis we consider a negative 1% shock to US trade imports⁶. The results of the block Cholesky IRF at horizon 1 are given in Fig. 5. We report the impact on the whole network (panel (a)) and, for illustrative purposes, the

³ For example, $j = 21$ and $i = 4$ corresponds to the coefficient of entry $\mathcal{Y}_{1,3,1,t-1}$ on $\mathcal{Y}_{4,1,1,t}$.
⁴ It can be shown that the stationarity of the mode-4 matricised coefficient tensor implies stationarity of the ART(1) process.
⁵ To save space, we do not report the generalised IRFs, which are very similar to the ones presented.
⁶ That is, we allocate the shock across import-originating countries to match import shares as in the last period of the sample.

impact on Germany's transactions. The main findings follow.

Global effect on the network. The negative shock to US imports has an effect on both layers (trade and financial) of the network. There is evidence of heterogeneous responses across countries and country-specific transactions. On average, trade flows exhibit a slight expansion in response to the shock. Switzerland is the most positively affected, both in terms of exports and imports, and trade imports of the US show (on average) a reverted positive response one period after the shock. This reflects an oscillating impulse response. The overall average effect on the financial layer is negative, similar in magnitude to the effect on the trade layer. More specifically, we observe that Denmark's and Sweden's exports to Switzerland, Germany and France show a contraction, whereas the effect on US', Japan's and Ireland's exports to these countries is positive. We may interpret these effects as substitution effects: the decreasing share of Denmark's and Sweden's exports to Switzerland, Germany and France is offset by an increase of US, Japanese and Irish exports. In conclusion, the dynamic model can be used for predicting possible trade creation and diversion effects (e.g., see [14]).

Local effect on Germany. In panel (b) of Fig. 5 we report the response of Germany's transactions to the negative shock in US imports. The effects on imports are mixed: while Germany's imports from most other EU countries increase, imports from Sweden and Denmark decrease. Likewise, Germany's exports show heterogeneous responses, whereby exports to Switzerland react strongest (positively). The shock to US imports does not have a significant impact on Germany's outstanding credit against most countries (except Switzerland and Japan). On the other hand, the reactions of Germany's outstanding debt reflect those on trade imports.

Local effect on other countries. We observe that the most affected trade transactions are those of Denmark, Japan, Ireland, Sweden and the US (as exporters) vis-à-vis Switzerland and France (as importers). The financial layer mirrors these effects with opposite sign, while the magnitudes are comparable. Outstanding credit of Ireland and Japan to Switzerland, Germany and France decreases at horizon 1. By contrast, Denmark's outstanding credit to these countries increases. Note that outstanding debt of the US vis-à-vis almost all countries decreases after the shock. Overall, responses to a shock on US imports at horizon 1 are heterogeneous in sign but rather low in magnitude, whereas at horizon 2 (plot not reported) the propagation of the shock has vanished. We interpret this as a sign of fast (and monotone) decay of the IRF.

Figure 5: Shock to US trade imports by -1%. IRF at horizon $h = 1$ for all (panel a) and Germany (panel b) financial and trade transactions. In each plot, negative coefficients are in blue and positive in red.

Fig. 6 shows the block Cholesky IRF at horizons $h = 1, 2$, resulting from a negative 1% shock to GB's outstanding debt⁷. The main findings follow.

Global effect on the network. We observe heterogeneous effects across countries. Effects on the trade layer at horizon 1 are equally heterogeneous, but smaller in magnitude compared with the financial layer.

Local effect on Germany. Compared with other countries, the shock has smaller effects on Germany's trade. The negative shock to GB's outstanding debt has a negative impact on Germany's exports and imports to all countries but Ireland and Sweden for exports and Denmark for imports. Germany's outstanding credit increases vis-à-vis Denmark, GB, Japan and the US. Germany's outstanding debt increases against all countries but Denmark and Sweden, in particular against France, Japan and Ireland. At horizon 2 responses are not reverted, but nearly all effects turn insignificant, providing evidence of monotone and fast decay of the IRFs.

Local effect on other countries. On the trade layer at horizon 1, we observe a positive response in Denmark's exports and on average a negative response of Switzerland's, Ireland's and Japan's exports. France and Sweden are the most affected countries on the financial layer: the increase in outstanding credit of France towards Germany, Denmark and GB is counterbalanced by a reduction in Sweden's outstanding credit towards the same countries. We observe reverse effects concerning France's and Sweden's outstanding credit towards Switzerland and Ireland. Finally, Ireland's outstanding credit reacts positively towards most other countries.

⁷ Again, the shock is allocated across countries to reflect country-specific shares of the last period in the sample.

Figure 6: Shock to GB capital inflows by -1%. IRF at horizon $h = 1$ for all (panel a) and Germany (panel b) financial and trade transactions. IRF at horizon $h = 2$ for all (panel c) and Germany (panel d) financial and trade transactions. In each plot, negative coefficients are in blue and positive in red.

Compared with responses to the shock to US imports, the persistence of a negative shock to GB's outstanding debt is slightly stronger; see the impulse responses at horizon 2 in Fig. 6. The decay is monotonic. However, the speed of decay is heterogeneous across countries: for some countries, there are small effects at horizon 2, while for others the effects are already completely wiped out. Overall, we do not find evidence of a relation between the size of a country in terms of exports or outstanding credit and the persistence of the impulse response. At the most, persistence seems determined by the origin of the shock, the effects of a financial shock being more persistent than those of a trade shock.

Finally, in Fig. 7 we plot the block Cholesky IRF at horizons $h = 1, 2$, resulting from a 1% negative shock to GB's outstanding debt coupled with a 1% positive shock to GB's outstanding credit. The main findings follow.

Global effect on the network. The results remarkably differ from the previous ones (see Fig. 6). The responses to this simultaneous shock in GB's outstanding debt and credit are larger, in particular in the trade layer. However, already at horizon 2 responses are nearly fully decayed. The results in Fig. 6 and Fig. 7 suggest that an increase in GB's outstanding credit has an overall positive effect on trade, stimulating export/import activities of most other countries.

Local effect on Germany. One period after the shock, we observe an overall positive effect on German exports, the exception being towards GB, Ireland and Sweden. Imports react mostly positively. Imports from the US and Ireland react most, while those from Denmark react negatively. The responses of Germany's outstanding debt vis-à-vis most countries but Denmark and Sweden are negative, especially against France. At horizon 2 Germany's responses have nearly faded away, suggesting a rapid monotone decay of the shock's effect.

Local effect on other countries. In particular, the reactions of Switzerland's imports and outstanding debt are strikingly different from the previous case; compare with Fig. 6. Imports from the US and Ireland, and to a lesser extent from France and Austria, are strongly boosted, while those from Denmark and Sweden decrease strongly. Moreover, we note that Japan's outstanding debt increases significantly against most countries. We interpret this as a signal of Japan's attractiveness for foreign capital. Compared with the previous exercise, France's financial responses are now mostly insignificant, or of opposite sign. Finally, the reactions of GB's exports and outstanding credit are heterogeneous, the latter ones being larger in absolute magnitude.

Figure 7: Shock to GB capital inflows by -1% and outflows by +1%. IRF at horizon $h = 1$ for all (panel a) and Germany (panel b) financial and trade transactions. IRF at horizon $h = 2$ for all (panel c) and Germany (panel d) financial and trade transactions. In each plot, negative coefficients are in blue and positive in red.

5 Conclusions

We defined a new statistical framework for dynamic tensor regression. It is a generalisation of many models frequently used in time series analysis, such as VAR, panel VAR, SUR and matrix regression models. The PARAFAC decomposition of the tensor of regression coefficients allows to reduce the dimension of the parameter space and also permits to choose flexible multivariate prior distributions, instead of multidimensional ones. Overall, this allows to encompass sparsity beliefs and to design efficient algorithms for posterior inference.

The proposed methodology has been used for analysing the temporal evolution of the international trade and financial network, and the investigation has been complemented with an impulse response analysis. We have found evidence of (i) wide heterogeneity in the sign and magnitude of the estimated coefficients and (ii) stationarity of the network process. The impulse response analysis has highlighted the role of network topology in shock propagation across countries and over time. Irrespective of its origin, any shock is found to propagate between layers, but financial shocks are more persistent than those on international trade. Moreover, we do not find evidence of a relation between the size of a country, expressed by the total trade or capital exports, and the persistence of its response to a shock. Finally, we have found evidence of substitution effects in response to the shocks, meaning that pairs of countries experience opposite effects from a shock to another country. In conclusion, our dynamic model can be used for predicting possible trade creation and diversion effects.

Supplementary Material

Supplementary material including background results on tensors, the derivation of the posterior, simulation experiments and the description of the data is available online⁸.

⁸ https://matteoiacopini.github.io/docs/BiCaIaKa_Supplement.pdf

References

[1] Karim M Abadir and Paolo Paruolo. Two mixed normal densities from cointegration analysis. Econometrica, pages 671–680, 1997.

[2] Ralph Abraham, Jerrold E Marsden, and Tudor Ratiu. Manifolds, tensor analysis, and applications. Springer Science & Business Media, 2012.

[3] Iñaki Aldasoro and Iván Alves. Multiplex interbank networks and systemic importance: an application to European data. Journal of Financial Stability, 35:17–37, 2018.

[4] Osvaldo Anacleto and Catriona Queen. Dynamic chain graph models for time series network data. Bayesian Analysis, 12(2):491–509, 2017.

[5] JAA Andrade and PN Rathie. Exact posterior computation in non-conjugate Gaussian location-scale parameters models. Communications in Nonlinear Science and Numerical Simulation, 53:111–129, 2017.

[6] JAA Andrade and Pushpa Narayan Rathie. On exact posterior distributions using H-functions. Journal of Computational and Applied Mathematics, 290:459–475, 2015.

[7] Rutherford Aris. Vectors, tensors and the basic equations of fluid mechanics. Courier Corporation, 2012.

[8] Laszlo Balazsi, Laszlo Matyas, and Tom Wansbeek. The estimation of multidimensional fixed effects panel data models. Econometric Reviews, pages 1–23, 2015.

[9] Badi H Baltagi, Georges Bresson, and Jean-Michel Etienne. Hedonic housing prices in Paris: an unbalanced spatial lag pseudo-panel model with nested random effects. Journal of Applied Econometrics, 30(3):509–528, 2015.

[10] Nalan Baştürk, Lennart Hoogerheide, and Herman K van Dijk. Bayesian analysis of boundary and near-boundary evidence in econometric models with reduced rank. Bayesian Analysis, 12(3):879–917, 2017.

[11] Patrick Bayer, Robert McMillan, Alvin Murphy, and Christopher Timmins. A dynamic model of demand for houses and neighborhoods. Econometrica, 84(3):893–942, 2016.

[12] Ratikanta Behera, Ashish Kumar Nandi, and Jajati Keshari Sahoo. Further results on the Drazin inverse of even order tensors. arXiv preprint arXiv:1904.10783, 2019.

[13] Anirban Bhattacharya, Debdeep Pati, Natesh S. Pillai, and David B. Dunson. Dirichlet-Laplace priors for optimal shrinkage. Journal of the American Statistical Association, 110(512):1479–1490, 2015.

[14] Jacob A Bikker. The gravity model in international trade: advances and applications, chapter An extended gravity model with substitution applied to international trade, pages 135–164. Cambridge University Press, 2010.

[15] Michael Brazell, Na Li, Carmeliza Navasca, and Christino Tamon. Solving multilinear systems via tensor inversion. SIAM Journal on Matrix Analysis and Applications, 34(2):542–570, 2013.

[16] Fabio Canova and Matteo Ciccarelli. Forecasting and turning point predictions in a Bayesian panel VAR model. Journal of Econometrics, 120(2):327–359, 2004.

[17] Fabio Canova and Matteo Ciccarelli. Estimating multicountry VAR models. International Economic Review, 50(3):929–959, 2009.

[18] Fabio Canova, Matteo Ciccarelli, and Eva Ortega. Similarities and convergence in G-7 cycles. Journal of Monetary Economics, 54(3):850–878, 2007.

[19] Andrea Carriero, George Kapetanios, and Massimiliano Marcellino. Structural analysis with multivariate autoregressive index models. Journal of Econometrics, 192(2):332–348, 2016.

[20] Carlos M. Carvalho, Hélène Massam, and Mike West. Simulation of hyper-inverse Wishart distributions in graphical models. Biometrika, 94(3):647–659, 2007.

[21] Carlos M Carvalho and Mike West. Dynamic matrix-variate graphical models. Bayesian Analysis, 2(1):69–97, 2007.

[22] Elynn Y Chen, Ruey S Tsay, and Rong Chen. Constrained factor models for high-dimensional matrix-variate time series. Journal of the American Statistical Association, (forthcoming), 2019.

[23] Andrzej Cichocki. Era of Big Data processing: a new approach via tensor networks and tensor decompositions. In Proceedings of the International Workshop on Smart Info-Media Systems in Asia (SISA2013), 2014.

[24] Peter Davis. Estimating multi-way error components models with unbalanced data structures. Journal of Econometrics, 106(1):67–95, 2002.

[25] A Philip Dawid and Steffen L Lauritzen. Hyper Markov laws in the statistical analysis of decomposable graphical models. The Annals of Statistics, 21(3):1272–1317, 1993.

[26] Aureo De Paula. Econometrics of network models. In Advances in Economics and Econometrics: Theory and Applications, Eleventh World Congress, pages 268–323. Cambridge University Press Cambridge, 2017.

[27] Shanshan Ding and R Dennis Cook. Matrix-variate regressions and envelope models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 80(2):387–408, 2018.

[28] Adrian Dobra. Handbook of Spatial Epidemiology, chapter Graphical Modeling of Spatial Health Data. Chapman & Hall/CRC, first edition, 2015.

[29] Daniele Durante and David B Dunson. Nonparametric Bayesian dynamic modelling of relational data. Biometrika, 101(4):883–898, 2014.

[30] Jonathan Eaton and Samuel Kortum. Technology, geography, and trade. Econometrica, 70(5):1741–1779, 2002.

[31] Robert F Engle and Clive WJ Granger. Co-integration and error correction: representation, estimation, and testing. Econometrica, pages 251–276, 1987.

[32] Ana Cecilia Fieler. Nonhomotheticity and bilateral trade: Evidence and a quantitative explanation. Econometrica, 79(4):1069–1101, 2011.

[33] Edward I George and Robert E McCulloch. Approaches for Bayesian variable selection. Statistica Sinica, 7:339–373, 1997.

[34] Vasyl Golosnoy, Bastian Gribisch, and Roman Liesenfeld. The conditional autoregressive Wishart model for multivariate stock market volatility. Journal of Econometrics, 167(1):211–223, 2012.

[35] Christian Gouriéroux, Joann Jasiak, and Razvan Sufana. The Wishart autoregressive process of multivariate stochastic volatility. Journal of Econometrics, 150(2):167–181, 2009.

[36] Rajarshi Guhaniyogi, Shaan Qamar, and David B Dunson. Bayesian tensor regression. Journal of Machine Learning Research, 18(79):1–31, 2017.

[37] Arjun K Gupta and Daya K Nagar. Matrix variate distributions. CRC Press, 1999.

[38] Wolfgang Hackbusch. Tensor spaces and numerical tensor calculus. Springer Science & Business Media, 2012.

[39] Jeff Harrison and Mike West. Bayesian forecasting & dynamic models. Springer, 1999.

[40] Peter D Hoff. Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Analysis, 6(2):179–196, 2011.

[41] Peter D Hoff. Multilinear tensor regression for longitudinal relational data. The Annals of Applied Statistics, 9(3):1169–1193, 2015.

[42] Petter Holme and Jari Saramäki. Temporal networks. Physics Reports, 519(3):97–125, 2012.

[43] Masaaki Imaizumi and Kohei Hayashi. Doubly decomposing nonparametric tensor regression. In International Conference on Machine Learning, pages 727–736, 2016.

[44] Hemant Ishwaran and J Sunil Rao. Spike and slab variable selection: frequentist and Bayesian strategies. The Annals of Statistics, 33(2):730–773, 2005.

[45] Tamara G. Kolda and Brett W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[46] Gary Koop, M Hashem Pesaran, and Simon M Potter. Impulse response analysis in nonlinear multivariate models. Journal of Econometrics, 74(1):119–147, 1996.

[47] Vassilis Kostakos. Temporal graphs. Physica A: Statistical Mechanics and its Applications, 388(6):1007–1023, 2009.

[48] Pieter M Kroonenberg. Applied multiway data analysis. John Wiley & Sons, 2008.

[49] Namgil Lee and Andrzej Cichocki. Fundamental tensor operations for large-scale data analysis in tensor train formats. Multidimensional Systems and Signal Processing, 29(3):921–960, 2018.

[50] Lexin Li and Xin Zhang. Parsimonious tensor response regression. Journal of the American Statistical Association, 112(519):1131–1146, 2017.

[51] Helmut Lütkepohl. New introduction to multiple time series analysis. Springer Science & Business Media, 2005.

[52] Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032, 1988.

[53] Radford M Neal. MCMC using Hamiltonian dynamics. In Steve Brooks, Andrew Gelman, Galin L Jones, and Xiao-Li Meng, editors, Handbook of Markov Chain Monte Carlo, chapter 5. Chapman & Hall/CRC, 2011.

[54] Martin Ohlson, M Rauf Ahmad, and Dietrich Von Rosen. The multilinear normal distribution: introduction and some basic properties. Journal of Multivariate Analysis, 113:37–47, 2013.

[55] Trevor Park and George Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

[56] H Hashem Pesaran and Yongcheol Shin. Generalized impulse response analysis in linear multivariate models. Economics Letters, 58(1):17–29, 1998.

[57] Sebastian Poledna, José Luis Molina-Borboa, Serafín Martínez-Jaramillo, Marco Van Der Leij, and Stefan Thurner. The multi-layer network nature of systemic risk and its implications for the costs of financial crises. Journal of Financial Stability, 20:70–81, 2015.

[58] Peter Schotman and Herman K Van Dijk. A Bayesian analysis of the unit root in real exchange rates. Journal of Econometrics, 49(1-2):195–238, 1991.

[59] Yongcheol Shin, Laura Serlenga, and George Kapetanios. Estimation and inference for multi-dimensional heterogeneous panel datasets with hierarchical multi-factor error structure. Journal of Econometrics, (forthcoming), 2019.

[60] Christopher A Sims and Tao Zha. Bayesian methods for dynamic multivariate models. International Economic Review, 39(4):949–968, 1998.

[61] Harald Uhlig. Bayesian vector autoregressions with stochastic volatility. Econometrica, 65:59–73, 1997.

[62] Cinzia Viroli. Finite mixtures of matrix normal distributions for classifying three-way data. Statistics and Computing, 21(4):511–522, 2011.

[63] Hao Wang and Mike West. Bayesian analysis of matrix normal graphical models. Biometrika, 96(4):821–834, 2009.

[64] Tan Xu, Zhang Yin, Tang Siliang, Shao Jian, Wu Fei, and Zhuang Yueting. Logistic tensor regression for classification. In Intelligent Science and Intelligent Data Engineering, volume 7751, pages 573–581. Springer, 2013.

[65] Arnold Zellner. An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. Journal of the American Statistical Association, 57(298):348–368, 1962.

[66] Qibin Zhao, Liqing Zhang, and Andrzej Cichocki. A tensor-variate Gaussian process for classification of multidimensional structured data. In Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.

[67] Qibin Zhao, Guoxu Zhou, Liqing Zhang, and Andrzej Cichocki. Tensor-variate Gaussian processes regression and its application to video surveillance. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pages 1265–1269. IEEE, 2014.

[68] Hua Zhou, Lexin Li, and Hongtu Zhu. Tensor regression with applications in neuroimaging data analysis. Journal of the American Statistical Association, 108(502):540–552, 2013.

[69] Jing Zhou, Anirban Bhattacharya, Amy H. Herring, and David B. Dunson. Bayesian factorizations of big sparse tensors. Journal of the American Statistical Association, 110(512):1562–1576, 2015.

[70] Xuening Zhu, Rui Pan, Guodong Li, Yuewen Liu, and Hansheng Wang. Network vector autoregression. The Annals of Statistics, 45(3):1096–1123, 2017.

[71] Xuening Zhu, Weining Wang, Hansheng Wang, and Wolfgang Karl Härdle. Network quantile autoregression. Journal of Econometrics, (forthcoming), 2019.

A Background Material on Tensor Calculus

This appendix provides the main tools used in the paper; see the supplement for further results and details. A $N$-order tensor is an element of the tensor product of $N$ vector spaces. Since there exists an isomorphism between tensor spaces of orders $N$ and $M < N$ (with suitably chosen dimensions), it is possible to define a one-to-one map between their elements, that is, between a $N$-order tensor and a $M$-order tensor.

Definition A.1 (Tensor reshaping). Let $V_1,\ldots,V_N$ and $U_1,\ldots,U_M$ be vector spaces and let $\mathcal{X} \in \mathbb{R}^{I_1\times\ldots\times I_N} = V_1\otimes\ldots\otimes V_N$ be a $N$-order real tensor of dimensions $I_1,\ldots,I_N$. Let $(v_1,\ldots,v_N)$ be a canonical basis of $\mathbb{R}^{I_1\times\ldots\times I_N}$ and let $\Pi_S$ be the projection defined as

$$\Pi_S : V_1\otimes\ldots\otimes V_N \to V_{s_1}\otimes\ldots\otimes V_{s_k}, \qquad v_1\otimes\ldots\otimes v_N \mapsto v_{s_1}\otimes\ldots\otimes v_{s_k},$$

with $S = \{s_1,\ldots,s_k\} \subset \{1,\ldots,N\}$. Let $(S_1,\ldots,S_M)$ be a partition of $\{1,\ldots,N\}$. The $(S_1,\ldots,S_M)$ tensor reshaping of $\mathcal{X}$ is defined as $\mathcal{X}_{(S_1,\ldots,S_M)} = (\Pi_{S_1}\mathcal{X})\otimes\ldots\otimes(\Pi_{S_M}\mathcal{X}) \in U_1\otimes\ldots\otimes U_M$. The mapping is an isomorphism between $V_1\otimes\ldots\otimes V_N$ and $U_1\otimes\ldots\otimes U_M$.

The matricization is a particular case of reshaping a $N$-order tensor into a 2-order tensor, obtained by choosing a mapping between the tensor modes and the rows and columns of the resulting matrix, then permuting the tensor and reshaping it accordingly.

Definition A.2 (Matricization). Let $\mathcal{X}$ be a $N$-order tensor with dimensions $I_1,\ldots,I_N$. Let the ordered sets $R = \{r_1,\ldots,r_L\}$ and $C = \{c_1,\ldots,c_M\}$ be a partition of $\bar{N} = \{1,\ldots,N\}$. The matricized tensor is defined by

$$\operatorname{mat}_{R,C}(\mathcal{X}) = \mathbf{X}_{(R,C)} \in \mathbb{R}^{J\times K}, \qquad J = \prod_{n\in R} I_n, \qquad K = \prod_{n\in C} I_n.$$

Indices of $R$, $C$ are mapped to the rows and the columns, respectively, via

$$\big(\mathbf{X}_{(R,C)}\big)_{j,k} = \mathcal{X}_{i_1,i_2,\ldots,i_N}, \qquad j = 1+\sum_{l=1}^{L}\Big[(i_{r_l}-1)\prod_{l'=1}^{l-1} I_{r_{l'}}\Big], \qquad k = 1+\sum_{m=1}^{M}\Big[(i_{c_m}-1)\prod_{m'=1}^{m-1} I_{c_{m'}}\Big].$$

The inner product between two $(I_1\times\ldots\times I_N)$-dimensional tensors $\mathcal{X}$, $\mathcal{Y}$ is defined as

$$\langle\mathcal{X},\mathcal{Y}\rangle = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}\mathcal{X}_{i_1,\ldots,i_N}\,\mathcal{Y}_{i_1,\ldots,i_N}.$$

The PARAFAC($R$) decomposition (e.g., see [45]) is a rank-$R$ decomposition which represents a tensor $\mathcal{B}\in\mathbb{R}^{I_1\times\ldots\times I_N}$ as a finite sum of $R$ rank-1 tensors, each obtained as the outer product of $N$ vectors (called marginals) $\beta_j^{(r)}\in\mathbb{R}^{I_j}$:

$$\mathcal{B} = \sum_{r=1}^{R}\mathcal{B}_r = \sum_{r=1}^{R}\beta_1^{(r)}\circ\ldots\circ\beta_N^{(r)}.$$
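For intuition, the matricization and PARAFAC operations above can be reproduced with a few lines of array code. The following sketch (numpy only; the dimensions and variable names are hypothetical illustrations, not part of the paper) builds a rank-$R$ tensor from PARAFAC marginals and computes its mode-1 matricization with the column-major index map of Definition A.2:

```python
import numpy as np

# Illustration only: build a rank-R tensor from PARAFAC marginals and
# matricize it. Dimensions are hypothetical.
rng = np.random.default_rng(0)
I1, I2, I3, R = 4, 3, 5, 2

# PARAFAC(R): B = sum_r beta1[r] o beta2[r] o beta3[r]
beta1 = [rng.standard_normal(I1) for _ in range(R)]
beta2 = [rng.standard_normal(I2) for _ in range(R)]
beta3 = [rng.standard_normal(I3) for _ in range(R)]
B = sum(np.einsum('i,j,k->ijk', beta1[r], beta2[r], beta3[r])
        for r in range(R))

# Mode-1 matricization mat_{R,C} with R = {1}, C = {2,3}: rows index mode 1,
# columns run over modes 2 and 3 with mode 2 varying fastest (column-major),
# matching the index map in Definition A.2.
B_mat1 = B.reshape(I1, I2 * I3, order='F')
print(B_mat1.shape)  # (4, 15)
```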

Lemma A.1 (Contracted product – some properties). Let $\mathcal{X}\in\mathbb{R}^{I_1\times\ldots\times I_N}$ and $\mathcal{Y}\in\mathbb{R}^{J_1\times\ldots\times J_N\times J_{N+1}\times\ldots\times J_{N+P}}$. Let $(S_1,S_2)$ be a partition of $\{1,\ldots,N+P\}$, where $S_1=\{1,\ldots,N\}$ and $S_2=\{N+1,\ldots,N+P\}$. It holds:

(i) if $P=0$ and $I_n=J_n$, $n=1,\ldots,N$, then $\mathcal{X}\,\bar{\times}_N\,\mathcal{Y} = \langle\mathcal{X},\mathcal{Y}\rangle = \operatorname{vec}(\mathcal{X})'\operatorname{vec}(\mathcal{Y})$;

(ii) if $P>0$ and $I_n=J_n$ for $n=1,\ldots,N$, then

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y} = \operatorname{vec}(\mathcal{X})\times_1\mathcal{Y}_{(S_1,S_2)} \in \mathbb{R}^{J_{N+1}\times\ldots\times J_{N+P}}, \qquad \mathcal{Y}\,\bar{\times}_N\,\mathcal{X} = \mathcal{Y}_{(S_1,S_2)}\times_1\operatorname{vec}(\mathcal{X}) \in \mathbb{R}^{J_{N+1}\times\ldots\times J_{N+P}};$$

(iii) let $R=\{1,\ldots,N\}$ and $C=\{N+1,\ldots,2N\}$; if $P=N$ and $I_n=J_n=J_{N+n}$, $n=1,\ldots,N$, then

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y}\,\bar{\times}_N\,\mathcal{X} = \operatorname{vec}(\mathcal{X})'\,\mathbf{Y}_{(R,C)}\operatorname{vec}(\mathcal{X});$$

(iv) let $M=N+P$; then $\mathcal{X}\circ\mathcal{Y} = \underline{\mathcal{X}}\,\bar{\times}_1\,\underline{\mathcal{Y}}^T$, where $\underline{\mathcal{X}}$, $\underline{\mathcal{Y}}$ are $(I_1\times\ldots\times I_N\times 1)$- and $(J_1\times\ldots\times J_M\times 1)$-dimensional tensors, respectively, given by $\underline{\mathcal{X}}_{:,\ldots,:,1}=\mathcal{X}$, $\underline{\mathcal{Y}}_{:,\ldots,:,1}=\mathcal{Y}$, and $\underline{\mathcal{Y}}^T_{j_1,\ldots,j_M,j_{M+1}} = \underline{\mathcal{Y}}_{j_{M+1},j_M,\ldots,j_1}$.

Proof. Case (i). By definition of contracted product and tensor scalar product,

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}\mathcal{X}_{i_1,\ldots,i_N}\mathcal{Y}_{i_1,\ldots,i_N} = \langle\mathcal{X},\mathcal{Y}\rangle = \operatorname{vec}(\mathcal{X})'\operatorname{vec}(\mathcal{Y}).$$

Case (ii). Define $I^* = \prod_{n=1}^{N} I_n$ and $k = 1+\sum_{j=1}^{N}(i_j-1)\prod_{m=1}^{j-1}I_m$. By definition of contracted product and tensor scalar product,

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}\mathcal{X}_{i_1,\ldots,i_N}\mathcal{Y}_{i_1,\ldots,i_N,j_{N+1},\ldots,j_{N+P}} = \sum_{k=1}^{I^*}x_k\,\mathcal{Y}_{k,j_{N+1},\ldots,j_{N+P}}.$$

Note that the one-to-one correspondence established by the mapping between $k$ and $(i_1,\ldots,i_N)$ corresponds to that of the vectorization of a $(I_1\times\ldots\times I_N)$-dimensional tensor. It also corresponds to the mapping established by the tensor reshaping of a $(N+P)$-order tensor with dimensions $I_1,\ldots,I_N,J_{N+1},\ldots,J_{N+P}$ into a $(P+1)$-order tensor with dimensions $I^*,J_{N+1},\ldots,J_{N+P}$. Let $S_1 = \{1,\ldots,N\}$; then

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}\mathcal{X}_{i_1,\ldots,i_N}\mathcal{Y}_{i_1,\ldots,i_N,:,\ldots,:} = \sum_{s_1=1}^{|S_1|}x_{s_1}\bar{\mathcal{Y}}_{s_1,:,\ldots,:},$$

where $\bar{\mathcal{Y}} = \operatorname{reshape}_{(S_1,N+1,\ldots,N+P)}(\mathcal{Y})$. Following the same approach, and defining $S_2 = \{N+1,\ldots,N+P\}$, we obtain the second part of the result.

Case (iii). We follow the same strategy adopted in case (ii). Let $x = \operatorname{vec}(\mathcal{X})$, $S_1 = \{1,\ldots,N\}$ and $S_2 = \{N+1,\ldots,N+P\}$, such that $(S_1,S_2)$ is a partition of $\{1,\ldots,N+P\}$. Let $k,k'$ be defined as in case (ii). Then

$$\mathcal{X}\,\bar{\times}_N\,\mathcal{Y}\,\bar{\times}_N\,\mathcal{X} = \sum_{i_1=1}^{I_1}\cdots\sum_{i_N=1}^{I_N}\sum_{i_1'=1}^{I_1}\cdots\sum_{i_N'=1}^{I_N}\mathcal{X}_{i_1,\ldots,i_N}\,\mathcal{Y}_{i_1,\ldots,i_N,i_1',\ldots,i_N'}\,\mathcal{X}_{i_1',\ldots,i_N'}$$

$$= \sum_{k=1}^{I^*}\sum_{i_1'=1}^{I_1}\cdots\sum_{i_N'=1}^{I_N}x_k\,\mathcal{Y}_{k,i_1',\ldots,i_N'}\,\mathcal{X}_{i_1',\ldots,i_N'} = \sum_{k=1}^{I^*}\sum_{k'=1}^{I^*}x_k\,\mathcal{Y}_{k,k'}\,x_{k'} = \operatorname{vec}(\mathcal{X})'\,\mathcal{Y}_{(S_1,S_2)}\operatorname{vec}(\mathcal{X}).$$

Case (iv). Let $i = (i_1,\ldots,i_N)$ and $j = (j_1,\ldots,j_M)$ be two multi-indices. By the definition of outer and contracted product we get $(\mathcal{X}\circ\mathcal{Y})_{i,j} = \underline{\mathcal{X}}_{i,1}\,\underline{\mathcal{Y}}_{1,j} = (\underline{\mathcal{X}}\,\bar{\times}_1\,\underline{\mathcal{Y}}^T)_{i,j}$. Therefore, with a slight abuse of notation, we use $\mathcal{Y} = \underline{\mathcal{Y}}$ and write $\mathcal{X}\circ\mathcal{Y} = \mathcal{X}\,\bar{\times}_1\,\mathcal{Y}^T$ when the meaning of the products is clear from the context.

Lemma A.2 (Kronecker - matricization). Let $\mathbf{X}_n$ be a $I_n\times I_n$ matrix, for $n=1,\ldots,N$, and let $\mathcal{X} = \mathbf{X}_1\circ\ldots\circ\mathbf{X}_N$ be the $(I_1\times\ldots\times I_N\times I_1\times\ldots\times I_N)$-dimensional tensor obtained as the outer product of the matrices $\mathbf{X}_1,\ldots,\mathbf{X}_N$. Let $(S_1,S_2)$ be a partition of $\{1,\ldots,2N\}$, where $S_1 = \{1,\ldots,N\}$ and $S_2 = \{N+1,\ldots,2N\}$. Then $\mathcal{X}_{(S_1,S_2)} = \mathbf{X}_{(R,C)} = (\mathbf{X}_N\otimes\ldots\otimes\mathbf{X}_1)$.

Proof. Use the pair of indices $(i_n,i_n')$ for the entries of the matrix $\mathbf{X}_n$, $n=1,\ldots,N$. By definition of outer product, $(\mathbf{X}_1\circ\ldots\circ\mathbf{X}_N)_{i_1,\ldots,i_N,i_1',\ldots,i_N'} = (\mathbf{X}_1)_{i_1,i_1'}\cdots(\mathbf{X}_N)_{i_N,i_N'}$. By definition of matricization, $\mathcal{X}_{(S_1,S_2)} = \mathbf{X}_{(R,C)}$. Moreover, $(\mathcal{X}_{(S_1,S_2)})_{h,k} = \mathcal{X}_{i_1,\ldots,i_{2N}}$ with $h = 1+\sum_{p=1}^{N}(i_{S_{1,p}}-1)\prod_{q=1}^{p-1}I_{S_{1,q}}$ and $k = 1+\sum_{p=1}^{N}(i_{S_{2,p}}-1)\prod_{q=1}^{p-1}I_{S_{2,q}}$. By definition of the Kronecker product, the entry $(h',k')$ of $(\mathbf{X}_N\otimes\ldots\otimes\mathbf{X}_1)$ is $(\mathbf{X}_N\otimes\ldots\otimes\mathbf{X}_1)_{h',k'} = (\mathbf{X}_N)_{i_N,i_N'}\cdots(\mathbf{X}_1)_{i_1,i_1'}$, where $h' = 1+\sum_{p=1}^{N}(i_{S_{1,p}}-1)\prod_{q=1}^{p-1}I_{S_{1,q}}$ and $k' = 1+\sum_{p=1}^{N}(i_{S_{2,p}}-1)\prod_{q=1}^{p-1}I_{S_{2,q}}$. Since $h=h'$ and $k=k'$ and the associated elements of $\mathcal{X}_{(S_1,S_2)}$ and $(\mathbf{X}_N\otimes\ldots\otimes\mathbf{X}_1)$ are the same, the result follows.
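Lemma A.2 is easy to verify numerically. The sketch below (an illustration with hypothetical dimensions) checks the case $N=2$, i.e. that the $(S_1,S_2)$ matricization of $\mathbf{X}_1\circ\mathbf{X}_2$ coincides with $\mathbf{X}_2\otimes\mathbf{X}_1$:

```python
import numpy as np

# Numerical check of Lemma A.2 for N = 2 (illustration only).
rng = np.random.default_rng(1)
I1, I2 = 3, 4
X1 = rng.standard_normal((I1, I1))
X2 = rng.standard_normal((I2, I2))

# Outer product as a 4-order tensor T[i1,i2,i1p,i2p] = X1[i1,i1p] X2[i2,i2p],
# with row modes (i1, i2) and column modes (i1p, i2p).
T = np.einsum('ac,bd->abcd', X1, X2)

# Column-major grouping of the row modes and of the column modes
# (mode 1 varying fastest), as in the matricization of Definition A.2.
T_mat = T.reshape(I1 * I2, I1 * I2, order='F')
assert np.allclose(T_mat, np.kron(X2, X1))
```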

Lemma A.3 (Outer product and vectorization). Let $\alpha_1,\ldots,\alpha_n$ be vectors such that $\alpha_i$ has length $d_i$, for $i=1,\ldots,n$. Then, for each $j=1,\ldots,n$, it holds

$$\operatorname{vec}\Big(\mathop{\circ}_{i=1}^{n}\alpha_i\Big) = \bigotimes_{i=1}^{n}\alpha_{n-i+1} = \big(\alpha_n\otimes\ldots\otimes\alpha_{j+1}\otimes I_{d_j}\otimes\alpha_{j-1}\otimes\ldots\otimes\alpha_1\big)\,\alpha_j.$$

Proof. The result follows from the definitions of the vectorization operator and of the outer product. For $n=2$, the result follows directly from

$$\operatorname{vec}(\alpha_1\circ\alpha_2) = \operatorname{vec}(\alpha_1\alpha_2') = \alpha_2\otimes\alpha_1 = (\alpha_2\otimes I_{d_1})\,\alpha_1 = (I_{d_2}\otimes\alpha_1)\,\alpha_2.$$

For $n>2$ consider, without loss of generality, $n=3$ (an analogous proof holds for $n>3$). Then, from the definitions of outer product and Kronecker product we have

$$\operatorname{vec}(\alpha_1\circ\alpha_2\circ\alpha_3) = \big(\alpha_1'\alpha_{2,1}\alpha_{3,1},\;\ldots,\;\alpha_1'\alpha_{2,d_2}\alpha_{3,1},\;\alpha_1'\alpha_{2,1}\alpha_{3,2},\;\ldots,\;\alpha_1'\alpha_{2,d_2}\alpha_{3,2},\;\ldots,\;\alpha_1'\alpha_{2,d_2}\alpha_{3,d_3}\big)'$$

$$= \alpha_3\otimes\alpha_2\otimes\alpha_1 = (\alpha_3\otimes\alpha_2\otimes I_{d_1})\,\alpha_1 = (\alpha_3\otimes I_{d_2}\otimes\alpha_1)\,\alpha_2 = (I_{d_3}\otimes\alpha_2\otimes\alpha_1)\,\alpha_3.$$
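The identities of Lemma A.3 can likewise be checked numerically; the sketch below verifies the $n=3$ case used repeatedly in Appendix B (all arrays are illustrative):

```python
import numpy as np

# Numerical check of Lemma A.3 for n = 3 (illustration only): the
# column-major vectorization of an outer product equals the reversed
# Kronecker product, and admits the factorized forms used in Appendix B.
rng = np.random.default_rng(2)
a1 = rng.standard_normal(2)
a2 = rng.standard_normal(3)
a3 = rng.standard_normal(4)

T = np.einsum('i,j,k->ijk', a1, a2, a3)   # a1 o a2 o a3
vecT = T.flatten(order='F')               # column-major vec

assert np.allclose(vecT, np.kron(a3, np.kron(a2, a1)))
c = lambda v: v[:, None]                  # column-vector helper
assert np.allclose(vecT, np.kron(c(a3), np.kron(c(a2), np.eye(2))) @ a1)
assert np.allclose(vecT, np.kron(c(a3), np.kron(np.eye(3), c(a1))) @ a2)
assert np.allclose(vecT, np.kron(np.eye(4), np.kron(c(a2), c(a1))) @ a3)
```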

Proof of Proposition 2.1. Denote with $L$ the lag operator, such that $L\mathcal{Y}_t = \mathcal{Y}_{t-1}$. By the properties of the contracted product in Lemma A.1, case (iv), we get $(\mathcal{I}-\tilde{\mathcal{A}}_1 L)\,\bar{\times}_N\,\mathcal{Y}_t = \tilde{\mathcal{A}}_0 + \tilde{\mathcal{B}}\,\bar{\times}_M\,\mathcal{X}_t + \mathcal{E}_t$. We apply to both sides the operator $(\mathcal{I}+\tilde{\mathcal{A}}_1 L+\tilde{\mathcal{A}}_1^2L^2+\ldots+\tilde{\mathcal{A}}_1^{t-1}L^{t-1})$, take $t\to\infty$, and get

$$\lim_{t\to\infty}\big(\mathcal{I}-\tilde{\mathcal{A}}_1^tL^t\big)\,\bar{\times}_N\,\mathcal{Y}_t = \Big(\sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^kL^k\Big)\,\bar{\times}_N\,\big(\tilde{\mathcal{A}}_0+\tilde{\mathcal{B}}\,\bar{\times}_M\,\mathcal{X}_t+\mathcal{E}_t\big).$$

From [12], if $\rho(\tilde{\mathcal{A}}_1)<1$ and $\mathcal{Y}_0$ is finite a.s., then $\lim_{t\to\infty}\tilde{\mathcal{A}}_1^t\,\bar{\times}_N\,\mathcal{Y}_0 = \mathcal{O}$, and the operator $\sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^kL^k$ applied to a sequence such that $|\mathcal{Y}_{i,t}|<c$ a.s. $\forall i$ converges to the inverse operator $(\mathcal{I}-\tilde{\mathcal{A}}_1L)^{-1}$. By the properties of the contracted product we get

$$\mathcal{Y}_t = \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^k\,\bar{\times}_N\,(L^k\tilde{\mathcal{A}}_0) + \sum_{k=0}^{\infty}\big(\tilde{\mathcal{A}}_1^k\,\bar{\times}_N\,\tilde{\mathcal{B}}\big)\,\bar{\times}_M\,(L^k\mathcal{X}_t) + \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^k\,\bar{\times}_N\,(L^k\mathcal{E}_t)$$

$$= (\mathcal{I}-\tilde{\mathcal{A}}_1L)^{-1}\,\bar{\times}_N\,\tilde{\mathcal{A}}_0 + \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^k\,\bar{\times}_N\,\tilde{\mathcal{B}}\,\bar{\times}_M\,\mathcal{X}_{t-k} + \sum_{k=0}^{\infty}\tilde{\mathcal{A}}_1^k\,\bar{\times}_N\,\mathcal{E}_{t-k}.$$

From the assumption $\mathcal{E}_t\overset{iid}{\sim}\mathcal{N}_{I_1,\ldots,I_N}(\mathcal{O},\Sigma_1,\ldots,\Sigma_N)$, we know that $E(\mathcal{Y}_t) = \mathcal{Y}_0$, which is finite. Consider the auto-covariance at lag $h\geq 1$. From Lemma A.1, we have

$$E\big[\big(\mathcal{Y}_t-E(\mathcal{Y}_t)\big)\circ\big(\mathcal{Y}_{t-h}-E(\mathcal{Y}_{t-h})\big)\big] = E\big[\mathcal{Y}_t\circ\mathcal{Y}_{t-h}\big] = E\big[\mathcal{Y}_t\,\bar{\times}_1\,\mathcal{Y}_{t-h}^T\big].$$

Using the infinite moving average representation for $\mathcal{Y}_t$ and writing $\mathcal{A} = \tilde{\mathcal{A}}_1$, we get

$$E\big[\mathcal{Y}_t\,\bar{\times}_1\,\mathcal{Y}_{t-h}^T\big] = E\Big[\Big(\sum_{k=0}^{h-1}\mathcal{A}^k\,\bar{\times}_N\,\mathcal{E}_{t-k}+\sum_{k=0}^{\infty}\mathcal{A}^{k+h}\,\bar{\times}_N\,\mathcal{E}_{t-k-h}\Big)\,\bar{\times}_1\,\Big(\sum_{k=0}^{\infty}\mathcal{A}^k\,\bar{\times}_N\,\mathcal{E}_{t-k-h}\Big)^T\Big]$$

$$= E\Big[\sum_{k=0}^{\infty}\mathcal{A}^{k+h}\,\bar{\times}_N\,\mathcal{E}_{t-k-h}\,\bar{\times}_1\,\sum_{k=0}^{\infty}\mathcal{E}_{t-k-h}^T\,\bar{\times}_N\,(\mathcal{A}^T)^k\Big],$$

where we used the assumption of independence of $\mathcal{E}_t$ and $\mathcal{E}_{t-h}$ for $h\neq 0$, and the fact that $(\mathcal{X}\,\bar{\times}_N\,\mathcal{Y})^T = (\mathcal{Y}^T\,\bar{\times}_N\,\mathcal{X}^T)$. Using $E(\mathcal{E}_t)=\mathcal{O}$ and the linearity of the expectation and of the contracted product we get

$$E\big[\mathcal{Y}_t\,\bar{\times}_1\,\mathcal{Y}_{t-h}^T\big] = \sum_{k=0}^{\infty}\mathcal{A}^{k+h}\,\bar{\times}_N\,E\big[\mathcal{E}_{t-k-h}\,\bar{\times}_1\,\mathcal{E}_{t-k-h}^T\big]\,\bar{\times}_N\,(\mathcal{A}^T)^k$$

$$= \sum_{k=0}^{\infty}\mathcal{A}^{k+h}\,\bar{\times}_N\,\Sigma\,\bar{\times}_N\,(\mathcal{A}^T)^k = \mathcal{A}^h\,\bar{\times}_N\,\big(\mathcal{I}-\mathcal{A}\,\bar{\times}_N\,\Sigma\,\bar{\times}_N\,\mathcal{A}^T\big)^{-1},$$

where $E(\mathcal{E}_{t-k-h}\,\bar{\times}_1\,\mathcal{E}_{t-k-h}^T) = E(\mathcal{E}_{t-k-h}\circ\mathcal{E}_{t-k-h}) = \Sigma = \Sigma_1\circ\ldots\circ\Sigma_N$. From the assumption $\rho(\mathcal{A})<1$ it follows that the above series converges to a finite limit, which is independent of $t$, thus proving that the process is weakly stationary.

Proof of Proposition 2.2. From Theorem 3.2 and Corollary 3.3 of [15], we know that $\mathcal{T}$ is a group (called the tensor group) and that the matricization operator $\operatorname{mat}_{1:N,1:N}$ is an isomorphism between $\mathcal{T}$ and the linear group of square matrices of size $I^* = \prod_{n=1}^{N}I_n$. Therefore, there exists a one-to-one relationship between the two eigenvalue problems $\mathcal{A}\,\bar{\times}_N\,\mathcal{X} = \lambda\mathcal{X}$ and $\mathbf{A}\mathbf{x} = \tilde{\lambda}\mathbf{x}$, where $\mathbf{A} = \operatorname{mat}_{1:N,1:N}(\mathcal{A})$. In particular, $\lambda = \tilde{\lambda}$ and $\mathbf{x} = \operatorname{vec}(\mathcal{X})$. Consequently, $\rho(\mathbf{A}) = \rho(\mathcal{A})$ and the result follows for $p=1$ from the fact that $\rho(\mathbf{A})<1$ is a sufficient condition for VAR(1) stationarity (Proposition 2.1 of [51]). Since any VAR($p$) and ART($p$) process can be rewritten as a VAR(1) and ART(1) process, respectively, on an augmented state space, the result follows for any $p\geq 1$.

Proof of Lemma 2.1. Consider an ART($p$) process with $\mathcal{Y}_t\in\mathbb{R}^{I_1\times\ldots\times I_N}$ and $p\geq 1$. We define the $(pI_1\times I_2\times\ldots\times I_N)$-dimensional tensors $\underline{\mathcal{Y}}_t$ and $\underline{\mathcal{E}}_t$ by $\underline{\mathcal{Y}}_{(k-1)I_1+1:kI_1,:,\ldots,:,t} = \mathcal{Y}_{t-k+1}$ and $\underline{\mathcal{E}}_{(k-1)I_1+1:kI_1,:,\ldots,:,t} = \mathcal{E}_{t-k+1}$, for $k=1,\ldots,p$, respectively. Define the $(pI_1\times I_2\times\ldots\times I_N\times pI_1\times I_2\times\ldots\times I_N)$-dimensional tensor $\underline{\mathcal{A}}$ by $\underline{\mathcal{A}}_{1:I_1,:,\ldots,:,(k-1)I_1+1:kI_1,:,\ldots,:} = \mathcal{A}_k$, for $k=1,\ldots,p$, $\underline{\mathcal{A}}_{kI_1+1:(k+1)I_1,:,\ldots,:,(k-1)I_1+1:kI_1,:,\ldots,:} = \mathcal{I}$, for $k=1,\ldots,p-1$, and $\mathcal{O}$ elsewhere. Using this notation, we can rewrite the $(I_1\times I_2\times\ldots\times I_N)$-dimensional ART($p$) process $\mathcal{Y}_t = \sum_{k=1}^{p}\mathcal{A}_k\,\bar{\times}_N\,\mathcal{Y}_{t-k}+\mathcal{E}_t$ as the $(pI_1\times I_2\times\ldots\times I_N)$-dimensional ART(1) process $\underline{\mathcal{Y}}_t = \underline{\mathcal{A}}\,\bar{\times}_N\,\underline{\mathcal{Y}}_{t-1}+\underline{\mathcal{E}}_t$.
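As a practical consequence of Proposition 2.2, stationarity of an ART(1) process can be checked numerically by matricizing the coefficient tensor and computing the spectral radius of the resulting square matrix. A minimal sketch with hypothetical dimensions (not the authors' code):

```python
import numpy as np

# Stationarity check for an ART(1) process (illustration): matricize the
# (I1 x I2 x I1 x I2) coefficient tensor A into a square matrix and verify
# that its spectral radius is below one (Proposition 2.2).
rng = np.random.default_rng(3)
I1, I2 = 3, 4
A = 0.1 * rng.standard_normal((I1, I2, I1, I2))

# Group the first two modes into rows and the last two into columns,
# column-major within each group, as in the matricization mat_{1:N,1:N}.
A_mat = A.reshape(I1 * I2, I1 * I2, order='F')
rho = np.max(np.abs(np.linalg.eigvals(A_mat)))
print(f"spectral radius = {rho:.3f}, stationary: {rho < 1}")
```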

B Computational Details

This appendix contains the derivations of the posterior full conditional distributions. See the supplement for further details.

B.1 Full conditional distribution of φr

Define $C_r = \sum_{j=1}^{J}\beta_j^{(r)\prime}W_{j,r}^{-1}\beta_j^{(r)}$ and note that, since $\sum_{r=1}^{R}\phi_r = 1$, it holds $\sum_{r=1}^{R}\tau\phi_r = \tau$. The posterior full conditional distribution of $\phi$, integrating out $\tau$, is

$$p(\phi\,|\,\mathcal{B},W) \propto \pi(\phi)\int_0^{+\infty}p(\mathcal{B}\,|\,W,\phi,\tau)\,\pi(\tau)\,d\tau$$

$$\propto \prod_{r=1}^{R}\phi_r^{\alpha-1}\int_0^{+\infty}\Big[\prod_{r=1}^{R}\prod_{j=1}^{J}(\tau\phi_r)^{-I_j/2}\exp\Big(-\frac{1}{2\tau\phi_r}\beta_j^{(r)\prime}W_{j,r}^{-1}\beta_j^{(r)}\Big)\Big]\,\tau^{a_\tau-1}e^{-b_\tau\tau}\,d\tau$$

$$\propto \int_0^{+\infty}\prod_{r=1}^{R}\phi_r^{\alpha-\frac{I_0}{2}-1}\;\tau^{\alpha R-\frac{RI_0}{2}-1}\exp\Big(-\sum_{r=1}^{R}\Big(\frac{C_r}{2\tau\phi_r}+b_\tau\tau\phi_r\Big)\Big)\,d\tau,$$

where $I_0 = \sum_{j=1}^{J}I_j$ and the integrand is the kernel of the GiG for $\psi_r = \tau\phi_r$ in eq. (23). Then, by renormalizing, $\phi_r = \psi_r/\sum_{l=1}^{R}\psi_l$.

B.2 Full conditional distribution of τ

The posterior full conditional distribution of τ is

$$p(\tau\,|\,\mathcal{B},W,\phi) \propto \tau^{a_\tau-1}e^{-b_\tau\tau}\prod_{r=1}^{R}(\tau\phi_r)^{-\frac{I_0}{2}}\exp\Big(-\frac{1}{2\tau\phi_r}\sum_{j=1}^{J}\beta_j^{(r)\prime}(W_{j,r})^{-1}\beta_j^{(r)}\Big)$$

$$\propto \tau^{a_\tau-\frac{RI_0}{2}-1}\exp\Big(-b_\tau\tau-\tau^{-1}\sum_{r=1}^{R}\frac{C_r}{2\phi_r}\Big),$$

which is the kernel of the GiG in eq. (24).
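The draws implied by Sections B.1-B.2 can be implemented with a standard GiG sampler. The sketch below is one possible implementation, not the paper's code: GiG$(p,a,b)$ here denotes the kernel $x^{p-1}\exp(-(ax+b/x)/2)$, and all inputs (the quadratic forms $C_r$, the hyperparameters $\alpha$, $I_0$, $b_\tau$) are placeholder assumptions of the illustration.

```python
import numpy as np
from scipy.stats import geninvgauss

# Hedged sketch of the Gibbs step for (phi, tau) via psi_r = tau * phi_r.
# scipy's geninvgauss(p, b*) has kernel x^{p-1} exp(-b*(x + 1/x)/2); the
# GiG(p, a, b) kernel above maps to b* = sqrt(a b), scale = sqrt(b / a).
def draw_phi_tau(C, alpha, I0, b_tau, rng):
    """C[r] = sum_j beta_j(r)' W_{j,r}^{-1} beta_j(r); returns (phi, tau)."""
    p = alpha - I0 / 2.0
    psi = np.array([
        geninvgauss.rvs(p, np.sqrt(2.0 * b_tau * Cr),
                        scale=np.sqrt(Cr / (2.0 * b_tau)), random_state=rng)
        for Cr in C
    ])
    tau = psi.sum()              # since sum_r tau * phi_r = tau
    return psi / tau, tau

rng = np.random.default_rng(4)
phi, tau = draw_phi_tau(C=np.array([0.8, 1.3, 0.5]), alpha=0.7,
                        I0=30, b_tau=1.0, rng=rng)
```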

B.3 Full conditional distribution of λj,r

The full conditional distribution of λj,r, integrating out Wj,r, is

$$p(\lambda_{j,r}\,|\,\beta_j^{(r)},\phi_r,\tau) \propto \lambda_{j,r}^{a_\lambda-1}e^{-b_\lambda\lambda_{j,r}}\prod_{p=1}^{I_j}\frac{\lambda_{j,r}}{2\sqrt{\tau\phi_r}}\exp\Big(-\lambda_{j,r}\frac{|\beta_{j,p}^{(r)}|}{\sqrt{\tau\phi_r}}\Big)$$

$$\propto \lambda_{j,r}^{(a_\lambda+I_j)-1}\exp\Big(-\Big(b_\lambda+\frac{\|\beta_j^{(r)}\|_1}{\sqrt{\tau\phi_r}}\Big)\lambda_{j,r}\Big),$$

which is the kernel of the Gamma in eq. (26).

B.4 Full conditional distribution of wj,r,p

The posterior full conditional distribution of wj,r,p is

$$p(w_{j,r,p}\,|\,\beta_j^{(r)},\lambda_{j,r},\phi_r,\tau) \propto w_{j,r,p}^{-\frac{1}{2}}\exp\Big(-\frac{\beta_{j,p}^{(r)2}w_{j,r,p}^{-1}}{2\tau\phi_r}\Big)\exp\Big(-\frac{\lambda_{j,r}^2 w_{j,r,p}}{2}\Big)$$

$$\propto w_{j,r,p}^{-\frac{1}{2}}\exp\Big(-\frac{\lambda_{j,r}^2}{2}w_{j,r,p}-\frac{\beta_{j,p}^{(r)2}}{2\tau\phi_r}w_{j,r,p}^{-1}\Big),$$

which is the kernel of the GiG in eq. (26).

B.5 Full conditional distributions of PARAFAC marginals

Consider the model in eq. (18); it holds

$$\operatorname{vec}(\mathcal{Y}_t) = \operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)+\operatorname{vec}(\mathcal{B}_r\times_4 x_t)+\operatorname{vec}(\mathcal{E}_t),$$

with $\operatorname{vec}(\mathcal{B}_r\times_4 x_t) = \operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\cdot x_t'\beta_4^{(r)}\big)$. From Lemma A.3, we have

$$\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\cdot x_t'\beta_4^{(r)}\big) = \operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)\,x_t'\,\beta_4^{(r)} = b_4\,\beta_4^{(r)} \quad\text{(B.1)}$$

$$= \langle\beta_4^{(r)},x_t\rangle\,\big(\beta_3^{(r)}\otimes\beta_2^{(r)}\otimes I_{I_1}\big)\,\beta_1^{(r)} = b_1\,\beta_1^{(r)} \quad\text{(B.2)}$$

$$= \langle\beta_4^{(r)},x_t\rangle\,\big(\beta_3^{(r)}\otimes I_{I_2}\otimes\beta_1^{(r)}\big)\,\beta_2^{(r)} = b_2\,\beta_2^{(r)} \quad\text{(B.3)}$$

$$= \langle\beta_4^{(r)},x_t\rangle\,\big(I_{I_3}\otimes\beta_2^{(r)}\otimes\beta_1^{(r)}\big)\,\beta_3^{(r)} = b_3\,\beta_3^{(r)}. \quad\text{(B.4)}$$

Define $y_t = \operatorname{vec}(\mathcal{Y}_t)$ and $\Sigma^{-1} = \Sigma_3^{-1}\otimes\Sigma_2^{-1}\otimes\Sigma_1^{-1}$; we obtain

$$L(Y\,|\,\theta) \propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\operatorname{vec}(\tilde{\mathcal{E}}_t)'\,\big(\Sigma_3^{-1}\otimes\Sigma_2^{-1}\otimes\Sigma_1^{-1}\big)\operatorname{vec}(\tilde{\mathcal{E}}_t)\Big)$$

$$\propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\Big[-2\big(y_t-\operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)\big)'\,\Sigma^{-1}\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)\,\langle\beta_4^{(r)},x_t\rangle + \langle\beta_4^{(r)},x_t\rangle\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)'\,\Sigma^{-1}\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)\,\langle\beta_4^{(r)},x_t\rangle\Big]\Big). \quad\text{(B.5)}$$

Consider the case $j=1$. By exploiting eq. (B.2) we get

$$L(Y\,|\,\theta) \propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\Big[\beta_1^{(r)\prime}\langle\beta_4^{(r)},x_t\rangle^2\big(\beta_3^{(r)}\otimes\beta_2^{(r)}\otimes I_{I_1}\big)'\,\Sigma^{-1}\big(\beta_3^{(r)}\otimes\beta_2^{(r)}\otimes I_{I_1}\big)\beta_1^{(r)} - 2\big(y_t-\operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)\big)'\,\Sigma^{-1}\langle\beta_4^{(r)},x_t\rangle\big(\beta_3^{(r)}\otimes\beta_2^{(r)}\otimes I_{I_1}\big)\beta_1^{(r)}\Big]\Big) = \exp\Big(-\frac{1}{2}\big(\beta_1^{(r)\prime}S_1^L\beta_1^{(r)}-2m_1^L\beta_1^{(r)}\big)\Big). \quad\text{(B.6)}$$

Consider the case $j=2$. From eq. (B.3) we get

$$L(Y\,|\,\theta) \propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\Big[\beta_2^{(r)\prime}\langle\beta_4^{(r)},x_t\rangle^2\big(\beta_3^{(r)}\otimes I_{I_2}\otimes\beta_1^{(r)}\big)'\,\Sigma^{-1}\big(\beta_3^{(r)}\otimes I_{I_2}\otimes\beta_1^{(r)}\big)\beta_2^{(r)} - 2\big(y_t-\operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)\big)'\,\Sigma^{-1}\langle\beta_4^{(r)},x_t\rangle\big(\beta_3^{(r)}\otimes I_{I_2}\otimes\beta_1^{(r)}\big)\beta_2^{(r)}\Big]\Big) = \exp\Big(-\frac{1}{2}\big(\beta_2^{(r)\prime}S_2^L\beta_2^{(r)}-2m_2^L\beta_2^{(r)}\big)\Big). \quad\text{(B.7)}$$

Consider the case $j=3$. By exploiting eq. (B.4) we get

$$L(Y\,|\,\theta) \propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\Big[\beta_3^{(r)\prime}\langle\beta_4^{(r)},x_t\rangle^2\big(I_{I_3}\otimes\beta_2^{(r)}\otimes\beta_1^{(r)}\big)'\,\Sigma^{-1}\big(I_{I_3}\otimes\beta_2^{(r)}\otimes\beta_1^{(r)}\big)\beta_3^{(r)} - 2\big(y_t-\operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)\big)'\,\Sigma^{-1}\langle\beta_4^{(r)},x_t\rangle\big(I_{I_3}\otimes\beta_2^{(r)}\otimes\beta_1^{(r)}\big)\beta_3^{(r)}\Big]\Big) = \exp\Big(-\frac{1}{2}\big(\beta_3^{(r)\prime}S_3^L\beta_3^{(r)}-2m_3^L\beta_3^{(r)}\big)\Big). \quad\text{(B.8)}$$

Finally, consider the case $j=4$. From eq. (B.5) we get

$$L(Y\,|\,\theta) \propto \exp\Big(-\frac{1}{2}\sum_{t=1}^{T}\Big[-2\big(y_t-\operatorname{vec}(\mathcal{B}_{-r}\times_4 x_t)\big)'\,\Sigma^{-1}\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)\,x_t'\beta_4^{(r)} + \beta_4^{(r)\prime}x_t\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)'\,\Sigma^{-1}\operatorname{vec}\big(\beta_1^{(r)}\circ\beta_2^{(r)}\circ\beta_3^{(r)}\big)\,x_t'\beta_4^{(r)}\Big]\Big) = \exp\Big(-\frac{1}{2}\big(\beta_4^{(r)\prime}S_4^L\beta_4^{(r)}-2m_4^L\beta_4^{(r)}\big)\Big). \quad\text{(B.9)}$$

B.5.1 Full conditional distribution of $\beta_1^{(r)}$

From eq. (20) and (B.6), the posterior full conditional distribution of $\beta_1^{(r)}$ is

$$p(\beta_1^{(r)}\,|\,-) \propto \exp\Big(-\frac{1}{2}\big(\beta_1^{(r)\prime}S_1^L\beta_1^{(r)}-2m_1^L\beta_1^{(r)}\big)\Big)\cdot\exp\Big(-\frac{1}{2}\beta_1^{(r)\prime}(W_{1,r}\phi_r\tau)^{-1}\beta_1^{(r)}\Big) = \exp\Big(-\frac{1}{2}\Big(\beta_1^{(r)\prime}\big(S_1^L+(W_{1,r}\phi_r\tau)^{-1}\big)\beta_1^{(r)}-2m_1^L\beta_1^{(r)}\Big)\Big),$$

which is the kernel of the Normal in eq. (27).
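Given $S_1^L$ and $m_1^L$, this update is a standard conjugate Normal draw. A hedged sketch (placeholder inputs; not the authors' implementation):

```python
import numpy as np

# Hedged sketch of the conjugate update for beta_1^(r) (Section B.5.1):
# combine the likelihood precision S1_L and cross term m1_L with the prior
# precision (W_{1,r} * phi_r * tau)^{-1}, then draw from the resulting
# Normal. Inputs are hypothetical placeholders.
def draw_beta1(S1_L, m1_L, W1r, phi_r, tau, rng):
    prec = S1_L + np.linalg.inv(W1r * phi_r * tau)   # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ m1_L                                 # posterior mean
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(5)
I1 = 3
beta1 = draw_beta1(S1_L=2.0 * np.eye(I1), m1_L=np.ones(I1),
                   W1r=np.eye(I1), phi_r=0.3, tau=2.0, rng=rng)
```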

B.5.2 Full conditional distribution of $\beta_2^{(r)}$

From eq. (20) and (B.7), the posterior full conditional distribution of $\beta_2^{(r)}$ is

$$p(\beta_2^{(r)}\,|\,-) \propto \exp\Big(-\frac{1}{2}\big(\beta_2^{(r)\prime}S_2^L\beta_2^{(r)}-2m_2^L\beta_2^{(r)}\big)\Big)\cdot\exp\Big(-\frac{1}{2}\beta_2^{(r)\prime}(W_{2,r}\phi_r\tau)^{-1}\beta_2^{(r)}\Big) = \exp\Big(-\frac{1}{2}\Big(\beta_2^{(r)\prime}\big(S_2^L+(W_{2,r}\phi_r\tau)^{-1}\big)\beta_2^{(r)}-2m_2^L\beta_2^{(r)}\Big)\Big),$$

which is the kernel of the Normal in eq. (27).

B.5.3 Full conditional distribution of $\beta_3^{(r)}$

From eq. (20) and (B.8), the posterior full conditional distribution of $\beta_3^{(r)}$ is

$$p(\beta_3^{(r)}\,|\,-) \propto \exp\Big(-\frac{1}{2}\big(\beta_3^{(r)\prime}S_3^L\beta_3^{(r)}-2m_3^L\beta_3^{(r)}\big)\Big)\cdot\exp\Big(-\frac{1}{2}\beta_3^{(r)\prime}(W_{3,r}\phi_r\tau)^{-1}\beta_3^{(r)}\Big) = \exp\Big(-\frac{1}{2}\Big(\beta_3^{(r)\prime}\big(S_3^L+(W_{3,r}\phi_r\tau)^{-1}\big)\beta_3^{(r)}-2m_3^L\beta_3^{(r)}\Big)\Big),$$

which is the kernel of the Normal in eq. (27).

B.5.4 Full conditional distribution of $\beta_4^{(r)}$

From eq. (20) and (B.9), the posterior full conditional distribution of $\beta_4^{(r)}$ is

$$p(\beta_4^{(r)}\,|\,-) \propto \exp\Big(-\frac{1}{2}\big(\beta_4^{(r)\prime}S_4^L\beta_4^{(r)}-2m_4^L\beta_4^{(r)}\big)\Big)\cdot\exp\Big(-\frac{1}{2}\beta_4^{(r)\prime}(W_{4,r}\phi_r\tau)^{-1}\beta_4^{(r)}\Big) = \exp\Big(-\frac{1}{2}\Big(\beta_4^{(r)\prime}\big(S_4^L+(W_{4,r}\phi_r\tau)^{-1}\big)\beta_4^{(r)}-2m_4^L\beta_4^{(r)}\Big)\Big),$$

which is the kernel of the Normal in eq. (27).

B.6 Full conditional distribution of Σ1

Define $\tilde{\mathcal{E}}_t = \mathcal{Y}_t-\mathcal{B}\times_4 x_t$, $\tilde{E}_{(1),t} = \operatorname{mat}_{(1)}(\tilde{\mathcal{E}}_t)$, $Z_1 = \Sigma_3^{-1}\otimes\Sigma_2^{-1}$ and $S_1 = \sum_{t=1}^{T}\tilde{E}_{(1),t}Z_1\tilde{E}_{(1),t}'$. The posterior full conditional distribution of $\Sigma_1$ is

$$p(\Sigma_1\,|\,-) \propto |\Sigma_1|^{-\frac{\nu_1+I_1+TI_2I_3+1}{2}}\exp\Big(-\frac{1}{2}\Big[\operatorname{tr}\big(\gamma\Psi_1\Sigma_1^{-1}\big)+\sum_{t=1}^{T}\operatorname{tr}\big(\tilde{E}_{(1),t}Z_1\tilde{E}_{(1),t}'\Sigma_1^{-1}\big)\Big]\Big)$$

$$\propto |\Sigma_1|^{-\frac{(\nu_1+TI_2I_3)+I_1+1}{2}}\exp\Big(-\frac{1}{2}\operatorname{tr}\big((\gamma\Psi_1+S_1)\Sigma_1^{-1}\big)\Big),$$

which is the kernel of the Inverse Wishart in eq. (28).
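The $\Sigma_1$ update is a conjugate inverse-Wishart draw once the scatter matrix $S_1$ is accumulated. A possible sketch, with hypothetical names and dimensions (not the authors' code):

```python
import numpy as np
from scipy.stats import invwishart

# Hedged sketch of the Sigma_1 update (Section B.6): accumulate the mode-1
# scatter matrix S1 of the residual tensors and draw from the conjugate
# inverse Wishart. All inputs are placeholder assumptions.
def draw_sigma1(E_tilde, Sigma2, Sigma3, nu1, gamma, Psi1, rng):
    """E_tilde: array (T, I1, I2, I3) of residuals Y_t - B x_4 x_t."""
    T, I1, I2, I3 = E_tilde.shape
    Z1 = np.kron(np.linalg.inv(Sigma3), np.linalg.inv(Sigma2))
    S1 = np.zeros((I1, I1))
    for t in range(T):
        # mode-1 matricization, columns ordered with mode 2 fastest
        E1 = E_tilde[t].reshape(I1, I2 * I3, order='F')
        S1 += E1 @ Z1 @ E1.T
    return invwishart.rvs(df=nu1 + T * I2 * I3,
                          scale=gamma * Psi1 + S1, random_state=rng)
```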

B.7 Full conditional distribution of Σ2

Define $\tilde{\mathcal{E}}_t = \mathcal{Y}_t-\mathcal{B}\times_4 x_t$, $\tilde{E}_{(2),t} = \operatorname{mat}_{(2)}(\tilde{\mathcal{E}}_t)$ and $S_2 = \sum_{t=1}^{T}\tilde{E}_{(2),t}\big(\Sigma_3^{-1}\otimes\Sigma_1^{-1}\big)\tilde{E}_{(2),t}'$. The posterior full conditional distribution of $\Sigma_2$ is

$$p(\Sigma_2\,|\,-) \propto |\Sigma_2|^{-\frac{\nu_2+I_2+TI_1I_3+1}{2}}\exp\Big(-\frac{1}{2}\Big[\operatorname{tr}\big(\gamma\Psi_2\Sigma_2^{-1}\big)+\operatorname{tr}\Big(\sum_{t=1}^{T}\tilde{E}_{(2),t}\big(\Sigma_3^{-1}\otimes\Sigma_1^{-1}\big)\tilde{E}_{(2),t}'\Sigma_2^{-1}\Big)\Big]\Big)$$

$$\propto |\Sigma_2|^{-\frac{\nu_2+I_2+TI_1I_3+1}{2}}\exp\Big(-\frac{1}{2}\operatorname{tr}\big(\gamma\Psi_2\Sigma_2^{-1}+S_2\Sigma_2^{-1}\big)\Big),$$

which is the kernel of the Inverse Wishart in eq. (28).

B.8 Full conditional distribution of Σ3

Define $\tilde{\mathcal{E}}_t = \mathcal{Y}_t-\mathcal{B}\times_4 x_t$, $\tilde{E}_{(3),t} = \operatorname{mat}_{(3)}(\tilde{\mathcal{E}}_t)$, $Z_3 = \Sigma_2^{-1}\otimes\Sigma_1^{-1}$ and $S_3 = \sum_{t=1}^{T}\tilde{E}_{(3),t}Z_3\tilde{E}_{(3),t}'$. The posterior full conditional distribution of $\Sigma_3$ is

$$p(\Sigma_3\,|\,-) \propto |\Sigma_3|^{-\frac{\nu_3+I_3+TI_1I_2+1}{2}}\exp\Big(-\frac{1}{2}\Big[\operatorname{tr}\big(\gamma\Psi_3\Sigma_3^{-1}\big)+\sum_{t=1}^{T}\operatorname{vec}(\tilde{\mathcal{E}}_t)'\big(\Sigma_3^{-1}\otimes Z_3\big)\operatorname{vec}(\tilde{\mathcal{E}}_t)\Big]\Big)$$

$$\propto |\Sigma_3|^{-\frac{(\nu_3+TI_1I_2)+I_3+1}{2}}\exp\Big(-\frac{1}{2}\operatorname{tr}\big((\gamma\Psi_3+S_3)\Sigma_3^{-1}\big)\Big),$$

which is the kernel of the Inverse Wishart in eq. (28).

B.9 Full conditional distribution of γ

The posterior full conditional distribution is

$$p(\gamma\,|\,\Sigma_1,\Sigma_2,\Sigma_3) \propto \prod_{i=1}^{3}|\gamma\Psi_i|^{\frac{\nu_i}{2}}\exp\Big(-\frac{1}{2}\operatorname{tr}\big(\gamma\Psi_i\Sigma_i^{-1}\big)\Big)\cdot\gamma^{a_\gamma-1}e^{-b_\gamma\gamma}$$

$$\propto \gamma^{a_\gamma+\sum_{i=1}^{3}\frac{\nu_i I_i}{2}-1}\exp\Big(-\gamma\Big(\frac{1}{2}\sum_{i=1}^{3}\operatorname{tr}\big(\Psi_i\Sigma_i^{-1}\big)+b_\gamma\Big)\Big),$$

which is the kernel of the Gamma in eq. (29).
