
Stat Comput DOI 10.1007/s11222-011-9289-1

Sequential parameter learning and filtering in structured autoregressive state-space models

Raquel Prado · Hedibert F. Lopes

Received: 11 January 2011 / Accepted: 7 September 2011 © Springer Science+Business Media, LLC 2011

Abstract  We present particle-based algorithms for sequential filtering and parameter learning in state-space autoregressive (AR) models with structured priors. Non-conjugate priors are specified on the AR coefficients at the system level by imposing uniform or truncated normal priors on the moduli and wavelengths of the reciprocal roots of the AR characteristic polynomial. Sequential Monte Carlo algorithms are considered and implemented for on-line filtering and parameter learning within this modeling framework. More specifically, three SMC approaches are considered and compared by applying them to data simulated from different state-space AR models. An analysis of a human electroencephalogram signal is also presented to illustrate the use of the structured state-space AR models in describing biomedical signals.

Keywords  State-space autoregressions · Structured priors · Sequential filtering and parameter learning

R. Prado
Department of Applied Mathematics and Statistics, Baskin School of Engineering, University of California Santa Cruz, 1156 High Street, Santa Cruz, CA 95064, USA
e-mail: [email protected]

H.F. Lopes
University of Chicago Booth School of Business, 5807 South Woodlawn Avenue, Chicago, IL 60637, USA
e-mail: [email protected]

1 Introduction

We consider models of the form yt = xt + εt, with xt an AR(p) process with coefficients φ1,...,φp. These models have been widely used in a number of applied areas, including biomedical and environmental signal processing. In many of these applied settings it is important to incorporate prior information on physically meaningful quantities that may represent quasiperiodic or low frequency trends related to latent processes of the observed time series yt. For example, brain signals such as electroencephalograms (EEGs) can be thought of as a superposition of latent processes where each of them describes brain activity in a specific frequency band (e.g., Prado 2010a, 2010b), and so it is desirable to consider priors on the model parameters that can provide qualitative and quantitative information on the quasiperiodic latent processes that characterize these signals.

Following an approach similar to that proposed in Huerta and West (1999b) for AR models at the observational level, we specify priors on the real and complex reciprocal roots of the AR characteristic polynomial at the state level of the state-space model. Such prior structure is non-conjugate, and so filtering and parameter inference are not available in closed form or via direct simulation. Huerta and West (1999b) proposed a Markov chain Monte Carlo (MCMC) algorithm for inference in structured AR models that can be extended to the case of structured state-space AR models; however, such an algorithm would not be useful in settings where p(xt, φ, v, w | y1:t)—with v and w the observational and system variances, respectively—needs to be updated sequentially over time.

We consider particle-based algorithms for on-line parameter learning and filtering in state-space AR models with truncated normal or uniform priors on the moduli and the wavelengths of the autoregressive reciprocal roots. The paper focuses on a sequential Monte Carlo algorithm based on the particle learning approach of Carvalho et al. (2010); however, algorithms based on the approaches of Liu and West (2001) and Storvik (2002) are also considered.

We compare the performance of these SMC algorithms in various simulation settings. Comparisons to MCMC algorithms are also presented. The particle-based algorithms can also be used to obtain inference on standard AR models with structured priors, providing an alternative on-line approach to the MCMC scheme of Huerta and West (1999b).

2 Structured AR models

Consider an AR(p) plus noise model, or AR state-space model, given by

yt = xt + εt,  εt ∼ N(0, v),   (1)
xt = Σ_{i=1}^{p} φi xt−i + wt,  wt ∼ N(0, w),   (2)

where φ = (φ1,...,φp)' is the p-dimensional vector of AR coefficients, v is the observational variance, and w is the variance at the state level. It is assumed that φ, v and w are unknown with a prior structure that will be described below. Note that the model specified in (1) and (2) can also be written in the following dynamic linear model (DLM) or state-space form

yt = F'zt + εt,   (3)
zt = G(φ)zt−1 + F wt,   (4)

where zt = (xt, xt−1, ..., xt−p+1)', F = (1, 0, ..., 0)' is a p-dimensional vector, and G(φ) is the p × p state evolution matrix

G(φ) = ( φ1 φ2 ··· φp−1 φp ; 1 0 ··· 0 0 ; 0 1 ··· 0 0 ; ··· ; 0 0 ··· 1 0 ).   (5)

The AR characteristic polynomial is

Φ(u) = 1 − φ1 u − φ2 u² − ··· − φp u^p,   (6)

where u is a complex number. The AR process is stationary if all the roots of Φ(u) (real or complex) lie outside the unit circle, or equivalently, if the moduli of all the reciprocal roots of Φ(u) are below one. Note that the eigenvalues of G(φ) are the reciprocal roots of the AR characteristic polynomial Φ(u). Assume that the number of real roots and the number of complex roots of (6) are known a priori and set to R and 2C, respectively. Then, denoting (rj, λj) the modulus and wavelength of the jth complex reciprocal pair, for j = 1 : C, rj the modulus of the jth real reciprocal root, for j = (C + 1) : (C + R), and α = {αj; j = 1 : (C + R)} such that αj = (rj, λj) for j = 1 : C and αj = rj for j = (C + 1) : (C + R), we have the following mapping from φ to α:

Φ(u) = ∏_{j=1}^{C} (1 − rj e^{2πi/λj} u)(1 − rj e^{−2πi/λj} u) × ∏_{j=(C+1)}^{(C+R)} (1 − rj u).   (7)

The prior on φ is that implied by the following prior distribution on α:

• Prior on the complex roots. For each j = 1 : C we assume that rj ∼ gj^c(rj) and λj ∼ hj^c(λj), where gj^c(·) is a continuous density over an interval (lj^r, uj^r) of moduli below one, and hj^c(·) is a continuous density over an interval (lj^λ, uj^λ) of wavelengths. Uniform and truncated normal priors on (rj, λj) lead to priors on the implied AR coefficients that can be approximated by bivariate truncated normal distributions (Prado 2010b).
• Prior on the real roots. Similarly, uniform or truncated normal priors can be used on the reciprocal real roots. Such priors will be considered here.

Once the priors are specified and observations begin to arrive, we are interested in obtaining p(xt, α, v, w | y1:t) sequentially over time. However, the class of priors just described leads to posterior distributions that are not available in closed form. Huerta and West (1999b) proposed a reversible jump MCMC algorithm to obtain posterior samples of α and v in standard AR models with a prior structure similar to that described above, but with additional point masses at 1, −1 and 0 placed on the moduli of the real and complex reciprocal roots to account for nonstationarity and model order uncertainty. The algorithm of Huerta and West (1999b) works very well and has been applied to a broad range of data sets, however, it is not useful in settings where inference needs to be performed on-line. In addition, the methods of Huerta and West (1999b) were developed for standard AR models at the observational level, that is, models of the form yt = Σ_{j=1}^{p} φj yt−j + εt, instead of the AR state-space models given in (1) and (2). The latter were considered in West (1997) in the context of studying several oxygen isotope time series from cores recorded at various geographical locations. Specifically, West (1997) describes a MCMC algorithm to obtain samples from p(x1:T, φ, v, w | y1:T) assuming conjugate normal priors on the AR coefficients. Therefore, the approach of West (1997) does not allow us to sequentially update p(xt, α, v, w | y1:t) in cases where structured priors are considered on the AR parameters of the latent process xt.

In Sect. 3 we propose algorithms for on-line parameter learning and filtering within the class of models specified by (1), (2), and the prior structure on the AR characteristic reciprocal roots described above. We consider three algorithms: an algorithm based on the particle learning (PL) approach of Carvalho et al. (2010); an algorithm based on the approach of Liu and West (2001); and finally, a scheme based on the sequential importance sampling resampling (SISR) approach of Storvik (2002). In Sect. 4 we compare the performance of these SMC algorithms by applying them to data simulated from various AR plus noise models. We also compare the performance of the PL-based scheme to a Markov chain Monte Carlo (MCMC) algorithm.
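To make the mapping in (7) concrete, the following minimal Python sketch expands the characteristic factors and reads the AR coefficients off Φ(u) = 1 − φ1u − ··· − φpu^p. The paper provides no code; the function name and the NumPy-based implementation are illustrative choices only.

```python
import numpy as np

def roots_to_ar_coefficients(complex_roots, real_roots):
    """Map reciprocal-root parameters alpha to AR coefficients phi.

    complex_roots: list of (modulus r_j, wavelength lambda_j) pairs.
    real_roots: list of real reciprocal-root moduli.
    Returns phi = (phi_1, ..., phi_p) so that
    Phi(u) = 1 - phi_1 u - ... - phi_p u^p has the given reciprocal roots,
    as in equations (6)-(7).
    """
    roots = []
    for r, lam in complex_roots:
        z = r * np.exp(2j * np.pi / lam)
        roots.extend([z, np.conj(z)])          # conjugate pair
    roots.extend(real_roots)

    # Phi(u) = prod_j (1 - alpha_j u): expand the polynomial in u.
    poly = np.array([1.0 + 0j])                # coefficients in increasing powers of u
    for alpha in roots:
        poly = np.convolve(poly, np.array([1.0, -alpha]))
    return -poly[1:].real                      # phi_i = -(coefficient of u^i)

# Example: one complex pair (r = 0.95, lambda = 16) and one real root -0.95,
# the AR(3) configuration used later in the simulation study of Sect. 4.1.3.
print(roots_to_ar_coefficients([(0.95, 16.0)], [-0.95]))
```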
3 Sequential filtering and parameter learning

We focus on the particle learning approach of Carvalho et al. (2010) and Lopes et al. (2010), but we also consider two additional algorithms: one based on the scheme of Liu and West (2001) and another one following the SISR approach of Storvik (2002). For recent overviews of particle filters and SMC methods the readers are referred to Cappé et al. (2007), Doucet and Johansen (2011) and Lopes and Tsay (2011).

3.1 Particle learning algorithm

Carvalho et al. (2010) develop parameter learning, filtering and smoothing algorithms in rather general state-space models. In particular, methods are provided to deal with conditionally Gaussian dynamic linear models (CDLMs), as well as models that are conditionally Gaussian but nonlinear at the state level. These algorithms have two main features. First, similar to other approaches such as those presented in Fearnhead (2002) and Storvik (2002), conditional sufficient statistics are used to represent the posterior distribution of the parameters. Furthermore, sufficient statistics for the latent states are also used whenever the model structure allows it. This reduces the variance of the importance weights, resulting in algorithms with increased efficiency. Second, in contrast with other particle based schemes that first propagate and then resample the particles, the particle learning (PL) scheme of Carvalho et al. (2010) follows a "resample-and-then-propagate" framework that helps delay the decay in the particle approximation often found in algorithms based on sampling importance resampling (SIR).

It is worth noting that particle degeneracy (or particle impoverishment) is inherent to all particle filter algorithms as t → ∞ for a fixed number of particles M (Del Moral et al. 2006), including particle learning. The main advantage of PL and Storvik filters, as will be seen in what follows, is that their rate of decay in particle diversity is slower when compared to the Liu and West (2001) algorithm. In our simulated and real applications the sample sizes are moderate, with values ranging from t = 100 to t = 400. We found that the Liu and West filter had smaller effective sample sizes than those obtained with PL and Storvik filters in all the examples. Smaller effective sample sizes are associated with particle degeneracy, and so the filters of PL and Storvik had better performance than the filter of Liu and West. For further discussion on particle degeneracy when learning about states and parameters see, amongst others, Chopin et al. (2011), Del Moral et al. (2006), Flury and Shephard (2009) and Poyiadjis et al. (2011), as well as Lopes et al. (2010) and its discussion.

Here we propose a scheme based on PL for CDLMs of Carvalho et al. (2010) that will allow us to achieve on-line posterior learning and filtering in structured AR(p) plus noise models. Consider a state-space model whose structure is that given in (3) and (4), with εt ∼ N(0, Vt(θ)) and ηt ∼ N(0, Wt(θ)), for some fixed parameters θ. In other words, conditional on θ, it is assumed that the model is fully specified at time t by the quadruple {F, G(θ), Vt(θ), Wt(θ)}. In the case of the AR plus noise models, the fixed parameters are the moduli and frequencies of the AR reciprocal roots, denoted by α, and the variances w and v. Therefore, θ = (α, w, v), G(θ) = G(φ) = G(q(α))—where the mapping from φ to α is given in (7)—Vt(θ) = v, and Wt(θ) = wFF'. We begin by assuming that at time t we have a set of equally weighted particles {(zt, θ, st^z, st^θ)^(m); m = 1 : M}, where st^z and st^θ are the sufficient statistics for the states and the parameters, respectively. These particles can be used to approximate p(xt, α, w, v | y1:t). Then, at time t + 1 and for each m the following steps are performed.

Particle learning for AR state-space models

Step 1. Sample an index km from {1,...,M} with probability

Pr(km = k) ∝ p(yt+1 | (st^z, θ)^(k)).

Step 2. Propagate the states via

zt+1^(m) ∼ p(zt+1 | (zt, θ)^(km), yt+1).

Step 3. Propagate the sufficient statistics for the states via

st+1^{z,m} = K(st^{z,(km)}, θ^{(km)}, yt+1).

Step 4. Propagate st+1^{θ,m} and sample θ^(m) using a conditional structure on the reciprocal roots.

In Step 3, K(·) denotes the Kalman filter recursion given as follows. Let st^z = (mt, Ct) be the Kalman filter first and second moments at time t conditional on θ. Then, in a model defined by {F, G(θ), Vt(θ), Wt(θ)}, st+1^z = (mt+1, Ct+1) are obtained via

mt+1 = at+1 + At+1(yt+1 − F'at+1),   (8)
Ct+1^{−1} = Rt+1^{−1} + F Vt+1^{−1}(θ) F',   (9)

where at+1 = G(θ)mt, Rt+1 = G(θ)Ct G'(θ) + Wt(θ), At+1 = Rt+1 F Qt+1^{−1} and Qt+1 = F'Rt+1F + Vt+1(θ). Finally, parameter learning is achieved by Step 4 and the new set of particles given by {(zt+1, θ, st+1^z, st+1^θ)^(m); m = 1 : M} can be used to approximate p(xt+1, α, w, v | y1:(t+1)).

We now discuss in detail Steps 1 to 4 in the specific case of AR(p) plus noise models with the structured priors specified in Sect. 2. First, given the model structure we have that Pr(km = k) in Step 1 of the algorithm is proportional to

p(yt+1 | (st^z, θ)) = N(yt+1 | φ'mt, φ'Ct φ + v + w).   (10)

In the general PL algorithm the weights are computed via p(yt+1 | (zt, θ)), however, as pointed out by Carvalho et al. (2010), using (10) is more efficient.

Then, the propagation of zt+1 in Step 2 is done as follows. First note that zt+1 = (xt+1, xt, ..., xt−p+2)' and zt = (xt, xt−1, ..., xt−p+2, xt−p+1)'. Therefore, the last p − 1 entries of zt+1 are the first p − 1 entries of zt, so we just replace these components accordingly and sample the first element of zt+1, xt+1, from

p(xt+1 | zt, θ, yt+1) = N(m*t+1(1), C*t+1(1, 1)),

where m*t+1(1) is the first component of the vector m*t+1 and C*t+1(1, 1) is the first element of the matrix C*t+1, with

m*t+1 = G(θ)zt + A*t+1 e*t+1,
C*t+1 = Wt(θ) − A*t+1 Q*t+1 (A*t+1)',

e*t+1 = (yt+1 − F'G(θ)zt), A*t+1 = Wt(θ)F(Q*t+1)^{−1}, and Q*t+1 = (F'Wt(θ)F + v).

Finally, propagating the sufficient statistics for the states in Step 3 is straightforward from the Kalman filter recursions, while propagating the sufficient statistics for the parameters in Step 4 is more involved, as it requires considering the conditional structure of AR(p) processes. This is explained below.
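Before turning to the parameter updates, the sketch below shows how Steps 1–3 could be coded for the AR(p) plus noise model. It is a minimal illustration, not the authors' implementation: the dictionary-based particle representation, function names, and the omission of the Step 4 parameter statistics are our own simplifying assumptions.

```python
import numpy as np

def _log_normal_pdf(y, mean, var):
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)

def pl_step(particles, y_new, rng):
    """One resample-and-propagate step (Steps 1-3 of Sect. 3.1) for the
    AR(p) plus noise model.  Each particle is a dict holding the AR
    coefficients 'phi' (implied by alpha), the variances 'v' and 'w',
    the current state vector 'z', and the Kalman moments 'm', 'C' of
    p(z_t | y_{1:t}, theta).  Step 4 (parameter sufficient statistics
    and redrawing theta) is not tracked in this sketch."""
    M = len(particles)
    p = len(particles[0]["phi"])
    F = np.zeros(p); F[0] = 1.0

    # Step 1: resample with weights proportional to (10):
    # p(y_{t+1} | s_t^z, theta) = N(phi'm_t, phi'C_t phi + v + w).
    logw = np.array([_log_normal_pdf(
        y_new, pa["phi"] @ pa["m"],
        pa["phi"] @ pa["C"] @ pa["phi"] + pa["v"] + pa["w"]) for pa in particles])
    w = np.exp(logw - logw.max())
    idx = rng.choice(M, size=M, p=w / w.sum())

    new_particles = []
    for k in idx:
        src = particles[k]
        phi, v, wv = src["phi"], src["v"], src["w"]
        G = np.zeros((p, p)); G[0, :] = phi; G[1:, :-1] = np.eye(p - 1)
        W = wv * np.outer(F, F)

        # Step 2: propagate the state; only the first entry of z_{t+1}
        # is random, the rest is a shift of z_t.
        Qs = wv + v                              # Q*_{t+1} = F'W F + v
        As = (W @ F) / Qs                        # A*_{t+1}
        es = y_new - phi @ src["z"]              # e*_{t+1} = y_{t+1} - F'G z_t
        ms = G @ src["z"] + As * es              # m*_{t+1}
        Cs = W - Qs * np.outer(As, As)           # C*_{t+1}
        x_new = rng.normal(ms[0], np.sqrt(Cs[0, 0]))
        z_new = np.concatenate(([x_new], src["z"][:-1]))

        # Step 3: Kalman recursion (8)-(9) for the state sufficient
        # statistics; the covariance update below is the standard
        # algebraic equivalent of (9).
        a = G @ src["m"]
        R = G @ src["C"] @ G.T + W
        Q = float(F @ R @ F) + v
        A = (R @ F) / Q
        m_new = a + A * (y_new - a[0])
        C_new = R - Q * np.outer(A, A)

        new_particles.append({"phi": phi.copy(), "v": v, "w": wv,
                              "z": z_new, "m": m_new, "C": C_new})
    return new_particles
```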

3.1.1 Parameter learning

We now derive the recursions to update st+1^θ, the sufficient statistics for θ. First, note that the reciprocal roots are not identified in the formal sense given that the model remains unchanged under any permutation of the indexes that identify such roots. Therefore, to deal with this issue a particular ordering is imposed on the roots. More specifically, we order the reciprocal roots by decreasing moduli, i.e., α = {(r1, λ1),...,(rC, λC), rC+1,...,rC+R}, with r1 > ··· > rC and rC+1 > ··· > rC+R. In addition, we use α(−j) to denote the set with the ordered moduli and frequencies for all the roots except for that indexed by j (which could be complex or real), and αk:l to denote the set of the ordered moduli and frequencies for all the roots except the first (k − 1) and the last C + R − l.

Updating the sufficient statistics for the complex reciprocal roots  Take the j-th complex reciprocal root with modulus rj and characteristic wavelength λj. Then, conditional on all the remaining reciprocal roots, compute

x(−j),t = ∏_{l ≤ C, l ≠ j} (1 − rl e^{−2πi/λl} B)(1 − rl e^{2πi/λl} B) × ∏_{l=C+1}^{C+R} (1 − rl B) xt,   (11)

where B is the backshift operator. Under the model, it follows that x(−j),t is an AR(2) with coefficients φ1^(j) = 2rj cos(2π/λj) and φ2^(j) = −rj², and variance w. Therefore, we can write

x(−j),t = X(−j),t φ^(j) + ξt,  ξt ∼ N(0, w It−2),   (12)

with x(−j),t = (x(−j),3,...,x(−j),t)', It−2 the (t − 2) × (t − 2) identity matrix, φ^(j) = (φ1^(j), φ2^(j))', and X(−j),t = (x(−j),2:(t−1), x(−j),1:(t−2)) the (t − 2) × 2 design matrix. Equation (12) can then be combined with the prior distribution on φ1^(j) and φ2^(j), or equivalently, the prior on rj and λj. For example, if we assume that φ^(j) has a truncated normal prior, say φ^(j) ∼ TN(mj,0^φ, Cj,0^φ, Rj^φ), we have that, up to time t, φ^(j) ∼ TN(mj,t^φ, Cj,t^φ, Rj^φ), with

(Cj,t^φ)^{−1} = (Cj,0^φ)^{−1} + (1/w) X'(−j),t X(−j),t,   (13)
(Cj,t^φ)^{−1} mj,t^φ = (Cj,0^φ)^{−1} mj,0^φ + (1/w) X'(−j),t x(−j),t.   (14)

Now, X'(−j),t X(−j),t and X'(−j),t x(−j),t are given by

X'(−j),t X(−j),t = ( Σ_{l=3}^{t} x²(−j),l−1    Σ_{l=3}^{t} x(−j),l−1 x(−j),l−2 ;
                    Σ_{l=3}^{t} x(−j),l−1 x(−j),l−2    Σ_{l=3}^{t} x²(−j),l−2 ),

and

X'(−j),t x(−j),t = ( Σ_{l=3}^{t} x(−j),l−1 x(−j),l,  Σ_{l=3}^{t} x(−j),l−2 x(−j),l )'.

And so, the vector of sufficient statistics to update φ^(j), or equivalently (rj, λj), at time t + 1 for j = 1 : C is given by

s(−j),t+1 = ( Σ_{l=3}^{t+1} x²(−j),l−1,  Σ_{l=3}^{t+1} x(−j),l−1 x(−j),l−2,  Σ_{l=3}^{t+1} x²(−j),l−2,  Σ_{l=3}^{t+1} x(−j),l−1 x(−j),l,  Σ_{l=3}^{t+1} x(−j),l−2 x(−j),l )',

which can be written as s(−j),t+1 = (s(−j),t+1,1,...,s(−j),t+1,5)' with

s(−j),t+1,1 = s(−j),t,1 + x²(−j),t,
s(−j),t+1,2 = s(−j),t,2 + x(−j),t x(−j),t−1,
s(−j),t+1,3 = s(−j),t,3 + x²(−j),t−1,
s(−j),t+1,4 = s(−j),t,4 + x(−j),t x(−j),t+1,
s(−j),t+1,5 = s(−j),t,5 + x(−j),t−1 x(−j),t+1.

Updating the sufficient statistics for the real reciprocal roots  This is simpler than updating the sufficient statistics for the complex reciprocal roots. Take the j-th real root (i.e., take any j such that C + 1 ≤ j ≤ C + R) with modulus rj and compute

x(−j),t = ∏_{l=1}^{C} (1 − rl e^{−2πi/λl} B)(1 − rl e^{2πi/λl} B) × ∏_{l ≥ C+1, l ≠ j} (1 − rl B) xt,

which under the model assumptions follows an AR(1) process with coefficient rj, and so

x(−j),t = X(−j),t rj + ξt,  ξt ∼ N(0, w It−1),

with x(−j),t = (x(−j),2,...,x(−j),t)', and X(−j),t = (x(−j),1,...,x(−j),t−1)'. If a truncated normal prior is assumed on rj, that is, rj ∼ TN(mj,0^φ, Cj,0^φ, Rj^φ), we have that rj ∼ TN(mj,t^φ, Cj,t^φ, Rj^φ), with Cj,t^φ and mj,t^φ computed via (13) and (14). In this case X'(−j),t X(−j),t = Σ_{l=2}^{t} x²(−j),l−1 and X'(−j),t x(−j),t = Σ_{l=2}^{t} x(−j),l−1 x(−j),l, and so the vector of sufficient statistics to update rj at time t + 1 for j ≥ C + 1 is given by

s(−j),t+1 = ( Σ_{l=2}^{t+1} x²(−j),l−1,  Σ_{l=2}^{t+1} x(−j),l−1 x(−j),l )' = ( s(−j),t,1 + x²(−j),t,  s(−j),t,2 + x(−j),t x(−j),t+1 )'.

Updating the sufficient statistics for the variances  Given the inverse-gamma priors on v and w, the sufficient statistics associated with these parameters, denoted as st+1^v and st+1^w, respectively, are given by st+1^v = (νt+1^v, dt+1^v) and st+1^w = (νt+1^w, dt+1^w) with νt+1^v = νt^v + 1, νt+1^w = νt^w + 1, dt+1^v = dt^v + (yt+1 − xt+1)², and

dt+1^w = dt^w + ( xt+1 − Σ_{j=1}^{p} φj xt+1−j )².
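The recursions above are simple running sums. As an illustration, the following Python sketch updates the five sufficient statistics of one complex reciprocal pair and computes the truncated-normal posterior moments via (13)–(14); the conversion back to (rj, λj) uses the relations φ1^(j) = 2rj cos(2π/λj) and φ2^(j) = −rj² derived above. Function and variable names are ours, not the paper's.

```python
import numpy as np

def update_complex_root_stats(s, x_lag2, x_lag1, x_new):
    """Recursive update of the five sufficient statistics s_(-j),t+1 for
    one complex reciprocal pair, given filtered values x_(-j),t-1 (x_lag2),
    x_(-j),t (x_lag1) and x_(-j),t+1 (x_new) of the process with the j-th
    pair filtered out.  A sketch, not the authors' code."""
    s1, s2, s3, s4, s5 = s
    return (s1 + x_lag1 ** 2,        # sum of x_{l-1}^2
            s2 + x_lag1 * x_lag2,    # sum of x_{l-1} x_{l-2}
            s3 + x_lag2 ** 2,        # sum of x_{l-2}^2
            s4 + x_lag1 * x_new,     # sum of x_{l-1} x_l
            s5 + x_lag2 * x_new)     # sum of x_{l-2} x_l

def tn_posterior_moments(s, w, m0, C0):
    """Mean and covariance of phi^(j) = (phi_1^(j), phi_2^(j))' implied by
    (13)-(14), before truncation to the stationarity region R_j."""
    s1, s2, s3, s4, s5 = s
    XtX = np.array([[s1, s2], [s2, s3]])
    Xtx = np.array([s4, s5])
    C0_inv = np.linalg.inv(C0)
    Ct = np.linalg.inv(C0_inv + XtX / w)        # equation (13)
    mt = Ct @ (C0_inv @ m0 + Xtx / w)           # equation (14)
    return mt, Ct

def phi_to_root(phi1, phi2):
    """Recover (r_j, lambda_j) from a sampled AR(2) coefficient pair:
    r_j = sqrt(-phi_2), lambda_j = 2*pi / arccos(phi_1 / (2 r_j))."""
    r = np.sqrt(-phi2)
    return r, 2.0 * np.pi / np.arccos(phi1 / (2.0 * r))
```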

Propagation of st+1^θ and sampling of θ^(m)  Based on the results presented above, Step 4 of the algorithm is performed as follows:

• Begin with j = 1. Compute s(−1),t+1^(m) = S1(s(−1),t^(km), α(−1)^(km), v^(km), w^(km), zt+1^(m), yt+1) as described above; sample φ^(1),m ∼ TN(m1,t+1^{φ,m}, C1,t+1^{φ,m}, R1^φ), where m1,t+1^{φ,m} and C1,t+1^{φ,m}—which are computed using (13) and (14)—are functions of s(−1),t+1^(m); and set

r1^(m) = √(−φ2^(1),m),  λ1^(m) = 2π / arccos( φ1^(1),m / (2√(−φ2^(1),m)) ).

• For the remaining complex reciprocal roots, i.e., for j = 2,...,C: compute s(−j),t+1^(m) = Sj(s(−j),t^(km), α1:(j−1)^(m), α(j+1):(R+C)^(km), v^(km), w^(km), zt+1^(m), yt+1); sample φ^(j),m from a truncated normal distribution TN(mj,t+1^{φ,m}, Cj,t+1^{φ,m}, Rj^φ)—again this is done using (13) and (14) and noting that mj,t+1^{φ,m} and Cj,t+1^{φ,m} are functions of s(−j),t+1^(m); and set

rj^(m) = √(−φ2^(j),m),  λj^(m) = 2π / arccos( φ1^(j),m / (2√(−φ2^(j),m)) ).

• For j = (C + 1),...,(C + R), i.e., for each of the R real reciprocal roots: compute s(−j),t+1^(m) = Sj(s(−j),t^(km), α1:(j−1)^(m), α(j+1):(R+C)^(km), v^(km), w^(km), zt+1^(m), yt+1); sample rj^(m) from a truncated normal distribution TN(mj,t+1^{φ,m}, Cj,t+1^{φ,m}, Rj^φ).
• Compute st+1^{v,m} = Sv(st^{v,(km)}, zt+1^(m), yt+1) and sample (v^(m) | st+1^{v,m}) ∼ IG(νt+1^v/2, dt+1^v/2).
• Compute st+1^{w,m} = Sw(st^{w,(km)}, α^(m), zt+1^(m), yt+1) and sample (w^(m) | st+1^{w,m}) ∼ IG(νt+1^w/2, dt+1^w/2).

Note that running this algorithm when the state xt follows an AR(p) process with C pairs of complex characteristic roots and R real roots requires storing 5C + 2R + 2 sufficient statistics per particle for the parameters, in addition to storing the sufficient statistics for the states.

3.2 Liu and West algorithm

Liu and West (2001) proposed a general algorithm that extends auxiliary variable particle filters for state variables (Pitt and Shephard 1999) to include model parameters. This algorithm uses ideas of kernel smoothing (West 1993) to develop a method that artificially evolves the fixed parameters without information loss.

Assume that at time t, the posterior p(zt, γ | y1:t), where γ = f(α, v, w) for some function f(·), is summarized by a weighted set of M particles {(zt, γt, ωt)^(m); m = 1 : M}, with Σ_{m=1}^{M} ωt^(m) = 1. We discuss some choices of f(·) later in this section. The algorithm of Liu and West (2001) for the AR plus noise model would be as follows.

Algorithm of Liu and West

Step 1. Sample an index km from {1,...,M} with probability

Pr(km = k) ∝ ωt^(k) p(yt+1 | mt+1^(k), μt^(k)),

with mt+1^(k) = E(zt+1 | zt^(k), μt^(k)), μt^(k) = aγt^(k) + (1 − a)γ̄t and γ̄t = Σ_{m=1}^{M} ωt^(m) γt^(m).

Step 2. Sample γt+1^(m) from N(μt^(km), (1 − a)² Vt) with

Vt = Σ_{m=1}^{M} ωt^(m) (γt^(m) − γ̄t)(γt^(m) − γ̄t)'.

Step 3. Sample the states from p(zt+1 | zt^(km), γt+1^(m)).

Step 4. Compute the new weights via

ωt+1^(m) ∝ p(yt+1 | zt+1^(m), γt+1^(m)) / p(yt+1 | mt+1^(km), μt^(km)).

Finally, the set of weighted particles {(zt+1, γt+1, ωt+1)^(m); m = 1 : M} can be used to obtain an approximation to p(zt+1, γ | y1:(t+1)). Resampling can be done at each time t or only when the effective sample size given by 1/Σ_{m=1}^{M} (ωt^(m))² is below a certain threshold M0 (Liu 1996). The value of a above is set to a = (3δ − 1)/(2δ), with δ ∈ (0, 1] interpreted as a discount factor (see Liu and West 2001). Values of δ > 0.9 are typically used in practice.

In the case of AR plus noise models with uniform or truncated normal priors on the moduli and wavelengths of the AR reciprocal roots, the algorithm above can be applied to transformations of the parameters so that the normal kernels in Step 2 above are appropriate. For example, assume that uniform priors are considered, i.e., assume that gj^c(rj) = U(rj | lj^r, uj^r) and gj^r(rj) = U(rj | lj^r, uj^r) for j = 1 : (C + R), and that hj(λj) = U(λj | lj^λ, uj^λ) for j = 1 : C. Then, the algorithm can be applied to the transformed parameters log((rj − lj^r)/(uj^r − rj)) for j = 1 : (C + R), log((λj − lj^λ)/(uj^λ − λj)) for j = 1 : C, log(w), and log(v), so that

γ = ( log((r1 − l1^r)/(u1^r − r1)), ..., log((rC+R − lC+R^r)/(uC+R^r − rC+R)), log((λ1 − l1^λ)/(u1^λ − λ1)), ..., log((λC − lC^λ)/(uC^λ − λC)), log(w), log(v) )'.

Section 4 illustrates the performance of this algorithm in simulated data sets.
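The kernel-shrinkage move in Step 2, the logit-type transformation to the unconstrained scale, and the effective sample size used for the resampling decision can be sketched as follows. This is an illustrative fragment under the stated assumptions (parameters stored as an M × d array; names are ours), not the authors' code.

```python
import numpy as np

def lw_kernel_step(gamma, weights, delta, rng):
    """Kernel-shrinkage move of Step 2 of the Liu and West algorithm.
    gamma: (M, d) array of transformed parameters; weights sum to one;
    delta: discount factor, so a = (3*delta - 1) / (2*delta).
    Returns jittered parameters drawn from N(mu_t^(m), (1-a)^2 V_t)."""
    a = (3.0 * delta - 1.0) / (2.0 * delta)
    gbar = weights @ gamma                          # gamma_bar_t
    mu = a * gamma + (1.0 - a) * gbar               # shrunk means mu_t^(m)
    resid = gamma - gbar
    V = (weights[:, None] * resid).T @ resid        # V_t
    return np.array([rng.multivariate_normal(mu_m, (1.0 - a) ** 2 * V)
                     for mu_m in mu])

def to_unconstrained(x, lo, hi):
    """Logit-type map log((x - l)/(u - x)) used for moduli and wavelengths
    with uniform priors, so the normal kernels act on an unbounded scale."""
    return np.log((x - lo) / (hi - x))

def effective_sample_size(weights):
    """ESS = 1 / sum_m (omega_t^(m))^2, compared with the threshold M0
    to decide when to resample."""
    return 1.0 / np.sum(weights ** 2)
```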

3.3 Storvik algorithm

Similar to the approaches of Fearnhead (2002) and Carvalho et al. (2010), the algorithms of Storvik (2002) assume that the posterior distribution of θ given z0:t and y1:t depends on a sufficient statistic tt that can be updated recursively. When applied to the structured AR plus noise model the general sequential importance sampling resampling algorithm of Storvik (2002) can be summarized as follows.

Storvik algorithm

Importance sampling: for m = 1 : M
Step 1. Sample θ^(m) from g1(θ | z0:t^(m), y1:(t+1)).
Step 2. Sample z̃t+1^(m) ∼ g2(zt+1 | z0:t^(m), yt+1, θ^(m)) and set z̃0:(t+1)^(m) = (z0:t^(m), z̃t+1^(m)).
Step 3. Evaluate the weights

ω̃t+1^(m) = ωt^(m) [ p(θ^(m) | tt^(m)) p(z̃t+1^(m) | zt^(m), θ^(m)) p(yt+1 | z̃t+1^(m), θ^(m)) ] / [ g1(θ^(m) | z0:t^(m), y1:(t+1)) g2(z̃t+1^(m) | z0:t^(m), yt+1, θ^(m)) ].

Resampling: for m = 1 : M
Step 1. Sample an index km from {1,...,M} with probabilities proportional to the weights {ω̃t+1^(k); k = 1 : M}.
Step 2. Set z0:(t+1)^(m) = z̃0:(t+1)^(km) and ωt+1^(m) = 1/M. Compute the sufficient statistics tt+1^(m) = T(tt^(km), zt+1^(m), yt+1).

We set g1(θ | z0:t^(m), y1:(t+1)) = p(θ | tt^(m)) in Step 1 in order to make computation and simulation fast. We also set g2(zt+1 | z0:t^(m), yt+1, θ^(m)) = p(zt+1 | (zt, θ)^(m), yt+1) and so Step 2 of this algorithm uses the same equations of Step 2 in the PL algorithm. Finally, the sufficient statistics for the parameters are updated as follows. As described in Sect. 3.1, assuming truncated normal or uniform priors on the AR reciprocal roots, it can be shown that

(φ^(j) | y1:t, z0:t, α(−j), w) ∼ TN(mj,t^φ, Cj,t^φ, Rj^φ), j = 1 : C,
(rj | y1:t, z0:t, α(−j), w) ∼ TN(mj,t^φ, Cj,t^φ, Rj^φ), j = (C + 1) : (C + R),
(w | y1:t, z0:t, α) ∼ IG(νt^w/2, dt^w/2),
(v | y1:t, z0:t, α) ∼ IG(νt^v/2, dt^v/2),

and so, the sufficient statistic tt for (α, w, v) is a function of sums of squares and cross-products of the states, Σ_l x²_{l−i} and Σ_l x_{l−i} x_{l−j}, together with Σ_l y_l² and Σ_l y_l x_l. However, direct simulation from p(θ | tt), as required in Step 1 of the algorithm, is not possible. Following Storvik (2002), a SISR algorithm that samples θ approximately from p(θ | tt) using a few Gibbs steps can be applied. We illustrate the performance of this algorithm in Sect. 4.

4 Examples

4.1 Simulation studies

4.1.1 Structured AR(1) plus noise model

A total of T = 300 observations were simulated from an AR(1) plus noise model with AR coefficient φ = 0.95—or equivalently, real reciprocal root 0.95—observational variance v = 0.02, and system variance w = 0.1. We assumed a priori that r ∼ U(0, 1), v ∼ IG(αv, βv), and w ∼ IG(αw, βw). Non-informative priors on the variances are obtained when αv, αw, βv, βw → 0, and so we set αv = αw = 0.01, and βv = βw = 0.01. The three SMC algorithms described in Sect. 3 were applied to the simulated data to obtain particle based approximations of p(r, w, v, xt | y1:t) sequentially over time. Each of the algorithms was independently run 4 times using 4 different random seeds, and all the algorithms used M = 2000 particles. A discount factor of δ = 0.95 was used for the Liu and West algorithm. A resampling step was also performed in the Liu and West algorithm at each time t.

Figure 1 compares parameter learning in the AR(1) plus noise model for the three algorithms. PL and Storvik algorithms show similar performances while the algorithm based on the approach of Liu and West (2001) shows particle degeneracy. Figure 2 displays plots of the effective sample sizes (as percentages based on a total number of M = 2000 particles) for each of the algorithms. These plots show that PL and Storvik algorithms have similar effective sample sizes, while the algorithm of Liu and West has much smaller effective sample sizes, with percentages below the 80% threshold after t = 100 and often below 50% after t = 150.

More informative inverse-gamma priors on v and w with αv, αw ∈ (3, 100) and βv = (αv − 1) × 0.02 and βw = (αw − 1) × 0.1 were also considered. Such priors led to similar results to those summarized above, i.e., PL and Storvik algorithms showed similar performances and the algorithm of Liu and West showed particle degeneracy issues when M = 2000 or fewer particles were used.

4.1.2 Structured AR(2) plus noise model

We simulated T = 400 observations from the AR(2) plus noise model with autoregressive coefficients φ1 = 0.1 and φ2 = 0.8075 and implied real reciprocal roots r1 = 0.95 and r2 = −0.85. The observational and system variances were set at v = 0.02 and w = 0.1, respectively. The time series plot of the simulated xt appears in Fig. 3 (black line).

Fig. 1 AR(1) plus noise: Sequential parameter learning. The plots show the posterior mean (black) and 95% credibility intervals (gray) of (r|y1:t) (first column), (v|y1:t) (second column), and (w|y1:t) (third column) for 4 independent runs—i.e., runs with different random seeds—of the PL algorithm (first row), the algorithm of Liu and West (second row), and the Storvik algorithm (third row)

Fig. 2 AR(1) plus noise: Effective sample sizes. The plots show the effective sample sizes (as percentages) for the PL algorithm (top plot), the Liu and West algorithm (middle plot) and the Storvik algorithm (bottom plot) based on M = 2000 particles. Dotted lines appear at 80% and 50%

We applied the PL algorithm for on-line filtering and parameter learning described in Sect. 3. Uniform priors were assumed on r1 and r2, with r1 ∼ U(0, 1) and r2 ∼ U(−1, 0). Diffuse inverse-gamma priors were assumed on v and w, with v ∼ IG(αv, βv) and w ∼ IG(αw, βw), where αv = αw = 0.01, βv = βw = 0.01. Figure 3 also shows the posterior mean of the estimated latent process xt at each time t and 99% credibility intervals for (xt | y1:t) obtained from running the PL algorithm with M = 6000 particles (gray lines). The algorithm of Liu and West (2001) was also applied to these data using δ = 0.95 and M = 6000 particles. A resampling step was done at each time t.
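For reference, the data-generating process of Sect. 4.1.2 can be reproduced with a few lines of code. The sketch below is only illustrative: the paper does not state how the latent process was initialized, so starting the recursion at zero is an assumption, and the function name is ours.

```python
import numpy as np

def simulate_ar2_plus_noise(T=400, phi=(0.1, 0.8075), v=0.02, w=0.1, seed=0):
    """Simulate y_{1:T} from the AR(2) plus noise model of Sect. 4.1.2:
    x_t = phi_1 x_{t-1} + phi_2 x_{t-2} + w_t,  y_t = x_t + eps_t,
    with w_t ~ N(0, w) and eps_t ~ N(0, v)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(T + 2)                  # two presample values set to zero (assumption)
    for t in range(2, T + 2):
        x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.normal(0.0, np.sqrt(w))
    x = x[2:]
    y = x + rng.normal(0.0, np.sqrt(v), size=T)
    return y, x

y, x = simulate_ar2_plus_noise()
```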

Figure 4 displays 99% posterior credibility intervals and posterior means of (r1|y1:t), (r2|y1:t), (w|y1:t), and (v|y1:t), respectively, for t = 1 : 400. The left plots correspond to results obtained by applying the PL algorithm with M = 6000 particles and the right plots correspond to posterior summaries obtained from running the algorithm of Liu and West (2001) with M = 6000 particles and δ = 0.95. The plots show that the LW algorithm suffers from particle degeneracy much faster than the PL algorithm when the same number of particles is used in both algorithms. Figure 5 shows the effective sample sizes in % for both algorithms, which once again confirms that PL has a better performance than LW in this example.

4.1.3 Structured AR(3) plus noise model

Fig. 3 AR(2) plus noise—Sequential state learning. Sequential posterior mean and 99% credibility interval (gray lines) of (xt|y1:t). The black line is the true simulated xt

In this example, the PL algorithm described in Sect. 3 was applied to T = 250 observations simulated from an AR(3) plus noise model with three characteristic reciprocal roots: one real root with modulus r1 = −0.95 and a pair of complex roots with modulus r2 = 0.95 and wavelength λ2 = 16. The observation and system variances were set to v = 0.25 and w = 1. The prior distributions were truncated normal priors on the reciprocal roots given by r1 ∼ TN(r1|−0.5, 1, (−1, 0)) and r2 ∼ TN(r2|0.8, 1, (0.5, 1)), for the moduli of the real and complex characteristic roots, and λ2 ∼ TN(λ2|16, 2, (12, 20)) for the wavelength of the complex characteristic roots. Inverse-gamma priors were assumed on v and w, with v ∼ IG(αv, βv) and w ∼ IG(αw, βw), where αv = 2, βv = (αv − 1) × 0.25, αw = 2, and βw = (αw − 1) × 1. These priors correspond to approximate 99% prior credibility intervals of (0.03, 2.42) for v and (0.13, 9.68) for w.
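The quoted 99% prior intervals can be checked by simulation. The sketch below assumes the usual IG(α, β) parameterization with prior mean β/(α − 1), and equal-tailed intervals; both are assumptions consistent with the hyperparameter choices above rather than details stated in the paper.

```python
import numpy as np

def ig_quantiles(alpha, beta, probs, n=200000, seed=0):
    """Monte Carlo quantiles of an inverse-gamma IG(alpha, beta) prior,
    obtained as beta / Gamma(alpha, rate=1) draws."""
    rng = np.random.default_rng(seed)
    draws = beta / rng.gamma(alpha, 1.0, size=n)
    return np.quantile(draws, probs)

# v ~ IG(2, 0.25): approximately (0.03, 2.42); w ~ IG(2, 1): approximately (0.13, 9.68)
print(ig_quantiles(2.0, 0.25, [0.005, 0.995]).round(2))
print(ig_quantiles(2.0, 1.0, [0.005, 0.995]).round(2))
```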

Fig. 4 AR(2) plus noise—Comparing PL (left) and LW (right). The left plots show the sequential posterior means (black) and 99% credibility intervals (gray). Posterior summaries obtained by running the PL and LW algorithms are based on M = 6000 particles

Figure 6 shows the results obtained when the PL algorithm is applied to the simulated data with M = 2000 particles. Plots (a), (b), (c), (d) and (e) display, respectively, the distributions of (r1|y1:t), (r2|y1:t), (λ2|y1:t), p(w|y1:t) and p(v|y1:t). The dotted lines in pictures (a)–(e) represent the true values of r1, r2, λ2, w and v. Plot (f) shows the true xt (solid line) and the posterior mean of (xt|y1:t) (dotted line) for each t = 1 : T. From these figures it seems that the PL algorithm successfully learns about the fixed parameters in the model, however, in order to provide a more accurate assessment of its performance we proceed to compare PL-based results to results obtained using a MCMC algorithm.

Figures 7 and 8 illustrate and compare the performance of the PL algorithm with that of an MCMC algorithm that uses the forward filtering backward sampling (FFBS) scheme of Carter and Kohn (1994) and Frühwirth-Schnatter (1994). These figures show summaries of p(r1|y1:t), p(r2,λ2|y1:t) and p(v,w|y1:t) for t = 20 and t = 250 obtained from the PL algorithm with 2000 particles (Fig. 7), and posterior summaries obtained from 10000 MCMC iterations taken after 3000 burn-in iterations (Fig. 8). As shown in the figures, the PL algorithm performs well when compared to MCMC. Particle-based summaries of the various posterior distributions show that the PL scheme allows for the same type of learning that occurs with MCMC when additional data are received (compare summaries at t = 20 and t = 250 in both cases). Even though more accurate particle approximations (not shown) can be obtained when the number of particles is increased, the PL algorithm successfully achieves parameter learning and filtering in this example with a relatively small number of particles (M = 2000).

4.2 Analysis of EEG data

Fig. 5 AR(2) plus noise—Comparing PL and LW. Effective sample sizes (in %) for PL (black line) and LW (gray line) based on M = 6000 particles

Signals such as EEGs can be thought of as a superposition of unobserved latent components that represent brain activity in various frequency bands (Prado 2010a, 2010b). The solid black line in Fig. 9(a) displays a portion of a human EEG. The time series is part of a large data set that contains EEG traces recorded at 19 scalp locations on a patient who received electroconvulsive therapy as a treatment for major depression (for a description of the complete data see Prado et al. 2001 and references therein). The series was modelled in Prado and West (2010) using autoregressive and autoregressive moving average (ARMA) models. Specifically, it was shown that AR models of orders 8–12 could be used to describe the main features of the series, which shows a persistent latent quasiperiodic component of period in the 11.5–13.5 range with corresponding modulus in the 0.90–0.99 range, and several latent components with lower period and moduli. These relatively long order AR models act as approximations to more complex, but also more parsimonious, ARMA models. An exploratory search across ARMA(p, q) models with p, q ∈ {1,...,8} that uses the conditional and approximate log-likelihoods, and chooses the model orders via the Akaike information criterion, indicates that an ARMA(2, 2) is the optimal model within this class to describe the EEG series. Prado and West (2010) also present an analysis in which some of the components of an AR(8) are inverted to approximate ARMA(2, q) models, and find evidence supporting the choice of an ARMA(2, 2).

The AR plus noise models presented here provide a way to describe time series with an ARMA(p, p) structure. As shown in Granger and Morris (1976), the sum of an AR(p) plus white noise results in an ARMA(p, p). The AR plus noise representation is easier to interpret than the ARMA representation in that it separates the sources of error into two: the observational error, that captures measurement error, and the system level error. In addition, the structured priors presented here allow researchers to include scientifically meaningful information, and the SMC algorithms lead to on-line filtering and parameter learning within this class of flexible models.

We now show how the structured AR plus noise models and the SMC algorithms can be used to analyze the EEG series shown in Fig. 9(a).

Fig. 6 AR(3) plus noise. Estimates of the AR plus noise model parameters obtained by applying the PL algorithm described in Sect. 3 to the simulated data yt from an AR(3) plus noise model with one real reciprocal root r1 = −0.95 and a pair of complex reciprocal roots with modulus r2 = 0.95 and wavelength λ2 = 16. Plots (a), (b), (c), (d), and (e) show, respectively, estimates of p(r1|y1:t), p(r2|y1:t), p(λ2|y1:t), p(w|y1:t) and p(v|y1:t) based on M = 2000 particles. Plot (f) displays xt (dots) and estimates of the posterior mean of (xt|y1:t) (solid line) based on 2000 particles at each time t = 1 : 250

We fitted an AR(3) plus noise model to the data. We assumed that the AR(3) process xt had one pair of complex reciprocal roots with modulus r1 and period λ1, and one real root with modulus r2. Furthermore, we assumed the following prior structure: r1 ∼ U(0.5, 1), λ1 ∼ U(3, 20), r2 ∼ U(−1, 1), v ∼ IG(αv, βv) with αv = 50 and βv = (αv − 1) × 3500, and w ∼ IG(αw, βw) with αw = 2 and βw = (αw − 1) × 1. The priors for r1 and λ1 were chosen to reflect the fact that it is known that the data will show at least one rather persistent quasiperiodic component (i.e., r1 > 0.5), but there is a fair amount of uncertainty about the period of that component. A non-informative prior was chosen on r2. For the variances, based on previous analyses of similar data, we assume that the standard deviation at the observational level, √v, lies in the 50–70 microvolts range, while the system standard deviation √w was assumed to be centered at 1.0 with a relatively large dispersion a priori.

Plots (a), (b), and (c) in Fig. 10 show estimates of p(r1|y1:t), p(λ1|y1:t), and p(r2|y1:t) for t = 1 : 400 obtained by applying the PL algorithm described in Sect. 3 with M = 2000 particles. These plots display the posterior means and 95% posterior intervals. The histograms in (d), (e), and (f) correspond to the PL-based distributions of (r1|y1:400), (λ1|y1:400), and (r2|y1:400). The plots show that there is strong evidence of a persistent quasiperiodic component in xt with a modulus in the 0.9–1.0 range (posterior mean of 0.957), and period in the 11–14 range (posterior mean of 12.72). The 95% posterior interval for r2 based on M = 2000 particles is (−0.523, −0.121), and its posterior mean is −0.372, indicating that this is not a very persistent component. Under this model yt is assumed to have an ARMA(3, 3) structure, however, the posterior distribution of r2 suggests that an AR(2) plus noise model, which leads to an ARMA(2, 2) representation of yt, may be a better model.

Fig. 7 AR(3) plus noise—PL. Plots (a), (b) and (c) are based on draws from the PL approximations to p(r1|y1:20), p(r2,λ2|y1:20), and p(v,w|y1:20), respectively, while plots (d), (e) and (f) are based on draws from the PL approximations to p(r1|y1:250), p(r2,λ2|y1:250) and p(v,w|y1:250). These approximations are based on M = 2000 particles

Fig. 8 AR(3) plus noise—MCMC. Plots (a), (b) and (c) are based on draws from the FFBS (MCMC) approximations to p(r1|y1:20), p(r2,λ2|y1:20), and p(v,w|y1:20), respectively, while plots (d), (e) and (f) are based on draws from the FFBS (MCMC) approximations to p(r1|y1:250), p(r2,λ2|y1:250) and p(v,w|y1:250). These approximations are based on 10000 MCMC iterations obtained after a burn-in period of 3000 iterations

We fitted such a model and obtained similar posterior inference on (r1, λ1). Formal model comparison can be performed to choose between the AR(3) plus noise and the AR(2) plus noise models. We do not present such comparisons since the focus of the paper is on the SMC algorithms for filtering and parameter learning.

Plots (a)–(c) in Fig. 9 show various aspects of the fit of the AR(3) plus noise model to the EEG data. In particular, plot (a) shows the data, yt (solid black line), along with 5 simulated traces, yt^(*,m) (gray lines), from the AR(3) plus noise model. The traces were obtained via yt^(*,m) = xt^(m) + εt^(m), with εt^(m) ∼ N(0, vt^(m)), and with xt^(m) and vt^(m) randomly chosen values from the M = 2000 particle-based representation of (xt|y1:t) and (v|y1:t), respectively. Plot (b) displays the actual EEG data for t = 1 : 250 and a simulated trace for t = 251 : 400. These pictures show that the AR(3) plus noise model does well predictively since it is not easy to determine when the real data ends and when the simulated data begins. Figure 9(c) shows the PL-based posterior mean of (xt|y1:t) (black solid line), 95% posterior intervals for (yt*|y1:t) (black dotted lines), and the EEG data (dark gray dots and lines). All the observations except two (one around t = 250 and another right before t = 400) lie within the 95% posterior bands, indicating a good model fit. The model can be easily extended to handle outlying observations by changing the structure of the observational noise (e.g., Student-t, mixtures). See Lopes and Polson (2010) for PL applied to stochastic volatility models with Student-t errors.
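The posterior predictive traces just described are straightforward to generate from the particle output. The following sketch assumes the filtered states and variance draws are stored as T × M arrays; that storage layout and the function name are our assumptions, not part of the paper.

```python
import numpy as np

def simulate_predictive_traces(x_particles, v_particles, n_traces=5, seed=1):
    """Posterior predictive traces as in Fig. 9(a):
    y*_t^(m) = x_t^(m) + eps_t^(m), eps_t^(m) ~ N(0, v_t^(m)), with the
    state and variance values picked at random from the particle
    approximation at each time t."""
    rng = np.random.default_rng(seed)
    T, M = x_particles.shape
    traces = np.empty((n_traces, T))
    for i in range(n_traces):
        idx = rng.integers(0, M, size=T)          # one particle per time point
        x_draw = x_particles[np.arange(T), idx]
        v_draw = v_particles[np.arange(T), idx]
        traces[i] = x_draw + rng.normal(0.0, np.sqrt(v_draw))
    return traces
```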

Fig. 9 Analysis of EEG data. (a) A section of a human EEG trace (black line) and 5 simulated yt traces (gray lines) from the AR(3) plus noise model fitted with the PL algorithm with M = 2000 particles. (b) The time series shows the observed EEG trace from t = 1 : 250 and data simulated from the AR(3) plus noise model fitted with the PL algorithm for t = 251 : 400. (c) Estimates of the posterior mean of (xt|y1:t) based on the PL approximation with M = 2000 particles (black line). The graph also shows 95% intervals for (yt*|y1:t) obtained from the model based on the PL approximation (dotted lines); the actual EEG data are displayed in gray

The results based on the structured AR plus noise models summarized above are consistent with those obtained by Prado and West (2010) via long order AR models. The models presented here are more flexible since they capture additional structure (e.g. ARMA structure) that is at best approximated by long order AR models, and provide a way to incorporate prior information that is scientifically interpretable.

5 Discussion

We consider sequential Monte Carlo approaches to obtain simultaneous filtering and parameter estimation in state-space autoregressive models with structured priors on the AR coefficients. Specifically, algorithms based on the schemes proposed by Liu and West (2001), Storvik (2002), and Carvalho et al. (2010) are detailed and illustrated.

Similar to the approach of Huerta and West (1999b), priors are imposed on the moduli and wavelengths that define the reciprocal characteristic roots of the AR process at the system level. Such priors are important in practical scenarios since they allow for the incorporation of scientifically meaningful information (see examples in Huerta and West 1999a, 1999b, and Prado 2010b). Furthermore, structured AR plus noise models can be used to analyze data with an ARMA structure, as illustrated in the EEG data analysis of Sect. 4.2, providing a more interpretable representation of the ARMA structure and the capability of including informative priors.

Note that if εt = 0 for all t, the state-space AR model becomes a standard autoregression and in such case, the particle-based approaches provide alternative sequential methods for parameter learning in structured autoregressive models that can be used instead of MCMC approaches that are not suitable for real-time inference. In particular, one may choose to use a standard long order autoregressive model with structured priors instead of a low order AR plus noise model with structured priors to approximate a more complex ARMA model. Fitting AR models with no noise at the observational level has some advantages over fitting AR plus noise models when SMC approaches are used, since models with εt = 0 and constant parameters over time do not show the particle degeneracy issues that appear in state-space models with unknown parameters. On the other hand, prior elicitation and interpretation in high order AR models are much more cumbersome than prior elicitation and interpretation in low order AR plus noise models. In general, we suggest the use of low order AR plus noise models for relatively short time series—say, no more than 1000 observations—and high order standard AR models for long time series.

We applied the proposed algorithms to data simulated from various AR state-space models. In these simulation studies we found that algorithms based on the schemes of Storvik (2002) and Carvalho et al. (2010) had similar performances, leading to good particle-based approximations to the posterior distributions of the parameters and the states. Algorithms based on the approach of Liu and West (2001) are generally easier to implement in the structured AR modeling framework considered here, however, particle degeneracy was evident in many of the examples considered even when relatively large numbers of particles were used. Particle degeneracy is a problem all filters eventually have to deal with, including PL and Storvik filters. However, PL, Storvik and similar filters that replenish particles based on low dimensional vectors of model-specific sufficient statistics will generally delay particle degeneracy (see our discussion in Sect. 3.1).

Fig. 10 Analysis of EEG data. Results obtained from applying the PL algorithm to fit a state-space AR(3) model to the EEG data shown in Fig. 9

Acknowledgements The authors would like to thank the Statistical and Applied Mathematical Sciences Institute (SAMSI) for partially supporting this research via the 2008–09 Program on Sequential Monte Carlo Methods.

References

Cappé, O., Godsill, S., Moulines, E.: An overview of existing methods and recent advances in sequential Monte Carlo. IEEE Proc. Signal Process. 95, 899–924 (2007)
Carter, C., Kohn, R.: Gibbs sampling for state space models. Biometrika 81, 541–553 (1994)

Carvalho, C., Johannes, M., Lopes, H., Polson, N.: Particle learning and smoothing. Stat. Sci. 25, 88–106 (2010)
Chopin, N., Jacob, P.E., Papaspiliopoulos, O.: SMC2: A sequential Monte Carlo algorithm with particle Markov chain Monte Carlo updates. Technical report, ENSAE-CREST (2011)
Del Moral, P., Doucet, A., Jasra, A.: Sequential Monte Carlo samplers. J. R. Stat. Soc. B 68, 411–436 (2006)
Doucet, A., Johansen, A.M.: A tutorial on particle filtering and smoothing: Fifteen years later. In: Crisan, D., Rozovsky, B. (eds.) Oxford Handbook of Nonlinear Filtering. Oxford University Press, London (2011)
Fearnhead, P.: MCMC, sufficient statistics and particle filter. J. Comput. Graph. Stat. 11, 848–862 (2002)
Flury, T., Shephard, N.: Learning and filtering via simulation: smoothly jittered particle filters. Technical report, Oxford-Man Institute (2009)
Frühwirth-Schnatter, S.: Data augmentation and dynamic linear models. J. Time Ser. Anal. 15, 183–202 (1994)
Granger, C., Morris, M.J.: Time series modelling and interpretation. J. R. Stat. Soc. A 139(2), 246–257 (1976)
Huerta, G., West, M.: Bayesian inference on periodicities and component spectral structure in time series. J. Time Ser. Anal. 20, 401–416 (1999a)
Huerta, G., West, M.: Priors and component structures in autoregressive time series models. J. R. Stat. Soc. B 61, 881–899 (1999b)
Liu, J., West, M.: Combined parameter and state estimation in simulation-based filtering. In: Doucet, A., de Freitas, N., Gordon, N. (eds.) Sequential Monte Carlo Methods in Practice, pp. 197–223. Springer, Berlin (2001)
Liu, J.S.: Metropolized independent sampling with comparisons to rejection sampling and importance sampling. Stat. Comput. 6, 113–119 (1996)
Lopes, H.F., Carvalho, C.M., Johannes, M., Polson, N.G.: Particle learning for sequential Bayesian computation (with discussion). In: Bernardo, J.M., Bayarri, M.J., Berger, J.O., Dawid, A.P., Heckerman, D., Smith, A.F.M., West, M. (eds.) Bayesian Statistics 9, pp. 317–360. Oxford University Press, Oxford (2010)
Lopes, H.F., Polson, N.G.: Particle learning for fat-tailed distributions. Technical report, The University of Chicago Booth School of Business (2010)
Lopes, H.F., Tsay, R.S.: Particle filters and Bayesian inference in financial econometrics. J. Forecast. 30, 168–209 (2011)
Pitt, M., Shephard, N.: Filtering via simulation: Auxiliary variable particle filters. J. Am. Stat. Assoc. 94, 590–599 (1999)
Poyiadjis, G., Doucet, A., Singh, S.S.: Particle approximations of the score and observed information matrix in state space models with application to parameter estimation. Biometrika 98, 65–80 (2011)
Prado, R.: Characterization of latent structure in brain signals. In: Chow, S., Ferrer, E., Hsieh, F. (eds.) Statistical Methods for Modeling Human Dynamics, pp. 123–153. Routledge, Taylor and Francis, New York (2010a)
Prado, R.: Multi-state models for mental fatigue. In: O'Hagan, A., West, M. (eds.) The Handbook of Applied Bayesian Analysis. Oxford University Press, Oxford (2010b)
Prado, R., West, M.: Time Series: Modeling, Computation, and Inference. CRC Press, Chapman & Hall, Boca Raton (2010)
Prado, R., West, M., Krystal, A.: Multi-channel EEG analyses via dynamic regression models with time-varying lag/lead structure. J. R. Stat. Soc., Ser. C, Appl. Stat. 50, 95–109 (2001)
Storvik, G.: Particle filters for state-space models with the presence of unknown static parameters. IEEE Trans. Signal Process. 50, 281–289 (2002)
West, M.: Approximating posterior distributions by mixtures. J. R. Stat. Soc. B 55, 409–422 (1993)
West, M.: Time series decomposition. Biometrika 84, 489–494 (1997)