<<

arXiv:1312.5114v1 [math.ST] 18 Dec 2013

The Annals of Statistics
2013, Vol. 41, No. 6, 2877–2904
DOI: 10.1214/13-AOS1172
© Institute of Mathematical Statistics, 2013

A GENERAL THEORY OF PARTICLE FILTERS IN HIDDEN MARKOV MODELS AND SOME APPLICATIONS

By Hock Peng Chan¹ and Tze Leung Lai²
National University of Singapore and Stanford University

By making use of martingale representations, we derive the asymptotic normality of particle filters in hidden Markov models and a relatively simple formula for their asymptotic variances. Although repeated resamplings result in complicated dependence among the sample paths, the asymptotic variance formula and martingale representations lead to consistent estimates of the standard errors of the particle filter estimates of the hidden states.

Received May 2012; revised September 2013.
¹Supported by the National University of Singapore Grant R-155-000-120-112.
²Supported by NSF Grant DMS-11-06535.
AMS 2000 subject classifications. Primary 60F10, 65C05; secondary 60J22, 60K35.
Key words and phrases. Particle filter, importance sampling and resampling, standard error.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2013, Vol. 41, No. 6, 2877–2904. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Let {X_t, t ≥ 1} be a Markov chain and let Y_1, Y_2, ... be conditionally independent given X_1, X_2, ..., such that

(1.1)  X_1 ∼ p_1(·),  X_t | X_{t−1} ∼ p_t(·|X_{t−1}),  Y_t | X_t ∼ g_t(·|X_t),

in which p_t and g_t are density functions with respect to some measures ν_X and ν_Y. Gordon, Salmond and Smith [11] introduced particle filters for Monte Carlo estimation of E[ψ(X_t) | Y_1, ..., Y_t]. More generally, letting X_T = (X_1, ..., X_T) and Y_T = (Y_1, ..., Y_T), we consider estimation of ψ_T := E[ψ(X_T) | Y_T], where ψ is a measurable real-valued function of X_T. The density function of X_T given Y_T is

(1.2)  p̃_T(x_T | Y_T) ∝ ∏_{t=1}^T [p_t(x_t | x_{t−1}) g_t(Y_t | x_t)],

in which p_1(x_1 | x_0) denotes the initial density p_1(x_1). However, this conditional distribution is often difficult to sample from, and the normalizing constant is also difficult to compute for high-dimensional
60J22, secondary 65C05; 60F10, Primary t [ t − t Y ψ =1 t , T 1 in ( ) X ≥ [ X Y , p h naso Statistics of Annals The 1 1 t T 1 uhthat such , atcefilters particle ( n z en Lai Leung Tze and ) } x | t Y eaMro hi n let and chain Markov a be | x T t t − .Tedniyfnto of function density The ]. ∼ 1 ) g g t t ( ( ·| Y X t | t x ) o ot al esti- Carlo Monte for t , p )] 1 5-000-120-112. . ( X rr fthe of rrors · of ) apefo and from sample , Although t ( = 2 gthe ng erep- le symp- da nd X -dimensional X 1 1 Gordon, . Y X , . . . , X 1 Y , T 2 we , . . . , X ard t ), T 2 H. P. CHAN AND T. L. LAI or complicated state spaces. Particle filters that use sequential Monte Carlo (SMC) methods involving importance sampling and resampling have been developed to circumvent this difficulty. Asymptotic normality of the particle filter estimate of ψT when the number of simulated trajectories becomes infinite has also been established [6, 8, 9, 12]. However, no explicit formulas are available to estimate the standard error of ψˆT consistently. In this paper, we provide a comprehensive theory of the SMC estimate ψˆT , which includes asymptotic normality and consistent standard error es- timation. The main results are stated in Theorems 1 and 2 in Section 2 for the case in which bootstrap resampling is used, and extensions are given in Section 4 to residual Bernoulli (instead of bootstrap) resampling. The proof of Theorem 1 for the case where bootstrap resampling is used at ev- ery stage, given in Section 3.2, proceeds in two steps. First we assume that the normalizing constants [i.e., constants of proportionality in (1.2)] are eas- ily computable. We call this the “basic prototype,” for which we derive in Section 3.1 martingale representations of m(ψ˜ ψ ) for novel SMC esti- T − T mates ψ˜T that involve likelihood ratios and general resampling weights. We first encountered this prototype in connection with rare event simulation using SMC methods in [5]. 
Although traditional particle filters also use similar sequential importance sampling procedures and resampling weights that are proportional to the likelihood ratio statistics, the estimates of ψ_T are coarser averages that do not have this martingale property. Section 3.2 gives an asymptotic analysis of m(ψ̂_T − ψ_T) as m → ∞ in this case, by making use of the results in Section 3.1.

In contrast to Gilks and Berzuini [10], who use particle set sizes m_t which increase with t and approach ∞ in a certain way to guarantee consistency of their standard error estimate, which differs from ours, we use the same particle set size m for every t, where m is the number of SMC trajectories. As noted on page 132 of [10], "the mode of convergence of the theorem is not directly relevant to the practical context" in which m_1 = m_2 = ··· = m. A major reason why we are able to overcome this difficulty in developing a consistent standard error estimate lies in the martingale approximation for our SMC estimate ψ̂_T. We note in this connection that for the basic prototype, although both martingale representations of m(ψ̃_T − ψ_T) developed in Section 3.1 can be used to prove the asymptotic normality of ψ̃_T as the number m of SMC trajectories approaches ∞, only one of them leads to an estimable expression for the asymptotic variance. Further discussion of the main differences between the traditional and our particle filters is given in Section 5.

2. Standard error estimation for bootstrap filters. Since Y_1, Y_2, ... are the observed data, we will treat them as constants in the sequel. Let q_t(·|x_{t−1}) be a conditional density function with respect to ν_X such that q_t(x_t|x_{t−1}) > 0 whenever p_t(x_t|x_{t−1}) > 0. In the case t = 1, q_t(·|x_{t−1}) = q_1(·).

2.1. Bootstrap resampling at every stage. Let

(2.1)  w_t(x_t) = p_t(x_t|x_{t−1}) g_t(Y_t|x_t) / q_t(x_t|x_{t−1}).

To estimate ψ_T, a particle filter first generates m conditionally independent random variables X̃_t^1, ..., X̃_t^m at stage t, with X̃_t^i having density q_t(·|X_{t−1}^i), to form X̃_t^i = (X_{t−1}^i, X̃_t^i), and then uses the normalized resampling weights

(2.2)  W_t^i = w_t(X̃_t^i) / Σ_{j=1}^m w_t(X̃_t^j)

to draw m sample paths X_t^j, 1 ≤ j ≤ m, from {X̃_t^i, 1 ≤ i ≤ m}. Note that the W_t^i are the importance weights attached to the X̃_t^i and that after resampling the X_t^i have equal weights. The following recursive algorithm can be used to implement particle filters with bootstrap resampling. It generates not only the X̃_T^i but also the ancestral origins A_{T−1}^i that are used to compute the standard error estimates. It also computes H̃_t^i and H_t^i recursively, where

(2.3)  H̃_t^i = w̄_1 ··· w̄_t h_t(X̃_t^i),  H_t^i = w̄_1 ··· w̄_t h_t(X_t^i),
       w̄_k = Σ_{j=1}^m w_k(X̃_k^j)/m,  h_t(x_t) = 1 / ∏_{k=1}^t w_k(x_k).

Initialization: let A_0^i = i and H_0^i = 1 for all 1 ≤ i ≤ m.

Importance sampling at stage t = 1, ..., T: generate conditionally independent X̃_t^i from q_t(·|X_{t−1}^i), 1 ≤ i ≤ m.

Bootstrap resampling at stage t = 1, ..., T − 1: generate i.i.d. random variables B_t^1, ..., B_t^m such that P(B_t^1 = j) = W_t^j and let

(X_t^j, A_t^j, H_t^j) = (X̃_t^{B_t^j}, A_{t−1}^{B_t^j}, H̃_t^{B_t^j})  for 1 ≤ j ≤ m.

The SMC estimate is defined by

(2.4)  ψ̂_T = (m w̄_T)^{−1} Σ_{i=1}^m ψ(X̃_T^i) w_T(X̃_T^i).

Let E_{p̃_T} denote expectation with respect to the probability measure under which X_T has density (1.2), and let E_q denote that under which X_t | X_{t−1} has the conditional density function q_t for all 1 ≤ t ≤ T. In Section 3.2 we prove the following theorem on the asymptotic normality of ψ̂_T and the consistency of its standard error estimate. Define

η_t = E_q[∏_{k=1}^t w_k(X_k)],
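In code, one pass of this algorithm alternates an importance-sampling draw with a multinomial (bootstrap) resampling step, carrying the ancestral origins A_t^i along. The following is a minimal sketch, not the authors' implementation, for a model with scalar states; `q_sample` and `weight` are user-supplied stand-ins for q_t and the weight function w_t of (2.1):

```python
import numpy as np

def bootstrap_filter(T, m, q_sample, weight, psi, rng=None):
    """Bootstrap particle filter with ancestral-origin tracking.

    q_sample(t, x_prev, rng): draws the m values X~_t^i from q_t(.|x_prev);
    weight(t, x_new, x_prev): evaluates the weights w_t of (2.1);
    psi: real-valued function of a complete path (length-T array).
    Returns the SMC estimate (2.4) together with the final-stage
    psi values, weights and ancestral origins A_{T-1}^i.
    """
    rng = np.random.default_rng() if rng is None else rng
    paths = np.zeros((m, 0))
    anc = np.arange(m)                      # A_0^i = i
    for t in range(1, T + 1):
        x_prev = paths[:, -1] if t > 1 else np.full(m, np.nan)
        x_new = q_sample(t, x_prev, rng)    # importance sampling at stage t
        paths = np.column_stack([paths, x_new])
        w = weight(t, x_new, x_prev)        # unnormalized weights w_t
        if t < T:                           # bootstrap resampling at stage t
            idx = rng.choice(m, size=m, p=w / w.sum())
            paths, anc = paths[idx], anc[idx]
    psi_vals = np.array([psi(p) for p in paths])
    psi_hat = np.sum(w * psi_vals) / w.sum()   # eq. (2.4)
    return psi_hat, psi_vals, w, anc
```

For instance, with q_t = p_t the weights reduce to w_t = g_t(Y_t|x_t), which is the classical bootstrap filter of [11].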

(2.5)  h_t*(x_t) = η_t / ∏_{k=1}^t w_k(x_k),  ζ_t = η_{t−1}/η_t,
       Γ_t(x_t) = ∏_{k=1}^t [w_k(x_k) + w_k²(x_k)],

with the convention η_0 = 1 and h_0* ≡ Γ_0 ≡ 1. Let

(2.6)  L_T(x_T) = p̃_T(x_T | Y_T) / ∏_{t=1}^T q_t(x_t|x_{t−1}).

Let f_0 = 0 and define for 1 ≤ t ≤ T,

(2.7)  f_t(x_t) [= f_{t,T}(x_t)] = E_q{[ψ(X_T) − ψ_T] L_T(X_T) | X_t = x_t}.

Theorem 1. Assume that σ² < ∞, where σ² = Σ_{k=1}^{2T−1} σ_k² and

(2.8)  σ_{2t−1}² = E_q{[f_t²(X_t) − f_{t−1}²(X_{t−1})] h_{t−1}*(X_{t−1})},
       σ_{2t}² = E_q[f_t²(X_t) h_t*(X_t)].

Then √m(ψ̂_T − ψ_T) ⇒ N(0, σ²). Moreover, if E_q Γ_T(X_T) < ∞, then σ̂²(ψ̂_T) →p σ², where for any real number µ,

(2.9)  σ̂²(µ) = m^{−1} Σ_{j=1}^m { Σ_{i : A_{T−1}^i = j} (w_T(X̃_T^i)/w̄_T) [ψ(X̃_T^i) − µ] }².

2.2. Extension to occasional resampling. Due to the computational cost of resampling and the variability introduced by the resampling step, occasional resampling has been recommended, for example, by Liu (Chapter 3.4.4 of [13]), who proposes to resample at stage k only when cv_k, the coefficient of variation of the resampling weights at stage k, exceeds some threshold. Let τ_1 < τ_2 < ··· < τ_r be the successive resampling times and let τ̃(k) be the most recent resampling time before stage k, with τ̃(k) = 0 if no resampling has been carried out before time k. Define

(2.10)  v_k(x_k) = ∏_{t=τ̃(k)+1}^k w_t(x_t),  V_k^i = v_k(X̃_k^i)/(m v̄_k),

where v̄_k = m^{−1} Σ_{j=1}^m v_k(X̃_k^j). Note that if bootstrap resampling is carried out at stage k, then V_k^i, 1 ≤ i ≤ m, are the resampling weights. Define the SMC estimator

(2.11)  ψ̂_OR = Σ_{i=1}^m V_T^i ψ(X̃_T^i).
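In code, the estimate (2.9) amounts to grouping the final-stage particles by ancestral origin, summing the weighted deviations within each group, and averaging the squared group sums. A minimal sketch (assuming arrays as produced by a bootstrap filter as in Section 2.1: `psi_vals[i]` = ψ(X̃_T^i), `w[i]` = w_T(X̃_T^i), `anc[i]` = A_{T−1}^i):

```python
import numpy as np

def sigma2_hat(psi_vals, w, anc, mu):
    """Variance estimate (2.9):
    sigma^2(mu) = m^{-1} sum_j ( sum_{i: A_{T-1}^i = j}
                                 (w_T^i / wbar_T) * [psi^i - mu] )^2.
    """
    m = len(psi_vals)
    contrib = (w / w.mean()) * (psi_vals - mu)       # per-particle terms
    group_sums = np.bincount(anc, weights=contrib, minlength=m)
    return np.sum(group_sums ** 2) / m
```

The estimated standard error of ψ̂_T is then `sqrt(sigma2_hat(psi_vals, w, anc, psi_hat) / m)`; when every particle is its own ancestor and the weights are equal, the formula reduces to the ordinary second sample moment about µ.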

In Section 4.3, we show that the resampling times, which resample at stage k when cv_k² exceeds some threshold c, satisfy

(2.12)  r →p r*  and  τ_s →p τ_s*  for 1 ≤ s ≤ r*  as m → ∞,

for some nonrandom positive integers r*, τ_1*, ..., τ_{r*}*. In Section 4.2 we prove the following extension of Theorem 1 to ψ̂_OR and give an explicit formula (4.6), which is assumed to be finite, for the limiting variance σ_OR² of √m(ψ̂_OR − ψ_T).

Theorem 2. Under (2.12), √m(ψ̂_OR − ψ_T) ⇒ N(0, σ_OR²) as m → ∞. Moreover, if E_q Γ_T(X_T) < ∞, then σ̂_OR²(ψ̂_OR) →p σ_OR², where for any real number µ,

σ̂_OR²(µ) = m^{−1} Σ_{j=1}^m { Σ_{i : A_{τ_r}^i = j} (v_T(X̃_T^i)/v̄_T) [ψ(X̃_T^i) − µ] }².

2.3. A numerical study. Yao [15] has derived explicit formulas for ψ_T = E(X_T | Y_T) in the normal shift model {(X_t, Y_t) : t ≥ 1} with the unobserved states X_t generated recursively by

X_t = X_{t−1} with probability 1 − ρ,  X_t = Z_t with probability ρ,

where 0 < ρ < 1 is given and Z_t ∼ N(0, ξ) is independent of X_{t−1}. The observations are Y_t = X_t + ε_t, in which the ε_t are i.i.d. standard normal, 1 ≤ t ≤ T. Instead of X_T, we simulate using the bootstrap filter the change-point indicators I_T := (I_1, ..., I_T), where I_t = 1_{X_t ≠ X_{t−1}} for t ≥ 2 and I_1 = 1; this technique of simulating E(X_T | Y_T) via realizations of E(X_T | I_T, Y_T) is known as Rao–Blackwellization. Let C_t = max{j ≤ t : I_j = 1} denote the most recent change-point up to time t, λ_t = (t − C_t + 1 + ξ^{−1})^{−1} and µ_t = λ_t Σ_{i=C_t}^t Y_i. It can be shown that E(X_T | I_T, Y_T) = µ_T and that for t ≥ 2, P{I_t = 1 | I_{t−1}, Y_t} = a(ρ)/{a(ρ) + b(ρ)}, where a(ρ) = ρ φ_{0,1+ξ}(Y_t) and b(ρ) = (1 − ρ) φ_{µ_{t−1},1+λ_{t−1}}(Y_t), in which φ_{µ,V} denotes the N(µ, V) density function.

The results in Table 1 are based on a single realization of Y_1000 with ξ = 1 and ρ = 0.01. We use particle filters to generate {I_1000^i : 1 ≤ i ≤ m} for m = 10,000 and compute ψ̂_OR for T = 200, 400, 600, 800, 1000, using bootstrap resampling when cv_t² ≥ c, for various thresholds c.
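The quantities used in this study are all available in closed form, which is what makes this model a convenient test bed. A sketch of the data-generating model and the recursions quoted above (our transcription of the formulas, not the authors' code; indices are 0-based):

```python
import numpy as np
from math import sqrt, pi, exp

def simulate(T, rho, xi, rng):
    """Normal shift model: X_t stays put w.p. 1 - rho, else jumps to
    Z_t ~ N(0, xi); the observations are Y_t = X_t + N(0, 1) noise."""
    x = np.empty(T)
    x[0] = rng.normal(0.0, sqrt(xi))
    for t in range(1, T):
        x[t] = rng.normal(0.0, sqrt(xi)) if rng.random() < rho else x[t - 1]
    return x, x + rng.standard_normal(T)

def lam_mu(I, y, xi):
    """lambda_t = (t - C_t + 1 + 1/xi)^{-1}, mu_t = lambda_t * sum_{i=C_t}^t Y_i,
    with C_t the most recent change-point (0-based index here)."""
    t = len(y) - 1
    C = max(j for j in range(t + 1) if I[j])
    lam = 1.0 / (t - C + 1 + 1.0 / xi)
    return lam, lam * sum(y[C:t + 1])

def p_change(rho, xi, y_t, mu_prev, lam_prev):
    """P(I_t = 1 | I_{t-1}, Y_t) = a/(a+b) with a = rho*phi_{0,1+xi}(Y_t)
    and b = (1-rho)*phi_{mu_{t-1},1+lambda_{t-1}}(Y_t)."""
    phi = lambda v, mean, var: exp(-(v - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)
    a = rho * phi(y_t, 0.0, 1.0 + xi)
    b = (1.0 - rho) * phi(y_t, mu_prev, 1.0 + lam_prev)
    return a / (a + b)
```

With these pieces, a bootstrap filter over the indicator sequences I_T only has to propagate (C_t, λ_t, µ_t) per particle, which is the point of the Rao–Blackwellization.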
The values of σ̂_OR(ψ̂_OR) in Table 1 suggest that the variability of ψ̂_OR is minimized when c = 2. Among the 40 simulated values of ψ̂_OR in Table 1, 24 fall within one estimated standard error [= σ̂_OR(ψ̂_OR)/√m] and 39 within two estimated standard errors of ψ_T. This agrees well with the corresponding numbers 27 and 39, respectively, given by the limiting normal distribution in Theorem 2. Table 2

Table 1
Simulated values of ψ̂_OR ± σ̂_OR(ψ̂_OR)/√m. An italicized value denotes that ψ_T is outside the interval but within two standard errors of ψ̂_OR; a boldfaced value denotes that ψ_T is more than two standard errors away from ψ̂_OR

c     T = 200            T = 400            T = 600            T = 800           T = 1000
∞    −0.3718 ± 0.0067   −0.1791 ± 0.0281   −0.3022 ± 0.0136   0.5708 ± 0.0380   −0.5393 ± 0.0180
50   −0.3592 ± 0.0019   −0.1453 ± 0.0056   −0.2768 ± 0.0020   0.6004 ± 0.0030   −0.5314 ± 0.0024
10   −0.3622 ± 0.0019   −0.1421 ± 0.0032   −0.2802 ± 0.0029   0.6013 ± 0.0024   −0.5287 ± 0.0015
5    −0.3558 ± 0.0018   −0.1382 ± 0.0027   −0.2777 ± 0.0026   0.6004 ± 0.0022   −0.5321 ± 0.0012
2    −0.3584 ± 0.0013   −0.1421 ± 0.0024   −0.2798 ± 0.0018   0.6030 ± 0.0020   −0.5316 ± 0.0008
1    −0.3615 ± 0.0013   −0.1402 ± 0.0027   −0.2761 ± 0.0019   0.5997 ± 0.0019   −0.5291 ± 0.0014
0.5  −0.3610 ± 0.0016   −0.1488 ± 0.0036   −0.2785 ± 0.0020   0.6057 ± 0.0021   −0.5315 ± 0.0014
0    −0.3570 ± 0.0032   −0.1485 ± 0.0067   −0.2759 ± 0.0033   0.6033 ± 0.0031   −0.5301 ± 0.0047

ψ_T  −0.3590            −0.1415            −0.2791            0.6021            −0.5302

reports a larger simulation study involving 500 realizations of Y_1000, with an independent run of the particle filter for each realization. The table shows, for each T and with resampling threshold c = 2, the fraction of confidence intervals containing ψ_T. The results for σ̂_OR(ψ̂_OR) agree well with the coverage probabilities of 0.683 and 0.954 given by the limiting normal distribution in Theorem 2. Table 2 also shows that the Gilks–Berzuini estimator V̂_T described in Section 5 is very conservative.

3. Martingales and proof of Theorem 1. In this section we first consider in Section 3.1 the basic prototype mentioned in the second paragraph of Section 1, for which m(ψ̃_T − ψ_T) can be expressed as a sum of martingale differences for a new class of particle filters. We came up with this martingale representation, which led to a standard error estimate for α̃_T, in [5], where we used a particle filter α̃_T to estimate the probability α of a rare event that X_T belongs to Γ. Unlike the present setting of HMMs, the X_T

Table 2
Fraction of confidence intervals ψ̂_OR ± σ̂/√m containing the true mean ψ_T

        σ̂² = σ̂_OR²(ψ̂_OR)       σ̂² = V̂_T
T       1 se      2 se        1 se      2 se
200     0.644     0.956       0.980     1.000
400     0.652     0.948       0.998     1.000
600     0.674     0.958       1.000     1.000
800     0.716     0.974       1.000     1.000
1000    0.650     0.958       1.000     1.000

is not a vector of hidden states in [5], where we showed that m(α̃_T − α) is a martingale, thereby proving the unbiasedness of α̃_T and deriving its standard error. In Section 3.1 we extend the arguments for m(α̃_T − α) in [5] to m(ψ̃_T − ψ_T) under the assumption that the constants of proportionality in (1.2) are easily computable, which we have called the "basic prototype" in Section 1. We then apply the results of Section 3.1 to develop in Section 3.2 martingale approximations for traditional particle filters and use them to prove Theorem 1.

3.1. Martingale representations for the basic prototype. We consider here the basic prototype, which requires T to be fixed, in a more general setting than HMM. Let (X_T, Y_T) be a general random vector, with only Y_T observed, such that the conditional density p̃_T(X_T | Y_T) of X_T given Y_T is readily computable. As in Section 2, since Y_T represents the observed data, we can treat it as constant and simply denote p̃_T(x_T | Y_T) by p̃_T(x_T). The bootstrap resampling scheme in Section 2.1 can be applied to this general setting and we can use general weight functions w_t(x_t) > 0, of which (2.1) is a special case for the HMM (1.1). Because p̃_T(x_T) is readily computable, so is the likelihood ratio L_T(x_T) = p̃_T(x_T) / ∏_{t=1}^T q_t(x_t | x_{t−1}). This likelihood ratio plays a fundamental role in the theory of the particle filter estimate α̃_T of the rare event probability α in [5]. An unbiased estimate of ψ_T under the basic prototype is

(3.1)  ψ̃_T = m^{−1} Σ_{i=1}^m L_T(X̃_T^i) ψ(X̃_T^i) H_{T−1}^i,

where H_t^i is defined in (2.3). The unbiasedness follows from the fact that m(ψ̃_T − ψ_T) can be expressed as a sum of martingale differences. There are in fact two martingale representations of m(ψ̃_T − ψ_T). One is closely related to that of Del Moral (Chapter 9 of [8]); see (3.11) below. The other is related to the variance estimate σ̂²(ψ̂_T) in Theorem 1 and is given in Lemma 1. Recall the meaning of the notation E_q introduced in the second paragraph of Section 2.1. Let #_k^i denote the number of copies of X̃_k^i generated from {X̃_k^1, ..., X̃_k^m} to form the m particles in the kth generation. Then, conditionally, (#_k^1, ..., #_k^m) ∼ Multinomial(m, W_k^1, ..., W_k^m).

Lemma 1. Let f̃_0 = ψ_T and define for 1 ≤ t ≤ T,

(3.2)  f̃_t(x_t) = E_q[ψ(X_T) L_T(X_T) | X_t = x_t].

Then f̃_t(x_t) = E_q[f̃_T(X_T) | X_t = x_t] and

(3.3)  m(ψ̃_T − ψ_T) = Σ_{j=1}^m (ε_1^j + ··· + ε_{2T−1}^j),

where

(3.4)  ε_{2t−1}^j = Σ_{i : A_{t−1}^i = j} [f̃_t(X̃_t^i) − f̃_{t−1}(X_{t−1}^i)] H_{t−1}^i,
       ε_{2t}^j = Σ_{i : A_{t−1}^i = j} (#_t^i − m W_t^i) [f̃_t(X̃_t^i) H̃_t^i − f̃_0].

Moreover, for each fixed j, {ε_k^j, F_k, 1 ≤ k ≤ 2T − 1} is a martingale difference sequence, where

(3.5)  F_{2t−1} = σ({X̃_1^i : 1 ≤ i ≤ m} ∪ {(X_s^i, X̃_{s+1}^i, A_s^i) : 1 ≤ s < t, 1 ≤ i ≤ m}),
       F_{2t} = σ(F_{2t−1} ∪ {(X_t^i, A_t^i) : 1 ≤ i ≤ m}).

Proof. Recalling that the "first generation" of the m particles consists of X̃_1^1, ..., X̃_1^m (before resampling), A_t^i = j if the first component of X̃_t^i is X̃_1^j. Thus, A_t^i represents the "ancestral origin" of the "genealogical particle" X̃_t^i. It follows from (2.2), (2.3) and simple algebra that for 1 ≤ i ≤ m,

(3.6)  m W_t^i = H_{t−1}^i / H̃_t^i,

(3.7)  Σ_{i : A_t^i = j} f̃_t(X_t^i) H_t^i = Σ_{i : A_{t−1}^i = j} #_t^i f̃_t(X̃_t^i) H̃_t^i,

noting that X̃_t^i has the same first component as X_{t−1}^i. Multiplying (3.6) by f̃_t(X̃_t^i) H̃_t^i and combining with (3.7) yield

(3.8)  Σ_{t=1}^T Σ_{i : A_{t−1}^i = j} [f̃_t(X̃_t^i) − f̃_{t−1}(X_{t−1}^i)] H_{t−1}^i
         + Σ_{t=1}^{T−1} Σ_{i : A_{t−1}^i = j} (#_t^i − m W_t^i) f̃_t(X̃_t^i) H̃_t^i
         = Σ_{i : A_{T−1}^i = j} f̃_T(X̃_T^i) H_{T−1}^i − f̃_0,

recalling that A_0^i = i and H_0^i = 1. Since f̃_0 = ψ_T and f̃_T(x_T) = ψ(x_T) L_T(x_T) by (3.2), it follows from (3.1) that

(3.9)  m(ψ̃_T − ψ_T) = Σ_{i=1}^m f̃_T(X̃_T^i) H_{T−1}^i − m f̃_0.

Since Σ_{i=1}^m #_t^i = Σ_{i=1}^m m W_t^i = m for 1 ≤ t ≤ T − 1, combining (3.9) with (3.8) yields (3.3).

Let E_m denote expectation under the probability measure induced by the random variables in the SMC algorithm with m particles. Then

E_m(ε_2^j | F_1) = E_m{(#_1^j − m W_1^j) | X̃_1^1, ..., X̃_1^m} [f̃_1(X̃_1^j) H̃_1^j − f̃_0] = 0

since E_m(#_1^j | F_1) = m W_1^j. More generally, the conditional distribution of (X_t^1, ..., X_t^m) given F_{2t−1} is that of m i.i.d. random vectors which take the value X̃_t^j with probability W_t^j. Moreover, the conditional distribution of (X̃_t^1, ..., X̃_t^m) given F_{2(t−1)} is that of m independent random variables such that X̃_t^i has density function q_t(·|X_{t−1}^i) and, therefore, E_m[f̃_t(X̃_t^i) | F_{2(t−1)}] = f̃_{t−1}(X_{t−1}^i) by (3.2) and the tower property of conditional expectations. Hence, in particular,

E_m(ε_3^j | F_2) = Σ_{i : A_1^i = j} {E_m[f̃_2(X̃_2^i) | X_1^i] − f̃_1(X_1^i)} H_1^i = 0.

Proceeding inductively in this way shows that {ε_k^j, F_k, 1 ≤ k ≤ 2T − 1} is a martingale difference sequence for all j. □

Without tracing their ancestral origins as in (3.4), we can also use the successive generations of the m particles to form martingale differences directly and thereby obtain another martingale representation of m(ψ̃_T − ψ_T). Specifically, the preceding argument also shows that {(Z_k^1, ..., Z_k^m), F_k, 1 ≤ k ≤ 2T − 1} is a martingale difference sequence, where

(3.10)  Z_{2t−1}^i = [f̃_t(X̃_t^i) − f̃_{t−1}(X_{t−1}^i)] H_{t−1}^i,
        Z_{2t}^i = f̃_t(X̃_t^i) H̃_t^i − Σ_{j=1}^m W_t^j f̃_t(X̃_t^j) H̃_t^j.

Moreover, Z_k^1, ..., Z_k^m are conditionally independent given F_{k−1}. It follows from (3.1), (3.6), (3.10) and an argument similar to (3.8) that

(3.11)  m(ψ̃_T − ψ_T) = Σ_{k=1}^{2T−1} (Z_k^1 + ··· + Z_k^m).

This martingale representation yields the limiting normal distribution of √m(ψ̃_T − ψ_T) for the basic prototype in the following.

Corollary 1. Let σ_C² = σ̃_1² + ··· + σ̃_{2T−1}², where

(3.12)  σ̃_k² = E_q{[f̃_t²(X_t) − f̃_{t−1}²(X_{t−1})] h_{t−1}*(X_{t−1})}  if k = 2t − 1,
        σ̃_k² = E_q{[f̃_t(X_t) h_t*(X_t) − f̃_0]² / h_t*(X_t)}  if k = 2t.

Assume that σ_C² < ∞. Then as m → ∞,

(3.13)  √m(ψ̃_T − ψ_T) ⇒ N(0, σ_C²).

The proof of Corollary 1 is given in the Appendix. The main result of this section is consistent estimation of σ_C² in the basic prototype. Define for every real number µ,

(3.14)  σ̃²(µ) = m^{−1} Σ_{j=1}^m { Σ_{i : A_{T−1}^i = j} L_T(X̃_T^i) ψ(X̃_T^i) H_{T−1}^i
          − [1 + Σ_{t=1}^{T−1} Σ_{i : A_{t−1}^i = j} (#_t^i − m W_t^i)] µ }²,

and note that by (3.4) and (3.8),

σ̃²(ψ_T) = m^{−1} Σ_{j=1}^m (ε_1^j + ··· + ε_{2T−1}^j)².

We next show that σ̃²(ψ_T) →p σ_C² by making use of the following two lemmas, which are proved in the Appendix. Since for every fixed T, ψ̃_T = ψ_T + O_p(1/√m) as m → ∞ by (3.13), it then follows that σ̃²(ψ̃_T) is also consistent for σ_C².

Lemma 2. Let 1 ≤ t ≤ T and let G be a measurable real-valued function on the state space. Define h_t*, η_t and ζ_t as in (2.5).

(i) If E_q[|G(X_t)| / h_{t−1}*(X_{t−1})] < ∞, then as m → ∞,

(3.15)  m^{−1} Σ_{i=1}^m G(X̃_t^i) →p E_q[G(X_t) / h_{t−1}*(X_{t−1})].

(ii) If E_q[|G(X_t)| / h_t*(X_t)] < ∞, then as m → ∞,

(3.16)  m^{−1} Σ_{i=1}^m G(X_t^i) →p E_q[G(X_t) / h_t*(X_t)].

In particular, for the special case G = w_t, (3.15) yields w̄_t →p ζ_t^{−1} and, hence,

(3.17)  H̃_t^i / h_t*(X̃_t^i) = H_t^i / h_t*(X_t^i) = η_t^{−1} w̄_1 ··· w̄_t →p 1  as m → ∞.

Moreover, if E_q[|G(X_t)| / h_{t−1}*(X_{t−1})] < ∞, then applying (3.15) to |G(·)| 1{|G(·)| > M} and letting M → ∞ yield

(3.18)  m^{−1} Σ_{i=1}^m |G(X̃_t^i)| 1{|G(X̃_t^i)| > ε√m} →p 0  as m → ∞ for every ε > 0.

Lemma 3. Let 1 ≤ t ≤ T and let G be a measurable nonnegative-valued function on the state space.

(i) If E_q[G(X_t) / h_{t−1}*(X_{t−1})] < ∞, then as m → ∞,

(3.19)  m^{−1} max_{1 ≤ j ≤ m} Σ_{i : A_{t−1}^i = j} G(X̃_t^i) →p 0.

(ii) If E_q[G(X_t) / h_t*(X_t)] < ∞, then as m → ∞,

(3.20)  m^{−1} max_{1 ≤ j ≤ m} Σ_{i : A_t^i = j} G(X_t^i) →p 0.

Corollary 2. Suppose σ_C² < ∞. Then σ̃²(ψ_T) →p σ_C² as m → ∞.

Proof. By (3.4) and (3.8),

Σ_{k=1}^{2T−1} ε_k^j = Σ_{i : A_{T−1}^i = j} f̃_T(X̃_T^i) H_{T−1}^i − [1 + Σ_{t=1}^{T−1} Σ_{i : A_{t−1}^i = j} (#_t^i − m W_t^i)] f̃_0.

We make use of Lemmas 2 and 3, (3.12) and mathematical induction to show that for all 1 ≤ k ≤ 2T − 1,

(3.21)  m^{−1} Σ_{j=1}^m (ε_1^j + ··· + ε_k^j)² →p Σ_{ℓ=1}^k σ̃_ℓ².

By the weak law of large numbers, (3.21) holds for k = 1. Next assume that (3.21) holds for k = 2t and consider the expansion

(3.22)  m^{−1} Σ_{j=1}^m [(ε_1^j + ··· + ε_{k+1}^j)² − (ε_1^j + ··· + ε_k^j)²]
          = m^{−1} Σ_{j=1}^m (ε_{k+1}^j)² + 2 m^{−1} Σ_{j=1}^m (ε_1^j + ··· + ε_k^j) ε_{k+1}^j.

Let C_t^j = {(i, ℓ) : i ≠ ℓ and A_t^i = A_t^ℓ = j}. Suppressing the subscript t, let U_i = [f̃_{t+1}(X̃_{t+1}^i) − f̃_t(X_t^i)] H_t^i. Then ε_{2t+1}^j = Σ_{i : A_t^i = j} U_i and

(3.23)  m^{−1} Σ_{j=1}^m (ε_{2t+1}^j)² = m^{−1} Σ_{i=1}^m U_i² + m^{−1} Σ_{j=1}^m Σ_{(i,ℓ) ∈ C_t^j} U_i U_ℓ.

By (3.12), (3.17) and Lemma 2(i),

(3.24)  m^{−1} Σ_{i=1}^m U_i² →p σ̃_{2t+1}².

Since U_1, ..., U_m are independent mean-zero random variables conditioned on F_{2t}, it follows from (3.17), Lemmas 2(ii) and 3(ii) that

(3.25)  m^{−2} Var_m( Σ_{j=1}^m Σ_{(i,ℓ) ∈ C_t^j} U_i U_ℓ | F_{2t} )
          ≤ 2 m^{−2} Σ_{j=1}^m Σ_{(i,ℓ) ∈ C_t^j} E_m(U_i² U_ℓ² | F_{2t})
          ≤ 2 m^{−2} Σ_{j=1}^m { Σ_{i : A_t^i = j} E_m(U_i² | F_{2t}) }²
          ≤ 2 [m^{−1} Σ_{i=1}^m E_m(U_i² | F_{2t})] m^{−1} max_{1 ≤ j ≤ m} Σ_{i : A_t^i = j} E_m(U_i² | F_{2t}) →p 0.

By (3.23)–(3.25),

(3.26)  m^{−1} Σ_{j=1}^m (ε_{2t+1}^j)² →p σ̃_{2t+1}².

We next show that

(3.27)  m^{−1} Σ_{j=1}^m (ε_1^j + ··· + ε_{2t}^j) ε_{2t+1}^j →p 0.

Since ε_1^j, ..., ε_{2t}^j are measurable with respect to F_{2t} for 1 ≤ j ≤ m and ε_{2t+1}^1, ..., ε_{2t+1}^m are independent conditioned on F_{2t}, by the induction hypothesis and by (3.17) and Lemma 3(ii), it follows that

m^{−2} Σ_{j=1}^m Var_m((ε_1^j + ··· + ε_{2t}^j) ε_{2t+1}^j | F_{2t})
  = m^{−2} Σ_{j=1}^m (ε_1^j + ··· + ε_{2t}^j)² Σ_{i : A_t^i = j} E_m(U_i² | F_{2t})

  ≤ [m^{−1} Σ_{j=1}^m (ε_1^j + ··· + ε_{2t}^j)²] m^{−1} max_{1 ≤ j ≤ m} Σ_{i : A_t^i = j} E_m(U_i² | F_{2t}) →p 0,

and therefore (3.27) holds. By (3.22), (3.26), (3.27) and the induction hypothesis, (3.21) holds for k = 2t + 1.

Next assume that (3.21) holds for k = 2t − 1. Suppressing the subscript t, let S_i = (#_t^i − m W_t^i)[f̃_t(X̃_t^i) H̃_t^i − f̃_0]. Then

(3.28)  m^{−1} Σ_{j=1}^m (ε_{2t}^j)² = m^{−1} Σ_{i=1}^m S_i² + m^{−1} Σ_{j=1}^m Σ_{(i,ℓ) ∈ C_{t−1}^j} S_i S_ℓ.

From Lemma 2(i), (3.12) and (3.17), it follows that

(3.29)  m^{−1} Σ_{i=1}^m E_m(S_i² | F_{2t−1}) = m^{−1} Σ_{i=1}^m Var_m(#_t^i | F_{2t−1}) [f̃_t(X̃_t^i) H̃_t^i − f̃_0]²
          = m^{−1} Σ_{i=1}^m (w_t(X̃_t^i)/w̄_t)(1 − w_t(X̃_t^i)/(m w̄_t)) [f̃_t(X̃_t^i) H̃_t^i − f̃_0]² →p σ̃_{2t}².

Technical arguments given in the Appendix show that

(3.30)  m^{−2} Var_m( Σ_{i=1}^m S_i² | F_{2t−1} ) →p 0,

(3.31)  m^{−2} E_m[ ( Σ_{j=1}^m Σ_{(i,ℓ) ∈ C_{t−1}^j} S_i S_ℓ )² | F_{2t−1} ] →p 0

and

(3.32)  m^{−2} E_m[ { Σ_{j=1}^m (ε_1^j + ··· + ε_{2t−1}^j) ε_{2t}^j }² | F_{2t−1} ] →p 0

under (3.21) for k = 2t − 1. By (3.22) and (3.28)–(3.32), (3.21) holds for k = 2t. The induction proof is complete and Corollary 2 holds. □

Let Ψ(x_T) = ψ(x_T) − ψ_T and define Ψ̃_T as in (3.1) with Ψ in place of ψ. Then replacing ψ̃_T by Ψ̃_T in Corollaries 1 and 2 shows that as m → ∞,

(3.33)  √m Ψ̃_T ⇒ N(0, σ²),  σ̃_*²(0) →p σ²,

where σ̃_*²(µ) is defined by (3.14) with ψ replaced by Ψ, noting that replacing ψ by Ψ in the definition of σ_C² in Corollary 1 gives the σ² defined in (2.8) since f̃_0 = 0.

3.2. Approximation of ψ̂_T − ψ_T by Ψ̃_T and proof of Theorem 1. We now return to the setting of Section 2.1, in which we consider the HMM (1.1) and use weight functions of the form (2.1) and the particle filter (2.4) in lieu of (3.1). We make use of the following lemma to extend (3.33) to ψ̂_T − ψ_T.

Lemma 4. Let w_t be of the form (2.1) for 1 ≤ t ≤ T and define η_T and ζ_T by (2.5) and Ψ̃_T by (3.1) with ψ replaced by Ψ = ψ − ψ_T. Then

(3.34)  p̃_T(x_T) [= p̃_T(x_T | Y_T)] = η_T^{−1} ∏_{t=1}^T [p_t(x_t | x_{t−1}) g_t(Y_t | x_t)],

(3.35)  L_T(x_T) = η_T^{−1} ∏_{t=1}^T w_t(x_t),

(3.36)  ψ̂_T − ψ_T = (m w̄_T)^{−1} Σ_{i=1}^m Ψ(X̃_T^i) w_T(X̃_T^i) = (w̄_1 ··· w̄_T)^{−1} η_T Ψ̃_T.

Proof. By (2.1) and (2.5),

η_T = ∫ ∏_{t=1}^T [w_t(x_t) q_t(x_t | x_{t−1})] dν_X(x_T) = ∫ ∏_{t=1}^T [p_t(x_t | x_{t−1}) g_t(Y_t | x_t)] dν_X(x_T),

and (3.34) follows from (1.2). Moreover, (3.35) follows from (2.1), (2.6) and (3.34). The first equality in (3.36) follows from (2.3) and (2.4). By (2.3) and (3.35),

(3.37)  L_T(X̃_T^i) H_{T−1}^i = (w̄_1 ··· w̄_T / η_T) (w_T(X̃_T^i) / w̄_T).

Hence, the second equality in (3.36) follows from (3.1). □

The following lemma, proved in the Appendix, is used to prove Theorem 1. Although it resembles Lemma 3, its conclusions are about the squares of the sums in Lemma 3 and, accordingly, it assumes finiteness of second (instead of first) moments.

Lemma 5. Let 1 ≤ t ≤ T, let G be a measurable function on the state space, and let Γ_t(x_t) be defined as in (2.5).

(i) If E_q[G²(X_t) Γ_{t−1}(X_{t−1})] < ∞, then

m^{−1} Σ_{j=1}^m { Σ_{i : A_{t−1}^i = j} G(X̃_t^i) }² = O_p(1)  as m → ∞.

(ii) If E_q[G²(X_t) Γ_t(X_t)] < ∞, then

m^{−1} Σ_{j=1}^m { Σ_{i : A_t^i = j} G(X_t^i) }² = O_p(1)  as m → ∞.

Proof of Theorem 1. By (3.17), (3.33) and (3.36),

(3.38)  √m(ψ̂_T − ψ_T) = (1 + o_p(1)) √m Ψ̃_T ⇒ N(0, σ²).

Since σ̃_*²(0) = m^{−1} Σ_{j=1}^m { Σ_{i : A_{T−1}^i = j} L_T(X̃_T^i) H_{T−1}^i Ψ(X̃_T^i) }² →p σ² by (3.14) and (3.33), and since σ̃_*²(0) = (1 + o_p(1)) σ̂²(ψ_T) in view of (2.9), (3.17) and (3.37),

(3.39)  σ̂²(ψ_T) →p σ².

Letting c_j = Σ_{i : A_{T−1}^i = j} (w_T(X̃_T^i)/w̄_T)[ψ(X̃_T^i) − ψ_T], b_j = Σ_{i : A_{T−1}^i = j} w_T(X̃_T^i)/w̄_T and a = ψ_T − ψ̂_T, it follows from (2.9) that

(3.40)  σ̂²(ψ̂_T) = m^{−1} Σ_{j=1}^m (c_j + a b_j)²
          = m^{−1} Σ_{j=1}^m c_j² + (2a/m) Σ_{j=1}^m b_j c_j + (a²/m) Σ_{j=1}^m b_j².

Since a = O_p(m^{−1/2}) by (3.38), m^{−1} Σ_{j=1}^m c_j² = σ̂²(ψ_T) →p σ² by (3.39), and since m^{−1} Σ_{j=1}^m b_j² = O_p(1) by Lemma 5 (with G = w_t),

| (2a/m) Σ_{j=1}^m b_j c_j | ≤ |a| ( m^{−1} Σ_{j=1}^m b_j² + m^{−1} Σ_{j=1}^m c_j² ) →p 0.

Hence, by (3.40), σ̂²(ψ̂_T) →p σ². □

4. Extensions and proof of Theorem 2. In this section we first extend the results of Section 2.1 to the case where residual Bernoulli resampling is used in lieu of bootstrap resampling at every stage. We then consider occasional bootstrap resampling and prove Theorem 2, which we also extend to the case of occasional residual Bernoulli resampling.

4.1. Residual Bernoulli resampling. The residual resampling scheme introduced in [2, 3] often leads to smaller asymptotic variance than that of bootstrap resampling; see [6, 14]. We consider here the residual Bernoulli scheme given in [7], which has been shown in Section 2.4 of [9] to yield a consistent and asymptotically normal particle filtering estimate of ψ_T.

To implement residual Bernoulli resampling, we modify bootstrap resampling at stage t as follows: let M(1) = m and let ξ_t^1, ..., ξ_t^{M(t)} be independent Bernoulli random variables conditioned on (M(t), W_t^i) satisfying P{ξ_t^i = 1 | F_{2t−1}} = M(t) W_t^i − ⌊M(t) W_t^i⌋. For each 1 ≤ i ≤ M(t) and t ≤ T − 1, let H_t^i = H_{t−1}^i / (M(t) W_t^i) and make #_t^i := ⌊M(t) W_t^i⌋ + ξ_t^i copies of (X̃_t^i, A_{t−1}^i, H_t^i). These copies constitute an augmented sample {(X_t^j, A_t^j, H_t^j) : 1 ≤ j ≤ M(t+1)}, where M(t+1) = Σ_{i=1}^{M(t)} #_t^i. Estimate ψ_T by

ψ̂_{T,R} := [M(T) w̄_T]^{−1} Σ_{i=1}^{M(T)} ψ(X̃_T^i) w_T(X̃_T^i).

Let Ψ̃_{T,R} be (3.1) with Ψ = ψ − ψ_T in place of ψ and M(T) in place of m. Since H_{t−1}^i is still given by (2.3) and Lemma 2 still holds for residual Bernoulli resampling, it follows from (3.17) and (3.37) that
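In code, one residual Bernoulli step is a few lines: particle i receives ⌊M(t)W_t^i⌋ deterministic copies plus a Bernoulli(M(t)W_t^i − ⌊M(t)W_t^i⌋) extra copy, so the next sample size M(t+1) is random with conditional mean M(t). A minimal sketch (not the authors' implementation):

```python
import numpy as np

def residual_bernoulli_step(W, rng):
    """One residual Bernoulli resampling step.

    W: normalized weights of the current M particles (sums to 1).
    Returns indices to copy: particle i appears
    floor(M*W_i) + Bernoulli(M*W_i - floor(M*W_i)) times,
    so the new sample size has conditional expectation M.
    """
    M = len(W)
    scaled = M * np.asarray(W)
    base = np.floor(scaled).astype(int)        # guaranteed copies
    extra = rng.random(M) < (scaled - base)    # residual Bernoulli variables xi_t^i
    return np.repeat(np.arange(M), base + extra)
```

Compared with multinomial (bootstrap) resampling, the copy count of each particle here has conditional variance at most 1/4 (the variance of a Bernoulli variable), which is one reason residual schemes often have the smaller asymptotic variance noted above.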

M(T)(ψ̂_{T,R} − ψ_T) = w̄_T^{−1} Σ_{i=1}^{M(T)} Ψ(X̃_T^i) w_T(X̃_T^i)

(4.1)    = (η_T / (w̄_1 ··· w̄_T)) Σ_{i=1}^{M(T)} Ψ(X̃_T^i) L_T(X̃_T^i) H_{T−1}^i

         = (1 + o_p(1)) M(T) Ψ̃_{T,R}.

Because replacing ψ by Ψ in (3.2) modifies f̃_t to the f_t defined in (2.7), we have for residual Bernoulli resampling the following analog of (3.11):

(4.2)  M(T) Ψ̃_{T,R} = Σ_{k=1}^{2T−1} (Z_{k,R}^1 + ··· + Z_{k,R}^{M(⌈k/2⌉)}),

where Z_{2t−1,R}^i = [f_t(X̃_t^i) − f_{t−1}(X_{t−1}^i)] H_{t−1}^i and Z_{2t,R}^i = (#_t^i − M(t) W_t^i) f_t(X̃_t^i) H_t^i. Let σ_R² = Σ_{k=1}^{2T−1} σ_{k,R}², with

σ_{2t−1,R}² = σ_{2t−1}²,  σ_{2t,R}² = E_q[γ(w_t*(X_t)) f_t²(X_t) h_t*(X_t)],

in which γ(x) = (x − ⌊x⌋)(1 − x + ⌊x⌋)/x, w_t*(x_t) = h_{t−1}*(x_{t−1}) / h_t*(x_t) = ζ_t w_t(x_t), and σ_{2t−1}² is defined in Theorem 1. Since Var(ξ) = γ(x) x when ξ is a Bernoulli(x − ⌊x⌋) random variable,

Var( Σ_{i=1}^{M(t)} Z_{2t,R}^i | F_{2t−1} ) = Σ_{i=1}^{M(t)} γ(M(t) W_t^i) M(t) W_t^i f_t²(X̃_t^i) (H_t^i)²

and, therefore, it follows from Lemma 2 that

(4.3)  m^{−1} Var( Σ_{i=1}^{M(⌈k/2⌉)} Z_{k,R}^i | F_{k−1} ) →p σ_{k,R}²,  1 ≤ k ≤ 2T − 1.

PARTICLE FILTER 17

By (4.1)–(4.3) and a similar modification of (3.3) for residual Bernoulli resampling, Theorem 1 still holds with ψ̂_T replaced by ψ̂_{T,R}; specifically,

(4.4)  √m(ψ̂_{T,R} − ψ_T) ⇒ N(0, σ_R²),  σ̂²(ψ̂_{T,R}) →p σ_R²,

where σ̂²(µ) is defined in (2.9). The following example shows the advantage of residual resampling over bootstrap resampling.

Example. Consider the bearings-only tracking problem in [11], in which Xt = (Xt1,...,Xt4)′ and (Xt1, Xt3) represents the coordinates of a ship and (Xt2, Xt4) the velocity at time t. A person standing at the origin observes that the ship is at angle Yt relative to him. Taking into account measurement errors and random disturbances in velocity of the ship, and letting Φ be a 4 4 matrix with Φ =Φ =1=Φ for 1 i 4 and all other entries 0, × 12 34 ii ≤ ≤ Γbea4 2 matrix with Γ11 =Γ32 = 0.5, Γ21 =Γ42 = 1 and all other entries 0, the dynamics× can be described by state-space model 1 Xt =ΦXt 1 +Γzt, Yt = tan− (Xt3/Xt1)+ ut, − 2 2 2 where X11 N(0, 0.5 ), X12 N(0, 0.005 ), X13 N(0.4, 0.3 ), X14 N( 0.05, 0.01∼2), z N(0, 0.∼0012I ) and u N(0∼, 0.0052) are indepen-∼ − t+1 ∼ 2 t ∼ dent random variables for t 1. The quantities of interest are E(XT 1 YT ) and E(X Y ). To implement≥ the particle filters, we let q = p for t| 2. T 3| T t t ≥ Unlike [11] in which q1 = p1 as well, we use an importance density q1 that involves p1 and Y1 to generate the initial location (X11, X13). Let ξ and ζ be independent standard normal random variables, and r = tan(Y1 + 0.005ξ). Since u1 has small variance, we can estimate X13/X11 well by r. This sug- gests choosing q1 to be degenerate bivariate normal with support y = rx, with (y,x) denoting (X13, X11), so that its density function on this line is proportional to exp (x µ)2/(2τ) , where µ = 0.4r/(0.36 + r2), τ = 2 {− − } 0.09/(0.36 + r ). Thus, q1 generates X11 = µ + √τζ and X13 = rX11, but still follows p1 to generate (X12, X14). By (2.1), x2 (x 0.4)2 ζ2 w (x )= x √τ(1 + r2)exp 11 13 − + , 1 1 | 11| −2(0.5)2 − 2(0.3)2 2   noting that 0.005 x √τ(1 + r2) is the Jacobian of the transformation of | 11| (ξ,ζ) to (x11,x13). For t 2, (2.1) yields the resampling weights wt(xt)= exp [Y tan 1(x /x ≥)]2/[2(0.005)2] . 
We use $m=10{,}000$ particles to estimate $E((X_{T1},X_{T3})\mid Y_T)$ by particle filters, using bootstrap (boot) or residual (resid) Bernoulli resampling at every stage, for different values of $T$. Table 3, which reports results based on a single realization of $Y_T$, shows that residual Bernoulli resampling has smaller standard errors than bootstrap resampling. Table 3 also considers bootstrap resampling that uses $q_t=p_t$ for $t\ge1$, as in [11] and denoted by boot($P$), again with $m=10{,}000$ particles. Although boot and boot($P$) differ only in the choice of the importance densities at $t=1$, the standard error of boot is substantially smaller than that of boot($P$) over the entire range $4\le T\le24$. Unlike Section 2.3, the standard error estimates in Table 3 use a sample-splitting refinement that is described in the first paragraph of Section 5.

Table 3
Simulated values of $\hat\psi_T\pm\hat\sigma(\hat\psi_T)/\sqrt m$ (for bootstrap resampling) and $\hat\psi_{T,R}\pm\hat\sigma(\hat\psi_{T,R})/\sqrt m$ (for residual Bernoulli resampling)

                    E(X_{T1} | Y_T)                           E(X_{T3} | Y_T)
  T      boot(P)        boot          resid        boot(P)        boot          resid
  4   −0.515±0.026  −0.536±0.009  −0.519±0.007   0.091±0.007   0.098±0.002   0.095±0.001
  8   −0.505±0.043  −0.515±0.012  −0.497±0.011  −0.118±0.008  −0.121±0.003  −0.117±0.003
 12   −0.503±0.042  −0.532±0.014  −0.508±0.010  −0.327±0.027  −0.346±0.009  −0.331±0.007
 16   −0.506±0.044  −0.538±0.015  −0.516±0.010  −0.538±0.046  −0.572±0.016  −0.549±0.011
 20   −0.512±0.045  −0.537±0.016  −0.532±0.011  −0.747±0.066  −0.783±0.023  −0.777±0.017
 24   −0.512±0.047  −0.551±0.016  −0.540±0.012  −0.950±0.088  −1.023±0.029  −1.002±0.022

4.2. Proof of Theorem 2. First consider the following modification of $\Psi_T$:

(4.5)  $\widetilde\Psi_{OR}:=m^{-1}\sum_{i=1}^m L_T(\widetilde X_T^i)\Psi(\widetilde X_T^i)H_{\tilde\tau(T)}^i=\eta_T^{-1}\bar v_{\tau_1}\cdots\bar v_{\tau_{r+1}}(\hat\psi_{OR}-\psi_T)e$.

Analogous to (3.11), $m\widetilde\Psi_{OR}=\sum_{k=1}^{2r+1}(\widetilde Z_k^1+\cdots+\widetilde Z_k^m)$, where

$\widetilde Z_{2s-1}^i=[f_{\tau_s}(\widetilde X_{\tau_s}^i)-f_{\tau_{s-1}}(\widetilde X_{\tau_{s-1}}^i)]H_{\tau_{s-1}}^i$,
$\widetilde Z_{2s}^i=f_{\tau_s}(\widetilde X_{\tau_s}^i)H_{\tau_s}^i-\sum_{j=1}^m V_{\tau_s}^jf_{\tau_s}(\widetilde X_{\tau_s}^j)H_{\tau_s}^j$.

Let $\tau_0^*=0$. Since $\bar v_{\tau_s}\stackrel{p}{\to}\prod_{t=\tau_{s-1}^*+1}^{\tau_s^*}\zeta_t$ by (2.12) and Lemma 2, it follows from (2.5) and (4.5) that

$\sqrt m(\hat\psi_{OR}-\psi_T)=(1+o_p(1))\sqrt m\,\widetilde\Psi_{OR}\Rightarrow N(0,\sigma_{OR}^2)$,

similarly to (3.38), where

(4.6)  $\sigma_{OR}^2=\sum_{s=1}^{r^*+1}E_q\{[f_{\tau_s^*}(X_{\tau_s^*})-f_{\tau_{s-1}^*}(X_{\tau_{s-1}^*})]^2h_{\tau_{s-1}^*}^{**}(X_{\tau_{s-1}^*})\}+\sum_{s=1}^{r^*}E_q[f_{\tau_s^*}^2(X_{\tau_s^*})h_{\tau_s^*}^{*}(X_{\tau_s^*})]$.

Moreover, analogous to (3.3), $m\widetilde\Psi_{OR}=\sum_{j=1}^m(\varepsilon_1^j+\cdots+\varepsilon_{2r+1}^j)$, where

$\varepsilon_{2s-1}^j=\sum_{i:A_{\tau_{s-1}}^i=j}[f_{\tau_s}(\widetilde X_{\tau_s}^i)-f_{\tau_{s-1}}(\widetilde X_{\tau_{s-1}}^i)]H_{\tau_{s-1}}^i$,
$\varepsilon_{2s}^j=\sum_{i:A_{\tau_{s-1}}^i=j}(\#_{\tau_s}^i-mV_{\tau_s}^i)f_{\tau_s}(\widetilde X_{\tau_s}^i)H_{\tau_s}^i$;

therefore, the proof of $\hat\sigma_{OR}^2(\hat\psi_{OR})\stackrel{p}{\to}\sigma_{OR}^2$ is similar to that of Theorem 1.

4.3. Occasional residual resampling and assumption (2.12). In the case of occasional residual resampling, Theorem 2 still holds with $\hat\psi_{OR}$ replaced by $\hat\psi_{ORR}$ and with $\sigma_{OR}^2$ replaced by

$\sigma_{ORR}^2=\sum_{s=1}^{r^*+1}E_q\{[f_{\tau_s^*}(X_{\tau_s^*})-f_{\tau_{s-1}^*}(X_{\tau_{s-1}^*})]^2h_{\tau_{s-1}^*}^{**}(X_{\tau_{s-1}^*})\}+\sum_{s=1}^{r^*}E_q\Bigl[\Bigl(\prod_{t=\tau_{s-1}^*+1}^{\tau_s^*}\gamma_t^*w_t(X_t)\Bigr)f_{\tau_s^*}^2(X_{\tau_s^*})h_{\tau_s^*}^{*}(X_{\tau_s^*})\Bigr]$.

In particular, $\sqrt m(\hat\psi_{ORR}-\psi_T)\Rightarrow N(0,\sigma_{ORR}^2)$ and $\hat\sigma_{OR}^2(\hat\psi_{ORR})\stackrel{p}{\to}\sigma_{ORR}^2$.

Assumption (2.12) is often satisfied in practice. In particular, if one follows Liu [13] and resamples at stage $t$ whenever

(4.7)  $\mathrm{cv}_t^2:=m^{-1}\sum_{i=1}^m(mV_t^i-1)^2=\dfrac{\sum_{i=1}^m[v_t(\widetilde X_t^i)-\bar v_t]^2}{m\bar v_t^2}\ge c$,

where $c>0$ is a prespecified threshold for the coefficient of variation, then (2.12) can be shown to hold by making use of the following lemma.
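Before turning to the lemma, note that criterion (4.7) is cheap to evaluate from the unnormalized weights; a minimal Python sketch (our own function names):

```python
import numpy as np

def cv_squared(v):
    """Squared coefficient of variation of the unnormalized resampling
    weights v_t(X_t^i):  cv_t^2 = sum_i (v_i - vbar)^2 / (m * vbar^2),
    equivalently m^{-1} sum_i (m V_i - 1)^2 with V_i = v_i / (m vbar)."""
    v = np.asarray(v, dtype=float)
    vbar = v.mean()
    return np.sum((v - vbar) ** 2) / (len(v) * vbar ** 2)

def should_resample(v, c):
    """Liu's rule: resample at stage t whenever cv_t^2 >= c."""
    return cv_squared(v) >= c
```

Equal weights give $\mathrm{cv}_t^2=0$ (no resampling), and the rule is equivalent to the familiar effective-sample-size criterion, since $\mathrm{ESS}=m/(1+\mathrm{cv}_t^2)$.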

Lemma 6. Let resampling be carried out at stage $t$ whenever $\mathrm{cv}_t^2\ge c$. Then (2.12) holds with

(4.8)  $\tau_s^*=\inf\Bigl\{t>\tau_{s-1}^*:E_q\Bigl(\dfrac{[\prod_{k=\tau_{s-1}^*+1}^tw_k^*(X_k)]^2}{h_{\tau_{s-1}^*}^{**}(X_{\tau_{s-1}^*})}\Bigr)-1\ge c\Bigr\}$ for $1\le s\le r^*$,

where $\tau_0^*=0$ and $r^*=\max\{s:\tau_s^*<T\}$, provided that

(4.9)  $E_q\Bigl(\dfrac{[\prod_{k=\tau_{s-1}^*+1}^{\tau_s^*}w_k^*(X_k)]^2}{h_{\tau_{s-1}^*}^{**}(X_{\tau_{s-1}^*})}\Bigr)-1\ne c$ for $1\le s\le r^*$.

Proof. Let $\tau_{s-1}^*+1\le\ell\le\tau_s^*$ and $G(x_\ell)=\prod_{k=\tau_{s-1}^*+1}^\ell w_k^*(x_k)$. Apply Lemma 2(i) to $G$ and $G^2$, with $t-1$ in the subscript replaced by the most recent resampling time $\tau_{s-1}^*$. It then follows from (4.7) that

(4.10)  $\mathrm{cv}_\ell^2\stackrel{p}{\to}E_q[G^2(X_\ell)/h_{\tau_{s-1}^*}^{**}(X_{\tau_{s-1}^*})]-1$ on $\{\tau_{s-1}=\tau_{s-1}^*\}$.

In view of (4.8) and (4.9), it follows from (4.10) with $\ell=\tau_s^*$ that $P_m\{\tau_s>\tau_s^*,\ \tau_{s-1}=\tau_{s-1}^*\}\stackrel{p}{\to}0$ for $1\le s\le r^*$, and from (4.10) with $\tau_{s-1}^*+1\le\ell<\tau_s^*$ that $P_m\{\tau_s=\ell,\ \tau_{s-1}=\tau_{s-1}^*\}\stackrel{p}{\to}0$. □

5. Discussion and concluding remarks. The central limit theorem for particle filters in this paper and in [6, 8, 9, 12] considers the case of fixed $T$ as the number $m$ of particles becomes infinite. In addition to the standard error estimates (2.9), one can use the following modification that substitutes $\psi_T$ in $\hat\sigma^2(\psi_T)$ by "out-of-sample" estimates of $\psi_T$. Instead of generating additional particles to accomplish this, we apply a sample-splitting technique to the $m$ particles, which is similar to $k$-fold cross-validation. The standard error estimates in the example in Section 4.1 use this technique with $k=2$. Divide the $m$ particles into $k$ groups of equal size $r=\lfloor m/k\rfloor$, except for the last one, which may have a larger sample size. For definiteness consider bootstrap resampling at every stage; the case of residual resampling or occasional resampling can be treated similarly. Denote the particles at the $t$th generation, before resampling, by $\{\widetilde X_t^{ij}:1\le i\le r,1\le j\le k\}$, and let $X_t^{1j},\dots,X_t^{rj}$ be sampled with replacement from $\widetilde X_t^{1j},\dots,\widetilde X_t^{rj}$, with weights $W_t^{ij}=w_t(\widetilde X_t^{ij})/(m\bar w_t^j)$, where $\bar w_t^j=r^{-1}\sum_{i=1}^rw_t(\widetilde X_t^{ij})$.
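The $k$-fold sample-splitting recipe just described can be sketched as follows (Python; the function names are ours, and the variance formula is a simplified stand-in for the paper's ancestry-grouped estimate, treating each particle as its own ancestor line):

```python
import numpy as np

def split_estimates(psi_vals, weights, k=2):
    """Within-group weighted estimates psi_hat^j, j = 1..k, from a
    split of the m particles into k groups (the last group absorbs
    any remainder), together with their average psi_hat."""
    psi_vals = np.asarray(psi_vals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = len(psi_vals)
    r = m // k
    bounds = [j * r for j in range(k)] + [m]
    psi_j = np.array([np.average(psi_vals[bounds[j]:bounds[j + 1]],
                                 weights=weights[bounds[j]:bounds[j + 1]])
                      for j in range(k)])
    return psi_j, psi_j.mean()

def sample_splitting_var(psi_vals, weights, k=2):
    """Out-of-sample variance estimate: each particle in group j is
    centered at the leave-one-group-out estimate psi_hat^{-j}."""
    psi_vals = np.asarray(psi_vals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    m = len(psi_vals)
    r = m // k
    psi_j, _ = split_estimates(psi_vals, weights, k)
    bounds = [j * r for j in range(k)] + [m]
    total = 0.0
    for j in range(k):
        sl = slice(bounds[j], bounds[j + 1])
        psi_minus_j = np.delete(psi_j, j).mean()   # out-of-sample center
        wbar_j = weights[sl].mean()
        total += np.sum(((weights[sl] / wbar_j)
                         * (psi_vals[sl] - psi_minus_j)) ** 2)
    return total / m
```

The point of the out-of-sample centering is that each group's deviations are measured against an estimate computed from the other groups, avoiding the in-sample bias of plugging the same particles into both the estimate and its variance.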
With these "stratified" resampling weights, we estimate $\psi_T$ by

(5.1)  $\hat\psi_T=k^{-1}\sum_{j=1}^k\hat\psi_T^j$, where $\hat\psi_T^j=(r\bar w_T^j)^{-1}\sum_{i=1}^r\psi(\widetilde X_T^{ij})w_T(\widetilde X_T^{ij})$.

Similarly, letting $\hat\psi_T^{-j}=(k-1)^{-1}\sum_{\ell:\ell\ne j}\hat\psi_T^\ell$, we estimate $\sigma^2$ by

(5.2)  $\hat\sigma_{SP}^2=m^{-1}\sum_{j=1}^k\sum_{i=1}^r\Bigl\{\sum_{\ell:A_{T-1}^{\ell j}=i}\dfrac{w_T(\widetilde X_T^{\ell j})}{\bar w_T^j}[\psi(\widetilde X_T^{\ell j})-\hat\psi_T^{-j}]\Bigr\}^2$.

Chopin (Section 2.1 of [6]) summarizes a general framework of traditional particle filter estimates of $\psi_T$, in which the resampling (also called

"selection") weights $w_t(X_t)$ are chosen to convert a weighted approximation of the posterior density of $X_t$ given $Y_t$, with likelihood ratio weights associated with importance sampling (called "mutation" in [6]), to an unweighted approximation (that has equal weights 1), so that the usual average $\bar\psi_T:=m^{-1}\sum_{i=1}^m\psi(X_T^i)$ converges a.s. to $\psi_T$ as $m\to\infty$. He uses induction on $t\le T$ to prove the central limit theorem for $\sqrt m(\bar\psi_T-\psi_T)$: "conditional on past iterations, each step generates independent (but not identically distributed) particles, which follow some (conditional) central limit theorem" (page 2392 of [6]), and he points out that the particle filter of Gilks and Berzuini [10] is a variant of this framework. Let $V_t$ be the variance of the limiting normal distribution of $\sqrt{m_t}(\bar\psi_T-\psi_T)$ in the Gilks–Berzuini particle filter (pages 132–134 of [10]), which assumes

(5.3)  $m_1\to\infty$, then $m_2\to\infty,\dots,$ and then $m_T\to\infty$,

that is, $m_1\to\infty$ first and then each $m_t\to\infty$ in turn, for $1<t\le T$. Their estimate of $V_t$ is defined in terms of the ancestral counts

(5.4)  $N_t^{k,\ell}=\sum_{s=1}^t\sum_{i=1}^{m_s}1\{\widetilde X_s^i$ is an ancestor of $\widetilde X_t^k$ and of $\widetilde X_t^\ell\}$,

(5.5)  $\widehat V_t=\dfrac{1}{m_t^2}\sum_{k=1}^{m_t}\sum_{\ell=1}^{m_t}N_t^{k,\ell}\{\psi(\widetilde X_t^k)-\bar\psi_T\}\{\psi(\widetilde X_t^\ell)-\bar\psi_T\}$.

Theorem 3 of Gilks and Berzuini (Appendix A of [10]) shows that $\widehat V_t/V_t\stackrel{p}{\to}1$ under (5.3). The basic idea is to use the law of large numbers for each generation conditioned on the preceding one and an induction argument which relies on the assumption (5.3) that "is not directly relevant to the practical context." In contrast, our Theorem 1 or 2 is based on a much more precise martingale approximation of $m(\hat\psi_T-\psi_T)$ or $m(\hat\psi_{OR}-\psi_T)$. While we have focused on importance sampling in this paper, another approach to choosing the proposal distribution is to use Monte Carlo iterations for the mutation step, as in [10], and important advances in this approach have been developed recently in [1, 4].

APPENDIX: LEMMAS 2, 3, 5, COROLLARY 1 AND (3.30)–(3.32)

Proof of Lemma 2. We use induction on $t$ and show that if (ii) holds for $t-1$, then (i) holds for $t$, and that if (i) holds for $t$, then (ii) also holds for $t$. Since $G=G^+-G^-$, we can assume without loss of generality that $G$ is nonnegative. By the law of large numbers, (i) holds for $t=1$, noting that $h_0^*\equiv1$. Let $t>1$ and assume that (ii) holds for $t-1$. To show that (i) holds for $t$, assume $\mu_t=E_q[G(X_t)/h_{t-1}^*(X_{t-1})]$ to be finite and suppose that, contrary to (3.15), there exist $0<\varepsilon<1$ and $\mathcal M=\{m_1,m_2,\dots\}$ with $m_1<m_2<\cdots$ such that

(A.1)  $P_m\Bigl\{\Bigl|m^{-1}\sum_{i=1}^mG(\widetilde X_{t,m}^i)-\mu_t\Bigr|>2\varepsilon\Bigr\}>\varepsilon(2+\mu_t)$ for all $m\in\mathcal M$,

in which we write $\widetilde X_{t,m}^i$ $(=\widetilde X_t^i)$ to highlight the dependence on $m$. Let $\delta=\varepsilon^3$. Since $x=(x\wedge y)+(x-y)^+$, we can write $G(\widetilde X_{t,m}^i)=U_{t,m}^i+V_{t,m}^i+S_{t,m}^i$, where

(A.2)  $U_{t,m}^i=G(\widetilde X_{t,m}^i)\wedge(\delta m)-E_m[G(\widetilde X_{t,m}^i)\wedge(\delta m)\mid\mathcal F_{2t-2}]$,
$V_{t,m}^i=E_m[G(\widetilde X_{t,m}^i)\wedge(\delta m)\mid\mathcal F_{2t-2}]$,  $S_{t,m}^i=[G(\widetilde X_{t,m}^i)-\delta m]^+$.

Since $E_m(U_{t,m}^i\mid\mathcal F_{2t-2})=0$ and $\mathrm{Cov}_m(U_{t,m}^i,U_{t,m}^\ell\mid\mathcal F_{2t-2})=0$ for $i\ne\ell$, it follows from Chebyshev's inequality, $(U_{t,m}^i)^2\le\delta mG(\widetilde X_{t,m}^i)$ and $\delta=\varepsilon^3$ that

(A.3)  $P_m\Bigl\{m^{-1}\Bigl|\sum_{i=1}^mU_{t,m}^i\Bigr|>\varepsilon\Bigm|\mathcal F_{2t-2}\Bigr\}\le(\varepsilon m)^{-2}\mathrm{Var}_m\Bigl(\sum_{i=1}^mU_{t,m}^i\Bigm|\mathcal F_{2t-2}\Bigr)=(\varepsilon m)^{-2}\sum_{i=1}^mE_m(|U_{t,m}^i|^2\mid\mathcal F_{2t-2})\le\varepsilon m^{-1}\sum_{i=1}^mE_m[G(\widetilde X_{t,m}^i)\mid\mathcal F_{2t-2}]\stackrel{p}{\to}\varepsilon\mu_t$

as $m\to\infty$, by (3.16) applied to $G^*(X_{t-1})=E_q[G(X_t)\mid X_{t-1}]$, noting that $E_q[G^*(X_{t-1})/h_{t-1}^*(X_{t-1})]=\mu_t$. Application of (3.16) to $G^*(X_{t-1})$ also yields

(A.4)  $m^{-1}\sum_{i=1}^mV_{t,m}^i\stackrel{p}{\to}\mu_t$.

From (A.3), it follows that

(A.5)  $P_m\Bigl\{m^{-1}\Bigl|\sum_{i=1}^mU_{t,m}^i\Bigr|>\varepsilon\Bigr\}\le\varepsilon(1+\mu_t)$ for all large $m$.

Since $\mu_t<\infty$, we can choose $n_k$ such that $E_q[G(X_t)1_{\{G(X_t)>n_k\}}/h_{t-1}^*(X_{t-1})]<k^{-1}$. Hence, by applying (3.16) to $G_k(X_{t-1})=E_q[G(X_t)1_{\{G(X_t)>n_k\}}\mid X_{t-1}]$, we can choose a further subsequence $m_k$ that satisfies (A.1), $m_k\ge n_k/\delta$ and

$P_m\Bigl\{m^{-1}\sum_{i=1}^mE_m[G(\widetilde X_{t,m}^i)1_{\{G(\widetilde X_{t,m}^i)>n_k\}}\mid\mathcal F_{2t-2}]\ge2k^{-1}\Bigr\}\le k^{-1}$ for $m=m_k$.

Then for $m=m_k$, with probability at least $1-k^{-1}$,

$\sum_{i=1}^mP_m\{S_{t,m}^i\ne0\mid\mathcal F_{2t-2}\}=\sum_{i=1}^mP_m\{G(\widetilde X_{t,m}^i)>\delta m\mid\mathcal F_{2t-2}\}\le(\delta m)^{-1}\sum_{i=1}^mE_m[G(\widetilde X_{t,m}^i)1_{\{G(\widetilde X_{t,m}^i)>\delta m\}}\mid\mathcal F_{2t-2}]\le2\delta^{-1}k^{-1}$.

Therefore, $P_m\{S_{t,m}^i\ne0$ for some $1\le i\le m\}\to0$ as $m=m_k\to\infty$. Combining this with (A.4) and (A.5), we have a contradiction to (A.1). Hence, we conclude that (3.15) holds for $t$. The proof that (ii) holds for $t$ whenever (i) holds for $t$ is similar. In particular, since (i) holds for $t=1$, so does (ii). □

Proof of Lemma 3. We again use induction and show that if (ii) holds for $t-1$, then (i) holds for $t$, and that if (i) holds for $t$, then so does (ii). First (i) holds for $t=1$ because $A_0^i=i$, $h_0^*\equiv1$ and $E_qG(X_1)<\infty$ implies that for any $\varepsilon>0$,

$P_m\Bigl\{m^{-1}\max_{1\le i\le m}G(\widetilde X_1^i)\ge\varepsilon\Bigr\}\le mP_q\{G(X_1)\ge\varepsilon m\}\to0$ as $m\to\infty$.

Let $t>1$ and assume that (ii) holds for $t-1$. To show that (i) holds for $t$, suppose $\mu_t(=E_q[G(X_t)/h_{t-1}^*(X_{t-1})])<\infty$ and that, contrary to (3.19), there exist $0<\varepsilon<1$ and $\mathcal M=\{m_1,m_2,\dots\}$ with $m_1<m_2<\cdots$ such that

(A.6)  $P_m\Bigl\{m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}G(\widetilde X_{t,m}^i)>2\varepsilon\Bigr\}>\varepsilon(2+\mu_t)$ for all $m\in\mathcal M$.

As in (A.2), let $\delta=\varepsilon^3$ and $G(\widetilde X_{t,m}^i)=U_{t,m}^i+V_{t,m}^i+S_{t,m}^i$. For $1\le j\le m$,

$P_m\Bigl\{m^{-1}\Bigl|\sum_{i:A_{t-1}^i=j}U_{t,m}^i\Bigr|>\varepsilon\Bigm|\mathcal F_{2t-2}\Bigr\}\le\varepsilon m^{-1}\sum_{i:A_{t-1}^i=j}E_m[G(\widetilde X_{t,m}^i)\mid\mathcal F_{2t-2}]$,

by an argument similar to (A.3). Summing over $j$ and then taking expectations, it then follows from Lemma 2 that

(A.7)  $\sum_{j=1}^mP_m\Bigl\{m^{-1}\Bigl|\sum_{i:A_{t-1}^i=j}U_{t,m}^i\Bigr|>\varepsilon\Bigr\}\le\varepsilon(1+\mu_t)$ for all large $m$.


By (3.20) applied to the function $G^*(X_{t-1})=E_q[G(X_t)\mid X_{t-1}]$,

(A.8)  $m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}V_{t,m}^i\stackrel{p}{\to}0$ as $m\to\infty$.

As in the proof of Lemma 2, select a further subsequence $m_k'$ such that (A.6) holds and $P_m\{\sum_{i=1}^mS_{t,m}^i=0\}\to1$ as $m=m_k'\to\infty$. Combining this with (A.7) and (A.8) yields a contradiction to (A.6). Hence, (3.19) holds for $t$.

Next let $t\ge1$ and assume that (i) holds for $t$. To show that (ii) holds for $t$, assume $\tilde\mu_t=E_q[G(X_t)/h_t^*(X_t)]$ to be finite and note that $\sum_{i:A_t^i=j}G(X_{t,m}^i)=\sum_{i:A_{t-1}^i=j}\#_{t,m}^iG(\widetilde X_{t,m}^i)$. Suppose that, contrary to (3.20), there exist $0<\varepsilon<1$ and $\mathcal M=\{m_1,m_2,\dots\}$ with $m_1<m_2<\cdots$ such that

(A.9)  $P_m\Bigl\{m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}\#_{t,m}^iG(\widetilde X_{t,m}^i)>2\varepsilon\Bigr\}>\varepsilon(2+\tilde\mu_t)$ for all $m\in\mathcal M$.

Let $\delta=\varepsilon^3$ and write $\#_{t,m}^iG(\widetilde X_{t,m}^i)=\widetilde U_{t,m}^i+\widetilde V_{t,m}^i+\#_{t,m}^iS_{t,m}^i$, where

$\widetilde U_{t,m}^i=(\#_{t,m}^i-m\widetilde W_{t,m}^i)[G(\widetilde X_{t,m}^i)\wedge(\delta m)]$,  $\widetilde V_{t,m}^i=m\widetilde W_{t,m}^i[G(\widetilde X_{t,m}^i)\wedge(\delta m)]$

and $S_{t,m}^i$ is defined in (A.2). Since $E_m(\#_{t,m}^i\mid\mathcal F_{2t-1})=m\widetilde W_{t,m}^i$, $\mathrm{Var}_m(\#_{t,m}^i\mid\mathcal F_{2t-1})\le m\widetilde W_{t,m}^i=w_t(\widetilde X_{t,m}^i)/\bar w_{t,m}$ and since $\#_{t,m}^1,\dots,\#_{t,m}^m$ are pairwise negatively correlated conditioned on $\mathcal F_{2t-1}$, we obtain by Chebyshev's inequality that for $1\le j\le m$, $P_m\{m^{-1}|\sum_{i:A_{t-1}^i=j}\widetilde U_{t,m}^i|>\varepsilon\mid\mathcal F_{2t-1}\}$ is bounded above by

$(\varepsilon m)^{-2}\sum_{i:A_{t-1}^i=j}\mathrm{Var}_m(\#_{t,m}^i\mid\mathcal F_{2t-1})[G(\widetilde X_{t,m}^i)\wedge(\delta m)]^2\le\dfrac{\varepsilon}{m\bar w_{t,m}}\sum_{i:A_{t-1}^i=j}G^*(\widetilde X_{t,m}^i)$,

where $G^*(\cdot)=w_t(\cdot)G(\cdot)$. Hence, applying Lemma 2 to $G^*$ and noting that, by (2.5), $E_q[G^*(X_t)/h_{t-1}^*(X_{t-1})]=\zeta_t^{-1}\tilde\mu_t$ and $\bar w_{t,m}=m^{-1}\sum_{i=1}^mw_t(\widetilde X_{t,m}^i)\stackrel{p}{\to}\zeta_t^{-1}$ by Lemma 2, we obtain

(A.10)  $\sum_{j=1}^mP_m\Bigl\{m^{-1}\Bigl|\sum_{i:A_{t-1}^i=j}\widetilde U_{t,m}^i\Bigr|>\varepsilon\Bigr\}\le\varepsilon(1+\tilde\mu_t)$ for all large $m$.


By (3.19) applied to $G^*$ and Lemma 2,

(A.11)  $m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}\widetilde V_{t,m}^i\le\bar w_{t,m}^{-1}\Bigl(m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}G^*(\widetilde X_{t,m}^i)\Bigr)\stackrel{p}{\to}0$.

Finally, by Lemma 2 applied to $G_m(X_t)=w_t(X_t)G(X_t)1_{\{G(X_t)>\delta m\}}$,

(A.12)  $\sum_{i=1}^mP_m\{\#_{t,m}^iS_{t,m}^i\ne0\mid\mathcal F_{2t-1}\}=\sum_{i=1}^mP_m\{\#_{t,m}^i>0\mid\mathcal F_{2t-1}\}1_{\{S_{t,m}^i\ne0\}}\le\sum_{i=1}^mE_m(\#_{t,m}^i\mid\mathcal F_{2t-1})\dfrac{G_m(\widetilde X_{t,m}^i)}{\delta m\,w_t(\widetilde X_{t,m}^i)}=\dfrac{1}{\delta\bar w_{t,m}}\Bigl[m^{-1}\sum_{i=1}^mG_m(\widetilde X_{t,m}^i)\Bigr]\stackrel{p}{\to}0$

as $m=m_k'\to\infty$. Since (A.10)–(A.12) contradict (A.9), (3.20) follows. □

Proof of Lemma 5. The proof of Lemma 5 is similar to that of Lemma 3, and in fact is simpler because the finiteness assumption on the second moments of $G$ dispenses with truncation arguments of the type in (A.2). First (i) holds for $t=1$ because $A_0^i=i$, $h_0^*=1$ and

$P_m\Bigl\{m^{-1}\sum_{i=1}^mG^2(\widetilde X_1^i)\ge K\Bigr\}\le K^{-1}E_qG^2(X_1)\to0$ as $K\to\infty$.

As in the proof of Lemma 3, let $t>1$ and assume that (ii) holds for $t-1$. To show that (i) holds for $t$, suppose $\mu_t:=E_q[G^2(X_t)\Gamma_{t-1}(X_{t-1})]<\infty$. By expressing $G=G^+-G^-$, we may assume without loss of generality that $G$ is nonnegative. Write $G(\widetilde X_t^i)=U_t^i+V_t^i$, where $V_t^i=E_m[G(\widetilde X_t^i)\mid\mathcal F_{2t-2}]$. By the induction hypothesis, $m^{-1}\sum_{j=1}^m(\sum_{i:A_{t-1}^i=j}V_t^i)^2=O_p(1)$. To show that $m^{-1}\sum_{j=1}^m(\sum_{i:A_{t-1}^i=j}U_t^i)^2=O_p(1)$, note that $E_m(U_t^iU_t^\ell\mid\mathcal F_{2t-2})=0$ for $i\ne\ell$. Hence, it follows by an argument similar to (A.3) that

$P_m\Bigl\{m^{-1}\sum_{j=1}^m\Bigl(\sum_{i:A_{t-1}^i=j}U_t^i\Bigr)^2\ge K\Bigm|\mathcal F_{2t-2}\Bigr\}\le(Km)^{-1}\sum_{i=1}^mE_m[(U_t^i)^2\mid\mathcal F_{2t-2}]\stackrel{p}{\to}K^{-1}E_q\{|G(X_t)-E_q[G(X_t)\mid X_{t-1}]|^2/h_{t-1}^*(X_{t-1})\}$.

Next assume that (i) holds for $t$ and show that (ii) holds for $t$ if $\tilde\mu_t:=E_q[G^2(X_t)\Gamma_t(X_t)]$ is finite. Note that $\sum_{i:A_t^i=j}G(X_t^i)=\sum_{i:A_{t-1}^i=j}\#_t^iG(\widetilde X_t^i)$. Write $\#_t^iG(\widetilde X_t^i)=\widetilde U_t^i+\widetilde V_t^i$, where $\widetilde V_t^i=m\widetilde W_t^iG(\widetilde X_t^i)=w_t(\widetilde X_t^i)G(\widetilde X_t^i)/\bar w_t$. Since $\bar w_t\stackrel{p}{\to}\zeta_t^{-1}$ and since $E_q\{[w_t(X_t)G(X_t)]^2\Gamma_{t-1}(X_{t-1})\}\le\tilde\mu_t<\infty$, it follows from the induction hypothesis that $m^{-1}\sum_{j=1}^m(\sum_{i:A_{t-1}^i=j}\widetilde V_t^i)^2=O_p(1)$. Moreover, since $\mathrm{Var}_m(\#_t^i\mid\mathcal F_{2t-1})\le m\widetilde W_t^i$ and $\mathrm{Cov}_m(\#_t^i,\#_t^\ell\mid\mathcal F_{2t-1})\le0$ for $i\ne\ell$, it follows from an argument similar to (A.3) that

$P_m\Bigl\{m^{-1}\sum_{j=1}^m\Bigl(\sum_{i:A_{t-1}^i=j}\widetilde U_t^i\Bigr)^2\ge K\Bigm|\mathcal F_{2t-1}\Bigr\}\le(Km)^{-1}\sum_{i=1}^mm\widetilde W_t^iG^2(\widetilde X_t^i)\stackrel{p}{\to}K^{-1}E_q[G^2(X_t)/h_t^*(X_t)]$,

noting that $E_q[G^2(X_t)/h_t^*(X_t)]\le\eta_t^{-1}\tilde\mu_t<\infty$. Hence,

$m^{-1}\sum_{j=1}^m\Bigl(\sum_{i:A_{t-1}^i=j}\widetilde U_t^i\Bigr)^2=O_p(1)$.

This shows that (ii) holds for $t$, completing the induction proof. □

Proof of Corollary 1. Recall that $\{(Z_k^1,\dots,Z_k^m),\mathcal F_k,1\le k\le2T-1\}$ is a martingale difference sequence and that $Z_k^1,\dots,Z_k^m$ are conditionally independent given $\mathcal F_{k-1}$, where $\mathcal F_0$ is the trivial $\sigma$-algebra; moreover,

(A.13)  $\sqrt m(\tilde\psi_T-\psi_T)=\sum_{k=1}^{2T-1}\Bigl(\sum_{i=1}^mZ_k^i/\sqrt m\Bigr)$.

Fix $k$. Conditioned on $\mathcal F_{k-1}$, $Y_{mi}^k:=Z_k^i/\sqrt m$ is a triangular array of row-wise independent (i.i.d. when $k=2t$) random variables. Noting that $E_m[\tilde f_t(\widetilde X_t^i)\mid\widetilde X_{t-1}^i]=\tilde f_{t-1}(\widetilde X_{t-1}^i)$, we obtain from (3.10) that

$E_m[(Z_{2t-1}^i)^2\mid\mathcal F_{2t-2}]=\{E_m[\tilde f_t^2(\widetilde X_t^i)\mid\widetilde X_{t-1}^i]-\tilde f_{t-1}^2(\widetilde X_{t-1}^i)\}(H_{t-1}^i)^2$,
$E_m[(Z_{2t}^i)^2\mid\mathcal F_{2t-1}]=\sum_{j=1}^m\widetilde W_t^j\tilde f_t^2(\widetilde X_t^j)(H_t^j)^2-\Bigl[\sum_{j=1}^m\widetilde W_t^j\tilde f_t(\widetilde X_t^j)H_t^j\Bigr]^2$.

Let $\varepsilon>0$. Since $\widetilde W_t^j=w_t(\widetilde X_t^j)/(m\bar w_t)$ and $\bar w_t\stackrel{p}{\to}\zeta_t^{-1}$,

$\sum_{i=1}^mE_m[(Y_{mi}^k)^2\mid\mathcal F_{k-1}]\stackrel{p}{\to}\tilde\sigma_k^2$,  $\sum_{i=1}^mE_m[(Y_{mi}^k)^21_{\{|Y_{mi}^k|>\varepsilon\}}\mid\mathcal F_{k-1}]\stackrel{p}{\to}0$,

by (2.5) and Lemma 2. Hence, by Lindeberg's central limit theorem for triangular arrays of independent random variables, the conditional distribution

of $\sum_{i=1}^mZ_k^i/\sqrt m$ given $\mathcal F_{k-1}$ converges to $N(0,\tilde\sigma_k^2)$ as $m\to\infty$. This implies that for any real $u$,

(A.14)  $E_m(e^{iu\sum_{j=1}^mZ_k^j/\sqrt m}\mid\mathcal F_{k-1})\stackrel{p}{\to}e^{-u^2\tilde\sigma_k^2/2}$, $1\le k\le2T-1$,

where $i$ denotes the imaginary number $\sqrt{-1}$. Let $S_t=\sum_{k=1}^t(\sum_{j=1}^mZ_k^j/\sqrt m)$. Since $\{\sum_{j=1}^mZ_k^j/\sqrt m,\mathcal F_k,1\le k\le2T-1\}$ is a martingale difference sequence, it follows from (A.13), (A.14) and mathematical induction that

$E_m(e^{iu\sqrt m(\tilde\psi_T-\psi_T)})=E_m[e^{iuS_{2T-2}}E_m(e^{iu\sum_{j=1}^mZ_{2T-1}^j/\sqrt m}\mid\mathcal F_{2T-2})]=e^{-u^2\tilde\sigma_{2T-1}^2/2}E_m(e^{iuS_{2T-2}})+o(1)=\cdots=\exp(-u^2\sigma_C^2/2)+o(1)$.

Hence, (3.13) holds. □

Proof of (3.30)–(3.32). For distinct $i,j,k,\ell$, define $c_{ij}=(m\widetilde W_t^i)(m\widetilde W_t^j)$, $c_{ijk\ell}=(m\widetilde W_t^i)(m\widetilde W_t^j)(m\widetilde W_t^k)(m\widetilde W_t^\ell)$, $c_{iijk}=(m\widetilde W_t^i)^2(m\widetilde W_t^j)(m\widetilde W_t^k)$ and define $c_{ijk}$, $c_{iij}$, $c_{ijj}$ similarly. Since $\#_t^i=\sum_{h=1}^m1_{\{B_t^h=i\}}$, where $\{B_t^h\}$ are i.i.d. conditioned on $\mathcal F_{2t-1}$ such that $P\{B_t^h=i\mid\mathcal F_{2t-1}\}=\widetilde W_t^i$, combinatorial arguments show that

(A.15)  $E_m[(\#_t^i-m\widetilde W_t^i)(\#_t^j-m\widetilde W_t^j)(\#_t^k-m\widetilde W_t^k)(\#_t^\ell-m\widetilde W_t^\ell)\mid\mathcal F_{2t-1}]=m^{-4}[m(m-1)(m-2)(m-3)-4m^2(m-1)(m-2)+6m^3(m-1)-4m^4+m^4]c_{ijk\ell}=(3m^{-2}-6m^{-3})c_{ijk\ell}$,

(A.16)  $E_m[(\#_t^i-m\widetilde W_t^i)^2(\#_t^j-m\widetilde W_t^j)(\#_t^k-m\widetilde W_t^k)\mid\mathcal F_{2t-1}]=(3m^{-2}-6m^{-3})c_{iijk}+(-m^{-1}+2m^{-2})c_{ijk}$,

(A.17)  $E_m[(\#_t^i-m\widetilde W_t^i)^2(\#_t^j-m\widetilde W_t^j)^2\mid\mathcal F_{2t-1}]=(3m^{-2}-6m^{-3})c_{iijj}+(-m^{-1}+2m^{-2})(c_{iij}+c_{ijj})+(1-m^{-1})c_{ij}$,

(A.18)  $E_m[(\#_t^i-m\widetilde W_t^i)^4\mid\mathcal F_{2t-1}]=m[\widetilde W_t^i(1-\widetilde W_t^i)^4+(1-\widetilde W_t^i)(\widetilde W_t^i)^4]+3m(m-1)[(\widetilde W_t^i)^2(1-\widetilde W_t^i)^4+2(\widetilde W_t^i)^3(1-\widetilde W_t^i)^3+(\widetilde W_t^i)^4(1-\widetilde W_t^i)^2]$,

(A.19)  $E_m\{(\#_t^i-m\widetilde W_t^i)(\#_t^j-m\widetilde W_t^j)\mid\mathcal F_{2t-1}\}=-m^{-1}c_{ij}$.

To prove (3.30), let $T_i=m\widetilde W_t^i[\tilde f_t(\widetilde X_t^i)H_t^i-\tilde f_0]^2$, $\bar T=m^{-1}\sum_{i=1}^mT_i$ and $T_t^*=m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}T_i$. We can use (A.17) and (A.18) to show that $m^{-2}E_m[(\sum_{i=1}^mS_i^2)^2\mid\mathcal F_{2t-1}]$ is equal to

$m^{-2}\sum_{i=1}^mE_m(S_i^4\mid\mathcal F_{2t-1})+m^{-2}\sum_{i\ne j}E_m(S_i^2S_j^2\mid\mathcal F_{2t-1})=\bar T^2+o_P(1)$

as $m\to\infty$. By Lemma 2(i), $\bar T^2\stackrel{p}{\to}\tilde\sigma_{2t}^4$. From this and (3.29), (3.30) follows.

To prove (3.31), let $D_i=m\widetilde W_t^i[\tilde f_t(\widetilde X_t^i)H_t^i-\tilde f_0]$, $\bar D=m^{-1}\sum_{i=1}^mD_i$ and $D_t^*=m^{-1}\max_{1\le j\le m}\sum_{i:A_{t-1}^i=j}|D_i|$. By (A.15)–(A.17), the left-hand side of (3.31) equals a linear combination, with coefficients $3m^{-2}-6m^{-3}$, $m^{-2}(1-m^{-1})$ and $m^{-1}-2m^{-2}$, of sums of the products $D_iD_\ell$ and $T_iT_\ell$ over pairs $(i,\ell)\in C_{t-1}^j$, $1\le j\le m$, and of $T_iD_\ell$ over $i\ne\ell$ sharing the same ancestor $j$; it is bounded by $3(\bar DD_t^*)^2+\bar TT_t^*$ for $m\ge2$. By Lemmas 2 and 3, this upper bound converges to 0 in probability as $m\to\infty$, proving (3.31).

By (A.19), $E_m(S_iS_j\mid\mathcal F_{2t-1})=-m^{-1}D_iD_j$ and, similarly, $E_m(S_i^2\mid\mathcal F_{2t-1})=T_i-m^{-1}D_i^2$. Therefore, by (3.4), the left-hand side of (3.32) is equal to

$m^{-2}\sum_{j=1}^m(\varepsilon_1^j+\cdots+\varepsilon_{2t-1}^j)^2\Bigl[m^{-1}\sum_{i:A_{t-1}^i=j}T_i-m^{-1}\Bigl(\sum_{i:A_{t-1}^i=j}D_i\Bigr)^2\Bigr]\stackrel{p}{\to}0$,

since $T_t^*+(D_t^*)^2\stackrel{p}{\to}0$ by Lemma 3 and since (3.21) holds for $k=2t-1$ by the induction hypothesis. □

REFERENCES

[1] Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 269–342. MR2758115

[2] Baker, J. E. (1985). Adaptive selection methods for genetic algorithms. In Proc. International Conference on Genetic Algorithms and Their Applications (J. Grefenstette, ed.) 101–111. Erlbaum, Mahwah, NJ.
[3] Baker, J. E. (1987). Reducing bias and inefficiency in the selection algorithm. In Genetic Algorithms and Their Applications (J. Grefenstette, ed.) 14–21. Erlbaum, Mahwah, NJ.
[4] Brockwell, A., Del Moral, P. and Doucet, A. (2010). Sequentially interacting Markov chain Monte Carlo methods. Ann. Statist. 38 3387–3411. MR2766856

[5] Chan, H. P. and Lai, T. L. (2011). A sequential Monte Carlo approach to computing tail probabilities in stochastic models. Ann. Appl. Probab. 21 2315–2342. MR2895417
[6] Chopin, N. (2004). Central limit theorem for sequential Monte Carlo methods and its application to Bayesian inference. Ann. Statist. 32 2385–2411. MR2153989
[7] Crisan, D., Del Moral, P. and Lyons, T. (1999). Discrete filtering using branching and interacting particle systems. Markov Process. Related Fields 5 293–318. MR1710982
[8] Del Moral, P. (2004). Feynman–Kac Formulae: Genealogical and Interacting Particle Systems With Applications. Springer, New York. MR2044973
[9] Douc, R. and Moulines, E. (2008). Limit theorems for weighted samples with applications to sequential Monte Carlo methods. Ann. Statist. 36 2344–2376. MR2458190
[10] Gilks, W. R. and Berzuini, C. (2001). Following a moving target—Monte Carlo inference for dynamic Bayesian models. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 127–146. MR1811995
[11] Gordon, N. J., Salmond, D. J. and Smith, A. F. M. (1993). A novel approach to non-linear and non-Gaussian Bayesian state estimation. IEE Proc. F Radar Signal Process. 140 107–113.
[12] Künsch, H. R. (2005). Recursive Monte Carlo filters: Algorithms and theoretical analysis. Ann. Statist. 33 1983–2021. MR2211077
[13] Liu, J. S. (2008). Monte Carlo Strategies in Scientific Computing. Springer, New York. MR2401592
[14] Liu, J. S. and Chen, R. (1998). Sequential Monte Carlo methods for dynamic systems. J. Amer. Statist. Assoc. 93 1032–1044. MR1649198
[15] Yao, Y.-C. (1984). Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Ann. Statist. 12 1434–1447. MR0760698

Department of Statistics
Stanford University
Sequoia Hall
Stanford, California 94305-4065
USA
E-mail: [email protected]

Department of Statistics and Applied Probability
National University of Singapore
6 Science Drive 2
Singapore 117546
Singapore
E-mail: [email protected]