
Particle Monte Carlo Methods

Arnaud Doucet, University of British Columbia, Vancouver, Canada

IMB, 21st May 2010

Objectives

Two main classes of Monte Carlo methods are used nowadays for Bayesian computation: Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC), aka particle filters.
It seems natural to combine MCMC and SMC methods.
We know how to use MCMC within SMC (Gilks & Berzuini, JRSS B, 2001).
We show here how to use SMC within MCMC in a simple and principled way.

Motivating Example: General State-Space Models

Let $\{X_n\}_{n \geq 1}$ be a Markov process defined by
$$X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n \mid (X_{n-1} = x_{n-1}) \sim f_\theta(\cdot \mid x_{n-1}).$$
We only have access to a process $\{Y_n\}_{n \geq 1}$ such that, conditional upon $\{X_n\}_{n \geq 1}$, the observations are statistically independent and
$$Y_n \mid (X_n = x_n) \sim g_\theta(\cdot \mid x_n).$$
$\theta$ is an unknown parameter of prior density $p(\theta)$.
State-space models are ubiquitous in applied science, econometrics, engineering, statistics, etc.
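
As a concrete illustration of this class of models, here is a minimal forward-simulation sketch in Python/NumPy. The function name simulate_ssm and the linear-Gaussian example at the end are illustrative assumptions, not models from the talk.

```python
import numpy as np

def simulate_ssm(T, sample_mu, sample_f, sample_g):
    """Draw (x_{1:T}, y_{1:T}) from a state-space model specified by samplers
    for mu_theta, f_theta(. | x_{n-1}) and g_theta(. | x_n)."""
    x = np.empty(T)
    y = np.empty(T)
    x[0] = sample_mu()
    y[0] = sample_g(x[0])
    for n in range(1, T):
        x[n] = sample_f(x[n - 1])
        y[n] = sample_g(x[n])
    return x, y

# Illustrative linear-Gaussian placeholder (not one of the models used later in the talk).
rng = np.random.default_rng(0)
x, y = simulate_ssm(
    T=100,
    sample_mu=lambda: rng.normal(0.0, 1.0),
    sample_f=lambda xp: 0.9 * xp + rng.normal(0.0, 1.0),
    sample_g=lambda xc: xc + rng.normal(0.0, 0.5),
)
```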

Bayesian Inference in General State-Space Models

Given a collection of observations $y_{1:T} := (y_1, \ldots, y_T)$, we are interested in carrying out inference about $\theta$ and $X_{1:T} := (X_1, \ldots, X_T)$.
In this Bayesian framework, inference relies on the posterior density
$$p(\theta, x_{1:T} \mid y_{1:T}) = p(\theta \mid y_{1:T})\, p_\theta(x_{1:T} \mid y_{1:T}),$$
where
$$p(\theta \mid y_{1:T}) = \frac{p_\theta(y_{1:T})\, p(\theta)}{p(y_{1:T})}, \qquad p_\theta(x_{1:T} \mid y_{1:T}) = \frac{p_\theta(x_{1:T}, y_{1:T})}{p_\theta(y_{1:T})},$$
with
$$p_\theta(x_{1:T}, y_{1:T}) = \mu_\theta(x_1) \prod_{n=2}^{T} f_\theta(x_n \mid x_{n-1}) \prod_{n=1}^{T} g_\theta(y_n \mid x_n).$$
Except for a few models, including finite state-space HMMs and linear Gaussian models (Kalman), even $p_\theta(x_{1:T} \mid y_{1:T})$ and $p_\theta(y_{1:T})$ do not admit a standard form.
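
The factorisation above maps directly onto code. A minimal sketch, assuming pointwise density evaluators log_mu, log_f and log_g are available; for the intractable models targeted later in the talk this is exactly what fails.

```python
def log_joint(x, y, log_mu, log_f, log_g):
    """log p_theta(x_{1:T}, y_{1:T}) = log mu(x_1) + sum_n log f(x_n | x_{n-1}) + sum_n log g(y_n | x_n)."""
    out = log_mu(x[0]) + log_g(y[0], x[0])
    for n in range(1, len(x)):
        out += log_f(x[n], x[n - 1]) + log_g(y[n], x[n])
    return out
```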

Common MCMC Approaches and Limitations

When it is possible to sample from $p_\theta(x_{1:T} \mid y_{1:T})$ and compute $p_\theta(y_{1:T})$, efficient MCMC algorithms can be designed. This is only feasible for linear Gaussian and finite state-space models.
Standard practice consists of building an MCMC kernel updating sub-blocks according to
$$p_\theta(x_{n:n+K-1} \mid y_{n:n+K-1}, x_{n-1}, x_{n+K}) \propto \prod_{m=n}^{n+K} f_\theta(x_m \mid x_{m-1}) \prod_{m=n}^{n+K-1} g_\theta(y_m \mid x_m).$$
Tradeoff between the length $K$ of the blocks and the ease of designing good proposals for Metropolis-Hastings (MH) moves.
Many complex models (e.g. volatility models) are such that it is only possible to sample from the prior but impossible to evaluate it pointwise: how can one design efficient MCMC for such models?

Sequential Monte Carlo aka Particle Filters

SMC methods provide approximations of $p_\theta(x_{1:T} \mid y_{1:T})$ and $p_\theta(y_{1:T})$.
To sample from $p_\theta(x_{1:T} \mid y_{1:T})$, SMC proceeds sequentially by first approximating $p_\theta(x_1 \mid y_1)$ and $p_\theta(y_1)$ at time 1, then $p_\theta(x_{1:2} \mid y_{1:2})$ and $p_\theta(y_{1:2})$ at time 2, and so on.
SMC methods approximate the distributions of interest via a cloud of $N$ particles which are propagated using importance sampling and resampling steps.

Vanilla SMC - The Bootstrap Filter

At time 1:
  Sample $X_1^k \sim \mu_\theta(\cdot)$ and compute $W_1^k \propto g_\theta(y_1 \mid X_1^k)$, $\sum_{k=1}^{N} W_1^k = 1$.
  Resample $N$ times from $\hat{p}_\theta(dx_1 \mid y_1) = \sum_{k=1}^{N} W_1^k\, \delta_{X_1^k}(dx_1)$ to obtain $\{\overline{X}_1^k\}$.
At time $n$, $n > 1$:
  Sample $X_n^k \sim f_\theta(\cdot \mid \overline{X}_{n-1}^k)$, set $X_{1:n}^k = (\overline{X}_{1:n-1}^k, X_n^k)$ and compute $W_n^k \propto g_\theta(y_n \mid X_n^k)$, $\sum_{k=1}^{N} W_n^k = 1$.
  Resample $N$ times from $\hat{p}_\theta(dx_{1:n} \mid y_{1:n}) = \sum_{k=1}^{N} W_n^k\, \delta_{X_{1:n}^k}(dx_{1:n})$ to obtain $\{\overline{X}_{1:n}^k\}$.
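
A minimal NumPy sketch of this bootstrap filter. The function names, the user-supplied samplers and the multinomial resampling at every step are my assumptions; particle paths are not stored, and the running estimate of $\log \hat{p}_\theta(y_{1:T})$ anticipates the next slide.

```python
import numpy as np

def bootstrap_filter(y, sample_mu, sample_f, log_g, N, rng):
    """Bootstrap particle filter with multinomial resampling.
    sample_mu(N)     -> N draws from mu_theta
    sample_f(x, n)   -> elementwise draws from f_theta(. | x) at time n
    log_g(y_n, x, n) -> log g_theta(y_n | x) evaluated at the array of particles
    Returns the resampled particles at the final time and log p_hat_theta(y_{1:T})."""
    T = len(y)
    log_Z = 0.0
    x = sample_mu(N)                      # X_1^k ~ mu_theta
    for n in range(T):
        if n > 0:
            x = sample_f(x, n)            # X_n^k ~ f_theta(. | Xbar_{n-1}^k)
        logw = log_g(y[n], x, n)          # unnormalised log-weights
        m = logw.max()
        w = np.exp(logw - m)
        log_Z += m + np.log(w.mean())     # log of (1/N) sum_k g_theta(y_n | X_n^k)
        idx = rng.choice(N, size=N, p=w / w.sum())   # multinomial resampling
        x = x[idx]                        # Xbar_n^k
    return x, log_Z
```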

SMC Output

At time $T$, we obtain the following approximation of the posterior of interest
$$\hat{p}_\theta(dx_{1:T} \mid y_{1:T}) = \sum_{k=1}^{N} W_T^k\, \delta_{X_{1:T}^k}(dx_{1:T})$$
and an approximation of $p_\theta(y_{1:T})$ is given by
$$\hat{p}_\theta(y_{1:T}) = \hat{p}_\theta(y_1) \prod_{n=2}^{T} \hat{p}_\theta(y_n \mid y_{1:n-1}) = \prod_{n=1}^{T} \left( \frac{1}{N} \sum_{k=1}^{N} g_\theta\left(y_n \mid X_n^k\right) \right).$$
These approximations are asymptotically (i.e. $N \to \infty$) consistent under very weak assumptions.

Some Theoretical Results

Under mixing assumptions (Del Moral, 2004), we have
$$\left\| \mathcal{L}(X_{1:T}) - p_\theta(\cdot \mid y_{1:T}) \right\|_{\mathrm{tv}} \leq C_\theta \frac{T}{N}$$
where $X_{1:T} \sim \mathbb{E}\left[\hat{p}_\theta(\cdot \mid y_{1:T})\right]$.
Under mixing assumptions (Del Moral et al., 2010) we also have
$$\frac{\mathbb{V}\left[\hat{p}_\theta(y_{1:T})\right]}{p_\theta(y_{1:T})^2} \leq D_\theta \frac{T}{N}.$$
Loosely speaking, the performance of SMC only degrades linearly with time, rather than exponentially as for naive approaches.

Combining MCMC and SMC

'Idea': Use the output of our SMC algorithm to define 'good' high-dimensional proposal distributions for MCMC.
Problem: Consider $X_{1:T} \sim \hat{p}_\theta(\cdot \mid y_{1:T})$; then the unconditional distribution is
$$q_\theta(dx_{1:T} \mid y_{1:T}) = \mathbb{E}\left[ \sum_{k=1}^{N} W_T^k\, \delta_{X_{1:T}^k}(dx_{1:T}) \right]$$
where the expectation is w.r.t. all the random variables of the SMC algorithm: this does not admit an analytic expression, so we cannot use the MH algorithm directly.
We will show here how to bypass this problem using an auxiliary variable trick.

Marginal MH Sampler

Consider the following so-called marginal MH algorithm, which uses as a proposal
$$q\left( (x_{1:T}^*, \theta^*) \mid (x_{1:T}, \theta) \right) = q(\theta^* \mid \theta)\, p_{\theta^*}(x_{1:T}^* \mid y_{1:T}).$$
The MH acceptance probability is
$$1 \wedge \frac{p(\theta^*, x_{1:T}^* \mid y_{1:T})\, q\left( (x_{1:T}, \theta) \mid (x_{1:T}^*, \theta^*) \right)}{p(\theta, x_{1:T} \mid y_{1:T})\, q\left( (x_{1:T}^*, \theta^*) \mid (x_{1:T}, \theta) \right)} = 1 \wedge \frac{p_{\theta^*}(y_{1:T})\, p(\theta^*)\, q(\theta \mid \theta^*)}{p_\theta(y_{1:T})\, p(\theta)\, q(\theta^* \mid \theta)}.$$
Problem: We do not know $p_\theta(y_{1:T}) = \int p_\theta(x_{1:T}, y_{1:T})\, dx_{1:T}$ analytically, and we do not know how to sample from $p_\theta(x_{1:T} \mid y_{1:T})$.
"Idea": Use SMC approximations of $p_\theta(x_{1:T} \mid y_{1:T})$ and $p_\theta(y_{1:T})$.

"Idealized" Marginal MH Sampler

At iteration 1:
  Set $\theta(1)$.
  Sample $X_{1:T}(1) \sim p_{\theta(1)}(\cdot \mid y_{1:T})$.
At iteration $i$, $i \geq 2$:
  Sample $\theta^* \sim q(\cdot \mid \theta(i-1))$.
  Sample $X_{1:T}^* \sim p_{\theta^*}(\cdot \mid y_{1:T})$.
  With probability
  $$1 \wedge \frac{p_{\theta^*}(y_{1:T})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{p_{\theta(i-1)}(y_{1:T})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}$$
  set $\theta(i) = \theta^*$, $X_{1:T}(i) = X_{1:T}^*$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:T}(i) = X_{1:T}(i-1)$.

Particle Marginal MH Sampler

At iteration 1:
  Set $\theta(1)$ and run an SMC algorithm to obtain $\hat{p}_{\theta(1)}(dx_{1:T} \mid y_{1:T})$ and $\hat{p}_{\theta(1)}(y_{1:T})$.
  Sample $X_{1:T}(1) \sim \hat{p}_{\theta(1)}(\cdot \mid y_{1:T})$.
At iteration $i$, $i \geq 2$:
  Sample $\theta^* \sim q(\cdot \mid \theta(i-1))$ and run an SMC algorithm to obtain $\hat{p}_{\theta^*}(dx_{1:T} \mid y_{1:T})$ and $\hat{p}_{\theta^*}(y_{1:T})$.
  Sample $X_{1:T}^* \sim \hat{p}_{\theta^*}(\cdot \mid y_{1:T})$.
  With probability
  $$1 \wedge \frac{\hat{p}_{\theta^*}(y_{1:T})\, p(\theta^*)\, q(\theta(i-1) \mid \theta^*)}{\hat{p}_{\theta(i-1)}(y_{1:T})\, p(\theta(i-1))\, q(\theta^* \mid \theta(i-1))}$$
  set $\theta(i) = \theta^*$, $X_{1:T}(i) = X_{1:T}^*$; otherwise set $\theta(i) = \theta(i-1)$, $X_{1:T}(i) = X_{1:T}(i-1)$.
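
A compact sketch of the PMMH loop, reusing the bootstrap_filter sketch above to produce $\hat{p}_\theta(y_{1:T})$. The symmetric Gaussian random-walk proposal on $\theta$ and the function names are assumptions made here for illustration; storing a sampled trajectory at each iteration is omitted for brevity.

```python
import numpy as np

def pmmh(y, build_model, log_prior, theta0, n_iter, N, step, rng):
    """Particle marginal Metropolis-Hastings.
    build_model(theta) -> (sample_mu, sample_f, log_g) for bootstrap_filter (sketch above).
    A symmetric Gaussian random walk is used, so q(theta | theta*) / q(theta* | theta) cancels.
    Only the theta-chain is returned."""
    theta = np.atleast_1d(np.asarray(theta0, dtype=float))
    _, log_Z = bootstrap_filter(y, *build_model(theta), N, rng)
    chain = [theta.copy()]
    for _ in range(n_iter):
        theta_prop = theta + step * rng.standard_normal(theta.shape)
        _, log_Z_prop = bootstrap_filter(y, *build_model(theta_prop), N, rng)
        log_alpha = (log_Z_prop + log_prior(theta_prop)) - (log_Z + log_prior(theta))
        if np.log(rng.uniform()) < log_alpha:      # accept with probability 1 ^ exp(log_alpha)
            theta, log_Z = theta_prop, log_Z_prop  # keep the estimate attached to the accepted theta
        chain.append(theta.copy())
    return np.array(chain)
```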

Validity of the Particle Marginal MH Sampler

Assume that the 'idealized' marginal MH sampler is irreducible and aperiodic; then, under very weak assumptions, the PMMH sampler generates a sequence $\{\theta(i), X_{1:T}(i)\}$ whose marginal distributions $\mathcal{L}^N(\{\theta(i), X_{1:T}(i)\} \in \cdot)$ satisfy, for any $N \geq 1$,
$$\left\| \mathcal{L}^N\left( \{\theta(i), X_{1:T}(i)\} \in \cdot \right) - p(\cdot \mid y_{1:T}) \right\|_{\mathrm{TV}} \to 0 \quad \text{as } i \to \infty.$$
Corollary of a more general result: the PMMH sampler is a standard MH sampler with target distribution $\widetilde{\pi}^N$ and proposal $\widetilde{q}^N$ defined on an extended space associated to all the variables used to generate the proposal.

Ancestral Lines Generated by SMC

Figure: Ancestral lineages for $N = 5$ and $T = 3$. The lighter path is $X_{1:3}^2 = (X_1^3, X_2^4, X_3^2)$ and its ancestral lineage is $B_{1:3}^2 = (3, 4, 2)$.

Structure of the Proposal and Target Distributions

Proposal distribution:
$$q^N\left( \theta^*, k, x_1, \ldots, x_T, a_1, \ldots, a_{T-1} \mid \theta \right) = q(\theta^* \mid \theta)\, \psi^{\theta^*}(x_1, \ldots, x_T, a_1, \ldots, a_{T-1})\, w_T^k$$
where
$$\psi^{\theta}(x_1, \ldots, x_T, a_1, \ldots, a_{T-1}) = \left( \prod_{m=1}^{N} \mu_\theta(x_1^m) \right) \prod_{n=2}^{T} \left( r(a_{n-1} \mid w_{n-1}) \prod_{m=1}^{N} f_\theta\left(x_n^m \mid x_{n-1}^{a_{n-1}^m}\right) \right).$$
Target distribution:
$$\widetilde{\pi}^N(\theta, k, x_1, \ldots, x_T, a_1, \ldots, a_{T-1}) \propto \hat{p}_\theta(y_{1:T})\, p(\theta)\, w_T^k\, \psi^{\theta}(x_1, \ldots, x_T, a_1, \ldots, a_{T-1}).$$

Properties of the Target Distribution

We know that $\mathbb{E}\left[\hat{p}_\theta(y_{1:T})\right] = p_\theta(y_{1:T})$, where the expectation is with respect to $\psi^\theta(\cdot)$.
It follows directly that
$$\widetilde{\pi}^N(\theta) = p(\theta \mid y_{1:T}).$$
However, we can say much more about $\widetilde{\pi}^N(\theta, k, x_1, \ldots, x_T, a_1, \ldots, a_{T-1})$.

Explicit Structure of the Target Distribution: Toy Case

For $T = 1$, we have
$$\widetilde{\pi}^N(\theta, k, x_1) = \frac{p(\theta, x_1^k \mid y_1)}{N} \prod_{m=1,\, m \neq k}^{N} \mu_\theta(x_1^m) = \frac{p(\theta)\, g_\theta(y_1 \mid x_1^k)}{N\, p(y_1)} \prod_{m=1}^{N} \mu_\theta(x_1^m).$$
A (collapsed) Gibbs sampler to sample from $\widetilde{\pi}^N$ can be implemented using
$$\widetilde{\pi}^N\left( \theta, x_1^{-k} \mid k, x_1^k \right) = p\left(\theta \mid y_1, x_1^k\right) \prod_{m=1,\, m \neq k}^{N} \mu_\theta(x_1^m),$$
$$\widetilde{\pi}^N\left( K = i \mid \theta, x_1^{1:N} \right) = \overline{w}_\theta^i \propto w_\theta\left(x_1^i\right).$$
Note that even for fixed $\theta$, this is a non-standard MCMC update for $p_\theta(x_1 \mid y_1)$.

Explicit Structure of the Target Distribution

The target can be rewritten as
$$\widetilde{\pi}^N(\theta, k, x_1, \ldots, x_T, a_1, \ldots, a_{T-1}) = \frac{p(\theta, x_{1:T}^k \mid y_{1:T})}{N^T} \cdot \frac{\psi^{\theta}(x_1, \ldots, x_T, a_1, \ldots, a_{T-1})}{\mu_\theta\left(x_1^{b_1^k}\right) \prod_{n=2}^{T} r\left(b_{n-1}^k \mid w_{n-1}\right) f_\theta\left(x_n^{b_n^k} \mid x_{n-1}^{b_{n-1}^k}\right)}.$$
To sample from this target distribution:
  Sample $\left(B_1^K, B_2^K, \ldots, B_{T-1}^K, K\right)$ from a uniform distribution on $\{1, \ldots, N\}^T$.
  Sample $\theta$ and $X_{1:T}^K = \left(X_1^{B_1^K}, X_2^{B_2^K}, \ldots, X_{T-1}^{B_{T-1}^K}, X_T^K\right)$ from $p(\theta, x_{1:T} \mid y_{1:T})$. (We do not know how to do this; this is why we use MCMC.)
  Run a conditional SMC algorithm compatible with $X_{1:T}^K$ and its ancestral lineage $\left(B_1^K, B_2^K, \ldots, B_{T-1}^K, K\right)$.

Conditional SMC

Figure: Example of $N - 1 = 4$ ancestral lineages generated by a conditional SMC algorithm for $N = 5$, $T = 3$, conditional upon $X_{1:3}^2$ and $B_{1:3}^2$.

Conditional SMC Algorithm

At time 1:
  For $m \neq b_1^k$, sample $X_1^m \sim \mu_\theta(\cdot)$ and set $W_1^m \propto g_\theta(y_1 \mid X_1^m)$, $\sum_{m=1}^{N} W_1^m = 1$.
  Resample $N - 1$ times from $\hat{p}_\theta(dx_1 \mid y_1) = \sum_{m=1}^{N} W_1^m\, \delta_{X_1^m}(dx_1)$ to obtain $\{\overline{X}_1^m\}$ and set $\overline{X}_1^{b_1^k} = X_1^{b_1^k}$.
At time $n = 2, \ldots, T$:
  For $m \neq b_n^k$, sample $X_n^m \sim f_\theta(\cdot \mid \overline{X}_{n-1}^m)$, set $X_{1:n}^m = (\overline{X}_{1:n-1}^m, X_n^m)$ and $W_n^m \propto g_\theta(y_n \mid X_n^m)$, $\sum_{m=1}^{N} W_n^m = 1$.
  Resample $N - 1$ times from $\hat{p}_\theta(dx_{1:n} \mid y_{1:n}) = \sum_{m=1}^{N} W_n^m\, \delta_{X_{1:n}^m}(dx_{1:n})$ to obtain $\{\overline{X}_{1:n}^m\}$ and set $\overline{X}_{1:n}^{b_n^k} = X_{1:n}^{b_n^k}$.
At time $n = T$:
  Sample $X_{1:T} \sim \hat{p}_\theta(dx_{1:T} \mid y_{1:T})$.
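
A sketch of this conditional SMC step, compatible with the samplers used by the bootstrap_filter sketch above. Pinning the reference path to the last particle slot, the multinomial resampling and the function name conditional_smc are my choices; the reference trajectory is never resampled away, but it can be selected as an ancestor by the other particles.

```python
import numpy as np

def conditional_smc(y, sample_mu, sample_f, log_g, N, x_ref, rng):
    """Conditional SMC with bootstrap proposals: particle slot N-1 is pinned to the
    reference path x_ref (shape (T,)) at every time step; returns one path drawn
    from the final weighted particle approximation."""
    T = len(y)
    paths = np.empty((N, T))
    paths[:, 0] = sample_mu(N)
    paths[N - 1, 0] = x_ref[0]                       # keep the reference value at time 1
    logw = log_g(y[0], paths[:, 0], 0)
    for n in range(1, T):
        W = np.exp(logw - logw.max())
        W /= W.sum()
        anc = rng.choice(N, size=N, p=W)             # resample ancestors for the free particles
        anc[N - 1] = N - 1                           # the reference keeps its own ancestry
        paths = paths[anc]
        paths[:, n] = sample_f(paths[:, n - 1], n)
        paths[N - 1, n] = x_ref[n]                   # overwrite slot N-1 with the reference value
        logw = log_g(y[n], paths[:, n], n)
    W = np.exp(logw - logw.max())
    W /= W.sum()
    return paths[rng.choice(N, p=W)].copy()          # draw X_{1:T} from p_hat_theta(dx_{1:T} | y_{1:T})
```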

"Idealized" Gibbs Sampler

To sample from $p(\theta, x_{1:T} \mid y_{1:T})$, an MCMC strategy consists of using the following block Gibbs sampler. At iteration $i$:
  Sample $X_{1:T}(i) \sim p_{\theta(i-1)}(\cdot \mid y_{1:T})$.
  Sample $\theta(i) \sim p(\cdot \mid y_{1:T}, X_{1:T}(i))$.
Problem: We do not know how to sample from $p_\theta(x_{1:T} \mid y_{1:T})$.

Particle Gibbs Sampler

At iteration $i$:
  Sample $\theta(i) \sim p(\cdot \mid y_{1:T}, X_{1:T}(i-1))$.
  Run a conditional SMC algorithm for $\theta(i)$ consistent with $X_{1:T}(i-1)$ and its ancestral lineage.
  Sample $X_{1:T}(i) \sim \hat{p}(\cdot \mid y_{1:T}, \theta(i))$ from the resulting approximation (hence its ancestral lineage too).
Assume that the 'ideal' Gibbs sampler is irreducible and aperiodic; then, under very weak assumptions, the particle Gibbs sampler generates a sequence $\{\theta(i), X_{1:T}(i)\}$ such that, for any $N \geq 2$,
$$\left\| \mathcal{L}\left( (\theta(i), X_{1:T}(i)) \in \cdot \right) - p(\cdot \mid y_{1:T}) \right\| \to 0 \quad \text{as } i \to \infty.$$
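
The corresponding particle Gibbs loop, assuming a user-supplied draw from $p(\theta \mid y_{1:T}, x_{1:T})$ and the conditional_smc sketch above; the function names are illustrative and, in practice, this conditional is tractable only for suitable (e.g. conditionally conjugate) priors.

```python
import numpy as np

def particle_gibbs(y, build_model, sample_theta_given_x, theta0, x0, n_iter, N, rng):
    """Particle Gibbs: alternate theta | (y, x_{1:T}) with a conditional SMC refresh
    of x_{1:T} | (y, theta) that keeps the current path as the reference.
    build_model(theta) -> (sample_mu, sample_f, log_g); conditional_smc is the sketch above."""
    theta, x = theta0, np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_iter):
        theta = sample_theta_given_x(x, y, rng)                  # theta(i) ~ p(. | y_{1:T}, X_{1:T}(i-1))
        x = conditional_smc(y, *build_model(theta), N, x, rng)   # X_{1:T}(i) from the conditional SMC output
        samples.append((theta, x.copy()))
    return samples
```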

Applications of Conditional SMC Updates

For $T \gg N$, the SMC approximation might be poor and result in an inefficient PMMH algorithm.
Similarly, the particle Gibbs sampler might suffer from the coalescence problem.
For very long time series, we can alternatively update $X_{1:T}$ by (large) sub-blocks $X_{a:b}$ for $1 \leq a < b \leq T$. Conditional SMC moves can be used to sample from $p_\theta(x_{a:b} \mid x_{a-1}, x_{b+1}, y_{a:b})$.

Nonlinear State-Space Model

Consider the following model
$$X_n = \frac{1}{2} X_{n-1} + 25 \frac{X_{n-1}}{1 + X_{n-1}^2} + 8 \cos(1.2\, n) + V_n,$$
$$Y_n = \frac{X_n^2}{20} + W_n$$
where $V_n \sim \mathcal{N}(0, \sigma_v^2)$, $W_n \sim \mathcal{N}(0, \sigma_w^2)$ and $X_1 \sim \mathcal{N}(0, 5^2)$.
Use the prior for $\{X_n\}$ as proposal distribution.
For a fixed $\theta$, we evaluate the expected acceptance probability as a function of $N$.
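
A sketch wiring this benchmark model into the bootstrap_filter sketch above. The parameterisation $\theta = (\sigma_v, \sigma_w)$, the time-index convention (Python index $n$ corresponds to model time $n+1$) and the simulated-data usage at the end are my choices; with $\theta$ parameterised on the log scale to keep the standard deviations positive, the same samplers can be passed to the pmmh sketch.

```python
import numpy as np

def build_growth_model(theta, rng):
    """Samplers/densities for the benchmark model above, for use with bootstrap_filter.
    theta = (sigma_v, sigma_w); the prior f_theta is used as the proposal, as in the talk."""
    sigma_v, sigma_w = theta

    def sample_mu(N):
        return rng.normal(0.0, 5.0, size=N)                       # X_1 ~ N(0, 5^2)

    def sample_f(x, n):                                           # Python index n <-> model time n+1
        mean = 0.5 * x + 25.0 * x / (1.0 + x**2) + 8.0 * np.cos(1.2 * (n + 1))
        return mean + rng.normal(0.0, sigma_v, size=x.shape)

    def log_g(y_n, x, n):                                         # log N(y_n; x^2/20, sigma_w^2)
        return -0.5 * np.log(2.0 * np.pi * sigma_w**2) - 0.5 * ((y_n - x**2 / 20.0) / sigma_w) ** 2

    return sample_mu, sample_f, log_g

# Usage: simulate T = 50 observations with sigma_v^2 = sigma_w^2 = 10, then estimate the likelihood.
rng = np.random.default_rng(1)
sigma_v, sigma_w = np.sqrt(10.0), np.sqrt(10.0)
sample_mu, sample_f, log_g = build_growth_model((sigma_v, sigma_w), rng)
x = sample_mu(1)
ys = [float(x[0] ** 2 / 20.0 + rng.normal(0.0, sigma_w))]
for n in range(1, 50):
    x = sample_f(x, n)
    ys.append(float(x[0] ** 2 / 20.0 + rng.normal(0.0, sigma_w)))
particles, log_Z = bootstrap_filter(np.asarray(ys), sample_mu, sample_f, log_g, N=500, rng=rng)
```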

Average Acceptance Probability

Figure: Average acceptance probability as a function of the number of particles, for T = 10, 25, 50 and 100, when $\sigma_v^2 = \sigma_w^2 = 10$.

Average Acceptance Probability

Figure: Average acceptance probability as a function of the number of particles, for T = 10, 25, 50 and 100, when $\sigma_v^2 = 10$, $\sigma_w^2 = 1$.

Inference for Stochastic Kinetic Models

Two species, $X_t^1$ (prey) and $X_t^2$ (predator):
$$\Pr\left( X_{t+dt}^1 = x_t^1 + 1,\, X_{t+dt}^2 = x_t^2 \mid x_t^1, x_t^2 \right) = \alpha\, x_t^1\, dt + o(dt),$$
$$\Pr\left( X_{t+dt}^1 = x_t^1 - 1,\, X_{t+dt}^2 = x_t^2 + 1 \mid x_t^1, x_t^2 \right) = \beta\, x_t^1 x_t^2\, dt + o(dt),$$
$$\Pr\left( X_{t+dt}^1 = x_t^1,\, X_{t+dt}^2 = x_t^2 - 1 \mid x_t^1, x_t^2 \right) = \gamma\, x_t^2\, dt + o(dt),$$
observed at discrete times:
$$Y_n = X_{n\Delta}^1 + W_n \quad \text{with} \quad W_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2).$$
We are interested in the kinetic rate constants $\theta = (\alpha, \beta, \gamma)$, a priori distributed as (Boys et al., 2008)
$$\alpha \sim \mathcal{G}(1, 10), \quad \beta \sim \mathcal{G}(1, 0.25), \quad \gamma \sim \mathcal{G}(1, 7.5).$$
MCMC methods require reversible jumps; particle MCMC requires only forward simulation.
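
Forward simulation of this predator-prey model is all that particle MCMC requires. A minimal Gillespie-algorithm sketch; the initial populations, the rate values in the example run and the function name are illustrative assumptions.

```python
import numpy as np

def simulate_lv(x1, x2, alpha, beta, gamma, t_end, rng):
    """Gillespie simulation of the kinetic model above: prey birth (rate alpha*x1),
    predation (rate beta*x1*x2), predator death (rate gamma*x2).
    Returns the list of (time, x1, x2) after each reaction."""
    t, path = 0.0, [(0.0, x1, x2)]
    while True:
        rates = np.array([alpha * x1, beta * x1 * x2, gamma * x2])
        total = rates.sum()
        if total == 0.0:                        # both populations extinct: nothing can happen
            break
        t += rng.exponential(1.0 / total)       # waiting time until the next reaction
        if t >= t_end:
            break
        j = rng.choice(3, p=rates / total)      # which reaction fires
        if j == 0:
            x1 += 1
        elif j == 1:
            x1 -= 1
            x2 += 1
        else:
            x2 -= 1
        path.append((t, x1, x2))
    return path

# Illustrative run; noisy prey counts Y_n = X^1_{n*Delta} + W_n would be read off at times n*Delta.
rng = np.random.default_rng(2)
path = simulate_lv(x1=50, x2=100, alpha=2.0, beta=0.02, gamma=1.5, t_end=10.0, rng=rng)
```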

Experimental Results

Figure: Simulated prey and predator data (left) and posterior distributions of $\alpha$, $\beta$ and $\gamma$ (right).

Autocorrelation Functions

Figure: Autocorrelation of $\alpha$ (left) and $\beta$ (right) for the PMMH sampler for various $N$ (50, 100, 200, 500 and 1000 particles).

Discussion

PMCMC methods allow us to design 'good' high-dimensional proposals based only on low-dimensional (and potentially unsophisticated) proposals.
PMCMC allows us to perform Bayesian inference for dynamic models for which only forward simulation is possible.
More precise quantitative convergence results need to be established.
PMCMC should prove useful for perfect simulation.

References

C. Andrieu, A. Doucet & R. Holenstein, Particle Markov chain Monte Carlo methods (with discussion), J. Royal Statistical Society B, to appear.
M.A.G. Belmonte, O. Papaspiliopoulos & M.K. Pitt, Particle filter estimation of duration-type models, Comp. Stat. Data Analysis, to appear.
T. Flury & N. Shephard, Bayesian inference based only on simulated likelihood: particle filter analysis of dynamic economic models, Econometrics Review, to appear.
R. Silva, P. Giordani & R. Kohn, Particle filtering within adaptive Metropolis-Hastings sampling, J. Comp. Graph. Statist., to appear.