
Markov Chain Monte Carlo

Recall: To compute the expectation E(h(Y)) we use the approximation

    E(h(Y)) \approx \frac{1}{n} \sum_{t=1}^{n} h(Y^{(t)})   with   Y^{(1)}, \ldots, Y^{(n)} \sim f(y).

Thus our aim is to sample Y^{(1)}, \ldots, Y^{(n)} from f(y).

Problem: Independent sampling from f(y) may be difficult.

Markov chain Monte Carlo (MCMC) approach:

- Generate a Markov chain {Y^{(t)}} with stationary distribution f(y).
- Early iterations Y^{(1)}, \ldots, Y^{(m)} reflect the starting value Y^{(0)}. These iterations are called burn-in.
- After the burn-in, we say the chain has "converged".
- Omit the burn-in from averages:

    \frac{1}{n - m} \sum_{t=m+1}^{n} h(Y^{(t)})

[Figure: trace plot of a chain over 1000 iterations, divided into the burn-in phase and the stationarity phase.]

How do we construct a Markov chain {Y^{(t)}} which has stationary distribution f(y)?

- Gibbs sampler
- Metropolis-Hastings (Metropolis et al. 1953; Hastings 1970)


Gibbs Sampler

Let Y = (Y_1, \ldots, Y_d) be d-dimensional with d \geq 2 and distribution f(y). The full conditional distribution of Y_i is given by

    f(y_i | y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_d)
      = \frac{f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d)}
             {\int f(y_1, \ldots, y_{i-1}, y_i, y_{i+1}, \ldots, y_d) \, dy_i}.

Sample or update in turn:

    Y_1^{(t+1)} \sim f(y_1 | Y_2^{(t)}, Y_3^{(t)}, \ldots, Y_d^{(t)})
    Y_2^{(t+1)} \sim f(y_2 | Y_1^{(t+1)}, Y_3^{(t)}, \ldots, Y_d^{(t)})
    Y_3^{(t+1)} \sim f(y_3 | Y_1^{(t+1)}, Y_2^{(t+1)}, Y_4^{(t)}, \ldots, Y_d^{(t)})
    ...
    Y_d^{(t+1)} \sim f(y_d | Y_1^{(t+1)}, Y_2^{(t+1)}, \ldots, Y_{d-1}^{(t+1)})

Always use the most recent values.

In two dimensions, the sample path of the Gibbs sampler looks like this:

[Figure: sample path of (Y_1^{(t)}, Y_2^{(t)}) for t = 1, \ldots, 7; each update moves parallel to one of the coordinate axes.]
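In code, one sweep of the update scheme above is simply a loop over the components, always conditioning on the most recent values. The following minimal R sketch is not part of the original notes; sample_full_conditional(i, y) stands for a hypothetical user-supplied function that draws Y_i from its full conditional given the current values of the other components.

    # One Gibbs sweep: update each component in turn,
    # always conditioning on the most recent values of the others.
    gibbs_sweep <- function(y, sample_full_conditional) {
      for (i in seq_along(y)) {
        y[i] <- sample_full_conditional(i, y)   # draw Y_i | all other components
      }
      y
    }

    # Run the chain for n iterations from the starting value y0;
    # row t of the result is Y^(t).
    gibbs_chain <- function(y0, n, sample_full_conditional) {
      out <- matrix(NA, nrow = n, ncol = length(y0))
      y <- y0
      for (t in 1:n) {
        y <- gibbs_sweep(y, sample_full_conditional)
        out[t, ] <- y
      }
      out
    }

Dropping the first m rows of the output before averaging corresponds to omitting the burn-in as described above.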



Detailed balance for the Gibbs sampler: For simplicity, let Y = (Y_1, Y_2)^T. Then the update Y^{(t+1)} at time t+1 is obtained from the previous Y^{(t)} in two steps:

    Y_1^{(t+1)} \sim p(y_1 | Y_2^{(t)})
    Y_2^{(t+1)} \sim p(y_2 | Y_1^{(t+1)})

Accordingly, the transition matrix P(y, y') = P(Y^{(t+1)} = y' | Y^{(t)} = y) can be factorized into two separate transition matrices,

    P(y, y') = P_1(y, \tilde{y}) P_2(\tilde{y}, y'),

where \tilde{y} = (y_1', y_2)^T is the intermediate result after the first step. Obviously we have

    P_1(y, \tilde{y}) = p(y_1' | y_2)   and   P_2(\tilde{y}, y') = p(y_2' | y_1').

Note that for any y, y', we have P_1(y, y') = 0 if y_2 \neq y_2' and P_2(y, y') = 0 if y_1 \neq y_1'.

According to the detailed balance condition for time-dependent Markov chains, it suffices to show detailed balance for each of the transition matrices. For any states y, y' such that y_2 = y_2',

    p(y) P_1(y, y') = p(y_1, y_2) p(y_1' | y_2) = p(y_1 | y_2) p(y_1', y_2)
                    = p(y_1 | y_2') p(y_1', y_2') = P_1(y', y) p(y'),

while for y, y' with y_2 \neq y_2' the equation is trivially fulfilled. Similarly we obtain for y, y' such that y_1 = y_1'

    p(y) P_2(y, y') = p(y_1, y_2) p(y_2' | y_1) = p(y_2 | y_1) p(y_1, y_2')
                    = p(y_2 | y_1') p(y_1', y_2') = P_2(y', y) p(y'),

while for y, y' with y_1 \neq y_1' the equation trivially holds. Altogether this shows that p(y) is indeed the stationary distribution of the Gibbs sampler. Note that combined we get

    p(y) P(y, y') = p(y) P_1(y, \tilde{y}) P_2(\tilde{y}, y') = p(y') P_2(y', \tilde{y}) P_1(\tilde{y}, y) \neq p(y') P(y', y)

in general, so the full sweep itself does not satisfy detailed balance.

Explanation: Markov chains {Y_t} which satisfy the detailed balance equation are called time-reversible, since it can be shown that

    P(Y_{t+1} = y' | Y_t = y) = P(Y_t = y' | Y_{t+1} = y).

For the above Gibbs sampler, to go back in time we have to update the two components in reverse order: first Y_2^{(t+1)} and then Y_1^{(t+1)}.
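This argument can be checked numerically on a toy example. The following R sketch is not part of the original notes; the 2 x 2 joint distribution is an arbitrary choice. It builds the transition matrices P_1, P_2 and P = P_1 P_2 for a discrete two-component distribution and verifies that P_1 and P_2 satisfy detailed balance while the combined sweep does not, although p remains its stationary distribution.

    # Toy check: a discrete distribution p(y1, y2) on {1,2} x {1,2}
    states <- expand.grid(y1 = 1:2, y2 = 1:2)        # the four states (y1, y2)
    pjoint <- c(0.1, 0.2, 0.3, 0.4)                  # an arbitrary joint distribution

    # P1 redraws y1 from p(y1 | y2); P2 redraws y2 from p(y2 | y1)
    P1 <- matrix(0, 4, 4)
    P2 <- matrix(0, 4, 4)
    for (a in 1:4) for (b in 1:4) {
      if (states$y2[a] == states$y2[b])              # y2 is held fixed by P1
        P1[a, b] <- pjoint[b] / sum(pjoint[states$y2 == states$y2[a]])
      if (states$y1[a] == states$y1[b])              # y1 is held fixed by P2
        P2[a, b] <- pjoint[b] / sum(pjoint[states$y1 == states$y1[a]])
    }
    P <- P1 %*% P2                                   # one full Gibbs sweep

    db <- function(Q) max(abs(diag(pjoint) %*% Q - t(diag(pjoint) %*% Q)))
    db(P1); db(P2)                  # both ~0: each half-step satisfies detailed balance
    db(P)                           # > 0: the combined sweep does not
    max(abs(pjoint %*% P - pjoint)) # ~0: yet pjoint is its stationary distribution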

Example: Bayes inference for a univariate normal sample

Consider normally distributed observations Y = (Y_1, \ldots, Y_n)^T,

    Y_i \sim_{iid} N(\mu, \sigma^2).

Likelihood:

    f(Y | \mu, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right)

Prior distribution (noninformative prior):

    \pi(\mu, \sigma^2) \propto \frac{1}{\sigma^2}

Posterior distribution:

    \pi(\mu, \sigma^2 | Y) \propto (\sigma^2)^{-(n/2 + 1)} \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right)

Define \tau = 1/\sigma^2. Then we can show that

    \mu | \tau, Y \sim N(\bar{Y}, \sigma^2/n) = N(\bar{Y}, (n\tau)^{-1})
    \tau | \mu, Y \sim \Gamma\left( \frac{n}{2}, \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right)

Gibbs sampler:

    \mu^{(t+1)} \sim N\left( \bar{Y}, (n \tau^{(t)})^{-1} \right)
    \tau^{(t+1)} \sim \Gamma\left( \frac{n}{2}, \frac{1}{2} \sum_{i=1}^{n} (Y_i - \mu^{(t+1)})^2 \right)

with (\sigma^2)^{(t+1)} = 1/\tau^{(t+1)}.
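The full conditionals above follow directly from the posterior; a short justification (not spelled out in the original notes) reads as follows. As a function of \mu alone,

    \pi(\mu | \sigma^2, Y) \propto \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right)
                           \propto \exp\left( -\frac{n}{2\sigma^2} (\mu - \bar{Y})^2 \right),

since \sum_i (Y_i - \mu)^2 = \sum_i (Y_i - \bar{Y})^2 + n(\mu - \bar{Y})^2; this is the kernel of N(\bar{Y}, \sigma^2/n). Transforming \sigma^2 to \tau = 1/\sigma^2 (Jacobian |d\sigma^2/d\tau| = \tau^{-2}) gives

    \pi(\mu, \tau | Y) \propto \tau^{n/2 + 1} \, \tau^{-2} \exp\left( -\frac{\tau}{2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right)
                      = \tau^{n/2 - 1} \exp\left( -\frac{\tau}{2} \sum_{i=1}^{n} (Y_i - \mu)^2 \right),

which, as a function of \tau, is the kernel of the \Gamma\left( \frac{n}{2}, \frac{1}{2} \sum_i (Y_i - \mu)^2 \right) distribution.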


Implementation in R

    n<-20                                    #Data
    Y<-rnorm(n,2,2)
    MC<-2; N<-1000                           #Run MC=2 chains of length N=1000
    p<-rep(0,2*MC*N)                         #Allocate memory for results
    dim(p)<-c(2,MC,N)
    for (j in (1:MC)) {                      #Loop over chains
      p2<-rgamma(1,n/2,1/2)                  #Starting value for tau
      for (i in (1:N)) {                     #Gibbs iterations
        p1<-rnorm(1,mean(Y),sqrt(1/(p2*n)))  #Update mu
        p2<-rgamma(1,n/2,sum((Y-p1)^2)/2)    #Update tau
        p[1,j,i]<-p1                         #Save results
        p[2,j,i]<-p2
      }
    }

Results: Bayes inference for a univariate normal sample

Two runs of the Gibbs sampler (N = 500):

[Figure: trace plots of \mu^{(t)} and \tau^{(t)} for the two runs, and the autocorrelation functions of \mu and \tau up to lag 50.]

Marginal and joint posterior distributions (based on 1000 draws):

[Figure: histograms of the draws of \mu^{(t)} and \tau^{(t)}, and a scatter plot of \tau^{(t)} against \mu^{(t)}.]
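These summaries can be reproduced from the stored draws with base R graphics. This is a minimal sketch, not part of the original notes; it assumes the array p from the implementation above is in the workspace and uses chain 1 only, and the panel titles are mine.

    # Marginal and joint posterior draws of mu and tau from chain 1
    par(mfrow = c(1, 3))
    hist(p[1, 1, ], xlab = "mu",  main = "Posterior draws of mu")
    hist(p[2, 1, ], xlab = "tau", main = "Posterior draws of tau")
    plot(p[1, 1, ], p[2, 1, ], xlab = "mu", ylab = "tau", main = "Joint draws")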

Markov Chain Monte Carlo

Example: Bivariate normal distribution

Let Y = (Y_1, Y_2)^T be normally distributed with mean \mu = (0, 0)^T and covariance matrix

    \Sigma = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}.

The conditional distributions are

    Y_1 | Y_2 \sim N(\rho Y_2, 1 - \rho^2)
    Y_2 | Y_1 \sim N(\rho Y_1, 1 - \rho^2)

Thus the steps of the Gibbs sampler are

    Y_1^{(t+1)} \sim N(\rho Y_2^{(t)}, 1 - \rho^2),
    Y_2^{(t+1)} \sim N(\rho Y_1^{(t+1)}, 1 - \rho^2).

Note: We can obtain an independent sample Y^{(t)} = (Y_1^{(t)}, Y_2^{(t)})^T by

    Y_1^{(t)} \sim N(0, 1),
    Y_2^{(t)} \sim N(\rho Y_1^{(t)}, 1 - \rho^2).
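This example translates directly into a few lines of R. The sketch below is not part of the original notes; rho = 0.8 and M = 1000 are arbitrary choices, and the starting value is (0, 0). The last two lines generate the independent sample described in the note above.

    rho <- 0.8; M <- 1000                       # hypothetical correlation and run length
    chain <- matrix(0, M, 2)                    # chain[t, ] = (Y1^(t), Y2^(t)); start at (0, 0)
    for (t in 2:M) {
      chain[t, 1] <- rnorm(1, rho * chain[t - 1, 2], sqrt(1 - rho^2))  # Y1 | most recent Y2
      chain[t, 2] <- rnorm(1, rho * chain[t, 1],     sqrt(1 - rho^2))  # Y2 | most recent Y1
    }

    # Independent draws from the same bivariate normal distribution
    Z1 <- rnorm(M)
    Z2 <- rnorm(M, rho * Z1, sqrt(1 - rho^2))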


Comparison of MCMC and independent draws

[Figure: trace plot and autocorrelation function of Y_1^{(t)}, and histograms of the Gibbs draws of Y_1 after iterations 100 to 400, 700, 1000 and 10000, each compared with an independent sample of size n = 300, 600, 900 and 9900.]
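A comparison in the spirit of this figure can be produced with the bivariate-normal sketch above. This snippet is not part of the original notes and assumes the objects chain, Z1 and M defined there; the first 100 iterations are discarded as burn-in.

    # Histograms of Y1: Gibbs draws after burn-in vs. independent draws
    par(mfrow = c(1, 2))
    hist(chain[101:M, 1], xlab = "Y1", main = "Gibbs sampler (iterations 101-1000)")
    hist(Z1[1:(M - 100)], xlab = "Y1",
         main = paste0("Independent sampling (n = ", M - 100, ")"))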

Convergence diagnostics

- Plot the chain for each quantity of interest.
- Plot the auto-correlation function (ACF)

      \rho_i(h) = corr(Y_i^{(t)}, Y_i^{(t+h)}),

  which measures the correlation of values h lags apart. Slow decay of the ACF indicates slow convergence and bad mixing. It can be used to find an independent subsample. (See the R sketch below.)
- Run multiple, independent chains (e.g. 3-10):
  - several long runs (Gelman and Rubin 1992) give an indication of convergence and a sense of statistical security;
  - one very long run (Geyer 1992) reaches parts other schemes cannot reach.
- Widely dispersed starting values are particularly helpful to detect slow convergence.

[Figure: several chains started from widely dispersed values; after an initial period they converge to the same region.]

If not satisfied, try some other diagnostics (→ literature).
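For the univariate normal example, the first two diagnostics can be produced with base R functions. This sketch is not part of the original notes; it assumes the array p and the constant MC from the implementation above are in the workspace.

    # Trace plot and autocorrelation function of mu for each chain
    par(mfrow = c(2, 2))
    for (j in 1:MC) plot(p[1, j, ], type = "l",
                         xlab = "Iteration", ylab = "mu", main = paste("Chain", j))
    for (j in 1:MC) acf(p[1, j, ], lag.max = 50, main = paste("ACF of mu, chain", j))

The Gelman-Rubin diagnostic mentioned above is implemented, for example, in the R package coda (function gelman.diag), which expects the chains as an mcmc.list object.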

Note: Even after the chain has reached convergence, it might not yet be good enough for estimating E(h(Y)).

[Figure: trace plot over 1000 iterations and the corresponding histogram of the draws.]

Problem: The chain should show good mixing (transitions between the states) → run the chain for a longer period.

[Figure: trace plot over 2000 iterations and the corresponding histogram of the draws.]

Monte Carlo error

Suppose we want to estimate E(h(Y)) by

    \hat{h} = \frac{1}{N} \sum_{t=1}^{N} h(Y^{(t)})   with   Y^{(t)} \sim f(y).

The error of the approximation (the Monte Carlo error) is \sqrt{var(\hat{h})}.

Estimation of the Monte Carlo error: Let {Y^{(i,t)}} be I Markov chains. Then var(\hat{h}) can be estimated by

    \frac{1}{I(I-1)} \sum_{i=1}^{I} \left( \hat{h}^{(i)} - \bar{\hat{h}} \right)^2

where

- \hat{h}^{(i)} is the MCMC estimate based on the ith chain, and
- \bar{\hat{h}} is the average of the \hat{h}^{(i)} (the overall estimate).
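A small R sketch of this estimate, again assuming the array p and the constant MC from the implementation above (with only MC = 2 chains the estimate is of course crude, and the burn-in should ideally be discarded first):

    # Monte Carlo error of the posterior-mean estimate of mu, based on MC parallel chains
    h_i   <- rowMeans(p[1, , ])                             # per-chain estimates hat(h)^(i)
    h_bar <- mean(h_i)                                      # overall estimate
    mc_se <- sqrt(sum((h_i - h_bar)^2) / (MC * (MC - 1)))   # estimated Monte Carlo error
    mc_se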
