Hamiltonian Monte Carlo
Dr. Jarad Niemi, Iowa State University
September 12, 2017

Adapted from Radford Neal's "MCMC Using Hamiltonian Dynamics" in the Handbook of Markov Chain Monte Carlo (2011).

Hamiltonian system

Consider a body in a frictionless one-dimensional environment; let m be its mass, θ its position, and ω its momentum. The body has potential energy U(θ) (which is proportional to its height) and kinetic energy K(ω) = ω²/(2m).

Hamilton's equations

Extending this to d dimensions, we have a position vector θ and a momentum vector ω. The Hamiltonian H(θ, ω) describes the time evolution of the system through

  dθ_i/dt = ∂H/∂ω_i
  dω_i/dt = −∂H/∂θ_i

for i = 1, …, d.

Potential and kinetic energy

For Hamiltonian Monte Carlo, we usually use Hamiltonian functions that can be written as

  H(θ, ω) = U(θ) + K(ω)

where U(θ) is called the potential energy and will be defined to be minus the log probability density of the distribution for θ (plus any constant that is convenient), and K(ω) is called the kinetic energy and is usually defined as

  K(ω) = ωᵀ M⁻¹ ω / 2

where M is a symmetric, positive-definite "mass matrix", which is typically diagonal and often a scalar multiple of the identity matrix. This form of K(ω) corresponds to minus the log probability density (plus a constant) of the zero-mean Gaussian distribution with covariance matrix M. The resulting Hamilton's equations are

  dθ_i/dt = [M⁻¹ ω]_i,   dω_i/dt = −∂U/∂θ_i.

One-dimensional example

Suppose

  H(θ, ω) = U(θ) + K(ω),   U(θ) = θ²/2,   K(ω) = ω²/2.

The dynamics resulting from this Hamiltonian are

  dθ/dt = ω,   dω/dt = −θ,

with solutions of the form

  θ(t) = r cos(a + t),   ω(t) = −r sin(a + t)

for some constants r and a.

[Figure: one-dimensional example simulation]

Reversibility

Hamiltonian dynamics is reversible, i.e. the mapping T_s from the state at time t, (θ(t), ω(t)), to the state at time t + s, (θ(t + s), ω(t + s)), is one-to-one, and hence has an inverse, T_{−s}. Under our usual assumptions for HMC, the inverse mapping can be obtained by negating ω, applying T_s, and then negating ω again. The reversibility of Hamiltonian dynamics is important for showing convergence of HMC.

Conservation of the Hamiltonian

The dynamics conserve the Hamiltonian, since

  dH/dt = Σ_{i=1}^d [ (dθ_i/dt)(∂H/∂θ_i) + (dω_i/dt)(∂H/∂ω_i) ]
        = Σ_{i=1}^d [ (∂H/∂ω_i)(∂H/∂θ_i) − (∂H/∂θ_i)(∂H/∂ω_i) ] = 0.

If H is conserved, then the acceptance probability based on Hamiltonian dynamics is 1. In practice, we can only make H approximately invariant.

[Figure: conservation of the Hamiltonian]

Volume preservation

If we apply the mapping T_s to the points in some region R of (θ, ω) space with volume V, the image of R under T_s will also have volume V. This feature simplifies calculation of the acceptance probability for Metropolis updates.

Euler's method

For simplicity, assume

  H(θ, ω) = U(θ) + K(ω),   K(ω) = Σ_{i=1}^d ω_i² / (2 m_i).

One way to simulate Hamiltonian dynamics is to discretize time into increments of e:

  ω_i(t + e) = ω_i(t) + e (dω_i/dt)(t) = ω_i(t) − e ∂U/∂θ_i(θ(t))
  θ_i(t + e) = θ_i(t) + e (dθ_i/dt)(t) = θ_i(t) + e ω_i(t)/m_i

Leapfrog method

An improved approach is the leapfrog method, which has the following updates:

  ω_i(t + e/2) = ω_i(t) − (e/2) ∂U/∂θ_i(θ(t))
  θ_i(t + e)   = θ_i(t) + e ω_i(t + e/2)/m_i
  ω_i(t + e)   = ω_i(t + e/2) − (e/2) ∂U/∂θ_i(θ(t + e))

The leapfrog method is reversible and preserves volume exactly.

Leap-frog simulator

leap_frog = function(U, grad_U, e, L, theta, omega) {
  # Half step for momentum
  omega = omega - e/2 * grad_U(theta)
  for (l in 1:L) {
    # Full step for position; full step for momentum except at the last step
    theta = theta + e * omega
    if (l < L) omega = omega - e * grad_U(theta)
  }
  # Final half step for momentum
  omega = omega - e/2 * grad_U(theta)
  return(list(theta = theta, omega = omega))
}

[Figures: leap-frog simulation; conservation of the Hamiltonian under the leapfrog discretization]

Probability distributions

The Hamiltonian is an energy function for the joint state of "position" θ and "momentum" ω, and so defines a joint distribution for them via

  P(θ, ω) = exp(−H(θ, ω)) / Z

where Z is the normalizing constant. If H(θ, ω) = U(θ) + K(ω), the joint density is

  P(θ, ω) = exp(−U(θ)) exp(−K(ω)) / Z.

If we are interested in a posterior distribution, we set

  U(θ) = −log [ p(y|θ) p(θ) ].

Hamiltonian Monte Carlo algorithm

Set the tuning parameters:

  L: the number of steps
  e: the stepsize
  D: the covariance matrix for ω (often diagonal, D = diag(d_1, …, d_d))

Let θ^(i) be the current value of the parameter θ. The leap-frog Hamiltonian Monte Carlo algorithm is:

1. Sample ω ~ N_d(0, D).
2. Simulate Hamiltonian dynamics starting at position θ^(i) and momentum ω via the leapfrog method (or any reversible method that preserves volume) for L steps with stepsize e, and negate the resulting momentum. Call these updated values θ* and ω*.
3. Set θ^(i+1) = θ* with probability min{1, ρ(θ^(i), θ*)}, where

  ρ(θ^(i), θ*) = [p(θ*|y) p(ω*)] / [p(θ^(i)|y) p(ω^(i))]
               = [p(y|θ*) p(θ*) N_d(ω*; 0, D)] / [p(y|θ^(i)) p(θ^(i)) N_d(ω^(i); 0, D)];

otherwise set θ^(i+1) = θ^(i).

Reversibility

Reversibility of the leapfrog means that if you simulate from (θ, ω) to (θ*, ω*) for some stepsize e and number of steps L, then if you simulate from (θ*, ω*) for the same e and L, you will end up back at (θ, ω). If we use q to denote our simulation "density", reversibility means

  q(θ*, ω* | θ, ω) = q(θ, ω | θ*, ω*),

and thus in the Metropolis-Hastings calculation the proposal is symmetric. In order to ensure reversibility of our proposal, we negate the momentum after we complete the leap-frog simulation. So long as p(ω) = p(−ω), which is true for a multivariate normal centered at 0, this will not affect our acceptance probability.

Conservation of the Hamiltonian results in perfect acceptance

The Hamiltonian is conserved if H(θ, ω) = H(θ*, ω*), which implies

  p(θ*|y) p(ω*) = exp(−H(θ*, ω*)) = exp(−H(θ, ω)) = p(θ|y) p(ω)
and thus the Metropolis-Hastings acceptance probability is

  ρ(θ^(i), θ*) = [p(θ*|y) p(ω*)] / [p(θ^(i)|y) p(ω^(i))] = 1.

This will only be the case if the simulation is perfect! But we have discretization error, and the acceptance probability accounts for this error.

HMC_neal = function(U, grad_U, e, L, current_theta) {
  theta = current_theta
  omega = rnorm(length(theta), 0, 1)
  current_omega = omega
  # Leapfrog integration
  omega = omega - e * grad_U(theta) / 2
  for (i in 1:L) {
    theta = theta + e * omega
    if (i != L) omega = omega - e * grad_U(theta)
  }
  omega = omega - e * grad_U(theta) / 2
  # Negate momentum to make the proposal reversible
  omega = -omega
  # Metropolis accept/reject based on the change in H
  current_U  = U(current_theta)
  current_K  = sum(current_omega^2) / 2
  proposed_U = U(theta)
  proposed_K = sum(omega^2) / 2
  if (runif(1) < exp(current_U - proposed_U + current_K - proposed_K)) {
    return(theta)
  } else {
    return(current_theta)
  }
}

HMC = function(n_reps, log_density, grad_log_density, tuning, initial) {
  theta = rep(0, n_reps)
  theta[1] = initial$theta
  for (i in 2:n_reps)
    theta[i] = HMC_neal(U = function(x) -log_density(x),
                        grad_U = function(x) -grad_log_density(x),
                        e = tuning$e, L = tuning$L,
                        theta[i - 1])
  theta
}

theta = HMC(1e4, function(x) -x^2/2, function(x) -x,
            list(e = 1, L = 1), list(theta = 0))
hist(theta, freq = F, 100)
curve(dnorm, add = TRUE, col = 'red', lwd = 2)

[Figure: histogram of the 10,000 draws with the N(0, 1) density overlaid]

Tuning parameters

There are three tuning parameters:

  e: step size
  L: number of steps
  D: covariance matrix for the momentum

Let Σ = V(θ|y); then an optimal normal distribution for ω is N(0, Σ⁻¹).
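The leapfrog method's approximate conservation of H and its exact reversibility can be checked numerically. The following is a small sketch in Python (rather than the slides' R) using the one-dimensional example U(θ) = θ²/2; the function `leap_frog` mirrors the R simulator of the same name, and the constants and tolerances are illustrative choices.

```python
def grad_U(theta):
    # Gradient of U(theta) = theta^2 / 2 (standard normal target)
    return theta

def leap_frog(grad_U, e, L, theta, omega):
    # Leapfrog: half momentum step, L position steps, final half momentum step
    omega = omega - e / 2 * grad_U(theta)
    for l in range(L):
        theta = theta + e * omega
        if l < L - 1:
            omega = omega - e * grad_U(theta)
    omega = omega - e / 2 * grad_U(theta)
    return theta, omega

def H(theta, omega):
    # Hamiltonian for the one-dimensional example
    return theta**2 / 2 + omega**2 / 2

theta0, omega0 = 1.0, 0.5
theta1, omega1 = leap_frog(grad_U, e=0.1, L=20, theta=theta0, omega=omega0)

# H is only approximately conserved: the discretization error is O(e^2)
assert abs(H(theta1, omega1) - H(theta0, omega0)) < 1e-2

# Reversibility: negate momentum, integrate again, and we return to the start
theta2, omega2 = leap_frog(grad_U, e=0.1, L=20, theta=theta1, omega=-omega1)
assert abs(theta2 - theta0) < 1e-8 and abs(-omega2 - omega0) < 1e-8
```

The second check is exactly the negate-integrate-negate inverse mapping described in the reversibility slides, up to floating-point roundoff.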
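The full sampler can likewise be sanity-checked on the standard normal target. Below is a hedged Python sketch mirroring the slides' R function HMC_neal (the name `hmc_step`, the seed, and the loose tolerances on the sample moments are choices made here, not part of the slides); it uses e = 1, L = 1 as in the histogram example.

```python
import numpy as np

rng = np.random.default_rng(42)

def U(theta):
    # Minus log density of N(0, 1), up to an additive constant
    return theta**2 / 2

def grad_U(theta):
    return theta

def hmc_step(U, grad_U, e, L, current_theta):
    theta = current_theta
    omega = rng.normal()          # sample momentum from N(0, 1)
    current_omega = omega
    # Leapfrog integration
    omega = omega - e / 2 * grad_U(theta)
    for i in range(L):
        theta = theta + e * omega
        if i < L - 1:
            omega = omega - e * grad_U(theta)
    omega = omega - e / 2 * grad_U(theta)
    omega = -omega                # negate momentum for reversibility
    # Metropolis accept/reject based on the change in H
    current_H  = U(current_theta) + current_omega**2 / 2
    proposed_H = U(theta) + omega**2 / 2
    if rng.uniform() < np.exp(current_H - proposed_H):
        return theta
    return current_theta

n_reps = 10000
draws = np.empty(n_reps)
draws[0] = 0.0
for i in range(1, n_reps):
    draws[i] = hmc_step(U, grad_U, e=1.0, L=1, current_theta=draws[i - 1])

# The draws should look like N(0, 1) samples (loose Monte Carlo tolerances)
assert abs(draws.mean()) < 0.1
assert abs(draws.var() - 1.0) < 0.2
```

Because H is only approximately conserved under the leapfrog discretization, some proposals are rejected; the Metropolis correction is exactly what makes the chain target N(0, 1) despite that error.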