
Markov Chain Monte Carlo

Todd Ebert

Outline

1 Introduction

2 Markov-Chains

3 Hastings-Metropolis Algorithm

4 Simulated Annealing

Sampling an Unknown Distribution

The Problem: We need to sample a random variable X, but its distribution π(x) is unknown. Only the relative values π(x)/π(y) are known, for each x, y ∈ dom(X), and |dom(X)| is too large to enumerate.


The Solution: Develop a method for randomly traversing the elements of dom(X). The elements of dom(X) are called states. Moving from one state to the next is called a state transition, and is governed by a distribution p(y|x) that gives the probability of transitioning to y given that the current state is x. For each x ∈ dom(X), the fraction of visits that are to state x converges to π(x).


Markov-Chains Models

Markov-Chain State Transition Model: Given a finite set of states {1, ..., n}, a Markov-chain state-transition model is an n × n matrix P, where entry $P_{ij}$ is the probability of transitioning from state i to state j.

Markov-chain Example
States: {1 = no rain, 2 = rain}.
State Transition: moving from one day to the next.
State-Transition matrix:
$$P = \begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix}$$

Markov-Chains Can Use Both Past and Present

Markov-chain Example
States: {(no rain, no rain), (no rain, rain), (rain, no rain), (rain, rain)}.
State Interpretation: for example, (no rain, rain) means "no rain yesterday, but rain today".

State-Transition Matrix P:
$$P = \begin{array}{c|cccc} & (nr,nr) & (nr,r) & (r,nr) & (r,r) \\ \hline (nr,nr) & 0.85 & 0.15 & 0 & 0 \\ (nr,r) & 0 & 0 & 0.6 & 0.4 \\ (r,nr) & 0.65 & 0.35 & 0 & 0 \\ (r,r) & 0 & 0 & 0.7 & 0.3 \end{array}$$

Predicting Further into the Future

t-Step Transition Matrix $P^t$: The t-step transition matrix $P^t$ is defined so that $P^t_{ij}$ represents the probability of being in state j exactly t steps after being in state i.

Proposition 1: $P^t = P \cdot P^{(t-1)}$. In other words, the t-step transition matrix is obtained by multiplying the one-step matrix with the (t − 1)-step matrix.

Proof of Proposition 1: Basis Step (t = 2)

Let $S_i$, $i = 0, 1, 2, \ldots$, be the state at time i. Then for t = 2,
$$P^2_{ij} = p(S_2 = j \mid S_0 = i) = \sum_{k=1}^n p(S_2 = j \mid S_1 = k, S_0 = i)\,p(S_1 = k \mid S_0 = i) = \sum_{k=1}^n p(S_2 = j \mid S_1 = k)\,p(S_1 = k \mid S_0 = i) = \sum_{k=1}^n P_{ik}P_{kj},$$
where the Markov property allows dropping the condition $S_0 = i$. The final sum is the inner product of row i of P with column j of P. Thus, $P^2 = P \cdot P$.

Proof of Proposition 1: Inductive Step

Now assume the result holds for some t ≥ 2. We show that it is also true for t + 1:
$$P^{(t+1)}_{ij} = p(S_{t+1} = j \mid S_0 = i) = \sum_{k=1}^n p(S_{t+1} = j \mid S_t = k, S_0 = i)\,p(S_t = k \mid S_0 = i) = \sum_{k=1}^n p(S_{t+1} = j \mid S_t = k)\,p(S_t = k \mid S_0 = i) = \sum_{k=1}^n P^t_{ik}P_{kj},$$
which is the inner product of row i of $P^t$ with column j of P. Thus, $P^{(t+1)} = P^t \cdot P = P \cdot P^t$ (powers of P commute), and the proposition is proved by induction on t.

t-Step Transition Weather Example

$$P^2 = \begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix}\begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix} = \begin{pmatrix} 0.74 & 0.26 \\ 0.65 & 0.35 \end{pmatrix}$$

$$P^4 = \begin{pmatrix} 0.74 & 0.26 \\ 0.65 & 0.35 \end{pmatrix}\begin{pmatrix} 0.74 & 0.26 \\ 0.65 & 0.35 \end{pmatrix} = \begin{pmatrix} 0.7166 & 0.2834 \\ 0.7085 & 0.2915 \end{pmatrix}$$

Interpretation of $P^4$: If it is not raining today, then there is a 71.66% chance of no rain in 4 days. If it is raining today, then there is a 29.15% chance of rain in 4 days.
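These powers are easy to check numerically. A minimal sketch (assuming NumPy is available; not part of the original slides):

```python
import numpy as np

# One-step weather transition matrix: state 1 = no rain, state 2 = rain.
P = np.array([[0.8, 0.2],
              [0.5, 0.5]])

P2 = P @ P     # two-step matrix, by Proposition 1
P4 = P2 @ P2   # four-step matrix

print(P2)                            # [[0.74 0.26] [0.65 0.35]]
print(P4)                            # [[0.7166 0.2834] [0.7085 0.2915]]
print(np.linalg.matrix_power(P, 4))  # same result in one call
```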

Ergodic Markov-Chains

Some Important Properties of a Markov-Chain Model
Irreducible: for each pair of states i, j, there is positive probability of reaching j from i in a finite number of transitions.
Period: the period of state i is defined as
$$\tau_i = \gcd\{n > 0 \mid P^n_{ii} > 0\}.$$
Aperiodic: each state has period equal to 1.
Positive recurrent: for each state i, the expected time to return to i (upon leaving i) is finite.
Ergodic: a Markov chain is ergodic iff it is aperiodic and positive recurrent.

Fundamental Theorem of Markov Chains

Stationary Distributions: Given a Markov-chain matrix P, a distribution π over S is called stationary (or an equilibrium distribution) provided that π = π · P, where π is viewed as a 1 × n vector/matrix. Thus, the probability of being in state i, i = 1, ..., n, after t steps is independent of t and is given by π(i).

Fundamental Theorem of Irreducible, Ergodic Markov Chains: Let P be the transition matrix for a finite, irreducible, ergodic Markov chain. Then associated with P is a unique stationary distribution π for which
$$\pi_i = \lim_{t \to \infty} P^t_{ji} = 1/s_i,$$
for all j = 1, 2, ..., n, where $s_i$ is the expected number of steps it takes for a chain beginning at state i to return to state i. Moreover, π satisfies the equation π = π · P. Conversely, if an irreducible, ergodic Markov chain's transition matrix satisfies such an equation for some distribution π, then π is the chain's stationary distribution.

Stationary Distribution for the Weather Example

Let π = (x, y). Then
$$(x, y)\begin{pmatrix} 0.8 & 0.2 \\ 0.5 & 0.5 \end{pmatrix} = (x, y) \;\Rightarrow\; x = 0.8x + 0.5y \;\Rightarrow\; x = 5y/2,$$
by equating the first components of both sides. Since x + y = 1, this yields
$$5y/2 + y = 7y/2 = 1 \;\Rightarrow\; y = 2/7 \text{ and } x = 5/7.$$

Stationary Distribution Interpretation: Regardless of today's weather, the probability that it will not be raining exactly one year from today (or on some other day in the distant future) is approximately 5/7.
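The hand computation can also be verified numerically. A sketch (assuming NumPy) that treats π = π · P as a left-eigenvector problem:

```python
import numpy as np

P = np.array([[0.8, 0.2],
              [0.5, 0.5]])

# pi = pi @ P means pi is a left eigenvector of P for eigenvalue 1,
# equivalently a right eigenvector of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
v = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real
pi = v / v.sum()   # normalize so the components sum to 1

print(pi)          # approximately [0.71428571, 0.28571429], i.e. (5/7, 2/7)
```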

Markov Chain Master Equation

Master Equation Corollary for P with Stationary Distribution π
For every i ∈ S,
$$\pi(i) \sum_{j \in S} P_{ij} = \sum_{j \in S} \pi(j) P_{ji}.$$

Proof of Master Equation: Simply note that the left side is the i-th component of π (since the row sums of P equal 1), while the right side is the i-th component of π · P.

Interpretation: Conservation of State. For each i ∈ S, the Master Equation states that the probability that the system transitions into state i equals the probability that the system transitions out of state i.

Markov Chain Detailed-Balance Equation

Detailed-Balance Equation Corollary for Markov Chain P
If for every i, j ∈ S,
$$\pi(i) P_{ij} = \pi(j) P_{ji},$$
then π is the stationary distribution for P.

Proof of Detailed-Balance Corollary: By fixing i and summing both sides over j, one obtains the Master Equation. Hence, the Master-Equation Corollary implies that π is the stationary distribution.

Interpretation: Conservation of Inter-State Transitions. For each i, j ∈ S, the probability that the system is in state i and then transitions to state j equals the probability that the system is in state j and then transitions to state i.

Estimating λ for Distribution π

1 Select initial state $X_0$.
2 For k sufficiently large, use P to transition to states $X_1, \ldots, X_k$, so that $P(X_k = i) \approx \pi(i)$.
3 Perform n additional transitions and define
$$\hat{\lambda} = \frac{1}{n}\left(X_{k+1} + \cdots + X_{k+n}\right).$$

Estimating σ² for Distribution π

1 Select initial state $X_0$.
2 For k sufficiently large, use P to transition to states $X_1, \ldots, X_k$, so that $P(X_k = i) \approx \pi(i)$.
3 Choose r sufficiently large so that $\text{correlation}(X_i, X_{i+r}) \approx 0$.
4 Let
$$Y_i = \frac{1}{r}\left(X_{k+(i-1)r+1} + \cdots + X_{k+ir}\right), \quad i = 1, \ldots, n.$$
Then
$$\hat{\sigma}^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} Y_i^2 - n\hat{\lambda}^2\right].$$


Hastings-Metropolis Algorithm

Eating the π that You Desire
Given Markov chain P, desired state distribution π, and i, j ∈ S, define α(i, j) and α(j, i) so that
$$\pi(i)\,\alpha(i, j)\,P_{ij} = \pi(j)\,\alpha(j, i)\,P_{ji},$$
where, e.g., α(i, j) = 1 in the case that $\pi(i)P_{ij} \le \pi(j)P_{ji}$.

The Hastings-Metropolis Algorithm
1 When in state i, apply transition matrix P to generate a candidate next state j.
2 Generate a random real number U from the interval [0, 1].
3 If U ≤ α(i, j), transition to next state j. Otherwise, remain in state i.
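A minimal sketch of one such transition in Python, assuming the chain is given by a stochastic matrix P and an acceptance matrix alpha (both hypothetical inputs, indexed from 0):

```python
import numpy as np

rng = np.random.default_rng()

def hm_step(i, P, alpha):
    """One Hastings-Metropolis transition out of state i."""
    j = rng.choice(len(P), p=P[i])   # candidate next state, drawn from row i of P
    U = rng.random()                 # uniform random real from [0, 1]
    return j if U <= alpha[i, j] else i
```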

Hastings-Metropolis Example

$$P = \begin{pmatrix} 0.4 & 0.2 & 0.4 \\ 0.25 & 0.5 & 0.25 \\ 0.8 & 0.2 & 0 \end{pmatrix}$$
Desired distribution: π = (1/3, 1/3, 1/3).
$$(1/3)\,\alpha(1,2)\,(0.2) = (1/3)\,\alpha(2,1)\,(0.25) \;\Rightarrow\; \alpha(1,2) = 1,\ \alpha(2,1) = 4/5$$
$$(1/3)\,\alpha(1,3)\,(0.4) = (1/3)\,\alpha(3,1)\,(0.8) \;\Rightarrow\; \alpha(1,3) = 1,\ \alpha(3,1) = 1/2$$
$$(1/3)\,\alpha(2,3)\,(0.25) = (1/3)\,\alpha(3,2)\,(0.2) \;\Rightarrow\; \alpha(2,3) = 4/5,\ \alpha(3,2) = 1$$

Final α Matrix
$$P = \begin{pmatrix} 0.4 & 0.2 & 0.4 \\ 0.25 & 0.5 & 0.25 \\ 0.8 & 0.2 & 0 \end{pmatrix} \qquad \alpha = \begin{pmatrix} 1 & 1 & 1 \\ 4/5 & 1 & 4/5 \\ 1/2 & 1 & 1 \end{pmatrix}$$

Example Uses of P and α
Current state: i = 3. Next-state candidate: j = 1. Generated random real: U = 0.631. U > α(3, 1) ⇒ next state remains i = 3.
Current state: i = 2. Next-state candidate: j = 3. Generated random real: U = 0.417. U ≤ α(2, 3) ⇒ next state is 3.
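As a sanity check (a sketch, not part of the slides), one can assemble the modified chain Q, whose off-diagonal entries are α(i, j)·P_ij and in which rejected proposals stay put, and verify that the uniform π is stationary for Q:

```python
import numpy as np

P = np.array([[0.40, 0.2, 0.40],
              [0.25, 0.5, 0.25],
              [0.80, 0.2, 0.00]])
alpha = np.array([[1.0, 1.0, 1.0],
                  [0.8, 1.0, 0.8],
                  [0.5, 1.0, 1.0]])

Q = P * alpha                              # accepted proposals
np.fill_diagonal(Q, 0.0)
np.fill_diagonal(Q, 1.0 - Q.sum(axis=1))   # rejections keep the chain in state i

pi = np.array([1/3, 1/3, 1/3])
print(Q.sum(axis=1))   # each row sums to 1, so Q is stochastic
print(pi @ Q)          # [1/3 1/3 1/3]: pi = pi . Q, as desired
```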

Using HM to Sample a Uniform Distribution

X = {x_1, ..., x_n} ⊊ Y, where |Y| is very large and n is unknown.
Desired distribution: π(x_i) = 1/|X| for all i = 1, ..., n.
For each x ∈ X, N(x) ⊆ X denotes the neighborhood of x. For each y ∈ N(x), define P(y|x) = 1/|N(x)|. Then
$$\alpha(x, y) = \min\{|N(x)|/|N(y)|,\; 1\}.$$
Interpretation: if x has fewer neighbors than y, then there is more of a tendency to remain at x, in order to compensate for the lower likelihood of transitioning to x from another state.

Sampling a Permutation with a Given Property
X denotes the set of permutations σ of 1, ..., 20 for which
$$\sum_{i=1}^{20} i \cdot \sigma(i) \ge 1900.$$
For each σ, τ ∈ N(σ) iff τ can be obtained from σ by swapping two elements. For example, if
σ = 5, 8, 9, 10, 1, 3, 4, 15, 20, 2, 19, 6, 18, 7, 17, 11, 16, 12, 13, 14,
then
τ = 5, 14, 9, 10, 1, 3, 4, 15, 20, 2, 19, 6, 18, 7, 17, 11, 16, 12, 13, 8
is a member of N(σ) (here the elements 8 and 14 were swapped).
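A sketch of the resulting sampler (the step count and seed are illustrative): it proposes a uniformly chosen swap-neighbor that stays in X, and accepts with probability min{|N(σ)|/|N(τ)|, 1}, so that the stationary distribution is uniform over X:

```python
import itertools, random

random.seed(1)
N_ELEMS, THRESHOLD = 20, 1900

def score(s):
    return sum((i + 1) * v for i, v in enumerate(s))   # sum of i * sigma(i), 1-based

def neighbors(s):
    """All swap-neighbors of s that remain in X (score >= THRESHOLD)."""
    out = []
    for a, b in itertools.combinations(range(N_ELEMS), 2):
        t = list(s)
        t[a], t[b] = t[b], t[a]
        if score(t) >= THRESHOLD:
            out.append(t)
    return out

# The identity permutation has score 1^2 + ... + 20^2 = 2870 >= 1900, so it is in X.
sigma = list(range(1, N_ELEMS + 1))
for _ in range(1000):
    Ns = neighbors(sigma)
    tau = random.choice(Ns)                 # proposal with P(tau | sigma) = 1/|N(sigma)|
    if random.random() <= min(len(Ns) / len(neighbors(tau)), 1.0):
        sigma = tau                         # accept; otherwise remain at sigma

print(sigma, score(sigma))
```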


The General Optimization Problem

Optimization Problem: Given a set X and an objective function H(x), find an element x* ∈ X for which H(x*) ≤ H(x) for all x ∈ X.

Applying the HM Algorithm to Optimization

For each x ∈ X, assume that the neighborhood N(x) has a constant size K > 0, so that P(y|x) = 1/K for all x, y ∈ X. Thus, P is symmetric.
Define the stationary probability π(x) in terms of H(x), where π(x) decreases as H(x) increases.

A Distribution Inspired from Physics
$$\pi(x) = \frac{\exp(-H(x)/T)}{\sum_{y \in X} \exp(-H(y)/T)},$$
where T > 0 is a temperature parameter.
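A small sketch with hypothetical energy values, showing how T shapes π: high T flattens the distribution toward uniform, while low T concentrates mass on the minimum-H state:

```python
import numpy as np

H = np.array([1.0, 2.0, 5.0])   # hypothetical energy values H(x)

def boltzmann(H, T):
    w = np.exp(-H / T)
    return w / w.sum()          # pi(x) = exp(-H(x)/T) / sum_y exp(-H(y)/T)

for T in (100.0, 1.0, 0.1):
    print(T, boltzmann(H, T))
# T = 100.0 : roughly uniform, about [0.34 0.34 0.33]
# T = 0.1   : essentially all mass on the H = 1.0 state
```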

Heating and Cooling Metals to Minimize Energy

The Annealing Process
Annealing is a heating method for creating metals with desirable physical properties.
The metal is heated to a temperature below its melting point, but high enough that the crystalline lattice structures within the metal break apart.
The crystalline structures re-form, and grow larger the more slowly the metal is cooled. These structures correspond to a low-energy state.

Crystalline Structures in Metals

Figure: Polonium Metal Crystals

Simulated Annealing Stationary Distribution

Notes on Using the Distribution
Using the HM Algorithm, if H(x) < H(y), then
$$\alpha(x, y) = \exp((H(x) - H(y))/T).$$
The temperature parameter T is successively lowered according to a cooling schedule.
Higher temperatures produce more uniform-looking distributions: if H(x) < H(y), then π(y)/π(x) = exp((H(x) − H(y))/T) → 1 as T → ∞.
Lower temperatures produce distributions concentrated about low-H(x) states: if H(x) < H(y), then π(y)/π(x) = exp((H(x) − H(y))/T) → 0 as T → 0.

Todd Ebert Markov Chain Monte Carlo Simulated Annealing Algorithm

Generate an Initial State s0 Initialize T0 = ∞

While a solution state has not been found at step k ≥ 0 Use P to generate a next state y from current state x. If y is a solution state (in case H(y) is known to be minimum), then return y. If H(y) ≤ H(x), then transition to next state y. Otherwise Generate random U ∈ [0, 1] If U ≤ exp((H(x) − H(y))/Tk ), then transition to next state y Otherwise remain in state x Tk+1 = cooling function(k + 1) Example 2 T = 2.0 H(x) = 3.0, H(y) = 3.5 α(x, y) = exp((H(x) − H(y))/T ) = exp(−0.25) = 0.778 U = 0.63 ≤ 0.778 ⇒ next state is y.

Simulated Annealing Transition Examples

Example 1: T = 0.1; H(x) = 0.4, H(y) = 0.6. α(x, y) = exp((H(x) − H(y))/T) = exp(−2) ≈ 0.135. U = 0.28 > 0.135 ⇒ next state remains x.

Example 2: T = 2.0; H(x) = 3.0, H(y) = 3.5. α(x, y) = exp((H(x) − H(y))/T) = exp(−0.25) ≈ 0.778. U = 0.63 ≤ 0.778 ⇒ next state is y.

Cooling Techniques

Geman and Geman's Theorem: A necessary and sufficient condition for having probability one of ending in a global optimum is that the temperature decrease more slowly than
$$T = \frac{a}{b + \log t},$$
where a and b are problem-dependent constants and t is the number of steps.

Popular Cooling Schedules
Linear Cooling: T = a − bt, where a is the initial temperature and b is usually chosen within the range [0.01, 0.2].
Exponential Cooling: T = a · b^t, where a is the initial temperature and b is usually chosen within the range [0.8, 0.99].
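Both schedules, together with the logarithmic schedule from Geman and Geman's theorem, as one-line functions (the default parameter values are illustrative picks from the quoted ranges):

```python
import math

def linear_cooling(k, a=10.0, b=0.05):       # b typically in [0.01, 0.2]
    return max(a - b * k, 1e-12)             # clamped; the slides do not address T <= 0

def exponential_cooling(k, a=10.0, b=0.95):  # b typically in [0.8, 0.99]
    return a * b**k

def logarithmic_cooling(k, a=10.0, b=1.0):   # Geman & Geman: T = a / (b + log t)
    return a / (b + math.log(k + 1))
```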

Simulated Annealing for Finding Problem Solutions

Each x ∈ X represents a possible solution to some problem. H(x) measures how close x is to being a solution, with H(x) = 0 implying that x is a solution.

Example: Finding a Hamilton Path

Figure: Hamilton Path (green edges) for the Petersen Graph

Defining the State Space
G = (V, E) is the graph under consideration, where V = {1, 2, ..., n}. X denotes the set of all permutations of {1, 2, ..., n}.
For x ∈ X, x_i denotes the i-th number of permutation x, i = 1, 2, ..., n.
Permutation y ∈ N(x) iff y can be obtained from x by swapping two elements of x. Hence, K = n(n − 1)/2.
H(x) is defined as the number of pairs (x_i, x_{i+1}) for which (x_i, x_{i+1}) ∉ E.

Calculating H(x) when G is the Petersen Graph
x = 6, 2, 3, 9, 7, 1, 4, 8, 10, 5.
The pairs (6, 2), (3, 9), (7, 1), (1, 4), (4, 8) are not edges of G. Therefore H(x) = 5.
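A sketch that reproduces this count. It assumes the standard labeling of the Petersen graph (outer cycle 1–5, spokes i to i + 5, inner pentagram on 6–10), which is consistent with the slide's list of non-edges:

```python
# Petersen graph, standard labeling: outer 5-cycle on 1..5, inner vertices 6..10.
outer  = {(i, i % 5 + 1) for i in range(1, 6)}         # (1,2), (2,3), (3,4), (4,5), (5,1)
spokes = {(i, i + 5) for i in range(1, 6)}             # (1,6), ..., (5,10)
inner  = {(6 + i, 6 + (i + 2) % 5) for i in range(5)}  # pentagram: (6,8), (7,9), ...

edges = set()
for a, b in outer | spokes | inner:
    edges.update({(a, b), (b, a)})                     # undirected: keep both orientations

def H(x):
    """Count the consecutive pairs (x_i, x_{i+1}) that are not edges of G."""
    return sum((x[i], x[i + 1]) not in edges for i in range(len(x) - 1))

x = [6, 2, 3, 9, 7, 1, 4, 8, 10, 5]
print(H(x))   # 5: the non-edges are (6,2), (3,9), (7,1), (1,4), (4,8)
```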

Enhancements to Simulated Annealing

Escaping Local Minima and Speeding up Convergence
Tabu States. Tabu states are recently visited states that are avoided in the near future, so as to promote more variation in the search path. For example, a tabu number of five would prevent returning to a state that had been visited in the past five steps (see the sketch after this list).
Simulated Tempering. Rather than allowing the temperature to continually decrease, simulated tempering treats the temperature as a state space that can be navigated via a Markov-chain model. This allows for gradual temperature fluctuations, which can assist in escaping local minima.
Swapping. Instead of a single state-graph path, r such paths are generated in parallel, where the path temperatures are uniformly distributed. State transitions of paths alternate with swapping the states of two paths. This allows states to be subjected to different temperatures, which can help escape local minima.
Hybrid Tree Search. The root of the search tree r ∈ S is chosen randomly. Given visited state x, its children are states y ∈ N(x) for which H(y) < H(x). If no such y exists, then x is a leaf. Otherwise, with probability p_x randomly choose a child to visit next, and with probability 1 − p_x backtrack to the parent of x, where p_x increases as H(x) decreases.
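As one concrete illustration (a sketch under assumptions, not from the slides), a tabu filter with a tabu number of five can be layered over any proposal function; the retry limit and give-up fallback are implementation choices:

```python
from collections import deque

def make_tabu_propose(propose, tabu_size=5, max_tries=50):
    """Wrap a proposal function so that recently visited states are avoided."""
    recent = deque(maxlen=tabu_size)   # the last `tabu_size` visited states

    def tabu_propose(x):
        recent.append(x)
        for _ in range(max_tries):
            y = propose(x)
            if y not in recent:        # skip candidates visited in the last few steps
                return y
        return y                       # fall back if every candidate was tabu

    return tabu_propose
```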

Assessment of Simulated Annealing

Advantages
Based on a well-developed mathematical theory.
Local state transitions allow one to zoom in on increasingly improved neighboring states.

Disadvantages
Local state transitions mean that one cannot immediately "jump" to more promising regions of the state space.
There is a propensity under low temperatures to become trapped in sub-optimal regions, due to the lack of neighbors that offer improvement to the objective function.

Finding the Tallest Peak in the Grand Canyon

Figure: Challenging Search Problem: Find the Tallest Peak!