Markov Chain Monte Carlo
Todd Ebert
Outline
1 Introduction
2 Markov-Chains
3 Hastings-Metropolis Algorithm
4 Simulated Annealing
Sampling an Unknown Distribution
The Problem
We need to sample a random variable X, but its distribution π(x) is unknown. Only the relative values π(x)/π(y) are known, for each x, y ∈ dom(X), and |dom(X)| is too large to enumerate.
The Solution
Develop a method for randomly traversing the elements of dom(X). The elements of dom(X) are called states. Moving from one state to the next is called a state transition, and is governed by a probability distribution p(y|x) that gives the probability of transitioning to y on condition that the current state is x. For each x ∈ dom(X), the fraction of visits that are to state x converges to π(x).
Markov-Chains Models

Markov-Chain State Transition Model
Given a finite set of states {1, ..., n}, a Markov-chain state-transition model is an n × n matrix P, where entry P_ij is the probability of transitioning from state i to state j.

Markov-chain Example
States: {1 = no rain, 2 = rain}.
State Transition: moving from one day to the next.
State-Transition matrix:

    P = | 0.8  0.2 |
        | 0.5  0.5 |
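The two-state weather chain can be simulated directly; the following is a minimal Python sketch (the function and variable names are illustrative, not from the slides), with state 1 and state 2 coded as indices 0 and 1:

```python
import random

# Weather model from the slides: index 0 = no rain, index 1 = rain.
P = [[0.8, 0.2],
     [0.5, 0.5]]

def step(state, rng=random):
    """Sample the next state (0-indexed) given the current state."""
    return 0 if rng.random() < P[state][0] else 1

# Simulate ten days of weather starting from "no rain".
state = 0
history = [state]
for _ in range(10):
    state = step(state)
    history.append(state)
```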
Markov-Chains Can Use Both Past and Present

Markov-chain Example
States: {(no rain, no rain), (no rain, rain), (rain, no rain), (rain, rain)}.
State Interpretation. For example, (no rain, rain) means "no rain yesterday, but rain today".

State-Transition Matrix P

              (nr,nr)  (nr,r)  (r,nr)  (r,r)
    (nr,nr)    0.85     0.15    0       0
    (nr,r)     0        0       0.6     0.4
    (r,nr)     0.65     0.35    0       0
    (r,r)      0        0       0.7     0.3
Predicting Further into the Future

t-Step Transition Matrix P^t
The t-step transition matrix P^t is defined so that P^t_ij represents the probability of being in state j t steps after being in state i.

Proposition 1
P^t = P · P^(t−1). In other words, the t-step transition matrix is obtained by multiplying the one-step matrix with the (t − 1)-step matrix.
Proof of Proposition 1: Basis Step (t = 2)
Let S_i, i = 0, 1, 2, ..., be the current state at time i. Then for t = 2,

    P^2_ij = p(S_2 = j | S_0 = i)
           = Σ_{k=1}^n p(S_2 = j | S_1 = k, S_0 = i) p(S_1 = k | S_0 = i)
           = Σ_{k=1}^n p(S_2 = j | S_1 = k) p(S_1 = k | S_0 = i)
           = Σ_{k=1}^n P_ik P_kj,

which is obtained by taking the inner product of row i of P with column j of P. Thus, P^2 = P · P.

Proof of Proposition 1: Inductive Step
Now assume the result holds for some t ≥ 2. We show that it is also true for t + 1.

    P^(t+1)_ij = p(S_{t+1} = j | S_0 = i)
               = Σ_{k=1}^n p(S_{t+1} = j | S_t = k, S_0 = i) p(S_t = k | S_0 = i)
               = Σ_{k=1}^n p(S_{t+1} = j | S_t = k) p(S_t = k | S_0 = i)
               = Σ_{k=1}^n P^t_ik P_kj,

which is obtained by taking the inner product of row i of P^t with column j of P. Thus, P^(t+1) = P^t · P = P · P^t (since powers of P commute), and the proposition is proved by induction on t.
t-Step Transition Weather Example

    P^2 = | 0.8  0.2 | | 0.8  0.2 |  =  | 0.74  0.26 |
          | 0.5  0.5 | | 0.5  0.5 |     | 0.65  0.35 |

    P^4 = | 0.74  0.26 | | 0.74  0.26 |  =  | 0.7166  0.2834 |
          | 0.65  0.35 | | 0.65  0.35 |     | 0.7085  0.2915 |

Interpretation of P^4
If it is not raining today, then there is a 71.66% chance of no rain in 4 days.
If it is raining today, then there is a 29.15% chance of rain in 4 days.
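These powers of P can be checked by squaring the matrix twice; a short pure-Python sketch (the helper name is ours):

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0.8, 0.2],
     [0.5, 0.5]]

P2 = matmul(P, P)     # two-day transition probabilities, ~[[0.74, 0.26], [0.65, 0.35]]
P4 = matmul(P2, P2)   # four-day transition probabilities, ~[[0.7166, 0.2834], [0.7085, 0.2915]]
```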
Ergodic Markov-Chains

Some Important Properties of a Markov-Chain Model
Irreducible. For each pair of states i, j, there is positive probability of reaching j from i in a finite number of transitions.
Period. The period of state i is defined as τ_i = gcd{n > 0 : P^n_ii > 0}.
Aperiodic. Each state has period equal to 1.
Positive recurrent. For any state i, the expected time to return to i (upon leaving i) is finite.
Ergodic. A Markov chain is ergodic iff it is aperiodic and positive recurrent.
Fundamental Theorem of Markov Chains

Stationary Distributions
Given a Markov-chain matrix P, a probability distribution π over S is called stationary (or an equilibrium distribution) provided that π = π · P, where π is viewed as a 1 × n vector/matrix. Thus, if the chain starts in distribution π, the probability of being in state i, i = 1, ..., n, after t steps is independent of t and is given by π(i).

Fundamental Theorem of Irreducible, Ergodic Markov Chains
Let P be the transition matrix for a finite, irreducible, ergodic Markov chain. Then associated with P is a unique stationary distribution π for which

    π_i = lim_{t→∞} P^t_ji = 1/s_i,

for all j = 1, 2, ..., n, where s_i is the expected number of steps it takes for a random walk beginning at state i to return to state i. Moreover, π satisfies the equation π = π · P. Conversely, if an irreducible, ergodic Markov chain's transition matrix satisfies such an equation for some distribution π, then π is the chain's stationary distribution.
Stationary Distribution for the Weather Example

Let π = (x, y). Then

    (x, y) | 0.8  0.2 | = (x, y)  ⇒  x = 0.8x + 0.5y  ⇒  x = 5y/2,
           | 0.5  0.5 |

by equating the first components of both sides. Since x + y = 1, this yields

    5y/2 + y = 7y/2 = 1  ⇒  y = 2/7 and x = 5/7.

Stationary Distribution Interpretation
Regardless of today's weather, the probability that it will not be raining exactly one year from today (or on some other day in the distant future) is approximately 5/7.
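The same answer can be obtained numerically by repeatedly applying π ← π · P, which by the Fundamental Theorem converges from any starting distribution; a short sketch:

```python
P = [[0.8, 0.2],
     [0.5, 0.5]]

# Repeatedly apply pi <- pi * P starting from an arbitrary distribution.
pi = [1.0, 0.0]
for _ in range(100):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]
# pi converges to (5/7, 2/7) regardless of the starting distribution.
```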
Markov Chain Master Equation

Master Equation Corollary for P with Stationary Distribution π
For every i ∈ S,

    π(i) Σ_{j∈S} P_ij = Σ_{j∈S} π(j) P_ji.

Proof of Master Equation
Simply note that the left side is the i-th component of π (since each row of P sums to 1), while the right side is the i-th component of π · P.

Interpretation: Conservation of State
For each i ∈ S, the Master Equation states that the probability of the event that the system transitions into state i equals the probability of the event that the system transitions out of state i.
Markov Chain Detailed-Balance Equation

Detailed-Balance Equation Corollary for Markov Chain P
If for every i, j ∈ S, π(i)P_ij = π(j)P_ji, then π is the stationary distribution for P.

Proof of Detailed-Balance Corollary
By fixing i and summing both sides over j, one obtains the Master Equation. Hence, the Master-Equation Corollary implies that π is the stationary distribution.

Interpretation: Conservation of Inter-State Transitions
For each i, j ∈ S, the probability that the system is in state i and then transitions to state j equals the probability that the system is in state j and then transitions to state i.
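For the weather chain, detailed balance can be verified directly: π(1)P12 = (5/7)(0.2) = 1/7 = (2/7)(0.5) = π(2)P21. In code:

```python
P = [[0.8, 0.2],
     [0.5, 0.5]]
pi = [5 / 7, 2 / 7]

lhs = pi[0] * P[0][1]   # pi(1) * P12 = (5/7)(0.2) = 1/7
rhs = pi[1] * P[1][0]   # pi(2) * P21 = (2/7)(0.5) = 1/7
# lhs == rhs (up to rounding), so the chain is reversible with respect to pi.
```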
Estimating λ for Distribution π
1 Select initial state X0.
2 For k sufficiently large, use P to transition to states X1, ..., Xk, so that P(Xk = i) ≈ π(i).
3 Perform n additional transitions and define

    λ̂ = (1/n)(X_{k+1} + ··· + X_{k+n}).
Estimating σ² for Distribution π
1 Select initial state X0.
2 For k sufficiently large, use P to transition to states X1, ..., Xk, so that P(Xk = i) ≈ π(i).
3 Choose r sufficiently large so that correlation(X_i, X_{i+r}) ≈ 0.
4 Let

    Y_i = (1/r)(X_{k+(i−1)r+1} + ··· + X_{k+ir}),   i = 1, ..., n.

Then

    σ̂² = (1/(n−1)) [ Σ_{i=1}^n Y_i² − n λ̂² ].
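Both estimation procedures can be sketched together; the burn-in length k, batch size r, and all function names below are illustrative choices, not prescribed by the slides, and the weather chain serves as the test distribution:

```python
import random

def weather_step(x, rng):
    """One transition of the weather chain (1 = no rain, 2 = rain)."""
    if x == 1:
        return 1 if rng.random() < 0.8 else 2
    return 1 if rng.random() < 0.5 else 2

def mcmc_estimates(step, x0, k=1000, n=200, r=20, rng=random):
    """Estimate the mean and variance following the slides' two procedures."""
    x = x0
    for _ in range(k):                  # burn-in: P(X_k = i) ~ pi(i)
        x = step(x, rng)
    ys = []
    for _ in range(n):                  # batch means Y_1, ..., Y_n of r samples
        total = 0.0
        for _ in range(r):
            x = step(x, rng)
            total += x
        ys.append(total / r)
    lam = sum(ys) / n                   # estimate of the mean lambda
    var = (sum(y * y for y in ys) - n * lam * lam) / (n - 1)
    return lam, var

lam, var = mcmc_estimates(weather_step, 1, rng=random.Random(0))
# The true mean is 1*(5/7) + 2*(2/7) = 9/7, so lam should land near 1.286.
```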
Hastings-Metropolis Algorithm

Eating the π that You Desire
Given Markov chain P, desired state distribution π, and i, j ∈ S, define α(i, j) and α(j, i) so that

    π(i) α(i, j) P_ij = π(j) α(j, i) P_ji,

where, e.g., α(i, j) = 1 in the case that π(i)P_ij ≤ π(j)P_ji.
Hastings-Metropolis Algorithm
When in state i, apply transition matrix P to generate candidate next-state j.
Generate a random real number U from the interval [0, 1].
If U ≤ α(i, j), transition to next state j. Otherwise, remain in state i.
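One transition of this scheme can be sketched in Python; `hm_step` is an illustrative name, and α(i, j) is computed on the fly from the detailed-balance condition as min{1, π(j)P_ji / (π(i)P_ij)}:

```python
import random

def hm_step(i, P, pi, rng=random):
    """One Hastings-Metropolis transition from state i (0-indexed).

    P is the proposal matrix (a list of rows) and pi the desired
    distribution; only the ratio pi[j]/pi[i] is used, so pi may be
    unnormalized.
    """
    # Draw candidate j from row i of P.
    u, j, acc = rng.random(), 0, P[i][0]
    while u > acc:
        j += 1
        acc += P[i][j]
    # Acceptance probability alpha(i, j) from the detailed-balance condition.
    alpha = min(1.0, (pi[j] * P[j][i]) / (pi[i] * P[i][j]))
    return j if rng.random() <= alpha else i

# Example: symmetric proposal over two states, target pi = (0.9, 0.1).
rng = random.Random(0)
x = 0
for _ in range(1000):
    x = hm_step(x, [[0.5, 0.5], [0.5, 0.5]], [0.9, 0.1], rng)
```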
Hastings-Metropolis Example

    P = | 0.4   0.2  0.4  |
        | 0.25  0.5  0.25 |
        | 0.8   0.2  0    |

Desired Distribution: π = (1/3, 1/3, 1/3).

    (1/3)α(1, 2)(0.2) = (1/3)α(2, 1)(0.25) ⇒ α(1, 2) = 1, α(2, 1) = 4/5
    (1/3)α(1, 3)(0.4) = (1/3)α(3, 1)(0.8) ⇒ α(1, 3) = 1, α(3, 1) = 1/2
    (1/3)α(2, 3)(0.25) = (1/3)α(3, 2)(0.2) ⇒ α(2, 3) = 4/5, α(3, 2) = 1

Final α Matrix

    α = | 1    1  1   |
        | 4/5  1  4/5 |
        | 1/2  1  1   |

Example Uses of P and α
Current state: i = 3. Next-state candidate: j = 1. Generate random real: U = 0.631. U > α(3, 1) ⇒ next state remains i = 3.
Current state: i = 2. Next-state candidate: j = 3. Generate random real: U = 0.417. U ≤ α(2, 3) ⇒ next state is 3.
Using HM to Sample a Uniform Distribution
X = {x1, ..., xn} ⊂ Y, where |Y| is very large and n is unknown.
The desired distribution is π(xi) = 1/|X|, for all i = 1, ..., n.
For each x ∈ X, N(x) ⊆ X denotes the neighborhood of x. For each y ∈ N(x), define P(y|x) = 1/|N(x)|. Then α(x, y) = min{|N(x)|/|N(y)|, 1}.
Interpretation: if x has fewer neighbors than y, then there is more of a tendency to remain at x, in order to compensate for the lower likelihood of transitioning to x from another state.
Sampling a Permutation with a Given Property
X denotes the set of permutations σ over 1, ..., 20 for which

    Σ_{i=1}^{20} i · σ(i) ≥ 1900.

For each σ, τ ∈ N(σ) iff τ can be obtained from σ by swapping two elements of σ. For example, if

    σ = 5, 8, 9, 10, 1, 3, 4, 15, 20, 2, 19, 6, 18, 7, 17, 11, 16, 12, 13, 14,

then

    τ = 5, 14, 9, 10, 1, 3, 4, 15, 20, 2, 19, 6, 18, 7, 17, 11, 16, 12, 13, 8

is a member of N(σ).
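A hedged sketch of the sampler for this example: proposals draw a uniformly random valid swap, and α(σ, τ) = min{1, |N(σ)|/|N(τ)|} as above. The O(1) score update (a − b)(σ(b) − σ(a)) for swapping positions a and b is our own shortcut, not from the slides:

```python
import random

N_ITEMS, THRESHOLD = 20, 1900

def score(sigma):
    # Positions are 1-based in the slides: sum of i * sigma(i).
    return sum((i + 1) * v for i, v in enumerate(sigma))

def valid_neighbors(sigma):
    """All swaps (a, b) whose result stays in X (score >= THRESHOLD)."""
    s = score(sigma)
    nbrs = []
    for a in range(N_ITEMS):
        for b in range(a + 1, N_ITEMS):
            # Swapping positions a and b changes the score by
            # (a - b) * (sigma[b] - sigma[a]); the 1-based offsets cancel.
            if s + (a - b) * (sigma[b] - sigma[a]) >= THRESHOLD:
                nbrs.append((a, b))
    return nbrs

def hm_step(sigma, rng):
    nbrs = valid_neighbors(sigma)
    a, b = rng.choice(nbrs)            # propose tau uniformly from N(sigma)
    tau = list(sigma)
    tau[a], tau[b] = tau[b], tau[a]
    # alpha(sigma, tau) = min(1, |N(sigma)| / |N(tau)|)
    if rng.random() <= len(nbrs) / len(valid_neighbors(tau)):
        return tau
    return sigma

rng = random.Random(0)
sigma = list(range(1, N_ITEMS + 1))    # identity permutation: score 2870 >= 1900
for _ in range(200):
    sigma = hm_step(sigma, rng)
```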
The General Optimization Problem

Optimization Problem
Given state space X and function H(x), find an element x* ∈ X for which H(x*) ≤ H(x), for all x ∈ X.
Applying the HM Algorithm to Optimization
For each x ∈ X, assume that the neighborhood N(x) has a constant size K > 0, so that P(y|x) = 1/K, for all x, y ∈ X. Thus, P is symmetric. Define the stationary probability π(x) in terms of H(x), where π(x) decreases as H(x) increases.

A Distribution Inspired from Physics

    π(x) = exp(−H(x)/T) / Σ_{y∈X} exp(−H(y)/T),

where T > 0 is a temperature parameter.
Heating and Cooling Metals to Minimize Energy

The Annealing Process
Annealing is a heating method for creating metals with desirable physical properties.
Metal is heated to a temperature below its melting point, but high enough so that the crystalline lattice structures within the metal break apart.
The crystalline structures re-form and grow larger the more slowly the metal is cooled.
These structures correspond with a low-energy state.
Crystalline Structures in Metals

Figure: Polonium Metal Crystals
Simulated Annealing Stationary Distribution

Notes on Using the Distribution
Using the HM Algorithm, if H(x) < H(y), then

    α(x, y) = exp((H(x) − H(y))/T).

The temperature parameter T is successively lowered according to a cooling schedule.
Higher temperatures produce more uniform-looking distributions. True since, if H(x) < H(y), then π(y)/π(x) = exp((H(x) − H(y))/T) → 1 as T → ∞.
Lower temperatures produce distributions more concentrated about low-H(x) states. True since, if H(x) < H(y), then π(y)/π(x) = exp((H(x) − H(y))/T) → 0 as T → 0.
Simulated Annealing Algorithm
Generate an initial state s0. Initialize T0 = ∞.
While a solution state has not been found at step k ≥ 0:
    Use P to generate a next state y from current state x.
    If y is a solution state (in case H(y) is known to be minimum), then return y.
    If H(y) ≤ H(x), then transition to next state y.
    Otherwise:
        Generate random U ∈ [0, 1].
        If U ≤ exp((H(x) − H(y))/Tk), then transition to next state y.
        Otherwise remain in state x.
    Tk+1 = cooling_function(k + 1).
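The loop above can be sketched generically; the finite initial temperature, the toy objective H(x) = (x − 7)², and all names here are illustrative choices, not from the slides:

```python
import math
import random

def simulated_annealing(h, neighbor, x0, t0=10.0, cooling=0.95, steps=500,
                        rng=random):
    """Generic simulated-annealing sketch following the slides' loop."""
    x, t = x0, t0
    for _ in range(steps):
        y = neighbor(x, rng)
        if h(y) <= h(x):
            x = y                          # downhill moves are always taken
        elif rng.random() <= math.exp((h(x) - h(y)) / t):
            x = y                          # uphill move accepted w.p. alpha
        t *= cooling                       # exponential cooling schedule
    return x

# Toy problem: minimize H(x) = (x - 7)^2 over the integers,
# with neighbors x - 1 and x + 1.
best = simulated_annealing(lambda x: (x - 7) ** 2,
                           lambda x, rng: x + rng.choice((-1, 1)),
                           x0=50, rng=random.Random(0))
```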
Simulated Annealing Transition Examples

Example 1
T = 0.1; H(x) = 0.4, H(y) = 0.6.
α(x, y) = exp((H(x) − H(y))/T) = exp(−2) ≈ 0.135.
U = 0.28 > 0.135 ⇒ next state remains x.

Example 2
T = 2.0; H(x) = 3.0, H(y) = 3.5.
α(x, y) = exp((H(x) − H(y))/T) = exp(−0.25) ≈ 0.778.
U = 0.63 ≤ 0.778 ⇒ next state is y.
Cooling Techniques

Geman and Geman's Theorem
A necessary and sufficient condition for having probability one of ending in a global optimum is that the temperature decrease more slowly than

    T = a / (b + log(t)),

with a and b being problem-dependent constants, and t the number of steps.
Popular Cooling Schedules
Linear Cooling: T = a − bt, where a is the initial temperature, and b is usually chosen within the range [0.01, 0.2].
Exponential Cooling: T = a · b^t, where a is the initial temperature, and b is usually chosen within the range [0.8, 0.99].
Simulated Annealing for Finding Problem Solutions
Each x ∈ X represents a possible solution to some problem. H(x) measures how close x is to being a solution, with H(x) = 0 implying that x is a solution.
Example: Finding a Hamilton Path
Figure: Hamilton Path (green edges) for the Petersen Graph
Defining the State Space
G = (V, E) is the graph under consideration, where V = {1, 2, ..., n}. X denotes the set of all permutations of {1, 2, ..., n}.
For x ∈ X, xi denotes the ith number of permutation x, i = 1, 2, ..., n. Permutation y ∈ N(x) iff y can be obtained from x by swapping two numbers of permutation x. Hence, K = n(n − 1)/2. H(x) is defined as the number of pairs (xi, xi+1) for which (xi, xi+1) ∉ E.
Calculating H(x) when G is the Petersen Graph: x = 6, 2, 3, 9, 7, 1, 4, 8, 10, 5. Pairs (6, 2), (3, 9), (7, 1), (1, 4), (4, 8) are not edges of G. Therefore H(x) = 5.
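The cost function H for this example is easy to code. A sketch, assuming the common labeling of the Petersen graph (outer 5-cycle on 1-5, inner pentagram on 6-10, spokes i to i + 5); the figure's labeling may differ, but this one reproduces the calculation above:

```python
# Petersen graph: outer 5-cycle, inner pentagram, and spokes.
outer = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
inner = [(6, 8), (8, 10), (10, 7), (7, 9), (9, 6)]
spokes = [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
E = {frozenset(e) for e in outer + inner + spokes}

def H(x):
    """Number of consecutive pairs (x_i, x_{i+1}) that are not edges."""
    return sum(1 for a, b in zip(x, x[1:]) if frozenset((a, b)) not in E)

x = [6, 2, 3, 9, 7, 1, 4, 8, 10, 5]
# Non-edges among consecutive pairs: (6,2), (3,9), (7,1), (1,4), (4,8),
# so H(x) = 5, matching the calculation above.
```

A permutation with H(x) = 0, such as 1, 2, 3, 4, 5, 10, 7, 9, 6, 8 under this labeling, is a Hamilton path.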
Enhancements to Simulated Annealing
Escaping Local Minima and Speeding up Convergence
Tabu States. Tabu states are recently-visited states that are avoided in the near future so as to promote more variation in the search path. For example, a tabu number of five would prevent returning to a state that had been visited in the past five steps.
Simulated Tempering. Rather than allowing the temperature to continually decrease, simulated tempering treats the temperature as a state space that can be navigated via a Markov-chain model. This allows for gradual temperature fluctuations, which can assist in escaping local minima.
Swapping. Instead of a single state-graph path, r such paths are generated in parallel, with the path temperatures uniformly distributed. State transitions of paths alternate with swapping the states of two paths. This subjects states to different temperatures, which can help escape local minima.
Hybrid Tree Search. The root of the search tree r ∈ S is chosen randomly. Given visited state x, its children are states y ∈ N(x) for which H(y) < H(x). If no such y exists, then x is a leaf. Otherwise, with probability px randomly choose a child to visit next, and with probability 1 − px backtrack to the parent of x, where px increases as H(x) decreases.
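The tabu-state rule, for example, amounts to a fixed-length queue of recently visited states. A sketch with tabu number five, matching the example above (`TabuList` is a hypothetical helper, not from the slides):

```python
from collections import deque

class TabuList:
    """Remembers the last k visited states; proposals to revisit them are rejected."""
    def __init__(self, k=5):
        self.recent = deque(maxlen=k)  # oldest entries fall off automatically

    def visit(self, state):
        self.recent.append(state)

    def allowed(self, state):
        return state not in self.recent
```

In the annealing loop, a proposed neighbor y would be skipped whenever `allowed(y)` is false, and each accepted state passed to `visit`.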
Assessment of Simulated Annealing
Advantages
Based on a well-developed mathematical theory.
Local state transitions allow one to zoom in on increasingly improved neighboring states.
Disadvantages
Local state transitions mean that one cannot immediately "jump" to more promising regions of the state space.
There is a propensity under low temperatures to become trapped in sub-optimal regions due to the lack of neighbors that offer improvement to the objective function.
Finding the Tallest Peak in the Grand Canyon
Figure: Challenging Search Problem: Find the Tallest Peak!