
Model Counting for Logical Theories
Wednesday

Dmitry Chistikov
Department of Computer Science, University of Oxford, UK

Rayna Dimitrova
Max Planck Institute for Software Systems (MPI-SWS), Kaiserslautern and Saarbrücken, Germany

ESSLLI 2016

SAT: satisfiability; SMT: satisfiability modulo theories
#SAT: model counting; #SMT: model counting modulo theories

Agenda

Tuesday computational complexity, probability theory

Wednesday randomized algorithms, Monte Carlo methods

Thursday hashing-based approach to model counting

Friday from discrete to continuous model counting

Outline

1. Randomized algorithms: complexity classes RP and BPP

2. Monte Carlo methods

3. Markov chain Monte Carlo

Decision problems and algorithms

Decision problem: L ⊆ {0, 1}∗ (encodings of yes-instances)

Algorithm for L: says “yes” on every x ∈ L, “no” on every x ∈ {0, 1}∗ \ L

Complexity classes: brief summary

P: polynomial time (efficiently solvable)

NP: nondeterministic polynomial time (with efficiently verifiable solutions)

#P: counting polynomial time

Examples of randomized algorithms

- Primality testing
- Polynomial identity testing
- Undirected reachability
- Volume estimation

Randomized algorithms: our model

The algorithm can toss fair coins:

1. Syntactically, a randomized algorithm is an algorithm that has access to a source of randomness, but acts deterministically if the random input is fixed.
2. It has on-demand access to arbitrarily many independent random variables that have Bernoulli(1/2) distribution.
3. Each request takes 1 computational step.

Deterministic and randomized time complexity

Recall deterministic time complexity:

- running time of algorithm A on input x;
- take the maximum (worst case) over all inputs of length n.

Randomized time complexity:

- take the maximum (worst case) over all possible sequences of random bits;
- then take the maximum (worst case) over all inputs of length n.

Complexity classes P, RP, and BPP

P: class of languages L for which there exists a deterministic polynomial-time algorithm A such that
x ∈ L =⇒ A(x) accepts
x ∉ L =⇒ A(x) rejects

RP: class of languages L for which there exists a randomized polynomial-time algorithm A such that
x ∈ L =⇒ P[ A(x) accepts ] ≥ 1/2
x ∉ L =⇒ P[ A(x) accepts ] = 0

BPP: class of languages L for which there exists a randomized polynomial-time algorithm A such that
x ∈ L =⇒ P[ A(x) accepts ] ≥ 3/4
x ∉ L =⇒ P[ A(x) accepts ] ≤ 1/4

Intuition:

P: deterministic polynomial time

RP: randomized polynomial time with one-sided error

BPP: randomized polynomial time with bounded two-sided error


Definition of RP via certificates

Recall: L ∈ NP ⇐⇒ there exist a polynomial p(n) and a polynomial-time algorithm V(x, y) such that the following holds:

x ∈ L =⇒ there exists a y ∈ {0, 1}^p(|x|) such that V(x, y) = YES
x ∉ L =⇒ there is no such y ∈ {0, 1}^p(|x|)

L ∈ RP ⇐⇒ there exist a polynomial p(n) and a polynomial-time algorithm V(x, y) such that the following holds:

x ∈ L =⇒ for at least a 1/2-fraction of all y ∈ {0, 1}^p(|x|) it holds that V(x, y) = YES
x ∉ L =⇒ there is no such y ∈ {0, 1}^p(|x|)

Hence, RP ⊆ NP.

Definition of BPP via certificates

L ∈ RP ⇐⇒ there exist a polynomial p(n) and a polynomial-time algorithm V(x, y) such that the following holds:

x ∈ L =⇒ for at least a 1/2-fraction of all y ∈ {0, 1}^p(|x|) it holds that V(x, y) = YES
x ∉ L =⇒ there is no such y ∈ {0, 1}^p(|x|)

L ∈ BPP ⇐⇒ there exist a polynomial p(n) and a polynomial-time algorithm V(x, y) such that the following holds:

x ∈ L =⇒ for at least a 3/4-fraction of all y ∈ {0, 1}^p(|x|) it holds that V(x, y) = YES
x ∉ L =⇒ for at most a 1/4-fraction of all y ∈ {0, 1}^p(|x|) it holds that V(x, y) = YES

Error reduction (confidence amplification) for RP

RP: class of languages L for which there exists a randomized polynomial-time algorithm A such that

x ∈ L =⇒ P[ A(x) accepts ] ≥ 1/2
x ∉ L =⇒ P[ A(x) accepts ] = 0

Error reduction:

- Can replace 1/2 above with 1 − 2^(−n^d) for any d ≥ 1.
- Can replace 1/2 above with 1/n^d for any d ≥ 1.
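Not on the original slides, but the standard mechanism behind this error reduction is independent repetition; here is a minimal Python sketch, where `toy_A` is a made-up stand-in for an RP-style algorithm:

import random

def amplify_rp(A, x, t):
    # A(x) never accepts when x is a no-instance and accepts with
    # probability >= 1/2 when x is a yes-instance. Accepting iff at
    # least one of t independent runs accepts keeps the error
    # one-sided and drives it down to (1/2)**t on yes-instances.
    return any(A(x) for _ in range(t))

# Hypothetical base algorithm, used only for this demo.
def toy_A(x):
    return x == "yes" and random.random() < 0.5

print(amplify_rp(toy_A, "yes", t=40))  # True except with prob. 2**(-40)
print(amplify_rp(toy_A, "no", t=40))   # always False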

Error reduction (confidence amplification) for BPP

BPP: class of languages L for which there exists a randomized polynomial-time algorithm A such that

x ∈ L =⇒ P[ A(x) accepts ] ≥ 1 − 1/4
x ∉ L =⇒ P[ A(x) accepts ] ≤ 1/4

Error reduction:

- Can replace 1/4 above with 2^(−n^d) for any d ≥ 1.
- Can replace 1/4 above with 1/2 − 1/n^d for any d ≥ 1.
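For two-sided error, repetition is combined with a majority vote; again a minimal sketch under the same assumptions as the RP example above:

def amplify_bpp(A, x, t):
    # Majority vote over t independent runs of a BPP-style algorithm
    # whose error probability is at most 1/4 on every input. By a
    # Chernoff bound, the majority is wrong with probability 2^(-Ω(t)).
    accepts = sum(1 for _ in range(t) if A(x))
    return 2 * accepts > t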


Complexity classes: summary

P: deterministic polynomial time

RP: randomized polynomial time with one-sided error

BPP: randomized polynomial time with bounded two-sided error

NP: nondeterministic polynomial time (with efficiently verifiable solutions)

#P: counting polynomial time

Outline

1. Randomized algorithms: complexity classes RP and BPP

2. Monte Carlo methods

3. Markov chain Monte Carlo

Model counting and probability

Recall: The model count of a formula ϕ(x1, . . . , xk) is mc(ϕ) = µ(⟦ϕ⟧).

A logical theory T is measured if every ⟦ϕ⟧ is measurable.

Suppose ⟦ϕ⟧ ⊆ D^k, and assume µ(D) < ∞. Let F be the σ-algebra of measurable subsets of D^k. Then P : F → [0, 1] given by

P(A) = µ(A) / µ(D^k)

is a probability measure on D^k. Then mc(ϕ) = P(⟦ϕ⟧) · µ(D^k).

Model counting via estimation of probability

The equality mc(ϕ) = P(⟦ϕ⟧) · µ(D^k) reduces model counting to computing the probability of an event.

Probability estimation

Suppose we are given (implicitly) a probability space (Ω, F, P).

Our goal is to estimate the value of P(A).

Probability estimation via sampling

In order to estimate P(A), we can observe multiple independent copies of the indicator variable 1_A, where A ∈ F. This corresponds to random sampling and checking whether the event A has occurred.

Equivalently, we observe multiple independent random variables that have Bernoulli distribution with parameter θ = P(A). Our goal is to estimate θ given the observations.

The mean as the estimate

Given the values of X1, . . . , Xn in {0, 1}, the unknown parameter θ should be close to

X̄ = (X1 + . . . + Xn) / n.

Why?

E X̄ = E[(X1 + . . . + Xn) / n] = (E X1 + . . . + E Xn) / n = (θ + . . . + θ) / n = θ

By Chebyshev's inequality, only rarely does X̄ take values away from E X̄ = θ.

Is that sufficient?

Approximation

We want to estimate a certain quantity f.

Suppose our estimate is f̃ = f̃(ε), where ε is an input parameter.

Additive error: |f̃(ε) − f| ≤ ε

Multiplicative error: |f̃(ε) − f| ≤ ε · f

Randomized approximation

We want to estimate a certain quantity f.

Suppose our estimate is f̃ = f̃(ε, α), where ε, α are input parameters.

Additive error:

P[ |f̃(ε, α) − f| ≤ ε ] ≥ 1 − α

Multiplicative error:

P[ |f̃(ε, α) − f| ≤ ε · f ] ≥ 1 − α

We want to find efficient randomized approximation schemes.

Estimates from the Chebyshev bound

Chebyshev inequality:

P[ |X − E X| ≥ t ] ≤ Var X / t²

Applied to X̄, with Var X̄ = θ(1 − θ)/n ≤ 1/(4n), this gives P[ |X̄ − θ| ≥ ε ] ≤ 1/(4nε²) ≤ α, so

n ≥ 1/(4αε²) samples are sufficient for additive error ε and confidence parameter α.

Estimates from the Chernoff bound

Chernoff bound:

P[ |X̄ − E X̄| ≥ t ] ≤ 2 exp(−nt²/4)

if X1, . . . , Xn are independent and identically distributed in [0, 1].

n ≥ (4/ε²) · ln(2/α) samples are sufficient for additive error ε and confidence parameter α.
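A minimal Python sketch of the resulting estimator (illustrative only; `sample_event` and the demo event are hypothetical stand-ins):

import math
import random

def estimate_probability(sample_event, eps, alpha):
    # Chernoff-based sample size: 2*exp(-n*eps^2/4) <= alpha
    # as soon as n >= (4/eps^2) * ln(2/alpha).
    n = math.ceil(4 / eps**2 * math.log(2 / alpha))
    hits = sum(1 for _ in range(n) if sample_event())
    return hits / n

# Demo event: a uniform point in [0,1]^2 lands in the quarter disc,
# so theta = pi/4.
theta_hat = estimate_probability(
    lambda: random.random() ** 2 + random.random() ** 2 <= 1.0,
    eps=0.01, alpha=0.05)
print(theta_hat, "vs", math.pi / 4)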

Conclusion: Model counting via Monte Carlo

Algorithm:

1. Given a formula ϕ(x1, . . . , xk), sample uniformly from possible models.
2. Return the proportion of actual models times µ(D^k).
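In the discrete case (D finite, µ the counting measure) this recipe is a few lines of Python; a sketch with a hypothetical formula over {0, 1}^k given as a predicate:

import random

def monte_carlo_model_count(phi, k, m):
    # Sample m assignments uniformly from {0,1}^k and scale the hit
    # rate by mu(D^k) = 2^k, following mc(phi) = P([[phi]]) * mu(D^k).
    hits = sum(1 for _ in range(m)
               if phi([random.randint(0, 1) for _ in range(k)]))
    return hits / m * 2 ** k

# Made-up formula x1 ∨ x2 over 3 variables; true count is 6.
print(monte_carlo_model_count(lambda x: x[0] or x[1], k=3, m=100_000))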

Outline

1. Randomized algorithms: complexity classes RP and BPP

2. Monte Carlo methods

3. Markov chain Monte Carlo

The Knapsack Problem revisited

Consider the counting version of the Knapsack problem.

Given a1, . . . , an ∈ N and b ∈ N, compute the number N of vectors (x1, . . . , xn) ∈ {0, 1}^n that satisfy a1x1 + . . . + anxn ≤ b.

This problem is #P-complete.

A Monte Carlo algorithm for Knapsack

C = 0
For k = 1, . . . , m:
  1. Sample (x1, . . . , xn) uniformly at random from {0, 1}^n.
  2. If a1x1 + . . . + anxn ≤ b, then C = C + 1.
Return Y = (C/m) · 2^n

Y is the random variable corresponding to the output, and E(Y) = N.

We can make m sufficiently large to obtain a reliable approximation with any desired accuracy.

What's the problem?

Let Zk be the random variable such that Zk = 1 if a1x1 + . . . + anxn ≤ b at the k-th iteration and Zk = 0 otherwise. Z1, Z2, . . . are independent Bernoulli random variables with parameter p = N/2^n. Let Z be the number of failures before the first success. Z has geometric distribution with parameter p. Thus,

E Z = (1 − p)/p = 1/p − 1 = 2^n/N − 1.

If N is sub-exponential (e.g., polynomial in n), we need an exponential number of steps before the first success!
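A direct transcription of this naive estimator in Python (a sketch; the instance below is made up), which makes the failure mode easy to observe experimentally:

import random

def naive_knapsack_count(a, b, m):
    # Estimate N = |{x in {0,1}^n : a.x <= b}| as (C/m) * 2^n.
    n = len(a)
    C = sum(1 for _ in range(m)
            if sum(ai * random.randint(0, 1) for ai in a) <= b)
    return C / m * 2 ** n

# Toy instance. For large n with small b, hits become so rare that the
# estimate is typically 0 unless m grows exponentially in n.
print(naive_knapsack_count([3, 5, 7, 9], b=10, m=100_000))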

An alternative?

Sample from the uniform distribution over the set of solutions

Ωknapsack = {(x1, . . . , xn) ∈ {0, 1}^n : a1x1 + . . . + anxn ≤ b}.

How? Use a Markov chain.

Markov Chains: Definition

A finite Markov chain is a pair M = (Ω, T) with

- a finite set of states Ω,
- a transition probability matrix T, where
  T_{s,s′} = P(next state will be s′ | the current state is s).

Markov chain for the Knapsack problem

A Markov chain Mknapsack = (Ωknapsack, Tknapsack) for Knapsack:

- Ωknapsack = {(x1, . . . , xn) ∈ {0, 1}^n : a1x1 + . . . + anxn ≤ b},
- the transition probability matrix Tknapsack is given by the following rules for transitioning from s = (x1, . . . , xn) to s′:
  1. With probability 1/2, set s′ = s.
  2. Otherwise, select i uniformly at random from {1, . . . , n} and let s̃ = (x1, . . . , xi−1, 1 − xi, xi+1, . . . , xn).
  3. If a1s̃1 + . . . + ans̃n ≤ b, then set s′ = s̃; otherwise s′ = s.

Property: The transition probability matrix of Mknapsack is symmetric:

T_{u,v} = (1/2) · (1/n) = T_{v,u} for v ≠ u.
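One step of this chain in Python (a sketch; the weights a, bound b, and the list encoding of states are illustrative):

import random

def knapsack_chain_step(s, a, b):
    # Lazy step: with probability 1/2, stay put.
    if random.random() < 0.5:
        return s
    # Otherwise flip one uniformly chosen coordinate and keep the flip
    # only if the resulting vector is still a solution of a.x <= b.
    i = random.randrange(len(s))
    t = list(s)
    t[i] = 1 - t[i]
    return t if sum(ai * xi for ai, xi in zip(a, t)) <= b else s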

Simulating a Markov chain

Consider a Markov chain M = (Ω, T).

Let D : Ω → R be a probability distribution on Ω.

Interpreting D as an element of R^|Ω|, we have that D′ = DT is also a probability distribution over Ω (since Σ_{s′∈Ω} T_{s,s′} = 1).

Suppose we start in some state s0 and simulate the Markov chain. Let Xt be the random variable such that

Xt = s iff the current state at step t is s.

For the corresponding sequence of distributions D[0], D[1], D[2], . . . over Ω we have that

- D[0](s0) = 1 and D[0](s) = 0 for all s ≠ s0, and
- D[i + 1] = D[i]T for all i ≥ 0.
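The recurrence D[i + 1] = D[i]T is directly computable; a small numpy sketch (the two-state chain here is made up for illustration):

import numpy as np

def evolve(D0, T, steps):
    # Row vector times transition matrix, iterated: D[i+1] = D[i] T.
    D = np.asarray(D0, dtype=float)
    for _ in range(steps):
        D = D @ T
    return D

T = np.array([[0.5, 0.5],
              [0.25, 0.75]])
print(evolve([1.0, 0.0], T, steps=50))  # approaches (1/3, 2/3)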

Markov Chains: Stationary distribution

Stationary distribution of a Markov chain M = (Ω, T) is a probability distribution D : Ω → R such that DT = D, that is,

Σ_{u∈Ω} Du · T_{u,v} = Dv for all v ∈ Ω.

Does every finite Markov chain have a stationary distribution?

Yes. Every stochastic matrix has an eigenvalue of 1 with a left eigenvector whose entries are nonnegative. (This is not true in general for Markov chains with infinite state space.)

Does every finite Markov chain have a unique stationary distribution?

No.

Example: Take M = (Ω, T) with Ω = {u, v} and T_{u,u} = 1 and T_{v,v} = 1. Every probability distribution D on Ω is a stationary distribution.

Stationary distribution for the Knapsack problem

The uniform distribution U over Ωknapsack is a stationary distribution for Mknapsack = (Ωknapsack,Tknapsack).

We have to show that Σ_{s∈Ωknapsack} Us · T_{s,v} = Uv for all v.

Consider v ∈ Ωknapsack and suppose v has d "neighbours" different from v. Recall that T_{s,v} = T_{v,s} = 1/(2n) for s ≠ v. We have

Σ_s Us · T_{s,v} = Uv · T_{v,v} + Σ_{s≠v} Us · T_{s,v}
= (1/N) · (1/2 + (1/2)(1 − d/n)) + d · (1/N) · (1/(2n))
= 1/N = Uv

Irreducible Markov chains

A path in M = (Ω, T) is a sequence of states s0, s1, . . . , sl such that T_{si,si+1} > 0 for 0 ≤ i < l. We say that sl is reachable from s0.

In the Markov chain Mknapsack:

- if we start from (0, . . . , 0) we can reach any state,
- if we start from any state we can reach (0, . . . , 0).

Thus, from any state of Mknapsack we can reach any state. Such Markov chains are called irreducible.

Recall the example with no unique stationary distribution.

What about convergence?

If a Markov chain has a unique stationary distribution D, does it always converge to D, starting from any distribution? That is, for any D[0], does the sequence D[0],D[1],D[2],..., where D[i + 1] = D[i]T , converge to D? Not necessarily.

Example: Take M = (Ω, T) with Ω = {u, v} and T_{u,v} = 1 and T_{v,u} = 1. The only stationary distribution is D(u) = D(v) = 1/2. Now, take D[0] such that D[0](u) = 1 and D[0](v) = 0.

The reason the Markov chain does not converge is that it periodically alternates between the states u and v.

Aperiodic Markov chains

A state s of a Markov chain is called periodic if there exists k ∈ N, k > 1, such that for every path s = s0, . . . , sl = s, k divides l. A state which is not periodic is called aperiodic. A Markov chain is called aperiodic if all of its states are aperiodic.

Is Mknapsack aperiodic?

Yes. For every state s ∈ Ωknapsack we have T_{s,s} > 0. Thus there exists a path from s to s of arbitrary length, and thus every state is aperiodic.

Fundamental Theorem of Markov chains

Theorem. Every finite Markov chain M which is irreducible has a unique stationary distribution D, and if M is aperiodic, then it also holds that lim_{i→∞} D[i] = D.

This means that the Markov chain Mknapsack converges to the uniform distribution over Ωknapsack.

Using Mknapsack

Almost uniform sampling for Knapsack using Mknapsack:

1. Start in state s = (0, . . . , 0).
2. Simulate Mknapsack for sufficiently many steps, until the distribution over states is "close" to the uniform distribution over Ωknapsack.
3. Return the current state.

Repeating this, we can obtain a sequence of independent samples. Uniform sampling from Ωknapsack can be used to obtain a randomized approximation algorithm for counting the number of solutions to the Knapsack problem.

How many steps is sufficiently many?

It is not known whether Mknapsack converges to the uniform distribution in a polynomial number of steps. (|Ωknapsack| may be exponential in n.)
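Putting the pieces together, an almost-uniform sampler is just a long simulation of the chain; a sketch reusing the `knapsack_chain_step` function from above (the walk length t is a made-up parameter precisely because no polynomial mixing-time bound is known here):

def sample_knapsack_solution(a, b, t):
    # Start from the all-zero vector (a solution, since b >= 0) and
    # take t steps of the knapsack chain; for large t the distribution
    # of the returned state is close to uniform over the solution set.
    s = [0] * len(a)
    for _ in range(t):
        s = knapsack_chain_step(s, a, b)
    return s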

Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a technique for sampling from a complicated distribution using local information.

The main challenge is to obtain good bounds on the number of steps a Markov chain takes to converge to the desired distribution.

MCMC may provide efficient (i.e., polynomial time) solution techniques.

Computing the volume of a convex body

Given a convex body K ⊆ R^n, compute its volume Vol(K).

The computational effort required increases as n increases.

[Dyer and Frieze’88] Computing the volume exactly is #P-hard.

[Dyer, Frieze and Kannan’91] Polynomial randomized approximation algorithm via Markov chain Monte Carlo.

Input to the algorithm

K is given as a membership oracle.

Two n-dimensional balls B0 and Br of non-zero radius with B0 ⊆ K ⊆ Br.

By simple transformations of K it can be ensured that B0 is the unit ball and that Br has radius cn log n for some constant c.

Note: The volume of the smallest ball containing K might be exponential in Vol(K), hence naive Monte Carlo is hopeless.

From volume computation to uniform sampling

Construct a sequence of concentric balls B0 ⊆ B1 ⊆ ... ⊆ K ⊆ Br.

Vol(K) = [Vol(K ∩ Br) / Vol(K ∩ Br−1)] · [Vol(K ∩ Br−1) / Vol(K ∩ Br−2)] · . . . · [Vol(K ∩ B1) / Vol(K ∩ B0)] · Vol(K ∩ B0)

Vol(K ∩ B0) = Vol(B0) is known.

Estimate each ratio Vol(K ∩ Bi) / Vol(K ∩ Bi−1): sample uniformly at random from K ∩ Bi using MCMC and count the proportion of samples falling into Bi−1.

To ensure that the number of samples needed is small, ensure that each ratio Vol(K ∩ Bi) / Vol(K ∩ Bi−1) is small by making the balls grow slowly. This implies r = cn log n for some constant c.
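The telescoping estimator in outline (a sketch; `sample_uniform`, which stands for the MCMC sampler over K ∩ Bi, and `in_ball`, a membership test for Bi, are assumed to be given):

def estimate_volume(vol_B0, r, sample_uniform, in_ball, m):
    # Vol(K) = Vol(K ∩ B0) * prod_i Vol(K ∩ Bi) / Vol(K ∩ Bi-1).
    vol = vol_B0
    for i in range(1, r + 1):
        # hits/m estimates Vol(K ∩ Bi-1) / Vol(K ∩ Bi), the reciprocal
        # of the factor we need, so multiply by m/hits.
        hits = sum(1 for _ in range(m)
                   if in_ball(sample_uniform(i), i - 1))
        vol *= m / max(hits, 1)  # max() merely guards the sketch
    return vol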

Time complexity

The original algorithm has time complexity O(n^23).

Later it was improved to O(n^4).

Key ingredient: sample uniformly at random from the points in a convex body in polynomial time. For this, the Markov chain has to converge in polynomial time to the uniform distribution.

The random walk on cubes

1. Divide the space into n-dimensional (hyper)cubes of side δ. Choose δ so as to provide a good approximation of K, while permitting the random walk to converge to the stationary distribution in reasonable time.
2. Perform a random walk as follows: if C is the cube at time t, select uniformly at random an orthogonally adjacent cube C′. If C′ is in K, then move to C′, otherwise stay at C.

Properties:

- The uniform distribution is the unique stationary distribution.
- Rapid mixing: the Markov chain converges to the stationary distribution in a number of steps polynomial in n.
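A sketch of one move of this grid walk (the membership oracle `in_K` and the integer-index encoding of cubes are assumptions of this illustration):

import random

def cube_walk_step(C, in_K, delta):
    # C is the integer index vector of the current cube; a move picks
    # one of the 2n orthogonally adjacent cubes uniformly at random.
    i = random.randrange(len(C))
    d = random.choice((-1, 1))
    C2 = list(C)
    C2[i] += d
    # Accept the move only if the neighbouring cube lies in K, here
    # tested at the cube's centre point.
    centre = [(c + 0.5) * delta for c in C2]
    return C2 if in_K(centre) else C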

A ball walk

Lovász and Simonovits proposed a walk with a continuous state space.

1. Pick δ ∈ R by the same criteria as before.
2. Perform a random walk as follows: if at time t the walk is at x ∈ R^n, the probability density function at time t + 1 is uniform over K ∩ B(x, δ) and 0 outside.

Properties:

- Rapid mixing argument similar to the walk on cubes.
- Saves a factor of n in the number of oracle calls.
- Moves are more complex, so no saving in overall time complexity.

Conclusion

Theorem. If we can sample almost uniformly at random from Ωknapsack in polynomial time, then there is a polynomial-time randomized approximation scheme for the Knapsack counting problem.

Theorem. There exists a polynomial-time randomized approximation scheme for the volume computation problem.

Summary of today's lecture

- Randomized algorithms: power of randomness in computation, complexity classes RP and BPP, error reduction
- Monte Carlo methods: estimating the probability of a random event, model counting via random sampling
- Markov chain Monte Carlo methods: Markov chains and random walks, sampling via MCMC, model counting via MCMC

Agenda

Tuesday computational complexity, probability theory

Wednesday randomized algorithms, Monte Carlo methods

Thursday hashing-based approach to model counting

Friday from discrete to continuous model counting
