
Advanced Acceptance/Rejection Methods for Monte Carlo

Mark Huber

Department of Mathematics and Institute of Statistics and Decision Sciences Duke University

March 14, 2006

Mark Huber (Duke University) Advanced Acceptance/Rejection Math Phys & Prob Seminar 1 / 51

Monte Carlo methods

The basic question: for a random variable X and a measurable set A, what is P(X ∈ A)?

Classical statistics: finding p-values
Bayesian statistics: learning about posterior distributions
Statistical physics: approximating a partition function
Approximation of #P-complete problems

Basic acceptance/rejection

Fixed number of trials A/R
Input: A, L(X) (the law of X), n
Output: p̂, an estimate of P(X ∈ A)

1) Let s ← 0
2) For i from 1 to n do
3)     Draw T from the distribution of X
4)     If T ∈ A, let s ← s + 1
5) Let p̂ ← s/n
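A minimal Python sketch of fixed number of trials A/R follows. It assumes the law L(X) is supplied as a sampling function `draw` and the event A as a membership test `in_a`; both names, and the example X ∼ Unif[0, 1] with A = [0, 0.3), are hypothetical choices made only for illustration.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def basic_ar(draw: Callable[[random.Random], T],
             in_a: Callable[[T], bool],
             n: int,
             rng: random.Random) -> float:
    """Fixed number of trials A/R: estimate P(X in A) from n draws of X."""
    s = 0                      # number of successes so far
    for _ in range(n):
        t = draw(rng)          # draw T from the distribution of X
        if in_a(t):            # success when T lands in A
            s += 1
    return s / n               # p_hat = s / n


# Hypothetical example: X ~ Unif[0, 1] and A = [0, 0.3), so P(X in A) = 0.3.
rng = random.Random(2006)
p_hat = basic_ar(lambda r: r.random(), lambda x: x < 0.3, 10_000, rng)
print(p_hat)
```

The number of successes is Binomial(n, p), which is why n must scale like 1/p to get a fixed relative error.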

Running time

Definition: Suppose problem instance I has a true solution S(I), and a randomized algorithm A returns the random variable A(I). Then A is a (1 + δ, ε) randomized approximation algorithm if

$$P\left(\frac{1}{1+\delta} \le \frac{S(I)}{A(I)} \le 1+\delta\right) \ge 1-\epsilon.$$

Theorem: Suppose p = P(X ∈ A). Then basic A/R is a (1 + δ, ε) randomized approximation algorithm for

$$n = \Theta\left(\frac{1}{p}\cdot\frac{1}{\delta^2}\ln(\epsilon^{-1})\right).$$

Random successes to random trials

Basic A/R has a fixed number of trials and a random number of successes:

(Figure: 10 trials, 3 of them successes; the estimate is 3/10.)

Better idea: a fixed number of successes and a random number of trials:

(Figure: trials run until the 4th success, which arrives on trial 11; the estimate is 4/11.)

Solving the 1/p problem

Recall: for G ∼ Geo(p), E[G] = 1/p. So the number of draws t needed to collect k successes has mean k/p, and k/t estimates p.

Fixed number of successes A/R (Dagum, Luby, Karp, Ross 2000)
Input: A, L(X), k
Output: p̂, an estimate of P(X ∈ A)

1) Let t ← 0
2) For i from 1 to k do
3)     Draw T from the distribution of X
4)     Let t ← t + 1
5)     If T ∉ A, goto line 3)
6) Let p̂ ← k/t
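A minimal Python sketch of fixed number of successes A/R, under the same hypothetical interface as before (a sampler `draw` for L(X) and a membership test `in_a` for A). The inner loop mirrors lines 3) to 5): keep drawing until the current draw lands in A.

```python
import random


def fixed_successes_ar(draw, in_a, k, rng):
    """Fixed number of successes A/R: draw until k draws have landed in A.

    Returns (p_hat, t) where t is the total number of draws used.
    """
    t = 0
    for _ in range(k):
        while True:
            t += 1                   # one more draw of T
            if in_a(draw(rng)):      # stop this round at the first success
                break
    return k / t, t                  # p_hat = k / t


# Hypothetical example: X ~ Unif[0, 1], A = [0, 0.3).
rng = random.Random(2000)
p_hat, t = fixed_successes_ar(lambda r: r.random(), lambda x: x < 0.3, 2_000, rng)
print(p_hat, t)
```

Here k does not depend on p, but the total work t still has mean k/p.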

No reliance on p

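Theorem: The fixed number of successes A/R is a (1 + δ, ε) randomized approximation algorithm when

$$k = \Theta\left(\frac{1}{\delta^2}\ln(\epsilon^{-1})\right).$$

Let R be the running time of this algorithm. Then

$$E[R] = \Theta\left(\frac{1}{p}\cdot\frac{1}{\delta^2}\ln(\epsilon^{-1})\right).$$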

Notes: (1) the algorithm is noninterruptible; (2) the 1/p factor in the expected running time is still bad.

Methods for improving on acceptance/rejection

1 Weighting draws with exponential shifts Chernoff bounds

2 Sequential Acceptance/Rejection Perfect matchings, the permanent, and Bregman’s Theorem

3 The Randomness Recycler The Ising model

Tails of sums of random variables

Consider the following problem: given iid random variables X1, X2, X3, ..., find upper and lower bounds for

$$p = P\left(\frac{X_1 + X_2 + \cdots + X_n}{n} \ge \alpha\right).$$

Chernoff bounds

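One approach is to use Chernoff bounds. For all t > 0,

$$\begin{aligned}
P\left(\frac{X_1+\cdots+X_n}{n} \ge \alpha\right)
&= P\left(t(X_1+\cdots+X_n) \ge tn\alpha\right) \\
&= P\left(e^{t(X_1+\cdots+X_n)} \ge e^{tn\alpha}\right) \\
&\le \frac{E\left[e^{t(X_1+\cdots+X_n)}\right]}{e^{tn\alpha}} \\
&= \left(\frac{E\left[e^{tX_1}\right]}{e^{t\alpha}}\right)^n,
\end{aligned}$$

where the inequality is Markov's inequality and the last step uses independence.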

Nice feature: captures exponential behavior in n
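To see how the choice of t matters, here is a sketch that minimizes the Chernoff bound numerically for the uniform example used on the following slides (X1 ∼ Unif[0, 1], n = 100, α = 0.65). It assumes E[e^{tX1}] = (e^t − 1)/t for the uniform, and uses a ternary search, which is valid because ln E[e^{tX1}] − tα is convex in t; the search interval [1e-6, 50] is an arbitrary choice.

```python
import math


def log_mgf_uniform(t: float) -> float:
    """ln E[exp(t U)] for U ~ Unif[0, 1] and t > 0, i.e. ln((e^t - 1) / t)."""
    return math.log(math.expm1(t) / t)


def chernoff_bound(n, alpha, log_mgf, t_lo=1e-6, t_hi=50.0, iters=200):
    """Minimize (E[e^{tX1}] / e^{t alpha})^n over t in [t_lo, t_hi].

    The exponent g(t) = log_mgf(t) - t * alpha is convex, so ternary
    search converges to the best t in the interval. Returns (bound, t_best).
    """
    def g(t):
        return log_mgf(t) - t * alpha

    lo, hi = t_lo, t_hi
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2        # minimum cannot lie in (m2, hi]
        else:
            lo = m1        # minimum cannot lie in [lo, m1)
    t_best = (lo + hi) / 2
    return math.exp(n * g(t_best)), t_best


bound, t_best = chernoff_bound(100, 0.65, log_mgf_uniform)
print(f"t = {t_best:.3f}, bound = {bound:.2e}")
```

For this example the optimizing t is the one whose exponentially weighted mean equals α, a fact that reappears in the running-time theorem later in the talk.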

Questions about Chernoff bounds

What value of t is best? How accurate are the bounds? Is there a way to use this upper bound in acceptance/rejection?

A bad approach

Naive tail sums

1) Draw X1, ..., Xn independently from the distribution of X1
2) Accept if (X1 + ··· + Xn)/n ≥ α

Problem: the chance of landing in the tail is exponentially small

Example: U1, ..., U100 ∼ Unif([0, 1]):

$$P\left(\frac{U_1 + \cdots + U_{100}}{100} \ge 0.65\right) \approx 7 \cdot 10^{-8}.$$
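The following sketch runs naive tail sums on the example above and, for comparison, computes the tail probability exactly from the Irwin–Hall distribution of a sum of uniforms, using exact integer arithmetic to avoid the cancellation in the alternating sum. The function names are hypothetical; with n = 100 the naive sampler will almost always return 0.

```python
import random
from fractions import Fraction
from math import comb, factorial


def naive_tail(n, alpha, trials, rng):
    """Naive A/R: fraction of trials with (U1 + ... + Un)/n >= alpha."""
    hits = 0
    for _ in range(trials):
        if sum(rng.random() for _ in range(n)) / n >= alpha:
            hits += 1
    return hits / trials


def irwin_hall_upper_tail(n, m):
    """Exact P(U1 + ... + Un >= m) for an integer 0 <= m <= n.

    By symmetry this equals P(S <= n - m), and the Irwin-Hall CDF at an
    integer x is sum_{k=0}^{x} (-1)^k C(n, k) (x - k)^n / n!.
    Python integers keep the alternating sum exact.
    """
    x = n - m
    total = sum((-1) ** k * comb(n, k) * (x - k) ** n for k in range(x + 1))
    return Fraction(total, factorial(n))


exact = float(irwin_hall_upper_tail(100, 65))
naive = naive_tail(100, 0.65, 10_000, random.Random(1))
print(exact, naive)   # the naive estimate is almost surely 0.0
```

Even 10,000 trials of 100 draws each are far too few to see one success when p is of order 10^-8.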

Weighting draws

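Solution: weight draws towards larger values (Bucklew 2005). Add a factor of e^{tx} to the distribution:

$$P(R_1 \in dx) = \frac{e^{tx}\,P(X_1 \in dx)}{\int e^{tx}\,P(X_1 \in dx)} = \frac{e^{tx}\,P(X_1 \in dx)}{E\left[e^{tX_1}\right]}.$$

(Figure: the distribution of X1 before and after weighting.)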
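For X1 ∼ Unif[0, 1] the weighted distribution has density t e^{tx}/(e^t − 1) on [0, 1], so it can be sampled by inverting its CDF (e^{tx} − 1)/(e^t − 1). This is a sketch for the uniform case only, with hypothetical function names; other choices of X1 need their own sampler.

```python
import math
import random


def draw_tilted_uniform(t, rng):
    """Draw R on [0, 1] with P(R in dx) = e^{tx} dx / E[e^{tU}], for t > 0.

    The CDF is (e^{tx} - 1)/(e^t - 1); invert it at a uniform u.
    """
    u = rng.random()
    return math.log1p(u * math.expm1(t)) / t


def tilted_uniform_mean(t):
    """E[R] = e^t/(e^t - 1) - 1/t for the weighted uniform above."""
    return 1.0 / (-math.expm1(-t)) - 1.0 / t


rng = random.Random(15)
samples = [draw_tilted_uniform(1.9, rng) for _ in range(50_000)]
print(sum(samples) / len(samples), tilted_uniform_mean(1.9))
```

With t = 1.9 the weighted mean is about 0.65, so the weighted draws sit right at the tail threshold of the earlier example instead of near 0.5.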

Summing with weights

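$$\begin{aligned}
P(R_1 \in dx_1, \ldots, R_n \in dx_n) &= \frac{e^{tx_1} e^{tx_2} \cdots e^{tx_n}}{C}\, P(X_1 \in dx_1, \ldots, X_n \in dx_n) \\
&= \frac{e^{t(x_1+\cdots+x_n)}}{C}\, P(X_1 \in dx_1, \ldots, X_n \in dx_n),
\end{aligned}$$

where C = E[e^{tX1}]^n = E[e^{t(X1+···+Xn)}].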

Consequences

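Let Sn = X1 + ··· + Xn and Tn = R1 + ··· + Rn. Note that

$$P(T_n \in ds) = \frac{e^{ts}\,P(S_n \in ds)}{E\left[e^{tS_n}\right]}.$$

To remove the e^{ts} factor from the measure, let

$$Y \mid T_n \sim \mathrm{Unif}\left[0, e^{tT_n}\right].$$

Now

$$P(T_n \in ds, Y \in dy) = \frac{P(S_n \in ds)\,\mathbf{1}\left(y \in [0, e^{ts}]\right)dy}{E\left[e^{tS_n}\right]}.$$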

Using the auxiliary variable...


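Note: for t > 0, if s ≥ nα, then e^{ts} ≥ e^{tnα}. So on {s ≥ nα}, integrating y over [0, e^{tnα}] leaves P(Sn ∈ ds) e^{tnα}/E[e^{tSn}], which is proportional to P(Sn ∈ ds).

Theorem:

$$\left[T_n \mid T_n \ge n\alpha,\ Y \le e^{tn\alpha}\right] \sim \left[S_n \mid S_n \ge n\alpha\right].$$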

Theorem to algorithm

Weighted tail sums

1) Draw R1, ..., Rn independently from the distribution of X1 weighted by e^{tx}
2) Draw Y uniformly from 0 to e^{t(R1+···+Rn)}
3) Accept if (R1 + ··· + Rn)/n ≥ α and Y ≤ e^{tnα}

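Important fact: the probability of acceptance in this scheme is

$$\frac{P\left((X_1+\cdots+X_n)/n \ge \alpha\right)}{E\left[e^{tX_1}\right]^n / e^{tn\alpha}},$$

that is, the true tail probability divided by its Chernoff bound.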
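Putting the pieces together for the uniform example, the sketch below chooses t so that E[R1] = α (the condition in the running-time theorem that follows), runs weighted tail sums, and multiplies the acceptance rate by the Chernoff bound (E[e^{tX1}]/e^{tα})^n to estimate P((X1 + ··· + Xn)/n ≥ α), which is what the acceptance probability formula above implies. Function names, the seed, and the trial count are assumptions for illustration; step 3) is implemented through the equivalent test U ≤ e^{t(nα − total)}.

```python
import math
import random


def tilted_mean(t):
    """Mean of Unif[0, 1] weighted by e^{tx}: e^t/(e^t - 1) - 1/t."""
    return 1.0 / (-math.expm1(-t)) - 1.0 / t


def solve_t(alpha, lo=1e-6, hi=50.0, iters=100):
    """Bisection for the t with E[R1] = alpha (the weighted mean increases in t)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if tilted_mean(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def weighted_tail_trial(n, alpha, t, rng):
    """One run of weighted tail sums; True means accept."""
    # 1) R1, ..., Rn iid from Unif[0, 1] weighted by e^{tx} (inverse-CDF draws)
    total = sum(math.log1p(rng.random() * math.expm1(t)) / t for _ in range(n))
    if total < n * alpha:
        return False
    # 2)-3) Y ~ Unif[0, e^{t total}] and Y <= e^{t n alpha} is the same event
    # as U <= e^{t (n alpha - total)} for U ~ Unif[0, 1]
    return rng.random() <= math.exp(t * (n * alpha - total))


def estimate_tail(n, alpha, trials, rng):
    """Estimate P((U1+...+Un)/n >= alpha) as acceptance rate times Chernoff bound."""
    t = solve_t(alpha)
    log_mgf = math.log(math.expm1(t) / t)            # ln E[e^{t U}]
    chernoff = math.exp(n * (log_mgf - t * alpha))   # (E[e^{tU}] / e^{t alpha})^n
    accepts = sum(weighted_tail_trial(n, alpha, t, rng) for _ in range(trials))
    rate = accepts / trials
    return rate * chernoff, rate, chernoff


p_hat, rate, chernoff = estimate_tail(100, 0.65, 10_000, random.Random(7))
print(f"acceptance rate {rate:.3f}, Chernoff bound {chernoff:.2e}, estimate {p_hat:.2e}")
```

Unlike the naive scheme, a few thousand trials now accept hundreds of times, so the estimate of a probability near 10^-7 has small relative error.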

Running time results

Theorem (Huber 2006 [6]): Suppose that

P(X1 > α) > 0,
E[R1] = α (that is, t is chosen so that the weighted mean equals α),
E[|R1|^3] exists.

Then the probability of acceptance is Θ(1/√n). Each trial uses n draws, so this gives an O(n^{3/2}) sampling algorithm.

Under these conditions, the Berry–Esseen theorem says the sum of the Ri is close to normal, so it stays close to its mean nα.

Outline

1 Weighting draws with exponential shifts Chernoff bounds

2 Sequential Acceptance/Rejection Perfect matchings, the permanent, and Bregman’s Theorem

3 The Randomness Recycler The Ising model

Self-reducible problems

(Figure: problem A breaks into subproblems A1, A2, A3.)

Perfect matchings/Dimer coverings


Definition: A perfect matching is a collection of edges of a graph such that each node is incident to exactly one edge in the collection.

The Permanent

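Bipartite graphs can be encoded with an n by n matrix of 0's and 1's, where entry (i, j) is 1 when left node i is joined to right node j. For example:

$$\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix}$$

The number of perfect matchings is the permanent of the matrix.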

Relation to the determinant

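The determinant of a matrix A can be defined as

$$\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} A(i, \sigma(i)),$$

where sgn(σ) = ±1 is the sign of the permutation σ.

The permanent of a matrix A can be defined as

$$\operatorname{per}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} A(i, \sigma(i)).$$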
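A brute-force Python sketch of both definitions, feasible only for small n because it sums over all n! permutations. On the 3 by 3 example matrix from the previous slide, the permanent counts its 3 perfect matchings, while the signs in the determinant make terms cancel. Function names are illustrative.

```python
from itertools import permutations
from math import prod


def perm_sign(s):
    """+1 for even permutations, -1 for odd ones (parity of the inversion count)."""
    inversions = sum(1 for a in range(len(s)) for b in range(a + 1, len(s)) if s[a] > s[b])
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as a signed sum over all permutations."""
    n = len(a)
    return sum(perm_sign(s) * prod(a[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


def per(a):
    """Permanent: the same sum with every sign replaced by +1."""
    n = len(a)
    return sum(prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


A = [[1, 1, 1],
     [1, 1, 0],
     [0, 1, 1]]
print(per(A), det(A))  # 3 1
```

The determinant can be computed in polynomial time by elimination; no such shortcut is known for the permanent, which is the source of its difficulty.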

Goals

Twin goals:
Generate uniformly from the set of perfect matchings
Estimate the number of perfect matchings (#P-complete)

Markov chain approach:
Broder [1] created a chain on perfect and near-perfect matchings
Jerrum/Sinclair [8] showed it runs in polynomial time under certain conditions
Jerrum/Sinclair/Vigoda [9] gave a method for all 0-1 matrices, Θ(n^9)

Induction and an algorithm

Say problem A breaks into problems A1, ..., Ak and

bound(A) ≥ bound(A1) + ··· + bound(Ak).

Sequential A/R
1) Let bound(A0) ← bound(A) − Σ_j bound(Aj), the slack in the bound
2) Choose X with P(X = i) = bound(Ai)/bound(A) for i = 0, 1, ..., k
3) If X = 0, reject and quit
4) Else let A ← A_X; if A cannot be broken down further, accept and quit; otherwise goto 1)
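A Python sketch of sequential A/R for perfect matchings. In place of the Bregman-type bound discussed next, it swaps in a much cruder bound that is easy to check inductively: the product of the row sums. Expanding along the first row gives r1 subproblems, each with bound at most bound(A)/r1, and the empty matrix has bound 1. Each matching is therefore produced with probability exactly 1/bound(A), so accepted outputs are uniform and bound(A) times the acceptance rate estimates per(A). The crude bound makes acceptance poor for large n; this is illustration only, and all names are hypothetical.

```python
import random
from collections import Counter
from math import prod


def row_sum_bound(a, rows, cols):
    """Product of row sums of the submatrix: a simple inductive upper bound on per."""
    return prod(sum(a[i][j] for j in cols) for i in rows)


def sequential_ar(a, rng):
    """One run of sequential A/R. Returns a perfect matching (column index
    chosen for each row, in row order) or None on rejection."""
    rows = list(range(len(a)))
    cols = set(range(len(a)))
    matching = []
    while rows:
        i, rest = rows[0], rows[1:]
        b = row_sum_bound(a, rows, cols)
        if b == 0:
            return None                     # some row has no available 1s
        u = rng.random() * b                # pick child j w.p. bound(A_j)/bound(A)
        acc, chosen = 0, None
        for j in sorted(cols):
            if a[i][j]:
                acc += row_sum_bound(a, rest, cols - {j})
                if u < acc:
                    chosen = j
                    break
        if chosen is None:
            return None                     # X = 0: landed in the slack, reject
        matching.append(chosen)
        rows, cols = rest, cols - {chosen}
    return matching                         # empty problem reached: accept


A = [[1, 1, 1],
     [1, 1, 0],
     [0, 1, 1]]
rng = random.Random(42)
runs = [sequential_ar(A, rng) for _ in range(20_000)]
accepted = [tuple(m) for m in runs if m is not None]
per_estimate = row_sum_bound(A, range(3), range(3)) * len(accepted) / len(runs)
print(per_estimate, Counter(accepted))
```

Here bound(A) = 3 · 2 · 2 = 12 and per(A) = 3, so about one run in four is accepted.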

Running time

The expected running time is at most

O(k · bound(A)/soln(A)),

where soln(A) is the number of solutions of A. It is important to find an upper bound within a polynomial factor of the solution.

Bregman’s Theorem

Theorem (Bregman’s Theorem): For an n by n 0-1 matrix A with row sums ri,

$$\operatorname{per}(A) \le \prod_{i=1}^{n} (r_i!)^{1/r_i}.$$

By Stirling's formula, (r!)^{1/r} ≈ (r/e)[1 + (ln r)/(2r)]. Unfortunately, Bregman’s bound cannot be proved directly by induction.

Bregman modified

Theorem (Huber 2006 [7]): Let g(0) = 1, g(1) = e, and for i ≥ 1

$$g(i+1) = g(i) + 1 + \frac{1}{2g(i)} + \frac{0.6}{g(i)^2}.$$

Then

$$\operatorname{per}(A) \le \prod_{i=1}^{n} \frac{g(r_i)}{e}.$$

Even better: this result can be proved directly by induction! Therefore it gives a sequential A/R algorithm for 0-1 permanents.
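A small numerical comparison of the two upper bounds against the true permanent, on the 3 by 3 example matrix and on the 4 by 4 all-ones matrix. The recursion for g is read from the slide with the argument taken as g(i) and applied for i ≥ 1; treat that reading, and the function names, as assumptions.

```python
import math
from itertools import permutations


def per(a):
    """Brute-force permanent (sum over all n! permutations)."""
    n = len(a)
    return sum(math.prod(a[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def bregman_bound(a):
    """prod_i (r_i!)^(1/r_i); a zero row sum forces per(A) = 0."""
    rs = [sum(row) for row in a]
    if 0 in rs:
        return 0.0
    return math.prod(math.factorial(r) ** (1 / r) for r in rs)


def g_values(r_max):
    """g(0) = 1, g(1) = e, then g(i+1) = g(i) + 1 + 1/(2 g(i)) + 0.6/g(i)^2 for i >= 1."""
    g = [1.0, math.e]
    for i in range(1, r_max):
        g.append(g[i] + 1 + 1 / (2 * g[i]) + 0.6 / g[i] ** 2)
    return g


def modified_bound(a):
    """prod_i g(r_i)/e, the inductive bound stated on the slide."""
    rs = [sum(row) for row in a]
    g = g_values(max(rs))
    return math.prod(g[r] / math.e for r in rs)


A = [[1, 1, 1], [1, 1, 0], [0, 1, 1]]
J4 = [[1] * 4 for _ in range(4)]
for m in (A, J4):
    print(per(m), round(bregman_bound(m), 3), round(modified_bound(m), 3))
```

The modified bound is somewhat looser than Bregman’s, which is the price paid for a bound that splits inductively and can drive sequential A/R.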

Lower bound

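Suppose that all the row and column sums are the same...

Theorem (Van der Waerden’s Conjecture [3, 4]): Let r be the common row and column sum of the matrix A. Then

$$\operatorname{per}(A) \ge \left(\frac{r}{n}\right)^n n! > \sqrt{2\pi n}\left(\frac{r}{e}\right)^n.$$

Combined with the upper bound, this makes the running time

$$O\left(\left(1 + \frac{\ln r}{2r}\right)^n \Big/ \sqrt{2\pi n}\right).$$

This is fast when r = γn: since 1 + x ≤ e^x, the factor (1 + ln(γn)/(2γn))^n is at most (γn)^{1/(2γ)}, a polynomial in n.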

Application: Quasar data

(Image source: NASA Hubble Space Telescope)

Luminosity versus distance

(Figure omitted.)

Doubly truncated data

Truncated above: the cosmological model has important relativity issues at large distances
Truncated below: far-away dark objects cannot be seen

End result: double truncation forces correlation

Nonparametric tests

Solution (Efron and Petrosian 1999 [2]): turn the data into a permutation and use the number of inversions as the test statistic.

Example: the permutation 1 2 4 3 has one inversion (4 ↔ 3).
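Counting inversions, the test statistic above, takes O(n log n) time with merge sort. A sketch, with a hypothetical function name:

```python
def count_inversions(seq):
    """Number of pairs (a, b) with a < b and seq[a] > seq[b], via merge sort."""
    def sort_count(xs):
        if len(xs) <= 1:
            return xs, 0
        mid = len(xs) // 2
        left, inv_left = sort_count(xs[:mid])
        right, inv_right = sort_count(xs[mid:])
        merged, inv = [], inv_left + inv_right
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                inv += len(left) - i   # right item jumps over every remaining left item
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inv

    return sort_count(list(seq))[1]


print(count_inversions([1, 2, 4, 3]))  # 1
```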

Not all permutations are possible

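Because of truncation, the red data cannot be permuted to the green data.

(Figure omitted.)

Counting restricted permutations is a perfect matchings problem: the allowed (data point, position) pairs form a bipartite graph, so the count is the permanent of its 0-1 matrix.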

Outline

1 Weighting draws with exponential shifts Chernoff bounds

2 Sequential Acceptance/Rejection Perfect matchings, the permanent, and Bregman’s Theorem

3 The Randomness Recycler The Ising model

The Randomness Recycler

Randomness Recycler main idea (Fill, Huber 2000 [5]): if the sample is accepted, great. Otherwise, save as much of the sample as possible.

Note: the technique works best with Markov random fields.

The Ising Model

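Begin with a graph G = (V, E) and a spin configuration x ∈ {−1, 1}^V, with

$$H(x) = -\sum_{\{i,j\} \in E} x(i)\,x(j), \qquad \pi(x) = \frac{e^{-\beta H(x)}}{Z_\beta},$$

where Z_β is the normalizing constant (the partition function).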
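For a tiny graph, π can be computed exactly by enumerating all 2^|V| configurations, which is handy for checking samplers. A sketch (exponential in |V|; names illustrative), applied to a triangle with β = 0.3:

```python
import itertools
import math


def ising_distribution(n_nodes, edges, beta):
    """Exact pi(x) = exp(-beta H(x)) / Z_beta by enumerating x in {-1, 1}^V,
    with H(x) = -sum over edges {i, j} of x(i) x(j). Tiny graphs only."""
    weights = {}
    for x in itertools.product((-1, 1), repeat=n_nodes):
        h = -sum(x[i] * x[j] for i, j in edges)
        weights[x] = math.exp(-beta * h)
    z = sum(weights.values())                      # Z_beta
    return {x: w / z for x, w in weights.items()}, z


triangle = [(0, 1), (1, 2), (0, 2)]
dist, z = ising_distribution(3, triangle, 0.3)
print(dist[(1, 1, 1)] + dist[(-1, -1, -1)])
```

On the triangle the two all-equal configurations have H = −3 and the other six have H = 1, so P(all spins equal) = e^{4β}/(e^{4β} + 3).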

Running the algorithm

Example with e^{−2β} = 0.3:

Start with no spins and no edges
Uniformly spin the nodes
Add edge (accept, since the spins are the same)
Add edge (U = 0.287 ≤ 0.3 ⇒ accept)
Add edge (U = 0.533 > 0.3 ⇒ reject)

RR for Ising

Edge-based RR for Ising
1) Let E0 ← ∅
2) Draw x(1) through x(|V|) iid uniformly from {−1, 1}
3) If E0 = E, output x and quit; otherwise pick an edge {i, j} ∈ E \ E0
4) Let U ← Unif[0, 1]
5) If U ≤ e^{βx(i)x(j)}/e^β, accept: let E0 ← E0 ∪ {{i, j}} and goto line 3)
6) Else Recycle(x, {i, j}) and goto line 3)

Note: the only way x is sent to be recycled is if x(i) ≠ x(j); when x(i) = x(j) the acceptance probability in line 5) is 1.

How to Recycle

Idea: since x(i) ≠ x(j), the edges in E0 leaving node j are now bad. Remove each such edge {j, j′} with probability

$$\frac{e^{-\beta}}{e^{\beta x(j) x(j')}}.$$

Otherwise (which can only happen when x(j′) = x(j)), add j′ to the queue of bad nodes.

How to Recycle

After recycling, a collection of same-color nodes have had their edges removed. Respin each of these nodes.
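A Python sketch of the edge-based Randomness Recycler assembled from the steps above. It is an illustration written from these slides, not code from Fill and Huber, and it rests on stated assumptions: a simple graph (no self-loops or repeated edges), edges tried in a fixed order, one endpoint of the rejected edge taken as the first bad node, and an E0 edge joining two bad nodes simply dropped (both endpoints share a color, so its weight is constant). The invariant it relies on is that after every step x is a draw from the Ising model on (V, E0). All names are hypothetical.

```python
import math
import random


def rr_ising(n_nodes, edges, beta, rng):
    """Edge-based Randomness Recycler sketch for the Ising model.

    Invariant after every step: x is an exact draw from the Ising model
    on (V, E0), where E0 is the set of edges accepted so far.
    """
    x = [rng.choice((-1, 1)) for _ in range(n_nodes)]   # E0 empty: spins iid uniform
    target = list(dict.fromkeys(frozenset(e) for e in edges))
    e0 = set()
    nbrs = {v: set() for v in range(n_nodes)}            # neighbors within E0
    while len(e0) < len(target):
        i, j = tuple(next(e for e in target if e not in e0))
        if rng.random() <= math.exp(beta * x[i] * x[j]) / math.exp(beta):
            e0.add(frozenset((i, j)))                    # accept: add the edge
            nbrs[i].add(j)
            nbrs[j].add(i)
            continue
        # Rejected, which forces x[i] != x[j]: node j is the first bad node.
        c = x[j]
        bad, queue = {j}, [j]
        while queue:
            a = queue.pop()
            for b in list(nbrs[a]):
                e0.discard(frozenset((a, b)))            # take {a, b} out of E0
                nbrs[a].discard(b)
                nbrs[b].discard(a)
                if b in bad:
                    continue    # both ends bad, same color: constant weight, just drop it
                if x[b] != c:
                    continue    # e^{-beta}/e^{beta x(a)x(b)} = 1 when the spins differ
                if rng.random() <= math.exp(-2 * beta):
                    continue    # same spin: removal succeeds with probability e^{-2 beta}
                bad.add(b)      # removal failed, so b is bad as well
                queue.append(b)
        for v in bad:           # bad nodes now have no E0 edges: respin them
            x[v] = rng.choice((-1, 1))
    return x


rng = random.Random(11)
triangle = [(0, 1), (1, 2), (0, 2)]
samples = [rr_ising(3, triangle, 0.3, rng) for _ in range(20_000)]
freq = sum(len(set(s)) == 1 for s in samples) / len(samples)
print(freq, math.exp(1.2) / (math.exp(1.2) + 3))
```

A quick check is to compare the frequency of all-equal spins on a triangle with the exact value e^{4β}/(e^{4β} + 3). With β = 0.3 and maximum degree Δ = 2, e^{−2β} ≈ 0.55 > 1 − 1/Δ, so this example also sits in the regime of the running-time theorem below.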

Back to running the algorithm

e^{−2β} = 0.3.

Ready to recycle
Try to eliminate an edge: U = 0.123 ≤ 0.3, success!
Remove the spin of the contaminated node
Respin the node

Ending the algorithm

If all the edges are in place, the algorithm terminates and outputs x.

Running time of RR

Hopefully the algorithm adds more edges (on average) than it removes...

Theorem (Fill, Huber 2002): Suppose that Δ is the maximum degree of the graph. If e^{−2β} > 1 − 1/Δ, then the expected running time is

O(|E|)

Applications

The Randomness Recycler has been used for:
the Ising model
self-organizing lists (move-ahead-1 chain)
the hard-core gas model
the autonormal model

Summary

Exponential shifting can push your coordinates towards larger (or smaller) values.
Sequential acceptance/rejection can be used whenever upper bounds can be proved using induction.
The Randomness Recycler works well when dealing with local, weak interactions.

The future... How far can these methods be extended? How do they relate to other techniques (monotonicity, conductance)?

References

[1] A. Z. Broder. How hard is it to marry at random? (On the approximation of the permanent). In Proc. 18th ACM Sympos. on the Theory of Computing, pages 50–58, 1986.
[2] B. Efron and V. Petrosian. Nonparametric methods for doubly truncated data. J. Amer. Statist. Assoc., 94, 1999.
[3] G. P. Egorychev. The solution of van der Waerden’s problem for permanents. Advances in Math., 42:299–305, 1981.


[4] D. I. Falikman. Proof of the van der Waerden’s conjecture on the permanent of a doubly stochastic matrix. Mat. Zametki, 29(6):931–938, 1981.
[5] J. A. Fill and M. L. Huber. The Randomness Recycler: A new approach to perfect sampling. In Proc. 41st Sympos. on Foundations of Comp. Sci., pages 503–511, 2000.
[6] M. Huber. Acceptance/rejection sampling using Chernoff bounds. Preprint, 2006.


[7] M. Huber. Exact sampling from perfect matchings of dense regular bipartite graphs. Algorithmica, 44(3), 2006.
[8] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM J. Comput., 18:1149–1178, 1989.
[9] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approximation algorithm for the permanent of a matrix with non-negative entries. In Proc. 33rd ACM Sympos. on Theory of Computing, pages 712–721, 2001.
