Polya urn
Pollyanna May 1, 2014
1 Introduction
In probability theory, statistics, and combinatorics, urn models have a long and colorful history. Classical mathematicians, Laplace and Bernoulli amongst others, made notable contributions to this class of problems. For example, Jacob Bernoulli in his Ars Conjectandi (1713) considered the problem of determining the proportions of different-colored pebbles inside an urn after a number of pebbles are drawn from the urn. This problem also attracted the attention of Abraham de Moivre and Thomas Bayes.
Urn models can represent many naturally occurring phenomena, such as atoms in a system, biological organisms, populations, and economic processes. In fact, in 1923 Eggenberger and Polya devised an urn scheme (commonly known as the Polya urn) to model such (infinite) processes as the spread of contagious diseases. In one of its simplest forms, the Polya urn initially contains x white balls and y black balls. A ball is drawn from the urn uniformly at random and returned to the urn along with a fixed number h of additional balls of the same color as the drawn ball. This process continues forever. The fraction of white balls in the urn converges almost surely to a random limit, which has a beta distribution with parameters x/h and y/h.
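The basic scheme above is easy to simulate. The following sketch (our own illustration; the function name and parameters are not from the paper) runs many independent urns with x = y = h = 1, for which the Beta(x/h, y/h) limit is Beta(1, 1), i.e. uniform on [0, 1], so the sampled fractions should average near 1/2:

```python
import random

def polya_urn_fraction(x, y, h, n_draws, rng):
    """Simulate the basic Polya urn: start with x white and y black balls;
    each draw returns the drawn ball plus h extra balls of its color.
    Returns the fraction of white balls after n_draws draws."""
    white, black = x, y
    for _ in range(n_draws):
        if rng.random() < white / (white + black):
            white += h          # drew white: add h white balls
        else:
            black += h          # drew black: add h black balls
    return white / (white + black)

rng = random.Random(0)
# With x = y = h = 1 the a.s. limit of the white fraction is Beta(1,1),
# i.e. uniform on [0,1]; its mean is 1/2.
limits = [polya_urn_fraction(1, 1, 1, 500, rng) for _ in range(1000)]
mean = sum(limits) / len(limits)
```

A histogram of `limits` would look approximately flat on [0, 1], as the beta-limit result predicts.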
There are a number of generalizations of the basic Polya urn model. For example, when a ball is drawn, not only is it returned to the urn, but if it is white, then A additional white balls and B additional black balls are added, and if the drawn ball is black, then C additional white balls and D additional black balls are added. This set-up involves the matrix [A B; C D], known as the replacement matrix: the rule that governs the change in the composition and number of balls after each drawing.
Or, instead of only two differently colored balls, there may be several colors present in the urn. Or, instead of the fixed h additional balls noted above, a different number of balls, which may or may not be deterministic, may be added after every draw.
In the rest of the paper, the Polya urn model is studied in detail in Section 2, and a generalized Polya urn with a replacement vector in Section 3. As an application of the Polya urn, the edge-reinforced random walk is studied in Section 4. Finally, a numerical simulation is presented in Section 5.
2 Polya Urn study

2.1 Definition and Notations

A Polya urn is an urn containing balls of up to $K$ ($K \ge 2$) different colors. The urn evolves in discrete time steps: at each step, one ball is sampled uniformly at random; the color of the withdrawn ball is observed, and two balls of that color are returned to the urn.
2.2 Count vector as a Markov chain

We start with the Markov property of the Polya urn, which is not only important in itself, but also serves as a building block in the later analysis.
Lemma 2.1. Let $X_{jn}$ denote the number of balls of the $j$th color in the urn at time step $n$. Then the vector process $\{(X_{1n},\dots,X_{Kn})\}_{n\ge 1}$ is a homogeneous Markov chain.

To see that it is a Markov chain, observe that at each step one ball is sampled uniformly at random, so the probability of picking a ball of a particular color depends only on the counts at the previous step. Since the transition probabilities do not depend on the time index $n$, the chain is homogeneous.
Despite the simple proof, it is not clear how this result can be used directly, since the state space of the vector is infinite and the chain never returns to the same state.
2.3 Ratio as a Martingale

Another quantity of interest is the ratio of the numbers of balls of different colors. Statistically this ratio is very important, and it is often all one needs to know.
Note that the ratio process alone is not a Markov chain. For example, suppose at some time step the ratio is one half. We cannot determine the distribution of the ratio at the next step, since the current state may be 1 white ball plus 1 black ball, or 2 white balls plus 2 black balls, and these two configurations give different distributions for the next ratio.
But it is a martingale as stated in the following lemma.
Lemma 2.2. Let $\rho_{in} = X_{in} / \sum_j X_{jn}$, $n \ge 0$, be the fraction of balls of the $i$th color in the urn at the conclusion of step $n$. Define $\mathcal{F}_n = \sigma(c_1, \dots, c_n)$, where $c_i$ is the color of the $i$th drawn ball. Then $(\rho_{in})_{n \ge 0}$ is a martingale w.r.t. $(\mathcal{F}_n)_{n \ge 0}$.

Proof.
$$
\begin{aligned}
E[\rho_{i,n+1} \mid \mathcal{F}_n]
&= E\Big[\frac{X_{i,n+1}}{\sum_j X_{j,n+1}} \,\Big|\, \mathcal{F}_n\Big]
= E\Big[\frac{X_{i,n+1}}{\sum_j X_{j,0} + n + 1} \,\Big|\, \mathcal{F}_n\Big]
= \frac{1}{\sum_j X_{j,0} + n + 1}\, E[X_{i,n+1} \mid \mathcal{F}_n] \\
&= \frac{1}{\sum_j X_{j,0} + n + 1}\, E[X_{i,n} + \mathbf{1}(c_{n+1} = i) \mid \mathcal{F}_n]
= \frac{1}{\sum_j X_{j,0} + n + 1}\Big(X_{i,n} + \frac{X_{i,n}}{\sum_j X_{j,0} + n}\Big)
= \rho_{i,n}.
\end{aligned}
$$
This implies that as the urn evolves, the expected fraction of a particular color stays the same. Furthermore, when $n \to \infty$, we have:

Lemma 2.3. Starting from 1 ball of each of two colors, $\lim_{n\to\infty} \rho_n =: \rho_\infty$ exists almost surely, and $\rho_\infty$ is uniform on $[0, 1]$.
Proof. Since $\rho_{in}$ is a martingale bounded above by 1, the martingale convergence theorem gives that the limit exists almost surely. That the distribution of $\rho_\infty$ is uniform is proved later in a more general setting.

As time goes on sufficiently long, the proportion of balls of each color stabilizes. So what is the distribution of this proportion? In the next subsection we address this problem.
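The one-step martingale identity in Lemma 2.2 can be checked exactly, without simulation, using rational arithmetic. The sketch below (the helper name and the particular urn state are ours) computes $E[\rho_{i,n+1} \mid \mathcal{F}_n]$ by conditioning on which color is drawn, and recovers the current ratio for every color:

```python
from fractions import Fraction

def expected_next_ratio(counts, i):
    """Exact E[rho_{i,n+1} | current counts] for a Polya urn that returns
    the drawn ball plus one extra ball of the same color."""
    total = sum(counts)
    exp = Fraction(0)
    for j, c in enumerate(counts):
        p_draw_j = Fraction(c, total)                # P(color j is drawn)
        next_i = counts[i] + (1 if j == i else 0)    # count of color i afterwards
        exp += p_draw_j * Fraction(next_i, total + 1)
    return exp

counts = [3, 5, 2]   # an arbitrary (hypothetical) urn state
ratios = [expected_next_ratio(counts, i) for i in range(3)]
# Each expected next ratio equals the current ratio X_i / sum_j X_j.
```

Because `Fraction` arithmetic is exact, the agreement is an identity, not a numerical coincidence.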
2.4 Proportion Limiting Distribution

Seeing that the Polya urn is a Markov chain, it is natural to seek the limiting distribution underlying the process in closed form. In particular, we are interested in how many balls of each color are picked when the process runs for a sufficiently long time. As a first step, we give the closed-form probability of drawing given numbers of balls starting from a particular configuration.
Lemma 2.4. Suppose a Polya urn starts with $X_{j0}$ balls of the $j$th color ($j = 1, \dots, K$), and let $\tau_0 = \sum_j X_{j0}$. Let $\tilde X_{jn}$ be the number of balls of the $j$th color ($j = 1, \dots, K$) drawn in the first $n$ draws. Then for $\sum_j y_j = n$,
$$
P(\tilde X_{jn} = y_j,\ j = 1, \dots, K)
= \binom{n}{y_1, \dots, y_K}
\frac{\prod_{j=1}^{K} X_{j0}(X_{j0}+1)\cdots(X_{j0}+y_j-1)}{\tau_0(\tau_0+1)\cdots(\tau_0+n-1)}. \tag{1}
$$
The proof uses the Markov property, writing the probability of each particular color sequence as a product over $n$ consecutive draws, and noting that there are $\binom{n}{y_1,\dots,y_K}$ orderings with the given color counts, each equally likely. Details are in the appendix. While the transition probability composes multiplicatively over time, Lemma 2.4 gives a closed form depending only on the starting and finishing numbers of balls. Starting from this, we can derive the expectation and variance of $\tilde X_{jn}$, as well as the Polya urn's limiting distribution, stated as follows.
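Formula (1) can be verified by brute force for small cases: enumerate every color sequence, multiply the draw probabilities along each sequence, and compare the aggregated counts against the closed form. The sketch below (our own check; names and the small configuration are illustrative) does this exactly with rational arithmetic:

```python
from fractions import Fraction
from itertools import product
from math import factorial
from collections import Counter

def seq_prob(start, seq):
    """Exact probability of one particular color sequence from a Polya urn
    (draw a ball uniformly, return it plus one more of the same color)."""
    counts, p = list(start), Fraction(1)
    for c in seq:
        p *= Fraction(counts[c], sum(counts))
        counts[c] += 1
    return p

def formula_prob(start, y):
    """Closed form (1): multinomial coefficient times rising factorials."""
    n, tau = sum(y), sum(start)
    coef = Fraction(factorial(n))
    for yj in y:
        coef /= factorial(yj)
    num = Fraction(1)
    for x0, yj in zip(start, y):
        for k in range(yj):
            num *= x0 + k            # X_j0 (X_j0 + 1) ... (X_j0 + y_j - 1)
    den = Fraction(1)
    for k in range(n):
        den *= tau + k               # tau_0 (tau_0 + 1) ... (tau_0 + n - 1)
    return coef * num / den

start, n = (2, 1), 4
brute = Counter()
for seq in product(range(len(start)), repeat=n):
    y = tuple(seq.count(j) for j in range(len(start)))
    brute[y] += seq_prob(start, seq)
ok = all(brute[y] == formula_prob(start, list(y)) for y in brute)
```

The enumeration also makes the exchangeability visible: every sequence with the same counts $y$ contributes the same probability.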
Theorem 2.5. Suppose a Polya urn starts with $X_{j0}$ balls of the $j$th color ($j = 1, \dots, K$). Let $\tilde X_{jn}$ be the number of balls of the $j$th color drawn in the first $n$ draws. Then as $n \to \infty$,
$$
\Big(\frac{\tilde X_{1n}}{n}, \dots, \frac{\tilde X_{Kn}}{n}\Big) \xrightarrow{D} \mathrm{Dir}(X_{10}, \dots, X_{K0}). \tag{2}
$$
The proof follows from Lemma 2.4 and the Stirling approximation. Details are given in the appendix.
Remark. Theorem 2.5 tells us what proportion of each color we should expect if we start with known numbers of balls and draw infinitely many times. The answer depends on how many balls of each color we start with: asymptotically, we sample the different colors in fixed proportions, and these proportions follow a Dirichlet distribution whose parameters are the initial numbers of balls.
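A quick sanity check of Theorem 2.5 (a simulation sketch; the function name, seed, and initial configuration are ours): run many urns started from counts $(2, 1, 1)$, so the limiting proportions are $\mathrm{Dir}(2, 1, 1)$ with marginal means $2/4$, $1/4$, $1/4$. By the martingale property of Lemma 2.2, the expected proportion equals the initial fraction at every finite $n$ as well, so the empirical means should already be close after a few hundred draws:

```python
import random

def proportions_after(start, n_draws, rng):
    """Run one Polya urn for n_draws steps and return the color proportions."""
    counts = list(start)
    for _ in range(n_draws):
        r, acc = rng.random() * sum(counts), 0.0
        for j, c in enumerate(counts):
            acc += c
            if r < acc:
                counts[j] += 1   # return the drawn ball plus one more
                break
    total = sum(counts)
    return [c / total for c in counts]

rng = random.Random(42)
start = (2, 1, 1)                 # hypothetical initial counts
runs = [proportions_after(start, 300, rng) for _ in range(2000)]
means = [sum(r[j] for r in runs) / len(runs) for j in range(3)]
# Theorem 2.5: limits ~ Dir(2, 1, 1), with marginal means 1/2, 1/4, 1/4.
```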
2.5 An alternative perspective: Bayesian generative model

So far, we have reviewed the Polya urn in its classical scheme, where one repeatedly picks differently colored balls from a single urn. The probability of picking each color depends on the current contents of the urn. In this section we provide another perspective on the Polya urn. It is a Bayesian treatment, and it extends easily to the more general case $K = +\infty$, as we show in Section 2.6.

We assume balls are drawn independently from one fixed distribution $G$, instead of from a time-adaptive distribution. Here $G = \mathrm{Multi}(p_1, \dots, p_K)$ is a probability measure over the $K$ colors, specifying a multinomial distribution. Following the standard Bayesian treatment, we treat $G$ as a random variable and assign it the prior distribution $\mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, where $\alpha_i = X_{i0}$ for all $i$. The evolving process of this new urn is then described as follows:
$$
\begin{aligned}
G &\sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K) \\
\mathrm{color}(i) \mid G &\sim G \quad \forall i
\end{aligned} \tag{3}
$$
Now there is no sequential picking: all balls are generated independently. Since the true distribution $G$ is unknown, we need to infer the conditional distribution of $G$ given the observed balls. Whenever a new ball is observed, we update our belief about $G$ and then make a prediction for the upcoming ball. Note that the underlying distribution $G$ is assumed unchanged. So how does this relate to the standard Polya urn model? The next theorem shows that after observing a given number of balls, one makes the same prediction for the color of the next ball as under the standard Polya urn model.
Theorem 2.6. Choose the prior $\mathrm{Dir}(X_{10}, \dots, X_{K0})$, where $X_{j0}$ denotes the initial number of balls of the $j$th color ($j = 1, \dots, K$). Then the posterior distribution of $G$, given that $X_j$ balls of the $j$th color have been counted, is $\mathrm{Dir}(X_1, \dots, X_K)$, and the expected probability of picking a ball of the $i$th color is $X_i / \sum_j X_j$, which is the same as given by the Polya urn rule.

The proof follows from the fact that the Dirichlet and multinomial distributions are conjugate, and from the formula for the expectation of a Dirichlet random variable.
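To make the correspondence in Theorem 2.6 concrete, here is a small exact computation (a sketch; the function name and the particular counts are ours). The Dirichlet posterior parameters are the prior parameters plus the observed counts, which are exactly the urn's current ball counts, so the posterior mean and the urn rule agree term by term:

```python
from fractions import Fraction

def predictive_two_ways(initial, observed):
    """Predictive probabilities for the next color, computed two ways:
    (a) the urn rule X_i / sum_j X_j, where X_i = initial + observed balls;
    (b) the mean alpha_i / sum_j alpha_j of the Dirichlet posterior, whose
        parameters are the prior Dir(initial) updated by the observed counts."""
    urn_counts = [x0 + obs for x0, obs in zip(initial, observed)]
    urn_rule = [Fraction(c, sum(urn_counts)) for c in urn_counts]
    posterior_params = [a0 + obs for a0, obs in zip(initial, observed)]
    posterior_mean = [Fraction(a, sum(posterior_params)) for a in posterior_params]
    return urn_rule, posterior_mean

initial = (1, 2, 1)      # prior Dir(1, 2, 1) = initial ball counts (hypothetical)
observed = (3, 0, 2)     # colors observed so far
urn_rule, post_mean = predictive_two_ways(initial, observed)
```

The two lists are built from the same counts by design; the point is that conjugacy makes the Bayesian bookkeeping collapse to the urn's counting rule.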
Remark. While the probability of picking each color changes during the process in both views of the Polya urn, the mechanisms are different. In the original view, the process relies on the assumption that balls are always picked uniformly from the urn; since the number of balls changes, the picking probability is time-adaptive. In the alternative view, balls are assumed to be drawn from one fixed underlying distribution; what changes sequentially is our belief about that distribution.
Not surprisingly, as in the non-Bayesian analysis, we can derive the probability of observing given numbers of balls, starting from a given configuration, as stated in the following lemma.
Lemma 2.7. Let $X_j$ be the number of balls of the $j$th color ($j = 1, \dots, K$) observed so far, and let $\tilde X_{jn}$ be the number of balls of the $j$th color appearing in the next $n$ draws according to the conditional distribution of $G$. Then
$$
p\big(\tilde X_{jn} = y_j\ \forall j \,\big|\, \textstyle\sum_j y_j = n\big)
= \binom{n}{y_1, \dots, y_K}
\frac{\Gamma\big(\sum_j X_j\big)}{\Gamma\big(\sum_j (X_j + y_j)\big)}
\prod_{j=1}^{K} \frac{\Gamma(X_j + y_j)}{\Gamma(X_j)}. \tag{4}
$$
Proof. For a fixed $G = (p_1, \dots, p_K)$, the distribution of $(\tilde X_{1n}, \dots, \tilde X_{Kn})$ is multinomial. Since $G$ is a random variable with conditional distribution $\mathrm{Dir}(X_1, \dots, X_K)$, we integrate over $G$ and obtain equation (4).

Remark. Lemma 2.7 recovers the result of Lemma 2.4, but in a different way. The proof is more straightforward, since all subsequent samples are drawn independently from the conditional distribution of $G$, which can be computed once the $X_j$ are given.
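That equation (4) and the rising-factorial form of Lemma 2.4 are the same quantity follows from $\Gamma(a + y)/\Gamma(a) = a(a+1)\cdots(a+y-1)$; the sketch below (our own numerical check, with illustrative counts) evaluates both forms and confirms they agree:

```python
from math import lgamma, exp, factorial

def dirichlet_multinomial_pmf(alpha, y):
    """Dirichlet-multinomial pmf, equation (4), computed via log-gamma."""
    n = sum(y)
    log_p = lgamma(n + 1) - sum(lgamma(yj + 1) for yj in y)
    log_p += lgamma(sum(alpha)) - lgamma(sum(alpha) + n)
    log_p += sum(lgamma(a + yj) - lgamma(a) for a, yj in zip(alpha, y))
    return exp(log_p)

def rising_factorial_pmf(alpha, y):
    """The same probability in the rising-factorial form of Lemma 2.4."""
    n, tau = sum(y), sum(alpha)
    p = factorial(n)
    for yj in y:
        p /= factorial(yj)
    for a, yj in zip(alpha, y):
        for k in range(yj):
            p *= a + k               # Gamma(a + y) / Gamma(a) expanded
    for k in range(n):
        p /= tau + k                 # Gamma(tau + n) / Gamma(tau) expanded
    return p

alpha, y = (2, 1, 3), (4, 2, 1)      # hypothetical counts
a_val = dirichlet_multinomial_pmf(alpha, y)
b_val = rising_factorial_pmf(alpha, y)
```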
Corollary 2.8. Let $X_j$ be the number of balls of the $j$th color ($j = 1, \dots, K$) observed so far, and let $\tilde X_{jn}$ be the number of balls of the $j$th color appearing in the next $n$ draws according to the conditional distribution of $G$. Then as $n \to \infty$,
$$
p\big(\tilde X_{jn}/n = y_j\ \forall j \,\big|\, \textstyle\sum_j y_j = 1\big)
\xrightarrow{n \to \infty} \mathrm{Dir}(X_1, \dots, X_K). \tag{5}
$$
Clearly this corollary recovers Theorem 2.5.
2.6 Extension: Polya urn with unbounded number of colors

We extend the Polya urn with a fixed number of colors to a Polya urn with possibly infinitely many colors, via the generative model discussed in Section 2.5.
Our motivation is that, with real-life data, we do not know the total number of "colors" beforehand. It is therefore unwise to fix K before observing the whole data population. Moreover, even after observing the entire population, we have no way of knowing whether the upcoming "ball" has a different "color" from all the balls observed so far. For example, in brain imaging research, one models an unknown number of spatial activation patterns in fMRI images. As another example, in text modeling, one models an unknown number of topics across several corpora of documents. The Polya urn models discussed so far are too limited for these flexible situations. Is there a principled way to handle this more flexible setting?
It turns out to be highly non-trivial to do so: in equation (3), $G$ is defined on a finite space, which is too restrictive to accommodate an unbounded number of colors. One needs to extend the space to an infinite one; in particular, one must define a proper prior for $G$ whose support covers the infinite space.
Our solution resorts to the Dirichlet process (DP) [4, 3]. The DP defines a distribution over distributions, where the underlying space may be infinite. We refer to [3] for the formal definition of the DP and give an equivalent one here.
Definition. Let $\mathcal{X}$ be a set and $\mathcal{A}$ a $\sigma$-field of subsets of $\mathcal{X}$. Let $\alpha$ be a positive number and $G_0$ a probability measure on $(\mathcal{X}, \mathcal{A})$. We say a Dirichlet process has sample paths $G$ that are probability measures on $(\mathcal{X}, \mathcal{A})$ such that for any measurable partition $\{A_i\}_{i=1}^n$ of $\mathcal{X}$, we have $(G(A_1), \dots, G(A_n)) \sim \mathrm{Dirichlet}(\alpha G_0(A_1), \dots, \alpha G_0(A_n))$.

To make sure this indeed defines a proper random process, one needs to check that the Kolmogorov consistency conditions are satisfied; we refer to [3] for the details of this verification. In what follows we assume it defines a proper random process.
Now we are ready to define the generalized Polya urn model
$$
\begin{aligned}
G &\sim \mathrm{DP}(\alpha, G_0) \\
\mathrm{color}(i) \mid G &\sim G \quad \forall i
\end{aligned} \tag{6}
$$
where $G_0$ can be any distribution over colors, and the scalar $\alpha$ controls how likely one is to observe a new color. The Dirichlet process has the following two useful properties.

Lemma 2.9. A sample from a Dirichlet process is a discrete distribution, made up of countably infinitely many point masses.
Lemma 2.10. Let $G$ be a sample from the Dirichlet process $\mathrm{DP}(\alpha, G_0)$ on the color space, and let $c_n$ be the color of the $n$th observed ball. Then the conditional distribution of $G$ given $c_1, \dots, c_N$ is still a Dirichlet process, with concentration parameter $\alpha + N$ and base measure $\frac{1}{\alpha + N}\big(\alpha G_0 + \sum_{n=1}^{N} \delta_{c_n}\big)$.

Remark. Lemma 2.9 makes it possible to use the DP to model clustering problems: if $G$ were not discrete, the probability of picking two identical points from $G$ would be zero. Lemma 2.10 says that the Dirichlet process is conjugate with respect to sampling, which also makes computation tractable.
Theorem 2.11. Let $X_i$ denote the number of balls of the $i$th color drawn independently from the generalized Polya urn model defined in (6). Then the probability of observing a ball of the $i$th color next is $\frac{X_i}{\sum_j X_j + \alpha}$, and the probability of observing a ball of a brand-new color next is $\frac{\alpha}{\sum_j X_j + \alpha}$.

The proof follows from Lemma 2.10 and the expectation of a random measure drawn from a Dirichlet process.
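The predictive rule of Theorem 2.11 gives a direct sampler for the generalized urn (this is a sketch of our own; labeling scheme, seed, and parameters are illustrative). New colors are labeled in order of first appearance; note that the very first draw is always a new color, since the urn starts empty:

```python
import random
from collections import Counter

def generalized_polya_draws(alpha, n_draws, rng):
    """Sample colors from the generalized Polya urn (6): the next ball has an
    existing color i with probability X_i / (sum_j X_j + alpha) and a
    brand-new color with probability alpha / (sum_j X_j + alpha)."""
    counts, draws = Counter(), []
    for _ in range(n_draws):
        total = sum(counts.values())
        if rng.random() < alpha / (total + alpha):
            color = len(counts)              # a never-seen color
        else:
            r, acc = rng.random() * total, 0
            for c, cnt in counts.items():    # pick an old color w.p. cnt/total
                acc += cnt
                if r < acc:
                    color = c
                    break
        counts[color] += 1
        draws.append(color)
    return draws, counts

rng = random.Random(1)
draws, counts = generalized_polya_draws(alpha=1.0, n_draws=200, rng=rng)
n_colors = len(counts)   # grows slowly with n_draws, at rate set by alpha
```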
Remark. It is easy to see that (6) extends the standard Polya urn model, since it allows us to draw a new color with non-zero probability. When $\alpha$ goes to 0, we recover the standard Polya urn model (3); when $\alpha$ goes to infinity, (6) becomes drawing balls from the fixed distribution $G_0$. Meanwhile, the model also allows a flexible choice of $G_0$, the base probability distribution over the color space. For instance, it is easy to incorporate other attributes of colors in order to pick a new color.
3 Generalized Polya Urn with a Replacement Vector
We generalize the simple Polya urn containing two colors, black and white. Instead of always returning 2 balls of the observed color, a deterministic number of balls of the observed color is returned to the urn. Define replacement vectors $\vec r = (r_0, r_1, \dots)$ and $\vec w = (w_0, w_1, \dots)$: at the $n$th time a black ball is observed, $r_n$ black balls are returned to the urn; similarly, at the $n$th time a white ball is observed, $w_n$ white balls are returned. In the original Polya urn problem with replacement 2 (i.e., increment 1), it is easy to show that each color is drawn infinitely many times.

Lemma 3.1. In the basic Polya urn problem with increment 1, each color is drawn infinitely many times.

Proof. Assume the urn initially contains $t$ balls in total. W.l.o.g. identify one particular red ball initially in the urn. At step $n$ the urn contains $t + n - 1$ balls, so $p_n$, the probability of drawing the identified ball at step $n$, is $1/(t + n - 1)$. Then $\sum_n p_n = \infty$, and from the Borel-Cantelli lemma we conclude that we draw the identified red ball infinitely many times. The rest follows.

Luckily, we have a more general result for the generalized Polya urn, due to Herman Rubin.
Theorem 3.2. Let $\vec r = (r_0, r_1, \dots)$ and $\vec w = (w_0, w_1, \dots)$ be two sequences of nonnegative numbers such that $r_0 > 0$ and $w_0 > 0$. Put $R_k = \sum_{i=0}^{k} r_i$ and $W_k = \sum_{i=0}^{k} w_i$. The first entry of the infinite sequence is of type $r$ with probability $R_0/(R_0 + W_0)$ and of type $w$ with probability $W_0/(R_0 + W_0)$. Given that the first $n$ entries consist of $x$ $r$'s and $y = n - x$ $w$'s, in a given order, the probability that the $(n+1)$st entry is $r$ equals $R_x/(R_x + W_y)$, and the probability that it is $w$ equals $W_y/(R_x + W_y)$. Let $p_r = P(\text{all but finitely many elements of the sequence are red})$ and $p_w = P(\text{all but finitely many elements of the sequence are white})$. Put $\Sigma(r) = \sum_{i=0}^{\infty} R_i^{-1}$ and $\Sigma(w) = \sum_{i=0}^{\infty} W_i^{-1}$. Then:

1. If $\Sigma(r) < \infty$ and $\Sigma(w) < \infty$, then $p_r > 0$, $p_w > 0$, and $p_r + p_w = 1$.
2. If $\Sigma(r) < \infty$ and $\Sigma(w) = \infty$, then $p_r = 1$.
3. If $\Sigma(r) = \infty$ and $\Sigma(w) = \infty$, then both $p_r$ and $p_w$ equal 0.
Example. Assume the initial urn is $(1, 1)$ and $r_i = w_i = 2$ for all $i > 0$, i.e., each draw returns two balls of the drawn color. Then $R_k = 2k + 1$ and $W_k = 2k + 1$, so $\Sigma(r) = \Sigma(w) = \infty$ and case 3 applies. We reach the same conclusion as above: each color is drawn infinitely many times.
Example. Let $r_i = w_i = i$ for all $i > 0$. Then $R_k = O(k^2)$ and $W_k = O(k^2)$, so $\Sigma(r) < \infty$ and $\Sigma(w) < \infty$. By case 1, $p_r + p_w = 1$, which means that either we draw only finitely many red balls, or only finitely many white balls: eventually one color takes over.
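The dichotomy in the two examples reduces to whether the series $\Sigma(r) = \sum_k 1/R_k$ diverges or converges. The sketch below (helper name and truncation level are ours) computes partial sums for the two increment sequences: with constant increments $R_k$ grows linearly and the harmonic-type sum keeps growing, while with $r_k = k$ the sum flattens out quickly:

```python
def partial_sums(increments, k_max):
    """Partial sums of sum_k 1/R_k, where R_k = r_0 + ... + r_k (Theorem 3.2)."""
    total_r, s, sums = 0, 0.0, []
    for k in range(k_max + 1):
        total_r += increments(k)
        s += 1.0 / total_r
        sums.append(s)
    return sums

# Constant increments r_k = 1: R_k = k + 1, a harmonic series, Sigma(r) = inf.
linear = partial_sums(lambda k: 1, 100000)
# Growing increments r_0 = 1, r_k = k: R_k ~ k^2 / 2, so Sigma(r) < inf.
superlinear = partial_sums(lambda k: max(k, 1), 100000)
```

The divergent case keeps both colors alive forever (case 3), while the convergent case forces eventual monopoly of one color (case 1).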
4 Application to Edge-Reinforced Random Walk
Recall that a simple random walk $\vec X$ on $\mathbb{Z}$ is a process that takes values in $\mathbb{Z}$ at each time $i \ge 0$, i.e., $X_i \in \mathbb{Z}$, with
$$
P(X_{i+1} = X_i + 1) = P(X_{i+1} = X_i - 1) = \frac{1}{2}.
$$
Further, an edge-weighted random walk $\vec X$ on $\mathbb{Z}$ is a process taking values in $\mathbb{Z}$ at each time $i \ge 0$, with
$$
P(X_{i+1} = X_i + 1) = \frac{w(\langle X_i, X_i + 1\rangle)}{w(\langle X_i, X_i + 1\rangle) + w(\langle X_i, X_i - 1\rangle)}, \qquad
P(X_{i+1} = X_i - 1) = \frac{w(\langle X_i, X_i - 1\rangle)}{w(\langle X_i, X_i + 1\rangle) + w(\langle X_i, X_i - 1\rangle)},
$$
where $w(\langle i, i - 1\rangle)$ is the weight of the edge $\langle i, i - 1\rangle$.

An edge-reinforced random walk (ERRW) on $\mathbb{Z}$ is an edge-weighted random walk on $\mathbb{Z}$ in which, every time an edge is crossed, the weight of that edge increases, so that the process tends to revisit it. Thus an ERRW process $\vec X$ is not Markovian: the probability of traversing an edge depends on the number of times it has been traversed over the entire history. By analogy with the weighted random walk, let $w(n, i)$ be the weight of the edge $e_i = \langle i, i + 1\rangle$ at time $n$. We say that $\vec X$ traverses $e_i$ at time $n > 0$ if $(X_{n-1}, X_n) \in \{(i, i+1), (i+1, i)\}$. After the process traverses edge $e_i$ at time $n$, we add a nonnegative increment $a$ to $w(n, i)$, so that the weight of the edge at time $n + 1$ is $w(n+1, i) = w(n, i) + a$; otherwise $w(n+1, i) = w(n, i)$. Here are some notations.
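The ERRW dynamics just described can be simulated directly (a sketch with our own function name, seed, and parameters; initial weights $w_0 = 1$ and constant increment $a$ are illustrative choices). Since each step traverses exactly one edge and reinforces it by $a$, the total weight added across all edges after $n$ steps is exactly $a \cdot n$:

```python
import random
from collections import defaultdict

def errw(n_steps, a, rng, w0=1.0):
    """Edge-reinforced random walk on Z. Edge e_i = <i, i+1> starts with
    weight w0; every traversal of an edge adds the increment a to its weight."""
    w = defaultdict(lambda: w0)        # weight of edge e_i, keyed by i
    x, path = 0, [0]
    for _ in range(n_steps):
        right, left = w[x], w[x - 1]   # edges <x, x+1> and <x-1, x>
        if rng.random() < right / (right + left):
            w[x] += a                  # reinforce the crossed edge
            x += 1
        else:
            w[x - 1] += a
            x -= 1
        path.append(x)
    return path, w

rng = random.Random(3)
path, w = errw(n_steps=1000, a=1.0, rng=rng)
# Invariant: exactly one edge is reinforced per step, so the total weight
# added over all touched edges equals a * n_steps.
added = sum(wt - 1.0 for wt in w.values())
```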
Definition. A walk is of sequence type if there is a sequence $\vec a = (a_1, a_2, \dots)$ of nonnegative numbers such that at the $k$th crossing of an edge it is reinforced with $a_k$. Thus, if $\phi(n, j)$ is the number of times $(X_0, X_1, \dots, X_n)$ crosses $e_j$, then