Lévy Measure Decompositions for the Beta and Gamma Processes

Yingjian Wang [email protected]
Lawrence Carin [email protected]
Electrical and Computer Engineering Department, Duke University, Durham NC 27708

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

Abstract

We develop new representations for the Lévy measures of the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions. Further, we demonstrate how these infinite sums may be truncated in practice, and explicitly characterize truncation errors. We also perform an analysis of the characteristics of posterior distributions, based on the proposed decompositions. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two. This paper is meant to provide a rigorous foundation for and new perspectives on Lévy processes, as these are of increasing importance in machine learning.

1. Introduction

A prominent distinction of nonparametric methods relative to parametric approaches is the utilization of stochastic processes rather than distributions. For example, a Gaussian process (Rasmussen & Williams, 2006) may be employed to nonparametrically represent general smooth functions on a continuous space of covariates (e.g., time). Recently the idea of nonparametric methods has been extended to feature learning and data clustering, with interest respectively in the beta-Bernoulli process (Thibaux & Jordan, 2007) and the Dirichlet process (Ferguson, 1973). In such processes the nonparametric aspect concerns the number of features/clusters, which are allowed to be unbounded ("infinite"), permitting the model to adapt the number of these entities as the given and future data indicate. The increasing importance of these models warrants a detailed theoretical analysis of their properties, as well as simple constructions for their implementation. In this paper we focus on Lévy processes (Sato, 1999), which are of increasing interest in machine learning.

A family of Lévy processes, the pure-jump nondecreasing Lévy processes, also fits into the category of completely random measures proposed by Kingman (Kingman, 1967). The beta process (Hjort, 1990) is an example of such a process, and is applied in nonparametric feature learning. The gamma process falls in this family as well, with its normalization the well-known Dirichlet process. Hierarchical forms of such models have become increasingly popular in machine learning (Teh et al., 2006; Teh, 2006; Thibaux & Jordan, 2007), as have nested models (Blei et al., 2010) and models that introduce covariate dependence (MacEachern, 1999; Williamson et al., 2010; Lin et al., 2010).

As a consequence of the important role these models are playing in machine learning, there is a need to study the properties of pure-jump nondecreasing Lévy processes. As examples of such work, (Thibaux & Jordan, 2007) and (Paisley et al., 2010) present explicit constructions for generating the beta process, (Teh et al., 2007) derives a construction for the Indian buffet process parallel to the stick-breaking construction of the Dirichlet process (Sethuraman, 1994), and (Thibaux, 2008) obtains a construction for the gamma process in the gamma-Poisson context. Apart from these specialized construction methods, (Kingman, 1967) proposes a general construction method for completely random measures, by first decomposing the measure into a sum of a countable number of σ-finite measures, and then superposing the Poisson processes associated with these sub-measures. By regarding the completely random measure as a Lévy process, this method corresponds to decomposing the Lévy measure, which provides clarity of theoretical properties and simplicity in practical implementation. However, this Lévy measure decomposition method has not yet come into wide use in machine learning, probably due to the nonexistence of a universal construction of the measure decomposition.

In this paper we develop explicit and simple decompositions, following the conjugacy principle, for two widely used Lévy processes: the beta and gamma processes. Conjugacy here means that the decompositions are manifested by leveraging the forms of likelihoods conjugate to the Lévy measures. The decompositions bring new perspectives on the beta and gamma processes, with associated properties analyzed here in detail. The decompositions are constituted in terms of an infinite set of sub-processes of a form convenient for computation. Since the number of sub-processes is infinite, a truncation analysis is also presented, of interest for practical use. We show some posterior properties of such decompositions, with the beta process as an example. We also extend the decomposition to the symmetric gamma process (positive and negative jumps), suggesting that the Lévy measure decomposition is applicable to other pure-jump Lévy processes represented by their Lévy measures. Summarizing the main contributions of the paper:

• We constitute Lévy measure decompositions for the beta, stable-beta, gamma, generalized gamma and symmetric gamma processes via the principle of conjugacy, providing new perspectives on these processes.

• The decomposition of the beta process unifies the constructions in (Thibaux & Jordan, 2007), (Teh & Görür, 2009) and, with a different decomposing method, (Paisley et al., 2010); and a new generative construction for the gamma process and its variations is derived.

• Truncation analyses and posterior properties of such decompositions are presented for practical use.

2. Background

Lévy processes (Sato, 1999) and completely random measures (Kingman, 1967) are two closely related concepts. Specifically, some Lévy processes can be regarded as completely random measures. In this section brief reviews of, and connections between, these two important concepts are presented.

2.1. Lévy process

A Lévy process X(ω) is a stochastic process with independent increments on a measure space (Ω, F). Ω is usually taken to be one-dimensional, such as the real line, to represent a process with variation over time. By the Lévy-Itô decomposition (Sato, 1999), a Lévy process can be decomposed into a continuous Brownian motion with drift and a discrete pure-jump part. When a Lévy process X(ω) has only the discrete part and its jumps are positive, then for any A ∈ F the characteristic function of the random variable X(A) is given by

E{e^{juX(A)}} = exp{ ∫_{R+ × A} (e^{jup} − 1) ν(dp, dω) }    (1)

with ν satisfying the integrability condition (Sato, 1999). The expression in (1) defines a category of pure-jump nondecreasing Lévy processes, including most of the Lévy processes currently used in nonparametric Bayesian methods, such as the beta, gamma, Bernoulli and negative binomial processes. With (1), such a Lévy process can be regarded as a Poisson point process on the product space R+ × Ω with mean measure ν, called the Lévy measure. On the other hand, if the increments of X(ω) on any measurable set A ∈ F are regarded as a random measure assigned to that set, then X(ω) is also a completely random measure. Due to this equivalence, in the following discussion we will not discriminate between the pure-jump nondecreasing Lévy process X and its corresponding completely random measure Φ.
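As an aside (ours, not from the paper), the Poisson point process view of (1) is easy to exercise numerically whenever ν happens to have finite total mass. The minimal NumPy sketch below assumes the illustrative finite Lévy measure ν(dp, dω) = λ Gamma(p; 2, 1) dp dω on Ω = [0, 1]; it simulates X(Ω) and checks E[X(Ω)] = ∫ p ν(dp, dω) (Campbell's theorem):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, shape, scale = 5.0, 2.0, 1.0  # assumed nu: lam * Gamma(p; 2, 1) dp on [0, 1]

    def draw_X():
        n = rng.poisson(lam)                     # number of Poisson points over [0, 1]
        return rng.gamma(shape, scale, n).sum()  # X([0, 1]) = sum of the jumps p

    draws = [draw_X() for _ in range(20000)]
    print(np.mean(draws), lam * shape * scale)   # Monte Carlo mean vs. 10.0

The beta and gamma Lévy measures considered below have infinite total mass near zero jump size, so this recipe does not apply to them directly; the decompositions developed in this paper rewrite ν as a countable sum of finite measures, each of which can be simulated exactly as above.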

2.2. Completely random measure

A random measure Φ on a measure space (Ω, F) is termed "completely random" if for any disjoint sets A_1, A_2 ∈ F the random variables Φ(A_1) and Φ(A_2) are independent. A completely random measure Φ can be split into three independent components:

Φ = Φ_f + Φ_d + Φ_o    (2)

where Φ_f = Σ_{ω∈I} φ(ω) δ_ω is the fixed component, with the atoms in I fixed and the jumps φ(ω) random; I is a countable set in F. The deterministic component Φ_d is a deterministic measure on (Ω, F). Φ_f and Φ_d are relatively less interesting compared to the third component Φ_o, which is called the ordinary component of Φ. According to (Kingman, 1967), Φ_o is discrete, with both random atoms and random jumps.

In (Kingman, 1967), it is noted that Φ_o can be further split into a countable number of independent parts:

Φ_o = Σ_k Φ_k,    Φ_k = Σ_{(φ(ω),ω)∈Π_k} φ(ω) δ_ω    (3)

Denote by ν the Lévy measure of (the Lévy process corresponding to) Φ_o, by ν_k the Lévy measure of Φ_k, by Π a Poisson process with ν its mean measure, and by Π_k a Poisson process with ν_k its mean measure; (3) then yields

ν = Σ_k ν_k,    Π = ∪_k Π_k    (4)

which provides a constructive method for Φ_o: first construct the Poisson process Π_k underlying each Φ_k, and then, by the superposition theorem (Kingman, 1993), the union of the Π_k is a realization of Φ_o. In the following sections we show how this general construction method of (4) can be applied to pure-jump nondecreasing Lévy processes of increasing interest in machine learning, with an emphasis on the beta and gamma processes and their generalizations.

3. Beta process

A beta process (Hjort, 1990) is a Lévy process with beta-distributed increments; B ∼ BP(c(ω), µ) is a beta process if

B(dω) ∼ Beta(c(ω)µ(dω), c(ω)(1 − µ(dω)))    (5)

where µ is the base measure on the measure space (Ω, F) and the positive function c(ω) is the concentration function. Expression (5) indicates that the increments of the beta process are independent, which makes it a special case of the Lévy process family. The Lévy measure of the beta process is

ν(dπ, dω) = c(ω) π^{−1} (1 − π)^{c(ω)−1} dπ µ(dω)    (6)

where Beta(0, c(ω)) = c(ω) π^{−1} (1 − π)^{c(ω)−1} is an improper beta distribution, since its integral over (0, 1) is infinite. As a result, the underlying Poisson process, i.e., the Poisson process with ν as its mean measure on the product space Ω × (0, 1), denoted Π, has an infinite number of points drawn from ν, yielding

B = Σ_{i=1}^∞ π_i δ_{ω_i}    (7)

where π_i is the jump (increment) occurring at the atom ω_i. The real variable γ = µ(Ω) is termed the mass parameter of B, and we assume γ < ∞.

3.1. Beta process Lévy measure decomposition

The infinite integral of the improper beta distribution inspires a decomposition of the improper distribution into an infinite number of proper distributions. The singularity of the improper beta distribution is manifested in π^{−1}. Since π ∈ (0, 1), the geometric series expansion yields

π^{−1} = Σ_{k=0}^∞ (1 − π)^k,    π ∈ (0, 1)    (8)

and substituting (8) into (6), with manipulation detailed in the Supplementary Material, we have the Lévy measure decomposition theorem of the beta process:

Theorem 1. For a beta process B ∼ BP(c(ω), µ) with base measure µ and concentration c(ω), denote by Π its underlying Poisson process and by ν its Lévy measure; then B and Π can be expressed as

Π = ∪_{k=0}^∞ Π_k,    B = Σ_{k=0}^∞ B_k    (9)

where B_k is a Lévy process with Π_k its underlying Poisson process. The Lévy measures ν_k of the B_k constitute a decomposition of ν:

ν = Σ_{k=0}^∞ ν_k
ν_k(dπ, dω) = Beta(1, c(ω) + k) dπ µ_k(dω)    (10)
µ_k(dω) = [c(ω)/(c(ω) + k)] µ(dω)

where Beta(1, c(ω) + k) is the PDF of the beta distribution with parameters 1 and c(ω) + k.

Theorem 1 is the beta process instantiation of the completely random measure decomposition in (4): the underlying Poisson process Π of the beta process B is the superposition of an infinite number of independent Poisson processes {Π_k}_{k=0}^∞, with ν_k the mean measure of Π_k and µ_k the mean measure of the restriction of Π_k to Ω. As a result, the beta process B can be expressed as a sum of an infinite number of independent Lévy processes {B_k}_{k=0}^∞, with {Π_k}_{k=0}^∞ the underlying Poisson processes. The independence of {Π_k}_{k=0}^∞ and {B_k}_{k=0}^∞ w.r.t. the index k is justified by the fact that both µ and c(ω) are fixed parameters.

3.2. The Lévy process B_k

It is interesting to study the properties of B_k, such as its expectation and variance. Denoting by B̄_k(dω) = [1/(c(ω) + k + 1)] µ_k(dω) the base measure of B_k, for any A ∈ F:

E(B_k(A)) = ∫_A B̄_k(dω) = B̄_k(A)
Var(B_k(A)) = ∫_A [2/(c(ω) + k + 2)] B̄_k(dω)    (11)

It is noteworthy that the Lévy process B_k is no longer a beta process, since (5) is not satisfied. By Theorem 1, the jumps of B_k follow a proper beta distribution parameterized by the concentration function c(ω) and the index k, and µ_k determines the locations where the jumps happen.
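As a quick numerical sanity check of Theorem 1 (our illustration, assuming a homogeneous concentration c(ω) = c), the weighted proper beta densities in (10) can be partially summed over k and compared with the improper density in (6):

    import numpy as np

    c, K = 2.0, 500
    pi = np.array([0.05, 0.2, 0.5, 0.9])
    k = np.arange(K + 1)[:, None]

    # Beta(1, c + k) PDF at pi, weighted by the mass ratio c / (c + k) from (10)
    partial = ((c + k) * (1 - pi) ** (c + k - 1) * c / (c + k)).sum(axis=0)
    improper = c / pi * (1 - pi) ** (c - 1)    # integrand of (6)
    print(np.max(np.abs(partial - improper)))  # ~0, reflecting the expansion (8)

The agreement becomes exact as K → ∞, since each weighted term reduces to c(1 − π)^{c+k−1} and the sum collapses through the geometric series (8).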

Since the {B_k}_{k=0}^∞ are independent w.r.t. the index k, with Theorem 1:

Σ_{k=0}^∞ E(B_k(A)) = E(B(A)),    Σ_{k=0}^∞ Var(B_k(A)) = Var(B(A))    (12)

The detailed procedure to derive (11) and (12) is given in the Supplementary Material.

3.3. Simulating the beta process

3.3.1. Poisson superposition simulation

Theorem 1 reveals that the underlying Poisson process of a beta process is a superposition of an infinite number of Poisson processes, each of which has a finite set of atoms. This perspective also provides a simulation procedure for the beta process: first, the Poisson process Π_k is sampled for each k = 0, 1, 2, ... (here we term the index k the "round" of the simulation); then the union of the samples of the Π_k is taken as a realization of the Poisson process Π. With the marking theorem (Kingman, 1993) implicitly applied, the simulation procedure for the beta process is as follows.

Simulation procedure: For round k:

1: Sample the number of points of Π_k: n_k ∼ Poisson(∫_Ω µ_k(dω));
2: Sample n_k points from µ_k: ω_ki ∼ i.i.d. µ_k/∫_Ω µ_k(dω), for i = 1, 2, ..., n_k;
3: Sample B_k(ω_ki) ∼ i.i.d. Beta(1, c(ω_ki) + k), for i = 1, 2, ..., n_k.

Then the union ∪_{k=0}^∞ {(ω_ki, B_k(ω_ki))}_{i=1}^{n_k} is a realization of Π (and equivalently of B).

We refer to the above simulation procedure as the Poisson superposition simulation, for the central role of the Poisson superposition. An especially convenient case is when the beta process is homogeneous, i.e., c(ω) = c is a constant. In this case the {ω_ki}_{i=1}^{n_k} for all rounds k are drawn from the same distribution µ/γ, and n_k is drawn from Poisson(cγ/(c + k)).
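The procedure above is straightforward to implement. Below is a minimal NumPy sketch (ours, for illustration) of the homogeneous case, assuming a uniform base measure µ on [0, 1] with mass γ and truncating at round K, as analyzed in Section 3.4:

    import numpy as np

    def sample_beta_process(c=2.0, gamma=5.0, K=50, seed=0):
        rng = np.random.default_rng(seed)
        atoms, jumps = [], []
        for k in range(K + 1):                        # round k
            n_k = rng.poisson(c * gamma / (c + k))    # step 1: number of points
            atoms.append(rng.uniform(0.0, 1.0, n_k))  # step 2: omega_ki ~ mu/gamma
            jumps.append(rng.beta(1.0, c + k, n_k))   # step 3: Beta(1, c + k) jumps
        return np.concatenate(atoms), np.concatenate(jumps)

    omega, pi = sample_beta_process()
    # E[B(Omega)] = gamma; truncation at round K leaves about gamma*c/(c+K+1) mass out
    print(len(pi), pi.sum())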
For round k, both the number of points and the jumps statistically diminish as k increases, suggesting that the infinite sum in (9) may be truncated as B = Σ_{k=0}^K B_k for large K, with minimal impact. Such truncation effects are investigated in detail in Section 3.4.

3.3.2. Related work

In (Thibaux & Jordan, 2007) the authors derived the above simulation procedure for the homogeneous case within the beta-Bernoulli process context; it is shown here to be a necessary result of the Lévy measure decomposition. The same decomposing manipulation of Theorem 1 can also be applied to the stable-beta process (Teh & Görür, 2009), which yields

ν_k = Beta(1 − σ, c(ω) + σ + k) dπ · [Γ(c(ω) + σ + k) Γ(c(ω) + 1)] / [Γ(c(ω) + k + 1) Γ(c(ω) + σ)] µ(dω)    (13)

It is noteworthy that the decomposition procedure described in Theorem 1 is not the only Lévy measure decomposing method for the beta process. The work of (Paisley & Jordan, 2012) and (Broderick et al., 2011) shows that the stick-breaking construction of the beta process in (Paisley et al., 2010) is indeed the result of another way of decomposing the Lévy measure of the beta process. We next analyze the truncation property of the construction described in Section 3.3.1 and make a comparison with the construction of the beta process in (Paisley et al., 2010).
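The stable-beta decomposition (13) admits the same kind of numerical check as Theorem 1. In the sketch below (ours, assuming homogeneous c and using SciPy's log-gamma functions), the partial sums of (13) approach the stable-beta Lévy density of (Teh & Görür, 2009), proportional to π^{−σ−1}(1 − π)^{c+σ−1}:

    import numpy as np
    from scipy.special import betaln, gammaln

    c, sigma, K = 2.0, 0.3, 2000
    pi = np.array([0.1, 0.4, 0.8])
    k = np.arange(K + 1)[:, None]

    # log Beta(1 - sigma, c + sigma + k) PDF plus the log of the weight in (13)
    logpdf = (-betaln(1 - sigma, c + sigma + k)
              - sigma * np.log(pi) + (c + sigma + k - 1) * np.log(1 - pi))
    logw = (gammaln(c + sigma + k) + gammaln(c + 1)
            - gammaln(c + k + 1) - gammaln(c + sigma))
    partial = np.exp(logpdf + logw).sum(axis=0)

    target = (np.exp(gammaln(c + 1) - gammaln(1 - sigma) - gammaln(c + sigma))
              * pi ** (-1 - sigma) * (1 - pi) ** (c + sigma - 1))
    print(np.max(np.abs(partial - target)))   # ~0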

3.4. Truncation analysis

Since the Poisson superposition simulation operates in rounds, it is natural to analyze the distance between the true beta process B and its truncation Σ_{k=0}^K B_k at round K. A metric for such a distance is the L1 norm:

||B − Σ_{k=0}^K B_k||_1 = E|B − Σ_{k=0}^K B_k| = ∫_Ω µ_{K+1}(dω)/γ    (14)

The expectation in (14) is w.r.t. the normalized measure ν/γ, which yields ||B||_1 = 1. When B is homogeneous, (14) reduces to c/(c + K + 1), which indicates that the L1 distance decreases at a rate of O(1/K). For the stick-breaking construction of the beta process described in (Paisley et al., 2010), the L1 distance is (c/(c + 1))^{K+1}.

Another metric is the L1 distance between marginal likelihoods of a set of data b = b_{1:M}, where m_∞(b) denotes the marginal likelihood (here the likelihood is a Bernoulli process) under the prior B, and m_K(b) that under Σ_{k=0}^K B_k. This metric was applied to the truncated Indian buffet process (Doshi et al., 2009) and the truncated stick-breaking construction of the beta process (Paisley & Jordan, 2012), and it yields

(1/4) ∫ |m_∞(b) − m_K(b)| db ≤ Pr(∃ k > K, 1 ≤ i ≤ n_k, 1 ≤ m ≤ M, s.t. b^m_{ki} = 1)    (15)

where b_{1:M} ∼ i.i.d. BeP(B) are drawn from a Bernoulli process with base measure B, and b^m_{ki} = b_m(ω_ki) is the m-th realization of the Bernoulli process at atom ω_ki. For the truncation Σ_{k=0}^K B_k it can be shown that the RHS of (15) is bounded by

RHS of (15) ≤ 1 − exp(−M ∫_Ω µ_{K+1}(dω))    (16)

For the homogeneous case, the bound in (16) is 1 − exp(−Mγc/(c + K + 1)). For the stick-breaking construction of the beta process, the bound is 1 − exp(−Mγ(c/(c + 1))^{K+1}) (Paisley & Jordan, 2012).

In order to analyze the bound w.r.t. the truncation level in terms of the number of atoms, denote by I_K = Σ_{k=0}^K n_k the total number of atoms in Σ_{k=0}^K B_k. Since K ∼ O(e^{E(I_K)/(cγ)}), it can be proved that (14) and the bound in (16) decrease at a faster rate w.r.t. I_K than for the stick-breaking construction of the beta process. This indicates that the simulation procedure described in Section 3.3.1 follows a steeper statistically-decreasing order. The proof is presented in the Supplementary Material.

3.5. Posterior estimation

The goal of inference is to estimate the beta process B from a set of observed data b, with prior BP(c, µ). The data b = b_{1:M} are the same as in Section 3.4, and can be expressed as

b_m = Σ_{i=1}^∞ b_{i,m} δ_{ω_i},    m = 1, 2, ..., M    (17)

where each b_{i,m} ∈ {0, 1}.

3.5.1. Posterior of B_k

Since B|b ∼ BP(c + M, cµ/(c + M) + Σ_{m=1}^M b_m/(c + M)) (Thibaux & Jordan, 2007), the base measure of B|b is a measure with positive masses assigned to single atoms. Theorem 1 is still applicable to this beta process with a mixed type of base measure, which yields

B' = Σ_{k=0}^∞ B'_k
ν'_k = Beta(1, c + M + k) µ'_k    (18)
µ'_k = cµ/(c + M + k) + Σ_{m=1}^M b_m/(c + M + k)

where B', B'_k, ν'_k and µ'_k are the posterior counterparts of B, B_k, ν_k and µ_k.

3.5.2. Posterior estimation of π_i

Since each µ'_k has a mass Σ_{m=1}^M b_{i,m}/(c + M + k) at the atom ω_i, each B'_k contributes Poisson(Σ_{m=1}^M b_{i,m}/(c + M + k)) draws at the atom ω_i, with the jumps following the distribution Beta(1, c + M + k); their sum is π_i. Thus the posterior estimate of π_i is given by

π_i | b = Σ_{k=0}^∞ Σ_{h=1}^{H_k} b_{kh}
H_k ∼ Poisson(Σ_{m=1}^M b_{i,m}/(c + M + k))    (19)
b_{kh} ∼ Beta(1, c + M + k)

from which it can be verified that E(π_i|b) = Σ_{m=1}^M b_{i,m}/(c + M), the same as the posterior of π_i without the decomposition: Beta(Σ_{m=1}^M b_{i,m}, c + M − Σ_{m=1}^M b_{i,m}).

For a π_i with no observations, i.e., Σ_{m=1}^M b_{i,m} = 0, only a particular B'_k will contribute to π_i. In this case, first the round k to which π_i belongs is drawn, and then π_i is drawn from the beta distribution of that round:

π_i ∼ Beta(1, c + M + k)
k ∼ MP(α),    α ∝ Σ_{k=0}^∞ [1/(c + M + k)] δ_k    (20)

where MP(α) is a multinomial process with probability vector α, and α is proportional to the average number of points in each round. Since in practice α is always truncated at some level K, by the analysis in Section 3.4, (20) provides a way to estimate the π_i within the first K rounds. The π_i in each round are of statistically different importance, in contrast to the evenly assigned mass in the Indian buffet process.

3.6. Relating the IBP and beta process

The study of the beta process through its Lévy measure, as discussed in this paper, also uncovers a connection between the Indian buffet process (IBP) (Griffiths & Ghahramani, 2005) and the beta process, via their Lévy measures. The IBP with prior π_i ∼ Beta(cγ/N, c) can be regarded as a Lévy process with Lévy measure

ν_IBP = (N/γ) Beta(cγ/N, c) dπ µ(dω)    (21)

where N here plays the role of the K in (Griffiths & Ghahramani, 2005). It can be proved that

ν_IBP → ν as N → ∞    (22)

which indicates that the beta process is the limit of the IBP as N → ∞. The detailed proof of (22) is presented in the Supplementary Material. Thus the IBP is like a "mosaic" approximation of the beta process, which becomes finer as N increases.
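The limit (22) is also easy to see numerically. The sketch below (ours, with illustrative c = 2 and γ = 1) evaluates the rescaled beta density of (21) and watches it approach the improper beta density of (6) pointwise as N grows:

    import numpy as np
    from scipy.special import betaln

    c, gam = 2.0, 1.0
    pi = np.array([0.1, 0.5, 0.9])
    bp = c / pi * (1 - pi) ** (c - 1)       # beta process Levy density, cf. (6)

    for N in [10, 100, 10000]:
        a = c * gam / N                     # IBP jump prior Beta(c*gam/N, c)
        ibp = (N / gam) * np.exp(-betaln(a, c) + (a - 1) * np.log(pi)
                                 + (c - 1) * np.log(1 - pi))
        print(N, np.max(np.abs(ibp - bp)))  # gap shrinks as N grows, cf. (22)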

4. Gamma process

A gamma process (Applebaum, 2009) is a Lévy process with independent gamma increments. The gamma process is traditionally parameterized with a shape measure and a scale function: G ∼ ΓP(α, θ(ω)), where α is the shape measure on a measure space (Ω, F) and the scale θ(ω) is a positive function. A gamma process can be intuitively defined by its increments on infinitesimal sets:

G(dω) ∼ Gamma(α(dω), θ(ω))    (23)

When θ(ω) = θ is a scalar, the gamma process is called homogeneous. The gamma process can also be expressed in a form with a base measure G_0 and a concentration c(ω), with c = 1/θ and G_0 = θα (Jordan, 2009), to conform with other stochastic processes widely used in machine learning, such as the Dirichlet process. However, the discussion in this paper sticks to the traditional form given by (23).

As a pure-jump Lévy process, the gamma process can be regarded as a Poisson process on the product space Ω × R+ with mean measure ν:

ν(dp, dω) = p^{−1} e^{−p/θ(ω)} dp α(dω)    (24)

where Gamma(0, θ(ω)) = p^{−1} e^{−p/θ(ω)} is an improper gamma distribution with an infinite integral on R+, which yields the expression of G:

G = Σ_{i=1}^∞ p_i δ_{ω_i}    (25)

4.1. Lévy measure decomposition

Like the beta process, the Lévy measure of the gamma process is characterized by an improper distribution. However, unlike the beta process, the decomposition of the Lévy measure of the gamma process comes from the exponential part. With the details shown in the Supplementary Material, the gamma process G can be decomposed into two parts:

G = Γ_1 + ΓP(α, θ(ω)/2)    (26)

The second term in (26) is a gamma process with the same shape measure and half the scale of the gamma process G; the first term Γ_1 is a Lévy process with Lévy measure Σ_{h=1}^∞ Gamma(h, θ(ω)/2) dp α(dω)/(2^h h). Here Gamma(h, θ(ω)/2) is the PDF of the gamma distribution with shape parameter h and scale parameter θ(ω)/2.

Further decomposing the exponential part of the gamma process ΓP(α, θ(ω)/2) in (26) yields G = Γ_1 + Γ_2 + ΓP(α, θ(ω)/3), bearing a gamma process with the same shape and with the scale parameter further decreased. Repeating this manipulation, we obtain Theorem 2:

Theorem 2. A gamma process G ∼ ΓP(α, θ(ω)) with shape measure α and scale θ(ω) can be decomposed as

G = Σ_{k=1}^∞ Γ_k,    Γ_k = Σ_{h=1}^∞ Γ_{kh},    ν_k = Σ_{h=1}^∞ ν_{kh}    (27)
ν_{kh} = Gamma(h, θ(ω)/(k + 1)) dp α(dω)/((k + 1)^h h)

with Γ_k, Γ_{kh} Lévy processes and ν_k, ν_{kh} their Lévy measures.

Theorem 2 is the gamma process instantiation of (4): G can be expressed as the sum of an infinite number of Lévy processes Γ_k, k = 1, 2, ..., where each Γ_k is in turn the sum of an infinite number of Lévy processes Γ_{kh}, h = 1, 2, ....

4.2. Lévy processes Γ_k and Γ_{kh}

In order to obtain further insight into the gamma process G in Theorem 2, the expectations and variances of Γ_k and Γ_{kh} on any measurable set A ∈ F are given:

E(Γ_{kh}(A)) = [1/(k + 1)^{h+1}] ∫_A θ(ω) α(dω)
E(Γ_k(A)) = [1/(k(k + 1))] ∫_A θ(ω) α(dω)    (28)

For the variances of Γ_k and Γ_{kh}:

Var(Γ_{kh}(A)) = [(h + 1)/(k + 1)^{h+2}] ∫_A θ²(ω) α(dω)
Var(Γ_k(A)) = [1/k² − 1/(k + 1)²] ∫_A θ²(ω) α(dω)    (29)

Since the Lévy processes Γ_k are independent w.r.t. k, in analogy to (12) it can be verified that the expectations and variances of the Γ_k sum to the expectation and variance of G. The derivations in this section are presented in the Supplementary Material.

4.3. Simulation of gamma process

Parallel to the simulation of the beta process in Section 3.3.1, a simulation procedure for the gamma process is presented.

Simulation procedure: Sample the Lévy process Γ_{kh}:

1: Sample the number of points of Γ_{kh}: n_{kh} ∼ Poisson(γ/((k + 1)^h h));
2: Sample n_{kh} points from α: ω_{khi} ∼ i.i.d. α/γ, for i = 1, 2, ..., n_{kh};
3: Sample Γ_{kh}(ω_{khi}) ∼ i.i.d. Gamma(h, θ(ω_{khi})/(k + 1)), for i = 1, 2, ..., n_{kh};

where γ = ∫_Ω α(dω) is the mass of the shape measure. Then the union ∪_{k=1}^∞ ∪_{h=1}^∞ {(ω_{khi}, Γ_{kh}(ω_{khi}))}_{i=1}^{n_{kh}} is a realization of the gamma process G. An advantage of the above simulation procedure, compared to the simulation procedure of the beta process in Section 3.3.1, is that independent of whether the gamma process is homogeneous or inhomogeneous, ω_{khi} is always drawn from the fixed distribution α/γ. As with the beta process construction in Section 3.3.1, for the gamma process simulation procedure the expected number of new points and the expected jumps decrease as k increases, again suggesting accurate truncation.
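As with the beta process, the procedure is direct to implement. The minimal NumPy sketch below (ours, for illustration) assumes a homogeneous scale θ, a uniform shape measure α on [0, 1] with mass γ, and truncation levels K and H, as analyzed in Section 4.4 below:

    import numpy as np

    def sample_gamma_process(gam=5.0, theta=1.0, K=50, H=30, seed=0):
        rng = np.random.default_rng(seed)
        atoms, jumps = [], []
        for k in range(1, K + 1):
            for h in range(1, H + 1):
                n = rng.poisson(gam / ((k + 1) ** h * h))       # step 1
                atoms.append(rng.uniform(0.0, 1.0, n))          # step 2: alpha/gam
                jumps.append(rng.gamma(h, theta / (k + 1), n))  # step 3
        return np.concatenate(atoms), np.concatenate(jumps)

    omega, p = sample_gamma_process()
    # E[G(Omega)] = gam * theta = 5; truncating k at K leaves ~1/(K+1) of the mass out
    print(p.sum())

The symmetric gamma variant of Section 4.5 would differ only by attaching a random sign to each sampled jump.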
4.4. Truncation analysis

Since in the simulation procedure of Section 4.3 the indices k and h both go to infinity, it is practical to analyze the distance between the true gamma process and the truncated one. To measure such a distance, we apply the L1 norm described in Section 3.4:

||G − Σ_{k=1}^K Σ_{h=1}^H Γ_{kh}||_1 = E|G − Σ_{k=1}^K Σ_{h=1}^H Γ_{kh}|    (30)

where the expectation in (30) is w.r.t. the normalized measure ν/∫_Ω θ(ω)α(dω), so that ||G||_1 = 1, and K and H are the truncation levels of k and h. For the situation with H = ∞:

||G − Σ_{k=1}^K Σ_{h=1}^∞ Γ_{kh}||_1 = 1/(K + 1)    (31)

which indicates an O(1/K) decreasing rate, the same as that of the truncated beta process shown in (14). It is noteworthy that Γ_1 alone accounts for, on average, half the mass of G. When H is finite, a remaining distance Σ_{k=1}^K 1/(k(k + 1)^{H+1}) is added.

4.5. Generalized gamma process and symmetric gamma process

Theorem 2 can be easily extended to some variations of the gamma process. Here we give the examples of the generalized gamma process (Brix, 1999) and the symmetric gamma process (Çinlar, 2010).

The generalized gamma process extends the ordinary gamma process by adding a parameter 0 < σ < 1; its Lévy measure is [1/Γ(1 − σ)] p^{−σ−1} e^{−p/θ(ω)} dp α(dω). With the same decomposition procedure, it is straightforward that the Lévy measure for the Γ_{kh} of the generalized gamma process changes to ν_{kh} = Gamma(h − σ, θ(ω)/(k + 1)) dp α(dω)/(Γ(1 − σ)(k + 1)^h h).

The symmetric gamma process is a Lévy process whose increments are the differences of two gamma-distributed variables with the same law; its Lévy measure is |p|^{−1} e^{−|p|/θ(ω)} dp α(dω). Since there can be negative increments, the symmetric gamma process is not a completely random measure. However, the same decomposition procedure is still applicable, yielding ν_{kh} = Gamma(|p|; h, θ(ω)/(k + 1)) dp · 2α(dω)/((k + 1)^h h), where the distribution Gamma(|p|; h, θ(ω)/(k + 1)) means that |p| is first drawn from Gamma(h, θ(ω)/(k + 1)) and the sign of p is then decided by a symmetric Bernoulli distribution.

5. Conclusions

The Lévy measure decompositions of the beta and gamma processes provide new perspectives on these two widely used stochastic processes, by casting insight on the sub-processes constituting them, here the B_k and Γ_k. The decomposition prescriptions described here are far from the only ways of performing such decompositions. Theoretically elegant construction methods are derived from the proposed decompositions, and these are directly implementable in practice.

We have applied the proposed beta and gamma representations in numerical experiments, the details of which are omitted, as this paper focuses on foundational properties. However, to briefly summarize experience with such representations, consider for example the image inpainting problem considered in (Zhou et al., 2009), based upon a beta process factor analysis model (Paisley & Carin, 2009). In experiments we performed with such a model, using a Gibbs sampler, the beta process prior was implemented using the procedure discussed in Section 3.3.1, with the posterior estimation of Section 3.5 applied for inference. The proposed representation infers a dictionary with the "important" dictionary elements captured by the low-index members (see the discussion in Section 3.3.1). The model prioritized the first three dictionary elements as being pure colors, specifically red, green and blue, with the important structured dictionary elements following (and no other pure-color dictionary elements, while in (Zhou et al., 2009) many seemingly redundant pure-color dictionary elements are inferred). This "clean" inference of prioritized dictionary elements may be responsible for the higher PSNR we also observed in signal recovery, compared to the result given in (Zhou et al., 2009). The new gamma process construction in Section 4.3 may be implemented in a similar manner, and may be employed within recent models in machine learning in which the gamma process has been utilized (e.g., (Paisley et al., 2011)).

Acknowledgements

The research reported here was supported by ARO, NGA, ONR and DARPA (MSEE program).

References

Applebaum, D. Lévy Processes and Stochastic Calculus. Cambridge University Press, 2009.

Blei, D.M., Griffiths, T.L., and Jordan, M.I. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM, 57(2), 2010.

Brix, A. Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31:929–953, 1999.

Broderick, T., Jordan, M., and Pitman, J. Beta processes, stick-breaking, and power laws. Bayesian Analysis, 2011.

Çinlar, E. Probability and Stochastics. Graduate Texts in Mathematics. Springer, 2010.

Doshi, F., Miller, K.T., Van Gael, J., and Teh, Y.W. Variational inference for the Indian buffet process. In AISTATS, volume 12, 2009.

Ferguson, T. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1973.

Griffiths, T. and Ghahramani, Z. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.

Hjort, N.L. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 1990.

Jordan, M.I. Hierarchical models, nested models and completely random measures. In Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger. New York: Springer, 2009.

Kingman, J.F.C. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.

Kingman, J.F.C. Poisson Processes. Oxford University Press, Oxford, 1993.

Lin, D., Grimson, E., and Fisher, J. Construction of dependent Dirichlet processes based on Poisson processes. In NIPS, pp. 1396–1404, 2010.

MacEachern, S.N. Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science, 1999.

Paisley, J., Blei, D.M., and Jordan, M.I. Stick-breaking beta processes and the Poisson process. In AISTATS, 2012.

Paisley, J. and Carin, L. Nonparametric factor analysis with beta process priors. In ICML, 2009.

Paisley, J., Zaas, K., Woods, C., Ginsburg, G., and Carin, L. A stick-breaking construction of the beta process. In ICML, pp. 847–854, 2010.

Paisley, J., Wang, C., and Blei, D. The discrete infinite logistic normal distribution for mixed-membership modeling. In AISTATS, 2011.

Rasmussen, C. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.

Sato, K. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999.

Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 1994.

Teh, Y.W. A hierarchical Bayesian language model based on Pitman-Yor processes. In Coling/ACL, pp. 985–992, 2006.

Teh, Y.W. and Görür, D. Indian buffet processes with power-law behavior. In NIPS, 2009.

Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. Hierarchical Dirichlet processes. JASA, 101:1566–1581, 2006.

Teh, Y.W., Görür, D., and Ghahramani, Z. Stick-breaking construction for the Indian buffet process. In AISTATS, 2007.

Thibaux, R. Nonparametric Bayesian Models for Machine Learning. PhD thesis, EECS Dept., University of California, Berkeley, Oct 2008.

Thibaux, R. and Jordan, M.I. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.

Williamson, S., Orbanz, P., and Ghahramani, Z. Dependent Indian buffet processes. In AISTATS, 2010.

Zhou, M., Chen, H., Paisley, J., Ren, L., Sapiro, G., and Carin, L. Non-parametric Bayesian dictionary learning for sparse image representations. In NIPS, 2009.