Lévy Measure Decompositions for the Beta and Gamma Processes
Yingjian Wang, Lawrence Carin
Electrical and Computer Engineering Department, Duke University, Durham NC 27708

Abstract

We develop new representations for the Lévy measures of the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions. Further, we demonstrate how these infinite sums may be truncated in practice, and explicitly characterize truncation errors. We also perform an analysis of the characteristics of posterior distributions, based on the proposed decompositions. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two. This paper is meant to provide a rigorous foundation for and new perspectives on Lévy processes, as these are of increasing importance in machine learning.

1. Introduction

A prominent distinction of nonparametric methods relative to parametric approaches is the utilization of stochastic processes rather than probability distributions. For example, a Gaussian process (Rasmussen & Williams, 2006) may be employed to nonparametrically represent general smooth functions on a continuous space of covariates (e.g., time). Recently the idea of nonparametric methods has been extended to feature learning and data clustering, with interest respectively in the beta-Bernoulli process (Thibaux & Jordan, 2007) and the Dirichlet process (Ferguson, 1973). In such processes the nonparametric aspect concerns the number of features/clusters, which are allowed to be unbounded ("infinite"), permitting the model to adapt the number of these entities as the given and future data indicate. The increasing importance of these models in machine learning warrants a detailed theoretical analysis of their properties, as well as simple constructions for their implementation. In this paper we focus on Lévy processes (Sato, 1999), which are of increasing interest in machine learning.

A family of Lévy processes, the pure-jump nondecreasing Lévy processes, also fits into the category of completely random measures proposed by Kingman (Kingman, 1967). The beta process (Hjort, 1990) is an example of such a process, and is applied in nonparametric feature learning. The gamma process falls in this family as well, with its normalization being the well-known Dirichlet process. Hierarchical forms of such models have become increasingly popular in machine learning (Teh et al., 2006; Teh, 2006; Thibaux & Jordan, 2007), as have nested models (Blei et al., 2010) and models that introduce covariate dependence (MacEachern, 1999; Williamson et al., 2010; Lin et al., 2010).

As a consequence of the important role these models are playing in machine learning, there is a need to study the properties of pure-jump nondecreasing Lévy processes. As examples of such work, (Thibaux & Jordan, 2007) and (Paisley et al., 2010) present explicit constructions for generating the beta process, (Teh et al., 2007) derives a construction for the Indian buffet process parallel to the stick-breaking construction of the Dirichlet process (Sethuraman, 1994), and (Thibaux, 2008) obtains a construction for the gamma process in the gamma-Poisson context. Apart from these specialized construction methods, (Kingman, 1967) proposes a general construction method for completely random measures: first decompose the measure into a sum of a countable number of σ-finite measures, and then superpose the Poisson processes associated with these sub-measures. By regarding the completely random measure as a Lévy process, this method corresponds to decomposing the Lévy measure, which provides clarity of theoretical properties and simplicity in practical implementation. However, this Lévy measure decomposition method has not yet come into wide use in machine learning and statistics, probably due to the nonexistence of a universal construction of the measure decomposition.

In this paper we develop explicit and simple decompositions, following the conjugacy principle, for two widely used Lévy processes: the beta and gamma processes. The conjugacy means that the decompositions are manifested by leveraging the forms of conjugate likelihoods to the Lévy measures. The decompositions bring new perspectives on the beta and gamma processes, with associated properties analyzed here in detail. The decompositions are constituted in terms of an infinite set of sub-processes of a form convenient for computation. Since the number of sub-processes is infinite, a truncation analysis is also presented, of interest for practical use. We show some posterior properties of such decompositions, with the beta process as an example. We also extend the decomposition to the symmetric gamma process (positive and negative jumps), suggesting that the Lévy measure decomposition is applicable to other pure-jump Lévy processes represented by their Lévy measures. Summarizing the main contributions of the paper:

• We constitute Lévy measure decompositions for the beta, stable-beta, gamma, generalized gamma and symmetric gamma processes via the principle of conjugacy, providing new perspectives on these processes.

• The decomposition of the beta process unifies the constructions in (Thibaux & Jordan, 2007), (Teh & Görür, 2009), and (with a different decomposing method) (Paisley et al., 2010), and a new generative construction for the gamma process and its variations is derived.

• Truncation analyses and posterior properties for such decompositions are presented for practical use.

2. Background

Lévy processes (Sato, 1999) and completely random measures (Kingman, 1967) are two closely related concepts. Specifically, some Lévy processes can be regarded as completely random measures. In this section brief reviews and connections are presented for these two important concepts.

2.1. Lévy process

A Lévy process X(ω) is a stochastic process with independent increments on a measure space (Ω, F). Ω is usually taken to be one-dimensional, such as the real line, to represent a stochastic process with variation over time. By the Lévy–Itô decomposition (Sato, 1999), a Lévy process can be decomposed into a continuous Brownian motion with drift and a discrete pure-jump part. When a Lévy process X(ω) has only the discrete part and its jumps are positive, then for any A ∈ F the characteristic function of the random variable X(A) is given by

$$\mathbb{E}\{e^{juX(A)}\} = \exp\Big\{\int_{\mathbb{R}^{+}\times A}\big(e^{jup}-1\big)\,\nu(dp,\,d\omega)\Big\} \tag{1}$$

with ν satisfying the integrability condition (Sato, 1999). The expression in (1) defines a category of pure-jump nondecreasing Lévy processes, including most of the Lévy processes currently used in nonparametric Bayesian methods, such as the beta, gamma, Bernoulli, and negative binomial processes. With (1), such a Lévy process can be regarded as a Poisson point process on the product space R+ × Ω with mean measure ν, called the Lévy measure. On the other hand, if the increments of X(ω) on any measurable set A ∈ F are regarded as a random measure assigned to that set, then X(ω) is also a completely random measure. Due to this equivalence, in the following discussion we do not discriminate between the pure-jump nondecreasing Lévy process X and its corresponding completely random measure Φ.
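To make the Poisson-point-process reading of (1) concrete, the sketch below simulates X(A) in the simple case where the Lévy measure restricted to R+ × A has finite total mass, so the process has finitely many jumps on A. The particular measure used (rate lam0, exponential jump sizes, uniform atom locations) and the function name sample_X_of_A are hypothetical choices for illustration only, not quantities from the paper.

```python
# A minimal sketch (not from the paper): simulating X(A) through the
# Poisson-point-process view of (1), assuming a Levy measure whose restriction
# to R+ x A has finite total mass.  The measure used here,
#   nu(dp, dw) = lam0 * Exp(p; 1) dp * 1[0 <= w <= 1] dw,
# is a hypothetical choice made only so the normalized measure is easy to sample.
import numpy as np

rng = np.random.default_rng(0)

def sample_X_of_A(lam0=5.0, a=0.2, b=0.7):
    """Draw X(A) for A = [a, b] under the hypothetical finite-activity Levy measure."""
    mass = lam0 * (b - a)                              # nu(R+ x A), finite by construction
    n_jumps = rng.poisson(mass)                        # number of Poisson points in R+ x A
    jumps = rng.exponential(scale=1.0, size=n_jumps)   # jump sizes p_i ~ Exp(1)
    atoms = rng.uniform(a, b, size=n_jumps)            # atom locations w_i uniform on A
    return jumps.sum(), list(zip(jumps, atoms))

x_A, points = sample_X_of_A()
print(x_A, len(points))
```

Infinite-activity measures such as those of the beta and gamma processes place infinite mass near p = 0, which is why the decomposition and truncation arguments developed in this paper are needed before a construction of this kind can be applied.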
2.2. Completely random measure

A random measure Φ on a measure space (Ω, F) is termed "completely random" if for any disjoint sets A1, A2 ∈ F the random variables Φ(A1) and Φ(A2) are independent. A completely random measure Φ can be split into three independent components:

$$\Phi = \Phi_f + \Phi_d + \Phi_o \tag{2}$$

where Φ_f = Σ_{ω∈I} φ(ω)δ_ω is the fixed component, with the atoms in I fixed and the jumps φ(ω) random; I is a countable set in F. The deterministic component Φ_d is a deterministic measure on (Ω, F). Φ_f and Φ_d are relatively less interesting compared to the third component Φ_o, which is called the ordinary component of Φ. According to (Kingman, 1967), Φ_o is discrete with both random atoms and random jumps.

In (Kingman, 1967), it is noted that Φ_o can be further split into a countable number of independent parts:

$$\Phi_o = \sum_k \Phi_k, \qquad \Phi_k = \sum_{(\phi(\omega),\,\omega)\in\Pi_k} \phi(\omega)\,\delta_\omega \tag{3}$$

Denote by ν the Lévy measure of (the Lévy process corresponding to) Φ_o, by ν_k the Lévy measure of Φ_k, by Π a Poisson process with mean measure ν, and by Π_k a Poisson process with mean measure ν_k; (3) then yields

$$\nu = \sum_k \nu_k, \qquad \Pi = \bigcup_k \Pi_k \tag{4}$$

which provides a constructive method for Φ_o: first construct the Poisson process Π_k underlying each Φ_k, and then, by the superposition theorem (Kingman, 1993), the union of the Π_k is a realization of Φ_o. In the following sections we show how this general construction method of (4) can be applied to pure-jump nondecreasing Lévy processes of increasing interest in machine learning.
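The superposition construction of (3)–(4) is straightforward to implement whenever each sub-measure ν_k has finite total mass: draw each Π_k as an ordinary finite Poisson process and pool the resulting (jump, atom) pairs. The sketch below is a minimal illustration of that recipe; the helper names (draw_sub_process, superpose) and the two sub-measures it uses are hypothetical stand-ins, not the ν_k derived in this paper.

```python
# A minimal sketch (not from the paper) of the construction implied by (3)-(4):
# given a decomposition nu = sum_k nu_k in which every nu_k has finite total
# mass, draw each Poisson process Pi_k independently and superpose the results.
# Each hypothetical sub-measure below is described by its total mass together
# with samplers for the jump size and the atom location.
import numpy as np

rng = np.random.default_rng(1)

def draw_sub_process(total_mass, sample_jump, sample_atom):
    """Draw Pi_k: a Poisson(total_mass) number of (jump, atom) points from nu_k / total_mass."""
    n = rng.poisson(total_mass)
    return [(sample_jump(), sample_atom()) for _ in range(n)]

def superpose(sub_measures):
    """Union of the Pi_k; by the superposition theorem this realizes Pi, hence Phi_o."""
    points = []
    for total_mass, sample_jump, sample_atom in sub_measures:
        points.extend(draw_sub_process(total_mass, sample_jump, sample_atom))
    return points

# Two hypothetical finite sub-measures on (0, 1) x [0, 1]:
sub_measures = [
    (3.0, lambda: rng.beta(1.0, 2.0), lambda: rng.uniform(0.0, 1.0)),
    (1.5, lambda: rng.beta(1.0, 5.0), lambda: rng.uniform(0.0, 1.0)),
]
phi_o_atoms = superpose(sub_measures)
print(len(phi_o_atoms))
```

The decompositions developed in the remainder of the paper are designed to supply sub-measures of exactly this convenient, finitely sampled form for the beta and gamma processes.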
Substituting (8) in (6), with manipulation detailed in the Supplementary Material, we have the Lévy measure decomposition theorem of the beta process:

Theorem 1 For a beta process B ∼ BP(c(ω), µ) with base measure µ and concentration c(ω), denote by Π its underlying Poisson process and by ν its Lévy measure; then B and Π can be expressed as

$$\Pi = \bigcup_{k=0}^{\infty} \Pi_k, \qquad B = \sum_{k=0}^{\infty} B_k \tag{9}$$

where B_k is a Lévy process with Π_k its underlying Poisson process.
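The explicit form of the sub-processes B_k follows from equations (5)–(8), which are not reproduced in this excerpt. Purely as an illustration of how Theorem 1 can be used in practice, the sketch below truncates the sum at a level K under the assumption that the beta process has the standard Lévy measure ν(dp, dω) = c p⁻¹(1 − p)^{c−1} dp µ(dω) with constant concentration c (Thibaux & Jordan, 2007); expanding p⁻¹ = Σ_{k≥0}(1 − p)^k then gives ν_k(dp, dω) = [c/(c + k)] Beta(p; 1, c + k) dp µ(dω), an infinite sum of proper beta distributions of the kind described in the abstract. The truncation level K, the base measure µ = γ₀ · Uniform[0, 1], and the function truncated_beta_process are illustrative choices, and this parameterization of ν_k is an assumption rather than a restatement of the paper's equations.

```python
# Illustrative truncated construction B ~= sum_{k=0}^{K} B_k under the stated
# assumptions: standard beta process Levy measure with constant concentration c,
# nu_k(dp, dw) = [c / (c + k)] Beta(p; 1, c + k) dp mu(dw), and base measure
# mu = gamma0 * Uniform[0, 1].  Each B_k is a finite Poisson process, and the
# union of their atoms realizes the truncated Pi of (9).
import numpy as np

rng = np.random.default_rng(2)

def truncated_beta_process(c=2.0, gamma0=4.0, K=50):
    """Return (weights, atoms) of one draw of sum_{k=0}^{K} B_k."""
    weights, atoms = [], []
    for k in range(K + 1):
        mass_k = gamma0 * c / (c + k)                    # nu_k(R+ x Omega), finite
        n_k = rng.poisson(mass_k)                        # number of atoms contributed by B_k
        weights.extend(rng.beta(1.0, c + k, size=n_k))   # jump sizes ~ Beta(1, c + k)
        atoms.extend(rng.uniform(0.0, 1.0, size=n_k))    # locations ~ mu / mu(Omega)
    return np.array(weights), np.array(atoms)

w, a = truncated_beta_process()
print(len(w), w.sum())   # total mass B(Omega) of the truncated draw
```

As a sanity check on the assumed decomposition, the expected total mass of the untruncated sum is Σ_{k≥0} [c/(c + k)]·E[Beta(1, c + k)] = Σ_{k≥0} c/((c + k)(c + k + 1)) = 1 per unit of base measure, which telescopes to E[B(Ω)] = µ(Ω), the mean of a beta process.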