Lévy Measure Decompositions for the Beta and Gamma Processes

Yingjian Wang [email protected]
Lawrence Carin [email protected]
Electrical and Computer Engineering Department, Duke University, Durham NC 27708

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

Abstract

We develop new representations for the Lévy measures of the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions. Further, we demonstrate how these infinite sums may be truncated in practice, and explicitly characterize truncation errors. We also perform an analysis of the characteristics of posterior distributions, based on the proposed decompositions. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two. This paper is meant to provide a rigorous foundation for and new perspectives on Lévy processes, as these are of increasing importance in machine learning.

1. Introduction

A prominent distinction of nonparametric methods relative to parametric approaches is the utilization of stochastic processes rather than distributions. For example, a Gaussian process (Rasmussen & Williams, 2006) may be employed to nonparametrically represent general smooth functions on a continuous space of covariates (e.g., time). Recently the idea of nonparametric methods has been extended to feature learning and data clustering, with interest respectively in the beta-Bernoulli process (Thibaux & Jordan, 2007) and the Dirichlet process (Ferguson, 1973). In such processes the nonparametric aspect concerns the number of features/clusters, which are allowed to be unbounded ("infinite"), permitting the model to adapt the number of these entities as the given and future data indicate. The increasing importance of these models warrants a detailed theoretical analysis of their properties, as well as simple constructions for their implementation. In this paper we focus on Lévy processes (Sato, 1999), which are of increasing interest in machine learning.

A family of Lévy processes, the pure-jump nondecreasing Lévy processes, also fits into the category of completely random measures proposed by Kingman (Kingman, 1967). The beta process (Hjort, 1990) is an example of such a process, and is applied in nonparametric feature learning. The gamma process falls in this family as well, with its normalization the well-known Dirichlet process. Hierarchical forms of such models have become increasingly popular in machine learning (Teh et al., 2006; Teh, 2006; Thibaux & Jordan, 2007), as have nested models (Blei et al., 2010) and models that introduce covariate dependence (MacEachern, 1999; Williamson et al., 2010; Lin et al., 2010).

As a consequence of the important role these models are playing in machine learning, there is a need to study the properties of pure-jump nondecreasing Lévy processes. As examples of such work, (Thibaux & Jordan, 2007) and (Paisley et al., 2010) present explicit constructions for generating the beta process, (Teh et al., 2007) derives a construction for the Indian buffet process parallel to the stick-breaking construction of the Dirichlet process (Sethuraman, 1994), and (Thibaux, 2008) obtains a construction for the gamma process in the gamma-Poisson context. Apart from these specialized construction methods, (Kingman, 1967) proposes a general construction method for completely random measures, by first decomposing the measure into a sum of a countable number of σ-finite measures, and then superposing the Poisson processes associated with these sub-measures. By regarding the completely random measure as a Lévy process, this method corresponds to decomposing the Lévy measure, which provides clarity of theoretical properties and simplicity in practical implementation. However, this Lévy measure decomposition method has not yet come into wide use in machine learning, probably due to the nonexistence of a universal construction of the measure decomposition.

In this paper we develop explicit and simple decompositions, following the conjugacy principle, for two widely used Lévy processes: the beta and gamma processes. Conjugacy here means that the decompositions are manifested by leveraging the forms of likelihoods conjugate to the Lévy measures. The decompositions bring new perspectives on the beta and gamma processes, with associated properties analyzed here in detail. The decompositions are constituted in terms of an infinite set of sub-processes of a form convenient for computation. Since the number of sub-processes is infinite, a truncation analysis is also presented, of interest for practical use. We show some posterior properties of such decompositions, with the beta process as an example. We also extend the decomposition to the symmetric gamma process (positive and negative jumps), suggesting that the Lévy measure decomposition is applicable to other pure-jump Lévy processes represented by their Lévy measures. Summarizing the main contributions of the paper:

• We constitute Lévy measure decompositions for the beta, stable-beta, gamma, generalized gamma and symmetric gamma processes via the principle of conjugacy, providing new perspectives on these processes.

• The decomposition of the beta process unifies the constructions in (Thibaux & Jordan, 2007), (Teh & Görür, 2009) and, with a different decomposing method, (Paisley et al., 2010); and a new generative construction for the gamma process and its variations is derived.

• Truncation analyses and posterior properties of such decompositions are presented for practical use.

2. Background

Lévy processes (Sato, 1999) and completely random measures (Kingman, 1967) are two closely related concepts. Specifically, some Lévy processes can be regarded as completely random measures. In this section brief reviews of, and connections between, these two important concepts are presented.

2.1. Lévy process

A Lévy process X(ω) is a stochastic process with independent increments on a measure space (Ω, F). Ω is usually taken to be one-dimensional, such as the real line, to represent a process with variation over time. By the Lévy-Itô decomposition (Sato, 1999), a Lévy process can be decomposed into a continuous Brownian motion with drift and a discrete pure-jump part. When a Lévy process X(ω) has only the discrete part and its jumps are positive, then for any A ∈ F the characteristic function of the random variable X(A) is given by

E{e^{juX(A)}} = exp{ ∫_{R+ × A} (e^{jup} − 1) ν(dp, dω) }    (1)

with ν satisfying the integrability condition (Sato, 1999). The expression in (1) defines a category of pure-jump nondecreasing Lévy processes, including most of the Lévy processes currently used in nonparametric Bayesian methods, such as the beta, gamma, Bernoulli and negative binomial processes. With (1), such a Lévy process can be regarded as a Poisson point process on the product space R+ × Ω with mean measure ν, called the Lévy measure. On the other hand, if the increments of X(ω) on any measurable set A ∈ F are regarded as a random measure assigned to that set, then X(ω) is also a completely random measure. Due to this equivalence, in the following discussion we will not discriminate between the pure-jump nondecreasing Lévy process X and its corresponding completely random measure Φ.
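As an aside (ours, not from the paper), the Poisson point process view of (1) is easy to exercise numerically whenever ν happens to have finite total mass. The minimal NumPy sketch below assumes the illustrative finite Lévy measure ν(dp, dω) = λ Gamma(p; 2, 1) dp dω on Ω = [0, 1]; it simulates X(Ω) and checks E[X(Ω)] = ∫ p ν(dp, dω) (Campbell's theorem):

    import numpy as np

    rng = np.random.default_rng(0)
    lam, shape, scale = 5.0, 2.0, 1.0  # assumed nu: lam * Gamma(p; 2, 1) dp on [0, 1]

    def draw_X():
        n = rng.poisson(lam)                     # number of Poisson points over [0, 1]
        return rng.gamma(shape, scale, n).sum()  # X([0, 1]) = sum of the jumps p

    draws = [draw_X() for _ in range(20000)]
    print(np.mean(draws), lam * shape * scale)   # Monte Carlo mean vs. 10.0

The beta and gamma Lévy measures considered below have infinite total mass near zero jump size, so this recipe does not apply to them directly; the decompositions developed in this paper rewrite ν as a countable sum of finite measures, each of which can be simulated exactly as above.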

2.2. Completely random measure

A random measure Φ on a measure space (Ω, F) is termed "completely random" if for any disjoint sets A_1, A_2 ∈ F the random variables Φ(A_1) and Φ(A_2) are independent. A completely random measure Φ can be split into three independent components:

Φ = Φ_f + Φ_d + Φ_o    (2)

where Φ_f = Σ_{ω∈I} φ(ω) δ_ω is the fixed component, with the atoms in I fixed and the jumps φ(ω) random; I is a countable set in F. The deterministic component Φ_d is a deterministic measure on (Ω, F). Φ_f and Φ_d are relatively less interesting compared to the third component Φ_o, which is called the ordinary component of Φ. According to (Kingman, 1967), Φ_o is discrete, with both random atoms and random jumps.

In (Kingman, 1967), it is noted that Φ_o can be further split into a countable number of independent parts:

Φ_o = Σ_k Φ_k,    Φ_k = Σ_{(φ(ω),ω)∈Π_k} φ(ω) δ_ω    (3)

Denote by ν the Lévy measure of (the Lévy process corresponding to) Φ_o, by ν_k the Lévy measure of Φ_k, by Π a Poisson process with ν its mean measure, and by Π_k a Poisson process with ν_k its mean measure; (3) then yields

ν = Σ_k ν_k,    Π = ∪_k Π_k    (4)

which provides a constructive method for Φ_o: first construct the Poisson process Π_k underlying each Φ_k, and then, by the superposition theorem (Kingman, 1993), the union of the Π_k is a realization of Φ_o. In the following sections we show how this general construction method of (4) can be applied to pure-jump nondecreasing Lévy processes of increasing interest in machine learning, with an emphasis on the beta and gamma processes and their generalizations.

3. Beta process

A beta process (Hjort, 1990) is a Lévy process with beta-distributed increments; B ∼ BP(c(ω), µ) is a beta process if

B(dω) ∼ Beta(c(ω)µ(dω), c(ω)(1 − µ(dω)))    (5)

where µ is the base measure on the measure space (Ω, F) and the positive function c(ω) is the concentration function. Expression (5) indicates that the increments of the beta process are independent, which makes it a special case of the Lévy process family. The Lévy measure of the beta process is

ν(dπ, dω) = c(ω) π^{−1} (1 − π)^{c(ω)−1} dπ µ(dω)    (6)

where Beta(0, c(ω)) = c(ω) π^{−1} (1 − π)^{c(ω)−1} is an improper beta distribution, since its integral over (0, 1) is infinite. As a result, the underlying Poisson process, i.e., the Poisson process with ν as its mean measure on the product space Ω × (0, 1), denoted Π, has an infinite number of points drawn from ν, yielding

B = Σ_{i=1}^∞ π_i δ_{ω_i}    (7)

where π_i is the jump (increment) occurring at the atom ω_i. The real variable γ = µ(Ω) is termed the mass parameter of B, and we assume γ < ∞.

3.1. Beta process Lévy measure decomposition

The infinite integral of the improper beta distribution inspires a decomposition of the improper distribution into an infinite number of proper distributions. The singularity of the improper beta distribution is manifested in π^{−1}. Since π ∈ (0, 1), the geometric series expansion yields

π^{−1} = Σ_{k=0}^∞ (1 − π)^k,    π ∈ (0, 1)    (8)

and substituting (8) into (6), with manipulation detailed in the Supplementary Material, we have the Lévy measure decomposition theorem of the beta process:

Theorem 1. For a beta process B ∼ BP(c(ω), µ) with base measure µ and concentration c(ω), denote by Π its underlying Poisson process and by ν its Lévy measure; then B and Π can be expressed as

Π = ∪_{k=0}^∞ Π_k,    B = Σ_{k=0}^∞ B_k    (9)

where B_k is a Lévy process with Π_k its underlying Poisson process. The Lévy measures ν_k of the B_k constitute a decomposition of ν:

ν = Σ_{k=0}^∞ ν_k
ν_k(dπ, dω) = Beta(1, c(ω) + k) dπ µ_k(dω)    (10)
µ_k(dω) = [c(ω)/(c(ω) + k)] µ(dω)

where Beta(1, c(ω) + k) is the PDF of the beta distribution with parameters 1 and c(ω) + k.

Theorem 1 is the beta process instantiation of the completely random measure decomposition in (4): the underlying Poisson process Π of the beta process B is the superposition of an infinite number of independent Poisson processes {Π_k}_{k=0}^∞, with ν_k the mean measure of Π_k and µ_k the mean measure of the restriction of Π_k to Ω. As a result, the beta process B can be expressed as a sum of an infinite number of independent Lévy processes {B_k}_{k=0}^∞, with {Π_k}_{k=0}^∞ the underlying Poisson processes. The independence of {Π_k}_{k=0}^∞ and {B_k}_{k=0}^∞ w.r.t. the index k is justified by the fact that both µ and c(ω) are fixed parameters.

3.2. The Lévy process B_k

It is interesting to study the properties of B_k, such as its expectation and variance. Denoting by B̄_k(dω) = [1/(c(ω) + k + 1)] µ_k(dω) the base measure of B_k, for any A ∈ F:

E(B_k(A)) = ∫_A B̄_k(dω) = B̄_k(A)
Var(B_k(A)) = ∫_A [2/(c(ω) + k + 2)] B̄_k(dω)    (11)

It is noteworthy that the Lévy process B_k is no longer a beta process, since (5) is not satisfied. By Theorem 1, the jumps of B_k follow a proper beta distribution parameterized by the concentration function c(ω) and the index k, and µ_k determines the locations where the jumps happen.
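As a quick numerical sanity check of Theorem 1 (our illustration, assuming a homogeneous concentration c(ω) = c), the weighted proper beta densities in (10) can be partially summed over k and compared with the improper density in (6):

    import numpy as np

    c, K = 2.0, 500
    pi = np.array([0.05, 0.2, 0.5, 0.9])
    k = np.arange(K + 1)[:, None]

    # Beta(1, c + k) PDF at pi, weighted by the mass ratio c / (c + k) from (10)
    partial = ((c + k) * (1 - pi) ** (c + k - 1) * c / (c + k)).sum(axis=0)
    improper = c / pi * (1 - pi) ** (c - 1)    # integrand of (6)
    print(np.max(np.abs(partial - improper)))  # ~0, reflecting the expansion (8)

The agreement becomes exact as K → ∞, since each weighted term reduces to c(1 − π)^{c+k−1} and the sum collapses through the geometric series (8).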

Since the {B_k}_{k=0}^∞ are independent w.r.t. the index k, with Theorem 1:

Σ_{k=0}^∞ E(B_k(A)) = E(B(A)),    Σ_{k=0}^∞ Var(B_k(A)) = Var(B(A))    (12)

The detailed procedure to derive (11) and (12) is given in the Supplementary Material.

3.3. Simulating the beta process

3.3.1. Poisson superposition simulation

Theorem 1 reveals that the underlying Poisson process of a beta process is a superposition of an infinite number of Poisson processes, each of which has a finite set of atoms. This perspective also provides a simulation procedure for the beta process: first, the Poisson process Π_k is sampled for each k = 0, 1, 2, ... (here we term the index k the "round" of the simulation); then the union of the samples of the Π_k is taken as a realization of the Poisson process Π. With the marking theorem (Kingman, 1993) implicitly applied, the simulation procedure for the beta process is as follows.

Simulation procedure: For round k:

1: Sample the number of points of Π_k: n_k ∼ Poisson(∫_Ω µ_k(dω));
2: Sample n_k points from µ_k: ω_ki ∼ i.i.d. µ_k/∫_Ω µ_k(dω), for i = 1, 2, ..., n_k;
3: Sample B_k(ω_ki) ∼ i.i.d. Beta(1, c(ω_ki) + k), for i = 1, 2, ..., n_k.

Then the union ∪_{k=0}^∞ {(ω_ki, B_k(ω_ki))}_{i=1}^{n_k} is a realization of Π (and equivalently of B).

We refer to the above simulation procedure as the Poisson superposition simulation, for the central role of the Poisson superposition. An especially convenient case is when the beta process is homogeneous, i.e., c(ω) = c is a constant. In this case the {ω_ki}_{i=1}^{n_k} for all rounds k are drawn from the same distribution µ/γ, and n_k is drawn from Poisson(cγ/(c + k)).
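The procedure above is straightforward to implement. Below is a minimal NumPy sketch (ours, for illustration) of the homogeneous case, assuming a uniform base measure µ on [0, 1] with mass γ and truncating at round K, as analyzed in Section 3.4:

    import numpy as np

    def sample_beta_process(c=2.0, gamma=5.0, K=50, seed=0):
        rng = np.random.default_rng(seed)
        atoms, jumps = [], []
        for k in range(K + 1):                        # round k
            n_k = rng.poisson(c * gamma / (c + k))    # step 1: number of points
            atoms.append(rng.uniform(0.0, 1.0, n_k))  # step 2: omega_ki ~ mu/gamma
            jumps.append(rng.beta(1.0, c + k, n_k))   # step 3: Beta(1, c + k) jumps
        return np.concatenate(atoms), np.concatenate(jumps)

    omega, pi = sample_beta_process()
    # E[B(Omega)] = gamma; truncation at round K leaves about gamma*c/(c+K+1) mass out
    print(len(pi), pi.sum())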
For round k, both the number of points and the jumps statistically diminish as k increases, suggesting that the infinite sum in (9) may be truncated as B = Σ_{k=0}^K B_k for large K, with minimal impact. Such truncation effects are investigated in detail in Section 3.4.

3.3.2. Related work

In (Thibaux & Jordan, 2007) the authors derived the above simulation procedure for the homogeneous case within the beta-Bernoulli process context; it is shown here to be a necessary result of the Lévy measure decomposition. The same decomposing manipulation of Theorem 1 can also be applied to the stable-beta process (Teh & Görür, 2009), which yields

ν_k = Beta(1 − σ, c(ω) + σ + k) dπ · [Γ(c(ω) + σ + k) Γ(c(ω) + 1)] / [Γ(c(ω) + k + 1) Γ(c(ω) + σ)] µ(dω)    (13)

It is noteworthy that the decomposition procedure described in Theorem 1 is not the only Lévy measure decomposing method for the beta process. The work of (Paisley & Jordan, 2012) and (Broderick et al., 2011) shows that the stick-breaking construction of the beta process in (Paisley et al., 2010) is indeed the result of another way of decomposing the Lévy measure of the beta process. We next analyze the truncation property of the construction described in Section 3.3.1 and make a comparison with the construction of the beta process in (Paisley et al., 2010).
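The stable-beta decomposition (13) admits the same kind of numerical check as Theorem 1. In the sketch below (ours, assuming homogeneous c and using SciPy's log-gamma functions), the partial sums of (13) approach the stable-beta Lévy density of (Teh & Görür, 2009), proportional to π^{−σ−1}(1 − π)^{c+σ−1}:

    import numpy as np
    from scipy.special import betaln, gammaln

    c, sigma, K = 2.0, 0.3, 2000
    pi = np.array([0.1, 0.4, 0.8])
    k = np.arange(K + 1)[:, None]

    # log Beta(1 - sigma, c + sigma + k) PDF plus the log of the weight in (13)
    logpdf = (-betaln(1 - sigma, c + sigma + k)
              - sigma * np.log(pi) + (c + sigma + k - 1) * np.log(1 - pi))
    logw = (gammaln(c + sigma + k) + gammaln(c + 1)
            - gammaln(c + k + 1) - gammaln(c + sigma))
    partial = np.exp(logpdf + logw).sum(axis=0)

    target = (np.exp(gammaln(c + 1) - gammaln(1 - sigma) - gammaln(c + sigma))
              * pi ** (-1 - sigma) * (1 - pi) ** (c + sigma - 1))
    print(np.max(np.abs(partial - target)))   # ~0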

3.4. Truncation analysis

Since the Poisson superposition simulation operates in rounds, it is natural to analyze the distance between the true beta process B and its truncation Σ_{k=0}^K B_k at round K. A metric for such a distance is the L1 norm:

||B − Σ_{k=0}^K B_k||_1 = E|B − Σ_{k=0}^K B_k| = ∫_Ω µ_{K+1}(dω)/γ    (14)

The expectation in (14) is w.r.t. the normalized measure ν/γ, which yields ||B||_1 = 1. When B is homogeneous, (14) reduces to c/(c + K + 1), which indicates that the L1 distance decreases at a rate of O(1/K). For the stick-breaking construction of the beta process described in (Paisley et al., 2010), the L1 distance is (c/(c + 1))^{K+1}.

Another metric is the L1 distance between marginal likelihoods of a set of data b = b_{1:M}, where m_∞(b) denotes the marginal likelihood (here the likelihood is a Bernoulli process) under the prior B, and m_K(b) that under Σ_{k=0}^K B_k. This metric was applied to the truncated Indian buffet process (Doshi et al., 2009) and the truncated stick-breaking construction of the beta process (Paisley & Jordan, 2012), and it yields

(1/4) ∫ |m_∞(b) − m_K(b)| db ≤ Pr(∃ k > K, 1 ≤ i ≤ n_k, 1 ≤ m ≤ M, s.t. b^m_{ki} = 1)    (15)

where b_{1:M} ∼ i.i.d. BeP(B) are drawn from a Bernoulli process with base measure B, and b^m_{ki} = b_m(ω_ki) is the m-th realization of the Bernoulli process at atom ω_ki. For the truncation Σ_{k=0}^K B_k it can be shown that the RHS of (15) is bounded by

RHS of (15) ≤ 1 − exp(−M ∫_Ω µ_{K+1}(dω))    (16)

For the homogeneous case, the bound in (16) is 1 − exp(−Mγc/(c + K + 1)). For the stick-breaking construction of the beta process, the bound is 1 − exp(−Mγ(c/(c + 1))^{K+1}) (Paisley & Jordan, 2012).

In order to analyze the bound w.r.t. the truncation level in terms of the number of atoms, denote by I_K = Σ_{k=0}^K n_k the total number of atoms in Σ_{k=0}^K B_k. Since K ∼ O(e^{E(I_K)/(cγ)}), it can be proved that (14) and the bound in (16) decrease at a faster rate w.r.t. I_K than for the stick-breaking construction of the beta process. This indicates that the simulation procedure described in Section 3.3.1 follows a steeper statistically-decreasing order. The proof is presented in the Supplementary Material.

3.5. Posterior estimation

The goal of inference is to estimate the beta process B from a set of observed data b, with prior BP(c, µ). The data b = b_{1:M} are the same as in Section 3.4, and can be expressed as

b_m = Σ_{i=1}^∞ b_{i,m} δ_{ω_i},    m = 1, 2, ..., M    (17)

where each b_{i,m} ∈ {0, 1}.

3.5.1. Posterior of B_k

Since B|b ∼ BP(c + M, cµ/(c + M) + Σ_{m=1}^M b_m/(c + M)) (Thibaux & Jordan, 2007), the base measure of B|b is a measure with positive masses assigned to single atoms. Theorem 1 is still applicable to this beta process with a mixed type of base measure, which yields

B' = Σ_{k=0}^∞ B'_k
ν'_k = Beta(1, c + M + k) µ'_k    (18)
µ'_k = cµ/(c + M + k) + Σ_{m=1}^M b_m/(c + M + k)

where B', B'_k, ν'_k and µ'_k are the posterior counterparts of B, B_k, ν_k and µ_k.

3.5.2. Posterior estimation of π_i

Since each µ'_k has a mass Σ_{m=1}^M b_{i,m}/(c + M + k) at the atom ω_i, each B'_k contributes Poisson(Σ_{m=1}^M b_{i,m}/(c + M + k)) draws at the atom ω_i, with the jumps following the distribution Beta(1, c + M + k); their sum is π_i. Thus the posterior estimate of π_i is given by

π_i | b = Σ_{k=0}^∞ Σ_{h=1}^{H_k} b_{kh}
H_k ∼ Poisson(Σ_{m=1}^M b_{i,m}/(c + M + k))    (19)
b_{kh} ∼ Beta(1, c + M + k)

from which it can be verified that E(π_i|b) = Σ_{m=1}^M b_{i,m}/(c + M), the same as the posterior of π_i without the decomposition: Beta(Σ_{m=1}^M b_{i,m}, c + M − Σ_{m=1}^M b_{i,m}).

For a π_i with no observations, i.e., Σ_{m=1}^M b_{i,m} = 0, only a particular B'_k will contribute to π_i. In this case, first the round k to which π_i belongs is drawn, and then π_i is drawn from the beta distribution of that round:

π_i ∼ Beta(1, c + M + k)
k ∼ MP(α),    α ∝ Σ_{k=0}^∞ [1/(c + M + k)] δ_k    (20)

where MP(α) is a multinomial process with probability vector α, and α is proportional to the average number of points in each round. Since in practice α is always truncated at some level K, by the analysis in Section 3.4, (20) provides a way to estimate the π_i within the first K rounds. The π_i in each round are of statistically different importance, in contrast to the evenly assigned mass in the Indian buffet process.

3.6. Relating the IBP and beta process

The study of the beta process through its Lévy measure, as discussed in this paper, also uncovers a connection between the Indian buffet process (IBP) (Griffiths & Ghahramani, 2005) and the beta process, via their Lévy measures. The IBP with prior π_i ∼ Beta(cγ/N, c) can be regarded as a Lévy process with Lévy measure

ν_IBP = (N/γ) Beta(cγ/N, c) dπ µ(dω)    (21)

where N here plays the role of the K in (Griffiths & Ghahramani, 2005). It can be proved that

ν_IBP → ν as N → ∞    (22)

which indicates that the beta process is the limit of the IBP as N → ∞. The detailed proof of (22) is presented in the Supplementary Material. Thus the IBP is like a "mosaic" approximation of the beta process, which becomes finer as N increases.
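The limit (22) is also easy to see numerically. The sketch below (ours, with illustrative c = 2 and γ = 1) evaluates the rescaled beta density of (21) and watches it approach the improper beta density of (6) pointwise as N grows:

    import numpy as np
    from scipy.special import betaln

    c, gam = 2.0, 1.0
    pi = np.array([0.1, 0.5, 0.9])
    bp = c / pi * (1 - pi) ** (c - 1)       # beta process Levy density, cf. (6)

    for N in [10, 100, 10000]:
        a = c * gam / N                     # IBP jump prior Beta(c*gam/N, c)
        ibp = (N / gam) * np.exp(-betaln(a, c) + (a - 1) * np.log(pi)
                                 + (c - 1) * np.log(1 - pi))
        print(N, np.max(np.abs(ibp - bp)))  # gap shrinks as N grows, cf. (22)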

4. Gamma process

A gamma process (Applebaum, 2009) is a Lévy process with independent gamma increments. The gamma process is traditionally parameterized with a shape measure and a scale function: G ∼ ΓP(α, θ(ω)), where α is the shape measure on a measure space (Ω, F) and the scale θ(ω) is a positive function. A gamma process can be intuitively defined by its increments on infinitesimal sets:

G(dω) ∼ Gamma(α(dω), θ(ω))    (23)

When θ(ω) = θ is a scalar, the gamma process is called homogeneous. The gamma process can also be expressed in a form with a base measure G_0 and a concentration c(ω), with c = 1/θ and G_0 = θα (Jordan, 2009), to conform with other stochastic processes widely used in machine learning, such as the Dirichlet process. However, the discussion in this paper sticks to the traditional form given by (23).

As a pure-jump Lévy process, the gamma process can be regarded as a Poisson process on the product space Ω × R+ with mean measure ν:

ν(dp, dω) = p^{−1} e^{−p/θ(ω)} dp α(dω)    (24)

where Gamma(0, θ(ω)) = p^{−1} e^{−p/θ(ω)} is an improper gamma distribution with an infinite integral on R+, which yields the expression of G:

G = Σ_{i=1}^∞ p_i δ_{ω_i}    (25)

4.1. Lévy measure decomposition

Like the beta process, the Lévy measure of the gamma process is characterized by an improper distribution. However, unlike the beta process, the decomposition of the Lévy measure of the gamma process comes from the exponential part. With the details shown in the Supplementary Material, the gamma process G can be decomposed into two parts:

G = Γ_1 + ΓP(α, θ(ω)/2)    (26)

The second term in (26) is a gamma process with the same shape measure and half the scale of the gamma process G; the first term Γ_1 is a Lévy process with Lévy measure Σ_{h=1}^∞ Gamma(h, θ(ω)/2) dp α(dω)/(2^h h). Here Gamma(h, θ(ω)/2) is the PDF of the gamma distribution with shape parameter h and scale parameter θ(ω)/2.

Further decomposing the exponential part of the gamma process ΓP(α, θ(ω)/2) in (26) yields G = Γ_1 + Γ_2 + ΓP(α, θ(ω)/3), bearing a gamma process with the same shape and with the scale parameter further decreased. Repeating this manipulation, we obtain Theorem 2:

Theorem 2. A gamma process G ∼ ΓP(α, θ(ω)) with shape measure α and scale θ(ω) can be decomposed as

G = Σ_{k=1}^∞ Γ_k,    Γ_k = Σ_{h=1}^∞ Γ_{kh},    ν_k = Σ_{h=1}^∞ ν_{kh}    (27)
ν_{kh} = Gamma(h, θ(ω)/(k + 1)) dp α(dω)/((k + 1)^h h)

with Γ_k, Γ_{kh} Lévy processes and ν_k, ν_{kh} their Lévy measures.

Theorem 2 is the gamma process instantiation of (4): G can be expressed as the sum of an infinite number of Lévy processes Γ_k, k = 1, 2, ..., where each Γ_k is in turn the sum of an infinite number of Lévy processes Γ_{kh}, h = 1, 2, ....

4.2. Lévy processes Γ_k and Γ_{kh}

In order to obtain further insight into the gamma process G in Theorem 2, the expectations and variances of Γ_k and Γ_{kh} on any measurable set A ∈ F are given:

E(Γ_{kh}(A)) = [1/(k + 1)^{h+1}] ∫_A θ(ω) α(dω)
E(Γ_k(A)) = [1/(k(k + 1))] ∫_A θ(ω) α(dω)    (28)

For the variances of Γ_k and Γ_{kh}:

Var(Γ_{kh}(A)) = [(h + 1)/(k + 1)^{h+2}] ∫_A θ²(ω) α(dω)
Var(Γ_k(A)) = [1/k² − 1/(k + 1)²] ∫_A θ²(ω) α(dω)    (29)

Since the Lévy processes Γ_k are independent w.r.t. k, in analogy to (12) it can be verified that the expectations and variances of the Γ_k sum to the expectation and variance of G. The derivations in this section are presented in the Supplementary Material.

4.3. Simulation of gamma process

Parallel to the simulation of the beta process in Section 3.3.1, a simulation procedure for the gamma process is presented.

Simulation procedure: Sample the Lévy process Γ_{kh}:

1: Sample the number of points of Γ_{kh}: n_{kh} ∼ Poisson(γ/((k + 1)^h h));
2: Sample n_{kh} points from α: ω_{khi} ∼ i.i.d. α/γ, for i = 1, 2, ..., n_{kh};
3: Sample Γ_{kh}(ω_{khi}) ∼ i.i.d. Gamma(h, θ(ω_{khi})/(k + 1)), for i = 1, 2, ..., n_{kh};

where γ = ∫_Ω α(dω) is the mass of the shape measure. Then the union ∪_{k=1}^∞ ∪_{h=1}^∞ {(ω_{khi}, Γ_{kh}(ω_{khi}))}_{i=1}^{n_{kh}} is a realization of the gamma process G. An advantage of the above simulation procedure, compared to the simulation procedure of the beta process in Section 3.3.1, is that independent of whether the gamma process is homogeneous or inhomogeneous, ω_{khi} is always drawn from the fixed distribution α/γ. As with the beta process construction in Section 3.3.1, for the gamma process simulation procedure the expected number of new points and the expected jumps decrease as k increases, again suggesting accurate truncation.
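As with the beta process, the procedure is direct to implement. The minimal NumPy sketch below (ours, for illustration) assumes a homogeneous scale θ, a uniform shape measure α on [0, 1] with mass γ, and truncation levels K and H, as analyzed in Section 4.4 below:

    import numpy as np

    def sample_gamma_process(gam=5.0, theta=1.0, K=50, H=30, seed=0):
        rng = np.random.default_rng(seed)
        atoms, jumps = [], []
        for k in range(1, K + 1):
            for h in range(1, H + 1):
                n = rng.poisson(gam / ((k + 1) ** h * h))       # step 1
                atoms.append(rng.uniform(0.0, 1.0, n))          # step 2: alpha/gam
                jumps.append(rng.gamma(h, theta / (k + 1), n))  # step 3
        return np.concatenate(atoms), np.concatenate(jumps)

    omega, p = sample_gamma_process()
    # E[G(Omega)] = gam * theta = 5; truncating k at K leaves ~1/(K+1) of the mass out
    print(p.sum())

The symmetric gamma variant of Section 4.5 would differ only by attaching a random sign to each sampled jump.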
4.4. Truncation analysis

Since in the simulation procedure of Section 4.3 the indices k and h both go to infinity, it is practical to analyze the distance between the true gamma process and the truncated one. To measure such a distance, we apply the L1 norm described in Section 3.4:

||G − Σ_{k=1}^K Σ_{h=1}^H Γ_{kh}||_1 = E|G − Σ_{k=1}^K Σ_{h=1}^H Γ_{kh}|    (30)

where the expectation in (30) is w.r.t. the normalized measure ν/∫_Ω θ(ω)α(dω), so that ||G||_1 = 1, and K and H are the truncation levels of k and h. For the situation with H = ∞:

||G − Σ_{k=1}^K Σ_{h=1}^∞ Γ_{kh}||_1 = 1/(K + 1)    (31)

which indicates an O(1/K) decreasing rate, the same as that of the truncated beta process shown in (14). It is noteworthy that Γ_1 alone accounts for, on average, half the mass of G. When H is finite, a remaining distance Σ_{k=1}^K 1/(k(k + 1)^{H+1}) is added.

4.5. Generalized gamma process and symmetric gamma process

Theorem 2 can be easily extended to some variations of the gamma process. Here we give the examples of the generalized gamma process (Brix, 1999) and the symmetric gamma process (Çinlar, 2010).

The generalized gamma process extends the ordinary gamma process by adding a parameter 0 < σ < 1; its Lévy measure is [1/Γ(1 − σ)] p^{−σ−1} e^{−p/θ(ω)} dp α(dω). With the same decomposition procedure, it is straightforward that the Lévy measure for the Γ_{kh} of the generalized gamma process changes to ν_{kh} = Gamma(h − σ, θ(ω)/(k + 1)) dp α(dω)/(Γ(1 − σ)(k + 1)^h h).

The symmetric gamma process is a Lévy process whose increments are the differences of two gamma-distributed variables with the same law; its Lévy measure is |p|^{−1} e^{−|p|/θ(ω)} dp α(dω). Since there can be negative increments, the symmetric gamma process is not a completely random measure. However, the same decomposition procedure is still applicable, yielding ν_{kh} = Gamma(|p|; h, θ(ω)/(k + 1)) dp · 2α(dω)/((k + 1)^h h), where the distribution Gamma(|p|; h, θ(ω)/(k + 1)) means that |p| is first drawn from Gamma(h, θ(ω)/(k + 1)) and the sign of p is then decided by a symmetric Bernoulli distribution.

5. Conclusions

The Lévy measure decompositions of the beta and gamma processes provide new perspectives on these two widely used stochastic processes, by casting insight on the sub-processes constituting them, here the B_k and Γ_k. The decomposition prescriptions described here are far from the only ways of performing such decompositions. Theoretically elegant construction methods are derived from the proposed decompositions, and these are directly implementable in practice.

We have applied the proposed beta and gamma representations in numerical experiments, the details of which are omitted, as this paper focuses on foundational properties. However, to briefly summarize experience with such representations, consider for example the image inpainting problem considered in (Zhou et al., 2009), based upon a beta process factor analysis model (Paisley & Carin, 2009). In experiments we performed with such a model, using a Gibbs sampler, the beta process prior was implemented using the procedure discussed in Section 3.3.1, with the posterior estimation of Section 3.5 applied for inference. The proposed representation infers a dictionary with the "important" dictionary elements captured by the low-index members (see the discussion in Section 3.3.1). The model prioritized the first three dictionary elements as being pure colors, specifically red, green and blue, with the important structured dictionary elements following (and no other pure-color dictionary elements, while in (Zhou et al., 2009) many seemingly redundant pure-color dictionary elements are inferred). This "clean" inference of prioritized dictionary elements may be responsible for the higher PSNR we also observed in signal recovery, compared to the result given in (Zhou et al., 2009). The new gamma process construction in Section 4.3 may be implemented in a similar manner, and may be employed within recent models in machine learning in which the gamma process has been utilized (e.g., (Paisley et al., 2011)).

Acknowledgements

The research reported here was supported by ARO, NGA, ONR and DARPA (MSEE program).

References

Applebaum, D. Lévy Processes and Stochastic Calculus. Cambridge University Press, 2009.

Blei, D.M., Griffiths, T.L., and Jordan, M.I. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. J. ACM, 57(2), 2010.

Brix, A. Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31:929–953, 1999.

Broderick, T., Jordan, M., and Pitman, J. Beta processes, stick-breaking, and power laws. Bayesian Analysis, 2011.

Çinlar, E. Probability and Stochastics. Graduate Texts in Mathematics. Springer, 2010.

Doshi, F., Miller, K.T., Van Gael, J., and Teh, Y.W. Variational inference for the Indian buffet process. In AISTATS, volume 12, 2009.

Ferguson, T. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1973.

Griffiths, T. and Ghahramani, Z. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.

Hjort, N.L. Nonparametric Bayes estimators based on beta processes in models for life history data. Annals of Statistics, 1990.

Jordan, M.I. Hierarchical models, nested models and completely random measures. In Frontiers of Statistical Decision Making and Bayesian Analysis: In Honor of James O. Berger. New York: Springer, 2009.

Kingman, J.F.C. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.

Kingman, J.F.C. Poisson Processes. Oxford University Press, Oxford, 1993.

Lin, D., Grimson, E., and Fisher, J. Construction of dependent Dirichlet processes based on Poisson processes. In NIPS, pp. 1396–1404, 2010.

MacEachern, S.N. Dependent nonparametric processes. In Proceedings of the Section on Bayesian Statistical Science, 1999.

Paisley, J., Blei, D.M., and Jordan, M.I. Stick-breaking beta processes and the Poisson process. In AISTATS, 2012.

Paisley, J. and Carin, L. Nonparametric factor analysis with beta process priors. In ICML, 2009.

Paisley, J., Zaas, K., Woods, C., Ginsburg, G., and Carin, L. A stick-breaking construction of the beta process. In ICML, pp. 847–854, 2010.

Paisley, J., Wang, C., and Blei, D. The discrete infinite logistic normal distribution for mixed-membership modeling. In AISTATS, 2011.

Rasmussen, C. and Williams, C. Gaussian Processes for Machine Learning. MIT Press, 2006.

Sato, K. Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, 1999.

Sethuraman, J. A constructive definition of Dirichlet priors. Statistica Sinica, 1994.

Teh, Y.W. A hierarchical Bayesian language model based on Pitman-Yor processes. In Coling/ACL, pp. 985–992, 2006.

Teh, Y.W. and Görür, D. Indian buffet processes with power-law behavior. In NIPS, 2009.

Teh, Y.W., Jordan, M.I., Beal, M.J., and Blei, D.M. Hierarchical Dirichlet processes. JASA, 101:1566–1581, 2006.

Teh, Y.W., Görür, D., and Ghahramani, Z. Stick-breaking construction for the Indian buffet process. In AISTATS, 2007.

Thibaux, R. Nonparametric Bayesian Models for Machine Learning. PhD thesis, EECS Dept., University of California, Berkeley, Oct 2008.

Thibaux, R. and Jordan, M.I. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.

Williamson, S., Orbanz, P., and Ghahramani, Z. Dependent Indian buffet processes. In AISTATS, 2010.

Zhou, M., Chen, H., Paisley, J., Ren, L., Sapiro, G., and Carin, L. Non-parametric Bayesian dictionary learning for sparse image representations. In NIPS, 2009.