Beta-Negative Binomial Process and Poisson Factor Analysis

Mingyuan Zhou, Lauren A. Hannah†, David B. Dunson†, Lawrence Carin
Department of ECE, †Department of Statistical Science, Duke University, Durham NC 27708, USA

Appearing in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands. Volume XX of JMLR: W&CP XX. Copyright 2012 by the authors.

Abstract

A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multi-scoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical structure, and applied as a nonparametric Bayesian prior for an infinite Poisson factor analysis model. A finite approximation for the beta process Lévy random measure is constructed for convenient implementation. Efficient MCMC computations are performed with data augmentation and marginalization techniques. Encouraging results are shown on document count matrix factorization.

1 Introduction

Count data appear in many settings. Problems include predicting future demand for medical care based on past use (Cameron et al., 1988; Deb and Trivedi, 1997), species sampling (National Audubon Society) and topic modeling of document corpora (Blei et al., 2003). Poisson and negative binomial distributions are typical choices for univariate and repeated measures count data; however, multivariate extensions incorporating latent variables (latent counts) are underdeveloped. Latent variable models under the Gaussian assumption, such as principal component analysis and factor analysis, are widely used to discover low-dimensional data structure (Lawrence, 2005; Tipping and Bishop, 1999; West, 2003; Zhou et al., 2009). There has been some work on exponential family latent factor models that incorporate Gaussian latent variables (Dunson, 2000, 2003; Moustaki and Knott, 2000; Sammel et al., 1997), but computation tends to be prohibitive in high-dimensional settings and the Gaussian assumption is restrictive for count data that are discrete and nonnegative, have limited ranges, and often present overdispersion. In this paper we propose a flexible new nonparametric Bayesian prior to address these problems, the beta-negative binomial (BNB) process.

Using completely random measures (Kingman, 1967), Thibaux and Jordan (2007) generalize the beta process defined on $[0, 1] \times \mathbb{R}_+$ by Hjort (1990) to a general product space $[0, 1] \times \Omega$, and define a Bernoulli process on an atomic beta process hazard measure to model binary outcomes. They further show that the beta-Bernoulli process is the underlying de Finetti mixing distribution for the Indian buffet process (IBP) of Griffiths and Ghahramani (2005). To model count variables, we extend the measure space of the beta process to $[0, 1] \times \mathbb{R}_+ \times \Omega$ and introduce a negative binomial process, leading to the BNB process. We show that the BNB process can be augmented into a beta-gamma-Poisson process, and that this process may be interpreted in terms of a "multi-scoop" IBP. Specifically, each "customer" visits an infinite set of dishes on a buffet line, and rather than simply choosing to select certain dishes off the buffet (as in the IBP), the customer may select multiple scoops of each dish, with the number of scoops controlled by a negative binomial distribution with dish-dependent hyperparameters. As discussed below, the use of a negative binomial distribution for modeling the number of scoops is more general than using a Poisson distribution, as one may control both the mean and variance of the counts, and allow overdispersion. This representation is particularly useful for discrete latent variable models, where each latent feature is not simply present or absent, but contributes a distinct count to each observation.

We use the BNB process to construct an infinite discrete latent variable model called Poisson factor analysis (PFA), where an observed count is linked to its latent parameters with a Poisson distribution. To enhance model flexibility, we place a gamma prior on the Poisson rate parameter, leading to a negative binomial distribution. The BNB process is formulated in a beta-gamma-gamma-Poisson hierarchical structure, with which we construct an infinite PFA model for count matrix factorization. We test PFA with various priors for document count matrix factorization, making connections to previous models; here a latent count assigned to a factor (topic) is the number of times that factor appears in the document.

The contributions of this paper are: 1) an extension of the beta process to a marked space, to produce the beta-negative binomial (BNB) process; 2) efficient inference for the BNB process; and 3) a flexible model for count matrix factorization, which accurately captures topics with diverse characteristics when applied to topic modeling of document corpora.

2 Preliminaries

2.1 Negative Binomial Distribution

The Poisson distribution $X \sim \mathrm{Pois}(\lambda)$ is commonly used for modeling count data. It has the probability mass function $f_X(k) = e^{-\lambda}\lambda^k/k!$, where $k \in \{0, 1, \ldots\}$, with both the mean and variance equal to $\lambda$. A gamma distribution with shape $r$ and scale $p/(1-p)$ can be placed as a prior on $\lambda$ to produce a negative binomial (a.k.a. gamma-Poisson) distribution as

$$f_X(k) = \int_0^\infty \mathrm{Pois}(k; \lambda)\,\mathrm{Gamma}\big(\lambda; r, p/(1-p)\big)\,d\lambda = \frac{\Gamma(r+k)}{k!\,\Gamma(r)}(1-p)^r p^k \tag{1}$$

where $\Gamma(\cdot)$ denotes the gamma function. Parameterized by $r > 0$ and $p \in (0, 1)$, this distribution $X \sim \mathrm{NB}(r, p)$ has a variance $rp/(1-p)^2$ larger than the mean $rp/(1-p)$, and thus it is usually favored for modeling overdispersed count data. More detailed discussions about the negative binomial and related distributions and the corresponding stochastic processes defined on $\mathbb{R}_+$ can be found in Kozubowski and Podgórski (2009) and Barndorff-Nielsen et al. (2010).

2.2 Lévy Random Measures

Beta-negative binomial processes are created using Lévy random measures. Following Wolpert et al. (2011), for any $\nu^+ \geq 0$ and any probability distribution $\pi(dp\,d\omega)$ on $\mathbb{R} \times \Omega$, let $K \sim \mathrm{Pois}(\nu^+)$ and $\{(p_k, \omega_k)\}_{1 \leq k \leq K} \overset{iid}{\sim} \pi(dp\,d\omega)$. Defining $1_A(\omega_k)$ as being one if $\omega_k \in A$ and zero otherwise, the random measure $\mathcal{L}(A) \equiv \sum_{k=1}^{K} 1_A(\omega_k)\,p_k$ assigns independent infinitely divisible random variables $\mathcal{L}(A_i)$ to disjoint Borel sets $A_i \subset \Omega$, with characteristic functions

$$\mathbb{E}\big[e^{it\mathcal{L}(A)}\big] = \exp\left\{\iint_{\mathbb{R} \times A} (e^{itp} - 1)\,\nu(dp\,d\omega)\right\} \tag{2}$$

with $\nu(dp\,d\omega) \equiv \nu^+ \pi(dp\,d\omega)$. A random signed measure $\mathcal{L}$ satisfying (2) is called a Lévy random measure. More generally, if the Lévy measure $\nu(dp\,d\omega)$ satisfies

$$\iint_{\mathbb{R} \times S} (1 \wedge |p|)\,\nu(dp\,d\omega) < \infty \tag{3}$$

for each compact $S \subset \Omega$, it need not be finite for the Lévy random measure $\mathcal{L}$ to be well defined; the notation $1 \wedge |p|$ denotes $\min\{1, |p|\}$. A nonnegative Lévy random measure $\mathcal{L}$ satisfying (3) was called a completely random measure (CRM) by Kingman (1967, 1993) and an additive random measure by Çinlar (2011). It was introduced for machine learning by Thibaux and Jordan (2007) and Jordan (2010).

2.3 Beta Process

The beta process (BP) was defined by Hjort (1990) for survival analysis with $\Omega = \mathbb{R}_+$. Thibaux and Jordan (2007) generalized the process to an arbitrary measurable space $\Omega$ by defining a CRM $B$ on a product space $[0, 1] \times \Omega$ with the Lévy measure

$$\nu_{BP}(dp\,d\omega) = c\,p^{-1}(1-p)^{c-1}\,dp\,B_0(d\omega). \tag{4}$$

Here $c > 0$ is a concentration parameter (or concentration function if $c$ is a function of $\omega$), $B_0$ is a continuous finite measure over $\Omega$, called the base measure, and $\alpha = B_0(\Omega)$ is the mass parameter. Since $\nu_{BP}(dp\,d\omega)$ integrates to infinity but satisfies (3), a countably infinite number of i.i.d. random points $\{(p_k, \omega_k)\}_{k=1,\infty}$ are obtained from the Poisson process with mean measure $\nu_{BP}$ and $\sum_{k=1}^{\infty} p_k$ is finite, where the atom $\omega_k \in \Omega$ and its weight $p_k \in [0, 1]$. Therefore, we can express a BP draw, $B \sim \mathrm{BP}(c, B_0)$, as $B = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$, where $\delta_{\omega_k}$ is a unit measure at the atom $\omega_k$. If $B_0$ is discrete (atomic) and of the form $B_0 = \sum_k q_k \delta_{\omega_k}$, then $B = \sum_k p_k \delta_{\omega_k}$ with $p_k \sim \mathrm{Beta}(c q_k, c(1 - q_k))$. If $B_0$ is mixed discrete-continuous, $B$ is the sum of the two independent contributions.

3 The Beta Process and the Negative Binomial Process

Let $B$ be a BP draw as defined in Sec. 2.3, and therefore $B = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$. A Bernoulli process $\mathrm{BeP}(B)$ has atoms appearing at the same locations as those of $B$; it assigns atom $\omega_k$ unit mass with probability $p_k$, and zero mass with probability $1 - p_k$, i.e., Bernoulli. Consequently, each draw from $\mathrm{BeP}(B)$ selects a (finite) subset of the atoms in $B$. This construction is attractive because the beta distribution is a conjugate prior for the Bernoulli distribution. It has diverse applications including document classification (Thibaux and Jordan, 2007), dictionary learning (Zhou et al., 2012, 2009, 2011) and topic modeling (Li et al., 2011).

The beta distribution is also the conjugate prior for the negative binomial distribution parameter $p$, which suggests coupling the beta process with the negative binomial process. Further, for modeling flexibility it is also desirable to place a prior on the negative-binomial parameter $r$, which motivates a marked beta process.

3.2 Model Properties

Assume we already observe $\{X_i\}_{i=1,n}$. Since the beta
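The gamma-Poisson construction of Eq. (1) in Sec. 2.1 can be checked numerically. The sketch below is illustrative only and is not taken from the paper: the function names `nb_pmf` and `gamma_poisson_pmf`, the parameter values, and the midpoint-rule grid settings are all assumptions chosen for the example. It compares the closed-form negative binomial pmf with a direct numerical integration of the Poisson-gamma mixture, and recovers the stated mean $rp/(1-p)$ and variance $rp/(1-p)^2$ by summing the pmf.

```python
import math


def nb_pmf(k, r, p):
    """Closed form of Eq. (1): Gamma(r+k) / (k! Gamma(r)) * (1-p)^r * p^k."""
    log_pmf = (math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log1p(-p) + k * math.log(p))
    return math.exp(log_pmf)


def gamma_poisson_pmf(k, r, p, lam_max=40.0, n_grid=40000):
    """Integral form of Eq. (1), approximated with a midpoint rule.

    The gamma prior has shape r and scale p / (1 - p). The grid settings
    (lam_max, n_grid) are illustrative; the quadrature is only intended
    for r >= 1, where the gamma density is bounded near zero.
    """
    scale = p / (1.0 - p)
    h = lam_max / n_grid
    total = 0.0
    for i in range(n_grid):
        lam = (i + 0.5) * h  # midpoint of the i-th cell, always > 0
        log_pois = -lam + k * math.log(lam) - math.lgamma(k + 1)
        log_gam = ((r - 1.0) * math.log(lam) - lam / scale
                   - math.lgamma(r) - r * math.log(scale))
        total += math.exp(log_pois + log_gam)
    return total * h


def nb_moments(r, p, k_max=500):
    """Mean and variance obtained by summing the pmf over k = 0..k_max."""
    probs = [nb_pmf(k, r, p) for k in range(k_max + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    var = sum((k - mean) ** 2 * q for k, q in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    r, p = 2.5, 0.3  # example values, not from the paper
    for k in range(4):
        print(k, nb_pmf(k, r, p), gamma_poisson_pmf(k, r, p))
    print("moments from pmf:", nb_moments(r, p))
    print("closed form     :", r * p / (1 - p), r * p / (1 - p) ** 2)
```

Since $rp/(1-p)^2 = \text{mean}/(1-p)$, the variance always exceeds the mean for $p \in (0, 1)$, which is the overdispersion property the text refers to.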
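The discrete-base-measure case of Sec. 2.3 and the Bernoulli process of Sec. 3 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a finite atomic base measure $B_0 = \sum_k q_k \delta_{\omega_k}$ with every $q_k \in (0, 1)$, and the helper names (`draw_discrete_bp`, `draw_bernoulli_process`, `posterior_beta_params`) are hypothetical. The last helper applies the standard beta-Bernoulli conjugate update atom by atom, which is the conjugacy the text points to.

```python
import random


def draw_discrete_bp(q, c, rng):
    """Weights of a BP draw with atomic base measure: p_k ~ Beta(c q_k, c (1 - q_k))."""
    return [rng.betavariate(c * qk, c * (1.0 - qk)) for qk in q]


def draw_bernoulli_process(p, rng):
    """One BeP(B) draw: atom k receives unit mass with probability p_k, else zero."""
    return [1 if rng.random() < pk else 0 for pk in p]


def posterior_beta_params(q, c, draws):
    """Conjugate update per atom after n Bernoulli process draws.

    With m_k selections of atom k among n draws, the posterior of p_k is
    Beta(c q_k + m_k, c (1 - q_k) + n - m_k).
    """
    n = len(draws)
    params = []
    for k, qk in enumerate(q):
        m_k = sum(z[k] for z in draws)
        params.append((c * qk + m_k, c * (1.0 - qk) + n - m_k))
    return params


if __name__ == "__main__":
    rng = random.Random(0)     # fixed seed only for reproducibility
    q = [0.1, 0.5, 0.9]        # example base-measure masses (assumed)
    c = 2.0                    # example concentration parameter (assumed)
    p = draw_discrete_bp(q, c, rng)
    draws = [draw_bernoulli_process(p, rng) for _ in range(5)]
    print("weights:", p)
    print("draws  :", draws)
    print("posterior Beta parameters:", posterior_beta_params(q, c, draws))
```

Each binary vector in `draws` plays the role of one "customer" in the IBP metaphor, selecting a subset of the atoms; replacing the Bernoulli draw with a count-valued draw is what the negative binomial process is meant to provide.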
