Beta-Negative Binomial Process and Poisson Factor Analysis

Mingyuan Zhou, Lauren A. Hannah†, David B. Dunson†, Lawrence Carin
Department of ECE, †Department of Statistical Science, Duke University, Durham NC 27708, USA

Appearing in Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS) 2012, La Palma, Canary Islands. Volume XX of JMLR: W&CP XX. Copyright 2012 by the authors.

Abstract

A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a “multi-scoop” generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical structure, and applied as a nonparametric Bayesian prior for an infinite Poisson factor analysis model. A finite approximation for the beta process Lévy random measure is constructed for convenient implementation. Efficient MCMC computations are performed with data augmentation and marginalization techniques. Encouraging results are shown on document count matrix factorization.

1 Introduction

Count data appear in many settings. Problems include predicting future demand for medical care based on past use (Cameron et al., 1988; Deb and Trivedi, 1997), species sampling (National Audubon Society) and topic modeling of document corpora (Blei et al., 2003). Poisson and negative binomial distributions are typical choices for univariate and repeated measures count data; however, multivariate extensions incorporating latent variables (latent counts) are underdeveloped. Latent variable models under the Gaussian assumption, such as principal component analysis and factor analysis, are widely used to discover low-dimensional data structure (Lawrence, 2005; Tipping and Bishop, 1999; West, 2003; Zhou et al., 2009). There has been some work on exponential family latent factor models that incorporate Gaussian latent variables (Dunson, 2000, 2003; Moustaki and Knott, 2000; Sammel et al., 1997), but computation tends to be prohibitive in high-dimensional settings, and the Gaussian assumption is restrictive for count data that are discrete and nonnegative, have limited ranges, and often present overdispersion. In this paper we propose a flexible new nonparametric Bayesian prior to address these problems, the beta-negative binomial (BNB) process.

Using completely random measures (Kingman, 1967), Thibaux and Jordan (2007) generalize the beta process defined on $[0, 1] \times \mathbb{R}_+$ by Hjort (1990) to a general product space $[0, 1] \times \Omega$, and define a Bernoulli process on an atomic beta process hazard measure to model binary outcomes. They further show that the beta-Bernoulli process is the underlying de Finetti mixing distribution for the Indian buffet process (IBP) of Griffiths and Ghahramani (2005). To model count variables, we extend the measure space of the beta process to $[0, 1] \times \mathbb{R}_+ \times \Omega$ and introduce a negative binomial process, leading to the BNB process. We show that the BNB process can be augmented into a beta-gamma-Poisson process, and that this process may be interpreted in terms of a “multi-scoop” IBP. Specifically, each “customer” visits an infinite set of dishes on a buffet line, and rather than simply choosing to select certain dishes off the buffet (as in the IBP), the customer may select multiple scoops of each dish, with the number of scoops controlled by a negative binomial distribution with dish-dependent hyperparameters. As discussed below, the use of a negative binomial distribution for modeling the number of scoops is more general than using a Poisson distribution, as one may control both the mean and variance of the counts, and allow overdispersion. This representation is particularly useful for discrete latent variable models, where each latent feature is not simply present or absent, but contributes a distinct count to each observation.

We use the BNB process to construct an infinite discrete latent variable model called Poisson factor analysis (PFA), where an observed count is linked to its latent parameters with a Poisson distribution. To enhance model flexibility, we place a gamma prior on the Poisson rate parameter, leading to a negative binomial distribution. The BNB process is formulated in a beta-gamma-gamma-Poisson hierarchical structure, with which we construct an infinite PFA model for count matrix factorization. We test PFA with various priors for document count matrix factorization, making connections to previous models; here a latent count assigned to a factor (topic) is the number of times that factor appears in the document.

The contributions of this paper are: 1) an extension of the beta process to a marked space, to produce the beta-negative binomial (BNB) process; 2) efficient inference for the BNB process; and 3) a flexible model for count matrix factorization, which accurately captures topics with diverse characteristics when applied to topic modeling of document corpora.

2 Preliminaries

2.1 Negative Binomial Distribution

The Poisson distribution $X \sim \mathrm{Pois}(\lambda)$ is commonly used for modeling count data. It has the probability mass function $f_X(k) = e^{-\lambda}\lambda^k / k!$, where $k \in \{0, 1, \ldots\}$, with both the mean and variance equal to $\lambda$. A gamma distribution with shape $r$ and scale $p/(1-p)$ can be placed as a prior on $\lambda$ to produce a negative binomial (a.k.a. gamma-Poisson) distribution as

$$f_X(k) = \int_0^\infty \mathrm{Pois}(k; \lambda)\, \mathrm{Gamma}\big(\lambda; r, p/(1-p)\big)\, d\lambda = \frac{\Gamma(r+k)}{k!\,\Gamma(r)} (1-p)^r p^k \tag{1}$$

where $\Gamma(\cdot)$ denotes the gamma function. Parameterized by $r > 0$ and $p \in (0, 1)$, this distribution $X \sim \mathrm{NB}(r, p)$ has a variance $rp/(1-p)^2$ larger than the mean $rp/(1-p)$, and thus it is usually favored for modeling overdispersed count data. More detailed discussions about the negative binomial and related distributions and the corresponding stochastic processes defined on $\mathbb{R}_+$ can be found in Kozubowski and Podgórski (2009) and Barndorff-Nielsen et al. (2010).

2.2 Lévy Random Measures

Beta-negative binomial processes are created using Lévy random measures. Following Wolpert et al. (2011), for any $\nu^+ \ge 0$ and any probability distribution $\pi(dp\,d\omega)$ on $\mathbb{R} \times \Omega$, let $K \sim \mathrm{Pois}(\nu^+)$ and $\{(p_k, \omega_k)\}_{1 \le k \le K} \overset{iid}{\sim} \pi(dp\,d\omega)$. Defining $\mathbf{1}_A(\omega_k)$ as being one if $\omega_k \in A$ and zero otherwise, the random measure $\mathcal{L}(A) \equiv \sum_{k=1}^{K} \mathbf{1}_A(\omega_k)\, p_k$ assigns independent infinitely divisible random variables $\mathcal{L}(A_i)$ to disjoint Borel sets $A_i \subset \Omega$, with characteristic functions

$$\mathbb{E}\big[e^{it\mathcal{L}(A)}\big] = \exp\left\{ \iint_{\mathbb{R} \times A} (e^{itp} - 1)\, \nu(dp\,d\omega) \right\} \tag{2}$$

with $\nu(dp\,d\omega) \equiv \nu^+ \pi(dp\,d\omega)$. A random signed measure $\mathcal{L}$ satisfying (2) is called a Lévy random measure. More generally, if the Lévy measure $\nu(dp\,d\omega)$ satisfies

$$\iint_{\mathbb{R} \times S} (1 \wedge |p|)\, \nu(dp\,d\omega) < \infty \tag{3}$$

for each compact $S \subset \Omega$, it need not be finite for the Lévy random measure $\mathcal{L}$ to be well defined; the notation $1 \wedge |p|$ denotes $\min\{1, |p|\}$. A nonnegative Lévy random measure $\mathcal{L}$ satisfying (3) was called a completely random measure (CRM) by Kingman (1967, 1993) and an additive random measure by Çinlar (2011). It was introduced for machine learning by Thibaux and Jordan (2007) and Jordan (2010).

2.3 Beta Process

The beta process (BP) was defined by Hjort (1990) for survival analysis with $\Omega = \mathbb{R}_+$. Thibaux and Jordan (2007) generalized the process to an arbitrary measurable space $\Omega$ by defining a CRM $B$ on a product space $[0, 1] \times \Omega$ with the Lévy measure

$$\nu_{\mathrm{BP}}(dp\,d\omega) = c\, p^{-1} (1-p)^{c-1}\, dp\, B_0(d\omega). \tag{4}$$

Here $c > 0$ is a concentration parameter (or concentration function if $c$ is a function of $\omega$), $B_0$ is a continuous finite measure over $\Omega$, called the base measure, and $\alpha = B_0(\Omega)$ is the mass parameter. Since $\nu_{\mathrm{BP}}(dp\,d\omega)$ integrates to infinity but satisfies (3), a countably infinite number of i.i.d. random points $\{(p_k, \omega_k)\}_{k=1,\infty}$ are obtained from the Poisson process with mean measure $\nu_{\mathrm{BP}}$, and $\sum_{k=1}^{\infty} p_k$ is finite, where the atom $\omega_k \in \Omega$ and its weight $p_k \in [0, 1]$. Therefore, we can express a BP draw, $B \sim \mathrm{BP}(c, B_0)$, as $B = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$, where $\delta_{\omega_k}$ is a unit measure at the atom $\omega_k$. If $B_0$ is discrete (atomic) and of the form $B_0 = \sum_k q_k \delta_{\omega_k}$, then $B = \sum_k p_k \delta_{\omega_k}$ with $p_k \sim \mathrm{Beta}(c q_k, c(1 - q_k))$. If $B_0$ is mixed discrete-continuous, $B$ is the sum of the two independent contributions.

3 The Beta Process and the Negative Binomial Process

Let $B$ be a BP draw as defined in Sec. 2.3, and therefore $B = \sum_{k=1}^{\infty} p_k \delta_{\omega_k}$. A Bernoulli process $\mathrm{BeP}(B)$ has atoms appearing at the same locations as those of $B$; it assigns atom $\omega_k$ unit mass with probability $p_k$, and zero mass with probability $1 - p_k$, i.e., Bernoulli. Consequently, each draw from $\mathrm{BeP}(B)$ selects a (finite) subset of the atoms in $B$. This construction is attractive because the beta distribution is a conjugate prior for the Bernoulli distribution. It has diverse applications including document classification (Thibaux and Jordan, 2007), dictionary learning (Zhou et al., 2009, 2011, 2012) and topic modeling (Li et al., 2011). The beta distribution is also the conjugate prior for the negative binomial distribution parameter $p$, which suggests coupling the beta process with the negative binomial process. Further, for modeling flexibility it is also desirable to place a prior on the negative binomial parameter $r$, which motivates a marked beta process.
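The gamma-Poisson construction in Eq. (1) and the beta conjugacy for the negative binomial parameter p noted in Sec. 3 can be checked numerically. The following is a minimal sketch, not the paper's inference code: the function names are hypothetical, the Poisson sampler is Knuth's multiplication method (chosen only to stay within the Python standard library), r is treated as known, and the Beta(1, 1) prior is an illustrative assumption. Under the pmf in Eq. (1), a Beta(a, b) prior on p combined with n observed counts k_1, ..., k_n gives a Beta(a + sum of k_i, b + n r) posterior.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Pois(lam) with Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_nb(r, p, rng):
    """Draw X ~ NB(r, p) as a gamma-Poisson mixture, following Eq. (1)."""
    # random.gammavariate takes (shape, scale); the scale here is p / (1 - p).
    lam = rng.gammavariate(r, p / (1.0 - p))
    return sample_poisson(lam, rng)


def beta_posterior_for_p(counts, r, a, b):
    """Posterior Beta parameters for p under a Beta(a, b) prior, with r assumed known.

    The likelihood of each count k is proportional to (1 - p)^r * p^k,
    so the posterior is Beta(a + sum(counts), b + r * len(counts)).
    """
    return a + sum(counts), b + r * len(counts)


if __name__ == "__main__":
    rng = random.Random(0)
    r, p = 2.0, 0.6  # illustrative values, not taken from the paper
    counts = [sample_nb(r, p, rng) for _ in range(20000)]

    mean = sum(counts) / len(counts)
    var = sum((x - mean) ** 2 for x in counts) / (len(counts) - 1)
    print("empirical mean:", mean, "theory rp/(1-p):", r * p / (1 - p))
    print("empirical var: ", var, "theory rp/(1-p)^2:", r * p / (1 - p) ** 2)

    a_post, b_post = beta_posterior_for_p(counts, r, a=1.0, b=1.0)
    print("posterior mean of p:", a_post / (a_post + b_post))
```

With enough draws, the empirical variance should exceed the empirical mean (overdispersion), and the posterior mean of p should concentrate near the value used to simulate the counts.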
