1 Negative Binomial Process Count and Mixture Modeling

Mingyuan Zhou and Lawrence Carin, Fellow, IEEE

Abstract—The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random measure for mixture modeling and whose marginalization leads to an NB process for count modeling. A draw from the NB process consists of a Poisson distributed finite number of distinct atoms, each of which is associated with a logarithmic distributed number of data samples. We reveal relationships between various count- and mixture-modeling distributions and construct a Poisson-logarithmic bivariate distribution that connects the NB and Chinese restaurant table distributions. Fundamental properties of the models are developed, and we derive efficient Bayesian inference. It is shown that with augmentation and normalization, the NB process and gamma-NB process can be reduced to the Dirichlet process and hierarchical Dirichlet process, respectively. These relationships highlight theoretical, structural and computational advantages of the NB process. A variety of NB processes, including the beta-geometric, beta-NB, marked-beta-NB, marked-gamma-NB and zero-inflated-NB processes, with distinct sharing mechanisms, are also constructed. These models are applied to topic modeling, with connections made to existing algorithms under Poisson factor analysis. Example results show the importance of inferring both the NB dispersion and probability parameters.

Index Terms—Beta process, Chinese restaurant process, completely random measures, count modeling, Dirichlet process, gamma process, hierarchical Dirichlet process, mixed-membership modeling, mixture modeling, negative binomial process, normalized random measures, Poisson factor analysis, Poisson process, topic modeling.

• M. Zhou is with the Department of Information, Risk, and Operations Management, McCombs School of Business, University of Texas at Austin, Austin, TX 78712. L. Carin is with the Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708.

arXiv:1209.3442v3 [stat.ME] 13 Oct 2013

1 INTRODUCTION

Count data appear in many settings, such as predicting the number of motor insurance claims [1], [2], analyzing infectious diseases [3] and modeling topics of document corpora [4]–[8]. There has been increasing interest in count modeling using the Poisson process [9]–[13] and recently the negative binomial (NB) process [8], [14], [15]. It is shown in [8] and further demonstrated in [15] that the NB process, originally constructed for count analysis, can be naturally applied for mixture modeling of grouped data x_1, ..., x_J, where each group x_j = {x_ji}_{i=1,N_j}. For example, in topic modeling (mixed-membership modeling), each document consists of a group of exchangeable words and each word is a group member that is assigned to a topic; the number of times a topic appears in a document is a latent count that could be well modeled with an NB distribution [8], [15].

Mixture modeling, which infers random probability measures to assign data samples into clusters (mixture components), is a key research area of statistics and machine learning. Although the numbers of samples assigned to clusters are counts, mixture modeling is not typically considered as a count-modeling problem. It is often addressed under the Dirichlet-multinomial framework, using the Dirichlet process [16]–[21] as the prior distribution. With the Dirichlet-multinomial conjugacy, the Dirichlet process mixture model enjoys tractability because the posterior of the random probability measure is still a Dirichlet process. Despite its popularity, the Dirichlet process is inflexible in that a single concentration parameter controls both the variability of the mass around the mean [21], [22] and the distribution of the number of distinct atoms [18], [23]. For mixture modeling of grouped data, the hierarchical Dirichlet process (HDP) [24] has been further proposed to share statistical strength between groups. The HDP inherits the same inflexibility of the Dirichlet process, and due to the non-conjugacy between Dirichlet processes, its inference has to be solved under alternative constructions, such as the Chinese restaurant franchise and stick-breaking representations [24]–[26]. To make the number of distinct atoms increase at a rate faster than that of the Dirichlet process, one may consider the Pitman-Yor process [27], [28] or the normalized generalized gamma process [23], which provide extra parameters to add flexibility.

To construct more expressive mixture models with tractable inference, in this paper we consider mixture modeling as a count-modeling problem. Directly modeling the counts assigned to mixture components as NB random variables, we perform joint count and mixture modeling via the NB process, using completely random measures [9], [22], [29], [30] that are easy to construct and amenable to posterior computation. By constructing a bivariate count distribution that connects the Poisson, logarithmic, NB and Chinese restaurant table distributions, we develop data augmentation and marginalization techniques unique to the NB distribution, with which we augment an NB process into both the gamma-Poisson and compound Poisson representations, yielding unification of count and mixture modeling, derivation of fundamental model properties, as well as efficient Bayesian inference.

Under the NB process, we employ a gamma process to model the rate measure of a Poisson process. The normalization of the gamma process provides a random probability measure (not necessarily a Dirichlet process) for mixture modeling, and the marginalization of the gamma process leads to an NB process for count modeling. Since the gamma scale parameters appear as NB probability parameters when the gamma processes are marginalized out, they directly control count distributions on atoms and they can be conveniently inferred with the beta-NB conjugacy. For mixture modeling of grouped data, we construct hierarchical models by employing an NB process for each group and sharing their NB dispersion or probability measures across groups. Different parameterizations of the NB dispersion and probability parameters result in a wide variety of NB processes, which are connected to previously proposed nonparametric Bayesian mixture models. The proposed joint count and mixture modeling framework provides new opportunities for better data fitting, efficient inference and flexible model constructions.

1.1 Related Work

Parts of the work presented here first appeared in [2], [8], [15]. In this paper, we unify related materials scattered in these three conference papers and provide significant expansions. In particular, we construct a Poisson-logarithmic bivariate distribution that tightly connects the NB and Chinese restaurant table distributions, extending the Chinese restaurant process to describe the case that both the numbers of customers and tables are random variables, and we provide necessary conditions to recover the NB process and the gamma-NB process from the Dirichlet process and HDP, respectively. We mention that a related beta-NB process has been independently investigated in [14]. Our constructions of a wide variety of NB processes, including the beta-NB processes in [8] and [14] as special cases, are built on our thorough investigation of the properties, relationships and inference of the NB and related stochastic processes. In particular, we show that the gamma-Poisson construction of the NB process is key to uniting count and mixture modeling, and there are two equivalent augmentations of the NB process that allow us to develop analytic conditional posteriors and predictive distributions. These insights are not provided in [14], and the NB dispersion parameters there are empirically set rather than inferred. More distinctions will be discussed along with specific models.

The remainder of the paper is organized as follows. We review some commonly used nonparametric Bayesian priors in Section 2 and study the NB distribution in Section 3. We present the NB process in Section 4, the gamma-NB process in Section 5, and the NB process family in Section 6. We discuss NB process topic modeling in Section 7 and present example results in Section 8.

2 PRELIMINARIES

2.1 Completely Random Measures

Following [22], for any ν_+ ≥ 0 and any probability distribution π(dp dω) on the product space R × Ω, let K_+ ∼ Pois(ν_+) and (p_k, ω_k) ∼iid π(dp dω) for k = 1, ..., K_+. Defining 1_A(ω_k) as being one if ω_k ∈ A and zero otherwise, the random measure L(A) ≡ ∑_{k=1}^{K_+} 1_A(ω_k) p_k assigns independent infinitely divisible random variables L(A_i) to disjoint Borel sets A_i ⊂ Ω, with characteristic functions

E[e^{itL(A)}] = exp{ ∫∫_{R×A} (e^{itp} − 1) ν(dp dω) },   (1)

where ν(dp dω) ≡ ν_+ π(dp dω). A random signed measure L satisfying (1) is called a Lévy random measure. More generally, if the Lévy measure ν(dp dω) satisfies

∫∫_{R×S} min{1, |p|} ν(dp dω) < ∞   (2)

for each compact S ⊂ Ω, the Lévy random measure L is well defined, even if the Poisson intensity ν_+ is infinite. A nonnegative Lévy random measure L satisfying (2) was called a completely random measure [9], [29], and it was introduced to machine learning in [31] and [30].

2.1.1 Poisson Process

Define a Poisson process X ∼ PP(G_0) on the product space Z_+ × Ω, where Z_+ = {0, 1, ...}, with a finite and continuous base measure G_0 over Ω, such that X(A) ∼ Pois(G_0(A)) for each subset A ⊂ Ω. The Lévy measure of the Poisson process can be derived from (1) as ν(du dω) = δ_1(du) G_0(dω), where δ_1(du) is a unit point mass at u = 1. If G_0 is discrete (atomic) as G_0 = ∑_k λ_k δ_{ω_k}, then the Poisson process definition is still valid in that X = ∑_k x_k δ_{ω_k}, x_k ∼ Pois(λ_k). If G_0 is mixed discrete-continuous, then X is the sum of two independent contributions. Except where otherwise specified, below we consider the base measure to be finite and continuous.

2.1.2 Gamma Process

We define a gamma process [10], [22] G ∼ GaP(c, G_0) on the product space R_+ × Ω, where R_+ = {x : x ≥ 0}, with scale parameter 1/c and base measure G_0, such that G(A) ∼ Gamma(G_0(A), 1/c) for each subset A ⊂ Ω, where Gamma(λ; a, b) = λ^{a−1} e^{−λ/b} / (Γ(a) b^a). The gamma process is a completely random measure, whose Lévy measure can be derived from (1) as ν(dr dω) = r^{−1} e^{−cr} dr G_0(dω). Since the Poisson intensity ν_+ = ν(R_+ × Ω) = ∞ and ∫∫_{R_+×Ω} r ν(dr dω) is finite, there are countably infinite atoms and a draw from the gamma process can be expressed as

G = ∑_{k=1}^∞ r_k δ_{ω_k},  (r_k, ω_k) ∼iid π(dr dω),

where π(dr dω) ν_+ ≡ ν(dr dω).

2.1.3 Beta Process

The beta process was defined by [32] for survival analysis with Ω = R_+. Thibaux and Jordan [31] modified the process by defining B ∼ BP(c, B_0) on the product space [0, 1] × Ω, with Lévy measure ν(dp dω) = c p^{−1}(1 − p)^{c−1} dp B_0(dω), where c > 0 is a concentration parameter and B_0 is a base measure. Since the Poisson intensity ν_+ = ν([0, 1] × Ω) = ∞ and ∫∫_{[0,1]×Ω} p ν(dp dω) is finite, there are countably infinite atoms and a draw from the beta process can be expressed as

B = ∑_{k=1}^∞ p_k δ_{ω_k},  (p_k, ω_k) ∼iid π(dp dω),

where π(dp dω) ν_+ ≡ ν(dp dω).

2.2 Dirichlet and Chinese Restaurant Processes

2.2.1 Dirichlet Process

Denote G̃ = G/G(Ω), where G ∼ GaP(c, G_0); then for any measurable disjoint partition A_1, ..., A_Q of Ω, we have [G̃(A_1), ..., G̃(A_Q)] ∼ Dir(γ_0 G̃_0(A_1), ..., γ_0 G̃_0(A_Q)), where γ_0 = G_0(Ω) and G̃_0 = G_0/γ_0. Therefore, with a space invariant scale parameter 1/c, the normalized gamma process G̃ = G/G(Ω) is a Dirichlet process [16], [33] with concentration parameter γ_0 and base probability measure G̃_0, expressed as G̃ ∼ DP(γ_0, G̃_0). Unlike the gamma process, the Dirichlet process is no longer a completely random measure as the random variables {G̃(A_q)} for disjoint sets {A_q} are negatively correlated.

A gamma process with a space invariant scale parameter can also be recovered from a Dirichlet process: if a gamma random variable α ∼ Gamma(γ_0, 1/c) and a Dirichlet process G̃ ∼ DP(γ_0, G̃_0) are independent with γ_0 = G_0(Ω) and G̃_0 = G_0/γ_0, then G = α G̃ becomes a gamma process as G ∼ GaP(c, G_0).

2.2.2 Chinese Restaurant Process

In a Dirichlet process G̃ ∼ DP(γ_0, G̃_0), we assume X_i ∼ G̃; {X_i} are independent given G̃ and hence exchangeable. The predictive distribution of a new data sample X_{m+1}, conditioning on X_1, ..., X_m, with G̃ marginalized out, can be expressed as

X_{m+1} | X_1, ..., X_m ∼ E[G̃ | X_1, ..., X_m] = ∑_{k=1}^K n_k/(m+γ_0) δ_{ω_k} + γ_0/(m+γ_0) G̃_0,   (3)

where {ω_k}_{1,K} are distinct atoms in Ω observed in X_1, ..., X_m and n_k = ∑_{i=1}^m X_i(ω_k) is the number of data samples associated with ω_k. The process described in (3) is known as the Pólya urn scheme [34] and also the Chinese restaurant process [24], [35], [36]. The number of nonempty tables l in a Chinese restaurant process, with concentration parameter γ_0 and m customers, is a random variable generated as l = ∑_{n=1}^m b_n, b_n ∼ Bernoulli(γ_0/(n−1+γ_0)). This random variable is referred to as the Chinese restaurant table (CRT) random variable l ∼ CRT(m, γ_0). As shown in [15], [17], [18], [24], it has probability mass function (PMF)

f_L(l | m, γ_0) = Γ(γ_0)/Γ(m+γ_0) |s(m, l)| γ_0^l,  l = 0, 1, ..., m,

where s(m, l) are Stirling numbers of the first kind.

3 NEGATIVE BINOMIAL DISTRIBUTION

The Poisson distribution m ∼ Pois(λ) is commonly used to model count data, with PMF f_M(m) = λ^m e^{−λ}/m!, m ∈ Z_+. Its mean and variance are both equal to λ. Due to heterogeneity (difference between individuals) and contagion (dependence between the occurrence of events), count data are usually overdispersed in that the variance is greater than the mean, making the Poisson assumption restrictive. By placing a gamma prior with shape r and scale p/(1−p) on λ as m ∼ Pois(λ), λ ∼ Gamma(r, p/(1−p)) and marginalizing out λ, an NB distribution m ∼ NB(r, p) is obtained, with PMF

f_M(m | r, p) = Γ(r+m)/(m! Γ(r)) (1 − p)^r p^m,  m ∈ Z_+,

where r is the nonnegative dispersion parameter and p is the probability parameter. Thus the NB distribution is also known as the gamma-Poisson mixture distribution [37]. It has a mean μ = rp/(1 − p) smaller than the variance σ^2 = rp/(1 − p)^2 = μ + r^{−1}μ^2, with the variance-to-mean ratio (VMR) equal to (1−p)^{−1} and the overdispersion level (ODL, the coefficient of the quadratic term in σ^2) equal to r^{−1}, and thus it is usually favored over the Poisson distribution for modeling overdispersed counts.

As shown in [38], m ∼ NB(r, p) can also be generated from a compound Poisson distribution as

m = ∑_{t=1}^l u_t,  u_t ∼iid Log(p),  l ∼ Pois(−r ln(1 − p)),

where u_t ∼ Log(p) corresponds to the logarithmic distribution [38], [39] with PMF f_U(k) = −p^k/[k ln(1−p)], k = 1, 2, ..., and probability-generating function (PGF) C_U(z) = ln(1 − pz)/ln(1 − p), |z| < p^{−1}. One may also show that lim_{r→∞} NB(r, λ/(λ+r)) = Pois(λ), and conditioning on m > 0, m ∼ NB(r, p) becomes m ∼ Log(p) as r → 0.

The NB distribution has been widely investigated and applied to numerous scientific studies [40]–[43]. Although inference of the NB probability parameter p is straightforward with the beta-NB conjugacy, inference of the NB dispersion parameter r, whose conjugate prior is unknown, has long been a challenge. The maximum likelihood (ML) approach is commonly used to estimate r; however, it only provides a point estimate and does not allow incorporating prior information; moreover, the ML estimator of r often lacks robustness and may be severely biased or even fail to converge, especially if the sample size is small [3], [44]–[48]. Bayesian approaches are able to model the uncertainty of estimation and incorporate prior information; however, the only available closed-form Bayesian inference for r relies on approximating the ratio of two gamma functions [49].
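The two equivalent constructions above are easy to verify numerically. The following sketch is our own illustration in NumPy (nothing here comes from the paper): it draws m ∼ NB(r, p) either as a gamma-Poisson mixture or as a compound Poisson sum of logarithmic variables, and draws the CRT random variable l ∼ CRT(m, γ0) by summing the Bernoulli indicators defined above.

```python
import numpy as np

rng = np.random.default_rng(0)

def nb_gamma_poisson(r, p, size=1):
    # m ~ NB(r, p) as a gamma-Poisson mixture:
    # lambda ~ Gamma(shape=r, scale=p/(1-p)), then m ~ Pois(lambda)
    lam = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(lam)

def nb_compound_poisson(r, p, size=1):
    # m ~ NB(r, p) as a compound Poisson sum:
    # l ~ Pois(-r*log(1-p)), then m = u_1 + ... + u_l with u_t ~ Log(p)
    l = rng.poisson(-r * np.log(1.0 - p), size=size)
    return np.array([rng.logseries(p, size=li).sum() if li > 0 else 0 for li in l])

def crt(m, gamma0):
    # l ~ CRT(m, gamma0): l = sum_{n=1}^{m} b_n, b_n ~ Bernoulli(gamma0/(n-1+gamma0))
    if m == 0:
        return 0
    n = np.arange(1, m + 1)
    return int(rng.binomial(1, gamma0 / (n - 1.0 + gamma0)).sum())

r, p = 2.5, 0.4  # both routes share the NB mean r*p/(1-p) and VMR 1/(1-p)
print(nb_gamma_poisson(r, p, 10000).mean(), nb_compound_poisson(r, p, 10000).mean())
print(crt(50, 1.0))
```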

3.1 Poisson-Logarithmic Bivariate Distribution

Advancing previous research on the NB distribution in [2], [15], we construct a Poisson-logarithmic bivariate distribution that assists Bayesian inference of the NB distribution and unites various count distributions.

Theorem 1 (Poisson-logarithmic). The Poisson-logarithmic (PoisLog) bivariate distribution with PMF

f_{M,L}(m, l | r, p) = |s(m, l)| r^l / m! (1 − p)^r p^m,   (4)

where m ∈ Z_+ and l = 0, 1, ..., m, can be expressed as a Chinese restaurant table (CRT) and negative binomial (NB) joint distribution and also as a sum-logarithmic and Poisson joint distribution:

PoisLog(m, l; r, p) = CRT(l; m, r) NB(m; r, p) = SumLog(m; l, p) Pois(l; −r ln(1 − p)),

where SumLog(m; l, p) denotes the sum-logarithmic distribution generated as m = ∑_{t=1}^l u_t, u_t ∼iid Log(p).

The proof of Theorem 1 is provided in Appendix A. As shown in Fig. 1, this bivariate distribution intuitively describes the joint distribution of two count random variables m and l under two equivalent circumstances:
• 1) There are m ∼ NB(r, p) customers seated at l ∼ CRT(m, r) tables;
• 2) There are l ∼ Pois(−r ln(1 − p)) tables, each of which with u_t ∼ Log(p) customers, with m = ∑_{t=1}^l u_t customers in total.

Fig. 1. The Poisson-logarithmic bivariate distribution models the total numbers of customers and tables as random variables. As shown in Theorem 1, it has two equivalent representations, which connect the Poisson, logarithmic, and negative binomial distributions and the Chinese restaurant process: draw NegBino(r, p) customers and assign them to tables using a Chinese restaurant process with concentration parameter r; equivalently, draw Poisson(−r ln(1−p)) tables and draw Logarithmic(p) customers on each table. The joint distributions of the customer count and table count are equivalent.

In a slight abuse of notation, but for added conciseness, in the following discussion we use m ∼ ∑_{t=1}^l Log(p) to denote m = ∑_{t=1}^l u_t, u_t ∼iid Log(p).

Corollary 2. Let m ∼ NB(r, p), r ∼ Gamma(r_1, 1/c_1) represent a gamma-NB mixture distribution. It can be augmented as m ∼ ∑_{t=1}^l Log(p), l ∼ Pois(−r ln(1 − p)), r ∼ Gamma(r_1, 1/c_1). Marginalizing out r leads to

m ∼ ∑_{t=1}^l Log(p),  l ∼ NB(r_1, p′),  p′ := −ln(1−p) / (c_1 − ln(1−p)),

where the latent count l ∼ NB(r_1, p′) can be augmented as

l ∼ ∑_{t′=1}^{l′} Log(p′),  l′ ∼ Pois(−r_1 ln(1 − p′)),

which, using Theorem 1, is equivalent in distribution to

l′ ∼ CRT(l, r_1),  l ∼ NB(r_1, p′).

The connections between various distributions shown in Theorem 1 and Corollary 2 are key ingredients of this paper, which not only allow us to derive efficient inference, but also, as shown below, let us examine the posteriors to understand fundamental properties of various NB processes, clearly revealing connections to previous nonparametric Bayesian mixture models, including those based on the Dirichlet process, HDP and beta-NB processes.
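Theorem 1 can also be checked empirically in a few lines. The sketch below is ours (not part of the paper): it draws (m, l) through the CRT-NB route and through the sum-logarithmic-Poisson route and compares empirical moments of the joint distribution, which should agree up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
r, p, n = 2.0, 0.6, 50000

# Route 1: m ~ NB(r, p), then l ~ CRT(m, r)
m1 = rng.negative_binomial(r, 1.0 - p, size=n)   # NumPy's success probability is 1-p here
l1 = np.array([int(rng.binomial(1, r / (np.arange(m) + r)).sum()) for m in m1])

# Route 2: l ~ Pois(-r*log(1-p)), then m = sum of l iid Log(p) variables
l2 = rng.poisson(-r * np.log(1.0 - p), size=n)
m2 = np.array([rng.logseries(p, size=l).sum() if l > 0 else 0 for l in l2])

for name, m, l in [("CRT o NB", m1, l1), ("SumLog o Pois", m2, l2)]:
    print(name, m.mean(), l.mean(), np.mean(m * l))
```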

4 JOINT COUNT AND MIXTURE MODELING

In this Section, we first show the connections between the Poisson and multinomial processes, and then we place a gamma process prior on the Poisson rate measure for joint count and mixture modeling. This construction can be reduced to the Dirichlet process, and its restrictions for modeling grouped data are further discussed.

4.1 Poisson and Multinomial Processes

Corollary 3. Let X ∼ PP(G) be a Poisson process defined on a completely random measure G such that X(A) ∼ Pois(G(A)) for each subset A ⊂ Ω. Define Y ∼ MP(Y(Ω), G/G(Ω)) as a multinomial process, with total count Y(Ω) ∼ Pois(G(Ω)) and random probability measure G/G(Ω), such that (Y(A_1), ..., Y(A_Q)) ∼ Mult(Y(Ω); G(A_1)/G(Ω), ..., G(A_Q)/G(Ω)) for any disjoint partition {A_q}_{1,Q} of Ω. According to Lemma 4.1 of [8], X(A) and Y(A) would have the same Poisson distribution for each A ⊂ Ω, thus X and Y are equivalent in distribution.

Using Corollary 3, we illustrate how the seemingly distinct problems of count and mixture modeling can be united under the Poisson process. For each A ⊂ Ω, denote X_j(A) as a count random variable describing the number of observations in x_j that reside within A. Given grouped data x_1, ..., x_J, for any measurable disjoint partition A_1, ..., A_Q of Ω, we aim to jointly model count random variables {X_j(A_q)}. A natural choice would be to define a Poisson process X_j ∼ PP(G) with a shared completely random measure G on Ω, such that X_j(A) ∼ Pois(G(A)) for each A ⊂ Ω and G(Ω) = ∑_{q=1}^Q G(A_q).

Following Corollary 3, with G̃ = G/G(Ω), letting X_j ∼ PP(G) is equivalent to letting

X_j ∼ MP(X_j(Ω), G̃),  X_j(Ω) ∼ Pois(G(Ω)).

Thus the Poisson process provides not only a way to generate independent counts from each A_q, but also a mechanism for mixture modeling, which allocates the X_j(Ω) observations into any measurable disjoint partition {A_q}_{1,Q} of Ω, conditioning on the normalized random measure G̃.

4.2 Gamma-Poisson Process and Negative Binomial Process

To complete the Poisson process, it is natural to place a gamma process prior on the Poisson rate measure G as

X_j ∼ PP(G),  j = 1, ..., J;  G ∼ GaP(J(1 − p)/p, G_0).   (5)

For a distinct atom ω_k, we have n_jk ∼ Pois(r_k), where n_jk = X_j(ω_k) and r_k = G(ω_k). Marginalizing out G of the gamma-Poisson process leads to an NB process

X = ∑_{j=1}^J X_j ∼ NBP(G_0, p)

in which X(A) ∼ NB(G_0(A), p) for each A ⊂ Ω. Since E[e^{iuX(A)}] = exp{G_0(A)(ln(1 − p) − ln(1 − pe^{iu}))} = exp{G_0(A) ∑_{m=1}^∞ (e^{ium} − 1) p^m/m}, the Lévy measure of the NB process can be derived from (1) as ν(dn dω) = ∑_{m=1}^∞ (p^m/m) δ_m(dn) G_0(dω).

With ν_+ = ν(Z_+ × Ω) = −γ_0 ln(1 − p), a draw from the NB process consists of a finite number of distinct atoms almost surely and the number of samples on each of them follows a logarithmic distribution, expressed as

X = ∑_{k=1}^{K_+} n_k δ_{ω_k},  K_+ ∼ Pois(−γ_0 ln(1 − p)),  (n_k, ω_k) ∼iid Log(n_k; p) g_0(ω_k),  k = 1, ..., K_+,   (6)

where g_0(dω) := G_0(dω)/γ_0. Thus the NB probability parameter p plays a critical role in count and mixture modeling as it directly controls the prior distributions of the number of distinct atoms K_+ ∼ Pois(−γ_0 ln(1 − p)), the number of samples at each of these atoms n_k ∼ Log(p), and the total number of samples X(Ω) ∼ NB(γ_0, p). However, its value would become irrelevant if one directly works with the normalization of G, as commonly used in conventional mixture modeling.

Define L ∼ CRTP(X, G_0) as a CRT process such that

L(A) = ∑_{ω∈A} L(ω),  L(ω) ∼ CRT(X(ω), G_0(ω))

for each A ⊂ Ω. Under the Chinese restaurant process metaphor, X(A) and L(A) represent the customer count and table count, respectively, observed in each A ⊂ Ω. A direct generalization of Theorem 1 leads to:

Corollary 4. The NB process X ∼ NBP(G_0, p) augmented under its compound Poisson representation as

X ∼ ∑_{t=1}^L Log(p),  L ∼ PP(−G_0 ln(1 − p))

is equivalent in distribution to

L ∼ CRTP(X, G_0),  X ∼ NBP(G_0, p).

4.3 Posterior Analysis and Predictive Distribution

Imposing a gamma prior Gamma(e_0, 1/f_0) on γ_0 and a beta prior Beta(a_0, b_0) on p, using conjugacy, we have all conditional posteriors in closed form as

(G | X, p, G_0) ∼ GaP(J/p, G_0 + X)
(p | X, G_0) ∼ Beta(a_0 + X(Ω), b_0 + γ_0)
(L | X, G_0) ∼ CRTP(X, G_0)
(γ_0 | L, p) ∼ Gamma(e_0 + L(Ω), 1/(f_0 − ln(1−p))).   (7)

If the base measure G_0 is finite and continuous, then G_0(ω) → 0 and we have L(ω) ∼ CRT(X(ω), G_0(ω)) = δ(X(ω) > 0) and thus L(Ω) = ∑_{ω∈Ω} δ(X(ω) > 0), i.e., the number of nonempty tables L(Ω) is equal to K_+, the number of distinct atoms. The gamma-Poisson process is also well defined with a discrete base measure G_0 = ∑_{k=1}^K (γ_0/K) δ_{ω_k}, for which we have L = ∑_{k=1}^K l_k δ_{ω_k}, l_k ∼ CRT(X(ω_k), γ_0/K), and hence it is possible that l_k > 1 if X(ω_k) > 1, which means L(Ω) ≥ K_+. As the data {x_ji}_i are exchangeable within group j, conditioning on X^{−ji} = X\X_ji and G_0, with G marginalized out, we have

X_ji | G_0, X^{−ji} ∼ E[G | G_0, p, X^{−ji}] / E[G(Ω) | G_0, p, X^{−ji}] = G_0/(γ_0 + X(Ω) − 1) + X^{−ji}/(γ_0 + X(Ω) − 1).   (8)

This prediction rule is similar to that of the Chinese restaurant process described in (3).
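For illustration only (this is our sketch, not the authors' code), the closed-form conditionals in (7) translate into a short blocked Gibbs loop once the base measure is truncated to K atoms with G_0(ω_k) = γ_0/K.

```python
import numpy as np

rng = np.random.default_rng(2)

def crt(m, a):
    """Sample l ~ CRT(m, a) by summing Bernoulli indicators."""
    m = int(m)
    if m == 0:
        return 0
    return int(rng.binomial(1, a / (np.arange(m) + a)).sum())

def nb_process_gibbs(counts, n_iter=500, a0=0.01, b0=0.01, e0=0.01, f0=0.01):
    """Blocked Gibbs for the truncated NB (gamma-Poisson) process of Section 4.

    counts: (J, K) array of n_jk, the number of group-j samples at atom k."""
    J, K = counts.shape
    x = counts.sum(0)            # X(omega_k), total count at each atom
    x_total = x.sum()            # X(Omega)
    gamma0 = 1.0
    for _ in range(n_iter):
        # (p | X, G0) ~ Beta(a0 + X(Omega), b0 + gamma0)
        p = rng.beta(a0 + x_total, b0 + gamma0)
        # (L | X, G0) ~ CRTP(X, G0) with G0(omega_k) = gamma0/K
        l = np.array([crt(xk, gamma0 / K) for xk in x])
        # (gamma0 | L, p) ~ Gamma(e0 + L(Omega), 1/(f0 - ln(1-p)))
        gamma0 = rng.gamma(e0 + l.sum(), 1.0 / (f0 - np.log(1.0 - p)))
        # (G | X, p, G0) ~ GaP(J/p, G0 + X): atom weights r_k, gamma scale p/J
        r = rng.gamma(gamma0 / K + x, p / J)
    return p, gamma0, r

# toy usage on synthetic counts
demo = rng.poisson(2.0, size=(5, 20))
print(nb_process_gibbs(demo))
```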

4.4 Relationship with the Dirichlet Process

Based on Corollary 3 on the multinomial process and Section 2.2.1 on the Dirichlet process, the gamma-Poisson process in (5) can be equivalently expressed as

X_j ∼ MP(X_j(Ω), G̃),  G̃ ∼ DP(γ_0, G̃_0),
X_j(Ω) ∼ Pois(α),  α ∼ Gamma(γ_0, p/(J(1 − p))),   (9)

where G = α G̃ and G_0 = γ_0 G̃_0. Thus without modeling X_j(Ω) and α = G(Ω) as random variables, the gamma-Poisson process becomes a Dirichlet process, which is widely used for mixture modeling [16], [18], [19], [21], [33]. Note that for the Dirichlet process, the inference of γ_0 relies on a data augmentation method of [18] when G̃_0 is continuous, and no rigorous inference for γ_0 is available when G̃_0 is discrete. Whereas for the proposed gamma-Poisson process augmented from the NB process, as shown in (7), regardless of whether the base measure G_0 is continuous or discrete, γ_0 has an analytic conditional gamma posterior, with conditional expectation E[γ_0 | L, p] = (e_0 + L(Ω))/(f_0 − ln(1−p)).

4.5 Restrictions of the Gamma-Poisson Process

The Poisson process has an equal-dispersion assumption for count modeling. For mixture modeling of grouped data, the gamma-Poisson (NB) process might be too restrictive in that, as shown in (9), it implies the same mixture proportions across groups, and as shown in (6), it implies the same count distribution on each distinct atom. This motivates us to consider adding an additional layer into the gamma-Poisson process or using a distribution other than the Poisson to model the counts for grouped data. As shown in Section 3, the NB distribution is an ideal candidate, not only because it allows overdispersion, but also because it can be augmented into either a gamma-Poisson or a compound Poisson representation and it can be used together with the CRT distribution to form a bivariate distribution that jointly models the counts of customers and tables.

5 JOINT COUNT AND MIXTURE MODELING OF GROUPED DATA

In this Section we couple the gamma process with the NB process to construct a gamma-NB process, which is well suited for modeling grouped data. We derive analytic conditional posteriors for this construction and show that it can be reduced to an HDP.

Fig. 2. Graphical models of the gamma-negative binomial process under the gamma-gamma-Poisson (left), gamma-compound Poisson (center), and gamma-negative binomial-Chinese restaurant table (right) constructions. The center and right constructions are equivalent in distribution.

5.1 Gamma-Negative Binomial Process

For joint count and mixture modeling of grouped data, e.g., topic modeling where a document consists of a group of exchangeable words, we replace the Poisson processes in (5) with NB processes. Sharing the NB dispersion across groups while making the probability parameters group dependent, we construct a gamma-NB process as

X_j ∼ NBP(G, p_j),  G ∼ GaP(c, G_0).   (10)

With G ∼ GaP(c, G_0) expressed as G = ∑_{k=1}^∞ r_k δ_{ω_k}, a draw from NBP(G, p_j) can be expressed as X_j = ∑_{k=1}^∞ n_jk δ_{ω_k}, n_jk ∼ NB(r_k, p_j).

The gamma-NB process can be augmented as a gamma-gamma-Poisson process as

X_j ∼ PP(Θ_j),  Θ_j ∼ GaP((1−p_j)/p_j, G),  G ∼ GaP(c, G_0),   (11)

and with θ_jk = Θ_j(ω_k), we have n_jk ∼ Pois(θ_jk), θ_jk ∼ Gamma(r_k, p_j/(1 − p_j)). This construction introduces gamma processes {Θ_j}, whose normalizations provide group-specific random probability measures {Θ̃_j} for mixture modeling. The gamma-NB process can also be augmented as

X_j ∼ ∑_{t=1}^{L_j} Log(p_j),  L_j ∼ PP(−G ln(1 − p_j)),  G ∼ GaP(c, G_0),   (12)

which is equivalent in distribution to

L_j ∼ CRTP(X_j, G),  X_j ∼ NBP(G, p_j),  G ∼ GaP(c, G_0)   (13)

according to Corollary 4. These three closely related constructions are graphically presented in Fig. 2.

With Corollaries 2 and 4, p′ := −∑_j ln(1−p_j) / (c − ∑_j ln(1−p_j)) and L := ∑_j L_j, we further have two equivalent augmentations:

L′ ∼ ∑_{t=1}^L Log(p′),  L ∼ PP(−G_0 ln(1 − p′));   (14)
L′ ∼ CRTP(L, G_0),  L ∼ NBP(G_0, p′).   (15)

These augmentations allow us to derive a sequence of closed-form update equations, as described below.

5.2 Posterior Analysis and Predictive Distribution

With p_j ∼ Beta(a_0, b_0) and (10), we have

(p_j | −) ∼ Beta(a_0 + X_j(Ω), b_0 + G(Ω)).   (16)

Using (13) and (15), we have

L_j | X_j, G ∼ CRTP(X_j, G),   (17)
L′ | L, G_0 ∼ CRTP(L, G_0).   (18)

If G_0 is finite and continuous, we have G_0(ω) → 0 ∀ ω ∈ Ω and thus L′(Ω) | L, G_0 = ∑_{ω∈Ω} δ(L(ω) > 0) = ∑_{ω∈Ω} δ(∑_j X_j(ω) > 0) = K_+; if G_0 is discrete as G_0 = ∑_{k=1}^K (γ_0/K) δ_{ω_k}, then L′(ω_k) = CRT(L(ω_k), γ_0/K) ≥ 1 if ∑_j X_j(ω_k) > 0, thus L′(Ω) ≥ K_+. In either case, let γ_0 = G_0(Ω) ∼ Gamma(e_0, 1/f_0); with the gamma-Poisson conjugacy on (14) and (12), we have

γ_0 | {L′(Ω), p′} ∼ Gamma(e_0 + L′(Ω), 1/(f_0 − ln(1−p′))),   (19)
G | G_0, {L_j, p_j} ∼ GaP(c − ∑_j ln(1 − p_j), G_0 + L).   (20)

Using the gamma-Poisson conjugacy on (11), we have

Θ_j | G, X_j, p_j ∼ GaP(1/p_j, G + X_j).   (21)

Since the data {x_ji}_i are exchangeable within group j, conditioning on X_j^{−i} = X_j\X_ji and G, with Θ_j marginalized out, we have

X_ji | G, X_j^{−i} ∼ E[Θ_j | G, X_j^{−i}] / E[Θ_j(Ω) | G, X_j^{−i}] = G/(G(Ω)+X_j(Ω)−1) + X_j^{−i}/(G(Ω)+X_j(Ω)−1).   (22)

This prediction rule is similar to that of the Chinese restaurant franchise (CRF) [24].
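The updates (16)–(21) again suggest a compact blocked Gibbs sampler once the base measure is truncated to K atoms. The sketch below is our own illustration under that truncation, not the Appendix B sampler referenced later.

```python
import numpy as np

rng = np.random.default_rng(3)

def crt(m, a):
    m = int(m)
    return 0 if m == 0 else int(rng.binomial(1, a / (np.arange(m) + a)).sum())

def gamma_nb_gibbs(n, n_iter=500, c=1.0, a0=0.01, b0=0.01, e0=0.01, f0=0.01):
    """Blocked Gibbs for a K-atom truncation of the gamma-NB process.

    n: (J, K) array of latent counts n_jk."""
    J, K = n.shape
    gamma0 = 1.0
    r = np.ones(K)
    for _ in range(n_iter):
        # (16): p_j | - ~ Beta(a0 + X_j(Omega), b0 + G(Omega))
        p = rng.beta(a0 + n.sum(1), b0 + r.sum())
        # (17): l_jk ~ CRT(n_jk, r_k); (18): l'_k ~ CRT(sum_j l_jk, gamma0/K)
        l = np.array([[crt(n[j, k], r[k]) for k in range(K)] for j in range(J)])
        lp = np.array([crt(l[:, k].sum(), gamma0 / K) for k in range(K)])
        # (19): with p' = s/(c+s), s = -sum_j log(1-p_j), so -log(1-p') = log(1+s/c)
        s = -np.log(1.0 - p).sum()
        gamma0 = rng.gamma(e0 + lp.sum(), 1.0 / (f0 + np.log1p(s / c)))
        # (20): r_k | - ~ Gamma(gamma0/K + L(omega_k), 1/(c + s))
        r = rng.gamma(gamma0 / K + l.sum(0), 1.0 / (c + s))
        # (21): theta_jk | - ~ Gamma(r_k + n_jk, p_j)   (gamma scale = p_j)
        theta = rng.gamma(r[None, :] + n, p[:, None])
    return p, r, gamma0, theta
```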

5.3 Relationship with Hierarchical Dirichlet Process

With Corollary 3 and Section 2.2.1, we can equivalently express the gamma-gamma-Poisson process in (11) as

X_j ∼ MP(X_j(Ω), Θ̃_j),  Θ̃_j ∼ DP(α, G̃),
X_j(Ω) ∼ Pois(θ_j),  θ_j ∼ Gamma(α, p_j/(1 − p_j)),
α ∼ Gamma(γ_0, 1/c),  G̃ ∼ DP(γ_0, G̃_0),   (23)

where Θ_j = θ_j Θ̃_j, G = α G̃ and G_0 = γ_0 G̃_0. Without modeling X_j(Ω) and θ_j as random variables, (23) becomes an HDP [24]. Thus the augmented and then normalized gamma-NB process leads to an HDP. However, we cannot return from the HDP to the gamma-NB process without modeling X_j(Ω) and θ_j as random variables. Theoretically, they are distinct in that the gamma-NB process is a completely random measure, assigning independent random variables into any disjoint Borel sets {A_q}_{1,Q} in Ω, and the count X_j(A) has the distribution X_j(A) ∼ NB(G(A), p_j); by contrast, due to normalization, the HDP is not, and marginally X_j(A) ∼ Beta-Binomial(X_j(Ω), α G̃(A), α(1 − G̃(A))).

Practically, the gamma-NB process can exploit Corollary 4 and the gamma-Poisson conjugacy to achieve analytic conditional posteriors. The inference of the HDP is a challenge and it is usually solved through alternative constructions such as the CRF and stick-breaking representations [24], [26]. In particular, both concentration parameters α and γ_0 are nontrivial to infer [24], [25] and they are often simply fixed [26]. One may apply the data augmentation method of [18] to sample α and γ_0. However, if G̃_0 is discrete as G̃_0 = ∑_{k=1}^K (1/K) δ_{ω_k}, which is of practical value and becomes a continuous base measure as K → ∞ [24], [25], [33], then using that method to sample γ_0 is only approximately correct, which may result in a biased estimate in practice, especially if K is not sufficiently large.

By contrast, in the gamma-NB process, the shared G can be analytically updated with (20) and G(Ω) plays the role of α in the HDP, which is readily available as

(G(Ω) | −) ∼ Gamma(γ_0 + L(Ω), 1/(c − ∑_j ln(1−p_j))),   (24)

and as in (19), regardless of whether the base measure is continuous, the total mass γ_0 has an analytic gamma posterior. Equation (24) also intuitively shows how the NB probability parameters {p_j} govern the variations among {Θ̃_j} in the gamma-NB process. In the HDP, p_j is not explicitly modeled, and since its value appears irrelevant when taking the normalized constructions in (23), it is usually treated as a nuisance parameter and perceived as p_j = 0.5 when needed for interpretation.

Another related model is the DILN-HDP in [50], where group-specific Dirichlet processes are normalized from gamma processes, with the gamma scale parameters either fixed as p_j/(1−p_j) = 1 or learned with a log Gaussian process prior. Yet no analytic conditional posteriors are provided and Gibbs sampling is not considered as a viable option. The main purpose of [50] is introducing correlations between mixture components. It would be interesting to compare the differences between learning the {p_j} with beta priors and learning the gamma scale parameters with the log Gaussian process prior.

6 THE NEGATIVE BINOMIAL PROCESS FAMILY

The gamma-NB process shares the NB dispersion across groups while the NB probability parameters are group dependent. Since the NB distribution has two adjustable parameters, it is natural to wonder whether one can explore sharing the NB probability measure across groups, while making the NB dispersion parameters group specific or atom dependent. That kind of construction would be distinct from both the gamma-NB process and HDP in that Θ_j would have space dependent scales, and thus its normalization Θ̃_j = Θ_j/Θ_j(Ω), still a random probability measure, no longer follows a Dirichlet process.

It is natural to let the NB probability measure be drawn from the beta process [31], [32]. In fact, the first discovered member of the NB process family is a beta-NB process [8]. A main motivation of that construction is observing that the beta and Bernoulli distributions are conjugate and the beta-Bernoulli process is found to be quite useful for dictionary learning [51]–[54], whereas although the beta distribution is also conjugate to the NB distribution, there is an apparent lack of exploitation of that relationship [8].

A beta-NB process [8], [14] is constructed by letting

X_j ∼ NBP(r_j, B),  B ∼ BP(c, B_0).   (25)

With B ∼ BP(c, B_0) expressed as B = ∑_{k=1}^∞ p_k δ_{ω_k}, a random draw from NBP(r_j, B) can be expressed as

X_j = ∑_{k=1}^∞ n_jk δ_{ω_k},  n_jk ∼ NB(r_j, p_k).   (26)

Under this construction, the NB probability measure is shared and the NB dispersion parameters are group dependent. Note that if {r_j} are fixed as one, then the beta-NB process reduces to the beta-geometric process, related to the one for count modeling discussed in [12]; if {r_j} are empirically set to some other values, then the beta-NB process reduces to the one proposed in [14]. These simplifications are not considered in this paper, as they are often overly restrictive.

The asymptotic behavior of the beta-NB process with respect to the NB dispersion parameter is studied in [14]. Such analysis is not provided here as we infer NB dispersion parameters from the data, which usually do not have large values due to overdispersion. In [14], the beta-NB process is treated as comparable to a gamma-Poisson process and is considered less flexible than the HDP, motivating the construction of a hierarchical-beta-NB process. By contrast, in this paper, with the beta-NB process augmented as a beta-gamma-Poisson process, one can draw group-specific Poisson rate measures for count modeling and then use their normalization to provide group-specific random probability measures for mixture modeling; therefore, the beta-NB process, gamma-NB process and HDP are treated as comparable to each other in hierarchical structures and are all considered suitable for mixed-membership modeling.
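As a purely illustrative sketch (ours, not from the paper), a truncated draw from the beta-NB process in (25)–(26) can be simulated by approximating B with K atoms; we assume here the common finite approximation of BP(c, B_0) in which p_k ∼ Beta(c γ_0/K, c(1 − γ_0/K)) with γ_0 = B_0(Ω).

```python
import numpy as np

rng = np.random.default_rng(4)

def beta_nb_draw(J, K, c=1.0, gamma0=2.0, r=None):
    """Truncated draw from the beta-NB process (25)-(26).

    p_k ~ Beta(c*gamma0/K, c*(1 - gamma0/K)); group counts n_jk ~ NB(r_j, p_k)."""
    p = rng.beta(c * gamma0 / K, c * (1.0 - gamma0 / K), size=K)
    r = np.ones(J) if r is None else np.asarray(r)   # r_j = 1 recovers the beta-geometric case
    # NumPy's negative_binomial takes the success probability, which is 1 - p_k here
    return rng.negative_binomial(r[:, None], 1.0 - p[None, :], size=(J, K))

counts = beta_nb_draw(J=5, K=50)
print(counts.shape, counts.sum())
```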

As in [8], we may also consider a marked-beta-NB process, with both the NB probability and dispersion measures shared, in which each point of the beta process is marked with an independent gamma random variable. Thus a draw from the marked-beta process becomes (R, B) = ∑_{k=1}^∞ (r_k, p_k) δ_{ω_k}, and a draw from the NB process X_j ∼ NBP(R, B) becomes

X_j = ∑_{k=1}^∞ n_jk δ_{ω_k},  n_jk ∼ NB(r_k, p_k).   (27)

With the beta-NB conjugacy, the posterior of B is tractable in both the beta-NB and marked-beta-NB processes [8], [14], [15]. Similar to the marked-beta-NB process, we may also consider a marked-gamma-NB process, where each point of the gamma process is marked with an independent beta random variable, whose performance is found to be similar.

If it is believed that there are an excessive number of zeros, governed by a process other than the NB process, we may introduce a zero-inflated NB process as X_j ∼ NBP(RZ_j, p_j), where Z_j ∼ BeP(B) is drawn from the Bernoulli process [31] and (R, B) = ∑_{k=1}^∞ (r_k, π_k) δ_{ω_k} is drawn from a gamma marked-beta process; thus a draw from NBP(RZ_j, p_j) can be expressed as X_j = ∑_{k=1}^∞ n_jk δ_{ω_k}, with

n_jk ∼ NB(r_k b_jk, p_j),  b_jk = Bernoulli(π_k).   (28)

This construction can be linked to the focused topic model in [55] with appropriate normalization, with the advantages that there is no need to fix p_j = 0.5 and the inference is fully tractable. The zero-inflated construction can also be linked to models for real-valued data using the Indian buffet process (IBP) or beta-Bernoulli process spike-and-slab prior [56]–[59]. Below we apply various NB processes for topic modeling and illustrate the differences between them.

7 NEGATIVE BINOMIAL PROCESS TOPIC MODELING AND POISSON FACTOR ANALYSIS

We consider topic modeling of a document corpus, a special case of mixture modeling of grouped data, where the words of the jth document x_j1, ..., x_jN_j constitute a group x_j (N_j words in document j); each word x_ji is an exchangeable group member indexed by v_ji in a vocabulary with V unique terms. Each word x_ji is drawn from a topic φ_{z_ji} as x_ji ∼ F(φ_{z_ji}), where z_ji = 1, 2, ..., ∞ is the topic index and the likelihood F(x_ji; φ_k) is simply φ_{v_ji k}, the probability of word x_ji under topic φ_k = (φ_1k, ..., φ_Vk)^T ∈ R_+^V, with ∑_{v=1}^V φ_vk = 1. We refer to NB process mixture modeling of grouped words {x_j}_{1,J} as NB process topic modeling.

For the gamma-NB process described in Section 5, with the gamma process expressed as G = ∑_{k=1}^∞ r_k δ_{φ_k}, we can express the hierarchical model as

x_ji ∼ F(φ_{z_ji}),  φ_k ∼ g_0(φ_k),  N_j = ∑_{k=1}^∞ n_jk,
n_jk ∼ Pois(θ_jk),  θ_jk ∼ Gamma(r_k, p_j/(1 − p_j)),   (29)

where g_0(dφ) = G_0(dφ)/γ_0. With θ_j = Θ_j(Ω) = ∑_{k=1}^∞ θ_jk, n_j = (n_j1, ..., n_j∞)^T and θ_j = (θ_j1, ..., θ_j∞)^T, using Corollary 3, we can equivalently express N_j and n_jk in (29) as

N_j ∼ Pois(θ_j),  n_j ∼ Mult(N_j; θ_j/θ_j).   (30)

Since {x_ji}_{i=1,N_j} are fully exchangeable, rather than drawing n_j as in (30), we may equivalently draw it as

z_ji ∼ Discrete(θ_j/θ_j),  n_jk = ∑_{i=1}^{N_j} δ(z_ji = k).   (31)

This provides further insights on uniting the seemingly distinct problems of count and mixture modeling.

Denote n_vjk = ∑_{i=1}^{N_j} δ(z_ji = k, v_ji = v), n_v·k = ∑_j n_vjk and n_·k = ∑_j n_jk. For modeling convenience, we place Dirichlet priors on topics φ_k ∼ Dir(η, ..., η); then for the gamma-NB process topic model, we have

Pr(z_ji = k | −) ∝ φ_{v_ji k} θ_jk,   (32)
(φ_k | −) ∼ Dir(η + n_1·k, ..., η + n_V·k),   (33)

which would be the same for the other NB processes, since the gamma-NB process differs from them only in how the gamma priors of θ_jk and consequently the NB priors of n_jk are constituted. For example, marginalizing out θ_jk, we have n_jk ∼ NB(r_k, p_j) for the gamma-NB process, n_jk ∼ NB(r_j, p_k) for the beta-NB process, n_jk ∼ NB(r_k, p_k) for both the marked-beta-NB and marked-gamma-NB processes, and n_jk ∼ NB(r_k b_jk, p_j) for the zero-inflated-NB process.

7.1 Poisson Factor Analysis

Under the bag-of-words representation, without losing information, we can form {x_j}_{1,J} as a term-document count matrix M ∈ R^{V×J}, where m_vj counts the number of times term v appears in document j. Given K ≤ ∞ and M, discrete latent variable models assume that each entry m_vj can be explained as a sum of smaller counts, each produced by one of the K hidden factors, or in the case of topic modeling, a hidden topic. We can factorize M under the Poisson likelihood as

M ∼ Pois(ΦΘ),

where Φ ∈ R^{V×K} is the factor loading matrix, each column of which is an atom encoding the relative importance of each term, and Θ ∈ R^{K×J} is the factor score matrix, each column of which encodes the relative importance of each atom in a sample. This is called Poisson Factor Analysis (PFA) [8].

As in [8], [60], we may augment m_vj ∼ Pois(∑_{k=1}^K φ_vk θ_jk) as

m_vj = ∑_{k=1}^K n_vjk,  n_vjk ∼ Pois(φ_vk θ_jk).

If ∑_{v=1}^V φ_vk = 1, we have n_jk ∼ Pois(θ_jk), and with Corollary 3 and θ_j = (θ_j1, ..., θ_jK)^T, we also have
(n_vj1, ..., n_vjK | −) ∼ Mult(m_vj; φ_v1 θ_j1 / ∑_{k=1}^K φ_vk θ_jk, ..., φ_vK θ_jK / ∑_{k=1}^K φ_vk θ_jk),
(n_1·k, ..., n_V·k | −) ∼ Mult(n_·k; φ_k),  and  (n_j1, ..., n_jK | −) ∼ Mult(N_j; θ_j),

which would lead to (32) under the assumption that the words {x_ji}_i are exchangeable, and to (33) if φ_k ∼ Dir(η, ..., η). Thus topic modeling with the NB process can be considered as factorization of the term-document count matrix under the Poisson likelihood as M ∼ Pois(ΦΘ).

PFA provides a unified framework to connect previously proposed discrete latent variable models, such as those in [5]–[7], [55], [61]. As discussed in detail in [8], these models mainly differ on how the priors of φ_vk and θ_jk are constituted and how the inferences are implemented. For example, nonnegative matrix factorization [61] with an objective function of minimizing the Kullback-Leibler (KL) divergence D_KL(M||ΦΘ) is equivalent to the ML estimation of Φ and Θ under PFA, and latent Dirichlet allocation (LDA) [5] is equivalent to a PFA with Dirichlet priors imposed on both φ_k and θ_j.
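For concreteness, the augmentation above corresponds to the following small routine (an illustrative sketch, ours): given current Φ and Θ, each count m_vj is split into per-topic counts n_vjk with a multinomial draw, after which topics can be resampled from their Dirichlet conditionals as in (33).

```python
import numpy as np

rng = np.random.default_rng(5)

def allocate_counts(M, Phi, Theta):
    """Split m_vj into n_vjk ~ Mult(m_vj; phi_vk*theta_jk / sum_k phi_vk*theta_jk)."""
    V, J = M.shape
    K = Phi.shape[1]
    n = np.zeros((V, J, K))
    for v in range(V):
        for j in range(J):
            if M[v, j] > 0:
                w = Phi[v] * Theta[:, j]
                n[v, j] = rng.multinomial(M[v, j], w / w.sum())
    return n

def resample_topics(n, eta=0.05):
    """(phi_k | -) ~ Dir(eta + n_{1.k}, ..., eta + n_{V.k}), cf. (33)."""
    nv_k = n.sum(axis=1)   # V x K term-by-topic counts
    return np.stack([rng.dirichlet(eta + nv_k[:, k]) for k in range(nv_k.shape[1])], axis=1)
```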
7.2 Negative Binomial Process Topic Modeling

From the point of view of PFA, an NB process topic model factorizes the term-document count matrix under the constraints that each factor sums to one and the factor scores are gamma distributed random variables; consequently, the number of words assigned to a topic (factor/atom) follows an NB distribution. Depending on how the NB distributions are parameterized, as shown in Table 1, we can construct a variety of NB process topic models, which can also be connected to a large number of previously proposed parametric and nonparametric topic models. For a deeper understanding of how the counts are modeled, we also show in Table 1 both the variance-to-mean ratio (VMR) and overdispersion level (ODL) implied by these settings. Eight differently constructed NB processes are considered:

• (i) The NB process described in Section 4 is used for topic modeling. It improves over the count-modeling gamma-Poisson process discussed in [11], [12] in that it unites mixture modeling and has closed-form conditional posteriors. Although this is a nonparametric model supporting an infinite number of topics, requiring {θ_jk}_{j=1,J} ≡ r_k may be too restrictive.
• (ii) Related to LDA [5] and Dir-PFA [8], the NB-LDA is also a parametric topic model that requires tuning the number of topics. It is constructed by replacing the topic weights of the gamma-NB process in (29) as θ_jk ∼ Gamma(r_j, p_j/(1 − p_j)). It uses document dependent r_j and p_j to learn the smoothing of the topic weights, and it lets r_j ∼ Gamma(γ_0, 1/c), γ_0 ∼ Gamma(e_0, 1/f_0) to share statistical strength between documents.
• (iii) Related to the HDP [24], the NB-HDP model is constructed by fixing p_j/(1 − p_j) ≡ 1 (i.e., p_j ≡ 0.5) in (29). It is also comparable to the HDP in [50] that constructs group-specific Dirichlet processes with normalized gamma processes, whose scale parameters are also set as one.
• (iv) The NB-FTM model is constructed by replacing the topic weights in (29) as θ_jk ∼ Gamma(r_k b_jk, p_j/(1 − p_j)), with p_j ≡ 0.5 and b_jk drawn from a beta-Bernoulli process that is used to explicitly model zero counts. It is the same as the sparse-gamma-gamma-PFA (SγΓ-PFA) in [8] and is comparable to the focused topic model (FTM) [55], which is constructed from the IBP compound Dirichlet process. The zero-inflated-NB process improves over these approaches by allowing {p_j} to be inferred, which generally yields better data fitting.
• (v) The Gamma-NB process, as shown in (10) and (29), explores sharing the NB dispersion measure across groups, and it improves over the NB-HDP by allowing the learning of {p_j}. As shown in (23), it reduces to the HDP in [24] without modeling X_j(Ω) and θ_j as random variables.
• (vi) The Beta-Geometric process is constructed by replacing the topic weights in (29) as θ_jk ∼ Gamma(1, p_k/(1 − p_k)). It explores sharing the NB probability measure across groups, which is related to the one proposed for count modeling in [12]. It is restrictive in that the NB dispersion parameters are fixed as one.
• (vii) The Beta-NB process is constructed by replacing the topic weights in (29) as θ_jk ∼ Gamma(r_j, p_k/(1 − p_k)). It explores sharing the NB probability measure across groups, which improves over the Beta-Geometric process and the beta-NB process (BNBP) proposed in [14] by providing analytic conditional posteriors of {r_j}.
• (viii) The Marked-Beta-NB process is constructed by replacing the topic weights in (29) as θ_jk ∼ Gamma(r_k, p_k/(1 − p_k)). It is comparable to the BNBP proposed in [8], with the distinction that it provides analytic conditional posteriors of {r_k}.

7.3 Approximate and Exact Inference

Although all proposed NB process models have closed-form conditional posteriors, they contain countably infinite atoms that are infeasible to explicitly represent in practice. This infinite dimensional problem can be addressed by using a discrete base measure with K atoms, i.e., truncating the total number of atoms to be K, and then doing Bayesian inference via block Gibbs sampling [62]. This is a very general approach and is used in our experiments to make a fair comparison between a wide variety of models. Block Gibbs sampling for the Gamma-NB process is described in Appendix B; block Gibbs sampling for other NB processes and related algorithms in Table 1 can be similarly derived, as described in [15], and is omitted here for brevity. The infinite dimensional problem can also be addressed by discarding the atoms with weights smaller than a small constant ε [22] or by modifying the Lévy measure to make its integral over the whole space finite [8]. A sufficiently large K (small ε) usually provides a good approximation; however, there is an increasing risk of wasting computation as the truncation level gets larger.

TABLE 1. A variety of negative binomial processes are constructed with distinct sharing mechanisms, reflected by which of the parameters r_k, r_j, p_k, p_j and π_k (b_jk) are inferred, together with the implied variance-to-mean ratio (VMR) and overdispersion level (ODL) for the counts {n_jk}_{j,k}. They are applied to topic modeling, a typical example of mixture modeling of grouped data. Related algorithms are shown for each model.

For each algorithm we list the constraint on θ_jk, the parameters that are inferred, the implied VMR and ODL of n_jk, and related algorithms:

NB: θ_jk ≡ r_k; infers r_k; VMR (1−p)^{−1}; ODL r_k^{−1}; related: gamma-Poisson [11], gamma-Poisson [12].
NB-LDA: infers θ_jk, r_j, p_j; VMR (1−p_j)^{−1}; ODL r_j^{−1}; related: LDA [5], Dir-PFA [8].
NB-HDP: infers θ_jk, r_k; p_j ≡ 0.5; VMR 2; ODL r_k^{−1}; related: HDP [24], DILN-HDP [50].
NB-FTM: infers θ_jk, r_k, π_k; p_j ≡ 0.5; VMR 2; ODL (r_k b_jk)^{−1}; related: FTM [55], SγΓ-PFA [8].
Beta-Geometric: infers θ_jk, p_k; r_j ≡ 1; VMR (1−p_k)^{−1}; ODL 1; related: Beta-Geometric [12], BNBP [8], BNBP [14].
Beta-NB: infers θ_jk, r_j, p_k; VMR (1−p_k)^{−1}; ODL r_j^{−1}; related: BNBP [8], BNBP [14].
Gamma-NB: infers θ_jk, r_k, p_j; VMR (1−p_j)^{−1}; ODL r_k^{−1}; related: CRF-HDP [24], [25].
Marked-Beta-NB: infers θ_jk, r_k, p_k; VMR (1−p_k)^{−1}; ODL r_k^{−1}; related: BNBP [8].

To avoid truncation, the slice sampling scheme of [63] has been utilized for the Dirichlet process and normalized random measure based mixture models [64]–[66]. With auxiliary slice latent variables introduced to allow adaptive truncations in each MCMC iteration, the infinite dimensional problem is transformed into a finite one. This method has also been applied to the beta-Bernoulli process [67] and the beta-NB process [14]. It would be interesting to investigate slice sampling for the NB process based count and mixture models, which provide likelihoods that might be more amenable to posterior simulation since no normalization is imposed on the weights of the atoms. As slice sampling is not the focus of this paper, we leave it for future study.

Both the block Gibbs sampler and the slice sampler explicitly represent a finite set of atoms for posterior simulation, and algorithms based on these samplers are commonly referred to as "conditional" methods [64], [68]. Another approach to solving the infinite dimensional problem is employing a collapsed inference scheme that marginalizes out the atoms and their weights [11], [18], [19], [21], [24], [69]. Algorithms based on the collapsed inference scheme are usually referred to as "marginal" methods [64], [68]. A well-defined prediction rule is usually required to develop a collapsed Gibbs sampler, and the conjugacy between the likelihood and the prior distribution of atoms is desired to avoid numerical integrations. In topic modeling, a word is linked to a Dirichlet distributed atom with a multinomial likelihood, thus the atoms can be analytically marginalized out; since their weights can also be marginalized out as in (22), we may develop a collapsed Gibbs sampler for the gamma-NB process based topic models. As the collapsed inference scheme is not the focus of this paper and the prediction rules for other NB processes need further investigation, we leave them for future study.

8 EXAMPLE RESULTS AND DISCUSSIONS

Motivated by Table 1, we consider topic modeling using a variety of NB processes. We compare them with LDA [5], [70] and CRF-HDP [24], [25], in which the latent count n_jk is marginally distributed as

n_jk ∼ Beta-Binomial(N_j, α r̃_k, α(1 − r̃_k)),   (34)

with r̃_k fixed as 1/K in LDA and learned from the data in CRF-HDP. For fair comparison, they are all implemented with block Gibbs sampling using a discrete base measure with K atoms, and for the first fifty iterations, the Gamma-NB process with r_k ≡ 50/K and p_j ≡ 0.5 is used for initialization. We set K large enough that only a subset of the K atoms would be used by the data. We consider 2500 Gibbs sampling iterations and collect the last 1500 samples.

We consider the Psychological Review corpus (footnote 1), restricting the vocabulary to terms that occur in five or more documents. The corpus includes 1281 abstracts from 1967 to 2003, with V = 2566 and 71,279 total word counts. We randomly select 20%, 40%, 60% or 80% of the words from each document to learn a document dependent probability for each term v and calculate the per-word perplexity on the held-out words as

Perplexity = exp( −(1/y_··) ∑_{j=1}^J ∑_{v=1}^V y_jv log f_jv ),   (35)

where f_jv = ∑_{s=1}^S ∑_{k=1}^K φ_vk^{(s)} θ_jk^{(s)} / ∑_{s=1}^S ∑_{v=1}^V ∑_{k=1}^K φ_vk^{(s)} θ_jk^{(s)}, y_jv is the number of words held out at term v in document j, y_·· = ∑_{j=1}^J ∑_{v=1}^V y_jv, and s = 1, ..., S are the indices of collected samples.

1. http://psiexp.ss.uci.edu/research/programs data/toolbox.htm
Note that the per-word perplexity is equal to V if f_jv = 1/V; thus it should be no greater than V for a topic model that works appropriately. The final results are averaged over five random training/testing partitions. The performance measure is the same as the one used in [8] and similar to those in [26], [71], [72].
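As a small illustration (ours), the perplexity formula above maps directly to code; phi_samples and theta_samples are assumed to hold the S collected posterior samples of Φ and Θ.

```python
import numpy as np

def per_word_perplexity(Y, phi_samples, theta_samples):
    """Per-word perplexity on held-out counts.

    Y: (V, J) held-out counts y_jv; phi_samples: (S, V, K); theta_samples: (S, K, J)."""
    recon = np.einsum('svk,skj->vj', phi_samples, theta_samples)  # sum over samples and topics
    f = recon / recon.sum(axis=0, keepdims=True)                  # normalize over the vocabulary
    return float(np.exp(-(Y * np.log(f)).sum() / Y.sum()))
```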

Note that the perplexity per held-out word is a fair metric to compare topic models. As analyzed in Section 7, NB process topic models can also be considered as factor analysis of the term-document count matrix under the Poisson likelihood, with φ_k as the kth factor that sums to one and θ_jk as the factor score of the jth document on the kth factor, which can be further linked to other discrete latent variable models. If, except for the proportions θ̃_j and r̃, the absolute values, e.g., θ_jk, r_k and p_k, are also of interest, then the NB process based count and mixture models would be more appropriate than the Dirichlet process based mixture models.

We show in Fig. 3 the NB dispersion and probability parameters learned by various NB process topic models listed in Table 1, revealing distinct sharing mechanisms and model properties. In Fig. 4 we compare the per-held-out-word prediction performance of various algorithms. We set the parameters as c = 1, η = 0.05 and a_0 = b_0 = e_0 = f_0 = 0.01. For LDA and NB-LDA, we search K for optimal performance. All the other NB process topic models are nonparametric Bayesian models that can automatically learn the number of active topics K_+ for a given corpus. For fair comparison, all the models considered are implemented with block Gibbs sampling, where K = 400 is set as an upper bound.

When θ_jk ≡ r_k is used, as in the NB process, different documents are imposed to have the same topic weights, leading to the worst held-out-prediction performance. With a symmetric Dirichlet prior Dir(α/K, ..., α/K) placed on the topic proportions for each document, the parametric LDA is found to be sensitive to both the number of topics K and the value of the concentration parameter α. We consider α = 50, following the suggestion of the topic model toolbox provided for [70]; we also consider an optimized value α = 2.5, based on the results of the CRF-HDP on the same dataset. As shown in Fig. 4, when the number of training words is small, with optimized K and α, the parametric LDA can approach the performance of the nonparametric CRF-HDP; as the number of training words increases, the advantage of learning r̃_k in the CRF-HDP over fixing r̃_k = 1/K in LDA becomes clearer. The concentration parameter α is important for both LDA and CRF-HDP since it controls the VMR of the count n_jk, which is equal to (1 − r̃_k)(α + N_j)/(α + 1) based on (34). Thus fixing α may lead to significantly under- or over-estimated variations and then degraded performance; e.g., LDA with α = 50 performs much worse than LDA-Optimal-α, as shown in Fig. 4.

When (r_j, p_j) is used, as in NB-LDA, different documents are weakly coupled with r_j ∼ Gamma(γ_0, 1/c), and the modeling results in Fig. 3 show that a typical document in this corpus usually has a small r_j and a large p_j, thus a large overdispersion level (ODL) and a large variance-to-mean ratio (VMR), indicating highly overdispersed counts on its topic usage. NB-LDA is a parametric topic model that requires tuning the number of topics K. It improves over LDA in that it only has to tune K, whereas LDA has to tune both K and α. With an appropriate K, the parametric NB-LDA may outperform the nonparametric NB-HDP and NB-FTM as the training data percentage increases, showing that even by learning both the NB parameters r_j and p_j in a document dependent manner, we may get better data fitting than using nonparametric models that fix the NB probability parameters.

When (r_j, p_k) is used to model the latent counts {n_jk}_{j,k}, as in the Beta-NB process, the transition between active and non-active topics is very sharp, in that p_k is either far from zero or almost zero, as shown in Fig. 3. That is because p_k controls the mean E[∑_j n_jk] = p_k/(1−p_k) ∑_j r_j and the VMR (1−p_k)^{−1} on topic k; thus a popular topic must also have a large p_k and hence large overdispersion measured by the VMR. Since the counts {n_jk}_j are usually overdispersed, particularly true in this corpus, a small p_k, indicating a small mean and small VMR, is not favored and thus is rarely observed.

The Beta-Geometric process is a special case of the Beta-NB process with r_j ≡ 1, which is more than ten times larger than the values inferred by the Beta-NB process on this corpus, as shown in Fig. 3; therefore, to fit the mean E[∑_j n_jk] = J p_k/(1 − p_k), it has to use a substantially underestimated p_k, leading to severely underestimated variations and thus degraded performance, as confirmed by comparing the curves of the Beta-Geometric and Beta-NB processes in Fig. 4.

When (r_k, p_j) is used, as in the Gamma-NB process, the transition is much smoother, in that r_k gradually decreases, as shown in Fig. 3. The reason is that r_k controls the mean E[∑_j n_jk] = r_k ∑_j p_j/(1−p_j) and the ODL r_k^{−1} on topic k; thus popular topics must also have large r_k and hence small overdispersion measured by the ODL, and unpopular topics are modeled with small r_k and hence large overdispersion, allowing rarely and lightly used topics. Therefore, we can expect that (r_k, p_j) would allow more topics than (r_j, p_k), as confirmed in Fig. 4(a), where the Gamma-NB process learns 177 active topics, obviously more than the 107 of the Beta-NB process. With these analyses, we can conclude that the mean and the amount of overdispersion (measured by the VMR or ODL) for the usage of topic k are positively correlated under (r_j, p_k) and negatively correlated under (r_k, p_j).

The NB-HDP is a special case of the Gamma-NB process with p_j ≡ 0.5. From a mixture modeling viewpoint, fixing p_j = 0.5 is a natural choice as p_j appears irrelevant after normalization. However, from a count modeling viewpoint, this would make the restrictive assumption that each count vector {n_jk}_{k=1,K} has the same VMR of 2. It is also interesting to examine (24), which can be viewed as the concentration parameter α in the HDP; allowing the adjustment of p_j permits a more flexible model assumption on the amount of variation between the topic proportions, and thus potentially better data fitting.

The CRF-HDP and Gamma-NB process have very similar performance on predicting held-out words, although they have distinct assumptions on count modeling: n_jk is modeled as an NB distribution in the Gamma-NB process while it is modeled as a beta-binomial distribution in the CRF-HDP. The Gamma-NB process adjusts both r_k and p_j to fit the NB distribution, whereas the CRF-HDP learns both α and r̃_k to fit the beta-binomial distribution.


Fig. 3. Distinct sharing mechanisms and model properties are evident among the various NB process topic models when comparing their inferred NB dispersion parameters (rk or rj) and probability parameters (pk or pj). Note that the transition between active and non-active topics is very sharp when pk is used and much smoother when rk is used. Both the documents and topics are ordered in decreasing order of the associated number of words. These results are based on the last MCMC iteration, on the Psychological Review corpus with 80% of the words in each document used as training. The values along the vertical axes are shown in either linear or log scale for convenient visualization. Document-specific and topic-specific parameters are shown in blue and red, respectively.

[Fig. 4 panels (a) and (b): per-word perplexity plotted against the number of topics and against the training data percentage, respectively, for LDA, NB-LDA, LDA-Optimal-α, NB, Beta-Geometric, NB-HDP, NB-FTM, Beta-NB, CRF-HDP, Gamma-NB and Marked-Beta-NB; the inferred numbers of active topics K+ are annotated on the curves.]

Fig. 4. Comparison of per-word perplexity on held-out words among the algorithms listed in Table 1 on the Psychological Review corpus. LDA-Optimal-α refers to an LDA algorithm whose topic-proportion Dirichlet concentration parameter α is optimized based on the results of the CRF-HDP on the same dataset. (a) With 60% of the words in each document used for training, the performance varies as a function of K in both LDA and NB-LDA, which are parametric models, whereas the NB, Beta-Geometric, NB-HDP, NB-FTM, Beta-NB, CRF-HDP, Gamma-NB and Marked-Beta-NB all infer the number of active topics, which are 225, 28, 127, 201, 107, 161, 177 and 130, respectively, according to the last Gibbs sampling iteration. (b) Per-word perplexities of the various algorithms as a function of the percentage of words in each document used for training. The results of LDA and NB-LDA are shown with the best settings of K under each training/testing partition. Nonparametric Bayesian algorithms listed in Table 1 are ranked in the legend from top to bottom according to their overall performance.
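For reference, per-word perplexity on held-out words is the exponentiated negative average predictive log-likelihood of those words. A minimal sketch of this computation is given below, assuming point estimates of the factor scores and topics; the helper name and the exact averaging protocol (e.g., over collected MCMC samples) are illustrative rather than the procedure used to produce Fig. 4:

import numpy as np

def per_word_perplexity(heldout_counts, theta, phi):
    # heldout_counts: (J, V) held-out term counts per document
    # theta:          (J, K) nonnegative factor scores (document-topic weights)
    # phi:            (K, V) topics; each row sums to one
    doc_topic = theta / theta.sum(axis=1, keepdims=True)  # normalize per document
    p_word = doc_topic @ phi                              # (J, V); each row sums to one
    n_heldout = heldout_counts.sum()
    loglik = (heldout_counts * np.log(p_word + 1e-12)).sum()
    return float(np.exp(-loglik / n_heldout))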

When (rk, πk) is used, as in the NB-FTM model, our results in Fig. 3 show that we usually have a small πk and a large rk, indicating that topic k is sparsely used across the documents but, once it is used, the amount of variation in usage is small. This property might be helpful when there is an excessive number of zeros that might not be well modeled by the NB process alone. In our experiments, the more direct approaches of using pk or pj generally yield better results, which might not be the case when an excessive number of zeros could be better explained with the beta-Bernoulli processes; e.g., when the training words are scarce, the NB-FTM can approach the performance of the Marked-Beta-NB process.

When (rk, pk) is used, as in the Marked-Beta-NB process, more diverse combinations of mean and overdispersion are allowed, as both rk and pk are now responsible for the mean E[Σ_j njk] = J rk pk/(1 − pk). As observed in Fig. 3, there can be not only a large mean and small overdispersion (large rk and small pk), indicating a popular topic frequently used by most of the documents, but also a large mean and large overdispersion (small rk and large pk), indicating a topic heavily used by a relatively small percentage of documents. Thus (rk, pk) may combine the advantages of using only rk or only pk to model topic k, as confirmed by the superior performance of the Marked-Beta-NB process.

9 CONCLUSIONS

We propose a variety of negative binomial (NB) processes for count modeling, which can be naturally applied to the seemingly disjoint problem of mixture modeling. The proposed NB processes are completely random measures, which assign independent random variables to disjoint Borel sets of the measure space, as opposed to Dirichlet processes, whose measures on disjoint Borel sets are negatively correlated. We reveal connections between various discrete distributions and discover unique data augmentation and marginalization methods for the NB process, with which we are able to unite count and mixture modeling, analyze fundamental model properties, and derive efficient Bayesian inference. We demonstrate that the NB process and the gamma-NB process can be recovered from the Dirichlet process and the HDP, respectively. We show in detail the theoretical, structural and computational advantages of the NB process. We examine the distinct sharing mechanisms and model properties of various NB processes, with connections made to existing discrete latent variable models under the Poisson factor analysis framework. Experimental results on topic modeling show the importance of modeling both the NB dispersion and probability parameters, which respectively govern the overdispersion level and the variance-to-mean ratio for count modeling.

ACKNOWLEDGMENTS

The authors would like to thank the two anonymous reviewers and the editor for their constructive comments that helped improve the manuscript.

REFERENCES

[1] C. Dean, J. F. Lawless, and G. E. Willmot. A mixed Poisson-inverse-Gaussian regression model. Canadian Journal of Statistics, 1989.
[2] M. Zhou, L. Li, D. Dunson, and L. Carin. Lognormal and gamma mixed negative binomial regression. In ICML, 2012.
[3] J. O. Lloyd-Smith. Maximum likelihood estimation of the negative binomial dispersion parameter for highly overdispersed data, with applications to infectious diseases. PLoS ONE, 2007.
[4] T. Hofmann. Probabilistic latent semantic analysis. In UAI, 1999.
[5] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. J. Mach. Learn. Res., 2003.
[6] J. Canny. GaP: a factor model for discrete data. In SIGIR, 2004.
[7] W. Buntine and A. Jakulin. Discrete component analysis. In Subspace, Latent Structure and Feature Selection Techniques. Springer-Verlag, 2006.
[8] M. Zhou, L. Hannah, D. Dunson, and L. Carin. Beta-negative binomial process and Poisson factor analysis. In AISTATS, 2012.
[9] J. F. C. Kingman. Poisson Processes. Oxford University Press, 1993.
[10] R. L. Wolpert and K. Ickstadt. Poisson/gamma random field models for spatial statistics. Biometrika, 1998.
[11] M. K. Titsias. The infinite gamma-Poisson feature model. In NIPS, 2008.
[12] R. J. Thibaux. Nonparametric Bayesian Models for Machine Learning. PhD thesis, UC Berkeley, 2008.
[13] K. T. Miller. Bayesian Nonparametric Latent Feature Models. PhD thesis, UC Berkeley, 2011.
[14] T. Broderick, L. Mackey, J. Paisley, and M. I. Jordan. Combinatorial clustering and the beta negative binomial process. arXiv:1111.1802v3, 2012.
[15] M. Zhou and L. Carin. Augment-and-conquer negative binomial processes. In NIPS, 2012.
[16] T. Ferguson. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 1973.
[17] C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist., 1974.
[18] M. D. Escobar and M. West. Bayesian density estimation and inference using mixtures. JASA, 1995.
[19] S. N. MacEachern and P. Müller. Estimating mixture of Dirichlet process models. Journal of Computational and Graphical Statistics.
[20] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. JCGS, 2000.
[21] Y. W. Teh. Dirichlet processes. In Encyclopedia of Machine Learning. Springer, 2010.
[22] R. L. Wolpert, M. A. Clyde, and C. Tu. Stochastic expansions using continuous dictionaries: Lévy adaptive regression kernels. Annals of Statistics, 2011.
[23] A. Lijoi, R. H. Mena, and I. Prünster. Controlling the reinforcement in Bayesian non-parametric mixture models. Journal of the Royal Statistical Society: Series B, 2007.
[24] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. JASA, 2006.
[25] E. Fox, E. Sudderth, M. Jordan, and A. Willsky. Developing a tempered HDP-HMM for systems with state persistence. MIT LIDS, TR #2777, 2007.
[26] C. Wang, J. Paisley, and D. M. Blei. Online variational inference for the hierarchical Dirichlet process. In AISTATS, 2011.
[27] J. Pitman and M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. The Annals of Probability, 1997.
[28] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. JASA, 2001.
[29] J. F. C. Kingman. Completely random measures. Pacific Journal of Mathematics, 1967.
[30] M. I. Jordan. Hierarchical models, nested models and completely random measures. In M.-H. Chen, D. Dey, P. Mueller, D. Sun, and K. Ye, editors, Frontiers of Statistical Decision Making and Bayesian Analysis: in Honor of James O. Berger. New York: Springer, 2010.
[31] R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In AISTATS, 2007.
[32] N. L. Hjort. Nonparametric Bayes estimators based on beta processes in models for life history data. Ann. Statist., 1990.
[33] H. Ishwaran and M. Zarepour. Exact and approximate sum-representations for the Dirichlet process. Can. J. Statist., 2002.
[34] D. Blackwell and J. MacQueen. Ferguson distributions via Pólya urn schemes. The Annals of Statistics, 1973.
[35] D. Aldous. Exchangeability and related topics. In École d'Été de Saint-Flour XIII, pages 1–198. Springer, 1983.
[36] J. Pitman. Combinatorial stochastic processes. Lecture Notes in Mathematics. Springer-Verlag, 2006.
[37] M. Greenwood and G. U. Yule. An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society, 1920.
[38] M. H. Quenouille. A relation between the logarithmic, Poisson, and negative binomial series. Biometrics, 1949.
[39] N. L. Johnson, A. W. Kemp, and S. Kotz. Univariate Discrete Distributions. John Wiley & Sons, 2005.
[40] C. I. Bliss and R. A. Fisher. Fitting the negative binomial distribution to biological data. Biometrics, 1953.
[41] A. C. Cameron and P. K. Trivedi. Regression Analysis of Count Data. Cambridge, UK, 1998.
[42] R. Winkelmann. Econometric Analysis of Count Data. Springer, Berlin, 5th edition, 2008.
[43] M. D. Robinson and G. K. Smyth. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 2008.
[44] E. P. Pieters, C. E. Gates, J. H. Matis, and W. L. Sterling. Small sample comparison of different estimators of negative binomial parameters. Biometrics, 1977.
[45] L. J. Willson, J. L. Folks, and J. H. Young. Multistage estimation compared with fixed-sample-size estimation of the negative binomial parameter k. Biometrics, 1984.
[46] J. F. Lawless. Negative binomial and mixed Poisson regression. Canadian Journal of Statistics, 1987.
[47] W. W. Piegorsch. Maximum likelihood estimation for the negative binomial dispersion parameter. Biometrics, 1990.
[48] K. Saha and S. Paul. Bias-corrected maximum likelihood estimator of the negative binomial dispersion parameter. Biometrics, 2005.
[49] E. T. Bradlow, B. G. S. Hardie, and P. S. Fader. Bayesian inference for the negative binomial distribution via polynomial expansions. Journal of Computational and Graphical Statistics, 2002.
[50] J. Paisley, C. Wang, and D. Blei. The discrete infinite logistic normal distribution for mixed-membership modeling. In AISTATS, 2011.
[51] M. Zhou, C. Wang, M. Chen, J. Paisley, D. Dunson, and L. Carin. Nonparametric Bayesian matrix completion. In IEEE Sensor Array and Multichannel Workshop, 2010.
[52] M. Zhou, H. Yang, G. Sapiro, D. B. Dunson, and L. Carin. Dependent hierarchical beta process for image interpolation and denoising. In AISTATS, 2011.
[53] L. Li, M. Zhou, G. Sapiro, and L. Carin. On the integration of topic modeling and dictionary learning. In ICML, 2011.
[54] M. Zhou, H. Chen, J. Paisley, L. Ren, L. Li, Z. Xing, D. Dunson, G. Sapiro, and L. Carin. Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE TIP, 2012.
[55] S. Williamson, C. Wang, K. A. Heller, and D. M. Blei. The IBP compound Dirichlet process and its application to focused topic modeling. In ICML, 2010.
[56] T. L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian buffet process. In NIPS, 2005.
[57] D. Knowles and Z. Ghahramani. Infinite sparse factor analysis and infinite independent components analysis. In Independent Component Analysis and Signal Separation, 2007.
[58] J. Paisley and L. Carin. Nonparametric factor analysis with beta process priors. In ICML, 2009.
[59] M. Zhou, H. Chen, J. Paisley, L. Ren, G. Sapiro, and L. Carin. Non-parametric Bayesian dictionary learning for sparse image representations. In NIPS, 2009.
[60] D. B. Dunson and A. H. Herring. Bayesian latent variable models for mixed discrete outcomes. Biostatistics, 2005.
[61] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, 2001.
[62] H. Ishwaran and M. Zarepour. Markov chain Monte Carlo in approximate Dirichlet and beta two-parameter process hierarchical models. Biometrika, 2000.
[63] P. Damlen, J. Wakefield, and S. Walker. Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables. Journal of the Royal Statistical Society B, 1999.
[64] O. Papaspiliopoulos and G. O. Roberts. Retrospective Markov chain Monte Carlo methods for Dirichlet process hierarchical models. Biometrika, 2008.
[65] S. G. Walker. Sampling the Dirichlet mixture model with slices. Communications in Statistics - Simulation and Computation, 2007.
[66] J. E. Griffin and S. G. Walker. Posterior simulation of normalized random measure mixtures. JCGS, 2011.
[67] Y. W. Teh, D. Görür, and Z. Ghahramani. Stick-breaking construction for the Indian buffet process. In AISTATS, 2007.
[68] M. Kalli, J. E. Griffin, and S. G. Walker. Slice sampling mixture models. Statistics and Computing, 2011.
[69] R. M. Neal. Markov chain sampling methods for Dirichlet process mixture models. JCGS, 2000.
[70] T. L. Griffiths and M. Steyvers. Finding scientific topics. PNAS, 2004.
[71] A. Asuncion, M. Welling, P. Smyth, and Y. W. Teh. On smoothing and inference for topic models. In UAI, 2009.
[72] H. M. Wallach, I. Murray, R. Salakhutdinov, and D. Mimno. Evaluation methods for topic models. In ICML, 2009.

APPENDIX A
PROOF OF THEOREM 1

With the PMFs of both the NB and CRT distributions, the PMF of the joint distribution of the counts m and l is
fM,L(m, l | r, p) = fL(l | m, r) fM(m | r, p) = [Γ(r)|s(m, l)| r^l / Γ(m + r)] · [Γ(r + m)(1 − p)^r p^m / (m! Γ(r))] = |s(m, l)| r^l (1 − p)^r p^m / m!,
which is the same as (4).

Since m ∼ SumLog(l, p) is the summation of l iid Log(p) random variables, its PGF becomes CM(z) = CU(z)^l = [ln(1 − pz)/ln(1 − p)]^l, |z| < p^{-1}. With [ln(1 + x)]^l = l! Σ_{n=l}^∞ s(n, l) x^n/n! and |s(m, l)| = (−1)^{m−l} s(m, l) [39], its PMF can be expressed as
fM(m | l, p) = CM^{(m)}(0)/m! = p^m l! |s(m, l)| / (m! [−ln(1 − p)]^l).   (35)
Letting l ∼ Pois(−r ln(1 − p)), the PMF of the joint distribution of the counts m and l is
fM,L(m, l | r, p) = fM(m | l, p) fL(l | r, p) = [p^m l! |s(m, l)| / (m! [−ln(1 − p)]^l)] · [(−r ln(1 − p))^l e^{r ln(1−p)} / l!] = |s(m, l)| r^l (1 − p)^r p^m / m!,
which is the same as (4).
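As an informal numerical check of Theorem 1 (not part of the paper), the sketch below draws (m, l) from both constructions above, first as NB followed by CRT and then as Poisson followed by a sum of logarithmic variables, and compares their empirical joint frequencies, which should agree up to Monte Carlo error:

import numpy as np

rng = np.random.default_rng(1)
r, p, n_draws = 2.0, 0.6, 100000

def crt(m, r):
    # l ~ CRT(m, r): number of tables occupied by m customers with concentration r
    if m == 0:
        return 0
    t = np.arange(1, m + 1)
    return int((rng.random(m) < r / (r + t - 1)).sum())

# Construction 1: m ~ NB(r, p), then l ~ CRT(m, r).
m1 = rng.negative_binomial(r, 1.0 - p, n_draws)   # note numpy's success probability is 1 - p
l1 = np.array([crt(m, r) for m in m1])

# Construction 2: l ~ Pois(-r ln(1 - p)), then m is the sum of l iid Log(p) variables.
l2 = rng.poisson(-r * np.log1p(-p), n_draws)
m2 = np.array([rng.logseries(p, size=l).sum() if l > 0 else 0 for l in l2])

for m, l in [(0, 0), (1, 1), (2, 1), (3, 2)]:
    f1 = np.mean((m1 == m) & (l1 == l))
    f2 = np.mean((m2 == m) & (l2 == l))
    print("P(m=%d, l=%d): %.4f vs %.4f" % (m, l, f1, f2))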
APPENDIX B
BLOCK GIBBS SAMPLING FOR THE GAMMA-NEGATIVE BINOMIAL PROCESS

With pj ∼ Beta(a0, b0), γ0 ∼ Gamma(e0, 1/f0) and a discrete base measure G0 = Σ_{k=1}^K (γ0/K) δ_{ωk}, following Section 5.2, block Gibbs sampling for (29) proceeds as

Pr(zji = k | −) ∝ F(xji; ωk) θjk,
(ljk | −) ∼ CRT(njk, rk),   (l′k | −) ∼ CRT(Σ_j ljk, γ0/K),
(pj | −) ∼ Beta(a0 + Nj, b0 + Σ_k rk),
(γ0 | −) ∼ Gamma(e0 + Σ_k l′k, 1/(f0 − ln(1 − p′))),
(rk | −) ∼ Gamma(γ0/K + Σ_j ljk, 1/(c − Σ_j ln(1 − pj))),
(θjk | −) ∼ Gamma(rk + njk, pj),
p(ωk | −) ∝ g0(ωk) Π_{zji=k} F(xji; ωk),   (36)

where p′ := −Σ_j ln(1 − pj) / (c − Σ_j ln(1 − pj)). Note that when K → ∞, we have (l′k | −) = δ(Σ_j njk > 0) and thus Σ_k l′k = K+.
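The count-related updates above are conditionally conjugate and straightforward to implement. Below is a minimal sketch of one such sweep, assuming the latent topic counts njk have already been tallied from the topic assignments zji; the data-specific steps for zji and ωk depend on F and g0 and are omitted, and the function and variable names are illustrative rather than the authors' code (the hyperparameter defaults mirror the experimental settings c = 1 and a0 = b0 = e0 = f0 = 0.01):

import numpy as np

rng = np.random.default_rng(0)

def sample_crt(m, r):
    # Draw l ~ CRT(m, r) as a sum of independent Bernoulli(r / (r + t - 1)) variables.
    if m == 0:
        return 0
    t = np.arange(1, m + 1)
    return int((rng.random(m) < r / (r + t - 1)).sum())

def gamma_nb_sweep(n, r, gamma0, c=1.0, a0=0.01, b0=0.01, e0=0.01, f0=0.01):
    # One block Gibbs sweep of the count-related updates in (36), given the
    # latent topic counts n[j, k]; K is the finite truncation level.
    J, K = n.shape
    N = n.sum(axis=1)                                               # words per document
    l = np.array([[sample_crt(n[j, k], r[k]) for k in range(K)] for j in range(J)])
    lprime = np.array([sample_crt(l[:, k].sum(), gamma0 / K) for k in range(K)])
    p = rng.beta(a0 + N, b0 + r.sum())                              # p_j | -
    s = np.log1p(-p).sum()                                          # sum_j ln(1 - p_j) < 0
    pprime = -s / (c - s)
    gamma0 = rng.gamma(e0 + lprime.sum(), 1.0 / (f0 - np.log1p(-pprime)))   # gamma_0 | -
    r = rng.gamma(gamma0 / K + l.sum(axis=0), 1.0 / (c - s))        # r_k | -
    theta = rng.gamma(r[None, :] + n, p[:, None])                   # theta_jk | -
    return r, p, gamma0, theta

In use, one would tally n from the current topic assignments, call gamma_nb_sweep to refresh the global variables, and then resample the assignments zji from the updated θjk and topic parameters before the next sweep.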