On Collapsed Representation of Hierarchical Completely Random Measures
Gaurav Pandey, Ambedkar Dukkipati
Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560012, India
arXiv:1509.01817v2 [math.ST] 2 Jun 2016
Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 48. Copyright 2016 by the author(s).

Abstract

The aim of the paper is to provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measures. We use completely random measures (CRMs) and hierarchical CRMs to define a prior for Poisson processes. We derive the marginal distribution of the resultant point process when the underlying CRM is marginalized out. Using well-known properties unique to Poisson processes, we derive an exact approach for instantiating a Poisson process with a hierarchical CRM prior. Furthermore, we derive Gibbs sampling strategies for hierarchical CRM models based on the Chinese restaurant franchise sampling scheme. As an example, we present the sum of generalized gamma process (SGGP) and show its application in topic modelling. We show that one can determine the power-law behaviour of the topics and words in a Bayesian fashion by defining a prior on the parameters of SGGP.

1. Introduction

Mixed membership modelling is the problem of assigning an object to multiple latent classes/features simultaneously. Depending upon the problem, one can allow a single latent feature to be exhibited once or multiple times by the object. For instance, a document may comprise several topics, with each topic occurring in the document with variable multiplicity. The corresponding problem of mapping the words of a document to topics is referred to as topic modelling.

While parametric solutions to mixed membership modelling have been available in the literature for more than a decade (Landauer & Dumais, 1997; Hofmann, 1999; Blei et al., 2001), the first non-parametric approach that allowed the number of latent classes to be determined as well was the hierarchical Dirichlet process (HDP) (Teh et al., 2006). Both approaches model the object as a set of repeated draws from an object-specific distribution, where the object-specific distribution is itself sampled from a common distribution. On the other hand, recent approaches such as the hierarchical beta-negative binomial process (Zhou et al., 2012; Broderick et al., 2015) and the hierarchical gamma-Poisson process (Titsias, 2008; Zhou & Carin, 2015) model the object as a point process sampled from an object-specific random measure, which is itself sampled from a common random measure. In some sense, these approaches are more natural for mixed membership modelling, since they model the object as a single entity rather than as a sequence of draws from a distribution.

A straightforward implementation of any of the above non-parametric models would require sampling the atoms of the non-parametric distributions for the base as well as the object-specific measure. However, since the number of atoms in these distributions is often infinite, a truncation step is required to ensure tractability. Alternatively, for the HDP, a Chinese restaurant franchise scheme (Teh et al., 2006) can be used for collapsed inference in the model (that is, without explicitly instantiating the atoms). Fully collapsed inference schemes have also been proposed for the beta-negative binomial process (BNBP) (Heaukulani & Roy, 2013; Zhou, 2014) and the Gamma-Gamma-Poisson process (Zhou et al., 2015). Of particular relevance is the work of Roy (2014), in which a Chinese restaurant franchise scheme is proposed for hierarchies of beta processes (and their generalizations) coupled with a Bernoulli process.

In this paper, our aim is to extend fully collapsed sampling so as to allow any completely random measure (CRM) as the choice of base and object-specific measure. As Roy (2014) did for hierarchies of generalized beta processes, we propose Chinese restaurant franchise schemes for hierarchies of CRMs coupled with a Poisson process. We hope that this will encourage the use of hierarchical random measures other than the HDP and BNBP for mixed membership modelling, and will lead to further research into the applicability of the various random measures. To illustrate the flexibility that can be obtained by using other measures, we propose the sum of generalized gamma process (SGGP), which allows one to determine the power term in the power-law distribution of topics with documents by defining a prior on the parameters of SGGP. Alternatively, one can also define a prior directly on the discount parameter.

The main contributions in this paper are as follows:

• We derive the marginal distributions of Poisson processes when coupled with CRMs.
• We provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measure.
• We provide a Gibbs sampling approach for sampling a Poisson process from a hierarchical CRM.
• In the experiments section, we propose the sum of generalized gamma process (SGGP) and show its applicability to topic modelling. By defining a prior on the parameters of SGGP, one can determine the power-law distribution of the topics and words in a Bayesian fashion.

2. Preliminaries and background

In this section, we fix the notation and recall a few well-known results from the theory of point processes.

2.1. Poisson process

Let $(S, \mathcal{S})$ be a measurable space and $\Pi$ be a random countable collection of points on $S$. Let $N(A) = |\Pi \cap A|$ for any measurable set $A$; $N$ is known as the counting process of $\Pi$. $\Pi$ is called a Poisson process if $N(A)$ is independent of $N(B)$ whenever $A$ and $B$ are disjoint measurable sets, and $N(A)$ is Poisson distributed with mean $\mu(A)$ for a fixed σ-finite measure $\mu$. In the sequel, we refer to both the random collection $\Pi$ and its counting process $N$ as a Poisson process.

Let $(T, \mathcal{T})$ be another measurable space and $f : S \to T$ be a measurable function. If the push-forward measure of $\mu$ via $f$, that is, $\mu \circ f^{-1}$, is non-atomic, then $f(\Pi) = \{f(x) : x \in \Pi\}$ is also a Poisson process with mean measure $\mu \circ f^{-1}$. This is known as the mapping proposition for Poisson processes (Kingman, 1992). Moreover, if $\Pi_1, \Pi_2, \ldots$ is a countable collection of independent Poisson processes with mean measures $\mu_1, \mu_2, \ldots$ respectively, then the union $\Pi = \bigcup_{i=1}^{\infty} \Pi_i$ is also a Poisson process with mean measure $\mu = \sum_{i=1}^{\infty} \mu_i$. This is known as the superposition proposition. Equivalently, if $N_i$ is the counting process of $\Pi_i$, then $N = \sum_{i=1}^{\infty} N_i$ is the counting process of a Poisson process with mean measure $\mu = \sum_{i=1}^{\infty} \mu_i$.

Finally, let $g$ be a measurable function from $S$ to $\mathbb{R}$, and let $\Sigma = \sum_{x \in \Pi} g(x)$. By Campbell's proposition (Kingman, 1992), $\Sigma$ is absolutely convergent with probability one if and only if

$$\int_S \min\left(|g(x)|, 1\right) \mu(dx) < \infty. \tag{1}$$

If this condition holds, then for any $t > 0$,

$$\mathbb{E}\left[e^{-t\Sigma}\right] = \exp\left(-\int_S \left(1 - e^{-t g(x)}\right) \mu(dx)\right). \tag{2}$$

2.2. Completely random measures

Let $(\Omega, \mathcal{F}, P)$ be some probability space. Let $(M(S), \mathcal{B})$ be the space of all σ-finite measures on $(S, \mathcal{S})$ supplied with an appropriate σ-algebra. A completely random measure (CRM) $\Lambda$ on $(S, \mathcal{S})$ is a measurable mapping from $\Omega$ to $M(S)$ such that

1. $P\{\Lambda(\emptyset) = 0\} = 1$,
2. for any disjoint countable collection of sets $A_1, A_2, \ldots$, the random variables $\Lambda(A_i), i = 1, 2, \ldots$ are independent, and $\Lambda(\bigcup_i A_i) = \sum_i \Lambda(A_i)$ holds almost surely (the independent increments property).

An important characterization of CRMs in terms of Poisson processes is as follows (Kingman, 1967). For any CRM $\Lambda$ on $(S, \mathcal{S})$ without any fixed atoms or deterministic component, there exists a Poisson process $N$ on $(\mathbb{R}^+ \times S, \mathcal{B}_{\mathbb{R}^+} \otimes \mathcal{S})$ such that $\Lambda(dx) = \int_{\mathbb{R}^+} z \, N(dz, dx)$. Using Campbell's proposition, the Laplace transform of $\Lambda(A)$ for a measurable set $A$ is given by the following formula:

$$\mathbb{E}\left[e^{-t\Lambda(A)}\right] = \exp\left(-\int_{\mathbb{R}^+ \times A} \left(1 - e^{-tz}\right) \nu(dz, dx)\right), \quad t \ge 0, \tag{3}$$

where $\nu$ denotes the mean measure of the underlying Poisson process $N$. $\nu$ is also referred to as the Poisson intensity measure of $\Lambda$. If $\nu(dz, dx) = \rho(dz)\,\mu(dx)$ for a σ-finite measure $\mu$ on $S$ and a σ-finite measure $\rho$ on $\mathbb{R}^+$ that satisfies $\int_{\mathbb{R}^+} (1 - e^{-tz}) \, \rho(dz) < \infty$, then $\Lambda(\cdot)$ is known as a homogeneous CRM. In the sequel, we assume $\mu(\cdot)$ to be finite. Moreover, unless specified otherwise, whenever we refer to a CRM, we mean a homogeneous completely random measure without any fixed atoms or deterministic component.

Let $N$ be the Poisson process of the CRM $\Lambda$, that is, $\Lambda(dx) = \int_{\mathbb{R}^+} z \, N(dz, dx)$. If $\Pi$ is the random collection of points corresponding to $N$, then $\Lambda$ can equivalently be written as $\Lambda = \sum_{(z, x) \in \Pi} z \, \delta_x$. The values $\{z : (z, x) \in \Pi\}$ constitute the weights of the CRM $\Lambda$. By the mapping

Our aim is to infer the latent features $N_i, 1 \le i \le n$, from $X_i, 1 \le i \le n$.