Nonparametric Dynamic Network Modeling
Ayan Acharya, Dept. of ECE, UT Austin ([email protected]); Avijit Saha, Dept. of CSE, IIT Madras ([email protected]); Mingyuan Zhou, Dept. of IROM, UT Austin ([email protected]); Dean Teffer, Applied Research Laboratories, UT Austin ([email protected]); Joydeep Ghosh, Dept. of ECE, UT Austin ([email protected])

ABSTRACT
Certain relational datasets, such as those representing social networks, evolve over time, motivating the development of time-series models in which the characteristics of the underlying groups of entities can adapt with time. This paper proposes the Dynamic Gamma Process Poisson Factorization for Network (D-NGPPF) modeling framework, wherein binary network entries are modeled using a truncated Poisson distribution and the ideal number of network groups is discovered from the data itself. Crucially, a Gamma-Markov chain enables the characteristics of these groups to evolve smoothly over time. Exploiting the properties of the Negative Binomial distribution and a novel data augmentation technique, closed-form Gibbs sampling updates are derived that yield superior empirical results on both synthetic and real-world datasets.

Keywords
Dynamic network modeling, Poisson factorization, Gamma Process

1. INTRODUCTION
Many complex social and biological interactions can be naturally represented as graphs, and often these graphs evolve over time. For example, an individual in a social network can become acquainted with a new person, an author can collaborate with a new co-author on a research paper, and proteins can change their interactions to form new compounds. Consequently, a variety of statistical and graph-theoretic approaches have been proposed for modeling both static and dynamic networks [2; 10; 11; 21; 23; 30; 31; 37; 38].

Of particular interest in this work are scalable techniques that can identify groups or communities and track their evolution. Existing non-parametric Bayesian approaches for this task promise to solve the model selection problem of identifying an appropriate number of groups, but are computationally intensive and often do not match the characteristics of real datasets. All such models assume that the data comes from a latent space that either has discrete sets of configurations [9; 23; 30] or is modeled using a Gaussian distribution [10; 15; 37]. Approaches that employ discrete latent states do not have closed-form inference updates, mostly due to the presence of probit or logit links. On the other hand, the Gaussian assumption is often overly restrictive for modeling binary matrices. Since the inference techniques for linear dynamical systems are well-developed, one is usually tempted to connect a binary observation to a latent Gaussian random variable using a probit or logit link. Such approaches, however, involve heavy computation and lack an intuitive interpretation of the latent states.

This work attempts to address such inadequacies by introducing an efficient and effective model for binary matrices that evolve over time. Its contributions include:
• A novel non-parametric Gamma Process dynamic network model that predicts the number of latent network communities from the data itself.
• A technique for allowing the weights of these latent communities to vary smoothly over time using a Gamma-Markov chain, the inference of which is solved using an augmentation trick associated with the Negative Binomial distribution together with a forward-backward sampling algorithm, each step of which has closed-form updates.
• Empirical results indicating clear superiority of the proposed dynamic network model over existing baselines for dynamic and static network modeling.

The rest of the paper is organized as follows. Pertinent background and related work are outlined in Section 2. A detailed description of the Gamma Process network model is provided in Section 3, followed by a description of the Dynamic Gamma Process network model in Section 4. Empirical results for both synthetic and real-world data are reported in Section 5. Finally, conclusions and future work are listed in Section 6.
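The Gamma-Markov chain behind the second contribution is specified later in the paper; purely as an illustration of the smoothing idea, the sketch below (Python/NumPy; the exact D-NGPPF construction may differ, and the hyperparameter `c` is illustrative) makes each step's community weight a gamma draw whose shape parameter is the previous weight, so with c = 1 the chain is centered on its past value.

```python
import numpy as np

def gamma_markov_chain(lam0, T, c, rng):
    # lam_t | lam_{t-1} ~ Gamma(shape=lam_{t-1}, scale=1/c), so
    # E[lam_t | lam_{t-1}] = lam_{t-1} / c; with c = 1 the weight
    # evolves smoothly around its previous value.
    lam = [lam0]
    for _ in range(T - 1):
        lam.append(rng.gamma(shape=lam[-1], scale=1.0 / c))
    return np.array(lam)
```

Chaining gamma distributions through the shape parameter (rather than the scale) is what later admits the Negative Binomial augmentation used for closed-form backward sampling.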
2. BACKGROUND AND RELATED WORK

2.1 Negative Binomial Distribution

Lemma 2.1. Let $x_k \sim \text{Pois}(\zeta_k)\ \forall k$, $X = \sum_{k=1}^K x_k$, and $\zeta = \sum_{k=1}^K \zeta_k$. If $(y_1, \cdots, y_K) \sim \text{mult}(X; \zeta_1/\zeta, \cdots, \zeta_K/\zeta)$, then the following holds:
$$P(x_1, \cdots, x_K) = P(y_1, \cdots, y_K, X). \quad (1)$$

The negative binomial (NB) distribution $m \sim \text{NB}(r, p)$, with probability mass function (PMF) $\Pr(M = m) = \frac{\Gamma(m+r)}{m!\,\Gamma(r)}\, p^m (1-p)^r$ for $m \in \mathbb{Z}_{\ge 0}$, can be augmented into a gamma-Poisson construction as $m \sim \text{Pois}(\lambda)$, $\lambda \sim \text{Gamma}(r, p/(1-p))$, where the gamma distribution is parameterized by its shape $r$ and scale $p/(1-p)$. It can also be augmented under a compound Poisson representation as $m = \sum_{t=1}^{l} u_t$, $u_t \overset{iid}{\sim} \text{Log}(p)$, $l \sim \text{Pois}(-r \ln(1-p))$, where $u \sim \text{Log}(p)$ is the logarithmic distribution [18].
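The gamma-Poisson augmentation above is easy to check by simulation. The following sketch (Python/NumPy; function names are mine) draws from NB(r, p) both directly and via the augmentation; note that NumPy parameterizes the negative binomial by the per-trial success probability, which is 1 − p relative to the PMF used here.

```python
import numpy as np

def nb_direct(r, p, size, rng):
    # Draw m ~ NB(r, p) directly. NumPy's negative_binomial uses the
    # success probability of each trial, i.e. 1 - p relative to the
    # PMF proportional to p^m (1-p)^r in the text.
    return rng.negative_binomial(r, 1.0 - p, size=size)

def nb_gamma_poisson(r, p, size, rng):
    # Gamma-Poisson augmentation: lambda ~ Gamma(shape=r, scale=p/(1-p)),
    # m | lambda ~ Pois(lambda); marginally m ~ NB(r, p).
    lam = rng.gamma(shape=r, scale=p / (1.0 - p), size=size)
    return rng.poisson(lam)
```

Both samplers share the mean $rp/(1-p)$ and variance $rp/(1-p)^2$, which a quick Monte Carlo comparison confirms.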
Lemma 2.2 ([39]). If $m \sim \text{NB}(r, p)$ is represented under its compound Poisson representation, then the conditional posterior of $l$ given $m$ and $r$ has PMF:
$$\Pr(l = j \mid m, r) = \frac{\Gamma(r)}{\Gamma(m+r)}\, |s(m, j)|\, r^j, \quad j = 0, 1, \cdots, m, \quad (2)$$
where $|s(m, j)|$ are unsigned Stirling numbers of the first kind. We denote this conditional posterior as $l \sim \text{CRT}(m, r)$, a Chinese restaurant table (CRT) count random variable, which can be generated via $l = \sum_{n=1}^{m} z_n$, $z_n \sim \text{Bernoulli}(r/(n-1+r))$.

Lemma 2.3. If $\lambda \sim \text{Gamma}(r, 1/c)$ and $x_i \sim \text{Pois}(m_i \lambda)$, then $x = \sum_i x_i \sim \text{NB}(r, p)$, where $p = \frac{\sum_i m_i}{c + \sum_i m_i}$.

Lemma 2.4. If $\lambda \sim \text{Gamma}(r, 1/c)$ and $x_i \sim \text{Pois}(m_i \lambda)$, then $\lambda \mid \{x_i\} \sim \text{Gamma}(r + \sum_i x_i,\ 1/(c + \sum_i m_i))$.

Lemma 2.5. If $x_i \sim \text{Pois}(m_i r_2)$, $r_2 \sim \text{Gamma}(r_1, 1/d)$, and $r_1 \sim \text{Gamma}(a, 1/b)$, then $(r_1 \mid -) \sim \text{Gamma}(a + \ell,\ 1/(b - \log(1-p)))$, where $(\ell \mid x, r_1) \sim \text{CRT}(\sum_i x_i, r_1)$ and $p = \sum_i m_i/(d + \sum_i m_i)$. The proof and an illustration can be found in Section 3.3 of [1].
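The Bernoulli construction in Lemma 2.2 gives a direct way to sample a CRT random variable, and its expectation follows immediately from linearity. A minimal sketch (Python/NumPy; function names are mine):

```python
import numpy as np

def sample_crt(m, r, rng):
    # l = sum_{n=1}^{m} z_n with z_n ~ Bernoulli(r / (n - 1 + r)).
    if m == 0:
        return 0
    n = np.arange(1, m + 1)
    return int(rng.binomial(1, r / (n - 1 + r)).sum())

def crt_mean(m, r):
    # E[l] = sum_{n=1}^{m} r / (n - 1 + r), i.e. the expected number
    # of occupied tables with m customers and concentration r.
    n = np.arange(1, m + 1)
    return float((r / (n - 1 + r)).sum())
```

Averaging many draws of `sample_crt` recovers `crt_mean`, which is a convenient sanity check when implementing the augmented Gibbs updates.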
Lemma 2.6. If $r_i \sim \text{Gamma}(a_i, 1/b)\ \forall i \in \{1, 2, \cdots, K\}$ and $b \sim \text{Gamma}(c, 1/d)$, then $b \mid \{r_i\} \sim \text{Gamma}\!\left(\sum_{i=1}^K a_i + c,\ 1\big/\!\left(\sum_{i=1}^K r_i + d\right)\right)$.

2.2 Gamma Process

Following [36], for any $\nu^+ \ge 0$ and any probability distribution $\pi(dp\,d\omega)$ on the product space $\mathbb{R} \times \Omega$, let $K^+ \sim \text{Pois}(\nu^+)$ and $(p_k, \omega_k) \overset{iid}{\sim} \pi(dp\,d\omega)$ for $k = 1, \cdots, K^+$. Defining $\mathbf{1}_A(\omega_k)$ as one if $\omega_k \in A$ and zero otherwise, the random measure $L(A) \equiv \sum_{k=1}^{K^+} \mathbf{1}_A(\omega_k)\, p_k$ assigns independent infinitely divisible random variables $L(A_i)$ to disjoint Borel sets $A_i \subset \Omega$, with characteristic functions
$$\mathbb{E}\!\left[e^{itL(A)}\right] = \exp\left\{\iint_{\mathbb{R} \times A} (e^{itp} - 1)\, \nu(dp\,d\omega)\right\}, \quad (3)$$
where $\nu(dp\,d\omega) \equiv \nu^+ \pi(dp\,d\omega)$. A random signed measure $L$ satisfying the above characteristic function is called a Lévy random measure. In the prior, the number of atoms with weights greater than a threshold $\tau$ follows a Poisson distribution, the value of which decreases as $\tau$ increases.

2.3 Poisson Factor Analysis

A large number of discrete latent variable models for count matrix factorization can be united under Poisson factor analysis (PFA) [42], which factorizes a count matrix $Y \in \mathbb{Z}_{\ge 0}^{D \times V}$ under the Poisson likelihood as $Y \sim \text{Pois}(\Phi\Theta)$, where $\Phi \in \mathbb{R}_+^{D \times K}$ is the factor loading matrix (dictionary) and $\Theta \in \mathbb{R}_+^{K \times V}$ is the factor score matrix. A wide variety of algorithms, although constructed with different motivations and for distinct problems, can all be viewed as PFA with different prior distributions imposed on $\Phi$ and $\Theta$. For example, non-negative matrix factorization [7; 26], with the objective of minimizing the Kullback-Leibler divergence between $Y$ and its factorization $\Phi\Theta$, is essentially PFA solved with maximum likelihood estimation. LDA [4] is equivalent to PFA, in terms of both block Gibbs sampling and variational inference, if Dirichlet distribution priors are imposed on both $\phi_k \in \mathbb{R}_+^D$, the columns of $\Phi$, and $\theta_k \in \mathbb{R}_+^V$, the columns of $\Theta$. The gamma-Poisson model [6; 34] is PFA with gamma priors on $\Phi$ and $\Theta$. A family of negative binomial (NB) processes, such as the beta-NB [5; 42] and gamma-NB processes [39; 41], impose different gamma priors on $\{\theta_{vk}\}$, the marginalization of which leads to differently parameterized NB distributions to explain the latent counts. Both the beta-NB and gamma-NB process PFAs are nonparametric Bayesian models that allow $K$ to grow without limit [14].

2.4 Static and Dynamic Network Modeling

We mention a select few of the most relevant approaches from a substantial literature on this topic. Among static latent variable based models, the Infinite Relational Model (IRM [21]) allows for multiple types of relations between entities in a network and an infinite number of clusters, but restricts these entities to belong to only one cluster. The Mixed Membership Stochastic Blockmodel (MMSB [2]) assumes that each node in the network can exhibit a mixture of communities. Though the MMSB has been applied successfully to discover complex network structure in a variety of applications, the computational complexity of the underlying in-