Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
Generalized Latent Factor Models for Social Network Analysis

Wu-Jun Li†, Dit-Yan Yeung‡, Zhihua Zhang
† Shanghai Key Laboratory of Scalable Computing and Systems, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
‡ Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
College of Computer Science and Technology, Zhejiang University, China
[email protected], [email protected], [email protected]

Abstract

Homophily and stochastic equivalence are two primary features of interest in social networks. Recently, the multiplicative latent factor model (MLFM) was proposed to model social networks with directed links. Although MLFM can capture stochastic equivalence, it cannot model homophily well. However, many real-world networks exhibit homophily, or both homophily and stochastic equivalence, and hence the structure of these networks cannot be modeled well by MLFM. In this paper, we propose a novel model, called the generalized latent factor model (GLFM), for social network analysis by enhancing homophily modeling in MLFM. We devise a minorization-maximization (MM) algorithm with linear-time complexity and a convergence guarantee to learn the model parameters. Extensive experiments on some real-world networks show that GLFM can effectively model homophily and thus dramatically outperforms state-of-the-art methods.

1 Introduction

A social network¹ [Wasserman and Faust, 1994] is often represented as a graph in which the nodes represent the objects and the edges (also called links) represent the binary relations between objects. The edges in a graph can be directed or undirected. If the edges are directed, we call the graph a directed graph; otherwise, it is an undirected graph. Unless otherwise stated, we focus on directed graphs in this paper because an undirected edge can be represented by two directed edges with opposite directions. Some typical networks include friendship networks among people, web graphs, and paper citation networks.

As pointed out by [Hoff, 2008], homophily and stochastic equivalence are two primary features of interest in social networks. If an edge is more likely to exist between two nodes with similar characteristics than between two nodes with different characteristics, we say the graph exhibits homophily. For example, two individuals are more likely to be friends if they share common interests; hence, a friendship graph has the feature of homophily. On the other hand, if the nodes of a graph can be divided into groups where members within a group have similar patterns of links, we say the graph exhibits stochastic equivalence. The web graph has such a feature because some nodes can be described as hubs which are connected to many other nodes called authorities, but the hubs or authorities are seldom connected among themselves. For stochastic equivalence, the property that members within a group have similar patterns of links also implies that if two nodes link to or are linked by one common node, the two nodes most likely belong to the same group.

Examples of homophily and stochastic equivalence in directed graphs are illustrated in Figure 1, where the locations in the 2-dimensional space denote the characteristics of the points (nodes).

[Figure 1: Homophily and stochastic equivalence in networks. (a) homophily; (b) stochastic equivalence.]

From Figure 1(a), we can see that a link is more likely to exist between two points close to each other, which is the property of homophily. In Figure 1(b), the points form three groups associated with different colors, and the nodes in each group share similar patterns of links to nodes in other groups, but the nodes in the same group are not necessarily connected to each other. This is the property of stochastic equivalence. Note that in a graph exhibiting stochastic equivalence, two points close to each other are not necessarily connected, and connected points are not necessarily close to each other, which is different from the property of homophily.

As social network analysis (SNA) is becoming more and more important in a wide range of applications, many SNA models have been proposed [Goldenberg et al., 2009]. In this paper, we focus on latent variable models [Bartholomew and Knott, 1999], which have been successfully applied to model social networks [Hoff, 2008; Nowicki and Snijders, 2001; Hoff et al., 2002; Kemp et al., 2006; Airoldi et al., 2008; Hoff, 2009]. These models include the latent class model [Nowicki and Snijders, 2001] and its extensions [Kemp et al., 2006; Airoldi et al., 2008], the latent distance model [Hoff et al., 2002], the latent eigenmodel [Hoff, 2008], and the multiplicative latent factor model (MLFM) [Hoff, 2009]. Among all these models, the recently proposed latent eigenmodel, which includes both the latent class model and the latent distance model as special cases, can capture both homophily and stochastic equivalence in networks. However, it can only model undirected graphs. MLFM [Hoff, 2009] adapts the latent eigenmodel for directed graphs; however, as shown in our experiments, it cannot model homophily well.

In this paper, we propose a novel model, called the generalized latent factor model (GLFM), for social network analysis by enhancing homophily modeling in MLFM. The learning algorithm of GLFM is guaranteed to converge to a local optimum and has linear-time complexity. Hence, GLFM can be used to model large-scale graphs. Extensive experiments on community detection in some real-world networks show that GLFM dramatically outperforms existing methods.

¹ In this paper, we use the terms 'network', 'social network' and 'graph' interchangeably.

2 Notation and Definition

We use boldface uppercase letters, such as K, to denote matrices, and boldface lowercase letters, such as x, to denote vectors. The ith row and the jth column of a matrix K are denoted by K_{i*} and K_{*j}, respectively. K_{ij} denotes the element at the ith row and jth column of K, and x_i denotes the ith element of x. We use tr(K) to denote the trace of K, K^T its transpose, and K^{-1} its inverse. ‖·‖ denotes the length of a vector and |·| the cardinality of a set. I denotes the identity matrix, whose dimensionality depends on the context. For a matrix K, K ⪰ 0 means that K is positive semi-definite (psd) and K ≻ 0 means that K is positive definite (pd); K ⪰ M means K − M ⪰ 0. N(·) denotes the normal distribution, either for scalars or for vectors. ◦ denotes the Hadamard product (element-wise product).

Let N denote the number of nodes in a graph. A is the adjacency (link) matrix of the N nodes: A_{ij} = 1 if there exists a link from node i to node j, and A_{ij} = 0 otherwise. D denotes the number of latent factors. In real-world networks, if A_{ij} = 1 we can say that there is a relation from i to j. However, A_{ij} = 0 does not necessarily mean that there is no relation from i to j; in most cases, A_{ij} = 0 means that the relationship from i to j is missing. Hence, we use an indicator matrix Z to indicate whether or not an element is missing. More specifically, Z_{ij} = 1 means that A_{ij} is observed, while Z_{ij} = 0 means that A_{ij} is missing.

3 Multiplicative Latent Factor Models

The latent eigenmodel is formulated as follows²:

    Θ_{ik} = log odds(A_{ik} = 1 | X_{i*}, X_{k*}, μ) = μ + X_{i*} Λ X_{k*}^T,

where X is an N × D matrix with X_{i*} denoting the latent representation of node i, μ is a parameter reflecting the overall density of the links in the network, and Λ is a D × D diagonal matrix with the diagonal entries being either positive or negative. The latent eigenmodel generalizes both latent class models and latent distance models. It can model both homophily and stochastic equivalence in undirected graphs [Hoff, 2008].

To adapt the latent eigenmodel for directed graphs, MLFM defines

    Θ_{ik} = μ + X_{i*} Λ W_{k*}^T,    (1)

where X and W have orthonormal columns. Note that the key difference between the latent eigenmodel and MLFM lies in the fact that MLFM adopts a different receiver factor matrix W, which enables MLFM to model directed (asymmetric) graphs. As we will show in our experiments, this modification in MLFM makes it fail to model homophily in networks. Letting Θ = [Θ_{ik}]_{i,k=1}^N, we can rewrite MLFM as follows:

    Θ = μE + X Λ W^T,    (2)

where E is an N × N matrix with all entries being 1. We can find that MLFM is a special case of the following model:

    Θ = μE + U V^T.    (3)

For example, we can get MLFM by setting U = X and V = W Λ. Furthermore, it is easy to compute the X, W and Λ in (2) based on the learned U and V in (3). Hence, in the sequel, MLFM refers to the model in (3).

² Note that in this paper, we assume for simplicity that there is no attribute information for the links. It is straightforward to integrate attribute information into the existing latent variable models as well as into our proposed model.

4 Generalized Latent Factor Models

As discussed above, MLFM can capture stochastic equivalence but cannot model homophily well in directed graphs. Here, we propose our GLFM to enhance homophily modeling in MLFM.

4.1 Model

In GLFM, Θ_{ik} is defined as follows:

    Θ_{ik} = μ + (1/2) U_{i*} U_{k*}^T + (1/2) U_{i*} V_{k*}^T.    (4)

Comparing (4) to (3), we can find that GLFM generalizes MLFM by adding an extra term U_{i*} U_{k*}^T.³ It is this extra term that enables GLFM to model homophily in networks, which will be detailed in Section 4.2 when we analyze the objective function in (7). This will also be demonstrated empirically later in our experiments.

Based on (4), the likelihood of the observations can be defined as follows:

    p(A | U, V, μ) = ∏_{i≠k} [S_{ik}^{A_{ik}} (1 − S_{ik})^{1−A_{ik}}]^{Z_{ik}},    (5)

where

    S_{ik} = exp(Θ_{ik}) / (1 + exp(Θ_{ik})).    (6)

Note that, as in conventional SNA models, we ignore the diagonal elements of A. That is, in this paper we set A_{ii} = Z_{ii} = 0 by default.

Furthermore, we put normal priors on the parameters μ, U and V: p(μ) = N(μ | 0, τ^{-1}), p(U) = ∏_{d=1}^D N(U_{*d} | 0, βI), p(V) = ∏_{d=1}^D N(V_{*d} | 0, γI).

³ Note that the coefficient 1/2 in (4) makes no essential difference between (4) and (3); it is only for convenience of computation.
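The log-odds matrix in (4) and the Bernoulli likelihood in (5)–(6) are straightforward to compute. The following NumPy sketch is our own illustration (function names and toy inputs are ours, not the paper's code); it uses the identity A log S + (1 − A) log(1 − S) = A·Θ − log(1 + e^Θ) for numerical stability:

```python
import numpy as np

def glfm_theta(mu, U, V):
    """Log-odds matrix of GLFM, Eq. (4):
    Theta_ik = mu + 0.5 * U_i* U_k*^T + 0.5 * U_i* V_k*^T."""
    return mu + 0.5 * U @ U.T + 0.5 * U @ V.T

def glfm_log_likelihood(A, Z, mu, U, V):
    """Log of the observation likelihood, Eq. (5); missing entries and
    the diagonal are masked out by Z (Z_ii = 0 by default)."""
    Theta = glfm_theta(mu, U, V)
    # A*log(S) + (1-A)*log(1-S) simplifies to A*Theta - log(1 + exp(Theta))
    return np.sum(Z * (A * Theta - np.logaddexp(0.0, Theta)))
```

Note how the symmetric term U U^T raises the link probability between nodes with similar latent positions (homophily), while the asymmetric term U V^T retains the stochastic-equivalence modeling capacity of MLFM.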
4.2 Learning

Although the Markov chain Monte Carlo (MCMC) algorithms designed for other latent variable models can easily be adapted for GLFM, we do not adopt MCMC here because MCMC methods typically incur very high computational cost. Instead, we devise a minorization-maximization (MM) algorithm [Lange et al., 2000] to learn the model parameters. MM can be seen as an expectation-maximization (EM) algorithm [Dempster et al., 1977] without missing data, alternating between constructing a concave lower bound of the objective function and maximizing that bound.
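To make the MM principle concrete, the sketch below (our own illustration, not the paper's bound in Eq. (7)) performs one MM step on the bias μ with U and V held fixed. It exploits the standard fact that the logistic log-likelihood has second derivative bounded below by −1/4 per observed entry, so a quadratic minorizer exists in closed form and each step provably never decreases the objective:

```python
import numpy as np

def mm_update_mu(A, Z, Theta_offset, mu, tau=1.0):
    """One MM step for mu (with U, V fixed), maximizing
        l(mu) = sum_ik Z_ik [A_ik*Theta_ik - log(1+exp(Theta_ik))] - tau*mu^2/2,
    where Theta = mu + Theta_offset.  Since l''(mu) >= -(sum(Z)/4 + tau),
    g(mu) = l(mu0) + l'(mu0)(mu-mu0) - B/2*(mu-mu0)^2 with B = sum(Z)/4 + tau
    is a concave lower bound of l; the update below is its maximizer.
    """
    Theta = mu + Theta_offset
    S = 1.0 / (1.0 + np.exp(-Theta))        # logistic link, Eq. (6)
    grad = np.sum(Z * (A - S)) - tau * mu   # l'(mu)
    B = np.sum(Z) / 4.0 + tau               # curvature bound
    return mu + grad / B                    # argmax of the minorizer
```

Because l(mu_new) >= g(mu_new) >= g(mu_old) = l(mu_old), iterating this update increases the objective monotonically, which is exactly the convergence mechanism an MM algorithm relies on.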