Semantic Community Identification in Large Attribute Networks

Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16)

Xiao Wang1,5, Di Jin1, Xiaochun Cao2, Liang Yang2,3, Weixiong Zhang4,5
1School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
2State Key Laboratory of Information Security, IIE, Chinese Academy of Sciences, Beijing 100093, China
3School of Information Engineering, Tianjin University of Commerce, Tianjin 300134, China
4College of Math and Computer Science, Institute for Systems Biology, Jianghan University, Wuhan, Hubei 430056, China
5Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130, USA
{wangxiao cv, jindi}@tju.edu.cn, {caoxiaochun, yangliang}@iie.ac.cn, [email protected]

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Identification of modular or community structures of a network is a key to understanding the semantics and functions of the network. While many network community detection methods have been developed, which primarily explore network topologies, they provide little semantic information of the communities discovered. Although structures and semantics are closely related, little effort has been made to discover and analyze these two essential network properties together. By integrating network topology and semantic information on nodes, e.g., node attributes, we study the problems of detection of communities and inference of their semantics simultaneously. We propose a novel nonnegative matrix factorization (NMF) model with two sets of parameters, the community membership matrix and community attribute matrix, and present efficient updating rules to evaluate the parameters with a convergence guarantee. The use of node attributes improves upon community detection and provides a semantic interpretation to the resultant network communities. Extensive experimental results on synthetic and real-world networks not only show the superior performance of the new method over the state-of-the-art approaches, but also demonstrate its ability to semantically annotate the communities.

Introduction

Complex systems can be represented in networks or graphs. One of the most prominent features of such networks is the community structure, where the nodes within a community are densely connected whereas nodes in different communities are sparsely connected (Girvan and Newman 2002). Community structures help reveal organizational structures and functional components of a complex system. Therefore, community detection is an essential step toward characterization of a complex system.

Network topology, an important network description, has been broadly exploited by most existing methods for community detection. However, network topology reflects merely one aspect of a network and is often noisy. As a result, using network topology alone may not necessarily give rise to a satisfactory partition of a network. For instance, it is not uncommon that two nodes that belong to the same community are not directly connected, and a node connecting to multiple communities for distinct reasons is difficult to be assigned correctly to the right communities by only relying on network topology. Therefore, it is insufficient to accurately determine the community structure using network topology alone.

In addition to network topology, semantic information, e.g., that of node attributes, is often available. For example, a node (i.e., a person) in a social network is often annotated by a personal profile with information such as education background, circle of friends and profession; a node (i.e., a paper) in a citation network is typically annotated with title, abstract and key words. Different from network topology, node semantics capture characteristics of individual nodes and provide a piece of valuable information orthogonal to information of network topology. Integration of network topological and semantic information holds a great potential for community identification.

Nevertheless, it is technically challenging to effectively combine these two pieces of valuable albeit orthogonal information. Particularly, two obstacles need to be addressed in order to properly integrate these two types of information. First, how to adequately characterize a community. Most existing methods for community detection mainly rely on network topologies. However, missing, meaningless or even erroneous edges are ubiquitous in real networks, which casts doubts on the accuracy and/or correctness of the network communities discovered based on network topology alone. While the nodes in a community are highly connected, they should also have similar characteristics, reflected by attributes. Thus, node attributes may carry essential information of communities that is complementary to the information of network topology. Therefore, even though two nodes are not directly connected, they may belong to the same community if they share the same characteristics, and the use of node attributes may enhance community discovery. Second, how to adequately interpret or semantically annotate communities. Functional analysis of network communities is typically an independent, post-processing task following community detection. The result from a community discovery often provides little information beyond network topology regarding why a group of nodes form a community, their semantic meaning, or potential functions. In order to semantically annotate a community, supplemental information, e.g., background information and/or domain knowledge, is usually required. Even though such domain information is available, how to fully utilize such information remains challenging, application specific and time consuming.

To address the above two problems, we propose and develop in this paper a method, named Semantic Community Identification (SCI), to identify network communities with semantic annotation. The SCI method integrates network topological and node semantic information; it combines topology based community memberships and node-attribute based community attributes (or semantics) in the framework of nonnegative matrix factorization (NMF; Seung and Lee 2001). The key intuition behind SCI stems from two observations: two nodes are likely to be connected if their community memberships are similar, and two nodes likely belong to the same community if their attributes are consistent with the underlying community attributes to be learned. To make the novel SCI method effective, we introduce a sparsity penalty in order to select the most related attributes for each community and devise a multiplicative updating rule with a convergence guarantee. Extensive experiments on synthetic and real networks, in comparison with several state-of-the-art methods, are performed to assess the performance of SCI.

Related work

Several community detection methods, as reviewed in (Xie, Kelley, and Szymanski 2013), have been developed to explore network topologies, including the well-known ones based on nonnegative matrix factorization (NMF) (Wang et al. 2011; Yang and Leskovec 2013) and stochastic blockmodel (SBM) (Karrer and Newman 2011). Among these methods are ones that combine network topologies and node attributes (content or features). In particular, a unified method was suggested to combine a conditional model for topology analysis and a discriminative model for making use of node attributes (Yang et al. 2009). However, this method focuses on community detection without inferring the most relevant attributes for each community. Edge content was also leveraged to improve community detection [...] social relations and user generated content in a social network (Pei, Chakraborty, and Sycara 2015). However, this method focused on utilizing additional content information to detect communities, and failed to study the relationship between communities and this content.

SCI: The network model

Consider an undirected network $G = (V, E)$ with $n$ nodes $V$ and $e$ edges $E$, represented by a binary-valued adjacency matrix $A \in \mathbb{R}^{n \times n}$. Associated with each node $i$ are its attributes $S_i$, which may be semantic characteristics of the node. The attributes of a node are in the form of an $m$-dimensional binary-valued vector, and the attributes of all the nodes can be represented by a node attribute matrix $S \in \mathbb{R}^{n \times m}$. The problem of community identification is to partition the network $G$ into $k$ communities as well as to infer the related attributes or semantics of each community.

Modeling network topologies. We define the propensity of node $i$ belonging to community $j$ as $U_{ij}$. The community membership of all the nodes in the network is then $U = (U_{ij})$, where $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, k$. Consequently, $U_{ir} U_{pr}$ represents the expected number of edges between nodes $i$ and $p$ in community $r$. Summing over all communities, the expected number of edges between $i$ and $p$ is $\sum_{r=1}^{k} U_{ir} U_{pr}$. This process of generating edges implies that if two nodes have similar community memberships, they have a high propensity to be linked. The expected number of edges between pairs of nodes should be as closely consistent as possible with the network topology denoted by $A$, which gives rise to the following function in matrix formulation:

$$\min_{U \ge 0} \; \| A - U U^{T} \|_F^2. \qquad (1)$$

Modeling node attributes. We define the propensity of community $r$ to have attribute $q$ as $C_{qr}$. So for all the com-
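
To make the topology model of Eq. (1) concrete, the following minimal Python sketch fits a nonnegative membership matrix U to a small adjacency matrix A by reducing ||A - UU^T||_F^2. It is an illustration under stated assumptions, not the updating rule derived in this paper: the damped multiplicative update used here is a commonly used rule for symmetric NMF, and the names `topology_loss` and `fit_membership`, the toy graph, and all parameter values are hypothetical.

```python
import numpy as np


def topology_loss(A, U):
    """Squared Frobenius error ||A - U U^T||_F^2, the objective of Eq. (1)."""
    R = A - U @ U.T
    return float(np.sum(R * R))


def fit_membership(A, k, n_iter=300, beta=0.5, eps=1e-12, seed=0):
    """Fit U >= 0 with a damped multiplicative update (an assumed,
    generic symmetric-NMF rule, not necessarily the rule used by SCI)."""
    rng = np.random.default_rng(seed)
    # Strictly positive start so that no entry is stuck at zero.
    U = rng.random((A.shape[0], k)) + 1e-3
    losses = [topology_loss(A, U)]
    for _ in range(n_iter):
        numer = A @ U                  # (A U)_ij
        denom = U @ (U.T @ U) + eps    # (U U^T U)_ij, kept away from zero
        # The multiplying factor is nonnegative, so U stays nonnegative.
        U = U * (1.0 - beta + beta * numer / denom)
        losses.append(topology_loss(A, U))
    return U, losses


# Hypothetical toy graph: triangles {0,1,2} and {3,4,5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

U, losses = fit_membership(A, k=2)
labels = U.argmax(axis=1)  # hard assignment: community with largest propensity
print("initial loss:", losses[0], "final loss:", losses[-1])
print("labels:", labels)
```

Each row of U holds the propensities of one node for the k communities, so `argmax` over a row gives a hard assignment; keeping the full row instead preserves overlapping memberships.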
