User Profile Preserving Social Network Embedding

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Daokun Zhang (a), Jie Yin (b), Xingquan Zhu (c, d), Chengqi Zhang (a)
(a) Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Australia
(b) Data61, CSIRO, Australia
(c) Dept. of Computer & Electrical Engineering and Computer Science, Florida Atlantic University, USA
(d) School of Computer Science, Fudan University, China

Abstract

This paper addresses social network embedding, which aims to embed social network nodes, including user profile information, into a latent low-dimensional space. Most of the existing works on network embedding only consider network structure, but ignore user-generated content that could be potentially helpful in learning a better joint network representation. Different from rich node content in citation networks, user profile information in social networks is useful but noisy, sparse, and incomplete. To properly utilize this information, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile with network structure to jointly learn a vector representation of a social network. The theme of UPP-SNE is to embed user profile information via a non-linear mapping into a consistent subspace, where network structure is seamlessly encoded to jointly learn informative node representations. Extensive experiments on four real-world social networks show that compared to state-of-the-art baselines, our method learns better social network representations and achieves substantial performance gains in node classification and clustering tasks.

1 Introduction

The huge growth of online social networks, e.g., Facebook, Twitter, Google Talk, Wechat, etc., has created a new way for people to connect, express themselves, and share content with others in today's cyber society. Users in online social networks are connected with each other to form a social graph (e.g., the friendship graph). One of the most critical problems in social network analysis is the automatic classification of users into meaningful groups based on their social graphs, which has many useful practical applications such as user search, targeted advertising and recommendation systems. Therefore, it is essential to accurately learn useful information from social networks. One promising strategy is to learn a vector representation of a social network: each network node is represented as a low-dimensional vector such that the information conveyed by the original social network can be effectively captured. As a result, existing machine learning methods can be directly applied in the low-dimensional vector space to perform network analytic tasks such as node classification, network clustering, etc.

Recently, a series of algorithms have been proposed for network representation learning (NRL), such as DeepWalk [Perozzi et al., 2014], LINE [Tang et al., 2015], GraRep [Cao et al., 2015], and node2vec [Grover and Leskovec, 2016]. These approaches have been shown to be effective in a variety of network analytic tasks, ranging from node classification [Sen et al., 2008], anomaly detection [Bhuyan et al., 2014], community detection [Yang et al., 2013], to link prediction [Lü and Zhou, 2011]. However, most of them have considered network structure only, e.g., the links between nodes, but ignored other user-generated content (e.g., text, user profiles) that could potentially benefit network representation learning and subsequent analytic tasks.

In this paper, we are mainly concerned with the problem of social network embedding, which embeds each user in a social network into a latent low-dimensional space. In social networks, users are not only connected by social relationships (e.g., friendship or the follower-followee relationship), but they are also associated with user profile information, consisting of attributes such as gender, geographic location, interests, or school/affiliation. Such profile information can reflect and affect the forming of community structures and social circles [Leskovec and Mcauley, 2012; Yang et al., 2013]. Motivated by the fact that user profile information is potentially helpful in learning a better joint network representation, we focus on studying how user profile information can be leveraged and incorporated into the learning of social network representations.

Indeed, several very recently developed algorithms [Yang et al., 2015; Pan et al., 2016] have attempted to utilize node content information, such as textual features of each node in citation networks, for effective network representation learning. These existing works have confirmed that node content indeed provides crucial information to learn better network representations. However, as we will soon demonstrate in Section 5, these methods are mainly designed to consider consistent node content, but fail to work for user profiles in social networks. This is mainly attributed to two reasons. First, today's online social networks rely on users to manually input profile attributes, so attributes in profiles could be very sparse, incomplete, and noisy. The values of user profile attributes are often long-tail distributed, and the values of some attributes like school or address may occur very infrequently or are simply missing. Second, different from node content such as posts and comments that are topic-centric, user profiles are depicted by user attributes on different dimensions, such as interests, or school/affiliation, and the values on these dimensions are completely distinct and inconsistent. It is thus very difficult to find useful information from user profiles that could complement network structure towards learning a joint vector representation of social networks.

In Table 1, we summarize major characteristics of information sources available for network embedding. Our analysis and empirical study confirm that user profiles are largely different from node content features, and therefore existing rich node content based network embedding methods are ineffective in handling user profiles for representation learning.

Table 1: Characteristics of information sources used for network embedding

Info. Sources   Consistency   Sparsity   Noise    Incompleteness
Structure       Medium        High       Medium   Low
Node content    High          Medium     Low      Low
User profile    Low           High       High     High

To overcome the above-mentioned difficulties, we propose a new algorithm called User Profile Preserving Social Network Embedding (UPP-SNE), which incorporates user profile information with network structure to jointly learn a vector representation of social networks. The theme of the UPP-SNE algorithm is to learn a joint embedding representation by performing a non-linear mapping on user profiles guided by network structure. In this feature reconstruction process, network structure helps filter out noisy information from user profiles and embed them into a consistent subspace, into which topology structure is seamlessly encoded to jointly learn an informative embedding representation. The interplay between user profile information and network structure enables them to complement each other towards learn-

2 Related Work

In this section, we review two lines of NRL algorithms, namely network structure preserving NRL methods that are based on network structure only, and content augmented NRL methods that combine node content with network structure to enhance network representation learning.

2.1 Network Structure Preserving NRL Methods

DeepWalk [Perozzi et al., 2014] is one of the pioneering works for learning node representations in networks. Following the idea of Skip-Gram [Mikolov et al., 2013], DeepWalk generates node context using truncated random walks and learns node representations that allow the nodes sharing similar node context to be represented similarly. LINE [Tang et al., 2015] formulates a clearer objective function to preserve the first-order proximity and the second-order proximity. GraRep [Cao et al., 2015] further considers higher-order proximities that describe the representation similarity between nodes sharing indirect neighbors. Very recently, SDNE [Wang et al., 2016] was proposed to learn non-linear network representations by applying a deep autoencoder model to the node adjacency matrix and exploiting the first-order proximity as supervised information. To capture both the local and global network structure, node2vec [Grover and Leskovec, 2016] exploits biased random walks to generate context nodes, and then applies DeepWalk [Perozzi et al., 2014] to learn node representations. The above NRL algorithms consider only network structure, without taking advantage of user-generated content widely available in social networks to learn more informative network representations.

2.2 Content Augmented NRL Methods

Text-associated DeepWalk (TADW) [Yang et al., 2015] is the first attempt to incorporate textual features into NRL. By proving DeepWalk is equivalent to matrix factorization, TADW incorporates textual features into network embedding through matrix factorization. TADW can be regarded as a special case of our proposed algorithm where a linear mapping is performed on node content features. TriDNR [Pan et al., 2016] further
