Predicting Interaction Quality in Online Social Networks

Ting Tang, Aubrey Henderson, Ding Zhao

Abstract

The quality of interactions among users on social media sites often depends both on properties intrinsic to the users themselves and on the users' past interactions on the sites. We study the Chatous random chat network, for which interaction quality is measured by the conversation length between users. By accounting for user properties and modeling user interactions as graph properties, we can accurately predict the quality of interactions between pairs of users in this network before they interact. We experiment with two methods of incorporating user and graph properties. First, we build a support vector machine (SVM) based multiclass classification system by combining user and graph properties as features in a principled manner. Second, we decompose the network into a bipartite graph using the most salient user property, gender, and implement prediction systems based on collaborative filtering and singular value decomposition. We demonstrate that both approaches produce good results, with 62.2% accuracy for the feature-based classification approach and 59.2% accuracy for the bipartite approach.

1 Introduction

Since the inception of Friendster in 2002, online social networks have continued to proliferate and evolve as an increasingly popular medium of social interaction. In fact, current estimates reflect that nearly 60% of the world's population has participated in exchanges via Facebook, Twitter, Google+, and/or LinkedIn, and this number is expected to climb. According to [6], 98% of 18- to 24-year-olds have been categorized as active participants/consumers of social media.

For many of the aforementioned services and, in particular, online dating websites, an essential component of their continued success lies in the automatic recommendation of viable connections. Such is the case with Chatous, a text-based, random chat network that pairs users from over 180 countries. For Chatous, our goal is to match users such that they will interact in a high-quality manner, as measured by the length of their conversations.

In this paper, we describe our application of network analysis and machine learning (ML) techniques toward the identification of user and/or conversation characteristics that are predictive of high-quality social exchanges. This investigation is motivated by the theory that improvements at the level of individual conversations will lead to widespread increases in user retention, satisfaction, and engagement.

We show that user interactions can be accurately predicted based on a combination of the properties intrinsic to users and the users' past interactions. We model Chatous as a graph and user interactions on Chatous as graph properties, and highlight some interesting observations with regard to the Chatous graph. Subsequently, we describe two approaches to user interaction prediction. First, we illustrate a feature-based approach in which we build a multi-class support vector machine classification system by combining user properties and network properties in a principled way. Next, we show that the Chatous graph can be modeled as a bipartite graph using user gender properties, and apply collaborative filtering and singular value decomposition techniques to perform the prediction task. Finally, we evaluate our two approaches on a sample set of the Chatous network, and demonstrate that both methods achieve high prediction accuracy.

2 Background

Guo et al. (2012) describe the creation of Chatous and several exploratory methods for predicting optimal conversation partners [2]. Initially, the authors sought to assess whether the quality of a conversation was linearly related to a particular set of features derived from user characteristics. A subsequent approach involved implementing a PageRank-inspired algorithm to rank users based on features such as weighted user ratings and conversation length. For their third approach, which produced the best predictive accuracy among the three, the authors leveraged the notion of triads between a participant and his/her candidate matches to predict which of two user pairings possessed the greater conversation length.

Guha et al. (2004) present a model for predicting the trust relationship between a pair of users by applying trust propagation toward the construction of a web of trust. While explicit trust signals were propagated to neighboring nodes via four types of atomic propagations, one-step distrust was utilized to incorporate distrust signals. When evaluated on the Epinions dataset, this iterative method predicted the trust/distrust relationship on masked edges in the graph with a prediction error rate of 6.4% for the entire dataset and a 14.7% error rate on a balanced dataset.

Leskovec, Huttenlocher, and Kleinberg (2010) discuss training a logistic regression binary classifier for predicting the sign associated with links in an online social network. Evaluating their method on the Epinions, Slashdot, and Wikipedia datasets, the researchers demonstrated that correct edge sign prediction was achieved with an accuracy of 93.4%, 93.5% and 80.2% on the three datasets, respectively. It was reported that predictive accuracy was enhanced by considering both degree- and triad-related features as well as the amount of local network structural context available (as represented by the embeddedness of the edge).

Wang et al. (2011) explored a novel technique for improving the prediction accuracy of online dating recommendations that addresses the initial assignment of potential dates to a target user, t. Their algorithm is based on the assumptions that two users share similar partner preferences if both are liked by the same users, and that interactions between similar users and t's candidate matches, C, can predict t's behavior toward members of C. When evaluated, Wang et al.'s method outperformed collaborative filtering and other traditional recommendation algorithms.

While a significant amount of prior work has focused on predicting the nature of relationships among one set of users in a social network based on the relationships that exist among the remaining set (termed link sign prediction), this positive/negative classification is binary [2][3][4]. Wang et al.'s method for generating a continuous compatibility score between a pair of users addresses this limitation, yet fails to demonstrate a theoretical understanding with regard to which dataset properties yield more accurate predictions. Furthermore, there is little indication that the authors' algorithms and analyses are applicable to other datasets. Finally, the majority of prior work approaches the task of online behavior prediction using either network structural data or non-structural metadata intrinsic to the users, ignoring the potential to combine the two [3][4][8].

3 Methods

3.1 Overview

It is highly probable that online relationships, much like real-world relationships, are non-discrete. When considering the profusion of structural data (e.g. friendship, conversations, etc.) and user characteristics present in the Chatous dataset, the development of a classifier that permits more granular prediction is warranted.

Accurately predicting the length of a conversation two users will engage in is an essential prerequisite to the optimal assignment of matches, as longer conversations are theorized to reflect increased user satisfaction. Prior attempts to predict conversation length in similar networks have typically relied on simple models, minimal features, and/or the exclusive application of either user characteristics or network structural properties. Our construction of a comprehensive model permits a more thorough understanding of the Chatous network's structure and allows the informed application of both network structural properties and intrinsic user characteristics toward the derivation of an optimal algorithm for user matching.

We hypothesize that electing to assign weights to various [...]

[...] words typed by each participant, and friendship status. Chats are categorized as "Finished," "Long," or "Short" depending upon their length and termination status. The "Long" distinction further denotes that a friendship link has been established between a pair of users. (Note: General statistics appear in Table 3.1.)

Users (Nodes)                       332,888
Conversations (Edges)               9,050,713
Nodes in largest SCC                293,673 (0.88)
Edges in largest SCC                6,458,818 (0.99)
Average clustering coefficient      0.24
Number of triangles                 45,244,826
Diameter (longest shortest path)    11

Table 3.1 General dataset statistics associated with the complete graph.

3.3 Preprocessing

Initially, conversation entries were partitioned into a training set and a testing set. Our testing set was constructed by randomly sampling 20% of the conversations from the complete dataset; the remaining data was reserved for training.

We represented each participant in the user profiles table as a node, and each conversation (irrespective of length) as an edge in the graph. With the intention of assigning numerical weights to user interaction signals such as conversation length, conversation termination, user friendship, user reports, etc., a complete graph, C, and three subgraphs were created to model these relationships.

Graph       Type          Description
Complete    Undirected    Each edge represents a conversation
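As a rough illustration of the preprocessing described in Section 3.3, the following Python sketch holds out a random 20% of conversations for testing and represents users as nodes and conversations as undirected edges. It is a minimal sketch, not the authors' implementation: the function names, the toy records, the `(user_a, user_b, length)` tuple layout, the fixed random seed, and the choice to build the graph from the training conversations only are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def split_conversations(conversations, test_fraction=0.2, seed=0):
    """Randomly hold out `test_fraction` of the conversations for testing."""
    rng = random.Random(seed)          # fixed seed (assumption) for repeatability
    shuffled = list(conversations)     # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(round(test_fraction * len(shuffled)))
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)


def build_graph(user_ids, conversations):
    """Undirected graph: one node per user profile, one edge per conversation.

    graph[a][b] is the list of conversation lengths between a and b, so
    repeated conversations between the same pair are kept, not collapsed.
    """
    graph = {u: defaultdict(list) for u in user_ids}
    for a, b, length in conversations:
        graph.setdefault(a, defaultdict(list))[b].append(length)
        graph.setdefault(b, defaultdict(list))[a].append(length)
    return graph


# Hypothetical toy records: (user_a, user_b, conversation_length)
users = ["u1", "u2", "u3", "u4", "u5"]
conversations = [
    ("u1", "u2", 12), ("u1", "u3", 3), ("u2", "u3", 40), ("u3", "u4", 7),
    ("u4", "u5", 1), ("u1", "u4", 5), ("u2", "u5", 9), ("u3", "u5", 2),
    ("u1", "u5", 30), ("u2", "u4", 4),
]

train, test = split_conversations(conversations)
graph = build_graph(users, train)      # assumption: graph built from training data only
print(len(train), "training conversations,", len(test), "held out")
```

Keeping a list of lengths per user pair leaves room for attaching the numerical weights mentioned above (conversation length, termination, friendship, reports) to individual edges later on.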
