Information-Theoretic Measures of Influence Based on Content Dynamics

Greg Ver Steeg
Information Sciences Institute
University of Southern California
Marina del Rey, California
[email protected]

Aram Galstyan
Information Sciences Institute
University of Southern California
Marina del Rey, California
[email protected]

arXiv:1208.4475v4 [cs.SI] 15 Feb 2013

ABSTRACT
The fundamental building block of social influence is for one person to elicit a response in another. Researchers measuring a "response" in social media typically depend either on detailed models of human behavior or on platform-specific cues such as re-tweets, hash tags, URLs, or mentions. Most content on social networks is difficult to model because the modes and motivation of human expression are diverse and incompletely understood. We introduce content transfer, an information-theoretic measure with a predictive interpretation that directly quantifies the strength of the effect of one user's content on another's in a model-free way. Estimating this measure is made possible by combining recent advances in non-parametric entropy estimation with increasingly sophisticated tools for content representation. We demonstrate on Twitter data collected for thousands of users that content transfer is able to capture non-trivial, predictive relationships even for pairs of users not linked in the follower or mention graph. We suggest that this measure makes large quantities of previously under-utilized social media content accessible to rigorous statistical causal analysis.

Categories and Subject Descriptors
H.1.1 [Systems and Information Theory]: Information Theory; H.3.4 [Systems and Software]: Information networks; J.4 [Social and Behavioral Sciences]: Sociology

Keywords
entropy, link prediction, causality, social networks

1. INTRODUCTION
While the emergence of various online social networking platforms provides a steady source of data for researchers, it also provides a source of constantly evolving complexity. Most prior research has focused on analyzing various static topological properties of networks induced by social communication, while discarding the content of communication. At the same time, there is a growing recognition that a more nuanced understanding of social interactions requires analyzing the semantic content of communications. For instance, it has been suggested that linguistic cues in communicative patterns, as well as the ways individuals echo and accommodate each other's linguistic styles, can be indicative of the relative social status of participants [6]. Despite recent progress, however, content-based analysis of social interactions is still a challenging problem due to the lack of adequate quantitative methods for extracting useful signals from unstructured text. Another significant hurdle is that the design and usage of social networks, and thus the interpretation of various signals, are changing over time.

Here we suggest a novel, information-theoretic approach for leveraging user-generated content to characterize interactions among social media participants. Specifically, given all the content generated by a set of users (e.g., a sequence of tweets), our goal is to find meaningful edges that indicate social interactions among this set of users. Our approach is model-free in the sense that it does not presuppose a particular behavioral model of users and their interactions. Instead, we view users as producers of some arbitrarily encoded information stream. If Y's stream has an effect on X's, then access to Y's signal can, in principle, improve our prediction of X's future activity. This is what we mean by a predictive link. We show that this general notion of predictability can be used to identify social influence.

The technical approach proposed here consists of two main ingredients (see Fig. 1). First, we represent user-generated content in a high-dimensional space so that a sequence of user-generated posts is mapped to a time series in this space. Second, we apply information-theoretic measures to those time series to discover and quantify directed influence among the users. Because our method is based on information-theoretic principles, it is easy to interpret, applicable to arbitrary signals and/or platforms, and flexible with respect to the representation of content.

Our approach ultimately reduces to calculating an information-theoretic measure called transfer entropy between pairs of stochastic processes [32]. Intuitively, transfer entropy between processes X and Y quantifies how much better we are able to predict the target process X if we use the histories of both Y and X rather than the history of X alone. By using transfer entropy as a statistical measure of the relationship between the content of Y's tweets and the content of X's subsequent tweets, we construct a graph of predictive links, based only on the content of users' tweets. Our results demonstrate that transfer entropy indeed reveals a variety of predictive, causal behaviors. Surprisingly, we also discover that many of the most predictive links are not present in the social network, either through mentions or friend links. Nevertheless, in Sec. 4.4, we verify the meaningfulness of our measure by showing that predictive links are a statistically significant predictor of mentions on Twitter.

To summarize, our main contribution is a novel application of an information-theoretic framework to content-based social network analysis, providing a general, flexible measure of meaningful relationships in the network. This construction is made possible by two apparently novel technical insights. (1) Current state-of-the-art methods for estimating entropic measures such as mutual information continue to perform well in high-dimensional spaces as long as they are effectively low-dimensional in some sense. (2) While content representations of user activity are high-dimensional, they are effectively low-dimensional in the required sense. Taken together, these two points allow us to successfully apply entropic estimators in a previously inaccessible regime.

After motivating our technical approach and defining the relevant information-theoretic quantities in Section 2, we describe how to estimate those quantities in Sec. 3, and demonstrate their use on real-world data from Twitter in Sec. 4. Finally, we give an overview of related work in Sec. 5, followed by a discussion of results in Sec. 6.

[Figure 1: timelines of past tweets for users Y and X, with each tweet represented as a probability vector over content features, alongside X's future tweet and the quantity IT_{Y→X} = Ĥ(X^F : Y^P | X^P).]
Figure 1: Is the content of X's future tweet, X^F, predictable from past tweets, Y^P, X^P? We answer this question by first transforming the text of tweets into vectors. Joint samples of these variables can be used to estimate information transfer, or transfer entropy, quantifying how predictive Y's tweets are for X's future tweets.

2. TECHNICAL APPROACH
2.1 Motivation
Let us consider a set of users that generate a time-stamped sequence of tweets. Our object of interest is the next tweet generated by X, denoted by X^F; see Fig. 1. Generally speaking, X^F is a random variable that can depend on a large number of factors that might not be directly observable: topical interests of user X (and her friends), exogenous events, and so on. Here, however, we are interested in the extent to which X^F is influenced by the past tweets X^P and Y^P. Namely, we would like to see how much knowing the past content generated by user Y, Y^P, helps us to better predict X^F. If knowing Y's past tweets helps us to predict X^F more accurately, then we can say that Y exerts a certain influence on X.

The notion of influence (or causality) described above is taken in the sense of Granger causality [10], which demands that (1) the cause occurs before the effect; (2) the cause contains information about the effect that is unique, and is in no other variable [12]. In practice, determining that information is "in no other variable" is difficult. For determining a causal effect on a user in a social network, we only attempt to rule out the user's recent past as an explanation. Exogenous and long-term effects are difficult to account for but will be discussed in some interesting cases. The principle behind Granger causality was originally applied in the context of regression models, but applying these ideas in the context of information theory leads to effective tests of causality [12].

2.2 Transfer Entropy
We denote by H(X) the entropy of a random variable, X, with some associated probability distribution, p(x) ≡ Pr(X = x), for x ∈ R^{d_x}. In this case, (differential) entropy is defined in the standard way, using the natural log,

    H(X) = E[−log p(x)] = −∫ dx p(x) log p(x).

We sometimes speak of entropy as quantifying our "uncertainty" about X. Standard higher-order entropies such as mutual information and conditional entropy can be defined in terms of differential entropy as H(X : Y) = H(X) + H(Y) − H(X, Y) and H(X|Y) = H(X, Y) − H(Y), respectively. Mutual information can be interpreted as the reduction of uncertainty in X from knowing Y.

Transfer entropy, or information transfer [32], can be defined as

    IT_{Y→X} = H(X^F : Y^P | X^P)                              (1)
             = H(X^F | X^P) − H(X^F | Y^P, X^P),

where X^F is interpreted as information about user X's future behavior, and X^P, Y^P as user X and Y's past behavior, respectively. The temporal indices dictate that cause should come before effect, and conditioning on X's past ensures that any explanatory value from Y is not already present in X's past behavior.