Higher Order Information Identifies Tie Strength

Higher Order Information Identifies Tie Strength Arnab Sarker1, Jean-Baptiste Seby1, Austin R. Benson2, and Ali Jadbabaie1,3 1Institute for Data, Systems, and Society, MIT 2Department of Computer Science, Cornell University 3Laboratory for Information and Decision Systems, MIT March 2021 Abstract A key question in the analysis of sociological processes on networks is to identify pairs of individuals who have strong or weak social ties. Existing approaches mathematically model the social network as a graph, and tie strength is often inferred by examining the number of shared neighbors between individuals, or equivalently, the number of triangles in the graph which contain a specific pair of individuals. However, this approach misses out on critical information because it does not distinguish the case where interactions occur among groups involving more than two individuals. In this work, we measure tie strength by explicitly accounting for these higher order interactions in the network, through the use of a new measure called Edge PageRank. We show how Edge PageRank can be interpreted as the steady-state outcome of a dynamic, message-passing social process that characterizes the strength of weak ties by appropriately discounting the effect of higher-order interactions involving three individuals. Empirically, we find that Edge PageRank outperforms standard measures in identifying tie strength in several large-scale social networks. These results provide a new perspective on tie strength and demonstrate the importance of incorporating higher-order interactions in social network analysis. 1 Introduction Decades of sociological research has focused on determining which social connections provide the most valuable information to an individual [2, 10, 21, 25, 38, 46, 52]. An important finding from this body of works is that the most novel information surprisingly comes from casual acquaintances, as opposed to close family and friends [25]. These casual relationships, referred to as weak ties, are useful in generating new ideas [10], finding a job [24], and transferring knowledge within organizations [27]. The intuition for this apparent \strength" of weak ties is subtle yet simple: whereas close family and friends, or strong ties, may have similar sources of information to an individual, acquaintances often have access to different resources altogether which could potentially expand the social reach of an agent. arXiv:2108.02091v1 [cs.SI] 4 Aug 2021 Along with allowing an individual to optimally mobilize their social contacts [34, 52], tie strength is used in various modern data-driven tasks such as predicting the formation of new ties [13, 32, 35], structuring online interactions [16], and defining the various social groups of an individual [39]. Moreover, the concept of tie strength is used throughout the social sciences, providing a valuable dimension to understand academic output [54], performance of teams [48], the formation of industry structures [57], social contagion [11], labor markets [40], political participation [56], and communication patterns [3]. Determining and quantifying the strength of the social tie between two individuals, however, remains a difficult task. Although many studies categorize social ties as either weak or strong, in reality tie strength is not a dichotomy. Individuals may have their strongest ties with close friends and family, and their weakest ties with casual acquaintances, but they may also have ties of medium strength with individuals such as coworkers who they see regularly [38]. Moreover, in many settings, the only data available consists of which connections exist between individuals, without explicitly detailing the quality of the connections [21, 38, 32]. In these cases, the social network is defined by a set of individuals, referred to as nodes, and information about social connections between pairs of individuals, which are referred to as edges or ties. 1 The most common approaches to the identification of tie strength based on network information rely on the property of triadic closure. Triadic closure reflects the following property common to many social networks: if two individuals share a friend, then it is likely they will eventually become acquainted [47, 14]. Moreover, this likelihood increases if both individuals share a strong tie with their mutual contact [25]. Hence, if two individuals A and B share a strong tie, then it should be more likely that their contacts in the network know one another. These observations of social networks define one of the most widely used network-based measure of tie strength, embeddedness, which considers A and B to have a strong tie if a large proportion of their friends are mutual friends [21, 25]. Variations of embeddedness, such as the unweighted overlap metric, have also been used to identify tie strength [38, 41]. Still, embeddedness is a coarse measure. Local bridges, for example, are a type of edge whose constituent nodes do not have any shared neighbors, but can vary significantly in tie strength based on other network structure [46]. Furthermore, while embeddedness characterizes tie strength, it does not emphasize the functional role of a particular edge, i.e., embeddedness does not characterize structural properties that make weak ties important. Other approaches identify the role of an edge within a network in order to measure tie strength. For example, the measure of betweenness, which counts the number of shortest paths through an edge, has been used as a tie strength metric [31]. In this way, betweenness characterizes the strength of a tie by quantifying the way in which the edge can facilitate efficient communication in the network. However, betweenness can be misleading because it places equal weight on all shortest paths throughout the network (Figure 1a). Another metric is dispersion, which identifies strong ties due to marriage in social networks based on the fact that marital ties often have common connections across multiple social groups [5]. Another reasonable approach to measure the strength of edges in a network would be to try and extend a traditional measure of centrality defined on nodes to define a measure on edges. Edges with high centrality could then be considered strong or weak depending on the corresponding definition from node centrality. However, such graph-based measures cannot explicitly capture the effect of higher order interactions represented by triangles. For example, using the PageRank measure on the so-called line graph (a new network where edges in the original network correspond to nodes, connected if they share a node in common from the original network) tends to only highlight edges connected to many other edges, signifying a measure of high edge degree (Figure 1b). All of these tie strength measures rely on representing the interaction in social network with a set of dyadic or pairwise edges. However, social interactions often take place among groups larger than two, inherently limiting existing approaches, and much of the information collected on social networks goes beyond dyads [8]. Rather than modeling networks with individuals as nodes and relationships as edges, we can use more general structures to explicitly encode higher order interactions that occur between groups of more than two individuals [9, 18, 17, 26, 59]. In fact, the principle of triadic closure may be extended to closure of groups: If A and B each know C, then A; B, and C are relatively more likely to engage in a group interaction [8]. However, this observation, and the additional information that comes with it, cannot be used by existing approaches when relationships are only represented in terms of graphs with edges connecting two individuals at a time. In this work, we incorporate the information from higher order interactions by modeling network information with a more general mathematical structure called a simplicial complex, as opposed to a graph. This allows us to explicitly distinguish between a group interaction among three people, and the case where only pairwise interactions exist. Such a rich data representation also allows us to use a larger set of tools and centrality measures. More specifically, we use the recently-developed Edge PageRank measure [50] which is a distinct from the aforementioned PageRank on edges via the line graph. This measure was recently introduced as a mathematical description of a random walk process, without a particular social interpreta- tion [50]. Here, we show that this notion of Edge PageRank is the outcome of a natural social process. More specifically, we define a random, local interaction process, where each state is a directed communication from one node to another along an edge, and the possible transitions between states are dictated by the presence of both edges and triangles in the network. After the random process reaches a steady state, such that each possible directed communication is assigned a probability, the measure then takes a difference between the orientations along each pair of nodes connected by an edge. In this way, an edge receives a large value if there is asymmetric communication in the steady state of the random process, indicating a strong tie. Based on how the presence of triangles affects the random process, the net result appropriately discounts 2 (a) Betweenness Centrality (b) PageRank on the Line Graph (c) Edge PageRank Figure 1: Comparison of dyad-based centrality measures (a{b) to the Edge PageRank measure (c). The top four edges of each centrality are bolded in the left plots (five in the case of betweenness centrality, due to a tie), and the values of the metrics for each edge is presented in sorted order in the right plots. (a) Betweenness centrality, which has been used as a continuous measure of tie strength [31], equally weights shortest paths across the entire network, resulting in f5; 8g considered weaker than f12; 14g. (b) The PageRank measure, a standard measure of importance in graphs, can be applied to the line graph transformationto define a measure of importance on edges [14].

Load more