A Social Identity Approach to Identify Familiar Strangers in a Social Network
Nitin Agarwal, Huan Liu, Sudheendra Murthy, Arunabha Sen, and Xufei Wang
School of Computing and Informatics
Arizona State University
{Nitin.Agarwal.2, Huan.Liu, sudhi, asen, [email protected]

Copyright © 2009, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

We present a novel problem of searching for 'familiar strangers' in a social network. Familiar strangers are individuals who are not directly connected but exhibit some similarity. The power-law nature of social networks means that the majority of individuals are directly connected with only a small number of fellow individuals, so similar individuals can be largely unknown to each other. Moreover, the individuals of a social network have only a local view of the network, which makes the problem of aggregating these familiar strangers a challenge. In this work, we formulate the problem, show why it is significant to address the challenge, and present an approach that innovatively employs the social identities of the individuals with competitive approaches. The blogger and citation networks are used to showcase technical details and empirical results, together with related issues and future work.

Introduction

Familiar strangers, as defined by Stanley Milgram (Milgram 1972) in the physical world, are those individuals who do not know each other but share some common attributes such as interests, occupation, location, etc. For instance, people taking the same train daily find familiar faces but do not know each other. Analogous to the physical world, it is equally interesting and challenging to define and study the existence of familiar strangers in the virtual or online world. Social networks represent a complex set of human relations through interactions expressed via a spectrum of social media websites such as blogs, online friendship networks, wikis, media sharing websites, and social tagging websites. In an online world, familiar strangers could be defined as those individuals who are not friends with each other, i.e., they are not in each other's social network, but who share some common set of attributes such as hobbies, community affiliations, workplace, location, etc. A more formal definition is given later.

Identifying familiar strangers has profound applications in online social networks. Online social networks have been shown to have a long-tail distribution, i.e., most of the members have very few contacts and very few members have a large number of contacts, which means that most of these members do not know each other. Many of them could have a lot in common, but due to the long-tail distribution it is quite likely that they do not know each other. Aggregating such familiar strangers could form a critical mass such that (1) the understanding of one member gives us a sensible and representative glimpse of the others, (2) more data about familiar members can be collected for better customization and services (e.g., personalization and recommendation), (3) the nuances among them suggest new business opportunities, and (4) knowledge about them can facilitate predictive modeling and trend analysis in new product/market development. Connecting them to form a critical mass could only expand their social networks, e.g., for job searching or special interest group formation. Aggregating familiar strangers can encourage participation due to the crowd effect (Kumar et al. 2004). People usually trust those with similar interests. Knowledge transfer or information flow among friends and acquaintances becomes smoother and more receptive.

Identifying familiar strangers in online social networks is interesting and involves several key challenges. Individuals have only a local view, i.e., individuals know their contacts but may not know their contacts' contacts and so on. Searching through all the contacts of a node, the contacts' contacts and so on, to identify familiar strangers incurs an exponential cost. Each individual is associated with some content or attributes. The challenge lies in intelligently putting that information to use in searching for familiar strangers. Evaluation and validation of the proposed approaches is a major issue due to the absence of an established ground truth.

Problem Formulation

Here we define familiar strangers and formulate the problem of searching for them using local information. We are given a social network $G$, where $V$ is the set of vertices (nodes), i.e., the members of the social network. The nodes are associated with an attribute. The attribute can take one or more values from a domain $D = \{a_1, a_2, \ldots, a_l\}$. We call the values taken by a node its attribute-value set, denoted by $A_u$ for a node $u$ ($u \in V$). Each node $u$ has a local view of the network (also known as an egocentric view (Wasserman and Faust 1994)), which means that the node only knows its adjacent nodes, denoted by $C_u = \{m_1, m_2, \ldots, m_y \mid edge(u, m_p) \neq 0,\ 1 \le p \le y\}$, also known as $u$'s contacts. Here $edge(c, d) \neq 0$ denotes an edge between nodes $c$ and $d$. This is similar to a scenario where one knows his/her friends but doesn't know his/her friends' friends and so on. In order to define familiar strangers of $u$, it is essential to define the notion of similarity.

Definition 1 (Similarity) Nodes $u$ and $v$ are similar iff $A_v \cap \gamma \neq \emptyset$, where $\gamma$ is a goal described as $\gamma \subseteq A_u$.

Definition 2 (Familiar Strangers) Given $u$ and $\gamma$, $T_u$ is the set of familiar strangers of $u$ iff (1) for all the nodes $v \in T_u$, $edge(u, v) = 0$, i.e., all the nodes $v$ are non-adjacent to $u$ (stranger)$^1$, and (2) all the nodes $v$ are similar to $u$ with respect to $\gamma$ as defined above (familiar).

[Figure 1: Searching familiar strangers for a node $u$ given the local network information that $u$ has and the goal $\gamma$. Figure labels: A: Interest; D: {Academic, Arts, Blogging, Business, Computers, Exercise, History, Internet, Music, News, Personal, Political, Recreation, Technology, Travel}; u: Seed Blogger; $C_u$: {v1, v2, v3, v4}; $A_u$: {Exercise, History, Recreation}; $A_{v_1}$: {Internet, News}; $A_{v_2}$: {Blogging, Internet}; $A_{v_3}$: {Blogging, Internet, Technology}; $A_{v_4}$: {Recreation, Travel}; local knowledge that $u$ has of the network; find $T_u$: familiar strangers of $u$ given goal $\gamma$: Sports = {Exercise, Recreation}.]

The problem of searching for familiar strangers given a node $u$ is illustrated in Figure 1, where a blogger social network is presented on the left, a snippet of which is presented in the middle. Here the attribute $A$ is "Interest" and $D$ is the domain for the values of "Interest". $C_u$ represents the contacts of $u$ and $A_u$ represents the attribute-value set of $u$. $A_{v_1}, A_{v_2}, A_{v_3}, A_{v_4}$ represent the attribute-value sets of $v_1$, $v_2$, $v_3$, and $v_4$ respectively. We need to find $T_u$, the familiar strangers of $u$ for the goal $\gamma$ ("Sports") defined by the combination of "Exercise" and "Recreation".

The challenge lies in searching for familiar strangers efficiently, i.e., in the minimum number of edge traversals with local information. To compute the lower bound on the search space for finding the familiar strangers, consider the centralized version of the problem, in which the node $u$ has a global or whole view of the network and the objective is to find the smallest set of edges that will connect all the nodes in $T_u$ starting at node $u$. This centralized version of the familiar strangers problem corresponds to the Steiner tree problem. Given a subset of nodes $V' \subset V$ in a graph $G = (V, E)$, the Steiner tree ($T$) spans the node set $V'$ with the least number of edges. The node set $V'$ is referred to as the required nodes

[...] Linear Programming (ILP) formulation to solve the Steiner tree problem optimally. Given the undirected social network graph $G = (V, E)$, we first construct the corresponding directed graph $H = (V, F)$, in which two directed edges $(v_i, v_j), (v_j, v_i) \in F$ are created for each undirected edge $(v_i, v_j) \in E$. Let the number of required nodes be denoted by $n$, i.e., $|V'| = n$, and let an arbitrary vertex, say the node $u \in V'$, be designated as the root node. The ILP views the directed graph $H$ as a flow graph, in which $(n - 1)$ units of flow are routed from the root $u$ towards the nodes in $V' \setminus \{u\}$ through the minimum number of edges. Each node in $V' \setminus \{u\}$ consumes exactly one unit of flow. The edges of graph $H$ through which a positive (unit) flow exists form the minimum-edge arborescence$^2$ in $H$ spanning the vertices $V'$. The undirected edges in graph $G$ corresponding to the arborescence edges form the required Steiner tree in $G$.

Let indicator variables $x_{v_i v_j} = 1$ if edge $(v_i, v_j)$ belongs to the required minimum-edge arborescence $T$ in $H$; otherwise, $x_{v_i v_j} = 0$. Let variables $f_{v_i v_j} \ge 0$ represent non-negative flow on the edges. The variables $x_{v_i v_j}$ and $f_{v_i v_j}$ are defined for all edges $(v_i, v_j) \in F$. The objective is to minimize the number of edges in the arborescence in $H$,

$$\text{Minimize} \sum_{(v_i, v_j) \in F} x_{v_i v_j}$$

• There are exactly $(n - 1)$ units of flow emanating out of the root node $u$ and 0 units of flow going into it. That is,

$$\sum_{(u, v_j) \in F} f_{u v_j} = n - 1, \qquad \sum_{(v_j, u) \in F} f_{v_j u} = 0$$

• Every other required node, i.e., $v_i \in V' \setminus \{u\}$, consumes 1 unit of flow.