Suggesting Friends Using the Implicit Social Graph

Suggesting (More) Friends Using the Implicit Social Graph∗ Maayan Roth [email protected] Tzvika Barenholz [email protected] Assaf Ben-David [email protected] David Deutscher [email protected] Guy Flysher [email protected] Avinatan Hassidim AV I NATA N @GOOGLE.COM Ilan Horn [email protected] Ari Leichtberg [email protected] Naty Leiser [email protected] Yossi Matias [email protected] Ron Merom [email protected] Google Inc, Israel R&D Center Abstract 1. Introduction Although users of online communication tools One benefit of many online communication channels over rarely categorize their contacts into groups such offline methods is that they enable communication among as ”family”, ”co-workers”, or ”jogging buddies”, groups of people, rather than restricting communication they nonetheless implicitly cluster contacts, by to be peer-to-peer. Email is just one format that supports virtue of their interactions with them, forming group conversations, but there are many others, such as implicit groups. In this paper, we describe the photo- and link-sharing, and collaborative document edit- implicit social graph which is formed by users’ ing. In fact, group communication is so prevalent that our interactions with contacts and groups of con- analysis of the Google Mail email network shows that over tacts, and which is distinct from explicit social 10% of emails are sent to more than one recipient, and over graphs in which users explicitly add other in- 4% of emails are sent to 5 or more recipients. Within en- dividuals as their ”friends”. We introduce an terprise domains, group communication is even more criti- interaction-based metric for estimating a user’s cal. An analysis of the email network of Google employees affinity to his contacts and groups. We then de- showed that over 40% of emails are sent to more than one scribe a novel friend suggestion algorithm that recipient, and nearly 10% are sent to 5 or more recipients. uses a user’s implicit social graph to generate a As opposed to broadcast-style media, such as blogs1 and friend group, given a small seed set of contacts 2 which the user has already labeled as friends. We micro-blogging platforms like Twitter , the information show experimental results that demonstrate the communicated by an individual to a limited group is gen- importance of both implicit group relationships erally carefully targeted, and may be private. The recip- and interaction-based affinity ranking in suggest- ient lists for small-group communications such as emails ing friends. Finally, we discuss two applications are selectively constructed by the message senders. We of the Friend Suggest algorithm that have been have observed that users tend to communicate repeatedly released as Gmail Labs features. with the same groups of contacts. This observation has prompted many online communication platforms to provide their users with tools for creating and saving groups ∗ This is an updated version of Roth et al. ”Suggest- ing Friends Using the Implicit Social Graph” that appeared in of contacts. Some examples are the Google Mail Contact 3 4 SIGKDD, 2010. Please see that paper for all references. Manager , or custom friends lists on Facebook . 1e.g. www.blogger.com, www.wordpress.com Appearing in Proceedings of the 28 th International Conference 2www.twitter.com on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 3 by the author(s)/owner(s). http://mail.google.com/support/bin/answer.py? hl=en&answer=30970 4www.facebook.com/help/#/help.php?page=768 Suggesting Friends Despite the prevalence of group communication, users do Implicit Social Graph not often take the time to create and maintain custom contact groups. One survey of mobile phone users in Europe The Google Mail implicit social graph is composed of bil- showed that only 16% of users have created custom contact lions of distinct nodes, where each node is an email ad- groups on their mobile phones (Kuhn & Wirz, 2009). In dress. Edges are formed by the sending and receiving of our user studies, users explain that group-creation is time email messages. For the purpose of our work, we consider consuming and tedious. Additionally, groups change dy- a message sent from a user to a group of several contacts namically, with new individuals being added to multi-party as forming a single edge, thereby constructing a directed communication threads and others being removed. Static, hypergraph. We call the hypergraph composed of all of the custom-created groups can quickly become stale, and lose edges leading into or out of a single user node that user’s their utility. egocentric network. We call each hyperedge an implicit group, even though it may consist of a single contact. On In this paper, we present a friend-suggestion algorithm that average, a typical 7-day active user has 350 implicit groups assists users in the creation of custom contact groups, either in his egocentric network, with groups containing an aver- implicit or explicit. This algorithm is based on analysis of age of 6 contacts. Note that this does not imply that the av- the implicit social graph, which is the social network that erage user has thousands of distinct contacts. Rather, each is defined by interactions between users and their contacts implicit group is a unique combination of one or more con- and groups of contacts. We differentiate the implicit social tacts with whom the user has interacted. graph from explicit social graphs that are formed by users explicitly adding other individuals as ”Friends”. The im- Edges in the implicit social graph have both direction and plicit social graph is a weighted graph, where edge weights weight. The direction of an edge is determined by whether are determined by the frequency, recency, and direction of it was formed by an outgoing email sent by the user, or an interactions between users and their contacts and groups. incoming email received by the user. There may be both Our measure of tie strength differs from previous work in outgoing and incoming edges joining a user and an implicit that we consider group interactions, as well as peer-to-peer. group, if the user has both sent and received email from the group. We consider a user to have received mail from We use the implicit social graph to identify clusters of con- a group by joining the sender of the mail and the other co- tacts who form groups that are meaningful and useful to recipients into an implicit group. Thus, if a contact c1 sent each user. Unlike some previous research on contact clus- mail to the user u and contacts c2 and c3, this is represented tering, we do not consider the content of interactions. Ad- in u’s egocentric network as an incoming edge from the ditionally, because the email network that we have studied group c1,c2,c3 to u. is private, we do not consider any friend-of-friend ties, ei- { } ther when computing edge weights for the graph, or when The weight of an edge is determined by the recency and computing contact clusters. frequency of email interactions between the user and the group. In Section 3.1, we propose one metric for com- Given a user’s social network with weighted edges and an puting edge weight, which we call Interactions Rank.We initial seed of a few contacts, our friend-suggest algorithm claim that edge weight is an important indicator of the builds a custom contact group that accurately expands the strength of the relationship between the user and a par- seed. We evaluate the efficacy of our algorithm by com- ticular group. In the remainder of this paper, we use the paring to baseline approaches via precision-recall measure- terms edge weight, group weight, and group importance ments. We show two applications of this algorithm, imple- interchangeably. mented as Gmail Labs features, called ”Don’t forget Bob!” and ”Got the wrong Bob?” Although our discussion cen- In our work, we draw a sharp distinction between each ters around an email network, the network analysis that we user’s egocentric network and the global or sociocentric have done is applicable to any social graph that is formed network that is formed by combining the networks of all by interactions between users and their contacts. users. Although other researchers have found value in clus- tering contact groups by looking at friend-of-friend edges This paper is an extended and revised version of a paper (e.g. (Kahanda & Neville, 2009)), we restrict our algorithm that appeared in KDD 2010. This paper provides additional to look only at a single user’s egocentric network during experimental results about the performance of our seed ex- friend suggestion. By showing users suggestions based pansion algorithm. We also provide preliminary empirical only on their local data, we are able to protect user pri- results about the performance of the ”Don’t Forget Bob!” vacy and avoid exposing connections between the user’s and ”Got the Wrong Bob?” Gmail features, along with ad- contacts that might not otherwise have been known to him. ditional discussion. The social graph studied in this paper is constructed using the metadata (i.e. timestamp, sender, and recipients) of out- 2. Characteristics of the Email going and incoming messages set or received via Google Suggesting Friends Mail; message content is not included or examined. For Regarding recency, we observe that a group with which the the purposes of this research, we used a random sample user is actively interacting now is more important than one of the metadata from thousands of interactions, and data with which the user last interacted a year ago.

Load more