Navigating Networks by Using Homophily and Degree
Total Page:16
File Type:pdf, Size:1020Kb
Navigating networks by using homophily and degree O¨ zgu¨r S¸ ims¸ek* and David Jensen Department of Computer Science, University of Massachusetts, Amherst, MA 01003 Edited by Peter J. Bickel, University of California, Berkeley, CA, and approved July 11, 2008 (received for review January 16, 2008) Many large distributed systems can be characterized as networks We show that, across a wide range of synthetic and real-world where short paths exist between nearly every pair of nodes. These networks, EVN performs as well as or better than the best include social, biological, communication, and distribution net- previous algorithms. More importantly, in the majority of cases works, which often display power-law or small-world structure. A where previous algorithms do not perform well, EVN synthesizes central challenge of distributed systems is directing messages to whatever homophily and degree information can be exploited to specific nodes through a sequence of decisions made by individual identify much shorter paths than any previous method. nodes without global knowledge of the network. We present a These results have implications for understanding the func- probabilistic analysis of this navigation problem that produces a tioning of current and prospective distributed systems that route surprisingly simple and effective method for directing messages. messages by using local information. These systems include This method requires calculating only the product of the two social networks routing messages, referral systems for finding measures widely used to summarize all local information. It out- informed experts, and also technological systems for routing performs prior approaches reported in the literature by a large messages on the Internet, ad hoc wireless networking, and margin, and it provides a formal model that may describe how peer-to-peer file sharing. humans make decisions in sociological studies intended to explore the social network as well as how they make decisions in more Formulation naturalistic settings. We formulate the navigation problem as a probabilistic decision- making task in which the objective is to minimize the length of complex networks ͉ search algorithms ͉ social network analysis the path traveled by the message. We assume that each node knows about its immediate neighbors—including their identity, uch of the current interest in small-world networks has degree, and attributes—but is unaware of the rest of the Mdrawn inspiration from the now classic work of Travers network. At the source node, and subsequently at each node and Milgram (1). Their 1969 study asked individuals to deliver along the path, the optimal decision rule (given the limited information) is to forward the message to the neighbor from letters to designated persons by passing them through chains of which the message will reach the target in the smallest number first-name acquaintances. That study, as well as more recent of hops, assuming that all future nodes will make their decision work (2), shows that surprisingly short paths exist between pairs similarly by using local information. Although a recurrence of nodes in real-world social networks. relation may, in principle, govern the optimal decision rule, it is However, the existence of such paths is only one of the not apparent how this can be formulated in a way that would interesting findings of the study. Kleinberg (3) notes that the suggest an effective navigation method. study contains a second finding of fundamental algorithmic Instead, we suggest that an effective (although not necessarily importance: People are able to find short paths even though they optimal) decision would be to forward the message to the know very little about the target individual or the network. Both neighbor with the minimum expected distance to the target, Kleinberg and a variety of subsequent researchers have ad- where distance from node s to node t is the length of the shortest dressed the question of how such network navigation is possible. path from s to t. We can express this quantity, the expected value That work identifies two network characteristics that can guide of the distance dst from neighbor s to target t, as a weighted sum navigation. The first is homophily—the tendency of attributes of of all possible distances: connected nodes to be correlated. People tend to be acquainted with other people who live in the same geographical area or who ͑ ͒ ϭ ⅐ ͑ ϭ ͒ E dst i P dst i . [1] have the same occupation. The second is the existence of iՆ1 high-degree nodes. Some people have a large number of ac- quaintances and act as hubs that connect different social circles. Computing this expectation is daunting but, fortunately, not Both of these characteristics are widely observed in real-world necessary. Effective decision making requires only identifying networks, and both lead directly to navigation algorithms. Con- the neighbor that minimizes the expectation. To perform this sideration of homophily gives rise to a navigation algorithm that comparison, we propose to use only the first term in the series, passes messages to the neighbor that is the most similar to the which calculates the probability of a distance of one. The relative target node (e.g., an acquaintance who lives in Boston, if the values of this first term may be an accurate estimator of the target person lives in Boston) (3–6), whereas consideration of relative values of the entire expectation, because the terms in the degree gives rise to an algorithm that favors the neighbor with series are correlated, and this correlation increases with increas- the highest degree (7). ing homophily. For example, given a relatively high probability In contrast to previous work, we cast the navigation problem of a distance of one, we expect a relatively high probability of a as a task of decision making under uncertainty, in which the distance of two. In general, the greater the first term, the lower desired decision is to forward the message to the neighbor with the minimum expected distance to the target. We show how the desired decision may be approximated by using a single criteri- Author contributions: O¨ .S¸. and D.J. designed research; O¨ .S¸. and D.J. performed research; ¨ ¨ on—the probability that the neighbor links directly to the O.S¸. analyzed data; and O.S¸. and D.J. wrote the paper. target—which may be estimated by a simple product of the The authors declare no conflict of interest. homophily and degree information. Our formulation directly This article is a PNAS Direct Submission. implies an algorithm that we call expected-value navigation Freely available online through the PNAS open access option. (EVN). Earlier algorithms are special cases of EVN when only *To whom correspondence should be addressed. E-mail: [email protected]. homophily or degree information is present. © 2008 by The National Academy of Sciences of the USA 12758–12762 ͉ PNAS ͉ September 2, 2008 ͉ vol. 105 ͉ no. 35 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0800497105 Downloaded by guest on October 2, 2021 the whole expectation, so the resulting decision rule is to select 120 the neighbor with the highest probability of linking directly to the A Random target. This probability is influenced by both homophily and 100 degree. The more similar the neighbor is to the target and the Degree 80 higher the degree of the neighbor, the greater the probability that the neighbor links directly to the target. In directed net- 60 works, we express this relationship by making the simplifying 40 Similarity assumption that the links originating at s were created indepen- Median path length 20 dently, with the same probability (qst) of terminating at t. When EVN the network exhibits homophily, qst is a function of the similarity Global 0 between nodes s and t. The number of links from s to t (nst) then 0 1 2 3 Homophily parameter follows a binomial distribution with probability of success qst and ks trials, where ks is the out-degree of s. Using the Poisson approximation to the binomial, we can compute the first term of B 120 Eq. 1: 100 ͑ ϭ ͒ ϭ ͑ Ն ͒ 80 P dst 1 P nst 1 ϭ Ϫ ͑ ϭ ͒ 60 1 P nst 0 EVN 40 ks ϭ Ϫ ͑ ͒ ϭ Ϫ ͑ Ϫ ͒ Median path length 1 binomial 0; ks, qst 1 1 qst [2] 20 Global Ϸ 1 Ϫ Poisson͑0; k q ͒ ϭ 1 Ϫ eksqst. [3] s st 0 0 1 2 3 In undirected networks, we obtain the same result, similarly Homophily parameter assuming that the links were placed independently. In this case, Fig. 1. Median path lengths on power-law networks with degree parameter we use qst to denote the probability that, given that a link borders 1.5 (A), on Poisson networks with ϭ 3.5 (B). The graphs show performance node s, it connects node s to node t. on 5,000 randomly selected search tasks on 30 randomly generated 1,000- Because only a relative ordering is necessary for effective node networks. ‘‘Global’’ indicates the median path lengths obtained with navigation, the resulting decision rule is remarkably simple: complete knowledge of the network. Navigation methods not shown in B Select the neighbor that maximizes the product of a degree term resulted in median path lengths higher than the values shown on the graph. (ks) and a homophily term (qst). If the network shows no homophily, this is equivalent to selecting the neighbor with the highest degree. On the other hand, if all nodes have equal large because two attribute values may be arbitrarily close. In degree, it is equivalent to selecting the neighbor most similar to applying similarity-based navigation, we considered all neigh- the target.