The Mathematical Work of Jon Kleinberg
Gert-Martin Greuel, John E. Hopcroft, and Margaret H. Wright

The Notices solicited the following article describing the work of Jon Kleinberg, recipient of the 2006 Nevanlinna Prize. The International Mathematical Union also issued a news release, which appeared in the October 2006 issue of the Notices.

The Rolf Nevanlinna Prize is awarded by the International Mathematical Union for “outstanding contributions in mathematical aspects of information sciences”. Jon Kleinberg, the 2006 recipient of the Nevanlinna Prize, was cited by the prize committee for his “deep, creative, and insightful contributions to the mathematical theory of the global information environment”.

Our purpose is to present an overall perspective on Kleinberg’s work as well as a brief summary of three of his results:
• the “hubs and authorities” algorithm, based on structural analysis of link topology, for locating high-quality information on the World Wide Web;
• methods for discovering short chains in large social networks; and
• techniques for modeling, identifying, and analyzing bursts in data streams.

Kleinberg’s Nevanlinna citation includes two other areas in which he has made important contributions: theoretical models of community growth in social networks and the mathematical theory of clustering. Readers interested in learning more about his work on these topics should consult the list of papers given on his home page [8].

Gert-Martin Greuel is professor of mathematics at the Universität Kaiserslautern, Germany, and director of the Mathematical Research Institute Oberwolfach. His email address is [email protected].
John E. Hopcroft is IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University. His email address is [email protected].
Margaret H. Wright is professor of computer science and mathematics and chair of the computer science department at the Courant Institute of Mathematical Sciences, New York University. Her email address is [email protected].

Find the Mathematics; Then Solve the Problem

Kleinberg broadly describes his research [8] as centering “around algorithmic issues at the interface of networks and information, with emphasis on the social and information networks that underpin the Web and other on-line media”. The phenomena that Kleinberg has investigated arise from neither the physical laws of science and engineering nor the abstractions of mathematics. Furthermore, the networks motivating much of his work lack both deliberate design and central control, as distinct from (for example) telephone networks, which were built and directed to achieve specific goals. He has focused instead on networks that feature very large numbers of unregulated interactions between decentralized, human-initiated actions and structures.

A striking motif in Kleinberg’s research is his ability to discern and formulate plausible mathematical structures to describe problems that represent vague, even elusive, human goals. Some of his most brilliant work has begun by asking questions that might seem initially to have no clear answers (“What do people really want from a Web query?”, “How can individuals find short paths in a social network using only local information?”) and then coming up with mathematical insights that illuminate the important features of reality. Once he has the mathematical definitions in hand, Kleinberg goes on to create powerful solution techniques that are both mathematically elegant and successful in practice.

It is difficult to overstate the impact of Kleinberg’s work on several major real-world problems, and for this reason alone his work is well known to the broad scientific and technological community.
Notices of the AMS, Volume 54, Number 6, June/July 2007

Two additional reasons for Kleinberg’s high visibility are that his two best known results can be grasped intuitively without invoking details of the underlying mathematics and that he is a master expositor, with a much-admired ability to motivate and explain his work so that readers can follow his logic every step of the way.

Hubs and Authorities

There is no better way to appreciate Kleinberg’s signature style than reading the journal version of his most famous paper, “Authoritative sources in a hyperlinked environment” [2]. The paper opens with a discussion of an important but imprecisely defined problem: finding the “most relevant” webpages in response to a given broad query. Beyond the quandary of how to define “most relevant” in a meaningful way, Kleinberg notes that the difficulty with broad queries is the vast overabundance of possibly relevant hits, so that what is needed is an automated way to filter out the most “definitive” pages.

Starting with what seem at first to be the obvious solutions, he carefully explains the impossibility of using purely internal features of a page to rate its authority, as well as the flaw in relying on the query words themselves. The next part of the paper motivates and proposes the nonobvious concept of using a link-based model of the graph representing the Web to create an algorithm for deciding which pages are “authoritative”.

His method for finding these pages begins by constructing a small, focused subgraph $G_\sigma$ of the Web, where $\sigma$ is the query string, using link structure to identify its strong authorities. Along the way, Kleinberg explains how, in addition to finding highly authoritative pages, we would expect to find hub pages, i.e., those that contain links to many relevant authoritative pages and thereby allow unrelated pages to be discarded. Based on a natural equilibrium between hubs and authorities, a novel iterative algorithm is defined that updates numerical weights for each page until a fixed point is reached. Happily, Kleinberg also shows that a similar process and algorithm can be adapted when seeking “similar” webpages.

A feature of Kleinberg’s algorithm that brought joy to fans of linear algebra is that the desired authority-weight and hub-weight vectors form, respectively, the principal eigenvectors of $A^{T}A$ and $AA^{T}$, where $A$ is the adjacency matrix of the graph of a collection of linked pages. As well, the nonprincipal eigenvectors of these matrices can be used to extract additional densely linked collections of hubs and authorities.

Always a careful scholar, Kleinberg provides a summary of previous approaches to a variety of related problems: measuring “standing” in social networks, “impact” in scientific citations, ranking of Web pages, hypertext document retrieval, clustering of explicitly linked moderate-size structures, and spectral graph partitioning. A fascinating historical note is his discussion of the then recently published Brin–Page page-rank algorithm [1], soon thereafter to become the basis of Google.

Near the end of the paper (which was published before the rise of Google), Kleinberg surveys three user studies designed to evaluate the effectiveness of his algorithm. These reported favorable results concerning Web users’ perception of improved quality in their query results, but he cautions that such an evaluation is a challenging task because individual judgments of relevance are inherently subjective.

How Can It Be a Small World?

Stanley Milgram’s social psychology experiments in the 1960s (see, for example, [6]) reported on and popularized the idea of a “small world”: that any two individuals who are apparently far apart in a social network can find “short paths” to reach one another. To model the mathematical properties required to ensure the existence of short paths, Watts and Strogatz [7] proposed a “superposition” structure in which a relatively small number of random long-range links are added to a high-diameter network with edges at each node representing local social links. The long-range links provide the opportunity for a short chain through the entire network. Informally, a “small-world” network is an n-node graph such that almost all pairs of nodes are connected by chains whose length is a polynomial in log n, i.e., the number of links traversed to reach one node from another is likely to be exponentially smaller than the number of nodes.

In his fundamental paper “The small-world phenomenon: an algorithmic perspective” [3], Kleinberg noted that Milgram’s findings not only indicated the surprising existence of short chains, but also revealed an equally surprising result about the existence of algorithms that would enable an arbitrary person, knowing only information about the locations of his/her individual acquaintances, to construct a short communication path to a target stranger.

To begin his analysis of small-world models, Kleinberg first proved a negative result: that, using the Watts-Strogatz model, no decentralized algorithm, meaning one whose decisions are based solely on local information, can produce paths of small expected length relative to the diameter of the network.

Next, he generalized the Watts-Strogatz model into an infinite family of random network models. Starting with two parameters characterizing a node’s local and long-range contacts, the model can be simplified to a one-parameter family with an associated clustering exponent α that represents the probability of a long-range connection between two nodes as a function of their lattice distance.

When there is a uniform distribution over long-range contacts (corresponding to α = 0), Kleinberg showed that, although short paths exist with high probability, a decentralized algorithm cannot find them efficiently, since the expected time is expo-

… theory models of bursty traffic. In the most basic form of his model, the gap in time between two consecutive events (messages) is given by an exponential density function such that the expected gap between messages is 1/α for some α > 0, where α can be interpreted as the rate of message arrivals. Bursts can be added by allowing the model to include interleaved periods with lower and higher rates; these correspond to different states for which the rate depends on the state.
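The hubs-and-authorities iteration described in the Hubs and Authorities section can be illustrated with a minimal sketch. It assumes a tiny hand-made link graph and a fixed number of iterations in place of a true fixed-point test; the names `hits`, `normalize`, and `LINKS` are illustrative and do not come from Kleinberg's paper. Each pass sets a page's authority weight to the sum of the hub weights of the pages linking to it, and its hub weight to the sum of the authority weights of the pages it links to, which amounts to power iteration on $A^{T}A$ and $AA^{T}$.

```python
from math import sqrt


def normalize(weights):
    """Scale a weight vector to unit Euclidean length (all-zero input is returned as is)."""
    norm = sqrt(sum(w * w for w in weights))
    return [w / norm for w in weights] if norm > 0 else list(weights)


def hits(adjacency, iterations=50):
    """Return (hub_weights, authority_weights) for a 0/1 adjacency matrix.

    adjacency[i][j] == 1 means page i links to page j.
    Assumption: a fixed iteration count stands in for a convergence check.
    """
    n = len(adjacency)
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum the hub weights of pages that link to j.
        auths = normalize(
            [sum(adjacency[i][j] * hubs[i] for i in range(n)) for j in range(n)]
        )
        # Hub update: sum the authority weights of pages that i links to.
        hubs = normalize(
            [sum(adjacency[i][j] * auths[j] for j in range(n)) for i in range(n)]
        )
    return hubs, auths


# Hypothetical toy graph: pages 0 and 1 link to pages 3 and 4; page 2 links to page 3 only.
LINKS = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
hubs, auths = hits(LINKS)
```

In this toy graph, page 3 ends up with the largest authority weight because it has the most in-links from good hubs, and pages 0 and 1 are stronger hubs than page 2 because they point to both authorities.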
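The one-parameter small-world family can be explored with a short simulation. The sketch below rests on several assumptions that go beyond the text above: an n × n grid with Manhattan lattice distance, one long-range contact per node chosen with probability proportional to d^(-α) (so that α = 0 gives the uniform case), and a greedy decentralized rule that always forwards to the known contact closest to the target. The function names are hypothetical.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid points."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def sample_long_range_contact(u, n, alpha, rng):
    """Pick one long-range contact for u with probability proportional to d**(-alpha)."""
    others = [(x, y) for x in range(n) for y in range(n) if (x, y) != u]
    weights = [lattice_distance(u, v) ** (-alpha) for v in others]
    return rng.choices(others, weights=weights, k=1)[0]


def greedy_route(source, target, n, alpha, seed=0):
    """Count the steps greedy routing takes from source to target on an n x n grid.

    Each node knows only its lattice neighbours and its own long-range contact.
    The contact is sampled when the node is first reached; since greedy routing
    never revisits a node, this matches sampling all contacts in advance.
    """
    rng = random.Random(seed)
    current, steps = source, 0
    while current != target:
        x, y = current
        candidates = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        candidates.append(sample_long_range_contact(current, n, alpha, rng))
        # Decentralized decision: use only local knowledge of contact locations.
        current = min(candidates, key=lambda v: lattice_distance(v, target))
        steps += 1
    return steps


steps_uniform = greedy_route((0, 0), (19, 19), n=20, alpha=0, seed=1)
steps_clustered = greedy_route((0, 0), (19, 19), n=20, alpha=2.0, seed=1)
```

Because some lattice neighbour is always one step closer to the target, the step count never exceeds the lattice distance between source and target. Averaging over many seeds and larger grids would be needed to compare different values of α meaningfully.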
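The basic burst model, with exponentially distributed gaps whose expected length is 1/α, can be sketched as a small generator. This is only the generative side, under stated assumptions: the schedule of low-rate and high-rate periods is fixed by hand, and the rates and function names are illustrative. It does not attempt to infer the hidden states from observed data.

```python
import random


def exponential_gaps(rate, count, rng):
    """Draw `count` inter-arrival gaps with density rate * exp(-rate * x); mean gap is 1 / rate."""
    return [rng.expovariate(rate) for _ in range(count)]


def two_state_gaps(low_rate, high_rate, schedule, rng):
    """Concatenate gaps from interleaved low-rate and high-rate periods.

    schedule is a list of (state, count) pairs with state 'low' or 'high'.
    Assumption: the state sequence is given rather than generated or inferred.
    """
    rates = {"low": low_rate, "high": high_rate}
    gaps = []
    for state, count in schedule:
        gaps.extend(exponential_gaps(rates[state], count, rng))
    return gaps


rng = random.Random(0)

# Single-state model: with rate 2.0 the expected gap is 1 / 2.0 = 0.5.
base_gaps = exponential_gaps(2.0, 20000, rng)
mean_gap = sum(base_gaps) / len(base_gaps)

# Two-state model: a quiet period, a burst, then another quiet period.
bursty = two_state_gaps(1.0, 10.0, [("low", 2000), ("high", 2000), ("low", 2000)], rng)
mean_quiet = sum(bursty[:2000]) / 2000
mean_burst = sum(bursty[2000:4000]) / 2000
```

The sample mean of the single-state gaps should sit near 1/α, and the gaps drawn during the high-rate period are markedly shorter on average, which is what makes a burst visible in the stream.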