The Mathematical Work of Jon Kleinberg

Gert-Martin Greuel, John E. Hopcroft, and Margaret H. Wright

The Notices solicited the following article describing the work of Jon Kleinberg, recipient of the 2006 Nevanlinna Prize. The International Mathematical Union also issued a news release, which appeared in the October 2006 issue of the Notices.

The Rolf Nevanlinna Prize is awarded by the International Mathematical Union for “outstanding contributions in mathematical aspects of information sciences”. Jon Kleinberg, the 2006 recipient of the Nevanlinna Prize, was cited by the prize committee for his “deep, creative, and insightful contributions to the mathematical theory of the global information environment”.

Our purpose is to present an overall perspective on Kleinberg’s work as well as a brief summary of three of his results:
• the “hubs and authorities” algorithm, based on structural analysis of link topology, for locating high-quality information on the World Wide Web;
• methods for discovering short chains in large social networks; and
• techniques for modeling, identifying, and analyzing bursts in data streams.
Kleinberg’s Nevanlinna citation includes two other areas in which he has made important contributions: theoretical models of community growth in social networks and the mathematical theory of clustering. Readers interested in learning more about his work on these topics should consult the list of papers given on his home page [8].

Gert-Martin Greuel is professor of mathematics at the Universität Kaiserslautern, Germany, and director of the Mathematical Research Institute Oberwolfach. His email address is [email protected].

John E. Hopcroft is IBM Professor of Engineering and Applied Mathematics in Computer Science at Cornell University. His email address is [email protected].

Margaret H. Wright is professor of computer science and mathematics and chair of the computer science department at the Courant Institute of Mathematical Sciences, New York University. Her email address is [email protected].

Find the Mathematics; Then Solve the Problem

Kleinberg broadly describes his research [8] as centering “around algorithmic issues at the interface of networks and information, with emphasis on the social and information networks that underpin the Web and other on-line media”. The phenomena that Kleinberg has investigated arise from neither the physical laws of science and engineering nor the abstractions of mathematics. Furthermore, the networks motivating much of his work lack both deliberate design and central control, as distinct from (for example) telephone networks, which were built and directed to achieve specific goals. He has focused instead on networks that feature very large numbers of unregulated interactions between decentralized, human-initiated actions and structures.

A striking motif in Kleinberg’s research is his ability to discern and formulate plausible mathematical structures to describe problems that represent vague, even elusive, human goals. Some of his most brilliant work has begun by asking questions that might seem initially to have no clear answers (“What do people really want from a Web query?”, “How can individuals find short paths in a social network using only local information?”) and then coming up with mathematical insights that illuminate the important features of reality. Once he has the mathematical definitions in hand, Kleinberg goes on to create powerful solution techniques that are both mathematically elegant and successful in practice.

It is difficult to overstate the impact of Kleinberg’s work on several major real-world problems, and for this reason alone his work is well known to the broad scientific and technological community.

740 Notices of the AMS Volume 54, Number 6

Two additional reasons for Kleinberg’s high visibility are that his two best known results can be grasped intuitively without invoking details of the underlying mathematics and that he is a master expositor, with a much-admired ability to motivate and explain his work so that readers can follow his logic every step of the way.

Hubs and Authorities

There is no better way to appreciate Kleinberg’s signature style than reading the journal version of his most famous paper, “Authoritative sources in a hyperlinked environment” [2]. The paper opens with a discussion of an important but imprecisely defined problem: finding the “most relevant” webpages in response to a given broad query. Beyond the quandary of how to define “most relevant” in a meaningful way, Kleinberg notes that the difficulty with broad queries is the vast overabundance of possibly relevant hits, so that what is needed is an automated way to filter out the most “definitive” pages.

Starting with what seem at first to be the obvious solutions, he carefully explains the impossibility of using purely internal features of a page to rate its authority, as well as the flaw in relying on the query words themselves. The next part of the paper motivates and proposes the nonobvious concept of using a link-based model of the graph representing the Web to create an algorithm for deciding which pages are “authoritative”.

His method for finding these pages begins by constructing a small, focused subgraph G_σ of the Web, where σ is the query string, using link structure to identify its strong authorities. Along the way, Kleinberg explains how, in addition to finding highly authoritative pages, we would expect to find hub pages, i.e., those that contain links to many relevant authoritative pages and thereby allow unrelated pages to be discarded. Based on a natural equilibrium between hubs and authorities, a novel iterative algorithm is defined that updates numerical weights for each page until a fixed point is reached. Happily, Kleinberg also shows that a similar process and algorithm can be adapted when seeking “similar” webpages.

A feature of Kleinberg’s algorithm that brought joy to fans of linear algebra is that the desired authority-weight and hub-weight vectors form, respectively, the principal eigenvectors of A^T A and A A^T, where A is the adjacency matrix of the graph of a collection of linked pages. As well, the nonprincipal eigenvectors of these matrices can be used to extract additional densely linked collections of hubs and authorities.

Always a careful scholar, Kleinberg provides a summary of previous approaches to a variety of related problems: measuring “standing” in social networks, “impact” in scientific citations, ranking of Web pages, hypertext document retrieval, clustering of explicitly linked moderate-size structures, and spectral graph partitioning. A fascinating historical note is his discussion of the then recently published Brin–Page page-rank algorithm [1], soon thereafter to become the basis of Google.

Near the end of the paper (which was published before the rise of Google), Kleinberg surveys three user studies designed to evaluate the effectiveness of his algorithm. These reported favorable results concerning Web users’ perception of improved quality in their query results, but he cautions that such an evaluation is a challenging task because individual judgments of relevance are inherently subjective.

How Can It Be a Small World?

Stanley Milgram’s social psychology experiments in the 1960s (see, for example, [6]) reported on and popularized the idea of a “small world”: that any two individuals who are apparently far apart in a social network can find “short paths” to reach one another. To model the mathematical properties required to ensure the existence of short paths, Watts and Strogatz [7] proposed a “superposition” structure in which a relatively small number of random long-range links are added to a high-diameter network with edges at each node representing local social links. The long-range links provide the opportunity for a short chain through the entire network. Informally, a “small-world” network is an n-node graph such that almost all pairs of nodes are connected by chains whose length is a polynomial in log n, i.e., the number of links traversed to reach one node from another is likely to be exponentially smaller than the number of nodes.

In his fundamental paper “The small-world phenomenon: an algorithmic perspective” [3], Kleinberg noted that Milgram’s findings not only indicated the surprising existence of short chains, but also revealed an equally surprising result about the existence of algorithms that would enable an arbitrary person, knowing only information about the locations of his/her individual acquaintances, to construct a short communication path to a target stranger.

To begin his analysis of small-world models, Kleinberg first proved a negative result: that, using the Watts-Strogatz model, no decentralized algorithm, meaning one whose decisions are based solely on local information, can produce paths of small expected length relative to the diameter of the network.

Next, he generalized the Watts-Strogatz model into an infinite family of random network models. Starting with two parameters characterizing a node’s local and long-range contacts, the model can be simplified to a one-parameter family with an associated clustering exponent α that represents the probability of a long-range connection between two nodes as a function of their lattice distance.

When there is a uniform distribution over long-range contacts (corresponding to α = 0), Kleinberg showed that, although short paths exist with high probability, a decentralized algorithm cannot find them efficiently, since the expected time is exponential in the expected minimum path length. In effect, the long-range links are “too random” to be useful to a decentralized algorithm. Although larger values of α allow a decentralized algorithm to take advantage of the structure of the long-range contacts, they become less useful in transmitting the message to an arbitrary far-away node.

The central result is that there is a unique value of α (α = 2) for which a decentralized algorithm (using a greedy heuristic) will find a target node in a number of steps bounded by a polynomial in log n. Furthermore, no efficient decentralized algorithm exists for any other value of α. These results generalize from two-dimensional to d-dimensional lattices, with the critical value of α equal to the dimension d. A later paper by Kleinberg [4] includes an extension of these results to other network models, including hierarchical models or models based on set systems.

His work on small-world phenomena has had a direct effect on the design of peer-to-peer systems and focused Web crawling techniques, and it has also inspired numerous papers by other authors. Using one of Kleinberg’s contributions to illustrate the influence of the other, submission of the combination of “small world” and “Kleinberg” to Google on February 11, 2007, produced nearly 52,000 hits!

Word Bursts and Temporal Analysis

In his 2002 paper [5], Kleinberg considers the problem of extracting meaningful information from document streams (such as email messages or news articles) that arrive continuously over time, and in particular spotting the “burst of activity” that signals the first appearance of a new topic. His stated scientific goal in this work was to devise a mathematical model that allows such bursts to be identified efficiently, with a further aim of analyzing the content via the associated organizational framework. But, as he amusingly relates in the paper’s introduction, his personal motivation (one we can all share) was a wish to find an organizing principle based on time rather than topic for his ever-increasing volume of accumulated email.

The model proposed by Kleinberg, an infinite-state automaton in which bursts are state transitions, is conceptually related to queueing theory models of bursty traffic. In the most basic form of his model, the gap in time between two consecutive events (messages) is given by an exponential density function such that the expected gap between messages is 1/α for some α > 0, where α can be interpreted as the rate of message arrivals. Bursts can be added by allowing the model to include interleaved periods with lower and higher rates; these correspond to different states for which the rate depends on the state. In the ultimate model analyzed by Kleinberg in detail, there are an infinite number of states, each denoted by q_i. The sequence q_0, q_1, ... models inter-arrival times that decrease geometrically, and there is a cost τ(i, j) corresponding to the transition from state i to state j. Given this model, Kleinberg shows that an optimal (cost-minimizing) state sequence can be found efficiently by adapting a standard forward dynamic programming algorithm for hidden Markov models. From the results of running this algorithm, one can then define the hierarchical structure that is implicit in a sequence of bursts.

Several fascinating conclusions emerge from this work, including the fact that use of a state-transition model means that bursts are characterized by unambiguous beginnings and endings. Kleinberg notes that, when the document stream consists of email messages, the initial message at which the state transition took place can be seen as a “landmark” in subsequent extended message sequences. Later related work by Kleinberg and others has addressed traffic-based feedback on the Web and the temporal dynamics of online information streams (see [8]).

Summary

Jon Kleinberg’s work perfectly fits the Nevanlinna Prize specification since, as we have seen, his mathematical insights have had wide application to multiple elements of information science: the effectiveness of advanced Web search engines, Internet routing, data mining, and the sociology of the World Wide Web. We refer the interested reader to his website [8] for further pointers to papers on these and other topics, including network analysis and management, gossip algorithms, clustering, data mining, comparative genomics, and geometric pattern matching.

Returning to our opening theme, Kleinberg’s much-lauded work on information networks is characterized by (i) identifying and formulating fundamental mathematical structures in questions about the real world, (ii) defining meaningful mathematical models that represent crucial features of real-world phenomena, and finally (iii) creating effective algorithms that solve the resulting mathematical problems.
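
As an illustration of the iterative weight update described in the section “Hubs and Authorities”, the following is a minimal sketch rather than Kleinberg’s own implementation. The names hits, normalize, links, and iterations are hypothetical, and the fixed iteration count and Euclidean normalization are assumptions made to keep the example short. Each pass sets a page’s authority weight to the sum of the hub weights of the pages linking to it, and its hub weight to the sum of the authority weights of the pages it links to, which amounts to power iteration toward the principal eigenvectors of A^T A and A A^T.

```python
from math import sqrt


def normalize(weights):
    """Scale a weight dictionary to unit Euclidean length (assumed convention)."""
    norm = sqrt(sum(v * v for v in weights.values())) or 1.0
    return {page: v / norm for page, v in weights.items()}


def hits(links, iterations=50):
    """Return (authority, hub) weights for a link graph.

    links maps each page to the list of pages it links to.
    The iteration count is an assumption; a convergence test could replace it.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority weight: sum of hub weights of pages that point here.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub weight: sum of (updated) authority weights of pages pointed to.
        new_hub = {p: sum(new_auth[dst] for dst in links.get(p, ())) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return auth, hub


if __name__ == "__main__":
    # Hypothetical toy graph: three hub-like pages pointing at two authorities.
    toy = {"h1": ["a1", "a2"], "h2": ["a1", "a2"], "h3": ["a1"]}
    authority, hubness = hits(toy)
    print(sorted(authority.items(), key=lambda kv: -kv[1]))
    print(sorted(hubness.items(), key=lambda kv: -kv[1]))
```

In the toy graph, the page with more incoming links from good hubs ends up with the larger authority weight, and pages that link to both authorities end up with larger hub weights than the page that links to only one.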

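The decentralized greedy heuristic discussed in the section “How Can It Be a Small World?” can be sketched in a similar spirit. This is an illustrative toy under stated assumptions, not the construction analyzed in [3]: it assumes an n-by-n grid, exactly one long-range contact per node, and a long-range contact chosen with probability proportional to d^(-α), where d is the lattice distance. The function names lattice_distance, build_long_range_contacts, and greedy_route are hypothetical. At each step the current node forwards the message to whichever of its contacts is closest to the target, using only local information.

```python
import random


def lattice_distance(u, v):
    """Manhattan distance between two grid nodes."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])


def build_long_range_contacts(n, alpha, rng):
    """Give every node of an n x n grid one long-range contact.

    A contact v of node u is drawn with probability proportional to
    lattice_distance(u, v) ** -alpha (assumed form of the model).
    """
    nodes = [(i, j) for i in range(n) for j in range(n)]
    contacts = {}
    for u in nodes:
        others = [v for v in nodes if v != u]
        weights = [lattice_distance(u, v) ** -alpha for v in others]
        contacts[u] = rng.choices(others, weights=weights, k=1)[0]
    return contacts


def greedy_route(source, target, n, contacts):
    """Forward greedily to the known contact nearest the target; return the path."""
    path = [source]
    current = source
    while current != target:
        x, y = current
        # Local contacts: the grid neighbors that exist inside the lattice.
        candidates = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
        ]
        # Plus the single long-range contact of the current node.
        candidates.append(contacts[current])
        current = min(candidates, key=lambda v: lattice_distance(v, target))
        path.append(current)
    return path


if __name__ == "__main__":
    rng = random.Random(2007)
    size = 20
    links = build_long_range_contacts(size, 2.0, rng)
    route = greedy_route((0, 0), (size - 1, size - 1), size, links)
    print(len(route) - 1, "steps")
```

Because some grid neighbor is always one step closer to the target, every greedy step strictly reduces the distance, so the route terminates and is never longer than the plain lattice path. Comparing average step counts across values of α on larger grids is one way to explore the behavior described above, although a toy of this size cannot by itself demonstrate the asymptotic result.
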
References

[1] S. Brin and L. Page, Anatomy of a large-scale hypertextual Web search engine, Proceedings of the 7th World Wide Web Conference, Elsevier Science B. V., Amsterdam, 1998.
[2] J. Kleinberg, Authoritative sources in a hyperlinked environment, Proceedings of the 9th ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, PA, 1998, 668–677 (a longer version appears in the Journal of the ACM 46 (1999)).
[3] J. Kleinberg, The small-world phenomenon: An algorithmic perspective, Proceedings of the 32nd ACM Symposium on the Theory of Computing, 2000.
[4] J. Kleinberg, Small-world phenomena and the dynamics of information, Advances in Neural Information Processing Systems 14 (2001).
[5] J. Kleinberg, Bursty and hierarchical structure in streams, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM Press, New York, NY, 2002, 91–101.
[6] S. Milgram, The small-world problem, Psychology Today 1 (1967).
[7] D. Watts and S. Strogatz, Collective dynamics of small-world networks, Nature 393 (1998).
[8] http://www.cs.cornell.edu/home/kleinber.


June/July 2007 Notices of the AMS 743