Tractable Algorithms for Proximity Search on Large Graphs

Total Page:16

File Type:pdf, Size:1020Kb

Tractable Algorithms for Proximity Search on Large Graphs Tractable Algorithms for Proximity Search on Large Graphs Purnamrita Sarkar July 2010 CMU-ML-10-107 Machine Learning Department School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Thesis Committee: Andrew W. Moore, Chair Geoffrey J. Gordon Anupam Gupta Jon Kleinberg, Cornell University Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Copyright c 2010 Purnamrita Sarkar This research was sponsored by the National Science Foundation under grant numbers AST-0312498, IIS- 0325581, and IIS-0911032, the Defense Advanced Research Projects Agency under grant number S518700 and EELD grant number F30602-01-2-0569, the U.S. Department of Agriculture under grant numbers 53-3A94-03-11 and TIRNO-06-D-00020, the Department of Energy under grant numbers W7405ENG48 and DE-AC52-07NA27344, the Air Force Research Laboratory under grant numbers FA865005C7264 and FA875006C0216, and the National Institute of Health under grant number 1R01PH000028. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity. Keywords: random walks, proximity measures, graphs, nearest neighbors, link pre- diction, algorithms To, Ramaprasad Das, my grandfather, who is behind everything I am, or ever will be. iv Abstract Identifying the nearest neighbors of a node in a graph is a key ingre- dient in a diverse set of ranking problems, e.g. friend suggestion in social networks, keyword search in databases, web-spam detection etc. For find- ing these “near” neighbors, we need graph theoretic measures of similarity or proximity. Most popular graph-based similarity measures, e.g. length of short- est path, the number of common neighbors etc., look at the paths between two nodes in a graph. One such class of similarity measures arise from random walks. In the context of using these measures, we identify and address two im- portant problems. First, we note that, while random walk based measures are useful, they are often hard to compute. Hence we focus on designing tractable algorithms for faster and better ranking using random walk based proximity measures in large graphs. Second, we theoretically justify why path-based similarity measures work so well in practice. For the first problem, we focus on improving the quality and speed of near- est neighbor search in real-world graphs. This work consists of three main components: first we present an algorithmic framework for computing near- est neighbors in truncated hitting and commute times, which are proximity measures based on short term random walks. Second, we improve upon this ranking by incorporating user feedback, which can counteract ambiguities in queries and data. Third, we address the problem of nearest neighbor search when the underlying graph is too large to fit in main memory. We also prove a number of interesting theoretical properties of these measures, which have been key to designing most of the algorithms in this thesis. We address the second problem by bringing together a well known gen- erative model for link formation, and geometric intuitions. As a measure of the quality of ranking, we examine link prediction, which has been the pri- mary tool for evaluating the algorithms in this thesis. Link prediction has been extensively studied in prior empirical surveys. Our work helps us better understand some common trends in the predictive performance of different measures seen across these empirical results. vi Acknowledgements It is almost impossible to express my gratitude to Andrew Moore, my advisor, in a couple of words. To him, I owe many things. Through his brilliant insights, infectious enthusiasm, and gentle guidance he has helped me formulate the problems of this thesis, and solve them. He has been like a father figure for me through all these years. Besides having time for brainstorming on new ideas and exciting algorithms, he has also lent a patient and affectionate ear to my personal problems. I have learnt many things about research and life from him. I have learnt that, when it comes to dealing with issues with people, the best way is to be honest, direct and sincere. And above all, I have learnt that, it is not the success and fame, but “loving the science” that brings us to work every morning. This thesis was shaped by the insightful suggestions, and perceptive comments of my thesis committee members. In fact, the last chapter of this thesis was mostly written be- cause of the great questions they had brought up during my thesis proposal. Anupam Gupta and Geoffrey Gordon have always had open doors for all my questions, doubts, and half-formed ideas; I am indeed very thankful to them. I am grateful to Jon Kleinberg, for always finding time to talk to me about my research at conferences, or during his brief visits to Carnegie Mellon. He has also been extremely supportive and helpful during the stressful times of job search. I am very grateful to many of the faculty members at Carnegie Mellon. Gary Miller’s lectures and discussions on Spectral Graph Theory and Scientific Computing have been a key to my understanding of random walks on graphs. Drew Bagnell and Jeff Schneider have given valuable suggestions related to my research and pointed me to relevant litera- ture. Artur Dubrawski has been very supportive of my research, and has always inspired me to apply my ideas to completely new areas. I will like to thank Carlos Guestrin, who has helped me immensely in times of need. Tom Mitchell and John Lafferty have helped me through some difficult times at CMU; I am grateful to them. I would like to acknowl- edge Christos Faloutsos, Steve Fienberg and Danny Sleator for their help and guidance during my years at CMU. I will also like to thank Charles Isbell at Georgia Tech; interning with him as an undergraduate gave me my first taste of ideas from machine learning and statistics, and so far it has been the most enjoyable and fulfilling experience. During my stay at Carnegie Mellon, I have made some great friends and life would not vii have been the same without them. I have had the pleasure to know Shobha Venkataraman, Sajid Siddiqi, Sanjeev Koppal, Stano Funiak, Andreas Zollmann, Viji Manoharan, Jeremy Kubica, Jon Huang, Lucia Castellanos, Khalid El-Arini, Ajit Singh, Anna Goldenberg, Andy Schlaikjer, Alice Zheng, Arthur Gretton, Andreas Krause, Hetunandan and Ram Ravichandran. They have looked out for me in difficult times; brainstormed about new exciting ideas; read my papers at odd hours before pressing deadlines; and last, but not least, helped me during job search. To my officemates Indrayana Rustandi, and Shing- Hon Lau, it was a pleasure to share an office, which led to the sharing of small wins and losses from everyday. Thanks to the current and past members of the Auton Lab for their friendship, and good advice. I am grateful to Diane Stidle, Cathy Serventi and Michele Kiriloff for helping me out with administrative titbits, and above all being so kind, friendly and patient. Finally, I would like to thank my family, my parents Somenath and Sandipana Sarkar, and my little brother Anjishnu for their unconditional love, endless support and unwaver- ing faith in me. To Deepayan, my husband, thanks for being a wonderful friend and a great colleague, and for being supportive of my inherent stubbornness. viii Contents 1 Introduction 1 2 Background and Related Work 9 2.1 Introduction . .9 2.2 Random Walks on Graphs: Background . 10 2.2.1 Random Walk based Proximity Measures . 11 2.2.2 Other Graph-based Proximity Measures . 17 2.2.3 Graph-theoretic Measures for Semi-supervised Learning . 18 2.2.4 Clustering with Random Walk Based Measures . 20 2.3 Related Work: Algorithms . 22 2.3.1 Algorithms for Hitting and Commute Times . 23 2.3.2 Algorithms for Computing Personalized Pagerank and Simrank . 24 2.3.3 Algorithms for Computing Harmonic Functions . 28 2.4 Related Work: Applications . 28 2.4.1 Application in Computer Vision . 29 2.4.2 Text Analysis . 29 2.4.3 Collaborative Filtering . 31 2.4.4 Combating Webspam . 31 2.5 Related Work: Evaluation and Datasets . 32 2.5.1 Evaluation: Link Prediction . 32 2.5.2 Publicly Available Data Sources . 33 2.6 Discussion . 35 ix 3 Fast Local Algorithm for Nearest-neighbor Search in Truncated Hitting and Commute Times 37 3.1 Introduction . 37 3.2 Short Term Random Walks on Graphs . 40 3.2.1 Truncated Hitting Times . 40 3.2.2 Truncated and True Hitting Times . 41 3.2.3 Locality Properties . 41 3.3 GRANCH: Algorithm to Compute All Pairs of Nearest Neighbors . 44 3.3.1 Expanding Neighborhood . 48 3.3.2 Approximate Nearest Neighbors . 49 3.4 Hybrid Algorithm . 51 3.4.1 Sampling Scheme . 52 3.4.2 Lower and Upper Bounds on Truncated Commute Times . 53 3.4.3 Expanding Neighborhood . 53 3.4.4 The Algorithm at a Glance . 54 3.5 Empirical Results . 55 3.5.1 Co-authorship Networks . 55 3.5.2 Entity Relation Graphs . 57 3.5.3 Experiments . 58 3.6 Summary and Discussion . 62 4 Fast Dynamic Reranking in Large Graphs 65 4.1 Introduction . 65 4.2 Relation to Previous Work . 67 4.3 Motivation . 69 4.4 Harmonic Functions on Graphs, and T-step Variations . 71 4.4.1 Unconditional vs. Conditional Measures . 72 4.5 Algorithms . 75 4.5.1 The Sampling Scheme . 76 4.5.2 The Branch and Bound Algorithm . 76 4.5.3 Number of Nodes with High Harmonic Function Value .
Recommended publications
  • Four Results of Jon Kleinberg a Talk for St.Petersburg Mathematical Society
    Four Results of Jon Kleinberg A Talk for St.Petersburg Mathematical Society Yury Lifshits Steklov Institute of Mathematics at St.Petersburg May 2007 1 / 43 2 Hubs and Authorities 3 Nearest Neighbors: Faster Than Brute Force 4 Navigation in a Small World 5 Bursty Structure in Streams Outline 1 Nevanlinna Prize for Jon Kleinberg History of Nevanlinna Prize Who is Jon Kleinberg 2 / 43 3 Nearest Neighbors: Faster Than Brute Force 4 Navigation in a Small World 5 Bursty Structure in Streams Outline 1 Nevanlinna Prize for Jon Kleinberg History of Nevanlinna Prize Who is Jon Kleinberg 2 Hubs and Authorities 2 / 43 4 Navigation in a Small World 5 Bursty Structure in Streams Outline 1 Nevanlinna Prize for Jon Kleinberg History of Nevanlinna Prize Who is Jon Kleinberg 2 Hubs and Authorities 3 Nearest Neighbors: Faster Than Brute Force 2 / 43 5 Bursty Structure in Streams Outline 1 Nevanlinna Prize for Jon Kleinberg History of Nevanlinna Prize Who is Jon Kleinberg 2 Hubs and Authorities 3 Nearest Neighbors: Faster Than Brute Force 4 Navigation in a Small World 2 / 43 Outline 1 Nevanlinna Prize for Jon Kleinberg History of Nevanlinna Prize Who is Jon Kleinberg 2 Hubs and Authorities 3 Nearest Neighbors: Faster Than Brute Force 4 Navigation in a Small World 5 Bursty Structure in Streams 2 / 43 Part I History of Nevanlinna Prize Career of Jon Kleinberg 3 / 43 Nevanlinna Prize The Rolf Nevanlinna Prize is awarded once every 4 years at the International Congress of Mathematicians, for outstanding contributions in Mathematical Aspects of Information Sciences including: 1 All mathematical aspects of computer science, including complexity theory, logic of programming languages, analysis of algorithms, cryptography, computer vision, pattern recognition, information processing and modelling of intelligence.
    [Show full text]
  • Navigability of Small World Networks
    Navigability of Small World Networks Pierre Fraigniaud CNRS and University Paris Sud http://www.lri.fr/~pierre Introduction Interaction Networks • Communication networks – Internet – Ad hoc and sensor networks • Societal networks – The Web – P2P networks (the unstructured ones) • Social network – Acquaintance – Mail exchanges • Biology (Interactome network), linguistics, etc. Dec. 19, 2006 HiPC'06 3 Common statistical properties • Low density • “Small world” properties: – Average distance between two nodes is small, typically O(log n) – The probability p that two distinct neighbors u1 and u2 of a same node v are neighbors is large. p = clustering coefficient • “Scale free” properties: – Heavy tailed probability distributions (e.g., of the degrees) Dec. 19, 2006 HiPC'06 4 Gaussian vs. Heavy tail Example : human sizes Example : salaries µ Dec. 19, 2006 HiPC'06 5 Power law loglog ppk prob{prob{ X=kX=k }} ≈≈ kk-α loglog kk Dec. 19, 2006 HiPC'06 6 Random graphs vs. Interaction networks • Random graphs: prob{e exists} ≈ log(n)/n – low clustering coefficient – Gaussian distribution of the degrees • Interaction networks – High clustering coefficient – Heavy tailed distribution of the degrees Dec. 19, 2006 HiPC'06 7 New problematic • Why these networks share these properties? • What model for – Performance analysis of these networks – Algorithm design for these networks • Impact of the measures? • This lecture addresses navigability Dec. 19, 2006 HiPC'06 8 Navigability Milgram Experiment • Source person s (e.g., in Wichita) • Target person t (e.g., in Cambridge) – Name, professional occupation, city of living, etc. • Letter transmitted via a chain of individuals related on a personal basis • Result: “six degrees of separation” Dec.
    [Show full text]
  • Understanding Innovations and Conventions and Their Diffusion Process in Online Social Media
    UNDERSTANDING INNOVATIONS AND CONVENTIONS AND THEIR DIFFUSION PROCESS IN ONLINE SOCIAL MEDIA A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy by Rahmtin Rotabi December 2017 c 2017 Rahmtin Rotabi ALL RIGHTS RESERVED UNDERSTANDING INNOVATIONS AND CONVENTIONS AND THEIR DIFFUSION PROCESS IN ONLINE SOCIAL MEDIA Rahmtin Rotabi, Ph.D. Cornell University 2017 This thesis investigates innovations, trends, conventions and practices in online social media. Tackling these problems will give more insight into how their users use these online platforms with the hope that the results can be generalized to the offline world. Every major step in human history was accompanied by an innovation, from the time that mankind invented and mastered the production of fire, to the invention of the World Wide Web. The societal process of adopting innovations has been a case that has fas- cinated many researchers throughout the past century. Prior to the existence of online social networks, economists and sociologists were able to study these phenomena on small groups of people through microeconomics and microsociology. However, the data gathered from these online communities help us to take one step further, initiating stud- ies on macroeconomic and macrosociologal problemsin addition to the previous two areas. Work in this thesis sheds light on the properties of both innovators and laggards, the expansion and adaptation of innovation, competition among innovations with the same purpose, and the eventual crowding out of competitor innovations in the target society. Lastly, we look at the bigger picture by studying the entire diffusion process as a whole, abstracting out a great deal of details.
    [Show full text]
  • CS Cornell 40Th Anniversary Booklet
    Contents Welcome from the CS Chair .................................................................................3 Symposium program .............................................................................................4 Meet the speakers ..................................................................................................5 The Cornell environment The Cornell CS ambience ..............................................................................6 Faculty of Computing and Information Science ............................................8 Information Science Program ........................................................................9 College of Engineering ................................................................................10 College of Arts & Sciences ..........................................................................11 Selected articles Mission-critical distributed systems ............................................................ 12 Language-based security ............................................................................. 14 A grand challenge in computer networking .................................................16 Data mining, today and tomorrow ...............................................................18 Grid computing and Web services ...............................................................20 The science of networks .............................................................................. 22 Search engines that learn from experience ..................................................24
    [Show full text]
  • Elliot Anshelevich
    ELLIOT ANSHELEVICH [email protected] Department of Computer Science Office: (607) 255-5578 Cornell University Cell: (607) 262-6170 Upson Hall 5139 Fax: (607) 255-4428 Ithaca, NY 14853 http://www.cs.cornell.edu/people/eanshel Research Interests My interests center on the design and analysis of algorithms. I am especially inter- ested in algorithms for large decentralized networks, including networks involving strategic agents. In particular, I am interested in: • Strategic agents in networks, and influencing their behavior • Network design problems • Distributed load balancing algorithms • Local and decentralized routing algorithms • Influence and information propagation in both social and computer networks Education Cornell University, Ithaca, New York, 2000-2005 Ph.D. Computer Science, expected May 2005 Thesis title: Design and Management of Networks with Strategic Agents Advisor: Jon Kleinberg Master of Science, May 2004 Rice University, Houston, Texas, 1996-2000 B.S. Computer Science, May 2000 Double major in Computer Science and Mathematics magna cum laude Professional and Research Experience Lucent Technologies, Murray Hill, New Jersey Summer 2004 Conducted research on game theoretic network design with Gordon Wilfong and Bruce Shepherd. We study the peering and customer-provider relationships be- tween Autonomous Systems in the Internet, and analyze them using algorithmic game theory. This research is still ongoing and we expect to submit our results for publication in Spring 2005. Lucent Technologies, Murray Hill, New Jersey Summer 2003 Conducted research as an intern on optical network design at Bell Labs. This re- search was done together with Lisa Zhang. We addressed the problem of designing a cheap optical network that satisfies all user demands.
    [Show full text]
  • Link Prediction with Combinatorial Structure: Block Models and Simplicial Complexes
    Link Prediction with Combinatorial Structure: Block Models and Simplicial Complexes Jon Kleinberg Including joint work with Rediet Abebe, Austin Benson, Ali Jadbabaie, Isabel Kloumann, Michael Schaub, and Johan Ugander. Cornell University Neighborhoods and Information-Sharing Marlow-Byron-Lento-Rosenn 2009 Node neighborhoods are the \input" in the basic metaphor for information sharing in social media. w s v A definitional problem: For a given m, who are the m closest people to you in a social network? Defining \closeness" just by hop count has two problems: Not all nodes at a given number of hops seems equally \close." The small-world phenomenon: most nodes are 1, 2, 3, or 4 hops away. Many applications for this question: Content recommendation: What are the m closest people discussing? Link recommendation: Who is the closest person you're not connected to? Similarity: Which nodes should I include in a group with s? Problems Related to Proximity w s v Two testbed versions of these questions: Link Prediction: Given a snapshot of the network up to some time t0, produce a ranked list of pairs likely to form links after time t0. [Kleinberg-LibenNowell 2003, Backstrom-Leskovec 2011, Dong-Tang-Wu-Tian-Chawla-Rao-Cao 2012]. Seed Set Expansion: Given some representative members of a group, produce a ranked list of nodes likely to also belong to the group [Anderson-Chung-Lang 2006, Abrahao et al 2012, Yang-Leskovec 2012]. Problems Related to Proximity w s v Methods for seed set expansion Local: Add nodes iteratively by some measure of the number of edges they have to current group members.
    [Show full text]
  • The Mathematical Work of Jon Kleinberg
    The Mathematical Work of Jon Kleinberg Gert-Martin Greuel, John E. Hopcroft, and Margaret H. Wright The Notices solicited the following article describing the work of Jon Kleinberg, recipient of the 2006 Nevanlinna Prize. The International Mathematical Union also issued a news release, which appeared in the October 2006 issue of the Notices. The Rolf Nevanlinna Prize is awarded by the In- Find the Mathematics; Then Solve the ternational Mathematical Union for “outstanding Problem contributions in mathematical aspects of informa- Kleinberg broadly describes his research [8] as cen- tion sciences”. Jon Kleinberg, the 2006 recipient tering “around algorithmic issues at the interface of the Nevanlinna Prize, was cited by the prize of networks and information, with emphasis on committee for his “deep, creative, and insightful the social and information networks that underpin contributions to the mathematical theory of the the Web and other on-line media”. The phenomena global information environment”. that Kleinberg has investigated arise from neither Our purpose is to present an overall perspective the physical laws of science and engineering nor on Kleinberg’s work as well as a brief summary of the abstractions of mathematics. Furthermore, the three of his results: networks motivating much of his work lack both • the “hubs and authorities” algorithm, deliberate design and central control, as distinct based on structural analysis of link topol- from (for example) telephone networks, which ogy, for locating high-quality information were built and directed to achieve specific goals. on the World Wide Web; He has focused instead on networks that feature • methods for discovering short chains in very large numbers of unregulated interactions be- large social networks; and tween decentralized, human-initiated actions and • techniques for modeling, identifying, and structures.
    [Show full text]
  • Graph Evolution: Densification and Shrinking Diameters
    Graph Evolution: Densification and Shrinking Diameters JURE LESKOVEC Carnegie Mellon University JON KLEINBERG Cornell University and CHRISTOS FALOUTSOS Carnegie Mellon University How do real graphs evolve over time? What are normal growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n)orO(log(log n)). Existing graph generation models do not exhibit these types of behavior even at a qualitative level. We provide a new graph generator, based on a forest fire spreading process that has a simple, intuitive justification, requires very few parameters (like the flammability of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study. This material is based on work supported by the National Science Foundation under Grants No. IIS-0209107 SENSOR-0329549 EF-0331657IIS-0326322 IIS- 0534205, CCF-0325453, IIS-0329064, CNS-0403340, CCR-0122581; a David and Lucile Packard Foundation Fellowship; and also by the Pennsylvania Infrastructure Technology Alliance (PITA), a partnership of Carnegie Mellon, Lehigh University and the Commonwealth of Pennsylvania’s Department of Community and Economic Development (DCED).
    [Show full text]
  • Pagerank Algorithm
    Link Analysis Algorithms Copyright Ellis Horowitz 2011-2019 Overview of the talk • Background on Citation Analysis • Google’s PageRank Algorithm • Simplified Explanation of PageRank • Examples of how PageRank algorithm works • OBservations about PageRank algorithm • Importance of PageRank • KleinBerg’s HITS Algorithm Copyright Ellis Horowitz, 2011-2019 2 History of Link Analysis • Bibliometrics has been active since at least the 1960’s • A definition from Wikipedia: • "Bibliometrics is a set of methods to quantitatively analyze academic literature. Citation analysis and content analysis are commonly used bibliometric methods. While bibliometric methods are most often used in the field of library and information science, bibliometrics have wide applications in other areas.” • Many research fields use bibliometric methods to explore – the impact of their field, – the impact of a set of researchers, or – the impact of a particular paper. Copyright Ellis Horowitz, 2011-2019 3 Bibliometrics http://citeseerx.ist.psu.edu/stats/citations • One common technique of Top Ten Most Cited Articles in CS Literature Bibliometrics is citation analysis CiteSeerx is a search engine for academic papers • Citation analysis is the examination of the frequency, patterns, and graphs of citations in articles and books. • citation analysis can observe links to other works or other researchers. • Bibliographic coupling: two papers that cite many of the same papers • Co-citation: two papers that were cited by many of the same papers • Impact factor (of a journal): frequency with which the average article in a journal has been cited in a particular year or period Copyright Ellis Horowitz 2011-2018 Bibliographic Coupling • Measure of similarity of documents introduced by Kessler of MIT in 1963.
    [Show full text]
  • The Work of Jon Kleinberg
    The work of Jon Kleinberg John Hopcroft Introduction Jon Kleinberg’s research has helped lay the theoretical foundations for the information age. He has developed the theory that underlies search engines, collaborative filtering, organizing and extracting information from sources such as the World Wide Web, news streams, and the large data collections that are becoming available in astronomy, bioinformatics and many other areas. The following is a brief overview of five of his major contributions. Hubs and authorities In the 1960s library science developed the vector space model for representing doc- uments [13]. The vector space model is constructed by sorting all words in the vocabulary of some corpus of documents and forming a vector space model where each dimension corresponds to one word of the vocabulary. A document is repre- sented as a vector where the value of each coordinate is the number of times the word associated with that dimension appears in the document. Two documents are likely to be on a related topic if the angle between their vector representations is small. Early search engines relied on the vector space model to find web pages that were close to a query. Jon’s work on hubs and authorities recognized that the link structure of the web provided additional information to aid in tasks such as search. His work on hubs and authorities addressed the problem of how, out of the millions of documents on the World Wide Web (WWW), you can select a small number in response to a query. Prior to 1997 search engines selected documents based on the vector space model or a variant of it.
    [Show full text]
  • On the Nature of the Theory of Computation (Toc)∗
    Electronic Colloquium on Computational Complexity, Report No. 72 (2018) On the nature of the Theory of Computation (ToC)∗ Avi Wigderson April 19, 2018 Abstract [This paper is a (self contained) chapter in a new book on computational complexity theory, called Mathematics and Computation, available at https://www.math.ias.edu/avi/book]. I attempt to give here a panoramic view of the Theory of Computation, that demonstrates its place as a revolutionary, disruptive science, and as a central, independent intellectual discipline. I discuss many aspects of the field, mainly academic but also cultural and social. The paper details of the rapid expansion of interactions ToC has with all sciences, mathematics and philosophy. I try to articulate how these connections naturally emanate from the intrinsic investigation of the notion of computation itself, and from the methodology of the field. These interactions follow the other, fundamental role that ToC played, and continues to play in the amazing developments of computer technology. I discuss some of the main intellectual goals and challenges of the field, which, together with the ubiquity of computation across human inquiry makes its study and understanding in the future at least as exciting and important as in the past. This fantastic growth in the scope of ToC brings significant challenges regarding its internal orga- nization, facilitating future interactions between its diverse parts and maintaining cohesion of the field around its core mission of understanding computation. Deliberate and thoughtful adaptation and growth of the field, and of the infrastructure of its community are surely required and such discussions are indeed underway.
    [Show full text]
  • Arxiv:2010.12611V1 [Cs.SI] 23 Oct 2020
    Clustering via Information Access in a Network∗ Hannah C. Beilinson Nasanbayar Ulzii-Orshikh Ashkan Bashardoust Haverford College Haverford College University of Utah Haverford, PA Haverford, PA Salt Lake City, UT [email protected] [email protected] [email protected] Sorelle A. Friedler Carlos E. Scheidegger Haverford College University of Arizona Haverford, PA Tucson, AZ [email protected] [email protected] Suresh Venkatasubramanian University of Utah Salt Lake City, UT [email protected] October 27, 2020 Abstract Information flow in a graph (say, a social network) has typically been modeled using standard influence propagation methods, with the goal of determining the most effective ways to spread information widely. More recently, researchers have begun to study the differing access to information of individuals within a network. This previous work suggests that information access is itself a potential aspect of privilege based on network position. While concerns about fairness usually focus on differences between demographic groups, characterizing network position may itself give rise to new groups for study. But how do we characterize position? Rather than using standard grouping methods for graph clustering, we design and explore a clustering that explicitly incorporates models of how information flows on a network. Our goal is to identify clusters of nodes that are similar based on their access to information across the network. We show, both formally and experimentally, that the resulting clustering method is a new approach to network clustering. Using a wide variety of datasets, our experiments show that the introduced clustering technique clusters individuals together who are similar based on an external information access measure.
    [Show full text]