Link Prediction and the Evolution of Communities on Twitter


MASTER'S THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE in COMPUTER SCIENCE
Track Information Architecture

by

Oscar Castañeda
born in Guatemala, Guatemala

Web Information Systems Group
Department of Software Technology
Faculty EEMCS, Delft University of Technology
Delft, the Netherlands
http://eemcs.tudelft.nl

© 2011 Oscar Castañeda.

NATURE | Vol 453 | 1 May 2008 | LETTERS (p. 99, © 2008 Nature Publishing Group)

... graph can capture behaviour of this kind using probabilities p_r that decrease as we move higher up the tree. Conversely, probabilities that increase as we move up the tree correspond to 'disassortative' structures in which vertices are less likely to be connected on small scales than on large ones. By letting the p_r values vary arbitrarily throughout the dendrogram, the hierarchical random graph can capture both assortative and disassortative structure, as well as arbitrary mixtures of the two, at all scales and in all parts of the network.

To demonstrate our method we have used it to construct hierarchical decompositions of three example networks drawn from disparate fields: the metabolic network of the spirochaete Treponema pallidum [18], a network of associations between terrorists [19], and a food web of grassland species [20]. To test whether these decompositions accurately capture the important structural features of the networks, we use the sampled dendrograms to generate new networks, different in detail from the originals but, by definition, having similar hierarchical structure (see Supplementary Information for more details). We find that these 'resampled' networks match the statistical properties of the originals closely, including their degree distributions, clustering coefficients, and distributions of shortest path lengths between pairs of vertices, despite the fact that none of these properties is explicitly represented in the hierarchical random graph (Table 1, and Supplementary Fig. 3). It therefore seems that a network's hierarchical structure is capable of explaining a wide variety of other network features as well.

The dendrograms produced by our method are also of interest in themselves, as a graphical representation and summary of the hierarchical structure of the observed network. As discussed above, our method can generate not just a single dendrogram but a set of dendrograms, each of which is a good fit to the data. From this set we can, by using techniques from phylogeny reconstruction [21], create a single consensus dendrogram, which captures the topological features that appear consistently across all or a large fraction of the dendrograms and typically is a better summary of the network's structure than any individual dendrogram. Figure 2a shows such a consensus dendrogram for the grassland species network, which clearly reveals communities and subcommunities of plants, herbivores, parasitoids and hyperparasitoids.

Another application of the hierarchical decomposition is the prediction of missing interactions in networks. In many settings, the discovery of interactions in a network requires significant experimental effort in the laboratory or the field. As a result, our current pictures of many networks are substantially incomplete [22–28]. An alternative to checking exhaustively for a connection between every pair of vertices in a network is to try to predict, in advance and on the basis of the connections already observed, which vertices are most likely to be connected, so that scarce experimental resources can be focused on testing for those interactions. If our predictions are good, we can in this way substantially reduce the effort required to establish the network's topology.

The hierarchical decomposition can be used as the basis for an effective method of predicting missing interactions as follows. Given an observed but incomplete network, we generate, as described above, a set of hierarchical random graphs (dendrograms and the associated probabilities p_r) that fit that network. Then we look for pairs of vertices that have a high average probability of connection within these hierarchical random graphs but are unconnected in the observed network. These pairs we consider the most likely candidates for missing connections. (Technical details of the procedure are given in Supplementary Information.)

We demonstrate the method by using our three example networks again. For each network we remove a subset of connections chosen uniformly at random and then attempt to predict, on the basis of the remaining connections, which have been removed. A standard metric for quantifying the accuracy of prediction algorithms, commonly used in the medical and machine learning communities, is the AUC statistic, which is equivalent to the area under the receiver operating characteristic (ROC) curve [29]. In the present context, the AUC statistic can be interpreted as the probability that a randomly chosen missing connection (a true positive) is given a higher score by our method than a randomly chosen pair of unconnected vertices (a true negative). Thus, the degree to which the AUC exceeds 0.5 indicates how much better our predictions are than chance. Figure 2 shows the AUC statistic for the three networks as a function of the fraction of the connections known to the algorithm. For all three networks our algorithm does far better than chance, indicating that hierarchy is a strong general predictor of missing structure.

It is also instructive to compare the performance of our method with that of other methods for link prediction [8]. Previously proposed methods include assuming that vertices are likely to be connected if they have many common neighbours, if there are short paths between them, or if the product of their degrees is large. These approaches work well for strongly assortative networks such as collaboration and citation ...

Table 1 | Comparison of original and resampled networks

Network       ⟨k⟩_real  ⟨k⟩_samp  C_real  C_samp     d_real  d_samp
T. pallidum   4.8       3.7(1)    0.0625  0.0444(2)  3.690   3.940(6)
Terrorists    4.9       5.1(2)    0.361   0.352(1)   2.575   2.794(7)
Grassland     3.0       2.9(1)    0.174   0.168(1)   3.29    3.69(2)

Statistics are shown for the three example networks studied and for new networks generated by resampling from our hierarchical model. The generated networks closely match the average degree ⟨k⟩, clustering coefficient C and average vertex–vertex distance d in each case, suggesting that they capture much of the structure of the real networks. Parenthetical values indicate standard errors on the final digits.

Figure 2 | Application of the hierarchical decomposition to the network of grassland species interactions. a, Consensus dendrogram reconstructed from the sampled hierarchical models. b, A visualization of the network in which the upper few levels of the consensus dendrogram are shown as boxes around species (plants, herbivores, parasitoids, hyperparasitoids and hyper-hyperparasitoids are shown as circles, boxes, down triangles, up triangles and diamonds, respectively). Note that in several cases a set of parasitoids is grouped into a disassortative community by the algorithm, not because they prey on each other but because they prey on the same herbivore.
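The evaluation described in the Nature excerpt above (hide a random subset of connections, score candidate pairs from the remaining ones, and report the AUC as the probability that a hidden connection outscores a never-connected pair) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: it scores pairs with the common-neighbours count, one of the previously proposed baselines the excerpt mentions, and not with the hierarchical random graph method itself. The function names, the toy graph in the usage note, and the convention that ties count as 0.5 are all assumptions made for the example.

```python
import itertools
import random


def build_adjacency(nodes, edges):
    """Return a dict mapping each node to the set of its neighbours."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbours(adj, u, v):
    """Baseline score (assumption): number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties are counted as 0.5 (a common convention, assumed here).
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


def evaluate(nodes, edges, fraction_observed, seed=0):
    """Hide a random subset of edges, score pairs, and return the AUC."""
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    shuffled = edges[:]
    rng.shuffle(shuffled)
    n_observed = int(round(fraction_observed * len(shuffled)))
    observed, removed = shuffled[:n_observed], shuffled[n_observed:]

    # Scores are computed from the observed edges only.
    adj = build_adjacency(nodes, observed)

    # True negatives: pairs that are unconnected in the full network.
    true_edges = set(edges)
    non_edges = [
        pair
        for pair in itertools.combinations(sorted(nodes), 2)
        if pair not in true_edges
    ]

    positives = [common_neighbours(adj, u, v) for u, v in removed]
    negatives = [common_neighbours(adj, u, v) for u, v in non_edges]
    return auc(positives, negatives)
```

As a usage example, `evaluate(range(6), [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)], 0.7)` hides two of the seven edges of a hypothetical toy graph (two triangles joined by a bridge) and returns a value between 0 and 1; values above 0.5 would indicate better-than-chance prediction, as the excerpt explains. Sweeping `fraction_observed` would give the kind of curve the excerpt describes, though for real networks a pairwise double loop like this would need to be replaced by sampling or a rank-based AUC computation.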
Cover picture: Network visualization from Clauset et al. [17].

Author: Oscar Castañeda
Student id: 1398946
Email: [email protected]
Graduation Date: 24 November 2011
Graduation Section: Web Information Systems

Abstract

This research is about the influence of link prediction on the evolution of communities on Twitter. We collected tweets from three technology micro-bloggers who led us through their followings and tweets to tens of thousands of unique users over several weeks. We analyzed conventional and alternative information streams for these micro-bloggers based on URLs embedded in their tweets and in tweets of followees and followees-of-followees. We model users based on the most recent URLs embedded in their tweets and the latest users they follow, from which we infer links and extract semantic entities that are indicative of their interests. Furthermore, we propose a pipeline of methods for user modeling and personalization of communities of interest on Twitter. We test the performance of different organizational principles in community design, including the principles of hierarchy, user interests and the baseline follower mechanism on Twitter, which is based on user intuitions.

The goal of this thesis is to create a better notion of community by automatically calculating adaptive and personalized structures of followees that produce highly interesting content. Designing communities in this way is useful because it enables people to know in which community they are organized during a given period of time and because it enables community-based recommendations. Furthermore, designing communities based on organizational principles enables their automatic construction. Currently, communities are manually constructed by users through a tedious process of following and unfollowing which is based on disconnected user intuitions. We investigate whether it is possible to infer links between Twitter users who are not explicitly connected on Twitter and explore whether such automatically inferred social networks would allow for improving content recommendations on Twitter.

Thesis Committee: