Statistical Network Analysis: Models, Issues, and New Directions
Total Page:16
File Type:pdf, Size:1020Kb
Statistical Network Analysis: Models, Issues, and New Directions Edited by: Edoardo M. Airoldi David M. Blei Stephen E. Fienberg Anna Goldenberg Eric P. Xing Alice X. Zheng Contents Preface : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : ix Part I Invited Presentations 1 Structural Inference of Hierarchies in Networks Aaron Clauset, Cristopher Moore (University of New Mexico), Mark E. J. Newman (University of Michigan) : : : : : : : : : : : : : : : : : : : : : : : : : : : : 3 2 Heider vs Simmel: Emergent Features in Dynamic Structures David Krackhardt (Carnegie Mellon University), Mark S. Handcock (University of Washington) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 17 3 Joint Group and Topic Discovery from Relations and Text Andrew McCallum, Xuerui Wang, and Natasha Mohanty (University of Massachusetts, Amherst) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 33 4 Statistical Models for Networks: A Brief Review of Some Recent Research Stanley Wasserman, Garry Robins, Douglas Steinley (Indiana University) 51 Part II Other Presentations 5 Combining Stochastic Block Models and Mixed Membership for Statistical Network Analysis Edoardo M. Airoldi (Carnegie Mellon University), David M. Blei (Princeton), Stephen E. Fienberg (Carnegie Mellon University), Eric P. Xing (Carnegie Mellon University) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 65 vi Contents 6 Exploratory Study of a New Model for Evolving Networks Anna Goldenberg, Alice Zheng (Carnegie Mellon University) : : : : : : : : : : : 83 7 A Latent Space Model for Rank Data Isobel Claire Gormley, Thomas Brendan Murphy (Trinity College Dublin) 99 8 A simple model for complex networks with arbitrary degree distribution and clustering Mark S. Handcock (University of Washington), Martina Morris (University of Washington) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 113 9 Discrete Temporal Models of Social Networks Steve Hanneke, Eric Xing (Carnegie Mellon University) : : : : : : : : : : : : : : 127 10 Approximate Kalman Filters for Embedding Author-Word Co-occurrence Data Over Time Purnamrita Sarkar, Sajid M. Siddiqi, Geoffrey J. Gordon (Carnegie Mellon University) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 139 11 Discovering Functional Communities in Dynamical Networks Cosma Rohilla Shalizi (Carnegie Mellon University) Marcelo F. Camperi (University of San Francisco), Kristina Lisa Klinkner (Carnegie Mellon University) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 155 12 Empirical analysis of a dynamic social network built from PGP keyrings Robert H. Warren (University of Waterloo), Dana Wilkinson (University of Waterloo), Mike Warnecke (PSW Applied Research Inc.) : 173 Part III Extended Abstracts 13 A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring Sofus Attila Macskassy (Fetch Technologies), Foster Provost (New York University) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 191 14 Age and Geographic Inferences of the LiveJournal Social Network Ian MacKinnon, Robert H. Warren (University of Waterloo) : : : : : : : : : : 195 15 Inferring Organizational Titles in Online Communication Galileo Mark S. Namata Jr. (University of Maryland), Lise Getoor (University of Maryland), Christopher P. Diehl (Johns Hopkins) : : : : : : : 199 Contents vii 16 Learning Approximate MRFs From Large Transactional Data Chao Wang, Srinivasan Parthasarathy (Ohio State University) : : : : : : : : 203 Part IV Panel Discussion 17 Panel Discussion : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 209 Part V Appendix 18 Appendix : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 223 Preface This volume was prepared to share with a larger audience the exciting ideas and work presented at an ICML 2006 workshop of the same title. Network models have a long history. Sociologists and statisticians made major advances in the 1970s and 1980s, culminating in part with a number of substantial data bases and the class of exponential random graph models and related methods in the early 1990s. Physicists and computer scientists came to this domain considerably later, but they enriched the array of models and approaches and began to tackle much larger networks and more complex forms of data. Our goal in organizing the workshop was to encourage a di- alog among people coming from different disciplinary perspectives and with different methods, models, and tools. Both the workshop and the editing of the proceedings was a truly col- laborative effort on behalf of all six editors, but three in particular deserve special recognition. Anna Goldenberg and Alice Zheng were the driving force behind the entire enterprise and Edo Airoldi assisted on a number of the more important arrangements. The editing process involved two stages. We were assisted in the review of initial submissions by a program committee that including the following individuals: David banks, Duke University • Peter Dodds, Columbia University • Lise Getoor, University of Maryland • Mark Handcock, University of Washington, Seattle • Peter Hoff,University of Washington, Seattle • David Jensen, University of Massachusetts, Amherst • Alan Karr, National Institute of Statistical Sciences • Jon Kleinberg, Cornell University • Andrew McCallum, University of Massachusetts, Amherst • Foster Provost, New York University • Cosma Shalizi, Carnegie Mellon University • x Padhraic Smyth University of California, Irvine • Josh Tenenbaum, Massachusetts Institute of Technology • Stanley Wasserman, Indiana University • Following the workshop, all papers went through a second round of review and editing (and a few went through a third round). We are indebted to Caroline Sheedy and Heidi Sestrich who managed the preparation of the final LaTeX manuscript and did a final cleaning and editing of all materials. Without their assistance this proceedings would not exist. Stephen E. Fienberg February 14, 2007 Part I Invited Presentations 1 Structural Inference of Hierarchies in Networks Aaron Clauset1, Cristopher Moore1;2, and Mark E. J. Newman3 1 Department of Computer Science and 2 Department of Physics and Astronomy, University of New Mexico, Albuquerque, NM 87131 USA 3 Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109 USA Summary. One property of networks that has received comparatively little atten- tion is hierarchy, i.e., the property of having vertices that cluster together in groups, which then join to form groups of groups, and so forth, up through all levels of orga- nization in the network. Here, we give a precise definition of hierarchical structure, give a generic model for generating arbitrary hierarchical structure in a random graph, and describe a statistically principled way to learn the set of hierarchical fea- tures that most plausibly explain a particular real-world network. By applying this approach to two example networks, we demonstrate its advantages for the interpre- tation of network data, the annotation of graphs with edge, vertex and community properties, and the generation of generic null models for further hypothesis testing. 1.1 Introduction Networks or graphs provide a useful mathematical representation of a broad variety of complex systems, from the World Wide Web and the Internet to social, biochemical, and ecological systems. The last decade has seen a surge of interest across the sciences in the study of networks, including both em- pirical studies of particular networked systems and the development of new techniques and models for their analysis and interpretation [1, 2]. Within the mathematical sciences, researchers have focused on the statis- tical characterization of network structure, and, at times, on producing de- scriptive generative mechanisms of simple structures. This approach, in which scientists have focused on statistical summaries of network structure, such as path lengths [3, 4], degree distributions [5], and correlation coefficients [6], stands in contrast with, for example, the work on networks in the social and biological sciences, where the focus is instead on the properties of individual vertices or groups. More recently, researchers in both areas have become more interested in the global organization of networks [7, 8]. One property of real-world networks that has received comparatively little attention is that of hierarchy, i.e., the observation that networks often have 4 Clauset, Moore and Newman a fractal-like structure in which vertices cluster together into groups that then join to form groups of groups, and so forth, from the lowest levels of organization up to the level of the entire network. In this paper, we offer a precise definition of the notion of hierarchy in networks and give a generic model for generating networks with arbitrary hierarchical structure. We then describe an approach for learning such models from real network data, based on maximum likelihood methods and Markov chain Monte Carlo sampling. In addition to inferring global structure from graph data, our method allows the researcher to annotate a graph with community structure,