Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias
Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias

by

Gergana Assenova Bounova

B.S., Theoretical Mathematics, Massachusetts Institute of Technology (2003)
B.S., Aeronautics & Astronautics, Massachusetts Institute of Technology (2003)
S.M., Aeronautics & Astronautics, Massachusetts Institute of Technology (2005)

Submitted to the Department of Aeronautics and Astronautics in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 2009

© 2009 Gergana A. Bounova, All rights reserved.

The author hereby grants to MIT the permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part.

Author:
Department of Aeronautics and Astronautics
February 27, 2009

Certified by:
Prof. Olivier L. de Weck
Associate Professor of Aeronautics and Astronautics and Engineering Systems
Thesis Supervisor

Certified by:
Prof. Christopher L. Magee
Professor of the Practice of Mechanical Engineering and Engineering Systems

Certified by:
Dr. Daniel E. Whitney
Senior Research Scientist, Center for Technology, Policy and Industrial Development
Senior Lecturer in Engineering Systems and Mechanical Engineering

Accepted by:
Prof. David Darmofal
Associate Department Head
Chair, Committee on Graduate Students

Topological Evolution of Networks: Case Studies in the US Airlines and Language Wikipedias

by Gergana Assenova Bounova

Submitted to the Department of Aeronautics and Astronautics on February 27, 2009, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

This thesis examines the topology of engineering systems and how that topology changes over time. Topology refers to the relative arrangement and connectivity of the elements of a system. We review network theory relevant to topological evolution and use graph-theoretical methods to analyze real systems, represented as networks. Using existing graph generative models, we develop a profile of canonical graphs and tools to compare a real network to that profile. The developed metrics are used to track topology changes over the history of real networks.

This theoretical work is applied to two case studies. The first discusses the US airline industry in terms of routes. We study various airlines and segments of the industry statistically and find commonly occurring patterns. We show that there are topology transitions in the history of airlines in the period 1990-2007. Most airline networks have similar topology and historical patterns, with the exception of Southwest Airlines. We show mathematically that Southwest’s topology is different. We propose two heuristic growth models, one featuring hub-seeding derived from the underlying patterns of evolution of JetBlue Airways and one featuring local interconnectedness, derived from the patterns of growth of Southwest. The two models match the topologies of these airlines better than canonical models over time. Results suggest that Southwest is becoming more centralized, closer to the hub-spoke topologies of other airlines.
Our second case study discusses the growth of language Wikipedia networks, where nodes are articles and hyperlinks are the connections between them. These knowledge networks are subject to different constraints than air transportation systems. The topology of these networks and their growth principles are completely different. Most Wikipedias studied grow by coalescence, with multiple disconnected thematic clusters of pages growing separately and over time, converging to a giant connected component via weak links. These topologies start out as simple trees, and coalesce into sparse hierarchical structures with random interlinking. One striking exception is the history of the Chinese Wikipedia, which grows fully connected from its inception. We discuss these patterns of growth comparatively across Wikipedias, and in general, compared to airline networks.

Our work suggests that complex engineering systems are hybrids of pure canonical forms and that they undergo distinct phase transitions during their evolution. We find commonality among systems and uncover important differences by learning from the exceptions.

Thesis Supervisor: Prof. Olivier L. de Weck
Title: Associate Professor of Aeronautics and Astronautics and Engineering Systems

Acknowledgments

I chose to stay at MIT for a doctoral degree because of Professor Olivier de Weck. I recognized early that he would be a great advisor. I know my peers share this opinion. In 2006 we nominated him for the best graduate student advisor award (Frank E. Perkins award) at MIT, and he deservedly won. Oli, thank you for letting me pursue research that I care about, and for supporting me in often taking an unconventional route. I admire your energy, kindness and concern for the well-being of students.

I would like to thank the rest of my committee, Prof. Christopher Magee and Dr. Daniel Whitney for their help, guidance, criticism and encouragement.
Especially, I am grateful to Dan Whitney, for many interesting discussions, help with research directions and multiple revisions on my thesis till the very end. Thank you.

Thanks to Dr. Philippe Bonnefoy for helpful feedback on airline networks, and multiple revisions on my thesis.

The de Weck group throughout the years and the 33-409 cluster have been an integral part of my graduate school experience. I enjoyed working and sharing experiences with Kalina Galabova, Erica Gralla, Chris Graff, Ariane Chepko, Nii Armar, Jaemyung Ahn, Ryan Boas and Wilfried Hofstetter. Many thanks to Wilfried and Luca Bertuccelli for helping with my defense presentation.

Some of the best moments at MIT I spent playing volleyball. I am grateful for the existence of the MIT Women’s Volleyball Club. Thanks to Tony, Deb, Ellen, Rahmat, Paola and Tracy and a hundred other people...! It’s been great playing with you and winning some tournaments!

Thanks to everyone at St Mary’s and all my friends who have kept me sane and made me who I am - Jaisel, Nodari, Bistra, Baber, Thi and Francois, Frances, Angelita and Sophocles. Many thanks to Anna Custo, for her friendship, encouragement, and for lending me some time on the MGH supercluster when the motif search was taking months to compute.

Finally, I would like to thank my whole family, my parents, grandparents and brother. My father inspired me to become an engineer, and to be curious about how things work, and my mother taught me how to think unconventionally. I am lucky to have my brother, who has made significant contributions to this thesis. He rewrote the motif search algorithm and compiled it in C, which made it a hundred times faster. He also parsed the static Wikipedia dumps into network snapshots that I could analyze. Without him, the Wikipedia Chapter would not be there. Dimo, I am proud to be your sister.

This thesis is dedicated to my grandmother, Gina, the most hard-working and selfless person I know.

Contents

0 Introduction
  0.1 Complex Systems Change
    0.1.1 Growth and Complexity
    0.1.2 Visualization of Complex Data
  0.2 Complex Systems and Topology
  0.3 Complex Systems and Topology Evolution
  0.4 Airline Networks
  0.5 Research Goals. Hypotheses
    0.5.1 Network Topology and Topology Evolution
    0.5.2 Airline Networks
    0.5.3 Wikipedia
  0.6 Thesis Roadmap

1 Network Theory Review
  1.1 Network Theory Basics
    1.1.1 Introduction
    1.1.2 Types of simple graph representations
  1.2 Statistics on Graphs
    1.2.1 Size and density - nodes, edges, average degree
    1.2.2 Node centrality measures
    1.2.3 Degree distributions
    1.2.4 Clustering coefficient
    1.2.5 Degree correlation / Assortativity
    1.2.6 The S-metric; scale free graphs
    1.2.7 Distances - average path length, diameter
  1.3 Modularity, Motifs, Coarse-graining
    1.3.1 Modularity
    1.3.2 Subgraphs. Motifs
    1.3.3 Coarse-graining
  1.4 Evolution of Networks
    1.4.1 Random graph models
    1.4.2 Node-degree centric models
    1.4.3 Spatial distribution algorithms
    1.4.4 Node-copying and function preservation
    1.4.5 Module-copying and growth by accretion
    1.4.6 Edge-centric models
  1.5 Conclusion

2 Network Topology
  2.1 Statistical Indicators for Topology
    2.1.1 Canonical topologies
    2.1.2 Topology spectrum
    2.1.3 Statistics for JetBlue Airways, August 2007
    2.1.4 Degree-Betweenness Relationship
    2.1.5 JetBlue August 2007 and degree distributions
  2.2 Graph Similarity (Hubs/Authorities Comparison to Canonical Networks)
    2.2.1 Graph similarity basics
    2.2.2 Graph similarity measure assessment
    2.2.3 JetBlue 8/2007 graph similarity
  2.3 Modularity and Topology (JetBlue 8/07)
    2.3.1 Newman-Girvan: modularity using betweenness
    2.3.2 The Newman eigenvector method
  2.4 Motifs - The Building Blocks
  2.5 Coarse-graining the JetBlue 8/2007 network
  2.6 Conclusion and Discussion

3 Airline Networks
  3.1 Introduction
    3.1.1 Trends in the airline industry
    3.1.2 The hub-spoke phenomenon
    3.1.3 Literature on airline networks
    3.1.4 Dynamics drivers in the airline industry
  3.2 Airline Data
    3.2.1 Data source and slicing
    3.2.2 Down-selection of the data slices
  3.3 Topology Measures
    3.3.1 Statistical indicators for topology
    3.3.2 Topology profiles
    3.3.3 Degree distributions
    3.3.4 Motif analysis
  3.4 Conclusion

4 Evolution of US Airline Route Networks
  4.1 Graph-theoretical statistics over time
  4.2 Detecting changes in topology
  4.3 Comparing topology to canonical networks over time