Evolving Networks: Structure and Dynamics
A Dissertation Presented to the faculty of the School of Engineering and Applied Science, University of Virginia, in partial fulfillment of the requirements for the degree Doctor of Philosophy

by John R. Hott

August 2018

APPROVAL SHEET

This Dissertation is submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Author Signature:

This Dissertation has been read and approved by the examining committee:

Advisor: Worthy Martin
Committee Member: Alfred Weaver
Committee Member: Gabriel Robins
Committee Member: Luther Tychonievich
Committee Member: Jeffrey Holt
Committee Member:

Accepted for the School of Engineering and Applied Science: Craig H. Benson, School of Engineering and Applied Science

August 2018

© 2018 John R Hott

Abstract

Network analysis, especially social network analytics, has become widespread due to the growing amount of linked data available. Many researchers have started to consider evolving networks, i.e. Time-Varying Graphs (TVGs), to begin to understand how these networks change over time. In this dissertation, we expand on current practice in three directions: we define a new concept of "node-identity class" to describe different "lenses" over an evolving network, we develop sampling methods to produce representative static graphs over a network as it evolves, and we utilize social network metrics to produce distributions characterizing the dynamics of the network's evolution. By combining these different techniques, we uncover a change effect in metric value due to network activity across sampling methods and window sizes, and produce a differential measure D(G) that helps signal possibly significant network evolution. We evaluate these techniques on synthetically-generated datasets with prescribed dynamics to show their effectiveness at capturing and depicting those events.

We then apply our techniques to analyze three real-world applications: the Nauvoo Marriage Project, consisting of an evolving Mormon marital network in mid-1800s Nauvoo, IL; the Social Networks and Archival Context Project's historical social-document network; and an ArXiv co-authorship network. In each case, we were able to depict the network's dynamics, highlight periods of network activity for further investigation, and guide domain-specific researchers to new insights. For the Nauvoo Marriage Project, through a comparison of the network across identity lenses, our metrics depicted an increased centrality under the patriarchal lens compared with that of the matriarchal lens. Indeed, the rapidity with which the patriarchal centrality "rebounds" suggests a desire of the Nauvoo community to form a strong patriarchal system.

To everyone who has been an influence, support, and guide.

To my amazing wife Patricia, who I met on this journey. You have been an unwavering support; thank you for the countless hours spent proofreading.

To my advisor Worthy Martin, for providing wisdom and guidance. You inspire me in connecting Computer Science to so many other disciplines across the University.

To Nathan Brunelle, Tommy Tracy, Samee Zahur, and everyone who has graciously proofread my works and given feedback in practice presentations. I am forever grateful.

To Alfred Weaver, Gabriel Robins, Luther Tychonievich, and Jeffrey Holt. Thank you for your feedback, suggestions, and guidance as members of my committee.

To Kathleen Flake, for the opportunity to collaborate on such an interesting motivating application and our associated theological discussions.

To Daniel Pitti and everyone in the SNAC Cooperative, for the opportunity to collaborate on the SNAC project, share in its development, and examine some of its intricacies.

To Sarah, Joy, Shayne, Doug, Lauren, Robbie, Cindy, Keswick, and everyone at IATH.

To my parents, John and Debbie Hott, who have been a constant support and encouragement.

Contents

List of Tables
List of Figures

1 Introduction
  1.1 Related Work
  1.2 Time-Varying Graphs
  1.3 Time

2 Tools and Methods of Analysis
  2.1 Sampling Methods
  2.2 Metrics
    2.2.1 Harmonic Centrality
    2.2.2 Betweenness Centrality
    2.2.3 Degree and Density
    2.2.4 Diameter
  2.3 Metric Properties
  2.4 Identity Functions
  2.5 Application to Evolving Networks
    2.5.1 Metric Change Due to Sampling Size as a Measure of Change
    2.5.2 Metric Derivative as Measure of Change
    2.5.3 Takeaways
  2.6 Implementation
    2.6.1 Time Complexity

3 Synthetic Experiments
  3.1 Experimental Design
  3.2 Experiments
  3.3 Modified Synthetic Analysis
  3.4 Results
  3.5 Sampling Size Effects
    3.5.1 Properties Due to Sampling Size in Synthetic Experiments
    3.5.2 Aliasing Effect Based on Sampling Size
    3.5.3 Dynamics Through Sampling Size Comparison
  3.6 Dynamics Through Metric Derivatives
  3.7 Implications of Experiments

4 Real-World Experiments
  4.1 Nauvoo Marriage Project
    4.1.1 Experimental Design
    4.1.2 Results
    4.1.3 Discussion
  4.2 Social Networks and Archival Context Project
    4.2.1 Experimental Design
    4.2.2 Results
    4.2.3 Discussion
  4.3 ArXiv.org Co-Authorship Network
    4.3.1 Overview
    4.3.2 Results
  4.4 Real-World Summary

5 Visualization Extensions: Nauvoo
  5.1 Related Visualizations
  5.2 Visualizing Family Units: Chords
  5.3 Visualizing Lineages: Lineage Flow
  5.4 Extensions
    5.4.1 Use of Color
    5.4.2 Mouseover Focus
    5.4.3 Time Sliders
  5.5 Evaluation
    5.5.1 Implementation Details
    5.5.2 Reception
  5.6 Future Work on Visualizations
    5.6.1 Applications to Other Domains
  5.7 Conclusion

6 Conclusion
  6.1 Future Work
    6.1.1 Theoretic and Empirical Studies
    6.1.2 Computational and Implementation Challenges
    6.1.3 Cross-domain Applications

Bibliography

Appendix
  A Publication List
  B Table of Notation
  C Additional Calculations
    C.1 Harmonic Centrality
  D Additional Results of Sampling Size Effects
    D.1 Unconstrained Networks
  E Implementation Details
    E.1 Time-Varying Graph: TemporalGraph
    E.2 Time-Varying Graph: TemporalVertex
    E.3 Time-Varying Graph: TemporalEdge
  F Additional Real-World Results
    F.1 Nauvoo Marriage Project

List of Tables

3.1 Baseline metric values each of the γ0 initial densities. Metrics remain constant across time under both centrality definitions, but evenly spaced for harmonic centrality and an exponential decline for betweenness centrality.
3.2 Union Sampling: Increase in average number of edges for each η and λt combination over the constrained network with γ0 = 0.5.
3.3 Union Sampling: Average sustained percentage of the additional 2λt · η edges for each η and λt combination over the constrained network with γ0 = 0.5.
3.4 Intersection Sampling: Decrease in average number of edges for each η and λt combination under the constrained network with γ0 = 0.5.
3.5 Intersection Sampling: Average sustained percentage of the 2λtη less edges for each η and λt combination over the constrained network with γ0 = 0.5.
3.6 D(G) values above the 0.02 threshold and its components for a synthetic network without a guaranteed minimal star and µ = N.
4.1 D(G) values above the 0.015 threshold and its components for the patriarchal identity lens.
4.2 D(G) values above the 0.015 threshold and its components for the matriarchal identity lens.
4.3 D(G) values above the 0.015 threshold and its components for the pairwise binary identity lens.