The Structure of Social Networks: Modeling, Sampling, and Inference
Total Page:16
File Type:pdf, Size:1020Kb
The Structure of Social Networks: Modeling, Sampling, and Inference Naghmeh Momeni Taramsari Doctor of Philosophy Department of Electrical and Computer Engineering McGill University Montreal,Quebec April 2017 A thesis submitted to McGill University in partial fulllment of the requirements of the degree of Doctor of Philosophy c Naghmeh Momeni Taramsari, 2017 TABLE OF CONTENTS ABSTRACT . vi ABRÉGÉ . vii ACKNOWLEDGEMENTS . viii PREFACE AND CONTRIBUTION OF AUTHORS . ix LIST OF FIGURES . x LIST OF TABLES . xiii 1 Introduction . 1 1.1 Network Perspective . 1 1.2 Notation and Terminology . 3 1.3 Historical Background on Network Science . 5 1.4 Social Networks and Social Outcomes . 7 1.5 Thesis Outline . 9 1.6 Publications . 12 2 Network Sampling . 13 2.1 Chapter Outline . 13 iii 2.2 Introduction . 13 2.3 Challenges of Sampling Oine Social Networks . 15 2.4 Fixed-choice Designs . 16 2.5 Chapter Summary . 17 3 Paper: Statistical Inference for Fixed-choice Design Incorporating Strong and Weak Ties . 18 4 Additional Simulation Results for Network Sampling . 52 4.1 Chapter Outline . 52 4.2 Introduction . 52 4.3 Transitivity . 53 4.4 Community Structure . 53 4.5 Popularity Bias . 54 4.6 Chapter Summary . 58 5 Introduction to Alter Sampling . 59 5.1 Chapter Outline . 59 5.2 Introduction . 59 5.3 Applications of Alter Sampling . 60 5.3.1 Acquaintance Vaccination . 60 5.3.2 Health Monitoring . 62 5.3.3 Information Diusion on Social Media . 63 5.3.4 Early Detection of Natural Disasters . 64 5.3.5 Health Intervention . 64 5.4 Chapter Summary . 65 6 Paper: Eectiveness of Alter Sampling in Various Social Networks . 66 7 Alter Sampling and the Friendship Paradox . 86 7.1 Chapter Outline . 86 7.2 Introduction . 86 7.3 The Friendship Paradox . 87 7.4 Introducing the Generalized Friendship Paradox . 90 7.5 Chapter Summary . 92 8 Paper: Friendship Paradox and the Inequalities in Social Networks . 94 9 Paper: Generalized Friendship Paradox in Growing Networks . 120 10 Paper: Quantifying and Measuring the Friendship Paradox . 139 iv 11 Future Work . 154 11.1 Multiplex Data . 154 11.2 Additional Socio-centric Data . 154 11.3 Heterogeneous Tie Strength . 155 11.4 Eect of Inequalities in Social Networks on Subjective Well-being . 156 11.5 Interplay between Structural and Non-structural Inequality . 156 REFERENCES . 158 v ABSTRACT Social networks are essential tools for modeling social dynamics. Their structure aects and is aected by the behavior of individuals that constitute them. Many studies have related the structure of social networks to various social and individual outcomes. In many studies, the rst step towards network analysis is to observe the network. If the full network is infeasible to acquire, network sampling methods are employed. Sam- pling oine social networks involves interviewing people. Since respondent fatigue is a pressing problem, standard practice is to ask each respondent only a limited number of names. This throws away much information about the network structure. In this thesis, we focus on the problem of estimating the structural properties of the original social network from such survey data. We provide reliable estimators that incorporate link heterogeneity. We then focus on applications where knowledge over the global structure of the social network is unfeasible, and ecient methods are needed to identify nodes with certain properties without having to sample the network. We focus on a method called Alter Sampling, which was originally introduced in network epidemiology. We demon- strate its eectiveness in various social networks with dierent structural properties. Then we highlight insights that this ubiquitous eectiveness provides about how social networks are organized. We discuss the relations to the so-called Friendship Paradox and its generalized version, and provide metrics to quantify how local structural and non-structural properties of nodes compare with their neighbors. vi ABRÉGÉ La modélisation des dynamiques sociales dépend étroitement sur la structure des réseaux sociaux qui inue sur et est également inuencée par le comportement des individus qui composent le réseau. De nombreuses études constatent qu'il existe des liens serrés entre les réseaux sociaux et divers résultats sur non seulement le plan indi- viduel, mais aussi le plan social. En générale, l'observation du réseau est la première étape de son analyse. Par la suite, si la consitution du réseau complet est impossible à déterminer, les méthodes d'échantillonage en réseaux sont souvent employées. Pour réaliser un échantillonge des réseaux sociaux hors lignes et pour réduire l'eet de la fatigue sur les personnes interrogées, des entretiens sont eectuées desquelles la pra- tique habituelle est de demander un nombre limité de noms, ce qui gâche une bonne partie de l'information sur la structure sous-jacente du réseau. Dans cette thèse, nous nous concentrons sur comment estimer les caractéristiques structurelles du réseau social original à partir de telles données. Ainsi, nous fournissons des estimateurs ables qui intègrent l'hétérogénéité des liens. Motivés par le besoin de développer des méthodes ecaces pour identier des noeuds ayant certaines caractéristiques sans devoir échantilloner le réseau au complet, nous nous avons ensuite tournés vers les réseaux sociaux pour lesquels il est impos- sible de cerner leur structure globale. Nous nous sommes penchés sur la méthode d'échantillonage altérée(ou Alter Sampling en anglais), qui a d'abord été introduit dans le domaine de l'épidémiologie des réseaux, et nous montrons son ecacité dans divers réseaux dont les caractéristiques structurales sont toutes diérentes. Finale- ment, nous soulignons comment l'ecacité de l'échantillonage altérée nous informe sur l'organisation sociale de ces mêmes réseaux. Nous élaborons sur les relations entre le Paradoxe de l'amitié et sa généralisation, et nous proposons des indicateurs pour quantier les caractéristiques (non-)structurales locales par rapport à leurs voisins. vii ACKNOWLEDGEMENTS I express my gratitude towards my adviser Prof. Michael Rabbat for his help and support during my PhD. I thank Prof. Sam Harper for agreeing to be on my PhD committee. I would like to thank Prof. Yannis Psaromiligkos for both being on my PhD committee and kindly agreeing to be the internal examiner of the dissertation. I also thank Prof. June Zhang for kindly agreeing to be the external examiner. I thank the feedback and input of the committee during the dierent stages of my PhD. Moreover, I would like to wholeheartedly express my sincere gratitude to my dear parents and my dear brother for their love and care throughout life, and to my beloved husband for his love and support during the life we have shared. viii PREFACE AND CONTRIBUTION OF AUTHORS This dissertation is an original intellectual product of the author, Naghmeh Momeni Taramsari. The work presented in this dissertation is the result of research conducted between January 2014 and March 2017 at the department of electrical and computer engineering, McGill university, under supervision of Prof. Michael Rabbat. The follow- ing manuscripts have been published in peer-reviewed journals and conferences based on the presented work. 1. N. Momeni, M. Rabbat, Qualities and Inequalities in Online Social Networks through the Lens of the Generalized Friendship Paradox", PloS one 11.2 (2016): e0143633. 2. N. Momeni, M. Rabbat, Measuring the Generalized Friendship Paradox in Net- works with Quality-Dependent Connectivity", Complex Networks VI. Springer International Publishing, 2015. 45-55. 3. N. Momeni, M. Rabbat, Inferring network properties from xed-choice design with strong and weak ties", IEEE Statistical Signal Processing Workshop (SSP), 2016. 4. N. Momeni, M. Rabbat, Generalized Friendship Paradox: An Analytical Ap- proach", International Conference on Social Informatics. Springer International Publishing, 2014. 5. N. Momeni, M. Rabbat, Inferring Structural Characteristics of Networks with Strong and Weak Ties from Fixed-Choice Surveys", accepted to appear in IEEE Transactions on Signal and Information Processing over Networks. doi: 10.1109/TSIPN.2017.2731053 ix LIST OF FIGURES Figure page 11 Example graph to illustrate how clustering coecient is calculated. 5 12 The growth in the relative occurrence of the term `Social Network Analysis' in the corpus of the literature that Google NGram Viewer has indexed. The third word is intentionally added, because only using `social network' would also return result that might have been related to online social media, instead of the theoretical eld of research which is our purpose. Note that in the past 40 years, the relative occurrence has grown over 50 times. 8 31 Schematic illustration of Sampling Setup . 24 32 Dierent compositions of open triads . 29 33 Dierent compositions of triangles . 31 34 Illustrative example of triangle and open triad after sampling . 32 35 Distribution of relative error for the approximation made in calculating E mW .................................. 36 f 0 g 36 Performance of the estimators as a function of N ............ 38 x 37 Performance of estimators for dierent q and B ............. 40 38 Comparing the performance to that of the single-layer collapsed method 41 39 Performance of the estimators for the clustering coecient . 42 310 Performance of the estimator for real oine network data set . 43 311 Standard deviation of estimators via jackknife . 45 312 All possible ways a triangle could be observed . 47 313 All possible ways an open triad could be observed . 48 41 Results for transitivity-driven response procedure . 55 42 Results for community-driven response procedure . 56 43 Results for popularity-driven response procedure . 57 61 Empirical distribution of nodes with gain higher than one in directed networks . 72 62 Empirical distribution of nodes with gain higher than one in undirected networks . 73 63 Distribution of gain for all networks . 74 64 The gain of the estimator as a function of in-degree percentile . 75 65 Performance of estimator for the four network families . 77 81 Distribution of dierent nodal attributes .