Proximity, Interactions, and Communities in Social Networks: Properties and Applications
PROXIMITY, INTERACTIONS, AND COMMUNITIES IN SOCIAL NETWORKS: PROPERTIES AND APPLICATIONS

By Tommy Nguyen

A Thesis Submitted to the Graduate Faculty of Rensselaer Polytechnic Institute in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY
Major Subject: COMPUTER SCIENCE

Examining Committee:
Boleslaw K. Szymanski, Thesis Adviser
Sibel Adalı, Member
James A. Hendler, Member
Gyorgy Korniss, Member
Mohammed J. Zaki, Member

Rensselaer Polytechnic Institute
Troy, New York
October 2014 (For Graduation December 2014)

© Copyright 2014 by Tommy Nguyen
All Rights Reserved

CONTENTS

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT
ABSTRACT
1. INTRODUCTION
  1.1 Ranking Information in Social Networks
  1.2 Small Worlds and Social Stratification
  1.3 Summary of Contributions & Organization
    1.3.1 Organization
2. LITERATURE REVIEW
  2.1 Ranking Techniques
    2.1.1 Web Conceptualization
    2.1.2 User Data & Trust Models
    2.1.3 Learning to Rank
  2.2 Small-world Problem
    2.2.1 Six Degrees of Separation
    2.2.2 Social Stratification
3. SOCIAL NETWORK ANALYSIS
  3.1 Geography, Co-Appearance, & Interactions
    3.1.1 Data Collection
    3.1.2 Notations & Definitions
    3.1.3 Data Analysis & Results
    3.1.4 Limitations
  3.2 Incorporating Geography into Community Detection
    3.2.1 Clique Percolation Method
    3.2.2 Modularity Maximization
    3.2.3 Speaker-Label Propagation (GANXiS)
  3.3 Contrasting Communities to Null Models
    3.3.1 Techniques for Generating Covers
    3.3.2 Measuring Covers & Communities
    3.3.3 Examining Covers in Gowalla
  3.4 Examining Detected Communities
    3.4.1 Network Community Profile (NCP)
    3.4.2 Link Connectivity Measurements
    3.4.3 Face-to-Face Interactions Measurements
  3.5 Application: Social Relationships & Human Mobility
    3.5.1 Network Congestion in MANETs
    3.5.2 Mobility Generation
    3.5.3 Experimental Congestion Design
    3.5.4 Congestion Simulation Results
  3.6 Application: Long Ties & Economic Development
    3.6.1 A Stochastic Model of Economic Development
    3.6.2 Experimental Results & Discussion
  3.7 Summary of Results
4. SOCIAL RANKING TECHNIQUES
  4.1 Google Buzz & Twitter
    4.1.1 Categories of URLs
    4.1.2 Spreaders & Affected Sets
    4.1.3 Information Distances
    4.1.4 Geographical Distances
    4.1.5 Densities of Social Relationships
    4.1.6 Keyword Similarity
  4.2 Social Ranking Techniques
    4.2.1 PageRank on Social Network
    4.2.2 HITS on Social Network
    4.2.3 Ranking with Maximum Flow
    4.2.4 Variants of Maximum Flow
  4.3 Social Ranking Experiments
    4.3.1 Comparing PageRank & HITS
    4.3.2 Flow Ranking
    4.3.3 Rank Differences
    4.3.4 Rank Distributions
    4.3.5 Rank Validation
  4.4 Summary of Results
5. SOCIAL SEARCHING EXPERIMENTS
  5.1 Attrition, Geography, & Communities
    5.1.1 Modeling Attrition
    5.1.2 Geographical Analysis
    5.1.3 Detecting Communities
  5.2 Experimental Design
    5.2.1 Routing Strategies
    5.2.2 Starter & Target Selections
  5.3 Experimental Results
    5.3.1 Selection & Routing Combinations
    5.3.2 Friends-of-Friends Knowledge Densities
    5.3.3 Distributions of Successful Chains
    5.3.4 Effects of Hubs and Connectors
    5.3.5 Individual and Community Prominence
  5.4 Summary of Results
6. CONCLUSION AND FUTURE WORK
REFERENCES

LIST OF TABLES

1.1 Aspects of SNA & applications
3.1 Data summary of Gowalla network
3.2 Six techniques for generating covers
3.3 Measurements for cover C of the size k
3.4 Detected communities and their sizes
3.5 Measuring spatial conductance
3.6 Measuring face-to-face interactions
3.7 Network simulator ns-2 parameters
3.8 Measuring economic development (Gowalla)
3.9 Measuring economic development (FourSquare)
4.1 Data summary of Google Buzz
4.2 Data summary of Twitter
4.3 Google Buzz (left) & Twitter (right) with geography
4.4 Social relationships densities in Google Buzz
4.5 Social relationships densities in Twitter
4.6 Ranking results of 30 popular URLs in Google Buzz
4.7 Ranking results of 30 random URLs in Google Buzz
4.8 Avg. ranking differences in Google Buzz
4.9 Avg. ranking differences in Twitter
5.1 Summaries of online social networks datasets
5.2 Communities detected by GANXiS
5.3 Prominence of individuals and communities
5.4 Experimental results for Gowalla
5.5 Experimental results for FourSquare
6.1 Aspects of SNA & applications

LIST OF FIGURES

3.1 Geographical spread of 100K checkins in Gowalla
3.2 Friendship is bounded by geographical distance
3.3 Densities of pairs as a function of geographical distance
3.4 Measuring face-to-face interactions (t=30mins, d=1km)
3.5 Generating CTA & FTA covers
3.6 Intra-edge count, boundary-edge count, and geographic diameter of covers
3.7 Contraction, expansion, conductance, and geographic distance of covers
3.8 Communities detected by Clique Percolation Method
3.9 Communities detected by Inference Algorithm
3.10 Communities detected by GANXiS
3.11 Measuring face-to-face interactions among members
3.12 Generating a Markov Model using checkins
3.13 Design of simulation overview
3.14 Traffic congestion in FMM and RWP
3.15 Frequency of pauses using the RWP
3.16 Scaling laws of short and long ties
3.17 Face-to-face interactions of short ties and long ties
3.18 The collective strength of long ties in a simple contagion model
3.19 Distribution of long ties for adopters and non-adopters
3.20 Economic development as a function of idea flow (Gowalla)
3.21 Economic development as a function of idea flow (FourSquare)
3.22 Speedy idea flow as a function of social diversity
4.1 Conceptualization of social ranking
4.2 Categories of popular (a,c) and random (b,d) URLs
4.3 Shortest paths to URLs in Google Buzz (a) and Twitter (b)
4.4 Ultra small-world property from starters to information
4.5 Densities of shortest path lengths from starters to URLs
4.6 Two degrees of spatial concentration
4.7 Four dimensions of social relationships
4.8 CKS for friendship, following, peers, and random pairs
4.9 Graph $G'_p$ for ranking URLs $\{u_1, u_2\}$ with respect to node $p$
4.10 Ranking URLs on Google Buzz
4.11 Ranking URLs on Twitter
4.12 Social ranking with popular URLs on Google Buzz
4.13 Social ranking with random URLs on Google Buzz
4.14 Social ranking with popular URLs on Twitter
4.15 Social ranking with random URLs on.