A Method for Analysis of Node Position in the Network of Internet Users
Total Page:16
File Type:pdf, Size:1020Kb
A Method for Analysis of Node Position in the Network of Internet Users Katarzyna Musia l Institute of Computer Science Wroclaw University of Technology Thesis submitted for the Degree of Doctor of Philosophy in the Wroclaw University of Technology · 2009 · Contents Acknowledgments xiii Important Symbols xiv Abbreviations xv Abstract xvi 1 Introduction 1 2 Social Networks 7 2.1 RegularSocialNetwork. 7 2.2 NetworkofInternetUsers . 14 2.2.1 Concept of the Network of Internet Users . 15 2.2.2 Taxonomy of the Networks of Internet Users . 19 2.3 Internet Identity — Node of the Network of Internet Users . 21 2.3.1 Concept of the Internet Identities . 21 2.3.2 Individual and Group Internet Identity . 26 2.3.3 Internet Identities Integration . 28 2.4 Internet Relationships — Edges of the Network of Internet Users 30 2.4.1 Concept of the Internet Relationship . 30 2.4.2 Types of the Internet Relationships . 31 2.4.3 Directness of the Internet Relationship . 34 i 2.4.4 Ties ............................ 36 2.5 Examples of the Networks of Internet Users . 38 2.5.1 Electronic Mail Services . 38 2.5.2 InstantMessengers . 38 2.5.3 BlogsServices....................... 39 2.5.4 Social Networking Sites . 39 2.5.5 Multimedia Sharing Systems . 40 2.5.6 AuctionSystems . .. .. 41 2.5.7 SocialSearchEngines. 42 2.5.8 Social Bookmarking and Cataloging . 42 2.5.9 Homepages ........................ 43 2.5.10 Knowledge Sharing Systems . 43 2.5.11 Virtual Worlds and Multiplayer Online Games . 44 2.5.12 Collaborative Authoring Systems — Wikis . 44 2.5.13 FriendOfAFriendProject . 45 2.5.14 Complex Communications Systems . 45 2.5.15 Comparison ........................ 45 3 User Position in Social Network 51 3.1 SocialNetworkAnalysis . 51 3.2 UserPosition ........................... 55 3.2.1 Prestige .......................... 55 3.2.2 Centrality ......................... 57 3.3 Comparison of the User Position Indices . 59 4 Node Position in Network of Internet Users 64 4.1 GeneralConcept ......................... 64 4.2 CommitmentEvaluation . 67 ii 4.3 NodePositionCalculation . 69 4.3.1 PINNodes......................... 70 4.3.2 PINEdges......................... 71 4.3.3 PINHybrid ........................ 72 4.4 Example of Node Position Calculation . 74 4.4.1 The Influence of ε ..................... 74 4.4.2 The Influence of Initial Node Positions . 79 4.4.3 The Convergence of Node Position Method . 80 4.4.4 Example — Conclusions . 83 5 Formal Analysis of the Node Position Method 84 5.1 TotalandAverageNodePosition . 84 5.2 Convergence............................ 86 5.3 IntervalofLimitValues . 89 6 Research 92 6.1 ThurmanNetwork ........................ 93 6.1.1 DataDescriptionandPreparation . 93 6.1.2 Comparison of Node Position with Other Centrality Methods.......................... 95 6.2 EnronNetwork .......................... 99 6.2.1 DataDescriptionandPreparation . 99 6.2.2 Characteristics ofNodePosition . 101 6.3 Wroclaw University of Technology Network . 106 6.3.1 DataDescription andPreparation . 106 6.3.2 Influence of stopping condition τ and ε coefficient . 106 6.3.3 Comparison with Other User Position Indices . 114 6.4 EfficiencyTests ..........................117 iii 6.4.1 Influence of ε on Processing Time . 117 6.4.2 Influence of Network Size on Processing Time . 118 6.4.3 Efficiency of Node Position versus Other User Position Indices...........................126 6.5 Possible Application of Node Position Method . 127 6.5.1 Method for Bridges’ Properties Analysis . 128 6.5.2 Experiments. .. .. .129 7 Conclusions and Future Work 134 References 138 Appendix A 149 Appendix B 151 Appendix C 161 iv List of Figures 2.1 Aregularsocialnetwork . .. .. 9 2.2 The division of social networks based on the type of the rela- tionship .............................. 12 2.3 An example of relationships between people working in a com- pany................................ 13 2.4 The division of social networks based on the type of the com- municationchannel . .. .. 13 2.5 Homogeneous HSN, system–based (SSN), and internet multi- modal social network (ISN) ................... 15 2.6 Examples of homogeneous HSN, system–based (SSN), and in- ternet multimodal social network (ISN) ............ 16 2.7 Realtimevs. nonrealtimenetworks . 20 2.8 Mapping of social entities into internet identities . 22 2.9 The concept of internet identities merging . 26 2.10 Integration of two system–based social networks by means of internet identity merging . 29 2.11 Taxonomy of internet relationships . 32 2.12 Thedirectionofrelationships[33] . 33 2.13 The direct internet relationship in the social network on the Internet .............................. 35 v 2.14 The quasi–direct internet realtionship in the social network on theInternet ............................ 35 2.15 The object–based internet relationship with equal roles: com- mentator (a), and different roles: commentator and author (b)................................. 36 2.16 The indirect internet relationship in the social network on the Internet .............................. 36 2.17 The tie concept in the social network . 37 2.18 Main functions of the social networking site related to rela- tionshipmaintenance . 40 3.1 Methods of social network analysis . 54 4.1 Example of the network of Internet users NIU with the as- signed commitment values . 65 4.2 Distribution of the commitment for an inactive member y equally among all y’sacquaintances . 66 4.3 The network of Internet users that can be extracted from the emailcommunication . 74 4.4 The values of members’ node positions in relation to ε value . 77 4.5 The value of social position in relation to ε ........... 78 4.6 The minimum, maximum, average, and standard deviation of node position calculated for the same community but for different values of ε ........................ 78 4.7 The linear regression for average node position and standard deviation.............................. 79 4.8 The values of node position after every iteration for ε = 0.95 . 81 vi 4.9 The values of the member x’s node position after every itera- tion for various ε ......................... 81 4.10 The convergence of the node position sum for various ε and variousinitialsums . .. .. 82 5.1 The community where individual x has the greatest node po- sition................................ 89 5.2 The chart of the function f(ε) for the network that consists of m Internetusers.......................... 91 5.3 Therangeofthenodepositionvalues . 91 6.1 Graph representation of the classic Thurman network . 93 6.2 The comparison of centrality measures to node position in the Thurmannetwork......................... 96 6.3 The comparison of prestige measures to node position in the Thurmannetwork......................... 96 6.4 The comparison of eigenvector measures with node position in theThurmannetwork ...................... 97 6.5 The values of Kendall’s coefficient for the pairs of rankings fromTable6.4...........................100 6.6 Average NP and NP wT F as well as their standard deviations in the Enron dataset, calculated for different values of ε . 102 6.7 The percentage of users with NP < 1 and NP ≥ 1 within the Enron network in relation to ε ..................102 6.8 The distribution of NP depending on ε .............103 6.9 Percentage of duplicates within the set of node measures, sep- arately for node position with different values of ε, degree prestige (IDC), and degree centrality (ODC) .........103 vii 6.10 The percentage contribution of members with NP ≥ NP wT F and NP < NP wT F within the Enron social network in rela- tion to ε ..............................105 6.11 The mean squared error between NP and NP wT F for the Enron dataset in relation to ε ..................106 6.12 Social network discovered from the email communication be- tween employees of WUT . 107 6.13 Processing time [min] of the PIN edges depending on different values of ε coefficient and stop condition τ ...........109 6.14 Processing time [min] of the PIN edges depending on different values of coefficient ε .......................109 6.15 Processing time [min] of the PIN edges depending on different value stop condition τ ......................110 6.16 Number of required iterations depending on different values of ε coefficient and stopping condition τ ..............111 6.17 Processing time [min] of the PIN edges depending on different values of ε coefficient and stop condition τ ...........112 6.18 Exponential fitting function for the relation between ε and numberofiterations . .113 6.19 Exponential fitting function for the relation between ε and numberofduplicates . .115 6.20 Processing time of the PIN nodes depending on different size ofnetwork.............................120 6.21 Processing time of the PIN edges depending on different size of network ..............................121 6.22 Processing time of the PIN hybrid depending on different size ofnetwork.............................122 viii 6.23 Processing time depending on the number of nodes in the net- workforfixednumber ofedges(50,000). 123 6.24 Processing time depending on the number of edges in the net- workforfixednumberofnodes(50,000) . 125 6.25 Crucial steps of the proposed method of bridges properties analysis ..............................128 6.26 Structure of peripheral nodes in Thurman network and bridges that connect PN with the regular cliques . 131 6.27 Structure of peripheral clique in Thurman network and bridges that connect PC with the regular cliques . 131 7.1 TherelationlayersinFlickr . .152 7.2 Relation layers extracted from contact lists: Rc (a), Rrc (b), and Rcoc (c)............................153