Link prediction and link detection in sequences of large social networks using temporal and local metrics

A dissertation submitted to the Department of Computer Science at the University of Cape Town in fulfilment of the requirements for the degree of Master of Science.

By Richard Jeremy Edwin Cooke
November 2006

Supervised by Dr. Anet Potgieter and co-supervised by Dr. Kurt April

© Copyright 2006 by Richard Cooke

Abstract

This dissertation builds upon the ideas introduced by Liben-Nowell and Kleinberg in The Link Prediction Problem for Social Networks [42]. Link prediction is the problem of predicting between which unconnected nodes in a graph a link will form next, based on the current structure of the graph. The following research contributions are made:

• Highlighting the difference between the link prediction and link detection problems, which have been implicitly regarded as identical in current research. Although the metric values of hidden links and forming links differ with very high statistical significance, a machine learning system using traditional metrics could not distinguish between them in an initial experiment. However, they could be distinguished in a “simple” network (one where traditional metrics can be used for prediction successfully) using a combination of new graph analysis approaches.
• Defining temporal metric statistics by combining traditional statistical measures with measures commonly employed in financial analysis and traditional social network analysis. These metrics are calculated over time for a sequence of sociograms. It is shown that some of the temporal extensions of traditional metrics increase the accuracy of link prediction.
• Defining traditional metrics using different radii to those at which they are normally calculated.
  It is shown that this approach can increase the individual prediction accuracy of certain metrics, marginally increase the accuracy of a group of metrics, and, by computing metrics using smaller radii, greatly increase metric computation speed without sacrificing information content. It also solves the “distance-three task” (that common neighbour metrics cannot predict links between nodes at a distance greater than three).
• Showing that the combination of local and temporal approaches to link prediction can lead to very high prediction accuracies. Furthermore, in “complex” networks (ones where traditional metrics cannot be used for prediction successfully), local and temporal metrics become even more useful.

Acknowledgements

I thank my supervisors, Anet and Kurt, for their guidance and advice; the South African National Research Foundation for partly funding this research; my family for providing additional funding; Petter Holme, the owners of Pussokram and the owners of Netcash for providing the data on which I tested my ideas; and finally the makers of Weka and Jung for creating and freely distributing useful Java packages.

Contents

Chapter 1. Introduction
  1.1. Research overview
  1.2. Motivation
  1.3. Contributions of this dissertation
  1.4. Experimental approach
  1.5. Evaluation criteria
  1.6. Scope and limitations
  1.7. Dissertation outline

Chapter 2. Social network analysis background
  2.1. Social networks
  2.2. Graph theory
  2.3. Links in social networks
    2.3.1. Small world networks
    2.3.2. Scale-free networks
    2.3.3. Homophily and assortative mixing
  2.4. Social network analysis
    2.4.1. Equivalence
    2.4.2. Web link analysis
  2.5. Metrics
  2.6. Link prediction
    2.6.1. Existing link prediction techniques
    2.6.2. Link prediction application papers
    2.6.3. Link completion
  2.7. Anomalous link discovery
  2.8. Link detection
  2.9. Dynamic network analysis
    2.9.1. Temporal analysis
  2.10. Applications of social network analysis
    2.10.1. Search in social networks
    2.10.2. Dark networks
    2.10.3. Content recommendation systems
    2.10.4. Marketing
    2.10.5. Ecology
    2.10.6. Specific applications of link prediction
  2.11. Social data sources
  2.12. Computational complexity of social network analysis
  2.13. Decentralisation (distributed intelligence) theory
  2.14. Dealing with complexity in sociograms
  2.15. Conclusions

Chapter 3. Research methodology
  3.1. Data format
  3.2. Metric calculation
    3.2.1. Basic terminology
    3.2.2. Metric definitions
    3.2.3. Derived metrics
    3.2.4. Metrics useful for link prediction
  3.3. Input transformations
  3.4. Data modelling
  3.5. False positives and instance weighting
  3.6. Regression
    3.6.1. Linear regression
    3.6.2. Updating linear