On the Spectral Evolution of Large Networks (Online Version)
On the Spectral Evolution of Large Networks

Jérôme Kunegis
Institute for Web Science and Technologies
University of Koblenz-Landau
[email protected]

November 2011

Dissertation approved by the doctoral committee of Department 4: Computer Science of the University of Koblenz-Landau for the award of the academic degree Doktor der Naturwissenschaften (Dr. rer. nat.).
PhD thesis at the University of Koblenz-Landau.

Date of the doctoral defense: 9 November 2011
Chair of the doctoral committee: Prof. Dr. Karin Harbusch
Reviewer: Prof. Dr. Steffen Staab
Reviewer: Prof. Dr. Christian Bauckhage
Reviewer: Prof. Dr. Klaus Obermayer

Abstract

In this thesis, I study the spectral characteristics of large dynamic networks and formulate the spectral evolution model. The spectral evolution model applies to networks that evolve over time, and describes their spectral decompositions such as the eigenvalue and singular value decomposition. The spectral evolution model states that over time, the eigenvalues of a network change while its eigenvectors stay approximately constant.

I validate the spectral evolution model empirically on over a hundred network datasets, and theoretically by showing that it generalizes a number of known link prediction functions, including graph kernels, path counting methods, rank reduction and triangle closing. The collection of datasets I use contains 118 distinct network datasets. One dataset, the signed social network of the Slashdot Zoo, was specifically extracted during work on this thesis. I also show that the spectral evolution model can be understood as a generalization of the preferential attachment model, if we consider growth in latent dimensions of a network individually.

As applications of the spectral evolution model, I introduce two new link prediction algorithms that can be used for recommender systems, search engines, collaborative filtering, rating prediction, link sign prediction and more.
The first link prediction algorithm reduces to a one-dimensional curve fitting problem from which a spectral transformation is learned. The second method uses extrapolation of eigenvalues to predict future eigenvalues.

As special cases, I show that the spectral evolution model applies to directed, undirected, weighted, unweighted, signed and bipartite networks. For signed graphs, I introduce new applications of the Laplacian matrix to graph drawing and spectral clustering, and describe new Laplacian graph kernels. I also define the algebraic conflict, a measure of the conflict present in a signed graph based on the signed graph Laplacian. I describe the problem of link sign prediction spectrally, and introduce the signed resistance distance. For bipartite and directed graphs, I introduce the hyperbolic sine and odd Neumann kernels, which generalize the exponential and Neumann kernels for undirected unipartite graphs. I show that the problems of directed and bipartite link prediction are related by the fact that both can be solved by considering spectral evolution in the singular value decomposition.

Zusammenfassung (Summary)

In this doctoral thesis, I describe the spectral behavior of large, dynamic networks and formulate the spectral evolution model. The spectral evolution model describes the growth of networks that change over time, and characterizes their eigenvalue and singular value decompositions. The spectral evolution model states that over time, the eigenvalues of a network grow while the eigenvectors remain nearly constant.

I validate the spectral evolution model empirically using over one hundred network datasets, and theoretically by showing that it generalizes a number of known link prediction algorithms, including graph kernels, path counting methods, rank reduction and triangle closing. The collection of datasets I use contains 118 distinct datasets. One dataset, the social network with negative edges of the Slashdot Zoo, was extracted specifically while writing this thesis. I also show that the spectral evolution model can be understood as a generalization of the preferential attachment model when growth in latent dimensions is considered individually.

As applications of the spectral evolution model, I introduce two new link prediction algorithms that can be used in recommender systems, search engines, collaborative filtering, rating prediction, link sign prediction and more. The first link prediction algorithm yields a one-dimensional curve fitting problem from which a spectral transformation is learned. The second method uses extrapolation of eigenvalues to predict future eigenvalues.

As special cases, I show that the spectral evolution model can be extended to directed, undirected, weighted, unweighted, signed and bipartite graphs. For signed graphs, I introduce new applications of the Laplacian matrix to graph drawing and spectral clustering, and describe new Laplacian graph kernels that can be applied to signed graphs. To this end, I define the algebraic conflict, a measure of the conflict present in a signed graph that is based on the signed Laplacian matrix. I describe the problem of link sign prediction spectrally, and introduce the signed resistance distance. For bipartite and directed graphs, I introduce the hyperbolic sine and odd Neumann kernels, which generalize the exponential and Neumann kernels for undirected unipartite graphs. I also show that the problems of directed and bipartite link prediction are related, in that both can be solved by considering the evolution of the singular value decomposition.

Acknowledgments

I could not have written this PhD thesis without help and support from many people.

I thank Narcisse Noubissi Noukumo, who got me started with collaborative filtering at the DAI Lab. Martin Mehlitz and Dragan Milošević helped me get started with writing papers in my first months as a PhD student. Stephan Schmidt was my collaborator on all things related to signed graph theory. Andreas Lommatzsch helped me during the time when I got my first papers published at conferences. I thank you all for getting me interested in science. I wouldn't even have started writing this thesis if it weren't for your encouragement.

Christian Bauckhage helped me choose the right topics of research early on in my PhD program, and generally encouraged me throughout my work. Robert Wetzker and Nicolas Neubauer were influential in my study of folksonomies and other multipartite networks. Stephan Spiegel has helped me make the most out of my papers. Damien Fay pushed my work in a statistically grounded direction, and was influential in everything related to the graph Laplacian and network evolution.

I thank Stephan Schmidt, Jan Clausen, Antje Schultz, Thomas Gottron, René Pickhardt and Julia Preusse for their mathematical insight into all areas of graph theory, linear algebra and spectral analysis that I encountered during my PhD program. You're the reason that this thesis is based, I hope, on solid mathematical foundations.

Till Plumbaum, Winfried Umbrath, Torsten Schmidt and Barış Karataş helped me take a more application-oriented approach to machine learning problems. Alan Said and Ernesto William De Luca have taught me how to connect with other scientists. Jürgen Lerner helped me understand how visualization of datasets is tightly connected to link prediction.
Your perspective has made it possible for me to disseminate my work to a larger audience at conferences and other events.

I thank Christian Banik, Stephan Spiegel and Julia Preusse for choosing to write their diploma theses under my supervision, validating my work and applying it to new datasets and applications.

Klaus Obermayer helped me keep going with my work and encouraged me to finish fast. Steffen Staab has given me the opportunity, time and position to follow my vision and complete this thesis. I thank Sebastian Stober and Wolfgang Nejdl for allowing me to give talks related to my dissertation. I thank Sergej Sizov for support in the later stages of my research.

I explicitly thank my coauthors Dragan Milošević, Stephan Schmidt, Martin Mehlitz, Christian Bauckhage, Andreas Lommatzsch, Stephan Spiegel, Alan Said, Ernesto William De Luca, Damien Fay, Jürgen Lerner, Till Plumbaum, Winfried Umbrath, Barış Karataş, Torsten Schmidt, Arifah Che Alhadi, Nasir Naveed, Maciek Janik, Thomas Gottron, Ansgar Scherp and Steffen Staab. Without your cooperation and patience in writing papers, many parts of this thesis would not have been possible.

I thank everyone who agreed to review or proofread this thesis: Jan Clausen, Damien Fay, Klaas Dellschaft, Antje Schultz, Renata Dividino, René Pickhardt, Andreas Lommatzsch, Tina Walber, Julia Preusse, Gerd Gröner, Christoph Ringelstein, Alexander Korth and Tobias Walter. Your input was valuable and has contributed directly to the quality of this thesis.

I thank John Shawe-Taylor, Thomas Zaslavsky and Robert West for helpful input on the Neumann kernel, on the algebraic theory of signed graphs and on estimating power-law exponents. I thank Rabeeh Abbasi, Christian Hachenberg, Felix Schwagereit,