Statistical Inference for Propagation Processes on Complex Networks
Dissertation for the award of the doctoral degree in mathematics and natural sciences "Doctor rerum naturalium" of the Georg-August-Universität Göttingen, in the doctoral program "PhD School of Mathematical Sciences" (SMS) of the Georg-August University School of Science (GAUSS)

submitted by Juliane Manitz from Berlin-Lichtenberg

Göttingen, 2014

Thesis Committee
Prof. Dr. Anita Schöbel, Institut für Numerische und Angewandte Mathematik, Georg-August-Universität Göttingen
Prof. Dr. Thomas Kneib, Institut für Statistik und Ökonometrie, Georg-August-Universität Göttingen

Members of the Examination Board
Referee: Prof. Dr. Anita Schöbel, Institut für Numerische und Angewandte Mathematik, Georg-August-Universität Göttingen
Co-referee: Prof. Dr. Thomas Kneib, Institut für Statistik und Ökonometrie, Georg-August-Universität Göttingen

Further Members of the Examination Board
Prof. Dr. Michael Höhle, Institut für Mathematik, Stockholm Universität
Prof. Dr. Andrea Krajina, Institut für Mathematische Stochastik, Georg-August-Universität Göttingen
Prof. Dr. Dominic Schuhmacher, Institut für Mathematische Stochastik, Georg-August-Universität Göttingen
Prof. Preda Mihailescu, Mathematisches Institut, Georg-August-Universität Göttingen

Date of the oral examination: 12 June 2014

Acknowledgement

This thesis would not have been possible without the support and encouragement of many people. I am therefore very glad to have the opportunity here to express my deep gratitude. As there are too many to name them all, I would like to highlight a few: First of all, I would like to thank Thomas Kneib and Anita Schöbel for their brilliant supervision. They gave me the space and trust to develop my research ideas, yet I never felt alone, knowing that they were by my side with their valuable scientific advice.
Furthermore, they gave me excellent support in coping with formal requirements and were there with encouragement when it was most urgently needed. Additionally, I want to thank both of them for making the prospect of a research stay in New Zealand possible.

I also want to thank my coauthors for fruitful collaborations. Michael Höhle was already a mentor to me before I even considered doing a PhD. He patiently guided me through my first publication. I am thankful for his enduring contribution of brilliant ideas, constructive criticism and nearly scary expertise. Furthermore, I would like to thank Saskia Freytag for extensive discussions, as well as for elaborating and reflecting on my research at every step along the way. Her enduring support and encouragement, and her persistent belief in me, helped me to get through all the ups and downs of the daily academic routine. Our joint publication was not only very efficient, but also provided great entertainment value, and not only because of the obligatory backgammon match. You have not only supported me with your expertise, but are also a valuable friend! I am also grateful to Jonas Harbering, who introduced me to the world of optimization in public transportation systems, meticulously corrected my mathematical notation and provided refreshment through the consumption of many caffeine-containing hot beverages. Beyond that, Marie Schmidt improved my work and the manuscript substantially with her critical discussion and her precise comments, which allowed no gaps in the argumentation. Finally, the outcome of my research would not have been the same without Dirk Brockmann and Martin Schlather. They initiated and influenced large parts of this thesis with great ideas and scientific expertise.

My research has been made possible by generous funding from the German Research Foundation within the research training group "Scaling Problems in Statistics" (RTG 1644).
This also included two inspiring research stays at Northwestern University in Chicago and the possibility to travel to many stimulating conferences and workshops.

I want to thank each and every one of the Chairs of Statistics and Econometrics and its branches for the pleasant work atmosphere, the complicated system for arranging a reliable coffee supply, and of course the welcome distraction of long-lasting coffee breaks filled with nonsense talk, competitions for the challenge of the month, and arrangements for barbecue meetings. Furthermore, I give thanks to my companions in the RTG 1644 for the regular subject-related and personal exchange about the challenges a PhD involves.

For proofreading and helpful comments, I want to thank Daniel Adler, Mandy Becker, Alexander Brandt, Jan-Wilke Brandt, Andreas Mayr, Britta Oppermann, Hauke Rennies, Benjamin Säfken, Andrea Wiencierz, Lucie Wink, and especially Christopher Gruber. Additionally, Simone Maxand was a fantastic companion, spending countless hours with motivation and patience on the preparation of my thesis defense.

I am also grateful to my friends and companions, who supported me morally and mentally with the small things of the daily routine (partly see Figure 2.2). Special thanks to Jan-Wilke Brandt for his encouragement and for bearing my temper tantrums of desperation with love and patience, Jan Fahrenholz and Daniel Adler for coping with computational issues and for long-lasting Sunday brunches with a difference, Britta Oppermann for psychological support, concert visits and sharing the tradition of "Tatort" as preparation for the beginning of the new week, and Mandy Becker for comfort food and relaxing evenings with jigsaw puzzles.
Last but not least, a special thanks to my family: Many heartfelt thanks to my dad Frank, my little sister Caroline and my grandmas Margarete and Siegrid. Your unconditional support, your proud looks and your steadfast belief in my success have driven me forward over the years. Your numerous care packages from the East, filled to the brim with Knusperflocken and other treats, were always motivating provisions during the never-ending hours in the office. Thank you for making this path possible for me.

Thanks to everyone, as well as to everyone I have not mentioned: All of you helped me to overcome the hurdles on the road to completing this thesis.

Abstract

Scientists from various research fields have discovered the advantages of network-centric analysis, which captures complex systems as networks and allows them to be represented as a collection of nodes connected by links. Currently available network-theoretic methods mainly focus on the descriptive analysis of network topology. In this thesis, different approaches for drawing inferences about propagation processes on complex networks are proposed. These processes influence quantities of interest at the network nodes and are described by a collection of random variables. The developed approaches are motivated by real-world problems ranging from food-borne disease dispersal to the propagation of train delays and the regularization of genetic effects. Firstly, dynamic metapopulation modeling is used for the development of a general food-borne disease model, which integrates the local disease dynamics with the network-based dispersal of contaminated food. The simplification of the ordinary differential equation system for the proportion of susceptible, infected and recovered individuals in each district and the derivation of its solutions provide the opportunity to efficiently simulate a variety of realistic epidemics.
Secondly, an explorative approach for fast and efficient origin detection of propagation processes is proposed. Based on a network-based redefinition of geodesic distance, complex spreading patterns can be mapped onto simple, regular wave propagation patterns if the process origin is chosen as the reference node. This approach is successfully applied to the 2011 EHEC/HUS outbreak in Germany, and its good performance is confirmed in diverse outbreak scenarios simulated with the introduced dynamic model for food-borne diseases. The results suggest that our method could become a useful supplement to ordinary time-consuming outbreak investigations. Moreover, this explorative approach is generalized to the problem of source train delay identification in railway systems. Extensive simulation studies mimicking various propagation mechanisms indicate good performance and promise the general applicability of the source detection approach to propagation processes in a wide range of other applications. To demonstrate the analysis of processes on complex networks from a probabilistic perspective, a kernel-based method is used. A novel kernel based on network interactions is suggested for the logistic kernel machine test. This kernel allows seamless integration of biological knowledge and pathway information into the analysis of data from genome-wide association studies. Applications to case-control studies for lung cancer and rheumatoid arthritis demonstrate the ease of implementation and the efficiency of the proposed method. Altogether, the results from the proposed approaches demonstrate that network-theoretic analysis of propagation processes can contribute substantially to the evaluation of diverse problems in various research fields.

Summary

The methods of network theory enjoy growing popularity because they allow complex systems to be represented by networks.
These are captured simply as a set of nodes connected by edges. Currently available methods are mainly limited to the descriptive analysis of network structure. In the present thesis,