A Network-Based Approach to Human Diseases

Network Medicine: A Network-based Approach to Human Diseases

by Susan Dina Ghiassian

B.S. in Physics, Sharif University of Technology
M.S. in Physics, Northeastern University

A dissertation submitted to The Faculty of the College of Art and Science of Northeastern University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

March 19, 2015

Dissertation directed by Albert-László Barabási, Distinguished University Professor

DEDICATION

To Mamanjoon

ACKNOWLEDGMENTS

I would like to thank my advisor, Albert-László Barabási, not only for giving me the opportunity to spend the most productive years of my life (so far!) in his lab but also for proving to me that there is no limit to human inspiration. He taught me to be broad-minded and unbiased in discussing ideas, pragmatic and collaborative in doing research, and confident in defending what I believe in.

I would like to take this opportunity to thank three members of my thesis committee: Daniel Chasman, who has always offered me his guidance during my research; Alain Karma, for teaching me the basics of statistical mechanics; and Alessandro Vespignani, from whom I have learned a great deal and whose contributions to the field of network science have amazed me.

Completion of this dissertation would not have been possible without the help and support of a former member of the lab, Jörg Menche, who patiently taught me all the steps required to carry out successful research. He not only showed me the necessity of honest research but also taught me life lessons about being helpful to others and at peace with yourself. I am grateful to have worked with my wonderful collaborators at CCNR: Sabrina Rabello, Emre Guney, Marc Santolini, Maksim Kitsak, Joseph de Nicolo, Suzanne Aleva, Brett Common and James Bagrow. I would also like to express my gratitude to Joseph Loscalzo, who guided me through to the completion of my research and patiently answered all my questions.
This dissertation is the result of a collaborative effort of many bright collaborators from different institutes: CCNR, Dana Farber Cancer Institute, Brigham and Women’s Hospital, and University Heart Center, Hamburg, Germany. I would like to acknowledge all my collaborators (Mark Vidal, David E. Hill, Sam Pevzner, Anne-Ruxandra Carvunis, Thomas Rolland, Franco Giulianini, Piero Ricchiuto, Christian Mueller, Tajna Zeller, Sasha Singh, Aikawa Masanori, Ramy Arnaout and many more) for making this happen.

I would like to thank my uncle and aunts (Freydoun Ghiassian, Shaheen Ghiassian and Deena Westerby) for their continuous support and encouragement. I would also like to thank my dearest friends and family (Anahita Faham, Fateme Tousi, Amir Taqavi, Samira Faegh, Parnian Boloorizadeh, Dena Saadat, Sara Ansari, Marzieh Haghighi, Parisa Taheri, Noushin Fallahpour, Mona Shahi, Mona Manouchehri, and many more), who have always been by my side, listened to me, shared their experiences and brought out the best in me.

I am blessed to have my biggest role models, my lovely parents (Bahman Ghiassian and Fozia Benaissa), and their endless support. They always inspired me, believed in me and supported me in every way possible. Their kind hearts, bright minds, nice personalities and helping hands have always been my guide throughout my life. I am grateful to have my sister and brother, Yasman and Ehsan, who are always fun, supportive and loving.

My special thanks go to my husband, Razzi Movassaghi, who has been by my side through ups and downs for the past 8 years and made me believe in myself. He is not only the source of my courage and motivation in life; he has also always offered insightful scientific suggestions for my research.

Finally, this work is dedicated to the memory of my beloved grandmother, who loved to learn and always encouraged me to keep learning. She was the best thing this world could have.
ABSTRACT

With the availability of large-scale data, it is now possible to systematically study the underlying interaction maps of many complex systems in multiple disciplines. Statistical physics has a long and successful history in modeling and characterizing systems with a large number of interacting individuals. Indeed, numerous approaches that were first developed in the context of statistical physics, such as the notion of random walks and diffusion processes, have been applied successfully to study and characterize complex systems in the context of network science. Based on these tools, network science has made important contributions to our understanding of many real-world, self-organizing systems, for example in computer science, sociology and economics.

Biological systems are no exception. Indeed, recent studies reflect the necessity of applying statistical and network-based approaches in order to understand complex biological systems, such as cells. In these approaches, a cell is viewed as a complex network consisting of interactions among cellular components, such as genes and proteins. Given the cellular network as a platform, the machinery, functionality and failure of a cell can be studied with network-based approaches, a field known as systems biology.

Here, we apply network-based approaches to explore human diseases and their associated genes within the cellular network. This dissertation is divided into three parts:

(i) A systematic analysis of the connectivity patterns among disease proteins within the cellular network. The quantification of these patterns inspires the design of an algorithm which predicts a disease-specific subnetwork containing as yet unknown disease-associated proteins [1].

(ii) We apply the introduced algorithm to explore the common underlying mechanism of many complex diseases. We detect a subnetwork from which inflammatory processes initiate and result in many autoimmune diseases.

(iii) The last chapter of this dissertation describes the statistical methods, detailed data curation processes and additional analyses performed to accomplish the previous parts.

[1] The contents of this part are published in PLoS Computational Biology.

CONTENTS

Dedication
Acknowledgments
Abstract
Contents

1 Introduction
  1.1 Origin of graph theory
  1.2 Emergence of network science
  1.3 Network science applications in systems biology
  1.4 Emergence of Network Medicine
    1.4.1 Human interactome and complex diseases
    1.4.2 Existing methods for the identification of disease-gene associations
2 A Disease Module Detection (DIAMOnD) algorithm
  2.1 Quantifying interaction patterns of disease proteins within the interactome
  2.2 The DIAMOnD algorithm
    2.2.1 Time complexity
  2.3 DIAMOnD performance and robustness
    2.3.1 Synthetic modules construction
    2.3.2 Estimating the recovery rate
    2.3.3 Analyzing the sensitivity towards perturbations and network noisiness
  2.4 Identifying and validating disease modules
  2.5 Comparison with existing methods
  2.6 Extending the basic DIAMOnD algorithm
  2.7 Discussion
3 Common underlying molecular mechanisms of complex diseases
  3.1 Constructing inflammasome, thrombosome, and fibrosome
    3.1.1 Significant clustering of seed genes within the human interactome
    3.1.2 Effect of biased studies on significant clustering of seed genes
    3.1.3 Modules detection, validation and robustness
    3.1.4 Cross-talk region of the modules
    3.1.5 Biological importance of the endophenotype modules
    3.1.6 The role of endophenotype modules in cardiovascular disease
    3.1.7 The role of endophenotype modules in complex diseases
  3.2 Topological properties of the endophenotype modules
    3.2.1 Central location of inflammatory and fibrotic genes
  3.3 Functionality of detected endophenotype modules using macrophages
    3.3.1 Detection of early and late proteins in response to inflammatory stimulator
    3.3.2 Early proteins may be responsible for triggering late proteins
  3.4 Discussion
4 Data analysis and preparation
  4.1 Human Interactome (HI)
  4.2 Highly studied proteins within the PPI
  4.3 Modular nature of protein-protein interaction network
    4.3.1 Disease-genes associations
    4.3.2 Gene annotations
  4.4 LCC significance
  4.5 Pathways analysis
  4.6 Genetic association analysis
  4.7 Differential expression analysis of cardiovascular risk
  4.8 THP-1 cell culture experiments and proteomics
5 Conclusions and future directions
Bibliography

LIST OF FIGURES

Figure 1 Schematic network representation.
Figure 2 Localization of disease proteins.
Figure 3 Disease proteins forming the largest connected component (LCC).
Figure 4 Significant clustering of disease proteins.
Figure 5 Topological communities and disease proteins.
Figure 6 Failure of topological community detection methods.
Figure 7 Connectivity significance vs. local modularity of disease proteins.
Figure 8 Connectivity significance characterizes disease proteins.
Figure 9 The DIAMOnD algorithm.
Figure 10 Macular degeneration disease module.
Figure 11 Synthetic modules.
Figure 12 Performance evaluation of DIAMOnD.
Figure 13 N-1 analysis.
Figure 14 DIAMOnD robustness.
Figure 15 Biological evaluation of lysosomal storage diseases module.
Figure 16 Biological validation of DIAMOnD across 70 diseases.
Figure 17 DIAMOnD and Random Walk in synthetic and disease modules.
Figure 18 Overall comparison of DIAMOnD and Random Walk.
Figure 19 Schematic representation showing why to assign node weights.
Figure 20 Extending the DIAMOnD algorithm to adopt node weights.
Figure 21 Topological characteristics of seed genes within the HI.
Figure 22 Genetic association of seed genes.
Figure 23 Studying biased studies of networks in seeds clustering.
Figure 24 Biological validation of the detected DIAMOnD genes.
Figure 25 Topological properties of the endophenotypic modules.
Figure 26 Differentially expressed genes within modules.
Figure 27 Tree analysis.
Figure 28 Tree analysis of seed genes and modules.
