Statistical Data Integration in Translational Medicine
Total Page:16
File Type:pdf, Size:1020Kb
TECHNISCHE UNIVERSITAT¨ MUNCHEN¨ Wissenschaftszentrum Weihenstephan f¨urErn¨ahrung,Landnutzung und Umwelt Statistical data integration in translational medicine Linda Carina Krause Vollständiger Abdruck der von der Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzende/-r: Prof. Dr. Dietmar Zehn Pr¨ufende/-rder Dissertation: 1. Prof. Dr. Dr. Fabian Theis 2. Prof. Dr. Bernhard K¨uster Die Dissertation wurde am 28.03.2019 bei der Technischen Universit¨atM¨unchen eingereicht und durch die Fakult¨atWissenschaftszentrum Weihenstephan f¨ur Ern¨ahrung,Landnutzung und Umwelt am 27.09.2019 angenommen. Acknowledgment I would like to thank Fabian J. Theis for being my doctoral supervisor and inviting me to come to Munich to work as a doctoral researcher in the rich scientific environment at his institute. I am deeply grateful to my supervisor Nikola S. Mueller for her continuous guidance, her steady support and scientific input. I thank my co-supervisor Stefanie Eyerich for discussing all my results in detail, being open to my ideas and introducing me to the exciting world of immunology. Further, I would like to thank Prof. Dr. Bernhard K¨usterfor being in my thesis committee and Prof. Dr. Dietmar Zehn for being the chair of this committee. I would like to thank all former and current members of the computational cell maps group for great discussion rounds, interesting journal clubs and also for the fun we had together during our events and retreats. A special thanks to all the long-term great room mates I had over the years: beginning with Ivan Kondoversky, Julia S¨ollner and Norbert Krautenbacher, followed by Eudes Barbosa and Karolina Worf and finally Christoph Ogris joined us in room 022. I always enjoyed going to the office and an important part of me being so happy there, were all you! Thank you for tolerating my aromatic teas and urge for fresh air. I would like to thank all former and current members of the Institute of Computational Biology for making the place a warm and inspiring place to work. Especially I would like to thank my daily \lunch round" girls Karolina Worf, Kiki Do, Elisa Benedetti, Alida Kindt, Anna Fiedler and Carolin Loos. A special thanks to the inofficial, best thesis writers' club! I would also like to thank Bettina Knapp who helped me getting started with my PhD at the very beginning. Further, I would like to thank all of my collaboration partners for gathering the data, coming up with exciting scientific questions and for a respectful and fruitful working environment. A special thanks to Stefanie Eyerich's group at the ZAUM - Center of Allergy and Environment who welcomed me every week in their offices. Especially I would like to thank Anne Atenhan and Jenny Thomas for answering all my questions around T cells, biology, laboratory procedure, ... and for a great atmosphere in general. It was always a pleasure for me to come to ZAUM, even during the times when the seminar started at 7.30 am. I also would like to emphasize my gratitude towards my clinical collaborators in Munich at the Department of Dermatology and Allergy, Technical University of Munich. iii Especially I would like to thank Kilian Eyerich, Natalie Garzorz-Stark, Felix Lauffer and Kerstin P¨atzoldfor the great work we have accomplished together. I would like to thank the three graduates schools (QBM - Quantitative Bioscience Munich, HELENA - Helmholtz Graduate School, TUM-GS the graduate school of TU Munich) I was part of for their financial support, guidance and help to find my way in the scientific world. I truly enjoyed the inspiring workshops and networking events. Especially I would like to thank those graduate schools for introducing me to some of my closest friends who helped me through the troubles of being a doctoral researcher: Simone Heber for always having a great coffee at the institute next door for me, Andrea Weinzierl and Andor Goetzendorff for our almost weekly dinners including discussions about science and more and all of my QBM Buddies for showing me what kind of science is also out there, for great dinners and fun retreats! Further, I would like to thank all of my friends and (Beach-)Volleyball team mates for helping me keep my mind off work from time to time and for showing me that there is a life outside of the scientific world. I also would like to acknowledge all those people and especially professors who were not part of my time as a doctoral student but who motivated me to pursue science. In particular, I would like to thank Bernhard Haubold and Angelika B¨orsch-Haubold for first introducing me to the scientific world and showing me how interesting life as a scientist can be. I would like to thank Patrick Siecke for showing me what is important in life. Finally, I would like to express my deepest gratitude to my parents Ursula Schwieder-Krause and J¨orgKrause and my sister Laura Krause for their support, their belief in me throughout my life and always being there for me. I especially thank Tim Henschel for supporting me through the ups and downs of the last years and for always being by my side. iv Abstract Translational medicine describes the path from basic biomedical research to health improvement for the general public. The focus of this thesis is on bridging the gap between wet lab-focused views and clinical application across ten studies with three university hospitals by employing tailored computational tools and statistical methods. In particular, four topics in translational medicine research were thoroughly investigated. First, two molecular classifiers as means towards unbiased diagnosis were developed. To differentiate two inflammatory skin diseases a novel classifier was validated on disease subtypes. To diagnose asthma a novel molecular classifier was established and validated by means of an independent patient cohort using penalized regression modeling. Second, to standardize clinical characterization, serum proteins as easily accessible and minimally invasive markers were modeled jointly with clinical attributes. For disease monitoring and patient prognosis, optimized, regularized and consensus regression models were built. Third, the application of linear mixed effects models to adjust for inter-individual variability in human gene expression data was established. This variability masks common, underlying disease characteristics in complex phenotypes. Finally, the characterization of human T helper cell subsets was improved by identifying subset-specific molecular markers. Those markers might advance the insights into the function of T helper cell subsets within the human immune system. In summary, the analyses improved diagnosis, monitoring and understanding of human diseases by means of statistical data integration. The work on the molecular disease classifier for inflammatory skin diseases is extended for use in local practices so might impact daily clinical diagnosis in dermatology. To identify possible, further benefits for the patients the remaining computational results still need to be validated in the labs and clinics. v Zusammenfassung Translationale Medizin beschreibt den Pfad beginnend bei biomedizinischer Grundlagen- forschung hin zur Verbesserung der allgemeinen Gesundheit der Bev¨olkerung. In insgesamt zehn Studien in Kollaboration mit drei Universit¨atskliniken lag der Fokus dieser Arbeit darauf, die L¨ucke zwischen experimentellen Ergebnissen und klinischer Anwendung zu schließen, indem maßgeschneiderte bioinformatische und statistische Methoden angewendet wurden. Insbesondere vier Themen der Forschung in der translationalen Medizin wurden untersucht. Zun¨achst wurden zwei molekulare Klassifikatoren entwickelten, um Krankheiten objektiv zu diagnostizieren. Zur Unterscheidung zweier entz¨undlicher Hauterkrankungen wurde ein neuartigen Klassifikator mithilfe von Krankheitssubtypen validiert. Zur Diagnose von Asthma wurde ein neuer molekularer Klassifikator etabliert und mithilfe einer unabh¨angigen Patientenkohorte validiert. Als Zweites wurden zur Standardisierung der klinischen Charakterisierung von Patienten Serumproteine als minimalinvasive Marker gemeinsam mit klinischen Attributen modelliert. F¨urdie Uberwachung¨ von Krankheiten und der individuellen Prognose wurden optimierte, regularisierte und konsensusbasierte Regressionsmodelle entwickelt. Als Drittes wurde die Anwendung linearer gemischter Modelle zur Korrektur interindivi- dueller Variabilit¨atin humanen Genexpressionsdaten etabliert. Diese Variabilit¨at¨uberlagert zugrunde liegende Krankheitsmerkmale in komplexen Ph¨anotypen. Abschließend wurde die Charakterisierung von Subpopulationen menschlicher T-Helferzellen verbessert, indem subpopulations-spezifische molekulare Marker identifiziert wurden. Diese Marker k¨onnten die Erkenntnisse ¨uber die Funktion von T-Helferzell-Subpopulatio- nen im menschlichen Immunsystem vertiefen. Zusammenfassend verbesserten die auf statischer Datenintegration basierenden, analy- tischen Ergebnisse die Diagnose, Uberwachung¨ und das Verst¨andnismenschlicher Erkran- kungen. Der molekulare Klassifikator f¨urentz¨undliche Hautkrankheiten wird momentan in niedergelassenen Arztpraxen getestet. Dies k¨onnte die t¨agliche klinische Diagnose in der Dermatologie grundlegend ver¨andern. Um weitere m¨ogliche Vorteile f¨urPatienten zu ermitteln, sollten auch die weiteren Forschungsergebnisse experimentell und klinisch validiert werden. vii