Computational Approaches for Improving Treatment and Prevention of Viral Infections

Dissertation for obtaining the degree of Doctor of Natural Sciences (Doktor der Naturwissenschaften) at the Faculty of Mathematics and Computer Science of Saarland University (Universität des Saarlandes)

by Matthias Döring

Saarbrücken 2019

Date of the colloquium: 30.04.2019
Dean: Prof. Dr. Sebastian Hack
Chair of the examination committee: Prof. Dr. Gerhard Weikum
First reviewer: Prof. Dr. Nico Pfeifer
Second reviewer: Prof. Dr. Dr. Thomas Lengauer
Third reviewer: Prof. Dr. Olga Kalinina
Academic staff member: Dr. Peter Ebert

Abstract

The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments. geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rule-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and using a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR to human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1.
The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.

Zusammenfassung (Summary)

The treatment of HIV or HCV infections is challenging. Therefore, new drugs as well as new computer-based methods that improve therapy are needed. In this work, methods for supporting therapy selection were developed, as well as methods that advance novel therapies. geno2pheno[ngs-freq] determines whether drug resistance is present by interpreting high-throughput sequencing data from HIV-1 or HCV samples using support vector machines or a rule-based approach. geno2pheno[coreceptor-hiv2] determines HIV-2 coreceptor usage by analyzing a segment of the viral surface protein with a support vector machine. openPrimeR can find optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and drawing on a new logistic regression model for predicting amplification events. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral therapies and support clinical decision making. By applying openPrimeR to human immunoglobulin sequences, primer sets were generated that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods developed in this work thus make an important contribution to improving the prevention and therapy of viral infectious diseases.

Acknowledgements

This dissertation would not have been possible without the guidance and support of several people whom I would like to acknowledge in the following paragraphs. I am grateful to Nico Pfeifer and Thomas Lengauer for the opportunity of pursuing a PhD at the Max Planck Institute for Informatics.
It was a privilege to work in an environment that allowed me to fully immerse myself in research, free from other concerns. The expert guidance and support provided by Nico and Thomas were integral to shaping my doctoral research. I am particularly thankful for Nico’s kind mentorship and continual support.

I would like to express my appreciation for many people at the Max Planck Institute for Informatics. For providing helpful advice before and during my time as a PhD student, I would like to thank Bastian Beggel, Glenn Lawyer, and Markus List. For the fruitful and stimulating collaboration on the geno2pheno web service, I would like to express my gratitude to Joachim Büch and Georg Friedrich. For their great technical support, I am indebted to Achim, Georg, and everyone working in the Information Services and Technology group. I would like to thank Alejandro Pironti and Prabhav Kalaghatgi for their contributions to my work; geno2pheno[ngs-freq] would not have been possible without the groundwork that was laid by Alejandro. For her administrative work, I am thankful to Ruth Schneppen-Christmann, who always helped me with organizing my conference trips.

I am grateful to a multitude of people outside the Max Planck Society. I am indebted to Rolf Kaiser for his openness, inventiveness, and the organization of platforms such as the AREVIR meeting and the Rettenstein symposium. For their input to my scientific work, I would like to highlight Eva Heger, Elena Knops, Florian Klein, and Christoph Kreer. Additionally, I would like to acknowledge the contributions from Pedro Borrego, Ricardo Camacho, Martin Däumer, Josef Eberle, Meryem Seda Ercanoglu, Nathalie Lehnen, Andreia Martins, Martin Obermeier, Philipp Schommers, Saleta Sierra-Aragon, Simone Susser, Alexander Thielen, and Nuno Taveira.
I would also like to thank the following organizations that funded my doctoral research: Max Planck Society, the German Ministry of Health (MASTER-HIV/HEP), the European HIV Coreceptor Study Panel (EucoHIV), and the German Center for Infection Research (HCV Treatment optimization). Without governmental funding, this work would not have been possible.

Heartfelt thanks go to all my peers at the Center for Bioinformatics in Saarbrücken who always were a source of inspiration, motivation and joyfulness; most notably Peter Ebert, Anna Hake, Lisa Handl, Sivarajan Karunanithi, Tim Kehl, Fabian Müller, Sarvesh Nikumbh, Alejandro Pironti, Florian Schmidt, Lara Schneider, Nora Katharina Speicher, and Thorsten Will. I would like to highlight Peter Ebert, Anna Hake, Eva Heger, Edith Heiter, Elena Knops, and Nora Katharina Speicher for providing feedback on this dissertation. For their companionship during my studies I am deeply thankful to Miriam Bah, Max Fischer, Alexander Junge, Sebastian Keller, and Andreas Mohr.

Above all, I would like to thank my family. I am indebted to my parents for their continual support and to my brother for sparking my interest in computers. Finally, I would like to thank Edith Heiter for always being there for me. I cannot imagine anything better than coming home and seeing your smiling face.

Contents

List of Figures
List of Tables
List of Algorithms

1 Introduction

I Background

2 Virological and Immunological Foundations
  2.1 Viruses and Viral Pathogenesis
  2.2 Defense Mechanisms against Viruses
    2.2.1 Components of the Immune System
    2.2.2 Overview of the Adaptive Immune System
    2.2.3 Cells of the Adaptive Immune System
    2.2.4 Antibody Structure and Function
    2.2.5 Neutralizing Antibodies
  2.3 Human Immunodeficiency Virus
    2.3.1 Introduction to HIV
    2.3.2 Structure and Genome Organization
    2.3.3 Life Cycle
    2.3.4 Coreceptor Usage
    2.3.5 Transmission and Course of Infection
    2.3.6 Treatment of HIV Infection
    2.3.7 Neutralizing Antibodies and Treatment
  2.4 Hepatitis C Virus
    2.4.1 Introduction to HCV
    2.4.2 Structure and Genome Organization
    2.4.3 Life Cycle
    2.4.4 Transmission and Course of Infection
    2.4.5 Treatment of HCV Infection
  2.5 Molecular Techniques
    2.5.1 Phenotypic HIV Coreceptor Testing
    2.5.2 Phenotypic Resistance Testing
    2.5.3 Sequencing of Viral Genomes
    2.5.4 Polymerase Chain Reaction

3 Methodological Foundations
  3.1 Overview of Machine Learning
  3.2 Supervised Learning
    3.2.1 Preliminaries
    3.2.2 Supervised Learning as Function Estimation
    3.2.3 The Bias-Variance Decomposition
    3.2.4 Training and Test Errors
    3.2.5 Estimating Model Errors
    3.2.6 Limiting Model Complexity via Regularization
    3.2.7 Feature Selection
  3.3 Measures of Predictive Performance
    3.3.1 The Confusion Matrix
    3.3.2 Performance Measures for Non-Scoring Classifiers
    3.3.3 Performance Measures for Scoring Classifiers
  3.4 Models for Supervised Learning
    3.4.1 Logistic Regression
    3.4.2 Support Vector Machines
  3.5 Clustering
    3.5.1 K-Means Clustering
    3.5.2 Hierarchical Clustering
  3.6 Statistical Significance Tests
    3.6.1 Introduction to Hypothesis Testing
    3.6.2 McNemar’s Test
    3.6.3 Fisher’s Exact Test
    3.6.4 Wilcoxon Rank-Sum Test
    3.6.5 Multiple Hypothesis Testing
  3.7 Optimization with Linear Programs
    3.7.1 Linear and Integer Linear Programming
    3.7.2 Branch and Bound
    3.7.3 The Set Cover Problem

II Contributions

4 Interpreting Drug Resistance