Improving and Validating Data-Driven Genotypic Interpretation Systems for the Selection of Antiretroviral Therapies
Total Page:16
File Type:pdf, Size:1020Kb
Improving and Validating Data-Driven Genotypic Interpretation Systems for the Selection of Antiretroviral Therapies Alejandro Pironti Dissertation zur Erlangung des Grades des Doktors der Naturwissenschaften der Fakultä t fü r Mathematik und Informatik der Universitä t des Saarlandes Saarbrü cken 2016 Tag des Kolloquiums 07.12.2016 Dekan Prof. Dr. Frank-Olaf Schreyer Vorsitzender der Prüfungskomission Prof. Dr. Gerhard Weikum Erster Berichtserstatter Prof. Dr. Thomas Lengauer Zweiter Berichtserstatter Prof. Dr. Hans-Peter Lenhof Akademischer Mitarbeiter Dr. Markus List ©2016 – Alejandro Pironti all rights reserved. Thesis advisor: Prof. Dr. Dr. Thomas Lengauer Alejandro Pironti Improving and Validating Data-Driven Genotypic Interpretation Systems for the Selection of Antiretroviral Therapies Short Summary Infection with Human immunodeficiency vir type 1 (HIV-1) requires treatment with a combination of an- tiretroviral drugs. This combination of drugs must be selected under consideration of its prospects for attaining sustained therapeutic success. Genotypic therapy-success interpretation systems can be used for selecting a com- bination of antiretroviral compounds. However, a number of shortcomings of these systems have prevented them from reaching the bedside. In this work, I present and validate novel methods for deriving interpretable genotype interpretation systems that are trained on HIV-1 data from routine clinical practice. One method produces scores that are correlated with previous exposure of the virus to the drug and with drug resistance. A further, novel genotype interpre- tation system produces a prognostic score correlated with the time for which the antiretroviral therapy with a certain drug combination will remain effective. The methods presented in this work represent an important advance in techniques for the interpretation of viral genotypes. Validation of the methods shows that their performance is comparable or, most frequently, superior to that of previously available methods. The methods are interpretable and can be retrained without the need for expert intervention. Last but not least, long-term therapeutic success is considered by the methods such that their predictions are in line with the results of clinical studies. Kurzfassung Eine Infektion mit dem Humanen Immunodefizienz-Virus Typ 1 (HIV-1) erfordert die Behandlung des Pa- tienten mit einer Kombination von antiretroviralen Wirkstoffen. Die Auswahl dieser Wirkstoffkombination muss unter Berücksichtigung der Aussichten für einen lang anhaltenden Behandlungserfolg stattfinden. Bei der Auswahl von Wirkstoffkombinationen können Systeme zur Vorhersage des Behandlungserfolgs eingesetzt werden. Bisher verfügbare Systeme weisen jedoch mehrere Defizite auf, sodass sie in der klinischen Praxis kaum Verwendung finden. In dieser Arbeit werden neuartige Methoden zur Aufstellung von Systemen zur Genotypinterpretation präsentiert und validiert. Eine dieser Methoden bewertet einen HIV-1-Genotyp bezüglich der vorhergehen- den viralen Wirkstoffexposition und der Wirkstoffresistenzen. Eine weitere Genotypinterpretationsmethode errechnet eine prognostische Zahl, welche mit der Zeit korreliert, die eine antiretrovirale Therapie effektiv sein wird. Diese Arbeit stellt eine wichtige Weiterentwicklungder Methoden zur Interpretation von viralen Genotypen dar. Zum Einen ist das Vorhersagemögen der Modelle dieser Arbeit vergleichbar oder sogar höher als diejenige von bisher verfügbaren Modellen. Zum Anderen sind die Modelle dieser Arbeit interpretierbar und können ohne Expertensupervision neu trainiert werden. Darüber hinaus berücksichtigen die Methoden den Langzeit- therapieerfolg, sodass ihre Vorhersagen mit den Ergebnissen klinischer Studien übereinstimmen. v Thesis advisor: Prof. Dr. Dr. Thomas Lengauer Alejandro Pironti Improving and Validating Data-Driven Genotypic Interpretation Systems for the Selection of Antiretroviral Therapies Abstract Infection with Human immunodeficiency vir type 1 (HIV-1) requires treatment with antiretroviral drugs. Without treatment, patients with HIV-1 infection develop symptoms referred to as acquired immunodeficiency syndrome (AIDS), ultimately leading to the death of the patient. The high productivity and variability of HIV-1 results in the continuous emergence of drug-resistant viral variants. In order to be able to suppress viral replica- tion, several drug compounds must be used simultaneously in antiretroviral therapy. For this reason, a combi- nation of drug compounds must be selected under consideration of the drug resistance of the virus and of the prospects that the drug combination has for attaining sustained therapeutic success. Genotypic drug-resistance interpretation systems are frequently used for selecting combinations of antiretro- viral drug compounds. These systems interpret HIV-1 genotypes in order to predict the susceptibility of the virus to each individual antiretroviral drug. However, the actual selection of an optimal drug combination needs to be carried out by the treating physician. In contrast, genotypic therapy-success interpretation systems provide their users with predictions for the success of antiretroviral drug combinations. However, a number of shortcomings of these systems have prevented them from reaching the bedside. In this work, I present and validate novel methods for deriving genotype interpretation systems that are trained on HIV-1 data from routine clinical practice. All of these systems provide the user with an interpre- tation of their predictions. One system produces numbers called drug exposure scor (DES) for each available antiretroviral drug. DES are correlated with previous exposure of the virus to the drug and with drug resistance. I also present and validate methods for converting DES into clinically meaningful categories, such that they can readily be used by human experts for selecting optimal antiretroviral therapies. DES can be used as features for further analyses relating to antiretroviral therapy. I present a further, novel genotype interpretation system that is trained on DES to produce a prognostic score correlated with the time for which the antiretroviral therapy with a certain drug combination will remain effective. The methods presented in this work represent an important advance in techniques for the interpretation of viral genotypes. Validation of the methods shows that their performance is comparable or, most frequently, superior to that of previously available methods. Their data-driven methodology allows for automatic retrain- ing without the need for expert intervention. Their interpretability helps them gain the confidence of the users and delivers a rationale for predictions that could be considered surprising. Last but not least, the ability of the therapy-success interpretation system to consider cumulative, long-term therapeutic success allows it to produce predictions that are in line with the results of clinical studies. vii viii Acknowledgments First and foremost, I would like to thank my supervisor, Prof. Dr. Thomas Lengauer, PhD, for the following reasons. First, Thomas created a fantastic environment in which people can perform research and learn how to perform research. Second, Thomas invited to me to be part of his research environment. Third, Thomas supervised my research personally. His style of supervision provided me with an exceptional amount of freedom while imparting me all concepts needed for attaining scientific rigor. Fourth, over the course of many years, Thomas provided me with a great amount of support and encouragement while challenging me to thrive as a scientist. Last but not least, Thomas provided me with an example of how a scientist and human being can and should be. I am thankful to Dr. Nico Pfeifer for advising me during my doctoral thesis. I would like to mention that my statistical rigor improved thanks to Nico. I owe gratitude to Dr. Rolf Kaiser who imparted me a significant amount of virological knowledge and welcomed me in his extensive network of scientists. I am truly indebted to all the people who invested a lot of energy into improving my abilities as a scientist. Dr. Hauke Walter let me profit from his impressive memory such that I could attain an intuitive understanding of the developments that occurred in HIV field during the last twenty years. Furthermore, Hauke very frequently shared his research ideas with me. Dr. Björn Jensen invested considerable energy into making me understand the way a physician thinks and into making me understand what is important when selecting an antiretroviral therapy. I attended several conferences in which Dr. Martin Obermeiner was present as well. Martin was always happy to explain to me any biomedical concept of which I had not previously heard. I am indebted to Prof. Dr. Hans-Peter Lenhof for imparting me the basic principles of computational biology and for refereeing this thesis. I am much obliged to all my collaborators without whom my research would not have been possible. I would like to mention the names of the scientists who have worked under the supervision of Dr. Rolf Kaiser: Dr. Melanie Balduin, Dr. Eva Heger, Dr. Elena Knops, Dr. Nadine Lübke, Eugen Schülter, Dr. Finja Schweizer, Dr. Saleta Sierra, and Dr. Jens Verheyen. I am grateful to Claudia Müller, Dr. Stefan Scholten, and Dr. Mark Oette. I am indebted to all members of the EuResist Network. I would like to mention the following names of members of the EuResist Network: Dr. Francesca Incardona, Dr. Ricardo Camacho,