Inferring Virological Response to Antiretroviral Combination Therapy Based on Past Treatment Lines
Universität des Saarlandes
Zentrum für Bioinformatik
Bachelor's Program in Bioinformatics

Bachelor's Thesis

Inferring virological response to antiretroviral combination therapy based on past treatment lines

submitted by Fabian Müller on 20 March 2008

prepared under the supervision of Prof. Dr. Thomas Lengauer, Ph.D.
advised by André Altmann
reviewed by Prof. Dr. Thomas Lengauer, Ph.D., and Prof. Dr. Hans-Peter Lenhof

Declaration

I hereby declare that I have written this thesis independently and have cited all sources used.

Saarbrücken, 20 March 2008

Given the increasing number of possible drug combinations and the genetic diversity of HIV, it is unlikely that simple hand-crafted rules will capture the complex interplay between drug cocktails and mutational patterns that determine response to antiretroviral therapy.
Altmann et al. [1]

Abstract

Although several antiretroviral agents are available, therapy failure remains a major problem in the fight against HIV. The most promising regimens today are combination therapies comprising multiple compounds from different drug classes. However, complete viral eradication is still not achievable with current strategies. A major cause is viral resistance to antiretroviral compounds. Bioinformatics approaches can use statistical models to estimate, for an individual patient, the probability of success of each promising drug combination, and thus help in choosing an appropriate regimen. Current predictive methods often rely on the genotype of the virus and the drug composition of the therapy. However, genotypes are not always available, and even when they are, resistance may be latent in the viral population. Including information on a patient's past treatments might capture resistance (especially hidden resistance) indirectly and thus improve prediction. Methods based on both the viral genotype and the therapy history might enhance prediction further.
Furthermore, it could be useful to have a visual representation of potential therapy changes in a database. Such a representation might make it possible to identify general habits in drug prescription and to observe their effect on treatment outcome. This additional information could in turn improve the prediction of therapy success. In this work, therapy success is assessed using several encodings of a patient's past treatments. Graph representations prove useful for analyzing therapy sequences and for developing new representations for prediction. The results indicate that prediction based on therapy history encodings is quite effective, especially when combined with genotypic information.

Contents

1 Introduction
  1.1 Human Immunodeficiency Virus (HIV)
    1.1.1 Structure
    1.1.2 Replication Cycle
    1.1.3 Pathogenic Mechanisms and AIDS
  1.2 Current HIV-Therapies
    1.2.1 Therapy Compounds
    1.2.2 Viral Resistance and Therapy Failure
    1.2.3 Highly Active AntiRetroviral Therapy (HAART)
  1.3 Predicting Therapy Success
    1.3.1 Related Approaches in Predicting Therapy Success
    1.3.2 Motivation for History Based Encodings
2 Materials and Methods
  2.1 Statistical Learning Methods
    2.1.1 Logistic Regression
    2.1.2 Random Forests
    2.1.3 Validation
  2.2 The EuResist Database
  2.3 Feature Encodings
    2.3.1 Base Encoding for Current Therapy
    2.3.2 Binary Compound Variables for Therapy History
    2.3.3 Continuous Representation for Therapy History
    2.3.4 Second Order Variables for Past and Current Compounds
  2.4 Clustering Compound Combinations
  2.5 Therapy Sequence Graphs
    2.5.1 Transitions-Occurrence Graphs
    2.5.2 N-gram Graphs
  2.6 Graph Related Feature Encodings
    2.6.1 Similarity to Previous Therapy
    2.6.2 Transitions from Previous Therapy Clusters
  2.7 Analysis of Therapy Success based on Single Compound Replacements
  2.8 Consolidation of Genotype and Therapy History-based Methods
  2.9 Realization
3 Results and Discussion
  3.1 Feature Encodings
  3.2 Therapy Sequence Graphs
  3.3 Graph Related Feature Encodings
  3.4 Analysis of Therapy Success based on Single Compound Replacements
  3.5 Consolidation of Genotype and Therapy History-based Methods
4 Conclusion

1 Introduction

1.1 Human Immunodeficiency Virus (HIV)

To understand current treatments for HIV infection, some basic knowledge of the virus's molecular biology and replication cycle is essential. HIV belongs to the family Retroviridae and is a member of the genus Lentivirus. Its primary targets are cells of the human immune system such as lymphocytes (CD4+ T-cells), monocytes (macrophages) and dendritic cells. HIV is thought to be descended from the simian immunodeficiency virus (SIV), which infects non-human primates. There are two major types of HIV, designated HIV-1 and HIV-2, which can be subdivided into several subtypes (Figure 1). These subtypes differ in their geographic distribution: for instance, HIV-1 subtypes A and D predominate in Africa, while subtype B occurs most often in Europe.

Figure 1: HIV Types and Subtypes [2]

1.1.1 Structure

HIV's molecular structure is considered complex. The virus particle is roughly spherical and about 100 nm in diameter. Its genetic information is located on diploid RNA and comprises approximately 9,200 base pairs (HIV-1). It contains genes for capsid, envelope, regulatory, accessory and replication-associated proteins. Both ends of the RNA strands consist of long terminal repeats, which are important for transcription control.

Figure 2: Schematic of the HIV particle [3]

As shown in Figure 2, the virion is enveloped by a lipid membrane acquired from the host cell.
Embedded in this membrane are trimeric viral complexes of the proteins gp41 (transmembrane protein) and gp120 (surface glycoprotein). The next inner layer, the matrix, consists of the protein p17. It is followed by another shell, the capsid, which is built from approximately 2,000 copies of the protein p24. The viral RNA itself is stabilized by a ribonucleoprotein complex of p7. Also enclosed in the envelope are the viral enzymes reverse transcriptase, protease and integrase, as well as accessory proteins.

1.1.2 Replication Cycle

Like all retroviruses, HIV has a replication cycle that involves recognition of the host cell, integration into the host DNA, and the assembly and release of new virions. Figure 3 gives a schematic overview of HIV replication.

In the early phase of infection, the viral surface protein gp120 specifically recognizes the surface of the host cell (e.g. a CD4+ T-cell). The virus particle fuses with the cell membrane (mediated by gp41), and its RNA is released into the host cell and uncoated from the stabilizing protein. The RNA is then reverse-transcribed into DNA, a process catalyzed by reverse transcriptase. After synthesis of the complementary DNA strand, the viral DNA is transported into the host cell's nucleus, where it is integrated into the host DNA by the viral enzyme integrase.

In the late phase, the viral genes are transcribed along with the host cell's genes. After splicing and translation, the viral protein chains (polyproteins) are ready for further processing, such as glycosylation of the transmembrane protein complexes. Furthermore, cleavage of the polyproteins, catalyzed by the viral protease, is necessary for the proteins to function and to form mature, infectious virus particles.

Unspliced viral RNA is packaged into the new viral structures formed by these translated viral proteins.
Gp120-gp41 protein complexes are integrated into the host cell's membrane, and during budding from the cell the newly constructed virions are enveloped by fragments of this enriched cell membrane. Replication is estimated to take 1-2 days. The time span between the release of a new generation of virus particles and infection is believed to be 1-3 days.

Figure 3: HIV replication cycle [4]

1.1.3 Pathogenic Mechanisms and AIDS

During viral replication, HIV particles cause severe damage to the host's immune system. Proposed pathogenic mechanisms include lysis of infected cells during viral replication, disruption of the host's lymphoid architecture, and autoimmune responses such as superantigens or self-induced apoptosis of infected cells. The course of HIV infection can be divided into three phases:

• In the acute phase, which spans the first 5 to 10 weeks of infection, the virus population grows rapidly and the host's immune system is activated.
• During the asymptomatic phase, which lasts between 2 and 20 years, CD4+ T-cell counts decrease steadily at a low rate while viral replication stays constant at a low level.
• The end stage is the symptomatic phase: CD4+ T-cell counts fall below 200 cells per microliter and the host's tissue is damaged.

HIV is strongly associated with the Acquired Immunodeficiency Syndrome (AIDS). The current definition of AIDS is given by the U.S. Centers for Disease Control and Prevention [5], which diagnoses AIDS when a person is HIV-positive and fulfills at least one of the following criteria:

• The person has a CD4+ T-cell count below 200 cells/µl.
• The person's CD4+ T-cells account for fewer than 14% of all lymphocytes.
• The person has been diagnosed with at least one of 25 AIDS-defining illnesses, including opportunistic infections, certain cancers, brain and nerve diseases and the HIV wasting syndrome.

Table 1 sums up current regional statistics on the prevalence of HIV and AIDS.
Especially alarming are the figures for developing countries, such as those in the Sub-Saharan region. Approximately one third of the population infected with HIV lives in this African region. The prevalence in these developing countries is also estimated to be very high.
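
The CDC criteria listed in Section 1.1.3 amount to a simple logical rule: HIV-positive status combined with at least one of three conditions. The following minimal Python sketch spells out that rule for illustration only. The names `PatientStatus` and `meets_aids_definition` and the field names are hypothetical and do not come from the thesis or from any CDC software. The thresholds are the ones quoted above, and the sketch is not intended as a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class PatientStatus:
    """Hypothetical record holding the quantities used by the criteria above."""
    hiv_positive: bool
    cd4_cells_per_ul: float            # CD4+ T-cell count in cells/µl
    cd4_percent_of_lymphocytes: float  # CD4+ T-cells as % of all lymphocytes
    aids_defining_illnesses: int       # number of diagnosed AIDS-defining illnesses


def meets_aids_definition(p: PatientStatus) -> bool:
    """Return True if the patient is HIV-positive and fulfills at least
    one of the three criteria quoted in the text (illustrative only)."""
    if not p.hiv_positive:
        return False
    return (
        p.cd4_cells_per_ul < 200
        or p.cd4_percent_of_lymphocytes < 14
        or p.aids_defining_illnesses >= 1
    )


if __name__ == "__main__":
    example = PatientStatus(
        hiv_positive=True,
        cd4_cells_per_ul=150,
        cd4_percent_of_lymphocytes=20,
        aids_defining_illnesses=0,
    )
    print(meets_aids_definition(example))
```

The rule is written as a disjunction guarded by the HIV-positive check, so a single fulfilled criterion is sufficient, whereas a low CD4+ count alone in an HIV-negative person is not.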