This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/

Analysing RNA-seq datasets to determine how pre-mRNA splicing is regulated by RNA binding proteins and cis-acting elements

Hugh-White, Rupert

Awarding institution: King's College London

The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement.

END USER LICENCE AGREEMENT

Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/

You are free to copy, distribute and transmit the work

Under the following conditions:

Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work).
Non Commercial: You may not use this work for commercial purposes.
No Derivative Works - You may not alter, transform, or build upon this work.

Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above.

Take down policy

If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 04. Oct. 2021

Analysing RNA-seq datasets to determine how pre-mRNA splicing is regulated by RNA binding proteins and cis-acting elements

Rupert Hugh-White
King’s College London
PhD Supervisors: Reiner Schulz & Chad Swanson

A thesis submitted for the degree of Doctor of Philosophy
May 2020

Declaration

The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement.

Abstract

Most human genes undergo alternative pre-mRNA splicing to produce multiple transcript isoforms which often code for functionally distinct and tissue-specific products. Splicing factors interact with pre-mRNA and the core splicing machinery to control alternative splicing and regulate the generation of tissue-specific transcriptomes. For example, alternative splicing contributes to the function and homeostasis of the adaptive immune system. CD4+ T cells are an integral component of the adaptive immune system and regulate effector functions of immune cells towards diverse pathogens. Several splicing factors that contribute to CD4+ T cell function have been identified. However, the full splicing regulatory programmes characterising these and other immune cell types remain to be elucidated. Further, CD4+ T cells are the primary host target cell of HIV-1 infection. The HIV-1 lifecycle is regulated in large part through the host gene expression pathway. For instance, the HIV-1 RNA undergoes extensive alternative splicing mediated via the host splicing machinery. The study of processes such as these would benefit from development of improved methods for the inference of alternative splicing networks. In this thesis, I have analysed RNA-seq datasets to understand how alternative splicing is regulated through the actions of RNA binding proteins and cis-acting RNA elements.

Motif Activity Response Analysis (MARA) is an approach developed for the inference of tissue-specific regulatory transcription factors. I propose that MARA may also be effectively employed for the inference of regulatory splicing factors. To this end, I applied MARA for the novel use case of analysing splicing factors. I compared this Splicing-MARA (S-MARA) to a commonly used motif enrichment approach for predicting which splicing factors regulate a given splicing programme.
For this purpose, I used a large-scale splicing factor knockdown data resource produced through the ENCODE project, in addition to a published CD4+ T cell activation timecourse. Despite its previous use, splicing factor motif enrichment analysis has not undergone a formal assessment. We found that this method has utility in identifying regulatory splicing factors, providing proof-of-concept for the use of motif-based methods in prediction of regulatory splicing factors. Counter to expectations, S-MARA had poorer performance in identifying regulatory splicing factors as compared to the motif enrichment method. As such, potential improvements to S-MARA are considered as future avenues for investigation.

Further, the RNA binding protein Sam68 was investigated using a knockdown approach to infer its genome-wide splicing targets during the CD4+ T cell activation process. This revealed a widespread role for Sam68 in regulating mRNA abundance, whilst only a limited number of genes showed Sam68-dependent alternative splicing.

Finally, the regulation of the HIV-1 lifecycle by host RNA-binding proteins was investigated. We showed that suppression of CpG dinucleotides in the HIV-1 genome appears to maintain correct splicing of viral transcripts; whilst introduction of CpGs promotes use of a cryptic splice site which disrupts splicing, potentially mediated through the actions of host splicing factors.

Acknowledgements

This studentship was funded by Guy’s and St Thomas’ charity. Many thanks to my supervisors Reiner Schulz and Chad Swanson for their guidance throughout the duration of the project. My thesis committee, Rebecca Oakey, Sarah Teichmann, Eugene Makeyev, and Julie Fox provided useful advice to steer the direction of the project at multiple moments. Laura Hidalgo conducted the experimental work underpinning the analysis conducted in Chapter 5, and Irati Antzin-Andueza and Mattia Ficarelli conducted the experimental work described in Chapter 6.

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF FIGURES
LIST OF TABLES
ABBREVIATIONS
CHAPTER 1. INTRODUCTION
1.1 SPLICING OF PRE-MESSENGER RNA
1.1.1 ALTERNATIVE PRE-MRNA SPLICING
1.1.2 ALTERNATIVE SPLICING IN THE GENERATION OF PROTEIN DIVERSITY
1.1.3 ALTERNATIVE SPLICING IN THE CONTROL OF GENE EXPRESSION
1.1.4 ASSEMBLY AND ACTION OF THE SPLICEOSOME
1.1.5 AUXILIARY SPLICING FACTORS AND ALTERNATIVE SPLICING
1.1.6 CONTROL OF SPLICING NETWORKS
1.2 ALTERNATIVE SPLICING IN CD4+ T CELLS
1.2.1 ROLE OF CD4+ T CELLS IN THE ADAPTIVE IMMUNE SYSTEM
1.2.2 ALTERNATIVE SPLICING IN CD4+ T CELLS
1.3 THE HIV-1 LIFECYCLE AND ITS REGULATION BY HOST RNA-BINDING PROTEINS
1.3.1 HIV-1 IS THE ETIOLOGICAL AGENT OF THE AIDS PANDEMIC
1.3.2 THE HIV-1 LIFECYCLE
1.3.3 HIV-1 DEPENDS UPON THE HOST SPLICING MACHINERY
1.3.4 HUMAN ANTI-HIV FACTORS
1.3.5 THERAPEUTIC TARGETING OF HOST PROTEIN-HIV-1 INTERACTIONS
1.4 PROFILING SPLICING NETWORKS AND INFERENCE OF REGULATORY SPLICING FACTORS
1.4.1 APPROACHES TO STUDYING ALTERNATIVE SPLICING
1.4.2 PROFILING RNA-PROTEIN INTERACTIONS
1.4.3 MOTIF MODELS FOR RNA-BINDING PROTEINS
1.4.4 INFERENCE OF REGULATORY SPLICING FACTORS
1.4.4.1 Motif enrichment analysis
1.4.4.2 Regression-based approaches
1.4.5 DEVELOPING METHODS FOR THE INFERENCE OF REGULATORY SPLICING FACTORS
1.4.5.1 Motif Activity Response Analysis (MARA)