Mechanisms and Impact of Post-Transcriptional Exon Shuffling

Mechanisms and impact of Post-Transcriptional Exon Shuffling (PTES) Ginikachukwu Osagie Izuogu Doctor of Philosophy Institute of Genetic Medicine Newcastle University September 2016 i ABSTRACT Most eukaryotic genes undergo splicing to remove introns and join exons sequentially to produce protein-coding or non-coding transcripts. Post-transcriptional Exon Shuffling (PTES) describes a new class of RNA molecules, characterized by exon order different from the underlying genomic context. PTES can result in linear and circular RNA (circRNA) molecules and enhance the complexity of transcriptomes. Prior to my studies, I developed PTESFinder, a computational tool for PTES identification from high-throughput RNAseq data. As various sources of artefacts (including pseudogenes, template-switching and others) can confound PTES identification, I first assessed the effectiveness of filters within PTESFinder devised to systematically exclude artefacts. When compared to 4 published methods, PTESFinder achieves the highest specificity (~0.99) and comparable sensitivity (~0.85). To define sub-cellular distribution of PTES, I performed in silico analyses of data from various cellular compartments and revealed diverse populations of PTES in nuclei and enrichment in cytosol of various cell lines. Identification of PTES from chromatin-associated RNAseq data and an assessment of co-transcriptional splicing, established that PTES may occur during transcription. To assess if PTES contribute to the proteome, I analyzed sucrose-gradient fractionated data from HEK293, treated with arsenite to induce translational arrest and dislodge ribosomes. My results showed no effect of arsenite treatment on ribosome occupancy within PTES transcripts, indicating that these transcripts are not generally bound by polysomes and do not contribute to the proteome. To investigate the impact of differential degradation on expression levels of linear and circRNAs, I analyzed the PTES population within RNAseq data of anucleate cells and established that most PTES transcripts are circular and are enriched in platelets 17-to-188-fold relative to nucleated tissues. For some genes, only reads from circRNA exons were detectable, suggesting that platelets have lost >90% of their progenitor mRNAs, consistent with time- dependent degradation of platelets transcriptomes. However, some circRNAs exhibit read density patterns suggestive of miRNA induced degradation. Finally, a linear PTES from RMST locus has been implicated in pluripotency maintenance using limited RNAseq data from human embryonic stem cells (hESC). To identify other PTES transcripts with similar expression patterns, I analyzed RNAseq data from H9 ESC differentiation series. Statistical analyses of PTES transcripts identified during cellular differentiation established that PTES expression changes track with that of cognate linear transcripts and accumulate upon differentiation. Contrary to previous reports, the dominant transcript from RMST is circular and increases in abundance during differentiation. Functional Abstract ii analyses demonstrating the role of RMST in pluripotency maintenance had targeted exons within the predicted circRNA, suggesting previously unreported functional relevance for circRNAs. Abstract iii Declaration I, Osagie Izuogu, declare that no material documented in this thesis has been submitted in support of a degree or other qualification in this or any other university. This thesis represents my own work and any collaborative work is acknowledged where necessary. Osagie Izuogu Declaration iv Dedication This thesis is dedicated to Denise, my wife; Tiana, Dante and Luca, my children. Thank you all for your unwavering support and understanding throughout my research. I love you all. Dedication v Acknowledgements Enormous gratitude to Dr. Mike Jackson and Dr. Mauro Santibanez-Koref for the opportunity to undertake my research under their supervision. I am particularly grateful for your guidance, support and patience, and for being available for all formal and informal discussions. I sincerely look forward to future opportunities to work with you, thank you. To Dr. Alhassan, thank you for your guidance and assistance with in vitro analyses, I appreciate your help. To Prof. David Elliott and members of Elliott’s lab, thank you for your support and for providing answers to my numerous questions. I would like to thank current and former members of the statistical genetics group (IGM), especially, Dr. Miossecc, Dr. Griffin, Dr. Ayers, Dr. Xu, Dr. Ainsworth and Dr. Howey. I thank you all for your words of advice, encouragement and impromptu tutorials. Our informal meetings helped shape my thoughts. Special thanks to my collaborators - Prof Lako (Newcastle University, UK), Dr. Ghevaert & colleagues (Cambridge University, UK) - and my panel members - Prof. Heather Cordell, Dr. Andreas Werner and Dr. Venables; I thoroughly enjoyed working on this project and my research benefitted from your expertise, words of encouragement and guidance, thank you. To Dr. Harry Mountain (Staffordshire University, UK) and Ada Izuogu (University of Toledo, Ohio), thank you both for proofreading my thesis and your valuable comments. I am grateful to my family and friends, for their unconditional love and support throughout my study, thank you all. I acknowledge the Biotechnology and Biological Sciences Research Council for funding my research. Ultimately, I thank God for his mercies. Acknowledgements vi vii Table of Contents List of Figures 14 List of Tables 16 List of Abbreviations 17 Chapter 1: Introduction 19 1.1 Transcriptome Diversity in Humans 19 1.2 Major Sources of Transcriptome Diversity 21 1.2.1 Alternative Splicing 21 1.2.2 Transcription of 'Junk DNA' 23 1.2.3 Chimeric Transcripts 26 1.3 Post-Transcriptional Exon Shuffling (PTES) 28 1.3.1 Mechanisms of PTES Formation 31 1.3.2 Regulation of PTES Formation 33 1.3.3 In vitro methods for identification of PTES 35 1.3.4 Approaches to PTES in silico identification 37 1.3.5 Sources of known artefacts that confound in silico PTES identification 41 1.3.6 PTESFinder: a computational tool for PTES identification 43 1.4 Project Aims 46 Chapter 2: Materials & Methods 47 2.1 Cell lines 47 2.2 Sample Preparation 47 2.2.1 Tissue culture of HEK293 and DAOY cell lines 47 2.2.2 Differentiation of H9 ESC 47 2.2.3 Human tissues and blood samples from healthy donors 48 2.2.4 RNA Isolation and cDNA synthesis 48 Table of Contents 8 2.2.5 RNase R digestion 49 2.3 In vitro PTES confirmation, visualization and quantification 49 2.3.1 Primer design 49 2.3.2 Polymerase Chain Reaction (PCR) 49 2.3.3 Agarose Gel electrophoresis 50 2.3.4 Quantitative PCR (qPCR) 50 2.4 Public RNAseq datasets 50 2.4.1 Human Fibroblasts and Leukocytes data 50 2.4.2 ENCODE sub-cellular RNAseq data 51 2.4.3 Sucrose-gradient fractionated RNAseq data from HEK293 51 2.4.4 RNAseq data from human tissues and anucleate cells 51 2.5 RNAseq data generation 52 2.5.1 High-throughput RNA sequencing 52 2.5.2 Generating simulated RNAseq data 53 2.5.3 Sub-sampling of RNAseq data 54 2.6 Computational Methods 54 2.6.1 Sequence Quality check 54 2.6.2 PTES identification 54 2.6.3 RNAseq analysis 55 2.6.4 Definition and derivation of metrics 56 2.6.5 Statistical analysis of PTES abundance 58 2.6.6 Custom scripts 58 Chapter 3: Assessment of Computational PTES identification Methods 59 3.1 Introduction 59 3.1.1 Existing PTES identification tools do not specifically exclude all sources of artefacts 59 Table of Contents 9 3.1.2 Choice of aligner and aligner-specific parameters may impact reproducibility of PTES predictions 60 3.1.3 PTESFinder is equipped with filters that systematically exclude sources of artefacts 61 3.2 Aims 63 3.3 Results 64 3.3.1 Filters Target Overlapping Populations Of Reads 64 3.3.2 Reads Excluded By Specific Filters Have Different Origins 66 3.3.3 PID Has Greater Impact Than JSpan 69 3.3.4 Effect of Aligner Specific Parameter and PTESFinder Performance 70 3.3.5 Comparison of PTES identification methods 72 3.3.6 Assessment of Annotation-Free Identification Methods 77 3.4 Discussion 81 3.5 Conclusion 82 Chapter 4: Sub-Cellular Distribution of PTES transcripts 84 4.1 Introduction 84 4.1.1 Spliceosomal proteins aid nucleo-cytoplasmic mRNA export 84 4.1.2 Co- or Post-Transcriptional Exon Shuffling? 86 4.1.3 Profiling transcripts undergoing translation 87 4.2 Aims 89 4.3 Results 90 4.3.1 Variety of PTES events observed in the nucleus 91 4.3.2 snoRNA-PTES transcripts are likely artefacts 92 4.3.3 PTES transcripts are enriched in the cytosol 98 4.3.4 Incompletely processed circRNAs enriched in the nucleus 101 4.3.5 Quantitative analysis of chromatin-associated PTES transcripts 103 Table of Contents 10 4.3.6 Do PTES transcripts contribute to the proteome? 107 4.4 Discussion 109 4.5 Conclusion 112 Chapter 5: PTES transcripts in anucleate cells 113 5.1 Introduction 113 5.1.1 Platelets have complex transcriptomes 113 5.1.2 Platelets transcriptomes vary between human donors 114 5.2 Aims 116 5.3 Results 117 5.3.1 Most PTES transcripts identified in platelets are circular 118 5.3.2 CircRNAs are enriched in anucleate cells and expand the growing catalog of PTES transcripts 120 5.3.3 Reads from circRNA producing exons are enriched in Platelets 124 5.3.4 circRNA abundance in Platelets is due to decay of linear transcripts 128 5.3.5 RNA secondary structure, GC content and miRNA binding sites may contribute to circRNA stability 130 5.4 Discussion 138 5.5 Conclusion 140 Chapter 6: PTES Events in development 141 6.1 Introduction

Mechanisms and Impact of Post-Transcriptional Exon Shuffling

"Evolutionary Emergence of Genes Through Retrotransposition"

Identification and Characterization of Breakpoints and Mutations on Drosophila Melanogaster Balancer Chromosomes

Origins of New Genes: Exon Shuffling

L1 Retrotransposition Is a Common Feature of Mammalian Hepatocarcinogenesis

INTRINSIC and EXTRINSIC EVOLUTION of HELITRONS in FLOWERING PLANT GENOMES by LIXING YANG (Under the Direction of Jeffrey L. Benn

Living Organisms Author Their Read-Write Genomes in Evolution

Alternative Splicing and Evolution: Diversification, Exon Definition and Function

Exon Shuffling Played a Decisive Role in the Evolution of the Genetic Toolkit for the Multicellular Body Plan of Metazoa

Exon-Trapping Mediated by the Human Retrotransposon SVA

Discovery of RNA Splicing and Genes in Pieces PERSPECTIVE Arnold J

Transposon Regulation Upon Dynamic Loss of DNA Methylation Marius Walter

United States Patent (19) 11 Patent Number: 6,165,793 Stemmer (45) Date of Patent: *Dec