Roberto Simone Phd Thesis 2008
Total Page:16
File Type:pdf, Size:1020Kb
New approaches to unveil the Transcriptional landscape of dopaminergic neurons Roberto Simone International School for Advanced Studies PhD dissertation in Structural and Functional Genomics Trieste – October 2008 1 Supervisor: Prof. Stefano Gustincich, Ph.D. Neurobiology Sector The Giovanni Armenise-Harvard Laboratory SISSA - Trieste External Examiner: Dr Valerio Orlando, Ph.D. Epigenetics and Genome Reprogramming group Dulbecco Telethon Institute IGB CNR - Naples © Roberto Simone 2008 [email protected] Neurobiology Sector International School for Advanced Studies Trieste, Italy 2008 2 “La gente di solito usa le statistiche come un ubriaco i lampioni: più per sostegno che per illuminazione”. Mark Twain “L`insieme e` piu` della somma delle sue parti” Aristotele “La scienza della complessità` ci insegna che la complessita` che vediamo nel mondo e` il risultato di una semplicita` nascosta”. Chris Langton To my Parents: Cinzia and Giovanni, and to my sister Patrizia 3 Table of contents Abstract 9 List of abbreviations 10 SECTION 1. INTRODUCTION The complexity of the Eukaryotic Transcriptome The complexity of the Eukaryotic Transcriptome: Large part of the genome is transcribed 11 The complexity of the Eukaryotic Transcriptome: alternative splicing, alternative transcription initiation and termination 13 The dark side of the Eukaryotic Transcriptome –Transcripts of Unknown Function (TUF) 16 The complexity of the Eukaryotic Transcriptome: Poly A+ versus PolyA- transcripts 22 The complexity of the Eukaryotic Transcriptome: Nuclear versus cytoplasmic transcripts 25 The complexity of the Eukaryotic Transcriptome: What’s the function of many repeat elements? 29 The complexity of the Eukaryotic Transcriptome: S/AS pairs 32 The complexity of the Eukaryotic Transcriptome: the impact of the technical limitations on gene discovery and genome annotation 35 Technologies to unveil transcriptome complexity Technologies to unveil transcriptome complexity: cDNA microarrays, oligonucleotide Gene Chip and tiling arrays 39 Technologies to unveil transcriptome complexity: Serial Analysis of Gene Expression – SAGE 43 Technologies to unveil transcriptome complexity: Cap Analysis of Gene Expression – CAGE 45 Technologies to unveil transcriptome complexity: GIS, GSC, STAGE, DACS and other methods 49 Technologies to unveil transcriptome complexity: High-throughput sequencing technologies 51 Technologies to unveil transcriptome complexity: PCR-based and Transcription-based methods for RNA amplification 54 4 CAGE data and the analysis of mammalian promoters CAGE data and the analysis of mammalian promoters: The classical concept of “core promoter” 62 CAGE data and the analysis of mammalian promoters: Classification of promoter classes 65 CAGE data and the analysis of mammalian promoters: Transcription starts from unconventional sites (Exons, introns and 3`UTR) 67 CAGE data and the analysis of mammalian promoters: Widespread occurrence of alternative promoters 70 CAGE data and the analysis of mammalian promoters: Bidirectional promoters 78 CAGE data and the analysis of mammalian promoters: Mobile promoters 82 The multitasking genome- a new view for gene expression regulation and transcriptome structure 84 Then what is a gene? 92 Mesencephalic Dopaminergic Cells Mesencephalic Dopaminergic Cells: Overview of the system in mammals 95 Mesencephalic Dopaminergic Cells: Anatomical organization 96 Mesencephalic Dopaminergic Cells: Nigrostriatal and Mesolimbic-Mesocortical Pathways 99 Mesencephalic Dopaminergic Cells: The Basal Ganglia circuitry and mDA electrophysiological properties 102 Mesencephalic Dopaminergic Cells: Differentiation and Development 104 Mesencephalic Dopaminergic Cells: DAT, VMAT2 and COMT are crucial components of their function Dopamine Transporter (DAT): function, expression and regulation 110 DAT: multiple transcript isoforms 114 Vmat2: function, expression and regulation 115 Comt: function, expression and regulation 117 Comt: multiple transcript isoforms 118 Mesencephalic Dopaminergic Cells: previous gene expression studies 119 Mesencephalic Dopaminergic Cells: neuropsychiatric disorders 123 5 Parkinson`s disease Parkinson`s disease : retrospecitve, clinical features and epidemiology 127 Parkinson`s disease : Genetics 129 Parkinson`s disease : alpha-synuclein 131 Parkinson`s disease : multiple transcripts of Snca 137 Parkinson`s disease : molecular pathways 141 Parkinson`s disease : Current therapies 145 SECTION 2. MATERIALS AND METHODS Striatal Cells and RNA extraction 148 RNA amplification using SMART7 148 cDNA microarray: probe preparation, hybridizations and scanning 150 Statistical analysis of microarray data 151 Real Time-PCR of differentially expressed genes in Striatal cells 151 nanoCAGE –Illumina-Solexa 152 nanoCAGE – 454 154 Bioinformatic analysis of nanoCAGE data 157 RT-PCR Validation of target genes from total Midbrain 158 5`-RACE validation of target transcripts starting at specific TSSs 160 In Situ Hybridization (ISH) analysis of transcript isoforms on mice midbrain cryosections 161 Immunofluorescence (IF) 162 Oligonucleotide sequences and functional grouping 163 SECTION 3. RESULTS cDNA Microarray transcription profiling from limiting amount of starting material Overview and Validation of the SMART7 technique 165 6 Overview and Validation of the “Brownstein method” 169 SMART7 and cDNA microarrays 173 Developing of a new tecnique for Genome-Wide Tagging and Mapping of the Transcription Start Sites from limited amount of RNA: nanoCAGE and high- throughput sequencing The classical CAGE protocol requires micrograms of RNA 176 nanoCAGE – what is new: random priming, semi-suppressive PCR, starting from 10 ng 177 nanoCAGE-454 and nanoCAGE-Illumina-Solexa: new platforms for high-throughput gene expression profiling from small samples 181 Application of the NanoCAGE technology to the mesencephalic dopaminergic neurons SN (A9) and VTA (A10) dopaminergic nuclei LCM purification of A9 and A10 neurons and nanoCAGE amplification 181 Bioinformatic analysis of sequencing data 183 TSSs clustering strategies 187 Tag clusters-Transcripts associations 190 Generation of a list of potential target genes for further experimental validation 196 GO terms and categories distribution 198 Differentially expressed protein-coding transcripts in SN and VTA neurons 202 Chromosomal distribution of the promoters in SN and VTA neurons 204 Use of alternative promoters of biologically relevant protein coding genes in SN and VTA neurons Alpha-synuclein (Snca): Differential usage of alternative promoters potentially linked to different splicing patterns and post-transcriptional modifications of the protein 207 Dopamine Transporter (Slc6a3/Dat): Different alternative TSSs associated to potentially different protein isoforms 211 Vescicular Monoamine Transporter (Slc18a2): an alternative intronic TSS is mainly used by A10 neurons 212 Catechol-O-methyltransferase (Comt): alternative promoter usage associated to MB-Comt and S- Comt in A9 and A10 dopaminergic neurons 215 7 The non coding RNAs landscape in SN and VTA neurons as emerging from nanoCAGE and deep sequencing Relative abundance and different families of non coding RNAs 217 Differentially expressed microRNA transcripts in SN and VTA neurons 219 Sense-Antisense pairs transcription 220 Candidate non coding genes for establishing functional differences 222 SECTION 5. DISCUSSION and CONCLUSION 228 Acknowledgements 235 References 237 8 Abstract Recent advances in studying the mammalian transcriptome arised new questions about how genes are organized and what is the function of noncoding RNAs. Furthermore, the discovery of large amounts of polyA- transcripts and antisense transcription proved that a portion of the transcriptome has still to be characterized. The complex anatomo-functional organization of the brain has prevented a comprehensive analysis of the transcriptional landscape of this tissue. New techniques must be developed to approach neuronal heterogeneity. In this study we combined Laser Capture Microdissection (LCM) and nanoCAGE, based on Cap Analysis of Gene Expression (CAGE), to describe expressed genes and map their transcription start sites (TSS) in two specific populations, A9 and A10, of mouse mesencephalic dopaminergic cells. Although sharing common dopaminergic marker genes, these two populations are part of different midbrain anatomical structures, substantia nigra (SN) for A9 and ventral tegmental area (VTA) for A10, project to relatively distinct areas, participate to distinct ascending dopaminergic pathways, exhibit different electrophysiological properties and different susceptibility to neurodegeneration in Parkinson`s disease. Specific neurons were identified by the expression of Green Fluorescent Protein driven by a cell- type specific promoter in transgenic mice. High-quality RNAs were purified from 1000-2500 cells collected by LCM. We adapted the CAGE technique to analyze limiting amounts of RNAs (nanoCAGE). We took advantage of the cap-switching properties of the reverse transcriptase to specifically tag the 5`end of transcripts with a sequence containing a class III restriction site for EcoP15I. By creating 32bp 5`tags, we considerably improved the TSS mapping rate on the genome. A semi-suppressive PCR strategy was used to prevent primer dimers formation. The use of random priming in the 1st strand synthesis allowed to capture poly(A)- RNAs. 5`tags were sequenced with Illumina-Solexa platform. Here we show that this new nanoCAGE technology ensures a true high-throughput coverage