Detecting and Quantifying the Translated Transcriptome with Ribo-Seq Data

Detecting and Quantifying the Translated Transcriptome with Ribo-Seq Data

Detecting and quantifying the translated transcriptome with Ribo-seq data D i s s e r t a t i o n zur Erlangung des akademischen Grades Ph.D. bzw. Doctor of Philosophy im Fach Biologie eingereicht an der Lebenswissenschaftlichen Fakultät der Humboldt-Universität zu Berlin von MSc Lorenzo Calviello Präsidentin der Humboldt-Universität zu Berlin Prof. Dr.-Ing. Dr. Sabine Kunst Dekan der Lebenswissenschaftlichen Fakultät der Humboldt-Universität zu Berlin Prof. Dr. Bernhard Grimm Gutachter/innen: 1 – Prof. Dr. Uwe Ohler 2 – Prof. Dr. Nils Blüthgen 3 – Prof. Dr. Markus Landthaler Tag der mündlichten Prüfung 08.11.2017 Contents 1 Introduction ____________________________________________________________ 8 1.1 Thesis outline _______________________________________________________ 9 2 Background ___________________________________________________________ 10 2.1 The Molecular Biology of RNA processing _______________________________ 10 2.1.1 Life and the central dogma ________________________________________ 10 2.1.2 A multitude of RNA species _______________________________________ 13 2.1.3 Nuclear processing ______________________________________________ 17 2.1.4 The cytoplasmic fates of an RNA molecule ___________________________ 19 2.1.5 The Translation process __________________________________________ 20 2.1.6 Translation regulation ____________________________________________ 23 2.1.7 Translation and RNA decay _______________________________________ 26 2.2 Omics techniques to understand RNA biology ____________________________ 29 2.2.1 Next-generation sequencing _______________________________________ 29 2.2.2 RNA-seq applications ____________________________________________ 31 2.2.3 Ribosome Profiling ______________________________________________ 34 2.2.4 Proteomics approaches ___________________________________________ 36 2.3 Computational analysis of -omics data ___________________________________ 40 2.3.1 Genomes and transcriptomes ______________________________________ 40 2.3.2 NGS data pre-processing & mapping ________________________________ 41 2.3.3 Quantification and normalization strategies ___________________________ 42 2.3.4 Beyond count-based methods ______________________________________ 44 2.3.5 The Fourier transform and the Multitaper method ______________________ 45 2.3.6 Ribosome profiling data analysis ___________________________________ 50 2.3.7 Evolutionary signatures on genomic regions __________________________ 59 2.3.8 Shotgun proteomics data analysis ___________________________________ 61 3 Results _______________________________________________________________ 64 3.1 A novel approach to Ribo-seq data analysis _______________________________ 64 3.1.1 Spectral analysis of P-sites profiles __________________________________ 64 3.1.2 On sensitivity and specificity ______________________________________ 67 3.1.3 The RiboTaper strategy to identify translated ORFs ____________________ 69 3.2 Identification of actively translated ORFs in a human cell line ________________ 73 3.2.1 Known and novel ORFs across a wide expression range. _________________ 73 3.2.2 Distinct evolutionary conservation patterns in different ORF categories _____ 77 3.2.3 The de novo identified translatome as a proxy for the cellular proteome. ____ 80 i 3.3 ORF detection with an improved protocol in Arabidopsis thaliana ____________ 82 3.3.1 Analysis of an improved, high-resolution Ribo-seq protocol ______________ 82 3.3.2 New, ultra-conserved ORFs in non-coding genes _______________________ 86 3.4 Annotating and quantifying the translated transcriptome _____________________ 89 3.4.1 The SaTAnn strategy _____________________________________________ 89 3.4.2 Validating translation quantification _________________________________ 98 3.4.3 Translation on degraded RNA isoforms _____________________________ 100 4 Discussion ___________________________________________________________ 104 5 References ___________________________________________________________ 112 Appendix A: The Slepian Sequences and the multitaper F-test ______________________ 133 Appendix B: Supplementary Materials _________________________________________ 136 B.1: The Ribo-seq protocol in HEK293 ______________________________________ 136 B.2: Ribo-seq and RNA-seq data processing. __________________________________ 137 B.3: Supplementary Figure 1: Metagene analysis for different Ribo-seq datasets ______ 137 B.4: multitaper analysis ___________________________________________________ 139 B.5: QTI-seq analysis ____________________________________________________ 139 B.6: Supplementary Figure 2: The toddler ncORF ______________________________ 139 B.7: Supplementary Table 1: RiboTaper-detected ORFs in Danio rerio. _____________ 140 B.8: Evolutionary conservation analysis ______________________________________ 140 B.9: Mass spectrometry preparation and data analysis. __________________________ 141 B.10: Supplementary Figure 3: Additional statistics about Ribotaper- and Uniprot-only identified peptides. ______________________________________________________ 142 B.11: Ribo-seq data processing in Arabidopsis Thaliana _________________________ 142 B.12: Supplementary Table 2: Mapping statistics for the different libraries analyzed in Arabidopsis thaliana. ____________________________________________________ 143 B.13: Supplementary Table 3: Read lengths and cutoffs used to infer P-sites position in Arabidopsis thaliana. ____________________________________________________ 144 B.14: Protein Alignments _________________________________________________ 144 B.15: Polysome profiling, nuclear-cytoplasmic comparison and 5’end sequencing. ____ 145 B.16: Supplementary Figure 4: Translation quantification on different transcript biotypes. ______________________________________________________________________ 145 Appendix C: List of main software used in this study. _____________________________ 146 List of Publications ________________________________________________________ 147 Acknowledgments _________________________________________________________ 148 ii List of main figures Figure 1: The genetic code ....................................................................................................... 12 Figure 2: The central dogma and its main molecular actors .................................................... 13 Figure 3: An overview of the human transcriptome ................................................................ 16 Figure 4: mRNA nuclear processing. ....................................................................................... 18 Figure 5: Different RNA cytoplasmic fates. ............................................................................ 19 Figure 6: The main steps of the translation process. ................................................................ 21 Figure 7: Ribosomal translocation. .......................................................................................... 23 Figure 8: Functional heterogeneity of the alternative transcriptome. ...................................... 27 Figure 9: Illumina sequencing-by-synthesis approach. ............................................................ 30 Figure 10: The Ribo-seq protocol. ........................................................................................... 34 Figure 11: Shotgun proteomics example workflow. ................................................................ 37 Figure 12: Example from the GENCODE 19 GTF file. .......................................................... 40 Figure 13: A schematic of the Fourier transform. .................................................................... 47 Figure 14: Different window functions and their spectral leakage. ......................................... 48 Figure 15: Example of multitaper PSD estimation. ................................................................. 48 Figure 16: Example of Slepian sequences. ............................................................................... 49 Figure 17: Sub-codon resolution in Ribo-seq data. .................................................................. 51 Figure 18: Evolutionary signatures of genomic regions. ......................................................... 60 Figure 19: Analysis workflow for MS/MS data. ...................................................................... 62 Figure 20: Metagene analysis of 29 nt Ribo-seq reads in HEK293. ........................................ 65 Figure 21: Spectral analysis of individual exonic P-sites profiles. .......................................... 66 Figure 22: Sensitivity and specificity of the multitaper method. ............................................. 67 Figure 23: Comparison between the multitaper method and other windowing approaches. ... 69 Figure 24: The RiboTaper workflow. ...................................................................................... 70 Figure 25: RiboTaper de novo ORF-finding strategy. ............................................................. 71 Figure 26: Schematics of RiboTaper ORFs annotation. .......................................................... 72 Figure 27: RiboTaper-detected ORFs in HEK293, across gene biotypes and expression values. .................................................................................................................................................. 73 Figure 28: ORFs categories identified by RiboTaper. ............................................................. 74 Figure 29: QTI-seq comparison ad between-samples reproducibility.

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    149 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us