Prediction of Novel Bioactive Micropeptides in the Immune System
Fengyuan Hu
The Babraham Institute
St Edmund’s College, University of Cambridge
June 2020
Submitted for the degree of Doctor of Philosophy at the University of Cambridge

Declaration of Originality
This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the Preface and specified in the text. It is not substantially the same as any that I have submitted, or, is being concurrently submitted for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or, is being concurrently submitted for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. It does not exceed the prescribed word limit for the relevant Degree Committee.
Fengyuan Hu

Table of Contents
Acknowledgements
Abstract
Abbreviations
1 Introduction
1.1 Roles of bioactive peptides and small proteins in the immune system
1.2 Small open reading frames (smORFs) and micropeptides
1.3 Identification of smORFs and micropeptides
Approaches to identify protein-coding smORFs
Bioinformatics
Transcriptomics
Proteomics
1.4 Functional characterization of micropeptides
1.5 smORF and micropeptide categories
Canonical smORFs
Upstream ORFs
Downstream smORFs
smORFs in non-coding RNAs
1.6 Importance of smORFs to health and disease
1.7 Research objectives
2 Materials and Methods
2.1 Datasets
2.2 Reference genome, transcriptome and annotation
2.3 Identifying putative smORFs
2.4 Sequencing data processing
2.5 ORF discovery
2.6 Analysis of predicted smORFs
2.7 Plasmid design and smORF cloning
3 Computational Pipeline to Predict Actively Translated smORFs
3.1 Summary
3.2 Overview of the pipeline
3.3 Prediction of putative smORFs
3.4 Sequencing data QC and processing
Adapter trimming
Contaminant removal
FastQ screen
Sequence alignment to the reference genome
Metagene analysis
RPF mapping rules and P-site offsets calculation
Read phasing in Ribo-Seq
Transcript expression estimation
3.5 ORF discovery
ORF discovery steps
3.6 Pipeline output
3.7 Results
Ribosome profiling data quality
Missing rRNA reference sequences
Genome alignment
Determine P-site offset and sub-codon phasing
Sufficient read coverage to predict smORFs
Reproduce the published smORFs
Comparison between ORFLine and RiboCode
3.8 Pipeline availability
3.9 Discussion
4 Properties of smORFs and Functional Validation of Micropeptide
4.1 Summary
4.2 Predicted smORFs
4.3 smORF classification
4.4 Start codon usage in smORFs
4.5 smORF conservation
4.6 Canonical smORFs