[Thesis Title Goes Here]
Total Page:16
File Type:pdf, Size:1020Kb
ORIGIN AND EVOLUTION OF EUKARYOTIC GENE SEQUENCES DERIVED FROM TRANSPOSABLE ELEMENTS A Dissertation Presented to The Academic Faculty by Jittima Piriyapongsa In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Bioinformatics in the School of Biology Georgia Institute of Technology August 2008 ORIGIN AND EVOLUTION OF EUKARYOTIC GENE SEQUENCES DERIVED FROM TRANSPOSABLE ELEMENTS Approved by: Dr. I. King Jordan, Advisor Dr. John McDonald School of Biology School of Biology Georgia Institute of Technology Georgia Institute of Technology Dr. Mark Borodovsky Dr. Leonid Bunimovich Department of Biomedical Engineering School of Mathematics Georgia Institute of Technology and Emory Georgia Institute of Technology University Dr. Jung Choi School of Biology Georgia Institute of Technology Date Approved: May 29, 2008 To my family... ACKNOWLEDGEMENTS First of all, I would like to take this opportunity to thank my advisor, Dr. I. King Jordan, for his constant guidance, motivation, and support. Without him, I would not have come this far. His knowledge, experience, and insights have been very influential in my research. Thank you so much for a great experience. Also, I would like to thank Dr. Mark Borodovsky, my first mentor here at Georgia Tech. I received much knowledge and experience from him as I worked in his lab for two years as a PhD student. The skills and scientific thought process I learned have been influential in my research. I would also like to thank my thesis committee members for their advice, time and expertise throughout this entire thesis process. Special thanks to my lab mates, Lee Katz and Ahsan Huda, for their friendships and for being supportive in every way. I also thank all Jordan’s lab members for their help and support. Thanks to my fellow graduate students, Burcu Bakir and Navin Elango, for being my good friends since the first day we met. The five years here have been quite an experience and you all have made it a memorable time of my life. I would like to especially thank my parents for their endless love, and support in all my efforts, and for giving me the foundation to be who I am. Without their encouragement and dedication, I would not be at this point. Even though my father is not here anymore, I always have him in my heart and I know that he would be proud of me. To my sisters - Yoo and Hui - thank you for always being there for me. Special thanks to my close friend, Nantachai Kantanantha (Lek), who encourages and supports me in every way since my first day as a PhD student. Thanks for always being here for me no matter what. I am also thankful to my Thai friends here - Gib, Nu, Oh, Chat, P’Golf, Jan, Chi, iv Tam, Ter, P’Term for having dinner and sharing stories with me every Fridays. Also, I really enjoy every trips and every fun activites we had together here. Thank you to all my friends in Thailand for their support and understanding. Finally, I would like to thank other people who deserve the gratitude but I have not mentioned here. To them, please accept my apology and thank you. v TABLE OF CONTENTS Page ACKNOWLEDGEMENTS iv LIST OF TABLES x LIST OF FIGURES xii LIST OF SYMBOLS AND ABBREVIATIONS xv SUMMARY xviii CHAPTER 1 INTRODUCTION AND LITERATURE REVIEW 1 Overview of Transposable Elements (TEs) 1 Transposable elements: “junk DNA” or “genetic treasure” 4 Contribution of transposable elements to host protein coding sequences 6 Impact of transposable elements on the evolution of host gene regulation 11 Introduction to microRNAs (miRNAs) and small interfering RNAs (siRNAs) 17 Analysis of origin and evolution of gene sequences derived from TEs 20 2 EXONIZATION OF THE LTR TRANSPOSABLE ELEMENTS IN HUMAN GENOME 22 Abstract 22 Introduction 23 Results and Discussion 25 Updated list of LRTS-associated genes 25 Distribution of LRTS in human exons 26 LRTS-derived protein coding exons 29 Contribution of LRTS to gene transcripts 30 vi Reconstruction of evolution of IL22RA2 gene (transcript variant 1) 31 Conclusions 37 Methods 37 Bioinformatic analysis 37 PCR amplification of the IL22RA2 target exon 39 Acknowledgements 40 3 EVALUATING THE PROTEIN CODING POTENTIAL OF EXONIZED TRANSPOSABLE ELEMENT SEQUENCES 41 Abstract 41 Background 42 Results and Discussion 47 Searching for TE-associated proteins 47 Comparative analysis of cases of TE-CDS exaptation 54 Case studies of known TE-derived genes 58 Evolutionary relationship between TE and cellular proteins 59 Protein coding potential of TE-derived exons 62 GC codon distribution for TE-derived exons 68 Conclusions 70 Methods 72 Detection of TE-encoded protein fragments 72 Analysis of known cases of TE-derived proteins (genes) 75 Evolutionary analysis of TE-associated protein domain 75 Codon based analysis of TE-derived exons 76 Acknowledgements 78 4 ORIGIN AND EVOLUTION OF HUMAN MICRORNAS FROM TRANSPOSABLE ELEMENTS 79 vii Abstract 79 Introduction 80 Materials and Methods 83 Detection 83 Evolution 84 Regulation and function 85 TE-miRNA prediction 85 Results 86 Transposable-element-derived miRNAs 86 Evolution of TE-derived miRNAs 92 Regulation and function 96 Prediction of novel TE-derived miRNAs 101 Discussion 106 Abundance of TE-derived miRNAs 106 Conservation of TE-derived miRNAs 107 Lineage-specific effects of TE-derived miRNAs 109 Genome defense and global gene regulatory mechanisms 110 5 A FAMILY OF HUMAN MICRORNA GENES FROM MINIATURE INVERTED-REPEAT TRANSPOSABLE ELEMENTS 112 Abstract 112 Introduction 113 Methods 116 TE-miRNA sequence analysis 116 Regulatory analysis 117 Results and Discussion 119 A TE-derived miRNA gene family 119 viii Regulatory effects of hsa-mir-548 125 Conclusions 133 6 DUAL CODING OF SIRNAS AND MIRNAS BY PLANT TRANSPOSABLE ELEMENTS 137 Abstract 137 Introduction 138 Results 142 Discussion 149 Materials and Methods 154 7 CONCLUSION 156 APPENDIX A: SUPPLEMENTARY INFORMATION FOR CHAPTER2 162 APPENDIX B: SUPPLEMENTARY INFORMATION FOR CHAPTER3 173 APPENDIX C: SUPPLEMENTARY INFORMATION FOR CHAPTER4 177 APPENDIX D: SUPPLEMENTARY INFORMATION FOR CHAPTER5 211 APPENDIX E: SUPPLEMENTARY INFORMATION FOR CHAPTER6 233 PUBLICATIONS 256 REFERENCES 257 VITA 297 ix LIST OF TABLES Page Table 1.1: Genome sizes and transposable element proportions for eukaryotic species 2 Table 1.2: Protein-coding host genes domesticated from TEs 9 Table 1.3: Vertebrate regulatory elements generated by TEs 12 Table 2.1: The distribution of the number of LTR elements (either partial or full elements) containing in an exon 27 Table 2.2: The distribution of the extent of overlap between an exon and an LTR element 28 Table 2.3: The distribution of type of exons containing LRTS 28 Table 2.4: The distribution of class/family of LRTS containing in an exon 28 Table 3.1: Detection of TE-encoded sequences in PDB proteins 48 Table 3.2: Detection of TE-encoded sequences in Swiss-Prot directly sequenced proteins 49 Table 3.3: Sequence similarity program-query-database combinations used to search for TE-related host sequences 51 Table 3.4: Classification of proteins containing TE-associated Pfam domains detected by the GA and TC cutoffs of HMMER 55 Table 3.5: Analysis of the qualified set of TE-associated domain containing proteins 57 Table 3.6: Detection of previously identified TE-associated proteins 59 Table 3.7: Comparison of protein coding potential for Alu-derived exons versus other TE-derived exons 66 Table 3.8: Comparison of GC2/GC3 ratios for different classes of TE-derived and non TE genes (exons) 70 Table 4.1: TE-derived human miRNAs 88 Table 4.2: Putative TE-derived miRNA paralogs 91 Table 4.3: Human-mouse orthologous miRNAs derived from L2 and MIR TEs 95 Table 4.4: Predicted TE-derived miRNA genes 102 x Table 5.1: Made1-derived miRNA genes in the human genome 120 Table 6.1: Plant miRNA genes derived from TEs 143 Table A.1: Features of LRTS-derived protein coding exons 162 Table B.1: List of 124 TE-associated Pfam protein domains 173 Table C.1: mRNAs anticorrelated with hsa-mir-130b and their associated GO terms 177 Table C.2: Conserved RNA secondary structures that co-locate with human TE sequences 178 Table D.1: Made1 homologous human expressed sequence tags (ESTs) 211 Table D.2: Over-represented GO biological process categories among genes with Made1- derived hsa-mir-548 target sites 214 Table D.3: Over-represented GO biological process categories among genes with miRanda predicted hsa-mir-548 target sites that map to colorectal cancer down-regulated co-expression clusters 215 Table D.4: Putative hsa-mir-548 target genes previously implicated as being involved in colorectal cancer by microarray expression profiling 222 Table E.1: TE-derived miRNAs 233 xi LIST OF FIGURES Page Figure 1.1: Classes of TEs found in human genome and their characteristics 4 Figure 2.1: Exon-intron organization of human IL22RA2 gene 33 Figure 2.2: PCR-sequencing 34 Figure 2.3: Multiple sequence alignment of PCR products 35 Figure 2.4: Evolutionary history of IL22RA2 gene 36 Figure 3.1: Sensitivity and selectivity comparison for different sequence similarity search methods 52 Figure 3.2: Relationships among sequence similarity search methods 54 Figure 3.3: Phylogenetic relationship of TE and cellular THAP domains 61 Figure 3.4: Coding probability of human CCDS genes 65 Figure 3.5: Coding probability of genes with Alu-derived exons 67 Figure 3.6: The GC composition of human CCDS genes 69 Figure 4.1: Percentage of TE-derived residues in miRNA genes 88 Figure 4.2: Percentage of TE sequences among different classes and families for the human genome and for TE-derived miRNA genes