Thesis Final
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF CALIFORNIA SANTA CRUZ IMPROVING GENE ANNOTATION AND UNDERSTANDING OF MAMMALIAN EVOLUTION BY IDENTIFYING RETROCOPIES OF MRNA TRANSCRIPTS A dissertation submitted in partial satisfaction of the requirements for the degree of DOCTOR OF PHILOSOPHY in BIOINFORMATICS By Robert D. Baertsch June 2010 The Dissertation of Robert D. Baertsch Is approved: ________________________________ Professor David Haussler, Chair ________________________________ W. James Kent ________________________________ Professor John Tamkun Table of Contents ABSTRACT ................................................................................................................................................V DEDICATION.........................................................................................................................................VII ACKNOWLEDGEMENTS.................................................................................................................VIII 1. BACKGROUND .....................................................................................................................................1 Introduction to Transposable Elements.............................................................................................2 Transposable elements.................................................................................................................................... 3 Types of non-retroviral retrotransposons ...................................................................................................... 4 Processed pseudogenes and retrogene........................................................................................................... 5 Dating of retrotransposition events................................................................................................................ 6 Overview of thesis ...............................................................................................................................7 2. RETROCOPY CONTRIBUTIONS TO THE EVOLUTION OF THE HUMAN GENOME ...8 ABSTRACT .................................................................................................................................................8 Background..........................................................................................................................................8 Results ..................................................................................................................................................8 Conclusion ...........................................................................................................................................9 BACKGROUND .........................................................................................................................................10 RESULTS ..................................................................................................................................................13 Retrocopies with strong evidence of expression .............................................................................14 Major categories of retrocopy contributions ..................................................................................15 Comparison with other datasets.......................................................................................................16 Overview of types of events whose expression was strongly supported ........................................18 Type I – genes modified by exon acquisition...................................................................................20 Type I exon acquisition in reverse orientation................................................................................22 Type II duplication events.................................................................................................................26 Functional categories of source genes ............................................................................................33 DISCUSSION.............................................................................................................................................33 CONCLUSION ...........................................................................................................................................37 METHODS ................................................................................................................................................37 Algorithm for finding candidate retrocopies in the human genome ..............................................37 Score Function ..................................................................................................................................40 Description of Features ....................................................................................................................41 Filtering Alignments .........................................................................................................................44 Determining the Parent Loci............................................................................................................45 Resolving Conflicts from Multiple Parent Genes............................................................................45 Filtering out Zinc Finger, Mitrochondrial and Immunoglobin Genes ..........................................46 Determining Expression using mRNA and EST Evidence..............................................................46 Classification of Retrocopies according to Evolutionary Event....................................................47 Species Comparisons ........................................................................................................................48 AUTHORS' CONTRIBUTIONS ....................................................................................................................50 3. BURST OF EXAPTED HUMAN RETROGENES NOT FOUND IN MOUSE AND IMPROVED METHODS FOR IDENTIFYING RETROCOPIES..................................................51 ABSTRACT ...............................................................................................................................................51 BACKGROUND .......................................................................................................................................52 MOTIVATION ...........................................................................................................................................52 DEFINITIONS............................................................................................................................................53 PREVIOUS WORK ON PSEUDOGENE AND RETROGENE ANNOTATION .....................................................54 PREVIOUS WORK ON RATE OF GENE DUPLICATION IN PRIMATES AND NON-PRIMATES ........................55 ALGORITHM ...........................................................................................................................................57 OVERVIEW...............................................................................................................................................57 FEATURES................................................................................................................................................58 Exons-spliced feature........................................................................................................................58 Conserved splice-site feature ...........................................................................................................59 Intron feature.....................................................................................................................................59 Break-in-orthology feature ...............................................................................................................60 Poly(A) feature ..................................................................................................................................61 SCORE FUNCTION ....................................................................................................................................61 TOOLS........................................................................................................................................................61 SOURCE CODE AND SCRIPTS ...................................................................................................................62 WEB PAGES FOR DISPLAYING RESULTS .................................................................................................62 THE RETROGENE TRACK IN THE UCSC GENOME BROWSER .................................................................62 RESULTS AND DISCUSSION ..............................................................................................................65 DATASETS FOR BENCHMARKING ............................................................................................................65 Comparison with Vega processed pseudogenes..............................................................................65 Comparison with Yale processed pseudogenes...............................................................................66 Comparison with Virgina Tech retrogenes .....................................................................................68 Results compared with UCSC Known Genes and expression........................................................68 Age Distribution ................................................................................................................................72