Modeling Distributions of Chromosomal Modifications Using Chromosomal Features
Total Page:16
File Type:pdf, Size:1020Kb
Modeling Distributions of Chromosomal Modifications Using Chromosomal Features A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Joshua Adam Baller IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY Chad Myers, Daniel Voytas February 2012 Copyright Joshua A. Baller 2012 Acknowledgements J. Gao played an integral role in the production of the Ty5 and Ty1 datasets. H. Wang, D. Mayhew and R. Mitra made additional Ty5 data available prior to publication. M. Hickman made K. lactis ORC data available prior to publication Also, thanks to J. Carlis for advice on data storage as well as to R. Bushman and N. Milani for advice on data processing and statistical approaches. i This dissertation is dedicated to: My mother, father and sister You have given me a great deal of love and support. This dissertation is the culmination of years of work, long nights and little free time. I wouldn’t have made it to this point without your patience and understanding. My thesis advisors: Dan, Chad and (honorarily) Judy These discoveries were not made in a vacuum. You have been my sounding boards, my strongest allies and my harshest critics. I hope in my own career I can be equal to your examples. And to all my mentors and advisors throughout the years Your encouragement and advice inspired me to push my limits and discover new frontiers. ii Abstract Chromatin plays a major role in the regulation and evolution of genomic DNA. The advent of high-throughput sequencing, and the subsequently increasing availability of sequencing data from chromatin immunoprecipitation experiments, is leading to a comprehensive view of the chromatin landscape in key model organisms such as S. cerevisiae. To date, little has been done to exploit the availability of such data. My work develops a logistic regression based framework capable of dissecting the observed distribution of a particular chromosomal modification. This framework models the observed distribution in terms of other known chromosomal features in the organism. I have applied this approach to the distributions of Ty5 and Ty1 retrotransposons, identifying previously unknown integration patterns. For Ty5, I identified integration, independent of the canonical mechanism, at sites of open DNA. For Ty1, I identified precise integration events on a single surface of nucleosomes found near Polymerase III transcribed genes. Additionally, a similar logistic regression approach was developed to predict origins of replication in terms of nucleosome patterning. This resulted in a 200-fold enrichment for origin sites and over 7000-fold enrichment when ORC occupancy data was considered. Together these studies present a general model capable of utilizing the available chromosomal data to provide either mechanistic models or site predictions in a variety of organisms. iii Table of Contents ACKNOWLEDGEMENTS .......................................................................................................................... I ABSTRACT ................................................................................................................................................ III TABLE OF CONTENTS ........................................................................................................................... IV LIST OF TABLES ..................................................................................................................................... VI LIST OF FIGURES .................................................................................................................................. VII LIST OF ABBREVIATIONS ................................................................................................................. VIII 1. INTRODUCTION ................................................................................................................................ 1 Chromosomes and Chromatin in the Three Domains of Life ................................................................. 3 Detection of DNA Modifications .......................................................................................................... 13 Detection of Chromatin Modifications ................................................................................................. 18 Algorithms in the Literature ................................................................................................................. 19 Model Selection .................................................................................................................................... 21 Sparse Modeling ................................................................................................................................... 24 Applications of the Model ..................................................................................................................... 25 2. CHAPTER 1: TY5 INTEGRATION SITE SELECTION .............................................................. 28 GENERAL OVERVIEW OF RETROTRANSPOSON BIOLOGY ............................................................................ 28 TY5 BACKGROUND .................................................................................................................................... 29 RESULTS .................................................................................................................................................... 30 The Ty5 insertion dataset...................................................................................................................... 30 Ty5’s primary target site bias. .............................................................................................................. 32 Relationships between Ty5 insertions and chromosomal features. ...................................................... 34 Ty5’s secondary target site bias. .......................................................................................................... 38 DISCUSSION ............................................................................................................................................... 42 MATERIALS AND METHODS ....................................................................................................................... 45 Recovery of Ty5 insertions.................................................................................................................... 45 Random control insertions. ................................................................................................................... 45 Data annotation and analysis ............................................................................................................... 46 3. CHAPTER TWO: TY1 INTEGRATION SITE SELECTION ...................................................... 47 TY1 BACKGROUND .................................................................................................................................... 47 RESULTS .................................................................................................................................................... 47 Generating, recovering and mapping Ty1 insertions ........................................................................... 47 Genomic distribution of Ty1 insertions in wild type strains ................................................................. 50 Ty1 insertion at class III genes ............................................................................................................. 52 Logistic regression to identify Ty1 targeting determinants .................................................................. 53 Ty1 insertions and nucleosomes ........................................................................................................... 55 Ty1 insertions and endogenous Ty elements ......................................................................................... 56 Ty1 insertions and class II genes .......................................................................................................... 58 Ty1 insertion patterns in mutant backgrounds ..................................................................................... 59 DISCUSSION ............................................................................................................................................... 63 Ty1 and the nucleosome ....................................................................................................................... 66 iv High throughput mapping of insertion sites in mutant strains ............................................................. 68 MATERIALS AND METHODS ....................................................................................................................... 69 Generating Ty1 insertions .................................................................................................................... 69 DNA sequence processing .................................................................................................................... 70 Data annotation and analysis ............................................................................................................... 71 4. CHAPTER THREE: PREDICTION OF ORIGINS OF REPLICATION .................................... 73 BACKGROUND ........................................................................................................................................... 73 Origins of Replication .........................................................................................................................