THE PENNSYLVANIA STATE UNIVERSITY
SCHREYER HONORS COLLEGE

DEPARTMENT OF BIOLOGY

THE ORIGIN AND EVOLUTION OF INVERSION BREAKPOINT SEQUENCES ON THE THIRD CHROMOSOME OF DROSOPHILA PSEUDOOBSCURA

HALEY ELIZABETH RANDOLPH
SPRING 2016

A thesis submitted in partial fulfillment of the requirements for a baccalaureate degree in Biology with honors in Biology

Reviewed and approved* by the following:

Stephen W. Schaeffer
Professor of Biology
Thesis Supervisor

James H. Marden
Professor of Biology
Honors Adviser

* Signatures are on file in the Schreyer Honors College.

ABSTRACT

The chromosomal arrangements of Drosophila pseudoobscura provide an excellent model system to understand the mechanisms that generate novel chromosomal inversions. D. pseudoobscura has over thirty different chromosomal arrangements generated by a series of overlapping paracentric inversions on the third chromosome. Two major models of rearrangement are postulated: repeat-mediated events and staggered-cut single-stranded break events. Analyzing the DNA sequences at breakpoints of both the ancestral and derived arrangements of inversion mutations will likely reveal the apparent mechanism that led to the formation of the chromosomal arrangements. To study the mechanisms of rearrangement, we looked at three different inversion mutations in D. pseudoobscura. The locations of the breakpoint sequences were determined from three kilobase large-insert mate-pair libraries whose reads were generated on an Illumina HiSeq 2000 sequencing platform. Sequences were assembled from short-insert paired-end libraries, and a gene flanking the breakpoint served as a starting point for sequence assembly. We have determined the ancestral and derived breakpoint sequences of the Standard to Pikes Peak mutation, the derived breakpoint sequences of the Santa Cruz to Tree Line mutation, and the ancestral breakpoint sequences of the Standard to Arrowhead mutation.
Bioinformatics investigation suggests that short, common repeat sequences of about 100 base pairs comprise a main portion of the breakpoint regions. These repeat sequences are enriched within all breakpoint sequences analyzed, suggesting that these shared repeats play a general role in the inversion mutational process. No evidence for boundary gene duplication or transposable element involvement exists in the breakpoint sequences, supporting the repeat-mediated model of chromosomal arrangement formation.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1 Introduction
    1.1 History of Chromosomal Rearrangements in Drosophila
    1.2 Chromosomal Arrangements in Drosophila pseudoobscura
    1.3 Phylogeny of the Arrangements
    1.4 Rearrangement Models
    1.5 Methodology of Mapping Breakpoint Sequences
Chapter 2 Methods
    2.1 Inversion Breakpoint Discovery and Assembly
    2.2 Bioinformatics Analysis
Chapter 3 Results
    3.1 Standard to Arrowhead Inversion Mutation
    3.2 Standard to Pikes Peak Inversion Mutation
    3.3 Santa Cruz to Tree Line Inversion Mutation
Chapter 4 Discussion
Appendix A Inversion Breakpoint Sequences
    A.1 Proximal Ancestral STAR
    A.2 Distal Ancestral STAR
    A.3 Proximal Derived STAR
    A.4 Distal Derived STAR
    A.5 Proximal Ancestral STPP
    A.6 Distal Ancestral STPP
    A.7 Proximal Derived STPP
    A.8 Distal Derived STPP
    A.9 Proximal Derived SCTL
    A.10 Distal Derived SCTL
BIBLIOGRAPHY

LIST OF FIGURES

Figure 1. Model of inversion mutation that leads to reversal of the sequence contained within the breakpoint regions

Figure 2. Clinal distributions of five arrangements found in relatively high frequencies throughout the southwest United States (adapted from ZL Fuller)

Figure 3. Banding pattern comparison between Giemsa-stained Standard and Arrowhead arrangements (photomicrographs by SW Schaeffer)

Figure 4. Chiricahua/Cuernavaca inversion heterozygote; blue outlines the Cuernavaca homolog, and red outlines the Chiricahua homolog (photomicrograph by SW Schaeffer)

Figure 5. Phylogenetic relationships of the chromosomal arrangements: Hypothetical (HY), Standard (ST), Pikes Peak (PP), Arrowhead (AR), Santa Cruz (SC), Cuernavaca (CU), Chiricahua (CH), and Tree Line (TL)

Figure 6. Repeat-mediated model of chromosomal rearrangement in which blue arrows represent inverted repeat sequence motifs that can pair and exchange via a crossover event

Figure 7. Staggered-cut single-stranded break model of chromosomal rearrangement in which single-strand DNA breaks occur offset from one another in an inversion loop, and subsequent resolution of the inversion loop leads to an inversion mutation

Figure 8. Mapping of the STPP inversion breakpoints (a) Method of large-insert mate-pair and short-insert paired-end Pikes Peak library creation and breakpoint discovery (b) Pikes Peak large-insert mate-pair library mapped to Arrowhead reference

Figure 9. IGV plots of large-insert mate-pair Pikes Peak reads mapped to the Arrowhead reference genome; reads shown in red are those that straddle the breakpoint and thus map >3 Kb from the mate-pair partner (a) Determination of the proximal Pikes Peak breakpoint location (b) Determination of the distal Pikes Peak breakpoint location

Figure 10. Standard to Arrowhead inversion mutation represented on a cytological map

Figure 11. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STAR breakpoints (b) Repeat analysis of the derived proximal and distal STAR breakpoints

Figure 12. Standard to Pikes Peak inversion mutation represented on a cytological map

Figure 13. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity (a) Repeat analysis of the ancestral proximal and distal STPP breakpoints (b) Repeat analysis of the derived proximal and distal STPP breakpoints

Figure 14. Santa Cruz to Tree Line inversion mutation represented on a cytological map

Figure 15. The black histograms represent the frequency that a BLAST High-scoring Segment Pair included a particular nucleotide in BLASTN comparison of each breakpoint to the entire genome; repeat blocks in red have >90% similarity, repeat blocks in orange have >80% similarity, repeat