Plastome Phylogenomics in the Genus Pinus Using Massively Parallel Sequencing Technology
AN ABSTRACT OF THE DISSERTATION OF

Matthew Benjamin Parks for the degree of Doctor of Philosophy in Botany and Plant Pathology presented on May 26, 2011.

Title: Plastome Phylogenomics in the Genus Pinus Using Massively Parallel Sequencing Technology.

Abstract approved: Aaron I. Liston, Richard C. Cronn

This thesis summarizes work completed over the previous four years, focusing primarily on chloroplast phylogenomic inquiry into the genus Pinus and related Pinaceae outgroups using next-generation sequencing on Illumina platforms. Throughout this work, Illumina read lengths were essentially limited to 25 to 100 base pairs, which made it challenging to assemble genomic regions that are repetitive or divergent from established reference genomes. Our assemblies initially relied on previously constructed high-quality plastome sequences for each of the two Pinus subgenera, and we found a clear decline in assembly success as divergence from the reference sequences increased. This was most evident in assemblies of Pinaceae outgroups, but the trend was also apparent within Pinus subgenera. To counter this problem, we combined de novo and reference-guided assembly approaches, which allowed us to assemble highly divergent regions more effectively. From a biological standpoint, our initial focus was on increasing phylogenetic resolution by using nearly complete plastome sequences from selected Pinus and Pinaceae outgroup species. This effort greatly increased phylogenetic resolution, as evidenced by a nearly 60-fold increase in parsimony-informative positions compared with previous datasets composed of only a few chloroplast loci. In addition, bootstrap support across the resulting phylogenetic tree was consistently high, with ≥95% bootstrap support at 30 of 33 ingroup nodes in the maximum likelihood analysis.
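In the usual cladistic sense, an alignment column is parsimony-informative when at least two character states each occur in at least two taxa. The following is a minimal illustrative sketch of counting such columns in a set of aligned sequences; it is not the pipeline used in this work, and it assumes (as a hypothetical convention) that gaps, '?' and 'N' are treated as missing data while other IUPAC ambiguity codes are counted as ordinary states.

```python
from collections import Counter
from typing import Iterable, Sequence

# Assumption for illustration only: gaps, '?' and 'N' are treated as missing data.
MISSING = frozenset("-?N")


def is_parsimony_informative(column: Iterable[str]) -> bool:
    """True if at least two character states each occur in at least two taxa."""
    counts = Counter(c.upper() for c in column if c.upper() not in MISSING)
    states_shared_by_two_or_more = sum(1 for n in counts.values() if n >= 2)
    return states_shared_by_two_or_more >= 2


def count_pi_sites(alignment: Sequence[str]) -> int:
    """Count parsimony-informative columns in equal-length aligned sequences."""
    if not alignment:
        return 0
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    # zip(*alignment) yields one tuple of characters per alignment column.
    return sum(is_parsimony_informative(col) for col in zip(*alignment))


if __name__ == "__main__":
    toy = ["AAGG", "AAGT", "ACTT", "ACTA"]  # hypothetical 4-taxon alignment
    print(count_pi_sites(toy))
```

On this hypothetical toy alignment, only the second and third columns contain two states that are each shared by two taxa, so the script prints 2.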
We also found a positive correlation between the amount of sequence data applied to the phylogeny and overall bootstrap support, although the trends indicated that some nodes would likely remain recalcitrant even with complete plastomes. This correlation was important to demonstrate because it mirrored trends seen in a meta-analysis of contemporary infrageneric chloroplast-based phylogenies. The meta-analysis also indicated that most such studies rely on relatively small regions of the chloroplast genome and obtain limited resolution and support in the resulting phylogenies. Clearly, applying plastome sequences to these types of analyses has great potential for improving our understanding of evolutionary relationships at low taxonomic levels. An unexpected finding of this work involved two putative protein-coding regions in the chloroplast, ycf1 and ycf2, which showed strongly elevated mutation rates and together accounted for over half of the exon parsimony-informative sites while making up only 22% of exon sequence length. Of these two loci, ycf1 was clearly the more difficult to assemble from short-read data because it contains numerous indels and several repetitive regions. We designed primers in conserved regions that allow essentially complete amplification of this locus and used Sanger sequencing to obtain ycf1 for a representative of each of the 11 Pinus subsections, using accessions from the previous study. Importantly, these primers were also effective across Pinaceae and should facilitate future work throughout the family. Accessions with full ycf1 sequences then served as subsectional references as we sequenced and assembled plastomes for most of the remaining Pinus species.
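The ycf1/ycf2 observation compares two proportions: the loci's share of exon parsimony-informative (PI) sites and their share of exon length. Dividing the first by the second gives how many times denser in PI sites the focal loci are than exon sequence as a whole; with the figures reported here (over half of PI sites, 22% of length), that ratio exceeds 0.5 / 0.22, or roughly 2.3. A minimal sketch of the arithmetic follows; the function name, input layout and toy counts are hypothetical, and real per-locus counts would come from a separate PI-site tally of each exon alignment.

```python
from typing import Dict, Iterable, Tuple


def pi_enrichment(
    loci: Dict[str, Tuple[int, int]], focal: Iterable[str]
) -> Tuple[float, float, float]:
    """Share of PI sites, share of length, and their ratio for the focal loci.

    `loci` maps a locus name to (parsimony-informative sites, aligned length in bp).
    A ratio above 1 means the focal loci are denser in PI sites than the
    exon set as a whole.
    """
    focal = set(focal)
    missing = focal.difference(loci)  # iterating a dict yields its keys
    if missing:
        raise KeyError(f"unknown loci: {sorted(missing)}")
    total_pi = sum(pi for pi, _ in loci.values())
    total_len = sum(bp for _, bp in loci.values())
    focal_pi = sum(loci[name][0] for name in focal)
    focal_len = sum(loci[name][1] for name in focal)
    pi_share = focal_pi / total_pi
    len_share = focal_len / total_len
    return pi_share, len_share, pi_share / len_share


if __name__ == "__main__":
    # Hypothetical toy counts, chosen only to mirror the reported proportions;
    # they are not data from this dissertation.
    toy = {"ycf1": (30, 10), "ycf2": (20, 12), "other_exons": (50, 78)}
    print(pi_enrichment(toy, ["ycf1", "ycf2"]))
```

With these toy counts the focal loci hold 50% of PI sites in 22% of the length, giving a ratio of about 2.27.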
To produce these plastome sequences efficiently, we relied on a solution-based hybridization strategy developed by Richard Cronn to enrich total genomic DNA preparations for chloroplast-specific DNA. While the phylogenetic results of a full-plastome, full-genus analysis were certainly of interest, our final focus was on investigating ‘noise’ in the dataset and whether it affected phylogenetic conclusions drawn from the plastome. To address this, we explored removing variable sites from the alignment and examined the resulting effects on topology and resolution. This allowed us to identify a window of alignment partitions in which nodal bootstrap support remained high across the genus, yet enough noise had been removed to reveal important patterns in the placement of three clades whose phylogenetic positions have historically been problematic.

©Copyright by Matthew Benjamin Parks
May 26, 2011
All Rights Reserved

Specific Chapter Copyrights

Chapter II. ©Copyright Acta Horticulturae. Parks, M., Liston, A. and Cronn, R. 2010. Meeting the challenges of non-referenced genome assembly from short-read sequence data. Acta Hort. (ISHS) 859:323-332. http://www.actahort.org/books/859/859_38.htm

Chapter IV. ©Copyright American Journal of Botany. Parks, M., Liston, A. and Cronn, R. 2011. Newly developed primers for complete ycf1 amplification in Pinus (Pinaceae) chloroplasts with possible family-wide utility. American Journal of Botany (in press).

Plastome Phylogenomics in the Genus Pinus Using Massively Parallel Sequencing Technology

by
Matthew Benjamin Parks

A DISSERTATION

submitted to

Oregon State University

in partial fulfillment of
the requirements for the
degree of

Doctor of Philosophy

Presented May 26, 2011
Commencement June 2012

Doctor of Philosophy dissertation of Matthew Benjamin Parks presented on May 26, 2011.
APPROVED:

Co-Major Professor, representing Botany and Plant Pathology

Co-Major Professor, representing Botany and Plant Pathology

Chairperson of the Department of Botany and Plant Pathology

Dean of the Graduate School

I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request.

Matthew Benjamin Parks, Author

ACKNOWLEDGEMENTS

I would like to express my sincere appreciation for the support of many friends and colleagues, both within the Department of Botany and Plant Pathology and elsewhere. I would also like to thank my committee members, Joseph Spatafora, Dee Denver and Dave Myrold, for their input and advice, as well as Chris Sullivan of the CGRB, whose expertise in computing and computing infrastructure has been invaluable. Special thanks are due to my family for their unfailing support, including my father and stepmother, Jerry and Lin Parks, my brother, Nathan Parks, and my fiancée, Ariadne Luh. This work also would not have been possible without the continual support of my advisors, Aaron Liston and Richard Cronn; I hope their effort and enthusiasm for my work, and their constructive critique of it, have paid dividends. Finally, this thesis is dedicated to my father and to the memory of my mother, Ann Huenemann Parks, who instilled in me integrity, honesty, and a great desire for education.

CONTRIBUTION OF AUTHORS

Dr. Aaron Liston and Dr.
Richard Cronn were deeply involved in all aspects of the research presented in this thesis, including study design, the development of novel laboratory and bioinformatic procedures, data collection and analysis, and the editing of manuscripts. Without their input and efforts, this work would not have been possible.

TABLE OF CONTENTS

Chapter I. General Introduction
    LITERATURE CITED
Chapter II. Meeting the Challenges of Non-Referenced Genome Assembly from Short-Read Sequence Data
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
    LITERATURE CITED
Chapter III. Increasing Phylogenetic Resolution at Low Taxonomic Levels Using Massively Parallel Sequencing of Chloroplast Genomes
    MATERIALS AND METHODS
    RESULTS
    DISCUSSION
    LITERATURE CITED
Chapter IV. Newly Developed Primers for Complete ycf1 Amplification in Pinus (Pinaceae) Chloroplasts with Possible Family-Wide Utility
    METHODS AND RESULTS
    CONCLUSIONS
    LITERATURE CITED
Chapter V. Separating the Wheat from the Chaff: Mitigating the Effects of Noise in a Chloroplast Phylogenomic Dataset
    MATERIALS AND METHODS