Evaluation of a New Method for Large-Scale and Gene-Targeted Next Generation DNA Sequencing in Nonmodel Species
Total Page:16
File Type:pdf, Size:1020Kb
University of Montana ScholarWorks at University of Montana Graduate Student Theses, Dissertations, & Professional Papers Graduate School 2013 Evaluation of a New Method for Large-Scale and Gene-targeted Next Generation DNA Sequencing in Nonmodel Species Ted Cosart The University of Montana Follow this and additional works at: https://scholarworks.umt.edu/etd Let us know how access to this document benefits ou.y Recommended Citation Cosart, Ted, "Evaluation of a New Method for Large-Scale and Gene-targeted Next Generation DNA Sequencing in Nonmodel Species" (2013). Graduate Student Theses, Dissertations, & Professional Papers. 4133. https://scholarworks.umt.edu/etd/4133 This Dissertation is brought to you for free and open access by the Graduate School at ScholarWorks at University of Montana. It has been accepted for inclusion in Graduate Student Theses, Dissertations, & Professional Papers by an authorized administrator of ScholarWorks at University of Montana. For more information, please contact [email protected]. EVALUTATION OF A NEW METHOD FOR LARGE-SCALE AND GENE- TARGETED NEXT GENERATION DNA SEQUENCING IN NONMODEL SPECIES By Ted Cosart BA, University of Montana, Missoula, Montana, 1983 MS, University of Montana, Missoula, Montana, 2006 Dissertation presented in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Individualized, Interdisciplinary Graduate Program The University of Montana Missoula, Montana August, 2013 Approved by: Sandy Ross, Associate Dean of The Graduate School Graduate School Dr. Jesse Johnson, Co-Chair Computer Science Dr. Gordon Luikart, Co-Chair Flathead Biological Station Dr. Jeffrey Good Division of Biological Sciences Dr. William Holben Division of Biological Sciences Dr. Stephen Porcella Rocky Mountain Laboratories, National Institute of Allergy and Infectious Diseases Dr. Alden Wright Computer Science Cosart, Ted, 2013 Evaluation of a New Method for Large-Scale and Gene-targeted Next Generation DNA Sequencing in Nonmodel Species Chairperson or Co-Chairperson: Gordon Luikart, Ph.D. Chairperson or Co-Chairperson: Jesse Johnson, Ph.D. The efficient method called exon capture provides for sequencing genes genome-wide, targeting candidate genes, and sampling specific exons within genes. Although developed for model species with available whole genome sequences, the method can capture exons in nonmodel species using the genomic resources of a related model species. How close the relatives must be for effective exon capture is not known. The work herein demonstrates cross-taxa capture in ungulates, using the domestic cow genome as a reference. It also describes a computer program designed for collecting exon sequences for exon capture, allowing users to set per-gene and overall base pair (bp) limits, and to prefer internal or external exons. Cross-taxa exon capture was tested with subject-reference divergence times from 0 to ~60 million years. Sequencing success decreased with increasing subject-reference phylogenetic divergence. With the domestic cow genome as reference, American bison exons, at 1-2 million years (MY) of divergence, were captured as successfully as those of a domestic cow. The cow and bison captures each yielded sequence from ~80% of the 3.6 million bp targeted. Two bighorn sheep, 7 mule deer, and 4 pigs at about 20, 30, and 60 MY of divergence from the cow, respectively, yielded averages of ~70%, ~60%, and ~55% of the targeted bp. A gene family with many closely related, duplicated loci was expected to show reduced success compared to the whole collection. This prediction was supported, as 63 exons in the MHC gene family sequences yielded 62% fully sequenced in the cow, and 32%, 20%, and 4% for the bighorn, deer, and pigs, respectively. A comparison of two sequence alignment programs showed that Stampy, designed for high sample-reference divergence, was dramatically better than BWA, designed for low divergence, only in the pig capture, in which Stampy yielded ~30% more bp than did BWA. A universal ungulate exon capture array could be developed using the 8,999 exons that were fully sequenced in all species, including the pig at ~60 MY. As this method helps us understand the genetic basis of evolutionary processes, so it can contribute to an informed study and stewardship of our ecological endowment. ii Acknowledgements There are many who were crucial to this collaborative work. My acknowledgements will be of necessity incomplete. I thank the National Science Foundation, and all those who led and taught in the integrated graduate training program, Montana-Ecology of Infectious Diseases. M-EID provided instruction in biology and mathematics that was invaluable preparation for this work. My committee members were all important to the work presented here. Dr. Gordon Luikart had the crucial insight into the potential of cross-taxa exon capture for nonmodel wildlife species, and was the chief instigator of all the next-generation genomics work in which I participated. He was also a guide through many biological concepts, and gave support and shared his expertise freely and with enthusiasm. Dr. Jesse Johnson has been a valued advisor and guide throughout my time in graduate school, as well as a mentor in computer science and mathematics. Dr. Bill Holben supported me both through the M-EID program and welcomed me into his lab as I began my studies. Dr. Alden Wright has also been a longtime mentor in computer science and mathematics, and his enthusiasm for genomics has been infectious and motivating. My genomics education and this research depend crucially on Dr. Steve Porcella, who welcomed me into a volunteership at Rocky Mountain Laboratories. He and the Genomics Unit were, to say the least, generous with time, resources, and advice. The research central to my dissertation would not have been possible without them. I am particularly grateful to Dr. Craig Martens for mentoring me in the analysis of next- generation sequence data. I am also grateful to Albano Beja-Pereira, both for his collaboration and support through the Centro de Investigação em Biodiversidade e Recursos Genéticos (CIBIO). I thank all my coauthors and collaborators upon whom the genomics in this dissertation so heavily depends. Finally I thank my parents, Carolyn Cosart and the late Don Cosart. Besides their unflagging support, my brothers and I have also benefited incalculably from their belief in education and, more generally, the value of intellectual pursuits. iii Table of Contents Acknowledgements ............................................................................................................ iii List of Figures ................................................................................................................... vii List of Tables .................................................................................................................... vii Chapter 1: Introduction ...................................................................................................... 1 DNA markers for population genetics/genomics ............................................................ 2 RADs and transcriptomes ................................................................................................ 3 Exon capture .................................................................................................................... 3 Next generation sequencing ............................................................................................ 4 Genomics technology transferred to nonmodel species .................................................. 5 Exon target selection for array design ............................................................................. 6 From reads to genotypes ................................................................................................. 7 Types of read mappers ................................................................................................ 8 Quantifying uncertainty in mapping............................................................................ 9 Consensus genotyping ................................................................................................. 9 Quantifying uncertainty in genotyping...................................................................... 10 Multiple metrics for each genotype call .................................................................... 11 Chapter overviews ......................................................................................................... 11 Chapter 2 ................................................................................................................... 11 Chapter 3 ................................................................................................................... 11 Chapter 4 ................................................................................................................... 12 Supplementary work and appended publications .......................................................... 12 Microbial community DNA analysis ........................................................................... 13 Volunteership at the NIH Rocky Mountain Laboratories .......................................... 13 References ..................................................................................................................... 14 Chapter 2: A Computer Program for Genome-Wide and Candidate Gene Exon Sampling for Targeted Next-Generation Sequencing ......................................................................