Characterization of Genomic Diversity at a Quantitative Disease Resistance Locus in Maize Using Improved Bioinformatic Tools

CHARACTERIZATION OF GENOMIC DIVERSITY AT A QUANTITATIVE DISEASE RESISTANCE LOCUS IN MAIZE USING IMPROVED BIOINFORMATIC TOOLS FOR TARGETED RESEQUENCING

by

Felix Francis

A dissertation submitted to the Faculty of the University of Delaware in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Bioinformatics and Systems Biology

Spring 2018

© 2018 Felix Francis
All Rights Reserved

Approved: Cathy H. Wu, Ph.D.
Chair of Bioinformatics & Computational Biology

Approved: Mark Rieger, Ph.D.
Dean of the College of Agriculture and Natural Resources

Approved: Ann L. Ardis, Ph.D.
Senior Vice Provost for Graduate and Professional Education

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Randall J. Wisser, Ph.D.
Professor in charge of dissertation

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: J. Antoni Rafalski, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Shawn W. Polson, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Blake C. Meyers, Ph.D.
Member of dissertation committee

I certify that I have read this dissertation and that in my opinion it meets the academic and professional standard required by the University as a dissertation for the degree of Doctor of Philosophy.
Signed: Li Liao, Ph.D.
Member of dissertation committee

ACKNOWLEDGEMENTS

I would like to express my deep and sincere gratitude to my dissertation advisor, Dr. Randall J. Wisser, for the opportunity to pursue research in his lab. The experience has helped me become a better scientist. In particular, I thank him for giving me the independence to explore my research ideas; through this experience I have learned a great deal, especially the importance of perseverance when dealing with challenging research problems.

I am extremely grateful to my dissertation committee members, Dr. Blake C. Meyers, Dr. J. Antoni Rafalski, Dr. Shawn W. Polson, and Dr. Li Liao, for their feedback and guidance, which greatly helped shape my research direction. I thank the Wisser group for all the interesting discussions and for providing an enjoyable environment in which to learn. I greatly appreciate the assistance of Teclemariam Weldekidan, Michael Dumas, and Scott Davis with the crucial validation and data generation work associated with this dissertation. I feel particularly lucky to have shared my time in the lab with Meredith Biedrzycki, Juliana Teixeira, Heather Manching, and Terence Mhora, who continue to inspire me with their enthusiasm for science and have been brilliant role models as early-career scientists.

I would also like to thank the other collaborators on the NSF grant that funded this research, especially Dr. Rebecca Nelson and Dr. Tiffany Jamann, who provided valuable insights into the biological questions addressed in this dissertation.

I appreciate those who got me started in life and research, especially my parents, for their support and encouragement and for introducing me to scientific research.
I thank my teachers and advisors during my undergraduate and master's programs for inspiring me to pursue science. I am especially thankful to my wife, Pratha Sah, for her support and patience throughout these six years and beyond; her encouragement and sacrifice made this happen.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
ABSTRACT

Chapter

1 INTRODUCTION
1.1 Role of genomic diversity for crop improvement
1.2 Challenges in plant genome sequencing projects
1.3 Maize genomic diversity
1.4 Complex traits and quantitative disease resistance (QDR)

2 THERMOALIGN: A GENOME-AWARE PRIMER DESIGN TOOL FOR STANDARD PCR AND TILED AMPLICON RESEQUENCING
2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 Target Region Selection (TRS)
2.3.2 Unique Oligo Design (UOD)
2.3.3 Priming Specificity Evaluation (PSE)
2.3.4 Primer Pair Selection (PPS)
2.3.5 Empirical evaluation of priming specificity
2.4 Discussion
2.5 Methods
2.5.1 ThermoAlign pipeline
2.5.2 Target region selection (TRS)
2.5.3 Unique oligonucleotide design (UOD)
2.5.4 Priming specificity evaluation (PSE)
2.5.5 Primer pair selection (PPS)
2.5.6 PCR validation
2.5.7 SMRT sequencing and analysis of long-range PCR amplicons
2.6 Availability
2.7 Acknowledgments
2.8 Author contributions statement
2.9 Additional information

3 CLUSTERING OF CIRCULAR CONSENSUS SEQUENCES: ACCURATE ERROR CORRECTION AND ASSEMBLY OF SINGLE MOLECULE REAL-TIME READS FROM MULTIPLEXED AMPLICON LIBRARIES
3.1 Abstract
3.2 Background
3.3 Methods
3.3.1 Sequence data
3.3.2 Clustering of circular consensus sequences for long amplicon analysis
3.3.3 Evaluating the accuracy of C3S-LAA
3.4 Results and Discussion
3.5 Conclusion
3.6 Availability

4 RESEQUENCING OF A QUANTITATIVE DISEASE RESISTANCE LOCUS IN MAIZE PROVIDES BENCHMARK DATA AND INSIGHT INTO THE SPECTRUM OF SEQUENCE VARIATION AMONG INBRED LINES
4.1 Introduction
4.2 Methods
4.2.1 Barcoded DNA amplification of the qNLB 1 25721468 23298 locus
4.2.2 Sequencing, error correction and assembly of multiplexed amplicon libraries
4.2.3 Sequence characterization
4.2.4 Comparison to maize HapMap3
4.2.5 Annotation of variant effects
4.2.6 Association mapping
4.3 Results
4.3.1 Genomic diversity across the qNLB 1 25721468 23298 locus
4.3.2 Comparison to maize HapMap3
4.3.3 Analysis of the NLB-susceptible Tx303 haplotype
4.4 Discussion

5 DISCUSSION AND CONCLUSIONS
5.1 A ThermoAlign approach for targeted enrichment of repetitive genomes
5.2 SMRT sequencing and assembly of multiplexed amplicon libraries from the maize genome
5.3 Unravelling the genomic diversity at a maize quantitative disease resistance (QDR) locus using long molecule resequencing
5.4 Future directions

BIBLIOGRAPHY

Appendix

A SUPPLEMENTARY INFORMATION FOR THERMOALIGN: A GENOME-AWARE PRIMER DESIGN TOOL FOR STANDARD PCR AND TILED AMPLICON RESEQUENCING
B SUPPLEMENTARY INFORMATION FOR: CLUSTERING OF CIRCULAR CONSENSUS SEQUENCES: ACCURATE ERROR CORRECTION AND ASSEMBLY OF SINGLE MOLECULE REAL-TIME READS FROM MULTIPLEXED AMPLICON LIBRARIES
C SUPPLEMENTARY INFORMATION FOR: UNRAVELLING THE GENOMIC DIVERSITY AT A MAIZE QUANTITATIVE DISEASE RESISTANCE LOCUS USING LONG MOLECULE RESEQUENCING
D PERMISSIONS

LIST OF TABLES

2.1 Results from BLASTn alignment of error-corrected PacBio consensus sequences to the B73 genome.
3.1 Comparison of LAA and C3S-LAA consensus sequences for B73 amplicons.
3.2 The number of consensus sequences generated from the multiplex library, following barcode demultiplexing.
4.1 Quartiles of genotyping accuracy for maize HapMap3 at the qNLB 1 25721468 23298 locus.
A.1 Comparison of ThermoAlign to related primer design tools.
A.2 Effects of the amplicon size range parameter on the minimum tiling path primer design for the 24 kb target region described in the main text.
A.3 Eight genomic loci in the maize B73 genome, selected for targeted enrichment
