Genome Assembly and Discovery of Structural Variation in Cultivated Potato Taxa
Total Page:16
File Type:pdf, Size:1020Kb
Genome assembly and discovery of structural variation in cultivated potato taxa Maria Kyriakidou Department of Plant Science Faculty of Agricultural and Environmental Sciences McGill University Montreal, Canada April, 2020 A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Doctor of Philosophy c 2020 Maria Kyriakidou Contents List of Figures viii List of Figures . xix List of Tables xx List of Tables . .1 Abstract . .2 Abreg´ e´ ...........................................3 Acknowledgements . .5 List of Abbreviations . .7 Thesis Format . 11 Contribution of Authors . 12 1 Introduction 14 1.1 Potato Importance . 14 1.2 The potato genome . 15 1.3 Hypothesis for Chapter 3 . 20 1.4 Objectives for Chapter 3 . 20 1.5 Hypothesis for Chapter 4 . 20 1.6 Objectives for Chapter 4 . 20 1.7 Hypothesis for Chapter 5 . 21 1.8 Objectives for Chapter 5 . 21 2 Preface to Chapter 2 22 2.1 Abstract . 23 2.2 Introduction to polyploidy . 24 2.3 Overview of the sequencing techniques and their applications in polyploid plant genomes . 31 2.4 Challenges of polyploid genome assembly . 36 i 2.5 Technology-related challenges . 41 2.6 How to estimate ploidy level in plants . 44 2.7 How to “resolve” the ploidy issue (how to reduce the complexity of the problem) . 46 2.7.1 Genome-related approach . 46 2.7.2 Genome sequencing and algorithmic (pipeline) approach . 47 2.8 Third Generation Genomic Technologies come to the rescue . 49 2.9 Advances in genomic resources and functional tools in molecular genetics and breeding . 51 2.10 Lack of complexity of the currently available reference genomes of poly- ploid crops . 52 2.11 Conclusions . 53 Bibliography of Chapter 2 60 3 Preface to Chapter 3 75 3.1 Abstract . 76 3.2 Introduction . 77 3.3 Materials and Methods . 80 3.3.1 Plant Materials and Sequencing . 80 3.3.2 Alignment against the potato reference genome . 81 3.3.3 Single Nucleotide Polymorphism (SNP) Analysis . 81 3.3.4 Copy Number Variation (CNV) Analysis . 82 3.3.5 Significantly Enriched Gene Clusters . 82 3.3.6 Principal Component Analysis of CNV-status . 82 3.4 Results . 83 3.4.1 Alignment of 12 potato landrace and wild genomes against two ref- erence genomes shows greater overall match with DM1-3 than with M6 ...................................... 83 ii 3.4.2 Distribution of Single Nucleotide Polymorphisms detected in the genomes compared to the DM1-3 and M6 reference genomes . 88 3.4.3 Distribution of Structural Variations in the landrace genomes com- pared to the DM1-3 and M6 references shows both polymorphism and synergy . 90 3.5 Discussion . 97 3.5.1 Comparison of the analysis with previous studies . 97 3.5.2 Genome comparisons . 97 3.5.3 A SNP analysis uncovers regions of heterozygosity . 99 3.5.4 Several CNV-affected gene clusters are common among potato genomes ........................................ 100 3.5.5 SAUR gene clusters are affected by CNV events in all genomes stud- ied ...................................... 101 3.5.6 Disease Resistance gene clusters . 101 3.5.7 2-Oxogluterate/ Fe (II) dependent oxygenase superfamily proteins (2OGDs) . 102 3.5.8 Genes involved in metabolite biosynthesis . 103 3.6 Conclusion . 103 Bibliography of Chapter 3 105 4 Preface to Chapter 4 115 4.1 Abstract . 116 4.2 Introduction . 117 4.3 Materials and Methods . 118 4.3.1 Plant materials and genome sequencing . 118 4.3.2 De novo genome assemblies . 119 4.3.3 Estimating the percentage of whole genome heterozygosity . 120 4.3.4 Pan-Genome construction and annotation . 120 4.3.5 Gene presence/absence variation analysis . 121 iii 4.3.6 Phylogenetic analysis . 122 4.3.7 Data availability . 122 4.4 Results . 122 4.4.1 Genome assembly of GON1 . 122 4.4.2 Construction of the GON1 pseudomolecules . 124 4.4.3 Genome assemblies of GON2, PHU, STN AJH and BUK . 126 4.4.4 Comparison of the GON1 against the GON2, PHU, STN, AJH and BUK genome assemblies . 126 4.4.5 Pan-Genome Construction . 127 4.4.6 Functional analysis of the variable genes . 130 4.5 Discussion . 134 4.5.1 Diploid potato genome and pan-genome assemblies . 134 4.5.2 Functional analysis of the variable genes . 135 4.6 Conclusion . 137 Bibliography of Chapter 4 138 5 Preface to Chapter 5 148 5.1 Abstract . 149 5.2 Introduction . 150 5.3 Materials and Methods . 151 5.3.1 Genomic Data . 151 5.3.2 Determining the whole genome heterozygosity . 152 5.3.3 De novo genome assemblies . 152 5.3.4 Data Records . 153 5.4 Results . 153 5.4.1 Quality of the sequenced genomes – Whole genome heterozygosity . 153 5.4.2 Genome assembly of ADG1 . 154 5.4.3 Genome assembly of CHA, JUZ, ADG2, TBR and CUR genomes . 158 5.4.4 Comparison of the genome assemblies of ADG1 and ADG2 . 158 iv 5.4.5 Comparison of the genome assemblies of ADG1 and ADG2, TBR, JUZ, CHA and CUR . 159 5.5 Discussion . 159 5.5.1 Highly fragmented genome assemblies due to the heterozygous na- ture of the polyploid potato genomes . 159 5.6 Conclusion . 160 Bibliography of Chapter 5 162 6 Contribution to knowledge 166 6.1 Contributions from Chapter 2 . 166 6.2 Contributions from Chapter 3 . 166 6.3 Contribution from Chapter 4 . 167 6.4 Contribution from Chapter 5 . 168 7 Future Research Directions 170 7.1 Conclusions . 170 7.2 Chapter 3 . 170 Master List of References 172 8 Appendix 1 201 9 Appendix 2 202 10 Appendix 3 203 11 Appendix 4 205 12 Appendix 5 207 13 Appendix 6 213 14 Appendix 7 218 v 15 Appendix 8 222 16 Appendix 9 227 17 Appendix 10 228 18 Appendix 11 229 19 Appendix 12 230 20 Appendix 13 231 21 Appendix 14 232 22 Appendix 15 233 23 Appendix 16 234 24 Appendix 17 235 25 Appendix 18 236 26 Appendix 19 237 27 Appendix 20 238 28 Appendix 21 239 29 Appendix 22 240 30 Appendix 23 241 31 Appendix 24 242 32 Appendix 25 243 33 Appendix 26 244 vi 34 Appendix 27 245 35 Appendix 28 246 36 Appendix 29 247 37 Appendix 30 248 38 Appendix 31 249 39 Appendix 32 250 40 Appendix 33 251 41 Appendix 34 252 42 Appendix 35 253 43 Appendix 36 254 44 Appendix 37 255 45 Appendix 38 256 vii List of Figures 1.1 Potato production per country. The map shows the potato production per country, worldwide in the year 2017. The yellow indicates produc- tion equal or less than 41,499 tonnes and dark red, production more than 4,800,000 tonnes. Potato is a highly adapted crop that can grow in various climates and altitudes. The map was retrieved from FAO and it is available under the Open Database License, the cartography is licenced as CC BY-SA (https://www.openstreetmap.org/copyright). 19 2.1 Approaches for reference - based genome assembly. A. Shorter-read guided assembly. In this method, shorter reads are aligned against the reference genome, a consensus assembly is generated, and structural vari- ations are detected. It can also be used to detect contamination in the se- quenced reads. this approach is used when genomes are re- sequences to detect polymorphisms in individuals. B. Guided de novo genome as- sembly of shorter reads. Previously de novo assembled shorter reads are aligned against the reference or a closely related genome to extend the ex- isting contigs. C. Longer-read guided assembly. Longer reads are aligned against the reference genome, a consensus genome assembly is constructed, and structural variations are detected. D. Guided de novo genome as- sembly of longer reads. Longer reads are de novo assembled into contigs, which are aligned against the reference or.