The Genomic and Evolutionary Analysis of Floral Heteromorphy in Primula
Jonathan Matthew Cocker

Thesis submitted for the degree of Doctor of Philosophy
University of East Anglia
John Innes Centre
September 2016

© This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with the author and that use of any information derived therefrom must be in accordance with UK Copyright Law. In addition, any quotation or extract must include full attribution.

Abstract

The genetic basis and evolutionary significance of floral heteromorphy in Primula have been debated for over 150 years. Charles Darwin was the first to explain the importance of the two heterostylous floral morphs, pin and thrum, suggesting that their reciprocal anther and stigma heights facilitate cross-pollination, and showing that only between-morph crosses are fully compatible. This key innovation is an archetypal example of convergent evolution that serves to physically promote insect-mediated outcrossing, having evolved in over 28 angiosperm families. Darwin’s findings laid the foundation for an extensive number of studies into heterostyly that contributed to the establishment of modern genetic theory. The widely accepted genetic model portrays the Primula S locus, which controls heterostyly and self-incompatibility, as a coadapted group of tightly linked genes, or supergene. It is predicted that self-fertile homostyle flowers, with anthers and stigma at the same height, arise via rare recombination events between dominant and recessive alleles in heterozygous thrums. These observations have underpinned over 60 years of research into the genetics and evolution of heterostyly. The Primula vulgaris genome assembly and associated transcriptomic and comparative sequence analyses have facilitated the assembly and characterisation of the complete S locus in this species.
Here it is revealed that thrums are hemizygous, not heterozygous: the S locus contains five thrum-specific genes that are completely absent in pins, which means that recombination cannot be the cause of homostyles, as was previously believed. The studies also reveal candidate genes in Primula veris and other species, and have facilitated an estimate of 51.7 MYA for the assembly of the S locus supergene. These findings challenge established theory and provide novel insight into the structure and origin of the Primula S locus, providing the foundation for understanding the evolution and breakdown of insect-mediated outcrossing in Primula and other heterostylous species.

Contents

Acknowledgements
Abbreviations
List of publications

1 Introduction
  1.1 Heterostyly in Primula
  1.2 Heteromorphic self-incompatibility
  1.3 The role of heterostyly
  1.4 Morphology of the heterostylous floral morphs
  1.5 Inter-morph transfer of pollen
  1.6 Models for the evolution of heterostyly
  1.7 Phylogenetic context
  1.8 Potential applications for the study of heterostyly
  1.9 The history of heterostyly
  1.10 Molecular studies of floral heteromorphy
  1.11 Next-generation sequencing in the study of heterostyly
  1.12 Summary and research aims

2 Assembly and genomic analysis of Primula genomes
  2.1 Relevant publications
  2.2 Introduction
  2.3 Methods
    Genomic DNA and RNA-Seq paired-end read libraries
    Genomic DNA paired-end read assemblies
    Assembly validation
    Repeat library construction and repeat masking
    Alignment of proteins from related species
    Annotation of genes
    Functional annotation of predicted genes
    Comparative analysis of P. vulgaris and P. veris genes
    Analysis of orthologous gene groups
    RNA-Seq differential expression analysis
  2.4 Results
    Analysis of paired-end reads
    Assembly validation
    Evaluation of the genespace captured in the assembly
    Repeats in the Primula vulgaris genome
    Gene annotations in the Primula vulgaris genome
    Comparison of genes in P. vulgaris and P. veris genomes
    OrthoMCL analysis of orthologous genes
    Differential expression
  2.5 Discussion

3 Annotation and characterisation of S-linked genes
  3.1 Relevant publications
  3.2 Introduction
  3.3 Methods
    Plant material
    Differential gene expression between Oakleaf and wild-type
    Gene model predictions for P. vulgaris KNOX (PvKNOX) genes
    Generation of the PvKNOX phylogenetic tree
    Identification of variant sites between Oakleaf and wild-type
    Linkage analysis of PvKNOX candidate genes
    BAC contig assembly
    Repeat masking of the BAC contig assembly
    Prediction of genes in the BAC contig assembly
    Functional annotation of the BAC contig
  3.4 Results
    Prediction of Primula vulgaris KNOX-like (PvKNL) gene models
    Characterisation of the PvKNOX gene family
    Expression of PvKNOX genes in Oakleaf and wild-type P. vulgaris
    Sequence comparison of PvKNOX genes in wild-type and Oakleaf
    Differential expression between Oakleaf and wild-type P. vulgaris
    Annotation of the BAC contig assembly
    Oakleaf in the BAC contig assembly
    Linkage analysis of Oakleaf candidates
  3.5 Discussion

4 The structure of the Primula vulgaris S locus
  4.1 Relevant publications
  4.2 Introduction
  4.3 Methods
    Read depth across the S locus
    RNA-Seq differential expression analysis
    Analysis of thrum-specific genome regions
    Detection of recombination in S locus flanking regions
    Annotation of microRNAs (miRNAs) in the Primula vulgaris S locus
    Similarity searches of S locus genes to the P. vulgaris genome
    Repeat analyses of the P. vulgaris S locus
    Analysis of intron sizes in the Primula vulgaris S locus
  4.4 Results
    Alignment of thrum-specific BAC 70F11 generates 455 kb assembly
    Predicted genes in the 455 kb assembly
    Genomic reads aligned to the 455 kb assembly
    Functional evaluation of genes in the 278 kb thrum-specific region
    Linkage of the 278 kb region to the S locus
    Expression of genes at the S locus
    Identification of thrum-specific regions in the P. vulgaris genome
    Recombination in regions flanking the S locus
    The annotation of miRNAs in the Primula vulgaris S locus
    Similarity of genes at the P. vulgaris S locus to other genomic regions
    Repetitiveness of the Primula S locus
    Intron sizes at the Primula S locus
  4.5 Discussion

5 Evolution and cross-species comparative analyses of the S locus
  5.1 Relevant publications
  5.2 Introduction
  5.3 Methods
    Primula veris S locus gene model curation
    P. vulgaris and P. veris S locus gene model visualization
    Primula veris S locus gene expression analysis
    S locus genomic read coverage for P. veris and P. vulgaris
    Bayesian relaxed-clock phylogenetic analysis
    Selection of sequences and parameters
    Inspection of alignment for sequence saturation
  5.4 Results
    Analysis of read depth across the 455 kb S locus region
    Identification of S locus genes in Primula veris
    Expression of Primula veris S locus genes
    Phylogenetic analysis of GLO-GLOT divergence
    Saturation analysis of B-function MADS-box genes
    Validation of the Bayesian phylogenetic analysis
    The comparative analysis of CFB flanking genes
  5.5 Discussion

6 General discussion and conclusions

Bibliography

Acknowledgements

I would like to thank my supervisor Phil Gilmartin for facilitating my development as a scientist, providing support and encouragement throughout this doctoral project, and giving me the freedom to develop and discuss ideas that have led to the results presented herein. In particular, I thank Jinhong Li for countless discussions, support and the generation of numerous resources, without which much of this work would not have been possible. I also thank fellow final year PhD student Olivia Kent for phylogenetic resources and support throughout, including numerous discussions. I thank all members of the Gilmartin lab and others at JIC who have supported this work. The collaborative nature of some elements of this project means that I am indebted to a number of collaborators, including Jon Wright, who helped me to develop as a bioinformatician and carried out many genomic-based analyses that support this work, Mark McMullan for support in phylogenetic analyses, and my third supervisor David Swarbreck and other colleagues at TGAC for input into genomic analyses. I also acknowledge the input of Cock van Oosterhout, whose supervision, enthusiasm and encouragement were invaluable throughout manuscript submission and many of the analyses conducted.
I thank my mentors David Westhead and Peter Meyer at the University of Leeds, and others who led me in this direction. The friends and colleagues I have connected with over the last four years have provided an inspiring and friendly environment in which to work, and in which to unwind. I thank all fellow PhD students, especially friends in my year, and those I have met outside of JIC. I thank my family for encouraging me to follow anything I set my mind to, and for providing a platform from which I was able to take those steps. I cannot thank them enough for the support they have provided over the years. Finally, I thank Emily for immeasurable support and reassurance, for walks and stargazing, and for always being there to listen.

Abbreviations

A       Androecium S locus allele
AHRD    Automated Assignment of Human Readable Descriptions
ARF     Auxin response factor
as1     asymmetric leaves 1
ASGR    Apomixis-specific genomic region
BACs    Bacterial artificial chromosomes
CEGs    Core eukaryotic