<<

University of Southampton Research Repository

Copyright © and Moral Rights for this thesis and, where applicable, any accompanying data are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis and the accompanying data cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content of the thesis and accompanying research data (where applicable) must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holder/s.

When referring to this thesis and any accompanying data, full bibliographic details must be given, e.g.

Thesis: Author (Year of Submission) "Full thesis title", University of Southampton, name of the University Faculty or School or Department, PhD Thesis, pagination.

Data: Author (Year) Title. URI [dataset]

University of Southampton

Faculty of Environmental and Life Sciences

School of Biological Sciences

Genomics of speciation and hybridisation in the Macaronesian endemic genus Argyranthemum (Asteraceae; Anthemideae)

by

Oliver William White

ORCID ID 0000-0001-6444-0310

Thesis for the degree of Doctor of Philosophy (PhD)

January 2018

University of Southampton

Abstract

Faculty of Environmental and Life Sciences

School of Biological Sciences

Thesis for the degree of Doctor of Philosophy (PhD)

Genomics of Adaptation and Speciation in the Macaronesian endemic genus Argyranthemum (Asteraceae; Anthemideae)

By Oliver William White

The aim of this thesis was to investigate the evolutionary processes responsible for the diversification of the Macaronesian endemic genus Argyranthemum Webb (Asteraceae), using Next Generation Sequencing (NGS) methodologies. Transcriptome sequences from Macaronesian endemic genera, including Argyranthemum, were used to design primers for simple sequence repeat (SSR) markers. This was necessary to overcome the lack of genetic variation commonly observed in Macaronesian endemic lineages. Morphological, ecological and genetic analyses were then employed to address several unanswered questions surrounding the origin of two putative homoploid hybrid species, A. sundingii and A. lemsii. Specifically, each of the homoploid hybrid species is shown to be morphologically distinct, ecologically separated from its parental progenitors and independently derived from the same parental species. The hypothesis of independent homoploid hybrid speciation events facilitated by ecological isolation is supported by these results. Genotyping-By-Sequencing (GBS) was employed to investigate the processes associated with the diversification of Argyranthemum. The results of the phylogenetic and hybridisation analyses reveal that geographical isolation, habitat shifts and hybridisation have all contributed to the diversification of the group. In addition, morphological convergence has contributed to the diversification of the group. A study focussed on A. broussonetii reveals that the two subspecies (subsp. broussonetii and subsp. gomerensis) are not closely related. Their morphological similarity is likely due to convergence as a result of their occupation of similar habitats. Finally, comparative transcriptomics was used to identify differentially expressed genes with a potential role in the ecological isolation and origin of the homoploid hybrid species in Argyranthemum. Although independently derived, A. sundingii and A. lemsii appear to have converged on similar expression phenotypes, likely a consequence of adaptation to similar habitats. NGS methodologies have revolutionised our ability to study the process of speciation in recently evolved lineages. Argyranthemum is the largest endemic genus of the Macaronesian archipelagos and an ideal model for investigating the processes responsible for diversification in oceanic island endemic lineages.

Table of Contents


Table of Contents ...... ii
List of Tables ...... vii
List of Figures ...... ix
List of Appendix Tables ...... xiv
List of Appendix Figures ...... xix
List of Accompanying Materials ...... xxv
Research Thesis: Declaration of Authorship ...... xxvii
Acknowledgements ...... xxix
Abbreviations ...... xxxi
Chapter 1 Introduction ...... 1

Summary ...... 1
Speciation ...... 1
Hybridisation ...... 1
Hybrid speciation ...... 2

1.4.1 Frequency of polyploid and homoploid hybrid speciation ...... 3
1.4.2 Models of homoploid hybrid speciation ...... 5

Oceanic archipelagos as natural laboratories for studying evolution ...... 8
Macaronesia as a model for flowering plant evolution ...... 8
Argyranthemum ...... 11
Thesis aims ...... 15
Methods ...... 15
Thesis outline ...... 19

Chapter 2 Transcriptome Sequencing and Simple Sequence Repeat Marker Development for Three Macaronesian Endemic Plant Species ...... 23

Abstract ...... 24
Introduction ...... 24
Methods and Results ...... 27
Conclusions ...... 33


Chapter 3 Independent Homoploid Hybrid Speciation events in the Macaronesian endemic genus Argyranthemum ...... 37

Abstract ...... 38
Introduction ...... 38
Materials and Methods ...... 43

3.3.1 Sampling ...... 43
3.3.2 Morphological analysis ...... 46
3.3.3 Ecological niche modelling ...... 46
3.3.4 DNA extraction ...... 47
3.3.5 Polymerase Chain Reaction (PCR) of SSRs ...... 48
3.3.6 Population genetic analyses of nuclear SSRs ...... 48
3.3.7 Haplotype analysis of chloroplast SSRs ...... 48
3.3.8 Processing of GBS SNP data ...... 49
3.3.9 Genome-wide SNP analysis ...... 49
3.3.10 Testing independent hybrid origins with ABC ...... 50

Results ...... 51

3.4.1 Morphological analyses ...... 51
3.4.2 Ecological niche modelling ...... 52
3.4.3 Population genetic analyses of nuclear SSRs ...... 56
3.4.4 Haplotype analysis of chloroplast SSRs ...... 57
3.4.5 Processing of GBS SNP data ...... 58
3.4.6 Nuclear SNP cluster analysis ...... 59
3.4.7 Approximate Bayesian Computation ...... 60

Discussion ...... 62

3.5.1 The hybrid species are morphologically distinct from the parent species ...... 63
3.5.2 The hybrid species are ecologically intermediate ...... 64
3.5.3 A. sundingii and A. lemsii are genetically distinct from the putative parents ...... 64
3.5.4 Chloroplast haplotype patterns indicate independent origins ...... 65
3.5.5 What are the from Igueste referred to as A. broussonetii × A. frutescens? ...... 66
3.5.6 Approximate Bayesian Computation supports independent origins of A. sundingii and A. lemsii ...... 67


Conclusions ...... 68

Chapter 4 Geographical isolation, habitat shifts, hybridisation and convergent evolution in the diversification of the Macaronesian endemic genus Argyranthemum (Asteraceae) ...... 71

Summary ...... 72
Introduction ...... 72
Methods ...... 76

4.3.1 Sampling ...... 76
4.3.2 DNA isolation and GBS ...... 81
4.3.3 Processing of GBS data ...... 81
4.3.4 Assembly comparison ...... 81
4.3.5 Phylogenetic reconstruction ...... 82
4.3.6 Ancestral State Reconstruction ...... 82
4.3.7 D-statistics ...... 83

Results ...... 84

4.4.1 Processing of GBS data ...... 84
4.4.2 Assembly comparison ...... 85
4.4.3 Phylogenetic reconstruction ...... 85
4.4.4 Ancestral state reconstruction ...... 88
4.4.5 D-statistics ...... 90

Discussion ...... 95

Chapter 5 Raising the rank of the Macaronesian endemic Argyranthemum subsp. gomerensis to species based on evolutionary relationships and morphology ...... 99

Introduction ...... 100
Materials and methods ...... 102
Results ...... 105
Discussion ...... 108
Key to taxa ...... 109

Chapter 6 Gene expression changes associated with homoploid hybrid speciation in the Macaronesian endemic genus Argyranthemum ...... 115


Abstract ...... 116
Introduction ...... 116
Methods ...... 120

6.3.1 Sampling ...... 120
6.3.2 RNA extraction and sequencing ...... 120
6.3.3 Pre-processing ...... 121
6.3.4 De novo assembly and annotation ...... 121
6.3.5 Quantifying transcript abundance (see Figure 6.2) ...... 122

6.3.5.1 Pipeline 1 ...... 122
6.3.5.2 Pipeline 2 ...... 122
6.3.5.3 Pipeline 3 ...... 123
6.3.5.4 Pipeline 4 ...... 123
6.3.5.5 Pipeline 5 ...... 123

6.3.6 Differential expression ...... 124
6.3.7 Pipeline comparison ...... 124

Results ...... 126

6.4.1 RNA-seq processing and assembly ...... 126
6.4.2 Comparison of pipelines ...... 127
6.4.3 Global gene expression in the parents and hybrid species ...... 129
6.4.4 Expression analysis suggests extensive divergence in genes involved in local adaptation ...... 129

Discussion ...... 132

6.5.1 Assessment of assembly pipelines ...... 132
6.5.2 Identification of DE orthogroups ...... 133
6.5.3 Changes in gene expression associated with HHS ...... 134
6.5.4 Conclusions ...... 136

Chapter 7 Conclusions and future directions ...... 137
Supplementary information for chapter 2 ...... 141
Supplementary information for chapter 3 ...... 142
Supplementary information for chapter 4 ...... 177
Supplementary information for chapter 5 ...... 207


Supplementary information for chapter 6 ...... 209
List of References ...... 233


List of Tables

Table 2.1 - Summary statistics for the de novo assembled transcriptomes, BLASTx searches, and simple sequence repeat identification...... 31

Table 3.1 - Number of individuals sampled for population genetics of Simple Sequence Repeat markers (SSR), leaf morphological analysis and Genotyping-By-Sequencing (GBS). Taxa abbreviations used throughout are shown in brackets...... 45

Table 3.2 - Schoener’s D (Schoener, 1968) and Warren’s I (Warren et al., 2008) niche overlap statistics based on our Maxent and PCA-env niche predictions. P values were generated using the niche equivalency test. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 55

Table 3.3 - Distribution of haplotypes across species and groups as defined by presence in parental species. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 58

Table 4.1 - Leaf sampling for Genotyping-By-Sequencing (GBS) with collection reference (Ref), country or island of origin (Isl.), habitat type (Hab.), location, altitude, latitude/longitude coordinates, leaf and representative voucher specimen barcodes and collector details. Country or island of origin is abbreviated as SP (Spain), TE (Tenerife), GC (Gran Canaria), EH (El Hierro), LP (La Palma), LG (La Gomera), MA (Madeira), LA (Lanzarote), SE (Selvagem Pequena) and FU (Fuerteventura). Habitat types are abbreviated as ME (Mediterranean basin), PF (Pine forest), LF (Laurel forest), SZ (Sclerophyllous zone), CD (Coastal desert) and HD (High altitude desert). Barcodes are for leaf and voucher specimens deposited at the Natural History Museum London (BM)...... 77

Table 4.2 - Summary statistics for each ipyrad assembly based on varying similarity thresholds and minimum samples required to process a locus. Statistics include the number of loci, single nucleotide polymorphisms (SNPs) and unlinked SNPs (uSNPs) and the clustering results for PCA and STRUCTURE, where K equals the number of clusters determined by mclust and Evanno ΔK respectively...... 85


Table 4.3 - Summary of D-statistics performed between clades from the same island (tests 1-29) and between clades of non-monophyletic multi-island endemic taxa (tests 30-33). For each test performed, the taxa at positions P1, P2, P3 and O are shown, together with the D-statistic, mean bootstrap value, bootstrap standard deviation, Z score, ABBA and BABA frequencies and number of loci used in the test. Tests significant at the 0.01 level are highlighted in bold...... 91

Table 5.1 - Distinguishing characteristics of A. broussonetii subsp. broussonetii, subsp. gomerensis and A. callichrysum based on Humphries (1976)...... 102

Table 5.2 - Characters used in our morphological analysis with information on whether the character was continuous (cont.) or discrete (disc.), how it was scored and definitions for repeatability...... 104

Table 6.1 - Expression phenotypes used to identify transcripts that are differentially expressed (A) between the parental species, (B) with novel expression in the homoploid hybrid species, (C) between the two homoploid hybrid species, and (D) with parent-like expression in the hybrid species. Taxa are abbreviated as bro (A. broussonetii), fru (A. frutescens), sun (A. sundingii) and lem (A. lemsii). Differential expression (DE) = ×, no DE = • and either = -...... 125

Table 6.2 - Number of differentially expressed (DE) orthogroups, blast hits in Arabidopsis thaliana, annotated hits in A. thaliana and significantly enriched gene ontology (GO) terms for each expression phenotype...... 131


List of Figures

Figure 1.1 - Schematic diagram of homoploid and allopolyploid hybrid speciation between two species each with the same diploid chromosome number (2n = 2)...... 3

Figure 1.2 - Recombinational model of homoploid hybrid speciation between two species that have the same number of chromosomes (2n = 8) but differ by two chromosomal rearrangements. Figure adapted from Rieseberg (1997); see text for details...... 7

Figure 1.3 - Macaronesian archipelagos comprising the (A) Azores, (B) Madeira, (C) Selvagens, (D) Canary Islands and (E) Cape Verdes. Islands within each archipelago are inset in A-E on left hand side, with island ages in parentheses, as described in Francisco-Ortega et al. (1997) and García-Maroto et al. (2009)...... 9

Figure 1.4 - Habitat zones present in Macaronesia and images of coastal desert (Tenerife, Anaga, North coast between Almáciga and Roque Bermejo), sclerophyllous zone (Tenerife, Anaga, Barranco de Roque Bermejo below Chamorga), laurel forest (El Hierro, La Llanía; photo taken by R. Graham), pine forest (Tenerife, above Arafo) and high-altitude desert (Tenerife, near Observatorio del Teide)...... 10

Figure 1.5 - Overview of the morphological diversity present in Argyranthemum. Growth form varies from small dome-shaped plants such as those of (A) A. frutescens subsp. succulentum, through upright branching plants such as (B) A. gracile, to large shrubs including (C) A. broussonetii subsp. gomerensis. Variation in leaf shape is shown for (D) A. adauctum subsp. erythrocarpon, (E) A. haouarytheum, (F) A. broussonetii subsp. broussonetii, (G) A. pinnatifidum subsp. succulentum, (H) A. frutescens subsp. foeniculaceum and (I) A. coronopifolium. Variation in capitula size and colour is shown for (J) A. hierrense, (K) A. sundingii, (L) A. maderense and (M) A. haematomma. Disc (N, O) and ray florets (P) from A. frutescens are shown. Examples of disc (Q, R) and ray cypselae (S, T) are also shown. All images taken by O. White except for those taken by A. R. Betancort (C, L) and R. Graham (G, M)...... 13

Figure 1.6 - Voucher specimens of (A) A. tenerifae, (B) A. gracile, (C) A. coronopifolium and (D) A. adauctum subsp. palmensis deposited at the Natural History Museum, London (BM)...... 17

Figure 2.1 - The Macaronesian archipelagos in the North Atlantic Ocean...... 25


Figure 2.2 - Macaronesian endemics Argyranthemum sp. (A), Descurainia bourgaeana (B), and Echium wildpretii (C). Photos taken by O. White...... 26

Figure 2.3 - Summary statistics for the three de novo transcriptome assemblies. (A) Number of genes and transcripts assembled for each species. (B) N50, mean, and median transcript length. (C) Transcript lengths for the three transcriptomes (note the change in bin size along the x axis)...... 30

Figure 2.4 - Principal coordinates analysis (PCoA) of 10 samples of Argyranthemum based on genotypic information from eight SSR loci...... 33

Figure 3.1 - Populations sampled across (A) Tenerife in the (B) Anaga Peninsula. Taxa abbreviations used throughout are shown in brackets. Contour lines represent a 200 m change in altitude...... 40

Figure 3.2 - Boxplots (A-D) and PCA (E) based on morphological characters leaf area, perimeter, length and width. Letters above each boxplot represent the groupings identified by a post hoc Tukey test with false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) for multiple comparisons. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 52

Figure 3.3 - Niche space for each species based on (A) Maxent and (B) ecospat. The Maxent maps were created using the average predictions across 10 replicates and average cloglog thresholds under the maximum training sensitivity plus specificity criterion. Ordinations generated in ecospat show the niche space for each species across the first two principal components. The density of occurrences for each species is represented by grey shading and the solid and dashed contour lines show 100 % and 50 % of the available background environment respectively. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 55

Figure 3.4 - PCA (A), STRUCTURE results for K = 2 to 5 (B) and Evanno delta K (C) based on nuclear SSR clustering analyses. Taxa included in the PCA and STRUCTURE plots are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 57


Figure 3.5 - PCA (A), STRUCTURE results for K = 2 to 5 (B) and Evanno delta K (C) based on an ipyrad assembly of GBS reads, using a 90 % clustering threshold and a minimum of 10 samples required to process a locus. For the STRUCTURE and PCA plots taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 60

Figure 3.6 - Scenarios 1 to 9 included in the DIYABC analysis based on nuclear SNPs comparing two hybridisation events (scenarios 1-5), a single hybridisation event (scenarios 6-8) and cladogenesis (scenario 9). Scenario 1 was selected as being most likely in describing the origin of the homoploid hybrid species in Argyranthemum. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum). Generations are shown on the y axis (t0 to td) and admixture proportions from each parent are shown on scenario one...... 62

Figure 4.1 - (A) Madeira, Selvagem Pequena and Canary Islands in the North Atlantic Ocean, the taxa occurring on each island sampled and the habitats they occupy. (B) Simplified diagram of habitat types. Abbreviations for habitat types are as follows: coastal desert 0 – 300 m (CD), sclerophyllous forest 300 – 600 m (SF), laurel forest 600 – 1500 m on North-facing slopes (LF), pine forest 600 – 2000 m on South-facing slopes and 1500 – 2000 m on North-facing slopes (PF) and high altitude desert 2000 – 3700 m (HD)...... 74

Figure 4.2 - Four-taxon pectinate tree for D-statistics showing (A) ABBA and (B) BABA allele distributions where red arrows indicate hybridisation events between P3 and P2 or P1. D-statistics testing for (C) hybridisation between lineages from the same island and (D) between multiple island endemic lineages...... 83

Figure 4.3 - Maximum likelihood tree generated using RAxML-NG for the assembled dataset based on a clustering threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from which were cut-off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown above the branches and posterior probabilities ≥ 95 from the MrBayes analysis are shown below the branches. Tips are coloured by island and Clades A-G are discussed in text...... 87


Figure 4.4 - Maximum likelihood tree generated based on a cluster threshold of 90 % and minimum sample number of 30 with island (left) and habitat (right) optimised onto the topology using maximum likelihood. Pie charts on each node represent the likelihood attributed to either islands or habitat types. Branch lengths are not shown and branches with bootstrap values ≥ 70 are indicated with an asterisk. Clades A-G are the same as those in figure 4.3 and are discussed in text...... 89

Figure 4.5 - Maximum likelihood tree generated based on a cluster threshold of 90 % and minimum sample number of 30 with vertical lines connecting clades for which there was significant support for hybridisation in the D-statistics analysis. Multiple samples of a taxon occurring in the same clade are represented by a single accession; species relationships that were not supported were collapsed as a polytomy...... 95

Figure 5.1 - Distribution map of A. broussonetii subsp. broussonetii (brbr), subsp. gomerensis (brgo) and A. callichrysum (ca) on Tenerife and La Gomera. Point localities are based on vouchers used in the morphological analysis. Specimens without latitude and longitude coordinates were georeferenced based on locality description...... 100

Figure 5.2 - Plants in situ (A), leaves from glasshouse grown plants (B) and ray (C) and disc (D) cypselae. Photos taken by Oliver White...... 101

Figure 5.3 - Leaf measurements used in morphological analysis...... 105

Figure 5.4 - Boxplots of continuous characters with letters above each box referring to the groups identified by Tukey tests. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum)...... 106

Figure 5.5 - Stacked bar plots with frequency counts for each discrete character. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum)...... 107

Figure 5.6 - PCA based on continuous and discrete characters. Point colour shows the taxon whereas shape is determined by the cluster. The proportion of variation explained on each dimension is shown in parentheses. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum)...... 108

Figure 6.1 - Schematic diagram of two genes with similar expression (five reads each) across two species. Gene 1 is sufficiently similar across the two species such that the genes co-assemble providing an accurate impression of expression. Gene 2 has diverged such that the genes from each species do not co-assemble, leading to a false impression of differential expression...... 119

Figure 6.2 - Schematic of five pipelines used in our transcriptome assembly and transcript quantification...... 121

Figure 6.3 - Change in (A) number of orthogroups and (B) percentage of genes in orthogroups with increasing values for the mcl inflation parameter...... 127

Figure 6.4 - Venn diagram depicting the degree of overlap between over-represented gene ontology terms for differentially expressed transcripts between the parental species A. frutescens and A. broussonetii for pipelines 1-5...... 128

Figure 6.5 - Sample correlation matrix (A) and principal components analysis (B) of expression generated using Trinity script PtR. For the sample correlation matrix the colour key shows Pearson’s correlation in expression between samples and the dendrograms above and to the left show similarity/dissimilarities between samples. The first, second and third axes are shown in the principal components analysis...... 130


List of Appendix Tables

Appendix Table A.1 - Contig name, SSR sequence, and primers designed for the 30 SSR-containing loci selected from the Argyranthemum broussonetii transcriptome...... 141

Appendix Table B.1 - Morphological data used in the present study, including collection reference, population, repeat number and character scores for leaf area, perimeter, length and width. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 142

Appendix Table B.2 - Representative voucher specimens deposited at the Natural History Museum, London. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 145

Appendix Table B.3 - Primer sequences of the nSSRs and cpSSRs employed in the present study...... 150

Appendix Table B.4 - Georeferenced localities remaining after filtering to include only one site per 50 m pixel. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum). Extracted variables for each point are shown. These are aspect (asp), slope, altitude (alt), average annual temperature (x01), daytime thermal range (x02), isothermality (x03), thermal seasonality (x04), maximum temperature of the warmest month (x05), minimum temperature of the coldest month (x06), annual thermal range (x07), average winter temperature (x08), average temperature of the driest quarter (x09), average summer temperature (x10), average temperature of the coldest quarter (x11), annual rainfall (x12), winter rainfall (x16), precipitation in the driest season (x17), precipitation in the coldest season (x19), average spring temperature (x20), average autumn temperature (x21), precipitation of the wettest semester (x22), precipitation of the driest semester (x23), maximum annual average temperature (x24) and average minimum annual temperature (x25). See http://climaimpacto.eu/ for full details of climate variables...... 151


Appendix Table B.5 - Summary statistics of nuclear (A) and chloroplast SSRs (B) including sample size (N); number of alleles (Na); effective number of alleles (Ne); information index (I); observed heterozygosity (Ho); expected heterozygosity (He); unbiased expected heterozygosity (uHe); diversity (h); unbiased diversity by population (uh). Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 159

Appendix Table B.6 - Average number of clusters, clusters that passed the minimum depth requirement, heterozygosity estimate, error estimate, number of consensus reads and loci per sample. In addition, the total number of unlinked SNPs in each assembly that was used for PCA and STRUCTURE is given...... 159

Appendix Table B.7 - Model checking scenario 1. Summary statistics included variance of non-zero values for genic diversities (HV1_1), two sample Fst (FV1_1) and Nei’s distances (NV1_1) as well as admixture statistics (AV1_1)...... 160

Appendix Table C.1 - Summary of ipyrad parameters used for each assembly. Assemblies differ in their clustering threshold and the minimum number of samples required to include a locus and are named based on these parameters. Thresholds of 80 %, 85 % and 90 % are denoted by c80, c85 and c90 respectively, whereas the minimum numbers of 30 and 38 are denoted by m30 and m38 respectively...... 177

Appendix Table C.2 - Summary statistics for the samples used in our GBS assemblies generated using ipyrad. See Table 1 for details of the samples. The parameters shown are: number of raw reads (raw), number of filtered reads (filtered), number of reads that mapped to the chloroplast and mitochondrial reference sequences (plastid), the number of retained unmapped reads (nuclear), clusters that passed the minimum depth requirement (clus.), heterozygosity estimates (het.), error estimates (err.) and consensus sequences (cons.) for each similarity threshold (c80 = 80 %, c85 = 85 % and c90 = 90 %) and finally the number of loci in each assembly (m30 = minimum sample number of 30 and m38 = minimum sample number of 38)...... 178

Appendix Table C.3 - Cluster classification for each dataset identified by mclust. The data sets generated in ipyrad vary in clustering threshold (c) and minimum samples required to process a locus (m). Country or island of origin is abbreviated as SP (Spain), TE (Tenerife), GC (Gran Canaria), EH (El Hierro), LP (La Palma), LG (La Gomera), MA (Madeira), LA (Lanzarote), SE (Selvagem Pequena) and FU (Fuerteventura). Habitat types are abbreviated as ME (Mediterranean basin), PF (Pine forest), LF (Laurel forest), SZ (Sclerophyllous zone), CD (Coastal desert) and HD (High altitude desert)...... 182

Appendix Table C.4 - AICc-ranked substitution models for each dataset with differing clustering thresholds (c) and minimum sample numbers (m)...... 184

Appendix Table D.1 - Input data used for morphological analysis. Morphological characters include leaf attachment (leaf.att.), primary lobe length (lobe.len), primary lobe width (lobe.w), primary lobe length width ratio (lobe.lw.ratio), leaf and lamina width ratio (leaf.lam.ratio), capitula width (cap.w), ray cypselae colour (ray.cyp.colour), ray cypselae arrangement (ray.cyp.arr.), ray cypselae wings (ray.cyp.wings) and disc wing number (disc.wing.no). Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum)...... 207

Appendix Table E.1 - accessions used to grow plants for RNA extraction and transcriptome sequencing. Collection reference, population letter, locality, latitude, longitude and representative voucher material deposited at the Natural History Museum London (BM) is provided...... 209

Appendix Table E.2 - Summary of raw reads, filtered reads and percentage (pct.) of filtered reads across all samples...... 210

Appendix Table E.3 - Summary of input, selected and discarded reads during normalisation within and across species...... 210

Appendix Table E.4 - Summary statistics for interspecific and species-specific assemblies generated using the script TrinityStats.pl. Taxa are abbreviated as bro (A. broussonetii), fru (A. frutescens), lem (A. lemsii) and sun (A. sundingii). Mean statistics for the species-specific assemblies are also presented...... 211

Appendix Table E.5 - Summary statistics for OrthoFinder analyses with varying values for the inflation parameter (I). OG, orthogroups; G50, define; O50, define ...... 212

Appendix Table E.6 - Summary statistics used for comparison of OrthoFinder runs with varying values of the inflation parameter ...... 213


Appendix Table E.7 - Full list of over-represented GO terms identified for differentially expressed transcripts between the parent species across pipelines 1, 2, 3 and 5 ...... 214

Appendix Table E.8 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci between the parental species, A. broussonetii and A. frutescens. Ontology is abbreviated as P (biological process), C (cellular components) and F (molecular functions). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 215

Appendix Table E.9 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci between the homoploid hybrid species A. sundingii and A. lemsii. Ontology is abbreviated as P (biological process), C (cellular components) and F (molecular functions). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 222

Appendix Table E.10 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like expression in A. sundingii. Ontology is abbreviated as P (biological process), C (cellular components) and F (molecular functions (F). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 223

Appendix Table E.11 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression in A. sundingii. Ontology is abbreviated as P (biological process), C (cellular components) and F (molecular functions (F). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 224

Appendix Table E.12 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like in A. lemsii. Ontology is abbreviated as P (biological process), C (cellular components) and F (molecular functions (F). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 225

Appendix Table E.13 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression in A. lemsii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 226

Appendix Table E.14 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like expression shared between the homoploid hybrid species. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 227

Appendix Table E.15 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression shared between the homoploid hybrid species. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided...... 228

List of Appendix Figures

Appendix Figure B.1 - Images of A. broussonetii, A. frutescens subsp. frutescens, A. frutescens subsp. succulentum, A. sundingii and A. lemsii...... 161

Appendix Figure B.2 - Summary plots for a generalised linear model of leaf area generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots...... 162

Appendix Figure B.3 - Summary plots for a linear model of leaf perimeter generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots...... 163

Appendix Figure B.4 - Summary plots for a linear model of leaf length generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots...... 164

Appendix Figure B.5 - Summary plots for a linear model of leaf width generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots...... 165

Appendix Figure B.6 - Correlation circle showing the contribution of each variable to the Principal Components Analysis...... 166

Appendix Figure B.7 - Niche equivalency test plots for each comparison based on Schoener’s D statistic (Schoener, 1968) and Maxent predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 167

Appendix Figure B.8 - Niche equivalency test plots for each comparison based on Warren’s I statistic (Warren et al., 2008) and Maxent predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 168

Appendix Figure B.9 - Niche equivalency test plots for each comparison based on Schoener’s D statistic (Schoener, 1968) and PCA-env predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 169

Appendix Figure B.10 - Niche equivalency test plots for each comparison based on Warren’s I statistic (Warren et al., 2008) and PCA-env predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 170

Appendix Figure B.11 - Eigenvalues for additional principal components and PCAs including axes one, two and three for (A) parents only, (B) all taxa and (C) hybrid taxa only...... 171

Appendix Figure B.12 - Median-joining haplotype network produced using Network 5.0 (Bandelt et al., 1999). Haplotype circle size is proportional to sample number, colour indicates the putative taxa and branch length is proportional to the number of SSR motif changes. To conform to the software requirements of Network, allele fragment sizes for the chloroplast SSRs were converted to simple numeric characters. For example, Ntcp39 allele fragment sizes 163, 164 and 165 were converted to 1, 2 and 3 respectively. These simplified numeric scores for the chloroplast SSR markers are presented adjacent to the haplotype circles...... 172

Appendix Figure B.13 - Number of filtered reads for each sample included in our analysis of genomic SNPs. The horizontal line at 500,000 reads marks the arbitrary cut-off at which samples were removed prior to assembly...... 173

Appendix Figure B.14 - Eigenvalues for additional principal components and PCAs including axes one, two and three for each ipyrad assembly method: (A) 80 % cluster threshold and 10 minimum samples, (B) 80 % cluster threshold and 13 minimum samples, (C) 85 % cluster threshold and 10 minimum samples, (D) 85 % cluster threshold and 13 minimum samples, (E) 90 % cluster threshold and 10 minimum samples and finally (F) 90 % cluster threshold and 13 minimum samples...... 174

Appendix Figure B.15 - Delta K and STRUCTURE plots for K values two, three, four and five for each ipyrad assembly method: (A) 80 % cluster threshold and 10 minimum samples, (B) 80 % cluster threshold and 13 minimum samples, (C) 85 % cluster threshold and 10 minimum samples, (D) 85 % cluster threshold and 13 minimum samples, (E) 90 % cluster threshold and 10 minimum samples and finally (F) 90 % cluster threshold and 13 minimum samples. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum)...... 175

Appendix Figure B.16 - First two axes of the PCA as part of the model checking function of DIYABC. Small empty circles represent datasets simulated from priors, large filled circles represent datasets simulated from posteriors and the large yellow circle represents the observed dataset. See Materials and Methods for description...... 176

Appendix Figure C.1 - Clades within islands used for D-statistics where colour signifies the island of origin and the clade labels represent the number of clades identified...... 185

Appendix Figure C.2 - Number of reads for each sample following the filtering step of the ipyrad pipeline. The dashed line at 0.5 × 10^6 reads passing the filter represents an arbitrary cut-off for the removal of samples with poor quality data...... 186

Appendix Figure C.3 - Principal Component Analysis (PCA) for each assembled dataset. These include (A) cluster threshold 80 % and minimum sample number 30, (B) cluster threshold 80 % and minimum sample number 38, (C) cluster threshold 85 % and minimum sample number 30, (D) cluster threshold 85 % and minimum sample number 38, (E) cluster threshold 90 % and minimum sample number 30 and (F) cluster threshold 90 % and minimum sample number 38...... 187

Appendix Figure C.4 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 80 % and minimum sample number of 30...... 188

Appendix Figure C.5 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 80 % and minimum sample number of 38...... 189

Appendix Figure C.6 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 85 % and minimum sample number of 30...... 190

Appendix Figure C.7 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 85 % and minimum sample number of 38...... 191

Appendix Figure C.8 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 90 % and minimum sample number of 30...... 192

Appendix Figure C.9 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 90 % and minimum sample number of 38...... 193

Appendix Figure C.10 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 80 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 194

Appendix Figure C.11 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 80 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 195

Appendix Figure C.12 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 85 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 196

Appendix Figure C.13 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 85 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 197

Appendix Figure C.14 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 198

Appendix Figure C.15 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 90 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 199

Appendix Figure C.16 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 80 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 200

Appendix Figure C.17 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 80 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 201

Appendix Figure C.18 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 85 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 202

Appendix Figure C.19 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 85 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 203

Appendix Figure C.20 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 204

Appendix Figure C.21 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 90 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text...... 205

Appendix Figure E.1 - Phylogeny adapted from White et al. (in prep) showing independent hybrid origins for A. sundingii and A. lemsii. Taxa with poorly supported phylogenetic relationships are annotated with an asterisk...... 229

Appendix Figure E.2 - Populations sampled in Tenerife (A) and in the Anaga peninsula (B). Populations are labelled A-U. Contour lines represent a 200 m change in altitude...... 230

Appendix Figure E.3 - Summary of raw and filtered read counts across all samples ...... 231

Appendix Figure E.4 - Percentage of orthogroups with distinct, multiple and no hits in (A) Arabidopsis thaliana and (B) Helianthus annuus across varying inflation parameters for OrthoFinder...... 232

List of Accompanying Materials

Data or results that were too large to be included can be found in the CD-ROM attached at the end of the thesis.

Accompanying Material A.1 - Blast output of assembled transcriptomes of Argyranthemum, Descurainia and Echium against Arabidopsis thaliana. One hit per transcript is reported.

Accompanying Material C.1 - Jupyter notebook of D-statistics performed between (1) clades found on the same island and (2) independent lineages of the multi-island endemic species A. adauctum and A. broussonetii that were each resolved as polyphyletic in the phylogenetic analysis.

Accompanying Material E.1 - Differentially expressed loci identified for each expression phenotype and blast hits in Arabidopsis thaliana.

Research Thesis: Declaration of Authorship

Print name: Oliver White

Title of thesis: Genomics of speciation and hybridisation in the Macaronesian endemic genus Argyranthemum (Asteraceae; Anthemideae)

I declare that this thesis and the work presented in it are my own and have been generated by me as the result of my own original research.

I confirm that:

1. This work was done wholly or mainly while in candidature for a research degree at this University;
2. Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated;
3. Where I have consulted the published work of others, this is always clearly attributed;
4. Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this thesis is entirely my own work;
5. I have acknowledged all main sources of help;
6. Where the thesis is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself;
7. Parts of this work have been published as:

White OW, Reyes-Betancort A, Chapman MA, Carine MA (2018) Independent homoploid hybrid speciation events in the Macaronesian endemic genus Argyranthemum. Molecular Ecology; 27: 4856–4874.

White OW, Doo B, Carine MA & Chapman MA (2016) Transcriptome sequencing and simple sequence repeat marker development for three Macaronesian endemic plant species. Applications in Plant Sciences; 4: 1600050.

Signature: Date:

Acknowledgements

There are many people without whom this thesis simply would not have been completed. Primarily, I would like to thank Mark Carine and Mark Chapman for their guidance and patience, not only in writing this thesis, but throughout the various stages of my PhD. I feel very fortunate to have had two supervisors from diverse backgrounds who were always actively involved in the project and also cared about my personal development as a researcher. I would also like to thank Tom Ezard for his advice on numerous occasions, which has had a significant impact on the analytical methods used throughout. Fieldwork often does not get the recognition it deserves. Indeed, without the support and advice of Alfredo Reyes-Betancort and Mark Carine, fieldwork would have been far harder and the quality of collections would have been considerably poorer. Collections in the Canary Islands were assisted by Arnoldo Santo Guerra, Giancarlo Torre and Magui Olangua Corral. Fieldwork collections for Madeira were made by Rachael Graham, Mark Carine, Miguel Menezes de Sequeira and Roberto Jardim. In addition, material from the Selvagens was provided by Miguel Menezes de Sequeira. I am genuinely grateful for their assistance in helping to achieve the aims of this thesis. Mike Cotton has been exceptionally helpful in caring for the plants in the greenhouse facilities at the University of Southampton, especially considering their susceptibility to whitefly and various other pests. Coming from a largely botanical background, I found the bioinformatics used in this thesis a steep learning curve at times. Mark Chapman was always available to discuss ideas and gave me the freedom to explore various ‘rabbit holes’, but always kept me on track. Cumulatively, I expect that Elena Vataga and other members of staff at the IRIDIS computing facility have shaved months off the analysis time during my PhD project, for which I will always be grateful.
Despite having almost no idea about what my PhD actually involved (something about daisies), my family have been unwavering in their support and their belief that I might actually one day finish it. Kate Henbest and the entire Henbest family have supported me during some of the most challenging times of the PhD and I hope in time to return the favour. I have found this project to be a genuinely rewarding experience and I thank all of those who have made it possible.

Abbreviations

Abbreviation Description

NGS Next Generation Sequencing

SSR Simple Sequence Repeat

EST Expressed Sequence Tag

PCR Polymerase Chain Reaction

ITS Internal Transcribed Spacer

RNA Ribonucleic acid

DNA Deoxyribonucleic acid

bp Base pair

contig. Contiguous length of DNA sequence comprised of overlapping DNA segments

subsp. Subspecies

cp Chloroplast

Nr Nuclear

HHS Homoploid hybrid speciation

HTS High Throughput Sequencing

PCoA Principal coordinates analysis

PCA Principal component analysis

Chapter 1 Introduction

Summary

The overall aim of this thesis was to investigate the evolutionary processes responsible for speciation and diversification in the Macaronesian endemic genus Argyranthemum Webb using Next Generation Sequencing (NGS) methodologies. In this introductory chapter, fundamental evolutionary phenomena, including speciation and reproductive isolation, are first introduced. The review then focuses on hybridisation, its frequency in nature and its evolutionary consequences. The potential for homoploid hybrid speciation, which is thought to have occurred in Argyranthemum, is discussed in detail, focusing on its frequency and the models used to explain the origin of a homoploid hybrid species. The role of oceanic islands as model systems in evolutionary biology and the natural history of Macaronesia are also discussed before introducing Argyranthemum, the study system for this thesis. Finally, the key questions we seek to address are outlined, the methods adopted are presented and the chapter contents are summarised.

Speciation

The process that gives rise to a new species is a longstanding and fundamental topic in evolutionary biology and is crucial for understanding the basis of biodiversity. Speciation is typically characterised by the divergence of two evolutionary lineages and the accumulation of reproductive barriers (Rieseberg & Willis, 2007; Lowry et al., 2008; Baack et al., 2015). With technological advances in sequencing capabilities, generally referred to as next generation, high throughput or massively parallel sequencing, current research is becoming increasingly focussed on understanding the genomic and/or genic changes associated with speciation and reproductive isolation (Seehausen et al., 2014). However, the route to complete reproductive isolation can be lengthy and species are often incompletely isolated for millions of years after their initial divergence (Mallet, 2005). Therefore, closely related species or genetically distinct populations that experience secondary contact are likely to interbreed, exchanging genetic material via hybridisation (Rieseberg, 1997; Soltis & Soltis, 2009).

Hybridisation

Hybridisation was historically perceived to be rare and of little evolutionary significance, a view perpetuated by research on animal systems with strong reproductive barriers (Yakimowski & Rieseberg, 2014). For example, Fisher (1930) stated that “the grossest blunder in sexual preference, which we can conceive of an animal making, would be to mate with a species different from its own”. However, botanists have long recognised the importance of hybridisation. Stebbins (1959) stated that “occasional hybridisation between recognised species . . . is the rule in flowering plants”. Interspecific hybridisation seems to be a common occurrence in nature. At least 25 % of plant species and 10 % of animal species are thought to be involved in hybridisation (Mallet, 2005). The ability to hybridise is particularly common in recent, rapidly radiating groups (Ford et al., 2015; Curto et al., 2017), perhaps because not enough time has passed to allow intrinsic reproductive barriers to accumulate.

In the context of speciation, hybridisation may have several outcomes. Hybridisation in areas of overlap between species or genetically distinct populations can lead to the formation of stable hybrid swarms if hybrids have a reduced fitness and there is persistent gene flow between the parental populations (Barton & Hewitt, 1985). If hybrids are fully fertile and there is repeated opportunity for hybridisation, extensive gene flow might result in extinction of one of the hybridising taxa via genetic assimilation or even the merging of two taxa into a single evolutionary lineage (Rieseberg, 1997; Abbott et al., 2013). For this reason, hybridisation has been viewed as a threat to the genetic integrity of rare and endangered species (Taylor et al., 2006; Oliveira et al., 2008). Conversely, hybridisation could lead to the strengthening of barriers to gene exchange between hybridising taxa. This process, termed ‘reinforcement’, involves selection against inviable and sterile hybrids with reduced fitness, resulting in the evolution of isolating barriers (Noor, 1999; Servedio & Noor, 2003). The reunion of two diverging lineages via hybridisation seems to be at odds with speciation and the accumulation of reproductive barriers over time (Rieseberg & Willis, 2007). However, hybridisation can also act as a potent creative force, generating novel gene combinations and adaptations resulting in the formation of an entirely new species in a process termed ‘hybrid speciation’ (Rieseberg, 1997; Mallet, 2007).

Hybrid speciation

There are two types of hybrid speciation: homoploid and polyploid (Figure 1.1). Homoploid hybrid speciation refers to the origin of a new lineage without a change in chromosome number, whereas polyploid hybrid speciation involves a full duplication of chromosome number following hybridisation between species (allopolyploidy; Rieseberg & Willis, 2007; Soltis & Soltis, 2009). A duplication in chromosome number can also occur within a species (autopolyploidy; Mallet, 2007) but this does not involve interspecific hybridisation.

Figure 1.1 - Schematic diagram of homoploid and allopolyploid hybrid speciation between two species each with the same diploid chromosome number (2n = 2).

1.4.1 Frequency of polyploid and homoploid hybrid speciation

Polyploidy is a well-established means of speciation that appears to be particularly frequent in flowering plants (Soltis et al., 2014). When polyploids cross with diploids, the resulting progeny possess an odd ploidy level (e.g. triploids; Mallet, 2007) and typically have reduced fertility, so a newly formed polyploid is largely isolated from its diploid progenitors. Thus, polyploid hybridisation is a simple and saltational way of achieving speciation. In addition, triploid hybrids can produce fertile tetraploid progeny through backcrosses with diploids or other triploids, a route commonly referred to as a “triploid bridge” (Husband, 2004). Wood et al. (2009) estimated that 15 % of angiosperms and 31 % of ferns originated via a change in ploidy. While polyploidy appears to be rarer in animals, it has been identified in a range of groups including insects, molluscs, crustaceans, amphibians, reptiles, fish and mammals (Mable, 2003). Although the relative frequency of allopolyploid and autopolyploid speciation remains a matter of debate, allopolyploid speciation is considered to be more frequent (Mallet, 2007; Soltis & Soltis, 2009). Indeed, considerably more is known about the frequency of polyploid speciation than about that of homoploid hybrid speciation.

Homoploid hybrid speciation (hereafter HHS) is thought to be a rare occurrence in nature. Schumer et al. (2014) compiled a list of 37 putative examples of HHS that were reported in the literature in the preceding decade. These included examples from diverse groups including fungi, flowering plants, fish, mammals and birds. Among flowering plants, Kadereit (2015) identified 28 putative cases of HHS. Taking flowering plants as an example, there is a clear disparity between the frequency of polyploid speciation (15 % of angiosperms) and that of homoploid hybrid speciation (< 0.001 %).

The disparity in frequency between these two modes of speciation has largely been attributed to two main factors. Firstly, early generation homoploid hybrids frequently have reduced fitness caused by a loss of fertility and/or viability, characteristics generally referred to as hybrid incompatibilities (Rieseberg & Willis, 2007). Secondly, early generation homoploid hybrids share the same chromosome number as their parental progenitors making them particularly susceptible to introgression and assimilation (Rieseberg & Willis, 2007; Schumer et al., 2014). In contrast, early generation polyploid hybrids do not exhibit a loss of fertility/fitness and the duplication of chromosome number leads to immediate reproductive isolation from the parental progenitors (Soltis & Soltis, 2009). In addition, it has been suggested that the degree of genetic divergence between the hybridising parental species is related to the occurrence of hybrid speciation at the polyploid versus homoploid level (Chapman & Burke, 2007). Genetic distance was estimated between species pairs that have given rise to homoploid and polyploid hybrid species in angiosperms and the parental species of allopolyploids were found to be more divergent than the parental species of homoploid hybrids (Chapman & Burke, 2007).

Another important factor that may be associated with the apparent rarity of homoploid hybrid species is the experimental challenge of identifying them, which typically requires a combination of morphological, karyological and molecular analyses. Indeed, given the frequency of hybridisation in nature and these experimental challenges, it has been suggested that HHS may be more common than currently thought (Mallet, 2007; Mavárez & Linares, 2008; Nolte & Tautz, 2010).

Recent publications have also highlighted disagreement on the key criteria defining homoploid hybrid species (Schumer et al., 2014, 2018; Nieto Feliner et al., 2017). Schumer et al. (2014) proposed three criteria for a homoploid hybrid species: (1) reproductive isolation of hybrid lineages from the parental species, (2) evidence of hybridisation in the genome, and (3) evidence that this reproductive isolation is a consequence of hybridisation. In their review of 28 examples of HHS, the authors found that only three sunflower species (Helianthus anomalus S.F.Blake, H. deserticola Heiser, and H. paradoxus Heiser; Rieseberg et al., 2003) and one butterfly species (Heliconius heurippa Hewitson; Jiggins et al., 2008; Salazar et al., 2010) satisfied all three criteria. Indeed, the majority of putative cases of HHS were based primarily on criterion two, evidence of hybridisation in the genome. The lack of studies that were able to demonstrate reproductive isolation, or that this isolation was derived from hybridisation, is likely due to the experimental challenges and resources required for non-model systems. The quantitative trait loci (QTL) experiments implemented for Helianthus (Rieseberg et al., 2003) and Heliconius (Jiggins et al., 2008; Salazar et al., 2010) are likely to be unfeasible in non-model systems that are difficult to breed under controlled conditions or have a long life cycle (Ru et al., 2018). For this reason, Nieto Feliner et al. (2017) suggested that these criteria were too strict and would limit our understanding of the importance of homoploid hybrid speciation in nature. While the criteria of Schumer et al. (2014) may be restrictive and experimentally challenging to demonstrate, they provide a useful framework with which putative cases can be investigated and compared.

1.4.2 Models of homoploid hybrid speciation

For early generation homoploid hybrids, the accumulation of reproductive barriers that isolate them from the parental taxa is essential if they are to become a distinct lineage (Rieseberg, 1997). In addition to the factors discussed above, this represents a considerable barrier to homoploid hybrid speciation. There are two main hypothesised mechanisms by which a homoploid hybrid lineage might become stabilised: recombinational speciation and ecological speciation.

Recombinational speciation, proposed by Grant (1971), describes a scenario in which chromosomal rearrangements provide some degree of intrinsic reproductive isolation between a homoploid hybrid and the parental progenitors. This model involves hybridisation between two species with the same diploid chromosome number that differ by a number of chromosomal rearrangements. Figure 1.2 depicts an example of recombinational hybrid speciation between two species, each with a diploid chromosome number of 2n = 8, that differ by two chromosomal rearrangements (Rieseberg, 1997; Buerkle et al., 2000; Hegarty & Hiscock, 2005). In this example, hybridisation produces F1 progeny heterozygous for the chromosomal rearrangements. The heterozygous F1 hybrid will generate sixteen possible gamete combinations during meiosis (Figure 1.2). Twelve of these combinations will be unbalanced and inviable due to impaired chromosomal pairing during meiosis or crossing over between heterologous chromosomal blocks (Buerkle et al., 2000). As a result, the F1 hybrid is largely infertile (12/16 or 75 %; Figure 1.2). However, four of the possible combinations will be balanced and viable. Two will recover the parental combinations whereas the remaining two will recover novel combinations. If gametes carrying the novel combinations unite through selfing, a small number of F2 individuals will possess novel karyotypes that are fertile and stable when crossing among themselves, but partially intersterile with the parental species. These novel karyotypes are at least partially reproductively isolated from parental species, and if they persist, may constitute a novel species (Buerkle et al., 2000).
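The gamete arithmetic in this example can be checked by simple enumeration. The sketch below is illustrative only: it assumes a haploid set of four chromosomes in which each of the two rearrangements spans a distinct pair of chromosomes (as for a reciprocal translocation), and that a gamete is balanced only when both chromosomes of a pair derive from the same parent. The names REARRANGED_PAIRS, classify_gamete and count_gametes are hypothetical and do not come from the cited studies.

```python
from itertools import product

# Assumption: chromosomes 0-1 carry the first rearrangement and
# chromosomes 2-3 carry the second (haploid number n = 4, so 2n = 8).
REARRANGED_PAIRS = [(0, 1), (2, 3)]


def classify_gamete(gamete):
    """Classify a gamete given as a tuple of parental origins ('A' or 'B')."""
    # Balanced only if both chromosomes of every rearranged pair
    # were inherited from the same parental species.
    balanced = all(gamete[i] == gamete[j] for i, j in REARRANGED_PAIRS)
    if not balanced:
        return "unbalanced"
    # A balanced gamete drawn entirely from one parent is parental;
    # any other balanced gamete is a novel combination.
    return "parental" if len(set(gamete)) == 1 else "novel"


def count_gametes():
    """Enumerate all 2**4 gametes of the F1 hybrid and tally each class."""
    counts = {"unbalanced": 0, "parental": 0, "novel": 0}
    for gamete in product("AB", repeat=4):
        counts[classify_gamete(gamete)] += 1
    return counts


if __name__ == "__main__":
    print(count_gametes())
```

Under these assumptions the enumeration reproduces the proportions given in the text: twelve unbalanced gametes, two parental combinations and two novel combinations.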

Another model for HHS is ecological speciation, in which hybridisation generates novel allele combinations that result in adaptations not present in either parent species, allowing the hybrid progeny to occupy a new niche that provides reproductive isolation from the parental taxa (Gross & Rieseberg, 2005). This model was supported in a simulation study by Buerkle et al. (2000), who found ecological isolation necessary for HHS. Perhaps the strongest support for the role of ecological isolation is that most known examples of HHS are associated with ecological and/or geographical shifts. For example, of the 28 examples of HHS identified by Kadereit (2015), 21 were associated with shifts in ecology or geography. However, it is not known whether ecological isolation was a direct consequence of hybridisation or simply a by-product of divergence over time.

The combination of divergent genomes by hybridisation results in dramatic changes in gene expression (Hegarty et al., 2008). Indeed, it has been hypothesised that novel gene expression combinations and/or transgressive expression generated by hybridisation could play an important role in the ecological shift, isolation and origin of a homoploid hybrid species (Lai et al., 2006; Hegarty et al., 2009). However, this hypothesised mechanism has received relatively little attention.


Figure 1.2 - Recombinational model of homoploid hybrid speciation between two species that have the same number of chromosomes (2n = 8) but differ by two chromosomal rearrangements. Figure adapted from Rieseberg (1997); see text for details.


Oceanic archipelagos as natural laboratories for studying evolution

Model systems have taught us much of what we know about biology. Their relatively simple nature makes it possible to interpret complex interactions and organisation that might otherwise be impossible to comprehend. In just the same way that Arabidopsis thaliana (L.) Heynh. has been employed as a model organism for molecular plant biology, remote oceanic archipelagos offer comparatively simple systems or natural experiments for the study of evolution with which we can address fundamental questions concerning the nature of species and the processes responsible for their origin (Losos & Ricklefs, 2009; Kueffer et al., 2014).

Oceanic islands are useful model systems given their isolated nature, comparative simplicity and well-characterised geological age (Losos & Ricklefs, 2009). Formed by volcanic activity, oceanic islands are also ecologically diverse, with steep ecological gradients over short distances. Oceanic archipelagos are home to a disproportionate number of the world’s endemic species (Myers et al., 2000; Kueffer et al., 2014) and are commonly associated with evolutionary radiations such as Darwin’s finches (Lamichhaney et al., 2015), Anolis lizards of the Greater Antilles (Losos et al., 1998) or Hawaiian lobeliads (Givnish et al., 2009).

Macaronesia as a model for evolution

Macaronesia is a biogeographic region encompassing five archipelagos of the North Atlantic Ocean off the western coasts of Europe and Africa: the Azores, Madeira, Selvagens, Canary Islands and Cape Verdes (Francisco-Ortega et al., 1997a; Figure 1.3). The islands of Macaronesia exhibit a broad range of geological ages and ecological habitats. In the Canary Islands, for example, geological ages range from 0.7 million years (My) for El Hierro to 21 My for Lanzarote (Francisco-Ortega et al., 1997a). Macaronesia is also characterised by distinct ecological zones, generated by large altitudinal variation and the effect of the northern trade winds (Figure 1.4), although there is considerable variation between islands due to differences in altitude. For example, Tenerife is the tallest island in Macaronesia, reaching 3717 m, and five distinct ecological zones are present, defined here as coastal desert, sclerophyllous zone, laurel forest, pine forest and high-altitude desert (Humphries, 1979). In contrast, the eastern islands of Lanzarote and Fuerteventura reach an altitude of approximately 650 m and only coastal desert and sclerophyllous zone habitats are present.

The isolation and habitat diversity of these islands have resulted in a rich flora, of which a considerable proportion is shared with the continental flora of the Mediterranean region and north-west Africa (Borgen, 1984). The flora of Macaronesia, including introduced species, has an estimated 3200 species, of which 680 (20 %) are endemic to the region (Humphries, 1979). A striking characteristic of the Macaronesian flora is the number of radiations of flowering plants in groups including Argyranthemum Webb, Sonchus L., Echium L., the Aeonium Webb & Berthel. alliance, Sideritis L. and Pericallis Webb & Berthel. The high number of endemic radiations across Macaronesia makes the region particularly suitable for evolutionary studies in flowering plants.

Figure 1.3 - Macaronesian archipelagos comprising the (A) Azores, (B) Madeira, (C) Selvagens, (D) Canary Islands and (E) Cape Verdes. Islands within each archipelago are inset in A-E on the left-hand side, with island ages in parentheses, as described in Francisco-Ortega et al. (1997) and García-Maroto et al. (2009).


Figure 1.4 - Habitat zones present in Macaronesia and images of coastal desert (Tenerife, Anaga, North coast between Almáciga and Roque Bermejo), sclerophyllous zone (Tenerife, Anaga, Barranco de Roque Bermejo below Chamorga), laurel forest (El Hierro, La Llanía; photo taken by R. Graham), pine forest (Tenerife, above Arafo) and high-altitude desert (Tenerife, near Observatorio del Teide).


Argyranthemum

This thesis focuses on the Macaronesian endemic genus Argyranthemum as an evolutionary radiation with which we can address fundamental questions related to speciation and hybridisation (Figure 1.5).

Argyranthemum (Asteraceae; Compositae) is placed in the Anthemideae (formerly Chrysantheminae), which consists of 111 genera and approximately 1800 species, with three major centres of diversity in Central Asia, the Mediterranean region and Southern Africa (Watson et al., 2000; Oberprieler et al., 2009). Within the Anthemideae, Argyranthemum is grouped in the subtribe Glebionidinae, which also includes three other genera distributed in the western Mediterranean: Ismelia Cass., Heteranthemis Schott and Glebionis Cass. Ismelia and Heteranthemis are both monotypic and are distributed in and the Iberian Peninsula respectively. Glebionis has two species, G. coronarium L. and G. segetum L., distributed in Europe, Asia and North Africa.

The monophyly of the subtribe Glebionidinae was supported by morphological cladistic analysis (Bremer & Humphries, 1993) and chloroplast (cp) restriction site data (Francisco-Ortega et al., 1995b). Within Glebionidinae, the monophyly of Argyranthemum was demonstrated using cp restriction site data (Francisco-Ortega et al., 1995b), isozyme data (Francisco-Ortega et al., 1995a) and sequences of the internal transcribed spacer (Francisco-Ortega et al., 1997b). Members of Argyranthemum can consistently be distinguished from their continental relatives by their perennial woody habit (Humphries, 1976).


Figure 1.5 - Overview of the morphological diversity present in Argyranthemum. Growth form varies from small dome-shaped plants such as those of (A) A. frutescens subsp. succulentum, through upright branching plants such as (B) A. gracile, to large shrubs including (C) A. broussonetii subsp. gomerensis. Variation in leaf shape is shown for (D) A. adauctum subsp. erythrocarpon, (E) A. haouarytheum, (F) A. broussonetii subsp. broussonetii, (G) A. pinnatifidum subsp. succulentum, (H) A. frutescens subsp. foeniculaceum, (I) A. coronopifolium. Variation in capitula size and colour is shown for (J) A. hierrense, (K) A. sundingii, (L) A. maderense, (M) A. haematomma. Disc (N, O) and ray florets (P) from A. frutescens are shown. Examples of disc (Q, R) and ray cypselae (S, T) are also shown. All images taken by O. White except for those taken by A. R. Betancort (C, L) and R. Graham (G, M).

Argyranthemum is the largest endemic genus of flowering plants in Macaronesia and is distributed in Madeira, Selvagens and the Canary Islands (Humphries, 1979; Francisco-Ortega et al., 1997a). Twenty-four morphological species are recognised, of which four have recognised subspecies, and consequently the genus includes 39 terminal taxa (Humphries, 1976). Twenty species are restricted to the Canary Islands, three species are endemic to Madeira and a single endemic is found on the Selvagens (Humphries, 1976). However, Argyranthemum is a taxonomically complex group and taxon delimitation is often blurred by hybridisation, a frequent occurrence in the group as intrinsic reproductive barriers are weak between species (Fjellheim et al., 2009). Isolating barriers are predominantly external, arising from geographical or ecological isolation, and where species come into proximity, hybrid swarms are commonly identified (Borgen, 1976; Brochmann, 1984, 1987; Brochmann et al., 2000). Hybridisation has been implicated in the origin of three species, namely A. sundingii Borgen, A. lemsii Humphries and A. escarrei (Svent.) Humphries. Argyranthemum escarrei on Gran Canaria was hypothesised to be derived from hybridisation between A. adauctum subsp. canariense and A. (Borgen, 1976) based on its morphological similarity to hybrid swarms, but this is yet to be demonstrated experimentally. The most robust example of hybrid origin in Argyranthemum relates to A. sundingii and A. lemsii on Tenerife, for which morphological and molecular analyses support homoploid hybrid speciation from the same parental cross of A. broussonetii and A. frutescens (Brochmann et al., 2000; Fjellheim et al., 2009). Argyranthemum is often cited as one of the textbook examples of HHS, but there are still unanswered questions regarding the delimitation and origin(s) of these putative homoploid hybrid species that need to be addressed.

Firstly, Brochmann et al. (2000) suggested that A. sundingii and A. lemsii should be treated as conspecific based on morphological analyses of plants grown under common glasshouse conditions. Their analysis was based on material collected from the type localities, but more populations have since been discovered and the distinctiveness of the two species across their full geographical extent remains to be robustly tested. Secondly, ecological selection is often implicated as an important factor in the origin of homoploid hybrid species, but the evidence is largely anecdotal and based solely on species distributions (Brochmann et al., 2000; Fjellheim et al., 2009). To date, there has been no explicit ecological analysis of the putative homoploid hybrid species in Argyranthemum. Thirdly, Brochmann et al. (2000) suggested that A. sundingii and A. lemsii were the result of independent homoploid hybrid speciation events based on different maternal chloroplast parentage in each of the hybrid species. However, sampling was limited to one accession for each putative homoploid hybrid species, so these results need to be interpreted with extreme caution. Finally, populations of unknown provenance identified as “A. cf. lemsii” by Fjellheim et al. (2009) are hypothesised to be of hybrid origin but also show evidence of introgression, and how they fit with the HHS scenario proposed for the group remains unclear. Confirming the working model of homoploid hybrid speciation in Argyranthemum is important so that future studies can address questions relating to the underlying genomic changes. For example, if the homoploid hybrid species are independently derived and ecologically isolated from their parental progenitors, are there gene expression changes unique to and/or shared between the homoploid hybrid species that are related to their occupation of a novel habitat?

To date, phylogenetic studies of Argyranthemum have been unable to fully resolve species relationships due to a lack of genetic variation in commonly used markers (Francisco-Ortega et al., 1996a,b). As a result, inferences of the processes responsible for diversification have been limited. A chloroplast restriction site analysis identified two main clades, one restricted to Madeira and the Selvagens and one comprising taxa from the Canary Islands (Francisco-Ortega et al., 1996b). Within the Canary Islands clade, two major groups were resolved, one largely corresponding to taxa under the influence of the northern trade winds and the other not, suggesting that inter-island colonisation between similar ecological habitats was the prominent driver of diversification in the Canary Islands (Francisco-Ortega et al., 1996b). In contrast, habitat shifts were more frequent on Madeira, where there was a single colonisation event followed by shifts into the different habitat types available. However, given the lack of resolution, the relative contribution of habitat shifts and inter-island colonisation to the diversification of Argyranthemum has yet to be robustly investigated.


Thesis aims

The overall aim of this thesis was to investigate processes responsible for diversification of the Macaronesian endemic genus Argyranthemum using NGS methodologies. Specifically, I sought to:

1. Address outstanding questions surrounding the origin of the homoploid hybrid species A. sundingii and A. lemsii:
   o Are A. sundingii and A. lemsii morphologically distinct?
   o Are A. sundingii and A. lemsii ecologically isolated from their parental progenitors?
   o Were A. sundingii and A. lemsii the result of independent HHS events?
   o What is the provenance of material identified by Fjellheim et al. (2009) as A. cf. lemsii?
2. Investigate the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of Argyranthemum.
3. Employ comparative transcriptomics to investigate changes in gene expression associated with HHS using plants grown under common conditions.

Methods

To address these questions, a range of techniques was employed. Fieldwork was carried out in the Canary Islands (July to August 2015 and May to June 2016) with Mark Carine, Rachael Graham, Alfredo Reyes Betancort, Arnoldo Santo Guerra, Giancarlo Torre and Magui Olangua Corral. Material from Madeira resulted from a field trip undertaken by Rachael Graham, Mark Carine, Miguel Menezes de Sequeira and Roberto Jardim in March 2016. Material from the Selvagens was provided by Miguel Menezes de Sequeira. The fieldwork aims were to collect (1) silica-dried leaf samples, cypselae (dry single seeded ) and representative vouchers from all members of the genus, and (2) population-level samples of silica-dried leaves and cypselae from the homoploid hybrid species (A. sundingii and A. lemsii) and their parental progenitors. In addition, I sought to understand the ecological and geographical context of the patterns of variation observed in the genus. A total of 291 vouchers, 781 silica-dried leaf samples and 399 cypselae collections were made. Leaf samples and vouchers have been barcoded and accessioned at the Natural History Museum (BM; Figure 1.6) and duplicate specimens have been prepared for Macaronesian herbaria including Instituto Canario de Investigaciones Agrarias (Tenerife; ORT), Jardín Botánico Canario (La Palma; LPA) and the University of Madeira.


Figure 1.6 - Voucher specimens of (A) A. tenerifae, (B) A. gracile, (C) A. coronopifolium and (D) A. adauctum subsp. palmensis deposited at the Natural History Museum, London (BM).

Previous studies of Argyranthemum largely relied on a handful of selected molecular markers generated by restriction enzymes or Sanger sequencing. To overcome some of the limitations of earlier studies, this project employed Next Generation Sequencing (NGS) methods. NGS is a general term used to describe a new generation of sequencing platforms that can read thousands of sequences in parallel at a fraction of the cost (per base pair) of traditional methods. Several platforms are available including Illumina, PacBio and Roche 454. Although each has unique characteristics, all involve library preparation and immobilisation of template DNA sequences on a solid surface or support before sequencing thousands to billions of fragments simultaneously (Metzker, 2010). In this project, two NGS methods are employed, namely Genotyping-By-Sequencing (GBS) and RNA-seq. The reasoning and methodology behind each are explained briefly.

Reduced representation methods, such as Genotyping-By-Sequencing (GBS) or restriction site associated DNA sequencing (RAD-seq), sequence a subset of the genome (Andrews et al., 2016). Methods such as these are advantageous in that no prior knowledge of the genome is required and they can be used across multiple samples in parallel. Further, because only a fraction of the genome is sequenced, they are significantly less costly than whole genome sequencing approaches. As a result, reduced representation data sets have been widely employed as a method of obtaining SNP data from non-model organisms (Eaton & Ree, 2013; Paun et al., 2016; Curto et al., 2017). Reduced representation methods typically follow a similar protocol (Andrews et al., 2016). Genomic DNA is isolated and digested using restriction enzymes before fragments of a certain size are selected. Adapters with barcode sequences are ligated to the selected fragments so that DNA from different samples can be multiplexed. Importantly, restriction enzyme cut sites are likely to be conserved, such that fragments from the same part of the genome will be sequenced across closely related species. The fragments of DNA sampled from across the genome are then sequenced on an NGS platform. The resulting data can be assembled against a reference genome or de novo with a range of different bioinformatic pipelines (Catchen et al., 2013; Eaton & Ree, 2013). In this thesis, a GBS dataset is used to investigate the putative homoploid hybrid species A. sundingii and A. lemsii. It is also used to resolve evolutionary relationships and infer the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of Argyranthemum.

RNA-sequencing (RNA-seq) targets the expressed portion of the genome (transcriptome), simultaneously providing a snapshot of sequence polymorphism and variation in expression across thousands of genes. A wide variety of applications are possible for RNA-seq including the identification of variable molecular markers (Chapman, 2015), identification of differentially expressed genes and sites undergoing selection (Elmer et al., 2010; Chapman et al., 2013), investigating phylogenetic relationships (Yang et al., 2015), reconstructing demographic history (Ru et al., 2018) and inferring changes in the rate of evolution (Hodgins et al., 2015; Nevado et al., 2016). Importantly, RNA-seq can be employed without any prior knowledge of the genome, making this an appropriate method for non-model systems. In RNA-seq, RNA is isolated and converted to a library of cDNA fragments with adaptors on one or both ends before sequencing with an NGS platform. Transcriptome data can be assembled with or without a reference genome using a range of bioinformatic pipelines (Birol et al., 2009; Schulz et al., 2012; Grabherr et al., 2013; Haas et al., 2014). In this thesis, RNA-seq is first used to identify and design primers for simple sequence repeat markers (SSRs) that are polymorphic. These SSR markers are later used in population genetic analyses of the homoploid hybrid species in Argyranthemum. In addition, RNA-seq is employed to identify gene expression changes associated with the origin of the homoploid hybrid species in Argyranthemum.
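The first two steps of the reduced representation protocol described above, restriction digestion followed by size selection, can be sketched in silico. This is a toy illustration rather than the laboratory or bioinformatic procedure used in this thesis. The recognition sequence GAATTC (EcoRI, which cuts after the first G) is chosen only as a familiar example, and the function names digest and size_select and the size thresholds are hypothetical.

```python
def digest(sequence, site="GAATTC", cut_offset=1):
    """Cut a DNA sequence at every occurrence of a recognition site.

    cut_offset is the position within the site at which the strand is cut
    (1 for EcoRI, which cuts between G and AATTC). Only one strand is
    modelled, which is a simplification.
    """
    sequence = sequence.upper()
    cut_positions = []
    pos = sequence.find(site)
    while pos != -1:
        cut_positions.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    # Fragment boundaries run from the start, through each cut, to the end.
    bounds = [0] + cut_positions + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, min_len, max_len):
    """Keep only fragments whose length falls within [min_len, max_len]."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    genome = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
    fragments = digest(genome)
    print([len(f) for f in fragments])
    print(size_select(fragments, min_len=6, max_len=20))
```

Because the same recognition sites tend to be conserved among close relatives, running such a digest on related genomes would be expected to recover largely overlapping sets of fragments, which is the property the protocol relies on.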

Various analytical methods are implemented in this thesis including morphological analysis, population genetic analyses, demographic modelling, ecological niche modelling, phylogenetic inference, ancestral state estimation, D-statistics (ABBA-BABA tests) and gene expression analyses. Morphological analyses are used to test the delimitation of A. sundingii and A. lemsii. The taxonomic status of subspecies of A. broussonetii is also investigated. Population genetic analyses including ordinations and STRUCTURE (Pritchard et al., 2000) are valuable for visualising similarity and dissimilarity between sampled individuals. In this thesis, population genetic analysis is employed to test the distinction of the homoploid hybrid species in Argyranthemum from each other and from the parental progenitors. Demographic modelling is often used to compare simulations of different evolutionary scenarios to biological sequence data, as a means of identifying the most likely evolutionary scenario. Software packages for this include DIYABC (Cornuet et al., 2008, 2010) and fastsimcoal (Excoffier & Foll, 2011). Demographic modelling was implemented in this thesis to test the hypothesis that A. sundingii and A. lemsii were the result of independent HHS events. Ecological niche modelling (ENM) can quantify a species' distribution, how it might change over time and the degree of overlap with other species. ENM requires knowledge of species distributions and relevant environmental data. Various ENM methods are available including MAXENT (Phillips et al., 2018) and ordination-based approaches (Di Cola et al., 2017). ENM was employed to test the hypothesis that the homoploid hybrid species occupy a novel intermediate habitat with respect to the parental progenitors.


In this thesis, I used GBS data for phylogenetic inference. While the number of GBS markers obtained is usually high, individual markers are too short and lack sufficient variation to resolve relationships. As such, GBS markers are often collapsed into a single haplotype, ideally including both variant and invariant sites, before analysing relationships using maximum likelihood or Bayesian methodologies (Leaché et al., 2015). This approach is used in this thesis to resolve the phylogenetic relationships in Argyranthemum. Mapping of ancestral states onto nodes of a phylogenetic tree is often employed as a method of inferring the evolutionary origin of particular traits in a tree or the processes responsible for divergence. In this thesis, a maximum likelihood approach is used to estimate ancestral states and investigate the relative importance of geographic isolation and habitat shifts in the diversification of Argyranthemum.

D-statistics (ABBA-BABA tests) can be employed as a means of identifying possible hybridisation/introgression (Eaton & Ree, 2013; Curto et al., 2017). Each test takes a four-taxon pectinate tree denoted by (((P1,P2),P3),O) and identifies sites at which the distribution of ancestral (A) and derived (B) alleles is incongruent with the tree, denoted as ABBA or BABA. Incongruence can be caused by incomplete lineage sorting (ILS) or hybridisation. If ILS alone is responsible, we would expect the proportions of ABBA and BABA alleles to be equal, whereas a significant excess of either pattern suggests hybridisation/introgression. D-statistics are used in this thesis to test the hypothesis that hybridisation is frequent between species occupying the same island and that hybridisation may explain the presence of non-monophyletic multi-island endemic lineages in Argyranthemum.
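The logic of the test can be illustrated with a small calculation. The sketch below assumes the widely used count-based form of the statistic, D = (nABBA - nBABA) / (nABBA + nBABA), applied to biallelic sites already polarised against the outgroup. It is not the implementation used in this thesis, the function name d_statistic is hypothetical, and the significance testing normally applied to D is omitted.

```python
def d_statistic(site_patterns):
    """Compute Patterson's D from per-site allele patterns.

    Each pattern is a four-character string giving the allele state
    ('A' ancestral, 'B' derived) in the order P1, P2, P3, outgroup.
    Sites with any other pattern (for example 'BBAA') are ignored.
    """
    n_abba = sum(1 for p in site_patterns if p == "ABBA")
    n_baba = sum(1 for p in site_patterns if p == "BABA")
    if n_abba + n_baba == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (n_abba - n_baba) / (n_abba + n_baba)


if __name__ == "__main__":
    # Equal counts, as expected under incomplete lineage sorting alone.
    print(d_statistic(["ABBA", "BABA", "BBAA", "ABBA", "BABA"]))
    # An excess of ABBA sites, consistent with gene flow between P2 and P3.
    print(d_statistic(["ABBA", "ABBA", "ABBA", "BABA"]))
```

A value of D near zero is consistent with ILS alone; positive values indicate an excess of sites at which P2 and P3 share the derived allele, and negative values an excess at which P1 and P3 do.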

Gene expression analyses using RNA-seq data can be implemented with or without a reference genome. In this thesis, a de novo approach is employed using Trinity assembly software (Grabherr et al., 2013; Haas et al., 2014). For gene expression analyses across multiple species there is no standard methodology, so we compare various pipelines. To link gene expression to potential functions, genes are often annotated by BLAST searches against model organisms such as Arabidopsis thaliana, for which gene ontology information is available. Over-represented gene ontology terms are then identified using Fisher's exact test.
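For over-representation, the one-sided Fisher's exact test is equivalent to an upper-tail hypergeometric probability, which can be written with the standard library alone. The sketch below is a minimal illustration and not the software used in this thesis; the function name enrichment_p_value and its argument names are hypothetical, and no correction for multiple testing across terms is included.

```python
from math import comb


def enrichment_p_value(hits, sample_size, annotated, background):
    """One-sided Fisher's exact test for over-representation of a GO term.

    hits        -- genes of interest annotated with the term
    sample_size -- total genes of interest (e.g. differentially expressed)
    annotated   -- background genes annotated with the term
    background  -- total genes in the background set

    Returns P(X >= hits) for X drawn from the hypergeometric distribution.
    """
    total = comb(background, sample_size)
    upper = min(sample_size, annotated)
    # Sum the probability of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, i) * comb(background - annotated, sample_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 8 of 40 genes of interest carry a term that annotates
    # 50 of 2000 background genes.
    print(enrichment_p_value(8, 40, 50, 2000))
```

Small p-values indicate that the term occurs among the genes of interest more often than expected from its frequency in the background set.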

Thesis outline

This thesis comprises five main data chapters. Chapter two is a study published in Applications in Plant Sciences (White et al., 2016; DOI 10.3732/apps.1600050), in which simple sequence repeat markers (SSRs) are designed from transcriptome assemblies for three Macaronesian endemic genera, including Argyranthemum. Thirty SSR markers were designed, of which 12 could be amplified by PCR and eight were polymorphic. This study demonstrated the value of transcriptomes as a genetic resource for non-model systems, and the markers generated are used for population genetic analyses (see chapter three). My roles in this paper were the implementation of lab work, data analysis and preparation of the manuscript.

Chapter three addresses several outstanding questions surrounding the origin of the homoploid hybrid species A. sundingii and A. lemsii and has been published in Molecular Ecology (White et al., 2018; DOI 10.1111/mec.14889). In this study, leaf morphological analysis suggests that A. sundingii and A. lemsii are distinct from their parental progenitors and distinguishable from each other based on leaf area. Ecological niche modelling (ENM) demonstrates that the homoploid hybrid species occupy novel habitats that are intermediate relative to the parental species. Nuclear simple sequence repeat markers (SSRs) and single nucleotide polymorphism (SNP) data indicate that the homoploid hybrid species are distinct from the parental taxa, while populations previously referred to as “A. cf. lemsii” are likely of hybrid origin and have subsequently introgressed with A. frutescens. Population-level sampling of chloroplast SSRs and approximate Bayesian computation show that A. sundingii and A. lemsii are independently derived from the same parental cross. This study provides support for the working model of HHS in Argyranthemum, specifically that the homoploid hybrid species are distinct, ecologically isolated from each other and from their parental progenitors, and have evolved as a result of independent HHS events. My roles were fieldwork sampling, the design of the study, implementation of lab work, data analysis and preparation of the manuscript.

In chapter four, GBS is employed to investigate phylogenetic relationships in Argyranthemum and the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of the genus. I intend to submit this paper for peer review after submission of this thesis and it is formatted for the intended journal. Ancestral state reconstruction revealed an important role for both geographic isolation and habitat shifts in the diversification of Argyranthemum. In particular, habitat shifts were found to be more important in the Canary Islands than previously thought. D-statistics (ABBA-BABA tests) identified evidence of hybridisation between lineages co-occurring on the same island but revealed little support for the hypothesis that hybridisation may be responsible for the occurrence of non-monophyletic multi-island endemic (MIE) species. This study demonstrates that geographic isolation, habitat shifts and hybridisation have all been important in the diversification of Argyranthemum. In addition, morphological convergence is proposed as an explanation for the occurrence of non-monophyletic MIE species, revealing greater complexity in the processes responsible for the diversification of this endemic oceanic island genus than was previously thought.

In the phylogenetic analysis presented in chapter four, subspecies of A. broussonetii were found to be non-monophyletic, with subsp. broussonetii from Tenerife and subsp. gomerensis from La Gomera resolved in different clades. In chapter five, a morphological analysis of A. broussonetii is presented. Although subspecies of A. broussonetii have converged on a similar leaf morphology, likely a consequence of their occupation of similar ecological habitats, the two subspecies can be readily distinguished from each other based on cypselae characteristics. Indeed, A. broussonetii subsp. gomerensis is shown to be morphologically more similar to A. callichrysum, which also occurs on La Gomera, than to A. broussonetii subsp. broussonetii. This study confirms the hypothesis that convergent evolution has been important in the evolution of Argyranthemum and supports raising A. broussonetii subsp. gomerensis to the rank of species. A revision is presented with the necessary new combination proposed and a key to the three taxa. I intend to submit this paper for peer review after submission of this thesis and it is formatted for the intended journal.

Chapter six presents a comparative transcriptomic analysis to test the hypothesis that novel gene expression combinations and/or transgressive expression generated by hybridisation play an important role in ecological isolation and origin of a homoploid hybrid species. Five different pipelines for transcriptome assembly and transcript quantification are compared as there is no established methodology for comparative transcriptomic analyses. We identify genes that (A) are differentially expressed between the parental species, (B) show novel expression in the homoploid hybrid species, (C) are differentially expressed between the two homoploid hybrid species, and (D) show parent-like expression in the hybrid species. This study highlights considerations related to orthology that must be accounted for in comparative transcriptomic analyses and identifies putative genes involved in the ecological isolation and diversification of the homoploid hybrid species in Argyranthemum. Although independently derived, A. sundingii and A. lemsii also appear to have converged in their gene expression.

Finally, in chapter seven, the findings from each chapter are reviewed and potential future avenues for research on Argyranthemum as a model for understanding speciation and hybridisation are discussed.


Chapter 2 Transcriptome Sequencing and Simple Sequence Repeat Marker Development for Three Macaronesian Endemic Plant Species

This chapter is published as:

Oliver W. White, Bethany Doo, Mark A. Carine, and Mark A. Chapman (2016) Transcriptome Sequencing and Simple Sequence Repeat Marker Development for Three Macaronesian Endemic Plant Species. Applications in Plant Sciences, 4, 1600050

Oliver W. White1,2, Bethany Doo2, Mark A. Carine1 and Mark A. Chapman2

1 Algae, Fungi and Plants Division, Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom.

2 Centre for Biological Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom.

Author contributions

Oliver White implemented lab work, data analysis and prepared the manuscript. Bethany Doo assisted with lab work. Mark Carine and Mark Chapman provided supervision in the preparation of this manuscript.


Abstract

Oceanic islands offer unparalleled opportunities to investigate evolutionary processes such as adaptation and speciation. However, few genomic resources are available for oceanic island endemics. In this study, we publish transcriptome sequences from three Macaronesian endemic plant species that are representative of lineages that have radiated in the region. In addition, the utility of transcriptome data for marker development is demonstrated. Transcriptomes from the three plant species were sequenced, assembled and annotated. Between 1,284 and 2,282 simple sequence repeat (SSR) markers were identified for each taxon. Primers were designed and tested for 30 of the candidate SSRs identified in Argyranthemum, of which 12 amplified well across three species, and eight were polymorphic. We demonstrate here that a single transcriptome sequence is sufficient to identify hundreds of polymorphic SSR markers. The SSRs are applicable to a wide range of questions relating to the evolution of island lineages.

Introduction

The availability of Next Generation Sequencing (NGS) technology for non-model organisms in prime ecological scenarios has revolutionised evolutionary biology (Egan et al., 2012). An exciting prospect of NGS is the potential to improve our understanding of the genetic basis of processes such as adaptation and speciation (Stapley et al., 2010; Kelley et al., 2012; Chapman et al., 2013; Sousa & Hey, 2013; Rius et al., 2015; Twyford et al., 2015). Volcanic oceanic islands have long served as model systems for the study of such evolutionary processes (Emerson, 2002); however, the capabilities of NGS for oceanic island endemics are only starting to be realised (Kueffer et al., 2014). For example, NGS approaches have been employed to investigate the radiation of Darwin’s finches from the Galapagos Islands (Lamichhaney et al., 2015) and to untangle the complex phylogenetic relationships of Tolpis Adans., a genus of flowering plants from Macaronesia (Mort et al., 2015). NGS has also been successfully applied to other ‘island-like’ scenarios such as the radiation of cichlid fishes in African and Neotropical lakes (Fan et al., 2012).

While NGS is becoming more affordable, the cost of obtaining genome-level sequences from multiple individuals or population sampling is still high. However, a large genetic resource from just a single or a few individuals (e.g. a transcriptome sequence, or an EST [expressed sequence tag] library) offers the ability to produce highly cost-effective PCR-based molecular markers that can be amplified in many individuals at a fraction of the cost (Ellis & Burke, 2007). To generate further interest in this area and to develop a novel genetic resource, we have sequenced and assembled transcriptome sequences for three plant species, Argyranthemum broussonetii (Pers.) Humphries (Asteraceae), Descurainia bourgaeana (E.Fourn.) Webb ex O.E.Schulz (Brassicaceae) and Echium wildpretii H.Pearson ex Hook.f. (Boraginaceae), that belong to three endemic radiations of Macaronesia (Figure 2.1).

Isolated oceanic archipelagos are botanically diverse and rich in endemic species, making them ideal systems to investigate the origin and evolution of plant diversity (Losos & Ricklefs, 2009; Kueffer et al., 2014). These taxa have been selected since they belong to genera that offer exceptional “natural laboratories” in the Macaronesian archipelagos with which to investigate a range of evolutionary phenomena.

Figure 2.1 - The Macaronesian archipelagos in the North Atlantic Ocean.

Argyranthemum Webb is the largest endemic genus found in Macaronesia with a total of 23 species (Figure 2.2A; Humphries, 1976, 1979; Francisco-Ortega et al., 1996b) including a rare putative example of homoploid hybrid speciation (Brochmann et al., 2000; Fjellheim et al., 2009). There are seven species of Descurainia Webb & Berthel. endemic to the Canary Islands where they exhibit multiple independent adaptations to high altitude habitats (Figure 2.2B; Goodson et al., 2006). A total of 27 Echium L. species are endemic to Macaronesia where they occur in a wide range of habitats and exhibit conspicuous differences in morphology with annual herbs, candelabra shrubs and monocarpic rosettes all represented (Figure 2.2C; Böhle et al., 1996).


Transcriptome data for Macaronesian endemic taxa could be used in a number of ways to assist evolutionary studies. Phylogenetic analyses of Macaronesian endemic lineages to date have often lacked sufficient resolution to interpret patterns of evolution due to a lack of genetic variation (Mort et al., 2015), likely a result of the recent and rapid nature of island radiations (Francisco-Ortega et al., 1997b) and/or reliance upon commonly used universal molecular markers such as the Internal Transcribed Spacer (ITS; Sun et al., 1994) or non-coding chloroplast regions (Shaw et al., 2007). Comparative analysis of transcriptome sequences can be used to identify universal markers which can be widely amplified across taxa, but still exhibit variation within taxa, facilitating phylogenetic reconstruction of poorly resolved groups (Chapman et al., 2007; Chamala et al., 2015). Annotated transcriptomes are also useful for the identification of specific genes of interest.

Figure 2.2 - Macaronesian endemics Argyranthemum sp. (A), Descurainia bourgaeana (B), and Echium wildpretii (C). Photos taken by O. White.

Transcriptome sequences can also be mined for microsatellites or Simple Sequence Repeats (SSRs). SSRs are advantageous over other PCR-based markers since they are codominant, often highly polymorphic and are transferable to closely related species (Ellis & Burke, 2007). SSRs can be used to investigate genetic diversity, assess population structure and gene flow and inform conservation strategies (Ellis & Burke, 2007). The development of SSRs using traditional methods is costly and time consuming, but more recent approaches have involved the development of SSRs from expressed sequence tag (EST) databases (Ellis & Burke, 2007) and transcriptome sequences (Wang et al., 2013; Chapman, 2015).

For this study, we sequenced and annotated transcriptomes for three Macaronesian endemic plant species and focused on the potential application of transcriptome resources for the identification of SSR loci. As proof of concept, we designed and trialled primers for 30 SSR loci in Argyranthemum. All of the resultant data, including transcriptome assemblies, BLAST search results and SSR loci, have been made publicly available as a resource for future genetic studies in these Macaronesian endemic lineages.

Methods and Results

Cypselae (dry single-seeded fruit) of Argyranthemum broussonetii were collected from Barranco de Valle Crispín in the Anaga peninsula of Tenerife during June 2015 (collected under a permit from the Cabildo de Tenerife, number 18297). Seed of Echium wildpretii and Descurainia bourgaeana were sourced from Bonn Botanic Gardens, Germany and the Millennium Seed Bank Partnership, respectively. These were soaked overnight in 0.5 mg mL−1 gibberellic acid, then rinsed and placed on damp filter paper in Petri dishes at 22 °C until germination. Germinated seedlings were transferred to a 2:1 mixture of Levington’s F2+S and vermiculite in a greenhouse with 16 hour days, supplemented with artificial light.

RNA extraction was carried out from true leaves of seedlings using a QIAGEN RNeasy Kit (QIAGEN, Crawley, United Kingdom) following the manufacturer’s protocol, with on-column DNase digestion (RNase-free DNase, QIAGEN). One microgram of RNA was prepped for sequencing using the KAPA Biosystems Stranded RNA-Seq Library Preparation Kit with unique adapters for each sample to allow de-convolution of reads. Library amplification was carried out with 7 cycles of PCR. The three samples were combined and sequenced on a ~3/4 lane of an Illumina MiSeq with 300bp paired-end (PE) reads, at the National Oceanography Centre, University of Southampton, UK.

Between 3.5 and 3.9 million paired-end 100-bp reads were generated for each of the samples (Table 2.1). Reads have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (BioProject ID PRJNA324223) and the FASTA-formatted transcriptome sequences are available from the authors upon request.


The following steps were carried out on the raw transcriptome sequences for each of the Canary Island endemics separately. Poor quality sequence data and adapter sequences were trimmed with Trimmomatic (Bolger et al., 2014). Parameters included Illumina clip with seed mismatches 2, palindrome clip threshold 30, simple clip threshold 10, minimum adapter length 8, keep both reads equals TRUE, leading quality and trailing quality 5, sliding window trimming with a window size 4, required quality 15 and finally minimum read length of 36. The resulting reads were used to create a de novo transcriptome assembly with Trinity (Grabherr et al., 2011). Libraries were normalised to a kmer coverage of 30 so as to reduce computation time, and then assembled using Trinity with the setting --min_kmer_cov 2 to increase the stringency for reads being assembled together, and the settings --max_diffs_same_path 4 and --max_internal_gap_same_path 15, which allowed for more divergent reads (up to four nucleotide differences and up to a 15-bp gap) to be assembled into the same transcript. This takes into account the likely heterozygosity of the species.
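As an illustration only, the trimming settings listed above can be written out as Trimmomatic step arguments. The sketch below is not the command used in this study: the adapter file name adapters.fa is a placeholder, since the adapter file is not stated here, and the function name is hypothetical.

```python
# Sketch only: assembles the Trimmomatic trimming steps described in the text.
# "adapters.fa" is a placeholder; the adapter file used in the study is not stated.

def trimmomatic_steps(adapter_fasta="adapters.fa"):
    """Return the trimming-step arguments in the order given in the text."""
    illumina_clip = ":".join([
        "ILLUMINACLIP",
        adapter_fasta,
        "2",     # seed mismatches
        "30",    # palindrome clip threshold
        "10",    # simple clip threshold
        "8",     # minimum adapter length
        "TRUE",  # keep both reads
    ])
    return [
        illumina_clip,
        "LEADING:5",           # leading quality
        "TRAILING:5",          # trailing quality
        "SLIDINGWINDOW:4:15",  # window size 4, required quality 15
        "MINLEN:36",           # minimum read length
    ]


steps = trimmomatic_steps()
print(" ".join(steps))
```

The printed string would follow the input and output file names in a paired-end Trimmomatic call.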

Although the number of raw and normalised reads from each of these species was relatively similar, de novo assembly of these sequences resulted in transcriptomes with different assembly characteristics. Argyranthemum broussonetii had the largest number of transcripts (80,620 genes and 94,522 transcripts) and D. bourgaeana the fewest (44,287 genes and 54,221 transcripts). Echium wildpretii was intermediate, comprising 58,526 genes and 69,509 transcripts (Table 2.1, Figure 2.3). Despite the A. broussonetii transcriptome comprising more transcripts, transcript length was generally shorter, whereas the transcriptome of D. bourgaeana had longer transcripts, with E. wildpretii intermediate (Table 2.1, Figure 2.3).
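For readers unfamiliar with the N50 statistic reported in Table 2.1, the sketch below shows the standard definition: the length of the transcript at which the running total of lengths, taken from longest to shortest, first reaches half of all assembled bases. It is a minimal illustration on invented lengths, not the script used to produce the values in the table.

```python
def n50(lengths):
    """Length L such that transcripts of length >= L contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no lengths supplied")


# Invented transcript lengths, for illustration only.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

Because N50 weights long transcripts more heavily than the median does, an assembly with many short transcripts, as described for A. broussonetii, can show a markedly lower N50.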

BLAST (Altschul et al., 1990) was utilised to compare each of the assembled transcriptomes with annotated coding sequences of Arabidopsis thaliana and Solanum lycopersicum using a local BLASTx search and peptide sequences downloaded from The Arabidopsis Information Resource (https://www.arabidopsis.org/) and The Tomato Genome Sequencing Project (ftp://ftp.solgenomics.net/tomato_genome). BLAST parameters included an Expectation Value (e) of 1.0 × 10⁻²⁰ or less and an alignment length greater than or equal to 50. One hit per transcript is reported (Accompanying Material A.1).
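The hit filtering described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: it assumes the standard 12-column tabular BLAST output, and it assumes that the single reported hit per transcript is the one with the highest bit score, which the text does not specify. All identifiers in the example are invented.

```python
import csv
import io

MAX_EVALUE = 1e-20   # expectation value threshold from the text
MIN_ALIGN_LEN = 50   # minimum alignment length from the text


def best_hits(tabular_text):
    """Keep one hit per query: the highest bit score among hits passing both filters."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        align_len = int(row[3])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue > MAX_EVALUE or align_len < MIN_ALIGN_LEN:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def _row(query, subject, align_len, evalue, bitscore):
    # Build one 12-column line; columns not used by the filter hold dummy values.
    fields = [query, subject, "80.0", str(align_len), "0", "0",
              "1", "1", "1", "1", evalue, str(bitscore)]
    return "\t".join(fields)


example = "\n".join([
    _row("transcript_1", "geneA", 120, "1e-40", 150),
    _row("transcript_1", "geneB", 60, "1e-25", 100),
    _row("transcript_2", "geneC", 40, "1e-30", 90),   # alignment too short
    _row("transcript_3", "geneD", 200, "1e-5", 50),   # E-value too high
])
hits = best_hits(example)
print(hits)
```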

A substantial proportion of the contigs in our transcriptomes matched with an annotated coding sequence from either A. thaliana or S. lycopersicum (Table 2.1). This is especially true of the comparison between our Descurainia transcriptome and A. thaliana. Of the 54,221 transcripts in the assembled Descurainia transcriptome, 43,329 (80%) matched an annotated sequence from A. thaliana. In addition, 16,645 A. thaliana genes (50% of the estimated number of genes) were recovered in the Descurainia transcriptome. The proportion of transcripts that matched an annotated coding sequence in the other BLAST search combinations was more moderate, with 39-63% of transcripts matching an annotated sequence and 36-38% of genes being recovered from either A. thaliana or S. lycopersicum (Table 2.1).

SSRs were identified in each of the assembled transcriptomes (Appendix Table A.1) using MISA (Micro Satellite identification tool, http://pgrc.ipk-gatersleben.de/misa/misa.html). Minimum repeat number for di-, tri- and tetranucleotide repeat markers was 8, 6, and 4 respectively.
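To make the repeat thresholds concrete, the sketch below finds perfect di-, tri- and tetranucleotide repeats with the minimum repeat numbers given above (8, 6 and 4). It is a simplified regular-expression stand-in written for illustration, not MISA itself, and it does not reproduce MISA's handling of compound repeats. The example sequence is invented.

```python
import re

# Minimum number of repeat units per motif length, as stated in the text.
MIN_REPEATS = {2: 8, 3: 6, 4: 4}


def find_ssrs(seq):
    """Return (start, end, motif, n_repeats) for perfect SSRs, using 0-based half-open coordinates."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if len(set(motif)) == 1:
                continue  # homopolymer run such as AAAA, not an SSR of this class
            if unit_len == 4 and motif[:2] == motif[2:]:
                continue  # ATAT-type motif is a dinucleotide repeat, handled above
            n_repeats = (match.end() - match.start()) // unit_len
            hits.append((match.start(), match.end(), motif, n_repeats))
    return sorted(hits)


# Invented sequence: (AG)8 and (CTT)6 pass; (ACGT)3 is below the threshold of 4.
toy = "CCGGTC" + "AG" * 8 + "TCGACA" + "CTT" * 6 + "GACGGA" + "ACGT" * 3 + "GG"
found = find_ssrs(toy)
print(found)
```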

A large number of microsatellites were identified in each of the transcriptomes (Accompanying Material A.1): 2,282, 1,972 and 1,284 in A. broussonetii, D. bourgaeana and E. wildpretii respectively. Several of these were within the first or last 50 bp of the transcript, were in compound formation or there was more than one SSR per transcript. After removal of these, the number of SSRs was 1,288, 1,219 and 737 for A. broussonetii, D. bourgaeana and E. wildpretii respectively. The number of trinucleotide repeat markers was notably higher than di- or tetranucleotide markers, typical of SSRs identified in coding regions from EST libraries/transcriptomes.
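The three exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the record layout (dictionaries with 1-based start and end coordinates, a transcript length and a compound flag) is an assumption made for the example, as is treating any overlap with the first or last 50 bp as grounds for exclusion.

```python
from collections import Counter

EDGE_BP = 50  # SSRs overlapping the first or last 50 bp are excluded


def suitable_for_primers(ssrs):
    """Apply the three exclusion criteria described in the text to a list of SSR records."""
    per_transcript = Counter(s["transcript"] for s in ssrs)
    kept = []
    for s in ssrs:
        near_start = s["start"] <= EDGE_BP
        near_end = s["end"] > s["transcript_len"] - EDGE_BP
        if near_start or near_end:
            continue  # too close to a transcript end for primer design
        if s["compound"]:
            continue  # SSR in compound formation
        if per_transcript[s["transcript"]] > 1:
            continue  # more than one SSR on this transcript
        kept.append(s)
    return kept


# Invented records, for illustration only.
records = [
    {"transcript": "t1", "transcript_len": 600, "start": 200, "end": 217, "compound": False},
    {"transcript": "t2", "transcript_len": 600, "start": 30, "end": 45, "compound": False},
    {"transcript": "t3", "transcript_len": 600, "start": 560, "end": 577, "compound": False},
    {"transcript": "t4", "transcript_len": 600, "start": 200, "end": 240, "compound": True},
    {"transcript": "t5", "transcript_len": 900, "start": 100, "end": 117, "compound": False},
    {"transcript": "t5", "transcript_len": 900, "start": 400, "end": 415, "compound": False},
]
kept = suitable_for_primers(records)
print([s["transcript"] for s in kept])
```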

[Figure 2.3 - Panel A: No. of genes and No. of transcripts for each species. Panel B: N50 of all transcripts, average length and median length for each species. Panel C: transcript length distributions for Argyranthemum broussonetii, Descurainia bourgaeana and Echium wildpretii.]

Figure 2.3 - Summary statistics for the three de novo transcriptome assemblies. (A) Number of genes and transcripts assembled for each species. (B) N50, mean, and median transcript length. (C) Transcript lengths for the three transcriptomes (note the change in bin size along the x axis).


Table 2.1 - Summary statistics for the de novo assembled transcriptomes, BLASTx searches, and simple sequence repeat identification.

                                           Argyranthemum   Descurainia   Echium
                                           broussonetii    bourgaeana    wildpretii

Raw reads
  No. of raw reads                         3,950,166       3,958,226     3,550,703
  No. of trimmed reads                     3,842,373       3,774,119     3,375,554
  No. of normalised reads                  1,471,858       1,234,637     1,228,003
  Normalised reads %                       38%             33%           36%

Assembly statistics
  No. of genes                             80,620          44,287        58,526
  No. of transcripts                       94,522          54,221        69,509
  N50 all transcripts                      921             1,233         1,041
  Median length                            411             549           446
  Average length                           654             833           707
  Total no. of assembled bases             61,826,463      45,213,744    49,160,261

BLASTx analyses¹
  A. thaliana BLAST hits                   37,265          43,329        37,060
  A. thaliana BLAST hits (%)               39%             80%           53%
  A. thaliana genes recovered              12,235          16,645        12,072
  A. thaliana genes recovered (%)²         36%             50%           36%
  S. lycopersicum BLAST hits               39,521          34,042        39,758
  S. lycopersicum BLAST hits (%)           42%             63%           57%
  S. lycopersicum genes recovered          13,139          12,534        13,101
  S. lycopersicum genes recovered (%)²     38%             36%           38%

MISA statistics
  Total no. of identified SSRs             2,282           1,972         1,284
  Total no. of SSR containing transcripts  2,232           1,919         1,251
  No. of transcripts with more than 1 SSR  50              51            31
  No. of SSRs in compound formation        68              34            31
  No. of dinucleotide SSRs                 478             604           230
  No. of trinucleotide SSRs                1,097           1,098         799
  No. of tetranucleotide SSRs              589             183           191
  No. of SSRs suitable for primer design³  1,288           1,219         737

¹ BLAST parameters included an expectation value (E-value) of 1.0 × 10⁻²⁰ or less.
² Proportion of the A. thaliana (33,602) or S. lycopersicum (34,727) annotated coding sequences recovered from the target species.
³ SSR loci were excluded if they were located in the first or last 50 bp of the contig, if SSRs were in compound formation, or if there were multiple SSR loci per contig.

Primers were designed for 30 SSRs identified in A. broussonetii using Primer3 (Untergasser et al., 2012). Contigs were avoided if (1) the SSR was in the first or last 50 bp (ensuring there was sufficient space for primer design), (2) the SSR was in a compound formation and (3) there was more than one SSR present in the locus. Longer SSRs were preferentially chosen since a number of studies have suggested that longer SSRs are more likely to be polymorphic (Burstin et al., 2001; Mun et al., 2006; Wang et al., 2012). The 30 loci (Appendix Table A.1) were screened across 10 DNA samples from four members of the genus Argyranthemum, including A. frutescens (L.) Sch.Bip subsp. frutescens, A. frutescens subsp. succulentum Humphries, A. broussonetii subsp. broussonetii and A. tenerifae Humphries. These taxa were selected to test how broadly our SSR loci can be utilised effectively, from closely related taxa such as the A. frutescens subspecies to more distantly related species such as A. broussonetii and A. tenerifae.

DNA was extracted from silica-dried leaf material of Argyranthemum using a modified CTAB-based method (Doyle & Doyle, 1987). For the polymerase chain reaction (PCR), the forward primers were designed with the M13(-29) sequence (CACGACGTTGTAAAACGAC) appended to the 5’ end such that a third fluorescently-labelled M13(-29) primer (either TET or FAM) could be incorporated in the PCR (Schuelke, 2000). Each reaction contained 10 mM Tris–HCl (pH 8.8), 50 mM KCl, 0.01 % Tween 20, 1.5 mM MgCl2, 0.2 mM dNTPs, 0.04 µM forward primer, 0.2 µM reverse primer, 0.2 µM fluorescent primer, 1 unit of Taq DNA polymerase, 15 ng DNA, and was made up to 15 µL with water. PCR conditions consisted of an initial denaturation (94°C for 3 min), 10 ‘touchdown’ cycles of 94°C for 30 s, 65°C for 30 s (decreasing by 1°C per cycle), 72°C for 60 s, 30 cycles of 94°C for 30 s, 55°C for 30 s, 72°C for 60 s, and a final elongation (72°C for 7 min). Amplification success was assessed by running PCR products on 1% agarose gels stained with GelRed (Biotium, Hayward, California, USA).
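The primer tailing and the touchdown annealing schedule described above can be laid out explicitly. The sketch below is a worked illustration: the locus-specific forward primer is invented, and the function names are hypothetical. Only the M13(-29) sequence and the cycling temperatures are taken from the text.

```python
M13_TAIL = "CACGACGTTGTAAAACGAC"  # M13(-29) sequence given in the text


def tail_forward_primer(primer):
    """Append the M13(-29) tail to the 5' end of a locus-specific forward primer."""
    return M13_TAIL + primer.upper()


def annealing_temperatures(start=65, step=1, touchdown_cycles=10,
                           final=55, final_cycles=30):
    """Annealing temperature (degrees C) for every cycle of the touchdown programme."""
    touchdown = [start - step * i for i in range(touchdown_cycles)]
    return touchdown + [final] * final_cycles


# Invented 20-mer used purely as an example of a locus-specific primer.
tailed = tail_forward_primer("acgtacgtacgtacgtacgt")
temps = annealing_temperatures()
print(tailed)
print(temps[:10], "then", temps[10], "x", len(temps) - 10)
```

The touchdown phase therefore runs from 65°C down to 56°C, one cycle per temperature, before the 30 cycles at 55°C.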

Primers that allowed for the amplification of one distinct PCR product between 100 and 400 bp were selected for use in further experiments. PCR products were diluted 1:30 and combined in such a way that multiple loci (differing in size and/or fluorescent label) could be resolved in a single lane on an ABI3730xl (Applied Biosystems, Carlsbad, USA) at the Department of Zoology, University of Oxford, UK. Alleles were scored from the raw traces using GeneMarker 2.6.7 (Soft Genetics, State College, Pennsylvania, USA) and we performed a principal coordinates analysis (PCoA) of the 10 Argyranthemum samples based on polymorphic loci using GenAlEx (Figure 2.4) (Peakall & Smouse, 2012).

All 30 SSR primer pairs tested in Argyranthemum produced a PCR product; however, only 12 primer pairs produced a clear PCR product between 100 and 400 bp. Of these, eight were polymorphic in the ten DNA samples, one was monomorphic and three produced multiple non-specific PCR products. Although our sample number is small, principal coordinates analysis (PCoA) of the SSR loci shows that these markers are able to differentiate species, as there is a clear grouping of A. frutescens and A. broussonetii (Figure 2.4). The SSRs identified may hold value for identifying genetic variation between subspecies, as there is also some separation between A. frutescens subsp. frutescens and subsp. succulentum. The position of A. tenerifae is not clear, as it appears to overlap with A. frutescens subsp. frutescens. This seems to suggest that A. tenerifae is more closely related to A. frutescens subsp. frutescens than to the other species included in the analysis, but it may also be a result of the low sample size.

[Figure 2.4 plot: x axis, Coord. 1 (27.63%); y axis, Coord. 2 (15.41%); legend: A. broussonetii, A. frutescens subsp. frutescens, A. frutescens subsp. succulentum, A. tenerifae.]

Figure 2.4 - Principal coordinates analysis (PCoA) of 10 samples of Argyranthemum based on genotypic information from eight SSR loci.

Conclusions

The transcriptome sequences generated here from Macaronesian endemic species of flowering plants are, as far as we are aware, the first resources of their kind for these archipelagos. The de novo assembled transcriptomes have recovered a considerable portion of the expressed genes, as indicated by our BLAST comparisons with A. thaliana and S. lycopersicum (Table 2.1), and yielded between 1,284 and 2,282 SSRs per species. The results of the microsatellite screen in Argyranthemum revealed that eight out of 30 markers (27%) were polymorphic and easy to score. Assuming this is representative of the entire transcriptome and across island lineages, we might expect to be able to amplify and score ~350, ~330 and ~200 polymorphic SSRs in Argyranthemum, Descurainia and Echium respectively.
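The approximate figures of ~350, ~330 and ~200 appear to follow from applying the 27% polymorphism rate to the number of SSRs suitable for primer design in each species (Table 2.1). The short calculation below reproduces that arithmetic; using the rounded rate of 27% rather than the exact fraction 8/30 is an assumption made here to match the rounding in the text.

```python
# Number of SSRs suitable for primer design, from Table 2.1.
SUITABLE_SSRS = {
    "Argyranthemum": 1288,
    "Descurainia": 1219,
    "Echium": 737,
}

POLYMORPHIC_RATE = 0.27  # eight of 30 tested loci, rounded as in the text

expected = {genus: round(count * POLYMORPHIC_RATE)
            for genus, count in SUITABLE_SSRS.items()}
print(expected)
```

These values are extrapolations from a single screen of 30 loci in one genus, so they are best read as order-of-magnitude expectations.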


Previous studies of rapidly radiating lineages, such as those represented by the three species we focussed on in this study, have been hampered by a lack of genetic variation between taxa using sequence-based ‘universal’ markers (Böhle et al., 1996; Francisco-Ortega et al., 1997b; Goodson et al., 2006). The markers we have identified will allow much more fine-scale resolution of evolutionary processes in these lineages. It is encouraging to find that eight SSR loci were able to differentiate A. broussonetii and both subspecies of A. frutescens (Figure 2.4). The lack of genetic differentiation between A. frutescens subsp. frutescens and A. tenerifae may be due to our low sample size, but this is an area which warrants further study. Indeed, the sister relationships of this species are not clear on the basis of the chloroplast restriction site phylogeny of Francisco-Ortega et al. (1996b).

Other recent studies have also used transcriptome sequences to identify SSR loci, with similar success rates. Wang et al. (2013) used a transcriptome from Chrysanthemum nankingense (Nakai) Tzvel. to identify 2,813 putative SSR loci, with about 20% of the 100 tested showing polymorphism. Chapman (2015) also used transcriptome sequences to identify between 1,139 and 2,567 SSR loci for each of four underutilised legumes. In a follow-up study, 36 primer pairs were designed for one of these species, of which six (17%) were found to be polymorphic in a small number of accessions (Robotham & Chapman, 2017).

For our study, we only attempted to amplify and genotype a small number of SSR loci, and we did not try to optimise the PCR for markers that amplified non-specific products. In addition, the settings and programs used in SSR discovery are likely to resolve more or fewer markers depending on how strictly the parameters are set. Nevertheless, the identification of SSR loci using transcriptome sequences is more cost-effective, less time-consuming and methodologically more straightforward than earlier methods (Ellis & Burke, 2007).

Previous studies have used NGS technology to identify SSRs from oceanic island endemic plants. Takayama et al. (2013) developed ten SSR markers for Robinsonia DC. (Asteraceae), endemic to the Juan Fernández Archipelago. In a later study, Takayama et al. (2015) demonstrated how SSRs can be used to assess population structure using STRUCTURE and infer evolutionary relationships by constructing neighbour joining networks. The SSRs identified in the present study can be harnessed in a similar way for a range of evolutionary studies, from homoploid hybrid speciation in Argyranthemum to adaptation to high altitude in Descurainia and the evolution of monocarpy in Echium.

Although we have focused on SSR identification, there are further potential applications for the transcriptomes generated in future studies of Argyranthemum, Descurainia or Echium, such as the development of universal markers which can be widely amplified across taxa (Wu et al., 2006; Chapman et al., 2007; Chapman, 2015) and the use of transcriptomes in studies of adaptation and speciation to link divergent genes to genes of known function.

Transcriptome sequences are a valuable source of genomic information that can be readily acquired from non-model organisms in prime ecological scenarios. The availability of free de novo transcriptome assembly software means that no reference genome is needed to map sequences against and that downstream processing incurs no additional cost. As such, this NGS approach can be applied to any non-model organism. Simultaneously providing gene expression and sequence polymorphism for thousands of genes, transcriptomics is one branch of NGS technology that holds exciting potential to inform evolutionary biologists about the genetic changes underlying processes such as adaptation and speciation (Sousa & Hey, 2013; Rius et al., 2015).


Chapter 3 Independent Homoploid Hybrid Speciation events in the Macaronesian endemic genus Argyranthemum

This chapter is published as:

Oliver W. White, Alfredo Reyes-Betancort3, Mark A. Chapman2 and Mark A. Carine1 (2018) Independent Homoploid Hybrid Speciation events in the Macaronesian endemic genus Argyranthemum. Molecular Ecology, 2018; 00: 1–19. https://doi.org/10.1111/mec.14889

1 Algae, Fungi and Plants Division, Department of Life Sciences, The Natural History Museum, Cromwell Road, London, SW7 5BD, United Kingdom.

2 Centre for Biological Sciences, University of Southampton, Southampton, SO17 1BJ, United Kingdom.

3 Jardín de Aclimatación de La Orotava (ICIA), C/ Retama 2, 38400 Puerto de la Cruz, Tenerife, Spain.

Author contributions

Oliver White implemented fieldwork, lab work, data analysis and prepared the manuscript. Alfredo Reyes-Betancort provided knowledge on population localities and assisted with fieldwork collections. Mark Carine and Mark Chapman provided supervision in the preparation of this manuscript.


Abstract

Well-characterised examples of homoploid hybrid speciation (HHS) are rare in nature yet they offer the potential to study a number of evolutionary processes. In this study we investigate putative homoploid hybrid species in the genus Argyranthemum (Asteraceae), a group of plants endemic to the Macaronesian archipelagos of the North Atlantic Ocean. We specifically address a number of knowledge gaps surrounding the origin(s) of A. sundingii and A. lemsii, which are thought to be derived from the same parental cross. Comparisons of leaf morphology suggest that A. sundingii and A. lemsii are distinct from their parental progenitors and distinguishable from each other based on leaf area. Ecological niche modelling (ENM) demonstrated that the homoploid hybrid species occupy novel habitats that are intermediate relative to the parental species. Nuclear SSRs and SNP data indicate that the homoploid hybrid species are distinct from the parental taxa, whilst population level sampling of chloroplast SSRs and Approximate Bayesian Computation show that A. sundingii and A. lemsii are independently derived from the same parental cross. As such, Argyranthemum represents an example of independent homoploid hybrid speciation events with evidence of divergence in leaf morphology and adaptation to novel intermediate habitats. On oceanic islands, which are often typified by steep ecological gradients and inhabited by recently derived species with weak reproductive barriers, multiple HHS events from the same parental cross are not only possible but are likely to have played a more important role in oceanic island radiations than we currently think.

Introduction

Homoploid hybrid speciation (HHS), the origin of a new species by hybridisation without a change in chromosome number, is generally considered to be exceptionally rare in angiosperm evolution (Hegarty & Hiscock, 2005; Soltis & Soltis, 2009; Schumer et al., 2014; Goulet et al., 2017) with a recent review identifying 28 putative examples of HHS in flowering plants (Kadereit, 2015). This is in sharp contrast to polyploid hybrid speciation, involving a doubling of chromosome number, which is thought to account for approximately 15 % of angiosperm speciation events (Wood et al., 2009). The relative rarity of HHS has largely been attributed to two main factors. Firstly, early generation hybrids frequently have reduced fitness caused by a loss of fertility and/or viability, characteristics generally referred to as hybrid incompatibilities (Rieseberg & Willis, 2007). Secondly, since early generation homoploid hybrids share the same chromosome number as their parental progenitors they are particularly susceptible to introgression and assimilation (Rieseberg & Willis, 2007; Schumer et al., 2014). As such, the lack of habitats not occupied by one or both of the parental species may also limit the existence of homoploid hybrid species.

The accurate identification of a homoploid hybrid species is experimentally challenging, often requiring a combination of morphological and molecular analyses (Mallet, 2007). In addition, there is a lack of agreement on the defining criteria of a homoploid hybrid species (Schumer et al., 2014, 2018; Nieto Feliner et al., 2017). Nevertheless, given that occasional hybridisation between closely related species is frequent in plants (Mallet, 2005) and that homoploid hybrids are difficult to identify, opinion has recently shifted to speculate that HHS might play a more prominent role in plant evolution than previously thought (Mallet, 2005; Mavárez & Linares, 2008; Nolte & Tautz, 2010).

Examples of HHS provide natural experiments for the study of fundamental evolutionary processes such as reproductive isolation, hybridisation, adaptation, ecological speciation and genome evolution. An excellent example of this can be found in North American sunflowers Helianthus L., where hybridisation between the same parental combination of H. annuus L. and H. petiolaris Nutt. has resulted in the formation of three distinct homoploid hybrid species (H. anomalus S.F.Blake, H. paradoxus Heiser, and H. deserticola Heiser; Rieseberg et al. 2003). The fact that three species in Helianthus are all derived from the same parental cross has allowed for comparative investigations of genomic composition (Rieseberg et al., 2003), genome size (Baack et al., 2005) and ecological selection (Donovan et al., 2010).

Argyranthemum Webb, a genus of 24 species endemic to the Macaronesian archipelagos of Madeira, the Selvagens and the Canary Islands, provides another of the few documented examples of HHS in plants. Argyranthemum is thought to have diverged relatively recently, with estimates ranging from 2.5-3.0 mya based on isozyme differentiation (Francisco-Ortega et al., 1995a) to 0.26-2.1 mya based on ITS sequences (Francisco-Ortega et al., 1997b). All members of the genus are diploid (2n = 2x = 18; Gonzalez et al., 1997), outcrossing because of limited self-compatibility (Francisco-Ortega et al., 1997a) and wind dispersed, although not specialised for this purpose. There is little intrinsic reproductive isolation between taxa, with isolation mainly due to occupancy of different islands or ecological habitats (Francisco-Ortega et al., 1997a). The breakdown of these external barriers often results in the formation of hybrid swarms (Borgen, 1976; Brochmann, 1984) and crosses can easily be made under common garden conditions (Humphries, 1973; Brochmann et al., 2000). The presence of polyphyletic taxa in a chloroplast restriction site phylogeny has been interpreted as evidence that hybridisation was prevalent in the diversification of Argyranthemum (Francisco-Ortega et al., 1996b).

39 Independent Homoploid Hybrid Speciation events in the Macaronesian endemic genus Argyranthemum

Within Argyranthemum, A. sundingii Borgen and A. lemsii Humphries are thought to have originated by homoploid hybridisation (Brochmann et al. 2000; Fjellheim et al. 2009; Appendix Figure B.1). Both species are found in the Anaga peninsula of Tenerife and were discovered and described independently in southern and north-eastern valleys respectively (Humphries, 1976; Borgen, 1980; Figure 3.1). Morphological comparisons and artificial crossing experiments have demonstrated that these two species are both likely the result of hybridisation between A. frutescens (L.) Sch.Bip. and A. broussonetii (Pers.) Humphries (Brochmann et al. 2000; Appendix Figure B.1). Different A. frutescens subspecies have been implicated in the parentage of the putative homoploid hybrid species based on their geographical distributions: subsp. frutescens as a parent of A. sundingii and subsp. succulentum Humphries as a parent of A. lemsii. While A. sundingii and A. lemsii provide a potential case study of HHS, a number of significant knowledge gaps concerning the delimitation of species and their origin(s) remain to be addressed.

Figure 3.1 - Populations sampled across (A) Tenerife in the (B) Anaga Peninsula. Taxa abbreviations used throughout are shown in brackets. Contour lines represent a 200m change in altitude.


The first concerns the morphological distinction of the homoploid hybrid species. The parental species differ conspicuously in their morphology. Argyranthemum broussonetii forms a large shrub up to 1.2 m in height with large bipinnatifid leaves, large capitula and dark brown ray cypselae (dry single-seeded fruits). In contrast, A. frutescens is a smaller branching shrub 20 – 80 cm in height, with small pinnatisect to bipinnatisect leaves, small capitula and cypselae that are pale brown in colour. Argyranthemum frutescens subsp. succulentum is a cushion-like shrub with pale green succulent leaves found along the north coast of the Anaga peninsula under the influence of the cooling trade winds, whereas subsp. frutescens is more upright and branched, with less succulent and darker green leaves, and is distributed on the south coast of the Anaga peninsula. Argyranthemum sundingii and A. lemsii are morphologically intermediate between A. broussonetii and A. frutescens, supporting their hybrid origin (Brochmann et al., 2000). They were also found to be morphologically distinct from their parental progenitors and to exhibit high fertility, characteristics which are inconsistent with a typical hybrid swarm (Brochmann et al., 2000). Despite their broadly similar appearance, A. sundingii and A. lemsii are recognised as distinct in recent floras (Bramwell & Bramwell, 2001; Schönfelder & Schönfelder, 2012) and are both of conservation concern (Moreno, 2008). However, an analysis of leaf morphology by Brochmann et al. (2000) suggested that A. sundingii and A. lemsii should be treated as a single species. This study was based on material collected only from the type localities of A. sundingii and A. lemsii; populations found more recently were not examined and the distinctiveness of taxa considering their full geographical extent remains to be robustly tested.

Although ecological selection has been implicated as an important factor in the origin of the homoploid hybrid species in Argyranthemum (Brochmann et al., 2000; Fjellheim et al., 2009), the evidence is largely anecdotal and based solely on species distributions. The parental species occupy the altitudinal extremes of the Anaga peninsula with A. broussonetii restricted to the higher altitudes between 500 – 1000 m in laurel forest clearings whereas A. frutescens typically occurs in coastal xerophytic habitats below 100 m. Both A. sundingii and A. lemsii are found at intermediate altitudes of the strong humidity gradient that exists in the peninsula in habitats created by deforestation of lower parts of the laurel forest (Brochmann et al., 2000; Fjellheim et al., 2009). Models of HHS suggest that a hybrid species can arise from chromosomal rearrangements, ecological and/or geographical isolation (Rieseberg, 1997; Buerkle et al., 2000; James & Abbott, 2005). Although these models are not mutually exclusive, ecological isolation is increasingly seen as fundamental because simulations indicate HHS is unlikely without ecological divergence (Buerkle et al., 2000). Of the 28 instances of HHS reported by Kadereit (2015), 21 were associated with habitat or geographical displacement. However, as of yet there has been no explicit ecological analysis of the putative homoploid hybrid species in Argyranthemum.

The evidence to support the hypothesis that A. sundingii and A. lemsii originated independently through hybridisation events between A. broussonetii and A. frutescens is based on an earlier chloroplast (cp) restriction site phylogeny (Francisco-Ortega et al., 1996b) which showed that the single individual sampled for each of A. sundingii and A. lemsii had distinct cp types: the A. sundingii sample shared a haplotype with A. broussonetii whereas the A. lemsii sample shared the A. frutescens haplotype, implying that A. broussonetii and A. frutescens are the maternal parents of A. sundingii and A. lemsii respectively. However, given that sampling was limited to one accession for each putative homoploid hybrid species, these results need to be interpreted with extreme caution in a genus with weak intrinsic reproductive barriers. Differences in Amplified Fragment Length Polymorphisms (AFLPs) have been identified between A. sundingii and A. lemsii (Fjellheim et al., 2009), but these data could not distinguish a single HHS event followed by divergence from independent HHS events. Furthermore, the hypothesis that two subspecies of A. frutescens are involved in the parentage of the homoploid hybrid species was based on the distributions of the A. frutescens subspecies relative to A. sundingii and A. lemsii (Figure 3.1). More rigorous sampling is necessary to test the independent origins hypothesis and establish the contribution of the two A. frutescens subspecies to the homoploid hybrid species in the Anaga peninsula.

Fjellheim et al. (2009) identified a population in the Anaga peninsula of unknown identity that they referred to as ‘A. cf. lemsii’. This population was located in Barranco de Igueste almost equidistant between the then known populations of A. sundingii and A. lemsii. Analysis of AFLPs found that this population was genetically intermediate between A. broussonetii and A. frutescens suggesting a hybrid origin. However, based on their AFLP analysis this population also appeared to be influenced by introgression suggesting it could be a hybrid swarm (Fjellheim et al., 2009). Further populations in this area have subsequently been discovered (see Methods; Sampling) and whether plants in this area represent a hybrid swarm, a contact zone between A. sundingii and A. lemsii or an independently derived homoploid hybrid species is unclear.

This study addresses a number of outstanding questions concerning the origin of the homoploid hybrid species in Argyranthemum. We first test if A. sundingii and A. lemsii are morphologically distinct using leaf characters from plants grown under controlled glasshouse conditions. We use material collected from previously recorded as well as recently discovered populations that are yet to be assessed. The hypothesis that A. sundingii and A. lemsii are adapted to novel ecological conditions along a strong humidity gradient that exists in the Anaga peninsula (Fjellheim et al., 2009) is also investigated using ecological niche modelling (ENM). Population level sampling of nuclear Simple Sequence Repeat markers (nSSRs) is employed to assess the genetic distinctiveness of the two homoploid hybrid species and chloroplast SSR markers (cpSSRs) are used to test the hypothesis of Brochmann et al. (2000) that A. sundingii and A. lemsii exhibit distinct maternal parentages. The origin(s) of the populations of uncertain status near Barranco de Igueste are clarified using a combination of evidence from morphology, ecology and population genetics. Finally, this study leverages a dataset of nuclear single nucleotide polymorphisms (nSNPs) obtained through Genotyping-By-Sequencing (GBS) and Approximate Bayesian Computation (ABC) to explicitly test whether the homoploid hybrid species originated by independent hybridisation events.

Materials and Methods

3.3.1 Sampling

From July to August 2015 multiple populations of A. broussonetii, A. frutescens, A. lemsii and A. sundingii in the Anaga peninsula of Tenerife were sampled for leaf, seed and voucher material (Table 3.1; Figure 3.1). Plant material was collected under permit number 18297 from the Cabildo de Tenerife and permit number 2015/939 from the Gobierno de Canarias. We sampled populations investigated in earlier studies (Brochmann et al., 2000; Fjellheim et al., 2009) as well as populations recently discovered that had not been sampled previously. Population K that was sampled by Fjellheim et al. (2009) from Barranco de Igueste and population L from Lomo de las Casillas appeared to be morphologically intermediate between A. broussonetii and A. frutescens and were not clearly referable to either A. sundingii or A. lemsii. These populations are therefore referred to hereafter as A. broussonetii × A. frutescens.

Wild collected cypselae from 70 plants representing 20 populations were used to grow 123 replicates (up to two replicates per parent) under common glasshouse conditions at the University of Southampton for leaf morphological analysis (Table 3.1; Appendix Table B.1). Georeferenced samples from a total of 594 collections were used as distribution data for each taxon in our ENM analysis. Representative voucher specimens are deposited at the Natural History Museum, London (BM; Appendix Table B.2). Silica dried leaf material from a total of 198 field-collected individuals across 20 populations was sampled for nuclear and chloroplast simple sequence repeat (SSR) markers (Table 3.1). Nuclear SNP data were obtained for a total of 28 individuals using Genotyping-By-Sequencing (GBS), eight of each parental species and six of each homoploid hybrid species (Table 3.1). Populations referred to as A. broussonetii × A. frutescens (i.e. populations K and L) were not sampled for GBS as they were admixed and not clearly identifiable as either A. sundingii or A. lemsii (see Results).


Table 3.1 - Number of individuals sampled for population genetics of Simple Sequence Repeat markers (SSR), leaf morphological analysis (Morph.) and Genotyping-By-Sequencing (GBS). Taxa abbreviations used throughout are shown in brackets.

| Pop. | Taxon | Locality | Latitude | Longitude | Morph. | SSR | GBS |
|---|---|---|---|---|---|---|---|
| A | A. broussonetii subsp. broussonetii (bro) | Barranco de Valle Crispin | 28.5306 | -16.2423 | 4 | 11 | 2 |
| B | | Las Casas de la Cumbre | 28.5354 | -16.2339 | 4 | 9 | 1 |
| C | | Path to Mesa del Sabinal | 28.5585 | -16.1577 | 7 | 9 | 1 |
| D | | La Cumbrilla | 28.5663 | -16.1536 | 4 | 9 | 1 |
| E | | Chamorga | 28.5722 | -16.1544 | 7 | 9 | 2 |
| F | | Roques del Fraile | 28.5535 | -16.2308 | 2 | 3 | 1 |
| G | A. sundingii (sun) | Valle Crispin | 28.5148 | -16.2372 | 10 | 17 | 2 |
| H | | Valle Brosque | 28.5196 | -16.2268 | 7 | 7 | 2 |
| I | | Roque Cubo | 28.5239 | -16.2177 | 8 | 11 | 1 |
| J | | Barranco del Cercado de Andrés | 28.5298 | -16.2108 | 8 | 6 | 1 |
| K | A. broussonetii × A. frutescens (bro × fru) | Barranco de Igueste | 28.5427 | -16.1606 | 11 | 9 | na |
| L | | Lomo de las Casillas | 28.5529 | -16.1579 | 7 | 5 | na |
| M | A. lemsii (lem) | Path to Mesa del Sabinal | 28.5582 | -16.1532 | 8 | 17 | 2 |
| N | | La Cumbrilla | 28.5667 | -16.1527 | 6 | 10 | 2 |
| O | | Barranco de Roque Bermejo | 28.5741 | -16.1433 | 8 | 17 | 2 |
| P | A. frutescens subsp. frutescens (frf) | Maria Jiménez | 28.5053 | -16.2290 | 5 | 10 | 2 |
| Q | | San Andrés | 28.5149 | -16.1976 | 5 | 11 | 2 |
| R | | Igueste de San Andreas | 28.5281 | -16.1591 | 4 | 9 | na |
| S | A. frutescens subsp. succulentum (frs) | Roque Bermejo | 28.5782 | -16.1365 | 6 | 10 | 2 |
| T | | Between Almáciga and Roque Bermejo | 28.5819 | -16.1654 | 5 | 9 | 2 |

3.3.2 Morphological analysis

Plants sown from wild collected seed (Table 3.1) were grown for six months under common glasshouse conditions before sampling a leaf approximately halfway up the stem. No plants had started to flower, therefore minimising differences due to life stage of the plants. Whole leaves were imaged with a Canon EOS 600D digital SLR camera on a tripod with a scale bar next to each leaf. ImageJ v.1.5 (Schneider et al., 2012) was used to measure leaf area, perimeter, length, and width. Leaf length and width proved effective in earlier morphological comparisons (Brochmann, 1987; Brochmann et al., 2000) while leaf area and perimeter have not been used in previous studies of the homoploid hybrid species in Argyranthemum. Measurements related to primary lobes have been used previously (Brochmann, 1987) but were excluded to avoid the inclusion of over-correlated variables. To examine morphological differences between taxa, we used linear models and generalised linear models for normally and non-normally distributed data respectively, where each morphological character was treated as a response variable and taxon as a fixed factor. Normality was assessed using normal Q-Q plots, histograms and a Shapiro-Wilk test. Significant differences between taxa were identified using a post hoc Tukey’s honest significant difference (HSD) test with false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) for multiple comparisons. To visualise the differences between taxa, boxplots and a principal component analysis (PCA) were generated. All analyses were performed using R (R Core Team, 2018).
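As an illustration of the FDR correction step only, the Benjamini & Hochberg (1995) adjustment can be sketched in a few lines of Python. The analyses reported here were run in R; the function name and the p-values below are hypothetical and are not results from this study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five pairwise comparisons (not study data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted_p = benjamini_hochberg(example_p)
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum prevents a smaller raw p-value from receiving a larger adjusted value than its neighbour.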

3.3.3 Ecological niche modelling

Climate data with a spatial resolution of 50 m were downloaded from the Climate Impact Project for Tenerife (http://climaimpacto.eu/efectos/catalogos-climaticos/5-tenerife/) based on scenario B (current 1981-2010; 15 temperature and 6 precipitation variables). Altitude data for Tenerife were downloaded from http://www.eorc.jaxa.jp/ALOS/en/aw3d30/ at a resolution of 30 m. ArcGIS was used to project the altitude data at the same resolution as the climate data and create an aspect and slope model. As aspect is a circular variable, it was treated as a categorical variable with nine values (flat, North, Northeast, East, Southeast, South, Southwest, West and Northwest). Georeferenced samples were filtered such that only a single accession per species occupied each 50 m² pixel, before extracting values from the climate, altitude, aspect and slope data sets. To avoid the inclusion of over-correlated variables, we selected mean annual temperature and mean annual precipitation and then selected further variables that were not over-correlated with these or each other (Pearson correlation < 0.7).
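The variable selection described above can be sketched as a greedy filter. This is a minimal illustration under stated assumptions: the threshold is applied to the absolute value of the correlation, candidate variables are examined in the order supplied, and the variable names and values are invented for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def select_uncorrelated(variables, priority, threshold=0.7):
    """Keep the priority variables, then add others with |r| < threshold.

    variables maps a variable name to its values at the sampled localities.
    """
    kept = list(priority)
    for name in variables:
        if name in kept:
            continue
        if all(abs(pearson(variables[name], variables[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical values at five localities (not study data).
toy = {
    "temp": [18, 16, 14, 12, 10],
    "precip": [300, 500, 350, 600, 400],
    "altitude": [100, 300, 500, 700, 900],  # perfectly correlated with temp
    "slope": [5, 20, 10, 8, 25],
}
selected = select_uncorrelated(toy, priority=["temp", "precip"])
```

In this toy data set altitude is dropped because it is perfectly (negatively) correlated with temperature, whereas slope is retained.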


ENMs were generated for each taxon using the maximum entropy method of MAXENT version 3.4.1 (Phillips et al., 2006, 2018). We employed MAXENT with a random test percentage of 25 %, a regularization multiplier of one, 10,000 background points, 10 replicates, a maximum of 5000 iterations and a convergence threshold of 10⁻⁵. ENMTools version 1.4.4 (Warren et al., 2010) was then used to calculate niche overlap between each pair of species based on Schoener’s D (Schoener, 1968) and Warren’s I (Warren et al., 2008) statistics, where a value of 0 denotes no overlap and a value of 1 denotes complete overlap. To test whether the ENMs of two species are identical, as expected under the null hypothesis, we used the niche equivalency test initially proposed by Warren et al. (2008) as implemented in ENMTools. This test compares the observed values of the niche overlap statistics D and I with their null distributions generated from 100 pseudoreplicates (see Warren et al. (2008) for details). The null hypothesis is rejected when the observed values of the niche overlap statistics are significantly lower than the values expected from the pseudoreplicated data sets.
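For readers unfamiliar with the two overlap statistics, a minimal Python sketch is given below. It assumes two lists of non-negative suitability scores over the same grid cells and uses the form of I found in current software implementations; it is not the ENMTools code, and the example scores are hypothetical.

```python
from math import sqrt


def normalise(scores):
    """Rescale suitability scores so that they sum to one."""
    total = sum(scores)
    return [s / total for s in scores]


def schoener_d(a, b):
    """Schoener's D: 1 minus half the summed absolute differences."""
    pa, pb = normalise(a), normalise(b)
    return 1 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def warren_i(a, b):
    """Warren's I, based on the squared Hellinger distance."""
    pa, pb = normalise(a), normalise(b)
    return 1 - 0.5 * sum((sqrt(x) - sqrt(y)) ** 2 for x, y in zip(pa, pb))


# Hypothetical suitability scores over four grid cells (not study data).
identical = schoener_d([0.2, 0.4, 0.1, 0.3], [0.2, 0.4, 0.1, 0.3])
disjoint_d = schoener_d([1, 1, 0, 0], [0, 0, 1, 1])
disjoint_i = warren_i([1, 1, 0, 0], [0, 0, 1, 1])
```

Identical predictions give a value of 1 and predictions with no shared cells give a value of 0, matching the interpretation of the statistics given above.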

In order to assess the degree of niche overlap, we also employed PCA-env, initially implemented by Broennimann et al. (2012). In this approach, the multidimensional environmental space of the selected variables is first translated into two dimensions by means of principal components analysis (PCA). The PCA is divided into a grid with a resolution of 100, where each cell corresponds to a unique set of environmental conditions. Species occurrences are projected onto this grid and a smoothed density of occurrence for each species was estimated using a kernel density function. Only continuous variables could be used for this analysis so aspect was excluded. This method is advantageous in that it accounts for spatial resolution biases, makes optimal use of both geographical and environmental spaces and corrects observed occurrence densities in light of the availability of environmental space (Broennimann et al., 2012). As above, niche overlap was quantified using the D and I statistics and the niche equivalency test was employed to test whether the environmental niche space of two species are identical using 100 pseudoreplicates. All analyses for PCA-env were performed in R using the package ecospat (Di Cola et al., 2017; R Core Team, 2018).
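The logic of the niche equivalency test can be illustrated with a simplified sketch. It assumes a single environmental axis binned into a histogram in place of the kernel-smoothed two-dimensional PCA grid used by ecospat, and a one-sided p-value with a plus-one correction; the function names, seed and occurrence values are hypothetical.

```python
import random


def grid_density(values, lo, hi, cells):
    """Proportion of occurrences falling in each cell of a 1-D grid."""
    counts = [0] * cells
    width = (hi - lo) / cells
    for v in values:
        idx = min(int((v - lo) / width), cells - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def schoener_d(pa, pb):
    return 1 - 0.5 * sum(abs(x - y) for x, y in zip(pa, pb))


def equivalency_test(sp1, sp2, lo, hi, cells=10, reps=100, seed=1):
    """Compare observed overlap with overlaps from randomly relabelled data."""
    rng = random.Random(seed)
    observed = schoener_d(grid_density(sp1, lo, hi, cells),
                          grid_density(sp2, lo, hi, cells))
    pooled = list(sp1) + list(sp2)
    null = []
    for _ in range(reps):
        # Pool the occurrences and split them at random into two groups
        # with the original sample sizes.
        rng.shuffle(pooled)
        a, b = pooled[:len(sp1)], pooled[len(sp1):]
        null.append(schoener_d(grid_density(a, lo, hi, cells),
                               grid_density(b, lo, hi, cells)))
    # One-sided test: is the observed overlap lower than expected by chance?
    p_value = (sum(d <= observed for d in null) + 1) / (reps + 1)
    return observed, p_value


# Hypothetical values along one environmental axis (not study data).
species_1 = [1.1, 1.3, 1.5, 1.7, 1.9, 1.2, 1.4, 1.6, 1.8, 1.05]
species_2 = [8.1, 8.3, 8.5, 8.7, 8.9, 8.2, 8.4, 8.6, 8.8, 8.05]
observed_d, p = equivalency_test(species_1, species_2, lo=0.0, hi=10.0)
```

A small p-value indicates that the observed overlap is lower than the overlap obtained when species labels are assigned at random, which is the criterion for rejecting niche equivalency.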

3.3.4 DNA extraction

DNA was extracted from silica-dried leaf material using a modified CTAB based method (Doyle & Doyle, 1987). Briefly, silica dried leaf material was homogenised and extracted with a CTAB-based buffer. Lipids and other debris were removed by mixing with 24:1 chloroform: isoamyl alcohol and centrifugation. DNA was precipitated using isopropanol, pelleted by centrifugation and washed with ethanol. DNA was re-suspended in TE buffer and treated with RNase at 37°C for 30 minutes.


3.3.5 Polymerase Chain Reaction (PCR) of SSRs

A combination of eight nuclear and four chloroplast SSR markers was used for population genetic analyses (Appendix Table B.3). The nuclear SSRs (nSSRs) were developed by White et al. (2016) and are known to be variable between the parental species A. broussonetii and A. frutescens. The chloroplast SSR (cpSSR) markers were developed by Bryan et al. (1999) using Nicotiana tabacum L. but are universal for flowering plants. The forward primer for each locus was designed with the M13(-29) sequence (CACGACGTTGTAAAACGAC) appended to the 5’ end such that a third fluorescently-labelled M13(-29) primer (either FAM, NED or TET) could be incorporated in the PCR (Schuelke, 2000). PCR reactions were carried out as described in White et al. (2016).

3.3.6 Population genetic analyses of nuclear SSRs

Samples were excluded from nSSR analyses if there were missing data in more than three of the eight markers scored. Summary statistics were calculated in GenAlEx version 6.502 (Peakall & Smouse, 2012) and principal component analysis (PCA) was performed using the R package adegenet (Jombart & Ahmed, 2011; R Core Team, 2018). STRUCTURE (Pritchard et al., 2000) was used to identify clustering, where K (the number of populations) was tested from 1 to 10. For each value of K, 10 runs of 2,000,000 replicates after a 500,000 burn-in were implemented. The most likely K was then determined using the delta K method of Evanno et al. (2005) with STRUCTURE HARVESTER (Earl & vonHoldt, 2012). For each accession, the proportion of membership to each of the clusters was determined using CLUMPP (Jakobsson & Rosenberg, 2007) using the full search method and the structure plots were drawn using the R package ggplot2 (Wickham, 2009; R Core Team, 2018).
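The delta K statistic of Evanno et al. (2005) can be sketched as follows. This is an illustrative calculation rather than the STRUCTURE HARVESTER code: it assumes the second-order difference is taken on the mean log likelihood for each K and divided by the standard deviation across runs, and the log likelihood values are invented.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return delta K for every K that has both neighbouring K values.

    log_likelihoods maps K to the list of L(K) values from replicate runs.
    """
    means = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(log_likelihoods[k])
    return result


# Hypothetical log likelihoods from three runs per K (not study data).
toy_runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-788, -780, -796],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

Delta K cannot be computed for the smallest and largest K tested because the second-order difference requires a value on either side.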

3.3.7 Haplotype analysis of chloroplast SSRs

Samples were excluded from cpSSR analyses if there were any missing data and summary statistics were calculated in GenAlEx. Allele scores for cpSSRs were used to identify haplotypes and construct a median-joining haplotype network in Network 5.0 (Bandelt et al., 1999). For the purposes of identifying the parentage of the homoploid hybrid species, haplotypes were grouped according to their presence in the parental species. These groups included haplotypes that were only found in A. broussonetii, only found in A. frutescens, identified in both parental species and those found in neither (i.e. restricted to one or other of the homoploid hybrid species or A. broussonetii × A. frutescens).
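The grouping of haplotypes by their presence in the parental species amounts to simple set membership, sketched below. The haplotypes are written as tuples of allele sizes at four loci; the taxon labels follow the abbreviations used in this chapter, but the allele sizes are hypothetical.

```python
def classify_haplotypes(haplotypes_by_taxon, parent_a, parent_b):
    """Group each haplotype by its presence in the two parental taxa."""
    in_a = set(haplotypes_by_taxon[parent_a])
    in_b = set(haplotypes_by_taxon[parent_b])
    groups = {}
    for haplotypes in haplotypes_by_taxon.values():
        for hap in haplotypes:
            if hap in in_a and hap in in_b:
                groups[hap] = "both parents"
            elif hap in in_a:
                groups[hap] = parent_a + " only"
            elif hap in in_b:
                groups[hap] = parent_b + " only"
            else:
                groups[hap] = "neither parent"
    return groups


# Hypothetical allele sizes (bp) at four cpSSR loci (not study data).
example = {
    "bro": [(101, 88, 120, 95), (101, 89, 120, 95)],
    "fru": [(102, 88, 121, 95), (101, 89, 120, 95)],
    "sun": [(101, 88, 120, 95), (103, 88, 120, 96)],
}
groups = classify_haplotypes(example, "bro", "fru")
```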


3.3.8 Processing of GBS SNP data

DNA samples were sent to the Genomic Diversity Facility at Cornell University for GBS where they were digested using EcoT22I and single-end 100 bp reads were sequenced using an Illumina HiSeq. Ipyrad (version 0.7.15) was used to process the raw sequences, cluster loci and call SNPs (Eaton & Ree, 2013). The raw fastq data were de-multiplexed based on barcode sequences with no barcode mismatches allowed. Bases with low quality scores (<33) were converted to N and reads with more than five Ns were discarded. The strict setting (2) in ipyrad was used to filter adapter sequences as recommended for GBS data. Prior to clustering, samples with less than 0.5 M filtered reads were removed to avoid the inclusion of samples with large amounts of missing data. Filtered reads were assembled with three different levels of clustering threshold (80 %, 85 % and 90 %) using the de novo-reference assembly method. In this approach, reads that map to a reference are removed from downstream processes. Therefore we were able to use this to identify and remove clusters that mapped to the chloroplast genomes of Arabidopsis thaliana (L.) Heynh. (Genbank accession number NC_000932.1), Helianthus annuus (NC_007977.1) and Chrysanthemum indicum L. (NC_020320.1) and the mitochondrial genomes of A. thaliana (NC_001284.2) and H. annuus (KF815390.1). Consensus allele sequences within individuals were estimated with a minimum depth of six reads required for base calling. Consensus sequences were then clustered across samples resulting in assembled loci. For the three levels of clustering employed, loci were filtered at two levels of missing data (minimum samples per locus of 10 and 13 individuals) resulting in a total of six assemblies. Loci with a shared heterozygous site in 20 % or more of samples were also removed from the final assembly as these may result from the clustering of paralogs.
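The read-level quality filter described above can be illustrated with a short sketch. This is not the ipyrad implementation: the function name is hypothetical, quality scores are assumed to be supplied as integers on the same scale as the threshold, and the reads are invented.

```python
def mask_read(sequence, qualities, min_quality, max_n=5):
    """Mask low-quality bases with N; return None if too many Ns result."""
    masked = "".join(base if q >= min_quality else "N"
                     for base, q in zip(sequence, qualities))
    # Ns already present in the read count towards the limit as well.
    if masked.count("N") > max_n:
        return None
    return masked


# Hypothetical reads and quality scores (not study data).
kept = mask_read("ACGTACGT", [40, 40, 10, 40, 40, 10, 40, 40], min_quality=33)
dropped = mask_read("ACGTACGT", [10, 10, 10, 10, 10, 10, 40, 40], min_quality=33)
```

The first read has two low-quality bases masked and is retained, whereas the second would contain six Ns and is discarded.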

3.3.9 Genome-wide SNP analysis

Assembled loci were further filtered using vcftools (Danecek et al., 2011) for a minimum minor allele frequency of 0.05 while selecting the first SNP from each locus to reduce the likelihood that SNPs were linked. PCA and STRUCTURE were used to analyse the datasets generated using each assembly method. PCA was employed using PLINK v.1.9 (Purcell et al., 2007). STRUCTURE (Pritchard et al., 2000) was implemented as above for the SSR markers except for a shorter run length of 50,000 replicates after a 20,000 burn-in over five iterations and using the Greedy search algorithm in CLUMPP (Jakobsson & Rosenberg, 2007). Plots were drawn using the R package ggplot2 (Wickham, 2009; R Core Team, 2018).
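The SNP thinning step can be sketched as below. Two assumptions are made for illustration: the minor allele frequency filter is applied before the first SNP of each locus is chosen, and genotypes are coded as counts of the alternative allele. The filtering reported here was done with vcftools; the locus names and genotypes are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """genotypes holds alternative-allele counts (0, 1, 2) or None if missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def thin_snps(snps, min_maf=0.05):
    """Keep the first SNP per locus that passes the frequency filter.

    snps is a list of (locus, position, genotypes) sorted by locus and position.
    """
    kept, seen = [], set()
    for locus, position, genotypes in snps:
        if locus in seen:
            continue
        if minor_allele_frequency(genotypes) >= min_maf:
            kept.append((locus, position))
            seen.add(locus)
    return kept


# Hypothetical SNPs (not study data).
toy_snps = [
    ("locus1", 5, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),  # MAF = 1/24
    ("locus1", 40, [0, 1, 1, 2, 0, 0]),
    ("locus2", 12, [0, 0, 1, None, 2, 1]),
    ("locus2", 30, [1, 1, 1, 1, 1, 1]),
]
unlinked = thin_snps(toy_snps)
```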


3.3.10 Testing independent hybrid origins with ABC

We used approximate Bayesian computation (ABC) to compare different evolutionary scenarios for the origin of A. sundingii and A. lemsii (Brochmann et al., 2000; Fjellheim et al., 2009) based on our population genetic results using the software package DIYABC (Cornuet et al., 2008). A total of nine scenarios were tested (Figure 3.6). For scenarios one and two, A. sundingii and A. lemsii originate independently from crosses between A. broussonetii and different subspecies of A. frutescens. For scenarios three and four A. sundingii and A. lemsii originate independently but from the same parentage. In scenario five, A. sundingii and A. lemsii originate independently from hybridisation events between A. broussonetii and the common ancestor of the two A. frutescens subspecies. Scenarios six, seven and eight involve a single origin based on a hybridisation event between A. broussonetii and either A. frutescens subsp. frutescens, subsp. succulentum or the common ancestor of the two subspecies. Finally, in scenario nine A. sundingii and A. lemsii diverge by cladogenesis from A. broussonetii.

A vcf file of unlinked SNPs was converted to DIYABC format using the python script vcf2DIYABC.py available from https://github.com/loire/vcf2DIYABC.py. To meet the requirements of DIYABC SNPs were removed if they were not present in at least one individual from each of the five taxa (A. broussonetii, A. sundingii, A. lemsii, A. frutescens subsp. frutescens and subsp. succulentum).

For each prior a uniform distribution with a large interval was chosen due to lack of knowledge concerning population sizes, divergence times and admixture rate. Specifically, the intervals for population sizes and divergence times were set to 10² – 10⁷ and the interval for admixture rates was 0.001 – 0.999. A total of 29 summary statistics were used to compare the observed and simulated datasets, including the means of genic diversity, pairwise sample Fst and Nei’s distance as well as admixture summary statistics (Choisy et al., 2004). A total of 10⁶ simulations were performed for each scenario and the posterior probability of each scenario was computed by performing a logistic regression on the 1 % of simulated data closest to the observed dataset (Cornuet et al., 2008, 2010).
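The idea of comparing scenarios by the simulations closest to the observed data can be illustrated with a deliberately simplified sketch. Note that it uses the direct estimate (the frequency of each scenario among the closest simulations) in place of the logistic regression performed in DIYABC, and scales each summary statistic by its standard deviation; the function name and toy simulations are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import pstdev


def scenario_posteriors(observed, simulations, fraction=0.01):
    """Scenario frequencies among the simulations closest to the observed data.

    simulations is a list of (scenario_id, summary_statistics) tuples.
    """
    n_stats = len(observed)
    # Scale each statistic by its spread across simulations (1.0 if constant).
    scales = [pstdev([stats[j] for _, stats in simulations]) or 1.0
              for j in range(n_stats)]

    def distance(stats):
        return sqrt(sum(((s - o) / sc) ** 2
                        for s, o, sc in zip(stats, observed, scales)))

    ranked = sorted(simulations, key=lambda sim: distance(sim[1]))
    n_keep = max(1, int(len(ranked) * fraction))
    counts = Counter(scenario for scenario, _ in ranked[:n_keep])
    return {scenario: n / n_keep for scenario, n in counts.items()}


# Toy simulations: scenario 1 lies near the origin, scenario 2 far from it.
toy_sims = [(1, (i * 0.01, i * 0.01)) for i in range(100)]
toy_sims += [(2, (10 + i * 0.01, 10 + i * 0.01)) for i in range(100)]
posteriors = scenario_posteriors((0.1, 0.1), toy_sims, fraction=0.1)
```

In this toy case every retained simulation belongs to scenario 1, so its estimated posterior probability is 1.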

Confidence in scenario choice was assessed by simulating 1000 test datasets (pseudo-observed datasets) of the selected scenario, drawing parameter values from the prior distribution. The posterior probabilities of each scenario were estimated for each simulated dataset by performing a logistic regression as above. Type I and type II errors were then evaluated by measuring the fraction of datasets simulated under the best scenario that were assigned to other scenarios and the fraction of datasets simulated under other scenarios that were assigned to the best scenario respectively (Cornuet et al., 2010).
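The two error rates defined above can be computed from a list of pseudo-observed datasets once each has been assigned to a scenario. The sketch below assumes such assignments are available as pairs of true and chosen scenario; the values are hypothetical.

```python
def error_rates(assignments, best):
    """Return (type I, type II) error rates for the selected scenario.

    assignments is a list of (true_scenario, chosen_scenario) pairs.
    """
    from_best = [chosen for true, chosen in assignments if true == best]
    from_other = [chosen for true, chosen in assignments if true != best]
    # Type I: simulated under the best scenario but assigned elsewhere.
    type_1 = sum(chosen != best for chosen in from_best) / len(from_best)
    # Type II: simulated under another scenario but assigned to the best.
    type_2 = sum(chosen == best for chosen in from_other) / len(from_other)
    return type_1, type_2


# Hypothetical assignments of eight pseudo-observed datasets (not study data).
toy_assignments = [(1, 1), (1, 1), (1, 2), (1, 1),
                   (2, 2), (2, 1), (3, 3), (3, 3)]
type_1, type_2 = error_rates(toy_assignments, best=1)
```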


Goodness-of-fit of the selected scenario was assessed using the model checking function of DIYABC, which evaluates the ability of a given scenario to produce datasets similar to the observed dataset (Cornuet et al., 2010; Barrès et al., 2012). For the selected scenario, 1000 datasets simulated using posterior distribution of parameter values were compared with the observed dataset using different summary statistics than those used for model selection and parameter estimation to avoid overweighting the fit of the scenario (Cornuet et al., 2010; Capblancq et al., 2015). PCA was implemented to visually assess the position of the observed dataset with regards to the simulated datasets. A P-value was estimated for each summary statistic by ranking the observed value among the values obtained with simulated datasets.

The posterior distribution of the parameters for the selected model were also estimated using a local linear regression on the 1 % of simulations closest to the observed dataset and applying a logit transformation to the parameter values (Cornuet et al., 2010). These distributions estimate the most probable value and the width of the distribution for each historical parameter (Capblancq et al., 2015).

Results

3.4.1 Morphological analyses

Of the four morphological characters used, only leaf area showed a non-normal distribution (Shapiro-Wilk test: W = 0.9001, P < 0.001). A GLM of leaf area identified significant differences between taxa (GLM with a quasi family and log link function: F5,117 = 77.416, P < 0.001). Linear models identified significant differences between taxa for perimeter (F5,117 = 19.756, P < 0.001), leaf length (F5,117 = 39.934, P < 0.001) and leaf width (F5,117 = 22.794, P < 0.001). Summary plots for each model are in Appendix Figures B.2 - B.5.

For each character A. broussonetii and A. frutescens were significantly different, while the two subspecies of A. frutescens were largely indistinguishable from each other (Figure 3.2 A-D). Leaf area and leaf perimeter were effective in delimiting A. sundingii and A. lemsii from both parental species. For leaf length, A. sundingii and A. lemsii were not significantly different from A. broussonetii. Argyranthemum lemsii was significantly different from both parents based on leaf width but A. sundingii showed no significant difference from A. broussonetii based on this character. Only leaf area identified a significant difference between the homoploid hybrid species, with the remaining characters showing no significant differences. The putative hybrid populations of A. broussonetii × A. frutescens were significantly different from both of the homoploid hybrid species and A. broussonetii for all characters but were not significantly different from either subspecies of A. frutescens.

Argyranthemum sundingii and A. lemsii are intermediate relative to their parental progenitors based on the PCA whereas the putative hybrid populations of A. broussonetii × A. frutescens showed greater similarity to A. frutescens than to either of the homoploid hybrid species (Figure 3.2 E).

Figure 3.2 - Boxplots (A-D) and PCA (E) based on morphological characters leaf area, perimeter, length and width. Letters above each boxplot represent the groupings identified by a post hoc Tukey test with false discovery rate (FDR) correction (Benjamini & Hochberg, 1995) for multiple comparisons. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

3.4.2 Ecological niche modelling

A total of 185 georeferenced localities remained after filtering samples to include only one record per species per 50 m² pixel (Appendix Table B.4). A total of six variables were not correlated (Pearson correlation < 0.7) and were used in our analysis: mean annual temperature, mean annual precipitation, isothermality (diurnal thermal range divided by annual thermal range), daytime thermal range (difference between maximum annual average temperature and minimum annual average temperature), annual thermal range (difference between maximum temperature of the warmest month and minimum temperature of the coldest month) and slope.


Niche distributions for each species predicted using Maxent had high predictive ability based on mean area under the receiver operating characteristic curve (AUC) values for A. broussonetii (0.987), A. frutescens subsp. frutescens (0.984), A. frutescens subsp. succulentum (0.969), A. sundingii (0.950), A. lemsii (0.974) and A. broussonetii × A. frutescens (0.949). Predicted distributions were also congruent with the observed distributions of the taxa (Figure 3.3). Mean annual temperature contributed most to the model predictions for A. broussonetii (37.5 %) and A. frutescens subsp. frutescens (55.4 %), and mean annual precipitation contributed most to the predictions for A. lemsii (31.6 %) and A. broussonetii × A. frutescens (36.1 %). Daytime thermal range contributed most to the model predictions for A. frutescens subsp. succulentum (80.2 %) and isothermality contributed most to the prediction for A. sundingii (33.5 %).

In our PCA-env analysis, the proportion of variation explained by the first and second axis was 43.91 % and 26.8 % respectively. The first axis was primarily associated with mean annual temperature and mean annual precipitation whereas the second axis was largely explained by the remaining four variables (Appendix Figure B.6).

Niche overlap statistics demonstrated that each taxon occupied a distinct niche (Table 3.2). The null hypothesis that the ENMs of two taxa are identical was rejected for all comparisons (P < 0.05) except the D statistic of our PCA-env comparison for A. frutescens subsp. frutescens vs. subsp. succulentum (P = 0.0990). Niche equivalency test plots are shown in Appendix Figures B.7 - B.10.

Figure 3.3 - Niche space for each species based on (A) Maxent and (B) ecospat. The Maxent maps were created using the average predictions across 10 replicates and average cloglog thresholds under the maximum training sensitivity plus specificity criterion. Ordinations generated in ecospat show the niche space for each species across the first two principal components. The density of occurrences for each species is represented by grey shading and the solid and dashed contour lines show 100 % and 50 % of the available background environment respectively. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

Table 3.2 - Schoener’s D (Schoener, 1968) and Warren’s I (Warren et al., 2008) niche overlap statistics based on our Maxent and PCA-env niche predictions. P values were generated using the niche equivalency test. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

| Comparison | Maxent D | Maxent I | PCA-env D | PCA-env I |
|---|---|---|---|---|
| bro vs. frf | 0.016 ** | 0.059 ** | 0.000 ** | 0.000 ** |
| bro vs. frs | 0.042 ** | 0.140 ** | 0.000 ** | 0.001 ** |
| bro vs. sun | 0.230 ** | 0.448 ** | 0.163 ** | 0.216 ** |
| bro vs. lem | 0.369 ** | 0.632 ** | 0.343 ** | 0.417 ** |
| bro vs. bro × fru | 0.267 ** | 0.526 ** | 0.175 ** | 0.253 ** |
| frf vs. frs | 0.264 ** | 0.552 ** | 0.313 | 0.410 * |
| frf vs. sun | 0.231 ** | 0.477 ** | 0.102 ** | 0.175 ** |
| frf vs. lem | 0.135 ** | 0.325 ** | 0.065 ** | 0.116 ** |
| frf vs. bro × fru | 0.207 ** | 0.433 ** | 0.120 ** | 0.179 ** |
| frs vs. sun | 0.155 ** | 0.374 ** | 0.064 ** | 0.178 ** |
| frs vs. lem | 0.279 ** | 0.561 ** | 0.046 ** | 0.125 ** |
| frs vs. bro × fru | 0.183 ** | 0.431 ** | 0.067 ** | 0.130 ** |
| sun vs. lem | 0.376 ** | 0.649 ** | 0.620 * | 0.760 * |
| sun vs. bro × fru | 0.513 ** | 0.790 ** | 0.387 ** | 0.536 ** |
| lem vs. bro × fru | 0.468 ** | 0.765 ** | 0.360 ** | 0.487 ** |
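For readers unfamiliar with the two overlap statistics in Table 3.2, the following minimal sketch shows how each can be computed from two suitability (or occurrence density) surfaces evaluated over the same grid cells. It assumes the commonly used definitions, D = 1 − ½ Σ|p₁ − p₂| and I = 1 − ½ Σ(√p₁ − √p₂)², with each surface first normalised to sum to one; the software used in this study may differ in detail. The function names and toy surfaces are hypothetical.

```python
from math import sqrt


def normalise(suitability):
    """Rescale a suitability surface so that its cell values sum to one."""
    total = sum(suitability)
    return [s / total for s in suitability]


def schoener_d(p, q):
    """Schoener's D for two normalised surfaces (0 = no overlap, 1 = identical)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def warren_i(p, q):
    """Hellinger-based I statistic for two normalised surfaces (0 to 1)."""
    return 1.0 - 0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q))


# Hypothetical three-cell surfaces, not data from this study.
taxon_a = normalise([2.0, 2.0, 0.0])
taxon_b = normalise([0.0, 3.0, 3.0])
print(schoener_d(taxon_a, taxon_b), warren_i(taxon_a, taxon_b))
```

Both statistics range from 0 (no overlap) to 1 (identical niches), which is why values close to zero, such as those for A. broussonetii vs. A. frutescens, indicate almost entirely non-overlapping niches.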

3.4.3 Population genetic analyses of nuclear SSRs

A total of 186 individuals had sufficient data to be included in the nuclear SSR analysis. Consistent with their hypothesised hybrid origin, expected heterozygosity (He) and unbiased expected heterozygosity (uHe) were higher in A. sundingii (0.72 and 0.73 respectively), A. lemsii (0.69 and 0.70) and A. broussonetii × A. frutescens (0.70 and 0.73) than in A. broussonetii (0.57 and 0.57), A. frutescens subsp. frutescens (0.61 and 0.62) and subsp. succulentum (0.36 and 0.37) (Appendix Table B.5 A).
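As an illustration of the two diversity measures reported above, the sketch below computes He and uHe for a single locus from diploid genotypes. It assumes the standard definitions He = 1 − Σpᵢ² and uHe = (2N / (2N − 1)) · He, where pᵢ are allele frequencies and N is the number of individuals; the function name and the toy genotypes are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Return (He, uHe) for one locus given a list of diploid genotypes.

    Each genotype is a pair of allele labels, e.g. SSR fragment sizes.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    n_copies = len(alleles)  # 2N gene copies for N diploid individuals
    counts = Counter(alleles)
    he = 1.0 - sum((c / n_copies) ** 2 for c in counts.values())
    uhe = he * n_copies / (n_copies - 1)  # small-sample correction
    return he, uhe


# Hypothetical fragment sizes for four individuals, not data from this study.
toy_locus = [(150, 150), (150, 154), (154, 158), (150, 158)]
he, uhe = expected_heterozygosity(toy_locus)
print(round(he, 3), round(uhe, 3))
```

In practice the values would be averaged over loci within each taxon. The correction matters most for small samples, which is why He and uHe are nearly identical for the well-sampled taxa.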

Principal component analysis (PCA) of the nSSRs demonstrated a genetic distinction between A. broussonetii and A. frutescens and between the two subspecies of A. frutescens (Figure 3.4 A). The homoploid hybrid species and A. broussonetii × A. frutescens are intermediate to the parents and appear separated to some extent, but do not form distinct genetic clusters. Examination of more principal components did not reveal any further distinction between taxa (Appendix Figure B.11).

STRUCTURE analysis of the nuclear SSR markers identified K = 2, 3 and 5 as the most likely numbers of clusters, in descending order of support, based on the Evanno delta K method (Figure 3.4 B-C). For K = 2, the parent species are separated into different clusters. The average membership of A. broussonetii to cluster one was 94.52 % and the average membership of A. frutescens subsp. frutescens and subsp. succulentum to cluster two was 88.84 % and 96.99 % respectively. Argyranthemum sundingii and A. lemsii had an average membership of 81.67 % and 89.30 % to cluster one, suggesting the hybrid species have a greater affinity to A. broussonetii. Populations of A. broussonetii × A. frutescens had an average membership of 41.62 % and 58.38 % to clusters one and two respectively, although this was clearly variable, ranging from 1.60 % to 95.40 % for cluster one and from 4.60 % to 98.40 % for cluster two.
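The delta K criterion can be sketched as follows. This is a simplified illustration that assumes delta K is calculated as the absolute second-order change in the mean log likelihood, |L(K+1) − 2L(K) + L(K−1)|, divided by the standard deviation of L(K) across replicate runs. The function name and the likelihood values are hypothetical and are not taken from the analyses reported here.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Compute delta K for every K that has both neighbours K-1 and K+1.

    `lnp_by_k` maps K to the list of ln P(D|K) values from replicate runs.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            # stdev is zero when all replicates are identical; guard in real use.
            delta[k] = second_order / stdev(lnp_by_k[k])
    return delta


# Hypothetical replicate log likelihoods, not results from this study.
toy_runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4500.0, -4501.0, -4499.0],
    3: [-4450.0, -4460.0, -4440.0],
    4: [-4440.0, -4470.0, -4410.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because delta K requires both neighbouring values of K, it cannot be evaluated for the smallest or largest K tested, so it can never select K = 1.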

K values of 3 and 5 had similar levels of support and are both biologically informative. For K = 3, the clusters subdivide A. broussonetii, A. frutescens and the homoploid hybrid species. The average membership of A. broussonetii to cluster one was 91.58 % and the average membership of A. frutescens subsp. frutescens and subsp. succulentum to cluster two was 84.24 % and 94.27 % respectively. The average memberships of A. sundingii and A. lemsii to cluster three were 83.95 % and 81.90 %, respectively. Populations of A. broussonetii × A. frutescens share 66.15 % membership with cluster three, with 8.48 % and 25.39 % belonging to clusters one and two respectively.

For K = 5, the five named taxa occupy different clusters. A distinction between the two homoploid hybrid species is identified, with A. sundingii having an average membership of 75.76 % and A. lemsii an average membership of 80.10 % to separate clusters. The two subspecies of A. frutescens are also delimited, with subsp. frutescens forming a cluster with an average membership of 84.11 % and subsp. succulentum an average of 90.43 %. Populations of A. broussonetii × A. frutescens are admixed, showing greatest affinity to the clusters of A. sundingii (40.52 %), A. frutescens subsp. frutescens (27.36 %) and A. lemsii (26.01 %).

Figure 3.4 - PCA (A), STRUCTURE results for K = 2 to 5 (B) and Evanno delta K (C) based on nuclear SSR clustering analyses. Taxa included in the PCA and STRUCTURE plots are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

3.4.4 Haplotype analysis of chloroplast SSRs

After removing samples with missing data, 164 individuals remained and 16 haplotypes were identified (Table 3.3; Appendix Figure B.12). Ten of these haplotypes were specific to one of the parental species, with four found only in A. broussonetii (40 individuals) and six found only in A. frutescens (27 individuals). Of the remaining six haplotypes, two were shared between the parental species (12 individuals) and four were not found in either of the parental species (five individuals). Of the 37 A. sundingii individuals sampled, 24 shared a haplotype with A. broussonetii, four with A. frutescens and the remaining nine possessed haplotypes that were either shared between the parental species or not present in either. For A. lemsii, 29 of the 37 individuals sampled had a haplotype shared with A. frutescens, two shared a haplotype with A. broussonetii and six had haplotypes that were shared between the parental species or found in neither. Of the eleven individuals sampled from A. broussonetii × A. frutescens, nine possessed a haplotype shared with A. frutescens, one had a haplotype shared between both parents and one had a haplotype not found in the sampled plants of either parent.

Table 3.3 - Distribution of haplotypes across species and groups as defined by presence in parental species. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

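| Taxon | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII | XIII | XIV | XV | XVI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Group | bro | bro | bro | bro | fru | fru | fru | fru | fru | fru | shared | shared | neither | neither | neither | neither |
| bro | 1 | 2 | 9 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 1 | 0 | 0 | 0 | 0 |
| frf | 0 | 0 | 0 | 0 | 1 | 6 | 3 | 1 | 1 | 3 | 2 | 4 | 0 | 0 | 0 | 0 |
| frs | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 8 | 0 | 0 | 0 | 0 | 0 | 0 |
| sun | 0 | 2 | 0 | 22 | 0 | 4 | 0 | 0 | 0 | 0 | 8 | 0 | 0 | 0 | 1 | 0 |
| bro × fru | 0 | 0 | 0 | 0 | 0 | 6 | 0 | 0 | 3 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| lem | 0 | 0 | 0 | 2 | 3 | 7 | 11 | 0 | 2 | 6 | 3 | 0 | 1 | 2 | 0 | 0 |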

3.4.5 Processing of GBS SNP data

Genotyping-By-Sequencing resulted in an average of 1.91 M reads across all samples (range 0.23 to 5.24 M). Filtering out low quality reads removed 7.59 - 8.52 % of the total reads, resulting in an average of 1.77 M across all samples (range 0.22 to 4.79 M; Appendix Figure B.13). Two samples (one each of A. frutescens subsp. succulentum [frs242] and A. sundingii [sun288]) with fewer than 0.5 M reads were removed. An average of 1705 reads per sample mapped to either chloroplast or mitochondrial references and were removed. Increasing the clustering threshold through 80 %, 85 % and 90 % increased the resulting number of clusters, with an average of 59,494, 67,313 and 87,312 clusters per individual for each threshold respectively (Appendix Table B.6). Of these, 18,735, 20,818 and 25,621 clusters, respectively, passed the minimum depth requirement. Increasing the clustering threshold reduced heterozygosity estimates from 0.0394 to 0.0284 and error estimates from 0.0164 to 0.0130. The average number of consensus reads per sample increased to 16,194, 18,586 and 23,988 for each clustering threshold respectively. Two levels were used for the minimum number of samples required to process a locus. Increasing this minimum from 10 to 13 reduced the average number of loci across samples included in the assembly as well as the number of SNPs (Appendix Table B.6).

3.4.6 Nuclear SNP cluster analysis

For each assembly generated in this study, PCA and STRUCTURE showed a similar grouping of individuals and the Evanno delta K plot as part of the STRUCTURE analysis always provided strongest support for K = 3, providing confidence that the results were robust to the read clustering approach used. Therefore, we selected the assembly based on a clustering threshold of 90 % with loci present in a minimum of 10 samples since it had the greatest number of SNPs (Appendix Table B.6). PCA and STRUCTURE plots for this dataset are presented in Figure 3.5 but results for all other assemblies are also presented in Appendix Figures B.14 and B.15.

PCA identified clear differences between the parental species, as well as the two subspecies of A. frutescens (Figure 3.5 A). The homoploid hybrid species form a cluster intermediate to the parents but closely allied to A. broussonetii. No clear distinction between A. sundingii and A. lemsii is apparent. The STRUCTURE analysis for the most strongly supported K = 3 (Figure 3.5 C) separated the parent species into distinct clusters with the two homoploid hybrid species sharing the third cluster (Figure 3.5 B). Increasing the value of K does not increase the separation between the hybrid species. For K = 3, A. broussonetii has 100 % membership to cluster one while A. frutescens subsp. frutescens and subsp. succulentum have 99.96 % and 100 % membership to cluster three respectively. Argyranthemum sundingii and A. lemsii have 94.32 % and 82.82 % membership to cluster two respectively. Where K = 2, there is a clear distinction between the parental species, with A. broussonetii having 100 % membership to cluster one while A. frutescens subsp. frutescens and subsp. succulentum each have 99.97 % membership to cluster two. In keeping with the results of K = 2 for the nSSRs, the homoploid hybrid species show greater similarity to A. broussonetii, with A. sundingii sharing an average membership of 78.00 % and A. lemsii a membership of 86.91 % with the A. broussonetii cluster.

Figure 3.5 - PCA (A), STRUCTURE results for K = 2 to 5 (B) and Evanno delta K (C) based on an ipyrad assembly of GBS reads, using a 90 % clustering threshold and a minimum of 10 samples required to process a locus. For the STRUCTURE and PCA plots taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

3.4.7 Approximate Bayesian Computation

The ipyrad assembly used for our ABC analysis contained a total of 4736 putatively unlinked SNPs (Appendix Table B.6). After the removal of SNPs that were not found in at least one individual per species (a requirement of DIYABC), the total number of SNPs included in the ABC analysis was 2152. Scenario 1 (Figure 3.6) involving two independent hybridisation events was estimated to be the most likely with a posterior probability of 0.5999 and a confidence interval of 0.4045 - 0.7953. In this scenario, A. sundingii and A. lemsii arose independently, and from different subspecies of A. frutescens. Scenario 3 which also involved two hybridisation events had some support with a posterior probability of 0.4001 and a confidence interval of 0.2047 - 0.5955. In this scenario the homoploid hybrid species again originated from independent hybridisation events but they shared the same parentage of A. broussonetii and A. frutescens subsp. frutescens.

Estimation of error rates provided confidence in scenario choice. The type I error rate (the probability that datasets simulated under the selected scenario were assigned to other scenarios) was estimated to be 11.57 %, whereas the type II error rate (the probability that datasets simulated under other scenarios were assigned to the best scenario) was estimated to be 1.41 %.

The goodness-of-fit of scenario 1 was assessed using simulations from posterior distributions and different summary statistics to those employed for scenario choice. PCA showed that the observed dataset was similar to these simulated datasets (Appendix Figure B.16). A P-value was also estimated for each summary statistic by ranking the observed value among the values obtained with simulated datasets. Of the 29 summary statistics used for model checking, 12 differed significantly from their simulation distribution (P < 0.05; Appendix Table B.7). Although scenario 1 was the most strongly supported model, this suggests some discordance between the scenario-posterior combinations and the observed dataset.
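The ranking procedure used for model checking can be illustrated with a minimal sketch. It assumes a two-sided tail probability derived from the proportion of simulated values that fall below the observed value; the exact convention used by DIYABC may differ, and the function name and toy values are hypothetical.

```python
def rank_p_value(observed, simulated):
    """Two-sided tail probability of `observed` within the simulated values.

    The proportion of simulated values below the observed one is converted
    to a two-sided probability; this convention is an assumption.
    """
    below = sum(1 for value in simulated if value < observed)
    proportion = below / len(simulated)
    return 2 * min(proportion, 1 - proportion)


# Hypothetical simulated summary statistic, evenly spread between 0 and 0.99.
toy_simulated = [i / 100 for i in range(100)]
p_central = rank_p_value(0.5, toy_simulated)    # observed value near the centre
p_extreme = rank_p_value(0.985, toy_simulated)  # observed value in the upper tail
print(p_central, p_extreme)
```

An observed statistic in the extreme tail of its simulated distribution yields a small P-value, which corresponds to the statistics flagged as significantly different above.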

Parameter estimates for scenario 1 (Figure 3.6) suggest that the hybridisation events that gave rise to A. sundingii and A. lemsii occurred 2.22 (95 % confidence interval 1.24 – 3.29) and 4.21 (CI 2.73 – 5.70) mya respectively (based on one generation per year). The relative admixture rates suggest that the initial parental contributions were unequal for A. sundingii, with A. frutescens subsp. frutescens contributing 74 % (parameter ra; Figure 3.6) and A. broussonetii contributing 26 % (1 – ra). In contrast, the parental admixture for A. lemsii is more equal, with A. frutescens subsp. succulentum contributing 47 % (rb) and A. broussonetii 53 % (1 – rb). The mean divergence time for the two subspecies of A. frutescens is estimated to be 8.07 (CI 6.01 – 9.51) mya and for A. broussonetii and an ancestral A. frutescens 5.88 (CI 2.46 – 9.42) mya. Whilst these mean values are biologically implausible, since they place the divergence of the subspecies before the split between the two species, the confidence intervals are wide and overlapping, suggesting that the timing of these events cannot be distinguished.

Figure 3.6 - Scenarios 1 to 9 included in the DIYABC analysis based on nuclear SNPs comparing two hybridisation events (scenarios 1-5), a single hybridisation event (scenarios 6-8) and cladogenesis (scenario 9). Scenario 1 was selected as being most likely in describing the origin of the homoploid hybrid species in Argyranthemum. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum). Generations are shown on the y axis (t0 to td) and admixture proportions from each parent are shown on scenario one.

3.5 Discussion

In this study we employed a range of approaches to address a number of questions surrounding the status of putative homoploid hybrid species in Argyranthemum. Using collections from recently discovered populations we investigated whether the homoploid hybrid species are morphologically distinct based on leaf characters. Fine resolution climatic data were used to test whether the homoploid hybrid species occupy a novel ecological niche with respect to the parents. Using a combination of nuclear and chloroplast SSR markers and sampling at the population level we investigated the distinctiveness of the two homoploid hybrid species at a population genetic level while testing the hypothesis of Brochmann et al. (2000) that A. sundingii and A. lemsii exhibit distinct maternal parentages. We further sought to clarify the origin(s) of the populations of uncertain status near Barranco de Igueste using these data. Finally, we leveraged a nSNP dataset to test whether the homoploid hybrid species originated by independent hybridisation events using Approximate Bayesian Computation (ABC).

3.5.1 The hybrid species are morphologically distinct from the parent species

There is considerable variation in leaf morphology within Argyranthemum and leaf characters are important for species identification (Humphries, 1973, 1976; Francisco-Ortega et al., 1997a). Each of the four leaf characters used in this study distinguished the parental species A. broussonetii and A. frutescens (Figure 3.2 A-D). The two subspecies of A. frutescens were indistinguishable but this is not surprising since the main differences between these taxa are growth form and leaf succulence (Humphries, 1976) which would not have been captured by our analysis. The homoploid hybrid species A. sundingii and A. lemsii were intermediate between and distinct from the parental species based on leaf area and perimeter in particular. Argyranthemum sundingii and A. lemsii did not exhibit the high degree of variability between the parental species that one would associate with ongoing hybridisation and backcrossing (Tovar-Sánchez & Oyama, 2004; Worth et al., 2016) conforming to the hypothesis that A. sundingii and A. lemsii are stabilised homoploid hybrid species as opposed to hybrid swarms. This finding, with broader sampling, corroborates the earlier work of Brochmann et al. (2000) who found the extent of morphological variation in the homoploid hybrid species was comparable to the parental taxa, whereas hybrid swarms and synthetic F2 hybrids exhibited a much wider range in variation.

With the characters used in this analysis, only leaf area was able to distinguish A. sundingii and A. lemsii. This disagrees with their treatment as a single entity as proposed by Brochmann et al. (2000) and rather supports the current taxonomic treatment. While the homoploid hybrid species are largely overlapping for most morphological traits examined here we were unable to include characters of the ligule (floral) and cypselae (seed) which are known to be informative for species identification in this genus (Humphries, 1976; Brochmann, 1987), since A. broussonetii did not flower under glasshouse conditions. A comprehensive study of morphology would be needed before making firm taxonomic conclusions regarding the status of the homoploid hybrid species in Argyranthemum. Our results do indicate that whilst the two homoploid hybrid species share broadly similar leaf traits, they are distinguishable and their distinctiveness might be greater when floral traits are considered.

3.5.2 The hybrid species are ecologically intermediate

Ecological isolation has been considered a contributing factor in most cases of HHS (Kadereit, 2015). Few studies have explicitly investigated this hypothesis although Liu et al. (2014) demonstrated that the homoploid hybrid species Ostryopsis intermedia B.Tian & J.Q.Liu occupies a largely distinct niche from the parental progenitors and Capblancq et al. (2015) demonstrated significant overlap between the homoploid hybrid species Coenonympha darwiniana Staudinger and one of the progenitors.

The parental species are almost entirely non-overlapping in their ecological niches and it is clear from the Maxent and PCA-env analyses that the homoploid hybrid species occupy an altitude and ecological niche space intermediate between the parents. Despite being significantly different there is still some overlap in the niche space of the homoploid hybrid species and their parental progenitors, in particular between A. broussonetii and A. lemsii (Maxent: D = 0.369, I = 0.632; PCA-env: D = 0.343, I = 0.417), which is not surprising considering their geographical proximity (e.g. populations D and N; Figure 3.1). That such populations of a homoploid hybrid species can persist without assimilation despite their proximity to parental populations suggests either some degree of intrinsic reproductive isolation between the homoploid hybrid species and their parental progenitors or ecological isolation over very short distances. Brochmann et al. (2000) found a moderate reduction in the fertility of F1 hybrids between A. broussonetii and A. frutescens based on pollen staining. However, to our knowledge no such crosses have yet been made between the homoploid hybrid species and the parents. The fact that A. sundingii and A. lemsii are both adapted to intermediate habitats with respect to their parental progenitors might explain why the two homoploid hybrid species are broadly similar in leaf morphology. However, our analysis also identified a significant difference in the ecological niches occupied by A. sundingii and A. lemsii, which might explain the differences in leaf area identified.

3.5.3 A. sundingii and A. lemsii are genetically distinct from the putative parents

Nuclear SSRs have been used successfully in studies investigating the origin of homoploid hybrid species (Sherman & Burke, 2009; Liu et al., 2014) and the nSSRs identified by White et al. (2016) were effective in identifying genetic clusters for this study. The nSNP dataset (n = 4736 SNPs) obtained by GBS was also informative for investigating the genetic distinctiveness of the hybrid species. For a non-model system such as Argyranthemum where no reference genome is available, GBS followed by de novo assembly of reads using ipyrad was an effective protocol for the identification of nSNPs. A similar approach proved effective in a study of HHS in the butterfly genus Coenonympha which employed a reduced representation approach (Capblancq et al., 2015).

The PCA based on the nSSR data revealed a clear distinction between the parental progenitors and demonstrated the relative intermediacy of the homoploid hybrid species (Figure 3.4 A). PCA of the nSNP dataset shows that the homoploid hybrid species are intermediate relative to their putative parental progenitors (Figure 3.5 A) and the distinction is clearer than in the PCA based on nSSRs.

In the STRUCTURE analysis of both nSSRs and nSNPs, the HHS formed a distinct cluster with respect to the parents where K = 3 (Figure 3.4 B; Figure 3.5 B), supporting the distinctiveness of the homoploid hybrid species from the parental taxa. In the case of the nSNP data, these results are consistent across different assemblies suggesting that the results are robust (Appendix Figure B.14; Appendix Figure B.15).

Where K = 2, the STRUCTURE analyses of both datasets demonstrated that A. sundingii and A. lemsii, whilst being intermediate to the parents, showed greater similarity to A. broussonetii. This agrees with the expectation that a homoploid hybrid species is unlikely to inherit an equal proportion from each parent as backcrossing is expected to occur (Mallet, 2007).

In the nSSR analysis, K = 5 showed evidence of differentiation between the two homoploid hybrid species, which conforms to previous analyses based on AFLPs (Fjellheim et al., 2009) and evidence of chromosomal rearrangements between A. sundingii and A. lemsii based on genomic in situ hybridization (GISH; Borgen et al., 2003). This was not evident in the nSNP analysis, potentially because of missing data resulting from the low sequence coverage commonly associated with reduced representation approaches such as GBS (Davey et al., 2011; Poland & Rife, 2012).

Thus, both nSSRs and nSNPs support the genetic differentiation of the HHS from the putative parents with a greater contribution from A. broussonetii than A. frutescens to their genetic composition. Genetic differentiation between HHS is only supported by nSSR data, although this may be an artefact of the processing of nSNP data.

3.5.4 Chloroplast haplotype patterns indicate independent origins

Previous studies of HHS have also leveraged maternally inherited chloroplast markers as a method of inferring parentage and number of origins. For example, Helianthus anomalus (Schwarzbach & Rieseberg, 2002), Hippophae gyantsensis Y.S.Lian (Wang et al., 2001) and Pinus densata Masters (Wang et al., 2001) are homoploid hybrid species that are thought to be derived from multiple hybridisation events based on the haplotype patterns identified using cp markers.

In Argyranthemum, 10 of the 16 haplotypes recovered were informative as to the parentage of the HHS because they occurred in one parental species and not the other. Of the remaining six haplotypes, two were shared between the parents and four were found in neither. Excluding uninformative haplotypes, 86 % (24/28) of A. sundingii samples have an A. broussonetii specific haplotype and 94 % (29/31) of A. lemsii samples have an A. frutescens haplotype. This pattern of haplotype distribution agrees with the restriction site phylogeny of Francisco-Ortega et al. (1996). A small number of samples of A. sundingii (four individuals) and A. lemsii (two individuals) have haplotypes that suggest the reverse parentage and this could be attributed to more recent gene flow. The most parsimonious explanation of the pattern of haplotypes identified agrees with the hypothesis that the two homoploid hybrid species currently recognised arose independently (Brochmann et al., 2000) and through crossing in different directions, although with some subsequent backcrossing.
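The proportions quoted above can be reproduced directly from the haplotype counts in Table 3.3. The sketch below assumes the grouping of haplotypes implied by the table header and the text (I to IV specific to A. broussonetii, V to X specific to A. frutescens, XI and XII shared, XIII to XVI found in neither parent); the function and variable names are hypothetical.

```python
# Haplotype positions (0-based) for columns I to XVI of Table 3.3.
GROUPS = {
    "bro": range(0, 4),       # I-IV: specific to A. broussonetii
    "fru": range(4, 10),      # V-X: specific to A. frutescens
    "shared": range(10, 12),  # XI-XII: found in both parents
    "neither": range(12, 16), # XIII-XVI: found in neither parent
}

# Counts copied from the sun and lem rows of Table 3.3.
COUNTS = {
    "sun": [0, 2, 0, 22, 0, 4, 0, 0, 0, 0, 8, 0, 0, 0, 1, 0],
    "lem": [0, 0, 0, 2, 3, 7, 11, 0, 2, 6, 3, 0, 1, 2, 0, 0],
}


def tally(counts):
    """Sum individuals per haplotype group for one taxon."""
    return {group: sum(counts[i] for i in idx) for group, idx in GROUPS.items()}


def informative_share(counts, parent):
    """Return (n with parent-specific haplotype, n informative, proportion)."""
    totals = tally(counts)
    informative = totals["bro"] + totals["fru"]
    return totals[parent], informative, totals[parent] / informative


sun_share = informative_share(COUNTS["sun"], "bro")
lem_share = informative_share(COUNTS["lem"], "fru")
print(sun_share, lem_share)
```

Excluding the shared and unassigned haplotypes leaves 28 informative A. sundingii individuals and 31 informative A. lemsii individuals, which are the denominators used in the text.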

3.5.5 What are the plants from Igueste referred to as A. broussonetii × A. frutescens?

Based on previous studies the origin(s) of populations near Barranco de Igueste (populations K and L in this study) remains unclear. In the field, plants at this site appeared morphologically intermediate between A. broussonetii and A. frutescens and considering the widespread distribution of the parental species in the Anaga peninsula, it is certainly plausible for hybridisation events to occur multiple times in different valleys (Brochmann et al., 2000). The comparison of leaf characters in this study showed that populations of A. broussonetii × A. frutescens were similar to A. frutescens. Our ecological analysis found that A. broussonetii × A. frutescens occupied intermediate habitats similar to the homoploid hybrid species, which is consistent with their distribution at intermediate altitudes. The hybrid origin of A. broussonetii × A. frutescens was supported in our PCA of nSSRs which found these populations to be intermediate between the parental species. STRUCTURE analysis of nSSRs also showed that a number of individuals in these populations were admixed, with some individuals sharing a greater proportion of their genetic composition with A. frutescens. All but two of the individuals of A. broussonetii × A. frutescens possessed a haplotype shared with A. frutescens, with one having a haplotype shared between the parents and another a haplotype found in neither, suggesting that A. frutescens is the maternal parent of these crosses. Taken together, the data suggest that individuals sampled from populations K and L are likely of hybrid origin and have subsequently introgressed with A. frutescens.

3.5.6 Approximate Bayesian Computation supports independent origins of A. sundingii and A. lemsii

Approximate Bayesian Computation offers a flexible framework to investigate different evolutionary scenarios and is particularly useful for inferring historical hybridisation events (Bertorelle et al., 2010). This approach has been used effectively in previous studies of HHS, confirming a hybrid origin of Ostryopsis intermedia (Liu et al., 2014) and demonstrating that two subspecies of Coenonympha darwiniana originated by a single hybridisation event and subsequent divergence (Capblancq et al., 2015).

Our study supports two independent hybridisation events in the origin of the homoploid hybrid species with two different subspecies of A. frutescens involved. Specifically, a hybridisation event between A. broussonetii and A. frutescens subsp. frutescens gave rise to A. sundingii and a hybridisation event between A. broussonetii and A. frutescens subsp. succulentum resulted in A. lemsii. It is also noteworthy that the only other scenario which had any level of support (scenario 3) also involved two independent hybridisation events, providing confidence in the hypothesis that the two homoploid hybrid species currently recognised arose independently.

While there is support for scenario 1 being the most likely, the model does not fit the observed data particularly well. This is apparent in the posterior values of parameters for scenario 1, which suggest that A. frutescens subsp. frutescens contributed a greater proportion of the genome to A. sundingii relative to A. broussonetii: the relative genomic contribution of A. broussonetii to the hybrid species would be expected to be greater based on our STRUCTURE analyses (Figure 3.4 B; Figure 3.5 B). In addition, we would expect the split between A. broussonetii and A. frutescens to be ancestral to the subspecies of A. frutescens. Despite our tree topology showing this relationship the DIYABC analysis still provides the alternate order of events. The poor fit of the model could be attributed to the simplicity of our model design, which involved two discrete hybridisation events. In reality, it is likely that there would have been some degree of backcrossing, particularly during the early stages of divergence. Indeed the identification of hybrid populations near Barranco de Igueste (populations K and L in the present study), which appear to be backcrossing with A. frutescens, and the unequal genomic contributions of the parental species to the homoploid hybrid species (Figure 3.4 B; Figure 3.5 B) indicate that occasional gene flow is likely. It is not possible to model gene flow using DIYABC and while there is software available that permits the simulation of gene flow over time (Excoffier & Foll, 2011), it was not possible to use this due to the amount of missing data, a problem often associated with reduced representation datasets. Despite the simplicity and the limitations of the analysis, two independent hybridisation events appear to have resulted in the origin of the homoploid hybrid species in Argyranthemum, consistent with the other data presented. Future work should attempt to incorporate more complex models that include gene flow as opposed to discrete admixture events, which might allow for more accurate estimation of model parameters.

Conclusions

From a morphological perspective, A. sundingii and A. lemsii are distinct from their putative parents. Although both are similar in appearance, the homoploid hybrid species can be distinguished from each other based on leaf area. Each homoploid hybrid species occupies a novel ecological niche with respect to its parents and to the other hybrid species. However, given the proximity of populations and climatic niche overlap, populations of the homoploid hybrid species are unlikely to be completely reproductively isolated from the parents based on ecological isolation alone. Our genetic analyses also demonstrate that A. sundingii and A. lemsii originated by independent hybridisation events. As such, Argyranthemum represents an example of independent homoploid hybrid speciation events, with evidence of divergence in morphology and parallel adaptation to novel intermediate ecological niches.

The molecular data provided support for the independent hybrid origin of A. sundingii and A. lemsii. Whether or not they represent hybrid species depends largely on the definition employed. Schumer et al. (2014) provided stringent criteria for a homoploid hybrid species: evidence of reproductive isolation of hybrid lineages from the parental species, evidence of hybridisation in the genome, and evidence that reproductive isolation is a consequence of hybridisation. These criteria have been debated in subsequent papers (Nieto Feliner et al., 2017; Schumer et al., 2018) as only Helianthus sunflowers and Heliconius butterflies satisfy all three criteria, leading to a potential underestimate of the importance of HHS in evolution. Evidence of reproductive isolation and reproductive isolation derived from hybridisation are lacking from most putative examples of HHS (Schumer et al., 2014). In this study we have demonstrated the hybrid origin of A. sundingii and A. lemsii as well as evidence of adaptation to intermediate altitudes, which likely confers some, albeit incomplete, reproductive isolation from their parental progenitors. However, it is as yet unclear whether or not hybridisation was directly responsible for the occupation of a novel habitat. Reciprocal transplant experiments testing whether the hybrid species are fitter than the parental species in intermediate habitats would allow this to be investigated.

Identifying the genetic mechanisms responsible for HHS is an important consideration for any future work. Jiggins et al. (2008) and Salazar et al. (2010) distinguished two different models of HHS: hybrid trait speciation and mosaic genome hybrid speciation. The former involves the introgression of a small number of adaptive loci (so-called magic traits) which confer reproductive isolation through a novel adaptation, whereas the latter involves the stabilisation of the hybrid genome and reproductive isolation by intrinsic incompatibilities. It is not clear which mechanism may be acting in Argyranthemum and the two are not necessarily mutually exclusive. The homoploid hybrid species in Argyranthemum appear to have inherited a larger proportion of the genome from A. broussonetii than A. frutescens, so it is plausible that the homoploid hybrid species have inherited a limited number of adaptive loci from A. frutescens, although this would need to be demonstrated experimentally. Indeed, future studies in Argyranthemum or other examples of HHS should attempt to identify the genetic changes responsible for the morphological differences and/or ecological adaptation in the homoploid hybrid species relative to the parents. One approach would be to use transcriptomes to identify differentially expressed transcripts in the homoploid hybrid species relative to the parents.

Interspecific hybridisation is frequent in island floras (Howarth & Baum, 2005; Carine et al., 2007; Friar et al., 2008; Lindhardt et al., 2009; Jones et al., 2014). This is likely the result of endemic lineages possessing little intrinsic reproductive isolation and the dynamic ecological landscape of oceanic islands, with natural and anthropogenic disturbances bringing previously isolated species into contact (Francisco-Ortega et al., 2000; van Hengstum et al., 2012; Crawford & Stuessy, 2016; Kerbs et al., 2017). In addition, the variety of distinctive habitats in close proximity on oceanic islands offers the opportunity for the establishment of hybrids free from competition with their parental progenitors. Although the evolutionary consequences of hybridisation are varied, there is increasing evidence that hybridisation within island lineages has promoted diversification (Howarth & Baum, 2005; Friar et al., 2008). Given the propensity for hybridisation in island lineages as well as the difficulty associated with homoploid hybrid species identification, it is plausible that HHS is more common in island lineages than we currently think and the processes that we document here may have played a much greater role in generating novel diversity in island plant radiations than we currently appreciate.


Chapter 4 Geographical isolation, habitat shifts, hybridisation and convergent evolution in the diversification of the Macaronesian endemic genus Argyranthemum (Asteraceae)


Summary

• Geographical isolation, habitat shifts and hybridisation have all been implicated in generating the rich endemic floras of oceanic islands. However, inferences have often been limited due to a lack of genetic variation in commonly used molecular markers.
• In this study we employ Genotyping-By-Sequencing (GBS) to investigate the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of Argyranthemum Webb (Asteraceae), the largest genus of flowering plants endemic to the Macaronesian archipelagos of the North Atlantic Ocean.
• Phylogenetic relationships were investigated and ancestral state reconstruction was performed to differentiate patterns of diversification related to both geographic isolation and habitat shifts. D-statistics (ABBA-BABA tests) identified evidence of hybridisation between lineages co-occurring on the same island but found little support for the hypothesis that hybridisation may be responsible for the occurrence of non-monophyletic multi-island endemic (MIE) species.
• Geographic isolation, habitat shifts and hybridisation have all contributed to the diversification of Argyranthemum, with morphological convergence also proposed to explain the occurrence of non-monophyletic MIE species. This study reveals greater complexity in the processes responsible for the diversification of this endemic oceanic island genus than was previously thought.

Introduction

The Macaronesian archipelagos, comprising the Azores, Madeira, Selvagens, the Canary Islands and the Cape Verdes in the North Atlantic Ocean, are home to approximately 30 endemic genera and 900 endemic species of flowering plants (Caujapé-Castells et al., 2010). Diversification of lineages within the region has played a prominent role in generating the striking levels of endemic diversity observed and Macaronesia has been considered a region ideally suited to investigations of the processes responsible for flowering plant evolution (Kim et al., 2008). Hypotheses to explain diversification of plant lineages within Macaronesia have largely focussed on geographical isolation (i.e. speciation through isolation following inter-island dispersal between similar ecological niches) and habitat shifts (i.e. speciation associated with the shifts of a lineage to different ecological niches; Francisco-Ortega et al., 2002; Lee et al., 2005; Trusty et al., 2005; Goodson et al., 2006). Hybridisation has also been recognised as an important process for diversification in Macaronesia, although much less is known about its frequency and evolutionary significance (Jones et al., 2014; Curto et al., 2017). The occurrence of hybridisation has been inferred from incongruence between chloroplast (cp) and nuclear data sets or by the use of D-statistics (also termed ABBA-BABA tests), which identify discordant alleles in four-taxon pectinate trees as a result of hybridisation (Eaton & Ree, 2013). For example, Jones et al. (2014) proposed hybridisation as an explanation for incongruence between cp and nuclear data in the phylogenetic placement of Pericallis appendiculata (L.f) B.Nord., which was resolved as monophyletic in their chloroplast analysis but polyphyletic in their nuclear (ITS) analysis, whilst Curto et al. (2017) proposed hybridisation as a possible explanation for the non-monophyly of Micromeria canariensis (P.Pérez) Puppo in Tenerife based on D-statistics.

A significant impediment to our understanding of the relative contribution of these processes in generating diversity on oceanic islands has been the lack of resolution in phylogenies based either on ITS or a few chloroplast markers (Francisco-Ortega et al., 1997b; Allan et al., 2004; Goodson et al., 2006). To overcome this, recent studies have employed high-throughput sequencing (HTS) approaches, particularly reduced representation methods such as restriction site associated DNA sequencing (RAD-seq) and Genotyping-By-Sequencing (GBS). These methods identify thousands of presumptively neutral polymorphic markers across multiple samples and have been employed successfully in the resolution of phylogenetic relationships between closely related taxa in Macaronesia, notably Tolpis Adans. (Mort et al., 2015) and Micromeria Benth. (Puppo et al., 2015), as well as taxa in other island systems (e.g. Paun et al., 2016). In each of these genera, the reduced representation data sets considerably improved the resolution of species relationships and allowed greater inference of the processes responsible for diversification.

Argyranthemum Webb (Asteraceae) is the largest endemic genus of flowering plants found in the Macaronesian archipelagos, with a total of 24 species and 39 terminal taxa (species and subspecies; Figure 4.1). The 24 species are endemic to Madeira (3 species), Selvagem Pequena (1) and the Canary Islands (20); 21 of them are single island endemics (SIEs). Three species are multiple island endemics (MIEs; A. frutescens, A. broussonetii and A. adauctum) but in each case, subspecies of these are SIEs. Argyranthemum is present in all the major habitat zones in Macaronesia, ranging from coastal desert, sclerophyllous zone, laurel forest and pine savannah to high-altitude desert (Figure 4.1). Most terminal taxa are found in a single habitat type except for A. haouarytheum, A. hierrense, A. webbii and A. adauctum subsp. adauctum, which are each found in multiple habitat types (Figure 4.1).

Figure 4.1 - (A) Madeira, Selvagem Pequena and Canary Islands in the North Atlantic Ocean, the taxa occurring on each island sampled and the habitats they occupy. (B) Simplified diagram of habitat types. Abbreviations for habitat types are as follows: coastal desert 0 – 300 m (CD), sclerophyllous forest 300 – 600 m (SF), laurel forest 600 – 1500 m on North-facing slopes (LF), pine forest 600 – 2000 m on South-facing slopes and 1500 – 2000 m on North-facing slopes (PF) and high altitude desert 2000 – 3700 m (HD).

Previous phylogenetic analyses using small numbers of molecular markers have found that Argyranthemum is monophyletic and closely related to the continental genera Glebionis Cass., Heteranthemis Schott and Ismelia Cass., which are distributed in the Mediterranean, southern Iberia and Morocco respectively (Francisco-Ortega et al., 1995a,b; Oberprieler et al., 2007). However, attempts to resolve species relationships within Argyranthemum have been hampered by a lack of genetic variation (Francisco-Ortega et al., 1997a). A chloroplast restriction site analysis identified two main clades, one restricted to Madeira and the Selvagens and one comprising taxa from the Canary Islands (Francisco-Ortega et al., 1996b). Within the Canary Islands clade, two major groups were resolved, one largely corresponding to taxa under the influence of the northern trade winds and the other not, suggesting that inter-island colonisation between similar ecological habitats was the prominent driver of diversification in the Canary Islands (Francisco-Ortega et al., 1996b). In contrast, habitat shifts were more frequent on Madeira, where there was a single colonisation event followed by shifts into the different habitat types available.

Hybridisation is also likely to have played a significant role in the diversification of Argyranthemum. Intrinsic reproductive barriers between taxa are weak, with reproductive isolation largely due to geographical and ecological differentiation (Francisco-Ortega et al., 1997a). Artificial hybrids can be created with ease in cultivation (Humphries, 1973) and hybrid swarms have been documented between A. broussonetii and A. frutescens (Tenerife; Brochmann, 1987), A. coronopifolium and A. frutescens (Tenerife; Brochmann, 1984), A. adauctum and A. filifolium (Gran Canaria; Borgen, 1976) and A. tenerifae and A. adauctum (Tenerife; White, pers. obs.). Hybridisation has also been inferred from the presence of polyphyletic taxa in previous phylogenetic analyses, notably A. adauctum (Francisco-Ortega et al., 1996b). Furthermore, it has been hypothesised that A. escarrei is of hybrid origin based on morphological similarity to individuals collected from hybrid swarms between A. adauctum subsp. canariense and A. filifolium (Borgen, 1976), but this has yet to be tested. To date, the most robust support for the role of hybridisation in the diversification of Argyranthemum relates to A. sundingii and A. lemsii, for which molecular data support independent homoploid hybrid origins from crosses between A. broussonetii and A. frutescens (Brochmann et al., 2000; Fjellheim et al., 2009; White et al., 2018).

The aim of this study was to investigate the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of Argyranthemum. To this end, we present a phylogenetic analysis of Argyranthemum using GBS and sampling across the entire genus. Using the resulting phylogeny, ancestral states for islands and habitats were estimated to infer patterns of divergence related to geographic isolation and/or habitat shifts. To detect evidence of hybridisation we employed D-statistics (Eaton & Ree, 2013). We specifically aimed to test two hybridisation hypotheses: first, that hybridisation between species on the same island has been frequent as proposed for Micromeria by Curto et al. (2017) and second, that hybridisation explains the non-monophyly of multi-island endemic species as proposed for Pericallis by Jones et al. (2014).

Methods

4.3.1 Sampling

Field collections of Argyranthemum were carried out in the Canary Islands and Madeira from July to August 2015 and July 2016 respectively whereas the outgroup taxa Glebionis were collected in Andalucía, Spain during April 2015. Leaf material was dried and preserved using silica gel for molecular work. Herbarium vouchers were also prepared and these are deposited at the Natural History Museum, London (BM). Collections from the Selvagens were provided by M. Menezes de Sequeira.

Leaf material of all Argyranthemum taxa recognised by Humphries (1976) and Francisco-Ortega et al. (1996b) was included in our study (Table 4.1) with the exception of A. sundingii and A. lemsii, which are known to be of hybrid origin (Brochmann et al., 2000; Fjellheim et al., 2009; White et al., 2018) and likely to confound phylogenetic analyses (Gruenstaeudl et al., 2017; McVay et al., 2017). Accessions of “A. vincentii” were included in our sampling despite the fact that this taxon name is yet to be described formally. “Argyranthemum vincentii” has been included in previous studies (Francisco-Ortega et al., 1996c, 2000) and can be distinguished by filiform leaf lobes and green (not glaucous) leaves (A. Santos-Guerra, pers. comm.). Where possible we included two samples for each terminal taxon, ideally from different populations. However, only one sample could be included for A. haemotomma, A. thalassophilum and for each subspecies of A. pinnatifidum. Additionally, for each of the following taxa, both samples came from the same population: A. adauctum subsp. dugourii, subsp. erythrocarpon, subsp. palmensis, A. frutescens subsp. canariae, A. lidii, A. sventenii, “A. vincentii” and A. winteri. The outgroup taxa, G. segetum and G. coronaria from the Mediterranean region, were selected on the basis of earlier phylogenetic analyses (Francisco-Ortega et al., 1995a,b; Oberprieler et al., 2007). Specimen details are provided in Table 4.1.

Table 4.1 - Leaf sampling for Genotyping-By-Sequencing (GBS) with collection reference (Ref), country or island of origin (Isl.), habitat type (Hab.), location, altitude, latitude/longitude coordinates, leaf and representative voucher specimen barcodes and collector details. Country or island of origin is abbreviated as SP (Spain), TE (Tenerife), GC (Gran Canaria), EH (El Hierro), LP (La Palma), LG (La Gomera), MA (Madeira), LA (Lanzarote), SE (Selvagem Pequena) and FU (Fuerteventura). Habitat types are abbreviated as ME (Mediterranean basin), PF (Pine forest), LF (Laurel forest), SZ (Sclerophyllous zone), CD (Coastal desert) and HD (High altitude desert). Barcodes are for leaf and voucher specimens deposited at the Natural History Museum London (BM).

Taxa | Collection | Isl. | Hab. | Location | Leaf | Voucher
G. coronaria | White et al. 797 | ME | ME | Andalucia, province of Cádiz | BM001092800 | BM013407815
G. segetum | White et al. 796 | ME | ME | Andalucia, province of Cádiz | BM001092799 | BM013407814
A. adauctum subsp. adauctum | White et al. 120 | TE | PF | Between La Laguna and Las Cañadas del Teide | BM010765622 | BM000828632
A. adauctum subsp. adauctum | White et al. 135 | TE | LF | Valle de la Ortova, TF-21 | BM010765636 | BM000828618
A. adauctum subsp. canariense | White et al. 363 | GC | PF | GC-600 south of Presa de la Siberia | BM010765864 | BM000828531
A. adauctum subsp. canariense | White et al. 366 | GC | PF | South east of Presa de la Siberia | BM010765867 | BM000828528
A. adauctum subsp. dugourii | White et al. 163 | TE | PF | Off-road track north east of Vilaflor | BM010765664 | BM000828592
A. adauctum subsp. dugourii | White et al. 166 | TE | PF | Off-road track north east of Vilaflor | BM010765667 | BM000828589
A. adauctum subsp. erythrocarpon | White et al. 43 | EH | LF | HI-1 between El Brezal and El Salvador | BM010765545 | BM000828709
A. adauctum subsp. erythrocarpon | White et al. 44 | EH | LF | HI-1 between El Brezal and El Salvador | BM010765546 | BM000828708
A. adauctum subsp. gracile | White et al. 356 | GC | PF | GC-60 between La Plata and Agualatente | BM010765857 | BM000828534
A. adauctum subsp. gracile | White et al. 360 | GC | PF | GC-60 south of Risco las Candelillas | BM010765861 | BM000828533*
A. adauctum subsp. jacobaeifolium | White et al. 368 | GC | LF | South of Valsendero | BM010765869 | BM000828526
A. adauctum subsp. jacobaeifolium¹ | White et al. 370 | GC | LF | Near to La Laguna | BM010765871 | BM000828524
A. adauctum subsp. palmensis | White et al. 58 | LP | LF | Walk from Los Tilos to Marcos y Corderos | BM010765560 | BM000828694
A. adauctum subsp. palmensis | White et al. 62 | LP | LF | Walk from Los Tilos to Marcos y Corderos | BM010765564 | BM000828690
A. broussonetii subsp. broussonetii | White et al. 157 | TE | LF | Roques del Fraile | BM010765658 | BM000828598
A. broussonetii subsp. broussonetii | White et al. 494 | TE | LF | La Cumbrilla, Anaga | BM010765995 | BM000828683*
A. broussonetii subsp. broussonetii | White et al. 552 | TE | LF | Path to Mesa del Sabinal, Anaga | BM010766053 | BM000828674*
A. broussonetii subsp. broussonetii | White et al. 664 | TE | LF | Chamorga, Anaga | BM010766164 | BM000828483*
A. broussonetii subsp. broussonetii | White et al. 674 | TE | LF | Chamorga, Anaga | BM010766174 | BM000828482*
A. broussonetii subsp. broussonetii | White et al. 719 | TE | LF | Las Casas de la Cumbre, Anaga | BM010766218 | BM000828476*
A. broussonetii subsp. broussonetii | White et al. 749 | TE | LF | Barranco de Valle Crispin, Anaga | BM010766248 | BM000828668*
A. broussonetii subsp. broussonetii | White et al. 768 | TE | LF | Barranco de Valle Crispin, Anaga | BM010766267 | BM000828667*
A. broussonetii subsp. gomerensis | White et al. 110 | LG | LF | CV-5 between Las Rosas and La Palmita | BM010765612 | BM000828642
A. broussonetii subsp. gomerensis | White et al. 112 | LG | LF | South of La Palmita | BM010765614 | BM000828640
A. callichrysum | White et al. 95 | LG | SZ | Valley below TF-713 in Barranco de la Guancha | BM010765597 | BM000828657
A. callichrysum | White et al. 97 | LG | SZ | Roque de Agando | BM010765599 | BM000828655
A. coronopifolium | White et al. 80 | TE | SZ | Chinamada, Anaga | BM010765582 | BM000828672
A. coronopifolium | Graham et al. 107b | TE | SZ | Anaga, Afur Roque Marubial | BM001092356 | BM000828856
A. dissectum | Graham et al. 13 | MA | SZ | Fajã dos Padres above cable car station | BM001092072 | BM000828763
A. dissectum | Graham et al. 19 | MA | SZ | By tunnel entrance near Fajã da Ovelha | BM001092082 | BM000828769
A. escarrei | White et al. 335 | GC | SZ | GC-200 between La Playa and Tirma | BM010765836 | BM000828542
A. escarrei | White et al. 338 | GC | SZ | GC-200 near Degollada del la Aldea | BM010765839 | BM000828542*
A. filifolium | White et al. 344 | GC | SZ | GC-200 north of Morgán | BM010765845 | BM000828540
A. filifolium | White et al. 346 | GC | SZ | Barranco de Fataga | BM010765847 | BM000828538
A. foeniculaceum | White et al. 142 | TE | SZ | TF-436 between Las Portelas and Masca | BM010765643 | BM000828611
A. foeniculaceum | White et al. 144 | TE | SZ | TF-436 between Masca and Santiago del Teide | BM010765645 | BM000828609
A. frutescens subsp. canariae | White et al. 319 | GC | CD | North of La Atalaya | NA | BM000828553
A. frutescens subsp. canariae | White et al. 320 | GC | CD | North of La Atalaya | NA | BM000828552
A. frutescens subsp. foeniculaceum | White et al. 107 | LG | CD | TF-712 through Barranco del Valle | BM010765609 | BM000828645
A. frutescens subsp. foeniculaceum | White et al. 116 | LG | CD | Agulo | BM010765618 | BM000828636
A. frutescens subsp. frutescens | White et al. 567 | TE | CD | Maria Jiménez, Anaga | BM010766068 | BM000828558*
A. frutescens subsp. frutescens | White et al. 585 | TE | CD | Maria Jiménez, Anaga | BM010766086 | BM000828557*
A. frutescens subsp. frutescens | White et al. 611 | TE | CD | Barranco del Cercado de Andrés, Anaga | BM010766112 | BM000828514*
A. frutescens subsp. frutescens | White et al. 620 | TE | CD | Barranco del Cercado de Andrés, Anaga | BM010766120 | BM000828513*
A. frutescens subsp. gracilescens | White et al. 177 | TE | CD | TF-625 above Poris de Abona | BM010765678 | BM000828578
A. frutescens subsp. gracilescens | White et al. 179 | TE | CD | Road near to Arafo | BM010765680 | BM000828576
A. frutescens subsp. parviflorum | White et al. 101 | LG | SZ | Calle la Lajita north of Aeropuerto de GO | BM010765603 | BM000828651
A. frutescens subsp. parviflorum | White et al. 92 | LG | SZ | Barranco del Revolcadero above San Sebastian | BM010765594 | BM000828660
A. frutescens subsp. pumilum¹ | White et al. 326 | GC | CD | Alongside GC-200, north of Laja del Risco | BM010765827 | BM000828551
A. frutescens subsp. pumilum¹ | White et al. 329 | GC | CD | Overlooking Laja del Risco | BM010765830 | BM000828548
A. frutescens subsp. succulentum | White et al. 229 | TE | CD | Between Almáciga and Roque Bermejo, Anaga | BM010765730 | BM000828575*
A. frutescens subsp. succulentum | White et al. 234 | TE | CD | Between Almáciga and Roque Bermejo, Anaga | BM010765735 | BM000828574*
A. frutescens subsp. succulentum¹ | White et al. 242 | TE | CD | Roque Bermejo | BM010765743 | BM000828732*
A. frutescens subsp. succulentum | White et al. 244 | TE | CD | Roque Bermejo | BM010765745 | BM000828731*
A. gracile | White et al. 169 | TE | SZ | TF-38 above Chío and Guía de Isora | BM010765670 | BM000828586
A. gracile | White et al. 172 | TE | SZ | TF-82 above Tijoco Bajo | BM010765673 | BM000828583
A. haemotomma | Graham et al. 15 | MA | CD | Path between Prazeres and Paul do Mar | BM000828765 | BM001092077
A. haouarytheum | White et al. 56 | LP | PF | LP-2 approximately two km north of El Charco | BM010765558 | BM000828696
A. haouarytheum | White et al. 57 | LP | SZ | Walk below Volcan de San Antonio | BM010765559 | BM000828695
A. hierrense | White et al. 38 | EH | CD | HI-50 east of Sabinosa | BM010765540 | BM000828714
A. hierrense | White et al. 47 | EH | SZ | HI-15 approx. 1km north of Villa de Valverde | BM010765549 | BM000828705
A. lidii | White et al. 321 | GC | SZ | Amagro | BM010765822 | BM000828547*
A. lidii | White et al. 325 | GC | SZ | Amagro | BM010765826 | BM000828546*
A. maderense | White et al. 775 | LA | SZ | Haría, above Barranco de Temisa | BM010766274 | BM000828473
A. maderense | White et al. 777 | LA | SZ | Haría, Vueltas de Malpaso | BM010766276 | BM000828471
A. pinnatifidum subsp. montanum | Graham et al. 25 | MA | LF | On path from Pico do Arieiro to Pico Ruivo | BM001092094 | BM000828775
A. pinnatifidum subsp. pinnatifidum | Graham et al. 38 | MA | LF | West of Encumeada Just after third road tunnel | BM001092116 | BM000828788
A. pinnatifidum subsp. succulentum | Graham et al. 3 | MA | CD | Ponta de São Lourenço | BM001092048 | BM000828753
A. sventenii | Graham et al. 119a | EH | SZ | By main road to Restinga | BM001092381 | BM000828870
A. sventenii | Graham et al. 119b | EH | SZ | By main road to Restinga | BM001092382 | BM000828870
A. tenerifae¹ | White et al. 131 | TE | HD | TF-24 below to Observatorio del Teide | BM010765632 | BM000828622
A. tenerifae | White et al. 159 | TE | HD | TF-24, Cañadas del Teide | BM010765660 | BM000828596
A. tenerifae | White et al. 160 | TE | HD | TF-24, Cañadas del Teide | BM010765661 | BM000828595
A. tenerifae¹ | White et al. 564 | TE | HD | Cañadas - walk to Refuge | BM010766065 | BM000828596*
A. thalassophilum | Filipe Silva | SE | CD | Selvagem Pequena | NA | UMad s/n
A. vincentii | White et al. 123 | TE | PF | Barranco de la Gota near to TF-523, above Arafo | BM010765625 | BM000828629
A. vincentii | White et al. 125 | TE | PF | Barranco de la Gota near to TF-523, above Arafo | BM010765627 | BM000828627
A. webbii | White et al. 49 | LP | SZ | road near Lomo los Machines | BM010765551 | BM000828703
A. webbii | White et al. 50 | LP | LF | LP-1 between Llano Negro and Roque del Faro | BM010765552 | BM000828702
A. winteri¹ | White et al. 794 | FU | SZ | Pájara, Jandia, Pico de La Zarza | BM001092793 | NA
A. winteri | White et al. 795 | FU | SZ | Pájara, Jandia, Pico de La Zarza | BM001092794 | NA

¹ Samples were removed for having less than 500,000 filtered GBS reads.
* If a voucher for this leaf sample is not available, a representative voucher from the same population is given.
White et al. refers to collections made by O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra, G. Torre and M. Olangua-Corral.
Graham et al. refers to collections made by R. Graham, M. Carine and M. Menezes de Sequeira.

4.3.2 DNA isolation and GBS

DNA was extracted from silica-dried leaf material using a modified CTAB method (Doyle & Doyle, 1987) identical to the method used by White et al. (2016, 2018). DNA samples were sent to the Genomic Diversity Facility at Cornell University for Genotyping-By-Sequencing (GBS), where samples were digested using EcoT22I and single-end 100 bp reads were generated using an Illumina HiSeq.

4.3.3 Processing of GBS data

Reads generated by GBS were processed and assembled using ipyrad (version 0.7.15; Eaton & Ree, 2013) employing the same assembly method as White et al. (2018). Briefly, the sequence quality of raw GBS reads was assessed and low-quality reads were filtered. Samples with fewer than 500,000 filtered reads were removed prior to clustering within samples at three different similarity thresholds (80 %, 85 % and 90 %). Clusters were aligned and heterozygosity and error rate were estimated before calling consensus base sequences. Clusters that mapped to the chloroplast genomes of Arabidopsis thaliana (L.) Heynh. (Genbank accession number: NC_000932.1), Helianthus annuus L. (NC_007977.1) or Chrysanthemum indicum L. (NC_020320.1) and the mitochondrial genomes of A. thaliana (NC_001284.2) or H. annuus (KF815390.1) were removed at this step. Consensus sequences were then clustered and aligned across samples using the same thresholds as above. These alignments formed loci that were then filtered based on the number of samples present and the maximum number of shared polymorphic sites in a locus. We employed two levels of filtering for the number of samples required for a locus. Specifically, a locus must have included at least 30 or 38 samples, which is equivalent to a maximum of 60 % and 50 % missing data respectively after the removal of low-quality samples (see Results; Processing of GBS data). Additionally, loci with shared heterozygous sites across more than 20 % of the samples were removed as likely paralogs. Therefore, a total of six assemblies were produced, with varying parameters for read clustering similarity threshold (80 %, 85 % or 90 %) and the minimum number of samples required to process a locus (30 or 38). Parameters used for each assembly can be found in Supplementary Table C.1.
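The relationship between the minimum sample number and the permitted level of missing data, and the resulting grid of six assemblies, can be illustrated with a short sketch. This is not the ipyrad configuration used in the study. The figure of 76 retained samples is an assumption inferred from Table 4.1 (the exact number is reported in the Results), and the function and variable names are hypothetical.

```python
from itertools import product

# Assumption: 76 samples remain after low-read samples are removed
# (inferred from Table 4.1, not stated in this section).
N_SAMPLES = 76
CLUSTER_THRESHOLDS = (0.80, 0.85, 0.90)  # read clustering similarity
MIN_SAMPLES_PER_LOCUS = (30, 38)         # minimum samples required per locus


def max_missing_fraction(min_samples, n_samples=N_SAMPLES):
    """Largest fraction of samples that may be absent from a retained locus."""
    return 1 - min_samples / n_samples


def assembly_grid():
    """Enumerate every (clustering threshold, minimum samples) combination."""
    return [
        {
            "clust_threshold": threshold,
            "min_samples": min_samples,
            "max_missing": max_missing_fraction(min_samples),
        }
        for threshold, min_samples in product(
            CLUSTER_THRESHOLDS, MIN_SAMPLES_PER_LOCUS
        )
    ]
```

Under the assumed sample count, a minimum of 38 samples corresponds to exactly 50 % missing data and a minimum of 30 samples to approximately 60 %.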

4.3.4 Assembly comparison

Principal components analysis (PCA) and STRUCTURE (Pritchard et al., 2000) were both used to identify and compare genetic clusters to investigate the robustness of our assemblies to the varying parameter values. Assembled loci were filtered using vcftools (Danecek et al., 2011) for a minimum minor allele frequency of 0.05 while selecting the first SNP from each locus to avoid the confounding influence of linkage. PCA was performed using PLINK v.1.9 (Purcell et al., 2007) and the most likely number of clusters was estimated using the R package mclust (Scrucca et al., 2016; R Core Team, 2018). STRUCTURE was used to test K from 1 to 10, with 10 runs for each K, each of 50,000 replicates after a burn-in of 20,000 iterations. The most likely K was then determined using the delta K method of Evanno et al. (2005) with STRUCTURE HARVESTER (Earl & vonHoldt, 2012). For each accession, the proportion of membership in each of the clusters was determined using CLUMPP (Jakobsson & Rosenberg, 2007). PCA and STRUCTURE plots were generated using ggplot2 in R (Wickham, 2009; R Core Team, 2018).
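The delta K calculation can be sketched as follows. This is a minimal illustration of the statistic as it is commonly summarised (the absolute second-order difference of the mean log-likelihood across runs, divided by the standard deviation of the log-likelihood at that K). It is not the STRUCTURE HARVESTER code, and the function name and input layout are assumptions.

```python
from statistics import mean, stdev


def delta_k(log_likelihoods):
    """Return delta K for every K that has both neighbours K-1 and K+1.

    log_likelihoods maps each K to the list of Ln P(D) values obtained
    from its replicate STRUCTURE runs (at least two runs per K).
    """
    mean_l = {k: mean(values) for k, values in log_likelihoods.items()}
    result = {}
    for k in sorted(log_likelihoods):
        if k - 1 not in mean_l or k + 1 not in mean_l:
            continue  # the second-order difference is undefined at the ends
        second_diff = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
        spread = stdev(log_likelihoods[k])
        if spread > 0:  # skip K values with no variation between runs
            result[k] = second_diff / spread
    return result
```

The K with the largest delta K is taken as the most likely number of clusters. Note that the statistic cannot be evaluated for the smallest or largest K tested.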

4.3.5 Phylogenetic reconstruction

Loci were concatenated and missing data were added as Ns to create a supermatrix. The optimal model of sequence evolution was identified using ModelTest-NG v.0.1.3 (https://github.com/ddarriba/modeltest). Maximum likelihood (ML) and Bayesian Inference (BI) trees were generated for each data set using RAxML Next Generation v.0.5.1 (RAxML-NG; Kozlov et al., 2018) and MrBayes v.3.2.6 (Ronquist et al., 2012), respectively. The best ML tree was selected after 1000 independent searches and bootstrap values were calculated from 1000 replicates. The BI analysis had two runs with four chains, 1 × 10⁶ generations and a sampling frequency of 5000 before computing a 50 % majority rule consensus tree. Trees were visualised using ggtree in R (Yu et al., 2017; R Core Team, 2018).
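The concatenation step can be sketched as follows. This is a minimal illustration and not the export routine used for the thesis; the function name build_supermatrix and the sample names are hypothetical.

```python
def build_supermatrix(loci, samples):
    """Concatenate aligned loci, padding absent samples with N.

    loci    -- list of dicts mapping sample name -> aligned sequence
    samples -- ordered list of all sample names
    """
    matrix = {name: [] for name in samples}
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a locus must share one aligned length")
        width = lengths.pop()
        for name in samples:
            # A sample absent from this locus contributes a run of Ns.
            matrix[name].append(locus.get(name, "N" * width))
    return {name: "".join(parts) for name, parts in matrix.items()}


# Hypothetical two-locus example with one sample missing from each locus.
loci = [
    {"s1": "ACGT", "s2": "ACGA"},
    {"s2": "TTG", "s3": "TAG"},
]
supermatrix = build_supermatrix(loci, ["s1", "s2", "s3"])
for name, seq in supermatrix.items():
    print(name, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building software expects from a concatenated alignment.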

4.3.6 Ancestral State Reconstruction

To investigate colonisation events and habitat shifts, we estimated ancestral states for islands and habitat types using the ML ancestral state estimation method for discrete characters in the R package ape (Paradis et al., 2004; R Core Team, 2018). We estimated ancestral states using our ML tree based on a clustering threshold of 90 % and a minimum sample number of 30 (see Results; Phylogenetic reconstruction). Habitat types were defined as: coastal desert (0 – 300 m), sclerophyllous zone (300 – 600 m), laurel forest (600 – 1500 m on North-facing slopes), pine forest (600 – 2000 m on South-facing slopes and 1500 – 2000 m on North-facing slopes) and high altitude desert (2000 – 3700 m; Figure 4.1 B) following Humphries (1976) and Jones et al. (2014).
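The habitat definitions above can be written as a simple lookup. In this sketch the function name habitat_type is hypothetical, and because the quoted altitude ranges share boundary values, each band is assumed to include its lower bound and exclude its upper bound.

```python
def habitat_type(altitude_m, aspect):
    """Assign a habitat type from altitude (m) and slope aspect.

    Assumption: each altitude band includes its lower bound and excludes
    its upper bound, since the ranges in the text share boundary values.
    """
    if aspect not in ("north", "south"):
        raise ValueError("aspect must be 'north' or 'south'")
    if not 0 <= altitude_m <= 3700:
        raise ValueError("altitude outside the 0-3700 m range of the scheme")
    if altitude_m < 300:
        return "coastal desert"
    if altitude_m < 600:
        return "sclerophyllous zone"
    if altitude_m < 1500:
        # Between 600 and 1500 m the habitat depends on slope aspect.
        return "laurel forest" if aspect == "north" else "pine forest"
    if altitude_m < 2000:
        return "pine forest"
    return "high altitude desert"


print(habitat_type(800, "north"), "|", habitat_type(800, "south"))
```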

4.3.7 D-statistics

We employed D-statistics as outlined by Eaton and Ree (2013) to test for evidence of hybridisation. Each test takes a four-taxon pectinate tree denoted by (((P1,P2),P3),O) and identifies incongruent ancestral (A) and derived (B) alleles denoted as ABBA or BABA. Incongruence can be caused by incomplete lineage sorting (ILS) or hybridisation. If ILS is responsible, we would expect the proportion of ABBA and BABA alleles to be equal. However, if P3 has hybridised with either P2 or P1, we would expect an asymmetry in the number of ABBA or BABA alleles respectively (Figure 4.2 A; B). The D-statistic quantifies the asymmetry of ABBA and BABA allele frequencies. Following Eaton and Ree (2013), we performed 1000 bootstrap iterations to measure the standard deviation of the D-statistic, in which loci were re-sampled with replacement to the same number as in the original data set. The results are reported as Z-scores, where Z is the number of standard deviations from 0 (the expected value) for D. Significance was assessed by converting the Z-score into a two-tailed p-value and using 0.01 as a conservative cut-off after correcting for multiple comparisons using Holm-Bonferroni correction.
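The logic of the test can be sketched in a few functions. This is not the ipyrad implementation: the function names are hypothetical, the per-locus frequencies are simulated rather than thesis data, and the p-value assumes that Z follows a standard normal distribution.

```python
import math
import random
from statistics import stdev


def d_statistic(abba, baba):
    """D = (sum ABBA - sum BABA) / (sum ABBA + sum BABA) over loci."""
    total_abba, total_baba = sum(abba), sum(baba)
    return (total_abba - total_baba) / (total_abba + total_baba)


def bootstrap_z(abba, baba, n_boot=1000, seed=1):
    """Resample loci with replacement; return (D, bootstrap SD, Z)."""
    rng = random.Random(seed)
    n_loci = len(abba)
    boots = []
    for _ in range(n_boot):
        idx = [rng.randrange(n_loci) for _ in range(n_loci)]
        boots.append(d_statistic([abba[i] for i in idx], [baba[i] for i in idx]))
    d = d_statistic(abba, baba)
    sd = stdev(boots)
    return d, sd, abs(d) / sd


def two_tailed_p(z):
    """Two-tailed p-value for a standard normal deviate."""
    return math.erfc(abs(z) / math.sqrt(2))


def holm_reject(p_values, alpha=0.01):
    """Holm-Bonferroni step-down procedure; one boolean per test."""
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    reject = [False] * len(p_values)
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (len(p_values) - rank):
            break  # once one test fails, all larger p-values fail too
        reject[i] = True
    return reject


# Simulated per-locus site-pattern frequencies with an excess of ABBA.
sim = random.Random(42)
abba = [sim.uniform(0.0, 1.0) + 0.2 for _ in range(200)]
baba = [sim.uniform(0.0, 1.0) for _ in range(200)]
d, sd, z = bootstrap_z(abba, baba)
print(round(d, 3), round(sd, 3), round(z, 2), two_tailed_p(z))
```

A positive D with a large Z indicates an excess of ABBA patterns, which under this framework points to gene flow between P3 and P2 rather than ILS alone.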

Figure 4.2 - Four taxon pectinate trees for D-statistics showing (A) ABBA and (B) BABA allele distributions where red arrows indicate hybridisation events between P3 and P2 or P1. D-statistics testing for (C) hybridisation between lineages from the same island and (D) between multiple island endemic lineages.

D-statistics were performed to test for evidence of hybridisation between (1) clades found on the same island and (2) independent lineages of the multi-island endemic species A. adauctum and A. broussonetii that were each resolved as polyphyletic in the phylogenetic analysis (See Results; Phylogenetic reconstruction; Figure 4.3). For within island tests, seventeen clades were identified, with two clades on El Hierro, three on La Palma, two on La Gomera, seven on Tenerife and three on Gran Canaria (Appendix Figure C.1). In testing for the occurrence of hybridisation between each clade present on an island, a total of 29 tests were implemented: one on El Hierro (Table 4.3; test 1), three on La Palma (Table 4.3; tests 2-4), one on La Gomera (Table 4.3; test 5), 21 on Tenerife (Table 4.3, tests 6-26) and three on Gran Canaria (Table 4.3, tests 27-29). Four tests were implemented to test for hybridisation to explain the origins of MIEs that were recovered as polyphyletic: one for A. broussonetii that was resolved in two distinct clades (Table 4.3, test 30) and three for A. adauctum that was resolved in three distinct clades (Table 4.3, tests 31-33).
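The number of within-island tests follows from testing every pair of clades on an island. The short sketch below reproduces the counts given above; the dictionary name clades_per_island is illustrative.

```python
from math import comb

# Number of clades identified on each island (from the text).
clades_per_island = {
    "El Hierro": 2,
    "La Palma": 3,
    "La Gomera": 2,
    "Tenerife": 7,
    "Gran Canaria": 3,
}

# One test per unordered pair of clades on the same island.
tests_per_island = {island: comb(n, 2) for island, n in clades_per_island.items()}
total_clades = sum(clades_per_island.values())
total_tests = sum(tests_per_island.values())
print(tests_per_island, total_clades, total_tests)
```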

For each test we used outgroup (O) accessions from the Madeira and Selvagem Pequena clade (Clade A; Figure 4.3) which were resolved as sister to the Canary Islands clade. We used the data set based on clustering threshold of 90 % and a minimum sample number of 30 (see Results; Phylogenetic reconstruction). The methodology for running D-statistics in ipyrad can be found in a Jupyter notebook within Accompanying Material C.1.

Results

4.4.1 Processing of GBS data

Approximately 219 M raw reads were generated using GBS with an average of 2.65 M per sample (range 0.02 – 9.27 M). Filtering of low-quality reads removed 5.71 – 11.25 % of the total reads resulting in an average of 2.45 M reads retained across all samples (0.02 – 8.44 M). Seven samples with less than 0.5 M reads were removed (Table 4.1; Appendix Figure C.2), leaving a total of 76 samples. An average of 2237 (0.08 %) reads per sample mapped to either the mitochondrial or chloroplast reference genomes and were excluded (Supplementary Table C.2).

Increasing the similarity threshold for de novo clustering resulted in a higher number of clusters and consensus sequences. The average number of clusters per individual that passed the minimum depth requirement was 23,745, 26,497 and 32,946 for thresholds of 80 %, 85 % and 90 % respectively (Supplementary Table C.2). The equivalent numbers for the average number of consensus sequences were 20,583, 23,706 and 30,865 respectively (Supplementary Table C.2). Increasing the minimum number of samples required to include a locus from 30 (40 % of samples) to 38 (50 %) reduced the average number of loci across samples included in the assembly as well as the number of SNPs (Table 4.2).

Table 4.2 - Summary statistics for each ipyrad assembly based on varying similarity thresholds and minimum samples required to process a locus. Statistics include the number of loci, single nucleotide polymorphisms (SNPs) and unlinked SNPs (uSNPs) and the clustering results for PCA and STRUCTURE, where K equals the number of clusters determined by mclust and Evanno ΔK respectively.

Assembly | Threshold (%) | Min. samples | Loci | SNPs | uSNPs | PCA K | STRUCTURE K | ΔK value
1 | 80 | 30 | 3867 | 32,367 | 2851 | 4 | 7 | 1315.33
2 | 80 | 38 | 2630 | 22,513 | 1846 | 5 | 6 | 78.04
3 | 85 | 30 | 4377 | 36,892 | 3250 | 5 | 6 | 29.99
4 | 85 | 38 | 2953 | 25,447 | 2092 | 5 | 2 | 107.98
5 | 90 | 30 | 5166 | 43,407 | 3871 | 5 | 6 | 14,797.22
6 | 90 | 38 | 3400 | 29,188 | 2427 | 6 | 7 | 256.49

4.4.2 Assembly comparison

Each data set generated by our assemblies showed a similar clustering of individuals (see Appendix Figures C.3 and C.4 - C.9 for comparison of PCAs and STRUCTURE plots respectively). Mclust identified four (one data set), five (four data sets) or six (one data set) clusters across PCAs (Table 4.2). All data sets identified a consistent set of four clusters with further clusters, where found, resulting from subdivision of those (Supplementary Table C.3). The Evanno delta K method found that the most strongly supported K values were either two (one data set), six (three data sets) or seven (two data sets; Table 4.2).

4.4.3 Phylogenetic reconstruction

The transversion model (TVM) with proportion of invariable sites (I) and a gamma distribution (G) was the best model for each assembly (Supplementary Table C.4) and was employed for the ML analysis. However, this substitution model was not available in the Bayesian inference approach employed in MrBayes, so we used the second best supported model, which was the general time reversible model (GTR) +I +G (Supplementary Table C.4).

We only present results for the data set based on a clustering threshold of 90 % and minimum sample number of 30 since it had the highest number of SNPs (Figure 4.3; Table 4.2). However, the main clades identified and discussed below were recovered in the other data sets based on different assembly parameters (Appendix Figures C.3 and C.4 - C.9). Phylogenetic analyses for all data sets based on RAxML-NG and MrBayes can be found in Appendix Figures C.10 - C.15 and C.16 - C.21 respectively.

Seven main clades were identified in our ML and BI analyses (Figure 4.3; A – G). Clade A contains all taxa from Madeira and Selvagem Pequena (Figure 4.3; ML bootstrap (BS) = 100; Bayesian posterior probability (PP) = 100). Accessions of the Tenerife endemic A. broussonetii subsp. broussonetii comprise clade B (Figure 4.3; BS = 100; PP = 100). Clade C includes the multi-island endemic A. frutescens which is resolved as paraphyletic with respect to A. gracile and “A. vincentii” from Tenerife (Figure 4.3; BS = 100; PP = 100). Clade D includes A. tenerifae, A. adauctum subsp. adauctum and subsp. dugourii from Tenerife (Figure 4.3; BS = 98; PP = 100). All accessions from Gran Canaria, excluding A. frutescens subsp. canariae, were resolved in clade E (Figure 4.3; BS = 100; PP = 100). The Tenerife endemics A. coronopifolium and A. foeniculaceum are grouped together in clade F (Figure 4.3; BS = 100; PP = 100). Finally, clade G is composed of taxa endemic to the eastern and western Canary Islands (Figure 4.3; BS = 100; PP = 100).

Figure 4.3 - Maximum likelihood tree generated using RAxML-NG for the assembled data set based on a clustering threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis, which were truncated to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown above the branches and posterior probabilities ≥ 95 from the MrBayes analysis are shown below the branches. Tips are coloured by island and Clades A-G are discussed in text.

Our analysis revealed three non-monophyletic species (Figure 4.3). As noted above, samples of A. frutescens were identified as paraphyletic, with A. gracile and “A. vincentii” nested within the same clade. In addition, within A. frutescens, subsp. gracilescens was resolved as polyphyletic, with one of the two samples (frgr177) sister to the La Gomera endemics A. frutescens subsp. parviflorum and subsp. foeniculaceum and the other (frgr179) nested within subsp. frutescens (Figure 4.3). Accessions of A. broussonetii were resolved as polyphyletic, with subsp. broussonetii from Tenerife resolved as sister to clade C containing the multiple island endemic A. frutescens, whereas subsp. gomerensis from La Gomera was sister to A. callichrysum, also endemic to La Gomera (Figure 4.3). Accessions of A. adauctum were polyphyletic and were resolved in three clades (Figure 4.3). The first included A. adauctum subsp. adauctum and subsp. dugourii, resolved in a Tenerife clade (clade D) with A. tenerifae. The second was composed of A. adauctum subsp. jacobaeifolium, subsp. canariense and subsp. gracile which was resolved in the Gran Canarian clade (clade E). The third clade included A. adauctum subsp. erythrocarpon and subsp. palmensis from El Hierro and La Palma respectively. This was resolved in the Eastern-Western island clade G.

The relationships between clades A-G differed in some analyses (Appendix Figures C.10 - C.21). For example, the sister relationship between clade B and clade C, which is poorly supported in our ML analyses (Figure 4.3; BS < 70), was not found in four of the other six data sets, where instead clade B was found to be sister to all Canary Island taxa (Appendix Figures C.11, C.12, C.13, C.15). Similarly, the poorly supported sister group relationship between clade D and clade E (Figure 4.3; BS < 70) was contradicted in three of our ML analyses wherein clade D was resolved as sister to clades F and G (Appendix Figures C.11, C.13, C.15). In the two analyses with an 80 % clustering threshold, one of the outgroup taxa (G. coronaria b8) was resolved as sister to the Madeiran clade (Appendix Figures C.10, C.11), although these relationships were poorly supported (BS < 70).

4.4.4 Ancestral state reconstruction

The most likely ancestral area for Clade A was Madeira (Figure 4.4). Within this clade, colonisation of Selvagem Pequena is inferred for A. thalassophilum. In the Madeiran clade, habitat shifts from coastal desert to the sclerophyllous zone and laurel forest are also inferred for A. dissectum and A. pinnatifidum respectively (Figure 4.4). The ancestral area for clade B, comprising all accessions of A. broussonetii subsp. broussonetii, is most likely Tenerife and this clade was associated with a habitat shift from coastal desert to laurel forest (Figure 4.4). Clade C, including A. frutescens as well as “A. vincentii” and A. gracile similarly has Tenerife as the most likely ancestral area (Figure 4.4). Within this clade there were colonisation events from Tenerife to Gran Canaria (for A. frutescens subsp. canariae) and to La Gomera (for subsp. foeniculaceum and subsp. parviflorum).

Figure 4.4 - Maximum likelihood tree generated based on a cluster threshold of 90 % and minimum sample number of 30 with island (left) and habitat (right) optimised onto the topology using maximum likelihood. Pie charts on each node represent the likelihood attributed to either islands or habitat types. Branch lengths are not shown and branches with bootstrap values ≥ 70 are indicated with an asterisk. Clades A-G are the same as those in Figure 4.3 and are discussed in text.

Habitat shifts from coastal desert to the sclerophyllous zone were identified for A. gracile and A. frutescens subsp. parviflorum with a shift to pine forest habitats identified for “A. vincentii”. The ancestral area inferred for clade D was Tenerife and this clade was associated with a habitat shift from sclerophyllous zone to pine forest. Within clade D, there was a further shift for A. tenerifae into high-altitude desert and an ecological expansion for A. adauctum subsp. adauctum to include both pine and laurel forest habitat. Dispersal from Tenerife to Gran Canaria was inferred for clade E (Figure 4.4). Within this clade there was a habitat shift for the sub-clade containing A. adauctum subspecies on Gran Canaria from sclerophyllous zone to pine forest and a further shift to laurel forest for A. adauctum subsp. jacobaeifolium. The ancestral area inferred for clade F, containing accessions of A. coronopifolium and A. foeniculaceum, was most likely Tenerife (Figure 4.4). The ancestral area inferred for clade G, including accessions from eastern and western islands, is La Palma (Figure 4.4). Within this clade, El Hierro was colonised twice independently, and there was a single colonisation of the eastern islands (Lanzarote and Fuerteventura). On La Palma, A. haouarytheum is distributed across both the sclerophyllous zone and pine forest whereas A. webbii is distributed in the sclerophyllous zone and laurel forest habitats. Subspecies of A. adauctum on La Palma and El Hierro were associated with a habitat shift from sclerophyllous zone to laurel forest. On El Hierro, A. hierrense showed an expansion in habitat to include both sclerophyllous zone and coastal desert habitats. On La Gomera, a habitat shift from the sclerophyllous zone to laurel forest was identified for A. broussonetii subsp. gomerensis. In total, 11 inter-island dispersal events and 17 habitat shifts were inferred.

4.4.5 D-statistics

Seventeen of the 29 tests between lineages found on the same island provided evidence of hybridisation (Figure 4.5; Table 4.3). On El Hierro, there was significant support for hybridisation between the A. sventenii-A. hierrense clade and A. adauctum subsp. erythrocarpon. On La Palma there was significant support for hybridisation between A. webbii and A. adauctum subsp. palmensis. Of the 21 tests performed between clades on Tenerife, 12 provided significant support for hybridisation and all clades identified were admixed with at least two others. On Gran Canaria, there was significant evidence of hybridisation between each of the three clades identified. Between non-monophyletic lineages of MIE species, only one of the four tests performed provided evidence of hybridisation, between A. adauctum on Tenerife and Gran Canaria.

Table 4.3 - Summary of D-statistics performed between clades from the same island (tests 1-29) and between clades of non-monophyletic multi-island endemic taxa (tests 30-33). For each test performed, the taxa at positions P1, P2, P3 and O are shown, together with the D-statistic, mean bootstrap value, bootstrap standard deviation, Z score, ABBA and BABA frequencies and number of loci used in the test. Tests significant at the 0.01 level are highlighted in bold.

Test | P1 | P2 | P3 | O | dstat | bootmean | bootstd | Z | ABBA | BABA | nloci
1 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. sventenii, A. hierrense (El Hierro) | A. adauctum subsp. erythrocarpon (El Hierro) | Clade A | 0.327 | 0.329 | 0.048 | 6.771 | 237.339 | 120.352 | 3100
2 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. haouarytheum (La Palma) | A. adauctum subsp. palmensis (La Palma) | Clade A | 0.093 | 0.094 | 0.053 | 1.748 | 165.337 | 137.25 | 2877
3 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. webbii (La Palma) | A. adauctum subsp. palmensis (La Palma) | Clade A | 0.196 | 0.197 | 0.048 | 4.077 | 169.808 | 114.184 | 3038
4 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. webbii (La Palma) | A. haouarytheum (La Palma) | Clade A | 0.203 | 0.201 | 0.051 | 4.02 | 148.078 | 98.019 | 2720
5 | A. sventenii, A. hierrense (El Hierro) | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | Clade A | 0.021 | 0.019 | 0.058 | 0.359 | 104.749 | 100.493 | 3195
6 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. frutescens subsp. succulentum (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.291 | 0.292 | 0.062 | 4.692 | 114.54 | 62.846 | 2905
7 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | “A. vincentii” (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.046 | 0.045 | 0.072 | 0.64 | 60.87 | 55.487 | 2389
8 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. gracile (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.066 | 0.068 | 0.055 | 1.207 | 105.619 | 92.53 | 3554
9 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. frutescens subsp. frutescens (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.198 | 0.198 | 0.042 | 4.649 | 132.424 | 88.743 | 3700
10 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.208 | 0.209 | 0.041 | 5.048 | 190.823 | 125.04 | 3818
11 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.232 | 0.231 | 0.047 | 4.905 | 187.392 | 116.846 | 3619
12 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | “A. vincentii” (Tenerife) | A. frutescens subsp. succulentum (Tenerife) | Clade A | -0.035 | -0.036 | 0.064 | 0.546 | 89.58 | 96.049 | 2107
13 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. gracile (Tenerife) | A. frutescens subsp. succulentum (Tenerife) | Clade A | -0.036 | -0.036 | 0.047 | 0.764 | 133.528 | 143.57 | 2920
14 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. frutescens subsp. frutescens (Tenerife) | A. frutescens subsp. succulentum (Tenerife) | Clade A | -0.048 | -0.05 | 0.039 | 1.252 | 138.187 | 152.264 | 3010
15 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | A. frutescens subsp. succulentum (Tenerife) | Clade A | 0.328 | 0.328 | 0.05 | 6.539 | 169.654 | 85.759 | 2956
16 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | A. frutescens subsp. succulentum (Tenerife) | Clade A | 0.404 | 0.403 | 0.048 | 8.358 | 175.026 | 74.253 | 2805
17 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. gracile (Tenerife) | “A. vincentii” (Tenerife) | Clade A | 0.055 | 0.055 | 0.054 | 1.021 | 104.323 | 93.387 | 2414
18 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. frutescens subsp. frutescens (Tenerife) | “A. vincentii” (Tenerife) | Clade A | 0.083 | 0.085 | 0.041 | 2.026 | 126.284 | 106.86 | 2493
19 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | “A. vincentii” (Tenerife) | Clade A | 0.353 | 0.353 | 0.054 | 6.487 | 131.819 | 63.079 | 2406
20 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | “A. vincentii” (Tenerife) | Clade A | 0.231 | 0.231 | 0.063 | 3.651 | 102.707 | 64.166 | 2300
21 | A. frutescens subsp. parviflorum, A. frutescens subsp. foeniculaceum (La Gomera) | A. frutescens subsp. frutescens (Tenerife) | A. gracile (Tenerife) | Clade A | -0.042 | -0.042 | 0.035 | 1.228 | 199.508 | 217.187 | 3698
22 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | A. gracile (Tenerife) | Clade A | 0.326 | 0.324 | 0.04 | 8.114 | 229.478 | 116.676 | 3663
23 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | A. gracile (Tenerife) | Clade A | 0.223 | 0.222 | 0.045 | 4.998 | 177.869 | 112.925 | 3462
24 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | A. frutescens subsp. frutescens (Tenerife) | Clade A | 0.312 | 0.312 | 0.036 | 8.74 | 229.205 | 120.117 | 3855
25 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | A. frutescens subsp. frutescens (Tenerife) | Clade A | 0.238 | 0.238 | 0.045 | 5.277 | 182.175 | 112.174 | 3646
26 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. foeniculaceum, A. coronopifolium (Tenerife) | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | Clade A | 0.191 | 0.19 | 0.041 | 4.669 | 233.855 | 158.98 | 3754
27 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. adauctum subsp. canariense, A. adauctum subsp. gracile, A. adauctum subsp. jacobaeifolium (Gran Canaria) | A. frutescens subsp. canariae (Gran Canaria) | Clade A | 0.325 | 0.324 | 0.061 | 5.348 | 108.383 | 55.246 | 2425
28 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. filifolium, A. escarrei, A. lidii (Gran Canaria) | A. frutescens subsp. canariae (Gran Canaria) | Clade A | 0.278 | 0.277 | 0.06 | 4.615 | 114.55 | 64.726 | 2397
29 | A. tenerifae, A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | A. adauctum subsp. canariense, A. adauctum subsp. gracile, A. adauctum subsp. jacobaeifolium (Gran Canaria) | A. filifolium, A. escarrei, A. lidii (Gran Canaria) | Clade A | 0.391 | 0.39 | 0.027 | 14.589 | 289.277 | 126.542 | 3978
30 | A. callichrysum (La Gomera) | A. broussonetii subsp. gomerensis (La Gomera) | A. broussonetii subsp. broussonetii (Tenerife) | Clade A | 0.055 | 0.051 | 0.063 | 0.865 | 81.072 | 72.649 | 3347
31 | A. filifolium, A. escarrei, A. lidii (Gran Canaria) | A. adauctum subsp. canariense, A. adauctum subsp. gracile, A. adauctum subsp. jacobaeifolium (Gran Canaria) | A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | Clade A | 0.358 | 0.357 | 0.029 | 12.178 | 267.427 | 126.361 | 3934
32 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. adauctum subsp. erythrocarpon, A. adauctum subsp. palmensis (El Hierro) | A. adauctum subsp. adauctum, A. adauctum subsp. dugourii (Tenerife) | Clade A | 0.137 | 0.137 | 0.045 | 3.068 | 192.388 | 145.917 | 3720
33 | A. callichrysum, A. broussonetii subsp. gomerensis (La Gomera) | A. adauctum subsp. erythrocarpon, A. adauctum subsp. palmensis (El Hierro) | A. adauctum subsp. canariense, A. adauctum subsp. gracile, A. adauctum subsp. jacobaeifolium (Gran Canaria) | Clade A | 0.076 | 0.077 | 0.041 | 1.856 | 168.37 | 144.59 | 3733
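As a consistency check on Table 4.3, the D-statistic can be recomputed from the tabulated ABBA and BABA frequencies. The sketch below does this for four tests copied from the table; the function name d_from_counts is hypothetical, and the recomputed Z is only approximate because the tabulated bootstrap standard deviation is rounded to three decimal places.

```python
# (test, reported D, bootstrap SD, reported Z, ABBA, BABA) copied from Table 4.3.
rows = [
    (1, 0.327, 0.048, 6.771, 237.339, 120.352),
    (13, -0.036, 0.047, 0.764, 133.528, 143.570),
    (16, 0.404, 0.048, 8.358, 175.026, 74.253),
    (29, 0.391, 0.027, 14.589, 289.277, 126.542),
]


def d_from_counts(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA) from summed site-pattern frequencies."""
    return (abba - baba) / (abba + baba)


recomputed = {}
for test, d_reported, boot_sd, z_reported, abba, baba in rows:
    d = d_from_counts(abba, baba)
    z_approx = abs(d) / boot_sd  # approximate, because boot_sd is rounded
    recomputed[test] = (d, z_approx)
    print(test, round(d, 3), d_reported, round(z_approx, 2), z_reported)
```

The recomputed D values agree with the reported values to three decimal places, and a negative D (test 13) simply reflects more BABA than ABBA patterns.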

Figure 4.5 - Maximum likelihood tree generated based on a cluster threshold of 90 % and minimum sample number of 30 with vertical lines connecting clades for which there was significant support for hybridisation in the D-statistics analysis. Multiple samples of a taxon occurring in the same clade are represented by a single accession; species relationships that were not supported were collapsed as a polytomy.

Discussion

Argyranthemum is the largest genus of flowering plants endemic to Macaronesia and an exemplary case of an evolutionary radiation (Francisco-Ortega et al., 1996b, 1997a). As such, it is an ideal system for investigations of the relative importance of geographical isolation, habitat shifts and hybridisation in generating plant diversity across Macaronesia. Previous phylogenetic analyses of Argyranthemum based on a few molecular markers suggested that diversification was largely explained by geographical isolation by means of inter-island dispersal between similar habitats (Francisco-Ortega et al., 1996b). Several taxa in Argyranthemum were also resolved as non-monophyletic, including the MIEs A. broussonetii, A. adauctum and A. frutescens (Francisco-Ortega et al., 1996a,b). However, phylogenetic inference in these analyses was limited by a lack of genetic variation and poor resolution of species relationships. To overcome the limitations of previous studies, we used GBS to investigate the evolutionary relationships within Argyranthemum and the processes responsible for their diversification. Thousands of polymorphic, presumptively neutral, markers across the genome were identified and our phylogenetic analysis improved the resolution between species considerably (Figure 4.3).

Based on our ancestral state reconstruction (Figure 4.4), the most recent common ancestor of Argyranthemum likely colonised Tenerife and this central island acted as the centre of diversity, as suggested for Crambe (Francisco-Ortega et al., 2002), (Vitales et al., 2014) and Descurainia (Goodson et al., 2006). From Tenerife, the results suggest that Argyranthemum dispersed north to Madeira then back to Selvagem Pequena. Gran Canaria was colonised twice from Tenerife and there were also colonisation events to La Gomera and La Palma. From La Palma, El Hierro was colonised twice, as well as La Gomera and the eastern islands of Lanzarote and Fuerteventura. The colonisation of the eastern islands of Lanzarote and Fuerteventura from the western island of La Palma is counterintuitive, but was also recovered by Francisco-Ortega et al. (1996b). As an alternative to long-distance dispersal, Francisco-Ortega et al. (1996b) hypothesised that this lineage may have followed a stepping stone colonisation pattern from west to east, with the lineages on the central islands since becoming extinct. As far as we are aware, there are no other Canary Island endemic lineages showing this west to east relationship. However, there are comparable examples of floristic links between the western and eastern islands of the Azorean archipelago (e.g. Schaefer et al., 2011).

Our phylogenetic analysis confirmed that all three MIE species, A. broussonetii, A. adauctum and A. frutescens, are non-monophyletic (Figure 4.3), in agreement with the analysis of Francisco-Ortega et al. (1996a,b). Specifically, A. broussonetii is polyphyletic with subsp. broussonetii on Tenerife sister to the multi-island endemic A. frutescens and subsp. gomerensis on La Gomera sister to A. callichrysum, also from La Gomera. Argyranthemum adauctum was also polyphyletic with three independent clades that corresponded to (1) Gran Canaria (subsp. gracile, canariense and jacobaeifolium), (2) Tenerife (subsp. adauctum and dugourii) and (3) La Palma and El Hierro (subsp. palmensis and erythrocarpon respectively). Subspecies of A. adauctum on Gran Canaria and Tenerife were each sister to taxa from the same island whereas subspecies from La Palma and El Hierro were sister to a clade comprised of taxa from the western islands of La Palma, El Hierro and La Gomera and the eastern islands of Lanzarote and Fuerteventura. Argyranthemum frutescens was resolved as paraphyletic with A. gracile and “A. vincentii” nested within it (Figure 4.3; clade C). Based on our phylogenetic analysis, a reconsideration of current taxonomic circumscriptions in the A. frutescens clade would be appropriate.

Hybridisation between lineages co-occurring on the same islands appears to be common in Argyranthemum and was supported in 57 % of tests performed on Tenerife, 33 % on La Palma and 100 % on El Hierro and Gran Canaria. Taken together with evidence that hybridisation has generated two species by homoploid hybrid speciation (Brochmann et al., 2000; Fjellheim et al., 2009; White et al., 2018), it is clear that hybridisation has played a significant role in the evolutionary history and diversification of Argyranthemum, a pattern consistent with the findings in Micromeria (Curto et al., 2017).
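The percentages quoted above follow directly from the counts of significant tests reported in the Results. The sketch below recomputes them; the dictionary name outcomes is illustrative, and the La Gomera entry (no significant result in its single test) is inferred from the overall total of 17 significant tests.

```python
# (significant tests, tests performed) per island, from the Methods and Results.
outcomes = {
    "El Hierro": (1, 1),
    "La Palma": (1, 3),
    "La Gomera": (0, 1),   # inferred from the overall total of 17 of 29
    "Tenerife": (12, 21),
    "Gran Canaria": (3, 3),
}

percent = {island: round(100 * sig / n) for island, (sig, n) in outcomes.items()}
total_significant = sum(sig for sig, _ in outcomes.values())
total_tests = sum(n for _, n in outcomes.values())
print(percent, total_significant, total_tests)
```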

However, of the four D-statistics performed on the polyphyletic MIEs A. broussonetii and A. adauctum, only one was significant, with support for hybridisation between clades of A. adauctum on Tenerife (subsp. adauctum and dugourii) and Gran Canaria (subsp. gracile, canariense and jacobaeifolium; Table 4.3, test 31). This hybridisation event is consistent with the hypothesis of Jones et al. (2014) that hybridisation with co-distributed taxa explains the polyphyly of multi-island endemics. The identification of hybridisation between lineages that do not co-occur on the same island might suggest that at least one of the lineages in question was once more widespread. Jones et al. (2014) were unable to differentiate between ILS and hybridisation with their molecular data; hybridisation was inferred based on incongruence between their chloroplast and nuclear (ITS) data that showed a taxonomic and geographical signal respectively in the relationships of accessions of the MIE Pericallis appendiculata. However, Curto et al. (2017) were able to infer evidence of hybridisation between lineages of Micromeria distributed across different islands using D-statistics, suggesting that inter-island hybridisation might be a significant process in the diversification of Macaronesian lineages.

In the absence of evidence for hybridisation in the other MIE tests performed, hybridisation as an explanation for polyphyletic species more generally is not supported for Argyranthemum, and morphological convergence of distinct evolutionary lineages may better explain the patterns observed. Indeed, morphological analysis indicates that in the case of A. broussonetii, while the two subspecies are similar in leaf characteristics, subsp. gomerensis shows greater affinity to A. callichrysum in capitula width and cypselae (dry single-seeded fruits) traits. This suggests that the two have converged on similar leaf traits in response to the similar habitats in which they occur (White, unpublished data). It is evident that the treatment of the two taxa A. broussonetii subsp. broussonetii and gomerensis as conspecific cannot be supported.


In contrast, the morphological characters that delimit the three lineages of A. adauctum from other taxa (Humphries, 1976) appear to be more consistent, including hispid or tomentose indumentum, sessile leaves with primary lobes or teeth at the leaf base and wingless and fused ray cypselae (pers. obs.). Although morphological convergence seems less likely, the role of convergent morphological evolution has received much less attention in the diversification of flowering plants across Macaronesia compared with geographic isolation and habitat shifts. Lee et al. (2005) proposed convergent evolution as a potential explanation for the non-monophyletic relationships of the genus Taeckholmia in the Sonchus alliance (Asteraceae: Sonchinae). However, the authors were unable to completely rule out the possibility of hybridisation as an alternative explanation. Convergent morphological evolution in response to similar habitats on islands has also been reported in Nesotes beetles in the Canary Islands (Rees et al., 2001) and Anolis lizards of the Greater Antilles (Losos et al., 1998).

In total, 11 speciation events were associated with geographical isolation by means of inter-island dispersal and 17 were associated with intra-island habitat shifts. In particular, habitat shifts were found to be more prevalent in the Canary Islands than hypothesised by Francisco-Ortega et al. (1996b). Given the extent of intra-island hybridisation revealed by our analyses and the evidence of homoploid hybrid speciation into new habitats in Argyranthemum (White et al., 2018), it is plausible that hybridisation may have contributed to the frequency of habitat shifts identified in the evolution of the genus.

The use of GBS has significantly improved the resolution of phylogenetic relationships in Argyranthemum and revealed greater complexity in the processes responsible for its diversification, with geographical isolation, habitat shifts, hybridisation and convergent morphological evolution all inferred. High throughput sequencing (HTS) is increasingly employed in investigations of oceanic island endemic lineages such as Argyranthemum (Mort et al., 2015; Paun et al., 2016; Curto et al., 2017). Further studies of Macaronesian endemic lineages using HTS are likely to provide further insights into the complexity of the evolutionary processes acting to generate diversity across these archipelagos.


Chapter 5 Raising the rank of the Macaronesian endemic Argyranthemum broussonetii subsp. gomerensis to species based on evolutionary relationships and morphology


Introduction

Argyranthemum Webb (Asteraceae-Anthemideae) is the largest endemic flowering plant genus of the Macaronesian region. It comprises 24 species (39 terminal taxa when subspecies are considered; Humphries, 1976) and is distributed across the archipelagos of Madeira, the Selvagens and the Canary Islands.

Argyranthemum broussonetii (Pers.) Humphries is endemic to the Canary Islands where it is restricted to laurel forest clearings on Tenerife and La Gomera. It is distinguished from other taxa in the genus by its large bipinnatifid leaves, often wingless ray cypselae, typically two-winged disc cypselae, large capitula and ray florets. Humphries (1976) recognised two subspecies, namely A. broussonetii subsp. broussonetii endemic to Tenerife and subsp. gomerensis Humphries endemic to La Gomera (Figure 5.1; Figure 5.2). Humphries differentiated the subspecies largely by size, with subsp. broussonetii larger than subsp. gomerensis in stature, leaf size, involucre width and ray cypselae size (Table 5.1).

Figure 5.1 - Distribution map of A. broussonetii subsp. broussonetii (brbr), subsp. gomerensis (brgo) and A. callichrysum (ca) on Tenerife and La Gomera. Point localities are based on vouchers used in the morphological analysis. Specimens without latitude and longitude coordinates were georeferenced based on locality description.


Figure 5.2 - Plants in situ (A), leaves from glasshouse grown plants (B) and ray (C) and disc (D) cypselae. Photos taken by Oliver White.


Table 5.1 - Distinguishing characteristics of A. broussonetii subsp. broussonetii, subsp. gomerensis and A. callichrysum based on Humphries (1976).

Taxon | Stems | Leaf dimensions | Leaf dissection | Involucre | Ray cypselae
A. broussonetii subsp. broussonetii | Up to 120 cm, ascending | 3 - 16 × 0.5 - 8 cm | bipinnatifid | 12 - 22 mm | 5 - 6 × 3 - 5 mm
A. broussonetii subsp. gomerensis | 70 - 80 cm, slightly procumbent to ascending | 3 - 10 × 0.5 - 6 cm | bipinnatifid | 13 - 16 (-19) mm | 4 - 5 × 3 - 5 mm
A. callichrysum | 60 - 100 cm, erect and branched throughout | 10 - 15 × 2 - 6 cm | bipinnatisect | 9 - 14 mm | 4 - 5 × 4 - 8 mm

A phylogenetic analysis of Argyranthemum based on chloroplast restriction site markers found that A. broussonetii is not monophyletic since the two subspecies were resolved in different clades (Francisco-Ortega et al., 1996b). More recently, a phylogenetic study of Argyranthemum that employed a Next Generation Sequencing (NGS) approach (White et al., in prep) provided improved resolution of species relationships and confirmed that the two subspecies of A. broussonetii are distinct evolutionary lineages. Specifically, A. broussonetii subsp. broussonetii is sister to the multi-island endemic A. frutescens whereas A. broussonetii subsp. gomerensis is sister to A. callichrysum (Svent.) Humphries, which is also endemic to La Gomera (Figure 5.1; Figure 5.2). D-statistics provided no evidence of introgression between A. broussonetii subsp. broussonetii and A. callichrysum, ruling out the possibility that A. broussonetii subsp. gomerensis is the result of historical hybridisation between A. broussonetii and A. callichrysum.

Francisco-Ortega et al. (1996b) suggested that the subspecies of A. broussonetii are morphologically distinct and that each should be considered a unique species. However, no subsequent taxonomic work on A. broussonetii has been published. In this study, we re-assess the taxonomy of A. broussonetii in the light of recent phylogenetic analyses. The morphological distinctiveness of the two subspecies of A. broussonetii is demonstrated, with subsp. gomerensis morphologically more similar to A. callichrysum. We propose raising A. broussonetii subsp. gomerensis to species rank and provide a key with which to differentiate the two taxa and A. callichrysum.

Materials and methods

We identified characters that differed between subspecies of A. broussonetii and A. callichrysum based on Humphries (1976). These included leaf attachment (petiolate, shortly petiolate or sessile), leaf dissection (bipinnatisect, bipinnatifid), primary lobe length, primary lobe width, primary lobe shape (linear-lanceolate, obovate), capitulum width (cm), ray cypselae colouration (yellow-brown, chestnut-brown, brown-purple or black), ray cypselae arrangement (solitary, coalesced in groups of 2 – 6 or both identified in the same capitulum), ray cypselae wings (present or absent) and disc cypselae wing number (zero, one or two wings). For an estimate of leaf dissection, a ratio of leaf width to lamina width was calculated. For an estimate of lobe shape, a ratio of primary lobe length to width was calculated. Descriptions of these characters and definitions are provided in Table 5.2 and Figure 5.3. A total of 27 recent collections of Argyranthemum made in July – August 2015 were examined and scored for these traits together with 10 further collections accessioned at the Natural History Museum in London (BM). The 37 voucher specimens examined for the morphometric analysis comprised 16 A. broussonetii subsp. broussonetii, 14 A. broussonetii subsp. gomerensis and seven A. callichrysum.

For continuous characters, we checked for normality using histograms, Q-Q norm plots and a Shapiro-Wilk test. Characters with a non-normal distribution were transformed and we tested for significant differences between taxa using ANOVA followed by post hoc Tukey’s honest significant difference test. For discrete variables, we used chi-square tests of independence to identify significant associations between character frequencies and taxa. To further investigate the morphological relationships, we employed the R package PCAmixdata (Chavent et al., 2012, 2017) which implements principal components analysis (PCA) using both continuous and discrete variables. The number of clusters identified by the PCA was determined using the R package mclust (Scrucca et al., 2016; R Core Team, 2018).
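The test statistics behind the two comparisons described above can be illustrated with a minimal sketch. The analyses in this chapter were run in R; the Python below is only an illustration using the standard library, the function names are hypothetical, and P-values (which require the F and chi-square distributions from a statistics library) are deliberately not computed.

```python
from collections import Counter


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples,
    one list per taxon."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual values around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def chi_square_independence(taxa, states):
    """Return (chi2, df) for two parallel lists: the taxon of each
    specimen and the state it was scored for a discrete character."""
    n = len(taxa)
    row_totals = Counter(taxa)
    col_totals = Counter(states)
    observed = Counter(zip(taxa, states))
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            # Counter returns 0 for combinations that were never observed.
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df
```

For a continuous character, `one_way_anova_f` would be called with one list of measurements per taxon. For a discrete character, `chi_square_independence` would be called with the taxon label and scored state of each specimen.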

Further specimens at BM, RNG and ORT were also examined to test the taxon circumscriptions proposed in light of the morphometric analysis.


Table 5.2 - Characters used in our morphological analysis with information on whether the character was continuous (cont.) or discrete (disc.), how it was scored and definitions for repeatability.

Character | Type | Scoring | Definitions
Leaf attachment | disc. | 1. petiolate (≥ 1.5 cm); 2. shortly petiolate ± sessile (0.5-1.5 cm); 3. sessile (≤ 0.5 cm) | Based on distance from leaf base to first primary lobe or tooth. Leaf lobes are defined as paired or obovate or ovate limbs. Consensus taken across the specimen.
Primary lobe length | cont. | cm | Selected the third leaf below lowest peduncle and the longest lobe closest to the . If the third leaf couldn't easily be measured the next leaf down was used.
Primary lobe width | cont. | cm | Same lobe as above.
Capitulum width | cont. | cm | Selected uppermost capitulum.
Ray cypselae colour | disc. | 1. yellow-light brown; 2. chestnut brown; 3. brown-purple; 4. black | Used extra cypselae in packet if available or dissection of specimen.
Ray cypselae arrangement | disc. | 1. solitary; 2. coalesced; 3. both | Used same cypselae as above. Solitary - single cypselae. Coalesced - two to six coalesced cypselae. Both - both solitary and coalesced cypselae identified.
Ray cypselae wing presence or absence | disc. | 1. absent; 2. present | Used same cypselae as above.
Disc cypselae wing number | disc. | 1. no wings; 2. one wing; 3. two wings | Used same cypselae as above. Wings defined as extensions of the cypselae surface ≥ 1 mm in length.


Figure 5.3 - Leaf measurements used in morphological analysis.

Results

Ten of the 37 samples had missing character scores, including three of A. broussonetii subsp. broussonetii and seven of subsp. gomerensis. Input data can be found in Supplementary Table D.1. All continuous variables were normally distributed except for leaf lamina width ratio, which required a log transformation (Shapiro-Wilk test; primary lobe length, P = 0.194; primary lobe width, P = 0.054; primary lobe length width ratio, P = 0.099; log leaf lamina width ratio, P = 0.909; capitulum width, P = 0.173). One-way ANOVA found no significant difference in primary lobe length between taxa (F = 0.098, P = 0.906; Figure 5.4 A). In contrast, significant differences were identified based on primary lobe width (F = 10.580, P < 0.001; Figure 5.4 B), primary lobe length width ratio (F = 38.580, P < 0.001; Figure 5.4 C), log transformed leaf lamina ratio (F = 10.080, P < 0.001; Figure 5.4 D) and capitulum width (F = 16.050, P < 0.001; Figure 5.4 E). For primary lobe width, A. broussonetii subsp. broussonetii had significantly wider lobes than A. broussonetii subsp. gomerensis and A. callichrysum (Figure 5.4 B). Primary lobe length:width ratio was significantly different across all species (Figure 5.4 C). Subspecies of A. broussonetii shared a similar leaf:lamina width ratio and were significantly different to A. callichrysum (Figure 5.4 D). Argyranthemum broussonetii subsp. broussonetii had significantly wider capitula with respect to A. broussonetii subsp. gomerensis and A. callichrysum (Figure 5.4 E).


Chi-squared tests identified no association between ray cypselae colour (χ2 = 4.9887, P = 0.545; Figure 5.5 B) or ray cypselae wings (χ2 = 1.909, P = 0.385; Figure 5.5 G) with taxa. However, there were significant associations between leaf attachment (χ2 = 37.000, P < 0.001; Figure 5.5 A), ray cypselae arrangement (χ2 = 30.159, P < 0.001; Figure 5.5 C) and disc wing number (χ2 = 27.000, P < 0.001; Figure 5.5 E) with taxa.

Figure 5.4 - Boxplots of continuous characters with letters above each box referring to the groups identified by Tukey tests. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum).


Figure 5.5 - Stacked bar plots with frequency counts for each discrete character. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum).


Figure 5.6 - PCA based on continuous and discrete characters. Point colour shows the taxon whereas shape is determined by the cluster. The proportion of variation explained on each dimension is shown in parentheses. Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum).

The PCA of both continuous and discrete variables showed a clear separation on the x axis, with A. broussonetii subsp. gomerensis and A. callichrysum clustering together and A. broussonetii subsp. broussonetii forming a distinct group (Figure 5.6). There was a slight separation between A. broussonetii subsp. gomerensis and A. callichrysum, with A. broussonetii subsp. gomerensis in a more intermediate position relative to A. callichrysum. Mclust identified two clusters, the first corresponding to A. broussonetii subsp. gomerensis and A. callichrysum and the second composed of A. broussonetii subsp. broussonetii (Figure 5.6).

Discussion

From a morphological perspective, we found that A. broussonetii subsp. broussonetii and subsp. gomerensis are similar in leaf traits including primary lobe length and leaf dissection (leaf:lamina ratio), which supports their current taxonomic treatment. However, A. broussonetii subsp. broussonetii and subsp. gomerensis can be distinguished based on leaf attachment, capitula width and cypselae characteristics. Indeed, A. broussonetii subsp. gomerensis shows greater similarity to A. callichrysum based on these characters, which is in agreement with their phylogenetic relationships (White et al., in prep) and the hypothesis proposed by Francisco-Ortega et al. (1996b). While our ordination analysis shows some overlap between A. broussonetii subsp. gomerensis and A. callichrysum (Figure 5.6), the two can be differentiated using primary lobe width (Figure 5.4 B), primary lobe shape (length:width ratio; Figure 5.4 C) and dissection (leaf:lamina ratio; Figure 5.4 D). The differences in leaf morphology between A. broussonetii subsp. gomerensis and A. callichrysum are also apparent in plants grown under common glasshouse conditions (Figure 5.2 B), ruling out the possibility that the observed differences in wild collected material were simply due to environmentally induced plasticity. In light of the morphological findings that are congruent with the molecular phylogenetic data, we propose that A. broussonetii subsp. gomerensis should be recognised at specific rank. The necessary new combination is provided below together with a key to the three taxa considered here.

Morphological traits that are advantageous in a particular habitat may be susceptible to morphological convergence. The two independently derived lineages that comprise A. broussonetii as circumscribed by Humphries (1976) occupy humid laurel forest of Tenerife and La Gomera, where they can be found in forest clearings. The similarity in leaf shape exhibited by these two lineages would appear to be a convergence in response to similarities in habitat. Although no hybrids between A. broussonetii subsp. gomerensis and A. callichrysum were identified, it is likely that occasional hybrids exist where both species come into close proximity, as has been found for other species in the genus (Borgen, 1976; Brochmann, 1984; Brochmann et al., 2000).

Key to taxa

1 Leaves petiolate, lacking teeth or primary lobes at the leaf base. Capitulum 0.75-1.5 cm wide. Disc cypselae with a single wing; ray cypselae either coalesced or solitary…………………………………………….…2

- Leaves sessile with teeth and/or primary lobe to the base. Capitulum 1.4-2.0 cm wide. Disc cypselae two winged; ray cypselae typically solitary…………………………………………………….A. broussonetii

2 Leaves bipinnatisect, primary leaf lobes linear lanceolate, 0.2-0.75 cm wide…….………A. callichrysum

- Leaves bipinnatifid, primary leaf lobes obovate, 0.75-1.5 cm wide………………………….…A. gomerensis
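The logic of the key can also be written as a short decision procedure. The sketch below is illustrative only: the function name and argument values are hypothetical, and for brevity it uses just the first character of each couplet (leaf attachment, then leaf dissection), ignoring the supporting characters such as capitulum width, cypselae wings and lobe shape.

```python
def identify_taxon(leaf_attachment, leaf_dissection=None):
    """Follow the two couplets of the key and return a taxon name.

    leaf_attachment: "petiolate" or "sessile".
    leaf_dissection: "bipinnatisect" or "bipinnatifid"; only needed
    when the leaves are petiolate.
    """
    # Couplet 1: sessile leaves key out directly to A. broussonetii.
    if leaf_attachment == "sessile":
        return "A. broussonetii"
    if leaf_attachment != "petiolate":
        raise ValueError("leaf_attachment must be 'petiolate' or 'sessile'")

    # Couplet 2: petiolate leaves are separated by leaf dissection.
    if leaf_dissection == "bipinnatisect":
        return "A. callichrysum"
    if leaf_dissection == "bipinnatifid":
        return "A. gomerensis"
    raise ValueError(
        "leaf_dissection must be 'bipinnatisect' or 'bipinnatifid'")
```

A specimen with petiolate, bipinnatifid leaves would therefore be identified with `identify_taxon("petiolate", "bipinnatifid")`.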


Argyranthemum callichrysum (Svent.) C. J. Humphries

Chrysanthemum callichrysum Svent. Addit. Fl Canar. 1 : 65, t. 25 (1960). Type: Sin loc., Sventenius, s.n. (ORT holo.).

Leaves petiolate, bipinnatisect, primary lobes linear lanceolate. Disc cypselae single-winged; ray cypselae often coalesced.

La Gomera: distributed from Barranco de Argaga on the South West coast to the South and South-facing slopes of the central mountains of Igualero, Agando and Tagamiche (Figure 5.1).

Associated with xerophytic scrub on rocky slopes or disturbed vegetation.

Specimens examined: SPAIN, CANARY ISLANDS, LA GOMERA: A. Santos 34.327, Dgda. Peraza, 08 March 1997, (ORT); A. Santos 34.328, Dgda. Peraza, 08 March 1997, (ORT); A. Santos 34.329, Bajada, 09 March 1997, (ORT); D. Bramwell & C. J. Humphries 3174, Igualero, 30 March 1971, (BM); E. R. Sventenius 4827, Fte. Tamadauche, 03 March 1946, (ORT); E. R. Sventenius 4828, Fuente de la Yegua, 18 May 1945, (ORT); E. R. Sventenius 4829, Seima, Risco Serradero, 19 May 1958, (ORT); E. R. Sventenius 4831, Ojila, 29 March 1959, (ORT); E. R. Sventenius 4832, Roque de Agando, 20 May 1945, (ORT); E. R. Sventenius 4834, Andenes sobre Benchijigua, 23 April 1966, (ORT); E. R. Sventenius 4835, Andenes sobre Benchijigua, 23 April 1966, (ORT); E. R. Sventenius 4838, Fuente Tamadauche, 14 May 1946, (ORT); E. R. Sventenius 4849, Jerduñe, 06 May 1970, (ORT); E. R. Sventenius 4851, Pico Gomero, 19 April 1966, (ORT); E. R. Sventenius 4852, Degollada Blanca, Pico Gomero, 19 April 1966, (ORT); E. R. Sventenius 4853, Bco. de la Laja, Roque Grande, 22 May 1965, (ORT); E. R. Sventenius 4855, Andenes de Tagasmiche, 05 May 1968, (ORT); E. R. Sventenius 4858, Sine loc., May 1969, (ORT); E. R. Sventenius 4859, Sobre los Andenes de Benchijigua, 23 May 1969, (ORT); E. R. Sventenius 4860, Ardenes de Benchijigua, 08 May 1968, (ORT); E. R. Sventenius 4861, Ardenes de Benchijigua, 08 May 1968, (ORT); E. R. Sventenius 4862, Tacalcuce, 11 June 1970, (ORT); E. R. Sventenius 4863, Taguluche, 13 May 1959, (ORT); E. R. Sventenius 23872, Igualero, 07 March 1954, (ORT); M. Fernández Galván 26020, Laderas del Rq. Cano, May 1973, (ORT); M. Fernández Galván 26422, Orillas de Izque, 01 May 1976, (ORT); M. Fernández Galván 27921, Arure, 16 April 1978, (ORT); O. White et al. 94, Barranco de la Guancha, 16 May 2015, (BM); O. White et al. 95, Barranco de la Guancha, 16 May 2015, (BM); O. White et al. 96, Barranco de la Guancha, 16 May 2015, (BM); O. White et al. 97, Roque de Agando, 16 May 2015, (BM); O. White et al. 
98, Roque de Agando, 16 May 2015, (BM); O. White et al. 99, Roque de Agando, 16 May 2015, (BM).


Argyranthemum gomerensis (C.J.Humphries) O.W.White comb. et stat. nov.

Argyranthemum broussonetii subsp. gomerensis C.J.Humphries. Bull. Brit. Mus. (Nat. Hist.) Bot. 4: 5. Type: Bramwell & Humphries 3355 (RNG, holo, BM 000810844; iso).

Leaves petiolate, bipinnatifid, primary lobes ovate. Disc cypselae single-winged; ray cypselae often coalesced.

La Gomera: scattered populations on steep slopes of La Gomera between Las Rosas, La Palmita and Agulo on the North West coast (Figure 5.1).

Along roadsides and open clearings of Laurus novocanariensis forest between 550 and 1000 m.

Specimens examined: SPAIN, CANARY ISLANDS, LA GOMERA: C. E. Jarvis 603, Top of Barranco del Hermigua, 08 May 1977, (BM); D. Bramwell & C. J. Humphries 3355, Between Agulo and Las Rosas, 06 April 1971, (BM); E. Bourgeau 247, Dellgollada de San Sebastian, 1845, (BM); E. R. Sventenius 4825, Bco. de Liria, 15 May 1945, (ORT); E. R. Sventenius 4833, Hermigua, 07 July 1960, (ORT); E. R. Sventenius 4848, Pico Aragán, 19 May 1965, (ORT); M. Fernández Galván 26019, La Meseta, 20 April 1975, (ORT); M. Fernández Galván 26022, El Cano, 1973, (ORT); M. Fernández Galván 26024, La Meseta, 20 April 1975, (ORT); M. Fernández Galván 26338, Aguajilva, March 1976, (ORT); M. Fernández Galván 26340, Las Carboneras, NA, (ORT); M. Fernández Galván 26344, Meriga, límite inf. del monte, April 1976, (ORT); M. Fernández Galván 26469, Aguajilva, April 1976, (ORT); M. Fernández Galván 26661, Borde superior del monte del Cedro, 27 July 1977, (ORT); M. Fernández Galván s.n., Las Casas del Cedro, 16 July 1975, (ORT); O. White et al. 100, Alongside CV- 17 above Igualero, 16 May 2015, (BM); O. White et al. 104, Alongside CV-6 south of Epína, 17 May 2015, (BM); O. White et al. 108, Las Rosas, 17 May 2015, (BM); O. White et al. 109, Las Rosas., 17 May 2015, (BM); O. White et al. 110, Between Las Rosas and La Palmita, 17 May 2015, (BM); O. White et al. 111, Between Las Rosas and La Palmita, 17 May 2015, (BM); O. White et al. 112, South of La Palmita, 17 May 2015, (BM); O. White et al. 113, South of La Palmita, 17 May 2015, (BM); O. White et al. 114, South of La Palmita, 17 May 2015, (BM); O. White et al. 115, North of La Palmita, 17 May 2015, (BM); R. T. Lowe 133, Hermigua, 18 April 1961, (BM).


Argyranthemum broussonetii (Pers.) C. J. Humphries

Chrysanthemum broussonetii Pers., Syn. Pl. 2 : 461 (1807). Type: Cult? Persoon (L L0012057, holo).

Leaves sessile, bipinnatifid, primary lobes ovate. Disc cypselae two-winged; ray cypselae typically solitary.

Tenerife: locally common in the Anaga peninsula of Tenerife; small populations also reported in the Orotava valley between Icod Alto and Realejo.

Along roadsides and open clearings of Laurus novocanariensis forest between 550 and 1000 m.

Specimens examined: SPAIN, CANARY ISLANDS, TENERIFE: A. E. Aldridge 558, Anaga. Near to Cumbrilla. Road to Taganana, 28 January 1973, (BM, RNG); A. E. Aldridge 625A, Near Aeropuerto. By side of Autopista. West of La Laguna, 29 January 1973, (BM, RNG); Broussonet s.n., s.l., 1801, (BM); C. E. Jarvis 467, Monte de las Mercedes, 22 April 1997, (BM, RNG); C. E. Jarvis & D. Bramwell 544, Sierra Anaga. 4km west of Chinobre, 03 May 1977, (BM, RNG); D. Bramwell 421, Pico Inglés, 02 December 1968, (RNG); D. Bramwell 1257, Icod el alto, above road to Realejo, 13 April 1969, (RNG); D. Bramwell 1534, Taganana, 21 May 1969, (RNG); D. Bramwell & C. J. Humphries 3364, Ridge between Las Animas and Azaro, 09 April 1971, (BM, RNG); D. Bramwell & C. J. Humphries 3378, Roque de las Pasas, 09 April 1971, (RNG); D. Bramwell & C. J. Humphries 3382, Punta de Anaga, Roque del Agua, 09 April 1971, (BM); D. Bramwell & C. J. Humphries 3382, Anaga, Roque del Aqua, 09 April 1971, (RNG); D. Bramwell & C. J. Humphries s.n., Roque de las Pasas, 09 April 1971, (BM); J.F.M., M.J. & P.F. Cannon 4426, Valle de Guerra to Cruz Chiquita, 01 April 1975, (BM); O. White et al. 69, La Cumbrilla, Anaga, 06 May 2015, (BM); O. White et al. 70, La Cumbrilla, Anaga, 06 May 2015, (BM); O. White et al. 71, La Cumbrilla, Anaga, 06 May 2015, (BM); O. White et al. 79, Path to Mesa del Sabinal, Anaga, 06 May 2015, (BM); O. White et al. 84, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 85, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 87, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 88, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 89, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 90, Barranco de Valle Crispin, 13 May 2015, (BM); O. White et al. 155, Roques del Fraile, 25 May 2015, (BM); O. White et al. 156, Roques del Fraile, 25 May 2015, (BM); O. White et al. 157, Roques del Fraile, 25 May 2015, (BM); O. White et al. 
679, Chamorga, Anaga, 20 June 2015, (BM); O. White et al. 683, Chamorga, Anaga, 20 June 2015, (BM); O. White et al. 686, Chamorga, Anaga, 20 June 2015, (BM); O. White et al. 688, Chamorga, Anaga, 20 June 2015, (BM); O. White et al. 726, Las Casas de la Cumbre, Anaga, 21 June 2015, (BM); R. & M. Dittrich s.n., WSW of Tegueste, on the road TF 5118, El Boqueron, 06 July 1976, (BM); R. P. Murray s.n., Anaga, 19 May 1890, (BM).


Chapter 6 Gene expression changes associated with homoploid hybrid speciation in the Macaronesian endemic genus Argyranthemum


Abstract

Ecological isolation is increasingly thought to play an important role in the origin and reproductive isolation of homoploid hybrid species. Novel gene combinations and/or transgressive gene expression represent ways in which ecological isolation might be achieved. In this study we employ comparative transcriptomics to investigate gene expression changes associated with the origin of two homoploid hybrid species, Argyranthemum sundingii and A. lemsii (Asteraceae), which are independently derived from crosses between the same parent species. As there is no standard methodology for comparative transcriptomics in the absence of a reference genome (as in Argyranthemum), we trial five different pipelines, and selected one which appears to show the least bias for the expression analysis, based on the quantification of expression across orthogroups. We identify differentially expressed loci between the parental taxa and between the homoploid hybrid species and their parents. In several cases these have a potential role in ecological adaptation and speciation. Although independently derived, the homoploid hybrid species have converged on similar expression phenotypes, likely as a consequence of adaptation to similar habitats.

Introduction

Homoploid hybrid speciation (HHS) is the origin of a new species by hybridisation without a change in chromosome number (Rieseberg, 1997; Soltis & Soltis, 2009). Although this phenomenon appears to be a rare occurrence in nature, the number of putative homoploid hybrid species documented is increasing, indicating it may be a more widespread phenomenon than we currently think (Mavárez & Linares, 2008; Nolte & Tautz, 2010; Abbott et al., 2013). The accumulation of reproductive barriers between the hybrid lineage and its parental progenitors as a result of hybridisation is a key criterion of HHS (Schumer et al., 2014), but has been proven experimentally in only a handful of cases, i.e. sunflowers (Helianthus L.; Rieseberg et al., 2003) and butterflies (Heliconius Kluk; Salazar et al., 2010). Two mechanisms are thought to contribute to the reproductive isolation of a novel hybrid from its parents during HHS, namely chromosomal recombination and/or ecological isolation (Gross & Rieseberg, 2005). However, these models are not mutually exclusive and both are likely to contribute to the origin of homoploid hybrid species (Rieseberg, 1997; Buerkle et al., 2000). Indeed, a recombinant genome in the hybrid may result from novel combinations of parental alleles through selection for ecological divergence (Rieseberg et al., 2003). Computer simulations indicate that HHS is unlikely in the absence of niche divergence (Buerkle et al., 2000), and this is backed up by empirical data showing that the majority of putative homoploid hybrid species occupy novel ecological habitats that are either intermediate (Brochmann et al., 2000; White et al., 2018) or extreme (Rieseberg et al., 2003; Howarth & Baum, 2005) with respect to their parents (Gross & Rieseberg, 2005; Kadereit, 2015).

How these processes are achieved at the genomic or transcriptomic level is not well understood. The combination of divergent genomes by hybridisation results in dramatic changes in gene expression (Hegarty et al., 2008) and it has been hypothesised that novel gene expression combinations and/or transgressive expression generated by hybridisation could play an important role in the ecological displacement, isolation and origin of a homoploid hybrid species (Lai et al., 2006; Hegarty et al., 2009). Microarray analyses have identified differentially expressed genes associated with the origin of two homoploid hybrid species, the desert sunflower (Helianthus deserticola Heiser; Lai et al., 2006) and the Oxford ragwort (Senecio squalidus L.; Hegarty et al., 2009). Comparative transcriptomic analyses across species are ideally suited for the identification of differentially expressed genes, but comparisons of homoploid hybrids and their parents using this approach have received relatively little attention.

Argyranthemum Webb (Asteraceae-Anthemideae), a genus of flowering plants endemic to Macaronesian archipelagos of the North Atlantic Ocean including Madeira, the Selvagem Pequena and the Canary Islands (Humphries, 1976), provides a well-documented model for investigating the role of transcriptomics in the evolution of HHS. Argyranthemum sundingii Borgen and A. lemsii Humphries are homoploid hybrid species, independently derived from crosses between A. broussonetii and subspecies of A. frutescens (Appendix Figure E.1; Brochmann, 1987; Fjellheim et al., 2009; White et al., 2018). In the Anaga Peninsula of Tenerife in the Canary Islands, A. frutescens occupies lowland coastal xerophytic habitats whilst A. broussonetii is distributed in higher laurel forests (Appendix Figure E.2; Humphries, 1976; Bramwell & Bramwell, 2001). Although the parental species can be found within only approximately 2 km of each other, they are separated by a steep ecological gradient characterised by decreasing temperature and increasing rainfall/humidity as altitude increases. The homoploid hybrid species are found at intermediate altitudes in South and North-East facing valleys of the Peninsula respectively, and niche modelling has demonstrated that the hybrid species occupy distinct ecological niches with respect to each other and their parents (Appendix Figure E.2; Brochmann et al., 2000; Fjellheim et al., 2009; White et al., 2018). Even though the homoploid hybrid species are independently derived, each has a similar genomic composition, with approximately 80 % of the genome derived from A. broussonetii and 20 % from A. frutescens (White et al., 2018).

117 Gene expression changes associated with homoploid hybrid speciation in the Macaronesian endemic genus Argyranthemum

The aim of this paper is to investigate the changes in gene expression associated with homoploid hybrid speciation in Argyranthemum. We first compare the transcriptomes of the parental taxa in this complex, identifying differentially expressed (DE) loci that may have diverged due to selection for local adaptation. Second, we identify loci in one or both of the homoploid hybrid species which are DE relative to both parental species and determine the proportion of loci that are transgressive and/or intermediate. Third, we assess the degree to which the independently derived homoploid hybrid species have converged on similar gene expression profiles. Finally, we determine the proportion of the transcripts which show parental-like expression. We investigate gene expression using plants grown under common conditions to control for expression differences due to environment. We therefore acknowledge that if any loci confer local adaptation solely because of environmentally-induced gene expression differences, these will be missed in our analysis.

For non-model species without a reference genome, several methods are available for the assembly and expression quantification of RNA-seq data, but as yet there is no ‘gold standard’ method. Determining gene expression similarities and differences across species raises particular considerations, especially without a reference genome, and we therefore compare a range of pipelines in an attempt to find the one that suffers the least bias.

The major consideration is that true orthologues are identified across species. For studies in which RNA-seq data is assembled from multiple species into a single assembly (e.g. Chapman et al., 2013), one assumes that orthologous loci in each taxon are sufficiently similar that they will ‘co-assemble’ (Figure 6.1; Gene 1). However, depending on the extent of genetic divergence between the study species, a subset of orthologous loci may have diverged sufficiently to prevent co-assembly, leading to separate assembly of the alleles into two loci and potentially false patterns of differential expression (Figure 6.1; Gene 2).


Figure 6.1 - Schematic diagram of two genes with similar expression (five reads each) across two species. Gene 1 is sufficiently similar across the two species such that the genes co-assemble providing an accurate impression of expression. Gene 2 has diverged such that the genes from each species do not co-assemble, leading to a false impression of differential expression.
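The effect shown in Figure 6.1 can be illustrated with a toy calculation. The sketch below is a minimal Python example using made-up read counts (five reads per gene per species, as in the figure); the locus names and the crude presence/absence rule are hypothetical and are not the differential expression test used in this chapter.

```python
# Toy read counts per assembled locus (species A, species B), mirroring Figure 6.1.
# Gene 1 co-assembles into one locus; Gene 2 splits into two species-specific loci.
counts = {
    "gene1": (5, 5),
    "gene2_speciesA_copy": (5, 0),
    "gene2_speciesB_copy": (0, 5),
}

def looks_differential(a, b):
    """Crude stand-in for a DE call: flag loci with reads in only one species."""
    return (a == 0) != (b == 0)

flagged = sorted(name for name, (a, b) in counts.items() if looks_differential(a, b))

# Summing the two Gene 2 loci restores the true, equal expression.
gene2_total = tuple(
    counts["gene2_speciesA_copy"][i] + counts["gene2_speciesB_copy"][i]
    for i in range(2)
)
```

Both species-specific copies of Gene 2 are flagged, although the summed counts are identical across species.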

To identify and combine orthologous loci across multiple species, a clustering algorithm such as CD-HIT (Li & Godzik, 2006; Fu et al., 2012) is frequently employed to cluster similar sequences from independently assembled transcriptomes (Hodgins et al., 2015; Roberts & Roalson, 2017; Ru et al., 2018). However, the appropriate clustering threshold is likely to vary with the study system, with the intended purpose of the clustering and across loci.

Another possibility is to assemble the data per species and then parse down to only the one-to-one orthologues (Harrison et al., 2015; Wright et al., 2015). However, one-to-one orthologues are likely to be relatively few for a transcriptomic investigation where some genes may be expressed at a low level or not expressed in the tissue(s) being studied, prohibiting their assembly and inclusion in downstream analyses. This is likely to limit the analysis to a small fraction of the available data, especially if large numbers of species are compared. Indeed, if one-to-one orthologues are used in differential expression experiments, it is likely that a large proportion of differentially expressed loci would be excluded prior to the analysis.

A final possibility includes the use of orthogroups, a set of genes that are descended from the last common ancestor of the species being considered (Emms & Kelly, 2015). Orthogroups are often used in comparative genomics (Hodgins et al., 2015; Wright et al., 2015; Grusz et al., 2016; Roberts & Roalson, 2017) and their use circumvents some of the disadvantages listed above for the other pipelines. However, by their definition orthogroups can also include paralogs that may interfere with the identification of differential expression.

As there is no consensus regarding the methodology for de novo comparative transcriptomics, we compare five pipelines which represent variations of the methodologies outlined above (see Figure 6.2, described in detail below).

Methods

6.3.1 Sampling

From July to August 2015, populations of A. broussonetii, A. frutescens, A. sundingii and A. lemsii in the Anaga peninsula of Tenerife were sampled for cypselae (single seeded fruit) and stored in silica gel (Appendix Table E.1; Appendix Figure E.2). Cypselae were chipped and soaked in 0.5 mg mL⁻¹ gibberellic acid (GA3) overnight at 4 ˚C (Francisco-Ortega et al., 1994). After rinsing with distilled H2O, each accession was sown directly onto a 2:1 mix of compost and perlite in a 5 cm diameter pot with a maximum of two cypselae per pot. Cypselae were left to germinate in an environmentally controlled room at a day temperature of 23 ˚C, night temperature of 18 ˚C, humidity of 60 % and day length of 16 hours. Pots were placed in a tray that was flooded daily so that each had an equal volume of water. For the first 10 days after sowing pots were covered with cling film to increase humidity. To minimise gene expression differences due to morphological differences, plants were sampled during the early stages of growth, upon development of the third true leaf, which was snap frozen in liquid nitrogen and stored in a -80 ˚C freezer until all samples were ready for extraction.

6.3.2 RNA extraction and sequencing

The frozen leaf tissue was homogenised using a pestle and mortar and RNA was isolated using an RNeasy Plant Mini Kit (QIAGEN, Manchester, UK) following the protocol instructions with an additional on-column DNase digestion step (RNase-free DNase, QIAGEN). RNA concentration was evaluated using a Quantus™ Fluorometer (Promega, Southampton, UK) and RNA quality was estimated by running ca. 400 ng of each sample on a 1 % agarose gel stained with GelRed (Biotium, UK). Two µg of RNA was preserved using RNAstable (Biomatrica, San Diego, USA) prior to shipping to Novogene (Hong Kong), where the samples were sequenced (after poly-A isolation) on an Illumina HiSeq.


6.3.3 Pre-processing

Poor quality sequence data and adapter sequences were removed with Trimmomatic (Bolger et al., 2014). Illumina clip parameters were set to 2 seed mismatches, a palindrome clip threshold of 30, a simple clip threshold of 10, a minimum adapter length of 8 and keep both reads set to TRUE. Leading and trailing quality thresholds were set to 5, with sliding window trimming using a window size of 4 and a required quality of 15, and a minimum read length of 36.
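For reference, the settings above can be written as Trimmomatic trimming steps. The following is a minimal sketch, assuming Trimmomatic's usual colon-separated step syntax; the adapter file name is hypothetical, since the adapter file used is not named in the text, and the rest of the command line (input and output files, threads) is omitted.

```python
# Trimming steps built from the parameters stated in the text.
# "adapters.fa" is a hypothetical adapter file name; the text does not name one.
adapter_file = "adapters.fa"

illumina_clip = ":".join([
    "ILLUMINACLIP", adapter_file,
    "2",     # seed mismatches
    "30",    # palindrome clip threshold
    "10",    # simple clip threshold
    "8",     # minimum adapter length
    "TRUE",  # keep both reads
])

trimming_steps = [
    illumina_clip,
    "LEADING:5",           # leading quality
    "TRAILING:5",          # trailing quality
    "SLIDINGWINDOW:4:15",  # window size 4, required quality 15
    "MINLEN:36",           # minimum read length
]
```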

6.3.4 De novo assembly and annotation

We investigated five pipelines with differing methods of de novo assembly and homology inference between species (Figure 6.2).

Figure 6.2 - Schematic of five pipelines used in our transcriptome assembly and transcript quantification.


For pipelines 1 and 2, reads from all 24 samples were normalised to a k-mer coverage of 30 and assembled using Trinity (version 2.4; Grabherr et al., 2013), generating an “interspecific assembly”. For pipelines 3 to 5, reads were normalised within species (6 individuals per species) and each set was assembled to create “species-specific assemblies”. For both types of assembly, transcripts were assembled with a minimum k-mer coverage of two, increasing the stringency for reads being assembled together. To account for polymorphism between individuals and species, up to 4 nucleotide differences and a 15 bp gap were allowed when assembling transcripts. Trinity assembles the reads into linear contigs, groups contigs that are related due to alternative splicing or gene duplication and identifies final full length transcripts and isoforms of transcripts (Grabherr et al., 2013).

Transcripts from the interspecific and species-specific assemblies were annotated separately using a blastn search (Camacho et al., 2009) against representative CDS sequences (primary transcript only) from Helianthus annuus L. (Badouin et al., 2017) and Arabidopsis thaliana (L.) Heynh (Lamesch et al., 2012) downloaded from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html; accessed 17/07/2018). An e-value cut-off of 10⁻²⁰ was used and 500 maximum target sequences were retained. The resulting hits were sorted by query ID and e-value before selecting the top hit for each transcript (Shah et al., 2018).
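The top-hit selection described above (sort by query ID and e-value, then keep the first hit per transcript) can be sketched as follows. This is an illustrative Python example only: the hits are invented, the function name is hypothetical, and the rows are simplified to (query, subject, e-value) rather than a full BLAST tabular record.

```python
def top_hits(rows):
    """Return the lowest e-value hit per query from (query, subject, evalue) rows."""
    best = {}
    # Sort by query ID, then e-value, so the first row seen per query is its top hit.
    for query, subject, evalue in sorted(rows, key=lambda r: (r[0], r[2])):
        best.setdefault(query, (subject, evalue))
    return best

E_VALUE_CUTOFF = 1e-20  # cut-off stated in the text

hits = [
    ("transcript_1", "subjectA", 1e-50),
    ("transcript_1", "subjectB", 1e-80),
    ("transcript_2", "subjectC", 1e-25),
    ("transcript_2", "subjectD", 1e-10),  # fails the cut-off
]
filtered = [h for h in hits if h[2] <= E_VALUE_CUTOFF]
annotation = top_hits(filtered)
```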

6.3.5 Quantifying transcript abundance (see Figure 6.2)

6.3.5.1 Pipeline 1

Transcript abundance across samples for each species was estimated by mapping filtered (non-normalised) reads to the interspecific assembly using the Trinity script align_and_estimate_abundance.pl and the RSEM alignment-based estimation method (Li & Dewey, 2011). Transcript abundance was quantified at the level of Trinity isoforms.

6.3.5.2 Pipeline 2

CD-HIT-EST (Li & Godzik, 2006; Fu et al., 2012) was used to collapse potentially orthologous sequences within the interspecific assembly using a sequence identity threshold of 0.95, which was chosen with the goal that divergent alleles would collapse, but that diverged paralogues would not. Transcript abundance was then quantified as for pipeline 1.


6.3.5.3 Pipeline 3

The four species-specific assemblies were combined and CD-HIT-EST was used in the same way as in pipeline 2, to collapse potentially orthologous sequences using the same sequence identity threshold. Transcript abundance was then quantified as for pipelines 1 and 2.

6.3.5.4 Pipeline 4

For each species-specific assembly, peptide open reading frames (ORFs) were predicted for every isoform using TransDecoder (v5.3.0; https://github.com/TransDecoder) and the longest ORF was retained. If no ORF was predicted, or all ORFs were < 100 amino acids, the isoform was excluded. To minimise the presence of multiple transcripts per gene, CD-HIT was used to cluster sequences with a similarity threshold of 0.995 for each species. One-to-one orthologues were identified using OrthoFinder (Emms & Kelly, 2015), which involves an all-by-all blast search (Camacho et al., 2009) followed by clustering of transcripts using mcl (van Dongen, 2000).

Gene expression was quantified with align_and_estimate_abundance.pl against a transcriptome comprising a concatenation of the four species-specific assemblies, with the gene_to_trans parameter specifying which transcripts compose each one-to-one orthologue. Transcript expression was quantified at the level of the one-to-one orthologues identified.
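As an illustration of the one-to-one filter and the locus-to-transcript mapping described above, the sketch below keeps only groups with exactly one transcript per species and writes one tab-delimited line per member transcript. The group and transcript identifiers are invented, and the two-column "locus, transcript" layout is an assumption about the mapping file rather than a documented detail of this analysis.

```python
SPECIES = ("bro", "fru", "sun", "lem")

def one_to_one(orthogroups):
    """Keep orthogroups with exactly one transcript in every species."""
    return {
        og: members for og, members in orthogroups.items()
        if all(len(members.get(sp, [])) == 1 for sp in SPECIES)
    }

def gene_to_transcript_lines(groups):
    """Tab-delimited 'locus<TAB>transcript' lines, one per member transcript."""
    return [
        f"{og}\t{tx}"
        for og, members in sorted(groups.items())
        for sp in SPECIES
        for tx in members[sp]
    ]

# Hypothetical orthogroups: species -> transcript IDs
orthogroups = {
    "OG1": {"bro": ["bro_t1"], "fru": ["fru_t7"], "sun": ["sun_t3"], "lem": ["lem_t9"]},
    "OG2": {"bro": ["bro_t2", "bro_t5"], "fru": ["fru_t1"], "sun": ["sun_t4"], "lem": ["lem_t2"]},
    "OG3": {"bro": ["bro_t8"], "fru": ["fru_t2"], "sun": ["sun_t6"]},  # absent in lem
}
single_copy = one_to_one(orthogroups)
map_lines = gene_to_transcript_lines(single_copy)
```

Only OG1 survives the filter here, which mirrors how duplicated or unassembled transcripts in any one species remove a locus from pipeline 4.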

6.3.5.5 Pipeline 5

For the identification of orthogroups we carried out the same steps as for pipeline 4, but we also included representative peptide sequences (primary transcript only) from five reference taxa in our OrthoFinder analysis to improve orthogroup inference. The reference taxa were Helianthus annuus (Badouin et al., 2017), Lactuca sativa L. (Reyes-Chin-Wo et al., 2017), Solanum lycopersicum L. (The Tomato Genome Consortium, 2012), Mimulus guttatus DC. (Hellsten et al., 2013) and Arabidopsis thaliana (Lamesch et al., 2012) downloaded from Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html; accessed 17/07/2018). On inspection of orthogroups resulting from running OrthoFinder with the default parameters it was clear that there were many paralogous transcripts in some of the orthogroups. To minimise clustering of gene paralogues we investigated the effect of increasing the inflation parameter (-I) of mcl in our OrthoFinder analysis, which increases the clustering threshold. We tested inflation parameter values from 1.5 (default) to 3.9, in steps of 0.2.
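The grid of inflation values tested can be enumerated as below. This is a small sketch: the input directory name is hypothetical, and the command lists simply pair OrthoFinder's -f (input directory) and -I (mcl inflation) options with each value rather than reproducing the exact commands that were run.

```python
# Inflation values from 1.5 (default) to 3.9 in steps of 0.2, as stated in the text.
# Rounding each value avoids floating-point drift from repeated addition.
inflation_values = [round(1.5 + 0.2 * i, 1) for i in range(13)]

# One OrthoFinder invocation per value; "proteomes/" is a hypothetical input directory.
commands = [["orthofinder", "-f", "proteomes/", "-I", str(v)] for v in inflation_values]
```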

For the outputs derived from increasing the mcl inflation parameter, we compared (1) the number of orthogroups, (2) the proportion of sequences in orthogroups, (3) the number of monophyletic orthogroups and (4) the proportion of orthogroups comprised of transcripts with a single BLAST hit. To determine the number of monophyletic orthogroups (i.e. where all species or Argyranthemum as a whole were monophyletic) we used the reconciled gene trees generated by OrthoFinder and the R script check_genus_monophyly.R (https://gist.github.com/josephwb/f3d35f8833a07f71002af7726b12652b; accessed 29/09/2018). To determine the proportion of orthogroups comprised of transcripts with a single BLAST hit we used the previous results from the BLAST searches against Helianthus and Arabidopsis. We reason that the results from the chosen mcl parameter should exhibit a relatively large number of orthogroups, without a compromise in the number of sequences that are present in the orthogroups, and a large proportion of the orthogroups should be monophyletic and found to correspond to only a single BLAST hit in both Helianthus and Arabidopsis. After selecting this, expression was quantified in the same way as pipeline 4, quantifying transcript expression at the level of the orthogroups identified.

6.3.6 Differential expression

Following transcript quantification, an expression matrix with counts and TMM-normalised counts was built for each pipeline using the Trinity script abundance_estimates_to_matrix.pl. Pair-wise differential expression was performed using the R package edgeR (Robinson et al., 2009; McCarthy et al., 2012) within Trinity, utilising the script run_DE_analysis.pl. Differentially expressed (DE) transcripts, orthologues or orthogroups (hereafter, simply loci) were identified in pairwise comparisons amongst species using a P value cut-off of 0.05 after false discovery rate (FDR) correction for multiple comparisons (Benjamini & Hochberg, 1995).
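For readers unfamiliar with the FDR correction of Benjamini & Hochberg (1995), the adjustment can be sketched in a few lines. This is a minimal Python illustration with invented P values; in this chapter the correction was applied through edgeR and the Trinity scripts, not through this code.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]  # hypothetical per-locus P values
q_values = benjamini_hochberg(p_values)
is_de = [q < 0.05 for q in q_values]  # cut-off of 0.05 after FDR correction
```

With these invented values only the first locus remains significant after correction, even though three raw P values fall below 0.05.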

6.3.7 Pipeline comparison

To select a single pipeline for further analysis, comparisons were made between the five pipelines for the enriched Gene Ontology (GO) terms in the loci identified as DE between the parental species using AgriGO (Du et al., 2010; Tian et al., 2017) and the top Arabidopsis hit for each locus, with Fisher’s exact tests and an FDR-corrected P value cut-off of 0.05. For the outputs of pipelines 1-3 we simply used the top A. thaliana BLAST hit for the locus, whereas for pipelines 4 and 5 we used the A. thaliana hit with the lowest e-value out of all of the transcripts present in each orthologue/orthogroup. We also tested the hypothesis that orthologous loci that do not co-assemble are enriched in the differential expression results, as depicted in Figure 6.1. Specifically, we carried out an all-by-all BLAST for our interspecific and species-specific transcriptome assemblies and used a χ2 test to determine whether DE transcripts have a best BLAST hit that is also DE more often than expected by chance. The longest transcript from each orthologue and orthogroup was selected as its representative in the all-by-all BLAST.

Based on these comparisons, pipeline 5 was selected for the complete analysis (see Results; Comparison of pipelines), analysing the transcriptomes for novel expression in the homoploid hybrid species, differential expression between the two homoploid hybrid species, and parent-like expression in the hybrid species (see Table 6.1 for definitions of these categories). GO enrichment tests were performed as above for all lists of DE loci.

To assess the degree to which the independently derived homoploid hybrid species have converged on similar gene expression, we compared the expression profiles of replicate samples of each species using the Trinity script PtR. A sample correlation matrix and principal component analysis were generated from the orthogroup counts matrix produced by the Trinity script abundance_estimates_to_matrix.pl as mentioned above, with counts per million (CPM) and log2 transformation.
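The transformation and correlation underlying such a sample comparison can be sketched as follows. This is an illustrative Python example with invented counts; the pseudocount of 1 added before the log2 transformation is an assumption made here to avoid taking the log of zero, and the functions are not part of PtR.

```python
import math

def log2_cpm(counts, pseudocount=1.0):
    """log2(CPM + pseudocount) for one sample's counts."""
    total = sum(counts)
    return [math.log2(c / total * 1e6 + pseudocount) for c in counts]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

sample_a = [10, 200, 30, 760]    # hypothetical orthogroup counts
sample_b = [20, 400, 60, 1520]   # same profile at twice the sequencing depth
r = pearson(log2_cpm(sample_a), log2_cpm(sample_b))
```

Because CPM scales each sample by its library size, the two samples above are perfectly correlated despite the twofold difference in depth.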

Table 6.1 - Expression phenotypes used to identify transcripts (A) that are differentially expressed between the parental species, (B) with novel expression in the homoploid hybrid species, (C) that are differentially expressed between the two homoploid hybrid species, and (D) with parent-like expression in the hybrid species. Taxa are abbreviated as bro (A. broussonetii), fru (A. frutescens), sun (A. sundingii) and lem (A. lemsii). Differential expression (DE) = ×, no DE = • and either = -.

Type  Expression phenotype  bro v fru  bro v sun  bro v lem  fru v sun  fru v lem  sun v lem
A     Between parents       ×          -          -          -          -          -
B     Novel sun             -          ×          -          ×          -          ×
B     Novel lem             -          -          ×          -          ×          ×
B     Novel hhs             -          ×          ×          ×          ×          •
C     Between hhs           -          -          -          -          -          ×
D     sun bro-like          ×          •          -          ×          -          -
D     sun fru-like          ×          ×          -          •          -          -
D     lem bro-like          ×          -          •          -          ×          -
D     lem fru-like          ×          -          ×          -          •          -
D     HHS bro-like          ×          •          •          ×          ×          •
D     HHS fru-like          ×          ×          ×          •          •          •
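The definitions in Table 6.1 amount to pattern matching over the six pairwise comparisons, which can be sketched as below. This is a minimal Python encoding of the table (True = DE, False = no DE, None = either); the function and variable names are hypothetical, and the sketch is not the code used to produce the results.

```python
# Order of comparisons: bro v fru, bro v sun, bro v lem, fru v sun, fru v lem, sun v lem
PHENOTYPES = {
    "Between parents": (True, None, None, None, None, None),
    "Novel sun":       (None, True, None, True, None, True),
    "Novel lem":       (None, None, True, None, True, True),
    "Novel hhs":       (None, True, True, True, True, False),
    "Between hhs":     (None, None, None, None, None, True),
    "sun bro-like":    (True, False, None, True, None, None),
    "sun fru-like":    (True, True, None, False, None, None),
    "lem bro-like":    (True, None, False, None, True, None),
    "lem fru-like":    (True, None, True, None, False, None),
    "HHS bro-like":    (True, False, False, True, True, False),
    "HHS fru-like":    (True, True, True, False, False, False),
}

def matching_phenotypes(de_calls):
    """Return every phenotype whose pattern is satisfied by six DE calls (booleans)."""
    return [
        name for name, pattern in PHENOTYPES.items()
        if all(p is None or p == call for p, call in zip(pattern, de_calls))
    ]

# Hypothetical locus: DE between the parents and between both hybrids and A. frutescens only.
example = matching_phenotypes((True, False, False, True, True, False))
```

A single locus can satisfy several definitions at once; the example locus is DE between the parents and A. broussonetii-like in each hybrid species individually and in both together.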


Results

6.4.1 RNA-seq processing and assembly

RNA-seq produced 674 M raw reads with an average of 28.1 M (± 0.9 M [SE]) per sample. Filtering of poor-quality reads removed approximately 5.21 % of reads, resulting in a total of approximately 639 M and an average of 26.6 M (± 0.8 M) per sample (Appendix Table E.2). Normalisation within species retained on average 9.25 M reads (5.86 % of the input). When the normalised reads were combined and re-normalised for the “interspecific assembly”, 19.81 M (53.53 %) reads were retained (Appendix Table E.3).

The interspecific assembly (used in pipelines 1 and 2) had 0.512 M genes and 1.083 M transcripts, with an average contig size of 585 bp and N50 of 763 bp. The species-specific assemblies (used in pipelines 3 to 5) resulted in an average of 0.218 M genes and 0.437 M transcripts per species, with an average contig size of 711 bp and N50 of 1042 bp (Appendix Table E.4).
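For clarity, N50 is the contig length at which contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch of the calculation, using invented contig lengths, is given below; the assembly statistics reported above were not computed with this code.

```python
def n50(lengths):
    """Length L such that contigs of length >= L contain at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths supplied")

contig_lengths = [100, 200, 300, 400, 500]  # hypothetical contig lengths in bp
assembly_n50 = n50(contig_lengths)
mean_length = sum(contig_lengths) / len(contig_lengths)
```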

In pipeline 2, collapsing similar transcripts with CD-HIT-EST retained 0.687 M transcripts. For pipeline 3, combining the species-specific assemblies and collapsing similar transcripts resulted in 0.785 M transcripts. For pipelines 4 and 5, an average of 0.143 M ORFs per species were identified (i.e. approx. 32.7 % of transcripts) and clustering with CD-HIT resulted in an average of 0.093 M transcripts per species.

In pipeline 4, running OrthoFinder with only transcripts from Argyranthemum identified 7,266 one-to-one orthologues. For pipeline 5, increasing the inflation parameter values increased the number of orthogroups (both across all species and within Argyranthemum), as well as the number of monophyletic orthogroups (Figure 6.3A). This was paralleled by a modest reduction in the percentage of genes in an orthogroup above an inflation parameter value of 2.3 (Figure 6.3B). However, there was no appreciable change in the proportion of orthogroups composed of transcripts with different best blast hits in either Arabidopsis or Helianthus (Appendix Figure E.4). Our OrthoFinder results based on an inflation parameter of 2.3 (54,094 monophyletic orthogroups) were selected for further analysis as this value represented the point at which there was a high proportion of monophyletic orthogroups, without a reduction in the percentage of transcripts in an orthogroup (Figure 6.3; Appendix Table E.6).


Figure 6.3 - Change in (A) number of orthogroups and (B) percentage of genes in orthogroups with increasing values for the mcl inflation parameter.

6.4.2 Comparison of pipelines

Differential expression between the parental species was compared for all pipelines implemented. Pipelines 1, 2 and 3 identified 22,472 (2.07 %), 16,100 (2.34 %) and 28,678 (3.65 %) DE loci respectively. Pipelines 4 and 5 identified 124 (1.71 %) and 2,661 (4.92 %) DE loci respectively. Only two GO terms were identified in all five analyses: cellular nitrogen compound metabolic process (GO:0034641) and biosynthetic processes (GO:0044271; Figure 6.4). Excluding pipeline 4, in which only a very small proportion of loci were DE, a total of 30 GO terms were shared between the pipelines (Figure 6.4). These included response to light stimulus (GO:0009416), response to temperature stimulus (GO:0009266), response to inorganic substance (GO:0010035) and light harvesting complex (GO:0030076) as well as related terms (Appendix Table E.7). Pipelines 2 and 5 have relatively distinct GO terms compared to the other pipelines, with 44 and 96 unique GO terms respectively (Figure 6.4).


Figure 6.4 - Venn diagram depicting the degree of overlap between over-represented gene ontology terms for differentially expressed transcripts between the parental species A. frutescens and A. broussonetii for pipelines 1-5.

In pipelines 1, 2, 3 and 5, χ2 tests found that DE transcripts have a best blast hit that is also DE more often than expected by chance (all P < 0.0001), the pattern we would expect if false patterns of differential expression were being recovered because orthologous loci were not co-assembling (see Figure 6.1). A greater number of reads successfully mapped back to a de novo assembly of the same species than to a de novo assembly of another species (data not shown), which further suggests that true orthologues might be sufficiently diverged between species to preclude their co-assembly. The skew from chance association, assessed by a φ correlation (where a score of 0 indicates no skew and a score of 1 indicates a strong positive correlation), was greatest for pipelines 1 and 2 (0.2177 and 0.2420), and least for pipeline 5 (0.1301).
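The relationship between the χ2 statistic and the φ correlation for a 2 × 2 contingency table can be sketched as follows. The counts are invented for illustration, and the sketch assumes Pearson's χ2 without continuity correction; whether a correction was applied in the analysis above is not stated.

```python
import math

def chi2_and_phi(a, b, c, d):
    """Pearson chi-squared (no continuity correction) and phi for a 2x2 table.

    a: DE transcript, best hit DE        b: DE transcript, best hit not DE
    c: non-DE transcript, best hit DE    d: non-DE transcript, best hit not DE
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom
    phi = (a * d - b * c) / math.sqrt(denom)
    return chi2, phi

chi2, phi = chi2_and_phi(30, 70, 70, 830)  # hypothetical counts
```

Under these assumptions φ² equals χ2 divided by the total count, so φ expresses the strength of the association independently of how many transcripts were tested.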

Whilst the χ2 test for pipeline 4 was not significant, suggesting that true orthologues are not failing to co-assemble, this is presumably because only the orthogroups with the strictest evidence of being true orthologues were retained. This meant that only 7266 loci could be assessed for DE (compared to >54,000 loci for the other pipelines), which represents only 0.67 % of the transcripts per species. Further, as mentioned in the introduction, analysing only 1-to-1 orthologues likely removes those loci with the strongest true differential expression. Indeed, this pipeline produced the smallest proportion of DE loci.

Based on the evidence above we employed pipeline 5 for further downstream analyses as it appears to reduce the problem of orthologous loci not co-assembling, does not remove a large proportion of DE loci prior to analysis, and maximises the proportion of data being analysed.

6.4.3 Global gene expression in the parents and hybrid species

Our sample correlation matrix and principal components analysis showed a clear distinction between the parental species and the subspecies of A. frutescens (Figure 6.5). The homoploid hybrid species A. sundingii and A. lemsii appear to have converged on similar expression profiles and are distinct relative to their parental progenitors. On the second and third principal components there is some separation between A. sundingii and A. lemsii. However, one sample (A. sundingii 12) appears to be an outlier (Figure 6.5).

6.4.4 Expression analysis suggests extensive divergence in genes involved in local adaptation

The number of DE orthogroups and enriched GO terms was greatest between the parents, with 2661 (4.92 %) and 188 respectively (Table 6.2). GO terms were associated with a wide variety of biological processes (118), cellular components (12) and molecular functions (58; Appendix Table E.8). The three most enriched GO terms were response to stimulus (GO:0050896), response to chemical stimulus (GO:0042221) and small molecule metabolic process (GO:0044281). Other related GO terms significantly enriched in the orthogroups DE between the parents include response to temperature stimulus (GO:0009266), response to cold (GO:0009409), response to radiation (GO:0009314), response to water deprivation (GO:0009414), response to high light intensity (GO:0009644) and response to red or far red light (GO:0009639).


Figure 6.5 - Sample correlation matrix (A) and principal components analysis (B) of expression generated using Trinity script PtR. For the sample correlation matrix the colour key shows Pearson’s correlation in expression between samples and the dendrograms above and to the left show similarity/dissimilarities between samples. The first, second and third axes are shown in the principal components analysis.


Table 6.2 - Number of differentially expressed (DE) orthogroups, blast hits in Arabidopsis thaliana, annotated hits in A. thaliana and significantly enriched gene ontology (GO) terms for each expression phenotype.

Expression phenotype  DE    Blast hit  Annotated  GO terms
Between parents       2661  1304       974        188
Novel sun             20    11         10         0
Novel lem             18    9          9          0
Novel hhs             53    21         17         0
Between hhs           134   70         60         3
sun bro-like          459   249        210        5
sun fru-like          448   209        181        17
lem bro-like          664   359        293        15
lem fru-like          399   188        162        31
HHS bro-like          277   157        134        3
HHS fru-like          208   101        89         18

Relatively few loci exhibited patterns of novel expression unique to one of the homoploid hybrid species or shared between the homoploid hybrid species. In total, 20 (0.04 %) loci were DE in A. sundingii, 18 (0.03 %) in A. lemsii and 53 (0.10 %) were shared between A. sundingii and A. lemsii (Table 6.2). No GO terms were significantly over-represented in any of these three sets of loci after correcting for multiple testing. However, a small number of DE loci appear to be orthologous to genes involved in ecological adaptation and relevant morphological characteristics in A. thaliana (Accompanying material E.1). For example, a DE locus in A. sundingii appears orthologous to PIP2, involved in water transport, and a DE locus in A. lemsii appears orthologous to a protein with 1-deoxyxylulose 5-phosphate synthase activity, essential for chloroplast development. One locus DE between both homoploid hybrid species and the parents is a putative orthologue of a member of the BEL family of homeodomain proteins which affect leaf serration in A. thaliana. All of the loci DE in one or both homoploid hybrid species exhibited a transgressive expression pattern, either up-regulated or down-regulated with respect to the parental progenitors. In A. sundingii, an equal proportion of DE loci were up-regulated (10/20) and down-regulated (10/20) with respect to the parental progenitors. For A. lemsii, 6 loci were up-regulated and 12 were down-regulated. For novel expression shared between the homoploid hybrid species, 43 loci were up-regulated whereas 10 were down-regulated. A total of 134 (0.25 %) loci were DE between A. sundingii and A. lemsii (Table 6.2), and these were significantly enriched for three GO terms: response to stress (GO:0006950), cellular carbohydrate metabolic process (GO:0044262) and response to stimulus (GO:0050896; Appendix Table E.9).
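The distinction between transgressive and intermediate expression can be made explicit with a small sketch. The Python function below is illustrative only: it assumes the locus has already been called DE between the hybrid and both parents, and it takes hypothetical mean normalised expression values as input.

```python
def expression_pattern(hybrid, parent_a, parent_b):
    """Classify hybrid expression relative to the range spanned by the two parents."""
    low, high = min(parent_a, parent_b), max(parent_a, parent_b)
    if hybrid > high:
        return "transgressive (up-regulated)"
    if hybrid < low:
        return "transgressive (down-regulated)"
    return "intermediate"

# Hypothetical mean expression values: hybrid, A. broussonetii, A. frutescens
up = expression_pattern(12.0, 8.0, 9.5)
down = expression_pattern(3.0, 8.0, 9.5)
mid = expression_pattern(9.0, 8.0, 9.5)
```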

For a locus to be ‘parental-like’ in terms of expression in the hybrid species (individually or shared), expression had to be significantly DE between the hybrid and one parent, and not DE between the hybrid and the other parent. Necessarily this was limited to the 2661 loci that were significantly DE between the parents. In A. sundingii, 459 (0.85 %) loci were A. broussonetii-like and 448 (0.83 %) were A. frutescens-like in their expression. In A. lemsii, 664 (1.24 %) loci were A. broussonetii-like and 399 (0.74 %) were A. frutescens-like in their expression. For the hybrid species together, there were 277 (0.51 %) loci with A. broussonetii-like expression and 208 (0.38 %) loci with A. frutescens-like expression (Table 6.2).

We found a large number of GO terms significantly over-represented in the lists of loci with parent-like expression in each of the categories listed above (Appendix Tables E.10-E.15) and several were associated with potential adaptation to the environment. In A. sundingii, loci with parent-like expression were associated with GO terms for response to external stimulus (GO:0009605), oxidoreductase activity (GO:0016491), response to stress (GO:0006950), and response to abiotic stimulus (GO:0009628; Appendix Tables E.10-E.11). Similarly for A. lemsii, loci with parent-like expression were enriched for GO terms such as response to stimulus (GO:0050896), response to cold (GO:0009409), response to light stimulus (GO:0009416) and response to temperature stimulus (GO:0009266; Appendix Tables E.12-E.13). For the loci exhibiting shared parent-like expression, GO terms response to temperature stimulus (GO:0009266), response to stress (GO:0006950) and response to abiotic stimulus (GO:0009628) were over-represented (Appendix Tables E.14-E.15).

Discussion

6.5.1 Assessment of assembly pipelines

In this study, we compared a range of pipelines for de novo comparative gene expression analyses between four Argyranthemum species, namely two homoploid hybrids A. sundingii and A. lemsii and their parental species A. broussonetii and A. frutescens. A method based on the identification of orthogroups was employed and loci potentially associated with the origin of the homoploid hybrid species were identified. The approach we selected is advantageous in that it maximises the proportion of data included in the analysis whilst filtering out orthogroups with evidence for being comprised of paralogous sequences. Compared to earlier studies based on a single interspecific assembly, we expect that our results are less likely to recover false patterns of differential expression as a result of loci across species not co-assembling (see Figure 6.1).

Comparative transcriptomics has been employed across multiple species to recover orthologues in Entelegyne spiders (3,345 genes; Carlson & Hedin, 2017) and New World lupins (6013 genes; Nevado et al., 2016). In each of these studies, a large proportion of the data was excluded from the analyses during orthologue identification, exactly as we found for pipeline 4, where less than 1 % of the transcripts per species were retained in the one-to-one orthologues. The aims of these previous studies were not to identify DE genes involved in speciation, but rather to resolve phylogenetic relationships, investigate changes in selection acting on coding sequences (Nevado et al., 2016), and identify orphan genes (Carlson & Hedin, 2017). As such, the use of orthologues was entirely suitable for their study aims. However, we demonstrate here that the use of 1-to-1 orthologues was too restrictive for meaningful comparative gene expression analyses across species.

6.5.2 Identification of DE orthogroups

Differential expression was greatest between the parental species, with 2661 DE orthogroups, from which 188 GO terms were significantly over-represented. The parental species occupy the altitudinal extremes of the Anaga peninsula of Tenerife (< 200 m A. frutescens; > 500 m A. broussonetii), which has steep ecological gradients in temperature, rainfall and humidity (Brochmann et al., 2000; Fjellheim et al., 2009). Temperature was identified as an important contributor to the distribution of the parental species using ecological niche modelling (White et al., 2018), and GO terms related to response to temperature and cold were identified. Other ecologically relevant GO terms identified included response to radiation, water deprivation, response to high light intensity and response to red or far red light, suggesting further gene expression divergence related to changes in altitude. Selection for expression divergence of genes related to response to light wavelength may indicate differential adaptation to shaded environments; A. broussonetii is often found in gaps in the laurel forest, whereas A. frutescens is typically found in exposed environments.

Patterns of novel gene expression were identified in the homoploid hybrid species: 20 orthogroups were expressed in a novel manner solely in A. sundingii, 18 in A. lemsii and 53 in both homoploid hybrid species. In all of these cases, the pattern of expression was transgressive. This may indicate that expression divergence between the hybrid species and their parents involves a significant amount of novel expression, as opposed to the ‘averaging’ of parental expression. It should be noted, however, that the conditions under which a hybrid is identified as significantly intermediate to both parents are quite restrictive, requiring DE between the parents as well as between the hybrid and each parent.
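To make the restrictiveness of the intermediate category concrete, the classification logic can be sketched as follows. This is a minimal illustration in Python and not the code used in this study: the `Locus` fields, the `classify` function and the category labels are hypothetical, and the pairwise DE calls are assumed to be supplied as booleans by an upstream differential expression test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Mean expression and pairwise DE calls for one orthogroup (hypothetical layout)."""
    hybrid: float
    parent_a: float
    parent_b: float
    de_hybrid_a: bool  # hybrid vs parent A significantly DE
    de_hybrid_b: bool  # hybrid vs parent B significantly DE
    de_parents: bool   # parent A vs parent B significantly DE


def classify(locus: Locus) -> str:
    """Assign an expression category under the assumptions stated above."""
    low = min(locus.parent_a, locus.parent_b)
    high = max(locus.parent_a, locus.parent_b)
    if locus.de_hybrid_a and locus.de_hybrid_b:
        # Hybrid differs from both parents and lies outside the parental range.
        if locus.hybrid < low or locus.hybrid > high:
            return "transgressive"
        # Inside the parental range: additionally require DE between the parents.
        if locus.de_parents:
            return "intermediate"
        return "unclassified"
    # Parent-like: parents differ and the hybrid differs from only one of them.
    if locus.de_parents and locus.de_hybrid_b and not locus.de_hybrid_a:
        return "parent A-like"
    if locus.de_parents and locus.de_hybrid_a and not locus.de_hybrid_b:
        return "parent B-like"
    return "unclassified"
```

Under these assumptions a transgressive call rests on two significant contrasts, whereas an intermediate call rests on three, which is one reason why fewer intermediate loci would be expected.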

While the DE loci with BLAST annotations were too few for a meaningful GO analysis, there was evidence of transgressive up-regulation of loci involved in water transport in A. sundingii, chloroplast development in A. lemsii and leaf serration in both homoploid hybrid species.

The genomic make-up of the two homoploid hybrid species is not intermediate with respect to the parental species, with a greater contribution from A. broussonetii than from A. frutescens (approximately 80:20 for A. broussonetii and A. frutescens respectively, based on STRUCTURE analyses; White et al., 2018). However, for gene expression there was a less obvious bias in terms of parent-like expression. For A. sundingii, the relative parental contribution was almost equal between A. broussonetii-like (459; 51 %) and A. frutescens-like (448; 49 %) loci. For A. lemsii, there was a bias towards A. broussonetii-like expression (664; 62 %) relative to A. frutescens-like expression (399; 38 %). For loci showing shared parent-like expression in both A. sundingii and A. lemsii, there was again a slight bias towards A. broussonetii-like expression (277; 57 %) compared with A. frutescens-like expression (208; 43 %). GO terms related to ecological tolerances were again resolved in these sets of loci, including response to temperature and response to cold. Overall, there seems to be a pervasive signature of selection for expression divergence in genes related to the divergence in ecology between these taxa.
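The percentages quoted above follow directly from the locus counts. A short Python check is given below as an illustration only; the function name `parental_split` and the dictionary layout are hypothetical and are not part of the analysis pipeline.

```python
def parental_split(bro_like: int, fru_like: int) -> tuple[int, int]:
    """Return rounded percentages of A. broussonetii-like and A. frutescens-like loci."""
    total = bro_like + fru_like
    return round(100 * bro_like / total), round(100 * fru_like / total)


# Counts of parent-like loci as reported in the text.
counts = {
    "A. sundingii": (459, 448),
    "A. lemsii": (664, 399),
    "shared": (277, 208),
}

for label, (bro, fru) in counts.items():
    bro_pct, fru_pct = parental_split(bro, fru)
    print(f"{label}: {bro_pct} % A. broussonetii-like, {fru_pct} % A. frutescens-like")
```

None of the three splits approaches the roughly 80:20 genomic contribution reported from the STRUCTURE analyses.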

6.5.3 Changes in gene expression associated with HHS

As far as we are aware, only two studies have employed gene expression analyses to identify changes in expression associated with homoploid hybrid species, both of which were based on microarrays. Lai et al. (2006) investigated gene expression divergence between Helianthus deserticola and its parental species H. annuus and H. petiolaris and identified 2.0 %, 3.3 %, and 5.8 % of 2897 genes which were transgressive, annuus-like, and petiolaris-like, respectively, in their expression. Similarly, Hegarty et al. (2009) investigated gene expression divergence between Senecio squalidus and its parental species S. chrysanthemifolius and S. aethnensis. Of the 311 cDNA clones screened, 203 (65.27 %) were transgressive in S. squalidus. In these studies (Lai et al., 2006; Hegarty et al., 2009) and in ours, genes with potential roles in ecological divergence were present in the lists of loci with transgressive expression phenotypes in the homoploid hybrid species. However, it should be noted that for S. squalidus, there is geographic isolation between the homoploid hybrid (UK) and its parent species (Mt. Etna, Sicily). Therefore, the role of ecological divergence in the origin of S. squalidus is not clear. Overall, the presence of ecologically relevant genes among the transgressively expressed loci supports the hypothesis that ecological divergence is important in the origin of homoploid hybrid species (Lai et al., 2006; Hegarty et al., 2009), and in some cases this appears to result from the evolution of novel transgressive gene expression.

Relatively few differences in gene expression were identified between the two homoploid hybrid species, with 134 DE loci, considerably fewer than between the parents (2661 loci). It has recently been shown that the two hybrid species arose independently and occupy distinct ecological niches in Tenerife (White et al., 2018). GO terms over-represented in the loci DE between A. sundingii and A. lemsii include several with potential roles in ecological adaptation, for example response to stress and response to stimuli. The two hybrid species occupy the South and North West facing slopes of the Anaga peninsula, in contrasting ecological habitats due to the absence and presence of the northern trade winds, respectively (White et al., 2018). Indeed, it appears that the small subset of loci which are DE between the homoploid hybrid species may have a role in ecological adaptation. However, given the relative paucity of DE loci between the two hybrid species, the shared patterns of novel gene expression, and their intermediate and overlapping gene expression patterns, it seems that gene expression in the hybrid species is much more intermediate than their genomic composition, which is strongly biased towards A. broussonetii.

Although our methodology maximised the proportion of data being analysed, it is noteworthy that only a relatively small proportion of orthogroups were differentially expressed. There could be several reasons for this finding. First, the hybrid species originated 2-4 mya (White et al., 2018). Hence, there has been comparatively little evolutionary time for large-scale gene expression divergence to evolve. Second, the plants were grown in a common environment. Therefore, if ecological adaptation is conferred by plastic gene expression differences that are only observed in the native environments, these would not have been identified. Third, ecological adaptation and speciation may be conferred by sequence differences as opposed to expression differences. Since our chosen pipeline groups transcripts and does not involve the generation of a single reference transcriptome, we cannot generate haplotype sequences for the individuals or species; hence sequences cannot be compared to identify loci exhibiting the hallmarks of divergent selection. Finally, our analysis was based on transcriptomes from seedling leaf tissue; hence any differential expression in other tissues would have been missed.

Building upon this study, future work to further examine the genetic basis of ecological divergence and speciation should investigate gene expression from wild material as well as reciprocal transplant material.

6.5.4 Conclusions

Argyranthemum is a model for studying the causes and consequences of homoploid hybrid speciation, with two independently derived hybrid species from the same parental cross and a likely role for ecological speciation in each case. We identified DE loci associated with a potential role in ecological speciation between the parental taxa, novel expression in the homoploid hybrid species, differential expression between the two homoploid hybrid species and parent-like expression in the hybrid species. Although independently derived, the homoploid hybrid species appear to have converged on similar expression phenotypes, likely as a consequence of adaptation to intermediate habitats in the Anaga peninsula. However, there are subtle differences between A. sundingii and A. lemsii, which likely reflect the ecological differences between the South and North West facing slopes of the Anaga peninsula respectively.

Comparative transcriptomics across species offers a unique opportunity to investigate the gene expression changes associated with speciation and is gaining momentum as an approach for this purpose (Pavey et al., 2010; Chapman et al., 2013; Dunning et al., 2016). However, our results clearly show that different pipelines can produce different outputs, and alternative pipelines should therefore be compared.

Chapter 7 Conclusions and future directions

Argyranthemum is the largest endemic genus of flowering plants in the Macaronesian archipelagos and an ideal model for investigating the processes responsible for the diversification of oceanic island endemic lineages. In this thesis, I sought to investigate the processes responsible for diversification of Argyranthemum using NGS methodologies. Specifically, the aims of this project were to:

1. Address outstanding questions surrounding the origin of the homoploid hybrid species A. sundingii and A. lemsii.
2. Investigate the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of Argyranthemum.
3. Employ comparative transcriptomics to investigate changes in gene expression associated with HHS using plants grown under common conditions.

In chapter two, simple sequence repeat (SSR) markers were designed for Macaronesian endemic genera, including Argyranthemum. This was an important development required to overcome the limitations of earlier studies, which were hampered by a lack of variation in commonly used molecular markers. Thirty SSR markers were designed, of which 12 could be amplified by PCR and eight were polymorphic. This demonstrated the value of transcriptomes as a genetic resource for non-model systems, and the markers generated were used for subsequent population genetic analyses (chapter three).

Several outstanding questions surrounding the status and origin of the homoploid hybrid species in Argyranthemum were addressed in chapter three. Leaf morphological analysis suggested that A. sundingii and A. lemsii are distinct from their parental progenitors and distinguishable from each other based on leaf area. Ecological niche modelling demonstrated that the homoploid hybrid species occupy novel habitats that are intermediate relative to the parental species. Nuclear SSRs and single nucleotide polymorphism (SNP) data indicated that the homoploid hybrid species are distinct from the parental taxa, while populations previously referred to as “A. cf. lemsii” are likely of hybrid origin and have subsequently introgressed with A. frutescens. Population level sampling of chloroplast SSRs and approximate Bayesian computation showed that A. sundingii and A. lemsii are independently derived from the same parental species. This chapter provided support for the hypothesis of homoploid hybrid speciation in Argyranthemum, that A. sundingii and A. lemsii are genetically and morphologically distinct, ecologically isolated from each other and from their parental progenitors and that each originated from independent homoploid hybrid speciation (HHS) events.

Whilst I have demonstrated that the hybrid species are distinct and ecologically isolated from their parents, it is not known whether hybridisation was critical to their occupation of intermediate ecological habitats, i.e. whether hybridisation was directly responsible for the onset of reproductive isolation. This is one of the criteria outlined by Schumer et al. (2014), but has only been demonstrated experimentally in Helianthus (Rieseberg et al., 2003) and Heliconius (Salazar et al., 2010). To demonstrate this, one could perform a reciprocal transplant experiment between A. broussonetii (laurel forest), homoploid hybrid species (intermediate altitudes) and A. frutescens (coastal xerophytic) habitats. By measuring traits that serve as proxies for fitness, such as germination, growth rate and time to flowering, it would be possible to investigate whether the homoploid hybrid species are fitter than the parents at intermediate altitudes. Importantly, F1 hybrids would also need to be included to elucidate whether adaptations are a direct result of hybridisation.

In chapter four, Genotyping-By-Sequencing (GBS) was employed to resolve phylogenetic relationships in Argyranthemum whilst investigating the relative importance of geographical isolation, habitat shifts and hybridisation in the diversification of the genus. Ancestral state reconstruction revealed an important role for both geographic isolation and habitat shifts in the diversification of Argyranthemum. In particular, habitat shifts were found to be more important in the Canary Islands than previously thought. D-statistics (ABBA-BABA tests) revealed evidence of hybridisation between lineages co-occurring on the same island but provided little support for the hypothesis that hybridisation may be responsible for the occurrence of non-monophyletic multi-island endemic (MIE) species. This study demonstrated that geographic isolation, habitat shifts and hybridisation have all been important in the diversification of Argyranthemum. In addition, morphological convergence is proposed as an explanation for the occurrence of non-monophyletic MIE species, revealing greater complexity in the processes responsible for the diversification of this endemic oceanic island genus than was previously thought.
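For readers unfamiliar with the ABBA-BABA test, the logic of the D-statistic can be sketched in a few lines of Python. This is the standard count-based form of the statistic and not the implementation used in chapter four; the function name `d_statistic`, the input layout (one tuple of alleles per biallelic site, ordered P1, P2, P3, outgroup) and the toy alleles are assumptions made purely for illustration.

```python
from typing import Iterable, Sequence


def d_statistic(sites: Iterable[Sequence[str]]) -> float:
    """Compute D = (ABBA - BABA) / (ABBA + BABA) from per-site alleles.

    Each site is (P1, P2, P3, outgroup); the outgroup allele is treated
    as ancestral ("A") and the alternative allele as derived ("B").
    """
    abba = baba = 0
    for p1, p2, p3, outgroup in sites:
        if p1 == outgroup and p2 == p3 and p1 != p2:
            abba += 1  # derived allele shared by P2 and P3
        elif p1 == p3 and p2 == outgroup and p1 != p2:
            baba += 1  # derived allele shared by P1 and P3
    if abba + baba == 0:
        raise ValueError("no ABBA or BABA sites in the input")
    return (abba - baba) / (abba + baba)


# Toy data (invented): two ABBA sites, one BABA site, one uninformative site.
toy_sites = [
    ("A", "G", "G", "A"),
    ("C", "T", "T", "C"),
    ("G", "A", "G", "A"),
    ("A", "A", "G", "A"),
]
print(d_statistic(toy_sites))
```

A value of D close to zero is expected when discordant site patterns arise from incomplete lineage sorting alone, whereas an excess of ABBA sites (D > 0) suggests gene flow between P2 and P3. Assessing significance, typically with a block jackknife, is not shown here.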

Individually, GBS loci are too short and lack sufficient variation to resolve species relationships. Hence, it was necessary to concatenate GBS loci into a single alignment. However, the ability to resolve species relationships on a gene-by-gene basis would be advantageous, as incongruent gene trees can be used to resolve reticulate evolutionary relationships, which we suspect to be frequent in Argyranthemum. A recent paper has suggested that it is possible to use gene trees from GBS or RAD loci to resolve reticulate evolutionary events (Blanco-Pastor et al., 2018), but this remains to be tested for Argyranthemum. Other possible methods of obtaining individual genes from across the genome include targeted amplicon sequencing, exome sequencing or transcriptomics, and such approaches could provide further insights into the evolution of Argyranthemum.

In chapter five, the taxonomic status of A. broussonetii was re-evaluated in the light of the phylogenetic analysis in chapter four, in which A. broussonetii subsp. broussonetii and subsp. gomerensis were not resolved as monophyletic. Although the subspecies of A. broussonetii have converged on a similar leaf morphology, likely as a consequence of their occupation of similar ecological habitats, the two subspecies can be readily distinguished from each other based on cypselae characteristics. Indeed, A. broussonetii subsp. gomerensis is shown to be morphologically more similar to A. callichrysum, which also occurs on La Gomera, than to A. broussonetii subsp. broussonetii. This chapter highlights the potential for convergent evolution in Argyranthemum and supports raising A. broussonetii subsp. gomerensis to the rank of species.

Within the timescale of this thesis, detailed morphological analysis of A. adauctum and A. frutescens was not feasible. However, these taxa also warrant further study as each was identified as non-monophyletic. Argyranthemum adauctum was polyphyletic, with subspecies of A. adauctum more closely related to other species from the same island than to each other. I propose that this is likely due to morphological convergence, but their morphological distinction remains to be robustly evaluated. In addition, the current delimitation of A. frutescens needs to be reassessed as “A. vincentii” and A. gracile were found to be nested within a paraphyletic A. frutescens. The accurate delimitation of taxa is essential for effective conservation of the Macaronesian flora and can guide investigations into the genetic basis of phenotypic evolution.

In chapter six, a comparative transcriptomic analysis was implemented to test the hypothesis that novel gene combinations and/or transgressive expression generated by hybridisation play an important role in the ecological isolation and origin of homoploid hybrid species. Since there is no established methodology for comparative transcriptomic analyses in the absence of a reference genome (as in Argyranthemum), five pipelines for transcriptome assembly and transcript quantification were compared. Differentially expressed genes were identified between the parental species, together with novel expression in the homoploid hybrid species, differential expression between the two homoploid hybrid species and evidence of parent-like expression in the hybrid species. Our analysis highlighted that, although independently derived, A. sundingii and A. lemsii have converged on a similar expression profile, potentially as a consequence of adaptation to intermediate altitudes.

As the plants sampled for transcriptomes were grown under common conditions, differences in gene expression owing to environmental plasticity would not have been recovered by our analysis. Future research should seek to identify patterns of differential expression in the hybrid species from material collected in situ. This would allow researchers to identify plastic environmental changes in gene expression, compared with the fixed patterns of expression identified by this analysis.

In summary, whilst there are still unanswered questions surrounding the evolution of Argyranthemum that remain to be addressed, this thesis has contributed significantly to our understanding of the processes responsible for the diversification of Argyranthemum. Lessons learned from Argyranthemum are applicable to other island radiations and plant speciation more generally, which we hope will be the focus of future research.

Appendix A

Supplementary information for chapter 2

Appendix Table A.1 - Contig name, SSR sequence, and primers designed for the 30 SSR-containing loci selected from the Argyranthemum broussonetii transcriptome.

Contig name  SSR  Forward primer  Forward primer sequence  Reverse primer  Reverse primer sequence
TR20067|c0_g1_i1  (CA)9  TR20067F  CACGACGTTGTAAAACGACCACAGATTCCCTGCAATCAA  TR20067R  CGTTATAAGCCAGCGGGTAT
TR26218|c0_g1_i1  (ATA)8  TR26218F  CACGACGTTGTAAAACGACTGACACAAAACAAACCAAACAA  TR26218R  TGAACATCATCTGGGTCAACA
TR23535|c0_g2_i1  (ATG)7  TR23535F  CACGACGTTGTAAAACGACCTGCGAATAACTCGCGAAA  TR23535R  CGCTAAACGCCTCGTCTAAT
TR3577|c0_g1_i1  (TA)9  TR3577F  CACGACGTTGTAAAACGACGCATTAAAAATGCGCTCACA  TR3577R  TGATGATTGCAATGGATACAAA
TR18837|c0_g1_i1  (CAA)7  TR18837F  CACGACGTTGTAAAACGACAACCCCAACGACATAACAGC  TR18837R  TGTAATGAGGTGGTGGTTGC
TR9717|c0_g1_i1  (ATA)7  TR9717F  CACGACGTTGTAAAACGACCGGGTGATTTGAATGCTGAT  TR9717R  CCGCACTAGTACCCATACCC
TR26198|c0_g1_i2  (CTG)8  TR26198F  CACGACGTTGTAAAACGACTGGGAATTGTAGCTGCCAAG  TR26198R  AGCACCATTTGTGTGAGCTG
TR27644|c0_g6_i1  (CAG)8  TR27644F  CACGACGTTGTAAAACGACAATTTCAGCATCGCCAAATG  TR27644R  TGGGCTAGTTGGTTTCCTTG
TR24279|c0_g3_i1  (TCA)7  TR24279F  CACGACGTTGTAAAACGACTCATCATCGTCATCGTAATCG  TR24279R  AACCGGCCTAGGAAAGTGAT
TR19493|c0_g2_i1  (ATA)7  TR19493F  CACGACGTTGTAAAACGACCAACCCCTTCAAACTTCTTCA  TR19493R  CCCTTTAATTGCATTCCCTTC
TR19493|c0_g1_i1  (ATA)7  TR19493F  CACGACGTTGTAAAACGACCCCCTTCAAACTTCTTCAACC  TR19493R  CACCAATAAGCAGCAAACCA
TR13418|c0_g1_i1  (TGG)8  TR13418F  CACGACGTTGTAAAACGACTTTGATGATGGGGTTGAGAA  TR13418R  AAACACCATCACACCACCAC
TR24618|c0_g3_i1  (ATT)8  TR24618F  CACGACGTTGTAAAACGACATCGATTGAGTCCGCGTAAG  TR24618R  GCGAGTTTCGACAATCTGGA
TR22108|c0_g2_i1  (TTCA)5  TR22108F  CACGACGTTGTAAAACGACCGTAAGCGCGTCTCTCTCTT  TR22108R  CTCTGTAAGCCGCCATTGAT
TR16342|c0_g1_i1  (CAG)7  TR16342F  CACGACGTTGTAAAACGACGGCGAAGTTTCTGGATGTTG  TR16342R  AATCTCCCGCATAACCAGAC
TR6042|c0_g1_i1  (ATC)7  TR6042F  CACGACGTTGTAAAACGACACAAAAACCCATTTGCCACA  TR6042R  ACAGAAGACAAAGGGCCATC
TR25403|c0_g1_i1  (AGAT)5  TR25403F  CACGACGTTGTAAAACGACCATCCCCATTCACTTCACAA  TR25403R  TTGAATCTCTCCTGCCATGAT
TR25653|c0_g1_i1  (TGA)7  TR25653F  CACGACGTTGTAAAACGACAAACACCGCAAGTGGATGAT  TR25653R  ACCTCTTTCCATTGCTTTGC
TR13461|c0_g1_i1  (CAAT)5  TR13461F  CACGACGTTGTAAAACGACTTCCTTCCAATCAACAACACTT  TR13461R  AGGGTTTCCGTTAACAGTGG
TR19268|c0_g1_i2  (TTTC)5  TR19268F  CACGACGTTGTAAAACGACCCACTCATTCAATCCATCCA  TR19268R  TCATCAGCATCGTCATGGTT
TR4524|c0_g1_i1  (TCA)7  TR4524F  CACGACGTTGTAAAACGACTCAGATCAAGCGTCACCACT  TR4524R  CGGGAACCTTGATCTTTGGT
TR24540|c0_g2_i1  (AAG)7  TR24540F  CACGACGTTGTAAAACGACAGCCTCCATTCCATTTTCG  TR24540R  TGCAGCACATGAGTCCATACT
TR20600|c1_g1_i1  (CAG)8  TR20600F  CACGACGTTGTAAAACGACATCAGTCGATTTCGGTCTCG  TR20600R  GATCACCACCAACACCATCA
TR16955|c0_g3_i1  (TGG)7  TR16955F  CACGACGTTGTAAAACGACCCGCCTGTGACTCATCAGTAT  TR16955R  CTGCCTCCTTCCATATCCAG
TR26651|c2_g2_i1  (ATAA)5  TR26651F  CACGACGTTGTAAAACGACTCAATTGGGTTGATTGCTCA  TR26651R  ACCGATGCTGTGCTTCTTG
TR24349|c0_g2_i5  (GAA)7  TR24349F  CACGACGTTGTAAAACGACCAATTCATCATCCATCCATCC  TR24349R  CAACAACAACACCTTTCATTCC
TR24835|c0_g1_i1  (AC)8  TR24835F  CACGACGTTGTAAAACGACTATGGCTACATGCTGGGACA  TR24835R  CGCAGCCACGAACATACTAA
TR17195|c0_g1_i2  (CCA)7  TR17195F  CACGACGTTGTAAAACGACCTCCAACAATGGGTCTCTCC  TR17195R  CGCCTTCAATCTTGTCTGGT
TR27145|c0_g3_i4  (AC)9  TR27145F  CACGACGTTGTAAAACGACCCATGATCCAGTTGACACCA  TR27145R  TCGGCGTGCTCAAATAAAC
TR23697|c0_g1_i1  (GCA)7  TR23697F  CACGACGTTGTAAAACGACTCATCGCATCTAGGGTTTTTG  TR23697R  ATCTCAGCGGTGAAACGAAC

Appendix B

Supplementary information for chapter 3

Appendix Table B.1 - Morphological data used in the present study, including collection reference, population, repeat number and character scores for leaf area, perimeter, length and width. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

ref  taxon  pop  repeat  area  perim  length  width
739  bro  A  1  19.44  79.17  9.12  5.66
742  bro  A  1  24.89  89.65  9.3  6.59
742  bro  A  2  16.8  71.09  8.17  4.65
751  bro  A  1  12.58  62.29  6.87  4.3
707  bro  B  1  24.23  101.59  9.8  6.53
707  bro  B  2  23.13  70.86  8.97  5.76
726  bro  B  1  24.61  121.78  10.22  6.33
726  bro  B  2  22.52  108.13  8.73  6.19
554  bro  C  1  21.21  90.44  8.86  5.97
554  bro  C  2  26.38  84.65  9.48  6.14
556  bro  C  1  14.58  67.75  8.74  5.06
556  bro  C  2  19.92  51.82  9.08  5.47
69  bro  D  1  21.11  64.21  8.89  5.4
70  bro  D  1  26.88  64.73  10.31  7.01
70  bro  D  2  34.14  76.35  11.24  7.15
71  bro  D  1  24.75  67.65  10.41  5.51
662  bro  E  1  19.49  60.97  8.75  5.17
662  bro  E  2  19.08  76.8  8.94  5.13
679  bro  E  1  5.54  41.01  6.36  2.8
683  bro  E  1  20.94  66.17  8.53  5.47
683  bro  E  2  27.88  94.09  9.3  6.92
688  bro  E  1  18.78  84.02  8.88  5.81
688  bro  E  2  12.18  43.48  7.62  3.77
155  bro  F  1  15.47  67.2  7.39  4.93
157  bro  F  1  32.78  104.76  10.27  7.07
181  sun  G  1  11.71  61.57  8.25  5.99
181  sun  G  2  5.84  48.71  7.24  4.75
185  sun  G  1  4.68  52.32  6.71  4.55
185  sun  G  2  8.95  73.2  7.41  6.02
186  sun  G  1  8.12  59.8  8.33  5.34
186  sun  G  2  7.77  69.97  8.24  5.54
196  sun  G  1  10.53  57.79  8.79  5.02
196  sun  G  2  6.84  43.06  7.68  4.34
199  sun  G  1  12.37  68.55  8.67  5.77
199  sun  G  2  11.47  91.51  9.78  6.39
272  sun  H  2  8.42  62.38  7.83  5.21
274  sun  H  1  9.31  68  9.22  5.44
274  sun  H  2  10.27  74.32  9.34  5.74
276  sun  H  1  8.82  56.8  8.9  5.52
276  sun  H  2  10.43  89.03  10.75  6.4
277  sun  H  1  5.39  57.48  6.89  3.24
277  sun  H  2  5.23  52.66  7.53  4.45
285  sun  I  1  7.22  51.03  8.52  3.83
285  sun  I  2  8.07  51.26  10.14  6.31
288  sun  I  1  7.56  56.5  7.68  4.6
288  sun  I  2  8.87  54.82  7.4  3.88
292  sun  I  1  14.36  109.25  10.49  6.67
292  sun  I  2  9.51  58.06  9.89  5.73
293  sun  I  1  14.23  84.48  9.64  6.26
293  sun  I  2  10.31  84.13  8.67  5.1
393  sun  J  1  13.34  65.89  8.7  5.25
393  sun  J  2  15.48  75.08  9.92  6.11
396  sun  J  1  11.17  58.58  9.38  4.51
396  sun  J  2  10.73  55.43  7.85  4.87
401  sun  J  1  5.03  37.18  6.55  3.6
401  sun  J  2  14.61  111.31  9  6.68
402  sun  J  1  7.4  57.18  6.64  4.52
402  sun  J  2  4.5  39.06  6.16  3.49
430  bro × fru  K  1  5.38  50.99  5.26  3.88
430  bro × fru  K  2  4.29  53.99  5.73  3.46
434  bro × fru  K  1  5.58  44.91  5.78  3.82
434  bro × fru  K  2  5.58  47.36  6.4  3.36
440  bro × fru  K  1  3.96  35.5  5.67  2.51
440  bro × fru  K  2  5.41  53.44  6.79  3.19
446  bro × fru  K  1  7.7  55.92  5.71  4.53
446  bro × fru  K  2  5.46  43.77  6.09  3.31
452  bro × fru  K  1  1.77  26.91  3.88  2.81
452  bro × fru  K  2  1.83  33.07  4.01  2.68
455  bro × fru  K  1  3.83  29.42  4.52  2.8
694  bro × fru  L  1  4.6  44.03  5.22  3.32
694  bro × fru  L  2  6.64  60.89  6.49  3.44
696  bro × fru  L  1  4.24  47.07  5.7  2.72
699  bro × fru  L  1  4.28  33.68  4.78  3.62
699  bro × fru  L  2  7.78  43.91  6.64  3.48
703  bro × fru  L  1  5.79  42.76  5.72  3.26
703  bro × fru  L  2  5.19  46.22  6.54  3.13
72  lem  M  1  9.52  55.37  8.52  5.79
72  lem  M  2  11.95  64.82  9.51  6.21
73  lem  M  1  11.67  66.93  8.08  5.1
73  lem  M  2  13.78  71.29  8.65  5.47
75  lem  M  1  13.68  82.41  8.76  6.04
75  lem  M  2  14.77  77.62  11.49  6.15
76  lem  M  1  11.03  66.59  8.09  4.66
76  lem  M  2  13.05  71.74  8.11  4.66
66  lem  N  1  15.05  86.42  9.05  5.18
66  lem  N  1  10.65  62.95  7.07  3.92
67  lem  N  1  14.11  62.79  8.44  5.47
67  lem  N  2  16.58  79.43  8.96  5.68
68  lem  N  1  10.2  62.21  7.25  4.32
68  lem  N  2  12.36  56.54  7.76  4.12
255  lem  O  1  12.42  60.81  8.79  4.29
255  lem  O  2  9.69  49.3  8.93  4.02
257  lem  O  1  9.57  66.65  9.35  4.98
257  lem  O  2  16.18  66.21  9.51  5.69
261  lem  O  1  9.83  50.5  8.41  4.31
261  lem  O  2  10.98  64.15  7.99  4.96
264  lem  O  1  8.21  59.4  8.6  4.28
264  lem  O  2  9.41  62.12  9.12  5.11
310  frf  P  1  3.17  41.18  4.59  3.77
568  frf  P  1  3.65  54.02  6.02  4.31
568  frf  P  2  3.23  51.66  5.57  4.54
573  frf  P  1  3.19  47.4  6  3.74
573  frf  P  2  7.79  81.42  6.48  5.27
610  frf  Q  1  3.49  30.45  4.96  2.89
610  frf  Q  2  3.15  36.72  6.28  3.4
623  frf  Q  1  2.42  31.05  6.87  3.39
623  frf  Q  2  2.35  38.67  7.04  4.29
630  frf  Q  1  1.01  16.3  2.4  1.67
632  frf  R  1  3.63  53.94  5.88  4.56
645  frf  R  1  3.33  49.51  5.9  4.61
645  frf  R  2  4.21  44.33  7.16  3.97
656  frf  R  1  1.8  29.82  4.03  2.79
241  frs  S  1  5.5  55.64  6.51  4.47
241  frs  S  2  3.29  31.75  5.82  3.42
242  frs  S  1  6.6  46.1  7.98  4.62
242  frs  S  2  7.04  44.91  7.53  4.76
245  frs  S  1  2.67  30.33  4.22  3.29
245  frs  S  2  2.23  25.61  4.06  3.05
223  frs  T  1  5.03  47.76  6.05  4.3
223  frs  T  2  4.79  44.51  4.95  3.58
224  frs  T  2  1.16  17.92  3.38  2.04
232  frs  T  1  4.39  36.81  6.09  4.22
232  frs  T  2  2.94  27.42  5.33  2.72

Appendix Table B.2 - Representative voucher specimens deposited at the Natural History Museum, London. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

Date Collectors ref taxon pop x y locality barcode 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 84 bro A 28.5306 -16.2423 Barranco de Valle Crispin BM000828668 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 85 bro A 28.5306 -16.2423 Barranco de Valle Crispin BM000828667 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 87 bro A 28.5318 -16.2424 Barranco de Valle Crispin BM000828665 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 88 bro A 28.5319 -16.2426 Barranco de Valle Crispin BM000828664 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 89 bro A 28.5318 -16.2429 Barranco de Valle Crispin BM000828663 13/05/2015 O. White, A. Reyes-Betancort & G. Torre 90 bro A 28.5331 -16.2423 Barranco de Valle Crispin BM000828662 21/06/2015 O. White, A. Reyes-Betancort 726 bro B 28.5352 -16.2346 Las Casas de la Cumbre BM000828476 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 79 bro C 28.5586 -16.1578 Path to Mesa del Sabinal BM000828673

145 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 69 bro D 28.5669 -16.1529 La Cumbrilla BM000828683 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 70 bro D 28.5663 -16.1533 La Cumbrilla BM000828682

06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 71 bro D 28.5661 -16.1540 La Cumbrilla BM000828681 20/06/2015 O. White 679 bro E 28.5719 -16.1559 Chamorga BM000828483 20/06/2015 O. White 683 bro E 28.5720 -16.1559 Chamorga BM000828482 20/06/2015 O. White 686 bro E 28.5720 -16.1559 Chamorga BM000828481 20/06/2015 O. White 688 bro E 28.5721 -16.1563 Chamorga BM000828480 25/05/2015 O. White 155 bro F 28.5535 -16.2308 Roques del Fraile BM000828600 25/05/2015 O. White 156 bro F 28.5541 -16.2294 Roques del Fraile BM000828599 25/05/2015 O. White 157 bro F 28.5522 -16.2292 Roques del Fraile BM000828598 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 1 sun G 28.5152 -16.2365 Valle Crispin BM000828751 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 2 sun G 28.5152 -16.2365 Valle Crispin BM000828750

30/04/2015 O. White, M. Carine & A. Reyes-Betancort 3 sun G 28.5153 -16.2366 Valle Crispin BM000828749 Appendix B 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 4 sun G 28.5152 -16.2366 Valle Crispin BM000828748 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 5 sun G 28.5150 -16.2368 Valle Crispin BM000828747

30/04/2015 O. White, M. Carine & A. Reyes-Betancort 6 sun G 28.5150 -16.2369 Valle Crispin BM000828746 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 7 sun G 28.5148 -16.2372 Valle Crispin BM000828745 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 8 sun G 28.5146 -16.2371 Valle Crispin BM000828744 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 9 sun G 28.5159 -16.2369 Valle Crispin BM000828743 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 10 sun G 28.5159 -16.2369 Valle Crispin BM000828742 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 11 sun G 28.5159 -16.2369 Valle Crispin BM000828741 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 271 sun H 28.5196 -16.2268 Valle Brosque BM000828567 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 272 sun H 28.5198 -16.2265 Valle Brosque BM000828566 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 278 sun I 28.5239 -16.2177 Roque Cubo BM000828565 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 281 sun I 28.5233 -16.2169 Roque Cubo BM000828564 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 284 sun I 28.5226 -16.2175 Roque Cubo BM000828563 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 287 sun I 28.5233 -16.2196 Roque Cubo BM000828562 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 292 sun I 28.5236 -16.2201 Roque Cubo BM000828561

03/06/2015 O. White, A. Reyes-Betancort & G. Torre 295 sun I 28.5241 -16.2197 Roque Cubo BM000828560

03/06/2015 O. White, A. Reyes-Betancort & G. Torre 301 sun I 28.5266 -16.2199 Roque Cubo BM000828559 146 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 12 sun J 28.5299 -16.2110 Barranco del Cercado de Andrés BM000828740 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 13 sun J 28.5298 -16.2110 Barranco del Cercado de Andrés BM000828739 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 14 sun J 28.5299 -16.2109 Barranco del Cercado de Andrés BM000828738 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 15 sun J 28.5299 -16.2108 Barranco del Cercado de Andrés BM000828737 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 17 sun J 28.5298 -16.2113 Barranco del Cercado de Andrés BM000828735 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 18 sun J 28.5296 -16.2113 Barranco del Cercado de Andrés BM000828734 30/04/2015 O. White, M. Carine & A. Reyes-Betancort 19 sun J 28.5297 -16.2108 Barranco del Cercado de Andrés BM000828733 09/06/2015 O. White 387 sun J 28.5300 -16.2108 Barranco del Cercado de Andrés BM000828519 09/06/2015 O. White 393 sun J 28.5292 -16.2114 Barranco del Cercado de Andrés BM000828518 09/06/2015 O. White 396 sun J 28.5292 -16.2115 Barranco del Cercado de Andrés BM000828517

09/06/2015 O. White 401 sun J 28.5280 -16.2130 Barranco del Cercado de Andrés BM000828516 09/06/2015 O. White 409 sun J 28.5264 -16.2146 Barranco del Cercado de Andrés BM000828515 11/06/2015 O. White 427 bro × fru K 28.5427 -16.1606 Barranco de Igueste BM000828509


11/06/2015 O. White 430 bro × fru K 28.5428 -16.1606 Barranco de Igueste BM000828508 11/06/2015 O. White 434 bro × fru K 28.5427 -16.1601 Barranco de Igueste BM000828507 11/06/2015 O. White 440 bro × fru K 28.5432 -16.1618 Barranco de Igueste BM000828506 11/06/2015 O. White 446 bro × fru K 28.5431 -16.1629 Barranco de Igueste BM000828505 11/06/2015 O. White 449 bro × fru K 28.5422 -16.1576 Barranco de Igueste BM000828504 11/06/2015 O. White 450 bro × fru K 28.5417 -16.1579 Barranco de Igueste BM000828503 11/06/2015 O. White 451 bro × fru K 28.5417 -16.1579 Barranco de Igueste BM000828502 11/06/2015 O. White 454 bro × fru K 28.5418 -16.1577 Barranco de Igueste BM000828501 11/06/2015 O. White 459 bro × fru K 28.5456 -16.1571 Barranco de Igueste BM000828500 11/06/2015 O. White 463 bro × fru K 28.5446 -16.1580 Barranco de Igueste BM000828499 11/06/2015 O. White 465 bro × fru K 28.5427 -16.1590 Barranco de Igueste BM000828498 20/06/2015 O. White 699 bro × fru L 28.5506 -16.1540 Lomo de las Casillas BM000828479 20/06/2015 O. White 700 bro × fru L 28.5509 -16.1547 Lomo de las Casillas BM000828478

20/06/2015 O. White 703 bro × fru L 28.5470 -16.1543 Lomo de las Casillas BM000828477

06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 72 lem M 28.5583 -16.1527 Path to Mesa del Sabinal BM000828680 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 73 lem M 28.5583 -16.1527 Path to Mesa del Sabinal BM000828679 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 74 lem M 28.5583 -16.1527 Path to Mesa del Sabinal BM000828678 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 75 lem M 28.5582 -16.1526 Path to Mesa del Sabinal BM000828677 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 76 lem M 28.5584 -16.1521 Path to Mesa del Sabinal BM000828676 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 77 lem M 28.5584 -16.1530 Path to Mesa del Sabinal BM000828675 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 63 lem N 28.5668 -16.1527 La Cumbrilla BM000828689 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 64 lem N 28.5668 -16.1527 La Cumbrilla BM000828688 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 65 lem N 28.5668 -16.1527 La Cumbrilla BM000828687 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 66 lem N 28.5673 -16.1510 La Cumbrilla BM000828686 06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 67 lem N 28.5673 -16.1510 La Cumbrilla BM000828685

06/05/2015 O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre 68 lem N 28.5673 -16.1510 La Cumbrilla BM000828684
01/05/2015 O. White, M. Carine & A. Reyes-Betancort 28 lem O 28.5752 -16.1429 Barranco de Roque Bermejo BM000828724
01/05/2015 O. White, M. Carine & A. Reyes-Betancort 29 lem O 28.5743 -16.1437 Barranco de Roque Bermejo BM000828723

01/05/2015 O. White, M. Carine & A. Reyes-Betancort 30 lem O 28.5745 -16.1468 Barranco de Roque Bermejo BM000828722 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 31 lem O 28.5749 -16.1493 Barranco de Roque Bermejo BM000828721 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 32 lem O 28.5731 -16.1511 Barranco de Roque Bermejo BM000828720 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 33 lem O 28.5728 -16.1517 Barranco de Roque Bermejo BM000828719 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 34 lem O 28.5726 -16.1519 Barranco de Roque Bermejo BM000828718 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 35 lem O 28.5731 -16.1498 Barranco de Roque Bermejo BM000828717 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 303 frf P 28.5078 -16.2293 Maria Jiménez BM000828558 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 307 frf P 28.5073 -16.2288 Maria Jiménez BM000828557 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 309 frf P 28.5067 -16.2282 Maria Jiménez BM000828556 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 312 frf P 28.5053 -16.2290 Maria Jiménez BM000828555 03/06/2015 O. White, A. Reyes-Betancort & G. Torre 317 frf P 28.4998 -16.2279 Maria Jiménez BM000828554 17/06/2015 O. White 568 frf P 28.4994 -16.2280 Maria Jiménez BM000828497 17/06/2015 O. White 573 frf P 28.4995 -16.2279 Maria Jiménez BM000828496

17/06/2015 O. White 595 frf P 28.4994 -16.2288 Maria Jiménez BM000828495

09/06/2015 O. White 412 frf Q 28.5172 -16.2025 San Andrés BM000828514
09/06/2015 O. White 416 frf Q 28.5149 -16.1976 San Andrés BM000828513
09/06/2015 O. White 420 frf Q 28.5110 -16.1968 San Andrés BM000828512
09/06/2015 O. White 422 frf Q 28.5108 -16.1964 San Andrés BM000828511
17/06/2015 O. White 599 frf Q 28.5128 -16.1959 San Andrés BM000828494
17/06/2015 O. White 610 frf Q 28.5127 -16.1961 San Andrés BM000828493
17/06/2015 O. White 623 frf Q 28.5129 -16.1964 San Andrés BM000828492
17/06/2015 O. White 630 frf Q 28.5125 -16.1970 San Andrés BM000828491
11/06/2015 O. White 425 frf R 28.5357 -16.1568 Igueste de San Andreas BM000828510
18/06/2015 O. White 632 frf R 28.5281 -16.1591 Igueste de San Andreas BM000828490
18/06/2015 O. White 633 frf R 28.5281 -16.1592 Igueste de San Andreas BM000828489

18/06/2015 O. White 639 frf R 28.5280 -16.1588 Igueste de San Andreas BM000828488 18/06/2015 O. White 645 frf R 28.5279 -16.1582 Igueste de San Andreas BM000828487 18/06/2015 O. White 647 frf R 28.5279 -16.1582 Igueste de San Andreas BM000828486


18/06/2015 O. White 656 frf R 28.5282 -16.1569 Igueste de San Andreas BM000828485 18/06/2015 O. White 661 frf R 28.5285 -16.1572 Igueste de San Andreas BM000828484 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 20 frs S 28.5796 -16.1349 Roque Bermejo BM000828732 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 21 frs S 28.5795 -16.1350 Roque Bermejo BM000828731 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 22 frs S 28.5784 -16.1363 Roque Bermejo BM000828730 01/05/2015 O. White, M. Carine & A. Reyes-Betancort 23 frs S 28.5785 -16.1363 Roque Bermejo BM000828729 02/06/2015 O. White 220 frs T 28.5742 -16.1909 Between Almáciga and Roque Bermejo BM000828575 02/06/2015 O. White 221 frs T 28.5754 -16.1864 Between Almáciga and Roque Bermejo BM000828574 02/06/2015 O. White 222 frs T 28.5782 -16.1779 Between Almáciga and Roque Bermejo BM000828573 02/06/2015 O. White 223 frs T 28.5778 -16.1782 Between Almáciga and Roque Bermejo BM000828572 02/06/2015 O. White 224 frs T 28.5808 -16.1714 Between Almáciga and Roque Bermejo BM000828571 02/06/2015 O. White 225 frs T 28.5808 -16.1713 Between Almáciga and Roque Bermejo BM000828570 02/06/2015 O. White 230 frs T 28.5818 -16.1652 Between Almáciga and Roque Bermejo BM000828569

02/06/2015 O. White 233 frs T 28.5834 -16.1535 Between Almáciga and Roque Bermejo BM000828568

25/05/2015 O. White 146 frs NA 28.5685 -16.2057 Roque de los Bodegas, east of Taganana BM000828607 25/05/2015 O. White 147 frs NA 28.5682 -16.2069 Roque de los Bodegas, east of Taganana BM000828606 25/05/2015 O. White 148 frs NA 28.5633 -16.2336 Taganana to Playa de Tamadite BM000828605 25/05/2015 O. White 150 frs NA 28.5642 -16.2339 Taganana to Playa de Tamadite BM000828604 25/05/2015 O. White 151 frs NA 28.5706 -16.2503 Cliffs above Playa de Tamadite BM000828603 25/05/2015 O. White 152 frs NA 28.5706 -16.2503 Cliffs above Playa de Tamadite BM000828602 25/05/2015 O. White 153 frs NA 28.5626 -16.2528 Barranco de Afur BM000828601

Appendix B

Appendix Table B.3 - Primer sequences of the nSSRs and cpSSRs employed in the present study.

Nuclear (White et al. 2016)

| Loci | F. primer | Sequence¹ | R. primer | Sequence | Repeat |
|---|---|---|---|---|---|
| TR20067 | TR20067F | CACGACGTTGTAAAACGACCACAGATTCCCTGCAATCAA | TR20067R | CGTTATAAGCCAGCGGGTAT | (CA)9 |
| TR23535 | TR23535F | CACGACGTTGTAAAACGACCTGCGAATAACTCGCGAAA | TR23535R | CGCTAAACGCCTCGTCTAAT | (ATG)7 |
| TR26198 | TR26198F | CACGACGTTGTAAAACGACTGGGAATTGTAGCTGCCAAG | TR26198R | AGCACCATTTGTGTGAGCTG | (CTG)8 |
| TR24618 | TR24618F | CACGACGTTGTAAAACGACATCGATTGAGTCCGCGTAAG | TR24618R | GCGAGTTTCGACAATCTGGA | (ATT)8 |
| TR6042 | TR6042F | CACGACGTTGTAAAACGACACAAAAACCCATTTGCCACA | TR6042R | ACAGAAGACAAAGGGCCATC | (ATC)7 |
| TR25653 | TR25653F | CACGACGTTGTAAAACGACAAACACCGCAAGTGGATGAT | TR25653R | ACCTCTTTCCATTGCTTTGC | (TGA)7 |
| TR24835 | TR24835F | CACGACGTTGTAAAACGACTATGGCTACATGCTGGGACA | TR24835R | CGCAGCCACGAACATACTAA | (AC)8 |
| TR23697 | TR23697F | CACGACGTTGTAAAACGACTCATCGCATCTAGGGTTTTTG | TR23697R | ATCTCAGCGGTGAAACGAAC | (GCA)7 |

Chloroplast (Bryan et al. 1999)

| Loci | F. primer | Sequence¹ | R. primer | Sequence | Repeat |
|---|---|---|---|---|---|
| Ntcp9 | Ntcp9F | CACGACGTTGTAAAACGACCTTCCAAGCTAACGATGC | Ntcp9R | CTGTCCTATCCATTAGACAATG | (T)10 |
| Ntcp30 | Ntcp30F | CACGACGTTGTAAAACGACGATGGCTCCGTTGCTTTAT | Ntcp30R | TGCCGGAGAGTTCTTAACAATA | (T)13·(T)15 |
| Ntcp39 | Ntcp39F | CACGACGTTGTAAAACGACGTCACAATTGGGGTTTTGAATA | Ntcp39R | GACGATACTGTAGGGGAGGTC | (T)13 |
| Ntcp40 | Ntcp40F | CACGACGTTGTAAAACGACTAATTTGATTCTTCGTCGC | Ntcp40R | GATGTAGCCAAGTGGATCA | (A)14 |

¹ CACGACGTTGTAAAACGAC represents the universal sequence appended to the start of the forward primers.
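As the footnote to Appendix Table B.3 notes, every forward primer carries the same universal sequence at its start. The minimal sketch below shows one way the locus-specific portion of a forward primer could be recovered from the tailed sequence. The function name `strip_universal_tail` and the constant `UNIVERSAL_TAIL` are hypothetical names introduced for illustration only; they are not part of the original study.

```python
# Universal sequence taken from the footnote of Appendix Table B.3.
UNIVERSAL_TAIL = "CACGACGTTGTAAAACGAC"


def strip_universal_tail(forward_primer: str, tail: str = UNIVERSAL_TAIL) -> str:
    """Return the locus-specific part of a tailed forward primer.

    Hypothetical helper: raises ValueError if the primer does not begin
    with the universal tail.
    """
    if not forward_primer.startswith(tail):
        raise ValueError("primer does not start with the universal tail")
    return forward_primer[len(tail):]


# TR20067F as listed in Appendix Table B.3.
tailed = "CACGACGTTGTAAAACGACCACAGATTCCCTGCAATCAA"
locus_specific = strip_universal_tail(tailed)
print(locus_specific)
```

Keeping the tail as a single constant means the same check can be applied to all twelve forward primers in the table.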


Appendix Table B.4 - Georeferenced localities remaining after filtering to include only one site per 50 m pixel. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum). Extracted variables for each point are shown. These are aspect (asp), slope, altitude (alt), average annual temperature (x01), daytime thermal range (x02), isothermality (x03), thermal seasonality (x04), maximum temperature of the warmest month (x05), minimum temperature of the coldest month (x06), annual thermal range (x07), average winter temperature (x08), average temperature of the driest quarter (x09), average summer temperature (x10), average temperature of the coldest quarter (x11), annual rainfall (x12), winter rainfall (x16), precipitation in the driest season (x17), precipitation in the coldest season (x19), average spring temperature (x20), average autumn temperature (x21), precipitation of the wettest semester (x22), precipitation of the driest semester (x23), maximum annual average temperature (x24) and average minimum annual temperature (x25). See http://climaimpacto.eu/ for full details of climate variables.

taxon x y asp slope alt x01 x02 x03 x04 x05 x06 x07 x08 x09 x10 x11 x12 x16 x17 x19 x20 x21 x22 x23 x24 x25
bro -16.242 28.531 5 28.90 643 162 55 38 2861 250 108 142 132 195 204 131 490 182 21 231 154 158 417 73 190 135

bro -16.243 28.532 6 30.21 674 162 59 40 2817 250 105 145 133 194 204 132 491 183 21 231 154 159 418 73 192 133 bro -16.243 28.532 6 37.39 725 159 59 39 2879 249 101 148 129 193 202 128 493 185 21 232 151 155 420 73 189 130 bro -16.243 28.532 6 34.90 673 163 61 42 2803 250 105 145 134 195 205 133 485 182 20 229 155 160 414 71 194 133 bro -16.242 28.533 5 27.00 747 156 57 38 2987 248 100 148 125 192 201 124 500 187 22 235 149 151 426 74 185 128 bro -16.243 28.531 6 17.50 650 166 63 44 2691 250 107 143 137 195 205 137 487 181 21 229 157 164 414 73 197 134 bro -16.243 28.530 5 8.21 643 163 57 40 2832 249 107 142 133 195 204 132 486 181 21 229 155 159 413 73 191 134 bro -16.242 28.532 5 32.23 671 161 54 37 2894 250 107 143 131 195 204 130 493 184 21 232 153 157 420 73 188 134 bro -16.243 28.533 7 13.46 767 159 64 42 2807 248 99 149 130 192 201 129 500 188 21 236 151 156 427 73 192 128 bro -16.234 28.535 5 30.54 761 155 53 36 2965 245 100 145 124 191 199 123 465 176 21 220 148 150 400 65 182 129 bro -16.234 28.535 5 34.95 738 157 56 38 2961 246 101 145 126 192 201 125 459 174 20 218 149 152 396 63 185 129

bro -16.235 28.535 6 34.48 746 158 59 40 2848 245 100 145 128 191 200 127 464 175 21 219 150 154 399 65 188 129
bro -16.235 28.535 5 24.79 783 155 57 39 2937 244 98 146 125 190 199 124 471 178 21 222 147 151 405 66 184 127
bro -16.235 28.535 6 22.12 781 156 59 40 2854 243 98 145 127 189 199 125 473 178 21 222 148 152 406 67 186 127

bro -16.235 28.536 5 23.49 793 154 56 38 2957 243 97 146 123 189 198 122 474 179 21 223 146 149 407 67 182 126 bro -16.235 28.536 5 22.62 815 153 56 38 2964 241 96 145 122 188 197 121 482 181 22 225 145 148 413 69 181 125 bro -16.158 28.559 7 11.75 633 160 58 45 2501 230 102 128 133 184 196 133 466 169 23 211 151 161 383 83 189 131 bro -16.158 28.559 2 2.47 633 153 45 35 2716 230 102 128 124 184 192 124 469 171 23 212 145 152 386 83 176 131 bro -16.153 28.567 6 13.10 637 158 52 41 2574 229 105 124 130 183 195 129 476 174 22 211 149 158 390 86 184 132 bro -16.154 28.566 6 27.34 629 160 57 45 2479 230 104 126 134 184 196 133 472 173 22 210 151 161 387 85 189 132 bro -16.154 28.566 7 8.46 624 157 49 39 2558 229 106 123 130 184 194 130 474 173 22 210 148 158 388 86 182 133 bro -16.156 28.566 2 17.56 622 154 43 35 2585 226 106 120 127 182 192 127 484 175 23 213 146 154 395 89 176 133 bro -16.156 28.566 5 19.86 600 158 49 39 2607 229 106 123 129 184 195 129 476 173 23 211 150 157 389 87 182 133 bro -16.157 28.566 2 29.89 615 154 41 34 2608 226 107 119 126 183 192 127 485 175 23 213 146 154 396 89 175 134 bro -16.156 28.566 6 23.60 601 160 54 43 2529 229 105 124 132 184 196 132 473 172 23 210 151 160 387 86 187 133 bro -16.154 28.572 5 29.45 476 163 44 36 2596 234 115 119 134 190 200 134 459 166 22 202 155 163 375 84 185 141 bro -16.155 28.572 5 23.71 510 161 43 35 2648 234 113 121 132 190 199 132 456 165 22 201 154 161 373 83 183 140

bro -16.156 28.572 4 8.98 532 159 41 33 2694 235 112 123 130 189 198 130 450 165 21 200 151 158 369 81 180 139

bro -16.156 28.572 5 16.91 539 161 45 36 2686 236 111 125 131 190 199 131 444 164 20 199 153 160 366 78 183 138
bro -16.231 28.553 7 24.47 586 164 56 47 2337 227 110 117 139 184 197 139 428 161 18 185 154 166 367 61 192 136
bro -16.230 28.554 5 25.35 627 156 48 40 2526 224 106 118 129 180 193 129 432 164 19 185 148 157 369 63 181 133
bro -16.229 28.552 3 30.40 599 155 40 34 2582 226 109 117 127 183 192 128 429 162 19 185 146 154 367 62 175 135
sun -16.237 28.515 3 23.92 271 185 44 36 2523 257 137 120 156 213 220 158 482 183 20 223 177 185 405 77 207 163
sun -16.237 28.515 3 36.60 232 188 44 36 2555 260 140 120 159 216 224 161 493 180 23 226 181 188 414 79 210 166
sun -16.237 28.515 7 10.12 272 186 50 41 2471 256 136 120 158 212 221 160 486 184 20 224 180 187 409 77 212 162
sun -16.237 28.515 7 35.55 256 192 59 49 2252 256 137 119 167 212 223 168 484 183 20 223 183 195 407 77 222 163
sun -16.237 28.516 3 28.74 210 188 44 36 2540 260 141 119 159 216 224 161 493 180 23 226 181 188 414 79 210 166
sun -16.236 28.515 3 31.57 221 188 44 36 2512 260 141 119 160 216 224 162 492 179 23 225 181 189 413 79 211 167
sun -16.236 28.515 3 20.94 193 191 44 37 2500 262 144 118 163 219 226 165 494 180 23 226 184 192 415 79 213 169

sun -16.236 28.515 3 31.78 204 190 44 36 2524 261 142 119 161 217 225 163 492 179 23 225 182 190 413 79 212 168 sun -16.236 28.515 3 34.45 245 187 44 36 2524 259 140 119 159 215 223 161 493 180 23 226 179 188 414 79 209 165 sun -16.237 28.516 4 12.29 199 190 44 36 2544 262 143 119 161 218 226 163 492 179 23 225 183 190 413 79 212 168


sun -16.237 28.516 3 32.89 236 186 43 35 2557 259 139 120 157 215 222 159 495 181 23 227 179 186 416 79 208 165 sun -16.227 28.520 7 9.37 220 194 57 47 2316 260 141 119 168 216 227 170 495 169 24 222 187 197 415 80 223 166 sun -16.227 28.520 7 3.44 220 192 53 44 2378 260 141 119 165 216 226 167 431 151 21 197 185 194 367 64 219 166 sun -16.226 28.520 8 13.09 232 195 58 49 2228 259 141 118 169 215 225 172 436 152 21 199 186 198 371 65 224 166 sun -16.218 28.524 2 26.35 464 167 42 36 2500 236 120 116 140 194 203 141 451 158 21 202 159 168 374 77 189 147 sun -16.216 28.524 3 24.58 476 166 41 35 2538 234 120 114 139 193 203 139 371 138 17 164 158 166 312 59 187 146 sun -16.217 28.524 2 25.53 472 167 44 37 2466 235 119 116 140 193 203 142 367 137 17 163 159 168 309 58 190 146 sun -16.217 28.523 3 15.81 501 164 41 35 2564 233 117 116 136 192 201 137 453 159 22 202 156 164 375 78 185 144 sun -16.217 28.523 4 24.51 490 165 42 36 2521 232 118 114 137 191 201 138 457 160 22 203 157 165 378 79 186 144 sun -16.218 28.523 3 15.93 513 164 41 35 2526 232 117 115 136 191 200 137 458 160 22 204 156 164 379 79 185 144 sun -16.217 28.522 4 23.15 530 163 42 36 2542 231 116 115 135 189 200 136 463 161 22 205 155 163 382 81 184 142 sun -16.218 28.522 4 22.63 530 162 43 37 2555 231 115 116 135 189 199 135 469 165 22 209 154 162 388 81 184 141 sun -16.220 28.523 3 14.90 561 161 43 35 2643 235 114 121 133 191 200 133 460 162 21 209 153 160 384 76 183 140

sun -16.220 28.523 8 23.88 540 168 54 44 2396 236 115 121 142 191 202 143 470 169 22 209 158 169 398 72 195 141

sun -16.220 28.524 9 20.31 536 168 54 44 2410 237 115 122 142 192 203 143 440 157 20 200 158 169 370 70 195 141 sun -16.220 28.524 3 18.20 542 163 42 35 2629 236 116 120 135 192 201 135 438 156 20 199 155 162 368 70 184 142 sun -16.220 28.524 4 9.84 537 165 46 38 2574 235 115 120 137 192 202 137 438 156 20 199 157 164 368 70 188 142 sun -16.220 28.525 6 11.21 548 168 54 44 2461 235 114 121 141 191 203 140 439 157 20 199 159 167 369 70 195 141 sun -16.220 28.525 5 3.84 556 162 43 34 2676 237 114 123 133 192 201 133 436 155 20 198 154 160 367 69 184 141 sun -16.220 28.525 8 23.68 533 170 57 46 2364 237 115 122 145 193 204 145 440 156 20 200 161 171 370 70 199 142 sun -16.220 28.526 9 26.00 540 167 52 41 2509 240 114 126 140 194 203 141 432 154 20 198 157 167 365 67 193 141 sun -16.220 28.527 8 26.22 517 170 54 42 2472 243 117 126 143 196 206 144 426 153 20 197 161 171 361 65 197 143 sun -16.211 28.530 2 26.03 321 179 44 37 2513 249 131 118 151 206 214 153 330 125 15 154 171 179 282 48 201 157 sun -16.211 28.530 2 30.34 268 184 45 38 2474 254 136 118 156 211 219 159 323 122 15 151 176 186 277 46 207 162 sun -16.211 28.530 2 31.15 300 183 47 40 2400 250 133 117 156 208 217 158 329 124 15 153 174 184 281 48 206 159

sun -16.211 28.529 2 6.94 327 179 43 37 2476 247 131 116 151 205 214 153 334 126 15 154 171 179 284 50 200 157
sun -16.213 28.528 3 21.36 383 174 42 36 2486 241 127 114 146 200 209 148 348 131 16 158 166 174 295 53 195 153
sun -16.213 28.528 5 36.92 376 177 46 40 2416 241 128 113 150 200 211 151 350 131 16 158 169 178 295 55 200 154

sun -16.214 28.527 3 18.75 436 169 42 35 2570 239 122 117 140 197 206 142 346 132 15 158 161 169 294 52 190 148 sun -16.214 28.527 5 25.18 421 172 47 40 2489 240 123 117 145 197 208 145 366 137 17 163 165 173 309 57 196 149 sun -16.214 28.527 5 35.29 417 173 47 40 2453 240 124 116 146 198 208 147 361 135 16 162 165 174 306 55 197 150 b × f -16.160 28.543 5 9.45 185 190 49 42 2386 254 139 115 163 213 223 164 455 162 23 203 183 193 370 85 215 166 b × f -16.160 28.543 7 17.30 177 194 57 49 2244 255 139 116 168 214 224 170 454 161 23 203 187 198 370 84 223 166 b × f -16.162 28.543 3 8.71 215 183 42 36 2510 251 136 115 154 210 217 156 421 146 20 189 176 184 341 80 204 162 b × f -16.162 28.543 3 2.50 204 185 42 36 2435 251 137 114 157 210 218 159 424 146 20 189 177 187 343 81 206 164 b × f -16.163 28.543 3 5.67 208 184 40 35 2438 249 137 112 156 209 217 158 428 148 20 190 177 186 346 82 204 164 b × f -16.163 28.543 3 17.11 216 183 40 36 2446 247 137 110 155 208 216 157 434 150 20 192 175 185 350 84 203 163 b × f -16.157 28.542 8 33.81 191 193 55 46 2301 258 139 119 167 215 224 169 453 161 22 204 184 196 370 83 220 165 b × f -16.158 28.542 8 27.71 158 196 56 47 2255 258 141 117 170 216 226 172 452 160 23 202 188 200 368 84 224 168 b × f -16.158 28.541 8 29.95 149 197 57 49 2242 258 142 116 171 216 227 173 452 160 23 202 189 202 368 84 226 169 b × f -16.156 28.546 7 21.16 329 183 58 46 2360 252 128 124 157 205 215 158 435 152 20 200 174 186 360 75 212 154

b × f -16.157 28.546 7 15.99 301 184 57 46 2337 253 130 123 158 206 216 159 434 151 20 199 175 187 359 75 213 156

b × f -16.157 28.546 6 18.96 286 184 57 46 2362 253 130 123 158 206 217 159 433 151 20 199 176 187 358 75 213 156
b × f -16.157 28.546 7 7.06 293 185 58 47 2354 253 131 122 159 207 218 160 431 150 20 199 177 189 356 75 215 157
b × f -16.158 28.545 8 8.85 282 185 56 45 2374 254 131 123 158 208 218 160 429 149 20 198 176 188 355 74 213 157
b × f -16.158 28.545 5 17.92 279 181 48 39 2561 254 131 123 152 208 217 153 427 149 20 197 174 182 353 74 205 157
b × f -16.159 28.543 6 22.85 204 191 54 45 2345 255 137 118 164 212 222 165 455 162 23 204 183 194 371 84 218 164
b × f -16.158 28.553 6 6.10 607 161 56 43 2549 234 106 128 134 187 198 133 473 170 22 213 153 161 389 84 190 134
b × f -16.155 28.551 6 22.67 592 161 53 42 2520 232 108 124 134 186 197 134 485 174 22 216 153 162 399 86 188 135
b × f -16.154 28.551 4 9.62 581 156 40 32 2690 231 109 122 127 186 194 127 488 173 22 216 147 155 399 89 176 136
b × f -16.154 28.551 9 22.49 604 160 52 42 2462 231 108 123 134 185 196 134 491 175 22 217 151 162 402 89 187 135
b × f -16.154 28.551 7 29.95 560 165 56 45 2425 233 110 123 139 187 199 139 483 172 22 215 155 166 397 86 193 137
b × f -16.154 28.547 7 29.60 418 175 58 45 2424 247 120 127 148 199 209 149 448 158 21 205 165 177 371 77 204 146

b × f -16.151 28.546 5 12.29 486 169 55 43 2486 241 115 126 142 194 204 142 456 162 21 206 161 170 376 80 197 142 b × f -16.152 28.546 6 31.80 486 169 55 43 2515 242 116 126 142 194 205 142 456 162 21 206 161 170 376 80 197 142 b × f -16.151 28.546 2 12.58 488 167 51 40 2483 240 115 125 140 193 203 142 456 163 21 206 158 169 376 80 193 142


b × f -16.154 28.545 6 24.89 403 177 56 45 2403 247 123 124 150 200 210 151 451 158 21 206 168 179 372 79 205 149 lem -16.152 28.558 6 16.10 578 162 54 43 2488 230 107 123 135 185 197 134 477 171 23 212 154 163 390 87 189 135 lem -16.152 28.559 9 16.79 595 158 48 40 2505 227 107 120 131 184 194 131 484 174 23 214 149 158 396 88 182 134 lem -16.153 28.558 5 18.93 584 157 47 38 2645 229 106 123 129 185 195 128 480 174 23 214 150 156 393 87 181 134 lem -16.153 28.559 3 18.80 588 155 42 34 2674 227 106 121 126 184 193 126 480 172 23 213 147 154 392 88 176 134 lem -16.152 28.559 2 15.42 585 156 43 35 2595 228 108 120 128 184 194 129 483 173 23 214 148 156 395 88 178 135 lem -16.153 28.567 5 28.93 630 157 50 40 2632 230 105 125 129 184 195 128 475 174 22 211 149 156 390 85 182 132 lem -16.151 28.567 5 35.12 589 158 47 37 2659 232 108 124 129 185 196 128 474 174 22 210 149 157 389 85 181 134 lem -16.152 28.567 3 17.88 626 153 41 33 2713 229 106 123 124 183 192 124 481 175 22 212 145 152 394 87 174 133 lem -16.152 28.567 5 36.80 616 159 51 40 2596 232 107 125 131 185 196 130 472 173 22 210 151 159 387 85 185 134 lem -16.150 28.569 5 11.83 565 162 49 41 2471 229 112 117 135 185 197 135 482 174 23 210 154 163 392 90 187 138 lem -16.143 28.575 5 35.70 223 184 45 43 2295 241 137 104 158 203 215 159 468 163 23 198 176 188 376 92 207 162 lem -16.144 28.575 5 33.50 280 182 48 45 2277 239 133 106 156 200 212 157 472 165 23 200 173 185 380 92 206 158

lem -16.147 28.574 6 22.85 265 184 52 47 2254 243 133 110 159 202 214 160 457 162 22 197 175 188 370 87 210 158

lem -16.149 28.575 5 23.89 306 181 51 44 2335 245 131 114 155 202 213 156 451 162 20 197 173 184 368 83 207 156 lem -16.151 28.573 5 14.91 362 174 48 44 2332 234 125 109 148 194 206 148 472 167 23 203 166 177 382 90 198 150 lem -16.152 28.573 5 18.51 367 174 49 44 2358 235 124 111 148 195 207 148 467 166 23 202 166 177 378 89 199 150 lem -16.152 28.573 3 13.80 374 170 41 36 2482 235 124 111 143 194 205 143 467 166 23 202 162 172 378 89 191 150 lem -16.150 28.573 4 17.66 352 172 40 37 2401 233 127 106 145 194 205 146 477 168 23 203 164 174 385 92 192 152 lem -16.141 28.577 4 23.42 168 188 39 37 2378 248 144 104 160 211 220 163 452 159 21 192 181 191 365 87 208 169 lem -16.143 28.574 4 17.98 207 186 43 42 2280 241 140 101 160 205 216 161 477 164 24 201 179 190 382 95 208 165 lem -16.148 28.574 6 11.52 273 184 52 47 2267 243 133 110 158 202 214 159 458 163 22 198 175 187 371 87 210 158 lem -16.150 28.575 5 35.80 323 178 46 40 2436 245 131 114 151 201 212 152 451 162 20 197 170 180 368 83 201 155 lem -16.152 28.573 2 19.69 374 170 41 37 2396 233 124 109 143 194 203 145 473 167 23 203 161 173 382 91 191 150 lem -16.151 28.573 5 14.47 358 174 44 41 2361 234 127 107 147 195 206 148 475 167 23 203 166 176 383 92 196 152

frf -16.229 28.508 7 18.10 93 209 60 51 2202 270 153 117 183 228 238 186 466 169 20 210 201 213 393 73 239 179
frf -16.230 28.508 6 4.17 75 209 60 51 2224 271 154 117 183 229 239 186 465 169 20 209 202 213 392 73 240 180
frf -16.230 28.507 3 17.84 68 202 44 37 2483 271 154 117 173 229 236 176 465 169 20 209 196 203 392 73 224 180

frf -16.229 28.507 6 6.87 74 210 60 51 2206 271 154 117 184 229 239 187 464 168 20 209 202 214 392 72 240 180 frf -16.229 28.507 7 8.88 61 210 60 51 2207 271 154 117 184 229 239 187 465 169 20 210 202 214 393 72 240 180 frf -16.228 28.507 8 31.34 86 209 59 51 2157 269 155 114 184 228 238 187 474 171 21 212 201 214 398 76 239 180 frf -16.229 28.505 7 6.37 57 210 58 50 2174 270 156 114 184 229 239 188 465 169 20 208 202 214 390 75 239 181 frf -16.227 28.500 3 15.99 42 205 45 39 2447 271 157 114 176 231 239 179 445 164 19 200 199 207 374 71 228 183 frf -16.228 28.500 3 25.11 60 204 44 38 2445 270 156 114 175 230 238 179 444 164 19 199 198 206 373 71 226 182 frf -16.228 28.499 3 27.27 67 202 44 38 2450 268 154 114 173 228 236 176 444 164 19 199 196 204 373 71 224 180 frf -16.228 28.499 3 33.79 114 198 44 39 2422 263 151 112 169 223 231 172 444 166 19 199 191 200 373 71 220 176 frf -16.228 28.499 3 30.87 89 200 44 38 2437 266 153 113 171 226 233 175 444 165 19 199 193 202 373 71 222 178 frf -16.228 28.499 4 33.12 82 201 44 38 2436 266 153 113 172 226 234 175 444 165 19 199 195 203 373 71 223 179 frf -16.229 28.499 3 28.10 142 196 43 38 2417 261 149 112 168 221 229 171 447 167 19 199 189 198 375 72 218 175 frf -16.229 28.499 3 26.33 115 198 43 38 2422 263 151 112 169 223 231 172 448 167 19 200 191 200 376 72 219 176 frf -16.203 28.517 3 18.28 74 200 44 38 2449 265 152 113 172 226 234 175 325 121 15 154 194 202 268 57 223 179

frf -16.203 28.517 3 21.74 105 198 43 38 2456 262 150 112 169 224 232 172 330 122 15 155 192 200 271 59 220 177

frf -16.197 28.515 8 17.22 67 210 61 51 2255 272 153 119 184 231 241 187 299 112 14 147 203 214 249 50 241 180
frf -16.197 28.511 3 13.41 34 205 45 39 2476 269 155 114 176 231 239 179 284 103 12 135 200 207 236 48 228 183
frf -16.196 28.511 4 4.86 24 209 53 45 2367 271 155 116 181 232 241 184 280 102 12 134 204 212 233 47 236 183
frf -16.196 28.510 4 8.37 27 206 46 39 2503 271 155 116 176 232 240 179 279 101 12 134 200 207 232 47 229 183
frf -16.196 28.510 6 1.58 23 213 61 52 2210 272 155 117 187 233 243 190 277 101 12 134 207 218 231 46 244 183
frf -16.196 28.513 7 19.88 72 208 60 50 2230 268 150 118 182 228 238 185 280 103 12 136 201 212 235 45 238 178
frf -16.196 28.513 8 17.63 62 210 60 51 2211 269 152 117 183 229 239 187 278 102 12 135 203 214 233 45 240 180
frf -16.196 28.512 7 14.87 48 211 60 51 2230 269 152 117 184 230 240 187 279 102 12 135 204 215 234 45 241 181
frf -16.196 28.513 8 22.17 56 209 59 50 2243 269 152 117 183 230 239 186 279 102 12 136 202 213 234 45 239 180
frf -16.197 28.513 8 13.12 41 211 60 51 2206 270 153 117 185 230 240 188 281 103 12 136 204 215 235 46 241 181
frf -16.157 28.536 4 34.64 117 195 47 40 2460 260 145 115 166 219 228 168 440 155 22 196 188 197 357 83 218 171

frf -16.156 28.535 5 5.88 81 199 48 42 2416 262 149 113 171 223 231 173 442 155 23 197 193 202 358 84 223 175 frf -16.159 28.528 3 31.42 147 190 39 37 2349 248 145 103 162 213 221 165 375 138 17 168 183 194 298 77 210 171 frf -16.159 28.528 3 32.29 131 192 40 38 2346 250 146 104 164 215 223 167 424 154 20 182 185 195 340 84 212 172


frf -16.159 28.528 4 15.53 136 190 40 38 2381 249 145 104 162 213 222 165 422 154 20 182 184 194 338 84 211 171 frf -16.158 28.528 4 17.39 131 190 39 36 2404 250 144 106 162 214 222 165 418 153 20 181 184 193 336 82 210 171 frf -16.158 28.528 4 22.61 108 192 40 37 2414 252 145 107 163 216 224 166 414 151 20 180 185 195 333 81 212 172 frf -16.157 28.528 3 25.09 77 197 42 38 2402 258 149 109 168 221 229 171 413 149 20 180 190 200 333 80 218 176 frf -16.157 28.528 3 21.58 73 197 41 36 2418 260 149 111 168 222 229 172 410 148 20 180 191 200 332 78 218 177 frs -16.135 28.580 4 13.87 3 204 45 41 2420 262 153 109 175 227 236 178 417 142 22 176 199 207 335 82 227 182 frs -16.136 28.579 4 18.39 43 199 45 40 2440 259 149 110 170 223 231 172 419 144 22 177 193 202 337 82 222 177 frs -16.135 28.580 4 20.39 26 202 46 42 2403 260 151 109 173 224 233 175 418 143 22 177 196 205 336 82 225 179 frs -16.136 28.579 5 21.91 30 202 47 42 2400 260 150 110 173 224 233 175 419 144 22 177 197 205 337 82 226 179 frs -16.137 28.578 4 16.89 53 202 50 45 2323 259 150 109 174 222 232 176 424 146 22 179 196 206 341 83 227 177 frs -16.136 28.578 4 17.89 43 201 47 42 2388 259 149 110 172 223 232 174 419 144 22 177 195 204 337 82 224 177 frs -16.191 28.574 8 22.51 7 207 51 52 2068 251 154 97 183 221 232 186 348 113 20 145 200 215 290 58 233 182 frs -16.186 28.575 2 33.45 25 204 42 43 2165 252 155 97 178 221 231 182 354 115 20 148 197 210 295 59 225 183

frs -16.178 28.578 2 29.56 158 188 39 41 2172 238 143 95 163 206 217 166 523 171 27 215 180 194 417 106 208 169

frs -16.178 28.578 9 30.39 174 190 45 46 2082 238 142 96 166 205 217 169 480 157 27 200 181 197 385 95 213 168 frs -16.172 28.581 9 31.69 235 183 41 44 2097 232 139 93 160 199 210 162 535 177 27 219 174 189 426 109 204 163 frs -16.171 28.581 2 33.04 238 180 37 39 2179 231 138 93 156 198 209 158 534 177 27 218 171 186 425 109 199 162 frs -16.165 28.582 2 25.58 217 183 40 43 2137 230 138 92 158 199 210 161 503 168 26 204 174 189 398 105 203 163 frs -16.165 28.581 2 20.81 242 181 39 42 2129 228 137 91 157 198 209 159 505 169 26 205 173 187 400 105 201 162 frs -16.165 28.582 2 16.86 189 185 39 42 2145 232 140 92 160 201 212 163 497 164 26 201 177 191 394 103 205 166 frs -16.166 28.582 9 21.60 188 188 43 47 2059 232 141 91 164 202 214 167 500 165 26 202 179 195 395 105 210 167 frs -16.166 28.582 9 20.72 182 189 44 48 2048 232 141 91 165 202 214 168 497 164 26 201 180 196 394 103 211 167 frs -16.153 28.583 3 37.17 243 178 35 39 2184 226 138 88 153 195 207 156 514 174 26 206 170 184 405 109 196 161 frs -16.153 28.583 3 38.48 284 176 34 38 2180 224 136 88 151 193 205 153 519 177 26 209 167 181 410 109 193 159 frs -16.153 28.583 3 41.09 252 179 33 37 2173 226 139 87 154 196 207 157 516 175 26 206 171 185 407 109 196 163

frs -16.206 28.569 2 25.83 20 204 43 46 2076 248 156 92 180 219 230 184 367 124 19 146 196 211 309 58 226 183
frs -16.207 28.568 2 29.84 48 201 40 43 2130 246 155 91 175 217 227 179 372 126 19 148 194 208 313 59 221 181
frs -16.234 28.563 3 28.03 192 184 35 38 2169 232 142 90 159 201 212 162 401 140 20 156 177 190 337 64 202 167

frs -16.234 28.564 3 29.47 149 187 35 39 2176 234 145 89 162 204 215 165 400 139 20 155 180 193 336 64 205 170 frs -16.250 28.570 9 14.80 94 200 44 44 2184 253 154 99 175 219 229 178 444 149 26 178 193 203 358 86 222 178 frs -16.253 28.563 3 12.56 96 200 44 41 2388 263 156 107 172 224 232 175 413 144 21 174 193 201 341 72 222 178
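The caption of Appendix Table B.4 states that localities were filtered to retain one site per 50 m pixel. A minimal sketch of that kind of spatial thinning is given below. It assumes coordinates already projected to metres (for example, UTM eastings and northings) and keeps the first point encountered in each grid cell. The function `thin_to_grid` and the example coordinates are hypothetical, and the procedure actually used in the study may differ.

```python
import math


def thin_to_grid(points, cell_size=50.0):
    """Keep the first point that falls in each square grid cell.

    points    -- iterable of (x, y) coordinates in metres (assumed projected)
    cell_size -- pixel edge length in metres
    """
    seen_cells = set()
    kept = []
    for x, y in points:
        # Identify the pixel by its integer column and row index.
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        if cell not in seen_cells:
            seen_cells.add(cell)
            kept.append((x, y))
    return kept


# Hypothetical coordinates: the first two points share one 50 m pixel.
points = [(10.0, 10.0), (40.0, 45.0), (60.0, 10.0), (120.0, 130.0)]
thinned = thin_to_grid(points)
print(thinned)
```

Retaining the first point per cell is only one possible rule; choosing a random point or the cell centroid would be equally valid under the same caption.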


Appendix Table B.5 - Summary statistics of nuclear (A) and chloroplast (B) SSRs, including sample size (N); number of alleles (Na); effective number of alleles (Ne); information index (I); observed heterozygosity (Ho); expected heterozygosity (He); unbiased expected heterozygosity (uHe); diversity (h); unbiased diversity by population (uh). Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).

A. nSSRs
Taxon      N     Na    Ne    I     Ho    He    uHe
bro        44.1  5     2.58  1.09  0.42  0.57  0.57
sun        35.5  6.75  4.02  1.5   0.56  0.72  0.73
bro × fru  12.4  5.5   4.1   1.43  0.5   0.7   0.73
lem        39.6  6.63  3.44  1.4   0.55  0.69  0.7
frf        28.3  6.5   3.21  1.28  0.41  0.61  0.62
frs        14.3  3.5   1.81  0.67  0.24  0.36  0.37

B. cpSSRs
Taxon      N     Na    Ne    I     h     uh
bro        49    2.75  1.39  0.46  0.24  0.25
sun        39.3  2.25  1.44  0.42  0.24  0.25
bro × fru  13.3  2.5   1.72  0.64  0.38  0.41
lem        42    3.5   2.23  0.95  0.55  0.56
frf        27.8  3     2.24  0.89  0.55  0.57
frs        14.5  1.5   1.29  0.24  0.16  0.17
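The diversity statistics reported in Appendix Table B.5 can be illustrated with a short sketch. It assumes the standard frequency-based definitions (Ne = 1/Σp², I = −Σ p ln p, He or h = 1 − Σp², and an unbiased estimate scaled by n/(n − 1), where n is the number of gene copies sampled). The appendix does not state these formulas, so they are an assumption here, and the function name and allele counts are hypothetical.

```python
import math


def diversity_stats(allele_counts):
    """Per-locus diversity statistics from allele counts (gene copies).

    Assumed definitions (not taken from the thesis): for diploid nuclear
    SSRs the counts sum to 2N gene copies; for haploid chloroplast SSRs
    they sum to N, in which case He and uHe correspond to h and uh.
    """
    counts = [c for c in allele_counts if c > 0]
    n = sum(counts)                      # number of gene copies sampled
    freqs = [c / n for c in counts]      # allele frequencies
    sum_sq = sum(p * p for p in freqs)   # expected homozygosity
    he = 1.0 - sum_sq
    return {
        "Na": len(counts),                            # number of alleles
        "Ne": 1.0 / sum_sq,                           # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),    # information index
        "He": he,                                     # expected heterozygosity / diversity
        "uHe": he * n / (n - 1) if n > 1 else float("nan"),
    }


# Hypothetical locus: two alleles observed 6 and 2 times.
stats = diversity_stats([6, 2])
```

Observed heterozygosity (Ho) is not covered by this sketch because it requires individual genotypes rather than pooled allele counts.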

Appendix Table B.6 - Average number of clusters, clusters that passed the minimum depth requirement, heterozygosity estimate, error estimate, number of consensus reads and loci per sample. The total number of unlinked SNPs (u SNPs) in each assembly used for PCA and STRUCTURE is also shown.

Cluster              Clusters >   Hetero.    Error      Consensus  Loci  Loci  u SNPs  u SNPs
threshold  Clusters  min. depth   estimate   estimate   reads      m10   m13   m10     m13
80%        59,494    18,735       0.0394     0.0164     16,194     2850  2070  3556    2142
85%        67,313    20,818       0.0350     0.0156     18,586     3127  2269  3939    2370
90%        87,312    25,621       0.0284     0.0130     23,988     3640  2574  4736    2751


Appendix Table B.7 - Model checking scenario 1. Summary statistics included variance of non-zero values for genic diversities (HV1_1), two sample Fst (FV1_1) and Nei’s distances (NV1_1) as well as admixture statistics (AV1_1).

Proportion Summary statistics Observed value (simulated


Appendix Figure B.1 - Images of A. broussonetii, A. frutescens subsp. frutescens, A. frutescens subsp. succulentum, A. sundingii and A. lemsii.


Appendix Figure B.2 - Summary plots for a generalised linear model of leaf area generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots.


Appendix Figure B.3 - Summary plots for a linear model of leaf perimeter generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots.


Appendix Figure B.4 - Summary plots for a linear model of leaf length generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots.


Appendix Figure B.5 - Summary plots for a linear model of leaf width generated in R (R Core Team, 2018) including residuals vs fitted, normal Q-Q, scale-location and residuals vs leverage plots.


Appendix Figure B.6 - Correlation circle showing the contribution of each variable to the Principal Components Analysis.


Appendix Figure B.7 - Niche equivalency test plots for each comparison based on Schoener’s D statistic (Schoener, 1968) and Maxent predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).


Appendix Figure B.8 - Niche equivalency test plots for each comparison based on Warren’s I statistic (Warren et al., 2008) and Maxent predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).


Appendix Figure B.9 - Niche equivalency test plots for each comparison based on Schoener’s D statistic (Schoener, 1968) and PCA-env predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).


Appendix Figure B.10 - Niche equivalency test plots for each comparison based on Warren’s I statistic (Warren et al., 2008) and PCA-env predictions. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).
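The two overlap statistics named in Appendix Figures B.7 to B.10 can be sketched as follows. The sketch assumes the commonly used definitions, D = 1 − ½ Σ|p1,i − p2,i| and I = 1 − ½ Σ(√p1,i − √p2,i)², applied to two suitability or occupancy surfaces that are each normalised to sum to one. The appendix does not restate the formulas, and the function names and input values below are hypothetical; this is not the software used in the thesis.

```python
import math


def normalise(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [v / total for v in values]


def schoeners_d(p1, p2):
    """Schoener's D for two normalised surfaces (assumed definition)."""
    return 1.0 - 0.5 * sum(abs(a - b) for a, b in zip(p1, p2))


def warrens_i(p1, p2):
    """Warren's I for two normalised surfaces (assumed definition)."""
    return 1.0 - 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p1, p2))


# Hypothetical suitability scores for two taxa over the same four grid cells.
taxon_a = normalise([0.2, 0.4, 0.3, 0.1])
taxon_b = normalise([0.1, 0.1, 0.4, 0.4])
overlap_d = schoeners_d(taxon_a, taxon_b)
overlap_i = warrens_i(taxon_a, taxon_b)
```

Both statistics range from 0 (no overlap) to 1 (identical surfaces). An equivalency test compares the observed value with values obtained after repeatedly reshuffling occurrence records between the two taxa.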


Appendix Figure B.11 - Eigenvalues for additional principal components and PCAs including axes one, two and three for (A) parents only, (B) all taxa and (C) hybrid taxa only.

Appendix Figure B.12 - Median-joining haplotype network produced using Network 5.0 (Bandelt et al., 1999). Haplotype circle size is proportional to sample number, colour indicates the putative taxon and branch length is proportional to the number of SSR motif changes. To conform to the software requirements of Network, allele fragment sizes for the chloroplast SSRs were converted to simple numeric characters. For example, Ntcp39 allele fragment sizes 163, 164 and 165 were converted to 1, 2 and 3 respectively. These simplified numeric scores for the chloroplast SSR markers are presented adjacent to the haplotype circles.
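The recoding of fragment sizes described in this caption can be sketched as below. The sketch assumes that codes are assigned by ascending rank of the distinct fragment sizes at a locus, which reproduces the Ntcp39 example (163, 164, 165 to 1, 2, 3). How gaps between sizes were handled is not stated in the caption, and the function name is hypothetical.

```python
def recode_fragment_sizes(fragment_sizes):
    """Map each distinct fragment size at a locus to a rank-based integer code.

    Assumption: codes follow ascending size order, starting at 1.
    """
    ordered = sorted(set(fragment_sizes))
    return {size: code for code, size in enumerate(ordered, start=1)}


# Fragment sizes for Ntcp39 as given in the caption.
ntcp39_codes = recode_fragment_sizes([165, 163, 164, 163])
# Recode a list of per-sample allele calls using the mapping.
recoded = [ntcp39_codes[size] for size in [163, 165, 164]]
```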


Appendix Figure B.13 - Number of filtered reads for each sample included in our analysis of genomic SNPs. The horizontal line at 500,000 reads marks the arbitrary cut-off below which samples were removed prior to assembly.

Appendix Figure B.14 - Eigenvalues for additional principal components and PCAs including axes one, two and three for each ipyrad assembly method: (A) 80 % cluster threshold and 10 minimum samples, (B) 80 % cluster threshold and 13 minimum samples, (C) 85 % cluster threshold and 10 minimum samples, (D) 85 % cluster threshold and 13 minimum samples, (E) 90 % cluster threshold and 10 minimum samples and finally (F) 90 % cluster threshold and 13 minimum samples.

Appendix Figure B.15 - Delta K and STRUCTURE plots for K values two, three, four and five for each ipyrad assembly method: (A) 80 % cluster threshold and 10 minimum samples, (B) 80 % cluster threshold and 13 minimum samples, (C) 85 % cluster threshold and 10 minimum samples, (D) 85 % cluster threshold and 13 minimum samples, (E) 90 % cluster threshold and 10 minimum samples and finally (F) 90 % cluster threshold and 13 minimum samples. Taxa are abbreviated to bro (A. broussonetii), sun (A. sundingii), bro × fru (A. broussonetii × A. frutescens), lem (A. lemsii), frf (A. frutescens subsp. frutescens) and frs (A. frutescens subsp. succulentum).


Appendix Figure B.16 - First two axes of the PCA as part of the model checking function of DIYABC. Small empty circles represent datasets simulated from priors, large filled circles represent datasets simulated from posteriors and the large yellow circle represents the observed dataset. See Materials and Methods for description.


Supplementary information for chapter 4

Appendix Table C.1 - Summary of ipyrad parameters used for each assembly. Assemblies differ in their clustering threshold and the minimum number of samples required to include a locus and are named based on these parameters. Thresholds of 80 %, 85 % and 90 % are denoted by c80, c85 and c90 respectively, whereas the minimum numbers of 30 and 38 are denoted by m30 and m38 respectively.

Ipyrad parameter  c80-m30  c80-m38  c85-m30  c85-m38  c90-m30  c90-m38
assembly_method  denovo-reference  denovo-reference  denovo-reference  denovo-reference  denovo-reference  denovo-reference
datatype  gbs  gbs  gbs  gbs  gbs  gbs
restriction_overhang  TGCA,  TGCA,  TGCA,  TGCA,  TGCA,  TGCA,
max_low_qual_bases  5  5  5  5  5  5
phred_Qscore_offset  33  33  33  33  33  33
mindepth_statistical  6  6  6  6  6  6
mindepth_majrule  6  6  6  6  6  6
maxdepth  10000  10000  10000  10000  10000  10000
clust_threshold  0.8  0.8  0.85  0.85  0.9  0.9
max_barcode_mismatch  0  0  0  0  0  0
filter_adapters  2  2  2  2  2  2
filter_min_trim_len  35  35  35  35  35  35
max_alleles_consens  2  2  2  2  2  2
max_Ns_consens  5  5  5  5  5  5
max_Hs_consens  8  8  8  8  8  8
min_samples_locus  30  38  30  38  30  38
max_SNPs_locus  20  20  20  20  20  20
max_Indels_locus  8  8  8  8  8  8
max_shared_Hs_locus  0.2  0.2  0.2  0.2  0.2  0.2
trim_reads  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0
trim_loci  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0  0, 0, 0, 0
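The six assemblies in Appendix Table C.1 differ only in clust_threshold and min_samples_locus, so the naming scheme can be sketched as a small parameter grid. This is an illustrative sketch only: the function name is hypothetical, and the code builds plain dictionaries rather than calling ipyrad itself.

```python
from itertools import product


def assembly_grid(thresholds=(0.80, 0.85, 0.90), min_samples=(30, 38)):
    """Return a dict mapping assembly names (e.g. 'c80-m30') to the two
    parameters that vary between assemblies in Appendix Table C.1."""
    grid = {}
    for c, m in product(thresholds, min_samples):
        name = f"c{round(c * 100)}-m{m}"
        grid[name] = {"clust_threshold": c, "min_samples_locus": m}
    return grid


grid = assembly_grid()
```

All other parameters in the table are shared across assemblies, so they could be held in a single base dictionary and merged with each entry of the grid.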

Appendix Table C.2 - Summary statistics for the samples used in our GBS assemblies generated using ipyrad. See Table 1 for details of the samples. The parameters shown are: number of raw reads (raw), number of filtered reads (filtered), number of reads that mapped to the chloroplast and mitochondrial reference sequences (plastid), the number of retained unmapped reads (nuclear), clusters that passed the minimum depth requirement (clus.), heterozygosity estimates (het.), error estimates (err.) and consensus sequences (cons.) for each similarity threshold (c80 = 80 %, c85 = 85 % and c90 = 90 %) and finally the number of loci in each assembly (m30 = minimum sample number of 30 and m38 = minimum sample number of 38).

ref.  raw (×10^-6)  filtered (×10^-6)  plastid  nuclear  clus. c80  clus. c85  clus. c90  het. c80  het. c85  het. c90  err. c80  err. c85  err. c90  cons. c80  cons. c85  cons. c90  loci c80-m30  loci c80-m38  loci c85-m30  loci c85-m38  loci c90-m30  loci c90-m38
GcB8  2.27  2.02  2827  545033  21212  23423  28187  0.039  0.034  0.027  0.015  0.014  0.012  18277  20940  26514  722  639  783  690  792  695
GsB13  1.86  1.71  1614  518108  18954  20885  24917  0.035  0.031  0.025  0.013  0.013  0.011  16688  18896  23602  260  237  244  226  204  182

adad120  9.27  8.44  3956  1636773  47509  53967  68783  0.036  0.032  0.028  0.012  0.011  0.010  41392  48282  64355  3302  2367  3735  2663  4343  3029
adad135  9.14  8.35  6750  1613834  48415  54940  70083  0.036  0.033  0.028  0.012  0.011  0.010  42291  49171  65527  3400  2412  3804  2704  4442  3080
adca363  5.27  4.79  4402  1103700  36076  40574  51222  0.038  0.034  0.028  0.014  0.013  0.011  31043  36033  47727  3193  2314  3580  2572  4170  2951
adca366  8.25  7.54  5295  1438506  45163  50985  64822  0.037  0.033  0.028  0.011  0.011  0.010  39349  45655  60722  3309  2383  3728  2673  4348  3046
addu163  3.92  3.57  3313  911995  30703  34407  43357  0.037  0.034  0.028  0.015  0.014  0.012  26415  30554  40370  3009  2228  3422  2515  3966  2868
addu166  4.50  4.15  3015  965844  34511  38677  48267  0.037  0.034  0.029  0.014  0.013  0.011  29920  34482  45046  3227  2361  3633  2635  4233  2992
ader43  2.84  2.58  2256  687885  25658  28498  35174  0.037  0.033  0.027  0.015  0.014  0.012  22181  25378  32814  2832  2122  3172  2368  3638  2670
ader44  6.26  5.74  3581  1091496  37500  42097  53079  0.035  0.032  0.027  0.011  0.011  0.009  32711  37609  49750  3048  2241  3447  2503  3986  2848
adgr356  2.76  2.58  1873  695085  25197  28168  35079  0.038  0.034  0.028  0.015  0.014  0.012  21772  25040  32815  2775  2135  3112  2365  3633  2704
adgr360  3.83  3.55  2857  864748  30813  34447  43074  0.037  0.033  0.028  0.014  0.013  0.011  26844  30885  40325  3097  2291  3515  2575  4116  2938
adja368  6.45  5.99  2534  1230767  40342  45414  57484  0.036  0.033  0.027  0.013  0.012  0.010  35173  40587  53795  3222  2336  3641  2616  4219  2979
adpa58  5.22  4.82  6035  1051006  39107  43419  53315  0.034  0.031  0.026  0.012  0.012  0.010  34452  39192  50222  3087  2271  3488  2540  4061  2880
adpa62  4.69  4.37  2204  949779  33570  37481  47068  0.037  0.033  0.028  0.012  0.012  0.010  29218  33554  44067  3068  2265  3486  2550  4036  2865

brbr157  1.89  1.75  823  514977  18902  20886  25145  0.038  0.034  0.027  0.015  0.014  0.012  16348  18572  23499  2069  1653  2333  1852  2727  2117
brbr494  4.87  4.47  3614  988107  34965  39147  48470  0.036  0.032  0.027  0.013  0.012  0.011  30523  35193  45491  3104  2284  3478  2528  4057  2902
brbr552  2.30  2.12  2579  605562  22755  25087  30727  0.036  0.033  0.027  0.015  0.014  0.012  19816  22469  28670  2531  1942  2821  2150  3307  2479
brbr664  5.24  4.79  4061  1042106  36778  41022  51114  0.035  0.032  0.026  0.012  0.012  0.010  32251  36864  48135  3148  2297  3513  2543  4092  2909
brbr674  2.95  2.75  1844  708384  26366  29451  36315  0.037  0.033  0.028  0.014  0.014  0.012  22840  26309  34042  2766  2081  3085  2299  3653  2658
brbr719  1.08  1.00  710  337954  12454  13631  16175  0.040  0.035  0.028  0.017  0.016  0.013  10661  12010  15046  1394  1162  1534  1274  1770  1438
brbr749  2.40  2.26  980  606130  23351  25902  31772  0.037  0.033  0.026  0.014  0.013  0.011  20392  23318  29972  2598  1980  2900  2194  3385  2528
brbr768  1.87  1.76  546  513720  19425  21560  26252  0.038  0.033  0.027  0.015  0.014  0.012  16877  19359  24654  2210  1764  2449  1940  2872  2232
brgo110  4.15  3.83  2386  862743  31703  35449  44181  0.036  0.032  0.026  0.013  0.012  0.010  27832  32111  41728  3103  2277  3491  2535  4083  2900
brgo112  3.04  2.82  1754  679105  26821  29791  36830  0.036  0.032  0.027  0.013  0.012  0.011  23540  26939  34773  2955  2224  3302  2458  3871  2819
ca95  2.62  2.41  3784  648277  24759  27632  34272  0.037  0.033  0.028  0.015  0.014  0.012  21529  24760  32232  2753  2091  3102  2328  3637  2674
ca97  3.45  3.17  2749  789047  28890  32367  40318  0.037  0.033  0.028  0.014  0.013  0.011  25268  29106  37883  3015  2236  3385  2486  3951  2855

co80  1.88  1.74  2145  505435  19448  21510  26296  0.037  0.033  0.027  0.015  0.014  0.012  16801  19261  24724  2171  1712  2458  1915  2922  2222
coR107b  1.82  1.70  1080  487777  19337  21320  25969  0.036  0.032  0.026  0.015  0.014  0.012  16764  19131  24415  2218  1776  2511  1986  2961  2266
diR13  1.45  1.36  1146  446337  15116  16719  20572  0.041  0.037  0.030  0.017  0.016  0.013  12954  14811  18985  1564  1297  1752  1438  2027  1636
diR19  1.55  1.44  992  453403  15897  17676  21618  0.040  0.035  0.028  0.017  0.016  0.013  13538  15613  20079  1502  1234  1681  1362  1931  1550
es335  1.89  1.76  2156  541455  19241  21423  26736  0.040  0.037  0.029  0.017  0.016  0.013  16501  19020  24875  2123  1689  2333  1837  2739  2106
es338  2.73  2.52  2581  737650  25697  28705  36027  0.038  0.034  0.029  0.016  0.015  0.012  22126  25567  33656  2690  2028  2997  2257  3476  2552
fi344  3.25  3.02  2575  789305  28531  32015  40119  0.037  0.034  0.028  0.015  0.014  0.012  24755  28674  37639  2982  2214  3326  2489  3897  2833
fi346  2.39  2.22  2877  618631  22943  25749  32127  0.038  0.034  0.028  0.016  0.015  0.012  19874  22976  30045  2520  1956  2805  2168  3243  2435
fo142  2.93  2.70  4551  712424  26796  29927  37368  0.037  0.033  0.027  0.014  0.014  0.012  23270  26859  35109  2882  2183  3251  2435  3768  2786
fo144  2.31  2.14  1562  627310  22441  24976  31093  0.039  0.034  0.028  0.015  0.015  0.012  19323  22378  29162  2367  1850  2691  2091  3148  2401
frca319  0.60  0.55  313  224226  6772  7401  8878  0.046  0.041  0.031  0.020  0.019  0.016  5517  6330  8017  563  473  632  531  708  578
frca320  2.08  1.93  1070  618941  20349  22571  27856  0.039  0.035  0.028  0.017  0.016  0.014  17314  19919  25839  2258  1795  2555  2030  2984  2311
frfo107  3.71  3.47  1759  829440  30058  33709  42243  0.036  0.032  0.027  0.013  0.013  0.011  26351  30500  39811  3050  2270  3436  2530  3986  2875

frfo116  1.01  0.95  691  336176  11362  12562  15341  0.042  0.037  0.029  0.018  0.017  0.014  9633  11072  14179  1192  995  1314  1083  1535  1238
frfr567  4.68  4.36  3405  1005779  34812  39196  49769  0.037  0.033  0.027  0.013  0.013  0.011  30365  35275  46862  3189  2337  3582  2600  4185  2963
frfr585  1.95  1.79  2049  561988  19477  21721  26762  0.038  0.034  0.029  0.017  0.016  0.013  16824  19398  24952  2358  1869  2634  2087  3058  2383
frfr611  1.88  1.75  2592  535609  19097  21264  26274  0.039  0.035  0.028  0.016  0.015  0.013  16485  18947  24467  2304  1829  2589  2042  2996  2297
frfr620  3.52  3.24  4024  832126  29519  33056  41879  0.037  0.033  0.029  0.015  0.014  0.012  25689  29732  39330  3035  2290  3442  2577  4015  2941
frgr177  2.69  2.49  2636  783207  25347  28306  35500  0.038  0.034  0.028  0.016  0.015  0.012  21914  25306  33153  2537  1959  2854  2205  3345  2518
frgr179  3.17  2.93  3695  777083  27421  30778  38592  0.038  0.033  0.027  0.014  0.014  0.012  23843  27747  36291  2959  2234  3337  2507  3877  2871
frpa101  2.00  1.87  2069  532465  20219  22430  27788  0.039  0.034  0.027  0.015  0.015  0.012  17432  20071  26045  2312  1820  2612  2027  3012  2292
frpa92  4.06  3.68  3285  895667  31497  35331  44299  0.035  0.032  0.026  0.014  0.013  0.011  27511  31819  41799  3041  2273  3428  2539  3965  2893
frsu229  2.64  2.44  1656  649348  23460  26204  32909  0.037  0.033  0.027  0.015  0.014  0.011  20267  23468  30858  2582  2013  2922  2254  3394  2544
frsu234  1.23  1.15  896  376951  13306  14690  17877  0.040  0.035  0.028  0.016  0.016  0.013  11348  12984  16623  1475  1233  1626  1340  1883  1526
frsu244  0.71  0.64  382  257436  8135  9014  10625  0.042  0.036  0.029  0.019  0.019  0.015  6787  7791  9693  815  691  923  771  1029  854

gr169  5.33  4.89  6919  1074815  37877  42473  53970  0.037  0.033  0.027  0.012  0.012  0.010  33213  38409  51136  3212  2354  3620  2625  4241  2993
gr172  3.70  3.45  2549  855351  30412  34136  43082  0.038  0.033  0.027  0.014  0.013  0.011  26469  30761  40578  3054  2280  3440  2549  3998  2897
haeR15  1.49  1.36  1713  455852  15094  16799  20709  0.040  0.035  0.029  0.018  0.017  0.014  12754  14809  19183  1524  1257  1711  1411  1967  1595
hao56  2.65  2.48  1594  598565  24397  27094  33511  0.036  0.032  0.026  0.014  0.013  0.011  21289  24488  31598  2709  2066  3035  2317  3551  2636
hao57  1.35  1.27  819  398987  14752  16251  19805  0.041  0.035  0.028  0.017  0.016  0.013  12599  14466  18496  1594  1279  1798  1449  2134  1689
hi38  0.79  0.74  480  264016  9161  10130  12194  0.044  0.039  0.030  0.019  0.018  0.015  7640  8841  11248  872  737  968  815  1141  924
hi47  1.00  0.94  572  316904  11970  13099  15929  0.042  0.036  0.029  0.018  0.017  0.014  10118  11544  14774  1209  958  1345  1087  1572  1248
li321  1.23  1.15  1590  378361  14146  15567  18755  0.039  0.034  0.028  0.017  0.016  0.014  12133  13852  17436  1574  1284  1757  1435  2037  1630
li325  0.54  0.51  309  202509  6244  6865  8372  0.046  0.040  0.032  0.020  0.019  0.016  5086  5906  7594  484  413  553  470  628  514
ma775  1.13  1.05  958  341546  13176  14533  17508  0.038  0.034  0.027  0.017  0.016  0.013  11338  12916  16326  1506  1241  1680  1372  1908  1536
ma777  1.02  0.95  814  330710  12290  13466  16287  0.039  0.034  0.027  0.016  0.016  0.014  10562  12011  15153  1346  1103  1490  1218  1742  1397
pimoR25  3.11  2.86  2332  713103  25842  28690  35164  0.036  0.032  0.026  0.014  0.013  0.011  22564  25903  33220  2498  1966  2804  2182  3236  2496
pipiR38  1.28  1.19  1157  376080  12565  13875  16911  0.041  0.037  0.030  0.017  0.016  0.013  10646  12242  15714  1121  934  1281  1046  1462  1198

pisuR3  0.82  0.77  733  286027  9190  10077  12007  0.043  0.037  0.029  0.019  0.017  0.015  7659  8814  11076  827  700  916  774  1039  866
svR119a  1.78  1.67  2371  501021  19550  21753  26918  0.037  0.032  0.026  0.016  0.015  0.012  16835  19412  25268  2076  1660  2329  1845  2763  2135
svR119b  3.05  2.84  2430  671793  27206  30492  37741  0.035  0.031  0.025  0.013  0.013  0.011  23808  27493  35642  2834  2126  3163  2369  3724  2718
te159  2.06  1.91  1204  531694  19252  21458  26950  0.038  0.034  0.028  0.016  0.015  0.012  16688  19185  25252  2184  1756  2461  1953  2874  2214
te160  1.69  1.58  1556  484362  17195  19114  23578  0.040  0.034  0.028  0.016  0.016  0.013  14782  17048  21996  1935  1565  2189  1757  2541  1987
thM1  2.64  2.42  1245  622892  23437  25997  32470  0.035  0.031  0.026  0.014  0.014  0.012  20493  23419  30537  2249  1777  2512  1956  2861  2210
vi123  1.44  1.35  1247  443182  15581  17312  21347  0.041  0.037  0.030  0.017  0.016  0.014  13326  15322  19784  1746  1424  1957  1582  2282  1815
vi125  1.15  1.08  1358  371392  12436  13825  16750  0.041  0.038  0.030  0.018  0.017  0.014  10482  12127  15514  1419  1202  1571  1309  1853  1498
we49  2.29  2.13  1072  581308  22069  24504  30383  0.038  0.034  0.027  0.015  0.014  0.012  19206  22058  28584  2504  1936  2774  2140  3243  2451
we50  2.57  2.37  2574  642781  23061  25632  31847  0.038  0.034  0.028  0.015  0.014  0.012  19987  22958  29901  2500  1934  2811  2161  3271  2455
wiA2y  3.66  3.42  1932  792572  29553  33099  41694  0.035  0.032  0.026  0.014  0.013  0.011  25843  29790  39315  2862  2162  3194  2412  3761  2755
Mean  2.87  2.65  2238  670693  23745  26497  32946  0.038  0.034  0.028  0.015  0.014  0.012  20583  23706  30865  2312  1773  2595  1975  3020  2250


Appendix Table C.3 - Cluster classification for each dataset identified by mclust. The datasets generated in ipyrad vary in clustering threshold (c) and the minimum number of samples required to process a locus (m). Country or island of origin is abbreviated as SP (Spain), TE (Tenerife), GC (Gran Canaria), EH (El Hierro), LP (La Palma), LG (La Gomera), MA (Madeira), LA (Lanzarote), SE (Selvagem Pequena) and FU (Fuerteventura). Habitat types are abbreviated as ME (Mediterranean basin), PF (Pine forest), LF (Laurel forest), SZ (Sclerophyllous zone), CD (Coastal desert) and HD (High altitude desert).

taxa  island  habitat  c80-m30  c80-m38  c85-m30  c85-m38  c90-m30  c90-m38
A. adauctum subsp. adauctum 120  TE  PF  1  1  1  1  1  1
A. adauctum subsp. adauctum 135  TE  LF  1  1  1  1  1  1
A. adauctum subsp. canariense 363  GC  PF  1  1  1  1  1  1
A. adauctum subsp. canariense 366  GC  PF  1  1  1  1  1  1
A. adauctum subsp. dugourii 163  TE  PF  1  1  1  1  1  1
A. adauctum subsp. dugourii 166  TE  PF  1  1  1  1  1  1
A. adauctum subsp. gracile 356  GC  PF  1  1  1  1  1  1
A. adauctum subsp. gracile 360  GC  PF  1  1  1  1  1  1
A. adauctum subsp. jacobaeifolium 368  GC  LF  1  1  1  1  1  1
A. coronopifolium 80  TE  SZ  1  1  1  1  1  1
A. coronopifolium r107  TE  SZ  1  1  1  1  1  1
A. escarrei 335  GC  SZ  1  1  1  1  1  1
A. escarrei 338  GC  SZ  1  1  1  1  1  1
A. filifolium 344  GC  SZ  1  1  1  1  1  1
A. filifolium 346  GC  SZ  1  1  1  1  1  1
A. foeniculaceum 142  TE  SZ  1  1  1  1  1  1
A. foeniculaceum 144  TE  SZ  1  1  1  1  1  1
A. lidii 321  GC  SZ  1  1  1  1  1  1
A. lidii 325  GC  SZ  1  1  1  1  1  1
A. tenerifae 159  TE  HD  1  1  1  1  1  1
A. tenerifae 160  TE  HD  1  1  1  1  1  1
G. coronaria b8  SP  ME  1  1  4  4  1  1
A. haemotomma r15  MA  CD  1  1  4  4  4  1
G. segetum b13  SP  ME  1  4  4  4  1  5
A. dissectum r13  MA  SZ  1  4  4  4  4  1
A. dissectum r19  MA  SZ  1  4  4  4  4  1
A. pinnatifidum subsp. montanum r25  MA  LF  1  4  4  4  4  1
A. pinnatifidum subsp. pinnatifidum r38  MA  LF  1  4  4  4  4  1
A. pinnatifidum subsp. succulentum r3  MA  CD  1  4  4  4  4  1
A. thalassophilum m1  SE  CD  1  4  4  4  4  1
A. adauctum subsp. erythrocarpon 43  EH  LF  2  2  2  2  2  2
A. adauctum subsp. erythrocarpon 44  EH  LF  2  2  2  2  2  2
A. adauctum subsp. palmensis 58  LP  LF  2  2  2  2  2  2
A. adauctum subsp. palmensis 62  LP  LF  2  2  2  2  2  2
A. hierrense 38  EH  CD  2  2  2  2  2  2
A. hierrense 47  EH  SZ  2  2  2  2  2  2
A. sventenii r19a  EH  SZ  2  2  2  2  2  2
A. sventenii r19b  EH  SZ  2  2  2  2  2  2
A. broussonetii subsp. gomerensis 110  LG  LF  2  2  2  2  2  4
A. broussonetii subsp. gomerensis 112  LG  LF  2  2  2  2  2  4
A. callichrysum 95  LG  SZ  2  2  2  2  2  4
A. callichrysum 97  LG  SZ  2  2  2  2  2  4
A. haouarytheum 56  LP  PF  2  2  2  2  2  4
A. haouarytheum 57  LP  SZ  2  2  2  2  2  4
A. maderense 775  LA  SZ  2  2  2  2  2  4
A. maderense 777  LA  SZ  2  2  2  2  2  4
A. webbii 49  LP  SZ  2  2  2  2  2  4
A. webbii 50  LP  LF  2  2  2  2  2  4
A. winterii a2  FU  SZ  2  2  2  2  2  4
A. broussonetii subsp. broussonetii 157  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 494  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 552  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 664  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 674  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 719  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 749  TE  LF  3  3  3  3  3  3
A. broussonetii subsp. broussonetii 768  TE  LF  3  3  3  3  3  3
A. frutescens subsp. canariae 319  GC  CD  4  5  5  5  5  5
A. frutescens subsp. canariae 320  GC  CD  4  5  5  5  5  5
A. frutescens subsp. foeniculaceum 107  LG  CD  4  5  5  5  5  6
A. frutescens subsp. foeniculaceum 116  LG  CD  4  5  5  5  5  6
A. frutescens subsp. frutescens 567  TE  CD  4  5  5  5  5  6
A. frutescens subsp. frutescens 585  TE  CD  4  5  5  5  5  6
A. frutescens subsp. frutescens 611  TE  CD  4  5  5  5  5  6
A. frutescens subsp. frutescens 620  TE  CD  4  5  5  5  5  6
A. frutescens subsp. gracilescens 177  TE  CD  4  5  5  5  5  6
A. frutescens subsp. gracilescens 179  TE  CD  4  5  5  5  5  6
A. frutescens subsp. parviflorum 101  LG  SZ  4  5  5  5  5  6
A. frutescens subsp. parviflorum 92  LG  SZ  4  5  5  5  5  6
A. frutescens subsp. succulentum 229  TE  CD  4  5  5  5  5  6
A. frutescens subsp. succulentum 234  TE  CD  4  5  5  5  5  6
A. frutescens subsp. succulentum 244  TE  CD  4  5  5  5  5  6
A. gracile 169  TE  SZ  4  5  5  5  5  6
A. gracile 172  TE  SZ  4  5  5  5  5  6
A. vincentii 123  TE  PF  4  5  5  5  5  6
A. vincentii 125  TE  PF  4  5  5  5  5  6


Appendix Table C.4 - AICc-ranked substitution models for each dataset with differing clustering thresholds (c) and minimum sample numbers (m).

c   m   AICc rank  model       K   lnL           score        delta     weight
80  30  1          TVM+I+G     9   -559153.8608  1118623.926  0         0.6419
        2          GTR+I+G     10  -559153.4433  1118625.094  1.1675    0.3581
        3          TIM3+I+G    8   -559270.2204  1118854.643  230.7165  0
        4          TPM3uf+I+G  7   -559273.5908  1118859.381  235.4548  0
        5          TIM2+I+G    8   -559290.2902  1118894.783  270.8561  0
80  38  1          TVM+I+G     9   -382359.3931  765035.0909  0         0.7059
        2          GTR+I+G     10  -382359.2668  765036.8421  1.7512    0.2941
        3          TPM2uf+I+G  7   -382448.5323  765209.3617  174.2708  0
        4          TIM2+I+G    8   -382448.5143  765211.3296  176.2387  0
        5          TPM3uf+I+G  7   -382460.5079  765233.3129  198.222   0
85  30  1          TVM+I+G     9   -634468.6024  1269253.205  0         0.727
        2          GTR+I+G     10  -634468.5821  1269255.164  1.9593    0.273
        3          TIM2+I+G    8   -634604.8262  1269523.652  270.4475  0
        4          TPM2uf+I+G  7   -634606.3281  1269524.656  271.4513  0
        5          TIM3+I+G    8   -634631.1427  1269576.285  323.0806  0
85  38  1          TVM+I+G     9   -428582.3471  857480.6942  0         0.7222
        2          GTR+I+G     10  -428582.3023  857482.6047  1.9105    0.2778
        3          TPM2uf+I+G  7   -428662.0549  857636.1097  155.4156  0
        4          TIM2+I+G    8   -428661.619   857637.2381  156.5439  0
        5          TPM3uf+I+G  7   -428702.6987  857717.3973  236.7031  0
90  30  1          TVM+I+G     9   -746839.3624  1493994.876  0         0.7282
        2          GTR+I+G     10  -746839.347   1493996.847  1.971     0.2718
        3          TPM1uf+I+G  7   -747014.2038  1494340.555  345.679   0
        4          TIM1+I+G    8   -747013.7833  1494341.716  346.8398  0
        5          TIM2+I+G    8   -747028.6058  1494371.361  376.4849  0
90  38  1          TVM+I+G     9   -490633.4506  981583.1345  0         0.5986
        2          GTR+I+G     10  -490632.849   981583.9341  0.7996    0.4014
        3          TPM2uf+I+G  7   -490736.6667  981785.5608  202.4263  0
        4          TIM2+I+G    8   -490736.0665  981786.3634  203.2288  0
        5          TPM1uf+I+G  7   -490756.3231  981824.8737  241.7391  0
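The weight column in Appendix Table C.4 is consistent with standard Akaike weights, w_i = exp(−Δ_i/2) / Σ_j exp(−Δ_j/2), computed from the delta column. The sketch below shows this calculation for the five listed models of the c80-m30 dataset. The function name is hypothetical, and restricting the sum to the five listed models is an assumption (any unlisted models would have negligible weight at these delta values).

```python
import math


def akaike_weights(deltas):
    """Convert AICc differences (delta) into Akaike weights."""
    relative_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(relative_likelihoods)
    return [r / total for r in relative_likelihoods]


# Delta values for the c80-m30 dataset, taken from Appendix Table C.4.
deltas_c80_m30 = [0, 1.1675, 230.7165, 235.4548, 270.8561]
weights = akaike_weights(deltas_c80_m30)
```

Rounded to four decimal places, the first two weights are approximately 0.6419 and 0.3581, in line with the tabulated values for TVM+I+G and GTR+I+G.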


Appendix Figure C.1 - Clades within islands used for D-statistics where colour signifies the island of origin and the clade labels represent the number of clades identified.


Appendix Figure C.2 - Number of reads for each sample following the filtering step of the ipyrad pipeline. The dashed line at 0.5 × 10^6 reads passed filter represents an arbitrary cut-off for the removal of samples with poor quality data.

Appendix Figure C.3 - Principal Component Analysis (PCA) for each assembled dataset. These include (A) cluster threshold 80 % and minimum sample number 30, (B) cluster threshold 80 % and minimum sample number 38, (C) cluster threshold 85 % and minimum sample number 30, (D) cluster threshold 85 % and minimum sample number 38, (E) cluster threshold 90 % and minimum sample number 30 and (F) cluster threshold 90 % and minimum sample number 38.


Appendix Figure C.4 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 80 % and minimum sample number of 30.


Appendix Figure C.5 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 80 % and minimum sample number of 38.


Appendix Figure C.6 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 85 % and minimum sample number of 30.


Appendix Figure C.7 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 85 % and minimum sample number of 38.


Appendix Figure C.8 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 90 % and minimum sample number of 30.


Appendix Figure C.9 - Delta K (A) and STRUCTURE plots (B) for K one to 10 for the assembled dataset based on a cluster threshold of 90 % and minimum sample number of 38.
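The Delta K panels in Appendix Figures C.4 to C.9 can be related to a simple calculation. The sketch below assumes that Delta K refers to the widely used statistic of Evanno et al. (2005), computed here as |mean L(K+1) − 2 mean L(K) + mean L(K−1)| divided by the standard deviation of L(K) across replicate runs. The captions do not spell out the formula, so this is an assumption, and the function name and log-likelihood values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K from replicate ln P(D) values.

    lnp_by_k maps each K to a list of ln P(D) values, one per replicate run.
    Delta K is only defined for K values that have both neighbours (K-1, K+1).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])
            result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Hypothetical ln P(D) values from two replicate runs at K = 1, 2 and 3.
example = {1: [-100.0, -102.0], 2: [-50.0, -52.0], 3: [-48.0, -50.0]}
dk = delta_k(example)
```

Because the statistic needs both neighbouring K values, it cannot be evaluated at the smallest or largest K tested.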


Appendix Figure C.10 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 80 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.11 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 80 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.12 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 85 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.13 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 85 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.14 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.15 - Maximum likelihood tree generated using RAxML-NG based on a cluster threshold of 90 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Bootstrap values ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.


Appendix Figure C.16 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 80 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix Figure C.17 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 80 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix Figure C.18 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 85 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix Figure C.19 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 85 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix Figure C.20 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 90 % and minimum sample number of 30. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probabilities ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix Figure C.21 - Bayesian inference tree generated using MrBayes based on a cluster threshold of 90 % and minimum sample number of 38. Branch lengths are shown except for outgroup taxa from Glebionis which were cut off to equal the longest branch length in Argyranthemum. Posterior probability ≥ 70 are shown on the nodes and tip labels are coloured by island. Clades A-G are discussed in text.

Appendix D

Supplementary information for chapter 5

Appendix Table D.1 - Input data used for morphological analysis. Morphological characters include leaf attachment (leaf.att.), primary lobe length (lobe.len), primary lobe width (lobe.w), primary lobe length to width ratio (lobe.lw.ratio), leaf and lamina width ratio (leaf.lam.ratio), capitula width (cap.w), ray cypselae colour (ray.cyp.colour), ray cypselae arrangement (ray.cyp.arr.), ray cypselae wings (ray.cyp.wings) and disc wing number (disc.wing.no). Taxa are abbreviated as brbr (A. broussonetii subsp. broussonetii), brgo (A. broussonetii subsp. gomerensis) and ca (A. callichrysum).

| voucher | taxon | leaf.att. | lobe.len | lobe.w | lobe.lw.ratio | leaf.lam.ratio | cap.w | ray.cyp.colour | ray.cyp.arr. | ray.cyp.wings | disc.wing.no |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bramwell.Humphries | brbr | sessile | 2.50 | 1.10 | 2.27 | 7.20 | 1.3 | chestnut-brown | solitary | present | two |
| Bramwell.Humphries.3382 | brbr | sessile | 3.70 | 1.50 | 2.47 | 12.25 | 1.9 | yellow-brown | solitary | present | two |
| Broussonet | brbr | sessile | 3.20 | 1.80 | 1.78 | 16.75 | 1.5 | NA | NA | NA | NA |
| Jarvis.467 | brbr | sessile | 4.40 | 1.70 | 2.59 | 9.83 | 1.5 | chestnut-brown | solitary | absent | NA |
| Jarvis.Bramwell.544 | brbr | sessile | 4.50 | 1.90 | 2.37 | 14.40 | 1.6 | NA | NA | NA | NA |
| White.et.al.155 | brbr | sessile | 3.30 | 1.10 | 3.00 | 11.50 | 1.6 | yellow-brown | solitary | absent | two |
| White.et.al.157 | brbr | sessile | 2.80 | 1.40 | 2.00 | 7.83 | 1.8 | yellow-brown | solitary | present | two |
| White.et.al.683 | brbr | sessile | 2.00 | 0.60 | 3.33 | 10.33 | 1.4 | chestnut-brown | solitary | present | two |
| White.et.al.686 | brbr | sessile | 2.90 | 0.90 | 3.22 | 9.25 | 1.9 | black | solitary | absent | two |
| White.et.al.69 | brbr | sessile | 3.70 | 1.50 | 2.47 | 11.60 | 1.5 | yellow-brown | solitary | present | two |
| White.et.al.70 | brbr | sessile | 3.70 | 1.50 | 2.47 | 8.43 | 1.7 | yellow-brown | solitary | present | two |
| White.et.al.71 | brbr | sessile | 3.20 | 1.20 | 2.67 | 8.40 | 1.9 | chestnut-brown | solitary | present | two |
| White.et.al.726 | brbr | sessile | 2.60 | 0.90 | 2.89 | 9.50 | 1.5 | yellow-brown | solitary | present | two |
| White.et.al.79 | brbr | sessile | 3.20 | 0.80 | 4.00 | 13.00 | 1.7 | yellow-brown | solitary | present | two |
| White.et.al.84 | brbr | sessile | 4.10 | 2.10 | 1.95 | 14.40 | 1.9 | yellow-brown | solitary | present | two |
| White.et.al.87 | brbr | sessile | 2.40 | 0.90 | 2.67 | 8.25 | 2 | yellow-brown | solitary | present | two |
| Bourgeau.247 | brgo | petiolate | 1.80 | 0.60 | 3.00 | 7.00 | 1.3 | chestnut-brown | coalesced | present | NA |
| Bramwell.Humphries.3355 | brgo | petiolate | 3.20 | 1.20 | 2.67 | 8.67 | 1.3 | yellow-brown | both | present | NA |
| Jarvis.603 | brgo | petiolate | 2.00 | 0.60 | 3.33 | 11.67 | 1.2 | yellow-brown | coalesced | present | one |
| Lowe.133 | brgo | petiolate | 2.80 | 1.00 | 2.80 | 15.00 | 1 | NA | NA | NA | NA |
| White.et.al.100 | brgo | petiolate | 3.50 | 0.80 | 4.38 | 14.50 | 1.90 | NA | NA | NA | NA |
| White.et.al.104 | brgo | petiolate | 5.70 | 1.40 | 4.07 | 14.83 | 1.40 | yellow-brown | both | present | one |
| White.et.al.108 | brgo | petiolate | 3.30 | 1.10 | 3.00 | 11.50 | 1.3 | NA | NA | NA | NA |
| White.et.al.109 | brgo | petiolate | 2.60 | 0.70 | 3.71 | 13.67 | 1.20 | NA | NA | NA | NA |
| White.et.al.110 | brgo | petiolate | 4.30 | 1.20 | 3.58 | 7.00 | 1.5 | yellow-brown | both | present | one |
| White.et.al.111 | brgo | petiolate | 2.70 | 1.00 | 2.70 | 5.29 | 1.4 | chestnut-brown | both | present | one |
| White.et.al.112 | brgo | petiolate | 3.30 | 0.90 | 3.67 | 19.33 | 1.2 | yellow-brown | both | present | one |
| White.et.al.113 | brgo | petiolate | 2.90 | 0.90 | 3.22 | 10.25 | 1.1 | NA | NA | NA | NA |
| White.et.al.114 | brgo | petiolate | 3.00 | 0.90 | 3.33 | 7.60 | 1.2 | yellow-brown | both | absent | one |
| White.et.al.115 | brgo | petiolate | 4.30 | 1.20 | 3.58 | 12.20 | 1.4 | yellow-brown | both | present | one |
| Bramwell.Humphries.3174 | ca | petiolate | 3.30 | 0.70 | 4.71 | 21.00 | 0.8 | yellow-brown | coalesced | present | one |
| White.et.al.94 | ca | petiolate | 3.50 | 0.80 | 4.38 | 17.00 | 1.2 | yellow-brown | coalesced | present | one |
| White.et.al.95 | ca | petiolate | 3.10 | 0.70 | 4.43 | 20.50 | 1.5 | yellow-brown | both | present | one |
| White.et.al.96 | ca | petiolate | 2.20 | 0.40 | 5.50 | 16.00 | 1.2 | brown-purple | both | present | one |
| White.et.al.97 | ca | petiolate | 4.20 | 0.70 | 6.00 | 31.00 | 1.2 | yellow-brown | both | present | one |
| White.et.al.98 | ca | petiolate | 2.90 | 0.70 | 4.14 | 14.33 | 1 | chestnut-brown | both | present | one |
| White.et.al.99 | ca | petiolate | 2.50 | 0.50 | 5.00 | 15.33 | 1.4 | yellow-brown | both | present | one |

Appendix E

Supplementary information for chapter 6

Appendix Table E.1 - Seed accessions used to grow plants for RNA extraction and transcriptome sequencing. The collection reference, population letter, locality, latitude, longitude and representative voucher material deposited at the Natural History Museum London (BM) are provided.

| Taxon | Ref | Pop. | Locality | Latitude (x) | Longitude (y) | Voucher¹ |
| --- | --- | --- | --- | --- | --- | --- |
| A. broussonetii subsp. broussonetii | 742 | A | Barranco de Valle Crispin | 28.5308 | -16.2427 | BM000828668* |
| A. broussonetii subsp. broussonetii | 726 | B | Las Casas de la Cumbre | 28.5352 | -16.2346 | BM000828476 |
| A. broussonetii subsp. broussonetii | 554 | C | Path to Mesa del Sabinal | 28.5587 | -16.1576 | BM000828673* |
| A. broussonetii subsp. broussonetii | 69 | D | La Cumbrilla | 28.5669 | -16.1529 | BM000828683 |
| A. broussonetii subsp. broussonetii | 679 | E | Chamorga | 28.5719 | -16.1559 | BM000828483 |
| A. broussonetii subsp. broussonetii | 157 | F | Roques del Fraile | 28.5522 | -16.2292 | BM000828598 |
| A. sundingii | 181 | G | Valle Crispin | 28.5152 | -16.2359 | BM000828751* |
| A. sundingii | 199 | G | Valle Crispin | 28.5151 | -16.2366 | BM000828750* |
| A. sundingii | 272 | H | Valle Brosque | 28.5198 | -16.2265 | BM000828566 |
| A. sundingii | 276 | H | Valle Brosque | 28.5198 | -16.2264 | BM000828567* |
| A. sundingii | 292 | I | Roque Cubo | 28.5236 | -16.2201 | BM000828561 |
| A. sundingii | 401 | J | Barranco del Cercado de Andrés | 28.5280 | -16.2130 | BM000828516 |
| A. lemsii | 76 | M | Path to Mesa del Sabinal | 28.5584 | -16.1521 | BM000828676 |
| A. lemsii | 73 | M | Path to Mesa del Sabinal | 28.5583 | -16.1527 | BM000828679 |
| A. lemsii | 66 | N | La Cumbrilla | 28.5673 | -16.1510 | BM000828686 |
| A. lemsii | 68 | N | La Cumbrilla | 28.5673 | -16.1510 | BM000828684 |
| A. lemsii | 261 | O | Barranco de Roque Bermejo | 28.5729 | -16.1518 | BM000828724* |
| A. lemsii | 264 | O | Barranco de Roque Bermejo | 28.5729 | -16.1518 | BM000828723* |
| A. frutescens subsp. frutescens | 568 | P | Maria Jiménez | 28.4994 | -16.2280 | BM000828497 |
| A. frutescens subsp. frutescens | 623 | Q | Barranco del Cercado de Andrés | 28.5129 | -16.1964 | BM000828492 |
| A. frutescens subsp. frutescens | 632 | R | Igueste de San Andreas | 28.5281 | -16.1591 | BM000828490 |
| A. frutescens subsp. succulentum | 241 | S | Roque Bermejo | 28.5797 | -16.1351 | BM000828732* |
| A. frutescens subsp. succulentum | 232 | T | Between Almáciga and Roque Bermejo | 28.5823 | -16.1653 | BM000828575* |
| A. frutescens subsp. succulentum | 153 | U | Barranco de Afur | 28.5626 | -16.2528 | BM000828601 |

1 All material was collected by O. White, M. Carine, A. Reyes-Betancort, A. Santos-Guerra & G. Torre

* If a voucher was not collected from the same plant, a barcode of a representative voucher from the same population deposited at the Natural History Museum London (BM) is provided


Appendix Table E.2 - Summary of raw reads, filtered reads and the percentage (pct.) of reads removed by filtering across all samples.

| Sample | Raw | Filtered | Pct. filtered |
| --- | --- | --- | --- |
| bro 1 | 30,256,576 | 28,507,427 | 5.78% |
| bro 2 | 31,969,742 | 29,254,850 | 8.49% |
| bro 3 | 34,096,840 | 32,477,301 | 4.75% |
| bro 4 | 30,006,151 | 28,667,503 | 4.46% |
| bro 5 | 36,165,576 | 34,730,809 | 3.97% |
| bro 6 | 37,264,585 | 35,566,629 | 4.56% |
| sun 7 | 29,134,686 | 27,700,509 | 4.92% |
| sun 8 | 30,820,622 | 29,433,290 | 4.50% |
| sun 9 | 36,162,662 | 34,635,900 | 4.22% |
| sun 10 | 21,466,656 | 20,082,748 | 6.45% |
| sun 11 | 25,043,659 | 23,888,500 | 4.61% |
| sun 12 | 26,905,584 | 25,593,836 | 4.88% |
| lem 13 | 25,410,941 | 24,129,706 | 5.04% |
| lem 14 | 25,932,851 | 24,547,412 | 5.34% |
| lem 15 | 24,870,800 | 23,085,199 | 7.18% |
| lem 16 | 24,590,452 | 23,401,360 | 4.84% |
| lem 17 | 26,815,926 | 25,418,012 | 5.21% |
| lem 18 | 26,447,171 | 25,108,225 | 5.06% |
| fru 19 | 27,171,940 | 25,875,691 | 4.77% |
| fru 20 | 23,509,641 | 22,415,862 | 4.65% |
| fru 21 | 22,054,432 | 20,677,352 | 6.24% |
| fru 22 | 27,377,766 | 25,928,306 | 5.29% |
| fru 23 | 24,270,313 | 22,936,826 | 5.49% |
| fru 24 | 26,452,080 | 25,010,871 | 5.45% |
| mean | 28,091,569 | 26,628,089 | 5.21% |
| total | 674,197,652 | 639,074,124 | |
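The values in the Pct. filtered column of Appendix Table E.2 are consistent with the share of raw reads removed by filtering, rather than the share retained. The minimal sketch below checks this reading against two rows of the table. The function name `pct_removed` is illustrative only and is not part of the original pipeline.

```python
def pct_removed(raw_reads: int, filtered_reads: int) -> float:
    """Percentage of raw reads that did not survive filtering."""
    if raw_reads <= 0:
        raise ValueError("raw_reads must be positive")
    return 100.0 * (raw_reads - filtered_reads) / raw_reads


# Read counts copied from Appendix Table E.2 (samples bro 1 and bro 2).
bro_1 = pct_removed(30_256_576, 28_507_427)
bro_2 = pct_removed(31_969_742, 29_254_850)

print(f"bro 1: {bro_1:.2f}%")
print(f"bro 2: {bro_2:.2f}%")
```

Under this reading, roughly 95 % of the raw reads were retained in a typical sample.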

Appendix Table E.3 - Summary of input, selected and discarded reads during normalisation within and across species.

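| Normalisation | Sample | Input reads | Selected | Selected (%) | Discarded | Discarded (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Within sp. | bro | 189,204,519 | 9,439,787 | 4.99% | 816,685 | 0.43% |
| Within sp. | fru | 142,844,908 | 9,435,945 | 6.61% | 702,352 | 0.49% |
| Within sp. | lem | 145,689,914 | 9,069,184 | 6.22% | 689,029 | 0.47% |
| Within sp. | sun | 161,334,783 | 9,058,184 | 5.61% | 660,539 | 0.41% |
| Within sp. | mean | 159,768,531 | 9,250,775 | 5.86% | 717,151 | 0.45% |
| Across sp. | total | 37,003,100 | 19,808,898 | 53.53% | 96,398 | 0.26% |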


Appendix Table E.4 - Summary statistics for the interspecific and species-specific assemblies generated using the script TrinityStats.pl. Taxa are abbreviated as bro (A. broussonetii), fru (A. frutescens), lem (A. lemsii) and sun (A. sundingii). Mean statistics across the species-specific assemblies are also presented.

| Statistic | Interspecific assembly | bro | fru | lem | sun | Mean (species-specific) |
| --- | --- | --- | --- | --- | --- | --- |
| Counts of transcripts | | | | | | |
| Total trinity 'genes' | 512,143 | 233,152 | 217,956 | 205,469 | 217,470 | 218,512 |
| Total trinity transcripts | 1,083,354 | 461,625 | 445,389 | 419,735 | 421,520 | 437,067 |
| Stats based on all transcript contigs | | | | | | |
| Contig N10 | 2154 | 2697 | 2539 | 2677 | 2632 | 2,636 |
| Contig N20 | 1594 | 2041 | 1927 | 2014 | 1983 | 1,991 |
| Contig N30 | 1233 | 1640 | 1552 | 1623 | 1596 | 1,603 |
| Contig N40 | 970 | 1330 | 1256 | 1316 | 1294 | 1,299 |
| Contig N50 | 763 | 1067 | 1009 | 1057 | 1034 | 1,042 |
| Median contig length | 394 | 461 | 461 | 472 | 446 | 460 |
| Average contig | 585.11 | 719.69 | 700.39 | 724.17 | 700.28 | 711 |
| Total assembled bases | 633,876,284 | 332,226,317 | 311,948,118 | 303,959,590 | 295,181,718 | 310,828,936 |
| Stats based on longest isoform per gene | | | | | | |
| Contig N10 | 2186 | 2719 | 2617 | 2720 | 2689 | 2,686 |
| Contig N20 | 1583 | 2026 | 1984 | 2037 | 2009 | 2,014 |
| Contig N30 | 1185 | 1611 | 1588 | 1630 | 1601 | 1,608 |
| Contig N40 | 895 | 1267 | 1254 | 1297 | 1258 | 1,269 |
| Contig N50 | 679 | 965 | 967 | 1001 | 959 | 973 |
| Median contig length | 341 | 370 | 387 | 397 | 360 | 379 |
| Average contig | 532.13 | 634.52 | 645.57 | 663.22 | 625.35 | 642 |
| Total assembled bases | 272,524,505 | 147,940,758 | 140,705,789 | 136,271,889 | 135,995,494 | 140,228,483 |
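As a reminder of how the contig Nx statistics in Appendix Table E.4 are conventionally defined, the sketch below computes Nx as the contig length at which contigs of that length or longer contain at least x % of the assembled bases. It is an illustrative reimplementation under that standard definition, not the TrinityStats.pl code, and the function name `contig_nx` and the toy contig lengths are assumptions made for the example.

```python
def contig_nx(lengths, x=50):
    """Return the Nx contig length for a list of contig lengths.

    Contigs are visited from longest to shortest; Nx is the length of the
    contig at which the running total first reaches x % of all bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    threshold = sum(lengths) * x / 100
    running_total = 0
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= threshold:
            return length


# Toy assembly (hypothetical lengths, 1,500 bases in total).
toy_contigs = [100, 200, 300, 400, 500]
print(contig_nx(toy_contigs, 50))  # 500 + 400 = 900 >= 750, so N50 is 400
print(contig_nx(toy_contigs, 10))  # 500 >= 150, so N10 is 500
```

Because shorter contigs contribute fewer bases, Nx decreases as x increases, which matches the N10 to N50 trend in the table.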

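Appendix Table E.5 - Summary statistics for OrthoFinder analyses with varying values for the inflation parameter (I). OG, orthogroup; G50, the number of genes in the orthogroup such that 50 % of genes are in orthogroups of that size or larger; O50, the smallest number of orthogroups such that 50 % of genes are in orthogroups of that size or larger.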

| Statistic | I1.5 | I1.7 | I1.9 | I2.1 | I2.3 | I2.5 | I2.7 | I2.9 | I3.1 | I3.3 | I3.5 | I3.7 | I3.9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| No. of genes | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 | 557,064 |
| No. of genes in OGs | 473,563 | 473,558 | 473,550 | 473,511 | 473,377 | 473,170 | 472,858 | 472,359 | 471,728 | 471,027 | 470,322 | 469,609 | 468,885 |
| No. of unassigned genes | 83,501 | 83,506 | 83,514 | 83,553 | 83,687 | 83,894 | 84,206 | 84,705 | 85,336 | 86,037 | 86,742 | 87,455 | 88,179 |
| % genes in OGs | 85 | 85 | 85 | 85 | 85 | 84.9 | 84.9 | 84.8 | 84.7 | 84.6 | 84.4 | 84.3 | 84.2 |
| % unassigned genes | 15 | 15 | 15 | 15 | 15 | 15.1 | 15.1 | 15.2 | 15.3 | 15.4 | 15.6 | 15.7 | 15.8 |
| No. OGs | 44,019 | 50,397 | 55,097 | 58,996 | 62,279 | 65,180 | 67,776 | 70,036 | 72,039 | 73,833 | 75,304 | 76,819 | 78,057 |
| No. sp.-specific OGs | 379 | 738 | 1,156 | 1,586 | 2,113 | 2,651 | 3,206 | 3,792 | 4,359 | 4,819 | 5,236 | 5,705 | 6,063 |
| No. genes in sp. specific OGs | 2,857 | 4,427 | 5,972 | 7,440 | 9,089 | 10,759 | 12,255 | 13,797 | 15,302 | 16,422 | 17,541 | 18,791 | 19,767 |
| % genes in sp. specific OGs | 0.5 | 0.8 | 1.1 | 1.3 | 1.6 | 1.9 | 2.2 | 2.5 | 2.7 | 2.9 | 3.1 | 3.4 | 3.5 |
| Mean orthogroup size | 10.8 | 9.4 | 8.6 | 8 | 7.6 | 7.3 | 7 | 6.7 | 6.5 | 6.4 | 6.2 | 6.1 | 6 |
| Median orthogroup size | 6 | 5 | 5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 3 | 3 |
| G50 (assigned genes) | 18 | 16 | 14 | 13 | 13 | 12 | 12 | 11 | 11 | 11 | 10 | 10 | 10 |
| G50 (all genes) | 15 | 13 | 12 | 11 | 10 | 10 | 9 | 9 | 9 | 8 | 8 | 8 | 7 |
| O50 (assigned genes) | 6,291 | 7,698 | 8,740 | 9,557 | 10,218 | 10,802 | 11,311 | 11,746 | 12,122 | 12,459 | 12,744 | 13,061 | 13,320 |
| O50 (all genes) | 8,841 | 10,686 | 12,004 | 13,056 | 13,911 | 14,656 | 15,341 | 15,948 | 16,477 | 17,022 | 17,486 | 17,958 | 18,437 |
| No. of OGs with all sp. present | 9,837 | 9,820 | 9,763 | 9,720 | 9,672 | 9,614 | 9,528 | 9,455 | 9,355 | 9,261 | 9,175 | 9,081 | 8,966 |
| No. of single-copy OGs | 362 | 389 | 405 | 420 | 429 | 434 | 439 | 441 | 446 | 448 | 448 | 452 | 456 |
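The G50 and O50 rows of Appendix Table E.5 can be read by analogy with contig N50 and L50. The sketch below assumes the definitions given in the OrthoFinder documentation: G50 is the size of the orthogroup at which orthogroups of that size or larger contain half of the genes, and O50 is the number of orthogroups needed to reach that point. It is a toy illustration, not OrthoFinder's own code; the function name `g50_o50` and the example orthogroup sizes are hypothetical.

```python
def g50_o50(orthogroup_sizes):
    """Return (G50, O50) for a list of orthogroup sizes (genes per orthogroup)."""
    sizes = sorted(orthogroup_sizes, reverse=True)
    half_of_genes = sum(sizes) / 2
    running_total = 0
    for n_orthogroups, size in enumerate(sizes, start=1):
        running_total += size
        if running_total >= half_of_genes:
            # size is G50; the number of orthogroups counted so far is O50
            return size, n_orthogroups
    raise ValueError("orthogroup_sizes must not be empty")


# Hypothetical orthogroups holding 25 genes in total.
g50, o50 = g50_o50([10, 6, 4, 3, 2])
print(g50, o50)  # the two largest orthogroups hold 16 of 25 genes
```

In the table, G50 falls and O50 rises as the inflation parameter increases, which is what this definition predicts when orthogroups are split into smaller clusters.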


Appendix Table E.6 - Summary statistics used for comparison of OrthoFinder runs with varying values of the inflation parameter

| Inflation parameter | Orthogroups | Pct. transcripts in orthogroups | Mono. orthogroups | Pct. mono. orthogroups | Mono. Argyranthemum orthogroups |
| --- | --- | --- | --- | --- | --- |
| I1.5 | 44,019 | 85.0% | 38,567 | 87.6% | 36,603 |
| I1.7 | 50,397 | 85.0% | 45,146 | 89.6% | 42,739 |
| I1.9 | 55,097 | 85.0% | 50,029 | 90.8% | 47,268 |
| I2.1 | 58,996 | 85.0% | 54,051 | 91.6% | 51,003 |
| I2.3 | 62,279 | 85.0% | 57,440 | 92.2% | 54,094 |
| I2.5 | 65,180 | 84.9% | 60,462 | 92.8% | 56,818 |
| I2.7 | 67,776 | 84.9% | 63,164 | 93.2% | 59,255 |
| I2.9 | 70,036 | 84.8% | 65,515 | 93.5% | 61,346 |
| I3.1 | 72,039 | 84.7% | 67,589 | 93.8% | 63,172 |
| I3.3 | 73,833 | 84.6% | 69,437 | 94.0% | 64,819 |
| I3.5 | 75,304 | 84.4% | 70,979 | 94.3% | 66,148 |
| I3.7 | 76,819 | 84.3% | 72,548 | 94.4% | 67,499 |
| I3.9 | 78,057 | 84.2% | 73,843 | 94.6% | 68,608 |


Appendix Table E.7 - Full list of over-represented GO terms identified for differentially expressed transcripts between the parent species across pipelines 1, 2, 3 and 5.

| Gene ontology | Description |
| --- | --- |
| GO:0044255 | cellular lipid metabolic process |
| GO:0044106 | cellular amine metabolic process |
| GO:0051704 | multi-organism process |
| GO:0043436 | oxoacid metabolic process |
| GO:0044042 | glucan metabolic process |
| GO:0009416 | response to light stimulus |
| GO:0006950 | response to stress |
| GO:0044283 | small molecule biosynthetic process |
| GO:0016020 | membrane |
| GO:0016053 | organic acid biosynthetic process |
| GO:0030005 | cellular di-, tri-valent inorganic cation homeostasis |
| GO:0006519 | cellular amino acid and derivative metabolic process |
| GO:0006520 | cellular amino acid metabolic process |
| GO:0046394 | carboxylic acid biosynthetic process |
| GO:0055080 | cation homeostasis |
| GO:0044435 | plastid part |
| GO:0019752 | carboxylic acid metabolic process |
| GO:0042180 | cellular ketone metabolic process |
| GO:0032787 | monocarboxylic acid metabolic process |
| GO:0044281 | small molecule metabolic process |
| GO:0009628 | response to abiotic stimulus |
| GO:0009314 | response to radiation |
| GO:0010035 | response to inorganic substance |
| GO:0006073 | cellular glucan metabolic process |
| GO:0005773 | vacuole |
| GO:0006082 | organic acid metabolic process |
| GO:0030003 | cellular cation homeostasis |
| GO:0050896 | response to stimulus |
| GO:0009266 | response to temperature stimulus |
| GO:0030076 | light-harvesting complex |


Appendix Table E.8 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci between the parental species, A. broussonetii and A. frutescens. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0050896 | P | response to stimulus | 221 | 1292 | 4E-12 | 1.8E-09 |
| GO:0042221 | P | response to chemical stimulus | 114 | 640 | 2.5E-08 | 5.4E-06 |
| GO:0044281 | P | small molecule metabolic process | 124 | 723 | 6E-08 | 8.7E-06 |
| GO:0006519 | P | cellular amino acid and derivative metabolic process | 59 | 277 | 1.1E-07 | 0.000012 |
| GO:0009611 | P | response to wounding | 22 | 65 | 3.3E-07 | 0.000029 |
| GO:0044283 | P | small molecule biosynthetic process | 63 | 317 | 5.5E-07 | 0.000031 |
| GO:0055080 | P | cation homeostasis | 17 | 43 | 5.6E-07 | 0.000031 |
| GO:0009628 | P | response to abiotic stimulus | 99 | 571 | 5.9E-07 | 0.000031 |
| GO:0043436 | P | oxoacid metabolic process | 73 | 388 | 7.1E-07 | 0.000031 |
| GO:0019752 | P | carboxylic acid metabolic process | 73 | 388 | 7.1E-07 | 0.000031 |
| GO:0006082 | P | organic acid metabolic process | 73 | 389 | 7.8E-07 | 0.000031 |
| GO:0042180 | P | cellular ketone metabolic process | 73 | 400 | 2.2E-06 | 0.00007 |
| GO:0070887 | P | cellular response to chemical stimulus | 38 | 165 | 2.3E-06 | 0.00007 |
| GO:0050801 | P | ion homeostasis | 17 | 47 | 2.4E-06 | 0.00007 |
| GO:0046394 | P | carboxylic acid biosynthetic process | 42 | 191 | 2.6E-06 | 0.00007 |
| GO:0016053 | P | organic acid biosynthetic process | 42 | 191 | 2.6E-06 | 0.00007 |
| GO:0003824 | F | catalytic activity | 510 | 3823 | 5E-07 | 0.000091 |
| GO:0005215 | F | transporter activity | 106 | 628 | 9.2E-07 | 0.000091 |
| GO:0071310 | P | cellular response to organic substance | 35 | 150 | 4.1E-06 | 0.00011 |
| GO:0006950 | P | response to stress | 115 | 722 | 5.6E-06 | 0.00013 |
| GO:0051707 | P | response to other organism | 42 | 197 | 5.8E-06 | 0.00013 |
| GO:0048878 | P | chemical homeostasis | 19 | 60 | 6.2E-06 | 0.00013 |
| GO:0051704 | P | multi-organism process | 52 | 265 | 6.8E-06 | 0.00014 |
| GO:0009607 | P | response to biotic stimulus | 43 | 209 | 1.2E-05 | 0.00023 |
| GO:0030003 | P | cellular cation homeostasis | 14 | 39 | 0.00002 | 0.00038 |
| GO:0006520 | P | cellular amino acid metabolic process | 39 | 188 | 2.3E-05 | 0.00041 |
| GO:0009308 | P | amine metabolic process | 43 | 215 | 2.4E-05 | 0.00041 |
| GO:0044106 | P | cellular amine metabolic process | 40 | 196 | 2.7E-05 | 0.00046 |
| GO:0015020 | F | glucuronosyltransferase activity | 6 | 7 | 7.9E-06 | 0.00052 |
| GO:0034641 | P | cellular nitrogen compound metabolic process | 57 | 316 | 3.4E-05 | 0.00054 |
| GO:0010033 | P | response to organic substance | 66 | 382 | 3.6E-05 | 0.00056 |
| GO:0071495 | P | cellular response to endogenous stimulus | 27 | 115 | 4.2E-05 | 0.00063 |
| GO:0022857 | F | transmembrane transporter activity | 80 | 472 | 1.3E-05 | 0.00065 |
| GO:0019825 | F | oxygen binding | 18 | 59 | 1.9E-05 | 0.00075 |
| GO:0006873 | P | cellular ion homeostasis | 14 | 42 | 5.2E-05 | 0.00076 |
| GO:0019748 | P | secondary metabolic process | 33 | 155 | 5.4E-05 | 0.00076 |
| GO:0005773 | C | vacuole | 36 | 161 | 8.4E-06 | 0.00079 |
| GO:0042626 | F | ATPase activity, coupled to transmembrane movement of substances | 26 | 108 | 3.6E-05 | 0.00089 |
| GO:0043492 | F | ATPase activity, coupled to movement of substances | 26 | 108 | 3.6E-05 | 0.00089 |
| GO:0016820 | F | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 26 | 108 | 3.6E-05 | 0.00089 |
| GO:0055082 | P | cellular chemical homeostasis | 14 | 43 | 0.00007 | 0.00095 |
| GO:0009416 | P | response to light stimulus | 45 | 239 | 7.2E-05 | 0.00095 |
| GO:0007242 | P | intracellular signaling cascade | 39 | 199 | 8.5E-05 | 0.0011 |
| GO:0055066 | P | di-, tri-valent inorganic cation homeostasis | 11 | 29 | 8.6E-05 | 0.0011 |
| GO:0009605 | P | response to external stimulus | 32 | 154 | 0.00011 | 0.0013 |
| GO:0042493 | P | response to drug | 11 | 30 | 0.00012 | 0.0014 |
| GO:0006855 | P | multidrug transport | 11 | 30 | 0.00012 | 0.0014 |
| GO:0015893 | P | drug transport | 11 | 30 | 0.00012 | 0.0014 |
| GO:0006875 | P | cellular metal ion homeostasis | 10 | 26 | 0.00016 | 0.0017 |
| GO:0055065 | P | metal ion homeostasis | 10 | 26 | 0.00016 | 0.0017 |
| GO:0008652 | P | cellular amino acid biosynthetic process | 21 | 88 | 0.00022 | 0.0022 |
| GO:0009719 | P | response to endogenous stimulus | 51 | 294 | 0.00022 | 0.0022 |
| GO:0022804 | F | active transmembrane transporter activity | 52 | 293 | 0.00011 | 0.0023 |
| GO:0009314 | P | response to radiation | 45 | 251 | 0.00023 | 0.0023 |
| GO:0009309 | P | amine biosynthetic process | 22 | 95 | 0.00025 | 0.0024 |
| GO:0016740 | F | transferase activity | 185 | 1346 | 0.00014 | 0.0027 |
| GO:0042623 | F | ATPase activity, coupled | 41 | 219 | 0.00016 | 0.0029 |
| GO:0042592 | P | homeostatic process | 22 | 97 | 0.00034 | 0.0032 |
| GO:0044271 | P | cellular nitrogen compound biosynthetic process | 34 | 182 | 0.00056 | 0.0052 |
| GO:0006631 | P | fatty acid metabolic process | 24 | 114 | 0.0006 | 0.0055 |
| GO:0009081 | P | branched chain family amino acid metabolic process | 7 | 16 | 0.00062 | 0.0056 |
| GO:0044437 | C | vacuolar part | 14 | 48 | 0.00026 | 0.006 |
| GO:0005774 | C | vacuolar membrane | 14 | 48 | 0.00026 | 0.006 |
| GO:0010319 | C | stromule | 9 | 23 | 0.00029 | 0.006 |
| GO:0016020 | C | membrane | 219 | 1654 | 0.00032 | 0.006 |
| GO:0005886 | C | plasma membrane | 101 | 689 | 0.00039 | 0.006 |
| GO:0051716 | P | cellular response to stimulus | 55 | 339 | 0.00068 | 0.006 |
| GO:0016746 | F | transferase activity, transferring acyl groups | 25 | 118 | 0.00042 | 0.0061 |
| GO:0015103 | F | inorganic anion transmembrane transporter activity | 11 | 34 | 0.00044 | 0.0061 |
| GO:0015399 | F | primary active transmembrane transporter activity | 27 | 132 | 0.00046 | 0.0061 |
| GO:0015405 | F | P-P-bond-hydrolysis-driven transmembrane transporter activity | 27 | 132 | 0.00046 | 0.0061 |
| GO:0009266 | P | response to temperature stimulus | 34 | 185 | 0.00076 | 0.0065 |
| GO:0044255 | P | cellular lipid metabolic process | 41 | 236 | 0.00078 | 0.0066 |
| GO:0007165 | P | signal transduction | 46 | 275 | 0.00091 | 0.0075 |
| GO:0019725 | P | cellular homeostasis | 17 | 72 | 0.00093 | 0.0076 |
| GO:0009698 | P | phenylpropanoid metabolic process | 16 | 66 | 0.00096 | 0.0076 |
| GO:0016887 | F | ATPase activity | 48 | 286 | 0.00067 | 0.0079 |
| GO:0016829 | F | lyase activity | 28 | 142 | 0.00068 | 0.0079 |
| GO:0022892 | F | substrate-specific transporter activity | 66 | 425 | 0.00075 | 0.0082 |
| GO:0005976 | P | polysaccharide metabolic process | 19 | 86 | 0.0011 | 0.0087 |
| GO:0071103 | P | DNA conformation change | 9 | 27 | 0.0011 | 0.0087 |
| GO:0019438 | P | aromatic compound biosynthetic process | 21 | 99 | 0.0011 | 0.0087 |
| GO:0023046 | P | signaling process | 46 | 279 | 0.0012 | 0.0087 |
| GO:0032870 | P | cellular response to hormone stimulus | 22 | 106 | 0.0012 | 0.0087 |
| GO:0023060 | P | signal transmission | 46 | 279 | 0.0012 | 0.0087 |
| GO:0009755 | P | hormone-mediated signaling pathway | 22 | 106 | 0.0012 | 0.0087 |
| GO:0009624 | P | response to nematode | 10 | 33 | 0.0014 | 0.0097 |
| GO:0023052 | P | signaling | 64 | 421 | 0.0015 | 0.01 |
| GO:0005199 | F | structural constituent of cell wall | 5 | 9 | 0.001 | 0.011 |
| GO:0000271 | P | polysaccharide biosynthetic process | 15 | 63 | 0.0017 | 0.011 |
| GO:0006575 | P | cellular amino acid derivative metabolic process | 22 | 109 | 0.0017 | 0.012 |
| GO:0006952 | P | defense response | 28 | 151 | 0.0018 | 0.012 |
| GO:0009082 | P | branched chain family amino acid biosynthetic process | 5 | 10 | 0.0019 | 0.012 |
| GO:0009867 | P | jasmonic acid mediated signaling pathway | 7 | 19 | 0.0021 | 0.013 |
| GO:0071395 | P | cellular response to jasmonic acid stimulus | 7 | 19 | 0.0021 | 0.013 |
| GO:0005618 | C | cell wall | 28 | 146 | 0.0011 | 0.014 |
| GO:0030312 | C | external encapsulating structure | 28 | 148 | 0.0013 | 0.015 |
| GO:0016491 | F | oxidoreductase activity | 80 | 548 | 0.0015 | 0.015 |
| GO:0009753 | P | response to jasmonic acid stimulus | 14 | 59 | 0.0024 | 0.015 |
| GO:0006629 | P | lipid metabolic process | 51 | 328 | 0.0025 | 0.015 |
| GO:0009414 | P | response to water deprivation | 16 | 72 | 0.0025 | 0.015 |
| GO:0030001 | P | metal ion transport | 18 | 85 | 0.0025 | 0.015 |
| GO:0009813 | P | flavonoid biosynthetic process | 9 | 30 | 0.0026 | 0.015 |
| GO:0015833 | P | peptide transport | 9 | 30 | 0.0026 | 0.015 |
| GO:0006857 | P | oligopeptide transport | 9 | 30 | 0.0026 | 0.015 |
| GO:0030005 | P | cellular di-, tri-valent inorganic cation homeostasis | 8 | 25 | 0.0028 | 0.016 |
| GO:0017111 | F | nucleoside-triphosphatase activity | 65 | 432 | 0.0018 | 0.017 |
| GO:0009409 | P | response to cold | 22 | 114 | 0.0031 | 0.017 |
| GO:0032787 | P | monocarboxylic acid metabolic process | 32 | 186 | 0.0031 | 0.017 |
| GO:0015849 | P | organic acid transport | 9 | 31 | 0.0033 | 0.018 |
| GO:0046942 | P | carboxylic acid transport | 9 | 31 | 0.0033 | 0.018 |
| GO:0009699 | P | phenylpropanoid biosynthetic process | 13 | 55 | 0.0035 | 0.018 |
| GO:0000325 | C | plant-type vacuole | 10 | 34 | 0.0018 | 0.019 |
| GO:0010218 | P | response to far red light | 8 | 26 | 0.0037 | 0.019 |
| GO:0046873 | F | metal ion transmembrane transporter activity | 18 | 84 | 0.0022 | 0.02 |
| GO:0043648 | P | dicarboxylic acid metabolic process | 9 | 32 | 0.0042 | 0.021 |
| GO:0006633 | P | fatty acid biosynthetic process | 15 | 69 | 0.0042 | 0.021 |
| GO:0006725 | P | cellular aromatic compound metabolic process | 27 | 153 | 0.0043 | 0.022 |
| GO:0009415 | P | response to water | 16 | 76 | 0.0045 | 0.022 |
| GO:0015079 | F | potassium ion transmembrane transporter activity | 6 | 15 | 0.0027 | 0.023 |
| GO:0016462 | F | pyrophosphatase activity | 66 | 450 | 0.0031 | 0.023 |
| GO:0015198 | F | oligopeptide transporter activity | 5 | 11 | 0.0032 | 0.023 |
| GO:0015197 | F | peptide transporter activity | 5 | 11 | 0.0032 | 0.023 |
| GO:0016787 | F | hydrolase activity | 177 | 1380 | 0.0033 | 0.023 |
| GO:0016818 | F | hydrolase activity, acting on acid anhydrides, in phosphorus-containing anhydrides | 66 | 452 | 0.0034 | 0.023 |
| GO:0016817 | F | hydrolase activity, acting on acid anhydrides | 66 | 452 | 0.0034 | 0.023 |
| GO:0048037 | F | cofactor binding | 20 | 101 | 0.0034 | 0.023 |
| GO:0009725 | P | response to hormone stimulus | 43 | 275 | 0.0046 | 0.023 |
| GO:0070279 | F | vitamin B6 binding | 6 | 16 | 0.0039 | 0.024 |
| GO:0030170 | F | pyridoxal phosphate binding | 6 | 16 | 0.0039 | 0.024 |
| GO:0009620 | P | response to | 13 | 57 | 0.0049 | 0.024 |
| GO:0009644 | P | response to high light intensity | 7 | 22 | 0.0053 | 0.026 |
| GO:0009694 | P | jasmonic acid metabolic process | 6 | 17 | 0.0055 | 0.026 |
| GO:0009695 | P | jasmonic acid biosynthetic process | 6 | 17 | 0.0055 | 0.026 |
| GO:0044042 | P | glucan metabolic process | 13 | 58 | 0.0057 | 0.026 |
| GO:0016758 | F | transferase activity, transferring hexosyl groups | 28 | 161 | 0.0046 | 0.027 |
| GO:0048519 | P | negative regulation of biological process | 26 | 149 | 0.0058 | 0.027 |
| GO:0030276 | F | clathrin binding | 5 | 12 | 0.005 | 0.029 |
| GO:0016773 | F | phosphotransferase activity, alcohol group as acceptor | 69 | 486 | 0.0054 | 0.031 |
| GO:0008509 | F | anion transmembrane transporter activity | 13 | 58 | 0.0057 | 0.031 |
| GO:0015075 | F | ion transmembrane transporter activity | 42 | 271 | 0.0059 | 0.031 |
| GO:0022891 | F | substrate-specific transmembrane transporter activity | 54 | 367 | 0.0062 | 0.032 |
| GO:0009705 | C | plant-type vacuole membrane | 8 | 26 | 0.0037 | 0.033 |
| GO:0044435 | C | plastid part | 69 | 479 | 0.0039 | 0.033 |
| GO:0016831 | F | carboxy-lyase activity | 9 | 34 | 0.0065 | 0.033 |
| GO:0071370 | P | cellular response to gibberellin stimulus | 5 | 13 | 0.0075 | 0.033 |
| GO:0009740 | P | gibberellic acid mediated signaling pathway | 5 | 13 | 0.0075 | 0.033 |
| GO:0010476 | P | gibberellin mediated signaling pathway | 5 | 13 | 0.0075 | 0.033 |
| GO:0031408 | P | oxylipin biosynthetic process | 6 | 18 | 0.0076 | 0.033 |
| GO:0031407 | P | oxylipin metabolic process | 6 | 18 | 0.0076 | 0.033 |
| GO:0016638 | F | oxidoreductase activity, acting on the CH-NH2 group of donors | 7 | 23 | 0.007 | 0.034 |
| GO:0009642 | P | response to light intensity | 9 | 35 | 0.0079 | 0.034 |
| GO:0009812 | P | flavonoid metabolic process | 9 | 35 | 0.0079 | 0.034 |
| GO:0004312 | F | fatty-acid synthase activity | 5 | 13 | 0.0075 | 0.036 |
| GO:0015297 | F | antiporter activity | 13 | 60 | 0.0076 | 0.036 |
| GO:0009987 | P | cellular process | 451 | 3836 | 0.0088 | 0.037 |
| GO:0046983 | F | protein dimerization activity | 11 | 48 | 0.0088 | 0.038 |
| GO:0016301 | F | kinase activity | 90 | 672 | 0.0088 | 0.038 |
| GO:0005524 | F | ATP binding | 80 | 589 | 0.009 | 0.038 |
| GO:0046527 | F | glucosyltransferase activity | 14 | 68 | 0.0092 | 0.038 |
| GO:0008415 | F | acyltransferase activity | 18 | 96 | 0.0094 | 0.038 |
| GO:0022890 | F | inorganic cation transmembrane transporter activity | 16 | 82 | 0.0095 | 0.038 |
| GO:0001883 | F | purine nucleoside binding | 84 | 625 | 0.0099 | 0.038 |
| GO:0001882 | F | nucleoside binding | 84 | 625 | 0.0099 | 0.038 |
| GO:0030554 | F | adenyl nucleotide binding | 84 | 625 | 0.0099 | 0.038 |
| GO:0006970 | P | response to osmotic stress | 25 | 147 | 0.0093 | 0.039 |
| GO:0042398 | P | cellular amino acid derivative biosynthetic process | 16 | 82 | 0.0095 | 0.039 |
| GO:0000041 | P | transition metal ion transport | 6 | 19 | 0.01 | 0.042 |
| GO:0030076 | C | light-harvesting complex | 6 | 17 | 0.0055 | 0.043 |
| GO:0006874 | P | cellular calcium ion homeostasis | 5 | 14 | 0.011 | 0.043 |
| GO:0055074 | P | calcium ion homeostasis | 5 | 14 | 0.011 | 0.043 |
| GO:0008152 | P | metabolic process | 421 | 3584 | 0.011 | 0.043 |
| GO:0010035 | P | response to inorganic substance | 25 | 149 | 0.011 | 0.043 |
| GO:0009651 | P | response to salt stress | 23 | 134 | 0.011 | 0.043 |
| GO:0008610 | P | lipid biosynthetic process | 31 | 195 | 0.011 | 0.043 |
| GO:0032559 | F | adenyl ribonucleotide binding | 80 | 595 | 0.011 | 0.043 |
| GO:0006865 | P | amino acid transport | 7 | 25 | 0.011 | 0.044 |
| GO:0016757 | F | transferase activity, transferring glycosyl groups | 36 | 235 | 0.012 | 0.044 |
| GO:0009639 | P | response to red or far red light | 16 | 84 | 0.012 | 0.045 |
| GO:0044264 | P | cellular polysaccharide metabolic process | 14 | 70 | 0.012 | 0.045 |
| GO:0016747 | F | transferase activity, transferring acyl groups other than amino-acyl groups | 19 | 106 | 0.013 | 0.046 |
| GO:0019842 | F | vitamin binding | 6 | 20 | 0.013 | 0.047 |
| GO:0006073 | P | cellular glucan metabolic process | 12 | 57 | 0.013 | 0.048 |
| GO:0004672 | F | protein kinase activity | 55 | 391 | 0.014 | 0.048 |
| GO:0017076 | F | purine nucleotide binding | 93 | 711 | 0.014 | 0.048 |
| GO:0016835 | F | carbon-oxygen lyase activity | 11 | 51 | 0.014 | 0.048 |
| GO:0044237 | P | cellular metabolic process | 347 | 2939 | 0.013 | 0.049 |
| GO:0055085 | P | transmembrane transport | 13 | 64 | 0.013 | 0.049 |


Appendix Table E.9 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci between the homoploid hybrid species A. sundingii and A. lemsii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0006950 | P | response to stress | 17 | 722 | 2.10E-05 | 0.00062 |
| GO:0044262 | P | cellular carbohydrate metabolic process | 6 | 218 | 0.0036 | 0.044 |
| GO:0050896 | P | response to stimulus | 18 | 1292 | 0.0045 | 0.044 |


Appendix Table E.10 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like expression in A. sundingii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0009611 | P | response to wounding | 8 | 65 | 0.00011 | 0.0073 |
| GO:0009605 | P | response to external stimulus | 13 | 154 | 5.90E-05 | 0.0073 |
| GO:0016491 | F | oxidoreductase activity | 26 | 548 | 0.00053 | 0.016 |
| GO:0016829 | F | lyase activity | 11 | 142 | 0.00043 | 0.016 |
| GO:0015103 | F | inorganic anion transmembrane transporter activity | 5 | 34 | 0.00094 | 0.019 |


Appendix Table E.11 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression in A. sundingii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0006950 | P | response to stress | 30 | 722 | 0.0002 | 0.015 |
| GO:0050896 | P | response to stimulus | 46 | 1292 | 0.00024 | 0.015 |
| GO:0044272 | P | sulfur compound biosynthetic process | 5 | 41 | 0.0012 | 0.034 |
| GO:0071554 | P | cell wall organization or biogenesis | 7 | 83 | 0.0012 | 0.034 |
| GO:0009628 | P | response to abiotic stimulus | 23 | 571 | 0.0013 | 0.034 |
| GO:0010319 | C | stromule | 5 | 23 | 7.00E-05 | 0.0037 |
| GO:0009526 | C | plastid envelope | 12 | 178 | 0.00024 | 0.0064 |
| GO:0044437 | C | vacuolar part | 5 | 48 | 0.0024 | 0.021 |
| GO:0044435 | C | plastid part | 20 | 479 | 0.0017 | 0.021 |
| GO:0031090 | C | organelle membrane | 18 | 413 | 0.0017 | 0.021 |
| GO:0005774 | C | vacuolar membrane | 5 | 48 | 0.0024 | 0.021 |
| GO:0044422 | C | organelle part | 36 | 1111 | 0.0038 | 0.023 |
| GO:0044446 | C | intracellular organelle part | 36 | 1110 | 0.0038 | 0.023 |
| GO:0042170 | C | plastid membrane | 5 | 53 | 0.0037 | 0.023 |
| GO:0031975 | C | envelope | 13 | 295 | 0.0061 | 0.029 |
| GO:0031967 | C | organelle envelope | 13 | 295 | 0.0061 | 0.029 |
| GO:0005576 | C | extracellular region | 5 | 65 | 0.0087 | 0.039 |


Appendix Table E.12 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like expression in A. lemsii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0009611 | P | response to wounding | 10 | 65 | 3.50E-05 | 0.0056 |
| GO:0050896 | P | response to stimulus | 69 | 1292 | 6.00E-05 | 0.0056 |
| GO:0006855 | P | multidrug transport | 6 | 30 | 0.0003 | 0.011 |
| GO:0042493 | P | response to drug | 6 | 30 | 0.0003 | 0.011 |
| GO:0015893 | P | drug transport | 6 | 30 | 0.0003 | 0.011 |
| GO:0009605 | P | response to external stimulus | 14 | 154 | 0.00041 | 0.013 |
| GO:0015995 | P | chlorophyll biosynthetic process | 5 | 22 | 0.00051 | 0.014 |
| GO:0046148 | P | pigment biosynthetic process | 7 | 54 | 0.0015 | 0.034 |
| GO:0016491 | F | oxidoreductase activity | 36 | 548 | 5.50E-05 | 0.0044 |
| GO:0016829 | F | lyase activity | 13 | 142 | 0.0006 | 0.011 |
| GO:0005215 | F | transporter activity | 36 | 628 | 0.00068 | 0.011 |
| GO:0022857 | F | transmembrane transporter activity | 29 | 472 | 0.00071 | 0.011 |
| GO:0015103 | F | inorganic anion transmembrane transporter activity | 6 | 34 | 0.0006 | 0.011 |
| GO:0008509 | F | anion transmembrane transporter activity | 7 | 58 | 0.0022 | 0.03 |
| GO:0005783 | C | endoplasmic reticulum | 16 | 201 | 0.00073 | 0.042 |


Appendix Table E.13 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression in A. lemsii. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The numbers of differentially expressed loci and of loci in the background reference list associated with each GO term are provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
| --- | --- | --- | --- | --- | --- | --- |
| GO:0009628 | P | response to abiotic stimulus | 27 | 571 | 8.80E-06 | 0.00089 |
| GO:0050896 | P | response to stimulus | 46 | 1292 | 2.60E-05 | 0.0013 |
| GO:0009409 | P | response to cold | 9 | 114 | 0.0002 | 0.0067 |
| GO:0006950 | P | response to stress | 27 | 722 | 0.00038 | 0.0079 |
| GO:0009416 | P | response to light stimulus | 13 | 239 | 0.00039 | 0.0079 |
| GO:0009266 | P | response to temperature stimulus | 11 | 185 | 0.0005 | 0.0084 |
| GO:0009314 | P | response to radiation | 13 | 251 | 0.00061 | 0.0089 |
| GO:0044272 | P | sulfur compound biosynthetic process | 5 | 41 | 0.00072 | 0.0091 |
| GO:0044281 | P | small molecule metabolic process | 25 | 723 | 0.0016 | 0.017 |
| GO:0042221 | P | response to chemical stimulus | 23 | 640 | 0.0015 | 0.017 |
| GO:0048878 | P | chemical homeostasis | 5 | 60 | 0.004 | 0.036 |
| GO:0044271 | P | cellular nitrogen compound biosynthetic process | 9 | 182 | 0.0051 | 0.043 |
| GO:0016740 | F | transferase activity | 42 | 1346 | 0.00066 | 0.0069 |
| GO:0042626 | F | ATPase activity, coupled to transmembrane movement of substances | 8 | 108 | 0.00067 | 0.0069 |
| GO:0042623 | F | ATPase activity, coupled | 12 | 219 | 0.00059 | 0.0069 |
| GO:0016820 | F | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 8 | 108 | 0.00067 | 0.0069 |
| GO:0043492 | F | ATPase activity, coupled to movement of substances | 8 | 108 | 0.00067 | 0.0069 |
| GO:0046527 | F | glucosyltransferase activity | 6 | 68 | 0.0012 | 0.011 |
| GO:0015405 | F | P-P-bond-hydrolysis-driven transmembrane transporter activity | 8 | 132 | 0.0024 | 0.011 |
| GO:0015399 | F | primary active transmembrane transporter activity | 8 | 132 | 0.0024 | 0.011 |
| GO:0003824 | F | catalytic activity | 97 | 3823 | 0.0015 | 0.011 |
| GO:0008194 | F | UDP-glycosyltransferase activity | 7 | 105 | 0.0026 | 0.011 |
| GO:0016758 | F | transferase activity, transferring hexosyl groups | 9 | 161 | 0.0023 | 0.011 |
| GO:0016887 | F | ATPase activity | 13 | 286 | 0.002 | 0.011 |
| GO:0016757 | F | transferase activity, transferring glycosyl groups | 11 | 235 | 0.0033 | 0.013 |
| GO:0005215 | F | transporter activity | 20 | 628 | 0.0096 | 0.036 |
| GO:0022857 | F | transmembrane transporter activity | 16 | 472 | 0.011 | 0.038 |
| GO:0042578 | F | phosphoric ester hydrolase activity | 8 | 186 | 0.017 | 0.05 |
| GO:0022804 | F | active transmembrane transporter activity | 11 | 293 | 0.015 | 0.05 |
| GO:0016787 | F | hydrolase activity | 36 | 1380 | 0.017 | 0.05 |
| GO:0010319 | C | stromule | 5 | 23 | 4.20E-05 | 0.002 |

Appendix Table E.14 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. broussonetii-like expression shared between the homoploid hybrid species. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
|---|---|---|---|---|---|---|
| GO:0015103 | F | inorganic anion transmembrane transporter activity | 5 | 34 | 0.00013 | 0.0063 |
| GO:0016491 | F | oxidoreductase activity | 19 | 548 | 0.0007 | 0.017 |
| GO:0008509 | F | anion transmembrane transporter activity | 5 | 58 | 0.0015 | 0.026 |

Appendix Table E.15 - Significantly enriched Gene Ontology (GO) terms for differentially expressed loci with A. frutescens-like expression shared between the homoploid hybrid species. Ontology is abbreviated as P (biological process), C (cellular component) and F (molecular function). The number of differentially expressed loci and loci in the background reference list associated with each GO term is provided. P values and false discovery rate (FDR) corrected P values are provided.

| GO term | Ontology | Description | Number in input list | Number in BG/Ref | p-value | FDR |
|---|---|---|---|---|---|---|
| GO:0009266 | P | response to temperature stimulus | 8 | 185 | 0.00054 | 0.011 |
| GO:0050896 | P | response to stimulus | 28 | 1292 | 0.00024 | 0.011 |
| GO:0006950 | P | response to stress | 18 | 722 | 0.0005 | 0.011 |
| GO:0009409 | P | response to cold | 6 | 114 | 0.00093 | 0.011 |
| GO:0009628 | P | response to abiotic stimulus | 15 | 571 | 0.00078 | 0.011 |
| GO:0016740 | F | transferase activity | 25 | 1346 | 0.0032 | 0.036 |
| GO:0016758 | F | transferase activity, transferring hexosyl groups | 6 | 161 | 0.0052 | 0.036 |
| GO:0015405 | F | P-P-bond-hydrolysis-driven transmembrane transporter activity | 5 | 132 | 0.0096 | 0.036 |
| GO:0016887 | F | ATPase activity | 8 | 286 | 0.0077 | 0.036 |
| GO:0042626 | F | ATPase activity, coupled to transmembrane movement of substances | 5 | 108 | 0.0042 | 0.036 |
| GO:0042623 | F | ATPase activity, coupled | 7 | 219 | 0.0061 | 0.036 |
| GO:0015399 | F | primary active transmembrane transporter activity | 5 | 132 | 0.0096 | 0.036 |
| GO:0016757 | F | transferase activity, transferring glycosyl groups | 7 | 235 | 0.0087 | 0.036 |
| GO:0003824 | F | catalytic activity | 56 | 3823 | 0.0068 | 0.036 |
| GO:0016820 | F | hydrolase activity, acting on acid anhydrides, catalyzing transmembrane movement of substances | 5 | 108 | 0.0042 | 0.036 |
| GO:0043492 | F | ATPase activity, coupled to movement of substances | 5 | 108 | 0.0042 | 0.036 |
| GO:0044435 | C | plastid part | 14 | 479 | 0.00041 | 0.016 |
| GO:0009526 | C | plastid envelope | 7 | 178 | 0.002 | 0.038 |

Appendix Figure E.1 - Phylogeny adapted from White et al. (in prep) showing independent hybrid origins for A. sundingii and A. lemsii. Taxa with poorly supported phylogenetic relationships are annotated with an asterisk.

Appendix Figure E.2 - Populations sampled in Tenerife (A) and in the Anaga peninsula (B). Populations are labelled A-U. Contour lines represent a 200 m change in altitude.

Appendix Figure E.3 - Summary of raw and filtered read counts across all samples

Appendix Figure E.4 - Percentage of orthogroups with distinct, multiple and no hits in (A) Arabidopsis thaliana and (B) Helianthus annuus across varying inflation parameters for OrthoFinder.

List of References

Abbott R, Albach D, Ansell S, Arntzen JW, Baird SJE, Bierne N, Boughman J, Brelsford A, Buerkle CA, Buggs R, et al. 2013. Hybridization and speciation. Journal of Evolutionary Biology 26: 229–246.

Allan GJ, Francisco-Ortega J, Santos-Guerra A, Boerner E, Zimmer EA. 2004. Molecular phylogenetic evidence for the geographic origin and classification of Canary Island Lotus (Fabaceae: Loteae). Molecular Phylogenetics and Evolution 32: 123–138.

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990. Basic local alignment search tool. Journal of Molecular Biology 215: 403–410.

Andrews KR, Good JM, Miller MR, Luikart G, Hohenlohe PA. 2016. Harnessing the power of RADseq for ecological and evolutionary genomics. Nature Reviews Genetics 17: 81–92.

Baack E, Melo MC, Rieseberg LH, Ortiz-Barrientos D. 2015. The origins of reproductive isolation in plants. New Phytologist 207: 968–984.

Baack EJ, Whitney KD, Rieseberg LH. 2005. Hybridization and genome size evolution: timing and magnitude of nuclear DNA content increases in Helianthus homoploid hybrid species. New Phytologist 167: 623–630.

Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L, Lelandais-Brière C, Owens GL, Carrère S, Mayjonade B, et al. 2017. The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546: 148–152.

Bandelt HJ, Forster P, Röhl A. 1999. Median-joining networks for inferring intraspecific phylogenies. Molecular Biology and Evolution 16: 37–48.

Barrès B, Carlier J, Seguin M, Fenouillet C, Cilas C, Ravigné V. 2012. Understanding the recent colonization history of a plant pathogenic fungus using population genetic tools and Approximate Bayesian Computation. Heredity 109: 269–279.

Barton NH, Hewitt GM. 1985. Analysis of hybrid zones. Annual Review of Ecology and Systematics 16: 113–148.

Benjamini Y, Hochberg Y. 1995. Controlling the False Discovery Rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B (Methodological) 57: 289–300.

Bertorelle G, Benazzo A, Mona S. 2010. ABC as a flexible framework to estimate demography over space and time: Some cons, many pros. Molecular Ecology 19: 2609–2625.

Birol I, Jackman SD, Nielsen CB, Qian JQ, Varhol R, Stazyk G, Morin RD, Zhao Y, Hirst M, Schein JE, et al. 2009. De novo transcriptome assembly with ABySS. Bioinformatics 25: 2872–2877.

Blanco-Pastor JL, Bertrand YJK, Liberal IM, Wei Y, Brummer EC, Pfeil BE. 2018. Robustness of RADseq for evolutionary network reconstruction from gene trees. bioRxiv: 414243.

Böhle UR, Hilger HH, Martin WF. 1996. Island colonisation and evolution of the insular woody habit in Echium L. (Boraginaceae). Proceedings of the National Academy of Sciences of the United States of America 93: 11740–11745.

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.

Borgen L. 1976. Analysis of a hybrid swarm between Argyranthemum adauctum and A. filifolium in the Canary Islands. Norwegian Journal of Botany 23: 121–137.

Borgen L. 1980. A new species of Argyranthemum (Compositae) from the Canary Islands. Nordic Journal of Botany 27: 163–165.

Borgen L. 1984. Biosystematics of Macaronesian Flowering Plants. In: Grant WF, ed. Plant Biosystematics. London: Academic Press, 477–496.

Borgen L, Leitch I, Santos-Guerra A. 2003. Genome organization in diploid hybrid species of Argyranthemum (Asteraceae) in the Canary Islands. Botanical Journal of the Linnean Society 141: 491–501.

Bramwell D, Bramwell Z. 2001. Wild Flowers of the Canary Islands. Spain: Editorial Rueda.

Bremer K, Humphries CJ. 1993. Generic monograph of the Asteraceae-Anthemideae. Bulletin of the Natural History Museum 23: 71–177.

Brochmann C. 1984. Hybridization and distribution of Argyranthemum coronopifolium (Asteraceae, Anthemideae) in the Canary Islands. Nordic Journal of Botany 4: 729–736.

Brochmann C. 1987. Evaluation of some methods for hybrid analysis, exemplified by hybridisation in Argyranthemum. Nordic Journal of Botany 7: 609–630.

Brochmann C, Borgen L, Stabbetorp OE. 2000. Multiple diploid hybrid speciation of the Canary Island endemic Argyranthemum sundingii (Asteraceae). Plant Systematics and Evolution 220: 77–92.

Broennimann O, Fitzpatrick MC, Pearman PB, Petitpierre B, Pellissier L, Yoccoz NG, Thuiller W, Fortin MJ, Randin C, Zimmermann NE, et al. 2012. Measuring ecological niche overlap from occurrence and spatial environmental data. Global Ecology and Biogeography 21: 481–497.

Bryan GJ, McNicoll J, Ramsay G, Meyer RC, De Jong WS. 1999. Polymorphic simple sequence repeat markers in chloroplast genomes of Solanaceous plants. Theoretical and Applied Genetics 99: 859–867.

Buerkle CA, Morris RJ, Asmussen MA, Rieseberg LH. 2000. The likelihood of homoploid hybrid speciation. Heredity 84: 441–451.

Burstin J, Deniot G, Potier J, Weinachter C, Aubert G, Baranger A. 2001. Microsatellite polymorphism in Pisum sativum. Plant Breeding 120: 311–317.

Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL. 2009. BLAST+: Architecture and applications. BMC Bioinformatics 10: 1–9.

Capblancq T, Després L, Rioux D, Mavárez J. 2015. Hybridization promotes speciation in Coenonympha butterflies. Molecular Ecology 24: 6209–6222.

Carine MA, Robba L, Little R, Russell S, Santos-Guerra A. 2007. Molecular and morphological evidence for hybridization between endemic Canary Island Convolvulus. Botanical Journal of the Linnean Society 154: 187–204.

Carlson DE, Hedin M. 2017. Comparative transcriptomics of Entelegyne spiders (Araneae, Entelegynae), with emphasis on molecular evolution of orphan genes. PLoS ONE 12: e0174102.

Catchen J, Hohenlohe PA, Bassham S, Amores A, Cresko WA. 2013. Stacks: An analysis tool set for population genomics. Molecular Ecology 22: 3124–3140.

Caujapé-Castells J, Tye A, Crawford DJ, Santos-Guerra A, Sakai A, Beaver K, Lobin W, Vincent Florens FB, Moura M, Jardim R, et al. 2010. Conservation of oceanic island floras: Present and future global challenges. Perspectives in Plant Ecology, Evolution and Systematics 12: 107–129.

Chamala S, García N, Godden GT, Krishnakumar V, Jordon-Thaden IE, De Smet R, Barbazuk WB, Soltis DE, Soltis PS. 2015. MarkerMiner 1.0: A new application for phylogenetic marker development using angiosperm transcriptomes. Applications in Plant Sciences 3: 1400115.

Chapman MA. 2015. Transcriptome sequencing and marker development for four underutilized legumes. Applications in Plant Sciences 3: 1400111.

Chapman MA, Burke JM. 2007. Genetic divergence and hybrid speciation. Evolution 61: 1773–1780.

Chapman MA, Chang J, Weisman D, Kesseli RV, Burke JM. 2007. Universal markers for comparative mapping and phylogenetic analysis in the Asteraceae (Compositae). Theoretical and Applied Genetics 115: 747–755.

Chapman MA, Hiscock SJ, Filatov DA. 2013. Genomic divergence during speciation driven by adaptation to altitude. Molecular Biology and Evolution 30: 2553–2567.

Chavent M, Kuentz-Simonet V, Labenne A, Saracco J. 2017. Multivariate analysis of mixed data: The R package PCAmixdata. arXiv: 1411.4911.

Chavent M, Kuentz-Simonet V, Saracco J. 2012. Orthogonal rotation in PCAMIX. Advances in Data Analysis and Classification 6: 131–146.

Choisy M, Franck P, Cornuet JM. 2004. Estimating admixture proportions with microsatellites: comparison of methods based on simulated data. Molecular Ecology 13: 955–968.

Di Cola V, Broennimann O, Petitpierre B, Breiner FT, D’Amen M, Randin C, Engler R, Pottier J, Pio D, Dubuis A, et al. 2017. ecospat: an R package to support spatial analyses and modeling of species niches and distributions. Ecography 40: 774–787.

Cornuet JM, Ravigné V, Estoup A. 2010. Inference on population history and model checking using DNA sequence and microsatellite data with the software DIYABC (v1.0). BMC Bioinformatics 11: 401.

Cornuet JM, Santos F, Beaumont MA, Robert CP, Marin JM, Balding DJ, Guillemaud T, Estoup A. 2008. Inferring population history with DIY ABC: A user-friendly approach to approximate Bayesian computation. Bioinformatics 24: 2713–2719.

Crawford DJ, Stuessy TF. 2016. Cryptic variation, molecular data, and the challenge of conserving plant diversity in oceanic archipelagos: the critical role of plant systematics. Taxon 46: 129–148.

Curto M, Schachtler C, Puppo P, Meimberg H. 2017. Using a new RAD-sequencing approach to study the evolution of Micromeria in the Canary Islands. Molecular Phylogenetics and Evolution 119: 160–169.

Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, et al. 2011. The variant call format and VCFtools. Bioinformatics 27: 2156–2158.

Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics 12: 499–510.

van Dongen SM. 2000. Graph Clustering by Flow Simulation (Doctoral dissertation).

Donovan LA, Rosenthal DR, Sanchez-Velenosi M, Rieseberg LH, Ludwig F. 2010. Are hybrid species more fit than ancestral parent species in the current hybrid species habitats? Journal of Evolutionary Biology 23: 805–816.

Doyle JJ, Doyle JL. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin 19: 11–15.

Du Z, Zhou X, Ling Y, Zhang Z, Su Z. 2010. agriGO: A GO analysis toolkit for the agricultural community. Nucleic Acids Research 38: 64–70.

Dunning LT, Hipperson H, Baker WJ, Butlin RK, Devaux C, Hutton I, Igea J, Papadopulos AST, Quan X, Smadja CM, et al. 2016. Ecological speciation in sympatric palms: 1. Gene expression, selection and pleiotropy. Journal of Evolutionary Biology 29: 1472–1487.

Earl DA, vonHoldt BM. 2012. STRUCTURE HARVESTER: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conservation Genetics Resources 4: 359–361.

Eaton D, Ree RH. 2013. Inferring phylogeny and introgression using RADseq data: an example from flowering plants (Pedicularis: Orobanchaceae). Systematic Biology 62: 689–706.

Egan AN, Schlueter J, Spooner DM. 2012. Applications of next-generation sequencing in plant biology. American Journal of Botany 99: 175–185.

Ellis JR, Burke JM. 2007. EST-SSRs as a resource for population genetic analyses. Heredity 99: 125–132.

Elmer KR, Fan S, Gunter HM, Jones JC, Boekhoff S, Kuraku S, Meyer A. 2010. Rapid evolution and selection inferred from the transcriptomes of sympatric crater lake cichlid fishes. Molecular Ecology 19: 197–211.

Emerson BC. 2002. Evolution on oceanic islands: molecular phylogenetic approaches to understanding pattern and process. Molecular Ecology 11: 951–66.

Emms DM, Kelly S. 2015. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biology 16: 157.

Evanno G, Regnaut S, Goudet J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Molecular Ecology 14: 2611–2620.

Excoffier L, Foll M. 2011. fastsimcoal: A continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. Bioinformatics 27: 1332–1334.

Fan S, Elmer KR, Meyer A. 2012. Genomics of adaptation and speciation in cichlid fishes: recent advances and analyses in African and Neotropical lineages. Philosophical transactions of the Royal Society of London. Series B, Biological sciences 367: 385–94.

Fisher R. 1930. The genetical theory of natural selection. Oxford: Clarendon.

Fjellheim S, Jørgensen MH, Kjos M, Borgen L. 2009. A molecular study of hybridization and homoploid hybrid speciation in Argyranthemum (Asteraceae) on Tenerife, the Canary Islands. Botanical Journal of the Linnean Society 159: 19–31.

Ford AGP, Dasmahapatra KK, Rüber L, Gharbi K, Cezard T, Day JJ. 2015. High levels of interspecific gene flow in an endemic cichlid fish adaptive radiation from an extreme lake environment. Molecular Ecology 24: 3421–3440.

Francisco-Ortega J, Crawford DJ, Santos-Guerra A, Carvalho JA. 1996a. Isozyme differentiation in the endemic genus Argyranthemum (Asteraceae: Anthemideae) in the Macaronesian Islands. Plant Systematics and Evolution 202: 137–152.

Francisco-Ortega J, Crawford DJ, Santos-Guerra A, Jansen RK. 1997a. Origin and Evolution of Argyranthemum (Asteraceae: Anthemideae) in Macaronesia. In: Givnish TJ, Sytsma KJ, eds. Molecular Evolution and Adaptive Radiation. 407–432.

Francisco-Ortega J, Crawford DJ, Santos-Guerra A, Sa-Fontinha S. 1995a. Genetic divergence among Mediterranean and Macaronesian genera of the subtribe Chrysantheminae (Asteraceae). American Journal of Botany 82: 1321–1328.

Francisco-Ortega J, Ellis RH, González-Feria E, Santos-Guerra A. 1994. Overcoming seed dormancy in ex situ plant germplasm conservation programmes; an example in the endemic Argyranthemum (Asteraceae: Anthemideae) species from the Canary Islands. Biodiversity and Conservation 3: 341–353.

Francisco-Ortega J, Fuertes-Aguilar J, Kim SC, Santos-Guerra A, Crawford DJ, Jansen RK. 2002. Phylogeny of the Macaronesian endemic Crambe section Dendrocrambe (Brassicaceae) based on internal transcribed spacer sequences of nuclear ribosomal DNA. American Journal of Botany 89: 1984–1990.

Francisco-Ortega J, Jansen RK, Crawford DJ, Santos-Guerra A. 1995b. Chloroplast DNA evidence for intergeneric relationships of the Macaronesian endemic genus Argyranthemum (Asteraceae). Systematic Botany 20: 413–422.

Francisco-Ortega J, Jansen RK, Santos-Guerra A. 1996b. Chloroplast DNA evidence of colonization, adaptive radiation, and hybridization in the evolution of the Macaronesian flora. Proceedings of the National Academy of Sciences of the United States of America 93: 4085–4090.

Francisco-Ortega J, Santos-Guerra A, Hines A, Jansen RK. 1997b. Molecular evidence for a Mediterranean origin of the Macaronesian endemic genus Argyranthemum (Asteraceae). American Journal of Botany 84: 1595–1613.

Francisco-Ortega J, Santos-Guerra A, Kim SC, Crawford DJ. 2000. Plant genetic diversity in the Canary Islands: A conservation perspective. American Journal of Botany 87: 909–919.

Francisco-Ortega J, Santos-Guerra A, Mesa-Coello R, Gonzalez-Feria E, Crawford DJ. 1996c. Genetic resource conservation of the endemic genus Argyranthemum Sch. Bip. (Asteraceae: Anthemideae) in the Macaronesian Islands. Genetic Resources and Crop Evolution 43: 33–39.

Friar EA, Prince LM, Cruse-Sanders JM, McGlaughlin ME, Butterworth CA, Baldwin BG. 2008. Hybrid origin and genomic mosaicism of Dubautia scabra (Hawaiian Silversword Alliance: Asteraceae, Madiinae). Systematic Botany 33: 589–597.

Fu L, Niu B, Zhu Z, Wu S, Li W. 2012. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 28: 3150–3152.

Givnish TJ, Millam KC, Mast AR, Paterson TB, Theim TJ, Hipp AL, Henss JM, Smith JF, Wood KR, Sytsma KJ, et al. 2009. Origin, adaptive radiation and diversification of the Hawaiian lobeliads (Asterales: Campanulaceae). Proceedings of the Royal Society B: Biological Sciences 276: 407–416.

Gonzalez AG, Estevez-Reyes R, Estevez-Braun A, Ravelo AG, Jimenez IA, Bazzocchi IL, Aguilar MA, Moujir L. 1997. Biological activities of some Argyranthemum species. Phytochemistry 45: 963–967.

Goodson BE, Santos-Guerra A, Jansen RK. 2006. Molecular systematics of Descurainia (Brassicaceae) in the Canary Islands: Biogeographic and taxonomic implications. Taxon 55: 671–682.

Goulet BE, Roda F, Hopkins R. 2017. Hybridization in Plants: old ideas, new techniques. Plant Physiology 173: 65–78.

Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. 2011. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology 29: 644–652.

Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, et al. 2013. Trinity: reconstructing a full-length transcriptome without a genome from RNA-Seq data. Nature Biotechnology 29: 644–652.

Grant V. 1971. Natural Hybridisation. In: Plant Speciation. New York & London: Columbia University Press, 151–162.

Gross BL, Rieseberg LH. 2005. The ecological genetics of homoploid hybrid speciation. Journal of Heredity 96: 241–252.

Gruenstaeudl M, Carstens BC, Santos-Guerra A, Jansen RK. 2017. Statistical hybrid detection and the inference of ancestral distribution areas in Tolpis (Asteraceae). Biological Journal of the Linnean Society 121: 133–149.

Grusz AL, Rothfels CJ, Schuettpelz E. 2016. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns. BMC Genomics 17: 692.

Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Philip D, Bowden J, Couger MB, Eccles D, Li B, MacManes MD, et al. 2014. De novo transcript sequence reconstruction from RNA-Seq: reference generation and analysis with Trinity. Nature Protocols 8: 1–43.

Harrison PW, Wright AE, Zimmer F, Dean R, Montgomery SH, Pointer MA, Mank JE. 2015. Sexual selection drives evolution and rapid turnover of male gene expression. Proceedings of the National Academy of Sciences 112: 4393–4398.

Hegarty MJ, Barker GL, Brennan AC, Edwards KJ, Abbott RJ, Hiscock SJ. 2008. Changes to gene expression associated with hybrid speciation in plants: further insights from transcriptomic studies in Senecio. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 3055–3069.

Hegarty MJ, Barker GL, Brennan AC, Edwards KJ, Abbott RJ, Hiscock SJ. 2009. Extreme changes to gene expression associated with homoploid hybrid speciation. Molecular Ecology 18: 877–889.

Hegarty MJ, Hiscock SJ. 2005. Hybrid speciation in plants: new insights from molecular studies. New Phytologist 165: 411–423.

Hellsten U, Wright KM, Jenkins J, Shu S, Yuan Y, Wessler SR, Schmutz J, Willis JH, Rokhsar DS. 2013. Fine-scale variation in meiotic recombination in Mimulus inferred from population shotgun sequencing. Proceedings of the National Academy of Sciences 110: 19478–19482.

van Hengstum T, Lachmuth S, Oostermeijer JGB, den Nijs JCM, Meirmans PG, van Tienderen PH. 2012. Human-induced hybridization among congeneric endemic plants on Tenerife, Canary Islands. Plant Systematics and Evolution 298: 1119–1131.

Hodgins KA, Bock DG, Hahn MA, Heredia SM, Turner KG, Rieseberg LH. 2015. Comparative genomics in the Asteraceae reveals little evidence for parallel evolutionary change in invasive taxa. Molecular Ecology 24: 2226–2240.

Howarth DG, Baum DA. 2005. Genealogical evidence of homoploid hybrid speciation in an adaptive radiation of Scaevola (Goodeniaceae) in the Hawaiian Islands. Evolution 59: 948–961.

Humphries CJ. 1973. A taxonomic study of the genus Argyranthemum Webb ex Sch. Bip. (Doctoral dissertation).

Humphries CJ. 1976. A revision of the Macaronesian genus Argyranthemum Webb ex Schultz Bip. (Compositae-Anthemideae). Bulletin of the British Museum (Natural History) 5: 145–240.

Humphries CJ. 1979. Endemism and Evolution in Macaronesia. In: Plants and Islands. New York: Academic Press, 171–199.

Husband BC. 2004. The role of triploid hybrids in the evolutionary dynamics of mixed-ploidy populations. Biological Journal of the Linnean Society 82: 537–546.

Jakobsson M, Rosenberg NA. 2007. CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23: 1801–1806.

James JK, Abbott RJ. 2005. Recent, allopatric, homoploid hybrid speciation: the origin of Senecio squalidus (Asteraceae) in the British Isles from a hybrid zone on Mount Etna, Sicily. Evolution 59: 2533–2547.

Jiggins CD, Salazar C, Linares M, Mavarez J. 2008. Hybrid trait speciation and Heliconius butterflies. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 3047–3054.

Jombart T, Ahmed I. 2011. adegenet 1.3-1: New tools for the analysis of genome-wide SNP data. Bioinformatics 27: 3070–3071.

Jones KE, Reyes-Betancort JA, Hiscock SJ, Carine MA. 2014. Allopatric diversification, multiple habitat shifts, and hybridization in the evolution of Pericallis (Asteraceae), a Macaronesian endemic genus. American Journal of Botany 101: 637–651.

Kadereit JW. 2015. The geography of hybrid speciation in plants. Taxon 64: 673–687.

Kelley JL, Passow CN, Plath M, Arias Rodriguez L, Yee M-C, Tobler M. 2012. Genomic resources for a model in adaptation and speciation research: characterization of the Poecilia mexicana transcriptome. BMC Genomics 13: 652.

Kerbs B, Ressler J, Kelly JK, Mort ME, Santos-Guerra A, Gibson MJS, Caujapé-Castells J, Crawford DJ. 2017. The potential role of hybridization in diversification and speciation in an insular plant lineage: insights from synthetic interspecific hybrids. AoB PLANTS 9: 1–12.

Kim SC, McGowen MR, Lubinsky P, Barber JC, Mort ME, Santos-Guerra A. 2008. Timing and tempo of early and successive adaptive radiations in Macaronesia. PLoS ONE 3: 1–7.

Kozlov A, Darriba D, Flouri T, Morel B, Stamatakis A. 2018. RAxML-NG: A fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. bioRxiv: 447110.

Kueffer C, Drake DR, Fernández-Palacios JM. 2014. Island biology: looking towards the future. Biology Letters 10: 20140719.

Lai Z, Gross BL, Zou Y, Andrews J, Rieseberg LH. 2006. Microarray analysis reveals differential gene expression in hybrid sunflower species. Molecular Ecology 15: 1213–1227.

Lamesch P, Berardini TZ, Li D, Swarbreck D, Wilks C, Sasidharan R, Muller R, Dreher K, Alexander DL, Garcia-Hernandez M, et al. 2012. The Arabidopsis Information Resource (TAIR): Improved gene annotation and new tools. Nucleic Acids Research 40: 1202–1210.

Lamichhaney S, Berglund J, Almén MS, Maqbool K, Grabherr M, Martinez-Barrio A, Promerová M, Rubin C-J, Wang C, Zamani N, et al. 2015. Evolution of Darwin’s finches and their beaks revealed by genome sequencing. Nature 518: 371–375.

Leaché AD, Banbury BL, Felsenstein J, De Oca ANM, Stamatakis A. 2015. Short tree, long tree, right tree, wrong tree: New acquisition bias corrections for inferring SNP phylogenies. Systematic Biology 64: 1032–1047.

Lee C, Kim SC, Lundy K, Santos-Guerra A. 2005. Chloroplast DNA phylogeny of the woody Sonchus alliance (Asteraceae: Sonchinae) in the Macaronesian Islands. American Journal of Botany 92: 2072–2085.

Li B, Dewey CN. 2011. RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12.

Li W, Godzik A. 2006. Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.

Lindhardt MS, Philipp M, Tye A, Nielsen LR. 2009. Molecular, morphological, and experimental evidence for hybridization between threatened species of the Galapagos endemic genus Scalesia (Asteraceae). International Journal of Plant Sciences 170: 1019–1030.

Liu B, Abbott RJ, Lu Z, Tian B, Liu J. 2014. Diploid hybrid origin of Ostryopsis intermedia (Betulaceae) in the Qinghai-Tibet Plateau triggered by Quaternary climate change. Molecular Ecology 23: 3013–3027.

Losos JB, Jackman TR, Larson A, de Queiroz K, Rodríguez-Schettino L. 1998. Contingency and determinism in replicated adaptive radiations of island lizards. Science 279: 2115–2118.

Losos JB, Ricklefs RE. 2009. Adaptation and diversification on islands. Nature 457: 830–6.

Lowry DB, Modliszewski JL, Wright KM, Wu CA, Willis JH. 2008. The strength and genetic basis of reproductive isolating barriers in flowering plants. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 3009–3021.

Mable BK. 2003. Breaking down taxonomic barriers in polyploidy research. Trends in Plant Science 8: 582–590.

Mallet J. 2005. Hybridization as an invasion of the genome. Trends in Ecology and Evolution 20: 229–237.

Mallet J. 2007. Hybrid speciation. Nature 446: 279–283.

Mavárez J, Linares M. 2008. Homoploid hybrid speciation in animals. Molecular Ecology 17: 4181–4185.

McCarthy DJ, Chen Y, Smyth GK. 2012. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Research 40: 4288–4297.

McVay JD, Hipp AL, Manos PS. 2017. A genetic legacy of introgression confounds phylogeny and biogeography in oaks. Proceedings of the Royal Society of London B 284: 20170300.

Metzker ML. 2010. Sequencing technologies - the next generation. Nature Reviews Genetics 11: 31–46.

Moreno JC. 2008. Lista Roja 2008 de la flora vascular española. Madrid, Spain: Dirección General de Medio Natural y Política Forestal (Ministerio de Medio Ambiente y Medio Rural y Marino), y Sociedad Española de Biología de la Conservación de Plantas.

Mort ME, Crawford DJ, Kelly JK, Santos-Guerra A, Menezes de Sequeira M, Moura M, Caujapé-Castells J. 2015. Multiplexed-shotgun-genotyping data resolve phylogeny within a very recently derived insular lineage. American Journal of Botany 102: 634–641.

Mun JH, Kim DJ, Choi HK, Gish J, Debellé F, Mudge J, Denny R, Endré G, Saurat O, Dudez AM, et al. 2006. Distribution of microsatellites in the genome of Medicago truncatula: A resource of genetic markers that integrate genetic and physical maps. Genetics 172: 2541–2555.

Myers N, Mittermeier RA, Mittermeier CG, da Fonseca GAB, Kent J. 2000. Biodiversity hotspots for conservation priorities. Nature 403: 853–858.

Nevado B, Atchison GW, Hughes CE, Filatov DA. 2016. Widespread adaptive evolution during repeated evolutionary radiations in New World lupins. Nature Communications 7: 1–9.

Nieto Feliner G, Álvarez I, Fuertes-Aguilar J, Heuertz M, Marques I, Moharrek F, Piñeiro R, Riina R, Rosselló JA, Soltis PS, et al. 2017. Is homoploid hybrid speciation that rare? An empiricist’s view. Heredity: 513–516.

Nolte AW, Tautz D. 2010. Understanding the onset of hybrid speciation. Trends in Genetics 26: 54–58.

Noor MAF. 1999. Reinforcement and other consequences of sympatry. Heredity 83: 503–508.

Oberprieler C, Himmelreich S, Kallersjo M, Valles J, Watson LE. 2009. Anthemideae. In: Systematics, evolution, and biogeography of Compositae. Vienna, Austria: International Association for Plant Taxonomy, 631–662.

Oberprieler C, Himmelreich S, Vogt R. 2007. A new subtribal classification of the tribe Anthemideae. Willdenowia 37: 89–114.

Oliveira R, Godinho R, Randi E, Alves PC. 2008. Hybridization versus conservation: are domestic cats threatening the genetic integrity of wildcats (Felis silvestris silvestris) in Iberian Peninsula? Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences 363: 2953–2961.

Paradis E, Claude J, Strimmer K. 2004. APE: Analyses of phylogenetics and evolution in R language. Bioinformatics 20: 289–290.

Paun O, Turner B, Trucchi E, Munzinger J, Chase MW, Samuel R. 2016. Processes driving the adaptive radiation of a tropical tree (Diospyros, Ebenaceae) in New Caledonia, a biodiversity hotspot. Systematic Biology 65: 212–227.

Pavey SA, Collin H, Nosil P, Rogers SM. 2010. The role of gene expression in ecological speciation. Annals of the New York Academy of Sciences 1206: 110–129.

Peakall R, Smouse PE. 2012. GenAlEx 6.5: Genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics 28: 2537–2539.

Phillips SJ, Anderson RP, Schapire RE. 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190: 231–259.

Phillips SJ, Dudík M, Schapire RE. 2018. Maxent software for modeling species niches and distributions (Version 3.4.1).

Poland JA, Rife TW. 2012. Genotyping-by-Sequencing for plant breeding and genetics. The Plant Genome Journal 5: 92–102.

Pritchard JK, Stephens M, Donnelly P. 2000. Inference of population structure using multilocus genotype data. Genetics 155: 945–959.

Puppo P, Curto M, Gusmão-Guedes J, Cochofel J, Pérez de Paz PL, Bräuchler C, Meimberg H. 2015. Molecular phylogenetics of Micromeria (Lamiaceae) in the Canary Islands, diversification and inter-island colonization patterns inferred from nuclear genes. Molecular Phylogenetics and Evolution 89: 160–170.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, et al. 2007. PLINK: A tool set for whole-genome association and population-based linkage analyses. American Journal of Human Genetics 81: 559–575.

R Core Team. 2018. R: A Language and Environment for Statistical Computing.

Rees DJ, Emerson BC, Oromí P, Hewitt GM. 2001. The diversification of the genus Nesotes (Coleoptera: Tenebrionidae) in the Canary Islands: Evidence from mtDNA. Molecular Phylogenetics and Evolution 21: 321–326.

Reyes-Chin-Wo S, Wang Z, Yang X, Kozik A, Arikit S, Song C, Xia L, Froenicke L, Lavelle DO, Truco MJ, et al. 2017. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce. Nature Communications 8.

Rieseberg LH. 1997. Hybrid origins of plant species. Annual Review of Ecology, Evolution and Systematics 28: 359–389.

Rieseberg LH, Raymond O, Rosenthal DM, Lai Z, Livingstone K, Nakazato T, Durphy JL, Schwarzbach AE, Donovan LA, Lexer C. 2003. Major ecological transitions in wild sunflowers facilitated by hybridization. Science 301: 1211–1216.

Rieseberg LH, Willis JH. 2007. Plant speciation. Science 317: 910–914.

Rius M, Bourne S, Hornsby HG, Chapman MA. 2015. Applications of next-generation sequencing to the study of biological invasions. Current Zoology 61: 488–504.

Roberts WR, Roalson EH. 2017. Comparative transcriptome analyses of flower development in four species of Achimenes (Gesneriaceae). BMC Genomics 18: 240.

Robinson MD, McCarthy DJ, Smyth GK. 2009. edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140.

Robotham O, Chapman MA. 2017. Erratum to: Population genetic analysis of hyacinth bean (Lablab purpureus (L.) Sweet, Leguminosae) indicates an East African origin and variation in drought tolerance. Genetic Resources and Crop Evolution 64: 441.

Ronquist F, Teslenko M, Van Der Mark P, Ayres DL, Darling A, Höhna S, Larget B, Liu L, Suchard MA, Huelsenbeck JP. 2012. MrBayes 3.2: Efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61: 539–542.

Ru D, Sun Y, Wang D, Chen Y, Wang T, Hu Q, Abbott RJ, Liu J. 2018. Population genomic analysis reveals that homoploid hybrid speciation can be a lengthy process. Molecular Ecology 27: 4875–4887.

Salazar C, Baxter SW, Pardo-Diaz C, Wu G, Surridge A, Linares M, Bermingham E, Jiggins CD. 2010. Genetic evidence for hybrid trait speciation in Heliconius butterflies. PLoS Genetics 6.

Schaefer H, Moura M, Belo Maciel MG, Silva L, Rumsey FJ, Carine MA. 2011. The Linnean shortfall in oceanic island biogeography: A case study in the Azores. Journal of Biogeography 38: 1345–1355.

Schneider CA, Rasband WS, Eliceiri KW. 2012. NIH Image to ImageJ: 25 years of image analysis. Nature Methods 9: 671–675.

Schoener TW. 1968. The Anolis lizards of Bimini: resource partitioning in a complex fauna. Ecology 49: 704–726.

Schönfelder I, Schönfelder P. 2012. Die Kosmos-Kanarenflora: Über 1000 Arten und 60 tropische Ziergehölze. Kosmos.

Schuelke M. 2000. An economic method for the fluorescent labeling of PCR fragments. Nature Biotechnology 18: 233–234.

Schulz MH, Zerbino DR, Vingron M, Birney E. 2012. Oases: Robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28: 1086–1092.

Schumer M, Rosenthal GG, Andolfatto P. 2014. How common is homoploid hybrid speciation? Evolution 68: 1553–1560.

Schumer M, Rosenthal GG, Andolfatto P. 2018. What do we mean when we talk about hybrid speciation? Heredity 120: 379–382.

Schwarzbach AE, Rieseberg LH. 2002. Likely multiple origins of a diploid hybrid sunflower species. Molecular Ecology 11: 1703–1715.

Scrucca L, Fop M, Murphy TB, Raftery AE. 2016. mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. R Journal 8: 289–317.

Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA, Peichel CL, Saetre GP, Bank C, Brännström Å, et al. 2014. Genomics and the origin of species. Nature Reviews Genetics 15: 176–192.

Servedio MR, Noor MA. 2003. The role of reinforcement in speciation: theory and data. Annual Review of Ecology and Systematics 34: 339–364.

Shah N, Nute MG, Warnow T, Pop M. 2018. Misunderstood parameter of NCBI BLAST impacts the correctness of bioinformatics workflows. Bioinformatics: 1–2.

Shaw J, Lickey EB, Schilling EE, Small RL. 2007. Comparison of whole chloroplast genome sequences to choose noncoding regions for phylogenetic studies in angiosperms: The Tortoise and the hare III. American Journal of Botany 94: 275–288.

Sherman NA, Burke JM. 2009. Population genetic analysis reveals a homoploid hybrid origin of Stephanomeria diegensis (Asteraceae). Molecular Ecology 18: 4049–4060.

Soltis PS, Soltis DE. 2009. The role of hybridization in plant speciation. Annual Review of Plant Biology 60: 561–588.

Soltis DE, Visger CJ, Soltis PS. 2014. The polyploidy revolution then...and now: Stebbins revisited. American Journal of Botany 101: 1057–1078.

Sousa V, Hey J. 2013. Understanding the origin of species with genome-scale data: modelling gene flow. Nature Reviews Genetics 14: 404–414.

Stapley J, Reger J, Feulner PGD, Smadja C, Galindo J, Ekblom R, Bennison C, Ball AD, Beckerman AP, Slate J. 2010. Adaptation genomics: the next generation. Trends in Ecology and Evolution 25: 705–712.

Stebbins GL. 1959. The Role of Hybridization in Evolution. Proceedings of the American Philosophical Society 103: 231–251.

Sun Y, Skinner DZ, Liang GH, Hulbert SH. 1994. Phylogenetic analysis of Sorghum and related taxa using internal transcribed spacers of nuclear ribosomal DNA. Theoretical and Applied Genetics 89: 26–32.

Takayama K, López-Sepúlveda P, Greimler J, Crawford DJ, Peñailillo P, Baeza M, Ruiz E, Kohl G, Tremetsberger K, Gatica A, et al. 2015. Relationships and genetic consequences of contrasting modes of speciation among endemic species of Robinsonia (Asteraceae, Senecioneae) of the Juan Fernández Archipelago, Chile, based on AFLPs and SSRs. New Phytologist 205: 415–428.

Takayama K, López-Sepúlveda P, Kohl G, Novak J, Stuessy TF. 2013. Development of microsatellite markers in Robinsonia (Asteraceae) an endemic genus of the Juan Fernández Archipelago, Chile. Conservation Genetics Resources 5: 63–67.

Taylor EB, Boughman JW, Groenenboom M, Sniatynski M, Schluter D, Gow JL. 2006. Speciation in reverse: Morphological and genetic evidence of the collapse of a three-spined stickleback (Gasterosteus aculeatus) species pair. Molecular Ecology 15: 343–355.

The Tomato Genome Consortium. 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485: 635–641.

Tian T, Liu Y, Yan H, You Q, Yi X, Du Z, Xu W, Su Z. 2017. AgriGO v2.0: A GO analysis toolkit for the agricultural community, 2017 update. Nucleic Acids Research 45: W122–W129.

Tovar-Sánchez E, Oyama K. 2004. Natural hybridization and hybrid zones between Quercus crassifolia and Quercus crassipes (Fagaceae) in Mexico: morphological and molecular evidence. American Journal of Botany 91: 1352–1363.

Trusty JL, Olmstead RG, Santos-Guerra A, Sá-Fontinha S, Francisco-Ortega J. 2005. Molecular phylogenetics of the Macaronesian-endemic genus Bystropogon (Lamiaceae): Palaeo-islands, ecological shifts and interisland colonizations. Molecular Ecology 14: 1177–1189.

Twyford AD, Streisfeld MA, Lowry DB, Friedman J. 2015. Genomic studies on the nature of species: adaptation and speciation in Mimulus. Molecular Ecology 24: 2601–2609.

Vitales D, Garnatje T, Pellicer J, Vallès J, Santos-Guerra A, Sanmartín I. 2014. The explosive radiation of Cheirolophus (Asteraceae, Cardueae) in Macaronesia. BMC Evolutionary Biology 14: 118.

Wang H, Jiang J, Chen S, Qi X, Peng H, Li P, Song A, Guan Z, Fang W, Liao Y, et al. 2013. Next-generation sequencing of the Chrysanthemum nankingense (Asteraceae) transcriptome permits large-scale unigene assembly and SSR marker discovery. PLoS ONE 8: 1–10.

Wang H, Penmetsa RV, Yuan M, Gong L, Zhao Y, Guo B, Farmer AD, Rosen BD, Gao J, Isobe S, et al. 2012. Development and characterization of BAC-end sequence derived SSRs, and their incorporation into a new higher density genetic map for cultivated peanut (Arachis hypogaea L.). BMC Plant Biology 12: 10.

Wang XR, Szmidt AE, Savolainen O. 2001. Genetic composition and diploid hybrid speciation of a high mountain pine, Pinus densata, native to the Tibetan plateau. Genetics 159: 337–346.

Warren DL, Glor RE, Turelli M. 2010. ENMTools: A toolbox for comparative studies of environmental niche models. Ecography 33: 607–611.

Warren DL, Glor RE, Turelli M. 2008. Environmental niche equivalency versus conservatism: quantitative approaches to niche evolution. Evolution 62: 2868–2883.

Watson LE, Evans TM, Boluarte T. 2000. Molecular phylogeny and biogeography of tribe Anthemideae (Asteraceae), based on chloroplast gene ndhF. Molecular Phylogenetics and Evolution 15: 59–69.

White OW, Doo B, Carine MA, Chapman MA. 2016. Transcriptome sequencing and simple sequence repeat marker development for three Macaronesian endemic plant species. Applications in Plant Sciences 4: 1600050.

White OW, Reyes-Betancort JA, Chapman MA, Carine MA. 2018. Independent homoploid hybrid speciation events in the Macaronesian endemic genus Argyranthemum. Molecular Ecology 00: 1–18.

Wickham H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.

Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH. 2009. The frequency of polyploid speciation in vascular plants. Proceedings of the National Academy of Sciences of the United States of America 106: 13875–13879.

Worth JRP, Larcombe MJ, Sakaguchi S, Marthick JR, Bowman DMJS, Ito M, Jordan GJ. 2016. Transient hybridization, not homoploid hybrid speciation, between ancient and deeply divergent conifers. American Journal of Botany 103: 246–259.

Wright AE, Harrison PW, Zimmer F, Montgomery SH, Pointer MA, Mank JE. 2015. Variation in promiscuity and sexual selection drives avian rate of Faster-Z evolution. Molecular Ecology 24: 1218–1235.

Wu F, Mueller LA, Crouzillat D, Pétiard V, Tanksley SD. 2006. Combining bioinformatics and phylogenetics to identify large sets of single-copy orthologous genes (COSII) for comparative, evolutionary and systematic studies: A test case in the euasterid plant clade. Genetics 174: 1407–1420.

Yakimowski SB, Rieseberg LH. 2014. The role of homoploid hybridization in evolution: A century of studies synthesizing genetics and ecology. American Journal of Botany 101: 1247–1258.

Yang Y, Moore MJ, Brockington SF, Soltis DE, Wong GKS, Carpenter EJ, Zhang Y, Chen L, Yan Z, Xie Y, et al. 2015. Dissecting molecular evolution in the highly diverse plant clade Caryophyllales using transcriptome sequencing. Molecular Biology and Evolution 32: 2001–2014.

Yu G, Smith DK, Zhu H, Guan Y, Lam TTY. 2017. ggtree: An R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods in Ecology and Evolution 8: 28–36.
