Using genomic data to develop SSR markers for species of Chresta (Vernonieae, Asteraceae) from the Caatinga

Carolina M. Siniscalchi1,2*, Benoit Loeuille3, José R. Pirani2, Jennifer R. Mandel1

1 Department of Biological Sciences, University of Memphis, Memphis, TN 38152, USA
2 Departamento de Botânica, Instituto de Biociências, Universidade de São Paulo, Rua do Matão, 277, 05508-090, São Paulo, SP, Brazil
3 Universidade Federal de Pernambuco, Departamento de Botânica - CCB, Av. Prof. Moraes Rego, 1235, 50670-901, Recife, PE, Brazil

* Author for correspondence: [email protected]

ORCID ID:
Carolina M. Siniscalchi: 0000-0003-3349-5081
Benoit Loeuille: 0000-0001-6898-7858
José Rubens Pirani: 0000-0001-7984-4457
Jennifer R. Mandel: 0000-0003-3539-2991

Abstract

Chresta is a genus mostly endemic to Brazil that comprises several rupiculous species with naturally fragmented distributions. Aiming to facilitate studies of genetic diversity and structure in these species, we developed a set of 22 nuclear and 6 plastid microsatellite markers that are transferable among different species of the genus. We used previously obtained genomic data from target capture and Illumina sequencing to identify putative repeat regions, designed and synthesized primers, and genotyped individuals from different populations of three species. All loci were successfully amplified in all three species and were overall variable, except for the plastid markers, which were monomorphic in two species. These newly developed microsatellites will be useful in studies focusing on the population genetics of Chresta.
Key Words: Chrestinae, microsatellite, rock outcrops, target capture.

1. Introduction

Microsatellites, or Simple Sequence Repeats (SSRs), are still among the most frequently used markers for population and conservation genetics studies. They are prevalent among different organisms, abundant in the genome, often highly polymorphic, and relatively inexpensive (Hodel et al. 2016). The traditional method of developing microsatellite markers from a genomic library enriched for repeat motifs is labor-intensive and usually results in markers that are hard to transfer among different species (Squirrell et al. 2003). With Next Generation Sequencing methods becoming more widespread and less expensive, and with the increased availability of genomic data in public databases, developing microsatellites by in silico mining of sequences has become more common (Zalapa et al. 2012). Another advantage of this approach is the ability to generate non-anonymous markers that can be more easily transferred across species, especially when they are located in conserved genomic regions, such as within genes and expressed regions of the DNA (Ellis and Burke 2007).

The recent development of target capture probes specific to the Asteraceae, targeting ca. 1000 nuclear orthologous regions (Mandel et al. 2014), has considerably advanced our understanding of the phylogeny of the family (Mandel et al. 2015; 2017; 2019), and provided an abundance of genomic data that have yet to be explored in other ways. Marker development is one of the potential uses, and has already been tested with the development of microsatellites for the genus Antennaria (Thapa et al. 2019).
Asteraceae is the third most diverse family in Brazil, with more than 2000 species occurring in the country, and is among the five most species-rich families in five of the six traditionally recognized phytogeographic domains (BFG 2015). Chresta Vell. ex DC. has 18 species mostly endemic to Brazil (Siniscalchi 2019a), which are distributed mainly in the Cerrado, the savanna-like environment in central Brazil, and in the Caatinga, a diverse phytogeographic domain composed mainly of Seasonally Dry Tropical Forests in a semi-arid region in Northeastern Brazil. The group of species that occurs in the Caatinga is composed exclusively of rupiculous plants, usually represented by populations that are isolated from one another because rock outcrops are surrounded by a dry forest matrix (Siniscalchi et al. 2018). Some species, like Chresta harleyi H.Rob., C. hatschbachii H.Rob. and C. subverticillata Siniscalchi and Loeuille, are restricted to small areas that are relatively close to one another, without showing overlap or signs of hybridization, while others, like C. martii (DC.) H.Rob., have a wider distribution, showing morphological differentiation between populations at the extremes of the distribution (Siniscalchi et al. 2019b). Although the seven Caatinga species share many morphological similarities and environmental preferences, they do not form a monophyletic group, with C. martii actually being the sister taxon to the rest of the genus. The other six Caatinga species form a clade which is further subdivided into two clades (Siniscalchi et al. 2019c).
The natural isolation of the populations of Chresta in the Caatinga and the presence of morphological variability along a geographical gradient raise interesting questions about the evolutionary processes that act on these species, as gene flow between different populations is likely limited. Aiming to facilitate future studies of the genetic variation in Chresta species, we developed microsatellite markers using previously obtained genomic data from a phylogenetic study of Vernonieae (Siniscalchi et al. 2019c), comprising both nuclear and chloroplast sequences.

2. Material and Methods

DNA extraction—Total DNA was extracted from silica-gel dried leaves using the E.Z.N.A.® SQ Plant DNA Kit from Omega Bio-Tek (Norcross, GA, USA), with the addition of PVP and ascorbic acid to the first extraction buffer (10 mL SQ1 buffer, 100 mg PVP, 90 mg ascorbic acid). One extra step was added for Chresta martii extractions, consisting of two washes with 1 mL of STE buffer (0.25 M sucrose, 0.03 M Tris, 0.05 M EDTA), followed by 10 minutes of centrifugation at 2,000 g, in order to remove mucilage (adapted from Shepherd and McLay 2011).

Target capture and genomic data assembly—Total DNA extracted from 17 Chresta species (Table 1) was quantified using fluorometry (Qubit 3.0, ThermoFisher Scientific), then sheared to 300 bp fragments using a sonicator (Covaris S series or QSonica Q500). Illumina libraries were prepared with the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs Inc., Ipswich, MA, USA), according to the manufacturer's instructions, using 15 cycles in the last amplification step. Final library concentrations and sizes were checked using Qubit and libraries were then pooled in groups of four.
Target capture was carried out using the myBaits COS Compositae/Asteraceae 1Kv1 kit (Mandel et al. 2014) (Arbor Biosciences, Ann Arbor, MI, USA), according to the manufacturer's instructions and using a 36-hour incubation step. Sequencing was performed at Macrogen Inc. (Seoul, South Korea), on an Illumina HiSeq2500 device, in paired-end, high-throughput mode.

Reads were trimmed for quality using Trimmomatic (Bolger et al. 2014) and assembled into contigs using SPAdes (Bankevich et al. 2012), with k-mer lengths of 21, 33, 55, 77, 99 and 127. The sequences were matched back to the original probes using the phyluce pipeline (Faircloth 2016), resulting in approximately 700 individual alignments corresponding to the recovered targeted loci, each containing all taxa for which that locus was recovered.

Repeat identification and primer design—The identification of putative microsatellites was carried out in the individual matrices obtained for each locus, using the Phobos plugin in Geneious (Mayer 2006–2010), searching for di- to pentanucleotide motifs, with a minimum length of 15 base pairs and allowing for imperfect motifs. All loci identified as having repeats were then inspected by eye, and those showing the longest repeats or variation among the sequences in the matrix were selected. Loci that were found in more than one taxon present in the matrix were preferred, in order to ensure transferability. The consensus of each alignment was subjected to a BLAST search on GenBank, using BLASTn, to identify in which part of the gene the repeat was located. We gave preference to repeats located in introns or less conserved regions (Table 2). This approach generated 28 markers.

To complement the analysis, the SPAdes assembled contigs of Chresta harleyi and C.
pacourinoides were searched for microsatellite repeats following the same methods. The most promising regions were also subjected to a BLAST search to confirm they did not correspond to any of the regions selected in the alignment approach described above. This approach resulted in 18 additional markers.

Plastid sequences obtained as bycatch from the target capture method were assembled using the lettuce plastid genome as a reference (GenBank accession: DQ383816.1) with Bowtie2 (Langmead and Salzberg 2012), resulting in seven plastid genomes with different degrees of completeness that were aligned using the MAFFT plugin in Geneious (Katoh and Standley 2013). Plastid microsatellites were developed based on this alignment, using the same methods described above, resulting in six markers.

Primers were designed using the Primer3 plugin in Geneious (Koressaar and Remm 2007; Untergasser et al. 2012) with the default parameters of the program. The designed primers were synthesized at IDT (San Jose, CA, USA), and an M13 sequence (CACGACGTTGTAAAACGAC) was added to the 5' end of the forward primers in order to implement the method of Schuelke (2000) for fluorescent labeling of PCR products.

Polymerase chain reaction (PCR), genotyping and polymorphism evaluation—After primers were synthesized, initial PCR tests were carried out for all 52 markers using a basic recipe (1.5 µL 10X buffer, 0.5 µL MgCl2 25 nM, 0.2 µL dNTPs 20 nM (5 nM of each), 0.35 µL forward primer at 5 nM, 0.35 µL reverse primer at 20 nM, 0.35 µL of unlabeled M13 at 10 nM, 1.0 µL of Taq and 1.5 µL of DNA), composing a 15 µL reaction, with two individuals each of Chresta harleyi, C. martii and C. subverticillata.
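The listed component volumes add up to less than the stated 15 µL, so the remainder is presumably made up with water. The short Python sketch below works through that arithmetic and scales the shared components to a master mix. The water volume is inferred from the balance and is not stated in the recipe, and the function names are hypothetical, chosen only for illustration.

```python
# Per-reaction volumes (µL) copied from the recipe in the text.
RECIPE_UL = {
    "10X buffer": 1.5,
    "MgCl2": 0.5,
    "dNTPs": 0.2,
    "forward primer": 0.35,
    "reverse primer": 0.35,
    "unlabeled M13": 0.35,
    "Taq": 1.0,
    "DNA": 1.5,
}
FINAL_VOLUME_UL = 15.0  # final reaction volume stated in the text


def water_volume(recipe=RECIPE_UL, final_volume=FINAL_VOLUME_UL):
    """Water (µL) needed to bring one reaction to the final volume.

    Assumption: the balance of the reaction is water; the text does not say so.
    """
    return round(final_volume - sum(recipe.values()), 2)


def master_mix(n_reactions, recipe=RECIPE_UL, final_volume=FINAL_VOLUME_UL):
    """Scale all shared components (everything except template DNA) to n reactions."""
    mix = {
        name: round(volume * n_reactions, 2)
        for name, volume in recipe.items()
        if name != "DNA"  # template is added to each tube separately
    }
    mix["water"] = round(water_volume(recipe, final_volume) * n_reactions, 2)
    return mix


if __name__ == "__main__":
    print("water per reaction (µL):", water_volume())
    # Six reactions per marker: two individuals from each of three species.
    for component, volume in master_mix(6).items():
        print(f"{component}: {volume} µL")
```

In practice a small excess is usually added to a master mix to allow for pipetting loss; that margin is left out here because the text does not mention one.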
A standard touch-down PCR protocol was used, with the following specifications: an initial denaturation step of 3 min at 95 °C; ten cycles of 30 sec denaturation at 94 °C, 30 sec annealing, with the temperature starting at 65 °C and decreasing one degree per cycle, and 1 min extension at 72 °C; another 30 cycles of 30 sec denaturation at 94 °C, 30 sec annealing at 55 °C and 1 min extension at 72 °C; and a final 10 min extension at 72 °C. Amplification was verified using agarose gel electrophoresis, and the primers that resulted in clear, single bands with fragment sizes larger than 150 bp were selected for genotyping tests, totaling 22 nuclear markers and 6 plastid markers.

Genotyping was carried out in 22 individuals of Chresta harleyi, 28 individuals of C. martii and 12 individuals of C. subverticillata, from different populations (Table 3). DNA extraction and amplification were carried out as described above, but with the addition of M13 labelled with 6-FAM, VIC or NED to the PCR reaction. Final PCR products were combined into run plates with GeneScan 500 LIZ Size Standard (ThermoFisher Scientific, Waltham, MA, USA) and genotyping was carried out on an ABI 3130XL sequencer at the Molecular Resource Center at the University of Tennessee in Memphis. The fragments were analyzed in GeneMarker version 2.6.3 and genotypes were scored. Frequency-based indexes, such as allele number, observed and expected heterozygosity and fixation index, were calculated using GenAlEx (Peakall and Smouse 2006; 2012).

3. Results and Discussion

From the total of 52 markers developed, 28 were successfully amplified and genotyped across three species of Chresta.
Of these 28, eight presented dinucleotide repeats, nine were trinucleotide repeats, seven were tetranucleotide repeats and two were pentanucleotide repeats. Two markers were a composite of two repeat areas in the same region, one with di-/dinucleotide repeats and the other with di-/trinucleotide repeats (Table 4).

All 22 nuclear markers selected for genotyping were amplified in all three species, and of these, 20 were polymorphic. Besides the two markers that were monomorphic for all three species (NC3 and NC9), two were monomorphic in both Chresta harleyi and C. subverticillata (NC6 and NC8), one was monomorphic in C. martii (NC7) and two others were monomorphic in C. subverticillata (NC10 and NC12). The highest number of alleles in a single marker was 24, in marker NC24. Twelve markers presented fewer than 10 alleles, eleven markers had between 10 and 20 alleles and one had more than 20 alleles (Table 5). The species with the highest number of private alleles was C. martii (80), followed by C. harleyi (62) and C. subverticillata (23). All individuals presented up to two alleles, suggesting these species are diploid, although there are no ploidy reports in the literature that could confirm this finding. It is known that most extant Asteraceae present one ancient genome duplication (Barker et al. 2016), which could suggest that at least the regions where the repeats are found do not present multiple copies in the genome.

Regarding polymorphic markers, observed heterozygosity varied from 0.045 to 0.643 (mean value 0.252) in Chresta harleyi, from 0.091 to 0.381 (mean value 0.142) in C. martii and from 0.083 to 0.917 (mean value 0.323) in C. subverticillata. Most markers presented a deficit of heterozygotes (F>0), especially in C. harleyi and C.
martii, which had one and two loci with heterozygote excess, respectively. Six loci in C. subverticillata had F<0. No locus consistently presented an excess of heterozygotes in all species, but locus NC1 presented F<0 in C. martii and C. subverticillata (Table 5). As few individuals from each population were sampled, not representing the whole distribution of the species, our results are preliminary at best, but representative of the potential of the newly developed markers.

Of the six plastid markers developed, three were trinucleotide repeats and three were tetranucleotide repeats (Table 6). They showed less variation, with all six markers being monomorphic in Chresta harleyi and C. subverticillata. In C. martii, three were polymorphic (CL2, CL4, CL5) and three were monomorphic. The marker CL2 had four alleles and CL4 and CL5 had two alleles each (Table 7). Previous studies have shown that plastid sequences in Asteraceae are usually highly conserved, presenting little divergence, especially among closely related taxa (Shaw et al. 2005; Timme et al. 2007). Our choice to prioritize di- to pentanucleotide repeats might also have contributed to the low variability found, as we excluded more variable repeats, such as poly-A tails.

The conserved nature of our initial genomic data, obtained from target capture of conserved orthologous loci, could lead to lower numbers of alleles and lower gene diversity compared to other methods of microsatellite development. The study by Thapa et al. (2019), which used the same set of loci as the basis for microsatellite development, showed results similar to ours, with markers presenting low numbers of alleles (1 to 7 alleles per marker), but easily transferable across related species.
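The search criteria discussed above (di- to pentanucleotide motifs, tracts of at least 15 bp, mononucleotide runs such as poly-A excluded) can be illustrated with a minimal Python sketch. This is not the Phobos algorithm used in the study: it is a simplified stand-in that finds only perfect repeats and counts whole motif copies, whereas the study also allowed imperfect motifs. The function name and the example sequence are hypothetical.

```python
def find_perfect_ssrs(sequence, min_length=15, motif_sizes=(2, 3, 4, 5)):
    """Return (start, end, motif, copies) for perfect repeat tracts.

    Coordinates are 0-based with an exclusive end. Homopolymer motifs
    (e.g. "AA") and motifs with non-ACGT symbols (gaps, N) are skipped.
    """
    seq = sequence.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in motif_sizes:
            motif = seq[i:i + k]
            if len(motif) < k or len(set(motif)) == 1 or set(motif) - set("ACGT"):
                continue
            # Extend the tract one whole motif copy at a time.
            end = i + k
            while seq[end:end + k] == motif:
                end += k
            # Keep the longest tract; on ties the shortest motif wins.
            if end - i >= min_length and (best is None or end - i > best[1] - best[0]):
                best = (i, end, motif, (end - i) // k)
        if best:
            hits.append(best)
            i = best[1]  # resume the scan after the reported tract
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: an (AT)8 tract, a poly-A run and a (CAG)5 tract.
    example = "GGC" + "AT" * 8 + "CCGT" + "A" * 20 + "CAG" * 5 + "TT"
    for start, end, motif, copies in find_perfect_ssrs(example):
        print(f"({motif}){copies} at {start}-{end}")
```

In the example, the 20 bp poly-A run is ignored while the two tracts that reach 15 bp are reported, which mirrors the exclusion of mononucleotide repeats mentioned above.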
It is also noteworthy that the chloroplast markers developed by Thapa et al. (2019) present very low numbers of alleles, with most of them being monomorphic across species, as also shown here.

Few sets of microsatellite markers are available for taxa in tribe Vernonieae, where Chresta is placed. A study that developed markers from microsatellite-enriched libraries for Chrysolaena (Camacho et al. 2017) successfully amplified nine markers across three populations of Chrysolaena obovata, reporting three to eight alleles per locus (average of 5.11). However, this taxon is widespread in the Brazilian Cerrado and does not present particularly isolated populations or extreme environmental requirements, unlike the Chresta species from the Caatinga, which present highly fragmented populations. This might lead to different evolutionary pressures on the populations of these taxa, resulting in different levels of genetic diversity and impairing a direct comparison. Despite the markers in Camacho et al. (2017) being developed from anonymous regions, most of them were successfully transferred to other Asteraceae species, which is unusual, as the traditional method of developing markers from microsatellite-enriched libraries frequently results in loci that are not easily transferred to distantly related taxa due to lack of homology among genomic regions or large variations in primer binding sites (Whitton et al. 1997; Merritt et al. 2015).

Microsatellite markers are also available for Lychnophora ericoides (Rabelo et al. 2011) and L. pinaster (Haber et al. 2008), which both occur in more fragmentary environments, such as rock outcrops and highland rocky fields (Loeuille et al. 2019).
The development of markers in these studies followed traditional methods, and the markers proved to be considerably variable, with the number of alleles per locus varying from 1 to 13 (average of 8.6) in L. ericoides and 2 to 21 (average 6.6) in L. pinaster. Nevertheless, these studies did not test the transferability of the markers to other closely related species. None of the markers developed for Chrysolaena and Lychnophora were subsequently used in population genetics studies, precluding any inferences about the evolutionary forces that may act on these species and others that share similar habitats and environmental requirements.

In our study we found different mean fixation index values (Table 5) for the three species, likely indicating that these species are under different reproductive regimes, with different levels of inbreeding. The value found for Chresta martii is particularly high (0.683) and, combined with the very low observed heterozygosity, indicates that this species presents strongly isolated populations. Given the lack of knowledge about reproductive systems in Chresta, this might also indicate that this taxon presents some sort of self-compatibility and that self-pollination might be occurring. The divergence between observed and expected heterozygosity seen in C. harleyi and C. martii also indicates that many of the loci studied here are not in Hardy-Weinberg equilibrium, which is consistent with the hypothesis of low gene flow due to strong population structuring.

Despite the limitations of our study, with a low number of individuals per species, we found that most of our newly described loci are polymorphic, although with relatively low numbers of alleles (Table 5).
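For readers unfamiliar with the frequency-based indexes reported in Table 5, the Python sketch below shows how they are commonly defined for a single locus: observed heterozygosity (Ho) as the proportion of heterozygous individuals, expected heterozygosity (He) as one minus the sum of squared allele frequencies, and the fixation index as F = (He - Ho) / He. We assume these standard definitions correspond to what GenAlEx reports; the function name and the genotypes are hypothetical, and missing data are not handled.

```python
from collections import Counter


def locus_summary(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual,
    with alleles coded as fragment sizes in bp.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    ho = sum(1 for a, b in genotypes if a != b) / n
    he = 1.0 - sum((count / total_alleles) ** 2 for count in allele_counts.values())
    # F > 0 indicates a heterozygote deficit, F < 0 a heterozygote excess.
    f = (he - ho) / he if he > 0 else float("nan")
    return {"n_alleles": len(allele_counts), "Ho": ho, "He": he, "F": f}


if __name__ == "__main__":
    # Hypothetical genotypes for four individuals at one locus.
    genotypes = [(180, 180), (180, 180), (180, 184), (184, 184)]
    summary = locus_summary(genotypes)
    print(summary["n_alleles"], summary["Ho"], summary["He"], round(summary["F"], 3))
```

A monomorphic locus has He = 0, so F is undefined there and the sketch returns NaN rather than dividing by zero.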
Given how readily the nuclear and plastid markers were amplified and genotyped across three species, even with the considerable phylogenetic distance among them (Siniscalchi et al. 2019c), our results show that the use of comparative genomic data for the development of microsatellite markers is promising. Complementary studies focusing on the genetic diversity and structure of the three species presented here and other related taxa are being developed in order to elucidate the micro-evolutionary processes that drive diversification in these taxa and possibly in other endemic taxa from the Caatinga.

Acknowledgments

This research was supported by two FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) scholarships (2013/18189-2, 2016/12446-1) and the National Science Foundation Division of Environmental Biology DEB-1745197. We would like to thank Tom Cunningham from the MRC at UT-Memphis for carrying out the genotyping, Gabbie Johson for helping with DNA extractions and Ramhari Thapa for the useful discussions and advice.

Authors' Contributions

CMS designed the study, sampled specimens in the field, did laboratory work, analyzed the data and wrote the manuscript. BL helped with field collections, provided data about the occurrence of each species and reviewed the manuscript. JRP reviewed and commented on several versions of the manuscript. JRM contributed to the study design and laboratory routines, provided reagents and equipment for genotyping, helped in data analysis and reviewed the manuscript.

Conflict of interest

The authors declare that there is no conflict of interest.

References

Bankevich A, Nurk S, Antipov D et al (2012) SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing.
J Comput Biol 19:455–477. doi: 10.1089/cmb.2012.0021
Barker MS, Li Z, Kidder TI, Reardon CR, Lai Z, Oliveira LO, Scascitelli M, Rieseberg LH (2016) Most Compositae (Asteraceae) are descendants of a paleohexaploid and all share a paleotetraploid ancestor with the Calyceraceae. Am J Bot 103:1203–1211. doi: 10.3732/ajb.1600113
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics 30:2114–2120. doi: 10.1093/bioinformatics/btu170
The Brazil Flora Group (2015) Growing knowledge: an overview of Seed Plant diversity in Brazil. Rodriguésia 66:1085–1113. doi: 10.1590/2175-7860201566411
Camacho LMD, Schatzer CAF, Alves-Pereira A, Zucchi MI, Carvalho MAM, Gaspar M (2017) Development, characterization and cross-amplification of microsatellite markers for Chrysolaena obovata, an important Asteraceae from Brazilian Cerrado. J Genet 96:47–53. doi: 10.1007/s12041-017-0812-9
Ellis JR, Burke JM (2007) EST-SSRs as a resource for population genetic analysis. Heredity 99:125–132. doi: 10.1038/sj.hdy.6801001
Faircloth BC (2016) PHYLUCE is a software package for the analysis of conserved genomic loci. Bioinformatics 32:786–788. doi: 10.1093/bioinformatics/btv646
Haber LH, Cavallari MM, Santos FRC, Marques MOM, Gimenes MA, Zucchi MI (2008) Development and characterization of microsatellite markers for Lychnophora pinaster: a study for the conservation of a native medicinal plant. Mol Ecol Resour 9:811–814. doi: 10.1111/j.1755-0998.2008.02309.x
Hodel RGJ, Segovia-Salcedo MC, Landis JB, Crowl AA, Sun M, Liu X, Gitzendanner MA, Douglas NA, Germain-Aubrey CC, Chen S, Soltis DE, Soltis PS (2016) The report of my death was an exaggeration: A review for researchers using microsatellites in the 21st century. Appl Plant Sci 4:1600025.
doi: 10.3732/apps.1600025
Katoh K, Standley DM (2013) MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol 30:772–780. doi: 10.1093/molbev/mst010
Koressaar T, Remm M (2007) Enhancements and modifications of primer design program Primer3. Bioinformatics 23:1289–1291. doi: 10.1093/bioinformatics/btm091
Langmead B, Salzberg S (2012) Fast gapped-read alignment with Bowtie 2. Nature Methods 9:357–359. doi: 10.1038/nmeth.1923
Loeuille B, Semir J, Pirani JR (2019) A synopsis of Lychnophorinae (Asteraceae: Vernonieae). Phytotaxa 398:1–139. doi: 10.11646/phytotaxa.398.1.1
Mandel JR, Dikow RB, Funk VA et al. (2014) A target enrichment method for gathering phylogenetic information from hundreds of loci: an example from the Compositae. Appl Plant Sci 2:1300085. doi: 10.3732/apps.1300085
Mandel JR, Dikow RB, Funk VA (2015) Using phylogenomics to resolve mega-families: An example from Compositae. J Syst Evol 53:391–402. doi: 10.1111/jse.12167
Mandel JR, Barker MS, Bayer RJ, Dikow RB, Gao TG, Jones KE, Keeley S, Kilian N, Ma H, Siniscalchi CM, Susanna A, Thapa R, Watson L, Funk VA (2017) The Compositae tree of life in the age of phylogenomics. J Syst Evol 55:405–410. doi: 10.1111/jse.12265
Mandel JR, Dikow RB, Siniscalchi CM, Thapa R, Watson LE, Funk VA (2019) A fully resolved backbone phylogeny reveals numerous dispersals and explosive diversifications throughout the history of Asteraceae. Proc Natl Acad Sci 116:14083–14088. doi: 10.1073/pnas.1903871116
Mayer C (2006–2010). Phobos 3.3.11, .
Merritt BJ, Culley TM, Avanesyan A, Stokes R, Brzyski J (2015) An empirical review: characteristics of plant microsatellite markers that confer higher levels of genetic variation. Appl Plant Sci 3:1500025. doi: 10.3732/apps.1500025
Peakall R, Smouse PE (2006) GENALEX 6: genetic analysis in Excel. Population genetic software for teaching and research. Mol Ecol Notes 6:288–295. doi: 10.1111/j.1471-8286.2005.01155.x
Peakall R, Smouse PE (2012) GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research-an update. Bioinformatics 28:2537–2539. doi: 10.1093/bioinformatics/bts460
Rabelo SG, Teixeira CF, Telles MPC, Collevatti RG (2011) Development and characterization of microsatellite markers for Lychnophora ericoides, an endangered Cerrado shrub species. Conservation Genet Resour 3:741–743. doi: 10.1007/s12686-011-9447-y
Schuelke M (2000) An economic method for the fluorescent labeling of PCR fragments. Nature Biotechnology 18:233–234. doi: 10.1038/72708
Shaw J, Lickey EB, Beck JT, Farmer SB, Liu W, Miller J, Siripun KC, Winder CT, Schilling EE, Small RL (2005) The tortoise and the hare II: relative utility of 21 noncoding chloroplast DNA sequences for phylogenetic analysis. Am J Bot 92:142–166. doi: 10.3732/ajb.92.1.142
Shepherd LD, McLay TGB (2011) Two micro-scale protocols for the isolation of DNA from polysaccharide-rich plant tissue. J Plant Res 124:311–314. doi: 10.1007/s10265-010-0379-5
Siniscalchi CM, Loeuille BFP, Pirani JR (2018) Two New Rupicolous Species of Chresta (Asteraceae, Vernonieae) from the Brazilian Caatinga. Syst Bot 43:1059–1071. doi: 10.1600/036364418X697698
Siniscalchi CM (2019a) Chresta in Flora do Brasil 2020 under construction.
Jardim Botânico do Rio de Janeiro. Available at: . (accessed 08 Oct 2019)
Siniscalchi CM, Loeuille BFP, Siqueira Filho JS, Pirani JR (2019b) Chresta artemisiifolia (Vernonieae, Asteraceae), a new endangered species from a recently created protected area in the Brazilian Caatinga. Phytotaxa 399:119–126. doi: 10.11646/phytotaxa.399.2.2
Siniscalchi CM, Loeuille BFP, Funk VA, Mandel JR, Pirani JR (2019c) Phylogenomics yields new insight into relationships within Vernonieae (Asteraceae). Front Plant Sci. doi: 10.3389/fpls.2019.01224
Squirrell J, Hollingsworth PM, Woodhead M, Russell J, Lowe AJ, Gibby M, Powell W (2003) How much effort is required to isolate nuclear microsatellites from plants? Mol Ecol 12:1339–1348. doi: 10.1046/j.1365-294X.2003.01825.x
Thapa R, Bayer RJ, Mandel JR (2019) Development and characterization of microsatellite markers for Antennaria corymbosa (Asteraceae) and close relatives. Appl Plant Sci 7:e11268. doi: 10.1002/aps3.11268
Timme RE, Kuehl JV, Boore JL, Jansen RK (2007) A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: identification of divergent regions and categorization of shared repeats. Am J Bot 94:302–312. doi: 10.3732/ajb.94.3.302
Untergasser A, Cutcutache I, Koressaar T et al. (2012) Primer3 - new capabilities and interfaces. Nucleic Acids Research 40:e115. doi: 10.1093/nar/gks596
Whitton J, Rieseberg LH, Ungerer MC (1997) Microsatellite loci are not conserved across the Asteraceae. Mol Biol Evol 14:204–209.
doi: 10.1093/oxfordjournals.molbev.a025755
Zalapa JE, Cuevas H, Zhu H, Steffan S, Senalik D, Zeldin E, McCown B, Harbut R, Simon P (2012) Using Next-Generation Sequencing approaches to isolate simple sequence repeat (SSR) loci in the plant sciences. Am J Bot 99:193–208. doi: 10.3732/ajb.1100394

Tables

Table 1. Voucher list of samples used in the initial development of microsatellite markers.

| Species name | Location | Voucher (Herbarium) |
| --- | --- | --- |
| Chresta angustifolia Gardner | Brazil, Goiás, Alto Cavalcante | C.M. Siniscalchi 490 (SPF) |
| Chresta artemisiifolia Siniscalchi & Loeuille | Brazil, Bahia, Sento Sé | J. Siqueira-Filho 3671 (HVASF) |
| Chresta curumbensis (Philipson) H.Rob. | Brazil, Distrito Federal, Gama | C.M. Siniscalchi 573 (SPF) |
| Chresta exsucca DC. | Brazil, Goiás, Alto Paraíso | C.M. Siniscalchi 378 (SPF) |
| Chresta filicifolia Siniscalchi & Loeuille | Brazil, Minas Gerais, Conselheiro Pena | C.M. Siniscalchi 512 (SPF) |
| Chresta harleyi H.Rob. | Brazil, Bahia, Licínio de Almeida | C.M. Siniscalchi 459 (SPF) |
| Chresta hatschbachii H.Rob. | Brazil, Bahia, Oliveira dos Brejinhos | C.M. Siniscalchi 468 (SPF) |
| Chresta heteropappa Siniscalchi & Loeuille | Brazil, Ceará, Uruburetama | C.M. Siniscalchi 614 (SPF) |
| Chresta martii (DC.) H.Rob. | Brazil, Bahia, Casa Nova | C.M. Siniscalchi 482 (SPF) |
| Chresta pacourinoides (Mart. ex DC.) Siniscalchi & Loeuille | Brazil, Bahia, Feira de Santana | B. Loeuille 351 (SPF) |
| Chresta plantaginifolia (Less.) Gardner | Brazil, Distrito Federal, Gama | C.M. Siniscalchi 573 (SPF) |
| Chresta pycnocephala DC. | Brazil, Minas Gerais, Itacambira | C.M. Siniscalchi 535 (SPF) |
| Chresta souzae H.Rob. | Brazil, Goiás, Alto Paraíso | C.M. Siniscalchi 571 (SPF) |
| Chresta scapigera (Less.) Gardner | Brazil, Minas Gerais, Rio Paranaíba | C.M. Siniscalchi 359 (SPF) |
| Chresta speciosa Gardner | Brazil, Goiás, Alto Paraíso | C.M. Siniscalchi 421 (SPF) |
| Chresta sphaerocephala DC. | Brazil, Distrito Federal, Planaltina | C.M. Siniscalchi 576 (SPF) |
| Chresta subverticillata Siniscalchi & Loeuille | Brazil, Bahia, Gentio do Ouro | C.M. Siniscalchi 634 (SPF) |

Table 2. Putative regions where microsatellite markers were found.

| Locus name | Genomic region | Genomic region description | Possible location on genomic region |
|---|---|---|---|
| CL1 | psbA | Photosystem II protein D1 | Exon |
| CL2 | atpA | ATP synthase CF1 alpha chain | Exon |
| CL3 | psbD | Photosystem II protein D2 | Exon |
| CL4 | ndhK - ndhC | Intergenic region | Intergenic region |
| CL5 | ycf2 - trnL | Intergenic region | Intergenic region |
| CL6 | trnV - rrn16 | Intergenic region | Intergenic region |
| NC1 | At2g04790 | PTB domain engulfment adapter | Unclear |
| NC2 | At4g22300 | Carboxylesterase | Possible intergenic space |
| NC3 | At4g26530 | Aldolase superfamily protein | Probably in exon |
| NC4 | Unmapped | Possible intergenic space | Possible intergenic space |
| NC6 | At5g65660 | Hydroxyproline-rich glycoprotein family protein | Unclear |
| NC7 | | Malectin/receptor-like protein kinase family | Exon |
| NC8 | At3g50690 | Leucine-rich repeat (LRR) family protein | Unclear |
| NC9 | At2g38360 | Prenylated RAB acceptor 1.B4 | Exon |
| NC10 | At3g02360 | 6-phosphogluconate dehydrogenase family protein | Exon |
| NC12 | At5g24930 | Zinc finger CONSTANS-like protein | Exon |
| NC13 | At4g02060 | Minichromosome maintenance protein family | Possible intron |
| NC15 | At4g27390 | Transmembrane protein | Probably in exon |
| NC16 | At5g02120 | One helix protein, homologous to cyanobacterial high-light inducible proteins | Probably in intron |
| NC17 | At2g01670 | Nudix hydrolase homolog 17 | Possible intergenic region |
| NC21 | At3g59530 | Calcium-dependent phosphotriesterase superfamily protein | Exon |
| NC22 | At1g55000 | Peptidoglycan-binding LysM domain-containing protein | Intergenic region |
| NC24 | At2g26210 | Ankyrin repeat family protein | Intron |
| NC26 | At1g79040 | Photosystem II subunit R | Intron |
| NC27 | At4g09980 | Methyltransferase MT-A70 family protein | Intron |
| NC29 | At3g63510 | FMN-linked oxidoreductases superfamily protein | Intron |
| NC35 | Unknown | Unknown | Unknown |
| NC38 | Unknown | Unknown | Unknown |


Table 3. Voucher list of populations sampled in the study.

| Species | Locality (city, state, country) | No. sampled individuals | Geographic coordinates | Voucher no. |
|---|---|---|---|---|
| Chresta harleyi H.Rob. | Mato Verde, Minas Gerais, Brazil | 4 | 15°23'19"S, 42°46'36"W | C.M. Siniscalchi 449 |
| | Urandi, Bahia, Brazil | 4 | 14°44'46"S, 42°34'25"W | C.M. Siniscalchi 457 |
| | Licínio de Almeida, Bahia, Brazil | 4 | 14°34'38"S, 42°31'30"W | C.M. Siniscalchi 459 |
| | Licínio de Almeida, Bahia, Brazil | 4 | 14°34'22"S, 42°31'27"W | C.M. Siniscalchi 460 |
| | Jacaraci, Bahia, Brazil | 4 | 14°49'46"S, 42°26'06"W | C.M. Siniscalchi 462 |
| | Caetité, Bahia, Brazil | 4 | 14°15'24"S, 42°31'21"W | C.M. Siniscalchi 463 |
| | Caetité, Bahia, Brazil | 4 | 14°15'57"S, 42°31'45"W | C.M. Siniscalchi 464 |
| Chresta martii (DC.) H.Rob. | Jaguarari, Bahia, Brazil | 2 | 10°06'11"S, 40°13'47"W | C.M. Siniscalchi 473 |
| | Jaguarari, Bahia, Brazil | 2 | 9°56'11"S, 40°15'49"W | C.M. Siniscalchi 474 |
| | Sobradinho, Bahia, Brazil | 2 | 9°28'54"S, 40°51'59"W | C.M. Siniscalchi 479 |
| | Casa Nova, Bahia, Brazil | 2 | 9°22'52"S, 40°48'02"W | C.M. Siniscalchi 482 |
| | Petrolina, Pernambuco, Brazil | 2 | 9°21'41"S, 40°23'05"W | C.M. Siniscalchi 565 |
| | Teixeira, Paraíba, Brazil | 2 | 7°12'10"S, 37°15'30"W | C.M. Siniscalchi 635 |
| | Santa Luzia, Paraíba, Brazil | 2 | 6°53'05"S, 36°53'02"W | C.M. Siniscalchi 638 |
| | Parelhas, Rio Grande do Norte, Brazil | 2 | 6°42'09"S, 36°41'28"W | C.M. Siniscalchi 639 |
| | São João do Sabugi, Rio Grande do Norte, Brazil | 2 | 6°41'46"S, 37°09'47"W | C.M. Siniscalchi 641 |
| | Campo Formoso, Bahia, Brazil | 2 | 10°11'32"S, 41°04'45"W | J. Siqueira Filho 3537 |
| | Sento Sé, Bahia, Brazil | 2 | 10°22'12"S, 41°44'58"W | J. Siqueira Filho 3657 |
| Chresta subverticillata Siniscalchi & Loeuille | Gentio do Ouro, Bahia, Brazil | 4 | 11°11'31"S, 42°43'03"W | C.M. Siniscalchi 630 |
| | Gentio do Ouro, Bahia, Brazil | 4 | 11°05'48"S, 42°43'18"W | C.M. Siniscalchi 631 |
| | Gentio do Ouro, Bahia, Brazil | 4 | 11°06'26"S, 42°43'10"W | C.M. Siniscalchi 634 |

Table 4. Characteristics of 22 nuclear microsatellites developed in Chresta.

| Locus | Primer sequence | Ta (°C) | Repeat motif | Allele size range (bp) | GenBank Accession |
|---|---|---|---|---|---|
| NC1 | F: GCCCAAGAGTTATCGCTAAAGC | 57.0 | AAAT | 105–165 | MK231144 |
| | R: GCCGCCACGTAGACTTCATA | | | | |
| NC2 | F: AGGTTAAGGCACCTGCAACA | 56.5 | AG | 181–206 | MK231145 |
| | R: GAGGTGGCTGCTGGAATTG | | | | |
| NC3 | F: AGTGGGTGTGGGTGTAGGAT | 56.8 | AGG | 123 | MK231146 |
| | R: TGACGTGGAGCAATTGACGA | | | | |
| NC4 | F: AGCACCAGTAGCGACGTAAC | 53.1 | AG | 155–169 | MK231147 |
| | R: TGAACATCGCTTTTGTTTCTCA | | | | |
| NC6 | F: ATAGGCTTTCCACTCGGCAC | 57.1 | AATTC | 286–291 | MK231148 |
| | R: CGGCCAAACAGGAGCAAATC | | | | |
| NC7 | F: CCCAATCACATGGTCACGGA | 52.9 | AAAAT | 135–160 | MK231149 |
| | R: TCAGGAACTCAAAGAAAATAACATCA | | | | |
| NC8 | F: GCCGTTGAAGTTGAGGAGGA | 57.5 | ATC | 218–240 | MK231150 |
| | R: GCCCACCAAGATCACCATCA | | | | |
| NC9 | F: CGACCGATCAGCATTCTCCA | 57.8 | ACG | 241 | MK231151 |
| | R: ATCACCAAGGGAGGATCGGA | | | | |
| NC10 | F: GGTGGCAACGAGTGGTATGA | 57.4 | AG | 161–171 | MK231152 |
| | R: GTAAGCCTCAAAGGACCCTCC | | | | |
| NC12 | F: GTAGAGCAGACTCCGCCTTC | 57.3 | AC | 208–210 | MK231153 |
| | R: GGGGTTGGCGGAGTGAATAT | | | | |
| NC13 | F: ATCTCCTGCCCTTGGGTTTG | 56.8 | ACAT | 190–284 | MK231154 |
| | R: GGCAGCTGAAATGTATGCCC | | | | |
| NC15 | F: CCCACGAGGAGAYACGTTTG | 54.9 | CGG | 246–267 | MK231155 |
| | R: YAGCCAAAGCAAGATTYCCC | | | | |
| NC16 | F: AGCTCCCACCTGGTGTATGA | 57.5 | CA | 186–203 | MK231156 |
| | R: AGGGGCTTGRAATTTTGGCTG | | | | |
| NC17 | F: GCGTTGTTGGAATCGTCGAG | 57.0 | AT | 304–330 | MK231157 |
| | R: CTTACCCAGACTCTCTGCCG | | | | |
| NC21 | F: GGATGGCTTGGCTTTCCCTA | 56.3 | TG/AT | 361–441 | MK231158 |
| | R: CCAAAATTGGCCCTGCTCAT | | | | |
| NC22 | F: TACAAATGGGATGCTGCGGT | 57.0 | ACCA | 212–278 | MK231159 |
| | R: AACCTGCATACACAAGCGGA | | | | |
| NC24 | F: ACAAGGATTGTCCCTCTTGCA | 55.5 | TC | 151–193 | MK231160 |
| | R: TTGATTGAAAACAGCCGCCT | | | | |
| NC26 | F: ACAGGGAAAGGGTGTGTACC | 57.8 | TA/TGA | 382–487 | MK231161 |
| | R: CAGAGCCCCTCCTCCTAGAA | | | | |
| NC27 | F: CTGTYCGGCGRAGCACTGAT | 52.2 | CT | 301–322 | MK231162 |
| | R: SACTGTCTTTGACAAATGCAG | | | | |
| NC29 | F: KCTCTGATGGAGGAGACGAT | 53.6 | AAG | 262–271 | MK231163 |
| | R: GGATCTATTTCCCAAAAGAATTGTCA | | | | |
| NC35 | F: CCATCCACATGTCTGCCAGT | 57.1 | AGAT | 366–421 | MK231164 |
| | R: ACGCACCCGATCGGAATATC | | | | |
| NC38 | F: AGATCGAGGCGAAACCCAAG | 57.6 | ACC | 266–315 | MK231165 |
| | R: CACCAGTCTCAGTCGCAGTC | | | | |

Note: Ta = annealing temperature.
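Several primers in Table 4 (NC15, NC16, NC27, NC29) contain letters other than A, C, G, and T. Assuming these are standard IUPAC ambiguity codes (for example Y = C/T, R = A/G, S = C/G, K = G/T), each such primer represents a small pool of concrete sequences. The sketch below is illustrative only and is not part of the workflow described in the manuscript; the function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (assumed to be what Table 4 uses).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    # One string of allowed bases per position, e.g. "Y" -> "CT".
    choices = [IUPAC[base] for base in primer.upper()]
    # The Cartesian product over positions enumerates all combinations.
    return ["".join(combo) for combo in product(*choices)]

# NC15 reverse primer from Table 4: two Y positions, so 2 x 2 = 4 variants.
variants = expand_degenerate("YAGCCAAAGCAAGATTYCCC")
print(len(variants))
for seq in variants:
    print(seq)
```

A primer without ambiguity codes expands to itself, so the same function can be applied to every row of the table.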

Table 5. Genetic diversity indexes of 22 nuclear microsatellite markers developed in Chresta.

| Locus | C. harleyi (n=28): A | Ho | He | F | C. martii (n=22): A | Ho | He | F | C. subverticillata (n=12): A | Ho | He | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NC1 | 2 | 0 | 0.080 | 1.000 | 2 | 0.091 | 0.087 | -0.048 | 2 | 0.400 | 0.320 | -0.250 |
| NC2 | 10 | 0.462 | 0.803 | 0.425 | 5 | 0.333 | 0.686 | 0.514 | 4 | 0.167 | 0.625 | 0.733 |
| NC3 | 1 | 0 | 0 | - | 1 | 0 | 0 | - | 1 | 0 | 0 | - |
| NC4 | 4 | 0.120 | 0.524 | 0.771 | 10 | 0.238 | 0.849 | 0.720 | 3 | 0 | 0.620 | 1.00 |
| NC6 | 1 | 0 | 0 | - | 3 | 0 | 0.500 | 1.000 | 1 | 0 | 0 | - |
| NC7 | 2 | 0.045 | 0.044 | -0.023 | 1 | 0 | 0 | - | 2 | 0.333 | 0.444 | 0.250 |
| NC8 | 1 | 0 | 0 | - | 6 | 0.190 | 0.626 | 0.696 | 1 | 0 | 0 | - |
| NC9 | 1 | 0 | 0 | - | 1 | 0 | 0 | - | 1 | 0 | 0 | - |
| NC10 | 2 | 0.100 | 0.255 | 0.608 | 4 | 0.250 | 0.666 | 0.625 | 1 | 0 | 0 | - |
| NC12 | 2 | 0 | 0.211 | 1.000 | 3 | 0 | 0.322 | 1.000 | 1 | 0 | 0 | - |
| NC13 | 3 | 0 | 0.135 | 1.000 | 11 | 0 | 0.835 | 1.000 | 4 | 0.833 | 0.563 | -0.481 |
| NC15 | 7 | 0.615 | 0.789 | 0.220 | 3 | 0 | 0.254 | 1.000 | 2 | 0 | 0.153 | 1.000 |
| NC16 | 9 | 0.643 | 0.845 | 0.239 | 7 | 0.381 | 0.724 | 0.474 | 4 | 0.917 | 0.670 | -0.368 |
| NC17 | 3 | 0.308 | 0.624 | 0.507 | 9 | 0.238 | 0.774 | 0.693 | 3 | 0.250 | 0.344 | 0.273 |
| NC21 | 2 | 0.037 | 0.500 | 0.926 | 3 | 0.095 | 0.381 | 0.750 | 4 | 0.917 | 0.642 | -0.427 |
| NC22 | 2 | 0.269 | 0.233 | -0.156 | 3 | 0.182 | 0.244 | 0.254 | 2 | 0.917 | 0.497 | -0.846 |
| NC24 | 15 | 0.667 | 0.853 | 0.219 | 11 | 0.318 | 0.815 | 0.610 | 6 | 0.417 | 0.674 | 0.381 |
| NC26 | 9 | 0.292 | 0.809 | 0.639 | 5 | 0 | 0.600 | 1.000 | 3 | 0.083 | 0.344 | 0.758 |
| NC27 | 7 | 0.519 | 0.805 | 0.356 | 10 | 0.364 | 0.674 | 0.460 | 3 | 0.167 | 0.403 | 0.586 |
| NC29 | 2 | 0.360 | 0.385 | 0.064 | 3 | 0.091 | 0.334 | 0.728 | 3 | 0.917 | 0.625 | -0.467 |
| NC35 | 10 | 0.481 | 0.707 | 0.319 | 6 | 0.190 | 0.632 | 0.698 | 3 | 0.300 | 0.405 | 0.259 |
| NC38 | 7 | 0.615 | 0.772 | 0.203 | 10 | 0.167 | 0.835 | 0.800 | 3 | 0.500 | 0.656 | 0.238 |
| Mean | 4.6 | 0.215 | 0.426 | 0.462 | 5.32 | 0.142 | 0.493 | 0.683 | 2.6 | 0.324 | 0.363 | 0.165 |

Note: A = number of alleles, Ho = observed heterozygosity, He = expected heterozygosity, F = fixation index.
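The F column in Table 5 appears consistent with the common definition of the fixation index, F = 1 - Ho/He, which is undefined (shown as "-") when a locus is monomorphic and He = 0. The sketch below assumes that definition; the manuscript tables do not state the formula, and the function name `fixation_index` is hypothetical. Because the inputs are the rounded values printed in the table, recomputed results can differ from the published ones in the third decimal place.

```python
def fixation_index(ho, he):
    """Return F = 1 - Ho/He, or None when He is 0 (monomorphic locus)."""
    if he == 0:
        return None  # F is undefined without expected heterozygosity
    return 1.0 - ho / he

# (Ho, He) pairs copied from the C. harleyi columns of Table 5.
harleyi = {
    "NC2": (0.462, 0.803),
    "NC3": (0.0, 0.0),
    "NC12": (0.0, 0.211),
}

for locus, (ho, he) in harleyi.items():
    f = fixation_index(ho, he)
    print(locus, "-" if f is None else f"{f:.3f}")
```

Positive values indicate a deficit of heterozygotes relative to Hardy-Weinberg expectations, and negative values indicate an excess.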

Table 6. Characteristics of six plastid microsatellites developed in Chresta.

| Locus | Primer sequence | Ta (°C) | Repeat motif | Allele size range (bp) | GenBank Accession |
|---|---|---|---|---|---|
| CL1 | F: AACCATGAGCGGCTACGATA | 55.9 | AAG | 124 | MK231138 |
| | R: TGGTAACCTCTAGTTTGATCAGGG | | | | |
| CL2 | F: TTCCAGCCAATGATGACGCT | 55.6 | AAT | 287–300 | MK231139 |
| | R: ACCACCTCTTTCTCGACTTGAC | | | | |
| CL3 | F: AAAGGGAGTGTGTGCGAGTT | 57.0 | AAAG | 293–298 | MK231140 |
| | R: ATGCTGCGTCTGGACTTCAA | | | | |
| CL4 | F: ACTGATGGGGCCAACAAACA | 57.6 | AAAT | 248–254 | MK231141 |
| | R: CGGCAGGGGGATTCTGAAAT | | | | |
| CL5 | F: GCGCGTGTGATACATGTTCC | 57.7 | AAT | 340–343 | MK231142 |
| | R: TAATGGCTGTAGACCCCCGA | | | | |
| CL6 | F: AACGTGTCACAGCTTCCTCC | 56.6 | AAAG | 277–278 | MK231143 |
| | R: GCTAGGTAAGCGCCCTGTA | | | | |

Note: Ta = annealing temperature.

Table 7. Genetic diversity of six plastid microsatellite markers developed in Chresta.

| Locus | C. harleyi (n=28): A | h | C. martii (n=22): A | h | C. subverticillata (n=12): A | h |
|---|---|---|---|---|---|---|
| CL1 | 1 | 0 | 1 | 0 | 1 | 0 |
| CL2 | 1 | 0 | 4 | 0.603 | 1 | 0 |
| CL3 | 1 | 0 | 1 | 0 | 1 | 0 |
| CL4 | 1 | 0 | 2 | 0.490 | 1 | 0 |
| CL5 | 1 | 0 | 2 | 0.180 | 1 | 0 |
| CL6 | 1 | 0 | 1 | 0 | 1 | 0 |

Note: A = number of alleles, h = unbiased haploid diversity.
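For readers unfamiliar with the h statistic, a widely used estimator of unbiased haploid diversity is h = (n / (n - 1)) * (1 - sum of squared allele frequencies), where n is the number of sampled haplotypes. The sketch below assumes that estimator; the tables do not state which formula or software produced the published values, the function name is hypothetical, and the allele calls in the example are invented for illustration rather than taken from the study.

```python
from collections import Counter

def unbiased_haploid_diversity(alleles):
    """Return (n / (n - 1)) * (1 - sum(p_i^2)) for a list of haploid allele calls."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two sampled haplotypes are required")
    counts = Counter(alleles)
    # Sum of squared allele frequencies (the probability of drawing the same allele twice).
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return (n / (n - 1)) * (1.0 - sum_p2)

# Hypothetical sample: 10 haplotypes, two alleles scored by fragment size (bp).
sample = [340] * 8 + [343] * 2
print(round(unbiased_haploid_diversity(sample), 3))

# A monomorphic locus, as for most entries in Table 7, gives zero diversity.
print(unbiased_haploid_diversity([124] * 12))
```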
