Microsatellite Discovery in an Insular Amphibian
Total Page:16
File Type:pdf, Size:1020Kb
1 1 Methods and Resource Article 2 3 Microsatellite discovery in an insular amphibian (Grandisonia 4 alternans) with comments on cross-species utility and the accuracy of 5 locus identification from unassembled Illumina data 6 7 Eleanor A. S. Adamson1 • Anwesha Saha1, 2, 3 • Simon T. Maddock1, 2, 4 • Ronald 5 1 1 8 A. Nussbaum • David J. Gower • Jeffrey W. Streicher 9 10 11 1Department of Life Sciences, The Natural History Museum, London SW7 5BD, UK 12 13 2Department of Genetics, Evolution and Environment, University College London, 14 London WC1E 6BT, UK 15 16 3Ashoka Trust for Research of Ecology and the Environment, Bangalore, India 17 18 4Department of Animal Management, Reaseheath College, Nantwich, CW5 6DF, UK. 19 20 5Department of Ecology and Evolutionary Biology, University of Michigan, Ann 21 Arbor, Michigan 48109, USA 22 23 _____________________________________________________________________ 24 Eleanor A. S. Adamson and Anwesha Saha have contributed equally to this work. 25 _____________________________________________________________________ 26 27 Corresponding Author: Jeffrey W. Streicher ([email protected]) 28 29 Submitted to Conservation Genetics Resources 30 5 Figures, 2 Tables, 1 Table Online 31 Word counts: Abstract, 250 words [250 max]; Main text/references, 4,997 [6000 max] 32 33 2 34 35 Abstract The Seychelles archipelago is unique among isolated oceanic islands 36 because it features an endemic radiation of caecilian amphibians (Gymnophiona). In 37 order to develop population genetics resources for this system, we identified 38 microsatellite loci using unassembled Illumina MiSeq data generated from a genomic 39 library of Grandisonia alternans, a species that occurs on multiple islands in the 40 archipelago. Applying a recently described method (PALFINDER) we identified 41 8,001 microsatellite loci that were potentially informative for population genetics 42 analyses. Of these markers, we screened 60 loci using five individuals, directly 43 sequenced several amplicons to confirm their identity, and then used eight loci to 44 score allele sizes in 64 G. alternans individuals originating from five islands. A 45 number of these individuals were sampled using non-lethal methods, demonstrating 46 the efficacy of non-destructive molecular sampling in amphibian research. Although 47 two loci satisfied our criteria as diploid, neutrally evolving loci with the statistical 48 power to detect population structure, our success in identifying reliable loci was very 49 low. Additionally, we discovered some issues with primer redundancy and differences 50 between Illumina and Sanger sequences that suggest some Illumina-inferred loci are 51 invalid. We investigated cross-species utility for eight loci and found most could be 52 successfully amplified, sequenced and aligned across other genera/species of 53 caecilians from the Seychelles. Thus, our study in part supported the validity of using 54 PALFINDER with unassembled reads for microsatellite discovery within and across 55 species, but importantly identified major limitations to applying this approach to 56 small datasets (ca. 1 million reads) and loci with small tandem repeat sizes. 57 58 Keywords caecilians • cross amplification • Gymnophiona • Hypogeophis 59 •Indotyphlidae • Praslinia • Seq-to-SSR approach • simple sequence repeats 60 3 61 Introduction 62 63 Microsatellites or simple sequence repeats (SSR) are one of the most widely used 64 tools in population genetics and conservation biology. As costs associated with high- 65 throughput Next Generation DNA sequencing (NGS) have decreased, microsatellite 66 discovery via genomic shotgun libraries (the Seq-to-SSR approach) has emerged as a 67 reliable and economical methodology for non-model species (Abdelkrim et al. 2009; 68 Allentoft et al., 2009; Castoe et al. 2012a). Several software packages have been 69 developed to identify microsatellite loci this way including MSATCOMMANDER 70 (Faircloth 2008), PALFINDER (Castoe et al. 2012a), and QDD (Meglécz et al. 2010; 71 2014). Typically these methods use de novo assemblies or unassembled reads to 72 identify SSR loci. The use of unassembled reads is a timesaving characteristic (Castoe 73 et al. 2012a; Lance et al. 2013), however, several studies have cautioned against this 74 approach when using short read data, such as from the Illumina platform. These 75 studies have found that Illumina paired-end identified microsatellite loci can have an 76 order of magnitude smaller success rate (in terms of amplification across multiple 77 individuals and sequencing error) than those loci identified from longer 454 or PacBio 78 reads (Drechsler et al. 2013; Wei et al. 2014). Despite this, the use of PALFINDER to 79 identify SSR loci from short, unassembled Illumina reads is widespread (e.g. 80 amphibians, Drechsler et al. 2013; Peterman et al. 2013; bivalves, O’Bryhim et al. 81 2012a; crustaceans, Stoutamore et al. 2012; fish, O’Bryhim et al. 2012b; Nunziata et 82 al. 2013; mammals, Barthelmess et al. 2013; reptiles, Castoe et al. 2012b). 83 Interestingly, few studies have performed validation experiments using SSR loci 84 identified from unassembled reads. Those that have (e.g. Mikheyev et al. 2010; 85 Delmas et al., 2011; Castoe et al. 2012b) have typically used unassembled 454 reads, 86 thus leaving the question of whether SSR loci can be reliably identified from shorter 87 reads poorly addressed outside of a few studies (Drechsler et al. 2013; Wei et al. 88 2014). 89 In this study, we performed validation experiments (via direct sequencing and 90 power analysis) on SSR loci identified with unassembled Illumina data from 91 caecilians (Amphibia, Gymnophiona). Caecilians are limbless amphibians that are 92 restricted mostly to parts of the wet tropics and typically possess a fossorial ecology 93 (Gower & Wilkinson 2008). As such, they are generally rarely encountered and most 94 species are poorly understood (Gower & Wilkinson 2005). Globally, there are ten 4 95 families of caecilian (Wilkinson et al. 2011; Kamei et al. 2012). We focused on 96 Grandisonia alternans (Stejneger,1893), an indotyphlid species that is widely 97 distributed across the Seychelles, an archipelago situated off the coast of east Africa 98 in the Indian Ocean (Fig. 1). With both endemic frog and caecilian radiations the 99 Seychelles is unique among isolated oceanic archipelagos (Nussbaum 1984). Given 100 the exceptional evolutionary insights provided by species occurring on small islands 101 (e.g. Warren et al. 2015), the generally high level of threat of extinction faced by such 102 species (e.g. Daltry 2007), and the scarcity of knowledge of caecilian population 103 genetics (e.g. Gower & Wilkinson 2005), we were motivated to develop population 104 genetics tools for the endemic caecilians of the Seychelles. 105 To accomplish this we identified thousands of putative SSR loci from shotgun 106 genomic sequencing, tested 60 loci to investigate intraspecific utility, generated 107 Sanger sequences to examine sequence similarity to Illumina reads, identified eight 108 loci with putatively desirable characteristics for population genetics and size-scored 109 loci in multiple individuals of G. alternans. We also examined the cross-species 110 utility of these eight loci in additional caecilian taxa and compared divergence in their 111 flanking regions relative to G. alternans sequences. Based on our findings we 112 describe (i) the discovery of six microsatellite loci that amplify across divergent 113 caecilian species, including two loci that satisfy theoretical assumptions of selective 114 neutrally in G. alternans putative populations, and (ii) several important limitations 115 we encountered while using the Seq-to-SSR approach with unassembled Illumina 116 reads. 117 118 Materials and methods 119 120 DNA isolation and microsatellite identification 121 We generated a shotgun genomic library from a pool of six barcoded Seychelles 122 caecilian specimens, comprising representatives of five species: Grandisonia 123 alternans, G. larvata (Ahl 1934), G. sechellensis (Boulenger 1911), Hypogeophis 124 brevis Boulenger 1911 and H. rostratus (Cuvier 1829). Library preparation was 125 performed by first extracting genomic DNA with a Qiagen DNeasy Blood and Tissue 126 kit and then using a standard Illumina Nextera DNA kit to prepare the pooled sample 127 for sequencing (see Lewis et al. 2014). Sequencing was performed using a 500-cycle 128 v.2 reagent kit on an Illumina MiSeq at the Core Research laboratories of the Natural 5 129 History Museum, London. We removed and trimmed low quality reads of 130 unassembled paired-end data using Illumina software and the FASTX-Toolkit 131 (available at http://hannonlab.cshl.edu/fastx_toolkit/links.html). Barcoded pooled data 132 were de-multiplexed and only sequences from the target taxa G. alternans were 133 retained, originating from a single G. alternans individual from Silhouette Island 134 (UMMZ 192945). This single individual dataset was used in all downstream analysis. 135 136 Potentially amplifiable loci (PAL) selection 137 We used PALFINDER software v 2.03 (Castoe et al. 2012a) to identify putative SSR 138 loci from NGS output, run with default settings and applying the following criteria in 139 Primer3 (Untergasser et al. 2012): (i) a minimum of eight tandem repeats for 140 dinucleotide motifs, (ii) a minimum of six tandem repeats for trinucleotide motifs, and 141 (iii) a minimum of six tandem repeats for tetranucleotide motifs. From the 142 PALFINDER/Primer3