Genome-Wide RAD Sequencing Data Provide Unprecedented Resolution
www.nature.com/scientificreports
OPEN

Genome-wide RAD sequencing data provide unprecedented resolution of the phylogeny of temperate bamboos (Poaceae: Bambusoideae)

Received: 1 February 2017
Accepted: 23 August 2017
Published: xx xx xxxx

Xueqin Wang1,2, Xiaying Ye1,3,4, Lei Zhao1,3,4, Dezhu Li1,5, Zhenhua Guo1 & Huifu Zhuang6

1Plant Germplasm and Genomics Center, Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China. 2College of Life Science and Agronomy, Zhoukou Normal University, Zhoukou, 466001, China. 3Kunming College of Life Sciences, University of Chinese Academy of Sciences, Kunming, 650201, China. 4University of Chinese Academy of Sciences, Beijing, 100049, China. 5Key Laboratory of Biodiversity and Biogeography, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China. 6Key Laboratory of Economic Plants and Biotechnology, Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201, China. Xueqin Wang and Xiaying Ye contributed equally to this work. Correspondence and requests for materials should be addressed to D.L. (email: [email protected]. ac.cn) or Z.G. (email: [email protected])

SCIENTIFIC REPORTS | 7: 11546 | DOI:10.1038/s41598-017-11367-x

The temperate bamboos (tribe Arundinarieae, Poaceae) are strongly supported as monophyletic in recent molecular studies, but taxonomic delineation and phylogenetic relationships within the tribe lack resolution. Here, we sampled 39 species (36 temperate bamboos and 3 outgroups) for restriction-site associated DNA sequencing (RAD-seq), with an emphasis on the Phyllostachys clade and related clades. Using the largest data matrix for the bamboos to date, we were able to infer phylogenetic relationships with unparalleled resolution. The Phyllostachys, Shibataea, and Arundinaria clades, defined from the plastid phylogeny, were not supported as monophyletic groups. However, the RAD-seq phylogeny largely agreed with the morphology-based taxonomy, with two clades having leptomorph rhizomes strongly supported as monophyletic groups. We also explored two approaches, BWA-GATK (a mapping system) and Stacks (a grouping system), for differences in SNP calling and phylogeny inference. For the same level of missing data, the BWA-GATK pipeline produced many more SNPs than Stacks. Phylogenetic analyses of the largest data matrices from both pipelines, using concatenation and coalescent methods, provided similar tree topologies despite the presence of missing data. Our study demonstrates the utility of RAD-seq data for elucidating phylogenetic relationships between genera and at higher taxonomic levels in this important but phylogenetically challenging group.

The temperate bamboos (tribe Arundinarieae, Bambusoideae, Poaceae) are a clade of diverse taxa containing 32 genera and about 600 species1–4. Bamboos in this tribe have considerable ecological and economic value, as most of its species are major components of the subtropical and temperate forests of eastern and southeastern Asia. Many bamboo species, such as Moso bamboo (Phyllostachys edulis), are important sources of food, pulp, and materials for housing construction and artwork5. With highly diversified morphology and a lack of flowering characters owing to long vegetative periods, this tribe is notorious for its complicated taxonomy1, 6. Although unequivocal sets of characters for classifying species and genera have not been identified, monophyly of the temperate bamboos has been strongly supported in many molecular studies7–12. According to biogeographic analyses13, Arundinarieae diversified during the middle to late Miocene, followed by a rapid radiation, especially within the clades containing the largest genera and the most species. Such a recent origin may have left the temperate bamboos with very little molecular variation14, resulting in intricate phylogenetic relationships within Arundinarieae.

Based on broad sampling and eight non-coding plastid regions, Zeng et al.15 divided the tribe into ten major lineages and confirmed that most genera within it were highly heterogeneous and incongruent with current taxonomic circumscriptions. Subsequently, two additional clades were recovered by Yang et al.16 and Zhang et al.13, so twelve lineages in all are currently recognized in Arundinarieae based on the plastid phylogeny. The evolutionary relationships among lineages were almost resolved, but resolution within lineages remained low12, 17, mainly because of the extremely slow molecular evolutionary rate of plastid DNA15. In the phylogeny based on the nuclear DNA marker GBSSI18, by contrast, 13 lineages were resolved, and incongruence was revealed between the plastid and nuclear trees, indicating different evolutionary trajectories. Moreover, the nuclear tree provided poorly resolved phylogenetic relationships within Arundinarieae16, 19 because of insufficient informative characters, and it was suggested that more nuclear DNA markers are needed to infer the tribe's evolutionary history.

Next-generation sequencing has recently been used to address evolutionary problems in the Bambusoideae17, 20, 21. Ma et al.17 and Attigala et al.21 used plastid genome sequencing to resolve the phylogenetic relationships in Arundinarieae and obtained robust relationships among the major clades. However, studies of Arundinarieae employing next-generation sequencing have mainly focused on the plastid genome; few have involved the nuclear genome. By analyzing whole-genome datasets from the Poaceae, one such study identified 74 putative nuclear single-copy orthologous genes for phylogenetic studies of temperate bamboos22, but this method is labor-intensive.
With the development of high-throughput sequencing technologies, reduced-representation methods such as restriction site-associated DNA sequencing (RAD-seq)26 have revolutionized the fields of phylogeography, population genomics, and phylogenomics by providing high-resolution genomic data for non-model organisms at a reasonable cost23–25. By reducing genomic representation, RAD-seq can identify tens of thousands of single nucleotide polymorphism (SNP) markers and address phylogenetic reconstruction with unprecedented power and precision, even with a limited reference genome or none at all27–32. Many empirical studies have employed this method on plants and animals to reconstruct their phylogenetic relationships and have demonstrated its power for phylogenetic resolution in non-model organisms30, 32–35. Therefore, RAD-seq provides an opportunity to resolve the contentious relationships of Arundinarieae from the nuclear evolutionary trajectory.

Mapping and grouping are two SNP-calling systems for obtaining large numbers of SNPs from RAD sequencing data. In mapping, RAD sequencing reads are aligned to a reference genome using standard tools such as BWA36 and Stampy37, and the output alignments are genotyped with generic SNP callers such as the Genome Analysis Tool Kit38 (GATK) and SAMtools39. In grouping, RAD sequencing reads are used de novo, generating large marker sets where no reference genome is available. Several tools have been developed to produce RAD marker sets de novo, including Stacks40 and RADtools41. Pan et al.42 tested and compared SNP calling using the UNEAK, Stacks and bowtie2-GATK pipelines for genotyping-by-sequencing (GBS) data in nine individuals of three pine species, and found that both Stacks and bowtie2-GATK were more efficient than UNEAK for SNP calling.
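The contrast between the two systems can be illustrated with a toy sketch in Python. This is only a conceptual illustration under strong simplifying assumptions (equal-length reads, Hamming distance instead of real alignment, no sequencing-error model); it is not how BWA, GATK, or Stacks work internally. The function names `snps_by_mapping` and `snps_by_grouping` and the `max_mismatch` parameter are hypothetical and introduced here purely for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def snps_by_mapping(reads, reference, max_mismatch=2):
    """Mapping (toy): compare each read to a known reference locus.

    Reads within `max_mismatch` of the reference are treated as mapped;
    every position where a mapped read differs is reported as a variant site.
    """
    variant_sites = set()
    for read in reads:
        if len(read) == len(reference) and hamming(read, reference) <= max_mismatch:
            variant_sites.update(
                i for i, (r, q) in enumerate(zip(reference, read)) if r != q
            )
    return sorted(variant_sites)


def snps_by_grouping(reads, max_mismatch=2):
    """Grouping (toy): cluster reads de novo, without any reference.

    The most abundant sequences seed clusters; each remaining sequence joins
    the first cluster whose seed is within `max_mismatch`. Variant sites are
    the positions that differ among the members of a cluster.
    """
    clusters = []
    for seq, _count in Counter(reads).most_common():
        for cluster in clusters:
            seed = cluster[0]
            if len(seq) == len(seed) and hamming(seq, seed) <= max_mismatch:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return [
        [i for i in range(len(c[0])) if len({s[i] for s in c}) > 1]
        for c in clusters
    ]


# Made-up reads: two alleles of one locus plus reads from an unrelated locus.
reads = ["ACGTACGT"] * 3 + ["ACGTACGA"] * 2 + ["TTTTGGGG"] * 2
mapped_sites = snps_by_mapping(reads, reference="ACGTACGT")
grouped_sites = snps_by_grouping(reads)
```

In this toy example both approaches find the same variable position at the first locus, but only the grouping approach also recovers the second locus, which has no counterpart in the assumed reference.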
However, to date, there has been no comparison of the performance of mapping and grouping in terms of the variants obtained and downstream phylogenetic analysis of RAD sequencing data.

In a pilot study, we elucidated the phylogenetic relationship between two closely related species of temperate bamboos using RAD sequencing43. However, the utility of RAD-seq for building an Arundinarieae phylogeny when more taxa are sampled remains unclear. The Phyllostachys clade (clade V) is the largest clade in Arundinarieae, with ca. 16 genera and more than 330 species, comprising about 50% of the genera and more than 70% of the species of the tribe15, 44. The clade is remarkable for combining high morphological diversity with low plastid DNA variability. We therefore adopted broad taxon sampling, with an emphasis on the Phyllostachys clade and related clades, to elucidate their phylogenetic relationships, which would act as a valuable starting point for reconstructing a comprehensive phylogenetic framework for the whole tribe. The primary goals of this investigation were (1) to test the utility of RAD data in providing a high-resolution estimate of the phylogenetic relationships among temperate bamboos when a broad sample is examined; and (2) to evaluate and compare mapping and grouping systems for SNP calling and phylogeny inference based on RAD sequencing data.

Results

RAD sequencing. We obtained an average of 11.0 million paired-end reads of 82–86 bp per sample and approximately 615 million reads in all after barcode trimming, cleaning and quality checking. Details of the sequencing output are provided in Supplementary Table S1.

Data matrices from mapping system. Using BWA, we were able to map between 6.58% (Guadua angustifolia) and 99.12% (Phyllostachys edulis) (mean = 57.55%) of the RAD tags to the genomic scaffold sequences. The reference-based GATK HaplotypeCaller identified 6,602,640 raw variants. Filtering for a coverage of 10 to 500 resulted in 5,934,688 variants being retained.
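As a rough illustration of the coverage filter, the following Python sketch keeps variants whose depth lies between 10 and 500 and computes the retained fraction from the counts reported above (about 89.9%). The record layout (a dict with a `depth` key), the function name `filter_by_depth`, and the choice of inclusive bounds are assumptions made for this sketch; the text does not specify how depth was measured or which tool applied the filter.

```python
def filter_by_depth(variants, min_depth=10, max_depth=500):
    """Keep variants whose depth is within [min_depth, max_depth].

    Inclusive bounds are an assumption of this sketch.
    """
    return [v for v in variants if min_depth <= v["depth"] <= max_depth]


# Made-up variant records, for illustration only.
example_variants = [
    {"pos": 101, "depth": 4},    # too shallow: removed
    {"pos": 250, "depth": 10},   # at the lower bound: kept
    {"pos": 377, "depth": 86},   # kept
    {"pos": 912, "depth": 1200}, # too deep (possibly repetitive): removed
]
kept_variants = filter_by_depth(example_variants)

# Counts reported in the text for the BWA-GATK pipeline.
raw_count = 6_602_640
retained_count = 5_934_688
retained_fraction = retained_count / raw_count
print(f"Retained {retained_fraction:.1%} of raw variants")
```

An upper bound is commonly used alongside a lower bound because unusually deep sites can reflect reads from repetitive or paralogous regions piling onto one locus; whether that was the motivation here is not stated in the text.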
Only 1390