www.nature.com/scientificreports

OPEN

The first set of universal nuclear protein-coding loci markers for avian phylogenetic and population genetic studies

Received: 2 March 2018
Accepted: 21 September 2018
Published: xx xx xxxx

Yang Liu1, Simin Liu1, Chia-Fen Yeh2, Nan Zhang1, Guoling Chen1, Pinjia Que3, Lu Dong3 & Shou-hsien Li2

Multiple nuclear markers provide genetic polymorphism data for molecular systematics and population genetic studies. They are especially required for the coalescent-based analyses that can be used to accurately estimate species trees and infer population demographic histories. However, in avian evolutionary studies, these powerful coalescent-based methods are hindered by the lack of a sufficient number of markers. In this study, we designed PCR primers to amplify 136 nuclear protein-coding loci (NPCLs) by scanning the published Red Junglefowl (Gallus gallus) and Zebra Finch (Taeniopygia guttata) genomes. To test their utility, we amplified these loci in 41 bird species representing 23 Aves orders. We selected the 63 best-performing NPCLs on the basis of high PCR success rates; these loci had various mutation rates and were evenly distributed across 17 avian autosomal chromosomes and the Z chromosome. To test the phylogenetic resolving power of these markers, we conducted a Neoavian phylogenetic analysis using the 63 concatenated NPCL markers derived from 48 whole genomes of birds. The resulting phylogenetic topology is, to a large extent, congruent with the results resolved by previous whole-genome data. To test the level of intraspecific polymorphism in these markers, we examined the genetic diversity in four populations of the Kentish Plover (Charadrius alexandrinus) at 17 NPCL markers chosen at random. Our results showed that these NPCL markers exhibited a level of polymorphism comparable with that of mitochondrial loci.
Therefore, this set of pan-avian nuclear protein-coding loci has great potential to facilitate studies in avian phylogenetics and population genetics.

Although next-generation sequencing technologies have produced sequence data in unprecedented quantity at relatively low cost1, traditional Sanger sequencing still has its niche in molecular evolutionary studies: pilot or small-scale phylogenetic studies using a PCR-based approach are cost-effective and available to nearly every laboratory, and they are beneficial for designing a sampling strategy and building an analysis scheme. By comparing molecular phylogenies based on datasets of different sizes, Rokas et al.2 proposed that concatenation of a sufficient number of unlinked genes (>20) can overwhelm incongruent branches of the Tree of Life (TOL). Furthermore, tracing backwards from multiple genetic polymorphisms to find the most recent common ancestor (MRCA) of a group of individuals provides a sophisticated approach to clarify phylogenetic relationships among species (species tree approach) and to reconstruct the demographic history of populations3,4. However, the major drawback of this approach is that the PCR performance of primers developed from one species is often unpredictable in distantly related species; consequently, evaluating the performance of primers in a previously untested species is a time-consuming and costly process. Therefore, a set of universal nuclear markers could provide an efficient way to ease this time-consuming process. It should greatly facilitate the use of coalescent-based analyses to answer phylogenetic and population genetic questions5.

1State Key Laboratory of Biocontrol, Department of Ecology/School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, Guangdong, China. 2Department of Life Sciences, National Taiwan Normal University, Taipei, 116, Taiwan, China.
3Ministry of Education Key Laboratory for Biodiversity Science and Ecological Engineering, College of Life Sciences, Beijing Normal University, Beijing, 100875, China. Yang Liu and Simin Liu contributed equally. Correspondence and requests for materials should be addressed to L.D. (email: [email protected]) or S.-h.L. (email: [email protected])

SCIENTIFIC REPORTS | (2018) 8:15723 | DOI:10.1038/s41598-018-33646-x

Nuclear protein-coding loci (NPCLs) are exons without flanking introns6, and are widely used in interspecific phylogenetic studies (e.g. RAG17, c-myc8,9). NPCL markers possess favorable properties including homogeneous base composition, varied evolutionary rates and easy alignment across species or populations10,11. Moreover, orthologous genes can be identified accurately using their annotations12,13. Several sets of universal NPCL markers have been developed specifically for beetles14, fish15, reptiles6, amphibians and vertebrates16,17. However, there is still an insufficient number of easily amplifiable NPCL markers to fulfill the needs of modern coalescent-based analyses for most bird species. As the most common and species-rich group of terrestrial vertebrates, birds exhibit tremendous diversity in their phenotypes, ecology, habitats and behaviors18. So far, considerable effort has been devoted to resolving phylogenetic relationships from higher taxonomic categories19–21 to sister species22–26. In addition to phylogenetics, modeling-based approaches using multiple nuclear genes have also shed light on population structure and demographic history and allowed inferences of selection pressures in non-model organisms27–30.
The rapid advance in these sub-disciplines of evolutionary biology always hinges upon proper sampling design and a rigorous statistical approach, but it also requires data on multiple independent loci with an appropriate level of genetic polymorphism31, which allows the application of sophisticated modeling and thus hypothesis testing.

Efforts to develop universal PCR primers have facilitated avian phylogenetic and population genetic studies32–34. For example, Dawson et al.35 developed a set of microsatellite markers with high cross-species utility, suitable for paternity and population studies. Backström et al.36 developed more than 200 exons flanking introns, which were evenly distributed throughout the avian genome. However, a variable number of indels (insertions and deletions) in the introns complicates the subsequent amplification, sequencing and alignment of these exons. Conserved and easily aligned exonic regions are ideal alternatives to compensate for resolving power for phylogenetic reconstruction13. Kimball et al.37 tested the utility of 36 published markers on 42–199 bird species, with only five exonic markers among them. Kerr et al.38 developed 100 exonic markers from five avian genomes, and finally tested a subset of 25 markers in 12 avian orders. The quantity of NPCL markers is far from adequate, as exon length should be longer than intron sequences to yield sufficient phylogenetic resolution39. Using a small number of universal NPCL markers could increase the probability of error when estimating species relationships due to the conflict of gene tree topologies. To overcome the problem, it has been advocated to use more genes with longer sequences40.

However, some obstacles have hindered the development of universal NPCL markers. Firstly, widespread flanking introns make the identification of the exon boundaries of a specific NPCL marker difficult6.
Secondly, multiple nuclear loci are required to be distributed evenly and widely across the whole genome in order to indicate a variety of historical signals. And finally, low-cost and easy amplification are important requisites. The development of a set of universal NPCL markers for birds should significantly reduce the time and cost required for future research, and facilitate the application of coalescent-based methods in avian evolutionary studies.

In this study, we aimed to develop a set of avian universal NPCL markers that can be widely utilized in avian phylogenetic and population genetic studies. By comparing the published genomes of the Red Junglefowl (Gallus gallus) and the Zebra Finch (Taeniopygia guttata), we designed 136 pairs of NPCL primers and amplified them in 41 species representing 23 avian orders to check their versatility. To test the resolving power of these markers, we further constructed a phylogenetic tree and estimated mutation rates by extracting universal NPCLs from 48 published avian genomes41. Moreover, samples from four populations of the Kentish Plover (Charadrius alexandrinus) were also amplified to estimate the intraspecific polymorphic level of these universal NPCLs.

Results

Pan-avian order amplifications of the novel NPCLs. The genome alignment and BLAST procedures resulted in 136 NPCL candidates, which were broadly distributed across 24 autosomal chromosomes and the Z chromosome of the Zebra Finch genome. Their original fragment length ranged from 815 bp to 7176 bp (Supplementary Table S1). We thus named each NPCL marker using the abbreviation of the associated protein-coding region according to the gene annotation of the Zebra Finch (Supplementary Table S1). More than one primer pair was designed for each NPCL marker candidate, and we finally chose the primer pair with the highest score denoting the level of conservation between the Zebra Finch and Red Junglefowl genomes.
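
The primer-pair selection step described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the text does not state how the conservation score was computed, so the sketch assumes it is the fraction of identical positions between the aligned primer binding sites of the two genomes. The names `PrimerPair`, `conservation_score` and `best_pair_per_locus`, the locus labels and all sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimerPair:
    """One candidate primer pair for a locus (all fields are illustrative)."""
    locus: str
    name: str
    zebra_finch_site: str      # binding-site sequence in Zebra Finch
    red_junglefowl_site: str   # aligned binding-site sequence in Red Junglefowl


def conservation_score(pair: PrimerPair) -> float:
    """Assumed score: fraction of aligned positions identical in both genomes."""
    a = pair.zebra_finch_site.upper()
    b = pair.red_junglefowl_site.upper()
    if not a or len(a) != len(b):
        raise ValueError("binding sites must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_pair_per_locus(pairs):
    """Return {locus: highest-scoring PrimerPair}; ties keep the first seen."""
    best = {}
    for pair in pairs:
        current = best.get(pair.locus)
        if current is None or conservation_score(pair) > conservation_score(current):
            best[pair.locus] = pair
    return best


# Hypothetical candidates: two primer pairs for each of two loci.
candidates = [
    PrimerPair("LOCUS_A", "A_pair1", "ACGTACGTAC", "ACGTACGTAC"),
    PrimerPair("LOCUS_A", "A_pair2", "ACGTACGTAC", "ACGTTCGTAC"),
    PrimerPair("LOCUS_B", "B_pair1", "GGGCCCAATT", "GGGCCTAATA"),
    PrimerPair("LOCUS_B", "B_pair2", "TTGACCGGTA", "TTGACCGGTT"),
]

chosen = best_pair_per_locus(candidates)
for locus, pair in sorted(chosen.items()):
    print(locus, pair.name, round(conservation_score(pair), 2))
```

Any other conservation measure, for example one that weights mismatches near the 3' end of a primer more heavily, could replace `conservation_score` without changing the selection logic.
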
In total, 5,146 PCRs were performed to amplify the 136 NPCLs in 41 species representing 23 avian orders (Fig. 1A). Among them, 2,875 (55.9%) produced a target band (Supplementary Table S3). For the 136 candidates, we successfully amplified 12 NPCLs in all 23 orders, with a 100% PCR success rate (PSR). Sixty three of