MarkerMiner 1.0: A New Application for Phylogenetic Marker Development Using Angiosperm Transcriptomes

Author(s): Srikar Chamala, Nicolás García, Grant T. Godden, Vivek Krishnakumar, Ingrid E. Jordon-Thaden, Riet De Smet, W. Brad Barbazuk, Douglas E. Soltis, and Pamela S. Soltis
Source: Applications in Plant Sciences, 3(4)
Published By: Botanical Society of America
DOI: http://dx.doi.org/10.3732/apps.1400115
URL: http://www.bioone.org/doi/full/10.3732/apps.1400115

Applications in Plant Sciences 2015 3(4): 1400115

SOFTWARE NOTE

MARKERMINER 1.0: A NEW APPLICATION FOR PHYLOGENETIC MARKER DEVELOPMENT USING ANGIOSPERM TRANSCRIPTOMES 1

Srikar Chamala 2,12, Nicolás García 2,3,4 *, Grant T. Godden 2,3,5 *, Vivek Krishnakumar 6, Ingrid E. Jordon-Thaden 7,8, Riet De Smet 9,10, W. Brad Barbazuk 2,11, Douglas E. Soltis 2,3,11, and Pamela S. Soltis 3,11

2 Department of Biology, University of Florida, Gainesville, Florida, USA; 3 Florida Museum of Natural History, University of Florida, Gainesville, Florida, USA; 4 Facultad de Ciencias Forestales y Conservación de la Naturaleza, Universidad de Chile, Santiago, Chile; 5 Rancho Santa Ana Botanic Garden, Claremont, California, USA; 6 Plant Genomics, J. Craig Venter Institute, Rockville, Maryland, USA; 7 Department of Biology, Bucknell University, Lewisburg, Pennsylvania, USA; 8 Jepson and University Herbaria, University of California, Berkeley, Berkeley, California, USA; 9 Department of Plant Systems Biology, Vlaams Instituut voor Biotechnologie, 9052 Ghent, Belgium; 10 Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; and 11 Genetics Institute, University of Florida, Gainesville, Florida, USA

• Premise of the study: Targeted sequencing using next-generation sequencing (NGS) platforms offers enormous potential for plant systematics by enabling economical acquisition of multilocus data sets that can resolve difficult phylogenetic problems. However, because discovery of single-copy nuclear (SCN) loci from NGS data requires both bioinformatics skills and access to high-performance computing resources, the application of NGS data has been limited.

• Methods and Results: We developed MarkerMiner 1.0, a fully automated, open-access bioinformatic workflow and application for discovery of SCN loci in angiosperms. Our new tool identified as many as 1993 SCN loci from transcriptomic data sampled as part of four independent test cases representing marker development projects at different phylogenetic scales.

• Conclusions: MarkerMiner is an easy-to-use and effective tool for discovery of putative SCN loci.
It can be run locally or via the Web, and its tabular and alignment outputs facilitate efficient downstream assessments of phylogenetic utility, locus selection, intron-exon boundary prediction, and primer or probe development.

Key words: data mining; introns; marker development; next-generation sequencing; phylogenetics; single-copy nuclear genes; transcriptomes.

The availability of next-generation sequencing (NGS) technologies and improved computational tools has revolutionized the field of plant molecular systematics (reviewed in Cronn et al., 2012; McCormack et al., 2013; Soltis et al., 2013). Access to genome-scale data presents exciting opportunities for researchers to develop hundreds or potentially thousands of informative, taxon-specific loci from nuclear genomes—large, multilocus data sets that can potentially resolve relationships at any phylogenetic scale (e.g., Godden et al., 2012).

Recently, there has been much interest in developing single-copy nuclear (SCN) loci from new or existing NGS resources such as transcriptomes (i.e., sequences representing the expressed portion of the genome; see Bräutigam and Gowik, 2010; Strickler et al., 2012) or genome skimming data (i.e., low-coverage genome sequencing; see Straub et al., 2012), and a few pioneering studies have reported great success in developing large sets of orthologous SCN loci with elaborately designed bioinformatic pipelines (e.g., Straub et al., 2011; Rothfels et al., 2013; Weitemier et al., 2014; Tonnabel et al., 2014; Pillon et al., 2014). Nevertheless, SCN locus discovery from NGS data remains a complex process for many researchers with limited bioinformatics training and access to computational resources. To address these challenges, we developed MarkerMiner 1.0, a fully automated, open-access bioinformatic workflow to aid plant researchers in the discovery of putative orthologous SCN loci and to facilitate downstream marker development activities such as primer or probe design with user-friendly output.

1 Manuscript received 4 December 2014; revision accepted 25 February 2015.
The authors thank all oneKP contributors, especially Gane Ka-Shu Wong, and BGI. Funding for S.C. was provided by the National Science Foundation (NSF; grant IOS-0922742 [P.S.S., D.E.S., W.B.B.]). Research funding for N.G. and G.T.G. was provided in part by NSF Doctoral Dissertation Improvement grants DEB-1310839 (P.S.S. and N.G.) and DEB-1210671 (P.S.S. and G.T.G.), respectively. The salary of I.J.T. was provided by the David Burpee Endowment and Chris Martine (Bucknell University).
* Authors are listed alphabetically by surname and contributed equally.
12 Author for correspondence: [email protected]

© 2015 Chamala et al. Published by the Botanical Society of America. This work is licensed under a Creative Commons Attribution License (CC-BY-NC-SA).

METHODS AND RESULTS

Overall design of the application — Transcriptome sequencing is a useful approach for acquiring new data for phylogenetic marker development, and it might offer some advantages over genome skimming approaches. For example, the high output of NGS platforms, coupled with the reduced representation afforded by transcriptome sequencing, permits multiplexing of more samples from a clade of interest. This provides a more comprehensive a priori survey of phylogenetic utility across both gene space and the clade of interest than genome skimming on a fixed budget. Moreover, researchers may find that
expressed sequence tags (ESTs) or de novo transcriptome assemblies already exist for many groups of angiosperms (e.g., transcriptomes available through the 1000 Plants [oneKP] project; see www.onekp.com for more information), and use of these existing data resources can eliminate or reduce the overall costs and time investment for some marker discovery projects.

Fig. 1. Filtering steps performed by MarkerMiner 1.0 to identify single-copy nuclear genes from angiosperm transcriptome assembly input. Best (primary) alignments are identified with a star, and secondary alignments are identified with a diamond.

MarkerMiner is a novel, command line–based computational workflow that identifies putative orthologous SCN loci present in two or more user-provided angiosperm transcriptome assemblies and outputs detailed tabular results and sequence alignments for downstream assessment of phylogenetic utility, locus selection, intron-exon boundary prediction, and primer or probe development for targeted sequencing (see Figs. 1–3). The tool features a user-configurable command line interface that is backed by a computational pipeline, and its job submission graphical user interface is accessible to researchers with lim-

parameter based on their individual preferences and research needs. Decreasing the default length parameter (e.g., <900 bp) will facilitate retention of larger numbers of transcripts for downstream filtering steps. In contrast, increasing the default length parameter (e.g., >900 bp) may result in discovery of fewer orthologs between sampled taxa.

Filtering putative ortholog pairs with reciprocal BLAST queries — MarkerMiner employs independent reciprocal BLAST (Altschul et al., 1990, 1997) queries on each filtered transcriptome assembly to identify putative orthologs. By default, the application uses the Arabidopsis thaliana (L.) Heynh. proteome from the PLAZA 2.5 database (Van Bel et al., 2012) as a reference. However, we offer the flexibility to use one of 15 additional reference options (see Box 1), and MarkerMiner is updated