Towards a Unified Paradigm for Sequencebased Identification of Fungi
Total Page:16
File Type:pdf, Size:1020Kb
Received Date : 10-May-2013 Revised Date : 08-Jul-2013 Accepted Date : 17-Jul-2013 Article type : Opinion Towards a unified paradigm for sequence-based identification of Fungi Authors Urmas Kõljalg1, 2, R. Henrik Nilsson3, Kessy Abarenkov2, Leho Tedersoo2, Andy FS Taylor4, 5, Mohammad Bahram1, Scott T. Bates6, Thomas D. Bruns7, Johan Bengtsson-Palme8, Tony Martin Callaghan9, Brian Douglas9, Tiia Drenkhan10, Ursula Eberhardt11, Margarita Dueñas12, Tine Grebenc13, Gareth W. Griffith9, Martin Hartmann14, 15, Paul M. Kirk16, Petr Kohout1, 17, Ellen Article 3 18 19 12 20 Larsson , Björn D. Lindahl , Robert Lücking , María P. Martín , P. Brandon Matheny , Nhu H. Nguyen7, Tuula Niskanen21, Jane Oja1, Kabir G. Peay22, Ursula Peintner23, Marko Peterson1, Kadri Põldmaa1, Lauri Saag1, Irja Saar1, Arthur Schüßler24, James A. Scott25, Carolina Senés24, Matthew E. Smith26, Ave Suija1, D. Lee Taylor27, M. Teresa Telleria12, Michael Weiß28, Karl- Henrik Larsson29. 1 Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, 51005 Tartu, Estonia. 2 Natural History Museum, University of Tartu, Vanemuise 46, 51014 Tartu, Estonia. 3 Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, SE-40530 Göteborg, Sweden. 4 The James Hutton Institute, Craigiebuckler, Aberdeen, AB15 8QH, Scotland, UK. 5 Institute of Biological and Environmental Sciences, University of Aberdeen, Cruickshank Building, St Machar Drive, Aberdeen, AB24 3UU, UK. This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: Accepted 10.1111/mec.12481 This article is protected by copyright. All rights reserved. 6 Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, CO 80309 Colorado, USA. 7 Department of Plant and Microbial Biology, University of California, Berkeley California, 94720, USA. 8 Department of Neuroscience and Physiology, The Sahlgrenska Academy, University of Gothenburg, Box 434, SE-40530 Göteborg, Sweden. 9 Institute of Biological, Environmental and Rural Sciences, Cledwyn Building, Aberystwyth University, SY23 3DD Aberystwyth, Wales, UK. 10 Institute of Forestry and Rural Engineering, Estonian University of Life Sciences, Fr. R. Kreutzwaldi 5, 51014 Tartu, Estonia. Article 11 Staatliches Museum f. Naturkunde Stuttgart, Abt. Botanik, Rosenstein 1, D-70191 Stuttgart, Germany. 12 Departmento de Micología, Real Jardín Botánico (RJB-CSIC), Plaza de Murillo 1, 28014 Madrid, Spain. 13 Department of Forest Physiology and Genetics, Slovenian Forestry Institute, Vecna pot 2, SI- 1000 Ljubljana, Slovenia. 14 Forest Soils and Biogeochemistry, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland. 15 Molecular Ecology, Agroscope Reckenholz-Tänikon Research Station ART, Zurich, Switzerland. 16 Mycology Section, Jodrell Laboratory, Royal Botanic Gardens Kew, Surrey TW9 3DS, UK. 17 Institute of Botany, Academy of Science of the Czech Republic, CZ-252 43 Průhonice, Czech Republic. Accepted This article is protected by copyright. All rights reserved. 18 Swedish University of Agricultural Sciences, Department of Forest Mycology and Plant Pathology, Box 7026, SE-75007 Uppsala, Sweden. 19 Department of Botany, The Field Museum, 1400 South Lake Shore Drive, Chicago, IL 60605-2496, USA. 20 Department of Ecology and Evolutionary Biology, University of Tennessee, Hesler Biology Building 332, Knoxville, TN 37996-1610, USA. 21 Plant Biology, Department of Biosciences, University of Helsinki, P.O. Box 56, FI-00014 Helsinki, Finland. 22 Department of Biology, Stanford University, CA 94305 Stanford, USA. 23 Institute of Microbiology, University Innsbruck, Technikerstr. 25, 6020 Innsbruck, Austria. Article 24 Genetics, Department Biology, Ludwig-Maximilians-University, Munich, Grosshaderner Str. 4, 82152 Martinsried, Germany. 25 Dalla Lana School of Public Health, University of Toronto, 223 College Street, Toronto ON, M5T 1R4 Canada. 26 Department of Plant Pathology, University of Florida, Gainesville, FL 32611-0680, USA. 27 Department of Biology, MSC03 2020, University of New Mexico, Albuquerque, NM 87131- 0001, USA. 28 Department of Biology, University of Tübingen, Auf der Morgenstelle 5, D-72076 Tübingen, Germany. 29 Natural History Museum, University of Oslo, P.O. Box 1172 Blindern, 0318 Oslo, Norway. Keywords: microbial diversity, DNA barcoding, ecological genomics, bioinformatics, fungi. Accepted This article is protected by copyright. All rights reserved. Corresponding author: Urmas Kõljalg, Institute of Ecology and Earth Sciences, University of Tartu, Lai 40, 51005 Tartu, Estonia. E-mail: [email protected] Running title: Unified paradigm for identification of Fungi Abstract The nuclear ribosomal internal transcribed spacer (ITS) region is the formal fungal barcode and in most cases the marker of choice for exploration of fungal diversity in environmental samples. Two problems are particularly acute in the pursuit of satisfactory taxonomic assignment of newly generated ITS sequences: (i) the lack of an inclusive, reliable Article public reference dataset, and (ii) the lack of means to refer to fungal species, for which no Latin name is available in a standardized stable way. Here we report on progress in these regards through further development of the UNITE database (http://unite.ut.ee) for molecular identification of fungi. All fungal species represented by at least two ITS sequences in the international nucleotide sequence databases are now given a unique, stable name of the accession number type (e.g., Hymenoscyphus pseudoalbidus|GU586904|SH133781.05FU), and their taxonomic and ecological annotations were corrected as far as possible through a distributed, third-party annotation effort. We introduce the term “species hypothesis” (SH) for the taxa discovered in clustering on different similarity tresholds (97-99%). An automatically or manually designated sequence is chosen to represent each such species hypothesis. These reference sequences are released (http://unite.ut.ee/repository.php) for use by the scientific community in, e.g., local sequence similarity searches and in the QIIME pipeline. The system and the data will be updated automatically as the number of public fungal ITS sequences grows. We invite everybody in the position to improve the annotation or metadata associated with their particular Accepted This article is protected by copyright. All rights reserved. fungal lineages of expertise to do so through the new web-based sequence management system in UNITE. Introduction The nuclear ribosomal internal transcribed spacer (ITS) region has a long history of use as a molecular marker for species-level identification in ecological and taxonomic studies of Fungi (Hibbett et al. 2011). It offers several advantages over other species-level markers in terms of high information content and ease of amplification, and it was recently designated the official barcode for fungi (Schoch et al. 2012). The publicly available fungal ITS sequences vary Article significantly in reliability and technical quality, however, and third party annotation is not currently allowed (Bidartondo et al. 2008). To facilitate ITS-based molecular identification of fungi for the scientific community, the first fungal ITS annotation workshop was held on the premises of the University of Tartu, Estonia, on January 29-30, 2013. The 28 physical and online participants were chiefly fungal taxonomists whose expertise covered various lineages of Ascomycota, Basidiomycota, Glomeromycota, and Neocallimastigomycota. The researchers also comprised bioinformaticians and molecular ecologists with experience in sequence quality assessment. The workshop centered on the annotation of fungal ITS sequences in the extended UNITE database (http://unite.ut.ee; Abarenkov et al. 2010a) through the web-based sequence management workbench PlutoF (Abarenkov et al. 2010b; see also Fig. 1). Because UNITE mirrors the fungal ITS sequences in the International Nucleotide Sequence Databases (INSD: GenBank, EMBL, and DDBJ), the full set of ca. 300,000 fungal ITS entries generated by the scientific community as of December 2012 served as the target dataset. Accepted This article is protected by copyright. All rights reserved. The first version of the UNITE database was released in 2003 with a focus on ITS sequences of ectomycorrhizal fungi in northern Europe (Kõljalg et al. 2005). The database has been under continuous development since then and has become a full-blown sequence management environment with analysis and storage modules. At present, UNITE targets all fungi and geographical regions, but the founding principle – to provide reliable reference sequences for molecular identification – remains the same. Hereafter, UNITE refers not only to the original database of annotated ectomycorrhizal sequences, but also encompasses all fungal ITS sequences in the INSD database that are not of poor quality. The demand for high-quality reference sequences has risen rapidly due to the increasing use of high-throughput sequencing technologies (such as 454 pyrosequencing, Illumina, and Ion Torrent; Glenn 2011; Shokralla et Article al. 2012; Bates et al. 2013). These approaches generate vast amounts of sequence data – hundreds of thousands to billions of reads within a few hours or days – such that various automated approaches to analysis