Incorporating Molecular Data in Fungal Systematics: a Guide for Aspiring Researchers

Current Research in Environmental & Applied Mycology Doi 10.5943/cream/3/1/1

Hyde KD1,2, Udayanga D1,2, Manamgoda DS1,2, Tedersoo L3,4, Larsson E5, Abarenkov K3, Bertrand YJK5, Oxelman B5, Hartmann M6,7, Kauserud H8, Ryberg M9, Kristiansson E10, Nilsson RH5,*

1Institute of Excellence in Fungal Research, 2School of Science, Mae Fah Luang University, Chiang Rai, 57100, Thailand
3Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
4Natural History Museum, University of Tartu, Tartu, Estonia
5Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
6Forest Soils and Biogeochemistry, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
7Molecular Ecology, Agroscope Reckenholz-Tänikon Research Station ART, Zurich, Switzerland
8Department of Biosciences, University of Oslo, PO Box 1066, Blindern, N-0316, Oslo, Norway
9Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
10Department of Mathematical Statistics, Chalmers University of Technology, 412 96 Göteborg, Sweden

Hyde KD, Udayanga D, Manamgoda DS, Tedersoo L, Larsson E, Abarenkov K, Bertrand YJK, Oxelman B, Hartmann M, Kauserud H, Ryberg M, Kristiansson E, Nilsson RH. 2013 – Incorporating molecular data in fungal systematics: a guide for aspiring researchers. Current Research in Environmental & Applied Mycology 3(1), 1–32, Doi 10.5943/cream/3/1/1

The last twenty years have witnessed molecular data emerge as a primary research instrument in most branches of mycology. Fungal systematics, taxonomy, and ecology have all seen tremendous progress and have undergone rapid, far-reaching changes as disciplines in the wake of continual improvement in DNA sequencing technology. A taxonomic study that draws from molecular data involves a long series of steps, ranging from taxon sampling through the various laboratory procedures and data analysis to the publication process. All steps are important and influence the results and the way they are perceived by the scientific community. The present paper provides a reflective overview of all major steps in such a project, with the aim of assisting research students about to begin their first study using DNA-based methods. We also take the opportunity to discuss the role of taxonomy in biology and the life sciences in general in the light of molecular data. While the best way to learn molecular methods is to work side by side with someone experienced, we hope that the present paper will serve to lower the learning threshold for the reader.

Key words – biodiversity – molecular marker – phylogeny – publication – species – Sanger sequencing – taxonomy

Article Information
Received 12 March 2013
Accepted 26 March 2013
Published online 29 April 2013
* Corresponding author: R. Henrik Nilsson – email – [email protected]

Introduction

Morphology has been the basis of nearly all taxonomic studies of fungi. Most species were previously introduced because their morphology differed from that of other taxa, although in many plant pathogenic genera the host was given a major consideration (Rossman and Palm-Hernández 2008; Hyde et al. 2010; Cai et al. 2011). Genera were introduced because they were deemed sufficiently distinct from other groups of species, and the differences typically amounted to one to several characters. Groups of genera were combined into families and families into orders and classes; the underlying reasoning for grouping lineages was based on single to several characters deemed to be of particular discriminatory value. However, the whole system hinged to no small degree on what characters were chosen as arbiters of inclusiveness. These characters often varied among mycologists and resulted in disagreement and constant taxonomic rearrangements in many groups of fungi (Hibbett 2007; Shenoy et al. 2007; Yang 2011).

The limitations of morphology-based taxonomy were recognized early on, and numerous mycologists tried to incorporate other sources of data into the classification process, such as information on biochemistry, enzyme production, metabolite profiles, physiological factors, growth rate, pathogenicity, and mating tests (Guarro et al. 1999; Taylor et al. 2000; Abang et al. 2009). Some of these attempts proved successful. For instance, the economically important plant pathogen Colletotrichum kahawae was distinguished from C. gloeosporioides based on physiological and biochemical characters (Correll et al. 2000; Hyde et al. 2009). Similarly, relative growth rates and the production of secondary metabolites on defined media under controlled conditions are valuable in studies of complex genera such as Aspergillus, Penicillium, and Colletotrichum (Frisvad et al. 2007; Samson and Varga 2007; Cai et al. 2009). In many other situations, however, these data sources proved inconclusive or included a signal that varied over time or with environmental and experimental conditions. In addition, the detection methods often did not meet the requirement as an effective tool in terms of time and resource consumption (Horn et al. 1996; Frisvad et al. 2008).

Molecular data were first used in taxonomic studies of fungi in the 1970s (cf. DeBertoldi et al. 1973), which marked the beginning of a new era in fungal research. The last twenty years have seen an explosion in the use of molecular data in systematics and taxonomy, to the extent where many journals will no longer accept papers for publication unless the taxonomic decisions are backed by molecular data. DNA sequences are now used on a routine basis in fungal taxonomy at all levels. Molecular data in taxonomy and systematics are not devoid of problems, however, and there are many concerns that are not always given the attention they deserve (Taylor et al. 2000; Hibbett et al. 2011). We regularly meet research students who are unsure about one or more steps in their nascent molecular projects. Much has been written about each step in the molecular pipeline, and there are several good publications that should be consulted regardless of the questions arising. What is missing, perhaps, is a wrapper for these papers: a freely available, easy-to-read yet not overly long document that summarizes all major steps and provides references for additional reading for each of them. This is an attempt at such an overview. We have divided it into six sections that span the width of a typical molecular study of fungi: 1) taxon sampling, 2) laboratory procedures, 3) sequence quality control, 4) data analysis, 5) the publication process, and 6) other observations. The target audience is research students ("the users") about to undertake a (Sanger sequencing-based) molecular mycological study with a systematic or taxonomic focus.

Sampling and compiling a dataset

The dataset determines the results. If there is anything that the user should spend valued time completing, it is to compile a rich, meaningful dataset of specimens/sequences (Zwickl and Hillis 2002). Right from the onset, it is important to consider what hypotheses will be tested with the resulting phylogeny. If the hypothesis is that one taxon is separate from one or several other taxa, then all those taxa should be sampled along with appropriate outgroups; consultation of taxonomic expertise (if available) is advisable. It is paramount to consider previous studies of both the taxa under scrutiny and closely related taxa to achieve effective sampling. It is usually a mistake to think that one knows about the relevant literature already, and the users are advised to familiarize themselves with how to search the literature efficiently (cf. Conn et al. 2003 and Miller et al. 2009). In particular, it often pays off to establish a set of core publications and then look for other papers that cite these core papers (through, e.g., "Cited by" in Google Scholar). If there is an opportunity to sequence more than one specimen per taxon of interest, this is highly preferable and particularly important when it comes to poorly defined taxa and/or at low taxonomic levels. In most cases it will not be enough to use whatever specimens the local herbarium or culture collection has to offer; rather, the user should consider the resources available at the national and international levels. Ordering specimens or cultures may prove expensive, and the user may want to consider inviting researchers with easy access to such specimens as co-authors of the study. The invited co-authors may even oversee the local sequencing of those specimens, to the point where more specimens do not have to mean more costs for the researcher.

Sampling through herbaria, culture collections, and the literature – Mining world herbaria for species and specimens of any given genus is not as straightforward as one might think. Many herbaria (even in Western countries) are not digitized, and no centralized resource exists where all digitized herbaria can be queried jointly. GBIF (http://www.gbif.org/) nevertheless represents a first step in such a direction, and the user is advised to start there. Larger herbaria not covered by GBIF may have their own databases searchable through web interfaces but otherwise are best queried through an email to the curator. Type specimens have a special standing in systematics (Hyde and Zhang 2008; Ko Ko et al. 2011; McNeill et al. 2012), and the inclusion of type specimens in a study lends extra weight and credibility to the results and to unambiguous naming of generated sequences. Herbaria may be unwilling to loan type specimens (particularly for sequencing or any other activity that involves destroying a part of the specimen, i.e., destructive sampling) and may offer to sequence them locally instead – or may in fact already have done so. Specimens collected by particularly professional taxonomists or that are covered in reference works should similarly be prioritized. For cultures, we recommend that the user start at the CBS culture collection.
