Incorporating molecular data in fungal systematics: a guide for aspiring researchers

Kevin D. Hyde1,2, Dhanushka Udayanga1,2, Dimuthu S. Manamgoda1,2, Leho Tedersoo3,4, Ellen Larsson5, Kessy Abarenkov3, Yann J.K. Bertrand5, Bengt Oxelman5, Martin Hartmann6,7, Håvard Kauserud8, Martin Ryberg9, Erik Kristiansson10, R. Henrik Nilsson5,*

1 Institute of Excellence in Fungal Research
2 School of Science, Mae Fah Luang University, Chiang Rai, 57100, Thailand
3 Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Estonia
4 Natural History Museum, University of Tartu, Tartu, Estonia
5 Department of Biological and Environmental Sciences, University of Gothenburg, Box 461, 405 30 Göteborg, Sweden
6 Forest Soils and Biogeochemistry, Swiss Federal Research Institute WSL, Birmensdorf, Switzerland
7 Molecular Ecology, Agroscope Reckenholz-Tänikon Research Station ART, Zurich, Switzerland
8 Department of Biology, University of Oslo, PO Box 1066 Blindern, N-0316 Oslo, Norway
9 Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 75236 Uppsala, Sweden
10 Department of Mathematical Statistics, Chalmers University of Technology, 412 96 Göteborg, Sweden
* Corresponding author: R. Henrik Nilsson – email – [email protected]

Abstract
The last twenty years have witnessed molecular data emerge as a primary research instrument in most branches of mycology. Fungal systematics, taxonomy, and ecology have all seen tremendous progress and have undergone rapid, far-reaching changes as disciplines in the wake of continual improvement in DNA sequencing technology. A taxonomic study that draws from molecular data involves a long series of steps, ranging from taxon sampling through the various laboratory procedures and data analysis to the publication process. All steps are important and influence the results and the way they are perceived by the scientific community.
The present paper provides a reflective overview of all major steps in such a project, with the aim of assisting research students about to begin their first study using DNA-based methods. We also take the opportunity to discuss the role of taxonomy in biology and the life sciences in general in the light of molecular data. While the best way to learn molecular methods is to work side by side with someone experienced, we hope that the present paper will serve to lower the learning threshold for the reader.

Key words: biodiversity, molecular marker, phylogeny, publication, species, Sanger sequencing, taxonomy

Introduction
Morphology has been the basis of nearly all taxonomic studies of fungi. Most species were previously introduced because their morphology differed from that of other taxa, although in many plant pathogenic genera the host was given a major consideration (Rossman and Palm-Hernández 2008; Hyde et al. 2010; Cai et al. 2011). Genera were introduced because they were deemed sufficiently distinct from other groups of species, and the differences typically amounted to one to several characters. Groups of genera were combined into families and families into orders and classes; the underlying reasoning for grouping lineages was based on single to several characters deemed to be of particular discriminatory value. However, the whole system hinged to no small degree on what characters were chosen as arbiters of inclusiveness. These characters often varied among mycologists and resulted in disagreement and constant taxonomic rearrangements in many groups of fungi (Hibbett 2007; Shenoy et al. 2007; Yang 2011).
The limitations of morphology-based taxonomy were recognized early on, and numerous mycologists tried to incorporate other sources of data into the classification process, such as information on biochemistry, enzyme production, metabolite profiles, physiological factors, growth rate, pathogenicity, and mating tests (Guarro et al. 1999; Taylor et al. 2000; Abang et al. 2009). Some of these attempts proved successful. For instance, the economically important plant pathogen Colletotrichum kahawae was distinguished from C. gloeosporioides based on physiological and biochemical characters (Correll et al. 2000; Hyde et al. 2009). Similarly, relative growth rates and the production of secondary metabolites on defined media under controlled conditions are valuable in studies of complex genera such as Aspergillus, Penicillium, and Colletotrichum (Frisvad et al. 2007; Samson and Varga 2007; Cai et al. 2009). In many other situations, however, these data sources proved inconclusive or included a signal that varied over time or with environmental and experimental conditions. In addition, the detection methods were often too demanding in time and resources to serve as effective tools (Horn et al. 1996; Frisvad et al. 2008).

Molecular data were first used in taxonomic studies of fungi in the 1970s (cf. DeBertoldi et al. 1973), which marked the beginning of a new era in fungal research. The last twenty years have seen an explosion in the use of molecular data in systematics and taxonomy, to the extent that many journals will no longer accept papers for publication unless the taxonomic decisions are backed by molecular data. DNA sequences are now used on a routine basis in fungal taxonomy at all levels. Molecular data in taxonomy and systematics are not devoid of problems, however, and there are many concerns that are not always given the attention they deserve (Taylor et al.
2000; Groenewald et al. 2011; Hibbett et al. 2011). We regularly meet research students who are unsure about one or more steps in their nascent molecular projects. Much has been written about each step in the molecular pipeline, and there are several good publications that should be consulted regardless of the questions arising. What is missing, perhaps, is a wrapper for these papers: a freely available, easy-to-read yet not overly long document that summarizes all major steps and provides references for additional reading for each of them. This is an attempt at such an overview. We have divided it into six sections that span the width of a typical molecular study of fungi: 1) taxon sampling, 2) laboratory procedures, 3) sequence quality control, 4) data analysis, 5) the publication process, and 6) other observations. The target audience is research students ("the users") about to undertake a (Sanger sequencing-based) molecular mycological study with a systematic or taxonomic focus.

1. Sampling and compiling a dataset
The dataset determines the results. If there is one task on which the user should spend valuable time, it is compiling a rich, meaningful dataset of specimens/sequences (Zwickl and Hillis 2002). Right from the outset, it is important to consider what hypotheses will be tested with the resulting phylogeny. If the hypothesis is that one taxon is separate from one or several other taxa, then all those taxa should be sampled along with appropriate outgroups; consultation of taxonomic expertise (if available) is advisable. It is paramount to consider previous studies of both the taxa under scrutiny and closely related taxa to achieve effective sampling. It is usually a mistake to assume that one already knows the relevant literature, and the users are advised to familiarize themselves with how to search the literature efficiently (cf. Conn et al. 2003 and Miller et al. 2009).
In particular, it often pays off to establish a set of core publications and then look for other papers that cite these core papers (through, e.g., “Cited by” in Google Scholar). If there is an opportunity to sequence more than one specimen per taxon of interest, this is highly preferable and particularly important when it comes to poorly defined taxa and/or at low taxonomic levels. In most cases it will not be enough to use whatever specimens the local herbarium or culture collection has to offer; rather the user should consider the resources available at the national and international levels. Ordering specimens or cultures may prove expensive, and the user may want to consider inviting researchers with easy access to such specimens as co-authors of the study. The invited co-authors may even oversee the local sequencing of those specimens, so that more specimens need not mean higher costs for the researcher. When considering specimens for sequencing, it should be kept in mind that it may be difficult to obtain long, high-quality DNA sequences from older specimens (cf. Larsson and Jacobsson 2004). However, factors other than age, such as how the specimen was dried and how it has been stored, also influence the quality of the DNA. There also seem to be systematic differences across taxa in how well their DNA is preserved over time; as an example, species adapted to tolerate desiccation are often found to have better-preserved DNA than short-lived mushrooms (personal observation). There are different methods that increase the chances of getting satisfactory sequences even from single cells or otherwise problematic material (see Möhlenhoff et al. 2001, Maetka et al. 2008, and Bärlocher et al. 2010).

Sampling through herbaria, culture collections, and the literature.
Mining world herbaria for species and specimens of any given genus is not as straightforward as one might think. Many herbaria (even in Western countries) are not digitized, and no centralized resource exists where all digitized herbaria can be queried jointly.