Calibration of the Multi-Gene Metabarcoding Approach As an Efficient and Accurate Biomonitoring Tool
Guang Kun Zhang
Department of Biology
McGill University, Montréal
April 2017

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Master of Science

© Guang Kun Zhang 2017

TABLE OF CONTENTS
Abstract
Résumé
Acknowledgements
Contributions of Authors
General Introduction
    References
Manuscript: Towards accurate species detection: calibrating metabarcoding methods based on multiplexing multiple markers
    References
    Tables
    Figures
General Conclusions
    References
Appendix
    Manuscript supporting information
    Additional materials

ABSTRACT

Climate change can affect biodiversity across ecosystems; hence, large-scale, time-sensitive biomonitoring tools are needed to survey global biodiversity. DNA barcoding is often used to identify single species based on single gene fragments, whereas DNA metabarcoding combines barcoding with high-throughput sequencing to survey multiple species in complex environmental samples. The metabarcoding approach has been adopted for biodiversity surveys and diet analyses, but is only now starting to be widely applied in other fields, such as the early detection of invasive species and forensic science. The choice of genetic markers used for metabarcoding can greatly affect species detection rates and the taxonomic accuracy of the species detected. An ideal genetic marker would have both high amplification success, owing to conserved priming sites, and high discrimination power, owing to divergent sequences of the amplified fragments within the taxonomic groups of interest. For many taxa, a single marker with these characteristics has proven elusive. However, only a limited number of metabarcoding studies have used multiple genetic markers to circumvent this problem and/or have cross-validated the species detected in natural environmental samples. Evolutionarily independent genetic markers with different sequence characteristics, for example a combination of mitochondrial and nuclear markers, are expected to improve both species detection rates and the accuracy of species discrimination. Only a few studies to date have used cocktails of species-specific or group-specific primer pairs to increase species detection rates.
To address these outstanding issues, we have improved species detection by calibrating a metabarcoding approach that uses multiple markers and multiple primer pairs on mock communities containing species known a priori. This approach can be applied to the biomonitoring of natural environmental samples containing similar species from the same major taxonomic groups tested here.

RÉSUMÉ

Climate change has repercussions for biological diversity across ecosystems. A rapid, large-scale biomonitoring tool is therefore needed to study global biodiversity. Molecular barcoding is often used to identify single species based on single gene fragments. The metabarcoding approach, by contrast, combines DNA barcoding and high-throughput sequencing to study multiple species in complex environmental samples. Metabarcoding has been used in biodiversity surveys and diet analyses, and has only recently begun to be applied in other fields such as the early detection of invasive species and forensic science. However, the choice of genetic markers can greatly affect species detection rates and the taxonomic accuracy of the species detected. An ideal genetic marker should have both high amplification success, owing to conserved priming sites, and high discrimination power, owing to divergent sequences of the amplified genetic fragments across all taxonomic groups of interest. Such a marker, however, is often elusive. Only a limited number of metabarcoding studies have used multiple genetic markers instead of a single marker to target more diverse taxonomic groups and/or have validated the species detected in natural environmental samples. Independently evolving genetic markers with different sequence characteristics, for example a combination of mitochondrial and nuclear markers, would be expected to improve both species detection rates and the accuracy of species detection. Moreover, only a few studies have used a mixture of species-specific or group-specific primer pairs to increase species detection rates. We therefore improved species detection with the metabarcoding approach by using a mixture of two markers and multiple primer pairs to characterize mock communities containing species known a priori. This approach can then be applied to the biomonitoring of natural environmental samples containing similar species from various taxonomic groups.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my gratitude to my supervisor, Melania Cristescu. She not only guided and supported me academically, but also mentored me in becoming a better scientist and guided me toward my future career. I truly appreciate the trust she placed in me, among all the candidates, to work on such interesting projects, given that my honours thesis had been on species distribution modeling. Because of my limited experience, and because English is my second language, writing has been my weakness in academic life. Melania lent me books on academic writing and drew on her own experience to guide me in improving my writing. In addition, she offered me opportunities to present my work at both local and international conferences, which improved my oral communication and gave me more confidence in presenting myself. Furthermore, I was a student living far from my family, and the many social activities Melania organized for the whole lab made me feel the warmth of home again.
I am also very grateful to my co-supervisor, Cathryn Abbott, of Fisheries & Oceans Canada, Pacific Biological Station. She provided valuable insights on the manuscript and thesis. Drawing on her writing experience in a government setting, she also mentored me in writing clearly, concisely, and effectively. She not only supported me academically, but also gave me the opportunity to work in her highly quality-controlled government laboratory, which supports regulatory science. She first invited me to her lab in June 2016 to meet her students and exchange ideas about my projects. She then hired me as a part-time employee from January to April 2017, so that I could apply my skills and gain government work experience. Melania and Cathryn both brought me into a fantastic network, CAISN (Canadian Aquatic Invasive Species Network). Through it, I was able to meet, receive assistance from, and build collaborations with excellent scientists from government and academic institutions across Canada. I would also like to express special thanks to the postdoctoral fellow Frédéric Chain, who served as a role model and academic mentor. He patiently took me from zero knowledge of bioinformatics to writing scripts and performing analyses on my own. He not only helped me a great deal with the scripting and data analysis for my manuscript, but also provided very valuable comments on my thesis and manuscript. I would also like to thank my committee members, Rowan Barrett and Irene Gregory-Eaves, for useful discussions and valuable insights into this project. Additionally, I thank my fellow lab members, who provided support and company: Tiffany Chin, Katie Millette, Julien Flynn, Emily Brown, Sarah Finlayson, Genelle Harrison, Alessandra Loria, James Bull, Michaela Harris, and Joanne Littlefair. Finally, I would like to thank my friends and family for supporting me in my academic and personal life, which