Next generation sequencing, direct detection and genotyping of fungi, bacteria and nematodes in the agri-food system

Prepared by: André Lévesque, et al. Agriculture and Agri-Food Canada K.W. Neatby Bldg Ottawa, ON K1Y 4X2

Contract Project Number: CRTI 09-462RD CSA: Nezih Mrad, Portfolio Manager Chem/Bio, DKTS, Centre for Security Science

The scientific or technical validity of this Contract Report is entirely the responsibility of the Contractor and the contents do not necessarily have the approval or endorsement of the Department of National Defence of Canada.

Contract Report DRDC-RDDC-2015-C148 April 2015

IMPORTANT INFORMATIVE STATEMENTS

Next generation sequencing, direct detection and genotyping of fungi, bacteria and nematodes in the agri-food system (CRTI 09-462RD) was supported by the Canadian Safety and Security Program, which is led by Defence Research and Development Canada’s Centre for Security Science, in partnership with Public Safety Canada. The project was led by Agriculture and Agri-Food Canada in partnership with CFIA Charlottetown, CFIA Ottawa, the Canadian Grain Commission, AAFC Lethbridge, AAFC Saskatoon and AAFC London.

Canadian Safety and Security Program is a federally-funded program to strengthen Canada’s ability to anticipate, prevent/mitigate, prepare for, respond to, and recover from natural disasters, serious accidents, crime and terrorism through the convergence of science and technology with policy, operations and intelligence.

© Her Majesty the Queen in Right of Canada, as represented by the Minister of National Defence, 2015 © Sa Majesté la Reine (en droit du Canada), telle que représentée par le ministre de la Défense nationale, 2015

CRTI 09-462RD / CSSP 30vv01 Agri-food pathogen detection and next-gen sequencing

CRTI 09-462RD CSSP 30vv01

Next generation sequencing, direct detection and genotyping of fungi, bacteria and nematodes in the agri-food system

Final - 30 April 2015

Led by Defence Research and Development Canada - Centre for Security Science


Co-authors

André Lévesque, Project Manager, Research Scientist, AAFC Ottawa
Wen Chen, Deputy Project Manager, Research Scientist, AAFC Ottawa
Sarah Hambleton, Research Scientist (Mycology: cereal rusts, smuts and bunts), AAFC Ottawa
Keith Seifert, Research Scientist (Mycology: ), AAFC Ottawa
Zaky Adam, Postdoctoral Fellow (Bacteriology: plant pathogens), AAFC Ottawa
Guillaume Bilodeau, Research Scientist, Ottawa Plant Laboratory (Fallowfield), CFIA, Ottawa
Jeff Cullis, Bioinformatics Programmer, AAFC Ottawa
Tigst Demeke, Research Scientist - Molecular Biology, Grain Research Laboratory, CGC Winnipeg
Tim Dumonceaux, Research Scientist (Bacteriology: endophytes of plants), AAFC Saskatoon
Marie-Claude Gagnon, Postdoctoral Fellow, Ottawa Plant Laboratory (Fallowfield), CFIA, Ottawa
Tom Graefenhan, Research Scientist - Mycologist, Grain Research Laboratory, CGC Winnipeg
Larry Kawchuk, Research Scientist (Mycology: potato pathogens), AAFC Lethbridge
Izhar Khan, Research Scientist (Bacteriology: human/animal pathogens), AAFC Ottawa
Christopher Lewis, Chief Bioinformatician for Biodiversity, AAFC Ottawa
Xiang Li, Research Scientist, Charlottetown Laboratory - Plant Health, CFIA, Charlottetown
Matthew Links, Biologist (Molecular Bioinformatics), AAFC Saskatoon
Benjamin Mimee, Research Scientist (Nematology), AAFC Saint-Jean-sur-Richelieu
Mike Rott, Research Scientist, Sidney Laboratory - Plant Health, CFIA, Saanich
James Tambong, Research Scientist (Bacteriology: plant pathogens), AAFC Ottawa
Ed Topp, Research Scientist (Bacteriology: human/animal pathogens), AAFC London
Qin Yu, Research Scientist (Nematology: plant pathogens), AAFC Ottawa
Kat Yuan, Postdoctoral Fellow, Charlottetown Laboratory - Plant Health, CFIA, Charlottetown



Abstract

This project engaged Agriculture and Agri-Food Canada (AAFC) and first responders, the Canadian Food Inspection Agency (CFIA) and the Canadian Grain Commission (CGC), to deliver on a collection of sub-projects unified by the technology of Next Generation Sequencing (NGS). The objectives focused on developing innovative methods to counter threats from fungal, bacterial and nematode pathogens. The outputs covered the full spectrum of Technology Readiness Levels (TRL). For example, de novo sequencing was accomplished for some high-risk microbes for which no genomics data were available (TRL 1), and some tests developed from the genomics data were used by Canadian first responders to resolve trade disputes and were either used or are currently being evaluated by first responders in other countries (>TRL 7). The genomes and transcriptomes of high-risk plant and animal/human pathogens found on crop commodities and in agro-ecosystems were sequenced. DNA and RNA from multiple strains of high-profile pathogenic species were processed for NGS from the following genera: Arcobacter, Ditylenchus, Globodera, Pantoea, Pectobacterium, Penicillium, Phytophthora, Puccinia, Ralstonia, Streptomyces, Synchytrium, and Tilletia. Reductions in sequencing costs over the time span of the project, together with an in-kind contribution for genomics and for data analysis infrastructure installed mid-project, allowed us to increase the number of strains processed for full genome sequencing to more than four times our original commitment. We also annotated the genomes of additional species that were not originally planned. Quantitative PCR, single nucleotide polymorphism (SNP) detection and microsatellite-based assays were developed for these targets. The threat from a few targeted species, namely Ralstonia solanacearum race 3 biovar 2 (brown rot of potato), Synchytrium endobioticum (potato wart), Globodera pallida (the pale cyst nematode of potato), G. rostochiensis (the golden nematode of potato), and Ditylenchus destructor (the potato rot nematode), increased nationally during the project. Therefore, some diagnostic technologies were transferred to end users and made operational much earlier than originally planned. The project also generated universal 'DNA barcode' sequences from environmental samples using NGS. Automated bioinformatics pipelines were developed and improved for the processing of next generation genomic and metagenomic data, an added deliverable that led to higher assembly efficiency than originally expected. A total of 322 environmental samples, approximately 30% more than planned, from cereal grains, soils from potato fields and agricultural watersheds were processed to validate the NGS approach. This project generated comprehensive sequence databases, providing an up-to-date inventory for high-risk plant pathogens. It also delivered state-of-the-art diagnostic technologies and routine monitoring strategies to improve the safety and security of the Canadian food and agriculture system, to counter challenges to international trade regulations, and to benefit the agriculture industry and communities in Canada.


Résumé

Ce projet a amené Agriculture et Agroalimentaire Canada (AAC) et les premiers intervenants, soit l’Agence canadienne d’inspection des aliments (ACIA) et la Commission canadienne des grains (CCG), à réaliser une série de sous-projets reposant sur la technologie du séquençage de nouvelle génération (SNG). Les objectifs étaient axés sur la mise au point de méthodes novatrices visant à contrer les menaces posées par des champignons, des bactéries et des nématodes pathogènes. Les résultats obtenus couvrent tous les niveaux de maturité technologique (NMT). Ainsi, un séquençage de novo a été réalisé pour des microorganismes à risque élevé pour lesquels on ne disposait d’aucune donnée génomique (NMT 1). Par ailleurs, des essais mis au point à partir des données génomiques obtenues ont été utilisés par des premiers intervenants canadiens afin de résoudre des différends commerciaux, et ont été utilisés ou sont en cours d’évaluation par les premiers intervenants d’autres pays (> NMT 7). On a également séquencé les génomes et les transcriptomes d’agents pathogènes à risque élevé pour les végétaux, les animaux et les humains, trouvés dans des produits végétaux et dans des agroécosystèmes. L’ADN et l’ARN de nombreuses souches d’espèces pathogènes importantes ont été traités par SNG. Les espèces visées appartenaient aux genres suivants : Arcobacter, Ditylenchus, Globodera, Pantoea, Pectobacterium, Penicillium, Phytophthora, Puccinia, Ralstonia, Streptomyces, Synchytrium et Tilletia. Grâce à la réduction des coûts de séquençage pendant la durée du projet et à la contribution en nature liée à la génomique et au cadre d’analyse de données mis à place à mi-parcours, nous avons pu multiplier par plus de quatre le nombre de souches traitées aux fins du séquençage du génome complet par rapport à notre engagement initial. Nous avons également annoté les génomes d’espèces additionnelles qui n’étaient pas prévues au départ. 
Nous avons mis au point des épreuves de PCR quantitatives, de détection de SNP et de génotypage par analyse de microsatellites pour les espèces ciblées. La menace posée par certaines d’entre elles, en l’occurrence le Ralstonia solanacearum race 3 biovar 2 (pourriture brune de la pomme de terre), le Synchytrium endobioticum (gale verruqueuse), le Globodera pallida (nématode blanc de la pomme de terre), le G. rostochiensis (nématode doré de la pomme de terre) et le Ditylenchus destructor (nématode de la pourriture des racines de la pomme de terre), a augmenté à l’échelle nationale au cours du projet. Certaines technologies diagnostiques ont donc été transférées aux utilisateurs finaux et ont pu être appliquées beaucoup plus tôt que ce qui avait été prévu au départ. Le projet a également permis d’obtenir, au moyen du SNG, des séquences de codes-barres universels à partir d’échantillons environnementaux. Des pipelines bioinformatiques automatisés ont été mis au point pour le traitement des données génomiques et métagénomiques de nouvelle génération. Ce produit livrable additionnel a mené à un taux d’efficacité d’assemblage plus élevé que prévu. Un total de 322 échantillons environnementaux, soit 30 % de plus que prévu, provenant de grains céréaliers, du sol de champs de pomme de terre et de bassins versants agricoles, ont été traités aux fins de validation du SNG. Ce projet a entraîné la création de bases de données de séquences exhaustives, constituant un inventaire à jour des phytopathogènes à risque élevé. Il a également permis de mettre au point des technologies diagnostiques de pointe et des stratégies de surveillance courante visant à améliorer la salubrité et la sécurité du système agricole et agroalimentaire canadien, à résoudre les difficultés liées à la réglementation du commerce international et à favoriser l’industrie agricole et les collectivités du pays.


Executive summary

Introduction or background: Detection technologies are the cornerstone of disease prevention but are also essential for response and recovery. In this continuum, characterization of existing baseline levels of high-risk pathogens and closely related species is essential to unambiguously detect new outbreaks and to monitor mitigation protocols and parameters. Technologies that go beyond species identification and allow strain typing enable responders to determine the source of an outbreak, compare outbreak strains with endemic populations, and determine whether response and recovery protocols are warranted. The decreased cost of Next Generation Sequencing (NGS) technologies provides opportunities to exploit the full capability of current and upcoming NGS for pathogen monitoring and genotyping. Lower costs also allow us to build a more comprehensive database of complete genomes of all high-risk pathogens, especially their most virulent genotypes, and the genomes of some of their closest relatives. Furthermore, to prepare Canada for new challenges that may arise from the affordability of NGS in screening import commodities for pathogens in any country, we need to improve the accuracy and robustness of NGS data interpretation with validated analyses. During our previous work funded by the Canadian Safety and Security Program (CSSP), under the CBRNE Research & Technology Initiative at the time (project CRTI-04-0045RD, 2005-2009), some targeted organisms became national and international issues and the technologies we developed were immediately implemented by first responders in Canada and the US. The previous project led to major improvements in operational effectiveness at the Canadian Food Inspection Agency (CFIA). New Standard Operating Procedures (SOP) were developed and transferred, validating our approach of including the designated emergency incident responders in the project plan from the onset.
Examples from that project include the now routine use of an SOP for Phytophthora ramorum by the USDA (APHIS Elicitin WI-B-T-1-7) and implementation by the Canadian Grain Commission (CGC, not a partner in the original project) of a critical detection technology for mycotoxigenic Fusarium graminearum in grains. Several peer-reviewed publications and protocols routinely used by end users were direct outputs of CRTI 04-0045RD. Further, the foreign postdoctoral fellow, Tom Gräfenhan, recruited to Canada to work on CRTI 04-0045RD is now a Canadian citizen and Research Scientist at CGC, and the Ph.D. candidate from the project, Guillaume Bilodeau, is now a Research Scientist at CFIA. Both were collaborators in the recently completed project. We appointed the deputy project manager, Wen Chen, to an indeterminate Research Scientist position at the lead federal department.

The main objective of this collaborative project between the S&T Branch of Agriculture and Agri-Food Canada (STB-AAFC) and the two main agencies monitoring pathogens in our food system, i.e. the CFIA and CGC, was to reduce the risk of biological threats to the Canadian agri-food system by exploiting new generations of sequencing and genotyping technology. The specific objectives were to: A) sequence the genomes and transcriptomes of selected high-risk plant pathogens with NGS technology; B) develop quantitative PCR assays for species-specific detection; C) develop genotyping assays by microsatellite analysis or Sanger sequencing; D) develop methods for direct detection of genotypes by allele specific oligonucleotide (ASO) PCR; and perform pyrosequencing of universal 'barcode' PCR products from: E) agricultural soils; F) cereal grains; and G) agriculturally impacted water samples in areas of high intensity of livestock production.

Results: Baseline data and DNA banks/libraries were generated, genetic variability within species of plant pathogenic microbes occurring in the environment was determined, and PCR assays for the detection of important genotypes were developed and validated. Genomes of the most important pathogens of cereal grains and potatoes were sequenced. The microbial profiles of soils from potato fields and water samples from agricultural and urban watersheds were analyzed by NGS to improve the safety and security of the Canadian food and agriculture system. NGS technologies were deployed as a monitoring tool and as an effective means of generating comprehensive sequence databases. Metagenomic data generated by
NGS were used to validate detection and identification tools for targeted microbes. Project team members covered the full innovation continuum from discovery in research laboratories to adoption by end users. Many assays have been transferred to CFIA and CGC, several draft genomes of bacteria have been published, and others will be released shortly. The current project targeted the following plant pathogens of cereal grains (listed with the disease they cause): Pantoea (Erwinia) stewartii (Ps, Bacterium) Stewart's wilt of corn, Penicillium verrucosum (Pv, Fungus) Ochratoxin contamination, Tilletia indica (Ti, Fungus) Karnal bunt of wheat, Tilletia controversa (Tc, Fungus) Dwarf bunt of wheat, Tilletia caries (Tca, Fungus) Common bunt of wheat, Tilletia walkeri (Tw, Fungus) Ryegrass bunt, Puccinia striiformis f. sp. tritici (Pst, Fungus) Wheat stripe rust, Puccinia triticina (Pt, Fungus) Wheat leaf rust, and Urocystis agropyri (Ua, Fungus) Flag smut of wheat. The targeted plant pathogens of potato and their diseases were Ralstonia solanacearum race 3 biovar 2 (Rs, Bacterium) Brown rot, Pectobacterium wasabiae (Pw, Bacterium) Blackleg-like, Pectobacterium carotovorum subsp. brasiliense (Pcb, Bacterium) Blackleg-like, Streptomyces scabies (Ss, Bacterium) Potato scab, Phytophthora infestans (Pi, Fungus-like) Late blight, Synchytrium endobioticum (Se, Fungus) Potato wart, Globodera pallida (Gp, Nematode) Pale potato cyst nematode, Globodera rostochiensis (Gr, Nematode) Golden potato cyst nematode, Ditylenchus destructor (Dd, Nematode) Potato rot nematode, and Ditylenchus dipsaci (Ddi, Nematode) Stem nematode. Three Arcobacter species, a bacterial group causing diarrhea in humans, were also targeted: Arcobacter lanthierii (Al), Arcobacter septicus (As), and Arcobacter cibarius (Ac, as a close relative).
Seven species in this list (Tc, Tw, Pcb, Ds, Al, As, Ac) were not in our original charter, whereas two were removed from the list, either because we could not access material (Ua) or because the genome we were expecting to access from a European consortium was not released until 2014 (Gp). In collaboration with the National Research Council of Canada, genome and transcriptome sequencing by NGS (objective A) was performed as planned for all species (with the additions and exceptions mentioned above). Our original plan was to process only one strain per species for genome sequencing, but we more than quadrupled the number of sequenced strains. We were expecting to analyze published genome data for some targets but sequenced our own samples because the expected data were not released or were limited (Gr and Pt). Although annotation of genomes was not included in the original objectives, except for Pv, we performed this work for nine species (Ac, Al, As, Ps, Pcb, Pw, Rs, Pv, Se) and for three species closely related to targets that were not on the original list (Pantoea ananatis, Penicillium nordicum, and Penicillium thymicola). A version control system was established and 1.5 TB of final genome assembly data is now available to all collaborators on an internal website tracking our genomics and metagenomics results. For objective B, the project delivered six (6) of the eight (8) species detection assays it had planned originally (Ps, Ti, Tc, Ss, Se, Dd) and added six (6) new species (Tc, Tw, Pcb, Dd, Al, As). For the ochratoxin-producing species (Pv), one of the original eight for which assays are not yet completed, significant additional resources were invested and assays targeting the toxin gene cluster will be available next year. Urocystis agropyri, the second target lacking completed assays, was removed early in the project.
For genotyping (objective C), the project delivered seven (7) (Pst, Pt, Rs, Pw, Pi, Se, Gr) of the eight (8) genotyping assays originally planned and added three (3) additional ones (Pv, Al, As). The missing assay was for Globodera pallida, removed from the project because the genome was not released on time by the European consortium. For direct detection of genotypes (objective D), assays were developed for five (5) of the six (6) targets (Pst, Pt, Rs, Pi, Se), and detection of one (1) Psb strain was added. The missing one (Gr) will be completed through a new project funded by AAFC. The 454 pyrosequencing of universal ‘barcode’ PCR products from the bacterial 16S rRNA gene (16S) and the nematode/fungal internal transcribed spacer (ITS) was performed on DNA templates extracted from soil samples collected from potato fields with a range of disease outbreak or quarantine histories (objective E). A similar approach was followed for fungi and bacteria to compare microbial composition between high quality grain and visibly diseased samples collected from various cereal growing areas, adding cpn60 as a barcode marker (objective F). Finally, bacterial 16S was sequenced by 454 from water samples in agricultural watersheds with intensive animal husbandry practices, following manure application (objective G). The total number of samples processed for these three objectives was 322, approximately 30% more than the original plan of 250. A total of 2 TB of metagenomic data was analyzed and is available to all collaborators. Moreover, several new bioinformatics tools for data analysis were developed, as we uncovered some issues with the “off-the-shelf” tools that we were originally planning to use.
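The deliverable counts reported above can be cross-checked with a short tally. The following is a minimal sketch that uses only the numbers stated in this summary; the class and function names are illustrative assumptions and are not part of the project's bioinformatics pipelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    """Assay counts for one objective, as stated in the executive summary."""
    label: str
    planned: int    # assays in the original plan
    delivered: int  # originally planned assays that were completed
    added: int      # assays added during the project

    @property
    def total(self) -> int:
        """All assays delivered, planned or added."""
        return self.delivered + self.added


# Figures taken from the Results paragraph (objectives B, C and D).
OBJECTIVES = [
    Objective("B: species detection", planned=8, delivered=6, added=6),
    Objective("C: genotyping", planned=8, delivered=7, added=3),
    Objective("D: direct genotype detection", planned=6, delivered=5, added=1),
]


def percent_over(actual: int, planned: int) -> float:
    """Percentage by which `actual` exceeds `planned`."""
    return 100.0 * (actual - planned) / planned


for obj in OBJECTIVES:
    print(f"{obj.label}: {obj.delivered}/{obj.planned} planned, "
          f"{obj.added} added, {obj.total} total")

# Environmental samples for objectives E to G: 322 processed, 250 planned.
print(f"Samples over plan: {percent_over(322, 250):.1f}%")
```

Under these figures, 322 samples against a plan of 250 is an increase of 28.8%, which the summary rounds to 30%.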

Significance: Because first responder agencies were involved, uptake of many of the new technologies was immediate. Some assays were used in response to new incidents of high importance to Canada (e.g., new findings of potato wart in PEI). The data acquired and the bioinformatics approaches developed already have broad applicability for Canada, with tangible impact on major trade issues. This CSSP project was seminal in the establishment of a new high-level USDA-AAFC collaboration on NGS, which is strongly endorsed by the USDA Under Secretary and the AAFC Deputy Minister. This collaboration would not have occurred without this CSSP project.

Future plans: This project generated many assays that reached at least TRL 4. Several new projects on the organisms and technologies developed here were approved at CFIA and AAFC, some including the CGC. These projects will increase the TRL of the assays, expanding the number of fully operational tests and tools. Proposals have been submitted on both sides of the border and outcomes so far have been successful.


Sommaire

Introduction ou contexte : Les technologies de détection sont la pierre d’assise de la prévention des maladies, mais elles jouent aussi un rôle essentiel en matière d’interventions et de rétablissement. Dans ce continuum, il est crucial de caractériser les niveaux de référence actuels des pathogènes à risque élevé et des espèces étroitement apparentées pour pouvoir détecter sans équivoque les nouvelles éclosions et assurer le suivi des protocoles et des paramètres d’atténuation employés. Grâce aux technologies qui permettent non seulement l’identification des espèces, mais aussi le typage des souches, les intervenants peuvent déterminer la source des éclosions, comparer les souches à l’origine des éclosions aux populations endémiques et déterminer si la mise en œuvre de protocoles d’intervention et de rétablissement est nécessaire. Compte tenu du coût réduit des technologies de séquençage de nouvelle génération (SNG), il sera possible d’utiliser pleinement ces technologies, aujourd’hui et à l’avenir, pour assurer la surveillance des pathogènes et procéder à leur génotypage. Leur coût moindre nous permet également d’établir une base de données exhaustive du génome complet de tous les pathogènes à risque élevé, particulièrement de leurs génotypes des plus virulents, et du génome de quelques-unes des espèces les plus étroitement apparentées. De plus, pour préparer le Canada à relever les défis qui pourraient être associés au coût abordable du SNG pour la détection des pathogènes dans les produits importés de n’importe quel pays, nous devons améliorer l’exactitude et la robustesse de l’interprétation des données de SNG au moyen d’analyses validées. 
Au cours des travaux financés par le Programme canadien pour la sûreté et la sécurité (PCSS), dans le cadre de l’Initiative de recherche et de technologie CBRNE (projet CRTI-04-0045RD, 2005- 2009), certains organismes ciblés sont devenus problématiques à l’échelle nationale et internationale, et les technologies que nous avions mises au point ont été immédiatement mises en œuvre par les premiers intervenants, au Canada et aux États-Unis. Grâce au projet, l’Agence canadienne d’inspection des aliments (ACIA) a pu réaliser des gains importants sur le plan de l’efficacité opérationnelle. On a élaboré et transmis de nouvelles procédures opératoires normalisées (PON) qui valident notre approche consistant à inclure dès le départ les intervenants d’urgence désignés dans le plan de projet. Citons comme exemple la PON sur le Phytophthora ramorum, maintenant couramment utilisée par l’USDA (APHIS Elicitin WI- B-T-1-7) et la mise en œuvre, par la Commission canadienne des grains (qui ne participait pas au projet initial) d’une technologie de détection essentielle pour le Fusarium graminearum mycotoxinogène dans les céréales. Plusieurs publications évaluées par des pairs et des protocoles couramment utilisés par les utilisateurs finaux découlent directement du projet CRTI 04-0045RD. En outre, Tom Gräfenhan, boursier postdoctoral recruté de l’étranger pour travailler au projet CRTI 04-0045RD, est maintenant citoyen canadien et travaille comme chercheur scientifique à la Commission canadienne des grains, et Guillaume Bilodeau, candidat au doctorat dans le cadre du projet, est maintenant chercheur scientifique à l’ACIA. Tous deux ont collaboré au projet qui a pris fin récemment. Nous avons nommé Wen Chen, gestionnaire de projet adjoint, à un poste de chercheur scientifique de durée indéterminée au ministère fédéral responsable.

Le principal but de ce projet de collaboration entre la Direction générale des sciences et de la technologie d’Agriculture et Agroalimentaire Canada (DGST-AAC) et les deux principaux organismes chargés de surveiller les pathogènes dans notre système alimentaire, soit l’ACIA et la CCG, était de réduire le risque des menaces biologiques qui planent sur le système agroalimentaire canadien par l’exploitation des nouvelles technologies de séquençage et de génotypage. Les objectifs suivants étaient visés : A) séquencer les génomes et les transcriptomes de certains phytopathogènes à risque élevé au moyen de la technologie du SNG; B) mettre au point des épreuves de PCR quantitatives permettant de détecter des phytopathogènes spécifiques; C) mettre au point des épreuves de génotypage par analyse de microsatellites ou séquençage Sanger; D) élaborer des méthodes de détection directe de génotypes par la
PCR spécifique d’allèles (ou ASO, pour Allele Specific Oligonucleotide) et réaliser le pyroséquençage des produits de PCR amplifiés avec des code-barres universels provenant E) de sols agricoles; F) de grains céréaliers et G) d’échantillons d’eau prélevés dans des zones d’élevage intensif. Résultats : Le projet a permis d’établir des données de référence et des banques d’ADN, de déterminer la variabilité génétique d’espèces de phytopathogènes présents dans l’environnement et de mettre au point et valider des épreuves de PCR pour la détection de génotypes importants. Les génomes des principaux pathogènes présents dans les grains céréaliers et les pommes de terre ont été séquencés. Le profil microbien de sols provenant de champs de pomme de terre et d’échantillons d’eau provenant de bassins versants agricoles et urbains a été analysé par SNG dans le but d’améliorer la salubrité et la sécurité du système agricole et agroalimentaire canadien. Les technologies de SNG ont été mises à contribution comme outil de surveillance et comme moyen efficace d’établir des bases de données de séquences exhaustives. Les données métagénomiques obtenues par SNG ont servi à valider les outils de détection et d’identification utilisés pour les microorganismes ciblés. Les membres de l’équipe de projet ont couvert l’ensemble du continuum de l’innovation, depuis le stade de la découverte en laboratoire jusqu’à l’adoption par les utilisateurs finaux. De nombreuses épreuves ont été transférées à l’ACIA et à la CCG, et plusieurs ébauches de génomes de bactéries ont été publiées; d’autres le seront sous peu. 
Le projet en cours visait les phytopathogènes suivants, présents dans les grains céréaliers (la maladie causée est indiquée dans chaque cas) : le Pantoea (Erwinia) stewartii (Ps, bactérie) causant la maladie de Stewart chez le maïs, le Penicillium verrucosum (Pv, champignon) à l’origine de la contamination par l’ochratoxine, le Tilletia indica (Ti, champignon) causant la carie indienne chez le blé, le Tilletia controversa (Tc, champignon) causant la carie naine chez le blé, le Tilletia caries (Tca, champignon) causant la carie commune chez le blé, le Tilletia walkeri (Tw, champignon) causant la carie du ray-grass, le Puccinia striiformis f. sp. tritici (Pst, champignon) causant la rouille jaune chez le blé, le Puccinia triticina (Pt, champignon) causant la rouille des feuilles chez le blé, et l’Urocystis agropyri (Ua, champignon) causant le charbon des feuilles du blé. Les phytopathogènes ciblés pour la pomme de terre étaient les suivants : le Ralstonia solanacearum race 3 biovar 2 (Rs, bactérie) causant la pourriture brune, le Pectobacterium wasabiae (Pw, bactérie) causant des symptômes semblables à ceux de la jambe noire, le Pectobacterium carotovorum subsp brasiliense (Pcb, bactérie) causant aussi des symptômes semblables à ceux de la jambe noire, le Streptomyces scabies (Ss, bactérie) causant la gale commune, le Phytophthora infestans (Pi, organisme semblable à un champignon) causant le mildiou, le Synchytrium endobioticum (Se, champignon) causant la gale verruqueuse, le Globodera pallida (Gp, nématode) ou nématode blanc, le Globodera rostochiensis (Gr, nématode) ou nématode doré, le Ditylenchus destructor (Dd, nématode) ou nématode de la pourriture des racines et le Ditylenchus dipsaci (Ddi, nématode) ou nématode des tiges et des bulbes. 
Trois espèces d’Arcobacter, un groupe de bactéries causant la diarrhée chez les humains, étaient également ciblées : l’Arcobacter lanthierii (Al), l’Arcobacter septicus (As) et l’Arcobacter cibarius (Ac, espèce étroitement apparentée). Sept espèces figurant dans cette liste, en l’occurrence les espèces Tc, Tw, Pcb, Ds, Al, As et Ac, ne faisaient pas partie de notre mandat initial, tandis que deux autres ont été retirées de la liste, parce que nous n’avons pas eu accès au matériel (dans le cas de l’Ua) ou que le génome que devait fournir un consortium européen n’a pas été diffusé avant 2014 (dans le cas du Gp). En collaboration avec le Conseil national de recherches du Canada, le séquençage des génomes et des transcriptomes au moyen du SNG (objectif A) a été réalisé comme prévu pour toutes les espèces (compte tenu des ajouts et des exceptions mentionnés ci-dessus). Nous voulions au départ procéder au séquençage du génome d’une seule souche par espèce, mais nous avons finalement plus que quadruplé le nombre de souches séquencées. Alors que nous avions prévu analyser les données de génomes publiés pour certains organismes ciblés, nous avons dû séquencer nos propres échantillons, car les données attendues n’ont pas été diffusées ou étaient limitées (Gr et Pt). Bien que l’annotation des génomes n’était pas prévue dans les objectifs initiaux, sauf dans le cas du Pv, nous avons fait ce travail pour neuf espèces (Ac, Al, As, Ps, Pcb, Pw, Rs, Pv, Se) et pour trois espèces étroitement apparentées qui ne figuraient pas dans la liste originale (le Pantoea ananatis, le Penicillium nordicum et le Penicillium thymicola). Un
système de contrôle des versions a été établi, et 1,5 To de données d’assemblage du génome finales sont maintenant à la disposition de tous les collaborateurs sur un site Web interne regroupant nos résultats en matière de génomique et de métagénomique. Dans le cadre de l’objectif B, nous avons mis au point des épreuves de détection pour six des huit espèces qui étaient prévues au départ (Ps, Ti, Tc, Ss, Se, Dd) et pour six espèces additionnelles (Tc, Tw, Pcb, Dd, Al, As). Dans le cas de l’espèce productrice d'ochratoxine (Pv), l’une des espèces pour laquelle l’épreuve n’a pas encore été réalisée, des ressources supplémentaires considérables ont été investies, et des épreuves ciblant le groupe de gènes de la toxine seront disponibles l’an prochain. L’Urocystis agropyri, la seconde espèce ciblée pour laquelle l’épreuve n’a pas été réalisée, a été retiré de la liste au début du projet. En ce qui concerne le génotypage (objectif C), nous avons mis au point des épreuves de génotypage pour sept des huit espèces prévues au départ (Pst, Pt, Rs, Pw, Pi, Se, Gr) et nous en avons ajouté trois autres (Pv, Al, As). L’épreuve manquante est celle du Globodera pallida, qui a été retirée du projet parce que le génome n’a pas été diffusé à temps par le consortium européen. Pour ce qui est de la détection directe des génotypes (objectif D), nous avons mis au point des épreuves pour cinq des six espèces ciblées (Pst, Pt, Rs, Pi, Se) et nous avons ajouté la détection d’une souche du Psb. L’épreuve manquante (Gr) sera mise au point dans le cadre d’un nouveau projet financé par AAC.

We performed 454 pyrosequencing of PCR products from soil samples taken from potato fields with a history of outbreaks or quarantine, amplified with universal barcodes (bacterial 16S rRNA and the internal transcribed spacer [ITS] of nematodes and fungi) (objective E). We carried out further work using a similar approach to compare the microbial composition (fungi and bacteria) of high quality grain samples with that of visibly contaminated samples from different cereal-growing areas, adding cpn60 as a barcode marker (objective F). Finally, we performed 454 pyrosequencing of bacterial 16S rRNA from water samples collected in agricultural watersheds with intensive livestock production, following manure application (objective G). A total of 322 samples were processed under these three objectives, 30% more than the 250 originally planned. In all, 2 TB of metagenomic data were analyzed and made available to all collaborators. In addition, several new bioinformatics tools were developed for data analysis, because some commercial products we had planned to use proved problematic.

Significance: Thanks to the engagement of the front-line agencies, several of the new technologies were adopted immediately. Some assays were used in response to new incidents of high importance to Canada (e.g., new cases of potato wart in PEI). The data acquired and the bioinformatics tools developed already have many applications in Canada and a concrete impact on major trade issues. This Canadian Safety and Security Program (CSSP) project contributed greatly to closer collaboration between the USDA and AAFC on NGS, a collaboration strongly supported by the USDA Under Secretary and the AAFC Deputy Minister. This collaboration would not have been possible without this CSSP project.

Future plans: This project led to the development of many assays that reached at least TRL 4. Several new projects on the targeted organisms and the technologies developed have been approved at CFIA and AAFC, including some at the Canadian Grain Commission. These projects will raise the TRL of the assays and increase the number of fully operational tests and tools. Proposals have been submitted on both sides of the border, and the results so far are positive.

Table of contents

Abstract
Résumé
Executive summary
Sommaire
Table of contents
List of figures
List of tables
Acknowledgements
1 Introduction
2 Purpose
3 Methodology
  A) Genome and transcriptome sequencing by next generation technology
  B) Quantitative real-time PCR assays for species-specific detection
  C) Genotyping of isolates by microsatellite analysis or Sanger sequencing
  D) Direct detection of genotypes by ASO-PCR
  E) Next generation sequencing of universal 'barcode' PCR products from agriculture soils
  F) Next generation sequencing of universal 'barcode' PCR products from cereal grains
  G) Next generation sequencing of universal 'barcode' PCR products from water/manure samples from high density livestock areas
4 Results
  4.1 Genome sequencing, assembly and annotation at a glance
    4.1.1 Bioinformatics pipelines
    4.1.2 Impact and relevance to the identified priority and gap addressed by the project
    4.1.3 Lessons Learned and implementation plan of the Lessons Learned
    4.1.4 New capabilities, partnerships and networks created through the horizontal work of the project
  4.2 Targeted organisms
    4.2.1 Pantoea stewartii (Ps) & Streptomyces scabies (Ss)
      4.2.1.1 Genome sequencing and diagnostic tools
      4.2.1.2 Lessons learned and implementation plan of the lessons learned
      4.2.1.3 New capabilities, partnerships and networks created through the horizontal work of the project
    4.2.2 Ralstonia solanacearum (Rs)
      4.2.2.1 Genome and transcriptome sequencing
      4.2.2.2 Diagnostic tools
      4.2.2.3 Follow-up activities and new R&D initiated
    4.2.3 Pectobacterium wasabiae (Pw)
      4.2.3.1 Genome and transcriptome sequencing
      4.2.3.2 Diagnostic tools
      4.2.3.3 Follow-up activities and new R&D initiated
    4.2.4 P. carotovorum subsp. brasiliense (Pcb)
    4.2.5 Arcobacter lanthierii (Al), A. septicus (As) & A. cibarius (Ac)
      4.2.5.1 Genome and transcriptome sequencing
      4.2.5.2 Diagnostic tools and its impact and relevance to the identified priority and gap addressed by the project
      4.2.5.3 Lessons learned and implementation plan of the lessons learned
      4.2.5.4 New capabilities, partnerships and networks created through the horizontal work of the project
    4.2.6 Penicillium verrucosum (Pv)
      4.2.6.1 Genome and transcriptome sequencing
      4.2.6.2 Diagnostic tools
    4.2.7 Tilletia
      4.2.7.1 Genome and transcriptome sequencing
      4.2.7.2 Diagnostic tools
      4.2.7.3 Impact and relevance to the identified priority and gap addressed by the project
      4.2.7.4 Lessons learned and implementation plan of the lessons learned
      4.2.7.5 New capabilities, partnerships and networks created through the horizontal work of the project
      4.2.7.6 Transition and exploitation (this is our validation and protocol transfers)
    4.2.8 Puccinia
      4.2.8.1 Diagnostic tools
      4.2.8.2 Impact and relevance to the identified priority and gap addressed by the project
      4.2.8.3 Lessons learned and implementation plan of the lessons learned
      4.2.8.4 New capabilities, partnerships and networks created through the horizontal work of the project
    4.2.9 Phytophthora infestans (Pi)
      4.2.9.1 Diagnostic tools
    4.2.10 Synchytrium endobioticum (Se)
      4.2.10.1 Genome and transcriptome assembly and annotation
      4.2.10.2 Diagnostic tools
    4.2.11 Globodera pallida (Gp)
    4.2.12 Globodera rostochiensis (Gr)
    4.2.13 Ditylenchus destructor (Dd)
    4.2.14 Ditylenchus dipsaci (Ddi)
  4.3 Metagenomics from agricultural soil, commodity, agricultural & urban watershed
    4.3.1 Bioinformatics tools developed for pathogen monitoring and biodiversity studies
    4.3.2 NGS data analysis
    4.3.3 Comparison of data with cpn60
    4.3.4 Impact, transition and exploitation
5 Transition and Exploitation
  5.1 Genomics and Metagenomics data
  5.2 Validation of the diagnostic tools
  5.3 New projects initiated for further development and transition
  5.4 Success stories
    Improved capacity in agri-food research for high risk pathogens
    Impact of project on policy development and government priorities
    Karnal bunt of wheat
    Detection of pathogens in grains by Next Generation Sequencing (NGS) technology
    Potato wart
    Potato rot nematode
    Potato brown rot
    Ochratoxin A (OTA)
    Wheat Stripe rust
6 Conclusion
References
Annex A Project Team
Annex B PROJECT PERFORMANCE SUMMARY
  Schedule Performance Summary
  Cost Performance Summary
Annex C Publications, Presentations, Patents
  6.1 Publications
    6.1.1 Software release
    6.1.2 Journal articles
  6.2 Presentations
  6.3 Presentations to senior management
List of symbols/abbreviations/acronyms/initialisms
List of target organisms
  Partner/Scientist
  Software Names and Sources
Glossary

List of figures

Figure 1: Gantt chart showing the different objectives, the schedule, the linkages to milestones, and the responsibilities for the different partners.
Figure 2: Principal Component Analysis with Bruvo’s distance of Potato wart isolates from different geographical origins and of different pathotypes (Se01-Se17).

List of tables

Table 1: List of target organisms from the most up to date charter with the partner responsible for each objective pertaining to this species. List of organisms and partner initials given at the end.
Table 2: Summary of genomes sequenced
Table 3: Details of genomes sequenced
Table 4: Diagnostic test development for Tilletia spp.
Table 5: Expected genotypes and profiles of the five dominant Canadian genotypes of Phytophthora infestans at nine optimized allele-specific oligonucleotide (ASO)-PCR assays.
Table 6: Metagenomic sample summary

Acknowledgements

This work was supported by Defence Research and Development Canada Centre for Security Science Chemical, Biological, Radiological/Nuclear, and Explosives Research and Technology Initiative (CRTI 09-462RD / CSSP 30vv01), Agriculture and Agri-Food Canada, the Canadian Food Inspection Agency and the Canadian Grain Commission.

We want to acknowledge the support of the past CSSP Biology portfolio managers, namely Norm Yanofsky and Diana Wilkinson; their dedication was essential to our success. We are also grateful to Helen Spencer, Biology portfolio manager during an earlier CRTI project, who instilled in us early on the solid project management work ethic that is essential for such a large project. We want to thank members of the Project Review Committee for guidance and support, including Peter Burnett, former Director of the CGC Grain Research Laboratory.

We want to thank Jeff Cullis and Christine Lowe at AAFC, for the bioinformatics work; Julie Chapados, Wayne McCormick and Ekaterina Ponomavera for molecular work; and Andrew Sharpe of NRC Saskatoon for the sequencing.

1 Introduction

The new generation of sequencing technologies exploited in this project were Illumina sequencing and Roche 454 pyrosequencing (Metzker, 2010), selected as appropriate for each objective. Each of eight flow cells in an Illumina sequencer delivers ~1 billion base pairs (bp) of DNA sequence/run, but reads are short (36-150 bp). 454 pyrosequencing delivers 500 million bp/run, i.e. 20 times less, but sequences are longer, at over 400 bp (750 bp by the end of the project). We used Illumina to sequence complete genomes and cDNA libraries from clean samples. These data were used to develop PCR assays to detect selected species and genotypes. We used pyrosequencing to analyze DNA from environmental samples of soil, cereal grain, and agricultural and urban water, using DNA markers commonly used for species identification, because longer sequences are generally required for reliable species diagnostics.

The specimen-based genomics objectives were to identify diagnostic microsatellite regions and genes with abundant single nucleotide polymorphisms (SNPs). Strain typing is vital to detect highly virulent genotypes and to trace back the origins of accidental or deliberate outbreaks. With NGS, an entire eukaryotic genome or transcriptome (<100M bp) can be easily sequenced at several-fold coverage. Using these sequences, species detection assays were developed for pathogens not covered in our previous CRTI project. The environmental genomics objectives were to characterize the complete microbiota by next generation "deep" sequencing of barcodes in agricultural/urban surface water, potato farm soil and cereal grains. Samples with a known history of targeted pathogens were compared to matched controls expected to be free of these pathogens.

The outputs of this project were validated using real-time PCR assays for selected agents and quarantine fungi and bacteria of direct importance to Canada’s economy and environmental health. The resulting genomic and metagenomic data will serve as resources and benchmarks to assess future threats. The outputs will have significant application for detection and/or identification of high risk plant pathogens and mycotoxigenic fungi in cereal grains and potato fields, and their interest to regulatory agencies is evident from the participation of CFIA and CGC. The assays can be used by all levels of government, Canada’s S&T focused federal departments, and commercial laboratories involved in assurance of grain safety nationally and internationally. For example, Penicillium verrucosum is a prime concern for Canada’s European trade partners because it produces ochratoxin A, which is of considerable concern for human health; this risk profile gives this fungus significant potential for misuse, an issue that will be mitigated by the assays and data developed here.

The collaboration with CFIA and CGC, the first responder agencies, allowed immediate uptake of technologies developed in the current project. Our analysis of environmental metagenomic data focused on improving the accurate detection of species and subspecies, rather than the higher categories that are the focus of most contemporary metagenomic analyses. The new bioinformatics tools developed in this project provided proof of concept for accurately differentiating pathogenic and non-pathogenic relatives using NGS data. Microorganisms by definition are invisible, and their presence and activities generally go unseen even by the scientific community. The methods will allow routine monitoring of the Canadian environment (preparedness, prevention), and will also be essential for response and recovery activities following microbiological events. These new opportunities will be a major advance in operational capability.

2 Purpose

Detection technologies are the cornerstone of prevention but are also essential for response and recovery. In this continuum, characterization of existing baseline levels of high risk pathogens and closely related species is essential to unambiguously detect new outbreaks and monitor recovery protocols and parameters. Technologies that go beyond species identification and allow strain typing enable responders to determine the source of an outbreak, compare outbreak strains with endemic populations, and determine whether response and recovery protocols are required.

During our previous work (CRTI 04-0045RD), some targets became national and international issues and the technologies we developed were immediately implemented by responders in Canada and the US. Our previous project led to major improvements in operational effectiveness at the Canadian Food Inspection Agency: new Standard Operating Procedures (SOPs) were developed and transferred, validating our approach of including the designated emergency incident responders in planning from the outset. From that project, another SOP is routinely used by the USDA (APHIS Elicitin WI-B-T-1-7), and the Canadian Grain Commission (not a partner in the original project) implemented a critical detection technology for Fusarium.

There was an urgent need to follow up on CRTI 04-0045RD because of the new opportunity provided by recent advances in DNA sequencing technology, an Emerging S&T that we must exploit because of new or emerging major high risk plant, animal and human pathogens in the Canadian agri-food system. Other countries are moving in this direction for monitoring imported agri-food. Canada will be at risk if it does not generate its own data and capability with this new generation of sequencing technologies.

Our main objective was to reduce the risk of biological threats to the Canadian agri-food system, expand the targets identified in our earlier CRTI project (04-0045RD) to include bacteria and nematodes as well as more fungi, and develop new technological capabilities, databases and tools to support responder networks. We wanted to exploit new generations of sequencing and genotyping technologies to generate baseline data, DNA banks, and validated real-time PCR assays for species detection or genotype characterization of microbes in the environment. To improve the safety and defence of the Canadian food and agriculture system (priority SFS/SCA), we targeted the most important pathogens of cereal grains and potatoes, soils from potato fields, and manure-contaminated water. NGS technology was to be used both as a monitoring tool (priority EST/NST), and as an effective means of generating comprehensive sequence data to develop and validate detection and identification tools for high risk biologicals. This was a collaborative project between the S&T Branch of Agriculture and Agri-Food Canada (AAFC) and the two main agencies monitoring pathogens in our food system, i.e., the Canadian Food Inspection Agency (CFIA) and the Canadian Grain Commission (CGC). AAFC provided expertise in next generation sequencing, biosystematics, genomics and transcriptomics, bioinformatics support and infrastructure, and biological collections with reference strains. The agencies provided the end user perspective, expertise in developing detection tools for regulatory purposes, and the Quality Assurance and Quality Control expertise required by first responders. The project team covered the full innovation continuum from discovery in research laboratories to adoption by end users.

The objectives were to: A) sequence the genome and transcriptome of selected plant pathogens using next generation technology; B) develop quantitative real-time PCR assays for species-specific detection; C) develop genotyping assays by microsatellite analysis or Sanger sequencing; D) develop direct detection of genotypes by allele-specific PCR; and perform pyrosequencing of universal “barcode” PCR products from E) agriculture soils, F) cereal grains, and G) manure-contaminated agricultural water samples from intensive livestock farming areas.

The plan followed a project management matrix, whereby deliverables were defined based on objectives and pathogens targeted (see Figure 1 and Table 1). Technical and scientific knowledge at the outset varied greatly among pathogens, which meant that tool development for end users began at different stages. For genomes already available, de novo genomic sequencing was not planned, and re-sequencing was only used for confirmation and validation. When genomes were very large, we focussed on sequencing transcriptomes to find SNPs, which reduced cost and risk.

3 Methodology

Major tasks and associated target species are outlined in Figure 1 and Table 1. Validated technology and data to be transferred to end users are in objectives B to G.

A) Genome and transcriptome sequencing by next generation technology

Representative fungal and bacterial strains were selected and obtained by taxonomic experts (Table 1: Ps/Pv/Ti/Tc/Tca/Tw/Rs/Pw/Ss/Pi/Se/Gp/Gr/Dd/Ddi/Al/As/Ac), and grown in pure culture or on hosts in dual culture. In the latter case, the pathogen was purified to minimize contamination by host DNA. DNA and RNA were extracted and assessed for quality and quantity. Pathogens were grown under different environmental conditions to maximize gene expression. Libraries of genomic DNA and cDNA were prepared by the sequencing service provider (National Research Council Canada, Saskatoon, Saskatchewan, Canada), although some of the last genomes sequenced were processed entirely at AAFC with MiSeq technology. Transcriptomes were indexed and sequenced using individual flow cells of Illumina HiSeq sequencers, yielding 100 bp paired-end reads. Transcriptomes of multiple strains were analysed for Pv, Pi, and Rs for SNP discovery. Bacterial, fungal and nematode genomes were assembled from paired-end reads of ~300 bp fragments using 100 bp reads, targeting a minimum of 100X coverage. Mate-pair libraries for HiSeq were prepared at 3 and 8 kb (50 bp paired end). Genomic data were assembled on AAFC bioinformatics servers.
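The 100X coverage target can be related to sequencing effort with a simple calculation: mean coverage is the total number of sequenced bases divided by the genome size. The sketch below is illustrative only; the function name and the two genome sizes are assumptions made for the example, not values from this project.

```python
import math


def read_pairs_for_coverage(genome_size_bp, read_length_bp=100, target_coverage=100):
    """Return the number of paired-end read pairs needed for a mean coverage.

    Mean coverage = (read pairs * 2 * read length) / genome size.
    """
    bases_needed = genome_size_bp * target_coverage
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


if __name__ == "__main__":
    # Hypothetical genome sizes, chosen only to illustrate the arithmetic.
    for label, size_bp in [("5 Mb genome", 5_000_000), ("40 Mb genome", 40_000_000)]:
        pairs = read_pairs_for_coverage(size_bp)
        print(f"{label}: {pairs:,} read pairs of 2 x 100 bp for 100X")
```

This ignores reads lost to quality trimming or contamination, so real runs would need a margin above the computed minimum.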

B) Quantitative real-time PCR assays for species-specific detection

Representative strains of targets and at least five closely related species were obtained by experts (Table 1: Ps/Pv/Ti/Tc/Ss/Se/Dd). The genomic data were analyzed to produce a minimum of three final sets of validated species-specific primers and TaqMan probes. Assays were validated using pure DNA and field DNA samples, including some obtained in objectives E and F. Fluorescence Resonance Energy Transfer (FRET) assays were also developed for Ps and Ss, going beyond the original objectives.

C) Genotyping of isolates by microsatellite analysis or Sanger sequencing

Representative collections of each target species, with a minimum of 10 strains, were obtained by experts (Table 1: Ps/Pst/Pt/Rs/Pw/Ss/Pi/Se/Dd). Genomic data obtained in this project (Ps/Rs/Pw/Ss/Pi/Se/Dd) or from other laboratories (Ps/Pst) were analysed, and PCR primer pairs were designed for hypervariable regions (either microsatellites or SNP-rich exons).

Amplification of hypervariable regions was done by PCR. PCR products were either sequenced by Sanger sequencing or microsatellite patterns were evaluated using electrophoresis. Oligonucleotide primers for microsatellites or most informative SNP regions were identified and optimized.
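As an illustration of how candidate microsatellite loci can be located in assembled sequence before primer design, the sketch below scans for perfect tandem repeats with a regular expression. It is a minimal example under stated assumptions: the function name, the unit sizes (2 to 6 bp) and the repeat threshold are illustrative choices, not the parameters or software used in this project.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats in seq.

    The shortest repeating unit is preferred, and single-base runs
    (e.g. AAAA...) are skipped.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # homopolymer run, not a microsatellite of interest here
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


if __name__ == "__main__":
    # Made-up sequence: six AC repeats flanked by unrelated bases.
    example = "TTG" + "AC" * 6 + "GGT"
    print(find_microsatellites(example))
```

Primers would then be placed in the unique flanking sequence on either side of each hit so that repeat-number differences show up as length differences on electrophoresis.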

D) Direct detection of genotypes by ASO-PCR

Genomic data and Sanger sequencing generated in objective C were analyzed for SNP detection and to design primers for Allele Specific Oligonucleotide (ASO) PCR, including TaqMan probes. Assays were validated with pure strains/specimens, including samples from objectives E/F.
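The principle behind ASO-PCR primer design can be sketched as follows: SNPs are located between aligned alleles, and a primer is drawn for each allele so that its 3' terminal base sits on the SNP. The code is a simplified illustration; the function names, the 20-base primer length and the toy sequences are assumptions, and real designs also consider melting temperature, secondary structure and specificity, which are not modelled here.

```python
def snp_positions(ref, alt):
    """Return 0-based positions where two aligned, equal-length sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(ref, alt)) if a != b]


def allele_specific_primers(ref, alt, pos, length=20):
    """Return one forward primer per allele, each ending exactly on the SNP."""
    start = pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for this primer length")
    return ref[start:pos + 1], alt[start:pos + 1]


if __name__ == "__main__":
    # Toy alleles differing at a single position (index 22).
    ref = "ACGTACGTACGTACGTACGTACGTA"
    alt = ref[:22] + "A" + ref[23:]
    for pos in snp_positions(ref, alt):
        primer_ref, primer_alt = allele_specific_primers(ref, alt, pos)
        print(pos, primer_ref, primer_alt)
```

The two primers are identical except for the last base, which is what allows amplification to discriminate between genotypes.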

E) Next generation sequencing of universal ‘barcode’ PCR products from agriculture soils

Soil samples were collected from potato fields with a range of outbreak/quarantine histories. DNA was extracted with commercial kits and amplified using universal PCR primers for the eukaryote ITS region (all fungi and nematodes) and the 16S rRNA gene for bacteria. PCR primers were labeled with 24 different MIDs. Forty-eight samples were analyzed using two primer sets, over two plates of 454 pyrosequencing, yielding ~20,000 sequences/sample. The bioinformatics pipeline developed for our Growing Forward project for automatic identification was optimized for rapid detection of all target species from this and our previous CRTI projects. Some samples to be used for validation of the PCR assays listed in objectives B and D were identified (Figure 1).
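Because several samples share a sequencing plate, reads must be assigned back to their sample using the MID tag at the start of each read. The sketch below shows the idea with exact prefix matching; the function name, tag sequences and sample names are hypothetical, and production pipelines usually also tolerate sequencing errors in the tag.

```python
from collections import defaultdict


def demultiplex(reads, mid_to_sample):
    """Group reads by the MID tag at their 5' end and strip the tag.

    Returns (reads_by_sample, unassigned_reads).
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:
            # No known tag at the start of this read.
            unassigned.append(read)
    return dict(by_sample), unassigned


if __name__ == "__main__":
    # Hypothetical 10-base tags and sample names, for illustration only.
    mids = {"AAGGTTCCAA": "soil_A", "TTCCAAGGTT": "soil_B"}
    reads = ["AAGGTTCCAAGGATTACC", "TTCCAAGGTTCCGTAGGT", "GGGGGGGGGGACGT"]
    assigned, leftover = demultiplex(reads, mids)
    print(assigned)
    print(leftover)
```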

F) Next generation sequencing of universal 'barcode' PCR products from cereal grains

The approach was similar to "E", but with a plan to double the number of samples for the same amount of sequencing (i.e. ~10,000 sequences/sample). The microbial species composition of high quality grain was compared with that of diseased-looking seed samples (FDK, mildew, ergot) for various cereal growing areas. The same MIDs were used for fungi and bacteria on each half plate, and sequences were sorted by MID and by kingdom identification. With funding from another project, we also used MID-tagged cpn60 universal target amplicons to determine, at high resolution, the complement of microbes associated with a variety of crop seeds (Figure 1).

G) Next generation sequencing of universal ‘barcode’ PCR products from water/manure samples from high density livestock areas

Water samples from agricultural and urban watersheds with intensive animal husbandry practices were collected. Ninety-six samples were processed with bacterial primers only, targeting the 16S rRNA gene, with ~20,000 sequences/sample, using the methods outlined in "E". Data were analysed for waterborne pathogens causing the following diseases: gastro-enteritis, abortions in animals, gastric ulcers, lung infection, nosocomial, wound and urinary tract infections, and bacteraemia (Figure 1 and Table 1). In addition, genome and transcriptome sequencing of novel Arcobacter species, including A. lanthierii and A. septicus, along with A. cibarius, was performed.

[Figure 1 (Gantt chart): quarterly schedule from Apr-Jun 2011 to Jul-Sep 2014 for the work packages under objectives A to G, linked to milestones 6 to 15. Work package coordinators: A) C. Lewis; B) J. Tambong; C) S. Hambleton; D) Q. Yu; E) S. Li; F) T. Gräfenhan; G) I. Khan.]

Figure 1: Gantt chart showing the different objectives, the schedule, the linkages to milestones, and the responsibilities for the different partners.

Table 1. List of target organisms from the most up to date charter with the partner responsible for each objective pertaining to this species. List of organisms and partner initials given at the end. Host Cereal Grains Potato Human and animal pathogens Gastro-enteritis, abortions in animals, Gatric ulcers, Lung infection, Nosocomal, wound,

Disease stewart Corn wilt Ochratoxin bunt Karnal of wheat bunt Dwarf of wheat Common ofbunt Ryegrass bunt stripe wheat rust leaf wheat rust rot Brown blackleg-like Potato Scab Blight Late Potato Wart Pale Potato Cyst Golden cyst)(potato Potato rot nematode Stem nematode Diarrhea urinay tract infections, bacteraemia

Kingdom Bacterium Fungus Fungus Fungus Fungus Fungus Fungus Fungus Bacterium Bacterium Bacterium fungus-like Fungus Nematode Nematode Nematode Nematode Bacterium Bacteria Arcobacter spp., Bacillus spp.,

f. sp. Campylobacter spp., Clostridium spp., Enterococcus spp., tritici

Tilletia Escherichia coli, Helicobacter stewartii scabies Puccinia Ralstonia Ralstonia infestans dipsaci + wasabiae destructor Globodera Globodera lanthierii + lanthierii Penicillium Arcobacter Arcobacter verrucosum controversa Ditylenchus Ditylenchus

Synchytrium pylori, Listeria monocytogenes, Tilletia indica Phytophthora rostochiensis endobioticum solanacearum solanacearum Streptomyces Tilletia carries + striiformis Pectobacterium Tilletia + walkeri Puccinia triticina Species (Erwinia) Pantoea pallida Globodera Mycobacterium spp., Salmonella spp., Shigella spp. Species initials Ps Pv Ti Tc Tca Tw Pst Pt Rs Pw Ss Pi Se Gp Gr Dd Ddi Al A) Genome/transcriptome sequencing MR/ JT KS SH SH SH SH n/a n/a SL SL JT LK SL BM QY QY IK n/a by next generation technology BM

B) Quantitative PCR assays for species JT KS SH* SH n/a n/a n/a n/a n/a n/a JT n/a GB * n/a n/a QY n/a n/a n/a specific detection

C) Genotyping of isolates by GB/ JT n/a n/a n/a n/a n/a SH SH SL SL JT GB n/a n/a n/a n/a n/a n/a Microsatellites or Sanger Sequencing LK*

D) Direct detection of genotypes by BM/ BM/ n/a n/a n/a n/a n/a n/a SH SH SL n/a n/a GB n/a n/a n/a n/a n/a ASO-PCR QY† QY†

E) Next Gen sequencing of "barcode" AL/ GB/ n/a n/a n/a n/a n/a n/a n/a n/a SL SL JT QY QY QY QY n/a IK PCR products from agriculture soils LK AL

F) Next Gen sequencing of 'barcode' JT/ KS/ SH/ SH/ SH/ SH/ SH/ SH/

Objectives / Packages Objectives Work n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a IK PCR products from cereal grains DL DL DL DL DL DL DL DL

G) Next Gen sequencing of universal "barcode" PCR products from n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a n/a IK IK water/manure samples from high density livestock areas

* Improvement over what was developed in previous project CRTI 04-0045RD
† Objective now covered by AAFC A-base project. Qing Yu will contribute to validation.
n/a not applicable
+ Target species added during the project (Urocystis agropyri was removed)

4 Results

4.1 Genome sequencing, assembly and annotation at a glance

4.1.1 Bioinformatics pipelines

We developed a comprehensive genome assembly pipeline, a Perl-based wrapper around the following functions: QC (FastQC (Andrews, 2010), fastx_trimmer (Gordon & Hannon, 2010)); genome assembly (a choice of Velvet (Zerbino & Birney, 2008), SPAdes (Bankevich et al., 2012) and/or Celera (Myers et al., 2000)); and transcriptome assembly using the Tuxedo suite (Bowtie (Langmead et al., 2009), TopHat (Trapnell et al., 2009), Cufflinks (reference-based) (Trapnell et al., 2010)). YAML-format data files tracked all sequencing and processing metadata for thousands of output assemblies. The pipeline allowed all assemblies to be regenerated as the code base and tool parameters improved. We have released the best assemblies and associated metadata for each of 56 unique sequenced organisms (Tables 2 and 3). The release of metadata facilitated a common understanding of methods, promoting collaboration and publication. This pipeline now serves as our protocol for de novo assemblies.
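As an illustration of the metadata tracking described above, the following minimal Python sketch builds one per-assembly record and renders it as YAML-style text. It is not the project's Perl pipeline: the field names, the k-mer parameter, the read file names and the small `to_yaml` helper are all assumptions made for this example.

```python
from datetime import date


def assembly_record(organism, strain, assembler, params, reads):
    """Collect the metadata needed to regenerate one assembly (hypothetical fields)."""
    return {
        "organism": organism,
        "strain": strain,
        "qc_tools": ["FastQC", "fastx_trimmer"],
        "assembler": assembler,
        "assembler_params": params,
        "input_reads": reads,
        "recorded_on": date.today().isoformat(),
    }


def to_yaml(value, indent=0):
    """Render nested dicts and lists of scalars as simple YAML-style text."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(item, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.append(to_yaml(item, indent + 1))
            else:
                lines.append(f"{pad}{key}: {item}")
    elif isinstance(value, list):
        for item in value:
            lines.append(f"{pad}- {item}")
    return "\n".join(lines)


# Hypothetical record: the parameter value and file names are placeholders.
record = assembly_record(
    "Tilletia controversa",
    "DAOM 236426",
    "Velvet",
    {"kmer": 31},
    ["reads_R1.fastq", "reads_R2.fastq"],
)
print(to_yaml(record))
```

Keeping one such record next to every output assembly is what makes it possible to rerun an assembly with the same inputs and settings after the tools or parameters change.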

Among all targeted organisms, bacteria had the best assembly quality and the most publications to date (see the publications section). To improve genome assembly for the fungi, we integrated mate-pair sequencing data, which greatly improved some assemblies, e.g. Tilletia controversa (Tc) and T. caries (Tca). Nematodes have the largest genomes; while some were well assembled, others appeared to be contaminated by sequences from non-target kingdoms present in the gut of the nematodes. In all cases, there were enough quality data to develop diagnostic assays. In the few cases where contaminants were suspected, assembled genome contigs were matched through BLAST (Altschul et al., 1997) and aligned to contaminant genomes for contaminant identification; however, coverage was typically too low for good assembly after removing contaminant reads (e.g. original runs of Synchytrium endobioticum (Se) and Ditylenchus spp.). We also improved assemblies of large genomes using mate-pair sequencing data to obtain longer and more accurate scaffolds.

Going beyond the original goals, annotation work has become a core part of our post-sequencing analyses. Maker (Cantarel et al., 2008) and InterProScan (Mulder & Apweiler, 2007) pipelines were used for genome annotation. Eight genomes have been annotated so far, including one strain of P. thymicola (Pth), one strain of P. nordicum (Pn), four strains of P. verrucosum (Pv) and two strains of Se. Annotated gene counts were consistent with those of related reference annotations from JGI and the Broad Institute. Additional work on mate-pair sequencing, including exhaustive parameter searches, has so far yielded benefits for Se, e.g. the best assembly N50 score doubled after an exhaustive search for the best scaffolding parameters. Additional annotation and mate-pair work for Se is part of ongoing collaborative efforts with PRI towards identifying genes for pathotyping.
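For readers unfamiliar with the N50 score mentioned above, the following short Python sketch shows how it is commonly computed from a list of contig or scaffold lengths. The lengths used in the example are invented for illustration and are not project data.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


# Invented example: scaffolding joins the two largest contigs into one.
before_scaffolding = [100, 80, 60, 40, 20]
after_scaffolding = [180, 60, 40, 20]
print(n50(before_scaffolding), n50(after_scaffolding))
```

Because the total assembly length is unchanged in this example, the increase in N50 reflects only the improved contiguity obtained by joining contigs into longer scaffolds.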

To facilitate pipeline use by non-experts, efforts in the final 6 months focussed on enabling pipelines in the Galaxy platform (Blankenberg et al., 2010). Galaxy provides a web-based GUI for launching cluster jobs, implementing metadata tracking, improving reproducibility, interchanging tools, and versioning. These efforts are starting to pay dividends: porting components of the assembly pipeline into Galaxy has moved us towards full automation of post-sequencing analysis. Work with Galaxy shows great promise for enabling key features, such as the metadata capture and reproducibility used in our assembly pipeline, with much lower development time.

4.1.2 Impact and relevance to the identified priority and gap addressed by the project

We acquired the bioinformatics capability to sequence genomes of a wide range of high-risk organisms very effectively. This ability was instrumental in sequencing and annotating many more genomes than originally planned, and it now allows us to respond quickly in emergencies.

4.1.3 Lessons Learned and implementation plan of the Lessons Learned

A critical lesson learned was the importance of version control for the various genome assemblies. It is critical to keep track of genome versions when working with different partners. When a postdoc needed to analyse genome data to prepare a presentation or attend a meeting, it was important to know which version was used in case future analyses generated different conclusions. Likewise, diagnostic tests developed on earlier versions of genomes needed to be reassessed if the genome was improved.
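One simple way to make assembly versions unambiguous is to record a content fingerprint alongside each release. The Python sketch below is only an illustration of that idea, not the project's versioning scheme; the function name and the 12-character digest length are assumptions.

```python
import hashlib


def assembly_fingerprint(fasta_text):
    """Return a short SHA-256 digest of the sequences in a FASTA string.

    Header text and line wrapping are ignored, but contig boundaries are
    kept, so the fingerprint changes only when the sequences change.
    """
    digest = hashlib.sha256()
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            digest.update(b">")  # mark the start of a new contig
        else:
            digest.update(line.upper().encode("ascii"))
    return digest.hexdigest()[:12]


# Invented sequences: same content with a renamed header, then a real change.
release_a = ">contig1\nACGTACGT\n"
release_a_renamed = ">contig1 renamed\nACGT\nACGT\n"
release_b = ">contig1\nACGTACGA\n"
print(assembly_fingerprint(release_a) == assembly_fingerprint(release_a_renamed))
print(assembly_fingerprint(release_a) == assembly_fingerprint(release_b))
```

Recording such a fingerprint with each presentation, analysis or assay design would show at a glance whether later work used the same assembly or an improved one that calls for reassessment.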

4.1.4 New capabilities, partnerships and networks created through the horizontal work of the project

CFIA, AAFC and CGC can access genome data from a common source, providing the best model right now for interdepartmental sharing of Big Data. We are now working towards automatic analysis integration after sequencing through LIMS, a goal beyond the original objectives.

Table 2: Summary of genomes sequenced

Assembly       Releases           Total  Bacteria  Fungi  Nematode
Genome         Unique organisms   59     20        29     10
Genome         Total releases     87     28        45     14
Transcriptome  Unique organisms   26     17        9      2
Transcriptome  Total releases     54     35        15     4

Table 3: Details of genomes sequenced

Group     #   Species and strain                                                 PE*   MP-3kb* MP-8kb* Transcriptome  Annotation
Bacteria  1   Arcobacter cibarius strain LMG 21996                               done  n/a     n/a     done           done
          2   Arcobacter lanthierii strain AF 1430                               done  n/a     n/a     done           done
          3   Arcobacter lanthierii strain AF 1440                               done  n/a     n/a     done           done
          4   Arcobacter lanthierii strain AF 1581                               done  n/a     n/a     done           done
          5   Arcobacter septicus strain AF 1028                                 done  n/a     n/a     done           done
          6   Arcobacter septicus strain AF 1078                                 done  n/a     n/a     done           done
          7   Pantoea (Erwinia) ananatis strain LMG 2665                         done  n/a     n/a     done           done
          8   Pantoea stewartii strain DOAB 021                                  done  n/a     n/a     done           done
          9   Pantoea stewartii strain DOAB 230                                  done  n/a     n/a     done           done
          10  Pantoea stewartii subsp. nov. S301                                 done  n/a     n/a     done           done
          11  Pantoea stewartii strain DOAB213                                   done  n/a     n/a     n/a            done
          12  Pantoea stewartii strain DOAB 384                                  done  n/a     n/a     n/a            done
          13  Pantoea stewartii strain A206                                      done  n/a     n/a     n/a            done
          14  Pectobacterium carotovorum subsp. brasiliensis, isolate 1001       done  n/a     n/a     done           done
          15  Pectobacterium carotovorum subsp. brasiliensis, isolate 1009       done  n/a     n/a     done           done
          16  Pectobacterium carotovorum subsp. brasiliensis, isolate 1033       done  n/a     n/a     done           done
          17  Pectobacterium wasabiae Ecw1002                                    done  n/a     n/a     done           done
          18  Ralstonia solanacearum NCPPB 909 race 3 biovar 2                   done  n/a     n/a     done           done
          19  Ralstonia solanacearum race 3 biovar 2 906 IIB-1 with low cold-Cl  done  n/a     n/a     done           done
          20  Streptomyces scabies CG-1                                          done  n/a     n/a     done           n/a
Fungi     1   Penicillium nordicum strain DAOM 185683                            done  n/a     n/a     n/a            done
          2   Penicillium thymicola strain DAOM 180753                           done  n/a     n/a     n/a            done
          3   Penicillium verrucosum strain DAOM 211566                          done  n/a     n/a     n/a            done
          4   Penicillium verrucosum strain DAOM 213195                          done  n/a     n/a     n/a            done
          5   Penicillium verrucosum strain DAOM 214801                          done  n/a     n/a     n/a            done
          6   Penicillium verrucosum strain KAS 4260                             done  done    done    done           done
          7   Penicillium verrucosum strain KAS 4370                             done  n/a     n/a     n/a            done
          8   Puccinia graminis CF12WGA1 _100pg DNA_0%PEG                        done  n/a     n/a     n/a            n/a
          9   Puccinia graminis CF12WGA2 _10ng DNA_0%PEG                         done  n/a     n/a     n/a            n/a
          10  Puccinia graminis CF12WGA3 _100pg DNA_1%PEG                        done  n/a     n/a     n/a            n/a
          11  Puccinia graminis race QTHJT (RSA2051)                             done  n/a     n/a     n/a            n/a
          12  Puccinia graminis race RHTSK (RSA2052)                             done  n/a     n/a     n/a            n/a
          13  Puccinia graminis race TMRTK (RSA2053)                             done  n/a     n/a     n/a            n/a
          14  Phytophthora infestans US-22                                       n/a   n/a     n/a     done           n/a
          15  Phytophthora infestans US-24                                       n/a   n/a     n/a     done           n/a
          16  Phytophthora infestans US-8                                        n/a   n/a     n/a     done           n/a
          17  Puccinia triticina race MBDS (RSA2119)                             done  n/a     n/a     n/a            n/a
          18  Puccinia triticina race MFDS (RSA2117)                             done  n/a     n/a     n/a            n/a
          19  Synchytrium endobioticum 2000 outbreak, pathotype 6                done  done    done    n/a            n/a
          20  Synchytrium endobioticum 2012 PEI outbreak, unknown pathotype      done  done    done    done           done
          21  Synchytrium endobioticum 2012 LEV6574                              done  n/a     n/a     n/a            n/a
          22  Synchytrium endobioticum mixed strains from Newfoundland           done  n/a     n/a     n/a            n/a
          23  Tilletia caries DAOM 238032                                        done  done    done    done           n/a
          24  Tilletia controversa DAOM 236426                                   done  done    done    done           n/a
          25  Tilletia controversa RS113C _01                                    done  n/a     n/a     n/a            n/a
          26  Tilletia controversa RS113C _02                                    done  n/a     n/a     n/a            n/a
          27  Tilletia indica DAOM 236416                                        done  done    done    done           n/a
          28  Tilletia laevis RS637B                                             done  n/a     n/a     n/a            n/a
          29  Tilletia walkeri DAOM 236422                                       done  done    done    done           n/a
Nematodes 1   Ditylenchus destructor Jiangsu 01                                  done  done    done    done           n/a
          2   Ditylenchus dipsaci Ontario 01                                     done  done    done    done           n/a
          3   Globodera pallida strain A2 Cyprus, pathotype Pa2/3                done  n/a     n/a     n/a            n/a
          4   Globodera pallida strain B1 INRA LeRheu France, pathotype Pa2/3    done  n/a     n/a     n/a            n/a
          5   Globodera rostochiensis Isolate Mar149, pathotype Ro1              done  done    done    n/a            n/a
          6   Globodera rostochiensis QC strain, pathotype Ro1                   done  done    done    n/a            n/a
          7   Globodera rostochiensis SCRIR03, pathotype Ro3                     done  n/a     n/a     n/a            n/a
          8   Globodera rostochiensis SCRIR04, pathotype Ro4                     done  n/a     n/a     n/a            n/a
          9   Globodera rostochiensis SCRIR05, pathotype Ro5                     done  n/a     n/a     n/a            n/a
          10  Globodera rostochiensis strain Ro19, pathotype Ro1                 done  n/a     n/a     n/a            n/a

* PE = paired ends (300 bp inserts), MP-3kb = mate pair (3 kb insert), MP-8kb = mate pair (8 kb insert)
Cells highlighted in grey: strains and tasks that were not in the original plan.

4.2 Targeted organisms

4.2.1 Pantoea stewartii (Ps) & Streptomyces scabies (Ss)

Plant-associated Pantoea species are either epiphytes or pathogens, and the latter constitute the majority of validly described species. It is important to reliably differentiate P. stewartii subsp. stewartii (Ps), a quarantine pathogen of corn, from the other subspecies. Errors in identification or certification could have serious economic and/or trade consequences.

Common scab disease of potato is caused by the bacterium Streptomyces scabies (Ss). Symptoms affect the appearance and reduce the marketability of the potato tuber. Symptomology and morphology are no longer sufficient for accurate species identification. Traditional methods of detection involve time-consuming culture-based methods.

4.2.1.1 Genome sequencing and diagnostic tools

The genome sequences of Pantoea stewartii (4.5 to 5.4 Mbp depending on strain) were exploited for the development of a diagnostic tool. The first subspecies-specific real-time TaqMan PCR diagnostic assay was developed and evaluated for specific detection of Ps in pure cultures and infected corn. The assay has been transferred to end-users (CFIA & CGC). The assay is a rapid, reliable and sensitive tool for the detection of Ps, avoiding false-negative results. This reduces the time required for certifying maize shipments. A significant additional impact of the project was the sequencing of ten (10) other genomes, three (3) of which have been published.

A 10.8 Mbp draft genome was exploited for the development of a quantitative TaqMan real-time PCR assay specific for the Ss complex, targeting the trpB gene. The assay achieved a sensitivity of 100 fg DNA. Specificity was confirmed using 46 strains (24 strains of Streptomyces species and 22 of other Gram-positive and Gram-negative bacterial species). The assay provides a new and instrumental capability to assess field soil for Ss propagules prior to planting of potato, with results made available in less than 24 hr. The assay is rapid, reliable and sensitive for the detection of Ss, avoiding false-negative and false-positive results. Genome expertise gained from this project led to new partnerships with CFIA and CGC.

Impact and relevance to the identified priority and gap addressed by the project

These new assays reduce the time required for certifying maize shipments or testing field soil prior to planting potato seeds. The specificity and sensitivity of the assays compare favourably to existing technologies and methods. Because post-PCR processing steps are not required, the assays can be easily automated, leading to high sample throughput. This, together with the high specificity and sensitivity of the assays, offers significant advantages. These advantages will positively impact Canadian corn and potato growers through improved yields and marketable quality, leading to high confidence in the Canadian agricultural system among international trade partners such as the USA, Europe and China. The expertise gained from this project led to a partnership with the Manitoba Corn Growers Association and the University of Nebraska on another bacterial pathogen of corn.
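To put the reported 100 fg sensitivity of the Ss assay in perspective, the following Python sketch converts a DNA mass into an approximate number of genome copies. It assumes that all of the DNA is Ss genomic DNA, that the genome is 10.8 Mbp as reported for the draft, and that one double-stranded base pair has an average mass of about 650 g/mol, which is a common approximation and not a value taken from this project.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def genome_copies(dna_mass_g, genome_size_bp):
    """Estimate how many genome copies a given mass of genomic DNA contains."""
    genome_mass_g = genome_size_bp * BP_MASS_G_PER_MOL / AVOGADRO
    return dna_mass_g / genome_mass_g


# 100 fg of DNA and a 10.8 Mbp genome, as reported for the Ss assay.
copies = genome_copies(100e-15, 10.8e6)
print(f"approximately {copies:.1f} genome copies")
```

Under these assumptions, 100 fg corresponds to roughly 8 to 9 genome copies per reaction.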

4.2.1.2 Lessons learned and implementation plan of the lessons learned

The main lesson learned concerned the delay incurred in hiring the qualified bioinformaticians required to assemble and analyze next-generation sequence data. At the time this project was implemented, skilled professionals of this kind were not sufficiently available in Canada, and the hiring of qualified foreign nationals was met with paperwork hurdles. In addition, the massive amount of data generated required corresponding levels of storage capacity. However, our bioinformatics group was able to attract additional funding to cover the extra capacity required.

4.2.1.3 New capabilities, partnerships and networks created through the horizontal work of the project

The project enhanced AAFC capability in next-generation sequencing and analysis of plant pathogenic bacteria by providing the necessary facilities and computer support. It led to new partnerships with CFIA, the Manitoba Corn Growers Association (MCGA) and the University of Nebraska. The computer support from this project was instrumental for the smooth start of two new collaborative projects. The first assay to reliably differentiate the quarantine subspecies Ps from the non-quarantine subspecies was transferred through RMOU # AGR-11104 and MTA # AGR-11163. The transfer packages for this assay included detailed protocols, positive control DNA extracts, and a small set of specimens for validation processing at CFIA PEI (end-user). Also transferred to CFIA were background information on the target lineages and results for the strains used for development. The same assay was transferred to CGC.

Similarly, a new project for further technological readiness has been initiated at CFIA (RMOU # pending; CFIA project ID: OLF-P-1402; project title: "Evaluate and validate a novel real-time PCR assay for Goss’s Wilt of Corn caused by the invasive plant bacterium Clavibacter michiganensis subsp. nebraskensis (CMN)"), with Mr. S. Brière as lead.

As a follow-up activity, funding was secured for work on another bacterial pathogen of corn (project title: "Monitoring an invasive bacterial pathogen (Clavibacter michiganensis subsp. nebraskensis) of corn using next-generation sequencing"), with Dr. Tambong as the principal investigator. Funding was provided by AAFC ($210K) and MCGA (CRADA # AGR10755, $45K). A poster reporting on the development of one assay was honoured with a first-prize award for best research and presentation at the 2014 Student Research Day, Department of Food Science, Carleton University.

4.2.2 Ralstonia solanacearum (Rs)

Ralstonia solanacearum (Rs), as a species complex, shows remarkable genetic and phenotypic variation. One clade of this species, classically identified as race 3 biovar 2 (R3bv2), is of particular concern for agriculture in temperate climatic zones on account of its adaptation to cool temperatures. The pathogen possesses a surprising capacity to grow at very low substrate concentrations, as well as to convert to a viable but unculturable form at low temperature. It is likely that these very divergent behavioural patterns evolved in response to highly variable habitats. Strict quarantine regulations are applied in Canada, the US and many other countries for bacterial brown rot disease of potato caused by R. solanacearum R3bv2.

4.2.2.1 Genome and transcriptome sequencing

Genome/transcriptome sequences of R. solanacearum R3bv2 NCPPB909 and CFIA906 were decoded using paired-end Illumina HiSeq sequencing technology with 300 bp inserts, providing approximately 84X and 105X genome coverage for NCPPB909 and CFIA906, respectively. The draft genome size was 5.2 Mbp for NCPPB909 and 5.0 Mbp for CFIA906. Annotations conducted on the RAST server using the Glimmer 3 option predicted 4,937 and 5,025 protein-coding genes, including 53 and 63 noncoding RNA genes, for NCPPB909 and CFIA906, respectively. A number of predicted virulence-related factors, phage-related loci, and motility and chemotactic genes were identified in the genome, which may enhance or trigger its specific pathogenicity in specific environments.
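The coverage figures quoted above relate the total number of sequenced bases to the genome size. The Python sketch below shows that relationship using the reported coverages and draft genome sizes; the function names are illustrative, and the resulting base totals are back-calculated estimates rather than values reported by the project.

```python
def coverage(total_bases, genome_size_bp):
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def bases_for_coverage(target_coverage, genome_size_bp):
    """Total sequenced bases implied by a given average coverage."""
    return target_coverage * genome_size_bp


# Reported values: about 84X on 5.2 Mbp (NCPPB909) and 105X on 5.0 Mbp (CFIA906).
for strain, depth, size_bp in [("NCPPB909", 84, 5.2e6), ("CFIA906", 105, 5.0e6)]:
    total = bases_for_coverage(depth, size_bp)
    print(f"{strain}: about {total / 1e6:.0f} Mbp of sequence for {depth}X")
```

The same relationship can be used in reverse to check whether a planned sequencing run is likely to reach the depth needed for a de novo assembly of a genome of known approximate size.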

4.2.2.2 Diagnostic tools

A. Genotyping of isolates by Microsatellites or Sanger Sequencing: Five (5) whole genomes were analyzed to categorize the 24 hypervariable regions in the genomes of Rs for genotypic differentiation of R3bv2 strains. The genotypic characteristics of R3bv2 strains coincided with those from the MLSA and phylogenetic analyses of seven house-keeping genes for fifteen R3bv2 strains. Those strains clustered separately from the other 25 strains of different races and biovar types in the phylogenetic trees.

B. Direct detection of genotypes by real-time PCR (RT-PCR): Eight sensitive RT-PCR assays were designed targeting hypervariable loci/regions of R. solanacearum R3bv2 strains. Of these, seven pairs of real-time primers amplified specific PCR products from all 15 R3bv2 strains of R. solanacearum, while none of the 25 strains of other races and biovars was amplified. In the eighth RT-PCR assay, the primers and probe targeting Locus S798 differentiated pathogenic R3bv2 strains from known non-pathogenic R3bv2 strains. All eight assays produced distinct melting curves to support the positive reactions. These assays demonstrated detection limits ranging from 10^-4 to 10^-8 dilutions of the bacterial suspension spiked in potato tuber extracts. Two of the assays are being evaluated in the AAFC Carling Laboratory and CFIA Fallowfield Laboratory.

Four of the RT-PCR assays developed in this project were used to resolve differences in results on Canadian seed potato identified for export to the Mexican market in early 2014. These qPCR assays, based on genome sequence data obtained in this project, successfully clarified discrepancies between Mexican and Canadian results for R. solanacearum race 3 biovar 2 in Canadian seed potato samples. The Canadian analysis indicates that the samples received from Mexico were contaminated with PCR amplicon. As a follow-up, the CFIA has invited Mexican technical experts to participate in a technical exchange session to be held at the CFIA-Charlottetown Laboratory.

4.2.2.3 Follow-up activities and new R&D initiated

Results obtained in this project have triggered a follow-up study to further analyze the pathogenic and non-pathogenic strains of R. solanacearum R3bv2 in a collaborative project between CFIA and AAFC (RPS Project CHA-P-1412 entitled “Comparative genomic analysis of Ralstonia solanacearum race 3 bv 2 (pathogenic and non-pathogenic strains) with non-R3bv2 strains of various pathogenic characteristics for accurate diagnostic methodology targeting low temperature adapted pathogenicity determinants”), which has been approved at CFIA with a total funding of $276K for the next three years.

4.2.3 Pectobacterium wasabiae (Pw)

Pectobacterium wasabiae (Pw) (formerly Erwinia carotovora subsp. wasabiae) was originally described as causing soft rot of Japanese horseradish, and later identified as the causal agent of potato tuber decay in New Zealand, the US, and Iran. A recent study demonstrated that P. wasabiae also causes blackleg disease in potato plants. The pathogen possesses diverse genetic regulatory systems with known virulence factors, including genes encoding pectolytic enzymes and the type III secretion system (T3SS), and has many additional pathogenicity and virulence determinants acquired by horizontal gene transfer. Recently, experiments at the CFIA-Charlottetown Laboratory indicated that the Japanese horseradish isolates could not infect potato plants through artificial inoculation and could not ferment melibiose and raffinose, whereas potato isolates from the US, Canada and European countries demonstrated strong virulence on potato plants and utilized melibiose and raffinose as sole carbon sources.

4.2.3.1 Genome and transcriptome sequencing

The strain ‘P. wasabiae’ CFIA1002 was isolated from a blackleg-diseased potato stem sample in Canada. Draft genome sequence data for this Canadian strain were generated using paired-end Illumina sequencing technology, producing a 5.0 Mbp genome. Annotation conducted on the RAST server using the Glimmer 3 option predicted 4,615 protein-coding genes and 96 noncoding RNA genes. A number of predicted virulence factors, phage loci, and motility and chemotaxis genes were identified, which may facilitate its pathogenicity in specific environments. The variable genomic regions, especially pathogenicity-related loci, were highly correlated with different environmental factors, including host species.

4.2.3.2 Diagnostic tools

Genotyping of isolates by Microsatellites or Sanger Sequencing: The draft genome sequences of the potato ‘P. wasabiae’ isolates CFIA 1002 from Canada, WPP163 from the US and SCC3193 from Europe were compared, in a comparative genomics analysis, with the type strain P. wasabiae CFBP3394, isolated from horseradish in Japan. Results suggested a need to reclassify the potato isolates from the US, Canada and Europe as a new species distinct from the Japanese strains of P. wasabiae isolated from horseradish. The new species, provisionally named ‘Pectobacterium kelmani’, was positive for utilization or acidification of lactose, melibiose and raffinose, while the four P. wasabiae strains from Japan were all negative. The MLSA based on the eight house-keeping genes and phylogenetic analysis of the 16S rRNA of all other strains and isolates indicated the presence of phylogenetic differences between these two groups of bacteria. Further determination and comparison of average nucleotide identity (ANI) values, representing the mean identity/similarity between homologous genomic regions shared by two genomes, for 27 available genome sequences of Pectobacterium spp. and Dickeya spp. confirmed the legitimacy of the species P. atrosepticum, P. carotovorum, P. aroidearum, ‘P. kelmani’, and P. wasabiae using the 95-96% boundary for species delineation (Li et al., unpublished). All Pectobacterium spp. sequenced share an ANI value of 89%-96% with each other, and 81%-83% with any of the Dickeya spp. (Table 1).
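The 95-96% ANI boundary used above can be expressed as a simple decision rule. The Python sketch below is illustrative only: the function name, the treatment of values inside the 95-96% band as borderline, and the example ANI values are assumptions and not results from this project.

```python
def classify_by_ani(ani_percent, low=95.0, high=96.0):
    """Interpret a pairwise ANI value against the 95-96% species boundary."""
    if ani_percent >= high:
        return "same species"
    if ani_percent >= low:
        return "borderline"
    return "different species"


# Invented pairwise ANI values chosen to fall in the ranges discussed above.
examples = {
    "two strains of one species": 98.5,
    "two Pectobacterium species": 89.0,
    "Pectobacterium versus Dickeya": 82.0,
}
for pair, ani in examples.items():
    print(f"{pair}: ANI {ani}% -> {classify_by_ani(ani)}")
```

Values that fall inside the boundary band are flagged rather than forced into either class, since in practice such cases are usually resolved with supporting evidence such as MLSA or phenotypic tests.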

Direct detection of genotypes by RT-PCR: Species-specific real-time PCR assays were developed for detecting the potato pathogen ‘P. kelmani’, which showed no cross reaction with Japanese strains of ‘P. wasabiae’ isolated from horseradish. Another RT-PCR assay was designed to differentiate strain CFIA 1002 from most other isolates of ‘P. kelmani’. This assay has potential to be useful for tracking the movement of this particular potato pathogenic bacterium.

4.2.3.3 Follow-up activities and new R&D initiated

Results obtained in this project have triggered a follow-up study to further analyze pectolytic bacteria of European, North American and other origins, and their environmental fitness, to assess the potential threat to the Canadian potato industry. A collaborative project between CFIA and the Netherlands (RPS Project CHA-P-1313 entitled “Investigation of Dickeya spp. for their environmental fitness on potato and development of rapid and sensitive molecular diagnostic method”) has been approved at CFIA with a total funding of $234K for the next three years.

4.2.4 P. carotovorum subsp. brasiliense (Pcb)

Pectobacterium carotovorum subsp. brasiliense (Pcb) was considered the only causal agent of potato blackleg in Brazil, and was a major cause of potato blackleg in South Africa. More recently, Pcb was also found in temperate regions such as the United States, Canada and Israel. This subspecies was not a target in the original charter. In particular, we found that Canadian isolates of Pcb were clearly less virulent than Brazilian strains in both greenhouse and field conditions. Whether or not these differences in aggressiveness can be assigned to specific genomic differences was the major trigger for this sequencing work. Comparative genomics analyses were planned to provide insight into the genomic differences that distinguish the highly virulent tropical strains from temperate isolates of Pcb.

In this study, three Canadian strains (CFIA1001, CFIA1009 and CFIA1033) isolated from potato blackleg-infected stems were sequenced using paired-end Illumina HiSeq sequencing technology. Approximately 27X, 21X and 37X genome coverage was obtained for strains CFIA1001, CFIA1009 and CFIA1033, respectively. After quality checking and initial de novo assembly using the Velvet assembler, the draft genome sizes for these three strains were as follows: CFIA1001 4.76 Mbp, CFIA1009 4.76 Mbp, and CFIA1033 4.70 Mbp. Annotations predicted 4,457, 4,442 and 4,471 protein-coding genes, including 85, 81 and 77 noncoding RNA genes, for CFIA1001, CFIA1009 and CFIA1033, respectively.

A number of predicted virulence-related factors, phage-related loci, and motility and chemotactic genes were identified in the genome, which may facilitate its specific pathogenicity in specific environments. Comparative analysis of Average Nucleotide Identity (ANI) values with 27 other available genome sequences of Pectobacterium spp. and Dickeya spp. indicated that Pectobacterium carotovorum subsp. brasiliense should be considered an individual species, rather than a subspecies of P. carotovorum. Further comparison of genome sequences of strains from different hosts and geographic regions will provide further insights on virulence, functionality, and plant/pest interactions, and contribute to the development of specific assays for accurate identification and detection of the pathogen.

Results obtained in this project have triggered a follow-up CFIA GRDI project (GRDI CHA-P-1411 entitled “Development and application of genomics and metagenomics-based surveillance technologies toward eradication of potato bacterial pathogens of regulatory importance”) to further analyze the regulated plant pathogenic bacteria in collaboration between CFIA and AAFC, which has been approved at CFIA with a total funding of $140K for the next five years. In addition, another TD Project Proposal, CHA-P-1511, entitled “Establishing next generation sequencing capability for diagnostics and research”, has been planned to complement the GRDI project at the CFIA-Charlottetown Laboratory.

4.2.5 Arcobacter lanthierii (Al), A. septicus (As) & A. cibarius (Ac)

The genus Arcobacter has been associated with human illness, and some of its species are considered emerging enteropathogens and potential zoonotic agents. It is important to better understand this genus in order to identify emerging Arcobacter spp. and their potential pathogenic health risks to humans and animals.

4.2.5.1 Genome and transcriptome sequencing

Initially, genome and transcriptome sequencing of these three Arcobacter species was not listed in the charter. However, considering the importance of known Arcobacter spp. implicated in human illness and isolated from the same sources, sequencing of the new species A. lanthierii (2.2-2.3 Mbp depending on strain) and A. septicus (2.4-2.5 Mbp depending on strain), along with the closely related species A. cibarius for comparative analysis, was added to the charter in order to study the pathogenic risk to humans and animals.

4.2.5.2 Diagnostic tools and its impact and relevance to the identified priority and gap addressed by the project

This work was done beyond the current charter deliverables. Rapid and reliable diagnostic methods have been developed for Arcobacter. The genus-specific PCR assay amplifies 18 known species of the genus Arcobacter, and the RFLP method then identifies these amplified products to species level. In addition, a multiplex PCR assay was developed that may identify two novel Arcobacter spp., A. lanthierii and A. septicus, in a more rapid and robust manner.

4.2.5.3 Lessons learned and implementation plan of the lessons learned

The existing culture-based methods are laborious and time-consuming, so a modified method for the isolation of Arcobacter from fecal and water samples has been successfully used in the lab. The method allowed better yield and isolation of Arcobacter spp. by reducing background bacterial contamination.

4.2.5.4 New capabilities, partnerships and networks created through the horizontal work of the project

The genus-specific PCR and RFLP assays were successfully applied to identify Arcobacter spp. in commercial microbial consortia (Samarajeewa et al. 2015). The research developments and innovations also led to new research collaborations with other scientists from government and academia. Despite the significance of Arcobacter species as emerging pathogens, much is still unknown about their pathogenicity and functional elements, including virulence factors and antibiotic resistance. The results from this research study helped in (i) understanding the relative (moderate to high) relationships of these species to human and/or animal species of Arcobacter; (ii) identifying the source, pathogenicity and prevalence of these novel species in the environment and fecal matter; and (iii) developing methods that can be used to diagnose infection in humans and identify potential animal hosts. In addition, transcriptome sequence data helped in (i) identifying the functional elements of the genome and molecular constituents; (ii) determining the transcriptional structure of genes; and (iii) identifying expression levels of virulence, toxin and antibiotic resistance genes.

4.2.6 Penicillium verrucosum (Pv)

Ochratoxin A (OTA) is classified in Group 2B, a possible human carcinogen (kidneys), by the International Agency for Research on Cancer (IARC). It is found in grains (especially barley and oats), wine, beer, coffee and other commodities. The European Food Safety Authority (EFSA) and the EU lowered maximum acceptable levels by an order of magnitude, creating a major issue for food producers worldwide.

4.2.6.1 Genome and transcriptome sequencing

Genomes of six strains of Penicillium verrucosum of varying ochratoxin A production ability were sequenced, assembled, and automatically annotated (genome size 32 Mbp). The package AntiSMASH, which annotates secondary metabolite (including mycotoxin) gene clusters, was run for all genomes. Differential expression analyses were run on transcriptomes of one strain, and genes up-regulated and down-regulated during the initiation of ochratoxin A synthesis were identified.

4.2.6.2 Diagnostic tools

Quantitative PCR assays for species-specific detection: Note that the A-base funded project for diagnostics for OTA-producing fungi ends at the end of FY 2015-2016, and is on track to deliver assays at that time.

Impact and relevance to the identified priority and gap addressed by the project: The assays developed will be assessed by the Canadian Grain Commission as an aid for monitoring Canadian grain exports for fungi that might lead to OTA contamination during product transport; they will also be used by university researchers in microcosm experiments with custom-built grain storage bins in southern Ontario. No other assays currently exist.

New capabilities, partnerships and networks created through the horizontal work of the project: Carleton University provided support in kind by performing liquid fermentation experiments on ochratoxin A production, which supplied the cells used for RNA-Seq transcriptomes. This was invaluable for understanding the genes involved in ochratoxin A synthesis.

4.2.7 Tilletia

Tilletia has close to 200 species causing smut in various grasses. Tilletia indica (Ti) causes Karnal bunt of wheat whereas T. controversa (Tc) causes dwarf bunt. Both are quarantine diseases and quarantine testing is conducted in Canada and elsewhere on wheat imports from infested regions.

Additional work NOT listed in the charter:
• Tilletia walkeri (Tw) and Tilletia caries (Tca): quantitative PCR assays for species-specific detection
• Tilletia laevis: genome sequencing by Illumina MiSeq
• Tilletia: development of genus-specific primers

4.2.7.1 Genome and transcriptome sequencing

Genome and transcriptome data for one strain of each of the four target species (25-33 Mbp) were generated and assembled, and used for assay development. Near the end of the project, genome data for one strain of T. laevis were generated by paired-end Illumina MiSeq (beyond the current charter deliverables).

4.2.7.2 Diagnostic tools

Quantitative PCR assays for species-specific detection: TaqMan® real-time PCR assays were designed and optimized for each of the four target species and evaluated against a small panel of the target species. For first validation, they were tested for specificity against each other and eight non-target Tilletia species, 10 non-Tilletia fungal species and pure wheat DNA. Performance of each specific assay was ranked based on sensitivity and the most promising were transferred to the Canadian Grain Commission (CGC) for second validation testing.

Table 4: Diagnostic test development for Tilletia spp.

Assays   # developed   # passed 1st validation   # transferred to CGC
Ti       6             5                         3
Tw+      6             4                         3
Tc       6             5                         5
Tca+     6             5                         5

+ Beyond the current charter deliverables

At CGC, the assays for three species worked consistently well (Ti, Tc and Tca) using their equipment and representative DNA samples, and also when challenged against grain samples artificially spiked with pathogen DNA. The best Ti assay also transferred well over to the highly sensitive droplet PCR technology, which provides an increase in diagnostic power. The Tw assays (beyond the current charter deliverables) were inconsistent, sometimes overestimating the amount of infection, but in other cases unable to detect any infection. For Tw, further optimization and validation was hampered by the low number of representatives available for testing.

Development of genus-specific primers (beyond the current charter deliverables): During the assay design phase, several sets of genus-level specific primers were identified, including one set (PK1825) that amplified all tested Tilletia species and, with analysis of the underlying DNA sequences, differentiated all of them except T. caries (Tca) and T. laevis.

4.2.7.3 Impact and relevance to the identified priority and gap addressed by the project

Tilletia species cause bunt diseases in cereals and other grasses. Two of the target species for this project are of regulatory concern for Canada: T. indica (Ti, Karnal bunt) and T. controversa (Tc, dwarf bunt). Both are seed-borne, contaminating seed, grain, straw and compost of wheat, triticale, barley and rye. Karnal bunt is a regulated organism in most countries. Quarantine testing is conducted in Canada and elsewhere on wheat imports from infected regions. Surveys and monitoring of Karnal bunt to provide export certificates are very expensive, but deregulating would also be costly because of lost export markets. Similar regulatory and economic concerns exist for dwarf bunt in Canada, where the species is found in localized areas of BC and Ontario but is actively monitored to avoid further spread.

For Karnal bunt, the primary need is for accurate differentiation from the closely related but non-regulated ryegrass bunt pathogen (Tw), which may be present as a contaminant in some commodities. Current detection systems are either morphological and based on time-consuming teliospore examination, or molecular and based on minor differences in ITS barcodes or a mitochondrial gene region. The latter requires time-consuming teliospore germination to achieve the required sensitivity (www.ippc.int/sites/default/files/documents/20140911/dp_04_ 2014 _en_ 2014-09-11_201409111208--690.4%20KB.pdf). For dwarf bunt, at the start of this project there were no published diagnostic DNA markers differentiating the species from Tca or T. laevis; the three form a complex of species subject to hybridization, all occurring on wheat.

Impact and relevance: This project generated the first fully assembled genomes and transcriptomes and the first real-time PCR assays targeting effector genes for regulated species in the genus Tilletia and the first DNA-based diagnostic assays for Tc.

Public domain release of all data is pending and will foster multiple lines of downstream research on these important plant pathogens in addition to the diagnostic assays developed in this project.


Real-time PCR assays for three species, including both regulated species, are ready for internal use at AAFC, CGC and CFIA but require one more level of rigorous validation against environmental samples before publication. Once fully validated, these more rapid, efficient and sensitive assays will enhance the diagnostic toolkit available to regulators and quality control specialists.

The new genus-level PCR primers contribute an economical DNA sequencing approach to sample identification, appropriate for use when time and DNA concentration are not limiting factors.

4.2.7.4 Lessons learned and implementation plan of the lessons learned

Living strains for this project were obtained from CFIA or purchased from culture collections but only a limited number were available for some species (Tw and Tc). For Tw, validation of the assays for the two regulated targets against this species was not hampered. For Tc, the limited number of isolates became problematic when results indicated that some were hybrids between Tc and Tca, and when the Tca assays failed to differentiate that species from T. laevis, for which we had no genome data during the design phase. All three species occur on wheat and may hybridize freely in the field. Near the end of the project, genome data for T. laevis were generated by paired-end Illumina MiSeq, too late for analysis but in anticipation of follow-up funding. One component of a proposal for AAFC core research funding submitted in January 2015 was to continue research on the wheat-associated Tilletia species to investigate questions raised in this project.

4.2.7.5 New capabilities, partnerships and networks created through the horizontal work of the project

• Collaboration with USDA (see 4.2.7.6)
• Collaboration with Washington State University (see 4.2.7.6)

4.2.7.6 Transition and exploitation (this is our validation and protocol transfers)

• Assays transferred to Canadian Grain Commission and Canadian Food Inspection Agency
• Internally released assemblies provided to researchers at Washington State University (Dr. Carris) for a comparative study of mating type loci among Tilletia species
• Results of the Tilletia genome analyses used as one example in an invited talk at IMC10 in August 2014 (4a)
• Results of the Tilletia genome analyses used for a poster presentation at the 2013 APS-MSA Joint Meeting, August 10-14, Texas, USA (4b)

An internal AAFC research proposal was submitted to complete the work beginning April 2015 (funding recently approved). Samples from USDA (298 Tilletia-infected seed samples collected during 1995-2006) were received in December 2014, a major contribution by USDA towards our goal. These collections will be a source of additional Tc and Tw representatives as well as 12 other species, and of environmental samples for the final validations needed before the assays can be published. Manuscripts for two genome announcements are in preparation.


4.2.8 Puccinia

Puccinia species cause different cereal rusts and have been responsible for the most devastating disease epidemics in cereals.

4.2.8.1 Diagnostic tools

Genotyping: The approach was to validate SSR markers previously published by PIs with long-running programs specializing in genotyping the target species (Kolmer et al., 2011; Cheng et al., 2012). Pt: 23 microsatellite loci were screened for 32 isolates, representing 30 different virulence phenotypes (races). Pst: 15 microsatellite loci were screened for 27 isolates: Ecuador 2012 (9 samples), China 2006 (8 samples), Ethiopia 2007 (1 sample), Kenya 2007 (4 samples) and Canada 2008 (1 sample), plus four reference DNA samples of pathotyped isolates (races Pst-35, Pst-37, Pst-78, and Pst-127). A reference database of SSR profiles is now available for many pathotyped or environmental isolates. (Note: similar validation work was completed for Pgt as one component of a project funded separately.)

ASO-PCR for direct detection: The approach was to build on results from two recent publications (Duplessis et al., 2011; Saunders et al., 2012). More than 1600 putative effector genes in Pgt, Pt and Pst were identified through in silico analysis. We searched the publicly available rust genomes using sequences of the 1600 candidate effectors, to identify the single copy and species-specific genes to target for primer design. For Pgt, Pt and Pst a total of 30, 67 and 26 primer pairs were designed and tested. After testing, 1 out of 3 Pgt, 13 out of 17 Pt and 5 out of 12 Pst primers were found to be specific at all levels: within species, between species and against non-target DNA samples. This high specificity at the species level generated new markers for diagnostics (see next paragraph). The levels of subspecific variation in all genes analyzed were insufficient for genotyping applications, except for two Pgt ASO loci which differentiated groups of isolates corresponding to provenance from Africa versus North America.

Diagnostic PCR assays for species-specific detection: The observed variation in PCR amplicon sizes for the species-specific primer sets allowed for multiplexing, and the specificity eliminated the need for laborious cloning and costly probes for identification. Combinations of primer sets were used in single reactions to identify the targets for a mixed DNA sample containing all three species, without DNA sequencing. Specificity was confirmed independently using real-time PCR assays developed for each species in a previous CRTI project (4b) and sequences generated for a systematic study of leaf rusts (4d). Final validation and publication of the specific primers and multiplex assays will be completed after the project close. A poster was presented at APS-CPS in August 2014 (4c).

Genome sequencing Pt and Pgt: Genome data for 3 Pgt and 2 Pt pure spore isolates (previously characterized for pathogen virulence) were generated by Illumina HiSeq sequencing. Assemblies were completed late in the project and were not analyzed during the ASO primer development phase.
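The in silico screening step described under "ASO-PCR for direct detection" above (keeping only candidate effectors that are single copy and species-specific) can be illustrated with a minimal sketch. It assumes the genome searches have already been summarized as hit counts per species; the candidate names and counts below are hypothetical and are not project data, and the actual search tools and thresholds used by the project are not specified in this report.

```python
def single_copy_species_specific(hit_counts, species):
    """Return candidates with exactly one hit in `species` and none elsewhere.

    `hit_counts` maps a candidate id to {species code: number of genome hits}.
    A species missing from the inner dict is treated as having zero hits.
    """
    selected = []
    for candidate, hits in hit_counts.items():
        own = hits.get(species, 0)
        elsewhere = sum(n for sp, n in hits.items() if sp != species)
        if own == 1 and elsewhere == 0:
            selected.append(candidate)
    return sorted(selected)


# Hypothetical hit counts for four candidate effectors (illustration only).
hit_counts = {
    "eff_001": {"Pt": 1, "Pgt": 0, "Pst": 0},  # single copy, Pt only
    "eff_002": {"Pt": 2, "Pgt": 0, "Pst": 0},  # multi-copy in Pt
    "eff_003": {"Pt": 1, "Pst": 1},            # shared with Pst
    "eff_004": {"Pgt": 1},                     # single copy, Pgt only
}
pt_targets = single_copy_species_specific(hit_counts, "Pt")
```

Candidates passing such a filter would still need the laboratory specificity testing described above, since absence of a hit in a draft genome does not guarantee absence in the organism.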

4.2.8.2 Impact and relevance to the identified priority and gap addressed by the project

Leaf rust, stripe rust and stem rust, caused by Pt, Pst and Pgt respectively, are important diseases of wheat. It is common to find plants infected with more than one species, making disease diagnosis difficult in early stages of rust development. Two gaps were identified.

1. Genotyping markers already published had not been widely tested by independent laboratories. The objective was to evaluate and identify the best markers for routine genotyping and develop a database of reference SSR profiles.

2. At the project start, there were a limited number of assays available for direct detection of pathogen targets in environmental samples. Developing a robust toolkit including multiple assays or approaches based on different molecular markers minimizes the risk of false positive detections and increases confidence in identification accuracy. The approach was to screen effector genes for new markers, both for diagnostics and for genotyping. Pathogen effector genes are assumed to play a major role in plant disease, evolving in response to resistance mechanisms of their host plants and having the potential to encode highly specific markers.

We now have the capacity and expertise to accurately genotype isolates for population genetics studies and have developed a reference database based on the samples available for the project. The development of economical, highly specific and independent DNA-based identification assays has increased our ability to quickly identify the three major rust pathogens of wheat, and monitor or track spore dispersal in the environment.

4.2.8.3 Lessons learned and implementation plan of the lessons learned

Developing specific markers for the rust species targeted in this project was a great challenge. Pathogen populations comprise a broad spectrum of virulence phenotypes and are known to evolve quickly in response to changing crop host resistance profiles. Accurate diagnostics at the virulence level require significant investment in genome sequencing and in the bioinformatics capacity to analyze data for large numbers of well-characterized populations of each species. This type of investment is now being made by international groups of collaborators sharing samples, data and resources. Future research efforts by our team will be accomplished most effectively through increased linkage to broader collaborations and by focusing the research on specific populations (see success story on stripe rust).

4.2.8.4 New capabilities, partnerships and networks created through the horizontal work of the project.

The species-specific primers developed in this project were used as one line of evidence confirming the first report of stripe rust (Pst) in the province of Quebec (Rioux et al. 2015 under Annex C). The multiplex assay was especially valuable for demonstrating in one reaction that the initial DNA extracts were derived from specimens infected with leaf rust (Pt) as well as stripe rust. The validation of the assays has been internal so far and the technology will be transferred via publications and presentations (see Hambleton under Annex C). The results achieved in this project provide fundamental building blocks for future work on the rust pathogens of wheat. Dr. Hambleton is a collaborator on separately funded projects for stem rust (Pgt; “Development of Canadian wheat cultivars with resistance to Ug99 stem rust”) and stripe rust (Pst; “Effectors of Canadian Puccinia striiformis isolates”) and on a new proposal on environmental monitoring methods (Pst, Pgt, Pt; “The Wheat Disease Forecasting Network”).

4.2.9 Phytophthora infestans (Pi)

Phytophthora infestans causes potato late blight, which was responsible for the Irish potato famine in the 1800’s. The disease in North America and Europe was caused by a single clone until the 1990’s, but the introduction of new strains from Central America has completely changed the populations, leading to a major increase in virulence of the pathogen.

4.2.9.1 Diagnostic tools

The genome sequence was already available. The genetic composition of P. infestans populations in Canada has changed considerably over the last few years, with the appearance of several new genotypes showing different mating types and differing sensitivity to the fungicide metalaxyl. Genetic markers allowing for a rapid assessment of clonal lineage genotypes from small amounts of biological material would be beneficial in the early detection and control of this pathogen throughout Canada.

Allele Specific Oligonucleotides were developed to genotype P. infestans by Dr. Bilodeau’s group at CFIA in Ottawa. The Canadian isolates were tested and the Standard Operating Protocol (SOP) was transferred to Dr. Kawchuk (AAFC Lethbridge, AB) who surveys P. infestans in Canada as part of his research.

Mining of the P. infestans genome revealed several regions containing SNPs, both within nuclear genes and in the flanking sequences of microsatellite loci. ASO-PCR assays were developed from 14 of the 50 SNPs found by sequencing. Nine successfully optimized ASO-PCR assays were validated (Table 5) using a blind test (Table 6) comprising P. infestans and other Phytophthora species. The nine optimized assays revealed diagnostic profiles unique to each of the five dominant clonal lineage genotypes of P. infestans in Canada. Most of the other Phytophthora species tested did not display any amplification product at any of the nine optimized markers. The markers developed in this study can be used with field samples and will contribute to the genomic toolbox available to assess the genetic diversity of P. infestans at the intra-specific level.

Table 5: Expected genotypes and profiles of the five dominant Canadian genotypes of Phytophthora infestans at nine optimized allele-specific oligonucleotide (ASO)-PCR assays.

Each cell gives the expected genotype followed by its profile in parentheses.

Assay              US-8               US-11              US-22              US-23              US-24
ASO Arp23-182      AA (Others)        AA (Others)        AT (US-22/US-23)   AT (US-22/US-23)   AA (Others)
ASO PUA-120        TT (Others)        TT (Others)        TT (Others)        TA (US-23)         TT (Others)
ASO PUA-225        CC (Others)        CC (Others)        CC (Others)        CT (US-23)         CC (Others)
ASO PUA-570        AA (US-8/US-24)    GG (US-11)         GA (US-22/US-23)   GA (US-22/US-23)   AA (US-8/US-24)
RevASO Ras-98      GG (Others)        GT (US-11/US-22)   GT (US-11/US-22)   GG (Others)        GG (Others)
ASO Ras-376        TC (Others)        TC (Others)        TT (US-22)         TC (Others)        TC (Others)
RevASO 3332F-225   GG (Others)        GG (Others)        GG (Others)        GC (US-23/US-24)   GC (US-23/US-24)
RevASO 3332F-258   CC (Others)        CA (US-11)         CC (Others)        CC (Others)        CC (Others)
RevASO 3332F-344   CT (US-8)          CC (Others)        CC (Others)        CC (Others)        CC (Others)
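Because each of the five lineages has a unique combination of expected genotypes in Table 5, the table can be treated as a lookup key. The sketch below is only an illustration of that idea, not the project's SOP: the genotype strings are transcribed from Table 5 (the US-23 value at ASO PUA-225, printed as "C T", is assumed here to be CT), genotypes are compared as literal strings, and the function name is hypothetical.

```python
# Assay order follows the column order of Table 5.
ASSAYS = [
    "ASO Arp23-182", "ASO PUA-120", "ASO PUA-225", "ASO PUA-570",
    "RevASO Ras-98", "ASO Ras-376", "RevASO 3332F-225",
    "RevASO 3332F-258", "RevASO 3332F-344",
]

# Expected genotypes per clonal lineage, transcribed from Table 5.
# Assumption: "C T" for US-23 at ASO PUA-225 is read as "CT".
EXPECTED = {
    "US-8":  ["AA", "TT", "CC", "AA", "GG", "TC", "GG", "CC", "CT"],
    "US-11": ["AA", "TT", "CC", "GG", "GT", "TC", "GG", "CA", "CC"],
    "US-22": ["AT", "TT", "CC", "GA", "GT", "TT", "GG", "CC", "CC"],
    "US-23": ["AT", "TA", "CT", "GA", "GG", "TC", "GC", "CC", "CC"],
    "US-24": ["AA", "TT", "CC", "AA", "GG", "TC", "GC", "CC", "CC"],
}


def match_lineages(observed):
    """Return the lineages whose expected genotype agrees at every scored assay.

    `observed` maps assay name -> genotype string; assays that were not
    scored can simply be left out of the dict.
    """
    unknown = set(observed) - set(ASSAYS)
    if unknown:
        raise ValueError(f"unknown assay(s): {sorted(unknown)}")
    hits = []
    for lineage, genotypes in EXPECTED.items():
        expected = dict(zip(ASSAYS, genotypes))
        if all(expected[assay] == g for assay, g in observed.items()):
            hits.append(lineage)
    return hits


# A full nine-assay profile resolves to one lineage; a single assay may not.
full_profile = dict(zip(ASSAYS, EXPECTED["US-23"]))
partial_profile = {"ASO PUA-570": "AA"}
```

With all nine assays scored, `match_lineages(full_profile)` returns a single lineage, whereas the one-assay `partial_profile` is shared by US-8 and US-24, which is why the profile across several markers, rather than any single marker, is diagnostic.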

An SOP with these ASO markers has been transferred to Dr. Kawchuk’s lab and has been used for comparison with traditional RFLP and allozyme genotyping protocols developed over 20 years ago; the latter two approaches require large amounts of DNA from pure cultures. Overall, the novel quantitative real-time PCR method was capable of distinguishing different US genotypes of P. infestans, with results largely identical to those of the traditional approaches.

Phytophthora infestans is already established in Canada. The assays developed will serve to control the spread of virulent strains of the pathogen inside Canada. The type of marker developed (ASO) allows detection directly from infected material and does not require culturing of the pathogen like the traditional methods. Long-distance movement in seed tubers and garden center transplants contributed to the rapid spread of the genotypes across Canada. Tracking pathogen movement and population composition should improve our ability to predict the genotypes expected each year.

A manuscript on the ASO markers is being internally reviewed. Field samples were included to evaluate the limit of sensitivity of the markers and how the assays can be used directly in the field. After we presented this method at a conference, sexual isolates were received from Cornell University and were genotyped with the ASO markers to assess their profile. The environmental material received carries known amounts of infection and will be tested for a project on “Development of integrated molecular and mathematical tools for prediction, detection, and monitoring of plant pathogens and of specific genotypes of Canadian economic importance” led by Dr. Odile Carisse, AAFC. This is an excellent outcome because it represents continuity for this project and fosters collaboration among CFIA, AAFC and university partners.


4.2.10 Synchytrium endobioticum (Se)

Synchytrium endobioticum (potato wart) is a chytrid fungus considered as a quarantine organism worldwide. It is on the US Select agent list. This pathogen infects potato tubers, rendering them unmarketable, and produces resting structures (winter sporangia) that can stay in the soil for more than 30 years. A spore detection assay was developed in a previous CRTI project. Fungicides are mainly ineffective against this pathogen, limiting the means to control it.

Potato wart has been present in Newfoundland since 1909 and was detected in Prince Edward Island (PEI) in 2000, 2012 and 2014. Different pathotypes of this pathogen are present in both provinces. However, little is known about the intraspecific genetic diversity of this organism in Canada. More genotyping markers are needed to characterize S. endobioticum isolates in Canada.

4.2.10.1 Genome and transcriptome assembly and annotation

Early in the project, we discovered that Plant Research International (PRI) in the Netherlands also wanted to sequence this genome, and we decided to collaborate on this very challenging project. They had some sequences that we were able to use early on to develop our genotyping assay. Synchytrium endobioticum cannot be grown in culture and the starting material is warted potato tubers, which are full of microbial contaminants and potato DNA. Like our collaborators, we found that our early attempts generated many contaminant sequences, and both groups improved their initial clean-up techniques. Our 2014 MiSeq genome assembly is comparable with the PRI assembly. Full genome annotations using MAKER on both assemblies, with common input evidence, gave results that overlapped by over 90% when BLAT was used to align annotated transcripts back to the PRI genome. These results were reported at a videoconference with members of AAFC, CFIA and PRI in early November.

We then re-examined the use of our initial HiSeq mate-pair reads. Only 7% of the Se 2012 mate-pair reads had initially mapped to our paired-end Se assemblies because of contamination issues. However, hundreds of millions of reads were sequenced, so we trimmed the reads and filtered them by mapping quality and read pairing data to retain only those reads that may be of use in scaffolding. In this way we have been able to significantly reduce the number of reads needed to scaffold, without reducing the quality of the output.
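The filtering of reads by mapping quality and pairing information can be sketched as follows. This is a minimal illustration on SAM-format text lines, not the pipeline actually used in the project: the MAPQ threshold of 30 and the function names are assumptions, and the rule "both mates mapped" stands in for whatever pairing criteria were applied. The FLAG bit meanings (0x1 paired, 0x4 read unmapped, 0x8 mate unmapped, 0x100 secondary, 0x800 supplementary) follow the SAM specification.

```python
def keep_read(sam_line, min_mapq=30):
    """Decide whether one SAM alignment line is worth keeping for scaffolding.

    Keeps primary alignments of paired reads where both mates are mapped
    and the mapping quality is at least `min_mapq` (an assumed threshold).
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])   # column 2: bitwise FLAG
    mapq = int(fields[4])   # column 5: mapping quality
    is_paired = bool(flag & 0x1)
    read_unmapped = bool(flag & 0x4)
    mate_unmapped = bool(flag & 0x8)
    not_primary = bool(flag & 0x100) or bool(flag & 0x800)
    return (is_paired and not read_unmapped and not mate_unmapped
            and not not_primary and mapq >= min_mapq)


def filter_sam(lines, min_mapq=30):
    """Yield header lines unchanged plus the alignments that pass keep_read."""
    for line in lines:
        if line.startswith("@") or keep_read(line, min_mapq):
            yield line


# Tiny made-up example: one header line and three alignment records.
header = "@SQ\tSN:ctg1\tLN:1000"
good = "r1\t65\tctg1\t100\t60\t4M\tctg2\t500\t0\tACGT\tIIII"
low_mapq = "r2\t65\tctg1\t100\t5\t4M\tctg2\t500\t0\tACGT\tIIII"
mate_unmapped = "r3\t73\tctg1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII"
kept = list(filter_sam([header, good, low_mapq, mate_unmapped]))
```

For mate-pair libraries, mates that map to different contigs are exactly the ones that carry scaffolding information, which is why this sketch does not require the aligner's "proper pair" flag.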

The Dutch S. endobioticum isolate of pathotype 1 from PRI (collaboration with Dr. Peter Bonants & Dr. Theo Van der Lee, Plant Research International (PRI), Wageningen, The Netherlands) was sequenced and used for development of microsatellite markers. Additional Canadian isolates were sequenced as well at AAFC.

4.2.10.2 Diagnostic tools

An assay based on microsatellite markers was designed to genotype the isolates of S. endobioticum from Newfoundland and PEI. The microsatellite markers were developed to help in the characterization of S. endobioticum isolates found in Canada.

Data from whole genome sequencing and the de novo assembly project of S. endobioticum were used to look for microsatellites. The project consisted of the whole genome sequencing of one Dutch S. endobioticum isolate of pathotype 1 from PRI. The software MsatCommander was used to look for microsatellites with repeat motifs of 1-6 bp. Microsatellites (N = 62) with the largest numbers of repeats in the 2-6 bp categories were chosen for primer design. The 62 primer pairs were first tested by amplification, with products checked on 1.5% agarose gels, using three isolates representing different pathotypes and a negative control to check for amplification success. The 37 primer pairs that produced amplicons in all isolates were later used to screen for polymorphism on an ABI 3130xl Genetic Analyzer in a larger panel of isolates. Universal fluorescent labelling was used to avoid the cost of genotyping with individually fluorescently labelled primers.
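The microsatellite search itself was done with MsatCommander; the sketch below is not that software but a minimal illustration of the underlying idea, using a regular expression with a backreference to find perfect tandem repeats of 2-6 bp motifs. The minimum of five repeats, the function name and the toy sequence are assumptions made for illustration.

```python
import re


def find_microsatellites(seq, min_repeats=5, motif_lengths=range(2, 7)):
    """Return sorted (start, motif, repeat_count) for perfect tandem repeats.

    Only motifs of the given lengths are searched, and a locus is reported
    only under its shortest repeating unit (ATAT repeats count as AT).
    """
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # ([ACGT]{k}) captures a motif; \1{n,} requires n further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs built from a shorter unit (including homopolymers).
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence with an (AT)8 and a (GAC)6 repeat; positions are 0-based.
toy = "GGG" + "AT" * 8 + "TTT" + "GAC" * 6 + "TT"
loci = find_microsatellites(toy)
```

Ranking the resulting loci by repeat count and keeping the longest ones would mirror the selection of candidates for primer design described above; flanking sequence for primer placement is not handled in this sketch.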

A total of 21 loci were polymorphic in S. endobioticum and all the isolates presented different genotypes, except for three isolates from PEI (2012 and 2014), which were identical. Figure 2 shows the different grouping patterns. There is no clear association between pathotypes and genotypes. The PEI isolate from 2000 shows some genetic differences from those from 2012 and 2014, which might indicate two introduction events on PEI. This is the very first molecular test that could link strains genetically.

Figure 2: Principal Component Analysis with Bruvo’s distance of Potato wart isolates from different geographical origins and of different pathotypes (Se01-Se17).

An SOP with these microsatellite markers has been transferred to CFIA Charlottetown, where potato wart samples are isolated and tested. The microsatellites developed may serve to characterize and control the spread of the pathogen in Canada through a better understanding of this pest’s evolutionary history. The assays developed may also enable CFIA to respond rapidly in case of another detection of potato wart, notably by helping to track down its origin. During this project new isolates were found on PEI in 2012 and 2014. These markers were very useful for identifying the genotype and indicating whether this could be a new introduction, and they will help the potato program to make decisions.

4.2.11 Globodera pallida (Gp)

Globodera pallida, also called the pale cyst nematode, is a major quarantine pathogen of potato. As outlined in the charter, we did not plan to do any sequencing on this species because we knew that a European consortium was working on it. However, the genome was published only in 2014 (Cotton et al. 2014), later than expected and too late to develop assays. The CSSP team decided to focus its efforts on the other cyst nematode.

4.2.12 Globodera rostochiensis (Gr)

The golden potato cyst nematode, Globodera rostochiensis, is among the most damaging pests of potato. It has been found in a few potato growing regions of Canada. Understanding the genetic factors behind virulence will contribute to the development of new monitoring and management tools.

As with G. pallida, we were counting on European groups to provide the genome needed to develop genotyping tools. In this instance, we decided to proceed with genome sequencing of our own, using other resources from CFIA and AAFC since this was not originally planned and this is a very large genome. We generated a genome assembly of 95.9 Mbp containing 13,650 predicted genes, which were annotated. Genes involved in hatching and cyst survival, and analogs of known effectors, were identified. A method based on Pool-Seq and Genotyping by Sequencing (GBS) was also developed to rapidly obtain population genetics information on field samples of cyst nematode. This led to the identification of pathotype- or origin-specific SNPs and to the development of ASO-qPCR assays.

We have been using the data generated from the genomic sequencing of the different strains and pathotypes of G. rostochiensis and have done several whole genome comparisons to identify pathotype specific markers. We have identified several different markers but few that are pathotype specific. We think this has a lot to do with the fact that pathotypes are poorly characterized phenotypically. So, for now at least we are modifying our approach to look at patterns of markers between pathotypes. For this we will need more genomic sequence data from more isolates which we are now pursuing.

The genome sequence of G. rostochiensis is providing the basis for a better understanding of virulence. Genes associated with effectors or SNPs associated with distinct pathotypes will be used for the development of ASO-qPCR detection tools for first responder agencies (CFIA). Knowledge about virulence and key developmental stage will be used for the development of new management tools (resistant cultivars) in order to minimize the impact of future introductions. The GBS protocol will also be useful for the rapid genetic characterization of other cyst nematode species.

We did not deliver an ASO-PCR assay but we can genotype by GBS. It would be pointless to develop an ASO-PCR assay right now without more pathotype genomes. This work will be continuing at AAFC and CFIA under different studies.

4.2.13 Ditylenchus destructor (Dd)

Potato rot nematode D. destructor is one of the most destructive pests of several tuber crops such as potato, and is on the quarantine list of most countries. Species identification based on morphology is difficult, and misidentifications cause problems with tremendous trade consequences.

The genome and transcriptome of D. destructor were sequenced and the genome sequences have been assembled. Transcriptomes were from both fungal-feeding and plant-feeding populations. The genome size is estimated at about 100 Mbp. This estimate is not precise because of some issues of contamination and wide genetic differences between the alleles of the diploid genome. However, there was enough data to generate assays.

Microsatellites have been identified and used as targets for real-time PCR. A total of 4 primer sets were designed and tested. The primer/probe sets SSR-1360 and SSR-5277 amplify their respective targets specifically and can be used to detect D. destructor in mixtures of DNA samples. Primer/probe set SSR-1360 is species-specific for D. destructor, while primer/probe set SSR-5277 is specific to the Jiangsu strain of D. destructor.

Real-time PCR assays for species-specific detection have been developed, and an SOP was developed and transferred to CFIA. The real-time PCR technology provides quick and accurate detection and identification of the pest. Prior to this, PCR using universal ITS primers, other ribosomal DNA-based and mitochondria-based primers, and AFLP were the methods used by our first responders such as CFIA. The new method is useful in dealing with the new national emergency of D. destructor.

4.2.14 Ditylenchus dipsaci (Ddi)

The stem and bulb nematode Ditylenchus dipsaci is a serious pest of many crops, primarily onion and garlic, and it is on the quarantine list of many countries. Recently, there was an outbreak in Ontario, and the pest is spreading to the province of Quebec and neighboring states in the USA, raising alarm because of its speed of spread. Identifications of the pest based on morphology and PCR-AFLP have not always been correct.

This species was not on the original charter. Because of the recent incidents and the advantage of having this genome for improving the D. destructor assay, we decided to sequence its genome.

The first draft genome of the stem and bulb nematode D. dipsaci has been generated. The early releases of the assemblies have over 100 Mbp. Microsatellites have been identified and a few were chosen for real-time PCR for the quick detection of D. dipsaci. We are at the retesting and validation stage, and an SOP will be developed and transferred to CFIA in the near future (a new deliverable that was not in the original objectives). The work done and the collection gathered in this project were used to solve the trade issue related to D. dipsaci/D. weischeri in our yellow pea exports to India.

4.3 Metagenomics from agricultural soil, commodity, agricultural & urban watershed

4.3.1 Bioinformatics tools developed for pathogen monitoring and biodiversity studies

We have developed three bioinformatics tools to facilitate NGS data analysis. These tools went beyond the original goals and are very significant improvements over the tools we were originally planning to use. This work was accomplished using funds leveraged from AAFC A-base project STB-97.

1) The Automated Oligonucleotides Design Pipeline (AODP) is an open source, Perl-based command-line utility that designs Signature Oligos (SO) of desired lengths for individual DNA barcode sequences or for clades of sequences. AODP integrates a previously developed program, “SigOli” (Zahariev et al., 2009), to identify the unique polymorphism site(s) and extract SO that can distinguish sequences belonging to different hierarchical clusters defined by annotated phylogenetic trees. AODP is available at https://bitbucket.org/aafc-mbb/aodp_release and https://github.com/AAFC-MBB/AODP_releases, and can be distributed and used under the MIT public license. AODP uses parallelization to minimize runtime, and a two-step cross validation process ensures the specificity of the SO designed by this pipeline. AODP can be used to design highly specific SO for any taxon or DNA barcoding region. It links SO to curated high quality reference databases and accurately annotated taxonomies. Using SO designed by AODP to improve the classification accuracy of metabarcodes is particularly useful for routine screening for agriculturally and economically important pathogens (e.g. quarantine species) in raw amplicon-based metagenomic datasets. SO designed by AODP can also be used as candidate primers and probes, and for the construction of DNA hybridization arrays, provided the thermodynamic properties are properly determined by an appropriate external program, such as ArrayDesigner.

2) We have further developed AODP version 2.0, which comprises a command-line application written in C++ and a Perl utility and consists in large part of code from aodp-1.3. This new version of AODP is highly efficient in designing SO for large datasets. It takes less than 20 seconds to generate SO for a dataset of over 10,000 sequences with an average length of 750 bp. AODP v. 2.0 will be released to the public in April 2015.

3) The Oligo-Fishing pipeline (OFP): OFP takes the SO designed by AODP and uses them as a virtual DNA array, to probe perfectly matched sequences from NGS databases and facilitate the classification of NGS sequences to lower taxonomic levels (species or even lower). OFP is functional but needs more work to enhance efficiency. OFP will be released to the public in April 2015.

4) The R package RAM (R for Amplicon-based Metagenomics) provides a series of functions to make amplicon-based metagenomic analysis more accessible to non-bioinformaticians, and simplify the creation of publication-quality plots. It has been published in CRAN and the current version is 1.2.0: http://cran.r-project.org/web/packages/RAM/index.html.
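The idea behind items 1) to 3) can be sketched in a few lines: a signature oligo is a subsequence present in every member of a target group and absent from every sequence outside it, and "fishing" keeps the reads that contain a perfect match to such an oligo. This is a simplified illustration only, not the AODP or OFP implementation; the function names, the fixed oligo length k and the use of plain substring matching are assumptions, and no thermodynamic filtering or cross validation is attempted.

```python
def kmers(seq, k):
    """All distinct subsequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_oligos(targets, others, k):
    """k-mers shared by every target sequence and absent from all others.

    `targets` must contain at least one sequence.
    """
    shared = set.intersection(*(kmers(s, k) for s in targets))
    background = set().union(*(kmers(s, k) for s in others))
    return shared - background


def fish_reads(reads, oligos):
    """Keep reads containing a perfect match to at least one signature oligo."""
    return [r for r in reads if any(o in r for o in oligos)]
```

In use, signature_oligos would be called once per clade of a reference barcode set, and fish_reads would then be applied to the NGS reads with the resulting oligos, mirroring the "virtual DNA array" described in item 3).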

4.3.2 NGS data analysis

We collected and sequenced 500 amplicon libraries (from 323 samples) by 454 pyrosequencing (Table 6), targeting the internal transcribed spacer (ITS) for fungi and nematodes and the 16S ribosomal RNA gene (16S) region for bacteria.

Table 6: Metagenomic sample summary

Count of samples by province ("-" = none)

Target / Project / Year          AB   BC   MB   NB   NL   NS   ON  PEI   QC   SK  Total
Bacteria
  CRTI_Cereal_Washes
    2011                          -    -    -    1    -    1    2    -    1    5     10
    Total                         -    -    -    1    -    1    2    -    1    5     10
  CRTI_CGC_Cereal_Wash_Tom
    2010                          6    -    5    -    -    -    2    -    2   13     28
    2011                         17    1    6    -    -    1    8    2    4   29     68
    2012                          9    -    2    -    -    -    1    -    1   11     24
    Total                        32    1   13    -    -    1   11    2    7   53    120
  CRTI_Soil_SeanLi
    2012                          -    -    -    -    6    -    -   42    -    -     48
    Total                         -    -    -    -    6    -    -   42    -    -     48
  CRTI_Water_IzharKhan
    2010                          -    -    -    -    -    -    9    -    -    -      9
    2011                          -    -    -    -    -    -   46    -    -    -     46
    2012                          -    -    -    -    -    -   41    -    -    -     41
    Total                         -    -    -    -    -    -   96    -    -    -     96
  Bacteria Total                 32    1   13    1    6    2  109   44    8   58    274
Fungi
  CRTI_Cereal_Washes
    2011                          -    -    -    1    -    1    2    -    1    5     10
    Total                         -    -    -    1    -    1    2    -    1    5     10
  CRTI_CGC_Cereal_Wash_Tom
    2010                          6    -    5    -    -    -    1    -    3   12     27
    2011                         17    1    6    -    -    1    9    2    3   30     69
    2012                          9    -    2    -    -    -    1    -    1   11     24
    Total                        32    1   13    -    -    1   11    2    7   53    120
  CRTI_Soil_SeanLi
    2012                          -    -    -    -    6    -    -   42    -    -     48
    Total                         -    -    -    -    6    -    -   42    -    -     48
  Fungi Total                    32    1   13    1    6    2   13   44    8   58    178
Nematodes
  CRTI_Soil_Nematodes_QY_BM_GB
    2008                          -    -    -    -    -    -    1    -    -    -      1
    2009                          -    -    -    -    -    -    4    -    -    -      4
    2010                          -    -    -    -    -    -   16    -    -    -     16
    2011                          -    -    -    -    -    -    2    -    1    -      3
    2012                          -    -    -    -    -    -    8    -   16    -     24
    Total                         -    -    -    -    -    -   31    -   17    -     48
  Nematodes Total                 -    -    -    -    -    -   31    -   17    -     48
Grand Total                      64    2   26    2   12    4  153   88   33  116    500

So far, we have processed all NGS sequences using two in-house NGS processing pipelines. The LCA pipeline adopts the lowest common ancestor algorithm (Huson et al., 2007) to process the BLAST results against the NCBI GenBank database for sequence classification, whilst the QIIME pipeline uses the RDP classifier (Wang et al., 2007) and the UNITE ITS database (Kõljalg et al., 2005) to assign fungal taxonomy to NGS data.
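To illustrate the lowest common ancestor idea behind the first pipeline: when a read has several comparably good BLAST hits, it is assigned to the deepest taxon that all of those hits share. The sketch below is a toy under stated assumptions, not the in-house pipeline; lineages are supplied as hypothetical root-to-leaf lists rather than looked up in the NCBI taxonomy, and the filtering of hits by score is omitted.

```python
def lowest_common_ancestor(lineages):
    """Return the longest root-to-leaf prefix shared by all lineages.

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    if not lineages:
        return []
    prefix = []
    for ranks in zip(*lineages):
        # Stop at the first rank where the hits disagree.
        if all(r == ranks[0] for r in ranks):
            prefix.append(ranks[0])
        else:
            break
    return prefix
```

With hits to two species of the same genus, the read is assigned only to the genus, which is the conservative behaviour wanted when closely related regulated and unregulated species cannot be told apart by the amplicon.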

We have analysed the epiphytic microbiomes of cereal grains and the core bacterial communities of agricultural/urban watersheds associated with different commodities, and collected baseline data on the fungal/bacterial communities of soil samples from potato farms. We focused on the identification of important plant / human / animal pathogens and fitted different models to predict their distributions under the impact of climate change, geographic location, physical and chemical attributes of the environment, as well as anthropogenic activities. Network and co-occurrence analyses were performed to explore the distribution and correlation of taxon groups in different environmental niches. Currently, four (4) manuscripts are in preparation.
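A co-occurrence analysis of the kind mentioned above can start from a simple presence/absence comparison across samples. The following sketch uses the Jaccard index as the co-occurrence measure and keeps taxon pairs above a cut-off as network edges; the choice of measure, the threshold and the function names are assumptions for illustration and do not reproduce the analyses performed in the project.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared samples divided by all samples in which either taxon occurs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cooccurrence_edges(presence, threshold=0.5):
    """Build network edges from presence/absence data.

    `presence` maps a taxon name to the set of sample IDs where it was
    detected. Returns (taxon1, taxon2, score) for pairs scoring at or
    above `threshold` (an illustrative cut-off).
    """
    edges = []
    for t1, t2 in combinations(sorted(presence), 2):
        score = jaccard(presence[t1], presence[t2])
        if score >= threshold:
            edges.append((t1, t2, score))
    return edges
```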

4.3.3 Comparison of data with cpn60

Amplicon-based metagenomics is limited by the resolution of the targeted amplicon. In bacteria, 16S and ITS lack the discriminatory power to separate species of very important pathogens. One sub-objective of this project was to directly compare metagenomic data generated using a variety of platforms and also to compare the microbial profiles obtained using different taxonomic markers (16S, cpn60, ITS). In this work, we established microbial profiles for seed samples based on both 16S/ITS (bacteria/fungi) and cpn60 (bacteria/fungi). We established a pipeline for analyzing data generated using the cpn60 taxonomic marker (Links et al., 2013) and have enhanced it by creating an exploratory data environment (EDE). We will further improve this analysis pipeline by expanding the EDE to a collaborative EDE (CEDE) in a recently initiated A-base project (see below).

4.3.4 Impact, transition and exploitation

We offered a series of tutorials to collaborators and students from USDA, AAFC, CFIA, CGC and universities. We presented applications for taxonomic identification in metagenomic data and an R package for analysis following clustering into OTUs and taxonomic identification. We also broadcast the training sessions by webinar, WebEx and teleconference. A potential collaboration was established with scientists at the USDA to integrate two software packages for the identification/detection of pathogens using both amplicon and shotgun metagenomics data. Our training was well received and we have been able to expand our collaborations to USDA, CFIA animal sections and research teams at universities.

5 Transition and Exploitation

5.1 Genomics and Metagenomics data

This project generated 16TB of DNA and RNA sequence data. These data are all available to partners at http://biodiversity/wiki/bin/view/Main/CRTI-09S-462RD.

We received 807 GB (after file compression) of raw reads in total from PBI-NRC. Based on a few calculations from the sample summary sheet, this represents a total of just over 1 trillion raw base pairs sequenced for CRTI.

After the assemblies were done, the data released across all organisms for CRTI amounted to 1.5 TB of genomics/transcriptomics data and 2 TB of metagenomics data (available at the website mentioned above). Almost 100 TB of storage had to be available during the genome assembly phase and while analyses were being done to pick the best candidate releases. The metagenomics data have already been used to generate analyses that are being used for policy analysis. The genomics data have been used for the development of several diagnostic tools, and this work will continue, hence the importance of data availability. The genomics data of species closely related to very high-risk pathogens will reduce the potential for trade embargoes (e.g. reduction of false positives for Karnal bunt because we sequenced the closely related Tilletia walkeri).

5.2 Validation of the diagnostic tools

Diagnostic tools have been validated in the laboratory (TRL4) but several assays have gone beyond that. In conjunction with the acquisition of a high-throughput quantitative PCR system (CSSP-2013-TI-1141) for the Canadian Grain Commission, we were able to implement newly developed diagnostic assays for high-risk pathogens in the routine monitoring of grain shipments. The results and outcome of the project expanded CGC’s search capacity for toxigenic and pathogenic microorganisms. In comparison to previously employed methods, the new SOPs enhance and expedite identification and strain typing of microbial agents using NGS and qPCR protocols.

5.3 New projects initiated for further development and transition

At a meeting held on May 6, 2013 at AAFC headquarters in Ottawa, Dr. Catherine E. Woteki, Under Secretary and Chief Scientist, U.S. Department of Agriculture (USDA), and Dr. Siddika Mithani, Assistant Deputy Minister (ADM) for STB, decided to work together to make best use of scarce resources and focus on advancing collaboration around five strategic priority areas, including “Next Generation Sequencing Technologies and Applications” to address potential threats to trade. A follow-up meeting was held at the Agricultural Research Service (ARS), Beltsville, on June 2-3, 2014, and included scientists from STB and CFIA and scientists from USDA’s ARS and the Animal and Plant Health Inspection Service (APHIS), primarily the scientists from this study. An Action Plan for collaborative research was drafted for next generation sequencing for plant pathogens of high consequence, focussing on bacteria and fungi as a pilot project.

The year our new project started, the Genomics R&D Initiative (GRDI) was seeking renewal. A new approach was suggested by the ADM steering committee: 20% of the funding was set aside for two shared priority projects, one on food and water safety and one on quarantine and invasive species (QIS), with $2M per year allocated to each project. AAFC and CFIA led the project on QIS, which was much broader than the current CSSP project, going well beyond plant pathogens as it involved the Department of Fisheries and Oceans, Environment Canada, Natural Resources Canada and the National Research Council, but narrower in some ways as only viral genomes were sequenced. The CSSP-CRTI project was instrumental in obtaining this project and in leading it, particularly for the project management and the bioinformatics components. Because this new project had to invest in computer infrastructure immediately for fiscal year reasons, the CSSP-CRTI project benefitted immensely from access to a large computer resource early on while the new GRDI project was ramping up. This bioinformatics capacity and the decreasing cost of DNA sequencing are the main reasons we were able to sequence many more genomes than originally planned. 4-year project (2012-2016) $8M.

Two research proposals were recently funded through the AAFC peer-review process: (i) Project STB-1136. Mycology and Bacteriology Biosystematics - Filling gaps in agricultural fungal and bacterial biodiversity (2015-2018) $1.70M, and (ii) Project STB-1134. Next generation sequencing - genomics and metagenomics of quarantine fungal and bacterial crop pathogens (2015-2018) $708K. The second project is a direct continuation of CRTI 09-462RD, focussing on the new USDA collaboration on next generation sequencing.

The identification issues raised in this project, primarily the potential for false positives of high consequence, led to a study devoted to bioinformatics and identification, a project entitled “Development of identification and analysis tools for amplicon-based metagenomics, focussing on high risk and regulated pathogens.” We have developed three bioinformatics tools to facilitate NGS data analysis. These tools went beyond the original goals and are very significant improvements over the tools we were originally planning to use. This work was accomplished using funds leveraged from AAFC A-base project STB-97. 3-year project (2014-2017) ($325K).

AAFC funded a multi-centre study on Emerging Mycotoxins (EmTox), which runs until March 2018, addressing other mycotoxins (beyond Ochratoxin A) targeted for possible trade regulations by the European Union. The approach used in the CRTI study will be extended in the EmTox study, which will combine state-of-the-art analytical chemistry with assays derived from mycotoxin gene clusters, to address the prevalence of these toxins in Canada and prepare for the impending regulations. 3-year project (2014-2017) $1.67M.

AAFC has funded continued improvement of our bioinformatics platforms across multiple users, as part of the GRDI mandate-funded project “Exploring the applicability of new technologies and processes to the management and analysis of next generation sequencing data”, with Christopher Lewis and Matthew Links as co-project leads. 3-year project (2014-2017) $394K.

Continuing the work on metagenomics of grains, an AAFC A-base proposal was funded: “Bioinformatics for the seed microbiome”, with Matthew Links and Tim Dumonceaux as co-project leads. 3-year project (2014-2017) $292K.

Funded project (Saskatchewan Agriculture Development Fund): “Development and application of rapidly deployable in-field molecular diagnostics for plant diseases”, Tim Dumonceaux, Chrystel Olivier, Matthew Links, and Hossein Borhan. 2-year project (2014-2016) $51K.

As a follow-up activity, funding was secured for work on another bacterial pathogen of corn, under the project title “Monitoring an invasive bacterial pathogen (Clavibacter michiganensis subsp. nebraskensis) of corn using next-generation sequencing”, with Dr. Tambong as the principal investigator. Funding was provided by AAFC STB-89, 3-year project (2014-2017) $210K, and by industry, MCGA CRADA # AGR10755 ($45K).

Results obtained in this project have triggered a follow-up study to further analyze the pathogenic and non-pathogenic strains of R. solanacearum R3bv2 in a collaborative project between CFIA and AAFC (CFIA-RPS Project CHA-P-1412 entitled “Comparative genomic analysis of Ralstonia solanacearum race 3 bv 2 (pathogenic and non-pathogenic strains) with non-R3bv2 strains of various pathogenic characteristics for accurate diagnostic methodology targeting low temperature adapted pathogenicity determinants”), which has been approved at CFIA with a total funding of $276K for the next three years.

Results obtained in this project have also triggered a follow-up study to further analyze pectolytic bacteria of European, North American and other origins, and their environmental fitness, to assess the potential threat to the Canadian potato industry. A collaborative project between CFIA and the Netherlands (CFIA-RPS Project CHA-P-1313 entitled “Investigation of Dickeya spp. for their environmental fitness on potato and development of rapid and sensitive molecular diagnostic method”) has been approved at CFIA with a total funding of $234K for the next three years.

Results obtained in this project have triggered a follow-up CFIA GRDI project (GRDI CHA-P-1411 entitled “Development and application of genomics and metagenomics-based surveillance technologies toward eradication of potato bacterial pathogens of regulatory importance”) to further analyze the regulated plant pathogenic bacteria in collaboration between CFIA and AAFC, which has been approved at CFIA with a total funding of $140K for the next five years. In addition, another TD Project Proposal, CHA-P-1511, entitled “Establishing next generation sequencing capability for diagnostics and research”, has been planned to complement the GRDI project at the CFIA-Charlottetown Laboratory.

Seifert joined a project funded by the US Dept. of Energy Joint Genome Institute to sequence the genomes of ten additional mycotoxin-producing Penicillium species, and leads the mycotoxin gene annotation effort in this project.

Chen initiated a collaborative project (2015-2018) with the Microbiology Institute, Chinese Academy of Sciences, to collect baseline data on the epiphytic microbiomes of cereal grains in China. Although this project aims to test the underlying hypothesis that grain quality and yield are associated with specific members, functional genes and metabolic pathways of their epiphytic microbial communities, we will also obtain first-hand data on the potential distribution of unidentified “DNA species” in Canada through international trade with China (please refer to section 4.3.5).

5.4 Success stories

We present here a few examples of the successes from this CSSP project CRTI 09-462RD. They are written up as short success stories that can be used as stand-alone pieces to support the CSSP program that made them possible.

Improved capacity in agri-food research for high risk pathogens

When this project started, there was very little knowledge and capacity at the federal government level to handle large amounts of next generation sequencing data in the context of agri-food pathogens. This project seeded a transformative change at AAFC, CFIA and CGC. The members of this project managed to leverage more than five-fold the original CSSP investment of $2M, in projects that were either a direct continuation of CSSP sub-projects for further development and transition or projects that were highly complementary to our CSSP project (total over $10M). AAFC, CFIA and CGC now share several TB of disk space on a shared server with raw and analysed next generation sequencing data, a transformative change in itself. Through web-based applications (e.g. R-Studio or Galaxy) these data can be analyzed by all team members while they reside on the server, avoiding the very inefficient data transfers that were previously the norm for sharing data. Through control of software and analysis versions, the tracking and reproducibility of analyses have been dramatically improved. It is also reassuring that, with the new individual and collaborative projects approved until 2018, development and enhancement of this shared resource will continue.

Impact of project on policy development and government priorities

The main rationale behind this project was the plummeting cost of DNA sequencing. Less than 10 years ago, we would have been able to sequence only one of the many genomes we sequenced with the funding we received. The continually decreasing price of sequencing, 10,000 fold over 8 years, is one of the factors that allowed us to sequence many more genomes than we originally planned. It is now cost effective to use next generation sequencing as a detection tool for pathogens. The sequencing cost decreased for us but also for our trading partners that are screening commodities they import from Canada. The largest next generation sequencing (NGS) facilities are in countries that import heavily from Canada. This transformative technological change and the potential risk for trade have caught the attention of policy makers at high levels and team leaders of this project received invitations from senior management to make presentations on NGS impact (see section 6.3). Following a meeting in 2013 between USDA Undersecretary for Research and AAFC S&T Branch ADM, NGS has become one of the top priorities for collaboration between USDA and AAFC. Reports on this collaboration are being requested regularly from the highest levels of AAFC. This collaboration will continue through new funding received on both sides of the border. The new findings of potato wart in PEI in 2014 led the CFIA VP Science to write to the AAFC S&T Branch ADM to acknowledge the CSSP/CRTI subproject on potato wart, asking to continue this work beyond the end of the project, moving from genotyping to comparing strains to new work on pathotyping, in support of efforts to develop resistant potato varieties.

Karnal bunt of wheat

Canadian exports of wheat are worth about $7B annually. The presence of certain quarantine pathogens could immediately stop our exports until we demonstrate that our wheat is no longer a threat to the importers. Karnal bunt of wheat, caused by Tilletia indica, is such a pathogen and is a quarantine organism in most countries of the world (e.g. China and Europe). This pathogen has been considered high risk in the Consolidated Risk Assessment of CRTI. It was found in Arizona, USA, in March 1996, and despite quarantine measures in infested US states, several countries stopped importing from the US until new certification agreements were established. Continuous surveys and monitoring of Karnal bunt in the US to provide export certificates are very expensive, but deregulating would cost more than one billion per year because of lost export markets. The cost of an introduction of Karnal bunt into Australia has been estimated at $500M per year, with a higher cost in the first year. Similar regulatory and economic concerns exist for the related species Tilletia controversa, which causes dwarf bunt of wheat. Through our project we recently sequenced with NGS the entire genomes of the Karnal bunt and dwarf bunt agents and their two closest relatives, going beyond the original goals of this project. We developed very specific assays from genome data to independently detect the regulated bunt pathogens only (to avoid the false positive issues that added to the economic impact of Karnal bunt in the US) or to detect the closely related unregulated species to double check for any false positives; the latter also went beyond the original goals of the project. From the genome data of Tilletia indica, we could also quickly design assays to perform strain typing and determine the origin of a contamination if required, or even develop an assay to determine whether the molecular detection is from spores that are dead or alive.

Detection of pathogens in grains by Next Generation Sequencing (NGS) technology

Generating millions of sequences is now easy and inexpensive, and Canadian commodities are currently being screened elsewhere (the largest NGS centres are in Asia). Interpretation of NGS data is difficult and prone to error or misinterpretation, and “off the shelf” identification tools for NGS data typically show many worrisome but inaccurate results. With our project we found that such tools, applied to Canadian wheat, were improperly detecting Tilletia walkeri, an innocuous pathogen of annual ryegrass, as the quarantine pathogen Tilletia indica. Such false positive results could have disastrous consequences for Canadian exports. Because of our project, we were made aware of some very specific issues with off the shelf tools and have developed better approaches to analyse the data, in addition to the alternative tests described above to demonstrate that these were indeed false positives. Because of false positive results from an older molecular assay, counties in the Southeastern US and Oregon were falsely declared under quarantine in the mid-nineties, with disastrous economic consequences in the US. The work conducted in Dr. Wen Chen’s lab was featured in AAFC “Weekly Science Stories” on September 17th, 2014, under the title “Next Generation Sequencing to Identify Pathogens”, available at http://agriwiki.agr.gc.ca/wiki/STB_Weekly_Science_Stories#Next_Generation_Sequencing_to_Identify_Pathogens-_September_17.2C_2014

Potato wart

The United States and other countries implemented a trade embargo on PEI potatoes in 2000 immediately after the first finding of this disease in the province. According to the Auditor General of Canada, the cost of a few months of embargo in PEI was $84M¹. In our first CRTI project (04-0045RD) we improved on a molecular assay we developed in 2000, and a validated protocol to detect the pathogen straight from soil was recently published². The causal agent of potato wart was added to the US Select Agent list in 2002. One cannot absolutely prove this, but our molecular assays, now validated by the Americans, certainly played a role in keeping the border open when new PEI cases were found in 2012. The work we completed in this project was about genotyping potato wart in order to match strains or establish their origin. Our assay has a very high resolution in separating strains. So far, PEI strains are very closely related, which is very good news because it points to a single original infestation. The latest finds in 2014 are genetically identical to the 2012 cases. We are working closely with the Europeans on this, and CSSP/CRTI funding was crucial in establishing this partnership. USDA is also very interested in our findings as they rely on us to sequence the genome of this select agent.

Potato rot nematode

Ditylenchus destructor is a serious quarantine nematode pathogen of potato, causing potato rot. It was discovered in garlic in Ontario in 2011, early in the project. The rapid discovery and accurate identification averted a potentially disastrous threat to the Canadian potato industry. If this pathogen had spread to a potato field, it would have caused trade embargoes similar to what happened in PEI for potato wart, possibly on a larger scale because this was not on an island. The fact that the identity was established and confirmed quickly was largely due to the molecular tools and nematode isolates assembled through our project. A total of nine isolates of D. destructor had already been collected, including seven from China. One isolate from Jiangsu province was selected for genome and transcriptome sequencing and we now have a complete genome of this organism. We also generated the genome of Ditylenchus dipsaci, a close relative of D. destructor, which again went beyond the original scope of the project.

1 http://www.oag-bvg.gc.ca/internet/English/att_c20021004se03_e_12347.html
2 http://www.ncbi.nlm.nih.gov/pubmed/24328493

Potato brown rot

Ralstonia solanacearum R3bv2 is considered a quarantine pathogen in NAPPO, EPPO and Australasian countries, and is listed as a select agent under the US Agricultural Bioterrorism Protection Act of 2002. In 2003, R. solanacearum R3bv2 was detected in the United States in geraniums imported from Kenya, Guatemala and Costa Rica (Swanson et al., 2005). The response to these detections cost geranium growers and regulators an estimated $10 million that year alone. The spread of R. solanacearum R3bv2, which causes potato brown rot, has become a major threat to the potato industry in temperate regions. Based on the experience in European countries, eradication could become difficult or impossible once the bacterium is established in local soil and irrigation systems. For any confirmed detection of R. solanacearum R3bv2, for example by the US, the consequences could be significant, ranging from export restrictions to a full prohibition on the import of potato and other host crops from affected countries. On Dec 19th, 2013, Mexican plant protection officials notified CFIA of a positive laboratory diagnosis for brown rot of potato caused by R. solanacearum R3bv2 in pre-clearance seed potato samples submitted by growers in Alberta. In response to the claims by Mexico, the CFIA launched its own internal investigation, collecting 6000 samples from the seed lots associated with the possible detections. In addition, Mexican officials provided samples of the materials used for their diagnostic tests to the CFIA Charlottetown Laboratory to confirm their findings. Laboratory analysis at the CFIA Charlottetown Laboratory showed that no R. solanacearum R3bv2 was present in any of the samples submitted by CFIA inspectors. Similarly, R. solanacearum R3bv2 could not be isolated or confirmed from the samples forwarded by Mexican officials. As a follow-up activity to try to resolve the discrepancy in results, a technical exchange session between CFIA and Mexican technical staff has been planned.
During the internal investigation at the Charlottetown Laboratory, the new RT-PCR assays developed in the CRTI 09-462RD project played a key role. Although a fragment of the genome of R. solanacearum R3bv2 was detected in the rejected samples provided by Mexican officials using an old in-house assay, none of these samples tested positive using the new RT-PCR assays, which target four novel and unique fragments of the genome of R. solanacearum R3bv2. Additional analysis indicates that the positive reports of R. solanacearum R3bv2 by Mexican officials were most likely due to a lab contamination in Mexico with their PCR product, which shares some homology with the target fragment of our old in-house assay.

Ochratoxin A (OTA)

This mycotoxin, produced primarily by the fungus Penicillium verrucosum in Canada, is found in grains, especially barley and oats, as well as in beer, imported wine and coffee. It is a Group 2B human carcinogen (kidney) according to the International Agency for Research on Cancer. New lower limits have been proposed by Health Canada for trade and domestic regulations, following a European lead. This creates challenges for grain producers because the new limits are near what were previously considered low and acceptable levels. We completed the genome sequencing of this fungus through this project, which allowed us to leverage other funds to sequence transcriptomes to follow genes involved in OTA synthesis, as well as the genomes of five additional strains and two closely related Penicillium species. Annotation of our genomes will be enhanced by our participation in a US Dept. of Energy Joint Genome Institute project on the genomics of ten other Penicillium species. AAFC funded a multi-centre study on Emerging Mycotoxins (EmTox), which runs until March 2018, addressing other mycotoxins targeted for possible trade regulations by the European Union.

Wheat stripe rust

Stripe rust, caused by Puccinia striiformis f. sp. tritici, is a serious disease of wheat, occurring predominantly in temperate and high elevation regions. Disease incidence and severity are dependent on environmental conditions, but 100% yield losses are possible if infection occurs very early and continues to develop during the growing season (http://striperust.wsu.edu/generalInformation/puccinia-striiformis-impacts.html). The pathogen is characterized as requiring relatively low temperatures for germination and infection, but recent epidemics have demonstrated the emergence of strains adapted to higher temperatures, expanding the global threat to new areas closer to the equator (http://www.icarda.org/blog/%5Bnode%3ABlog%20type%5Dcountering-threat-wheat-stripe-rust-disease). The higher temperature adaptation also affects the infective potential of the pathogen, pushing upwards the temperature trigger for the fungus to switch away from production of the wind-dispersed spores responsible for disease spread. In Canada, stripe rust has been a long-standing problem for Alberta, affecting up to 25% of the spring wheat production. Since 2000, the disease has expanded its range eastward to Saskatchewan, Manitoba and Ontario, occasionally causing localized but substantial losses. In 2011, losses reached epidemic levels in southern Alberta and Saskatchewan and, abnormally, the pathogen had overwintered in Canada. In 2013, the disease was documented for the first time in Quebec, on cultivars grown in experimental performance trials at Laval University. The species-specific primers for rust pathogens developed and tested in this project were used as one line of evidence confirming the first report, in addition to morphological data and results from real-time PCR assays developed by this team in our preceding CRTI project (not published until 2015). The new primers were combined in a multiplex PCR assay, which was especially valuable for demonstrating in one reaction that the initial DNA extracts were derived from specimens infected with leaf rust (Pt) as well as stripe rust. This inexpensive DNA-based test will be effective as a screening assay for environmental samples as part of a disease-monitoring program.


6 Conclusion

The main goal of this project was to reduce the risk of biological threats to the Canadian agri-food system by exploiting new generations of sequencing and genotyping technology. This goal was accomplished. The project completed all its milestones and deliverables, apart from a few minor delays or minor changes from the original plan. It also added many more deliverables, quadrupling the number of strains with their genomes sequenced, increasing the number of samples processed by metagenomics and developing six more assays than originally planned.

We expected that some of the targets would become major national issues during the project, but we did not know which targets these would be. This project was instrumental in helping to resolve crises, and major investments have been made to push some of its most critical components further. Potato wart was found in PEI in several new fields in 2012 and 2014. Both the species detection and the genotyping assays have been used to help manage this problem. Pathotyping, i.e. identifying the virulence factors in the pathogen in order to guide management decisions on planting resistant potato varieties, has now emerged as a new need. The VP of CFIA contacted the AAFC S&T Branch ADM to acknowledge the good work done so far and to mention this new requirement arising from the newly found fields. This project accomplished major steps towards this new objective, particularly the genome annotation, which was not originally planned. The assays for potato brown rot were used to resolve a trade dispute with Mexico. The assay for wheat stripe rust was used to confirm the first occurrence of this disease in Québec. Early on, the strain collection that was being assembled for this project was instrumental in confirming the first case of the potato rot nematode. NGS processing of grain samples is being done by our trading partners, although this is not yet being used to trigger quarantine measures. We uncovered major false-positive issues with off-the-shelf analysis protocols and expanded our original goals to resolve these issues.

This work has resulted in the creation of a new working relationship among scientists at AAFC, CFIA, and the Canadian Grain Commission that will be highly productive going forward as threats to the Canadian food production system are identified through monitoring. The collaborative relationship means that any detection or monitoring technologies developed can be rapidly and meaningfully applied to the most relevant samples. Many detection assays have been developed through this project but a much stronger foundation now exists for further development and better technology transfer.

We need to continue to work together, and there are several new projects that will be carried out with the same momentum. Validation of developed methods is crucial to the accuracy and precision of test results, and this will need to continue for new tests and to raise the technology readiness of some of the new tests. The various technology platforms used by different project partners often required additional time for adapting individual diagnostic methods to particular instruments. This project identified the need for standardization of platforms, technologies and analysis software between portfolio partners to reduce the time and costs required for technology transfer from one department to another.

Although all the genomics and metagenomics data are available to all partners, there remain differences in the analysis software that can be installed by the different partners. This reduces the capacity to fully use the IT infrastructure hosted at AAFC/CFIA headquarters. There are projects in place to improve connectivity and shared analysis platforms among the different partners.

As we predicted at the onset of this project, the cost of DNA sequencing is continuing to go down, making DNA sequencing an affordable new tool for pathogen detection and for the development of more robust and rapid assays. We translated this cost saving into sequencing more genomes than originally planned. Because first responder agencies were involved in our project, immediate uptake of technology has already happened. The methods and data we developed are improving the detection and monitoring capability of the Canadian environment for high-risk organisms, with tangible results.


References

Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic acids research 25: 3389-3402.

Andrews S. 2010. FastQC: A quality control tool for high throughput sequence data.

Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology 19: 455-477.

Blankenberg D, Kuster GV, Coraor N, Ananda G, Lazarus R, Mangan M, Nekrutenko A, Taylor J. 2010. Galaxy: A Web-Based Genome Analysis Tool for Experimentalists. Current protocols in molecular biology: 19.10.11-19.10.21.

Cantarel BL, Korf I, Robb SM, Parra G, Ross E, Moore B, Holt C, Alvarado AS, Yandell M. 2008. MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Research 18: 188-196.

Cheng P, Chen XM, Xu LS, See DR. 2012. Development and characterization of expressed sequence tag-derived microsatellite markers for the wheat stripe rust fungus Puccinia striiformis f. sp. tritici. Andris, M., Arias, MC, Barthel, BL, Bluhm, BH, Bried, J., Canal, D., Chen, XM, Cheng, P. et al: 779-781.

Duplessis S, Cuomo CA, Lin Y-C, Aerts A, Tisserant E, Veneault-Fourrey C, Joly DL, Hacquard S, Amselem J, Cantarel BL. 2011. Obligate biotrophy features unraveled by the genomic analysis of rust fungi. Proceedings of the National Academy of Sciences 108: 9166-9171.

Gordon A, Hannon G. 2010. Fastx-toolkit. FASTQ/A short-reads pre-processing tools (unpublished). http://hannonlab.cshl.edu/fastx_toolkit

Huson DH, Auch AF, Qi J, Schuster SC. 2007. MEGAN analysis of metagenomic data. Genome Research 17: 377-386.

Kõljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E. 2005. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytologist 166: 1063-1068.

Kolmer JA, Ordoñez ME, Manisterski J, Anikster Y. 2011. Genetic differentiation of Puccinia triticina populations in the Middle East and genetic similarity with populations in Central Asia. Phytopathology 101: 870-877.

Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Links MG, Chaban B, Hemmingsen SM, Muirhead K, Hill JE. 2013. mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences. Microbiome 1: 1-7.


Metzker ML. 2010. Sequencing technologies—the next generation. Nature Reviews Genetics 11: 31-46.

Mulder N, Apweiler R. 2007. InterPro and InterProScan. In: Comparative genomics: 59-70. Springer.

Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA. 2000. A whole-genome assembly of Drosophila. Science 287: 2196-2204.

Saunders DGO, Win J, Cano LM, Szabo LJ, Kamoun S, Raffaele S. 2012. Using hierarchical clustering of secreted protein families to classify and rank candidate effectors of rust fungi. PLoS One 7: e29847.

Swanson JK, Yao J, Tans-Kersten J, Allen C. 2005. Behavior of Ralstonia solanacearum race 3 biovar 2 during latent and active infection of geranium. Phytopathology 95: 136-143.

Trapnell C, Pachter L, Salzberg SL. 2009. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25: 1105-1111.

Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. 2010. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28: 511-515.

Wang Q, Garrity GM, Tiedje JM, Cole JR. 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental Microbiology 73: 5261-5267.

Zahariev M, Dahl V, Chen W, Levesque CA. 2009. Efficient algorithms for the discovery of DNA oligonucleotide barcodes from sequence databases. Molecular ecology resources 9: 58-64.

Zerbino DR, Birney E. 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821-829.


Annex A Project Team

Role | PRC (Y/N) | Name/Title/Organization | Phone Number | E-mail Address

Project Champion
Project Champion (PRC Chair) | Y | Michèle Marcotte, Science Director, Eastern Cereal and Oilseed Research Centre - Ottawa, AAFC (delegated by Gilles Saindon) | (613) 759-1525 | [email protected]

PRC Core Members
Project Manager | Y | André Lévesque, Research Scientist, Ottawa, AAFC | (613) 759-1579 | [email protected]
Portfolio Manager | Y | Nezih Mrad (current), Norman J Yanofsky (retired in 2014), Biology Portfolio Manager, Centre for Security Science (CSS) | (613) 947-1198 | Nezih.Mrad@drdc-rddc.gc.ca
Partner Scientific Management Representative | Y | Linda DeVerno, Director, Ottawa Plant Laboratory, CFIA (supervisor is Karen Jessett) | (613) 228-6690 (5902) | linda.deverno@inspection.gc.ca
Partner Scientific Management Representative | Y | Stefan Wagener (current), Peter Burnett (retired in 2014), Director, Grain Research Laboratory, Canadian Grain Commission, Winnipeg | (204) 983-2764 | stefan.wagener@grainscanada.gc.ca

PRC Associate Members
Deputy Project Manager | Y | Wen Chen, Research Scientist, AAFC Ottawa | (613) 759-1284 | [email protected]
Chief Bioinformatician | Y | Christopher Lewis, Bioinformatics, AAFC Ottawa | (613) 759-1232 | [email protected]
Project Finance Officer | N | Louise Clermont, Finance/Administration Officer, AAFC Ottawa | (613) 759-1669 | [email protected]

Additional Team Members (non-PRC)
PWGSC Representative | N | N/A | |
Project Procurement Lead | N | N/A | |
Other Partner Representatives | N | Sean Li, Research Scientist, Charlottetown Laboratory - Plant Health, CFIA, Charlottetown | (902) 368-0950 (263) | [email protected]
Other Partner Representatives | N | Guillaume Bilodeau, Research Scientist, Ottawa Plant Laboratory (Fallowfield), CFIA, Ottawa | (613) 228-6690 (4997) | Guillaume.Bilodeau@inspection.gc.ca
Other Partner Representatives | N | Tom Graefenhan, Research Scientist-Mycologist, Grain Research Laboratory, CGC Winnipeg | (204) 983-7797 | tom.graefenhan@grainscanada.gc.ca
Other Partner Representatives | N | Tigst Demeke, Research Scientist - Plant Molecular Biology, Grain Research Laboratory, CGC Winnipeg | (204) 984-4582 | tigst.demeke@grainscanada.gc.ca
Other Partner Representatives | N | Keith Seifert, Research Scientist (Mycology: Penicillium), AAFC Ottawa | (613) 759-1378 | [email protected]
Other Partner Representatives | N | Sarah Hambleton, Research Scientist (Mycology: cereal rusts, smuts and bunts), AAFC Ottawa | (613) 759-1769 | [email protected]
Other Partner Representatives | N | James Tambong, Research Scientist (Bacteriology: plant pathogens), AAFC Ottawa | (613) 715-5398 | [email protected]
Other Partner Representatives | N | Qin Yu, Research Scientist (Nematology: plant pathogens), AAFC Ottawa | (613) 759-1768 | [email protected]
Other Partner Representatives | N | Izhar Khan, Research Scientist (Bacteriology: human/animal pathogens), AAFC, Ottawa | (613) 759-7702 | [email protected]
Other Partner Representatives | N | Ed Topp, Research Scientist (Bacteriology: human/animal pathogens), AAFC London | (519) 457-1470 (ext 235) | [email protected]
Other Partner Representatives | N | Larry Kawchuk, Research Scientist (Mycology: potato pathogens), AAFC Lethbridge | (403) 317-2271 | [email protected]
Other Partner Representatives | N | Matthew Links, Biologist (Molecular Bioinformatics), AAFC Saskatoon | (306) 956-7693 | [email protected]
Other Partner Representatives | N | Tim Dumonceaux, Research Scientist (Bacteriology: Endophytes of plants), AAFC Saskatoon | (306) 956-7653 | [email protected]
Other Partner Representatives | N | Mike Rott, Research Scientist, Sidney Laboratory - Plant Health, CFIA, Saanich | (250) 363-6650 (263) | [email protected]
Other Partner Representatives | N | Benjamin Mimee, Research Scientist (nematology), AAFC Saint Jean sur Richelieu | (450) 515-2136 | [email protected]


Annex B PROJECT PERFORMANCE SUMMARY

This was an R&D project and all objectives reached at least TRL 4, i.e. “concept, process, component validation of concept, component, or subsystem validation in a laboratory environment” according to the CSSP guidelines. Major tasks and associated target species from the charter are outlined in Figure 1 and Table 1. A summary of all objectives related to genomics, detection assays and genotyping (i.e. excluding metagenomics) is presented in Table 10. These objectives were: A) Genome and transcriptome sequencing by next generation technology (see Table 3 for more details), B) Quantitative PCR assays for species-specific detection, C) Genotyping of isolates by microsatellite analysis or Sanger sequencing, and D) Direct detection of genotypes by ASO-PCR.

Two species were removed from the project, either because strains could not be obtained (Urocystis agropyri) or because the genome we were expecting from another laboratory was not available on time (Globodera pallida).

Table 7: Summary deliverables per species for objectives A-D

Host | Species | Objective A: Genomes (number of strains)* | Objective A: Genome size (Mbp) | Objective A: Transcriptome (number of strains) | Objective B: Species detection assays | Objective C: Genotyping | Objective D: Genotype detection
Cereal grains | Pantoea (Erwinia) stewartii | 6 | 4.5-5.4 | 6 | completed | n/a | n/a
Cereal grains | Penicillium verrucosum | 5 | 32 | 1 | ongoing¥ | completedǂ | n/a
Cereal grains | Tilletia indica | 1 | 30 | 1 | completed | n/a | n/a
Cereal grains | Tilletia controversa | 3 | 31-33 | 3 | completed | n/a | n/a
Cereal grains | Tilletia caries + | 1 | 35 | 1 | completedǂ | n/a | n/a
Cereal grains | Tilletia walkeri + | 1 | 25 | 1 | completedǂ | n/a | n/a
Cereal grains | Puccinia striiformis f. sp. tritici | n/a | n/a | n/a | n/a | completed | completed
Cereal grains | Puccinia triticina | 2 | 102-104 | n/a | n/a | completed | completed
Cereal grains | Urocystis agropyri - | - | - | - | - | n/a | n/a
Potato | Ralstonia solanacearum race 3 biovar 2 | 5 | 5.0-5.2 | 2 | n/a | completed | completed
Potato | Pectobacterium wasabiae | 1 | 5.0 | 1 | n/a | completed | completedǂ
Potato | P. carotovorum subsp. brasiliense + | 3 | 4.7-4.8 | 3 | completedǂ | n/a | n/a
Potato | Streptomyces scabies | 1 | 10.8 | 1 | completed | n/a | n/a
Potato | Phytophthora infestans | n/a | n/a | 3 | n/a | completed | completed
Potato | Synchytrium endobioticum | 4 | 21 | 1 | completed | completed | completed
Potato | Globodera pallida | n/a | n/a | n/a | n/a | - | n/a
Potato | Globodera rostochiensis | 6 | 96 | n/a | n/a | completed | ongoing¥
Potato | Ditylenchus destructor | 1 | >100 | 1 | completed | n/a | n/a
Potato | Ditylenchus dipsaci + | 1 | >100 | 1 | completedǂ | n/a | n/a
 | Arcobacter lanthierii + | 3 | 2.2-2.3 | 3 | completedǂ | completedǂ | n/a
 | Arcobacter septicus + | 2 | 2.4-2.5 | 2 | completedǂ | completedǂ | n/a

+ Species not in original plan
- removed
* Original plan was 1
ǂ task not on original plan
¥ new funding in place to complete work
n/a not applicable as it was not planned or relevant (see Table 1)
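As a rough cross-check of the genome counts in Table 7, the short sketch below tallies the "Genomes (number of strains)" column. It is only an illustration: the list is transcribed by hand from the table, the `added` flag marks species carrying the "+" footnote, and the assumption that the original plan was one strain for each unflagged species with a numeric count is ours, based on the "* Original plan was 1" footnote.

```python
# Genome counts (number of strains) transcribed from Table 7.
# Third field: True if the species carries the "+" footnote
# (species not in the original plan).
genomes = [
    ("Pantoea stewartii", 6, False),
    ("Penicillium verrucosum", 5, False),
    ("Tilletia indica", 1, False),
    ("Tilletia controversa", 3, False),
    ("Tilletia caries", 1, True),
    ("Tilletia walkeri", 1, True),
    ("Puccinia triticina", 2, False),
    ("Ralstonia solanacearum race 3 biovar 2", 5, False),
    ("Pectobacterium wasabiae", 1, False),
    ("P. carotovorum subsp. brasiliense", 3, True),
    ("Streptomyces scabies", 1, False),
    ("Synchytrium endobioticum", 4, False),
    ("Globodera rostochiensis", 6, False),
    ("Ditylenchus destructor", 1, False),
    ("Ditylenchus dipsaci", 1, True),
    ("Arcobacter lanthierii", 3, True),
    ("Arcobacter septicus", 2, True),
]

total_strains = sum(n for _, n, _ in genomes)
strains_from_added_species = sum(n for _, n, added in genomes if added)
# Assumption: one planned strain per species that was in the original plan.
planned_strains = sum(1 for _, _, added in genomes if not added)

print(f"Strains sequenced: {total_strains}")
print(f"Of which from species added to the plan: {strains_from_added_species}")
print(f"Planned strains (assumed one per original species): {planned_strains}")
print(f"Ratio sequenced/planned: {total_strains / planned_strains:.1f}")
```

Under these assumptions the tally is in line with the roughly fourfold increase in sequenced strains reported in the Conclusion; species with n/a or removed entries are left out of the count.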


Objectives E-G: The 454 pyrosequencing of universal ‘barcode’ PCR products from bacterial 16S and nematode/fungal ITS was performed as planned on DNA template extracted from soil samples collected from potato fields with a range of outbreak/quarantine histories (objective E). A similar approach was followed as planned for fungi and bacteria to compare microbial composition between high-quality grain and visibly diseased samples collected from various cereal growing areas, with cpn60 added as a barcode marker (objective F). Finally, bacterial 16S was sequenced by 454 from fecal and water samples in agricultural watersheds with heavy animal husbandry practices, and from agricultural soils following manure application, as planned (objective G). The total number of samples processed for these three objectives was twice the original number of 250. Moreover, several data analysis tools were developed as we uncovered some issues with the tools we were originally planning to use.

Assays have been validated in the laboratory (TRL 4) and several have already been used operationally (see success stories). Several projects received new funding from AAFC and CFIA to bring the TRL up to TRL 7 or more.

Schedule Performance Summary

Almost all the original deliverables were achieved. The only exceptions are two cases that were dropped, one because samples could not be accessed (Urocystis agropyri) and one because a genome we were expecting from a group in Europe was not released on time (Globodera pallida). Two deliverables are still being developed with additional funding from AAFC.

Cost Performance Summary

The total Project funds to be expended (CSS funds - CRTI and in-kind contributions) during this Project were estimated at $3,734,000. Total CSS funds (CRTI) expended during this Project were $1,999,000 as planned, following the fiscal year spending. However, there was a very significant increase in the in-kind contribution, which was $3.72M instead of $1.74M. We more than doubled the originally planned in-kind contribution and invested $1.86 of in-kind for every CSSP dollar.
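The figures in this paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are ours, the planned in-kind amount is the unrounded $1,735,000 from the charter spending schedule, and the actual in-kind amount is the rounded $3.72M quoted above.

```python
# Amounts in Canadian dollars, as quoted in the Cost Performance Summary.
css_funds = 1_999_000        # CSS (CRTI) funds, expended as planned
in_kind_planned = 1_735_000  # planned in-kind contribution (quoted as $1.74M)
in_kind_actual = 3_720_000   # actual in-kind contribution (rounded, $3.72M)

planned_total = css_funds + in_kind_planned
in_kind_per_css_dollar = in_kind_actual / css_funds
in_kind_growth = in_kind_actual / in_kind_planned

print(f"Planned project total: ${planned_total:,}")
print(f"In-kind per CSS dollar: ${in_kind_per_css_dollar:.2f}")
print(f"Actual vs planned in-kind: {in_kind_growth:.2f}x")
```

The planned total matches the $3,734,000 estimate, and the in-kind ratio rounds to the $1.86 per CSSP dollar stated in the text.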


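Spending schedule as per charter. In the per-participant blocks, the TOTAL rows are cumulative: each one adds the participant's four-year sum to the totals of the participants listed before it.

Participant | Fiscal Year | CSS Funds (PSTP/CRTI) - $ | In-kind Effort - $
Lead Fed | Definition Funds (if applicable) | |
Lead Fed | 2011/2012 | 147,000 | 199,000
Lead Fed | 2012/2013 | 486,000 | 419,000
Lead Fed | 2013/2014 | 431,000 | 400,000
Lead Fed | 2014/2015 | 186,000 | 190,000
Lead Fed | TOTAL (cumulative) | 1,250,000 | 1,208,000
CFIA Charlottetown | 2011/2012 | 40,000 | 38,000
CFIA Charlottetown | 2012/2013 | 129,000 | 83,000
CFIA Charlottetown | 2013/2014 | 131,000 | 67,000
CFIA Charlottetown | 2014/2015 | 49,000 | 37,000
CFIA Charlottetown | TOTAL (cumulative) | 1,599,000 | 1,433,000
CFIA Ottawa | 2011/2012 | 5,000 | 15,000
CFIA Ottawa | 2012/2013 | 56,000 | 50,000
CFIA Ottawa | 2013/2014 | 57,000 | 44,000
CFIA Ottawa | 2014/2015 | 30,000 | 27,000
CFIA Ottawa | TOTAL (cumulative) | 1,747,000 | 1,569,000
Canadian Grain Commission | 2011/2012 | 40,000 | 36,000
Canadian Grain Commission | 2012/2013 | 81,000 | 53,000
Canadian Grain Commission | 2013/2014 | 77,000 | 47,000
Canadian Grain Commission | 2014/2015 | 54,000 | 30,000
Canadian Grain Commission | TOTAL (cumulative) | 1,999,000 | 1,735,000

All participants | Definition Funds, fiscal year ____ (if applicable) | 0 | 0
All participants | TOTAL 2011/2012 | 232,000 | 288,000
All participants | TOTAL 2012/2013 | 752,000 | 605,000
All participants | TOTAL 2013/2014 | 696,000 | 558,000
All participants | TOTAL 2014/2015 | 319,000 | 284,000
All participants | TOTAL | 1,999,000 | 1,735,000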
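The spending schedule above can be checked for internal consistency. The sketch below is a minimal illustration, with amounts transcribed by hand in thousands of dollars; reading the per-participant TOTAL rows as running totals is our interpretation of the figures, and the dictionary layout is an arbitrary choice.

```python
# Planned spending per participant, in thousands of dollars.
# Each tuple is (CSS funds, in-kind effort) for 2011/12, 2012/13, 2013/14, 2014/15.
schedule = {
    "Lead Fed": [(147, 199), (486, 419), (431, 400), (186, 190)],
    "CFIA Charlottetown": [(40, 38), (129, 83), (131, 67), (49, 37)],
    "CFIA Ottawa": [(5, 15), (56, 50), (57, 44), (30, 27)],
    "Canadian Grain Commission": [(40, 36), (81, 53), (77, 47), (54, 30)],
}

# Running totals after each participant (the values printed in the TOTAL rows).
running_totals = {}
css_so_far = in_kind_so_far = 0
for participant, years in schedule.items():
    css_so_far += sum(css for css, _ in years)
    in_kind_so_far += sum(in_kind for _, in_kind in years)
    running_totals[participant] = (css_so_far, in_kind_so_far)

# Totals per fiscal year across all participants.
yearly_totals = [
    (
        sum(years[i][0] for years in schedule.values()),
        sum(years[i][1] for years in schedule.values()),
    )
    for i in range(4)
]

for participant, totals in running_totals.items():
    print(participant, totals)
print(yearly_totals)
```

With these inputs the running totals end at 1,999 and 1,735 (thousands of dollars), and the per-year sums agree with the fiscal-year TOTAL rows of the schedule.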


Annex C Publications, Presentations, Patents

6.1 Publications

6.1.1 Software release

• Chen KS, Qi CY, Hardy E and Chen W (2014) The Automated Oligonucleotides Design Pipeline (AODP) version 1.0.0 to 1.3.1 https://bitbucket.org/aafc-mbb/aodp_release and https://github.com/AAFC-MBB/AODP_releases
• Zahariev M, Chen KS, Qi CY, Hardy E, Lévesque CA and Chen W (2015) The Automated Oligonucleotides Design Pipeline (AODP) version 2 https://bitbucket.org/wenchen_aafc/aodp_v2.0_release/overview (complete new code for oligonucleotide design)

6.1.2 Journal articles

• Adam Z, Chen Q, Xu R, Diange AE, Bromfield ESP and Tambong JT (2015) Draft Genome Sequence of Pseudomonas simiae Strain 2-36, an In Vitro Antagonist of Rhizoctonia solani and Gaeumannomyces graminis. Genome announcements 3:e01534-14.
• Adam Z, Tambong JT, Chen Q, Lewis CT, Lévesque CA and Xu R (2014) Draft genome sequence of Pseudomonas sp. strain 2-92, a biological control strain isolated from a field plot under long-term mineral fertilization. Genome announcements 2:e01121-13.
• Adam Z, Tambong JT, Lewis CT, Lévesque CA, Chen W, Bromfield ESP, Khan IUH and Xu R (2014) Draft Genome Sequence of Pantoea ananatis Strain LMG 2665T, a Bacterial Pathogen of Pineapple Fruitlets. Genome announcements 2.
• Adam Z, Whiteduck-Léveillée K, Cloutier M, Chen W, Lewis CT, Lévesque CA, Topp E, Lapen DR, Tambong JT and Talbot G (2014) Draft genome sequences of two Arcobacter strains isolated from human feces. Genome announcements 2:e00113-14.
• Adam Z, Whiteduck-Leveillee K, Cloutier M, Chen W, Lewis CT, Lévesque CA, Topp E, Lapen DR, Tambong JT, Talbot G and Khan IUH (2014) Draft Genome Sequence of Arcobacter cibarius Strain LMG21996T, Isolated from Broiler Carcasses. Genome announcements 2:e00034-14. doi: 10.1128/genomeA.00034-14
• Adam Z, Whiteduck-Leveillee K, Cloutier M, Tambong JT, Chen W, Lewis CT, Lévesque CA, Topp E, Lapen DR and Talbot G (2014) Draft genome sequences of three Arcobacter strains of pig and dairy cattle manure origin. Genome announcements 2:e00377-14.
• Li XS, Yuan KX, Cullis J, Lévesque CA, Chen W, Lewis CT and De Boer SH (2015) Draft genome sequences for Canadian isolates of Pectobacterium carotovorum subsp. brasiliense with weak virulence on potato. Genome announcements (Submitted for review).
• Liu M, McCabe E, Chapados JT, Carey J, Wilson SK, Tropiano R, Redhead SA, Lévesque CA and Hambleton S (2015) Detection and identification of selected cereal rust pathogens by TaqMan® real-time PCR. Canadian Journal of Plant Pathology 37:92-105.
• Rioux S, Mimee B, Gagnon A-È and Hambleton S (2015) First report of stripe rust (Puccinia striiformis f. sp. tritici) on wheat in Quebec, Canada. Phytoprotection (In press, accepted 2014-10-09).
• Smith DS, Rocheleau H, Chapados JT, Abbott C, Ribero S, Redhead SA, Lévesque CA and De Boer SH (2014) Phylogeny of the genus Synchytrium and the development of TaqMan PCR assay for sensitive detection of Synchytrium endobioticum in soil. Phytopathology 104:422-432.
• Whiteduck-Leveillee K (2014) Characterization of two novel Arcobacter spp. isolated from fecal matter. M.Sc. thesis.
• Yuan KX, Adam Z, Tambong J, Lévesque CA, Chen W, Lewis CT, De Boer SH and Li XS (2014) Draft Genome sequence of Pectobacterium wasabiae strain CFIA1002. Genome announcements 2:e00214-14.
• Yuan KX, Chen W, Lewis CT, Tambong JT, Adam Z, De Boer SH and Li X (2014) Comparative genomics of pectobacteria and development of species-specific assays against Pectobacterium wasabiae. Canadian Journal of Plant Pathology (in press, abstract).
• Yuan KX, Li XS, Nie J and De Boer SH (2015) Genomic and virulence analysis of cool temperature-adapted strains of the Ralstonia solanacearum species complex. Phytopathology (Submitted for review).

6.2 Presentations

• Bilodeau GJ (2013) Development of molecular detection-identification tools and genomics for protection against forest and agriculture plant pathogens (Invited Oral Presentation). Plant Research Seminar series, The plant research and Strategies unit-CFIA, June 26 2013, Ottawa, ON, Canada
• Bilodeau GJ (2013) PIRL: Research in genomic and molecular detection-identification tools for protection against forest and agriculture plant pathogens (Oral Presentation). Ottawa Laboratory Fallowfield Seminar, November 21, 2013, Ottawa, ON, Canada
• Bilodeau GJ (2014) Synchytrium endobioticum genotyping. USDA-ARS AAFC-STB Meeting. Applying Next Generation Sequencing (NGS) towards monitoring and genotyping of high risk plant pathogens, June 2nd and 3rd. Beltsville, MD
• Bilodeau GJ, Gagnon M-C, Lévesque CA, Kawchuk L, Wijekoon CP, Feau N, Bergeron M-J, Grünwald NJ, Brasier CM, Webber JF and Hamelin RC (2014) Tools for rapid characterization tools of Phytophthora infestans and Phytophthora ramorum using real-time PCR and microsatellites from genomic resources. 7th meeting of the IUFRO Working Party 7-02-09 "Phytophthora in Forests and Natural Ecosystems", November 7th to November 17th 2014, Esquel (Patagonia, Argentina).
• Brière SC and Bilodeau GJ Phytophthora collaboration updates and others. Genomics-Based Forest Health Diagnostics and Monitoring, TAIGA Project, Genome Canada-BC (web teleconference), April 3rd, 2012, web teleconference
• Brière SC, Lévesque CA and Bilodeau GJ (2012) Oomycetes of concern in Canada and DNA barcoding. 1st International Web Symposium of “Oomycetes of concern in International Trade”, May 23rd 2012, (Univ. of Maryland), Webinar
• Chen W (2014) Air, rain and commodity metagenomics. USDA-ARS AAFC-STB Meeting. Applying Next Generation Sequencing (NGS) towards monitoring and genotyping of high risk plant pathogens, June, 2014, Beltsville, MD, USA
• Chen W, Lewis CT, Levesque CA, Zhang N, Turkington K, Bamforth J, Gaba D, Tittlemier SA, MacLeod A and Gräfenhan T (2014) Pyrosequencing revealed geographical distribution and ecological diversification of fungal communities on barley and malt from western Canada (Oral presentation). The International Union of Microbiological Societies (IUMS 2014) - XIVth International Congress of Bacteriology and Applied Microbiology, XIVth International Congress of Mycology and Eukaryotic Microbiology and XVIth International Congress of Virology, 27 Aug. – 01 Sep 2014. Montreal, QC, Canada.
• Gagnon M-C (2014) Genotyping tools for plant pathogens detection and characterization: From the genome to the field (Oral Presentation). Plant Research Seminar series, The plant research and Strategies unit-CFIA, August 27th 2014, Ottawa, ON, Canada
• Gagnon M-C and Bilodeau GJ (2014) PIRL: Update on the Potato Wart genotyping project, CSSP. CRTI team leader meeting, March 18, 2014, AAFC Carling, Ottawa


• Gagnon M-C, Kawchuk L, Li X, van der Lee TAJ, Bonants PJM, Lévesque CA and Bilodeau GJ (2013) Identification and molecular characterisation of Canadian strains of potato late blight and potato wart. Canadian Phytopathological Society Annual Meeting, June 16-19, Alberta, Canada
• Gagnon M-C, Kawchuk L, Li X, van der Lee TAJ, Bonants PJM, Lévesque CA and Bilodeau GJ (2014) Identification and molecular characterisation of Canadian strains of potato late blight and potato wart. 2013 Annual meeting of Canadian Journal of Plant Pathology-Revue Canadienne de Phytopathologie Vol. 36, No. 2, 262.
• Gagnon M-C, Lévesque CA, Kawchuk L, Wijekoon CP, Feau N, Bergeron M-J, Grünwald NJ, Hamelin RC and Bilodeau GJ (2014) Development of rapid characterization tools for Phytophthora infestans and Phytophthora ramorum using real-time PCR and microsatellites from a genomic resource (Poster). The Oomycete Molecular Genetics Network, July 2nd-4th, 2014, Norwich, UK
• Gräfenhan T, Chen W, Lewis CT, Lévesque CA, Zhang N, Bamforth J, Gaba D, Tittlemier SA, MacLeod A and Turkington TK (2014) Composition of fungal communities on barley and malt from western Canada revealed by next generation sequencing (Oral presentation). 10th International Mycologists Conference 3-8 August, 2014, Bangkok Thailand
• Hambleton S, Chen W, Chapados JT, Lewis CT and Lévesque CA (2014) Application of NGS technology in fungal pathogen detection and challenges in identification at species level (Invited oral presentation). 10th International Mycological Conference (IMC10), August 3-8, 2014, Bangkok, Thailand
• Kesanakurti P, Richards L, Gerdis S and Hambleton S (2014) Developing a DNA-based toolkit for detection and identification of wheat rust pathogens (Poster). APS-CPS Joint Meeting, August 9-13, 2014, Hilton Minneapolis, Minneapolis, MN, USA
• Kesanakurti P, Setia R, Temple K, Hambleton S, Levesque A and Lewis CT (2013) Genome data mining and diagnostic marker development of Tilletia indica for agri-food system detection screening. AMER PHYTOPATHOLOGICAL SOC 3340 PILOT KNOB ROAD, ST PAUL, MN 55121 USA, pp. 73-73.
• Li X, Nie J, Arsenault H and De Boer SH (2013) On-The-Spot Diagnosis and Detection of Plant Pathogens (Oral presentation, abstract published in ACTA Phytopathologica Sinica). 10th International Congress of Plant Pathology, August, 2013, Beijing, China
• Li X, Yuan KX, Nie J, Adam Z, Tambong JT, Chen W, Lewis CT, De Boer SH and Lévesque CA (2014) Genomics-Based Approaches for Detection and Identification of Ralstonia solanacearum race 3 bv 2. 2014 Northeast Potato Technology Forum, March 2014, Charlottetown, PEI, Canada
• Li XS, Ward L, Adam Z, Tambong J, Lewis CT, Lévesque CA and De Boer SH (2013) Pectobacterium wasabiae and blackleg-like disease in potato. Northeast Potato Technology Forum, March 5-6, 2013, Charlottetown, PEI, Canada
• Li XS, Yuan KX, Nie J, Adam Z, Tambong J, Chen W, Lewis CT, De Boer SH and Lévesque CA (2014) Comparative genomics approaches for detection and identification of Ralstonia solanacearum race 3 biovar 2 (Invited Oral Presentation). 13th International Conference of Plant Pathogenic Bacteria, June 8-13, 2014, Shanghai, China pp. 92
• Lowe C, Chen KS and Chen W (2014) A Bioinformatics Pipeline for Designing Virtual DNA Arrays for Improved Species-Level Interpretation of Amplicon-Based Metagenomic Data (Poster). 22nd Annual International Conference on Intelligent systems for Molecular Biology, July 11 - July 15, 2014, Boston, MA, USA
• Nguyen HDT, Li M, Ponomareva E, Setia R, Cullis J, Kandalaft I, McMullin M, Miller JD and Seifert KA (2014) Ochratoxin A biosynthetic pathway elucidation in the fungus Penicillium verrucosum by de novo genome assembly, genome annotation, and transcriptome analysis (Poster presentation). US Dept. of Energy Joint Genome Institute, Annual User's Meeting, March 2014, Walnut Creek, California


• Nie J, Yuan KX, De Boer SH and Li X (2013) Environmental fitness and biodiversity of Ralstonia solanacearum species complex. 2013 Northeast Potato Technology Forum, March, 2013, Charlottetown, PEI, Canada
• Seifert KA (2014) Ochratoxin A and its implications for international trade: Genomics of Penicillium verrucosum. USDA-ARS AAFC-STM Meeting: Applying Next Generation Sequencing (NGS) towards monitoring and genotyping of high risk plant pathogens, Web conference. June 2014.
• Seifert KA, Frisvad JC, Keller N, Li X and Prusky D (2014) Mycotoxins and Antibiotics and in Penicillium genomes (Oral Presentation). US Dept. of Energy Joint Genome Institute, Penicillium genome annotation meeting, March 2014, Walnut Creek, California
• Smith D and Singh A (2014) De novo assembly and analysis of the mRNA transcriptome in resting spores of the potato wart pathogen Synchytrium endobioticum (Oral presentation by Kat Yuan; abstract in Canadian Journal of Plant Pathology: In press). 2013 CPS Maritime Regional Meeting, Nov., 2013, Charlottetown, PEI, Canada
• Tambong JT, Adam Z, Lewis CT, Lévesque CA, Chen W, Khan I, Bromfield ESP and Xu R (2014) Comparative genomics and development of a TaqMan Real-Time PCR assay for specific detection of P. stewartii subsp. stewartii. 13th International Conference of Plant Pathogenic Bacteria, June 8-13, 2014, Shanghai, China. pp. 34
• Tambong JT, Loaiza N, Xu R, Sanchez E, Gomez LA, Kaneza C-A and Beaupré L (2012) Screening house-keeping genes for phylogenetic differentiation of subspecies of Pantoea stewartii. The 2012 Annual meeting of the Canadian Phytopathological Society, Canadian Journal of Plant Pathology (2013), Niagara Falls, ON, Canada pp. 126
• Whiteduck-Leveillee J, Kaneza CA, Léveillée K, Talbot G, Topp E, Lapen DR, Tambong JT, Lévesque CA and Khan IUH (2013) Optimization of a genus-specific PCR assay and restriction fragment length polymorphism (RFLP) analysis for rapid detection of Arcobacter spp. 63rd Annual Conference of the Canadian Society of Microbiologists (CSM), June 17-20, 2013, Ottawa, ON, Canada
• Whiteduck-Leveillee J, Whiteduck-Leveillee K, Kaneza CA, Cloutier M, Laprade N, Topp E, Guylaine T, Lapen DR, Tambong JT, Lévesque CA and Khan IUH (2013) Multiplex PCR assay for the detection and identification of virulent genes in Arcobacter species. Canadian Society of Microbiologists (CSM) 63rd Annual Conference, June 17-20, 2013, Ottawa, ON, Canada
• Yuan KX, Adam Z, Tambong JT, Lévesque CA, Chen W, Lewis CT, De Boer SH and Li X (2014) Draft genome sequence of one Canadian Isolate of Pectobacterium wasabiae strain CFIA1002 (Oral Presentation, abstract in conference proceedings). 2014 Northeast Potato Technology Forum, March 2014, Charlottetown, PEI, Canada
• Yuan KX, Chen W, Lewis CT, Tambong JT, Adam Z, De Boer SH and Li X (2013) Comparative genomics of pectobacteria and development of species-specific assays against Pectobacterium wasabiae (Oral presentation). 2013 CPS Maritime Regional Meeting, November 2013, Charlottetown, PEI, Canada
• Yuan KX, Li X, Adam Z, Tambong JT, Lévesque CA, Chen W, Lewis CT and De Boer SH (2014) Draft genome sequence of Pectobacterium wasabiae strain CFIA1002 (Oral Presentation). 3rd International Workshop on Erwinia, June, 2014, Shanghai, China.

6.3 Presentations to senior management

The CSSP project CRTI 09-462RD was featured in several presentations to upper management. The most important are the following:

1. Lévesque, C.A. 2013. Next Generation Sequencing - A Paradigm Shift in Monitoring and Genotyping of Quarantine and Invasive Alien Species. Invited oral presentation at the AAFC-USDA Science and Technology Collaboration Dialogue between ADM Mithani and USDA Undersecretary Woteki, Central Experimental Farm, Ottawa, 6 May, 2013.
2. Lévesque, C.A. 2013. Next Generation Sequencing (NGS) – A Paradigm Shift in Monitoring of Quarantine Species. Invited oral presentation at the S&T Branch Policy Committee of Agriculture and Agri-Food Canada, Ottawa, STB Headquarters, 27 June, 2013. (DGs of S&T Branch)
3. Lévesque, C.A. 2013. Next Generation Sequencing (NGS) – A Paradigm Shift in Monitoring of Quarantine Species. Invited oral presentation at the S&T Branch Executive Committee of Agriculture and Agri-Food Canada, Ottawa, STB Headquarters, 4 July, 2013. (ADM of S&T Branch)
4. Lévesque, C.A. 2013. Next Generation Sequencing (NGS) of DNA – Challenges and Opportunities for Trade. Invited oral presentation at the Director Generals Policy and Programs Management Committee (DGPPMC) of Agriculture and Agri-Food Canada, Ottawa, DM conference room, 25 September, 2013. (DGs of Department)
5. Lévesque, C.A. 2013. Next Generation Sequencing (NGS) of DNA – Challenges and Opportunities for Trade. Invited oral presentation at the Policy and Programs Management Committee (PPMC) of Agriculture and Agri-Food Canada, Ottawa, DM conference room, 8 October, 2013. (chaired by DM)
6. Lévesque, C.A. 2013. DNA barcoding and new sequencing technologies – CSI for Ag. Invited oral presentation at the “So You Think You Know Ag” series, Communications and Consultations Branch of Agriculture and Agri-Food Canada, Ottawa, C&C Headquarters, 28 November, 2013.
7. Lévesque, C.A. 2014. DNA barcoding and new sequencing technologies – challenges and opportunities for trade. Invited oral presentation at the Visit of Taki Sarantakis, Assistant Secretary for Economic sector at Treasury Board, Central Experimental Farm, Ottawa, 8 January, 2014.
8. Lévesque, C.A. 2014. DNA Sequencing - Challenges and Opportunities for Trade. Invited directly by the DM's office to give an oral presentation at the AAFC All-Staff Meeting, National Headquarters in Ottawa, broadcast by video to all AAFC offices and centres in Canada, 8 December, 2014.


List of symbols/abbreviations/acronyms/initialisms

AAFC                Agriculture & Agri-Food Canada, Government of Canada
ABI                 Applied Biosystems
ADM                 Assistant Deputy Minister, Agriculture and Agri-Food Canada
AFLP                Amplified fragment length polymorphism
ANI                 Average Nucleotide Identity
AODP                Automated Oligonucleotides Design Pipeline
AODP-OFP            Automated Oligonucleotides Design Pipeline - Oligo Fishing Pipeline
APHIS               Animal and Plant Health Inspection Service, United States Department of Agriculture
APS-MSA             American Phytopathological Society and Mycological Society of America
ARS                 Agricultural Research Service, Beltsville, United States Department of Agriculture
ASO-PCR             allele-specific oligonucleotide Polymerase Chain Reaction assays
BC                  British Columbia, Canada
BLAST               Basic Local Alignment Search Tool
bp                  base pair of deoxyribonucleic or ribonucleic acid
CCFC                Canadian Collection of Fungal Cultures
CFIA                Canadian Food Inspection Agency
CFIA PEI            Canadian Food Inspection Agency, Prince Edward Island
CGC                 Canadian Grain Commission
cpn60               chaperonin-60
CRAN                The Comprehensive R Archive Network
CRTI                CBRNE (chemical, biological, radiological, nuclear and explosives) Research and Technology Initiative, now under CSSP
CSS                 Centre for Security Science, Defence Research and Development Canada, Government of Canada
CSSP                Canadian Safety and Security Program
DAOM                National Mycological Herbarium Ottawa
DNA                 deoxyribonucleic acid
EFSA                European Food Safety Authority
EmTox               Emerging Mycotoxins
EPPO                European and Mediterranean Plant Protection Organization
EST                 expressed sequence tag
EU                  European Union
FDK                 Fusarium Damaged Kernel
FY                  fiscal year
GB                  gigabyte
GBS                 Genotyping by Sequencing
GoC                 Government of Canada
GRDI                Genomics Research & Development Initiative
GUI                 graphical user interface
IARC                International Agency for Research on Cancer
IMC10               Tenth International Mycological Congress


ISB                 Information Systems Branch
ITS1                internal transcribed spacer region 1
ITS2                internal transcribed spacer region 2
JGI                 Joint Genome Institute
LCA                 lowest common ancestor pipeline
LIMS                Laboratory Information Management System
LTB                 Low Temperature Basidiomycetes
MCGA                Manitoba Corn Growers Association
MID                 multiplex identifier
MIDs                multiplex identifiers
MIT                 Massachusetts Institute of Technology
MLSA                Multilocus Sequence Analysis
MLST                Multilocus Sequence Typing
MTA                 Material Transfer Agreement
NAPPO               North American Plant Protection Organization
NGS                 Next Generation Sequencing
OFP                 Oligo-Fishing Pipeline
OPI                 Office of Primary Interest
OTA                 ochratoxin A
OTU                 Operational Taxonomic Units
PCR                 Polymerase Chain Reaction
PCR-AFLP            Polymerase Chain Reaction of Amplified fragment length polymorphism
PEI                 Prince Edward Island, Canada
PHAC                Public Health Agency of Canada
PPC                 Plant Pest Containment, regulations of CFIA
PRC Chair           Project Review Committee
PRI                 Plant Research International, Wageningen, Netherlands
PWGSC               Public Works and Government Services Canada, Government of Canada
qPCR                Quantitative real time PCR
R&D                 Research & Development
RFLP                Restriction fragment length polymorphism
RMOU                Research Memorandum of Understanding
RNA                 ribonucleic acid
RNASeq              ribonucleic acid sequencing
RPS Project         Research Project Submission, Canadian Food Inspection Agency, CHA-P-1412 Government of Canada
RT-PCR              Quantitative real time PCR
S&T                 Science and Technology
SAGES               Sustainable Agriculture Environmental Systems Initiative
SNP                 single-nucleotide polymorphism
SO                  signature oligonucleotides
SSC                 Shared Services Canada, Government of Canada
SSR                 simple sequence repeats
STB                 Science and Technology Branch, Agriculture and Agri-Food Canada
T3SS                type III secretion system


TB                  terabyte
TRL                 Technology Readiness Level of Deliverable
trpB gene           tryptophan biosynthesis gene
US                  United States
US Dept of Energy   United States Department of Energy
USA                 United States of America
USDA                U.S. Department of Agriculture


List of target organisms

Organism initial  Pathogen name                                   Group        Disease name
Ps                Pantoea (Erwinia) stewartii                     Bacterium    Corn Stewart wilt
Pv                Penicillium verrucosum                          Fungus       Ochratoxin
Ti                Tilletia indica                                 Fungus       Karnal bunt of wheat
Tc                Tilletia controversa                            Fungus       Dwarf bunt of wheat
Tca               Tilletia caries                                 Fungus       Common bunt of wheat
Tw                Tilletia walkeri                                Fungus       Ryegrass bunt
Pst               Puccinia striiformis f. sp. tritici             Fungus       wheat stripe rust
Pt                Puccinia triticina                              Fungus       wheat leaf rust
Pgt               Puccinia graminis f. sp. tritici                Fungus       wheat stem rust
Rs                Ralstonia solanacearum race 3 biovar 2          Bacterium    Brown rot
Pw                Pectobacterium wasabiae                         Bacterium    blackleg-like
Pcb               Pectobacterium carotovorum subsp. brasiliense   Bacterium    blackleg-like
Ua                Urocystis agropyri                              Fungus       Flag smut of wheat
Ss                Streptomyces scabies                            Bacterium    Potato Scab
Pi                Phytophthora infestans                          fungus-like  Late Blight
Se                Synchytrium endobioticum                        Fungus       Potato Wart
Gp                Globodera pallida                               Nematode     Pale Potato Cyst Nematode
Gr                Globodera rostochiensis                         Nematode     Golden (potato cyst) Nematode
Dd                Ditylenchus destructor                          Nematode     Potato rot nematode
Ddi               Ditylenchus dipsaci                             Nematode     Stem nematode
Al                Arcobacter lanthierii                           Bacterium    Diarrhea
As                Arcobacter septicus                             Bacterium    Diarrhea
Ac                Arcobacter cibarius                             Bacterium    Diarrhea

Partner/Scientist initial  Name                          Affiliation
AL                         André Lévesque                AAFC Ottawa
DL                         Tim Dumonceaux/Matthew Links  AAFC Saskatoon
IK                         Izhar Khan                    AAFC Ottawa
GB                         Guillaume Bilodeau            CFIA Ottawa
JT                         James Tambong                 AAFC Ottawa
KS                         Keith Seifert                 AAFC Ottawa
LK                         Larry Kawchuk                 AAFC Lethbridge
QY                         Qing Yu                       AAFC Ottawa
SH                         Sarah Hambleton               AAFC Ottawa
SL                         Sean Li                       CFIA Charlottetown
TG                         Tom Gräfenhan                 CGC Winnipeg
MR                         Mike Rott                     CFIA Saanich
BM                         Benjamin Mimee                AAFC St Jean sur Richelieu
CL                         Christopher Lewis             AAFC Ottawa
WC                         Wen Chen                      AAFC Ottawa


Software Names and Sources

AntiSMASH: Antibiotics & Secondary Metabolite Analysis Shell. Source: http://antismash.secondarymetabolites.org
AODP: Automated Oligonucleotides Design Pipeline. Source: https://bitbucket.org/aafc-mbb/aodp_release
ArrayDesigner: Oligo & cDNA Microarray Design Software; Array Designer designs thousands of primers and probes for oligo and cDNA microarrays in seconds. Source: http://premierbiosoft.com/dnamicroarray/index.html
BLAST: Basic Local Alignment Search Tool. Source: http://blast.ncbi.nlm.nih.gov/Blast.cgi?PAGE_TYPE=BlastDocs&DOC_TYPE=Download
BLAT: Bioinformatics tool that performs rapid mRNA/DNA and cross-species protein alignments. Source: http://genome.cshlp.org/content/12/4/656
C++: General purpose programming language.
Celera: Celera Assembler is a de novo whole-genome shotgun (WGS) DNA sequence assembler. Source: http://wgs-assembler.sourceforge.net/wiki/index.php?title=Main_Page
CRAN: The Comprehensive R Archive Network. Source: http://cran.r-project.org/
FastQC: FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. Source: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/
fastx_trimmer: The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing. Source: http://hannonlab.cshl.edu/fastx_toolkit/
Galaxy: Galaxy is a web-based GUI for launching cluster jobs and implements metadata tracking, reproducibility, tool interchange, and versioning. Source: http://galaxyproject.org/
InterProScan: InterProScan is a bioinformatics tool that provides a one-stop-shop for automated sequence analysis of both protein and nucleic acid; it identifies both structural and functional regions of interest. Source: https://code.google.com/p/interproscan/
LCA: lowest common ancestor pipeline.
Maker: MAKER is an easy-to-use genome annotation pipeline. Source: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_GMOD_Online_Training_2014
MsatCommander: msatcommander is a python program written to locate microsatellite (SSR, VNTR, &c) repeats within fasta-formatted sequence or consensus files. Source: https://code.google.com/p/msatcommander/


OFP: Oligo-Fishing pipeline. Source: to be released to the public in April 2015
Perl: Programming language. Source: https://www.perl.org/
QIIME: Quantitative Insights Into Microbial Ecology. Source: http://qiime.org/
RAM: R package for Amplicon-based Metagenomics. Source: http://cran.r-project.org/web/packages/RAM/index.html
RAST server using the Glimmer 3 option: Rapid Annotation Server is a fully-automated service for annotating bacterial and archaeal genomes. It provides high quality genome annotations for these genomes across the whole phylogenetic tree. Source: http://www.nmpdr.org/FIG/wiki/view.cgi/FIG/RapidAnnotationServer
SigOli: Signature Oligo (“Sigoli”) software developed for identifying unique sequence regions in large databases before using Array Designer. Source: Zahariev 2009
SPAdes: SPAdes – St. Petersburg genome assembler – is intended for both standard isolates and single-cell MDA bacteria assemblies. Source: http://bioinf.spbau.ru/spades
Tuxedo suite (Bowtie, Tophat, Cufflinks): The "Tuxedo Suite" of Bowtie, Tophat, and Cufflinks is used for the analysis of differential gene expression, discovery of splice forms, and fast alignment of short reads. Source: http://bowtie-bio.sourceforge.net/index.shtml
UNITE ITS database: UNITE also offers a non-redundant version with all fungal ITS sequences clustered approximately at the species level. Source: http://www.mothur.org/wiki/UNITE_ITS_database
Velvet assembler: Velvet is an algorithm package that has been designed to deal with de novo genome assembly and short read sequencing alignments. Source: https://www.ebi.ac.uk/~zerbino/velvet/
Webex: Video conferencing software from CISCO. Source: http://www.webex.com/
YAML data files: YAML is a human friendly data serialization standard for all programming languages. Source: http://yaml.org/


Glossary

454 sequencing: A parallelized version of pyrosequencing developed by 454 Life Sciences, which has since been acquired by Roche Diagnostics. The method amplifies DNA inside water droplets in an oil solution (emulsion PCR), with each droplet containing a single DNA template attached to a single primer-coated bead that then forms a clonal colony. The sequencing machine contains many picoliter-volume wells each containing a single bead and sequencing enzymes. Pyrosequencing uses luciferase to generate light for detection of the individual nucleotides added to the nascent DNA, and the combined data are used to generate sequence read-outs.
allele: One of two or more alternative forms of a gene that arise by mutation and are found at the same place on a chromosome.
amplicon: An amplicon is a piece of DNA or RNA that is the source and/or product of natural or artificial amplification or replication events. It can be formed using various methods including polymerase chain reactions (PCR), ligase chain reactions (LCR), or natural gene duplication.
bioinformatics: The science of collecting and analyzing complex biological data such as genetic codes.
biovar: A biovar is a variant prokaryotic strain that differs physiologically and/or biochemically from other strains in a particular species. Morphovars (or morphotypes) are those strains that differ morphologically. Serovars (or serotypes) are those strains that have antigenic properties that differ from other strains.
Bruvo's distance: This function calculates the distance between two individuals at one microsatellite locus using a method based on that of Bruvo et al. (2004).
chytrid fungus: Any of the simple, algaelike fungi constituting the class Chytridiomycetes, order Chytridiales, of aquatic and soil environments, having flagellated zoospores and little or no mycelium.
cpn60: The chaperonin family is a family of evolutionarily related proteins.
de novo: Starting from the beginning; anew.
DNA barcode: DNA barcoding is a technique for characterizing species of organisms using a short DNA sequence from a standard and agreed-upon position in the genome. DNA barcode sequences are very short relative to the entire genome and they can be obtained reasonably quickly and cheaply.
effector: In biochemistry, an effector molecule is usually a small molecule that selectively binds to a protein and regulates its biological activity. In this manner, effector molecules act as ligands that can increase or decrease enzyme activity, gene expression, or cell signalling.
epiphytes: Organisms that grow on a plant but are not parasitic.

ergot: A fungal disease of rye and other cereals in which black, elongated, fruiting bodies grow in the ears of the cereal. Eating contaminated food can result in ergotism.
EST: An expressed sequence tag or EST is a short sub-sequence of a cDNA sequence.
genera: Plural form of genus.
genome: The complete set of genes or genetic material present in a cell or organism.
genome annotation: The process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do.
genotype: The genetic constitution of an individual organism.
Genotyping by Sequencing (GBS): Genotyping by sequencing (GBS) enables the analysis of large numbers of single nucleotide polymorphisms (SNPs) in studies of genetic variation in a diverse range of species. GBS uses a highly multiplexed, cost effective approach that includes the use of restriction enzymes to reduce genome complexity and next generation sequencing technologies for SNP analysis.
genus: A principal taxonomic category that ranks above species and below family, and is denoted by a capitalized Latin name.
HiSeq: A next generation sequencing platform from Illumina that uses clonal bridge amplification for template preparation and Reversible Dye Terminator chemistry for very high throughput sequencing.
house-keeping gene: In molecular biology, housekeeping genes are typically constitutive genes that are required for the maintenance of basic cellular function, and are expressed in all cells of an organism under normal and patho-physiological conditions.
in silico analysis: An analysis performed on a computer or via a computer simulation.
isolates: A cultured strain of bacteria or fungus.
karnal bunt: Karnal bunt is a fungal disease of wheat, durum wheat, and triticale.
mate-pair sequencing: Mate-pair sequencing generates long-insert paired-end DNA sequence data, with inserts between 2 and 5 kb.
melibiose: A disaccharide of galactose and glucose joined by an alpha-1,6 linkage.
metagenomic: Metagenomics is the study of a collection of genetic material (genomes) from a mixed community of organisms.
metalaxyl: A fungicide.
microbial biota: Ecological community of microbes.


microsatellite markers or microsatellites: Microsatellites, also known as simple sequence repeats (SSRs) or short tandem repeats (STRs), are repeating sequences of 2-5 base pairs of DNA.
MIDs: Method used to maximise the throughput of a sequencing run by tagging each sample with a multiplex identifier (MID) so that pooled samples can be separated afterwards.
MiSeq: A next generation sequencing platform from Illumina that uses clonal bridge amplification for template preparation and Reversible Dye Terminator chemistry for laboratory-scale sequencing.
mitochondrial gene region: Genes found in mitochondria.
mycotoxigenic: Produces mycotoxins.
mycotoxin: A toxic substance produced by a fungus.
N50 score: Given a set of contigs, each with its own length, the N50 length is defined as the length for which the collection of all contigs of that length or longer contains at least half of the sum of the lengths of all contigs, and for which the collection of all contigs of that length or shorter also contains at least half of the sum of the lengths of all contigs.
NextGen sequencing: See NGS.
NGS: Next-generation sequencing refers to non-Sanger-based high-throughput DNA sequencing technologies. Millions or billions of DNA strands can be sequenced in parallel, yielding substantially more throughput and minimizing the need for the fragment-cloning methods that are often used in Sanger sequencing of genomes.
ochratoxin A: Produced by Aspergillus and Penicillium species, this is one of the most abundant food-contaminating mycotoxins.
oligo: Short form of oligonucleotide.
oligonucleotides: A polynucleotide whose molecules contain a relatively small number of nucleotides.
paired-end Illumina MiSeq sequencing reads: Paired-end sequencing allows users to sequence both ends of a fragment and generate high-quality, alignable sequence data. Paired-end sequencing facilitates detection of genomic rearrangements and repetitive sequence elements, as well as gene fusions and novel transcripts.
pathotype: Any of a group of organisms (of the same species) that have the same pathogenicity on a specified host.


pathotyping: To identify and differentiate pathotypes of a species.
PCR: Polymerase chain reaction, or PCR, is a laboratory technique used to make multiple copies of a segment of DNA. PCR is very precise and can be used to amplify, or copy, a specific DNA target from a mixture of DNA molecules.
pectolytic: Producing hydrolysis of pectic substances.
phage-related loci: Remnant sequences from viral DNA integration.
phenotypic: The observable properties of an organism that are produced by the interaction of the genotype and the environment.
phylogenetic trees: A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the inferred evolutionary relationships among various biological species or other entities (their phylogeny) based upon similarities and differences in their physical or genetic characteristics.
Pool-Seq: The sequencing of pooled DNA samples.
primer: A primer is a strand of nucleic acid that serves as a starting point for DNA synthesis.
principal component analysis: A statistical method that projects multivariate data onto a small number of uncorrelated axes (principal components), often two, so that clustering of samples can be visualized.
pyrosequencing: A method of DNA sequencing (determining the order of nucleotides in DNA) based on the "sequencing by synthesis" principle.
quantitative PCR: Real-time or quantitative PCR uses the linearity of DNA amplification to determine absolute or relative amounts of a known sequence in a sample. By using a fluorescent reporter in the reaction, it is possible to measure DNA generation.
raffinose: Raffinose is a trisaccharide composed of galactose, glucose, and fructose.
real-time PCR: See quantitative PCR.
RFLP: Restriction fragment length polymorphisms are variations in the length of restriction fragments produced by a given restriction enzyme in a sample of DNA. Such variation is used in forensic investigations and to map hereditary disease.
RNA Seq: RNA-seq (RNA Sequencing), also called Whole Transcriptome Shotgun Sequencing (WTSS), is a technology that uses the capabilities of next-generation sequencing to reveal a snapshot of RNA presence and quantity from a genome at a given moment in time.


Sanger sequencing: Sanger sequencing is a method of DNA sequencing based on the selective incorporation of chain-terminating dideoxynucleotides by DNA polymerase during in vitro DNA replication. Developed by Frederick Sanger and colleagues in 1977, it was the most widely used sequencing method for approximately 25 years.
Signature oligonucleotides: A unique polynucleotide whose molecules contain a relatively small number of nucleotides that are species or group specific.
SNP detection: SNP genotyping is the measurement of genetic variations of single nucleotide polymorphisms (SNPs) between members of a species. It is a form of genotyping, which is the measurement of more general genetic variation. SNPs are one of the most common types of genetic variation.
species: Principal natural taxonomic unit.
SSR profiles: Simple sequence repeats (SSRs), also known as microsatellites or short tandem repeats (STRs), are repeating sequences of 2-5 base pairs of DNA.
subspecies: A taxonomic category that ranks below species.
subspecific variation: Genetic or morphological differences within a species.
TaqMan probes: TaqMan probes are hydrolysis probes that are designed to increase the specificity of quantitative PCR.
taxon: A taxonomic group of any rank, such as a species, family, or class.
teliospore: A spore of certain rust fungi, which carries the fungus through the winter and which, on germination, produces the promycelium.
transcriptome: The sum total of all the messenger RNA molecules expressed from the genes of an organism.
trpB gene: Tryptophan biosynthesis gene.
type III secretion system (T3SS): A protein appendage found in several Gram-negative bacteria. In pathogenic bacteria, the needle-like structure is used as a sensory probe to detect the presence of eukaryotic organisms and secrete proteins that help the bacteria infect them.
