Can DNA barcoding and Next Generation Sequencing (NGS) technology detect invasive species in seed lots?

Steve Jones and Marie-José Côté

© 2017 Her Majesty the Queen in Right of Canada (Canadian Food Inspection Agency), all rights reserved. Use without permission is prohibited.

Presentation overview

• Background: species id and molecular testing
• What we did
• What we found
• Further research
• Future use

Background

Morphological id (dichotomous key)
• 1689: Richard Waller, drawings and watercolours
• 1778: Jean-Baptiste Lamarck, Flora Française, text based

Seed testing (correct crop species and other seed id)
• 1869: Prof. Dr. Friedrich Nobbe, first seed testing lab
• 1896: 119 seed labs worldwide in 19 countries

Molecular testing (DNA barcoding)
• 1953: DNA structure discovered: A-T and G-C base pairs
• 1970s: Sanger sequencing
• 2019: Molecular species id in seed lots: let's see…

[Figure: Sanger sequencer]

DNA barcoding – how does it work?

[Figure: top 5 lines: Amaranthus tuberculatus; bottom 3 lines: Amaranthus palmeri]

• Herbarium specimens are subsampled for DNA extraction.
• DNA is then sequenced using universal DNA barcoding primers.
• DNA barcode sequences for ITS2 and trnH-psbA are collected.

DNA barcoding – identifies differences

C C A G A C C A T G C T C C C C C A T A G G G A C A   Asteraceae Centaurea stoebe QFA 511500
C C A G A C C A T G C T C C C C C A T A G G G A C A   Asteraceae Centaurea stoebe ALTA 118589
C T A G A C C A T G C T C C C T T A G A G G G A C G   Asteraceae Centaurea phrygia DAO 618362
C T A G A C C A T G C T C C C T T A G A G G G A C G   Asteraceae Centaurea phrygia DAO 854486
C C A A A C C A A G C T C C C C C A T A G G G A C G   Asteraceae Centaurea sulphurea DAO 871922
C C A A A C C A A G C T C C C C C A T A G G G A C G   Asteraceae Centaurea sulphurea US 1221543
C C A G A C C A T A C T C C C C C A T A G G G A T G   Asteraceae Centaurea iberica DAO 854476
C C A G A C C A T A C T C C C C C A T A G G G A T G   Asteraceae Centaurea iberica DAO 854478
C C A G A C C A T A C T T C C C C A T A G G G A T G   Asteraceae Centaurea calcitrapa DAO 870915
C C A G A C C A T A C T T C C C C A T A G G G A T G   Asteraceae Centaurea calcitrapa DAO 634363
C C A G A C C A T C C T C C C C C A T A G G G A C G   Asteraceae Centaurea solstitialis DAO 854503
C C A G A C C A T C C T C C C C C A T A G G G A C G   Asteraceae Centaurea solstitialis DAO 284465
C C A G A C C A T G C T C C C T C A T T G G G A T G   Asteraceae Centaurea sicula (nicaeensis) CAN 219416
C C A G A C C A T G C T C C C T C A T T G G G A T G   Asteraceae Centaurea sicula (nicaeensis) US 1667075
C C A A T C C T T G C T C C C T C A T A G G G A T G   Asteraceae Centaurea melitensis CAN 550565
C C A A T C C T T G C T C C C T C A T A G G G A T G   Asteraceae Centaurea melitensis CAN 208260

DNA barcoding can create phylogenetic trees
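The Centaurea alignment above can be scanned column by column to find where the species differ, and counting differences between pairs of sequences is a common starting point for building a tree. The following is a minimal Python sketch of that idea, not the CFIA analysis itself. The sequences are the slide fragments with the spaces removed, one per species, and the function names are illustrative.

```python
# Barcode fragments copied from the Centaurea alignment (spaces removed).
ALIGNMENT = {
    "Centaurea stoebe":       "CCAGACCATGCTCCCCCATAGGGACA",
    "Centaurea phrygia":      "CTAGACCATGCTCCCTTAGAGGGACG",
    "Centaurea sulphurea":    "CCAAACCAAGCTCCCCCATAGGGACG",
    "Centaurea iberica":      "CCAGACCATACTCCCCCATAGGGATG",
    "Centaurea calcitrapa":   "CCAGACCATACTTCCCCATAGGGATG",
    "Centaurea solstitialis": "CCAGACCATCCTCCCCCATAGGGACG",
    "Centaurea sicula":       "CCAGACCATGCTCCCTCATTGGGATG",
    "Centaurea melitensis":   "CCAATCCTTGCTCCCTCATAGGGATG",
}


def variable_sites(alignment):
    """Return the 0-based columns where the aligned sequences do not all agree."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def differences(a, b):
    """Count the columns at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print("variable columns:", variable_sites(ALIGNMENT))
    names = list(ALIGNMENT)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            n = differences(ALIGNMENT[first], ALIGNMENT[second])
            print(f"{first} vs {second}: {n} difference(s)")
```

In this short fragment, Centaurea iberica and Centaurea calcitrapa differ at a single column, which shows how little sequence variation can separate two species.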

A T G C A T T A T G T - - C T C C C C T C T C G T G T G T G G A G T G - G G A A T A G A T C C T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta campestris
A T G C A T T A T G T - - C T C C C C T C T C G T G T G T G G A G T G - G G A A T A G A T C C T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta campestris
A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G C A G A T T G T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta umbrosa
A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G C A G A T T G T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta umbrosa
A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G T A G A T T G T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta gronovii
A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G T A G A T T G T G G C C T C C T G G G C C C T T C C T   Convolvulaceae Cuscuta gronovii
A A A T G T T T C G T - - C G T T C C T A A T C C T T T T T G T G A G - G G A C T G G T T C A T G G C C T C C C A G G C T A A G G T T   Convolvulaceae Cuscuta indecora
A A A T G T T T C G T - - C G T T C C T A A T C C T T T T T G T G A G - G G A C T G G T T C A T G G C C T C C C A G G C T A A G G T T   Convolvulaceae Cuscuta indecora
A C G T G A C G T G T C G C T C C C T T C T G G A T T T T T T T G G G A G G A G C G G A T A A T G T C C T C C C G T G C C T A T T G T   Convolvulaceae Cuscuta approximata
A C G T G A C G T G T C G C T C C C T T C T G G A T T T T T T T G G G A G G A G C G G A T A A T G T C C T C C C G T G C C T A T T G T   Convolvulaceae Cuscuta approximata

[Figure: tree of the Cuscuta barcodes on a scale of 70 to 100, with two specimens each of Cuscuta indecora, C. approximata, C. umbrosa, C. gronovii and C. campestris]

DNA barcoding to visualise species links
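A tree like the Cuscuta one is usually derived from how similar each pair of aligned sequences is. The sketch below computes pairwise percent identity for the Cuscuta fragment shown earlier. The gap rule (skip columns that are gaps in both sequences, count a gap against a base as a mismatch) is an assumption made for illustration. Because only a short fragment is used, the results are not expected to reproduce the values on the slide.

```python
from itertools import combinations

# Aligned fragments copied from the Cuscuta slide, one per species.
RAW = {
    "Cuscuta campestris": "A T G C A T T A T G T - - C T C C C C T C T C G T G T G T G G A G T G - G G A A T A G A T C C T G G C C T C C T G G G C C C T T C C T",
    "Cuscuta umbrosa": "A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G C A G A T T G T G G C C T C C T G G G C C C T T C C T",
    "Cuscuta gronovii": "A T T C G T T A T G T - - C T C C C C T C T - - T G T G T G G T G C G - G G A G T A G A T T G T G G C C T C C T G G G C C C T T C C T",
    "Cuscuta indecora": "A A A T G T T T C G T - - C G T T C C T A A T C C T T T T T G T G A G - G G A C T G G T T C A T G G C C T C C C A G G C T A A G G T T",
    "Cuscuta approximata": "A C G T G A C G T G T C G C T C C C T T C T G G A T T T T T T T G G G A G G A G C G G A T A A T G T C C T C C C G T G C C T A T T G T",
}
ALIGNED = {name: seq.replace(" ", "") for name, seq in RAW.items()}


def percent_identity(a, b):
    """Percent of compared columns with the same symbol.

    Assumption: columns that are gaps in both sequences are skipped;
    a gap opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared


def identity_table(aligned):
    """Map each pair of names to its percent identity."""
    return {
        (p, q): percent_identity(aligned[p], aligned[q])
        for p, q in combinations(aligned, 2)
    }


def closest_pair(aligned):
    """Return the pair of names with the highest percent identity."""
    table = identity_table(aligned)
    return max(table, key=table.get)


if __name__ == "__main__":
    for (p, q), value in sorted(identity_table(ALIGNED).items(), key=lambda kv: -kv[1]):
        print(f"{p} vs {q}: {value:.1f}%")
```

Sorting the pairs from most to least similar gives a rough reading of which species would sit closest together in a tree built from this fragment.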

[Figure: tree of Orobanchaceae reference barcodes on a scale of 76 to 100, with two herbarium specimens each of Orobanche uniflora, O. ludoviciana, O. minor, O. crenata, O. hederae, O. cumana, O. cernua and O. ramosa, and of Striga forbesii, S. asiatica, S. gesnerioides, S. hermonthica and S. aspera. The submitted specimen BOT-15-015 is listed directly after the two Striga asiatica specimens.]

CFIA DNA barcode reference database in use

TABLE 1. List of plant specimens submitted for identification

Specimen code | Family | Morphological identification | Common name | Barcode identification (ITS2) | Barcode identification (trnH-psbA)
BOT-17-036 | Poaceae | Aegilops cylindrica Host | Jointed goatgrass | A. cylindrica or A. triuncialis | A. cylindrica or A. tauschii
BOT-16-015 | Poaceae | Alopecurus geniculatus L. | Water foxtail | A. aequalis or A. geniculatus | A. geniculatus
BOT-16-047 | Poaceae | Alopecurus pratensis L. | Meadow foxtail | A. arundinaceus or A. pratensis | A. arundinaceus or A. pratensis
BOT-14-094 | Poaceae | Andropogon gerardi Vitman | Big bluestem | A. gerardi or A. hallii | A. gerardi
BOT-13-103 | Poaceae | Bromus inermis Leyss. | Smooth brome | B. inermis | B. inermis
BOT-15-013 | Poaceae | Bromus sp. L. | Brome grasses | B. secalinus | B. secalinus
BOT-15-070 | Asteraceae | Carduus nutans L. | Nodding thistle | C. nutans | Carduus or Cirsium spp.
BOT-13-187 | Asteraceae | Centaurea nigra L. | Black knapweed | C. nigra | C. nigra or C. phrygia
BOT-15-073 | Asteraceae | Centaurea stoebe L. | Spotted knapweed | C. diffusa or C. stoebe or C. virgata | C. diffusa or C. stoebe or C. virgata
BOT-16-060 | Asteraceae | Cirsium arvense (L.) Scop. | Canada thistle | C. arvense | Carduus or Cirsium spp.
BOT-12-035 | Asteraceae | Cirsium vulgare (Savi) Ten. | Bull thistle | C. vulgare | Carduus or Cirsium spp.
BOT-13-165 | Convolvulaceae | Cuscuta gronovii Willd. ex Roem. & Schult. | Swamp dodder | C. gronovii | N/A
BOT-15-057 | Solanaceae | Datura stramonium L. | Jimsonweed | D. stramonium | D. stramonium
BOT-16-112 | Poaceae | Digitaria sanguinalis (L.) Scop. | Hairy crabgrass | D. sanguinalis | Digitaria spp.
BOT-15-050 | Poaceae | Echinochloa crus-galli (L.) P. Beauv. | Large barnyard grass | Echinochloa spp. | Echinochloa spp.
BOT-14-081 | Poaceae | Echinochloa muricata (P. Beauv.) Fernald | Rough barnyard grass | Echinochloa spp. | Echinochloa spp.
BOT-13-198 | Poaceae | Eriochloa villosa (Thunb.) Kunth | Woolly cupgrass | E. villosa | E. villosa
BOT-13-087 | Poaceae | Festuca rubra L. | Red fescue | F. rubra | Festuca spp.
BOT-14-117 | Poaceae | Hordeum jubatum L. | Foxtail barley | Hordeum spp. | H. jubatum
BOT-14-115 | Poaceae | Panicum capillare L. | Common panicgrass | P. capillare | P. capillare or P. miliaceum
BOT-12-150 | Poaceae | Panicum dichotomiflorum Michx. | Fall panicgrass | P. dichotomiflorum | P. dichotomiflorum
BOT-14-125 | Poaceae | Panicum miliaceum L. | Proso millet | P. miliaceum | P. capillare or P. miliaceum
BOT-16-054 | Polygonaceae | Persicaria lapathifolia (L.) Delarbre | Pale smartweed | P. lapathifolia | P. lapathifolia
BOT-16-077 | Poaceae | Phleum pratense L. | Common timothy | P. pratense | P. pratense
BOT-15-019 | Solanaceae | Solanum americanum Mill. | American black nightshade | S. americanum or S. physalifolium | S. americanum or S. physalifolium
BOT-14-100 | Solanaceae | Solanum physalifolium Rusby | Ground-cherry nightshade | S. americanum or S. physalifolium | S. americanum or S. physalifolium
BOT-15-015 | Orobanchaceae | Striga sp. | Witchweeds | S. asiatica | Striga sp.

Barcode of Life Database (BOLD)

• Public database of DNA barcodes sequenced from voucher specimens of animals, plants and fungi

• Most of the CFIA plant DNA barcode collection is being transferred to BOLD to allow for public access
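Returning to the specimen identification table (TABLE 1): each marker often narrows a specimen down to a small set of candidate species rather than a single name. One way to read the two barcode columns together is to intersect the candidate sets. This is an illustrative assumption, not a description of how CFIA analysts combine the markers. The rows below are copied from the table, with genus names written out in full.

```python
# Candidate species per marker for a few specimens from TABLE 1.
# Each entry is (ITS2 candidates, trnH-psbA candidates).
CANDIDATES = {
    "BOT-17-036": (
        {"Aegilops cylindrica", "Aegilops triuncialis"},
        {"Aegilops cylindrica", "Aegilops tauschii"},
    ),
    "BOT-16-015": (
        {"Alopecurus aequalis", "Alopecurus geniculatus"},
        {"Alopecurus geniculatus"},
    ),
    "BOT-13-187": (
        {"Centaurea nigra"},
        {"Centaurea nigra", "Centaurea phrygia"},
    ),
    "BOT-15-019": (
        {"Solanum americanum", "Solanum physalifolium"},
        {"Solanum americanum", "Solanum physalifolium"},
    ),
}


def combine_markers(its2, trnh_psba):
    """Keep only the species that both markers allow (set intersection)."""
    return its2 & trnh_psba


def resolved(candidates):
    """Return specimen codes for which the two markers leave one species."""
    return {
        code: next(iter(combine_markers(a, b)))
        for code, (a, b) in candidates.items()
        if len(combine_markers(a, b)) == 1
    }


if __name__ == "__main__":
    for code, (its2, trnh) in CANDIDATES.items():
        print(code, sorted(combine_markers(its2, trnh)))
```

For BOT-17-036 the intersection leaves only Aegilops cylindrica, which agrees with the morphological identification, while the Solanum specimen stays ambiguous because both markers give the same two candidates.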

[Screenshot: BOLD key word search at http://www.boldsystems.org/index.php/default, showing the list of corresponding voucher specimens and the DNA barcode sequences available]

How else could DNA barcoding be used?

What if it was possible to sequence several individuals together and then sort and identify the species within a sample?

One option:

Metabarcoding

[Figure: Next Generation Sequencing (NGS) robot and machine]

Metabarcoding: what is it and how does it work?

• Next Generation Sequencing is a technology that can sequence and track the results of several different molecules in one reaction (i.e. mixed samples)

• This technology will take molecular diagnostics and identification to a whole new level

Metabarcoding: process overview

[Figure: wheat, barley and flax screenings each go through PCR and NGS; the NGS reads are then aligned with, and mapped to, the reference barcodes (Ref 1 to Ref 6)]
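The mapping step in the process overview, where each NGS read is assigned to a reference barcode and the reads per reference are counted, can be sketched in a few lines of Python. This is a toy illustration only. Real pipelines use dedicated alignment software, and the k-mer sharing rule, the thresholds, and every sequence below are invented for the example.

```python
from collections import Counter

# Hypothetical reference barcodes (made-up sequences, not real ITS2 data).
REFERENCES = {
    "Ref 1": "ACGTTGCAAGGCTTACCGATAGGCTAACGT",
    "Ref 2": "TTGACCGGTATCAGGCATTCGAAGTCCTGA",
}


def kmers(seq, k=8):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_read(read, references, k=8, min_shared=3):
    """Return the reference sharing the most k-mers with the read.

    Returns None when no reference shares at least min_shared k-mers.
    """
    read_kmers = kmers(read, k)
    best_name, best_shared = None, 0
    for name, ref in references.items():
        shared = len(read_kmers & kmers(ref, k))
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name if best_shared >= min_shared else None


def count_reads(reads, references):
    """Tally reads per reference; unmatched reads go to 'unassigned'."""
    counts = Counter()
    for read in reads:
        counts[assign_read(read, references) or "unassigned"] += 1
    return counts


# Hypothetical reads: three fragments of the references plus one unrelated read.
READS = [
    REFERENCES["Ref 1"][:20],
    REFERENCES["Ref 1"][5:25],
    REFERENCES["Ref 2"][3:23],
    "GGGGGGGGGGGGGGGGGGGG",
]

if __name__ == "__main__":
    for name, n in count_reads(READS, REFERENCES).items():
        print(name, n)
```

The resulting per-reference counts play the same role as the read counts reported for each screening sample in the results table.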

Metabarcoding: the results so far

Columns: Wheat 1 to 4, Barley 1 to 4, Flax 1 to 4. Read counts are listed left to right, with the number of spiked seeds in parentheses. Empty cells are not shown, so rows with fewer than 12 values cannot be assigned to specific samples.

Family | Reference species | NGS reads (seeds spiked)
(family and genus missing) | amaranthoides | 1, 59072 (5), 1, 1
Amaranthaceae | Bassia (Kochia) scoparia | 106769 (1), 13419 (1), 19302 (1), 2, 32671 (5), 25001, 41593 (1), 111707, 73491 (13), 78274 (13), 18208 (1), 1515
Amaranthaceae | Chenopodium album | 106769 (5), 19765 (5), 15430 (4), 2704 (4), 3127, 3723, 26012 (4), 19, 27, 1987 (1), 64681 (4), 28258 (2)
Amaranthaceae | Salsola tragus | 12, 12, 1, 1, 19, 1311 (1), 1, 4
Apiaceae | Daucus carota | 597 (2), 1, 7850 (4)
Asteraceae | Ambrosia artemisiifolia | 56 (1), 2, 138 (1), 1, 47 (1)
Asteraceae | Arctium minus | 59338 (1), 1
Asteraceae | Carduus nutans | 8061 (2)
Asteraceae | Centaurea diffusa or stoebe or virgata | 1, 1, 24135 (2)
Asteraceae | Cirsium arvense | 1459, 219 (1), 169, 161, 290, 3631 (3), 409, 1810, 99, 130, 283, 102
Asteraceae | Rhaponticum repens | 0 (2), 9 (2), 4499 (2)
Asteraceae | Sonchus arvensis | 115, 1559, 71, 2, 2121, 9998, 2379, 20254, 988 (1), 25, 9, 14
Asteraceae | Sonchus oleraceus or asper | 42, 104, 692, 739, 529, 621, 801, 4676, 5178 (2), 274, 51, 10
Asteraceae | Tragopogon dubius | 117 (1)
Brassicaceae | Brassica rapa-napus | 721414 (2), 1218578 (4), 905633 (3), 1101020, 434000, 118771, 625609 (3), 269308, 47971, 40818, 210228 (3), 25461
Brassicaceae | Capsella bursa-pastoris | 5, 11980 (2), 12, 16714, 582, 1, 7, 1, 4669, 3717
Brassicaceae | Descurainia sophia | 1, 6, 4, 1358 (2), 3924 (6), 10, 126
Brassicaceae | Erucastrum gallicum | 10448 (1), 1
Brassicaceae | Erysimum cheiranthoides | 2, 2, 2, 2, 239, 1046 (1)
Brassicaceae | Lepidium densiflorum or virginicum | 729 (1)
Brassicaceae | Lepidium draba | 162 (1)
Brassicaceae | Raphanus sativus or raphanistrum | 1, 1, 9967 (1), 4184 (1), 1, 1, 1, 3414 (1), 1
Brassicaceae | Sinapis arvensis | 33, 19, 43, 16, 20, 4030, 22, 73, 314133 (10), 37324 (3), 43, 48
Brassicaceae | Sinapis alba | 15, 661, 16, 6, 8, 1, 36, 10, 3, 7, 12, 508245 (3)
Brassicaceae | Thlaspi arvense | 128313 (3), 93, 5, 1, 7, 32, 13, 371, 5, 26
Caryophyllaceae | Silene noctiflora | 8977 (1), 1, 1, 10, 3
Caryophyllaceae | Spergula arvensis | 6 (1)
Caryophyllaceae | Stellaria media | 14, 1165 (1), 407 (1), 2, 20
Caryophyllaceae | Vaccaria hispanica | 5531 (2)
Linaceae | Linum usitatissimum | 88050, 720, 29206, 820, 20342, 7126, 951, 325376 (1), 1975697, 3137824, 2631251, 2006287
Malvaceae | Abutilon theophrasti | 8, 7, 3, 1, 309213 (1), 3, 3, 14, 5, 5, 11, 3
Poaceae | Avena sativa or fatua | 47721 (3), 25, 195, 106, 97, 7, 412, 417, 163, 18, 1, 1
Poaceae | Bromus secalinus or japonicus | 1, 6091 (1), 1
Poaceae | Echinochloa crus-galli or esculenta or frumentacea or muricata | 17 (1), 42 (1), 235, 1
Poaceae | Elymus repens | 221, 366 (1), 74, 162 (1), 126, 40, 18
Poaceae | Hordeum distale or hexale or vulgare | 14813, 596, 15633, 6894, 251920, 271965, 99200, 211743, 6, 38, 25, 6
Poaceae | Setaria pumila | 458 (1), 109, 1, 1
Poaceae | Setaria viridis or italica | 16524 (3), 14430 (19), 19, 1151 (3), 1297, 97048, 2714, 43123 (2), 10, 5, 7
Poaceae | Triticum aestivum | 156319, 51195, 66505, 53497, 1074, 676, 2071, 32837, 19, 6, 7, 7
Polygonaceae | Persicaria lapathifolia | 11 (1), 11, 13, 1, 364 (1), 16, 1, 95, 2400 (8), 39, 36
Polygonaceae | Polygonum aviculare | 3 (1), 1
Portulacaceae | Portulaca oleracea | 979 (6)
Ranunculaceae | Ranunculus acris | 1, 34517 (1)
Rubiaceae | Galium spurium | 3085 (2), 13, 2631 (4), 198 (3), 8, 2, 1005 (4), 3837 (39), 240 (4), 87 (1)
Solanaceae | Solanum carolinense | 194 (1)

Table 1. Number of NGS reads aligning with the reference weed or crop species ITS2 barcode.

Metabarcoding: looking at the details

Columns: Wheat 1 to 4, Barley 1 to 4, Flax 1 to 4 (empty cells not shown)

Family | Reference species | NGS reads (seeds spiked)
Amaranthaceae | Salsola tragus | 12, 12, 1, 1, 19, 1311 (1), 1, 4
Apiaceae | Daucus carota | 597 (2), 1, 7850 (4)
Asteraceae | Ambrosia artemisiifolia | 56 (1), 2, 138 (1), 1, 47 (1)
Asteraceae | Arctium minus | 59338 (1), 1
Asteraceae | Carduus nutans | 8061 (2)
Asteraceae | Centaurea diffusa or stoebe or virgata | 1, 1, 24135 (2)

Green boxes mark the screening samples to which the reference species was intentionally added. The number in ( ) is the number of seeds used to spike the screening sample.

Metabarcoding: some easier than others
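Because each spiked cell carries its seed count in parentheses, spiked and unspiked samples can be compared directly. The sketch below parses rows in the format used in the tables, then reports reads per spiked seed alongside the largest unspiked (background) count. The summary it computes is an illustrative choice, not a metric taken from the presentation.

```python
import re

# One cell is a read count, optionally followed by "(seeds)".
CELL = re.compile(r"(\d+)(?:\s*\((\d+)\))?")

# Rows copied from the detail table (empty cells are not represented).
ROWS = {
    "Salsola tragus": "12 12 1 1 19 1311 (1) 1 4",
    "Daucus carota": "597 (2) 1 7850 (4)",
    "Ambrosia artemisiifolia": "56 (1) 2 138 (1) 1 47 (1)",
    "Arctium minus": "59338 (1) 1",
    "Carduus nutans": "8061 (2)",
}


def parse_cells(row):
    """Turn a row string into (reads, seeds) pairs; seeds is None if unspiked."""
    return [
        (int(m.group(1)), int(m.group(2)) if m.group(2) else None)
        for m in CELL.finditer(row)
    ]


def summarize(row):
    """Reads per seed for spiked cells, and the highest unspiked read count."""
    cells = parse_cells(row)
    spiked = [(reads, seeds) for reads, seeds in cells if seeds is not None]
    background = [reads for reads, seeds in cells if seeds is None]
    return {
        "reads_per_seed": [reads / seeds for reads, seeds in spiked],
        "max_background": max(background, default=0),
    }


if __name__ == "__main__":
    for species, row in ROWS.items():
        print(species, summarize(row))
```

For these rows the reads obtained per spiked seed range from a few dozen (Ambrosia artemisiifolia) to tens of thousands (Arctium minus), which is one way to see why some species are easier to detect than others.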

Columns: Wheat 1 to 4, Barley 1 to 4, Flax 1 to 4 (empty cells not shown)

Family | Reference species | NGS reads (seeds spiked)
Brassicaceae | Lepidium draba | 162 (1)
Brassicaceae | Raphanus sativus or raphanistrum | 1, 1, 9967 (1), 4184 (1), 1, 1, 1, 3414 (1), 1
Brassicaceae | Sinapis arvensis | 33, 19, 43, 16, 20, 4030, 22, 73, 314133 (10), 37324 (3), 43, 48
Brassicaceae | Sinapis alba | 15, 661, 16, 6, 8, 1, 36, 10, 3, 7, 12, 508245 (3)
Brassicaceae | Thlaspi arvense | 128313 (3), 93, 5, 1, 7, 32, 13, 371, 5, 26
Caryophyllaceae | Silene noctiflora | 8977 (1), 1, 1, 10, 3
Caryophyllaceae | Spergula arvensis | 6 (1)
Caryophyllaceae | Stellaria media | 14, 1165 (1), 407 (1), 2, 20
Caryophyllaceae | Vaccaria hispanica | 5531 (2)
Linaceae | Linum usitatissimum | 88050, 720, 29206, 820, 20342, 7126, 951, 325376 (1), 1975697, 3137824, 2631251, 2006287
Malvaceae | Abutilon theophrasti | 8, 7, 3, 1, 309213 (1), 3, 3, 14, 5, 5, 11, 3

Green boxes mark the screening samples to which the reference species was intentionally added. The number in ( ) is the number of seeds used to spike the screening sample.

Further Research

• Use another genomic region from plant barcodes (rbcL) to confirm or complement the ITS2 results

• Improve some of the reference DNA barcodes to increase the number of NGS reads mapped

• More testing to gain information on background signals and on how sample composition influences the test

Now and Future Uses

• Assist seed analysts as a pre-screening tool

• Increase sample testing capacity: machines can run 24/7

• Use to identify single specimens of weed seeds, in addition to morphological ids or grow-outs

Teams and Collaborators

DNA barcoding collection

• National collection (DAO): Gisèle Mitrow, Paul Catling, Sara Martin, Tyler Smith, James Macklin, Amanda Ward
• National Herbarium of Canada (CAN): Jennifer Doubt, Jeffery M. Saarela
• CFIA Ottawa Lab (Fallowfield): Cheryl Dollard, Adam Colville, Sarah Kyte, Lisa Leduc, Donald Kerr, Alexandre Blain, Marie-Claude Gagnon and Marie-José Côté
• Several herbaria in Canada, USA and Europe

Metabarcoding

• CFIA Saskatoon Seed Lab (SSTS): Steve Jones, Nicole Wurm
• CFIA Ottawa Plant Lab (OPL): Adam Colville, Emilie Tremblay, Guillaume Bilodeau and Marie-José Côté
• CFIA lab Bioinformatics: Marc-Olivier Duceppe, John Chmara

Thank you for listening