<<

A NEXT-GENERATION SEQUENCING APPROACH FOR THE SIMULTANEOUS

DETECTION OF FOR PLANT QUARANTINE TESTING

A Thesis

Submitted to the Graduate Faculty

In Partial Fulfillment of the Requirements

For the Degree of

Master of Science

In the Department of Pathology and Microbiology

Faculty of Veterinary Medicine

University of Prince Edward Island

Desmond L. Hammill

Charlottetown, PE

April 2019

© 2019, D.L. Hammill

THESIS/DISSERTATION NON-EXCLUSIVE LICENSE

Family Name: Hammill Given Name, Middle Name (if applicable): Desmond, Leslie Full Name of University: University of Prince Edward Island Faculty, Department, School: Faculty of Veterinary Medicine, Department of Pathology/Microbiology, UPEI Degree for which Date Degree Awarded: thesis/dissertation was presented: Master of Science Thesis/dissertation Title: A next-generation sequencing approach for the simultaneous detection of plant viruses for plant quarantine testing

In consideration of my University making my thesis/dissertation available to interested persons, I, Desmond Hammill, hereby grant a non-exclusive, for the full term of copyright protection, license to my University, University of Prince Edward Island:

a) to archive, preserve, produce, reproduce, publish, communicate, convert into any format, and to make available in print or online by telecommunication to the public for non- commercial purposes; b) to sub-licence to Library and Archives Canada any of the acts mentioned in paragraph (a). I undertake to submit my thesis/dissertation, through my University, to Library and Archives Canada. Any abstract submitted with the thesis/dissertation will be considered to form part of the thesis/dissertation.

I represent that my thesis/dissertation is my original work, does not infringe any rights of others, including privacy right, and that I have the right to make the grant conferred by this non- exclusive license.

If third party copyrighted material was included in my thesis/dissertation for which, under the terms of the Copyright Act, written permission from the copyright owners is required I have to obtained such permission from the copyright owners to do the acts mentioned in paragraph (a) above the full term of copyright protection.

I retrain copyright ownership and moral rights in my thesis/dissertation, and may deal with the copyright in my thesis/dissertation, in any way consistent with rights granted by me to my University in this non-exclusive license.

I further promise to inform any person to whom I may hereafter assign or license my copyright in my thesis/dissertation of the rights granted by me to my University in this non-exclusive license.

Signature: Date:

ii

UNIVERSITY OF PRINCE EDWARD ISLAND FACULTY OF VETERINARY MEDICINE CHARLOTTETOWN

CERTIFICATION OF THESIS WORK

We, the undersigned, certify that Desmond Hammill, candidate for the degree of Master of Science has presented his/her thesis with the following title:

A NEXT-GENERATION SEQUENCING APPROACH FOR THE SIMULTANEOUS DETECTION OF PLANT VIRUSES FOR PLANT QUARANTINE TESTING that the thesis is acceptable in form and content, and that a satisfactory knowledge of the field covered by the thesis was demonstrated by the candidate through an oral examination held on:

Examiners

Dr. Mark Fast (chair) ______

Dr. Huimin Xu ______

Dr. Chelsea Martin ______

Dr. Russell Fraser ______

Dr. Lawrence Hale ______

Date: July 9, 2019

iii

ABSTRACT

Potato (Solanum tuberosum L.) is one of the most important vegetable crops in the world and plays a vital role in human nutrition and food security. However, potato crops have been known to be infected by at least 40 viruses and 2 viroids contributing to reduced yields. The detection of viruses and viroids is limited from a diagnostic perspective as there are very few validated procedures that exist. Globalization of agriculture has meant that crop are now grown further from their centres of origin and far from the pathogens that had co-evolved with them. Crops introduced to a new area may be poorly equipped to resists pathogenic organisms that are already resident. Similarly, increased trade leads to the potential of introducing pathogenic organisms that may not be endemic to a particular region leading to outbreaks having the potential for extreme economic consequences. The Potato Post-Entry Quarantine (PPEQ) program implemented by the Canadian Food Inspection Agency (CFIA) was established under the Plant Protection Act to prevent the introduction of economically devastating foreign plant diseases into Canada. The PPEQ program allows the Canadian potato industry to obtain disease-free germplasm from other countries to meet international demands for the production of seed, table, or processing potatoes. The diagnostic methods employed in the current PPEQ program are time-consuming and rely on both traditional and modern techniques. Many of these techniques are only specific for single viral targets having weak specificity and sensitivity. Recently, next-generation sequencing (NGS) platforms have been widely accepted as high-throughput, unbiased technologies that have attractive features in the field of plant diagnostics. NGS has opened the door to very sensitive and specific testing with the opportunity to detect multiple pathogens in a single sample. In this study, various RNA extraction methods were evaluated to acquire high quality non-fragmented RNA. Integrity values ranging from 7-9 were obtained using the PureLink Total RNA Mini kit and lysis buffer (Thermo Scientific, Waltham, USA). Using this method for RNA extraction, the application of NGS was explored to determine if plant pathogenic viral genomes could be generated de novo via a bioinformatic pathway when mapped to the whole host genome without a priori knowledge of their presence. Intended potato tuber imports intercepted from Bangladesh showed that viral contigs from multiple mixed infections of Potato aucuba mosaic (PAMV), Potato virus Y (PVY), Potato virus X (PVX), Potato virus S (PVS), Potato virus M (PVM), Potato leaf roll virus (PLRV), and Tomato chlorosis virus (ToCV) could be detected simultaneously with high specificity. Similarly, NGS was successfully applied to determine the causal agents of unknown etiology in tomato (S. lycopersicum) without a priori knowledge of their existence in the sample and confirmed the first report of Southern tomato virus (STV) in Canada. This NGS protocol will aid in the diagnosis of pathogens that were otherwise unable to be tested for within the current PPEQ program as well as identify unknown agents from other samples allowing for the development of new routine diagnostic assays and timely epidemiological and eradication strategies to be performed.

iv

ACKNOWLEDGEMENTS

I would like to acknowledge the contributions of my co-supervisors Dr. Huimin Xu and Dr. Fred Kibenge. Thank-you, Huimin, for accepting me into your research group here at the Charlottetown Laboratory of the Canadian Food Inspection Agency and seeing the potential in me to complete my graduate studies. The time and effort it took for you to provide me with both guidance and directed studies for this effort was well beyond your requirement at the Agency. I thank you for being supportive of my idea to combine professional with academic achievements. I would also like to thank my committee members Dr. Chelsea Martin and Dr. Yingwei Wang for seeing me through this process. I would also like to thank Dr. Doug Campbell and Dr. Amanda Cockshutt of Mount Allison University who gave me an opportunity to work in their lab in 2007; something they did simply out of kindness. Their guidance provided me with many of the necessary skills, technical knowledge, and troubleshooting capabilities necessary for the future. I hope they realize how grateful I am to learn from them. I would not be in the position that I am in now without them. I was also very fortunate for Dr. Xianzhou Nie and Virginia Dickison of the Fredericton Research Centre of Agriculture Agri-Food Canada who provided me with introductory training in NGS technology which was important for my further understanding of the Illumina platform. Dr. Martin Filion and Dr. Adrien Biessy at the University of Moncton provided valuable training on the CLC Genomics Workbench platform that enabled me to create a functional NGS workflow. Erin Miller-Jones of the Canadian Food Inspection Agency provided me with outstanding help retrieving sample information at the drop of a hat. I am very fortunate to also have someone at the Charlottetown Laboratory like Joan Hardy who has the breadth of knowledge that she has in the administrative processes of government which made my experience much easier.

I wish to dedicate this work to the citizens of the Canadian Federation.

v

TABLE OF CONTENTS

Thesis/dissertation non-exclusive license……………………………………………………….ii

Certification of thesis work……………………………………………………………………..iii

Abstract…………………………………………………………………………………………..iv

Acknowledgements………………………………………………………………………………v

Table of Contents………………………………………………………………………………..vi

List of Tables………………………………………………………………………………….....ix

List of Figures……………………………………………………………………………………xi

List of Abbreviations…………………………………………………………………………..xiii

1.0 GENERAL INTRODUCTION AND LITERATURE REVIEW……………………..2 1.1 Historical Perspective……………………………………………………………...2 1.2 Genome Characteristics of Potato…………………………………………………3 1.2.1 Potato Breeding………………………………………………………….3 1.3 Potato Industry in Canada………………………………………………………....5 1.4 Plant Diagnostics and Disease Management…..……………………………….….5 1.4.1 RNA Interference (RNAi) and Symptom Development……………..….6 1.4.2 Diagnostics of Plant Viruses and Viroids.…….………………………...8 1.4.3 Seed Potato Certification………………………………………………...9 1.4.4 Potato Post-Entry Quarantine (PPEQ) Program…………………………10 1.5 Next-Generation Sequencing (NGS)………………………………………………12 1.5.1 Metagenomics……………………………………………………………12 1.5.2 Development of NGS Technologies……………………………………..13 1.5.3 RNA Input……….……………………………………………………....14 1.5.4 Library Preparation………..……………………………………………..15 1.5.5 Sequence-by-Synthesis (SBS)………….………………………………..16 1.5.6 Bioinformatic Pipelines………………………………………...………...18 1.6 Characteristics of Plant Virus and Common Plant Virus Families………………....19 1.6.1 Transmission……………………………………………………………...20 1.6.2 Viruses of …………………………………………………..……....21 1.6.3 Viruses of …………………………………………………....23 1.6.4 Viruses of ……………………………………………………...24 1.6.5 Viruses of Luteoviridae………………….……………………………………....25 1.6.6 Viruses of ……………………………………………………....27 1.6.7 Viruses of ………………………………………………………….28 1.6.8 Viroids……………………………………………………………..……..29 1.6.9 Double-stranded RNA (dsRNA) Viruses………………………………...31 1.6.10 (-)-ssRNA Viruses…………..…………………………………………..32

vi

1.7 Objectives and Aims of This Study...………………………………………………..33

2.0 MATERIALS AND METHODS…………………………………………………………..36 2.1 Virus and Viroid Isolates…………………………………………………………….36 2.2 RNA Extraction….…………………………………………………………………..39 2.2.1 Tri-Reagent Extraction.……………………………………………………39 2.2.2 Pure-Link Total RNA Mini Kit……………………………………………40 2.2.3 Direct-zol RNA Mini-Prep………………………………………………...41 2.2.4 Quick-RNA Mini-Prep……………………………………………………..42 2.2.5 Plant/Fungi Total RNA Purification Kit…………………………………...43 2.3 RNA Quality Assessment……………………………………………………………44 2.4 Construction of cDNA Libraries…...………………………………………………...44 2.4.1 TruSeq Stranded Total RNA Library Preparation…………………………44 2.4.1.1 rRNA Depletion………………………………………………….44 2.4.1.2 RNA Purification and Fragmentation……………………………45 2.4.1.3 cDNA Synthesis………………………………………………….46 2.4.1.4 3’-Adenylation and Adapter Ligation……………………………46 2.4.1.5 cDNA Library Enrichment………………………………………47 2.4.1.6 cDNA Library Validation………………………………………..48 2.4.1.7 cDNA Library Quantitation and Normalization…………………48 2.5 Sequence Analysis…………………………………………………………………...49 2.6 Design of Oligonucleotide Primers…………………………………………………..50 2.7 Synthesis of cDNAs ……….………………………………………………………..51 2.8 PCR and RT-PCR……………………………………………………………………52 2.8.1 PCR………………………………………………………………………...52 2.8.2 Multiplex-PCR for PVY-Strain Discrimination…………………………...55 2.8.3 One-Step RT-qPCR for Detection of PSTVd and PepMV………………...55 2.9 Bioassay……………………………………………………………………………...56

3.0 RESULTS…………………………………………………………………………………...59 3.1 Determination of RNA Quality………………………………………………………59 3.2 Evaluation of the Tru-Seq Stranded Total RNA Library Preparation Protocol……...60 3.2.1 Acquisition of Tubers Imported from Bangladesh in Toronto, Canada..….60 3.2.2 Testing Remainder of Tubers from Bangladesh Interception……………...63 3.2.3 Testing Imported Nuclear Stocks...………………………………………...68 3.2.4 Testing for Unknown Pathogen in S. lycopersicum Leaves……………….70 3.2.5 Sensitivity Assay of the Illumina Sequencing Platform…………………...74 3.2.6 GenBank Submissions……………………………………………………..75

4.0 DISCUSSION……………………………………………………………………………….78 4.1 Requirement of High Quality RNA for cDNA Library Preparation……………..…..78 4.2 Evaluation of a Commercial Library Preparation Kit for Diagnosis……………...…80 4.2.1 Accurate Detection of Mixed Infections in a Single Sample………………81 4.2.2 Use of NGS in Diagnostic Testing Lads to New Characterizations……….84 4.3 Sensitivity of the NGS Platform……………………………………………………..85 4.4 Conclusion and Future Direction…………………………………………………….86

vii

REFERENCES………………………………………………………………………………….89

APPENDIX A………………………………………………………………………………….102

APPENDIX B………………………………………………………………………………….103

APPENDIX C……………………………………………………………………………….…104

APPENDIX D………………………………………………………………………………….105

APPENDIX E………………………………………………………………………………….107

APPENDIX F………………………………………………………………………………….108

APPENDIX G…………………………………………………………………………………109

APPENDIX H…………………………………………………………………………...... 110

APPENDIX I…………………………………………………………………………………..111

viii

LIST OF TABLES

Table 2.1. List of samples used in this study…………………………………………………….37

Table 2.2. Scoring values used for mapping paired reads to the reference genome……………..50

Table 2.3. Scoring values used for de novo assembly of unmapped contigs………………….....50

Table 2.4. List of primers and probes used in this study………………………………………...53

Table 3.1. Quality and quantity of RNA isolated using Tri-Reagent……………………………59

Table 3.2. Quality and quantity of RNA isolated using PureLink Total RNA Mini Kit………..59

Table 3.3. Quality and quantity of RNA isolated using Direct-zol RNA Mini-Prep Kit. ………59

Table 3.4. Quality and quantity of RNA isolated using Quick-RNA Mini-Prep Kit……………60

Table 3.5. Quality and quantity of RNA isolated using Plant/Fungi RNA Purification Kit…….60

Table 3.6. Summary of data generated after sequencing of 8 samples from Bangladesh and sequencing against the constructed S. tuberosum genome published in GenBank……………...63

Table 3.7. NGS results of tuber interception from Bangladesh confirmed via RT-PCR using the primers listed in Table 2.4.……………………………………………………………………....63

Table 3.8. Summary of data generated after sequencing of 12 samples from Bangladesh against the constructed S. tuberosum genome published in GenBank…………………………………..67

Table 3.9. NGS results of tuber interception from Bangladesh confirmed via RT-PCR using the primers listed in Table 2.4…………………………………………………………………….....67

Table 3.10. Results from sequencing PPEQ batches 82, 83, and 84 confirmed via RT-PCR using the primers listed in Table 2.4…...... 70

Table 3.11. Summary of data generated after sequencing a twofold dilution series consisting of 3 viruses (PVX, PLRV, PotLV) ranging from undilute to 1:4096 in healthy potato leaf sap against the constructed S. tuberosum genome published in GenBank…………………………………..75

Table 3.12. Selected complete and incomplete sequences submitted directly to GenBank obtained in this study……………………………………………………………………………76

Table A.1. Components for the generation of Murashige and Skoog agar……..………………102

Table B.1. Components for the generation of 0.01M KPO4 buffer pH 7.0……………………..103

ix

Table C.1. List of herbaceous indicator species used within the PPEQ program……………..104

Table G.1. Table G.1. Summary of data generated after sequencing of 3 tomato leaf samples from Alberta against the constructed S. lycopersicum genome published in GenBank…….…110

x

LIST OF FIGURES

Figure 3.1. (A) Amplification of standards of known concentration and cDNA libraries created from Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards of known concentration……………………………………………………………………..….…………..61

Figure 3.2. (A) Amplification of standards of known concentration and cDNA libraries created from second set of Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards. Standard curve generated from amplification of standards of known concentration…………...64

Figure 3.3. (A) Real-time RT-qPCR amplification using RNA from leaves acquired from Alberta and the PSTVd-specific primer/probe set PSTVd-F/R/P. (B) Real-time RT-qPCR amplification using the same RNA and the PepMV-specific primer/probe set Pep-F2/R2/P2. (C) Agarose gel electrophoresis of RT-PCR products from cDNA generated from the same RNA amplified using the STV-specific primers STV-F/R……………………………………………………………..72

Figure D.1. Agilent Bioanalyzer trace for RNA extracted using Tri-Reagent….……………...105

Figure D.2. Agilent Bioanalyzer trace for RNA extracted using PureLink Total RNA Mini Kit ………………………….……………………………………………………………………....105

Figure D.3. Agilent Bioanalyzer trace for RNA extracted using Direct-zol RNA Mini-Prep kit……………………………………………………………………………………………….105

Figure D.4. Agilent Bioanalyzer trace for RNA extracted using Quick-RNA Mini-Prep kit.…106

Figure D.5. Agilent Bioanalyzer trace for RNA extracted using Plant/Fungi Total RNA Purification kit………………………………………………………………………………….106

Figure E.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from a tuber interception from Bangladesh in Toronto, Canada. Samples amplified using primers specific for the detection of (A) PAMV, (B) PVY (all strains), (C) ToCV, (D) PVX, (E) PVS, and (F) PVM……...... ……………………………………………..107

Figure F.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from a tuber interception from Bangladesh in Toronto, Canada. Samples amplified using primers specific for the detection of (A) PAMV, (B) PVY (all strains), (C) ToCV, (D) PLRV, (E) PVS, and (F) PVM……..……………………………………………..108

xi

Figure H.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from several accessions submitted to the PPEQ program at the Charlottetown Laboratory of the CFIA. Samples amplified using primers specific for the detection of (A) PVY (all strains), (B) PLRV, and (C) PVS……..…………………………………………………..110

xii

LIST OF ABBREVIATIONS

(+)-ssRNA positive-sense single-stranded RNA (-)-ssRNA negative-sense single-stranded RNA A adenine AMV Alfalfa ASVd Avocado sunblotch viroid ATF aphid transmission factor BLAST Basic Local Alignment Search Tool BLRV Bean leaf roll virus BYMV Barley yellow mosaic virus C cytosine cDNA complementary DNA CEVd citrus exocortis viroid CFIA Canadian Food Inspection Agency CMV Cucumber mosaic virus contig contiguous sequence COX cytochrome oxidase CP coat protein CSVd Chrysanthemum stunt viroid CT cycle threshold DAS-ELISA double-antibody sandwich ELISA ddH2O distilled deionized water ddNTP dideoxynucleoside triphosphate DNA deoxyribonucleic acid DNA pol DNA polymerase dNTP deoxynucleotide triphosphate dsDNA double-stranded DNA DTT dithiothreitol ELISA enzyme-linked immunosorbent assay EM electron microscopy EPPO European Plant protection Organization ERSV Echinocloa ragged stunt virus G guanine HC-Pro helper component protease HpMV Hop mosaic virus KPO4 potassium phosphate LNYV Lettuce necrotic yellows virus miRNA micro RNA M-MLV RT Maloney-murine leukemia virus reverse transcriptase MgCl2 magnesium chloride MP movement protein MPVd Mexican papita viroid mRNA messenger RNA MRDV Maize rough dwarf virus NaOH sodium hydroxide

xiii ncRNA non-coding RNA NGS next-generation sequencing NLV latent virus ORF open reading frame OSDV Oat sterile dwarf virus PAMV Potato aucuba mosaic virus PAR photosynthetically active radiation PCR polymerase chain reaction PepMV Pepino mosaic virus PLRV Potato leaf roll virus PMTV Potato mop-top virus PPEQ Potato Post-Entry Quarantine pre-miRNA pre-micro RNA pri-miRNA primary micro RNA PTGS post-transcriptional gene silencing PTNRD potato tuber necrotic ringspot disease PotLV Potato latent virus PPV Plum pox virus PSTVd Potato spindle tuber viroid PVM Potato virus M PVS Potato virus S PVV Potato virus V PVX Potato virus X PVY Potato virus Y Q Phred quality score QC quality control qPCR quantitative PCR R-PAGE reverse polyacrylamide gel electrophoresis RBSDV Rice black-streaked dwarf virus RdRp RNA-dependent RNA polymerase RDV Rice dwarf virus RIN RNA integrity number RISC RNA induced silencing complex RMV Ryegrass mosaic virus RNA ribonucleic acid rRNA ribosomal RNA RNAi RNA interference RNA pol RNA polymerase RT-PCR reverse transcriptase PCR satRNA RNA SBS sequence-by-synthesis SDV Soybean dwarf virus SOLiD sequencing by oligonucleotide ligation and detection sgRNA subgenomic RNA SPMMV Sweet potato mild mottle virus ssDNA single-stranded DNA

xiv

STV Southern tomato virus T thymine Taq DNA polymerase Thermus aquaticus DNA polymerase TAS-ELIZA triple-antibody sandwich ELISA TBE tris-borate ethyl ethylenediaminetetraacetic acid TCSVd Tomato chlorotic stunt viroid TEM transmission electron microscopy TEV Tobacco etch virus TGB triple gene block TICV Tomato infectious chlorosis virus ToCV Tomato chlorosis virus TPS true potato seed TRV Tobacco rattle virus TSWV Tomato spotted wilt virus VAP virion associated protein VIGS virus-induced gene silencing VILO variable input linear output WTV Wound tuber virus YLV Yam latent virus

xv

CHAPTER 1

GENERAL INTRODUCTION AND LITERATURE REVIEW

1

1.0 GENERAL INTRODUCTION

1.1 Historical Perspective

The family Solanaceae consists of approximately 3000-4000 species within approximately 90 genera. The most commonly cultivated are potato (Solanum tuberosum L.), tomato (S. lycopersicum L.), eggplant (S. melongena L.), and pepper (Capsicum spp.) (Machida-

Hirano, 2015). There are several related taxonomic groups within S. tuberosum that were domesticated in the Andean region of South America. Following the exchange of people, plants, animals, and culture after the visit by Columbus in 1492, the potato was introduced to Europe and shortly thereafter became a staple part of the diet for many in the region. By the late eighteenth century, it was recognized as a famine food which reliably fed the population when other foods and crops failed. There are a number of unique traits that make the potato suitable for growth in Europe such as the relative size of the tuber and its ability to form sprouts even in harsh conditions (Brown, 1993). Some of the early domesticated Solanum species did not form tubers due to the shorter growth periods in South America. However, S. tuberosum tuberosum eventually adapted to long-day conditions in Europe and North America (Hirsch et al., 2013).

During the eighteenth and nineteenth centuries, particularly in Ireland, potatoes were grown in poor soils but still had good yields. For an impoverished country at the time, potatoes provided the Irish a crop that could be grown completely on their own with little work while providing a high nutritional advantage and its ability to be traded for other goods. This is thought to have been a huge factor in the population expanse in Ireland from 1.5 million to 9 million between

1790 and 1845 (Brown, 1993). The importance of the potato crop for Ireland was further highlighted by its destruction due the Phytopthora infestans in the mid nineteenth century

2 resulting in famine and population displacement. At the time of this writing, P. infestans remains the most destructive disease of potato worldwide (Cooke et al., 2012).

1.2 Genome Characteristics of Potato

Almost all potato cultivars are heterozygous autotetraploids that can be vegetatively propagated as a form of asexual reproduction for both the maintenance of germplasm or for agricultural purposes. Potatoes also sexually reproduce by the generation of a blossom that has evolved to provide haploid gametophytes that cross with other haploid gametophytes via pollination. Upon fertilization, the blossom forms botanical seeds known as true potato seed

(TPS) resembling a tomato (Machida-Hirano, 2015). With a base chromosome number of n = 12, cultivated potato may be diploid (24 chromosomes), triploid (36 chromosomes), tetraploid (48 chromosomes), or pentaploid (60 chromosomes). As recently as 2007, it has been suggested that

S. tuberosum should be separated into 2 groups called the Andigenum (upland Andean genotypes

– diploids, triploids, tetraploids) and Chilotanum (lowland Chilean landraces – tetraploids)

(Machida-Hirano, 2015). Expressing 10046 genes, 4479 are common genes shared between monocots, eudicots, and algae (Potato Genome Sequencing Consortium, 2011).

1.2.1 Potato Breeding

Potato is the third most important crop in the world behind wheat and rice. This is due to its adaptability, yield potential, and nutritional advantage (Hirsch et al., 2013). Cultivated potato is now bred to meet a wide range of market classes. Prior to 1950, there were only white and red- skinned round potatoes. However, after 1950, the market for potatoes expanded to meet the needs of frozen French fry processing which required extended tubers and longer dormancy

3 periods. The potato chip processing industry requires tubers with high starch content but lower reducing sugar content. Culinary practices have also evolved during this time for home and restaurant baking requiring pigmented and odd-shaped varieties. The evolution of the potato market has therefore increased the demand for new varieties that have a particular yield or performance. New cultivars are required to improve performance under particular biotic and abiotic stressors to keep up with the global demand and the impact of climate change. New cultivars can also give economic benefits through a product that has greater yield with less cost of production. New cultivars may also have built-in resistances to pathogens that give environmental benefits such as a reduction of pesticide application (Hirsch et al., 2013).

Potato stands out from other crops because of its immense genetic diversity. Primitive forms of cultivated potato along with their wild relatives provide a deep pool of genetic variation in terms of potato breeding. Adaptation to a broad range of habitats influenced by latitude and altitude as well as the changing habitat from Canada to Argentina provides diverse morphological traits such as leaf shape, size, tuber shape, and stolon length (Machida-Hirano,

2015). With the complete potato genome being sequenced in 2011 (Potato Genome Sequencing

Consortium, 2011), it is now possible to identify genomic regions, genes, or alleles that contribute to particular agronomic traits of interest even with a narrow genetic base. By doing this, it is possible to select these molecular markers for breeding purposes. New potato lines can now be developed through marker-assisted crossing to adapt to specific market classes (Hirsch et al., 2013).

4

1.3 Potato Industry in Canada

In Canada, potatoes are grown in every province. Production is distributed to Atlantic

Canada (39.9%), Central Canada (22.2%), and Western Canada (37.9%). Approximately 65% of

Canadian-grown potatoes are used for processed products including French fries, potato chips, and flakes. Table consumption (21%) and seed production (14%) make up the remainder. The top three export countries for Canadian potatoes are the United States, Indonesia, and Thailand

(Statistics Canada, 2018). There are approximately 1000 potato farms operating in Canada and approximately 140000 total hectares farmed in 2016. Because soil and climatic conditions in

Canada combined with a long growing season provides optimal conditions for productivity, this leads to Canada having a trade surplus of approximately $1.2 billion and has contributed $1.6 billion to the Canadian economy annually (Potato Market Information Review, 2015-2016).

1.4 Plant Diagnostics and Disease Management

In agriculture, crop losses due to emerging plant diseases are of great concern particularly in developing countries due to their grave economic impact. Many plant pathogens have worldwide distribution while some are endemic to particular regions of the world (Kreuze et al.,

2009). Globalization of agriculture means that crop plants with a narrow genetic base are grown in locations far from their centres of origin. This leads to a lack of resistance to already established pathogens (Strange and Scott, 2005). Plant viruses alone make up a considerable amount of these losses. There are more than 700 known plant viruses that cause devastating losses across a variety of host ranges (Kaur et al., 2016; Strange and Scott, 2005). However, the impacts of primary infection are not the only impacts of concern. If the crop is vegetatively propagated, the virus can be transmitted through the tubers to successive vegetative generations.

5

Losses can be the result of reduced yield, tuber degeneration, or the decertification of seed lots.

The cost to maintain a healthy crop is also very expensive to the grower in the form of labour

(inspection, diagnostic testing) and vector control. Throughout time, the cost of vector control is increasing due to insecticide resistance, the human and environmental concern regarding the safe application of insecticide, and climate change leading to overall changes in the vector population

(Solomon-Blackburn and Barker, 2001).

1.4.1 RNA Interference (RNAi) and Symptom Development

Advances in RNA sequencing has revealed that there are various non-coding RNAs

(ncRNA) that still play a role in modulating gene expression and host defense mechanisms within plants. These ncRNAs include all of the known RNA species, but do not encode any protein. ncRNA are crucial for viruses as they play a role in both host defense but also promote their survival within the host (Sharma and Singh, 2016).

Micro RNAs (miRNAs) are a small RNA species approximately 22 nt in length which is one of the major classes of RNA species that regulate gene expression post-transcriptionally

(Sharma and Singh, 2016). Found in every living species, they function through imperfect base- pairing with complementary or semi-complementary sequences of target host genes. miRNAs are encoded in the host genome and are transcribed in the nucleus by RNA polymerase (RNApol).

This results in a stem-looped structure of intermediate dsRNA species called primary miRNAs (pri-miRNAs) which is cleaved by an enzyme known as Drosha to a size of 60-

70 nt (pre-miRNA) and transported out of the nucleus. After export to the nucleus, an enzyme known as Dicer cleaves the stem of the looped structure to generate a miRNA-miRNA duplex approximately 22 nt in length. Upon degradation of one of these strands, the remaining strand

6 becomes a guide strand (mature miRNA) that becomes associated with another protein known as

Argonaute. This is known as the RNA-induced silencing complex (RISC) which spares the miRNA from degradation. The mature miRNA is then free to bind with complementary or semi- complementary RNA species (including mRNA) leading to transcriptional repression of host mRNA or targeted mRNA cleavage in a process indistinguishable from RNAi called post- transcriptional gene silencing (PTGS) (Sharma and Singh, 2016).

Viral infection modulates the abundance and distribution of miRNAs within the host.

Many plant viruses encode proteins that are suppressors of this RNA silencing process. These suppressor proteins are not produced until after infection and subsequent replication in the host.

These proteins also regulate the final quantity of virus accumulation. Virus-induced gene silencing (VIGS) accounts for two ways in which this could occur. The suppressor proteins could bind to proteins involved in regulatory gene-expression such as Dicer, Drosha, or Argonaute and target them for degradation leading to an enhancement or repression in host gene expression.

Similarly, RNA sequences from the pathogen with partial complementarity can bind to the RISC and sequester host mRNA making the host incapable of moderating gene expression. Both of these mechanisms reflect losses in the generation of functional protein and changes in host physiology (Lu et al., 2003; Senshu et al., 2011; Sharma and Singh, 2016). Such physiological alterations include decreases in photosynthesis, increases in respiration, accumulation of nitrogen compounds, or increases in oxidase activity (Culver and Padmanabhan, 2007). The process of

RNAi can however be used as a host defense mechanism where complementary viral sequences integrated into the host genome can be involved during RISC formation to provide host defense from an invading viral pathogen (Lu et al., 2003; Senshu et al., 2011).

7

Numerous simultaneous infections with distinct plant viruses have been observed in nature with differing biological and epidemiological implications. In many cases, a mixed infection causes disease symptoms much more severe than with single infection - a phenomenon known as viral synergism. Virus-virus interaction within mixed infections has resulted in suppression of host defense mechanisms by one or both viruses. Viral suppressor proteins have also been shown to alter tissue movement patterns of other viruses in mixed infections (Garcia-

Marcos et al., 2009; Gonzalez-Jara et al., 2004). However, in cases of mixed infection, often involving a , accumulation level of the potyvirus is unaltered and the accumulation of the non-potyvirus increases significantly (Gonzalez-Jara et al., 2004; Pruss et al., 1997). Pruss et al., (1997) determined that the potyviral P1 protein and HC-Pro (helper component protease) acts as a pathogenicity enhancer of other viruses, thus by acting as a , by suppressing

PTGS paving the way for further infection.

1.4.2 Diagnostics of Plant Viruses and Viroids

Detecting and identifying new and known viruses currently relies on a large range of techniques, both traditional and modern. Currently, serological tests such as enzyme-linked immunosorbent assay (ELISA) and molecular tests such as polymerase chain reaction (PCR) are the most commonly used. If pathogens are not detected, other techniques could be applied such as electron microscopy, plant bioassay, or PCR using degenerate primers. Recently, microarrays have been developed allowing for a large number of targets to be tested for simultaneously.

However, these techniques have their downsides particularly when trying to identify unknown or uncharacterized agents. Given the heterogeneity of a viral population, reagents in these assays have finite specificity covering a particular strain, individual species, or small group of

8 pathogens. These reagents are often incapable of detecting new strains or variants that can evade detection. Many viruses are not readily transmissible to indicator hosts for bioassay or can be undetected under microscopy (Adams et al., 2009a). There is also a significant length of time required for the identification of a new disease which can have impacts on the agricultural industry while waiting for means to control an outbreak.

1.4.3 Seed Potato Certification

The source of seed from many countries is acquired via clonal selection where field grown tubers are used to create a new line of seed potatoes. This is known as “nuclear stock”

(disease tested micro-plantlets grown in vitro). Canada has its own seed certification program with similar aims to those of other countries; the maintenance of clonal selection. Historically, this relied almost entirely on visual inspections which gradually transitioned to validated laboratory assays. These inspection records are then maintained in a database that recorded all observations relevant to pathogen incidence. Long-term monitoring of data sets also contains financial data in order to understand long-term pathogen trends. The long-term collection of data also allows for the determination of patterns of disease incidence that can be correlated to farm hygiene practices, environmental variability, or arrival of invasive species over time (Frost et al.,

2013). In Canada, the seed potato certification program falls under the legislative authority of

The Seeds Act. Any grower intending to use nuclear stock class equivalent to produce pre-elite class seed potatoes must comply with the specific requirements of the program. Growers are responsible for administering the selection, handling, and maintenance of nuclear stock clones.

They are also responsible for retaining documentation relating to the seed source, clone selection, and clone identification to demonstrate to inspectors that the identity of the selected

9 material has been maintained. Post-harvest testing is mandatory to ensure the elimination of plants affected by secondary infection; the costs of which are at the expense of the grower.

1.4.4 Potato Post-Entry Quarantine (PPEQ) Program

Potato lines that have specific agronomic traits are generally acquired from outside of

Canada. However, the direct import of tuber producing species of Solanum spp. to Canada is prohibited from all regions of the world including some regions of the United States. The importation of plant cultivars from regions outside of North America poses the potential risk of the introduction of non-indigenous virus disease of potato into the Canadian Potato Program.

This is particularly true of cultivars imported from South America where they have evolved for centuries. The Canadian Food Inspection Agency (CFIA) has developed a phytosanitary program known as the Potato Post-Entry Quarantine (PPEQ) Program under the Plant Protection Act. The program standards are similar to those developed by European Plant Protection Organization

(EPPO) (EPPO Standard PM 8(1) - Commodity Standard for Potato). The program is designed to prevent the entry and spread of quarantine pests of potato into Canada. Not only does the spread of a quarantine pest have economic importance, but could reduce international market access.

The PPEQ program involves the importation of seed or in vitro cultivars via an import permit. The candidate material is grown under quarantine conditions (greenhouse, biocontainment level 2), tested for specific pathogens using specified methods, and inspection of disease symptoms. Testing material must be grown and maintained using procedures that minimize the risk of contamination. Infected material should not be released from quarantine while material that passes diagnostic tests can be released to the importer.

10

There are a number of diagnostic methods employed by the PPEQ program. One such method includes bioassay. This involves the mechanical inoculation of herbaceous indicator species that can harbour plant virus and induce phenotypic symptoms similar to those within the host using ground sap from the imported accession. There are 19 indicator species used within the PPEQ program; such species include Nicotiana spp., Datura metel, Gomphrena globosa,

Chenopodium spp., Solanum lycopersicum, Capsicum annum, Nicandra physaloides, Phaseolus vulgaris, Cucumis sativus, Physalus spp., and Petunia hybridae. These indicator species are monitored for disease symptoms over a period of weeks. Serological assays such as double or triple antibody sandwich ELISA (DAS/TAS-ELISA) for the detection of known pathogens with available antibodies are employed in the program. Morphological examination using light microscopy or transmission electron microscopy (TEM) may be employed to reveal pathogens in test samples. Most frequently, molecular technologies have been employed in the PPEQ program to detect and identify specific nucleic acid sequences of pathogens. These molecular methods include PCR, reverse transcriptase PCR (RT-PCR), quantitative PCR (qPCR), and reverse transcriptase qPCR (RT-qPCR). Quantitative assays using probes for virus-specific sequences are now frequently used because of their added sensitivity and specificity. However, the number of validated molecular and serological assays utilized within the PPEQ program have limited specificity. For example, the program has antibodies designed for the specific serological detection of eight plant viruses and validated RT-PCR assays for the detection of six viruses. The program also utilizes reverse-polyacrylamide gel electrophoresis (R-PAGE) for the non-specific detection of viroids which is both a time and labour intensive assay. The finite specificity of these assays make the detection of other viruses and new strains of pre-existing viruses difficult.

11

Given these drawbacks, the PPEQ program requires a novel approach to improve the identification of new or unusual plant pathogens. Metagenomics provides a promising approach.

1.5 Next-Generation Sequencing (NGS)

1.5.1 Metagenomics

Metagenomics is an approach to the study of microbial populations in a sample by analyzing the nucleotide sequence content of an entire sample. Metagenomic approaches to diagnostic plant virology offers the possibility of overcoming the problems associated with pathogen identification as sequences produced from an infected sample will include sequences from any pathogens present (Adams et al., 2009a). Metagenomics has led to the development of meta-transcriptomics allowing for the quantitative measurement of gene expression. Complete sets of RNA transcripts from the genome at any given time can be examined allowing researchers to determine how transcript patterns are affected by development, disease, or other environmental factors (Lowe et al., 2017). Next-generation sequencing (NGS) meta- transcriptomic technology has also hastened the advancement for the detection of RNA viruses.

Different sequencing platforms (and ultimately different sample preparation methods) has utilized different NGS chemistries for the de novo detection of plant viruses and viroids from various plant tissue samples such as in studies by Adams et al., (2009a), Candresse et al., (2014),

Chiumenti et al., (2014), Di Serio et al., (2009), Elbeaino et al., (2014), Hagen et al., (2011), and

Kehoe et al., (2014).

12

1.5.2 Development of NGS Technologies

In 1977, Frederick Sanger developed DNA sequencing with a chain-terminating inhibitor

(Sanger et al., 1977). This method involved purifying DNA, denaturing, and then performing second strand synthesis using a priming sequence of known length. Nucleotide additions upstream from the priming sequence were determined in four separate reactions using a mixture of deoxy-nucleotide tri-phosphates (dNTPs) and one of the four possible dideoxy-nucleotide tri- phosphates (ddNTPs). The ddNTPs were missing a hydroxyl group at the third carbon atom of the sugar moiety. This missing hydroxyl group causes random, non-reversible termination of the extension reaction creating double-stranded deoxyribonucleic acid (dsDNA) strands of different lengths. The resulting DNA fragments were then sorted by molecular weight after separation on an acrylamide gel (Kircher and Kelso, 2010; Sanger et al., 1977). In 1986, an automated DNA sequencing machine was developed based on the Sanger chain-terminating inhibitor procedure using labeled ddNTPs and sorting by molecular weight was done via capillary electrophoresis.

The fluorescent signal emitted by the ddNTPs was then read by a sensitive detector. The following year, Applied Biosystems (USA) improved this technique to the point that it was commercially available. Year after year there have been constant improvements in the technology which have resulted in major research accomplishments such as the completion of the human genome sequence (Barba and Hadidi, 2015). This sequencing technology became known as “first-generation” technology.

In the early 2000’s, newer and more modern sequencing methods were developed to replace the Sanger method. These newer methods became known as “next-generation”. NGS became a game-changer for many branches of the biological field such as pathology, physiology, and oncology. The biggest advance that NGS technology provided research groups was the

13 ability to produce very large quantities of data in a single instrument run very accurately and inexpensively. The current cost using NGS methods per megabase of DNA sequence is approximately $0.07USD, and about $7500USD per genome. There has been a gradual decrease in cost over time per megabase of sequence since the development of NGS technology outpacing

Moore’s Law (Barba and Hadidi, 2015).

Today, there are numerous NGS platforms that are commercially available that all rely on different chemistries to generate sequence data. Such sequencing platforms include 454 pyrosequencing from Roche (USA), sequence-by-synthesis (SBS) chemistry from Illumina

(USA), SOLiD (sequencing by oligonucleotide ligation and detection) chemistry from Life

Technologies (USA), and Ion Torrent sequencing (measuring pH changes with the addition of each nucleotide for detection) from Life Technologies (USA) (Barba and Hadidi, 2015). The particular sequencing platform chosen depends on numerous factors such as the expected size of the genome that is studied, its complexity, and the depth of coverage and accuracy required

(Barba et al., 2014). Illumina sequencing platforms currently feature the longest read length, biggest output, and lowest sequencing cost for commercial benchtop sequencing.

1.5.3 RNA Input

Transcriptomic methods require ribonucleic acid (RNA) to be isolated from the host in a method that ensures its integrity. Extraction techniques for RNA are diverse but largely rely on the disruption of cells or tissues, treatment of the sample with a chaotropic salt, disruption of macromolecules such as proteins, and the separation of RNA by precipitation (Lowe et al.,

2017). However, RNA is rapidly degraded in the presence of rather ubiquitous RNase enzymes resulting in shorter fragments that can compromise downstream results. The degree of this

14 degradation must be determined prior to library generation to unsure the highest quality of sequence data. In the past, quality was determined via electrophoretic methods yielding specific banding patterns. Gel images would show two distinct bands representing the 28 S and 18 S ribosomal RNA (rRNA) subunits (25 S and 18 S in plants) with a series of other bands representing smaller RNA species. RNA is considered of high quality when the ratio of the rRNA subunits is 2:1. However, this process can be highly flawed because of the human element of interpretation and may not be comparable from one individual to another (Schroeder et al.,

2006). There has since been a move to a more automated approach. There are several commercial pieces of equipment that determines RNA quality such as the BioRad Experion

(BioRad, USA) and the Agilent BioAnalyzer (Agilent, USA). The Agilent system has been the most widely used technique for the analysis of RNA. Small amounts of the total RNA sample are separated by capillary channels in a microfabricated chip according to their molecular weight and detected via laser-induced fluorescence. The result is an electrophoretogram where the measure of fluorescence correlates to the concentration of RNA in the sample (Schroeder et al.,

2006). Quantitation of the ratio of the rRNA bands allows for a RIN (RNA integrity number) score to be assigned to the sample to indicate the quality of the RNA. The RIN scale ranges from

0-10; with 0 being lowest quality of RNA and 10 being high quality.

1.5.4 Library Preparation

RNA extracts can typically range from 80-90% rRNA (Lowe et al., 2017). Enrichment of transcripts and other RNA species must be performed to reduce the quantity of redundant sequences that can be generated. This can be performed via poly-A binding methods or by the binding of rRNA sequence specific probes for sequestration (Lowe et al., 2017; Zhong et al.,

15

2011). After rRNA has been depleted from the sample, the remaining RNA is converted to cDNA using a reverse transcriptase. The resulting cDNA is then bead purified to remove any dNTPs, reverse transcriptase, or random primers that were not involved in the conversion.

To each cDNA molecule, a single adenine (A) nucleotide is added to the 3’ and of the blunt fragments. Sequence-specific adapters containing a single thymine (T) residue allowing for binding to the cDNA via the complementary overhang are then ligated to each sample individually (different adapter for each sample). While the sequence of each adapter is approximately the same, each adapter contains a unique index sequence which allows the sequencing software to identify the specific adapter that bound to that unique sample. The 5’ and

3’ ends of each adapter are complementary to the single-stranded oligonucleotide lawn on the flow cell allowing for the complementary binding of single-stranded DNA (ssDNA) after the library has been fully prepared. Upon another bead purification to remove excess adapter and A residues, the resultant cDNA with adapter sequence bound to its 3’ ends are then enriched via

12-15 cycles of PCR to create a final cDNA library. The libraries are then purified to eliminate any remaining reagents used during PCR and quantitated. Post-quantitation of each library, they are then pooled together to a single concentration which can then be denatured to ssDNA and ready for attachment to the oligo lawn on the flow cell and sequenced.

1.5.5 Sequence-by-Synthesis (SBS)

Illumina sequencing chemistry utilizes SBS chemistry to generate sequence reads. The flow cell is an optically transparent slide with eight lanes to which the surfaces are bound with two types of oligonucleotide anchors complementary sequences on the 5’ and 3’ ends of the adapter on each cDNA library. These DNA templates are amplified on the flow cell through

16

“bridge” amplification (Voelkerding et al., 2009). During “bridge” amplification, a DNA polymerase makes a complementary strand of the hybridized fragment which is a double- stranded molecule that is then denatured, and the original cleaved away. The hybridized fragment folds over and binds to the second type of oligonucleotide on the flow cell. DNA polymerase (DNApol) then makes a complementary strand to make a double-stranded bridge.

After this dsDNA bridge is denatured, the result is two ssDNA molecules now tethered to the flow cell. The process is repeated several times to generate millions of clusters. This results in clonal amplification of all fragments. After amplification, all reverse strands are cleaved off of the flow cell leaving the forward strands. The first read is generated via the binding of a short sequencing primer where fluorescently tagged dNTPs are added to the growing strand. After each cycle of dNTP addition, the emitted fluorescent signal is detected allowing for a determination of the base call. This allows hundreds of millions of strands to be read simultaneously. After the first read, the read product is cleaved and washed away. The ssDNA molecules can now fold over and bind to its complementary oligonucleotide sequence on the flow cell. DNApol extends the new strand to form another double-stranded bridge. The dsDNA is then linearized and the original forward strand is cleaved leaving only the reverse strand attached to the flow cell. The reverse strand is then sequenced in a similar manner as the forward strand by the introduction of a sequencing primer complementary to a short sequence on the adapter and addition of fluorescently tagged dNTPs.

Millions of reads have now been generated that represent all of the fragments that have been clonally amplified. Sequences from pooled sample libraries are separated by their unique index reads introduced during cDNA library preparation. Reads with similar stretches of sequence are locally clustered to create longer contiguous (contig) sequences. These contigs can

17 then be aligned back to the reference genome. From a pathogen detection perspective, it may be the sequences that do not align back to the reference genome that are of the most importance.

1.5.6 Bioinformatic Pipelines

NGS produces unprecedented volumes of sequence data which present challenges in data management, storage, and analysis. These large volumes of data require analytical pipelines that assess sequence quality, trim unwanted sections of reads, assemble contigs, map to a reference genome, and identify polymorphisms (Voelkerding et al., 2009). Quality assessment helps in choosing the correct processing parameters as they have a negative impact on assembly. The

Phred quality score (Q) developed for Sanger sequencing reflects mathematical base-calling error probabilities (Blawid et al., 2017). NGS error rates depend on many factors including signal-to-noise levels or cross-talk from nearby clusters (caused by over-clustering). Miscall probabilities of 0.1 (10%), 0.01 (1%), and 0.001 (0.1%) yield Q scores of 10, 20, and 30 respectively (Voelkerding et al., 2009). Q scores lower than 20 produce poor quality reads. The existence of low quality base calls may be detrimental for NGS analysis as it adds unreliable and potentially random sequences to the dataset (Del Fabbro et al., 2013).

Sequence trimming aims at removing low quality portions of sequence while preserving the longest and high quality part of a NGS read. Trimming is a broadly adopted practice in bioinformatic pipelines. Read trimming highly affects the usage of computational resources along with associated cost (Del Fabbro et al., 2013). One of the main purposes of trimming is to remove known adapter sequences originating from cDNA library preparation. Trimming also scans the 5’ ends of each read and cleaves off sequence once the quality per base drops below a given point. Over-trimming can lead to losses of information, but retaining contaminants and

18 low-quality base reads can interfere with downstream analysis such as de novo assembly (Blawid et al., 2017; Chikhi and Medvedev, 2014). There are a number of different trimming software and algorithms that are available. By eliminating reads that have gaps or are very short leading to a decrease in the number of reads overall, the probability of the surviving dataset being capable of correctly aligning to the reference genome increases.

After quality analysis and trimming of undesired sequence, contigs can now be assembled (de novo assembly of contigs). There are also several software packages and algorithms that are available for assembly. The primary algorithm used is based on de Bruijn graphs as they are the most computationally efficient. These algorithms hash reads into substrings of a fixed length k, or k-mers, representing nodes in the graph. In practice, the value of k is often based on previous experience with similar datasets. Two nodes are linked with an edge if they share one k-mer and overlaps between reads are then represented as shared k-mers

(Blawid et al., 2017). This process repeats until all nodes are linked or there is no possible further linkage to occur. In addition to de novo assembly, there can also be reference-guided assembly which can be used to detect genomic variants or other genomes (such as pathogen) within the library. Similar to de novo assembly, there are a number of different software and algorithms available for this purpose (Blawid et al., 2017).

1.6 Characteristics of Plant Viruses and Common Plant Virus Families

Plant viruses have a protein shell known as the coat protein (CP) or capsid that can have a variety of different symmetries. These include helical, rigid rods, and flexous filaments that have their genetic material highly ordered or linear; and icosahedral or bacilliform (Ivanov and

Makinen, 2012). Encased within the CP, the majority of plant viruses are composed of a single-

19 stranded RNA genome with the virus genome in the same orientation (positive sense, (+)- ssRNA). Others possess single-stranded RNA with an antisense orientation (negative sense, (-)- ssRNA). Fewer have dsDNA, double-stranded RNA (dsRNA), or single-stranded DNA (ssDNA) based genomes (Kaur et al., 2016). Another common genome expression strategy used by (+)- ssRNA is the synthesis of subgenomic RNAs (sgRNAs). This particular assembly allows for the most efficient utilization of genetic material by the exploitation post-translational processing prior to transcription of 3’-proximal genes. sgRNAs often code for genes necessary in pathogenesis and particle formation (Koev and Miller, 2000). Some plant RNA viruses have what is known as satellite RNA (satRNA) associated with them having no sequence homology to the host or the virus itself. satRNAs are completely dependent on a helper virus for their replication and can modify disease expression in the host (Francki, 1985; Hu et al., 2009).

1.6.1 Transmission

The majority of plant viruses are transmitted non-persistently via arthropods, nematodes, or fungi. However, there are persistent modes of transmission through contaminated seed, pollen, or plant-to-plant contact. Aphids (), thrips, and whiteflies are the most common arthropod vectors. Plant viruses can be classified in terms of their movement throughout the vector as circulative or non-circulative depending upon whether the virus particle circulates throughout the body of the vector. Non-circulative viruses tend to be semi or non-persistent meaning they are retained for seconds to minutes such as the Criniviridae and Potyviridae families. These are acquired as they probe into the host plant during feeding and remain associated with the insect mouthparts. Conversely, circulative viruses are usually persistent such as the and Tospoviridae families. These virus particles are ingested from the host

20 and are transmitted through the haemocoel and eventually to the salivary glands where they are transmitted to a new plant during feeding (Kaur et al., 2016).

During circulative transmission in whiteflies, a third-partner bacterial-insect symbiont is required. This bacteria produces GroEL (symbiotic protein) that binds to the viral CP allowing it to pass through the haemocoel. By contrast, non-persistent viruses remain in the stylet of an aphid via a virion associated protein (VAP) coating the viral CP with its C-terminus anchored to the CP and N-terminus facing outward. An aphid transmission factor (ATF) acquired by the aphid independently from the virus binds to the walls of the stylet forming a protein-protein interaction with VAP anchoring the virus particle to the surface of the stylet. VAP is non- discriminatory and can bind to multiple CP meaning that multiple viruses can be bound within a single vector (Hohn, 2007).

1.6.2 Viruses of Potyviridae

The Potyviridae cause significant losses in agricultural, horticultural, and ornamental crops worldwide. Viruses belonging to the family Potyviridae have (+)-ssRNA genomes approximately 8500-12000 nt in length with a poly (A) tail at the 3’-terminus. The genome is encapsidated in flexous filamentous particles approximately 680-900 nm in length and is expressed as a large polyprotein which is post-translationally modified to ten individual proteins

(Adams et al., 2005a; Chen et al., 2001). Genome organization is highly conserved throughout the family. The largest of the six genera within Potyviridae is Potyvirus containing over 100 species, all of which are transmitted by aphids (Adams et al., 2005b). Macularavirus are also transmitted by aphids. The other four genera are classed based on their transmission by fungi

(), whiteflies (), and mites (, ) (Chen et al., 2001).

21

Potato virus Y (PVY) (genus Potyvirus) is an emerging pathogen of potato seriously affecting tuber quality and yield. PVY exists as a complex of strains that are distinguished from each other based on the phenotypic symptoms induced on the host as well as interactions with host resistance genes (Singh et al., 2008; Karasev et al., 2011). The strain groups PVYO

(ordinary strain), PVYN (necrotic strain), and PVYC are well established worldwide. PVYC induces top necrosis leaving the bottom of the plant unaffected (Singh et al., 2008). PVYO induces systemic mottle and mosaic symptoms in most potato and tobacco cultivars leading to a reduction in yield. PVYN causes systemic veinal necrosis in tobacco, but foliar symptoms in potatoes is mild (Lorenzen et al., 2006).

The major strains PVYO and PVYN are subject to genomic recombination in when both are present in the host. These recombinants have segments of both PVYO and PVYN spliced into their genomes (Lorenzen et al., 2006) leading to the generation of new strains with different symptomology depending on where genomic recombination occurs. Some isolates contain three to four recombination points while others have only one or two recombinant points. One such strain with three to four recombination points has PVYN serological characteristics but with distinct symptomology from PVYN and is termed PVYNTN. This additional strain causes external and internal rings, arcs, and discolorations on tubers. This is termed potato tuber necrotic ringspot disease (PTNRD) (Karasev et al., 2011; Lorenzen et al., 2006). An additional tuber necrotic type has appeared in North America and is known as NA-PVYNTN. This strain has been reported to be the result of a mutation and not recombination. It is different from European

PVYN and PVYNTN from other North American strains, and is often referred to as PVYNA-N/NTN

(Lorenzen et al., 2006). Other PVY recombinants include PVYN-Wi and PVYN:O. These strains have two and one recombination junctions respectively. These two strains cause veinal necrosis

22 in tobacco leaves, but mild foliar symptoms in potato leaves similar to PVYN, but are of PVYO serotype (Karasev et al., 2011; Lorenzen et al., 2006).

Other members of the family Potyviridae include Potato virus A (PVA - genus

Potyvirus), Potato virus V (PVV – genus Potyvirus), Plum pox virus (PPV – genus Potyvirus),

Tobacco etch virus (TEV – genus Potyvirus), Ryegrass mosaic virus (RMV – genus Rymovirus),

Narcissus latent virus (NLV – genus Macluravirus), Sweet potato mild mottle virus (SPMMV – genus Ipomovirus), and Barley yellow mosaic virus (BYMV – genus Bymovirus) that infect a very broad range of other hosts including tobacco (Tian et al., 2011), soybean (Hajimorad et al.,

2017), tomato (Aramburu et al., 2006), cassava (Amuge et al., 2017), and pepper (Janzac et al.,

2008). As a whole, each of these has worldwide distribution, particularly in areas in which these crops are primarily grown. have been reported to infect over 500 plant species among 60 plant families (Janzac et al., 2008).

1.6.3 Viruses of Alphaflexiviridae

Members of Alphaflexiviridae are characterized by flexous filamentous virions between

470-580nm in length. Surrounded by a single CP, viruses of this family have (+)-ssRNA genomes consisting of five open reading frames (ORF) and a poly (A) tail at the 3’-terminus approximately 7000 nt in length. The first ORF consists of a viral replicase or RNA-dependent

RNA polymerase (RdRp) responsible for viral transcription and genome replication (Lozano et al., 2017; Ng et al., 2008; Verchot-Lubicz et al., 2007). The central region of the genome encodes three overlapping ORFs known as the triple-gene block (TGB). The proteins coded in this region are responsible for movement of the virus between host cells. The fifth ORF codes the CP required for virion assembly as well as cell-to-cell movement. Some genera within the

23 family have a sixth ORF encoding a zinc-finger binding motif to bind to other nucleic acids

(Verchot-Lubicz et al., 2007).

Potexvirus is the largest genus within this family with at least 25 known species, many of which are of significant agricultural importance. The common infecting potato include Potato virus X (PVX), Potato aucuba mosaic virus (PAMV), and Pepino mosaic virus

(PepMV) (Verchot-Lubicz et al., 2007). PVX is often used a tool to study VIGS. Recombinant

PVX isolates have determined that viruses can induce the silencing of host genes resulting in a degradation of host mRNA influencing the expression of host genes without altering host DNA

(Gonzalez-Jara et al., 2004; Verchot-Lubicz et al., 2007). PAMV is an uncommon pathogen, but does have worldwide distribution but cannot be transmitted by aphids as do other potexviruses unless PVY is present. PVY HC-Pro (helper component proteinase), a non-structural protein involved in a variety of functions such as viral infection, aphid transmission, polyprotein processing, and suppression of host RNA silencing is required for non-persistent transmission of

PAMV via aphids (Baulcombe et al., 1993; Valli et al., 2014; Xu et al., 1994). PepMV was first isolated from pepino crops (Solanum muricatum, Capsicum annum) but has recently spread to tomato crops. PepMV can spread very rapidly and causes symptoms on both leaves and fruit, thus having an impact on fruit quality (Mumford and Metcalfe, 2001; van der Vlugt et al., 2000).

1.6.4 Viruses of Betaflexiviridae

Members of the viral family Betaflexiviridae have (+)-ssRNA genomes ranging from

6500-9000nt in size containing five ORFs and a poly (A) tail at the 3’ end of the genome. Their coat proteins are arranged as flexuous filaments. ORF1 contains the conserved domain coding for RdRp, while ORFs 2-4 overlap to form the TGB as with Alphaflexiviridae, conferring to cell-

24 to-cell movement. ORF5 codes for the CP (Elbeaino et al., 2014). Host symptoms involving infection by the betaflexiviruses range from vein clearing (Elbeaino et al., 2014), leaf mottling

(Hajeri et al., 2010), rugosity of leaves (Gutierrez et al., 2013), or kinked and twisted leaves

(James et al., 2014).

One of the most prevalent viruses infecting potato crops worldwide is Potato virus S

(PVS - genus ). Symptoms of PVS infection of potato include leaf mottling, leaf rugosity, and deepened veins resulting in lower crop yields (Gutierrez et al., 2013). Another common carlavirus, Potato virus M (PVM), also has worldwide distribution causing leaf malformation, vein clearing, and stem necrosis (Senshu et al., 2011). In 2002, a new carlavirus known as Potato latent virus (PotLV) was characterized in Europe where local chlorotic spots turned necrotic after a longer period of incubation in the host Chenopodium murale (Brattey et al., 2002). Other members of the family Betaflexiviridae also infect a number of other economically important crops worldwide including cherries (James et al., 2014; Villamor et al.,

2015), grapevine (Al Rwahnih et al., 2012), apricots (Elbeaino et al., 2014), apples, and pears

(Komorowska et al., 2011). Recently, Potato virus T (PVT – genus ) has been assigned to this family. PVT has a genome of 6539nt in length consisting of three ORFs with a limited range of hosts and no known vector. However, PVT can be transmitted through pollen and seed pieces and can be mechanically inoculated from sap (Russo et al., 2009).

1.6.5 Viruses of Luteoviridae

The Luteoviridae are divided into three genera: , , and .

This family possess (+)-ssRNA genomes ranging from 5600-6000 nt in length that is not poly- adenylated at the 3’ end of the genome. Within this small genome, five or six ORFs translate five

25 functional proteins. ORF1 and ORF2 encode proteins P1 and P2 respectively and are proteins involved in viral replication. Also within this region, ribosomal frameshift occurs to translate

RdRp. ORF3 and ORF4 encodes the CP and a protein required for virus movement and ORF5 encodes a protein involved in aphid transmission and cell-to-cell movement throughout the phloem of the host. Polerovirus contains an extra ORF0 preceding ORF1 at the 5`-end of the genome that encodes the P0 protein that acts as repressor of host RNA-silencing (Pagan and

Holmes, 2010).

Potato leaf roll virus (PLRV – genus Polerovirus) has worldwide distribution and causes severe economic losses in the potato industry. It is transmitted persistently by aphids and is also disseminated and propagated within infected tubers used for seed. The virus is restricted almost exclusively to the vascular tissues of the host. Damage to the phloem can cause carbohydrates to accumulate in leaves instead of flowing to the bottom of the plant. This leads to a corresponding reduction in tuber size. Other related infect companion cells almost exclusively. The accumulation of carbohydrates also create symptoms include vein yellowing and upward leaf rolling. The edges of leaves may also redden. These symptoms, in combination, stunt shoots leading to a reduced yield. The Luteoviridae also cause crop losses in legumes, beets, and cereal crops (Mayo et al., 1989). Other luteoviruses such as Bean leaf roll virus (BLRV) and Soybean dwarf virus (SDV) have been shown to be recombinants between viruses from the genera

Polerovirus and Luteovirus as they have polerovirus-like CP but luteovirus-like RdRp genes

(Correa et al., 2005).

26

1.6.6 Viruses of Bromoviridae

Members of the family Bromoviridae have risen from homologous recombination consisting of the genera Alfamovirus, , , and . With worldwide distribution and a wide host range of more than 10000 species, bromoviruses cause some of the most agronomically important diseases. The members of the Bromoviridae have a genome comprised of three segments of a (+)-ssRNA that is approximately 8000 nt in total length. RNA1 and RNA2 encode the proteins P1 and P2 involved with replication. RNA3 codes for the protein involved with cell-to-cell movement as well as the CP. In some species, the CP is translated from a sgRNA4. Some bromoviruses also encode a fifth protein at the C-terminus of the P2 protein on

RNA2 involved in the host PTGS and viral movement (Codoner and Elena, 2008). The four viral RNAs are encapsidated into bacilliform particles (alfamoviruses), spheroidal particles

(), and spherical particles (bromoviruses and cucumoviruses) (Bol, 1999).

Alfalfa mosaic virus (AMV – genus Alfamovirus) is considered the most important viral pathogen of alfalfa. However, due to an extremely wide host range, AMV has now become one of the most important pathogens of cultivated crops worldwide as it can infect both herbaceous and woody plants. AMV can cause mosaics, mottles, and malformations in peas, tobacco, and potatoes as well as tuber necrosis in potato. AMV is transmitted non-persistently by aphids but can also be transmitted via pollen and seed pieces. Potato tuber necrosis caused by AMV resembles PTNRD caused by PVYNTN. Canadian AMV isolates cause systemic mosaic symptoms along with mottling (Xu and Nie, 2006).

Another significant bromovirus is Cucumber mosaic virus (CMV – genus Cucumovirus).

This pathogen has a large agricultural impact worldwide due to a broad host range. CMV is disseminated non-persistently by aphids and through seed in most species including beans,

27 spinach, lentils, lupin, potato, and peppers. Most hosts infected with CMV develop mosaic symptoms whose severity is dependent on the infecting strain. Tissue necrosis can develop locally or systemically which can lead to reduced vegetative yield or complete death of the plant.

CMV is also important from a research perspective, as it has been used a laboratory model for the development of host plants with pathogen-derived resistance. In this case, a CMV-derived transgene such as P2 or CP in the reverse complement can confer resistance to another viral target. For example, a plant that transcribes CMV mRNA can then become a target for Dicer-like enzymes to convert to siRNA in the presence of sequences of the reverse complement. These siRNA are then loaded into the RISC for degradation (Jacquemond, 2012; Morroni et al., 2008).

1.6.7 Viruses of Virgaviridae

The plant virus family Virgaviridae is relatively taxonomically new. The group includes the genera , Hordevirus, , , Tombamovirus, and .

These rod shaped viruses have (+)-ssRNA genomes with no poly (A) tail. From a genomic perspective, these viruses are relatively diverse with anywhere from one to three genomic RNAs depending on genus ranging from 600-10000 nt in total length. Sequence analysis of the RdRp indicates that all viruses in this group form a single taxonomic unit. While the RdRp gene comprises the majority of the genome, there is a cell-to-cell MP, TGB, and CP (Adams et al.,

2009b).

Two economically important virgaviruses regarding potato processing are Potato mop- top virus (PMTV – genus Pomovirus) and Tobacco rattle virus (TRV – genus Tobravirus).

PMTV has a tripartite genome with RNA1 and RNA2 coding components of RdRp, with RNA 2 also coding for the CP and what is known a read-through protein which is thought to be involved

28 in transmission by the vector. RNA3 encodes the TGB (Savenkov et al., 2003). In contrast, TRV has a bipartite genome with RNA1 coding four ORFs including RdRp, cell-to-cell MP, and a silencing suppressor protein. RNA2 codes for the CP and two nematode transmission factor proteins (Donaire et al., 2008). These two viruses affect the overall quality of a crop by producing internal and external tuber necrosis similar to PTNRD symptoms. Although yield may not be affected, this can cause the rejection of a crop that is destined for trade or processing.

Necrotic rings on tubers can develop after TRV and PMTV infection with no foliar symptoms.

Both of these viruses are soil-borne with TRV being transmitted by nematodes and PMTV transmission via the fungus Spongospora subterranea, known commonly as powdery scab

(Mumford et al., 2000).

1.6.8 Viroids

Viroids are infectious, unencapsidated, small circular ssRNA capable of replication within a plant host. Their genomes are typically 300-400 nt in length and do not code for any proteins. Viroids can be divided into two families: Pospiviroidae and Avsunviroidae. The

Pospiviroidae are comprised of viroids that replicate in the nucleus of the host via an asymmetric rolling-circle mechanism and fold into a rod-like circular ssRNA structure. The Avsunviroidae are comprised of viroids that replicate in the host chloroplast via symmetric rolling-circle mechanism. The Avsunviroidae also have a hammerhead-motif containing eleven conserved bases that cleaves off during infection and binds to other small RNAs as a form of VIGS

(Botermans et al., 2013; Giguere et al., 2014; Monger et al., 2010; Verhoeven et al., 2010).

Within the Avsunviroidae, there are four species. The type species Avocado sunblotch viroid

29

(ASVd – genus Avsunviroid) have rod-liked branch regions and are not circular (Giguere et al.,

2014).

Within the Pospiviroidae, there are nine species. The type species Potato spindle tuber viroid (PSTVd – genus Pospiviroid) as well as other pospiviroids such as Chrysanthemum stunt viroid (CSVd), Citrus exocortis viroid (CEVd), Mexican papita viroid (MPVd), and Tomato chlorotic stunt viroid (TCSVd) cause serious diseases in potato, tomato, pepper, and citrus crops.

However, it is recently become apparent that pospiviroids occur widely in ornamental crops without any visual symptoms. The severity of symptoms produced by these viroids on the typical host species vary by cultivar but are similar to those produced by viruses with stunting being the most common characteristic. This results in unmarketable sized and cracked fruit or spindled and misshapen tubers. PSTVd is readily transmissible by mechanical means, seed, and pollen. Their small genome size causes few errors during replication, are high in G-C content for thermodynamic stability, and are circular also aiding in stability and avoidance of degradation by

RNases (Botermans et al., 2013; Verhoeven et al., 2010).

In the case of all viroid RNA, replication is dependent on the host RNA polymerase.

RNA is imported into the nucleus or chloroplast and then copied via a rolling-circle mechanism by host DNA dependent RNA polymerase. After replication, the viroid progeny leaves the nucleus or chloroplast and moves to adjacent cells via plasmodesmata which can then allow the viroid RNA to move systemically through the phloem. Since viroids do not code for any protein, it is suggested that VIGS via viroid siRNA silences the expression of host genes leading to disease symptoms (Tsagris et al., 2008).

30

1.6.9 Double-stranded RNA (dsRNA) Viruses

dsRNA viruses have a wide host range affecting animals, fungi, bacteria, and plants.

There are twelve dsRNA families that infect plants and fungi (Spear et al., 2010). Most dsRNA viruses are isometric particles of the families , , ,

Reoviridae, , , , , Hypoviridae,

Cystoviridae, and a new group known as . Each of these families have genomes that range 0.7-9.0 kb in size that are both non-segmented and segmented (Jia et al., 2017;

Mertens, 2004; Puchades et al., 2017). dsRNA viruses have been known to infect many economically important crops worldwide including avocado, alfalfa, barley, beets, melon, pepper, rice, and tomato. However, many do not cause the phenotypic symptoms that can be shown for ssRNA virus infections. It is hypothesized that some dsRNA viruses have a mutualistic relationship with the host to provide tolerance from biotic and abiotic agents. In ornamental crops, mild dsRNA viral strains have been shown to protect the plant from severe strains of the same virus. The addition of some dsRNA viruses to some crop cultivars may have the ability to increase crop productivity. However, the synergistic relationship with other viruses or pathogens could be harmful and has not been well studied (Roosinck et al., 2011).

Most dsRNA viruses are similar in terms of their shape and replication strategy. These similarities are in the innermost capsid layers. dsRNA molecules make poor mRNA templates for host cell transcriptases, therefore requiring their own mechanisms. However, they are a very diverse group from a genetic perspective. The level of nucleotide and amino acid sequence similarity is low. Each of the outermost capsid layers are individually adapted for virus transmission and infection most likely due to their broad range of hosts (Mertens, 2004).

31

Most dsRNA plant viruses belong to the family under the genera

Phytoreovirus, and . Members of the genus include Wound tumor virus (WTV) and Rice dwarf virus (RDV). Members of the Fijivirus genus include Maize rough dwarf virus (MRDV), Rice black-streaked dwarf virus (RBSDV), and Oat sterile dwarf virus (OSDV). Members of the Oryzavirus genus include Rice ragged stunt virus (RRSV) and

Echinocloa ragged stunt virus (ERSV) (Noda and Nakashima, 1995). Recently, a new group of plant viruses (family Amalgaviridae, genus Amalgavirus) containing Southern tomato virus

(STV) has been characterized. STV has a unique genome organization with features belonging to the families Totiviridae and Partidiviridae within the RdRp domain. However, the CP of STV shares no sequence similarity with the or the partidiviruses. STV was first detected in tomato plants in Mississippi, California, and Mexico but has recently been detected in Spain,

Italy, France, China, and Bangladesh – mostly in a mixed infection with PepMV. STV is transmitted by seed, but mechanical transmission and grafting attempts have been unsuccessful.

Plants infected with STV exhibit severe mosaic, leaf curl, and yellow stunting however it is unclear if this is directly due to the STV virus or if symptoms are attributed to a synergistic relationship with PepMV (Padmanabhan et al., 2015; Puchades et al., 2017).

1.6.10 (-)-ssRNA Viruses

Negative-sense single-stranded RNA viruses have both animal and plant hosts and differ widely in their morphology and host interactions. They can infect agricultural and ornamental plant species as well as weeds. The most common of the (-)-ssRNA plant infecting viruses are of the genus Tospovirus (family Bunyaviridae) Their RNA genome is single-stranded, tri-partite, and of negative-sense polarity. The most common of the tospoviruses is Tomato spotted wilt

32 virus (TSWV) but contains more than 20 recognized or tentative species. Gene expression occurs via a process called “cap-snatching” where host short sub-genomic mRNAs capped at their 5’ ends (methylated) are cleaved close to the cap by an endonuclease encoded by the host which are then used as primers for viral mRNA transcription (Kormelink et al., 2011; Reguera et al., 2010).

Major crops susceptible to TSWV infection are tomato, pepper, papaya, lettuce, potato, peanut, and tobacco with varying symptoms in each host. Such symptoms include stunting, necrosis, chlorosis, ring spots, and veinal clearing (Adkins, 2000).

Another group of negative-strand RNA viruses known as the family collectively infect animals and plants contains the genus containing the species

Lettuce necrotic yellows virus (LNYV) whose replication occurs in the cytoplasm. The genus

Nucleorhabdovirus contains the species Potato yellow dwarf virus (PYDV) whose replication occurs in the nucleus of infected hosts. PYDV is a highly destructive pathogen of potato causing almost complete death upon infection under high virus titre (Bandyopadhyay et al., 2010). At least seven strains of PYDV have been characterized; each with different levels of symptom severity. Early research of this virus contributed significantly to the field of virus-insect (vector) interaction (Jang et al., 2017).

1.7 Objectives and Aims of This Study

Given the vast diversity and quantity of known plant RNA viruses and their hosts, it is of critical importance to have the capability to identify the presence of a viral pathogen before it can evade detection and spread rapidly. Current techniques for identifications present several significant drawbacks, particularly for the detection of unknown agents or novel species. Many plant viruses may not be mechanically transmissible to indicator hosts or easily visualized using

33 standard electron microscopy (EM). PCR assays may require re-validation due to genetic mutation or genomic recombination. Microarrays require known genetic sequences which may not be suitable for the detection of a novel pathogen. Given that new strains can arise and thus spread very rapidly without detection, more specific and universal techniques are required.

Currently the PPEQ program of the CFIA has the ability to screen specifically for twelve plant viruses, universal screening for viroids, as well as fungal and bacterial contaminants. The number of plant viruses that are known to exist, however, may only be a small fraction of the total number in existence. As a result, the capability of the program needs to be expanded to identify more plant pathogens. This will give greater confidence to both the CFIA and the

Canadian Potato Program that the PPEQ program meets the needs of today given current trading practices.

The aim of this work is to develop a meta-transcriptomic diagnostic technique utilizing

NGS for the purpose of de novo detection of plant viruses without a priori knowledge of their existence in the host. This will be performed by:

 Evaluating and developing procedures for the isolation of high quality RNA for cDNA

library construction.

 Evaluating a commercial NGS library preparation kit compatible with the Illumina

sequencing platform.

 Developing an RNA-seq bioinformatic workflow for the detection of viral contigs.

34

CHAPTER 2

MATERIALS AND METHODS

35

2.0 MATERIALS AND METHODS

2.1 Virus and Viroid Isolates

Samples used for RNA isolation, positive controls for RT-PCR, and NGS runs were obtained from the tissue culture collection maintained at the Charlottetown Laboratory of the

CFIA. Isolates used for this study are indicated in Table 2.1. These samples arrived at the facility from different locations worldwide from different S. tuberosum or S. lycopersicum cultivars.

Some arrived from other government affiliated facilities, positive-tested diagnostic samples submitted by CFIA inspectors, and PPEQ submissions determined to be positive for different viruses. These samples were either initially propagated from seed, or inoculated onto a healthy plant and placed in the tissue culture collection. Each plant containing a different virus isolate was aseptically cloned on a six to eight week basis for maintenance. Cloning involved aseptically cutting tissue culture plant at a node and inserting vertically into approximately 5 mL of sterile

Murashige and Skoog Agar pH 5.7 (Appendix A) in a 25 x 150 mm Kimax Tissue Culture Tube.

36

Table 2.1. List of samples used in this study. (Continued on page 38) Sample ID Host Origin Originator PotLV S. tuberosum Canada AAFC Nepean PVYNTN S. tuberosum Hungary Scottish Advice for Agriculture PVYO S. tuberosum Canada AAFC Nepean PVYO + PVYN:O S. tuberosum Hungary Scottish Advice for Agriculture PVYN/NTN-NA S. tuberosum Canada AAFC Nepean PVM-2 S. tuberosum Canada AAFC Nepean PVS-3 S. tuberosum Unknown N/A PVX-1 S. tuberosum Canada AAFC Nepean PVV-1 S. tuberosum Canada AAFC Nepean AMV-2 S. tuberosum Canada DIA03-175 PSTVd-2 S. tuberosum South Korea PPEQ Accession 40-5 PLRV-3 S. tuberosum Canada AAFC Nepean DIA17-130 S. lycopersicum Canada Diagnostic sample DIA17-131 S. lycopersicum Canada Diagnostic sample DIA17-132 S. lycopersicum Canada Diagnostic sample DIA15-016-1 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-2 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-3 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-1 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-2 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-3 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-1 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-2 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-3 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-11 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-21 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-31 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-41 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-016-51 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-11 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-21 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-31 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-41 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-017-51 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-11 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-21 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-31 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-41 S. tuberosum Bangladesh Border intercept – PPEQ accession DIA15-018-51 S. tuberosum Bangladesh Border intercept – PPEQ accession 82-1 S. tuberosum Chile PPEQ accession 82-2 S. tuberosum Chile PPEQ accession 82-3 S. tuberosum Chile PPEQ accession 82-4 S. tuberosum Chile PPEQ accession 82-5 S. tuberosum Chile PPEQ accession

37

Table 2.1. (continued) List of samples used in this study. Sample ID Host Origin Originator 82-6 S. tuberosum Chile PPEQ accession 82-7 S. tuberosum United Kingdom PPEQ accession 82-8 S. tuberosum Belgium PPEQ accession 82-9 S. tuberosum United Kingdom PPEQ accession 82-10 S. tuberosum United Kingdom PPEQ accession 83-1 S. tuberosum Chile PPEQ accession 83-2 S. tuberosum Chile PPEQ accession 83-3 S. tuberosum Chile PPEQ accession 83-4 S. tuberosum Chile PPEQ accession 83-5 S. tuberosum Chile PPEQ accession 83-6 S. tuberosum Chile PPEQ accession 83-7 S. tuberosum Chile PPEQ accession 83-8 S. tuberosum Chile PPEQ accession 83-9 S. tuberosum Chile PPEQ accession 83-10 S. tuberosum Chile PPEQ accession 83-11 S. tuberosum Chile PPEQ accession 84-1 S. tuberosum Chile PPEQ accession 84-5 S. tuberosum Chile PPEQ accession 84-6 S. tuberosum Chile PPEQ accession 84-9 S. tuberosum Chile PPEQ accession 84-12 S. tuberosum Netherlands PPEQ accession 1The diagnostic section at the Charlottetown Laboratory of the CFIA maintained the same numbering scheme with the second batch of these samples entered into the PPEQ program.

Mechanical inoculation involved the maceration of virus-positive plant material and mixing in a 1:3 ratio 0.01M KPO4 buffer pH 7.0 (phosphate buffer - Appendix B). Plants were grown in a greenhouse rated as a biocontainment level 2 facility (CFIA Biohazard Containment and Safety, 2010) approximately 6.2 x 7.5 m in size. All air supply, temperature, irrigation, lighting, shade curtains, and humidity are computer controlled. Greenhouse bays containing plant material are kept at a temperature of 22.0°C during daytime, 18.0°C during nighttime, 65% humidity, automatic irrigation using spigots for 10 minutes each day beginning at 9:00am.

Lighting is applied on a diurnal cycle with each bay receiving 16 h of sunlight and 8 h of darkness each day. On overcast days, artificial light is applied by 8-10 400 W high pressure sodium lamps (60 W m-2) resulting in 250 cd at bench level. Artificial lighting is activated at or

38 below 150 PAR (photosynthetically active radiation) and is inactive at any value measured above. Growth chambers are maintained at 22.0°C under general atmospheric conditions also using the same diurnal cycle.

Waste material generated for this study as well as others that occurs utilizing greenhouse or tissue culture growth chamber chambers is decontaminated before disposal. Effluent waste water used during watering is autoclaved and plant material is autoclaved prior to leaving the building to release. Benches and other greenhouse bay facilities were scrubbed using a mixture of Comet™ and water, then bleached (500 ppm, Javex).

2.2 RNA Extraction

2.2.1 Tri-Reagent Extraction

Plant leaves collected from each sample were transferred to one side of a universal extraction bag (Bioreba AG, Reinach, Switzerland). Each extraction bag was subjected to mechanical homogenization using a Homex 6 (Bioreba AG, Reinach, Switzerland) allowing for macerated plant sap to be acquired out of the other side of the meshed extraction bag. RNA was extracted from the sap by adding 100 µL of sap to 1 mL of Tri-Reagent (Molecular Research

Center, Cincinnati, USA) solution in a nuclease-free 1.5 mL tube. The mixture was vortexed and let stand for approximately 5 minutes after which time it was then centrifuged at 12000 x g for

10 minutes at 4°C using a Micromax RF refrigerated centrifuge (Thermo Fisher Scientific,

Waltham, USA). After separation of plant cell wall material and placement of the supernatant into a fresh tube, 200 µL of chloroform (Sigma Aldrich, St. Louis, USA) was then added to each sample which was then vigorously vortexed for approximately 15 seconds and let stand for approximately 10 minutes. After the 10 minute incubation, each sample was then centrifuged at

39

12000 x g for 15 minutes to separate into an aqueous and non-aqueous phase. Approximately

700 µL of each aqueous phase could be obtained by decanting. After placement of the decanted aqueous layer into a 1.5 mL fresh tube, 500 µL of isopropanol (Sigma Aldrich, St. Louis, USA) was added to each sample to precipitate the RNA. The sample was gently mixed by inversion and after approximately 10 minutes of incubation at room temperature, each sample was centrifuged at 16000 x g for 10 minutes to pellet the RNA. Upon removal of the supernatant, 1 mL of 75% ethanol (Commercial Alcohols, Toronto, Canada) was added and the pellet is briefly re-suspended via vortexing. Each sample is then centrifuged at 12000 x g for 10 minutes, decanted, and left to air dry for approximately 20-30 minutes. The pellet was then re-constituted using 100 µL of UltraPure™ nuclease-free distilled deionized water (ddH2O) (Thermo Fisher

Scientific, Waltham, USA). The sample was then placed on ice or immediately frozen at -30°C.

2.2.2 PureLink Total RNA Mini Kit

Plant leaf tissue was placed in an extraction bag (Bioreba AG, Reinach, Switzerland) containing 2 mL of lysis buffer provided by the PureLink Total RNA Mini Kit (Thermo Fisher

Scientific, Waltham, USA) containing 1% β-mercaptoethanol (Sigma Aldrich, St. Louis, USA)

(5mL per 500mL lysis buffer). Each extraction bag was mechanically homogenized using a

Homex 6 (Bioreba AG, Reinach, Switzerland) and 700 µL of plant sap mixed with lysis buffer was acquired and placed in a 1.5 mL nuclease-free tube and centrifuged at 12000 x g for 2 minutes at room temperature. The supernatant was placed in a new tube where an equivalent volume of 75% ethanol (Commercial Alcohols, Toronto, Canada) was added to the plant sap/lysis buffer mix. Half of this volume was transferred to a spin cartridge provided with the kit and centrifuged at 12000 x g for 15 seconds at room temperature using a Micromax RF

40 refrigerated centrifuge (Thermo Fisher Scientific, Waltham, USA). The flow through was discarded and the remainder of the sample was added to the spin cartridge and centrifuged at

12000 x g for 15 seconds at room temperature. Wash Buffer I (700 µL) provided with the kit was added to the spin cartridge and centrifuged at 12000 x g for 15 seconds at room temperature. The spin cartridge was then placed in a new collection tube and 500 µL of Wash Buffer II and centrifuged at 12000 x g for 15 seconds at room temperature. This step was then repeated one time. Upon discarding the flow through, the spin cartridge was then centrifuged at 12000 x g for

1 minute at room temperature to dry the spin cartridge membrane. The spin cartridge was then placed in a newly labeled 1.5 mL tube and 100 µL of nuclease-free ddH2O provided with the kit was added to the centre of the membrane. The sample was then centrifuged at 12000 x g for 2 minutes at room temperature and the flow through was kept and immediately placed on ice or frozen at -30°C.

2.2.3 Direct-zol RNA Mini-Prep

Plant leaf tissue was placed in an extraction bag (Bioreba AG, Reinach, Switzerland) and mechanically homogenized using a Homex 6 (Bioreba AG, Reinach, Switzerland). Tri-Reagent

(Molecular Research Center, Cincinnati, USA) was then placed in a 2.0 mL tube in a 3:1 ratio

(Tri-Reagent:plant sap) and mixed by vortexing. An equal volume of 100% ethanol (Commercial

Alcohols, Toronto, Canada) was placed in the sample tube and mixed by pipetting. Half of the volume was then added to a Zymo-Spin IIC Column provided with the Direct-zol RNA Mini-

Prep Kit (Zymo Research, Irvine, USA) and centrifuged at 12000 x g for 30 seconds at room temperature using a Micromax RF refrigerated centrifuge (Thermo Fisher Scientific, Waltham,

USA). The flow through was discarded and the remainder of the sample was added to the spin

41 column and centrifuged at 12000 x g for 30 seconds. The sample was then washed by adding 100

µL of RNA Wash Buffer (provided with the kit) to the column and centrifuged at 12000 x g for

30 seconds and the flow through was discarded. DNase I (6 U µL-1) was added to DNA digestion buffer according to the manufacturer’s instructions and added directly to the column. The column was then incubated at room temperature for 15 minutes. Direct-zol RNA PreWash (400

µL) was then added to the column and centrifuged at 12000 x g for 30 seconds at room temperature. After discarding the flow through, 700 µL of RNA Wash Buffer was added to the column and centrifuged at 12000 x g for 2 minutes at room temperature. The column was then transferred to a fresh 1.5 mL nuclease-free tube and 100 µL of nuclease-free ddH2O was added to the column, and centrifuged at 12000 x g for 30 seconds at room temperature. The sample was then immediately placed on ice or frozen at -30°C.

2.2.4 Quick-RNA Mini-Prep

Plant leaf tissue was placed in an extraction bag (Bioreba AG, Reinach, Switzerland) and mechanically homogenized using a Homex 6 (Bioreba AG, Reinach, Switzerland). Plant sap was then added in a 1:3 ratio with lysis buffer (plant sap:lysis buffer) provided by the Quick-RNA

Mini-Prep Kit (Zymo Research, Irvine, USA) in a 1.5 mL nuclease-free tube, vortexed, and incubated at room temperature for 5 mintues. The lysate was cleared by centrifugation at 10000 x g for 1 minute using a Micromax RF refrigerated centrifuge (Thermo Fisher Scientific, Waltham,

USA) at room temperature. The supernatant was then transferred to a Spin-Away filter in a new

1.5 mL nuclease-free tube and centrifuged at 10000 x g for 1 minute at room temperature. The flow through was then mixed by pipetting with one volume of 100% ethanol (Commercial

Alcohols, Toronto, Canada). The mixture was then transferred to a Zymo-Spin IIICG column in

42 a new 1.5 mL nuclease-free tube and centrifuged at 10000 x g for 30 seconds at room temperature. The flow through was discarded and column was treated with DNase I (6 U µL-1) according to the manufacturer’s instructions and added directly to the column. The column was then incubated at room temperature for 15 minutes. RNA Prep Buffer (400µL) was then added to the column and the sample was centrifuged at 10000 x g for 30 seconds at room temperature.

The flow through was discarded and 700 µL of RNA Wash Buffer was added to the column and centrifuged at 10000 x g for 30 seconds. The flow through was discarded and 400 µL of RNA

Wash Buffer was added to the column and centrifuged at 10000 x g for 2 minutes at room temperature. Nuclease-free ddH2O (50 µL) was then added to the column and centrifuged at

10000 x g for 30 seconds at room temperature in a 1.5 mL nuclease free tube. The sample was then immediately placed on ice or stored at -30°C.

2.2.5 Plant/Fungi Total RNA Purification Kit

Plant leaf tissue was placed in an extraction bag (Bioreba AG, Reinach, Switzerland) and mechanically homogenized using Homex 6 (Bioreba AG, Reinach, Switzerland). Plant sap was then placed in a 1:3 ratio with lysis buffer (plant sap:lysis buffer) provided with the Plant/Fungi

Total RNA Purification Kit (Norgen Biotek Corporation, Thorold, Canada), mixed by vortexing, and incubated at 55°C for 5 minutes. The lysate was then centrifuged at 12000 x g for 2 minutes at room temperature using a Micromax RF refrigerated centrifuge (Thermo Fisher Scientific,

Waltham, USA). The supernatant was placed in a new 1.5 mL nuclease-free tube and an equal volume of 100% ethanol (Commercial Alcohols, Toronto, Canada) was added to the lysate and mixed by vortexing. The lysate (600 µL) was then added to a spin column (grey) and centrifuged at 3500 x g for 1 minute at room temperature. The flow through was discarded and the column

43 was then treated with DNase I according to the manufacturer’s instructions. Wash Solution A

(400 µL) was then applied to the column and centrifuged at 12000 x g for 1 minute at room temperature. This step was repeated two more times, discarding the flow-through each time. The column was then subjected to a dry centrifugation step by centrifuging at 12000 x g for 2 minutes at room temperature. The column was then placed in a fresh 1.5 mL nuclease-free tube and 50 µL of Elution Solution A was added to the sample. The sample was then centrifuged at

12000 x g for 2 minutes at room temperature. The sample was then immediately placed on ice or stored at -30°C.

2.3 RNA Quality and Yield Assessment

RNA quality was assessed and the concentration of each sample was determined using an

Agilent Bioanalyzer 2100 capillary electrophoresis system (Agilent, Santa Clara, USA) to examine the ratio of the 28 S and 18 S ribosomal subunits. The gel matrix and chip loading was performed as per manufacturer’s protocol (Agilent, Santa Clara, USA). RNA Plant Nano was the protocol selected on the Agilent software for use during the analysis.

2.4 Construction of cDNA Libraries

2.4.1 TruSeq Stranded Total RNA Library Preparation

2.4.1.1 rRNA Depletion

After the quality of RNA was assessed, the decision was made to move forward with cDNA library preparation and the subsequent sequencing reaction. Using the reagents in the

TruSeq Stranded Total RNA Library Preparation Kit with Ribo-Zero Depletion (Illumina, San

Diego, USA) for a single sample, 10.0 µL of RNA was added to the well of a 96-well plate along

44 with 5.0 µL of RNA Binding Buffer and 5.0 µL of rRNA Removal Mix and mixed by pipetting.

The plate was then sealed and heated in a Mastercycler Nexus GSX1 (Eppendorf, Hamburg,

Germany) thermal cycler at 68°C for 5 minutes to bind to rRNA in the sample. After incubation,

35.0 µL of rRNA Removal Beads were added to the sample and mixed by pipetting for the rRNA to bind to the complementary rRNA sequences in the rRNA Removal Mix, which bound to the beads. The plate was then placed on a magnetic stand (Thermo Scientific, Waltham, USA) where the beads were then separated from the mixture and approximately 45.0 µL of supernatant is transferred to another 96-well plate.

2.4.1.2 RNA Purification and Fragmentation

RNAClean XP beads (99µL) (Beckman Coulter, Brea, USA) were then added to the ribosomally depleted RNA and incubated at room temperature for 15 minutes to bind to the

RNA. After incubation, the plate was transferred to a magnetic stand and the supernatant is removed after the solution appeared clear. The beads were then subjected to a 200 µL wash with

70% ethanol (Commercial Alcohols, Toronto, Canada). Upon removal of the 70% ethanol, the beads were left to air dry for approximately 15 minutes. The plate was then taken off the magnetic stand and 11.0 µL of Elution Buffer was added to elute the RNA off the beads. The plate was then placed back on the magnetic stand and 8.5 µL supernatant was removed and placed in a new plate. Elute, Prime, Fragment High Mix (8.5 µL) was then added to the supernatant, plate was sealed, and placed in the thermal cycler at 94°C for 8 minutes to fragment the RNA followed by a 4°C indefinite hold.

45

2.4.1.3 cDNA Synthesis

First strand cDNA synthesis occurred by adding 8 µL of First Strand Synthesis Mix Act

D containing SuperScript II Reverse Transcriptase (Thermo Scientific, Waltham, USA) to the fragmented RNA. The plate was then re-sealed and placed in the thermal cycler and incubated at

25°C for 10 minutes for random priming, 45°C for 15 minutes for reverse transcription, 70°C for enzyme inactivation, followed by a 4.0°C indefinite hold. Second strand cDNA synthesis occurred by adding 5.0 µL of Resuspension Buffer, 20.0 µL of Second Strand Marking Master

Mix, and incubated in a thermal cycler at 16°C for 1 hour with a 4°C indefinite hold. The cDNA was then purified by adding 90.0 µL of AMPure XP beads (Beckman Coulter, Brea, USA) and mixed by pipetting. The sample was then incubated for 15 minutes at room temperature. The plate was then placed on a magnetic stand until the solution turned clear and the supernatant was removed. The beads were then washed with 80% ethanol twice and left to air dry for 15 minutes at room temperature. After this point, the plate was taken off of the magnetic stand and 17.5 µL of Resuspension Buffer was added to take the cDNA off the beads. The plate was then placed back on the stand and 15.0 µL of the supernatant was removed and placed in a new 96-well plate.

2.4.1.4 3’-Adenylation and Adapter Ligation

The total cDNA was then adenylated at the 3’-end. Resuspension Buffer (2.5µL) was added to the sample along with 12.5 µL of A-Tailing Mix and mixed by pipetting. The plate was sealed and placed in a thermal cycler for 37°C for 30 minutes for 3’-adenylation, 70°C for 5 minutes for enzyme inactivation, followed by a 4°C indefinite hold. Resuspension buffer (3.5

µL) was then added to the sample along with 2.5 µL of Ligation Mix. The user-selected adapter

46

(1.5 µL) is then added to each sample individually to ensure that each sample has a unique index sequence. The plate was then sealed and placed on a thermal cycler at 30°C for 15 minutes with a

4°C indefinite hold. After this period of incubation, 5.0 µL of Stop Ligation Buffer was added to the sample. The now adapter-ligated cDNA library was purified using 42 µL AMPure XP beads

(Beckman Coulter, Brea, USA) and mixed by pipetting and incubated at room temperature for 15 minutes. The plate was then placed on a magnetic rack until the solution turned clear and the supernatant was removed. The sample was washed twice with 80% ethanol and air dried at room temperature for 15 minutes. The plate was then taken off the magnetic rack and 52.5 µL

Resuspension Buffer was added to remove the cDNA library from the beads. The plate was then placed back on the magnetic rack until the solution turned clear. The supernatant (50.0 µL) was placed in new 96-well plate. Purification was then repeated using 50.0 µL AMPure XP beads for cDNA binding, 22.5 µL Resuspension Buffer, and obtaining 20.0 µL of eluate.

2.4.1.5 cDNA Library Enrichment

The cDNA library was then enriched by adding 5.0µL of PCR Primer Cocktail Mix which are primers complementary to the adapter sequence and 25.0 µL of PCR Master Mix for a total volume of 50 µL. The mixture was then placed in a thermal cycler and incubated at 98°C initial denaturation for 30 seconds, 15 cycles of 98°C denaturation for 10 seconds, 60°C annealing for 30 seconds, 72°C extension for 30 seconds, followed by at 72°C final extension for

5 minutes and a 4°C indefinite hold. The enriched cDNA library was then purified using 50.0 µL

AMPure XP beads, mixed by pipetting, and incubated at room temperature for 15 minutes. The plate was then placed on a magnetic rack until the solution turned clear and the supernatant was discarded. The beads were then twice washed with 80% ethanol and let to air dry at room

47 temperature for 15 minutes. The plate was then taken off the magnetic rack and the cDNA library was re-suspended using 32.5 µL of Resuspension Buffer, placed back on the magnetic plate until the solution turned clear, and 30.0 µL final purified enriched cDNA library was removed and placed in a new plate.

2.4.1.6 cDNA Library Validation

Before continuation, validation of the presence of cDNA library occurred by diluting 1.0

µL cDNA library in 4.0 µL nuclease-free ddH2O (1:5). Gel matrix was prepared using a High

Sensitivity DNA Kit (Agilent, Santa Clara, USA) as per the manufacturer instructions and 1.0 µL of the diluted cDNA library was added to the chip. The chip was then placed in a Bioanalyzer capillary electrophoresis system (Agilent, Santa Clara, USA). High Sensitivity DNA was the protocol selected in the software.

2.4.1.7 cDNA Library Quantitation and Normalization

The concentration of cDNA library present was determined via qPCR using a KAPA

Biosystems Library Quantitation Kit (KAPA Biosystems, Wilmington, USA). cDNA library was diluted 1:100000 in nuclease free ddH2O. Master mix was prepared and qPCR conditions were used for the standards and sample as instructed in the manufacturer protocol and performed in triplicate. The standard curve provided an equation for the slope of the line (y=mx+b) which was then used to calculate the concentration of library present determined using the mean CT value acquired for the sample and the standards. Based on the quality of the input RNA and quality of the cDNA library generated, the expected size of the mean length of the cDNA fragments in the library is 300 bp which is used in the calculation of library concentration. After determination of

48 the concentration of each undiluted library it was further diluted to 4 nM. Multiple samples sequenced simultaneously can be diluted to 4 nM and pooled a single tube. The now singular or pooled 4 nM cDNA library (5.0 µL) was denatured using 5.0 µL 0.2 N NaOH (Bioshop,

Burlington, Canada) mixed by pipetting in a 1.5 mL tube for 5 minutes at room temperature.

Buffer HT1 (990 µL) was then added to bring the singular or pooled cDNA library to a concentration of 20 pM. The 20 pM cDNA library was further diluted to 5 pM using Buffer HT1.

The 5 pM cDNA library (600 µL) was then loaded into a MiSeq Reagent Kit v.3 (150 cycle)

(Illumina, San Diego, USA) and placed into the Illumina MiSeq (Illumina, San Diego, USA) sequencing system for sequencing.

2.5 Sequence Analysis

The bioinformatic workflow used for sequence analysis was created using CLC

Genomics Workbench (Qiagen, Hilden, Germany) by sequence data input from an Illumina platform by pairing the forward and reverse reads (reads 1 and 2). Reads were mapped either to a

Solanum tuberosum or Solanum lycopersicum reference genome acquired from the FTP database in GenBank (http://ftp.ncbi.nlm.nih.gov) using no reference masking. The scoring matrix used for mapping to the reference genome is indicated in Table 2.2 using a linear gap cost and non- specific matches were mapped randomly. After acquiring a mapping report for downstream QC analysis, all unmapped reads were subject to de novo assembly. Assembly occurred by mapping reads back to contigs, local alignments, random match mode, and the scoring matrix indicated in

Table 2.3 creating an automatic bubble size of 50 and a word size of 20 to generate de Bruijn graphs with minimum contig lengths of 200. After de novo assembly of unmapped reads, consensus sequences were extracted using a threshold of low coverage of 0 using “N” symbols

49 for ambiguity. Conflicts during assembly were resolved by a vote meaning in positions where there is a disagreement regarding the alignment, the conflict will be solved by counting the instances of each nucleotide and letting the majority decide the nucleotide in the contig.

Table 2.2. Scoring values used for mapping paired reads to the reference genome. Parameter Cost Match score 1 Mismatch cost 2 Insertion cost 3 Deletion cost 3 Insertion open cost 6 Insertion extend cost 1 Deletion open cost 6 Deletion extend cost 1

Table 2.3. Scoring values used for de novo assembly of unmapped contigs. Parameter Cost Mismatch cost 2 Insertion cost 3 Deletion cost 3

2.6 Design of Oligonucleotide Primers

Primers used for this study were primarily acquired from published literature. However, for some non-common targets, primers were designed using sequences published in GenBank

(http://www.ncbi.nlm.nih.gov) for the CP of each virus detected or suspected. Multiple sequences from multiple published isolates with varying geographical origins were aligned using

Clustal Omega (https://www.ebi.ac.uk/Tools/msa/) and regions of sequence similarity were selected. These regions were assessed for GC content, product size, and annealing temperature.

To ensure there would be no internal complementarity or complementarity between the sense and antisense primers, OligoCalc (http://biotools.nubic.northwestern.edu/OligoCalc.html) was used. Primer and probe design was then transferred to Integrated DNA Technologies (Coralville,

50

USA) and returned lyophilized. The primers were then re-suspended using nuclease free ddH2O to a stock concentration of either 50 µM or 100 µM (dependent on lyophilized yield) and stored at -80°C.

2.7 Synthesis of cDNAs

Complementary DNA (cDNA) was synthesized using 0.5 mM dNTPs (Thermo

Scientific, Waltham, USA), 0.2 µM random primer (Thermo Scientific, Waltham, USA), 1.0 µL of total RNA, and nuclease-free ddH2O (Thermo Scientific, Waltham, USA) to a final volume of

10 µL. This reaction was heated at 65°C for 5 minutes for primer annealing using an Eppendorf

Mastercycler Nexus GSX1 (Eppendorf, Hamburg, Germany) thermal cycler. Samples were then placed on ice while a second master mix was prepared containing 3.9 µL of nuclease-free ddH2O

(Thermo Scientific, Waltham USA), 4.0 µL 5X First Strand Buffer (Thermo Scientific,

Waltham, USA), 1.0µL dithiothreitol (DTT) (Thermo Scientific, Waltham, USA), 0.1 µL RNase

OUT (Thermo Scientific, Waltham, USA), and 1.0 µL Maloney-murine leukemia virus reverse transcriptase (M-MLV RT) was prepared. 10 µL of this master mix was added to each sample to yield a final volume of 20 µL. The samples were then incubated in a thermal cycler at 37°C for

60 minutes of reverse transcription followed by 15 minutes at 80°C for enzyme inactivation and a 20°C indefinite hold.

cDNA was also synthesized using SuperScript VILO (variable input linear output)

Reverse Transcriptase (Thermo Fisher, Waltham, USA). Each sample contained 13 µL of nuclease-free ddH2O (Thermo Fisher, Waltham, USA), 4 µL of 5X VILO Reaction Mix (Thermo

Scientific, Waltham, USA), 2 µL of 10X Superscript Enzyme Mix (Thermo Scientific, Waltham,

USA), and 1 µL of total RNA for a 20 µL reaction volume. Samples were incubated in a thermal

51 cycler at 25°C for 10 minutes for primer binding, 42°C for 60 minutes for reverse transcription, and 85°C for enzyme inactivation followed by a 20°C indefinite hold.

2.8 PCR and RT-qPCR

2.8.1 PCR

For general detection of a single target, cDNA from a known positive sample was used as a positive control and nuclease-free ddH2O as a negative control. For the detection of a single target, 15.5 µL of nuclease-free ddH2O (Thermo Fisher, Waltham, USA), 2.5 µL 10X Taq Buffer

(Qiagen, Hilden, Germany), 1.0 µL 25 mM MgCl2 (Qiagen, Hilden, Germany), 1.0 µL 10 mM dNTPs (Qiagen, Hilden, Germany), 1.0 µL 5 µM forward primer and 1.0 µL 5 µM reverse primer (Integrated DNA Technologies, Coralville, USA), 0.5 µL 10% blotto (Thermo Fisher,

Waltham, USA), 0.5 µL 5000 U mL-1 Taq polymerase (New England Biolabs, Ipswich, USA), and 2.0 µL cDNA was required for a 25 µL reaction volume. PCR was performed in using an

Eppendorf Mastercycler Nexus GSX1 (Eppendorf, Hamburg, Germany) thermal cycler at 94°C initial denaturation for 5 minutes, 35 cycles of 95°C denaturation for 45 seconds, annealing temperatures ranging from 55°C-65°C for 45 seconds, 72°C extension for 45 seconds, followed by a 72°C final extension for 7 minutes, and a 20°C indefinite hold. Primers, their sequence, and their optimized annealing temperatures used in this study are listed in Table 2.4. PCR products were visualized on a 2.0% agarose gel in 0.5X TBE buffer (Thermo Scientific, Waltham, USA).

52

Table 2.4. List of primers and probes used in this study. (Continued on page 54) Primer Target Sequence (5’-3’) Ta (°C) Amplicon Size (bp) Reference PLR3 PLRV CACTGATCCTCAGAAGAATCG 58.0 486 Schoen et al., 1996 PLR4 AAGAAGGCGAAGAAGGCAATG PVS-F3 PVS TCCGGTCGTTTGGAATTATATGC 60.0 338 Unpublished PVS-R3 CATTGGTTTATTGCATTCCG PAMV-F3 PAMV TTTGCTGCATTTGATTTCTTC 60.0 375 Xu et al., 1994 PAMV-R3 AAAGAAACTGTTGAAACTCCC PS1 PVS TCACCTCAGTTACTCCAACC 55.0 714 Unpublished PS2 ACTTAGACCCTGCAGGGACTG PX1 PVX CAAGGTTTCAAGCCTGAGCA 58.0 306 Unpublished PX2 AAATACTATGAAACTGGGGTAG PM5 PVM TGAGCTCGGGACCATTCATAC 59.0 520 Xu et al., 2010 PM6 ACATCTGAGACATGATGCGC ToCV-F3 ToCV TGTCCGATGTTCCAAACGCT 55.0 250 Unpublished ToCV-R3 AGTTCCTTGCCCTCGTTACC STV-F STV CGTTATCTTAGGCGTCAGCT 60.0 470 Puchades et al., 2017 STV-R GGAGTTTGATTGCATCAGCG PSTVd-F PSTVd CCTTGGAACCGCAGTTGGT 62.5 77 Bertolini et al., 2010 PSTVd-R TTTCCCCGGGGATCCC PSTVd-P PSTVd Cy5-TCCTGTGGT-TAO- TCACACCTGAACTCCT-IabRQSp

Pep-F2 PepMV TGGGTTTAGCAGCCAATGAGA 65.0 71 Gutierrez- (US2) Aguirre et al., 2009 Pep-R2 AACTTTGCACATCAGCATAGGCA

53

Table 2.4. (continued) List of primers and probes used in this study. Primer Target Sequence (5’-3’) Ta (°C) Amplicon Reference Size (bp) Pep-P2 PepMV FAM-CGGACCTGC-ZEN- (US2) CATGTGGGACCTC-3IABkFQ COX-F COX CGTCGCATTCCAGATTATCCA 65.0 88 Weller et al., 2000 COX-R CAACTACGGATATATAAGAGCCAAAACTG COXsol- Qua705-AGGGCATTCCATCCAGCGTAA- 1511T BHQ2 O6266c PVY CTCCTGTGCTGGTATGTCCT Touchdown1 Varied1 Lorenzen et 60.0-66.0 al., 2006b O2172 CAACTATGATGGATTTGGCGACC N5707 GTGTCTCACCAGGGCAAGAAC S5585m GGATCTCAAGTTGAAGGGGAC O2439c CCCAAGTTCAGGGCATGCAT N2258 GTCGATCACGAAACGCAGACA A6032m CTTGCGGACATCACTAAAGCG N2650c TGATCCACAACTTCACCGCTAACT 1Primers designed for the PVY-multiplex assay are strain-specific. For each strain type, band patterns observed in a 2% agarose gel will deduce strain-type. For the first 12 cycles, 66°C was used as the annealing temperature touching down to by 0.5°C per cycle for 12 cycles ending with 20 cycles at 60°C.

54

2.8.2 Multiplex-PCR for PVY-strain Discrimination

The second multiplex RT-PCR assay used in this study involved the discrimination of

PVY-strains. This was a two-step RT-PCR assay requiring the generation of cDNA prior to PCR.

For this assay, there are eight primers mixed for a working concentration of 5 µM (Integrated

DNA Technologies, Coralville, USA) (Table 2.4). For a single reaction, 14.2 µL nuclease-free ddH2O, 2.5 µL 10X Taq Buffer (Qiagen, Hilden, Germany), 0.3 µL 25mM MgCl2 (Qiagen,

Hilden, Germany), 0.5 µL 10 mM dNTPs (Qiagen, Hilden Germany), 0.5 µL 10% blotto

(Thermo Fisher, Waltham, USA), 4.8 µM 5 µM primer mix (Table 2.4), 0.2 µL Taq polymerase

(New England Biolabs, Ipswich, USA), and 2.0 µL cDNA was required for a 25 µL reaction volume. PCR was performed using an Eppendorf Mastercycler Nexus GSX1 (Eppendorf,

Hamburg Germany) via touchdown PCR at 94°C initial denaturation for 5 minutes, 12 cycles of

95°C denaturation at 45 seconds, 66°C with 0.5°C decrease in annealing temperature each cycle down to 60°C, 72°C extension for 45 seconds, followed by 20 cycles of 95°C denaturation for 45 seconds, 60°C annealing for 45 seconds, 72°C for 45 seconds, followed by a 72°C final extension for 7 minutes and a 20°C indefinite hold. PCR products were visualized on a 2.0% agarose gel in

0.5X TBE buffer (Thermo Scientific, Waltham, USA).

2.8.3 One-Step RT-qPCR for Detection of PSTVd and PepMV

One-step RT-qPCR for detection of PSTVd and PepMV was performed using a

SensiFAST Probe NO-ROX One-Step Kit (Bioline, London, UK). This reaction included an internal control detecting cytochrome oxidase (COX) of S. tuberosum and S. lycopersicum. For this assay, the forward and reverse primers were mixed to a final working concentration of 10.0

µM (Integrated DNA Technologies, Coralville, USA) (Table 2.4). One single reaction involves

55 using 5.25 µL nuclease-free ddH2O, 12.5 µL 2X SensiFAST Mix (Bioline, London, Canada),

0.75 µL 10.0 µM PSTVd-F/R or Pep-F2/R2 mix, 0.75 µL 5.0 µM PSTVd-P or Pep-P2 probe

(Biosearch Technologies, Novato, USA), 0.75 µL 10.0 µM COX-F/R mix, 0.75 µL 5.0 µL

COXsol-1511T probe, 2.5 µL Eva Green (Biotium, Fremont, USA), 0.25 µL Reverse

Transcriptase, 0.5 µL RNase Inhibitor, and 1.0 µL of total RNA for a final volume of 25 µL.

Cycling was performed in a Rotorgene-Q qPCR cycler (Qiagen, Hilden, Germany) using 45°C reverse transcription for 10 minutes, 95°C denaturation for 2 minutes, 35 cycles of 95°C denaturation for 30 seconds, 65°C annealing/extension for 20 seconds, and melt analysis using a range between 50-90°C with a 1°C increase per 4 seconds acquiring data on the red and crimson

(PSTVd) or green (PepMV) and crimson (COX) channels.

2.9 Bioassay

The PPEQ program employs plant bioassay as a non-discriminate test for the presence of plant pathogens. The program uses 19 indicator species (Appendix C) with the ability to harbour many of the same plant pathogens as the various Solanum species. After planting from seed based on a seeding schedule, individual plants are potted and mechanically inoculated using a mixture (1:3) of ground sap from a PPEQ accession and phosphate buffer. Similarly, for this study, graft inoculation was used for this study which involved cutting off lower stems and making a small slit using a sterile scalpel in the main stem of a healthy host plant. Thin symptomatic stems were then sliced from a donor at an angle and in the slit and wrapped with tape. This allowed for the donor tissue to integrate with the host and pathogens could be transferred by the donor to a healthy host plant. This was even true between two different species

(tomato to potato and potato to tomato). Injection from a donor to a healthy host was also used in this study whereby leaves from a donor plant were grounded to sap and mixed with phosphate

56 buffer (1:3), loaded into a sterile 0.45mm x 16mm syringe and injected into the midrib of leaves and to the main stem of the host. Symptom development was observed over a period of 4-5 weeks and noted in the final report for each accession in all cases.

57

CHAPTER 3

RESULTS

58

3.0 RESULTS

3.1 Determination of RNA Quality

Approximately 1.0 g of leaves was selected from each of 6 different potato plants growing in greenhouse conditions that are used by the Charlottetown Laboratory of the CFIA as frequent control material. These 6 plants are all individually infected with PVYNTN, PLRV,

AMV, PotLV, PVV, and PSTVd. The quality of the RNA extracts for each of the selected extraction methods (Tri-Reagent, PureLink Total RNA Mini Kit, Direct-zol RNA Mini-Prep,

Quick-RNA Mini-Prep, Plant/Fungi Total RNA Purification Kit) was assessed using the

Bioanalyzer capillary electrophoresis system (Agilent, Santa Clara, USA) (Tables 3.1-3.5,

Appendix D).

Table 3.1. Quality and quantity of RNA isolated using Tri-Reagent. Sample RIN [ ](ng uL-1) PVYNTN 2.5 722 PLRV 2.9 702 PotLV 2.5 1147 AMV 2.4 1496 PVV 2.5 1316 PSTVd 2.5 959

Table 3.2. Quality and quantity of RNA isolated using PureLink Total RNA Mini Kit. Sample RIN [ ](ng uL-1) PVYNTN 8.3 1296 PLRV 8.7 238 PotLV 8.7 261 AMV 8.3 107 PVV 8.3 1747 PSTVd 9.0 833

Table 3.3. Quality and quantity of RNA isolated using Direct-zol RNA Mini-Prep Kit. Sample RIN [ ](ng uL-1) PVYNTN - 1161 PLRV 2.6 285 PotLV 2.0 909 AMV 2.2 408 PVV 1.8 2014 PSTVd - 1604

59

Table 3.4. Quality and quantity of RNA isolated using Quick-RNA Mini-Prep Kit. Sample RIN [ ](ng uL-1) PVYNTN 2.1 110 PLRV - 209 PotLV 4.2 612 AMV 1.9 714 PVV - 1182 PSTVd 2.1 743

Table 3.5. Quality and quantity of RNA isolated using Plant/Fungi RNA Purification Kit. Sample RIN [ ](ng uL-1) PVYNTN 3.8 201 PLRV 2.4 804 PotLV 2.3 927 AMV 2.0 271 PVV 2.9 1042 PSTVd 1.8 564

3.2 Evaluation of the TruSeq Stranded Total RNA Library Preparation Protocol

3.2.1 Acquisition of Tubers Imported from Bangladesh in Toronto, Canada

In late 2015, inspectors representing the CFIA intercepted 24 potato tubers (cultivar unknown) from three markets in Toronto, Canada. The sellers admitted to the inspector that these tubers were illegally imported from Bangladesh. The tubers were shipped to the Charlottetown

Laboratory of the CFIA for analysis and given the identifiers DIA15-016, DIA15-017, and

DIA15-018 for 3 composite batches of tubers. RNA was isolated from a composite sample of each batch and RT-PCR for the specific detection of five common viruses was performed. It was determined that each sample contained a mixed infection of many of the viruses commonly found in North America (result not shown). Based on this result, 9 tubers total (3 from each composite) were left to sprout and planted individually and left to grow under greenhouse conditions. After approximately 100 days, each plant displayed a variety of symptoms typical of virus infection. Each plant displayed different levels of stunting, chlorosis, mosaic mottling, and

60 leaf purpling. Approximately 1.0 g of symptomatic leaf tissue from each plant was macerated and RNA was isolated from each sample using the PureLink Total RNA Mini Kit (Thermo

Fisher, Waltham, USA). cDNA libraries were generated for 8 of these 9 samples (one excluded because of poor RNA yield) using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Removal Reagent (Illumina, San Diego, USA). Upon cDNA library generation, validation, and enrichment, each library was quantified via qPCR using the KAPA Biosystems

Library Quantitation Kit (KAPA Biosystems, Wilmington, USA) (Figure 3.1A). This allowed for the generation of a standard curve (Figure 3.1B) which was used to quantify each cDNA library for the purposes of pooling.

(A)

Figure 3.1. (A) Amplification of standards of known concentration and cDNA libraries created from Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards. (Continued on page 62)

61

(B)

Figure 3.1. (continued) (A) Amplification of standards of known concentration and cDNA libraries created from Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards.

Following the calculation of the concentration of each cDNA library, each sample was pooled to generate a 5 pM pooled library which was sequenced using a MiSeq Reagent Kit v.3

(150 cycle) (Illumina, San Diego, USA) with sequence yield for each sample shown in Table 3.6.

Upon mapping to the host genome, host sequence is removed and any remaining non-host sequence was assembled in a de novo fashion and searched by BLAST (Basic Local Alignment

Search Tool) (http://www.ncbi.nlm.nih.gov) with results indicated in Table 3.7. Upon BLAST analysis, it was determined that each sample contained a mixture of viruses from a number of different groups. These results, including the sample not sequenced in the batch, were confirmed via RT-PCR (Appendix E).

62

Table 3.6. Summary of data generated after sequencing of 8 samples from Bangladesh and sequencing against the constructed S. tuberosum genome published in GenBank. Sample Total Reads Total Reads % Mapped % Unmapped Mapped Unmapped DIA15-016-1 389251 6290679 5.83 94.17 DIA15-016-2 253271 8145627 3.02 96.98 DIA15-016-3 194331 8906427 2.14 97.86 DIA15-017-1 478757 7558859 5.96 94.04 DIA15-017-2 328207 7320556 4.29 95.71 DIA15-017-3 313291 7169555 4.19 95.81 DIA15-018-1 323258 9743572 3.21 96.79 DIA15-018-2 369514 6492446 5.38 94.62

Table 3.7. NGS results of tuber interception from Bangladesh confirmed via RT-PCR using the primers listed in Table 2.4. Sample Viruses Detected via NGS confirmed by RT-PCR DIA15-016-1 PAMV, Eu-PVYN + Eu-PVYN/NTN, PVS, PVM, ToCV DIA15-016-2 PAMV, PVYO, PVS, PVM DIA15-016-3 PVX, PVYO + Eu-PVYNTN, PVS, PVM, ToCV DIA15-017-1 PAMV, PVYC, PVS, PVM, ToCV DIA15-017-2 PAMV, Eu-PVYN + Eu-PVYN/NTN, PVS, PVM, ToCV DIA15-017-3 PVYC, PVS, PVM, ToCV DIA15-018-1 PAMV, PVYC, PVS, PVM, ToCV DIA15-018-3 PAMV, Eu-PVYN + Eu-PVYN/NTN, PVS, PVM, ToCV

3.2.2 Testing Remainder of Tubers from Bangladesh Interception

There were 15 tubers that remained from each of the three composite samples from

Bangladesh (5 for each market location). Each tuber was planted individually and grown under greenhouse conditions. Again, the sample identifiers generated by the Charlottetown Laboratory of the CFIA were kept. After approximately 100 days, each plant displayed a variety of symptoms typical of virus infection. These symptoms were similar to those displayed by the previous set. Approximately 1.0 g of symptomatic leaf tissue from each plant was macerated and

RNA was isolated from each sample using the PureLink Total RNA Mini Kit (Thermo Fisher,

Waltham, USA). cDNA libraries were generated for 12 of these 15 samples as three were

63 excluded due to poor RNA quality using the TruSeq Stranded Total RNA Library Prep Kit with

Plant rRNA Removal Reagent (Illumina, San Diego, USA) in two separate preparations. Upon cDNA library generation, validation, and enrichment, each library was quantified via qPCR using the KAPA Biosystems Library Quantitation Kit (KAPA Biosystems, Wilmington, USA)

(Figure 3.2).

(A)

Figure 3.2. (A) Amplification of standards of known concentration and cDNA libraries created from second set of Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards. (Continued on pages 65 and 66)

64

(B)

(C)

Figure 3.2. (continued) (A) Amplification of standards of known concentration and cDNA libraries created from second set of Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B) Standard curve generated from amplification of standards. Continued on page 66.

65

(D)

Figure 3.2. (continued) (A, C) Amplification of standards of known concentration and cDNA libraries created from second set of Bangladesh-intercepted plants using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Depletion Reagent. Concentration of standards were 20pM, 2pM, 0.2pM, 0.02pM, 0.002pM, and 0.002pM. Each sample was performed in triplicate and cDNA libraries were diluted 1:100000 (B, D) Standard curve generated from amplification of standards.

Following the calculation of the concentration of each cDNA library for both runs, each sample was pooled to generate a 5 pM pooled library which was sequenced using a MiSeq

Reagent Kit v.3 (150 cycle) (Illumina, San Diego, USA) with sequence yield for each sample shown in Table 3.8. Upon mapping to the host genome, host sequence is removed and any remaining non-host sequence was assembled in a de novo fashion and searched by BLAST

(http://www.ncbi.nlm.nih.gov) with results indicated in Table 3.9. Upon BLAST analysis, it was determined that each sample, as in the previous NGS run using samples from the Bangladesh intercept, contained a mixture of viruses from a number of different groups. These results, including the samples not sequenced in the batch were confirmed via RT-PCR (Appendix F).

66

Table 3.8. Summary of data generated after sequencing of 12 samples from Bangladesh against the constructed S. tuberosum genome published in GenBank. Sample Total Reads Total Reads % Mapped % Unmapped Mapped Unmapped DIA15-016-1 255592 4441130 5.44 94.56 DIA15-016-2 207888 3384450 5.79 94.21 DIA15-016-3 485189 5853946 7.65 92.35 DIA15-016-4 437846 4658920 8.59 91.51 DIA15-016-5 1983941 4116939 32.52 67.48 DIA15-017-1 2661096 12755712 17.26 81.74 DIA15-017-2 3222950 4518014 41.63 58.37 DIA15-017-3 1948965 3540189 35.51 64.49 DIA15-017-4 1306 213822 0.61 99.39 DIA15-017-5 1068557 5281439 16.83 83.17 DIA15-018-2 386284 4497418 7.91 92.09 DIA15-018-4 539725 4920083 9.89 90.11

Table 3.9. NGS results of tuber interception from Bangladesh confirmed via RT-PCR using the primers listed in Table 2.4. Sample Viruses Detected via NGS confirmed by RT-PCR DIA15-016-1 PVM, PVS, PAMV, ToCV, PVYN DIA15-016-2 PVM, PVS, PAMV, PLRV, ToCV, PVYC DIA15-016-3 PVM, PVS, PAMV, ToCV, PVYC DIA15-016-4 PVM, PVS, PAMV, ToCV, PVYC DIA15-016-5 PVM, PVS, PAMV, PLRV, PVYO + Eu-PVYNTN DIA15-017-1 PVM, PVS, PAMV, PVYC DIA15-017-2 PVS, ToCV, PVYC DIA15-017-3 PVM, PVS, ToCV, PVYC DIA15-017-4 PVM, PVS, PAMV, ToCV, PVYO DIA15-017-5 PVM, PVS, PAMV, ToCV, PVYO DIA15-018-2 PVM, PVS, PAMV, ToCV, PVYO DIA15-018-4 PVM, PVS, PAMV, PLRV, ToCV, PVYO + Eu-PVYN

These samples were mechanically inoculated to healthy tomato (S. lycopersicum) in an effort to fulfill Koch’s postulates. Leaf sap from these samples were inoculated in a 1:3

(sap:phosphate buffer) ratio to tomato. After 21 days post-inoculation, RNA was isolated from each sample and RT-PCR was performed for each target originally confirmed via NGS and RT-

PCR (result not shown). After mechanical inoculation, each virus was re-confirmed to be present as in Table 3.9 in each inoculated sample with the exception of ToCV. Small shoots from each S.

67 tuberosum plant was then grafted to a node of each S. lycopersicum plant. After 14 days post- grafting, ToCV was detected in each tomato plant with results concurrent with the results listed in Table 3.9 showing that ToCV is incapable of being mechanically inoculated to its host, but can be graft inoculated.

3.2.3 Testing Imported Nuclear Stocks

In the spring of 2018, three “batches” of imported nuclear stock were tested within the

PPEQ program. These samples were given unique PPEQ identifiers based on their arrival at the

Charlottetown Laboratory of the CFIA. Due to spatial requirements in the biocontainment level

II greenhouse, only 12 samples can be tested at one time. Therefore, there are a maximum of 12 samples per “batch”. During this time, space in the greenhouse allowed for three batches of imports to be tested simultaneously. These were given the unique identifiers of bathes 82, 83, and 84 with 12 samples belonging to each batch. Some samples are received at the Charlottetown

Laboratory of the CFIA but are rejected upon reception and are not tested within the program.

Their origins are listed in Table 2.1.

Upon observation in the greenhouse of these plants’ growth, several plants in each batch began to observe symptoms typical of viral infection including the plants inoculated in bioassay.

These symptoms included chlorosis, leaf curling, and mosaic-type symptoms that are not typical symptoms of plants that arrive within the PPEQ program as nuclear stock. RNA was isolated from approximately 1.0 g of symptomatic leaf tissue from each accession within each batch using the PureLink Total RNA Mini Kit (Thermo Fisher, Waltham, USA). cDNA libraries were generated from each sample using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Removal Reagents (Illumina, San Diego, USA) in three separate reactions. Upon cDNA

68 library generation, validation, and enrichment, each library was quantified via qPCR using the

KAPA Biosystems Library Quantitation Kit (KAPA Biosystems, Wilmington, USA) (result not shown). Following the calculation of the concentration of each cDNA library for both runs, each sample was pooled to generate a 5 pM pooled library which was sequenced using a MiSeq

Reagent Kit v.3 (150 cycle) (Illumina, San Diego, USA). Upon mapping to the host genome, host sequence is removed and any remaining non-host sequence was assembled in a de novo fashion and searched by BLAST (http://www.ncbi.nlm.nih.gov) with results indicated in Table

3.10. Sequence yields for this run are indicated in Appendix H. The results from sequencing were later confirmed via RT-PCR (Appendix G).

69

Table 3.10. Results from sequencing PPEQ batches 82, 83, and 84 that were confirmed via RT- PCR using primers listed in Table 2.4. Batch Identifier Sequence Result 82-1 Neg 82-2 Neg 82-3 Neg 82-4 Neg 82-5 PVYO 82-6 Neg 82-7 Neg 82-8 Neg 82-9 Neg 82-10 Neg 83-1 Neg 83-2 PVYO 83-3 PVYO 83-4 PVYO 83-5 PVYNTN 83-6 PVYO + Eu-PVYNTN 83-7 PVS 83-8 Neg 83-9 Neg 83-10 Neg 83-11 Neg 84-1 Neg 84-4 Neg 84-5 Neg 84-6 Neg 84-7 PLRV 84-9 Neg

3.2.4 Testing for Unknown Pathogen in S. lycopersicum Leaves

In February of 2018, three leaf samples arrived at the Charlottetown Laboratory of the

CFIA from a greenhouse in Lacombe, Alberta. These samples were given the unique identifiers of DIA17-130, DIA17-131, and DIA17-132. The grower had indicated that many of his tomato plants (S. lycopersicum) within this greenhouse were experiencing the same symptoms including severe necrosis, chlorosis, and leaf scorching. Initially, these samples were suspected of being infected with PSTVd or another tomato-infecting viroid. Total RNA was isolated from these

70 plants using the PureLink Total RNA Mini Kit (Thermo Fisher, Waltham, USA). Upon arrival to the Charlottetown Laboratory of the CFIA, these leaves were beginning to degrade by being partially frozen, wet, and very shriveled from being on ice and traveling to Prince Edward Island from Alberta. Therefore, RNA quality was severely compromised. cDNA libraries were generated from each sample using the TruSeq Stranded Total RNA Library Prep Kit with Plant rRNA Removal Reagents (Illumina, San Diego, USA). Upon cDNA library generation, enrichment, and validation, each library was quantified via qPCR using the KAPA Biosystems

Library Quantitation Kit (KAPA Biosystems, Wilmington, USA) (result not shown). Following the calculation of the concentration of each cDNA library for both runs, each sample was pooled to generate a 5 pM pooled library which was sequenced using a MiSeq Reagent Kit v.3 (150 cycle) (Illumina, San Diego, USA). Upon mapping to the host genome (S. lycopersicum), host sequence was removed and any remaining non-host sequence was assembled in a de novo fashion and searched by BLAST (http://www.ncbi.nlm.nih.gov).

Non-host sequence queries within BLAST revealed that each sample was infected with

PepMV, PSTVd, and STV with the results from sequencing indicated in Appendix H. The presence of both PepMV and PSTVd was later confirmed via qPCR and the presence of STV was confirmed via RT-PCR by producing an anticipated 470 bp PCR product (Figure 3.3).

71

(A)

Figure 3.3. (A) Real-time RT-qPCR amplification using RNA from leaves acquired from Alberta and the PSTVd-specific primer/probe set PSTVd-F/R/P. (B) Real-time RT-qPCR amplification using the same RNA and the PepMV-specific primer/probe set Pep-F2/R2/P2. (C) Agarose gel electrophoresis of RT-PCR products from cDNA generated from the same RNA amplified using the STV-specific primers STV-F/R.

72

(B)

(C)

Figure 3.3. (continued) (A) Real-time RT-qPCR amplification using RNA from leaves acquired from Alberta and the PSTVd-specific primer/probe set PSTVd-F/R/P. (B) Real-time RT-qPCR amplification using the same RNA and the PepMV-specific primer/probe set Pep-F2/R2/P2. (C) Agarose gel electrophoresis of RT-PCR products from cDNA generated from the same RNA amplified using the STV-specific primers STV-F/R.

73

These samples were mechanically inoculated back to healthy tomato (S. lycopersicum).

Leaf sap from these samples were inoculated in a 1:3 (sap:phosphate buffer) ratio to tomato.

After 21 days post-inoculation, RNA was isolated from each sample and RT-PCR and RT-qPCR was performed for each target originally confirmed via NGS, RT-PCR and RT-qPCR (result not shown). After mechanical inoculation, PepMV and PSTVd were confirmed to be present. Since no live plant containing STV was available, sap from DIA17-130, DIA17-131, and DIA17-132 was mixed in a 1:3 ratio (sap:phosphate buffer) were injected via syringe into the petioles, nodes, internodes, and midribs in all locations of healthy tomato plants. After 21 days post-inoculation,

STV could still not be detected.

3.2.5 Sensitivity Assay of the Illumina Sequencing Platform

Symptomatic leaf tissue (approximately 1.0 g) from plants infected with PotLV, PLRV, and PVX were ground to liquid leaf sap and diluted 1:8 each in the same tube with liquid sap from healthy potato leaf tissue. This 1:8 dilution was then serially diluted with further healthy potato leaf tissue to 1:4096 to create a dilution series to test the sensitivity of the NGS assay. cDNA libraries were then created from each dilution using the TruSeq Stranded Total RNA

Library Prep Kit with plant rRNA Removal Reagents (Illumina, San Diego, USA). Upon cDNA library generation, enrichment, and validation, each library was quantified via qPCR using the

KAPA Biosystems Library Quantitation Kit (KAPA Biosystems, Wilmington, USA) (result not shown). Following the calculation of the concentration of each cDNA library for both runs, each sample was pooled to generate a 5 pM pooled library which was sequenced using a MiSeq

Reagent Kit v.3 (150 cycle) (Illumina, San Diego, USA). Upon mapping to the host genome (S. tuberosum) host sequence was removed and any remaining non-host sequence was assembled in

74 a de novo fashion and searched by BLAST (http://www.ncbi.nlm.nih.gov). Results from this sequencing run are shown in Table 3.11.

Table 3.11. Summary of data generated after sequencing a twofold dilution series consisting of 3 viruses (PVX, PLRV, PotLV) ranging from undilute to 1:4096 in healthy potato leaf sap against the constructed S. tuberosum genome published in GenBank. Sample Total Reads Total Reads % Mapped % Virus Mapped Unmapped Unmapped Detected 1:8 2302846 760474 75.17 29.83 PVX, PLRV, PotLV 1:16 1698716 156800 91.55 8.45 PVX, PLRV, PotLV 1:32 5544757 147055 97.42 2.58 PVX, PLRV, PotLV 1:64 5692042 118752 94.96 2.04 PVX, PLRV, PotLV 1:128 7810303 465051 94.78 5.62 PVX, PLRV, PotLV 1:256 1132095 23349 97.98 2.02 PVX, PLRV, PotLV 1:512 15723812 334166 97.92 2.08 PVX, PLRV, PotLV 1:1024 4775128 71608 98.52 1.48 PVX, PLRV, PotLV 1:2048 10017732 331550 96.80 3.20 PVX, PLRV, PotLV 1:4096 5973021 203137 96.71 3.29 Neg

3.2.6 GenBank Submissions

From the several sequencing reactions used in the evaluation of the TruSeq Stranded

Total RNA Library Preparation Protocol, there were several complete and partial viral genomes constructed. These complete sequences along with some incomplete sequences have been submitted to GenBank (https://www.ncbi.nlm.nih.gov/) for publication (Table 3.12). Some incomplete sequences were also submitted to contribute to a small number that are currently available. Both the BankIt and Sequin submission portals were used to submit sequences.

75

Table 3.12. Selected complete and incomplete sequences submitted directly to GenBank obtained in this study. Accession Virus Source Complete/Partial Genome MF133523 PAMV DIA15-016-1 Complete MG356504 PAMV DIA15-016-5 Complete MG356506 PAMV DIA15-017-1 Complete MF133525 PAMV DIA15-018-2 Complete MF133526 PAMV DIA15-018-4 Complete MF133527 PVM DIA15-016-1 Complete MF133529 PVM DIA15-016-3 Complete MG356508 PVM DIA15-016-5 Complete MG356509 PVM DIA15-017-1 Complete MF133529 PVM DIA15-018-2 Complete MF133530 PVM DIA15-018-4 Complete MF622198 PVYC DIA15-016-2 Complete MF622199 PVYC DIA15-017-1 Complete MF622200 PVYC DIA15-017-2 Complete MF622201 PVYC DIA15-017-3 Complete MF622202 PVYO DIA15-017-5 Complete MF678621 ToCV DIA15-016-3 Partial, minor coat protein MF678622 ToCV DIA15-016-3 Partial, p17 protein MF678623 ToCV DIA15-017-2 Partial, minor coat protein MG356502 PLRV DIA15-016-5 Complete MG356504 PLRV DIA15-018-4 Complete MF503887 PVS DIA15-016-1 Complete MF503888 PVS DIA15-016-2 Complete MF503889 PVS DIA15-016-3 Complete MF503890 PVS DIA15-016-4 Complete MG356510 PVS DIA15-016-5 Complete MF503891 PVS DIA15-017-1 Complete MF503892 PVS DIA15-017-5 Complete MF503893 PVS DIA15-018-2 Complete MF503894 PVS DIA15-018-4 Complete MK391568 PepMV DIA17-130 Complete MK391569 PepMV DIA17-131 Complete MK391570 PepMV DIA17-132 Complete MK391571 STV DIA17-132 Partial, coat protein MH069212 PVX PVX-1 Complete MF990287 PotLV PotLV Complete MG356503 PLRV PLRV-3 Complete

76

CHAPTER 4

DISCUSSION

77

4.0 DISCUSSION

4.1 Requirement of High Quality RNA for cDNA Library Construction

Molecular techniques such as NGS require a highly sensitive and efficient method for

RNA analysis. Thermodynamically, RNA is a stable molecule but is rapidly digested by the presence of ubiquitous RNases commonly present on almost any surface. Because of this, shorter

RNA fragments are commonly present in a sample than originally transcribed having the potential to compromise downstream results and applications. Similarly, high-quality RNA isolation from plant tissues, particularly to a stressed plant, is challenging. Stressed plants have inhibitory biomolecules present such as phenolic compounds, polysaccharides, carbohydrates, secondary metabolites, and lipids that could either bind or co-precipitate with RNA making it unsuitable for downstream application. Most standard isolation procedures require modifications in their protocols depending on the plant tissue type (Friedman, 2004; Gambino et al., 2008;

Nadiya et al., 2015; Schroeder et al., 2006; Srivastava et al., 2012). Many of the above- mentioned biomolecules are also inhibitors of PCR. This is an important factor to be considered as PCR is an important technique exploited by the SBS chemistry utilized by the Illumina NGS platform. For successful cDNA amplification, the choice of the type of nucleic acid isolation is therefore undoubtedly crucial but is heavily dependent on the type and nature of the samples themselves (Thaler and Bajc, 2013).

Good quality plant-derived RNA is generally universally measured as the intensity of 25

S rRNA being approximately double that of the 18 S rRNA band on an agarose gel. However, this is subject to human interpretation and may vary from person to person within a laboratory.

In the case of this study, the Agilent 2100 Bioanalyzer (Agilent, Santa Clara, USA) was used as an alternative tool to assess RNA quality to produce data that expresses RNA quality via a RIN

78 score (Schroder et al., 2006) on a scale of 1 (totally degraded) to 10 (fully intact). For plant samples, RIN values above 6.0 can be considered intact and acceptable for highly sensitive downstream applications (Nadiya et al., 2015). The Bioanalyzer instrument uses a microfluidic chip and laser-induced fluorescence (LIF) detection on a miniaturized scale (Srivastava et al.,

2012). This allows for minimal amount of sample to be used (1 µL) so that the remainder can be used for downstream analysis but also calculates RNA concentration with accuracy that is comparable with the universally recognized Nanodrop 8000 spectrophotometer (Thermo Fisher,

Waltham, USA).

This study examined a number of commonly used RNA isolation techniques. They included traditional Tri-Reagent phenol:chloroform reaction, silica column based, and magnetic bead-based methods. It was determined that the PureLink Total RNA Mini Kit (Thermo Fisher,

Waltham, USA) modified protocol yielded the highest quality and most intact RNA (with RIN

>6.0) out of all methods. Previous studies have shown that high quality RNA is isolated using

CTAB (cetyl trimethylammonium bromide) methods combined with the presence of β- mercaptoethanol (Gambino et al., 2008; Nadiya et al., 2015; Srivastava et al., 2012) using silica columns (Gayral et al., 2011; Thaler and Bajc, 2013). The PureLink Total RNA Mini Kit (Table

3.2) combines all of these features including a CTAB-based lysis buffer containing 1% β- mercaptoethanol and silica columns to capture and wash precipitated RNA. The CTAB buffer is a strong protein denaturant and binds to many phenolics and secondary metabolites present in the crude plant extract that have ringed molecular structures to pass them through the silica matrix

(Nadiya et al., 2015). β-mercaptoethanol also denatures proteins by disrupting disulfide bonds which eliminates exogenous RNases. The silica matrix of the spin column adsorbs the RNA with the aid of a high pH and salt concentration binding solution (Wash Buffer I) allowing for the

79 removal of contaminants. It is due to the high quality and concentration of the RNA yielded in this preparation that allows for an internal control to be omitted from the general sequencing reaction. The other RNA isolation techniques used in this study either use guanidinium-based salts that are strong protein denaturants but do not eliminate secondary metabolites and do not eliminate any potential contaminants. Despite having comparable yields, the quality of the RNA that was finally eluted was poor to not determinable (Tables 3.3 and 3.4).

4.2 Evaluation of a Commercial Library Preparation Kit for Diagnosis

It is clear that the TruSeq Stranded Total RNA Library Prep Kit (Illumina, San Diego,

USA) is robust in its capability to generate cDNA libraries for the Illumina platform. There are other kits commercially available that are compatible with the Illumina platform in the sense that the adapter and index sequences ligated to the 3’-ends of the cDNA are the same as those in the

Illumina kit. Such kits include the NEBNext Ultra RNA Library Prep Kit for Illumina (New

England Biolabs, Ipswich, USA) and the KAPA RNA HyperPrep Kit (KAPA Biosystems,

Wilmington, USA) and were also used to generate cDNA libraries for this study (results not shown). However, these cDNA library generation kits do not include reagents targeted at removing host rRNA which could comprise as much as 80-90% of a single RNA extract (Lowe et al., 2017). Therefore, libraries generated for samples using these kits were heavily contaminated with rRNA and contained only small amounts of non-host sequence. This was not considered ideal as this study is aimed at evaluating NGS for the de novo detection of plant viruses and were not considered ideal due to the high quantity of host sequence that is acquired.

The number of unique sequence reads acquired from each sample varies from library to library (Tables 3.6, 3.8, and 3.11). The quality of each RNA sample is not equivalent and the

80 concentration of rRNA in each sample is not equivalent. Therefore, varying amounts of host mRNA and non-host RNA are present for sequencing. Furthermore, during the generation of each cDNA library, there are many pipetting and magnetic bead purification steps involved which will alter the quantity of RNA and cDNA that can be bound in a single reaction. The pipetting skills between investigators will also not be standard or equivalent which can have large downstream impacts on each sample. Despite the fact that each library is normalized to

4nM after pooling, the accuracy of library pooling is dependent on the efficiency of the qPCR reaction to quantify and the quality of the standard curve that is generated. Furthermore, even slight quantities of primer-dimers after the PCR fragment enrichment step could cause the subsequent qPCR quantitation reaction produce a lower CT value than it should, thus causing the investigator to calculate a higher concentration of library to pool than exists in the sample

(Figures 3.1-3.6).

4.2.1 Accurate Detection of Mixed Infections in a Single Sample

The TruSeq Stranded Total RNA Library Prep Kit is, however, able to generate millions of sequence reads from various plant pathogens without a priori knowledge of their presence in the sample. In the case of the plants generated from tubers intercepted from Bangladesh, multiple viruses were detected in each sample (Tables 3.7 and 3.9). Within plants, mixed viral infections may be more common than single infections. It is unclear, in this case, if the multiple infections that were determined to be present in each sample is an example of co-infection (two or more viruses invade the host simultaneously) or super-infection (different viruses or strains of viruses infect the host at different times). It is also unclear in this mixed infection if these viruses co- infecting the same host are interacting in a synergistic or antagonistic way. Synergistic

81 interactions have a facilitative effect on at least one (or all) of the viral partners leading to an increase in virus replication in the host plant. Antagonistic types of interactions occur when only one virus is the beneficiary and its presence or activity lowers the fitness of the second virus

(Syller, 2012). These types of synergistic interactions could be a possible explanation as to why approximately 95% of unmapped reads (as in Table 3.6) do not map back to the host genome.

Multiple infections and multiple different types of interactions could be vastly increasing the quantity of viral RNA present leading to an increased quantity of unmapped reads after host rRNA depletion.

Plant viral synergism has been widely studied (Garcia-Cano et al., 2006; Gonzalez-Jara et al., 2004; Pruss et al., 1997). The most commonly studied synergistic relationship that has been studied relates to that of PVY and PVX. This synergistic response is mediated by the expression of HC-Pro of PVY, a multifunctional protein controlling diverse processes of the viral cycle including suppression of PTGS. HC-Pro interferes with multiple silencing pathways leading to an enhancement of disease symptoms and a 10-fold increase in PVX titre compared to a single infection (Syller, 2012). This co-infection (along with the presence of PVS, PVM, and ToCV) was found to be present in DIA15-16-3 (Table 3.7) of the first sequencing run of Bangladesh samples. However, it cannot be determined for certain if a high titre of PVX within the plant is a result of co-infection with PVY in the sample, or a result of the presence of the other viruses. For example, ToCV was also present and has been shown to be involved synergistically as it has shown to increase the titre of a related virus, Tomato infectious chlorosis virus (TICV), in tobacco (Syller, 2012; Wintermantel et al., 20008). Similarly, PAMV, another , could have had a fold change in its titre due to the presence of PVY in each sample (Tables 3.7 and

3.9).

82

Helper dependence is another synergistic phenomenon that occurs. Some viruses are dependent on other viruses for certain phases of their life cycles. Syller (2012) has reviewed that some viruses cannot be transmitted by aphids. However, if the host plant is co-infected with a virus from the family Luteoviridae (such as PLRV), the dependent virus can then become aphid- transmissible. This would then theoretically change the capability of ToCV to be transmitted by aphids in samples DIA15-016-2 and DIA15-018-4 (Table 3.9) of the second set of Bangladesh samples as ToCV is only currently known to be transmitted by whiteflies (Wintermantel et al.,

2008).

It is also unknown, due to the number of viruses present in these samples as well as their different combinations, if there is possibly an antagonistic relationship between two viruses occurring. Cross-protection could have taken place where a previous infection with one virus prevents or interferes with subsequent infection by a homologous virus. Mutual exclusion is not common for interactions between viruses and plants but could occur when two or more viruses infect the host at the same time. Two or more viruses could be competing with each other for the colonization of specific cells in certain areas of the plant which could lead to a decrease in the titre of one or both and is reviewed extensively be Elena et al., (2011) (Syller, 2012).

Using the protocol for screening imported PPEQ accessions has shown to be successful in the detection of plant virus without a priori knowledge of infection. The tested PPEQ accessions (Table 2.1) arrived from several destinations worldwide and displayed symptoms in bioassay typical to those of viral infection. These included mosaic mottling and chlorosis both in the mother plant and bioassay plants suggesting that the mother plants could be infected. In the case of these accessions (Table 3.10), five accessions arriving from Chile and one from the

United Kingdom were determined through NGS and later confirmed via RT-PCR to be virus-

83 positive and were rejected for import to Canada. For the Chilean accessions, it is normal for imports from not only Chile but also from Peru, Colombia, Venezuela, and other Andean regions to be rejected. Coastal Andean regions of Chile and Peru constitute the centre of origin and domestication for many cultivated potato, tomato, and pepper species. Many of the potyiruses that infect these crops likely originated in these coastal regions and spread via contaminated germplasm as was prevented in this case. Similarly, this region is unique from other potato- growing regions because of a great abundance of wild Solanum spp. relatives that grow in close proximity to domesticated crops which contributes to the common exchange of viruses between them (Spetz et al., 2003). The single accession from the United Kingdom is a more uncommon rejection. Although PLRV has worldwide distribution, phytosanitary practices in the United

Kingdom have PLRV relatively well controlled by pesticide application for aphid vectors and aphid monitoring. However, the result does highlight the necessity of the PPEQ program and its ability to safeguard the seed potato certification program in Canada.

4.2.2 Use of NGS in Diagnostic Testing Leads to New Characterizations

This study was also able to characterize STV for the first time in Canada (Figure 3.7)

(Hammill and Xu, 2019). Up to this point, STV has only been characterized in southern

California and Mississippi of the United States, southwestern Mexico (Sabanadzovic et al.,

2009), France (Candresse et al., 2013), Spain (Verbeek et al., 2015) and Italy (Iacono et al.,

2015). In addition to the presence of STV, complete genome assemblies of PepMV and PSTVd were also acquired from each Canadian sample. The presence of STV in a mixed infection with

PepMV or some other viruses is common (Iacono et al., 2015; Padmanabhan et al., 2015;

Verbeek et al., 2015) and therefore the observed symptoms cannot be attributed solely to STV. It

84 has been reported that STV may cause hosts to be asymptomatic but may work in a synergistic relationship with other pathogens (Puchades et al., 2017). We were also able to confirm the observations made by Candresse et al., (2013) and Alcala-Briseno et al., (2017) that STV is only known to be transmitted vertically as mechanical inoculation, graft inoculation, and syringe injection had all failed to transmit to a healthy host tomato (Hammill and Xu, 2019).

4.3 Sensitivity of the NGS Platform

The NGS protocol that was evaluated has shown sensitivity to 1:2096 when sap from

PotLV, PVX, and PLRV were serially diluted. Complete genome assemblies were created from contigs of all three targets until 1:32 dilution with incomplete assemblies and single contigs to

1:2096. Sensitivity decreases when there are higher insertion and deletion costs associated with de novo assembly in the bioinformatic workflow (Table 2.3). Specificity is decreased when lower mismatch, insertion, and deletion penalties are assessed during de novo assembly of the unmapped reads for the Bangladesh samples. Yam latent virus (YLV) and Hop mosaic virus

(HpMV), two carlaviruses, replaced contigs from PVM and PVS, also carlaviruses. YLV and

HpMV were determined not to be present in these samples via RT-PCR (result not shown). This would indicate that less stringent scoring of the unmapped reads leads to lower specificity by creating assemblies of sequences that are conserved among the carlaviruses (or similar groups).

Higher mismatch, insertion, and deletion penalties led to a decrease in sensitivity has single contigs in diluted samples were more highly scrutinized. When higher scoring penalties were applied to mapping reads to the host genome (Table 2.2), there was no change in sensitivity and specificity owing to the fact that only non-host sequences are desired. From the designed bioinformatic workflow using the scoring scheme as shown in Tables 2.2 and 2.3, 37 total

85 sequences including 33 complete viral genomes (Table 3.12) have been published in GenBank. It is to be noted, however, that these targets were serially diluted from an unknown starting concentration and any future sensitivity evaluation should include analysis samples of known viral titre. This would be difficult to achieve in nature due to the synergistic interactions that occur in mixed natural infection, but possibly be controlled under laboratory conditions through the use of plaque assays and spectrophotometric measurement of viral RNA.

While attempting to identify unknown agents, the recovery of virus sequence does not provide a direct link to the cause of disease. Assembled sequences are scrutinized only by computational methods and the quality and accuracy of the assembly is only as precise as the quality of the input. Therefore, poor sample quality and therefore poor RNA quality could lead to the detection of a conserved sequence of a member of the same group leading to a decreased specificity. Despite its high accuracy, Illumina sequencing platforms do come with their share of challenges such as the implementation of quality and calibration controls for sequencing runs.

Traditionally, bacteriophage PhiX genomic DNA has been used as a control for sequencing runs.

For example, PhiX spike-in at 1% for high diversity samples can be used to necessitate subsequent quality control analysis downstream after reference mapping and de novo assembly

(Mukherjee et al., 2015). However, for high diversity samples, even a 1% spike-in could take away physical space on the flow cell for viral targets present in low titre as evident in Table

3.11.

4.4 Conclusion and Future Direction

Detecting and identifying new plant viruses currently relies on a wide range of techniques; some traditional and some modern. Symptomology can be similar for many different

86 viral groups meaning that the screening process begins using a broad-stroke approach of testing for a range of common and known viruses. These specific tests utilize reagents with a finite specificity capable of detecting specific strains or species. However, in many cases, these reagents may have difficulty detecting new variants that may arise. As a result, these pathogens can evade detection and have the possibility of spreading rapidly. The metagenomics approach discussed here has the ability to study the microbial population by analyzing nucleotide sequence content that can be applied to a wide range of plant samples from different species. This metagenomics approach to diagnostics offers the possibility of overcoming the problems of pathogen prediction. Sequences produced from an infected plant include sequences from pathogens present and the extraction of RNA from the infected plant can produce cDNA from all nucleic acid present via random priming to reveal a wide range of potential pathogens without discrimination. The TruSeq Stranded Total RNA Library Prep Kit is very preferable to use in this case as it provides a reagent that depletes host rRNA making the capability of detecting non-host sequences, in theory, approximately 4-5 times more robust. This approach may also have the ability to detect RNA species from other organisms such as bacteria of fungi that may be present in the sample.

The NGS diagnostic technique can also be expanded to lead to the exploration of a variety of virus-virus interactions within a particular host. Subsequent sequence data may provide insight on the generation of variants that show novel genetic features that have impacts on the viral population as a whole. Interactions among viruses, particularly in the cases of synergistic relationships, can be crucial for understanding pathogenesis and evolution which can further contribute to the development of control and eradication strategies as synergistic relationships have epidemiological and economic implications. The increased multiplication of

87 one or more interacting viral partners can also modify host range or vector transmission or can make non-host plants hosts.

Unlike traditional techniques, this diagnostic technique requires no a priori knowledge of the presence of a suspected pathogen. The technique does not utilize virus-specific reagents such as antibodies for ELISA or PCR primers and probes. NGS screening of plant samples also allows for a sensitive and rapid approach in characterizing new pathogens in a way that allows for timely epidemiological work to be completed and potentially enabling control or eradication strategies to occur. Used in the context of the PPEQ program, the NGS diagnostic technique takes the burden of judgement calls for rejection or acceptance of a particular cultivar off of the

CFIA and provides certainty that the Canadian Potato Program is safeguarded from the import of pathogens that have the potential to devastate the industry.

88

REFERENCES

Adams, M.J., Antoniw, F.J., Beaudoin, F. 2005a. Overview and analysis of polyprotein cleavage sites in the family Potyviridae. Mol. Plant Pathol., 6: 471-487. doi: 10.1111/J.1364- 3703.2005.00296.X

Adams, M.J., Antoniw, F.J., Fauquet, C.M. 2005b. Molecular criteria for genus and species discrimination within the family Potyviridae. Arch. Virol., 150: 459-479. doi: 10.1007/s00705- 004-0440-6

Adams, I.P., Glover, R.H., Monger, W.A., Mumford, R., Jackeviciene, E., Navalinskiene, M., Samuitiene, M., Boonham, N. 2009a. Next-generation sequencing and metagnomic analysis: a universal diagnostic tool in plant virology. Mol. Plant Pathol., 10: 537-545. doi: 10.1111/J.1364- 3703.2009.00545.X

Adams, M.J., Antoniw, F.J., Kreuze, J. 2009b. Virgaviridae: a new family of rod-shaped viruses. Arch. Virol., 154: 1967-1972. doi: 10.1007/s00705-009-0506-6

Adkins, S. 2000. Tomato spotted wilt virus – positive steps towards negative success. Mol. Plant Pathol., 1: 151-157.

Al Rwahnih, M., Sudarshana, M.R., Uyemoto, J.K., Rowhani, A. 2012. Complete genome sequence of a novel isolated from grapevine. J. Virol., 86: 9545. doi: 10.1128/JVI.01444-12

Alcala-Briseno, R.I., Coskan, S., Londono, M.A., Polston, J.E. 2017. Genome sequence of Southern tomato virus in asymptomatic tomato “Sweet Hearts”. Genome. Announc., 5: e01374- 16. doi: 10.1128/genomeA.01374-16.

Amuge, T., Berger, D.K., Katari, M.S., Myburg, A.A., Goldman, S.L., Ferguson, M.E. 2017. A time series transcriptome analysis of cassava (Manihot esculenta Crantz) varieties challenged with Ugandan Cassava brown streak virus. Nat. Sci. Rep., 7: 9747. doi: 10.1038/s41598-017- 09617-z

Aramburu, J., Galipienso, L., Matas, M. 2006. Characterization of Potato virus Y isolates from tomato crops in northeast Spain. Eur. J. Plant Pathol., 115: 247-256. doi: 10.1007/s10658-006- 9003-x

89

Bandyopadhyay, A., Kopperud, K., Anderson, G., Martin, K., Goodin, M. 2010. An integrated protein localization and interaction map for Potato yellow dwarf virus, type species of the genus Nucleorhabdovirus. Virol., 402: 61-71. doi: 10.1016/j.virol.2010.03.013

Baulcombe, D.C., Lloyd, J., Manoussopoulos, I.N., Roberts, I.M., Harrison, B.D. 1993. Signal for potyvirus-dependent aphid transmission of Potato aucuba mosaic virus and the effect of its transfer to Potato virus X. J. Gen. Virol., 74: 1245-1253.

Barba, M. and Hadidi, A. 2015. An overview of plant pathology and application of next- generation sequencing technologies. CAB Rev., 10: 1-21. doi: 10.1079/PAVSNNR201510005

Barba, M., Czosnek, H., Hadidi, A. 2014. Historical perspective, development and applications of next-generation sequencing in plant virology. Viruses, 6: 106-136. doi: 10.3390/v6010106

Bertolini, E., Cambra, M., Serra, P., Lopez, M.M., Lopes, S., Duran-Villa, N., Ayres, J., and Bove, J. 2010. Direct procedure for specific detection of Potato spindle tuber viroid and Citrus exocortis viroid immobilized by RT-PCR and real-time targets for detection. Spanish Patent 2.387.172.

Blawid, R., Silva, J.M.F., Nagata, T. 2017. Discovering and sequencing new plant viral genomes by next-generation sequencing: description of a practical pipeline. Ann. Appl. Biol., 170: 301- 314. doi: 10.1111/aab.12345

Bol, J.F. 1999. Alfalfa mosaic virus and ilarviruses: involvement of coat protein in multiple steps of the replication cycle. J. Gen. Virol., 80: 1089-1102.

Botermans, M., van de Vossenberg, B.T.L.H., Verhoeven, J.T.J., Roenhorst, J.W., Hooftman, M., Dekter, R., Meekes, E.T.M. 2013. Development and validation of a real-time RT-PCR assay for generic detection of pospiviroids. J. Virol. Meth., 187: 43-50. doi: 10.1016/j.jviromet.2012.09.004

Brattey, C., Badge, J.L., Burns, R., Foster, G.D., George, E., Goodfellow, H.A., Mulholland, V., McDonald, J.G., Jeffries, C.J. 2002. Potato latent virus: a proposed new species in the genus Carlavirus. Plant Pathol., 51: 495-505. doi: 10.1046/j.1365-3059.2002.00729.x

Brown, C.R. 1993. Origin and history of the potato. Am. Pot. J., 70: 363-373.

Candresse, T., Marais, A., Faure, C. 2013. First report of Southern tomato virus on tomatoes in southwest France. Plant Dis., 97: 1124. doi: 10.1094/PDIS-01-13-0017-PDN

90

Candresse, T., Filloux, D., Muhire, B., Julian, C., Galzi, S., Fort, G., Bernardo, P., Daugrois, J.- H., Fernandez, E., Martin, D.P., Varsani, A., Roumagnac, P. 2014. Appearances can be deceptive : revealing a hidden viral infection with deep sequencing in a plant quarantine context. PLoS ONE, 9: e102945. doi: 10.1371/journal.pone.0102945.

CFIA Biohazard and Containment Safety, 2010. Containment standards for facilities handling aquatic and animal pathogens. 1st ed., Canadian Food Inspection Agency.

Chen, J., Chen, J., Adams, M.J. 2001. A universal PCR primer to detect members of the Potyviridae and is use to examine the taxonomic status of several members of the family. Arch. Virol., 146: 757-766.

Chikhi, R. and Medvedev, P. 2014. Informed and automated k-mer size selection for genome assembly. Bioinformatics, 30: 31-37. doi: 10.1093/bioinformatics/btt310

Chiumenti, M., Torchetti, E.M., Di Serio, F., Minafra, A. 2014. Identification and characterization of a viroid resembling apple dimple fruit viroid in fig (Ficus carica L.) by next generation sequencing of small RNAs. Virus Res., 188: 54-59. doi: 10.1016/j.virusres.2014.03.026

Codoner, F.M. and Elena, S.F. 2008. The promiscuous evolutionary history of the family Bromoviridae. J. Gen. Virol., 89: 1739-1747. doi: 10.1099/vir.0.2008/000166-0

Cooke, D.E.L., Cano, L.M., Raffaele, S., Bain, R.A., Cooke, L.R., Etherington, G.J., Deahl, K.L., Farrer, R.A., Gilroy, E.M, Goss, E.M., Grunwald, N.J., Hein, I., MacLean, D., McNichol, J.W., Randall, E., Oliva, R.F., Pel, M.A., Shaw, D.S., Squires, J.N., Taylor, M.C., Vleeshouwers, V.G.A.A., Birch, R.R.J., Lees, A.K., Kamoun, S. 2012. Genome analyses of an aggressive and invasive lineage of the Irish potato famine pathogen. PLoS Path., 8: e1002940. doi: 10.1371/journal.ppat.1002940

Correa, R.L., Silva, T.F., Simoes-Araujo, J.L., Barroso, P.A.V., Vidal, M.S., Vaslin, M.F.S. 2005. Molecular characterization of a virus from the family Luteoviridae associated with cotton blue disease. Arch. Virol., 150: 1357-1367. doi: 10.1007/s00705-004-0475-8

Culver, J.N. and Padmanabhan, M.S. 2007. Virus-induced disease: altering host physiology one interaction at a time. Annu. Rev. Phytopathol., 45: 221-243. doi: 10.1146/annurev.phyto.45.062806.094422

91

Del Fabbro, C., Scalabrin, S., Morgante, M., Girogi, F.M. 2013. An extensive evaluation of read trimming effects on Illumina data analysis. PLoS ONE, 8: e85024. doi: 10.1371/journal.pone.0085024

Di Serio, F., Gisel, A., Navarro, B., Delgado, S., Martinez de Alba, A.-E., Donvito, G., Flores, R. 2009. Deep sequencing of the small RNAs derived from two symptomatic variants of a chloroplastic viroid: implications for their genesis and for pathogenesis. PLoS ONE, 4: e7539. doi: 10.1371/journal.pone.0007539.

Donaire, L, Barajas, D., Martinez-Garcia, B., Martinez-Priego, L., Pagan, I., Llave, C. 2008. Structural and genetic requirements for the biogenesis of Tobacco rattle virus-derived small interfering RNAs. J. Virol., 82: 5167-5177. doi: 10.1128/JVI.00272-08

Elbeaino, T., Giampetruzzi, A., de Stradis, A., Digiaro, M. 2014. Deep-sequencing analysis of an apricot tree with vein clearing symptoms reveals the presence of a novel betaflexivirus. Virus Res., 181: 1-5. doi: 10.1016/j.virusres.2013.12.030

Elena, S.F., Bedhomme, S., Carrasco, P., Cuevas, J.M., de la Iglesia, F., Lafforgue, G., Lalic, J., Prosper, A., Tromas, N., Zwart, M.P. 2011. The evolutionary genetics of emerging plant RNA viruses. Mol. Plant Microbe Interact., 24: 287-293. doi: 10.1094/MPMI-09-10-0214

EPPO Standard PM 8(1) – Commodity Standard for Potato. Post-entry quarantine for potato. EPPO Bulletin, 34: 443-454.

Francki, R.I.B. 1985. Plant virus satellites. Ann. Rev. Microbiol., 39: 151-174.

Friedman, M. 2004. Analysis lf biologically active compounds in potatoes (Solanum tuberosum), tomatoes (Lycopersicon esculentum), and jimson weed (Datura stramonium) seeds. J. Chrom., 1054: 143-155. doi: 10.1016/j.chroma.2004.04.049

Frost, K.E., Groves, R.L., Charkowski, A.O. 2013. Integrated control of potato pathogens through seed potato certification and provision of clean seed potatoes. Plant Dis., 97: 1268-1280.

Gambino, G., Perrone, I., Gribaudo, I. 2008. A rapid and effective method for RNA extraction from different tissues of grapevine and other woody plants. Phytochem. Analysis, 19: 520-525. doi: 10.1002/pca.1078

Garcia-Cano, E., Resende, R.O., Fernandez-Munoz, R., Moriones, E. 2006. Synergistic interaction between Tomato chlorosis virus and Tomato spotted wilt virus results in breakdown of resistance in tomato. Phytopathol., 96: 1263-1269. doi: 10.1094/PHYTO-96-1263

92

Garcia-Marcos, A., Pacheco, R., Martinez, J., Gonzalez-Jara, P., Diaz-Ruiz, J.R., Tenllado, F. 2009. Transcriptional changes and oxidative stress associated with the synergistic interaction between Potato virus X and Potato virus Y and their relationship with symptom expression. Mol. Plant Microb. Interact., 22: 1431-1444. doi: doi.org/10.1094/MPMI-22-11-1431

Gayral, P., Weinert, L., Chiari, Y., Tsagkogeorga, G., Ballenghien, M., Galtier, N. 2011. Next- generation sequencing of transcriptomes: a guide to RNA isolation in nonmodel animals. Mol. Ecol. Resources, 11: 650-661. doi: 10.1111/j.1755-0998.2011.03010.x

Giguere, T., Adkar-Purushothama, C.R., Bolduc, F., Perrault, J.-P. 2014. Elucidation of the structures of all members of the Avsunviroidae family. Mol. Plant Pathol., 15: 767-779. doi: 10.1111/mpp.12130

Gonzalez-Jara, P., Tenllado, F., Martinez-Garcia, B., Atencio, F.A., Barajas, D., Vargas, M., Diaz-Ruiz, J., Diaz-Ruiz, J.R. 2004. Host-dependent differences during synergistic infection by potyviruses with Potato virus X. Mol. Plant Pathol., 5: 29-35. doi: 10.1046/J.1364- 3703.2004.00202.X

Gutierrez, P.A., Alzate, J.F., Marin-Montoya, M.A. 2013. Complete genome sequence of a novel Potato virus S strain infecting Solanum phureja in Columbia. Arch. Virol., 158: 2205-2208. doi: 10.1007/s00705-013-1730-7

Gutierrez-Aguirre, I., Mehle, N., Delic, D., Gruden, K., Mumford, R., Ravnikar, M. 2009. Real- time quantitative PCR based sensitive detection and genotype discrimination of Pepino mosaic virus. J. Virological Meth., 162: 46-55. doi: 10.1016/j.jviromet.2009.07.008

Hagen, C., Frizzi, A., Kao, J., Jia, L., Huang, M., Zhang, Y., Huang, S. 2011. Using small RNA sequences to diagnose, sequence, and investigate the infectivity characteristics of vegetable- infecting viruses. Arch. Virol., 156: 1209-1216. doi: 10.1007/s00705-011-0979-y

Hajeri, S., Ramadugu, C., Keremane, M., Vidalakis, G., Lee, R. 2010. Nucleotide sequence and genome organization of Dweet mottle virus and its relationship to members of the family Betaflexiviridae. Arch. Virol., 155: 1523-1527. doi: 10.1007/s00705-010-0758-1

Hajimorad, M.R., Domier, L.L., Tolin, S.A., Whitham, S.A., Saghai Maroof, M.A. 2018. Soybean mosaic virus: a successful potyvirus with a wide distribution but restricted natural host range. Mol. Plant Pathol., 19: 1563-1579. doi: 10.1111/mpp.12644

Hammill, D. and Xu, H. 2019. First report of Southern tomato virus on tomato in Canada. Plant Dis., in press.

93

Hirsch, C.N., Hirsch, C.D., Felcher, K., Coombs, J., Zarka, D., van Deynze, A., de Jong, W., Veilleux, R.E., Jansky, S., Bethke, P., Douches, D.S., Buell, C.R. 2013. Retrospective view of North American potato (Solanum tuberosum L.) breeding in the 20th and 21st centuries. G3, 3: 1003-1013. doi: 10.1534/g3.113.005595

Hohn, T. 2007. Plant virus transmission from the insect point of view. Proc. Natl. Acad. Sci., 104: 17905-17906. doi: 10.1073_pnas.0709178104

Hu, C.-C., Hsu, Y.-H., Lin, N.-S. 2009. Satellite RNAs and satellite viruses of plants. Viruses, 1: 1325-1350. doi: 10.3390/v1031325

Iacono, G., Hernandez-Llopis, D., Alfaro-Fernandez, A., Davino, M., Font, M.I., Panno, S., Galipenso, L., Rubio, L, Davino, S. 2015. First report of Southern tomato virus in tomato crops in Italy. New Dis. Rep., 32: 27. doi: 10.5197/j.2044-0588.2015.032.027

Ivanov, K.I. and Makinen, K. 2012. Coat proteins, host factors and plant viral replication. Curr. Opin. Virol., 2: 712-718. doi: 10.1016/j.coviro.2012.10.001

Jacquemond, M. 2012. Cucumber mosaic virus. Adv. Virus Res., 84: 439-504. doi: 10.1016/B978-0-12-394314-9.00013-0

James, D., Varga, A., Lye, D. 2014. Analysis of the complete genome of a virus associated with twisted leaf disease of cherry reveals evidence of a close relationship to unassigned viruses in the family Betaflexiviridae. Arch. Virol., 159: 2463-2468. doi: 10.1007/s00705-014-2075-6

Jang, C., Wang, R., Wells, J., Leon, F., Farman, M., Hammond, J., Goodin, M.M. 2017. Genome sequence variation in the constricta strain dramatically alters the protein interaction and localization map of Potato yellow dwarf virus. J. Gen. Virol., 98: 1526-1536. doi: 10.1099/jgv.0.000771

Janzac, B., Fabre, M.-F., Palloix, A., Moury, B. 2008. Characterization of a new potyvirus infecting pepper crops in Ecuador. Arch. Virol., 153: 1543-1548. doi: 10.1007/s00705-008-0132- 8

Jia, H., Dong, K., Zhou, L., Wang, G., Hong, N., Jiang, D., Xu, W. 2017. A dsRNA virus with filamentous viral particles. Nat. Comm., 8: 168. doi: 10.1038/s41467-017-00237-9

Karasev, A.V., Hu, X., Brown, C.J., Kerlan, C., Nikolaeva, O.V., Crosslin, J.M., Gray, S.M. 2011. Genetic diversity of the ordinary strain of Potato virus Y (PVY) and origin of recombinant PVY strains. Phytopathol., 101: 778-785. doi: 10.1094/PHYTO-10-10-0284

94

Kaur, N., Hasegawa, D.K., Ling, K.-S., Wintermantel, W.M. 2016. Application of genomics for understanding plant virus-insect vector interactions and insect vector control. Phytopathol., 106: 1213-1222. doi: 10.1094/PHYTO-10-10-0284

Kehoe, M. A., Coutts, B.A., Buirchell, B.J., Jones, R.A.C. 2014. Plant virology and next generation sequencing: experiences with a Potyvirus. PLoS ONE, 9: e104580. doi: 10.1371/journal.pone.0104580

Kircher, M. and Kelso, J. 2010. High-throughput DNA sequencing – concepts and limitations. Bioessays, 32: 524-536. doi: 10.1002/bies.200900181

Koev, G. and Miller, W.A. 2000. A positive-strand RNA virus with three different subgenomic RNA promoters. J. Virol., 74: 5988-5996.

Komorowska, B., Siedlecki, P., Kaczanowski, S., Hasiow-Jaroszweska, B., Malinowska, T. 2011. Sequence diversity and potential recombination events in the coat protein gene of Apple stem pitting virus. Virus Res., 158: 263-267. doi: 10.1016/j.virusres.2011.03.003

Kormelink, R., Garcia, M.L., Goodin, M., Sasaya, T., Haenni, A.-L. 2011. Negative-strand RNA viruses: the plant-infecting counterparts. Virus Res., 162: 184-202. doi: 10.1016/j.virusres.2011.09.028

Kreuze, J.F., Perez, A., Untiveros, M., Quispe, D., Fuentes, S., Barker, I., Simon, R. 2009. Identification and molecular make-up of Potato virus Y strain PVYZ: genetic typing of PVYZ- NTN. Phytopathol., 101: 1052-1060. doi: 10.1016/j.virol.2009.03.024

Lorenzen, J.H., Meacham, T., Berger, P.H., Shiel, P.J., Crosslin, J.M., Hamm, P.B., Kopp, H. 2006a. Whole genome characterization of Potato virus Y isolates collected in the western USA and their comparisons to isolates from Europe and Canada. Arch. Virol., 151: 1055-1074. doi: 10.1007/s00705-005-0707-6

Lorenzen, J.H., Piche, L.M., Gudmestad, N.C., Meacham, T., Shiel, P. 2006b. A multiplex PCR assay to characterize Potato virus Y isolates and identify strain mixtures. Plant Dis., 90: 935-940. doi: 10.1094/PD-90-0935

Lowe, R., Shirley, N., Bleackley, M., Dolan, S., Shafee, T. 2017. Transcriptomics technologies. PLoS Comput. Biol.., 13: e1005457. doi: 10.1371/journal.pcbi.1005457.

95

Lozano, I., Leiva, A.M., Jiminez, J., Fernandez, E., Carvajal-Yepes, M., Cuervo, M., Cuellar, W.J. 2017. Resolution of cassava-infecting alphaflexiviruses: molecular and biological characterization of a novel group of potexviruses lacking the TGB3 gene. Virus. Res., 241: 53- 61. doi: 10.1016/j.virusres.2017.03.019

Lu, R., Martin-Hernandez, A.M., Peart, J.R., Malcuit, I., Baulcombe, D.C. 2003. Virus-induced gene silencing in plants. Methods, 30: 296-303. doi: 10.1016/S1046-2023(03)00037-9

Machida-Hirano, R. 2015. Diversity of potato genetic resources. Breeding Sci., 65: 26-40. doi: 10.1270/jsbbs.65.26

Mayo, M.A., Robinson, D.J., Jolly, C.A., Hyman, L. 1989. Nucleotide sequence of Potato leaf roll luteovirus RNA. J. Gen. Virol., 70: 1037-1051.

Mertens, P. 2004. The dsRNA viruses. Virus Res., 101: 3-13. doi: 10.1016/j.virusres.2003.12.002

Monger, W., Tomlinson, J., Boonham, N., Virscek Marn, M., Mavric Plesko, I., Molinero- Demilly, V., Tassus, X., Meekes, E., Toonen, M., Papayiannis, L., Perez-Egusquiza, Z., Mehle, N., Jansen, C., Lykke Nielsen, S. 2010. Development of inter-laboratory evaluation of real-time PCR assays for the detection of pospiviroids. J. Virol. Meth., 169: 207-210. doi: 10.1007/s00705- 009-0581-8

Morroni, M., Thompson, J.R., Tepfer, M. 2008. Twenty years of transgenic plants resistant to Cucumber mosaic virus. Mol. Plant Microbe Interact., 21: 675-684. doi: 10.1094/MPMI -21-6- 0675

Mukherjee, S., Huntemann, M., Ivanova, N., Kyrpides, N.C., Pati, A. 2015. Large-scale contamination of microbial isolate genomes by Illumina PhiX control. Stand. Genomic Sci., 10: 18 doi: https://doi.org/10.1186/1944-3277-10-18

Mumford, R. and Metcalfe, E.J. 2001. The partial sequencing of the genomic RNA of a UK isolate of Pepino mosaic virus and the comparison of the coat protein sequence with other isolates from Europe and Peru. Arch. Virol., 146: 2455-2460.

Mumford, R.A., Walsh, K., Barker, I, Boonham, N. 2000. Detection of Potato mop-top virus and Tobacco rattle virus using a multiplex real-time fluorescent reverse-transcription polymerase chain reaction assay. Phytopathol., 90: 448-453.

96

Nadiya, F., Anjali, N., Gangaprasad, A., Krishnannair Sabu, K. 2015. High-quality RNA extraction from small cardamom tissues rich in polysaccharides and polyphenols. Analytical Biochem., 485: 25-27. doi: 10.1016/j.ab.2015.05.017

Noda, H. and Nakashima, N. 1995. Non-pathogenic reoviruses of leafhoppers and planthoppers. Sem. Virol., 6: 109-116.

Ng, K.K.S., Arnold, J.A., Cameron, C.E. 2008. Structure-function relationships among RNA- dependent RNA polymerases. Curr. Top. Microbiol. Immunol., 320: 137-156.

Padmanabhan, C., Zheng, Y., Li, R., Sun, S.-E., Zhang, D., Liu, Y., Fei, Z., Ling, K.-S. 2015. Complete genome sequence of Southern tomato virus identified in China using next-generation sequencing. Genome Announc., 3: e01226-15. doi: 10.1128/genomeA.01226-15

Pagan, I. and Holmes, E.C. 2010. Long-term evolution of the Luteoviridae: time scale and mode of virus speciation. J. Virol., 84: 6177-6187. doi: 10.1128/JVI.02160-09

Potato Genome Sequencing Consortium. 2011. Genome sequence and analysis of the tuber crop potato. Nature, 475: 189-195. doi: 10.1038/nature10158

Potato Market Information Review, 2015-2016. Agriculture and Agri-Food Canada. Available at http://www.agr.gc.ca/eng/industry-markets-and-trade/market-information-by- sector/horticulture/horticulture-sector-reports/?id=1368482338314

Pruss, G., Ge, X., Shi, X.M., Carrington, J.C., Vance, V.B. 1997. Plant viral synergism: the potyviral genome encodes a broad-range pathogenicity enhancer that transactivates replication of heterologous viruses. The Plant Cell, 9: 859-868.

Puchades, A.V., Carpino, C., Alfaro-Fernandez, A., Font-San-Ambrosio, M.I., Davino, S., Guerri, J., Rubio, L., Galipienso, L. 2017. Detection of Southern tomato virus by molecular hybridisation. Ann. Appl. Biol., 171: 172-178. doi: 10.1111/aab.12367

Reguera, J., Weber, F., Cusack, S. 2010. Bunyaviridae RNA polymerase (L-protein) have an N- terminal, influenza-like endonuclease domain, essential for viral cap-dependent transcription. PLoS Path., 6: e1001101. doi: 10.1371/journal.ppat.1001101

Roosinck, M.J., Sabanadzovic, S., Okada, R., Valverde, R.A. 2011. The remarkable evolutionary history of endornaviruses. J. Gen. Virol., 92: 2674-2678. doi: 10.1099/vir.0.034702-0

97

Russo, M., Rubino, L., De Stradis, A., Martelli, G.P. 2009. The complete nucleotide sequence of Potato virus T. Arch. Virol., 154: 321-325. doi: 10.1007/s00705-008-0300-x

Savanadzovic, S., Valverde, R.A., Brown, J.K, Martin, R.R., Tzanetakis, I.E. 2009. Southern tomato virus: the link between the families Totiviridae and Partitiviridae. Virus Res., 140: 130- 137. doi: 10.1016/j.virusres.2008.11.018

Sanger, F., Nicklen, S., Coulson, A.R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci., 74: 5463-5467.

Savenkov, E.I., Germundsson, A., Zamyatnin, A.A., Sandgren, M., Valkonen, J.P.T. 2003. Potato mop-top virus: the coat protein-encoding RNA and the gene for cysteine-rich protein are dispensible for systemic virus movement in Nicotiana benthamiana. J. Gen. Virol., 84: 1001- 1005. doi: 10.1099/vir.0.18813-0

Senshu, H., Yamaji, Y., Minato, N., Shiraishi, T., Maejima, K., Hashimoto, M., Miura, C., Neriya, Y., Namba, S. 2011. A dual strategy for the suppression of host antiviral silencing: two distinct suppressors for viral replication and viral movement encoded by Potato virus M. J. Virol., 85: 10269-10278. doi: 10.1128/JVI.05273-11

Schoen, C.D., Knorr, D., Leone, G. 1996. Detection of Potato leaf roll virus in dormant potato tubers by immunocapture and a fluorogenic 5’ nuclease RT-PCR assay. Phytopathol., 86: 993- 999.

Schroeder, A., Mueller, O., Stocker, S., Salowsky, R., Leiber, M., Gassmann, M., Lightfoot, S., Menzel, W., Granzow, M. Ragg, T. 2006. The RIN: an RNA integrity number for assigning integrity values to RNA measurements. BMC Molec. Biol., 7: 3. doi: 10.1186/1471-2199-7-3

Sharma, N. and Singh, S.K. 2016. Implications of non-coding RNAs in viral infections. Rev. Med. Virol., 26: 356-368. doi: 10.1002/rmv.1893

Singh, R.P., Valkonen, J.P.T., Gray, S.M., Boonham, N., Jones, R.A.C., Kerlan, C., Schubert, J. 2008. Discussion paper: The naming of Potato virus Y strains infecting potato. Arch. Virol., 153: 1:13. doi: 10.1007/s00705-007-1059-1

Solomon-Blackburn, R.M. and Barker, H. 2001. Breeding virus resistant potatoes (Solanum tuberosum): a review of traditional and molecular approaches. Heredity, 86: 17-35.

98

Spear, A., Sisterson, M.S., Yokomi, R., Stenger. 2010. Plant-feeding insects harbor double- stranded RNA viruses encoding a novel proline-alanine rich protein and a polymerase distantly related to that of fungal viruses. Virol., 404: 304-311. doi: 10.1016/j.virol.2010.05.015

Spetz, C., Taboada, A.M., Sarwich, S., Ramsell, J., Salazar, L.F., Valkonen, J.P.T. 2003. Molecular resolution of a complex of potyviruses infected solanaceous crops at the centre of origin in Peru. J. Gen. Virol., 84: 2565-2578. doi: 10.1099/vir.0.19208-0

Srivastava, N., Chaudhary, S., Kumar, V., Katudia, K., Vaidya, K., Kumar Vyas, M., Chikara, S.K. 2012. Evaluation of the yield, quality and integrity of total RNA extracted by four different extraction methods in rice (Oryza sativa). J. Crop Sci. Tech., 1: 1-9

Statistics Canada, 2018. Table 32-10-0358-01. Area, production and farm value of potatoes

Strange, R.N. and Scott, P.R. 2005. Plant disease: a threat to global food security. Annu. Rev. Phytopathol., 43: 83-116. doi: 10.1146/annurev.phyto.43.113004.133839

Syller, J. 2012. Facilitative and antagonistic interactions between plant viruses in mixed infections. Mol. Plant Pathol., 12: 204-216. doi: 10.1111/J.1364-3703.2011.00734.X

Thaler, N. and Bajc, M. 2013. Effect of fungal and plant secondary metabolites on polymerase chain reaction (PCR). Acta Silvae et Ligni, 100: 25-40.

Tian, Y., Liu, J., Xhang, C.L., Liu, Y.Y., Wang, B., Li, X.-D., Guo, Z.K., Valkonen, J.P.T. 2011. Genetic diversity of Potato virus Y infecting tobacco crops in China. Phytopathol., 101: 377-387. doi: 10.1094 / PHYTO-02-10-0055

Tsagris, E.M., Martinez de Alba, A.E., Gozmanova, M., Katantidis, K. 2008. Viroids. Cell. Microbiol., 10: 2168-2179. doi: 10.1111/j.1462-5822.2008.01231.x

Valli, A., Gallo, A., Calvo, M., de Jesus Perez, J., Garcia, J.A. 2014. A novel role of the potyviral helper component proteinase contributes to enhance yield of viral particles. J. Virol., 88: 9809-9818. doi: 10.1128/JVI.01010-14 van der Vlugt, R.R.A., Stijger, C.C.M., Verhoeven, J.T.J., Lesemann, D.E. 2000. First report of Pepino mosaic virus on tomato. Plant Dis., 84: 103. doi: 10.1094/PDIS.2000.84.1.103C

Verbeek, M., Dullemans, A.M., Espino, A., Botella, M., Alfaro-Fernandez, A., Font, M.I. 2015. First report of Southern tomato virus in the Canary Islands, Spain. J. Plant Pathol., 97: 391-403.

99

Verchot-Lubicz, J., Ye, C.-M., Bamunusinghe, D. 2007. Molecular biology of potexviruses: recent advances. J. Gen. Virol., 88: 1643-1655. doi: 10.1099/vir.0.82667-0

Verhoeven, J.T.J., Jansen, C.C.C., Botermans, M., Roenhorst, J.W. 2010. Epidemiological evidence that vegetatively propagated solanaceous plant species act as sources for Potato spindle tuber viroid inoculum for tomato. Plant Pathol., 59: 3-12. doi: 10.1111/j.1365- 3059.2009.02173.x

Villamor, D.E.V., Susaimuthu, J., Eastwell, K.C. 2015. Genomic analyses of cherry rusty mottle group and cherry twisted leaf-associated viruses reveal a possible new genus within the family Betaflexiviridae. Phytopathol., 105: 39-408. doi: 10.1094/PHYTO-03-14-0066-R

Voelkerding, K.V., Dames, S.A., Durtschi, J.D. 2009. Next-generation sequencing: from basic research to diagnostics. Clin. Chem., 55: 641-658. doi: 10.1373/clinchem.2008.112789

Weller, S.A., Elphinstone, J.G., Smith, N.C., Boonham, N., Stead, D.E. 2000. Detection of Ralstonia solanacearum strains with a quantitative, multiplex, real-time, fluorogenic PCR (TaqMan) Assay. App. Environ. Microbiol., 66: 2853-2858.

Wintermantel, W.M., Cortez, A.A., Anchieta, A.G., Gulati-Sakhuja, A., Hladky, L.L. 2008. Co- infection by two criniviruses alters accumulation of each virus in a host-specific manner and influences efficiency of virus transmission. Phytopathol., 98: 1340-1345. doi: 10.1094/PHYTO- 98-12-1340

Xu, H. 2007. Specific detection and identification of Potato virus X isolates in potato leaf and dormant tuber samples by RT-PCR and RFLP. J. Huazhong Agricultural University, 27: 43-50.

Xu, H. and Nie, J. 2006. Identification, characterization, and molecular identification of Alfalfa mosaic virus in potato. Phytopathol., 96: 1237-1242. doi: 10.1094/PHYTO-96-1237

Xu, H., D’Aubin, J., Nie, J. 2010. Genomic variability in Potato virus M and the development of RT-PCR and RFLP procedures for the detection of this virus in seed potatoes. Virol. J., 7: 25.

Xu, H., Leclerc, D., Leung, B., AbouHaidar, M.G. 1994. The entire nucleotide sequence and genomic organization of Potato aucuba mosaic virus. Arch. Virol., 135: 461-469.

Zhang, W., Zhang, Z., Fan, G., Gao, Y., Wen, J., Bai, Y., Qiu, C., Zhang, S., Shen, Y, Meng, X. 2017. Development and application of a universal and simplified multiplex RT-PCR assay to detect five potato viruses. J. Gen. Plant Pathol., 83: 33-45. doi: 10.1007/s10327-016-0688-1

100

Zhong, S., Joung, J.-G., Zheng, Y., Chen, Y.-r, Liu, B., Shao, Y., Xiang, J.Z., Fei, Z., Giovannoni, J.J. 2011. High-throughput Illumina strand-specific RNA sequencing library preparation. Cold Spring Harb. Protoc., doi: 10.1101/pdb.prot5652

101

APPENDIX A

Table A.1. Components for the generation of Murashige and Skoog agar. Reagent Quantity ddH2O Up to 4 L Murashige and Skoog Basal Salts 17.6 g Sucrose 120.0 g Sodium Phosphate Monohydrate 4.4 g Thiamine-HCl Stock Solution1 4.0 mL Agar 36.0 g 1 Thiamine HCl Stock solution prepared using 100mg Thiamine HCl and 25mL ddH2O

102

APPENDIX B

Table B.1. Components for the generation of 0.01M KPO4 buffer pH 7.0. Reagent Quantity ddH2O Up to 1 L K2HPO4 10.7 g KH2PO 5.2 g Na2SO3 4.0 g

103

APPENDIX C

Table C.1. List of herbaceous indicator species used within the PPEQ program. Indicator Species Common Name Gomphrena globosa Globe amaranth Nicotiana benthamiana Australian tobacco Nicotiana bigloveii Indian tobacco Nicotiana debneyii Debney’s tobacco Nicotiana glutinosa South American tobacco Nicotiana rustica Aztec tobacco Nicotiana tabacum cv. Samsun Transgenic tobacco Physalis angulate Nightshade Physalis floridana Nightshade Petunia hybridiae Petunia Chenopodium amaranticolor Goosefoot amaranth Chenopodium quinoa Quinoa Chenopodium murale Nettle-leaved goosefoot amaranth Solanum lycopersicum Tomato Capiscum annum Pepper Nicandra physaloides Nightshade Cucumis sativus Cucumber Phaseolus vulgaris Common bean

104

APPENDIX D

Figure D.1. Agilent Bioanalyzer trace for RNA extracted using Tri-Reagent.

Figure D.2. Agilent Bioanalyzer trace for RNA extracted using PureLink Total RNA Mini Kit.

Figure D.3. Agilent Bioanalyzer trace for RNA extracted using Direct-zol RNA Mini-Prep Kit.

105

Figure D.4. Agilent Bioanalyzer trace for RNA extracted using Quick-RNA Mini-Prep Kit.

Figure D.5. Agilent Bioanalyzer trace for RNA extracted using Plant/Fungi Total RNA Purification Kit.

106

APPENDIX E

(A) (B) (C)

(D) (E) (F)

Figure E.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from a tuber interception from Bangladesh in Toronto, Canada. Samples amplified using primers specific for the detection of (A) PAMV, (B) PVY (all strains), (C) ToCV, (D) PVX, (E) PVS, and (F) PVM.

107

APPENDIX F

(A) (B) (C)

(D) (E) (F)

Figure F.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from a tuber interception from Bangladesh in Toronto, Canada. Samples amplified using primers specific for the detection of (A) PAMV, (B) PVY (all strains), (C) ToCV, (D) PLRV, (E) PVS, and (F) PVM.

108

APPENDIX G

Table G.1. Summary of data generated after sequencing of 3 tomato leaf samples from Alberta against the constructed S. lycopersicum genome published in GenBank. Sample Total Reads Total Reads % Mapped % Unmapped Mapped Unmapped DIA17-130 22956848 1487708 93.91 6.09 DIA17-131 10426346 755750 93.24 6.76 DIA17-132 31550872 2487682 92.69 7.31

109

APPENDIX H

(A) (B)

(C)

Figure H.1. Agarose gel electrophoresis of PCR products amplified from cDNA generated from extracted RNA from several accessions submitted to the PPEQ program at the Charlottetown Laboratory of the CFIA. Samples amplified using primers specific for the detection of (A) PVY (all strains), (B) PLRV, and (C) PVS.

110

APPENDIX I

Publications and presentations made during the completion of this study:

Publications: Xu, H., Li, X., Hammill, D., Cody, S., Nie, J. 2017. Development and validation of diagnostic procedures based on the next generation sequencing technology for screening potato accessions imported to Canada. Can. J. Plant Pathol., 39: 137.

Xu H., Cody S., Hammill D., Li X. 2015. Introducing molecular diagnostic technologies for detecting viruses in potato quarantine testing. Phytopathology, 105: Supplement 4: 152.

Hammill, D. and Xu, H. 2019. First report of Southern tomato virus on tomato in Canada. Plant Dis., in press.

Presentations: March 2016: Training Seminar, Charlottetown Laboratory – Canadian Food Inspection Agency, Charlottetown, PE Title: Introductory Phase of Implementing Next-Generation Sequencing (NGS) at the Charlottetown Laboratory.

May 2016: Graduate Studies Research Days, Atlantic Veterinary College – University of Prince Edward Island, Charlottetown, PE Title: A Next-Generation Sequencing (NGS) Approach for the Simultaneous Detection of Plant Viruses for the Purpose of Quarantine Testing.

May 2018: Graduate Studies Research Days, Atlantic Veterinary College – University of Prince Edward Island, Charlottetown, PE Title: A Next-Generation Sequencing (NGS) Approach for the Simultaneous Detection of Plant Viruses for the Purpose of Quarantine Testing.

March 2019: Northeast Potato Technology Forum – Charlottetown, PE Title: Evaluation of Conventional and real-time RT-PCR Procedures for detecting Andean potato latent virus and Andean potato mottle virus.

111