
https://lib.uliege.be https://matheo.uliege.be

Characterization of the virome of ancient pome fruit cultivars of Malus Mill. and Pyrus L. using high-throughput sequencing

Author: Fontdevila Pareta, Núria
Promoter(s): Massart, Sébastien
Faculty: Gembloux Agro-Bio Tech (GxABT)
Degree: Master in Bioengineering: Agricultural Sciences, specialised focus
Academic year: 2019-2020
URI/URL: http://hdl.handle.net/2268.2/9475

Notice to users:

All documents placed in open access on the MatheO site are protected by copyright. In accordance with the principles set out by the "Budapest Open Access Initiative" (BOAI, 2002), users of the site may read, download, copy, transmit, print, search, or link to the full text of these documents, crawl them for indexing, use them as data for software, or use them for any other lawful purpose (or any purpose provided for by copyright regulations). Any use of the document for commercial purposes is strictly prohibited.

Furthermore, users undertake to respect the moral rights of the author, principally the right to the integrity of the work and the right of attribution, in any use they make of it. For example, when reproducing a document in part or in full, users shall cite the sources completely, as mentioned above. Any use not explicitly authorised above (such as modifying the document or summarising it) requires the prior and express permission of the authors or their rights holders.

Characterization of the virome of ancient pome fruit cultivars of Malus Mill. and Pyrus L. using high-throughput sequencing

NURIA FONTDEVILA PARETA

FINAL-YEAR THESIS SUBMITTED FOR THE DEGREE OF MASTER IN BIOENGINEERING IN AGRICULTURAL SCIENCES

ACADEMIC YEAR 2019-2020

PROMOTER: PROF. SÉBASTIEN MASSART


© Any reproduction of this document, by any means whatsoever, may only be made with the authorisation of the author and of the academic authority¹ of Agro-Bio Tech.

This document is the sole responsibility of its author.

¹ In this case, the academic authority is represented by the promoter(s), member(s) of the teaching staff of GxABT.

Acknowledgements

First of all, I would like to thank my promoter, Prof. Sébastien Massart, for giving me the opportunity to be part of his team and their scientific network, and for all his support during this master’s thesis. Without him, this project would not have been possible.

I would also especially like to thank Dr Arnaud Blouin for his help during this project, for supervising the laboratory work and bioinformatic analysis, for sharing his knowledge of virology and molecular biology, and for his immense patience with me and all my questions.

Last, but not least, I also want to thank all the virology team members and lab technicians for their support and help when I needed it.

Master’s thesis performed at the Integrated and Urban Phytopathology Unit of the ULg-Gembloux Agro-Bio Tech University.

Abstract

Historically, pome fruit viruses have mainly been identified from commercial cultivars presenting disease symptoms. Their significant impact on fruit yield and quality triggered their identification and characterization ex post. For example, infection by certain virulent strains of apple stem pitting virus (ASPV) was detected because necrosis developed between susceptible scions and/or rootstocks (rootstock incompatibility). In this context, the diversity of viruses infecting pome fruit trees is still largely underestimated and warrants further investigation. The goal of this project was to evaluate the presence of known and unknown viruses infecting ancient cultivars of apple and pear. Leaf samples were taken from six apple cultivars (‘Gravenstein’, ‘Pomme Pellone’, ‘Délices de Beignée’, ‘Reinette Meurens’, ‘Belle de Boskoop’, and ‘Joseph Musch’) and five pear cultivars (‘Poire Cuisse Madame’, ‘Jeanne d’Arc’, ‘Poire Rougette’, ‘Bronzée d’Enghien’, and ‘Colmar du Mortier’). Total RNA extracted from leaves of ‘Joseph Musch’ (tree Q9) was sequenced and analysed. Double-stranded RNA (dsRNA) and virion-associated nucleic acid (VANA) preparation protocols were applied to the other samples, either individually (dsRNA) or pooled (dsRNA and VANA). After high-throughput sequencing, the data were analysed with Geneious Prime to identify the viruses present in the sampled trees and to reconstruct their genomes. In addition, two other bioinformatics pipelines (Kaiju and Kraken) were tested on the generated data. The nearly complete genome sequences of seven new isolates of several known viruses were reconstructed. All samples were infected by at least one virus, the most prevalent being ASPV. Interestingly, the detection of Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), and Apple hammerhead viroid-like RNA (AHVd-like RNA) corresponded to their first detection in Europe. Primers and RT-PCR protocols were designed, and the presence of ARWV-1 was detected by RT-PCR, confirming the first detection of this virus in Europe. Further studies need to be carried out to assess the distribution of ARWV-1 within the germplasm collection and to confirm the detection of ALV-1 and AHVd-like RNA. The analysis of the local prevalence of these viruses will be a first step in evaluating the biological risk they may pose for European production.

Keywords: pome fruit, apple (Malus Mill.), pear (Pyrus L.), plant virus, high-throughput sequencing (HTS), virome characterization, germplasm resources, Apple stem pitting virus (ASPV), Apple stem grooving virus (ASGV), Apple luteovirus-1 (ALV-1), Apple rubbery wood virus-1 (ARWV-1), Apple hammerhead viroid-like RNA (AHVd-like RNA)

Summary

Historically, viruses of apple and pear trees have mainly been identified from cultivated plants showing disease symptoms. Their significant impact on fruit yield and quality prompted their identification and characterization. For example, infection by certain virulent strains of Apple stem pitting virus (ASPV) was detected because of the development of necrosis between susceptible scions and/or rootstocks. In this context, the diversity of viruses infecting apple and pear trees is still largely underestimated, and further research is warranted. The objective of this project is to evaluate the presence of known and unknown viruses infecting ancient cultivars of apple (‘Gravenstein’, ‘Pomme Pellone’, ‘Délices de Beignée’, ‘Reinette Meurens’, ‘Belle de Boskoop’, and ‘Joseph Musch’) and pear (‘Poire Cuisse Madame’, ‘Jeanne d’Arc’, ‘Poire Rougette’, ‘Bronzée d’Enghien’, and ‘Colmar du Mortier’). Total RNA from leaves of ‘Joseph Musch’ (tree Q9) was sequenced and analysed. Double-stranded RNA (dsRNA) and virion-associated nucleic acid (VANA) preparation protocols were applied to the other samples individually (dsRNA) or in pools (dsRNA and VANA). After sequencing, the data were analysed with Geneious Prime to identify the viruses present in the sampled trees and to reconstruct the genomes of the isolates present. In addition, two other bioinformatics pipelines were tested on the generated data (Kaiju and Kraken). The nearly complete genome sequences of seven new isolates of several known viruses were reconstructed. All samples were infected by at least one virus, the most widespread being ASPV. Notably, the detection of Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), and Apple hammerhead viroid-like RNA (AHVd-like RNA) corresponds to their first detection in Europe. Primers were designed to confirm the presence of ARWV-1 by RT-PCR, confirming the first detection of this virus in Europe. Further studies must be carried out to assess the distribution of ARWV-1 within the germplasm collection and to confirm the detection of ALV-1 and AHVd. The analysis of the local prevalence of these viruses will be a first step in evaluating the biological risk they may represent for European production.

Keywords: apple (Malus Mill.), pear (Pyrus L.), plant viruses, high-throughput sequencing (HTS), virome, genetic resources, Apple stem pitting virus (ASPV), Apple stem grooving virus (ASGV), Apple luteovirus-1 (ALV-1), Apple rubbery wood virus-1 (ARWV-1), Apple hammerhead viroid-like RNA (AHVd-like RNA)

TABLE OF CONTENTS

1 Introduction
2 Bibliography
  2.1 Rosaceae Family
    2.1.1 Growth and development of pome fruit trees
    2.1.2 Vegetative propagation of pome fruit trees
    2.1.3 Apple tree (Malus Mill.)
    2.1.4 Pear tree (Pyrus L.)
  2.2 Importance of germplasm collections for conservation of genetic diversity
  2.3 High-throughput sequencing (HTS) for virome characterization
  2.4 Bioinformatics
  2.5 Extraction protocols
  2.6 Plant viruses
    2.6.1 Definition
    2.6.2 Taxonomy
    2.6.3 Genome composition and organization
    2.6.4 Apple tree viruses
3 Objectives
4 Materials and methods
  4.1 Biological data collection
  4.2 RNA Extraction methods
    4.2.1 RNeasy Plant Mini Kit (QIAGEN)
    4.2.2 Virion-associated nucleic acid (VANA)
    4.2.3 Double-stranded RNA (dsRNA)
  4.3 Virus detection protocol by RT-PCR
  4.4 Bioinformatic analyses
5 Results
  5.1 Implementation of double-stranded RNA (dsRNA) protocol in Gembloux
  5.2 Sequencing statistics
  5.3 Identified viruses
    5.3.1 Foveaviruses and apple stem pitting virus (ASPV)
    5.3.2 Apple Rubbery Wood Virus-1 (ARWV-1)
    5.3.3 Apple Stem Grooving Virus (ASGV)
    5.3.4 Apple luteovirus-1 (ALV-1)
    5.3.5 Apple hammerhead viroid-like RNA (AHVd-like RNA)
  5.4 Comparison of classification programs and viral enrichment during extraction
6 Discussion
7 Conclusions
8 References

Table of figures

Figure 1. Stages of bud development in pome fruit trees. A: winter bud; B: beginning of bud swelling; C and C3: apparent swelling of the bud; D: appearance of leafless flower buds [17].

Figure 2. Types of fructification, growth habit, and ramification in apple, as an example for pome fruit trees [18].

Figure 3. Vertical section of a flower from an apple tree, as a general example for trees belonging to the Rosaceae family [19].

Figure 4. Quality of the union between three different rootstocks (M26, M9, and M106) and scions of the apple variety Granny Smith. Grafting on M26 shows strong incompatibility symptoms, grafting on M9 shows moderate signs of incompatibility, and grafting on M106 is successful [21].

Figure 5. Fruits produced by several apple tree (Malus Mill.) cultivars, illustrating the skin colour range of apples. From left to right: Golden Delicious, Granny Smith, Boskoop, Gala, and Red Delicious [13].

Figure 6. Fruits from several pear tree (Pyrus L.) cultivars, illustrating the skin colour range of pears. From left to right: Green Anjou, Bartlett, Bosc, and Red Anjou [25].

Figure 7. Sequencing cost per human genome between 2001 and 2019. From: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost

Figure 8. Increase in (a) the number of nucleotides available in GenBank and (b) the number of protein structures published in the PDB [43].

Figure 9. Comparison of the ICTV taxonomic rank hierarchy from 1991 to 2017 and in 2019. The number of taxa assigned to each rank of the new hierarchy is shown in white. Black arrows show the taxonomic ranks common to the five-rank (1991-2017) and the fifteen-rank (2019) structures [51].

Figure 10. Distribution of plant virus species by family. Numbers in black indicate the number of plant virus species recognized by the ICTV in each family, and family names with an asterisk include species that infect non-plant hosts [52]. This graphic was published by Roossinck in 2012, so viral families added by the ICTV after 2012 are not included.

Figure 11. Genome organization of Apple stem pitting virus (ASPV), the type member of the genus Foveavirus. Boxes indicate open reading frames (ORFs) and the relative position of their expression products. Abbreviations: Mtr, methyltransferase; P-Pro, papain-like protease; Hel, helicase; Pol, polymerase; TGB, triple gene block; CP, coat protein (ICTV).

Figure 12. Negative-contrast electron micrograph of particles of an isolate of Apple stem pitting virus (ASPV). The black bar in the lower left corner represents 100 nm (ICTV).

Figure 13. Worldwide distribution map of Apple stem pitting virus (ASPV). Yellow dots indicate countries where ASPV has been reported (incomplete map). The distribution of ASPV is probably much wider; the database used to generate this map was most probably incomplete and not up to date. From: https://gd.eppo.int/taxon/ASPV00/distribution

Figure 14. Detail of stem pitting symptoms produced by apple stem pitting virus (ASPV) infection. From: https://gd.eppo.int/taxon/ASPV00/photos

Figure 15. Comparison of stem and leaf growth between a healthy plant and a plant infected with apple stem pitting virus (ASPV) showing symptoms of decline. From: https://gd.eppo.int/taxon/ASPV00/photos

Figure 16. Genome organization of capilloviruses. Boxes indicate open reading frames (ORFs) and the proteins they encode: Mtr, methyltransferase domain; P-Pro, papain-like protease domain; Hel, helicase domain; RdRp, RNA polymerase domain; MP, putative movement protein; CP, coat protein (ICTV).

Figure 17. Negative-contrast electron micrograph of particles of an isolate of Apple stem grooving virus (ASGV). The bar in the lower right corner represents 100 nm (ICTV).

Figure 18. Worldwide distribution map of Apple stem grooving virus (ASGV). Yellow dots indicate countries where ASGV has been reported. From: https://gd.eppo.int/taxon/ASGV00/distribution

Figure 19. Necrotic grooves at the graft union of an unknown rootstock budded with Virginia Crab, a symptom of apple stem grooving virus (ASGV) infection. From: https://gd.eppo.int/taxon/ASGV00/photos

Figure 20. Genome organization of viruses from the family Phenuiviridae. Boxes indicate open reading frames (ORFs) and the proteins they encode: the L segment encodes the RNA polymerase protein (RdRp); the M segment encodes several polyproteins by leaky scanning; the S segment encodes several proteins, and both genomic and antigenomic RNA are transcribed [100].

Figure 21. Apple rubbery wood disease in an apple tree infected with Apple rubbery wood virus-1 (ARWV-1) [59].

Figure 22. Genome organization of Apple luteovirus 1 (ALV-1). Boxes indicate the proposed open reading frames (ORFs) and the proteins they encode: RdRp, RNA polymerase domain; CP, coat protein; MP, movement protein [62].

Figure 23. Symptoms of Rapid Apple Decline (RAD): necrosis on a declining tree with the bark removed from the graft union. From: https://extension.psu.edu/apple-disease-rapid-apple-decline-rad-or-sudden-apple-decline-sad

Figure 24. Proposed primary and secondary structures for Apple hammerhead viroid-like RNA (AHVd-like RNA). In subfigure A, the mutations observed in different variants of the viroid are indicated in blue and the sequences forming the hammerhead structures are delimited by flags. The two pairs of arrows represent the primers used for amplification. Subfigure B shows three schematic maps of the plus, minus, and consensus hammerhead structures of the viroid [106].

Figure 25. Symptoms of apple scar skin disease in apple. A: small circular spots, more pronounced at the calyx end. B: misshapen apple fruits. From: https://pnwhandbooks.org/plantdisease/host-disease/apple-malus-spp-scar-skin-dapple-apple

Figure 26. Principle and procedure of the RNeasy extraction kits from QIAGEN. From left to right: RNeasy Mini Kit, RNeasy Protect Mini Kit, and RNeasy Plant Mini Kit, highlighted in red (From: RNeasy Handbook).

Figure 27. Flowchart of a total RNA extraction with the RNeasy Plant Mini Kit (QIAGEN). The RNA extraction and the DNase treatment were performed at Gembloux Agro-Bio Tech; the ribodepletion, library preparation, and sequencing were performed by GIGA in Liège.

Figure 28. Flowchart of the steps followed during the extraction of virion-associated nucleic acids (VANA), adapted from the procedure of Filloux et al. (2015).

Figure 29. Flowchart of a double-stranded RNA (dsRNA) extraction, following the procedure of Marais et al. (2018).

Figure 30. General workflow used to process the sequenced reads and identify the viruses infecting the selected apple and pear trees. 1: performed in Geneious; 2: Linux-based programs.

Figure 31. Classification algorithm used by Kraken to classify a sequence. Each k-mer in the sequence is mapped to the lowest common ancestor (LCA) of the genomes that contain that k-mer in the pre-computed database. The taxa associated with the k-mers of the sequence, together with their ancestors, form a pruned subtree that is used for classification. The path in the classification tree with the maximal root-to-leaf (RTL) score is chosen as the taxonomic assignment [114].

Figure 32. Classification algorithm used by Kaiju to assign sequencing reads to a taxon. A sequencing read is first translated in the six possible reading frames, and the translations are split into fragments at stop codons. Fragments are sorted by length (MEM) or by score (Greedy) and screened against the reference protein database using the Burrows-Wheeler transform (BWT) [109].

Figure 33. Migration of dsRNAs from samples with and without (NT, non-treated) enzymatic treatment. Samples 11 and 12 correspond to samples 8 and 9 extracted with the dsRNA extraction protocol for grapevine.

Figure 34. Migration of PCR products after cDNA synthesis and random amplification, visualized in a UV transilluminator. From left to right: lanes 1-10 are individual samples, lane 11 is a mix of individual samples before amplification, and lane 12 is the ladder.

Figure 35. Mapping of all reads from sample Q9 to the reference genome of Apple green crinkle associated virus (AGCaV), allowing 20% mismatches with no iterations.

Figure 36. Mapping of all reads from sample Q9 to the reference genome of Apple stem pitting virus (ASPV), allowing 20% mismatches with no iterations.

Figure 37. Mapping of all extended contigs (Annex II) to the reference sequence of Apple stem pitting virus (ASPV) (Figure 58), with data from sample Q9 (Table 4).

Figure 38. Tree of the three longest contigs produced by de novo assembly with the Geneious Assembler allowing different percentages of mismatches (5%, 10%, 20%). The contigs clustering together shared between 98% and 100% identity, except contig 2 at 5% and 10%, which shared 87% identity.

Figure 39. Mapping of the contigs obtained from the de novo assembly with the Geneious Assembler at 10% mismatches to the reference genome of Apple stem pitting virus (ASPV). The mapping allowed 28% mismatches, thus mapping contigs with more than 72% identity with ASPV at the nucleotide level (minimum percentage of identity stated in the demarcation criteria for the family Betaflexiviridae).

Figure 40. Coverage test of contig number 2.

Figure 41. Coverage test of contig number 3.

Figure 42. Coverage test of contig number 5.

Figure 43. Coverage test of contig number 7.

Figure 44. Phylogenetic tree of RdRp fragments, at the nucleotide level, from ASPV and AGCaV complete sequences downloaded from GenBank [119] and three new isolates identified in cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 45. Phylogenetic tree of RdRp fragments, at the amino acid level, from ASPV and AGCaV complete sequences downloaded from GenBank and three new isolates identified in cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 46. Phylogenetic tree of CP fragments, at the nucleotide level, from ASPV and AGCaV complete sequences downloaded from GenBank and three new isolates identified in cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 47. Phylogenetic tree of CP fragments, at the amino acid level, from ASPV and AGCaV complete sequences downloaded from GenBank and three new isolates identified in cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 48. Mapping of all reads to the reference of segment L from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% mismatches with no iterations.

Figure 49. Mapping of all reads to the reference of segment M from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% mismatches with no iterations.

Figure 50. Mapping of all reads to the reference of segment S from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% mismatches with no iterations.

Figure 51. Phylogenetic tree of ARWV segment L sequences, at the nucleotide level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 52. Phylogenetic tree of ARWV segment L sequences, at the amino acid level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 53. Phylogenetic tree of ARWV segment M sequences, at the nucleotide level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 54. Phylogenetic tree of ARWV segment M sequences, at the amino acid level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 55. Phylogenetic tree of ARWV segment S sequences, at the nucleotide level: complete sequences downloaded from GenBank [119] and the isolate identified in the cultivar Joseph Musch.

Figure 56. Phylogenetic tree of ARWV segment S sequences, at the amino acid level: complete sequences downloaded from GenBank [119] and the isolate identified in the cultivar Joseph Musch.

Figure 57. Electrophoresis gel of the PCR products obtained with ARWV-1 primers ARWaV-1L3639F and ARWaV-1L4058R (Table 7). Samples were taken from leaves of different ages from tree Q9, and RNA was extracted with the RNeasy Plant Mini Kit. From left to right: 1 and 2, young leaves; 3 and 4, medium leaves; 5 and 6, old leaves. The ladder used was GeneRuler 100 bp DNA Ladder.

Figure 58. Electrophoresis gel of the PCR products obtained with ARWV-1 primers ARWaV-1L3639F and ARWaV-1L4058R (Table 7), with a hybridization temperature of 54 °C, from samples taken from trees surrounding tree Q9. RNA was extracted with the RNeasy Plant Mini Kit. From left to right: 1-16, samples 1-8 with two replicates. The positive control was the PCR product of the positive sample showing amplification of ARWV-1 (Figure 57). The ladder used was GeneRuler 100 bp DNA Ladder.

Figure 59. Electrophoresis gel of the PCR products obtained with Nad5 primers, to verify that the extraction with the RNeasy Plant Mini Kit of samples from trees surrounding Q9 was performed correctly. From left to right: 1-8, samples from trees surrounding Q9; 9-10, positive controls from tomato; 11, negative control. The ladder used was GeneRuler 100 bp DNA Ladder.

Figure 60. Mapping of all reads to the reference of Apple stem grooving virus isolate HPKu-2, allowing 20% mismatches with 25 iterations.

Figure 61. Phylogenetic tree of ASGV full sequences, at the nucleotide level, downloaded from GenBank [119], and a new isolate identified in the cultivar Joseph Musch.

Figure 62. Phylogenetic tree of ASGV full sequences, at the amino acid level, downloaded from GenBank [119], and a new isolate identified in the cultivar Joseph Musch.

Figure 63. Mapping of all reads from sample number 9 (Table 5) to the reference genome of Apple luteovirus-1 (ALV-1) isolate PA8 (MF120198), allowing 20% mismatches (Annex II, Figure 85).

Figure 64. Mapping of all reads from sample number 6 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% mismatches (Annex II, Figure 86).

Figure 65. Mapping of all reads from sample number 7 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% mismatches (Annex II, Figure 86).

Figure 66. Mapping of all reads from sample number 8 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% mismatches (Annex II, Figure 86).

Figure 67. Mapping of all reads from sample number 10 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% mismatches (Annex II, Figure 86).

Figure 68. Mapping of all reads from sample number 8 (Table 5) to the reference sequence of Apple hammerhead viroid-like RNA (AHVd-like RNA), allowing 20% mismatches with no iterations.

Table of tables

Table 1. Apple and pear production (x1,000 tons) in Europe, North America, Asia, and the Southern Hemisphere during the years 2003, 2006, 2009, and 2012. Data from WAPA - The World Apple and Pear Association.

Table 2. Comparison of detection method, form of nucleotide addition, enzyme, amplification, length of the products, and fixation of DNA between the most commonly used high-throughput sequencing (HTS) technologies (Illumina, Ion Torrent, PacBio, and Oxford Nanopore).

Table 3. Types of viral nucleic acids used for metagenomic studies. Abbreviations: dsRNA, double-stranded RNA; ssRNA, single-stranded RNA; siRNA, small interfering RNA; VANA, virion-associated nucleic acid [45].

Table 4. Summary of the most common viruses infecting apple and pear trees, their taxonomy, their group according to the Baltimore classification, and the symptoms they produce [54].

Table 5. Metadata of the samples taken from the CRA-W in Gembloux: identification number of the cultivar in the collection, name of the cultivar, tree species (host), number given to the sample during laboratory analysis (sample nº), and extraction method used for each sample.

Table 6. Nucleotide sequences of the multiplex identifiers (MIDs) used with the double-stranded RNA (dsRNA) and virion-associated nucleic acid (VANA) protocols. The MID sequences used with the dsRNA extraction protocol (MID-GENCO1-10) were designed by the laboratory of plant pathology at INRA Aquitaine (Bordeaux, France). The MID named “Tag761_4_LDF_093” was designed by the plant virology group of Gembloux Agro-Bio Tech.

Table 7. Primers used for the detection of Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), and Apple hammerhead viroid-like RNA (AHVd-like RNA) by RT-PCR. Primers for ARWV-1 were taken from Rott et al. (2018), and two sets of primers for AHVd-like RNA were taken from Serra et al. (2018). The other sets of primers were designed with Geneious Prime ® 2020.0.5.

Table 8. Codes of the sequences obtained from GenBank and used as reference genome sequences during the bioinformatic analysis.

Table 9. Comparison of virus detectability with different identification methods (BLAST, Kaiju, and Kraken) and of virus enrichment with different RNA extraction methods (dsRNA and VANA). The Kaiju and Kraken columns indicate the number of reads per million assigned to a given virus species, and BLAST results are indicated as presence (+) or absence (-). Abbreviations: Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), Apricot latent virus (ApLV), Apple chlorotic leaf spot virus (ACLSV), Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), Apple hammerhead viroid-like RNA (AHVd-like RNA), double-stranded RNA (dsRNA), virion-associated nucleic acid (VANA).

Table 10. Percentage identity (%) of the contigs produced after de novo assembly, using blastx and blastn to compare the contigs to Apple green crinkle associated virus (AGCaV), Apple stem pitting virus (ASPV), and Apricot latent virus (ApLV).

Table 11. Number of contigs longer than 1,000 bp, maximum contig length (bp), and total number of contigs produced with the Geneious Assembler allowing different percentages of mismatches (5%, 10%, 20%).

Abbreviations

aa        amino acid
ACLSV     apple chlorotic leaf spot virus
AGCaV     apple green crinkle associated virus
ARWV      apple rubbery wood virus
ASGV      apple stem grooving virus
ASPV      apple stem pitting virus
ApLV      apricot latent virus
BLAST     Basic Local Alignment Search Tool
cDNA      complementary DNA
CP        coat protein
CRA-W     Walloon Agricultural Research Centre
dNTP      deoxyribonucleoside triphosphate
dsRNA     double-stranded RNA
DTT       dithiothreitol
EDTA      ethylenediaminetetraacetic acid
HTS       high-throughput sequencing
ICTV      International Committee on Taxonomy of Viruses
MP        movement protein
NCBI      National Center for Biotechnology Information
nt        nucleotide
ORF       open reading frame
PCR       polymerase chain reaction
RdRp      RNA-dependent RNA polymerase
RT        reverse transcriptase
RT-PCR    reverse transcriptase polymerase chain reaction
TGB       triple gene block
UTR       untranslated region
VANA      virion-associated nucleic acid

1 INTRODUCTION

In the European Union (EU), approximately 3.4 million hectares are dedicated to fruit cultivation, with almond and apple trees being the most widely cultivated. In 2017, apple orchards represented 15.5% of total fruit production in the EU, with a value of 3.8 billion euros that year [1]. Hence, the viruses associated with diseases of pome fruit trees have been thoroughly studied over the years, and more than fifty viruses and six viroids have been identified, mainly from commercial cultivars [2], [3]. Beyond commercial cultivars, pome fruit trees harbour considerable genetic diversity. This diversity is particularly important for the future as a source of genetic material for breeding. For example, partial resistance genes to scab, an apple disease caused by the fungus Venturia inaequalis, were identified and characterized in the old Belgian apple cultivar ‘Président Roulin’ [4]. Nevertheless, the viral status of landraces and old cultivars in germplasm collections is not known. It is therefore becoming important to assess in more depth the viruses able to infect pome fruit trees. Plant germplasm collections provide a new source of plant material for the study and characterization of new viruses whose infection may not have caused any symptoms so far but which could represent future threats to apple production [5]–[7].

In this context, this project focused on characterizing the virome, defined as the genomes of all viruses present, of apple (Malus Mill.) and pear (Pyrus L.) trees from the Walloon Agricultural Research Centre (CRA-W) germplasm collection in Gembloux (Belgium). Specifically, six apple cultivars (‘Gravenstein’, ‘Pomme Pellone’, ‘Délices de Beignée’, ‘Reinette Meurens’, ‘Belle de Boskoop’, and ‘Joseph Musch’) and five pear cultivars (‘Poire Cuisse Madame’, ‘Jeanne d’Arc’, ‘Poire Rougette’, ‘Bronzée d’Enghien’, and ‘Colmar du Mortier’) were studied. The main goal of this research was to gain more knowledge of the viruses of apple and pear and their history in Europe. A secondary aim was to compare two extraction protocols, double-stranded RNA (dsRNA) and virion-associated nucleic acids (VANA), and to assess the effect of pooling on virus detection and yield. To this end, total RNA from samples of the cultivar ‘Joseph Musch’ was extracted with an RNeasy Plant Mini Kit (QIAGEN), while dsRNA and VANA were extracted from samples of the other apple and pear trees. The data produced during this project were analysed to identify the viruses infecting the sampled trees. Additionally, virus enrichment during extraction and virus detection during the bioinformatic analysis were compared between the two extraction methods (dsRNA and VANA) and between two classification programs (Kaiju and Kraken).


2 BIBLIOGRAPHY

2.1 ROSACEAE FAMILY

The Rosaceae family includes over 100 genera and 3,000 species, comprising many important fruit, nut, ornamental, and woody crops, mainly found in temperate climates [8], [9]. The majority of members of this family are widely commercially exploited due to their high nutritional value, as well as for their aesthetic qualities and unique colours [10]. Asia is the biggest producer of apple and pear worldwide, followed by Europe, the Southern Hemisphere, and North America (Table 1). Over the centuries, fruit trees have been domesticated and improved by breeding and interspecific hybridization, thus producing a high genetic variability within species and complicating the description of their phylogeny. There have been multiple attempts over the years to propose effective models of categorization [11], [12]. In this project, the pome fruit trees studied are apple (Malus domestica) and pear (Pyrus communis). Hence, the following information on reproduction and development is based on pome fruit trees.

Table 1. Apple and pear production (x1,000 tons) in Europe, North America, Asia, and the Southern Hemisphere; during the years 2003, 2006, 2009, and 2012. Data from WAPA - The World Apple and Pear Association.

Pome fruit | Area | 2003 | 2006 | 2009 | 2012
Apple | Europe | 16,106 | 15,077 | 16,178 | 15,569
Apple | North America | 4,821 | 5,547 | 5,362 | 4,754
Apple | Asia | 27,362 | 33,609 | 38,572 | 44,761
Apple | Oceania | 827 | 630 | 652 | 737
Apple | South America | 3,533 | 3,652 | 3,477 | 4,357
Pear | Europe | 2,969 | 2,972 | 2,910 | 2,360
Pear | North America | 886 | 806 | 901 | 806
Pear | Asia | 11,160 | 13,487 | 15,873 | 17,708
Pear | Oceania | 175 | 172 | 152 | 149
Pear | South America | 888 | 995 | 926 | 936

2.1.1 Growth and development of pome fruit trees

The meristem is the organ responsible for plant growth and development and consists of cells that can divide continuously and permanently. Apical or primary meristems are located at the tips of the axes or stems and contribute to their elongation, and secondary meristems appear at the interior of the axes and contribute to their growth in diameter. During cell division of primary meristems, groups of cells are organized in well-defined areas. These groups of cells produce the constituent parts of the stem, such as leaves, nodes, and inter-nodes. From primary meristems, buds are also formed (Figure 1). Meristems are regulated internally by the hormones auxin and cytokinin, which stimulate plant growth and cell division [14]. In perennial plants, bud dormancy is crucial for plants to survive through cold winters, and it is associated with the temporary suspension of visible growth generally from late summer to late winter. Its progression over winter determines the quality of bud break, flowering, and fruiting. Bud dormancy can be categorized as true dormancy (“rest” or “endodormancy”), which is triggered by internal factors, and climatic dormancy (“quiescence” or “ecodormancy”), which is controlled by external factors such as temperature [15], [16].


Figure 1. Stages of bud development in pome fruit trees. A: winter bud; B: beginning of bud swelling; C and C3: apparent swelling of the bud; and D: appearance of leafless flower buds [17].

Figure 2. Types of fructifications, ports, and ramifications in apple, as an example for pome fruit trees [18].

Taking apple as an example, fructification, port (growth habit), and ramification can be divided into four categories: (i) spurs, (ii) Golden Delicious, (iii) Reine des Reinettes, and (iv) Granny Smith (Figure 2). Trees within the category “spurs” have conical branches and a tendency to pierce on their basal or proximal part; the spurs are located on branches that are two years old or older, long branches are generally weak, fruits remain close to the carpenter branches, and the fruiting area does not move away from the main structure of the tree. Trees from the category “Reine des Reinettes” are easy to grow in windy areas because the insertion of fruit branches on the axis is very solid and the angles of the ramifications are open; the majority of spurs are located on twigs between two and five years old, and its exteriorization is slow compared to the central axis. In the category “Golden Delicious”, tree ramification is much more pronounced than in “spurs” and “Reine des Reinettes”, spurs are mainly located on young twigs between one and three years old, and the fruiting area quickly moves away from the centre of the tree, which causes sagging of the carpenter branches. Lastly, trees from the category “Granny Smith” have their spurs located on young twigs between one and two years old, and branching is weak on the lower part of the carpenter branch, whose extension is carried out by successive arches. In “Granny Smith”, the fruiting area moves towards the outside of the tree faster than in “Golden Delicious” [18], [19].


Figure 3. Vertical section of a flower from an apple tree, as a general example for trees belonging to the Rosaceae family [19].

The typical flower of trees belonging to the Rosaceae family has five sepals, five petals, and twenty stamens (Figure 3). Generally, flowers from pome fruit trees are self-incompatible, and it is necessary to use another cultivar as a pollinator. This self-incompatibility of pome fruit trees is attributed to the fact that the varieties or cultivars can be classified into two different chromosomal groups: (i) diploid varieties, with thirty-four chromosomes and a regular meiosis; and (ii) triploid varieties, with fifty-one chromosomes and an irregular meiosis that results in a very low germination capacity of the pollen. Fruits from trees within the Rosaceae family are complex fleshy fruits, which result from the development of the flower ovary and the fused tissue that surrounds it [19], [20].

2.1.2 Vegetative propagation of pome fruit trees

In orchards, pome fruit trees are propagated by grafting scions of the desired cultivars onto a selected rootstock. The vegetative propagation of planting material by grafting allows growers to reproduce the desired characteristics of the cultivar. Rootstocks directly influence the characteristics of the scion, the fruit set, the productivity, and the fruit characteristics. The selection of the correct cultivar in combination with the selection of the correct rootstock is the basis of a successful orchard, and it is essential to prevent incompatibility between rootstock and scion, which can lead to bead and necrosis of the scion (Figure 4). This incompatibility can have a genetic or a viral origin. For example, incompatibility can occur if either the scion or the rootstock is infected with virulent strains of apple stem pitting virus (ASPV) [21].

Figure 4. Quality of the union between three different rootstocks (M26, M9, and M106) with scions from the apple variety Granny Smith. Grafting on M26 presents high incompatibility symptoms, grafting on M9 has medium signs of incompatibility, and grafting on M106 is successful [21].

2.1.3 Apple tree (Malus Mill.)

Apple is one of the most important and widely cultivated temperate fruit crops worldwide; the genus consists of about twenty-five species and multiple subspecies, known as crabapples. The scientific name Malus x domestica is generally accepted as the denomination for the cultivated apple, and it is believed to have originated in Central Asia, which is also the area with the highest diversity of this crop. Malus sieversii is considered the progenitor of the domestic apple, although in some articles they are both referred to as Malus pumila. From Malus x domestica, a considerable number of cultivars have been produced and adapted to the growing conditions of each country. For example, two of the best-known and most important apple cultivars are “Red Delicious”, first reported in 1880 in the United States, and “Golden Delicious”, first reported in 1890, also in the United States [10], [22].

Apples can be classified depending on their concentration of malic acid and tannins: (i) bitter-sweet apples, which contain more than 0.2% of tannins and less than 0.45% acidity; (ii) sharp apples, which contain less than 0.2% of tannins and more than 0.45% acidity; and (iii) sweet apples, which contain less than 0.2% of tannins and less than 0.45% acidity. Apples are a coveted fruit for their high percentage of carbohydrates and food energy; their low fat, cholesterol, and sodium content; and their high percentage of pectins, phenolic antioxidants, and flavonoids [23]. The fruit originates and develops from the inferior ovary wall and the floral tube, which fuses to the ovary wall and then expands and ripens [22]. The colour of the skin ranges from yellow and yellow-green to bright red (Figure 5). Apples are harvested in late summer or early autumn, but they can be stored at 0ºC for a long period of time due to their low respiration rate.
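The tannin and acidity thresholds given above can be expressed as a small decision rule. The following Python sketch is only an illustration: the function name and the label returned for the combination not listed in the text (high tannins and high acidity) are assumptions, and values exactly equal to a threshold are treated as "not above" it, which the text does not specify.

```python
TANNIN_THRESHOLD = 0.2    # % tannins, threshold quoted in the text
ACIDITY_THRESHOLD = 0.45  # % acidity, threshold quoted in the text


def classify_apple(tannin_pct: float, acidity_pct: float) -> str:
    """Return the class of an apple from its tannin and acidity percentages."""
    high_tannin = tannin_pct > TANNIN_THRESHOLD
    high_acid = acidity_pct > ACIDITY_THRESHOLD
    if high_tannin and not high_acid:
        return "bitter-sweet"
    if not high_tannin and high_acid:
        return "sharp"
    if not high_tannin and not high_acid:
        return "sweet"
    # High tannins and high acidity: not one of the three classes listed
    # in the text (hypothetical label, for illustration only).
    return "outside the three classes"


print(classify_apple(0.3, 0.2))  # high tannins, low acidity
print(classify_apple(0.1, 0.6))  # low tannins, high acidity
```

Using two boolean flags keeps the three cases from the text visible as separate branches and makes the uncovered fourth combination explicit rather than silently assigning it to a class.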

Figure 5. Fruits produced by several apple tree (Malus Mill.) cultivars to illustrate the skin colour range of apples. From left to right: Golden Delicious, Granny Smith, Boskoop, Gala, and Red Delicious [13].

2.1.4 Pear tree (Pyrus L.)

The genus Pyrus consists of about twenty species and is believed to have originated in Central Asia; by the end of the Middle Ages, over two hundred cultivars had been described. The common pear is also referred to as Pyrus communis. Pear trees are mostly cultivated in temperate regions, as they do not tolerate extensive periods of winter frost. They also need a soil rich in nutrients and a well-drained subsoil, which can influence fruit quality. Pear trees are generally propagated by grafting but can be propagated by seeds in wild populations. The colour of their skin ranges from yellow and yellow-green to brown, with very few cultivars presenting bright dark red skin (Figure 6). Red skin in pears is an attractive trait for consumers not only for its aesthetic appeal but also for the health benefits provided by the anthocyanins and their antioxidant effects [24]. Unlike apples, pears have gritty cells around their core called stone cells.

Figure 6. Fruits from several pear tree (Pyrus L.) cultivars to illustrate the skin colour range of pears. From left to right: Green Anjou, Bartlett, Bosc, and Red Anjou [25].

2.2 IMPORTANCE OF GERMPLASM COLLECTIONS FOR CONSERVATION OF GENETIC DIVERSITY

Demand for specific characteristics in fruits and vegetables changes from year to year, but intensive farming systems have dramatically reduced apple genetic diversity and the use of local varieties. Cultivated species and cultivars comprise a small fraction of the totality available, and there is a risk of losing ancient cultivars and their genetic characteristics. Germplasm is described as the living genetic resources of an organism, and storing it in collections is particularly useful both for plant breeding and for the preservation of genetic biodiversity. In the case of plants, germplasm in collections is stored as seeds in cool storage, and as plants kept and grown in nurseries or orchards [5], [6], [26].

2.3 HIGH-THROUGHPUT SEQUENCING (HTS) FOR VIROME CHARACTERIZATION

Diagnostic methods for known viral infections have historically included simple serological assays such as ELISA, molecular hybridization, polymerase chain reaction (PCR), real-time PCR, electron microscopy, and indexing [27]–[29]. The immunological and molecular techniques target known viruses, and thus they are not suited for the identification of unknown pathogens. Techniques with a broader range of detection include indexing and electron microscopy, although they do not identify the virus, and further analyses need to be carried out in order to determine its species. In comparison, HTS allows the identification of novel pathogens without any a priori knowledge of the pathogen, from environmental and host tissue samples as well as from infections with no obvious symptoms [30]. HTS has also been a step forward towards better tools for plant health diagnostics used, for example, by National Plant Protection Organisations (NPPOs) to control the importation and movement of plant material and the associated pests [31].

Figure 7. Sequencing costs per human genome between 2001 and 2019. From: https://www.genome.gov/about-genomics/fact-sheets/Sequencing-Human-Genome-cost.

Table 2. Comparison of detection method, form of nucleotide addition, enzyme, amplification, length of the products, and fixation of DNA between most currently used high-throughput sequencing (HTS) technologies (Illumina, Ion torrent, PacBio, and Oxford).

Feature | Illumina | Ion Torrent | Pacific Bioscience | Oxford Nanopore
Detection method | Fluorescence | Electric signal | Fluorescence | Electric signal
Nucleotides | Mix | One by one | Mix | n.a.
Enzyme | Reagent | Reagent | Fixed | Fixed
Amplification | Bridge | Beads | No | No
Length | Max. 300 nt, paired | Max. 300 nt | Up to 10 kb | Up to 10 kb
Fixation of DNA | Oligo on cell | Oligo on beads | Enzyme | Enzyme + nanopore


In the mid-1970s, a first sequencing technology based on radioactivity was developed. This method uses isotopic radioactive labelling of primers for DNA ladder imaging [32]. Leroy Hood modified this method (Sanger’s method) by replacing radioactive labelling with coloured fluorescent dyes. These modifications were the basis for the first semi-automated DNA sequencing platform [33]. In the mid-2000s, second-generation sequencing technologies, also known as next-generation sequencing (NGS), were developed. These technologies are based on cyclic array sequencing by synthesis. They use a massive matrix configuration where thousands to millions of DNA fragments are analysed simultaneously. In contrast to first-generation technologies, next-generation sequencing methods do not use electrophoresis, which results in lower consumption of biological material and in improved quality and yield [34]. With the release of the first truly HTS platform during the mid-2000s, the cost of sequencing a human genome dropped 50,000-fold compared to the costs of the Human Genome Project (Figure 7). NGS includes PCR-based next-generation DNA-sequencing technologies, such as Roche 454 (discontinued since 2016) or Illumina sequencing by synthesis (see https://www.illumina.com/), and single-molecule DNA-sequencing technologies, such as the Helicos Biosciences HeliScope or the Pacific Biosciences (PacBio) (see https://www.pacb.com/) single-molecule real-time (SMRT) DNA sequencer (Table 2). Another sequencing-by-synthesis technology is Ion Torrent (see https://www.thermofisher.com/be/en/home/brands/ion-torrent.html) [35], [36]. Currently, Illumina sequencing technology is the most used within the scientific community. Nevertheless, more recent technologies, such as Oxford Nanopore (see https://nanoporetech.com/), are being developed. These new technologies can be considered third-generation sequencers.
Third-generation sequencing does not require PCR amplification of the DNA before sequencing, which reduces preparation time as well as the bias and errors caused by PCR, and the signal is captured in real time, whether it is fluorescent (PacBio, Table 2) or electric (Oxford Nanopore, Table 2). The advantages of PacBio and Nanopore technologies over Illumina and Ion Torrent are the maximum length of the sequenced fragments and the absence of prior amplification of the nucleic acids (Table 2). The development of future HTS technologies focuses on broader use, increased throughput, and decreased sequencing costs [37], [38].

2.4 BIOINFORMATICS

With the first appearance of HTS platforms, researchers started to develop specific bioinformatic tools to manage and analyse the large amounts of data generated by sequencing technologies. However, the field of bioinformatics was first developed in the early 1950s with the use of computational methods in protein sequence analysis [39]. Margaret Dayhoff (1925-1983) was a pioneer in the application of computational methods to the field of biochemistry and, together with David J. Lipman, is considered one of the parents of bioinformatics. After the 1970s, there was a paradigm shift from protein to DNA analysis with the contributions of Francis Crick (1916-2004) and his sequence hypothesis, which is known as the “Central Dogma”. Then, during the 1980s, there were advances in biology and computer science such as the development of molecular methods to target and amplify specific genes by polymerase chain reaction (PCR). At this point, with the advances in sequencing technologies and the increase in generated data (Figure 8), smaller, more accessible computers and specialized software were developed. During the next decade, the first complete genome of a free-living organism (Haemophilus influenzae) was sequenced by The Institute for Genomic Research (TIGR) [40]. Another milestone in bioinformatics was the finalization of the Human Genome Project, which was completed in 2004 after thirteen years and at a cost of $2.7 billion by the U.S. National Institutes of Health (NIH). Nowadays, bioinformatics is facing the challenge of handling Big Data and of ensuring the reproducibility of results from published research [41]. The use of bioinformatics in genomics is based on three basic steps: (i) quality control of the generated sequences, which can include a demultiplexing step to separate and assign the generated sequences to a specific sample; (ii) taxonomic or functional assignation of the sequences, which can be performed on individual reads, on sequences produced after the de novo assembly of the individual reads, or on consensus sequences obtained after the mapping of the reads to reference sequences; and (iii) verification of the results to rule out potential false positives and/or false negatives [42].
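The demultiplexing part of step (i) can be illustrated with a minimal Python sketch. It assumes that each read starts with an exact sample barcode; the barcodes, sample names, and function name are hypothetical, and real demultiplexing tools additionally handle sequencing errors, quality scores, and paired reads, which are ignored here.

```python
from collections import defaultdict


def demultiplex(reads, barcodes):
    """Assign each read to a sample based on an exact barcode prefix.

    reads: iterable of read sequences (strings).
    barcodes: dict mapping a barcode sequence to a sample name.
    Returns a dict mapping sample names to lists of reads with the
    barcode trimmed; reads without a known barcode go to "unassigned".
    """
    by_sample = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                by_sample[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched this read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


# Hypothetical barcodes and reads, for illustration only.
example_barcodes = {"ACGT": "sample_A", "TTGA": "sample_B"}
example_reads = ["ACGTGGGCCC", "TTGAAAATTT", "GGGGGGGGGG"]
print(demultiplex(example_reads, example_barcodes))
```

Keeping unmatched reads in a separate "unassigned" group, rather than discarding them, makes it possible to check afterwards how much data was lost at this step.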

Figure 8. Increase in (a) number of nucleotides available in Genbank and (b) number of protein structures published in PDB [43].

2.5 EXTRACTION PROTOCOLS

Different nucleic acids are targeted in plant viral metagenomics, which is the study of microbial genomes directly from environmental samples [44]. The four main types of nucleic acids used as targets in plant viral metagenomics are total RNA or DNA, virion-associated nucleic acids (VANA) purified from virus-like particles, double-stranded RNA (dsRNA), and virus-derived small interfering RNAs (siRNAs) (Table 3) [45].

Table 3. Types of viral nucleic acids used for metagenomic studies. Abbreviations: dsRNA, double-stranded RNA; ssRNA, single-stranded RNA; siRNA, small interfering RNA; VANA, virion-associated nucleic acid [45].

Virus-associated nucleic acid | Subcellular location | Virus genome types | Drawbacks
dsRNA | Cytoplasm, virions | (+) ssRNA, dsRNA | Misses (-) RNA viruses and DNA viruses
DNA | Nucleus, virions | ssDNA, dsDNA | Misses RNA viruses
ssRNA | Cytoplasm, nucleus, virions | (+) and (-) ssRNA; dsRNA (as transcripts); DNA (as transcripts) | High background; may miss low titer viruses; may miss dsRNA genomes
siRNA | Cytoplasm, nucleus | ssRNA, dsRNA, ssDNA | May miss persistent viruses; may be difficult to accurately assemble novel viruses; high background
VANA | Cytoplasm, nucleus | ssRNA, dsRNA, ssDNA, dsDNA | Generally, cannot detect unencapsidated viruses; poor recovery of viruses from some plants

The advantages of the total RNA protocol are that it allows the identification of any RNA or DNA virus and viroid with no previous enrichment of viral sequences, and that it can be used on individual and pooled samples. However, its sensitivity is limited for viruses that are present in low concentrations. The dsRNA protocol allows the detection of a higher diversity of RNA viruses than other protocols, and it allows the detection of viroids. During dsRNA extraction, viral sequences are enriched, and the protocol can be used for both individual and pooled samples. However, it is time-consuming and it does not allow the detection of DNA viruses [46]. The advantage of the siRNA extraction protocol is that it is very sensitive in the detection of known and unknown viruses in single plants. However, it can be problematic for viruses that do not trigger silencing responses or that produce silencing suppressors [47]. Compared to the dsRNA approach, the VANA protocol does not allow the detection of viroids, and it does not detect viruses that are not encapsidated. Nevertheless, VANA has been shown to perform better in complete viral genome reconstruction, and it allows the detection of more recombinants [46].

2.6 PLANT VIRUSES

2.6.1 Definition

A virus can be described as “a set of one or more nucleic acid template molecules, either RNA or DNA, normally encased in a protective coat or coats of protein or lipoprotein, that is able to organize its own replication only within suitable host cells” [48]. Viral particles, or virions, are constituted by a genome and a capsid, which protects the genome and is made of proteins. In some cases, the virion also contains an external membrane made of lipoproteins. Viroids are similar to viruses but they consist of a short, single-stranded RNA fragment without the coat protein. The most important feature that characterizes viruses and viroids is that they are obligate parasites which depend on the host’s cellular machinery to reproduce [48], [49].

2.6.2 Taxonomy

Historically, viruses were considered stable entities that produced a certain disease in a specific host species, and they were classified by disease symptoms and host plants. Tobacco mosaic virus (TMV) was the first virus discovered in plants, by Beijerinck in 1899, although the symptoms of infection by TMV were first described by Mayer in 1886. It was in the early 1930s that the concept and study of plant viruses took a turn. In the following decades, scientists started to realize that (i) viruses can have different strains which can produce different symptoms in the same host, (ii) different viruses can cause similar symptoms in the same host, and (iii) some diseases are caused by a mixed viral infection [48], [50].

Figure 9. Comparison of the ICTV taxonomic rank hierarchy from 1991 to 2017 and in 2019. The number of taxa assigned to each rank from the new hierarchy are shown in white. Black arrows show the common taxonomic ranks between the five-rank (1991-2017) and the fifteen-rank (2019) structures [51].

Figure 10. Distribution of plant virus species by family. Numbers in black indicate the quantity of plant virus species recognized by the ICTV in each family, and family names with an asterisk include species that infect other nonplant hosts [52]. This graphic was published in an article from Roossinck in 2012, thus the viral families that have been included by the ICTV after 2012 are not included.

Since then, there have been several attempts to develop a reliable classification system for viruses. A new hierarchy of virus taxonomy, recently published and approved by the ICTV, shifts from a five-rank structure to a fifteen-rank structure more similar to the Linnaean taxonomic system (Figure 9) [51]. Nevertheless, one of the most widely used and accepted systems is the Baltimore classification, which was developed in 1971 by David Baltimore. This classification divides viruses into seven groups depending on their genome type and replication method: group I includes double-stranded (ds) DNA (dsDNA) viruses; group II includes positive-sense single-stranded (ss) DNA (+ssDNA) viruses; group III includes dsRNA viruses; group IV includes positive-sense ssRNA (+ssRNA) viruses; group V includes negative-sense ssRNA (-ssRNA) viruses; group VI includes positive-sense ssRNA viruses that replicate through a DNA intermediate (retroviruses); and group VII includes dsDNA viruses that replicate through a single-stranded RNA intermediate (pararetroviruses) [53]. Currently, plant viruses are classified in more than twenty viral families (Figure 10).
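The seven Baltimore groups described above can be summarized as a lookup from genome type and replication strategy to group number. The following Python sketch is illustrative only: the genome labels, the `reverse_transcribing` flag, and the function name are assumptions introduced here, and the groups are numbered I to VII.

```python
# Keys: (genome type, replicates through reverse transcription).
BALTIMORE_GROUPS = {
    ("dsDNA", False): "I",
    ("ssDNA", False): "II",
    ("dsRNA", False): "III",
    ("+ssRNA", False): "IV",
    ("-ssRNA", False): "V",
    ("+ssRNA", True): "VI",   # ssRNA replicating through a DNA intermediate
    ("dsDNA", True): "VII",   # dsDNA replicating through an ssRNA intermediate
}


def baltimore_group(genome: str, reverse_transcribing: bool = False) -> str:
    """Return the Baltimore group (as a Roman numeral) for a genome type."""
    try:
        return BALTIMORE_GROUPS[(genome, reverse_transcribing)]
    except KeyError:
        raise ValueError(
            f"No Baltimore group for genome={genome!r}, "
            f"reverse_transcribing={reverse_transcribing}"
        ) from None


# A (+)ssRNA virus without reverse transcription falls in group IV.
print(baltimore_group("+ssRNA"))
```

A dictionary keyed on both properties makes it explicit that the genome type alone is not sufficient: dsDNA and (+)ssRNA genomes each map to two groups depending on the replication strategy.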

2.6.3 Genome composition and organization

In general, the viral genome includes several coding and non-coding regions. Coding regions express the proteins required to continue the infectious cycle, such as replication-associated proteins and coat proteins. These coding regions can also be identified as open reading frames (ORFs), which usually start with an AUG codon and stop with one of the three stop codons. However, there are some cases of plant viruses with ORFs that do not start with the AUG codon, such as Peach chlorotic mottle virus (PCMV), in which an ORF starts with an AUC codon. Non-coding regions control the expression of the coding regions and the replication of the genome [48].
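The ORF definition above (a start codon, usually AUG, followed in the same reading frame by one of the three stop codons) can be turned into a simple scan. The Python sketch below is a minimal illustration, not the method used in this project: it searches only the three forward reading frames, reports the first start codon before each stop, and takes the set of accepted start codons as a parameter so that non-AUG starts such as AUC can be tested. The function and parameter names are assumptions.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orfs(rna, start_codons=("AUG",), min_codons=2):
    """Return (start, end) positions of ORFs on the forward strand.

    Positions are 0-based and end-exclusive, and the stop codon is
    included in the reported span. DNA input is accepted and converted
    to RNA (T -> U).
    """
    rna = rna.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None and codon in start_codons:
                start = i  # first start codon since the last stop
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Toy sequence: AUG AAA UAA in the third reading frame.
print(find_orfs("GGAUGAAAUAAGG"))
# Allowing AUC as a start codon, as described for PCMV.
print(find_orfs("AUCAAAUAG", start_codons=("AUC",)))
```

ORFs that run to the end of the sequence without a stop codon are not reported by this sketch, and the reverse strand is not scanned.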

2.6.4 Apple tree viruses

There are currently thirty-nine viruses and viroids that have been detected infecting apple and pear (Table 4), which are transmitted by grafting [54]. Apple and pear viruses can be found in numerous viral families, including Betaflexiviridae, Luteoviridae, Phenuiviridae, and Pospiviroidae, among others (Table 4). However, the following sections only describe the viruses that were detected during this project and that were studied in depth: Apple stem pitting virus (ASPV), Apple stem grooving virus (ASGV), Apple rubbery wood virus (ARWV), Apple luteovirus 1 (ALV-1), and Apple hammerhead viroid (AHVd).


Table 4. Summary of the most common viruses infecting apple and pear trees, their taxonomy, their group according to the Baltimore classification, and the symptoms they produce [54].

Virus | Family | Genus | Group | First publication
Malus domestica virus A (MdoVA) | Closteroviridae | Velariviruses | ssRNA | [55]
Apple chlorotic fruit spot viroid (ACFSVd) | | Apscaviroid | viroid | [56]
Apple rootstock virus A (ApRVA) | Rhabdoviridae | Nucleorhabdovirus | ssRNA | [57]
Apple hammerhead viroid (AHVd) | | Pelamoviroid | viroid | [58]
Apple rubbery wood virus (ARWV) | Phenuiviridae | Rubodvirus | ssRNA | [59]
Apple rubbery wood-associated viruses (ARWaV) | Phenuiviridae | Bunyavirales | ssRNA | [60]
Citrus concave gum associated virus (CCGaV) | Bunyavirales | Coguvirus | ssRNA | [61]
Apple luteovirus (ALV) | Luteoviridae | Luteovirus | ssRNA | [62]
Apple-associated luteovirus (AaLV) | Luteoviridae | Luteovirus | ssRNA(+) | [63]
Apple necrotic mosaic virus (ApNMV) | Bromoviridae | | ssRNA(+) | [64]
Temperate fruit decay-associated virus (TFDaV) | unclassified DNA viruses | unclassified ssDNA viruses | circular DNA | [65]
Apple geminivirus (AGV) | Geminiviridae | unclassified Geminiviridae | circular DNA | [66]
Apple green crinkle associated virus (AGCaV) | Betaflexiviridae | Foveavirus | ssRNA(+) | [67]
Blackberry chlorotic ringspot virus (BCRV) | Bromoviridae | Ilarvirus | ssRNA(+) | [68]
Apple latent spherical virus (ALSV) | Secoviridae | | ssRNA(+) | [69]
Apple fruit crinkle viroid (AFCVd) | Pospiviroidae | Apscaviroid | viroid | [3]
Apple dimple fruit viroid (ADFVd) | Pospiviroidae | Apscaviroid | viroid | [70]
Pear blister canker viroid (PBCVd) | Pospiviroidae | Apscaviroid | viroid | [71]
Peach latent mosaic viroid (PLMVd) | Avsunviroidae | Pelamoviroid | viroid | [72]
Apple scar skin viroid (ASSVd) | Pospiviroidae | Apscaviroid | viroid | [73]
Hop stunt viroid (HSVd) | Pospiviroidae | | viroid | [74]
Tomato chlorosis virus (ToCV) | Closteroviridae | | ssRNA(+) | [75]
Apple stem pitting virus (ASPV) | Betaflexiviridae | Foveavirus | ssRNA(+) | [76]
Apple stem grooving virus (ASGV) | Betaflexiviridae | | ssRNA(+) | [77]


Continuation of Table 4.

Virus | Family | Genus | Group | First publication
Apple flat limb virus | - | - | ssRNA | [78]
Apple chlorotic leaf spot virus (ACLSV) | Betaflexiviridae | | ssRNA(+) | [79]
Tulare apple mosaic virus (TAMV) | Bromoviridae | Ilarvirus | ssRNA(+) | [80]
Cherry necrotic rusty mottle virus (CNRMV) | Betaflexiviridae | Robigovirus | ssRNA(+) | [81]
Cherry leaf roll virus (CLRV) | Secoviridae | | ssRNA(+) | [82]
Cherry rasp leaf virus (CRLV) | Secoviridae | Cheravirus | ssRNA(+) | [83], [84]
Potato virus S (PVS) | Betaflexiviridae | | ssRNA(+) | [85]
Prunus necrotic ringspot virus (PNRSV) | Bromoviridae | Ilarvirus | ssRNA(+) | [86]
Apple mosaic virus (ApMV) | Bromoviridae | Ilarvirus | ssRNA(+) | [87]
Cucumber mosaic virus (CMV) | Bromoviridae | | ssRNA(+) | [88]
Tomato ringspot virus (ToRSV) | Secoviridae | Nepovirus | ssRNA(+) | [89]
Tobacco ringspot virus (TRSV) | Secoviridae | Nepovirus | ssRNA(+) | [90], [91]
Tomato bushy stunt virus (TBSV) | Tombusviridae | | ssRNA | [92], [93]
Carnation mottle virus (CarMV) | Tombusviridae | Alphacarmovirus | ssRNA(+) | [94]
Clover yellow mosaic virus (ClYMV) | Tymovirales | | ssRNA(+) | [95]


2.6.4.1 Apple stem pitting virus (ASPV)

Apple stem pitting virus (ASPV) is the type member of the genus Foveavirus (ICTV, 2011) within the family Betaflexiviridae, which includes plant viruses with a single molecule of linear ssRNA [76]. Multiple viruses known to infect pome fruit trees belong to the same family, mostly to the genera Foveavirus and Capillovirus (Table 4). Virions are flexuous filaments, usually 12 – 13 nm in diameter (range 10 – 15 nm) and 600 – 1000 nm in length, depending on the genus. They have helical symmetry with a pitch of about 3.4 nm (range 3.3 – 3.7 nm), and in some genera cross-banding is clearly visible (Figure 12). The genome of ASPV comprises five open reading frames (ORFs). ORF1 is preceded by a short 5’ untranslated region (UTR) of about 33 – 72 nt. ORF1 encodes a replication-related protein (RdRp); ORFs 2, 3, and 4 constitute the triple gene block (TGB); and ORF5 encodes the coat protein (CP). At the 3’ end of the genome there is a UTR of about 176 – 312 nt followed by a poly(A) tail (Figure 11).

Figure 11. Genome organization of Apple stem pitting virus (ASPV), the type member of Foveavirus. Boxes indicate open reading frames (ORFs) and the relative position of their expression products. Abbreviations: Mtr, methyltransferase; P-Pro, papain-like protease; Hel, helicase; Pol, polymerase; TGB, triple gene block; CP, capsid protein. From: https://talk.ictvonline.org/

Figure 12. Negative contrast electron micrograph of particles of an isolate of Apple stem pitting virus (ASPV). The black bar at the lower left corner represents 100 nm. From: https://talk.ictvonline.org/

Figure 13. Worldwide distribution map of Apple stem pitting virus (ASPV). Yellow dots indicate countries where ASPV has been reported (incomplete map). Distribution of ASPV is probably much wider, but the database used to generate this map was, most probably, not updated and incomplete. From: https://gd.eppo.int/taxon/ASPV00/distribution.


Figure 14. Detail of stem pitting symptoms produced by apple stem pitting virus (ASPV) infection. From: https://gd.eppo.int/taxon/ASPV00/photos.

Figure 15. Comparison in stem and leaf growth between a healthy and an apple stem pitting virus (ASPV) infected plant showing symptoms of plant decline. From: https://gd.eppo.int/taxon/ASPV00/photos.

ASPV is widely distributed, having been reported in countries from Europe, Africa, and Asia (Figure 13). The natural host range of foveaviruses includes apple, pear, and quince, among others. ASPV is transmitted by grafting, through infected clonal rootstocks. In apple trees, infection with ASPV is usually asymptomatic, but it causes xylem pits in the stem of Malus pumila Virginia Crab (Figure 14) and epinasty and decline of Spy 227 (Figure 15), which is an ornamental variety of apple [96]. In pears, it is known as pear vein yellows virus (PVYV) and it is the causal agent of necrotic spot and vein yellows diseases. According to the ICTV, species from the family Betaflexiviridae are differentiated by their natural host range, serological specificity, CP size, and the identity of their genome (less than about 72% identity at the nucleotide level or 80% at the amino acid level between their CP or polymerase genes).
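The sequence-identity part of the species demarcation criteria above can be sketched in Python. This is a simplified illustration under stated assumptions: the two sequences are assumed to be already aligned and of equal length, alignment columns containing a gap ("-") are ignored, and the function names are hypothetical. Sequence identity is only one of the criteria listed, so the result is an indication rather than a taxonomic decision.

```python
from typing import Optional


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences, ignoring gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("Aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # simplification: skip columns with a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("No comparable positions")
    return 100.0 * matches / compared


def may_be_distinct_species(nt_identity: Optional[float] = None,
                            aa_identity: Optional[float] = None,
                            nt_threshold: float = 72.0,
                            aa_threshold: float = 80.0) -> bool:
    """True if CP or polymerase gene identity falls below either threshold."""
    if nt_identity is None and aa_identity is None:
        raise ValueError("Provide at least one identity value")
    below_nt = nt_identity is not None and nt_identity < nt_threshold
    below_aa = aa_identity is not None and aa_identity < aa_threshold
    return below_nt or below_aa


# Toy aligned sequences: 3 of 4 positions match.
identity = percent_identity("AUGC", "AUGG")
print(identity, may_be_distinct_species(nt_identity=identity))
```

The thresholds are exposed as parameters because the text gives them as approximate values ("about 72%").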

2.6.4.2 Apple stem grooving virus (ASGV)

Apple stem grooving virus (ASGV) belongs to the family Betaflexiviridae, subfamily Trivirinae, and the genus Capillovirus, of which it is the type member (ICTV, 2011). The name Capillovirus was given to this genus because of the hair-like shape of the particles (capillus means hair in Latin). ASGV has a linear, positive-strand ssRNA (+ssRNA) genome of about 6.5 – 7.5 kb. Virions from the genus Capillovirus are non-enveloped, filamentous particles approximately 640 nm long and 12 nm in diameter (Figure 17). The virion RNA acts as both the genome and the viral messenger RNA (mRNA). A distinctive feature of capilloviruses is their genomic organization: they have two ORFs, one encoding a large replication-associated protein together with a coat protein, and a nested ORF encoding a putative movement protein (Figure 16).

Figure 16. Genome organization of Capilloviruses. Boxes indicate open reading frames (ORFs) and the proteins they encode: Mtr, methyltransferase domain; P-Pro, papain-like protease domain; Hel, helicase domain; RdRp, RNA polymerase domain; MP, putative movement protein; CP, coat protein. From: https://talk.ictvonline.org/

Figure 17. Image of particles of an isolate of Apple stem grooving virus (ASGV) produced by negative contrast electron micrograph. The bar in the lower, right corner represents 100 nm. From: https://talk.ictvonline.org/

Figure 18. Worldwide distribution map of Apple stem grooving virus (ASGV). Yellow dots indicate countries where ASGV has been reported. From: https://gd.eppo.int/taxon/ASGV00/distribution.

Figure 19. Necrotic grooves at the graft union of an unknown rootstock budded with Virginia Crab, a symptom of apple stem grooving virus (ASGV) infection. From: https://gd.eppo.int/taxon/ASGV00/photos.

ASGV is widely distributed around the world, with a higher presence in the Northern Hemisphere (Figure 18). Natural hosts of capilloviruses include several important horticultural and ornamental crops such as apple, nandina, or cherry. ASGV is transmitted mechanically and by grafting, and it is not considered epidemic where certification programs for planting material are applied. However, it can cause graft incompatibility if the scions are grafted onto sensitive material (Figure 19). In apple trees, infection with ASGV is generally symptomless, and symptoms are only expressed when an infected cultivar is grafted onto a sensitive rootstock [98].

2.6.4.3 Apple rubbery wood virus (ARWV)

Apple rubbery wood virus-1 (ARWV-1) and 2 (ARWV-2) have been proposed as new members of the family Phenuiviridae. The new names proposed to the ICTV are Apple rubodvirus 1 and 2 [99]. This family was incorporated into the order Bunyavirales in 2016 by the ICTV. The virus can also be found under the name Apple rubbery wood associated virus (ARWaV). Virions are enveloped and spherical, 80 – 120 nm in diameter. They have a segmented, negative-stranded, linear RNA genome that consists of three segments: (i) the L (for Large) segment, of about 6.4 kb; (ii) the M (for Medium) segment, of about 3.2 kb; and (iii) the S (for Small) segment, of about 1.7 kb (Figure 20).

Figure 20. Genome organization of viruses from the family Phenuiviridae. Boxes indicate open reading frames (ORFs) and the proteins they encode: L segment encodes RNA polymerase protein (RdRp); M segment encodes for several polyproteins by leaky scanning; S segment encodes for several proteins, and both genomic and antigenomic RNA is transcribed [100].

Figure 21. Apple rubbery wood disease in an apple tree infected with Apple rubbery wood virus-1 (ARWV-1) [59].

In 2018, the new genus Rubodvirus was proposed to be added to the family Phenuiviridae, but it is still pending approval by the ICTV. The name Rubodvirus is derived from Apple rubbery wood virus. This new genus was proposed by Rott et al. in 2018 to accommodate Apple rubbery wood virus 1 (ARWV-1) and Apple rubbery wood virus 2 (ARWV-2). Even though the genome of Apple rubbery wood virus (ARWV) from trees presenting symptoms of Apple rubbery wood disease was first sequenced and described in 2018 [59], ARW disease was first described in 1971 in New Zealand [101]. According to the published sequences of ARWV, ARWV-1 has been detected in Brazil, Canada, South Korea, and the United States. ARWV is believed to be the causal agent of Apple rubbery wood disease. Interestingly, the family Phenuiviridae has a wide host range and it includes both animal and plant viruses. This disease causes abnormal flexibility in branches from susceptible cultivars (Figure 21), although in some cases rigidity is restored in some or all branches later in the tree's development [102], [103].

2.6.4.4 Apple luteovirus 1 (ALV-1)

Apple luteovirus 1 (ALV-1), identified by Liu et al. (2018), is a virus from the family Luteoviridae, which includes forty-four species recognised by the ICTV. This family is divided into three genera: Enamovirus, Polerovirus, and Luteovirus. The group name Luteovirus is derived from the Latin “luteus”, which means yellow, because the type member (barley yellow dwarf virus) and other members of the group cause yellowing in their respective hosts [104]. To date, ALV-1 is the only virus from the family Luteoviridae to have been described on apple or pear [62]. Viruses from the genus Luteovirus have a genome that contains six ORFs, similar to that of ALV-1 (Figure 22). ORF1 and ORF2 encode the putative protein P1 and a fusion between P1 and P2, the latter produced by a -1 frameshift mechanism. These two proteins together form a putative replicase complex. ORF3 encodes a putative coat protein (CP), and ORF4 encodes a putative movement protein (MP). ORF3a encodes a small protein essential for long-distance movement. ORF1a and ORF5a are only present in this virus, and the putative proteins they encode have no sequence homology with any other known proteins. On the other hand, ORF6 and ORF7 are present in other viruses from the genus Luteovirus [62].

Figure 22. Genome organization of Apple luteovirus 1 (ALV-1). Boxes indicate the proposed open reading frames (ORFs) and the proteins they encode: RdRp, RNA polymerase domain; CP, coat protein; MP, movement protein [62].

Figure 23. Symptoms of Rapid Apple Decline (RAD) showing necrosis from a declining tree with bark removed from the graft union. From: https://extension.psu.edu/apple-disease-rapid-apple-decline-rad-or-sudden-apple-decline-sad.

According to the sequences published on NCBI, this novel virus has been detected in Greece, South Korea, and the United States of America. It is associated with Rapid Apple Decline (RAD), a disease that produces necrosis at the graft union and a rapid collapse of apple trees after the first appearance of symptoms [105].

2.6.4.5 Apple hammerhead viroid-like RNA (AHVd-like RNA)

Apple hammerhead viroid-like RNA (AHVd-like RNA) has a single-stranded circular RNA genome 453 nucleotides long. Viroids do not have protein-coding capabilities and, thus, they depend on host-encoded DNA-dependent RNA polymerase for their replication. AHVd-like RNA was first described by Zhang et al. (2014). According to the published sequences of AHVd-like RNA, it has only been detected in China and Canada.

Figure 24. Proposed primary and secondary structures for Apple hammerhead viroid-like RNA (AHVd-like RNA). In subfigure A, the mutations observed in different variants of the viroid are indicated in blue and the sequences forming the hammerhead structures are delimited by flags. The two pairs of arrows represent the primers used for amplification. Subfigure B shows three schematic maps of the plus, minus, and consensus hammerhead structures of the viroid [106].

Figure 25. Symptoms of apple scar skin disease in apple. A: symptoms of small circular spots are more pronounced at the calyx end. B: misshaping of apple fruits. From: https://pnwhandbooks.org/plantdisease/host-disease/apple-malus-spp-scar-skin-dapple-apple.

This viroid is associated with typical symptoms of apple scar skin disease, which produces small circular spots that are more pronounced near the calyx end, as well as smaller and misshapen apples in affected trees (Figure 25) [107]. AHVd-like RNA is a viroid from the family Avsunviroidae and the genus Pelamoviroid, together with peach latent mosaic viroid and chrysanthemum chlorotic mottle viroid [58]. Members of this genus have a circular genomic RNA, with a stable secondary structure in a branched conformation, stabilized by a kissing-loop interaction in the positive (+) strand (Figure 24) [106], [108].

3 OBJECTIVES

Pome fruit viruses associated with economically important diseases have been widely studied and characterized. However, there is a knowledge gap regarding viruses that cause no obvious symptoms. High-throughput sequencing (HTS) techniques allow plant pathologists to study and characterize the plant virome without any previous knowledge, and they provide a new tool for the identification and characterization of novel viruses. For a better management of pome fruit viruses, specifically those of apple and pear, it is necessary to (i) understand the historic evolution of pome fruit viruses, and (ii) have a more complete overview of their viral populations. In the case of pome fruit viruses, it is possible to study their historic evolution because most of them are transmitted by grafting and have no known vector, so there is no horizontal transfer of the viruses. Nevertheless, this assumption could change with the new viruses being discovered with HTS techniques. For example, Apple luteovirus 1, which was discovered in 2018, is a member of the family Luteoviridae, which includes viruses that are mainly transmitted by vectors. The objective of this project is to complete the knowledge on viruses infecting apple and pear by sequencing samples from a germplasm collection that hosts a wide diversity of cultivars.

In addition, secondary objectives of the project were focused on technical development and protocol comparison:

i. Developing and applying for the first time the dsRNA protocol for virome study in the laboratory. Previous studies have shown that the double-stranded RNA (dsRNA) protocol is more efficient than VANA for RNA viruses from weeds and certain crops [46]. The VANA extraction protocol has been widely used in the Laboratory of Integrated and Urban Phytopathology of Gembloux Agro-Bio Tech, but we decided to test the dsRNA protocol from INRA Aquitaine (Bordeaux, France) to assess whether it is more efficient for pome fruit tree virus characterization.

ii. Comparing classical and innovative bioinformatics pipelines for the analysis of high-throughput sequencing data. Classical methodologies for virus detection rely on the construction of longer sequences from the sequencing reads and on their annotation. New approaches have been developed recently and are based on the annotation of individual reads. Two of these approaches have been tested on fruit trees for the first time and compared to the classical approach.

iii. Evaluating the impact of sample pooling on the viruses detected. Up to now, studies of the virome of fruit tree samples have been carried out on individual trees. This individual characterization is interesting when analysing individual symptomatic trees, but it hampers large-scale surveys due to the time and cost of analysis. Pooling samples is particularly interesting for surveying many trees, even though it requires more downstream analyses to identify the individual infected trees.

To complete the knowledge on viruses infecting apple and pear, I have conducted an experiment with samples from the CRA-W germplasm collection in Gembloux, with cultivars dating back to the 19th century. The samples were taken from cultivars of apple (Malus domestica) and pear (Pyrus communis). Extracted genetic material was sequenced in order to identify the viruses infecting the trees and to recover their genomes. Sequencing data were analysed using Geneious (www.geneious.com) and complementary programs such as Kaiju [109] or Kraken [110]. Some of the detected viruses were further characterized by recovering their nearly complete genome and confirming their detection by RT-PCR.

4 MATERIALS AND METHODS

4.1 BIOLOGICAL DATA COLLECTION

Samples were collected from the germplasm collection of the Walloon Agricultural Research Centre (CRA-W), in Gembloux. This organization was founded in 1872 and, since 2002, it has been under the control of the Regional Government of Wallonia. The sampling strategy was to take one leaf from each cardinal point at two different heights, to improve randomization. This sampling strategy was also used to obtain a more complete view of the viral population per tree. Leaves were stored at -80 ºC to preserve viral particles and avoid oxidation and denaturation. Samples were further lyophilised and stored at -20 ºC. Before extraction, ten individual samples (nº 1 – 10) of apple (Malus domestica) and pear (Pyrus communis) were pooled in two replicates in order to extract the viruses from each pool with a different extraction method: double-stranded RNA (dsRNA) [111], and VANA [112].

Table 5. Metadata of the samples taken from the CRA-W in Gembloux. Identification number of the cultivars in the collection, name of the cultivar, tree species (host), number given to samples during laboratory analysis (sample nº), and extraction method used for each sample.

| Identification Number | Cultivar | Host | Sample nº | Extraction method |
|---|---|---|---|---|
| L9.868 | Poire Cuisse Madame | Pyrus communis | 1 | dsRNA |
| L9.870 | Jeanne d’Arc | Pyrus communis | 2 | dsRNA |
| L9.876 | Poire Rougette | Pyrus communis | 3 | dsRNA |
| L9.1104 | Bronzée d’Enghien | Pyrus communis | 4 | dsRNA |
| L9.1106 | Colmar du Mortier | Pyrus communis | 5 | dsRNA |
| Q27 | Gravenstein | Malus domestica | 6 | dsRNA |
| Q35 | Pomme Pellone | Malus domestica | 7 | dsRNA |
| Q37 | Délices de Beignée | Malus domestica | 8 | dsRNA |
| Q39 | Reinette Meurens | Malus domestica | 9 | dsRNA |
| Q41 | Belle de Boskoop | Malus domestica | 10 | dsRNA |
| Q9 | Joseph Musch | Malus domestica | Q9 | RNeasy Plant Mini Kit |
| dsRNA mix | - | - | 1-10 | dsRNA |
| VANA mix | - | - | 1-10 | VANA |

4.2 RNA EXTRACTION METHODS

During this project, three different RNA extraction methods were used: (i) RNeasy Plant Mini Kit (QIAGEN), (ii) double-stranded RNA (dsRNA), and (iii) VANA. The specific method of extraction was chosen depending on the purpose of the extraction, for example, RNeasy Plant Mini Kit was used as a quick and efficient extraction method for a preliminary analysis of sample Q9.

Table 6. Nucleotide sequences of the multiplex identifiers (MIDs) used with double-stranded RNA (dsRNA) and virion-associated nucleic acids (VANA) protocols. The MID sequences used with dsRNA extraction protocol (MID-GENCO1-10) were designed by the laboratory of plant pathology in INRA Aquitaine (Bordeaux, France). The MID with name “Tag761_4_LDF_093” was designed by the plant virology group from Gembloux Agro-Bio Tech.

| Name | Nucleotide sequence (5’ - 3’) |
|---|---|
| MID-GENCO1 | AACCGCAATGTGTTGGGTGTGTTTGG |
| MID-GENCO2 | AACGACGTTGTGTTGGGTGTGTTTGG |
| MID-GENCO3 | AACTAGTATGTGTTGGGTGTGTTTGG |
| MID-GENCO4 | AAGAACCATGTGTTGGGTGTGTTTGG |
| MID-GENCO5 | AAGAGAGTTGTGTTGGGTGTGTTTGG |
| MID-GENCO6 | AGAGTCTTTGTGTTGGGTGTGTTTGG |
| MID-GENCO7 | AGATGGTCTGTGTTGGGTGTGTTTGG |
| MID-GENCO8 | AGGCGCCTTGTGTTGGGTGTGTTTGG |
| MID-GENCO9 | AGGCTTGGTGTGTTGGGTGTGTTTGG |
| MID-GENCO10 | AGTTCCGCTGTGTTGGGTGTGTTTGG |
| Tag761_4_LDF_093 | GTTGTATGCTTCCTCTGATCGGGC |
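To illustrate how the MIDs of Table 6 allow reads to be traced back to their sample, the following minimal Python sketch assigns each read to a MID by exact prefix match and trims the tag. It is only an illustration and not the demultiplexing procedure used in this work: the function names are hypothetical, mismatches and reverse-complemented tags are ignored, and the example read is invented. Note that the ten MID-GENCO tags share the same 18 nucleotides at their 3’ end and differ only in their first eight nucleotides.

```python
# MID sequences (5' -> 3') copied from Table 6, dsRNA protocol.
MIDS = {
    "MID-GENCO1": "AACCGCAATGTGTTGGGTGTGTTTGG",
    "MID-GENCO2": "AACGACGTTGTGTTGGGTGTGTTTGG",
    "MID-GENCO3": "AACTAGTATGTGTTGGGTGTGTTTGG",
    "MID-GENCO4": "AAGAACCATGTGTTGGGTGTGTTTGG",
    "MID-GENCO5": "AAGAGAGTTGTGTTGGGTGTGTTTGG",
    "MID-GENCO6": "AGAGTCTTTGTGTTGGGTGTGTTTGG",
    "MID-GENCO7": "AGATGGTCTGTGTTGGGTGTGTTTGG",
    "MID-GENCO8": "AGGCGCCTTGTGTTGGGTGTGTTTGG",
    "MID-GENCO9": "AGGCTTGGTGTGTTGGGTGTGTTTGG",
    "MID-GENCO10": "AGTTCCGCTGTGTTGGGTGTGTTTGG",
}


def assign_mid(read, mids=MIDS):
    """Return the name of the MID that starts the read, or None (exact match only)."""
    read = read.upper()
    for name, tag in mids.items():
        if read.startswith(tag):
            return name
    return None


def demultiplex(reads, mids=MIDS):
    """Group reads by MID and trim the tag; unmatched reads are stored under None."""
    bins = {}
    for read in reads:
        name = assign_mid(read, mids)
        trimmed = read[len(mids[name]):] if name else read
        bins.setdefault(name, []).append(trimmed)
    return bins
```

For example, `demultiplex([MIDS["MID-GENCO3"] + "ACGTACGT"])` places the trimmed read `ACGTACGT` in the bin of MID-GENCO3, which corresponds to sample nº 3 if MIDs are assigned in sample order (an assumption of this example).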

4.2.1 RNeasy Plant Mini Kit (QIAGEN)

The RNeasy Plant Mini Kit is a protocol from QIAGEN (Figure 26 and Figure 27) used to purify and extract total RNA from plant cells and tissues, as well as filamentous fungi. Before starting, 4 volumes of ethanol (96 – 100%) are added to the Buffer RPE and 10 µL of β-mercaptoethanol (β-ME) are added per 1 mL of Buffer RLT.

Figure 26. Principle and procedure of the RNeasy extraction kits from QIAGEN. From left to right: RNeasy Mini Kit, RNeasy Protect Mini Kit, and RNeasy Plant Mini Kit, highlighted in red (From: RNeasy Handbook).

[Flowchart: RNA extraction (kit) → DNase treatment → Ribodepletion → Library preparation (TruSeq) → NovaSeq sequencing]

Figure 27. Flowchart of a total RNA extraction with the RNeasy Plant Mini Kit (QIAGEN). The RNA extraction and the DNAse treatment were performed in Gembloux Agro-Bio Tech; and the ribodepletion, library preparation, and sequencing were performed by GIGA in Liège.

This kit allows the purification of RNA from several samples simultaneously, theoretically in less than one hour, even though for inexperienced researchers it may take between two and three hours. Its technology is based on the combination of the selective binding properties of a silica-based membrane with the speed of microspin technology.

a) Total RNA extraction

For this protocol, 200 mg of fresh plant leaves (or 20 mg if the tissue was lyophilised) were ground in 1 mL of Buffer RLT containing 1% β-mercaptoethanol using a tissue homogeniser. The resulting mix was transferred to a QIAshredder spin column in a 2 mL collection tube and the tubes were centrifuged at full speed for two minutes. The supernatant was transferred into new microcentrifuge tubes, and 0.5 volume of ethanol was added. Then, the mixture was transferred, including any precipitate, to an RNeasy spin column in a 2 mL collection tube and centrifuged for fifteen seconds at more than 8,000 x ɡ. The flow-through was discarded and 700 µL of Buffer RW1 were added to the spin column. The tubes were centrifuged again for fifteen seconds at more than 8,000 x ɡ in order to wash the spin column membrane. The flow-through was discarded one more time and 500 µL of Buffer RPE, previously mixed with ethanol, were added to the RNeasy spin column. The tubes were centrifuged again for fifteen seconds at more than 8,000 x ɡ and the flow-through was discarded. The RPE step was repeated with a centrifugation time of two minutes. This last centrifugation dried the spin column membrane, which ensured that no ethanol was left to interfere with the RNA elution. The RNeasy spin column was placed in a new 1.5 mL collection tube, and 30 µL of RNase-free water were added to the spin column membrane. Finally, the tubes were centrifuged for one minute at more than 8,000 x ɡ in order to elute the RNA. After extraction, a DNase treatment and a NanoDrop quantification were performed on the purified RNA.

b) Library preparation and sequencing

Synthesis of complementary DNA (cDNA), library preparation, and sequencing were performed at the GIGA facilities of the University of Liège. Preparation of sequencing libraries was done with the TruSeq Small RNA Library Preparation Kit (Illumina), following the manufacturer’s instructions. After the synthesis of cDNA, the purified genomic DNA was fragmented to an optimal length, and specialized adapters were added to both ends of the fragments. These fragments with adapters were then amplified by PCR and purified with a gel. The libraries were sequenced on a NovaSeq sequencing machine with a read length of 2x150 nt.

4.2.2 Virion-associated nucleic acid (VANA)

This extraction method was developed for plants containing high levels of secondary metabolites, phenolic compounds, highly viscous polysaccharides, and endonucleases. The VANA extraction method used for this project is an optimized protocol adapted from Filloux et al. (2015) and it consists of the following steps: (a) purification of viral particles, including clarification, filtration, ultracentrifugation, and an enzymatic treatment; (b) extraction of viral nucleic acids; (c) cDNA synthesis, purification, priming, and extension by RT-PCR and Klenow fragment; and (d) library preparation and Illumina sequencing (Figure 28).

[Flowchart: Clarification → Filtration → Ultracentrifugation → Enzymatic treatment → Nucleic acids extraction → cDNA synthesis (tagged) → Amplification by PCR → Library preparation (TruSeq) → NovaSeq sequencing]

Figure 28. Flowchart of the processes and steps followed during the extraction of virion-associated nucleic acids (VANA), adapting the procedure from Filloux et al. (2015).

a) Purification of viral particles

In a filtered bag, 1 g of dried leaf material was ground in Hanks’ buffered salt solution (HBSS) with a tissue homogeniser. Because the extraction buffer (HBSS) must be prepared just before extraction, five stock solutions are kept in the laboratory. Combining these stock solutions gives HBSS with a final composition of 0.137 M NaCl, 5.4 mM KCl, 0.25 mM Na2HPO4, 0.1% (w/v) glucose, 0.44 mM KH2PO4, 1.3 mM CaCl2, 1.0 mM MgSO4, and 4.2 mM NaHCO3. The stock solutions were prepared as follows: (i) 8.0 g of NaCl, 0.4 g of KCl, and 1.0 g of glucose were dissolved in 90 mL of distilled water, and the volume was then brought to 100 mL; (ii) 0.358 g of anhydrous Na2HPO4 and 0.60 g of KH2PO4 were dissolved in 90 mL of distilled water, and the volume was then brought to 100 mL; (iii) 0.72 g of CaCl2 were dissolved in 50 mL of distilled water; (iv) 1.23 g of MgSO4·7H2O were dissolved in 50 mL of distilled water; and (v) 0.35 g of NaHCO3 were dissolved in 10 mL of distilled water. To prepare the HBSS premix from the stock solutions, the following volumes were mixed: 10 mL of stock solution 1, 1 mL of stock solution 2, 1 mL of stock solution 3, 86 mL of distilled water, and 1 mL of stock solution 4. To prepare HBSS at full strength just before extraction, 9.9 mL of HBSS premix were mixed with 0.1 mL of stock solution 5.
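The final HBSS composition can be checked against the stock recipe with a short calculation. The Python sketch below is only a consistency check under stated assumptions: the molar masses are standard values that are not given in the text, 1 mL each of stock solutions 2, 3, and 4 is assumed to enter the premix (which is what the stated final concentration of 1.3 mM CaCl2 requires), and glucose is left out because its amount is given by mass rather than as a molar concentration.

```python
# Molar masses in g/mol (standard values; an assumption, not stated in the text).
MOLAR_MASS = {
    "NaCl": 58.44, "KCl": 74.55, "Na2HPO4": 141.96, "KH2PO4": 136.09,
    "CaCl2": 110.98, "MgSO4.7H2O": 246.47, "NaHCO3": 84.01,
}

# Stock solutions from the text: solute -> (grams, final stock volume in mL).
STOCKS = {
    1: {"NaCl": (8.0, 100), "KCl": (0.4, 100)},
    2: {"Na2HPO4": (0.358, 100), "KH2PO4": (0.60, 100)},
    3: {"CaCl2": (0.72, 50)},
    4: {"MgSO4.7H2O": (1.23, 50)},
    5: {"NaHCO3": (0.35, 10)},
}

PREMIX_ML = {1: 10, 2: 1, 3: 1, 4: 1}  # assumed: 1 mL each of stocks 2, 3 and 4
WATER_ML = 86


def hbss_final_mM(premix_used_ml=9.9, stock5_ml=0.1):
    """Final concentration (mM) of each salt in full-strength HBSS."""
    premix_total = sum(PREMIX_ML.values()) + WATER_ML  # 99 mL of premix
    final_total = premix_used_ml + stock5_ml           # 10 mL of HBSS
    result = {}
    for stock, solutes in STOCKS.items():
        for name, (grams, vol_ml) in solutes.items():
            stock_mM = grams / MOLAR_MASS[name] / (vol_ml / 1000) * 1000
            if stock == 5:
                dilution = stock5_ml / final_total
            else:
                dilution = PREMIX_ML[stock] / premix_total * premix_used_ml / final_total
            result[name] = stock_mM * dilution
    return result
```

With these inputs, the computed values round to the concentrations listed in the text (about 137 mM NaCl, 5.4 mM KCl, 0.25 mM Na2HPO4, 0.44 mM KH2PO4, 1.3 mM CaCl2, 1.0 mM MgSO4, and 4.2 mM NaHCO3).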

Then, the homogenised plant extracts were centrifuged at 3,200 x ɡ for five minutes and 5 mL of supernatant were collected in new tubes. The tubes were centrifuged again at 8,228 x ɡ for three minutes. The supernatants were then filtered through a 0.45 µm sterile syringe filter and 3 mL of filtered supernatant were collected in an ultracentrifuge tube. 1 mL of 30% sucrose diluted in 0.2 M potassium phosphate pH 7.0 was added to the tubes, which were then ultracentrifuged at 148,000 x ɡ for two hours at 4 ºC to concentrate the viral particles. Then, the supernatant was removed, and the pellet was resuspended in 1 mL of HBSS at 4 ºC overnight. The resulting mix was transferred to new 1.5 mL Eppendorf tubes. The ultracentrifugation was done in the GIGA facilities in Liège.

b) Viral nucleic acids extraction

In this step, an enzymatic treatment was first performed to eliminate unencapsidated nucleic acids: 15 U of bovine pancreas DNase I and 1.9 U of bovine pancreas RNase A were added to 200 µL of the resuspended particles, and the tubes were incubated at 37 ºC for one and a half hours. Total nucleic acids were then extracted from the treated particles using a PureLink™ Viral RNA/DNA Mini Kit (Thermo Fisher Scientific), following the manufacturer’s protocol.

c) Viral complementary DNA (cDNA) synthesis, purification, priming and extension

For the viral cDNA synthesis, 100 pmol of primer were added to 10 µL of extracted viral nucleic acids and the tubes were incubated at 85 ºC for two minutes. The primers used comprised a random dodecamer with a multiplex identifier (MID), also called barcode or tag, at the 5’ end. A MID is a short sequence added between the fragmented DNA and the sequencing adapters that helps to trace the source of the read. In this case, the MID used was “Tag761_4_LDF_093” (Table 6). To this mixture, 2 µL of dithiothreitol, 2 µL of each dNTP, 4 µL of SuperScript buffer, and 1 µL of SuperScript III were added. The tubes were incubated at 25 ºC for ten minutes, 42 ºC for one hour, and 70 ºC for five minutes. The second-strand synthesis step was performed using Klenow fragment (DNA Polymerase I), following the manufacturer’s indications. Then, the tubes were placed on ice for two minutes and the cDNA was purified with the QIAquick PCR cleanup kit, following the manufacturer’s indications.

d) Library preparation and amplification

The aim of library preparation is to generate a collection of DNA fragments suited for sequencing. The mix for the PCR contained 5 µL of sample product, 5 µL of primer solution at 10 µM, and 10 µL of HotStarTaq Plus Master Mix Kit (from Qiagen). The fragments were amplified under the following conditions: one cycle at 95 ºC for five minutes; five cycles of 95 ºC for one minute, 50 ºC for one minute, and 72 ºC for one and a half minutes; thirty-five cycles of 95 ºC for thirty seconds, 50 ºC for thirty seconds, and 72 ºC for one and a half minutes, with a slope of two seconds between each cycle; and, finally, an extension at 72 ºC for ten minutes. The resulting PCR products were visualised on a gel, quantified by NanoDrop, pooled together, and cleaned before being sent to the GIGA facilities in Liège to be sequenced.
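As a planning aid, the amplification program can be written down as data and its total hold time computed. The sketch below is only an illustration: it assumes that the program consists of an initial denaturation, five cycles of three steps, thirty-five cycles of three steps, and a final extension, and it ignores ramping between temperatures, so the real run time is longer. The names `PROGRAM` and `total_hold_minutes` are hypothetical.

```python
# Each stage: number of cycles and a list of (temperature in C, hold in minutes).
PROGRAM = [
    {"cycles": 1, "steps": [(95, 5.0)]},
    {"cycles": 5, "steps": [(95, 1.0), (50, 1.0), (72, 1.5)]},
    {"cycles": 35, "steps": [(95, 0.5), (50, 0.5), (72, 1.5)]},
    {"cycles": 1, "steps": [(72, 10.0)]},
]


def total_hold_minutes(program):
    """Sum of all hold times in minutes, excluding temperature ramps."""
    return sum(
        stage["cycles"] * sum(minutes for _, minutes in stage["steps"])
        for stage in program
    )
```

Under this reading, the hold times add up to 5 + 17.5 + 87.5 + 10 = 120 minutes.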

4.2.3 Double-stranded RNA (dsRNA)

This method targets double-stranded RNAs as an alternative nucleic acid substrate for high-throughput sequencing analysis. The dsRNA extraction method used during this project is an optimized protocol from the laboratory of INRA Aquitaine, in Bordeaux [111], [113]. This extraction protocol allows the detection and recovery of both dsRNA and ssRNA viruses, given that dsRNAs are also the replicative form of viruses with ssRNA genomes. The principle is that plants produce very little dsRNA and, hence, the small quantities of dsRNA extracted result from the replication of viruses. The method relies on the affinity of cellulose powder for nucleic acids, which allows dsRNAs to be selectively purified from other nucleic acids [113]. This protocol consists of a purification of dsRNAs with two steps of cellulose batch chromatography, followed by a synthesis of complementary DNA (cDNA) and a random amplification by PCR (Figure 29).

[Flowchart: First cellulose batch chromatography → Enzymatic treatment → Second cellulose batch chromatography → Nucleic acids extraction → cDNA synthesis → Amplification by PCR → Library preparation (TruSeq) → NovaSeq sequencing]

Figure 29. Flowchart of a double-stranded RNA (dsRNA) extraction, following the procedure of Marais et al. (2018).

a) Double-stranded RNA extraction

First, fresh extraction buffer containing 1 mL of 2x STE, 70 µL of 20% SDS, 20 µL of sodium bentonite, and 1.425 mL of TE-saturated phenol was prepared just before use in a 15 mL Falcon tube. Then, 0.075 g of lyophilised (freeze-dried) material was ground with liquid nitrogen in a precooled mortar and pestle. The resulting frozen powder was transferred to the 15 mL Falcon tube containing the extraction buffer. The tubes were agitated gently at 120 – 130 rpm for thirty minutes on a horizontal agitator. Then, the tubes were centrifuged at room temperature for fifteen minutes at 3,000 x ɡ.

New 1.5 mL Eppendorf tubes were prepared containing 0.040 g of cellulose, either CF11 or CC41. After centrifugation, the aqueous phase was transferred to 1.5 mL Eppendorf tubes, which were centrifuged at 10,000 x ɡ for twenty minutes. To further purify the dsRNA from unwanted organic materials, the aqueous phase was transferred to new 1.5 mL Eppendorf tubes and absolute ethanol was added to a final concentration of 15% (v/v). To calculate the volume of absolute ethanol that needed to be added, the volume of aqueous phase was measured and multiplied by 0.176. Then, the mix was transferred to the 1.5 mL Eppendorf tubes containing 0.040 g of cellulose. The tubes were vortexed to mix the cellulose with the aqueous phase and the ethanol and agitated gently at 120 – 130 rpm for thirty minutes to one hour on the horizontal agitator. Then, the tubes were centrifuged at 5,000 x ɡ for one minute and the supernatant was removed with a pipette.
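The factor 0.176 follows from the target concentration: if a volume V of aqueous phase receives a volume E of absolute ethanol, then E / (V + E) = 0.15, so E = V × 0.15 / (1 − 0.15) ≈ 0.176 × V. The short Python sketch below reproduces this calculation; it assumes that the volumes are simply additive, and the function name is hypothetical.

```python
def ethanol_volume(aqueous_volume, final_fraction=0.15):
    """Volume of absolute ethanol to add (same unit as the input volume)
    so that ethanol makes up final_fraction (v/v) of the mixture."""
    if not 0 <= final_fraction < 1:
        raise ValueError("final_fraction must be in [0, 1)")
    return aqueous_volume * final_fraction / (1 - final_fraction)
```

For instance, 1,000 µL of aqueous phase would require about 176 µL of absolute ethanol.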

The cellulose was washed by adding 1 mL of washing solution to the tubes containing the cellulose pellet, and they were agitated gently at 120 – 130 rpm for five minutes on the horizontal agitator. The tubes were centrifuged at 5,000 x ɡ for one minute and the supernatant was removed again with a pipette. The washing step was repeated twice, until the supernatant became colourless. At the end of the last cellulose wash, the supernatant was removed, and the cellulose was dried for one minute in the speed vacuum.

For the elution step, 200 µL of 1X STE were added to the dry cellulose. The cellulose was then resuspended with the help of a pipette tip and the tubes were agitated gently at 120 – 130 rpm for five minutes on the horizontal agitator. Then, the tubes were centrifuged at 5,000 x ɡ for one minute and the supernatant was collected in new 1.5 mL Eppendorf tubes, which were kept on ice. The elution step was repeated twice and each time the supernatant was collected into the same tube stored on ice. The tubes were centrifuged at 10,000 x ɡ for one minute in order to remove any residual cellulose from the previous steps. After that, the supernatant was retained in new 1.5 mL Eppendorf tubes. Nucleic acids (dsRNAs) were precipitated by adding 1/10 volume of 3 M sodium acetate pH 5.2 and 0.8 volume of isopropanol. The tubes were stored overnight at -20 ºC, or for one hour at -80 ºC. Then, they were centrifuged at 20,000 x ɡ for twenty minutes at 4 ºC. After that, the supernatant was discarded, and the pellet was washed with 1 mL of 70% ethanol. The tubes were centrifuged at 20,000 x ɡ for twenty minutes at 4 ºC, and the supernatant was removed again. The pellets were dried in the speed vacuum for ten minutes and dissolved in 170 µL of DEPC-treated water.

b) Enzymatic treatment of dsRNAs

The extracted dsRNAs were subjected to three enzymatic treatments: (i) DNase treatment, where 20 μL of 1 M magnesium acetate and 10 μL of DNase RQ1 (1 U/μL) were added to the mixture and the tubes were incubated for one hour at 37 ºC; (ii) RNase A treatment in salt conditions, where 60 μL of 10X SSC, 1 μL of RNase A (10 μg/μL), and 39 μL of DEPC-treated water were added to the mixture, and the tubes were incubated for thirty minutes at 37 ºC; and (iii) Proteinase K treatment, where 2.5 μL of 2% SDS and 8 μL of proteinase K (5 mg/mL) were added to the mixture, and the tubes were incubated for one hour at 37 ºC.

c) Phenol/Chloroform extraction and ethanol precipitation

After the enzymatic treatment, the dsRNAs were purified with phenol/chloroform in order to separate nucleic acids from residual proteins. First, one volume (300 µL) of phenol:chloroform:isoamyl alcohol (25:24:1) was added to the tubes. Then, the tubes were centrifuged for five minutes at more than 14,000 x ɡ and the supernatant was retained in new 1.5 mL Eppendorf tubes. The same procedure was repeated but, instead of phenol:chloroform:isoamyl alcohol, one volume of chloroform:isoamyl alcohol (24:1) was added. The supernatant was retained again in new 1.5 mL Eppendorf tubes, and 1/10 volume of 3 M sodium acetate pH 5.2 and 2 volumes of absolute ethanol were added. The tubes were stored at -80 ºC for one hour. After that, the tubes were centrifuged at 20,000 x ɡ for twenty minutes at 4 ºC and the supernatant was discarded. The pellet was washed with 1 mL of 70% ethanol, and the tubes were centrifuged again for fifteen minutes. The supernatant was discarded, and the pellet was dried in the speed vacuum for ten minutes. Then, the pellet was dissolved in 250 µL of DEPC-treated water.

d) Second round of cellulose batch chromatography

A second round of cellulose chromatography was done to enrich the viral dsRNAs. First, 40 µL of absolute ethanol were added to the 250 µL of dsRNA and the mixture was transferred to a new Eppendorf tube containing 0.040 g of cellulose. The tubes were agitated at 120 – 130 rpm for thirty minutes on a horizontal agitator. The same procedure from step “(a) double-stranded RNA extraction” was repeated. In the last step, the pellet was dissolved in 20 µL of DEPC-treated water. To observe the dsRNAs in a gel, 3 µL of treated dsRNAs previously mixed with 1/6 volume of loading buffer were loaded on a 0.8% agarose gel containing 1% of SYBR Green in 1x TBE buffer. The gel was run for one hour at 80 V and the nucleic acids were visualized on a UV transilluminator. However, bands were not systematically visible, as the concentration of dsRNAs may be limiting.

e) Complementary DNA (cDNA) synthesis and random amplification (PCR)

Complementary DNA synthesis is the most critical step of the dsRNA extraction because of the high risk of contamination. All the procedures must be performed in a room with material only used for cDNA and PCR (pipettes, freezer, tips, coats, etc.). Additionally, if the samples are going to be sent for sequencing, PCR tubes and mixtures are also kept apart and differentiated from others.

For cDNA synthesis, 3 µL of purified dsRNAs, together with 2.4 µL of sterile water, were denatured by heating at 99 ºC for five minutes. The products were kept on ice while the following mix was prepared: 0.4 µL of dNTP 25 mM, 0.4 µL of PcDNA12 100 µM, and 5.8 µL of sterile water. The mix was added to the tubes and the products were incubated for five minutes at 95 ºC. Then, the following mix was prepared and added to the products after the previous incubation step: 4 µL of 5X Buffer, 2 µL of DTT, 40 U of Ribolock, and 200 U of Superscript II Reverse Transcriptase. Ribolock inhibits the activity of RNases by binding them in a non-competitive mode at a 1:1 ratio, and Superscript II Reverse Transcriptase is used to synthesize first-strand cDNA and can generate cDNA of up to 12.3 kb in length. Later, the tubes were incubated at 25 ºC for ten minutes, then at 42 ºC for one hour, followed by the inactivation of the RT by incubating at 70 ºC for ten minutes. Finally, 1.5 U of RNase H were added to the products to remove any RNA complementary to the cDNA, in order to use the cDNA as a template for amplification by PCR.

The random amplification that followed converted the cDNA to double-stranded cDNA while incorporating the tagging MID adaptors (primers MID-GENCO1-10, Table 6). This allowed multiplexed sequencing, which reduced sequencing costs. For this procedure, 5 µL of primer MID-GENCO 10 µM were added to 5 µL of cDNA. To this mixture, 5 µL of 10X Buffer, 0.5 µL of dNTP 25 mM, 0.25 µL of Dream Taq DNA Polymerase (5 U/µL), and 34.25 µL of sterile water were added to obtain a 50 µL reaction volume. The amplification conditions were: 94 ºC for one minute; 65 ºC for zero seconds; 72 ºC for 45 seconds, with a slope of 5 ºC per second; followed by forty cycles of 94 ºC for zero seconds, 45 ºC for zero seconds, and 72 ºC for five minutes, with a slope of 5 ºC per second; and final steps of five minutes at 72 ºC and five minutes at 37 ºC.
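The volumes listed for the random amplification can be checked with the dilution rule (final concentration = stock concentration × volume added / total volume). The following is a minimal sketch; the dictionary keys and the helper name `final_concentration` are illustrative and not part of the protocol.

```python
# Volumes (µL) of the random-amplification mix, as listed in the text.
volumes_ul = {
    "primer MID-GENCO (10 µM)": 5.0,
    "cDNA": 5.0,
    "10X buffer": 5.0,
    "dNTP (25 mM)": 0.5,
    "Dream Taq (5 U/µL)": 0.25,
    "sterile water": 34.25,
}


def final_concentration(stock, volume_ul, total_ul):
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * volume_ul / total_ul


total_ul = sum(volumes_ul.values())                      # reaction volume in µL
primer_um = final_concentration(10.0, 5.0, total_ul)     # µM
dntp_mm = final_concentration(25.0, 0.5, total_ul)       # mM
buffer_x = final_concentration(10.0, 5.0, total_ul)      # X
taq_units = 5.0 * 0.25                                   # U per reaction

print(f"total volume: {total_ul} µL")
print(f"primer: {primer_um} µM, dNTP: {dntp_mm} mM, "
      f"buffer: {buffer_x}X, Taq: {taq_units} U")
```

Under these volumes the components add up to the stated 50 µL, with the primer at 1 µM, the dNTPs at 0.25 mM, the buffer at 1X, and 1.25 U of polymerase per reaction.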

To visualize the PCR products, 10 µL of the PCR products mixed with 1/6 volume of loading buffer were loaded on a 1.5% agarose gel containing 10 µg/mL of ethidium bromide in 1x TBE buffer, and were run for thirty minutes at 100 V. Normally, smears corresponding to PCR products from 100 to 1000 bp are visible, rather than clear bands.

4.3 VIRUS DETECTION PROTOCOL BY RT-PCR

For the detection of viruses by reverse transcription polymerase chain reaction (RT-PCR), specific primers were designed for AHVd-like RNA and ALV-1. These primers were designed using the primer design tool from Geneious Prime ® 2020.0.5, according to the following rules: (i) primer length between eighteen and twenty-five nucleotides, which is long enough for adequate specificity and short enough for the primer to bind easily to the template at the annealing temperature; (ii) primer melting temperature (Tm) between 50ºC and 65ºC, the Tm being the temperature at which half of the DNA duplex dissociates into single-stranded DNA; (iii) GC content between 40% and 60%; and (iv) no formation of primer secondary structures, either self-dimers or cross-dimers, which can result in a poor yield of the product. Other parameters to take into consideration when designing a pair of primers are amplicon length, product position, optimum annealing temperature (Ta), and the Tm mismatch of the primer pair, with a maximum difference of 5ºC between the Tm of the two primers.
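Rules (i) to (iii) and the primer-pair Tm difference lend themselves to a simple automated check. The sketch below is not the Geneious primer design tool: the function names are hypothetical, the Tm is taken as an input rather than computed, and rule (iv) on secondary structures is not covered.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def check_primer(seq, tm_c, len_range=(18, 25),
                 tm_range=(50.0, 65.0), gc_range=(40.0, 60.0)):
    """Apply rules (i)-(iii): length, Tm, and GC content windows."""
    return {
        "length_ok": len_range[0] <= len(seq) <= len_range[1],
        "tm_ok": tm_range[0] <= tm_c <= tm_range[1],
        "gc_ok": gc_range[0] <= gc_percent(seq) <= gc_range[1],
    }


def pair_tm_ok(tm_forward, tm_reverse, max_diff=5.0):
    """Primer pair rule: the two Tm values differ by at most 5 ºC."""
    return abs(tm_forward - tm_reverse) <= max_diff


# ALV-4361F with the Tm reported in Table 7.
report = check_primer("GTGTTTGATGAGCGTGATGG", tm_c=60.0)
print(report)
# Tm values listed in Table 7 for AHVd-88F and AHVd-331R.
print(pair_tm_ok(51.3, 55.5))
```

A check of this kind only filters candidates; the final choice still depends on amplicon position and on an inspection for dimers and hairpins.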

Table 7. Primers used for detection of Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), and Apple hammerhead viroid-like RNA (AHVd-like RNA) by an RT-PCR. Primers for ARWV-1 were taken from the publication of Rott et al. (2018), and two sets of primers for AHVd-like RNA were taken from the publication of Serra et al. (2018). The other sets of primers were designed with Geneious Prime ® 2020.0.5.

| Primers | Sequence (5’-3’) | 5’ Position | Tm (ºC) | Amplicon size (bp) |
|---|---|---|---|---|
| ARWaV-1L3639F¹ | AGAACCAGCAATAGCCAC | 3639 | 55.0 | 419 |
| ARWaV-1L4058R¹ | CTATCCTTATCTTTGCCTACTT | 4058 | | |
| AHVd-67R | AGAACCGGGAGTCAGGAGAG | 67 | 64.0 | 314 |
| AHVd-381F | TCTCCTGACTCCCGGTTCTG | 49 | | |
| AHVd-400R² | ACACACCGCCTTAGATCAGCT | 400 | 61.5 | 353 |
| AHVd-47F² | GCTGATCTAAGGCGGTGTGT | 381 | 62.0 | |
| AHVd-88F² | AGTTACTTCCGGTAACTAGGAGTTTG | 88 | 51.3 | 243 |
| AHVd-331R² | GAGGGATRTGAAGGGCGAGAGAG | 331 | 55.5 | |
| ALV-5773F | GTGGTTGTTTTCGGGGAAGC | 5773 | 62.0 | 394 |
| ALV-6167R | TGGAGTTCAGACGTGTGCTC | 6167 | | |
| ALV-4361F | GTGTTTGATGAGCGTGATGG | 4361 | 60.0 | 206 |
| ALV-4567R | AGCAGGTTCCGGTTTAGGTT | 4567 | | |

¹ Rott et al., 2018; ² Serra et al., 2018.

An RT-PCR transcribes extracted RNA into complementary DNA (cDNA), which is then amplified exponentially by PCR. For the reverse transcription (RT) and cDNA synthesis, 5 µL of mastermix (4.50 µL of water and 0.50 µL of hexamers, per reaction) were added to 1 µL of extracted RNA, and the tubes were incubated at 65ºC for five minutes and then immediately kept on ice for one minute. Then, 4 µL of mastermix (0.50 µL of dNTPs 10 mM, 0.50 µL of DTT 100 mM, 0.50 µL of RNAse OUT 40 U/µL, 2 µL of 5x Buffer, and 0.5 µL of SuperScript III 200 U/µL) were added to each tube. Dithiothreitol (DTT) protects the enzymatic activity of the reverse transcriptase, RNAse OUT inhibits the activity of RNases, and SuperScript III Reverse Transcriptase allows the production of cDNA from 100 bp to 12 kb. The tubes were then incubated at 25ºC for five minutes to allow the RNAse OUT to inhibit the RNases, at 50ºC for forty-five minutes for the reverse transcription by SuperScript III, and at 70ºC for fifteen minutes to stop the reaction.

For the PCR, 2 µL of cDNA were added to 18 µL of mastermix (11.60 µL of water, 4 µL of 5x Mango Taq coloured buffer, 0.40 µL of dNTPs 10 mM, 0.40 µL of forward primer, 0.40 µL of reverse primer, 0.80 µL of MgCl2 50 mM, and 0.40 µL of Mango Taq 5 U/µL). The tubes were incubated at 94ºC for one minute to separate the two strands of DNA; then for forty cycles of 94ºC for twenty seconds, 50ºC for twenty seconds, and 72ºC for thirty seconds, to allow the primers to bind to the DNA and start its replication; and finally at 72ºC for three minutes as a final extension step. The primer annealing temperature was modified depending on the primers used.

The results were observed by gel electrophoresis. A 1% agarose gel was prepared in TAE buffer (1 g of agarose powder per 100 mL of TAE) and 10 µL of GelRed (Biotium) were added to the mix. The PCR products were run on the agarose gel at 100 V for 40 minutes, and the gel was observed on a UV transilluminator. The ladder used on all gels produced during this project was the GeneRuler 100 bp DNA Ladder.
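When several samples are tested at once, the per-reaction PCR mastermix volumes given above are multiplied by the number of reactions. A minimal sketch of that scaling follows; the 10% pipetting overage and the function name `scale_master_mix` are assumptions added for illustration and are not stated in the protocol.

```python
# Per-reaction PCR mastermix volumes (µL), as listed in the text.
PCR_MIX_UL = {
    "water": 11.60,
    "5x Mango Taq coloured buffer": 4.00,
    "dNTPs 10 mM": 0.40,
    "forward primer": 0.40,
    "reverse primer": 0.40,
    "MgCl2 50 mM": 0.80,
    "Mango Taq 5 U/µL": 0.40,
}


def scale_master_mix(per_reaction_ul, n_reactions, overage=0.10):
    """Scale each component for n reactions plus a pipetting overage.

    The overage fraction is an assumed convenience, not a protocol value.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in per_reaction_ul.items()}


per_reaction_total = sum(PCR_MIX_UL.values())   # 18 µL of mastermix per tube
scaled = scale_master_mix(PCR_MIX_UL, n_reactions=10)
for name, vol in scaled.items():
    print(f"{name}: {vol} µL")
```

Each tube then receives 18 µL of the scaled mix plus 2 µL of cDNA, giving the 20 µL reaction described above.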

4.4 BIOINFORMATIC ANALYSES

Figure 30. General workflow used to process the reads and identify the viruses infecting the selected apple and pear trees, once the sequenced reads are obtained. 1: performed on Geneious; 2: Linux-based programs.

Figure 31. Classification algorithm used by Kraken to classify a sequence. Each k-mer in the sequence is mapped to the lowest common ancestor (LCA) of the genomes that contain that k-mer in the pre-computed database. Then, the taxa that have been associated with k-mers from the sequence, and the ancestors of those taxa, form a pruned subtree that is used for classification. The root-to-leaf (RTL) path with the maximal score in the classification tree is chosen as the taxonomic assignation of the sequence [114].

Figure 32. Classification algorithm used by Kaiju to assign sequencing reads to a taxon. A sequencing read is first translated into the six possible reading frames, and the translations are then split into fragments at stop codons. Fragments are sorted by their length (MEM) or by their score (Greedy), and are then searched against the reference protein database using the Burrows-Wheeler Transformation (BWT) [109].
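The first steps described for Kaiju (six-frame translation, splitting at stop codons, sorting fragments by length) can be illustrated with a short, self-contained sketch. It is not Kaiju's implementation: the function names are hypothetical, the standard genetic code is assumed, and the subsequent BWT search against the protein database is not reproduced.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_fragments(read, min_len=1):
    """Translate a read in six frames and split each frame at stop codons.

    Fragments are returned longest first, as in the length-based (MEM) mode.
    """
    read = read.upper()
    fragments = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            protein = translate(strand[offset:])
            fragments.extend(f for f in protein.split("*") if len(f) >= min_len)
    return sorted(fragments, key=len, reverse=True)


fragments = six_frame_fragments("ATGAAATAGGGC")
print(fragments)
```

In the real program only sufficiently long fragments are kept for the database search; the `min_len` parameter stands in for that filter here.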

The majority of bioinformatic analyses were performed on Geneious Prime ® 2020.0.5, which is a software with a graphical interface. Once the sequenced reads were received, they were curated with a quality control program (FastQC), which included a quality control of the initial reads, read trimming to remove adapter sequences, filtering of low quality reads, and trimming of low quality bases from the reads. Then, the two pools of samples that were pooled before sequencing were demultiplexed by MIDs using a script written by PhD student François Maclot. The MIDs were then removed, and a second quality control of the reads was performed. After the quality control, the reads were assembled into contigs with SPAdes Genome Assembler [115] (Annex II, Figure 69) and the assembled contigs were identified with tblastx and blastn [116], using the Virus RefSeq Database downloaded from NCBI (https://www.ncbi.nlm.nih.gov/). Virus RefSeq includes a reference genome sequence from each virus species. The version “Virus RefSeq 15 11 2019” was downloaded in November 2019 for these analyses. BLAST is a very accurate alignment method but it is slow, allows mismatches, and requires long queries in order to be efficient [117]. In the past few years, new programs for taxonomic assignation have been developed, such as Kraken [110] and Kaiju [109]. Kraken and Kaiju give a taxonomic assignation of the sequencing reads without assembly. On the one hand, Kaiju is a metagenome classification program that works with translated nucleotide sequences and identifies matches at the protein level using the Burrows-Wheeler transformation on each individual read. On the other hand, Kraken is a k-mer-based classification program that associates k-mers with the lowest common ancestor taxa (Figure 31). The versions used were Kraken2 v2.0.8-beta and Kaiju v1.7.2. Kraken was run on the RefSeq Database from NCBI downloaded on the 17th of December 2019, including viruses, fungi, plants and viroids. Kaiju was run on the RefSeq Database of non-redundant proteins downloaded on the 10th of January 2020. Both databases were manually curated. As seen in Figure 30, three different pipelines were used for the taxonomic assignation (Kaiju, Kraken, and BLAST) and the results were compared between individual and pooled samples, and between extraction protocols (dsRNA and VANA).
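The k-mer and lowest common ancestor (LCA) principle used by Kraken (Figure 31) can be sketched on a toy example. The taxonomy, the k-mer database, and the value of k below are hypothetical and far smaller than in a real database, and ties between paths are not handled as in the real program; this is an illustration of the principle, not Kraken's implementation.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent); "root" has no parent.
PARENT = {
    "Viruses": "root",
    "Betaflexiviridae": "Viruses",
    "Foveavirus": "Betaflexiviridae",
    "ASPV": "Foveavirus",
    "AGCaV": "Foveavirus",
    "Capillovirus": "Betaflexiviridae",
    "ASGV": "Capillovirus",
}


def path_to_root(taxon):
    """List of taxa from the given taxon up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def classify(sequence, kmer_to_lca, k):
    """Assign a sequence to the taxon ending the best root-to-leaf path.

    Each k-mer found in the database counts as one hit for its LCA taxon;
    a path is scored by summing the hits of all taxa along it.
    """
    kmers = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    hits = Counter(kmer_to_lca[km] for km in kmers if km in kmer_to_lca)
    if not hits:
        return None  # unclassified: no k-mer of the sequence is in the database
    scores = {t: sum(hits[a] for a in path_to_root(t)) for t in hits}
    return max(scores, key=scores.get)


# Hypothetical pre-computed database: k-mer -> LCA of the genomes containing it.
toy_db = {"ACGT": "ASPV", "CGTA": "Foveavirus", "GTAC": "ASPV", "TACG": "AGCaV"}
print(classify("ACGTACG", toy_db, k=4))
```

In this toy case the path ending at ASPV collects three hits (two specific to ASPV, one shared at genus level) against two for AGCaV, so the read is assigned to ASPV.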

For Kaiju (see https://github.com/bioinformatics-centre/kaiju/blob/master/README.md) and Kraken (see http://ccb.jhu.edu/software/kraken/MANUAL.html), the following scripts were used on Linux:

#Kaiju (paired-end reads given with -i and -j, verbose output with -v)
srun --time=5000 --mem-per-cpu=120000 -J kaiju_complete16 -n 1 -c 1 \
  -o kaiju_complete16.out -e kaiju_complete16.err \
  bash -c "kaiju -t /CECI/proj/virusid/DB/kaiju_db/nodes.dmp \
    -f /CECI/proj/virusid/DB/kaiju_db/nr_euk/kaiju_db_nr_euk.fmi \
    -i /CECI/proj/virusid/virus_analysis/data/Nuria/Tag016-R1_Trim.fastq \
    -j /CECI/proj/virusid/virus_analysis/data/Nuria/Tag016-R2_Trim.fastq -v"

#Kraken (paired-end reads, quick mode, three threads)
srun --time=2000 --mem-per-cpu=25000 -J 103kraken2_fast -n 1 -c 3 \
  -o 103kraken2_fast.out -e 103kraken2_fast.err \
  bash -c "kraken2 --db /CECI/proj/virusid/DB/kraken_db_fast/plant/ --quick \
    --report 103kraken2fast_report.out --threads 3 --paired \
    /CECI/proj/virusid/virus_analysis/data/Nuria/Tag103-R1_Trim.fastq \
    /CECI/proj/virusid/virus_analysis/data/Nuria/Tag103-R2_Trim.fastq"

Table 8. GenBank accession numbers of the sequences used as reference genomes during the bioinformatic analysis.

| Virus species | Reference code (GenBank) |
|---|---|
| Apple green crinkle associated virus (AGCaV) | NC_018714.1 |
| Apple hammerhead viroid-like RNA (AHVd-like RNA) | NC_028132.1 |
| Apple luteovirus-1 (ALV-1) | MF120198.1 |
| Apple rubbery wood virus-1 (ARWV-1) segment L | MK936227.1 |
| Apple rubbery wood virus-1 (ARWV-1) segment M | MK936226.1 |
| Apple rubbery wood virus-1 (ARWV-1) segment S | MK936225.1 |
| Apple stem grooving virus (ASGV) | NC_001749.2 |
| Apple stem pitting virus (ASPV) | NC_003462.2 |
| Apricot latent virus (ApLV) | NC_014821.1 |

Once the reads or contigs were assigned to a virus species or taxonomic group, the presence of the virus was confirmed by mapping the reads to the reference genome of the suspected virus (Annex II, Figure 70), which was downloaded from NCBI (https://www.ncbi.nlm.nih.gov/nucleotide/). The contigs obtained after a de novo assembly, either with SPAdes (Annex II, Figure 69) or Geneious assembler (Annex II, Figure 72), were mapped to the reference genomes allowing 20% of mismatches in order to know which part of the genome was covered (Annex II, Figure 77). The consensus sequences obtained for each virus after the mapping were then used to perform a phylogenetic analysis (Annex II, Figure 82). The consensus sequences were aligned with other reference sequences from the same virus species to identify the percentage of identity between the different regions, with a multiple alignment using MUSCLE (Annex II, Figure 71). For example, in the case of Apple stem pitting virus the multiple alignment was not performed with the whole consensus sequences, but with the ORF1 (RdRp) and ORF5 (CP), which are the regions used to determine the taxonomy within their family. Multiple alignments were performed with MUSCLE [118] and phylogenetic trees were constructed using the algorithm.
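The percentage of identity between two sequences taken from an alignment can be computed as the share of matching positions. The sketch below assumes that columns containing a gap in either sequence are ignored; Geneious may count gaps differently, so values can deviate slightly from those reported in this work. The function name and the example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Assumption: columns with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: 9 compared columns, 8 of them identical.
identity = percent_identity("ACGT-ACGTA", "ACGTTACCTA")
print(f"{identity:.1f}% identity")
# A mapping that allows 20% of mismatches corresponds to >= 80% identity.
print(identity >= 80.0)
```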


5 RESULTS

5.1 IMPLEMENTATION OF DOUBLE-STRANDED RNA (DSRNA) PROTOCOL IN GEMBLOUX

The double-stranded RNA (dsRNA) extraction protocol from INRA Aquitaine (Bordeaux, France) was adapted for use in the Laboratory of Phytopathology in Gembloux. To ensure that this extraction protocol was successfully adapted and performed, a gel was run after the enzymatic treatment of the extracted dsRNAs (Figure 33) and on the PCR products (Figure 34). In a successful dsRNA extraction, a smear rather than clear bands is expected in the migration of the PCR products (Figure 34). It is not necessary to run a gel of the dsRNAs after enzymatic treatment because in most cases there will not be any bands or smear (Figure 33). Thus, the dsRNA extraction that was performed in the laboratory was successful. Nevertheless, this does not indicate the presence of viral RNA; it only confirms that the extraction worked.

Figure 33. Migration of dsRNAs with and without (NT, non-treated) enzymatic treatment. Samples 11 and 12 correspond to samples 8 and 9 extracted with the dsRNA extraction protocol for grapevine.

Figure 34. Migration of PCR products after cDNA synthesis and random amplification, visualized in a UV transilluminator. From left to right: lanes 1-10 are individual samples, lane 11 is a mix of individual samples before amplification, and lane 12 is the ladder.

5.2 SEQUENCING STATISTICS

The viruses identified in the different samples and their number of sequencing reads were compared between two extraction protocols (dsRNA and VANA) and between individual and pooled samples. For this part of the project, the studied samples were individual samples 1-10 (Table 5), the pool of samples 1-10 extracted with the dsRNA protocol, and the pool of samples 1-10 extracted with the VANA protocol. Seven viruses and one viroid were detected during the bioinformatic analysis in the sequenced samples (Table 9). Six of these viruses and the viroid are naturally found in apple and pear trees. However, apple and pear trees are not known to be natural hosts of Apricot latent virus (ApLV).


Foveaviruses were the most prevalent in all the individual samples and pools. The foveaviruses detected were Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV). Apple stem grooving virus (ASGV) was detected in samples 7 and 8, and in the tree Q9 (Table 5). Apple luteovirus-1 (ALV-1) was detected in nine samples, although the low number of reads observed for ALV-1 in eight of the nine positive samples warrants further investigation, as there is a risk of contamination from sample 9. For Apple hammerhead viroid-like RNA (AHVd-like RNA), there was a very low number of reads in all ten samples. Comparing the detection of viruses between programs of taxonomic assignation, Kaiju did not detect AHVd-like RNA in any of the ten individual samples extracted with the double-stranded RNA (dsRNA) protocol, and it did not detect ASGV in samples 7 and 8. It is expected that Kaiju did not detect AHVd-like RNA, given that it works at the protein level, but the results of Apple chlorotic leaf spot virus (ACLSV) in sample 4 contradict the detection of ASGV in samples 7 and 8. Kraken did not detect Apple rubbery wood virus-1 (ARWV-1) in any of the samples extracted with dsRNA or in the pooled samples extracted with the VANA protocol, but it detected it in sample Q9 (Table 9).


Table 9. Comparison of virus detectability with different identification methods (BLAST, Kaiju, and Kraken) and virus enrichment with different RNA extraction methods (dsRNA and VANA). Columns of Kaiju and Kraken indicate the number of reads per million assigned to a certain virus species and BLAST is indicated as presence (+) or absence (-). Abbreviations: Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), Apricot latent virus (ApLV), Apple rubbery wood virus-1 (ARWV-1), Apple luteovirus-1 (ALV-1), Apple hammerhead viroid-like RNA (AHVd-like RNA), Apple chlorotic leaf spot virus (ACLSV), Apple stem grooving virus (ASGV), double-stranded RNA (dsRNA), virion-associated nucleic acid (VANA).

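Each virus column lists the values in the order Kaiju / Kraken / Blast.

| Sample | Total number of reads | ASPV | AGCaV | ApLV | ARWV-1 | ALV-1 | AHVd-like RNA | ACLSV | ASGV |
|---|---|---|---|---|---|---|---|---|---|
| dsRNA mix | 598216 | 82983 / 19431 / + | 2312 / 9400 / + | 1237 / 2469 / + | 12 / - / + | 142 / 149 / + | - / 2 / + | 13667 / 1660 / + | - / - / - |
| VANA mix | 607411 | 12247 / 1920 / + | 196 / 1182 / + | 212 / 454 / + | 2 / - / - | 16 / - / - | - / 2 / - | 5321 / 519 / + | - / - / - |
| 1 | 326068 | 162078 / 63229 / + | - / 11994 / + | 938 / 2748 / - | 11 / - / - | 23 / - / + | - / 2 / - | - / - / + | - / - / - |
| 2 | 282024 | 5684 / 1596 / + | 222 / 894 / + | - / 273 / + | 12 / - / - | 12 / - / + | - / 2 / - | 610 / 53 / + | - / - / - |
| 3 | 601025 | 36641 / 9041 / + | 2928 / 5564 / + | 1145 / 1245 / + | 12 / - / + | 10 / - / + | - / 2 / - | - / - / + | - / - / - |
| 4 | 294925 | 4523 / 1305 / + | 139 / 546 / + | 17 / 186 / + | 24 / - / - | 20 / - / + | - / 3 / - | 502 / - / + | - / - / - |
| 5 | 197733 | 432710 / 83739 / + | 6114 / 37616 / + | 6807 / 27841 / + | 35 / - / + | 56 / - / + | - / 5 / - | - / - / + | - / - / - |
| 6 | 450969 | 105497 / 23024 / + | 4309 / 14252 / + | 1424 / 3049 / + | 16 / - / + | 4 / - / + | - / 2 / - | 13302 / 1419 / + | - / - / - |
| 7 | 470828 | 82614 / 15488 / + | 1924 / 14651 / + | 692 / 3513 / + | 15 / - / + | 6 / - / + | - / 2 / - | 59986 / 4719 / + | - / 15 / + |
| 8 | 95899 | 202098 / 39010 / + | 3744 / 24786 / + | 4119 / 6403 / + | 73 / - / + | 21 / 21 / + | - / 10 / + | 34203 / 3055 / + | - / 21 / + |
| 9 | 289694 | 63722 / 15765 / + | 1863 / 8840 / + | 800 / 1205 / + | 24 / - / - | 6040 / 5868 / + | - / 3 / + | 21257 / 2727 / + | - / - / - |
| 10 | 397388 | 90923 / 22968 / + | 2693 / 11231 / + | 1017 / 2584 / + | 18 / - / + | - / 8 / + | - / 3 / + | 253 / 2413 / + | - / - / - |
| Q9 | 5804011 | 433 / 511 / + | 4 / 20 / + | 4 / 13 / + | 272 / 446 / + | - / - / - | - / - / - | 220 / 192 / + | 342 / 347 / + |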
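The Kaiju and Kraken values in Table 9 are expressed as reads per million, which makes samples with different sequencing depths comparable. A minimal sketch of this normalization follows. It assumes that the denominator is the total number of reads of the sample as listed in the table; the function names and the raw count in the first example are hypothetical.

```python
def reads_per_million(assigned_reads, total_reads):
    """Normalize a read count to reads per million sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return assigned_reads * 1_000_000 / total_reads


def approximate_read_count(rpm, total_reads):
    """Back-calculate an approximate raw count from a reads-per-million value."""
    return round(rpm * total_reads / 1_000_000)


# Hypothetical example: 150 reads assigned out of 300,000 sequenced reads.
print(reads_per_million(150, 300_000))
# Sample 9, ALV-1 with Kaiju (Table 9): 6040 reads per million of 289,694 reads.
print(approximate_read_count(6040, 289_694))
```

Under that assumption, the ALV-1 value of sample 9 would correspond to roughly 1,750 raw reads.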


5.3 IDENTIFIED VIRUSES

5.3.1 Foveaviruses and apple stem pitting virus (ASPV)

In sample Q9 (Table 5), three foveaviruses were detected: Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), and Apricot latent virus (ApLV). ApLV was detected in a host in which it had never been reported. An in-depth analysis was therefore carried out on the contigs from foveaviruses (Table 10).

Table 10. Percentage of identity (%) of the contigs produced after de novo assembly, obtained with blastx and blastn, comparing the contigs to Apple green crinkle associated virus (AGCaV), Apple stem pitting virus (ASPV), and Apricot latent virus (ApLV).

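| Contig name | Initial annotation | Contig length (nt) | AGCaV blastx | AGCaV blastn | ASPV blastx | ASPV blastn | ApLV blastx | ApLV blastn |
|---|---|---|---|---|---|---|---|---|
| Contig 32 | AGCaV | 8263 | 91% | 79% | 94% | 84% | 84% | 76% |
| Contig 36 | AGCaV | 7887 | 92% | 80% | 95% | 82% | 84% | 75% |
| Contig 48 | ASPV | 7616 | 90% | 80% | 94% | 83% | 84% | 74% |
| Contig 20111 | ApLV | 385 | 94% | 80% | 98% | 84% | 95% | 80% |
| Contig 25798 | ApLV | 302 | 93% | 75% | 97% | 82% | 91% | 77% |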

Contigs that were initially assigned to Apricot latent virus (ApLV) were considered as Apple stem pitting virus (ASPV), given their short length (385 nt and 302 nt, Table 10) and their higher percentage of identity to ASPV (contig 20111, 84% identity, Table 10). Contig 32 was identified as AGCaV by blastx with a percentage of identity of 91%, but the same sequence has a percentage of identity of 94% to ASPV (Table 10). The next step of the bioinformatic analysis was to map all reads to the reference genomes of the potential viruses infecting the tree Q9. Mapping was done on AGCaV and ASPV as reference genomes at the same time, to avoid mapping one read to both genomes, and with a threshold of 80% of identity between read and reference genome. In both mappings, most of the reference genome was covered (Figure 35 and Figure 36).

Figure 35. Mapping all reads from sample Q9 to the reference genome of Apple green crinkle associated virus (AGCaV), allowing 20% of mismatches with no iterations.


Figure 36. Mapping all reads from sample Q9 to the reference genome of Apple stem pitting virus (ASPV), allowing 20% of mismatches with no iterations.

The consensus sequence obtained from mapping all reads to the reference genome of ASPV had a percentage of identity of 77% to AGCaV and 83% to ASPV, at nucleotide level. The consensus sequence resulting from mapping all reads to the reference genome of AGCaV had a percentage of identity of 79% to AGCaV and 77% to ASPV, at nucleotide level. The RdRp of the consensus sequence obtained from the mapping to the reference genome of AGCaV had a percentage of identity of 77% to ASPV and 76% to AGCaV, at nucleotide level. The RdRp of the consensus sequence obtained from the mapping to the reference genome of ASPV had a percentage of identity of 81% to ASPV and 75% to AGCaV. The CP regions of the consensus sequences were not analysed because they had gaps. Taking into account that the demarcation criterion to differentiate between species within the family Betaflexiviridae is 72% of identity at nucleotide level or 80% of identity at amino acid level, in either the RNA-dependent RNA polymerase (RdRp) or the coat protein (CP) region, the consensus sequences obtained from the mapping to ASPV and AGCaV could be considered as belonging to either of these two foveaviruses.
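The comparison against the demarcation thresholds can be written as a small check. The sketch below uses the nucleotide identities reported above for the two whole-genome consensus sequences and treats an identity at or above the threshold as "not separable from that species"; the function name and this reading of the criterion are simplifications for illustration.

```python
NT_THRESHOLD = 72.0   # % identity at nucleotide level (Betaflexiviridae)
AA_THRESHOLD = 80.0   # % identity at amino acid level (Betaflexiviridae)


def compatible_species(identities, threshold):
    """Species whose identity to the query reaches the demarcation threshold."""
    return sorted(sp for sp, pct in identities.items() if pct >= threshold)


# Nucleotide identities of the consensus sequences reported in the text.
consensus_on_aspv = {"ASPV": 83.0, "AGCaV": 77.0}
consensus_on_agcav = {"ASPV": 77.0, "AGCaV": 79.0}

print(compatible_species(consensus_on_aspv, NT_THRESHOLD))
print(compatible_species(consensus_on_agcav, NT_THRESHOLD))
```

Both consensus sequences exceed 72% nucleotide identity to ASPV and to AGCaV, which is why the nucleotide criterion alone cannot assign them to a single species.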

Figure 37. Mapping of all extended contigs (Annex II, ) to the reference sequence of Apple stem pitting virus (ASPV) (Figure 58), with data from sample Q9 (Table 4).

We therefore pursued the investigation of these sequences in order to (i) recover the full genome and (ii) evaluate the risk of chimeric sequences between species. The first strategy to recover full genomes of the new isolates was based on the extension of the contigs by mapping all reads to the contigs with twenty-five iterations. Nevertheless, the contigs were not extended. The second strategy was a three-step process: (i) extract the reads assigned to foveaviruses by mapping all reads to the reference genome of ASPV, allowing 20% of mismatches, and then perform a de novo assembly with SPAdes; (ii) extend the contigs by mapping all reads to the contigs obtained in the previous step, allowing 5% of mismatches to avoid hybrid contigs; and (iii) assemble the extended contigs by mapping them to the reference genome of ASPV (Figure 37). This process was repeated with the reference genome of AGCaV in order to recover the full genome of the second isolate. However, this three-step process did not extend the contigs but rather cleaned the data. Because few reads were retrieved by mapping to reference sequences from foveaviruses, the third strategy used the results from Kraken to extract the foveavirus reads separately. The foveavirus reads were assembled using SPAdes and were mapped to the contigs from the previous assembly several times, increasing the number of iterations. Again, the contigs were not extended. Then, the process was repeated by mapping the viral reads to the contigs and increasing the number of iterations, which also resulted in no extension of the contigs.

Table 11. Number of contigs with greater length than 1,000 bp, maximum length (bp), total number of contigs produced with Geneious Assembler allowing different percentage of mismatches (5%, 10%, 20%).

| | 5% mismatches | 10% mismatches | 20% mismatches |
|---|---|---|---|
| ≥ 1,000 bp | 19 | 22 | 23 |
| Maximum length (bp) | 7,064 | 9,321 | 9,248 |
| Number of contigs | 461 | 421 | 336 |

Figure 38. Tree of the three longest contigs produced with the de novo Geneious Assembler allowing different percentages of mismatches (5%, 10%, 20%). The contigs clustering together had between 98% and 100% of identity, except contig 2 at 5% and 10%, which had 87% of identity.

To ensure that no artifacts were created between contigs when assembling the reads, we generated the reference contigs at three mismatch levels: 5%, 10%, and 20%. The lowest percentage (5%) is very stringent for viruses and allows the generation of intraspecies contigs; nevertheless, there is a risk of obtaining shorter contigs. The two other percentages allow the generation of longer contigs but with an increasing risk of including reads from other viral species. The contigs generated were then aligned and a phylogenetic tree was built (Table 11 and Figure 38). Contigs 1, 2, and 3 from each group clustered together, thus no artifacts were created and the contigs can be used for further analysis (Figure 38). The de novo assembly allowing 5% of mismatches produced a similar number of contigs longer than 1,000 bp and the highest number of contigs overall, although it had the lowest maximum length (7,064 bp) (Table 11). The de novo assemblies allowing 10% and 20% of mismatches had a similar number of contigs longer than 1,000 bp and a similar maximum length (9,321 bp at 10% and 9,248 bp at 20%, Table 11). However, the de novo assembly allowing 20% of mismatches produced fewer contigs than the other two settings (Table 11). Thus, the bioinformatic analysis was continued with the contigs produced by the de novo assembly allowing 10% of mismatches.
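The three statistics compared in Table 11 can be derived from the list of contig lengths of an assembly. The sketch below shows one way to compute them; the function name and the example lengths are hypothetical and do not come from the assemblies of this work.

```python
def assembly_summary(contig_lengths, min_len=1000):
    """Summarize an assembly as in Table 11.

    Returns the number of contigs of at least min_len bp, the maximum
    contig length, and the total number of contigs.
    """
    return {
        "n_long": sum(1 for length in contig_lengths if length >= min_len),
        "max_length": max(contig_lengths, default=0),
        "n_contigs": len(contig_lengths),
    }


# Hypothetical contig lengths (bp) for one assembly setting.
example_lengths = [9321, 7616, 1200, 999, 385, 302]
summary = assembly_summary(example_lengths)
print(summary)
```

Running the same summary on the contigs obtained at each mismatch level gives a comparable set of figures for choosing between settings.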

Figure 39. Mapping the contigs obtained from the de novo assembly with Geneious assembler at 10% of mismatches to the reference genome of Apple stem pitting virus (ASPV). The mapping was done allowing 28% of mismatches, thus mapping contigs that have more than 72% of identity with ASPV at nucleotide level (minimum percentage of identity stated in the demarcation criteria for the family Betaflexiviridae).

The contigs obtained from the de novo assembly with Geneious Assembler at 10% of mismatches were mapped to the reference genome of ASPV to locate the part of the genome to which they belonged (Figure 39). The longest contig (contig 1) is a full genome of a new foveavirus isolate (isolate 1); the following two longest contigs (contigs 2 and 3) are considered as partial genomes, with complete RdRp genes and partial TGB fragments, of two new foveavirus isolates (isolates 2 and 3); and two shorter contigs (contigs 5 and 7) are considered as complete CP genes of isolates 2 and 3. However, it was not possible to elongate contigs 5 and 7, and 2 and 3, in order to assign the two CP regions to isolates 2 and 3 reliably. Thus, further laboratory analysis must be performed to recover full genomes of the new isolates. Nevertheless, to continue with the phylogenetic analysis, contigs 2 and 5 were considered parts of the same molecule (isolate 2) and contigs 3 and 7 were considered regions of the same molecule (isolate 3). This assumption was made because the triple gene block (TGB) region, which is between the RdRp and CP, could not be recovered due to a lack of reads and thus the CPs and RdRps could not be linked together. Notably, further laboratory work will be done to amplify the regions with gaps and link each CP and RdRp to a single molecule.

Figure 40. Test of coverage of contig number 2.

Figure 41. Test of coverage of contig number 3.

Figure 42. Test of coverage of contig number 5.

Figure 43. Test of coverage of contig number 7.

A pairwise alignment was done separately for the RdRp and CP regions of the new isolates and the complete reference genomes of ASPV and AGCaV downloaded from NCBI.

For the RdRp, at nucleotide level, isolate 1 has 82% of identity with ASPV and 79% of identity with AGCaV, isolate 2 has 79% of identity with ASPV and 79% of identity with AGCaV, and isolate 3 has 74% of identity with ASPV and 76.38% of identity with AGCaV. At amino acid level, isolate 1 has 92% of identity with ASPV and 79% of identity with AGCaV, isolate 2 has 78.72% of identity with ASPV and 79% of identity with AGCaV, and isolate 3 has 74% of identity with ASPV and 76% of identity with AGCaV (Annex I, Table 12 and Table 13).

For the CP, at nucleotide level, isolate 1 has 73% of identity with ASPV and 68% of identity with AGCaV, isolate 2 has 75% of identity with ASPV and 68% of identity with AGCaV, and isolate 3 has 74% of identity with ASPV and 68% of identity with AGCaV. At amino acid level, isolate 1 has 80.19% of identity with ASPV and 73% of identity with AGCaV, isolate 2 has 82.12% of identity with ASPV and 74% of identity with AGCaV, and isolate 3 has 81% of identity with ASPV and 75% of identity with AGCaV (Annex I, Table 14 and Table 15). At this point, there is a strong suspicion that the isolates belong to ASPV instead of AGCaV. For the RdRp region, at nucleotide level the three isolates could belong to both species, while at amino acid level isolate 1 would be assigned to ASPV and isolates 2 and 3 could still belong to both species. For the CP region, at both nucleotide and amino acid level the three isolates would belong to ASPV. To confirm that the three isolates do belong to ASPV, phylogenetic trees were built.


Figure 44. Phylogenetic tree of RdRp fragments, at nucleotide level, from ASPV and AGCaV complete sequences downloaded from GenBank [119], and three new isolates identified from cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 45. Phylogenetic tree of RdRp fragments, at amino acid level, from ASPV and AGCaV complete sequences downloaded from GenBank, and three new isolates identified from cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 46. Phylogenetic tree of CP fragments, at nucleotide level, from ASPV and AGCaV complete sequences downloaded from GenBank, and three new isolates identified from cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.

Figure 47. Phylogenetic tree of CP fragments, at amino acid level, from ASPV and AGCaV complete sequences downloaded from GenBank, and three new isolates identified from cultivar Joseph Musch. Isolates of ASPV are highlighted in black, isolates of AGCaV in green, and new isolates in red.


The phylogenetic analysis and multiple alignment were made with complete genome sequences of ASPV and AGCaV downloaded from “NCBI Virus” (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/). The analyses were done at nucleotide and amino acid level on the CP and RdRp regions separately. AGCaV isolate Aurora-1 has two accession numbers: NC_018714.1 and HE963831.1. In the phylogenetic trees for the RdRp, at both nucleotide and amino acid level, our isolates were located among sequences of ASPV and did not cluster with sequences of AGCaV (Figure 44 and Figure 45). Notably, at amino acid level the two sequences of AGCaV did not cluster together (Figure 45).

5.3.2 Apple Rubbery Wood Virus-1 (ARWV-1)

During the preliminary data analysis of sample Q9, Apple rubbery wood virus-1 (ARWV-1) was not detected because it was absent from the database used for tblastx on Geneious: the Virus RefSeq Database downloaded from NCBI (https://www.ncbi.nlm.nih.gov/), which includes one reference sequence from each known virus and was updated in November 2019. ARWV-1 was first detected in sample Q9 after an analysis with Kaiju using an updated custom database. Reads were assembled into contigs using the SPAdes de novo assembler and three contigs were identified as ARWV-1 using an updated database. ARWV-1 has three linear, negative-sense, single-stranded RNA (-ssRNA) molecules. Reads were then mapped to the reference genomes of the three segments of ARWV-1 isolate BR-Mishima, allowing 20% of mismatches without iterations, to confirm their presence (Figure 48, Figure 49, and Figure 50).

The first contig corresponded to a nearly complete sequence of segment L, with 95% and 94% of identity to segment L of isolate BR-Mishima (MK936227.1) at nucleotide and amino acid levels (Annex I, Table 16 and Table 17). The second contig presented 95% and 93% of identity to segment M of isolate BR-Mishima (MK936226.1) (Annex I, Table 18 and Table 19). The third contig was identified as a partial sequence of segment S, with 95% and 97% of identity to segment S of isolate BR-Mishima (MK936225.1) (Annex I, Table 20 and Table 21). To be able to compare the results, the phylogenetic analysis was made on protein-encoding regions extracted from full genomes of ARWV-1 and ARWV-2 downloaded from “NCBI Virus” (https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/). Even though the contig identified as segment S was a partial sequence, it included a complete CP region that was used for phylogenetic analysis. The fact that segment S clusters with isolate BR-Mishima (Figure 55 and Figure 56) while segments M and L do not cluster with any published genome of ARWV-1 (Figure 51, Figure 52, Figure 53, and Figure 54) could be explained by divergent evolution of segments M and L.

Figure 48. Mapping all reads to the reference of segment L from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% of mismatches with no iterations.

Figure 49. Mapping all reads to the reference of segment M from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% of mismatches with no iterations.

Figure 50. Mapping all reads to the reference of segment S from Apple rubbery wood virus-1 isolate BR-Mishima, allowing 20% of mismatches with no iterations.

Figure 51. Phylogenetic tree of ARWV segment L sequences, at nucleotide level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 52. Phylogenetic tree of ARWV segment L sequences, at amino acid level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.


Figure 53. Phylogenetic tree of ARWV segment M sequences, at nucleotide level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 54. Phylogenetic tree of ARWV segment M sequences, at amino acid level: complete sequences downloaded from GenBank and the isolate identified in the cultivar Joseph Musch.

Figure 55. Phylogenetic tree of ARWV segment S sequences, at nucleotide level: complete sequences downloaded from GenBank [119] and the isolate identified in the cultivar Joseph Musch.

Figure 56. Phylogenetic tree of ARWV segment S sequences, at amino acid level: complete sequences downloaded from GenBank [119] and the isolate identified in the cultivar Joseph Musch.

The presence of ARWV-1 in tree Q9 was confirmed by RT-PCR on young and medium leaves; ARWV-1 was not detected in older leaves from tree Q9 (Figure 57). This represents the first detection of ARWV-1 in Europe, given that ARWV-1 has so far only been detected in Canada, South Korea, the United States of America, and Brazil (NCBI Virus). In addition, trees surrounding Q9, which included several trees of the cultivar Joseph Musch, were screened for the presence of ARWV-1. The RT-PCRs carried out on those trees were all negative (Figure 58). The absence of inhibitors causing false negative results was confirmed by an RT-PCR with Nad5 primers (Figure 59), which target a well-expressed plant gene and served as an internal positive control to ensure that the extraction was done correctly [120]. The negative results for the surrounding trees can be explained by the fact that most pome fruit viruses are known to be transmitted solely by grafting, either of susceptible scions onto infected rootstocks or vice versa.

Figure 57. Electrophoresis gel of the PCR products with ARWV-1 primers ARWaV-1L3639F and ARWaV-1L4058R (Table 7). Samples were taken from leaves of different ages from tree Q9 and RNA was extracted with the RNeasy Plant Mini Kit. From left to right: 1 and 2, young leaves; 3 and 4, medium leaves; 5 and 6, old leaves. The ladder used was GeneRuler 100 bp DNA Ladder.

Figure 58. Electrophoresis gel of the PCR products with ARWV-1 primers ARWaV-1L3639F and ARWaV-1L4058R (Table 7), with a hybridization temperature of 54ºC, for samples taken from trees surrounding tree Q9. RNA was extracted with the RNeasy Plant Mini Kit. From left to right: 1-16, samples 1-8 with two replicates each. The positive control was the PCR product of the positive sample showing amplification of ARWV-1 (Figure 57). The ladder used was GeneRuler 100 bp DNA Ladder.

Figure 59. Electrophoresis gel of the PCR products with Nad5 primers, to verify that the extraction with the RNeasy Plant Mini Kit of samples from trees surrounding Q9 was performed correctly. From left to right: 1-8, samples from trees surrounding Q9; 9-10, positive controls from tomato plants; 11, negative control. The ladder used was GeneRuler 100 bp DNA Ladder.


5.3.3 Apple Stem Grooving Virus (ASGV)

During the preliminary analysis of data from sample Q9, several contigs were identified as Apple stem grooving virus (ASGV). Reads were then mapped to the reference sequence of ASGV downloaded from NCBI (https://www.ncbi.nlm.nih.gov/) (Figure 60). The consensus sequence obtained from the mapping had 96.2% identity to isolate HPKu-2 (LT160740).

Figure 60. Mapping of all reads to the reference of Apple stem grooving virus isolate HPKu-2, allowing 20% of mismatches with 25 iterations.

Figure 61. Phylogenetic tree of ASGV full sequences, at nucleotide level, downloaded from GenBank [119], and a new isolate identified in the cultivar Joseph Musch.

Figure 62. Phylogenetic tree of ASGV full sequences, at amino acid level, downloaded from GenBank [119], and a new isolate identified in the cultivar Joseph Musch.

A phylogenetic analysis was performed on the region encoding a large replication-associated protein, comparing the isolate from tree Q9 (isolate Q9) with reference genomes of ASGV downloaded from NCBI. In the phylogenetic trees, at both the nucleotide and amino acid levels, isolate Q9 is very closely related to isolate HPKu-2 (LT160740) (Figure 61 and Figure 62). Isolate Q9 has 96% identity at the nucleotide level and 97% identity at the amino acid level to isolate HPKu-2 (LT160740) (Annex I, Table 22 and Table 23). ASGV was easily identified and classified during the bioinformatic analysis using BLAST and mapping to a reference genome.


5.3.4 Apple luteovirus-1 (ALV-1)

Apple luteovirus-1 (ALV-1) was detected in all individual samples extracted with the double-stranded RNA (dsRNA) protocol. Sample 9 (Table 5) had a higher yield than the other samples containing reads assigned to ALV-1 (Table 9). This could be explained by contamination between samples, although no such contamination was indicated for the other viruses. To test whether contamination explains the differences in ALV-1 yield between samples, the results need to be confirmed by RT-PCR. For that purpose, two pairs of primers were designed (Table 7). If the detection of ALV-1 in the samples is confirmed, it would represent the first detection of ALV-1 in the European Union. Reads from individual samples 5-10 were mapped to the reference genome of ALV-1 isolate PA8 to confirm the assignment of some reads to ALV-1 by Kaiju and Kraken, and the detection with BLAST (Figures 63-67).

Figure 63. Mapping all reads from sample number 9 (Table 5) to the reference genome of Apple luteovirus-1 (ALV-1) isolate PA8 (MF120198), allowing 20% of mismatches (Annex II, Figure 85).

Figure 64. Mapping all reads from sample number 6 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% of mismatches (Annex II, Figure 86).

Figure 65. Mapping all reads from sample number 7 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% of mismatches (Annex II, Figure 86).


Figure 66. Mapping all reads from sample number 8 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% of mismatches (Annex II, Figure 86).

Figure 67. Mapping all reads from sample number 10 (Table 4) to the reference genome of ALV-1 isolate PA8 (MF120198), allowing 20% of mismatches (Annex II, Figure 86).

5.3.5 Apple hammerhead viroid-like RNA (AHVd-like RNA)

Figure 68. Mapping of all reads from sample number 8 (Table 5) to the reference sequence of Apple hammerhead viroid-like RNA (AHVd-like RNA), allowing 20% of mismatches with no iterations.

During the bioinformatic analysis of data from samples extracted with the double-stranded RNA (dsRNA) protocol, various contigs were assigned to Apple hammerhead viroid-like RNA (AHVd-like RNA). All reads from sample number 8 (Table 5), which has a higher AHVd-like RNA yield (Table 9), were mapped to the reference sequence and a nearly complete genome sequence was obtained (Figure 68). The consensus sequence obtained from the mapping has a gap of 58 nt. To confirm the detection by RT-PCR, two pairs of primers for AHVd-like RNA were ordered (Table 7). If the detection of AHVd-like RNA in the samples is confirmed by RT-PCR, it would be the first detection of this viroid in the European Union.


5.4 COMPARISON OF CLASSIFICATION PROGRAMS AND VIRAL ENRICHMENT DURING EXTRACTION

The last part of this project involved the comparison of different bioinformatic protocols of virus identification (BLAST, Kaiju, and Kraken), two different extraction methods (dsRNA and VANA), and the effects of sample pooling.

Kaiju and Kraken are two read classification programs: Kaiju translates reads and identifies matches at the protein level, whereas Kraken associates k-mers with the lowest common ancestor (LCA) taxon [109], [110]. This comparison is based on the results summarized in Table 9.
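The lowest-common-ancestor idea can be illustrated with a toy example. The sketch below is a strong simplification and not Kraken's actual implementation: it assumes a tiny hand-written taxonomy, made-up reference sequences, a k-mer length of 4, and a "most frequent hit" rule in place of Kraken's scoring along the taxonomy tree. All names (`PARENT`, `build_index`, `classify`) are hypothetical.

```python
from collections import Counter

# Hypothetical toy taxonomy (child -> parent), for illustration only
PARENT = {
    "ASPV": "Foveavirus",
    "AGCaV": "Foveavirus",
    "Foveavirus": "Betaflexiviridae",
    "Betaflexiviridae": "root",
}


def lineage(taxon):
    """Return the path from a taxon up to the root."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    return "root"


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the LCA of all reference taxa that contain it."""
    index = {}
    for taxon, seq in references.items():
        for kmer in kmers(seq, k):
            index[kmer] = lca(index[kmer], taxon) if kmer in index else taxon
    return index


def classify(read, index, k):
    """Simplified rule: report the taxon hit by the most k-mers of the read."""
    hits = Counter(index[kmer] for kmer in kmers(read, k) if kmer in index)
    return hits.most_common(1)[0][0] if hits else "unclassified"


# Made-up sequences sharing their first six bases
references = {"ASPV": "ACGTACGGTTCA", "AGCaV": "ACGTACCCTGAA"}
index = build_index(references, k=4)
print(classify("GGTTCA", index, k=4))  # k-mers unique to the ASPV toy reference
print(classify("ACGTAC", index, k=4))  # k-mers shared by both toy references
print(classify("TTTTTT", index, k=4))  # no k-mer in the index
```

A read drawn from a region shared by two closely related references can only be placed at their common ancestor, which is consistent with the difficulty of separating foveavirus species at the read level.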

For virus identification, Kaiju identified more reads per million per virus species than Kraken. For example, in sample 5 Kaiju assigned 432,710 reads/million to Apple stem pitting virus (ASPV), while Kraken assigned only 83,739 reads/million to the same virus (Table 9). In sample 1, Kraken detected Apple green crinkle associated virus (AGCaV) while Kaiju did not; however, even in this case Kaiju assigned more reads/million than Kraken to virus species of the genus Foveavirus. In the pooled samples extracted following the dsRNA protocol (dsRNA mix), Kaiju assigned 2,312 reads/million to AGCaV and 1,237 reads/million to ApLV, while Kraken assigned 9,400 reads/million to AGCaV and 2,469 reads/million to ApLV (Table 9). In the pooled samples extracted with the VANA protocol (VANA mix), Kaiju assigned 196 reads/million to AGCaV and 212 reads/million to ApLV, while Kraken assigned 1,182 reads/million to AGCaV and 454 reads/million to ApLV (Table 9). Given that the analysis of the foveavirus reads ultimately showed that the three new isolates all belonged to ASPV, Kraken misassigned more foveavirus reads than Kaiju. In comparison, blastn misassigned forty-seven contigs to AGCaV and seven contigs to ApLV. For BLAST, small contigs were the main source of misassignment to ApLV, while the assignment of contigs to AGCaV might be due to the isolate being closer to the reference genome sequence of AGCaV than to that of ASPV (Table 8, Figure 44). Furthermore, Kaiju detected viruses of very low abundance, for example Apple rubbery wood virus-1 (ARWV-1) and Apple luteovirus-1 (ALV-1), better than Kraken. Even though BLAST did not detect these low-yield viruses in some samples, it had better detectability than Kraken.
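The pooled-sample figures quoted above can be summarized with a short calculation. This is a minimal sketch that uses only the reads/million values given in the text and assumes, following the conclusion that all three isolates belong to ASPV, that every read assigned to AGCaV or ApLV counts as misassigned. The dictionary layout and variable names are illustrative choices.

```python
# Reads per million assigned to AGCaV and ApLV in the pooled samples,
# as quoted in the text (Table 9)
rpm = {
    "dsRNA mix": {
        "Kaiju": {"AGCaV": 2312, "ApLV": 1237},
        "Kraken": {"AGCaV": 9400, "ApLV": 2469},
    },
    "VANA mix": {
        "Kaiju": {"AGCaV": 196, "ApLV": 212},
        "Kraken": {"AGCaV": 1182, "ApLV": 454},
    },
}

summary = {}
for sample, by_tool in rpm.items():
    # Assumption: all AGCaV and ApLV reads are misassigned ASPV reads
    kaiju_total = sum(by_tool["Kaiju"].values())
    kraken_total = sum(by_tool["Kraken"].values())
    summary[sample] = (kaiju_total, kraken_total, kraken_total / kaiju_total)

for sample, (kaiju_total, kraken_total, ratio) in summary.items():
    print(f"{sample}: Kaiju {kaiju_total}, Kraken {kraken_total} reads/million "
          f"(Kraken/Kaiju = {ratio:.2f})")
```

Under that assumption, Kraken misassigned roughly three to four times more foveavirus reads per million than Kaiju in both pooled samples.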

To study which sequencing preparation protocol is more effective for pome fruit viruses, ten samples were pooled twice. One of the two pools was extracted with the dsRNA extraction method [111] and the other with the VANA extraction protocol adapted from Fillous et al. (2015) [112]. Viruses with a higher yield, for example the foveaviruses (ASPV, AGCaV, and ApLV), were detected with both dsRNA and VANA. However, agents present at lower concentrations, such as ARWV-1, ALV-1, and the viroid AHVd-like RNA, were not detected with VANA (Table 9).

In the pooled sample extracted with the dsRNA protocol, ASGV was the only virus that was not detected although it was detected in two of the individual samples (Table 9). This could be explained by the fact that viruses present at low levels risk going undetected when samples are pooled.


6 DISCUSSION

The dsRNA extraction protocol was successfully implemented in the laboratory. With dsRNA, six viral species were detected (counting all foveavirus reads as ASPV), corresponding to ssRNA(+) viruses, ssRNA(-) viruses, and viroids. These results confirm previous publications showing that dsRNA is better suited than VANA to the study of RNA viruses and viroids [45], [46].

For the three foveavirus isolates identified, comparison of their RdRp with RdRp sequences of ASPV and AGCaV shows that they meet the demarcation criteria for both species. For the CP, at both the nucleotide and amino acid levels, the isolates are located among sequences of ASPV and do not cluster with sequences of AGCaV (Figure 46 and Figure 47). Comparing the CP region of the isolates with CP sequences of ASPV and AGCaV, and according to the demarcation criteria of the family Betaflexiviridae, the three new isolates could be assigned to ASPV. However, the difficulties faced during this project in assigning the new isolates to a virus species raise some doubts about the current taxonomy of the family Betaflexiviridae. AGCaV was first characterized and described as a new species within the genus Foveavirus by James et al. (2013). In that article, AGCaV was proposed as a new species rather than a new isolate of ASPV because of the differences in the CP region at both the nucleotide and amino acid levels. Morelli et al. (2017) identified another isolate of AGCaV with differences from ASPV in the CP region. However, two isolates classified as ASPV could be classified as new viruses (EU095327.1 and JF946772.1). According to the demarcation criteria, isolates PR1 (EU095327.1) and KL9 (JF946772.1) can belong to both ASPV and AGCaV. Thus, treating AGCaV as an isolate of ASPV would resolve the difficulties in identifying and classifying new isolates, especially from mixed populations where obtaining complete individual genomes of each isolate is challenging. It is interesting to note that the difficulties in assigning new Foveavirus isolates to a species arise in the CP region, which may be due to the high variability and recombination events within the ASPV population [122]. It would be advisable for the ICTV study group to redefine the demarcation criteria of the family Betaflexiviridae, similar to what has been done recently in the genus [123].

After optimization of the RT-PCR conditions, the PCR products obtained with ARWV-1 primers indicated that the concentration of viral particles within the tree is heterogeneous, given that ARWV-1 was amplified from young and medium leaves but not from old leaves (Figure 57). This possible heterogeneous distribution of ARWV-1 within the tree must be taken into account when analysing other trees, and a mix of young and medium leaves has been used as starting material for dsRNA extraction. Apple rubbery wood virus (ARWV) was proposed as the causal agent of Apple rubbery wood disease (ARWD) only in 2018 [59], so very few articles and sequences have been published. In addition to the first publication by Rott et al., a second article detected both ARWV-1 and ARWV-2 in trees showing symptoms of Apple Decline [60]. Detection of ARWV-1 has been confirmed by RT-PCR. Notably, tree Q9 was not showing any obvious symptoms of Apple rubbery wood disease at the time of sampling. Symptoms of Apple rubbery wood disease appear in young apple trees between one and three years old, but the rigidity of stems and branches can later be restored in trees between three and five years old [102]. It is possible that tree Q9 showed symptoms of Apple rubbery wood disease when it was between one and three years old and that it has since recovered its rigidity. Nevertheless, previous studies that detected ARWV in trees showing symptoms of Apple rubbery wood disease found ARWV-1 in coinfection with ARWV-2 [59], [60]. Thus, it is possible that ARWV only causes symptoms of Apple rubbery wood disease when both ARWV-1 and ARWV-2 infect the tree. Transmissibility of ARWV-1 by grafting has not been tested yet, and it has not been proven that ARWV-1 and ARWV-2 are the causal agents of ARWD [2], [59]. To further study the distribution of ARWV in the germplasm collection, more information about the history of each tree is needed, for example when and where it was collected.

ALV-1 was detected in sample 9 (Table 5) with a high yield, and in the other individual samples extracted with the dsRNA protocol with a lower yield. The ALV-1 reads in the other dsRNA-extracted samples may result from contamination during complementary DNA (cDNA) synthesis, which is the step most prone to cross-contamination during dsRNA extraction. All reads from sample 9 were mapped to the reference genome of ALV-1 isolate PA8 (MF120198), which allowed the recovery of a nearly complete genome of the new ALV-1 isolate (Figure 63). ALV-1 is associated with Rapid Apple Decline (RAD), an emerging problem of apple trees in North America that causes necrosis, cracking, and canker of the trunk before the tree collapses during summer [62], [105]. Nevertheless, the trees in which ALV-1 was detected did not present any obvious symptoms at the time of sampling. Primers have been ordered to independently confirm the presence of ALV-1 by RT-PCR (Table 7). If its presence is confirmed in samples from the CRA-W germplasm collection, it would represent the first detection of ALV-1 in Europe, given that it has only been described in the United States of America and South Korea [62], [63].

In general, Kaiju gave better results for the foveaviruses. However, since Kaiju works at the protein level, it cannot detect viroids, which is a drawback compared with Kraken and BLAST. Nevertheless, Kaiju and Kraken have the advantage that they allow read extraction by virus species or taxonomic group and that they report unassigned reads, which can be useful for the detection of novel viruses. By comparison, BLAST identifies viruses from contigs produced by de novo assembly, which can sometimes lead to the misassignment of hybrid contigs to the wrong virus. Thus, BLAST results needed manual curation to identify possible misassignments and discard results that did not make sense biologically.

Pooling affected the detection of viruses present at low levels, and viral enrichment was higher in samples extracted individually, especially for species with a lower yield. For example, in sample 9 Kaiju assigned 6,040 reads/million to ALV-1, while in the pooled sample “dsRNA mix” it assigned only 142 reads/million to ALV-1 (Table 9). These low-yield detections need to be confirmed by PCR and further laboratory work.
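The size of this pooling effect can be put in perspective with a simple dilution estimate. The sketch below rests on strong assumptions that the thesis does not state: that sample 9 was one of ten samples in the pool, that each sample contributed equally to the pooled library, and that the other nine samples contributed no ALV-1 reads at all. Only the two reads/million values come from the text; the function name is hypothetical.

```python
def expected_pooled_rpm(individual_rpm):
    """Expected reads/million in a pool, assuming equal contribution per sample."""
    return sum(individual_rpm) / len(individual_rpm)


sample9_rpm = 6040   # Kaiju, ALV-1, sample 9 (Table 9)
observed_pool = 142  # Kaiju, ALV-1, "dsRNA mix" (Table 9)
n_pooled = 10        # assumption: ten equally weighted samples in the pool

# Assumption: the nine other samples contribute zero ALV-1 reads
expected_pool = expected_pooled_rpm([sample9_rpm] + [0] * (n_pooled - 1))
fold_below = expected_pool / observed_pool

print(f"expected from dilution alone: {expected_pool:.0f} reads/million")
print(f"observed value is {fold_below:.1f}-fold lower")
```

Under these assumptions, tenfold dilution alone would leave about 604 reads/million, so the observed 142 reads/million suggests that factors other than simple dilution (for example unequal sample input or extraction efficiency) also reduced the ALV-1 signal in the pool.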


7 CONCLUSIONS

Pome fruit viruses have been extensively studied over the years in economically important cultivars. Studying the virome of trees from germplasm collections gives new insight into the diversity of viruses infecting pome fruit trees, which not only increases our knowledge of plant viruses but also provides useful information for breeders. In this project, the virome characterization of six samples of apple (Malus domestica) and five samples of pear (Pyrus communis) from the CRA-W germplasm collection resulted in (i) the generation of new nearly complete genome sequences for Apple stem pitting virus (ASPV), Apple green crinkle associated virus (AGCaV), Apple rubbery wood virus-1 (ARWV-1), Apple stem grooving virus (ASGV), Apple luteovirus-1 (ALV-1), Apple hammerhead viroid-like RNA (AHVd-like RNA), and Apple chlorotic leaf spot virus (ACLSV); (ii) a contribution to the ongoing discussion on species taxonomy and demarcation criteria, especially within the family Betaflexiviridae; and (iii) the first detection of ARWV-1, ALV-1, and AHVd-like RNA in Europe, with RT-PCR confirmation still needed for the latter two.

Furthermore, a preliminary comparison of protocols has been carried out, comparing pooled and individual samples, different preparation methods (dsRNA and VANA), and classification programs (BLAST, Kaiju, and Kraken). As seen in Table 9 and discussed in the previous section, individual samples gave a higher yield for viruses that may be present at low concentrations, and a higher analytical sensitivity, than pooled samples. One possible adaptation of the protocol would be to generate ten times more sequences for the pooled samples. Although promising, this option would require particular attention to contamination issues.

For the sequencing preparation protocols (dsRNA and VANA), the results from the literature were confirmed: more viruses were detected, and with higher sensitivity, in the pooled sample extracted with the dsRNA protocol than in the pooled sample extracted with the VANA protocol. Kaiju and Kraken gave results similar to BLAST in terms of analytical sensitivity, but these two newer programs have the advantage of allowing the extraction of reads assigned to a specific taxon. Another advantage is that they record the number of unassigned reads (assigned neither to the host plant nor to any known organism or virus). These reads, called dark matter, could be further analysed through assembly and BLAST analysis. However, both Kaiju and Kraken failed to detect some agents (ASGV and AHVd-like RNA) and generated false positives for foveaviruses (AGCaV and ApLV). Further refinements are therefore needed before switching from BLAST to Kaiju and Kraken. Nevertheless, these programs are useful as a preliminary analysis before the assembly of the reads.

For ARWV-1, further work should aim to amplify and sequence the whole genome in order to recover a full genome. Additional experiments may include grafting scions from Q9 onto healthy rootstock to monitor whether infection with ARWV-1 alone can cause symptoms of Apple rubbery wood disease (ARWD), and comparing the results with scions infected with both ARWV-1 and ARWV-2 grafted onto healthy rootstock. If the causal agent of ARWD is indeed a coinfection of ARWV-1 and ARWV-2, then their names should be changed to Apple rubbery wood associated virus (ARWaV). ARWV-2 could be obtained from the laboratory of Dr Mike Rott, from the Canadian Food Inspection Agency, who has previously worked with ARWV-1 and ARWV-2. The article reporting the first detection of ARWV-1 in Europe would also include the first detection of ALV-1 and AHVd-like RNA in Europe. For that purpose, PCR amplification and sequencing of the PCR products will be performed to confirm the first detection of both ALV-1 and AHVd-like RNA in Europe. To recover the full genomes of the new isolates detected in the samples from CRA-W, a long PCR and sequencing of the products can be performed. In the case of ASPV, at least three new isolates have been detected in samples from tree Q9. However, two of the three isolates have gaps in the triple gene block (TGB) region. Full genomes of ASGV and ACLSV must be recovered as well before publication of the results. To recover full genomes of the new ASPV, ASGV, and ACLSV isolates and fill the gaps, a PCR amplification with custom primers can be performed. A RACE-PCR to sequence the 5’ and 3’ ends of the genome can be done to obtain complete sequences, which some journals require for publication. Additionally, as mentioned in the discussion, the ICTV study group for Betaflexiviridae should review the demarcation criteria of the family, given that they are not consistent with the published sequences, and the AGCaV sequences available on GenBank should be considered as isolates of ASPV.

As a continuation of this work, the goal will be to further characterize the virome of four hundred pome fruit trees, which represent approximately one third of the total number of trees in the CRA-W germplasm collection. The dsRNA preparation protocol, which was adapted and developed during this thesis and for the first time at Gembloux Agro-Bio Tech, will be applied to individual samples. Pooling samples is a useful tool for plant viruses that are highly prevalent in an orchard and present at high levels in the infected trees. However, because pome fruit viruses are generally transmitted by grafting, the viral populations of a given virus species can vary greatly from one tree to another in a germplasm collection. Hence, differentiating isolates of the same virus species in a pooled sample is difficult and time-consuming. Extracting samples individually is time-consuming as well, but the bioinformatic analyses are simpler than for a pool of samples.

In the medium term, the detection of an unknown virus, or of a known virus for the first time in Europe, will also require Pest Risk Analyses to inform the curators, the growers, and the phytosanitary authorities if needed. A framework for the evaluation of the biosecurity, commercial, regulatory, and scientific impacts of plant viruses and viroids identified by NGS technologies was published three years ago [124]. In brief, after the first detection of a new or poorly characterized virus, its full or nearly full genome must be obtained. A diagnostic protocol should then be designed and used to evaluate its local prevalence and its association with symptoms. In the longer term, large-scale studies of host range, symptomatology, transmission, and epidemiology should be carried out to fine-tune the preliminary risk analysis. We have applied this framework to ARWV-1 in this thesis: we saw no symptoms in the orchard and did not detect the virus in neighbouring trees. We will also apply this framework to ALV-1 and AHVd-like RNA.

In the longer term, better knowledge of the virome of the germplasm collection will be gained through the developed protocols. Their application aims at increasing awareness of the phytosanitary status of perennial, living pome fruit collections and thus at ensuring their long-term preservation and multiplication to tackle future challenges of pome fruit production.


8 REFERENCES

[1] Eurostat, “The fruit and vegetable sector in the EU - a statistical overview,” Statistics Explained, 2019. [Online]. Available: http://ec.europa.eu/eurostat/statistics-explained/index.php/Category:Tourism_glossary. [Accessed: 03-Jun-2020].
[2] F. Di Serio, S. Ambr, T. Sano, R. Flores, and B. Navarro, “Viroid Diseases in Pome and Stone Fruit Trees and Koch’s Postulates: A Critical Assessment,” no. i, 2018.
[3] A. Hadidi, M. Barba, T. Candresse, W. Jelkmann, and American Phytopathological Society, Virus and virus-like diseases of pome and stone fruits. APS Press/American Phytopathological Society, 2011.
[4] H. Bastiaanse, Y. Muhovski, D. Mingeot, and M. Lateur, “Candidate defense genes as predictors of partial resistance in ‘Président Roulin’ against apple scab caused by Venturia inaequalis,” Tree Genet. Genomes, vol. 11, no. 6, pp. 1–18, Dec. 2015.
[5] G. Marconi, N. Ferradini, L. Russi, L. Concezzi, F. Veronesi, and E. Albertini, “Genetic characterization of the apple germplasm collection in central Italy: The value of local varieties,” Front. Plant Sci., vol. 9, Oct. 2018.
[6] C. A. Acuña et al., “Reproductive Systems in Paspalum: Relevance for Germplasm Collection and Conservation, Breeding Techniques, and Adoption of Released Cultivars,” Frontiers in Plant Science, vol. 10. Frontiers Media S.A., 21-Nov-2019.
[7] M. Ordidge, P. Kirdwichai, M. Fazil Baksh, E. P. Venison, J. George Gibbings, and J. M. Dunwell, “Genetic analysis of a major international collection of cultivated apple varieties reveals previously unknown historic heteroploid and inbred relationships,” PLoS One, vol. 13, no. 9, Sep. 2018.
[8] D. J. Mabberley, The Plant-Book. A portable dictionary of the vascular plants., 2nd ed., no. 5–6. Cambridge, New York, Melbourne: Cambridge University Press, 1997.
[9] W. S. Judd, C. S. Campbell, E. Kellogg, P. F. Stevens, and J. Donoghue, Plant Systematics: A Phylogenetic Approach, 2nd ed., no. 3. Sunderland, MA: Oxford University Press (OUP), 2002.
[10] J. Janick, “The Origins of Fruits, Fruit Growing, and Fruit Breeding,” in Plant Breeding Reviews, Oxford, UK: John Wiley & Sons, Inc., 2005, p. 320.
[11] V. Shulaev et al., “Multiple models for Rosaceae genomics,” Plant Physiol., vol. 147, no. 3, pp. 985–1003, 2008.
[12] D. Potter et al., “Phylogeny and classification of Rosaceae,” Plant Syst. Evol., vol. 266, pp. 5–43, 2007.
[13] “WAPA - The World Apple and Pear Association.” [Online]. Available: http://www.wapa-association.org/asp/page_1.asp?doc_id=446. [Accessed: 22-May-2020].
[14] L. Taiz, E. Zeiger, I. M. Moller, and A. Murphy, Plant Physiology and Development, Sixth. Oxford University Press, 2014.
[15] R. Beauvieux, B. Wenden, and E. Dirlewanger, “Bud dormancy in perennial fruit tree species: A pivotal role for oxidative cues,” Frontiers in Plant Science, vol. 9. Frontiers Media S.A., p. 657, 16-May-2018.

[16] G. A. Lang, J. D. Early, G. C. Martin, and R. L. Darnell, “Endo-, para-, and ecodormancy: physiological terminology and classification for dormancy research,” Hortic. Sci., vol. 22, no. 3, pp. 371–377, 1987.
[17] J. Granier, Phénologie des espèces fruitières et fruits rouges. Centre Technique Interprofessionnel des Fruits et Légumes, 2006.
[18] J. M. Lespinasse, P. Chol, J. Dupin, and E. Terrenne, La Conduite du pommier : types de fructification, incidence sur la conduite de l’arbre. Paris: INVUFLEC, 1977.
[19] M. Trillot, A. Masseron, V. Mathieu, F. Bergougnoux, C. Hutin, and Y. Lespinasse, Le pommier. Centre technique interprofessionnel des fruits et légumes, 2002.
[20] T. Kurokura, N. Mimida, N. H. Battey, and T. Hytönen, “The regulation of seasonal flowering in the Rosaceae,” J. Exp. Bot., vol. 64, no. 14, pp. 4131–4141, Nov. 2013.
[21] A. Masseron, Les Porte-greffe pommier, poirier et nashi. Paris: Centre technique interprofessionnel des fruits et légumes, 1989.
[22] K. M. Folta and S. E. Gardiner, Genetics and Genomics of Rosaceae. Springer New York, 2009.
[23] C. Pratt, “Apple Flower and Fruit: Morphology and Anatomy,” in Horticultural Reviews, John Wiley & Sons, Inc., 2011, pp. 273–308.
[24] H. Xue et al., “The genetic locus underlying red foliage and fruit skin traits is mapped to the same location in the two pear bud mutants ‘Red Zaosu’ and ‘Max Red Bartlett,’” Hereditas, vol. 155, p. 25, 2018.
[25] “Pear Varieties List | Guide to Ten Pear Types | USA Pears.” [Online]. Available: https://usapears.org/pear-varieties/. [Accessed: 03-May-2020].
[26] H. P. Doekes, R. F. Veerkamp, P. Bijma, S. J. Hiemstra, and J. Windig, “Value of the Dutch Holstein Friesian germplasm collection to increase genetic variability and improve genetic merit,” J. Dairy Sci., vol. 101, no. 11, pp. 10022–10033, Nov. 2018.
[27] R. Mumford, N. Boonham, J. Tomlinson, and I. Barker, “Advances in molecular phytodiagnostics - New solutions for old problems,” European Journal of Plant Pathology, vol. 116, no. 1. Springer Netherlands, pp. 1–19, 13-Jul-2006.
[28] T. Wetzel, T. Candresse, M. Ravelonandro, and J. Dunez, “A polymerase chain reaction assay adapted to plum pox detection,” J. Virol. Methods, vol. 33, no. 3, pp. 355–365, Aug. 1991.
[29] R. R. Martin, D. James, and C. A. Lévesque, “Impacts of Molecular Diagnostic Technologies on Plant Disease Management,” Annu. Rev. Phytopathol., vol. 38, no. 1, pp. 207–239, Sep. 2000.
[30] S. Massart, A. Olmos, H. Jijakli, and T. Candresse, “Current impact and future directions of high throughput sequencing in plant virus diagnostics,” Virus Research, vol. 188. Elsevier, pp. 90–96, 08-Aug-2014.
[31] I. P. Adams, A. Fox, N. Boonham, S. Massart, and K. De Jonghe, “The impact of high throughput sequencing on plant health diagnostics,” Eur. J. Plant Pathol., vol. 152, no. 4, pp. 909–919, Dec. 2018.
[32] F. Sanger, S. Nicklen, and A. R. Coulson, “DNA sequencing with chain-terminating inhibitors.,” Proc. Natl. Acad. Sci. U. S. A., vol. 74, no. 12, pp. 5463–5467, 1977.


[33] L. M. Smith, S. Fung, M. W. Hunkapiller, T. J. Hunkapiller, and L. E. Hood, “The synthesis of oligonucleotides containing an aliphatic amino group at the 5’ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis.,” Nucleic Acids Res., vol. 13, no. 7, pp. 2399–2412, Apr. 1985.
[34] X. G. Zhou, L. F. Ren, Y. T. Li, M. Zhang, Y. De Yu, and J. Yu, “The next-generation sequencing technology: A technology review and future perspective,” Science China Life Sciences, vol. 53, no. 1. Springer, pp. 44–57, 12-Feb-2010.
[35] S. Goodwin, J. D. McPherson, and W. R. McCombie, “Coming of age: Ten years of next-generation sequencing technologies,” Nature Reviews Genetics, vol. 17, no. 6. Nature Publishing Group, pp. 333–351, 01-Jun-2016.
[36] S. Shokralla, J. L. Spall, J. F. Gibson, and M. Hajibabaei, “Next-generation sequencing technologies for environmental DNA research,” Mol. Ecol., vol. 21, no. 8, pp. 1794–1805, Apr. 2012.
[37] E. R. Mardis, “DNA sequencing technologies: 2006-2016,” Nature Protocols, vol. 12, no. 2. Nature Publishing Group, pp. 213–218, 01-Feb-2017.
[38] A. Olmos et al., “High-throughput sequencing technologies for plant pest diagnosis: challenges and opportunities,” EPPO Bull., vol. 48, no. 2, pp. 219–224, Aug. 2018.
[39] O. T. Avery, C. M. Macleod, and M. McCarty, “Studies on the chemical nature of the substance inducing transformation of pneumococcal types: Induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type iii,” J. Exp. Med., vol. 79, no. 2, pp. 137–158, Feb. 1944.
[40] R. D. Fleischmann et al., “Whole-genome random sequencing and assembly of Haemophilus influenzae Rd,” Science, vol. 269, no. 5223, pp. 496–512, 1995.
[41] J. Gauthier, A. T. Vincent, S. Charette, and N. Derome, “A Brief History of Bioinformatics,” Brief. Bioinform., pp. 1–16, 2018.
[42] S. Massart, “In preparation.”
[43] A. M. Lesk, Introduction to bioinformatics, 4th ed. Oxford University Press, 2014.
[44] S. Roux, J. Matthijnssens, and B. E. Dutilh, “Metagenomics in Virology,” in Reference Module in Life Sciences, Elsevier, 2019.
[45] M. J. Roossinck, D. P. Martin, and P. Roumagnac, “Plant virus metagenomics: Advances in virus discovery,” Phytopathology, vol. 105, no. 6. American Phytopathological Society, pp. 716–727, 01-Jun-2015.
[46] Y. Ma et al., “Phytovirome Analysis of Wild Plant Populations: Comparison of Double-Stranded RNA and Virion-Associated Nucleic Acid Metagenomic Approaches,” J. Virol., vol. 94, no. 1, Oct. 2019.
[47] F. Maclot, “Illuminating an ecological blackbox: Using High Throughput Sequencing to characterize the plant virome across scales.,” Front. Microbiol. (submitted)., 2020.
[48] R. Hull, Plant Virology: Fifth Edition. Elsevier Inc., 2013.
[49] R. C. Gergerich and V. V. Dolja, “Introduction to Plant Viruses, the Invisible Foe.,” Plant Heal. Instr., 2006.

56

[50] N. J. Dimmock, A. J. Easton, K. N. Leppard, and J. Galama, Introduction to modern virology, 6th ed. Oxford: Wiley-Blackwel, 2007. [51] A. E. Gorbalenya et al., “The new scope of virus taxonomy: partitioning the virosphere into 15 hierarchical ranks,” Nature Microbiology, vol. 5, no. 5. Nature Research, pp. 668–674, 01-May- 2020. [52] M. J. Roossinck, “Plant Virus Metagenomics: Biodiversity and Ecology,” Annu. Rev. Genet., vol. 46, no. 1, pp. 359–369, Dec. 2012. [53] D. Baltimore, “Expression of animal virus genomes.,” Bacteriol. Rev., vol. 35, no. 3, pp. 235– 241, Sep. 1971. [54] Hou, “In preparation.” [55] I. Koloniuk, J. Přibylová, J. Fránová, and J. Špak, “Genomic characterization of Malus domestica virus A (MdoVA), a novel infecting apple,” Arch. Virol., vol. 165, no. 2, pp. 479–482, Feb. 2020. [56] T. Leichtfried, S. Dobrovolny, H. Reisenzein, S. Steinkellner, and R. A. Gottsberger, “Apple chlorotic fruit spot viroid: a putative new pathogenic viroid on apple characterized by next- generation sequencing,” Arch. Virol., vol. 164, no. 12, pp. 3137–3140, Dec. 2019. [57] D. Baek, S. Lim, H. J. Ju, H. R. Kim, S. H. Lee, and J. S. Moon, “The complete genome sequence of apple rootstock virus A, a novel nucleorhabdovirus identified in apple rootstocks,” Arch. Virol., vol. 164, no. 10, pp. 2641–2644, Oct. 2019. [58] P. Serra, A. Messmer, D. Sanderson, D. James, and R. Flores, “Apple hammerhead viroid-like RNA is a bona fide viroid: Autonomous replication and structural features support its inclusion as a new member in the genus Pelamoviroid,” Virus Res., vol. 249, pp. 8–15, Apr. 2018. [59] M. E. Rott, P. Kesanakurti, C. Food, I. Agency, N. Saanich, and B. Columbia, “Discovery of Negative-Sense RNA Viruses in Trees Infected with Apple Rubbery Wood Disease by Next- Generation Sequencing,” vol. 102, no. 7, pp. 1254–1263, 2018. [60] A. A. Wright, S. A. Szostek, E. Beaver-Kanuya, and S. J. Harper, “Diversity of three bunya- like viruses infecting apple,” Arch. 
Virol., vol. 163, no. 12, pp. 3339–3343, Dec. 2018. [61] B. Navarro, S. Zicca, M. Minutolo, M. Saponari, D. Alioto, and F. Di Serio, “A Negative- Stranded RNA Virus Infecting Citrus Trees: The Second Member of a New Genus Within the Order Bunyavirales,” Front. Microbiol., vol. 9, no. OCT, p. 2340, Oct. 2018. [62] H. Liu et al., “Characterization of a new apple luteovirus identified by high-throughput sequencing,” pp. 1–9, 2018. [63] P. Shen et al., “Molecular characterization of a novel luteovirus infecting apple by next- generation sequencing,” Arch. Virol., vol. 163, no. 3, pp. 761–765, Mar. 2018. [64] F. Xing, B. Lemma, Z. Zhang, H. Wang, and S. Li, “Genomic Analysis, Sequence Diversity, and Occurrence of Apple necrotic mosaic virus, a Novel Ilarvirus Associated with Mosaic Disease of Apple Trees in China,” Am. Phytopahtological Soc., 2018. [65] M. Fernando et al., “A novel , highly divergent ssDNA virus identified in Brazil infecting apple , pear and grapevine,” vol. 210, pp. 27–33, 2015. [66] P. Liang et al., “Identification and characterization of a novel geminivirus with a monopartite genome infecting apple trees,” pp. 2411–2420, 2015.

57

[67] D. James et al., “Identification and complete genome analysis of a virus variant or putative new foveavirus associated with apple green crinkle disease,” pp. 1877–1887, 2013. [68] A. T. Jones, W. J. McGavin, V. Gepp, M. T. Zimmerman, and S. W. Scott, “Purification and properties of blackberry chlorotic ringspot, a new virus species in subgroup 1 of the genus Ilarvirus found naturally infecting blackberry in the UK,” Ann. Appl. Biol., vol. 149, no. 2, pp. 125–135, Oct. 2006. [69] C. Li, N. Yoshikawa, T. Takahashi, T. Ito, K. Yoshida, and H. Koganezawa, “Nucleotide sequence and genome organization of Apple latent spherical virus: A new virus classified into the family Comoviridae,” J. Gen. Virol., vol. 81, no. 2, pp. 541–547, Feb. 2000. [70] F. Di Serio et al., “Apple dimple fruit viroid : Fulfillment of Koch ’ s Postulates and Symptom Characteristics.” [71] J. C. Desvignes et al., “Pear blister canker viroid : sequence variability and causal role in pear blister canker disease,” pp. 2625–2629, 1995. [72] P. E. Kyriakopoulou, L. Giunchedi, M. Barba, I. N. Boubourakas, M. S. Kaponi, and A. Hadidi, “Peach Latent Mosaic Viroid Other Than Peach,” 1990. [73] J. Hashimoto, H. Koganezawal, and T. S. City, “Nucleic Acids Research Nucleic Acids Research,” vol. 15, no. 17, pp. 7045–7052, 1987. [74] T. Ohno et al., “Purification and characterization of hop stunt viroid,” Virology, vol. 118, no. 1, pp. 54–63, Apr. 1982. [75] D. W. Mossop, R. I. B. Francki, and C. J. Grivell, “Comparative studies on tomato aspermy and cucumber mosaic viruses. V. Purification and properties of a cucumber mosaic virus inducing severe chlorosis,” Virology, vol. 74, no. 2, pp. 544–546, Oct. 1976. [76] W. Jelkmann, “Nucleotide sequences of apple stem pitting virus and of the coat protein gene of a similar virus from pear associated with vein yellows disease and their relationship with potex- and carlaviruses,” J. Gen. Virol., vol. 75, no. 7, pp. 1535–1542, Jul. 1994. [77] Y. Li, C. Deng, Y. 
Bian, X. Zhao, and Q. Zhou, “Characterization of apple stem grooving virus and apple chlorotic leaf spot virus identified in a crab apple tree,” Arch. Virol., vol. 162, no. 4, pp. 1093–1097, Apr. 2017. [78] G. Scurfield and C. Reinganum, “Some anatomical and chemical effects of ‘flat-limb’ virus on apple var gravenstein,” Aust. J. Agric. Res., vol. 15, no. 4, pp. 548–559, 1964. [79] G. Pasquini, F. Faggioli, M. Pilotti, V. Lumia, and M. Barba, “Characterization of Apple Chlorotic Leaf Spot Virus isolates from Italy,” 1998. [80] G. I. Mink and J. B. Bancroft, “Purification and serology of Tulare apple mosaic virus,” Nature, vol. 194, no. 4824, pp. 214–215, 1962. [81] G. Nyland, “Possible virus-induced genetic abnormalities in tree fruits,” Science (80-. )., vol. 137, no. 3530, pp. 598–599, Aug. 1962. [82] R. Cropley, “Cherry leaf-roll virus,” Ann. Appl. Biol., vol. 49, no. 3, pp. 524–529, Oct. 1961. [83] CABI, “Cherry rasp leaf ‘nepovirus,’” no. 303, 1996. [84] D. James, W. E. Howell, and G. I. Mink, “Molecular evidence of the relationship between a virus associated with flat apple disease and Cherry rasp leaf virus as determined by RT-PCR,”

58

Plant Dis., vol. 85, no. 1, pp. 47–52, Feb. 2001. [85] A. D. Thomson, “Potato viruses A and S in New Zealand,” New Zeal. J. Agric. Res., vol. 2, no. 4, pp. 702–706, 1959. [86] R. W. Fulton, “Identity of and relationships among certain sour cherry viruses mechanically transmitted to Prunus species,” Virology, vol. 6, no. 2, pp. 499–511, Oct. 1958. [87] L. Grimová, L. Winkowska, M. Konrady, and P. Rysánek, “Apple mosaic virus,” Phytopathol. Mediterr., vol. 55, no. 1, pp. 1–19, 2016. [88] B. Kassanis, “STUDIES ON DANDELION YELLOW MOSAIC AND OTHER VIRUS DISEASES OF LETTUCE,” Ann. Appl. Biol., vol. 34, no. 3, pp. 412–421, Sep. 1947. [89] A. A. Moini, “Identification of Tomato ringspot virus (ToRSV) on apple in Iran,” Australas. Plant Dis. Notes, vol. 5, no. 1, pp. 105–106, 2010. [90] G. Distribution, “Tobacco ringspot nepovirus,” 1992. [91] K. M. Smith, “STUDIES ON POTATO VIRUS DISEASES:VI. FURTHER EXPERIMENTS WITH THE VIRUS OF A POTATO MOSAIC UPON THE TOBACCO PLANT,” Ann. Appl. Biol., vol. 16, no. 3, pp. 382–399, Aug. 1929. [92] H. H. Nawaz, M. Umer, S. Bano, and A. Usmani, “A Research Review on Tomato Bushy Stunt Virus Disease Complex,” vol. 4, no. 5, pp. 1985–1990, 2014. [93] J. D. Bernal, I. Fankuchen, and D. P. Riley, “Structure of the crystals of tomato bushy stunt virus preparations,” Nature, vol. 142, 1938. [94] B. Kassanis, “Some properties of four viruses isolated from carnation plants,” Ann. Appl. Biol., vol. 43, no. 1, pp. 103–113, Mar. 1955. [95] H. Agrawal, M. Chessin, and L. Bos, “Purification of clover yellow mosaic virus,” Nature, vol. 194, pp. 408–409, Apr. 1962. [96] W. Jelkmann and S. Paunovic, “ CHAPTER 8: Apple stem pitting virus ,” in Virus and Virus- Like Diseases of Pome and Stone Fruits, The American Phytopathological Society, 2011, pp. 35–40. [97] International Committee on Taxonomy of Viruses (ICTV), “Luteoviridae - Positive Sense RNA Viruses,” 2011. [Online]. 
Available: https://talk.ictvonline.org/ictv- reports/ictv_9th_report/positive-sense--viruses-2011/w/posrna_viruses/265/luteoviridae. [Accessed: 25-Oct-2018]. [98] S. Massart, M. H. Jijakli, and J. Kummert, “ CHAPTER 7: Apple stem grooving virus ,” in Virus and Virus-Like Diseases of Pome and Stone Fruits, The American Phytopathological Society, 2011, pp. 29–33. [99] M. Marklewitz, G. Palacios, H. Ebihara, J. H. Kuhn, and S. Junglen, “Create four new genera, create seventy nine new species, rename/move seven species, rename/move three genera and abolish one genus in the family Phenuiviridae, order Bunyavirales,” Oct. 2019. [100] A. Diaz-Lara et al., “Two Novel Negative-Sense RNA Viruses Infecting Grapevine Are Members of a Newly Proposed Genus within the Family Phenuiviridae,” Viruses, vol. 11, no. 8, p. 685, Jul. 2019. [101] E. E. Chamberlain, J. D. Atkinson, G. A. Wood, and J. A. Hunter, “Apple rubbery wood virus,”

59

New Zeal. J. Agric. Res., vol. 14, no. 3, pp. 707–719, 1971. [102] V. Jakovljevic, P. Otten, C. Berwarth, and W. Jelkmann, “Analysis of the apple rubbery wood disease by next generation sequencing of total RNA,” Eur. J. Plant Pathol., vol. 148, no. 3, pp. 637–646, Jul. 2017. [103] R. Fialová, M. Navrátil, and R. Podsedníková, “Histological manifestation of rubbery wood symptom in apple trees,” in Acta Horticulturae, 2001, vol. 550, pp. 265–268. [104] R. Casper, “,” in The Plant Viruses, Springer US, 1988, pp. 235–258. [105] E. Stokstad, “Rapid apple decline has researchers stumped,” Science, vol. 363, no. 6433. American Association for the Advancement of Science, p. 1259, 22-Mar-2019. [106] Z. Zhang et al., “Discovery of Replicating Circular RNAs by RNA-Seq and Computational Algorithms,” PLoS Pathog., vol. 10, no. 12, Dec. 2014. [107] H. Koganezawa, H. Yanase, and T. Sakuma, “VIROID-LIKE RNA ASSOCIATED WITH APPLE SCAR SKIN (OR DAPPLE APPLE) DISEASE,” Acta Hortic., no. 130, pp. 193–198, Jan. 1983. [108] M. Chiumenti, B. Navarro, P. Venerito, F. Civita, A. Minafra, and F. Di, “Molecular variability of apple hammerhead viroid from Italian apple varieties supports the relevance in vivo of its branched conformation stabilized by a kissing loop interaction,” Virus Res., vol. 270, no. May, 2019. [109] P. Menzel, K. L. Ng, and A. Krogh, “Fast and sensitive taxonomic classification for metagenomics with Kaiju,” Nat. Commun., vol. 7, no. 1, pp. 1–9, Apr. 2016. [110] D. E. Wood, J. Lu, and B. Langmead, “Improved metagenomic analysis with Kraken 2,” Genome Biol., vol. 20, no. 1, p. 257, Nov. 2019. [111] A. Marais, C. Faure, B. Bergey, and T. Candresse, “Viral Double-Stranded RNAs (dsRNAs) from Plants: Alternative Nucleic Acid Substrates for High-Throughput Sequencing,” in Viral Metagenomics: Methods and Protocols, vol. 1746, 2018, pp. 45–53. [112] D. Filloux, S. Dallot, A. Delaunay, S. Galzi, E. Jacquot, and P. 
Roumagnac, “Plant Pathology: Techniques and Protocols,” in Methods in Molecular Biology, vol. 1302, C. Lacomme, Ed. New York: Springer Science+Business Media, 2015, pp. 249–257. [113] T. J. Morris and J. A. Dodds, “Isolation and Analysis of Double-Stranded RNA from Virus- Infected Plant and Fungal Tissue,” Phytopathology, vol. 69, pp. 854–858, 1979. [114] D. E. Wood and S. L. Salzberg, “Kraken: ultrafast metagenomic sequence classification using exact alignments,” 2014. [115] A. Bankevich et al., “SPAdes: A new genome assembly algorithm and its applications to single- cell sequencing,” J. Comput. Biol., vol. 19, no. 5, pp. 455–477, May 2012. [116] S. F. Altschul, W. Gish, W. Miller, E. W. Myers, and D. J. Lipman, “Basic local alignment search tool,” J. Mol. Biol., vol. 215, no. 3, pp. 403–410, 1990. [117] S. François et al., “Increase in taxonomic assignment efficiency of viral reads in metagenomic studies,” Virus Res., vol. 244, pp. 230–234, Jan. 2018. [118] R. C. Edgar, “MUSCLE: multiple sequence alignment with high accuracy and high throughput,” Nucleic Acids Res., vol. 32, no. 5, pp. 1792–1791, Mar. 2004.

60

[119] D. A. Benson et al., “GenBank,” Nucleic Acids Res., vol. 41, no. D1, pp. 36–42, 2013. [120] M. S. Noorani, P. Awasthi, M. P. Sharma, R. Ram, A. A. Zaidi, and V. Hallan, “Simultaneous detection and identification of four cherry viruses by two step multiplex RT-PCR with an internal control of plant nad5 mRNA,” J. Virol. Methods, vol. 193, no. 1, pp. 103–107, Oct. 2013. [121] M. Morelli, A. Giampetruzzi, L. Laghezza, L. Catalano, V. N. Savino, and P. Saldarelli, “Identification and characterization of an isolate of apple green crinkle associated virus involved in a severe disease of quince (Cydonia oblonga, Mill.),” Arch. Virol., vol. 162, no. 1, pp. 299– 306, Jan. 2017. [122] B. Komorowska, B. Hasiów-Jaroszewska, and S. F. Elena, “Evolving by deleting: patterns of molecular evolution of Apple stem pitting virus isolates from Poland,” J. Gen. Virol., vol. 100, pp. 1442–1456, 2019. [123] H. J. Maree, A. G. Blouin, A. Diaz-Lara, I. Mostert, M. Al Rwahnih, and T. Candresse, “Status of the current vitivirus taxonomy,” Arch. Virol., vol. 165, no. 2, pp. 451–458, Feb. 2020. [124] S. Massart et al., “A Framework for the Evaluation of Biosecurity, Commercial, Regulatory, and Scientific Impacts of Plant Viruses and Viroids Identified by NGS Technologies,” Front. Microbiol., vol. 8, no. JAN, p. 45, Jan. 2017.

61