
Molecular characterization and diagnosis of badnaviruses infecting yams in the South Pacific

by

Amit Chand Sukal
Bachelor of Science (Biology/Chemistry)
Master of Science (Biology)

Centre for Tropical Crops and Biocommodities
School of Earth, Environment and Biological Sciences
Faculty of Science and Technology

A thesis submitted for the degree of Doctor of Philosophy
Queensland University of Technology

2018


Abstract

Yams (Dioscorea spp.) are economically important, annual or perennial, tuber-bearing tropical monocots. Globally, yam ranks as the fourth most important root crop by production and is a staple food crop for millions of people in Africa, the Caribbean, South America, Asia and the Pacific. In Pacific Island countries (PICs), the production and utilization of yams is limited by several factors, including diseases and the lack of genetic diversity. An important global in vitro collection of yam germplasm is conserved in tissue culture by the Pacific Community’s (SPC) Centre for Pacific Crops and Trees (CePaCT) in Fiji. Evaluation of this germplasm and its distribution to PICs holds the key to improving production. However, like other vegetatively propagated crops, yam tends to accumulate and perpetuate tuber-borne fungal and viral diseases. Although the tissue culture process eliminates fungal pathogens, viruses remain an issue. As such, quarantine regulations prohibit the movement of yam germplasm from the SPC-CePaCT collection to other countries due to the risks associated with the movement of untested and/or virus-infected material. To comply with these regulations, sensitive diagnostic tests are needed to enable the virus indexing of yam germplasm. Several different viruses are known to infect yams, but badnaviruses, namely the Dioscorea bacilliform viruses (DBVs), remain the least studied and the most difficult to diagnose. The limited studies conducted on DBVs in PICs, all of them PCR-based, suggest that these viruses are prevalent and highly diverse. This high genetic variability hinders the development of reliable PCR-based diagnostic tests. DBV diagnostics is further complicated by the fact that badnavirus sequences are integrated in the genomes of some yam cultivars, leading to false positives in PCR-based tests. Further, since all studies on DBV in the Pacific have been PCR-based, the existence of episomal DBV in Pacific yam remains unknown. Therefore, the aims of this PhD were to identify and characterize the diversity of episomal badnaviruses infecting yams in the Pacific to support the development of diagnostic protocols.

A rolling circle amplification (RCA)-based approach, previously used for the characterization of episomal banana streak viruses (BSVs) from banana, was used in the initial screening of yam accessions from SPC-CePaCT. Using RCA, two novel badnaviruses, namely Dioscorea bacilliform AL virus 2 (DBALV2) from Papua New Guinea (PNG) and Dioscorea bacilliform ES virus (DBESV) from Fiji, were amplified and characterized. In addition, an isolate of Dioscorea bacilliform RT virus 2 (DBRTV2) was characterized from Samoa, which is the first report of this virus from the Pacific. Further, a novel viral sequence, tentatively named Dioscorea nummularia-associated virus (DNUaV), infecting D. nummularia from Samoa was identified. The genome size, organization, and the presence of conserved amino acid domains of DNUaV were found to be characteristic of members of the family Caulimoviridae. However, based on the criteria used for the demarcation of species in the family Caulimoviridae by the International Committee on Taxonomy of Viruses (ICTV), DNUaV is likely representative of a member of a new genus within the family. This was further supported by pairwise sequence analysis using pol gene sequences, which showed 42 to 58% nucleotide and 27 to 53% amino acid identity between DNUaV and type members of other recognized genera within the family Caulimoviridae.

Despite some success in using RCA for the characterization of DBVs from Pacific yams, in some cases the existing protocols yielded inconsistent results and produced background amplification of host circular DNA, such as plastid DNA. Therefore, a suite of badnavirus-specific primers was designed from published sequences and used to optimize badnavirus-biased RCA protocols, namely directed RCA (D-RCA) and specific-primed RCA (SP-RCA), using a commercially available phi29 polymerase. Based on Illumina MiSeq sequencing analysis, the optimized badnavirus RCA protocols performed up to 80-fold better than the commercially available TempliPhi kit-based random-primed RCA (RP-RCA). D-RCA was found to be the best protocol for badnavirus genome amplification and was subsequently used to test 224 yam accessions in the SPC-CePaCT Pacific yam germplasm collection, including D. alata (185), D. esculenta (31), D. bulbifera (6), and one each of D. transversa and D. trifida. Thirty-five samples from three countries (PNG, Tonga and Vanuatu), representing five yam species (D. alata, D. bulbifera, D. esculenta, D. transversa and D. trifida), produced restriction profiles indicative of badnaviruses following digestion with EcoRI and SphI. Twenty samples were selected to obtain full-length genome sequences: either the SphI-digested RCA products were cloned and Sanger sequenced (n=4), or the undigested RCA products were sequenced using Illumina MiSeq (n=16). A total of 10 Dioscorea bacilliform AL virus (DBALV) genomes were generated from Vanuatu D. alata (n=2), D. bulbifera (n=3), D. esculenta (n=2), D. transversa (n=1) and D. trifida (n=1), and from Tonga D. esculenta (n=1), while an additional 10 DBALV2 genomes were generated from PNG D. alata. This study also revealed that RCA, in combination with restriction analysis and/or Sanger sequencing and/or next generation sequencing (NGS), could be successfully used for the detection and characterization of DBVs from Pacific yams. Such a strategy can now be used for the detection and further characterization of DBVs in the Pacific and other regions. An understanding of the diversity of episomal viruses infecting Pacific yam will help the further improvement of diagnostic protocols.

This study has generated novel data that will support the global community in DBV diagnostics and also provides a foundation for the development of a consolidated global diagnostic approach to enable the routine testing of yam germplasm. In the immediate future, the results of this study will enable the indexing of the yam collections, currently conserved at SPC-CePaCT, for DBVs and support the safe distribution and utilization of yam germplasm.

Keywords: Dioscorea bacilliform virus (DBV), Dioscorea bacilliform AL virus (DBALV), Dioscorea bacilliform AL virus 2 (DBALV2), Dioscorea bacilliform ES virus (DBESV), Dioscorea bacilliform RT virus 2 (DBRTV2), rolling circle amplification (RCA), random-primed RCA (RP-RCA), directed RCA (D-RCA), specific-primed RCA (SP-RCA), next generation sequencing (NGS)

Publications

Peer reviewed publications related to this PhD thesis

1. Sukal, A., Kidanemariam, D., Dale, J., James, A. and Harding, R. (2017). Characterization of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel species and the first report of dioscorea bacilliform RT virus 2. Virus Research 238, 29–34.

2. Sukal, A., Kidanemariam, D., Dale, J., Harding, R. and James, A. (2018). Characterization of a novel member of the family Caulimoviridae infecting Dioscorea nummularia in the Pacific, which may represent a new genus of dsDNA viruses. PLoS ONE 13, 1–12.

3. Sukal, A., Kidanemariam, D., Dale, J., Harding, R. and James, A. (2018). An improved degenerate-primed rolling circle amplification and next-generation sequencing approach for the detection and characterization of badnaviruses. Formatted for submission to Virology.

4. Sukal, A., Kidanemariam, D., Dale, J., Harding, R. and James, A. (2018). Characterization and genetic diversity of Dioscorea bacilliform viruses infecting Pacific yam germplasm collections. Formatted for submission to Plant Pathology.

Table of Contents

Abstract
Publications
Table of Contents
List of Figures
List of Tables
List of Abbreviations
Statement of Original Authorship
Acknowledgements

Chapter 1
Introduction
1.1 Description of scientific problem investigated
1.2 Overall objectives of the study
1.3 Specific aims of the study
1.4 Account of scientific progress linking the scientific papers
1.5 References

Chapter 2
Literature Review
2.1 Introduction
2.2 Taxonomy and morphology of yam
2.3 Origin and distribution of yam
2.4 Yam in the South Pacific
2.5 Caulimoviridae
2.6 Badnaviruses
2.7 Detection of badnaviruses
2.8 Current knowledge of badnaviruses infecting yam
2.9 Conservation and utilization of yams
2.10 Research problem and aim
2.11 Objectives
2.12 References

Chapter 3
Characterization of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel species and the first report of dioscorea bacilliform RT virus 2
Abstract
Conflict of interest
Financial support
Acknowledgements
References
Supplementary information

Chapter 4
Characterization of a novel member of the family Caulimoviridae infecting Dioscorea nummularia in the Pacific, which may represent a new genus of dsDNA plant viruses
Abstract
Introduction
Materials and methods
Plant material and nucleic acid extraction
RCA and sequencing
Sequence comparisons and phylogenetic analyses
Virus detection
Results
Identification of DNUaV
Genome organization, sequence and phylogenetic analysis
PCR screening for DNUaV
Discussion
Acknowledgments
References
Supporting Information

Chapter 5
An improved degenerate-primed rolling circle amplification and next-generation sequencing approach for the detection and characterization of badnaviruses
Abstract
1. Introduction
2. Materials and Methods
2.1. Samples
2.2. Primer design
2.3. Random-primed RCA (RP-RCA)
2.4. Primer-spiked random-primed RCA (primer-spiked RP-RCA)
2.5. Directed RCA (D-RCA)
2.6. Specific-primed RCA (SP-RCA)
2.7. Optimization of D-RCA and SP-RCA
2.8. Restriction analysis, cloning and Sanger sequencing
3. Results
3.1. Badnavirus RCA optimization
3.2. RP-RCA, primer-spiked RP-RCA, D-RCA and SP-RCA amplification of badnaviruses
3.3. RCA-NGS for virus characterization
3.4. DBALV isolate VUT02_De genome
4. Discussion
Acknowledgements
References

Chapter 6
Characterization and genetic diversity of Dioscorea bacilliform viruses infecting Pacific yam germplasm collections
Abstract
Introduction
Methods
Sample details, total nucleic acid (TNA) extractions
Viral DNA enrichment, RCA-RFLP, cloning and Sanger sequencing
Next generation sequencing and genome assembly
Pairwise sequence comparisons and phylogenetic analyses
Results
RCA and Sanger sequencing
Next generation sequencing of Pacific DBV isolates
Analysis of DBALV and DBALV2 complete genomes
Dioscorea bacilliform AL virus (DBALV) from the Pacific
Dioscorea bacilliform AL virus 2 (DBALV2) from the Pacific
Discussion
Acknowledgements
References

Chapter 7
General Discussion
7.1. Dioscorea bacilliform virus (DBV)
7.2. Dioscorea nummularia-associated virus (DNUaV)
7.3. Development of diagnostic protocols
7.4. Conclusions
7.5. References

List of Figures

Chapter 2
Figure 1: FAO production data for yams from 2006-2016.
Figure 2: Yam production issues in the Pacific.
Figure 3: Morphology of Caulimoviridae particles.
Figure 4: Linear representation of a typical badnavirus genome organization.
Figure 5: The principle of rolling circle amplification.
Figure 6: Variable symptoms on yam leaves infected with badnaviruses.

Chapter 3
Figure 1: Linearized representation of the genome organization of (A) DBESV (isolate FJ14), (B) DBALV2 (isolate PNG10) and (C) DBRTV2-[4RT].
Figure 2: Phylogenetic tree constructed using the maximum likelihood method based on the (A) partial RT/RNaseH-coding and (B) full-length sequences of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01) and previously described badnavirus sequences.

Chapter 4
Figure 1: Schematic representation of the genome organization of Dioscorea nummularia-associated virus (DNUaV).
Figure 2: Amino acid sequence alignments of the conserved motifs in the proteins of the type member of each genus in the family Caulimoviridae.
Figure 3: Phylogenetic analysis using the maximum-likelihood method following ClustalW alignment in MEGA7 to infer evolutionary relationships of DNUaV.

Chapter 5
Figure 1: RCA of DBALV isolate VUT02_De to determine the effect of (A) gradient incubation temperature, (B) dNTP concentration and (C) incubation duration.
Figure 2: RCA of DBALV isolate VUT02_De using concentrations of 0 to 500 ng total nucleic acid.
Figure 3: Different badnavirus-infected samples amplified with (A) RP-RCA at 30°C incubation, (B) RP-RCA at 36°C incubation, (C) primer-spiked RP-RCA at 30°C, (D) primer-spiked RP-RCA at 36°C, (E) D-RCA and (F) SP-RCA.

Chapter 6
Figure 1: (A) EcoRI and (B) SphI restriction analysis of PNG D. alata RCA-positive samples.
Figure 2: (A) EcoRI and (B) SphI restriction analysis of Vanuatu (VUT) and Tonga (TON) RCA-positive samples.
Figure 3: PASC and phylogenetic analysis using partial RT/RNase H-coding nucleotide sequences showing the relationships of DBALV isolates from this study with previously published complete DBALV sequences.
Figure 4: PASC and phylogenetic analysis using partial RT/RNase H-coding nucleotide sequences showing the relationships of DBALV2 isolates from this study with previously published complete DBALV2 sequences.

List of Tables

Chapter 2
Table 1: Genome characteristics of genera within the family Caulimoviridae.

Chapter 3
Table S1: Arrangement of genome features of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01).
Table S2: PCR primers used for the detection of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01).

Chapter 4
Table 1: Mean pairwise nucleotide (above diagonal) and amino acid (below diagonal) similarity between the pol gene of DNUaV and the type members of the eight current genera within the family Caulimoviridae.
S1 Table: Details of yam partial RT/RNase H-coding sequences used in the phylogenetic analysis of DNUaV.
S2 Table: Acronyms, GenBank accession and virus names of sequences used for phylogenetic analysis in Fig 3B.

Chapter 5
Table 1: Sequences of primers used in primer-spiked RP-RCA, D-RCA and SP-RCA protocols.

Chapter 6
Table 1: Summary of badnavirus RCA testing results.
Table 2: Arrangement of genomic features of DBALV2 isolates obtained from sequencing of RCA products.
Table 3: Arrangement of genomic features of DBALV isolates obtained from sequencing of RCA products.

List of Abbreviations

aa amino acid
BLAST basic local alignment search tool
bp base pair/s
°C degrees Celsius
CePaCT Centre for Pacific Crops and Trees
CTAB cetyl trimethyl ammonium bromide
CTCB Centre for Tropical Crops and Biocommodities
DNA deoxyribonucleic acid
dNTP deoxynucleotide triphosphate
ds double-stranded
DTT dithiothreitol
ELISA enzyme-linked immunosorbent assay
FAO Food and Agriculture Organization of the United Nations
IC-PCR immuno-capture polymerase chain reaction
ICTV International Committee on Taxonomy of Viruses
IR intergenic region
kbp kilobase pair/s
kDa kilodalton/s
mg milligram
min minute/s
µg microgram
µl microlitre/s
Mt million tonne/s
NCBI National Center for Biotechnology Information
NF nuclease free
ng nanogram/s
NGS next generation sequencing
nm nanometre/s
nt nucleotide/s
ORF open reading frame
PCR polymerase chain reaction
PICs Pacific Island Countries
pmol picomole/s
QUT Queensland University of Technology
RCA rolling circle amplification
RFLP restriction fragment length polymorphism
RNA ribonucleic acid
RNase H ribonuclease H
RT reverse transcriptase
s second/s
SEF Science and Engineering Faculty
SPC Pacific Community
spp. species
U units

Statement of Original Authorship

The work compiled in this thesis has not been previously submitted to meet the requirements of an award at this or any other higher education institution. To my knowledge this thesis contains no previously published or written work of another person except where due reference is made.

QUT Verified Signature


Acknowledgements

This project would not have eventuated if it were not for Grahame Jackson, Michael Furlong and Richard Markham, and their passion to enhance food and nutritional security through agricultural development in the Pacific. The importance of yams for Pacific agriculture led to this project being funded through the Australian Centre for International Agricultural Research (ACIAR) (#PC/2010/065). I also thank ACIAR for the John Allwright Fellowship, which provided funding for my PhD. I would also like to acknowledge SPC-CePaCT for its continued efforts to support the safe conservation and utilization of plant genetic resources. SPC is a wonderful organization which understands the need for staff capacity building and has supported this project and made their yam collections available for this study.

A big thank you to my supervisors, Rob Harding, Anthony James and James Dale, for their continued support, advice and supervision during my PhD. It was a learning experience every time I had a chat with Rob and AJ; their guidance always put me in the right direction. I will forever remember our accomplishments under this project and our celebrations of them. Ben, I believe a ‘Thank You Mate’ is in order for all the well-placed advice, the rugby chats and beer shouts. The CTCB crew, thank you. My four years have gone by very quickly and I am thankful for all the support, laughter, and the beers along the way. Dani, your patience in dealing with the lab is commendable. Thank you all for your help.

This acknowledgment would not be complete if I did not dedicate an entire paragraph to thanking my friend/brother, Dawit. You have been instrumental in teaching me the tricks of the trade with the pipette when I needed it the most. I would also like to thank his better half, Abigail; thanks for bearing with us when we left you and went out for a few beers.

This research would not have been successful without the support of my family. I would like to thank my parents for their support; they are the reason I am what I am today. The last couple of years have been hard for my family as we lost something that was dear to our heart. Kapil my bro, you went doing what you loved, we miss you and I dedicate this thesis in your name.

Above all I believe my wife, Anjani, should get my biggest thanks. You have been a pillar of strength for me. Your encouragement and support have pushed me on to reach this stage and I am grateful. Thanks for holding the fort while I have been away. I would like to acknowledge my kids, Eesha and Darsh; sorry for being away all this time, but dad’s going to make up for the lost time. I love you guys.

Chapter 1

Introduction

This thesis is presented in ‘Thesis by Publication’ style, containing a comprehensive literature review section (Chapter 2) followed by four results chapters (Chapters 3 to 6) and a general discussion chapter (Chapter 7). Chapter 3 has been published in the journal Virus Research, Chapter 4 has been provisionally accepted for publication in the journal PLoS ONE, while Chapters 5 and 6 have been formatted for submission to Virology and Plant Pathology, respectively. The presentation of each of the results chapters follows the formatting style of the target journal.

1.1 Description of scientific problem investigated

Yam (Dioscorea spp.) is one of the major staple food crops of the Pacific Island Countries; however, despite its high economic and cultural importance, yam production has declined. In the Pacific there is huge potential for the development of yam into a major export commodity and as a crop to support food security in the midst of climate change. However, exploitation of yam to its full potential has been slow in some of the Pacific Island Countries, mainly due to the lack of genetic diversity, which prevents the selection of desirable agronomical traits such as pest and disease resistance as well as those important for export markets, such as nutrition, quality, and the shape and size of tubers. The improvement of yam is further hindered by the rare and often male-dominated flowering of the Pacific yam (D. alata). Access to diversity will enable the countries to screen and evaluate for the desired agronomical traits and increase local production. The diversity to drive this exists as an in vitro collection conserved with SPC-CePaCT. This unique collection has been amassed from the Pacific and has the potential to support selection for the desired crop traits. However, utilization of this collection has been very limited. The main reason for this limited utilization is the unavailability of diagnostic protocols to test the collection for viruses, mainly the badnavirus Dioscorea bacilliform virus (DBV).

To address this knowledge gap, the current PhD study was commissioned. The study was aimed at characterizing the diversity of DBVs prevalent in the Pacific and developing protocols for DBV indexing. Ultimately, the results of this study are aimed at supporting the indexing of yams conserved with SPC-CePaCT to enable access to genetic diversity by the PICs and to increase utilization for enhanced food and nutrition security in the Pacific.

1.2 Overall objectives of the study

The overall objective of this study was to characterize the diversity of episomal Dioscorea bacilliform viruses present in Pacific yam and develop diagnostic protocols to support safe exchange of yam germplasm.

1.3 Specific aims of the study

The specific aims of this project were to (i) screen the Pacific germplasm collections conserved at SPC-CePaCT for episomal DBV sequences, (ii) characterize the molecular diversity of the episomal DBV sequences present in the Pacific and (iii) develop diagnostic protocols for the detection of DBVs.

1.4 Account of scientific progress linking the scientific papers

The SPC-CePaCT yam collection, comprising D. alata, D. rotundata, D. esculenta, D. bulbifera, D. nummularia, D. transversa and D. trifida originating from different Pacific Island Countries, namely Fiji, Federated States of Micronesia (FSM), New Caledonia, Papua New Guinea (PNG), Vanuatu, Samoa and Tonga, was acclimatized in the SPC-CePaCT screenhouse for at least three months. Total nucleic acid (TNA) was extracted and brought to QUT for DBV screening. A previously described RCA protocol (James et al., 2011) and restriction analysis using a number of endonucleases (EcoRI, BamHI, KpnI, SalI and StuI), selected from published badnavirus sequences and by experimentation, were used to screen samples for potential badnavirus (DBV) sequences. Following RCA and restriction analysis, one sample from Fiji, two samples from PNG and one sample from Samoa produced restriction profiles indicative of DBV. The RCA restriction fragments were sequenced and identified as three distinct species, Dioscorea bacilliform AL virus 2 (DBALV2), Dioscorea bacilliform ES virus (DBESV) and Dioscorea bacilliform RT virus 2 (DBRTV2), from the PNG, Fiji and Samoa samples, respectively. At the time of identification all three were novel species; however, during the course of manuscript preparation, DBRTV2 was published by Bömer et al. (2016). Since there was no previous sequence information for DBALV2, DBESV or a Pacific isolate of DBRTV2, the complete genome sequence of each of the species was generated and the results are presented in Chapter 3.
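The RCA restriction screening described above relies on the fact that cutting a circular genome at each recognition site yields a predictable set of fragment lengths. The following is a minimal sketch of that idea in Python, not the procedure used in this study: the function name and the toy sequence are hypothetical, only the well-known recognition sequences of EcoRI (GAATTC) and SphI (GCATGC) are assumed, and real profiles would be predicted from actual genome sequences.

```python
import re

# Recognition sequences of two enzymes used in this thesis.
SITES = {"EcoRI": "GAATTC", "SphI": "GCATGC"}


def circular_fragments(genome: str, site: str) -> list[int]:
    """Return fragment lengths (longest first) after cutting a circular
    sequence at every occurrence of a recognition site."""
    genome = genome.upper()
    n = len(genome)
    # Extend the sequence so that sites spanning the origin are found.
    extended = genome + genome[: len(site) - 1]
    # Lookahead pattern so overlapping occurrences are also reported.
    starts = [m.start() for m in re.finditer(f"(?={site})", extended)]
    if not starts:
        return []  # the circle is not cut, so no linear fragment is released
    fragments = []
    for i, start in enumerate(starts):
        nxt = starts[(i + 1) % len(starts)]
        # Distance to the next site around the circle; a single site
        # gives one fragment of full genome length.
        fragments.append((nxt - start - 1) % n + 1)
    return sorted(fragments, reverse=True)


# Hypothetical 200 nt circular "genome" with two EcoRI sites and one SphI site.
toy = "GAATTC" + "A" * 94 + "GAATTC" + "A" * 44 + "GCATGC" + "A" * 44
ecori_profile = circular_fragments(toy, SITES["EcoRI"])
sphi_profile = circular_fragments(toy, SITES["SphI"])
```

In this toy case the single SphI site linearizes the circle into one genome-length fragment, while the two EcoRI sites release two fragments whose lengths sum to the genome length. Because every cut is offset from the site start by the same amount, the exact cut position within the site does not change the fragment lengths.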

During further RCA screening and sequencing, an additional, previously unidentified sequence group was detected from D. nummularia originating from Samoa. The polymerase (pol) gene sequence was determined and compared to published DBV and badnavirus sequences. Based on pol gene nucleotide sequence comparisons, the amplified sequence was found to have very low nucleotide sequence identity (42-58%) with published badnavirus sequences. The RCA restriction fragments were cloned and sequenced to generate the complete genome sequence. The complete genome was found to be typical of a member of the family Caulimoviridae but was different from the members of other genera in the family. In phylogenetic analyses it clustered between the genera Badnavirus and Tungrovirus; however, it had very low sequence identity with members of both genera. It was determined to represent a new member of the family Caulimoviridae and possibly a new genus within the family. Chapter 4 discusses in detail the new sequence group identified, which is tentatively named Dioscorea nummularia-associated virus (DNUaV), giving evidence for the sequence to be considered representative of a new genus within the family Caulimoviridae.
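The identity values quoted above come from pairwise comparison of aligned sequences. As a minimal sketch of how such a percentage can be computed, the Python function below counts matching positions between two pre-aligned sequences. The function name and the short example sequences are hypothetical, and the choice to skip any column containing a gap is an assumption; alignment and comparison tools differ in how they treat gaps, so values from this sketch need not match those reported in this thesis.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two aligned sequences of equal length,
    ignoring columns where either sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap column: excluded from the comparison (assumption)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: five ungapped columns, four of them identical.
example_identity = percent_identity("ATGC-A", "ATGAGA")
```

The same function works for aligned amino acid sequences, since it only compares characters column by column.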

Following initial RCA screening using the RCA kits with added primers as described previously (James et al., 2011), it was found that the protocol was inconsistent and non-reproducible for DBV amplification from yam. Where amplification was achieved, many of the amplified products were of plant origin, such as plastid DNA. Eventually, the RCA protocol as used by James et al. (2011) was determined to be unsuitable for DBV amplification, as also suggested by other authors (Bömer et al., 2016; Umber et al., 2014). RCA for badnavirus amplification was improved by manipulating the individual components of the RCA reaction using the Phi29 enzyme (ThermoFisher Scientific), as previously described for the amplification of low-copy-number human papillomavirus and polyomaviruses (Marincevic-Zuniga et al., 2012; Rockett et al., 2015). Using 182 badnavirus complete genome sequences, representing 43 species, 28 primers were designed and, together with the BadnaFP/RP primers (Yang et al., 2003) and Badna-MFP/MRP primers (Turaki, 2014), were optimized for use in two RCA protocols, namely directed RCA (D-RCA) and specific-primed RCA (SP-RCA). Chapter 5 describes the optimization of the D-RCA and SP-RCA protocols, with comparisons to previously described RCA protocols. D-RCA and SP-RCA were both found to reliably amplify badnavirus DNA; however, D-RCA was found to be the most efficient method. Subsequently, D-RCA followed by restriction analysis and/or Sanger and/or next generation sequencing (NGS) was used to screen the entire Pacific collection and characterize the episomal DBV diversity prevalent in Pacific yams. The findings are summarised in Chapter 6.

This study is the first comprehensive survey of episomal DBV present in Pacific yams. Two novel DBV species (DBALV2 and DBESV), two previously characterized DBV species (DBALV and DBRTV2), and a novel virus genome which may represent a new genus in the family Caulimoviridae have been described in this study. The sequence information generated has been deposited in the National Center for Biotechnology Information (NCBI) GenBank database and will be available to the global community to further the global initiatives on safe yam germplasm exchange. Further, the described RCA and PCR-based protocols can be used for testing for DBALV, DBALV2, DBESV, DBRTV2 and DNUaV and can be expanded further to cover a wider range of DBVs.

1.5 References

Bömer, M., Turaki, A., Silva, G., Kumar, P., Seal, S., 2016. A sequence-independent strategy for amplification and characterization of episomal badnavirus sequences reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011. Development of a novel rolling-circle amplification technique to detect banana streak virus that also discriminates between integrated and episomal virus sequences. Plant Dis. 95, 57–62.

Marincevic-Zuniga, Y., Gustavsson, I., Gyllensten, U., 2012. Multiply-primed rolling circle amplification of human papillomavirus using sequence-specific primers. Virology 432, 57–62.

Rockett, R., Barraclough, K.A., Isbel, N.M., Dudley, K.J., Nissen, M.D., Sloots, T.P., Bialasiewicz, S., 2015. Specific rolling circle amplification of low-copy human polyomaviruses BKV, HPyV6, HPyV7, TSPyV, and STLPyV. J. Virol. Methods 215–216, 17–21.

Turaki, A.A., 2014. Characterization of badnavirus sequences in West African Yams (Dioscorea spp.). PhD thesis, University of Greenwich, United Kingdom, 240.

Umber, M., Filloux, D., Muller, E., Laboureau, N., Galzi, S., Roumagnac, P., Iskra-Caruana, M.-L., Pavis, C., Teycheney, P.-Y., Seal, S.E., 2014. The genome of African yam (Dioscorea cayenensis-rotundata complex) hosts endogenous sequences from four distinct badnavirus species. Mol. Plant Pathol. 15, 790–801.

Yang, I.C., Hafner, G.J., Dale, J.L., Harding, R.M., 2003. Genomic characterization of taro bacilliform virus. Arch. Virol. 148, 937–949.

Chapter 2

Literature Review

2.1 Introduction

Yams (Dioscorea spp.) are classified in the family and are represented by some 644 species (Lebot, 2009). Ten of these species (D. alata, D. cayenensis, D. nummularia, D. opposita, D. rotundata and D. transversa, D. esculenta, D. bulbifera, D. trifida and D. pentaphylla) are economically important as cultivated crops. Cultivated yam is ranked as the fourth most important root crop by production after potato, sweet potato and cassava. In 2016, global production of yam was estimated at around 66 million tonnes (Mt) with production in the Pacific region being less than half a million ton (FAOSTAT, 2018). Cultivated yam provides a staple food for millions of people in Africa, South America, Asia and the Pacific, while wild yam provides a valuable source of food in times of famine and scarcity (Risimeri,

2001). They also provide valuable pharmacologically active compounds in traditional medicine (Lebot, 2009). Yam production is highest in West Africa, which accounts for 95% of the world’s total production (Asiedu and Sartie, 2010; FAOSTAT, 2018,

Mignouna et al., 2008). Dioscorea cayenensis-rotundata is by far the most common species complex cultivated in this region, however, D. alata and D. esculenta are predominant in Pacific Island Countries (PICs) (Kenyon et al., 2008; Lebot, 2009).

Apart from being a food staple, yam tubers are of great ceremonial significance in many PICs, such as Papua New Guinea, Vanuatu, Fiji, Tonga and Pohnpei (Elevitch and Love, 2011). Yam has the potential to become an important export commodity for some of the PICs, especially to niche markets in countries where Pacific Islanders have settled, such as Australia, New Zealand and the United States (SPYN, 2003; Sukal, 2015). Despite its importance, production of yams in the PICs has not increased to meet the demand. FAO data for yam production in the PICs show an increase of only 88 thousand tonnes over the ten years from 2006 to 2016, compared with a global production increase of 13 million tonnes over the same period (Figure 1). Although the relative increase in the Pacific is about the same as the global increase, it falls far short of the existing demand. The unavailability of suitable planting material with desired agronomic traits, together with pests and diseases such as anthracnose, caused by the fungus Colletotrichum gloeosporioides, are some of the major constraints to production.
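The difference between absolute and relative growth can be made explicit with a short calculation. The sketch below uses only the global figures quoted above (about 66 Mt in 2016, 13 Mt more than in 2006). The function name `relative_increase` is illustrative, and the same function could be applied to the PICs series once the 2006 baseline is taken from FAOSTAT.

```python
def relative_increase(start: float, end: float) -> float:
    """Return the fractional increase from start to end."""
    if start <= 0:
        raise ValueError("start must be positive")
    return (end - start) / start


# Global figures quoted in the text: ~66 Mt in 2016, 13 Mt more than in 2006.
global_2016_mt = 66.0
global_2006_mt = global_2016_mt - 13.0

global_growth = relative_increase(global_2006_mt, global_2016_mt)
print(f"Global relative increase 2006-2016: {global_growth:.1%}")
```

Because the 2016 total is itself an approximation, the result should be read as a rough estimate of about one quarter.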

2.2 Taxonomy and morphology of yam

Dioscorea spp. are annual or perennial tuber-bearing, dioecious, climbing, tropical monocots comprising the largest genus within the family Dioscoreaceae and consisting of about 644 species (Govaerts et al., 2007; Lebot, 2009; Mignouna et al., 2007). The genus Dioscorea is further divided into taxonomic sections. The 10 species that are used as food crops belong to five different sections, namely Enantiophyllum (D. alata, D. cayenensis, D. nummularia, D. opposita, D. rotundata and D. transversa), Combilium (D. esculenta), Opsophyton (D. bulbifera), Macrogynodium (D. trifida) and Lasiophyton (D. pentaphylla) (Lebot, 2009). Both annual and perennial types occur, with the roots and stems renewing annually after spending the dry part of the year in dormancy, a period which can vary from one to six months.

Figure 1: FAO production data for yams from 2006-2016. (A) World yam production (million tonnes); and (B) PICs yam production (thousand tonnes) (FAOSTAT, 2018).


Tuber morphology, stem twining direction, dioecy and fruit/seed wing shape are the most important phenotypic characters for the systematic classification of the genus (Wilkin et al., 2005). Yams in the Enantiophyllum section produce one to three large tubers and have winged stems that twine to the right (anticlockwise), with occasional bulbils. Yams in the Combilium section also have stems that twine to the right (anticlockwise) but produce numerous smaller tubers. Yams of the Opsophyton section produce aerial bulbils and have stems that twine to the left (clockwise), while those of the Macrogynodium section produce small tubers and have spineless stems that twine to the left (clockwise). Yams of the Lasiophyton section produce a cluster of medium-sized tubers and have stems that twine to the left (clockwise) and bear large thorns (Lebot, 2009).

However, some controversy remains regarding yam taxonomy, particularly for the cultivated Guinea yams and their wild relatives (Girma et al., 2016; Mignouna et al., 2002; Ramser et al., 1997; Terauchi et al., 1992).

Cytological studies on Dioscorea spp. initially determined basic chromosome numbers of x=9 and x=10 in species from America, Europe and Africa, while species in Asia and Oceania all have the basic chromosome number of x=10 (Essad, 1984). However, further studies revealed that D. rotundata, which was previously thought to be a tetraploid species (2n=40), is a diploid with a basic chromosome number of x=20 (Scarcelli et al., 2005). Similarly, D. trifida (2n=80), which was considered to be octoploid, is actually a tetraploid with a basic chromosome number of x=20 (Bousalem et al., 2006). Dioscorea alata was considered by many authors to have a basic chromosome number of x=10 (Abraham and Nair, 1991; Gamiette et al., 1999; Malapa et al., 2005); however, a more recent study provided genetic evidence confirming the diploidy of plants with 2n=40 chromosomes. This supports the hypothesis that plants with 2n=40, 60 and 80 chromosomes are diploids, triploids and tetraploids, respectively, with the basic chromosome number of D. alata being x=20 (Arnau et al., 2009). Furthermore, variable ploidy levels have also been observed within D. nummularia (2n=3x=60 to 2n=6x=120) (Lebot et al., 2017).
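The effect of revising the basic chromosome number can be illustrated with a small calculation, since the ploidy level is simply the somatic chromosome count (2n) divided by the basic number (x). The sketch below is illustrative only; the function and dictionary names are not taken from any of the cited studies.

```python
PLOIDY_NAMES = {2: "diploid", 3: "triploid", 4: "tetraploid",
                6: "hexaploid", 8: "octoploid"}


def ploidy_level(somatic_count: int, base_number: int) -> int:
    """Return the number of chromosome sets for a somatic (2n) count."""
    if base_number <= 0 or somatic_count <= 0:
        raise ValueError("chromosome numbers must be positive")
    if somatic_count % base_number != 0:
        raise ValueError("2n must be a whole multiple of x")
    return somatic_count // base_number


# Compare the earlier interpretation (x=10) with the revised one (x=20)
# for the somatic counts reported in D. alata.
for two_n in (40, 60, 80):
    old = ploidy_level(two_n, 10)
    new = ploidy_level(two_n, 20)
    print(f"2n={two_n}: {PLOIDY_NAMES[old]} (x=10) -> {PLOIDY_NAMES[new]} (x=20)")
```

Under x=20, a count of 2n=120, as reported for some D. nummularia accessions, corresponds to a hexaploid.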

2.3 Origin and distribution of yam

The major cultivated yam species are believed to have originated in the tropical areas of three separate continents: (i) Africa (mainly West Africa for D. rotundata, D. cayenensis and D. dumetorum), (ii) the region comprising South-East Asia and the South Pacific (D. alata and D. esculenta) and (iii) South America (D. trifida) (Arnau et al., 2010; Ayensu and Coursey, 1972; Bhattacharjee et al., 2011). The occurrence of Dioscorea spp. in southern Asia, Africa and South America predates human history, with domestication events occurring independently in America, Africa, Madagascar, South and South-East Asia, and Oceania (Arnau et al., 2010; Ayensu and Coursey, 1972; Bhattacharjee et al., 2011; Lebot, 2009). Although the origin of D. alata is still debated, as it has not yet been found in a wild state, studies using AFLP markers show that it is closely related to D. nummularia and D. transversa (Malapa et al., 2005), which are restricted to the South-East Asian islands and Oceania. Therefore, by association, it has been proposed that D. alata may belong to a South Asian-Oceanic gene pool that is confined to the former Sahulian and Wallacean regions (Arnau et al., 2010; Lebot, 1997).

Yam cultivation occurs in many tropical regions. Based on annual production statistics published by the Food and Agriculture Organization of the United Nations (FAO), yam is cultivated in 61 countries (FAOSTAT, 2018). However, this is not a comprehensive list as some countries (such as China) do not provide annual production statistics to FAO. D. alata is the most widely distributed species in the humid and semi-humid tropics and, together with D. rotundata and D. cayenensis, which are indigenous to West Africa, is the most important yam in terms of quantity produced and marketed (Asiedu and Sartie, 2010; Lebot, 2009). D. alata (also referred to as greater yam) and D. esculenta (also referred to as lesser yam) are the most important cultivated yam species in the Pacific. Although D. esculenta is important as a staple food species and is the dominant species grown by yam-dependent communities of Papua New Guinea, D. alata still retains a high status owing to its use for cultural and ceremonial purposes throughout the Pacific (O’Sullivan, 2010).

2.4 Yam in the South Pacific

Yam is one of the most important food staples of the Pacific. It ranks among the top crops in the tropics along with cassava (Manihot esculenta) and aroids (Colocasia spp. and Xanthosoma spp.) (Elevitch and Love, 2011). It is also considered a traditional crop with great ceremonial significance in many areas of the Pacific, such as Papua New Guinea, Vanuatu, Fiji, Tonga and Pohnpei, and its cultivation is consistent with maintaining fragile ecosystems, such as those of the lowland areas where it is typically cultivated (Elevitch and Love, 2011; SPYN, 2003). Across the Pacific region, yam cultivation is largely seasonal, with the dominant cultivated species being the greater yam (D. alata) and the lesser or sweet yam (D. esculenta). However, there is scattered cultivation of D. rotundata, D. bulbifera, D. nummularia, D. transversa and D. trifida throughout the region.

Yam is considered one of the crops with good potential for commercial exploitation in the Pacific as niche markets become available in Australia, New Zealand and the United States through Pacific Islander migration (SPYN, 2003; Sukal et al., 2015). However, there are several potential issues which may hinder commercial exploitation. For example, cultivation is very resource intensive and requires costly materials that are in short supply, such as those used for staking (Figure 2A). Further, because of their tuber size, harvesting of some cultivars is time-consuming and laborious compared with other staple food crops in the Pacific (Figure 2B). The lack of information on the nutritional content of the different yam species and cultivars hinders the potential utilization of yam as a high-quality vegetable (Figure 2C). The rare and often male-dominated flowering of D. alata also hampers crop improvement for agronomic traits as well as for resistance to pests and diseases, such as anthracnose (Figure 2D), which decreases the production of many cultivars in the Pacific (SPYN, 2003; Sukal et al., 2015).

Therefore, there is an urgent need to collate and evaluate the existing yam genetic resources present in the Pacific to select for desired traits. Through the efforts of the Pacific Community (SPC) under a European Union (EU)-funded South Pacific Yam Network (SPYN) project, yam genetic resources (mainly D. alata) from the Pacific have been collected and characterized. This clonal selection was subsequently conserved at the Centre for Pacific Crops and Trees (CePaCT), the genebank of SPC located in Suva, Fiji, in addition to in situ collections. This yam collection continues to expand as new species and varieties are received from within and outside of the Pacific region. Despite its importance, most of the collection has remained unavailable for distribution, with only 10% having been distributed throughout the region. This has been primarily due to the threat of spreading diseases in the vegetative plant material.

Figure 2: Yam production issues in the Pacific. (A) Typical staking-type production system in Fiji; (B) and (C) the varied sizes, shapes and colours of D. alata tubers (the purple tubers have a high anthocyanin content); and (D) anthracnose disease damage on the Fijian D. alata cultivar ‘Taniela’.


Like other vegetatively propagated crops, yam has a tendency to accumulate and perpetuate tuber-borne fungal and viral diseases (Kenyon et al., 2008). Tissue culture, which is the preferred method of yam germplasm exchange, helps eliminate fungal pathogens; however, viruses remain an issue. Viruses belonging to the families

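Alphaflexiviridae (genus Potexvirus), Betaflexiviridae (genus Carlavirus), Bromoviridae (genus Cucumovirus), Caulimoviridae (genus Badnavirus), Potyviridae (genera Macluravirus and Potyvirus), Secoviridae (genera Comovirus and Fabavirus) and Tombusviridae (genus Aureusvirus) are known to infect yams (Kenyon et al., 2001; Menzel et al., 2014).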

Yam production is reduced by virus infection in all yam-growing areas (Fuji et al., 1999; Kenyon et al., 2008; Mantell and Haque, 1978). Viruses also prevent or limit the exchange of yam germplasm due to the risks associated with movement of untested and/or virus-infected material. Quarantine regulations restrict the exchange of planting material between PICs and West Africa unless plant material is certified as disease-free. Even germplasm movement between the PICs is prohibited by in-country quarantine regulations under the minimum phytosanitary standards set out by quarantine legislation. To comply with these standards, sensitive diagnostic tests are needed to enable the virus indexing of yam germplasm (Kenyon et al., 2008).

Of the nine different virus genera known to infect yams, the badnaviruses remain the least studied and the most difficult to diagnose. Studies on badnaviruses infecting yam suggest that they are highly diverse and have a high prevalence in Pacific yams (Kenyon et al., 2008), which complicates detection efforts. Detection is further complicated by the fact that partial badnavirus sequences have been found integrated into the yam genome (Bousalem et al., 2009; Kenyon et al., 2008; Seal et al., 2014; Umber et al., 2014). The results from previous studies also suggest that the detection of badnaviruses in yam remains unreliable due to the high sequence diversity of field isolates (Bömer et al., 2016; Bousalem et al., 2009; Kenyon et al., 2008; Seal et al., 2014; Umber et al., 2014).

2.5 Caulimoviridae

Badnavirus is one of eight genera within the family Caulimoviridae. All members are plant viruses with reverse-transcribing, double-stranded deoxyribonucleic acid (dsDNA) genomes, and the genera are distinguished from each other primarily by genome organization (Geering, 2014; Geering and Hull, 2012). Six of the genera, Caulimovirus, Cavemovirus, Petuvirus, Rosadnavirus, Solendovirus and Soymovirus, have isometric virions that are 52 nm in diameter with an icosahedral T=7 symmetry, while two genera, Badnavirus and Tungrovirus, have bacilliform virions with dimensions of 30 x 130-150 nm, which are tubular structures based on a T=3 icosahedron cut across its threefold axis (Geering, 2014; Hull, 1996) (Figure 3). The genome size of the circular dsDNA is between 7.2 and 9.2 kbp (Table 1), with all the coding capacity on the positive strand. The different genera are distinguished by the organization of their open reading frames (ORFs), and there can be between one and seven ORFs depending on the genus (Geering, 2014) (Table 1).

For some viruses within the genera Badnavirus, Petuvirus, Solendovirus and Caulimovirus, viral DNA has been shown to be integrated within the host nuclear genome. Such integrated viral sequences are referred to as endogenous viral elements (EVEs) and are the result of illegitimate recombination, with the integrated sequences being fragmented and rearranged when compared to the respective ancestral virus genomes (Geering, 2014). EVEs are not restricted to the family Caulimoviridae, but have also been reported from the family Geminiviridae (Ashby et al., 1997; Bejarano et al., 1996) in host plants such as Nicotiana spp. and Dioscorea spp. (Filloux et al., 2015). In contrast, a greater diversity of EVEs belonging to the four genera of Caulimoviridae (referred to as endogenous pararetroviruses (EPRVs)) has been characterized in several plant species such as tobacco, banana, bitter orange, fig, petunia, rice, potato and relatives, lucky bamboo, tomato, Dahlia, pineapple, grapes, poplar and yam (Chabannes and Iskra-Caruana, 2013; Geering et al., 2010; Mette et al., 2002; Staginnus et al., 2009; Staginnus and Richert-Pöggeler, 2006; Umber et al., 2014).

2.6 Badnavirus

Badnaviruses typically contain three open reading frames (ORFs), with ORF 1 encoding a protein of unknown function, ORF 2 encoding a virion-associated protein (VAP) and ORF 3 encoding a large polyprotein which is processed into several mature proteins, including a movement protein (MP), a coat protein (CP), an aspartic protease (AP), a reverse transcriptase (RT) and a ribonuclease H (RNase H) (Geering, 2014; Olszewski and Lockhart, 2011) (Figure 4). However, in Sweet potato pakakuy virus (SPPV), which otherwise has a typical badnavirus genome organization, ORF 3 is divided into two, namely ORF 3a (with the MP and CP domains) and ORF 3b (with the AP, RT and RNase H domains) (Geering, 2014; Kreuze et al., 2009).

Badnaviruses are transmitted mainly by vegetative propagation, by mealybugs and through seeds (Bhat et al., 2014; Huang and Hartung, 2001; Lockhart et al., 1997; Yang et al., 2003a, 2003b). However, Rubus yellow net virus (RYNV), Gooseberry vein banding associated virus (GVBAV) and Spiraea yellow leaf spot virus (SYLSV) are transmitted by aphid vectors, while Piper yellow mottle virus (PYMoV) is transmitted by both the citrus mealybug (Planococcus citri) and the black pepper lacebug (Diconocoris distanti) (Geering, 2014).

Figure 3: Morphology of Caulimoviridae particles. (Top left) Reconstruction of the surface structure of a cauliflower mosaic virus particle showing T=7 symmetry. (Top right) Cutaway surface reconstruction showing the multilayer structure (adapted from Cheng et al., 1992). (Bottom) Negative contrast electron micrograph of particles of Commelina yellow mottle virus, stained with 2% sodium phosphotungstate, pH 7.0 (bar represents 10 nm) (adapted from Geering and Hull, 2012).


Table 1: Genome characteristics of genera within the family Caulimoviridae (Geering, 2014).

Genus          Genome size (kbp)   No. of open reading frames (ORFs)
Badnavirus     7.2-9.2             3-4
Caulimovirus   7.8-8.2             6
Cavemovirus    7.7-8.2             4
Petuvirus      7.2                 1
Rosadnavirus   9.3                 8
Solendovirus   7.8-8.8             4
Soymovirus     8.1-8.2             7
Tungrovirus    8.0                 4


Figure 4: Linear representation of a typical badnavirus genome organization showing the positions of the putative tRNAmet-binding site (tRNAmet), TATA box and polyadenylation signal (polyA); open reading frames ORF 1, ORF 2 and ORF 3, the latter showing the movement protein (MP), capsid protein (CP), zinc finger (Zn), aspartic protease (AP), reverse transcriptase (RT) and ribonuclease H (RNase H) motifs.


2.7 Detection of badnaviruses

Detection of badnaviruses is complicated by their high serological and genetic heterogeneity (Harper et al., 2005; Kenyon et al., 2008; Lockhart, 1986; Muller et al., 2011; Seal et al., 2014). Polyclonal antibodies have been prepared against a cocktail of approximately 30 purified isolates of banana streak virus (BSV) and sugarcane bacilliform virus (SCBV). Although this antiserum (referred to as the BenL antiserum) was raised against BSV and SCBV isolates, it appears to cross-react with other badnavirus species. It has been used to detect badnaviruses in crops such as bananas, yams and others using enzyme-linked immunosorbent assay (ELISA) and immunosorbent electron microscopy (ISEM) (Le Provost et al., 2006; Seal et al., 2014). However, the heterogeneity of badnaviruses in these crops, as well as the fact that badnaviruses are often present at a low titre in infected plants, means that not all infections are detected (Phillips et al., 1999; Seal et al., 2014; Yang et al., 2003b), which renders these techniques unsuitable for routine diagnostics. Seal et al. (2014) showed that although the BenL antiserum detected some badnavirus isolates from yams, it failed to detect all isolates. Therefore, nucleic acid-based techniques have been preferred for the detection of badnaviruses because of their greater sensitivity.

Based on the consensus sequence of the RT/RNase H-coding region of published badnavirus sequences, a pair of degenerate primers (BadnaFP/RP) was designed (Yang et al., 2003b) and has been used widely for badnavirus detection in a range of host plants, including taro, yam and sugarcane (Bousalem et al., 2009; Guimarães et al., 2015; Kenyon et al., 2008; Seal and Muller, 2007; Yang et al., 2003a, 2003b). The sequence of the core RT/RNase H-coding region amplified using these primers is not only important for the identification of badnaviruses but is also used for taxonomic classification within the genus (Geering and Hull, 2012). According to the International Committee on Taxonomy of Viruses (ICTV), a nucleotide sequence difference of more than 20% in this region is used as the criterion for the demarcation of a new species. However, the discovery of integrated badnavirus sequences in plant host genomes, as reported in banana (Harper et al., 1999; Le Provost et al., 2006; Ndowora et al., 1999) and yam (Dioscorea cayenensis-rotundata) (Umber et al., 2014), complicates the use of these primers for diagnosis. Although the BadnaFP/RP primers have been shown to successfully detect many badnaviruses, not all amplification products are derived from episomal viral sequences, since integrated sequences can also act as a template in PCR (Seal et al., 2014). Immuno-capture PCR (IC-PCR), which combines serological and molecular approaches, is another diagnostic method used for the detection of badnaviruses. However, IC-PCR relies on the antiserum successfully trapping virus particles as well as on the complete elimination of genomic DNA from the reaction tubes, which is often difficult. IC-PCR using the BenL antiserum has been used successfully to detect badnaviruses in bananas (Le Provost et al., 2006); however, these authors showed that genomic DNA can bind to the tube, so that amplification in IC-PCR can also result from integrated BSV sequences.
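The species demarcation criterion described above amounts to a pairwise comparison of aligned RT/RNase H sequences. The following is a minimal sketch of that comparison, assuming the two sequences have already been aligned to the same length. The function names and the toy sequences are hypothetical, and treating the criterion as a strict "more than 20% difference" is an assumption made for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent nucleotide identity between two pre-aligned sequences.

    Alignment columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def exceeds_species_threshold(seq_a: str, seq_b: str,
                              max_difference: float = 20.0) -> bool:
    """True if the two sequences differ by more than max_difference percent."""
    return (100.0 - percent_identity(seq_a, seq_b)) > max_difference


# Toy aligned fragments (hypothetical, not real badnavirus sequences).
reference = "ATGGCTAAGGTTCCAGATTA"
query = "ATGGCAAAGGTACCGGATCA"

identity = percent_identity(reference, query)
print(f"identity: {identity:.1f}%")
print("candidate new species:", exceeds_species_threshold(reference, query))
```

A comparison of this kind cannot distinguish episomal from integrated sequences, since both can serve as PCR template.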

Rolling circle amplification (RCA) is a more recent method that has shown promise in the detection of badnaviruses. RCA is a simple and efficient isothermal enzymatic process that uses DNA polymerases (Phi29, Bst and Vent (exo-) DNA polymerase) or RNA polymerases (T7 RNA polymerase) to generate long single-stranded DNA or RNA molecules consisting of tandem repeats complementary to a circular template. During RCA, the polymerase continuously adds nucleotides to a primer annealed to the template and, since the reaction is isothermal, there is no need for a thermal cycler or a thermostable polymerase (Ali et al., 2014). Among the different polymerases available for RCA, bacteriophage phi29 DNA polymerase has shown great potential for the amplification of viral circular DNA. Phi29 DNA polymerase possesses several features, such as strand displacement activity, proofreading activity and the generation of very long synthesis products (Figure 5A-C), that make it ideal for the efficient amplification of circular DNA molecules from complex biological samples (Johne et al., 2009). These properties are useful in the study of small circular DNA molecules, and phi29 polymerase-dependent RCA has been used in the study of several virus families with circular DNA genomes (James et al., 2011a; Johne et al., 2009). Because phi29 DNA polymerase supports sequence-independent amplification of circular DNA molecules, as long as a primer binds to the template, specific primers are not required and RCA can be carried out with random primers. This characteristic has led to the discovery of many novel DNA viruses infecting humans, animals and plants (Johne et al., 2009). As shown in Figure 5C, multiprimed RCA (using either specific or random primers) leads to the production of long concatemeric molecules of double-stranded DNA. The concatemer can be subjected to restriction fragment length polymorphism (RFLP) analysis. Digestion with a single-cutting endonuclease can produce a full-length monomeric product that can subsequently be purified, cloned and sequenced.
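The reason a single-cutting endonuclease releases full-length monomers from an RCA concatemer can be shown with a simple string model. The sketch below is illustrative only: it models one strand of the product as text, the 40-nt "genome" is hypothetical, and the function names are not part of any published protocol. The site GAATTC with a cut after the first base corresponds to EcoRI.

```python
def rca_concatemer(circular_genome: str, start: int, copies: int) -> str:
    """Linear concatemer produced by rolling-circle synthesis from `start`."""
    rotated = circular_genome[start:] + circular_genome[:start]
    return rotated * copies


def digest(linear_dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut linear DNA at every occurrence of `site`, `cut_offset` bases in."""
    fragments, previous = [], 0
    position = linear_dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(linear_dna[previous:cut])
        previous = cut
        position = linear_dna.find(site, position + 1)
    fragments.append(linear_dna[previous:])
    return fragments


# Hypothetical 40-nt circular "genome" containing a single GAATTC site.
genome = "ACGTTGCAGAATTCCGATAGGCTTACCGTAAGCTGATCCA"

concatemer = rca_concatemer(genome, start=25, copies=4)
fragments = digest(concatemer, site="GAATTC", cut_offset=1)

# Internal fragments span exactly one genome length; the two end
# fragments are partial because synthesis started away from the site.
monomers = [f for f in fragments if len(f) == len(genome)]
print(len(monomers), "full-length monomers of", len(genome), "nt")
```

Each monomer is a rotation of the circular genome that begins at the cut site, which is why the cloned product represents the complete genome regardless of where synthesis was primed.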


Figure 5: The principle of rolling circle amplification (adapted from Johne et al., 2009). Blue lines denote target DNA sequences, green lines represent oligonucleotide primers and red lines represent new DNA synthesized by the polymerase. Arrowheads indicate the 3′ ends of the synthesized DNA strands. (A) Linear template and single primer. After primer binding, the polymerase synthesizes one complementary strand; (B) circular template and single primer. The polymerase synthesizes a complementary strand beginning at the bound primer. After one round, the primer and the synthesized strand are displaced, and DNA synthesis continues for additional rounds. In this way, a long concatemeric single-stranded DNA is produced; and (C) circular template and multiple random primers. The synthesis is initiated by multiple primers bound to the template. DNA synthesis using strand displacement is carried out as in (B). However, primers still present in the reaction mixture bind to the displaced strand and are used as additional initiation points for DNA synthesis. The multiple products are long concatemeric molecules of double-stranded DNA.


The application of RCA technology for the discovery, characterization and diagnosis of plant-infecting viruses has mainly concentrated on the small single-stranded DNA genomes of viruses in the families Geminiviridae (genome n=1 or 2, size 2.5-3 kbp) and Nanoviridae (genome n=6 to 8, size 1-1.5 kbp) (Johne et al., 2009). However, James et al. (2011a) showed that it could be utilized for the detection and characterization of viruses with larger genomes belonging to the family Caulimoviridae, specifically the badnavirus banana streak virus (BSV). Importantly, James et al. (2011a) also demonstrated the utility of sequence-nonspecific phi29-based RCA for differentiating episomal sequences of BSV from the integrated forms (EPRVs) present in some banana genotypes. Their study also showed that RCA could be improved by the addition of target-specific primers. The optimized RCA protocol was subsequently used for the characterization of badnaviruses infecting banana (Baranwal et al., 2014; Carnelossi et al., 2014; James et al., 2011b; Javer-Higginson et al., 2014; Wambulwa et al., 2012, 2013), cacao (Chingandu et al., 2017a, 2017b; Muller et al., 2018), fig (Laney et al., 2012), mulberry (Chiumenti et al., 2016), Rubus spp. (Diaz-Lara et al., 2015) and yam (Bömer et al., 2016, 2018; Sukal et al., 2017; Umber et al., 2014).

Over the last 10 years, next generation sequencing (NGS) has gained momentum as a tool for viral whole-genome characterization. It is considered to be a highly efficient, rapid and low-cost high-throughput option for sequencing the DNA or RNA genomes of plant viruses (Hadidi et al., 2016). Recently, RCA coupled with NGS has also proven effective for the characterization of badnaviruses from cacao (Chingandu et al., 2017a, 2017b; Muller et al., 2018).


2.8 Current knowledge of badnaviruses infecting yam

Yam-infecting badnaviruses are probably the most prevalent viruses infecting yam globally (Bousalem et al., 2009; Eni et al., 2008a, 2008b; Kenyon et al., 2008). Yam badnaviruses are transmitted mechanically and by several species of mealybugs (family Pseudococcidae) in a semi-persistent manner (Atiri et al., 2003; Bömer et al., 2016; Kenyon et al., 2001; Odu et al., 2004; Phillips et al., 1999). Badnavirus infection in yam can be symptomless (Figure 6A) or cause a range of symptoms including veinal chlorosis (Figure 6B), necrosis and distortions such as puckering and crinkling (Bömer et al., 2016; Lebot, 2009; Phillips et al., 1999; Seal et al., 2014).

Badnaviruses were first reported infecting yams in the Caribbean, where bacilliform virions were observed together with a flexuous virus in D. alata and D. cayenensis-rotundata affected by internal brown spot disease (Harrison and Roberts, 1973; Mantell and Haque, 1978). Two decades later, the complete genomes of two DBV isolates from Nigerian D. alata were sequenced and the virus was named Dioscorea alata bacilliform virus (ICTV species name Dioscorea bacilliform AL virus, DBALV) (Briddon et al., 1999; Phillips et al., 1999). Later, the complete genomes of two additional isolates representing a second species were sequenced from D. sansibarensis originating from Benin and named Dioscorea sansibarensis bacilliform virus (ICTV species name Dioscorea bacilliform SN virus, DBSNV) (Seal and Muller, 2007).


Figure 6: Variable symptoms on yam leaves infected with badnaviruses. Leaves of (A) D. rotundata (var. Ogoja) showing no marked viral symptoms and (B) D. rotundata (var. Nwokpoko) showing veinal chlorosis (adapted from Seal et al., 2014).


Later, Kenyon et al. (2008) used degenerate badnavirus-specific primers which amplify the core RT/RNase H-coding region to identify 11 new DBV sequence groups (K1-K11) with <79% nucleotide identity to each other, together with two additional sequence groups (K12-K13) with very low sequence similarity to other badnaviruses. These latter two groups were considered by the authors to represent either highly divergent badnaviruses, members of a new Caulimoviridae genus, or remnants of viral sequences that had become integrated into the host genome following illegitimate recombination. An additional DBV sequence group (DBV-D) from Guinea yam with <80% similarity to previously characterized sequences was subsequently reported by Bousalem et al. (2009). However, both of these studies used PCR-based approaches with degenerate badnavirus primers and the existence of episomal virus counterparts of these groups was not confirmed.

Bömer et al. (2016) used RCA and PCR to characterize three further sequence groups (T13-T15) from yams and determined the complete genome sequences of two of the groups, namely Dioscorea bacilliform rotundata virus 1 (ICTV species name Dioscorea bacilliform RT virus 1 (DBRTV1), group T13) and Dioscorea bacilliform rotundata virus 2 (ICTV species name Dioscorea bacilliform RT virus 2 (DBRTV2), group T14). In separate studies, Bömer et al. (2018) and Umber et al. (2017) published the complete genomes of two additional species, namely Dioscorea bacilliform rotundata virus 3 (DBRTV3) and Dioscorea bacilliform trifida virus (ICTV species name Dioscorea bacilliform TR virus (DBTRV)), respectively. DBV genome characterization has received much attention in recent years due to the need for virus testing of yam to facilitate germplasm exchange. However, most of these studies have been restricted to the African and Caribbean regions, with the Pacific receiving very little attention.


2.9 Conservation and utilization of yams

The International Institute of Tropical Agriculture (IITA) conserves the largest collection of yams, comprising 5788 accessions from nine species, with D. rotundata and D. alata making up the majority of the collection (www.iita.org). Yam germplasm collections are also maintained at the Central Tuber Crops Research Institute (CTCRI) in Trivandrum (India), the Vietnam Agricultural Science Institute (VASI) in Hanoi (Vietnam), Phil Root Crops in Baybay (the Philippines), the Vanuatu Agricultural Research and Technical Centre (VARTC) in Vanuatu, and the French National Institute for Agricultural Research (INRA) and the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD) in Guadeloupe (West Indies), with additional smaller collections in China, Japan and many PICs. A collection of PICs accessions is conserved as an in vitro collection at the SPC-CePaCT in Fiji.

The collection at SPC-CePaCT was assessed by IITA in 2011 by DNA fingerprinting using simple sequence repeat (SSR) markers and was shown to contain a unique subset of the global yam genetic diversity (SPC 2011, unpublished data). SPC-CePaCT has since acquired other species of yam, including wild species, through various donor-funded projects, in particular from the Global Crop Diversity Trust, to expand its unique Pacific yam collection. SPC-CePaCT receives a long-term grant for the sustainable conservation of its yam collection and currently maintains an in vitro yam collection (283 cultivars) which comprises seven species, namely D. alata, D. rotundata, D. esculenta, D. bulbifera, D. nummularia, D. transversa and D. trifida (SPC 2018, unpublished data). The collection stands to expand as new species and varieties are received from within and outside of the Pacific. However, as noted earlier, only 10% of the collection has been available for exchange because the lack of diagnostic protocols has made it impossible to certify the plants as virus-free.

2.10 Research problem and aim

Despite its significant value as a food security and climate-resilient crop, and as a crop with potential for economic exploitation, yam production in the PICs remains low. There is the potential to increase production through the introduction and evaluation of new genetic diversity with important agronomic traits. This can be achieved through utilization of field collections held in countries within the Pacific region, the in vitro collection held at SPC-CePaCT and the collections held in other global genebanks such as IITA. However, a lack of reliable diagnostic protocols for the detection of yam badnaviruses has hindered the exchange of yam germplasm. Great progress has been made in characterizing the diversity of badnaviruses in yam in the African region; however, very limited work has been done in the Pacific. Therefore, the aim of this project was to identify and characterize yam-infecting badnaviruses in the Pacific for the subsequent development of reliable diagnostic tests.

2.11 Objectives

The objectives of this study were to:

1. Identify and characterize badnaviruses infecting Pacific collections of yam germplasm.

2. Develop, evaluate and validate protocols for the detection of episomal badnavirus infections in yam.

3. Investigate the genetic diversity of Pacific isolates of badnaviruses which infect yam.


2.12 References

Abraham, K., Nair, G.P., 1991. Polyploidy and sterility in relation to sex in Dioscorea

alata L. (Dioscoreaceae). Genetica 83, 93–97.

Ali, M.M., Li, F., Zhang, Z., Zhang, K., Kang, D.-K., Ankrum, J.A., Le, X.C., Zhao,

W., 2014. Rolling circle amplification: a versatile tool for chemical biology,

materials science and medicine. Chem. Soc. Rev. 43, 3324.

Arnau, G., Abraham, K., Sheela, M.N.N., Chair, H., Sartie, A., Asiedu, R., 2010. Yams, in: Bradshaw, J.E. (Ed.), Root and Tuber Crops. Springer New York, New York, NY, pp. 127–148.

Arnau, G., Nemorin, A., Maledon, E., Abraham, K., 2009. Revision of ploidy status

of Dioscorea alata L. (Dioscoreaceae) by cytogenetic and microsatellite

segregation analysis. Theor. Appl. Genet. 118, 1239–1249.

Ashby, M.K., Warry, A., Bejarano, E.R., Khashoggi, A., Burrell, M., Lichtenstein,

C.P., 1997. Analysis of multiple copies of geminiviral DNA in the genome of

four closely related Nicotiana species suggest a unique integration event. Plant

Mol. Biol. 35, 313–321.

Asiedu, R., Sartie, A., 2010. Crops that feed the World 1. Yams. Food Secur. 2, 305–

315.

Atiri, G.I., Winter, S., Alabi, O.J., 2003. Yam, in: Loebenstein, G., Thottappilly, G.

(Eds.), Virus and Virus-like Diseases of Major Crops in Developing Countries.

Springer Netherlands, Dordrecht, The Netherlands, pp. 249–250.

Ayensu, E.S., Coursey, D.G., 1972. Guinea yams the botany, ethnobotany, use and

possible future of yams in West Africa. Econ. Bot. 26, 301–318.

Baranwal, V.K., Sharma, S.K., Khurana, D., Verma, R., 2014. Sequence analysis of

shorter than genome length episomal banana streak OL virus like sequences

isolated from banana in India. Virus Genes 48, 120–127.

Bejarano, E.R., Khashoggi, A., Witty, M., Lichtenstein, C., 1996. Integration of

multiple repeats of geminiviral DNA into the nuclear genome of tobacco during

evolution. Proc. Natl. Acad. Sci. U. S. A. 93, 759–764.

Bhat, A.I., Sasi, S., Revathy, K.A., Deeshma, K.P., Saji, K. V., 2014. Sequence

diversity among badnavirus isolates infecting black pepper and related species in

India. Virus Dis. 25, 402–407.

Bhattacharjee, R., Gedil, M., Sartie, A., Otoo, E., Dumet, D., Kikuno, H., Kumar, P.L.,

Asiedu, R., 2011. Dioscorea, Wild Crop Relatives: Genomic and Breeding

Resources, Industrial Crops. Springer Berlin Heidelberg, Berlin, Heidelberg.

Bömer, M., Rathnayake, A.I., Visendi, P., Silva, G., Seal, S.E., 2018. Complete

genome sequence of a new member of the genus Badnavirus, Dioscorea

bacilliform RT virus 3, reveals the first evidence of recombination in yam

badnaviruses. Arch. Virol. 163, 533–538.

Bömer, M., Turaki, A., Silva, G., Kumar, P., Seal, S., 2016. A sequence-independent

strategy for amplification and characterization of episomal badnavirus sequences

reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

Bousalem, M., Arnau, G., Hochu, I., Arnolin, R., Viader, V., Santoni, S., David, J.,

2006. Microsatellite segregation analysis and cytogenetic evidence for tetrasomic

inheritance in the American yam Dioscorea trifida and a new basic chromosome

number in the Dioscoreae. Theor. Appl. Genet. 113, 439–451.

Bousalem, M., Durand, O., Scarcelli, N., Lebas, B.S.M., Kenyon, L., Marchand, J.L.,

Lefort, F., Seal, S.E., 2009. Dilemmas caused by endogenous pararetroviruses

regarding the taxonomy and diagnosis of yam (Dioscorea spp.) badnaviruses:

Analyses to support safe germplasm movement. Arch. Virol. 154, 297–314.

Briddon, R.W., Phillips, S., Brunt, A., Hull, R., 1999. Analysis of the sequence of

Dioscorea alata bacilliform virus; comparison to other members of the badnavirus

group. Virus Genes 18, 277–283.

Carnelossi, P.R., Bijora, T., Facco, C.U., Silva, J.M., Picoli, M.H.S., Souto, E.R.,

Oliveira, F.T. De, 2014. Episomal detection of banana streak OL virus in single

and mixed infection with Cucumber mosaic virus in banana “Nanicão Jangada.”

Trop. Plant Pathol. 39, 342–346.

Chabannes, M., Iskra-Caruana, M.-L., 2013. Endogenous pararetroviruses—a

reservoir of virus infection in plants. Curr. Opin. Virol. 3, 615–620.

Cheng, R.H., Olson, N.H., Baker, T.S., 1992. Cauliflower mosaic virus: A 420 subunit (T=7), multilayer structure. Virology 186, 655–668.

Chingandu, N., Kouakou, K., Aka, R., Ameyaw, G., Gutierrez, O.A., Herrmann, H.-

W., Brown, J.K., 2017a. The proposed new species, cacao red vein virus, and

three previously recognized badnavirus species are associated with cacao swollen

shoot disease. Virol. J. 14, 199.

Chingandu, N., Zia-ur-rehman, M., Sreenivasan, T.N., Surujdeo-Maharaj, S., Umaharan, P., Gutierrez, O.A., Brown, J.K., Thyail, M.Z., 2017b. Molecular characterization of previously elusive badnaviruses associated with symptomatic cacao in the New World. Arch. Virol. 162, 1363–1371.

Chiumenti, M., Morelli, M., De Stradis, A., Elbeaino, T., Stavolone, L., Minafra, A.,

2016. Unusual genomic features of a badnavirus infecting mulberry. J. Gen.

Virol. 97, 3073–3087.

Diaz-Lara, A., Mosier, N.J., Keller, K.E., Martin, R.R., 2015. A variant of Rubus

yellow net virus with altered genomic organization. Virus Genes 50, 104–110.

Elevitch, B.C.R., Love, K., 2011. Farm and Forestry Production and Marketing

Profiles: Highlighting value-added strategies., in: Elevitch, C.R. (Ed.), Specialty

Crops for Pacific Island Agroforestry. Permanent Agriculture Resources (PAR),

Holualoa, Hawai‘i., p. 14.

Eni, A.O., Hughes, J.D.A., Asiedu, R., Rey, M.E.C., 2008a. Sequence diversity among

badnavirus isolates infecting yam (Dioscorea spp.) in Ghana, Togo, Benin and

Nigeria. Arch. Virol. 153, 2263–2272.

Eni, A.O., Hughes, J.D.A., Rey, M.E.C., 2008b. Survey of the incidence and distribution of five viruses infecting yams in the major yam-producing zones in Benin. Ann. Appl. Biol. 153, 223–232.

Essad, S., 1984. Geographic variation of basic chromosome numbers and polyploidy

in the Dioscorea genus with regard to counting for transversa Brown,

pilosiuscula Bert. and trifida (L.). Agronomie 4, 611–617.

FAOSTAT, 2018. Production Statistics (FAOSTAT). Rome, Italy: Food and

Agriculture Organization of the United Nations.

Filloux, D., Murrell, S., Koohapitagtam, M., Golden, M., Julian, C., Galzi, S., Uzest,

M., Rodier-Goud, M., D’Hont, A., Vernerey, M.S., Wilkin, P., Peterschmitt, M.,

Winter, S., Murrell, B., Martin, D.P., Roumagnac, P., 2015. The genomes of

many yam species contain transcriptionally active endogenous geminiviral

sequences that may be functionally expressed. Virus Evol. 1, vev002.

Fuji, S., Mitobe, I., Nakamae, H., Natsuaki, K.T., 1999. Nucleotide sequence of coat

protein gene of yam mild mosaic virus. J. Gen. Virol. 1415–1419.

Gamiette, F., Bakry, F., Ano, G., 1999. Ploidy determination of some yam species

(Dioscorea spp.) by flow cytometry and conventional chromosomes counting.

Genet. Resour. Crop Evol. 46, 19–27.

Geering, A.D., 2014. Caulimoviridae (Plant Pararetroviruses), in: ELS. John Wiley &

Sons, Ltd, Chichester, UK.

Geering, A.D.W., Hull, R., 2012. Family Caulimoviridae, in: King, A.M.Q., Adams,

M.J., Carstens, E.B., Lefkowitz, E.J. (Eds.), Virus Taxonomy. Ninth Report of

the International Committee on Taxonomy of Viruses. Elsevier Academic Press,

Amsterdam, The Netherlands, pp. 429–443.

Geering, A.D.W., Scharaschkin, T., Teycheney, P., 2010. The classification and

nomenclature of endogenous viruses of the family Caulimoviridae. Arch. Virol.

155, 123–131.

Girma, G., Spillane, C., Gedil, M., 2016. DNA barcoding of the main cultivated yams

and selected wild species in the genus Dioscorea. J. Syst. Evol. 54, 228–237.

Govaerts, R., Wilkin, P., Saunders, R., 2007. World checklist of Dioscoreales: yams and their allies. Royal Botanic Gardens, Kew, U.K.

Guimarães, K.M.C., Silva, S.J.C., Melo, A.M., Ramos-Sobrinho, R., Lima, J.S.,

Zerbini, F.M., Assunção, I.P., Lima, G.S.A., 2015. Genetic variability of

badnaviruses infecting yam (Dioscorea spp.) in northeastern Brazil. Trop. Plant

Pathol. 40, 111–118.

Hadidi, A., Flores, R., Candresse, T., Barba, M., 2016. Next-generation sequencing

and genome editing in plant virology. Front. Microbiol. 7, 1–12.

Harper, G., Hart, D., Moult, S., Hull, R., Geering, A., Thomas, J., 2005. The diversity

of banana streak virus isolates in Uganda. Arch. Virol. 150, 2407–2420.

Harper, G., Osuji, J.O., Heslop-Harrison, J.S., Hull, R., 1999. Integration of banana

streak badnavirus into the Musa genome: molecular and cytogenetic evidence.

Virology 255, 207–213.

Harrison, B.D., Roberts, I.M., 1973. Association of virus-like particles with internal brown spot of yam (Dioscorea alata). Trop. Agric. 50, 335–340.

Huang, Q., Hartung, J.S., 2001. Cloning and sequence analysis of an infectious clone

of citrus yellow mosaic virus that can infect sweet orange via agrobacterium-

mediated inoculation. J. Gen. Virol. 82, 2549–2558.

Hull, R., 1996. Molecular biology of rice tungro viruses. Annu. Rev. Phytopathol. 34,

275–297.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011a. Development of a

novel rolling-circle amplification technique to detect banana streak virus that also

discriminates between integrated and episomal virus sequences. Plant Dis. 95,

57–62.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011b. Molecular

characterization of six badnavirus species associated with leaf streak disease of

banana in East Africa. Ann. Appl. Biol. 158, 346–353.

Javer-Higginson, E., Acina-Mambole, I., González, J.E., Font, C., González, G.,

Echemendía, A.L., Muller, E., Teycheney, P.Y., 2014. Occurrence, prevalence

and molecular diversity of banana streak viruses in Cuba. Eur. J. Plant Pathol.

138, 157–166.

Johne, R., Mu, H., Rector, A., Ranst, M. Van, Stevens, H., 2009. Rolling-circle

amplification of viral DNA genomes using phi29 polymerase. Trends Microbiol.

17, 205–211.

Kenyon, L., Lebas, B.S.M., Seal, S.E., 2008. Yams (Dioscorea spp.) from the South

Pacific Islands contain many novel badnaviruses: implications for international

movement of yam germplasm. Arch. Virol. 153, 877–889.

Kenyon, L., Shoyinka, S.A., Hughes, J.d’A., Odu, B.O., 2001. An overview of viruses infecting Dioscorea yams in sub-Saharan Africa, in: Proceedings of Plant Virology in Sub-Saharan Africa Conference. International Institute of Tropical Agriculture (IITA), Ibadan, pp. 432–439.

Kreuze, J.F., Perez, A., Untiveros, M., Quispe, D., Fuentes, S., Barker, I., Simon, R.,

2009. Complete viral genome sequence and discovery of novel viruses by deep

sequencing of small RNAs: A generic method for diagnosis, discovery and

sequencing of viruses. Virology 388, 1–7.

Laney, A.G., Hassan, M., Tzanetakis, I.E., 2012. An integrated badnavirus is prevalent

in fig germplasm. Phytopathology 102, 1182–1189.

Le Provost, G., Iskra-Caruana, M.-L., Acina, I., Teycheney, P.-Y., 2006. Improved

detection of episomal banana streak viruses by multiplex immunocapture PCR.

J. Virol. Methods 137, 7–13.

Lebot, V., 1997. Synthèse des résultats sur les plantes à tubercules en Nouvelle

Caledonie. Noumea, New Caledonia.

Lebot, V., 2009. Tropical Root and Tuber Crops: Cassava, Sweet Potato, Yams and

Aroids. Paris University, France.

Lebot, V., Malapa, R., Abraham, K., 2017. The Pacific yam (Dioscorea nummularia

Lam.), an under-exploited tuber crop from Melanesia. Genet. Resour. Crop Evol.

64, 217–235.

Lockhart, B.E.L., 1986. Purification and serology of a bacilliform virus associated

with banana streak disease. Phytopathology 76, 995-999.

Lockhart, B.E.L., Kiratiya-Angul, K., Jones, P., Eng, L., De Silva, P., Olszewski, N.E.,

Lockhart, N., Deema, N., Sangalang, J., 1997. Identification of piper yellow

mottle virus, a mealybug-transmitted badnavirus infecting Piper spp. in Southeast

Asia. Eur. J. Plant Pathol. 103, 303–311.

Malapa, R., Arnau, G., Noyer, J.L., Lebot, V., 2005. Genetic diversity of the greater

yam (Dioscorea alata L.) and relatedness to D. nummularia Lam. and D.

transversa Br. as revealed with AFLP markers. Genet. Resour. Crop Evol. 52,

919–929.

Mantell, S.H., Haque, S.Q., 1978. Incidence of internal brown spot disease in white

Lisbon yams (Dioscorea alata) during storage. Exp. Agric. 14, 167.

Menzel, W., Thottappilly, G., Winter, S., 2014. Characterization of an isometric virus

isolated from yam (Dioscorea rotundata) in Nigeria suggests that it belongs to a

new species in the genus Aureusvirus. Arch. Virol. 159, 603–606.

Mette, M.F., Kanno, T., Aufsatz, W., Jakowitsch, J., Winden, J. Van Der, Matzke,

M.A., Matzke, A.J.M., 2002. Endogenous viral sequences and their potential

contribution to heritable virus resistance in plants. The EMBO J. 21, 461–469.

Mignouna, H.D., Abang, M.M., Asiedu, R., 2008. Genomics of Yams, a Common

Source of Food and Medicine in the Tropics, in: Genomics of Tropical Crop

Plants. Springer New York, New York, NY, pp. 549–570.

Mignouna, H.D., Dansi, A., Zok, S., 2002. Morphological and isozymic diversity of

the cultivated yams (Dioscorea cayenensis/Dioscorea rotundata complex) of

Cameroon. Genet. Resour. Crop Evol. 49, 21–29.

Mignouna, H.D., Abang, M.M., Asiedu, R., 2007. Advances in yam (Dioscorea spp.) genetics

and genomics, Proceedings of the 13th ISTRC Symposium. Arusha, Tanzania,

pp. 72–81.

Muller, E., Dupuy, V., Blondin, L., Bauffe, F., Daugrois, J.H., Nathalie, L., Iskra-

Caruana, M.L., 2011. High molecular variability of sugarcane bacilliform viruses

in Guadeloupe implying the existence of at least three new species. Virus Res.

160, 414–419.

Muller, E., Ravel, S., Agret, C., Abrokwah, F., Dzahini-Obiatey, H., Galyuon, I.,

Kouakou, K., Jeyaseelan, E.C., Allainguillaume, J., Wetten, A., 2018. Next

generation sequencing elucidates cacao badnavirus diversity and reveals the

existence of more than ten viral species. Virus Res. 244, 235–251.

Ndowora, T., Dahal, G., LaFleur, D., Harper, G., Hull, R., Olszewski, N.E., Lockhart,

B., 1999. Evidence that badnavirus infection in Musa spp. can originate from

integrated pararetroviral sequences. Virology 255, 214–220.

O’Sullivan, J.N., 2010. Yam nutrition and soil fertility management in the Pacific,

ACIAR Monograph. Canberra, Australia, pp. 122

Odu, B.O., Hughes, J.D.A.A., Asiedu, R., Ng, N.Q., Shoyinka, S.A., Oladiran, O.A.,

2004. Responses of white yam (Dioscorea rotundata) cultivars to inoculation

with three viruses. Plant Pathol. 53, 141–147.

Olszewski, N.E., Lockhart, B., 2011. Badnavirus, The Springer Index of Viruses.

Springer New York, New York, NY.

Phillips, S., Briddon, R.W., Brunt, A.A., Hull, R., 1999. The Partial Characterization

of a Badnavirus Infecting the Greater Asiatic or Water Yam (Dioscorea alata). J.

Phytopathol. 147, 265–269.

Ramser, J., Weising, K., Terauchi, R., Kahl, G., Lopez-Peralta, C., Terhalle, W., 1997.

Molecular marker based taxonomy and phylogeny of Guinea yam (Dioscorea

rotundata – D. cayenensis). Genome 40, 903–915.

Risimeri, J.B., 2001. Yams and food security in the lowlands of PNG, in: Food

Security for Papua New Guinea. Australian Centre for International Agricultural

Research, Canberra.

Scarcelli, N., Daïnou, O., Agbangla, C., Tostain, S., Pham, J.L., 2005. Segregation

patterns of isozyme loci and microsatellite markers show the diploidy of African

yam Dioscorea rotundata (2n=40). Theor. Appl. Genet. 111, 226–232.

Seal, S., Muller, E., 2007. Molecular analysis of a full-length sequence of a new yam

badnavirus from Dioscorea sansibarensis. Arch. Virol. 152, 819–825.

Seal, S., Turaki, A., Muller, E., Kumar, P.L., Kenyon, L., Filloux, D., Galzi, S., Lopez-

Montes, A., Iskra-Caruana, M.L., 2014. The prevalence of badnaviruses in West

African yams (Dioscorea cayenensis-rotundata) and evidence of endogenous

pararetrovirus sequences in their genomes. Virus Res. 186, 144–154.

SPYN, 2003. Yam : Cultivar Selection for Disease Resistance & Commercial Potential

in Pacific Islands, EU-INCO-DC project final report. Montpellier.

Staginnus, C., Iskra-Caruana, M.L., Lockhart, B., Hohn, T., Richert-Pöggeler, K.R.,

2009. Suggestions for a nomenclature of endogenous pararetroviral sequences in

plants. Arch. Virol. 154, 1189–1193.

Staginnus, C., Richert-Pöggeler, K.R., 2006. Endogenous pararetroviruses: two-faced

travelers in the plant genome. Trends Plant Sci. 11, 485–491.

Sukal, A., Kidanemariam, D., Dale, J., James, A., Harding, R., 2017. Characterization

of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel

species and the first report of Dioscorea bacilliform RT virus 2. Virus Res. 238,

29–34.

Sukal, A.C., Taylor, M., Tuia, V.S., 2015. Viruses and their impact on the utilization

of plant genetic resources in the Pacific. Acta Hortic. 1101, 127–132.

Terauchi, R., Chikaleke, V.A., Thottappilly, G., Hahn, S.K., 1992. Origin and

phylogeny of Guinea yams as revealed by RFLP analysis of chloroplast DNA and

nuclear ribosomal DNA. Theor. Appl. Genet. 83, 743–751.

Umber, M., Filloux, D., Muller, E., Laboureau, N., Galzi, S., Roumagnac, P., Iskra-Caruana, M.L., Pavis, C., Teycheney, P.Y., Seal, S.E., 2014. The genome of African yam (Dioscorea cayenensis-rotundata complex) hosts endogenous sequences from four distinct badnavirus species. Mol. Plant Pathol. 15, 790–801.

Umber, M., Gomez, R.M., Gélabale, S., Bonheur, L., Pavis, C., Teycheney, P.Y.,

2017. The genome sequence of Dioscorea bacilliform TR virus, a member of the

genus Badnavirus infecting Dioscorea spp., sheds light on the possible function

of endogenous Dioscorea bacilliform viruses. Arch. Virol. 162, 517–521.

Wambulwa, M.C., 2012. Rolling circle amplification is more sensitive than PCR and

serology-based methods in detection of banana streak virus in Musa germplasm.

Am. J. Plant Sci. 03, 1581–1587.

Wambulwa, M.C., Wachira, F.N., Karanja, L.S., Kiarie, S.M., Muturi, S.M., 2013.

The influence of host and pathogen genotypes on symptom severity in banana

streak disease. African J. Biotechnol. 12, 27–31.

Wilkin, P., Schols, P., Chase, M.W., Chayamarit, K., Furness, C. A., Huysmans, S.,

Rakotonasolo, F., Smets, E., Thapyai, C., 2005. A plastid gene phylogeny of the

yam genus, Dioscorea: roots, fruits and Madagascar. Syst. Bot. 30, 736–749.

Yang, I.C., Hafner, G.J., Dale, J.L., Harding, R.M., 2003a. Genomic characterization

of taro bacilliform virus. Arch. Virol. 148, 937–949.

Yang, I.C., Hafner, G.J., Revill, P.A., Dale, J.L., Harding, R.M., 2003b. Sequence

diversity of South Pacific isolates of Taro bacilliform virus and the development

of a PCR-based diagnostic test. Arch. Virol. 148, 1957–1968.

Chapter 3

Characterization of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel species and the first

report of dioscorea bacilliform RT virus 2

Amit C. Sukal1, Dawit B. Kidanemariam1,2, James L. Dale1, Anthony P. James1,

Robert M. Harding1

1 Centre for Tropical Crops and Biocommodities, Queensland University of

Technology, Brisbane, 4001, Australia

2 National Agricultural Biotechnology Research Center, Ethiopian Institute of

Agricultural Research, P.O. Box 2003, Addis Ababa, Ethiopia

Virus Research 238:29–34

QUT Verified Signature

Abstract

The complete genome sequences of three new badnaviruses associated with yam (Dioscorea spp.) originating from Fiji, Papua New Guinea and Samoa were determined following rolling circle amplification of the virus genomes. The full-length genomes consisted of a single molecule of circular double-stranded DNA of 8106 bp for isolate FJ14, 7871 bp for isolate PNG10 and 7426 bp for isolate SAM01. FJ14 and PNG10 contained three open reading frames, while SAM01 had an additional open reading frame which partially overlapped the 3′ end of ORF 3. Amino acid sequence analysis of ORF 3 from the three isolates confirmed the presence of conserved motifs typical of other badnaviruses. Phylogenetic analysis revealed the sequences to be closely related to other Dioscorea-infecting badnaviruses. FJ14 and PNG10 appear to be new species, which we have tentatively named dioscorea bacilliform ES virus (DBESV) and dioscorea bacilliform AL virus 2 (DBALV2), respectively, while SAM01 represents a Pacific isolate of the recently published dioscorea bacilliform RT virus 2 and is described as dioscorea bacilliform RT virus 2-[4RT] (DBRTV2-[4RT]).

Keywords: Yam, Episomal badnavirus, rolling circle amplification, Dioscorea, Dioscorea bacilliform ES virus, Dioscorea bacilliform AL virus 2

Members of the genus Badnavirus (family Caulimoviridae) have non-enveloped, bacilliform virions approximately 30 nm in diameter and 120 to 150 nm in length (King et al., 2012). The genome consists of a single molecule of circular, double-stranded DNA of 7.2–9.2 kb, typically encoding three open reading frames (ORFs), all on the (+) strand (Geering, 2014). Replication occurs via a greater-than-genome-length RNA, which serves as the template for both translation of viral proteins and reverse transcription of the genome (King et al., 2012). Badnavirus ORF 1 encodes a small protein of unknown function, while ORF 2 encodes the virion-associated protein (VAP), which possesses a conserved coiled-coil motif (Stavolone et al., 2001). ORF 3 encodes a large polyprotein that is cleaved into several mature proteins, including a movement protein (MP), coat protein (CP), aspartic protease (AP), reverse transcriptase (RT) and ribonuclease H (RNase H) (Geering, 2014).

Badnaviruses are transmitted through vegetative propagation, by mealybug vectors and, in some cases, through seed (Bhat et al., 2016). They are serologically and genetically heterogeneous. Further, the genomic DNA of several species, such as banana streak viruses (BSV) (Gayral et al., 2008), dracaena mottle virus (DrMV) (Su et al., 2007), fig badnavirus 1 (FBV-1) (Laney et al., 2012) and dioscorea bacilliform virus (DBV) (Seal et al., 2014), is integrated into the host genome, which hinders the development of diagnostic protocols (Kenyon et al., 2008; Seal et al., 2014). This difficulty in diagnosis presents challenges to the safe exchange of germplasm.

Yams (Dioscorea spp.) are economically important, annual or perennial, tuber-bearing, dioecious, climbing, tropical monocots classified in the family Dioscoreaceae (Mignouna et al., 2008). Cultivated yams are ranked as the fourth most important root crop by production after potato, cassava and sweet potato (FAOSTAT, 2014). They provide a staple food source for millions of people in Africa, South America, Asia and the Pacific, and wild yams provide a valuable food source in times of famine. Yam production is highest in West Africa, which accounts for 95% of the world’s total production (Mignouna et al., 2008). Although most of the production occurs in the African region, where Dioscorea rotundata-cayenensis predominates, yam is also important in the South Pacific, where D. alata and D. esculenta are the dominant species (Kenyon et al., 2008). Complete genomes of five Dioscorea-infecting badnavirus species have been published, namely dioscorea bacilliform alata virus (DBALV), dioscorea bacilliform sansibarensis virus (DBSNV), dioscorea bacilliform rotundata virus 1 (DBRTV1), dioscorea bacilliform rotundata virus 2 (DBRTV2) and dioscorea bacilliform trifida virus (DBTRV). However, none of these reports are from the Pacific region (Briddon et al., 1999; Seal and Muller, 2007; Bömer et al., 2016; Umber et al., 2016).

Yam production and improvement in the Pacific region is hindered by a lack of genetic diversity. Germplasm exchange within the Pacific region, and between the Pacific and Africa, has been difficult due to a lack of reliable virus diagnostic protocols, especially for badnaviruses. To address this problem, a project was initiated in 2014 by the Secretariat of the Pacific Community (SPC), Fiji, to characterize the diversity of badnaviruses infecting yams in the Pacific region.

SPC maintains an in vitro collection of yams (278 cultivars) comprising seven different species, namely D. alata, D. rotundata, D. esculenta, D. bulbifera, D. nummularia, D. transversa and D. trifida. A subset of this collection (50 cultivars including D. alata [28], D. rotundata [1], D. esculenta [15], D. bulbifera [2], D. nummularia [2], D. transversa [1] and D. trifida [1]) was initially screened using an immunocapture polymerase chain reaction (IC-PCR) protocol with a general badnavirus polyclonal antiserum (BenL), kindly provided by Prof. Ben Lockhart (University of Minnesota, USA), and the degenerate badnavirus primers BadnaFP/RP (Yang et al., 2003). The expected 579 bp product was amplified from extracts of four of the 50 accessions: two D. alata types from Papua New Guinea (PNG) (DA-PNG03 and DA-PNG10), one D. esculenta type from Fiji (DE-FJ14) and one D. rotundata type from Samoa (DR-SAM01). To confirm that the amplification was derived from episomal badnavirus DNA and not from integrated badnavirus sequences, total nucleic acid (TNA) was extracted from leaf tissue (Kleinow et al., 2009) and subjected to rolling circle amplification (RCA) using the TempliPhi 100 Amplification Kit (GE Healthcare).

Briefly, 1 μl of TNA (adjusted to 500 ng/μl with sterile water) was mixed with 4 μl of the kit sample buffer, and the mixture was denatured for 3 min at 95°C and snap-cooled on ice. TempliPhi kit reaction buffer (5 μl) was then mixed with 0.2 μl of phi29 DNA polymerase, added to each denatured TNA sample mixture, and incubated at 30°C for 18 h. The reaction mixture was then incubated at 65°C for 10 min to inactivate the phi29 enzyme. Based on in silico restriction analysis of published full-length DBSNV sequences (GenBank accessions DQ822073 and DQ822074), the RCA products were digested with the restriction enzymes BamHI and SalI, for each of which the published DBSNV sequences contained only one or two recognition sites.
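The in silico restriction analysis described above can be illustrated with a minimal Python sketch. This is not the tool used in the study; the function names and the toy sequence are hypothetical, and only the recognition sequences of BamHI (GGATCC) and SalI (GTCGAC) are taken as known facts. The sketch counts recognition sites in a circular genome and reports the fragment sizes expected after digestion, where a single-cutter yields one fragment of genome length.

```python
def find_sites(genome: str, site: str) -> list[int]:
    """Return 0-based start positions of `site` in a circular genome."""
    genome = genome.upper()
    site = site.upper()
    # Append the first few bases so that sites spanning the origin are found.
    extended = genome + genome[: len(site) - 1]
    return [i for i in range(len(genome)) if extended.startswith(site, i)]


def fragment_sizes(genome_length: int, positions: list[int]) -> list[int]:
    """Fragment sizes (largest first) after cutting a circle at each site."""
    if not positions:
        return []  # an uncut circle gives no linear fragments
    cuts = sorted(positions)
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    # The last fragment runs from the final cut, through the origin, to the first cut.
    sizes.append(genome_length - cuts[-1] + cuts[0])
    return sorted(sizes, reverse=True)


ENZYMES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}

# Toy circular sequence for illustration only (not a real badnavirus genome).
toy = "ATGC" * 5 + "GGATCC" + "ATTA" * 10 + "GTCGAC" + "CCGT" * 6 + "GGATCC" + "TTAA" * 4

for name, site in ENZYMES.items():
    positions = find_sites(toy, site)
    print(name, positions, fragment_sizes(len(toy), positions))
```

In this toy example SalI cuts once and linearizes the circle, whereas BamHI cuts twice and releases two fragments whose sizes sum to the genome length, which is the pattern used to interpret digests of RCA products.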

Digestion of the RCA-amplified DNA using SalI resulted in a single fragment of approximately 7.5 kb for all four samples, while digestion using BamHI yielded a single fragment of approximately 7.5 kb from sample SAM01, two fragments of approximately 4 and 3 kb from sample FJ14, and three fragments of approximately 3, 2.5 and 2 kb from samples PNG03 and PNG10. The restriction fragments were excised and purified using Freeze ‘N Squeeze™ DNA Gel Extraction Spin Columns (Bio-Rad), subsequently ligated into appropriately cut and dephosphorylated pUC19, and sequenced as described previously (James et al., 2011). BadnaFP/RP primers were used to sequence the RT/RNase H-coding region.

Pairwise sequence comparison of the 529 bp RT/RNase H-coding region, delimited by the BadnaFP/RP primers, revealed that samples PNG03 and PNG10 were identical. When the PNG sequences were compared with the sequences from FJ14 and SAM01, there was 64–69% nucleotide similarity between the three groups. When analysed using BLASTn, FJ14 was found to be most similar to DBALV (accession KX008571) with 73.2% nucleotide similarity, PNG03/PNG10 was most similar to DBSNV (accession DQ822074) with 71% nucleotide similarity, and SAM01 was most similar to DBRTV2 (accession KX008577) with 95% nucleotide similarity.
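For readers unfamiliar with how such pairwise similarity values are obtained, the following Python sketch computes percent nucleotide identity between two pre-aligned sequences. It is an illustrative assumption, not the procedure used in the study: the function name is hypothetical, gap-containing columns are simply skipped, and BLASTn or alignment software may report slightly different values depending on how gaps and alignment length are treated.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns, ignoring columns with a gap ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned sequences for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Applied to the 529 bp RT/RNase H-coding region, a calculation of this kind underlies the species demarcation comparisons made for badnaviruses.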

Since full genome sequences of the three putative badnaviruses were not available in GenBank at the time of the original analysis, three independent full-length clones of each of FJ14, PNG10 (as a representative of the PNG isolates) and SAM01, generated from the SalI-digested RCA products, were completely sequenced in both directions by primer walking. To confirm the sequences spanning the SalI restriction sites, PCR was carried out using sequence-specific primers flanking this region. Briefly, the PCR master mix consisted of 10 μl of 2X GoTaq Green Master Mix (Promega), 10 pmol of each sequence-specific primer and 1 μl of TNA extract (diluted to 30–50 ng/μl) in a final volume of 20 μl. PCR cycling conditions were as follows: initial denaturation at 94°C for 2 min, followed by 35 cycles of 94°C for 20 s, 50°C for 2 min and 72°C for 2 min, with a final extension at 72°C for 10 min. The amplified products were cloned into pGEM-T Easy (Promega) and sequenced as described previously (James et al., 2011). Complete genome sequences were then assembled using Geneious v9.0.2 (http://www.geneious.com; Kearse et al., 2012).

The assembled full-length genome sequences of FJ14, PNG10 and SAM01 comprised 8106 bp, 7871 bp and 7426 bp, respectively. The intergenic regions (IR) of FJ14 and PNG10 comprised 1280 nt and 1210 nt, respectively, while the IR of SAM01 was considerably smaller at 751 nt. The IR of all isolates contained several conserved nucleic acid motifs previously described for plant dsDNA viruses (Benfey and Chua, 1990; Medberry and Olszewski, 1993). A putative tRNAmet-binding site was identified in all sequences (FJ14 and PNG10, nt 1–18: TGGTATCAGAGCTTGGTT; SAM01, nt 1–18: TGGTATCAGAGCTCGGTT; underlined nucleotides are mismatches) with 94% and 89% nucleotide identity to the plant tRNAmet consensus sequence (3′ ACCAUAGUCUCGGUCCAA 5′), which has been previously described as the priming site for reverse transcription (Medberry et al., 1990). The putative tRNAmet-binding site was designated as the origin of the circular genome, consistent with the convention currently used for badnaviruses. Transcriptional promoter elements, including putative TATA boxes and polyadenylation signals analogous to the 35S promoter of cauliflower mosaic virus, were also identified in the region upstream of the tRNAmet-binding site (Table S1).
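The convention of numbering a circular badnavirus genome from the tRNAmet-binding site can be sketched in Python as follows. This is a hypothetical helper, not part of the study's workflow: the function name is an assumption, the motif used is the 12 nt prefix (TGGTATCAGAGC) shared by the two binding-site sequences reported above, and the genome is a short toy sequence. The sketch rotates an assembled circular sequence so that nucleotide 1 is the first base of the motif, including the case where the motif spans the arbitrary start of the assembly.

```python
def rotate_to_origin(genome: str, motif: str) -> str:
    """Rotate a circular sequence so that it starts at the single occurrence of `motif`."""
    genome = genome.upper()
    motif = motif.upper()
    # Append the first bases so that a motif spanning the assembly start is detected.
    extended = genome + genome[: len(motif) - 1]
    hits = [i for i in range(len(genome)) if extended.startswith(motif, i)]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one motif occurrence, found {len(hits)}")
    start = hits[0]
    return genome[start:] + genome[:start]


# 12 nt prefix common to the binding sites reported in the text.
TRNA_MET_PREFIX = "TGGTATCAGAGC"

# Toy circular assembly for illustration only.
toy = "AAACCC" + "TGGTATCAGAGCTTGGTT" + "GGGTTT"
print(rotate_to_origin(toy, TRNA_MET_PREFIX))
```

Requiring exactly one hit is a deliberate design choice in this sketch: an ambiguous or missing motif should stop the analysis rather than silently select an arbitrary origin.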

SnapGene® software (www.snapgene.com; GSL Biotech) was used to predict the presence of putative ORFs on the plus strand of the three full-length sequences. FJ14 and PNG10 were predicted to have three ORFs, while SAM01 was predicted to have four ORFs, with the size and arrangement consistent with other published badnavirus sequences (Fig. 1). For FJ14, PNG10 and SAM01, ORF 1 was predicted to encode a putative protein of 142 amino acids (aa) with a calculated Mr of 16.3 kDa, 16.5 kDa and 16.6 kDa, respectively. A conserved domain was identified within the ORF 1 protein (pfam07028: DUF1319) (Finn et al., 2016), which is a member of the cl06184 superfamily that appears to be restricted to badnaviruses (Marchler-Bauer et al., 2015). ORF 2 of FJ14, PNG10 and SAM01 was predicted to encode a putative protein of 128 aa (Mr=13.8 kDa), 131 aa (Mr=14.5 kDa) and 121 aa (Mr=13.6 kDa), respectively. No conserved domains were identified in ORF 2 of any of the sequences. ORF 3 of FJ14, PNG10 and SAM01 was predicted to encode a putative polyprotein of 2005 aa (Mr=226.6 kDa), 1946 aa (Mr=221.7 kDa) and 1892 aa (Mr=213.8 kDa), respectively, with conserved domains for a movement protein, aspartic protease, reverse transcriptase and ribonuclease H, and an RNA-binding zinc finger-like domain (CXCX2CX4HX4C), predicted from the amino acid sequences.

In addition to the three typical ORFs found in badnaviruses, isolate SAM01 was predicted to have an additional ORF 4 of 417 nt (position 6656–7072), which partially overlaps ORF 3 and encodes a putative protein of 138 aa with a calculated Mr of 15.5 kDa. The size and genome position of this ORF are similar to those of a putative small ORF present in several other badnavirus genomes, including piper yellow mottle virus (PYMoV) (Hany et al., 2014), pagoda yellow mosaic associated virus (PYMAV) (Wang et al., 2014), yacon necrotic mottle virus (Lee et al., 2015), grapevine roditis leaf discoloration-associated virus (GRLDaV) (Maliogka et al., 2015) and rubus yellow net virus (Kalischuk et al., 2013). These small ORFs have little (5–20%) aa sequence homology and no conserved domains.
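The relationship between the reported ORF 4 coordinates, its nucleotide length and the length of the encoded protein can be checked with a short Python sketch. The function names are hypothetical, and the simple ATG-to-stop scanner is a generic illustration only; it is not the algorithm implemented in SnapGene, and it assumes ORFs begin at an ATG codon and end at the first in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_nt(start: int, end: int) -> int:
    """Length of an ORF given 1-based inclusive coordinates."""
    return end - start + 1


def protein_length_aa(orf_nt: int) -> int:
    """Number of encoded amino acids; the stop codon is not translated."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return orf_nt // 3 - 1


def plus_strand_orfs(seq: str, min_nt: int = 300) -> list[tuple[int, int]]:
    """Return (start, end) 1-based inclusive coordinates of ATG..stop ORFs."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i : i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_nt:
                    orfs.append((start + 1, i + 3))
                start = None
    return orfs


# Coordinates of SAM01 ORF 4 as reported in the text.
nt = orf_length_nt(6656, 7072)
print(nt, protein_length_aa(nt))  # prints: 417 138

# Toy sequence for illustration only.
print(plus_strand_orfs("CCATGGCTGCTGCTGCTGCTTAAGG", min_nt=9))
```

The arithmetic agrees with the values given above: 417 nt correspond to 139 codons, of which the final stop codon is not translated, leaving 138 aa.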


Fig. 1. Linearized representation of the genome organization of (A) DBESV (isolate FJ14), (B) DBALV2 (isolate PNG10) and (C) DBRTV2-[4RT] (isolate SAM01). Indicated are the putative tRNAmet-binding site (tRNAmet), TATA box and polyadenylation signal (polyA), together with open reading frames (ORF) 1, 2 and 3; ORF 3 shows the movement protein (MP), capsid protein (CP), zinc finger (Zn), aspartic protease (AP), reverse transcriptase (RT) and ribonuclease H (RNaseH) motifs.


To determine the taxonomic position of FJ14, PNG10 and SAM01, phylogenetic analysis and sequence comparison to published yam badnavirus sequences were carried out using the 529 bp RT/RNase H-coding region delimited by the BadnaFP/RP primer binding sites. The sequences were aligned using the CLUSTAL-W algorithm in MEGA7. A maximum likelihood method following pairwise sequence comparison using the Kimura-2-parameter model was used to construct a phylogenetic tree in MEGA7 (Kumar et al., 2016). This analysis showed that FJ14, PNG10 and SAM01 cluster into three distinct putative species groups, namely DeBV-A/K1, DeBV-B/K3 and T14, respectively (Fig. 2A), which is consistent with previous phylogenetic analyses of characterized yam-infecting badnaviruses (Kenyon et al., 2008; Bousalem et al., 2009; Eni et al., 2008; Seal et al., 2014; Bömer et al., 2016; Umber et al., 2016). Phylogenetic analysis was also carried out using the complete genome sequences of FJ14, PNG10, SAM01, badnaviruses infecting other host plants as well as members of other genera within the family Caulimoviridae. This analysis confirmed previous reports showing that yam-infecting badnaviruses cluster with DBALV, DBTRV, DBRTV1, DBRTV2 and DBSNV in a single subgroup (subgroup 4 in Fig. 2B) which is separate from badnaviruses infecting other hosts.
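For readers unfamiliar with the Kimura-2-parameter model named above, the pairwise distance it defines can be sketched as follows. This is a minimal illustration and not the MEGA7 implementation; the function name is hypothetical, and the sketch assumes two pre-aligned DNA sequences in which gaps and ambiguous bases are simply skipped. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Raises ValueError (math domain error) for saturated, highly divergent pairs.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Identical sequences give a distance of zero, and the correction grows relative to the raw proportion of differences as sequences diverge.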


Fig. 2. A) Phylogenetic tree constructed using the maximum likelihood method based on the partial RT/RNaseH-coding sequence of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01) and previously described badnavirus RT/RNaseH sequences from yam. Included in the analysis are (i) the monophyletic groups described by Bousalem et al. (2009), where DBV-A=dioscorea bacilliform virus A (A and B subgroups); DBV-B=dioscorea bacilliform virus B; DBV-C=dioscorea bacilliform virus C; DBV-D=dioscorea bacilliform virus D; DeBV-A=dioscorea esculenta bacilliform virus A; DeBV-B=dioscorea esculenta bacilliform virus B; DeBV-C=dioscorea esculenta bacilliform virus C; DeBV-D=dioscorea esculenta bacilliform virus D; DeBV-E=dioscorea esculenta bacilliform virus E; DeBV-F=dioscorea esculenta bacilliform virus F; and DpBV=dioscorea pentaphylla bacilliform virus, (ii) the 3 monophyletic groups (T13-T15) described by Bömer et al. (2016), (iii) the one monophyletic group (U12) denoted by Umber et al. (2016), and (iv) the 11 monophyletic groupings (K1-K11) described by Kenyon et al. (2008). Equivalent sequences from cacao swollen shoot virus (CSSV; NC_001574), banana streak OL virus (BSOLV; AJ002234), commelina yellow mottle virus (ComYMV; X52938), sugarcane bacilliform MO virus (SCBMOV; NC_008017), taro bacilliform virus (TaBV; AF357836) and the outgroup rice tungro bacilliform virus (RTBV; NC_001914; genus Tungrovirus) are also included. Bootstrap values are shown for 1000 replicates with a cut-off value of 85%. Asterisks mark the groups for which full genome representatives are available.

B) Maximum likelihood phylogenetic tree constructed from the alignment of complete genome sequences of DBESV (isolate FJ14), DBALV2 (isolate PNG10), DBRTV2-[4RT] (isolate SAM01) and selected badnaviruses. GenBank accession numbers are banana streak MY virus (BSMYV; AY805074), banana streak VN virus (BSVNV; AY750155), banana streak GF virus (BSGFV; AY493509), banana streak IM virus (BSIMV; HQ593112), banana streak OL virus (BSOLV; AJ002234), banana streak UA virus (BSUAV; HQ593107), banana streak UI virus (BSUIV; HQ593108), banana streak UL virus (BSULV; HQ593109), banana streak UM virus (BSUMV; HQ593110), Bougainvillea chlorotic vein banding virus (BCVBV; EU034539), commelina yellow mottle virus (ComYMV; X52938), cacao swollen shoot virus (CSSV; NC_001574), citrus yellow mosaic virus (CYMV; AF347695), dioscorea bacilliform AL virus (DBALV; X94578, X94580, X94582, X94575), dioscorea bacilliform SN virus (DBSNV; DQ822073), dioscorea bacilliform RT virus 1 (DBRTV1; KX008574), dioscorea bacilliform RT virus 2 (DBRTV2; KX008577), dioscorea bacilliform TR virus (DBTRV; KX430257), fig badnavirus 1 (FBV-1; JF411989), grapevine roditis leaf discoloration-associated virus (GRLDaV; HG940503), gooseberry vein banding associated virus (GVBaV; JQ316114), grapevine vein clearing virus (GVCV; JF301669), pagoda yellow mosaic associated virus (PYMAV; KJ013302), pineapple bacilliform CO virus (PBCOV; GU121676), piper yellow mottle virus (PYMoV; KC808712), rubus yellow net virus (RYNV; KM078034), sweet potato pakakuy virus (SPPV; FJ560943), sugarcane bacilliform Guadeloupe D virus (SCBGDV; FJ439817), sugarcane bacilliform Guadeloupe A virus (SCBGAV; FJ824813), sugarcane bacilliform IM virus (SCBIMV; AJ277091), sugarcane bacilliform MO virus (SCBMOV; NC_008017), taro bacilliform virus (TaBV; AF357836) and rice tungro bacilliform virus (RTBV; NC_001914). RTBV (genus Tungrovirus) was used as the outgroup to the genus Badnavirus. The topology of the tree supports the separation of badnaviruses into four groups as depicted by Bömer et al. (2016) and Wang et al. (2014). Bootstrap values are shown for 1000 replicates with a cut-off value of 70%.


The current species demarcation criterion for members of the genus Badnavirus is a difference in the nucleotide sequence of the core polymerase (RT/RNase H-coding) region of the genome of greater than 20% (King et al., 2012). Since the nucleotide sequences of these regions in FJ14 and PNG10 differ from other published full-length badnavirus sequences by more than 20%, these viruses probably represent new species in the genus Badnavirus. To maintain the convention established by previous authors who have described badnavirus species from Dioscorea spp., the names dioscorea bacilliform esculenta virus (DBESV) and dioscorea bacilliform alata virus 2 (DBALV2) are proposed for these putative new species from Fiji and PNG, respectively. Although SAM01 was also considered a putative new species at the time of analysis, this novel sequence from Samoa should now be considered an isolate (designated DBRTV2-[4RT]) of the recently characterized DBRTV2 (Bömer et al., 2016) from D. rotundata, based on 95% nucleotide identity between the two sequences in the RT/RNase H-coding region. The full-length sequences of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01) have been deposited in the GenBank database under the accession numbers KY827394, KY827395 and KY827393, respectively.
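The demarcation rule described above (more than 20% nucleotide difference in the RT/RNase H-coding region) can be expressed as a short check on a pairwise alignment. The Python sketch below is illustrative only: the function names and toy sequences are hypothetical, it uses an uncorrected proportion of differing sites, and it assumes the two sequences are already aligned with gaps written as "-".

```python
SPECIES_THRESHOLD_PERCENT = 20.0  # demarcation value quoted in the text


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gapped columns are not scored
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differences / compared


def exceeds_species_threshold(seq_a: str, seq_b: str) -> bool:
    """True when the pair differs by more than the 20% demarcation value."""
    return percent_difference(seq_a, seq_b) > SPECIES_THRESHOLD_PERCENT


# Toy 10-nt alignments: 10% and 30% difference respectively.
print(exceeds_species_threshold("ACGTACGTAC", "ACGTACGTAA"))
print(exceeds_species_threshold("ACGTACGTAC", "TCGAACGTAA"))
```

By this rule a pair sharing 95% identity, such as SAM01 and DBRTV2, differs by 5% and falls within a single species.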

DBESV has high sequence similarity (>99.8%) with partial sequences amplified using PCR from D. esculenta accessions from the Pacific (Fiji and PNG) as well as 82–83% sequence similarity with partial sequences generated from D. alata from PNG, Solomon Islands and Tonga (Kenyon et al., 2008; Fig. 2A sequence group DeBV-A/K1). Similarly, DBALV2 has high sequence similarity (90–92%) with partial sequences amplified using PCR from D. alata accessions from PNG and Vanuatu and 82% sequence similarity with one sequence from D. alata from the Philippines (Kenyon et al., 2008; Fig. 2A sequence group DeBV-B/K3). Interestingly, additional sequences from these two subgroups were not detected in a subsequent study investigating endogenous badnavirus sequences in yams (Bousalem et al., 2009). Our results confirm the extant nature of two of these badnavirus groups through sequencing of the episomal viral genome.

The present report confirms the occurrence of DBRTV2 in the Pacific, which has previously only been recorded from the African region. Interestingly, previous studies by Kenyon et al. (2008) and Bousalem et al. (2009) did not amplify sequences with high similarity to DBRTV2. This may be due to the poor representation of D. rotundata in the Pacific collection available to the former, or a result of the geographical sources of accessions used in the latter study, with DBRTV2 only previously reported in yam from Nigeria (Bömer et al., 2016).

As a preliminary study towards the development of diagnostic assays for these three badnavirus species, PCR primers (Table S2) were designed based on the sequences of DBESV and DBALV2 and using the consensus sequences of available DBRTV2 isolates (isolates SAM01, KX008577, KX008578 and KX00857). These primers were subsequently used in separate PCRs with TNA extracts from the 50 yam accessions originally screened by IC-PCR using the BadnaFP/RP primers. PCR was carried out using GoTaq Green, essentially as described previously, with primer annealing temperatures listed in Table S2. Amplicons of the expected size were only generated in the four yam samples (PNG03, PNG10, FJ14, SAM01) that had previously tested positive for episomal badnavirus by IC-PCR and RCA, with sequencing of the amplicons confirming their identity. Although these results suggest that the primers may be suitable for use in diagnostic PCR assays for DBESV, DBALV2 and DBRTV2-[4RT], further work is necessary to assess the genomic sequence variability of these viruses in order to design the most appropriate primers for virus detection. Badnaviruses are known to be highly variable at the genomic level, with Kenyon et al. (2008) reporting nucleotide differences of up to 18.0% and 16.3% in the RT/RNase H-coding region of DBESV and DBALV2 isolates, respectively.

Importantly, this initial PCR screening work suggests that integrated sequences of these three species are not present in the genomes of the yam species tested. Furthermore, BLAST analysis of Dioscorea spp. expressed sequence tag (EST) sequences available in GenBank, using the complete nucleotide sequences of DBESV, DBALV2 and DBRTV2-[4RT], failed to identify any published sequences with significant nucleotide sequence identity to these viruses, providing further evidence that endogenous counterparts of DBESV, DBALV2 and DBRTV2 are not present in D. alata, D. esculenta or in other Dioscorea spp. studied to date.

The recent increase in availability of full-length episomal sequences will improve the clarity around the endogenous or exogenous nature of yam badnaviruses and will help in the development of reliable diagnostics for badnaviruses in yam, thus enabling safe international exchange of yam germplasm.

Conflict of interest

The authors declare no conflict of interest.

Financial support

The funding for the project was provided by the Australian Centre for International Agricultural Research (#PC/2010/065). AS is a John Allwright Fellowship recipient.


Acknowledgements

The authors would like to thank the Secretariat of the Pacific Community for making their yam collections available for this project.

References

Bömer, M., Turaki, A.A., Silva, G., Kumar, P.L., Seal, S.E., 2016. A sequence-independent strategy for amplification and characterization of episomal badnavirus sequences reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

Benfey, P.N., Chua, N.H., 1990. The cauliflower mosaic virus 35S promoter: combinatorial regulation of transcription in plants. Science 250, 959–966.

Bhat, A.I., Hohn, T., Selvarajan, R., 2016. Badnaviruses: the current global scenario. Viruses 8, 177.

Bousalem, M., Durand, O., Scarcelli, N., Lebas, B.S.M., Kenyon, L., Marchand, J.-L., Lefort, F., Seal, S.E., 2009. Dilemmas caused by endogenous pararetroviruses regarding the taxonomy and diagnosis of yam (Dioscorea spp.) badnaviruses: analyses to support safe germplasm movement. Arch. Virol. 154, 297–314.

Briddon, R.W., Phillips, S., Brunt, A., Hull, R., 1999. Analysis of the sequence of Dioscorea alata bacilliform virus; comparison to other members of the badnavirus group. Virus Genes 18, 277–283.

Eni, A.O., Hughes, J.D., Asiedu, R., Rey, M.E.C., 2008. Sequence diversity among badnavirus isolates infecting yam (Dioscorea spp.) in Ghana, Togo, Benin and Nigeria. Arch. Virol. 153, 2263–2272.

FAOSTAT, 2014. Rome, Italy. http://faostat.fao.org.


Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, A.G., Tate, J., Bateman, A., 2016. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285.

Gayral, P., Noa-Carrazana, J.-C., Lescot, M., Lheureux, F., Lockhart, B.E.L., Matsumoto, T., Piffanelli, P., Iskra-Caruana, M.-L., 2008. A single banana streak virus integration event in the banana genome as the origin of infectious endogenous pararetrovirus. J. Virol. 82, 6697–6710.

Geering, A.D., 2014. Caulimoviridae (Plant pararetroviruses). John Wiley & Sons, Ltd, Chichester, UK.

Hany, U., Adams, I.P., Glover, R., Bhat, A.I., Boonham, N., 2014. The complete genome sequence of Piper yellow mottle virus (PYMoV). Arch. Virol. 159, 385–388.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011. Development of a novel rolling circle amplification technique to detect banana streak virus that also discriminates between integrated and episomal virus sequences. Plant Dis. 95, 57–62.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Mentjies, P., Drummond, A., 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649.

Kenyon, L., Lebas, B.S.M., Seal, S.E., 2008. Yams (Dioscorea spp.) from the South Pacific Islands contain many novel badnaviruses: implications for international movement of yam germplasm. Arch. Virol. 153, 877–889.


King, A.M.Q., Adams, M.J., Carstens, E.B., Lefkowitz, E.J., 2012. Virus taxonomy. In: Ninth Report of the International Committee on Taxonomy of Viruses. San Diego. Elsevier, 1338 pp.

Kleinow, T., Nischang, M., Beck, A., Kratzer, U., Tanwir, F., Preiss, W., Kepp, G., Jeske, H., 2009. Three C-terminal phosphorylation sites in the Abutilon mosaic virus movement protein affect symptom development and viral DNA accumulation. Virology 390, 89–101.

Kumar, S., Stecher, G., Tamura, K., 2016. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1854–1874.

Laney, A.G., Hassan, M., Tzanetakis, I.E., 2012. An integrated badnavirus is prevalent in fig germplasm. Phytopathology 102, 1182–1189.

Lee, Y.-J., Kwak, H.-R., Lee, Y.-K., Kim, M.-K., Choi, H.-S., Seo, J.-K., 2015. Complete genome sequence of yacon necrotic mottle virus, a novel putative member of the genus Badnavirus. Arch. Virol. 160, 1139–1142.

Maliogka, V.I., Olmos, A., Pappi, P.G., Lotos, L., Efthimiou, K., Grammatikaki, G., Candresse, T., Katis, N.I., Avgelis, A.D., 2015. A novel grapevine badnavirus is associated with the Roditis leaf discoloration disease. Virus Res. 203, 47–55.

Marchler-Bauer, A., Derbyshire, M.K., Gonzales, N.R., Lu, S., Chitsaz, F., Geer, L.Y., Geer, R.C., He, J., Gwadz, M., Hurwitz, D.I., Lanczycki, C.J., Lu, F., Marchler, G.H., Song, J.S., Thanki, N., Wang, Z., Yamashita, R.A., Zhang, D., Zheng, C., Bryant, S.H., 2015. CDD: NCBI's conserved domain database. Nucleic Acids Res. 43, D222–D226.

Medberry, S.L., Olszewski, N.E., 1993. Identification of cis elements involved in Commelina yellow mottle virus promoter activity. Plant J. 3, 619–626.


Medberry, S.L., Lockhart, B.E.L., Olszewski, N.E., 1990. Properties of Commelina yellow mottle virus's complete DNA sequence, genomic discontinuities and transcript suggest that it is a pararetrovirus. Nucleic Acids Res. 18, 5505–5513.

Mignouna, H.D., Abang, M.M., Asiedu, R., 2008. Genomics of yams, a common source of food and medicine in the tropics. In: Moore, P.H., Ming, R. (Eds.), Plant Genetics and Genomics: Crops and Models. Springer, New York, pp. 549–570.

Seal, S., Muller, E., 2007. Molecular analysis of a full-length sequence of a new yam badnavirus from Dioscorea sansibarensis. Arch. Virol. 152, 819–825.

Seal, S., Turaki, A., Muller, E., Kumar, P.L., Kenyon, L., Filloux, D., Galzi, S., Lopez-Montes, A., Iskra-Caruana, M.L., 2014. The prevalence of badnaviruses in West African yams (Dioscorea cayenensis-rotundata) and evidence of endogenous pararetrovirus sequences in their genomes. Virus Res. 186, 144–154.

Stavolone, L., Herzog, E., Leclerc, D., Hohn, T., 2001. Tetramerization is a conserved feature of the virion-associated protein in plant pararetroviruses. J. Virol. 75, 7739–7743.

Su, L., Gao, S., Huang, Y., Ji, C., Wang, D., Ma, Y., Fang, R., Chen, X., 2007. Complete genomic sequence of Dracaena mottle virus, a distinct badnavirus. Virus Genes 35, 423–429.

Umber, M., Gomez, R.-M., Gélabale, S., Bonheur, L., Pavis, C., Teycheney, P.Y., 2016. The genome sequence of Dioscorea bacilliform TR virus, a member of the genus Badnavirus infecting Dioscorea spp., sheds light on the possible function of endogenous Dioscorea bacilliform viruses. Arch. Virol. 162, 517–521.


Wang, Y., Cheng, X., Wu, X., Wang, A., Wu, X., 2014. Characterization of complete genome and small RNA profile of pagoda yellow mosaic associated virus, a novel badnavirus in China. Virus Res. 188, 103–108.

Yang, I.C., Hafner, G.J., Dale, J.L., Harding, R.M., 2003. Genomic characterization of taro bacilliform virus. Arch. Virol. 148, 937–949.


Supplementary Information

Table S1. Arrangement of genome features of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01).

For each ORF: length (bp); start-stop position (frame); codon use; amino acids; protein (kDa). For transcriptional elements: TATA-box (position); Poly(A)-tail (position).

DBESV (isolate FJ14): genome length 8106 bp (46.2% G + C)
ORF 1: 429 bp; 824-1252 (+2); atg-tga; 142 aa; 16.3 kDa
ORF 2: 387 bp; 1249-1635 (+1); atg-tga; 128 aa; 13.8 kDa
ORF 3: 6020 bp; 1632-7651 (+3); atg-tga; 2005 aa; 226.6 kDa
Transcriptional elements: TATA-box ttcTATATAAgac (7956-7962); <127>; Poly(A)-tail ATAAA (8090-8094)

DBALV2 (isolate PNG10): genome length 7871 bp (42.4% G + C)
ORF 1: 429 bp; 769-1197 (+1); atg-tga; 142 aa; 16.5 kDa
ORF 2: 396 bp; 1194-1589 (+3); atg-taa; 131 aa; 14.5 kDa
ORF 3: 5841 bp; 1589-5841 (+2); atg-tga; 1946 aa; 221.7 kDa
Transcriptional elements: TATA-box ctcTATATAAgct (7694-7700); <93>; Poly(A)-tail AATAAA (7794-7799)

DBRTV2-[4RT] (isolate SAM01): genome length 7426 bp (43.9% G + C)
ORF 1: 429 bp; 398-829 (+2); atg-tga; 142 aa; 16.6 kDa
ORF 2: 366 bp; 823-1188 (+1); atg-tga; 121 aa; 13.6 kDa
ORF 3: 5679 bp; 1183-6863 (+3); atg-tag; 1892 aa; 213.9 kDa
ORF 4: 417 bp; 6656-7072 (+2); atg-taa; 138 aa; 15.50 kDa
Transcriptional elements: TATA-box gccTATATAAgta (7255-7261); <88>; Poly(A)-tail AATAAA (7350-7355)


Table S2. PCR primers used for the detection of DBESV (isolate FJ14), DBALV2 (isolate PNG10) and DBRTV2-[4RT] (isolate SAM01).

Columns: target virus; primer name; sequence (5'-3'); Tm (°C); amplicon size (bp).

DBRTV2-[4RT] (isolate SAM01); YBV_GPT14-FP1; TCTCGCAGTTTACATCGATGA; 53; 438
DBRTV2-[4RT] (isolate SAM01); YBV_GPT14-RP1; TTRGGYGGGATCTCCARTTC; 53; 438
DBALV2 (isolate PNG10); DBALV2_FP1; RCAGGAGCATGAGCAGCATT; 57; 316
DBALV2 (isolate PNG10); DBALV2_RP1; CATCCTCTTTTCGCCATTTGG; 57; 316
DBESV (isolate FJ14); DBESV_FP1; GTGAGCCAATTCTGAGGGCT; 60; 1027
DBESV (isolate FJ14); DBESV_RP1; GGTGGYAGYTGCARRTCAGG; 60; 1027
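Several primers in Table S2 contain IUPAC degenerate bases (R = A or G; Y = C or T), so each such primer represents a small pool of oligonucleotides. The Python sketch below, with a hypothetical function name, shows one way to enumerate the pool for a given primer; it is offered as an illustration and is not part of the primer design procedure used in this study.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every unambiguous sequence represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# YBV_GPT14-RP1 from Table S2 contains three two-fold degenerate positions.
variants = expand_degenerate("TTRGGYGGGATCTCCARTTC")
print(len(variants))
```

The number of variants is the product of the degeneracy at each position, which gives a quick measure of how dilute each individual oligonucleotide is in the primer pool.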


Chapter 4

Characterization of a novel member of the family Caulimoviridae infecting Dioscorea nummularia in the Pacific, which may represent a new genus of dsDNA plant viruses

Amit Sukal1,2, Dawit B Kidanemariam1, James Dale1, Robert M. Harding1 and Anthony James1*

1 Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, Queensland, Australia

2 Centre for Pacific Crops and Trees, Pacific Community, Suva, Fiji.

* Corresponding author:

E-mail address: [email protected] (APJ)

PLoS ONE 13:1-12


Abstract

We have characterized the complete genome of a novel circular double-stranded DNA virus, tentatively named Dioscorea nummularia-associated virus (DNUaV), infecting Dioscorea nummularia originating from Samoa. The genome of DNUaV comprised 8139 bp and contained four putative open reading frames (ORFs). ORFs 1 and 2 had no identifiable conserved domains, while ORF 3 had conserved motifs typical of viruses within the family Caulimoviridae including coat protein, movement protein, aspartic protease, reverse transcriptase and ribonuclease H. A transactivator domain, similar to that present in members of several Caulimoviridae genera, was also identified in the putative ORF 4. The genome size, organization, and presence of conserved amino acid domains are similar to other viruses in the family Caulimoviridae. However, based on nucleotide sequence similarity and phylogenetic analysis, DNUaV appears to be a distinct novel member of the family and may represent a new genus within the family Caulimoviridae.

Introduction

Yams (Dioscorea spp.) are ranked as the fourth most important root crop by production after potato, cassava and sweet potato. They provide a staple food source for millions of people in Africa, the Caribbean, South America, Asia and the Pacific (1), while wild yams provide a valuable food source in times of famine. Yam production is highest in West Africa, which accounts for 95% of the world’s total production (2). Although most of the production occurs in the African region, predominated by Dioscorea rotundata-cayenensis, yam is of importance in South Pacific countries where D. alata and D. esculenta are the dominant species (3) with some scattered cultivation of D. rotundata, D. bulbifera, D. nummularia, D. transversa and D. trifida throughout the region. Yam cultivation and improvement in the Pacific face many agronomical challenges including yield losses due to pests and diseases (4,5). To help address these issues, as well as improve food security and facilitate commercial agricultural opportunities in the Pacific region, access to germplasm from the Pacific and other regions (such as Africa) is needed for possible exploitation. An important collection of Pacific yam germplasm is conserved in tissue culture at the Centre for Pacific Crops and Trees (CePaCT) of the Pacific Community (SPC), Suva, Fiji. This collection, together with yam germplasm from the International Institute of Tropical Agriculture (IITA) in West Africa, could hold the key to addressing the problems faced with yam cultivation in the Pacific. However, like many other vegetatively propagated crops such as sugarcane, banana, cassava, aroids and sweet potato, yams are prone to virus infection and accumulation. Therefore, the identification of viruses infecting the crop and the development of reliable diagnostic tests is critical to facilitate the safe exchange and utilization of yam germplasm.

Viruses belonging to the families Alphaflexiviridae (genus Potexvirus), Betaflexiviridae (genus Carlavirus), Bromoviridae (genus Cucumovirus), Caulimoviridae (genus Badnavirus), Potyviridae (genera Macluravirus and Potyvirus), Secoviridae (genera Comovirus and Fabavirus) and Tombusviridae (genus Aureusvirus) are known to infect yams (6,7). Of these, viruses belonging to the family Caulimoviridae remain the least studied and the most difficult to diagnose due to their significant genetic variability and, in some cases, the presence of integrated viral sequences in the host genome (8–10).

The family Caulimoviridae consists of eight genera of reverse transcribing, double-stranded DNA (dsDNA)-containing plant viruses, which are primarily distinguished from each other based on particle morphology and genome organization (11,12). Six of the genera, namely Caulimovirus, Cavemovirus, Petuvirus, Rosadnavirus, Soymovirus and Solendovirus, have isometric virions that are 52 nm in diameter, while two genera, Badnavirus and Tungrovirus, have bacilliform virions with dimensions of 30 x 130 to 150 nm (11,13). All family members have a genome size between 7.2 and 9.2 kb with the coding capacity on the plus-strand. To date, only species belonging to the genus Badnavirus have been identified from yams, namely Dioscorea bacilliform alata virus (DBALV), DBALV2, Dioscorea bacilliform esculenta virus (DBESV), Dioscorea bacilliform rotundata virus 1 (DBRTV1), DBRTV2, DBRTV3, Dioscorea bacilliform sansibarensis virus (DBSNV) and Dioscorea bacilliform trifida virus (DBTRV) (9,14–18). In addition to these full-length viral sequences, a large number of partial reverse transcriptase (RT)-ribonuclease H (RNase H) sequences which cluster within numerous different monophyletic groups have also been PCR-amplified from yam germplasm (3,9,19–22). While the majority of these groups cluster within the genus Badnavirus, several groups do not cluster with any recognized genera within the family Caulimoviridae (3,21). Whether these sequences are derived from episomal or integrated viral sequences or from another source such as retrotransposons is unknown since they were generated by PCR.

In 2014, a project was initiated to characterize the diversity of badnaviruses infecting yams in the Pacific region. In this paper, we report the identification of a putative new member of the family Caulimoviridae from yam, tentatively named Dioscorea nummularia-associated virus (DNUaV). The genome properties and organization of DNUaV are described and its relationship to other members of the family Caulimoviridae is discussed.


Materials and methods

Plant material and nucleic acid extraction

CePaCT maintains an in vitro collection of yams (278 accessions) comprising seven different species: D. alata (n=193), D. rotundata (n=32), D. esculenta (n=41), D. bulbifera (n=8), D. nummularia (n=2), D. transversa (n=1) and D. trifida (n=1) originating from Africa (obtained from IITA, Ibadan, Nigeria), Papua New Guinea (PNG), Vanuatu, New Caledonia, Federated States of Micronesia (FSM), Samoa and Tonga. Following screenhouse acclimatization for three months, leaf samples from 173 plants representative of the collection were used in this study. Total nucleic acid (TNA) was extracted using a CTAB protocol (23) from approximately 100 mg of fresh leaf tissue. The purified TNA was treated with RNase A (1 μg/μl) and the concentration adjusted to 500 ng/μl with sterile nuclease-free water.

RCA and sequencing

RCA was done essentially as described previously (24). Briefly, 1 μl of TNA extract was used as template in RCA using the TempliPhi™ 100 Amplification Kit (GE Healthcare, UK) with the addition of 1 μl of 10 mM 3’-exonuclease-protected degenerate badnavirus primers BadnaFP/RP (25) to bias amplification towards badnavirus DNA.

RCA products were independently digested with EcoRI, KpnI, SphI and StuI restriction endonucleases which were selected following in silico restriction analysis of published yam badnavirus genome sequences, or based on experimental experience, to generate useful restriction profiles. Digested RCA products were electrophoresed through 1% agarose gels at 100 V for 1 h. Restriction fragments of approximately full-length genome size (7-8 kb) were excised and ligated into appropriately digested and de-phosphorylated pUC19. Plasmids were first screened via restriction analysis to ensure desired inserts were present, then subjected to Sanger sequencing using either universal M13 primers or BadnaFP/RP primers. The resulting sequences were used to query the National Center for Biotechnology Information (NCBI) database (www.ncbi.nlm.nih.gov) with the BLASTn and BLASTx search functions. Where BLAST analysis yielded a match to viral sequences, primer walking using sequence-specific primers was used to generate full-length sequences in both directions.
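The in silico restriction analysis mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the procedure actually used: the function names and the toy sequence are hypothetical, the genome is treated as circular, and only the well-known palindromic recognition sequences of the four enzymes are searched, so a single strand suffices.

```python
# Recognition sequences of the four enzymes used in this study (all palindromic).
ENZYMES = {
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "SphI": "GCATGC",
    "StuI": "AGGCCT",
}


def find_sites(genome: str, site: str) -> list[int]:
    """0-based start positions of a recognition site in a circular genome."""
    genome = genome.upper()
    wrapped = genome + genome[: len(site) - 1]  # lets a match span the origin
    positions = []
    start = wrapped.find(site)
    while start != -1:
        positions.append(start)
        start = wrapped.find(site, start + 1)
    return positions


def fragment_sizes(genome: str, site: str) -> list[int]:
    """Fragment sizes after complete digestion; an empty list means uncut."""
    positions = find_sites(genome, site)
    n = len(genome)
    sizes = []
    for i, pos in enumerate(positions):
        next_pos = positions[(i + 1) % len(positions)]
        # A single site linearizes the circle, giving one full-length fragment.
        sizes.append((next_pos - pos) % n or n)
    return sorted(sizes, reverse=True)


# Hypothetical 38-nt toy "genome" with one SphI site and two EcoRI sites.
toy = "AAAAGCATGCTTTTTTTTTTGAATTCCCCCGAATTCGG"
for name, site in ENZYMES.items():
    print(name, fragment_sizes(toy, site))
```

An enzyme with a single site in a circular genome yields one full-length linear fragment, which is the profile sought when cloning putative full-length genomes from RCA products.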

To confirm the sequences spanning putative restriction sites, PCR was carried out using sequence-specific primers flanking the region. PCR mixes consisted of 10 μl of 2x GoTaq Green Master Mix (Promega, USA), 5 pmol of each sequence-specific primer and 1 μl of DNA extract (diluted to ~50 ng/μl) in a final volume of 20 μl. PCR cycling conditions were as follows: initial denaturation at 94°C for 2 min followed by 35 cycles of 94°C for 20 s, 50°C for 30 s and 72°C for 2 min, with a final extension at 72°C for 10 min. Amplicons were cloned into pGEM®-T Easy (Promega, USA) and sequenced with primers M13F/R as described previously.
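The cycling conditions above can be written down as a small data structure, which makes the programme easy to check and to reuse. The Python sketch below is illustrative only; the class and variable names are hypothetical, and the calculated duration counts hold times only, ignoring the ramp times of any particular thermocycler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temperature_c: int
    seconds: int


# Values taken from the cycling conditions described in the text.
INITIAL_DENATURATION = Step(94, 120)
CYCLE = (Step(94, 20), Step(50, 30), Step(72, 120))
N_CYCLES = 35
FINAL_EXTENSION = Step(72, 600)


def total_hold_seconds() -> int:
    """Sum of programmed hold times, excluding temperature ramping."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL_DENATURATION.seconds + N_CYCLES * per_cycle + FINAL_EXTENSION.seconds


print(total_hold_seconds() / 60)  # programmed hold time in minutes
```

For this programme the hold times sum to 6670 s, or a little over 111 min before ramping is taken into account.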

Putative full-length sequences were assembled using Geneious v11.0.5 (26). SnapGene® software (www.snapgene.com; GSL Biotech) and ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) were used to predict putative ORFs on the plus-strand of the assembled full-length sequences. InterPro software was used to scan protein databases for conserved domains (27), while BLASTn and BLASTx were used to search for sequence homologies in GenBank.


Sequence comparisons and phylogenetic analyses

Pairwise sequence comparison (PASC) was done using sequences corresponding to amino acid residues L269-R672 of the cauliflower mosaic virus (CaMV) polymerase (pol) gene. This region includes the conserved motifs of the RT- and RNase H-coding regions (28) and is currently used for the demarcation of species in the family Caulimoviridae (12). Nucleotide or translated amino acid sequences were aligned using ClustalW alignment in MEGA7 (29). Phylogenetic analyses were done using the nucleotide sequences of either the 529 bp RT/RNase H-coding region delineated by the BadnaFP/RP primer binding sites or the pol gene sequences described above. Sequences were aligned using ClustalW and phylogenetic trees were constructed using the maximum-likelihood method (Kimura-2-parameter model) in MEGA7 with 1000 bootstrap replicates.
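Bootstrap support of the kind reported here rests on resampling alignment columns with replacement and rebuilding the tree from each pseudo-replicate. The Python sketch below illustrates only the resampling step; it is not the MEGA7 implementation, and the function name and toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Resample alignment columns with replacement to build one pseudo-replicate."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # The same column indices are applied to every sequence so that
    # homologous positions stay together.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


toy_alignment = {"seqA": "ACGTAC", "seqB": "ACGTTT", "seqC": "ACCTTA"}
rng = random.Random(2018)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy_alignment, rng) for _ in range(1000)]
print(len(replicates))
```

In a full analysis a tree would be inferred from each replicate, and the support for a clade would be the percentage of replicate trees in which it appears.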

Viral DNA detection

Specific primers DNUaV-ORF4-FP1 (5'-CCGGGTTGCCAGTACAGAAT-3') and DNUaV-ORF4-RP1 (5'-CGTGAAGCACCCAAACCTTG-3') were designed following sequence analysis to amplify a 450 bp region of the putative ORF 4 sequence. PCR was carried out using GoTaq Green essentially as described previously using 57°C as the annealing temperature. Amplicons were cloned and sequenced as described earlier.
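Basic properties of the two primers can be checked with a short calculation. The Python sketch below computes GC content and a rough melting temperature using the Wallace rule of thumb (2°C per A or T, 4°C per G or C). The choice of this rule is an assumption made for illustration; the source does not state how the primers or the annealing temperature were chosen, and the function names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Approximate melting temperature (°C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


PRIMERS = {
    "DNUaV-ORF4-FP1": "CCGGGTTGCCAGTACAGAAT",
    "DNUaV-ORF4-RP1": "CGTGAAGCACCCAAACCTTG",
}
for name, sequence in PRIMERS.items():
    print(name, len(sequence), gc_content(sequence), wallace_tm(sequence))
```

Both primers are 20 nt long with 11 G or C bases, so this simple rule gives the same estimate for each; more accurate nearest-neighbour methods would give different absolute values.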


Results

Identification of DNUaV

Of the 173 samples analysed, none of which showed symptoms, 35 yielded restriction profiles indicative of the presence of badnaviruses. Restriction analysis of RCA products derived from two Samoan D. nummularia accessions (DN/WSM-01 and DN/WSM-02) using SphI and StuI resulted in putative full-length products (~8 kb), while KpnI gave no digest products and digestion using EcoRI resulted in a number of products smaller than 3.5 kb. These profiles were inconsistent with those expected for known yam-infecting badnaviruses based on analysis of full-length sequences available in GenBank. Therefore, the putative full-length SphI digested fragments were cloned and sequenced. Sequences (~700 bp) originating from the termini of the ~8 kb SphI fragments from both samples showed no nucleotide similarity with published viral sequences. However, BLASTx analysis revealed that the putative amino acid sequence from one end of the cloned fragments had low (32%) similarity to the ORF 1 protein of the badnavirus, cacao yellow vein-banding virus (CYVBV), and 31% similarity to the ORF 1 protein of the tungrovirus, rice tungro bacilliform virus. Sequencing of the cloned fragments was subsequently carried out using the degenerate badnavirus primers BadnaFP/RP. Sequences (~700 bp) were only obtained using primer BadnaFP, with BLASTn analysis revealing 73-75% identity with two partial RT/RNase H-coding sequences of a Dioscorea bacilliform virus derived from D. nummularia (GenBank accessions AM072692 and AM421696).

Since the sequences of the two 8 kb-SphI clones from isolates DN/WSM-01 and DN/WSM-02 showed 99% nucleotide similarity, the complete genomic sequence of only one isolate, DN/WSM-01, was determined. This sequence was obtained from three independent clones using primer walking, and the presence of the single SphI restriction site was confirmed through additional PCR analysis and sequencing.

Genome organization, sequence and phylogenetic analysis

The complete genomic sequence of the virus isolate derived from yam accession DN/WSM-01 was 8139 bp in length and was deposited in GenBank under the accession number MG944237. Consistent with the RFLP patterns observed, the genome contained five EcoRI sites, single SphI and StuI sites, and no KpnI site. The genome of isolate DN/WSM-01 contained four putative ORFs which comprised 450 nt (ORF 1), 384 nt (ORF 2), 4737 nt (ORF 3) and 1371 nt (ORF 4) (Fig 1). ORFs 1 and 2, and 2 and 3 overlapped, whereas ORFs 3 and 4 were separated by one nucleotide. Whereas ORFs 1 and 2 had overlapping stop/start codons (atga), the putative start codon of ORF 3 was located 47 nucleotides 5' of the ORF 2 stop codon (Fig 1). ORF 2 was in a -1 translational reading frame relative to ORF 1, while ORF 3 was in a +1 translational reading frame relative to ORF 2. The genome contained one large intergenic region (IR) between ORF 4 and ORF 1, which comprised 1247 nt and contained a putative tRNAmet binding site (5'-TGGTATCAGAGCAATGGT-3') with 88% nucleotide similarity to the plant tRNAmet consensus sequence (3'-ACCAUAGUCUCGGUCCAA-5'), which has been described as the priming site for reverse transcription (30). This was designated as the origin of the circular genome, consistent with the convention used for other members of the family Caulimoviridae. A TATA-box (TATATAA, nt 7944-7950) and polyadenylation signal (AAAAAATAA, nt 7981-7989), analogous to the 35S promoter of CaMV, were also identified in the region 5' of the tRNAmet site.
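The reported ORF sizes, overlaps and intergenic region can be cross-checked with a little arithmetic. The sketch below is only an illustration under stated assumptions: coordinates are counted from the first nucleotide of ORF 1 rather than from the genome origin, the 4 nt ORF 1/ORF 2 overlap is inferred from the shared atga stop/start motif, and the 47 nt ORF 2/ORF 3 overlap follows the value given in the Discussion. The names are illustrative.

```python
# ORF lengths (nt) as reported in the text.
ORF_LENGTHS = {"ORF1": 450, "ORF2": 384, "ORF3": 4737, "ORF4": 1371}

# Junctions between consecutive ORFs: negative = overlap, positive = gap (nt).
# -4 is an assumption inferred from the overlapping 'atga' stop/start codons;
# -47 and +1 follow the overlap and spacing described in the text.
JUNCTIONS = {("ORF1", "ORF2"): -4, ("ORF2", "ORF3"): -47, ("ORF3", "ORF4"): 1}

INTERGENIC_REGION = 1247  # nt, between ORF 4 and ORF 1


def orf_starts(lengths, junctions):
    """Start of each ORF, counted from the first nt of ORF 1 (= 0),
    plus the total length of the coding block."""
    starts, position, previous = {}, 0, None
    for name, length in lengths.items():
        if previous is not None:
            position += junctions[(previous, name)]
        starts[name] = position
        position += length
        previous = name
    return starts, position


def relative_frame(start_a: int, start_b: int) -> int:
    """Reading frame of ORF b relative to ORF a, reported as -1, 0 or +1."""
    shift = (start_b - start_a) % 3
    return -1 if shift == 2 else shift


starts, coding_block = orf_starts(ORF_LENGTHS, JUNCTIONS)
print("genome length:", coding_block + INTERGENIC_REGION)
print("ORF 2 vs ORF 1 frame:", relative_frame(starts["ORF1"], starts["ORF2"]))
print("ORF 3 vs ORF 2 frame:", relative_frame(starts["ORF2"], starts["ORF3"]))
```

Under these assumptions the coding block plus the intergenic region adds up to the reported genome length, and the relative frames agree with those stated above.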

Analysis of the translated ORF sequences failed to identify any conserved domains in ORFs 1 and 2. In contrast, comparative sequence analysis of ORF 3 revealed several functional domains shared by all members of the family Caulimoviridae including aspartic protease (Ala933-Ile1045, IPR021109), zinc finger (Cys703-Cys708, IPR001878), RT (Lys1187-Ile1348, IPR000477) and RNase H domains (Ser1469-Ala1574, IPR002156). In addition, a conserved movement protein (MP) domain corresponding to M1-E327 of the CaMV ORF 2 protein, and a coat protein (CP) domain corresponding to L261–N429 of the CaMV ORF 4 protein, were also identified. A transactivator (TAV) domain (Tyr80-Thr122, pfam01693), similar to that present in ORF 6 of caulimoviruses and soymoviruses, was identified in the putative ORF 4 sequence (Fig 1 and 2A-E).

When the full-length genome sequence of isolate DN/WSM-01 was used for BLAST analysis with the search restricted to viruses (taxid:10239), the highest nucleotide identity (70%) was to a 263 bp and 186 bp region of the RT domain of two members of the genus Badnavirus, namely DBRTV2 (accession KX008579) and cacao swollen shoot virus (CSSV; accession KX592572.1), respectively. BLAST analysis of the putative protein sequences encoded by ORFs 1-4 of isolate DN/WSM-01 revealed highest similarity with the ORF 1 protein of CYVBV (32%), ORF 2 protein of taro bacilliform virus (32%), ORF 3 polyprotein of fig badnavirus 1 (41%), while ORF 4 had 26% similarity to amino acids 1443 to 1544 of Piper DNA virus 1.


Fig 1. Schematic representation of the genome organization of Dioscorea nummularia-associated virus (DNUaV). Large arrows represent the putative ORFs. Conserved protein domains are shown: dark blue=movement protein (MP) domain corresponding to M1-E327 of CaMV ORF 2 protein; green=the putative coat protein (CP) domain corresponding to L261–N429 of the CaMV ORF 4 protein; black=zinc finger (Zn); red=pepsin-like aspartic protease (AP); orange=reverse transcriptase (RT); purple=RNase H (RH); light blue=transactivator (TAV) domain. The tRNAmet binding site (tRNAmet) and regulatory sequences including TATA box (TATA) and polyadenylation signal (PolyA) are also shown. The relative positions of restriction sites, based on the complete genome sequence, are shown above the genome.


Fig 2. Amino acid sequence alignments of the conserved motifs in the proteins of the type member of each genus in the family Caulimoviridae. The type member for each genus within the family Caulimoviridae used for comparison is as follows: Caulimovirus - cauliflower mosaic virus (CaMV; V00141), Badnavirus - Commelina yellow mottle virus (ComYMV; X52938), Cavemovirus - cassava vein mosaic virus (CsVMV; U59751), Petuvirus - Petunia vein clearing virus (PVCV; U95208), Tungrovirus - rice tungro bacilliform virus (RTBV; NC001914), Rosadnavirus - rose yellow vein virus (RYVV; JX028536), Soymovirus - soybean chlorotic mottle virus (SbCMV; X15828), Solendovirus - tobacco vein clearing virus (TVCV; AF190123). Identical (asterisk/bold font), conserved (colon) and weakly conserved (dot) residues among the members of the family are indicated.


PASC using either nucleotide or translated amino acid sequences of the pol gene revealed an identity of 42 to 58% or 27 to 53%, respectively, between DNUaV and the type species for each genus in the family Caulimoviridae (Table 1).
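To make the identity figures concrete, the sketch below shows how a percent identity can be scored from two already-aligned sequences. It is a simplified illustration and not the procedure implemented by the PASC tool, which performs its own alignment; the treatment of gaps (gap-versus-residue columns count as mismatches, gap-versus-gap columns are skipped) is an assumption, and the toy sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns of two equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column is empty in both sequences: skip it
        compared += 1
        if x == y:  # a gap opposite a residue never matches
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment (invented): 6 identical columns out of 8 compared.
print(percent_identity("ATGCC-GA", "ATGACTGA"))
```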

Phylogenetic analysis using partial RT/RNase H-coding sequences showed that DNUaV forms a distinct subgroup outside of the genus Badnavirus, together with several published sequences (GenBank accessions KY555561, AM072692 and AM421696) previously reported from yams (Fig 3A). A similar tree topology, with DNUaV clustering separately from recognized Caulimoviridae genera, was obtained when pol nucleotide sequences from published full-length sequences were analyzed (Fig 3B).

PCR screening for DNUaV

Using primers designed to amplify a 450 bp region of DNUaV ORF 4, the 173 samples used in this study were tested for DNUaV by PCR. The expected size amplicon was only generated from the Samoan D. nummularia samples, DN/WSM-01 and DN/WSM-02. Sequence analysis of the cloned PCR amplicons from the two samples revealed 99% similarity to each other and to the DNUaV ORF 4 sequence generated using RCA.
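The expected amplicon size for a primer pair can be checked in silico before screening. The sketch below is a minimal illustration that assumes exact primer matches on the top strand of a linear template; the function names are illustrative, and the toy template is constructed artificially (it is not the DNUaV sequence) so that the primer sites lie 450 bp apart.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def in_silico_amplicon(template: str, forward: str, reverse: str):
    """Amplicon length for exact primer matches on the top strand, else None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the top strand as its reverse complement.
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return end + len(reverse_site) - start


FORWARD = "CCGGGTTGCCAGTACAGAAT"  # DNUaV-ORF4-FP1, from the text
REVERSE = "CGTGAAGCACCCAAACCTTG"  # DNUaV-ORF4-RP1, from the text

# Artificial template: flank + forward site + filler + reverse site + flank.
toy_template = "TTTT" + FORWARD + "A" * 410 + reverse_complement(REVERSE) + "GGGG"
print(in_silico_amplicon(toy_template, FORWARD, REVERSE))
```

A real check would also need to tolerate mismatches and consider both strands and circular templates, which this sketch deliberately leaves out.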


Table 1. Mean pairwise nucleotide (above diagonal) and amino acid (below diagonal) similarity (%) between the pol gene of DNUaV and the type members of the eight current genera within the family Caulimoviridae (a)

          DNUaV   CaMV   ComYMV   CsVMV   PVCV   RTBV   RYVV   SbCMV   TVCV
DNUaV       -      48      58       48     45     54     47      45     47
CaMV       41       -      46       46     51     47     48      48     47
ComYMV     53      36       -       43     42     49     45      42     43
CsVMV      36      36      32        -     45     48     54      45     64
PVCV       32      42      29       32      -     43     46      44     44
RTBV       45      36      40       33     30      -     46      43     48
RYVV       39      40      35       39     34     35      -      44     51
SbCMV      32      39      28       27     29     27     32       -     43
TVCV       38      37      32       48     30     36     37      28      -

(a) Abbreviations for the type members of each genus are: Caulimovirus - cauliflower mosaic virus (CaMV; V00141), Badnavirus - Commelina yellow mottle virus (ComYMV; X52938), Cavemovirus - cassava vein mosaic virus (CsVMV; U59751), Petuvirus - Petunia vein clearing virus (PVCV; U95208), Tungrovirus - rice tungro bacilliform virus (RTBV; NC001914), Rosadnavirus - rose yellow vein virus (RYVV; JX028536), Soymovirus - soybean chlorotic mottle virus (SbCMV; X15828), Solendovirus - tobacco vein clearing virus (TVCV; AF190123).


Fig 3. Phylogenetic analysis using the maximum-likelihood method following ClustalW alignment in MEGA7 (29) to infer evolutionary relationships of DNUaV. Bootstrap values (1,000 replicates) are shown above nodes when greater than 70%. (A) Phylogenetic tree constructed using sequences of the RT/RNase H-coding region delineated by the BadnaFP/RP primers (25). This analysis includes badnavirus RT/RNase H-coding sequences identified from yams (3,9,19-22), badnaviruses infecting other crops and the homologous region of other Caulimoviridae members (see S1 Table and S2 Table for the complete list of sequences used in the analysis). Subgroups representing published yam badnavirus sequences have been collapsed to improve the presentation of the tree, while the badnavirus groups that have a representative full genome sequence available are marked with an asterisk (*); (B) Phylogenetic tree using pol gene nucleotide sequences of DNUaV and representative members of the family Caulimoviridae (see S2 Table for the list of sequences included in the analysis). The pol gene sequences are equivalent to amino acid residues L269-R672 from the translated ORF 5 nucleotide sequence of cauliflower mosaic virus (CaMV).


Discussion

In this study, we identified and characterized a novel DNA virus infecting D. nummularia, which we have tentatively named Dioscorea nummularia-associated virus (DNUaV). Although the genome size and organization of DNUaV, and the presence of conserved amino acid domains, are typical of other viruses in the family Caulimoviridae, there are several molecular features of the virus that distinguish it from the current genera.

The ICTV uses several criteria to classify members of the family Caulimoviridae. The most common criterion for demarcation of species uses differences in the nucleotide sequence of the pol gene (AP/RT/RNase H-coding region) of more than 20%. Comparisons of the pol gene sequence of DNUaV with other Caulimoviridae showed the highest identity (76%) to a partial sequence of Dioscorea bacilliform virus isolate SB10a_Dn derived from D. nummularia (3). Based on differences in the nucleotide sequence identity of more than 20%, DNUaV appears to be a novel virus in the family Caulimoviridae.
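The demarcation criterion amounts to a simple comparison, sketched below. The threshold and the 76% identity are the values quoted above; the function name is illustrative, and the sketch covers only the sequence-difference criterion, not the other criteria used by the ICTV.

```python
SPECIES_THRESHOLD = 20.0  # % pol nucleotide difference, as quoted in the text


def exceeds_species_threshold(identity_percent: float,
                              threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the nucleotide difference (100 - identity) exceeds the threshold."""
    difference = 100.0 - identity_percent
    return difference > threshold


# Highest pol identity reported for DNUaV: 76%, i.e. a 24% difference.
print(exceeds_species_threshold(76.0))
```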

In addition to nucleotide sequence similarity, distinctions between genera within the family Caulimoviridae are also based on the type of host plant, particle morphology, genome organization and the presence and arrangement of conserved protein-coding motifs. DNUaV encodes four ORFs, with the size of ORFs 1-3 consistent with both badnavirus and tungrovirus members, as is the arrangement of the characteristic MP, CP, Zn-finger binding domain and the AP-RT-RNase H-coding regions of ORF 3. The relative positions of ORFs 1 and 2 are similar to those of badnaviruses, while ORFs 2 and 3 overlap each other by 47 nt, which is also similar to the badnaviruses CSSV, gooseberry vein banding virus, Piper yellow mottle virus and sweet potato pakakuy virus (31–33). However, unlike those badnaviruses with a fourth ORF which always overlaps with ORF 3, ORF 4 of DNUaV is separated from ORF 3 by a short intergenic region, which is more similar to the genome organization of RTBV, the sole member of the genus Tungrovirus. Further, the size of DNUaV ORF 4 is also similar to that of RTBV. Unlike RTBV, however, the DNUaV ORF 4 gene product contains a conserved translation transactivator domain, which is typical of ORF 6 of caulimoviruses and soymoviruses, and which is also present in ORF 4 of cavemoviruses and solendoviruses. However, unlike the DNUaV ORF 4 sequence, the ORF 4 sequences of both cavemoviruses and solendoviruses also include the conserved coiled-coil motifs characteristic of the virion-associated protein. Clearly, determination of virion morphology and whether infected plants contain inclusion bodies typical of members of the genus Caulimovirus is required before the taxonomic status of DNUaV can be fully resolved. However, based on the sequence information presented, DNUaV appears to be a distinct, novel member of the family Caulimoviridae.

PASC carried out using pol gene sequences showed 42 to 58% nucleotide or 27 to 53% amino acid sequence identity between DNUaV and the type members of each genus within the family Caulimoviridae (Table 1). This level of nucleotide sequence identity is typical of that between the established genera within the family Caulimoviridae, which ranges from 42 to 64% (Table 1). Further, the level of amino acid sequence identity is similar to the range of 27 to 48% identity between the type members of each genus. Of the eight type members included in the analysis, DNUaV shares the highest level of amino acid identity (53%) with ComYMV, the type member of the genus Badnavirus (Table 1), suggesting that DNUaV is most closely related to the badnaviruses. However, phylogenetic analyses using either partial RT/RNase H-coding sequences (Fig 3A) or pol gene sequences (Fig 3B) indicate that DNUaV is basal to, and distinct from, the badnaviruses, forming a distinct clade between the single member of the genus Tungrovirus, RTBV, and the genus Badnavirus. This suggests that DNUaV may belong in a new, distinct genus within the family Caulimoviridae.
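The between-genera ranges quoted above can be recomputed directly from Table 1. In the sketch below the values are transcribed from Table 1 (nucleotide identities above the diagonal, amino acid identities below it, with the DNUaV row and column separated out); the variable names are illustrative.

```python
TYPE_MEMBERS = ["CaMV", "ComYMV", "CsVMV", "PVCV", "RTBV", "RYVV", "SbCMV", "TVCV"]

# Amino acid identity (%) between DNUaV and each type member (Table 1, first column).
DNUAV_AA = dict(zip(TYPE_MEMBERS, [41, 53, 36, 32, 45, 39, 32, 38]))

# Nucleotide identities (%) among the type members (above diagonal, row by row).
NT_BETWEEN_GENERA = [
    [46, 46, 51, 47, 48, 48, 47],
    [43, 42, 49, 45, 42, 43],
    [45, 48, 54, 45, 64],
    [43, 46, 44, 44],
    [46, 43, 48],
    [44, 51],
    [43],
]

# Amino acid identities (%) among the type members (below diagonal, row by row).
AA_BETWEEN_GENERA = [
    [36],
    [36, 32],
    [42, 29, 32],
    [36, 40, 33, 30],
    [40, 35, 39, 34, 35],
    [39, 28, 27, 29, 27, 32],
    [37, 32, 48, 30, 36, 37, 28],
]


def value_range(rows):
    """Minimum and maximum over a triangular list of rows."""
    values = [v for row in rows for v in row]
    return min(values), max(values)


closest = max(DNUAV_AA, key=DNUAV_AA.get)
print("nucleotide range between genera:", value_range(NT_BETWEEN_GENERA))
print("amino acid range between genera:", value_range(AA_BETWEEN_GENERA))
print("closest type member by amino acid identity:", closest, DNUAV_AA[closest])
```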

Previous studies investigating the occurrence of badnaviruses in yams have reported large numbers of badnavirus partial RT/RNase H-coding (529 bp) sequences generated using the BadnaFP/RP primers (3,9,10,18–22). Phylogenetic analyses of these sequences identified four distinct sequence groups, namely, K12 and K13 (3) and T16 and T17 (21), which clustered into two monophyletic groups (K12/T16 and K13/T17) outside of the eight currently recognized genera within the family Caulimoviridae. Our phylogenetic analysis revealed that DNUaV clusters with the monophyletic group K12/T16 (Fig 3A). Since the sequences reported in these previous studies were obtained using a PCR-based approach, the authors were unable to confirm their episomal nature and so theorized that the sequence groups could represent either divergent badnaviruses, ancient endogenous pararetrovirus sequences, or possibly new genera within the family Caulimoviridae. The full-length DNUaV sequence presented here provides strong evidence that the sequences in group K12/T16 may also be derived from episomal virus(es) infecting yam.

When the yam germplasm collection held at CePaCT was tested for DNUaV using primers designed from DNUaV ORF 4, only 2/173 samples tested positive, both of which were D. nummularia from Samoa. Sequencing of the PCR products from the two accessions revealed 99% nucleotide sequence identity to the full-length RCA-derived sequence, indicating that the sequence was conserved in both isolates. These results suggest that DNUaV is not integrated into the genome of Dioscorea spp., as the only two samples that tested positive with PCR also tested positive using RCA. Sequences with high similarity to DNUaV have previously been identified from D. nummularia originating from the Solomon Islands (3); however, we were unable to obtain yam samples from the Solomon Islands for testing. The distribution of DNUaV in the Pacific needs to be determined, as the current sample set included only two D. nummularia accessions, both from Samoa.

This research builds on previous work (3,17) characterizing members of the family Caulimoviridae from yams in the Pacific and is important in confirming the episomal nature of reported sequences. An understanding of the diversity of episomal viruses infecting yam will enable genebanks to test their genetic resources to ensure safe distribution. The PCR protocol described here may be suitable for routine screening of yam germplasm collections for DNUaV.

Acknowledgments

The authors would like to thank the Centre for Pacific Crops and Trees (CePaCT) of the Pacific Community (SPC) for making their yam collections available for this project. The authors would also like to thank Dr. Michael Furlong and Dr. Grahame Jackson for their support and advice on this research.


References

1. FAOSTAT. Production Statistics (FAOSTAT). Food and Agriculture Organization of the United Nations, Rome; 2018.

2. Mignouna HD, Abang MM, Asiedu R. Genomics of yams, a common source of food and medicine in the tropics. In: Moore PH, Ming R, editors. Genomics of tropical crop plants. Springer, New York; 2008: 549–570.

3. Kenyon L, Lebas BSM, Seal SE. Yams (Dioscorea spp.) from the South Pacific Islands contain many novel badnaviruses: implications for international movement of yam germplasm. Arch Virol. 2008;153: 877–889.

4. SPYN. Yam: cultivar selection for disease resistance and commercial potential in Pacific Islands. CIRAD, Montpellier; 2003.

5. Sukal AC, Taylor M, Tuia VS. Viruses and their impact on the utilization of plant genetic resources in the Pacific. Acta Hortic. 2015;1101: 127–132.

6. Kenyon L, Shoyinka SA, Hughes Jd’A, Odu BO. An overview of viruses infecting Dioscorea yams in sub-Saharan Africa. In: Hughes Jd’A, Odu BO, editors. Plant Virology in Sub-Saharan Africa: Proceedings of a conference organized by IITA. International Institute of Tropical Agriculture, Ibadan; 2001: 432–439.

7. Menzel W, Thottappilly G, Winter S. Characterization of an isometric virus isolated from yam (Dioscorea rotundata) in Nigeria suggests that it belongs to a new species in the genus Aureusvirus. Arch Virol. 2014;159: 603–606.

8. Bhat A, Hohn T, Selvarajan R. Badnaviruses: The current global scenario. Viruses. 2016;8: 177.

9. Bömer M, Turaki AA, Silva G, Kumar PL, Seal SE. A sequence-independent strategy for amplification and characterization of episomal badnavirus sequences reveals three previously uncharacterized yam badnaviruses. Viruses. 2016;8: 188.

10. Seal S, Turaki A, Muller E, Kumar PL, Kenyon L, Filloux D, et al. The prevalence of badnaviruses in West African yams (Dioscorea cayenensis-rotundata) and evidence of endogenous pararetrovirus sequences in their genomes. Virus Res. 2014;186: 144–154.

11. Geering ADW. Caulimoviridae (Plant Pararetroviruses). In: Encyclopedia of Life Sciences. John Wiley & Sons Ltd, Chichester; 2014.

12. Geering ADW, Hull R. Family Caulimoviridae. In: King AMQ, Adams MJ, Carstens EB, Lefkowitz EJ, editors. Virus Taxonomy: Ninth report of the international committee on taxonomy of viruses. Elsevier Academic Press, Amsterdam; 2012: 429–443.

13. Hull R. Molecular biology of rice tungro viruses. Annu Rev Phytopathol. 1996;34: 275–297.

14. Bömer M, Rathnayake AI, Visendi P, Silva G, Seal SE. Complete genome sequence of a new member of the genus Badnavirus, Dioscorea bacilliform RT virus 3, reveals the first evidence of recombination in yam badnaviruses. Arch Virol. 2017;163: 533–538.

15. Briddon RW, Phillips S, Brunt A, Hull R. Analysis of the sequence of Dioscorea alata bacilliform virus; comparison to other members of the badnavirus group. Virus Genes. 1999;18: 277–283.

16. Seal S, Muller E. Molecular analysis of a full-length sequence of a new yam badnavirus from Dioscorea sansibarensis. Arch Virol. 2007;152: 819–825.

17. Sukal A, Kidanemariam D, Dale J, James A, Harding R. Characterization of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel species and the first report of Dioscorea bacilliform RT virus 2. Virus Res. 2017;238: 29–34.

18. Umber M, Gomez RM, Gélabale S, Bonheur L, Pavis C, Teycheney PY. The genome sequence of Dioscorea bacilliform TR virus, a member of the genus Badnavirus infecting Dioscorea spp., sheds light on the possible function of endogenous Dioscorea bacilliform viruses. Arch Virol. 2017;162: 517–521.

19. Bousalem M, Durand O, Scarcelli N, Lebas BSM, Kenyon L, Marchand JL, et al. Dilemmas caused by endogenous pararetroviruses regarding the taxonomy and diagnosis of yam (Dioscorea spp.) badnaviruses: Analyses to support safe germplasm movement. Arch Virol. 2009;154: 297–314.

20. Eni AO, Hughes JDA, Asiedu R, Rey MEC. Sequence diversity among badnavirus isolates infecting yam (Dioscorea spp.) in Ghana, Togo, Benin and Nigeria. Arch Virol. 2008;153: 2263–2272.

21. Turaki AA, Bömer M, Silva G, Kumar PL, Seal SE. PCR-DGGE analysis: Unravelling complex mixtures of badnavirus sequences present in yam germplasm. Viruses. 2017;9: 181.

22. Umber M, Filloux D, Muller E, Laboureau N, Galzi S, Roumagnac P, et al. The genome of African yam (Dioscorea cayenensis-rotundata complex) hosts endogenous sequences from four distinct badnavirus species. Mol Plant Pathol. 2014;15: 790–801.

23. Kleinow T, Nischang M, Beck A, Kratzer U, Tanwir F, Preiss W, et al. Three C-terminal phosphorylation sites in the abutilon mosaic virus movement protein affect symptom development and viral DNA accumulation. Virol. 2009;390: 89–101.


24. James AP, Geijskes RJ, Dale JL, Harding RM. Development of a novel rolling-circle amplification technique to detect banana streak virus that also discriminates between integrated and episomal virus sequences. Plant Dis. 2011;95: 57–62.

25. Yang IC, Hafner GJ, Dale JL, Harding RM. Genomic characterization of taro bacilliform virus. Arch Virol. 2003;148: 937–949.

26. Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, et al. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics. 2012;28: 1647–1649.

27. Finn RD, Attwood TK, Babbitt PC, Bateman A, Bork P, Bridge AJ, et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 2017;45: D190–D199.

28. Geering ADW, Scharaschkin T, Teycheney P-Y. The classification and nomenclature of endogenous viruses of the family Caulimoviridae. Arch Virol. 2010;155: 123–131.

29. Kumar S, Stecher G, Tamura K. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33: 1870–1874.

30. Medberry SL, Lockhart BEL, Olszewski NE. Properties of commelina yellow mottle virus’s complete DNA sequence, genomic discontinuities and transcript suggest that it is a pararetrovirus. Nucleic Acids Res. 1990;18: 5505–5513.

31. Hany U, Adams IP, Glover R, Bhat AI, Boonham N. The complete genome sequence of Piper yellow mottle virus (PYMoV). Arch Virol. 2014;159: 385–388.


32. Kreuze JF, Perez A, Untiveros M, Quispe D, Fuentes S, Barker I, et al. Complete viral genome sequence and discovery of novel viruses by deep sequencing of small RNAs: A generic method for diagnosis, discovery and sequencing of viruses. Virol. 2009;388: 1–7.

33. Petrzik K, Přibylová J, Špak J. Molecular analysis of gooseberry vein banding associated virus. Acta Virol. 2012;56: 119–124.


Supporting Information

S1 Table. Details of yam partial RT/RNase H-coding sequences used in the phylogenetic analysis of DNUaV.

Phylogenetic group   Isolate        GenBank accession   Reference
DeBV-A/K1            SB42 Da        AM072696            Kenyon et al., 2008
                     DBESV          KY827394            Sukal et al., 2017
                     FJ75c De       AM072663            Kenyon et al., 2008
DeBV-B/K2            PG137 De       AM072682            Kenyon et al., 2008
                     PH06a De       AM072688            Kenyon et al., 2008
DeBV-B/K3            PG141 Da       AM072683            Kenyon et al., 2008
                     VU227 Da       AM072704            Kenyon et al., 2008
                     DBALV2         KY827395            Kenyon et al., 2008
DsBV/K4              B394Ds         DQ822073            Seal & Muller, 2007
                     B396Ds         DQ822074            Seal & Muller, 2007
DBV-C/K5             FJ60b Dr       AM072659            Kenyon et al., 2008
                     Mt9122Dt       AM503398            Bousalem et al., 2009
                     NGl3841Dc      KX008585            Bömer et al., 2016
                     NGb0005Da2     KX008581            Bömer et al., 2016
                     NGl1950Dr      KX008589            Bömer et al., 2016
                     S1g6Dr         KF829974            Umber et al., 2014
DeBV-D/K6            FJ65c De       AM072661            Kenyon et al., 2008
                     PG180 De       AM072687            Kenyon et al., 2008
DeBV-E/K7            PG110bDe       AM072677            Kenyon et al., 2008
                     PH06b De       AM072689            Kenyon et al., 2008
DBV-A(A)/K8          Gn502Dr        AM503395            Bousalem et al., 2009
                     Gn845Dr        AM503397            Bousalem et al., 2009
                     NGb0477Dr      KX008586            Bömer et al., 2016
                     BfA103Dc       AM503393            Bousalem et al., 2009
                     NGb0310Da2     KX008583            Bömer et al., 2016
                     DaBVa          X94576, X94581      Briddon et al., 1999
                     NG3Da          AM944573            Bömer et al., 2016
                     VU249 Db       AM072705            Kenyon et al., 2008
                     DBALV-2ALb     KX008594            Bömer et al., 2016
                     DBALV-2ALa     KX008571            Bömer et al., 2016
                     NGb1844Dr2     KX008592            Bömer et al., 2016
                     NGb1892Dr1     KX008587            Bömer et al., 2016
                     DaBVb          X94575, X94582      Briddon et al., 1999
                     NG1Da          AM944571            Bömer et al., 2016
                     NGb0005Da1     KX008580            Bömer et al., 2016
                     NGb0310Da1     KX008582            Bömer et al., 2016
                     GH03 Dr        AM072664            Bousalem et al., 2009
                     DBALV-3RT      KX008595            Bömer et al., 2016
                     NG01 Dr        AM072673            Bousalem et al., 2009
                     NGb1892Dr2     KX008588            Bömer et al., 2016
                     NGb2475Dr      KX008590            Bömer et al., 2016
                     BN2Da          AM944584            Eni et al., 2008
                     GHL2d Dr       AM072668            Bousalem et al., 2009
DBV-B/K9             BN4Dr          AM944586            Eni et al., 2008
                     Cu1Da          AM503359            Bousalem et al., 2009
                     DBTRV          KX430257            Umber et al., 2017
                     Bf1052Dr       AM503363            Bousalem et al., 2009
                     BfA102Dr       AM503365            Bousalem et al., 2009
                     FJ60a Dr       AM072658            Kenyon et al., 2008
                     Gn1582Dr       AM503366            Bousalem et al., 2009
                     Gn1583Dr       AM503367            Bousalem et al., 2009
                     Bf103aDr       AM503362            Bousalem et al., 2009
                     Gn158Dr        AM503368            Bousalem et al., 2009
DpBV/K10             SB15b_DP       AM072695            Kenyon et al., 2008
DeBV-E/K11           FJ75b De       AM072662            Kenyon et al., 2008
                     PG176 De       AM072686            Kenyon et al., 2008
K12                  SB10aDn        AM072692            Kenyon et al., 2008
                     WS31aDn        AM421696            Kenyon et al., 2008
K13                  PG102cDa       AM421690            Kenyon et al., 2008
DBV-A(B)/U12         GHL2a Dr       AM072665            Bousalem et al., 2009
                     Gn1551Dr       AM503380            Bousalem et al., 2009
                     Gn842Dr        AM503382            Bousalem et al., 2009
                     Gn155Dr        AM503383            Bousalem et al., 2009
                     Gn84Dr         AM503385            Bousalem et al., 2009
                     Gn501Dr        AM503387            Bousalem et al., 2009
                     Gn5031Dr       AM503388            Bousalem et al., 2009
DBV-D                Gn1632Dr       AM503399            Bousalem et al., 2009
                     Gn1645Da       AM503401            Bousalem et al., 2009
T13                  DBRTV1-2RT     KX008597            Bömer et al., 2016
                     DBRTV1-3RT     KX008598            Bömer et al., 2016
                     DBRTV1         KX008596            Bömer et al., 2016
                     NGb1844Dr1     KX008591            Bömer et al., 2016
T14                  DBRTV2         KX008599            Bömer et al., 2016
                     DBRTV2-3RT     KX008601            Bömer et al., 2016
                     DBRTV2-4RT     KY827393            Bömer et al., 2016
                     DBRTV2-2RT     KX008600            Bömer et al., 2016
T15                  NGb0310Da3     KX008584            Bömer et al., 2016
                     TG2Dr          AM944580            Eni et al., 2008
T16                  NG165De        KY555561            Turaki et al., 2017
T17                  NGb53Dr        KY555548            Turaki et al., 2017


S2 Table. Acronyms, GenBank accession and virus names of sequences used for phylogenetic analysis in Fig 3B.

Genus          Virus species                                           Acronym   GenBank accession
Badnavirus     Banana streak GF virus                                  BSGFV     AY493509
               Banana streak IM virus                                  BSIMV     HQ593112
               Banana streak MY virus                                  BSMYV     AY805074
               Banana streak OL virus                                  BSOLV     AJ002234
               Banana streak UA virus                                  BSUAV     HQ593107
               Banana streak UI virus                                  BSUIV     HQ593108
               Banana streak UL virus                                  BSULV     HQ593109
               Banana streak UM virus                                  BSUMV     HQ593110
               Banana streak VN virus                                  BSVNV     AY750155
               Bougainvillea chlorotic vein banding virus              BsCVBV    EU034539
               Cacao swollen shoot virus                               CSSV      NC_001574
               Citrus yellow mosaic virus                              CYMV      AF347695
               Commelina yellow mottle virus                           ComYMV    X52938
               Dioscorea bacilliform AL virus 2                        DBALV2    DBALV2
               Dioscorea bacilliform AL virus 2                        DBALV     X94578, X94580, X94582, X94575
               Dioscorea bacilliform ES virus                          DBESV     DBESV
               Dioscorea bacilliform RT virus 1                        DBRTV1    KX008574
               Dioscorea bacilliform RT virus 2                        DBRTV2    KX008577, KY827393
               Dioscorea bacilliform RT virus 3                        DBRTV3    MF476845
               Dioscorea bacilliform SN virus                          DBSNV     DQ822073
               Dioscorea bacilliform TR virus                          DBTRV     KX430257
               Fig badnavirus 1                                        FBV-1     JF411989
               Gooseberry vein banding associated virus                GVBaV     JQ316114
               Grapevine roditis leaf discoloration-associated virus   GRLDaV    HG940503
               Grapevine vein-clearing virus                           GVCV      JF301669
               Pagoda yellow mosaic associated virus                   PYMAV     KJ013302
               Pineapple bacilliform comosus virus                     PBCOV     GU121676
               Piper yellow mottle virus                               PYMoV     KC808712
               Rubus yellow net virus                                  RYNV      KM078034
               Sugarcane bacilliform Guadeloupe A virus                SCBGAV    FJ824813
               Sugarcane bacilliform Guadeloupe D virus                SCBGDV    FJ439817
               Sugarcane bacilliform IM virus                          SCBIMV    AJ277091
               Sugarcane bacilliform MO virus                          SCBMOV    NC_008017
               Sweet potato caulimo-like virus                         SPCV      HQ694978
               Sweet potato vein clearing virus                        SPVCV     HQ694979
               Taro bacilliform virus                                  TaBV      AF357836
Caulimovirus   Atractylodes mild mottle virus                          AMMV      KR080327
               Carnation etched ring virus                             CERV      X04658
               Cauliflower mosaic virus                                CaMV      V00141
               Dahlia mosaic virus                                     DaMV      JX272320
               Figwort mosaic virus                                    FMV       X06166
               Horseradish latent virus                                HRLV      JX429923
               Lamium leaf distortion virus                            LLDV      EU554423
               Mirabilis mosaic virus                                  MiMV      AF454635
               Soybean Putnam virus                                    SPuV      JQ926983
               Strawberry vein banding virus                           SVBV      X97304
Cavemovirus    Cassava vein mosaic virus                               CsVMV     U59751
Petuvirus      Petunia vein clearing virus                             PVCV      U95208
Rosadnavirus   Rose yellow vein virus                                  RYVV      JX028536
Solendovirus   Tobacco vein clearing virus                             TVCV      AF190123
Soymovirus     Blueberry red ringspot virus                            BRRV      AF404509
               Cestrum yellow leaf curling virus                       CmYLCV    AF364175
               Peanut chlorotic streak virus                           PCSV      U13988
               Soybean chlorotic mottle virus                          SbCMV     X15828
Tungrovirus    Rice tungro bacilliform virus                           RTBV      NC001914
Unassigned     Dioscorea nummularia-associated virus                   DNUaV     MG944237


Chapter 5

An improved degenerate-primed rolling circle amplification and next-generation sequencing approach for the detection and characterization of badnaviruses

Amit Sukal1,2, Dawit B Kidanemariam1, James Dale1, Robert M. Harding1 and Anthony James1*

1 Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, Queensland, Australia

2 Centre for Pacific Crops and Trees, Pacific Community, Suva, Fiji.

* Corresponding author:

E-mail address: [email protected] (APJ)

[Formatted for submission to Virology]


QUT Verified Signature


Abstract

The genus Badnavirus is characterized by members that are genetically and serologically heterogeneous, making their detection and characterization difficult. The presence of integrated badnavirus-like sequences in some host species further complicates diagnosis using PCR-based protocols. To circumvent these issues, we have optimized various rolling circle amplification (RCA) protocols including random-primed RCA (RP-RCA), primer-spiked random-primed RCA (primer-spiked RP-RCA), directed RCA (D-RCA) and specific-primed RCA (SP-RCA). For all methods, amplification of badnavirus genomes is greatly improved using an incubation temperature of 36°C instead of 30°C. Using Dioscorea bacilliform AL virus (DBALV) as an example, we showed that viral DNA amplified using the optimized D-RCA and SP-RCA protocols yielded more than 80-fold more badnavirus Illumina MiSeq-generated reads than DNA amplified using RP-RCA. The optimized RCA techniques described here were used to successfully amplify badnaviruses infecting banana (BSCAV, BSGFV, BSMYV, BSOLV), sugarcane (SCBIMV), taro (TaBV) and yam (DBALV, DBALV2, DBESV and DBRTV2).

Keywords: RCA, RP-RCA, primer-spiked RP-RCA, D-RCA, SP-RCA, yam, Dioscorea spp., badnavirus, NGS

1. Introduction

The genus Badnavirus (family Caulimoviridae) consists of plant pararetroviruses that infect a wide range of economically important crops and cause estimated global economic crop losses ranging from 10 to 90% (Bhat et al., 2016). Badnaviruses possess non-enveloped bacilliform-shaped virions with an approximate size of 30 nm x 120-150 nm (Geering and Hull, 2012). The genome consists of a single molecule of circular, double-stranded DNA of 7.2-9.2 kb, typically encoding three open reading frames (ORFs) all on the (+) strand (Geering, 2014). Replication occurs via reverse transcription of a greater-than-genome length RNA which subsequently serves as a template both for the translation of viral proteins and for reverse transcription to replicate the genome (Borah et al., 2013; Geering and Hull, 2012; Iskra-Caruana et al., 2014). The genomes of some badnavirus species are integrated into their host plant genomes, and these sequences are referred to as endogenous badnaviruses (Bhat et al., 2016; Hohn et al., 2008; Staginnus et al., 2009). These integration events are assumed to have occurred through illegitimate recombination (Holmes, 2011) and/or during DNA break repair (Gayral et al., 2008) rather than through an association with viral infection. However, there are some instances where these integrated sequences have given rise to systemic virus infection following recombination events after exposure to abiotic stress, such as the in vitro tissue culture process (Côte et al., 2010; Dallot et al., 2001), and interspecific crossing (Lheureux et al., 2003).

Badnaviruses have been reported from the tropical and temperate regions of Africa, Asia, Europe, Oceania and the Americas. Most badnavirus species infect tropical and subtropical crops such as banana, black pepper, citrus, cocoa, sugarcane, sweet potato, taro and yam, with a few known to infect plants of the temperate regions such as gooseberry, grape, red raspberry, and ornamental spiraea (reviewed in Bhat et al., 2016). Members of the genus Badnavirus are genetically and serologically heterogeneous, having relatively low nucleotide identities, even within the same species, when compared with other virus genera (Borah et al., 2013; Geering et al., 2000; Harper et al., 2005, 2004; Jaufeerally-Fakim et al., 2006; Kenyon et al., 2008; Lockhart et al., 1993).


The high sequence and serological diversity of badnaviruses, together with the presence of endogenous badnavirus sequences, presents challenges for the characterization and detection of these viruses and for the safe exchange of germplasm. Although antibodies have been prepared against several badnaviruses for use in serological-based detection tests, they often lack the utility to detect all virus isolates (Kenyon et al., 2008; Seal et al., 2014). Molecular tools, such as PCR, real-time PCR and loop-mediated isothermal amplification assays, have also been developed for several badnaviruses (reviewed in Bhat et al., 2016). However, these are constrained by the highly heterogeneous nature of badnaviruses as well as the presence of integrated sequences in some host plant genomes. Rolling circle amplification (RCA) is a method that utilizes phi29 polymerase, an enzyme which preferentially amplifies circular DNA and has strong strand displacement and 3′-5′ proofreading abilities. These features result in high-fidelity amplification (Blanco et al., 1989; Rockett et al., 2015) and, as a result, the technique has been exploited as a sequence-independent (random primed or RP) amplification strategy to characterize several groups of DNA viruses infecting humans, animals and plants (Johne et al., 2009). However, the sequence-independent nature of RP-RCA can result in off-target amplification, with some studies in sweet potato (Paprotka et al., 2010) and sugar beet (Homs et al., 2008) reporting amplification of non-viral host DNA, such as mitochondrial DNA. Until 2011, RCA was primarily used to detect plant viruses with small genomes (<3 kb) belonging to the families Geminiviridae and Nanoviridae (Grigoras et al., 2009; Haible et al., 2006; Inoue-Nagata et al., 2004). James et al. (2011a) showed that RCA could be used to detect plant viruses with larger genomes such as those from the family Caulimoviridae. Further, they reported an increase in the amplification of the target banana streak virus (BSV, genus Badnavirus) sequences when BSV-specific primers were added to the random hexamers included in premixed commercial kits, such as the TempliPhi kit (GE Healthcare, United Kingdom). The study also highlighted the utility of RCA for the differential amplification of episomal badnavirus genomes compared to their integrated counterparts. The optimized RCA protocol has been subsequently used for the characterization of novel badnaviruses infecting banana (James et al., 2011b), fig (Laney et al., 2012) and yam (Bömer et al., 2016; Sukal et al., 2017). When using the premixed commercial RCA kit protocols to amplify badnaviruses from yam, however, Bömer et al. (2016) reported non-specific amplification of DNA from both circular and linear non-viral templates.

The use of premixed kit components, with or without additional virus-specific primers, has been the standard for most badnavirus RCA applications. However, whereas the use of these kits for the detection of badnaviruses in banana has proven highly successful, their use for the detection of badnaviruses from other crops such as yams has been somewhat less successful (Bömer et al., 2016). Unfortunately, the nature of premixed reagents, such as those supplied with the TempliPhi kit, leaves little scope for optimization. In this study, we report the development of an optimized RCA-based method by manipulating the individual components of the RCA reaction and by the inclusion of improved badnavirus degenerate primers. The use of this method significantly increases episomal badnavirus genome amplification compared to commercial premixed kits and can be used for badnavirus detection and characterization using both Sanger and next-generation sequencing (NGS).


2. Materials and Methods

2.1. Samples

Total nucleic acid (TNA) from Dioscorea esculenta leaf tissues infected with Dioscorea bacilliform AL virus (DBALV) isolate VUT02_De (GenBank accession MG948562) and Dioscorea bacilliform ES virus (DBESV) isolate FJ14 (GenBank accession KY827394), D. alata leaf tissue infected with Dioscorea bacilliform AL virus 2 (DBALV2) isolate PNG10 (GenBank accession KY827395) and Dioscorea rotundata leaf tissue infected with Dioscorea bacilliform RT virus 2 (DBRTV2) isolate SAM01 (GenBank accession KY827393) was obtained from the Centre for Pacific Crops and Trees (CePaCT), Pacific Community (SPC) germplasm collection in Fiji. Banana leaf tissue infected with isolates of banana streak CA virus (BSCAV), banana streak GF virus (BSGFV), banana streak MY virus (BSMYV) and banana streak OL virus (BSOLV), as well as sugarcane infected with sugar cane bacilliform IM virus (SCBIMV) and taro infected with taro bacilliform virus (TaBV), was provided by the Centre for Tropical Crops and Biocommodities (CTCB), Queensland University of Technology (QUT). TNA was extracted using a CTAB-based method (Kleinow et al., 2009) and the yield and quality were assessed using a NanoDrop spectrophotometer (ThermoFisher Scientific, Australia). The concentration of purified TNA was adjusted to ~500 ng/μl with sterile nuclease-free water (sterile NF-H2O) for RCA experiments.

2.2. Primer design

The complete genome sequences of 182 badnaviruses, representing 43 species, were accessed from GenBank. For each genome, the ORF 1 and ORF 2 sequences, as well as the ORF 3 conserved domains equivalent to the cauliflower mosaic virus (CaMV) movement protein (L43-E243 of ORF 1 protein), coat protein (L261-N429 of ORF 4), aspartic protease (K36-Q120 of ORF 5), reverse transcriptase (K273-G449 of ORF 5) and ribonuclease H (I547-E673) (Geering and Hull, 2012) were identified. The ORF 1, ORF 2, or individual ORF 3 conserved domain sequences were separately aligned using the CLUSTALW algorithm in MEGA7. Primers were designed from the consensus sequence of each alignment using Geneious® v11.0.4 (http://www.geneious.com; Kearse et al., 2012). The specificity of each primer was assessed using both Primer-BLAST at NCBI and in silico in Geneious® using the 182 complete genome sequences. To circumvent the DNA exonuclease activity of phi29 polymerase, the two terminal 3′ nucleotides of each primer were phosphorothioate modified. A total of 28 degenerate badnavirus primers were synthesized (Table 1), in addition to phosphorothioate-modified Badna-MFP/MRP (Turaki, 2014) and BadnaFP/RP (Yang et al., 2003) primers, giving a mixture of 32 primers. The episomal DBV-free accession DA/NGA01 was used as a negative control for the RCA experiments. DA/NGA01 is a Nigerian accession that was obtained from the International Institute of Tropical Agriculture (IITA) by SPC-CePaCT. This accession tested free of episomal DBV at IITA using IC-PCR and was retested at SPC-CePaCT using IC-PCR and RCA.
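The degeneracy of the primers listed in Table 1 can be made concrete with a short calculation. The following is a minimal sketch, not part of the published method: it expands the standard IUPAC ambiguity codes to count and enumerate the concrete oligonucleotides that a degenerate primer represents. The function names are illustrative assumptions, and inosine (I), which occurs in BadnaFP/RP, is deliberately not handled.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes (inosine is not included).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences a degenerate primer encodes."""
    total = 1
    for base in primer.upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    # Two primer sequences copied from Table 1.
    for name, seq in [("Badna_RCA1", "TGGTATCAGAGCWDDGT"),
                      ("Badna_RCA78", "ADAYKCCWCCCCAWCC")]:
        print(name, degeneracy(seq))
```

For example, Badna_RCA1 contains one W and two D positions, so it encodes 2 x 3 x 3 = 18 sequences, one of which is TGGTATCAGAGCTTGGT.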

2.3. Random-primed RCA (RP-RCA)

RP-RCA was done using the Illustra TempliPhi 100 Amplification Kit (GE Healthcare) essentially as described by the manufacturer with some modifications. Briefly, a 1 μl aliquot of TNA (~500 ng) was mixed with 5 μl of kit sample buffer and incubated at 95°C for 3 min, cooled to 4°C and placed on ice. The denatured sample solution was then combined with 5 μl of reaction buffer premixed with 0.2 μl of phi29 polymerase. RP-RCA was carried out at either 30°C (manufacturer’s recommendation) or 36°C for 18 h followed by 65°C for 10 min to inactivate the enzyme.

2.4. Primer-spiked random-primed RCA (primer-spiked RP-RCA)

Primer-spiked RP-RCA was carried out essentially as described for RP-RCA, except that 1 μl of a mixture of the 32 degenerate badnavirus primers (Table 1), at a final concentration of 0.4 μM of each primer, was added to the 5 μl of sample buffer as previously described by James et al. (2011a). Reactions were incubated at either 30°C or 36°C.

2.5. Directed RCA (D-RCA)

The directed RCA (D-RCA) protocol was a modification of the published two-step RCA used for amplification of low-copy number human polyomaviruses, which consisted of an annealing step followed by an amplification step (Marincevic-Zuniga et al., 2012; Rockett et al., 2015). For the annealing step, the 32 badnavirus-specific primer mix (Table 1) at a final concentration of 0.4 μM of each primer was combined with 1 × phi29 buffer (NEB, Australia) and 1 μl TNA in a final volume of 10 μl. The mixture was denatured at 95°C for 3 min, cooled to 4°C and placed on ice. A second mixture consisting of 2.5 μM of exonuclease-resistant random hexamers (ThermoFisher Scientific, Australia), 1 × phi29 buffer, 2 ng/μl bovine serum albumin (BSA), 4 mM DTT, 15 mM dNTPs, 5 U of phi29 polymerase (ThermoFisher Scientific, Australia) and sterile NF-H2O to 10 μl was prepared and combined with the denatured primer/template mixture. Reactions were incubated at 36°C for 18 h, followed by enzyme inactivation at 65°C for 10 min.
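The primer quantities implied by the annealing step can be worked through with simple arithmetic. The sketch below is illustrative only and rests on one assumption that the text does not state explicitly: that the quoted 0.4 μM per primer refers to the 10 μl annealing mixture, which is then diluted to 20 μl when the second 10 μl mixture is added. The helper names are hypothetical.

```python
def pmol(conc_um, volume_ul):
    """Amount in pmol, using the identity 1 uM x 1 ul = 1 pmol."""
    return conc_um * volume_ul


def diluted(conc, volume_before_ul, volume_after_ul):
    """Concentration after bringing a solution to a larger volume."""
    return conc * volume_before_ul / volume_after_ul


N_PRIMERS = 32        # primers in the badnavirus-specific mix (Table 1)
PER_PRIMER_UM = 0.4   # per-primer concentration (assumed: in the annealing mix)
ANNEAL_UL = 10.0      # volume of the annealing mixture
COMBINED_UL = 20.0    # annealing mixture plus the second 10 ul mixture

per_primer_pmol = pmol(PER_PRIMER_UM, ANNEAL_UL)            # pmol of each primer
total_primer_um = N_PRIMERS * PER_PRIMER_UM                 # summed primer concentration
total_primer_pmol = pmol(total_primer_um, ANNEAL_UL)        # pmol of all primers
per_primer_combined_um = diluted(PER_PRIMER_UM, ANNEAL_UL, COMBINED_UL)

if __name__ == "__main__":
    print(per_primer_pmol, total_primer_um, total_primer_pmol, per_primer_combined_um)
```

Under this assumption each reaction contains 4 pmol of each primer (128 pmol in total), and the per-primer concentration falls to 0.2 μM once the two mixtures are combined. If the quoted concentration instead refers to the combined 20 μl reaction, the amounts double.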


Table 1 Sequences of primers used in primer-spiked RP-RCA, D-RCA and SP-RCA protocols

Primer name    Sequence (5′ to 3′)           Reference
Badna-MFP      CAARTMTCTATCCTYACCAAAGG       Turaki (2014)
Badna-MRP      AWTGCYTGNACTCCATGRG           Turaki (2014)
BadnaFP        CCAYTTRCAIACISCICCCCAICC      Yang et al. (2003)
BadnaRP        ATGCCITTYGGIITIAARAAYGCICC    Yang et al. (2003)
Badna_RCA4     AYNADSAGRRTTKGYYTCHCC         This study
Badna_RCA5     AANYCRRCRTTDGRDGTRTTKG        This study
Badna_RCA18    ACHYYNTSRATGBTKRTANKYRAA      This study
Badna_RCA23    GGBTCAAKRAYDARYATDGCYCC       This study
Badna_RCA12    CARHTRRTHTANRTHATMCCDRA       This study
Badna_RCA45    TAYGGNRYMAGRARRRDCHA          This study
Badna_RCA64    TTYGAYYTRAARWSYGGHTT          This study
Badna_RCA65    TCHATMSMHTGGACDGCHTT          This study
Badna_RCA78    ADAYKCCWCCCCAWCC              This study
Badna_RCA17    TTYRSDRAYTAYSMRG              This study
Badna_RCA7     TMCCWGCWGARGTVYTVTA           This study
Badna_RCA8     TWYATYCAYMTHGGWRTVHT          This study
Badna_RCA11    ATGGARGTDGAYYTDWCHRAAGG       This study
Badna_RCA13    ATGAYVACHATHVRRGAYTTCTA       This study
Badna_RCA46    TAYAARGGHAARCCWCA             This study
Badna_RCA47    AAACHCATGTNMGRRTWGWHAA        This study
Badna_RCA62    TGGTVTTCAAYTAYAARMG           This study
Badna_RCA66    TDTAYGAATGGYTDGTHAT           This study
Badna_RCA67    THTTYCARAGRAARATGG            This study
Badna_RCA71    TGGRYTDRTYCTHAGYCC            This study
Badna_RCA77    TVRTMMTWGARACWGAYGGHT         This study
Badna_RCA80    AAYTTBCCRCTKGCRTADGCRCA       This study
Badna_RCA81    ACHATYGAYGCHGARAT             This study
Badna_RCA82    TYAARATHTAYTAYYTKGA           This study
Badna_RCA83    ATDGCYTGRCWRTCWGTTCTKA        This study
Badna_RCA86    TTBCCDTYWATRTGYTC             This study
Badna_RCA1     TGGTATCAGAGCWDDGT             This study
Badna_RCA14    TCNGTYTGYTTYTCDATRAAYTT       This study


2.6. Specific-primed RCA (SP-RCA)

SP-RCA was carried out essentially as described for D-RCA except that the random hexamers used in the second master mix were substituted with the badnavirus-specific primer mixture at a final concentration of 0.4 μM of each primer.

2.7. Optimization of D-RCA and SP-RCA

To determine the optimum incubation temperature, RCA was carried out using TNA from DBALV-infected yam as template. Incubation temperatures ranging from 30-40°C, in increments of 2°C, were assessed and all reactions used an 18 h incubation time. To investigate the optimum incubation time, RCA was carried out as before using incubation times of 4, 8, 12, 16 and 18 h at the optimum RCA temperatures. To determine the optimal dNTP concentration for amplification, RCA was carried out using final concentrations of 0, 2.5, 5, 10, 15 and 20 mM dNTPs. The sensitivity of RP-RCA, primer-spiked RP-RCA, D-RCA and SP-RCA was determined by varying the template concentration using 500, 250, 125, 50, 25, 12.5 and 0 ng of DBALV-infected yam TNA with RCA carried out using the optimized temperature (36°C), incubation time and concentration of dNTPs. All RCA conditions were kept essentially as described in sections 2.3 to 2.6 while varying the parameter under investigation.
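The optimization described above varies one parameter at a time while the others are held fixed. As a rough sketch of how such a run list could be enumerated, the code below builds a one-factor-at-a-time design from the levels given in the text. The baseline values (36°C, 18 h, 15 mM dNTPs, 500 ng TNA) are taken from the D-RCA conditions in section 2.5; treating them as the fixed values for every series is an assumption, as are the dictionary keys and the function name.

```python
# Conditions held constant while one parameter is varied (assumed baseline).
BASELINE = {"temperature_c": 36, "time_h": 18, "dntp_mm": 15, "template_ng": 500}

# Levels tested for each parameter, as listed in the text.
LEVELS = {
    "temperature_c": [30, 32, 34, 36, 38, 40],
    "time_h": [4, 8, 12, 16, 18],
    "dntp_mm": [0, 2.5, 5, 10, 15, 20],
    "template_ng": [500, 250, 125, 50, 25, 12.5, 0],
}


def one_factor_at_a_time(baseline, levels):
    """Return one run per tested level, changing a single factor per run."""
    runs = []
    for factor, values in levels.items():
        for value in values:
            run = dict(baseline)    # copy so the baseline is never modified
            run[factor] = value
            run["varied"] = factor  # record which factor this run explores
            runs.append(run)
    return runs


if __name__ == "__main__":
    for run in one_factor_at_a_time(BASELINE, LEVELS):
        print(run)
```

This gives 24 runs per protocol (6 temperatures, 5 incubation times, 6 dNTP concentrations and 7 template amounts), compared with 1,260 for a full factorial design over the same levels.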

2.8. Restriction analysis, cloning and Sanger sequencing

RCA products were independently digested with EcoRI, KpnI, SphI or StuI, which were selected based on in silico restriction analysis of published badnavirus genome sequences, or from experimental experience, to generate useful restriction profiles. RCA products (10 μl) were digested in a total reaction volume of 20 μl containing 5 U of enzyme, 1 × CutSmart® buffer (NEB) and sterile NF-H2O. Reaction mixtures were incubated at 37°C for 2-4 h and the digested RCA products were analysed by electrophoresis through 1.5% agarose gels stained with SYBR® Safe (ThermoFisher Scientific). Fragments of interest were purified using Freeze ‘N Squeeze™ DNA Gel Extraction Spin Columns (Bio-Rad, Australia) and cloned into pUC19 as described in Sukal et al. (2017). Sequencing was carried out using either M13F/R or BadnaFP/RP primers.
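The in silico restriction analysis mentioned above can be illustrated with a small script. This is a minimal sketch rather than the procedure actually used in the study (which was carried out on published genome sequences): it locates recognition sites on a circular sequence, including a site that spans the arbitrary origin, and returns the expected fragment sizes. The function name and the toy sequence in the demo are assumptions for illustration; the recognition sequences are the standard ones for the four enzymes.

```python
import re

# Recognition sequences of the enzymes used in this study.
SITES = {"EcoRI": "GAATTC", "KpnI": "GGTACC", "SphI": "GCATGC", "StuI": "AGGCCT"}


def circular_digest(seq, site):
    """Return fragment lengths (largest first) for a circular sequence.

    Fragment sizes depend only on the spacing between sites, so the exact
    cut position within the recognition sequence is not needed.
    """
    seq = seq.upper()
    n = len(seq)
    # Append the first len(site) - 1 bases so origin-spanning sites are found.
    wrapped = seq + seq[:len(site) - 1]
    # The lookahead finds overlapping matches as well.
    positions = sorted({m.start() % n for m in re.finditer(f"(?={site})", wrapped)})
    if not positions:
        return []  # an uncut circle yields no linear fragments
    fragments = []
    for i, start in enumerate(positions):
        nxt = positions[(i + 1) % len(positions)]
        # A single site linearizes the circle into one full-length fragment.
        fragments.append((nxt - start) % n or n)
    return sorted(fragments, reverse=True)


if __name__ == "__main__":
    # Toy 300 nt "genome" with three EcoRI sites; not a real virus sequence.
    toy = "GAATTC" + "A" * 94 + "GAATTC" + "A" * 44 + "GAATTC" + "A" * 144
    print(circular_digest(toy, SITES["EcoRI"]))
```

An enzyme that cuts once returns a single fragment equal to the genome length, which is the pattern sought when selecting enzymes to release putative full-length genomes from RCA products.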

2.9. RCA-NGS and genome assembly

To characterize the specificity of each of the different RCA protocols for amplification of badnaviruses, DBALV TNA amplified using RP-RCA, D-RCA and SP-RCA was purified and sequenced using the Illumina MiSeq platform. RCA products were purified using the Illustra™ GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare). Sequencing libraries were prepared from purified RCA products using the Nextera™ XT Library Prep Kit (Illumina) and paired-end reads generated using the MiSeq system (Illumina) at the Central Analytical Research Facility (CARF), Queensland University of Technology, Brisbane, Australia. Raw read quality was assessed with FastQC v0.10.1 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), with residual adapter sequences trimmed, as well as low quality and short reads (<40 nt) removed, using the BBDuk plugin in Geneious®. To determine NGS read identities, the quality corrected reads were mapped against badnavirus genomes (182 complete sequences), the Dioscorea rotundata reference genome (GenBank accessions DF933857-DF938579), plastid (GenBank accessions EF380353, KJ490011, KY085893) or mitochondrial (GenBank accession LC219374) DNA using the Geneious reference mapper algorithm. Unmapped reads were de novo assembled using SPAdes v3.5 (Bankevich et al., 2012) and BLASTn was carried out on contigs >1000 bp to further determine read identities. Finally, complete viral genomes were generated by reference mapping total NGS reads against the DBALV genome (GenBank accession KX008571).

3. Results

3.1. Badnavirus RCA optimization

RCA optimization was initially carried out using TNA extracted from D. esculenta originating from Vanuatu infected with DBALV isolate VUT02_De. The effect of incubation temperature on RP-RCA, D-RCA and SP-RCA was evaluated between 30 and 40°C at intervals of 2°C. Following RCA, a 10 μl aliquot of each amplification reaction was digested with EcoRI, to generate four fragments of ~3.3, 1.7, 1.4 and 1 kb. Using RP-RCA, only very low levels of visible digest products were observed irrespective of the RCA incubation temperatures (Fig. 1A). In contrast, whereas incubation temperatures of 30 and 40°C also resulted in very poor amplification of virus DNA using D-RCA and SP-RCA, considerably stronger visible digest products were observed using incubation temperatures of 32-36°C for SP-RCA, or 32-38°C for D-RCA (Fig. 1A). Similar results were observed when RCA reactions were digested with SphI (results not shown).

The effect of dNTP concentration on amplification was evaluated by varying the final concentration of dNTPs between 0 and 20 mM in the RCA reactions. When D-RCA and SP-RCA-amplified DNA was digested with EcoRI, no visible reaction products were observed in reactions with 0 or 2.5 mM dNTPs (Fig. 1B). In contrast, visibly stronger digest products were observed using D-RCA reactions containing 5 mM dNTPs, while no visible digest products were observed from SP-RCA reactions using 5 mM dNTPs. The strongest visible digest products were observed using D-RCA and SP-RCA reactions containing dNTP concentrations in the range of 10-20 mM (Fig. 1B).

Evaluation of reaction incubation times showed that detectable levels of badnavirus DNA were amplified after 12 h in both the D-RCA and SP-RCA reactions, but only after 16 h from RP-RCA reactions (Fig. 1C). However, in all three RCA protocols, optimal levels of visible digest products were observed following incubation for 16-18 h.

When the amount of template was varied from 0 to 500 ng, very low levels of visible digest products were observed using the RP-RCA protocol containing 250-500 ng of TNA template (Fig. 2A). In contrast, digestion of products from primer-spiked RP-RCA generated visible digest products when as little as 50 ng of TNA template was used (Fig. 2B). The levels of visible digest products from both the RP-RCA and primer-spiked RP-RCA protocols were positively correlated with the amount of TNA template added, with higher starting template concentrations resulting in higher amounts of visible digest products. When the amount of TNA template was varied in the D-RCA and SP-RCA protocols, relatively high amounts of visible digest products were observed at all TNA concentrations assessed (Fig. 2C and D, respectively).

Following optimization, D-RCA and SP-RCA were carried out as described in 2.5 and 2.6, respectively, with a final concentration of 15 mM dNTPs and an incubation temperature of 36°C for a duration of 18 h, while RP-RCA and primer-spiked RP-RCA were carried out as described in 2.3 and 2.4, respectively. All RCA was carried out using 500 ng of TNA as template.

[Fig. 1 gel images. Panel A, gradient RCA: RP-RCA, D-RCA and SP-RCA incubated at 30, 32, 34, 36, 38 and 40°C. Panel B, dNTP concentration: D-RCA and SP-RCA with 0, 2.5, 5, 10, 15 and 20 mM dNTPs. Panel C: incubation duration. M, marker lanes; marker bands labelled at 10, 6, 3 and 1 kb.]

Fig. 1. RCA of DBALV isolate VUT02_De. (A) Gradient incubation temperature from 30 to 40°C in 2°C increments for RP-RCA, D-RCA and SP-RCA, (B) D-RCA and SP-RCA setup with dNTP concentrations of 0 to 20 mM, (C) Effect of incubation duration on RP-RCA, D-RCA and SP-RCA. The RCA products in (A) and (B) were digested with EcoRI, while RCA products in (C) were digested with SphI. All digests were electrophoresed through 1.5% agarose gels stained with SYBR® Safe. M - GeneRuler 1 kb DNA Ladder (ThermoFisher Scientific, Australia); only marker bands of 1 kb and larger are visible. The numbers on the side indicate the molecular sizes of the markers in base pairs.


Fig. 2. RCA of DBALV isolate VUT02_De using concentrations of 0 to 500 ng total nucleic acid. (A) RP-RCA, (B) primer-spiked RP-RCA, (C) D-RCA and (D) SP-RCA. The RCA products were digested with EcoRI and electrophoresed through 1.5% agarose gels stained with SYBR® Safe. NT - No template control. M - GeneRuler 1 kb DNA Ladder (ThermoFisher Scientific, Australia).


3.2. RP-RCA, primer-spiked RP-RCA, D-RCA and SP-RCA amplification of badnaviruses

To compare the utility of the optimized RCA protocol with previously described RCA protocols for the detection of a broad range of badnaviruses, TNA from bananas infected with BSCAV, BSGFV, BSMYV or BSOLV, yam plants infected with DBALV, DBALV2, DBESV or DBRTV2, sugar cane infected with SCBIMV and taro infected with TaBV were subjected to RP-RCA, primer-spiked RP-RCA, D-RCA and SP-RCA using the optimized reaction conditions. Since temperatures in the range of 34-38°C were shown to increase the efficiency of both RP-RCA (Fig. 1A) and primer-spiked RP-RCA (figure not shown), the performance of RP-RCA and primer-spiked RP-RCA was evaluated at both 30°C (used in previous published studies) and 36°C.

To obtain putative full-length restriction digestion fragments, RCA products from TNA extracts containing BSCAV, BSGFV, BSMYV, BSOLV, DBESV and SCBIMV were digested with KpnI, RCA products from TNA extracts containing DBALV, DBALV2 and DBRTV2 were digested with SphI, and RCA products from the TNA extract containing TaBV were digested with StuI. RP-RCA carried out at 30°C amplified large amounts of BSGFV and BSMYV (Fig. 3A, lanes 2-3); however, only very low levels of amplification were observed for BSCAV, DBALV2, SCBIMV and TaBV (Fig. 3A, lanes 1, 6 and 9-10) and no visible digest products were present in digests of RCA-amplified DNA from samples with BSOLV, DBALV, DBESV and DBRTV2 (Fig. 3A, lanes 4-5, 7-8). In contrast, RP-RCA carried out at 36°C amplified all badnavirus species tested, with highest levels of amplification observed with BSGFV, BSMYV, DBALV, DBALV2 and SCBIMV (Fig. 3B, lanes 2-3, 5-6 and 9) compared to BSCAV, BSOLV, DBESV, DBRTV2 and TaBV (Fig. 3B, lanes 1, 4, 7-8 and 10).


Viral DNA was amplified from all samples using primer-spiked RP-RCA at 30°C, although the levels of amplification products from samples containing BSCAV, BSGFV, BSMYV and SCBIMV (Fig. 3C, lanes 1-3 and 9) were comparatively higher than those from the others (BSOLV, DBALV, DBALV2, DBESV, DBRTV2 and TaBV; Fig. 3C, lanes 4-8 and 10). The use of primer-spiked RP-RCA at 36°C also resulted in the amplification of all viruses, with either similar intensity reaction products (e.g. BSCAV) or relatively stronger amplification products observed, compared with incubation at 30°C. Comparatively lower amplification was observed for all badnaviruses from yam hosts with primer-spiked RP-RCA at 30°C (Fig. 3C, lanes 5-8) compared with 36°C (Fig. 3D, lanes 5-8).

D-RCA and SP-RCA amplified all samples (Fig. 3E and F, respectively), although consistently higher amplification was achieved for all samples with D-RCA compared to RP-RCA, primer-spiked RP-RCA and SP-RCA. Whereas consistently high levels of amplification were obtained using D-RCA (Fig. 3E), the level of amplification from other RCA protocols was variable.

All the single digest bands from RP-RCA (30 and 36°C), primer-spiked RP-RCA (30 and 36°C), D-RCA and SP-RCA (Fig. 3E and F) were excised, cloned into appropriately cut pUC19 and Sanger sequenced using M13F/R or BadnaFP/RP primers. Sequencing confirmed that the amplification was of the respective virus isolates used for the RCA optimization.


Fig. 3. Different badnavirus infected samples amplified with (A) RP-RCA at 30°C incubation, (B) RP-RCA at 36°C incubation, (C) primer-spiked RP-RCA at 30°C, (D) primer-spiked RP-RCA at 36°C, (E) D-RCA and (F) SP-RCA. Lanes 1-4, 7 and 9 represent KpnI digested RCA products of BSCAV, BSGFV, BSMYV, BSOLV, DBESV and SCBIMV, respectively, while lanes 5-6 and 8 represent SphI digested RCA products of DBALV, DBALV2 and DBRTV2, respectively, and lane 10 represents TaBV digested with StuI. Lane 11 is a known negative sample (DA/NGA01) digested with EcoRI and lane 12 is a no template control. M - GeneRuler 1 kb DNA Ladder (ThermoFisher Scientific, Australia); only marker bands of 1 kb and larger are visible.


3.3. RCA-NGS for virus characterization

To investigate the efficiency of the different RCA protocols for the amplification of badnavirus DNA, TNA extracted from DBALV-infected D. esculenta from Vanuatu (DBALV isolate VUT02_De) was used in RP-RCA, D-RCA and SP-RCA. Undigested reaction products of single RP-RCA, D-RCA and SP-RCA reactions were then individually sequenced using the Illumina platform which resulted in paired-end reads of 350,272, 604,382 and 800,560 from NGS, respectively. Following adapter removal, quality trimming and removal of short reads (<40 nt) from the paired-end reads, 317,608, 557,088 and 751,680 respective reads from the RP-RCA, D-RCA and SP-RCA products were obtained. Reference mapping using Geneious revealed that 0.15%, 2.39%, 41.85%, and 1.38% of RP-RCA sequences, 85.71%, 4.00%, 0.41% and 0.03% of D-RCA sequences and 84.78%, 2.96%, 0.30% and 0.02% of SP-RCA sequences mapped to badnaviruses, the Dioscorea rotundata reference genome sequence, plastid or mitochondrial sequences, respectively, while 54.27%, 9.86% and 11.94% of RP-RCA, D-RCA and SP-RCA generated sequences remained unmapped. When the unmapped reads were de novo assembled and contigs >1000 bp were subjected to BLASTn analysis, the majority of hits were to plant genomes other than Dioscorea spp.
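The read counts reported above can be turned into retention rates with a few lines of arithmetic. The sketch below is only an illustration using the raw and post-trimming read counts quoted in this section; the variable and function names are assumptions.

```python
# Paired-end read counts before and after adapter/quality/length filtering,
# as quoted in the text for each RCA protocol.
RAW_READS = {"RP-RCA": 350_272, "D-RCA": 604_382, "SP-RCA": 800_560}
TRIMMED_READS = {"RP-RCA": 317_608, "D-RCA": 557_088, "SP-RCA": 751_680}


def percent_retained(raw, trimmed):
    """Return the percentage of reads that survived trimming, per library."""
    return {name: 100.0 * trimmed[name] / raw[name] for name in raw}


if __name__ == "__main__":
    for name, pct in percent_retained(RAW_READS, TRIMMED_READS).items():
        print(f"{name}: {pct:.1f}% of reads retained")
```

On these figures, roughly 91 to 94% of reads were retained in each library, so the differences in mapping rates between protocols are unlikely to be explained by differences in read quality alone.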

Reference mapping was further carried out in Geneious, using DBALV-[2ALa] (GenBank accession KX008571) as a reference, to generate the complete genome of isolate VUT02_De used in this study. The Geneious mapper assigned 7,600 (mean coverage 11), 477,485 (mean coverage 12,783) and 637,254 reads (mean coverage 17,145) of the trimmed reads generated using RP-RCA, D-RCA and SP-RCA, respectively, to DBALV-[2ALa]. The D-RCA and SP-RCA libraries generated a complete circular virus genome using the NGS data, whereas the RP-RCA library only generated fragmented sequences, the largest of which was 7,203 nt.

3.4. DBALV isolate VUT02_De genome

The complete NGS-derived sequences of DBALV isolate VUT02_De generated using D-RCA and SP-RCA showed 99.6% nucleotide identity to each other. The consensus genome of DBALV isolate VUT02_De comprised 7,509 nt and contained three open reading frames (ORFs). ORF 1 comprised 432 bp and encoded a putative protein of 144 amino acids (aa), while ORF 2 was 378 bp and encoded a putative protein of 126 aa. ORF 3 was 5,682 bp and encoded a putative protein of 1,894 aa. An intergenic region (IR) of 1,022 bp was present between ORF 1 and ORF 3 and contained a putative plant tRNAmet binding site (5′-TGGTATCAGAGCTTGGTT-3′, nt 1-18) complementary to the consensus sequence of plant cytoplasmic initiator tRNAmet (3′-ACCAUAGUCUCGGUCCAA-5′). The first nucleotide of this binding site was designated as the start of the circular viral genome. The 529 bp partial RT/RNase H-coding region delineated by the BadnaFP/RP primers showed highest nucleotide identity (99.6%) to a partial RT/RNase H-coding sequence of Dioscorea bacilliform virus isolate VU249_Db (GenBank accession AM072705) and 92.5-94.5% similarity to two other partial sequences, VU254_DP and VU252_Db (AM072707 and AM072706, respectively) originating from Vanuatu. When compared to published DBALV full length sequences, the complete genome sequence of DBALV-VUT02_De had highest sequence identities with DBALV isolate DBALV-[2ALa] (89.8%, KX008572), DBALV-[2ALb] (89.7%, KX008571) and DBALV-[3RT] (85.5%, KX008595). The complete genome sequence of DBALV-VUT02_De has been deposited in GenBank under accession number MG948562.
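The complementarity between the putative tRNAmet binding site and the initiator tRNAmet consensus can be checked position by position. The sketch below is an illustration under stated assumptions: the two 18 nt sequences are paired in the order in which they are printed in this section, the viral strand is treated as DNA and the tRNA as RNA, and only Watson-Crick pairs are counted (G-U wobble pairs are ignored). The function name is hypothetical.

```python
# RNA base that forms a Watson-Crick pair with each DNA base.
PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}


def paired_positions(dna, rna):
    """Return a list of booleans: True where the DNA and RNA bases pair."""
    if len(dna) != len(rna):
        raise ValueError("sequences must have the same length")
    return [PAIRS[d] == r for d, r in zip(dna.upper(), rna.upper())]


BINDING_SITE = "TGGTATCAGAGCTTGGTT"    # viral sequence, as printed in the text
TRNA_CONSENSUS = "ACCAUAGUCUCGGUCCAA"  # tRNAmet consensus, as printed in the text

flags = paired_positions(BINDING_SITE, TRNA_CONSENSUS)
n_paired = sum(flags)
# 1-based positions where the two printed sequences do not pair.
mismatches = [i + 1 for i, ok in enumerate(flags) if not ok]

if __name__ == "__main__":
    print(f"{n_paired}/{len(flags)} positions paired; mismatches at {mismatches}")
```

With the sequences exactly as printed, 16 of the 18 positions form Watson-Crick pairs and positions 13 and 14 do not, so the site is complementary to the consensus over most, but not all, of its length.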


4. Discussion

In this paper, two improved RCA protocols for the amplification and characterization of badnaviruses are described. The major advantage of these protocols over previously described methods is that they avoid the use of premixed reaction components, such as those included in the commercial TempliPhi kit, thereby allowing each component of the reaction mixture to be varied to achieve optimum viral genome amplification. The use of a suite of badnavirus degenerate primers in the denaturation step of these optimized protocols significantly increased the amplification bias towards the target virus thus reducing non-target amplification.

Although RCA using kit-based protocols, such as TempliPhi, has been successfully used to amplify badnaviruses from various host species, such as banana (Baranwal et al., 2014; Carnelossi et al., 2014; James et al., 2011b; Javer-Higginson et al., 2014; Sharma et al., 2015, 2014; Wambulwa et al., 2013, 2012), cacao (Chingandu et al., 2017a, 2017b; Muller et al., 2018), fig (Laney et al., 2012), mulberry (Chiumenti et al., 2016), Rubus spp. (Diaz-Lara et al., 2015) and yam (Bömer et al., 2018, 2016; Sukal et al., 2017; Umber et al., 2014), the method still has several limitations. Due to the sequence-independent nature of the kit-based RCA protocols, plant-genome derived DNAs, such as mitochondrial or chloroplast DNA, are sometimes also amplified (Bömer et al., 2016; Homs et al., 2008; Paprotka et al., 2010). Further, during the initial annealing process, the random hexamer primers in the premixed sample buffer bind to all available nucleic acids and not just the preferred target sequences, enabling non-target DNA amplification in addition to the desired target. Although the addition of virus-specific primers reported by James et al. (2011a) creates a bias towards the target badnavirus genomes, non-specific amplification still occurs as a consequence of priming to non-target DNA by the random primers present in the premixed sample buffer used in the denaturation step. Modification of the initial denaturation mixture to only contain primers which anneal to the target virus sequences results in amplification which is biased towards the target circular virus genomes. Compared to random-primed RCA using the TempliPhi kit, where <1% of the NGS reads mapped to badnavirus sequences, the use of either D-RCA or SP-RCA resulted in 85.71% and 84.78% of respective reads mapping to the target badnavirus, showing that these RCA protocols greatly enhance amplification of the target virus genome.

In an effort to optimize the RCA protocol to detect the greatest breadth of badnavirus sequence diversity, in silico primer-binding analysis using the 32 degenerate primers developed in this study was carried out using complete genome sequences of the 43 currently recognized badnavirus species. This analysis showed that at least 10 of the primers were able to bind to every complete genome sequence. Using these primers, the SP- and D-RCA protocols were shown to successfully amplify distantly related badnaviruses including four distinct BSVs, four distinct DBVs, as well as SCBIMV and TaBV, from four different host plant species, highlighting the utility of the method for both detection and characterization of badnaviruses.
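An in silico primer-binding analysis of the kind described above can be approximated with regular expressions. The following is a simplified sketch, not the Geneious-based analysis used in the study: it requires an exact match to the degenerate pattern (no mismatches allowed), searches both strands of a linear sequence, and ignores sites that span the origin of a circular genome. The function names and the toy genome are assumptions; the three primer sequences are copied from Table 1.

```python
import re

# IUPAC ambiguity codes written as regular-expression character classes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]",
    "M": "[AC]", "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def binds(primer, genome):
    """True if the primer matches the genome on either strand."""
    pattern = re.compile("".join(IUPAC_RE[base] for base in primer.upper()))
    genome = genome.upper()
    return bool(pattern.search(genome) or pattern.search(revcomp(genome)))


def primers_binding(primers, genome):
    """Return the names of all primers with at least one binding site."""
    return [name for name, seq in primers.items() if binds(seq, genome)]


PRIMERS = {
    "Badna_RCA1": "TGGTATCAGAGCWDDGT",
    "Badna_RCA78": "ADAYKCCWCCCCAWCC",
    "Badna_RCA46": "TAYAARGGHAARCCWCA",
}
# Toy sequence (not a real genome): a forward-strand Badna_RCA1 site and a
# reverse-strand Badna_RCA78 site separated by a short spacer.
TOY_GENOME = "CCC" + "TGGTATCAGAGCTTGGT" + "AAAA" + revcomp("AGACGCCACCCCATCC")

if __name__ == "__main__":
    print(primers_binding(PRIMERS, TOY_GENOME))
```

Counting the length of the returned list for each genome in a collection would give the number of primers predicted to bind each one; a real analysis would normally also allow a limited number of mismatches away from the 3′ end.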

We also found that increasing incubation temperature to 36°C greatly improved the performance of kit-based RCA protocols such as the RP-RCA and primer-spiked RP-RCA (Fig. 3B and D, respectively). However, D-RCA was found to be the most consistent and reproducible protocol for generic badnavirus amplification from the different host plants. Although SP-RCA performed to an equivalent level in some cases, there was some variability in amplification levels possibly due to variability in the number of primers binding to different target virus genomes. Nevertheless, the reliability of SP-RCA should improve further if specific primers are designed for the target virus species of interest.

RCA post-amplification analysis often involves restriction analysis to confirm viral genome amplification; however, this depends on knowledge of suitable restriction enzymes that generate reproducible restriction profiles. The genomic heterogeneity of many badnaviruses, together with the limited availability of complete genome sequences for some virus species, complicates the use of restriction analysis for virus detection. By utilizing NGS of total undigested RCA products, both virus detection and full genome characterization can be accomplished. The combination of RCA and NGS has previously been used for the characterization of circular DNA viruses including geminiviruses (Leke et al., 2016; Zubair et al., 2017; Idris et al., 2014; Kathurima et al., 2016) and badnaviruses (Chingandu et al., 2017a, 2017b; Muller et al., 2018). Previous work using RP-RCA-NGS to characterize the badnaviruses Cacao mild mosaic virus (CaMMV) and Cacao yellow vein-banding virus (CYVBV) showed that, of the total 2,111,947 and 3,664,739 NGS reads obtained by sequencing of RCA products, only 1,084,938 and 15,355 reads (representing 51.37% and 0.4% of the total reads, respectively) were derived from the target viral sequences (Chingandu et al., 2017b). The SP- and D-RCA methods described herein produced a far greater percentage of reads (~85%) mapping to the target DBALV2 genome than the RP-RCA protocol (~1%). This result highlights the improvement in target sequence amplification gained by omitting random primers from the initial denaturation/annealing step, as well as the utility of NGS for diagnosing and characterizing badnavirus genomes. The costs associated with NGS may preclude its use as a routine diagnostic tool; however, if enough badnavirus genome information can be amassed through initial RCA-NGS efforts, RCA restriction analysis can then be used as an effective diagnostic tool.
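The on-target percentages quoted above for the earlier RP-RCA-NGS work follow directly from the reported read counts. The short sketch below reproduces that arithmetic; the function name is illustrative only and is not part of any published pipeline.

```python
def percent_on_target(target_reads: int, total_reads: int) -> float:
    """Return the percentage of sequenced reads derived from the target genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * target_reads / total_reads


# Read counts reported by Chingandu et al. (2017b) for RP-RCA-NGS.
cammv_percent = percent_on_target(1_084_938, 2_111_947)
cyvbv_percent = percent_on_target(15_355, 3_664_739)

print(f"CaMMV: {cammv_percent:.2f}% of reads on target")
print(f"CYVBV: {cyvbv_percent:.2f}% of reads on target")
```

The same calculation can be applied to any mapping result to compare enrichment between RCA protocols, provided the total read counts are taken after identical quality filtering.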

The high levels of heterogeneity at both serological and genetic levels and the presence of host genome integrated sequences believed to be remnants of ancient viral sequences make characterization and diagnosis of some badnaviruses difficult. The optimized RCA protocols described in this study coupled with NGS can be used for the characterization and detection of badnaviruses from a range of host species. The potential for using restriction analysis of either D-RCA or SP-RCA products as a diagnostic tool for badnavirus detection remains high, with continual sequencing of complete genomes improving knowledge of suitable restriction enzymes for digestion of reaction products, particularly for the host/badnavirus combinations described in this study.

Acknowledgements

The authors would like to thank the Centre for Pacific Crops and Trees (CePaCT) of the Pacific Community (SPC) for making their yam collections available for this project. The authors would also like to thank Dr. Michael Furlong and Dr. Grahame Jackson for their support and advice on this research. This work was funded under the Australian Centre for International Agricultural Research project PC/2010/065. AS is a John Allwright Fellowship recipient.


References

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S.,

Lesin, V.M., Nikolenko, S.I., Pham, S., Prjibelski, A.D., Pyshkin, A. V., Sirotkin,

A. V., Vyahhi, N., Tesler, G., Alekseyev, M.A., Pevzner, P.A., 2012. SPAdes: A

new genome assembly algorithm and its applications to single-cell sequencing. J.

Comput. Biol. 19, 455–477.

Baranwal, V.K., Sharma, S.K., Khurana, D., Verma, R., 2014. Sequence analysis of

shorter than genome length episomal Banana streak OL virus like sequences

isolated from banana in India. Virus Genes 48, 120–127.

Bhat, A., Hohn, T., Selvarajan, R., 2016. Badnaviruses: The current global scenario.

Viruses 8, 177.

Blanco, L., Bernads, A., Lharo, J.M., Martins, G., Garmendia, C., 1989. Highly

efficient DNA synthesis by the phage φ29 DNA polymerase. J. Biol. Chem.

264, 8935–8940.

Bömer, M., Rathnayake, A.I., Visendi, P., Silva, G., Seal, S.E., 2018. Complete

genome sequence of a new member of the genus Badnavirus, Dioscorea

bacilliform RT virus 3, reveals the first evidence of recombination in yam

badnaviruses. Arch. Virol. 163, 533–538.

Bömer, M., Turaki, A., Silva, G., Kumar, P., Seal, S., 2016. A sequence-independent

strategy for amplification and characterization of episomal badnavirus sequences

reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

Borah, B.K., Sharma, S., Kant, R., Johnson, A.M.A.A., Saigopal, D.V.R., Dasgupta,

I., 2013. Bacilliform DNA-containing plant viruses in the tropics: Commonalities

within a genetically diverse group. Mol. Plant Pathol. 14, 759–771.

Carnelossi, P.R., Bijora, T., Facco, C.U., Silva, J.M., Picoli, M.H.S., Souto, E.R.,


Oliveira, F.T. De, 2014. Episomal detection of banana streak OL virus in single

and mixed infection with Cucumber mosaic virus in banana “Nanicão Jangada.”

Trop. Plant Pathol. 39, 342–346.

Chingandu, N., Kouakou, K., Aka, R., Ameyaw, G., Gutierrez, O.A., Herrmann, H.-

W., Brown, J.K., 2017a. The proposed new species, cacao red vein virus, and

three previously recognized badnavirus species are associated with cacao swollen

shoot disease. Virol. J. 14, 199.

Chingandu, N., Zia-ur-rehman, M., Sreenivasan, T.N., Surujdeo-Maharaj, S.,

Umaharan, P., Gutierrez, O.A., Brown, J.K., Thyail, M.Z., 2017b. Molecular characterization of previously elusive

badnaviruses associated with symptomatic cacao in the New World. Arch. Virol.

162, 1363–1371.

Chiumenti, M., Morelli, M., De Stradis, A., Elbeaino, T., Stavolone, L., Minafra, A.,

2016. Unusual genomic features of a badnavirus infecting mulberry. J. Gen.

Virol. 97, 3073–3087.

Côte, F.X., Galzi, S., Folliot, M., Lamagnère, Y., Teycheney, P., 2010.

Micropropagation by tissue culture triggers differential expression of infectious

endogenous Banana streak virus sequences (eBSV) present in the B genome of

natural and synthetic interspecific banana plantains. Mol. Plant Pathol. 11, 137–

144.

Dallot, S., Acuña, P., Rivera, C., Ramírez, P., Cote, F., Lockhart, B.E.L., Caruana,

M.L., 2001. Evidence that the proliferation stage of micropropagation procedure

is determinant in the expression of Banana streak virus integrated into the genome

of the FHIA 21 hybrid (Musa AAAB). Arch. Virol. 146, 2179–2190.


Diaz-Lara, A., Mosier, N.J., Keller, K.E., Martin, R.R., 2015. A variant of Rubus

yellow net virus with altered genomic organization. Virus Genes 50, 104–110.

Gayral, P., Noa-Carrazana, J.-C., Lescot, M., Lheureux, F., Lockhart, B.E.L.,

Matsumoto, T., Piffanelli, P., Iskra-Caruana, M.-L., 2008. A single banana streak

virus integration event in the banana genome as the origin of infectious

endogenous pararetrovirus. J. Virol. 82, 6697–6710.

Geering, A.D., 2014. Caulimoviridae (Plant Pararetroviruses), in: ELS. John Wiley &

Sons, Ltd, Chichester, UK.

Geering, A.D., McMichael, L.A., Dietzgen, R.G., Thomas, J.E., 2000. Genetic

diversity among banana streak virus isolates from Australia. Phytopathology 90,

921–927.

Geering, A.D.W., Hull, R., 2012. Family Caulimoviridae, in: King, A.M.Q., Adams,

M.J., Carstens, E.B., Lefkowitz, E.J. (Eds.), Virus Taxonomy. Ninth Report of

the International Committee on Taxonomy of Viruses. Elsevier Academic Press,

Amsterdam, The Netherlands, pp. 429–443.

Grigoras, I., Timchenko, T., Katul, L., Grande-Pérez, A., Vetten, H.-J., Gronenborn,

B., 2009. Reconstitution of authentic from multiple cloned DNAs. J.

Virol. 83, 10778–10787.

Haible, D., Kober, S., Jeske, H., 2006. Rolling circle amplification revolutionizes

diagnosis and genomics of geminiviruses. J. Virol. Methods 135, 9–16.

Harper, G., Hart, D., Moult, S., Hull, R., Geering, A., Thomas, J., 2005. The diversity

of banana streak virus isolates in Uganda. Arch. Virol. 150, 2407–2420.

Harper, G., Hart, D., Moult, S., Hull, R., 2004. Banana streak virus is very diverse in

Uganda. Virus Res. 100, 51–56.

Hohn, T., Richert-Pöggeler, K.R., Staginnus, C., Harper, G., Schwarzacher, T., Teo,


C.H., Teycheney, P.Y., Iskra-Caruana, M.L., Hull, R., 2008. Evolution of

integrated plant viruses. Evol. 53–81.

Holmes, E.C., 2011. The evolution of endogenous viral elements. Cell Host Microbe

10, 368–377.

Homs, M., Kober, S., Kepp, G., Jeske, H., 2008. Mitochondrial plasmids of sugar beet

amplified via rolling circle method detected during screening. Virus

Res. 136, 124–129.

Inoue-Nagata, A.K., Albuquerque, L.C., Rocha, W.B., Nagata, T., 2004. A simple

method for cloning the complete genome using the bacteriophage

φ29 DNA polymerase. J. Virol. Methods 116, 209–211.

Idris, A., Al-saleh, M., Piatek, M.J., Al-shahwan, I., Ali, S., Brown, J.K.,

2014. Viral metagenomics: Analysis of

by illumina high-throughput sequencing. Viruses 6, 1219–1236.

Iskra-Caruana, M.-L., Duroy, P.O., Chabannes, M., Muller, E., 2014. The common

evolutionary history of badnaviruses and banana. Infect. Genet. Evol. 21, 83–89.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011a. Development of a

novel rolling-circle amplification technique to detect banana streak virus that also

discriminates between integrated and episomal virus sequences. Plant Dis. 95,

57–62.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011b. Molecular

characterization of six badnavirus species associated with leaf streak disease of

banana in East Africa. Ann. Appl. Biol. 158, 346–353.

Jaufeerally-Fakim, Y., Khorugdharry, A., Harper, G., 2006. Genetic variants of banana

streak virus in Mauritius. Virus Res. 115, 91–98.


Javer-Higginson, E., Acina-Mambole, I., González, J.E., Font, C., González, G.,

Echemendía, A.L., Muller, E., Teycheney, P.Y., 2014. Occurrence, prevalence

and molecular diversity of banana streak viruses in Cuba. Eur. J. Plant Pathol.

138, 157–166.

Johne, R., Müller, H., Rector, A., van Ranst, M., Stevens, H., 2009. Rolling-circle

amplification of viral DNA genomes using phi29 polymerase. Trends Microbiol.

17, 205–211.

Kathurima, T.M., Ateka, E.M., Nyende, A.B., Holton, T., 2016. The rolling circle

amplification and next generation sequencing approaches reveal genome wide

diversity of Kenyan cassava mosaic geminivirus. African J. Biotechnol. 15,

2045–2052.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton,

S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P.,

Drummond, A., 2012. Geneious Basic: An integrated and extendable desktop

software platform for the organization and analysis of sequence data.

Bioinformatics 28, 1647–1649.

Kenyon, L., Lebas, B.S.M., Seal, S.E., 2008. Yams (Dioscorea spp.) from the South

Pacific Islands contain many novel badnaviruses: implications for international

movement of yam germplasm. Arch. Virol. 153, 877–889.

Kleinow, T., Nischang, M., Beck, A., Kratzer, U., Tanwir, F., Preiss, W., Kepp, G.,

Jeske, H., 2009. Three C-terminal phosphorylation sites in the Abutilon mosaic

virus movement protein affect symptom development and viral DNA

accumulation. Virol. 390, 89–101.

Laney, A.G., Hassan, M., Tzanetakis, I.E., 2012. An integrated badnavirus is prevalent

in fig germplasm. Phytopathology 102, 1182–1189.


Leke, W.N., Khatabi, B., Mignouna, D.B., Brown, J.K., Fondong, V.N., 2016.

Complete genome sequence of a new bipartite begomovirus infecting cotton in

the Republic of Benin in West Africa. Arch. Virol. 161, 2329–2333.

Lheureux, F., Carreel, F., Jenny, C., Lockhart, B.E.L., Iskra-Caruana, M.L., 2003.

Identification of genetic markers linked to banana streak disease expression in

inter-specific Musa hybrids. Theor. Appl. Genet. 106, 594–598.

Lockhart, B.E.L., Olszewski, N.E., Bouhida, M., 1993. An analysis of the complete sequence of a sugarcane bacilliform virus

genome infectious to banana and rice. J. Gen. Virol. 74, 15–22.

Marincevic-zuniga, Y., Gustavsson, I., Gyllensten, U., 2012. Multiply-primed rolling

circle amplification of human papillomavirus using sequence-specific primers.

Virology 432, 57–62.

Muller, E., Ravel, S., Agret, C., Abrokwah, F., Dzahini-Obiatey, H., Galyuon, I.,

Kouakou, K., Jeyaseelan, E.C., Allainguillaume, J., Wetten, A., 2018. Next

generation sequencing elucidates cacao badnavirus diversity and reveals the

existence of more than ten viral species. Virus Res. 244, 235–251.

Paprotka, T., Boiteux, L.S., Fonseca, M.E.N., Resende, R.O., Jeske, H., Faria, J.C.,

Ribeiro, S.G., 2010. Genomic diversity of sweet potato geminiviruses in a

Brazilian germplasm bank. Virus Res. 149, 224–233.

Rockett, R., Barraclough, K.A., Isbel, N.M., Dudley, K.J., Nissen, M.D., Sloots, T.P.,

Bialasiewicz, S., 2015. Specific rolling circle amplification of low-copy human

polyomaviruses BKV, HPyV6, HPyV7, TSPyV, and STLPyV. J. Virol. Methods

215–216, 17–21.

Seal, S., Turaki, A., Muller, E., Kumar, P.L., Kenyon, L., Filloux, D., Galzi, S., Lopez-


Montes, A., Iskra-Caruana, M.L., 2014. The prevalence of badnaviruses in West

African yams (Dioscorea cayenensis-rotundata) and evidence of endogenous

pararetrovirus sequences in their genomes. Virus Res. 186, 144–154.

Sharma, S.K., Vignesh Kumar, P., Geetanjali, A.S., Pun, K.B., Baranwal, V.K., 2015.

Subpopulation level variation of banana streak viruses in India and common

evolution of banana and sugarcane badnaviruses. Virus Genes 50, 450–465.

Sharma, S.K., Vignesh Kumar, P., Poswal, R., Rai, R., Swapna Geetanjali, A., Prabha,

K., Jain, R.K., Baranwal, V.K., 2014. Occurrence and distribution of banana

streak disease and standardization of a reliable detection procedure for routine

indexing of banana streak viruses in India. Sci. Hortic. 179, 277–283.

Staginnus, C., Iskra-Caruana, M.L., Lockhart, B., Hohn, T., Richert-Pöggeler, K.R.,

2009. Suggestions for a nomenclature of endogenous pararetroviral sequences in

plants. Arch. Virol. 154, 1189–1193.

Sukal, A., Kidanemariam, D., Dale, J., James, A., Harding, R., 2017. Characterization

of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel

species and the first report of Dioscorea bacilliform RT virus 2. Virus Res. 238,

29–34.

Turaki, A.A., 2014. Characterization of badnavirus Sequences in West African Yams

(Dioscorea spp.). PhD thesis, University of Greenwich, United Kingdom, 240.

Umber, M., Filloux, D., Muller, E., Laboureau, N., Galzi, S., Roumagnac, P., Iskra-

Caruana, M.-L., Pavis, C., Teycheney, P.-Y., Seal, S.E., 2014. The genome of

African yam (Dioscorea cayenensis-rotundata complex) hosts endogenous

sequences from four distinct badnavirus species. Mol. Plant Pathol. 15, 790–801.

Wambulwa, M.C., 2012. Rolling circle amplification is more sensitive than PCR and

serology-based methods in detection of banana streak virus in Musa germplasm.


Am. J. Plant Sci. 03, 1581–1587.

Wambulwa, M.C., Wachira, F.N., Karanja, L.S., Kiarie, S.M., Muturi, S.M., 2013.

The influence of host and pathogen genotypes on symptom severity in banana

streak disease. African J. Biotechnol. 12, 27–31.

Yang, I.C., Hafner, G.J., Dale, J.L., Harding, R.M., 2003. Genomic characterization

of taro bacilliform virus. Arch. Virol. 148, 937–949.

Zubair, M., Zaidi, S.S.-A., Shakir, S., Farooq, M., Amin, I., Scheffler, J.A., Scheffler,

B.E., Mansoor, S., 2017. Multiple begomoviruses found associated with cotton

leaf curl disease in Pakistan in early 1990 are back in cultivated cotton. Sci. Rep.

7, 680.


Chapter 6

Characterization and genetic diversity of Dioscorea bacilliform viruses infecting Pacific yam germplasm collections

Amit C. Sukal1,2, Dawit B. Kidanemariam1, James L. Dale1, Robert M. Harding1, Anthony P. James1*

1 Centre for Tropical Crops and Biocommodities, Queensland University of Technology, Brisbane, Queensland, Australia

2 Centre for Pacific Crops and Trees, Pacific Community, Suva, Fiji.

* Corresponding author:

E-mail address: [email protected] (APJ)

[Formatted for submission to Plant Pathology]


QUT Verified Signature



Abstract

Dioscorea bacilliform viruses (DBVs) are members of the family Caulimoviridae, genus Badnavirus and are important pathogens of yam (Dioscorea spp.). DBVs are difficult to diagnose due to their genetic and serological diversity, and this is further complicated by the fact that some DBV species occur as endogenous sequences. To date, the complete genome sequences of eight virus species have been determined, with an additional eight putative species described based on partial RT-RNase H-coding sequences. Using RCA, we screened 224 accessions in a Pacific yam germplasm collection and 35 tested positive, with sequencing confirming all positive samples as either Dioscorea bacilliform AL virus (DBALV) or Dioscorea bacilliform AL virus 2 (DBALV2). DBALV was only present in Vanuatu and Tonga, while DBALV2 was restricted to Papua New Guinea. Twenty complete genome sequences were generated, including 10 of DBALV with a nucleotide sequence identity ranging from 89 to 90%, and 10 of DBALV2 with a nucleotide sequence identity ranging from 87 to 89%. Based on these results, DBALV and DBALV2 appear to be the most prevalent badnavirus species infecting yams in the Pacific region. Analysis of NGS reads from RCA-negative samples failed to identify sequences with similarity to badnaviruses, further demonstrating the potential of RCA for detecting DBVs. For both DBALV and DBALV2, several distinct restriction profiles were observed amongst isolates of each species, providing support for the existence of sequence variants. The direct sequencing of RCA products using NGS highlights the utility of RCA for episomal virus characterization, which will lead to improved diagnostics to support the safe exchange of yam germplasm.

Keywords: Yam, badnavirus, Dioscorea bacilliform virus, DBALV, DBALV2, D-RCA, RCA-NGS


Introduction

Yams (Dioscorea spp.) provide a staple food source for millions of people in Africa, South America, Asia and the Pacific, and the crop has significant cultural and economic importance (Orkwor, 1998; Bourke & Vlassak, 2004; Sukal et al., 2015). Although yam production is highest in the African region, where Dioscorea rotundata-cayenensis predominates and which accounts for 95% of total world production (FAOSTAT, 2018), yam is also very important in the Pacific, where D. alata and D. esculenta are the dominant species (Kenyon et al., 2008). Yam is cultivated almost exclusively through vegetative propagation, which facilitates the vertical transmission of viruses, leading to virus accumulation and associated production losses. Virus accumulation presents major challenges in the exchange of yam germplasm (Kenyon et al., 2008; Seal et al., 2014; Sukal et al., 2015).

Badnaviruses, classified as a number of Dioscorea bacilliform virus (DBV) species, are reported as being the most widespread of all the virus groups known to infect yams (Eni et al., 2008a,b; Kenyon et al., 2008; Bousalem et al., 2009). DBVs are vegetatively transmitted as well as vectored by several species of mealybugs (family Pseudococcidae) in a semi-persistent manner (Phillips et al., 1999; Kenyon et al., 2001; Atiri et al., 2003; Odu et al., 2004; Bömer et al., 2016). Although infection of yams with DBVs has been associated with leaf symptoms such as veinal chlorosis, necrosis, puckering and crinkling, symptomless infections can also occur (Phillips et al., 1999; Lebot, 2009; Seal et al., 2014; Bömer et al., 2016).

At present, five species of yam-infecting badnaviruses, Dioscorea bacilliform AL virus (DBALV), Dioscorea bacilliform RT virus 1 (DBRTV1), Dioscorea bacilliform RT virus 2 (DBRTV2), Dioscorea bacilliform TR virus (DBTRV) and Dioscorea bacilliform SN virus (DBSNV), are taxonomically accepted by the ICTV (Adams & Carstens, 2012; Adams et al., 2018), and a further three species, Dioscorea bacilliform AL virus 2 (DBALV2), Dioscorea bacilliform ES virus (DBESV) and Dioscorea bacilliform RT virus 3 (DBRTV3), have been described (Sukal et al., 2017; Bömer et al., 2018). However, additional studies using PCR suggest that there may be at least another eight putative badnavirus species infecting yams worldwide (Kenyon et al., 2008; Bousalem et al., 2009; Bömer et al., 2016).

The Centre for Pacific Crops and Trees (CePaCT) within the Pacific Community (SPC) has a unique collection of yam germplasm from the Pacific region. The potential use of this germplasm to address production issues, such as yield losses from pests and diseases, as well as agronomic traits pertinent to export requirements, is immense but remains unexplored because suitable diagnostic protocols for the detection of badnaviruses are unavailable. Sukal et al. (2017) recently showed the existence of three different DBVs (DBALV2, DBESV and DBRTV2) in Pacific germplasm collections using RCA and sequencing. They further optimized the RCA protocol to improve badnavirus amplification and showed that RCA coupled with next generation sequencing can be used as an effective tool for DBV amplification and characterization (Sukal et al., 2018, manuscript in preparation). In this study, we further apply this approach to characterize the molecular diversity of badnaviruses infecting the Pacific yam germplasm conserved at SPC-CePaCT.

Methods

Sample details, total nucleic acid (TNA) extractions

SPC-CePaCT presently maintains a collection of 283 yam accessions in tissue culture. For this study, 224 of these accessions were established in SPC-CePaCT’s insect-proof screenhouse. The remaining 59 yam accessions were not available for this study because they were held in low numbers. At least three months after acclimatization, leaf samples were collected and total nucleic acids (TNA) extracted using a CTAB extraction protocol described previously (Kleinow et al., 2009). The purified TNA was quantified using a NanoDrop 2000 spectrophotometer (ThermoFisher Scientific, Australia) and the concentration adjusted to ~500 ng/μl with sterile nuclease-free water (NF-H2O). DA/NGA01, an accession from Nigeria that was virus-screened by IITA before being sent to SPC-CePaCT and experimentally determined to be negative for episomal DBV, was used as a negative control for the RCA screening.
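Adjusting each extract to ~500 ng/μl is a standard C1V1 = C2V2 dilution. The sketch below is illustrative only: the function name, the stock concentration of 1250 ng/μl and the 50 μl final volume are hypothetical values chosen for the example and are not taken from this study.

```python
def dilution_volumes(stock_ng_per_ul: float, target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """Return (volume of stock, volume of water) in microlitres using C1V1 = C2V2."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical extract measured at 1250 ng/ul, diluted to 500 ng/ul in 50 ul.
tna_ul, water_ul = dilution_volumes(1250, 500, 50)
print(f"Mix {tna_ul:.1f} ul TNA with {water_ul:.1f} ul NF-H2O")
```

Extracts that measure below the target concentration cannot be adjusted by dilution, which is why the sketch raises an error in that case rather than returning a negative volume.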

Viral DNA enrichment, RCA-RFLP, cloning and Sanger sequencing

A badnavirus-biased RCA approach, described by Sukal et al. (2018; manuscript in preparation), was used to enrich for viral circular DNA. Briefly, a mixture of 32 degenerate badnavirus primers at a final concentration of 0.4 μM of each primer, 1 × phi29 buffer (NEB, Australia) and 1 μl (~500 ng) of TNA was made up to a final volume of 10 μl with sterile NF-H2O, denatured at 95°C for 3 min, cooled to 4°C and placed on ice. A 10 μl reaction mixture consisting of 2.5 μM exo-resistant random hexamers (ThermoFisher Scientific), 1 × phi29 buffer, 2 ng/μl bovine serum albumin (BSA), 4 mM DTT, 15 mM dNTPs, 5 U/μl of phi29 DNA polymerase (ThermoFisher Scientific) and sterile NF-H2O to make up the final volume was then prepared and added to each denatured sample. Reactions were incubated at 36°C for 18 h, followed by 65°C for 10 min to inactivate the phi29 polymerase.

RCA products were digested independently with several restriction endonucleases, including EcoRI and SphI (NEB, USA). The enzymes were selected following in silico restriction analysis of published badnavirus genome sequences and from experimental experience. Digest products were electrophoresed through 1.5% agarose gels stained with SYBR® Safe (ThermoFisher Scientific) and fragments of interest were excised, purified and cloned as described in Sukal et al. (2017). The plasmids were screened using restriction analysis and insert-containing plasmids were sequenced with universal M13F/R primers. The resulting reads were queried against GenBank using the BLASTn and BLASTx algorithms and, where sequence reads matched badnavirus genomes, the cloned DNAs were further sequenced with the badnavirus degenerate primers BadnaFP/RP (Yang et al., 2003). A primer-walking approach was subsequently used to sequence the complete genomes of four DBALV isolates infecting three samples from Vanuatu (one each of D. alata, D. transversa and D. trifida) and one D. esculenta sample from Tonga. The putative restriction sites were confirmed with PCR using sequence-specific primers flanking the sites as described previously (Sukal et al., 2017).
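The in silico restriction analysis mentioned above can be illustrated with a minimal sketch that predicts the fragments released when a circular genome is cut by EcoRI (GAATTC) or SphI (GCATGC). This is an assumption-laden illustration, not the software used in this study: the function names are hypothetical, only the two palindromic recognition sequences are handled, and positions refer to the start of the recognition site rather than the exact cleavage point, which does not change the predicted fragment lengths.

```python
import re

# Recognition sequences of the two enzymes used for screening (both palindromic).
SITES = {"EcoRI": "GAATTC", "SphI": "GCATGC"}


def cut_positions(genome: str, site: str) -> list[int]:
    """Return 0-based start positions of a recognition site on a circular sequence."""
    genome = genome.upper()
    # Append the first len(site) - 1 bases so that sites spanning the origin are found.
    extended = genome + genome[: len(site) - 1]
    # The lookahead allows overlapping matches to be reported.
    return [m.start() for m in re.finditer(f"(?={site})", extended)]


def circular_fragments(genome_length: int, positions: list[int]) -> list[int]:
    """Return fragment lengths (largest first) after cutting a circle at the given positions."""
    if not positions:
        return []  # an uncut circle releases no linear fragments
    pos = sorted(positions)
    fragments = [b - a for a, b in zip(pos, pos[1:])]
    fragments.append(genome_length - pos[-1] + pos[0])  # fragment spanning the origin
    return sorted(fragments, reverse=True)


# Toy 30-nt "genome" with two EcoRI sites; real use would load a GenBank sequence.
toy = "GAATTC" + "A" * 4 + "GAATTC" + "A" * 14
print(circular_fragments(len(toy), cut_positions(toy, SITES["EcoRI"])))
```

An enzyme that cuts a circular genome once, as SphI does for most isolates in this study, yields a single fragment equal to the full genome length, which is the behaviour of `circular_fragments` when given a single position.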

Next generation sequencing and genome assembly

RCA was carried out as described previously, and undigested RCA products of 10 PNG samples (all from D. alata) and six Vanuatu samples (one from D. alata, two from D. esculenta and three from D. bulbifera) were purified using the Illustra™ GFX™ PCR DNA and Gel Band Purification Kit (GE Healthcare, United Kingdom) and sent to the Central Analytical Research Facility (CARF), Queensland University of Technology, Brisbane, Australia for library preparation and sequencing using the Illumina MiSeq system to generate paired-end reads of 301 bp. A further 98 samples that did not produce any apparent restriction profile following RCA restriction digest with EcoRI were purified, pooled by country and sent for the preparation of 13 additional libraries, including six libraries for Fiji, two libraries each for Vanuatu, New Caledonia and Federated States of Micronesia (FSM), and one library for PNG, with subsequent sequencing by NGS as described above.

Quality of the raw reads was assessed with FastQC v0.10.1 (Babraham Bioinformatics, UK). A pipeline similar to that described in Muller et al. (2018) for the processing of NGS data and characterization of badnavirus diversity from cacao was used in this study. Raw reads were trimmed to obtain optimum quality using the dynamic trim function of SolexaQA++ v.3.1.3 (Cox et al., 2010) and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Quality-corrected reads were then mapped against the D. rotundata reference genome (GenBank accessions DF933857-DF938579; Tamiru et al., 2017) using the Geneious® v11.0.2 (http://www.geneious.com; Kearse et al., 2012) reference mapper algorithm with default settings. Unmapped reads were assembled into contigs using the SPAdes v3.5.0 algorithm (Bankevich et al., 2012) with k-mers 21, 33 and 55 on the Galaxy platform (Afgan et al., 2015). The resulting contigs were imported into Geneious and BLASTn analysis was performed against a local database of all known Caulimoviridae complete genome sequences available on NCBI as of April 2018. Finally, the viral genome with the highest sequence similarity to each assembled contig was used to perform reference-guided mapping to generate a consensus viral genome. Geneious was used to manually examine each assembled genome, and ORF prediction on the plus-strand of the putative viral genomes was carried out using Geneious and ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) to determine putative viral open reading frames (ORFs) and identify conserved badnaviral sequence motifs.
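The plus-strand ORF prediction step can be illustrated with a minimal sketch. It is not the algorithm implemented in Geneious or ORFfinder; it simply scans the three forward reading frames for ATG-initiated ORFs above a minimum length. The function name, the default minimum length and the treatment of the sequence as linear (ORFs spanning the origin of the circular genome are not detected) are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_plus_strand_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-initiated ORFs on the plus strand.

    Coordinates are 0-based and end-exclusive, and the end includes the stop codon.
    min_codons counts codons from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # resume searching for the next ATG in this frame
    return orfs


# Toy sequence: one short ORF in frame 2 (ATG + five GCT codons + TAA).
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
print(find_plus_strand_orfs(toy, min_codons=3))
```

Restricting the scan to the plus strand mirrors the analysis described above; scanning the minus strand would only require repeating the call on the reverse complement.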


Pairwise sequence comparisons and phylogenetic analysis

Partial reverse transcriptase (RT)-ribonuclease H (RNase H) coding sequences delimited by the BadnaFP/RP (Yang et al., 2003) primers, representing the region accepted for species demarcation of badnaviruses by the ICTV (Geering and Hull, 2012), were used to determine the pairwise nucleotide sequence identity using the Sequence Demarcation Tool (SDTv1.2) (Muhire et al., 2014) for the full-length genomes generated. For phylogenetic analysis, multiple sequence alignments were constructed using ClustalW (Larkin et al., 2007) within MEGA7 (Kumar et al., 2016) and the Maximum-Likelihood method (Kimura-2-parameter model) used to reconstruct phylogenetic trees following 1000 bootstrap iterations.
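The two quantities underlying this analysis, pairwise nucleotide identity and the Kimura 2-parameter (K2P) distance, can be sketched for a pair of pre-aligned sequences as follows. This is a simplified illustration and not the procedure implemented in SDT or MEGA7: it assumes the sequences are already aligned, skips any column containing a gap or ambiguous base, and uses hypothetical function names. The 80% threshold in the final helper reflects the ICTV badnavirus species demarcation criterion of more than 20% nucleotide difference in the RT-RNase H region, and is included as an assumption for illustration.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def compare_aligned(a: str, b: str) -> tuple[int, int, int]:
    """Count compared sites, transitions and transversions for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def percent_identity(a: str, b: str) -> float:
    """Percentage of compared sites that are identical."""
    sites, ts, tv = compare_aligned(a, b)
    return 100.0 * (sites - ts - tv) / sites


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance: -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q))."""
    sites, ts, tv = compare_aligned(a, b)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def same_species(identity_percent: float, threshold: float = 80.0) -> bool:
    """Assumed demarcation rule: isolates at or above the threshold are one species."""
    return identity_percent >= threshold


# Toy aligned pair differing by one transition (C/T) in ten sites.
print(percent_identity("ACGTACGTAC", "ACGTACGTAT"), k2p_distance("ACGTACGTAC", "ACGTACGTAT"))
```

The K2P model treats transitions and transversions separately, so two pairs of sequences with the same identity can have slightly different distances depending on the type of substitutions they carry.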

Results

RCA and Sanger sequencing

TNA was extracted from 224 yam samples representing five yam species, including D. alata (185), D. esculenta (31), D. bulbifera (6), and one each of D. transversa and D. trifida. All extracts were screened for putative badnavirus infection using RCA followed by independent restriction digestion with EcoRI and SphI. In total, 35 samples from three countries, representing five yam species, produced restriction profiles indicative of badnaviral genome amplification, while none of the 42, 21 and 24 D. alata samples from Fiji, FSM or New Caledonia, respectively, and none of the 17, two and one D. esculenta samples from Fiji, PNG or Samoa, respectively, produced profiles indicative of badnavirus infection (Table 1).


Table 1 Summary of badnavirus RCA testing results

Country         D. alata            D. bulbifera        D. esculenta        D. transversa       D. trifida
                Acc.  Tested  Pos.  Acc.  Tested  Pos.  Acc.  Tested  Pos.  Acc.  Tested  Pos.  Acc.  Tested  Pos.
Fiji            43    42      0     0     0       0     20    17      0     0     0       0     0     0       0
FSM             24    21      0     0     0       0     0     0       0     0     0       0     0     0       0
PNG             44    38      15    0     0       0     6     2       0     0     0       0     0     0       0
New Caledonia   40    24      0     0     0       0     0     0       0     0     0       0     0     0       0
Samoa           4     2       0     1     1       0     1     1       0     0     0       0     0     0       0
Tonga           1     0       0     0     0       0     3     2       2     0     0       0     0     0       0
Vanuatu         73    58      2     7     5       5     11    9       9     1     1       1     1     1       1
Total           229   185     17    8     6       5     41    31      11    1     1       1     1     1       1

Acc. = number of accessions; Tested = number of accessions tested for badnavirus; Pos. = number of accessions positive for badnavirus.
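The counts in Table 1 can be cross-checked against the totals quoted in the text (224 accessions tested, 35 positive) with a short tally. The data structure and function names below are illustrative; the numbers are transcribed from Table 1, and species with no accessions in a country are simply omitted from that country's entry.

```python
# (accessions held, accessions tested, accessions positive), transcribed from Table 1.
TABLE_1 = {
    "Fiji": {"D. alata": (43, 42, 0), "D. esculenta": (20, 17, 0)},
    "FSM": {"D. alata": (24, 21, 0)},
    "PNG": {"D. alata": (44, 38, 15), "D. esculenta": (6, 2, 0)},
    "New Caledonia": {"D. alata": (40, 24, 0)},
    "Samoa": {"D. alata": (4, 2, 0), "D. bulbifera": (1, 1, 0),
              "D. esculenta": (1, 1, 0)},
    "Tonga": {"D. alata": (1, 0, 0), "D. esculenta": (3, 2, 2)},
    "Vanuatu": {"D. alata": (73, 58, 2), "D. bulbifera": (7, 5, 5),
                "D. esculenta": (11, 9, 9), "D. transversa": (1, 1, 1),
                "D. trifida": (1, 1, 1)},
}


def totals_by_country(table: dict) -> dict:
    """Return {country: (tested, positive)} summed over all yam species."""
    return {
        country: (sum(v[1] for v in species.values()),
                  sum(v[2] for v in species.values()))
        for country, species in table.items()
    }


def totals_by_species(table: dict) -> dict:
    """Return {species: (accessions, tested, positive)} summed over all countries."""
    totals: dict = {}
    for species_counts in table.values():
        for species, counts in species_counts.items():
            previous = totals.get(species, (0, 0, 0))
            totals[species] = tuple(p + c for p, c in zip(previous, counts))
    return totals


by_country = totals_by_country(TABLE_1)
print("Tested:", sum(t for t, _ in by_country.values()))
print("Positive:", sum(p for _, p in by_country.values()))
print({c: p for c, (_, p) in by_country.items() if p > 0})
```

Tallying by country also makes the geographic pattern explicit: positive accessions occur only in the entries for PNG, Tonga and Vanuatu.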


Of the 185 D. alata accessions tested, 15/38 samples from PNG and 2/58 samples from Vanuatu produced digest profiles indicative of badnavirus infection following digestion of RCA-amplified DNA. In these 17 accessions, digestion using EcoRI produced between three and seven fragments of up to 4 kb (represented in Fig. 1A and 2A), while digestion using SphI produced a single 8 kb band in 15 accessions (represented in Fig. 1B and 2B). In accession DA/PNG58, SphI digestion produced two bands of 1 and 6 kb (Fig. 1B, lane 8), while in accession DA/VUT38, SphI digestion produced three bands of 2.5, 3.5 and 5 kb (Fig. 2B, lane 2).

Five distinct EcoRI restriction profiles (hereafter referred to as P1-P5) were observed in digests of RCA-amplified DNA from the D. alata samples from PNG (Fig. 1A). In eight samples (DA/PNG06, 20, 22, 23, 43, 51, 57 and 59), digestion using EcoRI produced profile P1 with seven fragments of 0.3, 0.5, 0.6, 0.8, 1.0, 1.6 and 3 kb (Fig. 1A, lanes 1-2), while in two samples (DA/PNG07 and 45) digestion using EcoRI produced profile P2 with seven fragments of 0.6, 0.8, 1.1, 1.6, 2.5, 3 and 4 kb (Fig. 1A, lanes 3-4). Three samples (DA/PNG05, 12 and 14) showed profile P3 with six fragments of 0.5, 0.8, 1.0, 1.6, 2.3 and 3 kb (Fig. 1A, lanes 5-6), sample DA/PNG17 showed profile P4 with three fragments of 1.4, 2.5 and 4 kb (Fig. 1A, lane 7) and sample DA/PNG58 produced profile P5 with four fragments of 0.6, 0.8, 2.5 and 4 kb (Fig. 1A, lane 8). Digestion of the 15 RCA-positive PNG D. alata samples using SphI produced a single 8 kb band in 14 of the accessions, while SphI digestion of RCA-amplified DNA from accession DA/PNG58 produced two bands of 1 and 6 kb (Fig. 1B, lane 8).
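As a rough arithmetic aid when reading profiles P1-P5, the estimated EcoRI fragment sizes of each profile can be summed and set against the ~8 kb full-length band obtained with SphI. The sketch below only performs that summation; the dictionary and function names are illustrative, the sizes are the gel estimates quoted above, and no interpretation of totals that differ from 8 kb is implied.

```python
# Estimated EcoRI fragment sizes (kb) for each profile, as listed in the text.
ECORI_PROFILES_KB = {
    "P1": [0.3, 0.5, 0.6, 0.8, 1.0, 1.6, 3.0],
    "P2": [0.6, 0.8, 1.1, 1.6, 2.5, 3.0, 4.0],
    "P3": [0.5, 0.8, 1.0, 1.6, 2.3, 3.0],
    "P4": [1.4, 2.5, 4.0],
    "P5": [0.6, 0.8, 2.5, 4.0],
}


def profile_totals(profiles: dict) -> dict:
    """Return the summed fragment size (kb, one decimal place) for each profile."""
    return {name: round(sum(fragments), 1) for name, fragments in profiles.items()}


for name, total_kb in profile_totals(ECORI_PROFILES_KB).items():
    print(f"{name}: {len(ECORI_PROFILES_KB[name])} fragments, {total_kb} kb in total")
```

Because the sizes are read from agarose gels, the totals carry the same uncertainty as the individual estimates and should be treated as approximate.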

Figure 1 (A) EcoRI and (B) SphI restriction analysis of PNG D. alata RCA-positive samples. Lanes 1-8 are DA/PNG20, DA/PNG22, DA/PNG07, DA/PNG45, DA/PNG12, DA/PNG14, DA/PNG17 and DA/PNG58. Lane 9 is a known negative sample (DA/NGA01) and lane 10 is a no-template control. M – GeneRuler 1 kb DNA ladder (ThermoFisher Scientific).

Figure 2 (A) EcoRI and (B) SphI restriction analysis of Vanuatu (VUT) and Tonga (TON) RCA-positive samples. Lanes 1-12 are samples DA/VUT12, DA/VUT38, DB/VUT01, DB/VUT02, DB/VUT04, DB/VUT07, DE/VUT01, DE/VUT02, DE/TON02, DE/TON03, DTV/VUT01 and DTF/VUT01. Lane 13 is a known negative sample (DA/NGA01) and lane 14 is a no-template control. M – GeneRuler 1 kb DNA ladder (ThermoFisher Scientific). DA – D. alata, DB – D. bulbifera, DE – D. esculenta, DTV – D. transversa and DTF – D. trifida.

The two D. alata samples from Vanuatu that tested positive also produced distinct restriction profiles following digestion using EcoRI, with sample DA/VUT12 producing four fragments of 1.0, 1.2, 1.7 and 3.3 kb (Fig. 2A, lane 1), while sample DA/VUT38 produced five fragments of 1.0, 1.2, 1.7, 2.7 and 3.5 kb (Fig. 2A, lane 2). Digestion of RCA-amplified DNA from accession DA/VUT12 using SphI produced a single 8 kb band (Fig. 2B, lane 1), while in accession DA/VUT38 SphI digestion produced three bands of 2.5, 3.5 and 5 kb (Fig. 2B, lane 2).

Of the six D. bulbifera accessions tested, all five from Vanuatu produced restriction profiles indicative of badnavirus infection, producing six to seven fragments of up to 4 kb following digestion with EcoRI (Fig. 2A, lanes 3-6) or a single putative full-length digest fragment using SphI (Fig. 2B, lanes 3-6), while the single accession tested from Samoa produced no visible digest fragments. Two distinct restriction profiles were observed following EcoRI digestion of the RCA-amplified DNA from D. bulbifera accessions from Vanuatu. Three samples (DB/VUT01, 02 and 03) produced a profile with seven fragments of 0.9, 1.0, 1.1, 1.2, 1.7, 3.0 and 3.4 kb (Fig. 2A, lanes 3-4), while digestion of two samples (DB/VUT04 and 07) produced six fragments of 0.9, 1.0, 1.2, 1.7, 3.4 and 4.0 kb (Fig. 2A, lanes 5-6).

Of the 31 D. esculenta accessions tested, 11 showed profiles indicative of badnavirus infection, including two samples from Tonga and all nine samples tested from Vanuatu, while none of the 17, two and one accessions from Fiji, PNG and Samoa, respectively, produced any visible digest fragments. EcoRI digestion of RCA-amplified DNA from the nine positive D. esculenta samples from Vanuatu (DE/VUT01-09) (represented in Fig. 2A, lanes 7-8) and the two positive D. esculenta samples from Tonga (DE/TON02-03) (Fig. 2A, lanes 9-10) resulted in the same restriction profile as previously observed in accession DA/VUT12 (Fig. 2A, lane 1), while digestion with SphI produced a single 8 kb fragment for all 11 positive D. esculenta samples (represented in Fig. 2B, lanes 7-8 and Fig. 2B, lanes 9-10, respectively).

When RCA-amplified DNA from the single D. transversa accession (DTV/VUT01) was digested using EcoRI (Fig. 2A, lane 11), the profile matched that of the Vanuatu D. bulbifera samples DB/VUT04 and 07 (Fig. 2A, lanes 5 and 6). Similarly, the restriction profile of the EcoRI-digested RCA-amplified DNA from the single D. trifida accession (DTF/VUT01) matched that of the D. alata accession DA/VUT12 as well as the D. esculenta accessions from Vanuatu and Tonga described previously (Fig. 2A, lane 12).

The SphI-digested RCA products of the 35 previously uncharacterized samples were all cloned and sequenced using M13F/R primers, and subsequently using BadnaFP/RP primers. BLASTn analysis of the partial RT/RNase H sequences obtained following sequencing with the BadnaFP/RP primers revealed that all 15 sequences originating from PNG showed 89-100% nucleotide identity with DBALV2 isolate PNG10 (GenBank accession KY827395), while the 18 sequences from Vanuatu and two sequences from Tonga showed 92-100% nucleotide identity with DBALV isolate 2ALa (GenBank accession KX008571).

The complete genome sequences of DBALV were subsequently obtained using a primer-walking approach from D. alata accession DA/VUT12, D. transversa accession DTV/VUT01 and D. trifida accession DTF/VUT01 from Vanuatu, as well as D. esculenta sample DE/TON02 from Tonga. The complete genomes of these four virus isolates comprised 7531, 7503, 7579 and 7390 bp, respectively, and the complete annotated sequences were submitted to GenBank under accession numbers MH404165-MH404168. In order to identify the sequences of all templates amplified using RCA, the remainder of the samples were analysed using an RCA-NGS approach and screened for circular DNA viruses.

Next generation sequencing of Pacific DBV isolates

Thirty-five out of 224 accessions produced RCA restriction profiles indicative of the presence of badnavirus. Sixteen of these 35 samples, including ten D. alata samples from PNG and six samples from Vanuatu (one D. alata, two D. esculenta and three D. bulbifera), were selected based on differences in EcoRI restriction profiles, partial RT/RNase H-coding nucleotide sequences, host species or country of origin, and their undigested RCA products were sequenced using NGS. In addition, undigested RCA products from a further 98 samples that did not produce any visible restriction profiles following digestion with EcoRI and SphI were purified, pooled by country and/or host plant species, and also sent for library preparation and NGS. These 98 samples were pooled into 13 libraries: six from Fiji (four D. alata and two D. esculenta) and two each from Vanuatu (D. alata), New Caledonia (D. alata) and FSM (D. alata), with each of these libraries consisting of eight pooled samples, plus one library consisting of two pooled D. esculenta samples from PNG. Raw NGS paired-end reads were quality-corrected, reads derived from host genomic DNA were subtracted, and the remaining reads were assembled de novo.
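As a quick consistency check on the pooling scheme, the library and sample counts stated above can be tallied. This is only an illustrative sketch: the variable names are hypothetical, and it assumes that the pool size of eight applies to every library except the single PNG library.

```python
# Number of pooled libraries per country, as stated in the text.
full_size_libraries = {
    "Fiji": 6,            # four D. alata + two D. esculenta libraries
    "Vanuatu": 2,         # D. alata
    "New Caledonia": 2,   # D. alata
    "FSM": 2,             # D. alata
}
SAMPLES_PER_FULL_LIBRARY = 8   # assumption: applies to all libraries above
PNG_LIBRARY_SAMPLES = 2        # one library of two D. esculenta samples

n_full = sum(full_size_libraries.values())
total_libraries = n_full + 1   # add the single PNG library
total_samples = n_full * SAMPLES_PER_FULL_LIBRARY + PNG_LIBRARY_SAMPLES

print(total_libraries, total_samples)
```

Under these assumptions the tally gives 13 libraries and 98 samples, matching the numbers reported in the text.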

Using the NGS data from the sixteen individual samples (10 from PNG and six from Vanuatu), 16 complete badnavirus genomes were assembled. All of the complete genomes generated from PNG samples (Table 2) were most similar to DBALV2, while all of the complete genomes assembled from Vanuatu samples (Table 3) were most similar to DBALV. The 16 complete genomes were annotated and submitted to GenBank under the accessions MH404155-MH404164 and MH404169-MH404174.


The NGS data obtained from the 13 pooled libraries resulted in contigs with no significant homology to viral sequences, consistent with the RCA restriction analysis.

Analysis of DBALV and DBALV2 complete genomes

A total of 20 complete genome sequences were obtained using either NGS or Sanger sequencing of RCA products from 20 individual accessions. Sixteen of the genomes were generated using NGS, of which 10 were DBALV2 originating from PNG and the remaining six were DBALV originating from Vanuatu. The four genomes generated using Sanger sequencing were all DBALV, including three samples originating from Vanuatu and one sample from Tonga. Consistent with the RCA restriction analysis (Figs. 1 and 2), all of the complete genomes contained between three and seven EcoRI sites and a single SphI site, except for samples DA/PNG58 and DA/VUT38, which had two SphI sites.
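Because badnavirus genomes are circular, an enzyme that cuts n times yields n linear fragments, and a single cutter yields one full-length fragment. The sketch below illustrates this on a short made-up circular sequence (not a real DBV genome). The function names are hypothetical; only the EcoRI (GAATTC) and SphI (GCATGC) recognition sequences are standard facts. The exact cut position within each site is ignored, which does not change fragment sizes because the offset is the same at every site.

```python
ECORI = "GAATTC"
SPHI = "GCATGC"

def site_positions(seq, site):
    """0-based start positions of `site` in a circular sequence."""
    n = len(seq)
    # Append the first few bases so that a site spanning the origin is found.
    doubled = seq + seq[: len(site) - 1]
    return [i for i in range(n) if doubled.startswith(site, i)]

def circular_fragments(seq, site):
    """Sorted fragment lengths after complete digestion of a circular molecule."""
    n = len(seq)
    cuts = site_positions(seq, site)
    if not cuts:
        return []  # uncut circle: no linear fragments are released
    lengths = []
    for k, cut in enumerate(cuts):
        nxt = cuts[(k + 1) % len(cuts)]
        # Distance to the next site, wrapping around the origin;
        # a single site gives one fragment of full length n.
        lengths.append((nxt - cut) % n or n)
    return sorted(lengths)

# Hypothetical 42-nt circular molecule with two EcoRI sites and one SphI site.
toy = ECORI + "A" * 10 + SPHI + "A" * 8 + ECORI + "A" * 6

print(circular_fragments(toy, ECORI))  # two fragments
print(circular_fragments(toy, SPHI))   # one full-length fragment
```

On a real genome the same logic explains why a single SphI site gives one band of genome length, whereas two SphI sites give two bands that together sum to the genome length.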

Dioscorea bacilliform AL virus (DBALV) from the Pacific

Using NGS, six complete DBALV genomes from Vanuatu were generated, including one from D. alata, three from D. bulbifera and two from D. esculenta, while four complete genomes were generated using Sanger sequencing, including one each from D. alata, D. transversa and D. trifida from Vanuatu as well as one complete genome from D. esculenta originating from Tonga. The complete genomes ranged from 7390 to 7579 bp in length, with GC contents varying between 42.8 and 43.7%. All contained three ORFs typical of badnaviruses: ORF 1 was 432 bp in length, ORF 2 was 378 bp in length, and ORF 3 ranged from 5670 to 5736 bp in length (Table 3).


Table 2 Genomic features of DBALV2 isolates obtained from sequencing of RCA products

Each ORF entry gives length (bp); start-stop position (frame); codon use.

Virus isolate      Genome length, bp (% G + C)   ORF 1                         ORF 2                          ORF 3
DBALV2-PNG07_DA    7843 (42.1)                   429; 773-1201 (+2); ATG-TGA   396; 1198-1593 (+1); ATG-TGA   5811; 1590-7400 (+3); ATG-TGA
DBALV2-PNG12_DA    7860 (42.8)                   447; 754-1200 (+1); CTG-TGA   396; 1197-1592 (+3); ATG-TAA   5844; 1592-7435 (+2); ATG-TAA
DBALV2-PNG14_DA    7862 (42.7)                   447; 754-1200 (+1); CTG-TGA   396; 1197-1592 (+3); ATG-TAA   5847; 1592-7438 (+2); ATG-TAA
DBALV2-PNG20_DA    7879 (41.0)                   429; 774-1202 (+3); ATG-TGA   399; 1199-1597 (+2); ATG-TGA   5856; 1594-7449 (+1); ATG-TGA
DBALV2-PNG22_DA    7877 (41.0)                   429; 775-1203 (+1); ATG-TGA   399; 1200-1598 (+3); ATG-TGA   5853; 1595-7447 (+2); ATG-TAA
DBALV2-PNG23_DA    7863 (42.7)                   447; 754-1200 (+1); CTG-TGA   396; 1197-1592 (+3); ATG-TAA   5847; 1592-7438 (+2); ATG-TAA
DBALV2-PNG45_DA    7849 (42.7)                   441; 760-1200 (+1); CTG-TGA   396; 1197-1592 (+3); ATG-TAA   5838; 1592-7429 (+2); ATG-TAA
DBALV2-PNG51_DA    7876 (41.0)                   429; 774-1202 (+3); ATG-TGA   399; 1199-1597 (+2); ATG-TGA   5853; 1594-7446 (+1); ATG-TAA
DBALV2-PNG58_DA    7876 (41.0)                   429; 774-1202 (+3); ATG-TGA   399; 1199-1597 (+2); ATG-TGA   5853; 1594-7446 (+1); ATG-TAA
DBALV2-PNG59_DA    7860 (42.8)                   447; 754-1200 (+1); CTG-TGA   396; 1197-1592 (+3); ATG-TAA   5844; 1592-7435 (+2); ATG-TAA


Table 3 Genomic features of DBALV isolates obtained from sequencing of RCA products

Each ORF entry gives length (bp); start-stop position (frame); codon use.

Virus isolate      Genome length, bp (% G + C)   ORF 1                         ORF 2                          ORF 3
DBALV-TON02_DE*    7390 (43.1)                   432; 447-878 (+3); ATG-TGA    378; 875-1252 (+2); ATG-TAA    5700; 1252-6951 (+1); ATG-TAA
DBALV-VUT12_DA*    7531 (43.7)                   432; 589-1020 (+1); ATG-TGA   378; 1017-1394 (+3); ATG-TAA   5700; 1394-7093 (+2); ATG-TAA
DBALV-VUT38_DA     7523 (43.1)                   432; 576-1007 (+2); ATG-TGA   378; 1004-1381 (+1); ATG-TAA   5670; 1381-7050 (+3); ATG-TAA
DBALV-VUT01_DB     7512 (42.8)                   432; 571-1002 (+2); ATG-TGA   378; 999-1376 (+1); ATG-TAA    5688; 1376-7063 (+3); ATG-TGA
DBALV-VUT02_DB     7515 (42.8)                   432; 574-1005 (+1); ATG-TGA   378; 1002-1379 (+3); ATG-TAA   5688; 1379-7066 (+2); ATG-TGA
DBALV-VUT04_DB     7502 (43.0)                   432; 576-1007 (+2); ATG-TGA   378; 1004-1381 (+1); ATG-TAA   5676; 1381-7056 (+3); ATG-TGA
DBALV-VUT01_DE     7520 (43.0)                   432; 577-1008 (+1); ATG-TGA   378; 1005-1382 (+3); ATG-TAA   5700; 1382-7081 (+2); ATG-TAA
DBALV-VUT04_DE     7401 (43.1)                   432; 476-907 (+2); ATG-TGA    378; 904-1281 (+1); ATG-TAA    5682; 1281-6962 (+3); ATG-TAA
DBALV-VUT01_DTF*   7579 (43.0)                   432; 600-1031 (+3); ATG-TGA   378; 1028-1405 (+2); ATG-TAA   5736; 1405-7140 (+1); ATG-TAA
DBALV-VUT01_DTV*   7502 (42.8)                   432; 583-1014 (+1); ATG-TGA   378; 1011-1388 (+3); ATG-TAA   5676; 1388-7063 (+2); ATG-TGA

*Sequence was obtained with Sanger sequencing.
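The ORF lengths in the table follow directly from the inclusive start and stop coordinates. The sketch below recomputes them for two rows of Table 3 and also derives a reading frame. The function names are hypothetical, and the frame convention used here, (start − 1) mod 3 + 1 counted from genome position 1, is an assumption that agrees with the frames listed for these two rows but is not stated in the text.

```python
def orf_length(start, stop):
    """Length in bp of an ORF given inclusive 1-based coordinates."""
    return stop - start + 1

def reading_frame(start):
    """Frame (1, 2 or 3) of a 1-based start position (assumed convention)."""
    return (start - 1) % 3 + 1

# (start, stop) pairs for ORF 1, ORF 2 and ORF 3, copied from Table 3.
table3_rows = {
    "DBALV-TON02_DE": [(447, 878), (875, 1252), (1252, 6951)],
    "DBALV-VUT12_DA": [(589, 1020), (1017, 1394), (1394, 7093)],
}

for isolate, orfs in table3_rows.items():
    for number, (start, stop) in enumerate(orfs, start=1):
        length = orf_length(start, stop)
        # A complete ORF should be a whole number of codons.
        assert length % 3 == 0
        print(isolate, f"ORF {number}", length, f"(+{reading_frame(start)})")
```

For both rows this gives lengths of 432, 378 and 5700 bp, as listed in the table.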


The single intergenic region (IR), varying from 885 to 1038 bp in length, contained several conserved motifs typical of plant dsDNA viruses (Benfey & Chua, 1990; Medberry & Olszewski, 1993). The putative tRNAmet binding site (5′-TGGTATCAGAGCTCGGTT-3′), which has 88.9% nucleotide identity to the plant tRNAmet consensus sequence (3′-ACCAUAGUCUCGGUCCAA-5′) and has been described as the priming site for reverse transcription, was designated as the origin of the circular genomes, consistent with the convention used for other badnaviruses. In addition, a TATA-box (TATATAA) and a polyadenylation signal, analogous to those of the 35S promoter of cauliflower mosaic virus (CaMV), were identified in the region 5′ of the tRNAmet site in all genomes.
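The 88.9% figure can be reproduced by pairing the binding site, written 5′ to 3′, position by position against the consensus, written 3′ to 5′. The sketch below is a minimal illustration under that reading of the comparison; the helper and variable names are hypothetical.

```python
# Watson-Crick partner of each DNA base, written as RNA.
DNA_TO_RNA_PARTNER = {"A": "U", "C": "G", "G": "C", "T": "A"}

def pairing_partner(dna):
    """RNA strand that would pair perfectly with `dna`, position by position."""
    return "".join(DNA_TO_RNA_PARTNER[base] for base in dna)

binding_site = "TGGTATCAGAGCTCGGTT"    # viral site, 5'->3' (from the text)
trna_consensus = "ACCAUAGUCUCGGUCCAA"  # plant tRNAmet consensus, 3'->5'

ideal_partner = pairing_partner(binding_site)
matches = sum(a == b for a, b in zip(ideal_partner, trna_consensus))
identity = 100 * matches / len(trna_consensus)

print(matches, len(trna_consensus), round(identity, 1))
```

Sixteen of the 18 positions are complementary, which corresponds to the 88.9% reported; the two mismatches fall at positions 13 and 14 of the site.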

BLASTn analysis of the full-length genome sequences confirmed that all had highest nucleotide identity to DBALV (GenBank accessions KX008571, KX008572 and KX008573), with nucleotide sequence identity ranging from 89.2 to 90%. BLASTn analysis using the partial RT/RNase H-coding region of each sequence revealed 97.5 to 99.8% nucleotide identity to partial RT/RNase H-coding sequences generated previously from Vanuatu yams (GenBank accessions AM072705 to AM072708). The partial RT/RNase H-coding sequence delineated by the BadnaFP/RP primers was identified in silico and used to carry out pairwise sequence comparison (PASC) and phylogenetic comparison amongst the sequences and with other published DBALV sequences. PASC revealed that DBALV sequences from Vanuatu and Tonga shared 90.5 to 100% nucleotide identity with one another and 85 to 94% with previously published isolates from Africa (Fig. 3A). Phylogenetic analysis using the partial RT/RNase H-coding region revealed that the DBALV isolates from Africa were ancestral to the isolates from the Pacific (Fig. 3B).
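To make the pairwise identity values concrete, the sketch below computes percent identity between two sequences that have already been aligned. It is not the SDT implementation: the function name is hypothetical, the two example sequences are made up, and skipping every column in which either sequence has a gap is an assumption of this sketch.

```python
def pairwise_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gap columns are excluded from the comparison
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared

# Made-up aligned fragments, for illustration only.
seq_a = "ACGT-ACGT"
seq_b = "ACGTTACGA"
print(pairwise_identity(seq_a, seq_b))
```

Here eight columns are compared and seven match, giving 87.5%.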


Figure 3 PASC and phylogenetic analysis using partial RT/RNase H-coding nucleotide sequences showing the relationships of DBALV isolates from this study with previously published complete DBALV sequences. (A) Pairwise nucleotide sequence identities were determined using the Sequence Demarcation Tool (SDTv1.2) (Muhire et al., 2014). (B) Maximum-Likelihood phylogenetic tree constructed with the Kimura-2-parameter model, using DBALV2-PNG10 (KY827395) as the outgroup. VUT-Vanuatu, TON-Tonga, PNG-Papua New Guinea, while KX008571, KX008572 and KX008595 are from Africa.
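The Kimura two-parameter model named in the caption treats transitions and transversions separately. As a hedged illustration, the sketch below computes the standard pairwise Kimura two-parameter distance, d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. It does not reproduce the maximum-likelihood tree search itself; the function name and example sequences are hypothetical.

```python
import math

PURINES = {"A", "G"}

def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two pre-aligned sequences."""
    transitions = 0
    transversions = 0
    compared = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if x == y:
            continue
        if (x in PURINES) == (y in PURINES):
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * q <= 0 or 1 - 2 * p - q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))

# Made-up sequences differing by a single C->T transition in 10 positions.
print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

With one transition in ten sites (P = 0.1, Q = 0) the corrected distance is slightly larger than the raw proportion of differences, as expected for a model that accounts for multiple substitutions.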


Dioscorea bacilliform AL virus 2 (DBALV2)

RCA combined with NGS was used to generate 10 complete genome sequences of PNG badnaviruses derived from D. alata. The complete genomes ranged from 7860 to 7879 bp in length and had a GC content of 41 to 42.8%. Consistent with DBALV2 isolate PNG10, the sequences contained three ORFs, with ORF 1 varying in length from 429 to 447 bp, ORF 2 from 396 to 399 bp and ORF 3 from 5811 to 5856 bp (Table 2). Conserved motifs including the tRNAmet site (5′-TGGTATCAGAGCKYGGTT-3′, underlined nucleotides were not conserved in all sequences), TATA-boxes (TATATAA) and polyadenylation signals were identified in the IR, similar to those present in the DBALV sequences.
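The tRNAmet motif above is written with the IUPAC ambiguity codes K (G or T) and Y (C or T). A minimal sketch of how such a degenerate motif can be expanded into a regular expression and matched against a concrete site is shown below; the helper name is hypothetical, and the mapping is the standard IUPAC nucleotide code table.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

def motif_to_regex(motif):
    """Compile a degenerate DNA motif into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))

dbalv2_motif = motif_to_regex("TGGTATCAGAGCKYGGTT")

# The DBALV tRNAmet site from the text (T at the K position, C at the Y position).
print(bool(dbalv2_motif.fullmatch("TGGTATCAGAGCTCGGTT")))
# A made-up site with A at the K position, which K does not allow.
print(bool(dbalv2_motif.fullmatch("TGGTATCAGAGCACGGTT")))
```

The degenerate motif therefore covers four concrete sequences, one of which is the site reported for the DBALV genomes.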

BLASTn analysis of the full genome sequences revealed a sequence identity of 87 to 89.1% to the previously published DBALV2 isolate PNG10 (GenBank accession KY827395) originating from PNG, while BLASTn analysis using the partial RT/RNase H-coding region nucleotide sequences revealed 90.7 to 97.5% nucleotide identity to partial RT/RNase H sequences generated previously from PNG D. alata (GenBank accessions AM072674, AM072683 and AM072685). PASC and phylogenetic analysis of DBALV2 was carried out using the partial RT/RNase H-coding sequences as described earlier. PASC between the DBALV2 sequences identified in this study, together with the previously published complete DBALV2-PNG10 sequence from PNG, revealed 88.8-100% nucleotide identity between the sequences (Fig. 4A), with the DBALV2-PNG10 sequence appearing to be ancestral to the 10 sequences characterized in this study (Fig. 4B).


Figure 4 PASC and phylogenetic analysis using partial RT/RNase H-coding nucleotide sequences showing the relationships of DBALV2 isolates from this study with previously published complete DBALV2 sequences. (A) Pairwise nucleotide sequence identities were determined using the Sequence Demarcation Tool (SDTv1.2) (Muhire et al., 2014). (B) Maximum-Likelihood phylogenetic tree constructed with the Kimura-2-parameter model, using DBALV-[2ALa] (KX008571) as the outgroup. PNG-Papua New Guinea, while KX008571 is from Africa.


Discussion

In this study, we have characterized the sequence diversity of DBVs present in the Pacific yam germplasm collections held in trust by SPC-CePaCT. A subset of 224 accessions from the collection was screened with RCA, with subsequent restriction analysis indicating the presence of episomal badnavirus in 35 samples. Based on differences in EcoRI restriction profiles, partial RT/RNase H-coding nucleotide sequences, host species or the country of origin, a total of 20 complete DBALV and DBALV2 genome sequences were obtained from five different yam species originating from PNG, Tonga and Vanuatu.

Based on this and other studies (Kenyon et al., 2008), DBALV appears to be the most prevalent badnavirus in Pacific yam germplasm, with the virus identified in five yam species (D. alata, D. bulbifera, D. esculenta, D. transversa and D. trifida) from Vanuatu as well as D. esculenta from Tonga. Using a PCR-based approach, Kenyon et al. (2008) reported the presence of DBALV in Vanuatu and Solomon Islands. Due to the unavailability of samples, the occurrence of DBALV in Solomon Islands could not be confirmed in the present study. Whereas a very low prevalence (3.4%) of DBALV was found in D. alata accessions from Vanuatu, all D. esculenta and D. bulbifera samples tested were found to be infected with the virus. The low prevalence of DBALV in Vanuatu D. alata could be due to the fact that part of the collection tested herein (31/73) was sent to SPC-CePaCT from the Centre de Coopération Internationale en Recherche Agronomique pour le Développement (CIRAD) following virus screening, including testing for DBV, and only the virus-negative accessions were sent to SPC-CePaCT to be included in the yam collection (Filloux et al., 2006). In the current study, the accessions from CIRAD were also found to be DBV negative. Further, the previously untested Vanuatu D. alata accessions were also found to have a low prevalence of DBALV (2/42), which agrees with the results of Filloux et al. (2006), who also reported a low incidence of DBALV in Vanuatu D. alata, with only 4/56 accessions testing positive. Additionally, none of the samples from Fiji, FSM, PNG, New Caledonia or Samoa tested in the present study were positive for DBALV.

DBALV2 was only detected in PNG D. alata samples in the present study, with an incidence of 42.5%, which was considerably higher than the incidence of DBALV infection of D. alata from Vanuatu. Kenyon et al. (2008) also reported DBALV2 from D. alata accessions only; however, they detected DBALV2 in samples from both PNG and Vanuatu. In contrast, DBALV2 was not detected in the Vanuatu samples or in samples from the other countries included in this study.

DBESV has been previously reported and characterized from a D. esculenta accession from Fiji (DE/FJ14) (Sukal et al., 2017). Testing of additional yam samples in the present study failed to detect DBESV in any additional D. esculenta accessions, or in accessions of other yam species from either Fiji or any other country included in the study. Previously, Kenyon et al. (2008) reported DBESV-like RT/RNase H partial sequences from D. alata accessions originating from Fiji, PNG, Tonga and Vanuatu, as well as D. esculenta accessions originating from Fiji, PNG and Solomon Islands. However, no additional accessions (other than DE/FJ14 described previously) tested positive for DBESV. Therefore, as speculated by Kenyon et al. (2008), DBESV may be present as an endogenous sequence in some yam species. Since the focus of the present study was to characterize the episomal badnavirus sequences present in yam, the presence of endogenous DBESV was not determined but is worthy of future investigation.


DBRTV2 has been previously characterized from a single Samoan D. rotundata accession (Sukal et al., 2017). Bömer et al. (2016) also detected and characterized DBRTV2 from D. rotundata originating from Nigeria. Therefore, it is possible that DBRTV2 is restricted to D. rotundata. However, since the SPC-CePaCT collections do not include D. rotundata from any additional Pacific countries, the presence of DBRTV2 in countries other than Samoa could not be ascertained. Further field collections and testing should be done to determine whether DBRTV2 is more widely prevalent in the Pacific and whether it occurs in other yam species. Since D. rotundata originates from West Africa (Lebot, 2009), it could be postulated that DBRTV2 also originates from West Africa and was distributed into the Pacific with the historical introduction of D. rotundata accessions.

RCA has been successfully used to characterize badnaviruses infecting banana (James et al., 2011; Wambulwa et al., 2012; Wambulwa et al., 2013; Baranwal et al., 2014; Carnelossi et al., 2014; Javer-Higginson et al., 2014; Sharma et al., 2014, 2015), cacao (Chingandu et al., 2017a,b; Muller et al., 2018), fig (Laney et al., 2012), mulberry (Chiumenti et al., 2016), Rubus spp. (Diaz-Lara et al., 2015) and yam (Umber et al., 2014; Bömer et al., 2016, 2018; Sukal et al., 2017). RCA in combination with NGS has also been shown to be very effective in characterizing badnaviruses infecting cacao (Chingandu et al., 2017a,b; Muller et al., 2018). The present study has further demonstrated the utility of RCA-NGS for whole genome amplification and characterization of badnaviruses from yam. Although RCA-NGS can be effectively used for virus detection and characterization, the high costs currently associated with NGS make routine diagnosis by this method impractical. However, with improved specificity and the increasing number of badnavirus sequences available from yam, RCA is now an important consideration for badnavirus detection.


During restriction analysis of RCA products using EcoRI and SphI, the restriction profiles generated using EcoRI were found to vary considerably between virus isolates in the different yam species studied (Figs. 1A and 2A). Therefore, it may be necessary to use a combination of enzymes with single or multiple recognition sites within the virus genome for diagnostic purposes. The use of an enzyme with multiple recognition sites would increase the chances of detection, since genetic changes during virus replication would be unlikely to eliminate all of the sites for a single enzyme. In contrast, the use of an enzyme with a single recognition site may be significantly affected by errors arising during genome replication, since the loss of that site would result in no detection following restriction analysis of RCA products. We also found that other enzymes, such as KpnI and StuI, can complement EcoRI and SphI in identifying a DBV-positive sample (data not shown).
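The argument about single versus multiple recognition sites can be illustrated with a toy probability model. The sketch below assumes, purely for illustration, that each recognition site is lost independently with the same probability; neither the independence assumption nor the example probability comes from this study, and the function name is hypothetical.

```python
def probability_no_cut(p_site_lost, n_sites):
    """Chance that all n_sites recognition sites are lost (independent losses)."""
    if not 0.0 <= p_site_lost <= 1.0:
        raise ValueError("p_site_lost must be between 0 and 1")
    if n_sites < 1:
        raise ValueError("n_sites must be at least 1")
    return p_site_lost ** n_sites

P_LOST = 0.1  # hypothetical per-site loss probability, for illustration only

for n_sites in (1, 3, 7):
    print(n_sites, probability_no_cut(P_LOST, n_sites))
```

Under this toy model a single-site enzyme fails to linearize the genome in one case out of ten, whereas an enzyme with three or more sites almost never fails to cut at all, although the resulting banding pattern may still change.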

Restriction sites both within and between DBV species were found to be variable. For example, EcoRI restriction analysis of PNG D. alata RCA products resulted in five distinct restriction profiles (Fig. 1A, lanes 1-8). Similarly, EcoRI restriction analysis of Vanuatu D. alata, D. bulbifera, D. esculenta, D. transversa and D. trifida and Tonga D. esculenta resulted in an additional four distinct profiles. However, sequencing of the RCA products from PNG samples having the different restriction profiles revealed that they were all DBALV2, while all of the products from Tonga and Vanuatu were DBALV. In some of these profiles, the EcoRI restriction fragments total greater than 8 kb, for example DA/PNG07 and 45 (Fig. 1A, lanes 3-4), DA/VUT38 (Fig. 2A, lane 2), DB/VUT01, 02 and 03 (Fig. 2A, lanes 3-4), DB/VUT04 and 07 (Fig. 2A, lanes 5-6), DTV/VUT01 (Fig. 2A, lane 11) and DTF/VUT01 (Fig. 2A, lane 12), which may be indicative of the presence of multiple viral sequences. However, sequencing revealed that these accessions contained only single DBV infections. Possible explanations for this observation include star activity of EcoRI, the existence of virus sequence variants within an accession, co-amplification of plant genomic DNA during RCA, or partial digestion of RCA-amplified DNA. Star activity of EcoRI and/or partial digestion are unlikely, given both the use of a high-fidelity enzyme and the observation that the restriction profiles were reproducible when RCA-amplified DNA from samples was digested on different occasions. Furthermore, some of these profiles were consistent between multiple samples infected with either of the two virus species. The amplification of detectable levels of plant genomic DNA is also unlikely, as we used a directed RCA approach, which has been shown to have improved specificity towards badnaviral genome amplification in preference to host genomic DNA (Sukal et al., 2018 manuscript in preparation). Sequencing of visible digest fragments confirmed that they were all of viral origin and not derived from host plant genomic DNA. Therefore, it is likely that, in some samples, DBVs are present as a collection of sequence variants or quasispecies that produce slightly variable digest patterns with some enzymes, such as EcoRI.
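The comparison between summed EcoRI fragment sizes and the approximately 8 kb full-length product can be sketched as follows, using the fragment sizes reported in the Results for four Vanuatu samples. The dictionary and variable names are hypothetical, and the sizes are gel estimates, so the totals are approximate.

```python
# EcoRI fragment sizes in kb, as reported in the Results.
ecori_fragments_kb = {
    "DA/VUT12": [1.0, 1.2, 1.7, 3.3],
    "DA/VUT38": [1.0, 1.2, 1.7, 2.7, 3.5],
    "DB/VUT01": [0.9, 1.0, 1.1, 1.2, 1.7, 3.0, 3.4],
    "DB/VUT04": [0.9, 1.0, 1.2, 1.7, 3.4, 4.0],
}
FULL_LENGTH_KB = 8.0  # approximate size of the single SphI band

# Sum each profile, rounding to one decimal to match the gel estimates.
totals_kb = {s: round(sum(sizes), 1) for s, sizes in ecori_fragments_kb.items()}
exceeds_full_length = [s for s, total in totals_kb.items() if total > FULL_LENGTH_KB]

for sample, total in totals_kb.items():
    print(sample, total)
print("Total exceeds one genome length:", exceeds_full_length)
```

A total well above one genome length is what would be expected if fragments from more than one sequence variant are present in the same digest.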

These results suggest that while an enzyme with multiple sites, such as EcoRI, may not conclusively identify the virus species present in all cases, when combined with single or double cutting enzymes such as SphI it can be effectively used to confirm the presence of badnavirus through the observation of profiles which clearly indicate the presence of a single 8 kb fragment, or two fragments which total 8 kb. In this way, RCA will be an effective tool for determining the presence or absence of episomal DBVs in yams, which is sufficient for use in germplasm indexing. Further, the expanding knowledge of the sequence diversity of episomal and integrated badnaviruses will support the development of targeted diagnostic procedures such as PCR using virus-specific primers.

This study, together with our previous work on yam badnavirus characterization (Sukal et al., 2017; Sukal et al., 2018 manuscript in preparation), significantly increases the knowledge of yam-infecting badnaviruses in the Pacific, with 24 complete genome sequences belonging to four different species, infecting six species of yam from five countries, now available. Prior to this study, an additional Caulimoviridae member that is distantly related to badnaviruses, from Samoan D. nummularia accessions (DN/SAM01 and 02), was also identified and characterized (Sukal et al., 2018). Collectively, these studies on Pacific yams show that the dominant yam badnaviruses present in Pacific germplasm differ from those of the African region. Although some virus species, such as DBALV, are present in both regions, many of the viruses identified in the African region are not present in the Pacific and vice versa. The present study shows that some of these viruses are restricted to only one or a few countries in the Pacific and that special care must be taken to ensure that germplasm collections are thoroughly screened to prevent the dissemination of these badnavirus species to other countries. The availability of virus-tested yam germplasm is essential for the effective distribution and eventual utilization of yams for improved food and nutritional security in the Pacific.


Acknowledgements

The authors would like to thank the Centre for Pacific Crops and Trees (CePaCT) of the Pacific Community (SPC) for making their yam collections available for this project. The authors would also like to thank Dr. Michael Furlong and Dr. Grahame Jackson for their support and advice on this research. The funding for the project was provided by the Australian Centre for International Agricultural Research (#PC/2010/065). AS is a John Allwright Fellowship recipient.


References

Adams MJ, Carstens EB, 2012. Ratification vote on taxonomic proposals to the

International Committee on Taxonomy of Viruses (2012). Archives of Virology

157, 1411–22.

Adams MJ, Lefkowitz EJ, King AMQ et al., 2018. Changes to taxonomy and the

International Code of Virus Classification and Nomenclature ratified by the

International Committee on Taxonomy of Viruses (2017). Archives of Virology

162, 2505–38.

Afgan E, Sloggett C, Goonasekera N et al., 2015. Genomics virtual laboratory: A

practical bioinformatics workbench for the cloud. PLoS ONE 10, e0140829.

Atiri GI, Winter S, Alabi OJ, 2003. Yam. In: Loebenstein G, Thottappilly G, eds.

Virus and Virus-like Diseases of Major

Crops in Developing Countries. Dordrecht, Netherlands: Kluwer, 249–50.

Bankevich A, Nurk S, Antipov D et al., 2012. SPAdes: A new genome assembly

algorithm and its applications to single-cell sequencing. Journal of

Computational Biology 19, 455–77.

Baranwal VK, Sharma SK, Khurana D, Verma R, 2014. Sequence analysis of shorter

than genome length episomal banana streak OL virus like sequences isolated from

banana in India. Virus Genes 48, 120–7.

Benfey PN, Chua N-H, 1990. The Cauliflower mosaic virus 35S promoter:

combinatorial regulation of transcription in plants. Science 250, 959–66.

Bömer M, Rathnayake AI, Visendi P, Silva G, Seal SE, 2018. Complete genome

sequence of a new member of the genus Badnavirus, Dioscorea bacilliform RT

virus 3, reveals the first evidence of recombination in yam badnaviruses. Archives

of Virology 163, 533–8.


Bömer M, Turaki A, Silva G, Kumar P, Seal S, 2016. A Sequence-independent

strategy for amplification and characterization of episomal badnavirus sequences

reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

Bourke RM, Vlassak V, 2004. Estimates of food crop production in Papua New

Guinea. Australian National University: Canberra.

Bousalem M, Durand O, Scarcelli N et al., 2009. Dilemmas caused by endogenous

pararetroviruses regarding the taxonomy and diagnosis of yam (Dioscorea spp.)

badnaviruses: Analyses to support safe germplasm movement. Archives of

Virology 154, 297–314.

Carnelossi PR, Bijora T, Facco CU et al., 2014. Episomal detection of banana streak

OL virus in single and mixed infection with cucumber mosaic virus in banana

‘Nanicão Jangada’. Tropical Plant Pathology 39, 342–6.

Chingandu N, Kouakou K, Aka R et al., 2017a. The proposed new species, cacao red

vein virus, and three previously recognized badnavirus species are associated

with cacao swollen shoot disease. Virology Journal 14, 199.

Chingandu N, Zia-ur-rehman M, Sreenivasan TN et al., 2017b. Molecular

characterization of previously elusive badnaviruses associated with symptomatic

cacao in the New World. Archives of Virology 162, 1363–71.

Chiumenti M, Morelli M, De Stradis A, Elbeaino T, Stavolone L, Minafra A, 2016.

Unusual genomic features of a badnavirus infecting mulberry. Journal of General

Virology 97, 3073–87.

Cox MP, Peterson DA, Biggs PJ, 2010. SolexaQA: At-a-glance quality assessment of

Illumina second-generation sequencing data. BMC Bioinformatics 11, 485.

Diaz-Lara A, Mosier NJ, Keller KE, Martin RR, 2015. A variant of rubus yellow net

virus with altered genomic organization. Virus Genes 50, 104–10.


Eni AO, Hughes JDA, Asiedu R, Rey MEC, 2008a. Sequence diversity among

badnavirus isolates infecting yam (Dioscorea spp.) in Ghana, Togo, Benin and

Nigeria. Archives of Virology 153, 2263–72.

Eni AO, Hughes JDA, Rey MEC, 2008b. Survey of the incidence and distribution of

five viruses infecting yams in the major yam-producing zones in Benin. Annals

of Applied Biology 153, 223–32.

FAOSTAT, 2018. Production Statistics (FAOSTAT). Rome, Italy: Food and

Agriculture Organization of the United Nations.

Filloux D, Girard J, Bgpi UMR, Ta K, Baillarguet CI De, 2006. Indexing and

elimination of viruses infecting yams (Dioscorea spp.) for the safe movement of

germplasm. In: Proc. 14th Triennial Symp. Trivandrum, India: ISTRC.

Geering ADW, Hull R, 2012. Family Caulimoviridae. In: King AMQ, Adams MJ,

Carstens EB, Lefkowitz EJ, eds. Virus Taxonomy: Classification and

Nomenclature of Viruses - Ninth Report of the International Committee on

Taxonomy of Viruses. Oxford, UK: Elsevier Academic Press, 429–43.

James AP, Geijskes RJ, Dale JL, Harding RM, 2011. Molecular characterization of six

badnavirus species associated with leaf streak disease of banana in East Africa.

Annals of Applied Biology 158, 346–53.

Javer-Higginson E, Acina-Mambole I, González JE et al., 2014. Occurrence,

prevalence and molecular diversity of banana streak viruses in Cuba. European

Journal of Plant Pathology 138, 157–66.

Kearse M, Moir R, Wilson A et al., 2012. Geneious Basic: An integrated and

extendable desktop software platform for the organization and analysis of

sequence data. Bioinformatics 28, 1647–9.

Kenyon L, Lebas BSM, Seal SE, 2008. Yams (Dioscorea spp.) from the South Pacific


Islands contain many novel badnaviruses: implications for international

movement of yam germplasm. Archives of Virology 153, 877–89.

Kenyon L, Shoyinka SA, Hughes Jd'A, Odu BO, 2001. An overview of viruses

infecting Dioscorea yams in sub-Saharan Africa. In: Plant Virology in Sub-

Saharan Africa. Proceedings of a Conference Organized by IITA. Ibadan,

Nigeria: IITA.

Kleinow T, Nischang M, Beck A et al., 2009. Three C-terminal phosphorylation sites

in the Abutilon mosaic virus movement protein affect symptom development and

viral DNA accumulation. Virology 390, 89–101.

Kumar S, Stecher G, Tamura K, 2016. MEGA7: molecular evolutionary genetics

analysis version 7.0 for bigger datasets. Molecular Biology and Evolution 33,

1870–4.

Laney AG, Hassan M, Tzanetakis IE, 2012. An integrated badnavirus is prevalent in

fig germplasm. Phytopathology 102, 1182–9.

Larkin MA, Blackshields G, Brown NP et al., 2007. Clustal W and Clustal X version

2.0. Bioinformatics 23, 2947–8.

Lebot V, 2009. Tropical Root and Tuber Crops: Cassava, Sweet Potato, Yams and

Aroids. France: Paris University.

Medberry SL, Olszewski NE, 1993. Identification of cis elements involved in

Commelina yellow mottle virus promoter activity. The Plant Journal 3, 619–26.

Muhire BM, Varsani A, Martin DP, 2014. SDT: A virus classification tool based on

pairwise sequence alignment and identity calculation. PLoS ONE 9, e108277.

Muller E, Ravel S, Agret C et al., 2018. Next generation sequencing elucidates cacao

badnavirus diversity and reveals the existence of more than ten viral species.

Virus Research 244, 235–51.


Odu BO, Hughes JDAA, Asiedu R, Ng NQ, Shoyinka SA, Oladiran OA, 2004.

Responses of white yam (Dioscorea rotundata) cultivars to inoculation with three

viruses. Plant Pathology 53, 141–7.

Orkwor GC, 1998. The importance of yams. In: Orkwor GC, Asiedu R, Ekanayake

IJ, eds. Food Yams. Advances in Research. Nigeria: IITA (Ibadan) and NRCRI

(Umudike), 1–26.

Phillips S, Briddon RW, Brunt AA, Hull R, 1999. The partial characterization of a

badnavirus infecting the greater Asiatic or water yam (Dioscorea alata). Journal

of Phytopathology 147, 265–9.

Seal S, Turaki A, Muller E et al., 2014. The prevalence of badnaviruses in West

African yams (Dioscorea cayenensis-rotundata) and evidence of endogenous

pararetrovirus sequences in their genomes. Virus Research 186, 144–54.

Sharma SK, Vignesh Kumar P, Geetanjali AS, Pun KB, Baranwal VK, 2015.

Subpopulation level variation of banana streak viruses in India and common

evolution of banana and sugarcane badnaviruses. Virus Genes 50, 450–65.

Sharma SK, Vignesh Kumar P, Poswal R et al., 2014. Occurrence and distribution of

banana streak disease and standardization of a reliable detection procedure for

routine indexing of banana streak viruses in India. Scientia Horticulturae 179,

277–83.

Sukal A, Kidanemariam D, Dale J, James A, Harding R, 2017. Characterization of

badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel

species and the first report of Dioscorea bacilliform RT virus 2. Virus Research

238, 29–34.

Sukal A, Kidanemariam D, Harding R, Dale J, James A, 2018. An improved

degenerate-primed rolling circle amplification and next-generation sequencing


approach for the detection and characterization of badnaviruses. Manuscript in

preparation.

Sukal A, Kidanemariam D, Harding R, Dale J, James A, 2018. Characterization of a

novel member of the family Caulimoviridae infecting Dioscorea nummularia in

the Pacific, which may represent a new genus of dsDNA plant viruses. PLoS ONE

13, 1–12.

Sukal AC, Taylor M, Tuia VS, 2015. Viruses and their impact on the utilization of

plant genetic resources in the Pacific. Acta Horticulturae 1101, 127–32.

Tamiru M, Natsume S, Takagi H et al., 2017. Genome sequencing of the staple food

crop white Guinea yam enables the development of a molecular marker for sex

determination. BMC Biology 15, 86.

Umber M, Filloux D, Muller E et al., 2014. The genome of African yam (Dioscorea

cayenensis-rotundata complex) hosts endogenous sequences from four distinct

badnavirus species. Molecular Plant Pathology 15, 790–801.

Wambulwa MC, 2012. Rolling circle amplification is more sensitive than PCR and

serology-based methods in detection of banana streak virus in Musa germplasm.

American Journal of Plant Sciences 03, 1581–7.

Wambulwa MC, Wachira FN, Karanja LS, Kiarie SM, Muturi SM, 2013. The

influence of host and pathogen genotypes on symptom severity in banana streak

disease. African Journal of Biotechnology 12, 27–31.

Yang IC, Hafner GJ, Dale JL, Harding RM, 2003. Genomic characterization of taro

bacilliform virus. Archives of Virology 148, 937–49.


Chapter 7

General Discussion

Food production per capita has decreased in nearly all the Pacific Island countries (PICs) over the last decade, even in countries with little population growth (SPC and CSIRO, 2011). With the imminent threat to food security from climate change, agricultural production needs to be increased to maintain food security in the PICs. Further, agricultural production is an important source of employment and income for the PICs. Yam (mainly Dioscorea alata and D. esculenta) is one of the major staples and ranks among the top root crops along with cassava (Manihot esculenta) and aroids (Colocasia spp. and Xanthosoma spp.) (Elevitch and Love, 2011; SPC and CSIRO, 2011). In the PICs, yam is grown together with other Pacific staple crops, such as taro, coconut, breadfruit, banana and mango, to ensure adequate and reliable production as part of the coping mechanisms that accommodate the high climate variability in the region (Thaman, 2007). Wild yams (Dioscorea spp.) play an important role during adverse climatic events, such as drought and cyclones, when other staples are in short supply (Risimeri, 2001). Yam is mainly consumed locally where it is grown, but it has become an export commodity for some PICs, such as Fiji and Tonga, and has the potential to be developed further. Despite this potential, the development of yam has remained slow, mainly due to the low genetic diversity of cultivars in some of the PICs. This constraint restricts efforts to select for desired traits such as quality, shape and size of tubers, which are conducive for supply to export markets, as well as nutritional benefits and resistance to pests and diseases. The lack of genetic diversity, coupled with the rare and often male-dominated flowering of the Pacific yam (D. alata), also prevents crop improvement through plant breeding activities. Access to yam genetic resources from both within and outside the Pacific region is vital to provide the much-needed genetic base to support further improvement of Pacific yams and to support effective utilization. The raw material to drive the desired improvement of yams in the Pacific exists within globally recognized genetic repositories, namely SPC-CePaCT (Suva, Fiji) and IITA (Nigeria), as accessions that are maintained in tissue culture and available for exchange under the Multilateral System of the International Treaty on Plant Genetic Resources for Food and Agriculture (ITPGRFA). However, the exchange of yam germplasm has remained limited, mainly due to the unavailability of diagnostic protocols to test germplasm collections for viruses, particularly badnaviruses.

Over the last decade, significant efforts have been directed towards improving the understanding of badnaviruses infecting yams and improving diagnostic assays for their detection. However, knowledge of the diversity of badnaviruses in the Pacific has been limited to the work done by Kenyon et al. (2008). This group used a PCR-based technique to characterize the diversity of yam badnaviruses, generally referred to as Dioscorea bacilliform viruses or DBVs, present in the Pacific and reported 11 putative badnavirus species (K1-K11) of DBVs in yam accessions. However, since PCR-based testing can amplify integrated sequences, the episomal nature of these 11 groups was questionable. Kenyon et al. (2008) also identified two sequence groups (K12-K13) that clustered distantly from other DBV sequence groups and outside of, but placed between, members of the genera Badnavirus and Tungrovirus. This raised further questions as to whether additional viral groups distantly related to the genera Badnavirus and Tungrovirus were prevalent in Pacific yams. The results presented in this thesis resolve some of the questions left unanswered by previous studies on yam badnaviruses in the Pacific. Further, studies on the diversity of DBVs prevalent in Pacific yam germplasm conserved at SPC-CePaCT and the development of protocols to support the testing of yam germplasm for DBVs were done to enable the safe exchange of yam germplasm without risk of disseminating DBVs.

7.1. Dioscorea bacilliform virus (DBV)

At the start of the present study the existence of episomal DBVs in Pacific yams was unknown. In addition, only two episomal DBV species, namely Dioscorea bacilliform SN virus (DBSNV) and Dioscorea bacilliform AL virus (DBALV), had been characterized from the African and Caribbean regions, respectively. Concurrent with the work described in this thesis, additional DBV species were characterized from the African and Caribbean regions, including Dioscorea bacilliform RT virus 1 (DBRTV1) and Dioscorea bacilliform RT virus 2 (DBRTV2) (Bömer et al., 2016), as well as Dioscorea bacilliform RT virus 3 (DBRTV3) (Bömer et al., 2018) and Dioscorea bacilliform TR virus (DBTRV) (Umber et al., 2017). In the present study, a total of 224 of the 283 yam accessions conserved at SPC-CePaCT were screened using rolling circle amplification (RCA). Using a combination of Sanger and next-generation sequencing (NGS), a total of 24 complete DBV genomes representing four different species groups, namely K1 or Dioscorea bacilliform ES virus (DBESV), K3 or Dioscorea bacilliform AL virus 2 (DBALV2), K8 or DBALV and T14 or Dioscorea bacilliform RT virus 2 (DBRTV2), were identified in the Pacific. Two of the four DBVs, namely DBALV2 and DBESV, were determined to be novel species and shown to be restricted to D. alata from PNG and D. esculenta from Fiji, respectively. Similarly, DBRTV2 was only identified from Samoa and DBALV was identified from Tonga and Vanuatu. None of the accessions tested from New Caledonia or the Federated States of Micronesia (FSM) were found to be infected with DBVs. DBALV was the only virus found to be present in two countries, Vanuatu and Tonga, while DBALV2, DBESV and DBRTV2 were found to be restricted to single countries, namely PNG, Fiji and Samoa, respectively.
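The distribution described above can be captured in a small lookup structure. The following Python sketch is illustrative only: the dictionary simply restates the species-to-country records reported in this section, and the function and variable names are hypothetical rather than part of any protocol used in this study.

```python
# Species-to-country records restated from the text above (illustrative only).
DBV_DISTRIBUTION = {
    "DBALV": ["Tonga", "Vanuatu"],
    "DBALV2": ["PNG"],
    "DBESV": ["Fiji"],
    "DBRTV2": ["Samoa"],
}

# Accessions screened by RCA and accessions conserved, as reported above.
SCREENED, CONSERVED = 224, 283


def species_by_country(distribution):
    """Invert the species -> countries map into country -> species."""
    by_country = {}
    for species, countries in distribution.items():
        for country in countries:
            by_country.setdefault(country, []).append(species)
    return by_country


def single_country_species(distribution):
    """Return the species recorded from exactly one country."""
    return sorted(s for s, c in distribution.items() if len(c) == 1)


# Proportion of the collection that was screened.
coverage_percent = 100.0 * SCREENED / CONSERVED
```

A structure of this kind makes it straightforward to flag which species would be new to a recipient country before germplasm is exchanged.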

The results of this study considerably expand the knowledge of episomal DBVs in the Pacific region, with four known species and 24 complete genomes now characterized. This study, together with the efforts of other researchers over the last four years (Bömer et al., 2016, 2018; Umber et al., 2017), has increased the number of known episomal DBV species from two to seven. A greater understanding of the diversity of DBVs will enable the development of reliable diagnostic protocols to support screening of germplasm collections for safe germplasm exchange. As illustrated by this study, some DBVs appear to be endemic to certain PICs, reinforcing the need for effective screening measures to ensure that they are not distributed throughout the region.

7.2. Dioscorea nummularia-associated virus (DNUaV)

In addition to describing the diversity of DBVs in the Pacific, this study has characterized an additional virus sequence that is phylogenetically distinct from badnaviruses, but clearly belongs within the family Caulimoviridae. Currently, the family Caulimoviridae contains eight recognized genera, namely Badnavirus, Caulimovirus, Cavemovirus, Petuvirus, Rosadnavirus, Soymovirus, Solendovirus and Tungrovirus. Kenyon et al. (2008) and Turaki et al. (2017), using universal badnavirus (BadnaFP/RP) primers (Yang et al., 2003), obtained partial RT/RNase H-coding sequences which, following sequence comparisons, were delineated into four distinct groups, namely K12 and K13, and T16 and T17, respectively. These sequences were found to represent distinct groups with very low nucleotide identity to DBV sequences and other badnavirus sequences. In phylogenetic analysis, the sequences clustered outside of, but between, the genera Badnavirus and Tungrovirus. However, neither study was able to determine if groups K12, K13, T16 and T17 were derived from integrated or episomal sequences, or if they represented new genera within the family Caulimoviridae. In the current study, RCA was used to amplify, clone and sequence the complete genome of an additional sequence group, namely Dioscorea nummularia-associated virus (DNUaV) which, according to the demarcation criteria set out by ICTV for species delineation within current genera of the family Caulimoviridae, was determined to be a novel sequence group. During phylogenetic analysis, DNUaV clustered together with K12 and T16, providing strong evidence that these sequences could also be of episomal origin. The genome size, organization and the presence of conserved amino acid domains of DNUaV are characteristic of members of the family Caulimoviridae. However, nucleotide sequence similarity and phylogenetic analysis suggest that DNUaV could be a distinct novel member of the family and may represent a new genus within the family Caulimoviridae. Despite the strong evidence presented for the existence of episomal DNUaV sequence, further studies on virus transmission, virion morphology and whether infected plants contain inclusion bodies typical of members of the genus Caulimovirus are required before the taxonomic status of DNUaV can be fully resolved. The occurrence of DNUaV-like viruses in other yam species and countries also needs further investigation.
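The sequence comparison underlying species delineation can be illustrated with a short Python sketch. It assumes that the two sequences have already been aligned to equal length (for example, with an external alignment tool) and uses a threshold of 80% nucleotide identity in the RT/RNase H region, which corresponds to the widely cited ICTV badnavirus criterion of more than 20% nucleotide difference. The threshold value, the gap handling and the function names are assumptions made for illustration and are not taken from the analyses performed in this study.

```python
# Assumed threshold: sequences sharing less than 80% nucleotide identity in
# the RT/RNase H region are treated as distinct species (illustrative value).
SPECIES_IDENTITY_THRESHOLD = 80.0


def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns containing a gap ('-') in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_distinct_species(aligned_a, aligned_b,
                        threshold=SPECIES_IDENTITY_THRESHOLD):
    """True if identity falls below the assumed demarcation threshold."""
    return percent_identity(aligned_a, aligned_b) < threshold
```

Because identity values depend on the alignment method and on how gaps are treated, borderline values near the threshold would need to be interpreted alongside phylogenetic and biological evidence.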


7.3. Development of diagnostic protocols

Increased knowledge of the diversity of episomal DBVs and related viruses present in the Pacific and other regions opens opportunities for the development of diagnostic protocols which will support the indexing and safe distribution/exchange of much needed genetic diversity. This study contributes to this global effort. As alluded to earlier, detection of DBVs presents a particular challenge, mainly because of the lack of knowledge of episomal sequences, the presence of host-integrated sequences and the heterogeneous nature, both genetically and serologically, of DBVs.

In addition to the identification and characterization of DBVs, this study also investigated techniques for the diagnosis of DBVs. Initially, PCR protocols were developed for the detection of DBALV2, DBESV and DBRTV2, while DNUaV-specific primers for PCR detection of DNUaV were also designed. Although these PCR protocols successfully detected the isolates characterized in this study, their ability to detect the breadth of DBV isolates and to differentiate between episomal and integrated sequences was unknown. Due to these potential shortcomings, RCA-based protocols were considered more appropriate. James et al. (2011a) showed the utility of RCA for the detection of badnaviruses from banana using random-primed RCA with the addition of specific badnavirus primers (primer-spiked RP-RCA). Based on this work, many authors have successfully used RCA for the diagnosis of episomal badnavirus infections in different hosts, including banana (Baranwal et al., 2014; Carnelossi et al., 2014; James et al., 2011b; Javer-Higginson et al., 2014; Wambulwa et al., 2012; Wambulwa et al., 2013; Sharma et al., 2014, 2015), cacao (Chingandu et al., 2017a, 2017b; Muller et al., 2018), fig (Laney et al., 2012), mulberry (Chiumenti et al., 2016), Rubus spp. (Diaz-Lara et al., 2015) and yam (Bömer et al., 2016, 2018; Sukal et al., 2017; Umber et al., 2014). However, since RCA amplification of DBVs from yams is not always reliable and has been shown to be unsuitable for DBV indexing (Bömer et al., 2016; Umber et al., 2014), this study sought to optimize the existing protocols.

During the initial DBV characterization studies, it was found that sequence-independent RCA using the TempliPhi™ kit (GE Healthcare) yielded inconsistent results, even with the addition of virus-specific primers similar to the strategy described by James et al. (2011a). Bömer et al. (2016) found that the inclusion of a purification step with Tip-100G columns, following total nucleic acid extraction, improved the efficiency of RCA. However, the same authors provided experimental evidence that, even with total nucleic acid purification, RCA still resulted in high levels of false negatives and could amplify linear templates. The nature of many commercial RCA kits, such as TempliPhi™, restricts the optimization of the reaction components. Therefore, to optimize RCA, two protocols that preferentially amplify episomal badnavirus DNA, namely directed-RCA (D-RCA) and specific-primed RCA (SP-RCA), were compared to previously published RCA protocols, namely the random-primed approach (RP-RCA) and primer-spiked RP-RCA. When the reaction products were analysed by NGS, D-RCA and SP-RCA were found to be approximately 80-fold more efficient in DBV amplification than the TempliPhi kit protocols using the manufacturer’s procedures. During the optimization of the RCA protocols, an incubation temperature of 36°C was also found to result in greater target amplification with the TempliPhi kit than the manufacturer-recommended 30°C. When the D-RCA and SP-RCA protocols were assessed using a range of different badnaviruses from different host plant species, all viruses were successfully amplified from all hosts tested. Further, D-RCA was shown to be the most consistent and efficient protocol for the amplification of DBVs (DBALV, DBALV2, DBESV, DBRTV2 and DBRTV3), BSV (BSMYV, BSGFV and BSOLV), SCBIMV and TaBV, and performed better than the RP-RCA or primer-spiked RP-RCA approaches using the TempliPhi kit. While SP-RCA was not as robust as D-RCA, it nonetheless performed as well as, if not better than, the TempliPhi kit protocols for badnavirus detection. Since the SP-RCA protocol only utilizes badnavirus genus-specific primers, its efficiency could be further improved with the inclusion of additional primers.
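A fold difference in amplification efficiency between two RCA protocols can be expressed as the ratio of the proportions of NGS reads that map to the viral genome. The Python sketch below shows one plausible way to calculate such a ratio; how the approximately 80-fold figure was actually derived is not restated in this chapter, and the read counts in the usage example are hypothetical values chosen only to illustrate the arithmetic.

```python
def viral_read_fraction(viral_reads, total_reads):
    """Fraction of sequenced reads that map to the viral genome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= viral_reads <= total_reads:
        raise ValueError("viral_reads must lie between 0 and total_reads")
    return viral_reads / total_reads


def fold_enrichment(test_viral, test_total, ref_viral, ref_total):
    """Ratio of viral read fractions: test protocol over reference protocol."""
    ref_fraction = viral_read_fraction(ref_viral, ref_total)
    if ref_fraction == 0:
        raise ValueError("reference protocol produced no viral reads")
    return viral_read_fraction(test_viral, test_total) / ref_fraction


# Hypothetical read counts (not data from this study): 40% viral reads with
# the test protocol versus 0.5% with the reference protocol.
example_fold = fold_enrichment(400_000, 1_000_000, 5_000, 1_000_000)
```

Normalizing by the total read count of each library means that the comparison is not distorted by differences in sequencing depth between samples.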

The D-RCA and SP-RCA protocols successfully amplified all DBV species from the Pacific. Although DBV species from regions outside the Pacific could not be accessed to validate the described protocols, in silico analysis revealed that, theoretically, they can be used to detect a broad range of DBV species. In the future, it is hoped that, through collaborative efforts with researchers in other countries, the protocols described here can be evaluated with other DBV species and an optimized global DBV diagnostic protocol developed. Nevertheless, this study has shown that RCA works well for the DBV species present in the Pacific and that, following RCA, restriction analysis (using endonucleases such as EcoRI, KpnI, SphI and StuI) can successfully determine the presence or absence of DBVs. Although further effort is needed to fully identify the species of DBV present, determining the presence or absence of DBV is sufficient to enable germplasm screening and address the issue of safe dissemination.
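The restriction patterns expected from a circular viral genome can be predicted in silico. The following Python sketch is a minimal illustration: it searches a circular sequence for the recognition sites of the four enzymes named above (all of which are palindromic, so searching one strand is sufficient) and returns the predicted fragment sizes. The function names and the toy sequence used in the usage note are hypothetical, and the sketch does not model partial digestion or sequence variants.

```python
# Recognition sequences of the endonucleases named in the text.
ENZYME_SITES = {
    "EcoRI": "GAATTC",
    "KpnI": "GGTACC",
    "SphI": "GCATGC",
    "StuI": "AGGCCT",
}


def find_sites(circular_seq, site):
    """Start positions of a recognition site in a circular sequence."""
    seq = circular_seq.upper()
    # Append the first bases so that sites spanning the origin are found.
    extended = seq + seq[: len(site) - 1]
    return [i for i in range(len(seq)) if extended.startswith(site, i)]


def fragment_sizes(circular_seq, site):
    """Predicted fragment sizes after complete digestion of a circle.

    Returns an empty list if the enzyme does not cut. A single site
    linearizes the circle into one genome-length fragment.
    """
    n = len(circular_seq)
    positions = find_sites(circular_seq, site)
    sizes = []
    for idx, pos in enumerate(positions):
        next_pos = positions[(idx + 1) % len(positions)]
        # Distance to the next site, wrapping around the origin.
        sizes.append((next_pos - pos) % n or n)
    return sorted(sizes, reverse=True)


def digest_profile(circular_seq, enzymes=ENZYME_SITES):
    """Fragment sizes for each enzyme, keyed by enzyme name."""
    return {name: fragment_sizes(circular_seq, site)
            for name, site in enzymes.items()}
```

For example, `digest_profile("GAATTC" + "A" * 14 + "GAATTC" + "A" * 24)` predicts two EcoRI fragments of 30 and 20 nt for this 50 nt toy circle and no cut for the other three enzymes. Because RCA products are concatemers of the circular template, complete digestion is expected to release the same set of fragments, which together sum to the genome length.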

This study has also shown that RCA, in combination with NGS, can be an effective tool for virus detection and characterization. Although the cost of NGS may currently preclude its routine use, improvements in the technology and a likely decrease in cost may make it an option in the future. Further research into the identification of sequence variants and quasispecies present in infected samples, and their relationship to differences in restriction digest patterns, will strengthen the interpretation of RCA restriction results.

7.4. Conclusions

Yam (Dioscorea spp.) is an important food staple in the Pacific with great potential for increasing food security and commercial exploitation. However, the agronomical challenges with its production have limited its utilization. Pacific germplasm collections conserved at SPC-CePaCT and other important collections, such as those conserved by IITA, hold germplasm that is integral to overcoming these agronomical challenges. However, access to this germplasm has remained limited due to concerns over the introduction of new viral diseases with the germplasm. DBVs are of great concern, mainly because the lack of reliable diagnostic tests prevents routine indexing for safe exchange. The RCA protocols optimized in this study have considerable potential for episomal virus characterization when coupled with restriction analysis alone and/or Sanger sequencing and/or direct sequencing of the RCA products via NGS. This provides an avenue to virus index yam germplasm and certify it free of episomal DBVs. To date there is no clear evidence of activation of endogenous DBV-like sequences; however, the possibility still exists. Therefore, the RCA protocols proposed in this study, coupled with restriction analysis, Sanger sequencing or NGS, alone or in combination, provide the best available option for the detection and characterization of episomal DBVs.

Since the present study was not able to obtain DBV species from regions other than the Pacific, a global collaborative effort is warranted, firstly to determine the DBVs that are present in the different yam growing regions, mainly the African and Caribbean regions, and secondly to utilize the methods described in this study to evaluate their efficacy as a universal diagnostic tool for DBV. The RCA methods described, coupled with NGS, could be deployed to other regions to characterize the yam DBVs present. Sequence information on DBVs from the different regions will help in the development of universal primers which could be used in RCA methods for DBV diagnosis to support safe transboundary yam germplasm exchange.

For the immediate future, the techniques described in this study will be used at SPC-CePaCT under an ACIAR-funded integrated crop management (ICM) project titled: Integrated crop management strategies for root and tuber crops: strengthening national and regional capacities in Papua New Guinea, Fiji, Samoa, Solomon Islands and Tonga (Project number: HORT/2010/065), to screen the Pacific yam germplasm collections for DBV and other viruses including those in the families Alphaflexiviridae (genus Potexvirus), Betaflexiviridae (genus Carlavirus), Bromoviridae (genus Cucumovirus), Caulimoviridae (genus Badnavirus), Potyviridae (genera Macluravirus and Potyvirus), Secoviridae (genera Comovirus and Fabavirus) and Tombusviridae (genus Aureusvirus). Accessions found to be free from these viruses will then be made available for distribution. The virus testing results will enable the yam germplasm to meet quarantine requirements, enabling access by the different Pacific Island Countries.

Access to yam germplasm by the countries will enable the evaluation of the different cultivars available at SPC-CePaCT for desired agronomical traits. Identification of cultivars with desirable agronomical traits, such as anthracnose resistance, will enable local farmers to increase subsistence production and commercial exploitation. Countries such as Fiji, Samoa and Tonga stand to benefit, through evaluation, from the identification of cultivars with tuber morphology conducive to export. Some yam cultivars (especially of the lesser yam, D. esculenta) have also been identified as drought tolerant. Some have been identified as suitable for atoll islands, such as Viwa Island in Fiji, and can be distributed to enhance tolerance against drought, thus contributing towards resilience against climate change (SPC data, unpublished).

The unique germplasm conserved at SPC-CePaCT has the promise of desired agronomical traits that will promote informed utilization and exploitation of yam genetic material to build food and nutritional security in the Pacific and also support resilience against climate change.


7.5. References

Baranwal VK, Sharma SK, Khurana D, Verma R, 2014. Sequence analysis of shorter

than genome length episomal banana streak OL virus like sequences isolated from

banana in India. Virus Genes 48, 120–127.

Bömer, M., Rathnayake, A.I., Visendi, P., Silva, G., Seal, S.E., 2018. Complete

genome sequence of a new member of the genus Badnavirus, Dioscorea

bacilliform RT virus 3, reveals the first evidence of recombination in yam

badnaviruses. Arch. Virol. 163, 533–538.

Bömer, M., Turaki, A., Silva, G., Kumar, P., Seal, S., 2016. A sequence-independent

strategy for amplification and characterization of episomal badnavirus sequences

reveals three previously uncharacterized yam badnaviruses. Viruses 8, 188.

Carnelossi, P.R., Bijora, T., Facco, C.U., Silva, J.M., Picoli, M.H.S., Souto, E.R.,

Oliveira, F.T. De, 2014. Episomal detection of banana streak OL virus in single

and mixed infection with Cucumber mosaic virus in banana “Nanicão Jangada.”

Trop. Plant Pathol. 39, 342–346.

Chingandu, N., Kouakou, K., Aka, R., Ameyaw, G., Gutierrez, O.A., Herrmann, H.-

W., Brown, J.K., 2017a. The proposed new species, cacao red vein virus, and

three previously recognized badnavirus species are associated with cacao swollen

shoot disease. Virol. J. 14, 199.

Chingandu, N., Zia-ur-rehman, M., Sreenivasan, T.N., Surujdeo-Maharaj, S.,

Umaharan, P., Gutierrez, O.A., Brown, J.K., Thyail, M.Z., 2017b. Molecular

characterization of previously elusive

badnaviruses associated with symptomatic cacao in the New World. Arch. Virol.

162, 1363–1371.


Chiumenti, M., Morelli, M., De Stradis, A., Elbeaino, T., Stavolone, L., Minafra, A.,

2016. Unusual genomic features of a badnavirus infecting mulberry. J. Gen.

Virol. 97, 3073–3087.

Diaz-Lara, A., Mosier, N.J., Keller, K.E., Martin, R.R., 2015. A variant of Rubus

yellow net virus with altered genomic organization. Virus Genes 50, 104–110.

Elevitch, B.C.R., Love, K., 2011. Farm and Forestry Production and Marketing

Profiles: Highlighting value-added strategies, in: Elevitch, C.R. (Ed.), Specialty

Crops for Pacific Island Agroforestry. Permanent Agriculture Resources (PAR),

Holualoa, Hawai‘i, p. 14.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011a. Development of a

novel rolling-circle amplification technique to detect banana streak virus that also

discriminates between integrated and episomal virus sequences. Plant Dis. 95,

57–62.

James, A.P., Geijskes, R.J., Dale, J.L., Harding, R.M., 2011b. Molecular

characterization of six badnavirus species associated with leaf streak disease of

banana in East Africa. Ann. Appl. Biol. 158, 346–353.

Javer-Higginson, E., Acina-Mambole, I., González, J.E., Font, C., González, G.,

Echemendía, A.L., Muller, E., Teycheney, P.Y., 2014. Occurrence, prevalence

and molecular diversity of banana streak viruses in Cuba. Eur. J. Plant Pathol.

138, 157–166.

Kenyon, L., Lebas, B.S.M., Seal, S.E., 2008. Yams (Dioscorea spp.) from the South

Pacific Islands contain many novel badnaviruses: implications for international

movement of yam germplasm. Arch. Virol. 153, 877–889.

Laney, A.G., Hassan, M., Tzanetakis, I.E., 2012. An integrated badnavirus is prevalent

in fig germplasm. Phytopathology 102, 1182–1189.


Muller, E., Ravel, S., Agret, C., Abrokwah, F., Dzahini-Obiatey, H., Galyuon, I.,

Kouakou, K., Jeyaseelan, E.C., Allainguillaume, J., Wetten, A., 2018. Next

generation sequencing elucidates cacao badnavirus diversity and reveals the

existence of more than ten viral species. Virus Res. 244, 235–251.

Risimeri, J.B., 2001. Yams and food security in the lowlands of PNG, in: Food

Security for Papua New Guinea. Australian Centre for International Agricultural

Research, Canberra.

Sharma, S.K., Vignesh Kumar, P., Geetanjali, A.S., Pun, K.B., Baranwal, V.K., 2015.

Subpopulation level variation of banana streak viruses in India and common

evolution of banana and sugarcane badnaviruses. Virus Genes 50, 450–465.

Sharma, S.K., Vignesh Kumar, P., Poswal, R., Rai, R., Swapna Geetanjali, A., Prabha,

K., Jain, R.K., Baranwal, V.K., 2014. Occurrence and distribution of banana

streak disease and standardization of a reliable detection procedure for routine

indexing of banana streak viruses in India. Sci. Hortic. 179, 277–283.

SPC and CSIRO, 2011. Food security in the Pacific and East Timor and its

vulnerability to climate change. Pacific Community (SPC), Noumea.

Sukal, A., Kidanemariam, D., Dale, J., James, A., Harding, R., 2017. Characterization

of badnaviruses infecting Dioscorea spp. in the Pacific reveals two putative novel

species and the first report of Dioscorea bacilliform RT virus 2. Virus Res. 238,

29–34.

Thaman, R.R., 2007. Restoring the Pacific Islands’ rich agricultural traditions: an

urgent priority. Pacific Ecol. 15, 51–57.

Turaki, A.A., Bömer, M., Silva, G., Lava Kumar, P., Seal, S.E., 2017. PCR-DGGE

analysis: Unravelling complex mixtures of badnavirus sequences present in yam

germplasm. Viruses 9, 181.


Umber, M., Filloux, D., Muller, E., Laboureau, N., Galzi, S., Roumagnac, P., Iskra-

Caruana, M.-L., Pavis, C., Teycheney, P.-Y., Seal, S.E., 2014. The genome of

African yam (Dioscorea cayenensis-rotundata complex) hosts endogenous

sequences from four distinct badnavirus species. Mol. Plant Pathol. 15, 790–801.

Umber, M., Gomez, R.M., Gélabale, S., Bonheur, L., Pavis, C., Teycheney, P.Y.,

2017. The genome sequence of Dioscorea bacilliform TR virus, a member of the

genus Badnavirus infecting Dioscorea spp., sheds light on the possible function

of endogenous Dioscorea bacilliform viruses. Arch. Virol. 162, 517–521.

Wambulwa, M.C., 2012. Rolling Circle Amplification Is More Sensitive than PCR

and Serology-Based Methods in Detection of Banana streak virus in Musa

Germplasm. Am. J. Plant Sci. 03, 1581–1587.

Wambulwa, M.C., Wachira, F.N., Karanja, L.S., Kiarie, S.M., Muturi, S.M., 2013.

The influence of host and pathogen genotypes on symptom severity in banana

streak disease. African J. Biotechnol. 12, 27–31.

Yang, I.C., Hafner, G.J., Dale, J.L., Harding, R.M., 2003. Genomic characterization of

taro bacilliform virus. Arch. Virol. 148, 937–949.
