Mol Genet Genomics DOI 10.1007/s00438-015-1049-z

ORIGINAL PAPER

Sequences enhancing cassava mosaic disease symptoms occur in the cassava genome and are associated with South African cassava mosaic infection

A. T. Maredza1 · F. Allie1 · G. Plata2 · M. E. C. Rey1

Received: 6 January 2015 / Accepted: 10 April 2015 © Springer-Verlag Berlin Heidelberg 2015

Abstract Cassava is an important food security crop in Sub-Saharan Africa. Two episomal begomovirus-associated sequences, named Sequences Enhancing Geminivirus Symptoms (SEGS1 and SEGS2), were identified in field cassava affected by the devastating cassava mosaic disease (CMD). The sequences reportedly exacerbated CMD symptoms in the tolerant cassava landrace TME3, and in the model plants Arabidopsis thaliana and Nicotiana benthamiana, when biolistically co-inoculated with African cassava mosaic virus-Cameroon (ACMV-CM) or East African cassava mosaic virus-UG2 (EACMV-UG2). Following the identification of small SEGS fragments in the cassava EST database, the aim of this study was to confirm their presence in the genome and to investigate a possible role for these sequences in CMD. We report that multiple copies of varying lengths of both SEGS1 and SEGS2 are widely distributed in the sequenced cassava genome and are present in several other cassava accessions screened by PCR. The endogenous SEGS1 and SEGS2 are in close proximity to, or overlap with, cassava genes, suggesting a possible role in the regulation of specific biological processes. We confirm the expression of SEGS in planta using EST data and RT-PCR. The sequence features of endogenous SEGS (iSEGS) are unique but resemble non-autonomous transposable elements (TEs) such as MITEs and helitrons. Furthermore, many SEGS-associated genes, some involved in virus–host interactions, are differentially expressed in susceptible (T200) and tolerant (TME3) cassava landraces infected by South African cassava mosaic virus (SACMV). Abundant SEGS-derived small RNAs were also present in mock-inoculated and SACMV-infected T200 and TME3 leaves. Given the known role of TEs and associated genes in gene regulation and immune responses, our observations are consistent with a role of these DNA elements in the host's regulatory response to geminiviruses.

Keywords Cassava mosaic disease · Sequences Enhancing Geminivirus Symptoms · Satellites · Begomovirus · Transposable elements

Communicated by K. Gruden.

Small RNA-Seq data reported are available in the European Nucleotide Archive under the accession number PRJEB8495.

Electronic supplementary material The online version of this article (doi:10.1007/s00438-015-1049-z) contains supplementary material, which is available to authorized users.

* M. E. C. Rey

1 School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg, Wits 2050, South Africa
2 Department of Systems Biology, Columbia University in the City of New York, 1130 St Nicholas Avenue, New York, NY, USA

Introduction

Geminiviruses are single-stranded circular (ssc) plant-infecting DNA viruses (Brown et al. 2011). Cassava mosaic disease (CMD), caused by begomoviruses belonging to the family Geminiviridae, is endemic in regions of sub-Saharan Africa where the crop is cultivated (Patil and Fauquet 2009). An unusually severe CMD epidemic that swept across East Africa and was reported between 1995 and 2005 (Gibson et al. 1996; Legg et al. 2006; Ntawuruhunga et al. 2007) resulted in unprecedented cassava crop losses. Unexpectedly, cassava fields with previously tolerant germplasm, such as TME3, an IITA-bred landrace (Fregene et al. 2001), were also severely affected. Epidemiological studies in Uganda and Cameroon identified recombinants between cassava begomoviruses (CBVs), namely African cassava mosaic virus (ACMV), East African cassava mosaic virus (EACMV) and East African cassava mosaic Cameroon virus (EACMCV) (Zhou et al. 1997; Fondong et al. 2000; Pita et al. 2001). Ndunguru (2006) later identified novel episomal ssc DNA sequences associated with begomoviruses in cassava field samples exhibiting the unusually severe mosaic symptoms. Following the discovery of resistance breaking in TME3 and the detection of these ssc DNA sequences in cassava EST datasets, questions remained as to the nature of these DNA sequences and what role they might play in CMD symptom modulation. In view of breeding and genetic modification programs to develop geminivirus resistance, the results from this study could shed light not only on molecular virus–host interactions but also on disease control approaches.

The ssc DNA molecules, initially named begomovirus-associated sat DNA-II and sat DNA-III (GenBank accession numbers AY836366.1 and AY836367.1), were unrelated and did not share any long stretches of nucleotide sequence similarity. Sat DNA-II and DNA-III were initially classified as satellite-like molecules because they were amplified using universal primers for alpha- and betasatellites (Mansoor et al. 1999; Briddon et al. 2002). Satellites are sub-viral nucleic acids that depend on co-infection with a helper virus for replication (Mayo et al. 2005) and affect disease severity and transmission (reviewed in Nawaz-ul-Rehman and Fauquet 2009; Zhou 2013). Sat DNA-II and DNA-III have since been renamed episomal Sequences Enhancing Geminivirus Symptoms (eSEGS1 and eSEGS2, respectively), as further characterisation revealed that they do not conform to classical satellites, and they have been described in detail (Ndunguru 2006; Ndunguru et al. 2008). Intriguingly, SEGS homologues were subsequently identified in the genomes of healthy cassava; these are termed integrated or endogenous SEGS (iSEGS) to distinguish them from the episomal forms (eSEGS). A BLAST search against a database of 10,577 ESTs from MTAI16 (Sakurai et al. 2007) also identified several transcripts with >80 % sequence identity to fragments of eSEGS1 and eSEGS2 (unpublished). While some plant DNA virus sequences have been found integrated in their host genomes, where they play a role in genome expansion and gene expression (Harper et al. 2002), the iSEGS are not homologous to any known plant virus, although iSEGS2 contains a 26-bp match (including 6 bp of the DNA1-F primer) to a sequence of alphasatellite origin (Zhou 2013).

The published cassava genome sequences (http://www.phytozome.net) (Goodstein et al. 2012; Prochnik et al. 2012) allow a detailed investigation of the diversity of iSEGS within the genome, and analysis of the genome positions of SEGS may improve our understanding of their potential role in CMD aetiology. In this study, eSEGS1 (AY836366.1; begomovirus-associated sat DNA-II) and eSEGS2 (AY836367.1; begomovirus-associated sat DNA-III) were used to query the cassava genome. The frequencies of iSEGS insertions, the sizes of the integrated elements, the patterns of integration and their chromosomal (scaffold) locations were investigated. Furthermore, the expression of the larger iSEGS and associated cassava genes was analysed. Differential expression of iSEGS-associated genes in cassava infected with South African cassava mosaic virus (SACMV) (Berrie et al. 2001) was also evaluated in comparison with mock-inoculated plants. The iSEGS vary in length, are found at multiple loci across the cassava genome, and share some features with several classes of transposable elements (TEs). Small RNA-Seq data revealed the presence of iSEGS-derived small RNAs targeting SEGS1 and SEGS2 in mock-inoculated and SACMV-infected cassava, with higher frequencies in SACMV-infected T200. Our results suggest a possible regulatory role of iSEGS through a small RNA-mediated mechanism. In addition, we demonstrate the altered expression of several iSEGS-associated genes in response to SACMV, providing support for a functional relationship between the iSEGS, CMD phenotype modification and gene regulation.

Materials and methods

Plant material

Leaf samples for nucleic acid extraction were obtained from virus-free, in vitro-propagated cassava plantlets (Table 1) grown in plant agar containing MS medium (Murashige and Skoog 1962). All plant material was checked for healthy or infected status by PCR using SACMV core coat primers (Allie et al. 2014).

Extraction of nucleic acids

Total nucleic acid (TNA) was extracted from ground leaf samples using extraction buffer [0.1 M Tris–Cl pH 8.0, 1.4 M NaCl, 20 mM EDTA, 2 % cetyltrimethyl ammonium bromide (CTAB), 1 % β-mercaptoethanol] followed by chloroform:isoamylalcohol (24:1) extraction. TNA was precipitated using isopropanol and resuspended in TE (10 mM Tris–Cl, 1 mM EDTA) containing 1 µg/µl RNase A. Purified DNA was quantified using a NanoDrop spectrophotometer (Thermo Scientific) by measuring absorbance at 260 nm, and analysed by electrophoresis on 1 % agarose gels.
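Quantifying DNA from absorbance at 260 nm is a single multiplication. The sketch below is illustrative only: it assumes the conventional factor of about 50 ng/µl of double-stranded DNA per A260 unit at a 1 cm path length (NanoDrop instruments report readings normalised to that path length), and the function name is hypothetical rather than part of any instrument software.

```python
# Conventional conversion factor for double-stranded DNA (assumption):
# an A260 of 1.0 at a 1 cm path length corresponds to ~50 ng/µl dsDNA.
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate dsDNA concentration (ng/µl) from a blank-corrected A260 reading."""
    if a260 < 0:
        raise ValueError("A260 must be non-negative")
    if dilution_factor <= 0:
        raise ValueError("dilution factor must be positive")
    # Scale the reading by the conversion factor and undo any pre-dilution.
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


if __name__ == "__main__":
    # A 1:10 dilution reading 0.5 at 260 nm implies ~250 ng/µl in the stock.
    print(dsdna_concentration_ng_per_ul(0.5, dilution_factor=10))
```

Because the extraction buffer above includes RNase A, most of the residual A260 signal is expected to come from DNA, which is what makes a DNA-specific factor reasonable here.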


RNA was isolated using Tri-Reagent (Sigma, USA) or Purezol (Bio-Rad, USA) following the manufacturer's instructions. Contaminating DNA was removed by DNase I treatment according to the manufacturer's instructions (Thermo Scientific, USA). Complementary DNA was synthesised from 1 µg total RNA with random hexamers using the RevertAid H Minus Reverse Transcriptase Kit (Thermo Scientific, USA) according to the manufacturer's instructions.

PCR screening for integrated sequences

PCRs contained 1× Green buffer, 0.25 µM of each primer, 0.25 µM of dNTPs, 0.1 U of DreamTaq polymerase (Thermo Scientific, USA) and 100 ng of template DNA in a 20 µl total volume. Primers SatII F/R (5′-gccgcaccactggatctc-3′ and 5′-cagcagccagtcaggaagtt-3′) and SatIII F/R (5′-acctgacggcagaaggaat-3′ and 5′-aggcctcgttactaaaagtgc-3′) were designed from the episomal begomovirus-associated sequences to amplify 895 and 306 bp fragments of iSEGS1 and iSEGS2, respectively. Annealing temperatures of 56–58 °C were used, depending on the primers. PCR products were analysed by electrophoresis on 1 % agarose gels, stained with ethidium bromide and viewed using the ChemiDoc™ MP Imaging System (Bio-Rad, USA). The 1 kb DNA ladder plus (Thermo Scientific, USA) was used as the molecular weight marker unless otherwise specified.

Cloning and sequencing of PCR fragments

Integrated SEGS for sequencing were amplified using the Phusion HotStart PCR Kit (Thermo Scientific, USA) according to the manufacturer's instructions. Amplicons in gel fragments were purified using the Macherey-Nagel Nucleospin® Gel and PCR Clean-up Kit (MN, Germany) according to the manufacturer's instructions. Purified DNA was cloned into pJet1.2 Blunt (CloneJet PCR Cloning Kit) according to the manufacturer's instructions (Thermo Scientific, USA). The clones were sequenced using the Inqaba Biotech (Pretoria, South Africa) sequencing service. Several clones were sequenced for each cassava germplasm, and if sequence differences were observed, more clones were sequenced to verify the variations.

Analysis of sequences of iSEGS cloned from healthy cassava

Post-sequencing quality checks (base calling, trimming and reassembly) were performed using CLC Main Workbench (v6.9, clcbio.com). Multiple sequence alignments were performed on the online MAFFT alignment server (mafft.cbrc.jp/alignment/server) using default parameters, combined with manual adjustments in BioEdit Sequence Alignment Editor (Hall 1999). Large gaps (>10 bases) and read ends were deleted to improve the alignment. The best substitution model was determined using the corrected Bayesian information criterion (BICc) and the corrected Akaike information criterion (AICc). Tree inference was performed in MEGA version 6.06 (updated v. 6140226; Tamura et al. 2013). The episomal SEGS (accession numbers AY836366.1 and AY836367.1) were used as reference sequences, as was the 1019-nt sat-II-like amplicon from mint (EU862815).

Searching for integrated SEGS in the cassava genome

BLASTn (v 2.2.21) (Altschul et al. 1997) was used to align SEGS1 (AY836366.1) and SEGS2 (AY836367.1) against version 4 of the cassava genome available from Phytozome v9.1 (http://www.phytozome.net). HSPs with E values higher than 0.05 were ignored. The reward for a nucleotide match (parameter -r) was set to 1; all other parameters were left at their default values.

Genome searches for genes associated with iSEGS

We studied in detail the HSPs meeting all of the following conditions: at least 70 % sequence identity to the query sequences, E values ≤ 10⁻²⁰ and sequence lengths of at least 200 bp. Exceptions were made when HSPs were in close proximity to other insertions fulfilling these criteria. Selected sequences were analysed by tabulating the frequencies, size ranges, homology statistics and chromosomal locations of iSEGS, and the genes in close proximity to the iSEGS insertions. The genome contexts of the iSEGS insertions were analysed using the Gbrowse feature in Phytozome by sliding the window 25 kilobase pairs (kbp) upstream and downstream of the iSEGS insertion sites. Proximal and distal genes were classified as 'upstream of', 'overlapping with' or 'downstream of' the iSEGS insertion sites. The cassava gene IDs were mapped to homologous A. thaliana locus IDs in order to use the more comprehensive TAIR resources for gene ontology (GO) annotation (http://www.arabidopsis.org/tools/bulk/go/index.jsp).

Searching transposable elements homologous to SEGS

To characterise the nature of the repetitive SEGS insertions and to determine whether other genomes harboured similar sequences, the Censor tool of the repetitive sequence database RepBase (http://www.girinst.org/censor/) was used to search all available reference collections for sequences homologous to the eSEGS (AY836366.1 and AY836367.1). In addition, the terminal sequences of iSEGS were manually inspected for inverted repeats and target site duplications.
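The two-stage HSP selection described under "Searching for integrated SEGS" and "Genome searches for genes associated with iSEGS" (report HSPs with E ≤ 0.05, then study in detail those with ≥70 % identity, E ≤ 10⁻²⁰ and length ≥200 bp, plus HSPs close to such insertions) can be sketched as a filter over tabular BLAST output. This is a minimal sketch, not the authors' pipeline. It assumes the standard 12-column tabular format (legacy `blastall -m 8`, or `-outfmt 6` in BLAST+), and the 1,000 bp used for "close proximity" is a hypothetical placeholder because the text gives no distance. All function names are illustrative.

```python
from typing import Iterable, List, NamedTuple


class Hsp(NamedTuple):
    query: str
    subject: str      # scaffold name
    pident: float     # percent identity
    length: int       # alignment length (bp)
    sstart: int
    send: int
    evalue: float


def parse_tabular(lines: Iterable[str]) -> List[Hsp]:
    """Parse 12-column tabular BLAST output (qseqid sseqid pident length
    mismatch gapopen qstart qend sstart send evalue bitscore)."""
    hsps = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        f = line.rstrip("\n").split("\t")
        hsps.append(Hsp(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[8]), int(f[9]), float(f[10])))
    return hsps


def meets_core_criteria(h: Hsp, min_pident: float = 70.0,
                        max_evalue: float = 1e-20, min_length: int = 200) -> bool:
    """Identity, E value and length thresholds stated in the Methods."""
    return h.pident >= min_pident and h.evalue <= max_evalue and h.length >= min_length


def select_hsps(hsps: List[Hsp], report_evalue: float = 0.05,
                proximity_bp: int = 1000) -> List[Hsp]:
    """Keep HSPs that pass the core criteria, plus reported HSPs on the same
    scaffold within `proximity_bp` of a core HSP (hypothetical distance)."""
    reported = [h for h in hsps if h.evalue <= report_evalue]
    core = [h for h in reported if meets_core_criteria(h)]

    def near_core(h: Hsp) -> bool:
        lo, hi = sorted((h.sstart, h.send))  # minus-strand hits have sstart > send
        for c in core:
            c_lo, c_hi = sorted((c.sstart, c.send))
            gap = max(lo - c_hi, c_lo - hi)  # negative when the intervals overlap
            if c.subject == h.subject and gap <= proximity_bp:
                return True
        return False

    return [h for h in reported if meets_core_criteria(h) or near_core(h)]
```

Keeping the proximity rule separate from the core thresholds makes it easy to report how many HSPs were admitted only as exceptions, which the text describes but does not quantify.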


Expression of SEGS-associated genes during SACMV infection

The generation of the expression data used to identify SEGS-associated genes during SACMV infection is described in Allie et al. (2014). In brief, an RNA-seq study monitoring gene expression changes in a SACMV-infected susceptible (T200) and a tolerant (TME3) cassava landrace was conducted. RNA was extracted at each of three time points (12, 32 and 67 dpi) from SACMV- and mock-inoculated leaf tissue. A total of twelve cDNA libraries were generated using the SOLiD Total RNA-Seq Kit (Applied Biosystems). These libraries were barcoded, multiplexed and sequenced on a single flow cell using 50 bp forward and 35 bp reverse paired-end sequencing chemistry on the ABI SOLiD V4 system. The reads generated for each T200 and TME3 library were mapped to the cassava genome available on Phytozome (http://www.phytozome.net/cassava_Manihot esculenta version 4.1). These NGS data can be accessed from the NCBI Sequence Read Archive using BioProject accession PRJNA255198.

Small RNA data from mock-inoculated and SACMV-infected cassava

Total RNA extraction, using a modified high molecular weight polyethylene glycol (HMW-PEG) protocol, was carried out on mock-inoculated and SACMV-infected leaf tissue samples collected from T200 and TME3 at 12, 32 and 67 dpi. For each sample, 1 g of pooled leaf tissue (from two experiments; 6 leaves per experiment) was homogenised in liquid nitrogen and added to 5 ml preheated (65 °C) GHCL buffer (6.5 M guanidium hydrochloride, 100 mM Tris–HCl pH 8.0, 0.1 M sodium acetate pH 5.5, 0.1 M β-mercaptoethanol) and 0.1 g HMW-PEG (Mr: 20,000, Sigma). The mixture was then pelleted by centrifugation (10,000×g) for 10 min at 4 °C. The supernatant was treated with 0.1 ml 1 M sodium citrate (pH 4.0), 0.2 ml 2 M NaCl and 5 ml phenol:chloroform:isoamyl alcohol (PCI) (25:24:1). The mixture was vortexed vigorously and again pelleted by centrifugation (10,000×g) for 10 min at 4 °C. The supernatant was removed and RNA was precipitated by adding 5 ml isopropanol (propan-2-ol). The mixture was thoroughly mixed, incubated at −20 °C for 60 min and pelleted by centrifugation (10,000×g) for 25 min at 4 °C. RNA pellets were washed with 5 ml ice-cold 75 % molecular grade ethanol and dried at 37 °C for 5 min. The pellet was resuspended in 100 μl preheated (55 °C) RNase-free water with 1 μl RNase inhibitor (Fermentas). Small RNAs were specifically enriched using the mirVana™ miRNA isolation kit (Ambion Inc.), following the manufacturer's protocol. Next generation sequencing (NGS) was done using the Illumina HiSeq2000 platform at LGC Genomics in Berlin, Germany.

Raw reads generated by the Illumina HiSeq2000 system for the 12 small RNA libraries were cleaned of sequence adapters, low-quality tags and short sequences (<15 nt) using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Per-cycle quality analysis was performed for each library. To eliminate all other small non-coding RNAs, high-quality trimmed sequences were mapped onto rRNA, tRNA and snoRNA sequences from Rfam (version 12.0); sequences that mapped completely and had an E value <0.06 were removed from the libraries. Small RNAs (18–26 nt) from the Illumina data were aligned to the two eSEGS (AY836366.1 and AY836367.1) using the Map to Reference tool in CLC Main Genomics Workbench (v6.9, clcbio.com). The number of SEGS reads (normalised against total NGS sRNA reads) was quantified in T200 and TME3 and modelled through a Poisson distribution, and the lower and upper limits of the confidence intervals were calculated using the Neyman procedure (Nakamura et al. 2010). The small RNA-Seq data have been submitted to the European Nucleotide Archive under accession number PRJEB8495.

Amplification of iSEGS using TIR sequences

Universal primers for begomovirus-associated satellites were mapped onto the episomal SEGS sequences. To investigate whether the SEGS were flanked by terminal inverted repeats (TIRs), the forward primers Beta01 (5′-ggtaccactacgctacgcagcagcc-3′) and DNA-1 (5′-tggggatcctaggatataaataacacgtc-3′) were used separately for single-primer PCR, yielding amplicons from TIRs. PCR was performed on healthy cassava as described previously, using an annealing temperature of 50 °C, and amplicons were cloned into pJET1.2 Blunt as described earlier and sequenced (Inqaba Biotech, South Africa). Secondary structures within eSEGS were predicted using the Mfold tool (Zuker 2003) within CLC Main Workbench (version 6.9) after converting the DNA sequences to RNA. All settings were left at default values.

Primers for analysing iSEGS expression

Two primer sets for analysing iSEGS expression were designed to amplify within the near full-length iSEGS: SatII F (5′-gccgcaccactggatctc-3′) and AY66_520 (5′-caaagctcgagctccaaaggtc-3′) for SEGS1, and AY67_588 (5′-gtgaattgattgagagttg-3′) and AY67_1166 (5′-cacgtctatcatttgcttctc-3′) for SEGS2. Other primer pairs were designed to span iSEGS and adjoining gene sequences: Prefoldin (5′-ggattttgccgagatcatta-3′) and AY66_520 for the prefoldin 6 subunit gene (cassava4.1_019094) associated with SEGS1, and Helicase (5′-gcagcttccctttcttgtttttg-3′) and SatIII R (5′-acctgacggcagaaggaat-3′) for the helicase gene (cassava4.1_025672) associated with SEGS2. These primers were used to amplify from cDNA of selected healthy cassava germplasm. The primers were tested in silico against M. esculenta sequences downloaded from the Phytozome database (ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Mesculenta/) using the Primer and Probe Design function in the Molecular Biology folder of the CLC toolbox in CLC Main Workbench (v6.9), and the expected product sizes were recorded.

Analysis of iSEGS in cassava ESTs

The expression of iSEGS was analysed by BLASTn searches of the cassava expressed sequence tag (EST) database (http://ncbi.nlm.nih.gov), which contained 88,062 sequences as of 2 April 2014. The BLAST outputs were manually curated and the corresponding gene identities verified by searching protein databases using BLASTp. Additionally, the nucleotide sequences of SEGS1 and SEGS2 were used in a BLASTn search, with default parameters and an E value threshold of 0.05, against the non-redundant reads in the small RNA study of Pérez-Quintero et al. (2012) (GEO accessions: GSM726146 and GSM726147) and the unique reads from the study by Ballen-Taborda et al. (2013).

Results

iSEGS fragments are present in the genomes of several cassava accessions

Analysis of the putatively episomal SEGS showed terminal inverted repeats (TIRs) identical to the universal primers DNA1-F (Mansoor et al. 1999) and Beta01 (Briddon et al. 2002), which are used to amplify alphasatellites and betasatellites, respectively (Fig. 1a, b). This suggests that partially matching palindromic sequences served as the primer sites for amplifying eSEGS2 and eSEGS1. Surprisingly, the Beta01 primer alone was able to amplify multiple fragments from the genomic DNA of four healthy cassava accessions (Fig. 1c), demonstrating the presence of several similar TIRs at different loci within the cassava genome. The sequenced amplicons included a near full-length clone with 98.9 % sequence identity to eSEGS1 (Online Resource 1), but also several genome sequences unrelated to SEGS (data not shown). DNA1-F alone failed to amplify from cassava genomic DNA. Internal primers for SEGS1 and SEGS2 were used to amplify ~895 and ~306 bp fragments (Fig. 1d, e), respectively, from the genomic DNA of seventeen healthy cassava accessions (Table 1). Amplicons from cultivars TME1 and TME3 (CMD tolerant), and cv. 60444, T200 and TME117 (susceptible to CMD), were sequenced. Multiple pairwise sequence alignments showed a maximum of 12 and 7 % nucleotide (nt) sequence divergence among iSEGS1 (Fig. 2a) and iSEGS2 (Fig. 2b) clones, respectively. The observed nucleotide differences were mostly due to insertions or deletions within the sequenced amplicons relative to the episomal SEGS. When large gaps were disregarded, blocks of high sequence conservation were observed, irrespective of the germplasm. Phylogenetic analyses using maximum likelihood (ML) showed no distinct clustering of clones according to source germplasm (Fig. 2a, b), and several clones from the same accession were found in separate clusters within the phylogenetic trees. Thus, a heterogeneous population of amplicons was produced from multiple priming sites within the genome, and the nucleotide variations arose from these multiple iSEGS loci.

Fig. 1 Primer mapping to eSEGS (a, b) and PCR analysis of iSEGS (c–e) in healthy cassava. Mapping of universal primers for begomovirus-associated satellites on the episomal SEGS sequences: a SEGS1 and b SEGS2. c Amplification of SEGS1 from healthy cassava using a single primer (Beta01), demonstrating the presence of corresponding TIRs in the cassava genome; lanes 1–4, cassava cultivars TME1, TME3, T200 and cv. 60444, respectively. d and e PCR amplification of integrated SEGS from genomes of healthy cassava accessions (as labelled at the top of both panels); specific primers for SEGS1 and SEGS2 were designed to produce ~895 and ~306 bp products, respectively. M, molecular weight marker (O'GeneRuler 1 kb plus DNA ladder, Thermo Scientific); Plasmid 1 and Plasmid 2, positive controls (eSEGS1- or eSEGS2-containing plasmids, respectively); NTC, no template controls

Table 1 Cassava accession characteristics

Cultivar | Resistance/other characteristics | Integrated satellite DNA-II-like sequences | Integrated satellite DNA-III-like sequences | Cultivar source
cv. 60444 | ✗ | ✓ | ✓ | CIAT
TME1 | ✓ | ✓ | ✓ | IITA
TME3 | ✓ | ✓ | ✓ | IITA
T200 | ✗ | ✓ | ✓ | South Africa
TME117 | ✗ | ✓ | ✓ | IITA
TME7 | ✓ | ✓ | ✓ | IITA
I30001 | ✓ | ✓ | ✓ | IITA
I30572 | ± | ✓ | ✓ | IITA
I60506 | ✗ | ✓ | ✓ | IITA
AR9-18 | ✓ | ✓ | ✓ | CIAT
AR9-44 | ✓ | ✓ | ✓ | CIAT
CR43-13 | ✓ | ✓ | ✓ | CIAT
SM707-17 | Low temp | ✓ | ✓ | CIAT
SM1433-4 | High yield | ✓ | ✓ | CIAT
CM523-7 | High starch | ✓ | ✓ | CIAT/FAO
BRA1183 | Low temp | ✓ | ✓ | FAO
MTAI-16 | High starch | ✓ | ✓ | FAO

Resistance: ✗ susceptible, ✓ tolerant, ± moderate tolerance

Fig. 2 Phylogenetic analysis of PCR-amplified iSEGS from genomes of healthy cassava. a Maximum likelihood (ML) tree for iSEGS1 based on the Tamura 3-parameter substitution model (Tamura 1992); for the best substitution model, BICc = 3554.19 and AICc = 3348.48, and the log likelihood of the highest-likelihood tree is −1652.6442. The percentage (>50 %) of trees in which the associated taxa clustered together is shown next to the branches. b Molecular phylogenetic analysis of iSEGS2 by ML based on the Hasegawa–Kishino–Yano model (Hasegawa et al. 1985); for the best substitution model, BICc = 1524.34 and AICc = 1223.85, and the log likelihood of the highest-likelihood tree is −566.57. All trees are drawn to scale, with branch lengths measured as the number of substitutions per site. The trees were automatically rooted by saving the generated ML tree in standard Newick format as previously described (Louis et al. 2014), and all analyses were performed in MEGA 6.06 (updated v. 6140226) software (Tamura et al. 2013)

Analysis of iSEGS using the cassava genome database

A BLASTn search (Altschul et al. 1990) against the Manihot esculenta genome assembly (v4.1, http://www.phytozome.net) identified 73 SEGS1 and 197 SEGS2 homologues using an expect (E) value <0.05 (Online Resources 2 and 3, respectively). The high-scoring pairs (HSPs) varied in length from near full-length sequences to homology stretches of 22 bp and 26 bp for iSEGS1 and iSEGS2, respectively (Online Resources 2 and 3). Near full-length insertions matched 997 bp on scaffold 12498 for iSEGS1 and 756 bp on scaffold 12725 for iSEGS2, representing 99.2 and 84.1 % nt identity to the query sequences, respectively. Analysis of the near full-length iSEGS1 and iSEGS2 revealed that both included the GC-rich regions found in eSEGS. iSEGS1 also retained seven putative eSEGS1 ORFs (Ndunguru et al. 2008), while iSEGS2 retained 105 bp of the putative ORF 2 from eSEGS2. The iSEGS1 and iSEGS2 ORFs showed no significant similarity to protein sequences in the public databases. Many iSEGS were in intergenic genomic regions, and only 6 of the 73 iSEGS1 and 6 of the 197 iSEGS2 fragments were not associated with genes (within a 25 kb nucleotide window). Partial iSEGS1 homologues overlapping with annotated genes were fewer (6 of 73) than iSEGS2 homologues overlapping genes (90 of 197). Up to seven integrated SEGS per scaffold were observed in the reconstructed cassava genome. The highest insertion frequencies were 0.74 and 0.54 per kbp for fragments with sequence identity to iSEGS1 and iSEGS2, respectively (Fig. 3a, b). Most scaffolds, including 12498 and 12725, which each contained a single near full-length copy, showed fewer than 0.01 insertions per kbp. However, insufficient information on the cassava genome precluded the characterisation of iSEGS distribution patterns by chromosomal location.

The BLASTn results were analysed further by manually scoring the search statistics against contiguous sequence stretches of 200 bp or longer. HSPs showing nucleotide identities 70 % and E values […] of the repeat elements were variable. However, nucleotide sequences within the first half (5′) of the iSEGS were more highly represented, while sequences matching the end (3′) were less frequent (Fig. 3c, d). To investigate possible transposition mechanisms of iSEGS, the Censor tool in the RepBase repetitive sequence database (http://www.girinst.org/censor/) was used to search for sequences sharing homology with the episomal SEGS. Interestingly, both SEGS had at least 70 % nucleotide identity to segments of LTR (Copia, Gypsy) and non-LTR (CR1) retrotransposons, and a minor region of homology to a DNA element (P-type). Three TE-like sequences belonging to the Gypsy, Copia and DNA/P classes showed homology to iSEGS1, while two TE-like segments with partial homology to SEGS2 were characterised as Gypsy and CR1 (Table 2; illustrated in Fig. 3e, f). These regions of significant homology to known transposons are unlikely to encode any functional protein, given the relatively short lengths (40–300 bp) of sequence homology relative to the large sizes of the TEs in question (3–8 kb) and the lack of genes coding for transposition enzymes.

TIRs are a common feature of DNA transposons. Putative TIRs corresponding to the Beta01 and DNA1-F primers were mapped to the episomal SEGS (Fig. 1a, b). To find the corresponding TIRs on iSEGS, the sequences flanking the near full-length iSEGS1 and iSEGS2 on scaffolds 12498 and 12725, respectively, were compared (Online Resource 6). The analysis showed that the longest fragment of iSEGS1 has TIRs similar to the Beta01 primer. Mapping of the DNA1-F primer sequences was less clear-cut because the longest iSEGS2 is 5′-truncated. The presence of TIRs on the longest iSEGS1 and its apparent lack of coding potential suggest similarity to the miniature inverted-repeat transposable element (MITE) class of non-autonomous DNA transposons. Additionally, secondary structure (Mfold) analysis of eSEGS1 and eSEGS2 showed a propensity to form intramolecular double-stranded loop structures (Online Resource 6).

GO annotations of iSEGS-associated genes

Mobile genetic elements can affect the expression patterns of genes around insertion sites (Peaston et al. 20[…]
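The small RNA methods state that normalised SEGS read counts were modelled as Poisson, with confidence limits from the Neyman procedure. A minimal pure-Python sketch of that kind of calculation follows. It assumes the central (equal-tailed) exact Poisson interval, which is the usual Neyman construction for a single observed count, and assumes normalisation to reads per million total sRNA reads; both choices and all function names are illustrative, not the authors' code.

```python
import math
from typing import Callable, Tuple


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), summed in log space to avoid underflow."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    log_mu = math.log(mu)
    terms = (math.exp(i * log_mu - mu - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, math.fsum(terms))


def _bisect_root(f: Callable[[float], float], lo: float, hi: float,
                 iterations: int = 200) -> float:
    """Locate a sign change of f on [lo, hi] by bisection."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def neyman_poisson_interval(n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Central Neyman confidence interval for a Poisson mean given n observed counts."""
    if n < 0:
        raise ValueError("count must be non-negative")
    half = alpha / 2
    if n == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= n) equals alpha/2.
        lower = _bisect_root(lambda mu: (1.0 - poisson_cdf(n - 1, mu)) - half,
                             0.0, float(n))
    # Upper limit: the mean at which P(X <= n) equals alpha/2.
    hi_bracket = n + 20.0 * math.sqrt(n + 1) + 20.0
    upper = _bisect_root(lambda mu: poisson_cdf(n, mu) - half, float(n), hi_bracket)
    return lower, upper


def normalised_rate_with_ci(segs_reads: int, total_reads: int, alpha: float = 0.05,
                            per: float = 1e6) -> Tuple[float, float, float]:
    """Scale a SEGS read count and its interval to reads per `per` total sRNA reads."""
    lower, upper = neyman_poisson_interval(segs_reads, alpha)
    scale = per / total_reads
    return segs_reads * scale, lower * scale, upper * scale


if __name__ == "__main__":
    # Hypothetical library: 10 SEGS-matching reads out of 2 million sRNA reads.
    print(normalised_rate_with_ci(10, 2_000_000))
```

The interval is computed on the raw count and only then rescaled, so the Poisson assumption applies to what was actually counted; comparing libraries then amounts to checking whether their rescaled intervals overlap.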


Fig. 3 Distribution of SEGS insertions in the cassava genome. a, b Frequency distribution of iSEGS insertions normalised by scaffold length and presented as the number of insertions per kilobase pair; the numbers on the x-axis are scaffold names. c, d Frequency of homologous nucleotides in the cassava genome along the SEGS1 or SEGS2 fragments according to the BLASTn alignment; the blue line marks the GC-rich regions and the red line the location of putative open reading frames (ORFs). e, f Maps of the SEGS molecules showing putative ORFs, transposable element-like motifs and the annealing positions of several primers used in the study; blue, regions of high GC content; red, the longest putative ORFs
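The two quantities plotted in Fig. 3a–d (insertions per kbp of scaffold, and how often each SEGS position is covered by genomic hits) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the per-scaffold hit list comes from the selected BLASTn HSPs, that query coordinates are 1-based and inclusive, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def insertions_per_kbp(hit_scaffolds: Iterable[str],
                       scaffold_lengths: Dict[str, int]) -> Dict[str, float]:
    """Hits on each scaffold divided by that scaffold's length in kbp (Fig. 3a, b)."""
    counts = Counter(hit_scaffolds)
    return {s: n / (scaffold_lengths[s] / 1000.0) for s, n in counts.items()}


def coverage_profile(query_length: int,
                     intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Number of alignments covering each query position (Fig. 3c, d).

    `intervals` are 1-based inclusive (qstart, qend) pairs; a difference
    array keeps the cost linear in query length plus number of alignments.
    """
    diff = [0] * (query_length + 1)
    for qstart, qend in intervals:
        lo, hi = sorted((qstart, qend))
        diff[lo - 1] += 1   # coverage starts at position lo
        diff[hi] -= 1       # and stops after position hi
    profile, running = [], 0
    for i in range(query_length):
        running += diff[i]
        profile.append(running)
    return profile


def mean_coverage(profile: List[int], start: int, end: int) -> float:
    """Mean depth over 1-based inclusive query positions start..end."""
    window = profile[start - 1:end]
    return sum(window) / len(window)
```

With such a profile, the 5′ bias reported in the text can be expressed as `mean_coverage(profile, 1, 500)` compared with the mean over the remaining positions.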

Table 2 Transposon-like motifs on iSEGS

Sequence name | Position on sequence (bp) | Element name | Element size (bp) | Class | References
AY836366/SEGS1 | 86–235 | Copia-113 | 6042 | LTR | Bao and Jurka (2013), Collén et al. (2013)
AY836366/SEGS1 | 320–381 | Gypsy4-I | 7680 | LTR | Jurka and Kohany (2009)
AY836366/SEGS1 | 437–475 | P-7_HM | 3170 | DNA/P | Jurka (2008b)
AY836367/SEGS2 | 151–444 | Gypsy-7 | 8813 | LTR | Drosophila 12 Genomes Consortium et al. (2007), Jurka and Kojima (2012)
AY836367/SEGS2 | 632–741 | CR1-17 | 4457 | Non-LTR | Bao and Jurka (2008)
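The argument that these TE-like motifs are too short to encode functional transposon proteins can be checked with simple arithmetic on Table 2. The sketch below assumes the Table 2 positions are 1-based and inclusive; the helper names are hypothetical.

```python
# Rows transcribed from Table 2: (query, start, end, element, element size in bp)
TABLE2 = [
    ("SEGS1", 86, 235, "Copia-113", 6042),
    ("SEGS1", 320, 381, "Gypsy4-I", 7680),
    ("SEGS1", 437, 475, "P-7_HM", 3170),
    ("SEGS2", 151, 444, "Gypsy-7", 8813),
    ("SEGS2", 632, 741, "CR1-17", 4457),
]


def motif_length(start: int, end: int) -> int:
    """Length of a 1-based inclusive interval (assumed Table 2 convention)."""
    return end - start + 1


def element_fraction(start: int, end: int, element_size: int) -> float:
    """Fraction of the reference transposon covered by the SEGS motif."""
    return motif_length(start, end) / element_size


if __name__ == "__main__":
    for query, start, end, element, size in TABLE2:
        print(f"{query}\t{element}\t{motif_length(start, end)} bp\t"
              f"{100 * element_fraction(start, end, size):.1f} % of {size} bp")
```

Under these assumptions the motifs span 39–294 bp, and none covers more than about 3.3 % of its reference element, far too little to carry a complete transposase or reverse transcriptase coding region.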

in Online Resource 7. A substantial fraction of iSEGS-associated genes were linked to the nuclear cellular component (Fig. 4); these include genes such as helicases, integrases, ligases, and DNA and histone methylases. Another cellular component category represented among genes overlapping with iSEGS1 was the chloroplast. Ten molecular functions, namely nucleotide binding, nucleic acid binding, protein binding, DNA or RNA binding, receptor binding activity, transporter activity, hydrolase activity, kinase activity, transferase activity and transcription factor activity, were characterised in the Biological Process categories (Fig. 4). Also present were several classes of metabolic genes, including esterases, glycosidases, helicases, nucleases, proteases and protein phosphatases. Genes with roles in diverse cellular, developmental or regulatory processes, such as amino acid biosynthesis, protein modification, degradation or binding, and protein transport (chaperones, kinases, transporters and transferases), were also found close to iSEGS. Table 3 shows several examples of iSEGS-associated genes with possible roles in plant development and stress response that illustrate many of the above GO functions.

Fig. 4 Gene ontology (GO) annotations for genes in close proximity to iSEGS1 and iSEGS2

Cassava ESTs harbour sequences homologous to iSEGS

BLASTn analysis of 88,062 cassava expressed sequence tags (ESTs) established the presence of iSEGS in the pool of transcripts and confirmed their overlap with gene-coding regions (Online Resource 8). Both eSEGS contain motifs that resemble TATA boxes and polyadenylation signals (Ndunguru et al. 2008), strongly supporting their transcription, but no transcripts mapping to the full-length sequences related to iSEGS1 or iSEGS2 were detected in the cassava EST database. Analysis of the EST-associated iSEGS showed that the 5′ sequences (the first 500 nt of SEGS) were more highly represented than the 3′ sequences. EST-associated iSEGS

1 3 Mol Genet Genomics Phytozome accession Phytozome Cassava4.1_033826m Cassava4.1_025277m Cassava4.1_031465m Cassava4.1_031078m Cassava4.1_026451m Cassava4.1_017789m Cassava4.1_025388m Cassava4.1_011123m Cassava4.1_003604m Cassava4.1_034290m Cassava4.1_025672m Cassava4.1_013901m Cassava4.1_018336m Cassava4.1_023764m Cassava4.1_013520m Cassava4.1_007265m Cassava4.1_029497m tion, stability and translation responses; nuclear localisation, DNA-binding tubulin monomers during cytoskeleton assembly; Impli - monomers during cytoskeleton tubulin cated in nuclear gene regulation initiation of DNA replication initiation of DNA control, apoptosis control, apoptosis tion, folding and unfolding by stimulating ATPase activity activity ATPase tion, folding and unfolding by stimulating of Hsp70 stability of signalling complexes signal transduction pathways regulation- binding of ribosomes to mRNA; hydrolase hydrolase binding of ribosomes to mRNA; regulation- activity and rapid wall expansion and rapid wall ing, RNA splicing, RNA cleavage, translation; required for cleavage, splicing, RNA ing, RNA processes developmental many tein interaction; binds substrates for ubiquitin-mediated proteolysis Biological function RNA-binding proteins, modulate RNA processing, localisa - proteins, modulate RNA RNA-binding TFs mediating plant developmental processes & stress TFs mediating plant developmental Co-chaperone with chaperonin, binds cytoplasmic actin & Co-chaperone with chaperonin, binds cytoplasmic Carbohydrate metabolism Carbohydrate Male and female flower fertility Male and female flower Male and female flower fertility Male and female flower Ubiquitin-mediated degradation of cell cycle G1 regulators; G1 regulators; of cell cycle Ubiquitin-mediated degradation Signal transduction, regulation of transcription, cell cycle of transcription, cell cycle Signal transduction, regulation Signal transduction, regulation of transcription, cell cycle of transcription, cell cycle Signal 
transduction, regulation - Hsp40, mediates protein translation, translocation, degrada Involved in unwinding of nucleic acids Involved Transmembrane molecular facilitator proteins mediating molecular facilitator Transmembrane binding ubiquitin-conjugating enzymes; participates in binding ubiquitin-conjugating Protein & nucleic acid binding; translation initiation and Hydrolase important during seed germination, fruit ripening, RNA-binding; posttranscriptional functions e.g. RNA edit - posttranscriptional functions e.g. RNA RNA-binding; Components of ubiquitin-ligase complexes; protein–pro - complexes; Components of ubiquitin-ligase 67 dpi − 1.06 0.73 0.85 − 0.66 0.16 − 0.66 − 0.61 − 0.07 0.39 − 0.77 0.46 0.66 2.82 − 0.30 − 0.27 − 0.07 NA 0.79 0.67 0.55 2.56 0.20 0.00 0.12 1.19 0.03 32 dpi − 0.88 − 0.72 − 0.43 − 0.25 − 0.41 − 1.46 − 2.36 − 0.14 0.58 0.46 0.31 0.84 0.56 0.78 0.01 0.22 0.04 1.05 0.28 12 dpi TME3 log2 ratio − 0.48 − 0.07 − 0.51 − 0.90 − 0.31 − 1.10 67 dpi NA 2.39 NA 2.11 2.73 − 1.22 NA 1.17 − 1.20 NA 2.14 2.08 NA − 2.19 NA − 2.03 2.14 32 dpi − 3.13 − 1.00 − 3.41 0.62 0.59 0.60 1.06 1.90 − 0.51 − 1.80 − 0.90 1.84 − 1.19 − 4.00 − 0.90 − 0.15 − 1.32 12 dpi T200 log2 ratio 0.75 − 2.08 0.41 0.12 − 0.25 1.51 0.25 0.75 − 0.49 1.27 1.12 1.22 − 0.49 − 1.21 1.92 − 0.35 0.92 Differentially expressed annotated iSEGS1 and 2 associated genes (>twofold change) in SACMV-infected cassava change) in SACMV-infected annotated iSEGS1 and 2 associated genes (>twofold expressed Differentially Tesmin/TSO1-like CXC 2 Tesmin/TSO1-like Tesmin/TSO1-like CXC domain-containing protein Tesmin/TSO1-like Transducin/WD40 repeat-like superfamily protein superfamily repeat-like Transducin/WD40 Transducin family protein/WD-40 repeat family protein protein/WD-40 repeat family family Transducin ADP-glucose pyro-phosphorylase small subunit 2 small subunit ADP-glucose pyro-phosphorylase ATP-dependent RNA helicase, putative RNA ATP-dependent Pentatricopeptide repeat (PPR) superfamily protein 
Pentatricopeptide repeat (PPR) superfamily NAC domain-containing protein 57 NAC Prefoldin 6

Ubiquitin-conjugating enzyme 34 Ubiquitin-conjugating

DNAJ heat shock N-terminal domain-containing protein DNAJ Helicase domain-containing protein Tetraspanin2 Ubiquitin system component Cue protein

Xyloglucan endotransglycosylase 6 Pentatricopeptide repeat (PPR) superfamily protein Pentatricopeptide repeat (PPR) superfamily F-box family protein F-box family 3 Table Functional annotation AY836366 AY836367

1 3 Mol Genet Genomics Phytozome accession Phytozome Cassava4.1_012626m Cassava4.1_002733m Cassava4.1_011010m Cassava4.1_009844m Cassava4.1_017364m Cassava4.1_013376m Cassava4.1_000657m Cassava4.1_027442m Cassava4.1_024524m Cassava4.1_030232m Cassava4.1_029883m Cassava4.1_029977m Cassava4.1_001994m Cassava4.1_024762m Cassava4.1_026704m Cassava4.1_007562m Cassava4.1_025138m Cassava4.1_002333m Cassava4.1_010120m control, apoptosis control, apoptosis infection and nodule development response to infection & wounding R proteins to mediate disease resistance transfer of glycosyl residues from S-adenosyl methionine to RNA and assembly of multi-protein complexes GTPase kinase and helicase classes GTPase tions such as proliferation, gene expression, differentiation, differentiation, tions such as proliferation, gene expression, and apoptosis mitosis, cell survival diverse binding partners, diverse functions; intracellular binding partners, diverse diverse regulation signalling, cytoskeletal ity; plant growth and development, disease response and development, ity; plant growth regulation- binding of ribosomes to mRNA; hydrolase hydrolase binding of ribosomes to mRNA; regulation- activity cated biotic & abiotic stress responses, developmental cated biotic & abiotic stress responses, developmental stage regulation Biological function Methylation of proteins, lipids & nucleic acids Methylation Signal transduction, regulation of transcription, cell cycle of transcription, cell cycle Signal transduction, regulation Lysine degradation, chromatin modification degradation, Lysine of transcription, cell cycle Signal transduction, regulation Membrane-anchored; electron transfer activity; implicated in Membrane-anchored; electron transfer activity; Amino acid, flavonol, sterol & lignin biosynthetic processes, Amino acid, flavonol, ATPase-dependant nucleotide binding domain; interacts with ATPase-dependant Detoxification of endogenous and exogenous substrates by Detoxification of 
endogenous and RNA binding, modification by methyltransferase activity activity binding, modification by methyltransferase RNA A structural motif that mediates protein–protein interaction Lipase activity; hydrolase activity; lipid metabolism activity; hydrolase Lipase activity; Chloroplast differentiation Nucleotide-binding protein fold present in many ATPase or ATPase Nucleotide-binding protein fold present in many Thio-ester hydrolases Direct cellular responses to stimuli: regulate cellular func - Direct cellular responses to stimuli: regulate Repetitive amino acid sequence diverse cellular locations, amino acid sequence diverse Repetitive - factor activ binding, transcription Sequence-specific DNA Protein & nucleic acid binding; translation initiation and Transcriptional repression of transposons and genes, impli - Transcriptional 67 dpi − 0.17 0.50 0.59 0.71 0.56 0.57 0.65 1.32 − 0.20 2.32 0.73 0.47 0.50 − 0.62 0.92 − 0.69 0.47 0.29 0.54 0.02 0.22 0.46 1.12 0.12 0.41 3.40 0.71 0.86 0.03 32 dpi − 0.61 − 0.75 − 0.10 − 0.18 − 1.14 − 3.31 − 0.46 − 0.46 − 0.54 0.35 0.51 0.47 0.08 0.70 0.50 0.57 0.81 2.75 0.11 12 dpi TME3 log2 ratio − 0.14 − 0.25 − 0.15 − 0.16 − 0.65 − 1,16 − 1.54 − 1.81 − 0.12 67 dpi NA NA NA NA 2.42 1.81 NA NA NA NA − 4.07 2.37 1.57 2.95 NA − 4.92 NA − 0.99 NA 32 dpi 0.13 − 0.10 − 0.61 0.19 0.10 1.72 − 0.47 − 0.58 0.47 − 0.07 − 3.64 0.47 − 0.67 − 1.90 − 0.32 − 5.21 − 1.13 − 0.04 − 1.69 12 dpi − 1.18 T200 log2 ratio − 0.82 − 0.15 0.01 0.08 − 0.27 0.77 2.92 − 0.81 1.31 0.19 0.92 2.36 0.92 NA − 1.28 NA − 0.37 0.42 a a a -methionine-dependent methyltransferases -methionine-dependent methyltransferases l superfamily protein superfamily superfamily protein superfamily Transducin/WD40 repeat-like superfamily protein superfamily repeat-like Transducin/WD40 Transducin/WD40 repeat-like superfamily protein superfamily repeat-like Transducin/WD40 Tetratricopeptide repeat (TPR)-like superfamily protein superfamily repeat (TPR)-like Tetratricopeptide 
Thioesterase superfamily protein Thioesterase superfamily ARM-repeat superfamily protein ARM-repeat superfamily Early nodulin-like protein 17 Early nodulin-like O-methyltransferase 1 O-methyltransferase

NB-ARC domain-containing disease resistance protein UDP-glucosyl transferase 76E11 RNA methyltransferase family protein family methyltransferase RNA

Histone-lysine N-methyltransferase ASHH3 Histone-lysine N-methyltransferase GDSL-like Lipase/Acylhydrolase superfamily protein superfamily Lipase/Acylhydrolase GDSL-like division2 P-loop containing nucleoside triphosphate hydrolases P-loop containing nucleoside triphosphate hydrolases

Mitogen-activated protein kinase 15 Mitogen-activated

Integrase-type DNA-binding superfamily protein superfamily DNA-binding Integrase-type RNA helicase family protein helicase family RNA RNA-directed DNA methylation 4 methylation DNA RNA-directed S-adenosyl- 3 Table continued Functional annotation

1 3 Mol Genet Genomics Phytozome accession Phytozome Cassava4.1_029752m Cassava4.1_005267m Cassava4.1_029434m Cassava4.1_012494m Cassava4.1_025931m Cassava4.1_003640m Cassava4.1_028232m Cassava4.1_033633m Cassava4.1_001725m vesiculo-tubular transport vesiculo-tubular scriptional reprogramming in plant defence responses & hormone signalling complex; proteosomal degradation of targets; involved in involved of targets; proteosomal degradation complex; suppression of gene silencing; interact with specific F-box proteins involved in stress response involved enzyme activity, cellular location, or association with other enzyme activity, proteins; critical in signal transduction cascades dependent kinase (Cdk) enzymes olysis; regulation of cellular functions olysis; regulation cence; participates in abscisic acid biosynthesis Biological function A membrane-anchored ras-related GTPase, regulates regulates A membrane-anchored ras-related GTPase, Sequence-specific DNA-binding transcription factor, tran - factor, transcription Sequence-specific DNA-binding Mediates proteosomal binding with ubiquitin ligase Mediates proteosomal binding with ubiquitin ligase Ethylene-forming oxidoreductase Ethylene-forming HSP chaperone cofactor interaction via TTP repeat domain; interaction via HSP chaperone cofactor Posttranslational modification by phosphorylation changing Control cell cycle progression and differentiation via cyclin- progression and differentiation Control cell cycle Protein trafficking, autophagy, proteasome-mediated prote - Protein trafficking, autophagy, Ubiquitin-dependent degradation; prevents premature senes - prevents Ubiquitin-dependent degradation; 67 dpi 0.80 − 0.23 0.21 − 1.14 0.42 − 0.86 0.73 − 1.49 − 0.68 0.15 0.39 0.64 0.49 0.37 32 dpi − 0.85 − 0.19 − 0.91 − 0.99 0.40 0.07 0.59 0.95 0.54 12 dpi TME3 log2 ratio − 1.65 − 0.49 − 0.32 − 0.01 67 dpi NA NA NA NA − 0.93 NA NA NA NA 32 dpi − 0.10 − 0.15 − 0.96 − 0.64 − 0.02 0.05 − 0.15 − 1.32 − 0.61 12 dpi T200 log2 
ratio 1.02 − 0.73 − 0.80 − 0.08 0.32 − 0.03 0.66 0.51 0.77 a Gene sequences overlapping with SEGS Gene sequences overlapping WRKY DNA-binding protein 3 WRKY DNA-binding ACC oxidase 1 ACC

SKP1/ASK-interacting protein 5

Heat-shock protein DnaJ with tetratricopeptide repeat Protein kinase superfamily protein Protein kinase superfamily Cyclin family Ubiquitin-like superfamily protein superfamily Ubiquitin-like Senescence-associated E3 ubiquitin ligase 1 Senescence-associated E3 ubiquitin ligase Rab5-interacting family protein Rab5-interacting family

3 Table continued Functional annotation a twofold change in gene expression is − 1 ≥ log2 ratio 1, representing a ± twofold At least one of the values

1 3 Mol Genet Genomics

were mostly in +/+ orientation relative to the episomal SEGS query sequences. Unrelated gene sequences contained similar stretches of iSEGS, including the GC-rich sequences, suggesting a conserved function for these sequence stretches or a common biological role for the sequence motifs. The iSEGS-associated cassava ESTs had assigned functions such as signal recognition, sulphite oxidase activity, RNA helicase and RNA/nucleotide binding, prefoldin, kinesin, dihydropicolinate reductase, and quinolinate phosphoribosyltransferase (Online Resources 9, 10), largely in agreement with the previously mentioned GO categories (Fig. 4; Online Resource 7).

Expression analysis of iSEGS

Reverse transcription (RT)-PCR analyses confirmed that iSEGS are expressed. Amplicons were of the expected sizes when the primer pairs annealed within the iSEGS (Fig. 5a, b). When primers were designed to span iSEGS1 and an adjacent prefoldin 6 subunit gene (cassava4.1_019094m), or iSEGS2 and an adjacent helicase gene (cassava4.1_025672m), the expected product sizes from cDNA could not be reliably ascertained. Non-specific amplicons were produced from the iSEGS1 and prefoldin primers (Fig. 5c). However, several prefoldin subunit sequences were among the SEGS-containing ESTs identified (Online Resource 9), and transcripts from this gene were differentially expressed in SACMV-infected cassava (Online Resource 11). Helicase and iSEGS2 primers produced ~700 bp amplicons in TME1, TME3 and cv.60444 (Fig. 5d).

Fig. 5 Expression analysis of iSEGS in different cassava germplasm. Primers were designed to amplify the largest integrated fragments (a, b for iSEGS1 and iSEGS2, respectively), or to span the iSEGS1 insertion into the neighbouring prefoldin 6 subunit gene or the iSEGS2 insertion into the neighbouring helicase gene (c, d, respectively). Lanes 1–4, cDNA from cassava accessions TME1, TME3, T200 and cv.60444, respectively; Marker, molecular weight marker (O'GeneRuler 1 kb plus DNA ladder, Thermo Scientific); Plasmid 1 and Plasmid 2, plasmids containing dimers of eSEGS1 or eSEGS2, respectively; NTC, no-template control. PCR was performed on ~250 ng gDNA or cDNA

Expression of iSEGS‑associated genes during SACMV infection

To investigate the possible involvement of iSEGS in CMD, transcriptome data from SACMV-infected cassava at 12, 32 and 67 days post-inoculation (dpi) (Allie et al. 2014) were analysed. The iSEGS-associated genes were differentially expressed (DE) in two landraces, susceptible T200 and tolerant TME3, during the course of infection (Online Resources 11 and 12 for iSEGS1 and iSEGS2, respectively). In general, gene expression in tolerant TME3 was largely unperturbed compared to the changes observed in CMD-susceptible T200. The number of DE genes associated with iSEGS2 was higher, concomitant with the larger number of iSEGS2 homologues within the genome. Some of the iSEGS-associated DE genes were transcripts coding for prefoldin subunits, helicase-domain proteins, an ATP-dependent RNA helicase, RNA-binding family proteins, GDSL-like lipase/acylhydrolase superfamily proteins, an ARM-repeat superfamily protein and an integrase-type DNA-binding protein (Table 3 and Online Resources 11, 12). In general, genes involved in nucleic acid binding and synthesis were down-regulated in virus-infected plants relative to mock-inoculated controls. DNA- and RNA-dependent polymerases, helicases, DNA- and RNA-binding proteins and ribosomal RNA processing proteins, amongst others, showed substantial changes in expression levels in T200 that were not correspondingly matched in the TME3 expression data. Notably, however, of the total number of DE genes, the percentage of iSEGS2-associated genes was higher in tolerant TME3 (53, 31 and 50 % at 12, 32 and 67 dpi, respectively) than in T200 (11, 5 and 1 % at 12, 32 and 67 dpi, respectively) (Fig. 6). The percentage of iSEGS1-associated DE genes was low (1–2 %) for both landraces. Of particular interest in relation to the role of iSEGS in CMD, many iSEGS-associated genes were involved in biotic stress responses, including heat-shock proteins, ubiquitination-associated proteins, enzymes for the prevention of early senescence (mediated by ethylene and/or ubiquitin), R gene-associated NB-ARC domains, enzymes for chromatin and histone modification, RNA/DNA-modifying enzymes and enzymes for posttranslational modification of proteins. Also of interest was the identification of hydrolase activity as one of the named classes of iSEGS-associated enzymes. Hydrolases are implicated in hypersensitive reaction-induced necrosis, which leads to disease resistance by restricting the spread of invading pathogens (Guo et al. 1998). Other interesting iSEGS-associated genes implicated in the host response to pathogens are transducin/WD40 repeat-like and protein kinase superfamily proteins (SEGS1; scaffold 04116) and an NB-ARC domain-containing disease R protein (SEGS2; scaffold 12711), as these are also involved in resistance to plant pathogens (Cai et al. 2008).

Fig. 6 The percentage (out of the total number of DE genes) of SEGS-associated DE genes in SACMV-infected T200 and TME3 at each time point (12, 32 and 67 dpi)

Small RNAs targeting SEGS in mock‑inoculated and SACMV‑infected cassava

To identify sRNAs targeting SEGS, six small RNA-enriched libraries were generated from each of mock-inoculated and SACMV-infected cassava leaves, collected from susceptible T200 and tolerant TME3 at 12, 32 and 67 dpi, and sequenced using the Illumina HiSeq2000 system. The raw and trimmed small RNA (sRNA) read counts for the twelve libraries, after removal of low-quality sequences, adapters and short sequences (<15 nt), are shown in Online Resource 13. In addition, normalised total and unique SEGS-homologous sRNAs (18–26 nt) from mock-inoculated and infected leaf tissues were quantified (Online Resource 13). RNA-Seq data from mock-inoculated and SACMV-infected cassava showed that the relative abundances of unique SEGS1- and SEGS2-derived sRNAs were higher in SACMV-infected susceptible T200 than in tolerant TME3 at 67 dpi (Fig. 7a, c). Interestingly, unique and total SEGS1 and SEGS2 sRNA counts were generally higher in SACMV-infected T200 than in mock-inoculated leaves (Fig. 7a, c), whereas no difference, or the opposite pattern, was observed for TME3. Total and unique 18–26 nt sRNA reads mapped to SEGS were significantly more abundant for SEGS2 than for SEGS1 in both susceptible and tolerant landraces (Fig. 7b, d). We also note that the differences in sRNA abundance were more apparent in the unique (Fig. 7a, c) than in the total counts (Fig. 7b, d). The distribution of perfectly matched sRNAs showed apparent enrichment of the 24-nt population in all four groups (SEGS1 or SEGS2 for T200 and TME3) (data not shown).

Complementary to the above analysis, comparison of iSEGS with published small RNA datasets from cassava varieties MBRA685 and TAI16 (Pérez-Quintero et al. 2012; Ballen-Taborda et al. 2013) identified 15 small RNAs (sizes 20–50 nt) with sequence identities between 90 and 100 % to different regions of SEGS1 (8 sequences) and SEGS2 (7 sequences).

Discussion

Since the discovery of two episomal single-stranded DNA sequences, termed Sequences Enhancing Geminivirus Symptoms (eSEGS1 and eSEGS2), in field-infected cassava exhibiting severe CMD symptoms (Ndunguru 2006; Ndunguru et al. 2008), the nature and function of these sequence elements have remained elusive. Our study provides bioinformatic and experimental evidence that SEGS are integrated as multiple fragmented copies in the cassava genome and resemble non-autonomous transposable elements whose regulation is linked to alterations in CMD
aetiology. Although the replication of eSEGS molecules has not been conclusively demonstrated, it is clear that, in the presence of ACMV-CM or EACMCV, inoculation with eSEGS clones induces enhanced, severe symptoms in cassava and N. benthamiana (Ndunguru et al. 2008). Of particular importance, CMD tolerance in the well-known West African landrace TME3 appeared to be compromised when eSEGS were co-bombarded with the begomovirus infectious clones. Interestingly, susceptible Arabidopsis ecotype plants co-inoculated with ACMV-CM and episomal SEGS also showed enhanced CMD symptoms compared to inoculation with the cassava begomovirus alone (de Leòn et al. 2013). In addition, begomovirus DNA accumulated to higher levels in a susceptible Arabidopsis ecotype, but a resistant Arabidopsis ecotype showed no symptoms in the presence or absence of the eSEGS. Since no iSEGS occur in the Arabidopsis genome, we suggest that the effect of exogenous SEGS on CMD symptoms may be due to endogenization of the eSEGS into the Arabidopsis genome and/or regulation by eSEGS via the RNA silencing pathways. One of the questions that emerged from the discovery of SEGS fragments in the cassava EST database was whether these novel sequences were present in cassava germplasm from different sources, or even in other plant species. Several of the analysed germplasm accessions (South American cultivars and African landraces) in our collection contained multiple iSEGS1 and iSEGS2 copies of varying size and sequence conservation (including the GC-rich region). We have also uncovered iSEGS widely distributed in a range of germplasm from field samples collected in Rwanda, Burundi and Tanzania (Ndomba 2013; Mollel 2014), suggesting that the presence of iSEGS is a highly conserved feature of cassava genomes. Phylogenetic analysis showed no particular clustering of iSEGS according to germplasm of origin, suggesting that, for the evolution of these sequences, cross-breeding of cultivars may have been more important than divergence through duplication and point mutations. As the divergence between PCR amplicons was often greater within than across accessions, SEGS integrations may be ancestral, may have been gained repeatedly, or may not be actively duplicating.

Fig. 7 Size distribution of unique small RNAs mapping to SEGS. Small RNA sizes and abundance in mock-inoculated and SACMV-infected cassava, representing values for perfect matches mapping to genomic segments of eSEGS1 or eSEGS2, are shown for T200 or TME3

Integrated SEGS may also be widely represented in other plant species, where they may play a role in modulation of disease. A sequence related to cassava eSEGS1, a 1019-nt amplicon (EU862815), has been described in a study on leaf deformity in mint caused by a geminivirus–satellite complex (Borah et al. 2011). This molecule, named Mentha leaf deformity-associated DNA-II, showed 98 % nucleotide similarity to eSEGS1 (DNA-II, AY836366.1). It was proposed that the amplicon contributed to the severe deformity observed during geminivirus–betasatellite infection. In another study, de Leòn et al. (2013) reported that episomal SEGS function with a heterologous geminivirus, Cabbage leaf curl virus (CaLCuV), in a resistant Arabidopsis ecotype. Plants accumulated CaLCuV when co-inoculated with eSEGS clones, but no symptoms or viral nucleic acids were detected when inoculations contained only the virus or only the eSEGS. Significantly, this supports our hypothesis that exogenous
SEGS function through interaction with the genome rather than with the begomoviruses. While their molecular action needs further investigation, SEGS modulate the symptom phenotype by as yet unknown direct or indirect interference with plant resistance mechanisms or with transcriptome changes induced by begomoviruses, leading to altered gene expression and enhanced symptoms.

iSEGS found in the cassava genome share some features with the MITE class of TEs. As with SEGS, (1) MITEs lack open reading frames (ORFs), and where ORFs are present they code for short peptides of unknown function, and (2) they are found in close proximity to gene coding sequences and are often co-transcribed with the plant genes (Lu et al. 2012). However, there are notable differences between SEGS and most classes of MITEs characterised to date: (1) MITEs are generally smaller (<600 bp) than the near full-length iSEGS (>1 kb), (2) the copy numbers of iSEGS are low (hundreds) compared to the tens of thousands of copies reported for some MITEs (Oki et al. 2008; Paterson et al. 2009), and (3) terminal inverted repeats (TIRs) and target site duplications (TSDs) are characteristic features of MITEs that could not be identified for most iSEGS. TIRs were only observed for the largest copy of iSEGS1, for which the putative TSDs are short and not directly adjacent to the TIRs (Online Resource 6). Furthermore, using MITE-Hunter (Han and Wessler 2010), none of the 229 putative MITE families identified in the cassava genome corresponded to iSEGS.

Due to their proximity to genes, MITEs are implicated in the regulation of gene expression. RNA from transcribed MITEs can regulate gene expression through RNA-directed DNA methylation of adjacent genes (Buchmann et al. 2009) and/or through post-transcriptional gene silencing (PTGS) of homologous mRNAs. Both mechanisms involve small RNAs originating from either the miRNA or the small interfering RNA (siRNA) pathway. Differential abundances of SEGS-derived sRNAs were detected between T200 and TME3, as well as between SACMV-infected and mock-inoculated cassava, further supporting a role for SEGS in CMD modulation. Furthermore, SEGS sRNAs were more frequent in susceptible T200 than in TME3, suggesting either that SACMV causes an increase in SEGS expression or, since the differences are more apparent in the unique than in the total counts, that these sRNAs could be derived from Dicer-mediated cleavage of longer transcripts via RNA silencing. The 24-nt class was the most abundant and was enriched at 32 dpi in both landraces. The 24-nt group, representing heterochromatic or repeat-associated siRNAs, is known to guide DNA methylation and histone modification of repetitive sequences (Xie et al. 2004) rather than to mediate mRNA cleavage. Although dual coding of siRNAs and miRNAs from TEs has been reported (Piriyapongsa and Jordan 2008), the absence of the 21-nt size group, which is predominantly miRNA, is consistent with the failure to identify precursor miRNA signatures within the SEGS. In agreement with our findings, we uncovered small RNAs with homology to iSEGS1 and iSEGS2 in RNA-seq data from Pérez-Quintero et al. (2012) and Ballen-Taborda et al. (2013), obtained from cassava under biotic and abiotic stress, respectively, although no miRNA precursors could be identified. These results suggest a possible transcriptional gene silencing role for the iSEGS rather than PTGS, although PTGS cannot be ruled out.

Reports on the characterisation of TEs in the cassava genome are few. Gbadegesin et al. (2008) identified diverse En/Spm-like elements (Meens) and LTR retrotransposons. However, such annotations are not yet fully represented in the draft genome on Phytozome, which impedes the assignment of iSEGS elements to definitive classes of repetitive elements. In the absence of TIRs, except in the near full-length iSEGS1, the involvement of a trans-acting transposase (a classical feature of non-autonomous DNA TEs) cannot be supported. We therefore propose an alternative dispersal mechanism involving rolling circle amplification (RCA) of SEGS, followed by detachment of amplicons of varied sizes and their subsequent integration at other genome locations. The presence of inverted-repeat sequences conducive to intra-strand recombination is supported by complex cruciform structures obtained in Mfold analyses of iSEGS (Online Resource 12). The GC-rich regions of SEGS and stable stem/loop structures may lead to the formation of the proposed extra-chromosomal circles, leading to RCA and reintegration of copies into other genome locations by homologous or illegitimate recombination; however, the recombination potential and thermodynamic stability of the cruciform and hairpin structures from iSEGS require further characterisation using predetermined stretches of 50–200 bp. Furthermore, the extensive intra-strand base pairing observed strongly supports our hypothesis that iSEGS are able to regulate gene expression at the nucleotide or transcript level. DNA and RNA secondary structures are known to interact with proteins, affecting cellular functions such as recombination, replication and transcription (Hatfield and Benham 2002). Notably, the fragmented nature of the iSEGS, the observed core sequence conservation between PCR amplicons and the proposed RCA-mediated dispersal are characteristic of helitrons, a recently described class of transposons (Kapitonov and Jurka 2007; Thomas et al. 2010).

The functional significance of integrated SEGS was explored in light of the proposed role of episomal SEGS in the exacerbation of CMD symptoms. Our results suggest several layers of regulatory potential. First, the location of iSEGS in intergenic regions, especially close to or overlapping with 5′ or 3′ UTRs, may affect the expression of neighbouring genes due to proximity effects or alteration
of gene sequences (Jurka 2008a; Lisch 2009). Repetitive sequences, such as transposons and satellite DNA elements, are known agents of gene regulation through insertion mutations. Second, transcription of iSEGS and of some neighbouring genes in cassava was demonstrated by RT-PCR, corroborating the presence of iSEGS-like sequences in the 5′ regions of several transcripts in EST libraries. The patterns of integration in the genome and in ESTs suggest regulation of associated genes by iSEGS, because many unrelated genes contain similar iSEGS stretches. Third, iSEGS-derived small RNAs were identified in RNA-Seq data from mock-inoculated and SACMV-infected cassava leaves.

Analysis of selected iSEGS-associated genes that were differentially expressed during SACMV infection also showed a preponderance of GO functions associated with the regulation of processes at the nucleic acid and protein levels. Genes with nuclear-localised biological functions are often related to regulatory networks, cell cycle reprogramming and developmental changes. Prefoldin acts on DNA-binding proteins in the nucleus, thus playing a role in transcriptional regulation, and, interestingly, many prefoldin-interacting partners are iSEGS-associated and were differentially expressed during SACMV infection. It is noteworthy that SACMV replicates in the nucleus (Hanley-Bowdoin et al. 2013) and that geminivirus DNA is often methylated by host innate immune response mechanisms (Raja et al. 2008). The overarching properties of iSEGS-associated genes in cassava were the regulation of cellular functions and the reprogramming of global cellular processes. Differential expression analysis during SACMV infection of two landraces, T200 and TME3, illustrates how gene expression and silencing mechanisms of disease response-associated genes may differ in disparate genomic backgrounds. Overall, SACMV-susceptible T200 showed greater perturbation in gene expression than tolerant TME3 (Allie et al. 2014). Integrated SEGS-associated genes were differentially expressed in SACMV-infected compared to mock-inoculated leaves. Differential expression was also observed during the progression of infection, demonstrating that genes neighbouring iSEGS are responsive to begomovirus infection. Notably, amongst the most significantly differentially expressed iSEGS-associated proximal genes (Table 3), several encode proteins that have also been reported in other geminivirus infections. A NAC domain-containing protein (Cassava4.1_025277m) was highly enriched in tolerant TME3 at the stage of recovery (67 dpi). NAC transcription factors are major transcriptional regulators in plants (Nuruzzaman et al. 2013), and numerous members of this multigene family play roles in the regulation of biotic stress responses, including geminivirus infection (Selth et al. 2005). Up-regulation of a helicase-containing protein (Cassava4.1_025672m) in T200 at 67 dpi may facilitate SACMV rolling circle replication (RCR), as these proteins are known to be involved in cell cycling and RCR. Sahu et al. (2010) also found up-regulation of a DNA2-NAM7 helicase family protein in a Tomato leaf curl New Delhi virus (ToLCNDV)-tolerant tomato cultivar. While DE analysis can identify genes that are potentially regulated by iSEGS and may point to a causal relationship in disease modulation, the role of geminivirus infection in the dynamics of these elements remains unclear. It is not known how geminivirus infection activates or represses the transcription of iSEGS. In addition, geminiviruses are known to suppress host defences at both transcriptional and posttranscriptional levels (Buchmann et al. 2009). How such virus-mediated suppression of silencing affects the transcription of iSEGS requires further investigation.

We conclude from in silico and NGS studies that the iSEGS resemble non-autonomous transposable elements but show unique features that make their classification challenging. A relationship between iSEGS, gene expression and CMD infection was established in this study. We conclude that iSEGS are modulators of genome function, possibly via transcriptional or posttranscriptional silencing of host genes, irrespective of the geminivirus-associated disease, because exogenous episomal SEGS exacerbate symptoms with heterologous geminiviruses (Borah et al. 2011; de Leòn et al. 2013) in plant species with (cassava) or without (A. thaliana and N. benthamiana) iSEGS. In either case, this is important for future studies, since iSEGS influence geminivirus-responsive host genes and subsequent disease aetiology. We suggest that an RCA mechanism, as proposed for helitrons, may explain the dispersal of iSEGS within the cassava genome. Determining the functions of iSEGS-associated differentially expressed genes in CMD would prove invaluable, and the scope of this study clearly needs to be extended to investigate these relationships further. Additionally, dynamic methylation changes within transposons can regulate proximal genes in response to virus infection (Dowen et al. 2012). Pathogen-induced transcriptome networks and DNA methylomes are linked to the regulation of gene expression, and it would not be unreasonable to hypothesise that SACMV-induced symptom modulation may be attributable to methylation changes within transposable-like iSEGS (as suggested by the abundant 24-nt sRNAs in T200 and TME3) that may regulate neighbouring or proximal genes. A study of the DNA methylome in relation to iSEGS and cassava geminiviruses would prove worthwhile.

Acknowledgments This project was supported by grants from the National Research Foundation Competitive Grant and the International Center for Genetic Engineering and Biotechnology, Trieste. ATM was supported by Claude Leon Foundation and NRF-South

1 3 Mol Genet Genomics

Africa. We would like to thank Dr. Louis Bengyella for assistance in Chondrus crispus shed light on evolution of the Archaeplastida. the phylogenetic analysis. Proc Natl Acad Sci 110:5247–5252 De Leòn L, Dallas L, Ascencio-Ibáñez J, Sseruwagi P, Robertson D, Conflict of interest The authors declare no conflict of interest. Ndunguru J, Hanley-Bowdoin L (2013) Two CMD-associated DNA sequences enhance geminivirus symptoms and break resistance in cassava and Arabidopsis. 7th Int. Geminivirus Symp. 5th Int. ssDNA Comp. Virol. Work. Hangzhou, China, p References 86 Dowen RH, Pelizzola M, Schmitz RJ, Lister R, Dowen JM, Nery JR, Allie F, Pierce EJ, Okoniewski MJ, Rey C (2014) Transcriptional Dixon JE, Ecker JR (2012) From the Cover: PNAS Plus: wide- analysis of South African cassava mosaic virus-infected suscep- spread dynamic DNA methylation in response to biotic stress. tible and tolerant landraces of cassava highlights differences in Proc Natl Acad Sci 109:E2183–E2191 resistance, basal defense and cell wall associated genes during Fondong VN, Pita JS, Rey MEC, de Kochko A, Beachy RN, Fauquet infection. BMC Genom 15:1006 CM (2000) Evidence of synergism between African cassava Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic mosaic virus and a new double-recombinant geminivirus infect- local alignment search tool. J Mol Biol 215(3):403–410 ing cassava in Cameroon. J Gen Virol 81:287–297 Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Fregene M, Okogbenin E, Mba C, Angel F, Suarez MC, Janneth G, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new gen- Chavarriaga P, Roca W, Bonierbale M, Tohme J (2001) Genome eration of protein database search programs. Nucleic Acids Res mapping in cassava improvement: challenges, achievements and 25:3389–3402 opportunities. 
Euphytica 120:159–165 Ballen-Taborda C, Plata G, Ayling S, Guez-Zapata F, Lopez-Lav- Gbadegesin MA, Wills MA, Beeching JR (2008) Diversity of LTR- alle B, Luis A, Duitama J, Tohme J (2013) Identification of retrotransposons and Enhancer/Suppressor Mutator-like transpo- Cassava MicroRNAs under Abiotic Stress. Int J Genomics. sons in cassava (Manihot esculenta Crantz). Mol Genet Genom- doi:10.1155/2013/857986 ics 280:305–317 Bao W, Jurka J (2008) CR1 families from Hydra magnipapillata. Gibson RW, Legg JP, Otim-Nape GW (1996) Unusually severe symp- Repbase Reports 8:1845 toms are a characteristic of the current epidemic of mosaic virus Bao W, Jurka J (2013) LTR retrotransposons from the red seaweed. disease of cassava in Uganda. Ann Appl Biol 128:479–490 Repbase Reports 13:2407 Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Bejerano G, Lowe CB, Ahituv N, King B, Siepel A, Salama SR, Mitros T, Dirks W, Hellsten U, Putnam N, Rokhsar DS (2012) Rubin EM, Kent WJ, Haussler D (2006) A distal enhancer and an Phytozome: a comparative platform for green plant genomics. ultraconserved exon are derived from a novel . Nature Nucleic Acids Res 40:1178–1186 441:87–90 Guo A, Durner J, Klessig DF (1998) Characterization of a tobacco Berrie LC, Rybicki EP, Rey ME (2001) Complete nucleotide epoxide hydrolase gene induced during the resistance response sequence and host range of South African cassava mosaic virus: to TMV. Plant J 15:647–656 further evidence for recombination amongst begomoviruses. J Hall TA (1999) BioEdit: a user-friendly biological sequence align- Gen Virol 82:53–58 ment editor and analysis program for Windows 95/98/NT. pp Borah BK, Cheema GS, Gill CK, Dasgupta I (2011) A Geminivirus– 95–98 Satellite Complex is associated with leaf deformity of Mentha Han Y, Wessler SR (2010) MITE-Hunter: a program for discovering (Mint) plants in Punjab. 
Indian J Virol 21:103–109 miniature inverted-repeat transposable elements from genomic Briddon RW, Bull SE, Mansoor S, Amin I, Markham PG (2002) Uni- sequences. Nucleic Acids Res 38:e199 versal primers for the PCR-mediated amplification of DNA beta: Hanley-Bowdoin L, Bejarano ER, Robertson D, Mansoor S (2013) a molecule associated with some monopartite begomoviruses. Geminiviruses: masters at redirecting and reprogramming plant Mol Biotechnol 20:315–318 processes. Nat Rev Microbiol 11:777–788 Brown J, Fauquet C, Briddon R, Zerbini M, Navas-Castillo J (2011) Harper G, Hull R, Lockhart B, Olszewski N (2002) Viral sequences Family Geminiviridae, 1st edn. Elsevier-Academic, Amsterdam, integrated into plant genomes. Ann Rev Phytopathol 40:119–136 pp 351–373 Hasegawa M, Kishino H, Yano T (1985) Dating of the human–ape Buchmann RC, Asad S, Wolf JN, Mohannath G, Bisaro DM (2009) splitting by a molecular clock of mitochondrial DNA. J Mol Geminivirus AL2 and L2 proteins suppress transcriptional gene Evol 22:160–174 silencing and cause genome-wide reductions in cytosine meth- Hatfield GW, Benham CJ (2002) DNA topology-mediated control ylation. J Virol 83:5005–5013 of global gene expression in Escherichia coli. Ann Rev Genet Cai M, Qiu D, Yuan T, Ding X, Li H, Duan L, Xu C, Li X, Wang S 36:175–203 (2008) Identification of novel pathogen-responsive cis-elements Jurka J (2008a) Conserved eukaryotic transposable elements and the and their binding proteins in the promoter of OsWRKY13, evolution of gene regulation. Cell Mol Sci 65:201–204 a gene regulating rice disease resistance. Plant Cell Environ Jurka J (2008b) P-type DNA transposon families from Hydra magni- 31:86–96 papillata. Repbase Reports 8:353 Clark AG, Eisen MB, Smith DR, Bergman CM, Oliver B, Markow Jurka J, Kohany O (2009) LTR retrotransposons from fruit fly. 
Rep- TA, Kaufman TC, Kellis M, Gelbart W, Iyer VN, Pollard DA, base Reports 9:1046 Sackton TB, Larracuente AM, Singh ND, Abad JP, Abt DN, Jurka J, Kojima K (2012) LTR retrotransposons from fruit fly. Rep- Adryan B, Aguade M, Akashi H, Anderson WW, Drosophila base Reports 12:1512 12 Genomes Consortium et al (2007) Evolution of genes and Kapitonov VV, Jurka J (2007) Helitrons on a roll: eukaryotic rolling- genomes on the Drosophila phylogeny. Nature 450:203–218 circle transposons. Trends Genet 23:521–529 Collén J, Porcel B, Carré W, Ball SG, Chaparro C, Tonon T, Barbey- Legg JP, Owor B, Sseruwagi P, Ndunguru J (2006) Cassava Mosaic ron T, Michel G, Noel B, Valentin K, Elias M, Artiguenave F, Virus Disease in East and Central Africa: epidemiology and Man- Arun A, Aury J-M, Barbosa-Neto JF, Bothwell JH, Bouget F-Y, agement of A Regional Pandemic. Adv Virus Res 67:355–418 Brillet L, Cabello-Hurtado F, Capella-Gutiérrez S et al (2013) Lisch D (2009) Epigenetic regulation of transposable elements in Genome structure and metabolic features in the red seaweed plants. Annu Rev Plant Biol 60:43–66

Louis B, Waikhom SD, Roy P, Bhardwaj PK, Sharma CK, Singh MW, Talukdar NC (2014) Host-range dynamics of Cochliobolus lunatus: From a biocontrol agent to a severe environmental threat. Biomed Res Int. doi:10.1155/2014/378372
Lu C, Chen J, Zhang Y, Hu Q, Su W, Kuang H (2012) Miniature inverted-repeat transposable elements (MITEs) have been accumulated through amplification bursts and play important roles in gene expression and species diversity in Oryza sativa. Mol Biol Evol 29:1005–1017
Mansoor S, Khan SH, Bashir A, Saeed M, Zafar Y, Malik KA, Briddon R, Stanley J, Markham PG (1999) Identification of a Novel Circular Single-Stranded DNA associated with Cotton Leaf Curl Disease in Pakistan. Virology 259:190–199
Mayo M, Leibowitz M, Palukaitis P, Scholthof KBG, Simon AE, Stanley J, Taliansky M (2005) Satellites. Elsevier/Academic Press, London, pp 1163–1169
Mollel HG (2014) Interaction and impact of cassava mosaic begomoviruses and their associated satellites. M.Sc. Thesis. University of the Witwatersrand
Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiol Plant 15:473–497
Nakamura K, Particle Data Group et al (2010) Review of particle physics. J Phys G Nucl Part Phys 37:075021
Nawaz-ul-Rehman MS, Fauquet CM (2009) Evolution of geminiviruses and their satellites. FEBS Lett 583:1825–1832
Ndomba OA (2013) Influence of satellite DNA molecules on severity of cassava begomoviruses and the breakdown of resistance to cassava mosaic disease in Tanzania. Ph.D. Thesis. University of the Witwatersrand
Ndunguru J (2006) Molecular characterization of cassava mosaic geminiviruses in Tanzania. Ph.D. Thesis. University of Pretoria
Ndunguru J, Fofana B, Legg J, Challepan P, Taylor N, Aveling T, Thompson G, Fauquet CM (2008) Two novel satellite associated with bipartite cassava mosaic begomoviruses enhancing symptoms and capable of breaking high virus resistance in a cassava landraces. Ghent University, Ghent, p 141
Ntawuruhunga P, Legg J, Okidi J, Okao-Okuja G, Tadu G, Remington T (2007) Southern Sudan, Equatoria Region, Cassava Baseline Survey Technical Report, vol 64. IITA, Ibadan
Nuruzzaman M, Sharoni AM, Kikuchi S (2013) Roles of NAC transcription factors in the regulation of biotic and abiotic stress responses in plants. Front Microbiol 4:248
Oki N, Yano K, Okumoto Y, Tsukiyama T, Teraishi M, Tanisaka T (2008) A genome-wide view of miniature inverted-repeat transposable elements (MITEs) in rice, Oryza sativa ssp. japonica. Genes Genet Syst 83:321–329
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV et al (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457:551–556
Patil BL, Fauquet CM (2009) Cassava mosaic geminiviruses: actual knowledge and perspectives. Mol Plant Pathol 10:685–701
Peaston AE, Evsikov AV, Graber JH, de Vries WN, Holbrook AE, Solter D, Knowles BB (2004) Retrotransposons regulate host genes in mouse oocytes and preimplantation embryos. Dev Cell 7:597–606
Pérez-Quintero ÁL, Quintero A, Urrego O, Vanegas P, López C (2012) Bioinformatic identification of cassava miRNAs differentially expressed in response to infection by Xanthomonas axonopodis pv. manihotis. BMC Plant Biol 12:1–11
Piriyapongsa J, Jordan IK (2008) Dual coding of siRNAs and miRNAs by plant transposable elements. RNA 14:814–821
Pita JS, Fondong VN, Sangaré A, Otim-Nape GW, Ogwal S, Fauquet CM (2001) Recombination, pseudorecombination and synergism of geminiviruses are determinant keys to the epidemic of severe cassava mosaic disease in Uganda. J Gen Virol 82:655–665
Prochnik S, Marri PR, Desany B, Rabinowicz PD, Kodira C, Mohiuddin M, Rodriguez F, Fauquet C, Tohme J, Harkins T, Rokhsar DS, Rounsley S (2012) The cassava genome: current progress, future directions. Trop Plant Biol 5:88–94
Raja P, Sanville BC, Buchmann RC, Bisaro DM (2008) Viral genome methylation as an epigenetic defense against geminiviruses. J Virol 82:8997–9007
Sahu PP, Rai NK, Chakraborty S, Singh M, Chandrappa PH, Ramesh B, Chattopadhyay D, Prasad M (2010) Tomato cultivar tolerant to tomato leaf curl New Delhi virus infection induces virus-specific short interfering RNA accumulation and defence-associated host gene expression. Mol Plant Pathol 11:531–544
Sakurai T, Plata G, Rodríguez-Zapata F, Seki M, Salcedo A, Toyoda A, Ishiwata A, Tohme J, Sakaki Y, Shinozaki K, Ishitani M (2007) Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response. BMC Plant Biol 7:66. doi:10.1186/1471-2229-7-66
Selth LA, Dogra SC, Rasheed MS, Healy H, Randles JW, Rezaian MA (2005) A NAC domain protein interacts with tomato leaf curl virus replication accessory protein and enhances viral replication. Plant Cell 17:311–325
Tamura K (1992) Estimation of the number of nucleotide substitutions when there are strong transition-transversion and G+C content biases. Mol Biol Evol 9:678–687
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 30:2725–2729
Tewari R, Bailes E, Bunting KA, Coates JC (2010) Armadillo-repeat protein functions: questions for little creatures. Trends Cell Biol 20:470–481
Thomas J, Schaack S, Pritham EJ (2010) Pervasive horizontal transfer of rolling-circle transposons among animals. Genome Biol Evol 2:656–664
Xie Z, Johansen LK, Gustafson AM, Kasschau KD, Lellis AD, Zilberman D, Jacobsen SE, Carrington JC (2004) Genetic and functional diversification of small RNA pathways in plants. PLoS Biol 2:E104
Zhou X (2013) Advances in understanding begomovirus satellites. Ann Rev Phytopathol 51:357–381
Zhou X, Liu Y, Calvert L, Munoz C, Otim-Nape GW, Robinson DJ, Harrison BD (1997) Evidence that DNA-A of a geminivirus associated with severe cassava mosaic disease in Uganda has arisen by interspecific recombination. J Gen Virol 78:2101–2111
Zuker M (2003) Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res 31:3406–3415
