Genome

Genome and transcriptome analysis of the latent pathogen Lasiodiplodia theobromae, an emerging threat to the cacao industry

Journal: Genome

Manuscript ID gen-2019-0112.R1

Manuscript Type: Article

Date Submitted by the Author: 05-Sep-2019

Complete List of Authors:
Ali, Shahin; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Asman, Asman; Hasanuddin University, Department of Viticulture & Enology
Shao, Jonathan; USDA-ARS Northeast Area
Balidion, Johnny; University of the Philippines Los Banos
Strem, Mary; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Puig, Alina; USDA/ARS Miami, Subtropical Horticultural Research Station
Meinhardt, Lyndel; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Bailey, Bryan; Sustainable Perennial Crops Laboratory, United States Department of Agriculture

Keyword: Cocoa, Lasiodiplodia, genome, transcriptome, effectors

Is the invited manuscript for consideration in a Special Issue?: Not applicable (regular submission)

https://mc06.manuscriptcentral.com/genome-pubs Page 1 of 46 Genome

1 Genome and transcriptome analysis of the latent pathogen Lasiodiplodia

2 theobromae, an emerging threat to the cacao industry

3

4 Shahin S. Ali1,2, Asman Asman3, Jonathan Shao4, Johnny F. Balidion5, Mary D. Strem1, Alina S.

5 Puig6, Lyndel W. Meinhardt1 and Bryan A. Bailey1*

6

7 1Sustainable Perennial Crops Laboratory, USDA/ARS, Beltsville Agricultural Research Center-West,

8 Beltsville, MD 20705, USA.

9 2Department of Viticulture & Enology, University of California, Davis, CA 95616

10 3Department of Plant Pests and Diseases, Hasanuddin University, South Sulawesi, Indonesia.

11 4USDA/ARS, Northeast Area, Beltsville, MD 20705, USA.

12 5Institute of Weed Science, Entomology and Plant Pathology, University of the Philippines, Los Banos,

13 Laguna 4031, Philippines.

14 6Subtropical Horticultural Research Station, USDA/ARS, Miami, FL 33158, USA

15

16

17

18 *Corresponding author:

19 Phone: 1-301-504-7985; Fax: 1-301-504-1998

20 E-mail: [email protected]

21

22 Abstract

23 Lasiodiplodia theobromae (Ltheo), a member of the Botryosphaeriaceae family, is becoming a

24 significant threat to crops and woody plants in many parts of the world, including the major

25 cacao growing areas. While attempting to recover Ceratobasidium theobromae, causal agent of

26 vascular streak dieback (VSD), from symptomatic cacao stems, 74% of recovered fungi were

27 Lasiodiplodia spp. Sequence-based identification of 52 putative Lasiodiplodia isolates indicates

28 that diverse Lasiodiplodia species are associated with cacao in the studied areas, and the isolates

29 showed variation in aggressiveness when assayed using cacao leaf discs. The current study

30 reports on the 43.75 Mb de novo assembled genome of a Ltheo isolate from cacao. Ab initio gene

31 prediction has generated 13,061 protein-coding genes, of which 2,862 are unique to Ltheo, when

32 compared to other closely related Botryosphaeriaceae fungi. Transcriptome analysis revealed

33 that 11,860 predicted genes were transcriptionally active and 1,255 were more highly expressed

34 in planta compared to cultured mycelia. The predicted genes differentially expressed during

35 infection were mainly those involved in carbohydrate, pectin and lignin catabolism, cytochrome

36 P450s, necrosis-inducing proteins and putative effectors. These findings significantly expand our

37 knowledge of the Ltheo genome and Ltheo genes involved in virulence and pathogenicity.

38 Keywords: Cocoa, Lasiodiplodia, genome, transcriptome, effectors

39

40 Introduction

41 Lasiodiplodia theobromae (Pat.) Griffon & Maubl., (Ltheo), a member of the

42 Botryosphaeriaceae family, is often considered a latent plant pathogen attacking more than 500

43 plant species in the tropics and subtropics (Burgess et al. 2006; Slippers and Wingfield 2007).

44 The significance of disease caused by Ltheo appears to be increasing in many parts of the world,

45 perhaps in association with global climate change. Environmental factors like temperature and

46 drought are known to influence the interactions between Ltheo and their plant hosts (Paolinelli-

47 Alfonso et al. 2016; Yan et al. 2017; Songy et al. 2019). The effects of climate change on cacao, and

48 the tropics in general, are of increasing concern (Medina and Laliberte 2017). Theobroma cacao

49 L. (cacao), the source of chocolate, is the major source of income for six million farmers located

50 around the world in tropical climates (World Cocoa Foundation 2014). Most cacao farmers have

51 small plots of land and many suffer major income losses due to destructive diseases (Ploetz

52 2016). Though Ltheo was first reported to cause pod rot and dieback in cacao in 1923 (Nowell

53 1923), it was never considered as a major pathogen of cacao. However, Ltheo has been

54 suggested as a significant constraint for cacao production in some locations, and isolates from

55 symptomatic tissues can cause stem cankers and diebacks when artificially inoculated onto cacao

56 tissues (Mbenoun et al. 2008; Alvindia and Gallema 2017; del Castillo et al. 2016). Typical

57 symptoms on cacao caused by Ltheo can resemble those of other diseases and there is

58 speculation concerning associations of Ltheo with other cacao pathogens, such as canker caused

59 by Phytophthora species (Jaiyeola et al. 2014) and vascular streak dieback (VSD) caused by

60 Ceratobasidium theobromae (Alvindia and Gallema 2017; McMahon and Purwantara 2016).

61 Alvindia and Gallema (2017) reproduced many of the symptoms commonly associated with

62 VSD on cacao seedlings by inoculating the young leaves with Ltheo. Symptoms included leaf

63 chlorosis and necrotic blotches and leaf scar and stem vascular discoloration.

64 How Ltheo causes disease on such a wide range of hosts is a question of considerable

65 interest. To establish an infection, Ltheo and related members of the family Botryosphaeriaceae

66 must overcome both preformed and inducible host defenses (Yan et al. 2017), which can vary

67 significantly among hosts. Recently published draft genomes of woody plant pathogens in the

68 Botryosphaeriaceae have provided information about a range of potential virulence factors such

69 as effectors and cell wall modifying enzymes (Blanco-Ulate et al. 2013; Morales-Cruz et al.

70 2015; Paolinelli-Alfonso et al. 2016; van der Nest et al. 2014; Yan et al. 2017). These studies

71 reported the presence of expanded gene families associated with cell wall degradation,

72 membrane transport, nutrient uptake and secondary metabolism, which contribute to adaptations

73 for degrading grapevine tissue (Yan et al. 2017). Although the genome and transcriptome

74 analysis of Ltheo strains pathogenic to grapevine, along with other, mostly grapevine-associated,

75 Botryosphaeriaceae species, has provided a better understanding of Ltheo biology, the study of a

76 strain pathogenic on cacao would help our understanding of how widespread some basic aspects

77 of Ltheo biology are.

78 In this study, we describe the genome and transcriptome of Ltheo isolate AM2As, which

79 was isolated from a cacao stem showing symptoms of vascular streak dieback. The genome of

80 Ltheo isolate AM2As is compared to the genomes of closely related Botryosphaeriaceae

81 pathogens. Expressed genes within the Ltheo genome are characterized through RNA-Seq

82 analysis using infected cacao leaves, and potential effectors expressed during the infection

83 process are identified. These insights add to our understanding of the Ltheo genome and

84 transcriptome.

85 Materials and methods:

86 Isolation and maintenance of fungi

87 Cacao stems showing symptoms of vascular streak dieback (VSD) were collected from Wotu

88 District of South Sulawesi Province, Indonesia (Coordinate point S: 02°33'33.30" E:

89 120°47'51.74", Elevation 39 m) and Davao Region of the Philippines between 2014 and 2016.

90 Samples were shipped to the USDA-APHIS-PPQ facility at Beltsville, USA and transferred to

91 the USDA-ARS Sustainable Perennial Crops Laboratory in Beltsville after inspection. Bark was

92 removed and remaining stem material was cut into one cm long segments. Stem segments were

93 surface sterilized by submerging in 6% (v/v) bleach solution (Clorox, USA) for three minutes

94 followed by three rinses in sterile water. Stem segments were placed on 1.5% water agar (Difco

95 Laboratories, USA) in 100 mm diameter plastic Petri plates and incubated at 25 °C in the dark.

96 After 2-3 days, stem segments were examined for hyphal growth from the stem segment in

97 contact with the agar. Fungal mycelia were transferred from water agar to Corticium Culture

98 Media (Samuels et al. 2012) containing 100 µg/ml ampicillin. Note that the procedure to this point is

99 typically used when attempting to isolate C. theobromae, causal agent of vascular streak dieback

100 (VSD), from cacao tissues, and other organisms are commonly isolated from tissues showing

101 symptoms of VSD. Subsequently, fungal isolates were transferred to 20% clarified V8 (CV8)-

102 agar plates and maintained at room temperature.

103 Molecular identification of Lasiodiplodia spp.

104 DNA Extraction

105 For DNA extraction, 2-3 agar plugs (0.25 cm²) from 5-day-old cultures of each isolate were

106 transferred to 50 ml Falcon tubes containing 20 ml liquid CV8 and grown at room temperature

107 for 5 days, while shaking at 100 rpm. Mycelia were harvested and DNA was extracted as

108 previously described by Ali et al. (2016).

109 PCR amplification and DNA sequencing of ITS region and EF1α gene

110 For molecular identification and phylogenetic analysis, PCR amplification of the ITS region and

111 elongation factor 1-alpha (EF1α) of the template DNA from 52 Lasiodiplodia isolates was

112 performed using the primers ITS4 and ITS5 described by White et al. (1990) and EF1-688F and

113 EF1-1251R described by Alves et al. (2008), respectively. PCR amplification, product

114 purification and sequencing were performed as previously described by Ali et al. (2016).

115 Heterozygous or ambiguous sites were labelled using the IUPAC code and sequences were

116 exported for phylogenetic analysis.

117 Phylogenetic analysis

118 A phylogenetic analysis was undertaken to confirm the sequence-based molecular identification

119 and characterization of variation within the Lasiodiplodia isolates associated with cacao. For

120 better phylogenetic representation, both the ITS region and EF1α sequences were combined and

121 aligned using the ClustalW2 tool (Larkin et al. 2007) under default settings. A phylogenetic tree was

122 reconstructed using the Maximum Likelihood method based on the Poisson correction model and

123 a distance tree of 1000 bootstrapped data sets was generated by using MEGA v. 6 (Tamura et al.

124 2011).

125 Plant material and leaf disc bioassay

126 To identify a Lasiodiplodia isolate aggressive on cacao leaves, a leaf disc infection bioassay was

127 carried out using a subset of 14 Lasiodiplodia isolates representing the genetic diversity

128 identified. Stage 3 cacao leaves (light green but non-hardened) (Bailey et al. 2005) were

129 harvested from cacao trees of clone ICS1. Single leaf discs with 2.1 cm diameters were cut and

130 placed into 60 x 15 mm petri dishes lined with Whatman no. 2 filter paper soaked with sterile

131 distilled water. For inoculation, an agar plug (5 mm diameter) was placed off center beside the

132 midrib of the leaf disc. Controls were treated with water agar plugs. Petri plates were covered

133 and incubated at 25 °C under 12 h light (200 lx) and dark cycles. Observations were taken at

134 1-day intervals and the progression of the necrotic area was quantified as percentage of the area

135 covered. Necrosis progression was recorded separately for leaf blade (lamina), main vein

136 (midrib) and vein. Observations were taken up to 4 days after inoculation and the area under

137 disease progress curve (AUDPC) was calculated according to Shaner and Finney (1977). Each

138 experiment was repeated independently twice with three replicate leaf discs (one per plate) per

139 isolate per experiment. The homogeneity of data sets across replicate experiments was confirmed

140 by two-tailed Pearson correlation analyses conducted using mean data values within GraphPad

141 Prism version 7.0 (r ≥ 0.9; P ≤ 0.001). Therefore, data sets from the replicate experiments (a

142 total of 6 leaf discs per isolate) were pooled for the purposes of further analysis. The significance

143 of treatment effects was analyzed within GraphPad Prism version 7.0 by two-way ANOVA with

144 post-hoc pairwise uncorrected Fisher's Least Significant Difference (LSD) comparisons (P =

145 0.05).
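The AUDPC calculation referred to above is commonly implemented with the trapezoidal rule, AUDPC = Σ [(y_i + y_(i+1)) / 2] × (t_(i+1) − t_i), where y_i is the percentage of necrotic area at observation time t_i. The following is a minimal Python sketch of that rule, not the authors' own script; the function name `audpc` and the example readings are hypothetical and serve only as an illustration.

```python
from typing import Sequence


def audpc(times: Sequence[float], severity: Sequence[float]) -> float:
    """Trapezoidal area under the disease progress curve.

    times    -- observation times (e.g. days post-inoculation), increasing
    severity -- disease severity at each time (e.g. % necrotic area)
    """
    if len(times) != len(severity) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    area = 0.0
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        if dt <= 0:
            raise ValueError("observation times must be strictly increasing")
        # Mean severity over the interval multiplied by the interval length.
        area += (severity[i] + severity[i + 1]) / 2.0 * dt
    return area


# Hypothetical readings (% necrotic area) at 1, 2, 3 and 4 dpi; not data from this study.
days = [1, 2, 3, 4]
necrosis_pct = [0.0, 10.0, 40.0, 80.0]
example_audpc = audpc(days, necrosis_pct)
```

With daily observations the interval length is 1, so the value reduces to the sum of the mean severities of consecutive days.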

146 Isolation of RNA from mycelia and infected plant material

147 For RNA extraction from Ltheo isolate AM2As mycelia, 2-3 agar plugs from a V8 agar plate

148 culture were transferred to 250 ml conical flasks containing 50 ml liquid CV8. Liquid cultures

149 were grown 5 days at 25°C, shaking at 100 rpm. Mycelia were harvested by vacuum filtration

150 through miracloth (Calbiochem, San Diego, USA) and rinsed three times with sterile distilled

151 water followed by flash freezing in liquid nitrogen and freeze drying. Freeze-dried mycelia were

152 ground in a mortar and pestle in liquid nitrogen and transferred to a 50mL centrifuge tube

153 containing 15 mL of 65°C extraction buffer (Bailey et al. 2005). The remaining extraction

154 procedure was conducted as described by Bailey et al. (2013). Using a NanoDrop

155 spectrophotometer (Thermo Scientific, Wilmington, DE), RNA concentrations were determined

156 based on absorbance at 260 nm and purity was estimated by the 260/280 and the 260/230 ratios.

157 For RNA extraction from infected leaf discs, the leaf disc bioassay was conducted

158 essentially as described above, with the following exceptions. Leaf discs (55 mm diameter) cut

159 out from stage 2 cacao leaves with the midrib in the center were used for the assay. Agar plugs

160 carrying mycelia of isolate AM2As or sterile V8 agar (controls) were placed off center on both

161 sides (two agar plugs per leaf disc) of the midrib of the leaf disc. Leaf discs were harvested 48 h

162 after inoculation, agar plugs were removed, and leaf discs were flash frozen in liquid nitrogen. Samples were

163 ground with mortar and pestle in liquid nitrogen and RNA extracted as described above.

164 Genome sequencing and assembly

165 Ltheo isolate AM2As genomic DNA was sequenced using Illumina paired-end short-read

166 technology (library preparation and sequencing done by Beijing Genome Institute, Shenzhen,

167 China). The DNA sample was sheared into small fragments of the desired size using a Covaris S/E210.

168 The overhangs resulting from fragmentation were converted into blunt ends by using T4 DNA

169 polymerase, Klenow fragment and T4 polynucleotide kinase. After adding an “A” base to the 3'

170 end of the blunt phosphorylated DNA fragments, adapters were ligated to the ends of the DNA

171 fragments. The desired fragments were purified through gel electrophoresis and selectively

172 enriched and amplified by PCR to construct a library with a 500 bp insert size. Sequencing was

173 performed using the Illumina X-ten platform. For assembly, 347,857,726 short reads (100 bp)

174 were trimmed using BBMap version 37.58 (Bushnell 2014) and 173,386,781 paired-end reads

175 were assembled using SPAdes Genome Assembler version 3.11.0 (Bankevich et al. 2012) in read

176 error correction and assembly mode. The k-mer sizes were set to 21, 33 and 55.

177 Ab initio gene prediction

178 The ab initio gene prediction was performed from the assembly results using AUGUSTUS

179 version 2.7 (Stanke et al. 2004), trained with Diplodia corticola gene models (GenBank:

180 MNUE01000000). The predicted proteins were compared against NCBI non-redundant (NR)

181 protein databases by BLASTp to identify biological functions (Altschul et al. 1997). Open

182 reading frames were also annotated using Blast2GO (http://www.blast2go.com/b2ghome)

183 (Conesa et al. 2005) and the KEGG database of metabolic pathways (Moriya et al. 2007).

184 Identification of core eukaryotic genes

185 To assess transcriptome completeness of assemblies and predicted genes, the Benchmarking

186 Universal Single-Copy Orthologs (BUSCO) (Simão et al. 2015) strategy was used. BUSCO

187 assembly was run using the eukaryote profile under default settings.

188 Determining secretomes and effectors

189 Ltheo protein coding sequences were scanned for possible signal peptides using SignalP, version

190 3.0 (Petersen et al. 2011). The amino acid sequences containing predicted signal peptides were

191 scanned for transmembrane proteins using the TMHMM program (prediction of transmembrane

192 helices in proteins) (Sonnhammer et al. 1998). Proteins with no more than one transmembrane

193 domain were considered potential components of the secretome. Fungal effectors among

194 the secretome were identified using the machine learning program EffectorP 2.0 (Sperschneider et al.

195 2018a).
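The secretome filter described above (a predicted signal peptide and no more than one transmembrane domain) can be sketched as a simple selection step. The Python example below is a hedged illustration only: it assumes that the SignalP and TMHMM results have already been parsed into records, and the `ProteinCall` class, its field names and the protein identifiers are hypothetical, not part of either program's output format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ProteinCall:
    """Per-protein predictions, assumed to be parsed from SignalP and TMHMM results."""
    protein_id: str
    has_signal_peptide: bool  # signal peptide predicted
    tm_helices: int           # number of predicted transmembrane helices


def candidate_secretome(calls: Iterable[ProteinCall], max_tm: int = 1) -> List[str]:
    """Keep proteins with a signal peptide and at most `max_tm` transmembrane helices."""
    return [
        c.protein_id
        for c in calls
        if c.has_signal_peptide and c.tm_helices <= max_tm
    ]


# Hypothetical records for illustration; identifiers are not from the Ltheo annotation.
calls = [
    ProteinCall("prot_A", True, 0),   # kept: signal peptide, no TM helix
    ProteinCall("prot_B", True, 1),   # kept: one TM helix is tolerated
    ProteinCall("prot_C", True, 4),   # dropped: likely membrane protein
    ProteinCall("prot_D", False, 0),  # dropped: no signal peptide
]
secretome_ids = candidate_secretome(calls)
```

The resulting identifiers would then be passed to an effector predictor such as EffectorP, which is outside the scope of this sketch.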

196 Determining pathogenicity related proteins

197 To determine the pathogenicity related proteins, Ltheo predicted proteins were compared with

198 the Pathogen-Host Interaction database (PHI-base) (Winnenburg et al. 2006). The predicted

199 protein sequences were used in a local BlastStation2 software (TM Software, Arcadia, CA)

200 analysis in the PHI-base (version 4.4). Proteins with E < 10⁻¹⁰ and >40% sequence identity for

201 BLASTp were considered as homologs.
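The homolog criterion above (E-value below 10⁻¹⁰ and sequence identity above 40%) can be applied to BLASTp results with a short filter. The sketch below assumes, purely for illustration, that the hits are available in the standard 12-column tabular BLAST format (query, subject, percent identity, ..., E-value, bit score); the source does not state which output format was used, and the function name and example rows are hypothetical. The same filter would apply to the CAZyme search, which uses identical thresholds.

```python
from typing import Iterable, Iterator, Tuple


def filter_hits(
    lines: Iterable[str],
    max_evalue: float = 1e-10,
    min_identity: float = 40.0,
) -> Iterator[Tuple[str, str]]:
    """Yield (query, subject) pairs passing the E-value and identity thresholds.

    Assumes tab-separated BLAST tabular output with the default 12 columns:
    column 3 is percent identity and column 11 is the E-value.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and identity > min_identity:
            yield fields[0], fields[1]


# Hypothetical rows for illustration; identifiers are not real PHI-base entries.
rows = [
    "gene_0001\tPHI_x1\t62.5\t300\t100\t2\t1\t300\t5\t304\t1e-50\t250",
    "gene_0002\tPHI_x2\t35.0\t200\t120\t4\t1\t200\t1\t200\t1e-20\t90",  # identity too low
    "gene_0003\tPHI_x3\t80.0\t50\t10\t0\t1\t50\t1\t50\t1e-05\t40",      # E-value too high
]
homologs = list(filter_hits(rows))
```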

202 Identification of genes encoding cell wall degrading enzymes

203 To identify Ltheo genes encoding carbohydrate-active enzymes related to cell wall and other

204 organic matter, Ltheo predicted genes were analyzed using the BLASTp program against the

205 Carbohydrate-Active enzymes database (CAZymes) at a threshold value of E < 10⁻¹⁰. Proteins

206 possessing a sequence identity of more than 40% with biochemically characterized CAZymes were

207 considered candidates.

208 Transcriptome sequencing

209 RNA-Seq analysis of three replicate RNA samples from mycelia and infected plant material

210 was carried out by the National Center for Genome Resources (Santa Fe, NM, USA). cDNA was

211 generated using the RNA library preparation TruSeq protocol developed by Illumina

212 Technologies (San Diego, CA). Using the kit, mRNA was first isolated from total RNA by

213 performing a polyA selection step, followed by construction of paired-end sequencing libraries

214 with an insert size of 160 bp. Paired-end sequencing was performed using the Illumina

215 HiSeq2000 platform. Samples were multiplexed with unique six-mer barcodes generating

216 filtered (for Illumina adapters/primers and PhiX contamination) 2x50bp reads. The sequences

217 acquired by RNA-Seq were verified by comparison to the genomes assembled in this study.

218 RNA reads from RNA-Seq libraries ranging from 42 to 78 million reads in fastq format were

219 trimmed using BBDuk version 37.58 (Bushnell 2014), using adapters.fa with parameters

220 ktrim=r, k=23, mink=11, hdist=1, tpe, tbo. Trimmed reads were aligned using HISAT2 2.1.0

221 (Pertea et al. 2016) to the coding sequences (CDS) of the Ltheo genome. Tabulated raw counts

222 of reads to each CDS were obtained from the HISAT2 alignment. Estimation and statistical

223 analysis of expression level using the count data of each gene with three replicates for each

224 library were performed using the DESeq2 package in the R statistics suite (Anders and Huber

225 2010). For DESeq2's default normalization method, scaling factors are calculated for each lane

226 as the median of the ratio, for each gene, of its read count to its geometric mean across all lanes, and

227 applied to all read counts.
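The median-of-ratios normalization summarized above can be illustrated with a small standalone calculation. The Python sketch below is not the DESeq2 implementation (DESeq2 is an R package); it only reproduces the idea under simple assumptions: genes with a zero count in any sample are skipped when computing the geometric mean, and the function names and toy count matrix are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def size_factors(counts: Sequence[Sequence[float]]) -> List[float]:
    """Median-of-ratios scaling factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    ratios: List[List[float]] = [[] for _ in range(n_samples)]
    for gene_counts in counts:
        if any(c <= 0 for c in gene_counts):
            continue  # geometric mean would be zero, so the gene is not informative
        # Geometric mean of the gene across samples, computed in log space.
        geomean = math.exp(sum(math.log(c) for c in gene_counts) / n_samples)
        for s, c in enumerate(gene_counts):
            ratios[s].append(c / geomean)
    if not ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [median(r) for r in ratios]


def normalize(counts: Sequence[Sequence[float]], factors: Sequence[float]) -> List[List[float]]:
    """Divide each raw count by the scaling factor of its sample."""
    return [[c / f for c, f in zip(gene_counts, factors)] for gene_counts in counts]


# Toy matrix (3 genes x 2 samples); sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [[10, 20], [100, 200], [5, 10]]
toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts, toy_factors)
```

In this toy case the second scaling factor is twice the first, so the normalized counts of each gene agree between the two samples.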

228 Data availability

229 The DNA sequence data of ITS 1 and 2 regions of the 52 Lasiodiplodia spp. have been deposited

230 at GenBank under the accessions MH412939 to MH412990. The complete nucleotide sequence

231 assemblies and the Whole Genome Assembly of the Ltheo isolate AM2As have been deposited at

232 GenBank under the accession QCYV00000000, under BioProject PRJNA388190. The combined

233 transcriptome assembly from multiple tissues has been uploaded as supplementary Excel files

234 available with the article through the journal Web site.

235 Results and Discussion:

236 Collection and identification of fungi

237 VSD, caused by the near-fastidious basidiomycete C. theobromae, is a serious threat to the cacao

238 industry in South East Asia (McMahon and Purwantara 2016). In an attempt to isolate the VSD

239 pathogen, we initially collected stems showing symptoms of VSD from Indonesia. Using water

240 agar medium, 47 fungal isolates were obtained from stems collected in Indonesia and showing

241 symptoms of VSD, 74% of which were identified as Lasiodiplodia spp., and 15% as Diaporthe

242 sp., the remainder being Fusarium spp., based on BLASTn search of the ITS rDNA sequence

243 (gen-2019-0070.R3Suppla Figure S1).

244 We also collected cacao stem samples with VSD symptoms from the Philippines and

245 isolated fungal cultures as described above. Fifteen fungal isolates from the Philippines, along

246 with 2 isolates from Miami, Florida were validated as Lasiodiplodia spp. based on BLASTn

247 search of their ITS rDNA sequences. Alvindia and Gallema (2017) reported on Ltheo causing

248 many of the symptoms associated with VSD in the Philippines based on the isolation of the

249 organism from symptomatic material. It is noted here that C. theobromae, being near fastidious,

250 is difficult to isolate from infected tissues (Samuels et al. 2012), so the isolation of other

251 organisms from VSD-symptomatic tissues is not unexpected and does not preclude the original

252 tissue symptoms being caused by C. theobromae. Regardless, the isolation of multiple isolates

253 of several Lasiodiplodia species, some of which we demonstrate have potential to cause disease

254 symptoms on cacao mimicking, in part, some of the symptoms of VSD, cannot be ignored.

255 Whether acting as a primary, latent or opportunistic pathogen, it is important to

256 understand the molecular diversity, and pathogenic makeup of the organism. Alignment results

257 of the ITS rDNA sequences obtained here of these isolates showed 97-100% similarity with the

258 ITS rDNA sequences of various Lasiodiplodia sp. reported in the NCBI database (gen-2019-

259 0070.R3Supplb). The 50 Lasiodiplodia isolates identified herein from South East Asia, along

260 with two isolates obtained from cacao in Miami, Florida were subjected to phylogenetic study to

261 assess the diversity of Lasiodiplodia species associated with cacao in the areas studied herein.

262 The parts of rDNA sequenced in this investigation include the entire ITS1 and ITS2 regions and

263 the 5.8S rRNA gene and part of the EF1α gene sequence. As the intraspecific variation of the

264 ITS rDNA is usually low in Lasiodiplodia (Alves et al. 2008), the EF1α gene sequence was

265 combined with the ITS rDNA sequence from the 52 Lasiodiplodia isolates acquired herein, and

266 other Lasiodiplodia spp. and related Botryosphaeriaceae Diplodia corticola and

267 lutea obtained from GenBank. The maximum likelihood tree grouped the 52

268 isolates on distinct branches in accordance with these species (Figure 1). The Lasiodiplodia clade

269 was resolved into five sub-clades corresponding to L. gonubiensis, L. crassispora, L.

270 rubropurpurea, L. venezuelensis and L. theobromae (Alves et al. 2008). All isolates (except

271 AM54B, AM52 and AM54B2) fall into the L. theobromae sub-clade. Thirty-three isolates can

272 be confirmed as Ltheo based on their sequence similarity to known Ltheo isolates and their

273 phylogenetic position with almost no intraspecific diversity, except isolate AM50A (Figure 1).

274 The Ltheo group includes multiple isolates from Indonesia and the Philippines. Another set of 16

275 isolates separate from the Ltheo isolates into multiple diverse groups with 50%

276 confidence limit when bootstrap analysis was performed with 1000 replicates (Figure 1). This set

277 of 16 diverse isolates may represent cryptic species. Although the BLASTn search against the

278 NCBI database indicates that the isolates belonging to these diverse groups are most likely L.

279 pseudotheobromae, a few isolates showed sequence similarity with other Lasiodiplodia species

280 (gen-2019-0070.R3Supplb). Because, in many instances, Lasiodiplodia isolates with the exact same

281 sequence have been designated as different Lasiodiplodia spp. in the NCBI database, we would

282 need to sequence more marker genes to clearly identify these species. The third group of 3

283 isolates (AM54B, AM52 and AM54B2) was distinct, being close to L. rubropurpurea with 99%

284 sequence similarity (Figure 1 and gen-2019-0070.R3Supplb). L. rubropurpurea has been

285 previously reported from eucalyptus trees in Australia (Van der Linde et al. 2011; Burgess et al.

286 2006), this being the first report of that species outside Australia. Though further in-depth study

287 would be needed to identify the exact species for many of these isolates, it is clear that cacao

288 from these two areas of Indonesia and the Philippines harbors multiple species of Lasiodiplodia.

289 That multiple species interact with cacao in the field brings into focus

290 the need for a better understanding of their potential pathogenicity on cacao. Therefore, we

291 selected 14 isolates representing the three groups to conduct bioassays and test their

292 aggressiveness on cacao.

293 Leaf disc bioassay

294 All 14 Lasiodiplodia isolates tested started to show water-soaked lesions on the inoculated

295 leaf disc at 1 day post-inoculation (dpi) and necrosis started at 2 dpi (gen-2019-0070.R3Suppla

296 Figure S2). AUDPC was calculated using the area under necrosis (% of area) measures for the

297 three primary tissues of the leaf (main vein, auxiliary veins, and leaf blade) at 24 h intervals over

298 4 days. Though the rate of necrosis progression varied between the three tissues for some

299 isolates, the isolates showed similar trends across tissues (gen-2019-0070.R3Suppla Figure S3).

300 Focusing on the leaf blade, there is variation in the rate of spread of necrosis among the isolates

301 tested (P < 0.05) (Figure 2). Among the 6 Ltheo isolates tested (AM50B, 21A, 29A, 2As, 19B

302 and 50A), AM2As and AM50A were most aggressive. Among the other 6 Lasiodiplodia sp.

303 isolates (AM27C, AM27B, AM36A, AM19A, AM25B and AM54A), AM27B was the most

304 aggressive among all the isolates tested herein, while AM27C and AM54A were the least

305 aggressive. The 2 putative L. rubropurpurea isolates (AM54B and AM52) tested had limited

306 aggressiveness in this assay (Figure 2). Interspecies variation in pathogenicity has been observed

307 among the Botryosphaeriaceae (Urbez-Torres and Gubler 2009). Here we see variation in

308 aggressiveness for Lasiodiplodia spp. The presence of multiple Lasiodiplodia spp., coupled with

309 variation in aggressiveness, and a common appearance of symptoms at the later stages of

310 infection, is likely to hinder the development of disease management strategies in cacao. We

311 selected Ltheo isolate AM2As for further study, it being aggressive and a member of the most

312 common Lasiodiplodia species recovered. We report here the draft genome sequence of Ltheo

313 isolate AM2As to identify and better understand its possible virulence determinants, an

314 understanding that may prove critical to disease management.

315 Genome Assembly and Annotation

316 Ltheo isolate AM2As was subjected to whole-genome shotgun sequencing using Illumina

317 technology (Table 1). Around 347 million Illumina reads were fed into the SPAdes Genome

318 Assembler, which resulted in a 43.75 Mb genome sequence with approximately 91X coverage. The

319 assembly consisted of 833 contigs with an N50 length of 0.87 Mb. The overall GC content of the

320 Ltheo genome is 54.73%. For quantitative assessment of genome completeness, BUSCO analysis

321 was conducted, which indicated that Ltheo contains 99.23% of examined loci (95.2% complete genes

322 and 3.95% fragmented genes). Another recently published Ltheo isolate, CSS-01s, has a very

323 similar genome size of 43.7 Mb and average GC content of 54.8% (Yan et al. 2017). Compared

324 to other pathogens within the Botryosphaeriaceae family, the estimated genome size of Ltheo

325 was larger than that of Diplodia corticola (34.9 Mb), but similar to Neofusicoccum parvum

326 (42.59 Mb) (Blanco-Ulate et al. 2013) and Macrophomina phaseolina (48.8 Mb) (Islam et al.

327 2012).
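The N50 statistic reported above is the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly size. The following Python sketch shows that definition on a toy list of contig lengths; the function name and the values are hypothetical and are not taken from the AM2As assembly.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached in practice; kept for completeness


# Toy contig lengths (total 300): 80 + 70 = 150 reaches half of the total.
toy_n50 = n50([10, 20, 30, 40, 50, 70, 80])
```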

328 Ab initio gene prediction using AUGUSTUS generated 13,061 protein-coding genes in

329 the Ltheo genome with an average sequence length of 1,639.5 bp. The gene density calculated was

330 0.489, meaning that the coding regions of these predicted genes cover 48.9% of the whole

331 genome. Using the same ab initio gene prediction strategy, Blanco-Ulate et al. (2013)

332 reported 10,366 protein coding genes in N. parvum and Islam et al. (2012) predicted 12,231

333 protein coding genes in M. phaseolina. Using a slightly different approach, Yan et al.

334 (2017) predicted 12,902 protein coding genes in Ltheo isolate CSS-01s. As

335 Botryosphaeriaceae fungi carry limited numbers of repetitive elements (around 3.4%) in their

336 genomes (Yan et al. 2017), the high similarity in genetic structure of these organisms was

337 expected. Functional annotation of the 13,061 predicted genes showed that 7,980 (61%) could be

338 assigned with GO terms and 4,035 (30.8%) predicted genes could be mapped to the KEGG

339 pathway database (Table 1 and gen-2019-0070.R3Supplc). Among the 13,061 predicted Ltheo

340 proteins, 1,372 were further predicted to contain signal peptides. Finally, scanning for

341 transmembrane proteins using the TMHMM program, 1,202 proteins with no more than one

342 transmembrane domain were considered components of the secretome (gen-2019-

343 0070.R3Supplc). A total of 1,279 predicted Ltheo proteins showed homology with the proteins

344 included in the PHI-base. Among these, 150 predicted proteins are related to loss of pathogenicity,

345 533 are related to reduced virulence, 20 are related to increased virulence, while 71 predicted

346 proteins are related to lethality when their functions are disrupted (gen-2019-0070.R3Supplc).

347 Similarly, 1,120 PHI-base hits were identified in D. seriata, while 1,384 were identified in N.

348 parvum (Morales-Cruz et al. 2015). To understand the potential Ltheo genes involved in organic

349 matter degradation we identified CAZymes in the transcriptome. A total of 606 CAZymes

350 mapping to 718 predicted Ltheo proteins were identified (gen-2019-0070.R3Supplc). Although a

351 direct comparison was not possible, as Yan et al. (2017) have not released the assembled

352 CDS/protein sequence data for the published genome (GenBank accession no. MDYX01000000),

353 these numbers are quite similar to those reported for the Ltheo isolate CSS-01s (Yan et al. 2017).
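
The gene density and annotation percentages reported above follow from simple arithmetic. The short Python sketch below reproduces them from the counts given in this section. The genome size is not restated in this paragraph, so it is back-calculated here from the reported density and should be read as an implied value, not a measured one; the function and variable names are illustrative only.

```python
# Counts reported in this section for Ltheo AM2As.
N_GENES = 13_061          # predicted protein-coding genes
MEAN_GENE_LEN_BP = 1_639.5
GENE_DENSITY = 0.489      # fraction of the genome covered by predicted genes


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Total length of predicted coding regions (roughly 21.4 Mb).
total_coding_bp = N_GENES * MEAN_GENE_LEN_BP

# Genome size implied by the reported density (roughly 43.8 Mb);
# this is a derived figure, not a value taken from the manuscript.
implied_genome_bp = total_coding_bp / GENE_DENSITY

go_pct = percent(7_980, N_GENES)          # genes assigned GO terms
secretome_pct = percent(1_202, N_GENES)   # proteins in the predicted secretome

print(f"coding bp: {total_coding_bp:,.1f}")
print(f"implied genome size: {implied_genome_bp / 1e6:.2f} Mb")
print(f"GO-annotated: {go_pct:.1f}%  secretome: {secretome_pct:.1f}%")
```

Running the sketch with other published gene counts and mean lengths gives a quick consistency check when comparing assemblies.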

354 Comparisons to other fungi

355 Comparative genomics reveals information on genetic variation and evolutionary dynamics

356 between species and their specific adaptations. Among the published Botryosphaeriaceae

357 genomes, Ltheo AM2As has the second largest genome after M. phaseolina (48.88 Mb) (Table

358 2). To identify predicted genes that are exclusive to a genome or to a group, bidirectional

359 BLAST analyses were conducted with Ltheo, Diplodia corticola, Neofusicoccum parvum and

360 Botrytis cinerea, enabling identification of similarities such as pathogenicity genes, and family

361 specific genes. D. corticola and N. parvum were selected because they are Botryosphaeriaceae

362 and exhibit similar environmental adaptations, whereas B. cinerea is a highly pathogenic

363 non-Botryosphaeriaceae species. The number of predicted species-specific genes is 2,862 for Ltheo (LT),

364 1,086 for D. corticola (DC), 1,269 for N. parvum (NP) and 10,798 for B. cinerea (BC) (Figure

365 3). Of the 2,862 Ltheo-specific predicted genes, 1,661 have putative functions and 1,619 were

366 detected as transcriptionally active by RNA-Seq analysis (with ≥5 normalized reads, either in

367 mycelia or in planta) (gen-2019-0070.R3Suppld). Based on the annotation, several of these

368 predicted genes appear to have a role in plant defense modification, cell adhesion, cell wall

369 degradation, pectin degradation, oxidoreductase activity and membrane transport (gen-2019-

370 0070.R3Suppld). Ltheo has been reported to be the most virulent pathogen of grapevine among

371 the Botryosphaeriaceae family (Úrbez-Torres and Gubler 2009). The species-specific genes

372 identified here likely support that virulence. We could not include the transcriptome of the other

373 published Ltheo genome of CSS-01s (Yan et al. 2017) in this comparison as the assembled

374 transcriptome has not been made available. A BLASTn search of the Ltheo isolate AM2As

375 against the CSS-01s contigs showed that 9,410 genes are ≥99% similar. Among the 383 Ltheo

376 AM2As genes which are ≤90% similar to the Ltheo CSS-01s genome, 297 genes are Ltheo-

377 specific genes (gen-2019-0070.R3Supple). This indicates that, though the two isolates have very

378 similar genomes, there is a set of differentiating genes and those genes are mostly unique to

379 Ltheo compared to other related Botryosphaeriaceae fungi.
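
The bidirectional BLAST comparison described above can be sketched as a reciprocal-best-hit search. The Python example below is a minimal illustration under stated assumptions: hits are supplied as (query, subject, bit score) tuples, a gene is treated as "species-specific" when it has no reciprocal best hit in any compared genome, and all gene identifiers are hypothetical. The manuscript does not give the exact criteria or cutoffs used, so this should not be read as the authors' pipeline.

```python
from typing import Dict, Iterable, List, Set, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


def genes_without_partner(all_genes: Iterable[str],
                          rbh_sets: List[Set[Tuple[str, str]]]) -> Set[str]:
    """Genes with no reciprocal best hit in any compared genome."""
    matched = {a for pairs in rbh_sets for a, _ in pairs}
    return set(all_genes) - matched


# Toy data with hypothetical identifiers (LT = Ltheo, DC = D. corticola).
lt_vs_dc = [("LT1", "DC1", 500.0), ("LT1", "DC2", 200.0), ("LT2", "DC2", 90.0)]
dc_vs_lt = [("DC1", "LT1", 480.0), ("DC2", "LT3", 300.0)]

pairs = reciprocal_best_hits(lt_vs_dc, dc_vs_lt)
specific = genes_without_partner(["LT1", "LT2", "LT3"], [pairs])
print(sorted(pairs), sorted(specific))
```

In practice one reciprocal-best-hit set would be computed per compared genome and all of them passed to `genes_without_partner`.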

380 Differential transcriptome analysis during cacao infection

381 To validate the expression of the Ltheo predicted genes, RNA-Seq was performed on RNA from

382 culture-grown mycelia and from infected cacao leaves at 48 h post infection. The RNA-Seq

383 analysis identified 11,698 and 10,310 transcripts (with ≥5 raw reads) for the mycelia and in planta

384 samples, respectively. Combining these data, 11,860 predicted genes could be detected in RNA-

385 Seq data from mycelia and infected leaves, leaving only 1,201 predicted genes without

386 transcripts. Islam et al. (2012) reported that only 9,934 predicted genes were transcriptionally

387 active in the M. phaseolina genome with 13,806 predicted genes. Among the transcribed genes,

388 1,255 were preferentially expressed in planta (log2 fold change >2 and Padj <0.05) compared to mycelia.

389 On the other hand, 1,753 transcribed genes were preferentially expressed in culture-grown

390 mycelia (log2 fold change >2 and Padj <0.05) compared to in planta (gen-2019-0070.R3Supplc). KEGG

391 pathway analysis of the differentially expressed genes showed that in planta, although pathways

392 such as starch and sucrose metabolism, pentose and glucuronate interconversions,

393 glycolysis/gluconeogenesis, fatty acid degradation and biosynthesis of unsaturated fatty acids

394 showed some perturbation (gen-2019-0070.R3Supplf), changes to major metabolic pathways

395 were limited in general. This resiliency despite such a significant change in external conditions

396 may partially explain the plasticity of Ltheo, allowing adaptation to its broad host range.
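
The thresholds used above to call genes preferentially expressed can be written as a small classification rule. The Python sketch below assumes that ">2 Log2" refers to the log2 fold change of in planta over mycelia, that the same magnitude is applied in the opposite direction for mycelia, and that genes without an adjusted P value are left unclassified. The gene identifiers and values are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneResult:
    gene_id: str
    log2_fold_change: float   # in planta vs mycelia; positive = higher in planta
    padj: Optional[float]     # None when no adjusted P value is available


def classify(gene: GeneResult,
             lfc_cutoff: float = 2.0,
             padj_cutoff: float = 0.05) -> str:
    """Label a gene as 'in_planta', 'mycelia' or 'not_significant'."""
    if gene.padj is None or gene.padj >= padj_cutoff:
        return "not_significant"
    if gene.log2_fold_change > lfc_cutoff:
        return "in_planta"
    if gene.log2_fold_change < -lfc_cutoff:
        return "mycelia"
    return "not_significant"


# Hypothetical example values.
results = [
    GeneResult("geneA", 3.4, 0.001),    # strongly induced in planta
    GeneResult("geneB", -2.7, 0.010),   # higher in mycelia
    GeneResult("geneC", 2.5, 0.200),    # large change, not significant
    GeneResult("geneD", 0.8, 0.001),    # significant, small change
    GeneResult("geneE", 5.0, None),     # no adjusted P value
]
labels = {g.gene_id: classify(g) for g in results}
print(labels)

# Genes without detected transcripts, from the counts reported above.
undetected = 13_061 - 11_860
print(undetected)
```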

397 Yan et al. (2017) identified a total of 285 up-regulated genes and 243 down-regulated

398 genes during the early infection stages of Ltheo in grape vine. Yan et al. (2017) also reported that

399 the up-regulated genes were largely secreted, facilitating the degradation of cell walls. Genome

400 comparison of Botryosphaeriaceae species with opportunistic, pathogenic and non-pathogenic

401 fungi revealed that they are more closely related to opportunistic fungi than the other two (Yan et

402 al. 2017). Endophytic fungi like Trichoderma spp. on cacao also utilize genes encoding enzymes

403 targeting digestion and modification of the host's cell wall (Bailey et al. 2006). That these

404 enzymes would be of importance in establishing associations between fungi and plants seems

405 logical since they likely function in acquisition of nutrients from the surrounding environment,

406 whatever the outcome of the interaction might be. Ltheo has been described as a primary

407 pathogen (Machado et al. 2014), opportunistic pathogen (Mullen et al. 1991), or endophyte

408 (Rubini et al. 2005) depending on the specific interaction and situation of isolation. Considering

409 the complexity of Ltheo interactions with cacao, a closer look at how Ltheo interacts with plants through

410 its transmembrane and secreted proteins is warranted.

411 Membrane proteins

412 Membrane transporters play a vital role in fungal pathogenesis and protection against

413 host defense mechanisms (Perlin et al. 2014). Plant pathogenic fungi depend heavily on their

414 ability to exploit host-derived resources like carbohydrates or peptides, for which they need the

415 transporters to facilitate the uptake of degraded cellular components. During the infection

416 process, fungi also constantly need to get rid of any phytotoxins or xenobiotics that would

417 otherwise hinder their success, a process that can also rely on membrane transporters. Based

418 on the screening result for transmembrane proteins using the TMHMM program and BLASTp

419 search against NCBI non-redundant (NR) protein databases, 827 Ltheo proteins are predicted to

420 be membrane bound transporter proteins, with MFS transporters being the largest group (Table

421 3). Ltheo encodes the highest number of membrane transporters among all the sequenced

422 Botryosphaeriaceae species (Yan et al. 2017). More than 90% of these transporter proteins were

423 transcriptionally active either in mycelia or in planta, with only 93 being more highly expressed

424 in planta and 159 being more highly expressed in mycelia (Table 3). The sugar transporters were

425 the exception in this trend with 20 being more highly expressed in planta compared to just 9

426 being more highly expressed in mycelia. Similarly, more genes encoding non-transporter

427 membrane bound proteins are more highly expressed in mycelia (162) compared to those more

428 highly expressed (78) in planta (Table 3). These differences are likely indicators of the

429 significant necrotrophic ability of Ltheo as the mycelia were grown in a complex plant-based

430 medium, V8. Membrane protein composition depends on available nutrient sources. For the

431 necrotrophic pathogen B. cinerea, it was observed that, when grown in the presence of

432 cell walls as a sole carbon source, membrane protein production was directed toward cell wall

433 degrading enzymes and proteins involved in toxic resistance (Liñeiro et al. 2016). On the other

434 hand, when B. cinerea was grown in the presence of glucose as a sole carbon source, the changes in

435 membrane proteins were related to signaling processes, protein biosynthesis and modification

436 processes in the endoplasmic reticulum and vesicle-mediated transport (Liñeiro et al. 2016). The in

437 planta differential expression of the Ltheo transcriptome identified here, though narrower than

438 that observed in V8 grown mycelia, likely focused on specific substrates and secondary

439 metabolites released during the infection process on cacao.

440 Ltheo secretome

441 Proteins secreted by plant pathogenic fungi have the potential to interact with and alter host cells

442 and therefore, their identification and characterization is essential to understanding virulence and

443 the mechanism of infection. As predicted by SignalP-4.1, 1,202 proteins with no more than one

444 transmembrane domain were considered potential components of the Ltheo secretome,

445 accounting for 9.2% of its proteome. Within the secretome, 1,020 proteins were found to be

446 transcriptionally active and 397 were more highly expressed during cacao leaf infection (gen-

447 2019-0070.R3Supplg). In comparison, Yan et al. (2017) reported 937 secreted proteins in

448 another Ltheo isolate from grape vine and found that 105 were up-regulated during infection.

449 More than 50% of the Ltheo secretome proteins identified herein are secreted into the apoplast (gen-

450 2019-0070.R3Supplg) as predicted by ApoplastP 1.0 (Sperschneider et al. 2018b).
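
The secretome definition used here (a predicted signal peptide plus no more than one transmembrane domain) amounts to a two-condition filter. The Python sketch below assumes the per-protein predictions have already been parsed into simple records; it does not call SignalP or TMHMM, and the record fields and protein identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinCall:
    protein_id: str
    has_signal_peptide: bool   # parsed from a signal-peptide predictor
    tm_domains: int            # parsed from a transmembrane-helix predictor


def in_secretome(protein: ProteinCall, max_tm_domains: int = 1) -> bool:
    """True when a signal peptide is present and TM domains do not exceed the limit."""
    return protein.has_signal_peptide and protein.tm_domains <= max_tm_domains


def secretome(proteins: List[ProteinCall]) -> List[str]:
    """Return identifiers of proteins that pass the secretome filter."""
    return [p.protein_id for p in proteins if in_secretome(p)]


# Hypothetical example records.
calls = [
    ProteinCall("protA", True, 0),    # secreted candidate
    ProteinCall("protB", True, 1),    # one TM domain is still allowed
    ProteinCall("protC", True, 7),    # likely an integral membrane protein
    ProteinCall("protD", False, 0),   # no signal peptide
]
secreted_ids = secretome(calls)
print(secreted_ids)
```

Allowing a single transmembrane domain is a common choice because the hydrophobic core of a signal peptide can itself be called as a helix by transmembrane predictors.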

451 Secreted effectors

452 Fungi and oomycetes secrete effectors that suppress the pathogen-associated molecular pattern

453 (PAMP)-triggered plant immunity. However, we have very limited knowledge of Botryosphaeriaceae

454 effectors that contribute to virulence. Yan et al. (2017) made an initial prediction of Ltheo effectors

455 based on protein sequence length and number of cysteine residues, and identified 359 putative

456 effectors in Ltheo isolate CSS-01s. Using the same approach, 384 putative effectors were identified

457 in the transcriptome of AM2As. We further used the machine learning program EffectorP 2.0

458 (Sperschneider et al. 2018a) to predict 115 effector proteins among the secretome (gen-2019-

459 0070.R3Supplg). Among these 115 effectors, 85 are apoplastic while 66 are transcriptionally

460 active in mycelia or infected plant tissue (48 hpi) (gen-2019-0070.R3Supplg). Evolutionary

461 relationships among the effectors indicate that they are highly diverse and not dominated by any

462 significant phyletic group (Figure 4A). As expected, most expressed effectors were more highly

463 expressed in planta compared to mycelia (Figure 4B). Fifty-three of the effectors encode

464 hypothetical proteins, which suggests novel effector functions may exist in Ltheo. Yan et al.

465 (2017) tested a few Ltheo effectors by expressing them in Burkholderia glumae, and then

466 infecting Nicotiana benthamiana with the transformed bacteria. Five out of seven effectors tested

467 showed a strong suppressive effect on the B. glumae-triggered hypersensitivity in N. benthamiana.

468 Among the 39 effectors more highly expressed in planta, 3 genes have pectate lyase domains

469 (LTHEOB_1556, 2730 and 7548). Pectate lyases can be essential for fungal virulence (Cho

470 et al. 2015; Yang et al. 2018), but their potential function as effectors needs further consideration.

471 Another infection-expressed effector (LTHEOB_3882) has an NPP1 domain. Necrosis-inducing

472 proteins (NPPs) are induced during infection and are associated with plant cell death (Fellbrich et

473 al. 2002). Other interesting infection-expressed effectors requiring further study are

474 LTHEOB_3870 with a CAP domain, LTHEOB_197 with a CFEM domain and LTHEOB_12487 with

475 a RALF domain. Another interesting group requiring further study comprises the 14 in planta

476 expressed effectors that are specific to Ltheo compared to other Botryosphaeriaceae species

477 (gen-2019-0070.R3Suppld).
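
The initial effector screen mentioned above, based on protein length and cysteine content, can be illustrated with a simple sequence filter. The manuscript does not state the cutoffs used, so the values in the Python sketch below (at most 300 residues and at least 4 cysteines) are placeholder assumptions chosen only to make the example concrete, and the sequences are invented. This is not a substitute for EffectorP or any other trained predictor.

```python
def cysteine_count(sequence: str) -> int:
    """Count cysteine residues (one-letter code C) in a protein sequence."""
    return sequence.upper().count("C")


def is_candidate_effector(sequence: str,
                          max_length: int = 300,     # assumed cutoff
                          min_cysteines: int = 4) -> bool:  # assumed cutoff
    """Small, cysteine-rich proteins pass the screen."""
    return len(sequence) <= max_length and cysteine_count(sequence) >= min_cysteines


# Invented toy sequences.
small_cys_rich = "MKCCAACLLC"          # 10 residues, 4 cysteines
small_cys_poor = "MKAAALL"             # 7 residues, no cysteines
large_cys_rich = "C" * 5 + "A" * 400   # 405 residues, 5 cysteines

screen = {
    "small_cys_rich": is_candidate_effector(small_cys_rich),
    "small_cys_poor": is_candidate_effector(small_cys_poor),
    "large_cys_rich": is_candidate_effector(large_cys_rich),
}
print(screen)
```

Such a screen would normally be applied only to proteins already assigned to the secretome.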

478 Secreted CAZymes

479 Phytopathogenic fungi encode CAZymes that play an important role during colonization and

480 infection by breakdown and modification of plant cell wall structures. To understand the

481 potential genes involved in the adaptation of Ltheo to cacao habitats and substrates we identified

482 the repertoire of CAZymes of this cacao pathogen. Among the 718 predicted Ltheo genes

483 mapped to 606 CAZymes, 323 are predicted to be secreted proteins based on the presence of a

484 signal peptide (Table 4) and the remaining 395 are predicted to be non-secreted (gen-2019-

485 0070.R3Suppla Table S1). Yan et al. (2017) reported slightly higher numbers, a total of 763

486 Ltheo genes mapped to 820 CAZymes. The differences found may be due to different cutoffs

487 used in the BLAST homology search. Unfortunately, a direct comparison between the two

488 transcriptomes was not possible. Of the secreted CAZymes, 306 are transcriptionally active and

489 more than 50% of these are more highly expressed during infection, whereas only 39 secreted

490 CAZymes were more highly expressed in mycelia. Glycoside hydrolases (GH) formed the

491 largest group followed by auxiliary activities (AA) and polysaccharide lyases (PL) among the

492 infection expressed secreted CAZymes (Table 4). Most of these infection-expressed secreted

493 CAZymes can be considered highly expressed (treatment mean >1,000 reads) (Table 4).

494 Previous studies related to Botryosphaeriaceae genomes identified a broad range of key genes

495 responsible for cell degradation of woody plants (Morales-Cruz et al. 2015; Paolinelli-Alfonso et

496 al. 2016). Pectinases degrade pectin which is a component of the plant primary cell wall and

497 middle lamella. Out of the 28 Ltheo pectinases, 26 are predicted to be secreted proteins. There

498 were 26 Ltheo pectinases preferentially expressed in planta compared to just one preferentially

499 expressed in mycelia. Ltheo possesses the highest number of pectolytic enzyme coding genes

500 among the sequenced Botryosphaeriaceae genomes (Yan et al. 2017). The significant induction

501 of these genes during infection is consistent with their critical roles in pathogenesis. Because

502 most of the dicotyledon cell walls consist of around 35% pectin, these pectolytic enzymes

503 facilitate the breakdown of cell walls (Have et al. 1998). In addition, there were 90 GH and other

504 carbohydrate modifying genes preferentially expressed in planta compared to just 13

505 preferentially expressed in mycelia (Table 4). Among these, a wide array of enzymes (AA9,

506 GH5, GH3, GH16, GH43) target cellulose and hemicellulose, as in other pathogens (Yan et al.

507 2017; Morales-Cruz et al. 2015; Paolinelli-Alfonso et al. 2016). Together, the up-regulation of

508 genes encoding secreted CAZymes targeting pectin, cellulose, hemicellulose and xylan during

509 infection explains the rapid colonization and infection of Ltheo on plants. Among the non-

510 secreted CAZymes, GH was also the largest group followed by glycosyltransferases (GT).

511 Although more than 95% of the 393 non-secreted CAZymes are transcriptionally active, only 50

512 were preferentially expressed during infection of cacao, and 34 were preferentially expressed in

513 mycelia (gen-2019-0070.R3Suppla Table S1).

514 Non-CAZymes secreted proteins

515 Besides secreted effectors and plant cell degrading enzymes, fungi have a broader arsenal

516 of secreted proteins/enzymes at their disposal when causing disease. The most notable are

517 peptidases, alternative oxidase, cell wall proteins, reductases, necrosis inducing proteins (NPPs),

518 esterase, pathogenesis-related (PR) proteins, fungal hydrophobins, cytochrome P450s, chitin

519 synthesis and genes with putative roles in phenolic, melanin and protein metabolism (Paolinelli-

520 Alfonso et al. 2016; Meinhardt et al. 2014; Bailey et al. 2014). Among the 764 non-CAZyme and

521 non-effector secreted protein coding genes, proteases/peptidases are the largest group with 78

522 genes, of which 26 are preferentially expressed during infection compared to 8 preferentially

523 expressed in mycelia (Table 5). The next major group of genes in this list are oxidases/reductases

524 (60 genes), with 31 genes being preferentially expressed during infection compared to 6

525 preferentially expressed in mycelia (Table 5). These genes participate in processes including

526 sterol biosynthesis, degradation of lignin and breakdown of environmental contaminants (van

527 Gorcom et al. 1998). Similarly, induction of 4 tannase feruloyl esterase genes during infection

528 also suggests accelerated lignin degradation, as they act as accessory enzymes to assist

529 xylanolytic and pectinolytic enzymes in gaining access to their site of action during biomass

530 conversion (Dilokpimol et al. 2016). Besides the proteins with known putative functions, many

531 in these groups encode hypothetical proteins, again needing more detailed study. Motif searches

532 have identified functional domains in a few of these hypothetical proteins (gen-2019-

533 0070.R3Supplg).

534 Conclusion

535 From an endophyte to opportunistic plant pathogen, and now, potentially, a significant emerging

536 threat to cacao, Ltheo appears to have the tools within its genome to evolve in response to

537 changes in climate and crop production practices. Ltheo has been shown to be more virulent

538 under high temperature and drought stress in some plant-pathogen interactions (Paolinelli-Alfonso

539 et al. 2016; Yan et al. 2017; Songy et al. 2019). We observed that multiple species of Lasiodiplodia

540 capable of causing similar symptoms on cacao leaves coexist in a given location, increasing the

541 complexity of disease management. It is easy to see how symptoms caused by Lasiodiplodia

542 spp., principally necrosis, might be confused with symptoms of other diseases, and possible

543 synergy between these pathogens in causing disease deserves further study. The genome of the

544 Ltheo isolate from cacao studied here is similar to the genome of grapevine isolate CSS-01s,

545 previously described by Yan et al. (2017), and it uses a similar gene complement when

546 colonizing cacao leaves as CSS-01s uses when colonizing grape vine stems. This seems logical

547 for a broad-host-range pathogen and provides a foundation to study the possible specialization of Ltheo

548 isolates when attacking divergent plant species. Our results indicate that, during infection,

549 limited changes in expression occur in genes encoding proteins that reside inside the cell or on its

550 membranes, compared to genes encoding secreted proteins that directly interact with components

551 outside the cell, including the plant. This may indicate that Ltheo routinely and constitutively

552 expresses a wide range of internal and membrane associated proteins, buffering it against

553 external changes, while genes encoding secreted proteins that interact directly with components

554 outside the cell are more responsive to change, adapting to the specific substrates/components

555 encountered. The latter gene set includes a wide array of putative effectors, necrosis-inducing

556 proteins, pectinases and hydrolytic enzymes likely involved in Ltheo virulence and pathogenicity

557 during cacao infection.

558

559 Acknowledgements

560 This work was funded by USDA ARS. References to a company and/or product by the USDA

561 are only for the purposes of information and do not imply approval or recommendation of the

562 product to the exclusion of others that may also be suitable. USDA is an equal opportunity

563 provider and employer. The authors have no conflict of interest to declare.

564

565 References:

566 Ali, S. S., Shao, J., Lary, D. J., Kronmiller, B., Shen, D., Strem, M. D., Amoako-Attah, I.,

567 Akrofi, A. Y., Begoude, B. A. D., Hoopen, G. M. t., Coulibaly, K., Kebe, B. I., Melnick,

568 R. L., Guiltinan, M. J., Tyler, B. M., Meinhardt, L. W. and Bailey, B. A. 2016.

569 Phytophthora megakarya and P. palmivora, closely related causal agents of cacao black

570 pod rot, underwent increases in genome sizes and gene numbers by different

571 mechanisms. Genome Biol. Evol. 9: 536-557.

572 Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.

573 J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search

574 programs. Nucleic Acids Res. 25: 3389-3402.

575 Alves, A., Crous, P. W., Correia, A. and Phillips, A. 2008. Morphological and molecular data

576 reveal cryptic speciation in Lasiodiplodia theobromae. Fungal Divers. 28: 1-13.

577 Alvindia, D. G. and Gallema, F. L. M. 2017. Lasiodiplodia theobromae causes vascular streak

578 dieback (VSD)–like symptoms of cacao in Davao Region, Philippines. Austral. Plant Dis.

579 Notes, 12: 54.

580 Anders, S. and Huber, W. 2010. Differential expression analysis for sequence count data.

581 Genome Biol. 11: R106.

582 Bailey, B., Bae, H., Strem, M., Roberts, D., Thomas, S., Crozier, J., Samuels, G., Choi, I.-Y. and

583 Holmes, K. 2006. Fungal and plant gene expression during the colonization of cacao

584 seedlings by endophytic isolates of four Trichoderma species. Planta, 224: 1449-1464.

585 Bailey, B. A., Crozier, J., Sicher, R. C., Strem, M. D., Melnick, R., Carazzolle, M. F., Costa, G.

586 G., Pereira, G. A., Zhang, D. and Maximova, S. 2013. Dynamic changes in pod and

587 fungal physiology associated with the shift from biotrophy to necrotrophy during the

588 infection of Theobroma cacao by Moniliophthora roreri. Physiol. Mol. Plant Pathol. 81:

589 84-96.

590 Bailey, B. A., Melnick, R. L., Strem, M. D., Crozier, J., Shao, J., Sicher, R., Philips-Mora, W.,

591 Ali, S. S., Zhang, D. and Meinhardt, L. 2014. Differential gene expression by

592 Moniliophthora roreri while overcoming cacao tolerance in the field. Mol. Plant Pathol.

593 15: 711-729.

594 Bailey, B. A., Strem, M. D., Bae, H., de Mayolo, G. A. and Guiltinan, M. J. 2005. Gene

595 expression in leaves of Theobroma cacao in response to mechanical wounding, ethylene,

596 and/or methyl jasmonate. Plant Science, 168: 1247-1258.

597 Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V.

598 M., Nikolenko, S. I., Pham, S. and Prjibelski, A. D. 2012. SPAdes: a new genome

599 assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19:

600 455-477.

601 Blanco-Ulate, B., Rolshausen, P. and Cantu, D. 2013. Draft genome sequence of Neofusicoccum

602 parvum isolate UCR-NP2, a fungal vascular pathogen associated with grapevine cankers.

603 Genome Announc. 1: e00339-13. DOI: 10.1128/genomeA.00339-13

604 Burgess, T. I., Barber, P. A., Mohali, S., Pegg, G., de Beer, W. and Wingfield, M. J. 2006. Three

605 new Lasiodiplodia spp. from the tropics, recognized based on DNA sequence

606 comparisons and morphology. Mycologia, 98: 423-435.

607 Bushnell, B. 2014. BBMap: a fast, accurate, splice-aware aligner. Berkeley, CA (US): Ernest

608 Orlando Lawrence Berkeley National Laboratory. http://1ofdmq2n8tc36m6i46scovo2e.

609 wpengine.netdna-cdn.com/wp-content/uploads/2013/11/BB_User-Meeting-2014-poster-

610 FINAL.pdf.

611 Cho, Y., Jang, M., Srivastava, A., Jang, J.-H., Soung, N.-K., Ko, S.-K., Kang, D.-O., Ahn, J. S.

612 and Kim, B. Y. 2015. A pectate lyase-coding gene abundantly expressed during early

613 stages of infection is required for full virulence in Alternaria brassicicola. PLoS ONE, 10:

614 e0127140. doi.org/10.1371/journal.pone.0127140.

615 Conesa, A., Götz, S., García-Gómez, J. M., Terol, J., Talón, M. and Robles, M. 2005. Blast2GO:

616 a universal tool for annotation, visualization and analysis in functional genomics

617 research. Bioinformatics, 21: 3674-3676.

618 del Castillo, D. S., Parra, D., Noceda, C. and Pérez-Martínez, S. 2016. Co-occurrence of

619 pathogenic and non-pathogenic Fusarium decemcellulare and Lasiodiplodia theobromae

620 isolates in cushion galls disease of cacao (Theobroma cacao L.). J. Plant Protect. Res. 56:

621 129-138.

622 Dilokpimol, A., Mäkelä, M. R., Aguilar-Pontes, M. V., Benoit-Gelber, I., Hildén, K. S. and de

623 Vries, R. P. 2016. Diversity of fungal feruloyl esterases: updated phylogenetic

624 classification, properties, and industrial applications. Biotechnol. Biofuels, 9: 231.

625 Fellbrich, G., Romanski, A., Varet, A., Blume, B., Brunner, F., Engelhardt, S., Felix, G.,

626 Kemmerling, B., Krzymowska, M. and Nürnberger, T. 2002. NPP1, a

627 Phytophthora‐associated trigger of plant defense in parsley and Arabidopsis. Plant J. 32:

628 375-390.

629 Have, A. t., Mulder, W., Visser, J. and van Kan, J. A. 1998. The endopolygalacturonase gene

630 Bcpg1 is required for full virulence of Botrytis cinerea. Mol. Plant Microbe Interact. 11:

631 1009-1016.

632 Islam, M. S., Haque, M. S., Islam, M. M., Emdad, E. M., Halim, A., Hossen, Q. M. M., Hossain,

633 M. Z., Ahmed, B., Rahim, S. and Rahman, M. S. 2012. Tools to kill: genome of one of

634 the most destructive plant pathogenic fungi Macrophomina phaseolina. BMC Genom.

635 13: 493.

636 Jaiyeola, I., Akinrinlola, R. J., Ige, G. S., Omoleye, O. O., Oyedele, A., Odunayo, B. J., Emehin,

637 O. J., Bello, M. O. and Adesemoye, A. O. 2014. Bot canker pathogens could complicate

638 the management of Phytophthora black pod of cocoa. African J. Microbiol. Res. 8: 3094-

639 3100.

640 Larkin, M. A., Blackshields, G., Brown, N., Chenna, R., McGettigan, P. A., McWilliam, H.,

641 Valentin, F., Wallace, I. M., Wilm, A. and Lopez, R. 2007. Clustal W and Clustal X

642 version 2.0. Bioinformatics, 23: 2947-2948.

643 Liñeiro, E., Chiva, C., Cantoral, J. M., Sabidó, E. and Fernández-Acero, F. J. 2016.

644 Modifications of fungal membrane proteins profile under pathogenicity induction: A

645 proteomic analysis of Botrytis cinerea membranome. Proteom. 16: 2363-2376.

646 Machado, A. R., Pinho, D. B. and Pereira, O. L. 2014. Phylogeny, identification and

647 pathogenicity of the Botryosphaeriaceae associated with collar and root rot of the biofuel

648 plant Jatropha curcas in Brazil, with a description of new species of Lasiodiplodia.

649 Fungal Divers. 67: 231-247.

650 Mbenoun, M., Momo Zeutsa, E. H., Samuels, G., Nsouga Amougou, F. and Nyasse, S. 2008.

651 Dieback due to Lasiodiplodia theobromae, a new constraint to cocoa production in

652 Cameroon. Plant Pathol. 57: 381-381.

653 McMahon, P. and Purwantara, A. 2016. Vascular streak dieback (Ceratobasidium theobromae):

654 history and biology. In Cacao Diseases: A History of Old Enemies and New Encounters.

655 Edited by B. A. Bailey and L. W. Meinhardt. Springer International Publishing, New

656 York, NY. pp. 307-335.

657 Medina, V. and Laliberte, B., 2017. A review of research on the effects of drought and

658 temperature stress and increased CO2 on Theobroma cacao L., and the role of genetic

659 diversity to address climate change. Costa Rica: Bioversity International, 51 p. ISBN: 978-92-

660 9255-074-5.

661 Meinhardt, L. W., Costa, G. G., Thomazella, D. P., Teixeira, P. J., Carazzolle, M. F., Schuster, S.

662 C., Carlson, J. E., Guiltinan, M. J., Mieczkowski, P. and Farmer, A. 2014. Genome and

663 secretome analysis of the hemibiotrophic fungal pathogen, Moniliophthora roreri, which

664 causes frosty pod rot disease of cacao: mechanisms of the biotrophic and necrotrophic

665 phases. BMC Genom. 15: 164.

666 Morales-Cruz, A., Amrine, K. C., Blanco-Ulate, B., Lawrence, D. P., Travadon, R., Rolshausen,

667 P. E., Baumgartner, K. and Cantu, D. 2015. Distinctive expansion of gene families

668 associated with plant cell wall degradation, secondary metabolism, and nutrient uptake in

669 the genomes of grapevine trunk pathogens. BMC Genom. 16: 469.

670 Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C. and Kanehisa, M. 2007. KAAS: an automatic

671 genome annotation and pathway reconstruction server. Nucleic Acids Res. 35: 182-185.

672 Mullen, J., Gilliam, C., Hagan, A. and Morgan-Jones, G. 1991. Canker of dogwood caused by

673 Lasiodiplodia theobromae, a disease influenced by drought stress or cultivar selection.

674 Plant Disease, 75: 886-889.

675 Nowell, W. 1923. Diseases of crop-plants in the Lesser Antilles. London: The Imperial Dept. of

676 Agriculture. The West India Committee, Trinity Square, London, UK. pp. 383.

677 Paolinelli-Alfonso, M., Villalobos-Escobedo, J. M., Rolshausen, P., Herrera-Estrella, A.,

678 Galindo-Sánchez, C., López-Hernández, J. F. and Hernandez-Martinez, R. 2016. Global

679 transcriptional analysis suggests Lasiodiplodia theobromae pathogenicity factors

680 involved in modulation of grapevine defensive response. BMC Genom. 17: 615.

681 Perlin, M. H., Andrews, J. and San Toh, S. 2014. Essential letters in the fungal alphabet: ABC

682 and MFS transporters and their roles in survival and pathogenicity. Adv. Genet. 85: 201-

683 253.

684 Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. and Salzberg, S. L. 2016. Transcript-level

685 expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat.

686 Protoc. 11: 1650.

687 Petersen, T. N., Brunak, S., von Heijne, G. and Nielsen, H. 2011. SignalP 4.0: discriminating

688 signal peptides from transmembrane regions. Nat. Methods, 8: 785-786.

689 Ploetz, R. 2016. The Impact of Diseases on Cacao Production: A Global Overview. In Cacao

690 Diseases: A History of Old Enemies and New Encounters. Edited by B. A. Bailey and L.

691 W. Meinhardt. Springer International Publishing, New York, NY. pp. 33-59.

692 Rubini, M. R., Silva-Ribeiro, R. T., Pomella, A. W. V., Maki, C. S., Araújo, W. L., Dos Santos,

693 D. R. and Azevedo, J. L. 2005. Diversity of endophytic fungal community of cacao

694 (Theobroma cacao L.) and biological control of Crinipellis perniciosa, causal agent of

695 Witches' Broom Disease. Int. J. Biol. Sc. 1: 24-33.

696 Saitou, N. and Nei, M. 1987. The neighbor-joining method: a new method for reconstructing

697 phylogenetic trees. Mol. Biol. Evol. 4: 406-425.

698 Samuels, G. J., Ismaiel, A., Rosmana, A., Junaid, M., Guest, D., Mcmahon, P., Keane, P.,

699 Purwantara, A., Lambert, S. and Rodriguez-Carres, M. 2012. Vascular streak dieback of

700 cacao in Southeast Asia and Melanesia: in planta detection of the pathogen and a new

701 taxonomy. Fungal Biol. 116: 11-23.

702 Shaner, G. and Finney, R. 1977. The effect of nitrogen fertilization on the expression of slow-

703 mildewing resistance in Knox wheat. Phytopathol. 67: 1051-1056.

Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. and Zdobnov, E. M. 2015. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19): 3210-3212.

Slippers, B. and Wingfield, M. J. 2007. Botryosphaeriaceae as endophytes and latent pathogens of woody plants: diversity, ecology and impact. Fungal Biol. Rev. 21: 90-106.

Sonnhammer, E. L., Von Heijne, G. and Krogh, A. 1998. A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6: 175-182.

Songy, A., Fernandez, O., Clément, C., Larignon, P. and Fontaine, F. 2019. Grapevine trunk diseases under thermal and water stresses. Planta 249(6): 1655-1679.

Sperschneider, J., Dodds, P. N., Gardiner, D. M., Singh, K. B. and Taylor, J. M. 2018a. Improved prediction of fungal effector proteins from secretomes with EffectorP 2.0. Mol. Plant Pathol. 19: 2094-2110.

Sperschneider, J., Dodds, P. N., Singh, K. B. and Taylor, J. M. 2018b. ApoplastP: prediction of effectors and plant proteins in the apoplast using machine learning. New Phytol. 217: 1764-1778.

Stanke, M., Steinkamp, R., Waack, S. and Morgenstern, B. 2004. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Res. 32: 309-312.

Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. and Kumar, S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28: 2731-2739.

Úrbez-Torres, J. and Gubler, W. 2009. Pathogenicity of Botryosphaeriaceae species isolated from grapevine cankers in California. Plant Disease 93: 584-592.


Van der Linde, J. A., Six, D. L., Wingfield, M. J. and Roux, J. 2011. Lasiodiplodia species associated with dying Euphorbia ingens in South Africa. Southern Forests 73: 165-173.

van der Nest, M. A., Bihon, W., De Vos, L., Naidoo, K., Roodt, D., Rubagotti, E., Slippers, B., Steenkamp, E. T., Wilken, P. M. and Wilson, A. 2014. Draft genome sequences of Diplodia sapinea, Ceratocystis manginecans, and Ceratocystis moniliformis. IMA Fungus 5: 135-140.

van Gorcom, R. F., van den Hondel, C. A. and Punt, P. J. 1998. Cytochrome P450 enzyme systems in fungi. Fungal Genet. Biol. 23: 1-17.

White, T. J., Bruns, T., Lee, S. and Taylor, J. 1990. Amplification and direct sequencing of fungal ribosomal RNA genes for phylogenetics. In PCR Protocols: A Guide to Methods and Applications. Edited by M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White. Academic Press Inc., New York. pp. 315-322.

Winnenburg, R., Baldwin, T. K., Urban, M., Rawlings, C., Köhler, J. and Hammond-Kosack, K. E. 2006. PHI-base: a new database for pathogen host interactions. Nucleic Acids Res. 34: 459-464.

World Cocoa Foundation 2014. Cocoa market update. Washington DC: World Cocoa Foundation. Found at http://www.worldcocoafoundation.org/wp-content/uploads/Cocoa-Market-Update-as-of-4-1-2014.pdf.

Yan, J. Y., Zhao, W. S., Chen, Z., Xing, Q. K., Zhang, W., Chethana, K., Xue, M. F., Xu, J. P., Phillips, A. J. and Wang, Y. 2017. Comparative genome and transcriptome analyses reveal adaptations to opportunistic infections in woody plant degrading pathogens of Botryosphaeriaceae. DNA Res. 25: 87-102.


Yang, Y., Zhang, Y., Li, B., Yang, X., Dong, Y. and Qiu, D. 2018. A Verticillium dahliae Pectate lyase Induces Plant Immune Responses and Contributes to Virulence. Front. Plant Sci. 9: 1271.

Zuckerkandl, E. and Pauling, L. 1965. Evolutionary divergence and convergence in proteins. In Evolving Genes and Proteins. Edited by V. Bryson and H.J. Vogel. Academic Press, New York. pp. 97-166.


Table 1. Genome assembly and annotation statistics of Lasiodiplodia theobromae strain AM2As.

| Statistic | L. theobromae |
|---|---|
| Total contig length (bp) | 43,757,571 |
| Contig numbers | 833 |
| BUSCO completeness (%) | 99.23 |
| GC content (%) | 54.73 |
| N50 contig length (bp) | 876,715 |
| Max contig size (bp) | 1,723,604 |
| Min contig size (bp) | 56 |
| Mean contig size (bp) | 52,530 |
| Gene number | 13,061 |
| Total gene length | 21,414,393 |
| Average gene length | 1639.56 |
| Gene density# | 0.489 |
| Number of expressed genes* | 11,860 |
| Genes with GO annotation¥ | 7,980 |
| Genes within KEGG pathway | 4,035 |

# CDS bases/total genome bases.
* Only gene models with ≥5 raw reads, either in any mycelia or in planta sample, are reported.
¥ Gene models with E < 1e-4 for BLASTn against the UniProt Gene Ontology database.
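The ratio-type entries in Table 1 can be cross-checked from the raw counts in the same table. The sketch below is only an illustration: the function name `assembly_ratios` and its dictionary keys are hypothetical, and it assumes that gene density is total gene length divided by total contig length, which reproduces the reported 0.489.

```python
# Values copied from Table 1; the function and key names are illustrative only.
TOTAL_CONTIG_LENGTH_BP = 43_757_571
CONTIG_COUNT = 833
GENE_COUNT = 13_061
TOTAL_GENE_LENGTH_BP = 21_414_393


def assembly_ratios(total_bp, n_contigs, n_genes, gene_bp):
    """Return the derived ratios reported alongside the raw counts."""
    return {
        # Mean contig size: total assembly length spread over all contigs.
        "mean_contig_size_bp": total_bp / n_contigs,
        # Average gene length: total gene length spread over all gene models.
        "average_gene_length_bp": gene_bp / n_genes,
        # Gene density (assumption): gene bases as a fraction of assembly bases.
        "gene_density": gene_bp / total_bp,
    }


ratios = assembly_ratios(
    TOTAL_CONTIG_LENGTH_BP, CONTIG_COUNT, GENE_COUNT, TOTAL_GENE_LENGTH_BP
)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

The computed values agree with the table to the precision shown there.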


Table 2. Comparative genomes of Botryosphaeriaceae fungi.

| Species | Genome length (bp) | No. of contigs | Contig N50 | GC% | No. of genes | GenBank no. |
|---|---|---|---|---|---|---|
| Diplodia sapinea | 36,053,350 | 2,371 | 37,635 | 56.85 | No gene call | AXCF00000000 |
| Neofusicoccum parvum | 42,592,847 | 1,877 | 83,561 | 56.80 | 10,366 | AORE01000000 |
| Macrophomina phaseolina | 48,882,845 | 1,506 | 150,180 | 52.30 | 13,806 | AHHD01000000 |
| Lasiodiplodia theobromae CSS-01s | 43,283,415 | 60 | 1,738,941 | 54.80 | No gene call | MDYX01000000 |
| Diplodia corticola | 34,986,079 | 286 | 271,374 | 57.10 | 10,839 | MNUE01000000 |
| Diplodia scrobiculata | 34,931,051 | 4,037 | 16,793 | 57.00 | No gene call | LAEG00000000 |
| Diplodia seriata | 37,268,684 | 469 | 239,894 | 56.60 | 8,050 | MSZU00000000 |
| Botryosphaeria dothidea | 47,389,336 | 1,251 | 210,735 | 53.10 | No gene call | MDSR00000000 |
| L. theobromae AM2As | 43,757,571 | 833 | 876,715 | 54.73 | 13,061 | In this study |
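Contig N50, used in Tables 1 and 2 to compare assembly contiguity, is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. A minimal sketch of that standard definition follows; the function name `n50` and the toy contig lengths are hypothetical and are not taken from any assembly in the table.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is N50.
        if running >= half_total:
            return length


# Toy example (hypothetical lengths): total = 30, half = 15.
# Cumulative sums are 10, 18, ... so the second contig (8) is the N50.
toy_n50 = n50([2, 10, 6, 8, 4])
print(toy_n50)
```

A higher N50 for a similar genome length, as for L. theobromae CSS-01s and AM2As relative to the Diplodia assemblies, indicates that more of the assembly sits in long contigs.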


Table 3. Number of non-CAZyme transmembrane protein coding genes# of Lasiodiplodia theobromae isolate AM2As.

| Gene class/family | Total genes | Expressed¹ | Preferential expression (in planta/mycelia)², in the order: treatment mean overall, in planta; overall, mycelia; treatment mean >1000 reads, in planta; >1000 reads, mycelia |
|---|---|---|---|
| Total transporters | 827 | 747 | 93 159 17 40 |
| ABC Transporters | 43 | 41 | 2 10 2 5 |
| MFS | 316 | 277 | 49 59 16 12 |
| Sugar transport | 106 | 93 | 20 9 6 4 |
| Cation/anion (noncarbon) | 161 | 152 | 7 36 2 6 |
| Drug (ABC/MFS) | 75 | 65 | 7 11 3 2 |
| Carbon based | 200 | 184 | 14 49 6 11 |
| peptide | 18 | 16 | 1 1 |
| amino acid | 64 | 57 | 3 17 3 5 |
| allantoate | 31 | 27 | 5 7 3 2 |
| pantothenate | 9 | 9 | 0 5 0 0 |
| Total non-transport | 941 | 848 | 78 162 21 55 |
| Integral membrane | 185 | 174 | 28 27 4 14 |
| Hypothetical | 228 | 180 | 23 51 5 11 |
| Enzymatic etc | 645 | 607 | 46 98 14 42 |
| Steroid biosynthesis/metabolism | 17 | 16 | 1 3 1 |
| Chitin synthase/GH family 2 | 19 | 19 | 1 5 3 |
| RTA/RTA1 | 26 | 24 | 2 8 |
| CFEM | 8 | 8 | 5 1 1 0 |
| Peptidase/protease | 22 | 21 | 1 6 1 2 |
| FAD Binding | 21 | 19 | 5 2 |
| Glucosyl Hydrolase | 23 | 22 | 2 4 1 1 |
| Glucosyl Transferase | 42 | 41 | 1 5 0 3 |
| Cytochrome/Cp450 | 56 | 48 | 7 11 2 3 |

# As determined by the TMHMM program and BLASTp search against the Carbohydrate-Active enzymes database at the threshold value of E < 1e-10.
¹ Gene models with ≥5 raw reads, either in any mycelia or in planta sample.
² Differential regulation at >2 Log2 and Padj <0.05.
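The footnote thresholds used in Tables 3 to 5 (≥5 raw reads to count as expressed; >2 Log2 and Padj <0.05 to count as preferentially expressed) can be sketched as a simple classifier. This is a hedged illustration, not the authors' pipeline: the `GeneResult` record, the `classify` function and the example gene identifiers are hypothetical, and the sketch assumes that the log2 fold change is expressed as in planta relative to mycelia, so that positive values indicate in planta preference.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene_id: str
    max_raw_reads: int       # highest raw read count in any sample
    log2_fold_change: float  # assumption: in planta relative to mycelia
    padj: float              # adjusted P value


def classify(gene, lfc_cutoff=2.0, padj_cutoff=0.05, min_reads=5):
    """Assign a gene to one of the table's expression categories."""
    if gene.max_raw_reads < min_reads:
        return "not expressed"
    significant = gene.padj < padj_cutoff
    if significant and gene.log2_fold_change > lfc_cutoff:
        return "in planta"
    if significant and gene.log2_fold_change < -lfc_cutoff:
        return "mycelia"
    return "no preference"


# Hypothetical records used only to exercise each branch.
examples = [
    GeneResult("gene_a", 1200, 3.4, 0.001),
    GeneResult("gene_b", 300, -2.6, 0.01),
    GeneResult("gene_c", 50, 2.5, 0.20),
    GeneResult("gene_d", 3, 5.0, 0.001),
]
labels = {g.gene_id: classify(g) for g in examples}
print(labels)
```

Counting the "in planta" and "mycelia" labels per gene class would give the overall columns; repeating the count on genes whose treatment mean exceeds 1000 reads would give the remaining two columns.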


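Table 4. Number of secreted# CAZymes family genes of Lasiodiplodia theobromae isolate AM2As.

Columns 4 to 7 give preferential expression (in planta/mycelia)³ by treatment mean, overall and for treatment mean >1000 reads.

| CAZymes family¹ | Total genes | Expressed² | Overall: in planta | Overall: mycelia | >1000 reads: in planta | >1000 reads: mycelia |
|---|---|---|---|---|---|---|
| AA3 | 19 | 19 | 7 | 3 | 3 | 1 |
| AA1 | 12 | 11 | 3 | 3 | 1 | 2 |
| AA7 | 9 | 6 | 5 | 1 | 3 | 1 |
| AA9 | 8 | 8 | 6 | 2 | 6 | 1 |
| Other AA (4) | 11 | 10 | 2 | 4 | 1 | 2 |
| CBM1 | 8 | 8 | 7 | 0 | 7 | 0 |
| CBM13 | 4 | 4 | 3 | 1 | 3 | 0 |
| Other CBM (6) | 15 | 15 | 4 | 6 | 3 | 1 |
| CE5 | 8 | 8 | 6 | 0 | 3 | 0 |
| CE16 | 4 | 3 | 3 | 0 | 2 | 0 |
| CE8 | 3 | 3 | 3 | 0 | 1 | 0 |
| Other CE (5) | 14 | 14 | 7 | 5 | 4 | 4 |
| GH43_12 | 19 | 18 | 11 | 0 | 8 | 0 |
| GH3 | 13 | 11 | 11 | 0 | 8 | 0 |
| GH28 | 12 | 11 | 8 | 1 | 5 | 0 |
| GH5_11 | 10 | 10 | 6 | 1 | 5 | 1 |
| GH16 | 9 | 9 | 5 | 1 | 2 | 0 |
| GH18 | 7 | 5 | 4 | 0 | 2 | 0 |
| GH10 | 4 | 4 | 4 | 0 | 3 | 0 |
| GH35 | 4 | 3 | 3 | 0 | 3 | 0 |
| GH78 | 4 | 3 | 3 | 0 | 3 | 0 |
| GH12 | 3 | 3 | 3 | 0 | 3 | 0 |
| GH131 | 3 | 3 | 3 | 0 | 3 | 0 |
| Other GH (40) | 86 | 83 | 29 | 10 | 21 | 2 |
| GTs (8) | 11 | 11 | 2 | 1 | 0 | 1 |
| PL1 | 10 | 10 | 8 | 0 | 7 | 0 |
| PL3_2 | 7 | 7 | 6 | 1 | 6 | 0 |
| PL4_1 | 5 | 5 | 5 | 0 | 4 | 0 |
| PL9_3 | 1 | 1 | 1 | 0 | 1 | 0 |

# As determined by SignalP, version 3.0 and BLASTp search against the Carbohydrate-Active enzymes database at the threshold value of E < 1e-10.
¹ Number within parentheses indicates the number of CAZymes families.
² Gene models with ≥5 raw reads, either in any mycelia or in planta sample.
³ Differential regulation at >2 Log2 and Padj <0.05.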


Table 5. Number of secreted# non-CAZyme and non-effector genes of Lasiodiplodia theobromae isolate AM2As.

Columns 4 to 7 give preferential expression (in planta/mycelia)² by treatment mean, overall and for treatment mean >1000 reads.

| Gene class/family | Total genes | Expressed¹ | Overall: in planta | Overall: mycelia | >1000 reads: in planta | >1000 reads: mycelia |
|---|---|---|---|---|---|---|
| Peptidase/protease/amidase | 78 | 72 | 26 | 8 | 19 | 3 |
| Carboxypeptidase a1 | 6 | 5 | 1 | 1 | 1 | 0 |
| Aspartic endopeptidase pep1 | 3 | 2 | 0 | 0 | 0 | 0 |
| Peptidase A1 | 7 | 7 | 2 | 1 | 1 | 1 |
| Peptidase m35 deuterolysin | 5 | 5 | 4 | 1 | 4 | 0 |
| Peptidase M43 | 4 | 4 | 2 | 0 | 2 | 0 |
| Peptidase S10 serine carboxypeptidase | 3 | 3 | 1 | 0 | 1 | 0 |
| Peptidase S41 family protein | 5 | 4 | 1 | 0 | 1 | 0 |
| Peptidase S8/S53 subtilisin/kexin/sedolisin | 4 | 3 | 2 | 0 | 1 | 0 |
| Tripeptidyl-peptidase 1 | 4 | 4 | 1 | 0 | 0 | 0 |
| Major allergen | 12 | 9 | 5 | 2 | 2 | 2 |
| Major allergen Asp | 1 | 1 | 0 | 1 | 0 | 1 |
| Allergen Asp f 7 | 1 | 0 | 0 | 0 | 0 | 0 |
| Allergen V5/Tpx-1-related protein | 4 | 4 | 2 | 1 | 1 | 1 |
| Major allergen Alt | 5 | 3 | 2 | 1 | 1 | 1 |
| Oxidase/reductase activity | 60 | 51 | 31 | 6 | 19 | 2 |
| Dehydrogenases | 15 | 14 | 8 | 3 | 6 | 2 |
| p450 | 20 | 15 | 6 | 1 | 2 | 0 |
| Peroxidase | 5 | 4 | 3 | 0 | 2 | 0 |
| Dioxygenase | 13 | 13 | 11 | 0 | 7 | 0 |
| Protocatechuate -dioxygenase beta subunit protein | 4 | 3 | 3 | 0 | 2 | 0 |
| Esterase | 30 | 26 | 8 | 5 | 6 | 0 |
| Tannase feruloyl esterase | 5 | 4 | 4 | 0 | 4 | 0 |
| Carboxylesterase | 16 | 12 | 3 | 3 | 2 | 0 |
| Para-nitrobenzyl esterase | 2 | 2 | 1 | 0 | 1 | 0 |
| Cell wall protein | 27 | 25 | 10 | 9 | 5 | 5 |
| Cell wall protein | 9 | 9 | 7 | 2 | 4 | 1 |
| Gpi anchored cell wall protein | 4 | 4 | 0 | 2 | 0 | 1 |
| Gpi-anchored cell wall organization protein ecm33 | 8 | 7 | 1 | 3 | 1 | 1 |
| Carbohydrate binding | 24 | 24 | 4 | 11 | 4 | 4 |
| Carbohydrate binding protein | 4 | 4 | 0 | 0 | 0 | 0 |
| CFEM domain-containing protein | 5 | 5 | 1 | 4 | 1 | 3 |
| Chitin binding | 3 | 3 | 1 | 0 | 1 | 0 |
| WSC domain containing protein | 9 | 9 | 1 | 6 | 1 | 1 |
| Other families/gene classes | | | | | | |
| alpha beta-hydrolase | 14 | 12 | 6 | 1 | 3 | 0 |
| Lipase | 13 | 11 | 4 | 2 | 3 | 1 |
| Hypothetical protein | 240 | 172 | 56 | 29 | 18 | 12 |
| Tyrosinase | 4 | 4 | 2 | 1 | 1 | 0 |
| Hemagglutinin | 3 | 2 | 0 | 2 | 0 | 2 |
| NLPs | 4 | 4 | 4 | 0 | 4 | 0 |
| Protein elicitor | 1 | 1 | 1 | 0 | 1 | 0 |
| Deoxyribonuclease TatD-related protein | 2 | 1 | 1 | 0 | 1 | 0 |
| Extracellular aldonolactonase protein | 1 | 1 | 1 | 0 | 1 | 0 |
| Isoform cra a protein | 1 | 1 | 1 | 0 | 1 | 0 |
| Hypersensitive response-inducing protein elicitor | 1 | 1 | 1 | 0 | 0 | 0 |
| Mycelial catalase cat1 | 1 | 1 | 1 | 0 | 1 | 0 |
| Sulfatase | 1 | 1 | 1 | 0 | 0 | 0 |
| Survival protein SurE-like phosphatase/nucleotidase | 1 | 1 | 1 | 0 | 1 | 0 |
| Phytase protein | 1 | 1 | 1 | 0 | 1 | 0 |
| Ureidoglycolate lyase | 1 | 1 | 1 | 0 | 1 | 0 |
| Glutaminase | 1 | 1 | 1 | 0 | 1 | 0 |
| Epl1 protein | 1 | 1 | 1 | 0 | 1 | 0 |
| Carbonic anhydrase | 1 | 1 | 1 | 0 | 1 | 0 |
| ABC-type fe3+ transport system protein | 4 | 4 | 4 | 0 | 3 | 0 |
| FAD binding domain containing protein | 22 | 17 | 4 | 1 | 2 | 0 |

# As determined by SignalP, version 3.0, EffectorP 2.0 and BLASTp search against the Carbohydrate-Active enzymes database at the threshold value of E < 1e-10.
¹ Gene models with ≥5 raw reads, either in any mycelia or in planta sample.
² Differential regulation at >2 Log2 and Padj <0.05.


Figure legend

Figure 1. Molecular phylogenetic analysis of Lasiodiplodia isolates collected from infected cacao plants in Indonesia, the Philippines and the USA (isolate codes indicate country of origin: AM, Indonesia; Phi, Philippines; Miami, USA) in comparison to known Lasiodiplodia spp. and the related Botryosphaeriaceae fungi Diplodia corticola and Botryosphaeria lutea (accessed from GenBank). The analysis was based on DNA sequence data of the ITS 1 and 2 regions (GenBank MH412939 to MH412990) and part of the EF1α gene of the 52 isolates. Sequences were combined and aligned using the ClustalW2 tool (Larkin et al. 2007) under default settings. A phylogenetic tree was reconstructed using the Maximum Likelihood method based on the Poisson correction model (Zuckerkandl and Pauling, 1965), with 1000 bootstrapped data sets. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. Analyses were conducted in MEGA6 (Tamura et al. 2011). Isolates of Lasiodiplodia spp. studied herein are listed in gen-2019-0070.R3Supplb.

Figure 2. Differential aggressiveness of Lasiodiplodia spp. isolates as assessed in a mycelia-inoculated leaf disc bioassay. Leaf discs were observed daily up to 4 days post inoculation, the percentage of necrotic leaf blade (lamina) area was quantified, and the area under the disease progress curve (AUDPC) was calculated. AUDPC was analyzed by two-way repeated-measures (RM) ANOVA with Fisher's Least Significant Difference (LSD) test (P = 0.05) using GraphPad Prism version 7.0. Bars indicate standard error of the mean (LSD0.05 = 28.92).
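AUDPC is conventionally computed with the trapezoidal rule over successive observations, following Shaner and Finney (1977). The sketch below illustrates that calculation only; the function name `audpc` and the example necrosis percentages are hypothetical and are not data from this study.

```python
def audpc(times, severities):
    """Area under the disease progress curve by the trapezoidal rule.

    times      -- observation times (e.g. days post inoculation), ascending
    severities -- disease severity at each time (e.g. % necrotic area)
    """
    if len(times) != len(severities):
        raise ValueError("times and severities must have the same length")
    if len(times) < 2:
        raise ValueError("at least two observations are required")
    total = 0.0
    for i in range(len(times) - 1):
        # Mean severity of two consecutive observations times the interval.
        mean_severity = (severities[i] + severities[i + 1]) / 2
        total += mean_severity * (times[i + 1] - times[i])
    return total


# Hypothetical leaf disc scored daily from 1 to 4 days post inoculation.
days = [1, 2, 3, 4]
necrosis_percent = [0, 10, 30, 60]
example_audpc = audpc(days, necrosis_percent)  # 5 + 20 + 45
print(example_audpc)
```

Because the result integrates severity over time, an isolate that causes necrosis earlier scores higher than one reaching the same final necrosis later.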

Figure 3. Bi-directional Venn diagram. Bi-directional BLAST results are presented in a Venn diagram. The codes used in the diagram are: LT = Lasiodiplodia theobromae (AM2As) genes; DC = Diplodia corticola genes; NP = Neofusicoccum parvum genes; and BC = Botrytis cinerea genes. Each intersect is labeled with the number of genes specific to that intersect. To be considered an ortholog, a BLASTp match had to span at least 50% of the sequence with an E-value less than 1e-05. The Venn diagram was generated using the ‘Draw Venn Diagram’ website at http://bioinformatics.psb.ugent.be/webtools/Venn/. The 2,862 L. theobromae-specific genes are listed in gen-2019-0070.R3Suppld.
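The ortholog criterion stated in the Figure 3 legend (match spanning at least 50% of the sequence, E-value below 1e-05, applied in both directions) can be sketched as a reciprocal filter. This is an illustration under stated assumptions rather than the authors' script: the `Hit` record and function names are hypothetical, coverage is assumed to be measured against the query length, and "bi-directional" is interpreted here as reciprocal best hits ranked by lowest E-value.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    query: str
    subject: str
    aligned_length: int  # residues of the query covered by the match
    query_length: int
    evalue: float


def passes_threshold(hit: Hit, min_coverage: float = 0.5,
                     max_evalue: float = 1e-5) -> bool:
    """Apply the legend's coverage and E-value cut-offs to one hit."""
    coverage = hit.aligned_length / hit.query_length
    return coverage >= min_coverage and hit.evalue < max_evalue


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Keep, per query, the passing hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if not passes_threshold(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def reciprocal_pairs(a_vs_b: Iterable[Hit],
                     b_vs_a: Iterable[Hit]) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs that are each other's best passing hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Hypothetical hits: only LT1/DC1 satisfies both cut-offs in both directions.
lt_vs_dc = [
    Hit("LT1", "DC1", 80, 100, 1e-30),
    Hit("LT2", "DC2", 30, 100, 1e-40),  # fails the 50% coverage cut-off
    Hit("LT3", "DC3", 90, 100, 1e-3),   # fails the E-value cut-off
]
dc_vs_lt = [
    Hit("DC1", "LT1", 85, 100, 1e-28),
    Hit("DC2", "LT2", 90, 100, 1e-40),
]
pairs = reciprocal_pairs(lt_vs_dc, dc_vs_lt)
print(pairs)
```

Genes of one species left without a partner in any of the other three species after such filtering would fall in the species-specific region of the Venn diagram.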

Figure 4. Evolutionary relationships and transcription profiles of 115 putative effector proteins of Lasiodiplodia theobromae (AM2As). (A) Amino acid sequences were aligned using the ClustalW2 tool (Larkin et al. 2007) under default settings and evolutionary relationships were inferred using the Neighbor-Joining method (Saitou and Nei, 1987) with bootstrap (1000 replicates). There was a total of 6 positions in the final dataset. The tree is drawn to scale, with branch lengths representing number of amino acid differences per sequence. Evolutionary analyses were conducted in MEGA5 (Tamura et al. 2011). (B) For the relative transcription profiles, normalized mean RNA-Seq read counts for in planta and mycelia libraries were LOG10-transformed. The heat map was generated using CIMminer (http://discover.nci.nih.gov/cimminer). White blocks indicate no detectable transcription.
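The LOG10 transformation behind panel B can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `log10_profile` and the gene labels are hypothetical, and the sketch assumes that a mean count of zero is left untransformed (returned as `None`), which would correspond to the white blocks for no detectable transcription. No pseudocount is added, since the legend does not mention one.

```python
import math


def log10_profile(mean_counts):
    """LOG10-transform normalized mean read counts for heat map input.

    mean_counts -- mapping of gene label to normalized mean read count.
    Zero counts map to None (assumption: shown as a white block).
    """
    profile = {}
    for gene, count in mean_counts.items():
        if count < 0:
            raise ValueError(f"negative count for {gene}")
        profile[gene] = math.log10(count) if count > 0 else None
    return profile


# Hypothetical normalized means for one library.
in_planta = {"gene_a": 1000.0, "gene_b": 10.0, "gene_c": 0.0}
transformed = log10_profile(in_planta)
print(transformed)
```

Applying the same function to the mycelia library gives the second heat map column, so each gene is compared on a common logarithmic scale.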


Figure 1. [Phylogenetic tree of the AM, Phi and Miami isolates with GenBank reference sequences; clade labels: L. theobromae, Lasiodiplodia sp., L. rubropurpurea; scale bar: 0.02.]


Figure 2. [Bar chart; y-axis: AUDPC (%), 0 to 140; groups: L. theobromae, Lasiodiplodia sp., L. rubropurpurea.]


Figure 3. [Venn diagram; surviving set labels: DC, BC.]


Figure 4. [Panel A: tree of putative effector proteins labeled by LTHEOB gene identifier, with bootstrap values; panel B: heat map with a color scale from 0.0 to 2.5.]
