Genome
Genome and transcriptome analysis of the latent pathogen Lasiodiplodia theobromae, an emerging threat to the cacao industry
Journal: Genome
Manuscript ID gen-2019-0112.R1
Manuscript Type: Article
Date Submitted by the Author: 05-Sep-2019
Complete List of Authors:
Ali, Shahin; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Asman, Asman; Hasanuddin University, Department of Viticulture & Enology
Shao, Jonathan; USDA-ARS Northeast Area
Balidion, Johnny; University of the Philippines Los Banos
Strem, Mary; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Puig, Alina; USDA/ARS Miami, Subtropical Horticultural Research Station
Meinhardt, Lyndel; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Bailey, Bryan; Sustainable Perennial Crops Laboratory, United States Department of Agriculture
Keyword: Cocoa, Lasiodiplodia, genome, transcriptome, effectors
Is the invited manuscript for consideration in a Special Issue?: Not applicable (regular submission)
https://mc06.manuscriptcentral.com/genome-pubs Page 1 of 46 Genome
Genome and transcriptome analysis of the latent pathogen Lasiodiplodia theobromae, an emerging threat to the cacao industry
Shahin S. Ali1,2, Asman Asman3, Jonathan Shao4, Johnny F. Balidion5, Mary D. Strem1, Alina S. Puig6, Lyndel W. Meinhardt1 and Bryan A. Bailey1*
1Sustainable Perennial Crops Laboratory, USDA/ARS, Beltsville Agricultural Research Center-West, Beltsville, MD 20705, USA.
2Department of Viticulture & Enology, University of California, Davis, CA 95616
3Department of Plant Pests and Diseases, Hasanuddin University, South Sulawesi, Indonesia.
4USDA/ARS, Northeast Area, Beltsville, MD 20705, USA.
5Institute of Weed Science, Entomology and Plant Pathology, University of the Philippines, Los Banos, Laguna 4031, Philippines.
6Subtropical Horticultural Research Station, USDA/ARS, Miami, FL 33158, USA
*Corresponding author:
Phone: 1-301-504-7985; Fax: 1-301-504-1998
E-mail: [email protected]
Abstract
Lasiodiplodia theobromae (Ltheo), a member of the Botryosphaeriaceae family, is becoming a significant threat to crops and woody plants in many parts of the world, including the major cacao growing areas. During attempts to recover Ceratobasidium theobromae, the causal agent of vascular streak dieback (VSD), from symptomatic cacao stems, 74% of the recovered fungi were Lasiodiplodia spp. Sequence-based identification of 52 putative Lasiodiplodia isolates indicates that diverse Lasiodiplodia species are associated with cacao in the studied areas, and the isolates showed variation in aggressiveness when assayed using cacao leaf discs. The current study reports the 43.75 Mb de novo assembled genome of a Ltheo isolate from cacao. Ab initio gene prediction generated 13,061 protein-coding genes, of which 2,862 are unique to Ltheo when compared to other closely related Botryosphaeriaceae fungi. Transcriptome analysis revealed that 11,860 predicted genes were transcriptionally active and 1,255 were more highly expressed in planta compared to cultured mycelia. The predicted genes differentially expressed during infection were mainly those involved in carbohydrate, pectin and lignin catabolism, cytochrome P450s, necrosis-inducing proteins and putative effectors. These findings significantly expand our knowledge of the Ltheo genome and of Ltheo genes involved in virulence and pathogenicity.
Keywords: Cocoa, Lasiodiplodia, genome, transcriptome, effectors
Introduction
Lasiodiplodia theobromae (Pat.) Griffon & Maubl. (Ltheo), a member of the Botryosphaeriaceae family, is often considered a latent plant pathogen and attacks more than 500 plant species in the tropics and subtropics (Burgess et al. 2006; Slippers and Wingfield 2007). The significance of disease caused by Ltheo appears to be increasing in many parts of the world, perhaps in association with global climate change. Environmental factors such as temperature and drought are known to influence the interactions between Ltheo and its plant hosts (Paolinelli-Alfonso et al. 2016; Yan et al. 2017; Songy et al. 2019). The effects of climate change on cacao, and on the tropics in general, are of increasing concern (Medina and Laliberte 2017). Theobroma cacao L. (cacao), the source of chocolate, is the major source of income for six million farmers located in tropical climates around the world (World Cocoa Foundation 2014). Most cacao farmers have small plots of land, and many suffer major income losses due to destructive diseases (Ploetz 2016). Although Ltheo was first reported to cause pod rot and dieback in cacao in 1923 (Nowell 1923), it was not considered a major pathogen of cacao. However, Ltheo has been suggested as a significant constraint for cacao production in some locations, and isolates from symptomatic tissues can cause stem cankers and diebacks when artificially inoculated onto cacao tissues (Mbenoun et al. 2008; Alvindia and Gallema 2017; del Castillo et al. 2016). Typical symptoms caused by Ltheo on cacao can resemble those of other diseases, and there is speculation concerning associations of Ltheo with other cacao diseases, such as canker caused by Phytophthora species (Jaiyeola et al. 2014) and vascular streak dieback (VSD) caused by Ceratobasidium theobromae (Alvindia and Gallema 2017; McMahon and Purwantara 2016). Alvindia and Gallema (2017) reproduced many of the symptoms commonly associated with VSD on cacao seedlings by inoculating the young leaves with Ltheo. Symptoms included leaf chlorosis, necrotic blotches, and discoloration of leaf scars and stem vascular tissue.
How Ltheo causes disease on such a wide range of hosts is a question of considerable interest. To establish an infection, Ltheo and related members of the family Botryosphaeriaceae must overcome both preformed and inducible host defenses (Yan et al. 2017), which can vary significantly among hosts. Recently published draft genomes of woody plant pathogens in the Botryosphaeriaceae have provided information about a range of potential virulence factors such as effectors and cell wall modifying enzymes (Blanco-Ulate et al. 2013; Morales-Cruz et al. 2015; Paolinelli-Alfonso et al. 2016; van der Nest et al. 2014; Yan et al. 2017). These studies reported the presence of expanded gene families associated with cell wall degradation, membrane transport, nutrient uptake and secondary metabolism, which contribute to adaptations for degrading grapevine tissue (Yan et al. 2017). Although genome and transcriptome analyses of Ltheo strains pathogenic to grapevine, along with other, mostly grapevine-associated, Botryosphaeriaceae species, have provided a better understanding of Ltheo biology, the study of a strain pathogenic on cacao would help show how widespread some basic aspects of Ltheo biology are.
In this study, we describe the genome and transcriptome of Ltheo isolate AM2As, which was isolated from a cacao stem showing symptoms of vascular streak dieback. The genome of Ltheo isolate AM2As is compared to the genomes of closely related Botryosphaeriaceae pathogens. Expressed genes within the Ltheo genome are characterized through RNA-Seq analysis using infected cacao leaves, and potential effectors expressed during the infection process are identified. These insights add to our understanding of the Ltheo genome and transcriptome.
Materials and methods
Isolation and maintenance of fungi
Cacao stems showing symptoms of vascular streak dieback (VSD) were collected from the Wotu District of South Sulawesi Province, Indonesia (coordinates S 02°33'33.30", E 120°47'51.74"; elevation 39 m) and the Davao Region of the Philippines between 2014 and 2016. Samples were shipped to the USDA-APHIS-PPQ facility at Beltsville, USA, and transferred to the USDA-ARS Sustainable Perennial Crops Laboratory in Beltsville after inspection. Bark was removed and the remaining stem material was cut into 1 cm long segments. Stem segments were surface sterilized by submerging them in 6% (v/v) bleach solution (Clorox, USA) for three minutes, followed by three rinses in sterile water. Stem segments were placed on 1.5% water agar (Difco Laboratories, USA) in 100 mm diameter plastic Petri plates and incubated at 25°C in the dark. After 2-3 days, the stem segments were examined for hyphal growth where the segment was in contact with the agar. Fungal mycelia were transferred from water agar to Corticium Culture Media (Samuels et al. 2012) containing 100 µg/ml ampicillin. Note that the procedure to this point is the one typically used when attempting to isolate C. theobromae, the causal agent of VSD, from cacao tissues, and that other organisms are commonly isolated from tissues showing symptoms of VSD. Subsequently, fungal isolates were transferred to 20% clarified V8 (CV8)-agar plates and maintained at room temperature.
Molecular identification of Lasiodiplodia spp.
DNA Extraction
For DNA extraction, 2-3 agar plugs (0.25 cm²) from 5-day-old cultures of each isolate were transferred to 50 ml Falcon tubes containing 20 ml liquid CV8 and grown at room temperature for 5 days while shaking at 100 rpm. Mycelia were harvested and DNA was extracted as previously described by Ali et al. (2016).
PCR amplification and DNA sequencing of ITS region and EF1α gene
For molecular identification and phylogenetic analysis, PCR amplification of the ITS region and the elongation factor 1-alpha (EF1α) gene from the template DNA of 52 Lasiodiplodia isolates was performed using the primers ITS4 and ITS5 described by White et al. (1990) and EF1-688F and EF1-1251R described by Alves et al. (2008), respectively. PCR amplification, product purification and sequencing were performed as previously described by Ali et al. (2016). Heterozygous or ambiguous sites were labelled using the IUPAC code, and sequences were exported for phylogenetic analysis.
Phylogenetic analysis
A phylogenetic analysis was undertaken to confirm the sequence-based molecular identification and to characterize variation within the Lasiodiplodia isolates associated with cacao. For better phylogenetic representation, the ITS region and EF1α sequences were combined and aligned using the ClustalW2 tool (Larkin et al. 2007) under default settings. A phylogenetic tree was reconstructed using the Maximum Likelihood method based on the Poisson correction model, and a distance tree of 1000 bootstrapped data sets was generated using MEGA v. 6 (Tamura et al. 2011).
Plant material and leaf disc bioassay
To identify a Lasiodiplodia isolate aggressive on cacao leaves, a leaf disc infection bioassay was carried out using a subset of 14 Lasiodiplodia isolates representing the genetic diversity identified. Stage 3 cacao leaves (light green but non-hardened) (Bailey et al. 2005) were harvested from cacao trees of clone ICS1. Single leaf discs 2.1 cm in diameter were cut and placed into 60 x 15 mm Petri dishes lined with Whatman no. 2 filter paper soaked with sterile distilled water. For inoculation, an agar plug (5 mm diameter) was placed off center beside the midrib of the leaf disc. Controls were treated with water agar plugs. Petri plates were covered and incubated at 25°C under 12 h light (200 lx) and dark cycles. Observations were taken at 1-day intervals, and the progression of the necrotic area was quantified as the percentage of the area covered. Necrosis progression was recorded separately for the leaf blade (lamina), main vein (midrib) and vein. Observations were taken up to 4 days after inoculation, and the area under the disease progress curve (AUDPC) was calculated according to Shaner and Finney (1977). Each experiment was repeated independently twice with three replicate leaf discs (one per plate) per isolate per experiment. The homogeneity of data sets across replicate experiments was confirmed by two-tailed Pearson correlation analyses conducted using mean data values within GraphPad Prism version 7.0 (r 0.9; P ≤ 0.001). Therefore, data sets from the replicate experiments (a total of 6 leaf discs per isolate) were pooled for further analysis. The significance of treatment effects was analyzed within GraphPad Prism version 7.0 by two-way ANOVA with post-hoc pairwise uncorrected Fisher's Least Significant Difference (LSD) comparisons (P = 0.05).
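The AUDPC calculation referred to above is commonly implemented with the trapezoidal rule: the mean severity of each pair of consecutive observations is multiplied by the time between them, and the products are summed. The following is a minimal sketch of that calculation, not the authors' own script; the function name and the necrosis values are hypothetical and serve only as illustration.

```python
def audpc(times, severities):
    """Area under the disease progress curve by the trapezoidal rule.

    times      -- observation times (e.g. days post-inoculation), ascending
    severities -- disease severity at each time (e.g. % necrotic area)
    """
    if len(times) != len(severities) or len(times) < 2:
        raise ValueError("need at least two paired observations")
    total = 0.0
    for i in range(len(times) - 1):
        interval = times[i + 1] - times[i]
        mean_severity = (severities[i] + severities[i + 1]) / 2
        total += mean_severity * interval
    return total


# Hypothetical leaf-blade necrosis (% of area) scored daily for 4 days.
days = [1, 2, 3, 4]
necrosis_percent = [0.0, 10.0, 40.0, 80.0]
print(audpc(days, necrosis_percent))  # 90.0 (%-days)
```

Because the result integrates severity over time, an isolate that causes necrosis earlier scores higher than one reaching the same final severity later.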
Isolation of RNA from mycelia and infected plant material
For RNA extraction from Ltheo isolate AM2As mycelia, 2-3 agar plugs from a V8 agar plate culture were transferred to 250 ml conical flasks containing 50 ml liquid CV8. Liquid cultures were grown for 5 days at 25°C with shaking at 100 rpm. Mycelia were harvested by vacuum filtration through Miracloth (Calbiochem, San Diego, USA) and rinsed three times with sterile distilled water, followed by flash freezing in liquid nitrogen and freeze drying. Freeze-dried mycelia were ground with a mortar and pestle in liquid nitrogen and transferred to a 50 mL centrifuge tube containing 15 mL of 65°C extraction buffer (Bailey et al. 2005). The remaining extraction procedure was conducted as described by Bailey et al. (2013). Using a NanoDrop spectrophotometer (Thermo Scientific, Wilmington, DE), RNA concentrations were determined based on absorbance at 260 nm, and purity was estimated from the 260/280 and 260/230 ratios.
For RNA extraction from infected leaf discs, the leaf disc bioassay was conducted essentially as described above, with the following exceptions. Leaf discs (55 mm diameter) cut from stage 2 cacao leaves with the midrib in the center were used for the assay. Agar plugs carrying mycelia of isolate AM2As or sterile V8 agar (controls) were placed off center on both sides of the midrib of the leaf disc (two agar plugs per leaf disc). Leaf discs were harvested 48 h after inoculation, the agar plugs were removed, and the leaf discs were flash frozen in liquid nitrogen. Samples were ground with a mortar and pestle in liquid nitrogen and RNA was extracted as described above.
Genome sequencing and assembly
Ltheo isolate AM2As genomic DNA was sequenced using Illumina paired-end short-read technology (library preparation and sequencing were done by the Beijing Genomics Institute, Shenzhen, China). The DNA sample was sheared into small fragments of the desired size using a Covaris S/E210. The overhangs resulting from fragmentation were converted into blunt ends using T4 DNA polymerase, Klenow fragment and T4 polynucleotide kinase. After adding an "A" base to the 3' end of the blunt phosphorylated DNA fragments, adapters were ligated to the ends of the DNA fragments. The desired fragments were purified through gel electrophoresis and selectively enriched and amplified by PCR to construct a library with a 500 bp insert size. Sequencing was performed using the Illumina X-ten platform. For assembly, 347,857,726 short reads (100 bp) were trimmed using BBMap version 37.58 (Bushnell 2014), and 173,386,781 paired-end reads were assembled using SPAdes Genome Assembler version 3.11.0 (Bankevich et al. 2012) in read error correction and assembly mode. The key k-mer parameters were set at K21, K33 and K55.
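As an illustration of the assembly settings described above, the sketch below builds a SPAdes command line with k-mer sizes 21, 33 and 55. It is a hedged reconstruction, not the command the authors ran: the read file names and output directory are hypothetical, and the only options used are the standard `-1`, `-2`, `-k` and `-o` flags of `spades.py`. When neither `--only-error-correction` nor `--only-assembler` is given, SPAdes runs both read error correction and assembly.

```python
def spades_command(reads_1, reads_2, out_dir, kmers=(21, 33, 55)):
    """Return the argument list for a paired-end SPAdes run.

    reads_1, reads_2 -- trimmed forward and reverse read files (hypothetical names)
    out_dir          -- directory SPAdes writes its results to
    kmers            -- odd k-mer sizes, passed as a comma-separated list
    """
    kmer_list = ",".join(str(k) for k in kmers)
    return ["spades.py", "-1", reads_1, "-2", reads_2, "-k", kmer_list, "-o", out_dir]


cmd = spades_command("AM2As_trimmed_1.fq.gz", "AM2As_trimmed_2.fq.gz", "AM2As_spades")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where SPAdes is installed.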
Ab initio gene prediction
Ab initio gene prediction was performed on the assembly using AUGUSTUS version 2.7 (Stanke et al. 2004), trained with Diplodia corticola gene models (GenBank: MNUE01000000). The predicted proteins were compared against the NCBI non-redundant (NR) protein database by BLASTp to identify biological functions (Altschul et al. 1997). Open reading frames were also annotated using Blast2GO (http://www.blast2go.com/b2ghome) (Conesa et al. 2005) and the KEGG database of metabolic pathways (Moriya et al. 2007).
Identification of core eukaryotic genes
To assess the completeness of the assembly and predicted genes, the Benchmarking Universal Single-Copy Orthologs (BUSCO) (Simão et al. 2015) strategy was used. BUSCO was run using the eukaryote profile under default settings.
Determining secretomes and effectors
Ltheo protein coding sequences were scanned for possible signal peptides using SignalP, version 3.0 (Petersen et al. 2011). The amino acid sequences containing predicted signal peptides were scanned for transmembrane domains using the TMHMM program (prediction of transmembrane helices in proteins) (Sonnhammer et al. 1998). Proteins with no more than one transmembrane domain were considered potential components of the secretome. Fungal effectors within the secretome were identified using the machine learning program EffectorP 2.0 (Sperschneider et al. 2018a).
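The secretome rule described above (a predicted signal peptide and no more than one transmembrane domain) can be sketched as a simple filter. This is a minimal illustration that assumes the SignalP and TMHMM results have already been parsed into one record per protein; the record type, field names and protein identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinPrediction:
    protein_id: str
    has_signal_peptide: bool  # from the signal peptide predictor
    tm_helices: int           # number of predicted transmembrane helices


def secretome(predictions, max_tm_helices=1):
    """Return IDs of proteins with a signal peptide and at most max_tm_helices helices."""
    return [
        p.protein_id
        for p in predictions
        if p.has_signal_peptide and p.tm_helices <= max_tm_helices
    ]


# Hypothetical predictions for four proteins.
example = [
    ProteinPrediction("g0001", True, 0),   # secreted candidate
    ProteinPrediction("g0002", True, 1),   # kept: a single helix is allowed
    ProteinPrediction("g0003", True, 4),   # likely membrane protein, excluded
    ProteinPrediction("g0004", False, 0),  # no signal peptide, excluded
]
print(secretome(example))  # ['g0001', 'g0002']
```

Allowing one helix is a common choice because the hydrophobic signal peptide itself is sometimes reported as a transmembrane helix.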
Determining pathogenicity related proteins
To determine the pathogenicity related proteins, Ltheo predicted proteins were compared with the Pathogen-Host Interaction database (PHI-base) (Winnenburg et al. 2006). The predicted protein sequences were searched against PHI-base (version 4.4) locally using BlastStation2 software (TM Software, Arcadia, CA). Proteins with E < 10^-10 and >40% sequence identity in BLASTp were considered homologs.
Identification of genes encoding cell wall degrading enzymes
To identify Ltheo genes encoding carbohydrate-active enzymes acting on cell walls and other organic matter, Ltheo predicted proteins were analyzed with the BLASTp program against the Carbohydrate-Active enzymes database (CAZymes) at a threshold value of E < 10^-10. Proteins with a sequence identity of more than 40% to biochemically characterized CAZymes were considered candidates.
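The homology thresholds used for both PHI-base and the CAZymes database (E < 10^-10 and more than 40% identity) can be applied to BLASTp results with a short filter. The sketch below assumes the hits were exported in the standard 12-column BLAST tabular layout (query, subject, percent identity, ..., E-value, bit score); the paper does not state the export format of BlastStation2, so this layout, the function name and the example identifiers are assumptions.

```python
def filter_homologs(lines, max_evalue=1e-10, min_identity=40.0):
    """Keep hits with E-value below max_evalue and identity above min_identity.

    lines -- iterable of tab-separated BLAST hits in the 12-column tabular layout:
             qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])
        evalue = float(fields[10])
        if evalue < max_evalue and identity > min_identity:
            hits.append((query, subject, identity, evalue))
    return hits


# Hypothetical hits: only the first passes both thresholds.
example_hits = [
    "g0001\tPHI:0001\t62.5\t320\t110\t3\t1\t318\t5\t322\t3e-95\t310",
    "g0002\tPHI:0002\t35.0\t280\t170\t6\t4\t280\t2\t275\t1e-30\t120",
    "g0003\tPHI:0003\t55.0\t60\t27\t0\t1\t60\t1\t60\t2e-05\t45",
]
for hit in filter_homologs(example_hits):
    print(hit)
```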
Transcriptome sequencing
RNA-Seq analysis of three replicate RNA samples from mycelia and from infected plant material was carried out by the National Center for Genome Resources (Santa Fe, NM, USA). cDNA was generated using the TruSeq RNA library preparation protocol developed by Illumina Technologies (San Diego, CA). Using the kit, mRNA was first isolated from total RNA by a polyA selection step, followed by construction of paired-end sequencing libraries with an insert size of 160 bp. Paired-end sequencing was performed using the Illumina HiSeq2000 platform. Samples were multiplexed with unique six-mer barcodes, generating filtered (for Illumina adapters/primers and PhiX contamination) 2x50 bp reads. The sequences acquired by RNA-Seq were verified by comparison to the genome assembled in this study. Reads from the RNA-Seq libraries, which ranged from 42 to 78 million reads in fastq format, were trimmed using BBDuk version 37.58 (Bushnell 2014) with adapters.fa and the parameters ktrim=r, k=23, mink=11, hdist=1, tpe, tbo. Trimmed reads were aligned to the coding sequences (CDS) of the Ltheo genome using HISAT2 2.1.0 (Pertea et al. 2016). Tabulated raw read counts for each CDS were obtained from the HISAT2 alignment. Estimation and statistical analysis of expression levels, using the count data of each gene with three replicates per library, were performed using the DESeq2 package in the R statistics suite (Anders and Huber 2010). With DESeq2's default normalization method, a scaling factor is calculated for each lane as the median, across genes, of the ratio of each gene's read count to its geometric mean across all lanes; the scaling factors are then applied to all read counts.
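The median-of-ratios normalization described above can be illustrated in a few lines. This is a pure-Python sketch of the idea, not the DESeq2 implementation itself; the function name and the count matrix are hypothetical. Genes with a zero count in any sample are skipped because their geometric mean is zero.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios scaling factors, one per sample.

    counts -- list of genes, each a list of raw read counts (one per sample)
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for gene in counts:
        if any(c <= 0 for c in gene):
            continue  # geometric mean would be zero; gene is not informative
        log_geo_mean = sum(math.log(c) for c in gene) / n_samples
        for j, c in enumerate(gene):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced twice as deeply as sample 1.
raw = [[10, 20], [100, 200], [30, 60], [0, 5]]
factors = size_factors(raw)
normalized = [[c / f for c, f in zip(gene, factors)] for gene in raw]
print(factors)
print(normalized[0])
```

Dividing each count by its sample's factor puts the two samples on a common scale, so the first three genes end up with equal normalized counts in both samples.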
Data availability
The DNA sequence data of the ITS 1 and 2 regions of the 52 Lasiodiplodia spp. have been deposited at GenBank under the accessions MH412939 to MH412990. The complete nucleotide sequence assemblies and the Whole Genome Assembly of the Ltheo isolate AM2As have been deposited at GenBank under the accession QCYV00000000, under BioProject PRJNA388190. The combined transcriptome assembly from multiple tissues has been uploaded as supplementary Excel files available with the article through the journal Web site.
Results and Discussion
Collection and identification of fungi
VSD, caused by the near-fastidious basidiomycete C. theobromae, is a serious threat to the cacao industry in South East Asia (McMahon and Purwantara 2016). In an attempt to isolate the VSD pathogen, we initially collected stems showing symptoms of VSD from Indonesia. Using water agar medium, 47 fungal isolates were obtained from these stems; based on BLASTn searches of the ITS rDNA sequence, 74% were identified as Lasiodiplodia spp. and 15% as Diaporthe sp., the remainder being Fusarium spp. (gen-2019-0070.R3Suppla Figure S1).
We also collected cacao stem samples with VSD symptoms from the Philippines and isolated fungal cultures as described above. Fifteen fungal isolates from the Philippines, along with 2 isolates from Miami, Florida, were validated as Lasiodiplodia spp. based on BLASTn searches of their ITS rDNA sequences. Alvindia and Gallema (2017) reported that Ltheo causes many of the symptoms associated with VSD in the Philippines, based on the isolation of the organism from symptomatic material. It is noted here that C. theobromae, being near fastidious, is difficult to isolate from infected tissues (Samuels et al. 2012), so the isolation of other organisms from VSD-symptomatic tissues is not unexpected and does not preclude the original tissue symptoms being caused by C. theobromae. Regardless, the recovery of multiple isolates of several Lasiodiplodia species, some of which we demonstrate have the potential to cause disease symptoms on cacao that mimic, in part, the symptoms of VSD, cannot be ignored.
Whether Lasiodiplodia acts as a primary, latent or opportunistic pathogen, it is important to understand the molecular diversity and pathogenic makeup of the organism. Alignment of the ITS rDNA sequences obtained here showed 97-100% similarity with the ITS rDNA sequences of various Lasiodiplodia spp. reported in the NCBI database (gen-2019-0070.R3Supplb). The 50 Lasiodiplodia isolates identified herein from South East Asia, along with two isolates obtained from cacao in Miami, Florida, were subjected to a phylogenetic study to assess the diversity of Lasiodiplodia species associated with cacao in the areas studied herein. The regions sequenced in this investigation include the entire ITS1 and ITS2 regions, the 5.8S rRNA gene, and part of the EF1α gene. As the intraspecific variation of the ITS rDNA is usually low in Lasiodiplodia (Alves et al. 2008), the EF1α gene sequence was combined with the ITS rDNA sequence for the 52 Lasiodiplodia isolates acquired herein, and for other Lasiodiplodia spp. and the related Botryosphaeriaceae fungi Diplodia corticola and Botryosphaeria lutea, whose sequences were obtained from GenBank. The maximum likelihood tree grouped the 52 isolates on distinct branches in accordance with these species (Figure 1). The Lasiodiplodia clade was resolved into five sub-clades corresponding to L. gonubiensis, L. crassispora, L. rubropurpurea, L. venezuelensis and L. theobromae (Alves et al. 2008). All isolates (except AM54B, AM52 and AM54B2) fall into the L. theobromae sub-clade. Thirty-three isolates can be confirmed as Ltheo based on their sequence similarity to known Ltheo isolates and their phylogenetic position, with almost no intraspecific diversity except for isolate AM50A (Figure 1). The Ltheo group includes multiple isolates from Indonesia and the Philippines. Another set of 16 isolates separates from the Ltheo isolates into multiple diverse groups at a 50% confidence limit when bootstrap analysis was performed with 1000 replicates (Figure 1). This set of 16 diverse isolates may represent cryptic species. Although the BLASTn search against the NCBI database indicates that the isolates belonging to these diverse groups are most likely L. pseudotheobromae, a few isolates showed sequence similarity with other Lasiodiplodia species (gen-2019-0070.R3Supplb). Because, in many instances, Lasiodiplodia isolates with exactly the same sequence have been designated as different Lasiodiplodia spp. in the NCBI database, more marker genes would need to be sequenced to clearly identify these species. The third group of 3 isolates (AM54B, AM52 and AM54B2) was distinct, being close to L. rubropurpurea with 99% sequence similarity (Figure 1 and gen-2019-0070.R3Supplb). L. rubropurpurea has previously been reported from eucalyptus trees in Australia (Van der Linde et al. 2011; Burgess et al. 2006); this is the first report of that species outside Australia. Though further in-depth study would be needed to identify the exact species for many of these isolates, it is clear that cacao from these two areas of Indonesia and the Philippines harbors multiple species of Lasiodiplodia. That multiple species interact with cacao in the field brings into focus the need for a better understanding of their potential pathogenicity on cacao. Therefore, we selected 14 isolates representing the three groups to conduct bioassays and test their aggressiveness on cacao.
Leaf disc bioassay
All 14 Lasiodiplodia isolates tested started to show water-soaked lesions on the inoculated leaf disc at 1 day post-inoculation (dpi), and necrosis started at 2 dpi (gen-2019-0070.R3Suppla Figure S2). AUDPC was calculated using the necrotic area (% of area) measured for the three primary tissues of the leaf (main vein, auxiliary veins, and leaf blade) at 24 h intervals over 4 days. Though the rate of necrosis progression varied between the three tissues for some isolates, the isolates showed similar trends across tissues (gen-2019-0070.R3Suppla Figure S3). Focusing on the leaf blade, there was variation in the rate of spread of necrosis among the isolates tested (P < 0.05) (Figure 2). Among the 6 Ltheo isolates tested (AM50B, 21A, 29A, 2As, 19B and 50A), AM2As and AM50A were the most aggressive. Among the other 6 Lasiodiplodia sp. isolates (AM27C, AM27B, AM36A, AM19A, AM25B and AM54A), AM27B was the most aggressive of all the isolates tested herein, while AM27C and AM54A were the least aggressive. The 2 putative L. rubropurpurea isolates tested (AM54B and AM52) had limited aggressiveness in this assay (Figure 2). Interspecies variation in pathogenicity has been observed among the Botryosphaeriaceae (Urbez-Torres and Gubler 2009). Here we see variation in aggressiveness among Lasiodiplodia spp. The presence of multiple Lasiodiplodia spp., coupled with variation in aggressiveness and a common appearance of symptoms at the later stages of infection, is likely to hinder the development of disease management strategies in cacao. We selected Ltheo isolate AM2As for further study because it is aggressive and a member of the most common Lasiodiplodia species recovered. We report here the draft genome sequence of Ltheo isolate AM2As to identify and better understand its possible virulence determinants, an understanding that may prove critical to disease management.
Genome Assembly and Annotation
Ltheo isolate AM2As was subjected to whole-genome shotgun sequencing using Illumina technology (Table 1). Around 347 million Illumina reads were fed into the SPAdes Genome Assembler, which resulted in a 43.75 Mb genome sequence with approximately 91X coverage. The assembly consisted of 833 contigs with an N50 length of 0.87 Mb. The overall GC content of the Ltheo genome is 54.73%. For quantitative assessment of genome completeness, BUSCO analysis was conducted, which indicated that Ltheo contains 99.23% of the examined loci (95.2% complete genes and 3.95% fragmented genes). The recently published genome of Ltheo isolate CSS-01s has a very similar size of 43.7 Mb and an average GC content of 54.8% (Yan et al. 2017). Compared to other pathogens within the Botryosphaeriaceae family, the estimated genome size of Ltheo was larger than that of Diplodia corticola (34.9 Mb), but similar to those of Neofusicoccum parvum (42.59 Mb) (Blanco-Ulate et al. 2013) and Macrophomina phaseolina (48.8 Mb) (Islam et al. 2012).
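Two of the assembly statistics reported above, N50 and GC content, are simple to compute from the contigs. The sketch below shows the usual definitions: N50 is the length of the shortest contig in the set of longest contigs that together cover at least half of the assembly, and GC content is the percentage of G and C among the A, C, G and T bases. The function names and the toy inputs are hypothetical; the real values would come from the AM2As assembly FASTA.

```python
def n50(lengths):
    """Contig length at which the cumulative sum, longest first, reaches half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs given")


def gc_content(sequences):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    gc = 0
    acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    return 100.0 * gc / acgt


# Toy example with hypothetical contigs.
print(n50([2, 3, 4, 5, 6]))          # 5
print(gc_content(["GGCC", "AATT"]))  # 50.0
```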
328 Ab initio gene prediction using AUGUSTUS, generated 13,061 protein-coding genes in
329 Ltheo genome with an average sequence length of 1,639.5bp. The gene density calculated was
330 0.489, meaning that the coding regions of these predicted genes covers 48.9% of the whole
331 genome. Using the same ab initio gene prediction strategy, Blanco-Ulate et al. (2013) has
332 reported 10,366 protein coding genes in N. parvum and Islam et al. (2012) has predicted 12,231
333 protein coding genes in M. phaseolina. Whereas, using a slightly different approach, Yan et al.
15 https://mc06.manuscriptcentral.com/genome-pubs Genome Page 16 of 46
334 (2017) has predicted 12,902 protein coding genes in Ltheo isolate CSS-01s. As
335 Botryosphaeriaceae fungi carry limited numbers of repetitive elements (around 3.4%) in their
336 genomes (Yan et al. 2017), the high similarity in genetic structure of these organisms was
337 expected. Functional annotation of the 13,061 predicted genes showed that 7,980 (61%) could be
338 assigned with GO terms and 4,035 (30.8%) predicted genes could be mapped to the KEGG
339 pathway database (Table 1 and gen-2019-0070.R3Supplc). Among the 13,061 predicted Ltheo
340 proteins, 1,372 were further predicted to contain signal peptides. Finally, scanning for
341 transmembrane proteins using the TMHMM program, 1,202 proteins with no more than one
342 transmembrane domain were considered components of the secretome (gen-2019-
343 0070.R3Supplc). A total of 1,279 predicted Ltheo proteins showed homology with the proteins
344 included in the PHI-base. Among that, 150Draft predicted proteins are related to loss of pathogenicity,
345 533 are related to reduced virulence, 20 are related to increased virulence, while 71 predicted
346 proteins are related to lethality when their functions are disrupted (gen-2019-0070.R3Supplc).
347 Similarly, 1,120 PHI-base hits were identified in D. seriata, while 1,384 were identified in N.
348 parvum (Morales-Cruz et al. 2015). To understand the potential Ltheo genes involved in organic
349 matter degradation we identified CAZymes in the transcriptome. A total of 606 CAZymes
350 mapping to 718 predicted Ltheo proteins were identified (gen-2019-0070.R3Supplc). Although a
351 direct comparison was not possible, as Yan et al. (2017) have not released the assembled
352 CDS/protein sequence data for the published genome (GenBank accession no. MDYX01000000),
353 these numbers are quite similar to those for Ltheo isolate CSS-01s (Yan et al. 2017).
354 Comparisons to other fungi
355 Comparative genomics reveals information on genetic variation and evolutionary dynamics
356 between species and their specific adaptations. Among the published Botryosphaeriaceae
357 genomes, Ltheo AM2As has the second largest genome after M. phaseolina (48.88 Mb) (Table
358 2). To identify predicted genes that are exclusive to a genome or to a group, bidirectional
359 BLAST analyses were conducted with Ltheo, Diplodia corticola, Neofusicoccum parvum and
360 Botrytis cinerea, enabling identification of shared genes, such as pathogenicity genes, and
361 family-specific genes. D. corticola and N. parvum were selected because they are Botryosphaeriaceae
362 and exhibit similar environmental adaptations, whereas B. cinerea is a highly pathogenic
363 non-Botryosphaeriaceae species. The number of predicted species-specific genes is 2,862 for Ltheo (LT),
364 1,086 for D. corticola (DC), 1,269 for N. parvum (NP) and 10,798 for B. cinerea (BC) (Figure
365 3). Of the 2,862 Ltheo-specific predicted genes, 1,661 have putative functions and 1,619 were
366 detected as transcriptionally active by RNA-Seq analysis (with ≥5 normalized reads, either in
367 mycelia, or in planta) (gen-2019-0070.R3Suppld). Based on the annotation, several of these
368 predicted genes appear to have a role in plant defense modification, cell adhesion, cell wall
369 degradation, pectin degradation, oxidoreductase activity and membrane transport
370 (gen-2019-0070.R3Suppld). Ltheo has been reported to be the most virulent pathogen of grapevine in
371 the Botryosphaeriaceae family (Úrbez-Torres and Gubler 2009). The species-specific genes
372 identified here likely support that virulence. We could not include the transcriptome of the other
373 published Ltheo genome of CSS-01s (Yan et al. 2017) in this comparison as the assembled
374 transcriptome has not been made available. A BLASTn search of the predicted genes of Ltheo isolate AM2As
375 against the CSS-01s contigs showed that 9,410 genes are ≥99% similar. Among the 383 Ltheo
376 AM2As genes that are ≤90% similar to the Ltheo CSS-01s genome, 297 are
377 Ltheo-specific (gen-2019-0070.R3Supple). This indicates that, although the two isolates have very
378 similar genomes, there is a set of differentiating genes and those genes are mostly unique to
379 Ltheo compared to other related Botryosphaeriaceae fungi.
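The species-specific gene sets described above come from a bidirectional BLAST comparison. As a minimal sketch of one common way to implement such a comparison (reciprocal best hits), and not necessarily the exact procedure used in this study, the Python below assumes that the best hit for each query has already been extracted from the BLAST output into plain dictionaries. The function names and the toy gene identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def reciprocal_best_hits(
    a_to_b: Dict[str, str], b_to_a: Dict[str, str]
) -> Set[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


def species_specific(
    all_genes: Iterable[str], rbh_sets: Iterable[Set[Tuple[str, str]]]
) -> Set[str]:
    """Genes of the focal species with no reciprocal best hit in any partner."""
    matched = {a for pairs in rbh_sets for a, _ in pairs}
    return set(all_genes) - matched


# Toy example with hypothetical identifiers (not real gene IDs).
lt_genes = ["LT_1", "LT_2", "LT_3"]
lt_vs_dc = reciprocal_best_hits({"LT_1": "DC_7", "LT_2": "DC_9"},
                                {"DC_7": "LT_1", "DC_9": "LT_3"})
lt_vs_np = reciprocal_best_hits({"LT_2": "NP_4"}, {"NP_4": "LT_2"})
print(sorted(species_specific(lt_genes, [lt_vs_dc, lt_vs_np])))
```

In this toy case only LT_3 lacks a reciprocal partner in either comparison genome, so it is the only gene reported as species-specific. Real analyses usually add identity and coverage cutoffs, which are not specified in the text and are therefore left out here.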
380 Differential transcriptome analysis during cacao infection
381 To validate the expression of the Ltheo predicted genes, RNA-Seq was performed on RNA from
382 culture-grown mycelia and from infected cacao leaves at 48 h post infection. The RNA-Seq
383 analysis identified 11,698 and 10,310 transcripts (with ≥5 raw reads) for the mycelia and in planta
384 samples, respectively. Combining these data, 11,860 predicted genes could be detected in the
385 RNA-Seq data from mycelia and infected leaves, leaving only 1,201 predicted genes without
386 transcripts. Islam et al. (2012) reported that only 9,934 predicted genes were transcriptionally
387 active in the M. phaseolina genome with 13,806 predicted genes. Among the transcribed genes,
388 1,255 were preferentially expressed in planta (log2 fold change >2 and Padj <0.05) compared to mycelia.
389 On the other hand, 1,753 transcribed genes were preferentially expressed in culture-grown
390 mycelia (log2 fold change >2 and Padj <0.05) compared to in planta (gen-2019-0070.R3Supplc). KEGG
391 pathway analysis of the differentially expressed genes showed that in planta, although pathways
392 such as starch and sucrose metabolism, pentose and glucuronate interconversions,
393 glycolysis/gluconeogenesis, fatty acid degradation and biosynthesis of unsaturated fatty acids
394 showed some perturbation (gen-2019-0070.R3Supplf), changes to major metabolic pathways
395 were limited in general. This resiliency despite such a significant change in external conditions
396 may partially explain the plasticity of Ltheo, allowing adaptation to its broad host range.
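The thresholds used above to call preferential expression (log2 fold change greater than 2 with an adjusted P value below 0.05) can be illustrated with a short sketch. It assumes that a per-gene log2 fold change and adjusted P value are already available from the differential expression analysis, and that the fold change is expressed as in planta relative to mycelia; that sign convention, the function name and the example rows are assumptions made for illustration only.

```python
import math
from collections import Counter


def classify_gene(log2_fc: float, padj: float,
                  lfc_cutoff: float = 2.0, alpha: float = 0.05) -> str:
    """Label a gene using the thresholds quoted in the text.

    Assumes log2_fc is log2(in planta / mycelia); this sign convention is an
    assumption of the sketch, not something stated in the manuscript.
    """
    # Missing or non-significant adjusted P values are never called.
    if padj is None or math.isnan(padj) or padj >= alpha:
        return "not_significant"
    if log2_fc > lfc_cutoff:
        return "in_planta"
    if log2_fc < -lfc_cutoff:
        return "mycelia"
    return "not_significant"


# Hypothetical (gene, log2 fold change, adjusted P) rows, for illustration only.
rows = [("g1", 3.1, 0.001), ("g2", -2.5, 0.01), ("g3", 4.0, 0.20),
        ("g4", 1.2, 0.001), ("g5", 2.7, float("nan"))]
counts = Counter(classify_gene(lfc, p) for _, lfc, p in rows)
print(counts["in_planta"], counts["mycelia"], counts["not_significant"])
```

Both conditions must hold for a gene to be counted: a large fold change with a non-significant adjusted P value (g3), or a significant but small change (g4), leaves the gene in the unchanged class.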
397 Yan et al. (2017) identified a total of 285 up-regulated genes and 243 down-regulated
398 genes during the early infection stages of Ltheo in grape vine. Yan et al. (2017) also reported that
399 the up-regulated genes were largely secreted, facilitating the degradation of cell walls. Genome
400 comparison of Botryosphaeriaceae species with opportunistic, pathogenic and non-pathogenic
401 fungi revealed that they are more closely related to opportunistic fungi than the other two (Yan et
402 al. 2017). Endophytic fungi like Trichoderma spp. on cacao also utilize genes encoding enzymes
403 targeting digestion and modification of the host's cell wall (Bailey et al. 2006). That these
404 enzymes would be important in establishing associations between fungi and plants seems
405 logical since they likely function in acquisition of nutrients from the surrounding environment,
406 whatever the outcome of the interaction might be. Ltheo has been described as a primary
407 pathogen (Machado et al. 2014), opportunistic pathogen (Mullen et al. 1991), or endophyte
408 (Rubini et al. 2005) depending on the specific interaction and situation of isolation. Considering
409 the complexity of Ltheo interactions with cacao, a closer look at the manner in which Ltheo interacts with
410 plants through its transmembrane and secreted proteins is warranted.
411 Membrane proteins
412 Membrane transporters play a vital role in fungal pathogenesis and protection against
413 host defense mechanisms (Perlin et al. 2014). Plant pathogenic fungi depend heavily on their
414 ability to exploit host-derived resources like carbohydrates or peptides, for which they need the
415 transporters to facilitate the uptake of degraded cellular components. During the infection
416 process, fungi must also continually expel phytotoxins or xenobiotics that would
417 otherwise hinder their success, a process that can also rely on membrane transporters. Based
418 on the screening result for transmembrane proteins using the TMHMM program and BLASTp
419 search against NCBI non-redundant (NR) protein databases, 827 Ltheo proteins are predicted to
420 be membrane bound transporter proteins, with MFS transporters being the largest group (Table
421 3). Ltheo encodes the highest number of membrane transporters among all the sequenced
422 Botryosphaeriaceae species (Yan et al. 2017). More than 90% of these transporter proteins were
423 transcriptionally active either in mycelia or in planta, with only 93 being more highly expressed
424 in planta and 159 being more highly expressed in mycelia (Table 3). The sugar transporters were
425 the exception to this trend, with 20 being more highly expressed in planta compared to just 9
426 being more highly expressed in mycelia. Similarly, more genes encoding non-transporter
427 membrane-bound proteins are more highly expressed in mycelia (162) than
428 in planta (78) (Table 3). These differences are likely indicators of the
429 significant necrotrophic ability of Ltheo, as the mycelia were grown in V8, a complex plant-based
430 medium. Membrane protein composition depends on available nutrient sources. For the
431 necrotrophic pathogen B. cinerea, it was observed that, when grown in the presence of tomato
432 cell walls as a sole carbon source, membrane protein production was directed toward cell wall
433 degrading enzymes and proteins involved in toxic resistance (Liñeiro et al. 2016). On the other
434 hand, when B. cinerea was grown in the presence of glucose as a sole carbon source, the changes in
435 membrane proteins were related to signaling processes, protein biosynthesis and modification
436 processes in the endoplasmic reticulum and vesicle-mediated transport (Liñeiro et al. 2016). The in
437 planta differential expression of the Ltheo transcriptome identified here, though narrower than
438 that observed in V8-grown mycelia, is likely focused on specific substrates and secondary
439 metabolites released during the infection process on cacao.
440 Ltheo secretome
441 Proteins secreted by plant pathogenic fungi have the potential to interact with and alter host cells
442 and therefore, their identification and characterization is essential to understanding virulence and
443 the mechanism of infection. As predicted by SignalP-4.1, 1,202 proteins with no more than one
444 transmembrane domain were considered potential components of the Ltheo secretome,
445 accounting for 9.2% of its proteome. Within the secretome, 1,020 proteins were found to be
446 transcriptionally active and 397 were more highly expressed during cacao leaf infection
447 (gen-2019-0070.R3Supplg). On the other hand, Yan et al. (2017) reported 937 secreted proteins in
448 another Ltheo isolate from grapevine and found that 105 were up-regulated during infection.
449 More than 50% of the Ltheo secretome identified herein is predicted by ApoplastP 1.0 to be
450 secreted into the apoplast (gen-2019-0070.R3Supplg) (Sperschneider et al. 2018b).
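The secretome definition used above (a predicted signal peptide and no more than one transmembrane domain) amounts to a simple filter over per-protein predictions. The sketch below assumes the outputs of the signal peptide and transmembrane predictors have already been parsed into records; the record layout, the function names and the toy entries are hypothetical and are not the parsing code used for this study.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Prediction:
    protein_id: str
    has_signal_peptide: bool  # e.g. parsed from a signal-peptide predictor
    tm_domains: int           # e.g. parsed from a transmembrane predictor


def select_secretome(preds: Iterable[Prediction], max_tm: int = 1) -> List[str]:
    """Keep proteins with a signal peptide and at most `max_tm` TM domains."""
    return [p.protein_id for p in preds
            if p.has_signal_peptide and p.tm_domains <= max_tm]


def percent_of(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Hypothetical records, for illustration only.
toy = [Prediction("p1", True, 0), Prediction("p2", True, 1),
       Prediction("p3", True, 4), Prediction("p4", False, 0)]
print(select_secretome(toy))    # ['p1', 'p2']
print(percent_of(1202, 13061))  # 9.2
```

The second call reproduces the proportion quoted in the text: 1,202 secretome proteins out of 13,061 predicted proteins is 9.2% of the proteome.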
451 Secreted effectors
452 Fungi and oomycetes secrete effectors that suppress the pathogen-associated molecular pattern
453 (PAMP) triggered plant immunity. However, little is known about the Botryosphaeriaceae
454 effectors that contribute to virulence. Yan et al. (2017) made an initial prediction of Ltheo effectors
455 based on protein sequence length and number of cysteine residues, and identified 359 putative
456 effectors in Ltheo isolate CSS-01s. Using the same approach, 384 putative effectors were identified
457 in the transcriptome of AM2As. We further used the machine learning program EffectorP 2.0
458 (Sperschneider et al. 2018a) to predict 115 effector proteins among the secretome
459 (gen-2019-0070.R3Supplg). Among these 115 effectors, 85 are apoplastic while 66 are transcriptionally
460 active in mycelia or infected plant tissue (48 hpi) (gen-2019-0070.R3Supplg). Evolutionary
461 relationships among the effectors indicate that they are highly diverse and not dominated by any
462 significant phyletic group (Figure 4A). As expected, most expressed effectors were more highly
463 expressed in planta compared to mycelia (Figure 4B). Fifty-three of the effector genes encode
464 hypothetical proteins, which suggests novel effector functions may exist in Ltheo. Yan et al.
465 (2017) tested a few Ltheo effectors by expressing them in Burkholderia glumae, and then
466 infecting Nicotiana benthamiana with the transformed bacteria. Five out of seven effectors tested
467 strongly suppressed the B. glumae-triggered hypersensitive response in N. benthamiana.
468 Among the 39 effectors more highly expressed in planta, 3 have pectate lyase domains
469 (LTHEOB_1556, 2730 and 7548). Pectate lyases can be essential for fungal virulence (Cho
470 et al. 2015; Yang et al. 2018), but their potential function as effectors needs further consideration.
471 Another infection-expressed effector (LTHEOB_3882) has an NPP1 domain. Necrosis-inducing
472 proteins (NPPs) are induced during infection and are associated with plant cell death (Fellbrich et
473 al. 2002). Some other interesting infection-expressed effectors requiring further study are
474 LTHEOB_3870 with a CAP domain, LTHEOB_197 with a CFEM domain and LTHEOB_12487 with a
475 RALF domain. Another interesting group requiring further study is the 14 in planta
476 expressed effectors that are specific to Ltheo, compared to other Botryosphaeriaceae species
477 (gen-2019-0070.R3Suppld).
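The initial effector screen mentioned in this section relies on protein sequence length and the number of cysteine residues. A minimal sketch of such a screen is given below. The specific cutoffs are not stated in this text, so they are passed in as required parameters, and the values in the usage example (300 residues, 4 cysteines) are illustrative placeholders rather than the thresholds used by Yan et al. (2017) or in this study. The function names and sequences are likewise hypothetical.

```python
from typing import Dict, List


def is_candidate_effector(seq: str, max_length: int, min_cysteines: int) -> bool:
    """True if a protein is short enough and has enough cysteine residues."""
    seq = seq.strip().upper().rstrip("*")  # drop a trailing stop symbol if present
    return 0 < len(seq) <= max_length and seq.count("C") >= min_cysteines


def screen(proteins: Dict[str, str], max_length: int, min_cysteines: int) -> List[str]:
    """Return the sorted IDs of proteins passing both criteria."""
    return sorted(pid for pid, seq in proteins.items()
                  if is_candidate_effector(seq, max_length, min_cysteines))


# Made-up sequences and placeholder cutoffs, for illustration only.
toy = {"s1": "MKCACCAGCT*", "s2": "MKAAAGGT", "s3": "MC" + "A" * 400 + "CCC"}
print(screen(toy, max_length=300, min_cysteines=4))
```

Here s1 passes both criteria, s2 has no cysteines and s3 exceeds the length cutoff. In practice a screen of this kind would be applied only to proteins already assigned to the secretome, and a machine learning predictor such as EffectorP then gives a more selective second pass, as described above.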
478 Secreted CAZymes
479 Phytopathogenic fungi encode CAZymes that play an important role during colonization and
480 infection by breaking down and modifying plant cell wall structures. To understand the
481 potential genes involved in the adaptation of Ltheo to cacao habitats and substrates we identified
482 the repertoire of CAZymes of this cacao pathogen. Among the 718 predicted Ltheo genes
483 mapped to 606 CAZymes, 323 are predicted to encode secreted proteins based on the presence of a
484 signal peptide (Table 4) and the remaining 395 are predicted to be non-secreted
485 (gen-2019-0070.R3Suppla Table S1). Yan et al. (2017) reported slightly higher numbers, a total of 763
486 Ltheo genes mapped to 820 CAZymes. The differences found may be due to different cutoffs
487 used in the BLAST homology search. Unfortunately, a direct comparison between the two
488 transcriptomes was not possible. Of the secreted CAZymes, 306 are transcriptionally active and
489 more than 50% of these are more highly expressed during infection, whereas only 39 secreted
490 CAZymes were more highly expressed in mycelia. Glycoside hydrolases (GH) formed the
491 largest group followed by auxiliary activities (AA) and polysaccharide lyases (PL) among the
492 infection expressed secreted CAZymes (Table 4). Most of these infection-expressed secreted
493 CAZymes can be considered highly expressed (treatment mean >1,000 reads) (Table 4).
494 Previous studies related to Botryosphaeriaceae genomes identified a broad range of key genes
495 responsible for cell degradation of woody plants (Morales-Cruz et al. 2015; Paolinelli-Alfonso et
496 al. 2016). Pectinases degrade pectin, which is a component of the plant primary cell wall and
497 middle lamella. Out of the 28 Ltheo pectinases, 26 are predicted to be secreted proteins. There
498 were 26 Ltheo pectinases preferentially expressed in planta compared to just one preferentially
499 expressed in mycelia. Ltheo possesses the highest number of pectolytic enzyme coding genes
500 among the sequenced Botryosphaeriaceae genomes (Yan et al. 2017). The significant induction
501 of these genes during infection is consistent with their critical roles in pathogenesis. Because
502 most dicotyledon cell walls consist of around 35% pectin, these pectolytic enzymes
503 facilitate the breakdown of cell walls (Have et al. 1998). In addition, there were 90 GH and other
504 carbohydrate modifying genes preferentially expressed in planta compared to just 13
505 preferentially expressed in mycelia (Table 4). Among these, a wide array of enzymes (AA9,
506 GH5, GH3, GH16, GH43) target cellulose and hemicellulose, like other pathogens (Yan et al.
507 2017; Morales-Cruz et al. 2015; Paolinelli-Alfonso et al. 2016). Together, the up-regulation of
508 genes encoding secreted CAZymes targeting pectin, cellulose, hemicellulose and xylan during
509 infection may explain the rapid colonization and infection of plants by Ltheo. Among the
510 non-secreted CAZymes, GH was also the largest group followed by glycosyltransferases (GT).
511 Although more than 95% of the 393 non-secreted CAZymes are transcriptionally active, only 50
512 were preferentially expressed during infection of cacao, and 34 were preferentially expressed in
513 mycelia (gen-2019-0070.R3Suppla Table S1).
514 Non-CAZymes secreted proteins
515 Besides secreted effectors and plant cell degrading enzymes, fungi have a broader arsenal
516 of secreted proteins/enzymes at their disposal when causing disease. The most notable are
517 peptidases, alternative oxidase, cell wall proteins, reductases, necrosis inducing proteins (NPPs),
518 esterases, pathogenesis-related (PR) proteins, fungal hydrophobins, cytochrome P450s, chitin
519 synthesis proteins and proteins with putative roles in phenolic, melanin and protein metabolism
520 (Paolinelli-Alfonso et al. 2016; Meinhardt et al. 2014; Bailey et al. 2014). Among the 764 non-CAZyme and
521 non-effector secreted protein coding genes, proteases/peptidases are the largest group with 78
522 genes, of which 26 are preferentially expressed during infection compared to 8 preferentially
523 expressed in mycelia (Table 5). The next major group of genes in this list are oxidases/reductases
524 (60 genes), with 31 genes being preferentially expressed during infection compared to 6
525 preferentially expressed in mycelia (Table 5). These genes participate in processes including
526 sterol biosynthesis, degradation of lignin and breakdown of environmental contaminants (van
527 Gorcom et al. 1998). Similarly, induction of 4 tannase/feruloyl esterase genes during infection
528 also suggests accelerated lignin degradation, as they act as accessory enzymes to assist
529 xylanolytic and pectinolytic enzymes in gaining access to their site of action during biomass
530 conversion (Dilokpimol et al. 2016). Besides the proteins with known putative functions, many
531 in these groups encode hypothetical proteins, again needing more detailed study. Motif searches
532 have identified functional domains in a few of these hypothetical proteins
533 (gen-2019-0070.R3Supplg).
534 Conclusion
535 From an endophyte to opportunistic plant pathogen, and now, potentially, a significant emerging
536 threat to cacao, Ltheo appears to have the tools within its genome to evolve in response to
537 changes in climate and crop production practices. Ltheo has been shown to be more virulent
538 under high temperature and drought stress in some plant-pathogen interactions (Paolinelli-Alfonso
539 et al. 2016; Yan et al. 2017; Songy et al. 2019). We observed that multiple species of Lasiodiplodia
540 capable of causing similar symptoms on cacao leaves coexist in a given location, increasing the
541 complexity of disease management. It is easy to see how symptoms caused by Lasiodiplodia
542 spp., principally necrosis, might be confused with symptoms of other diseases, and possible
543 synergy between these pathogens in causing disease deserves further study. The genome of the
544 Ltheo isolate from cacao studied here is similar to the genome of grapevine isolate CSS-01s,
545 previously described by Yan et al. (2017), and it uses a similar gene complement when
546 colonizing cacao leaves as CSS-01s uses when colonizing grapevine stems. This seems logical
547 for a broad-host-range pathogen and provides a foundation to study the possible specialization of Ltheo
548 isolates when attacking divergent plant species. Our results indicate that, during infection,
549 limited changes in expression occur in genes encoding proteins that reside inside the cell or on its
550 membranes, compared to genes encoding secreted proteins that directly interact with components
551 outside the cell, including the plant. This may indicate that Ltheo routinely and constitutively
552 expresses a wide range of internal and membrane associated proteins, buffering it against
553 external changes, while genes encoding secreted proteins that interact directly with components
554 outside the cell are more responsive to change, adapting to the specific substrates/components
555 encountered. The latter gene set includes a wide array of putative effectors, necrosis-inducing
556 proteins, pectinases and hydrolytic enzymes likely involved in Ltheo virulence and pathogenicity
557 during cacao infection.
558
559 Acknowledgements
560 This work was funded by USDA ARS. References to a company and/or product by the USDA
561 are only for the purposes of information and do not imply approval or recommendation of the
562 product to the exclusion of others that may also be suitable. USDA is an equal opportunity
563 provider and employer. The authors have no conflict of interest to declare.
564
565 References
566 Ali, S. S., Shao, J., Lary, D. J., Kronmiller, B., Shen, D., Strem, M. D., Amoako-Attah, I.,
567 Akrofi, A. Y., Begoude, B. A. D., Hoopen, G. M. t., Coulibaly, K., Kebe, B. I., Melnick,
568 R. L., Guiltinan, M. J., Tyler, B. M., Meinhardt, L. W. and Bailey, B. A. 2016.
569 Phytophthora megakarya and P. palmivora, closely related causal agents of cacao black
570 pod rot, underwent increases in genome sizes and gene numbers by different
571 mechanisms. Genome Biol. Evol. 9: 536-557.
572 Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.
573 J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search
574 programs. Nucleic Acids Res. 25: 3389-3402.
575 Alves, A., Crous, P. W., Correia, A. and Phillips, A. 2008. Morphological and molecular data
576 reveal cryptic speciation in Lasiodiplodia theobromae. Fungal Divers. 28: 1-13.
577 Alvindia, D. G. and Gallema, F. L. M. 2017. Lasiodiplodia theobromae causes vascular streak
578 dieback (VSD)–like symptoms of cacao in Davao Region, Philippines. Austral. Plant Dis.
579 Notes, 12: 54.
580 Anders, S. and Huber, W. 2010. Differential expression analysis for sequence count data.
581 Genome Biol. 11: R106.
582 Bailey, B., Bae, H., Strem, M., Roberts, D., Thomas, S., Crozier, J., Samuels, G., Choi, I.-Y. and
583 Holmes, K. 2006. Fungal and plant gene expression during the colonization of cacao
584 seedlings by endophytic isolates of four Trichoderma species. Planta, 224: 1449-1464.
585 Bailey, B. A., Crozier, J., Sicher, R. C., Strem, M. D., Melnick, R., Carazzolle, M. F., Costa, G.
586 G., Pereira, G. A., Zhang, D. and Maximova, S. 2013. Dynamic changes in pod and
587 fungal physiology associated with the shift from biotrophy to necrotrophy during the
588 infection of Theobroma cacao by Moniliophthora roreri. Physiol. Mol. Plant Pathol. 81:
589 84-96.
590 Bailey, B. A., Melnick, R. L., Strem, M. D., Crozier, J., Shao, J., Sicher, R., Philips-Mora, W.,
591 Ali, S. S., Zhang, D. and Meinhardt, L. 2014. Differential gene expression by
592 Moniliophthora roreri while overcoming cacao tolerance in the field. Mol. Plant Pathol.
593 15: 711-729.
594 Bailey, B. A., Strem, M. D., Bae, H., de Mayolo, G. A. and Guiltinan, M. J. 2005. Gene
595 expression in leaves of Theobroma cacao in response to mechanical wounding, ethylene,
596 and/or methyl jasmonate. Plant Science, 168: 1247-1258.
597 Bankevich, A., Nurk, S., Antipov, D., Gurevich, A. A., Dvorkin, M., Kulikov, A. S., Lesin, V.
598 M., Nikolenko, S. I., Pham, S. and Prjibelski, A. D. 2012. SPAdes: a new genome
599 assembly algorithm and its applications to single-cell sequencing. J. Comput. Biol. 19:
600 455-477.
601 Blanco-Ulate, B., Rolshausen, P. and Cantu, D. 2013. Draft genome sequence of Neofusicoccum
602 parvum isolate UCR-NP2, a fungal vascular pathogen associated with grapevine cankers.
603 Genome Announc. 1: e00339-13. DOI: 10.1128/genomeA.00339-13
604 Burgess, T. I., Barber, P. A., Mohali, S., Pegg, G., de Beer, W. and Wingfield, M. J. 2006. Three
605 new Lasiodiplodia spp. from the tropics, recognized based on DNA sequence
606 comparisons and morphology. Mycologia, 98: 423-435.
607 Bushnell, B. 2014. BBMap: a fast, accurate, splice-aware aligner. Berkeley, CA (US): Ernest
608 Orlando Lawrence Berkeley National Laboratory. http://1ofdmq2n8tc36m6i46scovo2e.
609 wpengine.netdna-cdn.com/wp-content/uploads/2013/11/BB_User-Meeting-2014-poster-
610 FINAL.pdf.
611 Cho, Y., Jang, M., Srivastava, A., Jang, J.-H., Soung, N.-K., Ko, S.-K., Kang, D.-O., Ahn, J. S.
612 and Kim, B. Y. 2015. A pectate lyase-coding gene abundantly expressed during early
613 stages of infection is required for full virulence in Alternaria brassicicola. PloS ONE, 10:
614 e0127140. doi.org/10.1371/journal.pone.0127140.
615 Conesa, A., Götz, S., García-Gómez, J. M., Terol, J., Talón, M. and Robles, M. 2005. Blast2GO:
616 a universal tool for annotation, visualization and analysis in functional genomics
617 research. Bioinformatics, 21: 3674-3676.
618 del Castillo, D. S., Parra, D., Noceda, C. and Pérez-Martínez, S. 2016. Co-occurrence of
619 pathogenic and non-pathogenic Fusarium decemcellulare and Lasiodiplodia theobromae
620 isolates in cushion galls disease of cacao (Theobroma cacao L.). J. Plant Protect. Res. 56:
621 129-138.
622 Dilokpimol, A., Mäkelä, M. R., Aguilar-Pontes, M. V., Benoit-Gelber, I., Hildén, K. S. and de
623 Vries, R. P. 2016. Diversity of fungal feruloyl esterases: updated phylogenetic
624 classification, properties, and industrial applications. Biotechnol. Biofuels, 9: 231.
625 Fellbrich, G., Romanski, A., Varet, A., Blume, B., Brunner, F., Engelhardt, S., Felix, G.,
626 Kemmerling, B., Krzymowska, M. and Nürnberger, T. 2002. NPP1, a
627 Phytophthora‐associated trigger of plant defense in parsley and Arabidopsis. Plant J. 32:
628 375-390.
629 Have, A. t., Mulder, W., Visser, J. and van Kan, J. A. 1998. The endopolygalacturonase gene
630 Bcpg1 is required for full virulence of Botrytis cinerea. Mol. Plant Microbe Interact. 11:
631 1009-1016.
632 Islam, M. S., Haque, M. S., Islam, M. M., Emdad, E. M., Halim, A., Hossen, Q. M. M., Hossain,
633 M. Z., Ahmed, B., Rahim, S. and Rahman, M. S. 2012. Tools to kill: genome of one of
634 the most destructive plant pathogenic fungi Macrophomina phaseolina. BMC Genom.
635 13: 493.
636 Jaiyeola, I., Akinrinlola, R. J., Ige, G. S., Omoleye, O. O., Oyedele, A., Odunayo, B. J., Emehin,
637 O. J., Bello, M. O. and Adesemoye, A. O. 2014. Bot canker pathogens could complicate
638 the management of Phytophthora black pod of cocoa. African J. Microbiol. Res. 8: 3094-
639 3100.
640 Larkin, M. A., Blackshields, G., Brown, N., Chenna, R., McGettigan, P. A., McWilliam, H.,
641 Valentin, F., Wallace, I. M., Wilm, A. and Lopez, R. 2007. Clustal W and Clustal X
642 version 2.0. Bioinformatics, 23: 2947-2948.
643 Liñeiro, E., Chiva, C., Cantoral, J. M., Sabidó, E. and Fernández-Acero, F. J. 2016.
644 Modifications of fungal membrane proteins profile under pathogenicity induction: A
645 proteomic analysis of Botrytis cinerea membranome. Proteom. 16: 2363-2376.
646 Machado, A. R., Pinho, D. B. and Pereira, O. L. 2014. Phylogeny, identification and
647 pathogenicity of the Botryosphaeriaceae associated with collar and root rot of the biofuel
648 plant Jatropha curcas in Brazil, with a description of new species of Lasiodiplodia.
649 Fungal Divers. 67: 231-247.
650 Mbenoun, M., Momo Zeutsa, E. H., Samuels, G., Nsouga Amougou, F. and Nyasse, S. 2008.
651 Dieback due to Lasiodiplodia theobromae, a new constraint to cocoa production in
652 Cameroon. Plant Pathol. 57: 381-381.
653 McMahon, P. and Purwantara, A. 2016. Vascular streak dieback (Ceratobasidium theobromae):
654 history and biology. In Cacao Diseases: A History of Old Enemies and New Encounters.
655 Edited by B. A. Bailey and L. W. Meinhardt. Springer International Publishing, New
656 York, NY. pp. 307-335.
657 Medina, V. and Laliberte, B., 2017. A review of research on the effects of drought and
658 temperature stress and increased CO2 on Theobroma cacao L., and the role of genetic
659 diversity to address climate change. Costa Rica: Bioversity International, 51 p. ISBN: 978-92-
660 9255-074-5.
661 Meinhardt, L. W., Costa, G. G., Thomazella, D. P., Teixeira, P. J., Carazzolle, M. F., Schuster, S.
662 C., Carlson, J. E., Guiltinan, M. J., Mieczkowski, P. and Farmer, A. 2014. Genome and
663 secretome analysis of the hemibiotrophic fungal pathogen, Moniliophthora roreri, which
664 causes frosty pod rot disease of cacao: mechanisms of the biotrophic and necrotrophic
665 phases. BMC Genom. 15: 164.
666 Morales-Cruz, A., Amrine, K. C., Blanco-Ulate, B., Lawrence, D. P., Travadon, R., Rolshausen,
667 P. E., Baumgartner, K. and Cantu, D. 2015. Distinctive expansion of gene families
668 associated with plant cell wall degradation, secondary metabolism, and nutrient uptake in
669 the genomes of grapevine trunk pathogens. BMC Genom. 16: 469.
670 Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C. and Kanehisa, M. 2007. KAAS: an automatic
671 genome annotation and pathway reconstruction server. Nucleic Acids Res. 35: 182-185.
672 Mullen, J., Gilliam, C., Hagan, A. and Morgan-Jones, G. 1991. Canker of dogwood caused by
673 Lasiodiplodia theobromae, a disease influenced by drought stress or cultivar selection.
674 Plant Disease, 75: 886-889.
675 Nowell, W. 1923. Diseases of crop-plants in the Lesser Antilles. London: The Imperial Dept. of
676 Agriculture. The West India Committee, Trinity Square, London, UK. pp. 383.
677 Paolinelli-Alfonso, M., Villalobos-Escobedo, J. M., Rolshausen, P., Herrera-Estrella, A.,
678 Galindo-Sánchez, C., López-Hernández, J. F. and Hernandez-Martinez, R. 2016. Global
679 transcriptional analysis suggests Lasiodiplodia theobromae pathogenicity factors
680 involved in modulation of grapevine defensive response. BMC Genom. 17: 615.
681 Perlin, M. H., Andrews, J. and San Toh, S. 2014. Essential letters in the fungal alphabet: ABC
682 and MFS transporters and their roles in survival and pathogenicity. Adv. Genet. 85: 201-
683 253.
684 Pertea, M., Kim, D., Pertea, G. M., Leek, J. T. and Salzberg, S. L. 2016. Transcript-level
685 expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown. Nat.
686 Protoc. 11: 1650.
687 Petersen, T. N., Brunak, S., von Heijne, G. and Nielsen, H. 2011. SignalP 4.0: discriminating
688 signal peptides from transmembrane regions. Nat. Methods, 8: 785-786.
689 Ploetz, R. 2016. The Impact of Diseases on Cacao Production: A Global Overview. In Cacao
690 Diseases: A History of Old Enemies and New Encounters. Edited by B. A. Bailey and L.
691 W. Meinhardt. Springer International Publishing, New York, NY. pp. 33-59.
692 Rubini, M. R., Silva-Ribeiro, R. T., Pomella, A. W. V., Maki, C. S., Araújo, W. L., Dos Santos,
693 D. R. and Azevedo, J. L. 2005. Diversity of endophytic fungal community of cacao
694 (Theobroma cacao L.) and biological control of Crinipellis perniciosa, causal agent of
695 Witches' Broom Disease. Int. J. Biol. Sci. 1: 24-33.
696 Saitou, N. and Nei, M. 1987. The neighbor-joining method: a new method for reconstructing
697 phylogenetic trees. Mol. Biol. Evol. 4: 406-425.
698 Samuels, G. J., Ismaiel, A., Rosmana, A., Junaid, M., Guest, D., Mcmahon, P., Keane, P.,
699 Purwantara, A., Lambert, S. and Rodriguez-Carres, M. 2012. Vascular streak dieback of
700 cacao in Southeast Asia and Melanesia: in planta detection of the pathogen and a new
701 taxonomy. Fungal Biol. 116: 11-23.
702 Shaner, G. and Finney, R. 1977. The effect of nitrogen fertilization on the expression of slow-
703 mildewing resistance in Knox wheat. Phytopathol. 67: 1051-1056.
704 Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. and Zdobnov, E. M. 2015.
705 BUSCO: assessing genome assembly and annotation completeness with single-copy
706 orthologs. Bioinformatics, 31(19): 3210-3212.
707 Slippers, B. and Wingfield, M. J. 2007. Botryosphaeriaceae as endophytes and latent pathogens
708 of woody plants: diversity, ecology and impact. Fungal Biol. Rev. 21: 90-106.
709 Sonnhammer, E. L., Von Heijne, G. and Krogh, A. 1998. A hidden Markov model for predicting
710 transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6:
711 175-182.
712 Songy, A., Fernandez, O., Clément, C., Larignon, P. and Fontaine, F. 2019. Grapevine trunk
713 diseases under thermal and water stresses. Planta, 249(6): 1655-1679.
714 Sperschneider, J., Dodds, P. N., Gardiner, D. M., Singh, K. B. and Taylor, J. M. 2018a.
715 Improved prediction of fungal effector proteins from secretomes with EffectorP 2.0. Mol.
716 Plant Pathol. 19: 2094-2110.
717 Sperschneider, J., Dodds, P. N., Singh, K. B. and Taylor, J. M. 2018b. ApoplastP: prediction of
718 effectors and plant proteins in the apoplast using machine learning. New Phytol. 217:
719 1764-1778.
720 Stanke, M., Steinkamp, R., Waack, S. and Morgenstern, B. 2004. AUGUSTUS: a web server for
721 gene finding in eukaryotes. Nucleic Acids Res. 32: 309-312.
722 Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. and Kumar, S. 2011. MEGA5:
723 molecular evolutionary genetics analysis using maximum likelihood, evolutionary
724 distance, and maximum parsimony methods. Mol. Biol. Evol. 28: 2731-2739.
725 Úrbez-Torres, J. and Gubler, W. 2009. Pathogenicity of Botryosphaeriaceae species isolated
726 from grapevine cankers in California. Plant Disease, 93: 584-592.
727 Van der Linde, J. A., Six, D. L., Wingfield, M. J. and Roux, J. 2011. Lasiodiplodia species
728 associated with dying Euphorbia ingens in South Africa. Southern Forests 73: 165-173.
729 van der Nest, M. A., Bihon, W., De Vos, L., Naidoo, K., Roodt, D., Rubagotti, E., Slippers, B.,
730 Steenkamp, E. T., Wilken, P. M. and Wilson, A. 2014. Draft genome sequences of
731 Diplodia sapinea, Ceratocystis manginecans, and Ceratocystis moniliformis. IMA
732 Fungus 5: 135-140.
733 van Gorcom, R. F., van den Hondel, C. A. and Punt, P. J. 1998. Cytochrome P450 enzyme
734 systems in fungi. Fungal Genet. Biol. 23: 1-17.
735 White, T. J., Bruns, T., Lee, S. and Taylor, J. 1990. Amplification and direct sequencing of
736 fungal ribosomal RNA genes for phylogenetics. In PCR Protocols: A Guide to Methods
737 and Applications. Edited by M. A. Innis, D. H. Gelfand, J. J. Sninsky and T. J. White.
738 Academic Press Inc., New York. pp. 315-322.
739 Winnenburg, R., Baldwin, T. K., Urban, M., Rawlings, C., Köhler, J. and Hammond-Kosack, K.
740 E. 2006. PHI-base: a new database for pathogen host interactions. Nucleic Acids Res. 34:
741 459-464.
742 World Cocoa Foundation 2014. Cocoa market update. Washington DC: World Cocoa
743 Foundation. Found at http://www.worldcocoafoundation.org/wp-content/uploads/Cocoa-
744 Market-Update-as-of-4-1-2014.pdf.
745 Yan, J. Y., Zhao, W. S., Chen, Z., Xing, Q. K., Zhang, W., Chethana, K., Xue, M. F., Xu, J. P.,
746 Phillips, A. J. and Wang, Y. 2017. Comparative genome and transcriptome analyses
747 reveal adaptations to opportunistic infections in woody plant degrading pathogens of
748 Botryosphaeriaceae. DNA Res. 25: 87-102.
749 Yang, Y., Zhang, Y., Li, B., Yang, X., Dong, Y. and Qiu, D. 2018. A Verticillium dahliae
750 Pectate lyase Induces Plant Immune Responses and Contributes to Virulence. Front.
751 Plant Sci. 9: 1271.
752 Zuckerkandl, E. and Pauling, L. 1965. Evolutionary divergence and convergence in proteins.
753 In Evolving Genes and Proteins. Edited by V. Bryson and H.J. Vogel. Academic Press,
754 New York. pp. 97-166.
755
756 Table 1. Genome assembly and annotation statistics of
757 Lasiodiplodia theobromae strain AM2As.
Statistic                     L. theobromae
Total contig length (bp)      43,757,571
Contig numbers                833
BUSCO completeness (%)        99.23%
GC content                    54.73%
N50 contig length (bp)        876,715
Max contig size (bp)          1,723,604
Min contig size (bp)          56
Mean contig size (bp)         52,530
Gene number                   13,061
Total gene length             21,414,393
Average gene length           1639.56
Gene density#                 0.489
Number of expressed genes*    11,860
Genes with GO annotation¥     7,980
Genes within KEGG pathway     4,035
#CDS bases/total genome bases.
*Only gene models with ≥5 raw reads in any mycelia or in planta sample are reported.
¥Gene models with E < 1e-4 for BLASTn against the UniProt Gene Ontology database.
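The summary statistics in Table 1 can be illustrated with a minimal Python sketch. It assumes the usual definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the assembly) and it assumes that gene density is the total gene length divided by the total contig length, which reproduces the reported 0.489. The toy contig lengths are hypothetical; only the totals are taken from Table 1.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs given")


def gene_density(gene_bases, genome_bases):
    """Fraction of the assembly covered by gene models (assumed definition)."""
    return gene_bases / genome_bases


# Hypothetical toy assembly: total 300 bp, so half is 150 bp.
toy_contigs = [100, 80, 60, 40, 20]
print(n50(toy_contigs))  # 100 + 80 = 180 >= 150, so the N50 is 80

# Values taken from Table 1.
print(round(gene_density(21_414_393, 43_757_571), 3))  # 0.489
print(round(43_757_571 / 833))                         # mean contig size, 52530
```

The mean contig size and gene density recomputed this way agree with the table, which is a quick consistency check on the reported totals.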
759 Table 2. Comparative genomes of Botryosphaeriaceae fungi.
Species                           Genome length (bp)  No. of contigs  Contig N50  GC%    No. of genes  GenBank no.
Diplodia sapinea                  36,053,350          2,371           37,635      56.85  No gene call  AXCF00000000
Neofusicoccum parvum              42,592,847          1,877           83,561      56.80  10,366        AORE01000000
Macrophomina phaseolina           48,882,845          1,506           150,180     52.30  13,806        AHHD01000000
Lasiodiplodia theobromae CSS-01s  43,283,415          60              1,738,941   54.80  No gene call  MDYX01000000
Diplodia corticola                34,986,079          286             271,374     57.10  10,839        MNUE01000000
Diplodia scrobiculata             34,931,051          4,037           16,793      57.00  No gene call  LAEG00000000
Diplodia seriata                  37,268,684          469             239,894     56.60  8,050         MSZU00000000
Botryosphaeria dothidea           47,389,336          1,251           210,735     53.10  No gene call  MDSR00000000
L. theobromae AM2As               43,757,571          833             876,715     54.73  13,061        In this study
761 Table 3. Number of non-CAZyme transmembrane protein coding genes# of Lasiodiplodia
762 theobromae isolate AM2As.
Gene class/family                Total genes  Expressed1   Preferential expression (in planta/mycelia)2
                                                           Treatment mean, overall   Treatment mean >1000 reads
                                                           In planta    Mycelia      In planta    Mycelia
Total transporters               827          747          93           159          17           40
ABC Transporters                 43           41           2            10           2            5
MFS                              316          277          49           59           16           12
Sugar transport                  106          93           20           9            6            4
Cation/anion (noncarbon)         161          152          7            36           2            6
Drug (ABC/MFS)                   75           65           7            11           3            2
Carbon based                     200          184          14           49           6            11
peptide                          18           16           1            1
amino acid                       64           57           3            17           3            5
allantoate                       31           27           5            7            3            2
pantothenate                     9            9            0            5            0            0
Total non-transport              941          848          78           162          21           55
Integral membrane                185          174          28           27           4            14
Hypothetical                     228          180          23           51           5            11
Enzymatic etc                    645          607          46           98           14           42
Steroid biosynthesis/metabolism  17           16           1            3            1
Chitin synthase/GH family 2      19           19           1            5            3
RTA/RTA1                         26           24           2            8
CFEM                             8            8            5            1            1            0
Peptidase/protease               22           21           1            6            1            2
FAD Binding                      21           19           5            2
Glucosyl Hydrolase               23           22           2            4            1            1
Glucosyl Transferase             42           41           1            5            0            3
Cytochrome/Cp450                 56           48           7            11           2            3
#As determined by the TMHMM program and a BLASTp search against the Carbohydrate-Active enzymes database at a threshold of E < 1e-10.
1Gene models with ≥5 raw reads in any mycelia or in planta sample.
2Differential regulation at >2 Log2 and Padj <0.05.
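The footnote thresholds used in Tables 3 to 5 can be read as a simple per-gene filter. The sketch below is only an illustration under stated assumptions: the `GeneModel` fields and gene identifiers are hypothetical, the fold change is assumed to be log2(in planta/mycelia), and the "Treatment mean >1000 reads" columns are assumed to count preferentially expressed genes whose mean in the favored treatment exceeds 1000 reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneModel:
    gene_id: str           # hypothetical identifier
    max_raw_reads: int     # highest raw read count in any library
    log2_fc: float         # assumed: log2(in planta / mycelia)
    padj: float            # adjusted P value
    mean_in_planta: float  # treatment mean read count
    mean_mycelia: float


def is_expressed(gene, min_reads=5):
    """Footnote 1: at least 5 raw reads in any sample."""
    return gene.max_raw_reads >= min_reads


def preferred_treatment(gene, min_log2_fc=2.0, max_padj=0.05) -> Optional[str]:
    """Footnote 2: |log2 change| > 2 and Padj < 0.05."""
    if not is_expressed(gene) or gene.padj >= max_padj:
        return None
    if gene.log2_fc > min_log2_fc:
        return "in planta"
    if gene.log2_fc < -min_log2_fc:
        return "mycelia"
    return None


def tally(genes, min_mean=None):
    """Count genes per table column; min_mean=1000 gives the '>1000 reads' columns."""
    counts = {"total": len(genes), "expressed": 0, "in planta": 0, "mycelia": 0}
    for gene in genes:
        if is_expressed(gene):
            counts["expressed"] += 1
        side = preferred_treatment(gene)
        if side is None:
            continue
        mean = gene.mean_in_planta if side == "in planta" else gene.mean_mycelia
        if min_mean is None or mean > min_mean:
            counts[side] += 1
    return counts


# Hypothetical toy data, not taken from the study.
genes = [
    GeneModel("g1", 5000, 4.0, 0.001, 3200.0, 200.0),  # in planta, mean > 1000
    GeneModel("g2", 40, -3.0, 0.01, 2.0, 16.0),        # mycelia, mean < 1000
    GeneModel("g3", 3, 5.0, 0.001, 1.0, 0.0),          # below the read cutoff
    GeneModel("g4", 800, 1.0, 0.2, 300.0, 150.0),      # expressed, no preference
]
print(tally(genes))
print(tally(genes, min_mean=1000))
```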
764 Table 4. Number of secreted# CAZymes family genes of Lasiodiplodia theobromae isolate
765 AM2As.
CAZymes family1 Total genes  Expressed2   Preferential expression (in planta/mycelia)3
                                          Treatment mean, overall   Treatment mean >1000 reads
                                          In planta    Mycelia      In planta    Mycelia
AA3             19           19           7            3            3            1
AA1             12           11           3            3            1            2
AA7             9            6            5            1            3            1
AA9             8            8            6            2            6            1
Other AA (4)    11           10           2            4            1            2
CBM1            8            8            7            0            7            0
CBM13           4            4            3            1            3            0
Other CBM (6)   15           15           4            6            3            1
CE5             8            8            6            0            3            0
CE16            4            3            3            0            2            0
CE8             3            3            3            0            1            0
Other CE (5)    14           14           7            5            4            4
GH43_12         19           18           11           0            8            0
GH3             13           11           11           0            8            0
GH28            12           11           8            1            5            0
GH5_11          10           10           6            1            5            1
GH16            9            9            5            1            2            0
GH18            7            5            4            0            2            0
GH10            4            4            4            0            3            0
GH35            4            3            3            0            3            0
GH78            4            3            3            0            3            0
GH12            3            3            3            0            3            0
GH131           3            3            3            0            3            0
Other GH (40)   86           83           29           10           21           2
GTs (8)         11           11           2            1            0            1
PL1             10           10           8            0            7            0
PL3_2           7            7            6            1            6            0
PL4_1           5            5            5            0            4            0
PL9_3           1            1            1            0            1            0
#As determined by SignalP, version 3.0 and a BLASTp search against the Carbohydrate-Active enzymes database at a threshold of E < 1e-10.
1Number within parentheses indicates the number of CAZymes families.
2Gene models with ≥5 raw reads in any mycelia or in planta sample.
3Differential regulation at >2 Log2 and Padj <0.05.
767 Table 5. Number of secreted# non-CAZyme and non-effector genes of Lasiodiplodia
768 theobromae isolate AM2As.
Gene class/family                                    Total genes  Expressed1   Preferential expression (in planta/mycelia)2
                                                                               Treatment mean, overall   Treatment mean >1000 reads
                                                                               In planta    Mycelia      In planta    Mycelia
Peptidase/protease/amidase                           78           72           26           8            19           3
Carboxypeptidase a1                                  6            5            1            1            1            0
Aspartic endopeptidase pep1                          3            2            0            0            0            0
Peptidase A1                                         7            7            2            1            1            1
Peptidase m35 deuterolysin                           5            5            4            1            4            0
Peptidase M43                                        4            4            2            0            2            0
Peptidase S10 serine carboxypeptidase                3            3            1            0            1            0
Peptidase S41 family protein                         5            4            1            0            1            0
Peptidase S8/S53 subtilisin/kexin/sedolisin          4            3            2            0            1            0
Tripeptidyl-peptidase 1                              4            4            1            0            0            0
Major allergen                                       12           9            5            2            2            2
Major allergen Asp                                   1            1            0            1            0            1
Allergen Asp f 7                                     1            0            0            0            0            0
Allergen V5/Tpx-1-related protein                    4            4            2            1            1            1
Major allergen Alt                                   5            3            2            1            1            1
Oxidase/reductase activity                           60           51           31           6            19           2
Dehydrogenases                                       15           14           8            3            6            2
p450                                                 20           15           6            1            2            0
Peroxidase                                           5            4            3            0            2            0
Dioxygenase                                          13           13           11           0            7            0
Protocatechuate -dioxygenase beta subunit protein    4            3            3            0            2            0
Esterase                                             30           26           8            5            6            0
Tannase feruloyl esterase                            5            4            4            0            4            0
Carboxylesterase                                     16           12           3            3            2            0
Para-nitrobenzyl esterase                            2            2            1            0            1            0
Cell wall protein                                    27           25           10           9            5            5
Cell wall protein                                    9            9            7            2            4            1
Gpi anchored cell wall protein                       4            4            0            2            0            1
Gpi-anchored cell wall organization protein ecm33    8            7            1            3            1            1
Carbohydrate binding                                 24           24           4            11           4            4
Carbohydrate binding protein                         4            4            0            0            0            0
CFEM domain-containing protein                       5            5            1            4            1            3
Chitin binding                                       3            3            1            0            1            0
WSC domain containing protein                        9            9            1            6            1            1
Other families/gene classes
alpha beta-hydrolase                                 14           12           6            1            3            0
Lipase                                               13           11           4            2            3            1
Hypothetical protein                                 240          172          56           29           18           12
Tyrosinase                                           4            4            2            1            1            0
Hemagglutinin                                        3            2            0            2            0            2
NLPs                                                 4            4            4            0            4            0
Protein elicitor                                     1            1            1            0            1            0
Deoxyribonuclease TatD-related protein               2            1            1            0            1            0
Extracellular aldonolactonase protein                1            1            1            0            1            0
Isoform cra a protein                                1            1            1            0            1            0
Hypersensitive response-inducing protein elicitor    1            1            1            0            0            0
Mycelial catalase cat1                               1            1            1            0            1            0
Sulfatase                                            1            1            1            0            0            0
Survival protein SurE-like phosphatase/nucleotidase  1            1            1            0            1            0
Phytase protein                                      1            1            1            0            1            0
Ureidoglycolate lyase                                1            1            1            0            1            0
Glutaminase                                          1            1            1            0            1            0
Epl1 protein                                         1            1            1            0            1            0
Carbonic anhydrase                                   1            1            1            0            1            0
ABC-type fe3+ transport system protein               4            4            4            0            3            0
FAD binding domain containing protein                22           17           4            1            2            0
#As determined by SignalP, version 3.0, EffectorP 2.0 and a BLASTp search against the Carbohydrate-Active enzymes database at a threshold of E < 1e-10.
1Gene models with ≥5 raw reads in any mycelia or in planta sample.
2Differential regulation at >2 Log2 and Padj <0.05.
770 Figure legend
771 Figure 1. Molecular phylogenetic analysis of Lasiodiplodia isolates collected from infected
772 cacao plants from Indonesia, the Philippines and the USA (country of origin with code AM: Indonesia,
773 Phi: Philippines and Miami: USA) in comparison to known Lasiodiplodia sp. and related
774 Botryosphaeriaceae fungi Diplodia corticola and Botryosphaeria lutea (accessed from
775 GenBank). The analysis was based on DNA sequence data of ITS 1 and 2 regions (GenBank-
776 MH412939 to MH412990) and part of the EF1α gene of the 52 isolates. Sequences were combined
777 and aligned using the ClustalW2 tool (Larkin et al. 2007) under default settings. A phylogenetic tree
778 was reconstructed using the Maximum Likelihood method based on the Poisson correction model
779 (Zuckerkandl and Pauling, 1965), with 1000 bootstrapped data sets. The tree is drawn to scale,
780 with branch lengths measured in the number of substitutions per site. Analyses were conducted
781 in MEGA6 (Tamura et al. 2011). Isolates of Lasiodiplodia spp. studied herein are listed in gen-
782 2019-0070.R3Supplb.
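The Poisson correction named in the Figure 1 legend converts the observed proportion of differing aligned sites, p, into an estimated number of substitutions per site, d = -ln(1 - p). The sketch below shows only this pairwise distance; it is not the tree-building procedure used in MEGA, and the two aligned sequences are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_correction(p):
    """Poisson-corrected distance: expected substitutions per site."""
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p)


# Hypothetical aligned sequences differing at 1 of 10 sites.
p = p_distance("ACDEFGHIKL", "ACDEFGHIKV")
print(p)                                # 0.1
print(round(poisson_correction(p), 4))  # 0.1054
```

The corrected distance is slightly larger than p because it allows for multiple substitutions at the same site.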
783 Figure 2. Differential aggressiveness responses by Lasiodiplodia spp. isolates as assessed in a
784 mycelia-inoculated leaf disc bioassay. Leaf discs were observed daily up to 4 days post inoculation;
785 the percentage of necrotic area of the leaf blade (lamina) was quantified and the area under the disease progress
786 curve (AUDPC) was calculated. AUDPC was analyzed by two-way RM ANOVA with Fisher's
787 Least Significant Difference (LSD) test (P = 0.05) using GraphPad Prism version 7.0. Bars
788 indicate standard error of the mean (LSD0.05 = 28.92).
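The legend does not state how AUDPC was computed. A common choice is the trapezoidal rule over successive observation days, and the minimal sketch below assumes that rule. The daily necrosis percentages are hypothetical.

```python
def audpc(days, severity):
    """Area under the disease progress curve by the trapezoidal rule.

    days: observation times (e.g. days post inoculation), increasing
    severity: disease severity at each time (e.g. % necrotic area)
    """
    if len(days) != len(severity) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    total = 0.0
    for i in range(len(days) - 1):
        mean_severity = (severity[i] + severity[i + 1]) / 2
        total += mean_severity * (days[i + 1] - days[i])
    return total


# Hypothetical leaf disc scored daily from 1 to 4 days post inoculation.
print(audpc([1, 2, 3, 4], [0, 10, 40, 80]))  # 5 + 25 + 60 = 90.0
```

With daily observations the interval width is 1, so the AUDPC is the sum of the mean severities of each pair of consecutive days.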
789 Figure 3. Bi-directional Venn diagram. Bi-directional BLAST results are presented in a Venn
790 diagram. The code used for this diagram is: LT = Lasiodiplodia theobromae (AM2As) genes;
791 DC = Diplodia corticola genes; NP = Neofusicoccum parvum genes and BC = Botrytis cinerea
792 genes. Intersects are labeled with a number which represents the number of specific genes in that
793 intersect. To be considered as an ortholog, BLASTp matches should span at least 50% of the
794 sequence with E-value less than 1e-05. The Venn diagram was generated using the ‘DrawVenn
795 Diagram’ website at http://bioinformatics.psb.ugent.be/webtools/Venn/. The 2,862 L.
796 theobromae specific genes are listed in gen-2019-0070.R3Suppld.
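The ortholog criteria in the Figure 3 legend (a match spanning at least 50% of the sequence with an E-value below 1e-05, applied in both directions) can be sketched as follows. This is an illustration under assumptions: coverage is taken relative to the query length, the best hit per query is chosen by bit score, and pairs are kept only when each gene is the other's best hit. The record fields and gene identifiers are hypothetical.

```python
def passes_filter(hit, min_coverage=0.5, max_evalue=1e-5):
    """Keep hits covering >= 50% of the query with E-value < 1e-05."""
    coverage = hit["aln_len"] / hit["query_len"]
    return coverage >= min_coverage and hit["evalue"] < max_evalue


def best_hits(hits):
    """Map each query to the subject of its highest-scoring passing hit."""
    best = {}
    for hit in hits:
        if not passes_filter(hit):
            continue
        current = best.get(hit["query"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["query"]] = hit
    return {query: hit["subject"] for query, hit in best.items()}


def reciprocal_pairs(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best passing hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical BLASTp records.
lt_vs_dc = [
    {"query": "LT_1", "subject": "DC_1", "aln_len": 180, "query_len": 200,
     "evalue": 1e-50, "bitscore": 300.0},
    {"query": "LT_2", "subject": "DC_2", "aln_len": 60, "query_len": 200,
     "evalue": 1e-20, "bitscore": 90.0},   # only 30% coverage
    {"query": "LT_3", "subject": "DC_3", "aln_len": 150, "query_len": 200,
     "evalue": 1e-3, "bitscore": 40.0},    # E-value too high
]
dc_vs_lt = [
    {"query": "DC_1", "subject": "LT_1", "aln_len": 180, "query_len": 210,
     "evalue": 1e-48, "bitscore": 295.0},
    {"query": "DC_2", "subject": "LT_2", "aln_len": 150, "query_len": 160,
     "evalue": 1e-20, "bitscore": 90.0},
]
print(reciprocal_pairs(lt_vs_dc, dc_vs_lt))  # [('LT_1', 'DC_1')]
```

Genes left without a partner in every comparison would correspond to the species-specific sets in the Venn diagram.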
797 Figure 4. Evolutionary relationships and transcription profiles of 115 putative effector proteins
798 of Lasiodiplodia theobromae (AM2As). (A) Amino acid sequences were aligned using
799 ClustalW2 tool (Larkin et al. 2007) under default settings and evolutionary relationships were
800 inferred using the Neighbor-Joining method (Saitou and Nei, 1987) with bootstrap (1000
801 replicates). There was a total of 6 positions in the final dataset. The tree is drawn to scale, with
802 branch lengths representing number of amino acid differences per sequence. Evolutionary
803 analyses were conducted in MEGA5 (Tamura et al. 2011). (B) For the relative transcription
804 profiles, normalized mean RNA-Seq read counts for in planta and mycelia libraries were
805 LOG10-transformed. The heat map was generated using CIMminer
806 (http://discover.nci.nih.gov/cimminer). White blocks indicate no detectable transcription.
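The transformation described for panel B can be sketched in a few lines. The handling of zero counts is an assumption: they are mapped to None here so that a plotting tool could render them as empty (white) cells, matching the "no detectable transcription" blocks. Gene names and counts are hypothetical.

```python
import math


def log10_counts(mean_counts):
    """Map gene id to log10 of its normalized mean read count, or None for zero."""
    return {
        gene: (math.log10(count) if count > 0 else None)
        for gene, count in mean_counts.items()
    }


# Hypothetical normalized mean read counts for one library type.
in_planta = {"gene_a": 1000.0, "gene_b": 10.0, "gene_c": 0.0}
transformed = log10_counts(in_planta)
print(transformed)
```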
Figure 1 [phylogenetic tree image; clade labels: L. theobromae, Lasiodiplodia sp., L. rubropurpurea; scale bar: 0.02]
Figure 2 [bar chart image; y-axis: AUDPC (%), 0 to 140; groups: L. theobromae, Lasiodiplodia sp., L. rubropurpurea]
Figure 3 [Venn diagram image; recovered set labels: DC, BC]
Figure 4 [panel A: tree of LTHEOB gene models with bootstrap values, scale 2.5 to 0.0; panel B: heat map with color scale]