bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 First genome of Labyrinthula, an opportunistic seagrass pathogen, reveals novel insight
2 into marine protist phylogeny, ecology and CAZyme cell-wall degradation
3 Mun Hua Tan1,2,3,4, Stella Loke1,2, Laurence J. Croft1,2, Frank H. Gleason5, Lene Lange6, Bo
4 Pilgaard7, Stacey M. Trevathan-Tackett1*
5
6 1Centre of Integrative Ecology, School of Life and Environmental Sciences Deakin
7 University, Geelong, Australia
8 2Deakin Genomics Centre, Deakin University, Geelong, Australia
9 3School of BioSciences, Bio21 Institute, University of Melbourne, Australia
10 4Department of Microbiology and Immunology, University of Melbourne, Bio21 Institute,
11 Melbourne, Australia
12 5School of Life and Environmental Sciences, University of Sydney, Sydney, 2006, New
13 South Wales, Australia
14 6BioEconomy, Research & Advisory, Valby, 2500 Copenhagen, Denmark
15 7Protein Chemistry and Enzyme Technology, Department of Bioengineering, Technical
16 University of Denmark, Kgs. Lyngby, Denmark
17
18
19 * Corresponding Author: Dr Stacey Trevathan-Tackett, [email protected],
20 Deakin University, 221 Burwood Hwy, Burwood, Victoria, Australia 3125
21
22 Keywords: evolution; ion regulation; mitochondrial genome; saprobe; Stramenopiles;
23 virulence
1 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
24 Abstract
25 Labyrinthula spp. are saprobic, marine protists that also act as opportunistic pathogens and
26 are the causative agents of seagrass wasting disease (SWD). Despite the threat of local- and
27 large-scale SWD outbreaks, there are currently gaps in our understanding of the drivers of
28 SWD, particularly surrounding Labyrinthula virulence and ecology. Given these
29 uncertainties, we investigated Labyrinthula from a novel genomic perspective by presenting
30 the first draft genome and predicted proteome of a pathogenic isolate of Labyrinthula
31 SR_Ha_C, generated from a hybrid assembly of Nanopore and Illumina sequences.
32 Phylogenetic and cross-phyla comparisons revealed insights into the evolutionary history of
33 Stramenopiles. Genome annotation showed evidence of glideosome-type machinery and an
34 apicoplast protein typically found in protist pathogens and parasites. Proteins involved in
35 Labyrinthula’s actin-myosin mode of transport, as well as carbohydrate degradation were
36 also prevalent. Further, CAZyme functional predictions revealed a repertoire of enzymes
37 involved in breakdown of cell-wall and carbohydrate storage compounds common to
38 seagrasses. The relatively low number of CAZymes annotated from the genome of
39 Labyrinthula SR_Ha_C compared to other Labyrinthulea species may reflect the
40 conservative annotation parameters, a specialised substrate affinity and the scarcity of
41 characterised protist enzymes. Inherently, there is high probability for finding both unique
42 and novel enzymes from Labyrinthula spp. This study provides resources for further
43 exploration of Labyrinthula ecology and evolution, and will hopefully be the catalyst for new
44 hypothesis-driven SWD research revealing more details of molecular interactions between
45 Labyrinthula species and its host substrate.
2 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
46 Introduction
47 Sequencing capabilities of both short and long read technologies have shown marked
48 progress in recent years, resulting in the generation of a large number of contiguous and high
49 quality genome assemblies reported for a myriad of living organisms at an unprecedented
50 pace. As a result of genome sequencing projects at local-to-global scales, we are better
51 resolving linages across the Tree of Life (Lewin et al. 2018), revealing new diversity for
52 underrepresented or unculturable microorganisms (e.g. aquatic and marine protists and
53 zoosporic fungi; Sibbald and Archibald 2017, Lange et al. 2019a) and informing the
54 ecological and functional roles of organisms in the environment (Kyrpides et al. 2014). In
55 marine ecosystems, genomic and transcriptomic sequencing is producing vital information on
56 the ecological roles microorganisms play. For example, genomes of putative pathogens are
57 revealing genes involved in virulence and attachment to the host, as well as mechanisms
58 involved in degrading the host substrate and switching between non-pathogenic and
59 pathogenic lifestyles (Fernandes et al. 2011). These new insights are especially important as
60 marine emergent diseases and opportunistic pathogens are predicted to increase globally in
61 the coming decades due to the suboptimal living conditions for marine biota caused by
62 human- and climate-induced changes (Harvell et al. 1999).
63
64 Seagrass meadows are one such marine ecosystem susceptible to a decline in health and
65 increased infection, as they are subjected to dynamic pressures at the land-sea interface, such
66 as elevated temperatures in shallow regions and run-off-associated turbidity, eutrophication
67 and hyposalinity conditions (Sullivan et al. 2018). There are currently four pathogens that
68 cause disease to multiple seagrass genera: Labyrinthula, Phytophthora, Halophytophthora,
69 and Phytomyxea (Sullivan et al. 2018). Labyrinthula, a heterotrophic, halophytic protist in
70 the Stramenopile lineage, is the causative agent of seagrass wasting disease (SWD). To-date,
3 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
71 it is also the most well-studied seagrass pathogen after being discovered in the early twentieth
72 century when it decimated 90% of the North Atlantic population of the seagrass Zostera
73 marina (Sullivan et al. 2018). Labyrinthula isolates from seagrass leaves have shown variable
74 levels of virulence through laboratory infection trials and associated phylogenetic analyses
75 (Martin et al. 2016, Brakel et al. 2017). Virulent isolates are able to invade the leaf cells of
76 living plants and subsequently cause black leaf lesions, the diagnostic trait of SWD (Sullivan
77 et al. 2018). However, Labyrinthula, is also a ubiquitous commensal saprobe that
78 decomposes marine plants and algae litter, with one species parasitic to turfgrass (Martin et
79 al. 2016). Because of the flexibility of its ecological function, Labyrinthula spp. are
80 considered opportunistic pathogens that in some cases utilise a mild parasitism strategy
81 (Brakel et al. 2017).
82
83 Since its discovery, the advances in our understanding of Labyrinthula species include
84 phylogenetic reclassifications of Labyrinthula (Leander and Porter 2001) and molecular
85 techniques that facilitate detection of Labyrinthula without the need for culturing (Duffin et
86 al. 2020, Lohan et al. 2020). Despite this progress, we still do not completely understand the
87 pathosystem of SWD. For example, it is still under debate whether the conditions that
88 promote SWD rely more on seagrass health and defence capabilities or the virulence and
89 enzymatic capabilities of Labyrinthula isolates and haplotypes, or a combination of host and
90 pathogen characteristics (Martin et al. 2016, Duffin et al. 2020). Much of the new knowledge
91 on the pathosystem has come from the immune and stress response of the seagrass using
92 genomic markers or assays (Brakel et al. 2014, Duffin et al. 2020), as opposed to pathogen
93 virulence (Martin et al. 2016). Furthermore, despite our molecular advances, we also know
94 little about the role of specific organelles, such as the bothrosome, in the hunt and
95 consumption of food sources (Collier and Rest 2019). Next-generation sequencing
4 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
96 technologies are helping us produce new hypotheses for Labyrinthula and SWD. For
97 example, Lohan et al. (2020) questioned how the variation in Labyrinthula diversity links
98 with intra-species diversity of the seagrass host.
99
100 This set of ecological uncertainties constitutes the basis for our interest in sequencing the
101 genome of a Labyrinthula isolate. So far the genomes of only three different genera of the
102 Labyrinthulea class have been sequenced, which has been valuable in revealing novel
103 ecological insights and biotechnology applications (Song et al. 2018). In this study, using a
104 hybrid assembly approach combining both short and long read sequence data, we assemble
105 the first known draft genome of a pathogenic haplotype of Labyrinthula and report a highly
106 contiguous assembly. Through sequencing and annotating a draft Labyrinthula genome, our
107 aim is to further resolve the genus’ lineage and function, particularly its capacity to degrade
108 seagrass carbohydrates, and compare those functions in closely-related but non-pathogenic,
109 saprobic members of Labyrinthulea. We expect that this work will also provide a launching
110 pad to deeply investigate and understand the basic ecology and evolutionary history of this
111 ubiquitous marine protist, as well as to unlock the potential of their understudied and
112 unexploited CAZyme diversity.
113
114 Materials and methods
115 Labyrinthula sp. isolate description, sample preparation and sequencing
116 The Labyrinthula sp. (order Labyrinthulida) isolate used for sequencing was cultured on
117 seawater-serum agar plates (Trevathan-Tackett et al. 2018) at room temperature prior to
118 preparing DNA and RNA for sequencing. Isolated from a seagrass meadow in Victoria,
119 Australia, isolate SR_Ha_C (GenBank: MF872134.1; referred simply as Labyrinthula
120 SR_Ha_C herein) was previously shown to be highly pathogenic to both Zostera muelleri and
5 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
121 Heterozostera nigricaulis (Trevathan-Tackett et al. 2018). Genomic DNA for short-read
122 Illumina sequencing was extracted from the Labyrinthula SR_Ha_C culture with the
123 ZymoBIOMICS Quick DNA kit according to the manufacturers’ instructions. Total RNA
124 was prepared using the Monarch Total RNA isolation kit according to the standard protocol
125 for hard to lyse cells. The short-read genome library was prepared with the Nextera Flex
126 DNA library preparation kit (Illumina, San Diego) and sequenced on the MiSeq with 2 x
127 300bp V3 chemistry. The short-read RNAseq library was prepared with the Nugen mRNA
128 ovation kit and sequenced on the MiniSeq with 2x150bp chemistry. The full-length
129 transcriptome libraries were prepared for long-read sequencing with Lexogen’s teloprime kit
130 and Oxford Nanopore’s SQK-LSK109 library prep kit (ONT, UK). Sequencing was
131 performed on one Flongle (FLO) flow cell for 24 h and one MinION R9.4.1 flow cell for 24
132 h. See Electronic Supplementary Material 1 for detailed isolate, extraction, assembly and
133 annotation methodology.
134
135 Genome assembly
136 Illumina reads were trimmed for quality and adapters with Trimmomatic, whereas adapters
137 were trimmed from Nanopore MinION reads with Porechop. Haploid genome size, repetitive
138 content and heterozygosity size for Labyrinthula SR_Ha_C was estimated with
139 GenomeScope based on the frequency of k-mers in short reads (Vurture et al. 2017).
140 MaSuRCA (Zimin et al. 2017) was used to perform a hybrid assembly of the genome,
141 combining short and long read data. Resulting contigs were corrected and polished with three
142 iterations of Racon and Pilon. Since bimodal k-mer distributions were observed from the
143 GenomeScope analysis, indicating high levels of heterozygosity, the Purge Haplotigs pipeline
144 was used to remove haplotigs (Roach et al. 2018). Checks were put into place to scan for
145 possible bacterial contaminant contigs. The first was to search for any contigs with first hits
6 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
146 to bacterial sequences in NCBI’s non-redundant nucleotide database. Secondly, RNAmmer
147 was used to identify contigs containing bacterial ribosomal RNAs. In both searches, none of
148 the contigs met these criteria and therefore all were retained. The final assembly was
149 compared to those of other Stramenopiles based on assembly quality and completeness with
150 Quast (Mikheenko et al. 2018) and BUSCO (Seppey et al. 2019), respectively. A list of
151 species included in this comparison is available in Table S1.
152
153 Mitogenome recovery and phylogenetic analysis
154 The complete Labyrinthula SR_Ha_C mitogenome was extracted from the assembly through
155 blastn homology with other protist mitogenomes. The mitogenome was annotated with
156 MFannot and MITOS and manually adjusted according to proteins in NCBI’s non-redundant
157 database. Phylogenetic reconstruction was performed based on thirteen genes typically found
158 in eukaryote mitogenomes (atp6, atp8, cox1-3, nad1-6, nad4l, cytb) (Boore 1999). These
159 were extracted from 127 Stramenopile mitogenomes and four from outgroup species (Table
160 S2). Protein sequences were aligned with MAFFT, trimmed with trimAl and concatenated
161 into a superalignment with FASconCAT-G. IQ-TREE was used to find the best-fit partition
162 model and to infer a maximum-likelihood tree. The tree was rooted with Reclinomonas
163 americana, which has a mitogenome that most closely resembles the ancestral proto-
164 mitochondrial genome (Lang et al. 1997).
165
166 Gene prediction and functional annotation
167 Two types of “hints” were used to predict genes for this non-model organism. The first type
168 consists of Labyrinthula transcripts generated in this study by sequencing (Illumina) and
169 assembling the transcriptome (Haas et al. 2013) to generate an Illumina-based transcriptome
170 (Data S1), in addition to long Nanopore-sequenced cDNA reads. The second type included
7 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
171 known peptide sequences from 235 protists downloaded from Ensembl Protists (release 46;
172 Howe et al. 2019) and curated sequences from UniProtKB/Swiss-Prot (Consortium 2018).
173
174 Gene prediction was performed with the MAKER2 annotation pipeline (Holt and Yandell
175 2011). A first iteration was run to identify a preliminary set of genes based on provided hints.
176 The predictions were used to train two ab initio gene predictors, Augustus and SNAP, after
177 which the resulting gene models were used in a second MAKER2 iteration. Since protist
178 genomes are diverse and therefore contribute highly divergent gene sequences, the
179 keep_preds option was activated in this iteration to report all predictions regardless of
180 concordance to supporting hints. InterProScan was used to scan these predictions for the
181 presence PFAM protein domains and any gene models that have physical evidence
182 (annotation edit distance, AED < 1) and/or protein domains were retained. To annotate
183 function, DIAMOND (Buchfink et al. 2015) was used to find homology between predicted
184 genes and proteins in NCBI’s non-redundant protein database (e-value cutoff 1e-5).
185
186 Ortholog discovery within the kingdom Stramenopila (i.e. Chromista) was performed using
187 protein sequences obtained from GenBank for two commonly-studied protists: 1.
188 Plasmodium falciparum (BioProject: PRJNA148), 2. Phytophthora sojae (BioProject:
189 PRJNA262907). Pairwise searches of protein sequence sets were performed with DIAMOND
190 (Buchfink et al. 2015) and orthologs between Labyrinthula and the two species were
191 identified through Reciprocal Best Hits (RBH).
192
193 Identification and peptide-based functional annotation of putative carbohydrate-active
194 enzymes
8 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
195 The dbCAN2 meta server (Zhang et al. 2018) was used to annotate carbohydrate-active
196 enzymes (CAZymes) based on enzyme families available in the CAZy database (Cantarel et
197 al. 2008). Due to the divergence of Labyrinthula from other eukaryotes, the HMMER search
198 option was selected (E-Value < 1e-15, coverage > 0.35) to search against representative
199 hidden Markov models of signature domains within the five enzyme classes: Glycoside
200 Hydrolases (GH), Glycosyl Transferases (GT), Polysaccharide Lyases (PL), Carbohydrate
201 Esterases (CE), and Auxiliary Activities (AA). Counts of putative Labyrinthula SR_Ha_C
202 CAZyme sequences were compared to data available on the CAZy database for 36 other
203 Stramenopile species.
204
205 The predicted amino acid sequences of Labyrinthula SR_Ha_C were analysed by the new
206 non-alignment based method, Conserved Unique Peptide Patterns (CUPP; Barrett and Lange
207 2019). CUPP automatically, annotates proteins to families and subfamilies. Further, if
208 characterised members of the CUPP group query is already assigned to a specific CAZyme
209 function, CUPP assigns that function to the query protein. For the purpose of comparison,
210 predicted proteins of three other members of the class Labyrinthulea (a.k.a.
211 Labyrinthulomycetes) were analysed: Aplanochytrium kerguelense (PBS07; family
212 Aplanochytridae, order Labyrinthulida), as well as Aurantiochytrium limacinum (ATCC
213 MYA-1381) and Schizochytrium aggregatum (ATCC 28209) (family Thraustochytriidae,
214 order Thraustochytrida).
215
216 Results & Discussion
217 A contiguous Labyrinthula sp. genome
218 This study reports the first assembled genome of a Labyrinthula species, available on
219 GenBank (BioProject: PRJNA607370, WGS version: JAALGZ01). This 50.9 Mbp assembly
9 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
220 contained 159 contigs (Table 1), consistent with the kmer-based method’s estimate of a 51
221 Mbp haploid genome size for this isolate (Figure S1). The majority of short reads (99.07%)
222 were successfully aligned back to the assembly. Contiguity and completeness of this
223 assembly were substantially improved by the inclusion of Nanopore long reads in a hybrid
224 assembly, demonstrating the value of strategic incorporation of reads from different
225 platforms. Though the SR_Ha_C isolate was maintained in culture media that included
226 streptomycin and penicillin, further careful checks were performed to exclude bacterial
227 contigs. No bacterial contaminants were found in the genome, ensuring that findings reported
228 in this study accurately reflect the genetics of this Labyrinthula SR_Ha_C isolate. The ‘clean’
229 genome will also prevent inaccurate interpretations in other larger-scale studies looking into
230 investigations of bacteria-to-animal lateral gene transfer (Sibbald and Archibald 2017).
231
232
10 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
233 Table 1. Data generation, genome assembly and annotation of the Labyrinthula sp.
234 isolate SR_Ha_C. nr refers to the non-redundant protein database. aa = amino acids.
Raw sequencing data and depth (based on 51 Mbp genome size estimate) Illumina (MiSeq) 21.5 Gbp (421.79×) Oxford Nanopore (MinION) 612 Mbp (12.00×) Mitogenome assembly Assembly size (circular) 26,822 bp # protein coding genes 23 # ribosomal RNAs 2 # transfer RNAs 36 Genome assembly Number of contigs 159 Assembly size 50,948,119 bp Contig N50 (median) size 1,265,572 bp Average contig size 320,428.42 bp Longest contig 3,302,845 bp Shortest contig 4,362 bp BUSCO completeness (lineage: stramenopiles_odb10, n:100) Complete 60.0% Fragmented 9.0% Missing 31.0% BUSCO completeness (lineage: eukaryota_odb10, n:255) Complete 42.8% Fragmented 9.0% Missing 48.2% Genome annotation Number of ab initio proteins predicted by gene predictors 17,041 with AED+ < 1 and/or with PFAM domain 8,488 with homology to NCBI’s nr database (1e-5) 6,351 with homology to NCBI’s nr database (1e-50) 1,372 Average protein length 470 aa Average number of exons/gene 2.03 Longest protein* 15,723 aa 235 + Annotation edit distance (AED) ranges from 0 to 1, a small value shows lesser variance between the 236 predicted gene and provided evidence. 237 * Homologous to “Receptor-type tyrosine-protein phosphatase F” [Hondaea fermentalgiana] 238
11 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
239
240 Genome completeness was assessed with the BUSCO algorithm, which reported a 60.0% and
241 42.8% completeness for this reported assembly, based on stramenopiles_odb10 and
242 eukaryota_odb10 ortholog sets, respectively (Table 1, Figure 1). The former ortholog set (27
243 species, 100 BUSCOs) contained only three species from the phylum Bigyra compared to the
244 Oomycota (14) and Ochrophyta (9) phyla. The latter (70 species, 255 BUSCOs) was derived
245 from data of 37 metazoan species, 22 fungi, 10 plants and one Rhodophyta, the single protist
246 representative in this dataset. While the Labyrinthula SR_Ha_C assembly reported relatively
247 high values of longest contig and N50 lengths, BUSCO-assessed completeness ranked lower
248 than that of most other Stramenopiles (Figures 1a, S2). Higher completeness was generally
249 observed for species in the phylum Oomycota but appeared to be more variable in the other
250 two phyla (Figure 1a). In contrast, QUAST analysis showed Labyrinthula SR_Ha_C
251 outperforming the Bigyra phylum average (green cells in Figure 1b) across all measured
252 metrics (Figure 1c). Similarly, high quality QUAST outcomes for Blastocystis hominis were
253 not reflected in BUSCO completeness values (Labyrinthula SR_Ha_C 60%, Blastocystis
254 hominis 45%).
255
12 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
256
257 Figure 1: Comparison of genome assemblies for Stramenopiles in three phyla: Bigyra,
258 Ochrophyta and Oomycota. a. (Left) N50 length and longest contig/scaffold for each
259 assembly (cutoff 20 Mbp); (Right) Percentages of complete and fragmented BUSCOs based
260 on 100 BUSCOs in stramenopiles_odb10. Number of assemblies per phylum are listed in
261 brackets. A detailed version and sources are available in Figure S2 and Table S1. b. Further
262 details on assemblies with sequence (contig/scaffold) length distribution for species in the
263 phylum Bigyra. Longest sequence length, N50 and N75 are converted to percentages out of
264 assembly length for more accurate comparisons across species with different genome sizes.
265 Green cells are for values that outperform the average and red for values poorer than average,
266 subject to the context of each column. c. QUAST-generated Figures showing a cumulative
267 length plot, Nx values and GC content for species in the phylum Bigyra (colour legend in
268 panel b). Assembly sources: Ensembl Protists, NCBI, JGI.
13 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
269
270 Though a surge of new genomes are being made available in more recent times, taxonomic
271 sampling of protists are largely skewed to select ‘popular’ lineages with relevance to medical,
272 agricultural and aquaculture interests. However, since gene model prediction, quality
273 assessment and other comparative genomic analyses largely rely on the support of closely-
274 related sequences; this patchy and uneven representation presents gaps in current knowledge
275 and challenges efforts to better characterise and understand genomes of less-studied
276 organisms. In the case of quality assessments, while the BUSCO algorithm has been shown
277 to be an effective and popular choice of tool for assessing the ‘completeness’ of genome
278 assemblies, its shortcomings have also been discussed especially in relation to lineages
279 sampled in its reference ortholog datasets (Exposito Alonso et al. 2020). Subject to
280 characteristics of an assessed genome, there can be technical limitations in estimating
281 proportions of fragmented and missing BUSCOs, and in species that are highly divergent
282 from the reference ortholog dataset, higher degrees of reported BUSCO ‘missingness’ may
283 reflect the organism’s evolutionary divergence and history rather than the completeness of its
284 assembly (Exposito Alonso et al. 2020). Our results show that the use of multiple
285 assessment qualities is important to more accurately assess the quality of assemblies,
286 particularly those of highly divergent, non-model organisms (Exposito Alonso et al. 2020).
287
288 Labyrinthula sp. gene annotation
289 A transcriptome with 1.2 million transcripts was generated and assisted in the annotation of
290 the genome, resulting in the prediction of >17,000 protein sequences (Table 1; annotation
291 results in Data S1). We applied a conservative filter to retain only gene models that were
292 either supported by existing sequence evidence or those that contain PFAM protein domains,
293 in order to avoid false gene predictions albeit at the cost of removing some real, but clade-
14 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
294 specific, novel genes. Since the annotation process typically relies on the homology of
295 sequences to those in other well-annotated reference genomes (Exposito Alonso et al. 2020),
296 the absence of a genome available as a close relative of Labyrinthula SR_Ha_C resulted in a
297 reduced final number of 8,488 proteins reported in this study (Table 1; Data S1). This is
298 likely an underestimation of the true number owing to the limited sequenced genomes in this
299 clade and further highlights the novelty of the Labyrinthula genome.
300
301 A survey of the annotated functions of these predicted peptides revealed rhodanese-like
302 proteins related to stress response and sulphur metabolism (Cipollone et al. 2007), as well as
303 a rhodopsin protein (Data S1). While light-activated functions have not been explored in
304 Labyrinthula species, phototaxis and rhodopsin-related functions have been increasingly
305 identified for estuarine zoosporic fungi and protists (Kazama 1972, Tian et al. 2018). We
306 noted proteins involved in sodium, potassium and calcium exchange and transport suggestive
307 of the pathways involved in ion homeostasis and salinity adaptations in halophytic organisms
308 (Harding et al. 2017). The annotation also revealed a wealth of actin-myosin and
309 carbohydrate-degradation proteins. A pairwise annotation, with orthologs of known
310 pathogenic protist Plasmodium falciparum and oomycete Phytophthora sojae, further
311 revealed glideosome and apicoplast ribosomal proteins (Data S1). The presence of a
312 glideosome protein in addition to the actin-myosin machinery suggests shared traits with the
313 apicomplexan family of pathogens, and opens an interesting avenue for investigating the
314 machinery underlying Labyrinthula parasitism and host invasion, as well as its evolutionary
315 history of endosymbiosis (Keeley and Soldati 2004, Oborník et al. 2009).
316
317 CAZymes and functional predictions facilitating plant cell wall degradation
15 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
318 A total of 112 genes associated with carbohydrate-active enzyme (CAZy) activities were
319 identified from the predicted Labyrinthula gene set (Figure 2, Table S3). This was distributed
320 across 56 glycosyltransferases (GT), 34 glycoside hydrolases (GH), 17 auxiliary activities
321 (AA), 3 polysaccharide lyases (PL) and 2 carbohydrate esterases (CE). We identified the
322 presence of genes categorised in several groups of putative aromatic and cellulose-degrading
323 enzymes with activities associated with the breakdown of plant cell walls. Some of these
324 enzyme families also show an enrichment of genes relative to other available protist data on
325 the CAZy database, including, the AA2 enzyme family (activities: manganese peroxidase,
326 lignin peroxidase), the PL1 family (activities: pectate lyase, pectin lyase), the GH1 family
327 (activities: β-glucosidases, β-galactosidases) and the GH5 family (activities:
328 endoglucanase/cellulase, etc) (Figure 2).
329
16 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
330
17 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
331 Figure 2: Annotation of Carbohydrate-active enzymes in Labyrinthula sp. isolate
332 SR_Ha_C in comparison with other Stramenopile profiles available in the CAZy
333 database. Comparative analysis was done for five enzyme classes: Auxiliary Activities
334 (AA), Carbohydrate Esterases (CE), Glycoside Hydrolases (GH), GlycosylTransferases (GT)
335 and Polysaccharide Lyases (PL). Enzyme family numbers correspond to those in the CAZy
336 database and are indicated on the x-axis. Asterisks (*) indicate the Labyrinthula isolate
337 SR_Ha_C reported in this study.
338
339 As these enzyme families can act on a diversity of substrates, we used the CUPP analysis to
340 more closely predict the function of the CAZymes. We found that the Labyrinthula
341 SR_Ha_C genome has a broad selection of cell wall-degrading and storage materials
342 modifying enzymes (starch, cellulose, trehalose, Beta-D-galactosides, α-mannan. α-1,2-
343 mannan), as well as glucosylceramidase that can be involved in pathways in plant hosts-
344 pathogen relationships (Table 2) (Warnecke and Heinz 2003). The three enzyme functions
345 with highest number of functionally annotated genes were 1,2-α-mannosidase (5), beta
346 glucosidase (4) and glucosylceramidase (3). Interestingly, functional annotation by CUPP did
347 not identify xylanolytic, chitinolytic or pectinolytic CAZymes. However, as the seagrass
348 fibres (particularly of the Zostera genus) are known to be rich in both xylan and pectin
349 (Davies et al. 2007) and the functional annotation by homology suggested xylanolytic and
350 pectinolytic proteins (Data S1), we suggest that Labyrinthula SR_Ha_C could have such
351 enzyme functionalities, but represented by enzymes too distantly related to the known
352 CAZymes to be identified.
353
354 The CUPP annotations of Labyrinthula SR_Ha_C were low (74) compared to the other three
355 Labyrinthulea species: A. kerguelens (129), S. aggregatum (215), and A. limacinum (266;
18 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
356 Tables 2, S4). Interestingly, the ratio of CAZymes, which could be predicted to function,
357 divides the four species in two groups. Notably, the two Thraustochytrid species A.
358 limacinum and S. aggregatum had 7% and 12 % of their CAZymes ascribed to function,
359 respectively, while the two Labyrinthulidae species had 29% (Labyrinthula sp.) and 24% (A.
360 kerguelense) ascribed. In short, among the four species studied, the richest CAZyme profile
361 has also the highest level of uniqueness found among its enzyme portfolio (Tables 2, S4).
362 Although we are working with a conservative predicted proteome, the rather low number of
363 CAZymes found in Labyrinthula SR_Ha_C may be interpreted to reflect its ecology, e.g.
364 being an opportunistic pathogen, it has a more specialised substrate/host affinity, as
365 compared to the more saprotrophic lifestyles of the other three species. Further, it is attractive
366 to interpret the more specialised enzyme profile of Labyrinthula SR_Ha_C. in the context of
367 its highly specialised ectoplasmic net, a unique structure not only attaching the Labyrinthula
368 colony to surfaces but also secreting digestive enzymes and nutritional uptake, similar to S.
369 aggregatum (Iwata and Honda 2018).
370
19
bioRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade
371 Table 2 Comparative functional annotation of CAZymes in Labyrinthula sp. isolate SR_Ha_C as compared to three other doi:
372 Labyrinthulea. Each line of the table specifies one type of enzyme function, listing EC number, and estimated substrate. Note, the number of https://doi.org/10.1101/2020.09.14.297390
373 CAZymes functionally annotated for all these four species are as can be seen rather few. This is due to most of the secretome profile of the
374 Labyrinthulea are too distantly related to known and characterised CAZymes to provide a basis for functional annotation. Further, as appears in
375 the case of Labyrinthula SR_Ha_C, all enzyme functions are represented by only one type of enzyme family, while for the more diverse available undera
376 CAZyme profiles of the three other species, some enzyme functions are represented by two enzyme families. The Gene Number columns are
377 given as heat maps. All CAZymes found by the CUPP analysis (both with predicted function and the CAZymes with no predicted function) are CC-BY-NC-ND 4.0Internationallicense ; 378 listed in Table S4. this versionpostedSeptember15,2020.
Labyrinthula Aplanochytrium Aurantiochytrium Schizochytrium SR_Ha_C kerguelense limacinum aggregatum
Labyrinthulidae Aplanochytriidae Thraustochytriaceae Thraustochytriaceae
Enzyme Enzyme Enzyme Enzyme Nb. No. No. No. Estimated families families families families EC no. Function description of of of of overall substrate representing representing representing representing genes genes genes genes function function function function . 2 1 1 1 1 1 GH31 3 1 - 1 - The copyrightholderforthispreprint Starch 3.2.1.20 α-glucosidase GH13 GH31 GH31 GH31 Endo-1,4-β-D- Cellulose 2 2 GH5 2 2 GH5 - 0 - - 1 1 GH5 - 3.2.1.4 glucanase Cellulose 3.2.1.21 β-glucosidase 4 4 GH1 1 1 GH1 - 0 - - 1 1 GH1 - Xylan 3.2.1.37 1,4-β-xylosidase 0 - 1 1 GH3 - 1 1 GH3 - 2 2 GH3 - 1 0 - 1 - 0 - - 0 - - Fucose 3.2.1.51 α-L-fucosidase GH29
20
bioRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade
Labyrinthula Aplanochytrium Aurantiochytrium Schizochytrium doi: SR_Ha_C kerguelense limacinum aggregatum https://doi.org/10.1101/2020.09.14.297390 Labyrinthulidae Aplanochytriidae Thraustochytriaceae Thraustochytriaceae
Enzyme Enzyme Enzyme Enzyme Nb. No. No. No. Estimated families families families families EC no. Function description of of of of overall substrate representing representing representing representing genes genes genes genes function function function function available undera 1 0 - 1 - 0 - - 0 - - Arabinan 3.2.1.88 β-L-arabinosidase GH27 1 β-D-galactosides 3.2.1.23 β-galactosidase 1 1 GH2 1 1 GH2 - 0 - - 3 1 GH1 GH35 1 1 1 α-D-galactosides 3.2.1.22 α-galactosidase 1 1 GH36 2 0 - - 1 - CC-BY-NC-ND 4.0Internationallicense GH27 GH36 GH36 ;
β-mannan 3.2.1.25 β-mannosidase 0 - 1 1 GH2 - 0 - - 0 - - this versionpostedSeptember15,2020. 1 1 1 α-1,3-glucan 3.2.1.84 1,3-α-glucosidase 1 1 GH31 1 - 1 - 1 - GH31 GH31 GH31 1 1 1 Chitin 3.2.1.14 chitinase 0 - 2 0 - - 1 - GH19 GH18 GH18 N-acetyl-β-D- β-N- 1 2 3.2.1.52 0 - 1 - 2 - 0 - - hexosaminides acetylhexosaminidase GH20 GH20 endo-1,3-β-D- 1 β-1,3-glucan 3.2.1.39 0 - 0 - - 0 - - 1 - glucosidase GH16 1 Trehalose 3.2.1.28 α,α-trehalase 1 1 GH37 1 - 0 - - 0 - - GH37
Galactosamino- 1 . 3.2.1.109 endogalactosaminidase 0 - 0 - - 0 - - 1 - glycan GH114 maltopentaose-forming α-1,6-mannan 3.2.1.x 0 - 1 - 0 - 0 - The copyrightholderforthispreprint α-amylase 1 1 α-mannan 3.2.1.24 α-mannosidase 1 1 GH38 1 - 0 - - 1 - GH38 GH38 4 2 2 1 3 α-1,2-mannan 3.2.1.113 1,2-α-mannosidase 5 5 GH47 4 - 4 4 GH47 GH92 GH47 GH92 GH47 glycoprotein endo- 2 α-1,2-mannan 3.2.1.130 0 - 2 - 0 - - 0 - - alpha-1,2-mannosidase GH99 Glycoprotein 3.2.1.96 endo-β-N- 0 - 1 1 - 0 - - 0 - -
21
bioRxiv preprint (which wasnotcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade
Labyrinthula Aplanochytrium Aurantiochytrium Schizochytrium doi: SR_Ha_C kerguelense limacinum aggregatum https://doi.org/10.1101/2020.09.14.297390 Labyrinthulidae Aplanochytriidae Thraustochytriaceae Thraustochytriaceae
Enzyme Enzyme Enzyme Enzyme Nb. No. No. No. Estimated families families families families EC no. Function description of of of of overall substrate representing representing representing representing genes genes genes genes function function function function available undera acetylglucosaminidase GH85 N-acetyl-D- N-acetylglucosamine- glucosamine 6- 3.5.1.25 6-phosphate 0 - 0 - - 1 1 CE9 - 1 1 CE9 - phosphate deacetylase
Peptidophospho- CC-BY-NC-ND 4.0Internationallicense 3.2.1.146 β-galactofuranosidase 0 - 1 1 GH2 - 0 - - 0 - - galactomannan ; 1 1 1 this versionpostedSeptember15,2020. Cerebrosides 3.2.1.46 galactosylceramidase 0 - 1 - 1 - 1 - GH59 GH59 GH59 2 Cerebrosides 3.2.1.45 glucosylceramidase 3 2 GH5 2 2 GH5 - 5 3 GH5 3 3 GH5 - GH30 cytochrome-c Peroxide 1.11.1.5 1 1 AA2 1 1 AA2 - 1 1 AA2 - 2 2 AA2 - peroxidase p-benzoquinone Aromatics 1.6.5.6 1 1 AA6 0 - - 4 4 AA6 - 1 1 AA6 - reductase (NADPH) 379
380 . The copyrightholderforthispreprint
22 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
381
382 Labyrinthula SR_Ha_C shared several Function:Family constellations with the other three
383 Labyrinthulea species, that is, the four glycohydrolases, belonging to GH31 and GH38, are
384 representing the same four EC functions in all species (Table 2). However significant
385 discrepancies in Function:Family constellations are found as well: Enzyme function EC
386 3.2.1.113 predictive of α-mannan is represented by GH38 in Labyrinthula SR_Ha_C and A.
387 kerguelense; while represented by two different proteins, GH47 and GH92 in the two
388 Thraustochytriaceae species (A. limacinum and S. aggregatum). All in all, these observations
389 reflect relatedness among the Labyrinthulea species, as well as indicating that they are only
390 distantly related in digestive enzyme profiles, and likewise, substrate affinities.
391
392 Evolutionary relationships within Bigyra and Stramenopile
393 We recovered a complete circular Labyrinthula SR_Ha_C mitogenome (accession number:
394 MT267870) (Table 1, Figure S3). All genes were transcribed on the heavy strand, which has
395 a guanine-rich base composition of 28.18% A, 8.26% C, 39.16% G and 24.40% T. In
396 addition to the 13 protein-coding genes (PCG) typically found in animal mitogenomes, 10
397 other PCGs found in other heterotrophic protists were present in the Labyrinthula
398 mitogenome (Wideman et al. 2020), namely atp9, nad7, nad9, nad11, rpl16, rps11-13, tatA
399 and tatC. At the time of this study, molecular sequences for Labyrinthula species were scant
400 on the NCBI database, reporting only 572 nucleotide fragments, majority of which are
401 ribosomal RNA and ITS sequences. Our complete annotated mitogenome showed an
402 interesting strand asymmetry characteristic, containing a high guanine (G) to cytosine (C)
403 ratio of 4.74:1 on its heavy strand versus an average of 1.22:1 in other Stramenopiles.
404 Guanine-enriched sequences in other eukaryotes have been shown to adopt G-quadruplex
405 (G4) stable structures and may play potential roles in the regulation of DNA replication,
23 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
406 transcription and translation (Falabella et al. 2019). Using G4Hunter (Brázda et al. 2019),
407 Labyrinthula SR_Ha_C’s predicted G4 frequency (15/1kbp) was 50- and 300-fold higher
408 than predicted in Schizochytrium sp. TIO1101 and Cafeteria roenbergensis, respectively. It is
409 difficult to infer that the cause of this high frequency due to the newness of this metric for
410 non-human mitogenomes. However, as more mitogenomes are made available in the future, it
411 will be of interest to see if this high frequency is observed in the mitogenomes of other more
412 closely-related species and whether G4 richness is associated with protist evolutionary
413 history or as potential regulators of virulence in pathogens (Harris and Merrick 2015).
414
415
416 Figure 3: Evolutionary relationships of Labyrinthula sp. isolate SR_Ha_C and other
417 species within Bigyra and Stremenopiles more generally. Phylogenetic tree based on 13
418 mitochondrial protein-coding genes (atp, cox, nad, cytb). This tree contains 127
419 Stramenopiles species/isolates in three phyla (Bigyra:13, Ochrophyta:91, Oomycota:23) and
420 three Rhodophyta species as outgroup, rooted with Reclinomonas americana. Nodal support
24 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
421 is indicated in the form of SH-aLRT / UFBoot values, with reliable clades having ≥80% and
422 ≥95% values, respectively. Closed circles indicate maximum support (100/100) while open
423 circles report less than maximum values but above the reliable threshold.
424
425 In the reconstructed phylogeny of Stramenopiles, Labyrinthula and other protist species in
426 the phylum, Bigyra formed a paraphyletic group at the base of Stramenopile, in addition to
427 monophyletic clades for the other two phyla (Oomycota and Ochrophycota) (Figure 3;
428 superalignment and phylogenetic tree is available in Data S2). Within Bigyra itself,
429 Labyrinthula SR_Ha_C shared a strongly-supported sister relationship with two
430 Thraustochytrida species (Thraustochytrium aureum and Schizochytrium sp. TIO1101), all
431 three of which were taxonomically assigned to the class Labyrinthulea. This clade clustered
432 with species from the other two major phyla, consistent with current taxonomic
433 classifications and reports in other studies based on nuclear data (Tsui et al. 2009). However,
434 this inference should be treated with caution since inadequate and uneven representation of
435 branches in phylogenetic trees can introduce errors caused by long-branch attraction and false
436 monophyly, which can lead to inaccurate phylogenetic relationships. It is therefore crucial to
437 recognise the importance of strategic sampling to increase mitogenomic resources for under-
438 studied taxonomic groups, thereby filling in the gaps in taxonomic representation to enable
439 more comprehensive investigations into molecular diversity and evolutionary histories of
440 protists (Wideman et al. 2020).
441
442 Ecological and biotechnology implications of the Labyrinthula SR_Ha_C genome
443 This study reports the first assembled genome of a Labyrinthula species, and adds valuable
444 genomic and transcriptomic resources to the limited molecular sequences available for the
445 group Labyrinthulea (Song et al. 2018). The recovery of the complete mitochondrial genome
25 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
446 makes this the first of its kind for the order Labyrinthulida, strongly supporting a
447 monophyletic Labyrinthulida+Thraustochytrida clade, while providing insights into the
448 evolutionary history of Stramenopile species. Our cross-phyla comparison of >100
449 Stramenopile genome assemblies unveiled possible caveats in current existing quality
450 assessment tools that need to be carefully considered when assessing work for species from
451 divergent and under-sampled taxonomic groups. Genome annotation of Labyrinthula
452 SR_Ha_C opens comparisons to the recently published aquatic early lineage zoosporic fungi
453 (Lange et al. 2019a). Further, the presented genome could act as a reference to propel new
454 research directions for the Labyrinthula genus, such as population genetics (Lohan et al.
455 2020).
456
457 While our mining of the draft genome was not exhaustive, our initial investigations show
458 promise in highlighting novel pathways to understanding Labyrinthula ecology and the
459 SWD-pathosystem. The CAZymes and predicted CAZy function profiles provided insight
460 into cell-wall breakdown. These pathways have been largely assumed through the
461 classification of Labyrinthula’s saprobic lifestyle, but such evidence relating to this
462 ecological function has most often been limited to their thraustochytrid cousins (Raghukumar
463 and Damare 2011). Cell-wall and fibre breakdown not only has significant implications to
464 marine carbon sequestration, but in conjunction with adhesion to the host, highlights
465 mechanisms that may be key to the invasion of living seagrass tissue (Gibson et al. 2011,
466 Iwata and Honda 2018). These results will hopefully provide a springboard to future research
467 on Labyrinthula’s virulence or SWD assays using both host and pathogen gene responses.
468
469 Our study also highlights the industrial potential for Labyrinthula. Enzyme secretome
470 profiles and specialised functional proteins of fungal and non-fungal zoosporic unicellular
26 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
471 organisms, such as Labyrinthula, share many similarities with regard to enzyme functions,
472 but also reflect different types of plant biomass affinities and degrading capabilities (Lange et
473 al. 2019b). Secretome organisation, e.g. one enzyme function may be represented by several
474 types of protein families or one type of protein family may include several enzyme functions,
475 could reflect a strategy that is evolutionary optimised for robustness to varying conditions
476 (salinity, temperature, pH, substrate availability). Notably, the uniqueness of the genome of
477 Labyrinthula SR_Ha_C, combined with the very low number of sequenced genomes in this
478 group makes the genome/transcriptome and function predictions described here a hotspot for
479 future enzyme discovery activities. For example, the lignocellulose repertoire of plant
480 pathogens is inspiring research on their use for biomass conversion to biofuels or higher
481 value products (Gibson et al. 2011), while thraustochytrids are rich source bioproducts
482 (Marchan et al. 2018). We expect that much more could be found from this highly diverse,
483 un-exploited and under-studied group of microorganisms.
484
485
27 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
486 Acknowledgments
487 This research utilised computational resources and services provided by the National
488 Computational Infrastructure (NCI), which is supported by the Australian Government. We
489 thank Samuel Lysy for his assistance in the lab. We would also like to thank Robert Ruge of
490 Deakin University for assistance and use of the SIT HPC Cluster. Funding for this study was
491 provided by the Mary Collins Trust and the Deakin Genomics Centre, Deakin University
492 (Australia).
493
494 Availability of data and material
495 The whole genome assembly was deposited into the NCBI/GenBank Whole Genome
496 Sequence (WGS) database with the accession number JAALGZ000000000 and raw genomic
497 reads into NCBI/Short Read Archive (SRA) with accession numbers SRR12496087 and
498 SRR12496088, all of which are under the BioProject PRJNA607370. The mitochondrial
499 genome was deposited into the NCBI/GenBank database with the accession number
500 MT267870.
501
502 Authors’ contributions
503 STT, SL, MT, LC, LL and FHG designed the research and direction of analyses. STT
504 collected and isolated the sample. SL performed whole genome and transcriptome
505 sequencing. MT assembled and annotated the genome and transcriptome, and carried out
506 phylogenetic analysis. LC and MT performed comparative analyses of assemblies and
507 proteomes of other protist genomes. LL, BP and MT analysed the peptide-based functional
508 annotation of CAZymes. MT and STT led the writing of the manuscript. All authors edited
509 and developed the manuscript.
510
28 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
511 References
512 Barrett, K., and L. Lange. 2019. Peptide-based functional annotation of carbohydrate-active 513 enzymes by conserved unique peptide patterns (CUPP). Biotechnology for Biofuels 514 12:1-21. 515 Boore, J. L. 1999. Animal mitochondrial genomes. Nucleic Acids Research 27:1767-1780. 516 Brakel, J., T. B. Reusch, and A.-C. Bockelmann. 2017. Moderate virulence caused by the 517 protist Labyrinthula zosterae in ecosystem foundation species Zostera marina under 518 nutrient limitation. Marine Ecology Progress Series 571:97-108. 519 Brakel, J., F. J. Werner, V. Tams, T. B. Reusch, and A.-C. Bockelmann. 2014. Current 520 European Labyrinthula zosterae are not virulent and modulate seagrass (Zostera 521 marina) defense gene expression. PloS one 9:e92448. 522 Brázda, V., J. Kolomazník, J. Lýsek, M. Bartas, M. Fojta, J. Šťastný, and J.-L. Mergny. 2019. 523 G4Hunter web application: a web server for G-quadruplex prediction. Bioinformatics 524 35:3493-3495. 525 Buchfink, B., C. Xie, and D. H. Huson. 2015. Fast and sensitive protein alignment using 526 DIAMOND. Nature Methods 12:59-60. 527 Cantarel, B. L., P. M. Coutinho, C. Rancurel, T. Bernard, V. Lombard, and B. Henrissat. 528 2008. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for 529 Glycogenomics. Nucleic Acids Research 37:D233-D238. 530 Cipollone, R., P. Ascenzi, and P. Visca. 2007. Common themes and variations in the 531 rhodanese superfamily. IUBMB Life 59:51-59. 532 Collier, J. L., and J. S. Rest. 2019. Swimming, gliding, and rolling toward the mainstream: 533 cell biology of marine protists. Molecular Biology of the Cell 30:1245-1248. 534 Consortium, T. U. 2018. UniProt: a worldwide hub of protein knowledge. Nucleic Acids 535 Research 47:D506-D515. 536 Davies, P., C. Morvan, O. Sire, and C. Baley. 2007. Structure and properties of fibres from 537 sea-grass (Zostera marina). Journal of Materials Science 42:4850-4857. 538 Duffin, P., D. L. Martin, K. M. Pagenkopp Lohan, and C. Ross. 2020. Integrating host 539 immune status, Labyrinthula spp. load and environmental stress in a seagrass 540 pathosystem: Assessing immune markers and scope of a new qPCR primer set. PloS 541 one 15:e0230108. 542 Exposito Alonso, M., H. G. Drost, H. A. Burbano, and D. Weigel. 2020. The Earth 543 BioGenome project: opportunities and challenges for plant genomics and 544 conservation. The Plant Journal 102:222-229. 545 Falabella, M., R. J. Fernandez, F. B. Johnson, and B. A. Kaufman. 2019. Potential roles for 546 G-quadruplexes in mitochondria. Current medicinal chemistry 26:2918-2932. 547 Fernandes, N., R. J. Case, S. R. Longford, M. R. Seyedsayamdost, P. D. Steinberg, S. 548 Kjelleberg, and T. Thomas. 2011. Genomes and virulence factors of novel bacterial 549 pathogens causing bleaching disease in the marine red alga Delisea pulchra. PloS one 550 6:e27387. 551 Gibson, D. M., B. C. King, M. L. Hayes, and G. C. Bergstrom. 2011. Plant pathogens as a 552 source of diverse enzymes for lignocellulose digestion. Current Opinion in 553 Microbiology 14:264-270. 554 Haas, B. J., A. Papanicolaou, M. Yassour, M. Grabherr, P. D. Blood, J. Bowden, M. B. 555 Couger, D. Eccles, B. Li, M. Lieber, M. D. MacManes, M. Ott, J. Orvis, N. Pochet, F. 556 Strozzi, N. Weeks, R. Westerman, T. William, C. N. Dewey, R. Henschel, R. D. 557 LeDuc, N. Friedman, and A. Regev. 2013. De novo transcript sequence reconstruction 558 from RNA-seq using the Trinity platform for reference generation and analysis. 559 Nature Protocols 8:1494-1512.
29 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
560 Harding, T., A. J. Roger, and A. G. Simpson. 2017. Adaptations to high salt in a halophilic 561 protist: differential expression and gene acquisitions through duplications and gene 562 transfers. Frontiers in Microbiology 8:944. 563 Harris, L. M., and C. J. Merrick. 2015. G-quadruplexes in pathogens: a common route to 564 virulence control? PLoS Pathog 11:e1004562. 565 Harvell, C., K. Kim, J. Burkholder, R. Colwell, P. R. Epstein, D. Grimes, E. Hofmann, E. 566 Lipp, A. Osterhaus, and R. M. Overstreet. 1999. Emerging marine diseases--climate 567 links and anthropogenic factors. Science 285:1505-1510. 568 Holt, C., and M. Yandell. 2011. MAKER2: an annotation pipeline and genome-database 569 management tool for second-generation genome projects. BMC Bioinformatics 570 12:491. 571 Howe, K. L., B. Contreras-Moreira, N. De Silva, G. Maslen, W. Akanni, J. Allen, J. Alvarez- 572 Jarreta, M. Barba, D. M. Bolser, L. Cambell, M. Carbajo, M. Chakiachvili, M. 573 Christensen, C. Cummins, A. Cuzick, P. Davis, S. Fexova, A. Gall, N. George, L. Gil, 574 P. Gupta, K. E. Hammond-Kosack, E. Haskell, S. E. Hunt, P. Jaiswal, S. H. Janacek, 575 P. J. Kersey, N. Langridge, U. Maheswari, T. Maurel, M. D. McDowall, B. Moore, 576 M. Muffato, G. Naamati, S. Naithani, A. Olson, I. Papatheodorou, M. Patricio, M. 577 Paulini, H. Pedro, E. Perry, J. Preece, M. Rosello, M. Russell, V. Sitnik, D. M. 578 Staines, J. Stein, M. K. Tello-Ruiz, S. J. Trevanion, M. Urban, S. Wei, D. Ware, G. 579 Williams, A. D. Yates, and P. Flicek. 2019. Ensembl Genomes 2020—enabling non- 580 vertebrate genomic research. Nucleic Acids Research 48:D689-D695. 581 Iwata, I., and D. Honda. 2018. Nutritional intake by ectoplasmic nets of Schizochytrium 582 aggregatum (Labyrinthulomycetes, Stramenopiles). Protist 169:727-743. 583 Kazama, F. 1972. Ultrastructure and phototaxis of the zoospores of Phlyctochytrium sp., an 584 estuarine chytrid. Microbiology 71:555-566. 585 Keeley, A., and D. Soldati. 2004. The glideosome: a molecular machine powering motility 586 and host-cell invasion by Apicomplexa. Trends in Cell Biology 14:528-532. 587 Kyrpides, N. C., P. Hugenholtz, J. A. Eisen, T. Woyke, M. Göker, C. T. Parker, R. Amann, 588 B. J. Beck, P. S. Chain, and J. Chun. 2014. Genomic encyclopedia of bacteria and 589 archaea: sequencing a myriad of type strains. PLoS biology 12:e1001920. 590 Lang, B. F., G. Burger, C. J. O'Kelly, R. Cedergren, G. B. Golding, C. Lemieux, D. Sankoff, 591 M. Turmel, and M. W. Gray. 1997. An ancestral mitochondrial DNA resembling a 592 eubacterial genome in miniature. Nature 387:493-497. 593 Lange, L., K. Barrett, B. Pilgaard, F. Gleason, and A. Tsang. 2019a. Enzymes of early- 594 diverging, zoosporic fungi. Applied Microbiology and Biotechnology 103:6885-6902. 595 Lange, L., B. Pilgaard, F.-A. Herbst, P. K. Busk, F. Gleason, and A. G. Pedersen. 2019b. 596 Origin of fungal biomass degrading enzymes: Evolution, diversity and function of 597 enzymes of early lineage fungi. Fungal Biology Reviews 33:82-97. 598 Leander, C. A., and D. Porter. 2001. The Labyrinthulomycota is comprised of three distinct 599 lineages. Mycologia:459-464. 600 Lewin, H. A., G. E. Robinson, W. J. Kress, W. J. Baker, J. Coddington, K. A. Crandall, R. 601 Durbin, S. V. Edwards, F. Forest, and M. T. P. Gilbert. 2018. Earth BioGenome 602 Project: Sequencing life for the future of life. Proceedings of the National Academy 603 of Sciences 115:4325-4333. 604 Lohan, K. M. P., R. DiMaria, D. L. Martin, C. Ross, and G. M. Ruiz. 2020. Diversity and 605 microhabitat associations of Labyrinthula spp. in the Indian River Lagoon System. 606 Diseases of Aquatic Organisms 137:145-157. 607 Marchan, L. F., K. J. L. Chang, P. D. Nichols, W. J. Mitchell, J. L. Polglase, and T. 608 Gutierrez. 2018. Taxonomy, ecology and biotechnological applications of 609 thraustochytrids: A review. Biotechnology Advances 36:26-46.
30 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
610 Martin, D. L., Y. Chiari, E. Boone, T. D. Sherman, C. Ross, S. Wyllie-Echeverria, J. K. 611 Gaydos, and A. A. Boettcher. 2016. Functional, phylogenetic and host-geographic 612 signatures of Labyrinthula spp. provide for putative species delimitation and a global- 613 scale view of seagrass wasting disease. Estuaries and Coasts 39:1403-1421. 614 Mikheenko, A., A. Prjibelski, V. Saveliev, D. Antipov, and A. Gurevich. 2018. Versatile 615 genome assembly evaluation with QUAST-LG. Bioinformatics 34:i142-i150. 616 Oborník, M., J. Janouškovec, T. Chrudimský, and J. Lukeš. 2009. Evolution of the apicoplast 617 and its hosts: from heterotrophy to autotrophy and back again. International Journal 618 for Parasitology 39:1-12. 619 Raghukumar, S., and V. S. Damare. 2011. Increasing evidence for the important role of 620 Labyrinthulomycetes in marine ecosystems. Botanica Marina 54:3-11. 621 Roach, M. J., S. A. Schmidt, and A. R. Borneman. 2018. Purge Haplotigs: allelic contig 622 reassignment for third-gen diploid genome assemblies. BMC Bioinformatics 19:460. 623 Seppey, M., M. Manni, and E. M. Zdobnov. 2019. BUSCO: Assessing Genome Assembly 624 and Annotation Completeness. Pages 227-245 in M. Kollmar, editor. Gene Prediction: 625 Methods and Protocols. Springer New York, New York, NY. 626 Sibbald, S. J., and J. M. Archibald. 2017. More protist genomes needed. Nature Ecology & 627 Evolution 1:1-3. 628 Song, Z., J. E. Stajich, Y. Xie, X. Liu, Y. He, J. Chen, G. R. Hicks, and G. Wang. 2018. 629 Comparative analysis reveals unexpected genome features of newly isolated 630 Thraustochytrids strains: on ecological function and PUFAs biosynthesis. BMC 631 genomics 19:541. 632 Sullivan, B. K., S. M. Trevathan-Tackett, S. Neuhauser, and L. L. Govers. 2018. Host- 633 pathogen dynamics of seagrass diseases under future global change. Marine Pollution 634 Bulletin 134:75-88. 635 Tian, Y., S. Gao, S. Yang, and G. Nagel. 2018. A novel rhodopsin phosphodiesterase from 636 Salpingoeca rosetta shows light-enhanced substrate affinity. Biochemical Journal 637 475:1121-1128. 638 Trevathan-Tackett, S. M., B. K. Sullivan, K. Robinson, O. Lilje, P. I. Macreadie, and F. H. 639 Gleason. 2018. Pathogenic Labyrinthula associated with Australian seagrasses: 640 Considerations for seagrass wasting disease in the southern hemisphere. 641 Microbiological Research 206:74-81. 642 Tsui, C. K., W. Marshall, R. Yokoyama, D. Honda, J. C. Lippmeier, K. D. Craven, P. D. 643 Peterson, and M. L. Berbee. 2009. Labyrinthulomycetes phylogeny and its 644 implications for the evolutionary loss of chloroplasts and gain of ectoplasmic gliding. 645 Molecular Phylogenetics and Evolution 50:129-140. 646 Vurture, G. W., F. J. Sedlazeck, M. Nattestad, C. J. Underwood, H. Fang, J. Gurtowski, and 647 M. C. Schatz. 2017. GenomeScope: fast reference-free genome profiling from short 648 reads. Bioinformatics 33:2202-2204. 649 Warnecke, D., and E. Heinz. 2003. Recently discovered functions of glucosylceramides in 650 plants and fungi. Cellular and Molecular Life Sciences CMLS 60:919-941. 651 Wideman, J. G., A. Monier, R. Rodríguez-Martínez, G. Leonard, E. Cook, C. Poirier, F. 652 Maguire, D. S. Milner, N. A. Irwin, and K. Moore. 2020. Unexpected mitochondrial 653 genome diversity revealed by targeted single-cell genomics of heterotrophic 654 flagellated protists. Nature Microbiology 5:154-165. 655 Zhang, H., T. Yohe, L. Huang, S. Entwistle, P. Wu, Z. Yang, P. K. Busk, Y. Xu, and Y. Yin. 656 2018. dbCAN2: a meta server for automated carbohydrate-active enzyme annotation. 657 Nucleic acids research 46:W95-W101. 658 Zimin, A. V., D. Puiu, M.-C. Luo, T. Zhu, S. Koren, G. Marçais, J. A. Yorke, J. Dvořák, and 659 S. L. Salzberg. 2017. Hybrid assembly of the large and highly repetitive genome of
31 bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.297390; this version posted September 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
660 Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads 661 algorithm. Genome Research 27:787-792. 662
32