
bioRxiv preprint doi: https://doi.org/10.1101/247858; this version posted January 15, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Early diverging pathogenic fungi of the order Entomophthorales possess diverse and unique subtilisin-like serine proteases

Authors
Jonathan A. Arnesen1, Joanna Malagocka1, Andrii Gryganskyi2, Igor V. Grigoriev3, Kerstin Voigt4,5, Jason E. Stajich6 and Henrik H. De Fine Licht1*

1Section for Organismal Biology, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsenvej 40, 1871 Frederiksberg, Denmark.
2Department of Biology, Duke University, Durham, North Carolina, USA.
3US Department of Energy Joint Genome Institute, Walnut Creek, California, USA.
4Jena Microbial Resource Collection (JMRC), Leibniz Institute for Natural Product Research and Infection Biology - Hans Knoell Institute, Jena, Germany.
5Institute of , Friedrich Schiller University, Jena, Germany.
6Department of and Microbiology, University of California, Riverside, California, USA.

*Author for correspondence:
Section for Organismal Biology, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsenvej 40, 1871 Frederiksberg, Denmark. Email: [email protected], Tel: +45 35320097


Abstract
Insect-pathogenic fungi use subtilisin-like serine proteases (SLSPs) to degrade chitin-associated proteins in the insect procuticle. Most insect-pathogenic fungi in the order Hypocreales (Ascomycota) are generalist species with a broad host-range, and most species possess a high number of SLSPs. The other major clade of insect-pathogenic fungi is part of the subphylum Entomophthoromycotina (Zoopagomycota, formerly Zygomycota), which consists of highly host-specific insect-pathogenic fungi that naturally infect only a single or very few host species. The extent to which insect-pathogenic fungi in the order Entomophthorales rely on SLSPs is unknown. Here we take advantage of recently available transcriptomic and genomic datasets from four genera within Entomophthoromycotina: the saprobic or opportunistic Basidiobolus meristosporus, Conidiobolus coronatus, C. thromboides, C. incongruus, and the host-specific insect pathogens Entomophthora muscae and Pandora formicae, specific pathogens of house flies (Musca domestica) and wood ants (Formica polyctena), respectively. We use phylogenetics and protein domain analysis to show that the obligate biotrophic fungi E. muscae and P. formicae and the saprobic human pathogen C. incongruus all contain “classical” fungal SLSPs and a unique group of SLSPs that loosely resembles bacillopeptidase F-like SLSPs. This novel group of SLSPs is found in the genomes of obligate insect pathogens and of a generalist saprobic opportunistic pathogen, so these SLSPs are unlikely to be responsible for the host specificity of Entomophthorales. However, this class represents a unique group of SLSPs so far only observed among Bacteria, Oomycetes and early diverging fungi such as Cryptomycota, Microsporidia, and Entomophthoromycotina, and missing in the sister fungal lineages of or the fungal phyla Mucoromycota, Ascomycota and Basidiomycota, suggesting interesting loss patterns.

Keywords
Subtilase; insect-pathogen; early-diverging fungi; proteases


1. Introduction
Insect pathogenic fungi use a broad array of enzymes to penetrate the host cuticle and gain entry to the soft tissues inside (Charnley, 2003; St. Leger et al., 1986b). In many cases, serine proteases are among the first enzymes to be secreted in the early stages of infection in order to cleave and open up chitin-associated proteins in the procuticle (St. Leger et al., 1986a; Vega et al., 2012), which is later followed by extensive secretion of lipases and chitinases (Charnley, 2003). In particular, subtilisin-like serine proteases (SLSPs) have been considered important virulence factors in pathogenic fungi (Muszewska et al., 2011). The first SLSPs from insect pathogenic fungi were identified in Metarhizium anisopliae (ARSEF2575), which secretes SLSPs as some of the key proteases during fungal growth on insect cuticle (Charnley, 2003; St. Leger et al., 1986a). Comparative genomic approaches have identified significant expansions of the SLSP gene family in the genus Metarhizium (Bagga et al., 2004; Hu et al., 2014), the insect pathogenic Beauveria bassiana (Xiao et al., 2012), two nematode-trapping fungi Monacrosporium haptotylum and Arthrobotrys oligospora that are able to penetrate the chitinaceous cell wall of soil nematodes (Meerupati et al., 2013), and dermatophytic fungi such as Arthroderma benhamiae and Trichophyton verrucosum that can cause nail and in humans and (Burmester et al., 2011; Desjardins et al., 2011; Martinez et al., 2012; Sharpton et al., 2009). Fungi that are able to utilize chitin-rich substrates, including many insect pathogenic fungi, thus often appear to be associated with a diversified and expanded set of SLSPs.

Although SLSPs are expanded among insect pathogenic fungi, this group of proteases is ubiquitous among eukaryotic organisms.
Most SLSPs are secreted externally or localized to vacuoles, and especially in saprobic and symbiotic fungi SLSPs constitute an important component of the secretome (Li et al., 2017). According to the MEROPS peptidase classification, the S8 family of SLSPs together with the S53 family of serine-carboxyl proteases make up the SB clan of subtilases (Rawlings et al., 2016). The S8 family of SLSPs is characterized by an Asp-His-Ser catalytic triad (DHS triad), which forms the active site, and is further divided into two subfamilies, S8A and S8B. Subfamily S8A contains most S8 representatives, including the well-known Proteinase K enzyme that is widely used in laboratories as a broad-spectrum protease. The S8B SLSPs are kexins and furins, which cleave peptides and protohormones (Jalving et al., 2000; Muszewska et al., 2017, 2011). Based on characteristic protein domain architectures and protein motifs surrounding the active site residues, the large S8A subfamily of SLSPs is further divided into several groups such as proteinase K and pyrolysin. Besides these two major groups, six new groups of subtilases designated new 1 to new 6 have recently been found (Li et al., 2017; Muszewska et al., 2011). The analysis of fungal genome data from a wide taxonomic range has shown that the size of the proteinase K gene family has expanded independently in fungi pathogenic to invertebrates (Hypocreales) and vertebrates () (Muszewska et al., 2017; Sharpton et al., 2009). Closely related systemic human-pathogenic fungi, however, do not show the same expansions, and related pathogens and non-pathogens can show the same expansions (Muszewska et al., 2011; Whiston and Taylor, 2016). This suggests that the number of SLSPs that a fungus possesses is not directly related to pathogenicity, but instead is associated with the use of dead or living tissue as growth substrate (Li et al., 2017; Muszewska et al., 2011).

Most anamorphic insect-pathogenic fungi in the order Hypocreales (Ascomycota) are generalist species with a broad host-range capable of infecting most major orders of insects (e.g. M. robertsii and B. bassiana) or specific to larger phylogenetic groups (e.g. the locust-specific M. acridum or the coleopteran pathogen B. brongniartii). The above inferences of fungal SLSP evolution rely almost exclusively on insights from Ascomycota, and consequently have a strong sampling bias towards generalist insect-pathogenic fungi. In contrast, the other major clade of insect-pathogenic fungi in the subphylum Entomophthoromycotina (Zoopagomycota, formerly Zygomycota) consists almost exclusively of insect-pathogens, and many are extremely host-specific, naturally only infecting a single or very few host species (Spatafora et al., 2016).
The dearth of genomic data for Entomophthoromycotina has previously precluded their inclusion in comparative genomic analyses (De Fine Licht et al., 2016; Gryganskyi and Muszewska, 2014). Here we take advantage of recently available transcriptomic and genomic datasets from four genera within Entomophthoromycotina: the saprobic Basidiobolus meristosporus, the saprobic and opportunistic pathogens Conidiobolus coronatus, C. thromboides and C. incongruus, and the host-specific insect pathogens Entomophthora muscae and Pandora formicae, specific pathogens of house flies (Musca domestica) and wood ants (Formica polyctena), respectively. We use phylogenetics and protein domain analysis to show that the obligate biotrophic fungi E. muscae and P. formicae and the saprobic C. incongruus, in addition to more “classical” fungal SLSPs, harbor a unique group of SLSPs that loosely resembles bacillopeptidase F-like SLSPs.


2. Materials and Methods
Sequence database searches for subtilisin-like serine proteases
We identified putative subtilisin-like serine proteases (SLSPs) from six fungi in the order Entomophthorales: Entomophthora muscae, Pandora formicae, Basidiobolus meristosporus, Conidiobolus coronatus, C. incongruus and C. thromboides. First, Pfam protein family domains were identified in the de novo assembled transcriptomes of E. muscae KVL-14-117 (De Fine Licht et al., 2017) and P. formicae (Małagocka et al., 2015) using profile hidden Markov models with hmmscan searches (e-value < 1e-10) against the Pfam-A database ver. 31.0 using HMMER ver. 3 (Eddy, 1998; Finn et al., 2016). All sequences in the transcriptome datasets containing the S8 subtilisin-like protease domain (PF00082) were identified and included in further analyses. Second, all sequences that contain the PF00082 domain were obtained from the genomes of B. meristosporus CBS 931.73 (Mondo et al., 2017), C. coronatus NRRL28638 (Chang et al., 2015), and C. thromboides FSU 785 from the US Department of Energy Joint Genome Institute MycoCosm genome portal (http://jgi.doe.gov/-fungi). Third, predicted coding regions in the genome sequence of C. incongruus CDC-B7586 (Chibucos et al., 2016) were searched for the presence of the S8 subtilisin-like protease domain (PF00082) as described above.

Sequences lacking a complete Asp-His-Ser catalytic triad (DHS triad), which is characteristic of S8 family proteases, were regarded as potential pseudogenes and excluded from further analysis. A preliminary protein alignment made with ClustalW and construction of a Neighbour-Joining tree revealed a highly divergent group of P. formicae SLSP sequences that had significant homology with insect proteases (blastp, e-value < 1e-6, NCBI nr protein database, accessed June 2017).
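The Pfam screen described at the start of this subsection amounts to keeping every sequence with a sufficiently strong PF00082 hit. A minimal Python sketch of such a filter is given below. It assumes that hmmscan was run with the `--domtblout` option and that the threshold is applied to the full-sequence E-value column; neither detail is stated in the text. The function name and the example rows are hypothetical and are not taken from the original analysis.

```python
def sequences_with_domain(domtblout_lines, accession="PF00082", max_evalue=1e-10):
    """Return IDs of query sequences with a hit to `accession` below `max_evalue`.

    Expects lines from an hmmscan --domtblout table, where column 2 is the
    Pfam accession (with version suffix), column 4 the query sequence name
    and column 7 the full-sequence E-value.
    """
    keep = set()
    for line in domtblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target_acc = fields[1].split(".")[0]  # drop the version suffix
        query_name = fields[3]
        full_evalue = float(fields[6])
        if target_acc == accession and full_evalue < max_evalue:
            keep.add(query_name)
    return keep


# Hypothetical, shortened rows for illustration only (real tables have more columns).
example = [
    "# target name  accession  tlen  query name  accession  qlen  E-value",
    "Peptidase_S8  PF00082.21  298  seq_001  -  412  3.1e-45",
    "Peptidase_S8  PF00082.21  298  seq_002  -  150  2.0e-04",
    "Inhibitor_I9  PF05922.15  82   seq_001  -  412  1.5e-12",
]
hits = sequences_with_domain(example)
print(sorted(hits))
```

In this toy input only `seq_001` passes, because the PF00082 hit of `seq_002` has an E-value above the 1e-10 cut-off.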
These putative insect sequences likely originate from the ant host, Formica polyctena, and represent host contamination that was not filtered out from the dual RNA-seq reads used to construct the P. formicae transcriptome (Małagocka et al., 2015). These divergent sequences were therefore excluded from further analysis.

Protein domain architecture and homology modelling
The domain architectures of all putative SLSPs identified within Entomophthoromycotina were predicted using Pfam domain annotation. The presence of putative signal peptides for extracellular secretion was predicted using SignalP (Petersen et al., 2011). Homology-based protein models of the 3D structure were constructed with SWISS-MODEL (Biasini et al., 2014) and visualized using PyMOL (The PyMOL Molecular Graphics System, Version 1.8, Schrödinger, LLC).

Protease sequence clustering
A Markov Clustering Algorithm (MCL) was used to identify clusters of similar proteins among putative SLSPs identified within Entomophthorales. Clustering using MCL is based on a graph constructed by an all-vs-all BLAST of SLSPs (BLASTP, e-value < 1e-10). The Tribe-MCL protocol (Enright et al., 2002) as implemented in the Spectral Clustering of Protein Sequences (SCPS) program (Nepusz et al., 2010) was used with inflation = 2.0. The inflation parameter is typically set between 1.2 and 5.0 (Nepusz et al., 2010) and controls the “tightness” of the sequence matrix, with lower values leading to fewer clusters and higher values to more sequence clusters. To putatively assign protease function to the newly identified Entomophthorales sequence clusters, the Tribe-MCL protocol was used to identify clusters of similar proteins between the putative SLSPs identified within Entomophthorales and 20,806 protease sequences belonging to the peptidase subfamily S8A obtained from the MEROPS database, accessed November 2017 (Rawlings et al., 2016). Investigation of the MEROPS protease sequences that clustered together with the identified Entomophthorales sequence clusters allowed putative protease holotype information to be assigned to the identified clusters.

Phylogenetic analysis
All identified putative Entomophthorales SLSP coding nucleotide sequences were aligned in frame to preserve codon structure using MAFFT (Larkin et al., 2007). Unreliable codon columns with a Guidance2 score below 0.90 in the multiple sequence alignment were removed (Penn et al., 2010).
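The column filtering in the preceding step can be pictured as follows. This is a minimal Python sketch assuming that one confidence score per codon column has already been obtained (for example from Guidance2 output); the function name, the toy alignment and the scores are hypothetical.

```python
def filter_codon_columns(aligned_seqs, column_scores, min_score=0.90):
    """Keep only codon columns whose confidence score is >= min_score.

    aligned_seqs:  dict of name -> in-frame aligned nucleotide sequence
    column_scores: one score per codon column (alignment length / 3 values)
    """
    n_codons = len(column_scores)
    for name, seq in aligned_seqs.items():
        if len(seq) != 3 * n_codons:
            raise ValueError(
                f"{name}: length {len(seq)} does not match {n_codons} codon columns"
            )
    # Indices of codon columns that pass the threshold.
    keep = [i for i, score in enumerate(column_scores) if score >= min_score]
    return {
        name: "".join(seq[3 * i:3 * i + 3] for i in keep)
        for name, seq in aligned_seqs.items()
    }


# Toy alignment of three codon columns; the middle column scores below 0.90.
alignment = {"seqA": "ATGGCTTCA", "seqB": "ATG---TCG"}
scores = [0.98, 0.42, 0.95]
filtered = filter_codon_columns(alignment, scores)
print(filtered)
```

Removing whole codon columns rather than single nucleotide columns keeps the reading frame intact, which matters for the codon-based selection tests.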
The best model for phylogenetic analysis was selected by running PhyML with GTR as substitution model, with or without a Gamma parameter and a proportion of invariable sites (Guindon and Gascuel, 2003). The optimal substitution model based on the Bayesian Information Criterion (BIC) score (GTR+G) was used in maximum likelihood analysis using RAxML with 10,000 bootstrap runs (Stamatakis, 2014).

To identify branches that potentially contain signatures of positive selection among SLSP sequences, we used maximum likelihood estimates of the dN/dS ratio (ω) for each site (codon) along protein sequences. A specific lineage (branch) was tested independently for positive selection (ω > 1) on individual sites by applying a neutral model that allows ω to vary between 0 and 1 and a selection model that also incorporates sites with ω > 1, using the software codeml implemented in PAML 4.4 (Yang, 2007). Statistical significance was determined with a likelihood ratio test of these two models for the tested lineage.

3. Results
We identified 154 SLSP sequences from six fungi in the subphylum Entomophthoromycotina: E. muscae (n = 22), P. formicae (n = 6), B. meristosporus (n = 60), C. thromboides (n = 18), C. coronatus (n = 36), and C. incongruus (n = 12). Close inspection of the active site residues revealed two C. incongruus sequences (Ci7229 and Ci12055) that contained the active site DHS residues in the motifs Asp-Asp-Gly, His-Gly-Thr-Arg, and Gly-Thr-Ser-Ala/Val-Ala/Ser-Pro characteristic of the S8B subfamily of S8 proteases. These two sequences also contained the P domain (PF01483), indicating that they are S8B kexin proteases. All other 152 identified Entomophthoromycotina S8 protease sequences contained active site residues closely resembling the motifs Asp-Thr/Ser-Gly, His-Gly-Thr-His, and Gly-Thr-Ser-Met-Ala-Xaa-Pro characteristic of the S8A subfamily. Cluster analysis of these 152 S8A protease sequences identified three groups of proteins that were designated as group A, B and C, with 130, 11 and 11 sequences in each cluster, respectively (Fig. 1). These groups did not change when the Tribe-MCL inflation parameter was varied within a range of 1.5 – 6.0, suggesting that these three groups are clearly distinct. Phylogenetic analysis of the identified 152 S8A SLSPs using maximum likelihood methods (Stamatakis, 2014) also recovered the same three distinct lineages with strong bootstrap support (Fig. 1). Evidence of positive selection acting on specific enzyme residues on the branches leading to these clusters was not detected with branch-site tests (2ΔlnL > 1.53, P > 0.217).
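For readers who want to see how a likelihood ratio statistic such as the 2ΔlnL value reported above maps onto a P-value, the following Python sketch may help. It assumes that the statistic is compared with a chi-square distribution with one degree of freedom, which is a common convention for branch-site tests but is not stated in the text. The function name and the log-likelihood values are hypothetical.

```python
import math


def lrt_pvalue_df1(lnl_null, lnl_alt):
    """Likelihood ratio test P-value assuming a chi-square null with 1 d.f.

    lnl_null: log-likelihood of the neutral (null) model
    lnl_alt:  log-likelihood of the selection (alternative) model
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    stat = max(stat, 0.0)  # guard against small negative values from optimisation noise
    # For 1 d.f. the chi-square survival function equals erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods chosen so that 2*delta(lnL) = 1.53.
stat, p = lrt_pvalue_df1(lnl_null=-10000.765, lnl_alt=-10000.0)
print(f"2*dlnL = {stat:.2f}, P = {p:.3f}")
```

Under this assumption a statistic of 1.53 corresponds to a P-value of roughly 0.22, far from conventional significance thresholds.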

To further characterize the three SLSP groups, the protein domain architecture of each of the 152 protease sequences was analyzed. A proteinase-associated (PA, Pfam: PF02225) domain was found only in group B, strongly suggesting that this cluster with 11 members is comprised of pyrolysin and osf proteases (Muszewska et al., 2011). An additional Tribe-MCL cluster analysis (inflation = 1.2) of the 152 Entomophthoromycotina sequences and all S8A proteases in the MEROPS database clustered these 11 Entomophthoromycotina sequences into a group of 779 MEROPS proteases. This group of proteases contained 42 members of the fungal S08.139 (PoSl-(Pleurotus ostreatus)-type peptidase) holotype (Supplementary data). Group A contained 130 Entomophthoromycotina SLSPs with a protein domain architecture comprising a secretory signal and a peptidase inhibitor (Inhibitor_I9, Pfam: PF05922) domain. None of the 11 members of group C contained a peptidase inhibitor domain (Fig. 1). The 130 Entomophthoromycotina group A SLSPs appear to belong to the common Proteinase K group of S8A proteases (Muszewska et al., 2011) based on conservation of the active site residues (Fig. 2) and cluster membership in the MEROPS Tribe-MCL analysis. The 130 Entomophthoromycotina SLSPs clustered with 3,046 MEROPS proteins of well-known fungal entomopathogenic protease holotypes (Supplementary data), including cuticle-degrading peptidase of nematode-trapping fungi (S08.120), cuticle-degrading peptidase of insect-pathogenic fungi in the genus Metarhizium (S08.056), and subtilisin-like peptidase 3 (-type; S08.115).

The MEROPS Tribe-MCL analysis identified a third group (C) of 11 Entomophthoromycotina SLSPs, which were similar to 402 MEROPS proteins. Of these members, 386 were classified as unassigned subfamily S8A peptidases (S08.UPA) and the remaining 15 were assigned to the bacillopeptidase F holotype (S08.017; Supplementary data). All 402 MEROPS proteases in this group originated from either Bacteria or Oomycetes, except for two proteases from the fungi Rozella allomycis (Cryptomycota) and Mitosporidium daphniae (Microsporidia), respectively (Fig. 3, Supplementary data). The group C Entomophthorales SLSPs clustered as sister to these two Cryptomycota and Microsporidia proteins with strong support (ML bootstrap value = 99) in the phylogenetic analysis of the protein sequences (Fig. 3), in concordance with this group of SLSPs being an outlier from all other previously known fungal S8A SLSPs.

4. Discussion
Subtilisin-like serine proteases (SLSPs) have many roles in fungal biology and are known to be involved in host–pathogen interactions.
Independent expansion of copy number and diversification of SLSPs is widespread among animal pathogenic (Ascomycota and Basidiomycota) (Li et al., 2017). The repeated expansion of SLSPs among the generalist insect-pathogenic hypocrealean fungi has been interpreted as an adaptation to enable infection of insect hosts (Muszewska et al., 2011), whereas comparatively little is known about the evolution and diversification of SLSPs among the early diverging fungal clades. To understand the evolution of SLSPs among the vertebrate and arthropod pathogenic fungi in the subphylum Entomophthoromycotina, we searched available genomic and transcriptomic sequence data to identify all entomophthoralean genes with SLSP domains. We found 154 entomophthoromycotan SLSPs, of which two copies were classified as S8B kexin SLSPs.


The remaining 152 S8A SLSPs were clustered by sequence similarity and compared by phylogenetic analysis, showing that the majority of the SLSPs (n = 130) are similar to and cluster together with “classical” proteinase K-like fungal S8A SLSPs (Figure 1).

We did identify 11 SLSPs that cluster together with 402 un-annotated or bacillopeptidase F-like SLSPs from Bacteria and Oomycetes (Figure 3). These observations remained consistent even when exploring variation in the inflation parameter, which controls the “tightness” in the cluster analysis. The entomophthoralean and Oomycete S8A SLSPs form separate clades within this cluster of primarily bacterial proteases, indicating that the Entomophthorales and Oomycete SLSPs evolved independently (Figure 3). Apart from the entomophthoralean sequences, only two fungal protease sequences were found within this group, from the Cryptomycota R. allomycis and the microsporidium M. daphniae. A statistical test for a significant expansion of SLSP copy number among the insect-pathogenic Entomophthorales was not explicitly performed in the present analysis due to uncertainty about total gene numbers in the transcriptomic data sets of the specialist insect-pathogens E. muscae and P. formicae. In the sampled transcriptomes, the number of transcripts is likely larger than the genome gene count because splice variants, post-transcriptional modifications, and allelic variants assemble into multiple transcripts per gene. In addition, the assembled transcripts only reflect actively transcribed genes expressed in the sampled conditions and time points, and may underestimate the actual number of genes. These confounding factors impact the estimated number of genes and make quantitative comparative analyses of gene family size between transcriptomes unreliable.
However, together with the two SLSPs from Cryptomycota and Microsporidia, the 11 entomophthoralean proteases are a unique group of proteases exclusive to some of the early diverging fungal lineages.

Functional annotation indicates apparent protease activity based on sequence similarity, but the function of the 11 novel SLSPs in group C is unknown. Eight of these SLSPs possess a signal peptide, which suggests external secretion and thus a function in the immediate environment, whereas the remaining three might not be secreted or might be incomplete sequence models. Apart from the canonical protease S8 domain (PF00082), no other Pfam domains were found among the group C SLSPs. Searches against InterPro databases similarly did not reveal any other protein domains apart from the protease S8 SLSP domain (PRINTS: subtilisin serine protease family (S8) signature (PR00723), InterPro: peptidase S8, subtilisin-related (IPR015500), and ProSitePatterns: serine proteases, subtilase family (PS00138)).


Notably, none of these SLSPs contain the proteinase inhibitor I9 domain (PF05922) often found among the classical proteinase K-like SLSPs (Muszewska et al., 2011). However, this group C of SLSPs may have evolved a different function than the “classical” fungal SLSPs in group A, as evidenced by extensive diversification of the amino acids immediately surrounding the active site residues in the DHS triad (Figure 2). Out of the five Entomophthorales species analysed here, only three (C. incongruus, E. muscae and P. formicae) contained members of the new group C SLSPs (Figure 1). Since these genes are missing in the genomes of C. coronatus and C. thromboides, the absence of particular SLSPs is unlikely to be due to sequencing or sampling artifacts. The unequal phylogenetic presence of the group C SLSPs could be indicative of specific functions related to niche adaptation. The two insect-pathogenic fungi specialized on house flies (E. muscae) and wood ants (P. formicae) contain five and two of the novel group C SLSPs, respectively (Figure 1), while no group C S8A proteases were found in Basidiomycota or Ascomycota (Figure 3). However, the insect-pathogenic hypocrealean and nematode-trapping fungi within Ascomycota contain their own unique SLSPs that are missing in Entomophthoromycotina (Figure 4). The soil saprobe and opportunistic human pathogen C. incongruus also contains four group C SLSPs, suggesting that these SLSPs are not exclusively related to host-specific evolution of the specialist insect-pathogenic entomophthoralean fungi.

Further studies including genomic comparisons of the host-specific insect-pathogenic Entomophthorales will likely shed interesting new light on the gene content of these early diverging fungi (De Fine Licht et al., 2016).
The presence of unusual genome organization, polyploidy and large genomes in many host-specific insect-pathogenic species within Entomophthorales has previously been a hindrance to genome sequencing (Gryganskyi and Muszewska, 2014). However, the present analysis exemplifies the many new proteins and enzymes that may be discovered as genomes become available within Entomophthorales.

Acknowledgements
The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Work by JES and AG was partially supported by funding from the National Science Foundation (DEB-1441715) to JES. HHDFL was supported by the Villum Foundation (grant no. 10122).

References

Bagga, S., Hu, G., Screen, S.E., St. Leger, R.J., 2004. Reconstructing the diversification of subtilisins in the pathogenic fungus Metarhizium anisopliae. Gene 324, 159–169. doi:10.1016/j.gene.2003.09.031
Biasini, M., Bienert, S., Waterhouse, A., Arnold, K., Studer, G., Schmidt, T., Kiefer, F., Cassarino, T.G., Bertoni, M., Bordoli, L., Schwede, T., 2014. SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 42. doi:10.1093/nar/gku340
Burmester, A., Shelest, E., Glöckner, G., Heddergott, C., Schindler, S., Staib, P., Heidel, A., Felder, M., Petzold, A., Szafranski, K., Feuermann, M., Pedruzzi, I., Priebe, S., Groth, M., Winkler, R., Li, W., Kniemeyer, O., Schroeckh, V., Hertweck, C., Hube, B., White, T.C., Platzer, M., Guthke, R., Heitman, J., Wöstemeyer, J., Zipfel, P.F., Monod, M., Brakhage, A.A., 2011. Comparative and functional genomics provide insights into the pathogenicity of dermatophytic fungi. Genome Biol. 12, R7. doi:10.1186/gb-2011-12-1-r7
Chang, Y., Wang, S., Sekimoto, S., Aerts, A., Choi, C., Clum, A., LaButti, K., Lindquist, E., Ngan, C.Y., Ohm, R.A., Salamov, A., Grigoriev, I.V., Spatafora, J.W., Berbee, M., 2015. Phylogenomic analyses indicate that early fungi evolved digesting cell walls of algal ancestors of land plants. Genome Biol. Evol. 7, 1590–1601. doi:10.1093/gbe/evv090
Charnley, A.K., 2003. Fungal pathogens of insects: Cuticle degrading enzymes and toxins. Adv. Bot. Res. 40, 241–321.
doi:10.1016/S0065-2296(05)40006-3
Chibucos, M.C., Soliman, S., Gebremariam, T., Lee, H., Daugherty, S., Orvis, J., Shetty, A.C., Crabtree, J., Hazen, T.H., Etienne, K.A., Kumari, P., O’Connor, T.D., Rasko, D.A., Filler, S.G., Fraser, C.M., Lockhart, S.R., Skory, C.D., Ibrahim, A.S., Bruno, V.M., 2016. An integrated genomic and transcriptomic survey of mucormycosis-causing fungi. Nat. Commun. 7, 12218. doi:10.1038/ncomms12218
De Fine Licht, H.H., Hajek, A.E., Eilenberg, J., Jensen, A.B., 2016. Utilizing genomics to study entomopathogenicity in the fungal phylum Entomophthoromycota: A review of current genetic resources. Adv. Genet. 94, 41–65. doi:10.1016/bs.adgen.2016.01.003
De Fine Licht, H.H., Jensen, A.B., Eilenberg, J., 2017. Comparative transcriptomics reveal host-specific nucleotide variation in entomophthoralean fungi. Mol. Ecol. 26, 2092–2110. doi:10.1111/mec.13863
Desjardins, C.A., Champion, M.D., Holder, J.W., Muszewska, A., Goldberg, J., Bailão, A.M., Brigido, M.M., Ferreira, M.E. da S., Garcia, A.M., Grynberg, M., Gujja, S., Heiman, D.I., Henn, M.R., Kodira, C.D., León-Narváez, H., Longo, L.V.G., Ma, L.-J., Malavazi, I., Matsuo, A.L., Morais, F.V., Pereira, M., Rodríguez-Brito, S., Sakthikumar, S., Salem-Izacc, S.M., Sykes, S.M., Teixeira, M.M., Vallejo, M.C., Walter, M.E.M.T., Yandava, C., Young, S., Zeng, Q., Zucker, J., Felipe, M.S., Goldman, G.H., Haas, B.J., McEwen, J.G., Nino-Vega, G., Puccia, R., San-Blas, G., Soares, C.M. de A., Birren, B.W., Cuomo, C.A., 2011. Comparative genomic analysis of human fungal pathogens causing paracoccidioidomycosis. PLoS Genet. 7, e1002345. doi:10.1371/journal.pgen.1002345

11 bioRxiv preprint doi: https://doi.org/10.1101/247858; this version posted January 15, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

381 Eddy, S., 1998. Profile hidden Markov models. Bioinformatics 14, 755–763. doi:btb114 [pii] 382 Enright, A.J., Van Dongen, S., Ouzounis, C.A., 2002. An efficient algorithm for large-scale 383 detection of protein families. Nucleic Acids Res. 30, 1575–1584. doi:doi: 384 10.1093/nar/30.7.1575 385 Finn, R.D., Coggill, P., Eberhardt, R.Y., Eddy, S.R., Mistry, J., Mitchell, A.L., Potter, S.C., 386 Punta, M., Qureshi, M., Sangrador-Vegas, A., Salazar, G.A., Tate, J., Bateman, A., 387 2016. The Pfam protein families database: towards a more sustainable future. Nucleic 388 Acids Res. 44, D279–D285. doi:10.1093/nar/gkv1344 389 Gryganskyi, A.P., Muszewska, A., 2014. Whole Genome Sequencing and the Zygomycota. 390 Fungal Genomics Biol. 4, 10–12. doi:10.4172/2165-8056.1000e116 391 Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate large 392 phylogenies by maximum likelihood. Syst. Biol. 52, 696–704. 393 doi:10.1080/10635150390235520 394 Hu, X., Xiao, G., Zheng, P., Shang, Y., Su, Y., Zhang, X., Liu, X., Zhan, S., St. Leger, R.J., 395 Wang, C., 2014. Trajectory and genomic determinants of fungal-pathogen speciation 396 and host adaptation. Proc. Natl. Acad. Sci. 111, 16796–16801. 397 doi:10.1073/pnas.1412662111 398 Jalving, R., Van De Vondervoort, P.J.I., Visser, J., Schaap, P.J., 2000. Characterization of the 399 kexin-like maturase of niger. Appl. Environ. Microbiol. 66, 363–368. 400 doi:10.1128/AEM.66.1.363-368.2000.Updated 401 Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., 402 Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., Thompson, J.D., Gibson, T.J., 403 Higgins, D.G., 2007. Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947– 404 2948. doi:10.1093/bioinformatics/btm404 405 Li, J., Gu, F., Wu, R., Yang, J.K., Zhang, K.Q., 2017. Phylogenomic evolutionary surveys of 406 subtilase superfamily genes in fungi. Sci. Rep. 7, 1–15. 
doi:10.1038/srep45456 407 Małagocka, J., Grell, M.N., Lange, L., Eilenberg, J., Jensen, A.B., 2015. Transcriptome of an 408 entomophthoralean fungus (Pandora formicae) shows molecular machinery adjusted for 409 successful host exploitation and transmission. J. Invertebr. Pathol. 128, 47–56. 410 doi:10.1016/j.jip.2015.05.001 411 Martinez, D.A., Oliver, B.G., Gräser, Y., Goldberg, J.M., Li, W., Martinez-Rossi, N.M., 412 Monod, M., Shelest, E., Barton, R.C., Birch, E., Brakhage, A.A., Chen, Z., Gurr, S.J., 413 Heiman, D., Heitman, J., Kosti, I., Rossi, A., Saif, S., Samalova, M., Saunders, C.W., 414 Shea, T., Summerbell, R.C., Xu, J., Young, S., Zeng, Q., Birren, B.W., Cuomo, C.A., 415 White, T.C., 2012. Comparative genome analysis of and related 416 reveals candidate genes involved in infection. MBio 3, e00259-12. 417 doi:10.1128/mBio.00259-12 418 Mondo, S.J., Dannebaum, R.O., Kuo, R.C., Louie, K.B., Bewick, A.J., LaButti, K., Haridas, 419 S., Kuo, A., Salamov, A., Ahrendt, S.R., Lau, R., Bowen, B.P., Lipzen, A., Sullivan, W., 420 Andreopoulos, B.B., Clum, A., Lindquist, E., Daum, C., Northen, T.R., Kunde- 421 Ramamoorthy, G., Schmitz, R.J., Gryganskyi, A., Culley, D., Magnuson, J., James, 422 T.Y., O’Malley, M.A., Stajich, J.E., Spatafora, J.W., Visel, A., Grigoriev, I. V, 2017. 423 Widespread adenine N6-methylation of active genes in fungi. Nat. Genet. 49, 964–968. 424 doi:10.1038/ng.3859 425 Muszewska, A., Stepniewska-Dziubinska, M.M., Steczkiewicz, K., Pawlowska, J., Dziedzic, 426 A., Ginalski, K., 2017. Fungal lifestyle reflected in serine protease repertoire. Sci. Rep. 427 7, 9147. doi:10.1038/s41598-017-09644-w 428 Muszewska, A., Taylor, J.W., Szczesny, P., Grynberg, M., 2011. Independent subtilases 429 expansions in fungi associated with animals. Mol. Biol. Evol. 28, 3395–3404. 430 doi:10.1093/molbev/msr176

12 bioRxiv preprint doi: https://doi.org/10.1101/247858; this version posted January 15, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

431 Nepusz, T., Sasidharan, R., Paccanaro, A., 2010. SCPS: a fast implementation of a spectral 432 method for detecting protein families on a genome-wide scale. BMC Bioinformatics 11, 433 120. doi:10.1186/1471-2105-11-120 434 Penn, O., Privman, E., Ashkenazy, H., Landan, G., Graur, D., Pupko, T., 2010. GUIDANCE: 435 A web server for assessing alignment confidence scores. Nucleic Acids Res. 38. 436 doi:10.1093/nar/gkq443 437 Petersen, T.N., Brunak, S., von Heijne, G., Nielsen, H., 2011. SignalP 4.0: discriminating 438 signal peptides from transmembrane regions. Nat. Methods 8, 785–786. 439 doi:10.1038/nmeth.1701 440 Rawlings, N.D., Barrett, A.J., Finn, R., 2016. Twenty years of the MEROPS database of 441 proteolytic enzymes, their substrates and inhibitors. Nucleic Acids Res. 44, D343–D350. 442 doi:10.1093/nar/gkv1118 443 Sharpton, T.J., Stajich, J.E., Rounsley, S.D., Gardner, M.J., Wortman, J.R., Jordar, V.S., 444 Maiti, R., Kodira, C.D., Neafsey, D.E., Zeng, Q., Hung, C.-Y., McMahan, C., 445 Muszewska, A., Grynberg, M., Mandel, M.A., Kellner, E.M., Barker, B.M., Galgiani, 446 J.N., Orbach, M.J., Kirkland, T.N., Cole, G.T., Henn, M.R., Birren, B.W., Taylor, J.W., 447 2009. Comparative genomic analyses of the human fungal pathogens and 448 their relatives. Genome Res. 19, 1722–1731. doi:10.1101/gr.087551.108. 449 Spatafora, J.W., Chang, Y., Benny, G.L., Lazarus, K., Smith, M.E., Berbee, M.L., Bonito, G., 450 Corradi, N., Grigoriev, I., Gryganskyi, A., James, T.Y., O’Donnell, K., Roberson, R.W., 451 Taylor, T.N., Uehling, J., Vilgalys, R., White, M.M., Stajich, J.E., 2016. A phylum-level 452 phylogenetic classification of zygomycete fungi based on genome-scale data. Mycologia 453 108, 1028–1046. doi:10.3852/16-042 454 St. Leger, R.J., Charnley, a. K., Cooper, R.M., 1986a. Cuticle-degrading enzymes of 455 entomopathogenic fungi: Synthesis in culture on cuticle. J. Invertebr. Pathol. 48, 85–95. 456 doi:10.1016/0022-2011(86)90146-1 457 St. 
Leger, R.J., Cooper, R.M., Charnley, A.K., 1986b. Cuticle-degrading enzymes of 458 entomopathogenic fungi: Cuticle degradation in vitro by enzymes from 459 entomopathogens. J. Invertebr. Pathol. 177, 167–177. doi:10.1016/0022-2011(86)90043- 460 1 461 Stamatakis, A., 2014. RAxML version 8: A tool for phylogenetic analysis and post-analysis 462 of large phylogenies. Bioinformatics 30, 1312–1313. doi:10.1093/bioinformatics/btu033 463 Vega, F.E., Meyling, N. V., Luangsa-Ard, J.J., Blackwell, M., 2012. Fungal 464 entomopathogens, in: Vega, F.E., Kaya, H. (Eds.), Insect Pathology. Elsevier Inc., pp. 465 171–220. doi:10.1016/B978-0-12-384984-7.00006-3 466 Whiston, E., Taylor, J.W., 2016. Comparative phylogenomics of pathogenic and 467 nonpathogenic species. G3 Genes|Genomes|Genetics 6, 235–244. 468 doi:10.1534/g3.115.022806 469 Xiao, G., Ying, S.-H., Zheng, P., Wang, Z.-L., Zhang, S., Xie, X.-Q., Shang, Y., St. Leger, 470 R.J., Zhao, G.-P., Wang, C., Feng, M.-G., 2012. Genomic perspectives on the evolution 471 of fungal entomopathogenicity in Beauveria bassiana. Sci. Rep. 2. 472 doi:10.1038/srep00483 473 Yang, Z., 2007. PAML 4: Phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 474 24, 1586–1591. doi:10.1093/molbev/msm088 475 476


Figure 1. Maximum likelihood phylogeny calculated with RAxML, based on a 2379 bp alignment of 152 subtilisin-like serine protease codon nucleotide sequences from Entomophthoromycotina that contain the peptidase S8/S53-subtilisin (PF00082) domain. Branches are coloured by species: Entomophthora muscae (blue), Pandora formicae (purple), Conidiobolus coronatus (pink), C. thromboides (brown), C. incongruus (orange), and B. meristosporus (green). For each SLSP, the accession number and any protein domains additional to PF00082 are shown. The three identified clusters, the Protease K cluster (A), the Pyrolysin/osf protease cluster (B), and the new bacillopeptidase-like Entomophthorales cluster (C), are marked in the grey circle surrounding the tree, with a grey background for clusters B and C.


Figure 2. Active site and domain co-occurrence variability of the three Tribe-MCL clusters identified among 152 Entomophthoromycotina subtilisin-like serine proteases. The columns DTG, GHGTH, and SGTS represent the amino acid sequence closest to each residue of the DHS catalytic triad. A. Amino acid alignment of the active site residues for the three identified groups (A-C) of SLSPs within Entomophthoromycotina. Accession codes are colour coded: orange, C. incongruus; blue, E. muscae; purple, P. formicae. B. Sequence motifs of the active site residues for each group.
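As a rough illustration of the three motif columns named in the Figure 2 caption, the sketch below looks for exact, in-order matches of DTG, GHGTH, and SGTS in a protein sequence. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function name `find_triad_motifs` and the toy sequence are hypothetical, and exact string matching is a simplification, since the figure itself shows that real sequences vary around these motifs.

```python
# Consensus motifs named in the Figure 2 caption, listed in D-H-S order.
MOTIFS = (("D", "DTG"), ("H", "GHGTH"), ("S", "SGTS"))


def find_triad_motifs(seq):
    """Return {triad residue: motif start index} for in-order exact matches.

    Returns None if any motif is missing or the motifs are out of order.
    Hypothetical helper for illustration only.
    """
    positions = {}
    cursor = 0  # each motif must start after the previous one ends
    for residue, motif in MOTIFS:
        hit = seq.find(motif, cursor)
        if hit == -1:
            return None
        positions[residue] = hit
        cursor = hit + len(motif)
    return positions


# Toy sequence invented for this example (not a real SLSP).
toy = "MKAADTGIDAAAGHGTHVAGAAASGTSMA"
print(find_triad_motifs(toy))
```

Sequences that return None under this strict search are not necessarily inactive; a tolerant search (for example, position-specific scoring) would be needed to capture the variability the figure describes.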


Figure 3. Mid-point rooted maximum likelihood phylogeny calculated with RAxML, based on a 479 amino acid alignment of 413 subtilisin-like serine protease protein sequences that belonged to group C in the Tribe-MCL analysis (see text for details). Bootstrap values >50 from 1000 iterations are shown.


Figure 4. Venn diagram showing the taxonomic distribution of subtilisin-like serine protease clusters among major insect- and nematode-pathogenic fungal genera. Each two- or three-digit number corresponds to the number of S8A proteases in a specific cluster; in two cases a region contains two clusters (314 and 93, and 11 and 38). The asterisk (*) marks the 11 members of the new bacillopeptidase-like cluster C found within the order Entomophthorales (Entomophthoromycotina). Entomophthoromycotina encompasses SLSPs found in the genera Basidiobolus, Conidiobolus, Entomophthora, and Pandora; Hypocrealean consists of SLSPs from the genera Cordyceps, Metarhizium, and Ophiocordyceps; and Nematode trapping fungi comprises SLSPs found in the genera Arthrobotrys and Monacrosporium.
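The region assignment behind a Venn diagram like Figure 4 can be sketched as follows: each cluster is placed in the region defined by the exact set of taxonomic groups that contribute members to it. This is a minimal sketch, not the authors' pipeline. The function name `venn_regions`, the input layout, and all toy counts are assumptions made for illustration; only the three group names come from the caption.

```python
from collections import defaultdict


def venn_regions(rows):
    """Group clusters by the exact set of taxonomic groups they occur in.

    rows: iterable of (cluster_id, group_name, n_proteases) tuples.
    Returns {frozenset of groups: [(cluster_id, total_size), ...]}.
    Hypothetical helper for illustration only.
    """
    groups_of = defaultdict(set)  # cluster -> groups contributing members
    size_of = defaultdict(int)    # cluster -> total number of proteases
    for cluster_id, group, n in rows:
        groups_of[cluster_id].add(group)
        size_of[cluster_id] += n

    regions = defaultdict(list)
    for cluster_id in sorted(groups_of):
        region = frozenset(groups_of[cluster_id])
        regions[region].append((cluster_id, size_of[cluster_id]))
    return dict(regions)


# Toy input invented for this example; these are not counts from the study.
toy_rows = [
    ("c1", "Entomophthoromycotina", 5),
    ("c1", "Hypocrealean", 7),
    ("c2", "Entomophthoromycotina", 3),
    ("c3", "Nematode trapping fungi", 4),
    ("c4", "Entomophthoromycotina", 2),
]
regions = venn_regions(toy_rows)
for region, clusters in regions.items():
    print(sorted(region), clusters)
```

A region holding more than one cluster, such as the toy Entomophthoromycotina-only region here, corresponds to the caption's note that two regions of the diagram each contain two clusters.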
