bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Early diverging fungus Mucor circinelloides lacks centromeric histone CENP-A and

2 displays a mosaic of point and regional centromeres

3

4 María Isabel Navarro-Mendoza1#, Carlos Pérez-Arques1#, Shweta Panchal2, Francisco E.

5 Nicolás1, Stephen J. Mondo3,4, Promit Ganguly2, Jasmyn Pangilinan3, Igor V. Grigoriev3,5,

6 Joseph Heitman6*, Kaustuv Sanyal2*, and Victoriano Garre1*

7 1Department of Genetics and Microbiology, Faculty of Biology, University of Murcia, Spain;

8 2Molecular Biology and Genetics Unit, Jawaharlal Nehru Centre for Advanced Scientific

9 Research, Jakkur, Bangalore, India; 3US Department of Energy Joint Genome Institute, Walnut

10 Creek, California, USA; 4Bioagricultural Science and Pest Management Department, Colorado

11 State University, Fort Collins, Colorado, 80521, USA; 5Department of Plant and Microbial

12 Biology, University of California Berkeley, Berkeley, CA 94598, USA; 6Department of

13 Molecular Genetics and Microbiology, Duke University Medical Center, NC, USA

14 #Equally contributed; *Corresponding author

15 Abstract

16 Centromeres are rapidly evolving across eukaryotes, despite performing a conserved

17 function to ensure high fidelity segregation. CENP-A chromatin is a hallmark of

18 a functional centromere in most organisms. Due to its critical role in architecture,

19 the loss of CENP-A is tolerated in only a few organisms, many of which possess holocentric

20 . Here, we characterize the consequence of the loss of CENP-A in the fungal

21 kingdom. Mucor circinelloides, an opportunistic human pathogen, lacks CENP-A along with

22 the evolutionarily conserved CENP-C, but assembles a monocentric chromosome with a

23 localized kinetochore complex throughout the cell cycle. Mis12 and Dsn1, two conserved

24 kinetochore were found to bind nine short overlapping regions, each comprising an

1

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

25 ~200-bp AT-rich sequence followed by a centromere-specific conserved motif that echoes the

26 structure of budding yeast point centromeres. Resembling fungal regional centromeres, these

27 core centromere regions are embedded in large genomic expanses devoid of yet marked

28 by Grem-LINE1s, a novel retrotransposable element silenced by the Dicer-dependent RNAi

29 pathway. Our results suggest that these hybrid features of point and regional centromeres arose

30 from the absence of CENP-A, thus defining novel mosaic centromeres in this early-diverging

31 fungus.

32 Introduction

33 Accurate chromosome segregation is crucial to maintain genome integrity during cell

34 division. The timely attachment of microtubules to centromere DNA is essential to achieve

35 proper chromosome segregation. This is accomplished by a specialized multilayered

36 complex, the kinetochore which links microtubules to centromere DNA. This protein bridge is

37 divided into two layers – the inner and outer kinetochore. The fundamental inner kinetochore

38 protein is the histone H3 variant CENP-A. It binds directly to centromere DNA and lays the

39 foundation to recruit other essential proteins of the kinetochore complex, playing a fundamental

40 role in centromere structure and function, and hence, precise chromosome segregation1,2.

41 CENP-A is also found at all identified neocentromeres3 and at the active centromeres of

42 dicentric chromosomes4, acting as an epigenetic determinant of centromeric identity.

43 Despite its conserved function, the centromere is one of the most rapidly evolving

44 regions of the genome5. This so-called “centromere paradox” has led to centromeres of diverse

45 sizes and content. The first centromeres identified in Saccharomyces cerevisiae were found to

46 be point centromeres - small regions of ~120 bp defined by specific DNA sequences6,7. In

47 contrast to point centromeres described in only a few budding yeasts of the phylum

48 Ascomycota, most other fungi and metazoans have regional centromeres that are larger,

49 ranging from a few kilobases to several megabases8. Regional centromeres are often

2

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

50 interspersed with repetitive sequences and are mostly defined by epigenetic factors rather than

51 DNA sequence per se9. While the centromeres described thus far are confined to one region of

52 the DNA, there are organisms in which their entire chromosomes are loaded with

53 and display a parallel separation of the two chromatids. Such chromosomes possess

54 holocentromeres and have been found in several clades of insects10 and nematodes like

55 Caenorhabditis elegans11. Despite this remarkable variation in the types of centromeres, most

56 organisms have their centromere loci defined by the binding of CENP-A. However, there are

57 a few exceptions, including some insect lineages and kinetoplastids, which have independently

58 lost CENP-A. The kinetoplastid Trypanosoma brucei has evolved a unique set of proteins to

59 perform the role of the kinetochore complex12, whereas all insects that have recurrently lost

60 CENP-A have transitioned from monocentricity to holocentricity10.

61 Neither holocentricity nor lack of CENP-A has ever been reported in the fungal

62 kingdom until a recent study showed that this protein could be absent in the Mucoromycotina

63 subphylum13, which belongs to the early diverging fungi. Kinetochore structure and centromere

64 function are well-studied in species of the Dikarya, whereas centromere identity as well as the

65 kinetochore architecture remain largely unexplored in basal fungi because these are challenging

66 to manipulate genetically. Mucor circinelloides, a fungus of the subphylum Mucoromycotina,

67 is an exception and molecular tools are available to modify its genome14. M. circinelloides, as

68 with other Mucorales, causes an infectious disease known as mucormycosis, which have been

69 associated with high mortality rates due to rhino-orbital-cerebral, pulmonary, or cutaneous

70 infections15. Despite its medical significance and research efforts to identify new antifungal

71 targets, the genome biology and cell cycle of this fungus remain poorly understood. Using

72 evolutionarily conserved kinetochore proteins as tools, we investigated the dynamics of the

73 kinetochore complex during nuclear division and unveiled the unique characteristics of the

74 centromeres of M. circinelloides. Studying distinct and divergent factors involved in the cell

3

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

75 cycle of this organism will open avenues for the development of species-specific antifungal

76 drugs.

77 Results

78 Last Mucoralean common ancestor lost CENP-A and CENP-C, and the descendant

79 Mucoromycotina lack centromere-specific histone variants

80 A recent study of kinetochore evolution in eukaryotes reported the striking absence of

81 two well conserved inner kinetochore proteins, CENP-A and CENP-C, in two species of the

82 Mucoromycotina13. To ascertain the presence or loss of well-studied kinetochore proteins

83 across the entire subphylum, we analyzed the curated proteomes of 75 fungal species, including

84 55 species of Mucoromycotina and 20 species from other fungal clades (Supplementary Table

85 1). The kinetochore protein sequences from Mus musculus, Ustilago maydis, S. cerevisiae, and

86 Schizosaccharomyces pombe served as queries to conduct iterative Blast and Hmmer searches

87 against these proteomes. These sequences were confirmed as putative orthologs by the presence

88 of conserved Pfam domains or by obtaining a matching first hit in a reciprocal Blastp search

89 against the initial four species: mouse, U. maydis, budding, and fission yeasts (Supplementary

90 Table 1, Supplementary Data 1).

91 To confirm the presence or absence of centromere-specific histone H3 CENP-A in each

92 fungal species, 289 histone H3 orthologs were analyzed, identifying 27 putative CENP-A

93 proteins (Supplementary Fig. 1, Supplementary Table 1, Supplementary Data 1). These

94 proteins share common features with well-known CENP-A proteins16, such as an N-terminal

95 tail that varies significantly in length and sequence compared to the canonical H3 counterparts

96 (Supplementary Fig. 1b), the absence of a conserved glutamine residue in the first α-helix

97 (69Q) and a phenylalanine residue (84F) (Supplementary Fig. 1c), a longer and divergent first

98 loop (Supplementary Fig. 1c), and a non-conserved C-terminal tail (Supplementary Fig. 1d).

4

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

99 In addition, the histone fold domain (HFD) sequences of the putative CENP-A proteins have

100 diverged rapidly from the conserved canonical H3 sequences16 (Supplementary Fig. 1e). Nine

101 additional H3 orthologs also share several CENP-A common features but were regarded as rare

102 histones because they lack these conserved substitutions. The analysis revealed that CENP-A

103 was absent in most of the Mucoromycotina clade, specifically in the orders Umbelopsidales

104 and Mucorales; nevertheless, we identified putative CENP-A proteins in all Endogonalean

105 species (Fig. 1). Similarly, CENP-C is present in the Endogonales and Umbelopsidales but not

106 in the Mucorales (Fig. 1). These findings suggest that the last Mucoromycotina common

107 ancestor possessed CENP-A, which was lost after the Endogonales diverged from the rest of

108 the clade. In the absence of CENP-A, CENP-C was presumably lost soon after. In addition,

109 CENP-Q is absent in all Mucoralean species, and CENP-U is lost in the whole subphylum.

110 These proteins have been recurrently lost across the fungal kingdom, as evidenced by their

111 absence in species of all phyla of early-diverging fungi and most species in Basidiomycota

112 (Fig. 1). The absence of orthologs that were present in closely-related species should be taken

113 with caution because this could be a result of the incomplete state of their genome assemblies

114 and annotation.

115 Despite these relevant losses, the remaining kinetochore proteins were well conserved

116 among most species of Mucoromycotina, especially in M. circinelloides. Therefore, we

117 hypothesized that another histone protein could have taken the role of CENP-A as a part of

118 centromeric nucleosome for kinetochore assembly. We searched the M. circinelloides genome

119 sequence for histone protein-coding genes and examined their sequences seeking distinctive

120 features of histone variants. In contrast to canonical mammalian histones H3.1 and H3.2, fungal

121 cell cycle-dependent histones may have intronic sequences and polyadenylation [poly(A)]

122 signals17,18. Thus, the analysis of possible variants focused on the presence of substitutions in

123 relevant amino acid residues targeted by post-translational modifications19. The M.

5

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

124 circinelloides genome contains four histone H3- and three histone H4-coding genes, named

125 histone H three (hht1-4) and histone H four (hhf1-3), respectively (Supplementary Fig. 2a),

126 and all of them feature a poly(A) signal. Three of the four histone H3-coding genes are

127 intronless and their protein sequence identical, except the intron-containing hht4

128 (Supplementary Fig. 2a). Also, the amino acid residues at positions 32, 43, and 97 in the Hht4

129 protein sequence differ compared to the conserved H3 sequence (Supplementary Fig. 2b).

130 Similarly, histone H4-coding genes hhf2 and hhf3 lack introns and encode identical proteins,

131 whereas hhf1 has intronic sequences (Supplementary Fig. 2a) and its product has a longer N-

132 terminal tail (Supplementary Fig. 2c).

133 To test if the minor differences observed in Hht4 and Hhf1 could have resulted in a

134 centromere-specific binding protein, the cellular localization was analyzed and compared to

135 their canonical histone counterpart Hhf3. We constructed alleles that contained each histone-

136 coding gene fused in-frame at the C-terminus with the enhanced green fluorescent protein

137 (eGFP) flanked by each of the histone gene native promoter and terminator sequences, and a

138 selectable marker (Supplementary Fig. 3). Homokaryotic strains expressing fluorescent

139 histones Hht4, Hhf1, or Hhf3 were obtained by transforming a wild-type strain with their

140 respective eGFP-tagged alleles. Integration by homologous recombination at the corresponding

141 native loci was confirmed by PCR (Supplementary Fig. 3, Supplementary Tables 2,3).

142 Asexual spores from these strains were observed with a confocal microscope under two

143 stages of germination: ungerminated and 4-h pre-germinated spores, that had started swelling

144 and forming germ tubes. In both conditions, all fluorescent histone fusions were localized

145 encompassing the entire nucleus instead of forming kinetochore-like nuclear clusters

146 (Supplementary Fig. 2d). Most species of the Mucoromycotina are coenocytic fungi and have

147 multinucleated spores, as does the M. circinelloides wild-type strain used in this study which

148 has large spores with a high number of nuclei20. We established that nuclear division in M.

6

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

149 circinelloides is asynchronous by observing the nuclear distribution of Hht4-eGFP protein in

150 pre-germinated spores during a time-lapse imaging session (Supplementary video 1).

151 Altogether, these findings indicate that either none of these histone variants functions like

152 CENP-A in kinetochore assembly or Mucor centromeres are holocentric in nature. We could

153 test these two possibilities by studying other kinetochore proteins that are evolutionarily

154 conserved and easily identifiable in M. circinelloides.

155 Mis12 and CENP-T complexes show discrete nuclear localization

156 Analyzing the nuclear localization and assembly dynamics of the kinetochore complex

157 of M. circinelloides could offer insights into its architecture. In the absence of CENP-A or

158 another similar histone H3 variant, and CENP-C, that could anchor the kinetochore formation,

159 we focused our attention on the evolutionarily conserved and obvious homologs of inner

160 kinetochore proteins that are expected to be closer to the centromere DNA in this organism

161 (Fig. 2a). We designed fluorescent kinetochore fusion proteins of Mis12, Dsn1, and CENP-T

162 because these are well-conserved in almost all species of the Mucoromycotina. First, we

163 constructed alleles to tag the Mis12 and Dsn1 proteins with the red fluorescent protein

164 mCherry, both at their C-termini and N-termini, aiming to integrate the alleles in their

165 endogenous loci following a similar strategy as for the epitope tagging of the histone genes

166 described above. Homologous recombination in the desired transformants was confirmed by

167 PCR, and six mutant colonies from the four tagged alleles were checked for fluorescent

168 localization. Unfortunately, fluorescent signals were not detected in any of the four different

169 tagged strains possibly due to low levels of expression.

170 As an alternative approach, the Mis12 and Dsn1 proteins were tagged with mCherry at

171 both ends and overexpressed from a strong promoter (Pzrt1) in M. circinelloides (Fig. 2b,

172 Supplementary Fig. 4). The alleles were designed to target integration into the carRP , a

173 gene encoding an enzyme involved in carotenogenesis. Thus, the mutant strains obtained

7

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

174 should lack colored-carotenes and display an albino phenotype14. After confirming the

175 integration of the alleles in the carRP locus by PCR (Supplementary Fig. 4a, b, Supplementary

176 Table 2), one strain harboring each construct was used as a parental strain to integrate the allele

177 for H3-eGFP expression at its own locus as previously described, allowing monitoring

178 colocalization of the kinetochore proteins within the nucleus. Once the double integration was

179 confirmed, two independent strains of each type were selected for fluorescent localization

180 (Supplementary Table 3).

181 Mis12 and Dsn1 displayed a clear localization signal forming small fluorescent puncta

182 within the nucleus marked by histone H3 (Fig 2c, d respectively). The microscopy screening

183 confirmed that the expression of both N-terminal and C-terminal tagged proteins was similar.

184 The Mis12 and Dsn1 localization patterns revealed a single cluster of kinetochores as observed

185 in many fungal species. This result strongly suggests that M. circinelloides has monocentric

186 chromosomes with a localized kinetochore even in the absence of CENP-A.

187 Next, another evolutionarily conserved inner kinetochore protein CENP-T, which is

188 known to interact with DNA, was tagged with eGFP at the C-terminus following the same

189 procedure used for the histone tagging and integrated by homologous recombination at the

190 endogenous locus in strains expressing the Mis12-mCherry and Dsn1-mCherry fusion proteins.

191 Integration of the alleles at the desired loci was confirmed by PCR in both parental backgrounds

192 (Supplementary Fig. 4c), and the localization of the kinetochore proteins was examined by

193 analyzing their fluorescent signal in pre-germinated spores. CENP-T colocalized with Mis12

194 and Dsn1 in each nucleus (Fig 2e, f, respectively), indicating that all three proteins assemble

195 together in the kinetochore ensemble. Time-lapse imaging during spore germination of the

196 double-tagged strains allowed the analysis of nuclear division and kinetochore structure during

197 the cell cycle in live cells. The Mis12 and Dsn1 proteins clustered at the nuclear periphery

8

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

198 during division (Fig 2g), a typical feature of fungal kinetochore proteins, suggesting all these

199 three proteins are constitutively present at the kinetochore during the cell cycle.

200 M. circinelloides exhibits small mosaic centromeres associated with RNAi-suppressed L1-

201 like retrotransposons

202 The centromere regions of early-diverging fungi have never been described, possibly

203 because of their genetic intractability. The generation of kinetochore-tagged strains of M.

204 circinelloides offered the opportunity to identify the nature of centromeres in an organism that

205 lost CENP-A and CENP-C. To map M. circinelloides centromeres, chromatin

206 immunoprecipitation followed by next generation sequencing (ChIP-seq) was performed with

207 Mis12- and Dsn1-mCherry proteins. Illumina sequencing was conducted in two independently

208 immunoprecipitated (IP DNA) samples for each kinetochore protein, one for Mis12-mCherry

209 and the other for Dsn1-mCherry, using RFP-Trap MA beads (Chromotek) for mCherry

210 immunoprecipitation. Input control samples consisting of total sheared DNA (Input DNA)

211 were also sequenced for each tagged strain, as well as mock binding control (beads only DNA)

212 samples with MA beads without antibody. Raw sequencing data were processed, and reads

213 were aligned to a newly generated PacBio genome assembly of Mucor circinelloides f.

214 lusitanicus MU402 strain (hereafter the M. circinelloides genome), the parental strain of the

215 kinetochore protein-tagged strains.

216 Both Mis12 and Dsn1 IP DNA sequence reads were compared to the corresponding

217 control Input DNA reads to identify statistically significant kinetochore protein-enriched

218 genomic loci in both IP DNA samples (FDR ≤ 5x10-5, fold enrichment ≥ 1.6; Supplementary

219 Table 4). The result of this analysis was visualized in a genome browser to ensure that the

220 enrichment was robust and absent in either the Input or beads only DNA controls

221 (Supplementary Fig. 5). By this approach, we detected nine significantly enriched peaks that

222 overlapped in both Mis12 and Dsn1 IP DNA samples, and thus define core centromeres (CCs)

9

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

223 (Fig 3a). Each of these core CEN sequences was located in a different scaffold of the genome

224 assembly (Fig. 3b) and was designated by their scaffold number. The identification of five

225 repeated regions matching the telomeric sequence (5’-TTAGGG-3’) at the 3’-end of scaffolds

226 2, 3, 8, 9, and 10, suggested that CC2 and CC9 display a subtelomeric localization (Fig. 3b).

227 Because the length of the overlapping peaks was minimally different in the two kinetochore IP

228 DNA samples, DNA sequence enriched with at least one of the kinetochore proteins was

229 assumed to be the length of kinetochore binding region on each chromosome. In addition,

230 contiguous enriched regions were added to each CC, defining the core centromere regions

231 (Supplementary Table 4).

232 The average length of core centromere regions of M. circinelloides is 941 bp (Fig. 3c).

233 These unusually short regions are consistently AT-rich (71% average) (Fig. 3d), resembling

234 point centromeres of S. cerevisiae and its related budding yeast species. Further analysis

235 revealed a 41-bp DNA sequence motif conserved in all nine kinetochore protein-bound core

236 centromere regions (Fig. 3e). An extensive search of the presence of this motif across the

237 whole genome revealed that the 41-bp motif is centromere-specific in M. circinelloides.

238 Moreover, a closer inspection of the core centromere regions revealed a conserved pattern.

239 Each core centromere is composed of a minor peak enriched of kinetochore proteins spaced by

240 a highly AT-rich stretch spanning the major binding peak of kinetochore proteins that starts

241 with the 41-bp conserved motif (Fig. 3f).

242 Surrounding the core centromere regions, both upstream and downstream, are stretches

243 of genomic sequences completely devoid of genes. These pericentric regions are considerably

244 larger than the core and vary in length, ranging from 15 to 75 kb (Fig. 4a). These pericentric

245 regions contain a variable number of large repeats oriented away from the core centromere

246 regions (Supplementary Fig. 6). These repeats cluster in nine groups according to their

247 similarity (≥ 95% identity and ≥ 70% coverage across their entire sequence). Clusters 1, 2 and

10

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

248 3 contain all of the full-length elements, which were termed autonomous elements, while the

249 remaining clusters are composed of incomplete repeats or remnants (Supplementary Fig. 7a).

250 A pairwise comparison of all of the repeats revealed a high similarity in their aligned sequences

251 (≥ 90% identity), though the truncated sequences of remnants showed higher differences

252 accounting for the gaps in the alignment.

253 Each autonomous repeat comprises two sequential open reading frames (ORFs) with

254 an overlapping codon. The first ORF encodes a product containing several zinc finger domains,

255 and the second codes for a protein with AP endonuclease and reverse transcriptase domains

256 (Fig 4b, Supplementary Table 5). Based on their high similarity, their ORF domain

257 architecture, the presence of a poly(A) signal at their 3’-end, and the absence of long terminal

258 repeats (LTR) at either end, they were classified as a single non-LTR retrotransposon.

259 Phylogenetic profiling revealed that these retrotransposons belong to the LINE1 clade

260 (Supplementary Fig. 7b) and are closely related to other LINE1-like retrotransposons of

261 Mucoromycotina species, so they were termed Genomic Retrotransposable Element of

262 Mucoromycotina (Grem) LINE1-like. A search of the M. circinelloides genome revealed that

263 the Grem-LINE1s are found specifically in the pericentromeric regions; or at the non-telomeric

264 ends of three small scaffolds, oriented away from the end (Supplementary Table 5). Four major

265 scaffolds end abruptly in a putative centromere region, suggesting that they may be linked to

266 these minor scaffolds that contain the Grem elements at one end. Broadening the search across

267 55 genomes of species of Mucoromycotina identified the Grem elements in most Mucorales

268 and Umbelopsidales (≥ 30% identity and ≥ 50% coverage), but they could not be found in any

269 of the Endogonalean genomes, indicating an inverse correlation between the presence of Grem-

270 LINE1s and CENP-A (Fig. 1 and Supplementary Table 6).

271 RNAi silences expression and prevents transposition of retrotransposable elements in

272 several fungal species featuring functional RNAi21,22. M. circinelloides possess an elaborate

11

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

273 RNAi mechanism that protects the genome against invading elements, that could prevent the

274 movement of retrotransposable elements. The two Dicer enzymes, particularly Dcl2, and one

275 of the three Argonaute proteins (Ago1), play a pivotal role in this protective RNAi-related

276 pathway23. To analyze the transcriptional landscape of the pericentric regions, previously

277 generated and publicly available transcriptomic raw data for mRNA and small RNAs were

278 reprocessed and aligned to the M. circinelloides genome assembly. The pericentric regions

279 were almost devoid of transcription in the wild-type strain, although the retrotransposons were

280 modestly expressed. Indeed, a high number of sRNAs corresponding to the retrotransposons

281 were detected (Fig. 4c, Supplementary Fig. 6), indicating an inverse correlation between

282 mRNA and sRNA levels. This inverse correlation was confirmed by analyzing the

283 transcriptomic profile of mutant strains lacking essential components of the RNAi machinery,

284 Ago1 or Dcl1/Dcl2. In these RNAi-deficient mutants, the pattern was reversed; high mRNA

285 levels from the retrotransposons were observed, while the production of sRNAs originating

286 from those regions was considerably lower than in the wild-type strain. These findings suggest

287 that the retrotransposons contained in the pericentric regions of M. circinelloides are silenced

288 by the Dicer-dependent RNAi pathway.

289 In most organisms, deposition of CENP-A at the centromere-specific nucleosomes is

290 correlated to depletion of histone H324–26. Because CENP-A is absent in M. circinelloides, we

291 quantified the presence of histone H3 at the centromere-specific nucleosomes by performing

292 ChIP with anti-histone H3 antibodies and a qPCR analysis. The occupancy of histone H3 was

293 examined by qPCR in four regions across the scaffold_2: the spanning kinetochore protein-

294 binding region (core centromere, CC2), the ORF-free pericentromeric region, the centromere-

295 flanking ORFs and the control region far-CEN ORF ~2 Mb away from the centromere (Fig.

296 4d). There are no significant differences in histone H3 occupancy across the analyzed regions,

12

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

297 indicating that H3 is not depleted at either the kinetochore protein binding region or the

298 pericentric region compared to the flanking ORFs and the control far-CEN ORF (Fig. 4d).

299 Discussion

300 Mucoralean species are the causative agents of mucormycosis, an emerging fungal

301 infection with extremely poor clinical outcomes, probably as a consequence of their innate

302 resistance to all current antifungal drugs27,28 and their ability to evade host innate immunity29,30.

303 In spite of their clinical relevance, limited information is available on essential biological

304 processes in these fungal pathogens, especially nuclear division and cell cycle. In this study,

305 we identified major kinetochore components in M. circinelloides, and utilized them to identify

306 and characterize nine chromosomal loci as centromeres.

307 In most eukaryotes, the centrochromatin is composed of nucleosomes containing the

308 histone H3 variant CENP-A as a hallmark of centromere identity. The most striking

309 observation of our study was the absence of this centromere determining protein in all

310 Mucoralean species. Absence of CENP-A was accompanied by loss of CENP-C, while most

311 other kinetochore protein orthologs were present in these species. Loss of a previously-thought

312 indispensable protein like CENP-A raised questions on the evolutionary bases of centromere

313 and kinetochore structure. Absence of CENP-A had been previously observed in certain insect

314 orders including butterflies and moths, bugs and lice, earwigs, and dragonflies which in all

315 known cases led to a transition to holocentricity, suggesting that the presence of a centromere-

316 specific protein is dispensable for segregation of a chromosome with localized kinetochore10.

317 Our study is the first to demonstrate loss of core kinetochore proteins in the fungal

318 kingdom, reinforcing that the absence of CENP-A can be tolerated in eukaryotic organisms.

319 Surprisingly, punctate localization patterns of other conserved kinetochore proteins including

320 Mis12, Dsn1, and CENP-T throughout the cell cycle indicate that these organisms still retain a

13

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

321 monocentric arrangement of a canonical kinetochore structure, as opposed to the holocentric

322 insects that lack CENP-A. Kinetochore proteins in M. circinelloides were observed to be

323 localized at the periphery of each nucleus in the asexual spores as well as mycelia. Three types

324 of nuclear divisions have been studied in multinucleated fungi – synchronous, asynchronous,

325 and parasynchronous31. In all ascomycetes except S. pombe and Zymoseptoria tritici,

326 kinetochore proteins are centromere-localized and clustered throughout the cell cycle32,33. In

327 the basidiomycete C. neoformans, kinetochore proteins transition between clustering and

328 declustering states in different stages of the cell cycle34.

329 We have observed asynchronous nuclear divisions during germination of asexual

330 spores of M. circinelloides, a multinucleated and coenocytic fungus. The kinetochore was

331 found to be consistently present as a cluster in all stages of , implying that the basic

332 kinetochore structure in this organism is constitutively assembled throughout the cell cycle.

333 Overall, these findings hint that the role of CENP-A is being carried out by either a distinct

334 protein that has a newly evolved function or by one of the known kinetochore proteins like

335 CENP-T, which also has a histone-fold DNA binding domain. We also tried to find whether

336 one of the histone H3 variants plays a role of centromeric histone, but none of them displayed

337 kinetochore-like localization, indicating that they are probably not involved in replacing

338 CENP-A function in chromosome segregation mediated by the kinetochore machinery.

339 A ChIP-based analysis using Dsn1 and Mis12 revealed the first glimpse of the

340 centromere structure of this Mucoromycotina species. The centromeres in this organism

341 possess features of both point and regional centromeres, resulting in a unique mosaic; thus, we

342 named them mosaic centromeres (Fig. 5). One of their distinctive features is a small

343 kinetochore protein binding domain (~750-1150 bp) defined by the presence of an AT-rich

344 sequence stretch (~70-80% AT, ~100 bp), which is followed by a conserved centromere-

345 specific 41-bp sequence motif. This motif is only found in M. circinelloides centromeres and

14

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

346 marks the start of the major kinetochore-binding region, suggesting that it could be a binding

347 site for the unknown centromere-specific protein. Overall, this pattern resembles the

348 distribution of conserved elements in S. cerevisiae point centromeres, in which two conserved

349 Centromere Determining Elements (CDEI and CDEIII) are separated by an AT-rich non-

350 conserved DNA sequence stretch (CDEII)7.

351 Other features of M. circinelloides mosaic centromeres are reminiscent of fungal

352 regional centromeres like the ones described in Candida albicans35. The pericentric regions of

353 M. circinelloides are large (~15-75 kb), gene-free, and transcriptionally silenced sequences that

354 are interspersed by Grem-LINE1s, which are repeats of a LINE1-like non-LTR

355 retrotransposable element. These types of retroelements have been identified in the

356 centromeres and neocentromeres of highly diverged eukaryotes, especially LINE1-like

357 elements in mammals36–41 and also other non-LTR retroelements as active components of fruit

358 fly centromeres42, and LTR retrotransposons in plants43 and fungi22. Based on these findings,

359 a model has been proposed that CENP-A is recruited by genomic sequences rich in

360 retrotransposable elements and thus, gives rise to the centromeres44. M. circinelloides is the

361 first organism to display an enrichment in retrotransposable elements in their centromeric

362 regions in the absence of CENP-A, suggesting that these non-LTR elements determine the

363 formation of centromeres. Thus, we hypothesize that identifying Grem clusters in other species

364 of Mucoromycotina could pinpoint the location of their putative pericentromeric regions.

365 RNAi machinery is essential to control the mobility of transposable elements, but also

366 to maintain centromere structure and stability22. Fungi that have lost essential components of

367 the RNAi machinery have shorter pericentric regions and fewer transposable elements than

368 closely related RNAi-proficient species. M. circinelloides has a functional and complex RNAi

369 system with Dicer-dependent and -independent pathways23. Our results demonstrate that the

15

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

370 expression of the centromeric Grem-LINE1s is suppressed by the Dicer-dependent RNAi

371 pathway, which may have resulted in accumulation of these elements in the centromeres22.

372 The composition of centromere-specific nucleosomes has been debated and several

373 conflicting models have been proposed, which include a hemisome in yeast and Drosophila

374 with one CENP-A/H4 and one H2A/H2B dimer45,46, a conventional octamer with two CENP-

375 A/H4/H2A/H2B47, hexasomes with two CENP-A/H4/Scm3, and mixed octasomes which have

376 one molecule of H3 with CENP-A48. In addition, H3 nucleosomes have been shown to be

377 interspersed with CENP-A nucleosomes in certain plants and animals49–51. We observed the

378 presence of H3 at the relatively short kinetochore protein-binding region at a level that is

379 similar to the pericentric region or gene bodies. Because histone H3 was found to be enriched

380 at the core centromere, we propose that either a homotypic histone H3 nucleosome or a

381 heterotypic histone H3 nucleosome exists where CENP-A is replaced with some other protein.

382 Further studies will provide the exact composition of centromere-specific nucleosomes in this

383 basal fungus.

384 Overall this study has uncovered for the first time the centromeres of an early-diverging

385 fungus, which are characterized by a mosaic of features from both point and regional

386 centromeres found in the fungal kingdom. Several studies on rapidly-evolving centromeres

387 suggest that point centromeres may be a recently evolved state as compared to regional

388 centromeres9. Therefore, we hypothesize that the last mucoralean common ancestor had

389 regional centromeres with pericentric regions that harbored retrotransposable elements

390 contained by an active RNAi machinery. The loss of CENP-A triggered an evolutionary

391 rearrangement from typical regional centromeres to the smaller kinetochore-binding regions

392 that defines the mosaic centromeres of M. circinelloides.

16

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

393 Methods

394 Fungal strains and culture conditions

395 All the strains used and generated in this work derive from M. circinelloides f.

396 lusitanicus CBS277.49. The double auxotroph MU40214 (Ura-, Leu-) or the MU636 (Ura+, Leu-

397 ) strain was used for transformation with cassettes containing the selectable marker for uracil

398 pyrG or for leucine leuA, respectively. Single tagged strains in kinetochore proteins MU840,

399 MU842, MU844, and MU846 were used as parental strains for transformation with the Hht4-

400 eGFP cassette to obtain double-tagged strains. All the generated strains in this work are listed

401 in Supplementary Table 3.

402 After transformation, to select uracil prototrophy M. circinelloides spores were plated

403 in minimal medium with Casamino Acids (MMC)14, and for leucine prototrophy spores were

404 plated in medium yeast nitrogen base (YNB)14, both adjusted at pH 3.2 for transformation and

405 colony isolation. M. circinelloides spores were cultured in rich medium yeast-peptone-glucose

406 (YPG)14 for optimal growth and sporulation at pH 4.5. The microscopy and ChIP assays were

407 performed with pre-germinated spores growing in YPG pH 4.5 for 3 hours at 250 rpm and 26

408 ºC.

409 Microscopy imaging and acquisition

410 Freshly isolated M. circinelloides spores were washed twice with 1X phosphate

411 buffered saline (PBS). The spores were resuspended in PBS and 10 μl suspension was placed

412 on a slide, covered with a coverslip and sealed with clear nail polish. The images were acquired

413 at room temperature using laser scanning inverted confocal microscope LSM 880-Airyscan

414 (Zeiss, Plan Apochromat 63x, NA oil 1.4) equipped with highly sensitive photo-detectors. The

415 filters used were GFP/FITC 488, mCherry 561 for excitation and GFP/FITC 500/550 band

416 pass, mCherry 565/650 long pass for emission. Z- stack images were taken at every 0.5 μm and

17

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

417 processed using Zeiss Zen software. All the images were displayed after the maximum intensity

418 projection of images using Zeiss Zen software.

419 Live cell imaging

420 Freshly isolated M. circinelloides spores were incubated in YPG growth medium at

421 26ºC for 2 h. Then, the spores to be imaged were pelleted at 3000 xg and washed twice with

422 1X phosphate buffered saline (PBS). The spores were resuspended in PBS and 10μl suspension

423 was placed on a slide containing thin 2% agarose patch containing 2% dextrose and covered

424 with a coverslip. Live cell imaging was performed at 30ºC in a temperature-controlled chamber

425 of an inverted confocal microscope (Zeiss, LSM-880) using a Plan Apochromat 100X NA oil

426 1.4 objective and GaAsp photodetectors. Images were collected at 60-s intervals with 0.5 µm

427 Z‐steps using GFP/FITC 488, mCherry 561 for excitation. GFP/FITC 500/550 band pass or

428 GFP 495/500 and mCherry 565/650 or mCherry 580‐750 band pass for emission. All the

429 images were displayed after the maximum intensity projection of images at each time using

430 Zeiss Zen software.

431 Ortholog search

432 Seventy five fungal proteomes were retrieved from the Joint Genome Institute (JGI)

433 Mycocosm genome portal52 (Supplementary Table 1). BLAST+53 v2.7.1 individual protein-

434 protein BLASTp, iterative BLASTp (PSI-BLAST) and iterative HMMER v3.2.1

435 (http://www.hmmer.org) jackhammer searches (E-value ≤ 1x10-5) were conducted against

436 these proteomes using M. musculus, U. maydis, S. cerevisiae, and S. pombe protein sequences

437 (Supplementary Table 1) as queries. In addition, an hmmsearch was launched using the Pfam-

438 A54 HMM models for each kinetochore protein [Pfam gathering threshold (GA) as cut-off

439 value]. All the putative orthologs found were used to perform a reciprocal BLASTp against M.

440 musculus, U. maydis, S. cerevisiae, and S. pombe protein sequences. Also, Pfam-A protein

18

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

441 domains were searched in all the possible orthologs using HMMER hmmscan. Protein

442 sequences that either lacked appropriate Pfam domains or failed to produce a hit in a reciprocal

443 blastp search were discarded, resulting in a matrix of putative orthologs for each given species

444 (Supplementary Table 1). The cladogram showing the relationship among species was drawn

445 using the tree data from JGI Mycocosm52 and the interactive Tree of Life55 (iTOL v4) tool. To

446 determine CENP-A presence or absence in each given species, all 289 protein sequences of

447 histone H3 putative orthologs (Supplementary Table 1) were aligned using MUSCLE56

448 v3.8.1551, and the phylogenetic relationship of their HFD was inferred by the neighbor-joining

449 method employing the Jones, Taylor, and Thornton (JTT) substitution model57, with a bootstrap

450 procedure of 1,000 iterations (MEGA58 v10.0.5).

451 Construction of the strains expressing the histone and kinetochore proteins with the

452 fluorescent tag

453 Alleles containing fluorescent fusion protein coding-sequences were designed to either

454 integrate them in their corresponding endogenous locus, or in the carRP locus. The Hht4-eGFP,

455 Hhf1-eGFP, Hhf4-eGFP and CENP-T-eGFP alleles contained the ORF of each gene fused in-

456 frame at their C-termini with the eGFP sequence, followed by the leuA gene as a selectable

457 marker. The alleles were obtained by overlapping PCR using 5’-modified primers harboring

458 restriction sites for easy cloning. Briefly, each allele was obtained by fusing the 1-kb fragment

459 upstream the stop codon of each gene, the 0.7-kb eGFP fragment (ending in a stop codon), the

460 3.4-kb leuA fragment, and the 1-kb sequence downstream the stop codon of each tagged gene.

461 All the linear alleles were digested with appropriate restriction enzymes, ligated into the

462 plasmid Bluescript SK (+), and cloned into Escherichia coli.

463 The Mis12-mCherry and Dsn1-mCherry constructions were designed to allow the

464 integration and ectopic overexpression of the fused proteins in the carRP locus. First, a

19

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

465 universal mCherry-expression vector was constructed, containing the strong promoter Pzrt1,

466 the ORF of the mCherry coding-gene including the stop codon, and the pyrG gene as a

467 selectable marker (named pMAT1915). To do that, a fragment containing the 1-kb downstream

468 and upstream sequences from the carRP gene flanking the Pzrt1 promoter was amplified with

469 an inverse PCR using the plasmid pMAT147714 as a template. Then, the mCherry and pyrG

470 fragments were added by In-Fusion cloning (Takara) using primers with appropriate

471 overlapping 5’-tails (Supplementary Table 2). Once the universal pMAT1915 vector was

472 constructed, it was used for the C-terminal tagging of or dsn1, fusing their ORFs without

473 the stop codon upstream and in-frame with the mCherry ORF by In-Fusion cloning. The

474 resulting plasmids were pMAT1917 and pMAT1919, respectively for mis12 and dsn1. For the

475 N-terminal tagging, the plasmid pMAT1915 was inverse-amplified using primers that exclude

476 the stop codon of the mCherry fragment, and the mis12 and dsn1 ORFs, including their stop

477 codons, were cloned in-frame. The resulting plasmids were pMAT1916 and pMAT1918 for

478 mCherry-Mis12 and mCherry-Dsn1 expression, respectively.

479 StellarTM competent cells (Takara) were used for transformation by heat-shock

480 following the supplier procedures. The correct construction of all the generated plasmids was

481 confirmed by restriction analysis and Sanger sequencing.

482 For transformation, the alleles were released from the vector backbone by restriction

483 digestion and used to transform M. circinelloides protoplasts by electroporation14. Spores from

484 individual colonies were collected and plated in selective media to ensure homokaryosis. After

485 at least five vegetative cycles, the integration in the target locus was confirmed by PCR using

486 primers that amplified the whole locus (Supplementary Table 2), producing PCR-fragments

487 that allowed discrimination between the mutant and the wild-type locus (Supplementary Fig.

488 3, 4). Genomic DNA purification was performed following the procedure previously

489 published59.

20

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

490 Chromatin immunoprecipitation and sequencing

491 The ChIP protocol for M. circinelloides was adapted from previous studies in other

492 fungal models22. Briefly, spores from the strains MU842 (expressing Mi12-mCherry) and

493 MU846 (expressing Dsn1-mCherry) were used for independent IP experiments. 107 spores/mL

494 of these strains were germinated in 100 ml of liquid YPG pH 4.5 medium, shaking at 250 rpm

495 at 26ºC for 3 hours. Then, cells were fixed by adding formaldehyde at a final concentration of

496 1% for 20 minutes. The fixation was quenched by adding glycine to a final concentration of

497 135 mM. Cells were pelleted and ground with liquid nitrogen, and 150 mg of ground pellet was

498 resuspended in ChIP lysis Buffer (50 mM HEPES pH 7.4, 150 mM NaCl, 1% Triton X-100,

499 0.1% DOC, and 1 mM EDTA), with a protease inhibitor cocktail. The sample was sonicated

500 using a Bioruptor (Diogenode) with pulses of 30s ON/OFF for 50 cycles to obtain a chromatin

501 fragmentation between 100 and 300 bp.

502 The lysates were clarified by centrifugation at 12,000 xg for 10 min at 4ºC. At this step,

503 100 μL were frozen at -20ºC for Input DNA control. The samples were then divided into 450

504 μL for the antibody immunoprecipitation and 450 μL for the mock control. For IP 25 μL of

505 Red Fluorescent Protein (RFP)-Trap Magnetic Agarose (MA) beads (Chromotek) were added

506 and 25 μL of MA beads (Chromotek) to the binding control. Samples were incubated at 4ºC

507 overnight. After incubation, the beads were washed twice with 1 mL of the low salt wash buffer

508 (20 mM Tris pH 8.0, 150 mM NaCl, 0.1% SDS, 1 % Triton X-100, and 2 mM EDTA), twice

509 with 1 mL of the high salt wash buffer (20 mM Tris pH 8.0, 500 mM NaCl, 0.1% SDS, 1%

510 Triton X-100 and 2 mM EDTA), once with the LiCl wash buffer (10 mM Tris pH 8.0, 1 mM

511 EDTA, 0.25 M LiCl, 1% NP40, and 1% DOC), and finally once with TE (10mM Tris pH 8.0,

512 and 1 mM EDTA). To elute the DNA, 250 μL of TES (50 mM Tris pH 8.0, 10 mM EDTA, 1%

513 SDS) was added to each sample and incubated for 10 min at 65ºC, twice. The

514 immunoprecipitated samples and the Input controls were de-crosslinked by incubating at 65ºC

21

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

515 overnight. The samples were treated with 199 μg of RNase A (Sigma Aldrich) and 190 μg of

516 Proteinase K (Sigma Aldrich) for 2 hours at 50ºC. For DNA purification, 1 volume of

517 phenol:chloroform:isoamyl alcohol (25:24:1) was added, the aqueous phase was recovered,

518 and then 1 volume of chloroform:isoamyl alcohol (24:1) was added. After centrifugation, the

519 aqueous phase was recovered, and DNA was precipitated with 20 μg of glycogen (Thermo

520 Fisher), 1/10 volume of Na-Acetate 3 M (pH 5.2), and 1 volume of ethanol 100%. Pulled DNA

521 from duplicated samples were precipitated together for sequencing. Samples incubated at -20ºC

522 overnight were centrifuged for 10 min at 12,000 xg and the pellets were washed with ethanol

523 70%. Each sample was resuspended in 30 μL of MilliQ water. The quality of the DNA was

524 analyzed by QuBit (Thermo Fisher) and sent to Novogene, which prepared the libraries using

525 TruSeq ChIP Library Prep Kit and sequenced them with Illumina Hiseq 2500 High-Output v4

526 single-end reads of 50 bp.

527 Histone H3-ChIP assays were performed in the same way with the following changes.

528 The protoplasts were obtained as described before60 and then sonicated as above. Native ChIP

529 was performed22 using anti-H3 antibody (Abcam ab1791). The DNA pellet was dissolved in

530 25 μl of MilliQ water. All samples (Input, IP with or without antibodies) were used for PCR.

531 The Input and IP DNA was subsequently used for qPCR using the primers listed in

532 Supplementary Table 2b. Histone H3 enrichment was determined by the percentage input

533 method using the formula:100 푥 2푎푑푗푢푠푡푒푑 퐶푡 퐼푛푝푢푡−푎푑푗푢푠푡푒푑 퐶푡 퐼푃 (ref). The adjusted Ct value is

534 the dilution factor (log2 of dilution factor) subtracted from the Ct value of the Input or IP DNA

535 samples. Three technical replicates were taken for qPCR analysis and standard error of mean

536 was calculated.

537 Sequencing data analysis

22

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

538 Raw reads from Chromatin Immunoprecipitation and RNA sequencing (ChIP- and

539 RNA-seq) were quality-checked with FASTQC v0.11.8 and adapters removed with Trim

540 Galore! v0.6.2 (http://www.bioinformatics.babraham.ac.uk/projects/). Reads were aligned to

541 the Mucor circinelloides f. lusitanicus MU402 reference genome Muccir1_3 (referred to as the

542 M. circinelloides genome throughout the text, available at JGI

543 http://genome.jgi.doe.gov/Muccir1_3/Muccir1_3.home.html), using the Burrows-Wheeler

544 Aligner61 (BWA v0.7.8) algorithm for ChIP- and sRNA-seq reads and STAR62 v2.7.1a for

545 mRNA-seq data. The number of overlapping aligned reads per 25-bp bin was used as a measure

546 of coverage, obtaining normalized bigWig coverage files with deepTools63 v3.2.1

547 bamCoverage function. For RNA-seq data, coverage was normalized to bins per million

548 mapped reads (BPM). For ChIP-seq data, reads were normalized per genomic content (1x

549 normalization). ChIP enrichment peaks were identified by Model-based Analysis of ChIP-seq64

550 (MACS2 v2.1.1) callpeak function and sorted by fold-enrichment and FDR values

551 (Supplementary Table 4). Enriched peaks (fold-enrichment ≥ 1.6, FDR ≤ 5x10-5) were

552 manually curated by analyzing the IP coverage with the Integrative Genomics Viewer65 (IGV

553 v2.4.1) genome browser, ensuring that peaks were present in both Mis12 and Dsn1 IP samples,

554 and absent in either the input and binding controls. The resulting centromeric sequences were

555 analyzed with Multiple Em for Motif Elicitation66 (MEME v5.0.5) to discover conserved DNA

556 motifs on both strands. GC content every 25-bp bin was calculated with bedTools67 v2.28.

557 Normalized coverage, GC content and genetic elements annotation files were rendered with

558 either Gviz68 v1.29.0 for whole-chromosome visualization or deepTools pyGenomeTracks

559 module for close-in genomic views.

560 Transposable element analysis

561 The M. circinelloides genome assembly of strain MU402 was searched for repeats using

562 RepeatScout69 v1.0.5 and RepeatMasker (http://www.repeatmasker.org) v4.0.9. The repeats

23

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

563 located in the pericentromeric regions were aligned using MAFFT70 v7.407 and clustered with

564 CD-HIT71 v4.8.1, generating groups of sequences that shared ≥ 95% identity and ≥ 70%

565 coverage and calculating their p-distance (nucleotide changes per site). A BLASTn search

566 against M. circinelloides genome assembly was conducted using a representative repeated

567 sequence from each cluster to locate similar sequences across the entire genome, obtaining hits

568 with ≥ 80% identity and ≥ 20% coverage (Supplementary Table 5). The pericentromeric repeats

569 were examined with getorf from Emboss72 v6.6.0, obtaining all ORFs ≥ 600 nt. Protein

570 domains in these ORFs translated sequences were predicted by hmmsearch using the Pfam-A

571 v32.0 database. The reverse transcriptase (RVT) domains found in the repeated sequences were

572 aligned with MUSCLE to create a consensus sequence, and this consensus sequence was

573 aligned with 281 RVT domains of well-known non-LTR retrotransposons from Repbase73

574 24.04. A neighbor-joining phylogenetic tree was inferred from this alignment using the JTT

575 substitution model and a bootstrap procedure of 1000 iterations (MEGA v10.0.5). To assess

576 the presence of the retrotransposable elements in other species of Mucoromycotina, a tBLASTn

577 search was launched against the 55 available genomes at JGI (Supplementary Table 6) using

578 the translated sequence of both ORFs combined, recovering hits with ≥ 50% coverage and E-

579 value ≤ 10-5.

580 Data availability

581 ChIP-seq raw data and processed files are deposited at the Gene Expression Omnibus

582 (GEO) repository under GSE132687 accession number. Open access will be granted upon

583 publication and they are provisionally available upon token request. Both mRNA and sRNA-

584 seq data were already available at the Sequence Read Archive (SRA) and can be accessed with

585 the following run accession numbers: SRR1611144 (R7B wild-type strain mRNA),

586 SRR1611151 (ago1 deletion mutant strain mRNA), SRR1611171 (double dcl1 dcl2 deletion

24

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

587 mutant strain mRNA), SRR039123 (R7B wild-type strain sRNA), SRR836082 (ago1 deletion

588 mutant strain sRNA), SRR039128 (double dcl1 dcl2 deletion mutant strain sRNA).

589

590 References

591 1. McKinley, K. L. & Cheeseman, I. M. The molecular basis for centromere identity and

592 function. Nat. Rev. Mol. Cell Biol. 17, 16–29 (2016).

593 2. Cheeseman, I. M. The kinetochore. Cold Spring Harb. Perspect. Biol. 6, a015826

594 (2014).

595 3. Marshall, O. J., Chueh, A. C., Wong, L. H. & Choo, K. H. A. Neocentromeres: new

596 insights into centromere structure, disease development, and karyotype evolution. Am.

597 J. Hum. Genet. 82, 261–82 (2008).

598 4. Earnshaw, W. C. & Migeon, B. R. Three related centromere proteins are absent from

599 the inactive centromere of a stable isodicentric chromosome. Chromosoma 92, 290–6

600 (1985).

601 5. Henikoff, S. The centromere paradox: stable inheritance with rapidly evolving DNA.

602 Science (80-. ). 293, 1098–1102 (2001).

603 6. Clarke, L. & Carbon, J. Isolation of a yeast centromere and construction of functional

604 small circular chromosomes. Nature 287, 504–9 (1980).

605 7. Clarke, L. & Carbon, J. The structure and function of yeast centromeres. Annu. Rev.

606 Genet. 19, 29–55 (1985).

607 8. Friedman, S. & Freitag, M. Centrochromatin of Fungi. in 85–109 (2017).

608 doi:10.1007/978-3-319-58592-5_4

609 9. Malik, H. S. & Henikoff, S. Major evolutionary transitions in centromere complexity.

610 Cell 138, 1067–1082 (2009).

611 10. Drinnenberg, I. A., DeYoung, D., Henikoff, S. & Malik, H. S. Recurrent loss of CenH3

25

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

612 is associated with independent transitions to holocentricity in insects. Elife 3, 717–750

613 (2014).

614 11. Steiner, F. A. & Henikoff, S. Holocentromeres are dispersed point centromeres localized

615 at transcription factor hotspots. Elife 3, (2014).

616 12. Akiyoshi, B. & Gull, K. Discovery of unconventional kinetochores in kinetoplastids.

617 Cell 156, 1247–1258 (2014).

618 13. van Hooff, J. J., Tromer, E., van Wijk, L. M., Snel, B. & Kops, G. J. Evolutionary

619 dynamics of the kinetochore network in eukaryotes as revealed by comparative

620 genomics. EMBO Rep. 18, e201744102 (2017).

621 14. Nicolás, F. E. et al. Molecular tools for carotenogenesis analysis in the mucoral Mucor

622 circinelloides. Methods Mol. Biol. 1852, 221–237 (2018).

623 15. Prakash, H. & Chakrabarti, A. Global Epidemiology of Mucormycosis. J. Fungi 5,

624 (2019).

625 16. Malik, H. S. & Henikoff, S. Phylogenomics of the nucleosome. Nat. Struct. Biol. 10,

626 882–891 (2003).

627 17. Yun, C. S. & Nishida, H. Distribution of introns in fungal histone genes. PLoS One 6,

628 4–7 (2011).

629 18. Osley, M. The regulation of histone synthesis in the cell cycle. Annu. Rev. Biochem. 60,

630 827–861 (1991).

631 19. Luense, L. J. et al. Comprehensive analysis of histone post-translational modifications

632 in mouse and human male germ cells. Epigenetics Chromatin 9, 24 (2016).

633 20. Li, C. H. et al. Sporangiospore size dimorphism is linked to virulence of Mucor

634 circinelloides. PLoS Pathog. 7, e1002086 (2011).

635 21. Smith, K. M., Phatale, P. A., Sullivan, C. M., Pomraning, K. R. & Freitag, M.

636 Heterochromatin is required for normal distribution of Neurospora crassa CenH3. Mol.

26

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

637 Cell. Biol. 31, 2528–42 (2011).

638 22. Yadav, V. et al. RNAi is a critical determinant of centromere evolution in closely related

639 fungi. Proc. Natl. Acad. Sci. 115, 3108–3113 (2018).

640 23. Torres-Martínez, S. & Ruiz-Vázquez, R. M. RNAi pathways in Mucor: A tale of

641 proteins, small RNAs and functional diversity. Fungal Genet. Biol. 90, 44–52 (2016).

642 24. Mizuguchi, G., Xiao, H., Wisniewski, J., Smith, M. M. & Wu, C. Nonhistone Scm3 and

643 histones CenH3-H4 assemble the core of centromere-specific nucleosomes. Cell 129,

644 1153–1164 (2007).

645 25. Thakur, J., Talbert, P. B. & Henikoff, S. Inner kinetochore protein interactions with

646 regional centromeres of fission yeast. Genetics 201, 543–61 (2015).

647 26. Kapoor, S., Zhu, L., Froyd, C., Liu, T. & Rusche, L. N. Regional centromeres in the

648 yeast Candida lusitaniae lack pericentromeric heterochromatin. Proc. Natl. Acad. Sci.

649 112, 12139–12144 (2015).

650 27. Calo, S. et al. Antifungal drug resistance evoked via RNAi-dependent epimutations.

651 Nature 513, 555–558 (2014).

652 28. Caramalho, R. et al. Intrinsic short-tailed azole resistance in mucormycetes is due to an

653 evolutionary conserved aminoacid substitution of the lanosterol 14α-demethylase. Sci.

654 Rep. 7, 3–12 (2017).

655 29. Andrianaki, A. M. et al. Iron restriction inside macrophages regulates pulmonary host

656 defense against Rhizopus species. Nat. Commun. 9, 3333 (2018).

657 30. Pérez-Arques, C. et al. Mucor circinelloides thrives inside the phagosome through an

658 atf-mediated germination pathway. MBio 10, 1–15 (2019).

659 31. Gladfelter, A. S., Hungerbuehler, A. K. & Philippsen, P. Asynchronous nuclear division

660 cycles in multinucleated cells. J. Cell Biol. 172, 347–62 (2006).

661 32. Liu, X., McLeod, I., Anderson, S., Yates, J. R. & He, X. Molecular analysis of

27

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

662 kinetochore architecture in fission yeast. EMBO J. 24, 2919–30 (2005).

663 33. Schotanus, K. et al. Histone modifications rather than the novel regional centromeres of

664 Zymoseptoria tritici distinguish core and accessory chromosomes. Epigenetics

665 Chromatin 8, 41 (2015).

666 34. Kozubowski, L. et al. Ordered kinetochore assembly in the human-pathogenic

667 basidiomycetous yeast Cryptococcus neoformans. MBio 4, (2013).

668 35. Sanyal, K., Baum, M. & Carbon, J. Centromeric DNA sequences in the pathogenic yeast

669 Candida albicans are all different and unique. Proc. Natl. Acad. Sci. 101, 11374–11379

670 (2004).

671 36. Miga, K. H. et al. Centromere reference models for human chromosomes X and Y

672 satellite arrays. Genome Res. 24, 697–707 (2014).

673 37. Chueh, A. C., Northrop, E. L., Brettingham-Moore, K. H., Choo, K. H. A. & Wong, L.

674 H. LINE retrotransposon RNA is an essential structural and functional epigenetic

675 component of a core neocentromeric chromatin. PLoS Genet. 5, e1000354 (2009).

676 38. Carbone, L. et al. Centromere remodeling in Hoolock leuconedys (Hylobatidae) by a

677 new transposable element unique to the gibbons. Genome Biol. Evol. 4, 648–658 (2012).

678 39. de Sotero-Caio, C. G. et al. Centromeric enrichment of LINE-1 retrotransposons and its

679 significance for the chromosome evolution of Phyllostomid bats. Chromosom. Res. 25,

680 313–325 (2017).

681 40. Longo, M. S., Carone, D. M., Green, E. D., O’Neill, M. J. & O’Neill, R. J. Distinct

682 retroelement classes define evolutionary breakpoints demarcating sites of evolutionary

683 novelty. BMC Genomics 10, 334 (2009).

684 41. O’Neill, R. J. W., O’Neill, M. J. & Graves, J. A. M. Undermethylation associated with

685 retroelement activation and chromosome remodelling in an interspecific mammalian

686 hybrid. Nature 393, 68–72 (1998).

28

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

687 42. Chang, C.-H. et al. Islands of retroelements are major components of Drosophila

688 centromeres. PLOS Biol. 17, e3000241 (2019).

689 43. Presting, G. G. Centromeric retrotransposons and centromere function. Curr. Opin.

690 Genet. Dev. 49, 79–84 (2018).

691 44. Klein, S. J. & O’Neill, R. J. Transposable elements: genome innovation, chromosome

692 diversity, and centromere conflict. Chromosom. Res. 26, 5–23 (2018).

693 45. Henikoff, S. & Furuyama, T. The unconventional structure of centromeric nucleosomes.

694 Chromosoma 121, 341–352 (2012).

695 46. Dalal, Y., Wang, H., Lindsay, S. & Henikoff, S. Tetrameric structure of centromeric

696 nucleosomes in interphase Drosophila cells. PLoS Biol. 5, e218 (2007).

697 47. Camahort, R. et al. Cse4 is part of an octameric nucleosome in budding yeast. Mol. Cell

698 35, 794–805 (2009).

699 48. Lochmann, B. & Ivanov, D. Histone H3 localizes to the centromeric DNA in budding

700 yeast. PLoS Genet. 8, e1002739 (2012).

701 49. Blower, M. D., Sullivan, B. A. & Karpen, G. H. Conserved organization of centromeric

702 chromatin in flies and humans. Dev. Cell 2, 319–330 (2002).

703 50. Greaves, I. K., Rangasamy, D., Ridgway, P. & Tremethick, D. J. H2A.Z contributes to

704 the unique 3D structure of the centromere. Proc. Natl. Acad. Sci. 104, 525–530 (2007).

705 51. Yan, H. & Jiang, J. Rice as a model for centromere and heterochromatin research.

706 Chromosom. Res. 15, 77–84 (2007).

707 52. Grigoriev, I. V. et al. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic

708 Acids Res. 42, D699–D704 (2014).

709 53. Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421

710 (2009).

711 54. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47,

29

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

712 D427–D432 (2019).

713 55. Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new

714 developments. Nucleic Acids Res. (2019). doi:10.1093/nar/gkz239

715 56. Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high

716 throughput. Nucleic Acids Res. 32, 1792–1797 (2004).

717 57. Jones, D. T., Taylor, W. R. & Thornton, J. M. The rapid generation of mutation data

718 matrices from protein sequences. Comput. Appl. Biosci. 8, 275–82 (1992).

719 58. Tamura, K. et al. MEGA5: Molecular Evolutionary Genetics Analysis using maximum

720 likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol.

721 28, 2731–2739 (2011).

722 59. Navarro-Mendoza, M. I. et al. Components of a new gene family of ferroxidases

723 involved in virulence are functionally specialized in fungal dimorphism. Sci. Rep. 8,

724 7660 (2018).

725 60. Nicolás, F. E. et al. Molecular tools for carotenogenesis analysis in the mucoral Mucor

726 circinelloides. Methods in Molecular Biology 1852, (2018).

727 61. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler

728 transform. Bioinformatics 25, 1754–1760 (2009).

729 62. Dobin, A. & Gingeras, T. R. Mapping RNA-seq Reads with STAR. in Current Protocols

730 in Bioinformatics 11.14.1-11.14.19 (John Wiley & Sons, Inc., 2015).

731 doi:10.1002/0471250953.bi1114s51

732 63. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data

733 analysis. Nucleic Acids Res. 44, W160–W165 (2016).

734 64. Zhang, Y. et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 9, R137

735 (2008).

736 65. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).

30

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

737 66. Bailey, T. L. et al. MEME SUITE: tools for motif discovery and searching. Nucleic

738 Acids Res. 37, W202–W208 (2009).

739 67. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing

740 genomic features. Bioinformatics 26, 841–842 (2010).

741 68. Hahne, F. & Ivanek, R. Visualizing Genomic Data Using Gviz and Bioconductor. in

742 335–351 (2016). doi:10.1007/978-1-4939-3578-9_16

743 69. Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in

744 large genomes. Bioinformatics 21, i351–i358 (2005).

745 70. Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version

746 7: Improvements in Performance and Usability. Mol. Biol. Evol. 30, 772–780 (2013).

747 71. Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-

748 generation sequencing data. Bioinformatics 28, 3150–3152 (2012).

749 72. Rice, P., Longden, I. & Bleasby, A. EMBOSS: the European Molecular Biology Open

750 Software Suite. Trends Genet. 16, 276–7 (2000).

751 73. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements

752 in eukaryotic genomes. Mob. DNA 6, 11 (2015).

753 Acknowledgments

754 This work was supported by the Ministerio de Economía y Competitividad, Spain (grant

755 number BFU2015-65501-P, cofinanced by FEDER, and RYC-2014-15844) and Ministerio de

756 Ciencia, Innovación y Universidades, Spain (grant number PGC2018-097452-B-I00,

757 cofinanced by FEDER). CPA and MINM were supported by predoctoral fellowships from the

758 Ministerio de Educación, Cultura y Deporte, Spain (grants number FPU14/01983 and

759 FPU14/01832 respectively). We acknowledge financial support from Tata Innovation

760 Fellowship, Department of Biotechnology, Government of India (BT/HRD/35/01/03/2017),

761 DBT-BUILDER programme support at JNCASR (BT/INF22/SP27679/2018) and intramural

31

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

762 support from JNCASR to KS. SP is supported by DST-WOS-A postdoctoral fellowship

763 (SR/WOS-A/LS-207/2018). Supported in part by NIH/NIAID R37 grant AI39115-21 and R01

764 grant AI50113-15 to J.H., and by the CIFAR program Fungal Kingdom: Threats &

765 Opportunities, for which J.H. serves as Co-Director and Fellow. We thank B. Suma at the

766 confocal imaging facility of the Molecular Biology and Genetics Unit, JNCASR for assistance

767 in imaging. Work conducted by the US Department of Energy Joint Genome Institute, a DOE

768 Office of Science User Facility, was supported by the Office of Science of the US Department

769 of Energy under Contract No. DE-AC02-05CH11231. We thank Alexander Idnurm (University

770 of Melbourne, Australia), Luis M. Corrochano (University of Seville, Spain), Timothy Y.

771 James (University of Michigan, USA), Mathew E. Smith (University of Florida, USA), Joseph

772 W. Spatafora (Oregon State University, USA), Jason E. Stajich (University of California-

773 Riverside, USA), and Gregory Bonito (Michigan State University, USA) for sharing fungal

774 genome sequences to analyze the distribution of kinetochore proteins. We also thank Thomas

775 D. Petes (Duke University School of Medicine, USA) and Beth A. Sullivan (Duke University

776 School of Medicine, USA) for critical reading of the manuscript and the members of KS lab,

777 VG lab, and JH lab for valuable discussions during bi-weekly Skype meetings.

778 Author contributions

779 KS, JH, VG, and FEN conceived and supervised the study. MINM and CPA analyzed the

780 distribution of kinetochore proteins in fungi, generated and validated the plasmids and strains,

781 performed the ChIP-seq experiments, identified and defined the retrotransposable elements,

782 and analyzed the mRNA and sRNA-seq data. MINM, CPA and PG analyzed the ChIP-seq data.

783 MINM, CPA, PG and SP characterized the centromeric loci. SP, MINM and CPA did the

784 fluorescence microscopy imaging. SP performed the ChIP-qPCR experiments. IVG, JP and

785 SJM provided the genome assembly and gene annotation. MINM, CPA and SP designed the

32

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

786 figures and wrote the manuscript with the support from KS, JH, and VG. VG, JH, FEN and KS

787 provided funding. All authors revised the manuscript and approved the final version.

788 Figure legends

789 Fig. 1. Distribution of the kinetochore complex across the Mucoromycotina subphylum.

790 Concise kinetochore schematic showing the most conserved protein complexes in eukaryotes

791 is shown on the left corresponding to the kinetochore proteins analyzed in the matrix on the

792 right. The matrix displays the presence or absence of 26 kinetochore proteins across a

793 cladogram of 51 fungal lineages (top) to show the relationships among them. Every fungal

794 phylum is represented and color-coded, differentiating the Mucoromycota into the

795 Glomeromycotina, Mortierellomycotina, and Mucoromycotina subphyla to provide a special

796 emphasis on the latter. Arrows at divergence events in the cladogram mark the hypothetical

797 loss of proteins CENP-A (red) and CENP-C (blue) in these clades.

798 Fig. 2. Inner and outer kinetochore proteins colocalize in M. circinelloides nuclei in the

799 absence of CENP-A and CENP-C. (a) Graphical representation of Mis12, Dsn1, and CENP-

800 T sequence features showing individual Pfam domains PF05859, PF08202, PF15511

801 respectively identified in M. circinelloides. (b) Schematic to represent C-terminal tagging of

802 histone H3 and kinetochore proteins with mCherry (red circles), eGFP (green circles). (c-f)

803 Confocal microscope imaging showing the cellular localization of well-conserved mCherry-

804 tagged outer kinetochore proteins Mis12 (c, e) and Dsn1 (d, f), eGFP-tagged histone H3 (c, d),

805 and eGFP-tagged inner kinetochore CENP-T (e, f) in pre-germinated spores of M.

806 circinelloides strains expressing fluorescent fusion proteins. A calibrated scale (white bar) is

807 provided for size comparison (5 µm). (f) Time-lapsed confocal image displaying the cellular

808 localization of mCherry-tagged Mis12 and eGFP-tagged histone H3 in a germinative tube

809 sprouting from a spore. A discontinuous white perimeter outlines the germinative tube in the

33

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

810 fluorescence images, and a yellow circle indicates a nuclear division event. Cartoon for the

811 zoomed image represents the localization and signal intensity of Mis12-mCherry in the

812 nucleus.

813 Fig. 3. The nine centromeres of M. circinelloides are short, AT-rich, and harbor a

814 conserved DNA motif. (a) Putative kinetochore-binding regions showing color-coded

815 enrichment of immunoprecipitated DNA (IP DNA) from Mis12 and Dsn1 mCherry-tagged

816 strains compared to their corresponding input (Input DNA) and binding controls (Beads only).

817 1 kb flanking sequences from the center of the enrichment peak are shown. Major (M) and

818 minor (m) kinetochore-binding regions are indicated. (b) Scaled ideogram of the nine scaffolds

819 (white) of M. circinelloides genome assembly that contain a kinetochore-binding region (blue),

820 showing the telomeric repeats (red). (c) Size distribution of the nine core centromere (CC)

821 regions. (d) GC content across the nine core centromere (CC) sequences. The median (red line)

822 and standard deviation (black lines) are shown in c and d. (e) 41 bp centromere-specific DNA

823 motif conserved in all nine CC regions and absent in the remainder of the genome. (f) A

824 genomic view of CEN11 illustrating all nine core centromere regions. Kinetochore-binding

825 region enrichment (IP coverage) as the average of both immunoprecipitation signals minus

826 Input and beads only controls (left axis, blue), GC content across the region (right axis, black),

827 and the centromere-specific DNA motif described in e (red) are shown.

828 Fig. 4. M. circinelloides core centromeres are present in large ORF-free pericentric

829 regions having retrotransposable elements regulated by the Dicer-dependent RNAi

830 pathway and bind H3 nucleosomes. (a) Genomic sequences lacking annotated genes (light

831 blue) that flank the CC regions (dark blue) served as the reference points to calculate the length

832 both upstream and downstream. (b) Schematic of the Grem-LINE1 interspersed in the

833 pericentromeric regions. Open Reading Frames (ORF) and protein domains predicted from

834 their coding sequences are shown as colored boxes [Zf, zinc finger (PF00098 and PF16588);

34

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

835 AP, AP endonuclease (PTHR22748); RVT, reverse transcriptase (PF00078); and zf-RVT,

836 zinc-binding in reverse transcriptase (PF13966)]. (c) CEN4 is exemplified by low GC content,

837 lack of genes and transcripts, which are shown in different tracks displaying the kinetochore-

838 binding region enrichment (IP, an average of both IP DNA signals minus Input and beads only

839 controls), annotation of genes (red blocks) and transposable elements (light blue blocks), CEN-

840 specific DNA motif position (vertical line) and direction (arrow), GC content, and

841 transcriptomic data of mRNA (green) and sRNAs (red) in M. circinelloides wild-type, ago1,

842 and double dcl1 dcl2 deletion mutant strains after 48 h of growth in rich media. (d) Genomic

843 location of CEN2 showing the regions studied for histone H3 occupancy as labelled and colored

844 rectangles. ChIP assays were performed using polyclonal antibodies against histone H3.

845 Primers were designed for CC2 region, pericentric regions (1L, 2L, 1R, 2R), flanking ORFs

846 (ORF-L, ORF-R) and a far-CEN control ORF ~2 Mb away from CEN2 (Far-CEN). IP samples

847 were analyzed by real-time PCR using these primers. The y-axis denotes the qPCR value as a

848 percentage of the total chromatin input with standard error mean (SEM), from each region

849 tested. The experiment was repeated three times with similar results.

850 Fig. 5. M. circinelloides possesses a mosaic of point and regional centromeres. A

851 comparative analysis of centromere features of S. cerevisiae (ascomycete), M. circinelloides

852 (mucoromycete), and Cryptococcus neoformans (basiodiomycete). M. circinelloides

853 centromeres are a mosaic with features of point centromeres like shorter kinetochore-bound

854 region (core centromere) and a conserved motif (red dots), combined with features of regional

855 centromeres like large ORF-free pericentromeres dotted with transposable elements that are

856 suppressed by the RNAi machinery.

857 Supplementary Fig. 1. Mucorales and Umbelopsidales lack CENP-A. (a) Schematic of

858 canonical histone H3 and CENP-A shared protein features. (b-d) Multiple protein sequence

859 alignments of the N-tail (b), Loop 1 (c), and C-terminal (d) regions. The scale above the

35

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

860 comparisons indicates the amino acid positions with S. pombe sequence as the reference. Red

861 arrows mark relevant amino acid positions. Amino acids are colored in increasing shades of

862 blue to show conservation within each group. (e) Neighbor-joining phylogenetic tree (JTT

863 model) of the Histone Folding Domain (HFD), showing phylogenetic distance (branch length)

864 and branch support (1000 bootstraps). Branches with < 50% bootstrap support are collapsed.

865 The protein sequences analyzed in b to e are distributed among three groups: well-studied

866 histone H3 (yellow), rare histone H3 (blue), and the predicted CENP-A proteins in this study

867 (red). Asterisks (*) indicate proteins from the Mucoromycotina.

868 Supplementary Fig. 2. M. circinelloides histones H3 and H4 are not centromere-specific

869 binding proteins. (a) Location of all the histone H2A, H2B, H3, and H4-coding genes in the

870 M. circinelloides genome. Genes are represented by arrows showing transcription direction;

871 the coding sequence is depicted as larger red blocks, flanked by smaller blue blocks

872 representing the untranslated regions, and connected by black lines as intronic sequences.

873 Light-gray arrows indicate neighboring non-histone genes and their transcription direction. (b,

874 c) Protein alignment of several well-characterized H3 (b) and H4 (c) histones with M.

875 circinelloides orthologs. A scale indicates the amino acid positions taking S. pombe sequence

876 as the reference. Histone folding domains (HFD) are outlined in a diagram below each

877 alignment, as well as the N-terminal tail. Amino acids are colored in increasing shades of blue

878 and a consensus protein logo is provided to reflect conservation. (d) Confocal microscopy

879 images of M. circinelloides strains expressing eGFP-fluorescent fusion histone proteins Hht4,

880 Hhf1, and Hhf3 in 4-hour pregerminated spores. The fluorescent signal is colored as green. A

881 calibrated scale (white bar) is provided for size comparison (5 µm).

882 Supplementary Fig. 3. Generation of strains expressing fluorescent fusion histones.

883 Diagram of the fluorescent fusion alleles used for integration by homologous recombination

884 (shown as discontinuous crosses) into the wild-type allele of each histone loci: hht4 (a), hhf1 (b),

36

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

885 and hhf3 (c). Arrows in the diagrams mark the annealing regions for specific primers used to

886 confirm the integration by PCR, amplifying both the wild-type (WT) and mutant (MUT)

887 alleles. Gel images below the diagrams show the size-specific PCR fragments. M lanes were

888 loaded with the GeneRuler DNA Ladder Mix (Thermo Fisher) to estimate the fragment sizes.

889 Green regions indicate mCherry and eGFP genes.

890 Supplementary Fig. 4. Generation of mutant strains expressing fluorescent kinetochore

891 proteins. Diagram (above) and agarose gel images (below) showing the integration of the cnpT

892 fusion allele into its wild-type locus (a); and the mis12 (b) and dsn1 (c) fusion alleles (both N-

893 terminal and C-terminal fusions) into the carRP locus. Discontinuous crosses indicate

894 homologous recombination and arrows mark the annealing regions for specific primers used to

895 confirm the integration by PCR, amplifying both the wild-type (WT) and mutant (MUT) alleles

896 shown in the gel images. M lanes were loaded with the GeneRuler DNA Ladder Mix (Thermo

897 Fisher) to estimate the fragment sizes. Red and green regions indicate mCherry and eGFP

898 genes, respectively

899 Supplementary Fig. 5. M. circinelloides has nine centromeres. IP enrichment (coverage x1)

900 of immunoprecipitated DNA (IP DNA) from Mis12 and Dsn1 mCherry-tagged strains

901 compared to their corresponding input (Input DNA) and binding controls (beads only DNA),

902 shown as color-coded tracks across the whole sequence of scaffolds 01-11. Asterisks (*) mark

903 significant peaks in both IP DNA samples that are not present in the controls, indicating the

904 putative centromeric regions.

905 Supplementary Fig. 6. M. circinelloides pericentromeric regions. A genomic view of the

906 pericentromeric regions of all nine centromeres. Each region shows the kinetochore-binding

907 region enrichment (IP, an average of both IP signals minus Input and Beads only controls),

908 annotated genes (red blocks) and transposable elements (light blue blocks), CEN-specific DNA

37

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

909 motif position (black vertical line) and its direction (arrow), GC content, and transcriptomic

910 data of mRNA (green) and sRNAs (red) in M. circinelloides wild-type strain, and ago1 and

911 double dcl1 dcl2 deletion mutants after 48 h of growth in rich media.

912 Supplementary Fig. 7. Centromere-specific non-LTR L1-like retrotransposable elements.

913 (a) Matrix showing pairwise p-distance (% changes per site) across all 44 repetitive elements

914 flanking the centromeres of M. circinelloides. Elements sharing ≥ 95.0% identity are clustered

915 together, generating 10 groups that comprise full-length elements and incomplete sequences

916 (remnants). (b) Neighbor-joining phylogeny (JTT model) of the RVT domain of well-known

917 non-LTR elements. The RVT domain of the centromere-specific element (Grem LINE1)

918 identified in this study is marked inside a blue box. Phylogenetic distance (branch length) and

919 branch support (1000 bootstraps, size-coded circles) are shown. Leaves within the same clade

920 of non-LTR retrotransposons are collapsed, forming a scalene triangle (top and bottom corners

921 show the shortest and longest phylogenetic distance, respectively); except for the L1 clade

922 which contains the Grem LINE-1.

38

bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 1

CENP-A CENP-C

Microtubules sp. rouxii linderi butleri circina crassa fusiger repens padenii trispora poitrasii mucedo africana A. nidulans F. F. anomala G. G. B. cordense allomycis isabellina S. N. A. pombe S. mexicana parasitica umbellata umbellata persicaria coronatus irregularis U. maydis U. verticillata C. C. B. echinulata T. T. elegans B. recurvatus spectabilis vesiculosa H. elegans H. elegans D. H. radiatus H. M. M. M. albicans C. A. P. corymbifera K. R. R. U. U. Z. macrogynus microsporus P. S. punctatus S. S. C. C. C. C. R. G. P.articulosus cerevisiaeS. circinelloides cucurbitarum C. C. heterogamus M. M. P.umbonatus E. intestinalis E. C. C. R. R. H. flammicorona Endogone L. L. blakesleeanus C. neoformans C. A. R. R. Z. Z. C. C. J. M. M. P.

Ndc80 NDC Nuf2 80 Spc25 Spc24 Zwint-1 Knl1

Outer Nsl1 MIS12 Dsn1 Kinetochore Mis12 Nnf1 CENP-T T-W CENP-W S-X CENP-S CENP-X CENP-H H-I-K-M CENP-I CENP-K CENP-M

Inner L-N CENP-L O-P CENP-N

Kinetochore Q-U CENP-O CENP-P C CENP-Q CENP-U CENP-C CENP-A

Phylum Subphylum Order Centromere Microsporidia Zoopagomycota Glomeromycotina

Cryptomycota Mucoromycota Mortierellomycotina Endogonales

Blastocladiomycota Ascomycota Mucoromycotina Umbelopsidales

Chytridiomycota Basidiomycota Mucorales bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 2

a Protein ID: 1432987 DIC Mis12-mCherry H3-eGFP Merged c Mis12Mis12 domain domain 1 25 153 246

Protein ID: 1400423

DIC Dsn1-mCherry H3-eGFP Merged Dsn1Dsn1 domain domain d 1 103 295 397

Protein ID: 1361249

CENP-T domain 1 403 534 DIC CENP-T-eGFP Mis12-mCherry Merged e b hht4 Phht4

mis12 Pzrt1 DIC CENP-T-eGFP Dsn1-mCherry Merged f dsn1 Pzrt1

cnpT PcnpT

DIC Mis12-mCherry g H3-eGFP Merged

t=0

t=1 min

t=3 min

t=5 min

t=6 min

t=7 min

t=8 min bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 3

a b

1 CC1 CC2 CC3 2 3 40 Mis12 IP M 4 Dsn1 IP 5 30 Beads only M 6 Mis12 Input CEN Scaffold 8 20 Dsn1 Input 9 TEL m M m 11 10 m 0 1.0 2.0 3.0 4.0 5.0 6.0 Coverage (1x) Coverage 0 Length (Mb) -1.0 peak 1.0Kb -1.0 peak 1.0Kb -1.0 peak 1.0Kb

CC4 CC5 CC6 c d 100 40

30 M M M

20 s 50 m m CC 10 m

Coverage (1x) Coverage 0 0 500 1000 1500

-1.0 peak 1.0Kb -1.0 peak 1.0Kb -1.0 peak 1.0Kb (%) content GC CC region size (bp) 0 CC8 CC9 CC11 CCs 40 M

(1x) M e 30 M 20 m m 10 m

Coverage 0 -1.0 peak 1.0Kb -1.0 peak 1.0Kb -1.0 peak 1.0Kb f Core centromere region 30 0.60

M

m GC content GC

IP Coverage IPCoverage (1x) 0 0.00 CEN motif

884.0 884.5 885.0 885.5 886.0 886.5 887.0 CC11 (Kb) bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 4

a b

CEN1 CC CEN2 Pericentric region CEN3 CEN4 ORF1 ORF2 CEN5 AP RVT Poly(A) An CEN6 ZFs ZF-RVT CEN8 CEN9 CEN11

-40 -30 -20 -10 0 10 20 30 40 50 60 Pericentric region length (kb) c

CEN4

49kb 25 IP 0 CEN motif Annotation

GC content

mRNA WT sRNA

mRNA Ago- (ago1Δ) sRNA

mRNA Dicer- (dcl1Δ dcl2Δ) sRNA

scaffold_4

d

25 0.25 IP

0 0.20

CEN motif 0.15 Annotation 0.10 % Input% Regions 0.05 analyzed ORF-L 1L 2L CC2 1R 2R ORF-R 0.00 4,690 4,695 4,700 4,705 4,710 4,715 4,720 4,725 4,730 4,735 kb scaffold_2 bioRxiv preprint doi: https://doi.org/10.1101/706580; this version posted July 18, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure 5

Regional CEN

C. neoformans

Mosaic CEN

Point CEN M. circinelloides

S. cerevisiae

Core CEN-specific DNA sequence motif Pericentric transposon-rich region