bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 The DNA-binding protein HTa from Thermoplasma acidophilum is an archaeal 2 histone analog 3 4 Antoine Hocher1,2*, Maria Rojec1,2, Jacob B. Swadling1,2, Alexander Esin1,2, Tobias 5 Warnecke1,2* 6 7 1MRC London Institute of Medical Sciences (LMS), Du Cane Road, London W12 8 0NN, United Kingdom 9 2Institute of Clinical Sciences (ICS), Faculty of Medicine, Imperial College London, 10 Du Cane Road, London W12 0NN, United Kingdom 11 12 Running title: Chromatin architecture in Thermoplasma acidophilum 13 14 *corresponding author 15 16 AH: [email protected] 17 TW: [email protected]

1 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

18 Abstract 19 20 Histones are a principal constituent of chromatin in eukaryotes and fundamental to 21 our understanding of eukaryotic gene regulation. In , histones are 22 phylogenetically widespread, often highly abundant, but not universal: several 23 archaeal lineages have lost histone genes from their coding repertoire. What prompted 24 or facilitated these losses and how archaea without histones organize their chromatin 25 remains largely unknown. Here, we use micrococcal nuclease digestion followed by 26 high-throughput sequencing (MNase-Seq) to elucidate primary chromatin architecture 27 in an archaeon without histones, the acido-thermophilic archaeon Thermoplasma 28 acidophilum. We confirm and extend prior results showing that T. acidophilum 29 harbours a HU family protein, HTa, that is highly expressed and protects a sizeable 30 fraction of the genome from MNase digestion. Charting HTa-based chromatin 31 architecture across the growth cycle and comparing it to that of three histone- 32 encoding archaea ( fervidus, Thermococcus kodakarensis and 33 Haloferax volcanii), we then present evidence that HTa is an archaeal histone analog. 34 HTa-protected fragments are GC-rich, display histone-like mono- and dinucleotide 35 patterns around a conspicuous dyad, exhibit relatively invariant positioning 36 throughout the growth cycle, and show archaeal histone-like oligomerization 37 dynamics. Our results suggest that HTa, a DNA-binding protein of bacterial origin, 38 has converged onto an architectural role filled by histones in other archaea. 39

2 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

40 Introduction 41 42 Across all domains of life, DNA is intimately associated with proteins that wrap, 43 package, and protect it. Bacteria typically encode multiple small basic proteins that 44 are dynamically expressed and fulfil a variety of architectural roles by bridging, 45 wrapping, or bending DNA (Dillon and Dorman 2010). Some of these proteins are 46 phylogenetically widespread, others restricted to specific lineages (Swiercz et al. 47 2013; Lagomarsino et al. 2015). Bacterial chromatin, on the whole, is diverse and 48 dynamic over both physiological and evolutionary timescales. In contrast, a single 49 group of proteins has come to dominate chromatin in eukaryotes: histones. 50 Assembling into octameric complexes that wrap ~147bp of DNA, eukaryotic histones 51 not only mediate genome compaction but also establish a basal landscape of 52 differential accessibility, elaborated via a plethora of post-translational modifications, 53 that is fundamental to our understanding of eukaryotic gene regulation (Jiang and 54 Pugh 2009; Bai and Morozov 2010). 55 56 Histones are also widespread in archaea (Adam et al. 2017; Henneman et al. 2018). 57 They have the same core fold (Decanniere et al. 2000; Mattiroli et al. 2017) as 58 eukaryotic histones, but lack N- and typically also C-terminal tails, the principal 59 substrates for post-translational modifications in eukaryotes (Henneman et al. 2018). 60 Dimers in solution, they assemble into tetramers that wrap ~60bp of DNA (Bailey et 61 al. 1999). This minimal nucleosomal unit can be extended, at least in some archaea, 62 into a longer oligomer via incorporation of additional dimers (Xie and Reeve 2004; 63 Maruyama et al. 2013; Mattiroli et al. 2017). Like their eukaryotic counterparts, 64 archaeal nucleosomes preferentially bind sequences that, for example by means of 65 periodically spaced GC/GG/AA/TT dinucleotides, facilitate wrapping (Bailey and 66 Reeve 1999; Bailey et al. 2000). On average, nucleosome occupancy is higher on 67 more GC-rich sequences and lower around transcriptional start and end sites (Ammar 68 et al. 2011; Nalabothula et al. 2013), which tend to be relatively AT-rich. The precise 69 role of archaeal histones in transcription regulation, however, remains poorly 70 understood (Gehring et al. 2016). 71

3 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

72 Although widespread, archaeal histones are less phylogenetically entrenched than 73 histones in eukaryotes: they have been deleted experimentally in several species 74 without dramatic effects on transcription and growth (Heinicke et al. 2004; 75 Weidenbach et al. 2008; Dulmage et al. 2015) and lost altogether from a handful of 76 archaeal lineages (Adam et al. 2017). A particularly intriguing case concerns the 77 thermophilic acidophile Thermoplasma acidophilum, which lacks histone genes but 78 instead encodes a protein named HTa (Histone-like protein of Thermoplasma 79 acidophilum). Based on its primary sequence (DeLange et al. 1981; Drlica and 80 Rouvière-Yaniv 1987) and predicted secondary, tertiary and quaternary structure 81 (Figure 1a-b), HTa is a member of the HU family of proteins, which are broadly 82 distributed across bacterial phyla (Table S1) and often abundant constituents of 83 bacterial chromatin. This includes Escherichia coli (Figure 1c), where the average cell 84 during exponential growth contains an estimated 30,000-55,000 HU molecules (Azam 85 et al. 1999). Although individual members of the HU family have diverged in their 86 DNA binding properties, even distant homologs display functional similarities, 87 constraining negative supercoils and binding not only B-form but also structurally 88 unusual DNA such as cruciforms (Grove 2011). Outside bacteria, HU proteins are 89 relatively rare. They are found with a modicum of phylogenetic persistence only in 90 some single-celled eukaryotes (Figure 2a , Supplementary Text), where they are 91 known to play important functional roles (Sasaki et al. 2009; Gornik et al. 2012), and 92 in a single clade of archaea: the Thermoplasmatales/deep-sea hydrothermal vent 93 (DHVE2 group). Phylogenetic reconstruction of the HTa/HU gene 94 family suggests that HTa was acquired via horizontal gene transfer from bacteria at 95 the root of this clade. However, HTa has sufficiently diverged from its bacterial 96 homologs – and the event is sufficiently ancient – to prevent facile identification of a 97 specific bacterial donor clade (Figure 2, Supplementary Text). 98 99 In T. acidophilum, HTa is highly abundant (Figure 1c) and protects ~25-35% of the 100 genome from micrococcal nuclease (MNase) digestion (Searcy and Stein 1980; 101 Thomm et al. 1982), consistent with a global role in structuring T. acidophilum 102 chromatin. Pioneering work by Searcy and colleagues showed that MNase digestion 103 of native T. acidophilum chromatin yields two distinct fragment sizes of ~40bp and 104 ~80bp (Searcy and Stein 1980). The same authors observed an indistinguishable

4 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

105 banding pattern when they digested calf thymus DNA following in vitro 106 reconstitution with purified HTa, suggesting that native HTa is sufficient for and 107 likely the principal mediator of protection from MNase digestion in T. acidophilum 108 (Searcy and Stein 1980). It remains unknown, however, where HTa binds to the T. 109 acidophilum genome in vivo; whether HTa binds in a sequence-specific manner or 110 promiscuously; whether it requires particular post-translational modifications to carry 111 out its functions; whether binding is dynamic in response to environmental changes; 112 and how binding relates to functional genomic landmarks. Crucially, we do not know 113 how HTa-mediated chromatin organization in T. acidophilum compares to that in 114 histone-encoding archaea: do HTa and histones fill similar functional and 115 architectural niches? Or are their binding patterns and functional repercussions 116 entirely distinct? 117 118 Here, to begin to address these questions, we characterize genome-wide chromatin 119 organization mediated by HTa in T. acidophilum. Using MNase treatment coupled to 120 high-throughput sequencing, we find footprints of protection throughout the genome. 121 Confirming prior results (Searcy and Stein 1980), we observe a bimodal distribution 122 of protected fragment sizes and subsequently infer small and large binding footprints. 123 The more common smaller footprints are well predicted by simple sequence features 124 and display a general preference for GC-rich sequences. Their sequence preferences, 125 positioning around transcription start sites, and static behavior throughout the growth 126 cycle are reminiscent of archaeal histones rather than well-characterized bacterial HU 127 homologs. In addition, we present evidence that larger fragments are frequently 128 derived from nucleation-extension events, similar to what has been observed for 129 archaeal histones that have the capacity to oligomerize (Maruyama et al. 2013; 130 Nalabothula et al. 2013; Mattiroli et al. 2017). Our results suggest that, in key aspects 131 of its molecular behaviour, HTa can be regarded as an archaeal histone analog. 132 133 134 Results 135 136 Primary chromatin structure across the T. acidophilum growth cycle 137

5 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

138 To elucidate genome-wide HTa binding in vivo, we carried out a series of MNase 139 experiments across the T. acidophilum growth cycle. Throughout, MNase digestion 140 yielded protected fragments of two distinct sizes (Figure 3a), in line with previous 141 results (Searcy and Stein 1980). MNase digestion of E. coli cells expressing 142 recombinant HTa produced a very similar protection pattern (Figure 3b). This 143 demonstrates that HTa readily binds DNA outside its native cellular environment and 144 does not require T. acidophilum-specific post-translational modifications or binding 145 partners to protect from MNase digestion, consistent with previous in vitro 146 reconstitution experiments (Searcy and Stein 1980). In contrast, native E. coli HU 147 (HupA), even when strongly over-expressed from the same plasmid, does not confer 148 significant protection under equivalent digestion conditions (Figure 3b). Neither do 149 HU orthologs from Thermobacillus composti and Lactobacillus floricola (Figure 150 S1a), which have higher sequence identity to HTa (37% and 39% compared to 27% 151 for E. coli HupA). HTa is therefore unusual, although perhaps not unique (Ghosh and 152 Grove 2004; Mukherjee et al. 2008) among HU family proteins in its capacity to 153 protect DNA from MNase attack. Interestingly, the MNase profile of HTa-expressing 154 E. coli (see Figure S1a in particular) suggests protection of additional, longer 155 fragments, not as readily apparent in the T. acidophilum digest (but see Figure 3c), an 156 observation to which we will return below. 157 158 Next, we sequenced the T. acidophilum DNA fragments that survived MNase 159 treatment using Illumina paired-end technology (see Methods). As anticipated, two 160 major fragment size classes are present across all stages of the growth cycle (Figure 161 3c, Figure S1b). For downstream analysis, we define small (large) fragments as 40- 162 65bp (70-100bp) in size and note the following: first, the twin peaks centred around 163 ~85bp (Figure 3c), separated by approximately one helical turn, were evident across 164 biological replicates (Figure S1b). At present, we do not know whether this reflects 165 discontinuous unwrapping and digest or the presence of distinct binding species. 166 However, genome-wide occupancy of 70-80bp and 80-90bp fragments is highly 167 correlated (Spearman’s r=0.76, P< 2e-16, Figure 3d) and we therefore consider 70- 168 80bp and 80-90bp fragments jointly. Second, modal fragment sizes (~50bp and 169 ~85bp) are slightly larger than those reported previously (~40bp and ~80bp) (Searcy 170 and Stein 1980). At least in part, this reflects digestion conditions: fragments obtained

6 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

171 after doubling digestion time from 15 to 30 minutes are shorter and map inside larger 172 footprints (Figure S2). We chose and persisted with a somewhat milder digest here to 173 avoid over-digestion of small fragments. 174 175 We then mapped fragments, irrespective of their size, to the T. acidophilum genome 176 (see Methods). Protection across the genome is both ubiquitous and heterogeneous, 177 with relatively even coverage along the origin-terminus axis once increased copy 178 number of early replicating regions is taken into account (see Methods, Figure 3e-f). 179 Major drop-offs in coverage correspond to areas of low GC content, as evident from 180 Figure 3e, formally shown in Figure S3, and further explored below. For any given 181 growth phase, genome-wide occupancy is highly correlated across replicates (mean 182 r=0.89, P<2.2e-16 for all pairwise comparisons). For each time point, we therefore 183 merged reads across replicates and called peaks independently for small and large 184 fragments (see Methods). Globally, peak locations and occupancy vary little across 185 the growth cycle (Figure 3g). Below, we will focus on data from exponential phase 186 (day 2), where we observe 13,915 narrow and 6,887 broad peaks, before discussing 187 variability of HTa-mediated chromatin architecture across the growth cycle in the 188 context of transcriptional changes. 189 190 Analysis of HTa binding footprints suggests histone-like oligomerization behaviour 191 192 Considering read coverage around called peaks, it is evident that small and large 193 fragments often overlap (Figure 3f, Figure 4c,f), indicating the presence of different 194 protective entities in the same location across cells. Importantly, broad peaks are 195 typically a combination of small and large fragments and often show asymmetric 196 coverage (caused by smaller fragments) around the summit of the inferred peak (see 197 Figure 4a for an example). Given that some archaeal histones (Maruyama et al. 2013; 198 Mattiroli et al. 2017) and bacterial HU proteins (Hammel et al. 2016; Hołówka et al. 199 2017) can form oligomers, we reasoned that asymmetric coverage might contain 200 valuable information about the potential genesis of larger fragments from smaller 201 nucleation sites. To retain this signal, lost when averaging over individual peaks in 202 aggregate plots, we re-oriented the coverage signal as displayed in Figure 4b,

7 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

203 revealing that smaller fragments are aligned to the edge – rather than the centre – of 204 broad-peak footprints (Figure 4d,g). 205 206 We then applied the same procedure to MNase data we had independently generated 207 for the histone-containing archaeon Methanothermus fervidus (see Methods), which – 208 thanks to the work of Reeve, Sandman and others – is the source of much of our 209 foundational knowledge about archaeal histones. Comparing M. fervidus to T. 210 acidophilum, we find a very similar nested, edge-aligned structure of smaller 211 fragments within broader peaks (Figure 4e,h). In M. fervidus, this pattern is consistent 212 with oligomerization, in dimer steps, from the minimal histone tetramer (Maruyama et 213 al. 2013; Mattiroli et al. 2017). Whether the pattern in T. acidophilum similarly 214 reflects direct physical contact or is caused by closely adjacent binding of 215 independent HTa complexes remains to be established. We note that our approach to 216 retaining asymmetric coverage signals might have broader utility in characterizing 217 oligomerization behaviour of DNA-binding proteins from MNase or similar high- 218 resolution foot-printing data. 219 220 221 HTa exhibits histone-like sequence preferences 222 223 Next, we asked what factors govern HTa binding in general and presumed nucleation- 224 extension dynamics in particular. As the MNase signal broadly tracks GC content 225 (Figure 3e), we first considered nucleotide enrichment patterns associated with peaks. 226 At a coarse level, we find that both broad and narrow peaks exhibit relatively elevated 227 GC content at their centre and are flanked by short stretches of GC depletion (Figure 228 4i,j). In line with promiscuous binding, a specific binding motif, as one would observe 229 for most transcription factors, is not evident (Figure S4a). For broader peaks, it is 230 worth noting that, in both T. acidophilum and M. fervidus, once we have re-oriented 231 the small fragment coverage signal as described above, peak-internal GC content 232 positively tracks small fragment abundance (Figure 4j,k), supporting a model where 233 nucleation happens on more GC-rich sites whereas sequence need not be as 234 favourable for subsequent extension events. Note, however, that while sequence 235 might be more important for nucleation than extension, it is by no means irrelevant

8 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

236 for the latter: for example, narrow peaks where large fragments are rare tend to be 237 flanked by more AT-rich sequence in both T. acidophilum and M. fervidus, suggesting 238 that sequence can prevent or at least predispose against oligomer formation (Figure 239 S4b,c). 240 241 We then considered dinucleotide frequencies in reads of defined lengths, restricting 242 the analysis to reads that overlap previously defined peaks by at least 90% and 243 discarding duplicate reads that mapped to the same genomic location, in order to not 244 bias results towards highly occupied footprints. Read-internal dinucleotide profiles 245 confirm an overall GC preference but also reveal local enrichment/depletion patterns, 246 notably a short central track of reduced GC enrichment (Figure 5a) in T. acidophilum 247 as well as histone-encoding archaea, which is symmetric around the HTa/nucleosome 248 dyad. Reassuringly, local enrichment patterns get weaker and eventually disappear 249 when considering reads increasingly further away from modal fragment lengths 250 (Figure S5). Unexpectedly, we also find mononucleotide and RR/YY (R=A or G; 251 Y=T or C) profiles similar to those previously observed for – and attributed to the 252 unique geometry of – eukaryotic histones (Reynolds et al. 2010; Ioshikhes et al. 2011; 253 Quintales et al. 2015). These profiles show strong counter-phasing and asymmetry 254 across the dyad (Figure 5b,c) and are particularly prominent in T. acidophilum and T. 255 kodakarensis. These observations suggest that symmetric and asymmetric nucleotide 256 enrichments across a dyad axis are not limited to nucleosomes and, furthermore, 257 imply that the HTa-DNA complex is symmetric. All three observations – a general 258 preference for GC-rich sequence, symmetric WW/SS and asymmetric 259 mononucleotide/RR/YY frequencies around the dyad axis – are strongly reminiscent 260 of archaeal as well as eukaryotic histones (Peckham et al. 2007; Kaplan et al. 2008; 261 Tillo and Hughes 2009; Ammar et al. 2011; Nalabothula et al. 2013). 262 263 To define nucleotide preferences more rigorously and enable prediction of relative 264 occupancy from underlying nucleotide features, as previously done for eukaryotic 265 histones (Tillo and Hughes 2009), we trained Lasso models on small and large 266 fragments separately (see Methods). The model confirms a general enrichment for 267 strong (S = G or C) over weak (W = A or T) nucleotides, with mono- and dinucleotide 268 frequencies the strongest individual predictors (Figure 5e). For small fragments,

9 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

269 predicted and observed occupancy are well correlated (r (day 2)=0.76; r(day 3.5)= 270 0.86, observed versus predicted in test set, P<2e-16) although the model fails to 271 capture extreme occupancy values (Figure 5d). The HTa occupancy signal we observe 272 is, in a further parallel to histones, almost certainly conflated by MNase digestion 273 bias, which is characterized by preferential degradation of AT-rich sequences (Allan 274 et al. 2012). As a consequence, it would be premature to draw quantitative 275 conclusions about the extent to which HTa prefers GC-rich sequences. However, not 276 only do we observe read-internal nucleotide enrichments, which are not expected to 277 arise as a consequence of digestion bias and have also been found in eukaryotes using 278 chemical mapping approaches (Brogaard et al. 2012), but we also readily recover AT- 279 rich protected fragments if they exist in vivo, as described in the following section. 280 This indicates that, as previously shown for eukaryotes (Allan et al. 2012), inferred 281 HTa/histone binding preferences are conflated with but not simply a mirage of MNase 282 cutting bias. 283 284 A hidden diversity of large fragments in exponential phase 285 286 Although the Lasso model does an admirable job of predicting small fragment 287 occupancy across the growth cycle, it curiously performs much worse when trying to 288 predict large fragment coverage in exponential phase (day 2, Figure 5e, r=0.43, P< 289 2.2e-16). We reasoned that differences between broad peaks in exponential and 290 stationary phase might be owing to changes in the relative abundance of qualitatively 291 different protective binding events: one, where large HTa-protected footprints emerge 292 from extension of smaller fragments, and another where these fragments represent 293 something else; a different mode of HTa binding that is independent of prior 294 sequence-driven nucleation, perhaps, or an altogether different protein complex that 295 happens to protect a similar-sized piece of DNA. To explore this hypothesis, we 296 divided broad peaks into deciles based on the relative coverage of small fragments at 297 these peak. Doing so, we find that broad peaks where small fragments are rarest show 298 strongly divergent sequence composition (Figure 6a). Curiously, while these 299 divergent peaks are very common in exponential phase, they almost entirely disappear 300 in stationary phase, which is instead dominated by broad peaks that conform to the

10 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

301 nucleation and extension model, where small fragment abundance quantitatively 302 predicts large fragment abundance (Figure 6b, Figure S6). 303 304 305 Landscape of HTa binding around transcriptional start sites 306 307 Broad peaks with few small fragments are particularly enriched in intergenic 308 sequence (Figure 7a), and it is around transcriptional start sites (TSSs, Figure 7b; 309 approximated from RNA-seq data, see Methods) and end sites (TESs, Figure S7), 310 where their disappearance in stationary phase is arguably most striking (Figure 7b). 311 However, even though their positioning suggests a potential involvement in gene 312 regulation, their disappearance is not obviously coupled to local changes in 313 transcription: the abundance of these “independent” fragments drops equally for 314 genes that are up- or down-regulated or remain the same in stationary compared to 315 exponential phase (Figure 7f). We find no equivalent to these nucleation-independent 316 large fragments in histone-encoding archaea (Figure 7c-e, Figure S7) and further 317 investigation will be required to establish the identity of the proteins that protect these 318 fragments from digestion. We note, however, that nucleoid-associated proteins in 319 bacteria, including HU and H-NS in E. coli, can bind different DNA substrates in 320 different conformations and therefore exert distinct effects on DNA topology and 321 chromatin architecture depending on local concentration, sequence context, and the 322 presence of other factors (Dillon and Dorman 2010). It is therefore tempting to 323 speculate that HTa has retained some of this HU-like versatility in DNA binding and 324 that independent fragments might also be derived from HTa protection. However, we 325 have no direct evidence for this at present. Small fragment-peaks and nucleation- 326 dependent large fragments, on the other hand, share substantial similarities with 327 archaeal histones in their positioning around genes: they are depleted at all times from 328 the AT-rich TSSs/TESs, and barely change position through the growth cycle (Figure 329 7, Figure S8). 330 331 332 Discussion 333

11 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

334 The evidence above suggests that HTa is a protein of bacterial origin that converged 335 onto supramolecular properties reminiscent of archaeal histones. Whereas well- 336 characterized HU homologs exhibit elevated occupancy in AT-rich regions, prefer an 337 AT-rich motif at the centre of the binding footprint, or bind sequence non-selectively 338 (Swinger and Rice 2007; Grove 2011; Prieto et al. 2012), HTa shows a more histone- 339 like preference for GC and even exhibits asymmetric mononucleotide/RR/YY profiles 340 across the dyad. Whether these shared nucleotide preferences are grounded in primary 341 sequence, DNA structural features determined by primary sequence, or a combination 342 of the two remains to be established. Sequence-driven positioning around functional 343 genomic landmarks, footprint extension dynamics, and relative positional stability 344 throughout the growth cycle are also reminiscent of archaeal histones, as is the ability 345 of HTa to protect DNA from MNase digestion. Based on these observations, we 346 propose that HTa can be regarded as an archaeal histone analog. 347 348 This analogy is, of course, preliminary. Structural work will be required to 349 characterize how HTa interacts with DNA and to determine whether large HTa- 350 protected fragments reflect closely spaced binding events or, in further analogy to 351 archaeal histones, the presence of contiguous HTa polymers. The observation that 352 even larger protected fragments can be formed when HTa is expressed in E. coli 353 (Figure S1a) is interesting in this regard and deserves further exploration. Additional 354 work is also required to elucidate the interaction partners and physiological functions 355 of HTa, including its involvement in transcription. Although we find no obvious 356 global link between changes in HTa occupancy and transcriptional output, this does 357 not preclude local effects or dynamics at a time-scale where information on nascent 358 transcription – rather than steady-state RNA levels – would be required to implicate 359 HTa. Neither do our results rule out an important but constitutive role in transcription, 360 for example in binding to and constraining negative supercoils similar to what has 361 been observed for other HU homologs (Berger et al. 2009; Grove 2011) and archaeal 362 chromatin proteins such as MC1, Sul7, and Cren7 (Zhang et al. 2012). Alternatively, 363 the main function(s) of HTa might simply not be in transcription. Searcy and co- 364 workers showed that HTa facilitates re-annealing following DNA denaturation (Stein 365 and Searcy 1978), suggesting that HTa might assist DNA renaturation after thermal 366 stress. Our finding of heterogeneous but predictable HTa occupancy across the

12 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

367 genome implies that re-annealing would proceed differentially in genic and intergenic 368 regions, leaving the latter unbound and free to engage in binding of protein complexes 369 for transcription. However, the physiological importance of HTa-mediated 370 reannealing remains to be established and is put into perspective by the observation 371 that E. coli HU, when added to naked DNA, also significantly raises its melting 372 temperature (Drlica and Rouvière-Yaniv 1987). Another possibility is that HTa is 373 involved in higher order genome structure (Bohrmann et al. 1990) or, like many HU 374 family members (Grove 2011), in DNA repair. To dissect these and other hypothetical 375 functions in vivo in the future, to determine whether HTa is essential, and address 376 whether archaeal histones can functionally substitute for HTa, the development of 377 genetic tools for T. acidophilum is highly desirable. 378 379 Finally, from an evolutionary perspective, it is worth highlighting dinoflagellates as a 380 second case where histones have lost their pre-eminent role in genome compaction 381 and organization to other small basic proteins. Even though histones remain encoded 382 in dinoflagellate genomes (Marinov and Lynch 2016), and might continue to play 383 important roles in transcription or other processes, they are no longer the main protein 384 constituent of chromatin. Instead, proteins with homologs in phycodnaviruses have 385 become the principal mediators of compaction (Gornik et al. 2012). In a subset of 386 species, these dinoflagellate/viral nucleoproteins (DVNPs) act alongside HU-like 387 proteins that were likely acquired from bacteria (Figure 2) (Wong et al. 2003; 388 Janouškovec et al. 2017) and might be involved in the genesis and maintenance of 389 DNA loop structures (Chan and Wong 2007). If not exactly a precedent, the case of 390 dinoflagellates nonetheless serves to illustrate that HU proteins are versatile, 391 evolvable and have been independently co-opted into important architectural roles 392 following horizontal transfer. It also highlights that histones or a subset of their 393 functions can be replaced, under the right circumstances, by alternative DNA-binding 394 proteins. At the same time, our results demonstrate that such replacements, even 395 though they appear radical, need not necessarily go hand in hand with fundamental 396 changes to the architectural layout of chromatin. 397 398 399

13 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

400 401 402 Methods

403 Thermoplasma acidophilum culture

404 T. acidophilum strain 122-1B2 was obtained from DSMZ (https://www.dsmz.de/) and 405 cultured using the medium described in (Searcy and Stein 1980), supplemented to a 406 final concentration of 2g/L yeast extract (BD Biosciences). The medium was boiled 407 for five minutes and allowed to cool to 58°C before inoculation. Cultures were 408 incubated at 59°C with shaking (90rpm) in an INFORS Thermotron incubator. 409 Throughout, culture to flask volume ratio was maintained at a maximum of one fifth. 410 Fresh cultures were inoculated with a 10% v/v inoculum from a 4-day-old culture. 411 Samples from 24-, 48-, 72-, and 80-hour cultures were used for RNA extraction and

412 MNase experiments. Sample aliquots were first equilibrated to pH4 using NH4OH to 413 avoid depurination (Robb et al. 1995) and then pelleted at low speed. To monitor cell 414 viability and homogeneity, cells were imaged with a standard light microscope at 415 100x magnification. 416

417 MNase digestion – Thermoplasma acidophilum

418 T. acidophilum cells self-lyse at pH>6 (Robb et al. 1995). Pellets were therefore 419 directly re-suspended in ice-cold MNase digestion buffer (10mM Tris, 5mM Ca2+, 420 pH8) and homogenized via 20 passages through a Dounce homogenizer, on ice. 421 Unfixed crude lysates were digested in the presence of 4U/mL of MNase (Thermo 422 Fisher) at 37°C. Digestion was stopped by addition of EDTA to a final concentration 423 of 20mM. Samples were incubated for an additional 30min at 37°C in the presence of 424 RNAse A to a final concentration of 0.5mg/mL and then overnight at 65°C in the 425 presence of SDS (1%) and proteinase K (125µg/mL). Subsequently, DNA was 426 extracted by phenol chloroform extraction and precipitated with ethanol. For MNase 427 digestion of E. coli samples, lysates were obtained by cryo-grinding with a pestle and 428 mortar. 429 430 Preparation of undigested DNA samples 431

14 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

432 Undigested lysate was incubated as above but without addition of enzyme. 433 Undigested DNA was then sonicated using a Covaris S220 sonicator with the 434 following settings: peak power: 175, duty: 10, cycle/burst: 200 for 430s (target size: 435 150bp). 436

437 MNase digestion and sequencing – Methanothermus fervidus

438 Flash-frozen pellets of M. fervidus harvested in late exponential and stationary phase 439 were purchased from the Archaeenzentrum in Regensburg (via Harald Huber). 440 Digestion and sequencing conditions will be published as part of a manuscript 441 currently in preparation. They are not repeated here but reproduced in full at the 442 NCBI Gene Epression Omnibus (GEO) where the raw reads have been deposited 443 (accession number pending). 444

445 RNA extraction

446 Pellets were re-suspended in 500µL RNAlater (Ambion), incubated for one hour at 447 room temperature, pelleted again and snap-frozen. Total RNA was extracted using an 448 RNeasy kit (Qiagen) according to manufacturer’s instruction, including DNAse I 449 treatment.

450 Protein expression in E. coli

451 Protein sequences for HTa and different HU homologs were obtained from UniProt 452 (P02345, P0ACF0, LOEKC1, A0A0R2DT96). The corresponding coding sequences 453 were codon-optimized for expression in E. coli, synthesized and inserted into the 454 pD864-SR backbone for rhamnose-inducible expression (ATUM). The YFP- 455 expressing control strain was obtained directly from ATUM.

456 Bacterial strains and growth conditions

457 E. coli ∆∆hupA/hupB double mutant and corresponding wildtype strain were a gift 458 from Jacques Oberto. These strains were grown in LB medium at 37°C with shaking. 459 For MNase assays, strains were pre-cultured from selection plates to 5mL LB cultures 460 in 50mL falcon tubes overnight with antibiotics and re-suspended to OD600=0.1. 1M

15 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

461 rhamnose was added two hours after inoculation to a final concentration of 5mM and 462 grown for 4 hours prior to harvesting. Induction was confirmed by monitoring YFP 463 expression.

464 Sequencing and data availability

465 For MNase digestion experiments paired-end reads were prepared using the NEBNext 466 Ultra II DNA Library Prep Kit. For RNA sequencing, ribosomal RNAs were depleted 467 using Illumina’s Ribo-Zero rRNA Removal Kit (Bacteria) and RNA libraries 468 prepared using the TruSeq Stranded Total RNA LT Kit. Both RNA and DNA libraries 469 were then sequenced on a HiSeq2500 machine. Raw read and processed data have 470 been deposited in GEO (accession number pending).

471 MNase-seq analysis

472 Paired-end 100bp reads were first trimmed using BBDuk v37.64 (parameters: 473 ktrim=r, k=21, hdist=1, edist=0, mink=11, qtrim=rl, trimq=30, minlength=10, 474 ordered=t, qin=33) and merged using BBmerge v37.64 (parameters: mininsert=10, 475 mininsert0=10). The merged reads were then mapped to the T. acidophilum 476 DSM1728 genome (GCA_000195915.1) using Bowtie2 (-U). Coverage tracks were 477 computed using bedtools bamCoverage (RPGC normalization, effective genome size: 478 1564906bp), measuring coverage for reads of sizes 40-65bp and 70-100bp separately 479 where appropriate. To compute the correlation matrix of reads of different sizes 480 (Figure 3d), coverage for reads of defined lengths was computed using bedtools 481 bamCoverage without RPGC normalization.

482 Normalization for replication associated bias

483 Reads from the sonicated DNA control samples were mapped in paired-end mode 484 using Bowtie2. Coverage was computed independently of read size and smoothed 485 over 10kbp to avoid introducing noise from the input into the MNase signal. Genome- 486 wide coverage of MNase-digested samples was then divided by its corresponding 487 undigested DNA sample to remove bias in coverage associated with replication. 488 Lastly, coverage was converted into a Z-score using the scale function in R.

489 RNA-seq analysis

16 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

490 Single-end reads were mapped to the T. acidophilum DSM1728 genome using 491 Geneious 11.1.2. Transcripts were reconstructed and quantified using Rockhopper 492 (McClure et al. 2013), using a stringent threshold of 0.9 for 5’ and 3’UTR detection. 493 Differential expression was assessed using DESeq2 as part of the Rockhopper 494 pipeline.

495 LASSO modeling

496 Abundances of different nucleotide k-mers k={1,2,3,4} were computed over genomic 497 windows of 21bp and 51bp using the R packages seqtools (v1.16). We included the 498 21bp window to enable independent capture of read-internal nucleotide patterns that 499 did not overlap with MNase cutting sites. For all genomes analyzed, the top 80 k-mers 500 (across both 21bp and 51bp windows) with the highest absolute correlation values 501 with MNase coverage over the first third of the genome were chosen as model 502 parameters. A general linear model with LASSO optimization was trained on the first 503 third of each genome, with 10-fold internal cross validation, using the LASSO 504 function in Matlab. Predicted coverage tracks were then calculated in R based on k- 505 mer weights from the training set.

506 Peak detection and asymmetry scoring

507 Peaks were detected using the NucleR R package and scored using a modified scoring 508 function, where peak height was measured as the coverage value at each peak centre 509 relative to the empirical distribution of the data. Parameters used for the initial Fourier 510 filtering step are listed in Table S3. A threshold value corresponding to a Z-score of 511 0.25 was used for all data. To score asymmetric coverage inside broad peaks, we 512 computed the average coverage signal of smaller DNA fragments either side (a and b 513 in Figure 4b) of the peak dyad and then considered the ratio a/b. Coverage signal at 514 peaks where a/b<1 were flipped around the dyad axis for downstream signal 515 processing.

516 Nucleotide frequencies and data visualization

517 Nucleotide frequencies were computed using the R package Seqpattern (v1.14). Only 518 reads with an average quality score of 30 that overlapped called peaks by ³90% were 519 selected for this analysis. 2D occupancy plots were generated from non-normalized

17 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

520 MNase-Seq reads using plot2DO (https://github.com/rchereji/plot2DO) (Chereji et al. 521 2018), modified to enable processing of single-end reads. Multiscale analysis of 522 MNase coverage was carried out using MultiScale Representation of Genomic 523 Signals (https://github.com/tknijnen/msr/) (Knijnenburg et al. 2014), considering 524 genomic segments with significant (P<1e-10) enrichment at scale 30.

525 MNase and additional data from other organisms

526 MNase data from other organisms was obtained from the Sequence Read Archive 527 (SRR495445 for T. kodakarensis; SRR574592 for H. volcanii). H. volcanii reads were 528 trimmed using BBDUK (ktrim=r, k=21, hdist=1, edist=0, mink=11, qtrim=rl, 529 trimq=20, minlength=10, qin=33). Reads were mapped to their respective genomes 530 using Bowtie2 (Setting for SRR495445 reads: -3 75 -X 5000 -k 1 -x, as in the original 531 publication; default settings for SRR574592 reads). To reduce PCR duplicate bias, per 532 base coverage values of MNase data from H.volcanii were thresholded at the last 533 percentile. TSS positions for T. kodakarensis and H. volcanii were obtained from 534 (Jäger et al. 2014) and (Babski et al. 2016), respectively.

535 Homology modelling

536 Secondary structures for HupA (P0ACF0) and HmfA (P48781), as displayed 537 schematically in Figure 1a, were taken from UniProt. The secondary structure of HTa 538 was predicted by homology modeling in SWISS-MODEL (Waterhouse et al. 2018) 539 using HupA (PDB:1mul) as a template. To predict and visualize the quaternary 540 structure of the HTa homodimer, we used the HTa sequence to build a homology

541 model based on the X-ray crystal structure of (HupA)2 using PDB:1p51 as a template. 542 The homology model, again built using SWISS-MODEL, has a general mean quality

543 estimate of 0.71. Both (HTa)2 and (HupA)2 structures were refined with steepest 544 descent and conjugate gradient energy minimization using the AMBER ff14SB 545 protein force field potentials (Maier et al. 2015) and a force constraint of 2 kcal/mol 546 placed on the Ca peptide backbone atoms. To calculate the solvent accessible surface 547 area and charge density, we used the Adaptive Poisson-Boltzmann Solver (APBS) 548 (Jurrus et al. 2018). The charge density was mapped onto the solvent accessible 549 surface area using the VMD visualisation package (Humphrey et al. 1996). 550

18 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

551 Phylogenetic analysis 552 553 Amino acid sequences containing the HU-IHF domain (cl00257) where identified in 554 bacteria, archaea, eukaryotes, and viruses using the Conserved Domain Architecture 555 Retrieval Tool (Geer et al. 2002) [accessed on 29th October 2018]. The initial set 556 comprised 52,953, 82, 204, and 131 sequences, respectively. To reduce the number of 557 bacterial sequences to a computationally more tractable subset yet maintain sequence 558 diversity, we pre-processed the bacterial sequence set as follows: first, each bacterial 559 sequence was identified to family level using NCBI annotations; then, only 560 those sequences with a valid family-level taxonomic identification were retained 561 (43,454 sequences belonging to 409 families). Within each family, we then calculated 562 pairwise identities between all sequences and identified up to ten sequence identity 563 clusters. Subsequently, a single representative sequence was randomly sampled from 564 each cluster (for families comprising of fewer than 10 sequences, we selected all 565 available sequences). This reduced the bacterial set to 3135 sequences. 566 567 Next, the full archaeal, eukaryotic, viral and reduced bacterial sets were processed 568 using the Batch Web CD-Search Tool (Marchler-Bauer and Bryant 2004) to 569 determine the position and integrity of the HU-IHF domain(s). Based on the domain 570 identification, we then selected only those sequences from each set that contained a 571 single, complete domain. The bacterial set was further reduced by selecting only those 572 sequences less than or equal to 110 amino acids. Given the relative scarcity of 573 sequences in the other kingdoms, their sequences were not size-filtered. The final set 574 comprised 30 archaeal, 164 eukaryotic, 112 viral, and 1920 bacterial sequences. 575 576 Sequences were aligned using the Constraint-based Multiple Alignment Tool 577 (COBALT) through the NCBI web-interface 578 (https://www.ncbi.nlm.nih.gov/tools/cobalt/cobalt.cgi) with default parameters. The 579 phylogenetic tree was reconstructed using RAxML (version 8.2.10) with the 580 following parameters: -f a, -m PROTCATAUTO, -T 20. Branch support was based on 581 100 bootstrap calculations performed in RAxML (Stamatakis 2014). 582 583

19 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

584 Acknowledgements 585 586 The authors thank Dennis Searcy for advice on T. acidophilum husbandry and 587 chromatin biochemistry, Finn Werner for mentorship and his lab for feedback and 588 advice, Harald Huber and the Archaeenzentrum in Regensburg for M. fervidus 589 biomass, and Jacques Oberto for E. coli hupA/B deletion strains. This work was 590 supported by a UKRI Innovation Fellowship (JBS), an Imperial College Integrative 591 Experimental and Computational Biology Studentship (AE), and UK Medical 592 Research Council core funding (TW). 593 594 Author contributions 595 596 AH carried out all experiments and analyses, with the following exceptions: MR 597 produced and contributed to analysis of M. fervidus MNase data; JBS carried out 598 homology modelling; and AE designed and implemented pipelines for HU ortholog 599 identification and phylogenetic analysis. TW conceived the study. AH and TW 600 devised experimental and analytical strategies and wrote the manuscript, with input 601 from all authors. 602 603 Competing financial interests 604 605 The authors declare that no competing financial interests exist. 606 607 References 608 609 Adam P. S., Borrel G., Brochier-Armanet C., Gribaldo S., 2017 The growing tree of 610 Archaea: new perspectives on their diversity, evolution and ecology. ISME J 11: 611 2407–2425.

612 Allan J., Fraser R. M., Owen-Hughes T., Keszenman-Pereyra D., 2012 Micrococcal 613 Nuclease Does Not Substantially Bias Nucleosome Mapping. Journal of 614 Molecular Biology 417-135: 152–164.

615 Ammar R., Torti D., Tsui K., Gebbia M., Durbic T., Bader G. D., Giaever G., Nislow 616 C., 2011 Chromatin is an ancient innovation conserved between Archaea and 617 Eukarya. eLife 1: e00078.

618 Azam T. A., Iwata A., Nishimura A., Ueda S., Ishihama A., 1999 Growth Phase-

20 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

619 Dependent Variation in Protein Composition of the Escherichia coli Nucleoid. 620 Journal of Bacteriology 181: 6361–6370.

621 Babski J., Haas K. A., Näther-Schindler D., Pfeiffer F., Förstner K. U., Hammelmann 622 M., Hilker R., Becker A., Sharma C. M., Marchfelder A., Soppa J., 2016 623 Genome-wide identification of transcriptional start sites in the haloarchaeon 624 Haloferax volcanii based on differential RNA-Seq (dRNA-Seq). BMC Genomics 625 17: 629.

626 Bai L., Morozov A. V., 2010 Gene regulation by nucleosome positioning. Trends in 627 Genetics 26: 476–483.

628 Bailey K. A., Reeve J. N., 1999 DNA repeats and archaeal nucleosome positioning. 629 Research in Microbiology 150: 701–709.

630 Bailey K. A., Chow C. S., Reeve J. N., 1999 Histone stoichiometry and DNA 631 circularization in archaeal nucleosomes. Nucleic Acids Research 27: 532–536.

632 Bailey K. A., Pereira S. L., Widom J., Reeve J. N., 2000 Archaeal histone selection of 633 nucleosome positioning sequences and the procaryotic origin of histone- 634 dependent genome evolution. Journal of Molecular Biology 303: 25–34.

635 Berger M., Farcas A., Geertz M., Zhelyazkova P., Brix K., Travers A., Muskhelishvili 636 G., 2009 Coordination of genomic structure and transcription by the main 637 bacterial nucleoid-associated protein HU. EMBO Rep 11: 59–64.

638 Bohrmann B., Arnold-Schulz-Gahmen B., Kellenberger E., 1990 Ultrastructural 639 localization of the histone-like protein HTa from the archaeon Thermoplasma 640 acidophilum. Journal of Structural Biology 104: 112–119.

641 Brogaard K., Xi L., Wang J.-P., Widom J., 2012 A map of nucleosome positions in 642 yeast at base-pair resolution. Nature 486: 496–501.

643 Chan Y.-H., Wong J. T. Y., 2007 Concentration-dependent organization of DNA by 644 the dinoflagellate histone-like protein HCc3. Nucleic Acids Research 35: 2573– 645 2583.

646 Chereji R. V., Ramachandran S., Bryson T. D., Henikoff S., 2018 Precise genome- 647 wide mapping of single nucleosomes and linkers in vivo. Genome Biol. 19: 19.

648 Decanniere K., Babu A. M., Sandman K., Reeve J. N., Heinemann U., 2000 Crystal 649 structures of recombinant histones HMfA and HMfB from the hyperthermophilic 650 archaeon Methanothermus fervidus. Journal of Molecular Biology 303: 35–47.

651 DeLange R. J., Williams L. C., Searcy D. G., 1981 A histone-like protein (HTa) from 652 Thermoplasma acidophilum. II. Complete amino acid sequence. The Journal of 653 Biological Chemistry 256: 905–911.

654 Dillon S. C., Dorman C. J., 2010 Bacterial nucleoid-associated proteins, nucleoid 655 structure and gene expression. Nature Reviews Microbiology 8: 185–195.

656 Drlica K., Rouvière-Yaniv J., 1987 Histonelike proteins of bacteria. Microbiol. Rev.

21 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

657 51: 301–319.

658 Dulmage K. A., Todor H., Schmid A. K., 2015 Growth-Phase-Specific Modulation of 659 Cell Morphology and Gene Expression by an Archaeal Histone Protein. mBio 6: 660 781.

661 Geer L. Y., Domrachev M., Lipman D. J., Bryant S. H., 2002 CDART: protein 662 homology by domain architecture. Genome Research 12: 1619–1623.

663 Gehring A. M., Walker J. E., Santangelo T. J., 2016 Transcription Regulation in 664 Archaea. Journal of Bacteriology 198: 1906–1917.

665 Ghosh S., Grove A., 2004 Histone-like Protein HU from Deinococcus radiodurans 666 Binds Preferentially to Four-way DNA Junctions. Journal of Molecular Biology 667 337: 561–571.

668 Gornik S. G., Ford K. L., Mulhern T. D., Bacic A., McFadden G. I., Waller R. F., 669 2012 Loss of Nucleosomal DNA Condensation Coincides with Appearance of a 670 Novel Nuclear Protein in Dinoflagellates. Current Biology 22: 2303–2312.

671 Grove A., 2011 Functional evolution of bacterial histone-like HU proteins. Curr 672 Issues Mol Biol 13: 1–12.

673 Hammel M., Amlanjyoti D., Reyes F. E., Chen J.-H., Parpana R., Tang H. Y. H., 674 Larabell C. A., Tainer J. A., Adhya S., 2016 HU multimerization shift controls 675 nucleoid compaction. Science Advances 2: e1600650.

676 Heinicke I., M ller J., Pittelkow M., Klein A., 2004 Mutational analysis of genes 677 encoding chromatin proteins in the archaeon Methanococcus voltae indicates their 678 involvement in the regulation of gene expression. Mol Genet Genomics 272.

679 Henneman B., van Emmerik C., van Ingen H., Dame R. T., 2018 Structure and 680 function of archaeal histones. PLoS Genet. 14: e1007582.

681 Hołówka J., Trojanowski D., Ginda K., Wojtaś B., Gielniewski B., Jakimowicz D., 682 Zakrzewska-Czerwińska J., 2017 HupB Is a Bacterial Nucleoid-Associated 683 Protein with an Indispensable Eukaryotic-Like Tail (M Sritharan, Ed.). mBio 8: 684 e01272–17.

685 Humphrey W., Dalke A., Schulten K., 1996 VMD: Visual molecular dynamics. 686 Journal of Molecular Graphics 14: 33–38.

687 Ioshikhes I., Hosid S., Pugh B. F., 2011 Variety of genomic DNA patterns for 688 nucleosome positioning. Genome Research 21: 1863–1871.

689 Janouškovec J., Gavelis G. S., Burki F., Dinh D., Bachvaroff T. R., Gornik S. G., 690 Bright K. J., Imanian B., Strom S. L., Delwiche C. F., Waller R. F., Fensome R. 691 A., Leander B. S., Rohwer F. L., Saldarriaga J. F., 2017 Major transitions in 692 dinoflagellate evolution unveiled by phylotranscriptomics. Proceedings of the 693 National Academy of Sciences of the United States of America 114: E171–E180.

694 Jäger D., Förstner K. U., Sharma C. M., Santangelo T. J., Reeve J. N., 2014 Primary

22 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

695 transcriptome map of the hyperthermophilic archaeon Thermococcus 696 kodakarensis. BMC Genomics 15: 684.

697 Jiang C., Pugh B. F., 2009 Nucleosome positioning and gene regulation: advances 698 through genomics. Nat. Rev. Genet. 10: 161–172.

699 Jurrus E., Engel D., Star K., Monson K., Brandi J., Felberg L. E., Brookes D. H., 700 Wilson L., Chen J., Liles K., Chun M., Li P., Gohara D. W., Dolinsky T., 701 Konecny R., Koes D. R., Nielsen J. E., Head-Gordon T., Geng W., Krasny R., 702 Wei G.-W., Holst M. J., McCammon J. A., Baker N. A., 2018 Improvements to 703 the APBS biomolecular solvation software suite. Protein Sci. 27: 112–128.

704 Kaplan N., Moore I. K., Fondufe-Mittendorf Y., Gossett A. J., Tillo D., Field Y., 705 LeProust E. M., Hughes T. R., Lieb J. D., Widom J., Segal E., 2008 The DNA- 706 encoded nucleosome organization of a eukaryotic genome. Nature 458: 362–366.

707 Knijnenburg T. A., Ramsey S. A., Berman B. P., Kennedy K. A., Smit A. F. A., 708 Wessels L. F. A., Laird P. W., Aderem A., Shmulevich I., 2014 Multiscale 709 representation of genomic signals. Nat Meth 11: 689–694.

710 Lagomarsino M. C., Espéli O., Junier I., 2015 From structure to function of bacterial 711 chromosomes: Evolutionary perspectives and ideas for new experiments. FEBS 712 Letters 589: 2996–3004.

713 Lu P., Vogel C., Wang R., Yao X., Marcotte E. M., 2007 Absolute protein expression 714 profiling estimates the relative contributions of transcriptional and translational 715 regulation. Nature Biotechnology 25: 117–124.

716 Maier J. A., Martinez C., Kasavajhala K., Wickstrom L., Hauser K. E., Simmerling 717 C., 2015 ff14SB: Improving the Accuracy of Protein Side Chain and Backbone 718 Parameters from ff99SB. J. Chem. Theory Comput. 11: 3696–3713.

719 Marchler-Bauer A., Bryant S. H., 2004 CD-Search: protein domain annotations on the 720 fly. Nucleic Acids Research 32: W327–W331.

721 Marinov G. K., Lynch M., 2016 Diversity and Divergence of Dinoflagellate Histone 722 Proteins. G3: Genes|Genomes|Genetics 6: 397–422.

723 Maruyama H., Harwood J. C., Moore K. M., Paszkiewicz K., Durley S. C., 724 Fukushima H., Atomi H., Takeyasu K., Kent N. A., 2013 An alternative beads- 725 on-a-string chromatin architecture in Thermococcus kodakarensis. EMBO Rep 726 14: 711–717.

727 Mattiroli F., Bhattacharyya S., Dyer P. N., White A. E., Sandman K., Burkhart B. W., 728 Byrne K. R., Lee T., Ahn N. G., Santangelo T. J., Reeve J. N., Luger K., 2017 729 Structure of histone-based chromatin in Archaea. Science 357: 609–612.

730 McClure R., Balasubramanian D., Sun Y., Bobrovskyy M., Sumby P., Genco C. A., 731 Vanderpool C. K., Tjaden B., 2013 Computational analysis of bacterial RNA-Seq 732 data. Nucleic Acids Research 41: e140.

23 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

733 Mukherjee A., Sokunbi A. O., Grove A., 2008 DNA protection by histone-like protein 734 HU from the hyperthermophilic eubacterium Thermotoga maritima. Nucleic 735 Acids Research 36: 3956–3968.

736 Nalabothula N., Xi L., Bhattacharyya S., Widom J., Wang J.-P., Reeve J. N., 737 Santangelo T. J., Fondufe-Mittendorf Y. N., 2013 Archaeal nucleosome 738 positioning in vivo and in vitro is directed by primary sequence motifs. BMC 739 Genomics 14: 391.

740 Peckham H. E., Thurman R. E., Fu Y., Stamatoyannopoulos J. A., Noble W. S., 741 Struhl K., Weng Z., 2007 Nucleosome positioning signals in genomic DNA. 742 Genome Research 17: 1170–1177.

743 Prieto A. I., Kahramanoglou C., Ali R. M., Fraser G. M., Seshasayee A. S. N., 744 Luscombe N. M., 2012 Genomic analysis of DNA binding and gene regulation by 745 homologous nucleoid-associated proteins IHF and HU in Escherichia coli K12. 746 Nucleic Acids Research 40: 3524–3537.

747 Quintales L., Soriano I., Vazquez E., Segurado M., Antequera F., 2015 A species- 748 specific nucleosomal signature defines a periodic distribution of amino acids in 749 proteins. Open Biology 5: 140218–140218.

750 Reynolds S. M., Bilmes J. A., Noble W. S., 2010 Learning a Weighted Sequence 751 Model of the Nucleosome Core and Linker Yields More Accurate Predictions in 752 Saccharomyces cerevisiae and Homo sapiens (U Ohler, Ed.). PLoS Comput Biol 753 6: e1000834.

754 Robb F. T., Sowers K. R., Place A. R., Schreier H. J., 1995 Archaea. CSHL Press.

755 Sasaki N., Hirai M., Maeda K., Yui R., Itoh K., Namiki S., Morita T., Hata M., 756 Murakami-Murofushi K., Matsuoka H., Kita K., Sato S., 2009 The Plasmodium 757 HU homolog, which binds the plastid DNA sequence-independent manner, is 758 essential for the parasite's survival. FEBS Letters 583: 1446–1450.

759 Searcy D. G., Stein D. B., 1980 Nucleoprotein subunit structure in an unusual 760 prokaryotic organism: Thermoplasma acidophilum. Biochimica et Biophysica 761 Acta (BBA) - Nucleic Acids and Protein Synthesis 609: 180–195.

762 Stamatakis A., 2014 RAxML version 8: a tool for phylogenetic analysis and post- 763 analysis of large phylogenies. Bioinformatics 30: 1312–1313.

764 Stein D. B., Searcy D. G., 1978 Physiologically important stabilization of DNA by a 765 prokaryotic histone-like protein. Science 202: 219–221.

766 Sun N., Pan C., Nickell S., proteome M. M. J. O., 2010, 2010 Quantitative Proteome 767 and Transcriptome Analysis of the Archaeon Thermoplasma acidophilum 768 Cultured under Aerobic and Anaerobic Conditions. Journal of Proteome Research 769 9: 4839–4850.

770 Swiercz J. P., Nanji T., Gloyd M., Guarné A., Elliot M. A., 2013 A novel nucleoid- 771 associated protein specific to the actinobacteria. Nucleic Acids Research 41:

24 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

772 4171–4184.

773 Swinger K. K., Rice P. A., 2007 Structure-based Analysis of HU–DNA Binding. 774 Journal of Molecular Biology 365: 1005–1016.

775 Thomm M., Stetter K. O., Zillig W., 1982 Histone-like Proteins in Eu- and 776 Archaebacteria. Zentralblatt für Bakteriologie Mikrobiologie und Hygiene: I. Abt. 777 Originale C: Allgemeine, angewandte und ökologische Mikrobiologie 3: 128– 778 139.

779 Tillo D., Hughes T. R., 2009 G+C content dominates intrinsic nucleosome 780 occupancy. BMC Bioinformatics 10: 442.

781 Waterhouse A., Bertoni M., Bienert S., Studer G., Tauriello G., Gumienny R., Heer F. 782 T., de Beer T. A. P., Rempfer C., Bordoli L., Lepore R., Schwede T., 2018 783 SWISS-MODEL: homology modelling of protein structures and complexes. 784 Nucleic Acids Research 46: W296–W303.

785 Weidenbach K., Glöer J., Ehlers C., Sandman K., Reeve J. N., Schmitz R. A., 2008 786 Deletion of the archaeal histone in Methanosarcina mazei Gö1 results in reduced 787 growth and genomic transcription. Molecular Microbiology 67: 662–671.

788 Wong J. T. Y., New D. C., Wong J. C. W., Hung V. K. L., 2003 Histone-Like 789 Proteins of the Dinoflagellate Crypthecodinium cohnii Have Homologies to 790 Bacterial DNA-Binding Proteins. Eukaryotic Cell 2: 646–650.

791 Xie Y., Reeve J. N., 2004 Transcription by an archaeal RNA polymerase is slowed 792 but not blocked by an archaeal nucleosome. Journal of Bacteriology 186: 3492– 793 3498.

794 Zhang Z., Guo L., Huang L., 2012 Archaeal chromatin proteins. Sci China Life Sci 795 55: 377–385.

796

25 bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International licensec . a α α β β α β α HTa HTa (T. acidophilum) 1 90 14 Alba RpoB α α β β β α HupA (E. coli) 1 90 12 α β α β α 10 HmfA (M. fervidus) 1 69 8

b 6 4 2

(HTa) Log protein abundance (Sun et al. ) 2 T. acidophilum

−6 −4 −2 0 2 4 6 Log transcript abundance (this study) 4 -1

2.4 e -1 0.8 HupA HupB

-0.8 t al. )

kJ mol H−NS

-2.4 10 Fis -4 RpoB 8 6

(HupA)2 4 Log protein abundance (Lu e E. coli

−2 −1 0 1 2 3 4 Log transcript abundance (3-study average)

Figure 1. Predicted structure and measured abundance of HTa. (a) Predicted secondary structures of HTa (T. acidophilum), the bacterial HU protein

HupA (E. coli), and the archaeal histone protein HmfA (M. fervidus). (b) Predicted quaternary structure of the (HTa)2 homodimer compared to the crystal structure of (HupA)2 (PDB: 1p51) bound to DNA. Colour gradients represent charge densities mapped onto the solvent accessible surface area of (HTa)2 and (HupA)2. Note the extended patches of stronger positive charge for (HTa)2 compared to (HupA)2, particularly in the stalk region. (c) Correlation of transcript and protein abundances for T. acidophilum and E. coli. HTa and HU are highlighted along with some additional chromatin-associated proteins. Data sources: T. acidophilum protein abundance: Sun et al. (2010); E. coli protein abundance: Lu et al. (2007). E. coli transcript abundance is an average across three previous studies as reported by Lu et al. (2007). bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. a

Nephila clavipes

(PRD31632.1)

Methanosphaera sp. rholeuAM74 (RAP44745.1)

Thermoplasmata M8B2D (PNX46291.1)

Fistulifera solaris (GAX13946.1 & GAX17495.1) Thermoplasmatales/Aciduliprofundum spp. Thalassiosira oceanica

Lokiarchaeota archaeon CR 4 (OLS13623.1) Oryza sativa Japonica Group (BAS72442.1)

Fragilariopsis cylindrus Nephila clavipes (PRD22299.1)

(EJK68815.1)

CCMP1102 (OEU13407.1)

Aspergillus sclerotialis (RJE17645.1)

CyanidioschyzonCyanidiaceae_sp._MX-AZ01 merolae (AIA61201.1) Pantholops hodgsonii (XP_005974949.1) Nannochloropsis gaditana

Pantholops hodgsonii (XP_005980042.1) strain 10D (XP_849067.1) (EWM26240.1)

Emiliania huxleyi (XP_005764861.1)CCMP1516

Helicosporidium sp. ATCC 50920 Plasmodium spp. (KDD76417.1) Acanthamoeba castellanii Theileria spp. (XP_004336062.1) str. Neff Babesia spp. Toxoplasma spp. other Apicomplexa

Oikopleura dioica (CBY13230.1) Ricinus communis (XP_002538095.1)

Hyalella azteca (XP_018010076.1)

Drosophila kikkawaiHomo sapiens (XP_017037840.1)(XP_005982097.1) (SJM33774.1)

Beta vulgaris (KMS64563.1) Pantholops hodgsonii

Cajanus cajan (KYP78898.1) Pythium insidiosum (GAX97320.1) Dorcoceras hygrometricum

Halorubrum aidingense Haloferax sp. SB29 (KTG30336.1) Nephila clavipes (PRD33985.1 & PRD40145.1) uncultured archaeon A07HR67 (ESS03323.1) Halogranum gelatinilyticum Noctiluca scintillans (ABV22345.1) (KZV15018.1)

JCM 13560 (EMA66313.1) Dinoflagellata spp. Ziziphus jujuba (XP_015872235.1) Amphidinium carterae (ACF28665.1 & ACF28651.1) Thorarchaeota archaeon SMTZ1-83 (KXH76530.1) (WP_089697618.1)

Lasius niger

(KMQ85438.1)

SAR spp. Viridiplantae spp. Bacteria Viridiplantae spp.

Archaea spp. Trichomonas vaginalis Bootstrap support (%) spp. Eukaryota

Cryptophyta Virus/phage 0-10 90-100 Carophyta Paulinella spp.

0.6 G3(XP_001307951.1)

b

37 Fusobacterium ulcerans (WP_005977149.1) 75 Fusobacterium varium (WP_096402311.1) 0 99 Fusobacterium varium (WP_005951637.1) Fusobacterium ulcerans (WP_005976496.1) 0 Methanosphaera sp. rholeuAM74 (RAP44745.1) 99 23 Parvularcula sp. SM1705 (WP_116392575.1) 0 Tepidicaulis (WP_045445695.1) Granulibacter bethesdensis CGDNIH1 (ABI61581.1) 0 Bacillus virus G (YP_009015542.1) 0 Chitinibacter tainanensis (WP_028449755.1) 3 Lewinella nigricans (WP_099150092.1) 100 0 0 Clostridium phage c-st (YP_398604.1) 0 Clostridium phage D-1873 (EES90336.1) Thermodesulfovibrio islandicus (WP_028841966.1) Hydrogenobaculum (WP_012514093.1) 29 Candidatus Pelagibacter ubique (WP_075522291.1) Dethiosulfatarculus sandiegensis (WP_044347245.1) Desulfobulbaceae bacterium A2 (OQX15673.1) 0 0 99 Holospora curviuscula (WP_104206949.1) 3 67 Holospora elegans E1 (GAJ46836.1) Holospora obtusa F1 (ETZ06777.1) Desulfotalea sp. (PHR27282.1) Dolosigranulum pigrum (WP_112790374.1) 1 100 99 Nocardiopsis synnemataformans (WP_063822307.1) 1 Nocardiopsis sp. NRRL B-16309 (WP_053618021.1) Microbacterium (WP_110588076.1) 0 0 Bisgaard Taxon 44 Serinibacter salmoneus (PFG20718.1) (WP_119524597.1) uncultured euryarchaeote Alv-FOS4 (AAZ32517.1) 40 Aciduliprofundum boonei (WP_008082219.1) 16 Aciduliprofundum sp. MAR08-339 (WP_015283134.1) Euryarchaeota archaeon ex4484 178 (OYT58879.1) Euryarchaeota archaeon ex4484 17 (OYT59826.1) 0 28 100 12 30 Thermoplasmatales archaeon I-plasma (EQB64933.1) 84 Thermoplasmatales archaeon I-plasma (EQB64459.1) Aciduliprofundum sp. (PMP74981.1) 73 Thermoplasmatales archaeon A-plasma (EQB70955.1) 31 79 uncultured archaeon (AKA47862.1) Thermoplasmatales archaeon B DKE (OWP57332.1) 40 Picrophilus (WP_011177176.1) 35 Acidiplasma (WP_048101779.1) 30 61 0 4 12 99 100 Ferroplasma sp. Type II (EQB71534.1) Ferroplasma sp. Type II (EQB71966.1) Ferroplasma (WP_009887139.1) 8 68 Cuniculiplasma divulgatum (WP_021790421.1) Thermoplasmatales archaeon E-plasma (EQB66868.1) 95 Thermoplasma acidophilum (WP_010900520.1) 51 Thermoplasma volcanium (WP_010916416.1) Thermoplasma sp. Kam2015 (WP_110640116.1) Aciduliprofundum sp. (PMP73909.1) Thalassiosira oceanica (EJK68815.1) 9 Bacteroides salanitronis (WP_013619549.1) Clostridium sp. AF50-3 (WP_118490970.1) bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 2. Phylogenetic relationships of HU family proteins from bacteria, eukaryotes, and archaea. (a) protein-level phylogenetic tree of HU proteins including HTa (see Methods for details on phylogenetic reconstruction). The tree is midpoint-rooted. Reported domain-level membership (Bacteria, Archaea, etc.) of different proteins is colour-coded in the outer circle and on the dotted lines that point to individual branches. See Supplementary Text for a critical evaluation of domain assignments and likely assembly contaminants. Bootstrap support values (%) for individual branches, visually encoded as node diameters, illustrate poorly resolved relationships at deeper nodes. (b) Excerpt of the phylogeny shown above, highlighting good support (84%) for a monophyletic origin of HU proteins in the Thermoplasmatales/DHVE2 clade and their uncertain affiliation to other HU family members. bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. a b Day 1 Day 2 Day 3 Day 3.5 E.coli

T. acidophilum WT::HTa WT::HUα WT::YFP Ladder Ladder Undigested 15 min 30 min Undigested 15 min 30 min Undigested 15 min 30 min Undigested 15 min 30 min Ladder Ladder Undigested 15 min 15 min Undigested 15 min Undigested 15 min Undigested

100bp 100bp 75bp 75bp 50bp 50bp 35bp

e c d Fragment length (bp) Thermoplasma acidophilum genome 1,555 kb 40 60 80 100 120 140 160 180 0 to 2 (a.u.)

40 MNase signal 3 (raw) 60 0 to 2500 (a.u.) 80 2 Input DNA 100 0 to 1 (a.u.)

% total reads 120 1 MNase signal 140

Fragment length (bp) (normalized)

160 30 to 66 %

0 180 0 50 100 150 200 GC content Spearman’s ρ Fragment length (bp) 0 0.2 0.4 0.6 0.8 1 f 10 kbp 160,000 170,000 Locus

Small fragments (40-65bp) MNase-seq Large fragments (70-100bp) called peaks

Overlap of narrow peaks across the growth cycle (required overlap ≥95%) Overlap of broad peaks across the growth cycle (required overlap ≥95%) g

150 543 day 3 day 1 day 2 day 2 day 3 605 day 2 2025 954 678 5130 60 97 day 1 665 137 3055 10046 665 4197 day 3.5 3167 1022 7681 107 178 1104 533 day 2 2765 880 236 day 3 395 524 378 day 3 day 3.5

Figure 3. HTa-mediated primary chromatin architecture in T. acidophilum mapped by MNase-Seq. (a) Agarose gel of MNase digestion products from T. acidophilum sampled across the growth cycle. Growth phases are given as days after inoculation, digestion time in minutes. (b) Agarose gel of MNase digestion products from T. acidophilum (day 2) along with digestion products of E. coli ectopically expressing HTa, HupA or YFP (see Methods). (c) Distribution of the lengths of fragments mapped to the T. acidophilum genome (pooled across all four replicates from day 2), highlighting fragment size ranges that correspond to small (blue) and large (red) fragments, as defined in the main text. (d) Correlation matrix comparing genome-wide MNase-Seq coverage signal, computed at base pair resolution, between reads of defined sizes (pooled replicates, day 2). (e) Genome-wide MNase-Seq signal prior to and after normalization with sonicated DNA input (see Methods), along with GC content profile along the T. acidophilum chromosome, computed using a 51bp moving window. (f) Example of coverage and called peaks across a 10kb region of the T. acidophilum chromosome. (g) Overlap of detected narrow and broad peaks across the growth cycle. Note that different sections/overlaps are only qualitatively but not quantitatively proportional to absolute peak numbers. i f c a Broad peakfootprint Small fragments Figure 4. Asymmetric coveragesignalsaround peaksin (f, g,h)Normalizedcoveragefor by fragmentlengthrelativetothecentreoflarge peaks(correspondingtothebindingfootprints of octamerichistoneoligomers)in and broadpeaksin small fragmentsaroundthedyadaxis. (a) Empiricalexampleand expected bychance,estimatedviarandomsamplingof25000regions equal sizeineachgenome. WW (= AA|AT|TA|TT) dinucleotidesatthesamerelative positionsas(c,d,e).DottedlinesindicatetheproportionofSSor WW dinucleotides corresponding totheexpectedfootprintsizesofhistonetetramers,hexamers, andoctamers. Proportion of dinucleotides Relative coverage (z−score) Fragment length (bp) Large fragments Position relativetocentreofnarrowpeaks(bp) Position relativetocentreofnarrowpeaks(bp) Position relative tocentreofnarrowpeaks(bp) (70-100bp) 0.15 0.20 0.25 0.30 0.35 (40-65bp) bioRxiv preprint −180 −0.50 0.5 1.0 1.5 2.0 20 50 80 110 not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade -100 -100 SS -50 5 50 -50 0 0 T. acidophilum.Coveragearoundbroadpeaksisorientedasexplainedin(b). 0 doi: large fragments small fragments WW 50 https://doi.org/10.1101/564930 100 100 (b) schematicdescribingourapproachtore-orientingcoveragesignalsatbroad peaksbasedonthecoverageof T. acidophilum 180 T. acidophilumsmall(40-65bp)andlarge (70-100bp)fragmentsand (c, d)HeatmapsillustratingMNase-seqcoveragebyfragmentlengthrelativeto thecentreofnarrow d g j Proportion of dinucleotides Relative coverage (z−score) Fragment length (bp) Position relativetocentreofbroadpeaks(bp) Position relative tocentreofbroadpeaks(bp) Position relativetocentreofbroadpeaks(bp) 0.15 0.20 0.25 0.30 0.35 −180 −0.50 0.5 1.0 1.5 2.0 20 50 80 110 b available undera -100 -100 SS broad peak -50 5 50 -50 footprint T. acidophilumandM.fervidus thattrackunderlyingnucleotidecontent. ; a this versionpostedMarch1,2019. P1 0 0 0 b large fragments small fragments 50 retain orientation a/b >1; WW CC-BY 4.0Internationallicense 100 100 180 k h e broad peak Position relativetocentreof117-127bppeaks(bp) footprint Position relative tocentreof117-127bppeaks(bp) Proportion of dinucleotides Position relativetocentreof117-127bppeaks(bp) Relative coverage (z−score) Fragment length (bp) a 0.1 0.2 0.3 0.4 0.5 P2 −300 −0.5 0 0.5 1.0 20 50 80 110 b (i, j,k)ProportionofSS(=CC|CG|GC|GG)and mirror orientation a/b <1; (e) Analogousheatmapillustratingcoverage -100 -100 M.fervidus The copyrightholderforthispreprint(whichwas SS -50 5 50 -50 M. fervidusfragmentranges . 0 0 0 50 100 100 WW 300 large fragments(117−127bp) medium fragments(87−97bp) small fragments(60−70bp)

Relative coverage (%) broad peak M. fervidus. footprint 0 1 2 3 4 P1+P2 a b peaks aggregated re-oriented, bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. a

T. acidophilum M. fervidus T. kodakarensis

50bp reads overlapping 90bp reads overlapping 60bp reads overlapping 120bp reads overlapping 60bp reads overlapping with narrow peaks with broad peaks with 60-70bp peaks with 117-127bp peaks with 55-65bp peaks 0.2 0.1 0 0.1 0.2 0.3 0 0.1 0.2 0.3 0.4 -50 0 50 -50 0 50 -50 0 50 500-50 -50 0 50 Proportion of CC | C G GG dinucleotides Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp)

b c

T. acidophilum M. fervidus T. kodakarensis T. acidophilum M. fervidus T. kodakarensis

RR YY RR YY ) A T ( T A

A T RR ( YY )

Proportion of T A YY RR Proportion of 0.1 0.2 0.3 0.4 0.5 0.20 0.25 0.30 0.35 -50 0 50 -50 0 50 -50 0 50 -50 0 50 -50 0 50 -50 0 50 Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp) Distance to centre of reads (bp)

fragment size (bp) MNase-seq MNase-seq d e vs vs growth phase DNA k-mer, training set LASSO Model T. acidophilum species GG C G CC A T AT C G G C G C C G A T A T A 4 T G C training test A G T A A T A C T T T A T 21 21

21 21 21 21 21 21 set set

40-65 exp Spearman’s 2 70-100 T.acidophilum ρ 40-65 stat 70-100 0.8 0 60-70 0.4 exp 87-97 count (x103) 117-127 0

Predicted occupancy (a.u.) 3 M.fervidus −2 60-70 1 −0.4 stat 87-97 117-127 −0.8 -2 0 2 4 6 H.volcanii exp 50 Observed occupancy (z-score)

Figure 5. Comparison and predictive power of nucleotide enrichment patterns associated with HTa and archaeal histones. (a) Proportion of SS (= CC|CG|GC|GG) dinucleotides, (b) A|T mononucleotides, and (c) RR (= purine/purine)|YY (= pyrimidine/pyrimidine) dinucleotides relative to the centres of reads of defined length in different archaeal species (see Methods for read filtering). (d) Density plot comparing observed (day 2, replicate 2) and predicted MNase-Seq coverage across the part of the T. acidophilum chromosome not used for training. (e) Correlation between MNase-seq coverage and individual DNA k-mers with particularly high positive or negative correlation coefficients, as observed in the training data. Overall correlations between measured MNase-Seq coverage and coverage predicted by the LASSO model, for both trained and untrained data, are shown on the right-hand side. a GC content

Position relativetocentreofbroadpeaks(bp) 0.2 0.3 0.4 0.5 0.6 during exponentialphase(day2). large andsmallfragmentsduringexponential andstationaryphase.Forallgraphs,deciledecompositionisbasedonsmallfragmentoccupancy broad peaks(day2),separatedintodecilesbasedontherelativeabundanceofsmallfragmentsand Figure 6.Broad peaksare associatedwithheterogeneous GCcontentinexponentialbutnotstationaryphase.(a) bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade abundance ofsmallfragments −100 −50 0 doi: 50 https://doi.org/10.1101/564930 100 b Relative coverage (z−score) Relative coverage (z−score)

−180 3

−1 0 1 2 −180 -1 0 1 2 available undera Position relativetocentreofpeaks(bp) Exponential phase(day2) ; this versionpostedMarch1,2019. 0 0 CC-BY 4.0Internationallicense 180 180 −180

−180 -1 0 1 2 Position relativetocentreofpeaks(bp) Stationary phase(day3.5) The copyrightholderforthispreprint(whichwas (b) thecorrespondingrelativecoveragefor . 0 0 180 180 Average GCcontentat Large fragments Small fragments bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was abundance of small fragments a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license. 40 50 30 % overlap with intergenes 0 10 20 1 2 3 4 5 6 7 8 9 10 Small fragment abundance decile b T. acidophilum d T. kodakarensis

exponential phase stationary phase

exponential 3

110 110 exponential 2 80 80 1 −0.8 −0.4 0 Fragment length (bp) Fragment 50 50 Relative coverage (%) coverage Relative 0

Median normalized coverage stationary Median normalized coverage −0.8 −0.4 0 20 20 −300 −100 0 100 300 −300 −100 0 100 300 −400 0 400 −400 0 400 Position relative to TSS (bp) Position relative to TSS (bp) Position relative to TSS (bp) Position relative to TSS (bp) c M.fervidus e H. volcanii

exponential phase stationary phase

stationary

3 110 110

2 exponential exponential 80 80 1 Fragment length (bp) Fragment 50 50

Relative coverage (%) coverage Relative 0 −1.0 −0.5 0 Median normalized coverage Median normalized coverage −0.6 −0.2 0.2 20 20 −300 −100 0 100 300 −300 −100 0 100 300 −400 0 400 −400 0 400 Position relative to ATG (bp) Position relative to ATG (bp) Position relative to ATG (bp) Position relative to TSS (bp)

f T. acidophilum

day 1 day 2 day 3 RNA-seq Small fragments Large fragments day 3.5

day 2

day 3 Upregulated genes z−score day 3.5 −1.0 0.0 1.0

day 2

day 3 Downregulated genes z−score day 3.5 −1.0 0.0 1.0

day 2

day 3 Unchanged genes z−score day 3.5 −1.0 0.0 1.0 −400 0 600 −400 0 600 −3 0 3 Distance from TSS (bp) Distance from TSS (bp) log2 fold-change (versus day1) bioRxiv preprint doi: https://doi.org/10.1101/564930; this version posted March 1, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 7. MNase-Seq coverage around transcriptional start sites in T. acidophilum and histone-encoding archaea in the context of dynamic transcription. (a) Broad peaks associated with low abundance of small fragments are enriched in intergenic regions. (b) Left and central panel: Heat maps indicating MNase-seq coverage by fragment length relative to transcriptional start sites in exponential (day 2) and stationary phase (day 3.5). Right panel: median normalized MNase-seq coverage (considering all fragment sizes) as a function of distance from the TSS. (c) as in (b) but for M. fervidus and using the coding start (ATG) rather than the TSS as a reference point. To ensure that the coding start constitutes a reasonable proxy for the TSS, only genes with a divergently oriented neighbouring gene are considered, thus eliminating genes internal to operons. (d, e) median of normalized MNase-seq coverage (considering all fragment sizes) as a function of distance from the TSS in T. kodakarensis and H. volcanii. (f) Changes in normalized MNase-seq coverage for small and large fragments around transcriptional start sites in T. acidophilum as a function of growth phase and whether genes are upregulated, downregulated or remain unchanged relative to RNA abundance on day 1. Genes are grouped according to differential expression (or lack thereof) on day 2 compared to day 1. Genes with a log2-fold change >1 were considered significantly upregulated, those with a log2-fold change <-1 significantly down-regulated (FDR<0.01). The rightmost panels indicate that a majority of genes up-/downregulated on day 2, remain up-/downregulated on days 3 and 3.5.