Genome

Primate-specific variants

Journal: Genome

Manuscript ID gen-2020-0094.R1

Manuscript Type: Mini Review

Date Submitted by the 16-Sep-2020 Author:

Complete List of Authors: Ding, Dongbo; Hong Kong University of Science and Technology School of Science, Division of Life Science NGUYEN, Thi Thuy; Hong Kong University of Science and Technology School of Science, Division of Life Science Pang, Matthew Yu Hin; Hong Kong University of Science and Technology School of Science, Division of Life Science Ishibashi, DraftToyotaka; Hong Kong University of Science and Technology School of Science, Division of Life Science

Keyword: histone variant, H2BFWT, H3.5, H3.X, H3.Y

Is the invited manuscript for consideration in a Special Genome Biology Issue? :

© The Author(s) or their Institution(s) Page 1 of 28 Genome

1 Primate-specific histone variants

2 Dongbo Ding1, Thi Thuy Nguyen1, Matthew Y.H. Pang1 and Toyotaka

3 Ishibashi1

4 1. Division of Life Science, The Hong Kong University of Science and Technology, 5 Clear Water Bay, NT, HKSAR, China 6

7

8

9

10

11 12 Draft 13

14

15

16

17

18

19 Correspondence should be addressed to :

20 Toyotaka Ishibashi

21 The Hong Kong University of Science and Technology

22 Clear Water Bay, NT, HKSAR, China

23 E-mail: [email protected]

24 Phone: +852-3469-2238

25 Fax: +852-2358-1552

26

© The Author(s) or their Institution(s) Genome Page 2 of 28

27 Abstract

28

29 Canonical (H2A, H2B, H3, and H4) are present in all eukaryotes where they

30 package genomic DNA and participate in numerous cellular processes, such as

31 transcription regulation and DNA repair. In addition to the canonical histones, there

32 are many histone variants, which have different amino acid sequences, possess tissue-

33 specific expression profiles, and function distinctly from the canonical counterparts. A

34 number of histone variants, including both core histones (H2A/H2B/H3/H4) and

35 linker histones (H1/H5), have been identified to date. Htz1 (H2A.Z) and CENP-A

36 (CenH3) are present from yeasts to mammals, and H3.3 is present from Tetrahymena

37 to humans. In addition to the prevalent variants, others like H3.4 (H3t), H2A.Bbd, and

38 TH2B, as well as several H1 variants,Draft are found to be specific to mammals. Among

39 them, H2BFWT, H3.5, H3.X, H3.Y, and H4G are unique to primates (or Hominidae).

40 In this review, we focus on localization and function of primate- or hominidae-

41 specific histone variants.

42

43

44

45 Key Words

46

47 Histone variants, H2BFWT, H3.5, H3.X, H3.Y, H4G

48

© The Author(s) or their Institution(s) Page 3 of 28 Genome

49

50 1. Telomere-localized and spermatogenesis-specific histone variants, H2BFWT

51

52 , one of the four canonical histones involved in chromatin formation,

53 has several canonical isoforms (Molden et al. 2015) and two variants: TH2B and

54 H2BFWT, both of which have been identified as testis-specific histones. TH2B is

55 present in later stages of spermatogenesis and is expressed in most mammalian

56 species (Montellier et al. 2013), whereas H2BFWT is expressed only in primates.

57 H2BFWT is also known as H2BW1, is located at the locus Xq22.2 in humans (Table

58 1). The H2BFWT shares 43% amino acid sequence identity with the

59 canonical H2B, and 46% identity with TH2B (Figure 1). H2BFWT was reported to

60 be 153 amino acid residues long inDraft vivo with an extended N-terminal tail compared

61 to canonical H2B (Boulard et al. 2006).

62

63 GFP-H2BFWT was found to mainly localize in the telomeric region in Chinese

64 hamster immortal lung fibroblasts (the V79 cell line) (Churikov et al. 2004).

65 However, there is limited knowledge about its function in the telomeric region.

66 H2BFWT can also be loaded into nucleosomes using in vitro salt gradient dialysis

67 methods, and these reconstituted H2BFWT nucleosomes can be remodeled and

68 mobilized by the SWI/SNF chromatin remodeling complex more efficiently than that

69 the canonical nucleosomes (Boulard et al. 2006). In contrast to canonical H2B,

70 H2BFWT was reported to be unable to recruit condensation factors and

71 participate in the assembly of mitotic (Boulard et al. 2006).

72

© The Author(s) or their Institution(s) Genome Page 4 of 28

73 Several studies have suggested a relationship between single nucleotide

74 polymorphisms (SNPs) in H2BFWT and defects in male fertility (Lee et al. 2009;

75 Rafatmanesh et al. 2018; Ying et al. 2012). In particular, three groups have linked the

76 -9C > T mutation (rs7885967) within the 5’ untranslated region of H2BFWT to non-

77 azoospermia (p = 0.018), a reduced sperm count (p = 0.0127) and a weakened sperm

78 vitality (p = 0.0076) (Lee et al. 2009; Rafatmanesh et al. 2018; Ying et al. 2012). A

79 related investigation by Ying et al. revealed that the 368A > G mutation (rs553509)

80 increased the risk of male sterility (Ying et al. 2012)(Figure 2).

81

82 A more contemporary analysis assessing copy number variations in chromosome X

83 revealed that 5 out of 42 female participants with premature ovarian failure suffer

84 from either gains or losses of genomicDraft regions containing H2BFWT and H2BFM (an

85 uncharacterized histone variant located near H2BFWT) (Quilter et al. 2010).

86 Considering the highly testis-specific expression pattern of H2BFWT and the

87 detectable expression of H2BFM in the ovary (Papatheodorou et al. 2020), it is not

88 unreasonable to speculate that the premature ovarian failure phenotype stems from

89 the deletion of H2BFM but not H2BFWT. However, the possibilities of a correlation

90 between mutations in both H2BFWT and H2BFM, and premature ovarian failures

91 cannot be ignored, especially when detailed expression profiles of H2BFWT and

92 H2BFM in female reproductive organs are currently unavailable. Compounded by

93 the fact that the biological role of H2BFM has yet to be explored, further research is

94 necessary to evaluate these hypotheses. Although H2BFWT SNPs increase the

95 probability of infertility, it is possible that H2BFWT is not the only causative factor.

96 It would also be interesting to study its combination with another histone variant’s

97 mutation contributing to infertility. Further characterization of H2BFWT would be

© The Author(s) or their Institution(s) Page 5 of 28 Genome

98 helpful to understand how it might contribute to spermatogenesis and other cellular

99 processes.

100

101 Regarding the H2B family, other than H2BFWT, little attention has been given to

102 multi-copied canonical H2B isoforms, which differ from each other by a few amino

103 acids. Most of them are currently grouped together as canonical H2Bs due to their

104 high similarity (Molden et al. 2015). Most changes in these isoforms are small and

105 have minor effects on the H2B protein structure, although some isoforms such as

106 H2B1L, H2B1M, and H2B1A could possess an alternative N-terminal tail

107 conformation (Molden et al. 2015). It remains unknown if these structural

108 modifications can cause distinct functions or localizations compared to canonical

109 H2B, but it would be interestingDraft to study how the N-terminal tail of H2B can

110 contribute to nucleosome stability, change the chromatin structure, interact with other

111 , cause distinct genomic localization, and potentially alter post-translational

112 modification patterns if critical amino acid residues are changed.

113

114 2. Testis-specific histone variant H3.5 regulates spermatogenesis while H3.X and

115 H3.Y express in the brain

116

117 The family has several histone variants, but although their sequences are

118 similar, they have distinct functions (Buschbeck and Hake 2017). For example, H3.3

119 is widely expressed and is localized in the promoter region of actively expressed

120 , while another histone variant, H3T (H3.4), is only present in the testis and

121 plays a critical role in the dynamics of chromatin transitions during spermatogenesis

122 (Ausio et al. 2016; Yuen et al. 2014). In primates, three additional H3 variants have

© The Author(s) or their Institution(s) Genome Page 6 of 28

123 been identified; H3.5 (also known as H3.3C) (Schenk et al. 2011; Ederveen et al.

124 2011), H3.X, and H3.Y (Wiedemann et al. 2010).

125

126 The H3.5 is localized in the 12p11.21 region of human chromosome 12 (Schenk

127 et al. 2011). The amino acid sequence of H3.5 is most similar to that of histone

128 variant H3.3, which is involved in the maintenance of transcriptionally active

129 chromatin (Mito et al. 2005). The H3.X and H3.Y genes were identified at the same

130 time and are located at the 5p15.1 chromosome region in humans (Wiedemann et al.

131 2010). Both H3.X and H3.Y are expressed in the same tissues and are considered to

132 have similar functions, including regulating the cell cycle and facilitating DUX4 133 target genes expression (Resnick etDraft al. 2019; Wiedemann et al. 2010). 134

135 Histone variant H3.5 varies by five amino acid residues from H3.3 (Figure 1). As a

136 result, the H3.5 nucleosome has lower nucleosome stability than the H3.3

137 nucleosome, but it is more stable than the H3.4 (H3T) nucleosome (Urahama et al.

138 2016). This decrease in nucleosome stability can mainly be attributed to the H3.5L103

139 residue which weakens the interaction between H3.5 and H4 (Urahama et al. 2016).

140 The H3.5/H4 tetrasome is also less stable than the H3.3/H4 tetrasome (Urahama et al.

141 2016). Interestingly, NAP1L1, which is a common -H2B and H3-H4

142 chaperone (Park and Luger 2006), has a stronger binding affinity to the H3.5/H4

143 tetramer than the H3.3/H4 tetramer because of the amino acid changes at the N-

144 terminal of H3.5 in our unpublished experiment.

145

© The Author(s) or their Institution(s) Page 7 of 28 Genome

146 In cultured HepG2 and HEK293 cells, transiently expressed GST-H3.5 can be

147 assembled into chromatin and is preferentially localized at the euchromatin regions

148 (Schenk et al. 2011). H3.5 had a very similar genomic localization pattern as the

149 histone post-translational modifications of pan-H3K4me3 and pan-H3K9ac. However,

150 H3.5 was excluded from pan-H3K9me3 or HP1α positive regions in HEK293 cells

151 (Schenk et al. 2011). Furthermore, when H3.3 was depleted with RNAi, H3.5 was

152 observed to associate with the actively-transcribed genes and the essential functions

153 of H3.3 can be replaced by H3.5 (Schenk et al. 2011). Based on fluorescence recovery

154 after photobleaching (FRAP) assay, it was found that H3.5 also has much higher

155 mobility than H3.3 in HeLa cells (Urahama et al. 2016). 156 Draft 157 Similar to H3T, H3.5 is expressed specifically in the testes. In humans, H3.5 is mainly

158 expressed in spermatogonia and primary spermatocytes of germinal stages VI–X.

159 H3T, on the other hand, is expressed in spermatocytes and spermatids, but neither

160 H3T nor H3.5 are present in sperm (Figure 3)(Schenk et al. 2011; Shiraishi et al.

161 2018; Ueda et al. 2017; Urahama et al. 2016). ChIP-sequencing data of H3.5 from

162 human testes (DDBJ:DRA002604) showed that H3.5 accumulates around the TSS of

163 both active and silent genes; this is quite different from H3.3, which is primarily

164 enriched at the TSS of active genes (Urahama et al. 2016). However, it is still unclear

165 whether H3.5 is only involved in gene activation or has other functions as a TSS

166 marker. Based on clinical data, H3.5 mRNA expression levels in patients with non-

167 obstructive azoospermia (NOA) was much lower than in patients with obstructive

168 azoospermia or normal individuals (Shiraishi et al. 2018). The expression of H3.5 in

169 NOA patients can be rescued by glycoprotein polypeptide hormone gonadotropins

© The Author(s) or their Institution(s) Genome Page 8 of 28

170 therapy (Shiraishi et al. 2018), where the therapeutic method has been described in

171 detail elsewhere (Kato et al. 2014).

172

173 Overall, the presence of H3.5 is important for normal spermatogenesis where it is

174 postulated to enhance DNA replication and transcription, and its expression is

175 regulated by glycoprotein polypeptide hormones gonadotropins (Figure 3) (Shiraishi

176 et al. 2018). A lack of published studies regarding loss-of-function in H3.5 limits our

177 understanding of the exact function of H3.5 in spermatogenesis. In addition, at the

178 position corresponding to K36/K37 of canonical H3s, H3.5 only has one lysine

179 residue. Taking into account that H3K36/K37 are important PTM sites, it is therefore 180 likely that H3.5 would alter the PTMDraft pattern. Furthermore, there are two exact copies 181 of ARKST motifs in H3.5 around the K9 and K27 residues, compared to H3.1, H3.2,

182 and H3.3 which only have one such motif at the K27 position (Schenk et al. 2011). It

183 would be interesting to study how post-translational modification patterns of H3.5 are

184 affected by these alternations. The current H3.5 working model during

185 spermatogenesis is as follows: gonadotrophins enhance H3.5 expression at early

186 stages of spermatogenesis; then, when H3.5 is incorporated into genomic DNA, it

187 forms a relaxed chromatin structure in spermatogonia and primary spermatocyte

188 stages; DNA replication and transcription are enhanced by these relaxed structures,

189 and spermatogenesis then progresses to further stages (Figure 3) (Shiraishi et al.

190 2018). It may be important to study the function of H3.5 in spermatogenesis in order

191 to better understand the primate spermatogenesis process.

192

© The Author(s) or their Institution(s) Page 9 of 28 Genome

193 The other primates-specific histone variants, H3.X and H3.Y, also have high

194 similarities to H3.3 (Ederveen et al. 2011). H3.X and H3.Y have remarkably similar

195 primary sequences, with only four amino acids difference in the core domain

196 (Wiedemann et al. 2010). The main difference between these variants is the presence

197 of 11 extra amino acid residues at the C-terminal tail of H3.X (Wiedemann et al.

198 2010). Both H3.X and H3.Y are expressed in human osteosarcoma epithelial U2OS

199 cells and, interestingly, in a subpopulation of neurons in the human brain, where H3.Y

200 may function as a regulator of cellular response to cellular stimuli (Wiedemann et al.

201 2010). The H3.Y/H4 tetrasome is more stable than the H3.3/H4 tetrasome, where the

202 H3.YM124 residue is responsible for the increase in tetrasome stability (Kujirai et al.

203 2017). Interestingly, conformations of H3.Y nucleosome and chromatin are more

204 open than the H3.3/canonical nucleosomeDraft and chromatin (Kujirai et al. 2017). The

205 K42 residue in H3.Y is important for this open conformation. In addition, the

206 H2A/H2B dimer can be released more easily in the H3.Y nucleosome than in the H3.3

207 nucleosome (Kujirai et al. 2017; Kujirai et al. 2016). Furthermore, the H3.Y

208 nucleosome is more accessible for micrococcal nuclease (MNase) digestion and H1

209 binding (Kujirai et al. 2016). In cells, H3.X and H3.Y have similar exchange speeds

210 as H3.3 (Kujirai et al. 2016; Wiedemann et al. 2010). However, little is known about

211 the relationship between the H3.X nucleosome and chromatin stability. The extra 11

212 amino acid residues at the C-terminal tail of H3.X may further alter the nucleosome

213 and chromatin structure.

214

215 In U2OS cells, elevated H3.X and H3.Y expression levels were observed under

216 starvation or overgrowth conditions (Wiedemann et al. 2010). Once H3.X or H3.Y

217 expression was knocked down by siRNA, the U2OS cell growth speed was reduced.

© The Author(s) or their Institution(s) Genome Page 10 of 28

218 This is more prominent when H3.Y was knocked down. enrichment

219 analysis showed that most of the overlapping downregulated genes in H3.Y and

220 H3.X/H3.Y knocked down cells are related to cell cycle and chromatin organization

221 (Wiedemann et al. 2010). When H3.Y was overexpressed in HeLa cells, it was found

222 that H3.Y was enriched around the TSS of active gene regions, similar to what was

223 observed with H3.3 (Kujirai et al. 2016). Another study found that this enrichment is

224 caused by the interaction between the H3.Y and HIRA complex, a H3.3-H4 histone

225 chaperone, and not by another H3.3-H4 histone chaperone DAXX/ATRX complex

226 (Zink et al. 2017). This specific interaction between H3.Y and the HIRA complex was

227 abolished when H3.YL46 was mutated (Zink et al. 2017). When the expression of a

228 common transcription factor DUX4, which is highly expressed in the early cleavage-

229 stage embryo, was shortly inducedDraft by doxycycline, the expression of H3.X and H3.Y

230 can be induced in a myoblast cell line (Resnick et al. 2019). H3.X and H3.Y then bind

231 onto the DUX4 target genes and further facilitate their expression (Resnick et al.

232 2019).

233

234 The functions of H3.X and H3.Y in vivo have not been well studied. Currently, it is

235 known that these two histone variants have relatively high expression in brain tissues,

236 including the cerebral cortex, cerebellum, and hippocampus (Figure 4) (Wiedemann et

237 al. 2010). In addition to the brain, H3.X and H3.Y are also expressed in the testis and

238 some tumors, such as bones, breasts, lungs, and ovaries (Wiedemann et al. 2010).

239 However, the exact functions of H3.X and H3.Y are still unknown. This may be due

240 to limitations in accessing some of these tissues, especially for the brain.

241

© The Author(s) or their Institution(s) Page 11 of 28 Genome

242 3. variant H4G – The breast cancer-driving sibling of H4 that

243 resides in the nucleolus

244

245 Histone H4 is the most conserved histone and therefore is the least likely to have an

246 impressive number of variants (Kamakaka and Biggins 2005). Previously, the histone

247 H4 gene h4r identified in Drosophila was found to encode a protein identical to

248 canonical H4 in the same genus (Akhmanova et al. 1996). Since there are no h4r

249 homologs in other species, it was believed that histone H4 is the only histone that

250 does not have any functionally distinct histone variant (Holmes et al. 2005). Recently,

251 H4G (histone H4-like protein type G) was found to be a hominidae-specific histone

252 variant that drives breast cancer proliferation through the formation of a temporary

253 nucleoprotein complex (Long et al.Draft 2019; Pang et al. 2020). Located in histone cluster

254 1 on (6p22.1–22.2) in humans, the uninterrupted gene H4C7 (h4g),

255 which was previously known as HIST1H4G or H4FL, encodes the H4G protein

256 (Brown et al. 2015; Singh et al. 2018). Homology search identified highly similar

257 genes in Gorilla gorilla, Pan troglodytes, and other close species, indicating that H4G

258 is likely to be specific to hominids (Long et al. 2019). Although the molecular

259 evolutionary pathway has yet to be unveiled, H4G is presumed to be the newest

260 histone variant emanating from canonical H4. While shortened by five amino acids at

261 the C-terminal tail, a region that was reported to be lethal when deleted in yeast

262 (Kayne et al. 1988; Santisteban et al. 1997), the 98-amino-acid-long H4G shares an

263 85% amino acid sequence similarity with canonical H4 (Figure 1). The rest of the

264 amino acid changes are distributed rather evenly along the length of the entire protein.

265 Interestingly, the amino acid substitutions in the N-terminal tail and the α-helix 3 of

© The Author(s) or their Institution(s) Genome Page 12 of 28

266 the histone fold domain make H4G notably more hydrophobic and are likely

267 responsible for its unique characteristics (Long et al. 2019).

268

269 Like most other histone variants, H4G is only expressed at a fraction of the levels

270 compared to its canonical counterparts such as h4d and h4e. Although the baseline

271 expression of H4G is higher in areas such as livers, lungs, and ovaries, elevated

272 expression of H4G could be detected is in breast cancer tissue (Long et al. 2019). This

273 is also positively correlated with breast cancer stages, where stage IV breast cancer

274 patients showed the highest H4G expression. This elevated H4G expression in tissues

275 is also reflected in cultured breast cancer cell lines MCF7, LCC1, and LCC2, which

276 express more H4G than non-cancerous cell lines (Long et al. 2019). These findings

277 suggest that H4G might be implicatedDraft in the development or progression of breast

278 cancer. Indeed, from a xenograft experiment, knocking out H4G significantly

279 impeded tumor growth and led to the formation of smaller tumors in mice. This is

280 further supported by cell cycle analysis of H4G-KO cells which all demonstrated a

281 delayed entry into the S-phase once released from a serum starvation-induced G0

282 block (Long et al. 2019). Collectively, these data suggest that H4G is implicated in

283 the development or progression of breast cancer.

284

285 At the cellular level, H4G is mainly localized in the nucleolus, which is unique among

286 histones. A series of site-directed mutagenesis on H4G showed that the D85A and

287 A89V substitutions on the α-helix 3 domain are responsible for nucleolus localization.

288 With this information, one might predict that H4G interacts with nucleolar proteins.

289 When the interaction proteins of H4G were characterized by mass spectrometry (IP-

290 LC-MS/MS) and co-immunoprecipitation (Co-IP), several nucleolar proteins were

© The Author(s) or their Institution(s) Page 13 of 28 Genome

291 identified as specific interaction proteins, one of which was the nucleolar histone

292 chaperone, NPM1. The interaction between H4G and NPM1 was proven to be strong

293 and depended almost entirely on the α-helix 3 domain of H4G, while H4 showed little

294 NPM1 binding (Long et al. 2019). Because of the potential structural functions and

295 high enrichment level of NPM1 in the granular component (GC) of the nucleolus

296 (Feric et al. 2016; Mitrea et al. 2016), it is highly likely that NPM1 is responsible for

297 bringing H4G to the nucleolus.

298

299 Since the nucleolus is the place where rRNA transcription and ribosome biogenesis

300 occur (Scheer and Hock 1999), it is also not surprising to find that H4G promotes one

301 of these processes. H4G lost its nucleolar localization in H4G-positive cell lines when

302 the RNA polymerase I (Pol I) inhibitor,Draft actinomycin D, was added to the culture

303 media. At the same time, exposure to mTOR inhibitors such as rapamycin, known to

304 repress ribosome proteins and 5S rRNA synthesis, had no observable effect (Long et

305 al. 2019). Therefore, it appears that H4G specifically affects Pol I-dependent rRNA

306 transcription. This was confirmed when significant decreases in 18S and 28S rRNA

307 levels were observed in H4G-KO MCF7 cells, where a retarded protein synthesis rate

308 was also detected (Long et al. 2019). This is likely a direct consequence of diminished

309 ribosome biogenesis, which explains why H4G-expressing WT MCF7 tumors grow

310 faster in mouse xenograft models and further alludes to the role of H4G in breast

311 cancer proliferation.

312

313 At the molecular level, many would consider the formation of nucleosomes as a

314 fundamental role of histones. However, H4G gives another surprise because it cannot

315 form classical nucleosomes both in vitro and in cell, even with the help of the histone

© The Author(s) or their Institution(s) Genome Page 14 of 28

316 chaperoning function of NPM1 (Long et al. 2019; Pang et al. 2020). FRAP analysis of

317 H4G mobility revealed that it has a recovery half-life comparable to that of histone

318 chaperones of under 16 sec (Pang et al. 2020). This is a remarkably short time for

319 histones, which have a FRAP recovery time usually taking tens of minutes (Harada et

320 al. 2018; Kujirai et al. 2016). Curiously, there is evidence showing the participation of

321 H4G in NPM1-mediated nucleosome loading in vitro, which resulted in the formation

322 of a transient nucleoprotein complex that promptly dissociates. Finally, coupled with

323 the observation that H4G-expressing cells have a more relaxed nucleolar chromatin

324 structure than H4G-KOs, a model was proposed recently to explain the molecular

325 mechanism of H4G in breast cancer cells (Figure 5). In the nucleolus of H4G-positive

326 breast cancer cells, H4G likely competes with canonical H4 for loading onto the

327 nucleolar chromatin through its Draft interaction with the nucleolar histone chaperone

328 NPM1. The H4G-containing nucleoprotein complex is unstable and quickly

329 dissociates. The reduced nucleosome occupancy then leads to a more relaxed

330 nucleolar chromatin conformation that favors Pol I access and rRNA transcription. As

331 a result, ribosome biogenesis and protein synthesis are enhanced, which ultimately

332 drives cancer cell proliferation (Pang et al. 2020).

333

334 4. Perspective

335

336 In this review, we discuss primate-specific histone variants and their involvement in

337 spermatogenesis and nucleolar chromatin destabilization in breast cancer cells.

338 Currently, detailed functions of most of these histone variants remain uncharacterized

339 due to their low expression, niche cell-type specificities, and the limited availability of

340 antibodies. To move forward, there is a need to take advantage of recent technological

© The Author(s) or their Institution(s) Page 15 of 28 Genome

341 advancements in single-cell RNA sequencing and mass spectroscopy. These

342 advancements promise higher detection sensitivities and, at the same time, provide an

343 opportunity to identify expression profiles of histone variants in rare cell populations

344 such as those from spermatogenesis. Finally, since the primate-specific nature of these

345 histone variants precludes the use of mouse models, to further our understanding of

346 the localizations and physiological functions of these histone variants, which are often

347 involved in developmental processes, as well as their relationships with human

348 diseases, the establishment of reliable human-cell-based organoid models should be

349 given high priority.

350

351 5. Acknowledgements

352 Draft

353 This work was supported by a grant from the Research Grants Council of the Hong

354 Kong SAR (16104917, 26100214 and C7007-17G) to TI.

355

© The Author(s) or their Institution(s) Genome Page 16 of 28

356 6. References

357

358 Akhmanova, A., Miedema, K., and Hennig, W. 1996. Identification and 359 characterization of the Drosophila histone H4 replacement gene. FEBS 360 Lett 388(2-3): 219-222. doi:10.1016/0014-5793(96)00551-0. 361 362 Ausio, J., Zhang, Y., and Ishibashi, T. 2016. Histone Variants and 363 Posttranslational Modifications in Spermatogenesis and Infertility. In 364 Epigenetic Biomarkers and Diagnostics Edited by G.-G. JL. Academic 365 Press. pp. 479-496. 366 367 Boulard, M., Gautier, T., Mbele, G.O., Gerson, V., Hamiche, A., Angelov, D., et 368 al. 2006. The NH2 tail of the novel histone variant H2BFWT exhibits 369 properties distinct from conventional H2B with respect to the assembly 370 of mitotic chromosomes. Mol Cell Biol 26(4): 1518-1526. 371 doi:10.1128/MCB.26.4.1518-1526.2006. 372 373 Brown, G.R., Hem, V., Katz, K.S.,Draft Ovetsky, M., Wallin, C., Ermolaeva, O., et al. 374 2015. Gene: a gene-centered information resource at NCBI. Nucleic 375 Acids Res 43(Database issue): D36-42. doi:10.1093/nar/gku1055. 376 377 Buschbeck, M., and Hake, S.B. 2017. Variants of core histones and their roles 378 in cell fate decisions, development and cancer. Nature reviews 379 Molecular cell biology 18(5): 299. doi:10.1038/nrm.2016.166. 380 381 Churikov, D., Siino, J., Svetlova, M., Zhang, K., Gineitis, A., Morton Bradbury, 382 E., et al. 2004. Novel human testis-specific histone H2B encoded by the 383 interrupted gene on the . Genomics 84(4): 745-756. 384 doi:10.1016/j.ygeno.2004.06.001. 385 386 Ederveen, T.H., Mandemaker, I.K., and Logie, C. 2011. The human histone H3 387 complement anno 2011. Biochim Biophys Acta 1809(10): 577-586. 388 doi:10.1016/j.bbagrm.2011.07.002. 389 390 Feric, M., Vaidya, N., Harmon, T.S., Mitrea, D.M., Zhu, L., Richardson, T.M., 391 et al. 2016. Coexisting Liquid Phases Underlie Nucleolar 392 Subcompartments. Cell 165(7): 1686-1697. doi:10.1016/j.cell.2016.04.047. 393 394 Harada, A., Maehara, K., Ono, Y., Taguchi, H., Yoshioka, K., Kitajima, Y., et al. 395 2018. Histone H3.3 sub-variant H3mm7 is required for normal skeletal

© The Author(s) or their Institution(s) Page 17 of 28 Genome

396 muscle regeneration. Nat Commun 9(1): 1400. doi:10.1038/s41467-018- 397 03845-1. 398 399 Holmes, W.F., Braastad, C.D., Mitra, P., Hampe, C., Doenecke, D., Albig, W., 400 et al. 2005. Coordinate control and selective expression of the full 401 complement of replication-dependent histone H4 genes in normal and 402 cancer cells. J Biol Chem 280(45): 37400-37407. 403 doi:10.1074/jbc.M506995200. 404 405 Kamakaka, R.T., and Biggins, S. 2005. Histone variants: deviants? Genes Dev 406 19(3): 295-310. doi:10.1101/gad.1272805. 407 408 Kato, Y., Shiraishi, K., and Matsuyama, H. 2014. Expression of testicular 409 androgen receptor in non-obstructive azoospermia and its change after 410 hormonal therapy. Andrology 2(5): 734-740. doi:10.1111/j.2047- 411 2927.2014.00240.x. 412 413 Kayne, P.S., Kim, U.J., Han, M., Mullen, J.R., Yoshizaki, F., and Grunstein, M. 414 1988. Extremely conservedDraft histone H4 N terminus is dispensable for 415 growth but essential for repressing the silent mating loci in yeast. Cell 416 55(1): 27-39. doi:10.1016/0092-8674(88)90006-2. 417 418 Kujirai, T., Horikoshi, N., Xie, Y., Taguchi, H., and Kurumizaka, H. 2017. 419 Identification of the amino acid residues responsible for stable 420 nucleosome formation by histone H3.Y. Nucleus 8(3): 239-248. 421 doi:10.1080/19491034.2016.1277303. 422 423 Kujirai, T., Horikoshi, N., Sato, K., Maehara, K., Machida, S., Osakabe, A., et 424 al. 2016. Structure and function of human histone H3.Y nucleosome. 425 Nucleic Acids Res 44(13): 6127-6141. doi:10.1093/nar/gkw202. 426 427 Lee, J., Park, H.S., Kim, H.H., Yun, Y.J., Lee, D.R., and Lee, S. 2009. Functional 428 polymorphism in H2BFWT-5'UTR is associated with susceptibility to 429 male infertility. J Cell Mol Med 13(8B): 1942-1951. doi:10.1111/j.1582- 430 4934.2009.00830.x. 431 432 Long, M., Sun, X., Shi, W., Yanru, A., Leung, S.T.C., Ding, D., et al. 2019. A 433 novel histone H4 variant H4G regulates rDNA transcription in breast 434 cancer. Nucleic Acids Res 47(16): 8399-8409. doi:10.1093/nar/gkz547. 435 436 Mito, Y., Henikoff, J.G., and Henikoff, S. 2005. Genome-scale profiling of 437 histone H3.3 replacement patterns. Nat Genet 37(10): 1090-1097. 438 doi:10.1038/ng1637.

© The Author(s) or their Institution(s) Genome Page 18 of 28

439 440 Mitrea, D.M., Cika, J.A., Guy, C.S., Ban, D., Banerjee, P.R., Stanley, C.B., et al. 441 2016. Nucleophosmin integrates within the nucleolus via multi-modal 442 interactions with proteins displaying R-rich linear motifs and rRNA. 443 Elife 5. doi:10.7554/eLife.13571. 444 445 Molden, R.C., Bhanu, N.V., LeRoy, G., Arnaudo, A.M., and Garcia, B.A. 2015. 446 Multi-faceted quantitative proteomics analysis of histone H2B isoforms 447 and their modifications. Epigenetics Chromatin 8: 15. 448 doi:10.1186/s13072-015-0006-8. 449 450 Montellier, E., Boussouar, F., Rousseaux, S., Zhang, K., Buchou, T., Fenaille, 451 F., et al. 2013. Chromatin-to-nucleoprotamine transition is controlled 452 by the histone H2B variant TH2B. Genes Dev 27(15): 1680-1692. 453 doi:10.1101/gad.220095.113. 454 455 Pang, M.Y.H., Sun, X., Ausió, J., and Ishibashi, T. 2020. Histone H4 variant, 456 H4G, drives ribosomal RNA transcription and breast cancer cell 457 proliferation by looseningDraft nucleolar chromatin structure. J Cell Physiol. 458 doi:10.1002/jcp.29770. 459 460 Papatheodorou, I., Moreno, P., Manning, J., Fuentes, A.M., George, N., 461 Fexova, S., et al. 2020. Expression Atlas update: from tissues to single 462 cells. Nucleic Acids Res 48(D1): D77-D83. doi:10.1093/nar/gkz947. 463 464 Park, Y.J., and Luger, K. 2006. The structure of nucleosome assembly protein 465 1. Proc Natl Acad Sci U S A 103(5): 1248-1253. 466 doi:10.1073/pnas.0508002103. 467 468 Quilter, C.R., Karcanias, A.C., Bagga, M.R., Duncan, S., Murray, A., Conway, 469 G.S., et al. 2010. Analysis of X chromosome genomic DNA sequence 470 copy number variation associated with premature ovarian failure 471 (POF). Hum Reprod 25(8): 2139-2150. doi:10.1093/humrep/deq158. 472 473 Rafatmanesh, A., Nikzad, H., Ebrahimi, A., Karimian, M., and Zamani, T. 474 2018. Association of the c.-9C>T and c.368A>G transitions in H2BFWT 475 gene with male infertility in an Iranian population. Andrologia 50(1). 476 doi:10.1111/and.12805. 477 478 Resnick, R., Wong, C.J., Hamm, D.C., Bennett, S.R., Skene, P.J., Hake, S.B., et 479 al. 2019. DUX4-Induced Histone Variants H3.X and H3.Y Mark DUX4 480 Target Genes for Expression. Cell Rep 29(7): 1812-1820.e1815. 481 doi:10.1016/j.celrep.2019.10.025.

© The Author(s) or their Institution(s) Page 19 of 28 Genome

482 483 Santisteban, M.S., Arents, G., Moudrianakis, E.N., and Smith, M.M. 1997. 484 Histone octamer function in vivo: mutations in the dimer-tetramer 485 interfaces disrupt both gene activation and repression. EMBO J 16(9): 486 2493-2506. doi:10.1093/emboj/16.9.2493. 487 488 Scheer, U., and Hock, R. 1999. Structure and function of the nucleolus. Curr 489 Opin Cell Biol 11(3): 385-390. doi:10.1016/S0955-0674(99)80054-4. 490 491 Schenk, R., Jenke, A., Zilbauer, M., Wirth, S., and Postberg, J. 2011. H3.5 is a 492 novel hominid-specific histone H3 variant that is specifically expressed 493 in the seminiferous tubules of human testes. Chromosoma 120(3): 275- 494 285. doi:10.1007/s00412-011-0310-4. 495 496 Shiraishi, K., Shindo, A., Harada, A., Kurumizaka, H., Kimura, H., Ohkawa, 497 Y., et al. 2018. Roles of histone H3.5 in human spermatogenesis and 498 spermatogenic disorders. Andrology 6(1): 158-165. 499 doi:10.1111/andr.12438. 500 Draft 501 Singh, R., Bassett, E., Chakravarti, A., and Parthun, M.R. 2018. Replication- 502 dependent histone isoforms: a new source of complexity in chromatin 503 structure and function. Nucleic Acids Res 46(17): 8665-8678. 504 doi:10.1093/nar/gky768. 505 506 Ueda, J., Harada, A., Urahama, T., Machida, S., Maehara, K., Hada, M., et al. 507 2017. Testis-Specific Histone Variant H3t Gene Is Essential for Entry 508 into Spermatogenesis. Cell Rep 18(3): 593-600. 509 doi:10.1016/j.celrep.2016.12.065. 510 511 Urahama, T., Harada, A., Maehara, K., Horikoshi, N., Sato, K., Sato, Y., et al. 512 2016. Histone H3.5 forms an unstable nucleosome and accumulates 513 around transcription start sites in human testis. Epigenetics Chromatin 514 9: 2. doi:10.1186/s13072-016-0051-y. 515 516 Wakamori, M., Fujii, Y., Suka, N., Shirouzu, M., Sakamoto, K., Umehara, T., et 517 al. 2015. Intra- and inter-nucleosomal interactions of the histone H4 tail 518 revealed with a human nucleosome core particle with genetically- 519 incorporated H4 tetra-acetylation. Sci Rep 5: 17204. 520 doi:10.1038/srep17204. 521 522 Wiedemann, S.M., Mildner, S.N., Bonisch, C., Israel, L., Maiser, A., Matheisl, 523 S., et al. 2010. Identification and characterization of two novel primate-

© The Author(s) or their Institution(s) Genome Page 20 of 28

524 specific histone H3 variants, H3.X and H3.Y. J Cell Biol 190(5): 777-791. 525 doi:10.1083/jcb.201002043. 526 527 Ying, H.Q., Scott, M.B., and Zhou-Cun, A. 2012. Relationship of SNP of 528 H2BFWT gene to male infertility in a Chinese population with 529 idiopathic spermatogenesis impairment. Biomarkers 17(5): 402-406. 530 doi:10.3109/1354750X.2012.677066. 531 532 Yuen, B.T., Bush, K.M., Barrilleaux, B.L., Cotterman, R., and Knoepfler, P.S. 533 2014. Histone H3.3 regulates dynamic chromatin states during 534 spermatogenesis. Development 141(18): 3483-3494. 535 doi:10.1242/dev.106450. 536 537 Zink, L.M., Delbarre, E., Eberl, H.C., Keilhauer, E.C., Bönisch, C., Pünzeler, S., 538 et al. 2017. H3.Y discriminates between HIRA and DAXX chaperone 539 complexes and reveals unexpected insights into human DAXX-H3.3- 540 H4 binding and deposition requirements. Nucleic Acids Res 45(10): 541 5691-5706. doi:10.1093/nar/gkx131. 542 Draft 543

544

545

546

© The Author(s) or their Institution(s) Page 21 of 28 Genome

547 Figure legends

548

549 Figure 1. Primary Sequence Alignments of Primate-specific Histone Variants

550 Amino acid sequences of primate-specific histone variants and their respective

551 canonical histone references were retrieved from the UniProt database and aligned

552 using Clustal Omega (1.2.4) with default settings. Highlighted positions represent

553 unconserved amino acid residues. Diagrams above alignments outline positions of

554 terminal tails and helixes of the canonical histone reference (Wakamori et al. 2015)

555 (PDB: 5AV6).

556

557 Figure 2. H2BFWT SNP is related to male infertility in humans.

558 The panel is modified from (Ying etDraft al. 2012).

559

560 Figure 3. Schematic diagram of histone H3 variants expression patterns in

561 spermatogenesis.

562 The panel explains how histone variants and protamines are exchanged and proposed

563 regulatory role of H3.5 during spermatogenesis. This figure is modified from

564 (Shiraishi et al. 2018).

565

566 Figure 4. H3.X and H3.Y expressions in the human brain.

567 Human hippocampus sections were costained with α-H3.X/H3.Y (red), α-NeuN

568 (neuronal marker, green), α-GFAP (astrocyte marker, white), and DAPI (DNA, blue).

569 Bars, 20 μm. This figure is modified from (Wiedemann et al. 2010).

570

571 Figure 5. A functional model for H4G in nucleoli.

© The Author(s) or their Institution(s) Genome Page 22 of 28

572 In normal cells, NPM1 maintains normal nucleosome occupancy by assisting the

573 loading of canonical histone octamer onto the nucleoli chromatin. The compact

574 heterochromatin formed represses Pol I transcription; in breast cancer cells expressing

575 H4G, however, in addition to loading canonical nucleosomes, NPM1 also facilitates

576 the formation of H4G-containing nucleoprotein complex that swiftly dissociates. This

577 upsets the balance and results in a much more relaxed nucleoli chromatin

578 configuration that promotes rRNA transcription and, by extension, protein synthesis,

579 and cell proliferation. This figure is modified from (Pang et al. 2020).

580

Draft

© The Author(s) or their Institution(s) Page 23 of 28 Genome

Draft

Figure 1

© The Author(s) or their Institution(s) Genome Page 24 of 28

Fertile men Infertile patients p value Locus Allele Total (n=209) Total (n=409) C 0.584 (122) 0.472 (193) −9C>T T 0.416 (87) 0.528 (216) 0.009 A 0.675 (141) 0.570 (233) 368A>G G 0.325 (68) 0.430 (176) 0.012

Draft

© The Author(s) or their Institution(s) Page 25 of 28 Genome

Draft

Figure 3

© The Author(s) or their Institution(s) Genome Page 26 of 28

DraftFigure 4

© The Author(s) or their Institution(s) Page 27 of 28 Genome

Figure 5

Draft

© The Author(s) or their Institution(s) Genome Page 28 of 28

Table 1. An overview of histone variants discussed in this review

Location Alternative NCBI Alternative Protein Uniprot Accession Primate- Gene Name in Human Protein Name Biological Significance References Gene Name Gene ID Name Number specific? Genome

Churikov et al. 2004 H2B histone family, - SNPs correlated with male- H2B.W Histone H2B infertility Boulard et al. 2006 member W, testis- Q7Z2G1 histone 1 H2BFWT 158983 Xq22.2 type W-T specific Yes Lee et al. 2009 (H2BWT_HUMAN) - Enriched in telomere and (H2BW1) (H2BWT) H2BFWT may have telomere-related Quilter et al. 2010 functions Rafatmanesh et al. 2018

H2B histone family member M H2B.W Histone H2B H2BM H2B.M histone P0C1H6 histone 2 286436 Xq22.2 type F-M No - Unknown Quilter et al. 2010 H2BFM (H2BFM_HUMAN) (H2BW2) (H2BFM) H2B/s Histone H2B.s

H3 histone family, - Replace canonical H3 during H3t member T spermatogenesis H3.4 H3.4 histone Histone H3.1t H3/t Q16695 - Generate flexible H3/g 8290 1q42.13 No nucleosome/chromatin Ueda et al. 2017 (H3-4) (H3.1t) (H31T_HUMAN) H3FT Histone H3.4 structure HIST3H3 - Allow male germ cells to Histone cluster 3 H3 enter meiosis and beyond

H3 histone family - Accumulated around TSSs in member 3C testes during spermatogenesis H3.5 histone H3.5 Histone H3.3C H3 histone, family 3C Q6NXT2 - Diminish nucleosome 440093 12p11.21 Yes stability Urahama et al, 2016 (H3-5) H3F3C (H3.3C) (H3C_HUMAN) Histone H3.5 - May induce open chromatin confirmation during early Histone variant H3.5 spermatogenesis

PutativeDraft histone H3.X H3.Y histone 2 H3.X Histone H3.X Histone H3.Y2 P0DPK5 - Together with H3.Y may 340096 5p15.1 Yes facilitate the expression of Resnick et al. 2019 (H3Y2) H3.Y.2 (H3.X) Histone cluster 2, H3d (H3Y2_HUMAN) DUX4 target genes pseudogene

- Expressed in the brain Histone H3.Y - Together with H3.X may H3.Y histone 1 H3.Y Histone H3.Y Histone H3.Y1 P0DPK2 facilitate the expression of Resnick et al. 2019 391769 5p15.1 Yes DUX4 target genes (H3Y1) H3.Y.1 (H3.Y) (H3Y1_HUMAN) Wiedemann et al. 2010 Histone cluster 2, H3c - May involved in the pseudogene regulation of cellular responses to cellular stimuli

H4 histone family, member L - Overexpressed in breast Histone H4- cancer tissues in a cancer- H4 clustered H4/l stage-dependent manner like protein Histone 1, H4g Q99525 Long et al. 2019 histone 7 H4FL 8369 6p22.2 Yes type G Histone cluster 1 H4 (H4G_HUMAN) - Nucleolar localization Pang et al. 2020 (H4C7) HIST1H4G (H4G) family member g - Promote rRNA transcription Histone cluster 1, H4g and cell proliferation

© The Author(s) or their Institution(s)