bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Single cell analysis reveals X upregulation is not global and 2 primarily belongs to ancestral in pre-gastrulation embryos 3 Naik C H, Chandel D, and Gayen S* 4 Department of Molecular Reproduction, Development and Genetics, Indian Institute of Science, 5 Bangalore-560012, India. 6 *Correspondence: [email protected] 7 8 Abstract 9 Recent studies have provided substantial evidence supporting Ohno's hypothesis that 10 upregulation of active genes balances the dosage of X-linked expression 11 relative to autosomal genes. However, the dynamics of X-chromosome upregulation (XCU) 12 during early development remain poorly Pre-gastrulation embryos E6.50 13 understood. Here, we have profiled the E6.25 E5.5 14 dynamics of XCU in different lineages of 15 female pre-gastrulation mouse embryos at 16 single cell level through allele-specific single 17 cell RNA-seq analysis. We found dynamic 18 XCU upon initiation of random X-

Epiblast cells 19 chromosome inactivation (XCI) in epiblast Onset of Random X-inactivation 20 cells and cells of extraembryonic lineages, 21 which undergo imprinted XCI, also harbored 22 upregulated active-X chromosome. On the 23 other hand, the extent of XCU remains 24 controversial till date. While it is thought that

Upregulation is mostly confined to 25 it is a global phenomenon, some studies Ancestral genes 26 suggested that it affects dosage sensitive genes 27 only. Interestingly, through profiling gene- Not Upregulated Genes Upregulated Genes28 wise dynamics of XCU, we found that X- 29 upregulation is not global and primarily 30 belongs to ancestral X-linked genes. 31 However, it is not fully restricted to ancestral Higher burst 32 X-linked genes as some of newly acquired X- frequency 33 linked genes also undergo XCU, suggesting Acquired X-linked genes H3K4me3 RNA PolII H4K16a34c evolution of XCU to the newly acquired genes Ancestral X-linked genes H3K36me3 35 as well. Importantly, we found that occupancy Xa: Active X Xi: Inactive X Xu: Upregulated X 36 of RNApolII, H3k4me3, H4k16ac and 37 H3K36me3 is enhanced at loci of the 38 upregulated/ancestral genes compared to non-upregulated/newly acquired X-linked genes. 39 Moreover, upregulated X-linked genes showed increased burst frequency compared to the non- 40 upregulated genes. Altogether, our study provides significant insight into the gene-wise 41 dynamics, mechanistic basis, and evolution of XCU during early development. 42 43 Keywords: X chromosome upregulation, X chromosome inactivation, Sex chromosome 44 evolution, Dosage compensation, Y degeneration, Pre-gastrulation. 45 1

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

46 Introduction 47 In therian mammals, sex is determined by the sex : XX in female and XY in male. 48 During the evolution of sex chromosomes from a pair of autosomes, the Y chromosome's 49 degradation led to the dosage imbalance between X and autosomal genes in males [1]. In 1967, 50 Ohno hypothesized that the expression of X-linked genes in XY cells was upregulated to two- 51 fold to correct this imbalance [2]. Subsequently, this X-chromosome upregulation (XCU) was 52 inherited in females and thereby introduced an extra dosage of the X chromosome in female 53 cells. Therefore, to counteract the extra dosage from X-chromosome in female cells, the 54 evolution of X-chromosome inactivation (XCI) happened, a process that silent one of the X- 55 chromosome in female mammals [3]. However, Ohno's hypothesis was not accepted well for 56 a long time due to the lack of proper experimental evidence supporting the existence of 57 upregulated active-X chromosome. The first evidence of X-upregulation came through the 58 studies based on microarray analysis; however, subsequently, it was challenged through RNA- 59 seq based analysis [4–9]. Down the line, several independent studies came out both in 60 supporting and refuting Ohno's hypothesis [10–19]. Although, Ohno’s hypothesis is still 61 contested, recent studies by us and others have shown the presence of upregulated active-X 62 chromosome (X2a) in different cell types in vivo and in vitro [20]. However, the dynamics of 63 X-chromosome upregulation (XCU) during early development remain poorly understood. 64 Here, we have profiled the dynamics of XCU in different lineages of female pre-gastrulation 65 mouse embryos at single cell level through allele-specific single cell RNA-seq analysis. 66 On the other hand, the extent of XCU, i.e., whether XCU is chromosome wide like XCI or 67 restricted to specific genes, remains controversial. Many studies implicated that XCU might 68 be global, though direct evidence showing gene-wise dynamics of upregulation is not well 69 understood [11,14]. In contrary, several studies implicated that XCU affects dosage-sensitive 70 genes such as components of macromolecular complexes, signal transduction pathways or 71 encoding for transcription factors [21,22]. To get better insight about this, here we have 72 explored gene-wise dynamics of XCU in pre-gastrulation mouse embryos. We found that XCU 73 is neither global nor restricted to dosage sensitive genes only. Furthermore, we have 74 investigated if XCU is related to the evolutionary timeline of X-linked genes. Based on the 75 evolutionary time point X-linked genes can be categorized into two classes: ancestral X-linked 76 genes, which were present originally from the beginning of sex chromosome evolution and 77 newly acquired X-linked genes. Whether X-chromosome upregulation happens to the newly 78 acquired X-linked genes or restricted to mostly in ancestral genes is another intriguing 79 question. Here, we have analyzed the XCU kinetics in ancestral vs. newly acquire X-linked 80 genes to understand the evolution of XCU. Finally, we have explored the mechanistic basis of 81 XCU. 82 83 Results

84 Dynamic active-X upregulation upon random XCI in epiblast cells of pre-gastrulation 85 embryos 86 In pre-gastrulation mouse embryos, while extraembryonic cells have imprinted inactive X, 87 epiblast cells are at the onset of random XCI [23–27]. Therefore, to investigate the dynamics 88 of XCU, we segregated the cells of E5.5, E6.25, and E6.5 mouse embryos into the three 89 lineages: epiblast (EPI), extraembryonic ectoderm (ExE), and visceral endoderm (VE) based 90 on lineage-specific marker by t-distributed stochastic neighbour embedding 91 (t-SNE) analysis as described earlier (Fig. 1A). We investigated status of XCU in these 2

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

92 93 Figure 1: Dynamic X-chromosome upregulation in different lineages of pre-gastrulation embryos. (A) 94 Schematic outline of the workflow: profiling of active X-upregulation in different lineages (EPI: Epiblast, ExE: 95 Extraembryonic ectoderm, and VE: Visceral endoderm) of pre-gastrulation hybrid mouse embryos (E5.5, E6.25, 96 and E6.50) at the single-cell level using published scRNA-seq dataset. Hybrid mouse embryos were obtained from 97 crossing between two divergent mouse strains C57 and CAST. (B) Classification of cells based on XCI state 98 through profiling of fraction maternal expression of X-linked genes in the single cells of different lineages of pre- 99 gastrulation embryos (EPI, ExE, and VE). Ranges of fraction of maternal expression was considered for different 100 category cells: XaXa = 0.4-0.6, XaXp= 0.6-0.9/ 0.1-0.4 and XaXi = 0.9-1/0-0.1. (C) X:A ratios represented as violin 3

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

101 plots for the XaXa, XaXp, XaXi cells of different lineages of pre-gastrulation embryos. (D) Allelic expression 102 levels of X-linked and autosomal genes in XaXa, XaXp, XaXi cells of different lineages of pre-gastrulation 103 embryos. 104 105 different lineages of pre-gastrulation embryos through performing allelic/non-allelic gene 106 expression analysis using the available scRNA-seq dataset of E5.5, E6.25, and E6.5 hybrid 107 mouse embryos [28] (Fig. 1A). These embryos were derived from two divergent mouse strains 108 (C57BL/6J and CAST/EiJ) and therefore harbored polymorphic sites across the genome, 109 allowing us to profile gene expression with allelic resolution (Fig. 1A). First, we categorized 110 the cells of different lineages of the embryos based on their XCI status: cells with no XCI 111 (XaXa), partial or undergoing XCI (XaXp), and complete XCI (XaXi) through profiling the 112 fraction of maternal allele expression (Fig. 1B). We found that in the EPI lineage lot of cells 113 belongs to XaXp/XaXa indicating these cells are onset of random X-inactivation, whereas cells 114 of VE/ExE lineage mostly belong to XaXi category since these cells underwent of imprinted 115 X-inactivation (Fig. 1B). As expected, autosomal genes showed almost equivalent 116 paternal/maternal allele expression, thus validating our allele-specific analysis (Fig S1A). 117 Next, to check the upregulation dynamics, we profiled X:A ratio in the individual cells of 118 different stages/lineages of embryos. If a diploid female cell (XaXi) has upregulated active X, 119 the X:A ratio should be more than 0.5 and closer to 1. We found that X:A ratio of XaXp/XaXi 120 cells is always > 0.5 and close to 1 despite XCI, indicating the presence of dynamic X- 121 upregulation from the active X chromosome in XaXp/XaXi cells (Fig. 1C). For X:A analysis, 122 we made sure that there are no significant differences between the X-linked and autosomal 123 gene expression distribution (Method; Supplementary file 1). Next, we compared the allelic 124 expression pattern from autosome and X-chromosome in individual cells of each lineage. As 125 expected, we found that the active X expression is always significantly higher than the 126 autosomal allelic expression in XaXp/XaXi cells, corroborating upregulation of gene expression 127 from the active X -chromosome (Fig. 1D). On the other hand, there were no significant 128 differences in active Xs and autosomal allelic expression in XaXa cells of epiblast lineage 129 suggesting no upregulation of X-chromosome in absence of XCI (Fig. 1D). Altogether, these 130 analyses suggested dynamic active XCU upon initiation of random XCI in epiblast cells of pre- 131 gastrulation embryos. 132 133 X-chromosome upregulation is primarily confined to the ancestral X-linked genes 134 Our X:A analysis in pre-gastrulation embryos and differentiation mESCs revealed that the X:A 135 ratio of XaXi cells is always lower compared to the XaXa cells (Fig. 1C). This data hinted that 136 X-upregulation of X-linked genes in XaXi cells is partial or all genes do not undergo 137 upregulation. To explore this further, we investigated whether X-chromosome upregulation 138 occurs globally or restricted to certain class of genes. To test this, we profiled gene-wise 139 dynamics of XCU by comparing the expression of X-linked genes from the active X 140 chromosome of XaXi cells with the same active allele of XaXa cells in EPI E6.5 (Fig. 2A). If 141 active allele of a gene is upregulated in XaXi cells, it will show increased expression from the 142 active allele in XaXi cells compared to the same active allele of XaXa cells. We found that while 143 many X-linked genes showed increased expression from the active-X allele in XaXi cells 144 compared to the XaXa, there were significant number of genes which did not show such 145 increased expression suggesting that all genes do not undergo upregulation (Fig. 2A). 146 Moreover, while some genes showed robust upregulation from the active-X allele, the others 147 were moderately upregulated. Altogether, this result suggested that X-upregulation is not

4

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A EPI E6.50 Fig. 2

High e

Active cells Inactive cells Active cells Inactive cells g

a r C57 CAST C57 e C57 CAST )

CAST v

CAST 1

C57 a

+

t

d

n

e z X u X X X i i a a l

Xa Xa X Xi a o a

a c

m

d

r

a

o

e

r

N

(

2

g

o l

Low

n

n

o

o

i

i

t

t

a

a

l

l

u

u

g

g

e

e

r

r

p

p

U

U

t

t

s

s

u

u

b

b

o

o

R R

B

80 7780 77 EPI 6.5

60 60

n

s

e

o i n 45

t 45

e

l

g

a

f 40 40

o

u

o

g 24 24

N e r 20 20 13 13 13 13

p 9 9

U n

1 1

o

e i

t 0 0

t l

a No Upregulation CAST

a

r u

e No Upregulation C57 g

d Upregulation C57

e o

r Upregulation CAST M p 100 51000 0 50 0

U No of genes

e

t

a

r

e

d

o M

n

o

i

t

a

l

u

g

e

r

p

U

n

o

o

i

t

N

a

l

u

g

e

r

p

U

o

N

148 149 Figure 2: X-chromosome upregulation is not global. (A) Comparison of expression (log normalized reads) of 150 individual X-linked genes from the active allele between the XaXa and XaXi EPI cells of E6.50 embryos (B) Plot 151 representing intersections of allele-wise upregulation of X-linked genes in E6.5 EPI cells.

5

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

152 global or do not occur chromosome-wide. On the other hand, surprisingly we found that an 153 adequate number of upregulated X-linked genes showed allele-specificity as they showed 154 upregulation from either C57 or CAST as an active-X suggesting regulation of upregulation of 155 active allele may occur in parent of origin specific manner (Fig. 2B). 156 Next, we investigated the mechanistic basis of why some genes undergo upregulation and 157 others are not. To find this, we categorized the X-linked genes in E6.5 EPI to three categories 158 based on their active X-upregulation status. X-linked genes with robust upregulation from both 159 C57 and CAST allele (Category 1), robust upregulation from one allele but moderate from the 160 other allele (Category 2) and no upregulation from either of the allele (Category3) (Fig. 3A). 161 First, we explored if there any relevance of the location of the genes on the X-chromosome 162 with the upregulation. To test this, we profiled the position of the upregulated (Cat1 and Cat 2) 163 and non-upregulated genes (Cat 3). We found that all these three categories’ genes are located 164 across the X-chromosome and there is no specific pattern of position of upregulated vs non- 165 upregulated genes indicating chromosomal positioning of X-linked genes is not relevant to X- 166 upregulation (Fig 3B). Next, we asked if there any differences of expression level of these three 167 classes of genes and found no significant differences (Fig. 3C). Moreover, the X-linked genes 168 in the three categories did not show any distinction with respect to their location in 169 topologically associated domains (TADs) (Fig. 3D). As the retrotransposon elements play a 170 profound role in gene regulation, we wanted to see if the genes undergoing upregulation from 171 active-X chromosome has any link to the retrotransposon enrichment. However, the three 172 categories of X-linked genes showed no correlation with the density of LINE elements (Fig. 173 3E). SINEs on the other hand, were significantly enriched in not upregulated (Cat3) genes 174 compared to the robustly upregulated (Cat1) genes (Fig. 3E). Previous reports stated that XCU 175 is restricted only to dosage-sensitive genes such as genes encoding for large complexes, 176 transcription factors, involve in signal transduction etc. Therefore, we analyzed if the 177 upregulated genes in EPI E6.5 is mostly belonging such dosage sensitive genes or not. 178 However, we found that while there are some upregulated genes fall into dosage sensitive 179 category, many genes are not (Supplementary file 2). Moreover, there are many dosage 180 sensitive genes among the non-upregulated genes. Altogether, there are no significant 181 differences of distribution of dosage sensitive / insensitive genes among the upregulated or 182 non-upregulated genes suggesting XCU is not restricted to dosage sensitive genes only 183 (Supplementary file 2). Next, we collated the three categories of X-linked genes with their one- 184 to-one orthologs in chicken (ancestral X-linked genes) and found that most of the robustly 185 upregulated (Cat1) and moderately upregulated (Cat2) genes belong to this ancestral class of 186 X-linked genes (79% in Cat1 and 60% in Cat2) and only a few belong to the class of acquired 187 X-linked genes. While the genes that showed no upregulation had equivalent distribution to 188 two classes of ancestral and acquired X-linked genes (Fig. 3F). Altogether, these data suggest 189 that the process of active-X upregulation is mostly confined to the ancestral class of X-linked 190 genes (Fig. 3F). 191 192 Upregulated / ancestral X-linked genes show higher enrichment of active chromatin 193 marks 194 Previous studies have reported that the X linked genes from active X-chromosome have greater 195 enrichment of the active chromatin marks compared to the autosomal genes [14]. Therefore, 196 we investigated if the different active chromatin marks (H3K4me3, H3K36me3, H4K16ac and 197 RNA PolII) are differentially enriched at the transcriptional start site and gene body of 198 upregulated (Cat1 and Cat2) vs not upregulated (Cat3) genes on the active X-chromosome (Fig.

6

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A Fig. 3 C57 CAST Cat1 Genes with robust upregulation from C57 active X CAST and CAST active X Cat1 Cat1 C57 Cat2 Genes with robust upregulation from one allele and Cat2 Cat2 moderate from other (C57 or CAST) Cat3 Xi Cat3 Xi Cat3 X Genes with no upregulation from C57 active X and a Xa CAST active X

B Upregulation (Cat1) Upregulation (Cat2) No upregulation (Cat3) Xist v

Chromosome X C EPI 6.5 EPI 6.5 8 EPI 6.5 M E Line

P Sine T

4 ns 2

g 0.3 * o L 0 0.6 n n n ns io o io ns t ti t la a la ns ns u ) l ) u g 1 u 2 g e t g t e ) 0.4 0.2 r a re a r t3 p (C p C p a U U ( U C o ( N

D

d 100 0.2 0.1

e t

80

a

s c

o 60

D

l

A

s 0 40

T 0

e

n n n n n i 20 n e n o o o o n io ti ti ti g ti o t i a a a 0 la t la l ) l ) l Category1 Category2 Category3 ) la u u 1 u 2 u % n u ) g t g t g n n o g t1 u 2 g ) e a e a e ) io o ti e a g t re 3 r r r t3 t ti a r re a p t p (C p (C p a la a l p (C p C a U U U u ) l ) u U ( U C (C g t1 u 2 g ) U o ( o e g t e 3 N r a re a r t N p (C p C p a U ( U C U o ( N EPI 6.5 Category F upregulation (Cat1) upregulation (Cat2) No upregulation (Cat3)

(Ancestral: 79.2%) (Ancestral: 60.4%) (Ancestral: 54.2%) y y y ry r r r o o o o s g g s g s g s e s te s te s te la t s a a a a a a la l C l C C C C C C C Abcb7 1110012L19Rik Rpgr A830080D01Rik

Arxes1 Acot9 Rps6ka6 Cox7b Chst7 Apln Smc1a Cstf2 Clcn5 Arxes2 Snx12 Fmr1 Cul4b Cenpi Tbl1x Fmr1nb Fancb Ctps2 Pqbp1 Med12 Gpc3 Dkc1 Ebp Ncrna00086 Mageh1 Dock11 Praf2 Nxt2 Mospd1 Fam123b Zfp280c Ofd1 Mum1l1 Fndc3c1 G6pdx Phf16 Piga Gyk Ubl4 Siah1b Prkx Hccs Dusp9 Slc7a3 Rbmx Hsd17b10 Haus7 Txlng Sh3kbp1 Itm2a Rpl10 Hdac6 Stard8 Lpar4 Hcfc1 Wdr45 Tceanc Maged2 Armcx1 Pim2 Tsc22d3 Magt1 Tceal1 Tfe3

Xiap Msn Tceal8 Otud5 Zc4h2 Pdk3 Tmsb15b2 Lage3 Cxx1b Pgrmc1 Bex2 Ngfrap1 Cxx1c Phf6 Bhlhb9 Mcart6 L1cam Pja1 Ubqln2 Bex1 Gspt2 Pls3 Apex2 Ribc1 Armcx2 Prps2 2210013O21Rik Pir Psmd10 Gnl3l Rab9 Zrsr2 Rp2h

EPI 6.5 199 200 Figure 3: X-upregulation is mostly confined to ancestral X-linked genes. (A) Shematic representation of 201 different categories of X-linked genes in E6.5 EPI based on their allele-wise upregulation status. X-linked genes 202 with robust upregulation from both C57 and CAST allele (Category 1), robust upregulation from one allele but 203 moderate from the other allele (Category 2) and no upregulation from either of the allele (Category3). (B) 204 Genomic position of the three categories of X-linked genes on the X-chromosome. (C) Comparison of expression 7

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

205 levels (log2 TPM) between the three categories of X-linked genes. (D) Histograms representing the percent of the 206 three categories of X-linked genes located in TADs. (E) Density of LINE and SINE repeats in the three categories 207 of X-linked genes. p-values of Wilcoxon rank test comparing the retrotransposon enrichment in the three 208 categories (p-value < 0.05; *) (F) Heatmap representing the segregation of the three categories of X-linked genes 209 to ancestral and acquired classes of X-linked genes in E6.5 EPI cells. 210 211 4A). Interestingly, we found that the average density of H3K4me3, H4K16ac and RNA PolII 212 showed notable increase in the upregulated X-linked genes (Cat1 and Cat2) compared to the 213 not upregulated genes (Cat3) both around the TSS and gene body regions (Fig. 4A). However, 214 for H3K36me3 there were slight difference in enrichment. Additionally, we compared the 215 enrichment of these active marks between ancestral vs acquired X-linked genes on the active 216 X-chromosome and found that the ancestral class of X-linked genes showed higher enrichment 217 for these marks compared to the newly acquired X-linked genes (Fig. 4B). Finally, we profiled 218 the allelic transcriptional burst kinetics between the upregulated and not upregulated X-linked 219 genes in the XaXi cells of E6.5 EPI. Excitingly, the upregulated X-linked genes (Cat1 and Cat2) 220 showed significantly higher burst frequency compared to the not upregulated X-linked genes 221 (Cat3) (Fig. 4C). However, burst sizes were not significantly different between the two 222 categories of X-linked genes (Fig. 4C). 223 224 Discussion 225 In 1967, Ohno hypothesized that gene loss from the Y chromosome led to the upregulation of 226 X-chromosomes in both sexes, which subsequently led to the inactivation of one X- 227 chromosome in female mammals to ensure proper X-dosage. However, Ohno's hypothesis was 228 contested for a long time due to the lack of appropriate experimental evidence. Recent studies 229 have provided evidence supporting active X upregulation in different cell types in vivo and in 230 vitro. However, the dynamics of X-upregulation during early development remain poorly 231 understood. Here, we have profiled the XCU dynamics at single cell level in female pre- 232 gastrulation mouse embryos. We found that in EPI cells (E5.5, E6.25, E6.5), which are at the 233 onset of random XCI, X upregulation is dynamically linked with the X-inactivation. VE and 234 ExE of pre-gastrulation embryos, which undergo imprinted X-inactivation also showed 235 presence of upregulated active-X chromosome. Altogether, these results suggested that two X- 236 chromosomes expression state is highly plastic in nature towards balancing the optimal X 237 chromosome dosage during early development. 238 One intriguing question is whether X-upregulation occurs globally or restricted to certain class 239 of genes. It is thought that upon X-chromosome inactivation, numerous trans-acting factors 240 leave the inactivating-X and shift to the active-X thereby leading to global increase in 241 expression from the active-X chromosome. In contrast, it is thought that XCU is restricted to 242 dosage sensitive genes only. Interestingly, we found that though XCU is dynamically linked to 243 XCI, XCU is not global like XCI. Indeed, we found that the X:A ratio of XaXi cells is always 244 lower compared to the XaXa cells. Importantly, through assaying gene-wise dynamics of XCU 245 we found that many X-linked genes do not undergo upregulation. Moreover, we found that 246 upregulated genes are not restricted to dosage sensitive genes. Altogether, these results 247 suggested that XCU is neither global nor restricted to dosage sensitive genes only. 248 Interestingly, we found that XCU is mostly confined to the ancestral class of X-linked genes. 249 However, it is not restricted to ancestral X-linked gene only as some of newly acquired X- 250 linked genes also undergo X-upregulation, suggesting evolution of X-upregulation to the newly 251 acquired genes as well. 8

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Upregulation (Cat1+Cat2) Ancestral X-linked genes A B Fig. 4 No Upregulation (Cat3) Acquired X-linked genes

H3K4me3 H3K4me3 H3K4me3 H3K4me3 4 4 1.5

y 1.5

t 3 3

i

y

s t

i 1.0 n 2 1.0

2 s

e

n e D 1 1 0.5 0.5 D C -3kb TSS +3kb -3kb TSS TES +3kb +3kb -3kb TSS -3kb TSS TES +3kb CAST allele

20.0 7.0 C57 allele )

) 14.0 10.0 ***

2 ) e 17.5 **

6.0 g 0.0

l

s

2

e

o

l

g

l

a

n

(

a e

12.0 o

r

l

t

)

g

n

y

(

15.0 s

m

2

o t

d -4.0

e c i 8.0

t 5.0

y

a

e

c

e

n

a

k

n

c

l C

f -2.5

n

e

u

A i

+ 10.0

n

(

l

g

-

1

u

t e

e 12.5

X

q

r a

u -6.0

p

X C

4.0 e

(

q

r U

- 6.0 f

8.0 e

r

e t

10.0 -5.0 f

s

v t

r -8.0 i

3.0 s

u t 6.0 r

B c 7.5 4.0 u s B a e

n d e n n 2.0 n

e F

g o r

o 4.0 i o

i

i n t ) i

t 5.0 n t u d o a 2 a a i o l t

E ) l

l i

q e ) t

2 t u a u

c k u a 3 2.0 l t a

t l g C g )

n g

A a i u e a u e 3

l + M e 1.0 g r r t - C r 2.0 g ) 1

C 2.5 e + p t p a ( X e p r r t3 a p t1 U U (C U p a C U a U C ( o o C ( ( o N N 0.0 0.0 0.0 0.0 N -3kb TSS +3kb -3kb TSS TES +3kb -3kb TSS +3kb -3kb TSS TES +3kb CAST allele C57 allele PolII S5P PolII S5P PolII S5P PolII S5P 1 ns ns 0.5 0.5

0.8 0.8 10.0 9.5

y

y t

t 0.4 i

0.4 )

0.6 i

) s

0.6 s

2

n n

0.3 0.3 2

g

e e

0.4 0.4 g 8.5

o

D

l D

0.2 0.2 o

( l

8.0

0.2 0.2 (

0.1 0.1 e

e z

-3kb TSS i

+3kb -3kb TSS TES +3kb -3kb TSS +3kb -3kb TSS TES +3kb z i

s 7.5

s

t

t

2.00 s s

6.0 r r

4.0 u u 6.5

3.0 B B ) 1.75

s 2.0

e

e

l 3.5 l

n )

a

e

n a r 4.0 5.5

2

t

g o

t 2.5

i

s

t 1.50 a d e a n

m n

l

C e

3.0 c

n o io

u

k + n n ti ) t

e o

g n

1 o i a A i i 2

t t a l f l ) t

e t l

- 1.5 r a a 2 a u a u ( 1.25 l t l

p g ) 2.0 X C u g 2.5 u a C ( e e 3 U g C g ) r + r t e + e t3 p 1 p a X r r t p 1 p a U a U C

- ( t C 2.0 1.00 U a U ( (C o e 1.5 (C o N N

v 1.0 i

t 1.5 0.75

c

s e

a 1.0

n

n

d

e

o e

i 0.50

g r

t 1.0

F

i

a

l

u d

) 0.5

u

q

e

3

E

t

c k

g 0.5

a

n

e

A

i r

0.5 l

C 0.25

M

-

(

p

X

U

o

N 0.0 0.0 0.0 0.0 -3kb TSS +3kb -3kb TSS TES +3kb -3kb TSS +3kb -3kb TSS TES +3kb

H4K16ac H4K16ac D H4K16ac H4K16ac 2.5 2.6 2.4 2.3

y 2.3 t

i Pre-gastrulation embryos

y s

2.4 2.4 t 2.2 2.2 E6.50

i n

s 2.1 e

n 2.1 E6.25 e D 2.3 2.2 2.0 D E5.5 2.0 1.9 -3kb TSS +3kb -3kb TES +3kb TSS -3kb TSS +3kb -3kb TSS TES +3kb 5.0

4.0

5.0

s e

4.0 l

n 4.0

a

e

)

r

t

g

s

e

)

d

e

n

l

2

e

c

o

t i

4.0 k

n 3.0

t

a

a

n

A

i

a

l

l

C

-

u

+

m X

g 3.0 1 3.0

t Epiblast cells

(

e

r

a

p C

( Onset of Random X-inactivation U

X 3.0

- 2.0 e

2.0 v

2.0 s

i

e

t

2.0 n

d

c

e

e

g

r

i

a

u

d

n

q e

o 1.0

i

c

k t

F 1.0

n

A

a

i

l

l )

1.0 - u

3 1.0

E

t

X

g

a

e

r

C

M

(

p

U

o 0.0 0.0

N 0.0 -3kb TSS +3kb -3kb TSS TES +3kb -3kb TSS +3kb -3kb TSS TES +3kb Upregulation is mostly confined to Ancestral genes

H3K36me3 H3K36me3 H3K36me3 H3K36me3 0.6 0.6 0.4 0.4

0.5 Not Upregulated Genes y

y 0.3 Upregulated Genes t

0.4 t 0.3 i

i 0.4

s

s n

n 0.3 0.2 0.2

e e

0.2 D D 0.2 0.1 0.1 0.1 -3kb TSS +3kb -3kb TSS TES +3kb -3kb TSS +3kb -3kb TSS TES +3kb

1.6 1.6 Higher 1.75 1.75 burst 1.4 1.4

s frequency

e

) l

1.50 1.50 n

a

e

r

)

e

t

n

g

l

2

s o

t 1.2 1.2

i

d

e t

a Acquired X-linked genes a

e RNA PolII

c H3K4me3 H4K16ac

a

l

k

C

n

n u

+ 1.25 A

1.25 i

l

m

g

1

- t

e 1.0 1.0 Ancestral X-linked genes H3K36me3

X

r

a

e

f

p

C

(

( U 1.00 1.00

0.8 0.8 Xa: Active X Xi: Inactive X Xu: Upregulated X

X

- e 0.75 0.75 0.6

v 0.6

i

s

t

e

n

d

c

e

e

g n

r 0.4 0.4

i

a 0.50 0.50

o

i

u

d

t

q

e

a

c

k

l

)

F

n

u

3

A

i t

l 0.2 0.2

g

-

E a

e 0.25

0.25 X

r

C

(

p

M U 0.0 0.0 o 0.0 -3kb TSS +3kb -3kb TSS TES +3kb N 0.0 -3kb TSS +3kb -3kb TSS TES +3kb Gene distance (bp) Gene distance (bp) 252 253 Figure 4: Upregulated X-linked genes show higher enrichment of different active chromatin marks and 254 higher burst frequency compared to non-upregulated genes. Density plots showing the enrichment of RNA

9

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

255 PolII S5P, active chromatin marks H3K4me3, H3K36me3 and H4K16ac in the (A) upregulated vs not upregulated 256 X-linked genes and (B) ancestral vs acquired classes of X-linked genes on the active-X chromosome. (C) 257 Comparison of allelic burst frequency and burst size of the upregulated vs not upregulated X-linked genes in the 258 E6.5 EPI cells (Wilcoxon rank test: p-value < 0.01; ** and p-value < 0.001; ***). (D) Model representing that 259 X-chromosome upregulation is mostly confined to the ancestral X-linked genes and mechanism of XCU. 260 261 On the other hand, mechanism of XCU, remain poorly understood. We found that the active 262 histone marks such as H3K4me3, H3K36me3, H4K16ac as well as RNA PolII are highly 263 enriched on the genes subjected to X-upregulation compared to the not-upregulated genes (Fig. 264 4A). Furthermore, these marks are seen to have higher average density on the ancestral X- 265 linked genes compared to the newly acquired class of X-linked genes (Fig. 4B). Interestingly, 266 we found that the transcriptional burst frequencies are significantly enhanced in upregulated 267 genes compared to the non-upregulated genes. On the other hand, transcriptional burst sizes 268 remained almost similar between the two categories of X-linked genes (Fig. 4C). Taken 269 together, our results unveils that the distinctive enrichment of active chromatin marks and 270 higher transcriptional burst frequencies might be one of the major regulators for XCU (Fig. 271 4D). 272 In summary, our study confers significant support to Ohno's hypothesis. In addition, we show 273 that although the two X-chromosomes' expression state is highly plastic in nature towards 274 balancing the optimal X chromosome dosage during development, XCU is not global like XCI. 275 Importantly, we found that XCU is primarily confined to ancestral X-linked genes only. 276 277 Methods 278 Data acquisition: Pre-gastrulation embryo single-cell RNA-seq dataset for E5.5 and E6.25 279 generated from hybrid mouse embryos (C57BL/6J × CAST/EiJ) and E6.5 from hybrid mouse 280 with the reciprocal cross (CAST/EiJ × C57BL/6J) was retrieved under the accession code- 281 GSE109071 [28]. 282 Read alignment and counting: RNA-seq reads were aligned to the mouse reference genome 283 mm10 using STAR [29]. The reads aligned were then counted using HTSeq-count. To avoid 284 the dropout events due to low amount of mRNA sequenced within single cells, we used a 285 statistical imputation method scImpute [30], which identifies the likely dropouts without 286 introducing any bias in the rest of the data. Expression levels of transcripts was computed using 287 Transcripts per million (TPM) method. 288 Sexing of the embryos: For sex-determination of the pre-gastrulation embryos, an embryo was 289 classified as male if the sum of the read count for the Y-linked genes (Usp9y, Uty, Ddx3y, 290 Eif2s3y, Kdm5d, Ube1y1, Zfy2, Zfy1) in each cell of an embryo was greater than 12, rest were 291 considered as female embryos. 292 X:A ratio: We calculated the X:A ratio for different lineages of pre-gastrulation embryo by 293 dividing the median expression (TPM) of X-linked genes with the median expression (TPM) 294 of the autosomal genes. For this analysis, we considered those X-linked and autosomal genes 295 having ≥ 0.5 TPM. We also applied an upper TPM threshold which corresponded to the lowest 296 90 th centile of TPM expression to avoid any differences between the X-linked and autosomal 297 gene expression distribution. Also, statistical Kolmogorov-Smirnov's test was performed 298 which again validated the non-significant differences in the levels of gene expression

10

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

299 distribution between X and autosomal genes. We excluded the escapees of X-inactivation and 300 the genes in the pseudo autosomal region from our analysis. 301 Allele-specific expression analysis: We first constructed in silico CAST specific parental 302 genome by incorporating CAST/EiJ specific SNPs 303 (https://www.sanger.ac.uk/science/data/mouse-genomes-project) into the mm10 genome 304 using VCF tools. Reads were mapped onto C57BL/6J (mm10) reference genome and in silico 305 CAST genomes using STAR allowing no multi-mapped reads. To exclude any false positives, 306 we only considered those genes with at least 2 informative SNPs and minimum 3 reads per 307 SNP site. We took an average of SNP-wise reads to have the allelic read counts. After 308 normalization of allelic read counts, we considered only those genes for downstream analysis 309 which had at least 1 average reads across the cells of each lineage from a specific 310 developmental stage for pre-gastrulation embryos. Further, only those single-cells were 311 considered for downstream analysis which showed at least 10 X-linked gene expressions. 312 Allelic ratio was obtained individually for each gene using formula = (Maternal/Paternal reads) 313 ÷ (Maternal reads + Paternal reads). 314 Classification of ancestral and acquired X-linked genes: The ancestral X-linked genes are 315 identified as those having one-to-one orthologs in chicken and were retrieved from the Ensembl 316 biomart release 102. All the remaining X-linked genes are considered as the acquired class of 317 X-linked genes. 318 Topologically associated domains (TADs), LINE and SINE analysis: The coordinates of 319 TADs called in mouse ES cells by GMAP method were retrieved from the TADKB database 320 for our analysis [31]. For LINE and SINE element analysis, we used the table containing the 321 genomic coordinates of repeat elements from the UCSC table browser, the remasker table 322 (“rmsk”) for mm10 reference mouse genome assembly. The density of LINE and SINE 323 elements was then calculated for a gene as the ratio of the total length of the retrotransposons 324 present to the length of that gene. 325 Transcriptional burst kinetics: We used SCALE to profile allelic transcriptional burst 326 kinetics [32]. For this analysis, we used genes with at least 5 average read counts across the 327 cells of EPI E6.5. 328 ChIP-seq analysis: To estimate the enrichment of H3K4me3, H3K36me3, RNA PolII S5P, 329 and H4K16ac in our gene categories, we retrieved the available ChIP-seq datasets for MEF 330 cells from GSE33823 [33] , GSE36905 [34] for H3K4me3, H3K36me3, RNAPolII S5P and 331 GSE97459 [35] for H4K16ac. 332 Gene function analysis: We identified association of the genes with different biological 333 function through (Biological process, Molecular Function and Cellular 334 Component, Signal related function (http://geneontology.org/) [36], 335 databases (http://bioinfo.life.hust.edu.cn/AnimalTFDB/#!/ , 336 https://sunlab.cpy.cuhk.edu.hk/mTFkb/) [37,38], involved in protein complex ( 337 http://mips.helmholtz-muenchen.de/corum/#) [39], Dosage sensitive genes (human) 338 (https://www.clinicalgenome.org/) [40], Housekeeping genes database 339 (https://housekeeping.unicamp.br/, Li et al., 2017 ) [41,42]. 340 341 Statistical tests and plots: All statistical tests and plots were performed in R version 3.6.3. 342

11

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

343 Author's Contribution 344 SG conceptualized and supervised the study. HCN and DC did bioinformatic analyses. SG, 345 DC, and HCN wrote, edited, and proofread the manuscript. The final manuscript was edited 346 and approved by all the authors. 347 348 Acknowledgments 349 Study is supported by DBT grant (BT/PR30399/BRB/10/1746/2018), DST-SERB 350 (CRG/2019/003067), DBT-Ramalingaswamy fellowship (BT/RLF/Re-entry/05/2016) and 351 Infosys Young Investigator award to SG. We also thank DST-FIST [SR/FST/LS11- 352 036/2014(C)], UGC-SAP [F.4.13/2018/DRS-III (SAP-II)] and DBT-IISc Partnership Program 353 Phase-II (BT/PR27952-INF/22/212/2018) for infrastructure and financial support. 354 355 References:

356 1. Graves JAM. Evolution of vertebrate sex chromosomes and dosage compensation. 357 Nature Reviews Genetics. 2016. pp. 33–46. doi:10.1038/nrg.2015.2 358 2. Ohno S. Sex Chromosomes and Sex-linked Genes. Springer-Verlag, Berlin, 359 Heidelberg, New York. 1967;68: 1375. doi:10.7326/0003-4819-68-6-1375_2 360 3. Lyon MF. Gene action in the X-chromosome of the mouse (mus musculus L.). Nature. 361 1961;190: 372–373. doi:10.1038/190372a0 362 4. Di KN, Disteche CM. Dosage compensation of the active X chromosome in mammals. 363 Nat Genet. 2006;38: 47–53. doi:10.1038/ng1705 364 5. Gupta V, Parisi M, Sturgill D, Nuttall R, Doctolero M, Dudko OK, et al. Global 365 analysis of X-chromosome dosage compensation. J Biol. 2006;5. doi:10.1186/jbiol30 366 6. Lin H, Gupta V, Vermilyea MD, Falciani F, Lee JT, O’Neill LP, et al. Dosage 367 compensation in the mouse balances up-regulation and silencing of X-linked genes. 368 PLoS Biol. 2007;5: 2809–2820. doi:10.1371/journal.pbio.0050326 369 7. Xiong Y, Chen X, Chen Z, Wang X, Shi S, Wang X, et al. RNA sequencing shows no 370 dosage compensation of the active X-chromosome. Nat Genet. 2010;42: 1043–1047. 371 doi:10.1038/ng.711

372 8. Johnston CM, Lovell FL, Leongamornlert DA, Stranger BE, Dermitzakis ET, Ross 373 MT. Large-scale population study of human cell lines indicates that dosage 374 compensation is virtually complete. PLoS Genet. 2008;4: 0088–0098. 375 doi:10.1371/journal.pgen.0040009 376 9. Talebizadeh Z, Simon SD, Butler MG. X chromosome gene expression in human 377 tissues: Male and female comparisons. Genomics. 2006;88: 675–681. 378 doi:10.1016/j.ygeno.2006.07.016 379 10. Larsson AJM, Coucoravas C, Sandberg R, Reinius B. X-chromosome upregulation is 380 driven by increased burst frequency. Nat Struct Mol Biol. 2019;26: 963–969. 381 doi:10.1038/s41594-019-0306-y

382 11. Deng X, Hiatt JB, Nguyen DK, Ercan S, Sturgill D, Hillier LW, et al. Evidence for

12

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

383 compensatory upregulation of expressed X-linked genes in mammals, Caenorhabditis 384 elegans and Drosophila melanogaster. Nat Genet. 2011;43: 1179–1185. 385 doi:10.1038/ng.948

386 12. Sangrithi MN, Royo H, Mahadevaiah SK, Ojarikre O, Bhaw L, Sesay A, et al. Non- 387 Canonical and Sexually Dimorphic X Dosage Compensation States in the Mouse and 388 Human Germline. Dev Cell. 2017;40: 289-301.e3. doi:10.1016/j.devcel.2016.12.023 389 13. Li X, Hu Z, Yu X, Zhang C, Ma B, He L, et al. Dosage compensation in the process of 390 inactivation/reactivation during both germ cell development and early embryogenesis 391 in mouse. Sci Rep. 2017;7. doi:10.1038/s41598-017-03829-z 392 14. Deng X, Berletch JB, Ma W, Nguyen DK, Hiatt JB, Noble WS, et al. Mammalian X 393 upregulation is associated with enhanced transcription initiation, RNA half-life, and 394 MOF-mediated H4K16 acetylation. Dev Cell. 2013;25: 55–68. 395 doi:10.1016/j.devcel.2013.01.028 396 15. Lin H, Halsall JA, Antczak P, O’Neill LP, Falciani F, Turner BM. Relative 397 overexpression of X-linked genes in mouse embryonic stem cells is consistent with 398 Ohno’s hypothesis. Nat Genet. 2011; doi:10.1038/ng.992 399 16. De Mello JCM, Fernandes GR, Vibranovski MD, Pereira L V. Early X chromosome 400 inactivation during human preimplantation development revealed by single-cell RNA- 401 sequencing. Sci Rep. Springer US; 2017;7: 1–12. doi:10.1038/s41598-017-11044-z 402 17. Kharchenko P V., Xi R, Park PJ. Evidence for dosage compensation between the X 403 chromosome and autosomes in mammals. Nature Genetics. 2011. pp. 1167–1169. 404 doi:10.1038/ng.991 405 18. Chen X, Zhang J. The X to autosome expression ratio in haploid and diploid human 406 embryonic stem cells. Mol Biol Evol. 2016;33: 3104–3107. 407 doi:10.1093/molbev/msw187 408 19. Julien P, Brawand D, Soumillon M, Necsulea A, Liechti A, Schütz F, et al. 409 Mechanisms and evolutionary patterns of mammalian and avian dosage compensation. 410 PLoS Biol. 2012; doi:10.1371/journal.pbio.1001328 411 20. Mandal S, Chandel D, Kaur H, Majumdar S, Arava M, Gayen S. Single-Cell Analysis 412 Reveals Partial Reactivation of X Chromosome instead of Chromosome-wide 413 Dampening in Naive Human Pluripotent Stem Cells. Stem Cell Reports. 2020;14: 414 745–754. doi:10.1016/j.stemcr.2020.03.027 415 21. Pessia E, Makino T, Bailly-Bechet M, McLysaght A, Marais GAB. Mammalian X 416 chromosome inactivation evolved as a dosage-compensation mechanism for dosage- 417 sensitive genes on the X chromosome. Proc Natl Acad Sci U S A. 2012;109: 5346– 418 5351. doi:10.1073/pnas.1116763109 419 22. Pessia E, Engelstädter J, Marais GAB. The evolution of X chromosome inactivation in 420 mammals: The demise of Ohno’s hypothesis? Cellular and Molecular Life Sciences. 421 2014. pp. 1383–1394. doi:10.1007/s00018-013-1499-6 422 23. Gayen S, Maclary E, Buttigieg E, Hinten M, Kalantry S. A Primary Role for the Tsix 423 lncRNA in Maintaining Random X-Chromosome Inactivation. Cell Rep. 2015;11: 424 1251–1265. doi:10.1016/j.celrep.2015.04.039

13

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

425 24. Harris C, Cloutier M, Trotter M, Hinten M, Gayen S, Du Z, et al. Conversion of 426 random x-inactivation to imprinted x-inactivation by maternal prc2. Elife. 2019;8. 427 doi:10.7554/eLife.44258

428 25. Maclary E, Hinten M, Harris C, Sethuraman S, Gayen S, Kalantry S. PRC2 represses 429 transcribed genes on the imprinted inactive X chromosome in mice. Genome Biol. 430 2017;18. doi:10.1186/s13059-017-1211-5 431 26. Gayen S, Maclary E, Hinten M, Kalantry S. Sex-specific silencing of X-linked genes 432 by Xist RNA. Proc Natl Acad Sci U S A. 2016;113. doi:10.1073/pnas.1515971113

433 27. Sarkar MK, Gayen S, Kumar S, Maclary E, Buttigieg E, Hinten M, et al. An Xist- 434 activating antisense RNA required for X-chromosome inactivation. Nat Commun. 435 2015;6. doi:10.1038/ncomms9564 436 28. Cheng S, Pei Y, He L, Peng G, Reinius B, Tam PPL, et al. Single-Cell RNA-Seq 437 Reveals Cellular Heterogeneity of Pluripotency Transition and X Chromosome 438 Dynamics during Early Mouse Development. Cell Rep. 2019;26: 2593--2607.e3. 439 doi:10.1016/j.celrep.2019.02.031 440 29. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: 441 Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29: 15–21. 442 doi:10.1093/bioinformatics/bts635 443 30. Li WV, Li JJ. An accurate and robust imputation method scImpute for single-cell 444 RNA-seq data. Nat Commun. 2018;9. doi:10.1038/s41467-018-03405-7 445 31. Liu T, Porter J, Zhao C, Zhu H, Wang N, Sun Z, et al. TADKB: Family classification 446 and a knowledge base of topologically associating domains. BMC Genomics 2019 447 201. BioMed Central; 2019;20: 1–17. doi:10.1186/S12864-019-5551-2 448 32. Jiang Y, Zhang NR, Li M. SCALE: Modeling allele-specific gene expression by 449 single-cell RNA sequencing. Genome Biol. 2017;18. doi:10.1186/s13059-017-1200-8 450 33. Yildirim E, Sadreyev RI, Pinter SF, Lee JT. X-chromosome hyperactivation in 451 mammals via nonlinear relationships between chromatin states and transcription. Nat 452 Struct Mol Biol. 2012;19: 56–62. doi:10.1038/nsmb.2195 453 34. Pinter SF, Sadreyev RI, Yildirim E, Jeon Y, Ohsumi TK, Borowsky M, et al. 454 Spreading of X chromosome inactivation via a hierarchy of defined Polycomb stations. 455 Genome Res. Cold Spring Harbor Laboratory Press; 2012;22: 1864–1876. 456 doi:10.1101/GR.133751.111 457 35. Ye A, Kim H, Kim J. PEG3 control on the mammalian MSL complex. 2017; 458 doi:10.1371/journal.pone.0178363

459 36. H M, A M, D E, X H, PD T. PANTHER version 14: more genomes, a new PANTHER 460 GO-slim and improvements in enrichment analysis tools. Nucleic Acids Res. Nucleic 461 Acids Res; 2019;47: D419–D426. doi:10.1093/NAR/GKY1038 462 37. H H, YR M, LH J, QY Y, Q Z, AY G. AnimalTFDB 3.0: a comprehensive resource 463 for annotation and prediction of animal transcription factors. Nucleic Acids Res. 464 Nucleic Acids Res; 2019;47: D33–D38. doi:10.1093/NAR/GKY822 465 38. Sun K, Wang H, Sun H. mTFkb: a knowledgebase for fundamental annotation of 466 mouse transcription factors. Sci Reports 2017 71. Nature Publishing Group; 2017;7: 1– 14

bioRxiv preprint doi: https://doi.org/10.1101/2021.07.18.452817; this version posted July 19, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

467 11. doi:10.1038/s41598-017-02404-w 468 39. M G, J R, B B, I D-K, G F, G F, et al. CORUM: the comprehensive resource of 469 mammalian protein complexes-2019. Nucleic Acids Res. Nucleic Acids Res; 2019;47: 470 D559–D563. doi:10.1093/NAR/GKY973 471 40. HL R, JS B, LD B, CD B, JP E, MJ L, et al. ClinGen--the Clinical Genome Resource. 472 N Engl J Med. N Engl J Med; 2015;372: 2235–2242. doi:10.1056/NEJMSR1406261 473 41. Li B, Qing T, Zhu J, Wen Z, Yu Y, Fukumura R, et al. A Comprehensive Mouse 474 Transcriptomic BodyMap across 17 Tissues by RNA-seq. Sci Reports 2017 71. Nature 475 Publishing Group; 2017;7: 1–10. doi:10.1038/s41598-017-04520-z 476 42. Hounkpe BW, Chenou F, de Lima F, De Paula EV. HRT Atlas v1.0 database: 477 redefining human and mouse housekeeping genes and candidate reference transcripts 478 by mining massive RNA-seq datasets. Nucleic Acids Res. Oxford Academic; 2021;49: 479 D947–D955. doi:10.1093/NAR/GKAA609 480 481

15