bioRxiv preprint doi: https://doi.org/10.1101/298984; this version posted April 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Extensive cellular heterogeneity of X inactivation revealed by single-cell allele-specific expression in fibroblasts.

Marco Garieri 1,#, Georgios Stamoulis 1,#, Emilie Falconnet 1, Pascale Ribaux 1, Christelle Borel 1,†,*, Federico Santoni 1,3,†,* and Stylianos E. Antonarakis 1,2,4,†,*

1 Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland.
2 University Hospitals of Geneva, Switzerland.
3 Department of Endocrinology, Diabetology and Metabolism, University Hospitals of Lausanne, Switzerland.
4 iGE3 Institute of Genetics and Genomics of Geneva, Switzerland.
# These authors contributed equally to this work
† These authors contributed equally to this work
* Corresponding Authors

Address for correspondence:
Stylianos E. Antonarakis, Christelle Borel
Department of Genetic Medicine and Development,
University of Geneva Medical School, Geneva, Switzerland.
1 rue Michel-Servet
1211 Geneva, Switzerland
Tel +41-22-379-5707
Fax +41-22-379-5706
Email [email protected], [email protected]

Federico Santoni
Department of Endocrinology, Diabetology and Metabolism,
Lausanne University Hospital (CHUV), Lausanne, Switzerland
7 rue de Bugnon,
1111 Lausanne, Switzerland
Email: [email protected]

Running title: X-inactivation in single cells.
Keywords: single cell, ASE, X-inactivation

ABSTRACT

In eutherian mammals, X chromosome inactivation (XCI) provides a dosage compensation mechanism in which one of the two X chromosomes is randomly silenced in each female cell. However, some genes on the inactive X chromosome and outside the pseudoautosomal regions escape from XCI and are expressed from both alleles (escapees). Given the relevance of the escapees in biology and medicine, we investigated XCI at an unprecedented single-cell resolution. We combined deep single-cell RNA sequencing with whole genome sequencing to examine allele-specific expression (ASE) in 935 primary fibroblast and 48 lymphoblastoid single cells from five female individuals. In this framework we integrated an original method to identify and exclude doublets of cells. We have identified 55 genes as escapees, including 5 novel escapee genes. Moreover, we observed that all genes exhibit a variable propensity to escape XCI in each cell and cell type, and that each cell displays a distinct expression profile of the escapee genes. We devised a novel metric, the Inactivation Score (IS), defined as the mean of the allelic expression profiles of the escapees per cell, and discovered a heterogeneous and continuous degree of cellular XCI with extremes represented by “inactive” cells, i.e., cells exclusively expressing the escaping genes from the active X chromosome, and “escaping” cells, which express the escapees from both alleles. Intriguingly, we found that XIST is the major genetic determinant of IS, and that XIST expression, higher in G0 phase, is negatively correlated with the expression of escapees, inactivated and pseudoautosomal genes. In this study we use single-cell allele-specific expression to identify novel escapees in different tissues and provide evidence of an unexpected cellular heterogeneity of XCI driven by a possible regulatory activity of XIST.

INTRODUCTION

In eutherian mammals, X chromosome inactivation (XCI) is a well-described mechanism of dosage compensation for the X chromosome in females (Lyon 1961; Penny et al. 1996; Chow and Heard 2009; Bartolomei and Ferguson-Smith 2011). In female cells, only one X chromosome is transcribed (Xa; X-active), whereas the second X chromosome is silenced (Xi; X-inactive) (Lyon 1961). Marsupials have an imprinted pattern of XCI, and the paternal allele is predominantly inactive (Sharman 1971). In mice, an imprinted form of XCI occurs
through early embryonic developmental stages (4-8 cell stage) (Huynh and Lee 2003; Okamoto et al. 2004; Okamoto et al. 2005; Patrat et al. 2009), followed by inner cell mass reactivation and random XCI in epiblast cells (Mak et al. 2004). In humans, the two X chromosomes are active during post-zygotic stages and achieve dosage compensation by dampening their expression up to or even after late blastocyst formation, until one of the X chromosomes is randomly inactivated in each cell (Petropoulos et al. 2016). In female somatic cells, random XCI is stable, resulting in a mosaicism for gene expression on the X chromosome, in which an average of 50% of cells express the active paternal X and 50% the active maternal X alleles. Most of the genes on the Xi chromosome are transcriptionally silenced through epigenetic processes initiated by the X Inactivation Center (XIC) and spread along the Xi chromosome during early embryogenesis (Lyon 1961). The XIC encodes several genes, including XIST, a long non-coding RNA (ncRNA) essential for initiating and completing XCI (Brown et al. 1991; Ballabio and Willard 1992; Brown et al. 1992). XIST RNA molecules mediate the establishment and maintenance of XCI in subsequent cycles of mitotic division by coating the Xi chromosome and recruiting Polycomb Repressive Complex 2 with repressive chromatin modifiers (Lee and Bartolomei 2013). It has been shown that the coating of the Xi is regulated by the interaction between XIST ncRNA and the Lamin B receptor (LBR) (Chen et al. 2016a). This interaction is needed for the recruitment of Xi to the nuclear lamina and the subsequent spread of XIST ncRNA to actively transcribed regions (Chen et al. 2016a).

However, not all X-linked genes are inactivated. In females, genes that escape from XCI (escapees) represent 15-25% of the X-linked genes, and a further 10% of escapees differ between individuals and cell types (Carrel and Willard 2005; Prothero et al. 2009; Yang et al. 2010; Cotton et al. 2013; Crowley et al. 2015). Such genes have been associated with sex-specific traits and with clinical abnormalities in patients with X chromosome aneuploidy, such as Turner and Klinefelter syndromes (Berletch et al. 2011). Pathogenic variants in escapees also contribute to various disease phenotypes in women carriers, including Kabuki syndrome (KABUK1 [MIM 147920]) (Lederer et al. 2012; Miyake et al. 2013) and intellectual disabilities (van Haaften et al. 2009; Grasso et al. 2012; Jones et al. 2012; Gropman and Samango-Sprouse 2013; Zhang et al. 2013; Dunford et al. 2016). Genes escaping XCI have been previously identified by whole tissue studies using different approaches, such as X-linked gene expression comparisons between males and females (Yasukochi et al. 2010), detecting allelic imbalance in clonal lymphoblast and fibroblast cell lines (Cotton et al. 2013), identifying inactivated and active transcription start sites by methylation profiles
(Cotton et al. 2015) and among female individuals with X chromosome aneuploidies (Sudbrak et al. 2001).

The ability to capture single cells and to study their allele-specific expression (ASE) (Borel et al. 2015) provides the opportunity to explore XCI patterns at the single-cell level and to identify escapee genes. Recent studies on mouse single cells demonstrated the robust nature of this technology to monitor the dynamics of XCI through differentiation (Chen et al. 2016b), in mouse preimplantation female embryos (Borensztein et al. 2017) and in clonal somatic cells (Reinius et al. 2016). Recently, Tukiainen et al. (2017) performed an across-tissue study of X inactivation and partially validated their observations by shallow sequencing (1 million reads per cell) of 940 single cells from lymphoblasts and dendritic cells. Here, using RNA-Seq at high sequencing depth (40 million reads per cell), we studied the X-linked ASE in 983 isolated, unsynchronized single fibroblast and lymphoblast cells and established the degree of XCI after the removal of potential confounding effects. One of the caveats of allele expression quantification in single cells is allele dropout, which randomly affects the detection of one of the two alleles of poorly expressed genes (Stegle et al. 2015). However, in this context, allelic dropout will not induce false positive escapees. Moreover, given the number of cells analyzed in our study, the probability of consistently missing the expressed allele from the X-inactive chromosome of a true escapee in all the cells is extremely low. In this study we identified 55 escapee genes in at least one individual, of which 5 were novel. A subset of 22 genes was detected as escapees in at least two individuals (robust set), including 3 novel escapee genes. Through the analysis of the expression profile of the genes in the robust escapee set, we further investigated their propensity to escape XCI in each cell. We discovered that the transcriptional activity of the escapee genes is highly variable and significantly associated with the cellular XIST transcript abundance.

RESULTS

Identification and elimination of confounding doublets

Genes located on the X chromosome of female cells express one allele from the randomly active chromosome, while escaping genes express both alleles (Fialkow 1970). Multiple cells
(i.e., doublets) resulting from the simultaneous capture of two or more cells expressing discordant haplotypes complicate the detection of escapee genes and potentially increase the number of false positives. A recent study has explored the limits of current fluidic technology to conduct single-cell RNAseq, demonstrating that a substantial fraction of the bona fide isolated single cells are doublets (Macosko et al. 2015). After removing 32 cells because of low mapping quality (Supplemental Figure S1), in order to eliminate all doublets with discordant haplotypes, we conducted a pairwise correlation analysis of X-linked ASE among all the cells. We performed hierarchical clustering analysis and obtained three distinct clusters of cells (Figure 1B and Supplemental Figure S2, Methods). As expected, two clusters were populated by cells with one mutually inactivated X chromosome. The third cluster included cells with a biallelic expression pattern for all the X chromosome genes, revealing the presence of doublets. Considering all 5 individuals (n = 983 cells), we identified a total of 82 doublets (~8% of the total, consistent with Fluidigm manufacturer’s expectations), which were excluded from further analysis (Figure 1B, Supplemental Figure S2 and Supplemental Table S1). Doublets expressing concordant haplotypes cannot be detected with our in silico approach; however, they do not inflate the number of false positive escapees and, consequently, they do not have an impact on escapee gene discovery and XCI analysis.

Identification of escapee genes

To estimate the allelic ratio (AR) for each gene in each cell, we calculated the ratio of the number of reads supporting the cell-specific expressed haplotypes over the total number of reads covering all single nucleotide variants (SNVs) of a gene (see Methods). Fully inactivated genes displayed an AR equal to 1. In the relaxed discovery set of escapee genes, putative escapees were defined as having an AR ≤ 0.95 in at least one individual. The rationale for the choice of this threshold is explained in the Methods. Otherwise, the gene is considered inactivated (i.e., exclusively expressed from the active chromosome). As a proof of principle of the reliability of our approach, we first confirmed that XIST is expressed exclusively from the inactivated allele by analyzing its AR in all cells from individuals 3 and 4 (i.e., monozygotic twin samples), for which we were able to phase the haplotypes from parental genotyping (Supplemental Figure S3). As an additional control, we examined the allele expression profile of genes in the PAR1 and PAR2 regions. As expected, all of these genes exhibited a balanced ASE across the two X chromosomes, except VAMP7 in PAR2, where very few cells displayed expression from the inactive chromosome (Figure 2, Supplementary Table S2).
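
The allelic ratio described above can be illustrated with a minimal Python sketch. It assumes that, for one gene in one cell, the reads at each heterozygous SNV have already been split into those supporting the haplotype expressed in that cell and those supporting the other haplotype. The function names and the input layout are hypothetical and are not taken from the authors' pipeline; only the 0.95 threshold comes from the text, and how per-cell values are aggregated into a per-individual call is described in the Methods, not here.

```python
from typing import Iterable, Optional, Tuple

AR_THRESHOLD = 0.95  # threshold quoted in the text for putative escapees


def allelic_ratio(snv_counts: Iterable[Tuple[int, int]]) -> Optional[float]:
    """Allelic ratio for one gene in one cell.

    Each tuple is (reads supporting the expressed haplotype,
    reads supporting the other haplotype) at one heterozygous SNV.
    Returns None when no read covers any SNV of the gene.
    """
    expressed = 0
    total = 0
    for expressed_reads, other_reads in snv_counts:
        expressed += expressed_reads
        total += expressed_reads + other_reads
    if total == 0:
        return None
    return expressed / total


def escapes_in_cell(ar: Optional[float], threshold: float = AR_THRESHOLD) -> bool:
    """Hypothetical per-cell call: the gene shows expression from both alleles."""
    return ar is not None and ar <= threshold


# A gene covered by two SNVs, with some reads from the other haplotype.
ar = allelic_ratio([(8, 2), (7, 3)])
print(ar, escapes_in_cell(ar))
```

A fully inactivated gene yields an AR of 1 in this sketch, matching the definition in the text, and a gene without informative reads is left uncalled rather than treated as inactivated.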

As expected, the majority of chrX genes showed an inactivated status in all the cells (Figure 2 and Supplemental Table S2). From a total of 296 genes interrogated in at least one individual, we identified the relaxed set of 55 escapees (18.5%): 50 of them previously described to escape XCI in at least one study, and 5 novel escapees (INE2 (antisense gene), STK26, UQCRBP1, LINC00630 and TTC3P1) (Figure 2, S3). Out of 203 genes with AR information from at least 2 individuals, we classified as robust escapees 22 genes (10.9%) that exhibit an escapee status in at least two individuals, including 3 novel genes (INE2, STK26 and TTC3P1). As expected, the power to detect a gene escaping XCI is linearly related to its expression level (Supplemental Figure S4). The number of overlapping escapees among all individuals is shown in a Venn diagram (Supplemental Figure S5). Results from both relaxed and robust sets are consistent with the current understanding of XCI, in which it is predicted that 10% to 20% of X chromosome genes escape XCI (Carrel and Willard 2005).

In addition, we analyzed 48 single cells from a lymphoblastoid cell line derived from individual 5 to investigate the escapee concordance with the fibroblasts. After quality control and doublet removal, we were able to classify 9 genes as escapees in lymphoblastoid cells, 5 of them being known escapees (DDX3X, KDM6A, MSL3, PUDP, ZFX) and 4 being novel escapee genes (IDS, SLC9A7, STAG2, STK26). We observed that, though expressed and having an informative heterozygous site, the MSL3, IDS, SLC9A7, and STAG2 genes were not classified as escapees in fibroblasts. Conversely, several genes detected as escapees in fibroblasts were not escapees in lymphoblast cells. This could be partially ascribed to the previously observed tissue specificity of XCI (Deng et al. 2014b; Tukiainen et al. 2017) and to the recently discovered peculiar maintenance of XCI in lymphoblast cells (Wang et al. 2016). Additionally, the relatively small number of lymphoblastoid single cells, along with the absence of informative heterozygous sites for some genes in other individuals, can affect our ability to detect escapees (Santoni et al. 2017) and to assess and establish their ASE. However, the STK26 escapee status was clearly confirmed in both fibroblast and lymphoblastoid single cells.

Heterogeneity of escaping XCI

Among five individuals, the 22 robust escapee genes exhibited a heterogeneous AR profile, being inactivated in some cells and escaping XCI in others (Figure 3). Specifically, we calculated a cellular escaping ratio per escapee gene as the proportion of cells escaping XCI with respect to the total number of cells expressing the gene. Some genes displayed a stable cellular escaping ratio among all the individuals, while others were more variable. For
example, CA5BP1 showed consistent cellular escaping ratios ranging from 37% to 52%; ZFX, a known constitutive escapee (Schneider-Gadicke et al. 1989), presented with cellular escaping ratios ranging from 85% to 100%, while DDX3X had a broader range, from 29% to 61%. Overall, escapees had different cellular escaping ratios, thereby suggesting that each escapee gene is independently regulated.

The observed cellular pattern of XCI of the escapees (Figure 3) suggests a variable cellular ability to express genes from the inactivated allele. To investigate this hypothesis, we calculated the Inactivation Score (IS) for each cell, defined as the mean AR of the escapee genes detected per cell (we considered only cells expressing at least two escapees). For each individual, we ordered the cells according to their IS and discovered that the capacity to express from the inactive chromosome is a continuous variable (Figure 3, Figure 4A). This suggested a cellular stratification that reflects the propensity of each cell to escape XCI, as confirmed by the proportion of escapee genes per cell (Figure 4B). As expected, the Inactivation Score is strongly negatively correlated with expression from the inactive X chromosome (Supplemental Figure S6). Notably, in all five individuals, we observed two special groups of cells: one in which all the detectable escapees behaved as inactivated and another where all detectable escapees expressed both X alleles (Figure 4B). These two extreme cell populations represent on average 15% of the total number of aggregate cells among the individuals (individual 1: 20%, individual 2: 14%, individual 3: 8%, individual 4: 7%, individual 5: 26%). As a control, we calculated the Inactivation Score of the remaining inactivated genes per cell (all close to 1, as expected; Figure 4A, right panel). Overall, these results demonstrate that XCI is a complex intra- and inter-individual heterogeneous process, and the ability to escape from X inactivation varies from gene to gene, from cell to cell and also among individuals. The evidence of similar cell stratification in all individuals suggests the existence of a general regulatory mechanism that controls the propensity of a cell to express genes from the inactivated allele.

Potential drivers of cellular XCI heterogeneity
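
The analyses in this section build on the per-cell Inactivation Score defined in the previous section. As a minimal sketch of that definition, the Python below takes, for each cell, a mapping from escapee gene name to allelic ratio (AR) and returns the mean AR, skipping cells that express fewer than two escapees. The function names, the dictionary layout and the example values are illustrative assumptions, not the authors' code or data.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def inactivation_score(escapee_ar: Dict[str, float], min_genes: int = 2) -> Optional[float]:
    """Mean AR over the escapee genes detected in one cell.

    Returns None for cells expressing fewer than `min_genes` escapees,
    mirroring the "at least two escapees" rule stated in the text.
    """
    if len(escapee_ar) < min_genes:
        return None
    return mean(escapee_ar.values())


def rank_cells(cells: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Order scorable cells by increasing Inactivation Score."""
    scored = []
    for cell_id, ar_by_gene in cells.items():
        score = inactivation_score(ar_by_gene)
        if score is not None:
            scored.append((cell_id, score))
    return sorted(scored, key=lambda item: item[1])


# Made-up AR values for three hypothetical cells.
cells = {
    "cell_a": {"ZFX": 0.5, "DDX3X": 1.0},
    "cell_b": {"ZFX": 1.0, "DDX3X": 1.0},
    "cell_c": {"ZFX": 0.5},  # only one escapee detected, so not scored
}
print(rank_cells(cells))
```

Under this definition a score of 1 corresponds to a cell in which every detected escapee is expressed only from the active X, and lower scores indicate more expression from the inactive X.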

We hypothesized that the cellular XCI heterogeneity may be associated with the level of expression of genes on the X chromosome. To explore this hypothesis, we correlated the X-linked gene expression (RPKM > 1) with the Inactivation Score. After FDR correction for multiple testing, genes were ranked based on the adjusted p-value (Figure 5A). Notably, the
only gene positively and significantly correlated with IS was XIST (nominal p-value = 3.2 x 10^-5; adj. p-value = 3.0 x 10^-3). XIST is a well-known non-coding RNA that regulates the establishment and the maintenance of XCI (Brown et al. 1991). Other genes significantly, albeit negatively, associated with IS were EIF2S3, CD99, NDUFA1, RPL10 and BCAP31. We further investigated the correlation of XIST expression with inactivated genes, all escapee genes (relaxed and robust sets) and genes located in PAR regions. Notably, we observed a significant trend of negative correlations between XIST expression and the majority of the genes for all four categories (H0: mean of distribution of correlation coefficients = 0; inactivated genes p = 2.06 x 10^-83; PAR genes p = 6.49 x 10^-4; relaxed escapees p = 1.01 x 10^-6; robust escapees p = 4.19 x 10^-4, Mann-Whitney two-tailed test, Figure 5B). This suggests a general repressive XIST regulatory effect in all classes of X-linked genes. To further investigate whether this effect was cell cycle dependent, we used Cyclone (Scialdone et al. 2015) to assign each cell to a specific cell cycle phase according to the expression of the appropriate gene markers (from CycleBase (Santos et al. 2015)). Cells not expressing MKI67 have been previously reported to be in G0 (Sobecki et al. 2017). A statistically significant difference between the populations of single cells in G0 and G1 was observed regarding XIST expression (Figure 6). In agreement with our previous observations, the Inactivation Score was also significantly higher in G0 than in G1 cells on average (Figure 6). These results together support the hypothesis that XIST ncRNA tends to be more expressed in the resting G0 phase than in G1 and, consequently, the expression of the escapees from the inactive chromosome is reduced in this cell cycle stage. We could not identify any cell-cycle driven effect for the remaining cell cycle stages, likely due to limited statistical power (small number of single cells classified in these cell cycle stages, Figure 6).

DISCUSSION

Our study, using human single-cell RNAseq datasets, points to a pervasive heterogeneity in escaping XCI. We have shown that escapees have a different allelic expression profile in single fibroblasts from the same individual. More than 50% of the escapees had the tendency to be mainly expressed from Xa (Figure 4), while ZFX and PUDP exhibited an overall biallelic expression in more than 70% of the cells. We also observed that some escapees, e.g., ZFX and CA5BP1, exhibited a relatively stable cellular escaping ratio (proportion of cells in which the gene is escaping), whereas others, such as DDX3X, showed broader variability (Figure 3). This finding explains, from a single-cell perspective, the previous observations of
heterogeneous gene expression from Xi in cell lines derived from different individuals (Carrel and Willard 1999) and tissues (Cotton et al. 2013; Berletch et al. 2015). A more recent study revealed the contribution of 6 escapee genes (ATRX, CNKSR2, DDX3X, KDM5C, KDM6A, MAGEC3) to cancer sex bias (Dunford et al. 2016). Here, we confirmed the escapee status for DDX3X, KDM5C and KDM6A. We observed that the inter-individual DDX3X inactivation profile is highly variable and thereby could potentially be associated with differences in cancer predisposition occurring among female individuals.

Among the genes that we detected as escapees in this study, HUWE1 encodes an E3 ubiquitin ligase, and microduplications of this gene have been found in individuals affected by Turner type intellectual disability (MRX17 [MIM 300706]) (Froyen et al. 2008; Froyen et al. 2012). Moreover, a recent study suggested the critical role of HUWE1 as a colonic tumor suppressor gene (Myant et al. 2017). RBMX encodes an RNA binding protein which is involved in pre- and post-transcriptional regulation and alternative splicing of many premature transcripts. Pathogenic variants in RBMX have been associated with X-linked intellectual disability (Shashi et al. 2000; Shashi et al. 2015). STK26 is a member of the germinal center kinase subfamily and encodes a serine/threonine kinase highly expressed in human lymphocytes and thymus (Ceccarelli et al. 2011; Jiao et al. 2015). Expression of STK26 (alternatively named MST4) was recently shown to be significantly lower in patients affected by Graves’ disease, an autoimmune disorder characterized by abnormal thyroid function (Guo et al. 2017). It has also been shown that patients affected with Turner Syndrome (XO) exhibit a high prevalence of hypothyroidism (El-Mansoury et al. 2005). We speculate that this phenotype may be related to the reduced expression of the single copy of STK26.

Mutations and deletions of the escapee genes identified in the relaxed set, such as EDA, FHL1, IL1RAPL1, and FMR1, have been associated with different pathological conditions. More specifically, pathogenic variants of EDA, a gene of the tumor necrosis factor family, have been associated with X-linked dominant tooth agenesis and X-linked ectodermal dysplasia (Kere et al. 1996; Yotsumoto et al. 1998; Visinoni et al. 2003; Tao et al. 2006; Tarpey et al. 2007). FHL1 encodes a member of the four-and-a-half-LIM-only protein family and contains two highly conserved zinc finger domains. Pathogenic variants in FHL1 have been associated with X-linked dominant reduced body myopathy with both affected male and female individuals (Liewluck et al. 2007; Schessl et al. 2008; Shalaby et al. 2009), Emery-Dreifuss muscular dystrophy (Gueneau et al. 2009), and scapuloperoneal myopathy (Quinzii et al. 2008). Pathogenic variants of IL1RAPL1, a member of the interleukin 1 receptor
family, have been linked to X-linked mental retardation (Tabolacci et al. 2006; Nawara et al. 2008). A trinucleotide repeat expansion (CGG) in the 5’ UTR of FMR1 is the cause of Fragile X syndrome (Kremer et al. 1991). We hypothesize that the phenotypic variability of these diseases may be partially explained by variable penetrance and variable expressivity in these gene-disease associations, due to the observed variable escapee status of their respective causative genes in the relevant tissues.

It has been recently observed in a mouse study that some genes escape XCI in a fraction of cells only (Reinius et al. 2016). We confirmed and extended this observation with the finding that XCI heterogeneity extends from genes to single cells. Using the Inactivation Score as a metric for cellular propensity to escape XCI, we indeed showed that cells from the same individual have a variable propensity to transcribe the escapee genes from Xi. According to the IS, those cells stratify into three classes: cells where all escapees are expressed from the inactive allele, cells where all escapees are inactivated, and cells with an intermediate profile (Figure 4B). We have shown that the observed cellular heterogeneity is driven by the expression profile of XIST in single cells, which is in turn strongly and positively correlated with the Inactivation Score. We found five genes negatively associated with IS. EIF2S3 encodes the gamma subunit of the translation initiation factor 2, which is involved in protein synthesis initiation (Moortgat et al. 2016). This gene has been described to escape X inactivation, while hemizygous mutations have been associated with X-linked mental retardation, Borck type (Ehrmann et al. 1998; Moortgat et al. 2016). CD99 is located in the PAR1 region and is involved in leukocyte migration and T-cell adhesion (Bernard et al. 1997; Bernard et al. 2000). Additionally, it has been characterized as a potential oncosuppressor in osteosarcoma (Manara et al. 2006). NDUFA1 (NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1) encodes a component of the mitochondrial respiratory chain; hemizygous mutations have been associated with mitochondrial complex I deficiency (Fernandez-Moreira et al. 2007; Ambros et al. 1997; Potluri et al. 2009). RPL10 is involved in protein synthesis and is an essential component of the large 60S ribosomal subunit (Gong et al. 2009). Hemizygous RPL10 mutations have been associated with syndromic X-linked mental retardation (Thevenon et al. 2015). BCAP31 is involved in the export of secreted proteins, and hemizygous loss of function mutations have been identified in individuals affected with deafness and severe dystonia (Cacciagli et al. 2013).

Additionally, we demonstrated that XIST expression is negatively correlated with the overall expression of all genes in chrX, including pseudoautosomal and escapee genes, thereby indicating a general repressive regulatory activity of XIST transcripts. Inspired by a previous
report describing the localisation of XIST during the cell cycle (Jonkers et al. 2008), we explored the hypothesis that XIST expression and IS are related to the cell cycle phases of individual cells. Our data suggest that XIST is more expressed during the G0 than the G1 phase and that single cells in the G0 phase have an increased IS, indicating less propensity to express escapees from Xi. Overall, these results revealed a considerable degree of heterogeneity of XCI in single cells and enhance our understanding of X inactivation. The framework presented in this study can be applied to an increased number of cells extracted from various tissues from a large cohort of individuals. Such investigations will enable us to define the extent of cellular XCI heterogeneity and clarify its possible implication in the phenotypic variability of X-linked single gene disorders and whole X chromosome aneuploidies, and in the observed female sex bias in cancer.

MATERIAL AND METHODS

Ethical statement

The study was approved by the ethics committee of the University Hospitals of Geneva, and written informed consent was obtained from both parents of each individual prior to the study.

Samples

We established 6 different cell lines from 5 female individuals: 5 primary fibroblast cell lines and one lymphoblastoid cell line (Table S1). We captured 935 single-cell fibroblasts and 48 lymphoblastoid single cells. Lymphoblastoid cells were obtained from one of the five female individuals (Figure 1A, Table S1) (Borel et al. 2015; Santoni et al. 2017).

Cell growth

Cells were cultured in DMEM GlutaMAX™ (Life Technologies) supplemented with 10% fetal bovine serum (Life Technologies) and 1% penicillin/streptomycin/fungizone mix (Amimed, BioConcept) at 37°C in a 5% CO2 atmosphere as described (Borel et al. 2015).

Whole Genome Sequencing

Genomic DNA was extracted for all five individuals using a QIAamp DNA Blood Mini Kit (Qiagen) and fragmented by Covaris to peak sizes of 300–400 bp. Libraries were prepared with TruSeq DNA kit (Illumina) using 1 µg of gDNA and sequenced on an Illumina HiSeq
11 bioRxiv preprint doi: https://doi.org/10.1101/298984; this version posted April 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

373 2000 machine with 2 x 100-bp as previously described(Borel et al. 2015). All experiments 374 were performed using the manufacturer's protocols. For each individual, raw whole genome 375 DNA sequences were analyzed using an in-house pipeline that utilizes published algorithms 376 in a sequential manner (BWA) (Li and Durbin 2010) for mapping the reads over the hg19 377 reference, SAMtools (Li et al. 2009) for detection of heterozygous sites, and (ANNOVAR) 378 for the annotation (Wang et al. 2010). 379 380 Single-cell capture 381 Single-cell capture was performed using the C1 single-cell auto prep system (Fluidigm) 382 following the manufacturer’s procedure(Borel et al. 2015). The integrated fluidic circuit used 383 for the study is the C1™ Single-Cell mRNA Seq IFC, 17–25 µm with a capacity of 96 384 chambers. During the capture, all 96 chambers were inspected under an inverted phase 385 contrast microscope; only chambers containing a non-damaged single cell were considered 386 for downstream analysis. 387 388 Single-cell RNA-sequencing 389 SMARTer Ultra Low RNA kit for Illumina sequencing (version 2, Clontech) was used for the 390 cell lysis and cDNA synthesis. Libraries were prepared with 0.3 ng of pre-amplified cDNA 391 using Nextera XT DNA kit (Illumina) as described (Borel et al. 2015). Libraries were 392 sequenced on an Illumina HiSeq2000 sequencer as 100 bp single-ended reads. RNA 393 sequences were mapped with GEM (Marco-Sola et al. 2012). Uniquely mapping reads were 394 extracted by filtering for mapping quality (MQ>=150). An in-house algorithm was used to 395 quantify RPKM expression using GENCODE v26. Cells with less than 1 million uniquely 396 mapped reads were excluded from further analysis (Supplemental Figure S7). 
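As an illustration of the last two steps above, the following minimal sketch applies the 1 million uniquely mapped reads cutoff and computes an RPKM value. The in-house quantification algorithm is not described in the text, so the standard RPKM definition is assumed here; the function names and the toy read counts are hypothetical.

```python
# Threshold stated in the text: cells with fewer than 1 million uniquely
# mapped reads are excluded.
MIN_UNIQUE_READS = 1_000_000


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Standard RPKM: reads per kilobase of gene per million mapped reads.

    Assumption: the in-house algorithm follows this usual definition.
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passing_cells(unique_reads_per_cell, min_reads=MIN_UNIQUE_READS):
    """Return the names of cells with at least `min_reads` uniquely mapped reads."""
    return [cell for cell, n in unique_reads_per_cell.items() if n >= min_reads]


# Toy data (hypothetical cell names and counts).
unique_reads = {"cell_A": 2_500_000, "cell_B": 800_000, "cell_C": 1_000_000}
kept = passing_cells(unique_reads)

# 500 reads on a 2 kb gene in a cell with 2.5 million mapped reads.
example_rpkm = rpkm(read_count=500, gene_length_bp=2_000,
                    total_mapped_reads=2_500_000)
```

Here `cell_B` is dropped because it falls below the cutoff, and the example gene has an RPKM of 100.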
Allele-specific expression and classification of escapee genes
For each gene on the X chromosome, the aggregate monoallelic ratio (AR) per cell was calculated by averaging the allelic ratio of the reads covering the respective heterozygous sites (AR = sum of the number of reads from the active X allele / total SNV reads; 0 ≤ AR ≤ 1). Because parental genotypes were not available for all individuals, we designed an algorithm to estimate the active X allele per site based on the assumption that the active X allele is, on average, more transcribed than the inactive X allele. We validated this assumption by comparing the prediction of our algorithm with the phasing of twins’ X alleles based on parental information (more details in Supplemental Text S1). According to this metric, inactivated genes cluster around AR = 1, while known escapees appear to be uniformly distributed over 0.5 ≤ AR ≤ 0.95 (linear phase of the cumulative distribution, Supplemental Figure S7). In support of this observation, the AR distribution of autosomal genes indicates AR = 0.95 as the threshold separating biallelically expressed genes from monoallelically expressed genes (Supplemental Figure S8). Therefore, we consider a gene an escapee in the relaxed set when the aggregate AR is ≤ 0.95 in at least 1 individual, and an escapee in the robust set when the aggregate AR is ≤ 0.95 in at least 2 individuals. To reduce the effect of allele dropout, we consider only SNV sites covered by at least 5 reads in at least three cells. To reduce sampling bias effects (Deng et al. 2014a), a gene is included in the analysis if it is detectable in more than 5 different cells and/or SNVs per sample.

Haplotype and multiple cells (doublets) detection
For each cell, the expressed haplotype was estimated by calculating the allelic ratio of each heterozygous site on the X chromosome as identified by whole genome sequencing, excluding sites in the PAR regions (PAR1: chrX:60001-2699520, PAR2: chrX:154931044-155260560) and in known escapee genes (see section Annotation of the escapee genes). The estimated haplotype of each cell was compared to all others through a pairwise correlation-based hierarchical clustering procedure. A comparison of cells expressing concordant and discordant haplotypes results in a correlation near 1 and -1, respectively. Doublets, which simultaneously express both haplotypes, cluster around an absolute correlation of 0.5; they are identified and excluded from further analysis.
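The idea behind the doublet filter can be sketched as follows. This is a simplified stand-in, not the hierarchical clustering procedure used in the study: each cell is represented by its per-site allelic ratios, pairwise Pearson correlations are computed, and a cell is flagged when its median absolute correlation with the other cells is clearly below 1. The cutoff of 0.8, the function names and the toy profiles are assumptions made for illustration, and the sketch assumes every site is covered in every cell.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Assumption: an undefined correlation (zero variance) is treated as 0.
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def flag_doublets(profiles, cutoff=0.8):
    """Flag cells whose median |r| against all other cells is below `cutoff`.

    Single cells correlate near +1 or -1 with other single cells; cells
    expressing both haplotypes give intermediate absolute correlations.
    `cutoff` is a hypothetical value, not taken from the study.
    """
    flagged = []
    for cell, profile in profiles.items():
        abs_r = [abs(pearson(profile, other))
                 for name, other in profiles.items() if name != cell]
        if median(abs_r) < cutoff:
            flagged.append(cell)
    return flagged


# Toy per-site allelic ratios at six heterozygous X-linked sites.
profiles = {
    "a1": [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # haplotype A active
    "a2": [0.9, 0.1, 1.0, 0.0, 0.9, 0.1],  # haplotype A active, noisy
    "b1": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # haplotype B active
    "b2": [0.1, 0.9, 0.0, 1.0, 0.1, 0.9],  # haplotype B active, noisy
    "d":  [1.0, 0.0, 0.5, 0.5, 0.5, 0.5],  # mixture of both haplotypes
}
doublets = flag_doublets(profiles)
```

In this toy set only the mixed profile `d` is flagged; the four single-haplotype cells correlate at close to 1 or -1 with each other.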
Annotation of the escapee genes
First, we curated a list of 190 escapee genes previously observed in different cell types and tissues according to the literature (Carrel and Willard 2005; Johnston et al. 2008; Park et al. 2010; Yasukochi et al. 2010; Sharp et al. 2011; Cotton et al. 2013; Zhang et al. 2013) (Table S3). Specifically, we investigated the status of 115 known escapee genes that have informative heterozygous sites and are expressed in fibroblast and lymphoblast cell lines (Supplemental Table S3). Second, we appended the results published in two studies (Balaton et al. 2015; Tukiainen et al. 2017) in Table S2. Genes detected as escapees in our study that have not been reported as escapees in the cited literature have been labeled as novel escapee genes. Genes found to be escapees in our study but subject to inactivation in other studies have been labeled as variable escapee genes.
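The AR threshold from the section "Allele-specific expression and classification of escapee genes" and the labeling rules above can be sketched together as follows. This is a minimal illustration under stated assumptions: the gene names, read counts and helper names are hypothetical, the read and cell coverage filters are omitted, and the order in which the "variable" and "novel" labels are tested is a choice made here, not one stated in the text.

```python
AR_THRESHOLD = 0.95  # threshold stated in the text


def aggregate_ar(site_counts):
    """AR = reads from the active X allele / total SNV reads.

    `site_counts` is a list of (active_allele_reads, total_reads) tuples,
    one per heterozygous site of a gene.
    """
    active = sum(a for a, _ in site_counts)
    total = sum(t for _, t in site_counts)
    return active / total if total else None


def classify(ar_by_individual, threshold=AR_THRESHOLD):
    """Return 'robust' (AR <= threshold in >= 2 individuals), 'relaxed'
    (in exactly 1 individual here; robust genes also satisfy the relaxed
    criterion) or 'inactivated'."""
    n = sum(1 for ar in ar_by_individual.values()
            if ar is not None and ar <= threshold)
    if n >= 2:
        return "robust"
    if n >= 1:
        return "relaxed"
    return "inactivated"


def label(gene, status, known_escapees, reported_inactivated):
    """Annotate an escapee against the literature lists (assumed inputs)."""
    if status == "inactivated":
        return "inactivated"
    if gene in reported_inactivated:
        return "variable escapee"
    if gene not in known_escapees:
        return "novel escapee"
    return "known escapee"


# Toy data: gene -> individual -> [(active reads, total reads), ...]
counts = {
    "GENE_A": {"ind1": [(40, 50), (30, 50)], "ind2": [(45, 50)]},
    "GENE_B": {"ind1": [(50, 50)], "ind2": [(96, 100)]},
    "GENE_C": {"ind1": [(18, 20)], "ind2": [(99, 100)]},
    "GENE_D": {"ind1": [(10, 20)]},
}
known = {"GENE_A"}
reported_inactivated = {"GENE_C"}

results = {}
for gene, per_ind in counts.items():
    ars = {ind: aggregate_ar(sites) for ind, sites in per_ind.items()}
    status = classify(ars)
    results[gene] = (status, label(gene, status, known, reported_inactivated))
```

With these toy counts, `GENE_A` (AR 0.7 and 0.9) lands in the robust set, `GENE_B` (AR 1.0 and 0.96) stays inactivated, and `GENE_C` and `GENE_D` fall in the relaxed set as a variable and a novel escapee, respectively.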


Cell cycle phase assignment
G1, S, and G2M cell cycle stage-related gene markers were obtained from CycleBase (Santos et al. 2015). Cells not expressing MKI67 were considered to be in G0 (Schonk et al. 1989). The remaining cells were assigned to their respective cell cycle phase according to the expression of CycleBase genes with Cyclone (Scialdone et al. 2015).

ACKNOWLEDGMENTS
We thank the laboratory of E. T. Dermitzakis for discussions. This work was supported by grants from the Swiss National Science Foundation (SNF-144082), the European Research Council (ERC-249968) and the ChildCare foundation to S.E.A. The computations were performed at the Vital-IT (http://www.vital-it.ch) Center for high-performance computing of the SIB Swiss Institute of Bioinformatics.

ACCESSION NUMBERS
Newly generated RNA and DNA sequencing data are deposited in the European Genome-phenome Archive (EGA, https://www.ebi.ac.uk/ega/) for controlled access; the study accession number is (EGASxxxx, to be determined).

LEGENDS TO FIGURES

Figure 1. Identification and elimination of confounding doublets.


(A) Flowchart of the study. Whole genome sequencing for each individual identified the respective heterozygous sites. Single-cell RNA-seq provided RNA abundance for each single cell. After QC and removal of the doublets, our framework calculated AR for the X-linked genes to enable the identification of escapees and the study of cellular XCI. (B) Heat map of unsupervised hierarchical clustering using cell-cell pairwise Pearson correlation coefficients of AR from single cells from individual 3 before (left) and after (right) excluding the identified doublets. The heat map separates the cells expressing one haplotype (blue and green clusters) from cells expressing two haplotypes (black cluster, doublets). Pearson correlations range from -1 (red) to 1 (white).

Figure 2. Single cell Allelic Ratio profiles with respect to the active haplotype for genes on the female X chromosome. For each individual, the Allelic Ratio for each gene (fibroblasts or lymphoblasts) is reported for each cell along the x-axis (rectangles with AR ≥ 0.95 (blue) and AR ≤ 0.95 (red)) according to the genomic location of genes on the human X chromosome (y-axis). The 55 genes identified as escapees in at least one individual are annotated on the left of the X chromosome ideogram. Known escapees are shown in red; novel escapees in black; escapees from the robust set are marked with an asterisk. XIST is shown in green. PAR, pseudoautosomal regions; LCL, lymphoblastoid cells.

Figure 3. Single cell ASE profile of 22 robust escapee genes in human female fibroblasts.
Composite figure of individual allelic ratios per gene per cell. (Top of the panel) Allelic Ratio profile of robust escapee genes (listed in the rows) with detectable expression in single cells (ordered along the columns). Every dot represents the allelic ratio of the respective gene in a cell. AR ranges from 0 (light orange) to 1 (dark blue). The size of the dot is proportional to the respective number of reads detected per cell. (%) is the percentage of cells where the respective gene is escaping XCI. (Bottom of the panel) Bar plot representing the Inactivation Score (see text for details) per cell. IS ranges from 0 (light orange) to 1 (dark blue).

Figure 4. Cellular propensity to escape XCI.
(A) Cells ranked by the Inactivation Score calculated on all escapees in the robust set (left - Escapee) and on all inactive genes (right - Inactive) per individual (color coded - see legend). Each dot represents a fibroblast cell. (B) Y-axis - cells ordered according to the percentage of escapees displaying an inactive status (AR ≥ 0.95, with respect to all detectable escapees) per individual (x-axis). Each dot represents a fibroblast cell.

Figure 5. XIST association and correlation with IS and its repressive role for X chromosome genes.
(A) Manhattan-like plot for Pearson correlation of X-linked gene expression with Inactivation Score. The plot represents -log10 p-value against X chromosome position. Vertical lines represent PAR region limits. Red horizontal lines represent the significance threshold for positively and negatively correlated genes, respectively. (B) Distributions of correlations of expression of XIST with inactivated genes (upper left), PAR-located genes (upper right), escapees in the robust set (lower left), and escapees in the relaxed set (lower right). P-values were calculated with the Mann-Whitney test (see text for details).

Figure 6. XIST expression and IS dependency on cell cycle.
Distribution of XIST expression (left) and of Inactivation Scores per single cell (right) according to cell cycle phases (n = number of cells per stage). P-values were calculated with the Mann-Whitney test (see text for details).

Supplemental Data
Supplemental data include eight figures, additional methods and three tables.

References
Ambros IM, Rumpler S, Luegmayr A, Hattinger CM, Strehl S, Kovar H, Gadner H, Ambros PF. 1997. Neuroblastoma cells can actively eliminate supernumerary MYCN gene copies by micronucleus formation--sign of tumour cell revertance? Eur J Cancer 33: 2043-2049.
Balaton BP, Cotton AM, Brown CJ. 2015. Derivation of consensus inactivation status for X-linked genes from genome-wide studies. Biol Sex Differ 6: 35.
Ballabio A, Willard HF. 1992. Mammalian X-chromosome inactivation and the XIST gene. Curr Opin Genet Dev 2: 439-447.
Bartolomei MS, Ferguson-Smith AC. 2011. Mammalian genomic imprinting. Cold Spring Harbor perspectives in biology 3.
Berletch JB, Ma W, Yang F, Shendure J, Noble WS, Disteche CM, Deng X. 2015. Escape from X inactivation varies in mouse tissues. PLoS Genet 11: e1005079.
Berletch JB, Yang F, Xu J, Carrel L, Disteche CM. 2011. Genes that escape from X inactivation. Human genetics 130: 237-245.


Bernard G, Breittmayer JP, de Matteis M, Trampont P, Hofman P, Senik A, Bernard A. 1997. Apoptosis of immature thymocytes mediated by E2/CD99. J Immunol 158: 2543-2550.
Bernard G, Raimondi V, Alberti I, Pourtein M, Widjenes J, Ticchioni M, Bernard A. 2000. CD99 (E2) up-regulates alpha4beta1-dependent T cell adhesion to inflamed vascular endothelium under flow conditions. Eur J Immunol 30: 3061-3065.
Borel C, Ferreira PG, Santoni F, Delaneau O, Fort A, Popadin KY, Garieri M, Falconnet E, Ribaux P, Guipponi M et al. 2015. Biased allelic expression in human primary fibroblast single cells. Am J Hum Genet 96: 70-80.
Borensztein M, Syx L, Ancelin K, Diabangouaya P, Picard C, Liu T, Liang JB, Vassilev I, Galupa R, Servant N et al. 2017. Xist-dependent imprinted X inactivation and the early developmental consequences of its failure. Nat Struct Mol Biol 24: 226-233.
Brown CJ, Ballabio A, Rupert JL, Lafreniere RG, Grompe M, Tonlorenzi R, Willard HF. 1991. A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349: 38-44.
Brown CJ, Hendrich BD, Rupert JL, Lafreniere RG, Xing Y, Lawrence J, Willard HF. 1992. The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71: 527-542.
Cacciagli P, Sutera-Sardo J, Borges-Correia A, Roux JC, Dorboz I, Desvignes JP, Badens C, Delepine M, Lathrop M, Cau P et al. 2013. Mutations in BCAP31 cause a severe X-linked phenotype with deafness, dystonia, and central hypomyelination and disorganize the Golgi apparatus. Am J Hum Genet 93: 579-586.
Carrel L, Willard HF. 1999. Heterogeneous gene expression from the inactive X chromosome: an X-linked gene that escapes X inactivation in some human cell lines but is inactivated in others. Proc Natl Acad Sci U S A 96: 7364-7369.
Carrel L, Willard HF. 2005. X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434: 400-404.
Ceccarelli DF, Laister RC, Mulligan VK, Kean MJ, Goudreault M, Scott IC, Derry WB, Chakrabartty A, Gingras AC, Sicheri F. 2011. CCM3/PDCD10 heterodimerizes with germinal center kinase III (GCKIII) proteins using a mechanism analogous to CCM3 homodimerization. J Biol Chem 286: 25056-25064.
Chen CK, Blanco M, Jackson C, Aznauryan E, Ollikainen N, Surka C, Chow A, Cerase A, McDonel P, Guttman M. 2016a. Xist recruits the X chromosome to the nuclear lamina to enable chromosome-wide silencing. Science 354: 468-472.
Chen G, Schell JP, Benitez JA, Petropoulos S, Yilmaz M, Reinius B, Alekseenko Z, Shi L, Hedlund E, Lanner F et al. 2016b. Single-cell analyses of X Chromosome inactivation dynamics and pluripotency during differentiation. Genome research 26: 1342-1354.
Chow J, Heard E. 2009. X inactivation and the complexities of silencing a sex chromosome. Curr Opin Cell Biol 21: 359-366.
Cotton AM, Ge B, Light N, Adoue V, Pastinen T, Brown CJ. 2013. Analysis of expressed SNPs identifies variable extents of expression from the human inactive X chromosome. Genome Biol 14: R122.
Cotton AM, Price EM, Jones MJ, Balaton BP, Kobor MS, Brown CJ. 2015. Landscape of DNA methylation on the X chromosome reflects CpG density, functional chromatin state and X-chromosome inactivation. Hum Mol Genet 24: 1528-1539.
Crowley JJ, Zhabotynsky V, Sun W, Huang S, Pakatci IK, Kim Y, Wang JR, Morgan AP, Calaway JD, Aylor DL et al. 2015. Analyses of allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nature genetics 47: 353-360.


Deng Q, Ramskold D, Reinius B, Sandberg R. 2014a. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 343: 193-196.
Deng X, Berletch JB, Nguyen DK, Disteche CM. 2014b. X chromosome regulation: diverse patterns in development, tissues and disease. Nature reviews Genetics 15: 367-378.
Dunford A, Weinstock DM, Savova V, Schumacher SE, Cleary JP, Yoda A, Sullivan TJ, Hess JM, Gimelbrant AA, Beroukhim R et al. 2016. Tumor-suppressor genes that escape from X-inactivation contribute to cancer sex bias. Nature genetics doi:10.1038/ng.3726.
Ehrmann IE, Ellis PS, Mazeyrat S, Duthie S, Brockdorff N, Mattei MG, Gavin MA, Affara NA, Brown GM, Simpson E et al. 1998. Characterization of genes encoding translation initiation factor eIF-2gamma in mouse and human: sex chromosome localization, escape from X-inactivation and evolution. Hum Mol Genet 7: 1725-1737.
El-Mansoury M, Bryman I, Berntorp K, Hanson C, Wilhelmsen L, Landin-Wilhelmsen K. 2005. Hypothyroidism is common in turner syndrome: results of a five-year follow-up. J Clin Endocrinol Metab 90: 2131-2135.
Fernandez-Moreira D, Ugalde C, Smeets R, Rodenburg RJ, Lopez-Laso E, Ruiz-Falco ML, Briones P, Martin MA, Smeitink JA, Arenas J. 2007. X-linked NDUFA1 gene mutations associated with mitochondrial encephalomyopathy. Ann Neurol 61: 73-83.
Fialkow PJ. 1970. X-chromosome inactivation and the Xg locus. Am J Hum Genet 22: 460-463.
Froyen G, Belet S, Martinez F, Santos-Reboucas CB, Declercq M, Verbeeck J, Donckers L, Berland S, Mayo S, Rosello M et al. 2012. Copy-number gains of HUWE1 due to replication- and recombination-based rearrangements. Am J Hum Genet 91: 252-264.
Froyen G, Corbett M, Vandewalle J, Jarvela I, Lawrence O, Meldrum C, Bauters M, Govaerts K, Vandeleur L, Van Esch H et al. 2008. Submicroscopic duplications of the hydroxysteroid dehydrogenase HSD17B10 and the E3 ubiquitin ligase HUWE1 are associated with mental retardation. Am J Hum Genet 82: 432-443.
Gong X, Delorme R, Fauchereau F, Durand CM, Chaste P, Betancur C, Goubran-Botros H, Nygren G, Anckarsater H, Rastam M et al. 2009. An investigation of ribosomal protein L10 gene in autism spectrum disorders. BMC Med Genet 10: 7.
Grasso CS, Wu YM, Robinson DR, Cao X, Dhanasekaran SM, Khan AP, Quist MJ, Jing X, Lonigro RJ, Brenner JC et al. 2012. The mutational landscape of lethal castration-resistant prostate cancer. Nature 487: 239-243.
Gropman A, Samango-Sprouse CA. 2013. Neurocognitive variance and neurological underpinnings of the X and Y chromosomal variations. American journal of medical genetics Part C, Seminars in medical genetics 163C: 35-43.
Gueneau L, Bertrand AT, Jais JP, Salih MA, Stojkovic T, Wehnert M, Hoeltzenbein M, Spuler S, Saitoh S, Verschueren A et al. 2009. Mutations of the FHL1 gene cause Emery-Dreifuss muscular dystrophy. Am J Hum Genet 85: 338-353.
Guo A, Tan Y, Liu C, Zheng X. 2017. MST-4 and TRAF-6 expression in the peripheral blood mononuclear cells of patients with Graves' disease and its significance. BMC Endocr Disord 17: 11.
Huynh KD, Lee JT. 2003. Inheritance of a pre-inactivated paternal X chromosome in early mouse embryos. Nature 426: 857-862.
Jiao S, Zhang Z, Li C, Huang M, Shi Z, Wang Y, Song X, Liu H, Li C, Chen M et al. 2015. The kinase MST4 limits inflammatory responses through direct phosphorylation of the adaptor TRAF6. Nat Immunol 16: 246-257.
Johnston CM, Lovell FL, Leongamornlert DA, Stranger BE, Dermitzakis ET, Ross MT. 2008. Large-scale population study of human cell lines indicates that dosage compensation is virtually complete. PLoS Genet 4: e9.


Jones DT, Jager N, Kool M, Zichner T, Hutter B, Sultan M, Cho YJ, Pugh TJ, Hovestadt V, Stutz AM et al. 2012. Dissecting the genomic complexity underlying medulloblastoma. Nature 488: 100-105.
Jonkers I, Monkhorst K, Rentmeester E, Grootegoed JA, Grosveld F, Gribnau J. 2008. Xist RNA is confined to the nuclear territory of the silenced X chromosome throughout the cell cycle. Mol Cell Biol 28: 5583-5594.
Kere J, Srivastava AK, Montonen O, Zonana J, Thomas N, Ferguson B, Munoz F, Morgan D, Clarke A, Baybayan P et al. 1996. X-linked anhidrotic (hypohidrotic) ectodermal dysplasia is caused by mutation in a novel transmembrane protein. Nature genetics 13: 409-416.
Kremer EJ, Pritchard M, Lynch M, Yu S, Holman K, Baker E, Warren ST, Schlessinger D, Sutherland GR, Richards RI. 1991. Mapping of DNA instability at the fragile X to a trinucleotide repeat sequence p(CCG)n. Science 252: 1711-1714.
Lederer D, Grisart B, Digilio MC, Benoit V, Crespin M, Ghariani SC, Maystadt I, Dallapiccola B, Verellen-Dumoulin C. 2012. Deletion of KDM6A, a histone demethylase interacting with MLL2, in three patients with Kabuki syndrome. Am J Hum Genet 90: 119-124.
Lee JT, Bartolomei MS. 2013. X-inactivation, imprinting, and long noncoding RNAs in health and disease. Cell 152: 1308-1323.
Li H, Durbin R. 2010. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26: 589-595.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
Liewluck T, Hayashi YK, Ohsawa M, Kurokawa R, Fujita M, Noguchi S, Nonaka I, Nishino I. 2007. Unfolded protein response and aggresome formation in hereditary reducing-body myopathy. Muscle Nerve 35: 322-326.
Lyon MF. 1961. Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190: 372-373.
Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM et al. 2015. Highly Parallel Genome-wide Expression Profiling of Individual Cells Using Nanoliter Droplets. Cell 161: 1202-1214.
Mak W, Nesterova TB, de Napoles M, Appanah R, Yamanaka S, Otte AP, Brockdorff N. 2004. Reactivation of the paternal X chromosome in early mouse embryos. Science 303: 666-669.
Manara MC, Bernard G, Lollini PL, Nanni P, Zuntini M, Landuzzi L, Benini S, Lattanzi G, Sciandra M, Serra M et al. 2006. CD99 acts as an oncosuppressor in osteosarcoma. Mol Biol Cell 17: 1910-1921.
Marco-Sola S, Sammeth M, Guigo R, Ribeca P. 2012. The GEM mapper: fast, accurate and versatile alignment by filtration. Nat Methods doi:10.1038/nmeth.2221.
Miyake N, Mizuno S, Okamoto N, Ohashi H, Shiina M, Ogata K, Tsurusaki Y, Nakashima M, Saitsu H, Niikawa N et al. 2013. KDM6A point mutations cause Kabuki syndrome. Human mutation 34: 108-110.
Moortgat S, Desir J, Benoit V, Boulanger S, Pendeville H, Nassogne MC, Lederer D, Maystadt I. 2016. Two novel EIF2S3 mutations associated with syndromic intellectual disability with severe microcephaly, growth retardation, and epilepsy. Am J Med Genet A 170: 2927-2933.
Myant KB, Cammareri P, Hodder MC, Wills J, Von Kriegsheim A, Gyorffy B, Rashid M, Polo S, Maspero E, Vaughan L et al. 2017. HUWE1 is a critical colonic tumour suppressor gene that prevents MYC signalling, DNA damage accumulation and tumour initiation. EMBO Mol Med 9: 181-197.
Nawara M, Klapecki J, Borg K, Jurek M, Moreno S, Tryfon J, Bal J, Chelly J, Mazurczak T. 2008. Novel mutation of IL1RAPL1 gene in a nonspecific X-linked mental retardation (MRX) family. Am J Med Genet A 146A: 3167-3172.
Okamoto I, Arnaud D, Le Baccon P, Otte AP, Disteche CM, Avner P, Heard E. 2005. Evidence for de novo imprinted X-chromosome inactivation independent of meiotic inactivation in mice. Nature 438: 369-373.
Okamoto I, Otte AP, Allis CD, Reinberg D, Heard E. 2004. Epigenetic dynamics of imprinted X inactivation during early mouse development. Science 303: 644-649.
Park C, Carrel L, Makova KD. 2010. Strong purifying selection at genes escaping X chromosome inactivation. Molecular biology and evolution 27: 2446-2450.
Patrat C, Okamoto I, Diabangouaya P, Vialon V, Le Baccon P, Chow J, Heard E. 2009. Dynamic changes in paternal X-chromosome activity during imprinted X-chromosome inactivation in mice. Proc Natl Acad Sci U S A 106: 5198-5203.
Penny GD, Kay GF, Sheardown SA, Rastan S, Brockdorff N. 1996. Requirement for Xist in X chromosome inactivation. Nature 379: 131-137.
Petropoulos S, Edsgard D, Reinius B, Deng Q, Panula SP, Codeluppi S, Reyes AP, Linnarsson S, Sandberg R, Lanner F. 2016. Single-Cell RNA-Seq Reveals Lineage and X Chromosome Dynamics in Human Preimplantation Embryos. Cell 167: 285.
Potluri P, Davila A, Ruiz-Pesini E, Mishmar D, O'Hearn S, Hancock S, Simon M, Scheffler IE, Wallace DC, Procaccio V. 2009. A novel NDUFA1 mutation leads to a progressive mitochondrial complex I-specific neurodegenerative disease. Mol Genet Metab 96: 189-195.
Prothero KE, Stahl JM, Carrel L. 2009. Dosage compensation and gene expression on the mammalian X chromosome: one plus one does not always equal two. Chromosome Res 17: 637-648.
Quinzii CM, Vu TH, Min KC, Tanji K, Barral S, Grewal RP, Kattah A, Camano P, Otaegui D, Kunimatsu T et al. 2008. X-linked dominant scapuloperoneal myopathy is due to a mutation in the gene encoding four-and-a-half-LIM protein 1. Am J Hum Genet 82: 208-213.
Reinius B, Mold JE, Ramskold D, Deng Q, Johnsson P, Michaelsson J, Frisen J, Sandberg R. 2016. Analysis of allelic expression patterns in clonal somatic cells by single-cell RNA-seq. Nature genetics 48: 1430-1435.
Santoni FA, Stamoulis G, Garieri M, Falconnet E, Ribaux P, Borel C, Antonarakis SE. 2017. Detection of Imprinted Genes by Single-Cell Allele-Specific Gene Expression. Am J Hum Genet 100: 444-453.
Santos A, Wernersson R, Jensen LJ. 2015. Cyclebase 3.0: a multi-organism database on cell-cycle regulation and phenotypes. Nucleic Acids Res 43: D1140-1144.
Schessl J, Zou Y, McGrath MJ, Cowling BS, Maiti B, Chin SS, Sewry C, Battini R, Hu Y, Cottle DL et al. 2008. Proteomic identification of FHL1 as the protein mutated in human reducing body myopathy. J Clin Invest 118: 904-912.
Schneider-Gadicke A, Beer-Romero P, Brown LG, Nussbaum R, Page DC. 1989. ZFX has a gene structure similar to ZFY, the putative human sex determinant, and escapes X inactivation. Cell 57: 1247-1258.
Schonk DM, Kuijpers HJ, van Drunen E, van Dalen CH, Geurts van Kessel AH, Verheijen R, Ramaekers FC. 1989. Assignment of the gene(s) involved in the expression of the proliferation-related Ki-67 antigen to human chromosome 10. Human genetics 83: 297-299.


Scialdone A, Natarajan KN, Saraiva LR, Proserpio V, Teichmann SA, Stegle O, Marioni JC, Buettner F. 2015. Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85: 54-61.
Shalaby S, Hayashi YK, Nonaka I, Noguchi S, Nishino I. 2009. Novel FHL1 mutations in fatal and benign reducing body myopathy. Neurology 72: 375-376.
Sharman GB. 1971. Late DNA replication in the paternally derived X chromosome of female kangaroos. Nature 230: 231-232.
Sharp AJ, Stathaki E, Migliavacca E, Brahmachary M, Montgomery SB, Dupre Y, Antonarakis SE. 2011. DNA methylation profiles of human active and inactive X chromosomes. Genome research 21: 1592-1600.
Shashi V, Berry MN, Shoaf S, Sciote JJ, Goldstein D, Hart TC. 2000. A unique form of mental retardation with a distinctive phenotype maps to Xq26-q27. Am J Hum Genet 66: 469-479.
Shashi V, Xie P, Schoch K, Goldstein DB, Howard TD, Berry MN, Schwartz CE, Cronin K, Sliwa S, Allen A et al. 2015. The RBMX gene as a candidate for the Shashi X-linked intellectual disability syndrome. Clin Genet 88: 386-390.
Sobecki M, Mrouj K, Colinge J, Gerbe F, Jay P, Krasinska L, Dulic V, Fisher D. 2017. Cell-Cycle Regulation Accounts for Variability in Ki-67 Expression Levels. Cancer Res 77: 2722-2734.
Stegle O, Teichmann SA, Marioni JC. 2015. Computational and analytical challenges in single-cell transcriptomics. Nature reviews Genetics 16: 133-145.
Sudbrak R, Wieczorek G, Nuber UA, Mann W, Kirchner R, Erdogan F, Brown CJ, Wohrle D, Sterk P, Kalscheuer VM et al. 2001. X chromosome-specific cDNA arrays: identification of genes that escape from X-inactivation and other applications. Hum Mol Genet 10: 77-83.
Tabolacci E, Pomponi MG, Pietrobono R, Terracciano A, Chiurazzi P, Neri G. 2006. A truncating mutation in the IL1RAPL1 gene is responsible for X-linked mental retardation in the MRX21 family. Am J Med Genet A 140: 482-487.
Tao R, Jin B, Guo SZ, Qing W, Feng GY, Brooks DG, Liu L, Xu J, Li T, Yan Y et al. 2006. A novel missense mutation of the EDA gene in a Mongolian family with congenital hypodontia. J Hum Genet 51: 498-502.
Tarpey P, Pemberton TJ, Stockton DW, Das P, Ninis V, Edkins S, Andrew Futreal P, Wooster R, Kamath S, Nayak R et al. 2007. A novel Gln358Glu mutation in ectodysplasin A associated with X-linked dominant incisor hypodontia. Am J Med Genet A 143: 390-394.
Thevenon J, Michot C, Bole C, Nitschke P, Nizon M, Faivre L, Munnich A, Lyonnet S, Bonnefont JP, Portes VD et al. 2015. RPL10 mutation segregating in a family with X-linked syndromic Intellectual Disability. Am J Med Genet A 167A: 1908-1912.
Tukiainen T, Villani AC, Yen A, Rivas MA, Marshall JL, Satija R, Aguirre M, Gauthier L, Fleharty M, Kirby A et al. 2017. Landscape of X chromosome inactivation across human tissues. Nature 550: 244-248.
van Haaften G, Dalgliesh GL, Davies H, Chen L, Bignell G, Greenman C, Edkins S, Hardy C, O'Meara S, Teague J et al. 2009. Somatic mutations of the histone H3K27 demethylase gene UTX in human cancer. Nature genetics 41: 521-523.
Visinoni AF, de Souza RL, Freire-Maia N, Gollop TR, Chautard-Freire-Maia EA. 2003. X-linked hypohidrotic ectodermal dysplasia mutations in Brazilian families. Am J Med Genet A 122A: 51-55.
Wang J, Syrett CM, Kramer MC, Basu A, Atchison ML, Anguera MC. 2016. Unusual maintenance of X chromosome inactivation predisposes female lymphocytes for increased expression from the inactive X. Proc Natl Acad Sci U S A 113: E2029-2038.


Wang K, Li M, Hakonarson H. 2010. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38: e164.
Yang F, Babak T, Shendure J, Disteche CM. 2010. Global survey of escape from X inactivation by RNA-sequencing in mouse. Genome research 20: 614-622.
Yasukochi Y, Maruyama O, Mahajan MC, Padden C, Euskirchen GM, Schulz V, Hirakawa H, Kuhara S, Pan XH, Newburger PE et al. 2010. X chromosome-wide analyses of genomic DNA methylation states and gene expression in male and female neutrophils. Proc Natl Acad Sci U S A 107: 3704-3709.
Yotsumoto S, Fukumaru S, Matsushita S, Oku T, Kobayashi K, Saheki T, Kanzaki T. 1998. A novel point mutation of the EDA gene in a Japanese family with anhidrotic ectodermal dysplasia. J Invest Dermatol 111: 1246-1247.
Zhang Y, Castillo-Morales A, Jiang M, Zhu Y, Hu L, Urrutia AO, Kong X, Hurst LD. 2013. Genes that escape X-inactivation in humans have high intraspecific variability in expression, are associated with mental impairment but are not slow evolving. Molecular biology and evolution 30: 2588-2601.


[Figure, panels A and B: study workflow and sample correlation. Workflow: whole genome sequencing (WGS) to identify heterozygous sites for each individual; scRNA-seq of 983 single cells (SC), from samples of 229, 160, 192, 192, 162 and 48 SC; QC filtering (951 SC); removal of double cells (869 SC); X chromosome ASE single-cell analysis; study of cellular dynamics of XCI. Correlation panel: ind 1 to ind 5, scale −1 to 1.]

[Figure: X-linked genes listed in chromosomal order, including the PAR, for the LCL sample and ind. 1 to ind. 5; some gene names are marked with an asterisk. Legend: AR in [0-0.95], escapee; AR in [0.95-1], inactivated.]
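The allelic-ratio (AR) legend of the gene map can be read as a simple threshold rule. The minimal Python sketch below illustrates that reading under stated assumptions: AR is taken here as the fraction of reads at a heterozygous site that come from the active-X allele, the boundary value 0.95 is assigned to the inactivated class, and the per-cell mean only illustrates averaging AR values over genes. The function names and read counts are hypothetical and are not taken from the authors' pipeline.

```python
from statistics import mean
from typing import Optional, Sequence

# Threshold taken from the figure legend: AR in [0.95-1] is "inactivated".
AR_INACTIVATED_MIN = 0.95


def allelic_ratio(active_reads: int, other_reads: int) -> Optional[float]:
    """Assumed definition: active-allele reads / all reads at the site."""
    total = active_reads + other_reads
    if total == 0:
        return None  # no coverage, so AR is undefined
    return active_reads / total


def classify(ar: float) -> str:
    """Label one gene in one cell.

    Assigning the boundary value 0.95 to 'inactivated' is an assumption;
    the legend does not say which class owns the boundary.
    """
    if not 0.0 <= ar <= 1.0:
        raise ValueError("AR must lie in [0, 1]")
    return "inactivated" if ar >= AR_INACTIVATED_MIN else "escapee"


def mean_ar(ars: Sequence[float]) -> float:
    """Average AR over the genes observed in one cell (illustrative only)."""
    return mean(ars)


# Hypothetical (active, other) read counts for three genes in one cell.
counts = {"GENE_A": (40, 0), "GENE_B": (30, 10), "GENE_C": (12, 12)}
ratios = {gene: allelic_ratio(a, b) for gene, (a, b) in counts.items()}
labels = {gene: classify(r) for gene, r in ratios.items()}
```

With these made-up counts, only GENE_A is expressed exclusively from one allele and is labeled inactivated; the other two fall below the 0.95 threshold and are labeled escapee.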

[Figure: gene-by-cell heatmaps (scale 0 to 1) for individuals 1 to 5, each with a per-cell IS track (0.00 to 1.00) and a "% Cells" value per gene. The individual cell identifiers on the horizontal axes are omitted here. "% Cells" values per gene:]

Individual 1: CA5BP1 37, DDX3X 29, HUWE1 23, INE2 14, KDM5C 38, KDM6A 16, MED14 0, MXRA5 4, RBBP7 0, RBMX 49, STK26 71, STS 12, TTC3P1 27, TXLNG 50, ZFX 85, ZRSR2 32
Individual 2: ARSD 20, CA5BP1 30, DDX3X 42, KDM5C 25, KDM6A 33, MED14 15, MXRA5 33, PUDP 72, RBBP7 25, RBMX 60, STK26 42, TXLNG 50, USP9X 18, ZFX 100, ZRSR2 46
Individual 3: ARSE 29, CA5B 29, CA5BP1 42, DDX3X 51, HUWE1 13, INE2 10, KDM6A 22, MED14 4, PUDP 63, RBMX 0, SMC1A 41, STS 20, TTC3P1 25, USP9X 34, ZFX 94
Individual 4: ARSE 30, CA5B 33, CA5BP1 47, DDX3X 35, HUWE1 25, INE2 63, KDM6A 26, MED14 15, PUDP 80, RBMX 0, SMC1A 30, STS 27, TTC3P1 27, USP9X 14, ZFX 94
Individual 5: ARSD 50, CA5BP1 53, DDX3X 61, INE2 44, KDM6A 27, MXRA5 4, PUDP 50, RBBP7 29, STK26 73, STS 29, TXLNG 73, USP9X 21, ZFX 89, ZRSR2 33

[Figure: Inactivation score (axis 0.0 to 1.0) and proportion (axis 0.0 to 1.0) for individual 1, individual 2, individual 3, individual 4 and individual 5 (fibro); category labels: Escapee, Inactive.]

[Figure, panels A and B: one panel plots -log10(p-value) for positively and negatively correlated genes against X chromosome position (Mb, 0 to 150), with labeled genes including CD99, EIF2S3, XIST, NDUFA1, BCAP31 and RPL10; the other shows histograms of the number of genes by correlation (about −0.2 to 0.2) for inactive genes, relaxed escapees, robust escapees and PAR genes. The p-values printed in the histogram panels are not recoverable from the extracted text.]

[Figure: XIST expression (RPKM, axis 0 to 50) and IS (axis 0.00 to 1.00) by cell-cycle phase: G0 (n=264), G1 (n=414), S (n=30), G2M (n=94). P-values shown: 2.677 x 10^-7 and 7.046 x 10^-5.]