Dissection of Xist Functional Elements Involved in X-Chromosome Inactivation

Citation Colognori, David A. 2019. Dissection of Xist Functional Elements Involved in X-Chromosome Inactivation. Doctoral dissertation, Harvard University, Graduate School of Arts & Sciences.

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:42029476

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Dissection of Xist functional elements involved in X-chromosome inactivation

A dissertation presented

by

David A. Colognori

to

The Division of Medical Sciences

in partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the subject of

Biological and Biomedical Sciences

Harvard University

Cambridge, Massachusetts

May 2019

© 2019 by David A. Colognori

All rights reserved.

Dissertation Advisor: Jeannie T. Lee

David A. Colognori

Dissection of Xist functional elements involved in X-chromosome inactivation

Abstract

X-chromosome inactivation (XCI) is the epigenetic process of silencing one of the two X chromosomes in female mammals to balance X-linked gene dosage with that of males. This process is governed by the long noncoding RNA Xist. Xist is expressed from and coats one X chromosome in cis, leading to recruitment of various factors to silence gene expression.

However, the functional RNA elements and molecular mechanisms involved in Xist coating and silencing remain ambiguous.

The rationale for this work was thus to perform a systematic deletional analysis to identify functional motifs within the endogenous Xist in female cells. The screen identified regions important for correct splicing of the Xist transcript, such as the Repeat A region. Others, including Repeat F, were required for proper expression and/or RNA stability. Repeat E was crucial for restricting Xist localization to the inactive X (Xi) territory. Finally, Repeat B was necessary for recruitment of Polycomb Repressive Complexes 1 and 2 (PRC1 and PRC2) across the Xi, as well as for proper Xist coating.

To gain mechanistic insight into Polycomb targeting and Xist localization, I identified protein trans factors associated with the Repeat B and E motifs. Repeat B function in Polycomb recruitment and Xist coating was shown to be mediated through direct interaction with the RNA-binding protein HNRNPK. Meanwhile, Repeat E function in Xist localization was found to be mediated through interaction with the nuclear matrix protein CIZ1.

Just as Xist RNA is necessary to spread Polycomb complexes across the Xi, I found that Polycomb complexes are in turn necessary to properly spread Xist RNA. In addition, PRC1 and PRC2 occupancy on Xi was discovered to be mutually interdependent. Hence, Xist, PRC1, and PRC2 require each other to propagate along the Xi, suggesting a positive feedback mechanism between RNA initiator and protein effectors. Perturbing Xist/Polycomb spreading has significant consequences, as deleting Repeat B during de novo XCI establishment causes failure of X-linked gene silencing and disrupts architectural reconfiguration of the X from an active to an inactive chromosomal structure.

Table of Contents

Abstract
Table of Contents
List of Figures and Tables
Acknowledgements
Dedication

Chapter 1 Introduction

Epigenetics and chromatin
Dosage compensation and X-chromosome inactivation
The X-inactivation center and X-inactive specific transcript
XCI as a paradigm of epigenetic regulation by RNA
Features and mechanisms of Xist RNA
Xist spreading
Xist-mediated gene silencing
Polycomb complexes and their function
Polycomb recruitment, spreading, and maintenance
Xist-mediated Polycomb recruitment
Unique chromosomal architecture of the Xi
Xist-mediated changes to chromosomal architecture
Preface

Chapter 2 Xist deletional analysis reveals an interdependency between Xist RNA and Polycomb complexes for spreading along the inactive X

Summary
Introduction
Results
Discussion
Materials and Methods

Chapter 3 Repeat E anchors Xist RNA to the inactive X chromosomal compartment through CDKN1A-interacting protein (CIZ1)

Summary
Introduction
Results
Discussion
Materials and Methods

Chapter 4 Conclusion

Controversy over Xist functional domains
Advantages of CRISPR/Cas9 technology and cell lines used in this work
Xist regions affecting RNA abundance and splicing: Repeats A, F, exons 2-6, 7a/d
Xist regions affecting RNA localization: Repeat E
CIZ1 interaction with Repeat E
Xist regions affecting RNA coating and Polycomb recruitment: Repeat B
HNRNPK interaction with Repeat B
Repeat B-HNRNPK interaction is required for Polycomb recruitment
Role of Repeat A in Polycomb recruitment
Role of Polycomb in Xist-mediated gene silencing
Mechanism of Repeat B/HNRNPK-mediated Polycomb recruitment
Xist co-opts general factors for XCI
Independent versus interdependent recruitment of Polycomb complexes
Role of Polycomb in Xist RNA coating
Role of Repeat B in architectural reconfiguration of the Xi

References

Appendix

List of Figures and Tables

Chapter 1

Figure 1.1 – Features of Xist
Figure 1.2 – Composition and interplay between PRC1 and PRC2

Chapter 2

Figure 2.1 – CRISPR/Cas9 deletion screen identifies Xist functional domains
Figure 2.2 – Xist deletions affecting splicing
Figure 2.3 – The Repeat B motif affects Xist localization in a CIZ1-independent manner
Figure 2.4 – /H2AK119ub IF and Xist RNA FISH for deletions without phenotype
Figure 2.5 – Repeat B is required for Xist RNA spreading and Polycomb maintenance
Figure 2.6 – Identification of HNRNPK as a Repeat B-interacting protein
Figure 2.7 – Direct Repeat B-HNRNPK interaction is required for Xist spreading and Polycomb maintenance
Figure 2.8 – Xist and Polycomb complexes depend on each other to spread across the Xi
Figure 2.9 – Diffuse Xist cloud morphology is not due to large-scale decompaction of Xi
Figure 2.10 – Independent and interdependent recruitment of PRC1 and PRC2 to the Xi
Figure 2.11 – Repeat B is required for complete Xist-mediated gene silencing
Figure 2.12 – Repeat B is required for proper Xa to Xi architectural reconfiguration

Chapter 3

Figure 3.1 – ASH2L antibody exhibits cross-reactivity to an unknown Xi-localizing protein
Figure 3.2 – CIZ1 is a novel Xi-localizing protein
Figure 3.3 – Xist RNA is required for CIZ1 recruitment to Xi
Figure 3.4 – CIZ1 is critical for maintenance of Xist cloud and Xi marks
Figure 3.5 – CIZ1 interacts with Xist RNA through the Repeat E region
Figure 3.6 – CIZ1 interacts with Xist RNA independently of HNRNPU

Chapter 4

Figure 4.1 – Repeat B secondary structure

Appendix

Table S1 – Sanger sequencing information
Table S2 – Mass spectrometry data
Table S3 – Guide RNA sequences
Table S4 – FISH probe sequences
Table S5 – Primer sequences

Acknowledgements

This dissertation would not have been possible without the love and support of countless individuals in my life. Science is intended to be a collaborative effort, and the long journey toward a Ph.D.—both at and away from the bench—accurately reflects that.

First and foremost, I would like to thank my family. I am grateful to my mother for recognizing and encouraging my scientific curiosity and creativity from a young age, as well as her patience with my many idiosyncrasies. I thank my sister, the other Ph.D. in our family, for her guidance throughout this difficult journey. I also thank my brother-in-law, Kevin, and my recent nephews, John and Christopher, for making my visits home so enjoyable. I was also fortunate to have several relatives near Boston who went out of their way to make me feel at home and provide a respite from work. Finally, I am grateful to my father for always believing in me and helping me get back up again when I fall. Although he is no longer with us, he is the reason I chose to pursue biological research in the first place. I miss you, Dad.

I was lucky to have amazing friends and housemates in Boston who made these years fly by: Diana Cai, Luvena Ong, Terence Wong, Tim Whipple, Jarom Chung, Shela Durresi, Bing Han, and Bernie Kuan. Thanks for all the game nights, potlucks, and sharing your lives with me.

I would also like to thank my high school and college friends for their support from afar, as well as my online gaming friends for all the laughs we shared over the internet.

I am indebted to my undergraduate research advisor, Joan Steitz, for starting me down the path of RNA biology. She accepted me into her lab with no prior research experience and taught me the fundamentals of doing good science. I owe my mentors in the Steitz lab, Andrei Alexandrov and Kasandra Riley, for teaching me everything I knew upon entering graduate school.

I am grateful to all past and present members of the Lee lab for their advice and support over the years, both scientific and personal. I am especially grateful to Hongjae Sunwoo, Stefan Pinter, Cathy Cifuentes-Rojas, Yesu Jeon, and Brian del Rosario for their close mentorship on various projects. Thank you for putting up with all my whining and shenanigans.

To my fellow BBS graduate students in the Lee lab (Chen-Yu Wang, John Froberg, Andrea Kriz, Lin Wang, and Johnny Kung), thanks for making my time here so enjoyable. I would not have been able to work so late into the night without the company of such amazing and inspiring friends. I will miss our conversations about science and life in general, as well as our impromptu dinners and movie nights.

I thank the members of my dissertation advisory committee, Bob Kingston, Mitzi Kuroda, and Danesh Moazed. Their critical advice and insight kept me on track toward a successful project and shaped my work into what it is today.

Last but not least, I would like to thank my dissertation advisor, Jeannie Lee. She maintained an unwavering positive attitude no matter what obstacles arose throughout my time in the lab. She always encouraged me to keep forging ahead and gave me the freedom, resources, and independence to pursue my own interests and ideas.

This thesis is dedicated to my mother, father, and sister.

Chapter 1

Introduction

Epigenetics and chromatin

“Epigenetics” (meaning “above” genetics) refers to changes in phenotype not caused by changes in underlying genotype. These phenotypic changes are heritable across cell division, establishing a form of “memory”. Epigenetic memory is largely responsible for cell fate specification during development, which in multicellular organisms generates a diverse array of phenotypes from a single genotype (Cavalli, 2006). These changes, while relatively stable, are dynamic and reversible upon exposure to new developmental or environmental cues. Epigenetics therefore constitutes an additional layer of gene regulation above the genetic level.

But what is the nature of this epigenetic regulation? In eukaryotes, DNA is packaged in the nucleus in the form of an ordered macromolecular structure called chromatin (Trojer and Reinberg, 2007). The basic unit of chromatin is the nucleosome, consisting of ~146 bp of DNA wrapped around an octamer of histones H2A, H2B, H3, and H4 like beads on a string (Luger et al., 1997). Changes in this packaging structure (e.g. densely vs. loosely packed) affect accessibility of transcription machinery to the wound DNA, thereby controlling gene expression. Specific epigenetic modifications include post-translational modification of histones, substitution of canonical histones with variant forms, or even chemical modification of DNA itself. Cadres of factors that perform (“writers”), recognize (“readers”), and remove (“erasers”) these modifications maintain the epigenetic system (Biswas and Rao, 2018).

Chromatin can be subcategorized into two types: active euchromatin and inactive heterochromatin. Euchromatin and heterochromatin can generally be distinguished according to their differential staining pattern in the nucleus, which corresponds to the level of compaction (Trojer and Reinberg, 2007). Active euchromatin is “open” or decompacted in interphase nuclei, allowing accessibility to transcription machinery. Euchromatin is generally gene-rich and marked by H3 lysine 4 trimethylation (H3K4me3), H3/H4 lysine acetylation, decreased nucleosome occupancy, and low CpG methylation of DNA at promoters (Mikkelsen et al., 2007; Bernstein et al., 2007; Lee et al., 2004; Bernstein et al., 2004; Meissner et al., 2008). Gene bodies are usually marked by H3K36me3 and high DNA methylation (Mikkelsen et al., 2007; Maunakea et al., 2010; Baubec et al., 2015).

Inactive heterochromatin is “closed” or compacted in interphase nuclei, precluding accessibility to transcription machinery. Heterochromatin is generally gene-poor and exhibits increased association with the nuclear lamina and late DNA replication timing during S-phase (Peric-Hupkes et al., 2010; Guelen et al., 2008; Ryba et al., 2010). Heterochromatin can be further subdivided into constitutive or facultative heterochromatin, depending on whether gene repression is permanent or conditional, respectively (Trojer and Reinberg, 2007). Constitutive heterochromatin is often marked by H3K9me2/3 and exemplified by repetitive DNA, such as centromeres, transposons, and telomeres (Peters et al., 2001; Mikkelsen et al., 2007). Facultative heterochromatin is often marked by H2AK119 ubiquitylation (H2AK119ub) and H3K27me3, placed by Polycomb repressive complexes 1 and 2 (PRC1 and PRC2), respectively (Schuettengruber et al., 2017). This type of chromatin modification is developmentally regulated, affording flexibility in response to particular stimuli. Examples of facultative heterochromatin include imprinted genes, Hox genes, and the inactive X chromosome.

Dosage compensation and X-chromosome inactivation

In diploid organisms, genes are generally expressed from both parental alleles, providing a safeguard against heterozygous mutations that may decrease fitness. There are, however, several exceptions in mammals where expression occurs from only one allele (Chess, 2016). The choice of which allele to express can be random (i.e. random monoallelic expression) or depend on whether it is maternally or paternally derived (i.e. genomic imprinting). Whereas random monoallelic expression and imprinting usually pertain to individual genes or small clusters of genes, mammalian X-chromosome inactivation represents an extreme case whereby an entire chromosome is expressed from a single copy.

But what is the purpose of inactivating an entire X chromosome? Sex chromosomes are believed to have evolved from a pair of identical autosomes. However, fixation of sex-enhancing genes on these chromosomes caused suppression of meiotic recombination, which led them down very different evolutionary paths (Hughes and Page, 2015; Graves, 2006). In mammals, the X chromosome encodes hundreds of genes required for normal developmental processes, whereas the Y chromosome is relatively gene-poor and specialized for male-specific processes such as spermatogenesis (Hughes and Page, 2015; Graves, 2006). Thus, females carrying 2 X chromosomes inherently have twice the dosage of essential X-linked genes compared to males.

To correct for this imbalance, one of the two X chromosomes in female mammals becomes silenced during a process called X-chromosome inactivation (XCI). Interestingly, other organisms accomplish the same feat via slightly different approaches (Ercan, 2015). Fly males (XY) upregulate their single X to achieve dosage parity with females (XX), and nematode hermaphrodites (XX) downregulate both X’s to achieve parity with males (XO). This dosage compensation is important to ensure proper levels of gene products, as imbalances can lead to disease. For example, Down (trisomy 21), Klinefelter (XXY), and Turner (XO) syndromes all result from incorrect dosage of chromosomes (Ercan, 2015).

The “inactive X hypothesis” was first proposed by Mary Lyon in 1961 as a unifying theory to explain several past observations (Lyon, 1961). For instance, it explained why heterozygous mutations in mouse X-linked genes (e.g. fur color) cause a variegated phenotype in females. It also addressed the functional significance of the condensed sex chromatin structure known as the “Barr body” (Barr and Bertram, 1949), which was later discovered to correspond to a single X chromosome (Ohno et al., 1959; Ohno and Hauschka, 1960). Finally, it fit well with observations that XO female mice were viable and fertile, which indicated that only one functional X chromosome was required for normal development (Welshons and Russell, 1959). In her landmark paper (Lyon, 1961), Lyon detailed how stochastic inactivation of one X chromosome early during female development leads to the condensed silent chromatin body that Barr had described. This inactivation must be stable and heritable through cell division. The random choice of which X gets silenced in heterozygous females produces some cell populations expressing one allele and some expressing the other, leading to a variegated phenotype. Ultimately, this inactivation would serve to compensate for the extra dosage of X-linked genes in females compared to males.

XCI can take on both imprinted and random forms. During early embryonic development prior to implantation into the uterus, the paternal X (Xp) is always inactivated (Takagi and Sasaki, 1975; Wake et al., 1976; West et al., 1977). The Xp remains inactivated in the extraembryonic tissue (i.e. placenta) but is reactivated in the inner cell mass (i.e. embryo proper) (Mak et al., 2004). Then either the Xp or maternal X (Xm) is randomly inactivated in these cells. Due to the random nature of this round of XCI and the multicellular stage of the embryo at this point (i.e. blastocyst), some cells will inactivate Xp and others Xm. Thus, the resulting offspring will develop into “chimeric” females composed of different populations of cells depending on which X is silenced/expressed. A classic example of this is the variegated fur color of tortoise-shell cats, due to the color trait being X-linked. This mosaicism can play a significant role in determining penetrance of X-linked diseases (e.g. Rett Syndrome) (Huppke et al., 2006). Since heterozygous mutations should not present a phenotype if located on the inactive X, the proportion of cells expressing a wild-type versus mutant allele (referred to as “skewing of XCI choice”) will often determine the severity of phenotype at the organismal level (Cattanach and Williams, 1972; Cattanach and Isaacson, 1967; Huppke et al., 2006). Keeping this in mind, marsupials undergo imprinted XCI in all cells, raising the possibility that random XCI arose later in evolution to increase diversity and/or fitness by allowing Xp to be expressed in ~50% of cells, rather than 100% expressing Xm (Richardson et al., 1971; Sharman, 1971; Cooper et al., 1971; Cooper, 1971; Cooper et al., 1993).

The X-inactivation center and X-inactive specific transcript

Through analyses of X:autosome translocations and yeast artificial chromosomes, the region responsible for XCI was mapped to a part of the X chromosome called the “X-inactivation center”, or XIC/Xic (henceforth Xic) (Russell, 1963; Rastan, 1983; Brown et al., 1991b; Lee et al., 1996; Lee et al., 1999b; Heard et al., 1999). This region harbors the genetic elements that are both necessary and sufficient to trigger XCI. Hybridization experiments between a cDNA clone containing the Xic and RNA isolated from somatic male versus female cells (or hybrids containing an inactive X) led to the identification of an “X-inactive specific transcript”, or XIST/Xist (henceforth Xist) (Figure 1.1A), in humans (Brown et al., 1991a) and mice (Brockdorff et al., 1991; Borsani et al., 1991). As its name implies, Xist is expressed only from the female inactive X (Xi) and not the active X (Xa) chromosome, contrary to nearly all other X-linked genes. Follow-up studies confirmed Xist is necessary for both random and imprinted XCI (Penny et al., 1996; Marahrens et al., 1997; Marahrens et al., 1998), and furthermore, that it is sufficient to induce long-range silencing of genes in cis when inserted as an autosomal transgene (Herzing et al., 1997; Lee and Jaenisch, 1997; Wutz and Jaenisch, 2000).

Xist RNA exhibits several unique features that distinguish it from most other cellular RNAs (Brockdorff et al., 1992; Brown et al., 1992). First, despite being capped, polyadenylated, and spliced, it remains in the nucleus. Second, it does not appear to code for protein, as it contains no significant open reading frame nor does it associate with polysomes. Third, it accumulates and spreads in cis across the X chromosome from which it is transcribed, forming a “cloud” readily detectable by RNA fluorescence in situ hybridization (FISH) (Figure 1.1B). Together, these features led to the hypothesis that Xist RNA itself was the functionally relevant molecule in XCI, and therefore represents a 17-kb long noncoding RNA (lncRNA). Interestingly, fly dosage compensation also depends on lncRNAs (i.e. roX1/2), which coat and upregulate the male X chromosome (Franke and Baker, 1999; Amrein and Axel, 1997; Meller et al., 1997).

Figure 1.1 Features of Xist

(A) Mouse Xist gene, showing location of Repeats A-F (above) and exons 1-7 (below).

(B) Xist RNA coats the X chromosome from which it is transcribed, as shown by RNA FISH in a differentiating mouse ES cell.

The Xic contains another prominent lncRNA transcribed antisense to Xist, aptly named “Tsix”. Tsix expression antagonizes Xist expression in cis (Lee et al., 1999a; Lee and Lu, 1999; Lee, 2000). Hence, perturbing Tsix leads to preferential Xist upregulation from the Tsix-mutant allele, skewing XCI choice (Ogawa et al., 2008). Additional Xic noncoding regulatory elements have since been described, such as Jpx, Xite, and Ftx, though their significance is not as well understood (Tian et al., 2010; Ogawa and Lee, 2003; Chureau et al., 2002; Chureau et al., 2011).

XCI as a paradigm of epigenetic regulation by RNA

Since its discovery, Xist has served as an archetype for studying lncRNA biology in relation to epigenetics. After coating the X chromosome, Xist RNA sets off an ordered cascade of epigenetic events that together establish and maintain gene silencing. This includes: exclusion of transcription machinery (Chaumeil et al., 2006), deacetylation of H3/H4 (Belyaev et al., 1996; Boggs et al., 1996; Jeppesen and Turner, 1993), deposition of H2AK119ub and H3K27me3 via recruitment of PRC1 and PRC2, respectively (de Napoles et al., 2004; Fang et al., 2004; Plath et al., 2003; Plath et al., 2004), incorporation of macroH2A (Costanzi and Pehrson, 1998; Mermoud et al., 1999), and DNA methylation at CpG islands (Liskay and Evans, 1980; Lock et al., 1986; Lock et al., 1987; Mohandas et al., 1981; Norris et al., 1991).

Importantly, stable silencing by Xist is highly dependent upon developmental timing. Using embryonic stem (ES) cells, which undergo XCI during differentiation, as an ex vivo model (Martin et al., 1978; Rastan and Robertson, 1985), it was demonstrated that Xist can only establish silencing if induced during an early differentiation time window (i.e. “establishment phase”); induction afterwards had no effect (Wutz and Jaenisch, 2000). Moreover, silencing was reversible during this time frame upon loss of Xist expression (Wutz and Jaenisch, 2000). However, prolonged Xist expression past this point eventually leads to stable silencing (i.e. “maintenance phase”), where subsequent epigenetic mechanisms persist even in the absence of Xist RNA (Brown and Willard, 1994; Csankovszki et al., 2001; Csankovszki et al., 1999). Hence, whereas Xist acts as the initial stimulus to trigger silencing, it is largely dispensable once silencing has been established, rendering XCI a bona fide epigenetic phenomenon. De novo establishment and maintenance therefore present two distinct biological contexts to study XCI.

Features and mechanisms of Xist RNA

The mechanisms by which Xist coats and silences an entire chromosome in cis have been the focus of intensive study. By coating the Xi, Xist is thought to act as an RNA scaffold for the stable recruitment of repressive factors. An early deletional analysis identified some of the first critical domains within Xist important for localization and silencing (Wutz et al., 2002). Because these functions could be uncoupled and were mediated by different parts of the RNA, a modular view of Xist was adopted. Interestingly, the functional regions corresponded to tandem repetitive elements (called “Repeats A-F”) found partially conserved across species (Figure 1.1A) (Nesterova et al., 2001). A current theory is that these motifs serve as trans factor binding sites that have expanded over evolution, thus allowing multimeric interaction and amplifying function (Brockdorff, 2018). A few such Xist-interacting proteins have been identified by candidate approaches (Jeon and Lee, 2011; Zhao et al., 2008; Hasegawa et al., 2010). Recently, unbiased proteomic (Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015) and genetic screens (Moindrot et al., 2015; Monfort et al., 2015) have uncovered additional Xist-binding proteins and related factors, whose roles in the various aspects of XCI are beginning to be deciphered (discussed below).

Xist spreading

At the onset of XCI, Xist is upregulated from the future Xi (Kay et al., 1993). It remains attached in cis and soon spreads across 160 Mb to coat the entire chromosome, forming an RNA cloud over the Xi territory (Brown et al., 1992; Clemson et al., 1996). To do so, several mechanisms are likely involved: (1) dissemination of Xist across large chromosomal distances in 2D and/or 3D space, and (2) interaction of Xist RNA with chromatin and/or the nuclear matrix. In flies, the dosage compensation machinery is targeted to the male X chromosome by specific sequence elements that make up “high affinity sites” (Alekseyenko et al., 2008; Straub et al., 2008). Accordingly, X-linked dosage compensation occurs even when the roX1/2 RNAs are provided in trans from an autosome (Meller and Rattner, 2002). This is not the case for mammalian XCI. Expression of Xist from an autosomal transgene causes Xist to coat and silence several megabases of the autosome near the insertion site (Herzing et al., 1997; Lee and Jaenisch, 1997; Wutz and Jaenisch, 2000). Because the extent of silencing is less than would normally occur on the X chromosome, there have been several hypotheses that some feature of the X lends itself to Xist-mediated silencing (Russell, 1963; Rastan, 1983). One idea is that LINE elements, which are disproportionately over-represented on the X, might serve as high-affinity sites similar to those in flies, but molecular evidence to support this hypothesis is currently lacking (Lyon, 1998).

Instead, 3D chromosomal structure seems to play a large role in initial Xist targeting. The earliest sites of Xist spreading correlate with regions that contact the Xist locus in 3D space, as determined by Hi-C experiments (Engreitz et al., 2013). Accordingly, relocating Xist to another site on the X chromosome via transgenic insertion similarly alters both 3D chromosomal interactions and early RNA spreading sites. These 3D-interacting/early spreading sites tend to be active, gene-rich regions (Simon et al., 2013), consistent with the fact that active regions of a chromosome (which include the Xist locus) spatially associate with other active regions (see later discussion of A/B compartments) (Lieberman-Aiden et al., 2009). Afterwards, Xist spreads into adjacent inactive, gene-poor regions.

While the above experiments may answer how Xist spans vast chromosomal distances, they do not address the interaction between Xist ribonucleoprotein complexes and Xi chromatin. One factor proposed for this role is the transcription factor YY1. YY1 was reported to tether nascent Xist RNA to the Xist DNA locus following transcription, a process referred to as “nucleation” (Jeon and Lee, 2011). This stemmed from observations that YY1 possesses both DNA- and RNA-binding capability, and can interact with Xist RNA’s Repeat C region in vitro and Xist DNA’s Repeat F region in vitro and in vivo. In support of this, antisense oligonucleotides targeting Repeat C cause loss of Xist clouds, possibly by displacing Xist from the Xi (Beletskii et al., 2001; Sarma et al., 2010). However, deleting Repeat C does not perturb Xist clouds (Wutz et al., 2002; Bousard et al., 2018; see Chapter 2), raising concern over whether the oligos instead cause Xist RNA degradation or act via off-target effects. Also, depleting YY1 or mutating the YY1 binding sites within Repeat F was found to impair Xist expression, suggesting YY1 may serve as a transcriptional activator of Xist (Makhlouf et al., 2014).

Another player in Xist localization is the nuclear matrix protein HNRNPU (also known as SAF-A). Depleting HNRNPU causes Xist to disperse throughout the nucleus (Hasegawa et al., 2010). HNRNPU has been claimed to directly interact with many regions along the length of Xist RNA (Hasegawa et al., 2010; Yamada et al., 2015; Cirillo et al., 2016). Consistent with this, Xist deletional analysis was unable to pinpoint a single region of the RNA responsible for localization, suggesting multiple redundant regions are involved (Wutz et al., 2002), though HNRNPU may not be the only trans factor responsible. HNRNPU was reported by one study to be Xi-enriched (Helbig and Fackelmayer, 2003), but this was not confirmed by others (Hasegawa et al., 2010). As was proposed for YY1, HNRNPU contains both DNA- and RNA-binding domains that may directly bridge Xist to chromatin. Alternatively, HNRNPU association with the nuclear matrix may restrict Xist localization within the Xi territory.

Xist-mediated gene silencing

Shortly after Xist spreading, transcription machinery is excluded from the Xi, H3/H4 are deacetylated, and gene silencing occurs (Belyaev et al., 1996; Boggs et al., 1996; Jeppesen and Turner, 1993; Chaumeil et al., 2006; Wutz and Jaenisch, 2000; Żylicz et al., 2019). Initial silencing activity of Xist was mapped to its Repeat A motif, a ~400-bp 5’ region consisting of 7.5-8.5 copies of a GC-rich element separated by A/T-rich spacers (Wutz et al., 2002). Extensive enzymatic and chemical probing analyses have yielded inconsistent results regarding Repeat A secondary structure (Smola et al., 2016; Duszczyk et al., 2011; Fang et al., 2015; Maenner et al., 2010). Recently, a technique for mapping RNA duplexes from live cells with near-basepair resolution proposed a novel Repeat A structure characterized by inter-repeat dimers formed by non-canonical basepairing interactions (Lu et al., 2016).

Xist proteomic (Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015) and genetic screens (Moindrot et al., 2015; Monfort et al., 2015) have begun to identify trans factors required for Xist-mediated gene silencing. One candidate recovered from all screens was the RNA-binding protein SPEN. Depleting SPEN impairs Xist silencing (Chu et al., 2015; McHugh et al., 2015; Moindrot et al., 2015; Monfort et al., 2015; Nesterova et al., 2018), phenocopying Repeat A deletion. Accordingly, direct interaction between SPEN and Repeat A has been demonstrated by several approaches in vivo and in vitro (Chen et al., 2016; Lu et al., 2016; Chu et al., 2015; Monfort et al., 2015), and Repeat A deletion abrogates SPEN binding (Chen et al., 2016; Chu et al., 2015). Although the mechanism by which SPEN may promote gene silencing is still unknown, one theory is that SPEN recruits NCOR2/SMRT to activate HDAC3, which in turn triggers H3/H4 deacetylation (McHugh et al., 2015; Żylicz et al., 2019).

While the above factors may account for initiation of gene silencing, the pathways involved in maintaining the silent state appear to be distinct. For instance, Polycomb complexes appear to be dispensable for initial silencing (Kalantry and Magnuson, 2006), but may play a role in its stabilization until additional mechanisms (including DNA methylation) solidify the silenced state during maintenance phase (Kalantry et al., 2006). Interestingly, pharmacological treatments that interfere with a single pathway (e.g. 5-azacytidine for blocking DNA methylation) only minimally reactivate the Xi (Csankovszki et al., 2001; Mohandas et al., 1981), implying multiple pathways must work in parallel or synergistically to maintain silencing.

XCI is nearly chromosome-wide; yet a few genes (3-15%, depending on species and tissue) escape inactivation (Balaton and Brown, 2016; Berletch et al., 2011). These “escapees” often have paralogs on the Y chromosome and thus are expected to require 2x dosage for normal development in both males and females. Improper dosage of escapees is believed to play a role in the pathogenesis of X aneuploidy diseases like Klinefelter (XXY) and Turner (XO) syndromes, which express a single Xa and retain otherwise normal X-linked dosage (Berletch et al., 2011).

Polycomb complexes and their function

Following gene silencing, Polycomb repressive complexes accumulate on the Xi and deposit heterochromatin marks across nearly the entire chromosome (Okamoto et al., 2004; Plath et al., 2003; de Napoles et al., 2004; Fang et al., 2004; Pinter et al., 2012). This may help stabilize gene silencing by altering chromatin structure. But before discussing the specific mechanisms of Xist-mediated Polycomb recruitment and its impact on XCI, let me first introduce the Polycomb complexes and their general function in gene regulation.


Polycomb Group proteins were first discovered in flies, where their mutation was found to cause homeotic transformations that shuffle the identity and features of body segments (Lewis, 1978; Jürgens, 1985; Duncan, 1982; Struhl and Akam, 1985). Since then, they have been recognized for their broad role in maintaining gene expression patterns throughout development and cell differentiation. Two major multi-subunit complexes, Polycomb repressive complexes 1 and 2 (PRC1 and PRC2), are perhaps the best characterized and are functionally conserved from flies to mammals. PRC1 and PRC2 deposit the histone modifications H2AK119ub and H3K27me3, respectively, which occur mostly over silent gene promoters (Schuettengruber et al., 2017). PRC2 can also mono- and dimethylate H3K27 in regions of the genome where the complex does not stably bind (Ferrari et al., 2014).

H2AK119ub and H3K27me3 have been shown to interfere with RNA Polymerase II function (Wang et al., 2004a; Stock et al., 2007). The histone marks and PRC1/2 complexes themselves were both proposed to prevent deposition of activating histone modifications, such as H3K4me3 (Pasini et al., 2008; Nakagawa et al., 2008) and H3K27 acetylation (H3K27ac)

(Pasini et al., 2010b; Tie et al., 2016)―the latter being mutually exclusive with H3K27me3.

Additional evidence suggests that PRC1 can mediate chromatin compaction at gene targets independently of its H2AK119ub catalytic activity (Pengelly et al., 2013; Pengelly et al., 2015;

Francis et al., 2004; Eskeland et al., 2010), perhaps through self-association of PRC1 complexes and bound chromatin domains (Isono et al., 2013; Kundu et al., 2017).

PRC1 is the most heterogeneous of the Polycomb complexes (Figure 1.2A). The only subunit common to all complexes is RING1A or RING1B, which is responsible for catalyzing

H2AK119ub (Wang et al., 2004b; de Napoles et al., 2004; McGinty et al., 2014). RING1A/B activity requires heterodimerization with any of six PCGF proteins (PCGF1-6) to form the core

E3 ubiquitin ligase (Elderkin et al., 2007; Cao et al., 2005). RING1A/B:PCGF1-6 can form two types of PRC1 complexes: “canonical” and “non-canonical”. Canonical PRC1 complexes are defined by having any of five CBX proteins (CBX2/4/6-8) (Gao et al., 2012; Hauri et al., 2016).


The ability of CBX proteins to bind methylated H3 (e.g. H3K27me3) through their chromodomain

(Bernstein et al., 2006) was suggested to guide recruitment of canonical PRC1 complexes to sites previously targeted by PRC2 (i.e. “canonical Polycomb recruitment pathway”, Figure 1.2C)

(Wang et al., 2004b). Canonical PRC1 complexes specifically contain PCGF2/4, and also include any of three PHC proteins (PHC1-3) and any of three SCM proteins (SCMH1/L1/L2).

Non-canonical PRC1 complexes are defined by having RYBP or YAF2 in place of CBX proteins

(Czypionka et al., 2007; Wang et al., 2010; Gao et al., 2012; Tavares et al., 2012) and can associate with a variety of additional factors. Due to the lack of a CBX subunit, non-canonical

PRC1 cannot bind H3K27me3 and would thus require an alternative means of recruitment that may precede PRC2 (i.e. “non-canonical Polycomb recruitment pathway”, Figure 1.2C).
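The subunit-based definitions above can be written as a small lookup. The following is a minimal Python sketch, not an established classifier: the function name `classify_prc1` and its return labels are illustrative assumptions, and only the subunits named in the text are encoded.

```python
# Subunit families as listed in the text (mammalian gene symbols).
RING = {"RING1A", "RING1B"}
PCGF = {f"PCGF{i}" for i in range(1, 7)}          # PCGF1-6
CBX = {"CBX2", "CBX4", "CBX6", "CBX7", "CBX8"}    # defines canonical PRC1
RYBP_YAF2 = {"RYBP", "YAF2"}                      # defines non-canonical PRC1


def classify_prc1(subunits):
    """Label a set of subunits as canonical or non-canonical PRC1.

    Hypothetical helper: the labels are illustrative only.
    """
    s = set(subunits)
    # Every PRC1 complex needs a RING1A/B:PCGF heterodimer as its core.
    if not (s & RING and s & PCGF):
        return "not a PRC1 core"
    if s & CBX:
        return "canonical"
    if s & RYBP_YAF2:
        return "non-canonical"
    return "unclassified"


print(classify_prc1({"RING1B", "PCGF4", "CBX7", "PHC1"}))
print(classify_prc1({"RING1A", "PCGF1", "RYBP", "KDM2B"}))
```

The sketch checks for the RING1A/B:PCGF core first, because the text defines that heterodimer as the only feature shared by all PRC1 complexes.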

PRC2 composition is less variable than that of PRC1 (Figure 1.2B). The core subunits of PRC2 are EZH1 or EZH2, EED, SUZ12, and RBBP4 or RBBP7 (Schuettengruber et al., 2017). The catalytic subunit is EZH1/2, whose SET domain is responsible for carrying out H3K27me1-3 modification (Ferrari et al., 2014). EED can bind H3K27me3 and enhance EZH1/2 catalytic activity, and has thus been suggested to play a role in spreading PRC2 from sites with pre-existing H3K27me3 to adjacent nucleosomes (Margueron et al., 2009). RBBP4/7 is thought to play a general role in chromatin-modifying complexes by mediating binding to H3/H4 (Nekrasov et al., 2005). The specific function of SUZ12 is still uncertain, but it appears necessary for proper targeting of PRC2 to CpG islands (Højfeldt et al., 2018). Besides the core PRC2 subunits, several accessory proteins have been identified, leading to the subcategorization of PRC2 into two variants: PRC2.1 and PRC2.2 (Alekseyenko et al., 2014; Grijzenhout et al., 2016; Hauri et al., 2016). These sub-stoichiometric components affect PRC2 recruitment to target sites (Casanova et al., 2011; Pasini et al., 2010a; Peng et al., 2009; Kim et al., 2009; Kalb et al., 2014; Beringer et al., 2016; Cooper et al., 2016) as well as modulate activity (Hunkapiller et al., 2012; Sarma et al., 2008; Li et al., 2011; Peng et al., 2009; Kalb et al., 2014; Grijzenhout et al., 2016). PRC2.1 includes any of three PCL1-3 proteins (PHF1, MTF2, PHF19), as well as C10ORF12 (LCOR) or C17ORF96 (EPOP). PRC2.2 includes the accessory subunits AEBP2 and JARID2, the latter of which has been shown to recognize H2AK119ub (Cooper et al., 2016).

Figure 1.2 Composition and interplay between PRC1 and PRC2

(Adapted from Schuettengruber et al., 2017)

(A) Composition of various PRC1 complexes in mammals.

(B) Composition of various PRC2 complexes in mammals.

(C) Proposed canonical (above) and non-canonical (below) Polycomb recruitment pathways.

Polycomb recruitment, spreading, and maintenance

Polycomb response elements (PREs) were identified as DNA sequences required for

Polycomb recruitment and repression of Hox genes in flies (Simon et al., 1993; Busturia et al.,

1997; Papp and Müller, 2006; Oktaba et al., 2008; Schuettengruber et al., 2009). However, none of the PRC1 or PRC2 core components possess DNA-binding activity, implying Polycomb recruitment to PREs is mediated by outside factors. Several DNA-binding proteins, most notably

Pleiohomeotic (Pho), are present at fly PREs and believed to work in a combinatorial fashion to recruit Polycomb (Fritsch et al., 1999; Brown et al., 1998; Erokhin et al., 2018).

The situation in mammals has proven more complex. The mammalian ortholog of Pho is

YY1, whose role in Polycomb recruitment does not seem to be conserved. Instead, mammalian

Polycomb targets correlate with CpG islands, which do not share any obvious DNA-binding protein motifs (Mendenhall et al., 2010). Targeting to CpG islands may be through recognition of specific chromatin features, such as gene silencing (Riising et al., 2014) and deacetylated

H3K27 (Morey et al., 2008; Reynolds et al., 2012; Schmitges et al., 2011).

A few Polycomb-associated factors have been shown to contribute toward core complex recruitment: JARID2, associated with PRC2.2 (da Rocha et al., 2014); PCL proteins (e.g. MTF2), associated with PRC2.1 (Li et al., 2017); and KDM2B, associated with one of the non-canonical PRC1 complexes (Farcas et al., 2012). JARID2 harbors a ubiquitin-interacting motif believed to participate in PRC2 recruitment through recognition of H2AK119ub (Cooper et al., 2016), though its binding affinity is relatively weak (Kd ~2.8 μM). KDM2B and PCL proteins directly bind unmethylated CpG islands (Li et al., 2017; Farcas et al., 2012), supporting the idea that CpG islands can act as mammalian PREs.

H2AK119ub and H3K27me3 often co-occupy regions of chromatin. In light of this observation, Polycomb complexes were thought to somehow recruit one another to reinforce gene repression. Early studies in flies and mammals noted that PRC1 enrichment at PREs was lost upon depletion of PRC2 (Wang et al., 2004b; Boyer et al., 2006; Cao et al., 2005). This led to a hierarchical model of Polycomb recruitment whereby PRC2 is recruited first and deposits

H3K27me3, which is then recognized by canonical PRC1 via its CBX subunit’s chromodomain, allowing recruitment and deposition of H2AK119ub (“canonical Polycomb recruitment pathway”).

However, results from later studies conflicted with this simplistic model. For example, depletion of H3K27me3 does not result in global changes of H2AK119ub levels (Tavares et al.,

2012). Furthermore, PRC1 was found to bind target sites in the absence of PRC2 (Kahn et al.,

2016; Hisada et al., 2012), while PRC2 instead required PRC1 for proper targeting to some sites (Kahn et al., 2016). Also, tethering PRC1 components to a heterochromatin region led to accumulation of PRC2 and H3K27me3 (Blackledge et al., 2014; Cooper et al., 2014). A possible mechanistic explanation for these findings was provided when PRC2 complexes containing

JARID2 were shown to have H2AK119ub-binding activity, which further stimulates PRC2 catalytic activity when bound (Kalb et al., 2014; Cooper et al., 2016). Meanwhile, the identification of non-canonical PRC1 complexes containing RYBP in place of CBX provided rationale for how PRC1 could be targeted independently of H3K27me3 (Tavares et al., 2012).

Together, these results led to a reversal of the hierarchical model for Polycomb recruitment

(“non-canonical Polycomb recruitment pathway”) (Comet and Helin, 2014; Blackledge et al.,

2015), whereby non-canonical PRC1 is recruited to target sites first and deposits H2AK119ub.

H2AK119ub can then be recognized by JARID2, leading to recruitment of PRC2 and deposition of H3K27me3. H3K27me3 can further recruit canonical CBX-containing PRC1 complexes.
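The two hierarchical models differ only in which complex is recruited first. The following is a toy Python sketch of that logic, assuming each complex is recruited as soon as the mark it reads is present; the names `WRITES`, `READS`, and `recruitment_order` are illustrative, and the model ignores kinetics, binding affinity, and any recruitment route not described above.

```python
# Mark deposited by each complex (from the text).
WRITES = {
    "PRC2": "H3K27me3",
    "canonical PRC1": "H2AK119ub",
    "non-canonical PRC1": "H2AK119ub",
}

# Mark each complex can recognize: canonical PRC1 via its CBX chromodomain,
# PRC2 via its JARID2 subunit. Non-canonical PRC1 lacks CBX, so in this toy
# model it can only enter as the initial seed.
READS = {
    "canonical PRC1": "H3K27me3",
    "PRC2": "H2AK119ub",
}


def recruitment_order(seed):
    """Return complexes in the order they are recruited, starting from seed."""
    recruited = [seed]
    marks = {WRITES[seed]}
    changed = True
    while changed:
        changed = False
        for complex_name, mark in READS.items():
            if complex_name not in recruited and mark in marks:
                recruited.append(complex_name)
                marks.add(WRITES[complex_name])
                changed = True
    return recruited


print(recruitment_order("PRC2"))                # canonical pathway
print(recruitment_order("non-canonical PRC1"))  # non-canonical pathway
```

Seeding with PRC2 reproduces the canonical order (PRC2, then canonical PRC1), while seeding with non-canonical PRC1 reproduces the reversed hierarchy (non-canonical PRC1, then PRC2, then canonical PRC1).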


Instead of de novo recruitment, reciprocal recognition of H2AK119ub and H3K27me3 by PRC2 and PRC1, respectively, may be important for the spreading of Polycomb domains along chromatin and/or their maintenance over cell division. A similar idea has been proposed for PRC2 self-propagation, given its ability to recognize its own H3K27me3 mark. PRC2 binding to H3K27me3 via its EED subunit is required for allosteric activation of EZH1/2 and deposition of H3K27me3 onto adjacent nucleosomes (Hansen et al., 2008; Margueron et al., 2009), a concept supported by cryo-EM structures of PRC2 simultaneously engaged with two nucleosomes (Poepsel et al., 2018). Likewise, the non-canonical PRC1 subunit RYBP was shown to bind H2AK119ub (Arrigoni et al., 2006), which may similarly contribute to PRC1 self-propagation. These properties likely explain the broad distribution of H2AK119ub/H3K27me3 marks across many kilobases of chromatin, rather than forming sharp peaks. Recent studies offer additional support for the marks in mediating Polycomb spreading while refuting their role in transgenerational Polycomb maintenance (Kahn et al., 2016; Højfeldt et al., 2018; Oksuz et al., 2018). Depleting H3K27me3 and then reintroducing PRC2 does not abolish or alter de novo H3K27me3 placement, arguing against pre-existing marks directing PRC2 recruitment. However, similar experiments removing both H2AK119ub and H3K27me3 have not been performed. In sum, PRC1 and PRC2 (along with their respective histone modifications) appear to constitute an elegant two-component system to mutually reinforce gene repression.

Xist-mediated Polycomb recruitment

During XCI, both PRC1 and PRC2 accumulate on Xi and deposit H2AK119ub and H3K27me3 such that both marks appear visibly enriched above other chromosomes by immunofluorescence (Okamoto et al., 2004; Plath et al., 2003; de Napoles et al., 2004; Fang et al., 2004). Polycomb recruitment to the Xi is unique compared to recruitment elsewhere in the genome in several ways. First, Xist RNA is necessary and sufficient to recruit PRC1 and PRC2 (Kohlmaier et al., 2004; Plath et al., 2003; Schoeftner et al., 2006; Silva et al., 2003; Zhao et al., 2008), providing an example of RNA-mediated targeting. With this as precedent, several other RNAs were reported to play a role in directing Polycomb and other epigenetic complexes to chromatin in cis or in trans (Zhao et al., 2010; Rinn et al., 2007; Davidovich et al., 2013; Kung et al., 2013), though further investigation is necessary to define whether this mode of RNA-guided targeting is an exception or a rule. Second, Polycomb targeting to the Xi is chromosome-wide (spanning both genic and intergenic regions) rather than CpG island/-specific (Pinter et al., 2012). These observations, combined with the fact that Xist and Polycomb complexes/marks colocalize by microscopy and genomic methods (Sunwoo et al., 2015; Simon et al., 2013; Engreitz et al., 2013), favor a direct role for Xist RNA in seeding Polycomb across the Xi.

Indeed, Polycomb targeting to Xi is not merely a consequence of gene silencing, since

Xist mutants lacking Repeat A still recruit both PRC1 and PRC2―at least to intergenic regions

(da Rocha et al., 2014; Kohlmaier et al., 2004; McHugh et al., 2015; Plath et al., 2003; Żylicz et al., 2019). Furthermore, loss of Xist RNA during the maintenance phase of XCI leads to complete loss of Polycomb marks from Xi, even though gene silencing remains intact (Nozawa et al., 2013; Zhang et al., 2007). Thus, Polycomb recruitment requires continual presence of Xist

RNA and acts through additional mechanisms distinct from those involved in gene silencing.

How Xist recruits PRC1 and PRC2 to the Xi and the order in which they are recruited are still in dispute. One model proposes that Xist first recruits PRC2 via direct interaction with Repeat A (Zhao et al., 2008), with canonical PRC1 recruited subsequently via H3K27me3 (i.e. canonical Polycomb recruitment pathway). In agreement with this model, Repeat A-PRC2 interaction has been demonstrated by in vivo RNA immunoprecipitations and in vitro binding assays (Zhao et al., 2008; Zhao et al., 2010; Cifuentes-Rojas et al., 2014). However, PRC2 was claimed by others to spuriously bind RNA in vitro with little to no specificity (Davidovich et al., 2013), though this claim is currently under reconsideration (Davidovich et al., 2015; Wang et al., 2017). Thus, the physiological significance of PRC2-RNA interaction requires further testing in vivo. Interestingly, deleting Repeat A does not fully abolish Polycomb enrichment on the Xi (da Rocha et al., 2014; Kohlmaier et al., 2004; McHugh et al., 2015; Plath et al., 2003), but does inhibit its spreading from intergenic regions into active gene bodies (Żylicz et al., 2019). One possible explanation is that active transcription presents a barrier to Polycomb spreading (Riising et al., 2014; Kaneko et al., 2014), and so Repeat A’s gene silencing activity may be a prerequisite for Polycomb to spread to these regions. Thus, additional Xist motifs outside of Repeat A likely contribute to Polycomb recruitment in bulk.

Another model, not necessarily mutually exclusive, proposes that Xist first recruits non-canonical PRC1 through a region encompassing Repeats F, B, and C, with PRC2 recruited subsequently via H2AK119ub, followed by canonical PRC1 via H3K27me3 (i.e. non-canonical Polycomb recruitment pathway) (da Rocha et al., 2014; Almeida et al., 2017). Consistent with this model, transgenic Xist RNA lacking Repeats F, B, and C is unable to form H2AK119ub/H3K27me3-enriched domains in cis (da Rocha et al., 2014; Almeida et al., 2017).

One recent Xist proteomic study recovered all 3 components of a non-canonical PRC1 complex but none specific to canonical PRC1 or PRC2 (Chu et al., 2015), supporting a closer link between non-canonical PRC1 and Xist. Furthermore, during de novo XCI, H2AK119ub is seen to accumulate on Xi prior to H3K27me3 (Żylicz et al., 2019), and depletion of non-canonical

PRC1 or H2AK119ub results in loss of 90% of H3K27me3 from Xi (Almeida et al., 2017).

Indeed, non-canonical PRC1 and H2AK119ub are still visible on Xi in ES cells lacking PRC2, but canonical PRC1 is lost (Tavares et al., 2012; Schoeftner et al., 2006). Contrary to this model, however, interaction between H2AK119ub and PRC2 through its JARID2 subunit is relatively weak (Kd ~2.8 μM) and JARID2 KO causes partial but not complete loss of

H3K27me3 on Xi (Cooper et al., 2016), supporting the notion that H2AK119ub-independent mechanisms for PRC2 recruitment also exist.
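To give a sense of scale for a Kd of ~2.8 μM, the fraction of sites occupied can be estimated under a simple 1:1 equilibrium binding model. The following Python sketch assumes that model and that the free concentration of the binding partner approximates its total concentration; the function name `fraction_bound` and the example concentrations are hypothetical and are not measurements from the cited studies.

```python
def fraction_bound(free_partner_uM, kd_uM=2.8):
    """Fractional occupancy for 1:1 binding: theta = [L] / ([L] + Kd).

    free_partner_uM: free concentration of the binding partner (micromolar).
    kd_uM: dissociation constant (micromolar); 2.8 is the value quoted in
    the text for JARID2 recognition of H2AK119ub.
    """
    if free_partner_uM < 0 or kd_uM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_partner_uM / (free_partner_uM + kd_uM)


# Hypothetical concentrations, chosen only to illustrate the curve.
for conc in (0.28, 2.8, 28.0):
    print(f"{conc:>5} uM -> {fraction_bound(conc):.2f} occupied")
```

Under these assumptions, half-maximal occupancy requires the binding partner to be present at a concentration equal to Kd, which is why a micromolar Kd is regarded as relatively weak.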


Unique chromosomal architecture of the Xi

Mammalian chromosomes adopt a highly ordered structure. Each chromosome occupies a spatial area within interphase nuclei referred to as its “chromosome territory”, observable by microscopy (Cremer et al., 1982). Despite being a linear molecule, chromosomes fold into complex 3D patterns thought to be related to their gene expression patterns. Recent advances in chromosome conformation capture technologies (e.g. 3C, 4C, 5C, Hi-C) have begun to unravel the various layers of chromosome topology. For instance, continuous regions of ~1 Mb in length fold into “topologically associating domains” (TADs) (Dixon et al., 2012; Nora et al.,

2012), perhaps grouping related genetic elements into a single domain. Formation of TADs requires architectural proteins such as CTCF and cohesin (Nora et al., 2017; Rao et al., 2017).

Discontinuous regions of chromosomes also interact with one another. In general, gene-rich, active regions of a chromosome associate with one another, as do gene-poor, inactive regions, forming two distinct compartments referred to as A (active) and B (inactive) compartments

(Lieberman-Aiden et al., 2009; Rao et al., 2014). Rather than through architectural proteins, it is believed that compartment formation is driven by self-association between similar types of chromatin, possibly through shared use of trans factors or by nature of having similar biophysical properties (Hnisz et al., 2017; Larson et al., 2017; Strom et al., 2017).

Compared to autosomes and the Xa, the Xi exhibits a unique chromosomal architecture, raising the possibility that the folding of the Xi may play a role in gene silencing, or vice-versa.

Microscopy studies describe the Xi as appearing more compact, spherical, and smooth-surfaced, while the Xa appears more extended, flat, and irregularly-shaped (Eils et al., 1996).

Super-resolution microscopy has revealed interphase chromosomes to be sponge-like or cavernous, with a network of channels permeating the packed chromatin. Whereas the Xa is porous, likely allowing for exchange of trans factors and active transcription, the Xi channels are more collapsed (Smeets et al., 2014). Investigation of the structure-function relationship between chromosome architecture and gene activity remains an intensive area of research.


One of the most striking structural features of the Xi is its division into two large megadomains separated by the tandem repeat locus Dxz4, visible by both microscopy and Hi-C (Giorgetti et al., 2016). Because this repeat is heavily bound by CTCF, it is assumed that CTCF-mediated insulation is responsible for maintaining this boundary (Darrow et al., 2016). Though megadomain function, if any, remains to be seen, it seems dispensable for XCI. Deleting Dxz4 abolishes megadomains while leaving gene silencing intact (Bonora et al., 2018; Darrow et al., 2016; Froberg et al., 2018; Giorgetti et al., 2016), and although one study linked Dxz4 to escapee gene status (Giorgetti et al., 2016), this claim has been refuted by others (Froberg et al., 2018; Darrow et al., 2016).

Another interesting structural feature of the Xi is its apparent lack of A/B compartments

(Giorgetti et al., 2016; Splinter et al., 2011). Given that A/B compartments correspond to active/inactive regions, the lack of active transcription on Xi may account for this observation.

However, a recent study demonstrated a role for SMCHD1, an architectural protein recruited to the Xi in an Xist-dependent manner, in merging Xi compartments into a final seemingly compartmentless structure (Wang et al., 2018). This function seems to parallel that in humans, where SMCHD1 merges H3K27me3 and H3K9me3 domains on the Xi (Nozawa et al., 2013).

The merging of Xi compartments by SMCHD1 also provides an explanation for earlier observations on the pattern of Xist spreading (Engreitz et al., 2013; Simon et al., 2013): Xist first spreads to A compartments (i.e. other active, gene-rich regions that interact with the Xist locus in 3D space), SMCHD1 is recruited and merges Xi compartments, then Xist can spread into former B compartments (i.e. inactive, gene-poor regions). Importantly, this example illustrates that, while chromosome architecture may guide initial Xist targeting, Xist-related factors can alter chromosome structure to facilitate subsequent spreading.

A third unique structural feature of the Xi is its loss of TADs; though not completely absent, TADs are largely attenuated across the chromosome (Wang et al., 2018). This is probably linked to the concurrent loss of CTCF and cohesin (two factors critical for TAD formation) from the Xi (Berletch et al., 2015; Calabrese et al., 2012; Minajigi et al., 2015). The reason for this is uncertain, though it was suggested that CTCF and cohesin are repelled by Xist or excluded from the Xi compartment (Minajigi et al., 2015). Other possible mechanisms include the Xi’s DNA hypermethylation (a modification known to inhibit CTCF binding (Bell and Felsenfeld, 2000)) or its lack of active transcription (a process associated with TADs (Kagey et al., 2010)). A body of recent work has also implicated SMCHD1 in promoting long-range chromosomal interactions while disrupting short-range ones (e.g. TADs) (Wang et al., 2018; Jansz et al., 2018; Gdula et al., 2019).

Xist-mediated changes to chromosomal architecture

Xi architectural reconfiguration is closely linked to XCI, with Xist RNA playing a central role. Microscopy studies have shown that Xist induces the formation of a repressive nuclear compartment not permissive to transcription machinery, which X-linked genes (excluding escapees) get drawn into and silenced (Chaumeil et al., 2006; Namekawa et al., 2010). Repeat

A is required to form this compartment (Chaumeil et al., 2006; Engreitz et al., 2013), demonstrating a link between Xist’s gene silencing activity and its ability to coat and/or shape the Xi. Xist-mediated recruitment of trans factors including Polycomb and SMCHD1 may explain some of the changes to chromosomal architecture. As mentioned earlier, PRC1 has been implicated in chromatin compaction and long-range interactions via self-association (Isono et al.,

2013; Kundu et al., 2017). Meanwhile, SMCHD1 was implicated in merging Xi chromosomal compartments to facilitate the spreading of Xist RNA, heterochromatin, and gene silencing

(Wang et al., 2018; Jansz et al., 2018; Gdula et al., 2019). Xist coating may also repel certain architectural factors (e.g. CTCF, cohesin) from the Xi, with consequential changes to chromosome topology (Minajigi et al., 2015). Hi-C studies have shown that Xist is sufficient to drive initial formation of megadomains (Giorgetti et al., 2016), and continues to play a role in shaping Xi structure even during maintenance of XCI. Removing Xist after XCI has already been established causes the Xi to revert to a more Xa-like state, with restoration of select TADs (Splinter et al., 2011; Minajigi et al., 2015). However, megadomains persist even after Xist deletion (Minajigi et al., 2015), and original A/B compartments do not get restored (Wang et al., in revision). Thus, Xist’s effect on Xi architectural reconfiguration is only partially reversible.

Preface

The work presented in this thesis addresses several questions regarding mechanisms of

Xist localization, spreading, and silencing along the Xi. I mapped key Xist functional domains through a series of tiled deletions using CRISPR/Cas9. Aside from those affecting Xist splicing or transcription/stability, I identified two regions whose deletion led to novel phenotypes characterized by diffuse Xist cloud morphologies.

In Chapter 2, I focus on the role of Repeat B. First, high-resolution mapping of Xist occupancy along the Xi revealed Repeat B is necessary for proper Xist spreading and/or chromatin association. Second, Repeat B is needed for continual recruitment of both PRC1 and

PRC2 to Xi during XCI maintenance; failure to do so results in depletion of H2AK119ub and

H3K27me3 marks from Xi chromatin, respectively. Surprisingly, the Xist cloud and Polycomb phenotypes were found to be causally linked, as loss of either PRC1 or PRC2 is sufficient to induce aberrant Xist clouds. Thus, Xist and Polycomb complexes are mutually dependent in coating the Xi. By screening for Repeat B RNA-binding proteins, I identified HNRNPK as a direct interactor required to mediate Repeat B’s downstream functions. I then investigated the interplay between PRC1 and PRC2 recruitment to Xi and found that, although each can be recruited to some extent without the other, the two reinforce one another’s recruitment and are needed together for robust Xi occupancy. Finally, deleting Repeat B during de novo XCI establishment causes failure of Xi gene silencing, accompanied by incomplete reconfiguration of chromosomal architecture from an active to inactive structure.


In Chapter 3, I focus on the role of Repeat E. Deleting Repeat E causes Xist to detach from the Xi territory and diffuse throughout the nucleus. As a consequence of Xist delocalization, H3K27me3 enrichment on Xi is significantly weakened. Repeat E was found to interact with the nuclear matrix protein CIZ1, which localizes to the Xi in an Xist Repeat E-dependent manner. Ablating CIZ1 phenocopies deletion of Repeat E in causing Xist delocalization and H3K27me3 reduction. Interestingly, the mechanism by which CIZ1 functions in restricting Xist to the Xi territory is distinct from that of HNRNPU, another nuclear matrix factor involved in Xist localization. CIZ1 and HNRNPU interact with Xist independently of one another, though both are required for proper Xist localization.

In Chapter 4, I discuss the broader significance and implications of my findings within the context of the field, as well as state the next logical steps to be taken. I conclude by proposing revised models for Xist spreading, Polycomb recruitment, and gene silencing during establishment and maintenance of XCI.


Chapter 2

Xist deletional analysis reveals an interdependency between Xist RNA and Polycomb complexes for spreading along the inactive X

Attribution

Hongjae Sunwoo contributed equally to work in this chapter, including characterization of several Xist mutant, RING1A/B KO, and EED KO clones by standard and super-resolution microscopy. He also participated in ChIP-seq, CHART-seq, and RNA-seq experiments and related bioinformatic analyses. Chen-Yu Wang previously optimized the Xist CHART protocol with assistance from Hsueh-Ping Chu and Hyun Jung Oh, and performed related bioinformatic analyses. Andrea Kriz performed and analyzed Hi-C experiments. Content from this chapter has been published in Colognori et al., 2019.


Summary

During X-inactivation, Xist RNA spreads along an entire chromosome to establish silencing. However, the mechanism and functional RNA elements involved in spreading remain undefined. By performing a comprehensive endogenous Xist deletion screen, we identify Repeat B as crucial for spreading Xist and maintaining Polycomb repressive complexes 1 and 2 (PRC1/PRC2) along the inactive X (Xi). Unexpectedly, spreading of these three factors is inextricably linked. Deleting Repeat B or its direct binding partner, HNRNPK, compromises recruitment of PRC1 and PRC2. In turn, ablating PRC1 or PRC2 impairs Xist spreading. Therefore, Xist and Polycomb complexes require each other to propagate along the Xi, suggesting a feedback mechanism between RNA initiator and protein effectors. Perturbing Xist/Polycomb spreading causes failure of de novo Xi silencing with partial compensatory downregulation of the active X, and also disrupts Xi topological reconfiguration. Thus, Repeat B is a multifunctional element that integrates interdependent Xist/Polycomb spreading, gene silencing, and changes in chromosome architecture.

Introduction

X-chromosome inactivation (XCI) has served as an archetype for studying epigenetics for decades (Starmer and Magnuson, 2009; Lee, 2011; Disteche, 2016; Mira-Bontenbal and

Gribnau, 2016). During XCI, the 17-kb noncoding RNA Xist spreads exclusively in cis along the future inactive X (Xi) and induces conversion to a heterochromatic state (Brown et al., 1992;

Clemson et al., 1996; Marahrens et al., 1997). The functions of Xist are manifold. On one hand,

Xist acts as a modular RNA scaffold in the assembly of repressive protein factors (Schoeftner et al., 2006; Zhao et al., 2008; Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015;

Moindrot et al., 2015; Monfort et al., 2015). Two well-known factors, Polycomb repressive complexes 1 and 2 (PRC1 and PRC2), are responsible for monoubiquitylating histone H2A at lysine 119 (H2AK119ub) and trimethylating histone H3 at lysine 27 (H3K27me3), respectively.


On the other hand, Xist forms a repressive compartment by repelling transcriptional and architectural factors to establish a unique Xi chromosomal conformation (Nora et al., 2012; Rao et al., 2014; Deng et al., 2015; Minajigi et al., 2015; Giorgetti et al., 2016).

Although broad functions have been associated with Xist, specific mechanisms have not been clarified. In particular, how Xist RNA spreads in cis remains an open question. Recent work has demonstrated the importance of nuclear matrix factors in restricting Xist to the Xi territory (Hasegawa et al., 2010; Ridings-Figueroa et al., 2017; Sunwoo et al., 2017). On the Xi itself, Xist spreads from its site of transcription to nearby contacts in 3D space, preferentially targeting regions enriched for active genes before spreading to less active and gene-poor regions (Engreitz et al., 2013; Simon et al., 2013). However, the mechanisms by which Xist associates with and spreads along chromatin remain unknown. How Xist recruits PRC1 and

PRC2, as well as the order in which they are recruited (Zhao et al., 2008; da Rocha et al., 2014;

Almeida et al., 2017), also requires further investigation.

Lacking in the field is a comprehensive map of Xist’s functional elements. While several essential domains have been identified, often corresponding to conserved repetitive motifs (“Repeats A-F”) (Wutz et al., 2002; Zhao et al., 2008; Hoki et al., 2009; Jeon and Lee, 2011; Ridings-Figueroa et al., 2017; Sunwoo et al., 2017; Yue et al., 2017), these together account for <20% of Xist’s total sequence. Prior to CRISPR/Cas9 technology (Ran et al., 2013), genetic dissection at the endogenous locus proved challenging due to inefficiency of homologous targeting, compounded by Xist’s large size and purportedly redundant regions. Previous analyses have relied heavily on the use of Xist transgenes and ectopic insertions, often in male cells (Lee et al., 1999; Wutz et al., 2002; Jeon and Lee, 2011; Pintacuda et al., 2017), with the caveat that these non-physiological perturbations might not inform Xist function in the endogenous context.

Here, we carry out a systematic deletional analysis of endogenous Xist and identify a specific RNA motif, Repeat B, that is required for RNA spreading and Polycomb targeting. In doing so, we make the surprising discovery that Xist and Polycomb complexes depend on each other to spread across the Xi.

Results

Comprehensive deletional analysis of native Xist in female cells

To perform a systematic CRISPR/Cas9 deletion screen, guide RNA (gRNA) pairs were designed to remove consecutive 1-2 kb regions across the Xist locus in female mouse embryonic fibroblasts (MEFs), where XCI has already been established (Figure 2.1A). The immortalized MEFs were tetraploid (with genome duplication after XCI) and carried two Xi’s and two Xa’s within the same nucleus (Yildirim et al., 2011), enabling isolation of Xi+/- clones (deletion on only one Xi) and Xi-/- clones (deletion on both Xi’s) (Figure 2.1B). Xi+/- cells provided an internal control Xist cloud within the same nucleus for comparative microscopy, while Xi-/- cells provided a homogeneous system for genomic experiments. We screened mutants by two-color RNA FISH (with cyan probes external and red probes internal to each deletion) and selected clones exhibiting cyan with no overlapping red signal (Figure 2.1B). All deletions were validated by Sanger sequencing (Table S1).
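The two-color screening logic described above can be written out as a small sketch. This is a minimal illustration only, assuming each nucleus is scored for exactly two Xist clouds (one per Xi in these tetraploid cells) and that each cloud is recorded as a pair of booleans for cyan (external probe) and red (internal probe) signal. The function names and the "Xi+/+" label for undeleted clones are hypothetical and were not part of the original screen.

```python
def classify_cloud(cyan: bool, red: bool) -> str:
    """Call one Xist cloud from its two-color FISH signal.

    cyan: signal from probes external to the deleted region
    red:  signal from probes internal to the deleted region
    """
    if not cyan:
        return "no_cloud"  # no Xist detected, cannot be scored
    return "intact" if red else "deleted"


def classify_nucleus(clouds):
    """Call the genotype of a tetraploid nucleus from its two Xi clouds.

    clouds: list of (cyan, red) boolean pairs, one per Xist cloud.
    Returns "Xi+/+", "Xi+/-", "Xi-/-", or "unscorable".
    """
    calls = [classify_cloud(cyan, red) for cyan, red in clouds]
    if len(calls) != 2 or "no_cloud" in calls:
        return "unscorable"
    n_deleted = calls.count("deleted")
    return {0: "Xi+/+", 1: "Xi+/-", 2: "Xi-/-"}[n_deleted]


# Hypothetical scoring of three nuclei
print(classify_nucleus([(True, True), (True, False)]))
print(classify_nucleus([(True, False), (True, False)]))
print(classify_nucleus([(True, True), (True, True)]))
```

In practice, calls like these would still need confirmation by sequencing, as was done for the deletions reported here.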

We began with a visual inspection of Xist cloud morphology by RNA FISH. Of the 13 deletions, 7 exhibited some phenotype. While Repeat A is known for its role in gene silencing (Wutz et al., 2002; Zhao et al., 2008), its deletion has been reported to cause decreased accumulation and/or loss of Xist expression in both human and mouse cells (Chow et al., 2007; Zhao et al., 2008; Hoki et al., 2009). A minimal Repeat A deletion allowed us to derive clones with an intact Xist cloud and overall RNA level (Figure 2.1C, D). However, further characterization revealed aberrant splicing (Royce-Tolland et al., 2010), which we found occurred through de-suppression of a cryptic splice donor (Figure 2.2A-E). Because this resulted in simultaneous skipping of the majority of exon 1 in ~50% of transcripts, we did not pursue the ∆RepA clones further. Likewise, exon 7a or 7d deletions caused skipping of adjacent


Figure 2.1 CRISPR/Cas9 deletion screen identifies Xist functional domains

(A) Diagram of Xist locus, repeat elements, gRNA target sites, and qPCR amplicons.

(B) Schematic of screening method using tandem two-color RNA FISH.

(C) Xist RNA FISH for deletions showing altered Xist cloud morphology in Xi+/- MEFs.

Arrowhead indicates WT and arrow indicates mutant Xist cloud. Right panels show 3x zoom-in of each cloud.

(D) RT-qPCR showing effect of deletions on Xist RNA levels in Xi-/- MEFs. Error bars show standard deviation for 3 biological replicates.

(E) 3D STORM imaging and size measurements of Xist clouds in ∆RepB and ∆RepE Xi+/- MEFs. Epifluorescent images of same cells shown to the right, with arrowhead indicating WT and arrow indicating mutant Xist cloud. p-values by two-tailed t-test.

(F) H3K27me3 and H2AK119ub IF for deletions showing phenotype in Xi+/- MEFs. Arrowhead indicates WT and arrow indicates mutant Xist cloud.


Figure 2.2 Xist deletions affecting splicing

(A) Xist RNA FISH in ∆RepA Xi+/- MEFs showing unequal fluorescence intensity between mutant and WT Xist clouds using Repeat F or B probes, but not exon 7 probe. Arrowhead indicates WT and arrow indicates mutant Xist cloud.

(B) Quantification of (A). Fluorescence intensity of mutant and WT Xist clouds using Repeat F or A probes was normalized to that using exon 7 probe. Repeat A displays expected intensity of 1 or 0 corresponding to WT or ∆RepA cloud, respectively. Repeat F displays unexpectedly low intensity of ~0.6 rather than 1 for the ∆RepA cloud, indicating a significant fraction of Xist RNA lacking Repeat A also lacks Repeat F.

(C) RT-qPCR in ∆RepA Xi-/- MEFs showing underrepresentation of exon 1 regions compared to exon 7. Error bars show standard deviation for 3 biological replicates. Location of qPCR amplicons shown in (E) (black bars).

(D) RT-PCR showing presence of abnormally short Xist RNA species in ∆RepA but not WT cells (expected WT size is ~12 kb). Location of primers shown in (E) (grey arrows).

(E) Diagram of aberrant cryptic splice isoforms determined by Sanger sequencing of PCR bands in (D). Cryptic splice donor/acceptor sites shown in bold.

(F) Xist RNA FISH in ∆Ex7d Xi+/- MEFs showing unequal fluorescence intensity between mutant and WT Xist clouds using Ex7a-d probes, but not exon 1 probe. This pattern is similar to Xist’s minor splice isoform. Arrowhead indicates WT and arrow indicates mutant Xist cloud.


exon 7 regions in most transcripts, exhibiting a pattern similar to Xist’s minor splice isoform (Figure 2.2F).
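The probe-intensity normalization summarized in the legend of Figure 2.2B can be illustrated with a short calculation. This is a sketch under stated assumptions: the exact formula used for the figure is not given in the text, so the ratio-of-ratios below (test probe over exon 7 probe, mutant cloud over WT cloud) is one plausible reading, and the intensity values are invented for illustration.

```python
def normalized_ratio(mut_probe, mut_ref, wt_probe, wt_ref):
    """Relative probe signal of the mutant cloud versus the WT cloud.

    Each cloud's test-probe intensity is first divided by its exon 7
    (reference) intensity, which corrects for differences in total Xist
    amount; the mutant value is then expressed relative to WT.
    """
    return (mut_probe / mut_ref) / (wt_probe / wt_ref)


# Hypothetical intensities (arbitrary units) for a Repeat F probe
ratio_f = normalized_ratio(mut_probe=300.0, mut_ref=1000.0,
                           wt_probe=500.0, wt_ref=1000.0)

# Under this simple reading, a ratio below 1 estimates the fraction of
# mutant transcripts that lack the probed region.
fraction_lacking = 1.0 - ratio_f

print(round(ratio_f, 2), round(fraction_lacking, 2))
```

With a normalized Repeat F signal of about 0.6, this reading suggests that roughly 40% of ∆RepA transcripts also lack Repeat F, in the same range as the ~50% estimate of exon 1 skipping given in the text.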

Deleting the region containing Repeat F caused loss or significant weakening of Xist clouds (Figure 2.1C), consistent with other reports (Jeon and Lee, 2011; Makhlouf et al., 2014). RT-qPCR in ∆RepF Xi-/- cells confirmed a reduction in Xist RNA level (Figure 2.1D), which could be due to loss of expression (Makhlouf et al., 2014), RNA stability, and/or proper “nucleation” (Jeon and Lee, 2011). Deletion of the internal exons 2-6 also yielded a weaker cloud and decreased RNA levels (Figure 2.1C, D), presumably by affecting splicing efficiency and/or transcript stability.

Intriguingly, deleting Repeat B- or E-containing regions produced Xist clouds with aberrant morphologies. While ∆RepE caused widespread dispersal of Xist throughout the nucleus (Ridings-Figueroa et al., 2017; Sunwoo et al., 2017; Yue et al., 2017), ∆RepB caused Xist clouds to appear more diffuse yet remain localized in the vicinity of the Xi (Figure 2.1C). Morphological aberrations were accentuated using single-molecule super-resolution imaging by 3D stochastic optical reconstruction microscopy (3D STORM) (Figure 2.1E). ∆RepB’s diffuse Xist cloud was not due to changes in Xist RNA level (Figure 2.1D), nor to failure to recruit the nuclear matrix protein CIZ1 as was the case for ∆RepE (Figure 2.3A, see also Chapter 3) (Ridings-Figueroa et al., 2017; Sunwoo et al., 2017). Thus, ∆RepB perturbs a distinct mechanism of Xist RNA localization. To pinpoint the specific element responsible for ∆RepB’s phenotype, we generated smaller internal deletions (Figure 2.3B). The ~300 bp subdeletion of Repeat B itself (∆RepBd) fully recapitulated aberrant Xist clouds, whereas the other subdeletions did not. These data show that the conserved repetitive GCCCC(A/T) motif is critical for proper Xist localization.

Next, we investigated whether any deletions might affect Xist-dependent chromatin modifications enriched on Xi. Specifically, we performed immunofluorescence (IF) for H2AK119ub and H3K27me3 marks, mediated by PRC1 and PRC2, respectively (Figures 2.1F,


Figure 2.3 The Repeat B motif affects Xist localization in a CIZ1-independent manner

(A) IF showing loss of CIZ1 on mutant Xist RNA in ∆RepE but not ∆RepB Xi+/- MEFs. Arrowhead indicates WT and arrow indicates mutant Xist cloud.

(B) Diagram and Xist RNA FISH of RepB subdeletions. Location of gRNA target sites and Repeat B element indicated.


Figure 2.4 H3K27me3/H2AK119ub IF and Xist RNA FISH for deletions without phenotype


2.4). The ∆RepA Xi showed a reduction in both marks, which could be caused by loss of Repeat A and/or other exon 1 regions due to the confounding splicing defect. Similarly, ∆RepF and ∆Ex2-6 showed decreased H2AK119ub and H3K27me3 Xi enrichment, presumably due to the decrease in Xist RNA levels. Strikingly, this was not the case for ∆RepB. Although Xist still partially covered the mutant Xi, there was complete loss of H2AK119ub and H3K27me3. ∆RepB contrasted sharply with ∆RepE, which caused only moderate loss of these marks despite having an even stronger impact on Xist localization (Figure 2.1C, E). Therefore, Xist delocalization cannot completely account for the absence of H2AK119ub/H3K27me3 on the ∆RepB Xi. Henceforth, we focus on ∆RepB for its unique effect on both Xist localization and Xi chromatin modifications.

Genomic mapping reveals defects in Xist spreading and Polycomb recruitment

∆RepB’s aberrant cloud morphology suggested a defect in RNA spreading. To visualize Xist binding sites on chromatin, we performed Capture Hybridization Analysis of RNA Targets (CHART-seq) in ∆RepB Xi-/- cells. Since the mutant MEFs were derived from F1 hybrid cells in which the Xi’s are of Mus musculus (mus) origin and Xa’s of Mus castaneus (cas) origin, SNPs allowed for allelic analysis of sequencing results (Yildirim et al., 2011). Antisense oligos to Xist (avoiding our deleted area) efficiently captured Xist RNA and associated Xi chromatin (Figure 2.5A). Whereas WT cells displayed an expected pattern of dense chromosome-wide Xist coverage, ∆RepB Xi-/- cells showed reduced coverage across the entire Xi relative to WT. Importantly, Xist occupancy was most strongly affected at regions more distant from its site of synthesis, particularly at centromeric and telomeric ends, arguing for inefficient spreading (Figure 2.5A, ∆ track). Plots of Xist density indicated significantly diminished coverage over genes normally subject to XCI as well as over non-expressed genes (Figure 2.5B). Examination of specific genes confirmed a reduction of Xist occupancy across gene bodies and surrounding intergenic regions (Figure 2.5C). Xist coverage was reduced to the point of being


Figure 2.5 Repeat B is required for Xist RNA spreading and Polycomb maintenance

(A) Xist CHART-seq in WT and ∆RepB Xi-/- MEFs. Change in Xist coverage (∆) is shown below as log2(∆RepB/WT). Xist locus is indicated and unmappable regions are masked.

(B) Box plots showing Xist coverage over genes subject to XCI, non-expressed, and escapees in WT and ∆RepB Xi-/- cells. p-values by Wilcoxon rank sum test.

(C) Zoom-in of CHART-seq tracks for representative regions.

(D) Allele-specific ChIP-seq for H3K27me3 and H2AK119ub in ∆RepB Xi-/- MEFs. Composite (comp) of all reads as well as allelic (Xi and Xa) tracks shown. Xist locus is indicated and unmappable regions are masked.

(E) Box plots showing H3K27me3 and H2AK119ub coverage over genes subject to XCI, non-expressed, and escapees in WT and ∆RepB Xi-/- MEFs. p-values by Wilcoxon rank sum test.

(F) Zoom-in of CHART-seq and ChIP-seq tracks for representative regions.

(G) Fluorescence microscopy showing loss of GFP-RYBP (noncanonical PRC1) or EZH2 (PRC2) from mutant Xi in ∆RepB Xi+/- MEFs. Arrowhead indicates WT and arrow indicates mutant Xist cloud.


indistinguishable from that over escapee genes, which normally are not subject to XCI and have low-level Xist coverage (Figure 2.5B, C, Kdm5c). Similar depletion was not seen at the Xist locus (Figure 2.5C, Xist), where the nascent transcript is naturally tethered to chromatin. Collectively, these data demonstrate that Repeat B is necessary for proper spreading of Xist across Xi.
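The ∆ track in Figure 2.5A is defined as log2(∆RepB/WT). A minimal sketch of that calculation over binned coverage is given below. The bin values and the pseudocount are assumptions added for illustration (a pseudocount is a common way to avoid division by zero in empty bins); the text does not state how empty bins were handled.

```python
import math


def log2_ratio_track(mutant, wt, pseudocount=1.0):
    """Per-bin log2(mutant / WT) coverage ratio.

    mutant, wt: equal-length sequences of normalized coverage per bin.
    pseudocount: added to both values so that empty bins do not produce
    a division by zero (an assumption, not taken from the source).
    Negative values indicate reduced coverage in the mutant.
    """
    if len(mutant) != len(wt):
        raise ValueError("tracks must have the same number of bins")
    return [math.log2((m + pseudocount) / (w + pseudocount))
            for m, w in zip(mutant, wt)]


# Hypothetical coverage in four bins along the X chromosome
wt_cov = [7.0, 15.0, 31.0, 0.0]
mut_cov = [3.0, 7.0, 31.0, 0.0]
print(log2_ratio_track(mut_cov, wt_cov))
```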

We then examined ∆RepB’s effect on Xi chromatin modifications by performing allele-specific ChIP-seq in ∆RepB Xi-/- cells. Consistent with IF, there was near-complete loss of both H2AK119ub and H3K27me3 on a chromosome-wide scale, specific to the Xi (Figure 2.5D). Depletion was uniform across the chromosome (Figure 2.5D-F), in contrast to the non-uniform spreading defect of Xist RNA as visualized by CHART-seq. This distinction supports the idea that H2AK119ub/H3K27me3 loss is not merely a secondary effect of aberrant Xist spreading. Rather, we found it was due to failure of Xist to recruit PRC1/2, as shown by loss of EZH2 (the catalytic component of PRC2) and RYBP (a component of noncanonical PRC1) from the mutant Xi (Figure 2.5G). While this work was in progress, another study using Xist transgenes reported a similar result (Pintacuda et al., 2017). Thus, Repeat B is essential for continual recruitment of Polycomb complexes during maintenance of XCI.

Xist spreading and Polycomb recruitment are mediated by HNRNPK

To identify trans factors involved in Repeat B function, we performed in vitro pulldown experiments using aptamer-tagged Repeat B RNA (Figure 2.6A-C). Proteins significantly enriched over aptamer-only and antisense controls were identified by LC/MS (Figure 2.6D, Table S2). The top candidate in all replicates was HNRNPK, a nuclear poly(C) RNA-binding protein involved in RNA processing, stability, and transport (Bomsztyk et al., 2004). Other candidates included the related HNRNPE1-3 (PCBP1-3) proteins. HNRNPK has been implicated in Xist-binding as well as interaction with PRC1 and PRC2 (Denisenko and Bomsztyk, 1997; Chu et al., 2015; Cirillo et al., 2016; Pintacuda et al., 2017). However, being a


Figure 2.6 Identification of HNRNPK as a Repeat B-interacting protein

(A) In vitro RNA pulldown scheme.

(B) Diagram of optimized 4xS1m aptamer tag used for in vitro RNA pulldown. The following features are highlighted: streptavidin-binding domain (blue), unique stem (green), poly(A) spacer (red). Inclusion of poly(A) spacer and unique stem sequences led to a single structural prediction by Mfold (http://unafold.rna.albany.edu). These modifications promote intra- over inter-aptamer folding between the four aptamer units.

(C) Denaturing gel demonstrating efficient binding and elution of RNA from streptavidin beads.

(D) Silver staining of SDS-PAGE following in vitro RNA pulldown. HNRNPK band is indicated.

(E) IF showing HNRNPK’s tight association with Xi. Two antibodies recognizing non-overlapping epitopes were used: Antibody 1 (Bethyl Laboratories, A300-674A) recognizes the N-terminal 50 amino acids; Antibody 2 (Proteintech, 11426-1-AP) recognizes the C-terminal 350 amino acids.

(F) HNRNPK IF in cells fixed with or without pre-extraction. Without pre-extraction, HNRNPK exhibits nucleoplasmic staining; with pre-extraction, much of the soluble protein is removed, resulting in Xi-enriched staining pattern. Xi enrichment is not an artifact of the extraction/fixation method, as HNRNPE1-3 do not appear Xi-enriched under the same conditions (G).

(H) IF showing Repeat B-dependent association of HNRNPK with Xi in pre-extracted MEFs.

Arrowhead indicates WT and arrow indicates mutant Xist cloud.

(I) IF showing HNRNPK recruitment to an ectopic full-length or exon 1 Xist transgene (arrow). Over-expressed transgenic RNA outcompetes endogenous Xist (arrowheads) for HNRNPK-binding in female cells.

(J) Coomassie staining of SDS-PAGE demonstrating purity of recombinant His-HNRNPK.

(K) RNA EMSA showing recombinant HNRNPK directly binds a Repeat B RNA fragment in vitro. Binding is specific to WT but not mutated (mut) RNA sequence, and can be outcompeted by excess unlabeled WT RNA.


ubiquitous RNA-binding protein, it was unclear how HNRNPK might be specifically and functionally relevant to XCI. Here, we validated HNRNPK as a bona fide Xist-interacting protein in vivo. IF using two different antibodies demonstrated clear enrichment on Xi (Figure 2.6E). Notably, enrichment of HNRNPK (but not HNRNPE1-3) only became visible upon pre-extraction of soluble protein prior to cell fixation (Figure 2.6F, G), implying its tighter association with the Xi compartment.

Significantly, IF in ∆RepB Xi+/- cells revealed loss of HNRNPK from the mutant Xi (Figure 2.6H), demonstrating Xist Repeat B is necessary for HNRNPK’s Xi-association. We then ectopically expressed Xist (full-length or exon 1) via transgene (Jeon and Lee, 2011) and found this was sufficient to stabilize HNRNPK on an autosome as well (Figure 2.6I). To test whether Repeat B-HNRNPK interaction is direct, we performed electrophoretic mobility shift assay (EMSA) using recombinant protein and a synthetic RNA fragment (Figure 2.6J, K). Indeed, we observed formation of multiple higher molecular weight species with increasing HNRNPK concentrations, indicative of >1 protein:RNA stoichiometry. Taken together, these data support a direct Repeat B-HNRNPK interaction in vitro and in vivo.

We next asked whether HNRNPK is functionally relevant to Repeat B’s roles in Xist spreading and Polycomb recruitment in post-XCI cells. Attempts to generate stable HNRNPK knockout (KO) clones using CRISPR/Cas9 were unsuccessful, consistent with HNRNPK being essential in metazoans (Bomsztyk et al., 2004; Gallardo et al., 2015). However, we were able to temporarily ablate HNRNPK among a mixed cell population, and found deficiency of HNRNPK recapitulated ∆RepB phenotypes (Figure 2.7A). To investigate the kinetics of these events, we performed siRNA knockdown (KD) of HNRNPK across a 6-day timecourse (Figure 2.7B). Complete loss of H2AK119ub from the Xi was evident by day 2, although at this time, Xist clouds still appeared morphologically normal. Disrupted clouds did not fully manifest until day 4 and were not due to decreased expression of Xist or of genes known to affect Xist localization (Figure 2.7C). Reduction of H3K27me3 enrichment over the Xi did not occur until day 6. No


Figure 2.7 Direct Repeat B-HNRNPK interaction is required for Xist spreading and Polycomb maintenance

(A) Xist RNA FISH and H3K27me3/H2AK119ub IF in heterogeneous population of MEFs expressing varying levels of HNRNPK. Phenotype correlates with HNRNPK levels. Cells lacking HNRNPK are outlined; cells with minimal HNRNPK are marked by asterisks. Inset shows 2x zoom-in and increased contrast of Xist clouds indicated by arrows. For easier visualization of HNRNPK-positive/negative cells, pre-extraction was omitted (hence the nucleoplasmic staining).

(B) Xist RNA FISH and H3K27me3/H2AK119ub IF in MEFs following scramble or HNRNPK KD for 2, 4, and 6 days. Inset shows increased contrast of boxed region.

(C) RNA levels of Xist and genes known to affect Xist localization in HNRNPK KD (day 6) compared to scramble KD MEFs. Error bars show standard deviation for 3 biological replicates.

(D) Western blot showing HNRNPK depletion does not affect global H3K27me3/H2AK119ub levels. GAPDH serves as loading control.

(E) 3D STORM imaging and size measurements of Xist clouds after scramble or HNRNPK KD. p-values by two-tailed t-test.


change in global H3K27me3 or H2AK119ub was observed by Western analysis at days 2, 4, or 6 (Figure 2.7D), suggesting HNRNPK’s role in Polycomb maintenance is Xi-specific. Super-resolution imaging confirmed the diffuse Xist clouds in HNRNPK KD cells to be highly similar to those in ∆RepB cells (Figure 2.7E, compare to 2.1E). Collectively, these data demonstrate the importance of HNRNPK in mediating Xist spreading and Polycomb targeting across the Xi.

Interdependent spreading of Xist RNA and Polycomb complexes

We wondered whether the Xist spreading defect might be a consequence of Polycomb loss. To test this, we generated RING1A/RING1B (the catalytic components of PRC1) double-KO or EED (a core component of PRC2) KO MEFs and verified total loss of H2AK119ub or H3K27me3, respectively (Figure 2.8A, Table S1). In both KO cell lines, we observed surprising delocalization of Xist RNA reminiscent of ∆RepB and HNRNPK KD/KO phenotypes, confirmed by super-resolution imaging (Figure 2.8B, compare to 2.1E, 2.7E). This was true in additional RING1A/B and EED KO clones (Figure 2.9A, B, Table S1), and was not an indirect effect of PRC1/2 depletion on Xist RNA level or on genes known to affect Xist localization (Figure 2.8C). Importantly, HNRNPK’s association with Xist was unaffected, indicating that PRC1/2’s effect on cloud morphology occurs downstream of Repeat B-HNRNPK interaction (Figure 2.8D).

Because Polycomb is known to play a role in chromatin compaction (Boettiger et al., 2016; Kundu et al., 2018), we asked whether diffuse Xist clouds could be a byproduct of Xi decompaction, rather than a true defect in Xist spreading. Sequential Xist RNA FISH and X-chromosome painting showed Xist clouds often exceeding the boundary of the underlying Xi territory in RING1A/B and EED KO cells, and on the ∆RepB but not WT Xi in the same nucleus of ∆RepB Xi+/- cells (Figure 2.9C). This observation is consistent with decreased accumulation of Xist on chromatin as seen by CHART-seq. Size measurements of WT Xi and Xa DNA territories revealed a difference of only ~20%, similar to a previous analysis in mouse


Figure 2.8 Xist and Polycomb complexes depend on each other to spread across the Xi

(A) Western blot confirming total loss of H2AK119ub and H3K27me3 in RING1A/B and EED KO MEFs, respectively.

(B) RNA FISH, 3D STORM imaging, and size measurements of Xist clouds in WT, RING1A/B KO, and EED KO MEFs. p-values by two-tailed t-test.

(C) RT-qPCR showing no change in Xist levels or genes known to affect Xist localization in RING1A/B and EED KO MEFs. Error bars show standard deviation for 3 biological replicates.

(D) IF showing HNRNPK association with Xist is unaffected in RING1A/B and EED KO MEFs.

(E) Xist CHART-seq in RING1A/B and EED KO MEFs. Change in Xist coverage relative to WT (∆) is shown below as log2(KO/WT). Xist locus is indicated and unmappable regions are masked.

(F) Box plots showing Xist coverage over genes subject to XCI, non-expressed, and escapees in WT, ∆RepB Xi-/-, RING1A/B KO, and EED KO MEFs. p-values by Wilcoxon rank sum test.

(G) Zoom-in of CHART-seq tracks for representative regions.


cells (Giorgetti et al., 2016). This is considerably smaller than the 2-4 fold difference between WT and diffuse Xist RNA clouds in ∆RepB Xi+/-, HNRNPK KD, RING1A/B KO, and EED KO cells (Figures 2.1E, 2.7E, 2.8B). Moreover, we were unable to detect any significant shift in Xi:Xa size ratio in ∆RepB Xi+/-, RING1A/B KO, and EED KO cells (Figure 2.9C). These data suggest that the loss of Repeat B/Polycomb in post-XCI cells is not sufficient to reverse Xi compaction at this scale.

To investigate the Xist localization defect further, we performed RNA/DNA FISH on mitotic chromosomes from ∆RepB Xi+/-, RING1A/B KO, and EED KO cells. In this state, all chromosomes are condensed and WT Xist RNA often remained associated with the Xi in cis (Figure 2.9D). However, ∆RepB Xist within the same cell was barely detectable or completely absent from its corresponding Xi. Similar Xist dissociation was seen in RING1A/B and EED KO cells. These analyses indicate that a difference in Xist’s ability to interact with chromatin, rather than Xi decompaction, likely accounts for the cloud dispersal phenotype.

To corroborate these findings at the molecular level, we performed CHART-seq using RING1A/B and EED KO cells. In agreement with RNA FISH analysis, Xist binding was reduced across the Xi relative to WT in both cell lines (Figure 2.8E). Interestingly, EED KO showed a stronger effect than RING1A/B KO, indicating that spreading of Xist RNA may depend on PRC2/H3K27me3 more than on PRC1/H2AK119ub. Overall coverage profiles mirrored those of ∆RepB Xi-/- CHART-seq (compare to Figure 2.5A), exhibiting particularly diminished Xist binding at chromosome extremities. Plots of Xist density again showed low coverage over genes normally subject to XCI and non-expressed genes (Figure 2.8F). Examination of specific genes confirmed a reduction of Xist occupancy across gene bodies and surrounding intergenic regions, but not at the Xist locus itself or escapees (Figure 2.8G). Depletion of PRC1 or PRC2 therefore phenocopies deletion of Repeat B. Taken together, our data reveal the surprising conclusion that, while Xist RNA recruits Polycomb complexes to the Xi, PRC1 and PRC2 are in turn required to properly spread Xist.
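The box-plot comparisons of Xist coverage between genotypes report p-values from the Wilcoxon rank sum test. For readers unfamiliar with it, the sketch below shows how such a test can be computed using only the Python standard library. It uses the normal approximation with tie correction and no continuity correction, which is an assumption: the software and exact settings used for the figures are not stated in the text, and the coverage values are invented.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum (Mann-Whitney U) test.

    Uses the normal approximation with a tie correction and no
    continuity correction. Returns (U statistic for x, z-score, p-value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = w - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical, no evidence of a shift
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, z, p


# Hypothetical per-gene Xist coverage values for two genotypes
wt_coverage = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]
ko_coverage = [2.2, 3.1, 2.8, 3.5, 2.0, 2.9]
print(rank_sum_test(ko_coverage, wt_coverage))
```

The normal approximation is reasonable for the hundreds of genes compared in the figures; for very small samples an exact test would be preferable.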


Figure 2.9 Diffuse Xist cloud morphology is not due to large-scale decompaction of Xi

(A) Western blot confirming total loss of H2AK119ub and H3K27me3 in additional RING1A/B and EED KO MEF clones, respectively.

(B) Xist RNA FISH and H3K27me3/H2AK119ub IF in additional RING1A/B and EED KO MEF clones.

(C) Xist RNA FISH/X-chromosome painting in ∆RepB Xi+/-, RING1A/B KO, and EED KO MEFs.

Arrowhead indicates WT and arrow indicates ∆RepB Xi. Area occupied by diffuse Xist cloud is outlined, showing Xist particles often extend beyond the X chromosome territory on ∆RepB Xi and in KOs. RNA/DNA images were taken sequentially and then aligned by Hoechst signal.

Size measurements of X chromosome territories normalized to Xi within each cell. p-values by two-tailed t-test.

(D) Xist RNA FISH/X-chromosome painting on ∆RepB Xi+/-, RING1A/B KO, and EED KO mitotic chromosomes. Arrowhead indicates WT and arrow indicates ∆RepB Xi. Mitotic retention of Xist on Xi is impaired on ∆RepB Xi and in KOs. RNA/DNA images were taken sequentially and then aligned by Hoechst signal.

(E) IF for H3K27me3 in RING1A/B KO and H2AK119ub in EED KO MEFs was performed in the absence of any other Xi marker, showing residual enrichment on Xi is not due to channel bleed-through.

(F) IF for panH2A and panH3 in WT MEFs showing no visible Xi enrichment.


Independent and interdependent recruitment of Polycomb complexes

An ongoing debate in the XCI field regards the order in which PRC1 and PRC2 are recruited to chromatin (Schoeftner et al., 2006; Zhao et al., 2008; Margueron and Reinberg, 2011; Simon and Kingston, 2013; Cooper et al., 2016; Almeida et al., 2017; Pintacuda et al., 2017). Given our KO cell lines, we examined reciprocal effects of depleting PRC1 and PRC2.

As shown by IF, KO of EED expectedly caused total loss of H3K27me3, but also reduced H2AK119ub enrichment on the Xi (Figures 2.9B, 2.10A). This effect is consistent with the canonical Polycomb pathway whereby H3K27me3 facilitates PRC1 recruitment (Cao et al., 2002; Margueron and Reinberg, 2011; Simon and Kingston, 2013). Similarly, RING1A/B KO caused total loss of H2AK119ub, but also significantly reduced H3K27me3 enrichment on the Xi. This effect is consistent with the non-canonical Polycomb recruitment pathway whereby H2AK119ub facilitates PRC2 recruitment (Tavares et al., 2012; Kalb et al., 2014; Cooper et al., 2016). Xist delocalization (due to Polycomb loss) may also partly contribute toward the overall reduction of H2AK119ub/H3K27me3 in each KO cell. If so, this would suggest that the interdependency between Polycomb complexes on Xi may, in fact, partially stem from their role in helping Xist to spread.

Nevertheless, in both EED and RING1A/B KO cell lines, a notable fraction of cells retained H2AK119ub or H3K27me3 Xi enrichment, respectively (Figures 2.9B, 2.10A), suggesting PRC1 and PRC2 can be recruited independently of one another to a degree. This was unexpected given a recent model in which PRC2 recruitment to the Xi was proposed to strictly depend on prior H2AK119ub modification by non-canonical PRC1 (Cooper et al., 2016; Almeida et al., 2017; Pintacuda et al., 2017). We ruled out the possibility that residual H2AK119ub/H3K27me3 Xi enrichment could be due to channel bleed-through from concurrent Xist RNA FISH, as IF alone showed identical results (Figure 2.9E). To see whether the more compact state of the Xi could cause H2AK119ub or H3K27me3 to appear enriched above neighboring chromatin, we performed IF using panH2A or panH3 antibodies, respectively, in WT


Figure 2.10 Independent and interdependent recruitment of PRC1 and PRC2 to the Xi

(A) Xist RNA FISH and H3K27me3/H2AK119ub IF in WT, RING1A/B KO, and EED KO MEFs.

(B) Allele-specific ChIP-seq for H3K27me3 and H2AK119ub in RING1A/B and EED KO MEFs, respectively. WT tracks included for comparison. Composite (comp) of all reads as well as allelic (Xi and Xa) tracks shown. Xist locus is indicated and unmappable regions are masked.

(C) Box plots showing H3K27me3 and H2AK119ub coverage over genes subject to XCI, non-expressed, and escapees in WT, ∆RepB Xi-/-, RING1A/B KO, and EED KO MEFs. p-values by Wilcoxon rank sum test.

(D) Zoom-in of ChIP-seq tracks for representative region.


cells and saw no such enrichment (Figure 2.9F). Taken together, perturbing either PRC1 or PRC2 weakened the other’s modification of Xi chromatin, though PRC1’s effect on H3K27me3 was stronger than PRC2’s on H2AK119ub. Yet at the same time, the two can function independently on Xi to an extent, since many cells retained weak H2AK119ub or H3K27me3 enrichment when PRC2 or PRC1 was depleted, respectively.

To confirm these results with another approach, we performed ChIP-seq for H3K27me3 in RING1A/B KO and H2AK119ub in EED KO cells. In agreement with our IF observations, EED KO caused significant reduction of H2AK119ub on Xi relative to WT, and RING1A/B KO caused even stronger reduction of H3K27me3 (Figure 2.10B). These effects were chromosome-wide. Nevertheless, H2AK119ub/H3K27me3 levels remained slightly higher on the Xi than on the Xa in PRC2/1 KO cells, respectively, and higher than on the Xi in ∆RepB Xi-/- cells, which showed near-complete loss of both marks (Figure 2.10B-D, compare to Figure 2.5D-F). This residual enrichment was particularly noticeable over genes subject to XCI (Figure 2.10C, D, Pak3) but less so over non-expressed genes (Figure 2.10C, D, Dcx), which often already contain H2AK119ub/H3K27me3 on Xa. Together, these data support both independent and interdependent mechanisms of PRC1/2 recruitment during maintenance of XCI.

Failure of Xi gene silencing with partial compensatory downregulation of Xa

To determine the role of Repeat B in maintenance of gene silencing, we performed RNA-seq in ∆RepB Xi-/- MEFs. No major changes occurred in X-linked gene expression (Figure 2.11A, B), despite chromosome-wide depletion of H2AK119ub and H3K27me3 (Figure 2.5D). These findings are consistent with gene silencing being generally stable in post-XCI cells, independent of Xist RNA (Brown and Willard, 1994). To determine if Repeat B is required for de novo silencing, we generated the same deletion in female mouse embryonic stem (ES) cells (Figure 2.11C, Table S1), which undergo XCI as they differentiate. Importantly, our ES cell line is a mus/cas hybrid that selectively inactivates the mus X chromosome (Figure 2.11D)


Figure 2.11 Repeat B is required for complete Xist-mediated gene silencing

(A) Scatterplots depicting allele-specific expression in ∆RepB Xi-/- versus WT MEFs. p-values by Wilcoxon rank sum test.

(B) Box plots showing no change in the fraction of mus reads for X-linked genes in ∆RepB Xi-/- versus WT cells. p-values by two-tailed t-test.

(C) Diagram and Southern blot showing Repeat B deletion in ES cells. Black arrows indicate NcoI restriction sites; green arrowheads indicate gRNA target sites; red bar indicates Southern probe position.

(D) Allelic bias of Xist expression in this cell background (Ogawa et al., 2008), confirming mus reads for X-linked genes are indeed Xi-derived.

(E) H2AK119ub/H3K27me3 IF and Xist RNA FISH in ∆RepB and WT ES cells.

(F) CDF plots and (G) scatterplots depicting allele-specific expression in ∆RepB (clone 1) versus WT day 14 ES cells. p-values by Wilcoxon rank sum test.

(H) Box plots showing increase in the fraction of mus reads for X-linked genes in ∆RepB (clone 1) versus WT day 14 ES cells. p-values by two-tailed t-test.

(I) Fraction of mus reads for genes with respect to chromosomal position in ∆RepB (clone 1) versus WT day 14 ES cells.

(J) Zoom-in of RNA-seq tracks for representative genes.

(K-O) Same as (F-J) but for ∆RepB (clone 2) versus WT day 14 ES cells.

57

Figure 2.11 Repeat B is required for complete Xist-mediated gene silencing (continued)

58

Figure 2.11 Repeat B is required for complete Xist-mediated gene silencing (continued)

59

(Ogawa et al., 2008), facilitating downstream allelic analysis. When differentiated for 14 days, the ∆RepB ES cells recapitulated the defective Xist cloud and H2AK119ub/H3K27me3 phenotypes seen in MEFs (Figure 2.11E). Thus, Repeat B is also required for Xist spreading and Polycomb targeting during de novo establishment of XCI.

Transcriptomic analysis at day 14 of differentiation showed an upregulation of X-linked genes in ∆RepB cells compared to WT, whereas no significant change was observed for autosomes such as chromosome 13 (Figure 2.11F, G, “comp”). Examination of individual alleles showed a drastic increase specific to Xi, as evidenced by a rightward shift in cumulative distribution function (CDF) plots and an upward deviation in scatterplots (Figure 2.11F, G, “mus”). The fraction of reads from Xi shifted from ~5% (monoallelic) in WT to ~35% (biallelic) in ∆RepB ES cells (Figure 2.11H). This percentage remained at the expected 50% (biallelic) for reads from chromosome 13 in both WT and ∆RepB cells. Genes that failed to be silenced showed no obvious clustering along the chromosome (Figure 2.11I). Inspection of individual genes confirmed the presence of reads from the Xi in ∆RepB but not WT cells (Figure 2.11J).
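The "fraction of mus reads" metric used above can be illustrated with a minimal Python sketch. This is not the pipeline used in this study: the gene names, read counts, and the `min_total` coverage cutoff are hypothetical values chosen only to show how a per-gene allelic fraction and an empirical CDF might be computed.

```python
from statistics import median

def mus_fraction(mus_reads, cas_reads, min_total=10):
    """Return mus / (mus + cas), or None if allelic coverage is below min_total.

    min_total is an assumed cutoff for illustration, not a value from this study.
    """
    total = mus_reads + cas_reads
    if total < min_total:
        return None
    return mus_reads / total

def fractions(counts):
    """Map gene -> mus fraction, skipping genes with too few allelic reads."""
    out = {}
    for gene, (mus, cas) in counts.items():
        frac = mus_fraction(mus, cas)
        if frac is not None:
            out[gene] = frac
    return out

def ecdf(values):
    """Empirical CDF as a list of (value, cumulative fraction) pairs."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]

# Hypothetical allelic read counts per X-linked gene: (mus, cas)
wt_counts = {"GeneA": (5, 95), "GeneB": (4, 96), "GeneC": (2, 3)}
mutant_counts = {"GeneA": (35, 65), "GeneB": (30, 70), "GeneC": (1, 2)}

wt_frac = fractions(wt_counts)
mutant_frac = fractions(mutant_counts)
print("median mus fraction, WT:", median(wt_frac.values()))
print("median mus fraction, mutant:", median(mutant_frac.values()))
```

A fraction near 0 corresponds to monoallelic expression from the cas allele (Xa), whereas a fraction near 0.5 corresponds to biallelic expression.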

Interestingly, Xi upregulation was accompanied by a slight but significant and reproducible downregulation of Xa (Figure 2.11F, G, “cas”), suggesting partial Xa compensation for failed Xi silencing. This offers one possible explanation for the continued survival of ∆RepB cells despite undergoing incomplete XCI. Similar results were obtained using a second independently derived ∆RepB clone (Figure 2.11K-O). Thus, Repeat B is essential during differentiation for establishing chromosome-wide gene silencing on the Xi; when this silencing is compromised, it may be partially offset by downregulation of the Xa.

Deleting Repeat B impairs Xi architectural reconfiguration

XCI is accompanied by architectural reconfiguration of the Xi, characterized by weakening of topologically associating domains (TADs) and formation of megadomains (Nora et al., 2012; Rao et al., 2014; Deng et al., 2015; Minajigi et al., 2015; Giorgetti et al., 2016). We wondered how deleting Repeat B might broadly affect Xi chromosome structure. To address this, we performed allele-specific in situ Hi-C in ∆RepB ES cells after 14 days of differentiation.

Contact heatmaps at 100-kb resolution showed that, as expected, the WT Xi folded into two megadomains (Figure 2.12A, upper left panel, “WT”). Interestingly, megadomains were also present on ∆RepB Xi (Figure 2.12A, upper left panel, “∆RepB”), despite the significant defect in XCI establishment (Figure 2.11E-O). Thus, while megadomain architecture is not required for Xi gene silencing (Darrow et al., 2016; Giorgetti et al., 2016; Froberg et al., 2018), our data show it is neither sufficient for, nor a consequence of, gene silencing. In the Pearson correlation map for interaction frequencies, the two megadomains appeared even more pronounced on the Xi in ∆RepB relative to WT (Figure 2.12A, lower left panel, compare “WT” and “∆RepB”). Consistent with this observation, interaction frequency between the two megadomains on Xi decreased from 15% of total chromosome interactions (WT) to 9% (∆RepB), whereas the Xa remained unchanged (9%). ∆RepB Xi-/- MEFs showed the same trend (Figure 2.12F), with a decrease from 17% (WT) to 12% (∆RepB) on Xi and no change on Xa (12%), suggesting that Repeat B might play a role in promoting long-range chromosomal interactions spanning the megadomain border.
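As a rough illustration of the inter-megadomain interaction frequency reported above, the following Python sketch computes the share of contacts in a symmetric Hi-C matrix that span a given boundary bin. It is a simplified sketch under stated assumptions, not the calculation performed in this study: the function name, the toy matrix, and the choice to count the upper triangle including the diagonal are all hypothetical.

```python
def inter_domain_fraction(matrix, boundary_bin):
    """Fraction of contacts that connect bins on opposite sides of boundary_bin.

    matrix: symmetric list-of-lists of contact counts for one chromosome.
    boundary_bin: index of the first bin of the second (distal) domain.
    Each bin pair is counted once (upper triangle, diagonal included);
    this counting convention is an assumption made for illustration.
    """
    n = len(matrix)
    total = 0.0
    spanning = 0.0
    for i in range(n):
        for j in range(i, n):
            count = matrix[i][j]
            total += count
            if i < boundary_bin <= j:
                spanning += count
    return spanning / total if total else 0.0

# Toy 4-bin chromosome with two domains (bins 0-1 and bins 2-3)
toy = [
    [10, 5, 1, 1],
    [5, 10, 1, 1],
    [1, 1, 10, 5],
    [1, 1, 5, 10],
]
print(round(inter_domain_fraction(toy, boundary_bin=2), 3))
```

A lower value indicates fewer contacts crossing the boundary relative to all contacts on the chromosome, which is the direction of change described for the ∆RepB Xi.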

Next, we examined Xi TAD organization. Consistent with a recent study from our lab

(Wang et al., 2018), TADs were still present on Xi but in a significantly attenuated state relative to Xa (Figure 2.12B, C). However, TADs were visibly less attenuated on ∆RepB Xi compared to

WT. To quantify this, we calculated “insulation scores” at 100-kb resolution across WT Xa and used these to call TADs. Consistent with prior studies (Giorgetti et al., 2016), ~110 TADs were identified (~90 after removal of regions with low SNP density). We then assigned a metric to each, with a greater “TAD score” signifying greater TAD strength (see “Materials and Methods”).

Averaged across the chromosome, TADs were stronger on the ∆RepB Xi compared to WT in day 14 ES cells (Figure 2.12B), with 31 individual TADs being significantly stronger (8 were weaker) (Figure 2.12C). We did not detect any significant correlation between strength of these TADs and corresponding gene expression (Figure 2.12D), in line with our earlier observation that genes no longer silenced in ∆RepB ES cells do not cluster along the chromosome (Figure 2.11I, N). Allele-specific in situ Hi-C in ∆RepB Xi-/- MEFs revealed similar, albeit subtler, impacts of deleting Repeat B on maintenance of Xi chromosome structure (Figure 2.12F-H), despite no significant change in gene expression (Figure 2.11A, B). In particular, 18 TADs were stronger (7 were weaker) on the ∆RepB Xi compared to WT (Figure 2.12G, H), similar to our previous observation in MEFs harboring a full-length Xist deletion (Minajigi et al., 2015). Collectively, these results suggest that Repeat B is at least partly responsible for Xist’s role in disrupting TADs during both establishment and maintenance of XCI. Furthermore, they demonstrate that TADs and megadomains can co-exist on the X chromosome.

Figure 2.12 Repeat B is required for proper Xa to Xi architectural reconfiguration

(A) Hi-C interaction maps at 500-kb resolution for WT (above diagonal) versus ∆RepB (below diagonal) Xi (left) and Xa (right) in day 14 ES cells. Corresponding Pearson correlation maps at 200-kb resolution shown below. Arrows indicate regions of ∆RepB Xi exhibiting Xa-like features.

(B) Average TAD scores for Xi and Xa in WT and ∆RepB day 14 ES cells. Chromosome 13 is shown for comparison. p-values by Wilcoxon rank sum test.

(C) Individual TAD scores for Xi and Xa in WT (blue) and ∆RepB (pink) ES cells. TADs showing significant difference are indicated above by a black dot. Zoom-in of regions with corresponding Hi-C interaction maps shown at 100-kb resolution.

(D) Change in fraction of mus reads for each X-linked gene compared to change in TAD score of its residing TAD on Xi in ∆RepB versus WT day 14 ES cells. The dashed line indicates upper threshold for significant change in TAD score.

(E) First principal component (PC1) for Xa and Xi in WT and ∆RepB day 14 ES cells shows A/B compartments and megadomains, respectively. PC1 for Xi in WT day 7 ES cells (Wang et al., 2018) shows S1/S2 compartments for comparison with our Pearson matrix of WT Xi from (A).

(F-H) Same as (A-C) but for ∆RepB versus WT MEFs.

(I) Model: Interdependency in the spreading of Xist RNA, PRC1, and PRC2. Xist recruits PRC1/2 to the Xi. HNRNPK is required for this process through its direct interaction with Repeat B. PRC1/2 reinforce each other’s recruitment and spread along chromatin via canonical and non-canonical Polycomb pathways. Polycomb spreading in turn aids Xist binding along chromatin, perhaps through Xist interaction with Polycomb complexes, modification of histones, and/or changes in local chromatin structure.

(J) Schematic of the Repeat B pathway, showing interdependency between Xist and Polycomb complexes. Dotted arrow indicates parallel Repeat A-mediated Polycomb recruitment pathway, as reported previously (Zhao et al., 2008).
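The idea of calling TADs from "insulation scores", mentioned in the TAD analysis above, can be sketched generically in Python. The exact procedure and the "TAD score" metric used in this study are defined in the Materials and Methods and are not reproduced here; the sketch below is a common sliding-window formulation, and the window size, function names, and toy matrix are assumptions made purely for illustration.

```python
def insulation_scores(matrix, window):
    """Mean contact count in a window x window square crossing the left edge of each bin.

    For bin i, the square covers rows [i - window, i) and columns [i, i + window).
    Bins too close to either chromosome end get None.
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window + 1):
        total = 0.0
        for a in range(i - window, i):
            for b in range(i, i + window):
                total += matrix[a][b]
        scores[i] = total / (window * window)
    return scores

def call_boundaries(scores):
    """Return bins whose score is a strict local minimum among defined neighbours."""
    boundaries = []
    for i in range(1, len(scores) - 1):
        left, centre, right = scores[i - 1], scores[i], scores[i + 1]
        if left is None or centre is None or right is None:
            continue
        if centre < left and centre < right:
            boundaries.append(i)
    return boundaries

# Toy 12-bin map with two domains (bins 0-5 and 6-11): strong contacts within
# a domain (10) and weak contacts between domains (1).
n_bins = 12
toy = [[10 if (i < 6) == (j < 6) else 1 for j in range(n_bins)] for i in range(n_bins)]

scores = insulation_scores(toy, window=2)
print(call_boundaries(scores))
```

Low insulation scores mark positions that few contacts cross, so local minima are candidate domain boundaries; stronger domains produce deeper minima at their edges.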

In addition to TADs, mammalian chromosomes are partitioned into alternating A/B compartments, formed through self-association of active gene-rich (A) and inactive gene-poor (B) regions (Lieberman-Aiden et al., 2009; Bickmore and van Steensel, 2013; Rao et al., 2014; Bonev and Cavalli, 2016). During formation of the Xi, A/B compartments are partially merged to create larger “S1/S2” transitional compartments, which are merged further to yield the final Xi-specific macro-structure (Wang et al., 2018). To examine potential effects of deleting Repeat B on compartmentalization, we again examined Pearson correlation maps of our Hi-C interaction matrices. As expected, the Xa displayed a checkerboard pattern characteristic of A/B compartmentalization (Figure 2.12A, lower right panel; E). Meanwhile, the WT Xi displayed a partial checkerboard pattern resembling S1/S2 compartmentalization (Figure 2.12A, lower left panel, “WT”; E). However, our principal component analysis only detected megadomains, perhaps due to the late differentiation timepoint of our dataset (S1/S2 compartmentalization was originally reported using day 7 ES cells). The partial checkerboard pattern was absent on the ∆RepB Xi, which instead retained Xa-like features in some regions (Figure 2.12A, arrows). Taken together, our data show that Repeat B plays a critical role in transforming the X from an active to inactive chromosomal structure.


Discussion

Here, we have performed the first comprehensive deletional analysis of Xist in the native context and found that Repeat B plays a critical role in propagating Xist RNA and Polycomb complexes along the Xi. Unexpectedly, our analyses revealed a profound interdependency between Xist RNA and Polycomb complexes during the process of spreading. These findings shift our view of the relationship between Xist and Polycomb from one of unidirectional to bidirectional recruitment. Furthermore, our study shows that the dynamics of Polycomb recruitment are more complex than previously suggested. Both canonical (PRC2 action recruits

PRC1) and non-canonical (PRC1 action recruits PRC2) pathways have been proposed, with prevalent models positing that one complex depends on the other in a hierarchical fashion.

However, more recent evidence suggests a greater degree of interdependency between PRC1 and PRC2 (Schoeftner et al., 2006; Margueron and Reinberg, 2011; Kahn et al., 2016; Almeida et al., 2017; Dorafshan et al., 2017). Consistently, our results for the X chromosome show that depleting PRC2 limits but does not abolish PRC1 recruitment, and vice versa. Thus, PRC1 and

PRC2 reinforce each other’s occupancy but can be recruited independently of the other to a limited extent. Our model therefore proposes that the two pathways can work in parallel during maintenance of XCI, though both require Xist Repeat B and HNRNPK.

In the simplest model (Figure 2.12I, J), Xist recruits PRC1 and PRC2 to Xi chromatin, where PRC2 trimethylates H3K27 and PRC1 monoubiquitylates H2AK119. The H3K27me3 and H2AK119ub marks then facilitate spreading of PRC1 and PRC2 through the canonical and non-canonical pathways, respectively (Margueron and Reinberg, 2011; Tavares et al., 2012; Simon and Kingston, 2013; Kalb et al., 2014; Cooper et al., 2016). In this way, PRC1 and PRC2 reinforce each other’s recruitment by Xist RNA and partially self-propagate along the chromatin in cis. In turn, as revealed in this study, PRC1 and PRC2 reciprocally enable Xist to spread on the Xi, possibly by mediating contact between the RNA and chromatin. Alternatively, Polycomb-mediated changes to chromatin structure (Francis et al., 2004; Kundu et al., 2018; Oksuz et al., 2018) may allow Xist to better interact with chromatin or spread to less accessible regions (Figure 2.12I). Thus, the interdependency between Xist and Polycomb complexes would explain the chromosome-wide propagation of the silencing machinery via a positive feedback loop.

Notably, previous studies with PRC1/2 KOs or deletions encompassing Repeat B did not report noticeable effects on the morphological character of Xist clouds (Wutz et al., 2002; Schoeftner et al., 2006; da Rocha et al., 2014; Almeida et al., 2017; Pintacuda et al., 2017). A number of reasons could explain this discrepancy, including non-native conditions employed by previous studies, such as the use of transgenes and/or forced Xist expression. Prior studies also did not employ super-resolution imaging. Here, all experiments were performed in the native context and in female cells, and furthermore benefited from inclusion of an unperturbed Xist cloud in the same nucleus for side-by-side comparison.

Another interesting result from our screen was that many of Xist’s functional elements correspond to repetitive motifs, which may act as multicopy binding platforms for trans factors.

As a proof of principle, we demonstrate that Repeat B can indeed accommodate multiple molecules of HNRNPK. Previous studies have suggested a direct interaction between HNRNPK and PRC1 (Pintacuda et al., 2017) as well as PRC2 (Denisenko and Bomsztyk, 1997). Because

HNRNPK is a general RNA-binding protein, it is likely to require cooperation with additional Xi factors to carry out its role in Polycomb recruitment. Indeed, the Repeat B pathway may be only part of the picture. Deleting Repeat B causes substantial failure of de novo gene silencing chromosome-wide. However, partial silencing still occurred, likely due to intact Repeat A, a motif known to be involved in gene silencing (Wutz et al., 2002). Thus, Repeats A and B may work together to establish, spread, and maintain epigenetic silencing. Unfortunately, we were unable to investigate Repeat A’s individual contribution in this study due to the confounding effect on

Xist splicing.

Finally, our study also demonstrates that Repeat B plays an important role in architectural reorganization of the Xi. Deleting Repeat B post-XCI resulted in failed attenuation of TADs, but at the same time left megadomains strongly intact. This effect was even greater when deleting Repeat B during de novo XCI, consistent with the accompanying failure of gene silencing. These results indicate that TADs and megadomains are not mutually exclusive, and furthermore that the Xi-specific megadomain structure does not preclude gene expression.

Moreover, deleting Repeat B led to a relative decrease in interaction frequency between the two megadomains. Thus, Repeat B (and Xist in general) may play a role in promoting long-range chromosomal interactions that span the two domains. Lastly, compartment structures were also affected, with the mutant Xi lacking patterns resembling S1/S2 compartmentalization, unlike the

WT Xi. Instead, the mutant Xi exhibited regional Xa-like features. Thus, Repeat B may play a role in S1/S2 compartment formation. Our study has thus identified a multifunctional RNA domain that coordinates interdependent spreading of Xist and Polycomb complexes, gene silencing, and the topological transformation from active to inactive X chromosome structure.

Materials and Methods

Cell lines

The M. musculus/M. castaneus F1 hybrid transformed MEF cell line was previously described as “EY.T4” (Yildirim et al., 2011). The male and female transgenic Xist MEF cell lines carrying either full-length Xist or Xist exon 1 were previously described as “♂X+P” and “♀X+P E1”, respectively (Jeon and Lee, 2011). MEFs were grown in medium containing DMEM, high glucose, GlutaMAX supplement, pyruvate (Thermo Fisher Scientific), 10% FBS (Sigma), 25 mM HEPES pH 7.2-7.5 (Thermo Fisher Scientific), 1x MEM non-essential amino acids (Thermo Fisher Scientific), 1x Pen/Strep (Thermo Fisher Scientific), and 0.1 mM βME (Thermo Fisher Scientific) at 37°C with 5% CO2. The M. musculus/M. castaneus F2 hybrid ES cell line carrying a mutated Tsix allele was previously described as “TsixTST/+” (Ogawa et al., 2008). ES cells were grown on γ-irradiated MEF feeders in medium containing DMEM, high glucose, GlutaMAX supplement, pyruvate, 15% Hyclone FBS (Sigma), 25 mM HEPES pH 7.2-7.5, 1x MEM non-essential amino acids, 1x Pen/Strep, 0.1 mM βME, and 500 U/mL ESGRO recombinant mouse Leukemia Inhibitory Factor (LIF) protein (Sigma, ESG1107) at 37°C with 5% CO2. LIF was excluded from the medium during ES cell differentiation procedures (see “ES cell differentiation” below).

ES cell differentiation

Undifferentiated ES cells were grown on γ-irradiated MEF feeders for 3 days, after which ES cell colonies were trypsinized and feeders removed (day 0). ES cells were then switched to medium lacking LIF and grown in suspension for 4 days, forming embryoid bodies (EBs). EBs were settled down onto gelatin-coated coverslips at day 4 and allowed to further differentiate until harvesting at day 14.

Generation of Xist deletions and HNRNPK, RING1A/B, and EED KO cells

Xist deletions were generated by CRISPR/Cas9 using a pair of gRNAs flanking each target region. HNRNPK, RING1A/B, and EED KOs were also generated by CRISPR/Cas9 but using a single gRNA to disrupt the coding sequence (premature stop or frameshift mutation). gRNA sequences (Table S3) were designed using tools available online (http://crispr.mit.edu) and cloned into pSpCas9(BB)-2A-GFP or pSpCas9(BB)-2A-Puro vectors (Ran et al., 2013). gRNA/Cas9 plasmid was delivered into ES cells by electroporation (Bio-Rad Gene Pulser Xcell) or MEFs by nucleofection (Lonza Nucleofector II, using MEF 2 solution) as per manufacturer’s instructions. Following plasmid delivery, cells were cultured for one week to allow sufficient time for DNA cutting and repair. Single cells were then sorted into 96-well plates by FACS, expanded, and screened by genomic PCR, Sanger sequencing (Table S1), and Xist RNA FISH

(Xist deletions) or HNRNPK/H2AK119ub/H3K27me3 IF (HNRNPK, RING1A/B, EED KOs).

Multiple clones were examined for each deletion/KO to rule out any effect arising from clonal variation.


Labeling of Xist oligo FISH probes

Xist oligo FISH probe sequences (Table S4) were designed using tools available online

(https://www.idtdna.com/calc/analyzer). Amino-ddUTP (Kerafast) was added to the 3’-ends of pooled oligos by Terminal Transferase (New England BioLabs) as per manufacturer’s instructions. Oligos were purified by phenol/chloroform extraction, concentrated by ethanol precipitation, resuspended in 0.1 M sodium borate, and labeled with Atto488 (Sigma), Cy3B (GE

Healthcare), or Alexa647 NHS-ester (Life Technologies). After another ethanol precipitation, labeled oligos were resuspended in water and labeling efficiency was evaluated by absorbance using NanoDrop (Thermo Fisher Scientific).

RNA FISH

RNA FISH was performed as previously described (Sunwoo et al., 2015). Briefly, cells were grown on glass coverslips and rinsed in PBS. They were fixed in 4% paraformaldehyde for 10 min and then permeabilized in PBS/0.5% Triton X-100 for another 10 min at room temp. Cells were rinsed in PBS and dehydrated in a series of increasing ethanol concentrations. Labeled oligo probe pool (0.1-5 nM each) was added to hybridization buffer containing 25% formamide,

2x SSC, 10% dextran sulfate, and nonspecific competitor (0.1 mg/mL mouse Cot-1 DNA

[Thermo Fisher Scientific] or 1 mg/mL yeast tRNA [Thermo Fisher Scientific]). Hybridization was performed in a humidified chamber at 37°C for >3 hours or overnight. After being washed once in 25% formamide/2x SSC at 37°C for 20 min and three times in 2x SSC at 37°C for 5 min each, cells were mounted for wide-field fluorescent imaging. Nuclei were counter-stained with Hoechst

33342 (Life Technologies).

RNA FISH/X-chromosome painting

1x10⁵ cells were cytospun onto glass slides and RNA FISH performed as described above. Slides were mounted and images captured with positions recorded. After imaging RNA signal, coverslips were carefully removed and slides rinsed first in PBS/0.05% Tween-20 and then in PBS to remove mounting medium, treated with RNase A (0.5 mg/mL in PBS) at 37°C for 40 min to remove RNA signal, and denatured for DNA FISH in 70% formamide/2x SSC at 80°C for 15 min. Slides were quenched in ice cold 70% ethanol and dehydrated in a series of increasing ethanol concentrations. 1:10 (v/v) XMP X Green mouse chromosome paint (MetaSystems, D-1420-050-FI) was added to hybridization buffer containing 50% formamide, 2x SSC, 10% dextran sulfate, and 0.2 mg/mL mouse Cot-1 DNA (Thermo Fisher Scientific) and denatured at 95°C for 10 min. Hybridization was performed in a humidified chamber at 37°C overnight. After being washed once in 0.2x SSC at 65°C for 10 min and three times in 2x SSC at room temp for 5 min each, slides were remounted and reimaged at recorded positions.

IF/RNA FISH

IF was performed as previously described (Sunwoo et al., 2015). Briefly, cells were grown on glass coverslips and rinsed in PBS. They were fixed in 4% paraformaldehyde for 10 min and then permeabilized in PBS/0.5% Triton X-100 for another 10 min at room temp. To detect association of HNRNPK with Xi, cells were pre-extracted with CSKT buffer (100 mM NaCl, 300 mM sucrose, 10 mM PIPES, 3 mM MgCl2, 0.5% Triton X-100, pH 6.8) for 10 min prior to fixation

(no subsequent permeabilization step necessary). After being blocked for 30 min in PBS/1%

BSA supplemented with 10 mM ribonucleoside vanadyl complex (New England BioLabs), primary antibodies were added and allowed to incubate at room temp for 1 hr. Cells were washed three times in PBS/0.05% Tween-20 at room temp for 5 min each. After incubating with dye-conjugated secondary antibody for 30 min at room temp, cells were washed again in

PBS/0.05% Tween-20 at room temp for 5 min each. Cells were post-fixed in 4% paraformaldehyde and dehydrated in a series of increasing ethanol concentrations. RNA FISH was then performed as described above.


Preparation of mitotic chromosomes

MEFs were grown to ~75% confluency. KaryoMAX colcemid solution (Thermo Fisher Scientific) was added to cells at a final concentration of 100 ng/mL for 2 hours. Afterwards, the plate was tapped several times against a hard surface and rinsed with PBS to dislodge mitotic cells. The medium/PBS was collected and cells pelleted by centrifugation at 400 g for 5 min. The supernatant was aspirated and cells gently resuspended in 200 μL PBS. Hypotonic solution (75 mM KCl) was slowly added to a final volume of 10 mL while swirling, and cells were incubated at 37°C for 10 min. Cells were pelleted by centrifugation at 400 g for 5 min. The supernatant was aspirated and cells gently resuspended in 200 μL hypotonic solution. Cold fixative (3:1 methanol:acetic acid) was slowly added to a final volume of 10 mL while swirling, and cells were fixed at -20°C for 1 hour. Cells were pelleted by centrifugation at 400 g for 5 min, washed with an additional 10 mL of cold fixative, pelleted again, and gently resuspended in 1 mL fixative. 100

μL of fixed cell solution was added dropwise onto slides from a height of 1 inch, allowing the slide to air dry for 30 seconds between drops. Slides were allowed to air dry completely for 1 hour before proceeding to RNA FISH.

Microscopy

For wide-field fluorescent imaging, cells were observed on a Nikon 90i microscope equipped with 60x/1.4 N.A. VC objective lens, Orca ER CCD camera (Hamamatsu), and Volocity software

(Perkin Elmer).

3D STORM super-resolution microscopy

STORM imaging was performed as previously described (Sunwoo et al., 2015) on an N-STORM (Nikon) equipped with 100x/1.4 N.A. objective lens, iXon X3 EM CCD camera (Andor), and 3 laser lines (647 nm, 561 nm, and 405 nm). A cylindrical lens was inserted into the optical path to introduce astigmatism. Z calibration was performed using 100 nm TetraSpeck beads (Life Technologies). Imaging buffer containing 147 mM βME and GluOX (Sigma) was used to promote blinking and reduce photo-bleaching. The N-STORM module in Elements software (Nikon) was used to control the microscope, acquire images, and perform 3D STORM localizations.

Microscopy image analysis

For measuring the size of Xist clouds or X chromosome territories, FISH images captured on an epifluorescent microscope were imported into ImageJ (NIH, v1.52a) using the Bio-Formats

Importer plug-in. After maximum intensity projection along the z-axis, Xist clouds or X chromosome territories were outlined by the free hand selection tool and area within measured.

For alignment of Xist RNA FISH and X-chromosome painting images, the two images were cropped and saved as TIF files. TIF images were imported into MATLAB (MathWorks). Nuclei (stained with Hoechst 33342) were aligned by applying three functions: imregconfig (configuring intensity-based registration), imregtform (estimating geometric transformation for aligning images), and imwarp (applying geometric transformation). The registered image was saved as TIF.

For comparing relative intensity between different RNA FISH probes in ∆RepA MEFs, RNA FISH was performed as described above using three differently colored probe sets: Xist exon 7 (Atto488), Repeat A (Cy3B), and Repeat F (Alexa647). All measurements were done using Volocity software (Perkin Elmer). Xist clouds were identified based on exon 7 signal, and intensities for all three probe sets were recorded. Values were imported to Excel and baseline signal was subtracted based on nuclear background FISH intensity per volume. Adjusted intensity per channel for each Xist cloud was normalized to that of exon 7 in order to approximate relative level between clouds.
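The background subtraction and exon 7 normalization described above were carried out in Excel; the arithmetic can be sketched in Python as follows. The function name, channel keys, and all numeric values are hypothetical and serve only to illustrate the calculation for a single Xist cloud.

```python
def normalize_to_reference(intensity, cloud_volume, background_per_volume,
                           reference="exon7"):
    """Subtract nuclear background and normalize each channel to a reference channel.

    intensity: channel -> summed FISH intensity within the Xist cloud.
    cloud_volume: volume of the cloud (same units used for the background rate).
    background_per_volume: channel -> nuclear background intensity per unit volume.
    Returns channel -> background-adjusted intensity relative to the reference.
    """
    adjusted = {
        channel: value - background_per_volume[channel] * cloud_volume
        for channel, value in intensity.items()
    }
    ref = adjusted[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal above background")
    return {channel: value / ref for channel, value in adjusted.items()}

# Hypothetical measurements for one Xist cloud
cloud = {"exon7": 1200.0, "repA": 400.0, "repF": 900.0}
background = {"exon7": 2.0, "repA": 1.0, "repF": 1.0}
relative = normalize_to_reference(cloud, cloud_volume=100.0,
                                  background_per_volume=background)
print(relative)
```

Normalizing to exon 7 expresses each probe signal relative to the total amount of Xist RNA in that cloud, so that clouds of different brightness can be compared.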


Antibodies

The following primary antibodies were used for ChIP-seq: H3K27me3 (GeneTex, GTX60892), H2AK119ub (Cell Signaling, CST8240); for IF: H3K27me3 (GeneTex, GTX60892), H3K27me3 (Active Motif, AM39535), H2AK119ub (Cell Signaling, CST8240), EZH2 (Cell Signaling, CST3147), HNRNPK (Proteintech, 11426-1-AP), HNRNPK (Bethyl Laboratories, A300-674A), HNRNPE1 (Proteintech, 14523-1-AP), HNRNPE2 (GeneTex, GTX114616), HNRNPE3 (GeneTex, GTX118656), CIZ1 (in-house (Sunwoo et al., 2017; see Chapter 3)); for Western blot: H3K27me3 (GeneTex, GTX60892), H3K27me3 (Cell Signaling, CST9733), H2AK119ub (Cell Signaling, CST8240), GAPDH (Cell Signaling, CST14C10), HNRNPK (Proteintech, 11426-1-AP). Dye-conjugated secondary antibodies were purchased from Life Technologies.

Western blot

Cells were washed once with PBS and lysed in cold lysis buffer (50 mM Tris pH 7.5, 150 mM

NaCl, 1% NP-40, 1% sodium deoxycholate, 0.1% SDS, 1x protease inhibitor cocktail [Sigma,

P8340]). Lysate was sonicated (Qsonica Q800 Sonicator) in polystyrene tubes at 45% power setting, 30 sec on/30 sec off for a total sonication time of 5 min at 4°C. After removing debris by centrifugation at 16,000 g for 10 min, protein concentration in the supernatant was measured

(Pierce BCA Assay Kit). 20-50 μg protein lysate was denatured in 1x Laemmli buffer at 95°C for

10 min and resolved by SDS-PAGE. Protein was electrotransferred to Immobilon-P PVDF membrane (EMD Millipore). At room temp, the membrane was blocked with blocking buffer

(PBS/0.05% Tween-20 containing 5% milk) for 1 hr, incubated with primary antibody in blocking buffer for 1 hr, washed three times with PBS/0.05% Tween-20 for 5 min each, incubated with secondary antibody-HRP conjugate (Promega, W4011 or W4021) in blocking buffer for 30 min, and washed three times again with PBS/0.05% Tween-20 for 5 min each. Protein bands were visualized using Western Lightning Plus-ECL (PerkinElmer) and exposing to BioMax MR film

(Carestream Health).


Southern blot

ES cells were lysed in lysis buffer (10 mM Tris pH 7.5, 10 mM NaCl, 10 mM EDTA, 0.5% sarkosyl) supplemented with 1 mg/ml proteinase K (Sigma) overnight at 55°C. Genomic DNA was precipitated by adding 2x volume of 66 mM NaCl/ethanol and washed with 70% ethanol.

Restriction digestion with NcoI (New England BioLabs) was performed overnight at 37°C. 10 μg of digested DNA was resolved by agarose gel electrophoresis. DNA was de-purinated using 0.2

N HCl for 10 min, washed in deionized water, neutralized with 0.4 N NaOH, and transferred to

Hybond-XL membrane (GE Healthcare) by capillary action in 0.4 N NaOH overnight at room temp. Membrane was rinsed briefly with 2x SSC and then pre-hybridized in Church buffer (7%

SDS, 158 mM NaH2PO4, 342 mM Na2HPO4, 1% BSA, 1 mM EDTA) for 2 hours at 65°C. The probe was a ~500-bp PCR fragment (primers: CCTAAATGTCCTATAATCCATTGCTACACA and CGCTTGGTGGATGGAAATATGGTTTTG) labeled using Random Primers DNA Labeling

System (Thermo Fisher Scientific) as per manufacturer’s instructions. Labeled probe was denatured at 95°C for 10 min prior to hybridization with membrane at ~1x10⁶ cpm/mL overnight at 65°C. Membrane was rinsed briefly with 2x SSC, then washed twice with 2x SSC/0.1% SDS at 65°C for 20 min each, and again with 0.1x SSC/0.1% SDS at 65°C for 10 min. Membrane was exposed to BioMax MR film (Carestream Health) at -80°C for two days prior to developing.

In vitro RNA pulldown and mass spectrometry

The in vitro RNA pulldown was performed as previously described (Leppek and Stoecklin,

2014), with additional modifications. Four copies of the S1m aptamer each separated by a short poly(A) spacer were cloned downstream of a T7 promoter into pUC19 vector (New England

BioLabs). The S1m aptamer is a variation of the original S1 aptamer (Srisawat and Engelke,

2001), in which the stem structure was lengthened by 8 bp, and 3 nt altered to strengthen stem formation. We further modified these 8 bp so they differ between each of the 4 tandem aptamers, thereby promoting intra- over intermolecular aptamer folding and improving pulldown efficiency. Our bait sequence (sense RepB, antisense RepB, minimal RepBd, or no insert) was cloned downstream of the tandem aptamer tag, vector linearized by restriction digest, in vitro transcribed by T7 RNA polymerase (MEGAscript, Thermo Fisher Scientific), and purified using TRIzol Reagent (Thermo Fisher Scientific). The size and quality of the RNA was verified by denaturing RNA gel electrophoresis. For each pulldown, 50 pmol of RNA were added to 200 μL binding/lysis buffer (15 mM βME, 2 mM MgCl2, 1% NP-40, 1% sodium deoxycholate, 0.1%

SDS, 1x PBS) supplemented with 200 U/mL RNase Inhibitor (Roche), and folded by incubation for 5 min each at 50°C, 37°C, then room temp. Folded RNA was added to 20 μL High

Performance Streptavidin Sepharose Beads (GE Healthcare) pre-washed with binding/lysis buffer, and allowed to bind for 2 hr with rotation at 4°C. Unbound RNA was removed by washing beads twice with binding/lysis buffer.

Protein lysate was prepared by two different methods with similar results: (1) For each pulldown experiment, 3 confluent 15-cm dishes of MEFs (~2x10⁷ cells) were rinsed with PBS, extracted with cold CSKT buffer (10 mM PIPES pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2,

0.5% Triton X-100) for 10 min to remove much of the soluble protein fraction, and rinsed again with PBS. Cells were scraped, pelleted, and snap frozen until later use. (2) Cells were trypsinized, pelleted, rinsed with PBS, and resuspended in 10 mL nuclear isolation buffer (10 mM HEPES pH 7.5, 10 mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40) supplemented with 1x cOmplete mini EDTA-free protease inhibitor cocktail (Roche), for 20 min on ice. Nuclei were pelleted at 1,000 g for 5 min, rinsed three times with nuclear isolation buffer to remove remaining cytoplasmic debris, and snap frozen until later use. To prepare lysate, extracted cells/nuclei were thawed on ice, resuspended in 200 μL cold binding/lysis buffer supplemented with 2 μL protease inhibitor cocktail (Sigma, P8340), and sonicated (Qsonica Q800 Sonicator) in polystyrene tubes at 45% power setting, 30 sec on/30 sec off for a total sonication time of 5 min at 4°C. The homogenate was centrifuged for 10 min at 16,000 g to remove debris, and the

supernatant pre-cleared by incubation with 20 μL High Performance Streptavidin Sepharose Beads for 2 hr with rotation at 4°C.

Pre-cleared protein lysate was supplemented with 200 U/mL RNase Inhibitor (Roche) and added to RNA-bound beads for 2 hr with rotation at 4°C. Beads were washed 6 times with binding/lysis buffer supplemented with an additional 350 mM NaCl (500 mM final salt concentration). RNA-protein complexes were specifically eluted with 10 mM D-biotin (Thermo

Fisher Scientific) in 50 μL binding/lysis buffer for 1 hr with rotation at 4°C. Eluate was concentrated by TCA precipitation, examined by denaturing RNA gel and SDS-PAGE with silver staining (Pierce Silver Stain Kit), and submitted for LC/MS protein identification (Taplin Mass Spectrometry Facility, Harvard Medical School).

Purification of His-tagged HNRNPK

6xHis-HNRNPK was purified from Rosetta-gami B bacteria as previously described (Tahir et al.,

2014). Briefly, transformed bacteria were grown in Luria-Bertani broth at 37°C and 225 rpm to an optical density of 0.6 (600 nm), after which protein expression was induced for 4 hr at 25°C using 1 mM IPTG. Bacteria were harvested by centrifugation for 15 min at 4,000 rpm in a JA-14 rotor (Beckman), and resuspended in lysis buffer (50 mM Tris pH 8.0, 150 mM NaCl, 5% glycerol, 10 mM imidazole, 1x cOmplete EDTA-free protease inhibitor cocktail [Roche]).

Bacteria were lysed by sonication (Branson Sonifier with attached microtip) using ten 10-sec pulses at power setting 6, while re-chilling lysate on ice in between pulses. Debris were pelleted by centrifugation for 15 min at 15,000 g at 4°C. The clarified lysate was combined with 0.5 mL

Ni-NTA beads (Qiagen) pre-washed with lysis buffer, and incubated overnight at 4°C with rotation. The beads were transferred to a 10-mL polypropylene column and washed with 10 column volumes of wash buffer (20 mM Tris pH 6.8, 2 M NaCl, 60 mM imidazole, 1% Triton X-

100). Protein was eluted from beads by incubation with 0.5 mL elution buffer (20 mM Tris pH


8.0, 5% glycerol, 500 mM imidazole, 5 μL protease inhibitor cocktail [Sigma, P8340]) for 20 min at 4°C with rotation, and dialyzed against PBS containing 5% glycerol overnight at 4°C. The purity and concentration were analyzed by SDS-PAGE followed by Coomassie staining and

Bradford assay (Bio-Rad Protein Assay Kit), respectively.

RNA EMSA

RNA EMSA was performed as previously described (Tomonaga and Levens, 1995). Briefly,

RNA probes were generated by 5’-end-labeling with T4 polynucleotide kinase (New England

BioLabs) and [γ-32P]ATP (PerkinElmer). Free nucleotides were removed using Illustra MicroSpin

G-25 columns (GE Healthcare), and specific activity of probe was verified by scintillation counting to be ~500,000 cpm/μL. The following RNA oligos were used:

WT Repeat B

(GCCCCAGCCCCUGCCCCAGCCCCAGCCCCUGCCCCUGCCCCAGCCCCUGCCCCA)

Mutant Repeat B

(GCAACAGCAACUGCAACAGCAACAGCAACUGCAACUGCAACAGCAACUGCAACA)
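The relationship between the two probes is easier to see when the sequences are split into 6-nt units. The following is a minimal sketch, not part of the original protocol: the sequences are copied from the list above, while the helper names `hexamers` and `differing_positions` are hypothetical and introduced only for illustration.

```python
# Probe sequences copied from the oligo list above.
WT_REPEAT_B = "GCCCCAGCCCCUGCCCCAGCCCCAGCCCCUGCCCCUGCCCCAGCCCCUGCCCCA"
MUT_REPEAT_B = "GCAACAGCAACUGCAACAGCAACAGCAACUGCAACUGCAACAGCAACUGCAACA"


def hexamers(seq):
    """Split an RNA sequence into consecutive 6-nt units (hypothetical helper)."""
    return [seq[i:i + 6] for i in range(0, len(seq), 6)]


def differing_positions(a, b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Print each wild-type unit next to its mutant counterpart.
    for wt_unit, mut_unit in zip(hexamers(WT_REPEAT_B), hexamers(MUT_REPEAT_B)):
        print(wt_unit, "->", mut_unit)
    print(len(differing_positions(WT_REPEAT_B, MUT_REPEAT_B)), "substituted positions")
```

In this comparison, every unit of the mutant probe differs from the wild-type probe only at its third and fourth nucleotides (C to A), so the length and the A/U pattern at the sixth position of each unit are preserved.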

Prior to performing EMSA, probe was denatured at 65°C for 5 min and folded by slow cooling to room temp. 50 fmol folded probe were added to 20 μL binding buffer (25 mM Tris pH 7.5, 200 mM glycine, 1 mM EDTA, 50 mM NaCl, 0.1% Tween-20, 10% glycerol, 0.5 mg/mL BSA, 2

μg/mL poly(dI-dC) [Thermo Fisher], 1 U/μL RNase inhibitor [Roche]) containing the indicated concentration of recombinant protein, and incubated for 30 min at room temp. Where indicated, unlabeled probe was added to binding reactions and pre-incubated for 10 min with protein prior to addition of labeled probe. RNA-protein complexes were resolved by electrophoresis on 1x

TBE, 6% polyacrylamide gel at 4°C. The gel was dried, exposed to storage phosphor screen

(GE Healthcare), and imaged on Amersham Typhoon (GE Healthcare).


Delivery of siRNA and EGFP plasmid

RYBP cDNA was cloned into mammalian EGFP expression vector (derived from pSV2-EYFP-

C1 (Tsukamoto et al., 2000)) and transfected into MEFs using Lipofectamine LTX (Thermo

Fisher Scientific) as per manufacturer’s instructions. After 24 hr, cells were fixed and imaged by wide-field fluorescence microscopy. For siRNA knock-downs, 10 nM ON-TARGETplus

SMARTpool targeting mouse HNRNPK (Dharmacon, L-048002-01) or non-targeting (scramble) control (Dharmacon, D-001810-10) was introduced into MEFs using Lipofectamine RNAiMAX

(Thermo Fisher Scientific) as per manufacturer’s instructions. Knock-down was repeated daily for the indicated durations.

RT-PCR

RNA was isolated from cells using TRIzol Reagent (Thermo Fisher Scientific) as per manufacturer’s instructions. Genomic DNA was removed using TURBO DNase from the

TURBO DNA-free Kit (Thermo Fisher Scientific). After inactivating TURBO DNase with DNase

Inactivation Reagent (also enclosed in TURBO DNA-free Kit), DNA-free RNA was reverse transcribed using SuperScript III Reverse Transcriptase (Thermo Fisher Scientific) with random primers (Promega, C118A) at 25°C for 5 min, 50°C for 1 hr, and 85°C for 15 min. Conventional

PCR was performed on cDNA using 2x PCR MasterMix (Promega) or 2x Phusion High-Fidelity

PCR Master Mix (New England BioLabs). Quantitative RT-PCR was performed using iTaq

Universal SYBR Green Supermix (Bio-Rad) in a CFX96 Real-Time PCR Detection System (Bio-Rad). Gene-specific primer pairs are listed in Table S5.

ChIP-seq

ChIP-seq was performed as previously described (Minajigi et al., 2015), with minor modifications. Briefly, cells were cross-linked in PBS with 1% formaldehyde at room temp for 10 min with rotation at 1 million cells/mL, and quenched with 0.125 M glycine for 5 min. Cross-linked cells were washed twice with cold PBS, pelleted, and snap-frozen in liquid nitrogen. 10 million cross-linked cells per ChIP were thawed on ice and resuspended in 1 mL buffer 1 (50 mM

HEPES pH 7.5, 140 mM NaCl, 1 mM EDTA, 0.5% NP-40, 0.25% Triton X-100, 10% glycerol, 1x cOmplete EDTA-free protease inhibitor cocktail [Roche]), and rotated for 10 min at 4°C. Nuclei were pelleted by centrifugation at 1,000 g for 5 min at 4°C, resuspended in 1 mL buffer 2 (10 mM HEPES pH 7.5, 200 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 1x cOmplete EDTA-free protease inhibitor cocktail) supplemented with 0.2 mg/mL RNase A (Thermo Fisher Scientific), and rotated for 10 min at 4°C. Nuclei were pelleted by centrifugation at 1,000 g for 5 min at 4°C and resuspended in 1.3 mL buffer 3 (10 mM HEPES pH 7.5, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 1x cOmplete EDTA-free protease inhibitor cocktail, 0.1% sodium deoxycholate,

0.5% sarkosyl). Nuclei were sonicated (Qsonica Q800 Sonicator) in polystyrene tubes at 45% power reading, 30 sec on/30 sec off for a total sonication time of 4 min at 4°C. Triton X-100 was added to the lysate to 1%, which was then centrifuged for 10 min at 16,000 g to remove debris.

The lysate was pre-cleared for 2 hr at 4°C with rotation using 20 μL Dynabeads Protein G (Thermo Fisher Scientific) pre-washed with PBS/0.5% BSA. After saving 10% as “input” sample, the pre-cleared lysate was combined with 20 μL Dynabeads Protein G pre-bound to 2 μg antibody (H3K27me3, GeneTex GTX60892; H2AK119ub, Cell Signaling CST8240), and rotated overnight at 4°C. Afterwards, beads were washed five times with wash buffer (50 mM HEPES pH 7.5, 0.5 M LiCl, 1 mM EDTA, 1% NP-40, 0.7% sodium deoxycholate), once with TEN buffer

(10 mM Tris pH 8.0, 1 mM EDTA, 50 mM NaCl), and once with TE buffer (10 mM Tris pH 8.0, 1 mM EDTA). Input sample and beads containing ChIP material were resuspended in 400 μL TES buffer (50 mM Tris pH 8.0, 10 mM EDTA, 1% SDS) supplemented with 0.5 mg/mL Proteinase K

(Sigma) and incubated for 1 hr at 55°C and then for >3 hr at 65°C to reverse cross-links. DNA was purified by phenol-chloroform extraction and quantified with Quant-iT PicoGreen dsDNA

Reagent (Thermo Fisher Scientific). Input and ChIP-seq libraries were prepared in two biological replicates using NEBNext ChIP-seq Library Prep Master Mix Set for Illumina (New England BioLabs) as per manufacturer’s instructions. Libraries were sequenced on Illumina HiSeq2000 (high-throughput run) or HiSeq2500 (rapid run), generating ~50 million 50-nt paired-end reads per sample.

CHART-seq

Xist CHART-seq was performed as previously described (Wang et al., 2018). Briefly, cells were cross-linked in PBS with 1% formaldehyde at room temp for 10 min with rotation at 2 million cells/mL, and quenched with 0.125 M glycine for 5 min. Cross-linked cells were washed twice with cold PBS, pelleted, and snap-frozen in liquid nitrogen. 25 million cross-linked cells per

CHART were thawed on ice and resuspended in 1 mL cold sucrose buffer (10 mM HEPES pH

7.5, 0.3 M sucrose, 1% Triton X-100, 100 mM potassium acetate, 0.1 mM EGTA) supplemented with 0.5 mM spermidine, 0.15 mM spermine, 1 mM DTT, 1x protease inhibitor cocktail (Sigma,

P8340), 10 U/mL RNase inhibitor (Roche). The cell suspension was rotated for 10 min at 4°C, after which it was diluted with an additional 2 mL cold sucrose buffer. This was added to a pre-chilled 15-mL glass Wheaton dounce tissue grinder (Thermo Fisher Scientific) that had been pre-rinsed once with RNaseZAP RNase decontamination solution (Thermo Fisher Scientific), three times with DEPC-treated water, and once with 2 mL sucrose buffer. Release of nuclei was accomplished by douncing cells 20 times with a tight pestle, and verified by staining with Trypan

Blue Solution (Thermo Fisher Scientific). The nuclear suspension was then layered atop a cushion of 7.5 mL cold glycerol buffer (10 mM HEPES pH 7.5, 25% glycerol, 1 mM EDTA, 0.1 mM EGTA, 100 mM potassium acetate) supplemented with 0.5 mM spermidine, 0.15 mM spermine, 1 mM DTT, 1x cOmplete EDTA-free protease inhibitor cocktail (Roche), and 5 U/mL

RNase inhibitor, and centrifuged at 1500 g for 10 min at 4°C. The purified nuclear pellet was resuspended in 6 mL PBS and further cross-linked with 3% formaldehyde for 30 min at room temp with rotation. Afterwards, nuclei were pelleted by centrifugation at 1,000 g for 5 min at 4°C

and washed three times with cold PBS. Nuclei were resuspended in 1 mL cold nuclear extraction buffer (50 mM HEPES pH 7.5, 250 mM NaCl, 0.1 mM EGTA, 0.5% sarkosyl, 0.1% sodium deoxycholate, 5 mM DTT, 10 U/mL RNase inhibitor) and incubated for 10 min at 4°C with rotation. Nuclei were then centrifuged at 400 g for 5 min at 4°C and resuspended in 230 μL cold sonication buffer (50 mM HEPES pH 7.5, 75 mM NaCl, 0.1 mM EGTA, 0.5% sarkosyl,

0.1% sodium deoxycholate, 0.1% SDS, 5 mM DTT, 10 U/mL RNase inhibitor) to a final volume of ~270 μL. Nuclei were sonicated for 5 min at 4°C using a Covaris E220e (140W peak incident power, 10% duty factor, 200 cycles/burst). Sonicated chromatin was centrifuged at 16,000 g for

20 min at 4°C to remove debris. The supernatant (~220 μL) was mixed with 110 μL sonication buffer to a final volume of ~330 μL, which was split into two CHART reactions (Xist capture using antisense probes and control capture using sense probes).

For two CHART reactions, 600 μL MyOne Streptavidin C1 beads (Thermo Fisher Scientific) were used, which were pre-washed with 1 mL DEPC-treated water, blocked with 600 μL blocking buffer (33% sonication buffer, 67% 2x hybridization buffer, 500 ng/μL yeast RNA

[Thermo Fisher Scientific], 1% BSA) at room temp for 1 hr, washed with 1 mL DEPC-treated water again, and resuspended in 610 μL 1x hybridization buffer (33% sonication buffer, 67% 2x hybridization buffer [see below for composition]). For each CHART, 160 μL chromatin was mixed with 320 μL 2x hybridization buffer (50 mM Tris pH 7.0, 750 mM NaCl, 1% SDS, 1 mM

EDTA, 15% formamide, 1 mM DTT, 1 mM PMSF, 1x cOmplete EDTA-free protease inhibitor cocktail, 100 U/mL RNase inhibitor) and pre-cleared with 60 μL pre-blocked MyOne Streptavidin

C1 beads at room temp for 1 hr with rotation. Pre-cleared chromatin was separated from beads, with 1% saved as “input” sample. The remaining pre-cleared chromatin (~500 μL) was mixed with 36 pmol of either antisense (Xist-targeting) or sense (control) biotinylated capture probes.

We used a pool of 9 capture probes, which were selected from the 11 probes described previously (Simon et al., 2013), excluding probes X.3 and X.8. Hybridization was carried out at


37°C for 4 hr followed by incubation with 240 μL blocked MyOne Streptavidin C1 beads at 37°C for 1 hr. The beads were washed once with 1x hybridization buffer (33% sonication buffer, 67%

2x hybridization buffer) at 37°C for 10 min, five times with wash buffer (10 mM HEPES pH 7.5,

150 mM NaCl, 2% SDS, 2 mM EDTA, 2 mM EGTA, 1 mM DTT) at 37°C for 5 min, and twice with elution buffer (10 mM HEPES pH 7.5, 150 mM NaCl, 0.5% NP-40, 3 mM MgCl2, 10 mM

DTT) at 37°C for 5 min. 10% of the final wash was saved as an “RNA capture” sample to estimate the efficiency of Xist RNA capture afterwards by RT-qPCR. CHART-enriched DNA was eluted twice in 200 μL elution buffer supplemented with 5 U/μL RNase H (New England

BioLabs) at 37°C for 30 min. An “input” sample and CHART DNA were treated with 0.5 mg/mL

RNase A (Thermo Fisher Scientific) at 37°C for 1 hr and then with 1% SDS, 10 mM EDTA, and

0.5 mg/mL proteinase K (Sigma) at 55°C for 1 hr. Reversal of crosslinks was performed by addition of 150 mM NaCl (final 300 mM) and incubation at 65°C overnight. DNA was purified by phenol-chloroform extraction and further sheared to <500-bp fragments using Covaris E220e

(140W peak incident power, 10% duty factor, 200 cycles/burst). Sonicated DNA was purified using 1.8x Agencourt AMPure XP beads (Beckman Coulter). Input and CHART-seq libraries were prepared in two biological replicates using NEBNext ChIP-seq Library Prep Master Mix Set for Illumina (New England BioLabs) as per manufacturer’s instructions. Libraries were sequenced on Illumina HiSeq2000 (high-throughput run) or HiSeq2500 (rapid run), generating

~50 million 50-nt paired-end reads per sample.

ChIP-seq and CHART-seq analysis

To account for the hybrid character of our cell lines, adaptor-trimmed reads were separately aligned to custom mus/129 and cas genomes using NovoAlign (Novocraft), then mapped back to reference mm9 genome using SNPs (Pinter et al., 2012). This generated three tracks: composite of all reads (comp) and two allele-specific tracks using only allele-specifically mappable reads. We first used SPP (Kharchenko et al., 2008) to generate input-subtracted


ChIP and CHART coverage profiles with smoothing using 1-kb windows recorded every 500 bp, as previously described (Simon et al., 2013). To account for different sequencing depths and

CHART efficiencies, CHART profiles were scaled so that they have equal coverage within a 1-Mb region flanking but not including the Xist locus (chrX:100150712-100650712 and 100683572-101183572). After scaling, we divided the coverage of each 1-kb window in ∆RepB, RING1A/B KO, and EED KO by that of WT, with a pseudo-count (0.001) added to each window. We then computed the base 2 logarithm of the ratio to generate the log2 ratio profiles.
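The scaling and log2 ratio steps just described can be sketched as follows. This is an illustrative sketch only, not the code used in this work: the function names and the dictionary-based profile format (window start mapped to coverage) are assumptions, as is the choice to assign a window to the flanking region by its start coordinate. The flank coordinates and the pseudo-count are taken from the text, and coverage values are assumed to be non-negative.

```python
import math

PSEUDOCOUNT = 0.001  # pseudo-count stated in the text

# chrX regions flanking the Xist locus (mm9 coordinates from the text).
FLANKS = [(100150712, 100650712), (100683572, 101183572)]


def in_flank(window_start):
    """True if a window start falls inside one of the flanking regions."""
    return any(start <= window_start < end for start, end in FLANKS)


def scale_to_reference(profile, reference):
    """Scale `profile` so its summed flank coverage equals that of `reference`.

    Both arguments are dicts mapping window start -> coverage (assumed format).
    """
    profile_sum = sum(v for pos, v in profile.items() if in_flank(pos))
    reference_sum = sum(v for pos, v in reference.items() if in_flank(pos))
    if profile_sum == 0:
        raise ValueError("profile has no coverage in the flanking regions")
    factor = reference_sum / profile_sum
    return {pos: v * factor for pos, v in profile.items()}


def log2_ratio(mutant, wild_type):
    """Per-window log2((mutant + pseudo-count) / (WT + pseudo-count))."""
    return {
        pos: math.log2((mutant[pos] + PSEUDOCOUNT) / (wild_type[pos] + PSEUDOCOUNT))
        for pos in wild_type
        if pos in mutant
    }


if __name__ == "__main__":
    # Toy profiles: two windows in the flank, one elsewhere on chrX.
    wt = {100200000: 10.0, 100300000: 10.0, 50000000: 4.0}
    mutant = {100200000: 5.0, 100300000: 5.0, 50000000: 1.0}
    scaled = scale_to_reference(mutant, wt)
    print(log2_ratio(scaled, wt))
```

Scaling on the flanking region before taking the ratio means that a negative log2 value elsewhere on the chromosome reflects a loss relative to coverage near Xist, rather than a difference in sequencing depth or capture efficiency.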

To account for different sequencing depths for ChIP-seq, samples differing by >10% were compensated by random downsampling with SAMtools (Li et al., 2009). To compare the densities of Xist, H3K27me3, and H2AK119ub at different categories of X-linked genes, we computed their densities over gene bodies by HOMER (Heinz et al., 2010). Non-expressed genes were defined as X-linked genes having zero RPKM in at least one RNA-seq replicate of

WT MEFs; genes subject to XCI were defined as having non-zero RPKM in both WT MEF RNA-seq replicates after excluding escapees; escapees in the hybrid MEF line have been previously defined (Pinter et al., 2012). Genes with length shorter than 3 kb or overlapping unmappable regions were excluded from the analysis along with Xist and Tsix.
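The gene categories defined above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the analysis code itself: the function name, argument names, and gene names in the example are hypothetical, and the order of the checks (escapee status tested before expression) is an assumption that the text does not specify.

```python
MIN_GENE_LENGTH = 3000          # genes shorter than 3 kb are excluded (from the text)
EXCLUDED_GENES = {"Xist", "Tsix"}


def classify_x_linked_gene(name, length_bp, rpkm_rep1, rpkm_rep2,
                           escapees, overlaps_unmappable=False):
    """Return the gene category, or None if the gene is excluded.

    `escapees` is a set of previously defined escapee gene names.
    Assumption: escapee status is checked before expression level.
    """
    if name in EXCLUDED_GENES or length_bp < MIN_GENE_LENGTH or overlaps_unmappable:
        return None
    if name in escapees:
        return "escapee"
    if rpkm_rep1 == 0 or rpkm_rep2 == 0:
        # Zero RPKM in at least one WT MEF RNA-seq replicate.
        return "non-expressed"
    # Non-zero RPKM in both replicates and not an escapee.
    return "subject to XCI"


if __name__ == "__main__":
    escapee_set = {"GeneE"}  # hypothetical escapee name, for illustration only
    for gene in [("GeneA", 5000, 0.0, 2.1), ("GeneB", 12000, 1.5, 2.0),
                 ("GeneE", 40000, 3.0, 4.0), ("GeneS", 2000, 3.0, 4.0)]:
        print(gene[0], classify_x_linked_gene(*gene, escapees=escapee_set))
```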

RNA-seq

Total cell RNA was extracted using TRIzol Reagent (Thermo Fisher Scientific), from which mRNA was isolated using NEBNext Poly(A) mRNA magnetic isolation module (New England

BioLabs) as per manufacturer’s instructions. RNA-seq libraries were prepared in two biological replicates using NEBNext Ultra Directional RNA Library Prep Kit for Illumina (New England

BioLabs) as per manufacturer’s instructions. Libraries were sequenced on Illumina HiSeq2000

(high-throughput run) or HiSeq2500 (rapid run), generating ~50 million 50-nt paired-end reads per sample.


RNA-seq analysis

PCR duplicates were removed and reads separately aligned to custom mus/129 and cas genomes using TopHat2 (Kim et al., 2013). Final allele-specific mapping to reference mm9 genome was generated based on SNPs (Pinter et al., 2012). Only uniquely aligned concordantly mapped sequences were used in downstream analysis. Counts per gene were calculated using featureCounts (Liao et al., 2014). Using MATLAB (MathWorks), library sizes were normalized and genes with insufficient allelic information (<13 allele-specific reads) were removed. We also removed potentially miscalled genes from our alignment pipeline, defined as genes with reads incorrectly assigned to mus from a pure cas RNA-seq library. Allele-specific RPKM was calculated using allelic ratio (allele-specific counts) applied to comp RPKM (total counts). For the CDF plot from fibroblast cells, genes with allele-specific RPKM≥1 were used while all other plots used genes with comp RPKM≥1 in at least one sample.
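The allele-specific RPKM calculation described above amounts to splitting the composite RPKM according to the allelic read ratio. The following is a minimal sketch of that arithmetic, not the MATLAB code used in this work: the function name is hypothetical, and applying the 13-read cutoff to the sum of mus and cas reads is an assumption.

```python
MIN_ALLELIC_READS = 13  # genes with fewer allele-specific reads are removed (from the text)


def allele_specific_rpkm(mus_reads, cas_reads, comp_rpkm):
    """Split composite RPKM into (mus RPKM, cas RPKM) using the allelic ratio.

    Returns None when allelic information is insufficient.
    Assumption: the read cutoff applies to mus + cas reads combined.
    """
    allelic_total = mus_reads + cas_reads
    if allelic_total < MIN_ALLELIC_READS:
        return None
    mus_fraction = mus_reads / allelic_total
    return comp_rpkm * mus_fraction, comp_rpkm * (1 - mus_fraction)


if __name__ == "__main__":
    # 30 mus reads and 10 cas reads on a gene with composite RPKM of 8.
    print(allele_specific_rpkm(30, 10, 8.0))
    # Too few allele-specific reads: the gene is dropped.
    print(allele_specific_rpkm(5, 5, 10.0))
```

Because only reads overlapping SNPs are allele-informative, applying the allelic ratio to the composite RPKM uses all reads for the expression estimate while still apportioning it between the two alleles.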

In situ Hi-C

In situ Hi-C was performed as previously described (Rao et al., 2014), with minor modifications outlined below. Briefly, 5 million cells per sample were cross-linked with 1% formaldehyde for 10 min at room temp. Cross-linked cells were lysed and the nuclei digested with 15 μL HindIII (100

U/μL; New England BioLabs, R0104M) at 37°C overnight. Cell clumps were removed by brief centrifugation, with the supernatant transferred to a new tube for the fill-in reaction with biotin-14-dATP (Thermo Fisher Scientific) and subsequent ligation. Re-ligated DNA was purified and then sheared to 300-500 bp in a Covaris microTUBE for 80 sec using Covaris E220e (140W peak incident power, 10% duty factor, 200 cycles/burst). Double size selection was performed on sheared DNA using 0.55x-0.75x Agencourt AMPure XP beads (Beckman Coulter). Biotinylated ligation products were purified with 30 μL MyOne Streptavidin C1 beads, end-repaired, dA-tailed, ligated to 3 μL 15 μM NEBNext Adaptor for Illumina (enclosed in NEBNext

Multiplex Oligos for Illumina [Index Primers Set 1]; New England BioLabs), and digested with 3 μL USER (also enclosed in NEBNext Multiplex Oligos for Illumina [Index Primers Set 1]) at 37°C for 15 min. Hi-C libraries were then amplified with 12-16 cycles of PCR and cleaned up using Agencourt AMPure XP beads. Libraries were sequenced on Illumina HiSeq2000, generating ~200 million 50-nt paired-end reads per sample.

In situ Hi-C analysis

Reads were trimmed using cutadapt with the options --adapter=GATCGATC (MboI ligation junction) and --minimum-length=20. Reads of each pair were individually mapped to the mus

(Xi) and cas (Xa) reference genomes using NovoAlign (Novocraft) and merged into Hi-C summary files and filtered using HOMER (Heinz et al., 2010) as previously described (Minajigi et al., 2015). Hi-C contact maps were generated from reads in the filtered HOMER mus and cas tag directories using the ‘pre’ command of Juicer tools with the genomeID mm9 (Durand et al.,

2016b). The resulting Hi-C contact maps in .hic format were visualized and normalized with the

‘Balanced’ option in Juicebox (Durand et al., 2016a). Unless otherwise noted, all Hi-C interaction matrices in figures were visualized using Juicebox with 100-kb resolution and

‘Balanced’ normalization. Insulation scores, TADs, and TAD scores were determined using perl scripts previously described and available in the cworld GitHub repository (Giorgetti et al.,

2016). Specifically, 100-kb resolution ‘Balanced’ normalized chromosome X Hi-C interaction matrices were extracted from Juicebox using the ‘dump’ function of Juicer tools (juicer dump observed KR name.hic X X BP 100000 name.KR.100kb.dump). Custom shell and R scripts were used to convert this output matrix into the cworld matrix format. Insulation scores were then calculated for the X chromosome using the resulting cworld matrix and the matrix2insulation.pl cworld script with parameters --is 500000 --ids 400000 --im iqrMean --nt 0 --ss 200000 --yb 1.5 --nt 0 --bmoe 0. As described previously, regions of insulation score minima represent regions of locally high insulation and are considered potential TAD boundaries (Crane et al., 2015). TADs were detected from output insulation files using the cworld script

insulation2tads.pl using default parameters (--mbs 0 --mts 0). To avoid slight differences in called TAD borders between WT and ∆RepB samples, and to avoid potential inaccuracies in the location of called TAD borders due to weakened TADs on the Xi, for each cell type (MEF or day

14 ES cell) TAD borders called on the WT Xa were used for further analysis of all samples. For example, for MEF, TAD borders were called on the WT Xa and those borders used for analysis of WT Xi, ∆RepB Xa, and ∆RepB Xi MEF samples. For MEF Xa, 110 TADs were called, while for day 14 ES cell, 108 TADs were called. TADs falling within low SNP density regions (33-45 Mb, 109-116 Mb, 123-130 Mb) were excluded from further analysis, resulting in 89 TADs for

MEF and 90 TADs for day 14 ES cell. The insulation2tads.pl script also calculated scores for each TAD. As previously described, the TAD score is calculated by first finding the maximum insulation score within a given TAD. The insulation scores at the boundaries of the TAD are then subtracted from this maximum insulation score. The TAD score is then taken to be the minimum of these two differences (https://github.com/dekkerlab/cworld-dekker). To determine if

TAD scores overall differed significantly between samples, a paired Wilcoxon (signed-rank) test was performed in R (wilcox.test(), paired=TRUE, all other options default). To determine the level of noise in the calculated TAD score between samples, for example due to differences in efficiency between Hi-C experiments, we first assumed that TAD strength should not differ substantially between the ∆RepB Xa and the WT Xa. Therefore, we calculated the ratio of TAD scores

(∆RepB/WT Xa) for each cell type (MEF or day 14 ES cell) and used this to assign an empirical false discovery rate (FDR) < 0.05 threshold for Xi. Only TADs with non-zero TAD scores on both the ∆RepB and WT Xa were included in the FDR < 0.05 threshold calculations. To determine which TADs were stronger on ∆RepB versus WT Xi, ∆RepB/WT Xi TAD score ratios were calculated, and those greater than the FDR < 0.05 threshold were considered significant.

Analogously, TADs significantly weaker on ∆RepB versus WT Xi were also determined. For

∆RepB MEF Xi, 18 TADs were significantly stronger and 7 significantly weaker. For ∆RepB day

14 ES cell Xi, 31 TADs were significantly stronger and 8 were significantly weaker. To

determine if X-linked genes that failed to be silenced in ∆RepB day 14 ES cells correlated with

TADs that were significantly stronger, fraction mus (Xi) reads (mus/[mus + cas]) were first calculated for each gene where allelic information could be determined (using the same criteria as above), in both the WT and ∆RepB day 14 RNA-seqs. The change in fraction mus reads

“∆fraction mus reads” was calculated for each gene (fraction ∆RepB mus reads/fraction WT mus reads). Each gene was assigned to a day 14 ES cell TAD harboring the gene (with borders defined by the Xa, as explained above). For each gene, ∆fraction mus reads was plotted against the TAD score ratio (∆RepB/WT ES Xi) of its assigned TAD. Linear correlation of RNA-seq ratios and TAD score ratios was then computed using the lm() function in R with default parameters. Principal component analysis (PCA) was done using the runHiCpca.pl command of HOMER with the filtered HOMER Hi-C tag directory as input and the options -res 100000 -superRes 100000 -genome mm9. Pearson correlation maps were also generated by HOMER using the analyzeHiC command with the filtered HOMER Hi-C tag directory as input and the options -chr chrX -res 200000 -corr. Pearson correlation maps were visualized with Java

TreeView (Saldanha, 2004).
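Two of the quantities defined earlier in this section, the TAD score and the ∆fraction mus reads, reduce to short calculations. The sketch below restates those definitions in code; it is not the cworld insulation2tads.pl implementation or the R analysis used here. The function names, the list-of-bins input format, and the use of inclusive boundary bin indices are assumptions made for illustration.

```python
def tad_score(insulation, start_bin, end_bin):
    """TAD score as defined in the text.

    `insulation` is a list of insulation scores per bin; `start_bin` and
    `end_bin` are the bin indices of the two TAD boundaries (inclusive).
    The score is the smaller of (max insulation in the TAD - left boundary
    insulation) and (max insulation in the TAD - right boundary insulation).
    """
    if not 0 <= start_bin < end_bin < len(insulation):
        raise ValueError("boundary bins must satisfy 0 <= start < end < len(insulation)")
    peak = max(insulation[start_bin:end_bin + 1])
    return min(peak - insulation[start_bin], peak - insulation[end_bin])


def fraction_mus(mus_reads, cas_reads):
    """Fraction of allele-specific reads from the mus (Xi) allele."""
    return mus_reads / (mus_reads + cas_reads)


def delta_fraction_mus(mut_mus, mut_cas, wt_mus, wt_cas):
    """Ratio of the mutant fraction of mus reads to the WT fraction.

    Undefined (division by zero) when the WT fraction is zero.
    """
    return fraction_mus(mut_mus, mut_cas) / fraction_mus(wt_mus, wt_cas)


if __name__ == "__main__":
    # Toy insulation profile: minima at both boundaries, maximum inside the TAD.
    profile = [-0.8, 0.1, 0.6, 0.3, -0.5]
    print(tad_score(profile, 0, 4))
    # A gene with 30% mus reads in the mutant versus 10% in WT.
    print(delta_fraction_mus(30, 70, 10, 90))
```

Taking the minimum of the two boundary differences means a TAD scores highly only when it is well insulated on both sides.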

Data and Software Availability

Original unprocessed gel, blot, and microscope images in this manuscript have been deposited at Mendeley Data and are available at: http://dx.doi.org/10.17632/cfk4bgz6w9.1

Raw high-throughput sequencing data and processed files for RNA-seq, ChIP-seq, CHART-seq, and Hi-C reported in this paper have been deposited at GEO under accession number:

GSE107217


Chapter 3

Repeat E anchors Xist RNA to the inactive X chromosomal compartment through CDKN1A-interacting protein (CIZ1)

Attribution

Hongjae Sunwoo and John Froberg performed immunoprecipitation experiments to identify

CIZ1 protein. CIZ1 KO, HNRNPU KO, and CIZ1-EGFP knock-in cell lines were generated and characterized by Hongjae Sunwoo. STORM super-resolution imaging and several UV-RIP experiments were also performed by Hongjae Sunwoo. Content from this chapter has been published in Sunwoo et al., 2017.


Summary

X-chromosome inactivation is an epigenetic dosage compensation mechanism in female mammals driven by the long noncoding RNA, Xist. Although recent genomic and proteomic approaches have provided a more global view of Xist’s function, how Xist RNA localizes to the inactive X chromosome (Xi) and spreads in cis remains unclear. Here, we report that the

CDKN1A-interacting zinc finger protein CIZ1 is critical for localization of Xist RNA to the Xi chromosome territory. Stochastic optical reconstruction microscopy shows a tight association of

CIZ1 with Xist RNA at the single-molecule level. CIZ1 interacts with a specific region within Xist exon 7, namely the highly repetitive Repeat E motif. Using genetic analysis, we show that loss of CIZ1 and deletion of Repeat E in female cells phenocopy one another, each causing Xist RNA to delocalize from the Xi and disperse into the nucleoplasm. Interestingly, this interaction is exquisitely sensitive to CIZ1 levels, as overexpression of CIZ1 likewise results in Xist delocalization. This delocalization is accompanied by a decrease in H3K27me3 on the Xi. Our data reveal that CIZ1 plays a major role in ensuring stable association of Xist RNA within the Xi territory.

Introduction

X chromosome inactivation (XCI) is one of the most extensively studied epigenetic processes to date. Since its discovery more than 50 years ago, numerous genetic and cellular studies have uncovered several RNA and protein factors to be high-confidence regulators of this process (Lee, 2011; Wutz, 2011; Disteche, 2012). Recently, the advent of genomic (Calabrese et al., 2012; Engreitz et al., 2013; Pinter et al., 2012; Simon et al., 2013) and proteomic (Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015) approaches for studying long noncoding

RNAs has brought about a more holistic view of XCI mechanics. Still, how Xist is able to spread across only one of two X chromosomes and be retained within the inactive X (Xi) territory as an

Xist cloud (Clemson et al., 1996) remains one of the most challenging questions to address.


Despite an intuitive perception that Xist localization must be confined in cis to the allele from which it is transcribed, specific molecular players have yet to be fully elucidated. In Chapter 2, we shed light on the role of HNRNPK and Polycomb repressive complexes in Xist spreading and/or chromatin association. Meanwhile, the transcription factor YY1 has been ascribed a role in nucleation of Xist RNA in cis to the Xi (Jeon and Lee, 2011; Wang et al., 2016). The nuclear matrix protein HNRNPU (also known as SAF-A) was also found to be important for Xist RNA localization (Wang et al., 2016; Hasegawa et al., 2010). However, neither YY1 nor HNRNPU is particularly enriched on the Xi relative to other chromosomes and may thus function indirectly or in a cell-type-specific manner (Kolpa et al., 2016; Sakaguchi et al., 2016).

Results

CIZ1 is a novel Xi factor

While performing super-resolution stochastic optical reconstruction microscopy

(STORM) to investigate candidate protein factors for their ability to colocalize with Xist RNA

(Sunwoo et al., 2015), we came across an ASH2L antibody that piqued our interest. This

Trithorax protein, usually associated with active genes, was previously reported to colocalize with Xist RNA in immunofluorescence (IF) experiments (Pullirsch et al., 2010). Indeed, our initial analysis confirmed this colocalization, with the antibody exhibiting an exceptionally high level of colocalization with Xist RNA in mouse embryonic stem (ES) and immortalized embryonic fibroblast (MEF) cells (Figure 3.1A, B). However, further analysis suggested ASH2L was not the recognized epitope. IF showed that knockdown of ASH2L by siRNA failed to abolish the Xi-enriched signal, despite effective knockdown at both the protein and mRNA levels (Figure 3.1C, D). In addition, IF using two other commercially available antibodies or an EGFP fusion protein failed to show any sign of ASH2L enrichment on the Xi (Figure 3.1E, F).


Figure 3.1 ASH2L antibody exhibits cross-reactivity to an unknown Xi-localizing protein

(A, B) STORM imaging of Xist RNA FISH and ASH2L IF using a specific antibody (Bethyl

Laboratories, A300-107A) in female mouse ES cell (A) or MEF cell (B). Distribution of nearest neighbor’s distance between Xist and ASH2L shows a high degree of proximity, significantly more than randomized control. Boxed area is shown at higher magnification. p values

(Kolmogorov-Smirnov test) were calculated by comparing distributions of observed distances to those of control distances based on randomized localizations.

(C) IF using ASH2L antibody exhibits no visible difference between control siRNA and ASH2L knock-down cells.

(D) Quantitative RT-PCR and Western blot show effective knock-down of ASH2L RNA and protein, respectively.

(E) IF using two different ASH2L antibodies in female MEFs fails to show any enrichment of

ASH2L on Xi.

(F) Transiently transfected EGFP-ASH2L fails to colocalize with XIST RNA in HEK293FT cells.


To identify the epitope associated with the Xi, we performed proteomic analysis of the material immunoprecipitated by the presumptive ASH2L antibody (Table S2). To screen candidates, we constructed and transiently transfected several EGFP-fusion proteins into

HEK293FT cells. Among them, CIZ1 arose as a highly enriched factor on the Xi (Figure 3.2A).

CIZ1 was originally identified as a CDKN1A-interacting protein (Mitsui et al., 1999) and has been reported to play a role in cell cycle progression (Copeland et al., 2015; Coverley et al.,

2005). It can be found in tight association with the nuclear matrix and is resistant to high salt extraction (Ainscough et al., 2007). CIZ1 is also linked to several human diseases, including cervical dystonia (Xiao et al., 2012) and lung cancer (Higgins et al., 2012). Furthermore, although not previously implicated in XCI, CIZ1 did emerge as a potential Xist RNA interactor in one of the recent proteomic studies (Chu et al., 2015).

We confirmed CIZ1’s localization to the Xi in several ways. First, CIZ1 colocalization with Xist RNA in female mouse cells was examined by IF using an in-house CIZ1 antibody (Figure 3.2B, C), as well as by C-terminal knock-in of EGFP at the endogenous Ciz1 locus (Figure 3.2D, E, Table S1). Importantly, Xi localization of CIZ1 was not observed in an Xist-deleted female MEF (Zhang et al., 2007), indicating Xist is necessary for CIZ1 recruitment to the Xi (Figure 3.2F). To investigate the molecular localization of CIZ1, we performed super-resolution imaging analysis using STORM to resolve single Xist particles that were previously deduced to contain one to two molecules of Xist RNA (Sunwoo et al., 2015). Intriguingly, CIZ1 showed greater proximity to puncta of Xist transcripts than any other previously examined Xi-associated factor (Figure 3.2G), including EZH2, H3K27me3, SMCHD1, H4K20me1, and HBiX1 (Sunwoo et al., 2015). A large fraction of CIZ1-Xist pairs were within 25 nm of each other, a distance within the empirical resolution (20–30 nm) of STORM microscopy. Approximately 85% of all pairs showed a separation of <50 nm. This distribution of distances differs significantly from that expected under a random model (P << 0.001).


Figure 3.2 CIZ1 is a novel Xi-localizing protein

(A) Transiently transfected EGFP-CIZ1 colocalizes with XIST RNA in HEK293FT cells.

(B) CIZ1 colocalizes with Xist RNA in female MEF (tetraploid) using in-house CIZ1 antibody.

(C) Validation of in-house CIZ1 antibody specificity by IF (left) and Western blot (right): IF signal can be blocked by addition of 2.5x excess antigen. Western blot using WT, CIZ1 knock-out (KO), and CIZ1-EGFP knock-in (KI) whole-cell lysate. Arrows indicate CIZ1 (WT) and CIZ1-EGFP (KI). Asterisk denotes non-specific protein band. βTUB serves as loading control.

(D) CIZ1-EGFP colocalizes with Xist RNA in MEFs carrying an EGFP knock-in at the C-terminus of the endogenous Ciz1 locus.

(E) Western blot using GFP antibody confirms the presence of CIZ1-EGFP fusion protein. Asterisk denotes non-specific protein band.

(F) CIZ1 shows no Xi localization pattern in female MEFs with Xist deleted.

(G) Two-color STORM image showing CIZ1 colocalization with Xist RNA in female MEF. The two Xist clouds are boxed with one shown at higher magnification. Nearest-neighbor distance measurement shows that most CIZ1 is proximal to Xist RNA, significantly more than in the randomized control (P values from Kolmogorov-Smirnov test).

(H) Stoichiometry of the number of Xist puncta relative to CIZ1 puncta per Xi. Data for Xist versus EZH2 (denoted by asterisk) is taken from (Sunwoo et al., 2015) for comparison purposes. Note that each punctum could have multiple molecules of protein or RNA. This analysis is strictly focused on the number of clusters (puncta) of protein or RNA.

(I) UV-RIP using CIZ1-EGFP KI cell line shows CIZ1-EGFP is in close interaction with Xist RNA. Western blot verifies CIZ1-EGFP was pulled down efficiently, compared to 10% input. Asterisk denotes a statistically significant difference in a paired Student’s t-test.


CIZ1 is recruited to the Xi by Xist RNA

These data suggested a very close relationship between CIZ1 and Xist RNA. We counted the number of Xist puncta relative to CIZ1 puncta and observed a similar stoichiometry, with 52.2 ± 11.2 for Xist and 61.9 ± 10.8 for CIZ1 (Figure 3.2H). This similarity is consistent with a tight co-clustering of Xist and CIZ1, and is also consistent with the nearly equal stoichiometry of Xist to Polycomb repressive complex 2 puncta, as previously reported (Sunwoo et al., 2015).

To probe further whether Xist and CIZ1 might associate with each other at a molecular level, we performed UV-crosslinked RNA immunoprecipitation (UV-RIP), using our CIZ1-EGFP knock-in cell line (Figure 3.2I). A clear enrichment over non-UV-crosslinked control or the parental cell line lacking EGFP supported a potential direct interaction between CIZ1 and Xist RNA in vivo.

To test this relationship further, we used locked nucleic acid (LNA) antisense oligonucleotides to “knock off” Xist RNA from the Xi (Sarma et al., 2010) and asked whether there were consequences on CIZ1 localization. LNA knock-off caused an immediate delocalization of CIZ1 that was detectable as early as 20 min after transfection, in a manner that was concurrent with loss of Xist RNA (Figure 3.3A). Recovery of both Xist and CIZ1 occurred very slowly after 4-6 h in a concordant fashion (Figure 3.3B). Taken together, these data support a very close association between Xist RNA and CIZ1 in Xi localization.

We then examined the time course of CIZ1 recruitment in differentiating female ES cells, an ex vivo model for XCI. CIZ1 foci were observed as soon as Xist RNA was detected by RNA FISH on day 2 of differentiation, although CIZ1 foci were less intense and appeared somewhat punctate at this early point (Figure 3.3C). Between days 4 and 14, CIZ1 signal continued to accumulate coincidentally with Xist RNA in differentiating female ES cells (Figure 3.3C). Nearly all Xist foci showed a confidently detectable level of overlapping CIZ1 localization by day 4 (98%, n = 117). STORM imaging of day 4 ES cells further confirmed proximal localization of CIZ1 to Xist RNA (Figure 3.3D). Quantitative RT-PCR demonstrated that Ciz1 levels were up-regulated during female cell differentiation with a time course that paralleled Xist’s up-regulation

Figure 3.3 Xist RNA is required for CIZ1 recruitment to Xi

(A) LNA knock-off of Xist RNA from the Xi also displaces CIZ1 within 1 hour post-transfection, while scrambled control LNA (Scr) has no effect.

(B) Slow recovery over 4-6 hours was observed, in line with re-establishment of the Xist cloud.

(C) Xist RNA FISH and CIZ1 IF on days 2-14 of female mouse ES cell differentiation. The boxed area in day 2 was enlarged and contrast further adjusted to show weak level of CIZ1 still colocalizing with Xist at this early time point.

(D) STORM imaging of Xist RNA FISH and CIZ1 IF on the Xi in day 4 differentiating ES cell.

(E) Quantitative RT-PCR shows Ciz1 RNA is up-regulated along with Xist RNA during female mouse ES cell differentiation. Ciz1 RNA also shows similar up-regulation in male ES cells.

(F) CIZ1 does not exhibit any distinctive Xi localization pattern in male MEF.

(G) A full-length Xist transgene is able to recruit CIZ1 even when expressed in male MEF. Arrowhead indicates Xist cloud from stably integrated loci.


(Figure 3.3E). In differentiating male ES cells, CIZ1 was also transcriptionally up-regulated (Figure 3.3E), but failed to accumulate on the single active X chromosome, consistent with the absence of XCI in male fibroblasts (Figure 3.3F). In contrast, CIZ1 could be recruited ectopically to an induced Xist transgene in male fibroblasts (Figure 3.3G), indicating Xist expression is sufficient for CIZ1 recruitment. We conclude that CIZ1 is rapidly recruited to the Xi during XCI, and that Xist RNA is both necessary and sufficient to recruit CIZ1.

CIZ1 is required for proper Xist localization to Xi

To understand CIZ1’s role during XCI, we generated female MEF cell lines harboring small deletions in CIZ1’s exon 5 (present in all splicing isoforms), using the CRISPR/Cas9 system. Two knockout (KO) clonal lines were established: KO1 has a frameshift deletion, whereas KO5 has a short (≤16 aa) in-frame deletion (Table S1). Both KO cell lines showed loss of CIZ1 protein in Western blot analysis (Figure 3.4A), suggesting the frameshift and in-frame mutations both produced unstable protein. Intriguingly, loss of CIZ1 protein in both cell lines led to an aberrant pattern of Xist accumulation on the Xi (Figure 3.4B). Analysis by 3D STORM super-resolution imaging showed poorly localized Xist particles and a gradient of Xist concentration, indicative of diffusion away from the site of synthesis (Xi) (Figure 3.4C). Xist RNA FISH and X chromosome painting confirmed that Xist RNA localized beyond the Xi chromosome territory (Figure 3.4D). This aberrant localization pattern was not caused by any evident effect on Xist expression (Figure 3.4E). Quantitative RT-PCR confirmed that CIZ1 loss did not significantly affect expression of HNRNPU, another factor critical for proper Xist localization (Hasegawa et al., 2010) (Figure 3.4E). We also generated CIZ1 KO female ES cells and observed a similar Xist localization defect (Figure 3.4F, Table S1). Significantly, the role of CIZ1 in Xist localization is conserved in human, as KO of CIZ1 in HEK293FT cells likewise resulted in dispersal of XIST RNA (Figure 3.4G, Table S1). Xist delocalization led to a consequent decrease or loss of H3K27me3 on the Xi in KO MEFs (Figure 3.4H), consistent with


Figure 3.4 CIZ1 is critical for maintenance of Xist cloud and Xi heterochromatin marks

(A) Western blot confirms depletion of CIZ1 protein in KO1 and KO5 cell lines. YY1 was used as a loading control. Asterisk denotes non-specific protein band.

(B, F, G) Xist/XIST localization is dispersed in CIZ1 KO MEF cells (B), mouse ES cells (F), and HEK293FT cells (G). Yellow circle indicates nuclear boundary.

(C) STORM imaging of boxed areas shows Xist particles diffusing away in KO compared to tight cloud in WT cells. Depth in the z-plane is color-coded from red (+400 nm) to green (-400 nm).

(D) Xist RNA in KO cells is detected outside of the X-chromosome territory. Arrows and arrowheads denote the two active and inactive X chromosomes, respectively.

(E) CIZ1 depletion has minimal effect on Xist or Hnrnpu RNA levels. Two primer sets were used for each. Mean ± s.d. for 3 replicates is shown.

(H) H3K27me3 on the Xi is lost or reduced in a significant fraction of CIZ1 KO cells. Arrowheads indicate H3K27me3 enrichment on Xi.


a requirement for Xist in recruiting Polycomb repressive complex 2. Taken together, these data demonstrate that CIZ1 is required for Xist localization.

CIZ1 interacts with Xist RNA via Repeat E

We then investigated whether specific motifs in Xist RNA are responsible for CIZ1 recruitment. We first tested female MEFs carrying Xist transgenes with various subdeletions (Jeon and Lee, 2011). Cell lines with a wild-type Xist transgene or a transgene with a Repeat A deletion were both capable of recruiting CIZ1 (Figure 3.5A). In contrast, a transgene containing only exon 1 of Xist failed to recruit CIZ1 (Figure 3.5B), arguing that critical CIZ1-interacting domains lie outside of exon 1. To pinpoint required domains, we began by deleting the entire exon 7 (the largest exon after the first), using the CRISPR/Cas9 system and a pair of guide RNAs flanking the endogenous locus in MEFs (Tables S1, S3). Because the MEFs are tetraploid with two Xi’s, we could isolate clones with one Xi containing the desired deletion and the other serving as an internal control within the same nucleus, as well as clones with deletions on both Xi’s. Deletion of the entire exon 7 not only abolished CIZ1 recruitment but also phenocopied the Xist delocalization and loss of H3K27me3 seen in CIZ1 null MEFs (Figure 3.5C-E, “∆Ex7 10”, compare to Figure 3.4). This phenotype contrasted sharply with that of wild-type Xist RNA in the same nucleus of each KO cell. UV-RIP-qPCR showed that Xist ∆Ex7 22 (containing the same deletion as ∆Ex7 10, but on both Xi’s) could no longer be pulled down by CIZ1 protein (Figure 3.5F). Likewise, enrichment of H3K27me3 was lost on both Xi’s (Figure 3.5D, E). Thus, exon 7 of Xist RNA is critical for CIZ1 recruitment to the Xi.

Within exon 7 lies one of Xist’s many repetitive motifs, Repeat E. Deleting this entire ~1.2-kb Repeat E motif resulted in complete loss of CIZ1, significant loss of H3K27me3 IF signal from the Xi, and strong disruption of the Xist cloud (Figure 3.5C-E, “∆RepE 4 & 16”). Finer mapping revealed that deleting the first 800 nucleotides (containing the highly repetitive sequences upstream of a PstI restriction site) was enough to ablate CIZ1 and H3K27me3 IF


Figure 3.5 CIZ1 interacts with Xist RNA through the Repeat E region

(A) Transgenes containing either full-length Xist or Xist lacking Repeat A are sufficient to recruit CIZ1. Arrows indicate Xist cloud from endogenous loci while arrowhead indicates over-expressed Xist from stably integrated transgene.

(B) A transgene containing only Xist exon 1 is insufficient to recruit CIZ1.

(C) Schematic diagram showing several cell lines with endogenous subdeletions within Xist exon 7. Red dotted lines denote deleted segments.

(D) Quantification of CIZ1/H3K27me3 IF and Xist RNA FISH (E) showing loss of CIZ1 recruitment and dispersal of Xist RNA away from the Xi in indicated deletion cell lines. Arrow indicates WT Xist while arrowhead indicates Xist with deletion. Contrast was enhanced for Xist RNA FISH to show single particles outside the main cloud.

(F) UV-RIP shows CIZ1-Xist RNA interaction requires Repeat E within exon 7. Clones ∆Ex7 22 and ∆RepE 16 have the entire exon 7 or Repeat E regions deleted on both Xi’s, respectively. Mean ± s.d. for 3 replicates is shown.


signal in 50% of the population, along with some disruption of the Xist cloud (Figure 3.5C-E, “∆RepE 1-2”). A larger deletion encompassing Repeat E and some downstream sequence (“Ex7a” region, see Chapter 2) abolished CIZ1 detection in all cells, and was accompanied by even stronger disruption of the Xist cloud and loss of H3K27me3 (Figure 3.5C-E, “∆RepE 3-9”, “∆Ex7a 11”). In general, the larger the deletion of Repeat E (and perhaps some sequence immediately downstream), the more pronounced the effect on CIZ1/Xist localization and H3K27me3 deposition (Figure 3.5C-E). In addition, UV-RIP-qPCR showed that Xist ∆RepE 16 (containing the same deletion as ∆RepE 4, but on both Xi’s) could no longer be pulled down by CIZ1 protein (Figure 3.5F). These data identify the Repeat E in exon 7 of Xist RNA as essential for the recruitment of CIZ1 to the Xi.

CIZ1 interacts with Xist RNA independently of HNRNPU

HNRNPU has previously been shown to be important as a nuclear matrix factor for the localization of Xist RNA to the Xi chromosomal territory (Wang et al., 2016; Hasegawa et al., 2010). Unlike CIZ1, however, HNRNPU is not seen enriched on the Xi, and may therefore function indirectly as a nuclear matrix factor for the anchorage of heterochromatic factors of various chromosomes. We asked whether CIZ1 and HNRNPU may function together in the same pathway, albeit with CIZ1 being Xi-specific and HNRNPU being more general. To test this, we generated HNRNPU KO cells using CRISPR/Cas9 (Table S1). Surprisingly, HNRNPU KO cells were viable and exhibited two noticeable defects: slow growth and dispersed Xist clouds, consistent with previous experiments using HNRNPU siRNA knockdown (Hasegawa et al., 2010). Interestingly, CIZ1 IF of HNRNPU KO cells revealed that CIZ1 remained colocalized with Xist RNA, despite Xist particles being dispersed throughout the nucleoplasm (Figure 3.6A), with a Pearson’s coefficient of >0.8. Fluorescence intensity profiles showed that CIZ1 and Xist signals peaked together nearly perfectly along a linear 8-μm distance, and this tight association was evident by STORM imaging of the same nuclei (Figure 3.6A). Thus, CIZ1 interacts with Xist


Figure 3.6 CIZ1 interacts with Xist RNA independently of HNRNPU

(A) CIZ1 remains colocalized with Xist RNA in HNRNPU KO cells. Boxed area is enlarged. Pearson coefficient was calculated (O) along with randomized control (R) from the conventional image. Line chart of fluorescence intensity along the yellow line shows CIZ1 signal peaks together with Xist RNA signal (arrows). Two-color STORM image of the same cell shows CIZ1 colocalizes with Xist at the molecular level. 5/6 colocalizations (black arrows in intensity chart, white arrows in STORM image) were confirmed while 1/6 was not seen (gray arrow).

(B) HNRNPU UV-RIP using CIZ1 KO and Xist ∆Ex7 22 cell lines suggests HNRNPU can interact with Xist RNA independently of CIZ1. Mean ± s.d. for 3 replicates is shown.

(C) Transient over-expression of EGFP-CIZ1 phenocopies CIZ1 KO’s dispersal of Xist particles away from the Xi, while EGFP alone (EGFP EV) has no effect. Western blot confirms EGFP-CIZ1 over-expression compared to endogenous levels.

(D) Transient over-expression of EGFP-CIZ1 does not affect Xist or Hnrnpu RNA levels. Relative RNA level is normalized to untransfected cells. Mean ± s.d. for 3 replicates is shown.

(E) Proper Xist localization to Xi simultaneously requires at least two independent protein factors, CIZ1 and HNRNPU.


RNA independently of HNRNPU. HNRNPU UV-RIP in CIZ1 KO1 cells demonstrated that, reciprocally, HNRNPU interacts with Xist independently of CIZ1 or Xist exon 7 (Figure 3.6B).

Taken together, these data argue that both CIZ1 and HNRNPU are necessary for Xist localization to the Xi. However, their interactions with Xist RNA occur independently of each other.
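
The Pearson coefficient reported for Figure 3.6A was obtained with ImageJ and the JACoP plug-in, together with a randomized control. As a rough illustration of the idea only, the Python sketch below computes a Pearson coefficient between two intensity profiles and compares it with the average coefficient obtained after shuffling one channel. The function names and the intensity values are hypothetical, and plain pixel shuffling is a simplification chosen for this example, not JACoP's own randomization procedure.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def randomized_pearson(x, y, rng, rounds=200):
    """Mean Pearson coefficient after repeatedly shuffling the second channel.

    Assumption: simple pixel shuffling stands in for the randomized control.
    """
    shuffled = list(y)
    values = []
    for _ in range(rounds):
        rng.shuffle(shuffled)
        values.append(pearson(x, shuffled))
    return statistics.fmean(values)


# Hypothetical intensities sampled along a line across two dispersed puncta.
xist_profile = [2, 3, 40, 90, 35, 4, 3, 60, 95, 50, 5, 2]
ciz1_profile = [5, 6, 30, 80, 42, 8, 5, 55, 85, 40, 7, 6]

rng = random.Random(0)
print("observed  :", round(pearson(xist_profile, ciz1_profile), 3))
print("randomized:", round(randomized_pearson(xist_profile, ciz1_profile, rng), 3))
```

A high observed coefficient combined with a randomized coefficient near zero is the pattern that would support genuine colocalization rather than chance overlap.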

Maintaining physiological levels of CIZ1 seems crucial. Intriguingly, although transient over-expression of EGFP alone did not influence Xist localization, over-expression of EGFP-CIZ1 triggered a phenotype similar to that of CIZ1 depletion (Figure 3.6C) without affecting Hnrnpu or Xist RNA levels (Figure 3.6D). We arrived at this conclusion through assessment of the number of delocalized Xist puncta relative to controls. The assessment was performed by two independent scorers (H.S. and D.C.) and by single-blind scoring (H.S.), each yielding similar trends: ∼90% of CIZ1-over-expressing nuclei showed dispersed Xist puncta, whereas <12% of wild-type nuclei showed this pattern (Figure 3.6C). Thus, anchorage of Xist RNA to the nuclear matrix appears to depend on a fine stoichiometric balance between Repeat E and CIZ1, with too much CIZ1 possibly saturating binding sites in the nuclear matrix and thereby preventing the anchorage of Xist Repeat E. In summary, we suggest that, although both CIZ1 and HNRNPU are required for Xist RNA localization to the Xi territory, they interact with Xist independently of each other (Figure 3.6E).

Discussion

Although much recent attention has been focused on Xist-interacting proteins that are required for transcriptional repression (Wutz, 2011; Calabrese et al., 2012; Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015; Zhao et al., 2008; Plath et al., 2003; Wang et al., 2001), protein factors responsible for Xist localization have been more difficult to identify. Two previous studies, however, did report Xist exon 7 as an important domain for the spreading and localization process (Chow et al., 2007; Yamada et al., 2015). Our present study agrees with the published work on the importance of exon 7, and further identified Repeat E as a critical motif within exon 7. The function of Repeat E has also been analyzed in a recent study of XCI during female ES cell differentiation (Yue et al., 2017). While our manuscript was in preparation, CIZ1 was independently identified by another group as being critical for Xist localization (Ridings-Figueroa et al., 2017). Our study agrees with theirs on the role of CIZ1, and furthermore provides evidence for a potential direct interaction between CIZ1 and Repeat E in vivo, using UV-RIP. This interaction is independent of other known Xist-localizing factors such as HNRNPU, and more importantly is critical for downstream deposition of the H3K27me3 repressive mark onto Xi chromatin through proper localization of Xist.

The intensity of IF signals and EGFP signals in the CIZ1-EGFP knock-in cell line suggests that, although the numbers of Xist and CIZ1 clusters (puncta) are similar on the Xi (Figure 3.2H), the actual molecular stoichiometry of CIZ1 to Xist may exceed one-to-one, with the highly repetitive nature of Repeat E enabling multiple CIZ1 proteins to bind a single Xist transcript. Repeat E is unique to Xist RNA and may provide a high-affinity platform for CIZ1 binding that would be found nowhere else in the transcriptome. A recent study of Xist secondary structure in vivo versus ex vivo showed that the Repeat E region’s accessibility is highly altered in the cellular environment (Smola et al., 2016), supporting our idea of super-stoichiometric binding of CIZ1 protein to this region of Xist RNA in vivo. It may be surprising that CIZ1 mutant mice are viable; however, they have a predisposition toward lymphoproliferative disorders (Ridings-Figueroa et al., 2017; Nishibe et al., 2013), consistent with a loss of Xist function and XCI in blood cells (Wang et al., 2016; Yildirim et al., 2013). Future work will be directed at a molecular understanding of how CIZ1-mediated Xist localization affects the Xi heterochromatin and gene expression state.


Materials and Methods

Cell culture

Transformed MEF cells were grown on glass coverslips. ES cells were grown on γ-irradiated MEF feeders for 3 days before differentiation. For differentiation, ES colonies were trypsinized and feeders removed. ES cells were allowed to differentiate in suspension, forming embryoid bodies (EBs), for 4 days. EBs were settled down on gelatin-coated coverslips at day 4 and allowed to further differentiate until harvesting.

Identification of CIZ1

MEF cells were washed in PBS, pre-extracted in CSK/0.5% Triton X-100 on ice for 5 min and washed again in CSK on ice for an additional 5 min. Cells were crosslinked in 1.5% formaldehyde for 10 min, washed in PBS, resuspended in PBS/0.01% Tween-20, and lysed by sonication using a Covaris S2. ASH2L antibody (A300-107A; Bethyl Laboratories) was conjugated to Protein G Dynabeads (Life Technologies) by 20 mM dimethylpimelimidate in 0.1 M sodium tetraborate and quenched in 0.1 M ethanolamine. Cell lysate was incubated with the antibody-bead conjugate overnight at 4°C. Pellet was washed three times with RIPA buffer (50 mM Tris pH 8.0, 500 mM NaCl, 1% Nonidet P-40, 0.5% sodium deoxycholate, 0.1% SDS), for 10 min at room temperature. To eliminate proteins bound nonspecifically to beads and/or antibody through precipitated DNA, the IP was treated with TURBO DNase (Life Technologies) for 1 h at 37°C and washed again in RIPA buffer followed by PBS. Proteins were eluted from beads in 0.2 M glycine for 5 min at room temperature, and pH was neutralized by addition of Tris pH 8.0. After briefly running the eluate on a PAGE gel, a small section was excised and proteomic identification was performed (Taplin Mass Spectrometry Facility, Harvard Medical School). cDNA of candidate proteins was cloned into a mammalian EGFP expression vector (derived from pSV2-EYFP-C1 (Tsukamoto et al., 2000)). The subcellular localization of EGFP fusion proteins was assessed in HEK293FT cells after calcium phosphate transfection of plasmid.


Generation of CIZ1 antibody

A fragment of murine CIZ1 cDNA (corresponding to amino acids 324–538) was cloned into pET28a vector in frame with a C-terminal 6×His tag. The plasmid was introduced into BL21(DE3) bacteria. 50 μL of overnight cultured bacteria was inoculated into 500 mL LB medium and allowed to grow at 37°C for 2 h. CIZ1(324-538)-His was induced by adding IPTG to 0.5 mM for 3 h at 37°C. The bacterial pellet was resuspended in denaturing lysis buffer (100 mM NaH2PO4, 10 mM Tris, 6 M Guanidine-HCl). After brief sonication on a Sonifier S-250 (Branson), the lysate was incubated at room temperature for 1 h. The insoluble fraction was removed by centrifugation, and imidazole (final 20 mM) was added to the supernatant to inhibit nonspecific binding of proteins during capture. Ni-NTA beads (Qiagen) were then added and allowed to bind for 2 h at 4°C. Beads were washed with lysis buffer containing 20 mM imidazole. Bound protein was eluted in lysis buffer containing 250 mM imidazole and dialyzed against PBS overnight, after which an aliquot was sent to Cocalico Biologicals for immunization of rabbit host. For affinity purification of antibody, an overlapping fragment of murine CIZ1 cDNA (corresponding to amino acids 302–538) was cloned into pET28a vector in frame with an N-terminal MBP tag and purified using amylose resin (New England Biolabs), as per manufacturer’s instructions. MBP-CIZ1(302-538) was later conjugated to Affi-Gel 10 (Bio-Rad) as per manufacturer’s instructions. Rabbit serum was incubated with MBP-CIZ1(302-538)/bead conjugate in PBS at 4°C overnight. After extensive washes with PBS, antibody was eluted in 0.2 M glycine (pH 2.0)/500 mM NaCl and immediately neutralized by addition of 1.5 M Tris (pH 8.0). Antibody was dialyzed against PBS/50% glycerol at 4°C overnight and stored at −20°C.

Labeling of Xist oligo FISH probes

Xist oligo FISH probe sequences (Table S4) were designed using tools available online (https://www.idtdna.com/calc/analyzer). Amino-ddUTP (Kerafast) was added to the 3’-ends of pooled oligos by Terminal Transferase (New England BioLabs) as per manufacturer’s instructions. Oligos were purified by phenol/chloroform extraction, concentrated by ethanol precipitation, resuspended in 0.1 M sodium borate, and labeled with Atto488 (Sigma), Cy3B (GE Healthcare), or Alexa647 NHS-ester (Life Technologies). After another ethanol precipitation, labeled oligos were resuspended in water and labeling efficiency was evaluated by absorbance using NanoDrop (Thermo Fisher Scientific).

RNA FISH

RNA FISH was performed as previously described (Sunwoo et al., 2015) with minor modifications. Briefly, cells grown on glass coverslips were rinsed in PBS and fixed in 4% paraformaldehyde. After permeabilization in 0.5% Triton X-100 at room temperature, cells were washed in PBS and dehydrated in a series of increasing ethanol concentrations. Depending on experiment, 3–200 labeled oligo probes (0.1-5 nM each) were added to hybridization buffer containing 25% formamide, 2× SSC, 10% dextran sulfate, and 1 mg/mL yeast tRNA. RNA FISH was performed in a humidified chamber at 42°C for 3–4 h. After being washed three times in 2× SSC, cells were mounted for wide-field fluorescent imaging or dehydrated for STORM imaging. Nuclei were counterstained with Hoechst 33342 (Life Technologies).

RNA FISH/X-chromosome painting

1 × 10^5 cells were cytospun onto glass slides and RNA FISH performed as described above. Slides were mounted and images captured with positions recorded. After imaging RNA signal, coverslips were carefully removed and slides rinsed first in PBS/0.05% Tween-20 and then in PBS to remove mounting medium, treated with RNase A (0.5 mg/mL in PBS) at 37°C for 40 min to remove RNA signal, and denatured for DNA FISH in 70% formamide/2x SSC at 80°C for 15 min. Slides were quenched in ice cold 70% ethanol and dehydrated in a series of increasing ethanol concentrations. 1:10 (v/v) XMP X Green mouse chromosome paint (MetaSystems, D-1420-050-FI) was added to hybridization buffer containing 50% formamide, 2x SSC, 10% dextran sulfate, and 0.2 mg/mL mouse Cot-1 DNA (Thermo Fisher Scientific) and denatured at 95°C for 10 min. Hybridization was performed in a humidified chamber at 37°C overnight. After being washed once in 0.2x SSC at 65°C for 10 min and three times in 2x SSC at room temp for 5 min each, slides were remounted and reimaged at recorded positions.

Immunofluorescence/RNA FISH

Immunofluorescence/RNA FISH was performed as previously described (Sunwoo et al., 2015), with minor modifications. Briefly, cells grown on glass coverslips were rinsed in PBS, fixed in 4% paraformaldehyde, and permeabilized in PBS/0.5% Triton X-100 for 10 min at room temperature. After being blocked for 20 min in 1% BSA supplemented with 10 mM VRC (New England Biolabs), primary antibodies were added and allowed to incubate at room temperature for 1 h. Cells were washed three times in PBS/0.02% Tween-20. After incubating with secondary antibody for 30 min at room temperature, cells were washed again in PBS/0.02% Tween-20. Cells were post-fixed in 4% paraformaldehyde and dehydrated in a series of increasing ethanol concentrations. RNA FISH was then performed as described above.

Microscopy

For wide-field fluorescent imaging, cells were observed on a Nikon 90i microscope equipped with 60×/1.4 N.A. VC objective lens, Orca ER CCD camera (Hamamatsu), and Volocity software (PerkinElmer). As previously described (Sunwoo et al., 2015), STORM imaging was performed on an N-STORM (Nikon) equipped with 100×/1.4 N.A. λ objective lens, iXon X3 EMCCD camera (Andor), and three laser lines (647, 561, and 405 nm). Imaging buffer containing 147 mM βME and GluOX (Sigma) was used to promote blinking and reduce photo-bleaching. For 3D STORM imaging, a cylindrical lens was inserted into the optical path to introduce astigmatism. Z calibration was performed using 100 nm TetraSpeck beads (Life Technologies). For two-color STORM imaging, sequential imaging with appropriate emission filters (Cy5 em filter for Alexa 647 and Cy3 em filter for Cy3B) was used to suppress any crosstalk. The N-STORM module in NIS-Elements software (Nikon) was used to control the microscope, acquire images, and perform 2D and 3D STORM localizations. Analysis of STORM images was performed as previously described, using in-house Matlab (Mathworks) scripts (Sunwoo et al., 2015). Briefly, after STORM localizations were binned into each pixel position, the Xist cloud was automatically identified using the kmeans function. The nearest-neighbor distance was calculated between Xist localizations within the cloud and the nearest CIZ1/ASH2L localizations. For the randomized control calculation of nearest neighbors, CIZ1/ASH2L localizations were randomized within the Xist cloud, and the nearest-neighbor distance was again calculated. For randomization of colocalization analysis between Xist and CIZ1 in HNRNPU KO cells, ImageJ with the JACoP plug-in was used.
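
The nearest-neighbor analysis described above was run with in-house Matlab scripts that are not reproduced here. As a rough illustration only, the following Python sketch computes nearest-neighbor distances from one set of localizations to another, repeats the calculation after randomizing the second set, and compares the two distance distributions with a two-sample Kolmogorov-Smirnov statistic. The function names, the rectangular stand-in for the k-means-defined Xist cloud, and the toy coordinates are all assumptions made for this example, not the authors' code or data.

```python
import bisect
import math
import random


def nearest_neighbor_distances(sources, targets):
    """Distance from each source localization to its closest target localization."""
    return [min(math.dist(s, t) for t in targets) for s in sources]


def randomize_in_box(n, box, rng):
    """Place n points uniformly in a rectangle (xmin, xmax, ymin, ymax).

    Assumption: the rectangle is a simple stand-in for the Xist cloud region.
    """
    xmin, xmax, ymin, ymax = box
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (largest gap between empirical CDFs)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def fraction_within(distances, cutoff_nm):
    """Fraction of distances below a cutoff, e.g. 50 nm."""
    return sum(d < cutoff_nm for d in distances) / len(distances)


# Toy coordinates in nm (hypothetical, for illustration only).
rng = random.Random(0)
box = (0.0, 2000.0, 0.0, 2000.0)
xist = randomize_in_box(50, box, rng)
# Simulated protein signal placed within 20 nm (per axis) of each Xist point.
protein = [(x + rng.uniform(-20, 20), y + rng.uniform(-20, 20)) for x, y in xist]

observed = nearest_neighbor_distances(xist, protein)
randomized = nearest_neighbor_distances(xist, randomize_in_box(len(protein), box, rng))

print("fraction < 50 nm (observed):  ", fraction_within(observed, 50.0))
print("fraction < 50 nm (randomized):", fraction_within(randomized, 50.0))
print("KS statistic:", ks_statistic(observed, randomized))
```

A brute-force search is used here for clarity; for the large localization counts typical of STORM data, a spatial index such as a k-d tree would be the more practical choice.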

Antibodies

The following primary antibodies were used: rabbit H3K27me3 (AM39155; Active Motif and GTX60892; GeneTex), mouse HNRNPU (sc-32315; Santa Cruz), rabbit GFP (ab290; Abcam), mouse YY1 (sc-7341, Santa Cruz), mouse TUB (T5201, Sigma), rabbit GAPDH (CST14C10, Cell Signaling), rabbit ASH2L (A300-107A; Bethyl Laboratories), rabbit ASH2L (AM39100; Active Motif), and mouse ASH2L (ab50699; Abcam). Dye-conjugated secondary antibodies were purchased from Life Technologies. For two-color STORM, Cy3B was conjugated to anti-rabbit secondary antibody (Jackson ImmunoResearch), using NHS-Cy3B (GE Healthcare) as previously described (Sunwoo et al., 2015).

Delivery of LNA, siRNA, and EGFP fusion plasmids

ASH2L or CIZ1 cDNA was cloned into mammalian EGFP expression vector (derived from pSV2-EYFP-C1 (Tsukamoto et al., 2000)) and transfected into MEFs using Lipofectamine LTX (Thermo Fisher Scientific) as per manufacturer’s instructions. After 24 h, cells were fixed and imaged by wide-field fluorescence microscopy. For siRNA knock-down, 10 nM ON-TARGETplus SMARTpool targeting mouse ASH2L (Dharmacon, L-048754-01) or non-targeting control (Dharmacon, D-001810-10) was introduced into MEFs using Lipofectamine RNAiMAX (Thermo Fisher Scientific) as per manufacturer’s instructions. Xist LNA experiments were performed as previously described (Sarma et al., 2010). Briefly, 2 × 10^6 MEF cells were resuspended in 100 μL Nucleofector MEF II solution (Lonza) containing 2 μM Xist anti-Repeat C LNA. After electroporation was performed as per manufacturer’s instructions, cells were allowed to settle on 0.2% gelatin-coated glass coverslips. Complete medium was added and cells cultured normally until harvested at various time points.

RT-PCR

RNA was isolated from cells using TRIzol Reagent (Thermo Fisher Scientific) as per manufacturer’s instructions. Genomic DNA was removed using TURBO DNase from the TURBO DNA-free Kit (Thermo Fisher Scientific). After inactivating TURBO DNase with DNase Inactivation Reagent (also enclosed in TURBO DNA-free Kit), DNase-free RNA was reverse transcribed using SuperScript III Reverse Transcriptase (Thermo Fisher Scientific) with random primers (Promega, C118A) at 25°C for 5 min, 50°C for 1 h, and 85°C for 15 min. Conventional PCR was performed on cDNA using 2x PCR MasterMix (Promega) or 2x Phusion High-Fidelity PCR Master Mix (New England BioLabs). Quantitative RT-PCR was performed using iTaq Universal SYBR Green Supermix (Bio-Rad) in a CFX96 Real-Time PCR Detection System (Bio-Rad). Gene-specific primer pairs are listed in Table S5.
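
The text does not state how relative RNA levels were computed from the quantitative RT-PCR data. One common approach, shown here purely as a hedged sketch, is the 2^-ΔΔCt method, in which the target gene is normalized first to a reference gene and then to a control sample. The use of this method, the choice of reference gene, the function name, and every Ct value below are assumptions introduced for illustration; none of them come from this study.

```python
import statistics


def relative_level(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Relative expression by the 2^-ddCt method (assumed, not stated in the text).

    ct_target / ct_reference: Ct values in the sample of interest.
    ct_target_ctrl / ct_reference_ctrl: Ct values in the control sample.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_ctrl - ct_reference_ctrl
    return 2.0 ** -(delta_sample - delta_control)


# Hypothetical Ct values for three replicates: (target, reference gene).
knockout_replicates = [(24.1, 18.0), (23.9, 18.1), (24.2, 17.9)]
# Hypothetical control sample Ct values (target, reference gene).
control_target_ct, control_reference_ct = 24.0, 18.0

levels = [
    relative_level(target, reference, control_target_ct, control_reference_ct)
    for target, reference in knockout_replicates
]

# Summarize as mean and sample standard deviation across replicates.
print("relative level: %.2f +/- %.2f" % (statistics.mean(levels), statistics.stdev(levels)))
```

Under this scheme a value near 1 indicates no change relative to the control sample, which is the kind of readout summarized as mean ± s.d. for 3 replicates in the figure legends.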

Western blot

Cells were washed once with PBS and lysed in cold lysis buffer (50 mM Tris pH 7.5, 150 mM NaCl, 1% NP-40, 1% sodium deoxycholate, 0.1% SDS, 1x protease inhibitor cocktail [Sigma, P8340]). Lysate was sonicated (Qsonica Q800 Sonicator) in polystyrene tubes at 45% power setting, 30 sec on/30 sec off for a total sonication time of 5 min at 4°C. After removing debris by centrifugation at 16,000 × g for 10 min, protein concentration in the supernatant was measured (Pierce BCA Assay Kit). 20-50 μg protein lysate was denatured in 1x Laemmli buffer at 95°C for 10 min and resolved by SDS-PAGE. Protein was electrotransferred to Immobilon-P PVDF membrane (EMD Millipore). At room temperature, the membrane was blocked with blocking buffer (PBS/0.05% Tween-20 containing 5% milk) for 1 hr, incubated with primary antibody in blocking buffer for 1 hr, washed three times with PBS/0.05% Tween-20 for 5 min each, incubated with secondary antibody-HRP conjugate (Promega, W4011 or W4021) in blocking buffer for 30 min, and washed three times again with PBS/0.05% Tween-20 for 5 min each. Protein bands were visualized using Western Lightning Plus-ECL (PerkinElmer) and exposure to BioMax MR film (Carestream Health).
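As a worked example of the loading step, the sketch below converts a BCA-measured concentration into the lysate volume needed for a target protein mass, plus the volume of concentrated Laemmli stock required to reach 1x. The 4x stock and the example concentration are assumptions for illustration; the text specifies only the 20-50 μg range and the 1x final buffer.

```python
# Illustrative gel-loading calculation. The 4x Laemmli stock and the example
# lysate concentration are assumptions, not values taken from the protocol.

def loading_volumes_ul(conc_ug_per_ul: float, target_ug: float,
                       laemmli_stock_x: float = 4.0) -> tuple[float, float]:
    """Return (lysate_ul, laemmli_stock_ul) giving target_ug protein in 1x buffer."""
    if conc_ug_per_ul <= 0 or laemmli_stock_x <= 1:
        raise ValueError("concentration must be > 0 and stock must exceed 1x")
    lysate_ul = target_ug / conc_ug_per_ul
    # Stock is 1/laemmli_stock_x of the final volume; lysate makes up the rest.
    total_ul = lysate_ul * laemmli_stock_x / (laemmli_stock_x - 1.0)
    return lysate_ul, total_ul - lysate_ul


# Hypothetical lysate at 2 µg/µL, loading 30 µg (within the stated 20-50 µg range).
lysate_ul, buffer_ul = loading_volumes_ul(conc_ug_per_ul=2.0, target_ug=30.0)
print(f"Mix {lysate_ul:.1f} µL lysate with {buffer_ul:.1f} µL 4x Laemmli stock")
```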

UV-RIP

UV-RIP was performed as previously described, with minor modifications (Jeon and Lee, 2011). One 15-cm plate of female MEF cells per IP was UV-crosslinked at 254 nm (200 mJ/cm²) in 5 mL cold PBS and collected by scraping. Cells were incubated in nuclear isolation buffer (10 mM HEPES pH 7.5, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT) for 30 min on ice. Nonidet P-40 was added to 0.1% for 10 min on ice. Cells were incubated in lysis buffer (0.5% Nonidet P-40, 0.5% sodium deoxycholate, 200 U/mL RNase Inhibitor [Roche], and protease inhibitor mixture [Roche] in PBS pH 7.4) at 4°C for 25 min with rotation, followed by DNase treatment (30 U of TURBO DNase, 15 min at 37°C). SDS was added to 0.1% and lysate was further incubated at 4°C for 15 min. After centrifugation, the supernatant was incubated with 2 mg GFP or 1 mg CIZ1 antibodies for 2 h at 4°C. 10-20 μL Dynabeads Protein G was added for 1 h at 4°C. Beads were washed three times with 1× PBS supplemented with 1% Nonidet P-40, 0.5% sodium deoxycholate, and additional 350 mM NaCl (500 mM final salt concentration), and DNase-treated (10 U) for 30 min at 37°C. Beads were washed three times with the same wash buffer supplemented with 10 mM EDTA, followed by three washes with low-salt buffer (50 mM Tris pH 7.5, 50 mM NaCl, 10 mM EDTA, 1% Nonidet P-40, 0.5% sodium deoxycholate, 0.1% SDS). Crosslinked proteins were digested in 100 mM Tris pH 7.4, 50 mM NaCl, 10 mM EDTA, 0.1 mg/mL Proteinase K (Roche), and 0.5% SDS for 30 min at 55°C. RNA was recovered using TRIzol (Life Technologies).
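The stated 500 mM final salt concentration of the high-salt wash follows from adding 350 mM NaCl to 1× PBS. The sketch below reproduces that arithmetic under two labeled assumptions: that 1× PBS contributes roughly 150 mM salt, and that NaCl is added from a 5 M stock (neither is specified in the protocol).

```python
# Illustrative check of the high-salt wash buffer composition.
# Assumptions (not from the protocol): 1x PBS contributes ~150 mM salt,
# and the extra NaCl is added from a 5 M (5000 mM) stock.

PBS_SALT_MM = 150.0
NACL_STOCK_MM = 5000.0


def final_salt_mm(added_nacl_mm: float, base_mm: float = PBS_SALT_MM) -> float:
    """Approximate total salt after supplementing the base buffer with NaCl."""
    return base_mm + added_nacl_mm


def nacl_stock_volume_ml(added_nacl_mm: float, final_volume_ml: float,
                         stock_mm: float = NACL_STOCK_MM) -> float:
    """Volume of NaCl stock needed (C1 * V1 = C2 * V2)."""
    return added_nacl_mm * final_volume_ml / stock_mm


print(final_salt_mm(350.0))               # approximate final salt in mM
print(nacl_stock_volume_ml(350.0, 50.0))  # mL of 5 M NaCl for 50 mL of wash buffer
```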

Generation of Xist deletions, CIZ1 and HNRNPU KOs, and CIZ1-EGFP knock-in cells

All guide RNAs were designed using tools available online (www.rgenome.net/ or http://crispr.mit.edu). MEF CIZ1 KO and CIZ1-EGFP knock-in guide RNA (gRNA) sequences were cloned into gRNA Cloning Vector and introduced into cells alongside vector expressing hCas9_D10A (Mali et al., 2013). All other gRNA sequences were cloned into pSpCas9(BB)-2A-GFP or pSpCas9(BB)-2A-Puro (Ran et al., 2013). Guide RNA/Cas9 plasmid delivery into MEF cells was performed using Nucleofector MEF II solution (Lonza), as per manufacturer’s instructions; delivery into ES cells was by electroporation (Bio-Rad); delivery into HEK293FT cells was by calcium phosphate transfection. Following delivery, cells were cultured for 5-8 days before clonal selection. For CIZ1-EGFP knock-in, an EGFP-3×HA insertion cassette flanked by 1-kb homologous sequences was cloned into pBluescript II vector and included with the guide RNA/Cas9 plasmid delivery. gRNA sequences are listed in Table S3.


Chapter 4

Conclusion

Controversy over Xist functional domains

When I began this project in 2015, a comprehensive map of Xist functional motifs was lacking. Xist’s large size and poor sequence conservation had made it unamenable to genetic manipulation at the time. However, the presence of conspicuous tandem repeats (Repeats A-F) that were somewhat conserved provided initial motivation in the search for functional elements. Indeed, previous work focused on perturbing these repeats, leading to identification of Repeat A as necessary for gene silencing (Wutz et al., 2002). Other functions, such as localization of Xist and Polycomb to the Xi, were less clear. One study argued for Repeat A in recruiting Polycomb (Zhao et al., 2008), whereas another argued for a downstream region containing Repeats F, B, and C (da Rocha et al., 2014). For Xist localization, an early study implicated various, non-overlapping regions across the entire transcript, proposing several redundant regions to be involved (Wutz et al., 2002). Separate studies specifically implicated Repeat C (ironically one of the few regions not implicated by the earlier study) (Jeon and Lee, 2011; Beletskii et al., 2001; Sarma et al., 2010), while another implicated exon 7 (Yamada et al., 2015). Thus, there was disagreement in the field over Xist’s functional elements.

The use of non-physiological systems likely explains some of the discrepancy. With few exceptions (Hoki et al., 2009; Yamada et al., 2015), most studies employed inducible Xist transgenes on the X chromosome or autosomes (often in male cells) (da Rocha et al., 2014; Wutz et al., 2002; Jeon and Lee, 2011; Chow et al., 2007). This presents several caveats. Forced induction can lead to 10-100x over-expression of Xist compared to endogenous levels (Jeon and Lee, 2011). This can cause otherwise wild-type Xist RNA to diffuse beyond the Xi territory, potentially by saturating trans factors. Evidence for this saturation includes: (1) “squelching” of the endogenous Xist cloud when transgenes are introduced into female cells (Jeon and Lee, 2011), and (2) siphoning away of HNRNPK and CIZ1 from the endogenous toward the transgenic Xist cloud (Figures 2.6I, 3.5A). Indeed, maintaining proper Xist levels seems crucial, since even slight fluctuations can affect the extent of H2AK119ub/H3K27me3 Xi chromatin modifications (Figure 2.1D, F), and precise Xist:CIZ1 stoichiometry is necessary for Xist localization (Figure 3.6C). Additionally, male cells may not provide the correct environment for Xist action, as pertinent (X-linked) trans factors may be absent or present at incorrect stoichiometry. Furthermore, there is evidence that the X chromosome itself is more amenable to inactivation by Xist compared to autosomes (Russell, 1963; Rastan, 1983), and so artificial autosome inactivation by Xist may not accurately recapitulate X inactivation.

Advantages of CRISPR/Cas9 technology and cell lines used in this work

The advent of CRISPR/Cas9 technology around this time presented an exciting opportunity to readdress the question of Xist functional domains from a more physiological perspective. I thus created a panel of deletions tiling across the endogenous Xist locus in female MEFs, where XCI has already been established. Upon identifying certain domains of interest, I generated similar deletions in female ES cells to investigate potential roles in de novo XCI establishment. Both parental cell lines are hybrid, carrying an X chromosome from two different mouse strains with >600,000 SNPs available for inferring allelic origin (Yildirim et al., 2011). Moreover, the ES cell line carries a heterozygous loss-of-function mutation in Tsix, a cis regulator of Xist, which drives (nonrandom) inactivation of the mus X, thereby facilitating downstream allelic analysis (Ogawa et al., 2008). Hence, these two systems allowed me to probe the effects of key Xist functional elements with allelic specificity at different stages of XCI.
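The idea of inferring allelic origin from strain-specific SNPs can be sketched as follows. This is a toy illustration, not the pipeline used in this work: the function name, the SNP table, and the positions are hypothetical, and a read is assigned to an allele only when every informative SNP it covers agrees.

```python
# Toy sketch of SNP-based allele assignment for a hybrid (mus/cas) genome.
# All names, positions, and bases below are hypothetical examples.
from collections import Counter


def assign_allele(read_bases: dict[int, str],
                  snps: dict[int, tuple[str, str]]) -> str:
    """Assign a read to 'mus', 'cas', or 'unassigned'.

    read_bases maps genomic position -> base observed in the read.
    snps maps genomic position -> (mus_base, cas_base).
    """
    votes = Counter()
    for pos, base in read_bases.items():
        if pos not in snps:
            continue  # position is not informative
        mus_base, cas_base = snps[pos]
        if base == mus_base:
            votes["mus"] += 1
        elif base == cas_base:
            votes["cas"] += 1
    if votes["mus"] and not votes["cas"]:
        return "mus"
    if votes["cas"] and not votes["mus"]:
        return "cas"
    return "unassigned"  # no informative SNP, or conflicting evidence


snp_table = {1000: ("A", "G"), 1050: ("C", "T")}
print(assign_allele({1000: "A", 1050: "C", 1070: "G"}, snp_table))
print(assign_allele({1000: "G"}, snp_table))
print(assign_allele({1000: "A", 1050: "T"}, snp_table))
```

Requiring agreement across all covered SNPs trades sensitivity for specificity; a real analysis would also need to handle sequencing errors and base quality.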

Another important advantage of the MEF cell line was that it is tetraploid (post-XCI genome duplication), and thus carries two Xi’s of mus origin and two Xa’s of cas origin. This enabled isolation of Xi+/- clones (deletion on only one Xi) as well as Xi-/- clones (deletion on both Xi’s). Xi+/- cells provided an internal control Xist cloud within the same nucleus for side-by-side comparison by microscopy, allowing detection of even subtle phenotypes. Meanwhile, Xi-/- cells provided a homogeneous system for genomics experiments.


Xist regions affecting RNA abundance and splicing: Repeats A, F, exons 2-6, 7a/d

One of the first observable phenotypes was that several deletions altered Xist RNA abundance. Initial attempts to remove Repeat A resulted in complete loss of Xist detection by RNA FISH and qRT-PCR (data not shown), consistent with prior studies in both mouse and human (Hoki et al., 2009; Chow et al., 2007). This first deletion encompassed ~1 kb near the 5’ end of Xist, which may have disrupted a key promoter-proximal element. A second attempt to remove Repeat A using a minimal ~400-bp deletion of only the motif itself yielded proper overall RNA levels. However, these transcripts exhibited aberrant splicing patterns. Closer inspection of splice isoforms by Sanger sequencing revealed that all had arisen through the use of a cryptic splice donor upstream of the Repeat A deletion. One study previously implicated Repeat A in Xist splicing through interaction with the splicing factor SF2 (Royce-Tolland et al., 2010). Thus, aberrant Xist splicing might be caused by de-suppression of this cryptic splice donor, perhaps due to loss of SF2 recruitment via Repeat A. A modified Repeat A deletion, such as one that also removes the nearby cryptic splice donor, may be able to provide a clean Repeat A loss-of-function while simultaneously suppressing aberrant Xist splicing (D.C., H.S., J.T.L., manuscript in preparation).

An unintentional advantage of the aberrant Xist splicing pattern caused by Repeat A deletion was that it enabled discrimination between changes occurring at the DNA versus the RNA level. To explain, deletion of Repeat A caused exon 1 skipping (and thus loss of Repeats F, B, C, and D at the RNA but not DNA level) in ~50% of Xist transcripts. Because there was no change in overall Xist levels (assayed by exon 7 abundance), I could conclude that any observed effect on Xist levels caused by deleting these repeats individually likely occurs at the DNA level (see Repeat F below).

Deleting Repeat F strongly weakened Xist RNA detection. Repeat F contains several DNA binding sites for the transcription factor YY1, important for maintenance of Xist clouds. One study reported YY1 tethers Xist transcripts to the Xist locus through its ability to bind both Xist RNA (via Repeat C) and DNA (via Repeat F) (Jeon and Lee, 2011). A subsequent study found that ablating YY1 or its DNA binding sites within Repeat F impairs Xist expression, offering a simpler explanation that YY1 may function as a transcriptional activator of Xist, and Repeat F a downstream enhancer (Makhlouf et al., 2014). My results are consistent with a transcriptional activation function for YY1/Repeat F since: (1) deleting Repeat F decreased Xist RNA levels (though this may not have been due to loss of the YY1 sites specifically), and (2) deleting Repeat C had no effect on Xist levels or localization (though YY1 may also bind Xist through redundant regions). It is worth noting that these two models are not mutually exclusive, and YY1 may serve to both activate and tether Xist RNA. Interestingly, two reports using oligonucleotides antisense to Repeat C reported loss of Xist clouds, concluding these oligos were able to displace Xist from the Xi (Beletskii et al., 2001; Sarma et al., 2010). Given that deletion of Repeat C has no effect on Xist clouds, it is possible that the oligos instead cause degradation of Xist RNA or act through off-target effects (such as binding or altering the structure of adjacent Xist regions).

Deleting exons 2-6 also caused a decrease in Xist RNA levels. One previous report noted that exon 4 is highly conserved at the primary sequence level and is predicted to form a stable stem-loop structure (Caparros et al., 2002). Despite having no effect on Xist localization or silencing, its deletion did reduce steady-state levels of Xist without affecting RNA stability, suggesting a defect in either transcription or processing (Caparros et al., 2002).

Finally, deleting exon 7a or 7d regions caused skipping of adjacent regions in most transcripts, exhibiting a pattern similar to Xist’s minor splice isoform. This isoform results from the use of a minor splice donor and acceptor within exon 7a and 7d regions, respectively. One would think that deleting these donor/acceptor regions would abolish the minor splice isoform, but it instead enhanced it. Thus, these regions may also contain splice site suppressors whose deletion causes increased splicing through the use of surrounding cryptic splice sites.


Xist regions affecting RNA localization: Repeat E

Repeat E is a ~1.2-kb CT-rich region at the beginning of Xist exon 7. The first half is especially repetitive in mouse, with ~30 total copies of 2-3 distinct, alternating CT-rich motifs. Although human XIST also contains a CT-rich region at the beginning of its last exon, the exact sequence is not conserved. Deletion of mouse Repeat E resulted in widespread dispersal of Xist RNA throughout the nucleoplasm. Finer deletion mapping revealed increasing severity of phenotype correlating with increasingly longer deletions of Repeat E, suggesting that the effect of each repeat copy is cumulative. Indeed, the highly repetitive nature of Repeat E’s upstream portion renders it a prime site for multimeric trans factor binding. A recent study of Xist secondary structure in vivo versus ex vivo showed that Repeat E’s accessibility is highly altered in the cellular environment, supporting trans factor binding to this region (Smola et al., 2016).

CIZ1 interaction with Repeat E

In collaboration with Hongjae Sunwoo and John Froberg, I demonstrated that the trans factor interacting with Repeat E and responsible for Xist localization is the nuclear matrix protein CIZ1 (also independently shown by Ridings-Figueroa et al., 2017). CIZ1 was shown to have the highest degree of colocalization with Xist puncta compared to all other Xi-enriched factors, suggesting a physical interaction. Although CIZ1 had been previously identified as a candidate Xist-interacting protein by one proteomics study (Chu et al., 2015), its function in XCI and interacting region within Xist were unknown. Deletion of Repeat E resulted in loss of CIZ1 localization to the Xi. Reciprocally, ablating CIZ1 in mouse or human cells resulted in delocalization of Xist from the Xi. Using X-chromosome painting, I demonstrated that the dispersal of Xist RNA was not due to changes in size of the underlying Xi territory.

The C-terminus of CIZ1 mediates immobilization of the protein to the nuclear matrix (Ainscough et al., 2007). Thus, CIZ1 may function to limit Xist diffusion away from its site of synthesis (Xi) by fixing it to nearby matrix components. CIZ1 also contains both Matrin3-type RNA-binding and C2H2-type zinc finger domains, suggesting RNA-binding capability. Indeed, UV-RIP experiments supported direct RNA-protein interaction in vivo. However, I was unable to recapitulate Xist-CIZ1 interaction using in vitro binding assays with recombinant protein (data not shown). This may be due to improper protein folding or lack of essential post-translational modifications or cofactors. Interestingly, polypyrimidine tract-binding protein 1 (PTBP1) has also been reported to interact with Xist Repeat E in vivo by CLIP experiments (Cirillo et al., 2016), consistent with Repeat E’s CT-rich composition. As mentioned earlier, Repeat E contains more than one unique CT-rich motif. It is tempting to speculate that PTBP1 and CIZ1 bind along Repeat E in a co-dependent or synergistic manner. Thus, additional work is required to determine whether CIZ1 interacts with Xist RNA directly or indirectly, and whether other factors might be involved.

Xist regions affecting RNA coating and Polycomb recruitment: Repeat B

Repeat B is a 200-bp C-rich element in Xist exon 1. It contains ~30 copies of the hexameric repeat GCCCC(A/T). Human XIST contains this sequence as well as a second C-rich sequence nearby. Deletion of mouse Repeat B resulted in: (1) loss of PRC1/2 recruitment, and (2) diffuse Xist clouds that remained close to the Xi territory rather than dispersing throughout the entire nucleus, as was the case for deletion of Repeat E. The highly repetitive nature of Repeat B makes it another prime site for multimeric trans factor binding.
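The hexameric motif GCCCC(A/T) described above can be expressed as a regular expression and counted in a sequence. The sketch below is a minimal illustration; the function names are hypothetical and the test string is a synthetic toy sequence, not actual Xist sequence.

```python
# Minimal sketch: count copies of the Repeat B hexamer GCCCC(A/T) in a sequence.
# The example sequence is synthetic and for illustration only.
import re

HEXAMER = re.compile(r"GCCCC[AT]")
TANDEM_RUN = re.compile(r"(?:GCCCC[AT])+")


def normalize(seq: str) -> str:
    """Uppercase and convert RNA (U) to DNA (T) notation."""
    return seq.upper().replace("U", "T")


def count_hexamers(seq: str) -> int:
    """Total number of GCCCC(A/T) copies (the motif cannot overlap itself)."""
    return len(HEXAMER.findall(normalize(seq)))


def longest_tandem_run(seq: str) -> int:
    """Largest number of back-to-back hexamer copies with no spacer."""
    runs = TANDEM_RUN.findall(normalize(seq))
    return max((len(run) // 6 for run in runs), default=0)


toy = "GCCCCAGCCCCTTTGCCCCA"  # two tandem copies, a TT spacer, then one more copy
print(count_hexamers(toy), longest_tandem_run(toy))
```

Applied to the annotated Repeat B interval of a reference Xist sequence, the same count would be expected to approach the ~30 copies stated in the text, though the exact number depends on the annotation used.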

HNRNPK interaction with Repeat B

Using in vitro RNA pulldown followed by mass spectrometry, I identified the primary trans factor interacting with Repeat B to be HNRNPK (also independently shown by Pintacuda et al., 2017). Around the same time, an in vivo CLIP-seq dataset was published showing direct HNRNPK binding robustly across Repeat B, and somewhat detectably across Repeat C (Cirillo et al., 2016). Using IF, I demonstrated HNRNPK is tightly associated with the Xi in an Xist Repeat B-dependent manner. Consistent with this, HNRNPK could be ectopically recruited to either full-length or exon 1 (which contains Repeat B) Xist transgenes. Furthermore, I was able to recapitulate direct Repeat B-HNRNPK interaction in vitro by EMSA, which confirmed suspected super-stoichiometric binding. Interestingly, Repeat B is predicted to form a highly stable secondary structure consisting of stem-loops with regularly-spaced C-rich bulges (Figure 4.1). This structure is ideal for binding HNRNPK, which contains three KH RNA-binding domains that each recognize a single-stranded CCC triplet (Thisted et al., 2001).

Figure 4.1 Repeat B secondary structure

Predicted secondary structure for 9 copies of mouse Repeat B (by Mfold).

Repeat B-HNRNPK interaction is required for Polycomb recruitment

Previous work had mapped Polycomb recruitment to a ~4-kb region of Xist containing Repeats F, B, and C (da Rocha et al., 2014). Along with a concurrent transgenic study (Pintacuda et al., 2017), I further refined this region to Repeat B. My findings that: (1) Repeat B is required for Polycomb recruitment, and (2) Repeat B directly binds HNRNPK, fit nicely with previous work demonstrating HNRNPK is required for Xist-mediated Polycomb recruitment (Chu et al., 2015). I confirmed that KD of HNRNPK in post-XCI cells phenocopies deletion of Repeat B in causing loss of H2AK119ub/H3K27me3 enrichment on Xi by IF. This was due to failed PRC1/2 recruitment, rather than successful recruitment but failure to deposit their marks onto chromatin. ChIP-seq verified complete loss of H2AK119ub/H3K27me3 marks chromosome-wide in both MEFs and day 14 differentiating ES cells, suggesting deletion of Repeat B alone is sufficient to abolish Polycomb occupancy on Xi at these stages. This finding is slightly at odds with another work claiming deletion of both Repeats B and C is required for complete loss of H2AK119ub/H3K27me3 enrichment on Xi by IF (Bousard et al., 2018). Whereas CLIP-seq data do show slight binding of HNRNPK along Repeat C, it is minimal compared to that along Repeat B (Cirillo et al., 2016). Deletion of Repeat C alone has no effect on Polycomb recruitment (Bousard et al., 2018), and another group saw complete loss of H2AK119ub/H3K27me3 enrichment by deleting Repeat B and just the first quarter of Repeat C (Pintacuda et al., 2017). Thus, although HNRNPK may bind at low levels to Repeat C, this region seems to contribute minimally, if at all, to Polycomb recruitment.

Role of Repeat A in Polycomb recruitment

As mentioned earlier, Repeat A has also been implicated in Polycomb recruitment through direct interaction with PRC2 (Zhao et al., 2008; Cifuentes-Rojas et al., 2014). However, several reports claim that Repeat A deletion does not substantially impact H3K27me3 enrichment on Xi (da Rocha et al., 2014; Kohlmaier et al., 2004; McHugh et al., 2015; Plath et al., 2003). My work offers confirmation that Repeat A deletion in post-XCI cells does not abolish H2AK119ub or H3K27me3 Xi enrichment as determined by IF, but leaves open the possibility of regional defects in PRC2 recruitment and functional redundancy with other Xist domains. Although there was a slight decrease (Figure 2.1F), this is at least partly due to the confounding effect of Repeat A deletion on Xist splicing, which causes simultaneous loss of Repeat B in ~50% of transcripts. Thus, compared to Repeat B, Repeat A would seem to play a minor role in the Polycomb recruitment pathway, at least during maintenance of XCI. Rather, recent ChIP-seq experiments have shown that Repeat A deletion compromises the spread of Polycomb marks from intergenic regions into active gene bodies during de novo XCI establishment (Żylicz et al., 2019). Therefore, Repeat A and/or its gene silencing activity may be a prerequisite for Polycomb to spread to these regions and blanket the entire X chromosome.

Recent work shows that deleting Repeat B during early stages of de novo XCI impairs but does not fully eradicate H2AK119ub/H3K27me3 on Xi; small amounts are still detectable over newly-inactivated promoters and gene bodies (Bousard et al., 2018; D.C., H.S., J.T.L., manuscript in preparation). These studies assay H2AK119ub/H3K27me3 Xi enrichment after only a couple days of ES cell differentiation. By contrast, here I showed that deleting Repeat B at either the establishment (assayed at 14 days post-differentiation) or maintenance phases (MEF) leads to complete absence of H2AK119ub/H3K27me3 on Xi. This suggests the presence of a Repeat B-independent Polycomb recruitment pathway that functions exclusively at the early stages of XCI establishment. Given that transition from an active to inactive state has been shown to trigger Polycomb recruitment to even autosomal genes (Riising et al., 2014), de novo Xi gene silencing mediated by Repeat A may contribute to this Repeat B-independent pathway. This cannot occur in the maintenance phase, as it is the switch into gene silencing rather than the state of being silent that leads to Polycomb recruitment, explaining the complete dependence on the Repeat B pathway during later stages of XCI. Additional experiments removing Repeats A and B individually and in combination will be necessary to parse out the precise role each plays in Polycomb recruitment (D.C., H.S., J.T.L., manuscript in preparation). Nevertheless, these findings may offer a unifying explanation to the longstanding controversy over Polycomb recruitment by Repeat A versus B, such that the timing of XCI (i.e. initiation vs. late establishment/maintenance) matters.


Role of Polycomb in Xist-mediated gene silencing

Evidence suggests that Polycomb functions in stabilizing rather than initiating gene silencing (Riising et al., 2014). Accordingly, PRC2 was reported to be dispensable for early stages of XCI (Kalantry and Magnuson, 2006), but necessary to prevent X-reactivation throughout later stages (Kalantry et al., 2006). I and others demonstrated that deleting Repeat B not only disrupts Polycomb recruitment, but also impairs de novo gene silencing (Pintacuda et al., 2017; Nesterova et al., 2018; Bousard et al., 2018). This is likely attributable to Polycomb loss, since depleting HNRNPK or PRC1 recapitulates this silencing defect (Chu et al., 2015; Almeida et al., 2017). Thus, Repeat B/Polycomb most likely function in stabilizing Repeat A-mediated gene silencing. Work is currently underway to examine transcription status across a timecourse of de novo XCI in Repeat B-deleted cells to look for initial silencing followed by reactivation (D.C., H.S., J.T.L., manuscript in preparation).

Mechanism of Repeat B/HNRNPK-mediated Polycomb recruitment

While it is known that HNRNPK binding to Repeat B is necessary for recruitment of PRC1 and PRC2, the link between HNRNPK and PRC1/2 remains a black box. EED, a component of PRC2, was first discovered through yeast two-hybrid screens using HNRNPK as bait (Denisenko and Bomsztyk, 1997). However, more recent work did not observe interaction between HNRNPK and PRC2 (Pintacuda et al., 2017). Instead, Pintacuda et al. proposed a direct interaction between HNRNPK and PCGF3/5, a component of non-canonical PRC1. A shortcoming of their co-immunoprecipitation experiment was the lack of an IgG negative control. Furthermore, interaction between HNRNPK and PCGF3/5, if any, appears to be relatively weak, as judged by low stoichiometry of HNRNPK in mass spectrometry analysis using PCGF3/5 as bait (Brockdorff, 2018). Despite this, several specific subunits of non-canonical PRC1 appear in Xist proteomic studies (Chu et al., 2015), whereas canonical PRC1 and PRC2 subunits are lacking or lowly enriched. Thus, the connection between HNRNPK and PRC1/2 in XCI warrants further investigation.

There are several alternative hypotheses linking HNRNPK to PRC1/2 recruitment. For example, HNRNPK may act through one or more adaptor proteins. Alternatively, other Xist/Xi factors may cooperate with HNRNPK such that a particular combination of otherwise general factors gains an XCI-specific role. One potential adaptor protein is SAP18. A study comparing Xist-interacting proteins between full-length and Repeats B/C-deleted Xist identified only five differentially bound proteins: HNRNPK, non-canonical PRC1 components (RYBP, PCGF5, RING1B), and SAP18 (Bousard et al., 2018). A more direct way to elucidate the HNRNPK pathway would be to perform HNRNPK co-immunoprecipitation followed by mass spectrometry. However, because HNRNPK is thought to interact with hundreds of protein partners (Bomsztyk et al., 2004), subsequent validation of candidate interactors may prove challenging. One way to limit the pool of HNRNPK-interacting proteins to those specifically relevant to XCI would be to perform pre-extraction of cells as done in this work (which removes the soluble fraction of HNRNPK and leaves only the Xi-enriched fraction) and compare co-immunoprecipitating proteins from WT versus ∆RepB cells.

Another possibility linking HNRNPK to PRC1/2 recruitment is that HNRNPK itself or HNRNPK-interacting proteins alter chromatin in a way that renders it permissive to Polycomb binding. Thus, the link could be indirect through chromatin changes rather than direct through physical protein-protein interaction. Another way in which the link could be indirect is if HNRNPK serves some core structural role at the hub of the Xist ribonucleoprotein complex; its removal would thus compromise general Xist function. However, this seems unlikely since: (1) removing Repeat B does not affect at least one other Xist-dependent function (CIZ1 recruitment to Xi), and (2) no other Xist region caused complete loss of Polycomb recruitment to Xi upon deletion (unless several are redundant or function in combination).


Xist co-opts general factors for XCI

Many of the Xist-interacting proteins identified by proteomic studies are common RNA-binding or chromatin-associated factors (Chu et al., 2015; McHugh et al., 2015; Minajigi et al., 2015). Though likely genuine interactors, most were assumed not to be specifically involved in XCI, but rather in general RNA processing. Even so, KD of general processing factors such as SRSF3 or RBM3 can cause strong Xist delocalization phenotypes (data not shown), perhaps by disrupting Xist splicing. But beyond this, Xist seems to have specifically co-opted several common RNA-binding proteins for important roles in XCI. Perhaps for this reason, the role of HNRNPK was overlooked until recently.

HNRNPU and CIZ1 are both required for proper Xist localization (Hasegawa et al., 2010; Sunwoo et al., 2017; Ridings-Figueroa et al., 2017). Yet the two function independently of one another, since CIZ1 remains visibly associated with dispersed Xist puncta in HNRNPU KO cells, and reciprocally, HNRNPU continues to interact with Xist by UV-RIP in CIZ1 KO cells. This is consistent with CLIP profiles showing HNRNPU binding along the entire length of Xist (Cirillo et al., 2016), whereas CIZ1 binding is limited to Repeat E. Unlike CIZ1, however, HNRNPU is not clearly enriched on Xi (Hasegawa et al., 2010), and may function indirectly as a nuclear matrix factor for the anchorage of RNA or other chromatin-associated factors to all chromosomes.

Although highly enriched on the Xi in female cells, CIZ1 is also present in several puncta throughout the nucleus of male cells, which lack an Xi. This raises the question: what is the role of CIZ1 outside of XCI? CIZ1 was originally identified as a CDKN1A-interacting protein associated with the nuclear matrix (Mitsui et al., 1999), and has been linked to several human diseases, including cervical dystonia and lung cancer, as well as lymphoproliferative disorder in mice (Copeland et al., 2015; Coverley et al., 2005; Xiao et al., 2012; Higgins et al., 2012; Ridings-Figueroa et al., 2017). Thus, CIZ1 may normally play a role in proliferation and cell cycle progression, perhaps by regulating attachment/detachment of DNA or associated protein factors to the nuclear matrix.


Similar to HNRNPU, HNRNPK is a general nuclear factor associated with many RNAs. Thus, it was unclear how HNRNPK might be specifically and functionally relevant to XCI. I demonstrated that extraction of the soluble protein fraction revealed a hidden layer of HNRNPK tightly associated with the Xi compartment. Thus, high abundance of soluble protein can mask underlying localization patterns, as was also shown for PCGF3/5 on Xi (Almeida et al., 2017). More intriguing is HNRNPK’s involvement in Polycomb recruitment. I showed that HNRNPK KD causes loss of H2AK119ub/H3K27me3 enrichment on Xi but does not affect total cellular levels of H2AK119ub/H3K27me3. Thus, HNRNPK’s role in Polycomb recruitment appears to be Xi-specific. Because HNRNPK is a ubiquitous RNA-binding protein, it may cooperate with additional Xi factors for this process. Another possibility is that high local concentration of HNRNPK on Xi due to multimeric binding to Xist Repeat B is required for efficient Polycomb recruitment, perhaps through amplification of weak protein-protein interactions (Brockdorff, 2018). Artificially tethering multiple HNRNPK molecules to transgenic Xist RNA lacking Repeat B was reportedly sufficient to recruit Polycomb (Pintacuda et al., 2017). My own attempts to reproduce this result were unsuccessful, as both N- and C-terminal fusion tags on HNRNPK (including an identical fusion used by Pintacuda et al., 2017) prevented recruitment to even wild-type Xist containing Repeat B (data not shown). This implies that the fusion affects HNRNPK folding or function, or sterically occludes its direct binding to Repeat B. Regardless, this does not rule out the possibility that other Xist/Xi factors might cooperate with HNRNPK in Polycomb recruitment. A more conclusive experiment would be to tether HNRNPK to an RNA other than Xist (e.g. Neat1, Malat1) or DNA locus outside the Xi, and check if HNRNPK alone can ectopically recruit Polycomb complexes.

Importantly, these ubiquitous protein factors are often highly conserved throughout evolution, whereas Xist RNA is less conserved. Take for example marsupials, which undergo imprinted XCI but do not have Xist RNA. Instead, they have Rsx (“RNA-on-the-silent X”), a repeat-rich lncRNA that is expressed only in females and is transcribed from, coats, and silences a single X chromosome (Grant et al., 2012). Because it resembles Xist in structure and function but lacks similarity in sequence, it will be interesting to see whether Rsx also contains multimeric binding sites for conserved factors like HNRNPK and/or CIZ1. This would provide strong evidence for convergent evolution of particular pathways in XCI, as well as support the concept of lncRNAs serving as modular scaffolds.

Independent versus interdependent recruitment of Polycomb complexes

As mentioned earlier, competing models for Polycomb recruitment in XCI posit that one complex comes before the other in a hierarchical fashion. Most recently, it was proposed that PRC2 recruitment to the Xi strictly depends on prior H2AK119ub modification by non-canonical PRC1. This is largely based on the observation that, during de novo XCI, H3K27me3 enrichment on Xi is lost upon conditional RING1A/B KO (Almeida et al., 2017). However, in that study, 10% of cells had residual H3K27me3, which was presumably due to incomplete conditional KO. Here, I showed that (unconditional) RING1A/B KO in MEFs significantly weakens H3K27me3 enrichment on Xi, but indeed retains some H3K27me3. Hence, the 10% residual H3K27me3 in the original study may have in fact been biological rather than due to limitations of the experimental setup. Alternatively, inherent differences in XCI establishment (examined by Almeida et al., 2017) versus maintenance (examined in this work) may account for divergent conclusions. Reciprocally, PRC2 ablation weakened H2AK119ub enrichment on Xi, but some amount of H2AK119ub remained. Together, these observations are consistent with interdependent recruitment pathways (both canonical and non-canonical), but also highlight a degree of independence in recruiting each complex to the Xi. Thus, I propose Xist is able to recruit PRC1 and PRC2 in parallel, with the two mutually reinforcing one another’s recruitment and spreading.

HNRNPK KD phenocopies deletion of Repeat B in loss of H2AK119ub/H3K27me3 Xi enrichment and Xist cloud dispersal. Interestingly, the KD experiment revealed differential kinetics for each phenotype over a 6-day timecourse. H2AK119ub Xi enrichment was lost first, followed by Xist dispersal, and finally loss of H3K27me3 enrichment. Whether the order in which these events occur implies causality (i.e. H3K27me3 is lost as a result of H2AK119ub loss), or rather is due to differential kinetics for addition/removal of the different histone marks, requires further investigation. During de novo XCI, H2AK119ub accumulates on Xi earlier than H3K27me3 (Żylicz et al., 2019). But perhaps addition/removal of the monoubiquitin is simply a faster process than addition/removal of the trimethyl group, which requires stable PRC2 occupancy and/or allosteric activation to progress through mono-, di-, and trimethyl states (Højfeldt et al., 2018; Oksuz et al., 2018). Thus, PRC1 and PRC2 may be recruited in parallel despite their marks appearing sequentially. IF for the Polycomb complexes themselves, rather than their resulting marks, should clarify this uncertainty.

Role of Polycomb in Xist RNA coating

The work presented here reveals an unexpected role for Polycomb in the process of Xist RNA coating. Loss of PRC1 or PRC2 is sufficient to recapitulate the diffuse Xist cloud morphology caused by deleting Repeat B or depleting HNRNPK. However, because perturbing PRC1 also affects PRC2 recruitment and vice versa, it will be difficult to disentangle which one, if not both, is more directly involved. Importantly, Polycomb loss did not alter RNA levels of Xist or genes known to affect Xist localization (e.g. CIZ1, HNRNPU, HNRNPK), nor did it interfere with HNRNPK’s association with the Xi. CHART-seq and RNA/DNA FISH experiments support the conclusion that diffuse Xist clouds result from decreased association between the RNA and chromatin (causing Xist to diffuse beyond the Xi territory) rather than from decompaction of Xi chromatin.

How PRC1/2 might influence Xist interaction with chromatin is unknown. Future work should be aimed at determining whether Polycomb complexes themselves, their associated histone marks, or consequential changes to chromatin structure may be responsible. As Xist has been proposed to interact directly with PRC2 and indirectly with PRC1 (Zhao et al., 2008; Pintacuda et al., 2017, respectively), it is possible that the complexes mediate physical contact between the RNA and chromatin. Alternatively, Polycomb-mediated changes to chromatin structure may allow Xist to better contact chromatin or spread to less accessible regions, as previously suggested (Engreitz et al., 2013). Interestingly, the effect of Polycomb deficiency on Xist cloud morphology is only observed upon complete KO of PRC1 or PRC2, but not transient/incomplete siRNA KD (data not shown). Thus, maintaining proper Xist clouds requires either only minimal amounts of PRC1/2 or longer-lasting changes induced by PRC1/2 (e.g. modification of histones and/or chromatin structure). In support of the latter, following LNA knock-off, Xist re-coats the Xi in a much more homogeneous (rather than stepwise) manner, probably because genes are already silenced, chromatin marks already present, and/or chromosomal structure already altered (Simon et al., 2013). Indeed, loss of Xist does not significantly reverse gene silencing (Brown and Willard, 1994), erosion of H2AK119ub/H3K27me3 does not occur over just a few hours (Figure 2.7B), and Xi architectural reconfiguration is only partly reversible (Minajigi et al., 2015; Wang et al., in revision).

In any case, the intrinsic ability of Polycomb complexes to spread and form broad heterochromatin domains seems to have been exploited by Xist RNA, helping it span large distances while limiting its spread in cis. An interesting parallel can be drawn to fly dosage compensation, where roX1/2 RNAs recruit the MSL complex, and certain chromatin-interacting components of this complex (i.e. MOF, MSL3) aid in its spread along the male X chromosome. Specifically, H4K16ac modification by MOF might facilitate spreading by increasing accessibility at and around target sites (Kind et al., 2008). Meanwhile, MSL3’s chromodomain, which can bind H3K36me3 found across active gene bodies, was reportedly necessary for spreading MSL outwards from high affinity sites (Sural et al., 2008). Thus, the co-option of epigenetic complexes for spreading lncRNA-based machinery seems to be a common theme in dosage compensation across evolution.

In summary, my data are consistent with the following model: Xist seeds both PRC1 and PRC2 locally onto Xi chromatin. The two spread in an interdependent fashion (via canonical and non-canonical Polycomb pathways), reinforcing each other’s recruitment. In turn, PRC1/2 facilitate Xist interaction with chromatin, creating a positive feedback loop whereby additional Xist spreading can seed more PRC1/2, and so on. This example of Xist recruiting protein effectors that then reciprocally aid the spread of Xist would not be the first. Initial Xi coating by Xist results in recruitment of SMCHD1, which allows Xist to further spread from previous A to B compartments by reconfiguring chromosomal architecture (Wang et al., 2018). Likewise, Repeat A is required to facilitate Xist spreading from intergenic regions into active gene bodies, perhaps through recruitment of SPEN and subsequent histone deacetylation and gene silencing (Engreitz et al., 2013; Żylicz et al., 2019; Chu et al., 2015; McHugh et al., 2015; Moindrot et al., 2015; Monfort et al., 2015).

Role of Repeat B in architectural reconfiguration of the Xi

A central question in the 3D genome field is the relationship between chromosome structure and gene expression. Some evidence indicates chromosome structure regulates transcription, while other evidence indicates that gene activity alters chromosome structure (Rowley and Corces, 2018). These need not be mutually exclusive. Here, in collaboration with Andrea Kriz and Chen-Yu Wang, I have taken advantage of XCI as a model system to investigate the interplay between gene silencing dynamics and changes in X-chromosome/chromatin structure.

Polycomb recruitment is believed to limit chromatin accessibility (possibly through nucleosome compaction) to stabilize transcriptional repression. One study showed by ATAC-seq that deleting Repeat B during de novo XCI abrogates both Xist-mediated gene silencing and chromosome-wide decreases in chromatin accessibility, presumably due to Polycomb loss (Pintacuda et al., 2017). However, similar ATAC-seq experiments in post-XCI Xist-deleted cells (which do not exhibit gene silencing defects) revealed only minimal changes in chromatin accessibility (~37 restored ATAC-seq peaks) (Jégu et al., 2019). Thus, these changes are more likely a consequence of gene activity than of Polycomb loss or large-scale chromatin decompaction. This is in line with my observation that loss of Repeat B/Polycomb does not affect the size of the Xi territory at the level of detection by DNA FISH.

On the other hand, Repeat B does play an important role in architectural reorganization of the Xi. Deleting Repeat B post-XCI caused an increase in strength of several TADs, but at the same time left megadomains intact. This result is similar to the effect of deleting full-length Xist (Minajigi et al., 2015), suggesting Repeat B is at least partly responsible for Xist’s ability to remodel the Xi. TAD strength is correlated with active transcription, as shown by the TAD-like structures exhibited by escapees on the Xi (Giorgetti et al., 2016), even though genome-wide elimination of TADs by depleting CTCF or cohesin has minimal impact on transcription (Nora et al., 2017; Rao et al., 2017). Yet because Xi TADs were strengthened despite no corresponding change in transcriptional activity of ∆RepB MEFs, additional forces must also be involved. The effects of deleting Repeat B on Xi architecture were even greater when performed during de novo XCI, consistent with the accompanying failure of gene silencing. These results indicate that: (1) TADs and megadomains are not mutually exclusive, and (2) megadomain structure does not preclude gene expression. Future work will be needed to determine the significance, if any, of megadomains on XCI or Xi architecture in general.

Counterintuitively, deleting Repeat B caused the megadomains to appear even more pronounced in Pearson correlation matrices, by decreasing interaction frequency across the Dxz4 border. Thus, Xist/Repeat B may play a role in promoting long-range chromosomal interactions that span the two megadomains. This might occur through self-association of PRC1 complexes (Kundu et al., 2017; Wang et al., in revision), which coat the entire Xi in a Repeat B-dependent manner. Thus, the megadomains observed in normal cells might already be partly obscured by PRC1. Deleting Repeat B during de novo XCI also affected compartment structures, with the mutant Xi exhibiting regional Xa-like features and lacking characteristic S1/S2 transitional compartmentalization seen on the wild-type Xi at this time (Wang et al., 2018). Thus, Repeat B might also play a role in S1/S2 compartment formation, perhaps again through self-association of PRC1 complexes (Wang et al., in revision).
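
To illustrate why reduced contact frequency across a border can sharpen a two-domain pattern in a correlation matrix, the following minimal Python sketch builds a toy contact map and compares row-wise correlations with and without depleted cross-border contacts. It is not the analysis pipeline used in this work. The bin count, the distance-decay rule, the `cross_factor` scaling, and all function names are assumptions chosen for illustration, and the distance normalization usually applied to Hi-C data before computing correlations is omitted for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def toy_contacts(n_bins, border, cross_factor):
    """Hypothetical contact map: contacts decay with bin distance, and pairs
    of bins on opposite sides of `border` are scaled by `cross_factor`."""
    contacts = []
    for i in range(n_bins):
        row = []
        for j in range(n_bins):
            value = 1.0 / (1 + abs(i - j))
            if (i < border) != (j < border):  # pair spans the border
                value *= cross_factor
            row.append(value)
        contacts.append(row)
    return contacts


def mean_cross_border_correlation(contacts, border):
    """Average correlation between contact profiles of bins lying on
    opposite sides of the border."""
    n_bins = len(contacts)
    values = [
        pearson(contacts[i], contacts[j])
        for i in range(border)
        for j in range(border, n_bins)
    ]
    return sum(values) / len(values)


# Assumed toy parameters: 8 bins, border in the middle.
baseline = mean_cross_border_correlation(toy_contacts(8, 4, 1.0), 4)
depleted = mean_cross_border_correlation(toy_contacts(8, 4, 0.2), 4)
print(f"baseline: {baseline:.3f}  depleted cross-border contacts: {depleted:.3f}")
```

In this toy setting, scaling down contacts that span the border lowers the mean correlation between bins on opposite sides, so the two blocks stand out more strongly in the correlation matrix. This is the qualitative behavior described above for the Dxz4 border, shown here only under the stated simplifying assumptions.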

References

Ainscough, J.F., Rahman, F.A., Sercombe, H., Sedo, A., Gerlach, B., and Coverley, D. (2007). C-terminal domains deliver the DNA replication factor Ciz1 to the nuclear matrix. J Cell Sci 120, 115-124

Alekseyenko, A.A., Gorchakov, A.A., Kharchenko, P.V., and Kuroda, M.I. (2014). Reciprocal interactions of human C10orf12 and C17orf96 with PRC2 revealed by BioTAP-XL cross-linking and affinity purification. Proc Natl Acad Sci USA 111, 2488-2493

Alekseyenko, A.A., Peng, S., Larschan, E., Gorchakov, A.A., Lee, O.K., Kharchenko, P., McGrath, S.D., Wang, C.I., Mardis, E.R., Park, P.J., and Kuroda, M.I. (2008). A sequence motif within chromatin entry sites directs MSL establishment on the Drosophila X chromosome. Cell 134, 599-609

Almeida, M., Pintacuda, G., Masui, O., Koseki, Y., Gdula, M., Cerase, A., Brown, D., Mould, A., Innocent, C., Nakayama, M., Schermelleh, L., Nesterova, T.B., Koseki, H., and Brockdorff, N. (2017). PCGF3/5-PRC1 initiates Polycomb recruitment in X chromosome inactivation. Science 356, 1081-1084

Amrein, H., and Axel, R. (1997). Genes expressed in neurons of adult male Drosophila. Cell 88, 459-469

Arrigoni, R., Alam, S.L., Wamstad, J.A., Bardwell, V.J., Sundquist, W.I., and Schreiber-Agus, N. (2006). The Polycomb-associated protein Rybp is a ubiquitin binding protein. FEBS Lett 580, 6233-6241

Balaton, B.P. and Brown, C.J. (2016). Escape artists of the X chromosome. Trends Genet 32, 348-359

Barr, M.L. and Bertram, E.G. (1949). A morphological distinction between the neurones of the male and female, and the behaviour of the nucleolar satellite during accelerated nucleoprotein synthesis. Nature 163, 676-677

Baubec, T., Colombo, D.F., Wirbelauer, C., Schmidt, J., Burger, L., Krebs, A.R., Akalin, A., and Schübeler, D. (2015). Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature 520, 243-247

Bell, A.C. and Felsenfeld, G. (2000). Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482-485

Belyaev, N., Keohane, A.M., and Turner, B.M. (1996). Differential underacetylation of histones H2A, H3 and H4 on the inactive X chromosome in human female cells. Hum Genet 97, 573-578

Berletch, J.B., Ma, W., Yang, F., Shendure, J., Noble, W.S., Disteche, C.M., and Deng, X. (2015). Escape from X inactivation varies in mouse tissues. PLoS Genet 11, e1005079

Berletch, J.B., Yang, F., Xu, J., Carrel, L., and Disteche, C.M. (2011). Genes that escape from X inactivation. Hum Genet 130, 237-245

Bernstein, B.E., Liu, C.L., Humphrey, E.L., Perlstein, E.O., and Schreiber, S.L. (2004). Global nucleosome occupancy in yeast. Genome Biol 5, R62

Bernstein, B.E., Meissner, A., and Lander, E.S. (2007). The mammalian epigenome. Cell 128, 669-681

Bernstein, E., Duncan, E.M., Masui, O., Gil, J., Heard, E., and Allis, C.D. (2006). Mouse Polycomb proteins bind differentially to methylated histone H3 and RNA and are enriched in facultative heterochromatin. Mol Cell Biol 26, 2560-2569

Bickmore, W.A., and van Steensel, B. (2013). Genome architecture: domain organization of interphase chromosomes. Cell 152, 1270-1284

Biswas, S. and Rao, M. (2018). Epigenetic tools (the writers, the readers and the erasers) and their implications in cancer therapy. Eur J Pharmacol 837, 8-24

Blackledge, N.P., Farcas, A.M., Kondo, T., King, H.W., McGouran, J.F., Hanssen, L.L.P., Ito, S., Cooper, S., Kondo, K., Koseki, Y., Ishikura, T., Long, H.K., Sheahan, T.W., Brockdorff, N., Kessler, B.M., Koseki, H., and Klose, R.J. (2014). Variant PRC1 complex-dependent H2A ubiquitylation drives PRC2 recruitment and polycomb domain formation. Cell 157, 1445-1459

Blackledge, N.P., Rose, N.R., and Klose, R.J. (2015). Targeting Polycomb systems to regulate gene expression: modifications to a complex story. Nat Rev Mol Cell Biol 16, 643-649

Boettiger, A.N., Bintu, B., Moffitt, J.R., Wang, S., Beliveau, B.J., Fudenberg, G., Imakaev, M., Mirny, L.A., Wu, C.T., and Zhuang, X. (2016). Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418-422

Boggs, B.A., Connors, B., Sobel, R.E., Chinault, A.C. and Allis, C.D. (1996). Reduced levels of histone H3 acetylation on the inactive X chromosome in human females. Chromosoma 105, 303-309

Bomsztyk, K., Denisenko, O., and Ostrowski, J. (2004). hnRNP K: one protein multiple processes. Bioessays 26, 629-638

Bonev, B., and Cavalli, G. (2016). Organization and function of the 3D genome. Nat Rev Genet 17, 661-678

Bonora, G., Deng, X., Fang, H., Ramani, V., Qiu, R., Berletch, J.B., Filippova, G.N., Duan, Z., Shendure, J., Noble, W.S., and Disteche, C.M. (2018). Orientation-dependent Dxz4 contacts shape the 3D structure of the inactive X chromosome. Nat Commun 9, 1445

Borsani, G., Tonlorenzi, R., Simmler, M.C., Dandolo, L., Arnaud, D., Capra, V., Grompe, M., Pizzuti, A., Muzny, D., Lawrence, C., Willard, H.F., Avner, P., and Ballabio, A. (1991). Characterization of a murine gene expressed from the inactive X chromosome. Nature 351, 325-329

Bousard, A., Raposo, A.C., Żylicz, J.J., Picard, C., Pires, V.B., Qi, Y., Syx, L., Chang, H.Y., Heard, E., and da Rocha, S.T. (2018). Exploring the role of Polycomb recruitment in Xist-mediated silencing of the X chromosome in ES cells. bioRxiv, 495739

Boyer, L.A., Plath, K., Zeitlinger, J., Brambrink, T., Medeiros, L.A., Lee, T.I., Levine, S.S., Wernig, M., Tajonar, A., Ray, M.K., Bell, G.W., Otte, A.P., Vidal, M., Gifford, D.K., Young, R.A., and Jaenisch, R. (2006). Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature 441, 349-353

Brockdorff, N. (2018). Local tandem repeat expansion in Xist RNA as a model for the functionalisation of ncRNA. Noncoding RNA 4, 28

Brockdorff, N., Ashworth, A., Kay, G.F., Cooper, P., Smith, S., McCabe, V.M., Norris, D.P., Penny, G.D., Patel, D., and Rastan, S. (1991). Conservation of position and exclusive expression of mouse Xist from the inactive X chromosome. Nature 351, 329-331

Brockdorff, N., Ashworth, A., Kay, G.F., McCabe, V.M., Norris, D.P., Cooper, P.J., Swift, S., and Rastan, S. (1992). The product of the mouse Xist gene is a 15 kb inactive X-specific transcript containing no conserved ORF and located in the nucleus. Cell 71, 515-526

Brown, C.J., Ballabio, A., Rupert, J.L., Lafreniere, R.G., Grompe, M., Tonlorenzi, R., and Willard, H.F. (1991a). A gene from the region of the human X inactivation centre is expressed exclusively from the inactive X chromosome. Nature 349, 38-44

Brown, C.J., Hendrich, B.D., Rupert, J.L., Lafreniere, R.G., Xing, Y., Lawrence, J., and Willard, H.F. (1992). The human XIST gene: analysis of a 17 kb inactive X-specific RNA that contains conserved repeats and is highly localized within the nucleus. Cell 71, 527-542

Brown, C.J., Lafreniere, R.G., Powers, V.E., Sebastio, G., Ballabio, A., Pettigrew, A.L., Ledbetter, D.H., Levy, E., Craig, I.W., and Willard, H.F. (1991b). Localization of the X inactivation centre on the human X chromosome in Xq13. Nature 349, 82-84

Brown, C.J., and Willard, H.F. (1994). The human X-inactivation centre is not required for maintenance of X-chromosome inactivation. Nature 368, 154-156

Brown, J.L., Mucci, D., Whiteley, M., Dirksen, M.L., and Kassis, J.A. (1998). The Drosophila Polycomb group gene pleiohomeotic encodes a DNA binding protein with homology to the transcription factor YY1. Mol Cell 1, 1057-1064

Busturia, A., Wightman, C.D., and Sakonju, S. (1997). A silencer is required for maintenance of transcriptional repression throughout Drosophila development. Development 124, 4343-4350

Calabrese, J.M., Sun, W., Song, L., Mugford, J.W., Williams, L., Yee, D., Starmer, J., Mieczkowski, P., Crawford, G.E., and Magnuson, T. (2012). Site-specific silencing of regulatory elements as a mechanism of X inactivation. Cell 151, 951-963

Cao, R., Tsukada, Y.I., and Zhang, Y. (2005). Role of Bmi-1 and Ring1A in H2A ubiquitylation and Hox gene silencing. Mol Cell 20, 845-854

Cao, R., Wang, L., Wang, H., Xia, L., Erdjument-Bromage, H., Tempst, P., Jones, R.S., and Zhang, Y. (2002). Role of histone H3 lysine 27 methylation in Polycomb-group silencing. Science 298, 1039-1043

Caparros, M.L., Alexiou, M., Webster, Z., and Brockdorff, N. (2002). Functional analysis of the highly conserved exon IV of XIST RNA. Cytogenet Genome Res 99, 99-105

Casanova, M., Preissner, T., Cerase, A., Poot, R., Yamada, D., Li, X., Appanah, R., Bezstarosti, K., Demmers, J., Koseki, H., and Brockdorff, N. (2011). Polycomblike 2 facilitates the recruitment of PRC2 Polycomb group complexes to the inactive X chromosome and to target loci in embryonic stem cells. Development 138, 1471-1482

Cattanach, B.M. and Isaacson, J.H. (1967). Controlling elements in the mouse X chromosome. Genetics 57, 331-346

Cattanach, B.M. and Williams, C.E. (1972). Evidence of non-random X chromosome activity in the mouse. Genet Res 19, 229-240

Cavalli, G. (2006). Chromatin and epigenetics in development: blending cellular memory with cell fate plasticity. Development 133, 2089-2094

Chaumeil, J., Le Baccon, P., Wutz, A., and Heard, E. (2006). A novel role for Xist RNA in the formation of a repressive nuclear compartment into which genes are recruited when silenced. Genes Dev 20, 2223-2237

Chen, C.K., Blanco, M., Jackson, C., Aznauryan, E., Ollikainen, N., Surka, C., Chow, A., Cerase, A., McDonel, P., and Guttman, M. (2016). Xist recruits the X chromosome to the nuclear lamina to enable chromosome-wide silencing. Science 354, 468-472

Chess, A. (2016). Monoallelic gene expression in mammals. Annu Rev Genet 50, 317-327

Chow, J.C., Hall, L.L., Baldry, S.E., Thorogood, N.P., Lawrence, J.B., and Brown, C.J. (2007). Inducible XIST-dependent X-chromosome inactivation in human somatic cells is reversible. Proc Natl Acad Sci USA 104, 10104-10109

Chu, C., Zhang, Q.C., da Rocha, S.T., Flynn, R.A., Bharadwaj, M., Calabrese, J.M., Magnuson, T., Heard, E., and Chang, H.Y. (2015). Systematic discovery of Xist RNA binding proteins. Cell 161, 404-416

Chureau, C., Chantalat, S., Romito, A., Galvani, A., Duret, L., Avner, P., and Rougeulle, C. (2011). Ftx is a non-coding RNA which affects Xist expression and chromatin structure within the X-inactivation center region. Hum Mol Genet 20, 705-718

Chureau, C., Prissette, M., Bourdet, A., Barbe, V., Cattolico, L., Jones, L., Eggen, A., Avner, P., and Duret, L. (2002). Comparative sequence analysis of the X-inactivation center region in mouse, human, and bovine. Genome Res 12, 894-908

Cifuentes-Rojas, C., Hernandez, A.J., Sarma, K., and Lee, J.T. (2014). Regulatory interactions between RNA and polycomb repressive complex 2. Mol Cell 55, 171-185

Cirillo, D., Blanco, M., Armaos, A., Buness, A., Avner, P., Guttman, M., Cerase, A., and Tartaglia, G.G. (2016). Quantitative predictions of protein interactions with long noncoding RNAs. Nat Methods 14, 5-6

Clemson, C.M., McNeil, J.A., Willard, H.F., and Lawrence, J.B. (1996). XIST RNA paints the inactive X chromosome at interphase: evidence for a novel RNA involved in nuclear/chromosome structure. J Cell Biol 132, 259-275

Colognori, D., Sunwoo, H., Kriz, A.J., Wang, C.Y., and Lee, J.T. (2019). Xist deletional analysis reveals an interdependency between Xist RNA and Polycomb complexes for spreading along the inactive X. Mol Cell, DOI: 10.1016/j.molcel.2019.01.015

Comet, I. and Helin, K. (2014). Revolution in the Polycomb hierarchy. Nat Struct Mol Biol 21, 573-575

Cooper, D.W. (1971). Directed genetic change model for X chromosome inactivation in eutherian mammals. Nature 230, 292-294

Cooper, D.W., Johnston, P.G., and Graves, J.A.M. (1993). X-inactivation in marsupials and monotremes. Sem Dev Biol 4, 117-128

Cooper, D.W., VandeBerg, J.L., Sharman, G.B., and Poole, W.E. (1971). Phosphoglycerate kinase polymorphism in kangaroos provides further evidence for paternal X inactivation. Nat New Biol 230, 155-157

Cooper, S., Dienstbier, M., Hassan, R., Schermelleh, L., Sharif, J., Blackledge, N.P., De Marco, V., Elderkin, S., Koseki, H., Klose, R., Heger, A., and Brockdorff, N. (2014). Targeting polycomb to pericentric heterochromatin in embryonic stem cells reveals a role for H2AK119u1 in PRC2 recruitment. Cell Rep 7, 1456-1470

Cooper, S., Grijzenhout, A., Underwood, E., Ancelin, K., Zhang, T., Nesterova, T.B., Anil-Kirmizitas, B., Bassett, A., Kooistra, S.M., Agger, K., Helin, K., Heard, E., and Brockdorff, N. (2016). Jarid2 binds mono-ubiquitylated H2A lysine 119 to mediate crosstalk between Polycomb complexes PRC1 and PRC2. Nat Commun 7, 13661

Copeland, N.A., Sercombe, H.E., Wilson, R.H., and Coverley, D. (2015). Cyclin-A-CDK2-mediated phosphorylation of CIZ1 blocks replisome formation and initiation of mammalian DNA replication. J Cell Sci 128, 1518-1527

Costanzi, C. and Pehrson, J.R. (1998). Histone macroH2A is concentrated in the inactive X chromosome of female mammals. Nature 393, 599-601

Coverley, D., Marr, J., and Ainscough, J. (2005). Ciz1 promotes mammalian DNA replication. J Cell Sci 118, 101-112

Crane, E., Bian, Q., McCord, R.P., Lajoie, B.R., Wheeler, B.S., Ralston, E.J., Uzawa, S., Dekker, J., and Meyer, B.J. (2015). Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240-244

Cremer, T., Cremer, C., Baumann, H., Luedtke, E.K., Sperling, K., Teuber, V., and Zorn, C. (1982). Rabl's model of the interphase chromosome arrangement tested in Chinese hamster cells by premature chromosome condensation and laser-UV-microbeam experiments. Hum Genet 60, 46-56

Csankovszki, G., Nagy, A., and Jaenisch, R. (2001). Synergism of Xist RNA, DNA methylation, and histone hypoacetylation in maintaining X chromosome inactivation. J Cell Biol 153, 773-784

Csankovszki, G., Panning, B., Bates, B., Pehrson, J.R., and Jaenisch, R. (1999). Conditional deletion of Xist disrupts histone macroH2A localization but not maintenance of X inactivation. Nat Genet 22, 323-324

Czypionka, A., de los Paños, O.R., Mateu, M.G., Barrera, F.N., Hurtado-Gómez, E., Gómez, J., Vidal, M., and Neira, J.L. (2007). The isolated C-terminal domain of Ring1B is a dimer made of stable, well-structured monomers. Biochemistry 46, 12764-12776

da Rocha, S.T., Boeva, V., Escamilla-Del-Arenal, M., Ancelin, K., Granier, C., Matias, N.R., Sanulli, S., Chow, J., Schulz, E., Picard, C., Kaneko, S., Helin, K., Reinberg, D., Stewart, A.F., Wutz, A., Margueron, R., and Heard, E. (2014). Jarid2 Is Implicated in the Initial Xist-Induced Targeting of PRC2 to the Inactive X Chromosome. Mol Cell 53, 301-316

Darrow, E.M., Huntley, M.H., Dudchenko, O., Stamenova, E.K., Durand, N.C., Sun, Z., Huang, S.C., Sanborn, A.L., Machol, I., Shamim, M., Seberg, A.P., Lander, E.S., Chadwick, B.P., and Aiden, E.L. (2016). Deletion of DXZ4 on the human inactive X chromosome alters higher-order genome architecture. Proc Natl Acad Sci USA 113, E4504-4512

Davidovich, C., Wang, X., Cifuentes-Rojas, C., Goodrich, K.J., Gooding, A.R., Lee, J.T., and Cech, T.R. (2015). Toward a consensus on the binding specificity and promiscuity of PRC2 for RNA. Mol Cell 57, 552-558

Davidovich, C., Zheng, L., Goodrich, K.J., and Cech, T.R. (2013). Promiscuous RNA binding by Polycomb repressive complex 2. Nat Struct Mol Biol 20, 1250-1257

de Napoles, M., Mermoud, J.E., Wakao, R., Tang, Y.A., Endoh, M., Appanah, R., Nesterova, T.B., Silva, J., Otte, A.P., Vidal, M., Koseki, H., and Brockdorff, N. (2004). Polycomb group proteins Ring1A/B link ubiquitylation of histone H2A to heritable gene silencing and X inactivation. Dev Cell 7, 663-676

Deng, X., Ma, W., Ramani, V., Hill, A., Yang, F., Ay, F., Berletch, J.B., Blau, C.A., Shendure, J., Duan, Z., Noble, W.S., and Disteche, C.M. (2015). Bipartite structure of the inactive mouse X chromosome. Genome Biol 16, 152

Denisenko, O.N., and Bomsztyk, K. (1997). The product of the murine homolog of the Drosophila extra sex combs gene displays transcriptional repressor activity. Mol Cell Biol 17, 4707-4717

Disteche, C.M. (2016). Dosage compensation of the sex chromosomes and autosomes. Semin Cell Dev Biol 56, 9-18

Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., and Ren, B. (2012). Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380

Dorafshan, E., Kahn, T.G., and Schwartz, Y.B. (2017). Hierarchical recruitment of Polycomb complexes revisited. Nucleus 8, 496-505

Duncan, I.M. (1982). Polycomblike: a gene that appears to be required for the normal expression of the bithorax and antennapedia gene complexes of Drosophila melanogaster. Genetics 102, 49-70

Durand, N.C., Robinson, J.T., Shamim, M.S., Machol, I., Mesirov, J.P., Lander, E.S., and Aiden, E.L. (2016a). Juicebox Provides a Visualization System for Hi-C Contact Maps with Unlimited Zoom. Cell Syst 3, 99-101

Durand, N.C., Shamim, M.S., Machol, I., Rao, S.S., Huntley, M.H., Lander, E.S., and Aiden, E.L. (2016b). Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95-98

Duszczyk, M.M., Wutz, A., Rybin, V., and Sattler, M. (2011). The Xist RNA A-repeat comprises a novel AUCG tetraloop fold and a platform for multimerization. RNA 17, 1973-1982

Eils, R., Dietzel, S., Bertin, E., Schrock, E., Speicher, M.R., Ried, T., Robert-Nicoud, M., Cremer, C., and Cremer, T. (1996). Three-dimensional reconstruction of painted human interphase chromosomes: active and inactive X chromosome territories have similar volumes but differ in shape and surface structure. J Cell Biol 135, 1427-1440

Elderkin, S., Maertens, G.N., Endoh, M., Mallery, D.L., Morrice, N., Koseki, H., Peters, G., Brockdorff, N., and Hiom, K. (2007). A phosphorylated form of Mel-18 targets the Ring1B histone H2A ubiquitin ligase to chromatin. Mol Cell 28, 107-120

Engreitz, J., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K., Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., Plath, K., and Guttman, M. (2013). The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 341, 1237973

Ercan, S. (2015). Mechanisms of X chromosome dosage compensation. J Genomics 3, 1-19

Erokhin, M., Georgiev, P., and Chetverina, D. (2018). Drosophila DNA-binding proteins in Polycomb repression. Epigenomes 2, 1

Eskeland, R., Leeb, M., Grimes, G.R., Kress, C., Boyle, S., Sproul, D., Gilbert, N., Fan, Y., Skoultchi, A., Wutz, A., and Bickmore, W.A. (2010). Ring1B compacts chromatin structure and represses gene expression independent of histone ubiquitination. Mol Cell 38, 452-464

Fang, J., Chen, T., Chadwick, B., Li, E., and Zhang, Y. (2004). Ring1b-mediated H2A ubiquitylation associates with inactive X chromosomes and is involved in initiation of X inactivation. J Biol Chem 279, 52812-52815

Fang, R., Moss, W.N., Rutenberg-Schoenberg, M., and Simon, M.D. (2015). Probing Xist RNA structure in cells using targeted structure-seq. PLoS Genet 11, e1005668

Farcas, A.M., Blackledge, N.P., Sudbery, I., Long, H.K., McGouran, J.F., Rose, N.R., Lee, S., Sims, D., Cerase, A., Sheahan, T.W., Koseki, H., Brockdorff, N., Ponting, C.P., Kessler, B.M., and Klose, R.J. (2012). KDM2B links the Polycomb Repressive Complex 1 (PRC1) to recognition of CpG islands. Elife 1, e00205

Ferrari, K.J., Scelfo, A., Jammula, S., Cuomo, A., Barozzi, I., Stützer, A., Fischle, W., Bonaldi, T., and Pasini, D. (2014). Polycomb-dependent H3K27me1 and H3K27me2 regulate active transcription and enhancer fidelity. Mol Cell 53, 49-62

Francis, N.J., Kingston, R.E., and Woodcock, C.L. (2004). Chromatin compaction by a polycomb group protein complex. Science 306, 1574-1577

Franke, A. and Baker, B.S. (1999). The rox1 and rox2 RNAs are essential components of the Compensasome, which mediates dosage compensation in Drosophila. Mol Cell 4, 117-122

Fritsch, C., Brown, J.L., Müller, J., and Kassis, J.A. (1999). The DNA binding Polycomb group protein Pleiohomeotic mediates silencing of a Drosophila homeotic gene. Development 126, 3905-3913

Froberg, J.E., Pinter, S.F., Kriz, A., Jegu, T., and Lee, J.T. (2018). Megadomains and superloops form dynamically but are dispensable for X-chromosome inactivation and gene escape. Nat Commun 9, 5004

Gallardo, M., Lee, H.J., Zhang, X., Bueso-Ramos, C., Pageon, L.R., McArthur, M., Multani, A., Nazha, A., Manshouri, T., Parker-Thornburg, J., Rapado, I., Quintas-Cardama, A., Kornblau, S.M., Martinez-Lopez, J., and Post, S.M. (2015). hnRNP K Is a Haploinsufficient Tumor Suppressor that Regulates Proliferation and Differentiation Programs in Hematologic Malignancies. Cancer Cell 28, 486-499

Gao, Z., Zhang, J., Bonasio, R., Strino, F., Sawai, A., Parisi, F., Kluger, Y., and Reinberg, D. (2012). PCGF homologs, CBX proteins, and RYBP define functionally distinct PRC1 family complexes. Mol Cell 45, 344-356

Gdula, M.R., Nesterova, T.B., Pintacuda, G., Godwin, J., Zhan, Y., Ozadam, H., McClellan, M., Moralli, D., Krueger, F., Green, C.M., Reik, W., Kriaucionis, S., Heard, E., Dekker, J., and Brockdorff, N. (2019). The non-canonical SMC protein SmcHD1 antagonises TAD formation and compartmentalisation on the inactive X chromosome. Nat Commun 10, 30

Giorgetti, L., Lajoie, B.R., Carter, A.C., Attia, M., Zhan, Y., Xu, J., Chen, C.J., Kaplan, N., Chang, H.Y., Heard, E., and Dekker, J. (2016). Structural organization of the inactive X chromosome in the mouse. Nature 535, 575-579

Grant, J., Mahadevaiah, S.K., Khil, P., Sangrithi, M.N., Royo, H., Duckworth, J., McCarrey, J.R., VandeBerg, J.L., Renfree, M.B., Taylor, W., Elgar, G., Camerini-Otero, R.D., Gilchrist, M.J., and Turner, J.M. (2012). Rsx is a metatherian RNA with Xist-like properties in X-chromosome inactivation. Nature 487, 254-258

Graves, J.A.M. (2006). Sex chromosome specialization and degeneration in mammals. Cell 124, 901-914

Grijzenhout, A., Godwin, J., Koseki, H., Gdula, M.R., Szumska, D., McGouran, J.F., Bhattacharya, S., Kessler, B.M., Brockdorff, N., and Cooper, S. (2016). Functional analysis of AEBP2, a PRC2 Polycomb protein, reveals a Trithorax phenotype in embryonic development and in ESCs. Development 143, 2716-2723

Guelen, L., Pagie, L., Brasset, E., Meuleman, W., Faza, M.B., Talhout, W., Eussen, B.H., de Klein, A., Wessels, L., de Laat, W., and van Steensel, B. (2008). Domain organization of human chromosomes revealed by mapping of nuclear lamina interactions. Nature 453, 948-951

Hansen, K.H., Bracken, A.P., Pasini, D., Dietrich, N., Gehani, S.S., Monrad, A., Rappsilber, J., Lerdrup, M., and Helin, K. (2008). A model for transmission of the H3K27me3 epigenetic mark. Nat Cell Biol 10, 1291-1300

Hasegawa, Y., Brockdorff, N., Kawano, S., Tsutui, K., Tsutui, K., and Nakagawa, S. (2010). The matrix protein hnRNP U is required for chromosomal localization of Xist RNA. Dev Cell 19, 469-476

Hauri, S., Comoglio, F., Seimiya, M., Gerstung, M., Glatter, T., Hansen, K., Aebersold, R., Paro, R., Gstaiger, M., and Beisel, C. (2016). A high-density map for navigating the human Polycomb complexome. Cell Rep 17, 583-595

Heard, E., Mongelard, F., Arnaud, D., Chureau, C., Vourc'h, C., and Avner, P. (1999). Human XIST yeast artificial chromosome transgenes show partial X inactivation center function in mouse embryonic stem cells. Proc Natl Acad Sci USA 96, 6841-6846

Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C., Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576-589

Helbig, R. and Fackelmayer, F.O. (2003). Scaffold attachment factor A (SAF-A) is concentrated in inactive X chromosome territories through its RGG domain. Chromosoma 112, 173-182

Herzing, L.B.K., Romer, J.T., Horn, J.M., and Ashworth, A. (1997). Xist has properties of the X-chromosome inactivation centre. Nature 386, 272-275

Higgins, G., Roper, K.M., Watson, I.J., Blackhall, F.H., Rom, W.N., Pass, H.I., Ainscough, J.F., and Coverley, D. (2012). Variant Ciz1 is a circulating biomarker for early-stage lung cancer. Proc Natl Acad Sci USA 109, E3128-E3135

Hisada, K., Sanchez, C., Endo, T.A., Endoh, M., Román-Trufero, M., Sharif, J., Koseki, H., and Vidal, M. (2012). RYBP represses endogenous retroviruses and preimplantation- and germ line-specific genes in mouse embryonic stem cells. Mol Cell Biol 32, 1139-1149

Hnisz, D., Shrinivas, K., Young, R.A., Chakraborty, A.K., and Sharp, P.A. (2017). A phase separation model for transcriptional control. Cell 169, 13-23

Højfeldt, J.W., Laugesen, A., Willumsen, B.M., Damhofer, H., Hedehus, L., Tvardovskiy, A., Mohammad, F., Jensen, O.N., and Helin, K. (2018). Accurate H3K27 methylation can be established de novo by SUZ12-directed PRC2. Nat Struct Mol Biol 25, 225-232

Hoki, Y., Kimura, N., Kanbayashi, M., Amakawa, Y., Ohhata, T., Sasaki, H., and Sado, T. (2009). A proximal conserved repeat in the Xist gene is essential as a genomic element for X-inactivation in mouse. Development 136, 139-146

Hughes, J.F. and Page, D.C. (2015). The biology and evolution of mammalian Y chromosomes. Annu Rev Genet 49, 507-527

Huppke, P., Maier, E.M., Warnke, A., Brendel, C., Laccone, F., and Gärtner, J. (2006). Very mild cases of Rett syndrome with skewed X inactivation. J Med Genet 43, 814-816

Isono, K., Endo, T.A., Ku, M., Yamada, D., Suzuki, R., Sharif, J., Ishikura, T., Toyoda, T., Bernstein, B.E., and Koseki, H. (2013). SAM domain polymerization links subnuclear clustering of PRC1 to gene silencing. Dev Cell 26, 565-577

Jansz, N., Keniry, A., Trussart, M., Bildsoe, H., Beck, T., Tonks, I.D., Mould, A.W., Hickey, P., Breslin, K., Iminitoff, M., Ritchie, M.E., McGlinn, E., Kay, G.F., Murphy, J.M., and Blewitt, M.E. (2018). Smchd1 regulates long-range chromatin interactions on the inactive X chromosome and at Hox clusters. Nat Struct Mol Biol 25, 766-777

Jégu, T., Blum, R., Cochrane, J.C., Yang, L., Wang, C.Y., Gilles, M.E., Colognori, D., Szanto, A., Marr, S.K., Kingston, R.E., and Lee, J.T. (2019). Xist RNA antagonizes the SWI/SNF chromatin remodeler BRG1 on the inactive X chromosome. Nat Struct Mol Biol 26, 96-109

Jeon, Y., and Lee, J.T. (2011). YY1 tethers Xist RNA to the inactive X nucleation center. Cell 146, 119-133

Jeppesen, P. and Turner, B.M. (1993). The inactive X chromosome in female mammals is distinguished by a lack of histone H4 acetylation, a cytogenetic marker for gene expression. Cell 74, 281-289

Jürgens, G. (1985). A group of genes controlling the spatial expression of the bithorax complex in Drosophila. Nature 316, 153-155

Kagey, M.H., Newman, J.J., Bilodeau, S., Zhan, Y., Orlando, D.A., van Berkum, N.L., Ebmeier, C.C., Goossens, J., Rahl, P.B., Levine, S.S., Taatjes, D.J., Dekker, J., and Young, R.A. (2010). Mediator and cohesin connect gene expression and chromatin architecture. Nature 467, 430-435

Kahn, T.G., Dorafshan, E., Schultheis, D., Zare, A., Stenberg, P., Reim, I., Pirrotta, V., and Schwartz, Y.B. (2016). Interdependence of PRC1 and PRC2 for recruitment to Polycomb Response Elements. Nucleic Acids Res 44, 10132-10149

Kalantry, S., Mills, K.C., Yee, D., Otte, A.P., Panning, B., and Magnuson, T. (2006). The Polycomb group protein Eed protects the inactive X-chromosome from differentiation-induced reactivation. Nat Cell Biol 8, 195-202

Kalantry, S. and Magnuson, T. (2006). The Polycomb group protein EED is dispensable for the initiation of random X-chromosome inactivation. PLoS Genet 2, e66

Kalb, R., Latwiel, S., Baymaz, H.I., Jansen, P.W., Müller, C.W., Vermeulen, M., and Müller, J. (2014). Histone H2A monoubiquitination promotes histone H3 methylation in Polycomb repression. Nat Struct Mol Biol 21, 569-571

Kaneko, S., Son, J., Bonasio, R., Shen, S.S., and Reinberg, D. (2014). Nascent RNA interaction keeps PRC2 activity poised and in check. Genes Dev 28, 1983-1988

Kay, G.F., Penny, G.D., Patel, D., Ashworth, A., Brockdorff, N., and Rastan, S. (1993). Expression of Xist during mouse development suggests a role in the initiation of X chromosome inactivation. Cell 72, 171-182

Kharchenko, P.V., Tolstorukov, M.Y., and Park, P.J. (2008). Design and analysis of ChIP-seq experiments for DNA-binding proteins. Nat Biotechnol 26, 1351-1359

Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36

Kim, H., Kang, K., and Kim, J. (2009). AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic Acids Res 37, 2940-2950

Kind, J., Vaquerizas, J.M., Gebhardt, P., Gentzel, M., Luscombe, N.M., Bertone, P., and Akhtar, A. (2008). Genome-wide analysis reveals MOF as a key regulator of dosage compensation and gene expression in Drosophila. Cell 133, 813-828

Kohlmaier, A., Savarese, F., Lachner, M., Martens, J., Jenuwein, T., and Wutz, A. (2004). A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol 2, E171

Kolpa, H.J., Fackelmayer, F.O., and Lawrence, J.B. (2016). SAF-A requirement in anchoring XIST RNA to chromatin varies in transformed and primary cells. Dev Cell 39, 9-10

Kundu, S., Ji, F., Sunwoo, H., Jain, G., Lee, J.T., Sadreyev, R.I., Dekker, J., and Kingston, R.E. (2018). Polycomb Repressive Complex 1 Generates Discrete Compacted Domains that Change during Differentiation. Mol Cell 71, 191

Kung, J.T., Colognori, D., and Lee, J.T. (2013). Long noncoding RNAs: past, present, and future. Genetics 193, 651-669

Larson, A.G., Elnatan, D., Keenen, M.M., Trnka, M.J., Johnston, J.B., Burlingame, A.L., Agard, D.A., Redding, S., and Narlikar, G.J. (2017). Liquid droplet formation by HP1alpha suggests a role for phase separation in heterochromatin. Nature 547, 236-240

Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. (2004). Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36, 900-905

Lee, J.T. (2000). Disruption of imprinted X inactivation by parent-of-origin effects at Tsix. Cell 103, 17-27

Lee, J.T. (2011). Gracefully ageing at 50, X-chromosome inactivation becomes a paradigm for RNA and chromatin control. Nat Rev Mol Cell Biol 12, 815-826

Lee, J.T., Davidow, L.S., and Warshawsky, D. (1999a). Tsix, a gene antisense to Xist at the X-inactivation centre. Nat Genet 21, 400-404

Lee, J.T. and Jaenisch, R. (1997). Long-range cis effects of ectopic X-inactivation centres on a mouse autosome. Nature 386, 275-279

Lee, J.T. and Lu, N. (1999). Targeted mutagenesis of Tsix leads to nonrandom X inactivation. Cell 99, 47-57

Lee, J.T., Lu, N., and Han, Y. (1999b). Genetic analysis of the mouse X inactivation center defines an 80-kb multifunction domain. Proc Natl Acad Sci USA 96, 3836-3841

Lee, J.T., Strauss, W.M., Dausman, J.A., and Jaenisch, R. (1996). A 450 kb transgene displays properties of the mammalian X-inactivation center. Cell 86, 83-94

Leppek, K., and Stoecklin, G. (2014). An optimized streptavidin-binding RNA aptamer for purification of ribonucleoprotein complexes identifies novel ARE-binding proteins. Nucleic Acids Res 42, e13

Lewis, E.B. (1978). A gene complex controlling segmentation in Drosophila. Nature 276, 565-570

Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079

Li, H., Liefke, R., Jiang, J., Kurland, J.V., Tian, W., Deng, P., Zhang, W., He, Q., Patel, D.J., Bulyk, M.L., Shi, Y., and Wang, Z. (2017). Polycomb-like proteins link the PRC2 complex to CpG islands. Nature 549, 287-291

Liao, Y., Smyth, G.K., and Shi, W. (2014). featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923-930

Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., Sandstrom, R., Bernstein, B., Bender, M.A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L.A., Lander, E.S., and Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293

Liskay, R.M. and Evans, R.J. (1980). Inactive X chromosome DNA does not function in DNA-mediated cell transformation for the hypoxanthine phosphoribosyltransferase gene. Proc Natl Acad Sci USA 77, 4895-4898

Lock, L.F., Melton, D.W., Caskey, C.T., and Martin, G.R. (1986). Methylation of the mouse hprt gene differs on the active and inactive X chromosomes. Mol Cell Biol 6, 914-924

Lock, L.F., Takagi, N., and Martin, G.R. (1987). Methylation of the Hprt gene on the inactive X occurs after chromosome inactivation. Cell 48, 39-46

Lu, Z., Zhang, Q.C., Lee, B., Flynn, R.A., Smith, M.A., Robinson, J.T., Davidovich, C., Gooding, A.R., Goodrich, K.J., Mattick, J.S., Mesirov, J.P., Cech, T.R., and Chang, H.Y. (2016). RNA duplex map in living cells reveals higher-order transcriptome structure. Cell 165, 1267-1279

Luger, K., Mäder, A.W., Richmond, R.K., Sargent, D.F., and Richmond, T.J. (1997). Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389, 251-260

Lyon, M.F. (1961). Gene action in the X-chromosome of the mouse (Mus musculus L.). Nature 190, 372-373

Lyon, M.F. (1998). X-chromosome inactivation: a repeat hypothesis. Cytogenet Cell Genet 80, 133-137

Maenner, S., Blaud, M., Fouillen, L., Savoye, A., Marchand, V., Dubois, A., Sanglier-Cianférani, S., van Dorsselaer, A., Clerc, P., Avner, P., Visvikis, A., and Branlant, C. (2010). 2-D structure of the A region of Xist RNA and its implication for PRC2 association. PLoS Biol 8, e1000276

Mak, W., Nesterova, T.B., de Napoles, M., Appanah, R., Yamanaka, S., Otte, A.P., and Brockdorff, N. (2004). Reactivation of the paternal X chromosome in early mouse embryos. Science 303, 666-669

Makhlouf, M., Ouimette, J.F., Oldfield, A., Navarro, P., Neuillet, D., and Rougeulle, C. (2014). A prominent and conserved role for YY1 in Xist transcriptional activation. Nat Commun 5, 4878

Mali, P., Yang, L., Esvelt, K.M., Aach, J., Guell, M., DiCarlo, J.E., Norville, J.E., and Church, G.M. (2013). RNA-guided human genome engineering via Cas9. Science 339, 823-826

Marahrens, Y., Loring, J., and Jaenisch, R. (1998). Role of the Xist gene in X chromosome choosing. Cell 92, 657-664

Marahrens, Y., Panning, B., Dausman, J., Strauss, W., and Jaenisch, R. (1997). Xist-deficient mice are defective in dosage compensation but not spermatogenesis. Genes Dev 11, 156-166

Margueron, R., Justin, N., Ohno, K., Sharpe, M.L., Son, J., Drury, W.J., Voigt, P., Martin, S.R., Taylor, W.R., De Marco, V., Pirrotta, V., Reinberg, D., and Gamblin, S.J. (2009). Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461, 762-767

Margueron, R., and Reinberg, D. (2011). The Polycomb complex PRC2 and its mark in life. Nature 469, 343-349

Martin, G.R., Epstein, C.J., and Martin, D.W., Jr. (1978). Use of teratocarcinoma stem cells as a model system for the study of X-chromosome inactivation in vitro. Basic Life Sci 12, 269-295

Maunakea, A.K., Nagarajan, R.P., Bilenky, M., Ballinger, T.J., D'Souza, C., Fouse, S.D., Johnson, B.E., Hong, C., Nielsen, C., Zhao, Y., Turecki, G., Delaney, A., Varhol, R., Thiessen, N., Shchors, K., Heine, V.M., Rowitch, D.H., Xing, X., Fiore, C., Schillebeeckx, M., Jones, S.J., Haussler, D., Marra, M.A., Hirst, M., Wang, T., and Costello, J.F. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253-257

McGinty, R.K., Henrici, R.C., and Tan, S. (2014). Crystal structure of the PRC1 ubiquitylation module bound to the nucleosome. Nature 514, 591-596

McHugh, C.A., Chen, C.K., Chow, A., Surka, C.F., Tran, C., McDonel, P., Pandya-Jones, A., Blanco, M., Burghard, C., Moradian, A., Sweredoski, M.J., Shishkin, A.A., Su, J., Lander, E.S., Hess, S., Plath, K., and Guttman, M. (2015). The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 521, 232-236

Meissner, A., Mikkelsen, T.S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., Zhang, X., Bernstein, B.E., Nusbaum, C., Jaffe, D.B., Gnirke, A., Jaenisch, R., and Lander, E.S. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766-770

Meller, V.H. and Rattner, B.P. (2002). The roX genes encode redundant male-specific lethal transcripts required for targeting of the MSL complex. EMBO J 21, 1084-1091

Meller, V.H., Wu, K.H., Roman, G., Kuroda, M., and Davis, R. (1997). roX1 RNA paints the X chromosome of male Drosophila and is regulated by the dosage compensation system. Cell 88, 445-457

Mendenhall, E.M., Koche, R.P., Truong, T., Zhou, V.W., Issac, B., Chi, A.S., Ku, M., and Bernstein, B.E. (2010). GC-rich sequence elements recruit PRC2 in mammalian ES cells. PLoS Genet 6, e1001244

Mermoud, J.E., Costanzi, C., Pehrson, J.R., and Brockdorff, N. (1999). Histone macroH2A1.2 relocates to the inactive X chromosome after initiation and propagation of X-inactivation. J Cell Biol 147, 1399-1408

Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P., Brockman, W., Kim, T.K., Koche, R.P., Lee, W., Mendenhall, E., O'Donovan, A., Presser, A., Russ, C., Xie, X., Meissner, A., Wernig, M., Jaenisch, R., Nusbaum, C., Lander, E.S., and Bernstein, B.E. (2007). Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553-560

Minajigi, A., Froberg, J., Wei, C., Sunwoo, H., Kesner, B., Colognori, D., Lessing, D., Payer, B., Boukhali, M., Haas, W., and Lee, J.T. (2015). A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349

Mira-Bontenbal, H., and Gribnau, J. (2016). New Xist-Interacting Proteins in X-Chromosome Inactivation. Curr Biol 26, R338-342

Mitsui, K., Matsumoto, A., Ohtsuka, S., Ohtsubo, M., and Yoshimura, A. (1999). Cloning and characterization of a novel p21(Cip1/Waf1)-interacting zinc finger protein, ciz1. Biochem Biophys Res Commun 264, 457-464

Mohandas, T., Sparkes, R.S., and Shapiro, L.J. (1981). Reactivation of an inactive human X chromosome: evidence for X inactivation by DNA methylation. Science 211, 393-396

Moindrot, B., Cerase, A., Coker, H., Masui, O., Grijzenhout, A., Pintacuda, G., Schermelleh, L., Nesterova, T.B., and Brockdorff, N. (2015). A Pooled shRNA Screen Identifies Rbm15, Spen, and Wtap as Factors Required for Xist RNA-Mediated Silencing. Cell Rep 12, 562-572

Monfort, A., Di Minin, G., Postlmayr, A., Freimann, R., Arieti, F., Thore, S., and Wutz, A. (2015). Identification of Spen as a Crucial Factor for Xist Function through Forward Genetic Screening in Haploid Embryonic Stem Cells. Cell Rep 12, 554-561

Morey, L., Brenner, C., Fazi, F., Villa, R., Gutiérrez, A., Buschbeck, M., Nervi, C., Minucci, S., Fuks, F., and Di Croce, L. (2008). MBD3, a component of the NuRD complex, facilitates chromatin alteration and deposition of epigenetic marks. Mol Cell Biol 28, 5912-5923

Nakagawa, T., Kajitani, T., Togo, S., Masuko, N., Ohdan, H., Hishikawa, Y., Koji, T., Matsuyama, T., Ikura, T., Muramatsu, M., and Ito, T. (2008). Deubiquitylation of histone H2A activates transcriptional initiation via trans-histone cross-talk with H3K4 di- and trimethylation. Genes Dev 22, 37-49

Namekawa, S.H., Payer, B., Huynh, K.D., Jaenisch, R., and Lee, J.T. (2010). Two-step imprinted X inactivation: repeat versus genic silencing in the mouse. Mol Cell Biol 30, 3187-3205

Nekrasov, M., Wild, B., and Müller, J. (2005). Nucleosome binding and histone methyltransferase activity of Drosophila PRC2. EMBO Rep 6, 348-353

Nesterova, T.B., Slobodyanyuk, S.Y., Elisaphenko, E.A., Shevchenko, A.I., Johnston, C., Pavlova, M.E., Rogozin, I.B., Kolesnikov, N.N., Brockdorff, N., and Zakian, S.M. (2001). Characterization of the genomic Xist locus in rodents reveals conservation of overall gene structure and tandem repeats but rapid evolution of unique sequence. Genome Res 11, 833-849

Nesterova, T.B., Wei, G., Coker, H., Pintacuda, G., Bowness, J.S., Zhang, T., Almeida, M., Bloechl, B., Moindrot, B., Carter, E.J., Rodrigo, I.A., Pan, Q., Bi, Y., Song, C.X., and Brockdorff, N. (2018). Systematic allelic analysis defines the interplay of key pathways in X chromosome inactivation. bioRxiv, 477232

Nishibe, R., Watanabe, W., Ueda, T., Yamasaki, N., Koller, R., Wolff, L., Honda, Z., Ohtsubo, M., and Honda, H. (2013). CIZ1, a p21Cip1/Waf1-interacting protein, functions as a tumor suppressor in vivo. FEBS Lett 587, 1529-1535

Nora, E.P., Goloborodko, A., Valton, A.L., Gibcus, J.H., Uebersohn, A., Abdennur, N., Dekker, J., Mirny, L.A., and Bruneau, B.G. (2017). Targeted degradation of CTCF decouples local insulation of chromosome domains from genomic compartmentalization. Cell 169, 930-944.e22

Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., Gribnau, J., Barillot, E., Blüthgen, N., Dekker, J., and Heard, E. (2012). Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381-385

Norris, D.P., Brockdorff, N., and Rastan, S. (1991). Methylation status of CpG-rich islands on active and inactive mouse X chromosomes. Mamm Genome 1, 78-83

Nozawa, R.S., Nagao, K., Igami, K.T., Shibata, S., Shirai, N., Nozaki, N., Sado, T., Kimura, H., and Obuse, C. (2013). Human inactive X chromosome is compacted through a PRC2-independent SMCHD1-HBiX1 pathway. Nat Struct Mol Biol 20, 566-573

Ogawa, Y. and Lee, J.T. (2003). Xite, X-inactivation intergenic transcription elements that regulate the probability of choice. Mol Cell 11, 731-743

Ogawa, Y., Sun, B.K., and Lee, J.T. (2008). Intersection of the RNA interference and X-inactivation pathways. Science 320, 1336-1341

Ohno, S. and Hauschka, T.S. (1960). Allocycly of the X-chromosome in tumors and normal tissues. Cancer Res 20, 541-545

Ohno, S., Kaplan, W.D., and Kinosita, R. (1959). Formation of the sex chromatin by a single X-chromosome in liver cells of Rattus norvegicus. Exp Cell Res 18, 415-418

Okamoto, I., Otte, A.P., Allis, C.D., Reinberg, D., and Heard, E. (2004). Epigenetic dynamics of imprinted X inactivation during early mouse development. Science 303, 644-649

Oksuz, O., Narendra, V., Lee, C.H., Descostes, N., LeRoy, G., Raviram, R., Blumenberg, L., Karch, K., Rocha, P.P., Garcia, B.A., Skok, J.A., and Reinberg, D. (2018). Capturing the Onset of PRC2-Mediated Repressive Domain Formation. Mol Cell 70, 1149-1162 e1145

Oktaba, K., Gutiérrez, L., Gagneur, J., Girardot, C., Sengupta, A.K., Furlong, E.E.M., and Müller, J. (2008). Dynamic regulation by Polycomb group protein complexes controls pattern formation and the cell cycle in Drosophila. Dev Cell 15, 877-889

Papp, B. and Müller, J. (2006). Histone trimethylation and the maintenance of transcriptional ON and OFF states by trxG and PcG proteins. Genes Dev 20, 2041-2054

Pasini, D., Cloos, P.A.C., Walfridsson, J., Olsson, L., Bukowski, J.P., Johansen, J.V., Bak, M., Tommerup, N., Rappsilber, J., and Helin, K. (2010a). JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306-310

Pasini, D., Hansen, K.H., Christensen, J., Agger, K., Cloos, P.A.C., and Helin, K. (2008). Coordinated regulation of transcriptional repression by the RBP2 H3K4 demethylase and Polycomb-Repressive Complex 2. Genes Dev 22, 1345-1355

Pasini, D., Malatesta, M., Jung, H.R., Walfridsson, J., Willer, A., Olsson, L., Skotte, J., Wutz, A., Porse, B., Jensen, O.N., and Helin, K. (2010b). Characterization of an antagonistic switch between histone H3 lysine 27 methylation and acetylation in the transcriptional regulation of Polycomb group target genes. Nucleic Acids Res 38, 4958-4969

Pengelly, A.R., Copur, Ö., Jäckle, H., Herzig, A., and Müller, J. (2013). A histone mutant reproduces the phenotype caused by loss of histone-modifying factor Polycomb. Science 339, 698-699

Peng, J.C., Valouev, A., Swigut, T., Zhang, J., Zhao, Y., Sidow, A., and Wysocka, J. (2009). Jarid2/Jumonji coordinates control of PRC2 enzymatic activity and target gene occupancy in pluripotent cells. Cell 139, 1290-1302

Pengelly, A.R., Kalb, R., Finkl, K., and Müller, J. (2015). Transcriptional repression by PRC1 in the absence of H2A monoubiquitylation. Genes Dev 29, 1487-1492

Penny, G.D., Kay, G.F., Sheardown, S.A., Rastan, S., and Brockdorff, N. (1996). Requirement for Xist in X chromosome inactivation. Nature 379, 131-137

Peric-Hupkes, D., Meuleman, W., Pagie, L., Bruggeman, S.W.M., Solovei, I., Brugman, W., Gräf, S., Flicek, P., Kerkhoven, R.M., van Lohuizen, M., Reinders, M., Wessels, L., and van Steensel, B. (2010). Molecular maps of the reorganization of genome-nuclear lamina interactions during differentiation. Mol Cell 38, 603-613

Peters, A.H., O'Carroll, D., Scherthan, H., Mechtler, K., Sauer, S., Schöfer, C., Weipoltshammer, K., Pagani, M., Lachner, M., Kohlmaier, A., Opravil, S., Doyle, M., Sibilia, M., and Jenuwein, T. (2001). Loss of the Suv39h histone methyltransferase impairs mammalian heterochromatin and genome stability. Cell 107, 323-337

Pintacuda, G., Wei, G., Roustan, C., Kirmizitas, B.A., Solcan, N., Cerase, A., Castello, A., Mohammed, S., Moindrot, B., Nesterova, T.B., and Brockdorff, N. (2017). hnRNPK Recruits PCGF3/5-PRC1 to the Xist RNA B-Repeat to Establish Polycomb-Mediated Chromosomal Silencing. Mol Cell 68, 955-969 e910

Pinter, S.F., Sadreyev, R.I., Yildirim, E., Jeon, Y., Ohsumi, T.K., Borowsky, M., and Lee, J.T. (2012). Spreading of X chromosome inactivation via a hierarchy of defined Polycomb stations. Genome Res 22, 1864-1876

Plath, K., Fang, J., Mlynarczyk-Evans, S.K., Cao, R., Worringer, K.A., Wang, H., de la Cruz, C.C., Otte, A.P., Panning, B., and Zhang, Y. (2003). Role of histone H3 lysine 27 methylation in X inactivation. Science 300, 131-135

Plath, K., Talbot, D., Hamer, K.M., Otte, A.P., Yang, T.P., Jaenisch, R., and Panning, B. (2004). Developmentally regulated alterations in Polycomb repressive complex 1 proteins on the inactive X chromosome. J Cell Biol 167, 1025-1035

Poepsel, S., Kasinath, V., and Nogales, E. (2018). Cryo-EM structures of PRC2 simultaneously engaged with two functionally distinct nucleosomes. Nat Struct Mol Biol 25, 154-162

Pullirsch, D., Härtel, R., Kishimoto, H., Leeb, M., Steiner, G., and Wutz, A. (2010). The Trithorax group protein Ash2l and Saf-A are recruited to the inactive X chromosome at the onset of stable X inactivation. Development 137, 935-943

Ran, F.A., Hsu, P.D., Wright, J., Agarwala, V., Scott, D.A., and Zhang, F. (2013). Genome engineering using the CRISPR-Cas9 system. Nat Protoc 8, 2281-2308

Rao, S.S.P., Huang, S.C., Glenn St Hilaire, B., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D., Huang, X., Shamim, M.S., Shin, J., Turner, D., Ye, Z., Omer, A.D., Robinson, J.T., Schlick, T., Bernstein, B.E., Casellas, R., Lander, E.S., and Aiden, E.L. (2017). Cohesin loss eliminates all loop domains. Cell 171, 305-320.e24

Rao, S.S.P., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., and Aiden, E.L. (2014). A three-dimensional map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665-1680

Rastan, S. (1983). Non-random X-chromosome inactivation in mouse X-autosome translocation embryos--location of the inactivation centre. J Embryol Exp Morphol 78, 1-22

Rastan, S. and Robertson, E.J. (1985). X-chromosome deletions in embryo-derived (EK) cell lines associated with lack of X-chromosome inactivation. J Embryol Exp Morphol 90, 379-388

Reynolds, N., Salmon-Divon, M., Dvinge, H., Hynes-Allen, A., Balasooriya, G., Leaford, D., Behrens, A., Bertone, P., and Hendrich, B. (2012). NuRD-mediated deacetylation of H3K27 facilitates recruitment of Polycomb Repressive Complex 2 to direct gene repression. EMBO J 31, 593-605

Richardson, B.J., Czuppon, A.B., and Sharman, G.B. (1971). Inheritance of glucose-6-phosphate dehydrogenase variation in kangaroos. Nat New Biol 230, 154-155

Ridings-Figueroa, R., Stewart, E.R., Nesterova, T.B., Coker, H., Pintacuda, G., Godwin, J., Wilson, R., Haslam, A., Lilley, F., Ruigrok, R., Bageghni, S.A., Albadrani, G., Mansfield, W., Roulson, J.A., Brockdorff, N., Ainscough, J.F.X., and Coverley, D. (2017). The nuclear matrix protein CIZ1 facilitates localization of Xist RNA to the inactive X-chromosome territory. Genes Dev 31, 876-888

Riising, E.M., Comet, I., Leblanc, B., Wu, X., Johansen, J.V., and Helin, K. (2014). Gene silencing triggers Polycomb repressive complex 2 recruitment to CpG islands genome wide. Mol Cell 55, 347-360

Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., and Chang, H.Y. (2007). Functional demarcation of active and silent chromatin domains in human HOX loci by noncoding RNAs. Cell 129, 1311-1323

Rowley, M.J. and Corces, V.G. (2018). Organizational principles of 3D genome architecture. Nat Rev Genet 19, 789-800

Royce-Tolland, M.E., Andersen, A.A., Koyfman, H.R., Talbot, D.J., Wutz, A., Tonks, I.D., Kay, G.F., and Panning, B. (2010). The A-repeat links ASF/SF2-dependent Xist RNA processing with random choice during X inactivation. Nat Struct Mol Biol 17, 948-954

Russell, L.B. (1963). Mammalian X-chromosome action: inactivation limited in spread and region of origin. Science 140, 976-978

Ryba, T., Hiratani, I., Lu, J., Itoh, M., Kulik, M., Zhang, J., Schulz, T.C., Robins, A.J., Dalton, S., and Gilbert, D.M. (2010). Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res 20, 761-770

Sakaguchi, T., Hasegawa, Y., Brockdorff, N., Tsutsui, K., Sado, T., and Nakagawa, S. (2016). Control of chromosomal localization of Xist by hnRNP U family molecules. Dev Cell 39, 11-12

Saldanha, A.J. (2004). Java Treeview--extensible visualization of microarray data. Bioinformatics 20, 3246-3248

Sarma, K., Levasseur, P., Aristarkhov, A., and Lee, J.T. (2010). Locked nucleic acids (LNAs) reveal sequence requirements and kinetics of Xist RNA localization to the X chromosome. Proc Natl Acad Sci USA 107, 22196-22201

Schmitges, F.W., Prusty, A.B., Faty, M., Stützer, A., Lingaraju, G.M., Aiwazian, J., Sack, R., Hess, D., Li, L., Zhou, S., Bunker, R.D., Wirth, U., Bouwmeester, T., Bauer, A., Ly-Hartig, N., Zhao, K., Chan, H., Gu, J., Gut, H., Fischle, W., Müller, J., and Thomä, N.H. (2011). Histone methylation by PRC2 is inhibited by active chromatin marks. Mol Cell 42, 330-341

Schoeftner, S., Sengupta, A.K., Kubicek, S., Mechtler, K., Spahn, L., Koseki, H., Jenuwein, T., and Wutz, A. (2006). Recruitment of PRC1 function at the initiation of X inactivation independent of PRC2 and silencing. EMBO J 25, 3110-3122

Schuettengruber, B., Bourbon, H.M., Di Croce, L., and Cavalli, G. (2017). Genome regulation by Polycomb and Trithorax: 70 years and counting. Cell 171, 34-57

Schuettengruber, B., Ganapathi, M., Leblanc, B., Portoso, M., Jaschek, R., Tolhuis, B., van Lohuizen, M., Tanay, A., and Cavalli, G. (2009). Functional anatomy of polycomb and trithorax chromatin landscapes in Drosophila embryos. PLoS Biol 7, e13

Sharman, G.B. (1971). Late DNA replication in the paternally derived X chromosome of female kangaroos. Nature 230, 231-232

Silva, J., Mak, W., Zvetkova, I., Appanah, R., Nesterova, T.B., Webster, Z., Peters, A.H., Jenuwein, T., Otte, A.P., and Brockdorff, N. (2003). Establishment of histone h3 methylation on the inactive X chromosome requires transient recruitment of Eed-Enx1 polycomb group complexes. Dev Cell 4, 481-495

Simon, J.A., and Kingston, R.E. (2013). Occupying chromatin: Polycomb mechanisms for getting to genomic targets, stopping transcriptional traffic, and staying put. Mol Cell 49, 808-824

Simon, J., Chiang, A., Bender, W., Shimell, M.J., and O'Connor, M. (1993). Elements of the Drosophila bithorax complex that mediate repression by Polycomb group products. Dev Biol 158, 131-144

Simon, M.D., Pinter, S.F., Fang, R., Sarma, K., Rutenberg-Schoenberg, M., Bowman, S.K., Kesner, B.A., Maier, V.K., Kingston, R.E., and Lee, J.T. (2013). High-resolution Xist binding maps reveal two-step spreading during X-chromosome inactivation. Nature 504, 465-469

Smeets, D., Markaki, Y., Schmid, V.J., Kraus, F., Tattermusch, A., Cerase, A., Sterr, M., Fiedler, S., Demmerle, J., Popken, J., Leonhardt, H., Brockdorff, N., Cremer, T., Schermelleh, L., and Cremer, M. (2014). Three-dimensional super-resolution microscopy of the inactive X chromosome territory reveals a collapse of its active nuclear compartment harboring distinct Xist RNA foci. Epigenetics Chromatin 7, 8

Smola, M.J., Christy, T.W., Inoue, K., Nicholson, C.O., Friedersdorf, M., Keene, J.D., Lee, D.M., Calabrese, J.M., and Weeks, K.M. (2016). SHAPE reveals transcript-wide interactions, complex structural domains, and protein interactions across the Xist lncRNA in living cells. Proc Natl Acad Sci USA 113, 10322-10327

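Splinter, E., de Wit, E., Nora, E.P., Klous, P., van de Werken, H.J., Zhu, Y., Kaaij, L.J., van Ijcken, W., Gribnau, J., Heard, E., and de Laat, W. (2011). The inactive X chromosome adopts a unique three-dimensional conformation that is dependent on Xist RNA. Genes Dev 25, 1371-1383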

Srisawat, C., and Engelke, D.R. (2001). Streptavidin aptamers: affinity tags for the study of RNAs and ribonucleoproteins. RNA 7, 632-641

Starmer, J., and Magnuson, T. (2009). A new model for random X chromosome inactivation. Development 136, 1-10

Stock, J.K., Giadrossi, S., Casanova, M., Brookes, E., Vidal, M., Koseki, H., Brockdorff, N., Fisher, A.G., and Pombo, A. (2007). Ring1-mediated ubiquitination of H2A restrains poised RNA polymerase II at bivalent genes in mouse ES cells. Nat Cell Biol 9, 1428-1435

Straub, T., Grimaud, C., Gilfillan, G.D., Mitterweger, A., and Becker, P.B. (2008). The chromosomal high-affinity binding sites for the Drosophila dosage compensation complex. PLoS Genet 4, e1000302

Strom, A.R., Emelyanov, A.V., Mir, M., Fyodorov, D.V., Darzacq, X., and Karpen, G.H. (2017). Phase separation drives heterochromatin domain formation. Nature 547, 241-245

Struhl, G. and Akam, M. (1985). Altered distributions of Ultrabithorax transcripts in extra sex combs mutant embryos of Drosophila. EMBO J 4, 3259-3264

Sunwoo, H., Colognori, D., Froberg, J.E., Jeon, Y., and Lee, J.T. (2017). Repeat E anchors Xist RNA to the inactive X chromosomal compartment through CDKN1A-interacting protein (CIZ1). Proc Natl Acad Sci USA 114, 10654-10659

Sunwoo, H., Wu, J.Y., and Lee, J.T. (2015). The Xist RNA-PRC2 complex at 20-nm resolution reveals a low Xist stoichiometry and suggests a hit-and-run mechanism in mouse cells. Proc Natl Acad Sci USA 112, E4216-4225

Sural, T.H., Peng, S., Li, B., Workman, J.L., Park, P.J., and Kuroda, M.I. (2008). The MSL3 chromodomain directs a key targeting step for dosage compensation of the Drosophila melanogaster X chromosome. Nat Struct Mol Biol 15, 1318-1325

Tahir, T.A., Singh, H., and Brindle, N.P. (2014). The RNA binding protein hnRNP-K mediates post-transcriptional regulation of uncoupling protein-2 by angiopoietin-1. Cell Signal 26, 1379-1384

Takagi, N. and Sasaki, M. (1975). Preferential inactivation of the paternally derived X chromosome in the extraembryonic membranes of the mouse. Nature 256, 640-642

Tavares, L., Dimitrova, E., Oxley, D., Webster, J., Poot, R., Demmers, J., Bezstarosti, K., Taylor, S., Ura, H., Koide, H., Wutz, A., Vidal, M., Elderkin, S., and Brockdorff, N. (2012). RYBP-PRC1 complexes mediate H2A ubiquitylation at polycomb target sites independently of PRC2 and H3K27me3. Cell 148, 664-678

Thisted, T., Lyakhov, D.L., and Liebhaber, S.A. (2001). Optimized RNA targets of two closely related triple KH domain proteins, heterogeneous nuclear ribonucleoprotein K and alphaCP-2KL, suggest distinct modes of RNA recognition. J Biol Chem 276, 17484-17496

Tian, D., Sun, S., and Lee, J.T. (2010). The long noncoding RNA, Jpx, is a molecular switch for X chromosome inactivation. Cell 143, 390-403

Tie, F., Banerjee, R., Fu, C., Stratton, C.A., Fang, M., and Harte, P.J. (2016). Polycomb inhibits histone acetylation by CBP by binding directly to its catalytic domain. Proc Natl Acad Sci USA 113, E744-753

Tomonaga, T., and Levens, D. (1995). Heterogeneous nuclear ribonucleoprotein K is a DNA-binding transactivator. J Biol Chem 270, 4875-4881

Trojer, P. and Reinberg, D. (2007). Facultative heterochromatin: is there a distinctive molecular signature? Mol Cell 28, 1-13

Tsukamoto, T., Hashiguchi, N., Janicki, S.M., Tumbar, T., Belmont, A.S., and Spector, D.L. (2000). Visualization of gene activity in living cells. Nat Cell Biol 2, 871-878

Wake, N., Takagi, N., and Sasaki, M. (1976). Non-random inactivation of X chromosome in the rat yolk sac. Nature 262, 580-581

Wang, C.Y., Colognori, D.C., Sunwoo, H.S., and Lee, J.T. (in revision). Polycomb repressive complex 1 and SMCHD1 collaborate in the stepwise folding of the inactive X chromosome. Nat Commun

Wang, C.Y., Jegu, T., Chu, H.P., Oh, H.J., and Lee, J.T. (2018). SMCHD1 Merges Chromosome Compartments and Assists Formation of Super-Structures on the Inactive X. Cell 174, 406-421

Wang, H., Wang, L., Erdjument-Bromage, H., Vidal, M., Tempst, P., Jones, R.S., and Zhang, Y. (2004a). Role of histone H2A ubiquitination in Polycomb silencing. Nature 431, 873-878

Wang, J., Mager, J., Chen, Y., Schneider, E., Cross, J.C., Nagy, A., and Magnuson, T. (2001). Imprinted X inactivation maintained by a mouse Polycomb group gene. Nat Genet 28, 371-375

Wang, J., Syrett, C.M., Kramer, M.C., Basu, A., Atchison, M.L., and Anguera, M.C. (2016). Unusual maintenance of X chromosome inactivation predisposes female lymphocytes for increased expression from the inactive X. Proc Natl Acad Sci USA 113, E2029-E2038

Wang, L., Brown, J.L., Cao, R., Zhang, Y., Kassis, J.A., and Jones, R.S. (2004b). Hierarchical recruitment of Polycomb group silencing complexes. Mol Cell 14, 637-646

Wang, R., Taylor, A.B., Leal, B.Z., Chadwell, L.V., Ilangovan, U., Robinson, A.K., Schirf, V., Hart, P.J., Lafer, E.M., Demeler, B., Hinck, A.P., McEwen, D.G., and Kim, C.A. (2010). Polycomb group targeting through different binding partners of RING1B C-terminal domain. Structure 18, 966-975

Wang, X., Goodrich, K.J., Gooding, A.R., Naeem, H., Archer, S., Paucek, R.D., Youmans, D.T., Cech, T.R., and Davidovich, C. (2017). Targeting of Polycomb repressive complex 2 to RNA by short repeats of consecutive guanines. Mol Cell 65, 1056-1067

Welshons, W.J. and Russell, L.B. (1959). The Y-chromosome as the bearer of male determining factors in the mouse. Proc Natl Acad Sci USA 45, 560-566

West, J.D., Frels, W.I., Chapman, V.M., and Papaioannou, V.E. (1977). Preferential expression of the maternally derived X chromosome in the mouse yolk sac. Cell 12, 873-882

Wutz, A. (2011). RNA-mediated silencing mechanisms in mammalian cells. Prog Mol Biol Transl Sci 101, 351-376

Wutz, A. and Jaenisch, R. (2000). A shift from reversible to irreversible X inactivation is triggered during ES cell differentiation. Mol Cell 5, 695-705

Wutz, A., Rasmussen, T.P., and Jaenisch, R. (2002). Chromosomal silencing and localization are mediated by different domains of Xist RNA. Nat Genet 30, 167-174

Xiao, J., Uitti, R.J., Zhao, Y., Vemula, S.R., Perlmutter, J.S., Wszolek, Z.K., Maraganore, D.M., Auburger, G., Leube, B., Lehnhoff, K., and LeDoux, M.S. (2012). Mutations in CIZ1 cause adult onset primary cervical dystonia. Ann Neurol 71, 458-469

Yamada, N., Hasegawa, Y., Yue, M., Hamada, T., Nakagawa, S., and Ogawa, Y. (2015). Xist exon 7 contributes to the stable localization of Xist RNA on the inactive X-chromosome. PLoS Genet 11, e1005430

Yildirim, E., Kirby, J.E., Brown, D.E., Mercier, F.E., Sadreyev, R.I., Scadden, D.T., and Lee, J.T. (2013). Xist RNA is a potent suppressor of hematologic cancer in mice. Cell 152, 727-742

Yildirim, E., Sadreyev, R.I., Pinter, S.F., and Lee, J.T. (2011). X-chromosome hyperactivation in mammals via nonlinear relationships between chromatin states and transcription. Nat Struct Mol Biol 19, 56-61

Yue, M., Ogawa, A., Yamada, N., Charles Richard, J.L., Barski, A., and Ogawa, Y. (2017). Xist RNA repeat E is essential for ASH2L recruitment to the inactive X and regulates histone modifications and escape gene expression. PLoS Genet 13, e1006890

Zhang, L.F., Huynh, K.D., and Lee, J.T. (2007). Perinucleolar targeting of the inactive X during S phase: Evidence for a role in the maintenance of silencing. Cell 129, 693-706

Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song, J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell 40, 939-953

Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J., and Lee, J.T. (2008). Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750-756

Żylicz, J.J., Bousard, A., Žumer, K., Dossin, F., Mohammad, E., da Rocha, S.T., Schwalb, B., Syx, L., Dingli, F., Loew, D., Cramer, P., and Heard, E. (2019). The implication of early chromatin changes in X chromosome inactivation. Cell 176, 182-197

Appendix

Table S1 Sanger sequencing information

Sanger sequencing of Xist deletions and knock-in/out cells. For Xist deletions, both genomic DNA (gDNA) and cDNA were sequenced to verify: (1) genotypes correspond to Xi (Xist expressed) rather than Xa (Xist not expressed), and (2) appropriate splicing occurs. Where available, SNPs were also used to distinguish Xi (mus) and Xa (cas). Deleted regions are shown in red with gRNA target sites underlined. Introns and exons are shown in lower and upper case, respectively, with exon-exon junctions separated by a vertical bar (|). Cryptic splice donor/acceptor sites are shown in bold, and insertion/point mutations are highlighted in yellow.

For CIZ1-EGFP-3xHA KI, EGFP is shown in green and 3xHA in blue.
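
The case convention used in this table (lower-case introns, upper-case exons, and a vertical bar at each exon-exon junction in the cDNA) lends itself to a simple consistency check between a gDNA entry and its cDNA entry. The following is only a minimal sketch of such a check and was not part of the original analysis. The function name `splice` and the toy sequence are hypothetical, and the sketch assumes every upper-case run is a complete exon segment and that splicing uses the annotated junctions only (no cryptic sites).

```python
import re

def splice(gdna: str) -> str:
    """Return the spliced sequence implied by the case convention.

    Upper-case runs are treated as exon segments, lower-case runs as
    introns; exon segments are joined with '|' to mark each junction.
    (Hypothetical helper for illustration only.)
    """
    # Remove the spaces and line breaks left over from text wrapping.
    seq = "".join(gdna.split())
    # Each maximal run of upper-case bases is taken as one exon segment.
    exons = re.findall(r"[ACGTN]+", seq)
    return "|".join(exons)

# Toy input (not a real Xist sequence): three exon segments, two introns.
toy = "ACAgtaagt ttccagGGAT GAAAGgtatt cacagGGACA"
print(splice(toy))  # ACA|GGATGAAAG|GGACA
```

Comparing the output of such a helper with the reported cDNA string would flag any entry in which the observed splicing pattern departs from the one implied by the gDNA annotation.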

Xist deletion MEF cells

∆RepA Xi+/- (clone X31)

Genotype 1 (10/10 colonies) gDNA:

TTGACGTTTTGATATGTTCTGGTAAGATTTTTTTTTTGACATGTCCTCCATACTTTTTGATATTTGTAAT ATTTTCAGTCAATTTTTCATTTTTAAGGAATATTTCTTTGTTGTGCCTTTTGGTTGATACTTGTGTGTGT ATGGTGGACTTACCTTTCTTTCATTGTTTATATATTCTTGCCCATCGGGGCCACGGATACCTGTGTGTCC TCCCCGCCATTCCATGCCCAACGGGGTTTTGGATACTTACCTGCCTTTTCATTCTTTTTTTTTCTTATTA TTTTTTTTTCTAAACTTGCCCATCTGGGCTGTGGATACCTGCTTTTATTCTTTTTTTCTTCTCCTTAGCC CATCGGGGCCATGGATACCTGCTTTTTGTAAAAAAAAAAAAAAAAAAAAAAAAACCTTTCTCGGTCCATC GGGACCTCGGATACCTGCGTTTAGTCTTTTTTTCCCATGCCCAACGGGGCCTCGGATACCTGCTGTTATT ATTTTTTTTTCTTTTTCTTTTGCCCATCGGGGCTGTGGATACCTGCTTTAAATTTTTTTTTTCACGGCCC AACGGGGCGCTTGGTGGATGGAAATATGGTTTTGTGAGTTATTGCACTACCTGGAATATCTATGCCTCTT ATTTGCGTGTACTGTTGCTGCTGATCGTTTGGTGCTGTGTGAGTGAACCTATGGCTTAGAAAAACGACTT TGCTCTTAAACTGAGTGGGT cDNA: See Figure 2.2

∆RepA Xi-/- (clone X9)

Genotype 1 (5/12 colonies) gDNA:

TTGACGTTTTGATATGTTCTGGTAAGATTTTTTTTTTGACATGTCCTCCATACTTTTTGATATTTGTAAT ATTTTCAGTCAATTTTTCATTTTTAAGGAATATTTCTTTGTTGTGCCTTTTGGTTGATACTTGTGTGTGT

ATGGTGGACTTACCTTTCTTTCATTGTTTATATATTCTTGCCCATCGGGGCCACGGATACCTGTGTGTCC TCCCCGCCATTCCATGCCCAACGGGGTTTTGGATACTTACCTGCCTTTTCATTCTTTTTTTTTCTTATTA TTTTTTTTTCTAAACTTGCCCATCTGGGCTGTGGATACCTGCTTTTATTCTTTTTTTCTTCTCCTTAGCC CATCGGGGCCATGGATACCTGCTTTTTGTAAAAAAAAAAAAAAAAAAAAAAAAACCTTTCTCGGTCCATC GGGACCTCGGATACCTGCGTTTAGTCTTTTTTTCCCATGCCCAACGGGGCCTCGGATACCTGCTGTTATT ATTTTTTTTTCTTTTTCTTTTGCCCATCGGGGCTGTGGATACCTGCTTTAAATTTTTTTTTTCACGGCCC AACGGGGCGCTTGGTGGATGGAAATATGGTTTTGTGAGTTATTGCACTACCTGGAATATCTATGCCTCTT ATTTGCGTGTACTGTTGCTGCTGATCGTTTGGTGCTGTGTGAGTGAACCTATGGCTTAGAAAAACGACTT TGCTCTTAAACTGAGTGGGT cDNA: See Figure 2.2

Genotype 2 (7/12 colonies) gDNA:

TTGACGTTTTGATATGTTCTGGTAAGATTTTTTTTTTGACATGTCCTCCATACTTTTTGATATTTGTAAT ATTTTCAGTCAATTTTTCATTTTTAAGGAATATTTCTTTGTTGTGCCTTTTGGTTGATACTTGTGTGTGT ATGGTGGACTTACCTTTCTTTCATTGTTTATATATTCTTGCCCATCGGGGCCACGGATACCTGTGTGTCC TCCCCGCCATTCCATGCCCAACGGGGTTTTGGATACTTACCTGCCTTTTCATTCTTTTTTTTTCTTATTA TTTTTTTTTCTAAACTTGCCCATCTGGGCTGTGGATACCTGCTTTTATTCTTTTTTTCTTCTCCTTAGCC CATCGGGGCCATGGATACCTGCTTTTTGTAAAAAAAAAAAAAAAAAAAAAAAAACCTTTCTCGGTCCATC GGGACCTCGGATACCTGCGTTTAGTCTTTTTTTCCCATGCCCAACGGGGCCTCGGATACCTGCTGTTATT ATTTTTTTTTCTTTTTCTTTTGCCCATCGGGGCTGTGGATACCTGCTTTAAATTTTTTTTTTCACGGCCC AACGGGGCGCTTGGTGGATGGAAATATGGTTTTGTGAGTTATTGCACTACCTGGAATATCTATGCCTCTT ATTTGCGTGTACTGTTGCTGCTGATCGTTTGGTGCTGTGTGAGTGAACCTATGGCTTAGAAAAACGACTT TGCTCTTAAACTGAGTGGGT cDNA: See Figure 2.2

∆RepF Xi+/- (clone 19)

Genotype 1 (11/11 colonies) gDNA:

GTTCAGGGCGTGGAGAGCCCGCGTCCGCCATTATGGCTTCTGCGTGATACGGCTATTCTCGAGCCAGTTA CGCCAAGAATTAGGACACCGAGGAGCACAGCGGACTGGATAAAAGCAACCAATTGCGCTGCGCTAGCTAA AGGCTTTCTTTATATGTGCGGGGTTGCGGGATTCGCCTTGATTTGTGGTAGCATTTGCGGGGTTGTGCTA GCCGGAAGTAGAAAGCCAAGGAGTGCTCGTATTAGTGTGCGGTGTTGCGCGGAAGCCGCAGAGGACTAGG GGATAGGGCTCAGCGTGGGTGTGGGGATTGGGCAGGGTGTGTGTGCATATGGACCCCTGGCGCGGTCCCC CGTGGCTTTAAGGGCTGCTCAGAAGTCTATAAAATGGCGGCTCGGGGGCTCCACCCGAGGCTCGACAGCC CAATCTTTGTTCTGGTGTGTAGCAATGGATTATAGGACATTTAGGTCGTACAGGAAAAGATGGCGGCTCA AGTTCTTGGTGCGGTATAACGCAAAGGGCTTTGTGTGTCACATGTCAGCTTCATGTCTGAGTTAGCCTGG AGAGGTGGCACATGCTCTTGAATGTGTCTAAGATGGCGGAAGTCATGTGACCTGCCCTCTAGTGGTTTCT TTCAGTGATTTTTTTTTTGGCGGGCTTTAGCTACTTGGCGGGCTTTGCCCGAGGGTACACTTGGTGCATT ATGGTAGGGTGTGGTTGGTCCTACCTTGTGCCACTCGAAGCTGAGGCAAGGCTAAGTGGAAGTGTTGGTT GCCACTTGACGTAACTCGTCAGAAATGGGCACAAGTGTGAAAGTGTTGGTGTTTGCTTGACTTCCAGTTA

GAAATGTGCATTATTGCTTGGTGGCCAGGATGGAATTAGACTGTGATGAGTCACTGTCCCATAAGGACGT GAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTCGAAGTGCCTGCCCAGGTCGGGAGA GCGCATGCTTGCAATTCTAACACTG cDNA: As expected from gDNA

∆RepF Xi-/- (clone 6)

Genotype 1 (20/20 colonies) gDNA:

GTTCAGGGCGTGGAGAGCCCGCGTCCGCCATTATGGCTTCTGCGTGATACGGCTATTCTCGAGCCAGTTA CGCCAAGAATTAGGACACCGAGGAGCACAGCGGACTGGATAAAAGCAACCAATTGCGCTGCGCTAGCTAA AGGCTTTCTTTATATGTGCGGGGTTGCGGGATTCGCCTTGATTTGTGGTAGCATTTGCGGGGTTGTGCTA GCCGGAAGTAGAAAGCCAAGGAGTGCTCGTATTAGTGTGCGGTGTTGCGCGGAAGCCGCAGAGGACTAGG GGATAGGGCTCAGCGTGGGTGTGGGGATTGGGCAGGGTGTGTGTGCATATGGACCCCTGGCGCGGTCCCC CGTGGCTTTAAGGGCTGCTCAGAAGTCTATAAAATGGCGGCTCGGGGGCTCCACCCGAGGCTCGACAGCC CAATCTTTGTTCTGGTGTGTAGCAATGGATTATAGGACATTTAGGTCGTACAGGAAAAGATGGCGGCTCA AGTTCTTGGTGCGGTATAACGCAAAGGGCTTTGTGTGTCACATGTCAGCTTCATGTCTGAGTTAGCCTGG AGAGGTGGCACATGCTCTTGAATGTGTCTAAGATGGCGGAAGTCATGTGACCTGCCCTCTAGTGGTTTCT TTCAGTGATTTTTTTTTTGGCGGGCTTTAGCTACTTGGCGGGCTTTGCCCGAGGGTACACTTGGTGCATT ATGGTAGGGTGTGGTTGGTCCTACCTTGTGCCACTCGAAGCTGAGGCAAGGCTAAGTGGAAGTGTTGGTT GCCACTTGACGTAACTCGTCAGAAATGGGCACAAGTGTGAAAGTGTTGGTGTTTGCTTGACTTCCAGTTA GAAATGTGCATTATTGCTTGGTGGCCAGGATGGAATTAGACTGTGATGAGTCACTGTCCCATAAGGACGT GAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTCGAAGTGCCTGCCCAGGTCGGGAGA GCGCATGCTTGCAATTCTAACACTG cDNA: As expected from gDNA

∆RepB Xi+/- (clone 8)

Genotype 1 (13/13 colonies) gDNA:

TGAGTCACTGTCCCATAAGGACGTGAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTC GAAGTGCCTGCCCAGGTCGGGAGAGCGCATGCTTGCAATTCTAACACTGAAGTGTTGGATGATGTCGGAT CCGATTCGAGAGACCGAGGCTGCGGGTTCTTGGTCGATGTAAATCATTGAAACCTCACCTATTAAAAGAA AGAAAAGTATCTAAGGCCATTTCAAGGACATTTGACTCATCCGCTTGCGTTCATAGTCTCTTACAGTGCT CTATACGTGGCGGTGCAAACTAAAACTCAGCCCGTTCCATTCCTTTGTATTGTTCAGTGGCTAGTCTACT TACACCTTGGCCTCTGATTTAGCCAGCACTGATCTCAAGCGGTTCTCTAAGCCTACTGGGTATAAGTGGT GACTTTGGCCAGAGTCATAGTGGATCACAAATCACTGGTGAAGAGGTAGAATCCTACCTTCTTCCAAAAT CTACCCCATGACTATTGCTGGGGTTGCATTTTGATTTCAATGAATATTTTGGATGCCAACGACACGTCTG ATAGTGTGCTTTGCTAGTGTTTGAATTTAAAACCGAAGTGATTGTTTTCAAAATGTATTTACGGATTTGC TTACTTGTTGAATTCATTTTAATTACCTTTAGTGAATTGTTACTTTGGAGTCCTTAAAGTTTTCAATAAT TTTTTTGGCAGATGATACTCAAATTACTTGGCACTTAAATGTACTTTCTTTCAAACTCATCCACCGAGCT

ACTCTTCAAATTTTTAAGTCTTATAACACAGATACTGTTAATGTAAAGTGAACATTATGACTGGATGTCA GGAGTATTTGAGGTTCTATACCAGTTCAGGCTTTGCTTTTGTTGCTATTGTTGATGCTATATTGACTAAT GGTTTTACTTGTCAGCAAGAGCCTTGAATTGTAATGCTCTGTGTCCTCTATCAGACTTACTGTTATAATA GTAATATTAAGGCCTACATTTCAACTTTCTGTGTGTTCTTGCCTTTATGGCATCTAGATTCTCCTCAAGA CTCAGCAAATAGTGCTGCTGCTATTGCTGCCCCAGCCCCAGGCCCAGCCCCAGCCCCTGCCCCAGCCCCA GCCCCAGCCCCTGCCCCAGCCCCAGCCCCTGCCCCTGCCCCAGCCCCTGCCCCAGCCCCAGCCCCAGCCC CTACCCCTGCCCCTGCCCCTGCCCCACCCAACCAACCCAATCCAGTCCAGCCCTGCCCCAGCCCAGTCCT AGCCCCAGGCCCAGATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCG cDNA: As expected from gDNA

∆RepB Xi-/- (clone 22)

Genotype 1 (12/20 colonies) gDNA:

TGAGTCACTGTCCCATAAGGACGTGAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTC GAAGTGCCTGCCCAGGTCGGGAGAGCGCATGCTTGCAATTCTAACACTGAAGTGTTGGATGATGTCGGAT CCGATTCGAGAGACCGAGGCTGCGGGTTCTTGGTCGATGTAAATCATTGAAACCTCACCTATTAAAAGAA AGAAAAGTATCTAAGGCCATTTCAAGGACATTTGACTCATCCGCTTGCGTTCATAGTCTCTTACAGTGCT CTATACGTGGCGGTGCAAACTAAAACTCAGCCCGTTCCATTCCTTTGTATTGTTCAGTGGCTAGTCTACT TACACCTTGGCCTCTGATTTAGCCAGCACTGATCTCAAGCGGTTCTCTAAGCCTACTGGGTATAAGTGGT GACTTTGGCCAGAGTCATAGTGGATCACAAATCACTGGTGAAGAGGTAGAATCCTACCTTCTTCCAAAAT CTACCCCATGACTATTGCTGGGGTTGCATTTTGATTTCAATGAATATTTTGGATGCCAACGACACGTCTG ATAGTGTGCTTTGCTAGTGTTTGAATTTAAAACCGAAGTGATTGTTTTCAAAATGTATTTACGGATTTGC TTACTTGTTGAATTCATTTTAATTACCTTTAGTGAATTGTTACTTTGGAGTCCTTAAAGTTTTCAATAAT TTTTTTGGCAGATGATACTCAAATTACTTGGCACTTAAATGTACTTTCTTTCAAACTCATCCACCGAGCT ACTCTTCAAATTTTTAAGTCTTATAACACAGATACTGTTAATGTAAAGTGAACATTATGACTGGATGTCA GGAGTATTTGAGGTTCTATACCAGTTCAGGCTTTGCTTTTGTTGCTATTGTTGATGCTATATTGACTAAT GGTTTTACTTGTCAGCAAGAGCCTTGAATTGTAATGCTCTGTGTCCTCTATCAGACTTACTGTTATAATA GTAATATTAAGGCCTACATTTCAACTTTCTGTGTGTTCTTGCCTTTATGGCATCTAGATTCTCCTCAAGA CTCAGCAAATAGTGCTGCTGCTATTGCTGCCCCAGCCCCAGGCCCAGCCCCAGCCCCTGCCCCAGCCCCA GCCCCAGCCCCTGCCCCAGCCCCAGCCCCTGCCCCTGCCCCAGCCCCTGCCCCAGCCCCAGCCCCAGCCC CTACCCCTGCCCCTGCCCCTGCCCCACCCAACCAACCCAATCCAGTCCAGCCCTGCCCCAGCCCAGTCCT AGCCCCAGGCCCAGATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCG cDNA: As expected from gDNA

Genotype 2 (8/20 colonies) gDNA:

TGAGTCACTGTCCCATAAGGACGTGAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTC GAAGTGCCTGCCCAGGTCGGGAGAGCGCATGCTTGCAATTCTAACACTGAAGTGTTGGATGATGTCGGAT CCGATTCGAGAGACCGAGGCTGCGGGTTCTTGGTCGATGTAAATCATTGAAACCTCACCTATTAAAAGAA AGAAAAGTATCTAAGGCCATTTCAAGGACATTTGACTCATCCGCTTGCGTTCATAGTCTCTTACAGTGCT CTATACGTGGCGGTGCAAACTAAAACTCAGCCCGTTCCATTCCTTTGTATTGTTCAGTGGCTAGTCTACT

TACACCTTGGCCTCTGATTTAGCCAGCACTGATCTCAAGCGGTTCTCTAAGCCTACTGGGTATAAGTGGT GACTTTGGCCAGAGTCATAGTGGATCACAAATCACTGGTGAAGAGGTAGAATCCTACCTTCTTCCAAAAT CTACCCCATGACTATTGCTGGGGTTGCATTTTGATTTCAATGAATATTTTGGATGCCAACGACACGTCTG ATAGTGTGCTTTGCTAGTGTTTGAATTTAAAACCGAAGTGATTGTTTTCAAAATGTATTTACGGATTTGC TTACTTGTTGAATTCATTTTAATTACCTTTAGTGAATTGTTACTTTGGAGTCCTTAAAGTTTTCAATAAT TTTTTTGGCAGATGATACTCAAATTACTTGGCACTTAAATGTACTTTCTTTCAAACTCATCCACCGAGCT ACTCTTCAAATTTTTAAGTCTTATAACACAGATACTGTTAATGTAAAGTGAACATTATGACTGGATGTCA GGAGTATTTGAGGTTCTATACCAGTTCAGGCTTTGCTTTTGTTGCTATTGTTGATGCTATATTGACTAAT GGTTTTACTTGTCAGCAAGAGCCTTGAATTGTAATGCTCTGTGTCCTCTATCAGACTTACTGTTATAATA GTAATATTAAGGCCTACATTTCAACTTTCTGTGTGTTCTTGCCTTTATGGCATCTAGATTCTCCTCAAGA CTCAGCAAATAGTGCTGCTGCTATTGCTGCCCCAGCCCCAGGCCCAGCCCCAGCCCCTGCCCCAGCCCCA GCCCCAGCCCCTGCCCCAGCCCCAGCCCCTGCCCCTGCCCCAGCCCCTGCCCCAGCCCCAGCCCCAGCCC CTACCCCTGCCCCTGCCCCTGCCCCACCCAACCAACCCAATCCAGTCCAGCCCTGCCCCAGCCCAGTCCT AGCCCCAGGCCCAGATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCG cDNA: As expected from gDNA

∆RepC Xi+/- (clone 14)

Genotype 1 (10/10 colonies) gDNA:

ATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCGAATCTTCATTGATTCAGTGCTAAA ATGCAGTGTCCATCACTCAGCCTATAAGACTGAGACAGCCCATCTATACCCCCTCCATACTGACTTCTAG AGTCATGGAATTTCACTTAATGCATAGAATCGTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAG ACTGAGATAGCCCATCTATACCCCCTCCATACTGACTTACAGAGTCATGGAGTTTCACTTAATGCATGCA GTCCTATTGCTAAAATGCAGTGCCCATAACTCAGCCTATAAGACTGAGATAGCCCATTTATACCCCATAC CCCCTCCATACTGACTTCTAGGGTCATGGAATTTCACTTAATACATAGAATCGTATTGCTAAAATGCAGT GTCCATCACTCAGTCTATAAGACTGAGATATCCCTATGTATACCCCATACTCCCTCCATACTGACTTCCA GAGTCATAGAATTTCACTTTGCATACGGTCCTATTGCTAAAATGCAGTGTCCATCACTCAGTCTATAAGA CTGAGATATCCCTATGTATACCCCATACTCCCTCCATACTGACTTCCAGAGTCATAGAATTTCACTTTGC ATACGGTCCTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAGACTGAGATAGCCCATCTATACCC CCTCCATACTGACTTCCAGAGTCATGGAATTTCACTTAATGCATGCAGTCCTATTGCTAAAATGCAGTGC CCATCACTCAGCCTATAAGACTGAGATAGCCCATCTATACCCCATACCCCCTCCATACTGACTTCCAGAG TCATGGAATTTCACTTAATGCATGCAGTCCTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAGAC TGAGATAGCCCATCTATACCCACTCCATACTGACTTCCAGAGTCATGGAATTTCACTTAATGCATGCAGT CCTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAGACTGAGATAGCCCATCTATACCCACTCCAT ACTGACTTCCAGAGTCATGGAGTTTCACTTAATGCATGCAGTCCTATTGCTAAAATGCAGTGCCCATAAC TCAGCCTATAAGACTGAGATAGCCCATTTATACCCCATACCCCCTCCATACTGACTTCTAGGGTCATGGA ATTTCACTTAATGCATAGAATCGTATTGCTAAAATGCAGTGTCCATTACTCAGCCTATAAGACTGAGATA TCCCTATGTATACCCCATACCCCCTCCATACTGACTTCCAGAGACATAGAATTTCACTTTGCATACGGTC CTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAGACTGAGATATCCCTATCTATACCCTCTACCC CCTCCATACTGACTTCCAGAGTCATGGAATTTCACATAATGTATAGATTTCTATTGCTAAAATGCAGTGC CCATAACTCAGCCTATAAGACTGAGATAGCCCATCTATACCCCCTCCATACTGAGTTCCAGAGTCATGGA ATTTCACTTAATGCATAGAATCGTATTGCTAAAATGCAGTGCCCATCACTCAGCCTATAAGACTGAGCCC ATCTATACCCCATACCCCCTCCATACTGACTTCCAGAGTCATGGAATTTCACTTTGCATACAGTCCTACT TTACTTGTCCATGGACAAGTAAACAAAGAACTCTTGTCCTTCATGTTAATCAAGATACACCAATCAAACA AGAGTTTTATATCAGAGACTTGCCATGGAGGTATCATCTCTC

cDNA: As expected from gDNA

∆C-D Xi+/- (clone 22)

Genotype 1 (11/11 colonies) gDNA:

TTACTTGTCCATGGACAAGTAAACAAAGAACTCTTGTCCTTCATGTTAATCAAGATACACCAATCAAACA AGAGTTTTATATCAGAGACTTGCCATGGAGGTATCATCTCTCAAGTCTCCTTTCCTTTAAGGAAAGAAAA CCATTCTGTCATTGCTGTAGTAGTCACAGTCCCAAGTTTCTAAGCAGTGTTCAGTCGTCTTTTCTCATGT ATTACCTTGAGTACTGAATAATTCTGTCAGAAATATTTTGTCCATTGGATTAGACTTTAGCTAGTCCAGC CCTGTGTGCATTTAGCAAAGGGGCAAACACAGGTCTGTTATCAGACAGTTAAAGTGCTCAGTCCCAATTT TCAAGGCATTGGCCATTAAAGGGGGTAGAATACTATATACTGTTGGCATGCTGTCATGGGTGCTATCGCC CCAGGTCACATCTTTCTAACTGATGGAGATACATTTATTTGCTCATGATATTGTATACTAGTCTCACATG CTTTCTTATTTCAGCCAAAAACCTCTGCACTGGAACATTTTATGTGGATAATCCTGACTAGGAATTGAGT CTTTTCTCAAGGTCCTAATACTACCCTTGCTTTATGTAAAGAGGGTGCTGATTACTTAATGCCTCTTACA CAATTGTGCAAAATTGCAGTTGTTCAAGTCCCCTTCTGTTAGTAACCAAGATCCCATACCCTCATACCCT AATGGGTGACAATCAAGGGTGCCAACCAATGAGACCACTTCTCTGTTCTGGTCTTTCTGCTGTGCTGGGG AATCAAACCTTGAGTCTTGTGTACGCTAGTAAAGCACTGTCATAGAGCTACAGCCCCACCGTGTGGTGGT TTGAGAGAACAGCCTCTTATGTAGCCTGGGCTGGGCGGGACTTACAGGCATTGCCACCTGTAATGTAAAC ATATTTGTGCCTGTTGTGTGCACAGCTGCATTTGTCCCTCTTCCTAAGCATTGGATAAAGAAACCAAACT AAGTCAAGTCATTTTGTTGGTAATCAAGAAGACCTTTGATCTGTCCTGTTTTTAACTTCCAGGCTGGCCT GGAACTTAGCATATAACCCAGGCTAGCCTTGAGCTCAGGATCTAGCCTGCGTTTAACAAGTGTTGGCATA TCTGGTTCCTACCACTATGCCCTGCATGCAGTCTTTCATATTGTGAATGTGCATATGTCATTTCACTGTA GTAATCTGCATCTGGTGAAGACTTATTTGTATTGCAGCAGTATTTAAGATCCTTAACATAGTAAATGTGC ACAGTGTTAACTCTATTGTACATATTCTCATGTCCACAGTTGTGCCTTTTAGATCAGGACTCCTGTACTT AGCAAAGCAAAGAG cDNA: As expected from gDNA

∆RepD Xi+/- (clone 11)

Genotype 1 (9/9 colonies) gDNA:

TAAATGTGCACAGTGTTAACTCTATTGTACATATTCTCATGTCCACAGTTGTGCCTTTTAGATCAGGACT CCTGTACTTAGCAAAGCAAAGAGGCTCACTAATATAAAGCTTCTTTCATGAGACTATAGATTGAAACGAT TCCAATACGGTCAATGGTCCTTCAAGGTAAGACTTCTGTCTCTGATCATTCATATCCTCTTTGCTTTATG GAATTATGTATGTGCTGTGCACTTGAAACCCCTTCCTCAAACTATTTATGTACATACTGGCAATTTTAGT AGGATCAATTTTACTCTTAACTTTGAAGTACAGAAGTGGTGTTGACCTATAAGGTCCCATTTTGTGGCTT GCTAATAATAATGACTGATTGTAGTAGGCCTTTTCTGTTCACTACAGAAGGAAACCTGAACAGCGTAAAA CTGTAATGGCCATAAACATGTACCTTGCATATTAGTATGCATTTACTGCACACATCTCATTCCATTTGGA TACGATCCTACTCTCAAACCCTTTTGCAGTACAGCAAGGGTCACTAATCTTTTGGCTTCTTCATCTTCCT GGACACTGGATAAGGCTGTCCCCTCCTTTCCACTCTTTAATTTCCAGGACTATTACTTTAAAGACTTAAT ATTTGCATAAAGGATGGGGTTTTTAATTGATAACATGTCCCTTGAACATTAATGTATATAACAGGGACAT GATCCATTCATTTTAATAAAAATACTTGGCCAGTTAATGTGTAAAATTACACTTATCCACAACCTTATTA CTTTTCGGACCATTGTATCTCTTGCACTCCTGCAAGGGATACCGTTTATCTCCCAAGGTCCCTGCTAGTG

GACCATTAATATACAGTGAATCTTCCTTTGTCTTTGCCAGTAAACAAAGGCCATACTCCTTCGCCTTTCA TTTGCACTATATCAGGATATGCTGATCAACAAGGCCGCATTCTTTTGGACTGTTATCATATATTAAATGT ATGCGTATGCACTGCCACCTGCTCTGTGCACTTGAAAGGATCCCACTCACTTCCTTAGCACCTTCAGCAG GAAGTGATAATAAGCTCAAGACTTTCATTTGGAAAGTTCACATGTCTAAGCACTTCTCTAAGAACTACTG TACCCTCTTCTCCGCTTTAAAGCAGAAAGAGGGTTGTACGAAGTGCTCTTCATTTGGACTTAAGTGCATT AATGCAGTTAG cDNA: As expected from gDNA

∆Ex1 3’ Xi+/- (clone 3)

Genotype 1 (13/13 colonies) gDNA:

CACATGTCTAAGCACTTCTCTAAGAACTACTGTACCCTCTTCTCCGCTTTAAAGCAGAAAGAGGGTTGTA CGAAGTGCTCTTCATTTGGACTTAAGTGCATTAATGCAGTTAGTTGTCCATCATTACCTTTGGAGTTGGA TTTTACATCCTTGTACTCTTTTGACACCAGAGGCATATTAATTATTTCTGAGCACTTCTCTTGTCAATAT TAATCTGTACCCTTACACATATGACCTGTGCGGCAGCAAAGGTTCTGAAATGCCTACCTTTTGACTGGGG CTGCTGAGTGGTAGTAACTATTAGTAACCTCAGCATTTGGATGATTACTATGCAAAAATGTCAAGGACCT GTGTGCTCTCTTTGCATACCATCAAGGCTACTGAGTCCCAGAATTAATTGCTAAGTTATGCGTATTTATA ACTATGAATGTCTGGAATATTTTGTCCCCTTTACATTATTGCAGAGGTTGCTGAGCCCCCGAAACTACCC GGTACTGTCAATGAGCACAGGGGCTCTGACGAATGACCTGCTCTCTTCCTTAAACTGATTTTGGGACTCT TAATAGGCACAATGGCAGTTCTGGATGGTTTATTTTCTACTCCAACTTGAGCAAATCCCCTGCTAGTTTC CCAATGATATAATAAAGTACAGCAGTATGTACACCCAACAATGACCCGGATTTCGACCCTTTTTGCATTG CTTTAATATATACAATCCTAAATAGTCACAATCTCACACTTTATAGTGTTCCTTTTGCCCGGCCTCTAGT TTGTCCATTGACCACTTTTCTGAATCACTAATTCTCACAAACCCATCATTAAGGAAGAGTTTGTGCCCTT TCTCAATTCCATCATGCCATCCCTTTTGCCTCTTTGTTTGAACAGTATTGACTGGGCAAAGCCCTTCTCT TGACTTAAAGTCAACAACACCAGTTTACTCACTTCATATGGCTACAGTGTCTCAGTTGCCTTCTCCTTGC TCCCACTGAACAGAGACACCTCGAATTCTTACATTATTCTGGGTAATGTTAATTACCCCAAACACCCTAT GTGTCATTAATAAATTTTGGTGTATTTATACACTGAATAGCAAAAGCAGGCCAAAACTAGGTGGATGAGC CTTCAATCTTTAACTTGCACTTCTAAATTATTCCAATTCCAACTGCTGGCACATTCTAGGGCCAGGAACC ATTCTTGCCTACCTTTATTAATGCTTTATTGTGCAAAATATTGCAGGCAAGTAGCTCAGGGAGTTGGATT GCCACCTTTTACTTGGGGCTTTCCTTTACAGTATGAACTGAAAATTGTCTTCCTGAGAAGGAAGCTTAGC ACTTTTCTTTCCATTCTTCCTCCAGGAAGGAGCCAACTGTCTGCTTAAGAAACTTTAAGCCCGATTTTGT ATATTGCTACTGTACAGGACCAACTGCCAGAAAAGTTATTGATAATTTTATTCCTTAAGAAAGGCATTTG GATTGCAAGGTGGATTGACTGTGAGATCATTAGCTTTTGTGAAGTAAAAATAGCCATTTGTGTCATGTTT CTGAAGACTAAGCAGTGTCTCAGTGTACTGAGGGTGATGAGTCTGTGGAAAGATCAGTGCAACTATTGCA GAATGTTTAAGACAAGTATCTTTGCTTGGTCTTTACTACAAGTTTAACAAAACGAAAAAGTCAATCTTTG TGTGGCCTTTAGTATGATTAACTTTTTGGAAGATGACCTAAGCCTTCTAATCATTATATTTTGTCTGACA TTGGTCACCAGTCCTTGCTTATTTTTAAAAGGTGACTGGATGGATTAAATTTGAGAACATGTCAAGTCGC CTTTGAAAATTATATAGGCCATCACATTTAATTAATTCATTCTATCCACCATTAAACTCTGGCAATAATT TGAAGTAGCTTGAAAATTCCTAAAGTGGGAATTTATTTTAGAGATGATAGAACCTGTTTCCCCACTTTAC 
ATTTTAAAATATGTCTGCCAGGATCTAATCATTCCTTTAAACGTACACTTCAAAGAGAGATTTTCCTAGT AAGAAAAGAGCTTTCTCTAGTGTGAAGGGTGCTTTGTAGCCGCCGAGTACTTAGGTCTTTTTTGGGAGCT ATTGTGTATGAGTGTATGTATGTGTGTGTGTACATGCATGTTGCTGCGCGCAGTCATTCATTCACATGGT GCTCAGACAACAATGGGAGCTGGTTCGTCTATCTTGTGGGTCCTGGAGATCAAAGTGAGATCATCAGGCT TGGCAGCAAGTGCCTTTACCCTCCGCGTGCCATCTTGCCATCCCGCTGCTGAGTGTTTGATATGACATTG CTGATGAAAATAATCATCACAACAGCAGTTCTCCCAGCATTACTGAGAAATGATACTATTTTTCTGAGGA GGATGAGTTCAAGTAACTCATCCAGTGCAGGA

cDNA: As expected from gDNA

∆Ex2-6 Xi+/- (clone 13)

Genotype 1 (12/12 colonies) gDNA:

CCGTTACATCAGACTCTGGCTGTTTAGACTACAgtaagtactaccttgttagctactaagctatcttttt ctcattcgagtttctttctgtgccacccactgaaaagtcaaagtgactaataagtcactagctcttctca gcttatccatacacaacacatagtatgccaacgaaattgtagaattaacacattgtgtacccaatcgcca ttatgctggggtacactaaaatgagaagcgatgatattttacaaaatttcttccgtgagatgattatgtt atttaagacttaatgtaatataagcacacttgagatcattttaggaaatgtactgcctgtgtgaattgtt taagtattttaggaagtctactggggttgaaatttgacttcttaaagagggaagtgtgcttactttgcct tgttggtccagacgactattagtaccatctggttggtttatctgaatgccatctgtttctatctacctct tggaagcatggaaagaatattaagagtgaaagactccatgcattatgcttagctgtacgccaagggtagc aagagatgagcttgtgtaggaattgggaatataaatttgataatttggttctttcaaccatttttactgc atttctgcccagaacagaccactggacggcactaacagcttagacagagaacttcatagcttacaatctt tttttttttttttttttttttttttttttttacaaaagtgcaaagaaggataagttttatgaacaaattt tactatggactactttatgtaccctccttgctcatttttctttttcattaattaatcaattaattttgtt ttgttctctgttttgttttgtgacagggattctctgtgtagtctggctatcctggaactcaaactcaaga gatccgcccgcagagatccgcccgcctctgcctccagtttagagtgttgggattaagggtgggtgccaca ctggacccctgccagcacttttcaacagatactttcagttaaaacaacaaatacttgactcctgtgtaga aaatggtctagctgaaaaaacatccttgtccataacacttaacatttttttgcatttgccttttgtacgt gcaaatgaccctctagtactttaacccttttaagtccactgtaaattccttcgcagcaggtttcccccaa ctccaaaggcttagcattgtctggctctctaggtgataatacatctagaactattgagtcttttaggaaa ctaggaccattgttgtacaagtaactgcttcaaaaggaaacatttgcaaggaaaaaaagctttttgagat gctaaagaatgtgtttggtgcactaagacagttagctaaagcttcggtctcaccttagcaccattggcta cccaccccagtacatgcaaagtgaccccccccccaggcttgcctaggcacctagagctttcagcctgcct cctatggacctaaatagagattaggactcagactccagttagtggcctggaggaagctggtactgctgct attcacagtaaacccactatcactgacagttccctattgaaatgatcaagttgtgtgagcaggcagatca ttaatcagggccaattctgtcatggcaacctagtcctacatgacctccacggcactgtaaaatacacact tatttgtagtaggacatctggaacatgagcttgtctgtgattgaaatgtccattaggaggtgccatcaca caaaaccatctggaggccatctttggtccccataattgatggaatacaaagagttccccaaattagtgtc ctgacccagccctcactgtcagtaatcatgtaatggaaagccaccaactcgccatcaccatcttcaggct gcctgaactgtatacattaagaatgagctttcagccgggcagtggtagctcgagcctttaatcccagcac 
ttgggaggcagaggcaggcggatctctgagttcgaggccagcctggtctacagagtgagttccaggacag ccagggctacacagagaaaccctgtcttgaaaaaaaaagaagaatgagctttcttccaagctgtccaatg gcttgacccagacttaggagtttgtaaacaaagaaccctgtagaacaagtgaatttgagaatgtacacct tcattaagtggtgggaagtggaagtagtggattttgccagctgattttacaaagccctgtggcagaacca gaccaggtctttgtatgcatactgccatataacctagccactgtccaacctgtgccctctctcccagtcc agtgaacctcttctgcctgccccttacttagtcctctggctcatcccaggggaagatggctctatttgtg tagtcataagaaagttttgttaattttaaataaaacaggaaaagagaaagctcacttaagacgagagctg gagagatggttctgttagtcttccagaggtcctgagtacaattcccagcaaccacatggtggctcacaac catctgtaatgacgtctgactccctcttctggcacacaggcatacatgcagatagagtactcaaatatac aaaaagggaaaagaaatgaaaggcctttgaaatgtagacatgctttcaattttaaaggctgttttcacat ctcttaagaactaggggttccatgagaacatatcctcaccctgagttcttaccaccaattgaaaacgctg acatgtgcatatgaagattgtattgctagtaatcaaattatcatttgtagagtttagtataattatcatt

tgttttccagGGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCAAGAAGAAGGATTGCCTGGATTTAG AGGAGTGAAGAGTGCTGGAGAGAGCCCAAAGgtatttccgttacttggttgactgagatagtatcttccc ttaaaggtctaatgtgtcaagttatacctcctaagtaaatacctcattattttcatgtggaattttcaac aatttttgtagtctaatgatagccttgtatttcttcacagGGACAAACAATCCCTATGTGAGACTCAAGG ACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGTCCTCGCTACTCTGAAC AAAAAGCTTACAGGCCACATGGAGAAAAAAAGgtactttggagaatttgctatttgggggtttctatctg atctccagttcatcagagaatgagggggcaagtcaataaagcactccccatctcctggtcccctggcata gtcagttagccaatgagatccttgccttatgatgaggccttgaaatcatccccaccatgttatgaccaat ctgtaacatttctgctctgaaatgcctaaaaggtaggttatcagaggaatgaaggccagtttctgacacc ctagtcttttgtacttgatagagtaagtagatatggctgttgtcacattctgtaggcctgtcataattct ggtctctgaatatcctcaaactcagaggtccaatgctacaccagtttctagggacatatgcaccataagc actgagccaaacaatgctattcaccttgccagtgttctggcatggtattgtttaaagcacaaattaaaag aagaaaataatttaacccctttattcccagagtatttgaagtacaaggactttaacacaacctataatct aaaggatctctcattggtaggtgacagaataaaagccagaaagaacttggatggagcaattctcatgtct ctaagttaacactaagatctacaaatgtgaagggttcactctccaagcccctccccatgatttaacacaa ctgctggttactacataatgagacaactttcccagtattcttgatttcaagaattggcttttctttcacc tcttttccagATCTCCCCCCAGAATTGTGGGCTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGAC GATCCCTAGGTGGAGATGGGGCATGAGGATCCTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCT AGCCCAGATTTCAGTGAGACGCTTTCCTGAACCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTA CATCAAAGTAGgtaagttttctgcaacagtgtaataattttaactttgatcttgttttccattaaagtca gccttttacttgagatatatacataaattttatctttttcccacatctgggaaatgtaactaaacagtgc aaatgtttcttctagGAGAAAAGCTGCTGCAATAGTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTAT TTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGCCAGCTGTTTACATACTTCAAGATGCACTGCTACCCT ACTCATGCCATATAATACACAAgtgagtagcttatttacctacctagctatttacgagtacactgttgct gtcctcagacacaccagaagagagcaccagatctcattacagatggttgtgcgccaccatgtggttgctg gggattgaactcaggacctctggaagagcaataggtgctcttaaccactgagccatcactaacattactg atgatgaatgcctctaactacaaatggaaacctggctggaattgatacagaagtatctatcattaggcag
aatttaaagagccaggatgttctagtacttctttcctggcattcatacagcttcttttgttctttgtagT GCCATCTACCAAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATG TGAACACACAAAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAA AGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcatttgctggaagatggtgct gggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatggaagatgaacaggctttca cgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagttatatcacaagaggaatt tagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctaccattcttacagacttaaac attgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccattagacttgtgaaggatgt aaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaattaaatatgcagtcgctt atagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgagatgttcattctgtgtttaa atgtaatattccctagatgtatgccctttggcaatttagttctgctaagacctgtctgtttgtgaaggtc aaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccctggagttgcatgactagg ccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaaccttaagtttaacgttgatag cctggtacagtgtactaatggcaatttttttctttgcccttccctgtttcttgttaccctctttctggtg gtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA cDNA: As expected from gDNA (see below for splicing pattern)

CCGTTACATCAGACTCTGGCTGTTTAGACTACA|GGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCA AGAAGAAGGATTGCCTGGATTTAGAGGAGTGAAGAGTGCTGGAGAGAGCCCAAAG|GGACAAACAATCCC TATGTGAGACTCAAGGACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGT CCTCGCTACTCTGAACAAAAAGCTTACAGGCCACATGGAGAAAAAAAG|ATCTCCCCCCAGAATTGTGGG CTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGACGATCCCTAGGTGGAGATGGGGCATGAGGATC

CTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCTAGCCCAGATTTCAGTGAGACGCTTTCCTGAA CCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTACATCAAAGTAG|GAGAAAAGCTGCTGCAATA GTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTATTTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGC CAGCTGTTTACATACTTCAAGATGCACTGCTACCCTACTCATGCCATATAATACACAA|TGCCATCTACC AAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATGTGAACACACA AAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAAAGCAATAGAA TGAG|TGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA

∆Ex2-6 Xi-/- (clone H23)

Genotype 1 (10/20 colonies) gDNA:

CCGTTACATCAGACTCTGGCTGTTTAGACTACAgtaagtactaccttgttagctactaagctatcttttt ctcattcgagtttctttctgtgccacccactgaaaagtcaaagtgactaataagtcactagctcttctca gcttatccatacacaacacatagtatgccaacgaaattgtagaattaacacattgtgtacccaatcgcca ttatgctggggtacactaaaatgagaagcgatgatattttacaaaatttcttccgtgagatgattatgtt atttaagacttaatgtaatataagcacacttgagatcattttaggaaatgtactgcctgtgtgaattgtt taagtattttaggaagtctactggggttgaaatttgacttcttaaagagggaagtgtgcttactttgcct tgttggtccagacgactattagtaccatctggttggtttatctgaatgccatctgtttctatctacctct tggaagcatggaaagaatattaagagtgaaagactccatgcattatgcttagctgtacgccaagggtagc aagagatgagcttgtgtaggaattgggaatataaatttgataatttggttctttcaaccatttttactgc atttctgcccagaacagaccactggacggcactaacagcttagacagagaacttcatagcttacaatctt tttttttttttttttttttttttttttttttacaaaagtgcaaagaaggataagttttatgaacaaattt tactatggactactttatgtaccctccttgctcatttttctttttcattaattaatcaattaattttgtt ttgttctctgttttgttttgtgacagggattctctgtgtagtctggctatcctggaactcaaactcaaga gatccgcccgcagagatccgcccgcctctgcctccagtttagagtgttgggattaagggtgggtgccaca ctggacccctgccagcacttttcaacagatactttcagttaaaacaacaaatacttgactcctgtgtaga aaatggtctagctgaaaaaacatccttgtccataacacttaacatttttttgcatttgccttttgtacgt gcaaatgaccctctagtactttaacccttttaagtccactgtaaattccttcgcagcaggtttcccccaa ctccaaaggcttagcattgtctggctctctaggtgataatacatctagaactattgagtcttttaggaaa ctaggaccattgttgtacaagtaactgcttcaaaaggaaacatttgcaaggaaaaaaagctttttgagat gctaaagaatgtgtttggtgcactaagacagttagctaaagcttcggtctcaccttagcaccattggcta cccaccccagtacatgcaaagtgaccccccccccaggcttgcctaggcacctagagctttcagcctgcct cctatggacctaaatagagattaggactcagactccagttagtggcctggaggaagctggtactgctgct attcacagtaaacccactatcactgacagttccctattgaaatgatcaagttgtgtgagcaggcagatca ttaatcagggccaattctgtcatggcaacctagtcctacatgacctccacggcactgtaaaatacacact tatttgtagtaggacatctggaacatgagcttgtctgtgattgaaatgtccattaggaggtgccatcaca caaaaccatctggaggccatctttggtccccataattgatggaatacaaagagttccccaaattagtgtc ctgacccagccctcactgtcagtaatcatgtaatggaaagccaccaactcgccatcaccatcttcaggct gcctgaactgtatacattaagaatgagctttcagccgggcagtggtagctcgagcctttaatcccagcac 
ttgggaggcagaggcaggcggatctctgagttcgaggccagcctggtctacagagtgagttccaggacag ccagggctacacagagaaaccctgtcttgaaaaaaaaagaagaatgagctttcttccaagctgtccaatg gcttgacccagacttaggagtttgtaaacaaagaaccctgtagaacaagtgaatttgagaatgtacacct tcattaagtggtgggaagtggaagtagtggattttgccagctgattttacaaagccctgtggcagaacca gaccaggtctttgtatgcatactgccatataacctagccactgtccaacctgtgccctctctcccagtcc agtgaacctcttctgcctgccccttacttagtcctctggctcatcccaggggaagatggctctatttgtg tagtcataagaaagttttgttaattttaaataaaacaggaaaagagaaagctcacttaagacgagagctg gagagatggttctgttagtcttccagaggtcctgagtacaattcccagcaaccacatggtggctcacaac

Table S1 Sanger sequencing information (continued) catctgtaatgacgtctgactccctcttctggcacacaggcatacatgcagatagagtactcaaatatac aaaaagggaaaagaaatgaaaggcctttgaaatgtagacatgctttcaattttaaaggctgttttcacat ctcttaagaactaggggttccatgagaacatatcctcaccctgagttcttaccaccaattgaaaacgctg acatgtgcatatgaagattgtattgctagtaatcaaattatcatttgtagagtttagtataattatcatt tgttttccagGGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCAAGAAGAAGGATTGCCTGGATTTAG AGGAGTGAAGAGTGCTGGAGAGAGCCCAAAGgtatttccgttacttggttgactgagatagtatcttccc ttaaaggtctaatgtgtcaagttatacctcctaagtaaatacctcattattttcatgtggaattttcaac aatttttgtagtctaatgatagccttgtatttcttcacagGGACAAACAATCCCTATGTGAGACTCAAGG ACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGTCCTCGCTACTCTGAAC AAAAAGCTTACAGGCCACATGGAGAAAAAAAGgtactttggagaatttgctatttgggggtttctatctg atctccagttcatcagagaatgagggggcaagtcaataaagcactccccatctcctggtcccctggcata gtcagttagccaatgagatccttgccttatgatgaggccttgaaatcatccccaccatgttatgaccaat ctgtaacatttctgctctgaaatgcctaaaaggtaggttatcagaggaatgaaggccagtttctgacacc ctagtcttttgtacttgatagagtaagtagatatggctgttgtcacattctgtaggcctgtcataattct ggtctctgaatatcctcaaactcagaggtccaatgctacaccagtttctagggacatatgcaccataagc actgagccaaacaatgctattcaccttgccagtgttctggcatggtattgtttaaagcacaaattaaaag aagaaaataatttaacccctttattcccagagtatttgaagtacaaggactttaacacaacctataatct aaaggatctctcattggtaggtgacagaataaaagccagaaagaacttggatggagcaattctcatgtct ctaagttaacactaagatctacaaatgtgaagggttcactctccaagcccctccccatgatttaacacaa ctgctggttactacataatgagacaactttcccagtattcttgatttcaagaattggcttttctttcacc tcttttccagATCTCCCCCCAGAATTGTGGGCTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGAC GATCCCTAGGTGGAGATGGGGCATGAGGATCCTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCT AGCCCAGATTTCAGTGAGACGCTTTCCTGAACCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTA CATCAAAGTAGgtaagttttctgcaacagtgtaataattttaactttgatcttgttttccattaaagtca gccttttacttgagatatatacataaattttatctttttcccacatctgggaaatgtaactaaacagtgc aaatgtttcttctagGAGAAAAGCTGCTGCAATAGTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTAT TTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGCCAGCTGTTTACATACTTCAAGATGCACTGCTACCCT 
ACTCATGCCATATAATACACAAgtgagtagcttatttacctacctagctatttacgagtacactgttgct gtcctcagacacaccagaagagagcaccagatctcattacagatggttgtgcgccaccatgtggttgctg gggattgaactcaggacctctggaagagcaataggtgctcttaaccactgagccatcactaacattactg atgatgaatgcctctaactacaaatggaaacctggctggaattgatacagaagtatctatcattaggcag aatttaaagagccaggatgttctagtacttctttcctggcattcatacagcttcttttgttctttgtagT GCCATCTACCAAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATG TGAACACACAAAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAA AGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcatttgctggaagatggtgct gggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatggaagatgaacaggctttca cgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagttatatcacaagaggaatt tagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctaccattcttacagacttaaac attgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccattagacttgtgaaggatgt aaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaattaaatatgcagtcgctt atagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgagatgttcattctgtgtttaa atgtaatattccctagatgtatgccctttggcaatttagttctgctaagacctgtctgtttgtgaaggtc aaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccctggagttgcatgactagg ccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaaccttaagtttaacgttgatag cctggtacagtgtactaatggcaatttttttctttgcccttccctgtttcttgttaccctctttctggtg gtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA cDNA: As expected from gDNA (see below for splicing pattern)

Table S1 Sanger sequencing information (continued) CCGTTACATCAGACTCTGGCTGTTTAGACTACA|GGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCA AGAAGAAGGATTGCCTGGATTTAGAGGAGTGAAGAGTGCTGGAGAGAGCCCAAAG|GGACAAACAATCCC TATGTGAGACTCAAGGACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGT CCTCGCTACTCTGAACAAAAAGCTTACAGGCCACATGGAGAAAAAAAG|ATCTCCCCCCAGAATTGTGGG CTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGACGATCCCTAGGTGGAGATGGGGCATGAGGATC CTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCTAGCCCAGATTTCAGTGAGACGCTTTCCTGAA CCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTACATCAAAGTAG|GAGAAAAGCTGCTGCAATA GTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTATTTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGC CAGCTGTTTACATACTTCAAGATGCACTGCTACCCTACTCATGCCATATAATACACAA|TGCCATCTACC AAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATGTGAACACACA AAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAAAGCAATAGAA TGAG|TGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA

Genotype 2 (10/20 colonies) gDNA:

CCGTTACATCAGACTCTGGCTGTTTAGACTACAgtaagtactaccttgttagctactaagctatcttttt ctcattcgagtttctttctgtgccacccactgaaaagtcaaagtgactaataagtcactagctcttctca gcttatccatacacaacacatagtatgccaacgaaattgtagaattaacacattgtgtacccaatcgcca ttatgctggggtacactaaaatgagaagcgatgatattttacaaaatttcttccgtgagatgattatgtt atttaagacttaatgtaatataagcacacttgagatcattttaggaaatgtactgcctgtgtgaattgtt taagtattttaggaagtctactggggttgaaatttgacttcttaaagagggaagtgtgcttactttgcct tgttggtccagacgactattagtaccatctggttggtttatctgaatgccatctgtttctatctacctct tggaagcatggaaagaatattaagagtgaaagactccatgcattatgcttagctgtacgccaagggtagc aagagatgagcttgtgtaggaattgggaatataaatttgataatttggttctttcaaccatttttactgc atttctgcccagaacagaccactggacggcactaacagcttagacagagaacttcatagcttacaatctt tttttttttttttttttttttttttttttttacaaaagtgcaaagaaggataagttttatgaacaaattt tactatggactactttatgtaccctccttgctcatttttctttttcattaattaatcaattaattttgtt ttgttctctgttttgttttgtgacagggattctctgtgtagtctggctatcctggaactcaaactcaaga gatccgcccgcagagatccgcccgcctctgcctccagtttagagtgttgggattaagggtgggtgccaca ctggacccctgccagcacttttcaacagatactttcagttaaaacaacaaatacttgactcctgtgtaga aaatggtctagctgaaaaaacatccttgtccataacacttaacatttttttgcatttgccttttgtacgt gcaaatgaccctctagtactttaacccttttaagtccactgtaaattccttcgcagcaggtttcccccaa ctccaaaggcttagcattgtctggctctctaggtgataatacatctagaactattgagtcttttaggaaa ctaggaccattgttgtacaagtaactgcttcaaaaggaaacatttgcaaggaaaaaaagctttttgagat gctaaagaatgtgtttggtgcactaagacagttagctaaagcttcggtctcaccttagcaccattggcta cccaccccagtacatgcaaagtgaccccccccccaggcttgcctaggcacctagagctttcagcctgcct cctatggacctaaatagagattaggactcagactccagttagtggcctggaggaagctggtactgctgct attcacagtaaacccactatcactgacagttccctattgaaatgatcaagttgtgtgagcaggcagatca ttaatcagggccaattctgtcatggcaacctagtcctacatgacctccacggcactgtaaaatacacact tatttgtagtaggacatctggaacatgagcttgtctgtgattgaaatgtccattaggaggtgccatcaca caaaaccatctggaggccatctttggtccccataattgatggaatacaaagagttccccaaattagtgtc ctgacccagccctcactgtcagtaatcatgtaatggaaagccaccaactcgccatcaccatcttcaggct gcctgaactgtatacattaagaatgagctttcagccgggcagtggtagctcgagcctttaatcccagcac 
ttgggaggcagaggcaggcggatctctgagttcgaggccagcctggtctacagagtgagttccaggacag ccagggctacacagagaaaccctgtcttgaaaaaaaaagaagaatgagctttcttccaagctgtccaatg gcttgacccagacttaggagtttgtaaacaaagaaccctgtagaacaagtgaatttgagaatgtacacct tcattaagtggtgggaagtggaagtagtggattttgccagctgattttacaaagccctgtggcagaacca

Table S1 Sanger sequencing information (continued) gaccaggtctttgtatgcatactgccatataacctagccactgtccaacctgtgccctctctcccagtcc agtgaacctcttctgcctgccccttacttagtcctctggctcatcccaggggaagatggctctatttgtg tagtcataagaaagttttgttaattttaaataaaacaggaaaagagaaagctcacttaagacgagagctg gagagatggttctgttagtcttccagaggtcctgagtacaattcccagcaaccacatggtggctcacaac catctgtaatgacgtctgactccctcttctggcacacaggcatacatgcagatagagtactcaaatatac aaaaagggaaaagaaatgaaaggcctttgaaatgtagacatgctttcaattttaaaggctgttttcacat ctcttaagaactaggggttccatgagaacatatcctcaccctgagttcttaccaccaattgaaaacgctg acatgtgcatatgaagattgtattgctagtaatcaaattatcatttgtagagtttagtataattatcatt tgttttccagGGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCAAGAAGAAGGATTGCCTGGATTTAG AGGAGTGAAGAGTGCTGGAGAGAGCCCAAAGgtatttccgttacttggttgactgagatagtatcttccc ttaaaggtctaatgtgtcaagttatacctcctaagtaaatacctcattattttcatgtggaattttcaac aatttttgtagtctaatgatagccttgtatttcttcacagGGACAAACAATCCCTATGTGAGACTCAAGG ACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGTCCTCGCTACTCTGAAC AAAAAGCTTACAGGCCACATGGAGAAAAAAAGgtactttggagaatttgctatttgggggtttctatctg atctccagttcatcagagaatgagggggcaagtcaataaagcactccccatctcctggtcccctggcata gtcagttagccaatgagatccttgccttatgatgaggccttgaaatcatccccaccatgttatgaccaat ctgtaacatttctgctctgaaatgcctaaaaggtaggttatcagaggaatgaaggccagtttctgacacc ctagtcttttgtacttgatagagtaagtagatatggctgttgtcacattctgtaggcctgtcataattct ggtctctgaatatcctcaaactcagaggtccaatgctacaccagtttctagggacatatgcaccataagc actgagccaaacaatgctattcaccttgccagtgttctggcatggtattgtttaaagcacaaattaaaag aagaaaataatttaacccctttattcccagagtatttgaagtacaaggactttaacacaacctataatct aaaggatctctcattggtaggtgacagaataaaagccagaaagaacttggatggagcaattctcatgtct ctaagttaacactaagatctacaaatgtgaagggttcactctccaagcccctccccatgatttaacacaa ctgctggttactacataatgagacaactttcccagtattcttgatttcaagaattggcttttctttcacc tcttttccagATCTCCCCCCAGAATTGTGGGCTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGAC GATCCCTAGGTGGAGATGGGGCATGAGGATCCTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCT AGCCCAGATTTCAGTGAGACGCTTTCCTGAACCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTA 
CATCAAAGTAGgtaagttttctgcaacagtgtaataattttaactttgatcttgttttccattaaagtca gccttttacttgagatatatacataaattttatctttttcccacatctgggaaatgtaactaaacagtgc aaatgtttcttctagGAGAAAAGCTGCTGCAATAGTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTAT TTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGCCAGCTGTTTACATACTTCAAGATGCACTGCTACCCT ACTCATGCCATATAATACACAAgtgagtagcttatttacctacctagctatttacgagtacactgttgct gtcctcagacacaccagaagagagcaccagatctcattacagatggttgtgcgccaccatgtggttgctg gggattgaactcaggacctctggaagagcaataggtgctcttaaccactgagccatcactaacattactg atgatgaatgcctctaactacaaatggaaacctggctggaattgatacagaagtatctatcattaggcag aatttaaagagccaggatgttctagtacttctttcctggcattcatacagcttcttttgttctttgtagT GCCATCTACCAAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATG TGAACACACAAAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAA AGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcatttgctggaagatggtgct gggtggagagcatctaatgtgataatgtgaggcagggccatgtgcacgatggaagatgaacaggctttca cgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagttatatcacaagaggaatt tagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctaccattcttacagacttaaac attgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccattagacttgtgaaggatgt aaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaattaaatatgcagtcgctt atagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgagatgttcattctgtgtttaa atgtaatattccctagatgtatgccctttggcaatttagttctgctaagacctgtctgtttgtgaaggtc aaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccctggagttgcatgactagg ccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaaccttaagtttaacgttgatag cctggtacagtgtactaatggcaatttttttctttgcccttccctgtttcttgttaccctctttctggtg gtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA

cDNA: As expected from gDNA (see below for splicing pattern)

CCGTTACATCAGACTCTGGCTGTTTAGACTACA|GGATGAATTTGGAGTCTGTTTTGTGCTCCTGCCTCA AGAAGAAGGATTGCCTGGATTTAGAGGAGTGAAGAGTGCTGGAGAGAGCCCAAAG|GGACAAACAATCCC TATGTGAGACTCAAGGACTGCCAGCAGCCTATACAGCTACATTACATCTCAGCAGAACTTCTCTTCAAGT CCTCGCTACTCTGAACAAAAAGCTTACAGGCCACATGGAGAAAAAAAG|ATCTCCCCCCAGAATTGTGGG CTTGCTGCTTTGCAGTGCTGGCGACCTATTCCCTTTGACGATCCCTAGGTGGAGATGGGGCATGAGGATC CTCCAGGGGAATAGCTCACCACCACTGGGCAACAGGCCTAGCCCAGATTTCAGTGAGACGCTTTCCTGAA CCCAGCAAGGAAGACAAAGGCTCAAAGAATGCCACCCTACATCAAAGTAG|GAGAAAAGCTGCTGCAATA GTGGCACTGACCTTCGAGGAAGCCATTCTGCTCTATTTGGTTCTCTCTCCAGAAGCTAGGAAAGCTTTGC CAGCTGTTTACATACTTCAAGATGCACTGCTACCCTACTCATGCCATATAATACACAA|TGCCATCTACC AAATATTACCCTTCCCCAAAGCAGCACAGAAAACTGGGTCTTCAGCGTGATCAAGCAATGTGAACACACA AAAGGAAGGCAGCTTTATAAATGACCCGAGGATCAACATGCCTGACTGCAGCATCTTAAAAGCAATAGAA TGAG|TGTGTATTGTGGGTGTGTCTATTTCTTGTTTTA
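
The "As expected from gDNA" comparison used in these entries can be sketched in code. The following is a minimal illustration only, assuming the notation that the listings appear to follow: uppercase runs in the gDNA are exonic, lowercase runs are intronic, and "|" in the cDNA marks a splice junction. The function names and the toy sequence are hypothetical and are not taken from the table or from the author's analysis.

```python
import re


def splice_in_silico(gdna: str) -> str:
    """Join the uppercase (assumed exonic) runs of a gDNA string,
    marking each removed lowercase (assumed intronic) run with '|'."""
    compact = re.sub(r"\s+", "", gdna)        # drop layout whitespace
    exons = re.findall(r"[ACGTN]+", compact)  # uppercase runs only
    return "|".join(exons)


def matches_expected(gdna: str, cdna: str) -> bool:
    """True if the observed cDNA (with '|' junction marks) equals the
    transcript predicted by removing the lowercase runs from the gDNA."""
    return splice_in_silico(gdna) == re.sub(r"\s+", "", cdna)


# Toy input (hypothetical, not a sequence from Table S1).
toy_gdna = "ACAgtaagtttccagGGATGAATTTGGAGgtatttcacagGGACAAAC"
toy_cdna = "ACA|GGATGAATTTGGAG|GGACAAAC"

print(splice_in_silico(toy_gdna))
print(matches_expected(toy_gdna, toy_cdna))
```

A mismatch under this check would flag entries such as those where a cryptic splice acceptor is used, although it would not by itself locate the alternative junction.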

∆RepE Xi+/- (clone 4)

Genotype 1 (10/10 colonies) gDNA:

TGACTGCAGCATCTTAAAAGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcat ttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatgg aagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagt tatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctacc attcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccat tagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaa ttaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgaga tgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaagacc tgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccc tggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaacct taagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgtttctt gttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTT GTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTT GTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATC TATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTT CCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCT TTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCT ATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTC TAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTC TTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCAT CTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGT GTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCT TCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCC ATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGT CTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTT CACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCAC

Table S1 Sanger sequencing information (continued) CCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTA GTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGT CTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCT TCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCT AAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCAT GTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACC AGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAA GTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCC CGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGA CTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAA GGACCCAATC cDNA: As expected from gDNA (see below for splicing pattern)

TGACTGCAGCATCTTAAAAGCAATAGAATGAG|TGTGTATTGTGGGTGTGTCTATTTCTTGTTTTATGTA TCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTGTGTGTCTAT TTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATCTATTTCTTCC TTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTG TGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCTTTGTGTGTCT GTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTTCCT TGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTG TTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGGTATAT ATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCATCTATTTCTTC CTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGTGTCTATGTCT TCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTCCTTT GCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCCATGTTTCTCC TGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGTCTGCTTTGTC TTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTTCACTCCTGCC TTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCACCCCACTTGTT CTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTAGTGAATTCTT GTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGTCTCTTTGCTA TTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCTTCTACTTTCC TCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCTAAAGTAATCC TTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCATGTGTGAGAGG AGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACCAGACTGCCCT GAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAAGTGAACGCCC AAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCCCGTCTTGAGC TGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGACTTCCAAGAG CGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAAGGACCCAATC

∆RepE Xi-/- (clone 16)

Genotype 1 (22/22 colonies) gDNA:

Table S1 Sanger sequencing information (continued) TGACTGCAGCATCTTAAAAGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcat ttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatgg aagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagt tatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctacc attcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccat tagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaa ttaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgaga tgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaagacc tgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccc tggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaacct taagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgtttctt gttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTT GTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTT GTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATC TATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTT CCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCT TTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCT ATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTC TAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTC TTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCAT CTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGT GTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCT TCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCC ATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGT CTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTT CACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCAC CCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTA 
GTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGT CTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCT TCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCT AAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCAT GTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACC AGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAA GTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCC CGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGA CTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAA GGACCCAATC cDNA: Deletion slightly larger than expected from gDNA. Removal of normal splice acceptor to exon 7 causes cryptic acceptor (bold) to be used instead (see below for splicing pattern)

TGACTGCAGCATCTTAAAAGCAATAGAATGAG|TGTGTATTGTGGGTGTGTCTATTTCTTGTTTTATGTA TCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTGTGTGTCTAT TTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATCTATTTCTTCC TTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTG TGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCTTTGTGTGTCT GTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTTCCT TGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTG TTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGGTATAT ATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCATCTATTTCTTC

Table S1 Sanger sequencing information (continued) CTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGTGTCTATGTCT TCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTCCTTT GCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCCATGTTTCTCC TGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGTCTGCTTTGTC TTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTTCACTCCTGCC TTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCACCCCACTTGTT CTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTAGTGAATTCTT GTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGTCTCTTTGCTA TTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCTTCTACTTTCC TCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCTAAAGTAATCC TTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCATGTGTGAGAGG AGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACCAGACTGCCCT GAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAAGTGAACGCCC AAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCCCGTCTTGAGC TGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGACTTCCAAGAG CGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAAGGACCCAATC

∆RepE Xi+/- (clone 1-2)

Genotype 1 (11/11 colonies) gDNA:

TGACTGCAGCATCTTAAAAGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcat ttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatgg aagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagt tatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctacc attcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccat tagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaa ttaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgaga tgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaagacc tgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccc tggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaacct taagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgtttctt gttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTT GTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTT GTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATC TATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTT CCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCT TTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCT ATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTC TAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTC TTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCAT CTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGT GTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCT TCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCC ATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGT CTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTT

Table S1 Sanger sequencing information (continued) CACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCAC CCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTA GTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGT CTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCT TCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCT AAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCAT GTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACC AGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAA GTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCC CGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGA CTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAA GGACCCAATC cDNA: As expected from gDNA

∆RepE Xi-/- (clone 3-16)

Genotype 1 (10/10 colonies) gDNA:

TGACTGCAGCATCTTAAAAGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcat ttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatgg aagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagt tatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctacc attcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccat tagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaa ttaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgaga tgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaagacc tgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccc tggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaacct taagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgtttctt gttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTT GTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTT GTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATC TATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTT CCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCT TTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCT ATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTC TAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTC TTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCAT CTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGT GTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCT TCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCC ATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGT CTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTT CACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCAC CCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTA

Table S1 Sanger sequencing information (continued) GTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGT CTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCT TCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCT AAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCAT GTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACC AGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAA GTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCC CGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGA CTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAA GGACCCAATC cDNA: As expected from gDNA

Genotype 2 (9/9 colonies) gDNA:

TGACTGCAGCATCTTAAAAGCAATAGAATGAGgtaagtcactagcattgcagtcttctgaggatttgcat ttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacgatgg aagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttaccagt tatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccctacc attcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagccat tagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctgacaa ttaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttgaga tgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaagacc tgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatgccc tggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaacct taagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgtttctt gttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTTCTT GTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTT GTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTATATC TATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTTCTT CCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTTGCT TTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCT ATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGTGTC TAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTC TTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTGCAT CTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTGTGT GTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATTTCT TCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATTCCC ATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGTGGT CTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTCTTT CACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCTCAC CCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCCTTA GTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTGTGT 
CTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTATCT TCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTATTCT AAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGTCAT GTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCACACC AGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCTGAA

GTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATGGCC CGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTGTGA CTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACCTAA GGACCCAATC

cDNA: As expected from gDNA

∆RepE Xi+/- (clone 3-9)

Unable to sequence. Deletion significantly larger than expected. Entire RepE region deleted, as determined by Xist RNA FISH mapping.

∆Ex7a Xi+/- (clone 11)

Unable to sequence. Deletion significantly larger than expected. Entire RepE and Ex7a regions deleted, as determined by Xist RNA FISH mapping.

∆Ex7a Xi+/- (clone N3)

Genotype 1 (8/8 colonies) gDNA:

CGTTCCTCACACCAGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTC AAGTGGCTCTGAAGTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAA TACACAGATGGCCCGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTAC AGGACACCTGTGACTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAA GCATAAGACCTAAGGACCCAATCCTATATGGACAGAATATTTAAGAGATAAAGGCCTATGGCCCAGAACT CTGGAAGGATATTTCTATCCTTCTATCCCCAAGACCAAGAAGGGAAATTCGAAGATGAGACCTGCCCCCC AACCCCAGCATCCCTTTCCATTTCTTATATTTCTATTTAAGCTGTCTTCACTTGAGATGTAATTTTTCAT TGTTGCCATTGCCCATAAAGGAATACGTTTTTAGCTGGATAGTATTGTGCAAGGGTCTGTTTTAAACTGG GTCTTAGCCATTTGTTAAATTGTTGATGTTTTACAACTTCCATTTCTCTTCACATCTGCTCCACTTGAGA CGGAACTAAATCCAGCCAGTGTATATAGCCTGACTATTGAAACTTCCCTAGGAATAAGCATGCATACAGA TATGCATACTGCCATCCTCCCTACCTCAGAAGCCCTAGGCTGACAAGAAAAGGAAAGCATCAGGTTGTTA GGGGGAAAACAATGTCAGGCTATCTAGAGAAAATATAAAGAGTTGTTCCAGACCAATGAGAAGAATTAGA CAAGCAATATGCAGATGTGCCAACCCTCTGAGAAGCACCAGCCAGTGTCACCTTCTTTCTTTGGGCTTAG GTGAGCAGGGTATGGTTTTCTAATAATGGTTTGGGGACAAAATGAGGTCTGAACTCCCTGCTCATAGTAG TGGCCGAGTAATTTGGTGCATTTCACCAAAGGAACTCCTGGGTCTAATACCTACCTTTAAAATTAATGAT GAGAGACTCTAAGGACTACTTAACGGGCTTAATCTTTTTCGTGCCTTCCTCTTCCTCTGTAAGAGGGAAG TTAAATGACACAGGATGAAAAAGTAACATGCTCATAGCACATTGGCAATTATACATGGTTATTATCTGAA AGTGTAGAGCTTTTCCTATAAGGCATCAGACTAAGTACCTGAAGCTTTGTGGGTTCATGGTCTTAGTTGC ATATTCCTTAGTTGCAAATCCTTTTCAAAAGGTAAGAAAAAGGCACACTGGTCTATTGCCTGTACTTGAT CAAGCCCTGATATGAATGCCAGGGAATGTCTGAGTAACATTAATTTCCTTCCCTGCATATTTTTTGTGCT GAATACTAAGGCTGTGATGCTTCACTGTGGTCACCCCCAGGTAACAAGATATTACCAGGTAACCAGGAAA CGTATGAATACGTAAACCATGAAGCCTACTGTAACTTCCAAGTCAGTGCTGAGTATGTATTACATA



cDNA: See Figure 2.2

∆Ex7b Xi+/- (clone H2)

Genotype 1 (10/10 colonies) gDNA:

TGTACTTGATCAAGCCCTGATATGAATGCCAGGGAATGTCTGAGTAACATTAATTTCCTTCCCTGCATAT TTTTTGTGCTGAATACTAAGGCTGTGATGCTTCACTGTGGTCACCCCCAGGTAACAAGATATTACCAGGT AACCAGGAAAACGTATGAATACGTAAACCATGAAGCCTACTGTAACTTCCAAGTCAGTGCTGAGTATGTA TTACATAGTAGCTGAAGTCTACGCCTCTGTGTGCTATAGGCACAAAGATTGCTCTAGGAATAACATGCTT TGTAAAAACAAATATATGAACATAACGGGGCTTGAATGAATAACAGTCCATATACTTAAGGCCAGTGTGT TTCTTCTGCTTTGGTGAGGCTCAGTAAGTTATATTATACCAGGTAGCAGAAGAGAAAACACATGGAAACT GATTTTAAACTACAAACTAGGTCACTAATGCAGGTGATTGATTACCCTATTCTGATCACCTTCTAATTTC TGAATACCCATGTTCAGCACTGGGAATAACAAAGGGGGACATTACCACAGAACTAGAATTTACAAAAGAA TGCATTAAATAAAGCATTATACAGCTATCAATTGTTCCATGTGTGCAAATGAATGACTACTAACTACCTC TGATGTATCCGATATTGTTTTGGGTACATGAAATATTCATGAGTAACTGCCATGAAATAAGAATGTTTGC ATTCCATACTATTCATAAGGAATGAGCCAATGCTTAATTTAATCAGTCAAAACTTGAGTGATAAGGGCAT GTTAATACAAGAACATTTGCCCAGGTCACATTATGGTTGTGGGTACTTTCTTAACTATAAAGCAGTTCAG TAGTATAAGACAAGACAAATTTTCTATAGAAATAAAGCTGCCTATAAAATAGGCATAGTCTCTACAAAAT TTTCATTGTACTTTTTAGCCCATAATGGGAAGAGTACAGTTAACAAGCTGGGTGTGGTAGCATGTGCTCT GAGCTGAAGCAACAGGACCACTTGAGCCCAGAAATTGGAGGCTAGCCTGGGAAGACCATAAGGTCAATCT CAAACCTGGAGGCTAAATATTGTCTCCCATGTGTATATTCTCTTTCATGGGTACTGGAGAGATACACAGA CGTACATTTCAGTGTGTCCACACTTGAGAATAATATGTACGTTGGCATTTTATGAACTCGGAGGTACCAT ATAAATGTAACAATTCATTTTCTTACTTGGTATCAATTTCCAGGCTTTTAAAATTCTGCCACATTTATTA TACTGTGAAAATAAAGTAAATAAGTAACTGTGAACCACTGAATATATGAAGCATTCAATACTTGATGAGT ACATACTGAATGGCAGTCATTTATTACAAAACAGTGCCCTTGCTAGGCACTGGGATGCAAAGAGCATTCT CATTGTCCTGTGTATCTAAAGAAATTATGCATGAGATTAATTTATAATTTGTAAACTGCCATATATATGT GTATATATGCAATATTTGCCTGGTGTGCAATGACTTTGCTTTTATCCCAGGCATGCACAACAGATCTGTG TGGAGCTTTGTGAAGTCTACAGTTCTATAAAGCCGGGACCTAACTGTTGGCTTTATCAGTGAACAGTGAT TACTTTCTAAGTTTCATAATGGCTGAAACTTAATCATAATGCTTATCACCTAACACCACCTAATAATAAT TTTACCATGCTATGTGTTGAGCGAACACATAGATTGCTTTCTAGCATTATGTAGCACTTATAGGAGTGAA ATCTAGACCAAAACTTCAATTCACTTCAATGAGGAAATGAAA cDNA: As expected from gDNA

∆Ex7c Xi+/- (clone 1)

Genotype 1 (9/9 colonies) gDNA:

GTTCTATAAAGCCGGGACCTAACTGTTGGCTTTATCAGTGAACAGTGATTACTTTCTAAGTTTCATAATG GCTGAAACTTAATCATAATGCTTATCACCTAACACCACCTAATAATAATTTTACCATGCTATGTGTTGAG CGAACACATAGATTGCTTTCTAGCATTATGTAGCACTTATAGGAGTGAAATCTAGACCAAAACTTCAATT CACTTCAATGAGGAAATGAAAACAGAAAAAAAAAATGGATTTGTGCAAGGCAGTGTGCTAAATGTTACAC


Table S1 Sanger sequencing information (continued) TGAGTGGACTATGCTGTCTAGGATACTTCCCAGCTGGCTTGACTGAGGAGGTGGAAAAGGTTTTATTAAT GACAGGAACTTTTTCCATCCAGTTTCTTAAATGTTTGTTGAATGCTGCTGCCAGAGATGAATTACAAACA CCTTGCCAGTAAAGGAGTTTTATAGGGCCAGAGTGAGATAATCCCAGAGCATGGGTATCAGGGAACAAAA CGGGAAGAGGCCAGAGCATCTGATGGCATGTACTCAGTGTGGCCCAGAACCTCTCGAACTAGATGTACTG GCTGGAGGGACCAAGCATGCAGAACACAACACCTAATGAAACATTGTATATAAAATATGCTAACCTAGGT CCTAAAACTAAAATGTGAGGTGGACCTAGTGTAGATCACTGATCATAGGAGACATGGTCTCATAAAGCCC AGGCTGGTTCTAATTGGTGACTGTCACAGCTTCTCAAGTGCTGAGATTACAGATGTGCTTAACCCATGCC CAGCCTGAAGAATATATCTGATTACTGAGTGAATAATATTTTTAAAGAATTATATATTTTATGTATATGA GTACGCTGTTGCTGTCTTCAGACACACCAGAAGAGGGCACCACATCACATTACAGATGGTTGTGAGCCCC CATGTGGTTGTTGGGATTTGAACTCAGGACCTTCGGAAGAGCAGTCAGACTCTTAACCACTGAGTCATCT CTCCAGCCTTCTGAGTAAATATTTTAACTATAATGGCTGTTTGCGAAACCCAACCAAGGCCAAGATTCCT TCAACATAAACTGGAGACTTCCTAGCTAAGGAAGCTCCAAAAGTCATTTTCTCATTGGCCTAGCTTGAAG CCAGGACAGACTTAAAGTCTGTCCTTTAATTCATTACCCATTTTCCTTTTCTTACTGTTGAAGTGTTTCA AAGGAGAATCAAGATGAATCGATAATTCTAAACGTATTTGTTCATTGCCTGGCTCAGCGTCATGTGAGCA AGAAGAATATACTATCACACTCATACTTTTAACTTAAGTGTGATGAAAGTGCAGTTCTAAGTACTAAAAT TTCTAAGTACTGAAAAGAACAAAGACATTTAAAGGATGCAACCCAAAGTGTACTTTACCTCAGTAGTTTC TGAGGGGACTGCAGTCACACCTTGAGACTACAGCTCTCACTTTAGCTGGGAAAAACATCAAGGTGTAGAG GAGGCAAGTTAAATAAAAAGTTGCTCCCCTCCTCATGGGCATGCTTGGTAGAGTGGAAATAATAAAAGAG GTTCTCTATTTCCTCGGTTCCACACATTGCAGAAGATGCTACTGGATGCTAAGTGCAACACATTTGTTCC AAAAGGGCACTCAGTGTGACTTACAGATGCCCCGGAAAGCAGAGGGATGCTCTTTATTAAACAGAAATAT TAGCTCAAACGTTTTCTAGACTGAAGAACACTTTCCTCATTTCCCACAGTTTGCCTCAGAGGTTGAATAC AGGAAGGTTATTATTCATTCATTTGCTTTATTGGTTCGCCTG cDNA: As expected from gDNA

∆Ex7d Xi+/- (clone N4)

Genotype 1 (10/10 colonies) gDNA:

TAGCTCAAACGTTTTCTAGACTGAAGAACACTTTCCTCATTTCCCACAGTTTGCCTCAGAGGTTGAATAC AGGAAGGTTATTATTCATTCATTTGCTTTATTGGTTCGCCTGTTCTACAAGGATTTGCATGTCTCTTAGG CCTTCACTTGGCTCCTGAGACATGGAAAAAGGAAACATAGACATAGGGAAGTGCTGGATGGGGGGGGGGG GTCTCTTTTCTGGGTAGTGGCACGACTTAGTCCTTAGTCCCCAAGTAATATGCAATGTGAGTCCTCATCC TCATGTCTTCTCCGGCCACTGCAATGAGTGGGAAGCTGGGCTTTGTAGCAAGCCTGACCCTAAAGTTACA GAAGCCCTCCACGCTAAGAAACTCAATTTTCTAGGCCATTTTAGCTATGACTGTGACCACTACTGGTCAG GAGGGATGACAGCCATCTAAGTTCCACAATCTTAGGCTACTTTGCATTATCCTGGGGCAAACAAGCCATT TTTGAGCTGCAGCAGGCTTTGAAATACATTGACCAATTTTGCCTGTGTTCGTTAAACCTTTTACCTTTTT ACATGCTAATGCTCACAGTAATTTAGAAATGTTCTCCTTACTATAATATACTCAAGGTGGCTTGCTATGG TAAAATAATGCCAGTGGATGAAAATAACATTAATGTTTAACATTCTTGCATAAAATTTAAGAATAATAAA ATTGACAACAATCAGAAAACTGGAGGAACGAAAGACCAAATTGAAAGAACTTGAAAAAGATTAAAAATGC CTGTGCTTTGACCCTTTCCATTTTTCTTTCACTCACAGAGGGTGGGACAGGAGGCCGAGTGAAGGAAAGG GTCCAGCCTGTCTATCTGGAATCTAAGTTGGGACTTTAATGCAGTTCCACAAAATTGGTATTAATTCGCT AAATGTTTCTGAAAATGTATTTTCATCTAAATGGCTATCAGCTAAGCCTTGAGTCAAATGGGAATGAAAC AGATTAAGTCAATGTGATCTCTTTATCCAAGTTGCCTTAGAGCTGAAGTCACAATTTGCTGTAAGGAAGC TTATTCATTGTAGCATACGCATACTTTCAAAGTATCTAGACTTTACTTAGTAACCCAATCAGGACATTCA GGCAAAAGAAAAGGAACAGAGAAGATGGAGCCAGGTTGAAGAGGTCTGGGAGTTCAAACAAATTTTTTTC


Table S1 Sanger sequencing information (continued) ATTTTCATTAAAACTCAATTGGGCATCAAAAGTGTTACTAATATTAGCTTTTAATTAGTGGAAATTGGCT GGATTCAGTAATATCCCTTTGTATGGGTAGGAATGGGCTTACATTTCTGGAATTTGCAAAGGAAAAAATA ACTGAAAGCCTTCCTTTCACAGTTACTGCCATCAATATTGCTACCAATTAAGCACATCCTACCATCATCT GCTTTGATCACATAAATGAACTGTGTACCAATCTGTTGTTGAAAGACTGGAGTCATCTTCCCACCAACTG TGAAAAAACACATGGAAAACACCTGGACTTTGTGAACGGATGCGGAATACAGAACTTCTGTTGACTCTTG GGTGTTTTGAAGACTTGAAAAAAAAAACTGTTGCTTACCAACATGTCACAATGAGTCCGTGTGTGGGTGG GTGGATGGGTGGGTGGGTGGGTGGGTGGGTGGTTGAGTGGGTGGGGTAGTTTGCTGTTAAATAAAATGCT TTGTTTTGAAAACATCATGTGCTGTATATGATATTTTCCTTAAATTTTATTTAATGTGCATGAGTTTTGT CTATGCATTAGGTTTGGGTGTTATACCCGTGTAGGCCAGCAG cDNA: See Figure 2.2

∆Ex7 Xi+/- (clone 10)

Genotype 1 (7/7 colonies) gDNA: catttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacga tggaagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttacc agttatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccct accattcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagc cattagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctga caattaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttg agatgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaag acctgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatg ccctggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaa ccttaagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgttt cttgttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTT CTTGTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGC TTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTAT ATCTATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTT CTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTT GCTTTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACA TCTATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGT GTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAA TTCTTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTG CATCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTG TGTGTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATT TCTTCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATT CCCATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGT GGTCTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTC TTTCACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCT CACCCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCC TTAGTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTG 
TGTCTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTA TCTTCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTAT TCTAAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGT CATGTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCAC ACCAGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCT


Table S1 Sanger sequencing information (continued) GAAGTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATG GCCCGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTG TGACTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACC TAAGGACCCAATCCTATATGGACAGAATATTTAAGAGATAAAGGCCTATGGCCCAGAACTCTGGAAGGAT ATTTCTATCCTTCTATCCCCAAGACCAAGAAGGGAAATTCGAAGATGAGACCTGCCCCCCAACCCCAGCA TCCCTTTCCATTTCTTATATTTCTATTTAAGCTGTCTTCACTTGAGATGTAATTTTTCATTGTTGCCATT GCCCATAAAGGAATACGTTTTTAGCTGGATAGTATTGTGCAAGGGTCTGTTTTAAACTGGGTCTTAGCCA TTTGTTAAATTGTTGATGTTTTACAACTTCCATTTCTCTTCACATCTGCTCCACTTGAGACGGAACTAAA TCCAGCCAGTGTATATAGCCTGACTATTGAAACTTCCCTAGGAATAAGCATGCATACAGATATGCATACT GCCATCCTCCCTACCTCAGAAGCCCTAGGCTGACAAGAAAAGGAAAGCATCAGGTTGTTAGGGGGAAAAC AATGTCAGGCTATCTAGAGAAAATATAAAGAGTTGTTCCAGACCAATGAGAAGAATTAGACAAGCAATAT GCAGATGTGCCAACCCTCTGAGAAGCACCAGCCAGTGTCACCTTCTTTCTTTGGGCTTAGGTGAGCAGGG TATGGTTTTCTAATAATGGTTTGGGGACAAAATGAGGTCTGAACTCCCTGCTCATAGTAGTGGCCGAGTA ATTTGGTGCATTTCACCAAAGGAACTCCTGGGTCTAATACCTACCTTTAAAATTAATGATGAGAGACTCT AAGGACTACTTAACGGGCTTAATCTTTTTCGTGCCTTCCTCTTCCTCTGTAAGAGGGAAGTTAAATGACA CAGGATGAAAAAGTAACATGCTCATAGCACATTGGCAATTATACATGGTTATTATCTGAAAGTGTAGAGC TTTTCCTATAAGGCATCAGACTAAGTACCTGAAGCTTTGTGGGTTCATGGTCTTAGTTGCATATTCCTTA GTTGCAAATCCTTTTCAAAAGGTAAGAAAAAGGCACACTGGTCTATTGCCTGTACTTGATCAAGCCCTGA TATGAATGCCAGGGAATGTCTGAGTAACATTAATTTCCTTCCCTGCATATTTTTTGTGCTGAATACTAAG GCTGTGATGCTTCACTGTGGTCACCCCCAGGTAACAAGATATTACCAGGTAACCAGGAAACGTATGAATA CGTAAACCATGAAGCCTACTGTAACTTCCAAGTCAGTGCTGAGTATGTATTACATAGTAGCTGAAGTCTA CGCCTCTGTGTGCTATAGGCACAAAGATTGCTCTAGGAATAACATGCTTTGTAAAAACAAATATATGAAC ATAACGGGGCTTGAATGAATAACAGTCCATATACTTAAGGCCAGTGTGTTTCTTCTGCTTTGGTGAGGCT CAGTAAGTTATATTATACCAGGTAGCAGAAGAGAAAACACATGGAAACTGATTTTAAACTACAAACTAGG TCACTAATGCAGGTGATTGATTACCCTATTCTGATCACCTTCTAATTTCTGAATACCCATGTTCAGCACT GGGAATAACAAAGGGGGACATTACCACAGAACTAGAATTTACAAAAGAATGCATTAAATAAAGCATTATA CAGCTATCAATTGTTCCATGTGTGCAAATGAATGACTACTAACTACCTCTGATGTATCCGATATTGTTTT 
GGGTACATGAAATATTCATGAGTAACTGCCATGAAATAAGAATGTTTGCATTCCATACTATTCATAAGGA ATGAGCCAATGCTTAATTTAATCAGTCAAAACTTGAGTGATAAGGGCATGTTAATACAAGAACATTTGCC CAGGTCACATTATGGTTGTGGGTACTTTCTTAACTATAAAGCAGTTCAGTAGTATAAGACAAGACAAATT TTCTATAGAAATAAAGCTGCCTATAAAATAGGCATAGTCTCTACAAAATTTTCATTGTACTTTTTAGCCC ATAATGGGAAGAGTACAGTTAACAAGCTGGGTGTGGTAGCATGTGCTCTGAGCTGAAGCAACAGGACCAC TTGAGCCCAGAAATTGGAGGCTAGCCTGGGAAGACCATAAGGTCAATCTCAAACCTGGAGGCTAAATATT GTCTCCCATGTGTATATTCTCTTTCATGGGTACTGGAGAGATACACAGACGTACATTTCAGTGTGTCCAC ACTTGAGAATAATATGTACGTTGGCATTTTATGAACTCGGAGGTACCATATAAATGTAACAATTCATTTT CTTACTTGGTATCAATTTCCAGGCTTTTAAAATTCTGCCACATTTATTATACTGTGAAAATAAAGTAAAT AAGTAACTGTGAACCACTGAATATATGAAGCATTCAATACTTGATGAGTACATACTGAATGGCAGTCATT TATTACAAAACAGTGCCCTTGCTAGGCACTGGGATGCAAAGAGCATTCTCATTGTCCTGTGTATCTAAAG AAATTATGCATGAGATTAATTTATAATTTGTAAACTGCCATATATATGTGTATATATGCAATATTTGCCT GGTGTGCAATGACTTTGCTTTTATCCCAGGCATGCACAACAGATCTGTGTGGAGCTTTGTGAAGTCTACA GTTCTATAAAGCCGGGACCTAACTGTTGGCTTTATCAGTGAACAGTGATTACTTTCTAAGTTTCATAATG GCTGAAACTTAATCATAATGCTTATCACCTAACACCACCTAATAATAATTTTACCATGCTATGTGTTGAG CGAACACATAGATTGCTTTCTAGCATTATGTAGCACTTATAGGAGTGAAATCTAGACCAAAACTTCAATT CACTTCAATGAGGAAATGAAAACAGAAAAAAAAAATGGATTTGTGCAAGGCAGTGTGCTAAATGTTACAC TGAGTGGACTATGCTGTCTAGGATACTTCCCAGCTGGCTTGACTGAGGAGGTGGAAAAGGTTTTATTAAT GACAGGAACTTTTTCCATCCAGTTTCTTAAATGTTTGTTGAATGCTGCTGCCAGAGATGAATTACAAACA CCTTGCCAGTAAAGGAGTTTTATAGGGCCAGAGTGAGATAATCCCAGAGCATGGGTATCAGGGAACAAAA CGGGAAGAGGCCAGAGCATCTGATGGCATGTACTCAGTGTGGCCCAGAACCTCTCGAACTAGATGTACTG GCTGGAGGGACCAAGCATGCAGAACACAACACCTAATGAAACATTGTATATAAAATATGCTAACCTAGGT CCTAAAACTAAAATGTGAGGTGGACCTAGTGTAGATCACTGATCATAGGAGACATGGTCTCATAAAGCCC


Table S1 Sanger sequencing information (continued) AGGCTGGTTCTAATTGGTGACTGTCACAGCTTCTCAAGTGCTGAGATTACAGATGTGCTTAACCCATGCC CAGCCTGAAGAATATATCTGATTACTGAGTGAATAATATTTTTAAAGAATTATATATTTTATGTATATGA GTACGCTGTTGCTGTCTTCAGACACACCAGAAGAGGGCACCACATCACATTACAGATGGTTGTGAGCCCC CATGTGGTTGTTGGGATTTGAACTCAGGACCTTCGGAAGAGCAGTCAGACTCTTAACCACTGAGTCATCT CTCCAGCCTTCTGAGTAAATATTTTAACTATAATGGCTGTTTGCGAAACCCAACCAAGGCCAAGATTCCT TCAACATAAACTGGAGACTTCCTAGCTAAGGAAGCTCCAAAAGTCATTTTCTCATTGGCCTAGCTTGAAG CCAGGACAGACTTAAAGTCTGTCCTTTAATTCATTACCCATTTTCCTTTTCTTACTGTTGAAGTGTTTCA AAGGAGAATCAAGATGAATCGATAATTCTAAACGTATTTGTTCATTGCCTGGCTCAGCGTCATGTGAGCA AGAAGAATATACTATCACACTCATACTTTTAACTTAAGTGTGATGAAAGTGCAGTTCTAAGTACTAAAAT TTCTAAGTACTGAAAAGAACAAAGACATTTAAAGGATGCAACCCAAAGTGTACTTTACCTCAGTAGTTTC TGAGGGGACTGCAGTCACACCTTGAGACTACAGCTCTCACTTTAGCTGGGAAAAACATCAAGGTGTAGAG GAGGCAAGTTAAATAAAAAGTTGCTCCCCTCCTCATGGGCATGCTTGGTAGAGTGGAAATAATAAAAGAG GTTCTCTATTTCCTCGGTTCCACACATTGCAGAAGATGCTACTGGATGCTAAGTGCAACACATTTGTTCC AAAAGGGCACTCAGTGTGACTTACAGATGCCCCGGAAAGCAGAGGGATGCTCTTTATTAAACAGAAATAT TAGCTCAAACGTTTTCTAGACTGAAGAACACTTTCCTCATTTCCCACAGTTTGCCTCAGAGGTTGAATAC AGGAAGGTTATTATTCATTCATTTGCTTTATTGGTTCGCCTGTTCTACAAGGATTTGCATGTCTCTTAGG CCTTCACTTGGCTCCTGAGACATGGAAAAAGGAAACATAGACATAGGGAAGTGCTGGATGGGGGGGGGGG GTCTCTTTTCTGGGTAGTGGCACGACTTAGTCCTTAGTCCCCAAGTAATATGCAATGTGAGTCCTCATCC TCATGTCTTCTCCGGCCACTGCAATGAGTGGGAAGCTGGGCTTTGTAGCAAGCCTGACCCTAAAGTTACA GAAGCCCTCCACGCTAAGAAACTCAATTTTCTAGGCCATTTTAGCTATGACTGTGACCACTACTGGTCAG GAGGGATGACAGCCATCTAAGTTCCACAATCTTAGGCTACTTTGCATTATCCTGGGGCAAACAAGCCATT TTTGAGCTGCAGCAGGCTTTGAAATACATTGACCAATTTTGCCTGTGTTCGTTAAACCTTTTACCTTTTT ACATGCTAATGCTCACAGTAATTTAGAAATGTTCTCCTTACTATAATATACTCAAGGTGGCTTGCTATGG TAAAATAATGCCAGTGGATGAAAATAACATTAATGTTTAACATTCTTGCATAAAATTTAAGAATAATAAA ATTGACAACAATCAGAAAACTGGAGGAACGAAAGACCAAATTGAAAGAACTTGAAAAAGATTAAAAATGC CTGTGCTTTGACCCTTTCCATTTTTCTTTCACTCACAGAGGGTGGGACAGGAGGCCGAGTGAAGGAAAGG GTCCAGCCTGTCTATCTGGAATCTAAGTTGGGACTTTAATGCAGTTCCACAAAATTGGTATTAATTCGCT 
AAATGTTTCTGAAAATGTATTTTCATCTAAATGGCTATCAGCTAAGCCTTGAGTCAAATGGGAATGAAAC AGATTAAGTCAATGTGATCTCTTTATCCAAGTTGCCTTAGAGCTGAAGTCACAATTTGCTGTAAGGAAGC TTATTCATTGTAGCATACGCATACTTTCAAAGTATCTAGACTTTACTTAGTAACCCAATCAGGACATTCA GGCAAAAGAAAAGGAACAGAGAAGATGGAGCCAGGTTGAAGAGGTCTGGGAGTTCAAACAAATTTTTTTC ATTTTCATTAAAACTCAATTGGGCATCAAAAGTGTTACTAATATTAGCTTTTAATTAGTGGAAATTGGCT GGATTCAGTAATATCCCTTTGTATGGGTAGGAATGGGCTTACATTTCTGGAATTTGCAAAGGAAAAAATA ACTGAAAGCCTTCCTTTCACAGTTACTGCCATCAATATTGCTACCAATTAAGCACATCCTACCATCATCT GCTTTGATCACATAAATGAACTGTGTACCAATCTGTTGTTGAAAGACTGGAGTCATCTTCCCACCAACTG TGAAAAAACACATGGAAAACACCTGGACTTTGTGAACGGATGCGGAATACAGAACTTCTGTTGACTCTTG GGTGTTTTGAAGACTTGAAAAAAAAAACTGTTGCTTACCAACATGTCACAATGAGTCCGTGTGTGGGTGG GTGGATGGGTGGGTGGGTGGGTGGGTGGGTGGTTGAGTGGGTGGGGTAGTTTGCTGTTAAATAAAATGCT TTGTTTTGAAaacatcatgtgctgtatatgatattttccttaaattttatttaatgtgcatgagttttgt ctatgcattaggtttgggtgttatacccgtgtaggccagcagagggtgtcggatcccaaggaaaccaagt tacagacgccatcttgtgtgtagccggaatgtaattcagatcccctggaagagcagttagtgctcttaaa tcagttcaccttatgctaatacaaggagcc cDNA: As expected from gDNA

∆Ex7 Xi-/- (clone 22)

Genotype 1 (6/6 colonies)



gDNA: catttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacga tggaagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttacc agttatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccct accattcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagc cattagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctga caattaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttg agatgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaag acctgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatg ccctggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaa ccttaagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgttt cttgttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTT CTTGTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGC TTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTAT ATCTATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTT CTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTT GCTTTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACA TCTATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGT GTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAA TTCTTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTG CATCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTG TGTGTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATT TCTTCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATT CCCATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGT GGTCTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTC TTTCACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCT CACCCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCC TTAGTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTG TGTCTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTA 
TCTTCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTAT TCTAAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGT CATGTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCAC ACCAGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCT GAAGTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATG GCCCGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTG TGACTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACC TAAGGACCCAATCCTATATGGACAGAATATTTAAGAGATAAAGGCCTATGGCCCAGAACTCTGGAAGGAT ATTTCTATCCTTCTATCCCCAAGACCAAGAAGGGAAATTCGAAGATGAGACCTGCCCCCCAACCCCAGCA TCCCTTTCCATTTCTTATATTTCTATTTAAGCTGTCTTCACTTGAGATGTAATTTTTCATTGTTGCCATT GCCCATAAAGGAATACGTTTTTAGCTGGATAGTATTGTGCAAGGGTCTGTTTTAAACTGGGTCTTAGCCA TTTGTTAAATTGTTGATGTTTTACAACTTCCATTTCTCTTCACATCTGCTCCACTTGAGACGGAACTAAA TCCAGCCAGTGTATATAGCCTGACTATTGAAACTTCCCTAGGAATAAGCATGCATACAGATATGCATACT GCCATCCTCCCTACCTCAGAAGCCCTAGGCTGACAAGAAAAGGAAAGCATCAGGTTGTTAGGGGGAAAAC AATGTCAGGCTATCTAGAGAAAATATAAAGAGTTGTTCCAGACCAATGAGAAGAATTAGACAAGCAATAT GCAGATGTGCCAACCCTCTGAGAAGCACCAGCCAGTGTCACCTTCTTTCTTTGGGCTTAGGTGAGCAGGG TATGGTTTTCTAATAATGGTTTGGGGACAAAATGAGGTCTGAACTCCCTGCTCATAGTAGTGGCCGAGTA ATTTGGTGCATTTCACCAAAGGAACTCCTGGGTCTAATACCTACCTTTAAAATTAATGATGAGAGACTCT AAGGACTACTTAACGGGCTTAATCTTTTTCGTGCCTTCCTCTTCCTCTGTAAGAGGGAAGTTAAATGACA


Table S1 Sanger sequencing information (continued) CAGGATGAAAAAGTAACATGCTCATAGCACATTGGCAATTATACATGGTTATTATCTGAAAGTGTAGAGC TTTTCCTATAAGGCATCAGACTAAGTACCTGAAGCTTTGTGGGTTCATGGTCTTAGTTGCATATTCCTTA GTTGCAAATCCTTTTCAAAAGGTAAGAAAAAGGCACACTGGTCTATTGCCTGTACTTGATCAAGCCCTGA TATGAATGCCAGGGAATGTCTGAGTAACATTAATTTCCTTCCCTGCATATTTTTTGTGCTGAATACTAAG GCTGTGATGCTTCACTGTGGTCACCCCCAGGTAACAAGATATTACCAGGTAACCAGGAAACGTATGAATA CGTAAACCATGAAGCCTACTGTAACTTCCAAGTCAGTGCTGAGTATGTATTACATAGTAGCTGAAGTCTA CGCCTCTGTGTGCTATAGGCACAAAGATTGCTCTAGGAATAACATGCTTTGTAAAAACAAATATATGAAC ATAACGGGGCTTGAATGAATAACAGTCCATATACTTAAGGCCAGTGTGTTTCTTCTGCTTTGGTGAGGCT CAGTAAGTTATATTATACCAGGTAGCAGAAGAGAAAACACATGGAAACTGATTTTAAACTACAAACTAGG TCACTAATGCAGGTGATTGATTACCCTATTCTGATCACCTTCTAATTTCTGAATACCCATGTTCAGCACT GGGAATAACAAAGGGGGACATTACCACAGAACTAGAATTTACAAAAGAATGCATTAAATAAAGCATTATA CAGCTATCAATTGTTCCATGTGTGCAAATGAATGACTACTAACTACCTCTGATGTATCCGATATTGTTTT GGGTACATGAAATATTCATGAGTAACTGCCATGAAATAAGAATGTTTGCATTCCATACTATTCATAAGGA ATGAGCCAATGCTTAATTTAATCAGTCAAAACTTGAGTGATAAGGGCATGTTAATACAAGAACATTTGCC CAGGTCACATTATGGTTGTGGGTACTTTCTTAACTATAAAGCAGTTCAGTAGTATAAGACAAGACAAATT TTCTATAGAAATAAAGCTGCCTATAAAATAGGCATAGTCTCTACAAAATTTTCATTGTACTTTTTAGCCC ATAATGGGAAGAGTACAGTTAACAAGCTGGGTGTGGTAGCATGTGCTCTGAGCTGAAGCAACAGGACCAC TTGAGCCCAGAAATTGGAGGCTAGCCTGGGAAGACCATAAGGTCAATCTCAAACCTGGAGGCTAAATATT GTCTCCCATGTGTATATTCTCTTTCATGGGTACTGGAGAGATACACAGACGTACATTTCAGTGTGTCCAC ACTTGAGAATAATATGTACGTTGGCATTTTATGAACTCGGAGGTACCATATAAATGTAACAATTCATTTT CTTACTTGGTATCAATTTCCAGGCTTTTAAAATTCTGCCACATTTATTATACTGTGAAAATAAAGTAAAT AAGTAACTGTGAACCACTGAATATATGAAGCATTCAATACTTGATGAGTACATACTGAATGGCAGTCATT TATTACAAAACAGTGCCCTTGCTAGGCACTGGGATGCAAAGAGCATTCTCATTGTCCTGTGTATCTAAAG AAATTATGCATGAGATTAATTTATAATTTGTAAACTGCCATATATATGTGTATATATGCAATATTTGCCT GGTGTGCAATGACTTTGCTTTTATCCCAGGCATGCACAACAGATCTGTGTGGAGCTTTGTGAAGTCTACA GTTCTATAAAGCCGGGACCTAACTGTTGGCTTTATCAGTGAACAGTGATTACTTTCTAAGTTTCATAATG GCTGAAACTTAATCATAATGCTTATCACCTAACACCACCTAATAATAATTTTACCATGCTATGTGTTGAG 
CGAACACATAGATTGCTTTCTAGCATTATGTAGCACTTATAGGAGTGAAATCTAGACCAAAACTTCAATT CACTTCAATGAGGAAATGAAAACAGAAAAAAAAAATGGATTTGTGCAAGGCAGTGTGCTAAATGTTACAC TGAGTGGACTATGCTGTCTAGGATACTTCCCAGCTGGCTTGACTGAGGAGGTGGAAAAGGTTTTATTAAT GACAGGAACTTTTTCCATCCAGTTTCTTAAATGTTTGTTGAATGCTGCTGCCAGAGATGAATTACAAACA CCTTGCCAGTAAAGGAGTTTTATAGGGCCAGAGTGAGATAATCCCAGAGCATGGGTATCAGGGAACAAAA CGGGAAGAGGCCAGAGCATCTGATGGCATGTACTCAGTGTGGCCCAGAACCTCTCGAACTAGATGTACTG GCTGGAGGGACCAAGCATGCAGAACACAACACCTAATGAAACATTGTATATAAAATATGCTAACCTAGGT CCTAAAACTAAAATGTGAGGTGGACCTAGTGTAGATCACTGATCATAGGAGACATGGTCTCATAAAGCCC AGGCTGGTTCTAATTGGTGACTGTCACAGCTTCTCAAGTGCTGAGATTACAGATGTGCTTAACCCATGCC CAGCCTGAAGAATATATCTGATTACTGAGTGAATAATATTTTTAAAGAATTATATATTTTATGTATATGA GTACGCTGTTGCTGTCTTCAGACACACCAGAAGAGGGCACCACATCACATTACAGATGGTTGTGAGCCCC CATGTGGTTGTTGGGATTTGAACTCAGGACCTTCGGAAGAGCAGTCAGACTCTTAACCACTGAGTCATCT CTCCAGCCTTCTGAGTAAATATTTTAACTATAATGGCTGTTTGCGAAACCCAACCAAGGCCAAGATTCCT TCAACATAAACTGGAGACTTCCTAGCTAAGGAAGCTCCAAAAGTCATTTTCTCATTGGCCTAGCTTGAAG CCAGGACAGACTTAAAGTCTGTCCTTTAATTCATTACCCATTTTCCTTTTCTTACTGTTGAAGTGTTTCA AAGGAGAATCAAGATGAATCGATAATTCTAAACGTATTTGTTCATTGCCTGGCTCAGCGTCATGTGAGCA AGAAGAATATACTATCACACTCATACTTTTAACTTAAGTGTGATGAAAGTGCAGTTCTAAGTACTAAAAT TTCTAAGTACTGAAAAGAACAAAGACATTTAAAGGATGCAACCCAAAGTGTACTTTACCTCAGTAGTTTC TGAGGGGACTGCAGTCACACCTTGAGACTACAGCTCTCACTTTAGCTGGGAAAAACATCAAGGTGTAGAG GAGGCAAGTTAAATAAAAAGTTGCTCCCCTCCTCATGGGCATGCTTGGTAGAGTGGAAATAATAAAAGAG GTTCTCTATTTCCTCGGTTCCACACATTGCAGAAGATGCTACTGGATGCTAAGTGCAACACATTTGTTCC AAAAGGGCACTCAGTGTGACTTACAGATGCCCCGGAAAGCAGAGGGATGCTCTTTATTAAACAGAAATAT TAGCTCAAACGTTTTCTAGACTGAAGAACACTTTCCTCATTTCCCACAGTTTGCCTCAGAGGTTGAATAC


Table S1 Sanger sequencing information (continued) AGGAAGGTTATTATTCATTCATTTGCTTTATTGGTTCGCCTGTTCTACAAGGATTTGCATGTCTCTTAGG CCTTCACTTGGCTCCTGAGACATGGAAAAAGGAAACATAGACATAGGGAAGTGCTGGATGGGGGGGGGGG GTCTCTTTTCTGGGTAGTGGCACGACTTAGTCCTTAGTCCCCAAGTAATATGCAATGTGAGTCCTCATCC TCATGTCTTCTCCGGCCACTGCAATGAGTGGGAAGCTGGGCTTTGTAGCAAGCCTGACCCTAAAGTTACA GAAGCCCTCCACGCTAAGAAACTCAATTTTCTAGGCCATTTTAGCTATGACTGTGACCACTACTGGTCAG GAGGGATGACAGCCATCTAAGTTCCACAATCTTAGGCTACTTTGCATTATCCTGGGGCAAACAAGCCATT TTTGAGCTGCAGCAGGCTTTGAAATACATTGACCAATTTTGCCTGTGTTCGTTAAACCTTTTACCTTTTT ACATGCTAATGCTCACAGTAATTTAGAAATGTTCTCCTTACTATAATATACTCAAGGTGGCTTGCTATGG TAAAATAATGCCAGTGGATGAAAATAACATTAATGTTTAACATTCTTGCATAAAATTTAAGAATAATAAA ATTGACAACAATCAGAAAACTGGAGGAACGAAAGACCAAATTGAAAGAACTTGAAAAAGATTAAAAATGC CTGTGCTTTGACCCTTTCCATTTTTCTTTCACTCACAGAGGGTGGGACAGGAGGCCGAGTGAAGGAAAGG GTCCAGCCTGTCTATCTGGAATCTAAGTTGGGACTTTAATGCAGTTCCACAAAATTGGTATTAATTCGCT AAATGTTTCTGAAAATGTATTTTCATCTAAATGGCTATCAGCTAAGCCTTGAGTCAAATGGGAATGAAAC AGATTAAGTCAATGTGATCTCTTTATCCAAGTTGCCTTAGAGCTGAAGTCACAATTTGCTGTAAGGAAGC TTATTCATTGTAGCATACGCATACTTTCAAAGTATCTAGACTTTACTTAGTAACCCAATCAGGACATTCA GGCAAAAGAAAAGGAACAGAGAAGATGGAGCCAGGTTGAAGAGGTCTGGGAGTTCAAACAAATTTTTTTC ATTTTCATTAAAACTCAATTGGGCATCAAAAGTGTTACTAATATTAGCTTTTAATTAGTGGAAATTGGCT GGATTCAGTAATATCCCTTTGTATGGGTAGGAATGGGCTTACATTTCTGGAATTTGCAAAGGAAAAAATA ACTGAAAGCCTTCCTTTCACAGTTACTGCCATCAATATTGCTACCAATTAAGCACATCCTACCATCATCT GCTTTGATCACATAAATGAACTGTGTACCAATCTGTTGTTGAAAGACTGGAGTCATCTTCCCACCAACTG TGAAAAAACACATGGAAAACACCTGGACTTTGTGAACGGATGCGGAATACAGAACTTCTGTTGACTCTTG GGTGTTTTGAAGACTTGAAAAAAAAAACTGTTGCTTACCAACATGTCACAATGAGTCCGTGTGTGGGTGG GTGGATGGGTGGGTGGGTGGGTGGGTGGGTGGTTGAGTGGGTGGGGTAGTTTGCTGTTAAATAAAATGCT TTGTTTTGAAaacatcatgtgctgtatatgatattttccttaaattttatttaatgtgcatgagttttgt ctatgcattaggtttgggtgttatacccgtgtaggccagcagagggtgtcggatcccaaggaaaccaagt tacagacgccatcttgtgtgtagccggaatgtaattcagatcccctggaagagcagttagtgctcttaaa tcagttcaccttatgctaatacaaggagcc cDNA: As expected from gDNA

Genotype 2 (5/5 colonies) gDNA: catttgctggaagatggtgctgggtggagagcatctaatgtgataatgtgaggcagggccatgtacacga tggaagatgaacaggctttcacgttatcaaatggcctcacagcagcaactcaaactattatctgcttacc agttatatcacaagaggaatttagcttctaggttttgttgttgttgttgtttgttttggcttggtgccct accattcttacagacttaaacattgaaaagctttaaatagtttatttcttatctccatctgtgaagcagc cattagacttgtgaaggatgtaaaaaccaagcccccccttttttttaatagaagaggagagtgaagctga caattaaatatgcagtcgcttatagtgtttgctgcttacagaagcttttaatccatgtaacagaatgttg agatgttcattctgtgtttaaatgtaatattccctagatgtatgccctttggcaatttagttctgctaag acctgtctgtttgtgaaggtcaaatgaaatcatgaatggaaagtgttgagtacagagcctggcaaatatg ccctggagttgcatgactaggccatttggaagagttgacgggtgtgtcctatggtcctatgttaaggaaa ccttaagtttaacgttgatagcctggtacagtgtactaatggcaatttttttctttgcccttccctgttt cttgttaccctctttctggtggtctttgcttactatcaatcattagTGTGTATTGTGGGTGTGTCTATTT CTTGTTTTATGTATCTATTTTTTCCTTGGTCTGTGTGTCTAATTCTTTGTTACATCTATTTCTTCCTTGC TTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTATGTCTAATTCTTTGTTAT ATCTATTTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACATCTATTT


Table S1 Sanger sequencing information (continued) CTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAATTCTTTGTTATATCTATTTCTTCCTT GCTTTGTGTGTCTGTCTTCCTTGCTTTGTGTCTATTTCTTCCTTGCAGTTGTGTCTAATTCTTTGTTACA TCTATTTCTTCCTTGCTTTTGTGTGTCTTTCTTTCTTGCTTTTGTGTGTCTATTTCTTCCTTGCAGTTGT GTCTAATTCTTTGTTACATCTATTTCTTCCTTGCTTTTGTGTGTCTATTTCTTCCTTGCATTGTGTCTAA TTCTTTGGTATATATATTTCTTCATTGCTTTGTGTGTCTATGTCTCCTTGTGTTGTCTAATTCGTTGTTG CATCTATTTCTTCCTTGCTTTGTGTGTCTATTTCTTCCTTGCTTTGTGTGTCTATGTCTTCCTTGCTTTG TGTGTCTATGTCTTCCTTGTTTTGTGTATCTACTTCTTCCTTGTGTGTCTAATTCTTTGTTACATCTATT TCTTCCTTCCTTTGCATGTCTCCTTCTTTCCTTTGTGTGTCTTTTCTGTCTGCAGTGTGTCTTACCTATT CCCATGTTTCTCCTGCATGTTCTTTCTTGCAGAGCTTTGAGCTTTGTTTCACTTTCTCTGGTGCCTGTGT GGTCTGCTTTGTCTTCACTAGCTATGGCTCTCTGTTTTATCTATCTGGTTGCTATTTCTCTTAGCTTTTC TTTCACTCCTGCCTTTCGTGACTCCCCTTTGGGTCACATGTTGCATGCATCCCTCTCTTTTTCTTGTGCT CACCCCACTTGTTCTTTGTTCAAGTTCTCTTTGTCAGTCCATTTCAGTTTTCTTTCTGCTGCTTCTATCC TTAGTGAATTCTTGTTTACATTTCTTCCCTGCCTTTCTTGGGCCACTTTCTCTGTTTTCTTTTGTATTTG TGTCTCTTTGCTATTGGTGGATTTCTTATCTCAGCATCATTCTGTTGCTTTGTGTTTGCTTGTGTTTCTA TCTTCTACTTTCCTCCTTTCTGTTCACTTTGAGCATTTCATCTCTTTACAAGTCTGTGTCTCTCTTGTAT TCTAAAGTAATCCTTTCTTGGATGTTTCTTTGTATGTACATGTGCGTGTGTGCATGTGTGTTATGTGTGT CATGTGTGAGAGGAGCTTCATAGCCCCTTCCCAATAGGTCCAGAATGTCACCCGTGGAGCCGTTCCTCAC ACCAGACTGCCCTGAGAAATAATCTAAGACAAAATACATCATTCCGTCCGGTCAGGATTCAAGTGGCTCT GAAGTGAACGCCCAAGTAGAAGACAGAAGTTTTGCGACTTGAGATTTAAAAGGACCAAAATACACAGATG GCCCGTCTTGAGCTGGCTGGACAGAATGCTGACAACCCAAAGAAGAGGAACTGTTTCTACAGGACACCTG TGACTTCCAAGAGCGGGGAACTACGTATGTCATAAGACACAAAACCTGAGCTAAGTCCAAGCATAAGACC TAAGGACCCAATCCTATATGGACAGAATATTTAAGAGATAAAGGCCTATGGCCCAGAACTCTGGAAGGAT ATTTCTATCCTTCTATCCCCAAGACCAAGAAGGGAAATTCGAAGATGAGACCTGCCCCCCAACCCCAGCA TCCCTTTCCATTTCTTATATTTCTATTTAAGCTGTCTTCACTTGAGATGTAATTTTTCATTGTTGCCATT GCCCATAAAGGAATACGTTTTTAGCTGGATAGTATTGTGCAAGGGTCTGTTTTAAACTGGGTCTTAGCCA TTTGTTAAATTGTTGATGTTTTACAACTTCCATTTCTCTTCACATCTGCTCCACTTGAGACGGAACTAAA TCCAGCCAGTGTATATAGCCTGACTATTGAAACTTCCCTAGGAATAAGCATGCATACAGATATGCATACT 
GCCATCCTCCCTACCTCAGAAGCCCTAGGCTGACAAGAAAAGGAAAGCATCAGGTTGTTAGGGGGAAAAC AATGTCAGGCTATCTAGAGAAAATATAAAGAGTTGTTCCAGACCAATGAGAAGAATTAGACAAGCAATAT GCAGATGTGCCAACCCTCTGAGAAGCACCAGCCAGTGTCACCTTCTTTCTTTGGGCTTAGGTGAGCAGGG TATGGTTTTCTAATAATGGTTTGGGGACAAAATGAGGTCTGAACTCCCTGCTCATAGTAGTGGCCGAGTA ATTTGGTGCATTTCACCAAAGGAACTCCTGGGTCTAATACCTACCTTTAAAATTAATGATGAGAGACTCT AAGGACTACTTAACGGGCTTAATCTTTTTCGTGCCTTCCTCTTCCTCTGTAAGAGGGAAGTTAAATGACA CAGGATGAAAAAGTAACATGCTCATAGCACATTGGCAATTATACATGGTTATTATCTGAAAGTGTAGAGC TTTTCCTATAAGGCATCAGACTAAGTACCTGAAGCTTTGTGGGTTCATGGTCTTAGTTGCATATTCCTTA GTTGCAAATCCTTTTCAAAAGGTAAGAAAAAGGCACACTGGTCTATTGCCTGTACTTGATCAAGCCCTGA TATGAATGCCAGGGAATGTCTGAGTAACATTAATTTCCTTCCCTGCATATTTTTTGTGCTGAATACTAAG GCTGTGATGCTTCACTGTGGTCACCCCCAGGTAACAAGATATTACCAGGTAACCAGGAAACGTATGAATA CGTAAACCATGAAGCCTACTGTAACTTCCAAGTCAGTGCTGAGTATGTATTACATAGTAGCTGAAGTCTA CGCCTCTGTGTGCTATAGGCACAAAGATTGCTCTAGGAATAACATGCTTTGTAAAAACAAATATATGAAC ATAACGGGGCTTGAATGAATAACAGTCCATATACTTAAGGCCAGTGTGTTTCTTCTGCTTTGGTGAGGCT CAGTAAGTTATATTATACCAGGTAGCAGAAGAGAAAACACATGGAAACTGATTTTAAACTACAAACTAGG TCACTAATGCAGGTGATTGATTACCCTATTCTGATCACCTTCTAATTTCTGAATACCCATGTTCAGCACT GGGAATAACAAAGGGGGACATTACCACAGAACTAGAATTTACAAAAGAATGCATTAAATAAAGCATTATA CAGCTATCAATTGTTCCATGTGTGCAAATGAATGACTACTAACTACCTCTGATGTATCCGATATTGTTTT GGGTACATGAAATATTCATGAGTAACTGCCATGAAATAAGAATGTTTGCATTCCATACTATTCATAAGGA ATGAGCCAATGCTTAATTTAATCAGTCAAAACTTGAGTGATAAGGGCATGTTAATACAAGAACATTTGCC CAGGTCACATTATGGTTGTGGGTACTTTCTTAACTATAAAGCAGTTCAGTAGTATAAGACAAGACAAATT TTCTATAGAAATAAAGCTGCCTATAAAATAGGCATAGTCTCTACAAAATTTTCATTGTACTTTTTAGCCC ATAATGGGAAGAGTACAGTTAACAAGCTGGGTGTGGTAGCATGTGCTCTGAGCTGAAGCAACAGGACCAC


Table S1 Sanger sequencing information (continued) TTGAGCCCAGAAATTGGAGGCTAGCCTGGGAAGACCATAAGGTCAATCTCAAACCTGGAGGCTAAATATT GTCTCCCATGTGTATATTCTCTTTCATGGGTACTGGAGAGATACACAGACGTACATTTCAGTGTGTCCAC ACTTGAGAATAATATGTACGTTGGCATTTTATGAACTCGGAGGTACCATATAAATGTAACAATTCATTTT CTTACTTGGTATCAATTTCCAGGCTTTTAAAATTCTGCCACATTTATTATACTGTGAAAATAAAGTAAAT AAGTAACTGTGAACCACTGAATATATGAAGCATTCAATACTTGATGAGTACATACTGAATGGCAGTCATT TATTACAAAACAGTGCCCTTGCTAGGCACTGGGATGCAAAGAGCATTCTCATTGTCCTGTGTATCTAAAG AAATTATGCATGAGATTAATTTATAATTTGTAAACTGCCATATATATGTGTATATATGCAATATTTGCCT GGTGTGCAATGACTTTGCTTTTATCCCAGGCATGCACAACAGATCTGTGTGGAGCTTTGTGAAGTCTACA GTTCTATAAAGCCGGGACCTAACTGTTGGCTTTATCAGTGAACAGTGATTACTTTCTAAGTTTCATAATG GCTGAAACTTAATCATAATGCTTATCACCTAACACCACCTAATAATAATTTTACCATGCTATGTGTTGAG CGAACACATAGATTGCTTTCTAGCATTATGTAGCACTTATAGGAGTGAAATCTAGACCAAAACTTCAATT CACTTCAATGAGGAAATGAAAACAGAAAAAAAAAATGGATTTGTGCAAGGCAGTGTGCTAAATGTTACAC TGAGTGGACTATGCTGTCTAGGATACTTCCCAGCTGGCTTGACTGAGGAGGTGGAAAAGGTTTTATTAAT GACAGGAACTTTTTCCATCCAGTTTCTTAAATGTTTGTTGAATGCTGCTGCCAGAGATGAATTACAAACA CCTTGCCAGTAAAGGAGTTTTATAGGGCCAGAGTGAGATAATCCCAGAGCATGGGTATCAGGGAACAAAA CGGGAAGAGGCCAGAGCATCTGATGGCATGTACTCAGTGTGGCCCAGAACCTCTCGAACTAGATGTACTG GCTGGAGGGACCAAGCATGCAGAACACAACACCTAATGAAACATTGTATATAAAATATGCTAACCTAGGT CCTAAAACTAAAATGTGAGGTGGACCTAGTGTAGATCACTGATCATAGGAGACATGGTCTCATAAAGCCC AGGCTGGTTCTAATTGGTGACTGTCACAGCTTCTCAAGTGCTGAGATTACAGATGTGCTTAACCCATGCC CAGCCTGAAGAATATATCTGATTACTGAGTGAATAATATTTTTAAAGAATTATATATTTTATGTATATGA GTACGCTGTTGCTGTCTTCAGACACACCAGAAGAGGGCACCACATCACATTACAGATGGTTGTGAGCCCC CATGTGGTTGTTGGGATTTGAACTCAGGACCTTCGGAAGAGCAGTCAGACTCTTAACCACTGAGTCATCT CTCCAGCCTTCTGAGTAAATATTTTAACTATAATGGCTGTTTGCGAAACCCAACCAAGGCCAAGATTCCT TCAACATAAACTGGAGACTTCCTAGCTAAGGAAGCTCCAAAAGTCATTTTCTCATTGGCCTAGCTTGAAG CCAGGACAGACTTAAAGTCTGTCCTTTAATTCATTACCCATTTTCCTTTTCTTACTGTTGAAGTGTTTCA AAGGAGAATCAAGATGAATCGATAATTCTAAACGTATTTGTTCATTGCCTGGCTCAGCGTCATGTGAGCA AGAAGAATATACTATCACACTCATACTTTTAACTTAAGTGTGATGAAAGTGCAGTTCTAAGTACTAAAAT 
TTCTAAGTACTGAAAAGAACAAAGACATTTAAAGGATGCAACCCAAAGTGTACTTTACCTCAGTAGTTTC TGAGGGGACTGCAGTCACACCTTGAGACTACAGCTCTCACTTTAGCTGGGAAAAACATCAAGGTGTAGAG GAGGCAAGTTAAATAAAAAGTTGCTCCCCTCCTCATGGGCATGCTTGGTAGAGTGGAAATAATAAAAGAG GTTCTCTATTTCCTCGGTTCCACACATTGCAGAAGATGCTACTGGATGCTAAGTGCAACACATTTGTTCC AAAAGGGCACTCAGTGTGACTTACAGATGCCCCGGAAAGCAGAGGGATGCTCTTTATTAAACAGAAATAT TAGCTCAAACGTTTTCTAGACTGAAGAACACTTTCCTCATTTCCCACAGTTTGCCTCAGAGGTTGAATAC AGGAAGGTTATTATTCATTCATTTGCTTTATTGGTTCGCCTGTTCTACAAGGATTTGCATGTCTCTTAGG CCTTCACTTGGCTCCTGAGACATGGAAAAAGGAAACATAGACATAGGGAAGTGCTGGATGGGGGGGGGGG GTCTCTTTTCTGGGTAGTGGCACGACTTAGTCCTTAGTCCCCAAGTAATATGCAATGTGAGTCCTCATCC TCATGTCTTCTCCGGCCACTGCAATGAGTGGGAAGCTGGGCTTTGTAGCAAGCCTGACCCTAAAGTTACA GAAGCCCTCCACGCTAAGAAACTCAATTTTCTAGGCCATTTTAGCTATGACTGTGACCACTACTGGTCAG GAGGGATGACAGCCATCTAAGTTCCACAATCTTAGGCTACTTTGCATTATCCTGGGGCAAACAAGCCATT TTTGAGCTGCAGCAGGCTTTGAAATACATTGACCAATTTTGCCTGTGTTCGTTAAACCTTTTACCTTTTT ACATGCTAATGCTCACAGTAATTTAGAAATGTTCTCCTTACTATAATATACTCAAGGTGGCTTGCTATGG TAAAATAATGCCAGTGGATGAAAATAACATTAATGTTTAACATTCTTGCATAAAATTTAAGAATAATAAA ATTGACAACAATCAGAAAACTGGAGGAACGAAAGACCAAATTGAAAGAACTTGAAAAAGATTAAAAATGC CTGTGCTTTGACCCTTTCCATTTTTCTTTCACTCACAGAGGGTGGGACAGGAGGCCGAGTGAAGGAAAGG GTCCAGCCTGTCTATCTGGAATCTAAGTTGGGACTTTAATGCAGTTCCACAAAATTGGTATTAATTCGCT AAATGTTTCTGAAAATGTATTTTCATCTAAATGGCTATCAGCTAAGCCTTGAGTCAAATGGGAATGAAAC AGATTAAGTCAATGTGATCTCTTTATCCAAGTTGCCTTAGAGCTGAAGTCACAATTTGCTGTAAGGAAGC TTATTCATTGTAGCATACGCATACTTTCAAAGTATCTAGACTTTACTTAGTAACCCAATCAGGACATTCA GGCAAAAGAAAAGGAACAGAGAAGATGGAGCCAGGTTGAAGAGGTCTGGGAGTTCAAACAAATTTTTTTC ATTTTCATTAAAACTCAATTGGGCATCAAAAGTGTTACTAATATTAGCTTTTAATTAGTGGAAATTGGCT

Table S1 Sanger sequencing information (continued) GGATTCAGTAATATCCCTTTGTATGGGTAGGAATGGGCTTACATTTCTGGAATTTGCAAAGGAAAAAATA ACTGAAAGCCTTCCTTTCACAGTTACTGCCATCAATATTGCTACCAATTAAGCACATCCTACCATCATCT GCTTTGATCACATAAATGAACTGTGTACCAATCTGTTGTTGAAAGACTGGAGTCATCTTCCCACCAACTG TGAAAAAACACATGGAAAACACCTGGACTTTGTGAACGGATGCGGAATACAGAACTTCTGTTGACTCTTG GGTGTTTTGAAGACTTGAAAAAAAAAACTGTTGCTTACCAACATGTCACAATGAGTCCGTGTGTGGGTGG GTGGATGGGTGGGTGGGTGGGTGGGTGGGTGGTTGAGTGGGTGGGGTAGTTTGCTGTTAAATAAAATGCT TTGTTTTGAAaacatcatgtgctgtatatgatattttccttaaattttatttaatgtgcatgagttttgt ctatgcattaggtttgggtgttatacccgtgtaggccagcagagggtgtcggatcccaaggaaaccaagt tacagacgccatcttgtgtgtagccggaatgtaattcagatcccctggaagagcagttagtgctcttaaa tcagttcaccttatgctaatacaaggagcc cDNA: As expected from gDNA

Xist deletion mES cells

∆RepB Xa+Xi- (clone C9)

Genotype 1 (8/8 colonies) gDNA:

TGAGTCACTGTCCCATAAGGACGTGAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTC GAAGTGCCTGCCCAGGTCGGGAGAGCGCATGCTTGCAATTCTAACACTGAAGTGTTGGATGATGTCGGAT CCGATTCGAGAGACCGAGGCTGCGGGTTCTTGGTCGATGTAAATCATTGAAACCTCACCTATTAAAAGAA AGAAAAGTATCTAAGGCCATTTCAAGGACATTTGACTCATCCGCTTGCGTTCATAGTCTCTTACAGTGCT CTATACGTGGCGGTGCAAACTAAAACTCAGCCCGTTCCATTCCTTTGTATTGTTCAGTGGCTAGTCTACT TACACCTTGGCCTCTGATTTAGCCAGCACTGATCTCAAGCGGTTCTCTAAGCCTACTGGGTATAAGTGGT GACTTTGGCCAGAGTCATAGTGGATCACAAATCACTGGTGAAGAGGTAGAATCCTACCTTCTTCCAAAAT CTACCCCATGACTATTGCTGGGGTTGCATTTTGATTTCAATGAATATTTTGGATGCCAACGACACGTCTG ATAGTGTGCTTTGCTAGTGTTTGAATTTAAAACCGAAGTGATTGTTTTCAAAATGTATTTACGGATTTGC TTACTTGTTGAATTCATTTTAATTACCTTTAGTGAATTGTTACTTTGGAGTCCTTAAAGTTTTCAATAAT TTTTTTGGCAGATGATACTCAAATTACTTGGCACTTAAATGTACTTTCTTTCAAACTCATCCACCGAGCT ACTCTTCAAATTTTTAAGTCTTATAACACAGATACTGTTAATGTAAAGTGAACATTATGACTGGATGTCA GGAGTATTTGAGGTTCTATACCAGTTCAGGCTTTGCTTTTGTTGCTATTGTTGATGCTATATTGACTAAT GGTTTTACTTGTCAGCAAGAGCCTTGAATTGTAATGCTCTGTGTCCTCTATCAGACTTACTGTTATAATA GTAATATTAAGGCCTACATTTCAACTTTCTGTGTGTTCTTGCCTTTATGGCATCTAGATTCTCCTCAAGA CTCAGCAAATAGTGCTGCTGCTATTGCTGCCCCAGCCCCAGGCCCAGCCCCAGCCCCTGCCCCAGCCCCA GCCCCAGCCCCTGCCCCAGCCCCAGCCCCTGCCCCTGCCCCAGCCCCTGCCCCAGCCCCAGCCCCAGCCC CTACCCCTGCCCCTGCCCCTGCCCCACCCAACCAACCCAATCCAGTCCAGCCCTGCCCCAGCCCAGTCCT AGCCCCAGGCCCAGATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCG cDNA: As expected from gDNA

∆RepB Xa-Xi- (clone D2)

Genotype 1 (18/18 colonies) gDNA:

Table S1 Sanger sequencing information (continued) TGAGTCACTGTCCCATAAGGACGTGAGTTTCGCTTGGTACTTCACGTGTGTCTTTAGTCATCATTTTTTC GAAGTGCCTGCCCAGGTCGGGAGAGCGCATGCTTGCAATTCTAACACTGAAGTGTTGGATGATGTCGGAT CCGATTCGAGAGACCGAGGCTGCGGGTTCTTGGTCGATGTAAATCATTGAAACCTCACCTATTAAAAGAA AGAAAAGTATCTAAGGCCATTTCAAGGACATTTGACTCATCCGCTTGCGTTCATAGTCTCTTACAGTGCT CTATACGTGGCGGTGCAAACTAAAACTCAGCCCGTTCCATTCCTTTGTATTGTTCAGTGGCTAGTCTACT TACACCTTGGCCTCTGATTTAGCCAGCACTGATCTCAAGCGGTTCTCTAAGCCTACTGGGTATAAGTGGT GACTTTGGCCAGAGTCATAGTGGATCACAAATCACTGGTGAAGAGGTAGAATCCTACCTTCTTCCAAAAT CTACCCCATGACTATTGCTGGGGTTGCATTTTGATTTCAATGAATATTTTGGATGCCAACGACACGTCTG ATAGTGTGCTTTGCTAGTGTTTGAATTTAAAACCGAAGTGATTGTTTTCAAAATGTATTTACGGATTTGC TTACTTGTTGAATTCATTTTAATTACCTTTAGTGAATTGTTACTTTGGAGTCCTTAAAGTTTTCAATAAT TTTTTTGGCAGATGATACTCAAATTACTTGGCACTTAAATGTACTTTCTTTCAAACTCATCCACCGAGCT ACTCTTCAAATTTTTAAGTCTTATAACACAGATACTGTTAATGTAAAGTGAACATTATGACTGGATGTCA GGAGTATTTGAGGTTCTATACCAGTTCAGGCTTTGCTTTTGTTGCTATTGTTGATGCTATATTGACTAAT GGTTTTACTTGTCAGCAAGAGCCTTGAATTGTAATGCTCTGTGTCCTCTATCAGACTTACTGTTATAATA GTAATATTAAGGCCTACATTTCAACTTTCTGTGTGTTCTTGCCTTTATGGCATCTAGATTCTCCTCAAGA CTCAGCAAATAGTGCTGCTGCTATTGCTGCCCCAGCCCCAGGCCCAGCCCCAGCCCCTGCCCCAGCCCCA GCCCCAGCCCCTGCCCCAGCCCCAGCCCCTGCCCCTGCCCCAGCCCCTGCCCCAGCCCCAGCCCCAGCCC CTACCCCTGCCCCTGCCCCTGCCCCACCCAACCAACCCAATCCAGTCCAGCCCTGCCCCAGCCCAGTCCT AGCCCCAGGCCCAGATACTTTCAGACCTATCCCAAGCCCACTTCTACTTAGAGAAATTCG cDNA: As expected from gDNA

KO MEF cells

EED KO (clone 1-2)

Genotype 1 (5/14 colonies)

CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAGAAGGAGACCCTCTGGTGTTTGCA

Genotype 2 (2/14 colonies)

CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAGAAGGAGACCCTCTGGTGTTTGCA

Genotype 3 (7/14 colonies)

CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAGAAGGAGACCCTCTGGTGTTTGCA

EED KO (clone 1-12)

Genotype 1 (8/14 colonies)

CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAAGAAGGAGACCCTCTGGTGTTTGCA

Genotype 2 (6/14 colonies)

CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAGAAGGAGACCCTCTGGTGTTTGCA

RING1A/B KO (clone 67)

RING1A

Genotype 1 (2/11 colonies)

GCATAGGTTCTGCTCGGACTGCATCGTCACCGCCCTTGCGGAGCGGgtaacaggagg

Genotype 2 (4/11 colonies)

GCATAGGTTCTGCTCGGACTGCATCGTCACCGCCCTGCGGAGCGGgtaacaggagg

Genotype 3 (5/11 colonies)

GCATAGGTTCTGCTCGGACTGCATCGTCACCGCCCTGCGGAGCGGgtaacaggagg

RING1B

Genotype 1 (3/8 colonies)

TTTGTTTGGATATGTTAAAGAACACCATGACTATACAAAGGAGTGTTTACATCGGTT

Genotype 2 (5/8 colonies)

TTTGTTTGGATATGTTAAAGAACACCATGACTACAAAGGAGTGTTTACATCGGTT

RING1A/B KO (clone 13)

RING1A

Genotype 1 (4/12 colonies)

GCATAGGTTCTGCTCGGACTGCATCGTCACCGCCCTGCGGAGCGGgtaacaggagg

Genotype 2 (8/12 colonies)

GCATAGGTTCTGCTCGGACTGCATCGTCACCGCCCTTGCGGAGCGGgtaacaggagg

RING1B

Genotype 1 (2/11 colonies)

TTTGTTTGGATATGTTAAAGAACACCATGACTAACAAAGGAGTGTTTACATCGGTT

Genotype 2 (9/11 colonies)

TTTGTTTGGATATGTTAAAGAACACCATGACTACAAAGGAGTGTTTACATCGGTT

CIZ1 KO (clone 1)

Genotype 1 (4/16 colonies)

CCACCCCAGATGGTCACCCCAAATCTGCAGCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

Genotype 2 (5/16 colonies)

CCACCCCATGATGGTCACCCCAAATCTGCAGCCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

Genotype 3 (7/16 colonies)

CACCACCCCAGATGGTCACCCCAAATCTGCAGCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

CIZ1 KO (clone 5)

Genotype 1 (6/18 colonies)

CCACCCCAGATGGTCACCCCAAATCTGCAGCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

Genotype 2 (8/18 colonies)

CCACCCCAGATGGTCACCCCAAATCTGCAGCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

Genotype 3 (4/18 colonies)

CCACCCCAGATGGTCACCCCAAATCTGCAGCAGTTCTTTCCCCAGGCTACTCGACAGTCTCT

HNRNPU KO

Genotype 1 (20/20 colonies)

CGGCCGGGCGCTCGGGAGCGGGCCTAGAGCAGGAGGCCGCGGCTGGCGCCGAAGAC

KO mES cells

CIZ1 KO

Genotype 1 (8/8 colonies)

CCCCAAGCCTAGCAGCTCCCAGCCTTACACCACCCCAGATGGTCACCCCAAATCTGCAGCAGTT

KO HEK293FT cells

CIZ1 KO (clone 3)

Genotype 1 (6/7 colonies)

ACTGGGAAGGGTTCATGGGGACCCCAACAGGAGGAGGTCCCAGCAAGGACTGGCGAGTGGCC

Genotype 2 (1/7 colonies)

ACTGGGAAGGGTTCATGGGGACCCCAACAGGAGGAGGTCCCAGCAAGGACTGGCGAGTGGCC

CIZ1 KO (clone 5)

Genotype 1 (9/11 colonies)

ACTGGGAAGGGTTCATGGGGACCCCAACAGGAGGAGGTCCCAGCAAAAGGACTGGCGAGTGGCC

Genotype 2 (2/11 colonies)

ACTGGGAAGGGTTCATGGGGACCCCAACAGGAGGAGGTCCCAGCAAAGGACTGGCGAGTGGCC

KI MEF cells

CIZ1-EGFP-3xHA KI

Genotype 1 (16/16 colonies)

Caaggtgaagcctggatcccccggcctcccaccacctcttcgccgctcaacacgcctcaaaaccatggtg agcaagggcgaggagctgttcaccggggtggtgcccatcctggtcgagctggacggcgacgtaaacggcc acaagttcagcgtgtccggcgagggcgagggcgatgccacctacggcaagctgaccctgaagttcatctg caccaccggcaagctgcccgtgccctggcccaccctcgtgaccaccctgacctacggcgtgcagtgcttc agccgctaccccgaccacatgaagcagcacgacttcttcaagtccgccatgcccgaaggctacgtccagg agcgcaccatcttcttcaaggacgacggcaactacaagacccgcgccgaggtgaagttcgagggcgacac cctggtgaaccgcatcgagctgaagggcatcgacttcaaggaggacggcaacatcctggggcacaagctg gagtacaactacaacagccacaacgtctatatcatggccgacaagcagaagaacggcatcaaggtgaact tcaagatccgccacaacatcgaggacggcagcgtgcagctcgccgaccactaccagcagaacacccccat cggcgacggccccgtgctgctgcccgacaaccactacctgagcacccagtccgccctgagcaaagacccc aacgagaagcgcgatcacatggtcctgctggagttcgtgaccgccgccgggatcactctcggcatggacg agctgtacccctatgatgtgccggactacgctggttacccatacgatgttcccgattatgccggtagtta tccatacgacgtgccagactacgccctgtacaagtaa

Table S2 Mass spectrometry data

Mass spectrometry data for Repeat B in vitro RNA pulldowns and ASH2L antibody immunoprecipitation. Proteins with >1 enrichment in total peptides over control are shown.

RepB (Sense) versus RepB (Anti-Sense) in vitro RNA pulldown

Columns: Protein; Unique Peptides RepB(S); Total Peptides RepB(S); Unique Peptides RepB(AS); Total Peptides RepB(AS); Ratio of Total Peptides RepB(S)/RepB(AS)

Hnrnpk 42 778 18 40 19.45
Pcbp2 9 65 3 4 16.25
Ago2 9 14 1 1 14.00
Sptbn1 9 11 1 1 11.00
Rps5 7 39 3 4 9.75
Trim28 7 9 1 1 9.00
Nes 6 9 1 1 9.00
Tagln2 5 8 1 1 8.00
Gfap 1 8 1 1 8.00
Pno1 2 7 1 1 7.00
Col5a1 5 6 1 1 6.00
Rrp1 3 6 1 1 6.00
Pcbp1 14 60 7 11 5.45
Dhx33 4 5 1 1 5.00
Imp3 2 5 1 1 5.00
Rps29 1 5 1 1 5.00
Trrap 5 5 1 1 5.00
Mdn1 4 5 1 1 5.00
Cpsf4 2 5 1 1 5.00
Ahnak 50 63 12 13 4.85
Rpl32 9 37 5 8 4.63
Rpl36 3 9 1 2 4.50
Rpl28 8 27 6 6 4.50
Rbbp4 5 9 2 2 4.50
Rpl10a 14 108 11 25 4.32
Rrs1 10 21 4 5 4.20
Ebna1bp2 11 33 6 8 4.13
Snw1 4 4 1 1 4.00
Ddx47 10 16 3 4 4.00
Mrto4 5 12 2 3 4.00
Rsl24d1 4 12 2 3 4.00
Kdelc1 3 4 1 1 4.00
Dcaf13 3 4 1 1 4.00
Pwp1 3 11 1 3 3.67

Rps10 5 11 3 3 3.67
Rpl30 5 33 4 9 3.67
Ckap4 13 25 6 7 3.57
Gm8730 1 13 1 4 3.25
Rpl18 8 71 6 22 3.23
Rpl4 20 170 18 54 3.15
Myo1e 3 3 1 1 3.00
Rprd1b 2 3 1 1 3.00
Eif6 6 18 3 6 3.00
Aldoa 3 3 1 1 3.00
Sart3 3 3 1 1 3.00
Zc3h18 1 3 1 1 3.00
Picalm 3 6 1 2 3.00
Hist2h2aa1 2 3 1 1 3.00
Fh 3 3 1 1 3.00
Hsd17b10 2 3 1 1 3.00
P4hb 3 3 1 1 3.00
Fen1 1 3 1 1 3.00
Rae1 3 6 2 2 3.00
Naa15 4 6 2 2 3.00
Ncbp1 5 6 2 2 3.00
Gnl2 8 9 3 3 3.00
Lims1 3 3 1 1 3.00
Sec22b 4 6 2 2 3.00
Gsk3a 2 3 1 1 3.00
Metap2 1 3 1 1 3.00
Hist1h1b 4 9 2 3 3.00
Pum1 3 3 1 1 3.00
Nol11 2 3 1 1 3.00
Nup214 3 3 1 1 3.00
Ddx42 4 6 2 2 3.00
Mta3 2 3 1 1 3.00
Ebf1 3 3 1 1 3.00
Rpl14 5 44 6 15 2.93
Hist1h1c 6 32 7 11 2.91
Col3a1 13 22 6 8 2.75
Rps18 12 67 11 25 2.68
Bop1 3 8 3 3 2.67
Rpf1 3 8 3 3 2.67
Rpl21 10 53 8 20 2.65
Hist1h2ba 6 74 6 28 2.64
Ran 4 21 3 8 2.63
Rpl12 9 62 7 24 2.58

Rpl6 20 93 12 36 2.58
Mov10 27 56 14 22 2.55
Hnrnpa0 7 71 8 28 2.54
Slc25a11 4 5 2 2 2.50
Ruvbl2 5 5 2 2 2.50
Hist1h1a 4 5 2 2 2.50
Serbp1 3 5 2 2 2.50
Naa40 2 5 2 2 2.50
Ewsr1 3 5 2 2 2.50
Gatad2b 4 5 2 2 2.50
Ndufa4 2 5 1 2 2.50
Rps16 12 57 8 23 2.48
Col1a2 32 54 18 22 2.45
Magohb 5 12 3 5 2.40
Sf3b3 6 7 3 3 2.33
Ddx31 5 7 2 3 2.33
Snrnp40 3 7 1 3 2.33
Rpl29 4 7 3 3 2.33
Gatad2a 5 7 3 3 2.33
Rpl35a 9 23 8 10 2.30
Eif4a3 20 55 15 24 2.29
Tra2b 5 16 3 7 2.29
Rpl15 8 27 5 12 2.25
Rpl18a 9 29 7 13 2.23
Prdx1 10 20 7 9 2.22
Map4 10 11 5 5 2.20
Rpl5 17 48 12 22 2.18
Rrbp1 26 65 20 30 2.17
Ddx27 10 15 6 7 2.14
Gtpbp4 19 53 14 25 2.12
Rpl7a 13 72 10 35 2.06
Col1a1 51 94 29 46 2.04
Rps21 2 2 1 1 2.00
Dad1 2 2 1 1 2.00
Rplp2 3 6 1 3 2.00
Rplp1 1 2 1 1 2.00
LRWD1 2 2 1 1 2.00
Apex1 2 2 1 1 2.00
Ahnak2 3 4 1 2 2.00
Stat1 1 2 1 1 2.00
Rps12 2 2 1 1 2.00
Nr2f1 2 2 1 1 2.00
Arpc3 3 4 2 2 2.00

Emc1 1 2 1 1 2.00
Nbn 1 2 1 1 2.00
Pin4 1 2 1 1 2.00
Cald1 9 10 5 5 2.00
Scfd1 2 4 2 2 2.00
Plrg1 5 6 2 3 2.00
Stag2 3 4 2 2 2.00
Snap91 2 2 1 1 2.00
Poglut1 2 2 1 1 2.00
Dhcr7 2 2 1 1 2.00
Isg20l2 5 8 2 4 2.00
Dek 2 2 1 1 2.00
Krtcap2 1 4 1 2 2.00
Strap 3 4 2 2 2.00
Glyr1 6 12 4 6 2.00
Vars 2 2 1 1 2.00
Ddx49 1 2 1 1 2.00
Tram1 1 2 1 1 2.00
2210010C04Rik 1 2 1 1 2.00
Gtf3c5 1 2 1 1 2.00
Vdac2 4 6 2 3 2.00
Eef1b 2 4 1 2 2.00
Lyar 5 8 4 4 2.00
Eif3i 1 2 1 1 2.00
Hist1h1e 3 4 1 2 2.00
Bptf 1 2 1 1 2.00
Rpl36 2 4 1 2 2.00
Med16 2 2 1 1 2.00
Nupl2 2 2 1 1 2.00
Rpl13a 10 46 11 23 2.00
Med24 2 2 1 1 2.00
Abce1 3 4 2 2 2.00
Dazap1 2 4 2 2 2.00
Slc25a1 2 2 1 1 2.00
Gtf2i 5 6 3 3 2.00
Ints7 3 4 2 2 2.00
Supt6h 2 2 1 1 2.00
Hnrnpl 2 2 1 1 2.00
Smarcd2 4 4 2 2 2.00
Pcid2 2 2 1 1 2.00
Dhx9 1 2 1 1 2.00
Mlec 2 2 1 1 2.00
Tubb3 2 8 2 4 2.00

Larp1 1 2 1 1 2.00
Exosc10 2 4 2 2 2.00
Dync1li1 5 6 3 3 2.00
Tbl1x 1 2 1 1 2.00
Pabpn1 1 2 1 1 2.00
Khsrp 10 14 7 7 2.00
H1f0 2 2 1 1 2.00
Nfix 3 8 3 4 2.00
Atad2 2 2 1 1 2.00
Fcf1 2 2 1 1 2.00
Dld 2 2 1 1 2.00
Arhgap5 1 2 1 1 2.00
Col2a1 2 2 1 1 2.00
Dnajb2 1 2 1 1 2.00
Hsp90ab1 2 4 2 2 2.00
Wdr43 3 6 3 3 2.00
Safb2 1 2 1 1 2.00
Naa11 1 2 1 1 2.00
Eif3h 1 2 1 1 2.00
Msn 11 15 6 8 1.88
Mta2 11 15 7 8 1.88
Wdr75 8 13 6 7 1.86
Rps24 4 24 6 13 1.85
Rpl7 25 105 18 57 1.84
Lrrc59 4 11 4 6 1.83
Hspa5 8 11 6 6 1.83
Rpl36a 3 11 3 6 1.83
Tex10 7 9 4 5 1.80
Rps15a 5 9 3 5 1.80
Actb 7 36 6 20 1.80
Tubb5 5 9 3 5 1.80
Rps14 6 18 6 10 1.80
Ssr4 3 9 3 5 1.80
Rps11 5 9 4 5 1.80
Utp15 6 9 4 5 1.80
Eef1a1 16 128 16 73 1.75
Rbm3 2 7 2 4 1.75
Eftud2 17 28 11 16 1.75
Xab2 5 7 4 4 1.75
Ehd4 8 14 6 8 1.75
Col5a2 7 7 3 4 1.75
Hp1bp3 15 38 13 22 1.73
Hist1h2af 5 34 5 20 1.70

Adarb1 9 17 7 10 1.70
Nop2 16 42 11 25 1.68
Nop10 1 5 2 3 1.67
Ddx56 6 10 5 6 1.67
Ptbp3 3 5 3 3 1.67
Rpl10 3 10 2 6 1.67
Cct4 5 5 2 3 1.67
Polr2b 5 5 3 3 1.67
Pcna 3 5 3 3 1.67
Eif5b 4 5 3 3 1.67
Gtf3c1 3 5 3 3 1.67
Nars 6 10 5 6 1.67
Rpl24 7 28 6 17 1.65
Hnrnpa3 7 28 8 17 1.65
Rps25 6 31 5 19 1.63
Eef1g 5 13 4 8 1.63
Rpl13 7 21 9 13 1.62
Eif2s2 5 8 4 5 1.60
H2afv 3 11 3 7 1.57
Rpf2 6 14 5 9 1.56
Rpl10l 3 14 3 9 1.56
Rpl23 6 17 5 11 1.55
Atp5b 10 17 7 11 1.55
Rps8 6 20 5 13 1.54
Rps3 15 61 14 40 1.53
Anxa2 14 35 17 23 1.52
Vwa5a 3 3 1 2 1.50
Rps15 3 3 1 2 1.50
Tmem214 5 6 4 4 1.50
Rps17 3 9 3 6 1.50
Raver1 3 3 1 2 1.50
Rbm34 7 9 5 6 1.50
Snrpa1 10 24 10 16 1.50
Purb 2 3 2 2 1.50
Rrp9 6 12 7 8 1.50
Rsbn1 3 3 2 2 1.50
Sf3b2 5 6 4 4 1.50
Rpl19 3 6 2 4 1.50
Cebpz 4 6 4 4 1.50
Fen1 3 3 2 2 1.50
Dynlrb1 1 3 1 2 1.50
Cdc5l 5 6 4 4 1.50
Smpd4 2 3 2 2 1.50

Pabpc4 2 3 2 2 1.50
Tfip11 3 3 2 2 1.50
Ddx39b 5 15 5 10 1.50
Srpr 5 6 4 4 1.50
Dnmt1 2 3 2 2 1.50
Mcm4 7 12 5 8 1.50
Supt5h 2 3 2 2 1.50
Nup160 12 18 9 12 1.50
Idh2 3 3 1 2 1.50
Pogz 2 3 2 2 1.50
Pes1 7 9 5 6 1.50
Cfl1 2 3 2 2 1.50
Cmtr1 5 6 4 4 1.50
Nup188 3 3 1 2 1.50
Sltm 2 3 2 2 1.50
Pycr2 2 3 2 2 1.50
Smarcd3 3 3 2 2 1.50
Eif2s1 3 3 2 2 1.50
Elavl1 7 15 6 10 1.50
Ppan 9 15 7 10 1.50
Slc25a3 4 9 5 6 1.50
Dkc1 13 65 14 44 1.48
Cdk1 9 22 10 15 1.47
Tmpo 12 16 9 11 1.45
Acta2 21 64 19 44 1.45
Numa1 13 16 9 11 1.45
Tarbp2 10 33 9 23 1.43
Rsu1 5 10 4 7 1.43
Mta1 6 10 6 7 1.43
Pdcd11 27 52 25 37 1.41
Smarcb1 4 7 4 5 1.40
Rps19 14 49 15 35 1.40
Stau1 5 7 4 5 1.40
Gcn1l1 7 7 4 5 1.40
Lbr 17 35 12 25 1.40
Ncapg 6 7 4 5 1.40
Utp18 5 7 4 5 1.40
Actr3 4 7 4 5 1.40
Thbs1 6 7 5 5 1.40
Mki67 9 14 9 10 1.40
Myh10 36 68 28 49 1.39
Rpl11 3 11 3 8 1.38
42980 10 22 9 16 1.38

Rpl34 5 11 5 8 1.38
Hnrnpl 24 125 29 91 1.37
Rpl31 5 15 8 11 1.36
Dnaja2 2 4 2 3 1.33
Ruvbl1 4 4 3 3 1.33
Nop2 2 4 1 3 1.33
Nhp2 2 4 1 3 1.33
Hsd17b10 3 4 2 3 1.33
Nle1 3 4 3 3 1.33
Rrp1b 9 12 9 9 1.33
Phf2 3 4 3 3 1.33
Atp13a1 4 4 3 3 1.33
Ints4 4 4 3 3 1.33
2 4 2 3 1.33
Slc25a4 10 28 12 21 1.33
Eif5a 3 4 3 3 1.33
Xrn2 4 4 3 3 1.33
Rcl1 4 8 3 6 1.33
Hsd17b12 5 8 3 6 1.33
Nat10 14 16 11 12 1.33
Rfc3 3 4 3 3 1.33
Orc1 3 4 3 3 1.33
Cdc40 4 4 3 3 1.33
Tmem209 2 4 3 3 1.33
Cenpc 3 4 3 3 1.33
Fubp1 4 8 6 6 1.33
Ddx3x 1 4 3 3 1.33
Cct6a 2 4 3 3 1.33
Ddx50 10 25 13 19 1.32
Rpn2 12 26 12 20 1.30
Flna 71 178 65 137 1.30
Tuba1a 11 35 13 27 1.30
Gar1 4 9 5 7 1.29
Hnrnpc 19 147 21 116 1.27
Vdac3 3 5 2 4 1.25
Ppib 5 10 3 8 1.25
Phb 3 5 4 4 1.25
Dhx37 5 5 4 4 1.25
Hadha 4 5 4 4 1.25
Brix1 8 21 8 17 1.24
Eif4a1 6 21 10 17 1.24
Flnb 20 37 23 30 1.23
Rpl27 9 16 4 13 1.23

Slc25a5 4 16 3 13 1.23
Npm1 11 22 10 18 1.22
Ddx39a 10 33 15 27 1.22
Gapdh 12 28 13 23 1.22
Nup93 15 23 14 19 1.21
Sf3b1 13 23 12 19 1.21
Nup85 4 6 4 5 1.20
Myh9 48 114 44 95 1.20
Prkra 4 18 7 15 1.20
Rps27l 2 6 1 5 1.20
Bub3 4 6 5 5 1.20
Hdac2 2 6 3 5 1.20
G3bp1 5 6 5 5 1.20
Hnrnpa2b1 9 36 13 30 1.20
Ddx18 14 26 16 22 1.18
Utp20 17 20 13 17 1.18
Rplp0 7 20 11 17 1.18
Mak16 3 7 3 6 1.17
Ptbp2 5 7 4 6 1.17
Wars 5 7 4 6 1.17
Immt 7 7 6 6 1.17
Mcm6 6 7 6 6 1.17
Nup107 5 7 6 6 1.17
Dhx9 46 164 54 142 1.15
Nup205 15 23 17 20 1.15
Snrpd1 4 46 4 40 1.15
Nop58 15 63 16 55 1.15
Rps3a 13 24 12 21 1.14
Dync1h1 89 184 104 161 1.14
Arhgef2 5 8 6 7 1.14
Ddx5 15 44 16 39 1.13
Rps28 3 9 3 8 1.13
Snrpe 3 9 4 8 1.13
Rps13 5 9 5 8 1.13
Csrp1 1 9 3 8 1.13
Tra2a 3 9 4 8 1.13
Hist1h4a 7 46 8 41 1.12
Srsf3 5 20 8 18 1.11
Pds5b 14 20 13 18 1.11
Myl12b 6 10 7 9 1.11
Mcm3 8 10 8 9 1.11
Ncapd2 8 10 6 9 1.11
Abcf2 9 11 8 10 1.10

Hnrnpr 10 22 13 20 1.10
Myo1c 13 23 15 21 1.10
Vim 35 185 38 169 1.09
Raly 15 36 15 33 1.09
Ddx17 22 64 21 59 1.08
Rpl23a 8 32 10 30 1.07
Snd1 19 48 26 45 1.07
42989 11 17 11 16 1.06
Rpl3 16 54 20 51 1.06
Prpf19 6 18 7 17 1.06
Col6a3 16 19 17 18 1.06
Atp5a1 11 19 13 18 1.06
Ftsj3 12 20 12 19 1.05
Hnrnpul2 17 49 20 47 1.04
Pabpc1 14 25 14 24 1.04
Ddx21 31 77 34 74 1.04
Col12a1 59 105 68 101 1.04
Lmnb1 18 33 24 32 1.03
Matr3 13 35 19 34 1.03
Top2a 22 38 23 37 1.03
Smc3 22 40 29 39 1.03
Lmna 35 88 43 86 1.02

RepBd versus Aptamer Only in vitro RNA pulldown

Columns: Protein; Unique Peptides Aptamer Only; Total Peptides Aptamer Only; Unique Peptides RepBd; Total Peptides RepBd; Ratio of Total Peptides RepBd/Aptamer Only

Hnrnpk 4 5 27 197 39.40
Srrm1 1 1 9 26 26.00
Pcbp1 1 1 11 25 25.00
Srsf7 1 1 6 24 24.00
Ppig 1 1 9 18 18.00
Pcbp2 1 1 5 13 13.00
Sap18 1 2 9 23 11.50
Srsf3 2 3 6 26 8.67
Rbm8a 2 2 6 16 8.00
Bclaf1 1 1 6 8 8.00
Ddx46 1 1 6 8 8.00
Mov10 1 1 6 8 8.00
Magohb 2 3 8 22 7.33
Rpl28 1 4 2 28 7.00
Rnps1 3 3 10 21 7.00
Cdc40 2 2 10 14 7.00
Pabpc1 1 1 7 7 7.00

Rpl35a 1 1 5 7 7.00
Srsf9 1 1 3 7 7.00
Hist1h1a 1 1 3 7 7.00
Srsf4 3 3 10 20 6.67
Rrs1 2 2 8 13 6.50
Srrm2 10 13 45 80 6.15
Crnkl1 4 5 20 30 6.00
Ncbp2 1 1 4 6 6.00
Rps18 1 1 4 6 6.00
Dhx8 1 1 5 6 6.00
Acin1 3 4 12 23 5.75
Raly 3 3 9 17 5.67
Hnrnpf 3 3 5 17 5.67
Eif4a3 9 16 24 90 5.63
Gtpbp4 4 4 14 21 5.25
Snrnp70 4 5 11 26 5.20
Srsf5 2 2 5 10 5.00
Snrnp40 2 2 6 10 5.00
Rbm39 2 2 5 10 5.00
Rps19 1 1 4 5 5.00
Rpl30 1 1 4 5 5.00
Snrpg 1 1 4 5 5.00
Srsf6 3 4 9 19 4.75
Tra2a 3 4 6 19 4.75
Prpf4b 4 4 13 19 4.75
Rpl13a 1 3 7 14 4.67
Hnrnpl 8 12 22 55 4.58
Ncbp1 4 4 11 16 4.00
Rpl3 2 4 6 16 4.00
Rpl5 3 4 9 16 4.00
Rpl18 3 4 8 16 4.00
Rpl4 4 4 10 16 4.00
Ddx17 3 3 10 12 4.00
Fbl 2 2 7 8 4.00
Rsrc1 1 2 5 8 4.00
Rps16 2 2 6 8 4.00
Bcas2 1 1 3 4 4.00
Olfr868 1 1 1 4 4.00
Nsa2 1 1 3 4 4.00
Kiaa0020 4 4 12 15 3.75
Tra2b 3 3 5 11 3.67
Ddx23 3 3 7 11 3.67
Snrpa1 5 5 12 18 3.60

Hnrnpc 10 23 16 81 3.52
Srsf2 3 6 6 21 3.50
Rpl7 6 6 12 21 3.50
Puf60 2 2 4 7 3.50
Rpl13 2 2 5 7 3.50
Hnrnpa2b1 2 3 5 10 3.33
Srsf1 7 15 14 49 3.27
Snrpb 4 6 8 18 3.00
Snw1 4 5 11 15 3.00
Rpl10a 3 3 7 9 3.00
Trpm4 3 3 6 9 3.00
Eif6 2 2 4 6 3.00
Gar1 2 2 4 6 3.00
Prpf38a 2 2 5 6 3.00
Ppie 1 1 3 3 3.00
Prpf38b 1 1 3 3 3.00
Ybx1 1 1 2 3 3.00
Tcof1 1 1 3 3 3.00
Nop2 1 1 2 3 3.00
Gnl3 1 1 3 3 3.00
Cebpz 1 1 1 3 3.00
Mrto4 6 6 11 17 2.83
Rbm14 4 4 9 11 2.75
Bop1 3 3 6 8 2.67
Tardbp 3 3 4 8 2.67
Ssb 9 13 15 34 2.62
Rpl6 4 5 9 13 2.60
Nop2 5 5 11 13 2.60
Ebna1bp2 3 4 8 10 2.50
Plrg1 2 2 4 5 2.50
Ftsj3 2 2 5 5 2.50
Rpl23 2 2 4 5 2.50
Rpf1 4 7 6 17 2.43
Rpl7a 5 5 9 12 2.40
Son 8 11 19 26 2.36
Rrp9 5 6 7 14 2.33
Rbmx 6 6 7 14 2.33
Nhp2l1 2 3 3 7 2.33
Snrpe 3 3 4 7 2.33
Hist1h1c 2 3 3 7 2.33
Krr1 3 3 6 7 2.33
Utp20 3 3 6 7 2.33
Eftud2 13 17 24 39 2.29

Nop56 12 14 19 32 2.29
Nop58 10 11 12 25 2.27
Prpf8 29 36 56 81 2.25
Hnrnpul1 3 4 5 9 2.25
Ptbp1 4 4 6 9 2.25
Smarca5 6 6 8 13 2.17
Ncl 10 10 17 20 2.00
Rrp1 4 7 5 14 2.00
Hist1h1b 3 5 6 10 2.00
Hp1bp3 4 5 7 10 2.00
Larp7 4 4 8 8 2.00
Rpl27 3 3 4 6 2.00
Nup205 3 3 6 6 2.00
Ppan 3 3 4 6 2.00
Rbm28 2 2 2 4 2.00
Nup160 2 2 3 4 2.00
Pwp2 2 2 4 4 2.00
Hnrnph3 2 2 3 4 2.00
Rpl21 1 1 2 2 2.00
Cfap20 1 1 2 2 2.00
Rps28 1 1 2 2 2.00
U2surp 1 1 1 2 2.00
Rpl9 1 1 2 2 2.00
Imp3 1 1 2 2 2.00
Rpl24 1 1 2 2 2.00
Mta1 1 1 2 2 2.00
Rpl22 1 1 1 2 2.00
Znf326 1 1 1 2 2.00
Elavl1 1 1 2 2 2.00
Chtop 1 1 1 2 2.00
Nhp2 1 1 1 2 2.00
Cdc5l 8 9 15 17 1.89
Dkc1 5 16 9 30 1.88
Mybbp1a 8 8 10 14 1.75
Nup107 4 4 6 7 1.75
Hist1h3a 4 7 4 12 1.71
Hnrnpul2 6 7 9 12 1.71
Rpl11 2 3 2 5 1.67
Utp15 3 3 4 5 1.67
Pds5b 2 3 4 5 1.67
Top1 3 3 5 5 1.67
Rps3 3 3 4 5 1.67
Rpl15 2 3 5 5 1.67

Rbm15 2 3 3 5 1.67
Dhx15 16 20 20 33 1.65
Srsf10 4 5 5 8 1.60
Xab2 6 7 9 11 1.57
Prpf19 6 6 6 9 1.50
Smarca1 3 4 4 6 1.50
Srsf11 1 2 3 3 1.50
Zfr 2 2 3 3 1.50
Rrp1b 1 2 3 3 1.50
Utp18 2 2 3 3 1.50
H1f0 2 2 3 3 1.50
Cbx3 1 2 2 3 1.50
Chd4 2 2 2 3 1.50
Rpl27a 2 2 2 3 1.50
Lmna 17 21 20 31 1.48
Rsl1d1 7 9 9 13 1.44
Snrpd2 4 7 5 10 1.43
Wdr36 5 5 6 7 1.40
H2afy2 7 12 10 16 1.33
Des 2 6 2 8 1.33
Wdr74 6 6 7 8 1.33
Snrpd1 3 3 4 4 1.33
Rplp0 3 3 4 4 1.33
Rpl26 2 3 2 4 1.33
Heatr1 2 3 4 4 1.33
Hist1h4a 9 38 8 49 1.29
Hist1h2ba 2 15 3 19 1.27
Ddx39a 12 16 12 20 1.25
Prpf6 4 4 5 5 1.25
Hnrnpdl 4 4 5 5 1.25
Rpl12 3 5 3 6 1.20
Rpf2 5 7 4 8 1.14
Hnrnph1 6 8 7 9 1.13
Baz1b 8 9 6 10 1.11
Snrnp200 19 21 20 22 1.05

ASH2L antibody immunoprecipitation

Columns: Protein; Unique Peptides; Total Peptides

Vim 46 129
Plec 35 36
Fn1 17 18
Acta2 13 17
Prph 4 17
Hist1h4a 6 13

Lmna 12 12
Ahnak 9 10
Rabbit IgG 6 10
Uba52 5 9
Hist1h2ba 5 9
Des 4 9
Hist1h2af 4 9
Ckap4 8 8
Hspa5 7 8
Nono 6 8
Ciz1 6 7
Hist1h3a 6 7
Serpinh1 6 6
P4hb 6 6
Hsp90b1 6 6
Pdia3 5 6
Hnrnpc 5 6
Hist1h1b 5 5
Hnrnpa3 4 5
Actb 4 5
Ilf2 4 4
Hnrnpa2b1 4 4
Hnrnpu 4 4
Hist1h2bb 3 4
Ina 1 4
Gfap 1 4
Hnrnpm 3 3
Rbmx 3 3
Hist1h1c 3 3
Elavl1 3 3
Tuba1a 3 3
Hist1h1a 3 3
Sfpq 3 3
Tmpo 3 3
Hspa8 3 3
Cow IgG 3 3
Dnase1 3 3
Nes 3 3
Hnrnpr 2 3
Gsn 2 2
Anxa2 2 2
Lyz1 2 2
Hist1h1e 2 2

Ddx17 2 2
Anxa1 2 2
Tardbp 2 2
Hnrnpk 2 2
Hnrnpa1 2 2
Hp1bp3 2 2
Rcc1 2 2
Lmnb1 2 2
Prdx1 2 2
Ywhab 2 2
Ptbp1 2 2
Ncl 2 2
Myo6 1 2
Hnrnpd 1 1
Alb 1 1
Hba 1 1
Hnrnpab 1 1
Trap1 1 1
Ppib 1 1
Psma8 1 1
Npm1 1 1
Rpl7 1 1
Col18a1 1 1
Sf3b3 1 1
Tubb2a 1 1
Wdr5 1 1
Celf2 1 1
Prdx2 1 1
Synm 1 1
Snrpd3 1 1
Ptbp2 1 1
Crem 1 1
Hbb-b1 1 1
H2afy2 1 1
Vcl 1 1
Aldoa 1 1
Rbbp5 1 1
Raly 1 1
Cat 1 1
Znf692 1 1
Rpl6 1 1
Hspa1a 1 1
Fbl 1 1

Actn1 1 1
Csda 1 1
Cep112 1 1
Ddx5 1 1
Psmb3 1 1
Capza1 1 1
Atp5a1 1 1
Prdx4 1 1
Pcolce 1 1
Tbc1d8b 1 1
Atad3 1 1
Dnmt3a 1 1
Krt13 1 1
Rpl12 1 1
Ints7 1 1
Rapgef3 1 1
Sema6c 1 1
Tpm1 1 1
Dnajb11 1 1
Eno1 1 1
Plec 1 1
Fabp5 1 1
Hnrnph1 1 1
Ldha 1 1
Sumo2 1 1
Tmem202 1 1
Celf1 1 1
Rplp0 1 1
Tecta 1 1
Vdac1 1 1
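
The ratio column in the two RNA pulldown tables of Table S2 is the total peptide count in the sample divided by the total peptide count in the control, and only proteins with a ratio above 1 are listed. The following is a minimal sketch of that arithmetic, using three rows copied from the RepB (Sense) versus RepB (Anti-Sense) table. The function names `total_peptide_ratio` and `enriched` are hypothetical and are not taken from the original analysis, and rounding to two decimals is an assumption inferred from the tabulated values.

```python
from typing import List, Tuple


def total_peptide_ratio(sample_total: int, control_total: int) -> float:
    """Ratio of total peptides in the sample pulldown over the control pulldown."""
    if control_total <= 0:
        raise ValueError("control_total must be positive")
    return sample_total / control_total


def enriched(rows: List[Tuple[str, int, int]],
             threshold: float = 1.0) -> List[Tuple[str, float]]:
    """Keep proteins whose ratio exceeds the threshold, highest ratio first.

    Each input row is (protein, total peptides in sample, total peptides in control).
    Ratios are rounded to two decimals only for display (assumed convention).
    """
    scored = [(name, total_peptide_ratio(s, c)) for name, s, c in rows]
    kept = [row for row in scored if row[1] > threshold]
    kept.sort(key=lambda row: row[1], reverse=True)
    return [(name, round(ratio, 2)) for name, ratio in kept]


# Total peptide counts copied from Table S2: RepB(S) and RepB(AS).
rows = [("Lmna", 88, 86), ("Hnrnpk", 778, 40), ("Pcbp2", 65, 4)]
result = enriched(rows)
print(result)
```

For these three rows the sketch returns Hnrnpk (19.45), Pcbp2 (16.25) and Lmna (1.02), matching the tabulated ratios.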

Table S3 Guide RNA sequences

Protein knock-in/out cells    Coding region
HNRNPK KO                     AACCTACCTCTTCCAAGGTA
EED KO                        TTTAACTGGCACAGTAAAGA
RING1A KO                     CTGTTACCCGCTCCGCAGGG
RING1B KO                     AAAGAACACCATGACTACAA
HNRNPU KO                     CTCGGGAGCGGGCCTAGAGC
CIZ1 KO (mES cell)            CAGCCTTACACCACCCCAGA
CIZ1 KO (HEK293FT)            GCCACTCGCCAGTCCTTGCT
CIZ1 KO (MEF)                 GGGGTGACCATCTGGGG    GCAGCAGTTCTTTCCCC
CIZ1-EGFP knock-in            GAGCGCCGAAGGGGAGG    GCCTCAAAACCTGATAG

Xist deletion cells           Upstream                Downstream
∆RepA                         ATTCTTGCCCATCGGGGCCA    CATCCACCAAGCGCCCCGTT
∆RepF                         GAAGCCATAATGGCGGACGC    AAGCATGCGCTCTCCCGACC
∆RepB (MEF and mES cell)      ATAAGGACGTGAGTTTCGCT    CTCTAAGTAGAAGTGGGCTT
∆RepBa                        ATAAGGACGTGAGTTTCGCT    AAGCGGTTCTCTAAGCCTAC
∆RepBb                        AAGCGGTTCTCTAAGCCTAC    CAGATGATACTCAAATTACT
∆RepBc                        CAGATGATACTCAAATTACT    GGAGAATCTAGATGCCATAA
∆RepBd                        GGAGAATCTAGATGCCATAA    CTCTAAGTAGAAGTGGGCTT
∆RepC                         CTCTAAGTAGAAGTGGGCTT    GTGTATCTTGATTAACATGA
∆C-D                          GTGTATCTTGATTAACATGA    TACAGGAGTCCTGATCTAAA
∆RepD                         ACAGTTGTGCCTTTTAGATC    TGTACGAAGTGCTCTTCATT
∆Ex1 3'                       TCTGCTTTAAAGCGGAGAAG    ATGGCAAGATGGCACGCGGA
∆Ex2-6                        GAGTCAGACGTCATTACAGA    TTCATCTTCCATCGTGTACA
∆RepE (Clones 4 & 16)         AGAATTAGACACACAGACCA    ATACATCATTCCGTCCGGTC
∆RepE (Clone 1-2)             AGAATTAGACACACAGACCA    AGAACATGCAGGAGAAACAT
∆RepE (Clones 3-9 & 3-16)     ATTCTAAAGTAATCCTTTCT    ATGGCCCGTCTTGAGCTGGC
∆Ex7a                         ATACATCATTCCGTCCGGTC    TTACGTATTCATACGTTTCC
∆Ex7b                         TTACGTATTCATACGTTTCC    TGTTCGCTCAACACATAGCA
∆Ex7c                         TGTTCGCTCAACACATAGCA    TTGCCTCAGAGGTTGAATAC
∆Ex7d                         TTGCCTCAGAGGTTGAATAC    ACAGAACTTCTGTTGACTCT
∆Ex7                          TTCATCTTCCATCGTGTACA    ACAAGATGGCGTCTGTAACT
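
The guide sequences in Table S3 can be cross-checked against the Sanger reads in Table S1 by searching each read for the guide on either strand. The sketch below illustrates this for the EED KO guide and the EED KO read as printed in Table S1. The helper names `reverse_complement` and `locate_guide` are hypothetical, exact string matching is an assumption, and a read carrying an indel inside the target site would not match.

```python
from typing import Optional, Tuple

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (upper-cased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def locate_guide(read: str, guide: str) -> Optional[Tuple[str, int]]:
    """Find an exact guide match in a read.

    Returns (strand, 0-based start) where strand is "+" if the guide itself
    occurs in the read and "-" if its reverse complement does, or None.
    """
    read_upper = read.upper()
    candidates = (("+", guide.upper()), ("-", reverse_complement(guide)))
    for strand, target in candidates:
        position = read_upper.find(target)
        if position != -1:
            return strand, position
    return None


# EED KO read (Table S1) and EED KO guide (Table S3), copied as printed.
eed_read = "CATTGTTTGGAGTTCAGTTTAACTGGCACAGTAAAGAAGGAGACCCTCTGGTGTTTGCA"
eed_guide = "TTTAACTGGCACAGTAAAGA"
print(locate_guide(eed_read, eed_guide))  # ('+', 17)
```

Searching both strands matters because some guides, such as the RING1A KO guide, appear in the printed reads only as their reverse complement.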

Table S4 FISH probe sequences

Region Sequence Ex1 RepA TAAGTATCCAAAACCCCGTTGGGCAT AAAAGCAGGTATCCACAGCCCAGAT AAAAAGCAGGTATCCATGGCCC TCCCGATGGACCGAGAAAGG

ATGGGAAAAAAAGACTAAACGCAGGTATCC GGTATCCGAGGCCCCGTTG ACAGCCCCGATGGGCAAAAG CCCGTTGGGCCGTGAAAAAA GCAGAAGCCATAATGGCGGAC RepF GGCGTAACTGGCTCGAGAATAGC

TGTGCTCCTCGGTGTCCTAATTCTT CAATTGGTTGCTTTTATCCAGTCCGC

GCCTTTAGCTAGCGCAGCG CAACCCCGCACATATAAAGAAAGCC

TACCACAAATCAAGGCGAATCCC ACGAGCACTCCTTGGCTTTCT

GCGCAACACCGCACACTAAT

CCCTAGTCCTCTGCGGCTTC ACACCCACGCTGAGCCCTAT

GGGTCCATATGCACACACACCC TTCTGAGCAGCCCTTAAAGCCAC

CCCCGAGCCGCCATTTTATAGAC

ACAAAGATTGGGCTGTCGAGCC CCTAAATGTCCTATAATCCATTGCTACACA

CGCCATCTTTTCCTGTACGACCTAAA GCGTTATACCGCACCAAGAACTT GCCACCTCTCCAGGCTAACTC TCCGCCATCTTAGACACATTCAAGA CCACTAGAGGGCAGGTCACA

CCAAGTAGCTAAAGCCCGCCAAA AGTGTACCCTCGGGCAAAGC

CCACACCCTACCATAATGCACCA TGGCACAAGGTAGGACCAACC CCACTTAGCCTTGCCTCAGCTTC ACGTCAAGTGGCAACCAACACTTC CACTTGTGCCCATTTCTGACGAGTTA

GGAAGTCAAGCAAACACCAACAC GGCCACCAAGCAATAATGCACA AGTAGCTAAAGCCCGCCAAA CCATAATGCACCAAGTGTACCCTCG CACAAGGTAGGACCAACCACACC GTGCCCATTTCTGACGAGTTACGTC

CTGGAAGTCAAGCAAACACCAACACTT GGCCACCAAGCAATAATGCACATTT AGTGACTCATCACAGTCTAATTCCATCC RepB RepBa TGTTAGAATTGCAAGCATGCGCTCT

GGATCCGACATCATCCAACACTTCAG

CAGCCTCGGTCTCTCGAATC AATGATTTACATCGACCAAGAACCCG

CAAATGTCCTTGAAATGGCCTTAGATACTTTTC ACTGTAAGAGACTATGAACGCAAGCG

GAGTTTTAGTTTGCACCGCCACGTATAG

ACTAGCCACTGAACAATACAAAGGAATGGAA GGCTAAATCAGAGGCCAAGGTGTAAGT

GTAGGCTTAGAGAACCGCTTGAGATCAG RepBb ATCCACTATGACTCTGGCCAAAGT

GTAGGATTCTACCTCTTCACCAGTGATT CCAGCAATAGTCATGGGGTAGATTTTGG

TGTCGTTGGCATCCAAAATATTCATTGAAATCA

ATTCAACAAGTAAGCAAATCCGTAAATACATTTTG GGACTCCAAAGTAACAATTCACTAAAGGT

RepBc TGAAGAGTAGCTCGGTGGATGAGTTTG GTTCACTTTACATTAACAGTATCTGTGTTATAAGACTTAAA

GCCTGAACTGGTATAGAACCTCAAATACTC ACCATTAGTCAATATAGCATCAACAATAGCAAC

CTATTATAACAGTAAGTCTGATAGAGGACACAGAGC

GCAAGAACACACAGAAAGTTGAAATGTAGGC RepBd GCACTATTTGCTGAGTCTTGAGGAGAAT AAGTGGGCTTGGGATAGGTCTGAAA

RepC GGGCACTGCATTTTAGCAATAGG GAAGTCAGTATGGAGGGGGTAT

C -D CTCTGATATAAAACTCTTGTTTGATTGGTGTATCTTG TTGAGAGATGATACCTCCATGGCAAGT

ACTACTACAGCAATGACAGAATGGTTTTC AACACTGCTTAGAAACTTGGGACTGTG

CTAAAGTCTAATCCAATGGACAAAATATTTCTGACA

CTAAATGCACACAGGGCTGGACTAG AATGCCTTGAAAATTGGGACTGAGCAC

GACAGCATGCCAACAGTATATAGTATTCTACCC AAAGCATGTGAGACTAGTATACAATATCATGAGC

CAGTGCAGAGGTTTTTGGCTGAAATAAG

GACTCAATTCCTAGTCAGGATTATCCACATA CATAAAGCAAGGGTAGTATTAGGACCTTGAG

GTGTAAGAGGCATTAAGTAATCAGCACCCTC GGACTTGAACAACTGCAATTTTGCACAATT

GAGGGTATGGGATCTTGGTTACTAACA ACCCTTGATTGTCACCCATTAGGGTA AACAGAGAAGTGGTCTCATTGGTTGG TACTAGCGTACACAAGACTCAAGGTTTGATTC

GGCTGTAGCTCTATGACAGTGCTT AAATATGTTTACATTACAGGTGGCAATGCCTG CAATGCTTAGGAAGAGGGACAAATGCA GATCAAAGGTCTTCTTGATTACCAACAAAATGAC GCCAGCCTGGAAGTTAAAAACAGGAC

AACACTTGTTAAACGCAGGCTAGATCCT GACATATGCACATTCACAATATGAAAGACTGC CACATTTACTATGTTAAGGATCTTAAATACTGCTGCAA GGACATGAGAATATGTACAATAGAGTTAACACTGTG RepD GAGCCTCTTTGCTTTGCTAAGTACAGG

GTATTGGAATCGTTTCAATCTATAGTCTCATGAAAG CAGAAGTCTTACCTTGAAGGACCATTGACC

TCCATAAAGCAAAGAGGATATGAATGATCAGAGA

GGGTTTCAAGTGCACAGCACATACATAAT CAAAGTTAAGAGTAAAATTGATCCTACTAAAATTGCCAG

GGACCTTATAGGTCAACACCACTTCTGTAC CTACAATCAGTCATTATTATTAGCAAGCCACAA

GTTCAGGTTTCCTTCTGTAGTGAACAGA

GTAAATGCATACTAATATGCAAGGTACATGTTTATGG CTTGCTGTACTGCAAAAGGGTTTGA TGGAAAGGAGGGGACAGCCTTAT GTTCAAGGGACATGTTATCAATTAAAAACCCCAT ATGAATGGATCATGTCCCTGTTATATACATTAAT

GGATAAGTGTAATTTTACACATTAACTGGCCAAGTA AGATACAATGGTCCGAAAAGTAATAAGGTTGT

GGTATCCCTTGCAGGAGTGCA ACTAGCAGGGACCTTGGGAGATAAAC

GCGAAGGAGTATGGCCTTTGTTTACTG CTTGTTGATCAGCATATCCTGATATAGTGCAAAT

CATTTAATATATGATAACAGTCCAAAAGAATGCGGC

AGGTGGCAGTGCATACGCATA TGGGATCCTTTCAAGTGCACAGAG

CTGCTGAAGGTGCTAAGGAAGTGA TTTCCAAATGAAAGTCTTGAGCTTATTATCACTTC

GTAGTTCTTAGAGAAGTGCTTAGACATGTGAAC

Ex1 3' CTTAAGTCCAAATGAAGAGCACTTCGTAC TCCAAAGGTAATGATGGACAACTAACTGCA

GGTGTCAAAAGAGTACAAGGATGTAAAATCC GACAAGAGAAGTGCTCAGAAATAATTAATATGCCTC

GCACAGGTCATATGTGTAAGGGTACAGA GTAGGCATTTCAGAACCTTTGCTGC CACTCAGCAGCCCCAGTCAAAA TATGCAAAGAGAGCACACAGGTCCTT

TAATTCTGGGACTCAGTAGCCTTGAT GCTCAGCAACCTCTGCAATAATGTAAAGG TGCTCATTGACAGTACCGGGTAGTTT GAAGAGAGCAGGTCATTCGTCAGA TGTGCCTATTAAGAGTCCCAAAATCAGT

GGAAACTAGCAGGGGATTTGCTCAAG GGTCATTGTTGGGTGTACATACTGCTGTA TAAAGCAATGCAAAAAGGGTCGAAATCCG CACTATAAAGTGTGAGATTGTGACTATTTAGGATTG GACAAACTAGAGGCCGGGCAAAAG

GTTTGTGAGAATTAGTGATTCAGAAAAGTGGTCAATG GAGAAAGGGCACAAACTCTTCCTTAATGATG

GCAAAAGGGATGGCATGATGGAA GCCCAGTCAATACTGTTCAAACAAAGAG CATATGAAGTGAGTAAACTGGTGTTGTTGACTTTA

AAGGAGAAGGCAACTGAGACACTGTAG AATTCGAGGTGTCTCTGTTCAGTGG

GACACATAGGGTGTTTGGGGTAATTAACATTA CCTGCTTTTGCTATTCAGTGTATAAATACACC GTTAAAGATTGAAGGCTCATCCACCTAGTTTTG AGCAGTTGGAATTGGAATAATTTAGAAGTGCAA AAGAATGGTTCCTGGCCCTAGAATGT GCAATCCAACTCCCTGAGCTACTTG CTTAAAGTTTCTTAAGCAGACAGTTGGCTCC ATTATCAATAACTTTTCTGGCAGTTGGTCC CCTTGCAATCCAAATGCCTTTCTTAAGG GGCTATTTTTACTTCACAAAAGCTAATGATCTCA

TGAGACACTGCTTAGTCTTCAGAAACATG CCACAGACTCATCACCCTCAGTACA

GTCTTAAACATTCTGCAATAGTTGCACTGATC

GTTAATCATACTAAAGGCCACACAAAGATTGAC GACAAAATATAATGATTAGAAGGCTTAGGTCATCTTCC

CAAAGGCGACTTGACATGTTCTCAAATTTAATCC CAAGCTACTTCAAATTATTGCCAGAGTTTAATGG

GTAAAGTGGGGAAACAGGTTCTATCATCTC

CGTTTAAAGGAATGATTAGATCCTGGCAGAC CTAGAGAAAGCTCTTTTCTTACTAGGAAAATCTCTC

CTAAGTACTCGGCGGCTACAAAGC TCTGAGCACCATGTGAATGAATGACTG

AGATAGACGAACCAGCTCCCATTG Ex2-6 GAGCACAAAACAGACTCCAAATTCATCC CAGGCAATCCTTCTTCTTGAGGCAG CTCTCCAGCACTCTTCACTCCTCTAAATC

CTTGAGTCTCACATAGGGATTGTTTGTCC GCTGTATAGGCTGCTGGCAGTC TGAAGAGAAGTTCTGCTGAGATGTAATGTA GTAAGCTTTTTGTTCAGAGTAGCGAGGAC CCACAATTCTGGGGGGAGAT

CAGCACTGCAAAGCAGCAAGC

GGATCGTCAAAGGGAATAGGTCGC CTCATGCCCCATCTCCACCTAG GTGAGCTATTCCCCTGGAGGATC TAGGCCTGTTGCCCAGTGGT

AAGCGTCTCACTGAAATCTGGGC

TTTGTCTTCCTTGCTGGGTTCAGGA

CTTTGATGTAGGGTGGCATTCTTTGAGC TGCCACTATTGCAGCAGCTTTTCTC CAGAATGGCTTCCTCGAAGGTCAG

CCTAGCTTCTGGAGAGAGAACCAAATAGAG

CTTGAAGTATGTAAACAGCTGGCAAAGCT

GTATTATATGGCATGAGTAGGGTAGCAGTGC GGGGAAGGGTAATATTTGGTAGATGGCA GAAGACCCAGTTTTCTGTGCTGCTTT GTGTGTTCACATTGCTTGATCACGCT CTCGGGTCATTTATAAAGCTGCCTTCC

GATGCTGCAGTCAGGCATGTTGATC Ex7 RepE ATAGACACACAAAGCAAGGAAG AATAGATGTAACAAAGAATTAGACACA ACTGCAAGGAAGAAATAGACACA GACACACTGCAGACAGAAAAGACACAC

ACATGCAGGAGAAACATGGGAATAGGTA

AGACCACACAGGCACCAGAGAAA

CAGAGAGCCATAGCTAGTGAAGACAAAGC

GATGCATGCAACATGTGACCCAAAG AAGAACAAGTGGGGTGAGCACAAGAAA

CTGAAATGGACTGACAAAGAGAACTTGAAC

CAAGAATTCACTAAGGATAGAAGCAGCAGAAAGA

CAGAATGATGCTGAGATAAGAAATCCACCAATAG

AGAATACAAGAGAGACACAGACTTGTAAAGAGA GATTATTTCTCAGGGCAGTCTGGTGTGAG

Ex7a CGTTCACTTCAGAGCCACTTGAATCC

CCTTTTAAATCTCAAGTCGCAAAACTTCTGTCTTC

GGACTTAGCTCAGGTTTTGTGTCTTATGAC CCATATAGGATTGGGTCCTTAGGTCTTATGC CATCTTCGAATTTCCCTTCTTGGTCTTGG GCAACAATGAAAAATTACATCTCAAGTGAAGACAGC

CCAGCTAAAAACGTATTCCTTTATGGGCAATG CCCAGTTTAAAACAGACCCTTGCACAATAC CTATATACACTGGCTGGATTTAGTTCCGTCTC ATGCTTATTCCTAGGGAAGTTTCAATAGTCAGG TCTAATTCTTCTCATTGGTCTGGAACAACTC

GGTTGGCACATCTGCATATTGCTTG CCTAAGCCCAAAGAAAGAAGGTGACAC CCAAACCATTATTAGAAAACCATACCCTGCTC GAGCAGGGAGTTCAGACCTCATTTTG GCTATGAGCATGTTACTTTTTCATCCTGTGTC

GATGCCTTATAGGAAAAGCTCTACACTTTCAG TGCAACTAAGGAATATGCAACTAAGACCATGAA

TGATCAAGTACAGGCAATAGACCAGTGTG GTTACTCAGACATTCCCTGGCATTCATATC CCACAGTGAAGCATCACAGCCTT

Ex7b CAGTAGGCTTCATGGTTTACGTATTCATACG CACAGAGGCGTAGACTTCAGCTACTATG

GTTATTCCTAGAGCAATCTTTGTGCCTATAGC GGACTGTTATTCATTCAAGCCCCGTTATG

CCTCACCAAAGCAGAAGAAACACACTG

GGTAATCAATCACCTGCATTAGTGACCT GAACATGGGTATTCAGAAATTAGAAGGTGATCAG GTAATGTCCCCCTTTGTTATTCCCAGTG CACACATGGAACAATTGATAGCTGTATAATGC CGGATACATCAGAGGTAGTTAGTAGTCATTCA

CATGGCAGTTACTCATGAATATTTCATGTACCC GCATTGGCTCATTCCTTATGAATAGTATGG

CTTGTATTAACATGCCCTTATCACTCAAGTTTTGAC GTACCCACAACCATAATGTGACCTGG

CTTGTTAACTGTACTCTTCCCATTATGGGC GATTGACCTTATGGTCTTCCCAGGCTAG

GAGAATATACACATGGGAGACAATATTTAGCCTC

CTGAAATGTACGTCTGTGTATCTCTCCAGTAC ATGCCAACGTACATATTATTCTCAAGTGTGG

ATGAATTGTTACATTTATATGGTACCTCCGAGTTC GAATGCTTCATATATTCAGTGGTTCACAGTTAC

AATGACTGCCATTCAGTATGTACTCATCAAG

CTTTGCATCCCAGTGCCTAGCAA CAAAGTCATTGCACACCAGGCAAATATTG

ACACAGATCTGTTGTGCATGCCTG GTCCCGGCTTTATAGAACTGTAGACTTCA

Ex7c CTCCTATAAGTGCTACATAATGCTAGAAAGCAATC AGTATCCTAGACAGCATAGTCCACTCAGTG

TTTTCCACCTCCTCAGTCAAGCCA GGTGTTTGTAATTCATCTCTGGCAGCAG

CACTCTGGCCCTATAAAACTCCTTTACTGG

GTTTTGTTCCCTGATACCCATGCTCTG GAGTACATGCCATCAGATGCTCTGG

CAGTACATCTAGTTCGAGAGGTTCTGGG CCTCACATTTTAGTTTTAGGACCTAGGTTAGC CTCCTATGATCAGTGATCTACACTAGGTCCA

CACTCAGTAATCAGATATATTCTTCAGGCTGG GGCCAATGAGAAAATGACTTTTGGAGCTTC

AAGAAAAGGAAAATGGGTAATGAATTAAAGGACAGAC GCCAGGCAATGAACAAATACGTTTAGAATTATC

TGATAGTATATTCTTCTTGCTCACATGACGCTG CTGAGGTAAAGTACACTTTGGGTTGCATC

GCTAAAGTGAGAGCTGTAGTCTCAAGGTG

TTCCACTCTACCAAGCATGCCCAT GCAATGTGTGGAACCGAGGAAATAGAGAA

CAAATGTGTTGCACTTAGCATCCAGTAGC CATCTGTAAGTCACACTGAGTGCCCT

TAATAAAGAGCATCCCTCTGCTTTCCG

GAAATGAGGAAAGTGTTCTTCAGTCTAGAAAACG Ex7d ATGCAAATCCTTGTAGAACAGGCGAAC TTTCCATGTCTCAGGAGCCAAGTGAA

CATCCAGCACTTCCCTATGTCTATGTTTCC AGTCGTGCCACTACCCAGAAAAGAG

GACTCACATTGCATATTACTTGGGGACTAAGG CTTCTGTAACTTTAGGGTCAGGCTTGC

GTGGTCACAGTCATAGCTAAAATGGCCTA TTAGATGGCTGTCATCCCTCCTGAC

CAGGATAATGCAAAGTAGCCTAAGATTGTGG GAGAACATTTCTAAATTACTGTGAGCATTAGCATG

GCATTATTTTACCATAGCAAGCCACCTTGAG

TTCGTTCCTCCAGTTTTCTGATTGTTGTC AATGGAAAGGGTCAAAGCACAGGC

CTTTCCTTCACTCGGCCTCCTG AACTTAGATTCCAGATAGACAGGCTGGAC

AGAGATCACATTGACTTAATCTGTTTCATTCCC

GCGTATGCTACAATGAATAAGCTTCCTTACA CCTGAATGTCCTGATTGGGTTACTAAGTAAAGTC

GTTTGAACTCCCAGACCTCTTCAACCT GGGATATTACTGAATCCAGCCAATTTCCAC GCAAATTCCAGAAATGTAAGCCCATTCCTAC GCAATATTGATGGCAGTAACTGTGAAAGGAAG

GCAGATGATGGTAGGATGTGCTTAATTGGTA TCCAGTCTTTCAACAACAGATTGGTACACA CCATGTGTTTTTTCACAGTTGGTGGGA

Table S5 Primer sequences

Amplicon    Forward sequence                Reverse sequence
Xist Ex1-1  CCCGTGGCTTTAAGGGCTG             TTTGCGTTATACCGCACCAAGAA
Xist Ex1-2  ACTCATCCACCGAGCTACT             GATGCCATAAAGGCAAGAAC
Xist Ex1-3  ACTTTGCATACAGTCCTACTTTACTT      GGAAAGGAGACTTGAGAGATGATAC
Xist Ex1-4  CCTGCGTTTAACAAGTGTTGGCATA       GATCTAAAAGGCACAACTGTGGACATG
Xist Ex7-1  CTTGAGCCCAGAAATTGGAG            CGAGTTCATAAAATGCCAACG
Xist Ex7-2  AAAGTTGCTCCCCTCCTCAT            TCACACTGAGTGCCCTTTTG
Hnrnpk      TCTCCCATCAAAGGACGTGCAC          AAGAGGCAGATTCCGGGCTCT
Hnrnpu 1    TCAGAAAGAGGAAGCCCAAA            GCCACCACCTCTGTTGAACT
Hnrnpu 2    CTCCTGGGAATCGTGGTGGATATAATAGG   TGTTGGGCATTCCACCTCTGTTGTA
Ring1a      AGCAAAACGTGGGAACTGAG            TGGGACACATGAGCTCAGAA
Ring1b      ATACATAAAGACTTCAGGCAATGCCACTGT  TGGCTATGTAAATGGTGTACTGCTTCTCAC
Eed         TCGCCCAAGAATAGTCACAT            CGATGGTTAGGCGATTTGAT
Ciz1 1      TGAGACGTACAACCCCAACA            GGGACTTGCAGTGAGAAAGC
Ciz1 2      GATGACAGCACCAAAGCAGA            GGCTCTACCTGATCCTGCTG
U1          CCAGGGCGAGGCTTATCCATT           GCAGTCCCCCACTACCACAAAT
Malat1      CATGGCGGAATTGCTGGTA             CGTGCCAACAGCATAGCAGTA
βActin      CGGTTCCGATGCCCTGAGGCTCTT        CGTCACACTTCATGATGGAATTGA
Ash2l 1     CAAGCGTAAGCAGCAAGATG            CCAAAGGATAGCCATGAGGA
Ash2l 2     GGGAAAGCCTATTCCTGGAG            GTCAGCCGGTCATCAGAGAT
