TRANSCRIPTOME-WIDE INVESTIGATION OF NUCLEAR RNA-BINDING PROTEINS

by

ERIC NGUYEN

B.A., B.S., University of Washington, 2009

A thesis submitted to the

Faculty of the Graduate School of the

University of Colorado in partial fulfillment

of the requirements for the degree of

Doctor of Philosophy

Molecular Biology Program

2017

This thesis for the Doctor of Philosophy degree by

Eric Nguyen

has been approved for the

Molecular Biology Program

by

Arthur Gutierrez-Hartmann, Chair

Anthony Gerber

Thomas Evans

Patricia Ernst

Matthew Taylor

Aaron Johnson, Advisor

Date: December 15th, 2017

Nguyen, Eric (PhD, Molecular Biology Program)

Transcriptome-Wide Investigation of Nuclear RNA-Binding Proteins

Thesis directed by Assistant Professor Aaron M. Johnson

ABSTRACT

RNA-binding proteins play a number of important roles throughout the cell. To investigate these roles more closely, we adapted high-throughput techniques to characterize the activity of RNA-binding proteins across the transcriptome.

We have previously identified heterogeneous nuclear ribonucleoprotein (hnRNP) A2/B1 as a potential adaptor for interactions between the chromatin silencing complex PRC2 and the RNA HOTAIR. We used enhanced cross-linking immunoprecipitation (eCLIP) to map the complete set of direct interactions between hnRNP A2/B1 and RNA in breast cancer cells. Surprisingly, a strong A2/B1 binding site occurs in the third intron of HOTAIR, which interrupts a known RNA-RNA interaction hotspot and is retained at a higher frequency than other HOTAIR introns. In vitro eCLIP experiments suggest that A2/B1 may redistribute to exonic binding sites once this intron is spliced. A2/B1 associates with multiple lncRNAs at regions that may contribute to regulation. Finally, we performed cellular fractionation to characterize the pattern of RNA association of A2/B1 in chromatin, nucleoplasm, and cytoplasm.

We also examined the potential relevance of hnRNP A2/B1 in myogenesis. As A2/B1 has been associated with a number of muscle diseases, we performed eCLIP on A2/B1 in both undifferentiated mouse myoblasts and differentiated myotubes. We found that A2/B1 binds the 3′ UTR of transcripts in differentiated cells, and that these transcripts tended to be protein-coding. We also performed eCLIP on another protein, TDP-43, that has been shown to directly interact with A2/B1 and to be dysregulated in many of the same diseases. This experiment identified a number of exonic binding sites in myogenesis-associated transcripts, indicative of a role for TDP-43 in the nuclear export of long RNAs during myogenesis. Comparison of A2/B1- and TDP-43-bound transcripts shows some overlap, suggesting that the two proteins may act cooperatively in RNA regulation during myogenesis.

Finally, we developed a novel method to investigate heterochromatin-associated RNA called hmRIP-seq, which was designed to differentiate between RNA-enzyme interactions leading to heterochromatin formation, and those that do not. This method identified a potential heterochromatin-interacting noncoding RNA, MTRNR2L12, that may direct silencing towards repetitive elements with similar sequence.

The form and content of this abstract are approved. I recommend its publication.

Approved: Aaron M. Johnson

ACKNOWLEDGEMENTS

This work would not have been possible without the assistance of many people.

First and foremost, I would like to thank my advisor, Aaron M. Johnson, for his invaluable advice and persistence in keeping the following projects on track as each one progressed. His copious comments on every paper, poster, and abstract were invariably helpful.

I would also like to thank the graduate students with whom I have worked for the past four years, Alexis Zukowski and Maggie Balas, for their conversations and commiserations about life in the lab, and former lab members Emily Meredith and Karly Sindy for their invaluable mentorship and advice.

Successful scientific endeavors rarely happen without outside help. For assistance with the eCLIP protocol I would like to thank Gabriel Pratt and Eric van Nostrand, a chance conference encounter that became a lifeline for establishing their protocol in our lab. For assistance in imagining new frontiers for eCLIP I would like to thank Josh Wheeler and Tom Vogler for having the foresight to see how to include my work in their story.

I would like to thank my thesis committee for their advice: Arthur Gutierrez-Hartmann, Tom Evans, Anthony Gerber, Patricia Ernst, and Matt Taylor. I would also like to acknowledge the contributions of former committee members Tobias Neff and David Bentley.

I would like to thank the members of the Biochemistry and Molecular Genetics Department and Molecular Biology Program who have generously provided assistance: Nova Fong, Ryan Sheridan, Kerri York, and Monica Ransom for research assistance; Sue Brozowski and Annie Vazquez for help navigating departmental bureaucracy; and Sabrena Heilman, Michele Hwozdyk-Parsons, and Bob Sclafani for running the Molecular Biology Program.

I would like to thank the Medical Scientist Training Program for their continued support: Arthur Gutierrez-Hartmann and Angie Ribera for heading the program; Jodi Cropper, Emily Thomas, and Katie Bidus for administrative assistance; Sally Peach, Laura Hancock, Greg Kirkpatrick, Tamara Garcia, Ariel Hernandez, and Leon Zheng, the classmates with whom I entered this program six years ago; and Jingjing Zhang, Tom Vogler, Josh Wheeler, Dan Youmans, Matt Becker, Kelly Higa, Taylor Soderborg, Sarah Haeger, Jason Silver, and Mindy Szeto for their continued willingness to adventure around Denver.

I would like to thank my previous mentors for helping to guide me to this career path: Bertil Hille, Willie Swanson, Pat Navas, and Jay Hesselberth.

I would like to thank my parents, Ann and Toan Nguyen, for their continued advice and support, as well as my brother, Grant Nguyen.

Last but not least, I would like to thank Charlotte Siska for being there for me at any time of day or night.


TABLE OF CONTENTS

CHAPTER

I. INTRODUCTION

Chromatin Biology

Epigenetic Modifications Have an Effect on Gene Regulation

Polycomb Proteins Form Gene Silencing Complexes

PRC2 Activity is Modulated by Protein and RNA Cofactors

PRC2 Can Bind to Many RNAs

PRC2 and Disease

Long Noncoding RNAs and Their Effects on Chromatin

Long Noncoding RNAs: A New Class of RNA

lncRNAs Can Bind to Specific Sites in the Genome

Many lncRNAs Can Bind to PRC2

PRC2-lncRNA Interactions are Implicated in Disease

Many Proteins Bind lncRNAs and Affect Chromatin State

hnRNP A2/B1: A Multi-Faceted Protein

hnRNPs Comprise a Diverse Family of Nuclear Proteins

The Structure of hnRNP A2/B1

The Potential Roles of hnRNP A2/B1

hnRNP B1 Binds HOTAIR and Its Targets with Specificity

hnRNP B1 May Act as an RNA Matchmaker on Chromatin

hnRNP A2/B1 in Disease

Functional Interactions between RNA-Binding Proteins

TDP-43 is a Splicing Regulator that Interacts with hnRNP A2/B1

Cytoplasmic TDP-43 and hnRNP A2/B1 can be Pathologic

Scope of Thesis

II. THE RNA INTERACTOME OF HNRNP A2/B1

Introduction

Materials and Methods

Lessons Learned from HITS-CLIP and iCLIP Methods

The Enhanced CLIP (eCLIP) Method

Computational Analysis of eCLIP-seq Samples

In vitro eCLIP

Cellular Fractionation of MCF7 Cells

RNA Isolation and PCR

Results

The hnRNP B1 Exon is Well-Conserved and Expressed in Mouse and Human

Binding of the lncRNA HOTAIR by hnRNP A2/B1

Binding of hnRNP A2/B1 to Long Noncoding RNAs

The hnRNP B1 Binding Profile Differs in Each Cellular Compartment

Discussion

Conservation of the hnRNP B1 Isoform

Potential Roles of hnRNP A2/B1 on HOTAIR

Interactions between A2/B1 and Additional lncRNAs

III. ROLES OF TDP-43 AND HNRNP A2/B1 IN MUSCLE DIFFERENTIATION

Introduction

Materials and Methods

C2C12 Cells are a Model System for Myogenesis

Modifications to the eCLIP Method

Results

hnRNP A2/B1 Binds to a Variety of Myogenic in the Cytoplasm

TDP-43 Binds Coding Regions of RNAs in Both Myoblasts and Myotubes

Differences between TDP-43 and A2/B1 RNA Binding in Muscle

Discussion

IV. THE HISTONE MARK RIP-SEQ METHOD

Introduction

Materials and Methods

Cell Culture

Crosslinking of Cells to Improve RNA Recovery

Nuclear Lysis and Chromatin Fragmentation

RNA Immunoprecipitation

hmRIP-seq RNA and DNA Purification

Generating an hmRIP-seq Sequencing Library

Computational Analysis of hmRIP-seq Libraries

Validation of hmRIP-seq Candidates by RT-qPCR

Results

The hmRIP-seq Technique Can Detect Chromatin-Associated RNAs

Identifying Potential Histone Mark-Associated Transcripts

A Novel Heterochromatin-Associated RNA

Potential Piwi Protein-Interacting Transcripts

Discussion

V. DISCUSSION

REFERENCES

LIST OF TABLES

TABLE

3.1 Analysis of TDP-43 Binding Sites in C2C12 Cells

3.2 Gene Ontology Analysis of A2/B1 Binding Sites in C2C12 Cells

LIST OF FIGURES

FIGURE

1.1 Chromatin Biology

1.2 Globin Activation

1.3 Polycomb Complex Mutations

1.4 PRC2 Subunits and Protein Cofactors

1.5 Model of Interactions Between PRC2, RNA, and Cofactors

1.6 Methods to Identify RNA-Associated DNA or Protein

1.7 Examples of RNA-PRC2 Interactions

1.8 Structure of hnRNPs A1, A2, and B1

1.9 Proposed Functions of hnRNP A2/B1

1.10 Matchmaker Model for hnRNP B1 Activity

1.11 Mechanisms of hnRNP A2/B1 Mutation in Multisystem Proteinopathy

2.1 CLIP Methods Comparison

2.2 A2/B1 Antibody Tests

2.3 eCLIP Experimental Procedure

2.4 eCLIP Computational Pipeline Flowchart

2.5 Conservation of B1-Specific Exon Across Species

2.6 A2/B1 and B1 eCLIP Results in MCF7 Cells

2.7 A2/B1 and B1 MCF7 Replicate Correlation

2.8 Investigation of a Novel A2/B1 Binding Site in HOTAIR

2.9 In vitro B1 eCLIP Binding Sites

2.10 Binding of lncRNAs by A2/B1

2.11 eCLIP of MCF7 Chromatin, Nucleoplasm, and Cytoplasm

2.12 Model of B1 Interaction with HOTAIR

3.1 TDP-43 Cytoplasmic Distribution of TDP-43 in Myotubes

3.2 Model of TDP-43 Mediated Muscle Repair

3.3 TDP-43 is Correlated With Amyloid Oligomers

3.4 eCLIP TDP-43 Procedural Tests

3.5 eCLIP C2C12 A2B1 Procedural Tests

3.6 Correlation of A2/B1 eCLIP Experimental Replicates

3.7 eCLIP hnRNP A2/B1 Summary

3.8 A2/B1 C2C12 eCLIP at lncRNAs and A2/B1 Locus

3.9 A2/B1 C2C12 eCLIP of mRNAs

3.10 Correlation of TDP-43 eCLIP Experimental Replicates

3.11 eCLIP TDP-43 Summary

3.12 eCLIP TDP-43 at Previously Studied RNAs

3.13 eCLIP TDP-43 at Myogenic Transcripts

3.14 Comparison of A2B1 and TDP43 eCLIPs

3.15 Model of Protein-RNA Binding Network

4.1 hmRIP-seq Protocol Flowchart

4.2 hmRIP-seq Tests

4.3 hmRIP-seq Profiles at lncRNAs

4.4 MTRNR2L12 hmRIP-seq and Confirmatory qPCR

4.5 MTRNR2L12 Paralogs hmRIP-seq

4.6 MTRNR2L12 Paralogs Relationship and H3K27me3 ChIP-qPCR

4.7 Potential Mechanism of MTRNR2L12-Mediated Paralog Silencing

4.8 Model of Potential LINE Functions

4.9 LINE Mapping

ABBREVIATIONS

CHART        Capture Hybridization Analysis of RNA Targets
ChIRP        Chromatin Isolation by RNA Purification
eCLIP        Enhanced CLIP
FAST-iCLIP   Fully Automated and Standardized iCLIP
H3K4me3      Histone H3 Lysine 4 trimethylation; activating
H3K9me2      Histone H3 Lysine 9 dimethylation; silencing
H3K27me3     Histone H3 Lysine 27 trimethylation; silencing
H3K36me3     Histone H3 Lysine 36 trimethylation; activating
HITS-CLIP    High-throughput sequencing of RNA isolated by crosslinking IP
hmRIP        Histone mark RNA immunoprecipitation
hnRNP        Heterogeneous Nuclear Ribonucleoprotein
iCLIP        Individual nucleotide resolution UV crosslinking and IP
IDR          Irreproducible Discovery Rate
IP           Immunoprecipitation
LINE         Long Interspersed Nuclear Element
lncRNA       Long noncoding RNA
m6A          N6-methyladenosine
NUMT         Nuclear Mitochondrial DNA Segment
piRNA        piwi-interacting RNA
PRC1         Polycomb Repressive Complex 1
PRC2         Polycomb Repressive Complex 2
PTM          Post-translational modification
RAP          RNA Antisense Purification
SILAC        Stable Isotope Labeling with Amino Acids in Cell Culture
smFISH       Single-Molecule Fluorescence in situ Hybridization

CHAPTER I

INTRODUCTION

Chromatin Biology

Epigenetic Modifications Have an Effect on Gene Regulation

Genes do not exist in a vacuum. In the nucleus, the DNA that makes up the genome is surrounded by factors such as newly synthesized RNA, regulatory elements, transcription factors, and packaging proteins. The most basic of these are nucleosomes, repeating DNA-protein complexes that make up the molecular complex known as chromatin.

Chromatin is composed of repeating nucleosomes, protein octamers (comprised of dimers of histones H2A, H2B, H3, and H4) wrapped by 147 base pairs of DNA and connected by linker DNA. This structure can exist as “beads on a string” accessible to DNA-binding proteins (euchromatin), or be compacted into a condensed structure consisting of packed nucleosomes and structural proteins (heterochromatin) (Figure 1.1). Euchromatin is generally associated with active genes, at which uncondensed chromatin allows transcription machinery to bind DNA.

Euchromatic regions often contain DNase hypersensitive sites, which are particularly active loci that can be bound and degraded by the addition of the enzyme DNaseI. These sites often change during development and cellular differentiation, as reflected by differences in hypersensitive site locations in cells from different tissues at different times (Crawford et al., 2006; Thurman et al., 2012). A classic example of changing hypersensitive sites is in the locus coding for hemoglobin proteins (Li et al., 2002; Noordermeer and de Laat, 2008). Humans produce various combinations of hemoglobin proteins during development in a highly regulated fashion: embryonic hemoglobin gives way to fetal hemoglobin (comprised of two alpha and two gamma subunits), which is replaced by adult hemoglobin (comprised of two alpha and two beta subunits) after birth (Figure 1.2a). This process is regulated at upstream and downstream hypersensitive sites clustered in Locus Control Regions (Figure 1.2b), which are thought to form chromatin loops to interact with particular hemoglobin genes at different times. Deletion or mutation of these hypersensitive sites can lead to diseases such as thalassemia.

Genome-wide studies of DNase hypersensitive sites show high correlation with studies demonstrating regional enrichment of histone post-translational modifications (PTMs). Histone PTMs are reversible modifications added mainly to the N-terminal tails that extend from the globular domain of each histone. Methylation, acetylation, ubiquitination, and phosphorylation are among the modifications of specific histone residues that can recruit protein complexes, which in turn perform various actions on nearby chromatin. One of the most common histone PTMs genome-wide is methylation, which can be associated with either activation or repression of the surrounding region depending on its position in the histone tail. Trimethylation of histone H3 lysine 9 (H3K9me3) and lysine 27 (H3K27me3) are repressive histone marks; conversely, H3K4me3 appears at active transcription start sites and hypersensitive sites, while H3K36me3 is often present within the body of actively transcribed genes. Histone PTMs have been described as “epigenetic” modifications in light of their ability to regulate phenotypic changes without affecting DNA sequence.

Histone PTMs are deposited on chromatin by enzymes that catalyze their addition. The modifications are then recognized and bound by “reader” complexes that bind specifically to particular histone PTMs. Some reader complexes have the capability to perform the physical act of chromatin compaction or de-compaction. Histone PTM distribution fluctuates between cell types, can be targeted to specific loci in the genome, and can be preserved through DNA replication. Histone PTMs thus represent a reversible mechanism of gene silencing and activation that is heritable across cell divisions.

Polycomb Proteins Form Gene Silencing Complexes

Recent research has attempted to determine the mechanisms underlying the targeting of PTMs to specific sites in the genome. One such mechanism is a pathway named Polycomb. The Polycomb pathway was first identified in a Drosophila mutant that developed sex combs on all (instead of just the first) of its pairs of legs (Lewis, 1947) (Figure 1.3). Subsequent research characterized this mutation as regulating developmental patterning in flies by repressing the Bithorax protein complex under certain conditions. This in turn creates a gradient of Bithorax complex across the developing fly (Lewis, 1978). Mutations with similar phenotypes were soon discovered in a number of genes (e.g. Pc, Pcl, Scm), which are now understood to code for some of the proteins that comprise the Polycomb Repressive Complex 1 (Jürgens, 1985).

Polycomb proteins are well conserved between Drosophila and human (Levine et al., 2002), and in both species the Polycomb pathway is split between two Polycomb Repressive Complexes that perform distinct functions: PRC1 and PRC2. PRC1 binds H3K27me3 marks, then performs the physical compaction of nucleosomes that creates heterochromatin (Franke et al., 1992; Kundu et al., 2017). PRC2 is the protein complex that deposits H3K27me3 on histones.

Mammalian PRC2 is comprised of four core proteins (Figure 1.4a-d). Ezh2 is the catalytic subunit, containing a methyltransferase domain that is used to catalyze the deposition of methyl marks at H3K27 (Kuzmichev et al., 2002). There also exists a homolog of Ezh2, called Ezh1, which can combine with the other PRC2 proteins to make an alternate form of the complex (Margueron et al., 2008; Shen et al., 2008). Ezh1 and Ezh2 have some degree of functional overlap, with either protein alone being able to generate H3K27 methylation at many of the same sites; however, Ezh1 appears to be the predominant Ezh protein in differentiated tissues (Margueron et al., 2008; Xu et al., 2015), while Ezh2 appears to more effectively deposit H3K27me3 marks (Shen et al., 2008).

The proposed structure of PRC2 suggests that Suz12, the largest protein in the complex, acts as a “bridge” between RbAp48 and the other two subunits, Ezh2 and Eed (Ciferri et al., 2012). Suz12 is required for proper PRC2 function (Cao and Zhang, 2004), but can also inhibit PRC2 activity when bound to H3K4me3 or H3K36me3 marks through its VEFS domain (Schmitges et al., 2011). Curiously, Suz12 can bind DNA or RNA on its own (Beltran et al., 2016; Kirmizis et al., 2004), and is necessary for proper localization of HP1α, a protein that is involved with silencing by the histone PTM H3K9me3 (la Cruz et al., 2007). Suz12 also appears able to bind the chromatin looping protein CTCF, perhaps enabling the spread of PRC2 across the chromatin landscape (Li et al., 2008).

Eed is composed of WD40 repeats that bind to trimethylated lysines, which could aid PRC2 in propagating repressive chromatin marks from loci containing pre-existing H3K27me3 marks (Margueron et al., 2009). Although Eed does not contain any catalytic domains, it is required for virtually all H3K27me3 in vivo (Schoeftner et al., 2006; Xie et al., 2014); Eed inhibitors have recently been discovered that block the catalytic function of PRC2 as effectively as compounds that inhibit the methyltransferase domain in Ezh2 (He et al., 2017; Qi et al., 2017).

RbAp48 is a histone binding protein that, like Eed, contains a number of WD40 domains. It has been shown to bind H3/H4 dimers in vitro, which may act to stabilize the binding of chromatin by PRC2 (Murzina et al., 2008).

In Drosophila, PRC2 is found on specific sites known as Polycomb Response Elements (PREs) along with protein cofactors such as GAGA, Zeste, Pleiohomeotic, and Pipsqueak, with one or more cofactors associating with PRC2 at any given PRE (Mishra et al., 2001; Mulholland et al., 2003; Schwendemann and Lehmann, 2002). However, in humans most of these proteins are not conserved. YY1, the ortholog of Pleiohomeotic and the sole PRC2 binding protein that is conserved between mammals and invertebrates, does not co-localize with PRC2 on chromatin in humans as it does in Drosophila (Mendenhall et al., 2010). While the mechanisms of PRC2 targeting in humans remain unknown, a number of hypotheses have been proposed to explain how PRC2 is localized to specific genomic loci.

PRC2 Activity is Modulated by Protein and RNA Cofactors

A number of protein cofactors have been found to bind to PRC2, including AEBP2, Pcl1 and Jarid2 (Figure 1.4e-g). AEBP2 binds DNA nonspecifically and increases the stability and activity of PRC2 (Kim et al., 2009), while Pcl1 binds to the activating H3K36me3 mark, perhaps leading to silencing of active sites (Musselman et al., 2012). Jarid2 has received the most research attention of these cofactors since it binds both PRC2 and specific DNA sequences, making it a prime candidate to perform the role of gene targeting.

Jarid2, also known as Jumonji, is a transcriptional repressor (Kim et al., 2003) that was first identified as a key factor in embryonic stem cell differentiation and cardiac development through silencing of genes in the Notch1 pathway (Mysliwiec et al., 2011). Jarid2 is necessary for maximal PRC2 methyltransferase activity (Li et al., 2010; Peng et al., 2009; Shen et al., 2009). Jarid2-PRC2 interactions appear to be mediated by a small region of Jarid2 that binds Ezh2 (Kaneko et al., 2014a; Pasini et al., 2010; Son et al., 2013).

Jarid2 also binds RNAs, which has led to speculation that association of PRC2 and sequence-specific RNAs via Jarid2 might have an effect on H3K27me3 deposition. Jarid2 has been shown to interact with the PRC2-interacting RNA HOTAIR (Kaneko et al., 2014a), and be required for interactions between PRC2 and the RNA Xist (da Rocha et al., 2014; Zhao et al., 2008). This suggests a potential mechanism for PRC2 localization: binding to non-repressed regions for subsequent silencing is facilitated by interactions with RNAs that contain specificity for that region (Figure 1.5).

PRC2 Can Bind to Many RNAs

It was recently found that PRC2 can bind nascent RNAs across the genome. This property complicates the current model of PRC2 function, as it is unclear why PRC2, a protein complex that deposits repressive marks, would bind to the RNA of genes that are not silenced. One group of studies identified RNAs that bind Ezh2, and concluded that PRC2 is associated with promoters across the genome but does not deposit H3K27me3 if bound to a nascent RNA (i.e. it is at an active promoter) (Kaneko et al., 2014b; 2013). Another study showed that recombinant human PRC2 promiscuously binds to RNAs, even those from constitutively transcribed genes and non-human species. The term “promiscuous binding,” when used in this context, refers to proteins that bind to RNAs that do not contain an obvious protein-binding motif. In this model PRC2 binds to RNA, including nascent transcripts, then recognizes nearby histone PTMs. If those histone PTMs are H3K27me3 then PRC2 deposits additional H3K27me3 nearby; conversely, if the only nearby histone PTMs are activating then PRC2 is inhibited (Davidovich et al., 2013).
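
The decision logic of the promiscuous-binding model summarized above can be written out as a short illustrative sketch. This is a toy restatement of the prose, not an implementation drawn from the cited studies: the function name `prc2_outcome`, the `Outcome` labels, and the choice to treat a mixed neighborhood (H3K27me3 together with activating marks) as permissive for deposition are all assumptions made for illustration. The activating marks listed are the two named earlier in this chapter.

```python
from enum import Enum


class Outcome(Enum):
    DEPOSIT_H3K27ME3 = "deposit additional H3K27me3 nearby"
    INHIBITED = "PRC2 inhibited"
    UNDETERMINED = "not specified by the model as summarized"


# Marks named in this chapter; the sets themselves are illustrative.
REPRESSIVE_PRC2_MARKS = {"H3K27me3"}
ACTIVATING_MARKS = {"H3K4me3", "H3K36me3"}


def prc2_outcome(nearby_marks):
    """Toy decision rule for RNA-bound PRC2 sensing nearby histone PTMs.

    Assumption: any nearby H3K27me3 is enough to allow deposition,
    even if activating marks are also present.
    """
    marks = set(nearby_marks)
    if marks & REPRESSIVE_PRC2_MARKS:
        return Outcome.DEPOSIT_H3K27ME3
    if marks and marks <= ACTIVATING_MARKS:
        # Only activating marks are nearby.
        return Outcome.INHIBITED
    return Outcome.UNDETERMINED


if __name__ == "__main__":
    for marks in (["H3K27me3"], ["H3K4me3", "H3K36me3"], []):
        print(marks, "->", prc2_outcome(marks).value)
```

Cases the text does not address, such as a locus with no nearby marks, are deliberately returned as undetermined rather than guessed.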

It has since been found that promiscuous binding is highly dependent on experimental conditions, including RNA length and buffer conditions. Although the binding affinity of PRC2 is correlated with RNA length, length alone is not enough to fully determine affinity, leaving the potential mechanisms of promiscuous binding unclear (Davidovich et al., 2015). It has recently been suggested that PRC2 has affinity for G-rich RNA regions, and has a strong affinity for related RNA secondary structures such as folded guanine quadruplexes (Wang et al., 2017).

Yet another study concluded that PRC2 binding is less promiscuous and proposed a model in which RNA inhibits PRC2, an effect that is relieved by the binding of accessory protein Jarid2 (Cifuentes-Rojas et al., 2014). This has been supported by a more recent study that described PRC2 binding sites across the transcriptome, finding that RNA tends to be bound by PRC2 (specifically the Suz12 subunit) at exons and 3′ UTRs, an interaction that is capable of inhibiting PRC2-chromatin binding, and vice versa (Beltran et al., 2016).

PRC2 and Disease

Since PRC2 is involved in the silencing of many loci across the genome, it has also been implicated as a cause of many diseases in which those genes are inappropriately silenced or activated. For example, the FMR1 gene is subjected to PRC2-mediated silencing when an internal repeat tract reaches a certain length, leading to Fragile X syndrome (Kumari and Usdin, 2014). A similar mechanism appears to affect development of Huntington’s Disease, in which the huntingtin locus is targeted by PRC2 once an internal repeat region reaches a certain length (Seong et al., 2010). It has also been shown that PRC2 is responsible for silencing a number of neurodegenerative disease-related genes, with PRC2 deficiency leading to neuronal cell death in mice (Schimmelmann et al., 2016).

PRC2 and its subunits have also been associated with many different types of cancer. The complex is required for the proliferation of MLL-AF9 leukemia cells (Neff et al., 2012; Xie et al., 2014); in these same cells, inhibition of the interaction between Ezh2 and Eed is also able to disrupt proliferation (Kim et al., 2013b). Malignant peripheral nerve sheath tumors have been associated with PRC2 mutations (particularly in Suz12 and Eed) in patient samples (De Raedt et al., 2014; Zhang et al., 2014b). Mutations in the genes coding for histone H3 variants have been shown to modulate PRC2 activity in glioblastoma, leading to reduced amounts of H3K27me3 (Lewis et al., 2013).

Long Noncoding RNAs and Their Effects on Chromatin

Long Noncoding RNAs: A New Class of RNA

In the Central Dogma, mRNAs are intermediates in the transfer of information from DNA to protein while rRNAs and tRNAs facilitate this process. This view of RNA function was incomplete, as the years since have witnessed the discovery of many RNAs with varied structure and function: ribozymes catalyze common enzymatic reactions; snRNAs alter the sequence of other RNAs; rRNAs catalyze translation in addition to scaffolding the ribosome; miRNAs can promote targeted RNA silencing.

Many recent discoveries in RNA biology have been helped by the ENCODE project, which was tasked with a systematic examination of genomic regulatory elements, including less well-studied elements whose functions had previously been poorly described (Dunham et al., 2012; Gerstein et al., 2012; Harrow et al., 2006; Neph et al., 2012; Thurman et al., 2012). Their studies showed that over 80% of the genome contains histone PTMs, is in a DNase hypersensitive site, is bound by transcription factors, or is transcribed in one of their studied cell types (Kellis et al., 2014). Among many surprising findings, they observed pervasive transcription occurring throughout the genome, suggesting that any given cell has a far wider variety of RNA expressed at a given time than had previously been believed (Djebali et al., 2012). Although there is disagreement over how much of this pervasive transcription is noise versus functional transcription, analysis of these transcripts has led to the identification of thousands of long noncoding RNAs (Derrien et al., 2012).

Long noncoding RNAs (lncRNAs) are generally defined as RNAs longer than two hundred nucleotides with little protein-coding potential but with functional roles in cellular processes nonetheless (Derrien et al., 2012). In many respects they are similar to mRNAs: they are often transcribed by Pol II, are often polyadenylated, have chromatin signatures similar to those of active genes, and contain canonical splice sites despite often being spliced post-transcriptionally (Tilgner et al., 2012). LncRNAs have evolved rapidly, although they retain significant tissue specificity between species (Necsulea et al., 2014; Washietl et al., 2014). This rapid evolution is thought to be due in part to the influence of transposable elements (Kapusta et al., 2013).

There are several ways for lncRNAs to influence cellular processes. Most obvious are antisense RNAs, which bind mRNAs in cis to exert post-transcriptional control over their fate (Amit-Avraham et al., 2015; Villegas and Zaphiropoulos, 2015). lncRNAs can also modulate enhancer function when transcribed from enhancer loci (Li et al., 2013; Ørom et al., 2010). There are many other examples of diverse lncRNA mechanisms such as Gas5, which contains a stem-loop structure similar to a glucocorticoid response element, allowing it to inhibit the glucocorticoid receptor (Kino et al., 2010); THRIL, which forms a ribonucleoprotein complex binding the TNFα promoter to induce an immune response (Li et al., 2014); linc-MD1, which acts as a microRNA sponge during muscle development (Cesana et al., 2011); and RMST, which binds and regulates a pluripotency-associated transcription factor (Ng et al., 2013). Many lncRNAs, such as the previously-mentioned HOTAIR, Kcnq1ot1, and Xist, are known to bind to repressive chromatin modifiers as well as specific genomic loci, either in cis or in trans (Ng et al., 2011).

lncRNAs Can Bind to Specific Sites in the Genome

A number of groups have introduced methods to study the genomic localization of DNA-associating lncRNAs. The first of these, ChIRP-seq (chromatin isolation by RNA purification, Figure 1.6a), uses tiling antisense RNA oligonucleotides to capture an RNA of interest, cross-linked to chromatin, from which bound DNA can then be purified. ChIRP-seq analysis of HOTAIR confirmed the prior suggestion that HOTAIR interacts directly with the HOXD gene cluster (Rinn et al., 2007), narrowing the binding site to the intergenic region between HOXD3 and HOXD4 and demonstrating that HOTAIR binds to this site independently of PRC2 (Chu et al., 2011). ChIRP-seq has recently been modified to investigate binding partners of specific domains of transcripts, although this approach has not yet yielded any insights into HOTAIR biology (Quinn et al., 2014).

A method similar to ChIRP-seq is CHART-seq (capture hybridization analysis of RNA targets, Figure 1.6b), which uses the same general strategy but with antisense RNA oligonucleotides designed specifically against single-stranded regions of the target RNA (Simon et al., 2011). CHART has been applied to the lncRNAs Xist, NEAT1 and MALAT1, all of which are highly expressed in the nucleus and, like HOTAIR, bind to specific sites in the genome (Simon et al., 2014; West et al., 2014).

Finally, the RAP (RNA antisense purification, Figure 1.6c) method also uses

antisense RNA oligonucleotides to retrieve target transcripts, but with

oligonucleotides designed to be very long (120 nucleotides) in order to improve their

strength of hybridization. RAP has been used to examine the process by which Xist

binds to the soon-to-be-inactivated X chromosome (Engreitz et al., 2013). It was

found that Xist initially binds to specific regions on the X chromosome, then spreads

across it in correlation with three-dimensional chromosome conformation. Notably,

although Xist is able to spread independently of its PRC2-binding A-repeat domain,

that domain is required for Xist to bind active, gene-dense regions, implying that

separate domains of a lncRNA might mediate its ability to bind different regions or

types of chromatin.

Many lncRNAs Can Bind to PRC2

Several groups have attempted to characterize the RNAs associated with

PRC2 by immunoprecipitations with PRC2 components and computational

predictions based on supervised learning (Glazko et al., 2012; Guttman et al., 2012;

Khalil et al., 2009; Zhao et al., 2010). These groups identified one to three hundred lncRNAs that interact with PRC2. These identifications comprised a large proportion of lncRNAs that are expressed in the cell lines used for those experiments, suggesting that some of these lncRNAs may not have any specificity for PRC2, and may instead be binding as part of a promiscuous PRC2 binding interaction.

In light of promiscuous binding of RNA to PRC2, it seems apparent that any

immunoprecipitation experiment to find PRC2-associated RNAs would identify

promiscuous interactions in addition to specific ones. Depending on the ratio of

promiscuous to specific interactions, a majority of the previously described PRC2-

RNA interactions might have no effect on H3K27me3 deposition. In the absence of a

strong RNA primary sequence motif in transcripts bound by PRC2, any productive

interactions (RNA-PRC2 interactions that lead to the formation of heterochromatin)

are likely to instead be regulated by RNA secondary structure or bound protein

cofactors.

PRC2-lncRNA Interactions are Implicated in Disease

A number of diseases can be caused by interactions between PRC2 and

lncRNAs. Dysregulation of the RNA HOTAIR leads to inappropriate localization of

PRC2, in turn leading to inappropriate gene silencing (Tsai et al., 2010). HOTAIR is

transcribed from the HOXC locus on chromosome 12, but is known to bind PRC2

and silence a specific region between HOXD3 and HOXD4 (Rinn et al., 2007)

(Figure 1.7a). HOTAIR also binds specifically to many other genomic loci. In MDA-MB-231 breast cancer cells, HOTAIR overexpression results in a PRC2 occupancy

pattern similar to embryonic and neonatal fibroblasts, implying that HOTAIR

overexpression is able to “reprogram” PRC2 to revert cells to a more proliferative

state. These cells also demonstrated primary tumor growth and metastatic ability

(Gupta et al., 2011).

Another well-studied example of disease caused by PRC2-RNA dysfunction is

Beckwith-Wiedemann Syndrome, in which inappropriate silencing of the PRC2-binding lncRNA Kcnq1ot1 leads to activation of nearby loci, including Igf2, Cdkn1c, and H19 (Pandey et al., 2008; Zhang et al., 2014a) (Figure 1.7b). This can

lead to a number of clinical symptoms, including macroglossia, large abdominal

circumference, and Wilms’ tumor. Interestingly, inappropriate silencing of Kcnq1ot1

is itself often epigenetically driven, with about 50% of cases occurring as a result of

loss of DNA methylation in a nearby differentially methylated region. As a result,

many cases of Beckwith-Wiedemann Syndrome are mosaic, resulting in

hemihyperplasia such as uneven leg length.

A well-studied chromatin-associated lncRNA is Xist, which regulates

mammalian dosage compensation. Xist spreads from its locus across the soon-to-be-

inactivated X chromosome, binding PRC2 and promoting heterochromatin

formation across the chromosome (Engreitz et al., 2013; Simon et al., 2014) (Figure 1.7c). Xist normally acts only on the inactive X chromosome, and transplanting it to another chromosome silences that chromosome through Xist’s normal PRC2-associated mechanism (Jiang et al., 2013); conversely, Xist loss in mice leads to X chromosome reactivation and the development of cancer (Yildirim et al., 2013).

Many Proteins Bind lncRNAs and Affect Chromatin State

A number of groups have investigated proteins associated with lncRNAs using modified versions of sequence capture methods, followed by mass spectrometry to identify bound proteins. CHART-MS (CHART followed by mass spectrometry) was used to identify proteins bound to NEAT1 and MALAT1, two lncRNAs that associate with paraspeckles and nuclear speckles, respectively (West et al., 2014). As one might expect, the proteins that bind these RNAs tend to be components of paraspeckles or nuclear speckles; some bind only one of the two lncRNAs, and some bind both.

ChIRP-MS has been used to identify Xist-interacting proteins in mouse embryonic stem cells, embryonic stem cells with Xist expression induced on chromosome 11, epiblast stem cells that have undergone random X chromosome inactivation, and trophoblast stem cells that have a silenced paternal X chromosome

(Chu et al., 2015). This study identified a number of Xist-interacting proteins, which were then subdivided into groups that bind in all cells versus only differentiated cells. A “plug-and-play” model was hypothesized in which Xist might gain or lose silencing functions based on which cell type-specific proteins it binds.

RAP-MS (McHugh et al., 2015) and iDRiP (identification of direct RNA-interacting proteins) (Minajigi et al., 2015) have also been used to examine proteins that associate with Xist. These experiments identified a wide range of Xist-associated proteins, many of which are involved in or associated with chromatin regulation.

The iDRiP method also identified proteins that are associated with nuclear structure, and confirmed that Xist binding is necessary for these proteins to affect the structure of the inactive X chromosome. Another study targeting Xist-associated histone variants identified ATRX as a potential cofactor for interactions between Xist and

PRC2 (Sarma et al., 2014). Together, the ChIRP-MS, RAP-MS, and iDRiP results indicate that Xist can bind a variety of proteins and that its binding partners may change based on the overall cellular state.

Our lab has performed the only study to examine the proteins bound to HOTAIR. MS2-tagged HOTAIR was incubated with nuclear extracts from either HeLa or MDA-MB-231 cells, and the bound proteins were analyzed by SILAC (stable isotope labeling with amino acids in cell culture), comparing HOTAIR versus a control RNA (Meredith et al., 2016). This study identified a number of candidate HOTAIR-regulating proteins, including several from the hnRNP (heterogeneous nuclear ribonucleoprotein) family.

hnRNP A2/B1: A Multi-Faceted Protein

hnRNPs Comprise a Diverse Family of Nuclear Proteins

The hnRNP family consists of twenty abundant nuclear proteins that were

originally characterized based on the fact that they can be purified with nuclear RNA,

and were named based on their size (with hnRNP A1 being the smallest, and hnRNP

U being the largest). The diversity of hnRNPs is correlated with rapid expansion of

this family in vertebrates (Barbosa-Morais et al., 2006). They have been shown to

cooperatively regulate many RNA processing events, including alternative splicing

(Busch and Hertel, 2012; Gueroussov et al., 2017; Huelga et al., 2012).

Multiple hnRNPs have been identified as binding to lncRNAs. For example, hnRNP U is required for proper localization of Xist (Hasegawa et al., 2010). Human hnRNP K interacts with the Drosophila ortholog of the PRC2 subunit Eed

(Denisenko and Bomsztyk, 1997) and is required for the deposition of silencing histone marks on the inactive X chromosome, a process that is mediated by Xist

(Chu et al., 2015). Our SILAC experiment identified a strong interaction between

HOTAIR and hnRNP A2/B1.

The Structure of hnRNP A2/B1

The name hnRNP A2/B1 describes a pair of alternatively spliced isoforms that differ only by an additional 36-nucleotide exon, which adds 12 amino acids near the N-terminal end of the B1 isoform (Burd et al., 1989). Both isoforms contain two RNA recognition motifs

(RRMs) at their N-terminal end, and a glycine-rich domain (also known as a low-complexity or prion-like domain) at their C-terminal end (Figure 1.8b-c). A2/B1, like other highly abundant hnRNPs, is expected to have tens of millions of copies per nucleus, with some variation depending on cell type (Dreyfuss et al., 2002; Kamma et al., 1999).

A2/B1 is a paralog of hnRNP A1, which also contains two RRMs and a glycine-rich domain, and has also been classified as a splicing regulator (Biamonti et al., 1994) (Figure 1.8a). There is little primary sequence similarity in the glycine-rich domain but a notable amount of structural similarity, particularly in the spacing of aromatic residues (Biamonti et al., 1994). This similarity enables comparisons between the structure and function of the two paralogs.

The structure of the hnRNP A1 RNA-binding domains has been solved

(Shamoo et al., 1997; Xu et al., 1997); although there is significant sequence dissimilarity between the two RNA binding domains, their structure remains similar.

Furthermore, when joined by a short linker region (as in hnRNP A1), the RNA-binding domains orient antiparallel to one another. It has been hypothesized that this antiparallel orientation could be conducive to hnRNP A1 binding two sites on the same RNA with only one turn in between (Ding et al., 1999); alternatively, this style of binding applied to two different RNAs could promote complementary RNA-RNA interactions.

There are some examples of hnRNP A1 promoting complementary annealing in a sequence-specific manner (Figure 1.9a). This annealing can be promoted solely by a 48-residue region in the glycine-rich domain of A1 (Kumar and Wilson, 1990;

Munroe and Dong, 1992), implying that this domain might have some RNA-binding ability of its own. Although the RNA-binding domains can bind mRNA on their own, it has been shown that the glycine-rich domain adds selectivity for RNA containing the sequence UAGGGA/U (Burd and Dreyfuss, 1994). Since this region is structurally conserved in A2/B1, this selectivity might also be conserved between the two paralogs.
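To make the UAGGGA/U consensus concrete, the sketch below scans a sequence for that hexamer. It is only an illustration under stated assumptions: the function name `find_motif_sites` is hypothetical, exact matching is used in place of any binding model, and DNA input is converted to RNA for convenience.

```python
import re

# UAGGG followed by A or U, per the consensus quoted above.
# The lookahead lets overlapping occurrences be reported as separate hits.
MOTIF = re.compile(r"(?=(UAGGG[AU]))")


def find_motif_sites(sequence):
    """Return 0-based start positions of UAGGG[A/U] in a sequence.

    Accepts RNA or DNA letters; T is treated as U (an assumption made
    for convenience in this sketch).
    """
    rna = sequence.upper().replace("T", "U")
    return [match.start() for match in MOTIF.finditer(rna)]


if __name__ == "__main__":
    print(find_motif_sites("CCUAGGGAUUAGGGUCC"))
```

A presence/absence scan like this says nothing about affinity; it simply marks candidate positions that could then be compared against measured binding sites.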

The Potential Roles of hnRNP A2/B1

Analysis of hnRNP A2/B1 to date has focused on its role as an RNA processing factor. hnRNPs A1 and A2/B1 have been associated with splicing and stability of mRNAs (Mayeda and Krainer, 1992; Mayeda et al., 1994) (Figure 1.9b). An extensive examination of splicing control by the hnRNP family identified a number of splicing events that require A2/B1, as well as others that require either A2/B1 or another hnRNP (Huelga et al., 2012). This analysis also identified a preference for binding within the exons or 3ʹ UTRs of genes; the accompanying motif analysis was unable to identify a strong binding motif, but indicated a preference for UAG sequences.

While splicing appears to be the major function of A2/B1, it appears that its ability to bind RNA during splicing has been repurposed for other RNA-binding activities. For example, A2/B1 binds to telomeric (TTAGGG)n repeats (McKay and

Cooke, 1992) (Figure 1.9c), similar to the UAGGGA/U sequence that is a preferential

binding site for hnRNP A1. As a result, A2/B1 has been hypothesized to act as a potential

adapter between ssDNA and ssRNA at the telomere (Moran-Jones et al., 2005).

However, the exact nature of this role remains unknown.

A2/B1 has also been hypothesized to be a reader of N6-methyladenosine

(m6A) modifications (Alarcón et al., 2015), which are a common form of mRNA

modification (Figure 1.9d). This modification occurs at specific positions on the transcripts of many genes, often at multiple sites per transcript (Linder et al., 2015), and is thought to affect the structure of the surrounding RNA (Liu et al., 2015). These modified sites have functional consequences; for example, the m6A sites on Xist have been shown to be crucial for its role in X inactivation (Patil et al., 2016). m6A modifications are thought to help maintain mRNA stability through the recruitment of selectively binding proteins (Wang et al., 2014). A2/B1 has been shown to bind to transcripts at m6A marks, often overlapping the RGAC motif bound by METTL3, the enzyme that deposits m6A (Alarcón et al., 2015). A2/B1 also stabilizes miRNAs that are processed by METTL3, implying that A2/B1 is able to “read” m6A marks deposited by METTL3.

While this “reading” ability may not be a direct interaction, binding by A2/B1 appears to be able to mediate the downstream effects of the METTL3 pathway.

Another study computationally predicted 3ʹ UTR sequence elements most closely associated with mRNA stability, then used a number of methods to identify hnRNP A2/B1 as the most likely factor to bind these elements and stabilize the transcripts that contain them (Goodarzi et al., 2013). Interestingly, A2/B1 has also been associated with decay-related 3ʹ UTR motifs, implying that A2/B1 might have a number of effects on RNA depending on the motifs available for it to bind (Geissler et al., 2016) (Figure 1.9f).

hnRNP B1 Binds HOTAIR and Its Targets with Specificity

Although both hnRNP A2 and B1 were specifically enriched in the HOTAIR pulldown that our lab performed, the hnRNP B1 isoform displayed far greater

enrichment than A2. While the ratio of A2 to B1 varies in vivo, hnRNP A2 tends to be far more abundant than the B1 isoform (Kamma et al., 1999). This specificity is consistent across cell types, including MDA-MB-231 and MCF7 breast cancer cell

lines, which have previously been used for HOTAIR research.

The binding of HOTAIR by hnRNP B1 also has functional consequences.

Knockdown of hnRNP A2/B1 by shRNA inhibited HOTAIR-dependent invasion and

migration, as well as decreased levels of H3K27me3 at HOTAIR-dependent loci.

A2/B1 can also bind the RNA transcripts of genes regulated by HOTAIR, implying

that it could be a mediator of interactions between HOTAIR and those transcripts

(Meredith et al., 2016).

hnRNP B1 May Act as an RNA Matchmaker on Chromatin

HOTAIR and several of its target gene transcripts share a significant amount

of complementary sequence, as identified by the IntaRNA program (Busch et al.,

2008). To test whether these complementary sequences promote RNA-RNA

annealing in vitro, we used tethered RNA from JAM2, a HOTAIR target gene, as bait. This

experiment retrieved HOTAIR bound to the tethered JAM2, indicating that the

interaction between complementary sequences is relatively strong. This interaction

was disrupted by mutations in the HOTAIR interaction sequence, and strengthened

by the addition of hnRNP B1 to the experiment (Meredith et al., 2016).
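The idea of complementary sequence shared between two RNAs can be illustrated with a deliberately naive search. The sketch below is not IntaRNA and does not model accessibility, G-U wobble pairs, or hybridization energy; it only finds the longest stretch of perfect antiparallel Watson-Crick pairing between two sequences. The function names and the toy sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def longest_complementary_stretch(rna_a, rna_b):
    """Return (length, start_in_a, start_in_b) for the longest perfectly
    paired antiparallel stretch, or (0, -1, -1) if no base can pair.

    Implemented as a longest-common-substring search between rna_a and
    the reverse complement of rna_b.
    """
    a = rna_a.upper()
    rc_b = reverse_complement(rna_b)
    best = (0, -1, -1)
    prev = [0] * (len(rc_b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(rc_b) + 1)
        for j in range(1, len(rc_b) + 1):
            if a[i - 1] == rc_b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    length = curr[j]
                    # Convert the end position in rc_b back to a start in rna_b.
                    best = (length, i - length, len(rc_b) - j)
        prev = curr
    return best


if __name__ == "__main__":
    print(longest_complementary_stretch("AAGGCUCAA", "CCGAGCCCC"))
```

Mutating bases inside the reported stretch shortens it, which parallels the logic of the mutational test described above, although a real prediction would rely on an energy-based tool.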

This result led us to a model (Figure 1.10) in which hnRNP B1 first binds to

both HOTAIR and HOTAIR target gene transcripts on chromatin; this could be

mediated by the RNA recognition motifs on individual B1 proteins, or by dimers of

B1. Second, putting HOTAIR and the target transcripts in close proximity promotes

the formation of stable RNA-RNA interactions between these RNAs. Finally,

HOTAIR recruits PRC2, leading to deposition of H3K27me3 at the target gene locus.

This “matchmaker model” could fit as an intermediate step in prior models of

PRC2 activity, in which PRC2 binds to either nascent transcripts or promoters and inhibition of its activity is relieved as a result of some external stimulus. Either the

A2/B1 cofactor could serve as the stimulus, or the RNA-RNA interactions it promotes could do so. If this model is an accurate depiction of A2/B1’s activity on chromatin, it raises questions as to what other functions A2/B1 might have in gene silencing, and how those functions might be disrupted in disease.

hnRNP A2/B1 in Disease

Since hnRNP A2/B1 is associated with the stability and splicing of many mRNAs, it stands to reason that mutations in A2/B1 or A2/B1 targets would have an effect on numerous transcripts and pathways. One such change is caused by mutation of an exonic splicing silencer sequence in the GLA gene (Palhais et al.,

2016). Disruption of this sequence decreases binding of A2/B1 and leads to inclusion of a pseudoexon that contains a premature stop codon. This results in a defective protein that impairs lysosomal storage, eventually leading to Fabry Disease.

Antibodies to A2/B1 have also been detected in patients with a wide range of autoimmune diseases, such as rheumatoid arthritis, systemic lupus erythematosus, mixed connective tissue disease (Steiner et al., 1992), juvenile idiopathic arthritis

(Tomoum et al., 2009), Takayasu’s arteritis, and Behçet’s disease (Bin Cho et al.,

2012) (in the autoimmune disease field, A2/B1 is referred to by the name RA33)

(Figure 1.9e). While no mechanistic link has been discovered through which depleted

A2/B1 could cause these diseases, this correlation suggests that large quantities of

A2/B1 are released once the inflammatory process lyses cells.

Loss of A2/B1 has also been associated with cognitive diseases such as

Alzheimer’s disease (Berson et al., 2012). Curiously, overexpression or mislocalization of A2/B1 can also lead to disease. In a subset of patients with multisystem proteinopathy (a complex disease presenting with frontotemporal dementia, Paget’s disease of the bone, inclusion body myopathy, and amyotrophic lateral sclerosis), hnRNP A2/B1 mutations have been identified as causative factors (Le Ber et al., 2014). This causation, however, is closely related to another protein that has

also been classified as an hnRNP, TDP-43.

Functional Interactions between RNA-Binding Proteins

TDP-43 is a Splicing Regulator that Interacts with hnRNP A2/B1

TDP-43 was initially identified as a factor that binds the TAR DNA locus in the HIV-1 genome (Ou et al., 1995). It was soon realized that TDP-43 binds to RNA

as well, playing a key role in the correct splicing of the CFTR transcript, which codes

for a transmembrane chloride channel (Buratti et al., 2001). TDP-43 binds to TG

repeats near the 3ʹ end of exon 8 of CFTR, affecting whether or not exon 9 is spliced

into the final transcript—overexpression of TDP-43 results in an increase in exon 9

skipping, while inhibition results in increased inclusion. This is a function that has

clinical consequences, as a CFTR protein missing exon 9 is nonfunctional, leading to

cystic fibrosis.

The C-terminal end of TDP-43 associates with hnRNPs A1, A2/B1, and C

(Buratti et al., 2005). Either knockdown of A1 or A2, or inhibition of the TDP-43/A2

interaction, leads to increased inclusion of CFTR exon 9 (D’Ambrogio et al., 2009).

This interaction is evolutionarily conserved, as human TDP-43 retains the ability to bind the Drosophila orthologs of hnRNP A1 and A2/B1 (Romano et al., 2014).

Cytoplasmic TDP-43 and hnRNP A2/B1 can be Pathologic

TDP-43 has been identified as a key contributor to diseases such as frontotemporal lobar degeneration and amyotrophic lateral sclerosis. These diseases are characterized by ubiquitin-positive, tau- and α-synuclein-negative inclusions accumulating in the nucleus or cytoplasm of affected cells. These inclusions were later found to include fragments of degraded TDP-43 (Neumann et al., 2006). In cells from patients with these diseases TDP-43 redistributes from the nucleus to the cytoplasm, a process that has significant effects on transcripts that would normally be regulated by TDP-43 (Amlie-Wolf et al., 2015; Polymenidou et al., 2011).

These disease-related cytoplasmic accumulations can also be caused by a specific mutation in hnRNP A2/B1. This mutation, an aspartic acid to valine substitution at residue 290 in A2 (residue 302 in B1), occurs in the glycine-rich domain and has been shown to promote fibrillization of A2/B1 and cytoplasmic localization of both A2/B1 and TDP-43 (Kim et al., 2013a) (Figure 1.11). These effects are thought to be the molecular cause of multisystem proteinopathy.

The D290V mutation also causes changes in splicing of A2/B1 target transcripts such as the DAO gene, which produces an oxidase associated with ALS

(Martinez et al., 2016). While these changes are likely due in part to localization of

A2/B1 from the nucleus to the cytoplasm, it is also possible that the D290V mutation on its own can lead to changes in splicing and transcription regulation. Residue 290 falls within the region of A2/B1 that has previously been described to bind to TDP-43. Since that protein-protein interaction is necessary for proper splicing of the

CFTR transcript, it makes sense that a mutation in this region might affect other alternative splicing events as well.

Scope of Thesis

TDP-43 and hnRNP A2/B1 are two examples of multifunctional RNA-binding proteins that have a diverse set of roles in the cell. In this thesis, I will examine the functional significance of these proteins, as well as other chromatin-related proteins, from a transcriptomic perspective. Although research into A2/B1 function thus far has focused on its role as a splicing modulator, the studies referenced above provide evidence for other functions. In Chapter II, I will examine these potential functions through identification of novel RNA binding sites, and differences between the A2 and B1 isoforms, in breast cancer cell lines.

In Chapter III, I will describe a role for A2/B1 and TDP-43 in muscle differentiation and repair. Here, rather than just affecting splicing, we show that

A2/B1 and TDP-43 play a major role in the proper nuclear export and localization of mRNAs. To date most reports of cytoplasmic A2/B1 or TDP-43 have focused on their inclusion in disease-related cytoplasmic RNP complexes. However, we demonstrate that these complexes in fact occur normally during muscle differentiation and repair, and that A2/B1 and TDP-43 are necessary for proper completion of these processes.

Research into the role of RNA-protein complexes in chromatin regulation has mainly focused on interactions between RNAs and PRC2. In Chapter IV, I will describe a novel method for detecting RNA-histone interactions that may occur either directly or indirectly via protein complexes such as PRC2.

The goal of this thesis is to expand on aspects of specific RNA-binding proteins that, to date, have not been well studied. Transcriptome-wide approaches offer a less biased way to study these roles, as they allow for the identification and analysis of previously unknown RNA-protein interactions. The results from these experiments can then facilitate further targeted investigation focusing on the roles of specific protein-RNA complexes.

Figure 1.1: Chromatin Biology. a) Euchromatin consists of DNA bound by nucleosomes, with enough accessible DNA to allow the binding of proteins such as transcription factors (diagram labels: transcription factor, Pol II, nascent transcript). b) Electron micrograph of euchromatin. Adapted from Alberts et al. 5th edition. c) Heterochromatin is sufficiently condensed to prevent the binding of transcription factors, preventing nearby genes from being transcribed. d) Electron micrograph of heterochromatin. Adapted from Alberts et al. 5th edition.

Figure 1.2: Globin Activation. a) Timeline of globin gene expression during human development (y-axis: globin synthesis (%); x-axis: months post-conception). Embryonic globin is replaced by fetal globin early in gestation, which is replaced by adult globin after birth. Figure adapted from Noordermeer and de Laat, 2008. b) Diagram of the human globin gene loci (boxes), including upstream and downstream hypersensitive sites (3′ and 5′ HS) comprising the Locus Control Regions. Also pictured are deletions that lead to thalassemia. Figure adapted from Li et al., 2008.

Figure 1.3: Polycomb Complex Mutations. Effect of a heterozygous Polycomb mutation on sex comb development in Drosophila (panels: normal Polycomb, mutant Polycomb; figure adapted from Parrish et al. Genes and Development 2007).

Figure 1.4: Polycomb Repressive Complex 2 Subunits and Protein Cofactors. Core protein subunits of the PRC2 complex (a-d), and known cofactors (e-g). Major protein domains are annotated inside the protein diagram, with binding partners annotated above. Also pictured (h) is the structure of the core complex along with AEBP2, as predicted by cryo-EM (Ciferri et al., 2012). Panels: a) Ezh2 (residues 1-746; WD40-binding, SANT, SANT, CXC, and SET domains); b) Suz12 (1-741; WD40-binding, Zn, and VEFS domains); c) Eed (1-441; WD40 repeats); d) RbAp48 (1-425; WD40 repeats); e) Jarid2 (1-1246; JmjN, ARID, JmjC, and Zn domains); f) Pcl1 (1-559; Tudor and Zn domains); g) AEBP2 (1-517; Zn domains).

Figure 1.5: Model of Interactions Between PRC2, RNA, and Cofactors. Model of interactions between PRC2 and protein/RNA cofactors during H3K27me3 deposition. On heterochromatin: the Eed subunit binds pre-existing H3K27me3, which allows PRC2 to deposit more H3K27me3 around this region of the genome. On euchromatin: histone-binding properties of Eed and RbAp48, and the activating mark-binding property of Suz12, allow binding to active chromatin. Further binding of lncRNA and cofactors can then promote the deposition of H3K27me3.

Figure 1.6: Methods to identify RNA-associated DNA or protein. a) ChIRP-seq/MS uses tiling antisense oligonucleotides (ASOs) to retrieve RNAs of interest. b) CHART-seq uses ASOs designed specifically to single-stranded regions of the RNA of interest. c) RAP-seq/MS uses long ASOs to retrieve RNAs of interest. (Workflow labels in each panel: cross-linked complexes; biotinylated oligos; sonication and oligo retrieval; RNase.)

Figure 1.7: Examples of RNA-PRC2 interactions. a) HOTAIR binds PRC2 and localizes it in trans to specific genomic loci. b) Kcnq1ot1 binds PRC2 and regulates the expression of nearby genes in cis, including Igf2, Cdkn1c, and the lncRNA H19 (figure not to scale). c) Xist is transcribed on one X chromosome, then spreads across the entire chromosome. This process leads to silencing of the entire chromosome, creating an inactive Barr body.

Figure 1.8: Structure of hnRNPs A1, A2, and B1. a) hnRNP A1 (372 residues) contains two RNA Recognition Motifs and a Glycine-Rich domain (also described as a low complexity domain or prion-like domain). b) hnRNP A2 (341 residues) has a shorter glycine-rich domain than A1. c) hnRNP B1 (353 residues) is differentiated from A2 solely by the inclusion of an alternately-spliced 12-amino acid exon at its N-terminal end. Each diagram is labeled with the RRM, RRM, Gly-rich, and M9 regions; the D290V (A2) and D302V (B1) mutation sites are marked.

Figure 1.9: Proposed functions of hnRNP A2/B1. a) RNA-RNA annealing b) Splicing modulation c) Binding to telomeres d) Reader of m6A modifications e) Autoimmune antibody target f) Binding to 3′ UTR mRNA stability elements

Figure 1.10: Matchmaker Model for hnRNP B1 Activity. a) B1 binds to HOTAIR and HOTAIR target gene transcripts at chromatin b) Matching of RNA transcripts via B1 is promoted by B1-chromatin interaction c) Initiation of heterochromatin at target gene

Figure 1.11: Mechanisms of A2/B1 Mutation in Multisystem Proteinopathy. a) Wild-type A2/B1 (red) localizes near the nucleus (blue) in muscle cells. b) In muscle biopsy from a patient with the A2 D290V mutation, A2/B1 instead has a cytoplasmic distribution. c) Wild-type TDP-43 (red) localizes near the nucleus in a manner similar to A2/B1. d) In a different muscle biopsy from the patient in b), TDP-43 is also found in the cytoplasm despite not containing a mutation. e) In vitro, A2 with the D290V mutation is more prone to fibrillization than wild-type A2. Images adapted from Kim et al., 2011

CHAPTER II

THE RNA INTERACTOME OF HNRNP A2/B1

Introduction

While the hnRNP A2 isoform is the more common A2/B1 isoform in most cell types, we have identified the B1 isoform as having preferential binding to HOTAIR. hnRNP B1 also associates with RNA transcripts of known HOTAIR target genes, and is bound to chromosomal loci of those target genes (Meredith et al., 2016). From

these data, we have proposed a model in which hnRNP B1 can act as a matchmaker

between lncRNAs and nascent transcripts at target gene loci. Once matched, the

lncRNA and nascent transcript are able to interact via direct RNA-RNA interactions.

The lncRNA can then recruit PRC2, which catalyzes the deposition of H3K27me3, leading to the formation of heterochromatin at the target gene. This model combines two known aspects of A2/B1 activity: its cotranscriptional binding to nascent

transcripts, and its ability to promote the formation of RNA-RNA interactions.

In order to investigate the function of hnRNP B1, we have first characterized

the RNAs to which it and the A2 isoform bind. We used the recently developed eCLIP

method to examine this question in breast cancer cell lines, which have previously

been used to study the effect of HOTAIR on cancer metastasis (Gupta et al., 2011).

Our work indicates that hnRNP B1 and A2 bind many lncRNAs and mRNAs,

displaying shared and unique binding events and motifs. Interestingly, A2/B1

binding is highly enriched in a stable intronic sequence of HOTAIR that disrupts an

RNA-RNA interaction region. A2/B1-RNA interactions occur primarily on chromatin, with the small fraction of interactions that persist in the cytoplasm displaying a unique distribution within the noncoding sequences of mRNAs.

Materials and Methods

Lessons Learned from HITS-CLIP and iCLIP Methods

At the time that the lab first began investigating the feasibility of analyzing direct protein-RNA interactions, the main methods in use were based on HITS-CLIP (high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation) (Moore et al., 2014) (Figure 2.1a). HITS-CLIP uses short-wave

UV crosslinking to create covalent protein-RNA crosslinks, which are then retrieved via immunoprecipitation with antibodies to a protein of interest. The RNAs are then radiolabeled and size selected on a protein gel while still crosslinked to the protein.

The RNA is then purified, reverse transcribed, and ligated to sequencing adapters on either end.

An improved version of HITS-CLIP, known as iCLIP (individual-nucleotide resolution UV crosslinking and immunoprecipitation), retains the basic initial steps of HITS-CLIP but replaces the sequencing adapters with a system designed to create a circular template off of which the sequencing library can be amplified, improving efficiency as compared to the dual RNA ligation steps involved in HITS-CLIP

(Huppertz et al., 2014) (Figure 2.1c). Furthermore, this protocol takes advantage of the tendency of reverse transcriptase to stall at covalent crosslink sites—since one end of the cDNA will correspond to the crosslink, this method has single-nucleotide resolution of crosslink locations.
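The truncation logic can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used in this work: it assumes reads are already aligned, that coordinates are 0-based and half-open, and that the crosslinked nucleotide lies one position upstream of the read's 5′ end on the transcript strand, which is a commonly used iCLIP convention. The function names are hypothetical.

```python
from collections import Counter


def crosslink_position(start, end, strand):
    """Infer the crosslinked nucleotide from one aligned read.

    Coordinates are 0-based, half-open [start, end). The site is taken to
    be the nucleotide immediately upstream of the read's 5' end (assumed
    convention): start - 1 on the plus strand, end on the minus strand.
    """
    if strand == "+":
        return start - 1
    if strand == "-":
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def count_crosslinks(reads):
    """Tally inferred crosslink sites.

    `reads` is an iterable of (chrom, start, end, strand) tuples; the
    result maps (chrom, position, strand) to the number of supporting reads.
    """
    counts = Counter()
    for chrom, start, end, strand in reads:
        counts[(chrom, crosslink_position(start, end, strand), strand)] += 1
    return counts


if __name__ == "__main__":
    reads = [("chr12", 100, 130, "+"), ("chr12", 100, 125, "+"), ("chr12", 90, 130, "-")]
    for site, n in count_crosslinks(reads).items():
        print(site, n)
```

Reads of different lengths that share a 5′ end collapse onto the same site, which is what gives the approach its single-nucleotide resolution.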

We first attempted to use a modified version of iCLIP called FAST-iCLIP

(Fully Automated and Standardized iCLIP) (Flynn et al., 2014). This protocol was an attempt to standardize the adapters and computational pipeline of iCLIP along with reducing the number of required overnight incubations to improve the speed of the

protocol. While we were able to demonstrate efficient recovery of radiolabeled RNA using antibodies to hnRNP A2/B1 (abcam ab31645) and B1 (IBL 18941), the downstream steps of this protocol functioned at low efficiency.

The Enhanced CLIP (eCLIP) Method

We then turned to a recently-published variant of CLIP called eCLIP

(enhanced CLIP) (Van Nostrand et al., 2016) (Figure 2.1d; 2.3). The eCLIP method is based on iCLIP, but contains a number of improvements. First, ligation efficiency is improved through the addition of PEG8000 and using ssDNA instead of RNA for

one of the adapters. Second, the RNA radiolabeling steps are replaced with western blotting to determine the size of the immunoprecipitated protein, after which a specified size higher than that band is removed for RNA purification. Third, a small fraction of the sample is removed prior to immunoprecipitation to act as a size-matched input, which allows the final eCLIP samples to be normalized against the input samples.

Briefly, the eCLIP method begins with cells irradiated with 150 mJ of 254 nm

UV light. These cells can then be pelleted and stored at –80ºC. Upon thawing, these cells can either be lysed using the standard eCLIP lysis buffer (50 mM Tris-HCl pH

7.4, 100 mM NaCl, 1% NP-40, 0.1% SDS, 0.5% sodium deoxycholate, and 1x protease

inhibitor) on ice for 15 minutes, or buffers for chromatin isolation. We used a

protocol that has previously been used to prepare chromatin samples for HITS-CLIP (Kung et al., 2015), in which the crosslinked cell pellets are resuspended in

Buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM PMSF),

incubated on ice for thirty minutes, pelleted and resuspended in Buffer C (20 mM

HEPES pH 7.5, 420 mM NaCl, 15% glycerol, 1.5 mM MgCl2, 0.5 mM PMSF, 1x

protease inhibitor), incubated on ice with rotation for thirty minutes, and diluted

with final concentration 6.7 mM HEPES pH 7.5. The chromatin is then isolated via

digestion with 40U TURBO DNase for 30 minutes at 37ºC, which is quenched with

10 mM final concentration EDTA.

The lysed sample is then sonicated using a Bioruptor at the low setting for five

minutes, then treated with RNase I (diluted 1:25) and 4U TURBO DNase for five

minutes at 37ºC while shaking. The sample is then treated with RNase inhibitor and

pelleted. While this is happening, the antibody for immunoprecipitation (the amount

varying by antibody, but generally between 2.5-10 µg per sample) is combined with

Protein G Dynabeads (2 mg beads per 10 µg of antibody) rotating at room

temperature for 45 minutes, using antibodies (for A2/B1, abcam ab31645, 9 µg/IP;

for B1, IBL 18941, 2.5 µg/IP). Following washing of the antibody-bead complex, we combined that with the lysate, and performed the immunoprecipitation at 4ºC overnight with rotation.

Following the overnight immunoprecipitation, the samples were washed with

High Salt buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1 mM EDTA, 1% NP-40, 0.1%

SDS, 0.5% sodium deoxycholate) and Wash buffer (20 mM Tris-HCl pH 7.4, 10 mM

MgCl2, 0.2% Tween-20) and resuspended in 100 µL FastAP enzyme mix. Following fifteen minutes of 37ºC shaking incubation, 300 µL PNK enzyme mix was added for an additional twenty minutes. Following more wash steps, an RNA adapter is ligated onto the 3′ end. These adapters are paired and contain a short random sequence, which helps to avoid issues with cluster identification on HiSeq 2500 sequencers.

Following more washes, the samples are then separated from the Dynabeads through incubation at 70ºC for ten minutes, then split into input, experimental, and

western blot samples, then size selected on a NuPage 4-12% Bis-Tris protein gel run at 150V in 1x MOPS buffer. These gels were then transferred to either PVDF (for the western blot) or nitrocellulose membranes overnight at 30V in a Bio-Rad transfer apparatus, in transfer buffer with 20% methanol.

Following the transfer, we imaged the western blot to confirm that our protein of interest was being detected in both the input samples and the immunoprecipitation. After noting the size of the protein of interest, we then cut out strips of nitrocellulose membrane from the protein size up to 75 kD larger than that.

Single-stranded RNA-protein complexes will run about twenty kD higher per seventy nucleotides of RNA length, so this should retrieve RNAs up to about 260 nucleotides in length, or about 230 nucleotides of transcript sequence after accounting for the 3′ adapter. The membrane strips are then digested with proteinase K to release the

RNA, then purified by phenol chloroform extraction.
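The size-selection arithmetic above can be written out as a short worked example. The 20 kD per 70 nucleotides shift and the 75 kD excision window come from the text; the 30-nucleotide adapter allowance is an assumption inferred from the difference between the roughly 260 and 230 nucleotide figures quoted above, not a measured adapter length.

```python
KD_PER_STEP = 20.0   # apparent shift per step of RNA length (from the text)
NT_PER_STEP = 70.0   # nucleotides of RNA per step (from the text)
ADAPTER_NT = 30      # assumed 3' adapter allowance (inferred from 260 vs. 230 nt)


def max_rna_length_nt(window_kd):
    """Longest crosslinked RNA expected within an excision window of window_kd."""
    return window_kd / KD_PER_STEP * NT_PER_STEP


def max_transcript_nt(window_kd):
    """Longest stretch of transcript sequence after subtracting the adapter."""
    return max_rna_length_nt(window_kd) - ADAPTER_NT


longest_rna = max_rna_length_nt(75)     # 262.5 nt, i.e. about 260 nt
longest_insert = max_transcript_nt(75)  # 232.5 nt, i.e. about 230 nt
```

The same function can be used to estimate how a narrower or wider excision window would change the recovered fragment lengths.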

Finally, the RNAs are reverse transcribed using SuperScript IV, and ligated to a single stranded 3′ DNA adapter (which corresponds to the 5′ end of the original

RNA, and contains a ten-nucleotide random sequence used to remove PCR duplicates later on). With adapter sequence now at both ends of the cDNA, the library can be amplified by PCR. We used primers to create dual-indexed samples, as are used in the Illumina TruSeq HT Kit. These samples must be processed using paired-end sequencing, which we did after multiplexing our samples into about ten samples per lane, at a final total concentration of 10 nM. These were then sequenced on Illumina NextSeq or HiSeq 4000 sequencers.
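The role of the ten-nucleotide random sequence in duplicate removal can be sketched as follows. This is an illustration of the general idea rather than the duplicate-removal script distributed with eCLIP: reads are represented as hypothetical dictionaries, and two reads are treated as PCR duplicates only if they share the same chromosome, start position, strand, and random sequence.

```python
def remove_pcr_duplicates(reads):
    """Keep the first read for each (chrom, start, strand, random-mer) combination.

    Each read is a dict with the hypothetical keys 'chrom', 'start',
    'strand', and 'umi' (the ten-nucleotide random sequence).
    """
    seen = set()
    unique_reads = []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique_reads.append(read)
    return unique_reads


example_reads = [
    {"chrom": "chr12", "start": 500, "strand": "-", "umi": "ACGTACGTAC"},
    {"chrom": "chr12", "start": 500, "strand": "-", "umi": "ACGTACGTAC"},  # PCR duplicate
    {"chrom": "chr12", "start": 500, "strand": "-", "umi": "TTGCAGGATC"},  # independent molecule
]
deduplicated = remove_pcr_duplicates(example_reads)
```

Reads that map to the same position but carry different random sequences are retained, since they most likely derive from independent RNA fragments.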

Computational Analysis of eCLIP-seq Samples

One advantage of the eCLIP protocol is that it is provided along with an

extensive computational pipeline (Figure 2.4). Following sequencing and demultiplexing of the dual-indexed barcodes, the first step is to separate the paired

3′ adapter sequences. This is performed using a script provided along with the eCLIP protocol, which searches for the adapter sequences and divides the sequencing reads into separate files (to be recombined later in the pipeline). While there are always some reads that don’t contain the proper adapter sequence, a high-quality library will have the adapters in a majority of reads.

The pipeline contains a number of steps that are automated using the Genome

Analysis Toolkit (GATK) (McKenna et al., 2010). First, adapters are removed using the cutadapt program (Martin, 2011). Then, the libraries are mapped against an indexed version of the RepBase database, to filter out sequencing reads corresponding to repetitive elements. The reads that are not repetitive are then mapped to either the hg19 (human) or mm9 (mouse) genomes, and the paired 3′ adapter sequences are combined and removed.

Once these libraries are sorted and indexed, peaks of library buildup are called using the clipper program (Lovci et al., 2013), using a modified version from the Yeo lab that is more liberal with its peak calls, under the assumption that these peaks will next be compared against the input signal at the peak regions. The next script does this by calculating the IP over input signal fold change at each peak, then using that to calculate a p-value for each peak. The p-value cutoffs used in prior publications have varied, but we decided to use a cutoff of p < 10^-5.
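The input normalization step can be illustrated with a minimal sketch. The fold change is computed from read counts normalized to library size. The choice of a one-sided Fisher's exact test here is an assumption made for illustration, since the statistical test used by the pipeline script is not specified above, and the function names are hypothetical.

```python
import math


def fold_enrichment(ip_peak, ip_total, input_peak, input_total):
    """IP over input fold change at a peak, normalized to library size."""
    # Cross-multiplied so that the only floating-point operation is one division.
    return (ip_peak * input_total) / (input_peak * ip_total)


def _log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def enrichment_p_value(ip_peak, ip_total, input_peak, input_total):
    """One-sided Fisher's exact test (assumed test) for IP enrichment at a peak.

    Returns the hypergeometric probability that at least ip_peak of the
    reads falling in the peak would come from the IP library by chance.
    """
    peak_reads = ip_peak + input_peak
    all_reads = ip_total + input_total
    log_denominator = _log_comb(all_reads, peak_reads)
    p_value = 0.0
    for k in range(ip_peak, min(peak_reads, ip_total) + 1):
        log_term = (_log_comb(ip_total, k)
                    + _log_comb(input_total, peak_reads - k)
                    - log_denominator)
        p_value += math.exp(log_term)
    return min(p_value, 1.0)


P_VALUE_CUTOFF = 1e-5  # threshold used in this chapter


def is_significant(ip_peak, ip_total, input_peak, input_total):
    return enrichment_p_value(ip_peak, ip_total, input_peak, input_total) < P_VALUE_CUTOFF
```

With hypothetical counts of 50 IP reads and 5 input reads at a peak, from libraries of 10,000 reads each, the fold enrichment is 10 and the peak passes the cutoff; with 5 reads in each library, it does not.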

Scatterplots for sample comparison were generated using the eCLIP input

normalization script but comparing the library from one sample against the peak calls from another sample, thus allowing us to compare fold enrichment over input for two different samples at the same locations. Correlation measurements are

Pearson’s correlation coefficients. Additional comparisons were performed by comparing fold enrichment at peaks in each sample, then performing pairwise comparisons using the 2012 ENCODE Irreproducible Discovery Rate (IDR)

Pipeline. Peak motifs were determined using the DREME program (Bailey, 2011).
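For reference, the correlation measure used for the sample comparisons can be written out explicitly. The sketch below computes Pearson's correlation coefficient between two lists of fold enrichment values measured at the same peak locations; the input values in the example are hypothetical, and any transformation of the fold enrichments before correlation is left to the caller.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical fold enrichments for the same five peaks in two replicates.
replicate_1 = [2.0, 4.5, 8.1, 16.3, 3.2]
replicate_2 = [2.4, 4.1, 9.0, 14.8, 2.9]
r = pearson_r(replicate_1, replicate_2)
```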

In vitro eCLIP

Cloning and purification of recombinant hnRNP B1 was performed as

previously described (Meredith et al., 2016). HOTAIR and control RNA were in vitro transcribed using the MEGAscript T7 Transcription Kit, treated with TURBO DNase, and purified with the QIAGEN RNeasy Kit. At a 1:10 RNA:protein molar ratio, 1.2 µg of control or HOTAIR RNA was incubated with recombinant B1 in RNA refolding buffer (20 mM HEPES-KOH pH 7.9, 100 mM KCl, 0.2 mM EDTA pH 8.0, 20% glycerol, 0.5 mM PMSF, 0.5 mM DTT) for 20 minutes at room temperature. The mixture was UV-crosslinked twice at 250 mJ and 254 nm wavelength, with mixing by pipette in between. B1-RNA crosslinked and non-crosslinked samples were treated with RNase A for 3 minutes at 37ºC and 1200 rpm, and the digestion was then stopped with RNase inhibitor. Following this, the in vitro samples were subjected to end repair, adaptor ligation, SDS-PAGE and transfer to nitrocellulose, and the remainder of the eCLIP-seq protocol, then multiplexed and sequenced with other eCLIP-seq libraries.

Cellular Fractionation of MCF7 Cells

Cellular fractionation into cytoplasm, nucleoplasm, and chromatin subcompartments was performed as previously described (Wysocka et al., 2001). Each

fraction was raised to 1 mL final volume using Buffer D (20 mM HEPES pH 7.5, 210

mM NaCl, 7.5% glycerol, 0.75 mM MgCl2, 0.25 mM PMSF, and 1x protease inhibitor) then incubated with TURBO DNase (40U for chromatin sample, 10U for soluble samples) for 30 minutes at 37ºC, then quenched with 10 mM EDTA.

Fractions were then further fragmented by Bioruptor sonication and RNase treatment (but no additional DNase) as specified in the original eCLIP protocol, then immunoprecipitated overnight with antibody to hnRNP B1 (IBL 18941, 2.5 µg/IP). The rest of the eCLIP protocol was performed identically to other samples.

RNA Isolation and PCR

RNA was isolated with 500 µL TRIzol (Life Technologies) followed by purification by RNeasy kit (QIAGEN). Samples were DNase treated using the

TURBO DNase kit (Ambion). Two µg of each RNA sample was reverse transcribed using a cDNA High Capacity Kit (Life Technologies). cDNA was PCR amplified using

Phusion polymerase and 1 minute extension times to accurately amplify products over 1 kb in length.

Results

The hnRNP B1 Exon is Well-Conserved and Expressed in Mouse and

Human

The 36-nucleotide exon specific to hnRNP B1 is included in approximately

10% of A2/B1 transcripts in most human tissues. This B1-specific region has also been identified in the mouse and rat hnRNP A2/B1 transcript, along with transcripts

corresponding to other minor pseudogenes processed from the same locus (Hatfield

et al., 2002; Kamma et al., 1999). However, this exon is not annotated as being part

of any hnRNP A2/B1 transcript in the mouse RefSeq or Ensembl databases. An

analysis of the hnRNP A2/B1 genomic locus across several species indicates that the

B1 exon is in fact highly conserved across the eutherian lineage, with perfect

nucleotide sequence identity to human in the mouse, rat, cow, and elephant genomes

but no identifiable syntenic region to the B1 exon in the opossum genome (Figure

2.5a). This is better conservation than is displayed by the next downstream exon, which is also well-conserved but only has ~90% sequence identity between these species.
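The sequence identity comparison can be illustrated with a short sketch. The 36-nucleotide sequence below is read off the human row of the alignment in Figure 2.5a; its boundaries are an assumption inferred from the twelve translated residues (K T L E T V P L E R K K) shown above the alignment, and the mouse row is identical over the same columns. The function name is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns at which two equal-length sequences match.

    Gap characters are compared like any other character, so a gap
    opposite a nucleotide counts as a mismatch.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# B1-specific exon as read from Figure 2.5a (boundaries inferred, see above).
HUMAN_B1_EXON = "AAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAG"
MOUSE_B1_EXON = "AAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAG"

identity = percent_identity(HUMAN_B1_EXON, MOUSE_B1_EXON)
```

The same function applied to the downstream exon rows of the alignment gives the lower identity values discussed above.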

In order to confirm whether or not the B1-specific exon is transcribed in non-human species, we first performed an immunoprecipitation, with an antibody designed to be specific to the human hnRNP B1 unique N-terminal region, on whole-cell protein lysate from mouse C2C12 cells or human MCF7 cells. This IP retrieved hnRNP B1 in both cell types, but a minimal amount of hnRNP A2 (Figure 2.5b). We also confirmed that the B1-specific exon is transcribed in C2C12 cells using RT-PCR primers specific to the 5′ UTR and the downstream exon, creating an amplicon spanning the exon (Figure 2.5c).

eCLIP Reveals the Distribution of B1 and A2/B1 RNA Binding

Methods that examine direct protein-RNA interactions by UV crosslinking and immunoprecipitation (CLIP) are commonly used to identify transcripts bound to proteins of interest. Recently the eCLIP method was developed, which has the advantages of individual nucleotide resolution, requiring less library amplification, normalization of the IP sample against a size-matched input (SMInput), and

inclusion of a computational analysis pipeline (Van Nostrand et al., 2016). We

modified the protocol slightly, replacing the cell lysis step with a nuclear isolation

previously used to isolate chromatin-associated CTCF-RNA complexes (Kung et al.,

2015). We performed eCLIP using antibodies to either hnRNP B1 specifically, or

hnRNP A2/B1 in combination, using MCF7 breast adenocarcinoma cells. MCF7 cells

were chosen since much of the prior characterization of HOTAIR has been

performed in MCF7 cells (Gupta et al., 2011; Yu et al., 2017), they have been shown

to express high levels of hnRNP A2/B1, and have been well-characterized by the

ENCODE Project (Dunham et al., 2012). We also performed a parallel experiment

using MCF10A cells, in order to facilitate comparison between MCF7 cells and a non-

tumorigenic counterpart from the same organ system (Soule et al., 1990).

The eCLIP experiments for hnRNP A2/B1 and B1 produced transcriptome-

wide maps of A2/B1 and B1 binding sites. We identified peaks of sequencing read

buildup enriched over SMInput, with significant peaks being defined as those with

low p-value (p < 10^-5). In total, we identified 7,626 A2/B1 peaks significant in both

replicates, and 14,725 B1 peaks significant in both replicates. These peaks were

highly correlated between replicates (A2/B1 r = 0.87; B1 r = 0.87), but less so using

the different antibodies (r = 0.69; Figure 2.7). Comparisons between replicates also

had lower irreproducible discovery rate (IDR) scores than comparisons between IPs

(Figure 2.6a). Although most peaks were shared between A2/B1 and B1 libraries, we identified a subset of peaks specific to the B1 library (Figure 2.6b).

To further investigate the differences between A2/B1 and B1 RNA binding, we investigated the transcripts containing either A2/B1 or B1 input normalized peaks.

Of the 54,064 transcripts annotated in the RefSeq database, 2,850 contained peaks

enriched over input and conserved between replicates in either the A2/B1 or B1

experiment. A majority of these transcripts (1,472) shared peaks in both A2/B1 and

B1 experiments; however, a minority of transcripts were unique to either A2/B1

(479) or B1 (899) experiment. Fewer transcripts (412) contained binding peaks in

exons or UTRs, with a significant proportion of those (148) unique to the A2/B1

experiment. These results suggest that the A2 isoform may have more of a role in

binding mature mRNAs than the B1 isoform (Figure 2.6c).
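The transcript counts above partition into shared and experiment-specific groups, which can be checked with a brief sketch. The helper shows how such a partition could be derived from two lists of transcript identifiers (the identifiers in any example are hypothetical); the reported counts are taken directly from the text.

```python
def partition_targets(a2b1_transcripts, b1_transcripts):
    """Count transcripts bound in both experiments or in only one of them."""
    a2b1 = set(a2b1_transcripts)
    b1 = set(b1_transcripts)
    return {
        "shared": len(a2b1 & b1),
        "a2b1_only": len(a2b1 - b1),
        "b1_only": len(b1 - a2b1),
    }


# Counts reported in the text.
reported = {"shared": 1472, "a2b1_only": 479, "b1_only": 899}
total_bound = sum(reported.values())            # 2850 transcripts with peaks
fraction_shared = reported["shared"] / total_bound
fraction_of_refseq = total_bound / 54064        # share of annotated RefSeq transcripts
```

The three groups sum to the 2,850 transcripts with reproducible, input-enriched peaks, of which roughly half are shared between the two experiments.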

The distribution of input normalized peaks in both A2/B1 and B1 eCLIP

experiments roughly mirrored the genomic distribution of exons, UTRs, and introns;

however, we identified a shift in peak frequency towards regions of introns within 2

kilobases of splice junctions. Non-tumor MCF10A breast cells demonstrated a

similar profile as MCF7 cells, with a slight overrepresentation of exonic peaks as

compared to both the genomic distribution and the MCF7 eCLIP (Figure 2.6d).
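The distinction between intronic peaks near and far from splice junctions can be expressed as a simple rule. In this sketch the 2-kilobase window comes from the text, while the function name, the use of 0-based half-open intron coordinates, and the "proximal"/"distal" labels are assumptions made for illustration.

```python
PROXIMAL_WINDOW_NT = 2000  # within 2 kilobases of a splice junction


def classify_intronic_position(position, intron_start, intron_end,
                               window=PROXIMAL_WINDOW_NT):
    """Label an intronic position as proximal or distal to a splice junction.

    intron_start and intron_end are 0-based, half-open coordinates of the
    intron that contains the position.
    """
    if not intron_start <= position < intron_end:
        raise ValueError("position is not inside the intron")
    distance_to_5p = position - intron_start
    distance_to_3p = intron_end - 1 - position
    nearest_junction = min(distance_to_5p, distance_to_3p)
    return "proximal_intron" if nearest_junction < window else "distal_intron"
```

For an intron spanning positions 1,000 to 11,000, a peak at position 1,500 or 10,500 would be labeled proximal, and a peak at position 6,000 would be labeled distal.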

Motif analysis revealed a preference for G-rich sequences in hnRNP A2/B1

binding sites. Previous analyses have suggested that hnRNP A2/B1 has a preference

to bind UAGGG motifs in RNA, such as those found in telomeric RNA (McKay and

Cooke, 1992); a similar UAGG motif has been identified using iCLIP in mouse spinal cord cells (Martinez et al., 2016). However, HITS-CLIP of A2/B1 in 293T cells did not identify a similar motif (Huelga et al., 2012). Our data indicate that, although

hnRNP A2/B1 might have particular affinity for (UAGGG)n sequences, it also binds a variety of AGG-rich sequences. Motif analysis identifies a slight difference in binding preference between A2/B1 and B1. B1 displays a strong enrichment for (AGG)n motifs in both MCF7 and MCF10A cells, while A2/B1 appears to have a weaker preference for AG-rich regions (Figure 2.6e).
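The motifs discussed above can be tallied in peak sequences with a simple counting sketch. This is not the discriminative motif discovery performed by DREME; it only counts occurrences, including overlapping ones, of motifs that are already known. The function name and the example sequence are hypothetical.

```python
import re


def count_motif(sequence, motif):
    """Count occurrences of an RNA motif in a sequence, allowing overlaps.

    DNA-style input is converted to RNA (T to U) before matching.
    """
    rna = sequence.upper().replace("T", "U")
    pattern = "(?=" + re.escape(motif.upper().replace("T", "U")) + ")"
    return len(re.findall(pattern, rna))


example_peak = "CCUAGGGUAGGGAAGGAGGAGGCU"  # hypothetical peak sequence
uaggg_count = count_motif(example_peak, "UAGGG")
agg_count = count_motif(example_peak, "AGG")
```

Comparing such counts between peak sequences and matched background sequences would give a rough indication of motif enrichment.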

Binding of the lncRNA HOTAIR by hnRNP A2/B1

The eCLIP profiles of HOTAIR identified a strong hnRNP A2/B1 binding site

downstream from the previously identified PRC2 binding site (Tsai et al., 2010; Wu

et al., 2013), which is likely responsible for the input signal identified in the first

three exons (Figure 2.8a). The A2/B1 binding site is inside the third intron of

HOTAIR, a surprising finding since HOTAIR is generally thought to function in trans as a well-defined spliced transcript. Accordingly, our prior experiments identifying HOTAIR-A2/B1 interactions used in vitro transcribed, fully spliced

HOTAIR. To begin to validate this result, we designed PCR primers spanning either

intron 3 or intron 2. By RT-PCR of MCF7 total RNA, we found that intron 3 is

retained in a fraction of HOTAIR transcripts, while intron 2 is undetectable (Figure

2.8b). This raises the question as to why this particular intron is being specifically

retained, and whether or not that is related to its being bound by A2/B1.

Intron 3 is located in a region of HOTAIR that contains multiple predicted

RNA-RNA interaction sites between HOTAIR and transcripts of its target genes

(Gupta et al., 2011; Meredith et al., 2016). Interestingly, inclusion of intron 3

disrupts an RNA-RNA interaction site between the HOTAIR and JAM2 transcripts

(Figure 2.8c). Multiple RNA-RNA interaction sites between HOTAIR and HOXD

also exist immediately upstream of the intron 3 splice site, close to a UAGGG motif

that could act as an A2/B1 binding site. The existence of a strong A2/B1 binding site

so close to these predicted RNA-RNA interaction sites provides support for the

model of A2/B1 as a “matchmaker” protein, but also raises the question of whether

this role might be regulated by alternative splicing of intron 3 of HOTAIR.

In order to test how A2/B1 might bind HOTAIR in the absence of intron 3, we

performed eCLIP on in vitro spliced HOTAIR transcript cross-linked with purified recombinant hnRNP B1 (Figure 2.8d). This in vitro eCLIP experiment revealed two strong binding peaks within HOTAIR (Figure 2.8e). The first binding site is located in exon 1 of HOTAIR, which, according to the proposed secondary structure of

HOTAIR (Somarowthu et al., 2015), overlaps with a region that basepairs with the

RNA-RNA interaction hotspot of HOTAIR (Figure 2.9). The other binding site resides in exon 5, and, while not complementary to the JAM2 interaction site, is proximal to that site in three-dimensional space and may localize to it in fully folded

HOTAIR. These results suggest that A2/B1 may bind HOTAIR at intron 3 and

remain bound after the intron is spliced out in regions proximal to RNA-RNA

interactions.

Binding of hnRNP A2/B1 to Long Noncoding RNAs

We analyzed the eCLIP experiments for other lncRNAs, besides HOTAIR, that

A2/B1 interacts with. A2/B1 displays strong binding to a small fraction of lncRNAs,

catalogued in both a commonly used lncRNA database (Cabili et al., 2011) and a

recently published database of lncRNAs expressed in MCF7 cells (Sun et al., 2015)

(Figure 2.10a). For example, there are a series of four A2/B1 binding sites in Xist, the

lncRNA that contributes to dosage compensation through inactivation of one X

chromosome in mammals (Augui et al., 2011). Xist spreads in cis along with PRC2

across one X chromosome, leading to repression of nearly all genes on that

chromosome (Simon et al., 2014). The 5′ end of the mouse Xist transcript contains a

1.6-kilobase region known as RepA, which has previously been shown to recruit

proteins including PRC2 (Zhao et al., 2010), and fold into three independent

structural modules (Liu et al., 2017a). The downstream portion of this region, which is structurally conserved between mouse and human (Yen et al., 2007), contains four strong, reproducible A2/B1 binding sites in our MCF7 eCLIP dataset. These binding sites lie immediately downstream of iCLIP-derived binding sites for other RNA-binding proteins, RBM15 and RBM15b (Figure 2.10b). RBM15 and RBM15b have been shown to mediate the formation of m6A methylation on Xist (Patil et al., 2016).

HnRNP A2/B1 has been proposed to be a reader of m6A marks (Alarcón et al., 2015), though whether there is a direct physical association between the protein and modified base is not clear.

We also identified an hnRNP A2/B1 peak in the lncRNA NORAD (Figure

2.10c), which has been shown to regulate genomic stability through sequestration of

PUM1 and PUM2 proteins (Lee et al., 2016; Tichon et al., 2016), resulting in more efficient DNA repair. Our identified A2/B1 binding site is near the 3′ end of NORAD, overlapping a potential PUM2 binding site identified by eCLIP in K562 cells (Van

Nostrand et al., 2016), but not overlapping any UGURUAUA PUM2 consensus sequences. NORAD contains no introns, highlighting the fact that A2/B1 makes interactions with RNA that do not involve splicing.

Another lncRNA containing an A2/B1 peak is TUG1, which is a Notch pathway-regulated lncRNA that functions in the maintenance of stemness in glioma stem cells through recruitment of PRC2 to neuronal differentiation genes

(Katsushima et al., 2016; Khalil et al., 2009). TUG1 has also been identified as a potential biomarker in a variety of different cancers (Li et al., 2016b). The hnRNP

A2/B1 binding site appears in the intron immediately following the proposed PRC2-interacting region of TUG1 (Figure 2.10d).

The hnRNP B1 Binding Profile Differs in Each Cellular Compartment

Although hnRNP A2/B1 is predominantly found in the nucleus, we decided to

compare how its RNA-binding properties change when found in the cytoplasm. To

this end, we performed separate B1 eCLIP experiments on samples split, using a subcellular fractionation protocol (Wysocka et al., 2001), into cytoplasmic, nucleoplasmic, and chromatin-associated fractions. A similar comparison has been made for the RNA-binding protein TDP-43, for which iCLIP in the nucleus versus cytoplasm of neuronal primary cells and SH-SY5Y cells (Tollervey et al., 2011) detected a shift in binding towards the 3′ UTR of transcripts in the cytoplasmic fraction.

Our eCLIP experiment detected far more significant B1 binding peaks in the chromatin sample (28,539) than in the nucleoplasm (230) or cytoplasm (162). The majority of the nucleoplasmic and cytoplasmic binding sites were also found in the chromatin-associated sample (Figure 2.11a). Of the remaining nucleoplasm- and cytoplasm-specific binding sites, very few were in protein-coding sequence, with the vast majority found instead in proximal introns and UTRs, indicating a potential regulatory role on nuclear-exported RNA (Figure 2.11b).
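The comparison of binding sites between fractions amounts to an interval overlap test, sketched below. Peaks are represented as hypothetical (chromosome, start, end, strand) tuples with 0-based half-open coordinates, and counting any overlap of at least one nucleotide on the same strand as a shared site is an assumption made for illustration.

```python
def peaks_overlap(peak_a, peak_b):
    """True if two (chrom, start, end, strand) peaks overlap on the same strand."""
    chrom_a, start_a, end_a, strand_a = peak_a
    chrom_b, start_b, end_b, strand_b = peak_b
    return (chrom_a == chrom_b and strand_a == strand_b
            and start_a < end_b and start_b < end_a)


def split_by_overlap(query_peaks, reference_peaks):
    """Split query peaks into those found in the reference set and those unique."""
    shared, unique = [], []
    for peak in query_peaks:
        if any(peaks_overlap(peak, ref) for ref in reference_peaks):
            shared.append(peak)
        else:
            unique.append(peak)
    return shared, unique


# Hypothetical peaks from the chromatin and cytoplasmic fractions.
chromatin = [("chr17", 100, 180, "+"), ("chr2", 5000, 5060, "-")]
cytoplasm = [("chr17", 150, 210, "+"), ("chr17", 900, 950, "+")]
shared, cytoplasm_specific = split_by_overlap(cytoplasm, chromatin)
```

This pairwise comparison is adequate for the small nucleoplasmic and cytoplasmic peak sets; a sorted sweep or interval tree would be preferable for comparing two large sets.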

One example of a transcript with distinct B1-RNA interactions in the soluble,

non-chromatin fractions is the SEC14L1 transcript. While B1 binds to the SEC14L1 3′

UTR in all fractions, the signal in the soluble fractions is both stronger than and

shifted as compared to the chromatin-specific binding peaks (Figure 2.11c),

suggesting that B1 binding sites can change as a message matures, or that

localization is correlated with a specific B1-RNA interaction. We also detected

nucleoplasm/cytoplasm-enriched binding to a number of small Cajal body-specific

RNAs (scaRNAs), which are a family of small transcripts that localize to Cajal bodies,

nuclear organelles that are involved in the biogenesis of small nuclear ribonucleoproteins (snRNPs) (Figure 2.11d). In particular, scaRNAs are thought to act as guide RNAs in the modification of spliceosomal RNAs (Darzacq et al., 2002).

A splice isoform of A2/B1 missing exons 7-9, hnRNP A2*, has been shown to interact with telomerase at Cajal bodies (Wang et al., 2012).

Discussion

Our profiling of A2/B1 binding sites transcriptome-wide has led to a number of novel and interesting findings. We find that both A2 and B1 isoforms likely have some level of preferential association with many transcripts, potentially stemming from recognition of distinct motifs. The surprising manner in which A2/B1 encounters its lncRNA partner HOTAIR before HOTAIR is fully processed contributes to the model of RNA-RNA matchmaking that we have previously proposed (Meredith et al., 2016). We identified novel lncRNA interactions with

A2/B1, the positions of which are suggestive of ways that A2/B1 can contribute to known mechanisms of lncRNA activity. Finally, we describe the strong preferential association of A2/B1 with chromatin-associated RNAs rather than those in the soluble fraction of the nucleus.

Conservation of the hnRNP B1 Isoform

To date, functional studies of the hnRNP A2 and B1 isoforms have, with rare exception (Kamma et al., 1999), treated the two isoforms as a single unit. However, we have shown that in certain circumstances the B1 isoform has a distinct set of RNA binding sites, which are likely due to contributions of direct interactions between the

B1-specific N-terminal region and the target RNA. The proximity of the B1-specific

exon to the RRMs suggests that the B1-specific exon is able to regulate the activity of

the RRMs when bound to RNA.

The strong conservation of the sequence of the B1-specific exon as compared

to surrounding sequence strongly implies that it has functional significance, leading

to its being subjected to stabilizing selection across a number of species.

Interestingly, the strong conservation of the B1-specific protein sequence extends to conservation of the RNA and DNA sequence of the B1-specific exon, which also has identical sequence across eutherians. While this portion of the B1 transcript does not appear to be heavily protein-bound according to the eCLIP input libraries, it is possible that it is transiently bound by splicing factors that require a particular sequence in order to generate the ideal proportions of A2 and B1.

Potential Roles of hnRNP A2/B1 on HOTAIR

The shift in hnRNP A2/B1 binding towards proximal introns, as compared to the overall transcriptome, is indicative of the role that A2/B1 is known to play in the regulation of splicing. Our work also expands the knowledge of A2/B1 RNA interactions, with particular implications for regulation and function outside of splicing.

We identified a number of transcripts that contain hnRNP A2-specific or B1-specific binding sites, which include a number of different classes of genes. It is possible that the B1-specific exon interacts with the downstream RRMs or the RNA directly, slightly changing the preferred sequence of the bound RNA. This in turn would allow for the diversification of transcript targets that are regulated by products of a single gene.

As a known splicing regulator, it makes sense for A2/B1 to bind near exon-intron junctions, as it does in HOTAIR. In fact, such binding may be responsible for the small fraction of HOTAIR transcripts in total RNA that retain intron 3. We

suspect that binding to intron 3 may promote the formation of RNA-RNA

interactions from the region surrounding the retained intron. However, our evidence

that A2/B1 can also bind fully spliced HOTAIR suggests that A2/B1 is capable of

binding elsewhere in HOTAIR under certain circumstances.

Our in vitro eCLIP experiment suggests a model in which A2/B1 preferentially

binds to intron 3 of HOTAIR, regulating its splicing so as to stay bound. However,

under certain circumstances intron 3 is spliced out, resulting in redistribution of

A2/B1 to nearby sites on HOTAIR. The removal of intron 3, combined with A2/B1

binding nearby, facilitates association of the RNA-RNA interaction hotspot with

other target RNA transcripts (Figure 2.12). This suggests that the other functions of

A2/B1 may act in concert with its alternative splicing function in order to affect target RNAs. Such a multifunctional role has recently been suggested for a number

of transcription factors that also regulate alternative splicing (Han et al., 2017).

Interactions between A2/B1 and Additional lncRNAs

The binding of A2/B1 to NORAD, TUG1, and Xist suggests that it may promote interactions between those RNAs and other proteins. The A2/B1 binding site on

NORAD overlaps a PUM2 binding site, while on TUG1 and Xist A2/B1 binds downstream of binding sites for other proteins. We suspect that, as in the matchmaker model, A2/B1 often acts in concert with other proteins on RNA to facilitate cellular activities.

While the activities of hnRNP A2/B1 discussed thus far have been chromatin-

associated, we also identified B1-RNA interactions that occur in the nucleoplasm and

cytoplasm. These include nucleoplasm/cytoplasm-specific interactions, such as the

binding of B1 to scaRNAs that suggests localization of B1 to Cajal bodies, which we

suspect allows B1 to promote RNA-RNA interactions between scaRNAs and their

targets. We also identified B1 binding sites that are shifted depending on the cellular compartment on SEC14L1, which we hypothesize to be a reflection of how B1 activity can be regulated based on its local cellular environment.

Our transcriptome-wide study of A2/B1 RNA binding partners provides many examples demonstrating that A2/B1 has more roles than simply acting as a splicing regulator. We identify a number of cases in which A2/B1 appears to bind RNA in concert with other proteins, as well as a site in HOTAIR in which the splicing function of A2/B1 is used as a precursor to subsequent activities on the transcript. These results solidify the view that A2/B1 is a multifunctional RNA-binding protein with a diverse set of roles across the transcriptome.

[Figure 2.1 image. Panels: a) HITS-CLIP (254 nm UV), b) PAR-CLIP (365 nm UV, 4SU→C), c) iCLIP (254 nm UV), d) eCLIP (254 nm UV). Workflow labels: IP, 3′ adapter ligation, protein gel size selection; 5′ adapter ligation, reverse transcription.]

Figure 2.1: CLIP Methods Comparison. a) HITS-CLIP relies on read-through of reverse transcriptase at crosslink sites, leading to relatively lower RNA yield. b) PAR-CLIP uses photoactivatable nucleosides (e.g. 4-thiouridine) that are activated by exposure to long-wave UV at protein-RNA interaction sites. This method still relies on reverse transcriptase read-through, but allows for more accurate crosslink site identification. c) iCLIP uses cDNA circularization to avoid a reliance on read-through, instead generating a cDNA fragment that ends at the crosslink site. This allows for single-nucleotide resolution without addition of extra nucleosides. d) eCLIP uses some methodological alterations to achieve the single-nucleotide resolution of iCLIP without the circularization step, greatly increasing cDNA yield.

[Figure 2.2 image: radiolabeled blot. Lanes: CLIP B1 #1, CLIP B1 #2, M-X protein ladder, CLIP A2B1 #1, CLIP A2B1 #2. Ladder markers: 130, 93, 70 (pink), 53, 41, 30, 22 (green), 14, and 9 kD.]

Figure 2.2: A2/B1 Antibody Tests. Both abcam A2/B1 and IBL B1 antibodies effectively retrieve RNA-protein complexes, as shown by radiolabeled blot. Red lines indicate potential excision locations for CLIP experiments.

[Figure 2.3 image, panels a-i. Step labels: nuclear isolation and fragmentation; immunoprecipitation of UV-crosslinked protein-RNA complex; dephosphorylation; 3′ linker ligation; protein size selection; proteinase K RNA isolation; reverse transcription; RNA removal; adaptor ligation; PCR amplification; size selection; sequencing.]

Figure 2.3: eCLIP Experimental Procedure. Flowchart of eCLIP experimental steps.

[Figure 2.4 image: flowchart. Steps as labeled in the figure:
1. Start from a fastq file of sequencing reads, demultiplexed into different samples by the sequencing core.
2. Remove 5′ and 3′ adaptors using the cutadapt program, giving a fastq file of partially trimmed reads.
3. Remove double-ligated 3′ adaptors using the cutadapt program, giving a fastq file of trimmed reads.
4. Align and remove repetitive regions using the STAR program, giving a bam file of reads aligning to repetitive regions (regions provided by the Yeo lab) and a bam file of reads aligning to non-repetitive regions.
5. Remove PCR duplicates, giving a bam file of unique reads aligning to non-repetitive regions.
6. Sort and index the bam file, giving the final bam file of aligned reads for downstream analysis.
7. Call peaks using the clipper program, giving a file of peak regions; separately, generate bigwig files for upload to the genome browser view.
8. Compare read density in IP vs. input, giving input-normalized peaks.
9. Filter input-normalized peaks to the p-value threshold (p<10^-5), giving input-normalized peaks for analysis and for upload to the genome browser view.]

Figure 2.4: eCLIP Computational Pipeline Flowchart
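The input-normalization and filtering steps at the end of the pipeline in Figure 2.4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the one-sided Fisher's exact test, the pseudocount, and all names (`Peak`, `enrichment_p_value`, `log2_fold_enrichment`, `filter_peaks`) are hypothetical choices made for the example. Only the p < 10^-5 threshold is taken from the flowchart.

```python
import math
from typing import List, NamedTuple, Tuple


class Peak(NamedTuple):
    name: str
    ip_reads: int      # reads from the IP library that fall inside the peak
    input_reads: int   # reads from the size-matched input inside the same region


def _log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def enrichment_p_value(ip_reads: int, input_reads: int,
                       ip_total: int, input_total: int) -> float:
    """One-sided Fisher's exact test (assumed test, hypergeometric upper tail).

    Asks how likely it is to see at least `ip_reads` of the peak's reads in
    the IP library if reads were distributed between IP and input by chance.
    """
    if ip_reads > ip_total or input_reads > input_total:
        raise ValueError("peak read counts cannot exceed library totals")
    total = ip_total + input_total
    in_peak = ip_reads + input_reads
    log_denominator = _log_choose(total, ip_total)
    p = 0.0
    for k in range(ip_reads, min(in_peak, ip_total) + 1):
        log_pmf = (_log_choose(in_peak, k)
                   + _log_choose(total - in_peak, ip_total - k)
                   - log_denominator)
        p += math.exp(log_pmf)
    return min(p, 1.0)


def log2_fold_enrichment(ip_reads: int, input_reads: int,
                         ip_total: int, input_total: int,
                         pseudocount: float = 1.0) -> float:
    """log2 ratio of IP to input read density; the pseudocount is an assumption."""
    ip_density = (ip_reads + pseudocount) / ip_total
    input_density = (input_reads + pseudocount) / input_total
    return math.log2(ip_density / input_density)


def filter_peaks(peaks: List[Peak], ip_total: int, input_total: int,
                 p_threshold: float = 1e-5) -> List[Tuple[str, float, float]]:
    """Keep peaks whose enrichment over input passes the p-value threshold."""
    kept = []
    for peak in peaks:
        p = enrichment_p_value(peak.ip_reads, peak.input_reads,
                               ip_total, input_total)
        if p < p_threshold:
            fold = log2_fold_enrichment(peak.ip_reads, peak.input_reads,
                                        ip_total, input_total)
            kept.append((peak.name, fold, p))
    return kept
```

Working in log space with `math.lgamma` keeps the test usable for library sizes in the millions, where direct binomial coefficients would be impractically large.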

[Figure 2.5 image: a) Multiple sequence alignment of human, mouse, rat, cow, elephant, and opossum sequence across the B1-specific exon (protein sequence KTLETVPLERKK), the downstream intron, and exon 3 (protein sequence REKEQFRKLFIGGL). b) Blots of IP (20%) and input (1%) from MCF7 (human) and C2C12 (mouse) cells, with B1 and A2 bands between the 30 kD and 41 kD markers. c) RT-PCR products from MCF7 and C2C12 cells: B1 (156 bp) and A2 (120 bp), shown against 150 bp and 100 bp markers, with a schematic labeled 5′ UTR, exon 1, exon 2, exon 3 and 156 bp, 6 bp, 36 bp, 111 bp.]

Figure 2.5: Conservation of B1-Specific Exon Across Species a) Multiple sequence analysis of hnRNP B1 genomic sequence in human, mouse, rat, cow, elephant, and opossum reference sequences, with the B1-specific exon and next downstream exon highlighted, demonstrating a high degree of conservation of the B1-specific exon in all eutherian species. Mismatches from the human sequence are colored in red. b) Immunoprecipitation using an antibody specific to hnRNP B1 (IBL #18941) specifically retrieves hnRNP B1 in both human and mouse samples. c) RT-PCR primers surrounding B1-specific exon 2 demonstrate inclusion of the B1 exon in total RNA from both human MCF7 and mouse C2C12 cells.

[Figure 2.6 image: a) IDR versus number of significant peaks for A2/B1 (replicate A) vs. B1 (replicate A), A2/B1 (replicate A) vs. A2/B1 (replicate B), and B1 (replicate A) vs. B1 (replicate B). b) Genome browser view of Hivep3 (NM_001127714) with eCLIP A2/B1 replicate A, A2/B1 SMInput, B1 replicate A, and B1 SMInput tracks. c) Transcript overlap between A2/B1 and B1 under the headings All transcripts, Introns, and Exons/UTR (values as extracted: A2/B1 479 (25%), 447 (25%), 148 (45%); B1 899 (38%), 884 (39%), 81 (30%); 1389, 183, 1472). d) Genomic distribution of peaks across 5′ UTR, exon, proximal intron, distal intron, and 3′ UTR for hnRNP A2/B1 and hnRNP B1 binding in MCF7 and MCF10A cells. e) Motifs for A2/B1 and B1 in MCF7 and MCF10A cells.]

Figure 2.6: A2/B1 and B1 eCLIP Results in MCF7 Cells a) IDR analysis indicating lower IDR values for replicates compared against one another (A2/B1, black; B1, red) than comparison of replicates from different experiments (purple). b) Example hnRNP A2/B1 eCLIP data, demonstrating region of the transcription factor Hivep3 that contains binding sites (black) enriched specifically in hnRNP B1 (blue). Y-axis scale is normalized to reads per million. c) Analysis of RefSeq transcripts containing input normalized peaks in both replicates of A2/B1 or B1 eCLIP-seq experiments, including number of transcripts unique to either IP. d) Distribution of peaks (conserved between both replicates) in different areas of transcripts in hnRNP A2/B1 and B1 eCLIP experiments, compared to experiments performed in non-tumorigenic MCF10A cells. “Proximal introns” are defined as intronic regions within 2 kb of an exon. e) Top two identified motifs in both replicates of hnRNP A2/B1 and B1 eCLIPs in MCF7 and MCF10A cells.
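The proximal intron definition in panel d (intronic sequence within 2 kb of an exon) can be written as a small Python check. This is a sketch only: the function name, the half-open coordinate convention, and the returned labels are assumptions for illustration and are not taken from the analysis code used in this work.

```python
def classify_intronic_position(position: int, intron_start: int,
                               intron_end: int, window: int = 2000) -> str:
    """Label an intronic position as proximal or distal to the nearest exon.

    Coordinates are 0-based and half-open (assumed convention): the intron
    covers intron_start <= position < intron_end, and the flanking exons end
    at intron_start - 1 and begin at intron_end.
    """
    if not intron_start <= position < intron_end:
        raise ValueError("position is outside the intron")
    distance_to_left_exon = position - intron_start + 1
    distance_to_right_exon = intron_end - position
    nearest = min(distance_to_left_exon, distance_to_right_exon)
    return "proximal intron" if nearest <= window else "distal intron"
```

For a 10 kb intron spanning 1000 to 11000, positions 1000 through 2999 and 9000 through 10999 are labeled proximal, and everything between is labeled distal.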

[Figure 2.7 image: scatter plots for a) MCF7 A2/B1 replicates, b) MCF7 B1 replicates, and c) MCF7 IP comparison (r values as extracted: 0.87, 0.87, 0.69, 0.95, 0.89, 0.65); d) MCF10A A2/B1 replicates, e) MCF10A B1 replicates, and f) MCF10A IP comparison (r values as extracted: 0.79, 0.90, 0.85, 0.67, 0.97, 0.86); g) IDR versus number of significant peaks for A2/B1 (replicate A) vs. B1 (replicate A), A2/B1 (replicate A) vs. A2/B1 (replicate B), and B1 (replicate A) vs. B1 (replicate B).]

Figure 2.7: A2/B1 and B1 MCF7 Replicate Correlation a) Comparison of MCF7 A2/B1 eCLIP replicates displays strong correlation. b) Comparison of MCF7 B1 eCLIP replicates displays strong correlation. c) Comparison of MCF7 A2/B1 eCLIP replicate versus B1 replicate displays relatively weaker correlation. d) Comparison of MCF10A A2/B1 eCLIP replicates. e) Comparison of MCF10A B1 eCLIP replicates. f) Comparison of MCF10A A2/B1 eCLIP replicate versus B1 replicate. g) IDR analysis of MCF10A A2/B1 replicates (black), B1 replicates (red), or A2/B1 versus B1 replicates (purple).

[Figure 2.8 image: a) Genome browser views of HOTAIR (NR_003716) with eCLIP A2/B1 replicate A, replicate B, and SMInput tracks and eCLIP B1 replicate A, replicate B, and SMInput tracks; the PRC2 interaction region is annotated. b) Gel with Invitrogen 50 bp ladder, intron 3 primers, and intron 2 primers; bands labeled intron 2 retained, intron 3 retained, and introns spliced out. c) HOTAIR (NR_003716) with the eCLIP B1 replicate A track, the predicted A2/B1 UAGG binding motif, and the RNA-RNA interaction hotspot. d) Schematic: recombinant hnRNP B1 and in vitro transcribed HOTAIR or control, crosslinking, then size selection and the remainder of the eCLIP protocol. e) In vitro eCLIP tracks on HOTAIR (NR_003716): HOTAIR/B1 crosslinked, HOTAIR/B1 non-crosslinked, and anti-luc/B1 crosslinked, shown with eCLIP A2/B1 replicate A and SMInput tracks.]

Figure 2.8: Investigation of a Novel A2/B1 Binding Site in HOTAIR a) Location of an hnRNP A2/B1 eCLIP peak in an intron of HOTAIR, downstream of the proposed PRC2 binding site (visible in the input library). b) The intron between exons 3 and 4 of HOTAIR is retained in a small fraction of HOTAIR in MCF7 total RNA. c) Intron 3 disrupts an RNA-RNA interaction site between HOTAIR and JAM2, is adjacent to an interaction site between HOTAIR and HOXD regions, and is near a predicted UAGGG A2/B1 binding motif. d) Schematic of in vitro eCLIP experiment using recombinant B1 protein and in vitro transcribed HOTAIR. e) In vitro eCLIP of B1 and either HOTAIR or negative control anti-luciferase RNA identified strong binding sites in exon 1 and in exons 5-6 of HOTAIR.

[Figure 2.9 image: secondary structure of HOTAIR domain 1, labeled with B1 in vitro eCLIP peaks, the RNA-RNA interaction hotspot, and introns 1-4.]

Figure 2.9: In vitro B1 eCLIP Binding Sites Mapping of in vitro eCLIP B1 binding peaks (blue) to HOTAIR secondary structure (Somarowthu et al. 2015), showing comparison to RNA-RNA interaction hotspot (green).

[Figure 2.10 image: a) Peaks conserved between replicates (values as extracted: 7626 A2/B1 peaks; 14725 B1 peaks) compared against the Cabili et al. lncRNA database (5739 transcripts; 34 A2/B1 transcripts, 48 B1 transcripts) and the Sun et al. lncRNA database (4877 transcripts; 20 A2/B1 transcripts, 21 B1 transcripts). b) Genome browser views of Xist (NR_001564) with eCLIP A2/B1 and B1 replicate and SMInput tracks, plus iCLIP RBM15 and RBM15b tracks. c) Genome browser view of NORAD (NR_027451) with PUM2 motifs (UAURUAUA), PUM2 eCLIP peaks (K562, replicates 1 and 2), and eCLIP A2/B1 and B1 replicate and SMInput tracks. d) Genome browser view of TUG1 (NR_110492) with eCLIP A2/B1 and B1 replicate and SMInput tracks and a SINE annotation.]

Figure 2.10: Binding of lncRNAs by A2/B1 a) hnRNP A2/B1 interacts with a small number of previously identified lncRNAs. b) hnRNP A2/B1 binding sites within the RepA region of Xist. RBM15 and RBM15b iCLIP tracks published in Patil et al. 2016. c) An hnRNP A2/B1 binding site within the lncRNA NORAD. PUM2 eCLIP replicate peaks retrieved from ENCODE Project experiment accession ENCSR661ICQ (van Nostrand et al. 2016). d) An hnRNP A2/B1 binding site within an intron of the lncRNA TUG1.

[Figure 2.11 image: a) Venn diagram of binding peaks in chromatin, nucleoplasm, and cytoplasm (values as extracted: 28,316; 73; 134; 16; 15; 5; 7). b) Peak distribution across 5′ UTR, exon, proximal intron, distal intron, and 3′ UTR for MCF7 chromatin, nucleoplasm, and cytoplasm. c) Genome browser view of SEC14L1 (NM_001143998) and d) genome browser view of SCARNA10 (NR_004387), each with eCLIP MCF7 chromatin, nucleoplasm, and cytoplasm tracks and their matching input tracks.]

Figure 2.11: eCLIP of MCF7 Chromatin, Nucleoplasm, and Cytoplasm a) The vast majority of A2/B1 binding peaks were identified in the chromatin sample, with a small minority overlapping with a nucleoplasm or cytoplasm binding peak. A small fraction of binding peaks were unique to either nucleoplasm or cytoplasm. b) Compared to the chromatin sample, the nucleoplasm and cytoplasm peaks were far more likely to identify binding to either 5′ UTR or proximal intronic sequence. c) In the 3′ UTR of SEC14L1, there are two distinct binding peaks for B1. However, in nucleoplasm (purple) and cytoplasm (yellow), A2/B1 preferentially binds between the two chromatin peaks. d) A nucleoplasm-specific binding peak in the small Cajal body-associated RNA SCARNA10.

[Figure 2.12 image: schematic with panels a-c showing B1 bound to HOTAIR before and after intron 3 is spliced out.]

Figure 2.12: Model of B1 Interaction with HOTAIR a) hnRNP B1 preferentially binds HOTAIR intron 3. b) Following splicing of intron 3, B1 remains bound to nearby regions of HOTAIR. c) B1-HOTAIR binding promotes RNA-RNA interactions from the hotspot region.

CHAPTER III

ROLES OF TDP-43 AND HNRNP A2/B1 IN MUSCLE DIFFERENTIATION

Introduction

Having adapted the eCLIP method to examine hnRNP A2/B1 binding in human cells, we next used this method to explore the activity of A2/B1 and TDP-43 in mouse muscle differentiation. As mentioned previously, mutations in A2/B1 can lead to a condition known as multisystem proteinopathy, in which aggregates of A2/B1 and TDP-43 are found in the cytoplasm (Kim et al., 2013a). A2/B1 has also been implicated in a number of muscle-specific diseases, such as limb-girdle muscular dystrophy (Bengoechea et al., 2015) and oculopharyngeal muscular dystrophy (Fan et al., 2014). The A2/B1 paralog hnRNP A1 is required for proper muscle development in mice, suggesting that A2/B1 may play a similarly important role (Liu et al., 2017b).

Many A2/B1-associated muscle diseases feature mislocalized A2/B1 ribonucleoprotein (RNP) complexes, which are thought to interfere with the normal RNA binding function of A2/B1 (Fan et al., 2014; Kim et al., 2013a; Li et al., 2016a). This led us to investigate the role that A2/B1 plays during normal myogenesis, reasoning that knowledge of normal A2/B1 function would allow us to better predict the effects that mislocalization can have on RNA function.

We also examined potential roles that cytoplasmic ribonucleoprotein (RNP) complexes play in muscle cells. While cytoplasmic TDP-43 and A2/B1 complexes are generally thought to be pathologic, our collaborators discovered evidence that TDP-43 aggregates are found in normally differentiating myotubes, implying that these aggregates can exist in cells in a non-pathologic context. They also identified similar aggregates in regenerating myotubes for about seven days post-injury, suggesting that cytoplasmic TDP-43 aggregates are present during periods when muscle-specific transcripts are being actively translated (Figure 3.1).

Our collaborators developed a model to fit these findings in which RNA-binding proteins such as TDP-43 act as RNA cofactors in the nuclear export of long, structural RNA transcripts (Figure 3.2). During muscle differentiation or injury repair, molecules of TDP-43 and RNA aggregate to form RNP complexes that may aid in export of these transcripts from the nucleus. These complexes are normally degraded or returned to the nucleus at the conclusion of differentiation or repair; however, mutations that disrupt this process lead to abnormal formation or retention of these complexes in non-differentiating, quiescent cells.

In order to better understand how dysregulation of TDP-43-RNA complexes can lead to the pathologic effects seen in multisystem proteinopathy, we applied eCLIP as we had done previously with A2/B1. The TDP-43 RNA interactome has previously been investigated in neurons using CLIP-based methods. HITS-CLIP of mouse neurons identified a preference for GU-rich sites in distal introns (Polymenidou et al., 2011). Knockdown of TDP-43 in these cells resulted in downregulation or alternative splicing of RNAs containing long introns or multiple TDP-43 binding sites. Another study used iCLIP to examine TDP-43 in human SHSY5Y neuronal cells, and in primary neurons derived from either healthy patients or patients with frontotemporal dementia (Tollervey et al., 2011). The RNAs bound by TDP-43 in these cells appear similar to those bound in mouse brain: they contain GU-rich sites, are subject to alternative splicing in TDP-43 knockout, and are involved in neuronal development.

Our collaborators also found evidence that the TDP-43 aggregates form stable, higher order, amyloid oligomer-like assemblies. Not only are these assemblies visible in myotube samples run on denaturing gels (Figure 3.3a-b), they are also immunoreactive for the A11 amyloid oligomer antibody. Immunostaining of differentiating myotubes and injured cells with the A11 antibody, which is specific to amyloid oligomers, reveals a distribution nearly identical to that of TDP-43 (Figure 3.3c). Finally, powder x-ray diffraction indicates that TDP-43 assemblies contain a cross-beta structure typical of amyloid oligomers (Figure 3.3d). These results suggest that TDP-43-containing RNPs form tightly packed amyloid-like oligomers. However, the mechanisms behind this process, and the transcripts contained within the RNPs, remain unclear.

To date, research into amyloid oligomers has focused on pathologic contexts such as Alzheimer’s disease or amyloidosis, with only a few examples of non-disease-associated amyloid in humans (Berson et al., 2003; Maji et al., 2009). This study represents one of the first descriptions of non-pathogenic amyloid oligomers forming in neurons or muscle cells.

Materials and Methods

C2C12 Cells are a Model System for Myogenesis

The eCLIP protocol was performed as previously described, with minor modifications. Mouse C2C12 cells were used as a model system in which to examine muscle differentiation. These cells are grown as myoblasts, and can be induced to differentiate into myotubes through growth in low-serum media. This differentiation process has been shown to be comparable to the differentiation process of primary muscle cells (Burattini et al., 2004). While the differentiation time used varies in the literature, we differentiated our cells for eight days, confirming differentiation by microscopy prior to harvest.

Modifications to the eCLIP Method

We used a rabbit antibody to TDP-43 (Bethyl A303-223A) that has previously been used for eCLIP experiments in human K562 and HepG2 cell lines (Van Nostrand et al., 2016). In contrast to our approach with the hnRNP A2/B1 and B1 antibodies, we also used this antibody to probe the confirmatory eCLIP western blots. This antibody retrieves TDP-43-RNA complexes with far greater efficiency than an IgG control, as demonstrated by IP with radiolabeled RNA (Figure 3.4).

eCLIP data analysis was performed as previously described. Downstream analysis was performed on significant peaks that were conserved in both replicates of each cell type.

Results

hnRNP A2/B1 Binds to a Variety of Myogenic Genes in the Cytoplasm

We first performed eCLIP using antibodies to hnRNP A2/B1, as previously described, on C2C12 myoblasts and differentiated myotubes (Figure 3.5), generating transcriptome-wide maps of A2/B1 RNA binding sites. As before, we identified significant binding peaks in all replicates, with significant peaks defined as those that have a low p-value (p < 10^-5) and are detected in both biological replicates. Significant peak location and strength were highly correlated between replicates (myoblast r = 0.99; myotube r = 0.81) and showed a low IDR score. Comparison between cell types displayed markedly lower correlation (r = 0.64) and higher IDR score (Figure 3.6), implying that A2/B1 might bind to unique subsets of RNA transcripts during the course of differentiation.
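The two criteria used here, a p-value threshold applied within each replicate and a Pearson correlation of peak strength between replicates, can be sketched in Python as follows. This is an illustrative sketch, not the analysis code used in this work: the `Peak` record, the rule that peaks count as shared when they overlap by at least one base on the same strand, and the function names are all assumptions.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Peak(NamedTuple):
    chrom: str
    start: int        # 0-based, inclusive
    end: int          # 0-based, exclusive
    strand: str
    log2_fold: float  # enrichment of IP over input
    p_value: float


def overlaps(a: Peak, b: Peak) -> bool:
    """True if two peaks share at least one base on the same strand."""
    return (a.chrom == b.chrom and a.strand == b.strand
            and a.start < b.end and b.start < a.end)


def reproducible_pairs(rep_a: List[Peak], rep_b: List[Peak],
                       p_threshold: float = 1e-5) -> List[Tuple[Peak, Peak]]:
    """Pair each significant peak in replicate A with the first overlapping
    significant peak in replicate B (assumed matching rule)."""
    sig_a = [p for p in rep_a if p.p_value < p_threshold]
    sig_b = [p for p in rep_b if p.p_value < p_threshold]
    pairs = []
    for peak_a in sig_a:
        for peak_b in sig_b:
            if overlaps(peak_a, peak_b):
                pairs.append((peak_a, peak_b))
                break
    return pairs


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

The correlation would then be computed on the `log2_fold` values of the paired peaks, for example `pearson_r([a.log2_fold for a, _ in pairs], [b.log2_fold for _, b in pairs])`. The nested loop is quadratic and is only meant to make the matching rule explicit; an interval index would be the usual choice for full peak sets.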

We identified 518 significant binding peaks in 100 genes in myoblasts, and 1079 significant binding peaks in 150 genes in myotubes. These peaks were most likely to be found in the 3′ UTR of genes (Figure 3.7a) and in protein-coding genes (Figure 3.7b), implying that A2/B1 is likely to bind to mature mRNA transcripts in these cells. Gene ontology analysis of these genes did not identify enrichment for any particular biological process.

We identified binding to a number of coding and noncoding RNAs, including the long noncoding RNAs Xist and H19 (Figure 3.8a-d). Interestingly, A2/B1 also appears to bind to the 3′ UTR of its own transcript, with more binding regions detectable in myotubes, suggesting that it may be regulating its own expression and splicing during myogenesis (Figure 3.8e-f). A2/B1 also binds to a number of myogenesis-related genes, including MBNL1, TAF15, and TRIM72 (Figure 3.9).

TDP-43 Binds Coding Regions of RNAs in Both Myoblasts and Myotubes

TDP-43 eCLIP was performed similarly to the A2/B1 eCLIP, and identified 556 binding peaks across 174 genes for myoblasts, and 975 binding sites across 320 genes for myotubes, as being significantly enriched over size-matched input and shared between biological replicates. Replicates of each cell type correlated highly with each other, whereas correlation between cell types was lower. This is supported by the IDR analysis, whose profile is similar to those of both A2/B1 IDR analyses. For this experiment we also prepared eCLIP libraries using the IgG antibody for sequencing, which produced far fewer usable reads than either the input or IP libraries. Accordingly, far fewer peaks were identified in the IgG libraries, and these did not correlate well with the IP peaks by IDR analysis (Figure 3.11).

In both myoblast and myotube libraries, TDP-43 eCLIP binding peaks were more likely to be found in coding exons of protein-coding genes (Figure 3.12a-b), a surprising finding given that previous TDP-43 CLIP experiments in neurons identified binding primarily in distal intronic sequences (Polymenidou et al., 2011; Tollervey et al., 2011). A possible explanation for this is the eCLIP computational pipeline step that removes sequencing reads mapping to repetitive sequence. Much intronic sequence, especially in the mouse genome, has been classified as repetitive in RepeatMasker (Smit et al.), which would lead previous CLIP-seq analyses that do not include this step to erroneously assign reads to repetitive regions in introns. Another possibility is that this is evidence of a biological phenomenon in which TDP-43 tends to bind exonic mRNA sequence in muscle cells specifically.

Motif analysis of either cell type recapitulated the strong enrichment of (UG)n motifs described in previous TDP-43 CLIP-seq experiments (Figure 3.12c). Many TDP-43 target genes also contained UGUGU motifs, which are often found adjacent to TDP-43 binding sites. One example is TDP-43 binding to the 3′ UTR of its own RNA at a location surrounded by UGUGU motifs (Bhardwaj et al., 2013) (Figure 3.13a-b). We also detect previously identified binding to the lncRNAs NEAT1 and MALAT1, which contain numerous UGUGU motifs (Guo et al., 2015; Tollervey et al., 2011) (Figure 3.13c-f). As a negative control, we examined TDP-43 binding to the ubiquitously expressed housekeeping gene GAPDH, which only contains one UGUGU motif, and detected no input-normalized peaks (Figure 3.13g-h).
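Counting UGUGU motifs and (UG)n runs in a transcript, as in the comparison between TDP-43 targets and GAPDH, can be illustrated with a short Python sketch. The function names are hypothetical, overlapping occurrences are counted separately by assumption, and DNA input is converted to RNA for convenience; none of this is taken from the motif analysis software used in this work.

```python
import re


def to_rna(seq: str) -> str:
    """Uppercase a sequence and convert DNA (T) to RNA (U)."""
    return seq.upper().replace("T", "U")


def count_motif(seq: str, motif: str = "UGUGU") -> int:
    """Count occurrences of a motif, including overlapping ones.

    A zero-width lookahead lets matches overlap, so UGUGUGU counts as two
    UGUGU motifs (an assumed counting convention).
    """
    if not motif:
        raise ValueError("motif must not be empty")
    pattern = re.compile("(?=" + re.escape(to_rna(motif)) + ")")
    return sum(1 for _ in pattern.finditer(to_rna(seq)))


def longest_ug_repeat(seq: str) -> int:
    """Return n for the longest (UG)n run in the sequence, or 0 if none."""
    runs = [len(m.group(0)) // 2 for m in re.finditer(r"(?:UG)+", to_rna(seq))]
    return max(runs, default=0)
```

For example, `count_motif("UGUGUGU")` counts two overlapping UGUGU motifs, and `longest_ug_repeat("AUGUGCCUGUGUGA")` reports a run of three UG repeats.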

The eCLIP results confirmed our hypothesis that TDP-43 is binding to long myogenic transcripts during differentiation. In myotubes, we detect multiple TDP-43 binding peaks in the exons of genes involved in the contractile apparatus such as nebulin, titin, myosin 3, and tropomyosin C (Figure 3.14). In contrast, there was minimal binding to these transcripts in either the IP or input sample in myoblasts. The transcription of these genes is upregulated in myotubes relative to myoblasts (de Klerk et al., 2015), but the increase in TDP-43 binding is greater than the increase in transcription.

Of note, the eCLIP signal in these transcripts was predominantly exonic, which was well visualized at exon-intron junctions, and suggested that TDP-43 is predominantly binding to mature structural RNAs rather than acting as a splicing regulator. To further investigate the particular genes that display exonic binding, we filtered the overall list of significant peaks to peaks found only in exons and UTRs, and then narrowed that list to exonic/UTR peaks unique to myotubes. Gene ontology analysis of the final, unique-to-myotube list indicated strong selectivity for mRNAs associated with the sarcomere (p-value = 3.6 x 10^-23) (Mi et al., 2017) (Table 3.1).

Differences between TDP-43 and A2/B1 RNA Binding in Muscle

In C2C12 cells, hnRNP A2/B1 tends to bind to the 3′ UTR of its target transcripts, rather than exons as TDP-43 does. This differs from human MCF7 cells, in which we found that A2/B1 binds predominantly to proximal introns, implying that the RNA binding partners of A2/B1 are dependent on cell type and local environment. A2/B1 has also been shown to bind 3′ UTRs in mouse spinal cord (Martinez et al., 2016), leading us to speculate that the preference for binding to the 3′ UTR is a characteristic of mouse A2/B1, or is due to a common aspect of the cellular environment in neurons and muscle.

We had suspected that some transcripts might be bound by both A2/B1 and TDP-43 at the same time based on prior studies demonstrating interactions between the two proteins (Buratti et al., 2005; D’Ambrogio et al., 2009; Mohagheghi et al., 2016; Romano et al., 2014). However, we found that the transcripts bound by A2/B1 tend to differ from those bound by TDP-43 (Figure 3.15a-b). For both A2/B1 and TDP-43, a majority of bound transcripts are associated with one protein but not the other.
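The comparison of bound transcripts between the two proteins amounts to set operations on gene lists, which can be sketched in Python as below. The function name and the gene identifiers in the usage note are placeholders chosen for illustration; they are not the target lists from this study.

```python
from typing import Dict, Iterable, Set, Union


def compare_targets(a2b1_genes: Iterable[str],
                    tdp43_genes: Iterable[str]
                    ) -> Dict[str, Union[Set[str], float]]:
    """Split two target lists into shared and protein-specific genes and
    report the fraction of each protein's targets not bound by the other."""
    a2b1 = set(a2b1_genes)
    tdp43 = set(tdp43_genes)
    a2b1_only = a2b1 - tdp43
    tdp43_only = tdp43 - a2b1
    return {
        "shared": a2b1 & tdp43,
        "a2b1_only": a2b1_only,
        "tdp43_only": tdp43_only,
        "a2b1_fraction_unique": len(a2b1_only) / len(a2b1) if a2b1 else 0.0,
        "tdp43_fraction_unique": len(tdp43_only) / len(tdp43) if tdp43 else 0.0,
    }
```

With placeholder inputs such as `compare_targets(["gene_1", "gene_2", "gene_3"], ["gene_3", "gene_4", "gene_5", "gene_6"])`, the shared set contains only `gene_3`, and the unique fractions are 2/3 for the first list and 3/4 for the second. A fraction above 0.5 for both lists corresponds to the situation described above, in which a majority of each protein's targets are not bound by the other.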

Some transcripts bound by both TDP-43 and A2/B1 demonstrated binding to specific yet unique locations on the transcript. On the lncRNA H19, for example, both proteins gain binding sites during differentiation, but A2/B1 appears to localize to the 5′ end of the transcript, while TDP-43 binds to the 3′ end (Figure 3.15c-d). Meanwhile, on Xist, both proteins appear to bind to the RepA region at the 5′ end, but bind to different regions in the central portion of the transcript (Figure 3.15e-f).

Gene ontology analysis of the mouse transcripts that have 3′ UTR peaks in myotubes only was inconclusive. However, when performing a less stringent analysis that allowed for peaks anywhere in the transcript, we detected a slight enrichment for myogenesis-related categories along with splicing. This implies that A2/B1 might cooperate with TDP-43 in binding cytoplasmic mRNAs during myogenesis, but can additionally bind 3′ UTRs of RNAs for other purposes (Table 3.2).

Discussion

Describing the binding preferences of A2/B1 and TDP-43 during the course of muscle differentiation is a first step towards a more complete explanation of how RNA is normally processed and transported around the cell during the complex process of muscle differentiation and regeneration. Although A2/B1 and TDP-43 do not act in concert across the transcriptome as we had previously suspected, we still found evidence that they may bind to targeted transcripts in a cooperative manner. Our strong evidence that TDP-43 is involved in nuclear export of myogenic transcripts implies that the role of A2/B1 might shift depending on the cellular context, and which RNAs are transcribed and available to bind.

The massive size of RNAs bound by TDP-43 during differentiation is also remarkable. For large transcripts such as titin, which is over 100 kb in length even after splicing, export from the nucleus must be a complex process. We hypothesize that TDP-43 binds across the entire length of titin and similar transcripts, acting in a manner analogous to histones to condense the RNA and more easily transport it into the cytoplasm. RNA granules have been shown to perform important functions in a number of cellular contexts (Fatimy et al., 2016; Khong et al., 2017). While TDP-43 has previously been shown to form small RNP complexes, that role is extended by our observation of TDP-43-RNA granules in muscles.

The observed cytoplasmic distribution of TDP-43 in muscle differentiation and regeneration helps to explain why these aggregates are so common in neuromuscular degenerative disease. Cellular damage would trigger the muscle regeneration pathway, leading TDP-43 to bind long structural transcripts and travel along with them into the cytoplasm. It is possible that during disease states a downstream step in the repair pathway has failed, leaving transcript aggregates and their associated RNA stuck in the cytoplasm. Similar aggregates are also seen in traumatic brain injury, suggesting that a similar process may also be taking place in neurons (Wright et al., 2016).

Finally, the lack of co-binding of transcripts between TDP-43 and A2/B1 is an interesting observation, as it suggests that these RNA-binding proteins might have evolved to target different transcripts at different times and in different cellular states. This provides a model for how a network of RNA-binding proteins, including A2/B1, TDP-43, and other hnRNPs, might act in concert to regulate a wide variety of transcripts (Figure 3.16). This would also explain how mutations in one RNA-binding protein, such as the D290V mutation in A2/B1, could lead to disease by disrupting the delicate balance of homeostasis between these RNA-binding proteins and their associated transcripts.

[Figure 3.1 image: a) Immunofluorescence of C2C12 and primary myotubes; channels: TDP-43, MyHC, Pax7, and merge. b) TDP-43, MyHC, and merge channels with no injury, five days post-injury, and ten days post-injury.]

Figure 3.1: Cytoplasmic Distribution of TDP-43 in Myotubes a) TDP-43 (red) colocalizes with MyHC (myotube marker; green) but not Pax7 (myoblast marker; purple) in both C2C12 and primary myotubes. b) TDP-43 (purple) becomes prominent in the cytoplasm in injured myotubes, but that effect is lessened over time. Data courtesy of Parker/Olwin labs.

[Figure 3.2 image: schematic with panels a-d; labels: muscle injury; normal; with A2/B1 or TDP-43 mutation.]

Figure 3.2: Model of TDP-43-Mediated Muscle Repair a) A normal muscle myotube cell, containing multiple myonuclei. b) In response to injury, complexes of TDP-43 and myogenesis-related RNA transcripts are released from myonuclei into the cytoplasm. c) Under normal circumstances, the TDP-43 is returned to the nucleus or degraded once repair is complete. d) If hnRNP A2/B1 or TDP-43 is mutated then TDP-43 remains in the cytoplasm, leading to pathologic effects.

[Figure 3.3 image: panels include a NativePAGE immunoblot (IB: TDP-43) with markers at 1,236, 1,048, 720, 480, 242, 146, and 66 kDa and bands labeled TDP-43 (n) for n = 1, 2, 3, 4, 6, 8, 10, 12, 18, 20, 24, and 28, running from lower to higher molecular weight; TDP-43 and A11 staining in uninjured, 5 DPI, and 10 DPI muscle; CLEM images (TDP-43, DIC, merge, EM) following IP with the amyloid oligomer (A11) antibody; and a timeline of muscle regeneration and TDP-43/A11 deposition from injury (0 days) to 30-45 days.]

Figure 3.3: TDP-43 is Correlated With Amyloid Oligomers a) In myotubes, TDP-43 tends to aggregate into oligomers. b) TDP-43 aggregates are SDS-resistant when run on an SDD-AGE gel. c) TDP-43 and A11 (antibody to amyloid oligomers) display the same staining pattern in myotubes post-injury. d) By correlative light electron microscopy, amyloid oligomers retrieved with A11 antibody contain TDP-43. e) By micro-electron diffraction, TDP-43 complexes display evidence of β-strands but not mated β-sheets, suggesting an amyloid-like conformation. Data courtesy of Michael Hughes, David Eisenberg lab (UCLA). f) Model for amyloid formation during muscle regeneration. Data courtesy of Parker/Olwin labs.

[Figure 3.4 image: a) C2C12 myoblast eCLIP and b) C2C12 myotube eCLIP western blots; lanes: non-crosslinked input (0.1%), IgG input (1%), IgG IP (10%), non-crosslinked input (1%), non-crosslinked IP (10%), M-X protein marker, and crosslinked IP (10%) and input (1%) for replicates A and B; bands: IgG heavy chain (~50 kD) and TDP43 (~45 kD); eCLIP IP: Bethyl A303-223A (Lot 1), 5 µg; WB probe: Bethyl A303-223A (Lot 1), 1:2000. c) 32P-labeled IP: TDP-43 (A303-223A) from myoblasts and myotubes at low and high RNase, with IgG. d) Size selection gels for C2C12 myoblast and myotube eCLIP; lanes: input, IP (A replicate), IP (B replicate), non-crosslinked, IgG.]

Figure 3.4: eCLIP TDP-43 Procedural Tests a) Western blot displaying retrieval of TDP-43 by IP during myoblast eCLIP, as compared to non-crosslinked and IgG controls. b) Western blot of eCLIP controls for myotube experiment. c) 32P radiolabeling of eCLIP RNA followed by IP for either TDP-43 or IgG demonstrates specific retrieval of TDP-43 RNA-protein complexes. “Low” RNase is diluted 1:25 as per the eCLIP protocol, while “high” RNase is diluted 1:2. IgG sample uses 1:25 dilution. d) Size selection of eCLIP TDP-43 libraries.

[Figure image. Panels a and b: Western blots with lanes for non-crosslinked (NX), IgG, and crosslinked replicate A and B input and IP/bead samples alongside a protein ladder; hnRNP A2B1 band marked at 35 kD. Panels c and d: library size-selection gels with lanes for IP (replicates A and B), non-crosslinked, and input. Panel e: radiolabeled blot with lanes for myoblast and myotube samples at 1:25 and 1:2 RNase dilutions, plus myoblast and myotube IgG.]

Figure 3.5: eCLIP C2C12 A2/B1 Procedural Tests a) Western blot displaying retrieval of A2/B1 by IP during myoblast eCLIP, as compared to non-crosslinked and IgG controls. b) Western blot of eCLIP controls for myotube experiment. c) Size selection of eCLIP A2/B1 myoblast libraries. d) Size selection of eCLIP A2/B1 myotube libraries. e) Radiolabeled blot demonstrates recovery of RNA-protein complexes with antibody to A2/B1 in C2C12 myoblasts and myotubes as compared to IgG control.

[Figure image. Panels a (myoblast replicates) and b (myotube replicates): R values as extracted are 0.80, 0.74, 0.99, and 0.85. Panel c (myoblast vs. myotube): R = 0.63 and R = 0.64. Panel d: IDR (0.0 to 0.6) plotted against number of significant peaks (0 to 1e+05) for Myoblast (replicate A) vs. IgG, Myotube (replicate A) vs. IgG, Myoblast (replicate A) vs. Myotube (replicate A), Myoblast (replicate A) vs. Myoblast (replicate B), and Myotube (replicate A) vs. Myotube (replicate B).]

Figure 3.6: Correlation of A2/B1 eCLIP Experimental Replicates a-c) Correlation of fold enrichment between replicates of all (black) or significant (red) peaks. Values shown are Pearson’s correlation coefficient. d) Irreproducibility Discovery Rate analysis comparing above replicates along with IgG libraries, demonstrating enhanced reproducibility for replicates from similar cell types, and low reproducibility for replicates compared against IgG.

[Figure image. Panel a: distribution of peaks across transcript regions (5′ UTR, exon, proximal intron, distal intron, 3′ UTR) in myoblasts and myotubes. Panel b: distribution of peaks by transcript type (antisense, lincRNA, miRNA, misc_RNA, mt_rRNA, mt_tRNA, non_coding, nonsense_mediated_decay, processed_transcript, protein_coding, retained_intron, rRNA, snoRNA, snRNA) in myoblasts and myotubes.]

Figure 3.7: eCLIP hnRNP A2/B1 Summary a) hnRNP A2/B1 exhibits a strong preference for binding 3′ UTR sequences in both myoblasts and myotubes. b) A2/B1 binding peaks tend to be found in protein-coding genes in both myoblasts and myotubes.

[Figure image. Genome browser tracks in myoblasts (a, c, e) and myotubes (b, d, f) for Xist (NR_001463), H19 (NR_130973), and Hnrnpa2b1 (ENSMUST00000114459). Tracks: mouse-human conservation, combined peaks, eCLIP A2/B1 replicates A and B, and SMInput.]

Figure 3.8: A2/B1 C2C12 eCLIP at lncRNAs and A2/B1 Locus eCLIP results in reads per million for myoblast (a, c, e) and myotube (b, d, f) demonstrate binding to lncRNAs Xist and H19, and to 3′ UTR of hnRNP A2/B1 transcript.

[Figure image. Genome browser tracks in myoblasts (a, c, e) and myotubes (b, d, f) for Mbnl1 (NM_001253708), Taf15 (NM_027427), and Trim72 (NM_001079932). Tracks: mouse-human conservation, combined peaks, eCLIP A2/B1 replicates A and B, and SMInput.]

Figure 3.9: A2/B1 C2C12 eCLIP of mRNAs eCLIP results in reads per million for myoblast (a, c, e) and myotube (b, d, f) demonstrate binding to transcripts associated with muscle disease in myotubes.

[Figure image. Panels a (myoblast replicates) and b (myotube replicates): R values as extracted are 0.75, 0.63, 0.96, and 0.81. Panel c (myoblast vs. myotube): R = 0.56 and R = 0.46. Panel d: IDR (0.0 to 0.6) plotted against number of significant peaks (0 to 30000) for Myoblast (replicate A) vs. IgG, Myotube (replicate A) vs. IgG, Myoblast (replicate A) vs. Myotube (replicate A), Myoblast (replicate A) vs. Myoblast (replicate B), and Myotube (replicate A) vs. Myotube (replicate B).]

Figure 3.10: Correlation of TDP-43 eCLIP Experimental Replicates a-c) Correlation of fold enrichment between replicates of all (black) or significant (red) peaks. Values shown are Pearson’s correlation coefficient. d) Irreproducibility Discovery Rate analysis comparing above replicates along with IgG libraries, demonstrating enhanced reproducibility for replicates from similar cell types, and low reproducibility for replicates compared against IgG.

[Figure image. Panel a: distribution of peaks across transcript regions (5′ UTR, exon, proximal intron, distal intron, 3′ UTR) for myoblast (536 peaks) and myotube (975 peaks). Panel b: distribution by transcript type for myoblast (174 genes) and myotube (320 genes); categories: protein coding, retained intron, processed transcript, nonsense mediated decay, lincRNA, snRNA, miRNA, non coding, antisense, Mt rRNA, snoRNA, pseudogene, rRNA, misc RNA. Panel c: motifs for all peaks and for exon/UTR peaks in myoblasts and myotubes.]

Figure 3.11: eCLIP TDP-43 Summary a) TDP-43 binding peaks are most commonly found in coding exons. b) TDP-43 binding peaks are most commonly found in protein-coding genes. c) Motif analysis of TDP-43 peaks recapitulates previously-identified (UG)n motif.

[Figure image. Genome browser tracks in myoblasts (a, c, e, g) and myotubes (b, d, f, h) for TDP-43 (NM_145556), MALAT1 (NR_002847), NEAT1 (NR_131212), and GAPDH (NM_008084). Tracks: exonic UGUGU motifs, mouse-human conservation, combined peaks, eCLIP TDP43 replicates A and B, SMInput, and eCLIP IgG (raw reads).]

Figure 3.12: TDP-43 eCLIP at Previously Studied RNAs eCLIP results in reads per million (except for IgG, shown as raw reads) for myoblast (a, c, e, g) and myotube (b, d, f, h) demonstrating previously-described binding to TDP-43 3′ UTR, NEAT1, and MALAT1, and minimal binding to negative control transcript GAPDH. Scale between myoblast and myotube may differ.

[Figure image. Genome browser tracks in myoblasts (a, c, e, g) and myotubes (b, d, f, h) for Nebulin (NM_010889), Titin (NM_011652), MYH3 (NM_001099635), and TNNC1 (NM_009393). Tracks: exonic UGUGU motifs, mouse-human conservation, combined peaks, eCLIP TDP43 replicates A and B, SMInput, and eCLIP IgG (raw reads).]

Figure 3.13: eCLIP TDP-43 at Myogenic Transcripts eCLIP results in reads per million (except for IgG, shown as raw reads) for myoblast (a, c, e, g) and myotube (b, d, f, h) at myogenesis-associated transcripts demonstrate increased binding in myotubes.

[Figure image. Panels a and b: Venn diagrams of transcripts bound by A2/B1 and TDP-43 (values as extracted: 66, 99, 34, 51, 161, 328). Panels c to f: genome browser tracks for H19 (NR_130973) and Xist (NR_001463), with TDP-43 tracks (exonic UGUGU motifs, mouse-human conservation, combined peaks, replicates A and B, SMInput, IgG raw reads) above A2/B1 tracks (combined peaks, replicates A and B, SMInput).]

Figure 3.14: Comparison of A2/B1 and TDP-43 eCLIPs a) A minority of transcripts bound by either TDP-43 or A2/B1 are bound by both proteins in myoblasts. b) A minority of transcripts bound by either TDP-43 or A2/B1 are bound by both proteins in myotubes. c) eCLIP of the lncRNA H19 with both TDP-43 (top) and A2/B1 (bottom) in myoblasts identifies a shared binding site as well as unique binding sites. d) eCLIP of H19 in myotubes identifies stronger A2/B1 binding to the 5′ end of the transcript, while TDP-43 binds more strongly to the 3′ end. e) eCLIP of the lncRNA Xist in myoblasts identifies strong binding by both proteins to the 5′ end, but unique binding sites in the middle of the transcript. f) eCLIP of Xist in myotubes identifies a similar binding pattern as in myoblasts.

[Figure image. Schematic with proteins labeled A2/B1 and TDP-43.]

Figure 3.15: Model of Protein-RNA Binding Network Model of ways in which A2/B1 and TDP-43 might cooperatively regulate RNA transcripts, including binding of distinct transcripts, binding of different sites on the same transcript, and binding of both proteins to the same site on a transcript. Other proteins may also be involved.

Table 3.1: Gene Ontology Analysis of TDP-43 Binding Sites in C2C12 Cells RefSeq IDs corresponding to genes with exonic/UTR TDP-43 eCLIP peaks unique to myotube were submitted to the Gene Ontology tool (http://geneontology.org). Biological categories were then ranked by enrichment in our gene list over background frequency.

GO biological process complete | Fold Enrichment | P value
sarcomere organization | 27.55 | 1.10E-02
actomyosin structure organization | 16.99 | 2.65E-04
actin cytoskeleton organization | 7.53 | 1.44E-05
single-organism organelle organization | 3.65 | 2.11E-03
organelle organization | 2.86 | 6.81E-06
cellular component organization | 1.94 | 2.26E-02
cellular component organization or biogenesis | 1.96 | 8.92E-03
cytoskeleton organization | 4.9 | 1.39E-05
actin filament-based process | 6.66 | 7.32E-05
myofibril assembly | 21.46 | 3.89E-03
cellular component assembly | 2.74 | 1.55E-02
cellular component biogenesis | 2.67 | 9.07E-03
organelle assembly | 4.6 | 4.62E-02
supramolecular fiber organization | 7.12 | 3.59E-04
striated muscle cell development | 12.36 | 2.92E-03
striated muscle cell differentiation | 9.22 | 5.97E-03
muscle cell differentiation | 8.21 | 1.03E-03
muscle structure development | 6.19 | 5.84E-04
muscle cell development | 10.95 | 7.20E-03
regulation of ATPase activity | 26.93 | 8.95E-05
striated muscle contraction | 15.83 | 4.52E-04
muscle contraction | 11.45 | 1.91E-04
muscle system process | 9.84 | 1.71E-04
translation | 6.77 | 2.21E-02
peptide biosynthetic process | 6.37 | 3.78E-02
regulation of cellular component organization | 2.62 | 5.14E-03

Table 3.2: Gene Ontology Analysis of A2/B1 Binding Sites in C2C12 Cells RefSeq IDs corresponding to genes with exonic/UTR hnRNP A2/B1 eCLIP peaks unique to myotube were submitted to the Gene Ontology tool (http://geneontology.org). Biological categories were then ranked by enrichment in our gene list over background frequency.

GO biological process complete | Fold Enrichment | P value
alternative mRNA splicing, via spliceosome | 55.62 | 8.36E-03
mRNA splicing, via spliceosome | 11.53 | 4.72E-03
mRNA processing | 6.84 | 5.79E-03
mRNA metabolic process | 6.23 | 1.50E-03
RNA splicing, via transesterification reactions with bulged adenosine as nucleophile | 11.53 | 4.72E-03
RNA splicing, via transesterification reactions | 11.46 | 4.93E-03
RNA splicing | 8.53 | 6.68E-04
mRNA splice site selection | 41.11 | 2.75E-02
cellular component organization | 2.2 | 4.49E-04
cellular component organization or biogenesis | 2.12 | 1.34E-03
cellular component assembly | 2.81 | 3.76E-02
regulation of alternative mRNA splicing, via spliceosome | 33.77 | 4.01E-03
regulation of mRNA splicing, via spliceosome | 19.17 | 7.31E-03
regulation of RNA splicing | 17.35 | 2.17E-04
muscle cell differentiation | 7.79 | 2.24E-02
muscle structure development | 5.64 | 3.67E-02
supramolecular fiber organization | 6.99 | 4.71E-03
actin cytoskeleton organization | 6.99 | 1.39E-03
cytoskeleton organization | 4.33 | 6.83E-03
actin filament-based process | 6.18 | 5.07E-03
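
The fold enrichment values in Tables 3.1 and 3.2 compare how often a category appears in the submitted gene list with how often it appears in the background. A minimal sketch of that ratio is given below; the function name and all counts are hypothetical and are not taken from the Gene Ontology tool or from the analysis itself.

```python
def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Ratio of a category's frequency in a gene list to its background frequency."""
    if min(list_size, background_size, hits_in_background) <= 0:
        raise ValueError("sizes and background hits must be positive")
    observed = hits_in_list / list_size                # frequency in the submitted list
    expected = hits_in_background / background_size    # frequency in the background
    return observed / expected


# Hypothetical counts: 6 of 300 submitted genes fall in a category
# that contains 40 of 20,000 background genes.
example = fold_enrichment(6, 300, 40, 20000)
print(f"fold enrichment: {example:.2f}")
```

Ranking categories by this ratio favors small, specific categories, which is consistent with narrow terms such as sarcomere organization appearing at the top of the tables.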

CHAPTER IV

THE HISTONE MARK RIP-SEQ METHOD

Introduction

The discovery that PRC2 can bind both lncRNAs, in a targeted manner at heterochromatin, and nascent transcripts, in a promiscuous manner (Davidovich et al., 2013), has raised the question of which properties of an RNA distinguish the two types of binding. Since lncRNAs can interact with PRC2 at heterochromatin in a highly reproducible manner (Chu et al., 2011; Simon et al., 2011), some factor, either intrinsic to the RNA or supplied by another protein (or RNA), must stimulate PRC2 to deposit H3K27me3 marks. However, to date no motifs have been found that are specific to interactions between PRC2 and lncRNAs.

Previous studies that attempted to identify PRC2-binding lncRNAs used methods such as RNA immunoprecipitation to pull down transcripts interacting with selected PRC2 subunits (Guttman et al., 2012; Khalil et al., 2009; Zhao et al., 2010). However, in addition to potentially identifying off-target transcripts (because RIP-seq cannot distinguish between direct and indirect protein-RNA interactions), these experiments were not designed to distinguish between promiscuous and non-promiscuous binding of RNA. We attempted to circumvent these issues by designing a protocol to identify the subset of PRC2-interacting transcripts that are associated with silenced heterochromatin.

To do this, we adapted a RIP-seq protocol to detect chromatin-associated RNA using antibodies to histone marks, resulting in a method that we named histone mark RNA immunoprecipitation and sequencing (hmRIP-seq) (Figure 4.1). Because the RIP-seq method cannot distinguish between direct and indirect protein-RNA interactions, hmRIP-seq can detect transcripts that are bound directly to histones, or that are bound indirectly via proteins such as PRC2. In the case of indirect binding, we would expect the detected RNA to be bound to one of the RNA-binding domains in PRC2, while one or more of the histone-binding sites in PRC2 are in contact with the histone PTM of interest.

Materials and Methods

Cell Culture

To match prior lncRNA experiments, we used MCF7 cells for testing the hmRIP-seq method. MCF7 is an adherent cell line derived from a breast adenocarcinoma and is classified by the ENCODE Project as a Tier 2 cell line, meaning that many of ENCODE's genome-wide studies have included this cell line. MCF7 cells were grown in Roswell Park Memorial Institute Medium (RPMI) supplemented with 10% FBS and 1x P/S.

Crosslinking of Cells to Improve RNA Recovery

To maximize RNA recovery, we used a dual-crosslinking strategy that has previously been shown to improve DNA recovery in ChIP experiments (Zeng et al., 2006). This strategy involves sequential crosslinking of cells using both formaldehyde and ethylene glycol bis(succinimidyl succinate) (EGS) (Abdella et al., 1979). EGS is a homobifunctional molecule containing a long (16.1 Å) carbon chain that can be cleaved through incubation with hydroxylamine. We hoped that the formaldehyde would crosslink short-range interactions, while the EGS would crosslink longer-range, potentially indirect, interactions.

Cells were first collected in either media or quenched trypsin, washed in 1x PBS, then resuspended in 1x PBS at 20 million cells per milliliter. EGS was prepared as a fresh 25 mM stock in dimethyl sulfoxide (DMSO), then added to cells at a final concentration of 1.5 mM. Following rotation for thirty minutes, the cells were treated with formaldehyde at 1% final concentration, then rotated for an additional eight minutes. The formaldehyde was then quenched in a ten-minute rotation following addition of glycine to a final concentration of 125 mM. The cells were then washed twice in 1x PBS and stored as pellets at –80°C.
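
The EGS addition above is a simple dilution, and the volume of stock required can be worked out as in the sketch below. The 25 mM stock and 1.5 mM final concentration are taken from the protocol; the 1 mL sample volume and the function name are assumptions chosen only for illustration.

```python
def stock_volume_to_add(sample_volume_ul, stock_mm, final_mm):
    """Volume of stock (in µL) to add so that the mixture reaches final_mm.

    Solves stock_mm * v = final_mm * (sample_volume_ul + v) for v, which
    accounts for the added stock increasing the total volume.
    """
    if not 0 < final_mm < stock_mm:
        raise ValueError("final concentration must be positive and below the stock")
    return final_mm * sample_volume_ul / (stock_mm - final_mm)


# Protocol values: 25 mM EGS stock, 1.5 mM final concentration.
# Assumed sample: 1000 µL of cells (20 million cells at 20 million cells/mL).
egs_ul = stock_volume_to_add(sample_volume_ul=1000, stock_mm=25, final_mm=1.5)
print(f"Add {egs_ul:.1f} µL of 25 mM EGS per 1000 µL of cell suspension")
```

The same function applies to the formaldehyde and glycine additions once their stock concentrations, which are not stated here, are known.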

Nuclear Lysis and Chromatin Fragmentation

The nuclear lysis of stored crosslinked cells was based on a standardized lab protocol. Briefly, frozen cell pellets were thawed and resuspended in five volumes of Buffer A (10 mM KCl, 1 mM MgCl2, 1 mM NaF, 1 mM Na3VO4, 20 mM HEPES, 1 mM DTT, and 0.4 mM PMSF), then incubated on ice for ten minutes. Following a ten-minute centrifugation at 4°C and 2500 rpm, the pellet was resuspended in two volumes of Buffer A and dounced 16 times. Centrifugation at 3000 rpm and 4°C for 15 minutes was sufficient to separate the nuclear pellet from the cytoplasmic supernatant.
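
Because the spins above are given in rpm while later steps are given in g, it can help to convert between the two. The sketch below uses the standard relation RCF = 1.118e-5 × radius (cm) × rpm²; the rotor radius is not stated in the protocol, so the 10 cm value is purely an assumption and the printed forces are illustrative only.

```python
def rpm_to_rcf(rpm, rotor_radius_cm):
    """Relative centrifugal force (x g) from rotor speed and rotor radius.

    Standard relation: RCF = 1.118e-5 * radius_cm * rpm**2.
    """
    return 1.118e-5 * rotor_radius_cm * rpm ** 2


ASSUMED_RADIUS_CM = 10.0  # hypothetical rotor radius; not given in the protocol

for speed in (2500, 3000):
    rcf = rpm_to_rcf(speed, ASSUMED_RADIUS_CM)
    print(f"{speed} rpm is roughly {rcf:.0f} x g at {ASSUMED_RADIUS_CM} cm")
```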

The nuclear pellet was resuspended in two volumes of Buffer C (400 mM KCl, 1 mM MgCl2, 1 mM NaF, 1 mM Na3VO4, 20 mM HEPES, 0.1 mM EDTA, 15% glycerol, 1 mM DTT, 0.4 mM PMSF). Following 16 dounce steps, this mixture was rotated at 4°C for thirty minutes, then centrifuged at 17,000g for 30 minutes at 4°C to separate the chromatin pellet from nuclear extract in the supernatant.

Testing of chromatin fragmentation conditions revealed that the most efficient way to fragment the chromatin to an acceptable size was to resuspend the cell pellet in a high volume (1 mL) of DNase buffer and treat it with a high quantity (250U) of RNase-free DNase (Roche) (Figure 4.2a) for 40 minutes at 37°C, followed by quenching with 5 mM EDTA. This was followed by a ten-minute Bioruptor treatment (Figure 4.2b) and centrifugation (fifteen minutes at 17,000g at 4°C) to collect the soluble chromatin in the supernatant.

RNA Immunoprecipitation

To improve RNA recovery, the hmRIP-seq protocol uses overnight immunoprecipitation at 4°C, following removal of 10% of the sample for input and addition of RNase inhibitor. To ensure integrity of the RNase inhibitor throughout the downstream incubations and decrosslinking steps, the protocol uses RNasin Plus (Promega N2611), which is advertised as being more stable at elevated temperatures.

To detect potential PRC2-interacting RNAs, the hmRIP-seq protocol was initially tested using an antibody specific to H3K27me3 (abcam ab6002). As a positive control, immunoprecipitations were also performed using an antibody specific for histone H3 regardless of modification status (abcam ab1791). In addition, antibodies specific to two other PTMs were also used (abcam ab1220, specific to H3K9me2; and abcam ab8580, specific to H3K4me3). As a negative control, an antibody to IgG (Novus NB810-56910) was used, at the same concentration as the IP antibodies. All histone PTM antibody specificities were confirmed using an outside database (Rothbart et al., 2015).

Antibody-protein-RNA complexes were recovered using 25 µg Protein A/G beads (Thermo Fisher 88802) per sample. These beads were first blocked in 1 mL blocking buffer (IP Wash Buffer 1 supplemented with 200 µg/mL BSA and 200 µg/mL yeast tRNA) overnight at 4°C, then incubated with the antibody-protein-RNA complexes for 90 minutes at room temperature.

To remove background contamination, the bead complexes were washed three times with IP Wash Buffer 1 (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 0.5% Triton-X, 0.5% NP-40, and 1 mM PMSF), followed by a single wash using high-salt IP Wash Buffer 2 (the same recipe as IP Wash Buffer 1, except using 500 mM NaCl). The samples were then washed once in LiCl Wash Buffer (10 mM Tris-HCl pH 8.0, 250 mM LiCl, 0.5% NP-40, 0.5% Triton-X, 1 mM EDTA) and once in 1x TE Buffer as previously described (Neff et al., 2012).

Elution of the protein-RNA complexes from beads was performed at 65°C for 15 minutes in 100 µL Elution Buffer 1 (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1% SDS), followed by vortexing in 150 µL Elution Buffer 2 (1x TE supplemented with 0.67% SDS), for a total elution volume of 250 µL per sample.

The samples were then decrosslinked in 1M hydroxylamine-HCl (pH 8.5) for three hours at 37°C, with 10U RNasin Plus added, to cleave the EGS crosslinks. The formaldehyde crosslinks were then reversed by incubation at 55°C for two hours. Finally, the protein was removed from the nucleic acids by incubation at 37°C for two hours with 100 µg proteinase K and 20 µg glycogen added.

The samples were then split into 25% for DNA analysis and 75% for RNA analysis.

hmRIP-seq RNA and DNA Purification

The DNA was purified using the Omega EZNA Cycle Pure Kit (Omega D6493-01), followed by incubation at 37°C for thirty minutes with RNase A added to a final concentration of 0.2 µg/µL, and a second purification with the EZNA kit, eluting into a final volume of 30 µL. The RNA was purified as previously described (Yang et al., 2014) using the Qiagen miRNeasy Mini kit (Qiagen 217004), DNase treatment with TURBO DNase (Ambion AM2238), and a second purification with the miRNeasy kit, eluting in a final volume of 8 µL. This kit was used instead of a standard RNA purification kit due to the prior RNA fragmentation during the nuclear lysis step, which necessitated a method that would efficiently retrieve short RNAs.

Generating an hmRIP-seq Sequencing Library

The RNA was reverse transcribed and sequenced using a strand-specific protocol similar to what has previously been described (Borodina et al., 2011). Briefly, the RNA derived from hmRIP was reverse transcribed using SuperScript II (Thermo Fisher 18064014) according to manufacturer’s instructions, priming off of random hexamers (Thermo Fisher N8080127). Following purification with AMPure XP beads (Beckman Coulter A63880), the second strand was synthesized for 2.5 hours at 15°C in a reaction containing the RNA-DNA hybrid, 1x Second Strand Buffer (NEB B6117S), 1 µL dUTP mix (20 mM dUTP, 10 mM dATP, 10 mM dCTP, 10 mM dGTP), 2.5U RNase H (NEB M0297S), 30U E. coli DNA polymerase (NEB M0209S), and 10U E. coli DNA ligase (NEB M0205S).

The resulting cDNA was then end-repaired using the End-It™ DNA End-Repair Kit (Lucigen ER0720), and A-tailed for thirty minutes at 37°C in a reaction containing 1x NEB2 Buffer, 0.2 mM dATP, and 15U Klenow (3′→5′ exo–) (NEB M0212S). The cDNA was ligated to Illumina TruSeq adaptors containing Unique Molecular Identifiers (UMIs) for 30 minutes at room temperature in a reaction containing 50 nM pre-annealed adaptors, 1x Quick Ligase Buffer (NEB M2200), and 2000U T4 DNA Ligase (NEB M0202S). The cDNA was then size selected using 2% E-Gel® SizeSelect™ gels (Thermo Fisher G661002). In order to achieve strand specificity, the second strand of the size-selected cDNA, which carries the uracil incorporated from the dUTP mix, was selectively degraded by USER enzyme (NEB M5505S). Following library PCR with barcoded Illumina TruSeq primers and AMPure bead purification, these libraries were sequenced on an Illumina MiSeq machine.

Computational Analysis of hmRIP-seq Libraries

Sequencing libraries were validated using the FastQC program (version 0.10.1), then aligned using the TopHat program (version 2.0.13) (Trapnell et al., 2009). These alignments were then visualized on the hg19 human reference genome using the bedGraphToBigWig program (Kent et al., 2010).

To identify candidate H3K27me3-interacting transcripts, we followed a two-pronged approach. First, peaks in the hmRIP-seq data were identified using MACS2 (Zhang et al., 2008) with default settings; this identified 82 candidate peaks overlapping previously-identified lncRNAs (Cabili et al., 2011). Then, we identified locations that corresponded to multiple sequencing reads. Since reads with duplicate UMIs are removed during processing of the sequencing library, each read in the final library should correspond to a unique RNA-histone mark interaction; therefore, transcripts identified in multiple reads, even if insufficient to be called as a peak by MACS2, should still be considered as candidates. There were 404 transcripts that had at least ten UMI-filtered reads at one location.
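
The read-count criterion above can be illustrated with a short sketch. This is not the pipeline used for the analysis: the read representation, the function names, and the definition of a location as an exact (chromosome, start, strand) triple are all assumptions made for illustration, and the toy reads are invented.

```python
from collections import Counter


def umi_filtered_counts(reads):
    """Count reads per location after collapsing duplicate UMIs.

    `reads` is an iterable of (chrom, start, strand, umi) tuples. Reads that
    share all four fields are treated as PCR duplicates and counted once.
    """
    unique_reads = set(reads)
    return Counter((chrom, start, strand) for chrom, start, strand, _umi in unique_reads)


def candidate_locations(reads, min_reads=10):
    """Locations supported by at least `min_reads` UMI-filtered reads."""
    counts = umi_filtered_counts(reads)
    return sorted(loc for loc, n in counts.items() if n >= min_reads)


# Toy data (hypothetical): 12 distinct UMIs at one site plus 5 PCR duplicates,
# and a second site with only 3 distinct UMIs.
toy_reads = [("chr1", 1000, "-", f"UMI{i:02d}") for i in range(12)]
toy_reads += [("chr1", 1000, "-", "UMI00")] * 5
toy_reads += [("chr2", 500, "+", f"UMI{i:02d}") for i in range(3)]

print(candidate_locations(toy_reads))
```

Collapsing on the UMI before counting is what allows a pile-up of ten reads to be read as ten independent molecules rather than amplification of a single fragment.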

These candidates were then manually inspected in the UCSC Genome Browser, and filtered into a few categories: previously-identified lncRNAs, repetitive or simple regions (as identified by the RepeatMasker database (Smit et al.)), regions identified as peaks with minimal sequencing read buildup, and potential candidates for novel chromatin-associated RNAs. The potential candidates were subjected to qPCR primer design using the Primer3 program (Untergasser et al., 2012).

Validation of hmRIP-seq Candidates by RT-qPCR

RNA samples from Ezh2, H3K27me3, and IgG hmRIP experiments were reverse transcribed using a High-Capacity cDNA Reverse Transcription Kit (Thermo Fisher 4368814), then used for qPCR with 2x Takyon for Sybr qPCR Master Mix (Eurogentec UF-NSCT-B0201) and primers specific to computationally identified candidates. These reactions were then run on a Roche LightCycler 480.
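
RT-qPCR results of this kind are commonly summarized as recovery relative to input and as enrichment over the IgG control. The sketch below shows one standard way to compute both from Ct values. It assumes a 10% input (as removed before immunoprecipitation) and roughly 100% amplification efficiency, and it ignores any later splitting of the samples; the function names and Ct values are hypothetical.

```python
import math


def percent_input(ct_ip, ct_input, input_fraction=0.10):
    """Recovery of a target in an IP, as a percentage of input.

    The input Ct is first adjusted to represent 100% of the sample:
    adjusted = ct_input - log2(1 / input_fraction).
    Assumes one doubling per cycle (about 100% amplification efficiency).
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def enrichment_over_igg(ct_ip, ct_igg, ct_input, input_fraction=0.10):
    """Ratio of percent input in the specific IP to that in the IgG control."""
    return (percent_input(ct_ip, ct_input, input_fraction)
            / percent_input(ct_igg, ct_input, input_fraction))


# Hypothetical Ct values, for illustration only.
print(f"{percent_input(ct_ip=27.0, ct_input=25.0):.2f}% of input")
print(f"{enrichment_over_igg(ct_ip=27.0, ct_igg=31.0, ct_input=25.0):.1f}-fold over IgG")
```

A –RT control run through the same calculation gives a quick check on how much of the apparent recovery could come from residual DNA.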

Results

The hmRIP-seq Technique Can Detect Chromatin-Associated RNAs

Peaks called by MACS2 were identified in a number of known chromatin-associated lncRNAs, including NEAT1, MALAT1, and HOTAIR (Figure 4.3). The hmRIP-seq signal also corresponded to previously-identified chromatin-interacting regions. For example, the strong hmRIP signal in exons 1 and 2 of HOTAIR (Figure 4.3a) lies within the previously-identified HOTAIR-PRC2 binding site that spans the 5′-most 300 nucleotides of the HOTAIR transcript.

Identifying Potential Histone Mark-Associated Transcripts

Of the candidate transcripts, eight (TCONS_00006407, TCONS_00026407, TCONS_00017382, TCONS_00007196, TCONS_00014209, TCONS_00015119, LINC000894, and RPPH1) were sufficiently non-repetitive and expressed at a high enough level for qPCR primer design. These primers were tested in an experiment using RNA from hmRIP experiments that used antibodies to H3K27me3 and Ezh2. However, many of these candidates displayed either weak recovery as a percentage of input, or high signal in the –RT negative control sample.

Three candidates had high enough enrichment in the IP sample to warrant the design of additional qPCR primer sets to confirm positive results: TCONS_00006407, TCONS_00026407, and TCONS_00007196. TCONS_00007196 appeared to have the greatest enrichment over the IgG negative control in the Ezh2 and H3K27me3 hmRIPs. To confirm these results, we designed an additional set of primers to a different hmRIP-seq hotspot, which also displayed strong qPCR signal (Figure 4.4).

A Novel Heterochromatin-Associated RNA

The locus of TCONS_00007196 has a number of overlapping lncRNA transcript annotations that exist on either strand and are annotated with different intron/exon boundaries. This suggests that this locus may be a hotspot for pervasive transcription, with a number of different RNAs being transcribed from this region. However, the signal detected in the hmRIP-seq experiment mapped strongly to the minus strand, and also overlapped many of the intron-exon boundaries. The GENCODE database contained an annotation for a transcript spanning the entire locus on the minus strand, labeled MTRNR2L12.

MTRNR2L12 is a gene that has been suggested as a potential biomarker for early Alzheimer’s Disease-like dementia (Bik-Multanowski et al., 2015). It is a paralog of a mitochondrial RNA, MT-RNR2, which encodes the mitochondrial 16S ribosomal RNA. Nuclear Mitochondrial Pseudogenes (NUMTs), nuclear sequences that have been transposed from the mitochondrial genome, are fairly common across species (Bensasson et al., 2001) and in the human genome (Dayama et al., 2014). They are thought to enter the nucleus due to errors in non-homologous end joining during end repair, which lead to the inclusion of non-nuclear DNA into the nuclear genome (Ramos et al., 2011). MT-RNR2 contains within it a short, 24-amino acid open reading frame that encodes a protein called humanin, which has been shown to inhibit neuronal cell death in some models of Alzheimer’s Disease but may also be a potential oncogene (Maximov et al., 2002).

There are thirteen paralogs of MT-RNR2 in the human nuclear genome, of which MTRNR2L12 is one of the closest in sequence to MT-RNR2 (Bodzioch et al., 2009). Of these paralogs, MTRNR2L12 had the strongest hmRIP enrichment over input across the entire transcript (Figure 4.5). Notably, the sequences in MTRNR2L8 and MTRNR2L12 corresponding to the humanin ORF contain a polymorphic site (rs7350541) that, if translated in the nucleus, would give the same sequence that the MT-RNR2 humanin protein would have if translated by mitochondrial tRNAs (Bodzioch et al., 2009).

The finding that this transcript interacts with heterochromatin was an interesting one, as none of the previous functions attributed to MTRNR2L12, MT-RNR2, or the other paralogs were chromatin-related. Due to the high expression of MTRNR2L12 in many cell types, and the relatively high level of H3K27me3 around the loci of the other paralogs (Figure 4.5), we hypothesized that MTRNR2L12 might interact with chromatin surrounding the other paralogs.

We performed a ChIP-qPCR experiment using primers designed specifically to each of the thirteen MT-RNR2 paralogs, to examine whether there was evidence of H3K27me3 marks at each individual paralog. We found that all paralogs except for MTRNR2L12 and MTRNR2L13 displayed some H3K27 methylation as compared against a high-H3K27me3 positive control locus (Figure 4.6b). This suggests that either the other eleven paralogs are specifically silenced, or MTRNR2L12/13 are specifically not silenced.

A model that would fit these data is one in which MTRNR2L12 interacts with heterochromatin at the other paralogs, potentially in conjunction with PRC2 (Figure 4.7). The sequence similarity between MTRNR2L12 and the sequence of the other paralogs suggests that MTRNR2L12 RNA molecules could bind to the DNA or nascent RNA transcripts of the paralogs. If MTRNR2L12 were also bound to PRC2 during this process, it could lead to targeted silencing at the other paralogs.

Potential Piwi Protein-Interacting Transcripts

We also performed an hmRIP-seq experiment using an antibody to H3K9me2, another common heterochromatin-associated mark that is generally spread across larger genomic regions than H3K27me3. We suspected that some transcripts associated with H3K27me3 would also display association with H3K9me2, since PRC2 has been shown to interact with H3K9 methylases (Boros et al., 2014; Mozzetta et al., 2014). However, there is also reason to believe that these two histone marks do not always co-occur. It has been shown that Polycomb-associated H3K27me3-marked chromatin has tighter packing dynamics which could impose restrictions on interactions with external factors (Boettiger et al., 2016). Our data show moderate overlap for transcripts with H3K9me2 and H3K27me3 signal, suggesting that some transcripts are interacting with regions containing both of these histone marks. This implies that some transcripts might interact with particular heterochromatin-associated PTMs, while other transcripts might interact with heterochromatin in a histone PTM-independent manner.

The hmRIP-seq data displayed a number of reads diffusely overlapping repetitive Long Interspersed Nuclear Elements (LINEs) (Figure 4.8a). LINEs are retrotransposons that are approximately six kilobases in length, and exist in hundreds of thousands of copies spread across the human genome, a small fraction of which are currently considered active. LINEs contain two open reading frames, which code for a chaperone protein (ORF1) and a reverse transcriptase (ORF2). Active LINEs are transcribed and have both open reading frames translated, allowing them to copy themselves to other parts of the genome. It has been proposed that transposons can comprise a significant portion of lncRNA sequence (Kapusta et al., 2013; Kelley and Rinn, 2012), as transposon insertion is a potential mechanism for the insertion of a de novo transcription start site into the genome. Transposon insertions into the middle of lncRNAs are also more likely to be retained across generations than insertions into mRNAs, since insertions into lncRNAs will not interfere with any protein coding sequence.

Since most of the reads were in LINEs not overlapping annotated lncRNAs, we suspected that we were observing a different mechanism than the binding of HOTAIR to chromatin modifiers and heterochromatin. That mechanism could potentially be the silencing of LINEs in action. It has previously been shown that genomic LINEs are silenced by H3K9 methylation, and those that are transcribed can be silenced in an RNAi-like manner (Bulut-Karslioglu et al., 2014; Yang and Kazazian, 2006). Similar small RNAs generated from transposable elements can also act as piwi-interacting RNAs (piRNAs), which direct the piwi complex of silencing proteins to targeted genomic loci, leading to recruitment of H3K9 methylases and the deposition of H3K9 methyl marks (Le Thomas et al., 2013). One possible explanation for hmRIP-seq reads mapping to a LINE would be the identification of nascent transcripts from LINEs that are in the process of being silenced by this mechanism, or have been incompletely silenced (Figure 4.8b).

In order to more closely investigate the regions of LINEs that associate with H3K9me2, we mapped the location of each sequencing read against LINEs annotated in the L1Base database (Penzkofer, 2004). This database divides LINEs into three categories: those that are inactive, those that have a mutation in one open reading frame, and those that are active. Interestingly, we identified certain sites that were particularly well-represented in our hmRIP-seq data (Figure 4.9). These sites appeared to be correlated with the activity of each LINE category: sites in each region were enriched only in categories containing active versions of that region. Those LINEs classified as active had all of the sites, while those with an inactive first open reading frame were missing the sites within the first open reading frame.
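The positional analysis described above can be sketched as follows: each read is assigned a relative position along the LINE it maps to, and the relative positions are binned into a frequency profile for each LINE category. This is a minimal illustration under stated assumptions and not the pipeline used to generate Figure 4.9; the input format (pairs of read offset and element length), the number of bins, and the function name are all hypothetical.

```python
from typing import Iterable, List, Tuple


def relative_position_histogram(
    hits: Iterable[Tuple[int, int]], n_bins: int = 100
) -> List[float]:
    """Bin reads by fractional distance along the LINE they map to.

    hits: (offset, element_length) pairs, where offset is the 0-based
    distance of the read midpoint from the 5' end of the element.
    Returns n_bins frequencies that sum to 1 (or all zeros if no
    usable reads were supplied).
    """
    counts = [0] * n_bins
    total = 0
    for offset, length in hits:
        # Skip reads that fall outside the annotated element.
        if length <= 0 or not (0 <= offset < length):
            continue
        bin_index = min(int(offset / length * n_bins), n_bins - 1)
        counts[bin_index] += 1
        total += 1
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]


if __name__ == "__main__":
    # Hypothetical reads on a 6 kb element; the last one is out of range.
    example = [(0, 6000), (3000, 6000), (5999, 6000), (7000, 6000)]
    print(relative_position_histogram(example, n_bins=4))
```

Computing one such profile per L1Base category and overlaying them would show which peaks are present only in categories with an intact version of the corresponding region.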

Discussion

The results provided by hmRIP-seq suggest an effective method to identify heterochromatin-associated RNA transcripts in a number of different contexts. The identification of MTRNR2L12 is a particularly interesting case, as it represents a novel heterochromatin-interacting RNA. The unusual evolutionary history of MTRNR2L12 and its paralogs suggests a model for how other gene families with numerous duplications in the genome might be silenced. Unfortunately, the high sequence similarity between the paralogs means that further analysis of this gene family will prove difficult.

While we have hypothesized a potential model for how MTRNR2L12 interacts with heterochromatin, the hmRIP-seq method is poorly suited to understanding the details of the interaction. A high-throughput RNase footprinting experiment has identified protein binding sites on MTRNR2L12, suggesting that it is protein-bound (Ji et al., 2016). While the bound protein could be PRC2, as is the case with HOTAIR, MTRNR2L12 may instead function differently, for example by binding to an H3K27me3 reader complex or binding directly to heterochromatin.

The difficulties in assigning a mechanism to MTRNR2L12-mediated silencing are symptomatic of a key weakness of hmRIP-seq: an inability to identify whether RNA-chromatin interactions are direct or indirect, and if indirect, what other protein cofactors are present and necessary for the interaction. To clarify the answers to these questions, hmRIP-seq targets must currently be subjected to other types of experiments, such as eCLIP-seq experiments to other chromatin-modifying enzymes.

The identification of LINEs in the H3K9me2 hmRIP-seq was also an interesting result, as it provided some support for a mechanism of repetitive element silencing that had been suggested by previous studies. The disappearance of hmRIP-seq sites when their regions do not make functional protein suggests that association of LINE transcript regions with heterochromatin is correlated with translation of those regions. It is possible that the generation of targeting piRNAs from LINEs is associated with the process of translation, which could focus silencing on LINEs that are able to re-insert themselves into the genome, a more efficient process than blindly silencing all LINEs.

However, as with the repetitive MT-RNR2-like family of genes, studying LINEs in a high-throughput manner would be difficult, considering their high sequence similarity. Adapting the hmRIP-seq method to use longer RNA fragments to achieve longer sequencing reads might improve identification of reads mapping to repetitive elements by raising the chance that any given sequencing read is able to be mapped back to the genome uniquely. Improved identification of repetitive elements would be useful in many respects, since retrotransposons have been suggested to interact with heterochromatin in a number of ways.

First, since many lncRNAs have been suggested to contain evidence of transposable elements (Elisaphenko et al., 2008; Johnson and Guigó, 2014; Kapusta et al., 2013), this would allow for more accurate identification of lncRNAs associated with chromatin. Second, retrotransposons in the genome must be silenced in order to ensure genomic stability (Bulut-Karslioglu et al., 2014; Sadic et al., 2015), which suggests that any RNAs mediating their silencing would be detectable by hmRIP-seq. Third, a subset of LINEs are expressed during some forms of heterochromatin formation (Chow et al., 2010), and hmRIP-seq could be used to more fully characterize their functions.

Unfortunately, the hmRIP-seq method as described is limited by poor RNA recovery. While we detected strong binding to previously-identified heterochromatin-interacting RNAs such as HOTAIR, there was only minimal signal at many other transcripts. As a result, it was difficult to identify novel binding partners of silencing PTMs, with MTRNR2L12 being the only high-quality candidate to emerge from the hmRIP-seq results that validated well by RT-qPCR. Further technical improvements to the hmRIP-seq protocol could improve RNA recovery and the sensitivity of the protocol to detect novel heterochromatin-associated transcripts.

Using hmRIP-seq on different types of cells to see if heterochromatin-associated transcripts change in response to different cellular conditions may be a worthwhile future direction. One reason that MCF7 cells have been used for much prior lncRNA research is that they contain relatively high levels of endogenous HOTAIR, perhaps reflected in the strong hmRIP-seq readings along the HOTAIR transcript. It is possible that the high levels of PRC2-bound HOTAIR in MCF7 cells preclude other lncRNAs from binding to PRC2 and heterochromatin; performing the same analysis in a cell line with low HOTAIR would remove this constraint, allowing signal from other lncRNAs to be detected more strongly in an hmRIP-seq experiment.

a) Cross-link cells with formaldehyde and EGS

b) Lyse cells to retrieve nuclear pellets

c) DNase treat and solubilize chromatin

d) IP with histone mark antibodies

e) De-crosslink with hydroxylamine and heat

f) Purify RNA/DNA for sequencing library creation

Figure 4.1: hmRIP-seq Protocol Flowchart. Flowchart of steps in the hmRIP-seq protocol.

[Figure 4.2 image: two gels. Lanes in one gel: Ladder; 0, 15, and 60 minutes Bioruption. Lanes in the other gel: Ladder; 150 U, 200 U, 225 U, and 250 U DNase. Ladder markers from 100 bp to 1 kb.]

Figure 4.2: hmRIP-seq Tests a) Titration of DNase points for the hmRIP-seq protocol b) Test of Bioruption time points for the hmRIP-seq protocol

[Figure 4.3 image: genome browser views of a) HOTAIR (NR_003716), with the PRC2-interacting region marked, b) MALAT1 (NR_002819), and c) NEAT1 (NR_131012). Tracks in each panel: K27 hmRIP enrichment, K27 hmRIP, and Input.]

Figure 4.3: hmRIP-seq Profiles at lncRNAs. Input and H3K27me3 hmRIP-seq profiles, in reads per million, for known heterochromatin-interacting transcripts. At top of each plot are enrichment values across each gene, calculated as log(K27/input).
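For illustration, the enrichment value described in the caption above can be sketched as a log ratio of library-size-normalized coverage. This is only a sketch under stated assumptions, not the code used to generate the figure: the caption does not specify the logarithm base or how zero-coverage positions were handled, so the base-2 logarithm, the pseudocount, and the function names below are all assumptions.

```python
import math


def reads_per_million(count: float, total_mapped_reads: float) -> float:
    """Scale a raw read count to reads per million mapped reads."""
    return count * 1e6 / total_mapped_reads


def log_enrichment(
    ip_rpm: float, input_rpm: float, pseudocount: float = 1.0, base: float = 2.0
) -> float:
    """Log ratio of IP over input signal.

    The pseudocount (an assumption, not from the text) avoids division
    by zero where the input has no coverage. Positive values indicate
    enrichment in the IP; negative values indicate depletion.
    """
    return math.log((ip_rpm + pseudocount) / (input_rpm + pseudocount), base)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    k27 = reads_per_million(70, 10_000_000)
    inp = reads_per_million(10, 10_000_000)
    print(log_enrichment(k27, inp))
```

With a symmetric log scale of this kind, equal signal in the hmRIP and input libraries gives a value of zero, consistent with enrichment tracks centered on zero.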

[Figure 4.4 image: genome browser view of the MTRNR2L12 (ENST00000600213) locus. Other annotated lncRNAs shown: TCONS_00007196, TCONS_00007197, TCONS_00006915, TCONS_00006916, TCONS_00006917, TCONS_00006918, TCONS_00007198, TCONS_00006919, and TCONS_00006556. Tracks: K27 hmRIP enrichment, K27 hmRIP, Input, and ENCODE MCF7 H3K27me3 ChIP-seq.]

Figure 4.4: MTRNR2L12 hmRIP-seq and Confirmatory qPCR. MTRNR2L12 displays strong hmRIP signal for H3K27me3. Relative hmRIP and input signal are shown, along with enrichment, for MTRNR2L12 as in Figure 4.3. Also included are lncRNAs predicted at this locus (Cabili et al. Genes and Development 2011). Below, RT-qPCR results of MTRNR2L12 loci displaying H3K27me3 hmRIP signal, as compared to RT-qPCR results from highly abundant control RNAs. RT-qPCR was performed on hmRIP samples using antibodies to H3K27me3, IgG, and Ezh2, a PRC2 component.

[Figure 4.5 image: genome browser views of a) MTRNR2L1 (NM_001190452), b) MTRNR2L2 (NM_001190470), c) MTRNR2L4 (NM_001190476), d) MTRNR2L3 (NM_001190472), e) MTRNR2L6 (NM_001190487), f) MTRNR2L5 (NM_001190478), g) MTRNR2L7 (NM_001190489), h) MTRNR2L8 (NM_001190702), i) MTRNR2L9 (NM_001190706), j) MTRNR2L10 (NM_001190708), k) MTRNR2L11 (ENST00000604646), and l) MTRNR2L13 (ENST00000604093). Tracks in each panel: K27 hmRIP enrichment, K27 hmRIP, Input, and ENCODE MCF7 H3K27me3 ChIP-seq.]

Figure 4.5: MTRNR2L12 Paralogs hmRIP-seq. hmRIP-seq profiles of MTRNR2L1–11 and MTRNR2L13.

[Figure 4.6 image: a) tree with leaves L1 through L13 and MT-RNR2; b) ChIP-qPCR bar chart.]

Figure 4.6: MTRNR2L12 Paralogs Relationship and H3K27me3 ChIP-qPCR a) Evolutionary distance between MTRNR2L12, MT-RNR2, and other paralogs. Tree generated using UCSC Genome Browser (http://genome.ucsc.edu), Clustal Omega (http://www.clustal.org/), and Dendroscope (http://dendroscope.org) b) ChIP-qPCR of MTRNR2L12 loci (same amplicons as Figure 4.4) and the other MT-RNR2 paralogs using DNA from an H3K27me3 hmRIP experiment. Y-axis is percent input, normalized against percent input at control locus “+.” Error bars represent standard error (n=4).

[Figure 4.7 image: schematic showing MTRNR2L12 and the silenced (X) paralog loci MTRNR2L2, MTRNR2L3, and MTRNR2L7.]

Figure 4.7: Potential Mechanism of MTRNR2L12-Mediated Paralog Silencing. MTRNR2L12 is transcribed, then binds to PRC2 at the genomic loci of its paralogs, effectively silencing them. MTRNR2L12 may also associate with heterochromatin via other heterochromatin-associated proteins.

[Figure 4.8 image: a) genome browser view (5 kb scale) of SORCS1 (NM_052918) with Input, H3K27me3 hmRIP, and H3K9me2 hmRIP tracks and RepeatMasker LINEs; b) schematic of an active LINE (ORF1, ORF2) with Pol II, Piwi complexes, and the nucleus, with steps numbered 1 to 5.]

Figure 4.8: Model of Potential LINE Functions a) An example of a LINE with diffuse H3K9me2 hmRIP-seq across it. b) Model of potential LINE transcript-H3K9me2 interactions. (1) An active LINE is transcribed and exported from the nucleus. (2) The LINE transcript is recognized by a piRNA-associated Piwi complex, which contains proteins that degrade the LINE transcript. (3) The LINE transcript codes for two proteins, ORF1 and ORF2. (4) ORF1, a chaperone, and ORF2, a reverse transcriptase, can facilitate re-integration of the LINE transcript into the genome at a different location. (5) The LINE transcript can also be recognized by nuclear Piwi complexes that contain histone methyltransferases. These complexes can silence LINEs by depositing H3K9me2 upstream of the LINE transcription start site.

[Figure 4.9 image: line plot. Y-axis: Frequency (0.00 to 0.05). X-axis: Distance along LINE (0% to 100%), spanning the 5′ UTR, ORF1, ORF2, and 3′ UTR. Legend (LINE Type): Intact, Intact ORF2, All Full Length.]

Figure 4.9: hmRIP-seq Reads in LINEs. hmRIP-seq H3K9me2 sequencing reads mapped along all genomic LINEs, as classified by L1Base. Green arrows mark peaks that only appear in LINEs that are fully intact or only have an intact ORF2. Red arrow marks a peak that only appears in fully intact LINEs.

CHAPTER V

DISCUSSION

The world of RNA-associating proteins is vast. In this thesis, I have focused on a small subset of RNA-protein interactions: interactions involved in chromatin and silencing, and interactions that are involved in RNA trafficking. The insights gained from research into these areas, however, can also be applied to many other aspects of RNA-protein interactions.

Research on the role of RNA in chromatin silencing has thus far focused on the potential for transcripts to act as a targeting mechanism for histone-modifying protein complexes. The Matchmaker Model proposed by our lab provides a twist to this formula, shifting the focus from individual protein-RNA interactions towards networks of interactions in which multiple factors must be present and interacting with each other appropriately in order for gene silencing to occur. This paradigm greatly increases the potential complexity of targeting interactions.

With greater potential for complexity comes greater potential for something to go wrong. In the example of HOTAIR-mediated PRC2 gene silencing, dysregulation of or mutations in either of those components, hnRNP A2/B1, or the nascent transcript that A2/B1 binds, could result in disrupted PRC2 activity. It is reasonable to wonder how such a complex system evolved, when a simpler system (in which HOTAIR binds to PRC2 and specific gene loci, and PRC2 acts whenever it is adjacent to chromatin) appears as though it would suffice to regulate gene silencing in most circumstances.

One possibility is that PRC2 is a multifunctional protein complex that has functions outside of catalyzing the formation of H3K27me3. Although research on PRC2 has focused on its effects on H3K27me3, its subunits also contain binding regions for histones and non-H3K27me3 histone PTMs. If PRC2 plays some role on chromatin aside from silencing, it would make sense that another cofactor would be necessary to activate the silencing function of PRC2, deposition of H3K27me3, and the silencing of the locus.

Another possibility is that having multiple factors involved in silencing through an interaction network could facilitate quicker evolution of gene targeting. If mutation of any factor in the network can lead to disease, then such mutations can also lead to novel gene silencing or activation events. On a small scale this could be advantageous, but a large number of novel silencing targets would predispose the organism to disease. Since lncRNAs have been shown to evolve more rapidly than proteins and other types of transcripts (Kapusta and Feschotte, 2014; Necsulea et al., 2014; Washietl et al., 2014), they would likely be the primary driver of changes in targeting through this mechanism. PRC2 and A2/B1, although they have evolved more slowly than lncRNAs, could also provide changes to gene targeting specificity. It is also possible that modifications, such as alternate splice isoforms (in the case of A2/B1), phosphorylation (in the case of Ezh2 (Kaneko et al., 2010)), or methylation (in the case of Jarid2 (Sanulli et al., 2015)) of these proteins could affect their function and specificity.

Our identification of binding of A2/B1 to intron 3 of HOTAIR suggests new possibilities regarding the roles that RNA-binding proteins can play. Since A2/B1 is a known splicing regulator, one possibility would be for A2/B1 to bind intron 3 merely to regulate its alternative splicing. In fact, such binding may be responsible for the small fraction of intron 3 that is retained in the HOTAIR transcript in total RNA. However, our evidence that A2/B1 also binds fully spliced HOTAIR suggests that binding to intron 3 may simply represent a precursor to subsequent binding elsewhere in HOTAIR domain 1.

It is possible that intron 3 represents a particularly strong binding site for A2/B1, and serves to recruit A2/B1 more strongly than a fully spliced HOTAIR would. If even a fraction of A2/B1 remained bound to HOTAIR following the splicing of intron 3, that would enhance A2/B1-HOTAIR interactions, leading to increased formation of RNA-RNA interactions between HOTAIR and other transcripts. An interesting experiment to perform would be overexpression of HOTAIR without intron 3, to see if A2/B1 binding and subsequent B1-mediated silencing events are affected genome-wide.

Although investigation of A2/B1 by eCLIP is a good way to study the silencing aspects of A2/B1, it also allows for study of A2/B1 away from chromatin, a role which has been much less extensively studied than its actions on chromatin and with nascent transcripts. We have characterized the activity of cytoplasmic A2/B1 in cancer cell lines. The identification of numerous A2/B1 targets in the cytoplasm under normal circumstances was a fascinating observation, given that the existence of many RNA-binding proteins in the cytoplasm was previously thought to be predominantly pathologic.

Our eCLIP fractionation experiment showed a small amount of cytoplasmic and nucleoplasmic A2/B1-RNA binding in normal MCF7 cells, with a far different binding profile than chromatin-associated A2/B1. Most striking was the identification of strong binding to proximal introns and 5′ UTR sequences, as opposed to exonic sequence. The fraction of cytoplasmic RNAs that is exonic is far higher than in unspliced chromatin-associated RNA, leading us to predict that A2/B1 would be more likely to bind exonic sequence in the cytoplasm.

Our analysis instead suggests other roles for cytoplasmic A2/B1, as the lack of exonic binding potentially evolved to prevent interference between A2/B1 binding and interactions between RNAs and the actively translating ribosomes. First, binding to proximal introns likely represents a consequence of A2/B1 modulating the splicing of those transcripts on chromatin, representing a potential mechanism for regulation of their maturation. A potential model for this would involve A2/B1 binding to and interfering with the splicing of an intron, then remaining bound to that intron following nuclear export. It is also possible that A2/B1 binds in order to slow the process of mRNA maturation, allowing splicing to happen eventually but at a specifically defined time or place.

Second, 5′ UTR binding likely represents binding to regulatory regions. While binding of A2/B1 to stability-associated elements in the 3′ UTR has been shown previously (Geissler et al., 2016), no such binding has been identified at the 5′ end. Although there are no functional examples of A2 interacting with the 5′ UTR, its paralog hnRNP A1 has been shown to interact with the 5′ UTR of a viral RNA (Lin et al., 2009), providing a model for how 5′ UTR binding could function. It is possible that A2/B1 is binding to 5′ UTR elements in human transcripts in a similar manner, in a process involving elements that have not yet been described.

C2C12 mouse muscle cells represent a far different cellular environment than MCF7 breast adenocarcinoma cells. For example, our eCLIP of A2/B1 in myoblasts indicated a large proportion of binding to the 3′ UTR, similar to eCLIP in human neurons (Martinez et al., 2016) but far different from the preferential intron binding detected in our nuclear MCF7 dataset. This suggests that A2/B1 binding sites are highly variable and dependent on both cell type and local cellular environment. On a transcript containing an A2/B1 binding site, that site might only be used in certain cell types, in certain cellular compartments, or at certain times in development.

Our comparison of TDP-43 and A2/B1 eCLIP libraries provides evidence for the hypothesis that the transcriptome may be subdivided by association with different RNA-binding proteins (Mohagheghi et al., 2016). Between TDP-43 and A2/B1 there are a significant number of uniquely bound transcripts, and even on the transcripts that are bound by both proteins there appear to be regions of transcript that are only bound by one or the other protein. This suggests a model in which transcripts can be bound cooperatively by a number of RNA-binding proteins, each with its own distinct binding regions.

Combinatorial transcript recognition by RNA binding proteins could represent a major point of regulation for particular transcripts. For example, binding of RNA by TDP-43 could promote aggregation into RNP granules, as TDP-43 tends to form octameric complexes in the cell. If RNAs in such granules are also bound by hnRNP A1, that could then promote RNA-RNA interactions in the granule, as the antiparallel orientation of the RRMs in A1 could promote base pairing between different transcripts (Shamoo et al., 1997). Transcripts are likely to contain regions that are bound by more than one protein; the RepA region of Xist is one example, since it appears to be bound by both TDP-43 and A2/B1. It is possible that binding by one protein might prevent binding by another to the same region, allowing for more binding combinations if one protein is absent.

The aggregation of TDP-43 into oligomers that display features of amyloid structures is of specific interest, as amyloid is described almost exclusively in terms of its pathologic potential. In humans there are few examples of non-pathologic amyloid formation, including packing of hormone peptides in secretory granules (Maji et al., 2009) and proteins that are specific to melanosomes (Berson et al., 2003). Both of these examples involve proteins that take advantage of the natural properties of amyloid: in secretory granules, it would be advantageous to have proteins packed as tightly as possible in a small space. Meanwhile, in melanosomes the aggregative properties of amyloid structures appear to aid in the pigmentation of the melanosomes.

The discovery of non-pathogenic amyloid structures in muscle cells thus represents one of the first identifications of functional amyloid in an organ system where it is thought to cause disease. This redefines the questions that researchers should ask when thinking about the effect that amyloid has on disease: rather than assuming that amyloid structures are pathogenic, the question should be how the timing of amyloid formation and degradation is altered in disease. Our data indicate strong binding of TDP-43 to myogenic transcripts during muscle differentiation. If cytoplasmic TDP-43 is also binding long myogenic transcripts in disease states, that would almost certainly affect cellular function.

First, the massive size of many myogenic transcripts (a fully spliced Titin mRNA transcript is 114 kilobases, which would be over 30 µm long if unfolded) would lead them to interfere with other cellular functions, even if binding by TDP-43 (which binds across the length of the mature transcript) prompts those transcripts to fold into a more compact shape. Second, cytoplasmic localization of TDP-43 during normal cellular function would have an adverse effect on RNA homeostasis in both the nucleus and cytoplasm. Less TDP-43 in the nucleus would adversely affect the alternative splicing events that it mediates, while more TDP-43 in the cytoplasm could lead to inappropriate localization of mRNAs that it would normally bind.
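The length estimate for the Titin transcript quoted above can be reproduced with simple arithmetic. The sketch below assumes a spacing of roughly 0.3 nm per nucleotide, which is an illustrative value chosen here and is not stated in the text; a different assumed spacing would scale the result proportionally. The function name is likewise hypothetical.

```python
def contour_length_um(n_nucleotides: int, rise_nm_per_nt: float = 0.3) -> float:
    """Approximate end-to-end length of an unfolded RNA in micrometers.

    rise_nm_per_nt is an assumed per-nucleotide spacing (not from the text).
    """
    return n_nucleotides * rise_nm_per_nt / 1000.0


if __name__ == "__main__":
    # 114 kilobases, as quoted for a fully spliced Titin mRNA.
    print(contour_length_um(114_000))
```

Under this assumption the transcript would span roughly 34 µm, in line with the "over 30 µm" figure given above.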

The investigations into protein-RNA interactions described in this thesis suggest a number of avenues for future research. Our study of hnRNP A2/B1 provides evidence for a number of proposed activities that it might perform, and posits it as a multifunctional protein that acts within numerous interrelated pathways. The collaborative work we have performed on TDP-43 clarifies its functional role in muscle, where it has previously been thought to act pathologically, and sets the stage for future investigations into its critical role in myogenesis. Subsequent studies characterizing these proteins, their interactions, and their associations with disease will have the opportunity to build upon these results and give a more complete picture of how their interactions with RNA function in the cell.

REFERENCES

Abdella, P.M., Smith, P.K., and Royer, G.P. (1979). A new cleavable reagent for cross-linking and reversible immobilization of proteins. Biochem. Biophys. Res. Commun. 87, 734–742.

Alarcón, C.R., Goodarzi, H., Lee, H., Liu, X., Tavazoie, S., and Tavazoie, S.F. (2015). HNRNPA2B1 Is a Mediator of m6A-Dependent Nuclear RNA Processing Events. Cell 162, 1299–1308.

Amit-Avraham, I., Pozner, G., Eshar, S., Fastman, Y., Kolevzon, N., Yavin, E., and Dzikowski, R. (2015). Antisense long noncoding RNAs regulate var gene activation in the malaria parasite Plasmodium falciparum. Proc Natl Acad Sci USA 112, E982–E991.

Amlie-Wolf, A., Ryvkin, P., Tong, R., Dragomir, I., Suh, E., Xu, Y., Van Deerlin, V.M., Gregory, B.D., Kwong, L.K., Trojanowski, J.Q., et al. (2015). Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin. PLoS ONE 10, e0141836.

Augui, S., Nora, E.P., and Heard, E. (2011). Regulation of X-chromosome inactivation by the X-inactivation centre. Nat. Rev. Genet. 12, 429–442.

Bailey, T.L. (2011). DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659.

Barbosa-Morais, N.L., Carmo-Fonseca, M., and Aparício, S. (2006). Systematic genome-wide annotation of spliceosomal proteins reveals differential gene family expansion. Genome Res 16, 66–77.

Beltran, M., Yates, C.M., Skalska, L., Dawson, M., Reis, F.P., Viiri, K., Fisher, C.L., Sibley, C.R., Foster, B.M., Bartke, T., et al. (2016). The interaction of PRC2 with RNA or chromatin is mutually antagonistic. Genome Res 26, 896–907.

Bengoechea, R., Pittman, S.K., Tuck, E.P., True, H.L., and Weihl, C.C. (2015). Myofibrillar disruption and RNA-binding protein aggregation in a mouse model of limb-girdle muscular dystrophy 1D. Hum Mol Genet 24, 6588–6602.

Bensasson, D., Zhang, D.X., Hartl, D.L., and Hewitt, G.M. (2001). Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends in Ecology & … 16, 314–321.

Berson, A., Barbash, S., Shaltiel, G., Goll, Y., Hanin, G., Greenberg, D.S., Ketzef, M., Becker, A.J., Friedman, A., and Soreq, H. (2012). Cholinergic-associated loss of hnRNP-A/B in Alzheimer's disease impairs cortical splicing and cognitive function in mice. EMBO Mol Med 4, 730–742.

Berson, J.F., Theos, A.C., Harper, D.C., Tenza, D., Raposo, G., and Marks, M.S. (2003). Proprotein convertase cleavage liberates a fibrillogenic fragment of a resident glycoprotein to initiate melanosome biogenesis. The Journal of Cell Biology 161, 521–533.

Bhardwaj, A., Myers, M.P., Buratti, E., and Baralle, F.E. (2013). Characterizing TDP-43 interaction with its RNA targets. Nucleic Acids Research 41, 5062–5074.

Biamonti, G., Ruggiu, M., Saccone, S., Valle, Della, G., and Riva, S. (1994). Two homologous genes, originated by duplication, encode the human hnRNP proteins A2 and A1. Nucleic Acids Research 22, 1996–2002.

Bik-Multanowski, M., Pietrzyk, J.J., and Midro, A. (2015). MTRNR2L12: A Candidate Blood Marker of Early Alzheimer’s Disease-Like Dementia in Adults with Down Syndrome. Jad 46, 145–150.

Bin Cho, S., Ahn, K.J., Do Hee Kim, Zheng, Z., Cho, S., Kang, S.-W., Lee, J.H., Park, Y.-B., Lee, K.H., and Bang, D. (2012). Identification of HnRNP-A2/B1 as a Target Antigen of Anti-Endothelial Cell IgA Antibody in Behçet's Disease. Journal of Investigative Dermatology 132, 601–608.

Bodzioch, M., Lapicka-Bodzioch, K., Zapala, B., Kamysz, W., Kiec-Wilk, B., and Dembinska-Kiec, A. (2009). Evidence for potential functionality of nuclearly-encoded humanin isoforms. Genomics 94, 247–256.

Boettiger, A.N., Bintu, B., Moffitt, J.R., Wang, S., Beliveau, B.J., Fudenberg, G., Imakaev, M., Mirny, L.A., Wu, C.-T., and Zhuang, X. (2016). Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418–422.

Borodina, T., Adjaye, J., and Sultan, M. (2011). A strand-specific library preparation protocol for RNA sequencing. Meth. Enzymol. 500, 79–98.

Boros, J., Arnoult, N., Stroobant, V., Collet, J.-F., and Decottignies, A. (2014). Polycomb repressive complex 2 and H3K27me3 cooperate with H3K9 methylation to maintain heterochromatin protein 1α at chromatin. Mol. Cell. Biol. 34, 3662–3674.

Bulut-Karslioglu, A., La Rosa-Velázquez, De, I.A., Ramirez, F., Barenboim, M., Onishi-Seebacher, M., Arand, J., Galán, C., Winter, G.E., Engist, B., Gerle, B., et al. (2014). Suv39h-dependent H3K9me3 marks intact retrotransposons and silences LINE elements in mouse embryonic stem cells. Mol Cell 55, 277–290.

Buratti, E., Dörk, T., Zuccato, E., Pagani, F., Romano, M., and Baralle, F.E. (2001). Nuclear factor TDP-43 and SR proteins promote in vitro and in vivo CFTR exon 9 skipping. The EMBO Journal 20, 1774–1784.

Buratti, E., Brindisi, A., Giombi, M., Tisminetzky, S., Ayala, Y.M., and Baralle, F.E. (2005). TDP-43 binds heterogeneous nuclear ribonucleoprotein A/B through its C-terminal tail: an important region for the inhibition of cystic fibrosis transmembrane conductance regulator exon 9 splicing. J Biol Chem 280, 37572–37584.

Burattini, S., Ferri, P., Battistelli, M., Curci, R., Luchetti, F., and Falcieri, E. (2004). C2C12 murine myoblasts as a model of skeletal muscle development: morpho-functional characterization. Eur J Histochem 48, 223–233.

Burd, C.G., and Dreyfuss, G. (1994). RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. The EMBO Journal 13, 1197–1204.

Burd, C.G., Swanson, M.S., Görlach, M., and Dreyfuss, G. (1989). Primary structures of the heterogeneous nuclear ribonucleoprotein A2, B1, and C2 proteins: a diversity of RNA binding proteins is generated by small peptide inserts. Proc Natl Acad Sci USA 86, 9788–9792.

Busch, A., and Hertel, K.J. (2012). Evolution of SR protein and hnRNP splicing regulatory factors. WIREs RNA 3, 1–12.

Busch, A., Richter, A.S., and Backofen, R. (2008). IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics 24, 2849–2856.

Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25, 1915–1927.

Cao, R., and Zhang, Y. (2004). SUZ12 Is Required for Both the Histone Methyltransferase Activity and the Silencing Function of the EED-EZH2 Complex. Mol Cell 15, 57–67.

Cesana, M., Cacchiarelli, D., Legnini, I., Santini, T., Sthandier, O., Chinappi, M., Tramontano, A., and Bozzoni, I. (2011). A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147, 358–369.

Chow, J.C., Ciaudo, C., Fazzari, M.J., Mise, N., Servant, N., Glass, J.L., Attreed, M., Avner, P., Wutz, A., Barillot, E., et al. (2010). LINE-1 activity in facultative heterochromatin formation during X chromosome inactivation. Cell 141, 956–969.

Chu, C., Qu, K., Zhong, F.L., Artandi, S.E., and Chang, H.Y. (2011). Genomic Maps of Long Noncoding RNA Occupancy Reveal Principles of RNA-Chromatin Interactions. Mol Cell 44, 667–678.

Chu, C., Zhang, Q.C., da Rocha, S.T., Flynn, R.A., Bharadwaj, M., Calabrese, J.M., Magnuson, T., Heard, E., and Chang, H.Y. (2015). Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416.

Ciferri, C., Lander, G.C., Maiolica, A., Herzog, F., Aebersold, R., and Nogales, E. (2012). Molecular architecture of human polycomb repressive complex 2. Elife 1, e00005.

Cifuentes-Rojas, C., Hernandez, A.J., Sarma, K., and Lee, J.T. (2014). Regulatory interactions between RNA and polycomb repressive complex 2. Mol Cell 55, 171–185.

Crawford, G.E., Holt, I.E., Whittle, J., Webb, B.D., Tai, D., Davis, S., Margulies, E.H., Chen, Y., Bernat, J.A., Ginsburg, D., et al. (2006). Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16, 123–131.

da Rocha, S.T., Boeva, V., Escamilla-Del-Arenal, M., Ancelin, K., Granier, C., Matias, N.R., Sanulli, S., Chow, J., Schulz, E., Picard, C., et al. (2014). Jarid2 Is Implicated in the Initial Xist-Induced Targeting of PRC2 to the Inactive X Chromosome. Mol Cell 53, 301–316.

Darzacq, X., Jády, B.E., Verheggen, C., Kiss, A.M., Bertrand, E., and Kiss, T. (2002). Cajal body-specific small nuclear RNAs: a novel class of 2'-O-methylation and pseudouridylation guide RNAs. The EMBO Journal 21, 2746–2756.

Davidovich, C., Wang, X., Cifuentes-Rojas, C., Goodrich, K.J., Gooding, A.R., Lee, J.T., and Cech, T.R. (2015). Toward a consensus on the binding specificity and promiscuity of PRC2 for RNA. Mol Cell 57, 552–558.

Davidovich, C., Zheng, L., Goodrich, K.J., and Cech, T.R. (2013). Promiscuous RNA binding by Polycomb repressive complex 2. Nat Struct Mol Biol 20, 1250–1257.

Dayama, G., Emery, S.B., Kidd, J.M., and Mills, R.E. (2014). The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Research 42, 12640–12649.

de Klerk, E., Fokkema, I.F.A.C., Thiadens, K.A.M.H., Goeman, J.J., Palmblad, M., Dunnen, den, J.T., Lindern, von, M., and 't Hoen, P.A.C. (2015). Assessing the translational landscape of myogenic differentiation by ribosome profiling. Nucleic Acids Research 43, 4408–4428.

De Raedt, T., Beert, E., Pasmant, E., Luscan, A., Brems, H., Ortonne, N., Helin, K., Hornick, J.L., Mautner, V., Kehrer-Sawatzki, H., et al. (2014). PRC2 loss amplifies Ras-driven transcription and confers sensitivity to BRD4-based therapies. Nature 514, 247–251.

Denisenko, O.N., and Bomsztyk, K. (1997). The product of the murine homolog of the Drosophila extra sex combs gene displays transcriptional repressor activity. Mol. Cell. Biol. 17, 4707–4717.

Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D.G., et al. (2012). The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res 22, 1775–1789.

Ding, J., Hayashi, M.K., Zhang, Y., Manche, L., Krainer, A.R., and Xu, R.M. (1999). Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 13, 1102–1115.

Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of transcription in human cells. Nature 488, 101–108.

Dreyfuss, G., Kim, V.N., and Kataoka, N. (2002). Messenger-RNA-binding proteins and the messages they carry. Nat. Rev. Mol. Cell Biol. 3, 195–205.

Dunham, I., Aldred, S.F., Davis, C.A., Doyle, F., Harrow, J., Pauli, F., Rosenbloom, K.R., Sabo, P., Safi, A., Simon, J.M., et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.

D’Ambrogio, A., Buratti, E., Stuani, C., Guarnaccia, C., Romano, M., Ayala, Y.M., and Baralle, F.E. (2009). Functional mapping of the interaction between TDP-43 and hnRNP A2 in vivo. Nucleic Acids Research 37, 4116–4126.

Elisaphenko, E.A., Kolesnikov, N.N., Shevchenko, A.I., Rogozin, I.B., Nesterova, T.B., Brockdorff, N., and Zakian, S.M. (2008). A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements. PLoS ONE 3, e2521.

Engreitz, J.M., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K., Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., et al. (2013). The Xist lncRNA Exploits Three-Dimensional Genome Architecture to Spread Across the X Chromosome. Science 341, 1237973–1237973.

Fan, X., Messaed, C., Dion, P., Laganiere, J., Brais, B., Karpati, G., and Rouleau, G.A. (2014). hnRNP A1 and A/B Interaction with PABPN1 in Oculopharyngeal Muscular Dystrophy. Canadian Journal of Neurological Sciences / Journal Canadien Des Sciences Neurologiques 30, 244–251.

Fatimy, El, R., Davidovic, L., Tremblay, S., Jaglin, X., Dury, A., Robert, C., De Koninck, P., and Khandjian, E.W. (2016). Tracking the Fragile X Mental Retardation Protein in a Highly Ordered Neuronal RiboNucleoParticles Population: A Link between Stalled Polyribosomes and RNA Granules. PLoS Genet. 12, e1006192.

Flynn, R.A., Martin, L., Spitale, R.C., Do, B.T., Sagan, S.M., Zarnegar, B., Qu, K., Khavari, P.A., Quake, S.R., Sarnow, P., et al. (2014). Dissecting noncoding and pathogen RNA–protein interactomes. Rna 21, 135–143.

Franke, A., DeCamillis, M., Zink, D., Cheng, N., Brock, H.W., and Paro, R. (1992). Polycomb and polyhomeotic are constituents of a multimeric protein complex in chromatin of Drosophila melanogaster. The EMBO Journal 11, 2941–2950.

Geissler, R., Simkin, A., Floss, D., Patel, R., Fogarty, E.A., Scheller, J., and Grimson, A. (2016). A widespread sequence-specific mRNA decay pathway mediated by hnRNPs A1 and A2/B1. Genes Dev 30, 1070–1085.

Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.-K., Cheng, C., Mu, X.J., Khurana, E., Rozowsky, J., Alexander, R., et al. (2012). Architecture of the human regulatory network derived from ENCODE data. Nature 488, 91–100.

Glazko, G.V., Zybailov, B.L., and Rogozin, I.B. (2012). Computational prediction of polycomb-associated long non-coding RNAs. PLoS ONE 7, e44878.

Goodarzi, H., Najafabadi, H.S., Oikonomou, P., Greco, T.M., Fish, L., Salavati, R., Cristea, I.M., and Tavazoie, S. (2013). Systematic discovery of structural elements governing stability of mammalian messenger RNAs. Nature 485, 264–268.

Gueroussov, S., Weatheritt, R.J., O'Hanlon, D., Lin, Z.-Y., Narula, A., Gingras, A.-C., and Blencowe, B.J. (2017). Regulatory Expansion in Mammals of Multivalent hnRNP Assemblies that Globally Control Alternative Splicing. Cell 170, 324–339.e23.

Guo, F., Jiao, F., Song, Z., Li, S., Liu, B., Yang, H., Zhou, Q., and Li, Z. (2015). Regulation of MALAT1 expression by TDP43 controls the migration and invasion of non-small cell lung cancer cells in vitro. Biochem. Biophys. Res. Commun. 465, 293–298.

Gupta, R.A., Shah, N., Wang, K.C., Kim, J., Horlings, H.M., Wong, D.J., Tsai, M.-C., Hung, T., Argani, P., Rinn, J.L., et al. (2011). Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076.

Guttman, M., Donaghey, J., Carey, B.W., Garber, M., Grenier, J.K., Munson, G., Young, G., Lucas, A.B., Ach, R., Bruhn, L., et al. (2012). lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300.

Han, H., Braunschweig, U., Gonatopoulos-Pournatzis, T., Weatheritt, R.J., Hirsch, C.L., Ha, K.C.H., Radovani, E., Nabeel-Shah, S., Sterne-Weiler, T., Wang, J., et al. (2017). Multilayered Control of Alternative Splicing Regulatory Networks by Transcription Factors. Mol Cell 65, 539–553.e7.

Harrow, J., Denoeud, F., Frankish, A., Reymond, A., Chen, C.-K., Chrast, J., Lagarde, J., Gilbert, J.G.R., Storey, R., Swarbreck, D., et al. (2006). GENCODE: producing a reference annotation for ENCODE. Genome Biol 7 Suppl 1, S4.1–S4.9.

Hasegawa, Y., Brockdorff, N., Kawano, S., Tsutui, K., Tsutui, K., and Nakagawa, S. (2010). The Matrix Protein hnRNP U Is Required for Chromosomal Localization of Xist RNA. Developmental Cell 19, 469–476.

Hatfield, J.T., Rothnagel, J.A., and Smith, R. (2002). Characterization of the mouse hnRNP A2/B1/B0 gene and identification of processed pseudogenes. Gene 295, 33–42.

He, Y., Selvaraju, S., Curtin, M.L., Jakob, C.G., Zhu, H., Comess, K.M., Shaw, B., The, J., Lima-Fernandes, E., Szewczyk, M.M., et al. (2017). The EED protein-protein interaction inhibitor A-395 inactivates the PRC2 complex. Nat. Chem. Biol. 13, 389–395.

Huelga, S.C., Vu, A.Q., Arnold, J.D., Liang, T.Y., Liu, P.P., Yan, B.Y., Donohue, J.P., Shiue, L., Hoon, S., Brenner, S., et al. (2012). Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep 1, 167–178.

Huppertz, I., Attig, J., D’Ambrogio, A., Easton, L.E., Sibley, C.R., Sugimoto, Y., Tajnik, M., König, J., and Ule, J. (2014). iCLIP: protein-RNA interactions at nucleotide resolution. Methods 65, 274–287.

Ji, Z., Song, R., Huang, H., Regev, A., and Struhl, K. (2016). Transcriptome-scale RNase-footprinting of RNA-protein complexes. Nature Biotechnology 34, 410–413.

Jiang, J., Jing, Y., Cost, G.J., Chiang, J.-C., Kolpa, H.J., Cotton, A.M., Carone, D.M., Carone, B.R., Shivak, D.A., Guschin, D.Y., et al. (2013). Translating dosage compensation to trisomy 21. Nature 500, 296–300.

Johnson, R., and Guigó, R. (2014). The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. Rna 20, 959–976.

Jürgens, G. (1985). A group of genes controlling the spatial expression of the bithorax complex in Drosophila. Nature 316, 153–155.

Kamma, H., Horiguchi, H., Wan, L., Matsui, M., Fujiwara, M., Fujimoto, M., Yazawa, T., and Dreyfuss, G. (1999). Molecular characterization of the hnRNP A2/B1 proteins: tissue-specific expression and novel isoforms. Exp. Cell Res. 246, 399–411.

Kaneko, S., Bonasio, R., Saldaña-Meyer, R., Yoshida, T., Son, J., Nishino, K., Umezawa, A., and Reinberg, D. (2014a). Interactions between JARID2 and noncoding RNAs regulate PRC2 recruitment to chromatin. Mol Cell 53, 290–300.

Kaneko, S., Li, G., Son, J., Xu, C.-F., Margueron, R., Neubert, T.A., and Reinberg, D. (2010). Phosphorylation of the PRC2 component Ezh2 is cell cycle-regulated and up-regulates its binding to ncRNA. Genes Dev 24, 2615–2620.

Kaneko, S., Son, J., Bonasio, R., Shen, S.S., and Reinberg, D. (2014b). Nascent RNA interaction keeps PRC2 activity poised and in check. Genes Dev 28, 1983–1988.

Kaneko, S., Son, J., Shen, S.S., Reinberg, D., and Bonasio, R. (2013). PRC2 binds active promoters and contacts nascent RNAs in embryonic stem cells. Nat Struct Mol Biol 20, 1258–1264.

Kapusta, A., and Feschotte, C. (2014). Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications. Trends in Genetics 30, 439–452.

Kapusta, A., Kronenberg, Z., Lynch, V.J., Zhuo, X., Ramsay, L., Bourque, G., Yandell, M., and Feschotte, C. (2013). Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs. PLoS Genet. 9, e1003470.

Katsushima, K., Natsume, A., Ohka, F., Shinjo, K., Hatanaka, A., Ichimura, N., Sato, S., Takahashi, S., Kimura, H., Totoki, Y., et al. (2016). Targeting the Notch-regulated non-coding RNA TUG1 for glioma treatment. Nature Communications 7, 13616.

Kelley, D., and Rinn, J. (2012). Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol 13, R107.

Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., Dekker, J., et al. (2014). Defining functional DNA elements in the human genome. Proc Natl Acad Sci USA 111, 6131–6138.

Kent, W.J., Zweig, A.S., Barber, G., Hinrichs, A.S., and Karolchik, D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207.

Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA 106, 11667–11672.

Khong, A., Kerr, C.H., Yeung, C.H.L., Keatings, K., Nayak, A., Allan, D.W., and Jan, E. (2017). Disruption of Stress Granule Formation by the Multifunctional Cricket Paralysis Virus 1A Protein. J. Virol. 91, e01779–16.

Kim, H., Kang, K., and Kim, J. (2009). AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic Acids Research 37, 2940–2950.

Kim, H.J., Kim, N.C., Wang, Y.-D., Scarborough, E.A., Moore, J., Diaz, Z., MacLea, K.S., Freibaum, B., Li, S., Molliex, A., et al. (2013a). Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS. Nature 495, 467–473.

Kim, T.-G., Kraus, J.C., Chen, J., and Lee, Y. (2003). JUMONJI, a critical factor for cardiac development, functions as a transcriptional repressor. J Biol Chem 278, 42247–42255.

Kim, W., Bird, G.H., Neff, T., Guo, G., Kerenyi, M.A., Walensky, L.D., and Orkin, S.H. (2013b). Targeted disruption of the EZH2-EED complex inhibits EZH2-dependent cancer. Nat. Chem. Biol. 9, 643–650.

Kino, T., Hurt, D.E., Ichijo, T., Nader, N., and Chrousos, G.P. (2010). Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3, ra8.

Kirmizis, A., Bartley, S.M., Kuzmichev, A., Margueron, R., Reinberg, D., Green, R., and Farnham, P.J. (2004). Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27. Genes Dev 18, 1592–1605.

Kumar, A., and Wilson, S.H. (1990). Studies of the strand-annealing activity of mammalian hnRNP complex protein A1. Biochemistry 29, 10717–10722.

Kumari, D., and Usdin, K. (2014). Polycomb group complexes are recruited to reactivated FMR1 alleles in Fragile X syndrome in response to FMR1 transcription. Hum Mol Genet 23, 6575–6583.

Kundu, S., Ji, F., Sunwoo, H., Jain, G., Lee, J.T., Sadreyev, R.I., Dekker, J., and Kingston, R.E. (2017). Polycomb Repressive Complex 1 Generates Discrete Compacted Domains that Change during Differentiation. Mol Cell 65, 432–446.e435.

Kung, J.T., Kesner, B., An, J.Y., Ahn, J.Y., Cifuentes-Rojas, C., Colognori, D., Jeon, Y., Szanto, A., del Rosario, B.C., Pinter, S.F., et al. (2015). Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Mol Cell 57, 361–375.

Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P., and Reinberg, D. (2002). Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev 16, 2893–2905.

la Cruz, de, C.C., Kirmizis, A., Simon, M.D., Isono, K.-I., Koseki, H., and Panning, B. (2007). The Polycomb Group Protein SUZ12 regulates histone H3 lysine 9 methylation and HP1α distribution. Chromosome Res 15, 299–314.

Le Ber, I., Van Bortel, I., Nicolas, G., Bouya-Ahmed, K., Camuzat, A., Wallon, D., De Septenville, A., Latouche, M., Lattante, S., Kabashi, E., et al. (2014). hnRNPA2B1 and hnRNPA1 mutations are rare in patients with “multisystem proteinopathy” and frontotemporal lobar degeneration phenotypes. Neurobiology of Aging 35, 934.e5–934.e6.

Le Thomas, A., Rogers, A.K., Webster, A., Marinov, G.K., Liao, S.E., Perkins, E.M., Hur, J.K., Aravin, A.A., and Toth, K.F. (2013). Piwi induces piRNA-guided transcriptional silencing and establishment of a repressive chromatin state. Genes Dev 27, 390–399.

Lee, S., Kopp, F., Chang, T.-C., Sataluri, A., Chen, B., Sivakumar, S., Yu, H., Xie, Y., and Mendell, J.T. (2016). Noncoding RNA NORAD Regulates Genomic Stability by Sequestering PUMILIO Proteins. Cell 164, 69–80.

Levine, S.S., Weiss, A., Erdjument-Bromage, H., Shao, Z., Tempst, P., and Kingston, R.E. (2002). The core of the polycomb repressive complex is compositionally and functionally conserved in flies and humans. Mol. Cell. Biol. 22, 6070–6078.

Lewis, E.B. (1978). A gene complex controlling segmentation in Drosophila. Nature 276, 565–570.

Lewis, P.H. (1947). Pc: Polycomb (Drosophila Information Service).

Lewis, P.W., Müller, M.M., Koletsky, M.S., Cordero, F., Lin, S., Banaszynski, L.A., Garcia, B.A., Muir, T.W., Becher, O.J., and Allis, C.D. (2013). Inhibition of PRC2 activity by a gain-of-function H3 mutation found in pediatric glioblastoma. Science 340, 857–861.

Li, G., Margueron, R., Ku, M., Chambon, P., and Reinberg, D. (2010). Jarid2 and PRC2, partners in regulating gene expression. Genes Dev 24, 368–380.

Li, Q., Peterson, K.R., Fang, X., and Stamatoyannopoulos, G. (2002). Locus control regions. Blood 100, 3077–3086.

Li, S., Zhang, P., Freibaum, B.D., Kim, N.C., Kolaitis, R.-M., Molliex, A., Kanagaraj, A.P., Yabe, I., Tanino, M., Tanaka, S., et al. (2016a). Genetic interaction of hnRNPA2B1 and DNAJB6 in a Drosophila model of multisystem proteinopathy. Hum Mol Genet 25, 936–950.

Li, T., Hu, J.-F., Qiu, X., Ling, J., Chen, H., Wang, S., Hou, A., Vu, T.H., and Hoffman, A.R. (2008). CTCF regulates allelic expression of Igf2 by orchestrating a promoter-polycomb repressive complex 2 intrachromosomal loop. Mol. Cell. Biol. 28, 6473–6482.

Li, W., Notani, D., Ma, Q., Tanasa, B., Nunez, E., Chen, A.Y., Merkurjev, D., Zhang, J., Ohgi, K., Song, X., et al. (2013). Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516–520.

Li, Z., Shen, J., Chan, M.T.V., and Wu, W.K.K. (2016b). TUG1: a pivotal oncogenic long non-coding RNA of human cancers. Cell Prolif. 49, 471–475.

Li, Z., Chao, T.-C., Chang, K.-Y., Lin, N., Patil, V.S., Shimizu, C., Head, S.R., Burns, J.C., and Rana, T.M. (2014). The long noncoding RNA THRIL regulates TNFα expression through its interaction with hnRNPL. Proc Natl Acad Sci USA 111, 1002–1007.

Lin, J.Y., Shih, S.R., Pan, M., Li, C., Lue, C.F., Stollar, V., and Li, M.L. (2009). hnRNP A1 Interacts with the 5' Untranslated Regions of Enterovirus 71 and Sindbis Virus RNA and Is Required for Viral Replication. J. Virol. 83, 6106–6114.

Linder, B., Grozhik, A.V., Olarerin-George, A.O., Meydan, C., Mason, C.E., and Jaffrey, S.R. (2015). Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods 12, 767–772.

Liu, F., Somarowthu, S., and Pyle, A.M. (2017a). Visualizing the secondary and tertiary architectural domains of lncRNA RepA. Nat. Chem. Biol. 13, 282–289.

Liu, N., Dai, Q., Zheng, G., He, C., Parisien, M., and Pan, T. (2015). N6-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature 518, 560–564.

Liu, T.-Y., Chen, Y.-C., Jong, Y.-J., Tsai, H.-J., Lee, C.-C., Chang, Y.-S., Chang, J.-G., and Chang, Y.-F. (2017b). Muscle developmental defects in heterogeneous nuclear Ribonucleoprotein A1 knockout mice. Open Biol. 7, 160303–160312.

Lovci, M.T., Ghanem, D., Marr, H., Arnold, J., Gee, S., Parra, M., Liang, T.Y., Stark, T.J., Gehman, L.T., Hoon, S., et al. (2013). Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat Struct Mol Biol 20, 1434–1442.

Maji, S.K., Perrin, M.H., Sawaya, M.R., Jessberger, S., Vadodaria, K., Rissman, R.A., Singru, P.S., Nilsson, K.P.R., Simon, R., Schubert, D., et al. (2009). Functional amyloids as natural storage of peptide hormones in pituitary secretory granules. Science 325, 328–332.

Margueron, R., Justin, N., Ohno, K., Sharpe, M.L., Son, J., DruryIII, W.J., Voigt, P., Martin, S.R., Taylor, W.R., De Marco, V., et al. (2009). Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461, 762–767.

Margueron, R., Li, G., Sarma, K., Blais, A., Zavadil, J., Woodcock, C.L., Dynlacht, B.D., and Reinberg, D. (2008). Ezh1 and Ezh2 maintain repressive chromatin through different mechanisms. Mol Cell 32, 503–518.

Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal 17, 10.

Martinez, F.J., Pratt, G.A., Van Nostrand, E.L., Batra, R., Huelga, S.C., Kapeli, K., Freese, P., Chun, S.J., Ling, K., Gelboin-Burkhart, C., et al. (2016). Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System. Neuron 92, 780–795.

Maximov, V., Martynenko, A., Hunsmann, G., and Tarantul, V. (2002). Mitochondrial 16S rRNA gene encodes a functional peptide, a potential drug for Alzheimer’s disease and target for cancer therapy. Medical Hypotheses 59, 670–673.

Mayeda, A., and Krainer, A.R. (1992). Regulation of alternative pre-mRNA splicing by hnRNP A1 and splicing factor SF2. Cell 68, 365–375.

Mayeda, A., Munroe, S.H., Caceres, J.F., and Krainer, A.R. (1994). Function of conserved domains of hnRNP A1 and other hnRNP A/B proteins. The EMBO Journal 13, 5483–5495.

McHugh, C.A., Chen, C.-K., Chow, A., Surka, C.F., Tran, C., McDonel, P., Pandya-Jones, A., Blanco, M., Burghard, C., Moradian, A., et al. (2015). The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 521, 232–236.

McKay, S.J., and Cooke, H. (1992). hnRNP A2/B1 binds specifically to single stranded vertebrate telomeric repeat TTAGGGn. Nucleic Acids Research 20, 6461–6464.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303.

Mendenhall, E.M., Koche, R.P., Truong, T., Zhou, V.W., Issac, B., Chi, A.S., Ku, M., and Bernstein, B.E. (2010). GC-Rich Sequence Elements Recruit PRC2 in Mammalian ES Cells. PLoS Genet. 6, e1001244.

Meredith, E.K., Balas, M.M., Sindy, K., Haislop, K., and Johnson, A.M. (2016). An RNA matchmaker protein regulates the activity of the long noncoding RNA HOTAIR. Rna 22, 995–1010.

Mi, H., Huang, X., Muruganujan, A., Tang, H., Mills, C., Kang, D., and Thomas, P.D. (2017). PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Research 45, D183–D189.

Minajigi, A., Froberg, J.E., Wei, C., Sunwoo, H., Kesner, B., Colognori, D., Lessing, D., Payer, B., Boukhali, M., Haas, W., et al. (2015). A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349, aab2276–aab2276.

Mishra, R.K., Mihaly, J., Barges, S., Spierer, A., Karch, F., Hagstrom, K., Schweinsberg, S.E., and Schedl, P. (2001). The iab-7 polycomb response element maps to a nucleosome-free region of chromatin and requires both GAGA and pleiohomeotic for silencing activity. Mol. Cell. Biol. 21, 1311–1318.

Mohagheghi, F., Prudencio, M., Stuani, C., Cook, C., Jansen-West, K., Dickson, D.W., Petrucelli, L., and Buratti, E. (2016). TDP-43 functions within a network of hnRNP proteins to inhibit the production of a truncated human SORT1 receptor. Hum Mol Genet 25, 534–545.

Moore, M.J., Zhang, C., Gantman, E.C., Mele, A., Darnell, J.C., and Darnell, R.B. (2014). Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat Protoc 9, 263–293.

Moran-Jones, K., Wayman, L., Kennedy, D.D., Reddel, R.R., Sara, S., Snee, M.J., and Smith, R. (2005). hnRNP A2, a potential ssDNA/RNA molecular adapter at the telomere. Nucleic Acids Research 33, 486–496.

Mozzetta, C., Pontis, J., Fritsch, L., Robin, P., Portoso, M., Proux, C., Margueron, R., and Ait-Si-Ali, S. (2014). The histone H3 lysine 9 methyltransferases G9a and GLP regulate polycomb repressive complex 2-mediated gene silencing. Mol Cell 53, 277–289.

Mulholland, N.M., King, I.F.G., and Kingston, R.E. (2003). Regulation of Polycomb group complexes by the sequence-specific DNA binding proteins Zeste and GAGA. Genes Dev 17, 2741–2746.

Munroe, S.H., and Dong, X.F. (1992). Heterogeneous nuclear ribonucleoprotein A1 catalyzes RNA.RNA annealing. Proc Natl Acad Sci USA 89, 895–899.

Murzina, N.V., Pei, X.-Y., Zhang, W., Sparkes, M., Vicente-Garcia, J., Pratap, J.V., McLaughlin, S.H., Ben-Shahar, T.R., Verreault, A., Luisi, B.F., et al. (2008). Structural Basis for the Recognition of Histone H4 by the Histone-Chaperone RbAp46. Structure 16, 1077–1085.

Musselman, C.A., Avvakumov, N., Watanabe, R., Abraham, C.G., Lalonde, M.-E., Hong, Z., Allen, C., Roy, S., Nuñez, J.K., Nickoloff, J., et al. (2012). Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nat Struct Mol Biol 19, 1266–1272.

Mysliwiec, M.R., Bresnick, E.H., and Lee, Y. (2011). Endothelial Jarid2/Jumonji Is Required for Normal Cardiac Development and Proper Notch1 Expression. Journal of Biological Chemistry 286, 17193–17204.

Necsulea, A., Soumillon, M., Warnefors, M., Liechti, A., Daish, T., Zeller, U., Baker, J.C., Grützner, F., and Kaessmann, H. (2014). The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640.

Neff, T., Sinha, A.U., Kluk, M.J., Zhu, N., Khattab, M.H., Stein, L., Xie, H., Orkin, S.H., and Armstrong, S.A. (2012). Polycomb repressive complex 2 is required for MLL-AF9 leukemia. Proc Natl Acad Sci USA 109, 5028–5033.

Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman, R.E., John, S., Sandstrom, R., Johnson, A.K., et al. (2012). An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 488, 83–90.

Neumann, M., Sampathu, D.M., Kwong, L.K., Truax, A.C., Micsenyi, M.C., Chou, T.T., Bruce, J., Schuck, T., Grossman, M., Clark, C.M., et al. (2006). Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Science 314, 130–133.

Ng, S.-Y., Bogu, G.K., Soh, B.S., and Stanton, L.W. (2013). The long noncoding RNA RMST interacts with SOX2 to regulate neurogenesis. Mol Cell 51, 349–359.

Ng, S.-Y., Johnson, R., and Stanton, L.W. (2011). Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. The EMBO Journal 31, 522–533.

Noordermeer, D., and de Laat, W. (2008). Joining the loops: beta-globin gene regulation. IUBMB Life 60, 824–833.

Ou, S.H., Wu, F., Harrich, D., García-Martínez, L.F., and Gaynor, R.B. (1995). Cloning and characterization of a novel cellular protein, TDP-43, that binds to human immunodeficiency virus type 1 TAR DNA sequence motifs. J. Virol. 69, 3584–3596.

Palhais, B., Dembic, M., Sabaratnam, R., Nielsen, K.S., Doktor, T.K., Bruun, G.H., and Andresen, B.S. (2016). The prevalent deep intronic c. 639+919 G>A GLA mutation causes pseudoexon activation and Fabry disease by abolishing the binding of hnRNPA1 and hnRNP A2/B1 to a splicing silencer. Molecular Genetics and Metabolism 119, 258–269.

Pandey, R.R., Mondal, T., Mohammad, F., Enroth, S., Redrup, L., Komorowski, J., Nagano, T., Mancini-DiNardo, D., and Kanduri, C. (2008). Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32, 232–246.

Pasini, D., Cloos, P.A.C., Walfridsson, J., Olsson, L., Bukowski, J.-P., Johansen, J.V., Bak, M., Tommerup, N., Rappsilber, J., and Helin, K. (2010). JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306–310.

Patil, D.P., Chen, C.-K., Pickering, B.F., Chow, A., Jackson, C., Guttman, M., and Jaffrey, S.R. (2016). m(6)A RNA methylation promotes XIST-mediated transcriptional repression. Nature 537, 369–373.

Peng, J.C., Valouev, A., Swigut, T., Zhang, J., Zhao, Y., Sidow, A., and Wysocka, J. (2009). Jarid2/Jumonji Coordinates Control of PRC2 Enzymatic Activity and Target Gene Occupancy in Pluripotent Cells. Cell 139, 1290–1302.

Penzkofer, T. (2004). L1Base: from functional annotation to prediction of active LINE-1 elements. Nucleic Acids Research 33, D498–D500.

Polymenidou, M., Lagier-Tourenne, C., Hutt, K.R., Huelga, S.C., Moran, J., Liang, T.Y., Ling, S.-C., Sun, E., Wancewicz, E., Mazur, C., et al. (2011). Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43. Nature Neuroscience 14, 459–468.

Qi, W., Zhao, K., Gu, J., Huang, Y., Wang, Y., Zhang, H., Zhang, M., Zhang, J., Yu, Z., Li, L., et al. (2017). An allosteric PRC2 inhibitor targeting the H3K27me3 binding pocket of EED. Nat. Chem. Biol. 13, 381–388.

Quinn, J.J., Ilik, I.A., Qu, K., Georgiev, P., Chu, C., Akhtar, A., and Chang, H.Y. (2014). Revealing long noncoding RNA architecture and functions using domain-specific chromatin isolation by RNA purification. Nature Biotechnology 32, 933–940.

Ramos, A., Barbena, E., Mateiu, L., del Mar González, M., Mairal, Q., Lima, M., Montiel, R., Aluja, M.P., and Santos, C. (2011). Nuclear insertions of mitochondrial origin: Database updating and usefulness in cancer studies. Mitochondrion 11, 946–953.

Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007). Functional Demarcation of Active and Silent Chromatin Domains in Human HOX Loci by Noncoding RNAs. Cell 129, 1311–1323.

Romano, M., Buratti, E., Romano, G., Klima, R., Del Bel Belluz, L., Stuani, C., Baralle, F., and Feiguin, F. (2014). Evolutionarily conserved heterogeneous nuclear ribonucleoprotein (hnRNP) A/B proteins functionally interact with human and Drosophila TAR DNA-binding protein 43 (TDP-43). Journal of Biological Chemistry 289, 7121–7130.

Rothbart, S.B., Dickson, B.M., Raab, J.R., Grzybowski, A.T., Krajewski, K., Guo, A.H., Shanle, E.K., Josefowicz, S.Z., Fuchs, S.M., Allis, C.D., et al. (2015). An Interactive Database for the Assessment of Histone Antibody Specificity. Mol Cell 59, 502–511.

Sadic, D., Schmidt, K., Groh, S., Kondofersky, I., Ellwart, J., Fuchs, C., Theis, F.J., and Schotta, G. (2015). Atrx promotes heterochromatin formation at retrotransposons. EMBO Rep. 16, 836–850.

Sanulli, S., Justin, N., Teissandier, A., Ancelin, K., Portoso, M., Caron, M., Michaud, A., Lombard, B., da Rocha, S.T., Offer, J., et al. (2015). Jarid2 Methylation via the PRC2 Complex Regulates H3K27me3 Deposition during Cell Differentiation. Mol Cell 57, 769–783.

Sarma, K., Cifuentes-Rojas, C., Ergun, A., del Rosario, A., Jeon, Y., White, F., Sadreyev, R., and Lee, J.T. (2014). ATRX Directs Binding of PRC2 to Xist RNA and Polycomb Targets. Cell 159, 869–883.

Schimmelmann, von, M., Feinberg, P.A., Sullivan, J.M., Ku, S.M., Badimon, A., Duff, M.K., Wang, Z., Lachmann, A., Dewell, S., Ma'ayan, A., et al. (2016). Polycomb repressive complex 2 (PRC2) silences genes responsible for neurodegeneration. Nature Neuroscience 19, 1321–1330.

Schmitges, F.W., Prusty, A.B., Faty, M., Stützer, A., Lingaraju, G.M., Aiwazian, J., Sack, R., Hess, D., Li, L., Zhou, S., et al. (2011). Histone methylation by PRC2 is inhibited by active chromatin marks. Mol Cell 42, 330–341.

Schoeftner, S., Sengupta, A.K., Kubicek, S., Mechtler, K., Spahn, L., Koseki, H., Jenuwein, T., and Wutz, A. (2006). Recruitment of PRC1 function at the initiation of X inactivation independent of PRC2 and silencing. The EMBO Journal 25, 3110–3122.

Schwendemann, A., and Lehmann, M. (2002). Pipsqueak and GAGA factor act in concert as partners at homeotic and many other loci. Proc Natl Acad Sci USA 99, 12883–12888.

Seong, I.S., Woda, J.M., Song, J.J., Lloret, A., Abeyrathne, P.D., Woo, C.J., Gregory, G., Lee, J.-M., Wheeler, V.C., Walz, T., et al. (2010). Huntingtin facilitates polycomb repressive complex 2. Hum Mol Genet 19, 573–583.

Shamoo, Y., Krueger, U., Rice, L.M., Williams, K.R., and Steitz, T.A. (1997). Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 Å resolution. Nat. Struct. Biol. 4, 215–222.

Shen, X., Kim, W., Fujiwara, Y., Simon, M.D., Liu, Y., Mysliwiec, M.R., Lee, Y., and Orkin, S.H. (2009). Jumonji Modulates Polycomb Activity and Self-Renewal versus Differentiation of Stem Cells. Cell 139, 1303–1314.

Shen, X., Liu, Y., Hsu, Y.-J., Fujiwara, Y., Kim, J., Mao, X., Yuan, G.-C., and Orkin, S.H. (2008). EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Mol Cell 32, 491–502.

Simon, M.D., Pinter, S.F., Fang, R., Sarma, K., Rutenberg-Schoenberg, M., Bowman, S.K., Kesner, B.A., Maier, V.K., Kingston, R.E., and Lee, J.T. (2014). High-resolution Xist binding maps reveal two-step spreading during X-chromosome inactivation. Nature 504, 465–469.

Simon, M.D., Wang, C.I., Kharchenko, P.V., West, J.A., Chapman, B.A., Alekseyenko, A.A., Borowsky, M.L., Kuroda, M.I., and Kingston, R.E. (2011). The genomic binding sites of a noncoding RNA. Proc Natl Acad Sci USA 108, 20497–20502.

Smit, A., Hubley, R., and Green, P. RepeatMasker 4.0. http://www.repeatmasker.org.

Somarowthu, S., Legiewicz, M., Chillón, I., Marcia, M., Liu, F., and Pyle, A.M. (2015). HOTAIR Forms an Intricate and Modular Secondary Structure. Mol Cell 58, 353–361.

Son, J., Shen, S.S., Margueron, R., and Reinberg, D. (2013). Nucleosome-binding activities within JARID2 and EZH1 regulate the function of PRC2 on chromatin. Genes Dev 27, 2663–2677.

Soule, H.D., Maloney, T.M., Wolman, S.R., Peterson, W.D., Brenz, R., McGrath, C.M., Russo, J., Pauley, R.J., Jones, R.F., and Brooks, S.C. (1990). Isolation and characterization of a spontaneously immortalized human breast epithelial cell line, MCF-10. Cancer Res. 50, 6075–6086.

Steiner, G., Hartmuth, K., Skriner, K., Maurer-Fogy, I., Sinski, A., Thalmann, E., Hassfeld, W., Barta, A., and Smolen, J.S. (1992). Purification and partial sequencing of the nuclear autoantigen RA33 shows that it is indistinguishable from the A2 protein of the heterogeneous nuclear ribonucleoprotein complex. J. Clin. Invest. 90, 1061–1066.

Sun, M., Gadad, S.S., Kim, D.-S., and Kraus, W.L. (2015). Discovery, Annotation, and Functional Analysis of Long Noncoding RNAs Controlling Cell-Cycle Gene Expression and Proliferation in Breast Cancer Cells. Mol Cell 59, 698–711.

Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82.

Tichon, A., Gil, N., Lubelsky, Y., Solomon, T.H., Lemze, D., Itzkovitz, S., Stern-Ginossar, N., and Ulitsky, I. (2016). A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells. Nature Communications 7, 1–10.

Tilgner, H., Knowles, D.G., Johnson, R., Davis, C.A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T.R., and Guigo, R. (2012). Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22, 1616–1625.

Tollervey, J.R., Curk, T., Rogelj, B., Briese, M., Cereda, M., Kayikci, M., König, J., Hortobágyi, T., Nishimura, A.L., Župunski, V., et al. (2011). Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nature Neuroscience 14, 452–458.

Tomoum, H.Y., Mostafa, G.A., and El-Shahat, E.M.F. (2009). Autoantibody to heterogeneous nuclear ribonucleoprotein-A2 (RA33) in juvenile idiopathic arthritis: Clinical significance. Pediatrics International 51, 188–192.

Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111.

Tsai, M.C., Manor, O., Wan, Y., Mosammaparast, N., Wang, J.K., Lan, F., Shi, Y., Segal, E., and Chang, H.Y. (2010). Long Noncoding RNA as Modular Scaffold of Histone Modification Complexes. Science 329, 689–693.

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012). Primer3--new capabilities and interfaces. Nucleic Acids Research 40, e115–e115.

Van Nostrand, E.L., Pratt, G.A., Shishkin, A.A., Gelboin-Burkhart, C., Fang, M.Y., Sundararaman, B., Blue, S.M., Nguyen, T.B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13, 508–514.

Villegas, V.E., and Zaphiropoulos, P.G. (2015). Neighboring gene regulation by antisense long non-coding RNAs. Int J Mol Sci 16, 3251–3266.

Wang, F., Tang, M., Zeng, Z., Wu, R., and Xue, Y. (2012). Telomere- and telomerase-interacting protein that unfolds telomere G-quadruplex and promotes telomere extension in mammalian cells.

Wang, X., Lu, Z., Gomez, A., Hon, G.C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., et al. (2014). N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117–120.

Wang, X., Goodrich, K.J., Gooding, A.R., Naeem, H., Archer, S., Paucek, R.D., Youmans, D.T., Cech, T.R., and Davidovich, C. (2017). Targeting of Polycomb Repressive Complex 2 to RNA by Short Repeats of Consecutive Guanines. Mol Cell 65, 1056–1067.e5.

Washietl, S., Kellis, M., and Garber, M. (2014). Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 24, 616–628.

West, J.A., Davis, C.P., Sunwoo, H., Simon, M.D., Sadreyev, R.I., Wang, P.I., Tolstorukov, M.Y., and Kingston, R.E. (2014). The long noncoding RNAs NEAT1 and MALAT1 bind active chromatin sites. Mol Cell 55, 791–802.

Wright, D.K., Liu, S., van der Poel, C., McDonald, S.J., Brady, R.D., Taylor, L., Yang, L., Gardner, A.J., Ordidge, R., O'Brien, T.J., et al. (2016). Traumatic Brain Injury Results in Cellular, Structural and Functional Changes Resembling Motor Neuron Disease. Cereb. Cortex.

Wu, L., Murat, P., Matak-Vinkovic, D., Murrell, A., and Balasubramanian, S. (2013). Binding interactions between long noncoding RNA HOTAIR and PRC2 proteins. Biochemistry 52, 9519–9527.

Wysocka, J., Reilly, P.T., and Herr, W. (2001). Loss of HCF-1-chromatin association precedes temperature-induced growth arrest of tsBN67 cells. Mol. Cell. Biol. 21, 3820–3829.

Xie, H., Xu, J., Hsu, J.H., Nguyen, M., Fujiwara, Y., Peng, C., and Orkin, S.H. (2014). Polycomb repressive complex 2 regulates normal hematopoietic stem cell function in a developmental-stage-specific manner. Cell Stem Cell 14, 68–80.

Xu, J., Shao, Z., Li, D., Xie, H., Kim, W., Huang, J., Taylor, J.E., Pinello, L., Glass, K., Jaffe, J.D., et al. (2015). Developmental control of polycomb subunit composition by GATA factors mediates a switch to non-canonical functions. Mol Cell 57, 304–316.

Xu, R.M., Jokhan, L., Cheng, X., Mayeda, A., and Krainer, A.R. (1997). Crystal structure of human UP1, the domain of hnRNP A1 that contains two RNA-recognition motifs. Structure 5, 559–570.

Yang, N., and Kazazian, H.H. (2006). L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol 13, 763–771.

Yang, Y.W., Flynn, R.A., Chen, Y., Qu, K., Wan, B., Wang, K.C., Lei, M., and Chang, H.Y. (2014). Essential role of lncRNA binding for WDR5 maintenance of active chromatin and embryonic stem cell pluripotency. Elife 3, e02046.

Yen, Z.C., Meyer, I.M., Karalic, S., and Brown, C.J. (2007). A cross-species comparison of X-chromosome inactivation in Eutheria. Genomics 90, 453–463.

Yildirim, E., Kirby, J.E., Brown, D.E., Mercier, F.E., Sadreyev, R.I., Scadden, D.T., and Lee, J.T. (2013). Xist RNA is a potent suppressor of hematologic cancer in mice. Cell 152, 727–742.

Yu, Y., Lv, F., Liang, D., Yang, Q., Zhang, B., Lin, H., Wang, X., Qian, G., Xu, J., and You, W. (2017). HOTAIR may regulate proliferation, apoptosis, migration and invasion of MCF-7 cells through regulating the P53/Akt/JNK signaling pathway. Biomed. Pharmacother. 90, 555–561.

Zeng, P.-Y., Vakoc, C.R., Chen, Z.-C., Blobel, G.A., and Berger, S.L. (2006). In vivo dual cross-linking for identification of indirect DNA-associated proteins by chromatin immunoprecipitation. BioTechniques 41, 694–696–698.

Zhang, H., Zeitz, M.J., Wang, H., Niu, B., Ge, S., Li, W., Cui, J., Wang, G., Qian, G., Higgins, M.J., et al. (2014a). Long noncoding RNA-mediated intrachromosomal interactions promote imprinting at the Kcnq1 locus. The Journal of Cell Biology 204, 61–75.

Zhang, M., Wang, Y., Jones, S., Sausen, M., McMahon, K., Sharma, R., Wang, Q., Belzberg, A.J., Chaichana, K., Gallia, G.L., et al. (2014b). Somatic mutations of SUZ12 in malignant peripheral nerve sheath tumors. Nat. Genet. 46, 1170–1172.

Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.

Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song, J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell 40, 939–953.

Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J., and Lee, J.T. (2008). Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756.

Ørom, U.A., Derrien, T., Beringer, M., Gumireddy, K., Gardini, A., Bussotti, G., Lai, F., Zytnicki, M., Notredame, C., Huang, Q., et al. (2010). Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46–58.
