TRANSCRIPTOME-WIDE INVESTIGATION OF NUCLEAR RNA-BINDING
PROTEINS
by
ERIC NGUYEN
B.A., B.S., University of Washington, 2009
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Molecular Biology Program
2017
This thesis for the Doctor of Philosophy degree by
Eric Nguyen
has been approved for the
Molecular Biology Program
by
Arthur Gutierrez-Hartmann, Chair
Anthony Gerber
Thomas Evans
Patricia Ernst
Matthew Taylor
Aaron Johnson, Advisor
Date: December 15th, 2017
Nguyen, Eric (PhD, Molecular Biology Program)
Transcriptome-Wide Investigation of Nuclear RNA-Binding Proteins
Thesis directed by Assistant Professor Aaron M. Johnson
ABSTRACT
RNA-binding proteins play a number of important roles throughout the cell.
To investigate these proteins more closely, we have adapted high-throughput techniques to characterize their activity across the transcriptome.
We have previously identified heterogeneous nuclear ribonucleoprotein
(hnRNP) A2/B1 as a potential adaptor protein for interactions between the chromatin silencing complex PRC2 and the RNA HOTAIR. We used enhanced cross-linking immunoprecipitation (eCLIP) to map the complete set of direct interactions between hnRNP A2/B1 and RNA in breast cancer cells. Surprisingly, a strong A2/B1 binding site occurs in the third intron of HOTAIR, which interrupts a known RNA-RNA interaction hotspot and is retained at a higher frequency than other HOTAIR introns. In vitro eCLIP experiments suggest that A2/B1 may redistribute to exonic binding sites once this intron is spliced. A2/B1 associates with multiple lncRNAs at regions that may contribute to regulation. Finally, we performed cellular fractionation to characterize the pattern of RNA association of A2/B1 in chromatin, nucleoplasm, and cytoplasm.
We also examined the potential relevance of hnRNP A2/B1 in myogenesis. As
A2/B1 has been associated with a number of muscle diseases, we performed eCLIP on A2/B1 in both undifferentiated mouse myoblasts and differentiated myotubes.
We found that A2/B1 binds the 3′ UTR of transcripts in differentiated cells, and that these transcripts tended to be protein-coding. We also performed eCLIP on another
protein, TDP-43, that has been shown to interact directly with A2/B1 and to be dysregulated in many of the same diseases. This experiment identified a number of exonic binding sites in myogenesis-associated transcripts, indicative of a role for
TDP-43 in the nuclear export of long RNAs during myogenesis. Comparison of
A2/B1 and TDP-43-bound transcripts shows some overlap, suggesting that they may act cooperatively in RNA regulation during myogenesis.
Finally, we developed a novel method to investigate heterochromatin-associated RNA called hmRIP-seq, which was designed to distinguish RNA-enzyme interactions that lead to heterochromatin formation from those that do not. This method identified a potential heterochromatin-interacting noncoding RNA,
MTRNR2L12, that may direct silencing towards repetitive elements with similar sequence.
The form and content of this abstract are approved. I recommend its publication.
Approved: Aaron M. Johnson
ACKNOWLEDGEMENTS
This work would not have been possible without the assistance of many
people.
First and foremost, I would like to thank my advisor, Aaron M. Johnson, for
his invaluable advice and persistence in keeping the following projects on track as
each one progressed. His copious comments on every paper, poster, and abstract
were invariably helpful.
I would also like to thank the graduate students with whom I have worked for
the past four years, Alexis Zukowski and Maggie Balas, for their conversations and
commiserations about life in the lab, and former lab members Emily Meredith and
Karly Sindy for their invaluable mentorship and advice.
Successful scientific endeavors rarely happen without outside help. For
assistance with the eCLIP protocol I would like to thank Gabriel Pratt and Eric van
Nostrand, a chance conference encounter that became a lifeline for establishing their
protocol in our lab. For assistance in imagining new frontiers for eCLIP I would like
to thank Josh Wheeler and Tom Vogler for having the foresight to see how to include
my work in their story.
I would like to thank my thesis committee for their advice: Arthur Gutierrez-Hartmann, Tom Evans, Anthony Gerber, Patricia Ernst, and Matt Taylor. I would also like to acknowledge the contributions of former committee members Tobias Neff and David Bentley.
I would like to thank the members of the Biochemistry and Molecular
Genetics Department and Molecular Biology Program who have generously provided assistance: Nova Fong, Ryan Sheridan, Kerri York, and Monica Ransom for research
assistance; Sue Brozowski and Annie Vazquez for help navigating departmental
bureaucracy; and Sabrena Heilman, Michele Hwozdyk-Parsons, and Bob Sclafani for
running the Molecular Biology Program.
I would like to thank the Medical Scientist Training Program for their
continued support: Arthur Gutierrez-Hartmann and Angie Ribera for heading the
program; Jodi Cropper, Emily Thomas, and Katie Bidus for administrative assistance; Sally Peach, Laura Hancock, Greg Kirkpatrick, Tamara Garcia, Ariel
Hernandez, and Leon Zheng, the classmates with whom I entered this program six years ago; and Jingjing Zhang, Tom Vogler, Josh Wheeler, Dan Youmans, Matt
Becker, Kelly Higa, Taylor Soderborg, Sarah Haeger, Jason Silver, and Mindy Szeto for their continued willingness to adventure around Denver.
I would like to thank my previous mentors for helping to guide me to this career path: Bertil Hille, Willie Swanson, Pat Navas, and Jay Hesselberth.
I would like to thank my parents, Ann and Toan Nguyen, for their continued advice and support, as well as my brother, Grant Nguyen.
Last but not least, I would like to thank Charlotte Siska for being there for me at any time of day or night.
TABLE OF CONTENTS
CHAPTER
I. INTRODUCTION
Chromatin Biology
Epigenetic Modifications Have an Effect on Gene Regulation
Polycomb Proteins Form Gene Silencing Complexes
PRC2 Activity is Modulated by Protein and RNA Cofactors
PRC2 Can Bind to Many RNAs
PRC2 and Disease
Long Noncoding RNAs and Their Effects on Chromatin
Long Noncoding RNAs: A New Class of RNA
lncRNAs Can Bind to Specific Sites in the Genome
Many lncRNAs Can Bind to PRC2
PRC2-lncRNA Interactions are Implicated in Disease
Many Proteins Bind lncRNAs and Affect Chromatin State
hnRNP A2/B1: A Multi-Faceted Protein
hnRNPs Comprise a Diverse Family of Nuclear Proteins
The Structure of hnRNP A2/B1
The Potential Roles of hnRNP A2/B1
hnRNP B1 Binds HOTAIR and Its Targets with Specificity
hnRNP B1 May Act as an RNA Matchmaker on Chromatin
hnRNP A2/B1 in Disease
Functional Interactions between RNA-Binding Proteins
TDP-43 is a Splicing Regulator that Interacts with hnRNP A2/B1
Cytoplasmic TDP-43 and hnRNP A2/B1 can be Pathologic
Scope of Thesis
II. THE RNA INTERACTOME OF HNRNP A2/B1
Introduction
Materials and Methods
Lessons Learned from HITS-CLIP and iCLIP Methods
The Enhanced CLIP (eCLIP) Method
Computational Analysis of eCLIP-seq Samples
In vitro eCLIP
Cellular Fractionation of MCF7 Cells
RNA Isolation and PCR
Results
The hnRNP B1 Exon is Well-Conserved and Expressed in Mouse and Human
Binding of the lncRNA HOTAIR by hnRNP A2/B1
Binding of hnRNP A2/B1 to Long Noncoding RNAs
The hnRNP B1 Binding Profile Differs in Each Cellular Compartment
Discussion
Conservation of the hnRNP B1 Isoform
Potential Roles of hnRNP A2/B1 on HOTAIR
Interactions between A2/B1 and Additional lncRNAs
III. ROLES OF TDP-43 AND HNRNP A2/B1 IN MUSCLE DIFFERENTIATION
Introduction
Materials and Methods
C2C12 Cells are a Model System for Myogenesis
Modifications to the eCLIP Method
Results
hnRNP A2/B1 Binds to a Variety of Myogenic Genes in the Cytoplasm
TDP-43 Binds Coding Regions of RNAs in Both Myoblasts and Myotubes
Differences between TDP-43 and A2/B1 RNA Binding in Muscle
Discussion
IV. THE HISTONE MARK RIP-SEQ METHOD
Introduction
Materials and Methods
Cell Culture
Crosslinking of Cells to Improve RNA Recovery
Nuclear Lysis and Chromatin Fragmentation
RNA Immunoprecipitation
hmRIP-seq RNA and DNA Purification
Generating an hmRIP-seq Sequencing Library
Computational Analysis of hmRIP-seq Libraries
Validation of hmRIP-seq Candidates by RT-qPCR
Results
The hmRIP-seq Technique Can Detect Chromatin-Associated RNAs
Identifying Potential Histone Mark-Associated Transcripts
A Novel Heterochromatin-Associated RNA
Potential Piwi Protein-Interacting Transcripts
Discussion
V. DISCUSSION
REFERENCES
LIST OF TABLES
TABLE
3.1 Gene Ontology Analysis of TDP-43 Binding Sites in C2C12 Cells
3.2 Gene Ontology Analysis of A2/B1 Binding Sites in C2C12 Cells
LIST OF FIGURES
FIGURE
1.1 Chromatin Biology
1.2 Globin Activation
1.3 Polycomb Complex Mutations
1.4 PRC2 Subunits and Protein Cofactors
1.5 Model of Interactions Between PRC2, RNA, and Cofactors
1.6 Methods to Identify RNA-Associated DNA or Protein
1.7 Examples of RNA-PRC2 Interactions
1.8 Structure of hnRNPs A1, A2, and B1
1.9 Proposed Functions of hnRNP A2/B1
1.10 Matchmaker Model for hnRNP B1 Activity
1.11 Mechanisms of hnRNP A2/B1 Mutation in Multisystem Proteinopathy
2.1 CLIP Methods Comparison
2.2 A2/B1 Antibody Tests
2.3 eCLIP Experimental Procedure
2.4 eCLIP Computational Pipeline Flowchart
2.5 Conservation of B1-Specific Exon Across Species
2.6 A2/B1 and B1 eCLIP Results in MCF7 Cells
2.7 A2/B1 and B1 MCF7 Replicate Correlation
2.8 Investigation of a Novel A2/B1 Binding Site in HOTAIR
2.9 In vitro B1 eCLIP Binding Sites
2.10 Binding of lncRNAs by A2/B1
2.11 eCLIP of MCF7 Chromatin, Nucleoplasm, and Cytoplasm
2.12 Model of B1 Interaction with HOTAIR
3.1 TDP-43 Cytoplasmic Distribution of TDP-43 in Myotubes
3.2 Model of TDP-43 Mediated Muscle Repair
3.3 TDP-43 is Correlated With Amyloid Oligomers
3.4 eCLIP TDP-43 Procedural Tests
3.5 eCLIP C2C12 A2B1 Procedural Tests
3.6 Correlation of A2/B1 eCLIP Experimental Replicates
3.7 eCLIP hnRNP A2/B1 Summary
3.8 A2/B1 C2C12 eCLIP at lncRNAs and A2/B1 Locus
3.9 A2/B1 C2C12 eCLIP of mRNAs
3.10 Correlation of TDP-43 eCLIP Experimental Replicates
3.11 eCLIP TDP-43 Summary
3.12 eCLIP TDP-43 at Previously Studied RNAs
3.13 eCLIP TDP-43 at Myogenic Transcripts
3.14 Comparison of A2B1 and TDP43 eCLIPs
3.15 Model of Protein-RNA Binding Network
4.1 hmRIP-seq Protocol Flowchart
4.2 hmRIP-seq Tests
4.3 hmRIP-seq Profiles at lncRNAs
4.4 MTRNR2L12 hmRIP-seq and Confirmatory qPCR
4.5 MTRNR2L12 Paralogs hmRIP-seq
4.6 MTRNR2L12 paralogs relationship and H3K27me3 ChIP-qPCR
4.7 Potential Mechanism of MTRNR2L12-Mediated Paralog Silencing
4.8 Model of Potential LINE Functions
4.9 LINE mapping
ABBREVIATIONS
CHART       Capture Hybridization Analysis of RNA Targets
ChIRP       Chromatin Isolation by RNA Purification
eCLIP       Enhanced CLIP
FAST-iCLIP  Fully Automated and Standardized iCLIP
H3K4me3     Histone H3 Lysine 4 trimethylation; activating
H3K9me2     Histone H3 Lysine 9 dimethylation; silencing
H3K27me3    Histone H3 Lysine 27 trimethylation; silencing
H3K36me3    Histone H3 Lysine 36 trimethylation; activating
HITS-CLIP   High-throughput sequencing of RNA isolated by crosslinking IP
hmRIP       Histone mark RNA immunoprecipitation
hnRNP       Heterogeneous Nuclear Ribonucleoprotein
iCLIP       Individual nucleotide resolution UV crosslinking and IP
IDR         Irreproducible Discovery Rate
IP          Immunoprecipitation
LINE        Long Interspersed Nuclear Element
lncRNA      Long noncoding RNA
m6A         N6-methyladenosine
NUMT        Nuclear Mitochondrial DNA Segment
piRNA       piwi-interacting RNA
PRC1        Polycomb Repressive Complex 1
PRC2        Polycomb Repressive Complex 2
PTM         Post-translational modification
RAP         RNA Antisense Purification
SILAC       Stable Isotope Labeling with Amino Acids in Cell Culture
smFISH      Single-Molecule Fluorescence in situ Hybridization
CHAPTER I
INTRODUCTION
Chromatin Biology
Epigenetic Modifications Have an Effect on Gene Regulation
Genes do not exist in a vacuum. In the nucleus, the DNA that makes up the genome is surrounded by factors such as newly synthesized RNA, regulatory elements, transcription factors, and packaging proteins. The most basic of these are nucleosomes, repeating DNA-protein complexes that make up the molecular complex known as chromatin.
Chromatin is composed of repeating nucleosomes, protein octamers (two copies each of histones H2A, H2B, H3, and H4) wrapped by 147 base pairs of DNA and connected by linker DNA. This structure can exist as “beads on a string” accessible to DNA-binding proteins (euchromatin), or be compacted into a condensed structure consisting of packed nucleosomes and structural proteins (heterochromatin) (Figure 1.1). Euchromatin is generally associated with active genes, at which uncondensed chromatin allows transcription machinery to bind DNA.
Euchromatic regions often contain DNase hypersensitive sites, which are particularly accessible loci that are readily cleaved upon addition of the enzyme DNase I. These sites often change during development and cellular differentiation, as reflected by differences in hypersensitive site locations in cells from different tissues at different times (Crawford et al., 2006; Thurman et al., 2012). A classic example of changing hypersensitive sites is in the locus coding for hemoglobin proteins (Li et al., 2002; Noordermeer and de Laat, 2008). Humans produce various combinations of hemoglobin proteins during development in a highly regulated fashion: embryonic hemoglobin gives way to fetal hemoglobin (composed of two alpha and two gamma subunits), which is replaced by adult hemoglobin (composed of two alpha and two beta subunits) after birth (Figure 1.2a). This process is regulated at upstream and downstream hypersensitive sites clustered in Locus Control Regions (Figure 1.2b), which are thought to form chromatin loops to interact with particular hemoglobin genes at different times. Deletion or mutation of these hypersensitive sites can lead to diseases such as thalassemia.
Genome-wide studies of DNase hypersensitive sites show high correlation
with studies demonstrating regional enrichment of histone post-translational
modifications (PTMs). Histone PTMs are reversible modifications added mainly to
the N-terminal tails that extend from the globular domain of each histone.
Methylation, acetylation, ubiquitination, and phosphorylation are among the
modifications of specific histone residues that can recruit protein complexes to
perform various actions on nearby chromatin. One of the most common histone
PTMs genome-wide is methylation, which can be associated with either activation or
repression of the surrounding region depending on its position in the histone tail.
Trimethylation of histone H3 lysine 9 (H3K9me3) and lysine 27 (H3K27me3) are
repressive histone marks; conversely, H3K4me3 appears at active transcription start
sites and hypersensitive sites, while H3K36me3 is often present within the body of
actively transcribed genes. Histone PTMs have been described as “epigenetic”
modifications in light of their ability to regulate phenotypic changes without
affecting DNA sequence.
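The histone mark names used throughout this chapter follow a compact convention: histone, residue, position, modification, and degree (H3K27me3 is trimethylation of lysine 27 on histone H3). The following is a minimal sketch of that convention, not part of the thesis methods. The names `HistoneMark`, `parse_mark`, and `MARK_EFFECT` are hypothetical, the pattern covers only the simple forms used in this text, and the effect table contains only the four marks whose effects are stated in the paragraph above.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoneMark:
    histone: str       # e.g. "H3"
    residue: str       # one-letter amino acid code, e.g. "K" for lysine
    position: int      # residue position in the histone, e.g. 27
    modification: str  # "me", "ac", "ub", or "ph"
    degree: int        # 3 for trimethylation; 1 when no number is given


# Effects exactly as stated in the text above; other marks are left out.
MARK_EFFECT = {
    "H3K4me3": "activating",
    "H3K36me3": "activating",
    "H3K9me3": "repressive",
    "H3K27me3": "repressive",
}

# Illustrative pattern only: core histone, residue letter, position,
# two-letter modification code, optional single-digit degree.
_MARK_PATTERN = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ub|ph)(\d?)$")


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name such as 'H3K27me3' into its components."""
    match = _MARK_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized histone mark name: {name!r}")
    histone, residue, position, modification, degree = match.groups()
    return HistoneMark(histone, residue, int(position), modification,
                       int(degree) if degree else 1)


if __name__ == "__main__":
    for mark in MARK_EFFECT:
        print(mark, parse_mark(mark), MARK_EFFECT[mark])
```

Keeping the effect table separate from the parser reflects the point made above: the same modification (methylation) can be activating or repressive depending on its position in the histone tail, so the effect cannot be derived from the modification type alone.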
Histone PTMs are deposited on chromatin by enzymes that catalyze their addition. The modifications are then recognized and bound by “reader” complexes that bind specifically to particular histone PTMs. Some reader complexes have the capability to perform the physical act of chromatin compaction or de-compaction.
Histone PTM distribution fluctuates between cell types, can be targeted to specific loci in the genome, and can be preserved through DNA replication. Histone PTMs thus represent a reversible mechanism of gene silencing and activation that is heritable across cell divisions.
Polycomb Proteins Form Gene Silencing Complexes
Recent research has attempted to determine the mechanisms underlying the targeting of PTMs to specific sites in the genome. One such mechanism is a pathway named Polycomb. The Polycomb pathway was first identified in a Drosophila mutant that developed sex combs on all (instead of just the first) of its pairs of legs (Lewis, 1947) (Figure 1.3). Subsequent research showed that the affected gene regulates developmental patterning in flies by repressing the Bithorax gene complex under certain conditions. This in turn creates a gradient of Bithorax complex expression across the developing fly (Lewis, 1978). Mutations with similar phenotypes were soon discovered in a number of genes (e.g. Pc, Pcl, Scm), which are now understood to code for some of the proteins that comprise the Polycomb Repressive Complex 1 (Jürgens, 1985).
Polycomb proteins are well conserved between Drosophila and human
(Levine et al., 2002), and in both species the Polycomb pathway is split between two
Polycomb Repressive Complexes that perform distinct functions: PRC1 and PRC2.
PRC1 binds H3K27me3 marks, then performs the physical compaction of
nucleosomes that creates heterochromatin (Franke et al., 1992; Kundu et al., 2017).
PRC2 is the protein complex that deposits H3K27me3 on histones.
Mammalian PRC2 is comprised of four core proteins (Figure 1.4a-d). Ezh2 is
the catalytic subunit, containing a methyltransferase domain that is used to catalyze
the deposition of methyl marks at H3K27 (Kuzmichev et al., 2002). There also exists
a homolog of Ezh2, called Ezh1, which can combine with the other PRC2 proteins to
make an alternate form of the complex (Margueron et al., 2008; Shen et al., 2008).
Ezh1 and Ezh2 have some degree of functional overlap, with either protein alone
being able to generate H3K27 methylation at many of the same sites; however, Ezh1
appears to be the predominant Ezh protein in differentiated tissues (Margueron et
al., 2008; Xu et al., 2015), while Ezh2 appears to more effectively deposit H3K27me3
marks (Shen et al., 2008).
The proposed structure of PRC2 suggests that Suz12, the largest protein in the
complex, acts as a “bridge” between RbAp48 and the other two subunits, Ezh2 and
Eed (Ciferri et al., 2012). It is required for proper PRC2 function (Cao and Zhang,
2004), but can also inhibit PRC2 activity when bound to H3K4me3 or H3K36me3
marks through its VEFS domain (Schmitges et al., 2011). Curiously, Suz12 can bind
DNA or RNA on its own (Beltran et al., 2016; Kirmizis et al., 2004), and is necessary
for proper localization of HP1α, a protein that is involved with silencing by the
histone PTM H3K9me3 (la Cruz et al., 2007). Suz12 also appears able to bind the
chromatin looping protein CTCF, perhaps enabling the spread of PRC2 across the
chromatin landscape (Li et al., 2008).
Eed is composed of WD40 repeats that bind to trimethylated lysines, which
could aid PRC2 in propagating repressive chromatin marks from loci containing pre-existing H3K27me3 marks (Margueron et al., 2009). Although Eed does not contain any catalytic domains, it is required for virtually all H3K27me3 in vivo (Schoeftner et al., 2006; Xie et al., 2014); Eed inhibitors have recently been discovered that block the catalytic function of PRC2 as effectively as compounds that inhibit the methyltransferase domain in Ezh2 (He et al., 2017; Qi et al., 2017).
RbAp48 is a histone binding protein that, like Eed, contains a number of
WD40 domains. It has been shown to bind H3/H4 dimers in vitro, which may act to
stabilize the binding of chromatin by PRC2 (Murzina et al., 2008).
In Drosophila, PRC2 is found on specific sites known as Polycomb Response
Elements (PREs) along with protein cofactors such as GAGA, Zeste, Pleiohomeotic,
and Pipsqueak, with one or more cofactors associating with PRC2 at any given PRE
(Mishra et al., 2001; Mulholland et al., 2003; Schwendemann and Lehmann, 2002).
However, in humans most of these proteins are not conserved. YY1, the ortholog of
pleiohomeotic and the sole PRC2 binding protein that is conserved between mammals
and invertebrates, does not co-localize with PRC2 on chromatin in humans as it does
in Drosophila (Mendenhall et al., 2010). While the mechanisms of PRC2 targeting in
humans remain unknown, a number of hypotheses have been proposed to explain
how PRC2 is localized to specific genomic loci.
PRC2 Activity is Modulated by Protein and RNA Cofactors
A number of protein cofactors have been found to bind to PRC2, including
AEBP2, Pcl1 and Jarid2 (Figure 1.4e-g). AEBP2 binds DNA nonspecifically and
increases the stability and activity of PRC2 (Kim et al., 2009) while Pcl1 binds to the
activating H3K36me3 mark, perhaps leading to silencing of active sites (Musselman
et al., 2012). Jarid2 has been the most studied of these cofactors since it binds both PRC2 and specific DNA sequences, making it a prime candidate to perform the role of gene targeting.
Jarid2, also known as Jumonji, is a transcriptional repressor (Kim et al.,
2003) that was first identified as a key factor in embryonic stem cell differentiation and cardiac development through silencing of genes in the Notch1 pathway
(Mysliwiec et al., 2011). Jarid2 is necessary for maximal PRC2 methyltransferase activity (Li et al., 2010; Peng et al., 2009; Shen et al., 2009). Jarid2-PRC2 interactions appear to be mediated by a small region of Jarid2 that binds Ezh2
(Kaneko et al., 2014a; Pasini et al., 2010; Son et al., 2013).
Jarid2 also binds RNAs, which has led to speculation that association of PRC2 and sequence-specific RNAs via Jarid2 might have an effect on H3K27me3 deposition. Jarid2 has been shown to interact with the PRC2-interacting RNA
HOTAIR (Kaneko et al., 2014a), and to be required for interactions between PRC2 and the RNA Xist (da Rocha et al., 2014; Zhao et al., 2008). This suggests a potential mechanism for PRC2 localization: binding to non-repressed regions for subsequent silencing is facilitated by interactions with RNAs that have specificity for those regions (Figure 1.5).
PRC2 Can Bind to Many RNAs
It was recently found that PRC2 can bind nascent RNAs across the genome.
This property complicates the current model of PRC2 function, as it is unclear why PRC2, a protein complex that deposits repressive marks, would bind to the RNA of genes that are not silenced. One group of studies identified RNAs that bind Ezh2, and concluded that PRC2 is associated with promoters across the genome but does not deposit H3K27me3 if bound to a nascent RNA (i.e. it is at an active promoter) (Kaneko et al., 2014b; 2013). Another study
showed that recombinant human PRC2 binds promiscuously to RNAs, even those
from constitutively transcribed genes and non-human species. The term
“promiscuous binding,” when used in this context, refers to proteins that bind to
RNAs that do not contain an obvious protein-binding motif. In this model PRC2 binds to RNA, including nascent transcripts, then recognizes nearby histone PTMs.
If those histone PTMs are H3K27me3 then PRC2 deposits additional H3K27me3 nearby; conversely, if the only nearby histone PTMs are activating then PRC2 is inhibited (Davidovich et al., 2013).
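The decision rule in this model can be written out explicitly. The sketch below is only a restatement of the two cases described above, not an implementation from the cited study. The names `Prc2Outcome`, `ACTIVATING_MARKS`, and `prc2_outcome` are hypothetical, the set of activating marks is limited to the two named earlier in this chapter, and any situation the text does not cover (no nearby marks, or a mixture of H3K27me3 and activating marks) is returned as unspecified rather than guessed.

```python
from enum import Enum
from typing import Iterable


class Prc2Outcome(Enum):
    DEPOSITS_H3K27ME3 = "deposits additional H3K27me3"
    INHIBITED = "inhibited"
    UNSPECIFIED = "not specified by the model as summarized here"


# Activating marks named in this chapter (assumption: no others considered).
ACTIVATING_MARKS = frozenset({"H3K4me3", "H3K36me3"})


def prc2_outcome(nearby_marks: Iterable[str]) -> Prc2Outcome:
    """Outcome for RNA-bound PRC2 given the histone PTMs it finds nearby."""
    marks = set(nearby_marks)
    has_h3k27me3 = "H3K27me3" in marks
    has_activating = bool(marks & ACTIVATING_MARKS)

    # Case 1: nearby H3K27me3 (and no activating marks) leads to
    # deposition of additional H3K27me3.
    if has_h3k27me3 and not has_activating:
        return Prc2Outcome.DEPOSITS_H3K27ME3
    # Case 2: the only nearby marks are activating, so PRC2 is inhibited.
    if marks and marks <= ACTIVATING_MARKS:
        return Prc2Outcome.INHIBITED
    # Everything else is outside the two cases described in the text.
    return Prc2Outcome.UNSPECIFIED


if __name__ == "__main__":
    print(prc2_outcome(["H3K27me3"]).value)
    print(prc2_outcome(["H3K4me3", "H3K36me3"]).value)
    print(prc2_outcome([]).value)
```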
It has since been found that promiscuous binding is highly dependent on
experimental conditions, including RNA length and buffer conditions. Although the
binding affinity of PRC2 is correlated with RNA length, length alone is not enough to
fully determine affinity, leaving the potential mechanisms of promiscuous binding
unclear (Davidovich et al., 2015). It has recently been suggested that PRC2 has
affinity for G-rich RNA regions, and has a strong affinity for related RNA secondary
structures such as folded guanine quadruplexes (Wang et al., 2017).
Yet another study concluded that PRC2 binding is less promiscuous and
proposed a model in which RNA inhibits PRC2, an effect that is relieved by the
binding of accessory protein Jarid2 (Cifuentes-Rojas et al., 2014). This has been
supported by a more recent study that described PRC2 binding sites across the
transcriptome, finding that RNA tends to be bound by PRC2 (specifically the Suz12
subunit) at exons and 3′ UTRs, an interaction that is capable of inhibiting PRC2-
chromatin binding, and vice versa (Beltran et al., 2016).
PRC2 and Disease
Since PRC2 is involved in the silencing of many loci across the genome, it has
also been implicated as a cause of many diseases in which those genes are
inappropriately silenced or activated. For example, the FMR1 gene is subjected to
PRC2-mediated silencing when an internal repeat tract reaches a certain length, leading to Fragile X syndrome (Kumari and Usdin, 2014). A similar mechanism appears to affect development of Huntington’s Disease, in which the huntingtin locus is targeted by PRC2 once an internal repeat region reaches a certain length (Seong et al., 2010). It has also been shown that PRC2 is responsible for silencing a number of neurodegenerative disease-related genes, with PRC2 deficiency leading to neuronal cell death in mice (Schimmelmann et al., 2016).
PRC2 and its subunits have also been associated with many different types of cancer. The complex is required for the proliferation of MLL-AF9 leukemia cells
(Neff et al., 2012; Xie et al., 2014); in these same cells, inhibition of the interaction between Ezh2 and Eed is also able to disrupt proliferation (Kim et al., 2013b).
Malignant peripheral nerve sheath tumors have been associated with PRC2 mutations (particularly in Suz12 and Eed) in patient samples (De Raedt et al., 2014;
Zhang et al., 2014b). Mutations in the genes coding for histone H3 variants have been shown to modulate PRC2 activity in glioblastoma, leading to reduced amounts of H3K27me3 (Lewis et al., 2013).
Long Noncoding RNAs and Their Effects on Chromatin
Long Noncoding RNAs: A New Class of RNA
In the Central Dogma, mRNAs are intermediates in the transfer of information from DNA to protein while rRNAs and tRNAs facilitate this process.
This view of RNA function was incomplete, as the years since have witnessed the
discovery of many RNAs with varied structure and function: ribozymes catalyze
common enzymatic reactions; snRNAs alter the sequence of other RNAs; rRNAs catalyze translation in addition to scaffolding the ribosome; miRNAs can promote
targeted RNA silencing.
Many recent discoveries in RNA biology have been helped by the ENCODE
project, which was tasked with a systematic examination of genomic regulatory
elements, including less well-studied elements whose functions had previously been poorly described (Dunham et al., 2012; Gerstein et al., 2012; Harrow et al., 2006;
Neph et al., 2012; Thurman et al., 2012). Their studies showed that over 80% of the genome contains histone PTMs, is in a DNase hypersensitive site, is bound by transcription factors, or is transcribed in one of their studied cell types (Kellis et al.,
2014). Among many surprising findings, they observed pervasive transcription
occurring throughout the genome, suggesting that any given cell has a far wider
variety of RNA expressed at a given time than had previously been believed (Djebali
et al., 2012). Although there is disagreement over how much of this pervasive
transcription is noise versus functional transcription, analysis of these transcripts
has led to the identification of thousands of long noncoding RNAs (Derrien et al.,
2012).
Long noncoding RNAs (lncRNAs) are generally defined as RNAs longer than two hundred nucleotides with little protein-coding potential but with functional roles in cellular processes nonetheless (Derrien et al., 2012). In many respects they are similar to mRNAs: they are often transcribed by Pol II, are often polyadenylated, have chromatin signatures similar to those of active genes, and contain canonical splice sites despite often being spliced post-transcriptionally (Tilgner et al., 2012). LncRNAs
have evolved rapidly, although they retain significant tissue specificity between
species (Necsulea et al., 2014; Washietl et al., 2014). This rapid evolution is thought
to be due in part to the influence of transposable elements (Kapusta et al., 2013).
There are several ways for lncRNAs to influence cellular processes. Most obvious are antisense RNAs, which bind mRNAs in cis to exert post-transcriptional control over their fate (Amit-Avraham et al., 2015; Villegas and Zaphiropoulos,
2015). lncRNAs can also modulate enhancer function when transcribed from enhancer loci (Li et al., 2013; Ørom et al., 2010). There are many other examples of diverse lncRNA mechanisms such as Gas5, which contains a stem-loop structure similar to a glucocorticoid response element, allowing it to inhibit the glucocorticoid receptor (Kino et al., 2010); THRIL, which forms a ribonucleoprotein complex binding the TNFα promoter to induce an immune response (Li et al., 2014); linc-
MD1, which acts as a microRNA sponge during muscle development (Cesana et al.,
2011); and RMST, which binds and regulates a pluripotency-associated transcription factor (Ng et al., 2013). Many lncRNAs, such as the previously-mentioned HOTAIR,
Kcnq1ot1, and Xist, are known to bind to repressive chromatin modifiers as well as specific genomic loci, either in cis or in trans (Ng et al., 2011).
lncRNAs Can Bind to Specific Sites in the Genome
A number of groups have introduced methods to study the genomic localization of DNA-associating lncRNAs. The first of these, ChIRP-seq (chromatin isolation by RNA purification, Figure 1.6a), uses tiling antisense RNA oligonucleotides to capture an RNA of interest, cross-linked to chromatin, from which bound DNA can then be purified. ChIRP-seq analysis of HOTAIR confirmed
the prior suggestion that HOTAIR interacts directly with the HOXD gene cluster
(Rinn et al., 2007), narrowing the binding site to the intergenic region between
HOXD3 and HOXD4 and demonstrating that HOTAIR binds to this site
independently of PRC2 (Chu et al., 2011). ChIRP-seq has recently been modified to
investigate binding partners of specific domains of transcripts, although this
approach has not yet yielded any insights into HOTAIR biology (Quinn et al., 2014).
A method similar to ChIRP-seq is CHART-seq (capture hybridization analysis of RNA targets, Figure 1.6b), which uses the same general strategy but with antisense RNA oligonucleotides designed specifically against single-stranded regions of the target RNA (Simon et al., 2011). CHART has been applied to the lncRNAs Xist, NEAT1, and MALAT1, which are all highly expressed in the nucleus and, like HOTAIR, bind to specific sites in the genome (Simon et al., 2014; West et al., 2014).
Finally, the RAP (RNA antisense purification, Figure 1.6c) method also uses
antisense RNA oligonucleotides to retrieve target transcripts, but with
oligonucleotides designed to be very long (120 nucleotides) in order to improve their
strength of hybridization. RAP has been used to examine the process by which Xist
binds to the soon-to-be-inactivated X chromosome (Engreitz et al., 2013). It was
found that Xist initially binds to specific regions on the X chromosome, then spreads
across it in correlation with three-dimensional chromosome conformation. Notably,
although Xist is able to spread independently of its PRC2-binding A-repeat domain,
that domain is required for Xist to bind active, gene-dense regions, implying that
separate domains of a lncRNA might mediate its ability to bind different regions or
types of chromatin.
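The difference between short tiling probes and long probes can be illustrated with a minimal sketch that generates antisense probes of a chosen length along a target RNA. The function names, the toy sequence, and the probe length and spacing are assumptions made for illustration; they are not taken from the ChIRP, CHART, or RAP protocols, and real probe design would also screen for specificity, GC content, and repeats.

```python
# Hypothetical helper for illustration: tile antisense probes along a target RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def tile_antisense_probes(target: str, probe_len: int, step: int) -> list[tuple[int, str]]:
    """Return (start, probe) pairs.

    start is the 0-based position on the target of the window that the
    probe is antisense to; windows that would run off the end are skipped.
    """
    probes = []
    for start in range(0, len(target) - probe_len + 1, step):
        window = target[start:start + probe_len]
        probes.append((start, reverse_complement(window)))
    return probes


if __name__ == "__main__":
    toy_target = "AUGGCUAGGGAUCCGAUUAGGGUACGAUCGGAUCCA"  # made-up sequence
    for start, probe in tile_antisense_probes(toy_target, probe_len=10, step=10):
        print(start, probe)
```

Setting `step` equal to `probe_len` gives end-to-end tiling, while a larger `probe_len` approximates the long-probe strategy at the cost of fewer, longer oligonucleotides.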
Many lncRNAs Can Bind to PRC2
Several groups have attempted to characterize the RNAs associated with
PRC2 by immunoprecipitations with PRC2 components and computational
predictions based on supervised learning (Glazko et al., 2012; Guttman et al., 2012;
Khalil et al., 2009; Zhao et al., 2010). These groups identified one to three hundred lncRNAs that interact with PRC2. These identifications comprised a large proportion of lncRNAs that are expressed in the cell lines used for those experiments, suggesting that some of these lncRNAs may not have any specificity for PRC2, and may instead be binding as part of a promiscuous PRC2 binding interaction.
In light of promiscuous binding of RNA to PRC2, it seems apparent that any
immunoprecipitation experiment to find PRC2-associated RNAs would identify
promiscuous interactions in addition to specific ones. Depending on the ratio of
promiscuous to specific interactions, a majority of the previously described PRC2-
RNA interactions might have no effect on H3K27me3 deposition. In the absence of a
strong RNA primary sequence motif in transcripts bound by PRC2, any productive
interactions (RNA-PRC2 interactions that lead to the formation of heterochromatin)
are likely to instead be regulated by RNA secondary structure or bound protein
cofactors.
PRC2-lncRNA Interactions are Implicated in Disease
A number of diseases can be caused by interactions between PRC2 and
lncRNAs. Dysregulation of the RNA HOTAIR leads to inappropriate localization of
PRC2, in turn leading to inappropriate gene silencing (Tsai et al., 2010). HOTAIR is
transcribed from the HOXC locus on chromosome 12, but is known to bind PRC2
and silence a specific region between HOXD3 and HOXD4 (Rinn et al., 2007)
(Figure 1.7a). HOTAIR also binds specifically to many other genomic loci. In MDA-MB-231 breast cancer cells, HOTAIR overexpression results in a PRC2 occupancy
pattern similar to embryonic and neonatal fibroblasts, implying that HOTAIR
overexpression is able to “reprogram” PRC2 to revert cells to a more proliferative
state. These cells also demonstrated primary tumor growth and metastatic ability
(Gupta et al., 2011).
Another well-studied example of disease caused by PRC2-RNA dysfunction is
Beckwith-Wiedemann Syndrome, in which inappropriate silencing of the PRC2-binding lncRNA Kcnq1ot1 leads to activation of nearby loci, including Igf2,
Cdkn1c, and H19 (Pandey et al., 2008; Zhang et al., 2014a) (Figure 1.7b). This can
lead to a number of clinical symptoms, including macroglossia, large abdominal
circumference, and Wilms’ tumor. Interestingly, inappropriate silencing of Kcnq1ot1
is itself often epigenetically driven, with about 50% of cases occurring as a result of
loss of DNA methylation in a nearby differentially methylated region. As a result,
many cases of Beckwith-Wiedemann Syndrome are mosaic, resulting in
hemihyperplasia such as uneven leg length.
A well-studied chromatin-associated lncRNA is Xist, which regulates
mammalian dosage compensation. Xist spreads from its locus across the soon-to-be-
inactivated X chromosome, binding PRC2 and promoting heterochromatin
formation across the chromosome (Engreitz et al., 2013; Simon et al., 2014) (Figure
1.7c). Xist normally acts only on the inactive X chromosome, and transplanting it to another chromosome silences that chromosome through Xist’s normal PRC2-associated mechanism (Jiang et al., 2013); conversely, Xist loss in mice leads to X chromosome reactivation and a high incidence of cancer (Yildirim et al., 2013).
Many Proteins Bind lncRNAs and Affect Chromatin State
A number of groups have investigated proteins associated with lncRNAs using modified versions of sequence capture methods, followed by mass spectrometry to identify bound proteins. CHART-MS (CHART followed by mass spectrometry) was used to identify proteins bound to NEAT1 and MALAT1, two lncRNAs that associate with paraspeckles and nuclear speckles, respectively (West et al., 2014). As one might expect, the proteins that bind these RNAs tend to be components of paraspeckles or nuclear speckles, some of which bind only one of the two lncRNAs, and some of which bind both.
ChIRP-MS has been used to identify Xist-interacting proteins in mouse embryonic stem cells, embryonic stem cells with Xist expression induced on chromosome 11, epiblast stem cells that have undergone random X chromosome inactivation, and trophoblast stem cells that have a silenced paternal X chromosome
(Chu et al., 2015). This study identified a number of Xist-interacting proteins, which were then subdivided into groups that bind in all cells versus only differentiated cells. A “plug-and-play” model was hypothesized in which Xist might gain or lose silencing functions based on which cell type-specific proteins it binds.
RAP-MS (McHugh et al., 2015) and iDRiP (identification of direct RNA-interacting proteins) (Minajigi et al., 2015) have also been used to examine proteins that associate with Xist. These experiments identified a wide range of Xist-associated proteins, many of which are involved in or associated with chromatin regulation.
The iDRiP method also identified proteins that are associated with nuclear structure, and confirmed that Xist binding is necessary for these proteins to affect the structure of the inactive X chromosome. Another study targeting Xist-associated histone variants identified ATRX as a potential cofactor for interactions between Xist and
PRC2 (Sarma et al., 2014). Together, the ChIRP-MS, RAP-MS, and iDRiP results
show that Xist can bind to a variety of proteins, suggesting that its binding
partners may change based on the overall cellular state.
Our lab has performed the only study to examine the proteins bound to
HOTAIR, using MS2-tagged HOTAIR that was incubated with nuclear extracts from either HeLa or MDA-MB-231 cells and analyzed by SILAC (stable isotope labeling with amino acids in cell culture), comparing HOTAIR versus a control RNA
(Meredith et al., 2016). This study identified a number of candidate HOTAIR-regulating proteins, including several from the hnRNP (heterogeneous nuclear ribonucleoprotein) family.
hnRNP A2/B1: A Multi-Faceted Protein
hnRNPs Comprise a Diverse Family of Nuclear Proteins
The hnRNP family consists of twenty abundant nuclear proteins that were
originally characterized based on the fact that they can be purified with nuclear RNA,
and were named based on their size (with hnRNP A1 being the smallest, and hnRNP
U being the largest). The diversity of hnRNPs is correlated with rapid expansion of
this family in vertebrates (Barbosa-Morais et al., 2006). They have been shown to
cooperatively regulate many RNA processing events, including alternative splicing
(Busch and Hertel, 2012; Gueroussov et al., 2017; Huelga et al., 2012).
Multiple hnRNPs have been identified as binding to lncRNAs. For example, hnRNP U is required for proper localization of Xist (Hasegawa et al., 2010). Human hnRNP K interacts with the Drosophila ortholog of the PRC2 subunit Eed
(Denisenko and Bomsztyk, 1997) and is required for the deposition of silencing histone marks on the inactive X chromosome, a process that is mediated by Xist
(Chu et al., 2015). Our SILAC experiment identified a strong interaction between
HOTAIR and hnRNP A2/B1.
The Structure of hnRNP A2/B1
The name hnRNP A2/B1 describes a pair of alternatively-spliced transcripts that differ only by an additional 36-nucleotide exon at the N-terminal end of the B1 isoform (Burd et al., 1989). Both isoforms contain two RNA recognition motifs
(RRMs) at their N-terminal end, and a glycine-rich domain (also known as a low-complexity or prion-like domain) at their C-terminal end (Figure 1.8b-c). A2/B1, like other highly abundant hnRNPs, is expected to have tens of millions of copies per nucleus, with some variation depending on cell type (Dreyfuss et al., 2002; Kamma et al., 1999).
A2/B1 is a paralog of hnRNP A1, which also contains two RRMs and a glycine-rich domain, and has also been classified as a splicing regulator (Biamonti et al.,
1994) (Figure 1.8a). There is little primary sequence similarity in the glycine-rich domain but a notable amount of structural similarity, particularly in the spacing of aromatic residues (Biamonti et al., 1994). This similarity enables comparisons between the structure and function of the two paralogs.
The structure of the hnRNP A1 RNA-binding domains has been solved
(Shamoo et al., 1997; Xu et al., 1997); although there is significant sequence dissimilarity between the two RNA binding domains, their structure remains similar.
Furthermore, when joined by a short linker region (as in hnRNP A1) the RNA-binding domains orient antiparallel to one another. It has been hypothesized that this antiparallel orientation could be conducive to hnRNP A1 binding two sites on the same RNA with only one turn in between (Ding et al., 1999);
alternatively, this style of binding applied to two different RNAs could promote complementary RNA-RNA interactions.
There are some examples of hnRNP A1 promoting complementary annealing in a sequence-specific manner (Figure 1.9a). This annealing can be promoted solely by a 48-residue region in the glycine-rich domain of A1 (Kumar and Wilson, 1990;
Munroe and Dong, 1992), implying that this domain might have some RNA-binding ability of its own. Although the RNA-binding domains can bind mRNA on their own, it has been shown that the glycine-rich domain adds selectivity for RNA containing the sequence UAGGGA/U (Burd and Dreyfuss, 1994). Since this region is structurally conserved in A2/B1, this selectivity might also be conserved between the two paralogs.
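To make the UAGGGA/U preference concrete, the following minimal sketch scans a sequence for that motif with a regular expression. The function name and the example sequence are hypothetical, and exact string matching is only an approximation: it ignores the degenerate, context-dependent binding that real motif analyses model.

```python
import re

# UAGGGA/U written as a regular expression. The lookahead allows
# overlapping occurrences to be reported.
A1_MOTIF = re.compile(r"(?=(UAGGG[AU]))")


def find_motif_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched sequence) for each UAGGGA/U occurrence.

    DNA-style input is accepted by converting T to U before scanning.
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in A1_MOTIF.finditer(rna)]


if __name__ == "__main__":
    example = "GGUAGGGAUCUAGGGUCC"  # made-up sequence containing both variants
    for start, site in find_motif_sites(example):
        print(start, site)
```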
The Potential Roles of hnRNP A2/B1
Analysis of hnRNP A2/B1 to date has focused on its role as an RNA processing factor. hnRNPs A1 and A2/B1 have been associated with splicing and stability of mRNAs (Mayeda and Krainer, 1992; Mayeda et al., 1994) (Figure 1.9b). An extensive examination of splicing control by the hnRNP family identified a number of splicing events that require A2/B1, as well as others that require either A2/B1 or another hnRNP (Huelga et al., 2012). This analysis also identified a preference for binding within the exons or 3ʹ UTRs of genes; their motif analysis was unable to identify a strong binding motif, but indicated a preference for UAG sequences.
While splicing appears to be the major function of A2/B1, its ability to bind RNA during splicing seems to have been repurposed for other RNA-binding activities. For example, A2/B1 binds to telomeric (TTAGGG)n repeats (McKay and
Cooke, 1992) (Figure 1.9c), similar to the UAGGGA/U sequence that is a preferential
binding site for hnRNP A1. As a result, it has been hypothesized to act as a potential
adapter between ssDNA and ssRNA at the telomere (Moran-Jones et al., 2005).
However, the exact nature of this role remains unknown.
A2/B1 has also been hypothesized to be a reader of N6-methyladenosine
(m6A) modifications (Alarcón et al., 2015), which are a common form of mRNA
modification (Figure 1.9d). This modification occurs at specific locations on the transcripts of many genes, often at multiple sites per transcript (Linder et al., 2015), and is thought to affect the structure of the surrounding RNA (Liu et al., 2015). These modified sites have functional consequences; for example, the m6A sites on Xist have been shown to be crucial for its role in X inactivation (Patil et al., 2016). m6A modifications are thought to help maintain mRNA stability through the recruitment of selectively-binding proteins (Wang et al., 2014). A2/B1 has been shown to bind to transcripts at m6A marks, often overlapping the RGAC motif bound by METTL3, the enzyme that deposits m6A (Alarcón et al., 2015). A2/B1 also stabilizes miRNAs that are processed by METTL3, implying that A2/B1 is able to “read” m6A marks deposited by METTL3.
While this “reading” ability may not be a direct interaction, binding by A2/B1 appears to be able to mediate the downstream effects of the METTL3 pathway.
Another study computationally predicted 3ʹ UTR sequence elements most closely associated with mRNA stability, then used a number of methods to identify hnRNP A2/B1 as the most likely factor to bind these elements and stabilize the transcripts that contain them (Goodarzi et al., 2013). Interestingly, A2/B1 has also been associated with decay-related 3ʹ UTR motifs, implying that A2/B1 might have a number of effects on RNA depending on the motifs available for it to bind (Geissler et al., 2016) (Figure 1.9f).
hnRNP B1 Binds HOTAIR and Its Targets with Specificity
Although both hnRNP A2 and B1 were specifically enriched in the HOTAIR pulldown that our lab performed, the hnRNP B1 isoform displayed far greater
enrichment than A2. While the ratio of A2 to B1 varies in vivo, hnRNP A2 tends to be far more abundant than the B1 isoform (Kamma et al., 1999). This specificity is consistent across cell types, including MDA-MB-231 and MCF7 breast cancer cell
lines, which have previously been used for HOTAIR research.
The binding of HOTAIR by hnRNP B1 also has functional consequences.
Knockdown of hnRNP A2/B1 by shRNA inhibited HOTAIR-dependent invasion and
migration, and decreased levels of H3K27me3 at HOTAIR-dependent loci.
A2/B1 can also bind the RNA transcripts of genes regulated by HOTAIR, implying
that it could be a mediator of interactions between HOTAIR and those transcripts
(Meredith et al., 2016).
hnRNP B1 May Act as an RNA Matchmaker on Chromatin
HOTAIR and several of its target gene transcripts share a significant amount
of complementary sequence, as identified by the IntaRNA program (Busch et al.,
2008). To test whether these complementary sequences promote RNA-RNA
annealing in vitro, we used tethered RNA from JAM2, a target gene of HOTAIR. This
experiment retrieved HOTAIR bound to the tethered JAM2, indicating that the
interaction between complementary sequences is relatively strong. This interaction
was disrupted by mutations in the HOTAIR interaction sequence, and strengthened
by the addition of hnRNP B1 to the experiment (Meredith et al., 2016).
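The idea of searching two transcripts for complementary sequence can be sketched with a deliberately naive approach: find the longest stretch of one RNA whose reverse complement occurs in the other. This is not the IntaRNA algorithm, which also accounts for site accessibility and hybridization energy; the function names and the use of perfect Watson-Crick pairing only (no G-U wobble pairs) are assumptions made for illustration.

```python
def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/C/G/U only)."""
    return rna.upper().translate(str.maketrans("ACGU", "UGCA"))[::-1]


def longest_complementary_stretch(rna_a: str, rna_b: str) -> tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest perfect
    antiparallel Watson-Crick duplex that rna_a and rna_b could form.

    Implemented as a longest-common-substring search between rna_a and
    the reverse complement of rna_b. Coordinates are 0-based.
    """
    a = rna_a.upper()
    b_rc = reverse_complement(rna_b)
    best_len, best_end_a, best_end_rc = 0, 0, 0
    prev = [0] * (len(b_rc) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b_rc) + 1)
        for j in range(1, len(b_rc) + 1):
            if a[i - 1] == b_rc[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end_a, best_end_rc = curr[j], i, j
        prev = curr
    start_a = best_end_a - best_len
    # Convert the coordinate on the reverse complement back to rna_b.
    start_b = len(rna_b) - best_end_rc
    return best_len, start_a, start_b


if __name__ == "__main__":
    # Made-up sequences sharing a six-nucleotide complementary core.
    print(longest_complementary_stretch("CCCCGGACUCCCCC", "AAAAGAGUCCAAAA"))
```

A perfect-match search like this only flags candidate regions; whether such a region actually anneals would still need to be tested experimentally, as in the tethering assay described above.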
This result led us to a model (Figure 1.10) in which hnRNP B1 first binds to
both HOTAIR and HOTAIR target gene transcripts on chromatin; this could be
mediated by the RNA recognition motifs on individual B1 proteins, or by dimers of
B1. Second, putting HOTAIR and the target transcripts in close proximity promotes
the formation of stable RNA-RNA interactions between these RNAs. Finally,
HOTAIR recruits PRC2, leading to deposition of H3K27me3 at the target gene locus.
This “matchmaker model” could fit as an intermediate step in prior models of
PRC2 activity, in which PRC2 binds to either nascent transcripts or promoters and inhibition of its activity is relieved as a result of some external stimulus. Either the
A2/B1 cofactor could serve as the stimulus, or the RNA-RNA interactions it promotes could do so. If this model is an accurate depiction of A2/B1’s activity on chromatin, it raises questions as to what other functions A2/B1 might have in gene silencing, and how those functions might be disrupted in disease.
hnRNP A2/B1 in Disease
Since hnRNP A2/B1 is associated with the stability and splicing of many mRNAs, it stands to reason that mutations in A2/B1 or A2/B1 targets would have an effect on numerous transcripts and pathways. One such change is caused by mutation of an exonic splicing silencer sequence in the GLA gene (Palhais et al.,
2016). Disruption of this sequence decreases binding of A2/B1 and leads to inclusion of a pseudoexon that contains a premature stop codon. This results in a defective protein that impairs lysosomal storage, eventually leading to Fabry Disease.
Antibodies to A2/B1 have also been detected in patients with a wide range of autoimmune diseases, such as rheumatoid arthritis, systemic lupus erythematosus, mixed connective tissue disease (Steiner et al., 1992), juvenile idiopathic arthritis
(Tomoum et al., 2009), Takayasu’s arteritis, and Behçet’s disease (Bin Cho et al.,
2012) (in the autoimmune disease field, A2/B1 is referred to by the name RA33)
(Figure 1.9e). While no mechanistic link has been discovered through which depleted
A2/B1 could cause these diseases, this correlation suggests that large quantities of
A2/B1 are released once the inflammatory process lyses cells.
Loss of A2/B1 has also been associated with cognitive diseases such as
Alzheimer’s disease (Berson et al., 2012). Curiously, overexpression or mislocalization of A2/B1 can also lead to disease. In a subset of patients with multisystem proteinopathy (a complex disease presenting with frontotemporal dementia, Paget’s disease of the bone, inclusion body myopathy, and amyotrophic lateral sclerosis), hnRNP A2/B1 mutations have been identified as causative factors (Le Ber et al., 2014). This causation, however, is closely related to another protein that has
also been classified as an hnRNP, TDP-43.
Functional Interactions between RNA-Binding Proteins
TDP-43 is a Splicing Regulator that Interacts with hnRNP A2/B1
TDP-43 was initially identified as a factor that binds the TAR DNA locus in the HIV-1 genome (Ou et al., 1995). It was soon realized that TDP-43 binds to RNA
as well, playing a key role in the correct splicing of the CFTR transcript, which codes
for a transmembrane chloride channel (Buratti et al., 2001). TDP-43 binds to TG
repeats near the 3ʹ end of exon 8 of CFTR, affecting whether or not exon 9 is spliced
into the final transcript—overexpression of TDP-43 results in an increase in exon 9
skipping, while inhibition results in increased inclusion. This is a function that has
clinical consequences, as a CFTR protein missing exon 9 is nonfunctional, leading to
cystic fibrosis.
The C-terminal end of TDP-43 associates with hnRNPs A1, A2/B1, and C
(Buratti et al., 2005). Either knockdown of A1 or A2, or inhibition of the TDP-43/A2
interaction, leads to increased inclusion of CFTR exon 9 (D’Ambrogio et al., 2009).
This interaction is evolutionarily conserved, as human TDP-43 retains the ability to bind the Drosophila orthologs of hnRNP A1 and A2/B1 (Romano et al., 2014).
Cytoplasmic TDP-43 and hnRNP A2/B1 can be Pathologic
TDP-43 has been identified as a key contributor to diseases such as frontotemporal lobar degeneration and amyotrophic lateral sclerosis. These diseases are characterized by ubiquitin-positive, tau- and α-synuclein-negative inclusions accumulating in the nucleus or cytoplasm of affected cells. These inclusions were later found to include fragments of degraded TDP-43 (Neumann et al., 2006). In cells from patients with these diseases TDP-43 redistributes from the nucleus to the cytoplasm, a process that has significant effects on transcripts that would normally be regulated by TDP-43 (Amlie-Wolf et al., 2015; Polymenidou et al., 2011).
These disease-related cytoplasmic accumulations can also be caused by a specific mutation in hnRNP A2/B1. This mutation, an aspartic acid to valine substitution at residue 290 in A2 (residue 302 in B1), occurs in the glycine-rich domain and has been shown to promote fibrillization of A2/B1 and cytoplasmic localization of both A2/B1 and TDP-43 (Kim et al., 2013a) (Figure 1.11). These effects are thought to be the molecular cause of multisystem proteinopathy.
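The two residue numbers quoted for this mutation follow from the 36-nucleotide B1-specific exon described earlier, as the short worked calculation below shows. The constant and function names are hypothetical, and the mapping assumes the residue of interest lies downstream of the B1-specific exon.

```python
B1_EXTRA_EXON_NT = 36  # length of the B1-specific exon, as stated in the text
CODON_LENGTH = 3       # nucleotides per amino acid


def a2_to_b1_residue(a2_residue: int) -> int:
    """Map an A2 residue number to B1 numbering.

    Assumes the residue lies downstream of the B1-specific exon, so the
    numbering is shifted by the number of residues that exon encodes.
    """
    extra_residues = B1_EXTRA_EXON_NT // CODON_LENGTH
    return a2_residue + extra_residues


if __name__ == "__main__":
    print(a2_to_b1_residue(290))  # position of the D290V mutation in B1 numbering
```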
The D290V mutation also causes changes in splicing of A2/B1 target transcripts such as the DAO gene, which produces an oxidase associated with ALS
(Martinez et al., 2016). While these changes are likely due in part to localization of
A2/B1 from the nucleus to the cytoplasm, it is also possible that the D290V mutation on its own can lead to changes in splicing and transcription regulation. Residue 290 falls within the region of A2/B1 that has previously been described to bind to TDP-43.
Since that protein-protein interaction is necessary for proper splicing of the
CFTR transcript, it makes sense that a mutation in this region might affect other alternative splicing events as well.
Scope of Thesis
TDP-43 and hnRNP A2/B1 are two examples of multifunctional RNA-binding proteins that have a diverse set of roles in the cell. In this thesis, I will examine the functional significance of these proteins, as well as other chromatin-related proteins, from a transcriptomic perspective. Although research into A2/B1 function thus far has focused on its role as a splicing modulator, the studies referenced above provide evidence for other functions. In Chapter II, I will examine these potential functions through identification of novel RNA binding sites, and differences between the A2 and B1 isoforms, in breast cancer cell lines.
In Chapter III, I will describe a role for A2/B1 and TDP-43 in muscle differentiation and repair. Here, rather than just affecting splicing, we show that
A2/B1 and TDP-43 play a major role in the proper nuclear export and localization of mRNAs. To date most reports of cytoplasmic A2/B1 or TDP-43 have focused on their inclusion in disease-related cytoplasmic RNP complexes. However, we demonstrate that these complexes in fact occur normally during muscle differentiation and repair, and that A2/B1 and TDP-43 are necessary for proper completion of these processes.
Research into the role of RNA-protein complexes in chromatin regulation has mainly focused on interactions between RNAs and PRC2. In Chapter IV, I will describe a novel method for detecting RNA-histone interactions, which may occur either directly or indirectly via protein complexes such as PRC2.
The goal of this thesis is to expand on the roles of specific RNA-binding proteins whose functions have, to date, not been fully studied. Transcriptome-wide methods offer a less biased way to study the roles of these proteins, as they allow for the identification and analysis of previously unknown RNA-protein interactions. The results from these experiments can then facilitate further targeted investigation focusing on the roles of specific protein-RNA complexes.
Figure 1.1: Chromatin Biology a) Euchromatin consists of DNA bound by nucleosomes, with enough accessible DNA to allow the binding of proteins such as transcription factors. b) Electron micrograph of euchromatin. Adapted from Alberts et al. 5th edition. c) Heterochromatin is sufficiently condensed to prevent the binding of transcription factors, preventing nearby genes from being transcribed. d) Electron micrograph of heterochromatin. Adapted from Alberts et al. 5th edition.
Figure 1.2: Globin Activation a) Timeline of globin gene expression during human development (globin synthesis, %, versus months post-conception). Embryonic globin is replaced by fetal globin early in gestation, which is replaced by adult globin after birth. Figure adapted from Noordermeer and de Laat, 2008. b) Diagram of the human globin gene loci (boxes), including upstream and downstream hypersensitive sites (3′ and 5′ HS) comprising the Locus Control Regions. Also pictured are deletions that lead to thalassemia. Figure adapted from Li et al., 2008
Figure 1.3: Polycomb Complex Mutations Effect of a heterozygous Polycomb mutation on sex comb development in Drosophila, comparing normal and mutant Polycomb (figure adapted from Parrish et al. Genes and Development 2007).
Figure 1.4: Polycomb Repressive Complex 2 Subunits and Protein Cofactors Core protein subunits of the PRC2 complex (a-d), and known cofactors (e-g). Major protein domains are annotated inside the protein diagram, with binding partners annotated above. Also pictured (h) is the structure of the core complex along with AEBP2, as predicted by cryo-EM (Ciferri et al., 2012)
Domains shown in the diagrams: a) Ezh2 (residues 1-746): WD40-binding, SANT, SANT, CXC, SET. b) Suz12 (1-741): WD40-binding, Zn, VEFS. c) Eed (1-441): seven WD40 repeats. d) RbAp48 (1-425): six WD40 repeats. e) Jarid2 (1-1246): JmjN, ARID, JmjC, Zn. f) Pcl1 (1-559): Tudor, Zn, Zn. g) AEBP2 (1-517): Zn, Zn, Zn.
Figure 1.5: Model of Interactions Between PRC2, RNA, and Cofactors Model of interactions between PRC2 and protein/RNA cofactors during H3K27me3 deposition. On heterochromatin: The Eed subunit binds pre-existing H3K27me3, which allows PRC2 to deposit more H3K27me3 around this region of the genome. On euchromatin: Histone-binding properties of Eed and RbAp48, and activating mark-binding property of Suz12 allow binding to active chromatin. Further binding of lncRNA and cofactors can then promote the deposition of H3K27me3.
Figure 1.6: Methods to identify RNA-associated DNA or protein a) ChIRP-seq/MS uses tiling antisense oligonucleotides (ASOs) to retrieve RNAs of interest. b) CHART-seq uses ASOs designed specifically to single-stranded regions of the RNA of interest. c) RAP-seq/MS uses long ASOs to retrieve RNAs of interest.
Figure 1.7: Examples of RNA-PRC2 interactions a) HOTAIR binds PRC2 and localizes it in trans to specific genomic loci. b) Kcnq1ot1 binds PRC2 and regulates the expression of nearby genes in cis, including Igf2, Cdkn1c, and the lncRNA H19 (figure not to scale). c) Xist is transcribed on one X chromosome, then spreads across the entire chromosome. This process leads to silencing of the entire chromosome, creating an inactive Barr body.
Figure 1.8: Structure of hnRNPs A1, A2, and B1 a) hnRNP A1 contains two RNA Recognition Motifs and a Glycine-Rich domain (also described as a low complexity domain or prion-like domain). b) hnRNP A2 has a shorter glycine-rich domain than A1. c) hnRNP B1 is differentiated from A2 solely by the inclusion of an alternately-spliced 12-amino acid exon at its N-terminal end. Each diagram shows two RRMs, a glycine-rich domain, and an M9 sequence; the diagrams span 372 (A1), 341 (A2), and 353 (B1) residues, and the positions of the D290V (A2) and D302V (B1) mutations are marked.
Figure 1.9: Proposed functions of hnRNP A2/B1 a) RNA-RNA annealing b) Splicing modulation c) Binding to telomeres d) Reader of m6A modifications e) Autoimmune antibody target f) Binding to 3′ UTR mRNA stability elements
Figure 1.10: Matchmaker Model for hnRNP B1 Activity a) B1 binds to HOTAIR and HOTAIR target gene transcripts at chromatin b) Matching of RNA transcripts via B1 is promoted by B1-chromatin interaction c) Initiation of heterochromatin at target gene
Figure 1.11: Mechanisms of A2/B1 Mutation in Multisystem Proteinopathy a) Wild-type A2/B1 (red) localizes near the nucleus (blue) in muscle cells. b) In muscle biopsy from a patient with the A2 D290V mutation, A2/B1 instead has a cytoplasmic distribution. c) Wild-type TDP-43 (red) localizes near the nucleus in a manner similar to A2/B1. d) In a different muscle biopsy from the patient in b), TDP-43 is also found in the cytoplasm despite not containing a mutation. e) In vitro, A2 with the D290V mutation is more prone to fibrillization than wild-type A2. Images adapted from Kim et al., 2011
CHAPTER II
THE RNA INTERACTOME OF HNRNP A2/B1
Introduction
While the hnRNP A2 isoform is the more common A2/B1 isoform in most cell types, we have identified the B1 isoform as having preferential binding to HOTAIR. hnRNP B1 also associates with RNA transcripts of known HOTAIR target genes, and is bound to chromosomal loci of those target genes (Meredith et al., 2016). From
these data, we have proposed a model in which hnRNP B1 can act as a matchmaker
between lncRNAs and nascent transcripts at target gene loci. Once matched, the
lncRNA and nascent transcript are able to interact via direct RNA-RNA interactions.
The lncRNA can then recruit PRC2, which catalyzes the deposition of H3K27me3, leading to the formation of heterochromatin at the target gene. This model combines two known aspects of A2/B1 activity: its cotranscriptional binding to nascent
transcripts, and its ability to promote the formation of RNA-RNA interactions.
In order to investigate the function of hnRNP B1, we have first characterized
the RNAs to which it and the A2 isoform bind. We used the recently developed eCLIP
method to examine this question in breast cancer cell lines, which have previously
been used to study the effect of HOTAIR on cancer metastasis (Gupta et al., 2011).
Our work indicates that hnRNP B1 and A2 bind many lncRNAs and mRNAs,
displaying shared and unique binding events and motifs. Interestingly, A2/B1
binding is highly enriched in a stable intronic sequence of HOTAIR that disrupts an
RNA-RNA interaction region. A2/B1-RNA interactions occur primarily on chromatin, with the small fraction of interactions that persist in the cytoplasm displaying a unique distribution within the noncoding sequences of mRNAs.
Materials and Methods
Lessons Learned from HITS-CLIP and iCLIP Methods
When the lab first began investigating the feasibility of analyzing direct protein-RNA interactions, the main methods in use were based on HITS-CLIP (short for high-throughput sequencing of RNA isolated by crosslinking immunoprecipitation) (Moore et al., 2014) (Figure 2.1a). HITS-CLIP uses short-wave
UV crosslinking to create covalent protein-RNA crosslinks, which are then retrieved via immunoprecipitation with antibodies to a protein of interest. The RNAs are then radiolabeled and size selected on a protein gel while still crosslinked to the protein.
The RNA is then purified, reverse transcribed, and ligated to sequencing adapters on either end.
An improved version of HITS-CLIP, known as iCLIP (individual-nucleotide resolution UV crosslinking and immunoprecipitation), retains the basic initial steps of HITS-CLIP but replaces the sequencing adapters with a system designed to create a circular template from which the sequencing library can be amplified, improving efficiency as compared to the dual RNA ligation steps involved in HITS-CLIP
(Huppertz et al., 2014) (Figure 2.1c). Furthermore, this protocol takes advantage of the tendency of reverse transcriptase to stall at covalent crosslink sites—since one end of the cDNA will correspond to the crosslink, this method has single-nucleotide resolution of crosslink locations.
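The way a truncated cDNA end translates into a crosslink position can be sketched as follows. This is a simplified illustration and not the published iCLIP pipeline: it assumes each aligned read is reported in the sense orientation of the RNA, that its 5′ end marks where reverse transcription stopped, and that coordinates are 0-based and half-open (BED-style). The function names and example reads are hypothetical.

```python
from collections import Counter


def crosslink_position(start: int, end: int, strand: str) -> int:
    """Return the 0-based position of the inferred crosslinked nucleotide
    for a read aligned to the half-open interval [start, end).

    The crosslink is taken to be the nucleotide immediately upstream
    (in transcript orientation) of the read's 5' end.
    """
    if strand == "+":
        return start - 1
    if strand == "-":
        return end
    raise ValueError(f"strand must be '+' or '-', got {strand!r}")


def count_crosslinks(reads):
    """Count inferred crosslink events per (chrom, position, strand).

    reads is an iterable of (chrom, start, end, strand) tuples.
    """
    return Counter(
        (chrom, crosslink_position(start, end, strand), strand)
        for chrom, start, end, strand in reads
    )


if __name__ == "__main__":
    toy_reads = [  # made-up alignments
        ("chr12", 100, 130, "+"),
        ("chr12", 100, 125, "+"),
        ("chr12", 90, 130, "-"),
    ]
    for site, n in count_crosslinks(toy_reads).items():
        print(site, n)
```

Reads that share a start position but differ in length collapse onto the same crosslink site, which is why the counts, rather than read coverage, carry the positional signal.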
We first attempted to use a modified version of iCLIP called FAST-iCLIP
(Fully Automated and Standardized iCLIP) (Flynn et al., 2014). This protocol was an attempt to standardize the adapters and computational pipeline of iCLIP along with reducing the number of required overnight incubations to improve the speed of the
protocol. While we were able to demonstrate efficient recovery of radiolabeled RNA using antibodies to hnRNP A2/B1 (abcam ab31645) and B1 (IBL 18941), the downstream steps of this protocol functioned at low efficiency.
The Enhanced CLIP (eCLIP) Method
We then turned to a recently-published variant of CLIP called eCLIP
(enhanced CLIP) (Van Nostrand et al., 2016) (Figure 2.1d; 2.3). The eCLIP method is based on iCLIP, but contains a number of improvements. First, ligation efficiency is improved through the addition of PEG8000 and using ssDNA instead of RNA for
one of the adapters. Second, the RNA radiolabeling steps are replaced with western blotting to determine the size of the immunoprecipitated protein, after which a region of the membrane spanning a specified size range above that band is excised for RNA purification. Third, a small fraction of the sample is removed prior to immunoprecipitation to act as a size-matched input, which allows the final eCLIP samples to be normalized against the input samples.
Briefly, the eCLIP method begins with cells irradiated with 150 mJ of 254 nm
UV light. These cells can then be pelleted and stored at –80ºC. Upon thawing, these cells can either be lysed using the standard eCLIP lysis buffer (50 mM Tris-HCl pH
7.4, 100 mM NaCl, 1% NP-40, 0.1% SDS, 0.5% sodium deoxycholate, and 1x protease
inhibitor) on ice for 15 minutes, or buffers for chromatin isolation. We used a
protocol that has previously been used to prepare chromatin samples for HITS-CLIP
(Kung et al., 2015), in which the crosslinked cell pellets are resuspended in
Buffer A (10 mM HEPES pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM PMSF),
incubated on ice for thirty minutes, pelleted and resuspended in Buffer C (20 mM
HEPES pH 7.5, 420 mM NaCl, 15% glycerol, 1.5 mM MgCl2, 0.5 mM PMSF, 1x
protease inhibitor), incubated on ice with rotation for thirty minutes, and diluted
to a final concentration of 6.7 mM HEPES pH 7.5. The chromatin is then isolated via
digestion with 40U TURBO DNase for 30 minutes at 37ºC, which is quenched with
10 mM final concentration EDTA.
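As a worked check on the dilution step, going from 20 mM HEPES in Buffer C to 6.7 mM corresponds to a roughly threefold dilution. The sketch below applies that factor to the other Buffer C components. It assumes the diluent contributes none of these components, which the protocol text does not state, so the resulting values are illustrative only.

```python
# Buffer C composition as listed in the text.
BUFFER_C = {
    "HEPES_mM": 20.0,
    "NaCl_mM": 420.0,
    "glycerol_pct": 15.0,
    "MgCl2_mM": 1.5,
    "PMSF_mM": 0.5,
}

def dilute(concentrations, factor):
    """Scale every concentration down by a common dilution factor."""
    return {name: value / factor for name, value in concentrations.items()}

# Assumption: a threefold dilution, inferred from 20 mM / 6.7 mM HEPES.
diluted = dilute(BUFFER_C, 3.0)
```

Under that assumption, NaCl would fall from 420 mM to 140 mM, close to physiological ionic strength, before the DNase digestion.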
The lysed sample is then sonicated using a Bioruptor at the low setting for five
minutes, then treated with RNase I (diluted 1:25) and 4U TURBO DNase for five
minutes at 37ºC while shaking. The sample is then treated with RNase inhibitor and
pelleted. While this is happening, the antibody for immunoprecipitation (the amount
varying by antibody, but generally between 2.5 and 10 µg per sample) is combined with
Protein G Dynabeads (2 mg beads per 10 µg of antibody) and rotated at room
temperature for 45 minutes. We used 9 µg per IP of the A2/B1 antibody (abcam ab31645)
and 2.5 µg per IP of the B1 antibody (IBL 18941). Following washing of the antibody-bead complex, we combined it with the lysate and performed the immunoprecipitation at 4ºC overnight with rotation.
Following the overnight immunoprecipitation, the samples were washed with
High Salt buffer (50 mM Tris-HCl pH 7.4, 1 M NaCl, 1 mM EDTA, 1% NP-40, 0.1%
SDS, 0.5% sodium deoxycholate) and Wash buffer (20 mM Tris-HCl pH 7.4, 10 mM
MgCl2, 0.2% Tween-20) and resuspended in 100 µL FastAP enzyme mix. Following fifteen minutes of 37ºC shaking incubation, 300 µL PNK enzyme mix was added for an additional twenty minutes. Following more wash steps, an RNA adapter is ligated onto the 3′ end. These adapters are paired and contain a short random sequence, which helps to avoid issues with cluster identification on HiSeq 2500 sequencers.
Following more washes, the samples are then separated from the Dynabeads through incubation at 70ºC for ten minutes, then split into input, experimental, and
western blot samples, then size selected on a NuPage 4-12% Bis-Tris protein gel run at 150V in 1x MOPS buffer. These gels were then transferred to either PVDF (for the western blot) or nitrocellulose membranes overnight at 30V in a Bio-Rad transfer apparatus, in transfer buffer with 20% methanol.
Following the transfer, we imaged the western blot to confirm that our protein of interest was being detected in both the input samples and the immunoprecipitation. After noting the size of the protein of interest, we then cut out strips of nitrocellulose membrane from the protein size up to 75 kD larger than that.
Single-stranded RNA-protein complexes will run about twenty kD higher per seventy nucleotides of RNA length, so this should retrieve RNAs up to about 260 nucleotides in length, or about 230 nucleotides of transcript sequence after accounting for the 3′ adapter. The membrane strips are then digested with proteinase K to release the
RNA, then purified by phenol chloroform extraction.
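The size-selection arithmetic above can be made explicit. The sketch below uses only the two figures given in the text (about 20 kD of gel shift per 70 nucleotides, and a 75 kD excision window). The protein size passed to `excision_window_kd` is a hypothetical placeholder, and the adapter length of about 30 nucleotides is implied by the difference between 260 and 230 rather than stated directly.

```python
KD_PER_STEP = 20.0   # approximate gel shift, in kD ...
NT_PER_STEP = 70.0   # ... per this many nucleotides of crosslinked RNA

def max_rna_length_nt(window_kd):
    """Longest RNA (in nt) expected inside an excision window of window_kd."""
    return window_kd * NT_PER_STEP / KD_PER_STEP

def excision_window_kd(protein_kd, window_kd=75.0):
    """Lower and upper bounds (kD) of the membrane strip to cut out."""
    return (protein_kd, protein_kd + window_kd)

longest_rna = max_rna_length_nt(75.0)    # 262.5 nt, i.e. "about 260"
# Assumption: a 3' adapter of roughly 30 nt, implied by 260 - 230 in the text.
longest_insert = longest_rna - 30.0
# Hypothetical 36 kD protein band, for illustration only.
window = excision_window_kd(36.0)
```

The same function shows how the recoverable fragment length scales if a narrower or wider strip is cut.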
Finally, the RNAs are reverse transcribed using SuperScript IV, and the resulting cDNA is ligated at its 3′ end to a single-stranded DNA adapter (which corresponds to the 5′ end of the original
RNA, and contains a ten-nucleotide random sequence used to remove PCR duplicates later on). With adapter sequence now at both ends of the cDNA, the library can be amplified by PCR. We used primers to create dual-indexed samples, as are used in the Illumina TruSeq HT Kit. These samples must be processed using paired-end sequencing, which we did after multiplexing about ten samples per lane, at a final total concentration of 10 nM. These were then sequenced on Illumina NextSeq or HiSeq 4000 sequencers.
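The role of the ten-nucleotide random sequence can be sketched as follows. Reads that share a mapping position and an identical random sequence are treated as PCR copies of one original molecule and collapsed. The record layout (a dict with `chrom`, `strand`, `start`, and `umi` keys) is a hypothetical simplification, not the format used by the eCLIP pipeline.

```python
def collapse_pcr_duplicates(reads):
    """Keep the first read seen for each (position, random sequence) pair.

    `reads` is an iterable of dicts with 'chrom', 'strand', 'start', and
    'umi' keys (hypothetical layout). Reads at the same position with
    different random sequences are kept as distinct molecules.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["strand"], read["start"], read["umi"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique
```

Without the random sequence, two independent molecules that happen to start at the same nucleotide would be indistinguishable from PCR duplicates and one would be discarded.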
Computational Analysis of eCLIP-seq Samples
One advantage of the eCLIP protocol is that it is provided along with an
extensive computational pipeline (Figure 2.4). Following sequencing and demultiplexing of the dual-indexed barcodes, the first step is to separate the paired
3′ adapter sequences. This is performed using a script provided along with the eCLIP protocol, which searches for the adapter sequences and divides the sequencing reads into separate files (to be recombined later in the pipeline). While there are always some reads that do not contain the proper adapter sequence, a high-quality library will have the adapters in a majority of reads.
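The adapter-splitting step can be sketched in a few lines. This is not the script distributed with the eCLIP protocol. It assumes that the adapter-derived barcode appears at the start of the read, and the barcode sequences in the usage note are hypothetical placeholders.

```python
from collections import defaultdict

def split_by_inline_barcode(reads, barcodes):
    """Bin (name, sequence) reads by the barcode found at the read start.

    `barcodes` maps a label to a barcode sequence. The barcode is trimmed
    from assigned reads; reads matching no barcode go to 'unassigned'.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        for label, barcode in barcodes.items():
            if seq.startswith(barcode):
                bins[label].append((name, seq[len(barcode):]))
                break
        else:
            bins["unassigned"].append((name, seq))
    return dict(bins)

def assigned_fraction(bins):
    """Fraction of reads carrying a recognized barcode (a library QC check)."""
    total = sum(len(group) for group in bins.values())
    unassigned = len(bins.get("unassigned", []))
    return (total - unassigned) / total if total else 0.0
```

For example, with the hypothetical barcodes `{"A": "ACAAGTT", "B": "TGGTCCT"}`, a library in which `assigned_fraction` falls well below one half would fail the quality expectation described above.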
The pipeline contains a number of steps that are automated using the Genome
Analysis Toolkit (GATK) (McKenna et al., 2010). First, adapters are removed using the cutadapt program (Martin, 2011). Then, the libraries are mapped against an indexed version of the RepBase database, to filter out sequencing reads corresponding to repetitive elements. The reads that are not repetitive are then mapped to either the hg19 (human) or mm9 (mouse) genomes, and the paired 3′ adapter sequences are combined and removed.
Once these libraries are sorted and indexed, peaks of library buildup are called using the clipper program (Lovci et al., 2013), in a modified version from the Yeo lab that is more liberal with its peak calls, under the assumption that these peaks will next be compared against the input signal at the peak regions. The next script does this by calculating the IP over input signal fold change at each peak, then using that to calculate a p-value for each peak. The p-value cutoffs used in prior publications have varied, but we decided to use a cutoff of p < 10^-5.
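The input normalization can be sketched as a fold enrichment plus a significance test on read counts. The version below is a minimal illustration, not the normalization script shipped with the eCLIP pipeline. It assumes a one-sided Fisher's exact test on IP versus input read counts and a single pseudo-read for empty input regions; both are assumptions made here for illustration.

```python
import math

def log2_fold_enrichment(ip_peak, ip_total, input_peak, input_total):
    """log2 of the IP read fraction in a peak over the input read fraction."""
    # Assumption: one pseudo-read stands in for an input region with no reads.
    input_peak = max(input_peak, 1)
    return math.log2((ip_peak / ip_total) / (input_peak / input_total))

def _log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def enrichment_p_value(ip_peak, ip_total, input_peak, input_total):
    """One-sided Fisher's exact test for IP enrichment within a peak.

    Returns the hypergeometric probability that at least `ip_peak` of the
    reads in the peak come from the IP library, given both library sizes.
    """
    n_total = ip_total + input_total
    n_peak = ip_peak + input_peak
    log_denominator = _log_choose(n_total, ip_total)
    p = 0.0
    for k in range(ip_peak, min(n_peak, ip_total) + 1):
        p += math.exp(_log_choose(n_peak, k)
                      + _log_choose(n_total - n_peak, ip_total - k)
                      - log_denominator)
    return min(p, 1.0)

def is_significant(ip_peak, ip_total, input_peak, input_total, cutoff=1e-5):
    """Apply the p < 10^-5 cutoff used in this chapter."""
    return enrichment_p_value(ip_peak, ip_total, input_peak, input_total) < cutoff
```

Normalizing by library size matters because the IP and size-matched input libraries are rarely sequenced to the same depth.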
Scatterplots for sample comparison were generated using the eCLIP input
normalization script but comparing the library from one sample against the peak calls from another sample, thus allowing us to compare fold enrichment over input for two different samples at the same locations. Correlations are reported as
Pearson’s correlation coefficients. Additional comparisons were performed by comparing fold enrichment at peaks in each sample, then performing pairwise comparisons using the 2012 ENCODE Irreproducible Discovery Rate (IDR)
Pipeline. Peak motifs were determined using the DREME program (Bailey, 2011).
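The replicate comparison reduces to a Pearson correlation over paired fold-enrichment values measured at the same peak locations. A minimal, dependency-free sketch follows; the function name is illustrative, and whether the original analysis correlated raw or log-transformed fold enrichment is not specified here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Each element of `xs` and `ys` would be the fold enrichment of one peak region in sample one and sample two, respectively.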
In vitro eCLIP
Cloning and purification of recombinant hnRNP B1 was performed as
previously described (Meredith et al., 2016). HOTAIR and control RNA were in vitro transcribed using the MEGAscript T7 Transcription Kit, treated with TURBO DNase, and purified with the Qiagen RNeasy Kit. At a 1:10 RNA:protein molar ratio, 1.2 µg of control or HOTAIR RNA was incubated with recombinant B1 in RNA refolding buffer (20 mM HEPES-KOH pH 7.9, 100 mM KCl, 0.2 mM EDTA pH 8.0, 20% glycerol,
0.5 mM PMSF, 0.5 mM DTT) for 20 minutes at room temperature. The mixture was UV-crosslinked twice at 250 mJ and 254 nm wavelength, with mixing by pipette in between. B1-RNA crosslinked and non-crosslinked samples were treated with RNase
A for 3 minutes at 37ºC and 1200 rpm, then stopped with RNase inhibitor. Following this, the in vitro samples were subjected to end repair, adaptor ligation, SDS-PAGE and transfer to nitrocellulose, and the remainder of the eCLIP-seq protocol, then sequenced in multiplex with other eCLIP-seq libraries.
Cellular Fractionation of MCF7 Cells
Cellular fractionation into cytoplasm, nucleoplasm, and chromatin
subcompartments was performed as previously described (Wysocka et al., 2001). Each
fraction was raised to 1 mL final volume using Buffer D (20 mM HEPES pH 7.5, 210
mM NaCl, 7.5% glycerol, 0.75 mM MgCl2, 0.25 mM PMSF, and 1x protease inhibitor) then incubated with TURBO DNase (40U for chromatin sample, 10U for soluble samples) for 30 minutes at 37ºC, then quenched with 10 mM EDTA.
Fractions were then further fragmented by Bioruptor sonication and RNase treatment (but no additional
DNase) as specified in the original eCLIP protocol, then immunoprecipitated overnight with antibody to hnRNP B1 (IBL 18941, 2.5 µg/IP). The rest of the eCLIP protocol was performed identically to other samples.
RNA Isolation and PCR
RNA was isolated with 500 µL TRIzol (Life Technologies) followed by purification by RNeasy kit (QIAGEN). Samples were DNase treated using the
TURBO DNase kit (Ambion). Two µg of each RNA sample was reverse transcribed using a cDNA High Capacity Kit (Life Technologies). cDNA was PCR amplified using
Phusion polymerase and 1 minute extension times to accurately amplify reads over 1 kb in length.
Results
The hnRNP B1 Exon is Well-Conserved and Expressed in Mouse and
Human
The 36-nucleotide exon specific to hnRNP B1 is included in approximately
10% of A2/B1 transcripts in most human tissues. This B1-specific region has also been identified in the mouse and rat hnRNP A2/B1 transcript, along with transcripts
corresponding to other minor pseudogenes processed from the same locus (Hatfield
et al., 2002; Kamma et al., 1999). However, this exon is not annotated as being part
of any hnRNP A2/B1 transcript in the mouse RefSeq or Ensembl databases. An
analysis of the hnRNP A2/B1 genomic locus across several species indicates that the
B1 exon is in fact highly conserved across the eutherian lineage, with perfect
nucleotide sequence identity to human in the mouse, rat, cow, and elephant genomes
but no identifiable syntenic region to the B1 exon in the opossum genome (Figure
2.5a). This is better conservation than is displayed by the next downstream exon, which is also well-conserved but only has ~90% sequence identity between these species.
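The conservation comparison can be reproduced on the sequences shown in Figure 2.5a. The sketch below computes percent identity to human over the 36-nucleotide B1-specific exon and over the first 41 nucleotides of the next downstream exon, which is the only portion of that exon visible in the alignment. The sequences were transcribed from the figure for human, mouse, and elephant; the dictionary names are illustrative.

```python
def percent_identity(reference, other):
    """Percent of aligned positions at which two sequences are identical."""
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(reference, other) if a == b)
    return 100.0 * matches / len(reference)

# B1-specific exon (36 nt), transcribed from Figure 2.5a.
B1_EXON = {
    "human":    "AAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAG",
    "mouse":    "AAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAG",
    "elephant": "AAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAG",
}

# First 41 nt of the next downstream exon, transcribed from Figure 2.5a.
DOWNSTREAM_EXON_5P = {
    "human":    "AGAGAAAAGGAACAGTTCCGTAAGCTCTTTATTGGTGGCTT",
    "mouse":    "AGAGAAAAGGAACAGTTCCGAAAGCTCTTTATTGGTGGCTT",
    "elephant": "AGAGAAAAGGAACAGTTTCGTAAACTCTTCATTGGTGGCTT",
}

def identity_to_human(exon):
    """Percent identity of each non-human species to the human sequence."""
    return {species: percent_identity(exon["human"], seq)
            for species, seq in exon.items() if species != "human"}
```

Because only the start of the downstream exon appears in the figure, identities computed on that fragment need not match the roughly 90% figure quoted for the full exon.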
In order to confirm whether or not the B1-specific exon is transcribed in non-human species, we first performed an immunoprecipitation, with an antibody designed to be specific to the human hnRNP B1 unique N-terminal region, on whole-cell protein lysate from mouse C2C12 cells or human MCF7 cells. This IP retrieved hnRNP B1 in both cell types, but a minimal amount of hnRNP A2 (Figure 2.5b). We also confirmed that the B1-specific exon is transcribed in C2C12 cells using RT-PCR primers specific to the 5′ UTR and the downstream exon, creating an amplicon spanning the exon (Figure 2.5c).
eCLIP Reveals the Distribution of B1 and A2/B1 RNA Binding
Methods that examine direct protein-RNA interactions by UV crosslinking and immunoprecipitation (CLIP) are commonly used to identify transcripts bound to proteins of interest. Recently the eCLIP method was developed, which has the advantages of individual nucleotide resolution, requiring less library amplification, normalization of the IP sample against a size-matched input (SMInput), and
inclusion of a computational analysis pipeline (Van Nostrand et al., 2016). We
modified the protocol slightly, replacing the cell lysis step with a nuclear isolation
previously used to isolate chromatin-associated CTCF-RNA complexes (Kung et al.,
2015). We performed eCLIP using antibodies to either hnRNP B1 specifically, or
hnRNP A2/B1 in combination, using MCF7 breast adenocarcinoma cells. MCF7 cells
were chosen since much of the prior characterization of HOTAIR has been
performed in MCF7 cells (Gupta et al., 2011; Yu et al., 2017), they have been shown
to express high levels of hnRNP A2/B1, and have been well-characterized by the
ENCODE Project (Dunham et al., 2012). We also performed a parallel experiment
using MCF10A cells, in order to facilitate comparison between MCF7 cells and a non-
tumorigenic counterpart from the same organ system (Soule et al., 1990).
The eCLIP experiments for hnRNP A2/B1 and B1 produced transcriptome-
wide maps of A2/B1 and B1 binding sites. We identified peaks of sequencing read
buildup enriched over SMInput, with significant peaks being defined as those with
low p-value (p < 10^-5). In total, we identified 7,626 A2/B1 peaks significant in both
replicates, and 14,725 B1 peaks significant in both replicates. These peaks were
highly correlated between replicates (A2/B1 r = 0.87; B1 r = 0.87), but less so using
the different antibodies (r = 0.69; Figure 2.7). Comparisons between replicates also
had lower irreproducible discovery rate (IDR) scores than comparisons between IPs
(Figure 2.6a). Although most peaks were shared between A2/B1 and B1 libraries, we identified a subset of peaks specific to the B1 library (Figure 2.6b).
To further investigate the differences between A2/B1 and B1 RNA binding, we investigated the transcripts containing either A2/B1 or B1 input normalized peaks.
Of the 54,064 transcripts annotated in the RefSeq database, 2,850 contained peaks
enriched over input and conserved between replicates in either the A2/B1 or B1
experiment. A majority of these transcripts (1,472) shared peaks in both A2/B1 and
B1 experiments; however, a minority of transcripts were unique to either A2/B1
(479) or B1 (899) experiment. Fewer transcripts (412) contained binding peaks in
exons or UTRs, with a significant proportion of those (148) unique to the A2/B1
experiment. These results suggest that the A2 isoform may have more of a role in
binding mature mRNAs than the B1 isoform (Figure 2.6c).
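The transcript counts above form a three-way partition that is straightforward to compute with set operations. The sketch below shows the partition on hypothetical transcript identifiers and then checks the reported counts. The interpretation that the percentages in Figure 2.6c are fractions of each antibody's bound transcripts is an inference from the numbers, not something stated in the text.

```python
def partition_transcripts(a2b1_ids, b1_ids):
    """Split transcripts into A2/B1-only, shared, and B1-only sets."""
    a2b1, b1 = set(a2b1_ids), set(b1_ids)
    return {"a2b1_only": a2b1 - b1, "shared": a2b1 & b1, "b1_only": b1 - a2b1}

# Counts reported in the text for all transcripts.
A2B1_ONLY, SHARED, B1_ONLY = 479, 1472, 899

total_bound = A2B1_ONLY + SHARED + B1_ONLY
# Assumed reading of Figure 2.6c: antibody-specific transcripts as a
# percentage of all transcripts bound in that antibody's experiment.
pct_a2b1_specific = 100.0 * A2B1_ONLY / (A2B1_ONLY + SHARED)
pct_b1_specific = 100.0 * B1_ONLY / (B1_ONLY + SHARED)
```

With these counts, `total_bound` recovers the 2,850 transcripts reported above, which confirms that the three categories are mutually exclusive.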
The distribution of input normalized peaks in both A2/B1 and B1 eCLIP
experiments roughly mirrored the genomic distribution of exons, UTRs, and introns;
however, we identified a shift in peak frequency towards regions of introns within 2
kilobases of splice junctions. Non-tumor MCF10A breast cells demonstrated a
similar profile as MCF7 cells, with a slight overrepresentation of exonic peaks as
compared to both the genomic distribution and the MCF7 eCLIP (Figure 2.6d).
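The proximal versus distal intron distinction can be expressed as a simple distance rule. In the sketch below, an intronic peak is called proximal when it lies within 2 kilobases of either end of its intron. The coordinate convention (0-based, half-open intervals, with distance measured to the nearest intron boundary) and the function name are assumptions for illustration.

```python
def classify_intronic_peak(peak_pos, intron_start, intron_end, window=2000):
    """Label an intronic position 'proximal' or 'distal' to a splice junction.

    The intron spans [intron_start, intron_end) in 0-based coordinates.
    A peak is proximal if its distance to the nearer intron boundary is
    less than `window` nucleotides.
    """
    if not intron_start <= peak_pos < intron_end:
        raise ValueError("peak does not fall inside the intron")
    distance = min(peak_pos - intron_start, intron_end - 1 - peak_pos)
    return "proximal" if distance < window else "distal"
```

Introns shorter than twice the window are entirely proximal under this rule, which is one reason the proximal category can be overrepresented in genes with short introns.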
Motif analysis revealed a preference for G-rich sequences in hnRNP A2/B1
binding sites. Previous analyses have suggested that hnRNP A2/B1 has a preference
to bind UAGGG motifs in RNA, such as those found in telomeric RNA (McKay and
Cooke, 1992); a similar UAGG motif has been identified using iCLIP in mouse spinal cord cells (Martinez et al., 2016). However, HITS-CLIP of A2/B1 in 293T cells did not identify a similar motif (Huelga et al., 2012). Our data indicate that, although
hnRNP A2/B1 might have particular affinity for (UAGGG)n sequences, it also binds a variety of AGG-rich sequences. Motif analysis identifies a slight difference in binding preference between A2/B1 and B1. B1 displays a strong enrichment for (AGG)n motifs in both MCF7 and MCF10A cells, while A2/B1 appears to have a weaker preference for AG-rich regions (Figure 2.6e).
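Once candidate motifs are known, peak sequences can be scanned for them directly. The sketch below counts UAGGG occurrences and measures the longest (AGG)n run in a sequence. It is a targeted scan for the motifs named above, not a substitute for the de novo discovery performed by DREME, and the function names are illustrative.

```python
import re

def to_rna(seq):
    """Upper-case a sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")

def count_uaggg(seq):
    """Number of UAGGG occurrences in the sequence."""
    return len(re.findall(r"UAGGG", to_rna(seq)))

def longest_agg_run(seq):
    """Largest n such that (AGG)n occurs as an uninterrupted repeat."""
    runs = re.findall(r"(?:AGG)+", to_rna(seq))
    return max((len(run) // 3 for run in runs), default=0)
```

Comparing the distribution of `longest_agg_run` between B1 and A2/B1 peak sequences would be one way to quantify the difference in (AGG)n preference described above.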
Binding of the lncRNA HOTAIR by hnRNP A2/B1
The eCLIP profiles of HOTAIR identified a strong hnRNP A2/B1 binding site
downstream from the previously identified PRC2 binding site (Tsai et al., 2010; Wu
et al., 2013), which is likely responsible for the input signal identified in the first
three exons (Figure 2.8a). The A2/B1 binding site is inside the third intron of
HOTAIR, a surprising finding since HOTAIR is generally thought to function in trans as a well-defined spliced transcript. Accordingly, our prior experiments identifying HOTAIR-A2/B1 interactions used in vitro transcribed, fully spliced
HOTAIR. To begin to validate this result, we designed PCR primers spanning either
intron 3 or intron 2. By RT-PCR of MCF7 total RNA, we found that intron 3 is
retained in a fraction of HOTAIR transcripts, while intron 2 is undetectable (Figure
2.8b). This raises the question as to why this particular intron is being specifically
retained, and whether or not that is related to its being bound by A2/B1.
Intron 3 is located in a region of HOTAIR that contains multiple predicted
RNA-RNA interaction sites between HOTAIR and transcripts of its target genes
(Gupta et al., 2011; Meredith et al., 2016). Interestingly, inclusion of intron 3
disrupts an RNA-RNA interaction site between the HOTAIR and JAM2 transcripts
(Figure 2.8c). Multiple RNA-RNA interaction sites between HOTAIR and HOXD
also exist immediately upstream of the intron 3 splice site, close to a UAGGG motif
that could act as an A2/B1 binding site. The existence of a strong A2/B1 binding site
so close to these predicted RNA-RNA interaction sites provides support for the
model of A2/B1 as a “matchmaker” protein, but also raises the question of whether
this role might be regulated by alternative splicing of intron 3 of HOTAIR.
In order to test how A2/B1 might bind HOTAIR in the absence of intron 3, we
performed eCLIP on in vitro spliced HOTAIR transcript cross-linked with purified recombinant hnRNP B1 (Figure 2.8d). This in vitro eCLIP experiment revealed two strong binding peaks within HOTAIR (Figure 2.8e). The first binding site is located in exon 1 of HOTAIR, which, according to the proposed secondary structure of
HOTAIR (Somarowthu et al., 2015), overlaps with a region that basepairs with the
RNA-RNA interaction hotspot of HOTAIR (Figure 2.9). The other binding site resides in exon 5, and, while not complementary to the JAM2 interaction site, is proximal to that site in three-dimensional space and may localize to it in fully folded
HOTAIR. These results suggest that A2/B1 may bind HOTAIR at intron 3 and,
after the intron is spliced out, remain bound at regions proximal to RNA-RNA
interaction sites.
Binding of hnRNP A2/B1 to Long Noncoding RNAs
We analyzed the eCLIP experiments for other lncRNAs, besides HOTAIR, that
A2/B1 interacts with. A2/B1 displays strong binding to a small fraction of lncRNAs,
catalogued in both a commonly used lncRNA database (Cabili et al., 2011) and a
recently published database of lncRNAs expressed in MCF7 cells (Sun et al., 2015)
(Figure 2.10a). For example, there is a series of four A2/B1 binding sites in Xist, the
lncRNA that contributes to dosage compensation through inactivation of one X
chromosome in mammals (Augui et al., 2011). Xist spreads in cis along with PRC2
across one X chromosome, leading to repression of nearly all genes on that
chromosome (Simon et al., 2014). The 5′ end of the mouse Xist transcript contains a
1.6-kilobase region known as RepA, which has previously been shown to recruit
proteins including PRC2 (Zhao et al., 2010), and fold into three independent
structural modules (Liu et al., 2017a). The downstream portion of this region, which is structurally conserved between mouse and human (Yen et al., 2007), contains four strong, reproducible A2/B1 binding sites in our MCF7 eCLIP dataset. These binding sites lie immediately downstream of iCLIP-derived binding sites for other RNA-binding proteins, RBM15 and RBM15b (Figure 2.10b). RBM15 and RBM15b have been shown to mediate the formation of m6A methylation on Xist (Patil et al., 2016).
HnRNP A2/B1 has been proposed to be a reader of m6A marks (Alarcón et al., 2015), though whether there is a direct physical association between the protein and modified base is not clear.
We also identified an hnRNP A2/B1 peak in the lncRNA NORAD (Figure
2.10c), which has been shown to regulate genomic stability through sequestration of
PUM1 and PUM2 proteins (Lee et al., 2016; Tichon et al., 2016), resulting in more efficient DNA repair. Our identified A2/B1 binding site is near the 3′ end of NORAD, overlapping a potential PUM2 binding site identified by eCLIP in K562 cells (Van
Nostrand et al., 2016), but not overlapping any UGURUAUA PUM2 consensus sequences. NORAD contains no introns, highlighting the fact that A2/B1 makes interactions with RNA that do not involve splicing.
Another lncRNA containing an A2/B1 peak is TUG1, which is a Notch pathway-regulated lncRNA that functions in the maintenance of stemness in glioma stem cells through recruitment of PRC2 to neuronal differentiation genes
(Katsushima et al., 2016; Khalil et al., 2009). TUG1 has also been identified as a potential biomarker in a variety of different cancers (Li et al., 2016b). The hnRNP
A2/B1 binding site appears in the intron immediately following the proposed PRC2-interacting region of TUG1 (Figure 2.10d).
The hnRNP B1 Binding Profile Differs in Each Cellular Compartment
Although hnRNP A2/B1 is predominantly found in the nucleus, we decided to
compare how its RNA-binding properties change when found in the cytoplasm. To
this end, we performed separate B1 eCLIP experiments on samples split, using a subcellular fractionation protocol (Wysocka et al., 2001), into cytoplasmic, nucleoplasmic, and chromatin-associated fractions. iCLIP of the RNA-binding protein TDP-43 has previously been performed in the nucleus versus cytoplasm of neuronal primary cells and SH-SY5Y cells (Tollervey et al., 2011), detecting a shift in binding towards the 3′ UTR of transcripts in the cytoplasmic fraction.
Our eCLIP experiment detected far more significant B1 binding peaks in the chromatin sample (28,539) than in the nucleoplasm (230) or cytoplasm (162). The majority of the nucleoplasmic and cytoplasmic binding sites were also found in the chromatin-associated sample (Figure 2.11a). Of the remaining nucleoplasm- and cytoplasm-specific binding sites, very few were in protein-coding sequence, with the vast majority found instead in proximal introns and UTRs, indicating a potential regulatory role on nuclear-exported RNA (Figure 2.11b).
One example of a transcript with distinct B1-RNA interactions in the soluble,
non-chromatin fractions is the SEC14L1 transcript. While B1 binds to the SEC14L1 3′
UTR in all fractions, the signal in the soluble fractions is both stronger than and
shifted as compared to the chromatin-specific binding peaks (Figure 2.11c),
suggesting that B1 binding sites can change as a message matures, or that
localization is correlated with a specific B1-RNA interaction. We also detected
nucleoplasm/cytoplasm-enriched binding to a number of small Cajal body-specific
RNAs (scaRNAs), which are a family of small transcripts that localize to Cajal bodies,
nuclear organelles that are involved in the biogenesis of small nuclear ribonucleoproteins (snRNPs) (Figure 2.11d). In particular, scaRNAs are thought to act as guide RNAs in the modification of spliceosomal RNAs (Darzacq et al., 2002).
A splice isoform of A2/B1 missing exons 7-9, hnRNP A2*, has been shown to interact with telomerase at Cajal bodies (Wang et al., 2012).
Discussion
Our profiling of A2/B1 binding sites transcriptome-wide has led to a number of novel and interesting findings. We find that both A2 and B1 isoforms likely have some level of preferential association with many transcripts, potentially stemming from recognition of distinct motifs. The surprising manner in which A2/B1 encounters its lncRNA partner HOTAIR before HOTAIR is fully processed contributes to the model of RNA-RNA matchmaking that we have previously proposed (Meredith et al., 2016). We identified novel lncRNA interactions with
A2/B1, the positions of which are suggestive of ways that A2/B1 can contribute to known mechanisms of lncRNA activity. Finally, we describe the strong preferential association of A2/B1 with chromatin-associated RNAs rather than those in the soluble fraction of the nucleus.
Conservation of the hnRNP B1 Isoform
To date, functional studies of the hnRNP A2 and B1 isoforms have, with rare exception (Kamma et al., 1999), treated the two isoforms as a single unit. However, we have shown that in certain circumstances the B1 isoform has a distinct set of RNA binding sites, which are likely due to contributions of direct interactions between the
B1-specific N-terminal region and the target RNA. The proximity of the B1-specific
exon to the RRMs suggests that the B1-specific exon is able to regulate the activity of
the RRMs when bound to RNA.
The strong conservation of the sequence of the B1-specific exon as compared
to surrounding sequence strongly implies that it has functional significance, leading
to its being subjected to stabilizing selection across a number of species.
Interestingly, the strong conservation of the B1-specific protein sequence extends to conservation of the RNA and DNA sequence of the B1-specific exon, which also has identical sequence across eutherians. While this portion of the B1 transcript does not appear to be heavily protein-bound according to the eCLIP input libraries, it is possible that it is transiently bound by splicing factors that require a particular sequence in order to generate the ideal proportions of A2 and B1.
Potential Roles of hnRNP A2/B1 on HOTAIR
The shift in hnRNP A2/B1 binding towards proximal introns, as compared to the overall transcriptome, is indicative of the role that A2/B1 is known to play in the regulation of splicing. Our work also expands the knowledge of A2/B1 RNA interactions, with particular implications for regulation and function outside of splicing.
We identified a number of transcripts that contain hnRNP A2-specific or B1-specific binding sites, which include a number of different classes of genes. It is possible that the B1-specific exon interacts with the downstream RRMs or the RNA directly, slightly changing the preferred sequence of the bound RNA. This in turn would allow for the diversification of transcript targets that are regulated by products of a single gene.
As a known splicing regulator, it makes sense for A2/B1 to bind near exon-intron
junctions, as it does in HOTAIR. In fact, such binding may be responsible for the
small fraction of intron 3 that is retained in the HOTAIR transcript in total RNA. We
suspect that binding to intron 3 may promote the formation of RNA-RNA
interactions from the region surrounding the retained intron. However, our evidence
that A2/B1 can also bind fully spliced HOTAIR suggests that A2/B1 is capable of
binding elsewhere in HOTAIR under certain circumstances.
Our in vitro eCLIP experiment suggests a model in which A2/B1 preferentially
binds to intron 3 of HOTAIR, regulating its splicing so as to stay bound. However,
under certain circumstances intron 3 is spliced out, resulting in redistribution of
A2/B1 to nearby sites on HOTAIR. The removal of intron 3, combined with A2/B1
binding nearby, facilitates association of the RNA-RNA interaction hotspot with
other target RNA transcripts (Figure 2.12). This suggests that the other functions of
A2/B1 may act in concert with its alternative splicing function in order to affect target RNAs. Such a multifunctional role has recently been suggested for a number
of transcription factors that also regulate alternative splicing (Han et al., 2017).
Interactions between A2/B1 and Additional lncRNAs
The binding of A2/B1 to NORAD, TUG1, and Xist suggests that it may promote interactions between those RNAs and other proteins. The A2/B1 binding site on
NORAD overlaps a PUM2 binding site, while on TUG1 and Xist A2/B1 binds downstream of binding sites for other proteins. We suspect that, as in the matchmaker model, A2/B1 often acts in concert with other proteins on RNA to facilitate cellular activities.
While the activities of hnRNP A2/B1 discussed thus far have been chromatin-associated,
we also identified B1-RNA interactions that occur in the nucleoplasm and
cytoplasm. These include nucleoplasm/cytoplasm-specific interactions, such as the
binding of B1 to scaRNAs that suggests localization of B1 to Cajal bodies, which we
suspect allows B1 to promote RNA-RNA interactions between scaRNAs and their
targets. We also identified B1 binding sites that are shifted depending on the cellular compartment on SEC14L1, which we hypothesize to be a reflection of how B1 activity can be regulated based on its local cellular environment.
Our transcriptome-wide study of A2/B1 RNA binding partners provides many examples demonstrating that A2/B1 has more roles than simply acting as a splicing regulator. We identify a number of cases in which A2/B1 appears to bind RNA in concert with other proteins, as well as a site in HOTAIR in which the splicing function of A2/B1 is used as a precursor to subsequent activities on the transcript. These results solidify the view that A2/B1 is a multifunctional RNA-binding protein with a diverse set of roles across the transcriptome.
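[Figure 2.1 image: four panels comparing a) HITS-CLIP (254 nm UV), b) PAR-CLIP (365 nm UV; 4SU→C), c) iCLIP (254 nm UV), and d) eCLIP (254 nm UV). Step labels in the schematic: IP, 3′ adapter ligation, protein gel size selection; 5′ adapter ligation, reverse transcription.]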
Figure 2.1: CLIP Methods Comparison a) HITS-CLIP relies on read-through of reverse transcriptase at crosslink sites, leading to relatively lower RNA yield. b) PAR-CLIP uses photoactivatable nucleosides (e.g. 4-thiouridine) that are activated by exposure to long-wave UV at protein-RNA interaction sites. This method still relies on reverse transcriptase read-through, but allows for more accurate crosslink site identification. c) iCLIP uses cDNA circularization to avoid a reliance on read-through, instead generating a cDNA fragment that ends at the crosslink site. This allows for single-nucleotide resolution without addition of extra nucleosides. d) eCLIP uses some methodological alterations to achieve the single-nucleotide resolution of iCLIP without the circularization step, greatly increasing cDNA yield.
[Figure 2.2 image: radiolabeled CLIP blot. Lanes: CLIP B1 #1, CLIP B1 #2, M-X protein ladder, CLIP A2B1 #1, CLIP A2B1 #2. Ladder markers: 130, 93, 70 (pink), 53, 41, 30, 22 (green), 14, and 9 kD.]
Figure 2.2: A2/B1 Antibody Tests Both abcam A2/B1 and IBL B1 antibodies effectively retrieve RNA-protein complexes, as shown by radiolabeled blot. Red lines indicate potential excision location for CLIP experiments.
[Figure 2.3 image: flowchart with panels a through i. Step labels: protein-RNA complex (UV-crosslinked); nuclear isolation and fragmentation; immunoprecipitation; dephosphorylation; 3′ linker ligation; protein size selection; proteinase K RNA isolation; reverse transcription; RNA removal; adaptor ligation; PCR amplification; size selection; sequencing.]
Figure 2.3: eCLIP Experimental Procedure Flowchart of eCLIP experimental steps
1. Input: fastq file of sequencing reads, demultiplexed into different samples by the sequencing core.
2. Remove 5′ and 3′ adaptors using the cutadapt program, yielding a fastq file of partially trimmed reads.
3. Remove double-ligated 3′ adaptors using the cutadapt program, yielding a fastq file of trimmed reads.
4. Align to and remove repetitive regions (regions provided by the Yeo lab) using the STAR program, yielding a bam file of reads aligning to repetitive regions and a bam file of reads aligning to non-repetitive regions.
5. Remove PCR duplicates, yielding a bam file of unique reads aligning to non-repetitive regions.
6. Sort and index the bam file, yielding the final bam file of aligned reads for downstream analysis.
7. Generate bigwig files for upload to the genome browser view.
8. Call peaks using the clipper program, yielding a file of peak regions.
9. Compare read density in IP vs. input, yielding input-normalized peaks.
10. Filter input-normalized peaks to the p-value threshold (p < 10^-5), yielding input-normalized peaks for analysis and for upload to the genome browser view.
Figure 2.4: eCLIP Computational Pipeline Flowchart
[Figure 2.5 image, panel a: genomic alignment spanning the B1-specific exon (encoded protein: K T L E T V P L E R K K) and the next downstream exon, Exon 3 (encoded protein: R E K E Q F R K L F I G G L). The per-column conservation marks did not survive extraction.]

Human     AGAAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGGTACTCTGCCAGCA-GGTCACC
Mouse     AGAAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGGTACTCTACCAGCA-GGTCACC
Rat       AGAAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGGTACTCTGCCAGCA-GGTCACC
Cow       AGAAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGGTACTCTGCCAGCA-GGTCACC
Elephant  AGAAAACTTTAGAAACTGTTCCTTTGGAGAGGAAAAAGGTACTCTGCCAGCA-GGTCACC
Opossum   TGGAAATTCATCAATCTTCAGCTACTAAGTTACATCAGGCTGCCCAGAAGCACAAAAGCC

Human     TCATATT--TAAGAATTTAATTTCCTGCATACAAAGAGGAAAATGTAAATAAAAATTGAA
Mouse     TCATATT--TAAGAATTTAATTTCCTGCATACAAAGA---AAGTGTAAATAAAAATTGAA
Rat       TCATATT--TAAGAATTTAATTTCCTGCATACAAAGAG-AAAATGTAAATAAAAATTGAA
Cow       TCATATT--TAAGAATTTTATTTCCAGCATACAAAAAG-AAAATGTAAATAAAAATTGAA
Elephant  TCATATT--TAAGAATTTAATTTCCTGCATACAGAGAG-AAAATGTAAATAAAAATTGAA
Opossum   TCTGAATGATAATAGCTGGAGCTTCCGTAGGAAGGGAG-CCTGAACATAATCCCGTTGCT

Human     ATGGTATTTTCCTTTGCAGAGAGAAAAGGAACAGTTCCGTAAGCTCTTTATTGGTGGCTT
Mouse     ATGGTATTTCCCTTTGCAGAGAGAAAAGGAACAGTTCCGAAAGCTCTTTATTGGTGGCTT
Rat       ATGGTGTTTTCCTTTGCAGAGAGAAAAGGAACAGTTCCGTAAGCTCTTTATTGGTGGCTT
Cow       ATGCTGTTTTCCTTTGCAGAGAGAAAAGGAACAATTCCGTAAACTCTTTATTGGTGGCTT
Elephant  ATGGTGTTTTCCTTTCCAGAGAGAAAAGGAACAGTTTCGTAAACTCTTCATTGGTGGCTT
Opossum   GTCCACTTTTTGTTTGCAGAGAGAAAAGGAACAGTTCCGCAAACTGTTTATTGGAGGCCT

[Figure 2.5 image, panel b: protein blots for MCF7 (human) and C2C12 (mouse) cells with IP (20%) and Input (1%) lanes, size markers at 41 kD and 30 kD, and B1 and A2 bands indicated.]
[Figure 2.5 image, panel c: RT-PCR gel for MCF7 and C2C12 cells with 150 bp and 100 bp markers, and B1 (156 bp) and A2 (120 bp) amplicons indicated. Schematic labels: 5′ UTR (156 bp), Exon 1 (6 bp), Exon 2 (36 bp), Exon 3 (111 bp).]
Figure 2.5: Conservation of B1-Specific Exon Across Species a) Multiple sequence analysis of hnRNP B1 genomic sequence in human, mouse, rat, cow, elephant, and opossum reference sequences, with B1-specific exon and next downstream exon highlighted, demonstrating high degree of conservation of the B1-specific exon in all eutherian species. Mismatches from human sequence are colored in red. b) Immunoprecipitation using antibody specific to hnRNP B1 (IBL #18941) specifically retrieves hnRNP B1 in both human and mouse samples. c) RT-PCR primers surrounding B1-specific exon 2 demonstrate inclusion of B1 exon in total RNA from both human MCF7 and mouse C2C12 cells.
[Figure 2.6 image not reproduced. Panel a: IDR (0.0 to 0.6) plotted against number of significant peaks for A2/B1 (replicate A) vs. B1 (replicate A), A2/B1 (replicate A) vs. A2/B1 (replicate B), and B1 (replicate A) vs. B1 (replicate B). Panel b: genome browser tracks at Hivep3 (NM_001127714) for eCLIP A2/B1 Replicate A, eCLIP A2/B1 SMInput, eCLIP B1 Replicate A, and eCLIP B1 SMInput. Panel c: overlap of bound transcripts, summarized below. Panel d: genomic distribution of hnRNP A2/B1 and hnRNP B1 binding in MCF7 and MCF10A cells across 5′ UTR, exon, proximal intron, distal intron, and 3′ UTR. Panel e: motifs for A2/B1 and B1 in MCF7 and MCF10A cells.]

Panel c counts (human), as printed on the figure:
Category          A2/B1 only    Shared    B1 only
All transcripts   479 (25%)     1472      899 (38%)
Introns           447 (25%)     1389      884 (39%)
Exons/UTR         148 (45%)     183       81 (30%)

Figure 2.6: A2/B1 and B1 eCLIP Results in MCF7 Cells a) IDR analysis indicating lower IDR values for replicates compared against one another (A2/B1, black; B1, red) than comparison of replicates from different experiments (purple). b) Example hnRNP A2/B1 eCLIP data, demonstrating region of the transcription factor Hivep3 that contains binding sites (black) enriched specifically in hnRNP B1 (blue). Y-axis scale is normalized to reads per million. c) Analysis of RefSeq transcripts containing input normalized peaks in both replicates of A2/B1 or B1 eCLIP-seq experiments, including number of transcripts unique to either IP. d) Distribution of peaks (conserved between both replicates) in different areas of transcripts in hnRNP A2/B1 and B1 eCLIP experiments, compared to experiments performed in non-tumorigenic MCF10A cells. “Proximal introns” are defined as intronic regions within 2 kb of an exon. e) Top two identified motifs in both replicates of hnRNP A2/B1 and B1 eCLIPs in MCF7 and MCF10A cells.
[Figure 2.7 image not reproduced. Panels a-c: MCF7 A2/B1 replicates, MCF7 B1 replicates, and MCF7 IP comparison; correlation values printed across these panels, in order: r = 0.87, 0.87, 0.69, 0.95, 0.89, 0.65. Panels d-f: MCF10A A2/B1 replicates, MCF10A B1 replicates, and MCF10A IP comparison; correlation values printed across these panels, in order: r = 0.79, 0.90, 0.85, 0.67, 0.97, 0.86. Panel g: IDR (0.0 to 0.6) plotted against number of significant peaks for A2/B1 (replicate A) vs. B1 (replicate A), A2/B1 (replicate A) vs. A2/B1 (replicate B), and B1 (replicate A) vs. B1 (replicate B).]
Figure 2.7: A2/B1 and B1 MCF7 Replicate Correlation a) Comparison of MCF7 A2/B1 eCLIP replicates displays strong correlation. b) Comparison of MCF7 B1 eCLIP replicates displays strong correlation. c) Comparison of MCF7 A2/B1 eCLIP replicate versus B1 replicate displays relatively weaker correlation. d) Comparison of MCF10A A2/B1 eCLIP replicates. e) Comparison of MCF10A B1 eCLIP replicates. f) Comparison of MCF10A A2/B1 eCLIP replicate versus B1 replicate. g) IDR analysis of MCF10A A2/B1 replicates (black), B1 replicates (red), or A2/B1 versus B1 replicates (purple).
[Figure 2.8 image not reproduced. Panel a: genome browser tracks over HOTAIR (NR_003716) at 2 kb and 500 bp scale for eCLIP A2/B1 Replicate A, Replicate B, and SMInput and eCLIP B1 Replicate A, Replicate B, and SMInput, with the PRC2 interaction region marked. Panel b: RT-PCR gel with Invitrogen 50 bp ladder, intron 3 primers, and intron 2 primers; bands labeled intron 2 retained, intron 3 retained, and introns spliced out. Panel c: eCLIP B1 Replicate A track over HOTAIR with the predicted A2/B1 UAGG binding motif and the RNA-RNA interaction hotspot marked. Panel d: schematic showing recombinant hnRNP B1 and in vitro transcribed HOTAIR or control, crosslinking, then size selection and remainder of eCLIP protocol. Panel e: tracks over HOTAIR for in vitro eCLIP HOTAIR/B1 crosslinked, HOTAIR/B1 non-crosslinked, and anti-luc/B1 crosslinked, alongside eCLIP A2/B1 Replicate A and SMInput.]
Figure 2.8: Investigation of a Novel A2/B1 Binding Site in HOTAIR a) Location of an hnRNP A2/B1 eCLIP peak in an intron of HOTAIR, downstream of the proposed PRC2 binding site (visible in the input library). b) The intron between exons 3 and 4 of HOTAIR retained in a small fraction of HOTAIR MCF7 total RNA. c) Intron 3 disrupts an RNA-RNA interaction site between HOTAIR and JAM2, is adjacent to an interaction site between HOTAIR and HOXD regions, and is near a predicted UAGGG A2/B1 binding motif. d) Schematic of in vitro eCLIP experiment using recombinant B1 protein and in vitro transcribed HOTAIR. e) In vitro eCLIP of B1 and either HOTAIR or negative control anti-luciferase RNA identified strong binding sites in exon 1, and in exons 5-6 of HOTAIR.
[Figure 2.9 image not reproduced. Secondary structure of HOTAIR domain 1, annotated with B1 in vitro eCLIP peaks, the positions of introns 1 through 4, and the RNA-RNA interaction hotspot.]
Figure 2.9: In vitro B1 eCLIP Binding Sites Mapping of in vitro eCLIP B1 binding peaks (blue) to HOTAIR secondary structure (Somarowthu et al. 2015), showing comparison to RNA-RNA interaction hotspot (green).
[Figure 2.10 image not reproduced. Panel a: peaks conserved between replicates (7626 A2/B1 peaks; 14725 B1 peaks) compared against the Cabili et al. lncRNA database (5739 transcripts), giving 34 A2/B1 transcripts and 48 B1 transcripts, and against the Sun et al. lncRNA database (4877 transcripts), giving 20 A2/B1 transcripts and 21 B1 transcripts. Panel b: genome browser tracks over Xist (NR_001564) at 10 kb and 500 bp scale for eCLIP A2/B1 and B1 replicates and SMInput, with iCLIP RBM15 and iCLIP RBM15b tracks. Panel c: tracks over NORAD (NR_027451) with PUM2 motifs (UAURUAUA) and PUM2 eCLIP peaks (K562, replicates 1 and 2). Panel d: tracks over TUG1 (NR_110492) with a SINE element marked.]
Figure 2.10: Binding of lncRNAs by A2/B1 a) hnRNP A2/B1 interacts with a small number of previously-identified lncRNAs. b) hnRNP A2/B1 binding sites within the RepA region of Xist. RBM15 and RBM15b iCLIP tracks published in Patil et al. 2016. c) An hnRNP A2/B1 binding site within the lncRNA NORAD. PUM2 eCLIP replicate peaks retrieved from ENCODE Project experiment accession ENCSR661ICQ (van Nostrand et al. 2016). d) An hnRNP A2/B1 binding site within an intron of the lncRNA TUG1.
[Figure 2.11 image not reproduced. Panel a: Venn diagram of binding peaks in chromatin, nucleoplasm, and cytoplasm (counts printed on the figure: 28,316; 73; 134; 16; 15; 5; 7). Panel b: distribution of peaks across 5′ UTR, exon, proximal intron, distal intron, and 3′ UTR for MCF7 chromatin, MCF7 nucleoplasm, and MCF7 cytoplasm. Panel c: genome browser tracks over SEC14L1 (NM_001143998) at 2 kb scale for eCLIP MCF7 chromatin, nucleoplasm, and cytoplasm with their inputs. Panel d: the same tracks over SCARNA10 (NR_004387) at 100 bp scale.]
Figure 2.11: eCLIP of MCF7 Chromatin, Nucleoplasm, and Cytoplasm a) The vast majority of A2/B1 binding peaks were identified in the chromatin sample, with a small minority overlapping with a nucleoplasm or cytoplasm binding peak. A small fraction of binding peaks were unique to either nucleoplasm or cytoplasm. b) Compared to the chromatin sample, the nucleoplasm and cytoplasm peaks were far more likely to identify binding to either 5′ UTR or proximal intronic sequence. c) In the 3′ UTR of SEC14L1, there are two distinct binding peaks for B1. However, in nucleoplasm (purple) and cytoplasm (yellow), A2/B1 preferentially binds between the two chromatin peaks. d) A nucleoplasm-specific binding peak in the small Cajal body-associated RNA SCARNA10.
[Figure 2.12 image not reproduced. Panel a: B1 bound to HOTAIR. Panel b: intron 3 is spliced out, with B1 bound at two sites. Panel c: B1 bound at two sites.]
Figure 2.12: Model of B1 Interaction with HOTAIR a) hnRNP B1 preferentially binds HOTAIR intron 3. b) Following splicing of intron 3, B1 remains bound to nearby regions of HOTAIR. c) B1-HOTAIR binding promotes RNA-RNA interactions from hotspot region.
CHAPTER III
ROLES OF TDP-43 AND HNRNP A2/B1 IN MUSCLE DIFFERENTIATION
Introduction
Having adapted the eCLIP method to examine hnRNP A2/B1 binding in human cells, we next used this method to explore the activity of A2/B1 and TDP-43
in mouse muscle differentiation. As mentioned previously, mutations in A2/B1 can
lead to a condition known as multisystem proteinopathy, in which aggregates of
A2/B1 and TDP-43 are found in the cytoplasm (Kim et al., 2013a). A2/B1 has also
been implicated in a number of muscle-specific diseases, such as limb-girdle
muscular dystrophy (Bengoechea et al., 2015) and oculopharyngeal muscular
dystrophy (Fan et al., 2014). The A2/B1 paralog hnRNP A1 is required for proper
muscle development in mice, suggesting that A2/B1 may play a similarly important
role (Liu et al., 2017b).
Many A2/B1-associated muscle diseases feature mislocalized A2/B1
ribonucleoprotein (RNP) complexes, which are thought to interfere with the normal
RNA binding function of A2/B1 (Fan et al., 2014; Kim et al., 2013a; Li et al., 2016a).
This led us to investigate the role that A2/B1 plays during normal myogenesis,
reasoning that knowledge of normal A2/B1 function would allow us to better predict
the effects that mislocalization can have on RNA function.
We also examined potential roles that cytoplasmic ribonucleoprotein
complexes (RNPs) play in muscle cells. While cytoplasmic TDP-43 and A2/B1
complexes are generally thought to be pathologic, our collaborators discovered
evidence that TDP-43 aggregates are found in normally differentiating myotubes,
implying that these aggregates can exist in cells in a non-pathologic context. They
also identified similar aggregates in regenerating myotubes for about seven days post-injury, suggesting that cytoplasmic TDP-43 aggregates are present during
periods when muscle-specific transcripts are being actively translated (Figure 3.1).
Our collaborators developed a model to fit these findings in which RNA-
binding proteins such as TDP-43 act as RNA cofactors in the nuclear export of long,
structural RNA transcripts (Figure 3.2). During muscle differentiation or injury
repair, molecules of TDP-43 and RNA aggregate to form RNP complexes that may
aid in export of these transcripts from the nucleus. These complexes are normally
degraded or returned to the nucleus at the conclusion of differentiation or repair;
however, mutations that disrupt this process lead to abnormal formation or retention of these complexes in non-differentiating, quiescent cells.
To better understand how dysregulation of TDP-43-RNA complexes can lead to the pathologic effects seen in multisystem proteinopathy, we applied eCLIP as we had done previously with A2/B1. The TDP-43 RNA interactome has previously been investigated in neurons using CLIP-based methods. HITS-CLIP of mouse neurons identified a preference for GU-rich sites in distal introns
(Polymenidou et al., 2011). Knockdown of TDP-43 in these cells resulted in downregulation or alternative splicing of RNAs containing long introns or multiple
TDP-43 binding sites. Another study used iCLIP to examine TDP-43 in human
SHSY5Y neuronal cells, and primary neurons derived from either healthy patients or patients with frontotemporal dementia (Tollervey et al., 2011). The RNAs bound by
TDP-43 in these cells appear similar to those bound in mouse brain—they contain
GU-rich sites, are subject to alternative splicing in TDP-43 knockout, and are involved in neuronal development.
Our collaborators also found evidence that the TDP-43 aggregates form
stable, higher-order, amyloid oligomer-like assemblies. Not only are these assemblies visible in myotube samples run on denaturing gels (Figure 3.3a-b), they are also immunoreactive for the A11 antibody, which is specific to amyloid oligomers. Immunostaining with
A11 in differentiating myotubes and injured cells reveals a distribution nearly identical to that of TDP-43 (Figure 3.3c). Finally, powder
x-ray diffraction indicates that TDP-43 assemblies contain a cross-beta structure
typical of amyloid oligomers (Figure 3.3d). These results suggest that TDP-43-containing RNPs form tightly packed amyloid-like oligomers. However, the mechanisms behind this process, and the transcripts contained within the RNPs, remain unclear.
To date, research into amyloid oligomers has focused on pathologic contexts such as Alzheimer’s disease or amyloidosis, with only a few examples of non-disease-associated amyloid in humans (Berson et al., 2003; Maji et al., 2009). This study represents one of the first descriptions of non-pathogenic amyloid oligomers forming in neurons or muscle cells.
Materials and Methods
C2C12 Cells are a Model System for Myogenesis
The eCLIP protocol was performed as previously described, with minor modifications. Mouse C2C12 cells were used as a model system in which to examine muscle differentiation. These cells are grown as myoblasts, and can be induced to differentiate into myotubes through growth in low-serum media. This differentiation process has been shown to be comparable to the differentiation process of primary muscle cells (Burattini et al., 2004). While the amount of differentiation time used
varies in the literature, we differentiated our cells for eight days, confirming
differentiation by microscopy prior to harvest.
Modifications to the eCLIP Method
We used a rabbit antibody to TDP-43 (Bethyl A303-223A) that has previously
been used for eCLIP experiments in human K562 and HepG2 cell lines (Van
Nostrand et al., 2016). In contrast to the hnRNP A2/B1 and B1 experiments, we also used this
antibody to probe the confirmatory eCLIP western blots. This antibody retrieves
TDP-43-RNA complexes with far greater efficiency than an IgG control, as
demonstrated by IP with radiolabeled RNA (Figure 3.4).
eCLIP data analysis was performed as previously described. Downstream
analysis was performed on significant peaks that were conserved in both replicates of
each cell type.
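Selecting peaks that are conserved in both replicates can be sketched as an interval-overlap test. The tuple layout (chromosome, start, end, strand), the half-open coordinates, and the rule that any same-strand overlap of at least one base counts as conserved are assumptions for illustration; the overlap criterion actually used is not specified here.

```python
def overlaps(a, b):
    """True if two (chrom, start, end, strand) peaks share at least one base.

    Coordinates are assumed to be half-open, i.e. [start, end).
    """
    same_chrom = a[0] == b[0]
    same_strand = a[3] == b[3]
    return same_chrom and same_strand and a[1] < b[2] and b[1] < a[2]


def conserved_peaks(rep_a, rep_b):
    """Return peaks from replicate A that overlap any peak in replicate B."""
    return [a for a in rep_a if any(overlaps(a, b) for b in rep_b)]
```

The all-against-all scan is quadratic and is only meant to make the criterion explicit; for genome-scale peak lists a sorted sweep or an interval tree would be the usual choice.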
Results
hnRNP A2/B1 Binds to a Variety of Myogenic Genes in the Cytoplasm
We first performed eCLIP using antibodies to hnRNP A2/B1, as previously
described, on C2C12 myoblasts and differentiated myotubes (Figure 3.5), generating
transcriptome-wide maps of A2/B1 RNA binding sites. As before, we identified
significant binding peaks in all replicates, with significant peaks being defined as
those that have a low p-value (p < 10^-5) and are detected in both biological replicates.
Significant peak location and strength were highly correlated between replicates
(myoblast r = 0.99; myotube r = 0.81) and showed a low IDR score. Comparison
between cell types displayed significantly lower correlation (r = 0.64) and higher
IDR score (Figure 3.6), implying that A2/B1 might bind to unique subsets of RNA transcripts during the course of differentiation.
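The replicate correlations reported here are Pearson's correlation coefficients of peak fold enrichment (Figure 3.6). As a minimal sketch of that calculation, assuming the two replicates have already been reduced to paired lists of fold-enrichment values for the same peaks (the function name and the input layout are hypothetical):

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)
```

For example, `pearson_r([1, 2, 3, 4], [1, 3, 2, 4])` evaluates to 0.8; these numbers are toy values, not data from this study.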
We identified 518 significant binding peaks in 100 genes in myoblasts, and
1079 significant binding peaks in 150 genes in myotubes. These peaks were most
likely to be found in the 3′ UTR of genes (Figure 3.7a) and in protein-coding genes
(Figure 3.7b), implying that A2/B1 is likely to bind to mature mRNA transcripts in
these cells. Gene ontology analysis of these genes did not identify enrichment
for any particular biological process.
We identified binding to a number of coding and noncoding RNAs, including
the long noncoding RNAs Xist and H19 (Figure 3.8a-d). Interestingly, A2/B1 also
appears to bind to the 3′ UTR of its own transcript, with more binding regions
detectable in myotubes, suggesting that it may be regulating its own expression and
splicing during myogenesis (Figure 3.8e-f). A2/B1 also binds to a number of
myogenesis-related genes, including MBNL1, TAF15, and TRIM72 (Figure 3.9).
TDP-43 Binds Coding Regions of RNAs in Both Myoblasts and Myotubes
TDP-43 eCLIP was performed similarly to the A2/B1 eCLIP, and identified
556 binding peaks across 174 genes for myoblasts, and 975 binding sites across 320 genes for myotubes as being significantly enriched over size-matched input and shared between biological replicates. The replicates of each cell type displayed high correlation with each other, but lower correlation between cell types. This is supported by the IDR analysis, which has a profile similar to those of both A2/B1 IDR analyses. For this experiment we also prepared eCLIP libraries using the IgG antibody for sequencing, which produced far fewer usable reads than either the input or IP libraries. Accordingly, far fewer peaks were identified in the IgG libraries, and these did not correlate well with the IP peaks by the IDR analysis (Figure 3.11).
In both myoblast and myotube libraries, TDP-43 eCLIP binding peaks were more likely to be found in coding exons of protein coding genes (Figure 3.12a-b), a surprising finding given that previous TDP-43 CLIP experiments in neurons identified binding primarily in distal intronic sequences (Polymenidou et al., 2011;
Tollervey et al., 2011). A possible explanation for this is the eCLIP computational pipeline step that removes sequencing reads mapping to repetitive sequence—much intronic sequence, especially in the mouse genome, has been classified as repetitive in RepeatMasker (Smit et al.), which would lead previous CLIP-seq analyses that do not include this step to erroneously assign reads to repetitive regions in introns.
Another possibility is that this is evidence of a biological phenomenon in which TDP-
43 tends to bind exonic mRNA sequence in muscle cells specifically.
Motif analysis of either cell type recapitulated the strong enrichment of (UG)n motifs described in previous TDP-43 CLIP-seq experiments (Figure 3.12c). Many
TDP-43 target genes also contained UGUGU motifs, which are often found adjacent to TDP-43 binding sites. One example is TDP-43 binding to the 3′ UTR of its own
RNA at a location surrounded by UGUGU motifs (Bhardwaj et al., 2013) (Figure
3.13a-b). We also detect previously-identified binding to the lncRNAs NEAT1 and
MALAT1, which contain numerous UGUGU motifs (Guo et al., 2015; Tollervey et al.,
2011) (Figure 3.13c-f). As a negative control, we examined TDP-43 binding to the ubiquitously expressed housekeeping gene GAPDH, which only contains one
UGUGU motif, and detected no input-normalized peaks (Figure 3.13g-h).
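Counting UGUGU motifs within a transcript, as in the GAPDH comparison above, can be sketched with a zero-width lookahead so that overlapping occurrences in (UG)n repeats are each counted. Whether overlapping occurrences should be counted separately is an assumption of this sketch, as is the function name; the sequence in the usage note is a toy string, not a real transcript.

```python
import re


def count_motif(sequence, motif="UGUGU"):
    """Count occurrences of a motif in an RNA sequence, including overlaps.

    DNA-style input (T) is converted to RNA (U) before matching.
    """
    rna = sequence.upper().replace("T", "U")
    target = motif.upper().replace("T", "U")
    # A lookahead matches at every start position without consuming
    # characters, so overlapping occurrences are all found.
    pattern = "(?=" + re.escape(target) + ")"
    return len(re.findall(pattern, rna))
```

For the toy sequence `"UGUGUGU"`, the motif starts at positions 0 and 2, so `count_motif("UGUGUGU")` returns 2.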
The eCLIP results confirmed our hypothesis that TDP-43 is binding to long myogenic transcripts during differentiation. In myotubes, we detect multiple TDP-43 binding peaks in the exons of genes involved in the contractile apparatus such as
nebulin, titin, myosin 3, and tropomyosin C (Figure 3.14). In contrast, there was
minimal binding to these transcripts in either the IP or input sample in myoblasts.
The transcription of these genes is upregulated in myotubes relative to myoblasts
(de Klerk et al., 2015), but the increase in TDP-43 binding is greater than the
increase in transcription.
Of note, the eCLIP signal in these transcripts was predominantly exonic,
which was well-visualized at exon-intron junctions, and suggested that TDP-43 is
predominantly binding to mature structural RNAs rather than acting as a splicing
regulator. To further investigate the particular genes that display exonic binding, we
filtered the overall list of significant peaks into peaks existing only in exons and
UTRs, and further subdivided those lists into a list of exonic/UTR peaks unique to
myotubes. Gene ontology analysis of the final, unique-to-myotube list indicates strong
selectivity for mRNAs associated with the sarcomere (p-value = 3.6 × 10^-23) (Mi et
al., 2017) (Table 3.1).
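The two-stage filter described above (restrict to exonic and UTR peaks, then keep what is unique to myotubes) can be sketched with set operations. The (gene, region) tuple layout and the region labels are hypothetical, and defining uniqueness at the gene level rather than the peak level is an assumption of this sketch.

```python
EXONIC_REGIONS = {"exon", "5utr", "3utr"}  # hypothetical annotation labels


def exonic_genes(peaks):
    """Genes with at least one peak annotated as exon or UTR.

    `peaks` is an iterable of (gene, region) tuples.
    """
    return {gene for gene, region in peaks if region in EXONIC_REGIONS}


def myotube_specific_genes(myoblast_peaks, myotube_peaks):
    """Genes with exonic/UTR peaks in myotubes but not in myoblasts."""
    unique = exonic_genes(myotube_peaks) - exonic_genes(myoblast_peaks)
    return sorted(unique)
```

The sorted list returned by `myotube_specific_genes` is the kind of input that would then be passed to a gene ontology enrichment tool.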
Differences between TDP-43 and A2/B1 RNA Binding in Muscle
In C2C12 cells hnRNP A2/B1 tends to bind to the 3′ UTR of its target
transcripts, rather than exons as TDP-43 does. This differs from human MCF7 cells, in which we found that A2/B1 binds predominantly to proximal introns, implying that the RNA binding partners of A2/B1 are dependent on cell type and local environment. Conversely, A2/B1 has been shown to bind 3′ UTRs in mouse spinal cord as well (Martinez et al., 2016), leading us to speculate that the preference for binding to the 3′ UTR is a characteristic of mouse A2/B1, or is due to a common aspect of the cellular environment in neurons and muscle.
We had suspected that some transcripts might be bound by both A2/B1 and
TDP-43 at the same time based on prior studies demonstrating interactions between the two proteins (Buratti et al., 2005; D’Ambrogio et al., 2009; Mohagheghi et al.,
2016; Romano et al., 2014). However, we found that the transcripts bound by A2/B1 tend to differ from those bound by TDP-43 (Figure 3.15a-b). For both A2/B1 and
TDP-43, a majority of bound transcripts are associated with one protein but not the
other.
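The comparison of transcripts bound by each protein amounts to partitioning two target sets and asking what fraction of each is unique. A minimal sketch follows; the function names and the identifiers in the usage note are placeholders, not results from this study.

```python
def partition_targets(a2b1_targets, tdp43_targets):
    """Split bound transcripts into A2/B1-only, shared, and TDP-43-only sets."""
    a2b1, tdp43 = set(a2b1_targets), set(tdp43_targets)
    return {
        "a2b1_only": a2b1 - tdp43,
        "shared": a2b1 & tdp43,
        "tdp43_only": tdp43 - a2b1,
    }


def fraction_unique(own_targets, other_targets):
    """Fraction of one protein's targets that the other protein does not bind."""
    own, other = set(own_targets), set(other_targets)
    if not own:
        raise ValueError("own_targets is empty")
    return len(own - other) / len(own)
```

With placeholder identifiers `["t1", "t2", "t3", "t4"]` for A2/B1 and `["t3", "t4", "t5", "t6", "t7"]` for TDP-43, two transcripts are shared, and the unique fractions are 0.5 and 0.6 respectively; a majority-unique result for both proteins corresponds to both fractions exceeding 0.5.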
Some transcripts bound by both TDP-43 and A2/B1 demonstrated binding to
specific yet unique locations on the transcript. On the lncRNA H19, for example,
both proteins gain binding sites during differentiation, but A2/B1 appears to localize
to the 5′ end of the transcript, while TDP-43 binds to the 3′ end (Figure 3.15c-d).
Meanwhile, on Xist, both proteins appear to bind to the RepA region at the 5′ end,
but bind to different regions in the central portion of the transcript (Figure 3.15e-f).
Gene ontology analysis of the mouse transcripts that have 3′ UTR peaks in
myotubes only was inconclusive. However, when performing less stringent analysis
that allowed for peaks anywhere in the transcript, we detected a slight enrichment
for myogenesis-related categories along with splicing. This implies that A2/B1 might
cooperate with TDP-43 in binding cytoplasmic mRNAs during myogenesis, but can
additionally bind 3′ UTRs of RNAs for other purposes (Table 3.2).
Discussion
Describing the binding preferences of A2/B1 and TDP-43 during the course of
muscle differentiation is a first step towards a more complete explanation of how
RNA is normally processed and transported around the cell during the complex
process of muscle differentiation and regeneration. Although A2/B1 and TDP-43 do
not act in concert across the transcriptome as we had previously suspected, we still found evidence that they may bind to targeted transcripts in a cooperative manner.
Our strong evidence that TDP-43 is involved in nuclear export of myogenic transcripts implies that the role of A2/B1 might shift depending on the cellular context, and which RNAs are transcribed and available to bind.
The massive size of RNAs bound by TDP-43 during differentiation is also remarkable. For large transcripts such as titin, which is over 100 kb in length even after splicing, export from the nucleus must be a complex process. We hypothesize that TDP-43 binds across the entire length of titin and similar transcripts, acting in a manner analogous to histones to condense the RNA and more easily transport it into the cytoplasm. RNA granules have been shown to perform important functions in a number of cellular contexts (Fatimy et al., 2016; Khong et al., 2017). While TDP-43 has previously been shown to form small RNP complexes, that role is extended by our observation of TDP-43-RNA granules in muscles.
The observed cytoplasmic distribution of TDP-43 in muscle differentiation and regeneration helps to explain why these aggregates are so common in neuromuscular degenerative disease. Cellular damage would trigger the muscle regeneration pathway, leading TDP-43 to bind long structural transcripts and travel along with them into the cytoplasm. It is possible that during disease states a downstream step in the repair pathway has failed, leaving transcript aggregates and their associated RNA stuck in the cytoplasm. Similar aggregates are also seen in traumatic brain injury, suggesting that a similar process may also be taking place in neurons (Wright et al., 2016).
Finally, the lack of co-binding of transcripts between TDP-43 and A2/B1 is an interesting observation, as it suggests that these RNA-binding proteins might have evolved to target different transcripts at different times and in different cellular states. This provides a model for how a network of RNA-binding proteins, including
A2/B1, TDP-43, and other hnRNPs, might act in concert to regulate a wide variety of transcripts (Figure 3.16). This would also explain how mutations in one RNA-binding protein, such as the D290V mutation in A2/B1, could lead to disease by disrupting the delicate balance of homeostasis between these RNA-binding proteins and their associated transcripts.
[Figure 3.1 image not reproduced. Panel a: TDP-43, MyHC, Pax7, and merged channels in C2C12 myotubes and primary myotubes. Panel b: TDP-43, MyHC, and merged channels with no injury, five days post-injury, and ten days post-injury.]
Figure 3.1: Cytoplasmic Distribution of TDP-43 in Myotubes a) TDP-43 (red) colocalizes with MyHC (myotube marker; green) but not Pax7 (myoblast marker; purple) in both C2C12 and primary myotubes. b) TDP-43 (purple) becomes prominent in the cytoplasm in injured myotubes, but that effect is lessened over time. Data courtesy of Parker/Olwin labs.
[Figure 3.2 image not reproduced. Panels a-d: schematic of a myotube before muscle injury, after injury, and at the end of repair with normal or with mutated A2/B1 or TDP-43.]
Figure 3.2: Model of TDP-43-Mediated Muscle Repair a) A normal muscle myotube cell, containing multiple myonuclei. b) In response to injury, complexes of TDP-43 and myogenesis-related RNA transcripts are released from myonuclei into the cytoplasm. c) Under normal circumstances, the TDP-43 is returned to the nucleus or degraded once repair is complete. d) If hnRNP A2/B1 or TDP-43 is mutated then TDP-43 remains in the cytoplasm, leading to pathologic effects.
[Figure 3.3 image not reproduced. Gel panels: NativePAGE immunoblot for TDP-43 with size markers from 66 to 1,236 kDa and bands labeled TDP-43 (1) through TDP-43 (28), running from lower to higher molecular weight. Staining panel: TDP-43 and A11 in uninjured muscle, 5 DPI, and 10 DPI. CLEM panel: TDP-43, DIC, merge, and EM images of material from IP with amyloid oligomer (A11) antibody. Model panel: timeline from injury (0 days) to 30-45 days showing muscle regeneration and TDP-43/A11 deposition.]
Figure 3.3: TDP-43 is Correlated With Amyloid Oligomers a) In myotubes, TDP-43 tends to aggregate into oligomers. b) TDP-43 aggregates are SDS-resistant when run on an SDD-AGE gel. c) TDP-43 and A11 (antibody to amyloid oligomers) display the same staining pattern in myotubes post-injury. d) By correlative light electron microscopy, amyloid oligomers retrieved with A11 antibody contain TDP-43. e) By micro-electron diffraction, TDP-43 complexes display evidence of ß-strands but not mated ß-sheets, suggesting an amyloid-like conformation. Data courtesy of Michael Hughes, David Eisenberg lab (UCLA). f) Model for amyloid formation during muscle regeneration. Data courtesy of Parker/Olwin labs.
[Figure 3.4 image not reproduced. Panels a-b: western blots for C2C12 myoblast eCLIP and C2C12 myotube eCLIP, with lanes for non-crosslinked input and IP, IgG input and IP, M-X protein marker, and crosslinked input and IP for replicates A and B; TDP43 (~45 kD) and IgG heavy chain (~50 kD) are marked against size markers from 14 kD to 130 kD. eCLIP IP: Bethyl A303-223A (Lot 1), 5 µg; WB probe: Bethyl A303-223A (Lot 1), 1:2000. Panel c: 32P-labeled IP for TDP-43 (A303-223A) in myoblasts and myotubes at low and high RNase, with IgG control. Panel d: size selection gels for C2C12 myoblast and myotube eCLIP, with lanes for input, IP (A replicate), IP (B replicate), non-crosslinked, and IgG.]
Figure 3.4: eCLIP TDP-43 Procedural Tests a) Western blot displaying retrieval of TDP-43 by IP during myoblast eCLIP, as compared to non-crosslinked and IgG controls. b) Western blot of eCLIP controls for myotube experiment. c) 32P Radiolabeling of eCLIP RNA followed by IP for either TDP-43 or IgG demonstrates specific retrieval of TDP-43 RNA-protein complexes. “Low” RNase is diluted 1:25 as per the eCLIP protocol, while “high” RNase is diluted 1:2. IgG sample uses 1:25 dilution. d) Size selection of eCLIP TDP-43 libraries.
[Figure 3.5 image not reproduced. Panels a-b: western blots with lanes for NX-input, IgG-input, IgG-beads, NX-beads, M-X protein ladder, A-IP (20%), A-input (1%), B-IP (20%), and B-input (1%); hnRNP A2B1 (35 kD) is marked against size markers from 9 kD to 170 kD. Panels c-d: size selection gels with lanes for IP (A replicate), IP (B replicate), non-crosslinked, and input. Panel e: radiolabeled blot with lanes for myoblast and myotube at 1:25 and 1:2 RNase dilution, and myoblast and myotube IgG.]
Figure 3.5: eCLIP C2C12 A2/B1 Procedural Tests a) Western blot displaying retrieval of A2/B1 by IP during myoblast eCLIP, as compared to non-crosslinked and IgG controls. b) Western blot of eCLIP controls for myotube experiment. c) Size selection of eCLIP A2/B1 myoblast libraries. d) Size selection of eCLIP A2/B1 myotube libraries. e) Radiolabeled blot demonstrates recovery of RNA-protein complexes with antibody to A2/B1 in C2C12 myoblasts and myotubes as compared to IgG control.
[Figure 3.6 panels: a) Myoblast replicates; b) Myotube replicates; c) Myoblast vs. Myotube. Pearson R values shown across panels a-c: 0.80, 0.74, 0.99, 0.85, 0.63, 0.64. d) IDR (0.0 to 0.6) plotted against number of significant peaks (0 to 1e+05) for five comparisons: Myoblast (replicate A) vs. IgG; Myotube (replicate A) vs. IgG; Myoblast (replicate A) vs. Myotube (replicate A); Myoblast (replicate A) vs. Myoblast (replicate B); Myotube (replicate A) vs. Myotube (replicate B).]
Figure 3.6: Correlation of A2/B1 eCLIP Experimental Replicates. a-c) Correlation of fold enrichment between replicates of all (black) or significant (red) peaks. Values shown are Pearson’s correlation coefficient. d) Irreproducibility Discovery Rate analysis comparing above replicates along with IgG libraries, demonstrating enhanced reproducibility for replicates from similar cell types, and low reproducibility for replicates compared against IgG.
[Figure 3.7 panels: a) Myoblast and Myotube peak distribution across gene regions (5′ UTR, Exon, Prox. Intron, Dist. Intron, Prox. Intron, Exon, 3′ UTR; 2 kb scale). b) Myoblast and Myotube peaks by transcript type: antisense, lincRNA, miRNA, misc_RNA, mt_rRNA, mt_tRNA, non_coding, nonsense_mediated_decay, processed_transcript, protein_coding, retained_intron, rRNA, snoRNA, snRNA.]
Figure 3.7: eCLIP hnRNP A2/B1 Summary. a) hnRNP A2/B1 exhibits a strong preference for binding 3′ UTR sequences in both myoblasts and myotubes. b) A2/B1 binding peaks tend to be found in protein-coding genes in both myoblasts and myotubes.
[Figure 3.8 panels: genome browser views with tracks for Mouse-human conservation, Combined peaks, eCLIP A2/B1 Replicate A, eCLIP A2/B1 Replicate B, and eCLIP A2/B1 SMInput. a-b) Xist (NR_001463), 10 kb scale bar, track range 0 to 65. c-d) H19 (NR_130973), 1 kb scale bar, track range 0 to 225. e-f) Hnrnpa2b1 (ENSMUST00000114459), 5 kb scale bar, track range 0 to 25.]
Figure 3.8: A2/B1 C2C12 eCLIP at lncRNAs and A2/B1 Locus. eCLIP results in reads per million for myoblast (a, c, e) and myotube (b, d, f) demonstrate binding to lncRNAs Xist and H19, and to 3′ UTR of hnRNP A2/B1 transcript.
[Figure 3.9 panels: genome browser views with tracks for Mouse-human conservation, Combined peaks, eCLIP A2/B1 Replicate A, eCLIP A2/B1 Replicate B, and eCLIP A2/B1 SMInput. a-b) Mbnl1 (NM_001253708), 50 kb scale bar, track range 0 to 12. c-d) Taf15 (NM_027427), 20 kb scale bar, track range 0 to 25. e-f) Trim72 (NM_001079932), 5 kb scale bar, track range 0 to 20.]
Figure 3.9: A2/B1 C2C12 eCLIP of mRNAs. eCLIP results in reads per million for myoblast (a, c, e) and myotube (b, d, f) demonstrate binding to transcripts associated with muscle disease in myotubes.
[Figure 3.10 panels: a) Myoblast replicates; b) Myotube replicates; c) Myoblast vs. Myotube. Pearson R values shown across panels a-c: 0.75, 0.63, 0.96, 0.81, 0.56, 0.46. d) IDR (0.0 to 0.6) plotted against number of significant peaks (0 to 30000) for five comparisons: Myoblast (replicate A) vs. IgG; Myotube (replicate A) vs. IgG; Myoblast (replicate A) vs. Myotube (replicate A); Myoblast (replicate A) vs. Myoblast (replicate B); Myotube (replicate A) vs. Myotube (replicate B).]
Figure 3.10: Correlation of TDP-43 eCLIP Experimental Replicates. a-c) Correlation of fold enrichment between replicates of all (black) or significant (red) peaks. Values shown are Pearson’s correlation coefficient. d) Irreproducibility Discovery Rate analysis comparing above replicates along with IgG libraries, demonstrating enhanced reproducibility for replicates from similar cell types, and low reproducibility for replicates compared against IgG.
[Figure 3.11 panels: a) Myoblast classification (536 peaks) and Myotube classification (975 peaks) across gene regions (5′ UTR, Exon, Prox. Intron, Dist. Intron, Prox. Intron, Exon, 3′ UTR; 2 kb scale). b) Myoblast classification (174 genes) and Myotube classification (320 genes) by transcript type: protein coding, retained intron, processed transcript, nonsense mediated decay, lincRNA, snRNA, miRNA, non coding, antisense, Mt rRNA, snoRNA, pseudogene, rRNA, misc RNA. c) Motifs for Myoblast and Myotube, shown for all peaks and for exon/UTR peaks.]
Figure 3.11: eCLIP TDP-43 Summary. a) TDP-43 binding peaks are most commonly found in coding exons. b) TDP-43 binding peaks are most commonly found in protein-coding genes. c) Motif analysis of TDP-43 peaks recapitulates previously-identified (UG)n motif.
[Figure 3.12 panels: genome browser views with tracks for Exonic UGUGU motifs, Mouse-human conservation, Combined peaks, eCLIP TDP43 Replicate A, eCLIP TDP43 Replicate B, eCLIP TDP43 SMInput, and eCLIP IgG (raw reads). a-b) TDP-43 (NM_145556), 10 kb scale bar. c-d) MALAT1 (NR_002847), 5 kb scale bar. e-f) NEAT1 (NR_131212), 10 kb scale bar. g-h) GAPDH (NM_008084), 2 kb scale bar.]
Figure 3.12: TDP-43 eCLIP at Previously Studied RNAs. eCLIP results in reads per million (except for IgG, shown as raw reads) for myoblast (a, c, e, g) and myotube (b, d, f, h) demonstrating previously-described binding to TDP-43 3′ UTR, NEAT1, and MALAT1, and minimal binding to negative control transcript GAPDH. Scale between myoblast and myotube may differ.
[Figure 3.13 panels: genome browser views with tracks for Exonic UGUGU motifs, Mouse-human conservation, Combined peaks, eCLIP TDP43 Replicate A, eCLIP TDP43 Replicate B, eCLIP TDP43 SMInput, and eCLIP IgG (raw reads). a-b) Nebulin (NM_010889), 100 kb scale bar, track range 0 to 75. c-d) Titin (NM_011652), 100 kb scale bar, track range 0 to 350. e-f) MYH3 (NM_001099635), 10 kb scale bar, track range 0 to 150. g-h) TNNC1 (NM_009393), 2 kb scale bar, track range 0 to 150.]
Figure 3.13: eCLIP TDP-43 at Myogenic Transcripts. eCLIP results in reads per million (except for IgG, shown as raw reads) for myoblast (a, c, e, g) and myotube (b, d, f, h) at myogenesis-associated transcripts demonstrate increased binding in myotubes.
[Figure 3.14 panels: a-b) Venn diagrams of A2/B1 and TDP-43 bound transcripts; counts shown across the two diagrams: 66, 99, 34, 51, 161, 328. c-d) H19 (NR_130973), 1 kb scale bar, with TDP-43 tracks (range 0 to 5000) above A2/B1 tracks (range 0 to 225). e-f) Xist (NR_001463), with TDP-43 tracks (range 0 to 150) above A2/B1 tracks (range 0 to 65). Tracks: Exonic UGUGU motifs, Mouse-human conservation, Combined peaks, Replicate A, Replicate B, SMInput, and eCLIP IgG (raw reads).]
Figure 3.14: Comparison of A2/B1 and TDP-43 eCLIPs. a) A minority of transcripts bound by either TDP-43 or A2/B1 are bound by both proteins in myoblasts. b) A minority of transcripts bound by either TDP-43 or A2/B1 are bound by both proteins in myotubes. c) eCLIP of the lncRNA H19 with both TDP-43 (top) and A2/B1 (bottom) in myoblasts identifies a shared binding site as well as unique binding sites. d) eCLIP of H19 in myotubes identifies stronger A2/B1 binding to the 5′ end of the transcript, while TDP-43 binds more strongly to the 3′ end. e) eCLIP of the lncRNA Xist in myoblasts identifies strong binding by both proteins to the 5′ end, but unique binding sites in the middle of the transcript. f) eCLIP of Xist in myotubes identifies a similar binding pattern as in myoblasts.
[Figure 3.15: schematic with A2/B1 and TDP-43 labels.]
Figure 3.15: Model of Protein-RNA Binding Network. Model of ways in which A2/B1 and TDP-43 might cooperatively regulate RNA transcripts, including binding of distinct transcripts, binding of different sites on the same transcript, and binding of both proteins to the same site on a transcript. Other proteins may also be involved.
Table 3.1: Gene Ontology Analysis of TDP-43 Binding Sites in C2C12 Cells. RefSeq IDs corresponding to genes with exonic/UTR TDP-43 eCLIP peaks unique to myotube were submitted to the Gene Ontology tool (http://geneontology.org). Biological categories were then ranked by enrichment in our gene list over background frequency.
GO biological process complete | Fold Enrichment | P value
sarcomere organization | 27.55 | 1.10E-02
actomyosin structure organization | 16.99 | 2.65E-04
actin cytoskeleton organization | 7.53 | 1.44E-05
single-organism organelle organization | 3.65 | 2.11E-03
organelle organization | 2.86 | 6.81E-06
cellular component organization | 1.94 | 2.26E-02
cellular component organization or biogenesis | 1.96 | 8.92E-03
cytoskeleton organization | 4.9 | 1.39E-05
actin filament-based process | 6.66 | 7.32E-05
myofibril assembly | 21.46 | 3.89E-03
cellular component assembly | 2.74 | 1.55E-02
cellular component biogenesis | 2.67 | 9.07E-03
organelle assembly | 4.6 | 4.62E-02
supramolecular fiber organization | 7.12 | 3.59E-04
striated muscle cell development | 12.36 | 2.92E-03
striated muscle cell differentiation | 9.22 | 5.97E-03
muscle cell differentiation | 8.21 | 1.03E-03
muscle structure development | 6.19 | 5.84E-04
muscle cell development | 10.95 | 7.20E-03
regulation of ATPase activity | 26.93 | 8.95E-05
striated muscle contraction | 15.83 | 4.52E-04
muscle contraction | 11.45 | 1.91E-04
muscle system process | 9.84 | 1.71E-04
translation | 6.77 | 2.21E-02
peptide biosynthetic process | 6.37 | 3.78E-02
regulation of cellular component organization | 2.62 | 5.14E-03
Table 3.2: Gene Ontology Analysis of A2/B1 Binding Sites in C2C12 Cells. RefSeq IDs corresponding to genes with exonic/UTR hnRNP A2/B1 eCLIP peaks unique to myotube were submitted to the Gene Ontology tool (http://geneontology.org). Biological categories were then ranked by enrichment in our gene list over background frequency.
GO biological process complete | Fold Enrichment | P value
alternative mRNA splicing, via spliceosome | 55.62 | 8.36E-03
mRNA splicing, via spliceosome | 11.53 | 4.72E-03
mRNA processing | 6.84 | 5.79E-03
mRNA metabolic process | 6.23 | 1.50E-03
RNA splicing, via transesterification reactions with bulged adenosine as nucleophile | 11.53 | 4.72E-03
RNA splicing, via transesterification reactions | 11.46 | 4.93E-03
RNA splicing | 8.53 | 6.68E-04
mRNA splice site selection | 41.11 | 2.75E-02
cellular component organization | 2.2 | 4.49E-04
cellular component organization or biogenesis | 2.12 | 1.34E-03
cellular component assembly | 2.81 | 3.76E-02
regulation of alternative mRNA splicing, via spliceosome | 33.77 | 4.01E-03
regulation of mRNA splicing, via spliceosome | 19.17 | 7.31E-03
regulation of RNA splicing | 17.35 | 2.17E-04
muscle cell differentiation | 7.79 | 2.24E-02
muscle structure development | 5.64 | 3.67E-02
supramolecular fiber organization | 6.99 | 4.71E-03
actin cytoskeleton organization | 6.99 | 1.39E-03
cytoskeleton organization | 4.33 | 6.83E-03
actin filament-based process | 6.18 | 5.07E-03
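The fold enrichment values in Tables 3.1 and 3.2 compare how often a GO term appears in the submitted gene list with how often it appears in the background. A minimal sketch of that ratio is given below. The counts are hypothetical and are not values from this study, the function name is illustrative, and the statistical test applied by the Gene Ontology tool to obtain P values is not reproduced here.

```python
def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Observed term frequency in the gene list divided by background frequency."""
    if min(list_size, background_size, hits_in_background) <= 0:
        raise ValueError("sizes and background hits must be positive")
    observed = hits_in_list / list_size              # fraction of our genes with the term
    expected = hits_in_background / background_size  # fraction of all genes with the term
    return observed / expected

# Hypothetical counts for illustration only
example = fold_enrichment(hits_in_list=6, list_size=300,
                          hits_in_background=40, background_size=20000)
print(f"fold enrichment: {example:.1f}")
```

A term with a small background set can reach a high fold enrichment from only a few genes, which is why the tables report a P value alongside each ratio.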
CHAPTER IV
THE HISTONE MARK RIP-SEQ METHOD
Introduction
The discovery that PRC2 has the ability to bind to both lncRNAs in a targeted manner at heterochromatin, as well as nascent transcripts in a promiscuous manner
(Davidovich et al., 2013), has raised the question of what properties RNAs contain that distinguish between the two types of binding. Since lncRNAs can interact with
PRC2 at heterochromatin in a highly reproducible manner (Chu et al., 2011; Simon et al., 2011), these interactions must involve some factor, either intrinsic to the lncRNA or supplied by another protein (or RNA), that stimulates PRC2 to deposit H3K27me3 marks.
However, to date no motifs have been found that are specific to interactions between
PRC2 and lncRNAs.
Previous studies that attempted to identify PRC2-binding lncRNAs used methods such as RNA immunoprecipitation to pull down transcripts interacting with selected PRC2 subunits (Guttman et al., 2012; Khalil et al., 2009; Zhao et al., 2010).
However, in addition to potentially identifying off-target transcripts (due to the inability of RIP-seq to distinguish between direct and indirect protein-RNA interactions), these experiments were not designed to distinguish between promiscuous and non-promiscuous binding of RNA. We attempted to circumvent these issues by designing a protocol to identify the subset of PRC2-interacting transcripts that are associated with silenced heterochromatin.
To do this, we adapted a RIP-seq protocol to detect chromatin-associated
RNA using antibodies to histone marks, resulting in a method that we named histone mark RNA immunoprecipitation and sequencing (hmRIP-seq) (Figure 4.1). Due to
the inability of the RIP-seq method to distinguish between direct and indirect
protein-RNA interactions, hmRIP-seq can detect transcripts that are bound directly
to histones, or that are bound indirectly via proteins such as PRC2. In the case of
indirect binding, we would expect detected RNA to be bound to one of the RNA
binding domains in PRC2, while one or more of the histone binding sites in PRC2 are
in contact with the histone PTM of interest.
Materials and Methods
Cell Culture
To match prior lncRNA experiments, we used MCF7 cells for testing the
hmRIP-seq method. MCF7 is an adherent cell line derived from a breast adenocarcinoma, and is classified by the ENCODE Project as a Tier 2 cell line, meaning that many of the ENCODE genome-wide studies have included this cell line. MCF7 cells were grown in Roswell Park Memorial Institute Medium
(RPMI) supplemented with 10% FBS and 1x P/S.
Crosslinking of Cells to Improve RNA Recovery
In order to maximize RNA recovery, we decided to use a dual-crosslinking strategy that has previously been shown to improve DNA recovery in ChIP experiments (Zeng et al., 2006). This strategy involves sequential crosslinking of cells using both formaldehyde and ethylene glycol bis(succinimidyl succinate) (EGS)
(Abdella et al., 1979). EGS is a homobifunctional molecule containing a long (16.1 Å) carbon chain that can be cleaved through incubation with hydroxylamine. It was our hope that the formaldehyde would crosslink short-range interactions while the EGS would crosslink longer-range, potentially indirect, interactions.
Cells were first collected in either media or quenched trypsin, washed in 1x
PBS, then resuspended in 1x PBS at 20 million cells per milliliter. EGS was prepared
as a fresh 25 mM stock in dimethyl sulfoxide (DMSO), then added to cells at a final
concentration of 1.5 mM. Following rotation for thirty minutes, the cells were treated
with formaldehyde at 1% final concentration, then rotated for an additional eight
minutes. The formaldehyde was then quenched in a ten-minute rotation following
addition of glycine to a final concentration of 125 mM. The cells were then washed
twice in 1x PBS and stored as pellets at –80ºC.
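The reagent additions above follow from simple dilution arithmetic. A minimal sketch is given below, assuming each reagent is spiked directly into the cell suspension so that the added volume contributes to the final volume. The 1000 µL suspension volume is a hypothetical example and the function name is illustrative. Only the 25 mM EGS stock and the 1.5 mM final concentration come from the protocol; stock concentrations for formaldehyde and glycine are not stated in the text and are therefore not shown.

```python
def spike_volume(sample_volume, stock_conc, final_conc):
    """Volume of stock to add so the mixture reaches final_conc.

    Solves stock_conc * v = final_conc * (sample_volume + v) for v.
    Both concentrations must share units; the result has the units
    of sample_volume.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be below the stock concentration")
    return final_conc * sample_volume / (stock_conc - final_conc)

# 25 mM EGS stock brought to 1.5 mM final in a hypothetical 1000 µL suspension
egs_ul = spike_volume(sample_volume=1000.0, stock_conc=25.0, final_conc=1.5)
print(f"add {egs_ul:.1f} µL of 25 mM EGS per 1000 µL of cells")
```

Accounting for the added volume matters most when the stock is not much more concentrated than the target, as with the roughly 17-fold dilution of EGS here.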
Nuclear Lysis and Chromatin Fragmentation
The nuclear lysis of crosslinked stored cells was based on a standardized lab
protocol. Briefly, frozen cell pellets were thawed and resuspended in five volumes of
Buffer A (10 mM KCl, 1 mM MgCl2, 1 mM NaF, 1 mM Na3VO4, 20 mM HEPES, 1 mM
DTT, and 0.4 mM PMSF), then incubated on ice for ten minutes. Following ten-minute centrifugation at 4ºC and 2500 rpm, the pellet was resuspended in two volumes of Buffer A and dounced 16 times. Centrifugation at 3000 rpm and 4ºC for
15 minutes was sufficient to separate the nuclear pellet from the cytoplasmic supernatant.
The nuclear pellet was resuspended in two volumes of Buffer C (400 mM KCl,
1 mM MgCl2, 1 mM NaF, 1 mM Na3VO4, 20 mM HEPES, 0.1 mM EDTA, 15% glycerol, 1 mM DTT, 0.4 mM PMSF). Following 16 dounce steps, this mixture was rotated at 4ºC for thirty minutes, then centrifuged at 17,000g for 30 minutes at 4ºC to separate the chromatin pellet from nuclear extract in the supernatant.
Testing of chromatin fragmentation conditions revealed that resuspending the cell pellet in high volume (1 mL) of DNase buffer and using high quantities (250U) of
RNase-free DNase (Roche) (Figure 4.2a) for 40 minutes at 37ºC, followed by quenching with 5 mM EDTA, was the most efficient way to fragment the chromatin to an acceptable size. The sample was then subjected to a ten-minute Bioruptor treatment (Figure 4.2b) and centrifuged (fifteen minutes at 17,000g at 4ºC) to collect the soluble chromatin in the supernatant.
RNA Immunoprecipitation
To improve RNA recovery, the hmRIP-seq protocol uses overnight immunoprecipitation at 4ºC, following removal of 10% of the sample for input and addition of RNase inhibitor. To ensure integrity of the RNase inhibitor throughout the downstream incubations and decrosslinking steps, the protocol uses RNasin Plus
(Promega N2611), which is advertised as being more stable at elevated temperatures.
To detect potential PRC2-interacting RNAs, the hmRIP-seq protocol was initially tested using an antibody specific to H3K27me3 (abcam ab6002). As a positive control, immunoprecipitations were also performed using an antibody specific for histone H3 regardless of modification status (abcam ab1791). In addition, antibodies specific to two other PTMs were also used (abcam ab1220, specific to
H3K9me2; and abcam ab8580, specific to H3K4me3). As a negative control, an antibody to IgG (Novus NB810-56910) was used, at the same concentration as the IP antibodies. All histone PTM antibody specificities were confirmed using an outside database (Rothbart et al., 2015).
Antibody-protein-RNA complexes were recovered using 25 µg Protein A/G beads (Thermo Fisher 88802) per sample. These beads were first blocked in 1 mL blocking buffer (IP Wash Buffer 1 supplemented with 200 µg/mL BSA and 200
µg/mL yeast tRNA) overnight at 4ºC, then incubated with the antibody-protein-RNA complexes for 90 minutes at room temperature.
To remove background contamination, the bead complexes were washed three times with IP Wash Buffer 1 (20 mM Tris-HCl pH 8.0, 200 mM NaCl, 0.5%
Triton-X, 0.5% NP-40, and 1 mM PMSF), followed by a single wash using high-salt
IP Wash Buffer 2 (the same recipe as IP Wash Buffer 1, except using 500 mM NaCl).
The samples were then washed once in LiCl Wash Buffer (10 mM Tris-HCl pH 8.0,
250 mM LiCl, 0.5% NP-40, 0.5% Triton-X, 1 mM EDTA) and once in 1x TE Buffer as previously described (Neff et al., 2012).
Elution of the protein-RNA complexes from beads was performed at 65ºC for
15 minutes in 100 µL Elution Buffer 1 (50 mM Tris-HCl pH 8.0, 10 mM EDTA, 1%
SDS), followed by vortexing in 150 µL Elution Buffer 2 (1x TE supplemented with
0.67% SDS), for a total elution volume of 250 µL per sample.
The samples were then decrosslinked in 1 M hydroxylamine-HCl (pH 8.5) for three hours at 37ºC, with 10U RNasin Plus added, to cleave the EGS crosslinks. The formaldehyde crosslinks were then reversed by incubation at 55ºC for two hours. Finally, the protein was removed from the nucleic acids by incubation at 37ºC for two hours with 100 µg proteinase K; 20 µg glycogen was also added at this step.
The samples were then split into 25% for DNA analysis, and 75% for RNA analysis.
hmRIP-seq RNA and DNA Purification
The DNA was purified using the Omega EZNA Cycle Pure Kit (Omega D6493-01), followed by incubation at 37ºC for thirty minutes with RNase A added to a final concentration of 0.2 µg/µL, and a second purification with the EZNA kit, eluting into a final volume of 30 µL. The RNA was purified as previously described (Yang et al.,
2014) using the Qiagen miRNeasy Mini kit (Qiagen 217004), DNase treatment with TURBO DNase (Ambion AM2238), and a second purification with the miRNeasy kit, eluting in a final volume of 8 µL. This kit was used instead of a standard RNA purification kit because the RNA had already been fragmented during the nuclear lysis step, which necessitated a method that would efficiently retrieve short RNAs.
Generating an hmRIP-seq Sequencing Library
The RNA was reverse transcribed and sequenced using a strand-specific protocol similar to what has previously been described (Borodina et al., 2011). Briefly, the RNA derived from hmRIP was reverse transcribed using SuperScript II (Thermo
Fisher 18064014) according to manufacturer’s instructions, priming off of random hexamers (Thermo Fisher N8080127). Following purification with AMPure XP beads (Beckman Coulter A63880), the second strand was synthesized for 2.5 hours
at 15ºC in a reaction containing the RNA-DNA hybrid, 1x Second Strand Buffer (NEB
B6117S), 1 µL dUTP mix (20 mM dUTP, 10 mM dATP, 10 mM dCTP, 10 mM dGTP),
2.5U RNase H (NEB M0297S), 30U E. coli DNA polymerase (NEB M0209S), and
10U E. coli DNA ligase (NEB M0205S).
The resulting cDNA was then end-repaired using the End-It™ DNA End-Repair Kit (Lucigen ER0720), and A-tailed for thirty minutes at 37ºC in a reaction
containing 1x NEB2 Buffer, 0.2 mM dATP, and 15U Klenow (3′→5′ exo–) (NEB
M0212S). The cDNA was ligated to Illumina TruSeq adaptors containing Unique
Molecular Identifiers (UMIs) for 30 minutes at room temperature in a reaction
containing 50 nM pre-annealed adaptors, 1x Quick Ligase Buffer (NEB M2200), and
2000U T4 DNA Ligase (NEB M0202S). The cDNA was then size selected using 2%
E-Gel® SizeSelect™ gels (Thermo Fisher G661002). In order to achieve strand
specificity, the second strand of the size selected cDNA was selectively degraded by
USER enzyme (NEB M5505S). Following library PCR with barcoded Illumina
TruSeq primers and AMPure bead purification, these libraries were sequenced on an
Illumina MiSeq machine.
Computational Analysis of hmRIP-seq Libraries
Sequencing libraries were validated using the FastQC program (version 0.10.1), then aligned using the TopHat program (version 2.0.13) (Trapnell et al., 2009). These alignments were then converted to bigWig format using the bedGraphToBigWig program (Kent et al., 2010) for visualization on the hg19 human reference genome.
To identify candidate H3K27me3-interacting transcripts, we followed a two-pronged approach. First, peaks in the hmRIP-seq data were identified using MACS2
(Zhang et al., 2008) using default settings; this identified 82 candidate peaks overlapping previously-identified lncRNAs (Cabili et al., 2011). Then, we identified locations that corresponded to multiple sequencing reads. Since reads with duplicate
UMIs are removed during processing of the sequencing library, each read in the final library should correspond to a unique RNA-histone mark interaction; therefore, transcripts identified in multiple reads, even if insufficient to be called as a peak by
MACS2, should still be considered as candidates. There were 404 transcripts that had at least ten UMI-filtered reads at one location.
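The second prong can be sketched as follows. This is a simplified illustration rather than the pipeline used here: reads are represented as hypothetical (chromosome, strand, start, UMI) tuples, a "location" is taken to be an identical alignment start, and duplicates are defined as reads sharing both location and UMI. The threshold of ten reads is the one quoted above; the function names are illustrative.

```python
from collections import Counter

def umi_filtered_counts(reads):
    """Count reads per location after collapsing duplicate UMIs.

    reads: iterable of (chrom, strand, start, umi) tuples.
    Reads sharing a location and a UMI are treated as PCR duplicates
    and counted once.
    """
    unique = set(reads)  # collapses identical (location, UMI) combinations
    return Counter((chrom, strand, start) for chrom, strand, start, _ in unique)

def candidate_locations(reads, min_reads=10):
    """Locations supported by at least min_reads UMI-filtered reads."""
    counts = umi_filtered_counts(reads)
    return {loc: n for loc, n in counts.items() if n >= min_reads}

# Toy data: 12 distinct UMIs at one site (one sequenced twice), 3 at another
reads = [("chr1", "-", 5000, f"UMI{i:02d}") for i in range(12)]
reads.append(("chr1", "-", 5000, "UMI00"))  # PCR duplicate, collapsed
reads += [("chr2", "+", 100, f"UMI{i:02d}") for i in range(3)]
print(candidate_locations(reads))
```

Because each retained read carries a distinct UMI, the count at a location approximates the number of independent RNA molecules recovered there, which is the rationale for keeping sub-threshold MACS2 regions as candidates.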
These candidates were then manually inspected in the UCSC Genome
Browser, and filtered into a few categories: previously-identified lncRNAs, repetitive or simple regions (as identified by the RepeatMasker database (Smit et al.)), regions identified as peaks with minimal sequencing read buildup, and potential candidates for novel chromatin-associated RNAs. The potential candidates were subjected to
qPCR primer design using the Primer3 program (Untergasser et al., 2012).
Validation of hmRIP-seq Candidates by RT-qPCR
RNA samples from Ezh2, H3K27me3, and IgG hmRIP experiments were reverse transcribed using a High-Capacity cDNA Reverse Transcription Kit (Thermo
Fisher 4368814), then used for qPCR with 2x Takyon for Sybr qPCR Master Mix
(Eurogentec UF-NSCT-B0201) and primers specific to computationally identified candidates. These reactions were then run on a Roche LightCycler 480.
Results
The hmRIP-seq Technique Can Detect Chromatin-Associated RNAs
Peaks called by MACS2 were identified in a number of known chromatin-associated lncRNAs, including NEAT1, MALAT1, and HOTAIR (Figure 4.3). The hmRIP-seq signal also corresponded to previously-identified chromatin-interacting regions. For example, the strong hmRIP signal in exons 1 and 2 of HOTAIR (Figure
4.3a) lies within the previously-identified HOTAIR-PRC2 binding site that spans the
5′-most 300 nucleotides of the HOTAIR transcript.
Identifying Potential Histone Mark-Associated Transcripts
Of the candidate transcripts, eight (TCONS_00006407, TCONS_00026407, TCONS_00017382, TCONS_00007196, TCONS_00014209, TCONS_00015119, LINC000894, and RPPH1) were sufficiently non-repetitive and expressed at a high enough level to allow qPCR primer design. These primers were tested in an experiment using RNA from hmRIP experiments that used antibodies to H3K27me3 and Ezh2.
However, many of these candidates displayed either weak recovery as a percentage
of input, or high signal in the –RT negative control sample.
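Recovery as a percentage of input is commonly computed from qPCR Ct values as sketched below. This assumes the standard percent-input calculation, a 10% input fraction (the amount removed before immunoprecipitation in this protocol), and 100% amplification efficiency. The Ct values and function names are hypothetical, and the exact calculation used for these experiments is not stated in the text.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction=0.10):
    """Percent of input recovered in an IP, assuming 100% PCR efficiency.

    The input Ct is first adjusted to the value expected had the whole
    sample been measured, then compared with the IP Ct.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)

def fold_over_control(pct_ip, pct_control):
    """Enrichment of a specific IP over a control IP such as IgG."""
    return pct_ip / pct_control

# Hypothetical Ct values for one primer set
h3k27me3 = percent_input(ct_ip=27.0, ct_input=25.0)
igg = percent_input(ct_ip=31.0, ct_input=25.0)
print(f"H3K27me3: {h3k27me3:.2f}% input, IgG: {igg:.3f}% input, "
      f"fold over IgG: {fold_over_control(h3k27me3, igg):.0f}")
```

The same arithmetic applied to a –RT sample gives a quick check on background: a –RT value close to the +RT value indicates that the signal derives largely from contaminating DNA rather than RNA.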
Three candidates had high enough enrichment in the IP sample to warrant the
design of additional qPCR primer sets to confirm positive results:
TCONS_00006407, TCONS_00026407, and TCONS_00007196.
TCONS_00007196 appeared to have the greatest enrichment over the IgG negative
control in the Ezh2 and H3K27me3 hmRIPs. To confirm these results, we designed
an additional set of primers to a different hmRIP-seq hotspot, which also displayed
strong qPCR signal (Figure 4.4).
A Novel Heterochromatin-Associated RNA
The locus of TCONS_00007196 has a number of overlapping lncRNA
transcript annotations that exist on either strand and are annotated with different
intron/exon boundaries. This suggests that this locus may be a hotspot for pervasive
transcription, with a number of different RNAs being transcribed from this region.
However, the signal detected in the hmRIP-seq experiment mapped strongly to the
minus strand, and also overlapped many of the intron-exon boundaries. The
GENCODE database contained an annotation for a transcript spanning the entire
locus on the minus strand, labeled MTRNR2L12.
MTRNR2L12 is a gene that has been suggested as a potential biomarker for
early Alzheimer’s Disease-like dementia (Bik-Multanowski et al., 2015). It is a
paralog of a mitochondrial RNA, MT-RNR2, which encodes the mitochondrial 16S
ribosomal RNA. Nuclear Mitochondrial Pseudogenes (NUMTs), nuclear sequences
that have been transposed from the mitochondrial genome, are fairly common across
species (Bensasson et al., 2001) and in the human genome (Dayama et al., 2014).
They are thought to enter the nucleus due to errors in non-homologous end joining during end repair, which lead to the inclusion of non-nuclear DNA into the nuclear genome (Ramos et al., 2011). MT-RNR2 contains within it a short, 24-amino acid open reading frame that encodes a protein called humanin, which has been shown to inhibit neuronal cell death in some models of Alzheimer’s Disease but has also been suggested to be a potential oncogene (Maximov et al., 2002).
There are thirteen paralogs of MT-RNR2 in the human nuclear genome, of which MTRNR2L12 is one of the closest in sequence to MT-RNR2 (Bodzioch et al.,
2009). Of these paralogs, MTRNR2L12 had the strongest hmRIP enrichment over input across the entire transcript (Figure 4.5). Notably, the sequences in MTRNR2L8 and MTRNR2L12 corresponding to the humanin ORF contain a polymorphic site
(rs7350541) that, if translated in the nucleus, would yield the same sequence that the MT-RNR2 humanin protein would have if translated by mitochondrial tRNAs (Bodzioch et al., 2009).
The finding that this transcript interacts with heterochromatin was an interesting one, as none of the previous functions attributed to MTRNR2L12, MT-
RNR2, or the other paralogs was chromatin-related. Due to the high expression of
MTRNR2L12 in many cell types, and the relatively high level of H3K27me3 around the loci of the other paralogs (Figure 4.5), we hypothesized that MTRNR2L12 might interact with chromatin surrounding the other paralogs.
We performed a ChIP-qPCR experiment using primers designed specifically for each of the thirteen MT-RNR2 paralogs to examine whether there was evidence of H3K27me3 marks at each individual paralog. We found that all paralogs except MTRNR2L12 and MTRNR2L13 displayed some H3K27 methylation as
compared against a high-H3K27me3 positive control locus (Figure 4.6b). This suggests that either the other eleven paralogs are specifically silenced, or
MTRNR2L12/13 are specifically not silenced.
A model that would fit these data is one in which MTRNR2L12 interacts with heterochromatin at the other paralogs, potentially in conjunction with PRC2 (Figure
4.7). The sequence similarity between MTRNR2L12 and the other paralogs suggests that MTRNR2L12 RNA molecules could bind to the DNA or nascent RNA transcripts of the paralogs. If MTRNR2L12 were also bound to PRC2 during this process, it could lead to targeted silencing at the other paralogs.
Potential Piwi Protein-Interacting Transcripts
We also performed an hmRIP-seq experiment using an antibody to
H3K9me2, another common heterochromatin-associated mark that is generally spread across larger genomic regions than H3K27me3. We suspected that some transcripts associated with H3K27me3 would also display association with
H3K9me2, since PRC2 has been shown to interact with H3K9 methylases (Boros et al., 2014; Mozzetta et al., 2014). However, there is also reason to believe that these two histone marks do not always co-occur. It has been shown that Polycomb-associated H3K27me3-marked chromatin has tighter packing dynamics, which could impose restrictions on interactions with external factors (Boettiger et al., 2016). Our data show moderate overlap between transcripts with H3K9me2 signal and transcripts with H3K27me3 signal, suggesting that some transcripts interact with regions containing both of these histone marks. This implies that some transcripts might interact with particular heterochromatin-associated PTMs, while other transcripts might interact with heterochromatin in a histone PTM-independent manner.
The hmRIP-seq data displayed a number of reads diffusely overlapping
repetitive Long Interspersed Nuclear Elements (LINEs) (Figure 4.8a). LINEs are retrotransposons that are approximately six kilobases in length and exist in hundreds of thousands of copies spread across the human genome, a small fraction of which are currently considered active. LINEs contain two open reading frames, which code for a chaperone protein (ORF1) and a reverse transcriptase (ORF2). Active LINEs are transcribed and have both open reading frames translated, allowing them to copy themselves to other parts of the genome. It has been proposed that transposons can comprise a significant portion of lncRNA sequence (Kapusta et al., 2013; Kelley and
Rinn, 2012), as transposon insertion is a potential mechanism for the insertion of a de novo transcription start site into the genome. Transposon insertions into the middle of lncRNAs are also more likely to be retained across generations than insertions into mRNAs, since insertions into lncRNAs will not interfere with any protein coding sequence.
Since most of the reads were in LINEs not overlapping annotated lncRNAs, we suspected that we were observing a different mechanism than the binding of
HOTAIR to chromatin modifiers and heterochromatin. That mechanism could potentially be the silencing of LINEs in action. It has previously been shown that genomic LINEs are silenced by H3K9 methylation, and those that are transcribed can be silenced in an RNAi-like manner (Bulut-Karslioglu et al., 2014; Yang and
Kazazian, 2006). Similar small RNAs generated from transposable elements can also act as piwi-interacting RNAs (piRNAs), which direct the piwi complex of silencing proteins to targeted genomic loci, leading to recruitment of H3K9 methylases and the deposition of H3K9 methyl marks (Le Thomas et al., 2013). One possible
explanation for hmRIP-seq reads mapping to a LINE would be the identification of
nascent transcripts from LINEs that are in the process of being silenced by this
mechanism, or have been incompletely silenced (Figure 4.8b).
In order to more closely investigate the regions of LINEs that associate with
H3K9me2, we mapped the location of each sequencing read against LINEs
annotated in the L1Base database (Penzkofer, 2004). This database divides LINEs
into three categories: those that are inactive, those that have a mutation in one open
reading frame, and those that are active. Interestingly, we identified certain sites that
were particularly well-represented in our hmRIP-seq data (Figure 4.9). These sites appeared to be correlated with the activity of each LINE category—sites in each region were enriched only in categories containing active versions of that region.
Those LINEs classified as active had all of the sites, while those with an inactive first open reading frame were missing the sites within the first open reading frame.
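The positional analysis described above can be sketched in a few lines of Python. This is only an illustration of the general idea (placing each read at a fractional position along its LINE and building a per-category frequency profile); it is not the pipeline used in this work. The function names, the tuple layout, the use of read midpoints, and the choice of 100 bins are all assumptions made for the example.

```python
def relative_position(read_start, read_end, line_start, line_end, strand):
    """Return the read midpoint as a fraction (0 to 1) of the LINE length.

    Coordinates are genomic; for minus-strand LINEs the fraction is flipped
    so that 0 always corresponds to the 5' end of the element.
    """
    length = line_end - line_start
    midpoint = (read_start + read_end) / 2
    fraction = (midpoint - line_start) / length
    if strand == "-":
        fraction = 1.0 - fraction
    # Clamp reads that hang slightly over either end of the annotation.
    return min(max(fraction, 0.0), 1.0)


def position_profile(reads, n_bins=100):
    """Build a normalized histogram of read positions for each LINE category.

    `reads` is an iterable of (category, fraction) pairs, where `category`
    is a hypothetical label such as "intact" and `fraction` comes from
    relative_position(). Returns {category: [frequency per bin]}.
    """
    counts = {}
    for category, fraction in reads:
        bin_index = min(int(fraction * n_bins), n_bins - 1)
        counts.setdefault(category, [0] * n_bins)[bin_index] += 1
    profile = {}
    for category, bins in counts.items():
        total = sum(bins)
        profile[category] = [count / total for count in bins]
    return profile
```

Normalizing each category by its own read total makes the profiles comparable between categories that contain very different numbers of elements, which is the kind of comparison Figure 4.9 presents.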
Discussion
The results provided by hmRIP-seq suggest that it is an effective method to identify heterochromatin-associated RNA transcripts in a number of different contexts. The identification of MTRNR2L12 is a particularly interesting case, as it represents a novel heterochromatin-interacting RNA. The unusual evolutionary history of
MTRNR2L12 and its paralogs suggests a model for how other gene families with numerous duplications in the genome might be silenced. Unfortunately, the high sequence similarity between the paralogs means that further analysis of this gene family will prove difficult.
While we have hypothesized a potential model for how MTRNR2L12 interacts with heterochromatin, the hmRIP-seq method is poorly suited to understanding the
details of the interaction. A high-throughput RNase footprinting experiment has identified protein binding sites on MTRNR2L12, suggesting that it is protein-bound
(Ji et al., 2016). While the bound protein could be PRC2, as is the case with HOTAIR,
MTRNR2L12 may function differently instead—for example, binding to an
H3K27me3 reader complex or binding directly to heterochromatin.
The difficulties in assigning a mechanism to MTRNR2L12-mediated silencing
are symptomatic of a key weakness of hmRIP-seq: an inability to identify whether
RNA-chromatin interactions are direct or indirect, and if indirect, what other protein
cofactors are present and necessary for the interaction. To clarify the answers to
these questions, hmRIP-seq targets must currently be subjected to other types of
experiments, such as eCLIP-seq experiments targeting other chromatin-modifying
enzymes.
The identification of LINEs in the H3K9me2 hmRIP-seq was also an
interesting result, as it provided some support for a mechanism of repetitive element
silencing that had been suggested by previous studies. The disappearance of hmRIP-
seq sites when their regions do not make functional protein suggests that association
of LINE transcript regions with heterochromatin is correlated with translation of
those regions. It is possible that the generation of targeting piRNAs from LINEs is
associated with the process of translation, which could focus silencing on LINEs that
are able to re-insert themselves into the genome, a more efficient process than
blindly silencing all LINEs.
However, as with the repetitive MT-RNR2-like family of genes, studying
LINEs in a high-throughput manner would be difficult, considering their high
sequence similarity. Adapting the hmRIP-seq method to use longer RNA fragments
to achieve longer sequencing reads might improve identification of reads mapping to
repetitive elements by raising the chance that any given sequencing read is able to be
mapped back to the genome uniquely. Improved identification of repetitive elements
would be useful in many respects, since retrotransposons have been suggested to
interact with heterochromatin in a number of ways.
First, since many lncRNAs have been suggested to contain evidence of
transposable elements (Elisaphenko et al., 2008; Johnson and Guigó, 2014; Kapusta
et al., 2013), this would allow for more accurate identification of lncRNAs associated with chromatin. Second, retrotransposons in the genome must be silenced in order to ensure genomic stability (Bulut-Karslioglu et al., 2014; Sadic et al., 2015), which suggests that any RNAs mediating their silencing would be detectable by hmRIP-seq.
Third, a subset of LINEs are expressed during some forms of heterochromatin formation (Chow et al., 2010), and hmRIP-seq could be used to more fully
characterize their functions.
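The read-length argument made above (longer reads are more likely to map uniquely within repetitive sequence) can be illustrated with a toy calculation. The sketch below is not part of the hmRIP-seq analysis; it simply counts, for a given read length, what fraction of possible read start positions in a sequence yield a substring that occurs exactly once. The function name and the exact-match definition of "unique" are assumptions made for illustration; real aligners also tolerate mismatches and consider both strands.

```python
from collections import Counter


def unique_fraction(sequence, read_length):
    """Fraction of read start positions whose substring occurs exactly once.

    A substring that appears more than once in `sequence` stands in for a
    read that could not be assigned to a single genomic location.
    """
    n_positions = len(sequence) - read_length + 1
    if n_positions <= 0:
        raise ValueError("read_length must not exceed the sequence length")
    substrings = [sequence[i:i + read_length] for i in range(n_positions)]
    counts = Counter(substrings)
    n_unique = sum(1 for s in substrings if counts[s] == 1)
    return n_unique / n_positions
```

For a sequence containing two copies of a repeat, reads shorter than the repeat that fall entirely inside it are ambiguous, while reads long enough to extend into the distinct flanking sequence are not. The same logic underlies the suggestion that longer RNA fragments would improve assignment of reads to individual repetitive elements.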
Unfortunately, the hmRIP-seq method as described is limited by poor RNA
recovery. While we detected strong binding to previously-identified
heterochromatin-interacting RNAs such as HOTAIR, there was only minimal signal
at many other transcripts. As a result, it was difficult to identify novel binding
partners of silencing PTMs, with MTRNR2L12 being the only high-quality candidate
to emerge from the hmRIP-seq results that validated well by RT-qPCR. Further
technical improvements to the hmRIP-seq protocol could improve RNA recovery and
the sensitivity of the protocol to detect novel heterochromatin-associated transcripts.
Using hmRIP-seq on different types of cells to see if heterochromatin-
associated transcripts change in response to different cellular conditions may be a
worthwhile future direction. One reason that MCF7 cells have been used for much
prior lncRNA research is that they contain relatively high levels of endogenous
HOTAIR, perhaps reflected in the strong hmRIP-seq readings along the HOTAIR
transcript. It is possible that the high levels of PRC2-bound HOTAIR in MCF7 cells preclude other lncRNAs from binding to PRC2 and heterochromatin; performing the same analysis in a cell line with low HOTAIR would remove this constraint, allowing signal from other lncRNAs to be detected more strongly in an hmRIP-seq
experiment.
a) Cross-link cells with formaldehyde and EGS
b) Lyse cells to retrieve nuclear pellets
c) DNase treat and solubilize chromatin
d) IP with histone mark antibodies
e) De-crosslink with hydroxylamine and heat
f) Purify RNA/DNA for sequencing library creation
Figure 4.1: hmRIP-seq Protocol Flowchart. Flowchart of steps in the hmRIP-seq protocol.
[Figure 4.2: gel images, panels a and b. Lanes: ladder; 0, 15, and 60 minutes Bioruption; ladder; 150U, 200U, 225U, and 250U DNase. Ladder markers: 100 bp, 200 bp, 300 bp, 400 bp, 500 bp, 1 kb.]
Figure 4.2: hmRIP-seq Tests. a) Titration of DNase points for the hmRIP-seq protocol. b) Test of Bioruption time points for the hmRIP-seq protocol.
[Figure 4.3: browser tracks. Panels: a) HOTAIR (NR_003716), with the PRC2-interacting region marked; b) MALAT1 (NR_002819); c) NEAT1 (NR_131012). Tracks in each panel: K27 hmRIP enrichment, K27 hmRIP, Input.]
Figure 4.3: hmRIP-seq Profiles at lncRNAs. Input and H3K27me3 hmRIP-seq profiles, in reads per million, for known heterochromatin-interacting transcripts. At the top of each plot are enrichment values across each gene, calculated as log(K27/input).
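The enrichment track in Figure 4.3 is described as log(K27/input) computed on reads-per-million profiles. A minimal sketch of that calculation is given below. The log base and the handling of positions with zero coverage are not stated in the text, so the base-2 default and the pseudocount used here are assumptions, as are the function and parameter names.

```python
import math


def reads_per_million(counts, total_mapped_reads):
    """Scale raw per-position read counts to reads per million mapped reads."""
    scale = 1_000_000 / total_mapped_reads
    return [count * scale for count in counts]


def log_enrichment(ip_rpm, input_rpm, pseudocount=1.0, base=2.0):
    """Per-position log ratio of IP signal over input signal.

    The pseudocount (an assumption, not taken from the thesis) avoids
    division by zero and log(0) at positions with no coverage.
    """
    return [
        math.log((ip + pseudocount) / (inp + pseudocount), base)
        for ip, inp in zip(ip_rpm, input_rpm)
    ]
```

Positive values indicate positions where the H3K27me3 pulldown is over-represented relative to input, and negative values indicate depletion, matching the symmetric axis of the enrichment track.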
[Figure 4.4: browser tracks for MTRNR2L12 (ENST00000600213). Other annotated lncRNAs at the locus: TCONS_00007196, TCONS_00007197, TCONS_00006915, TCONS_00006916, TCONS_00006917, TCONS_00006918, TCONS_00007198, TCONS_00006919, TCONS_00006556. Tracks: K27 hmRIP enrichment, K27 hmRIP, Input, ENCODE MCF7 H3K27me3 ChIP-seq.]
Figure 4.4: MTRNR2L12 hmRIP-seq and Confirmatory qPCR. MTRNR2L12 displays strong hmRIP signal for H3K27me3. Relative hmRIP and input signal are shown, along with enrichment, for MTRNR2L12 as in Figure 4.3. Also included are lncRNAs predicted at this locus (Cabili et al. Genes and Development 2011). Below, RT-qPCR results of MTRNR2L12 loci displaying H3K27me3 hmRIP signal, as compared to RT-qPCR results from highly abundant control RNAs. RT-qPCR was performed on hmRIP samples using antibodies to H3K27me3, IgG, and Ezh2, a PRC2 component.
[Figure 4.5: browser tracks, panels a to l, one per paralog: MTRNR2L1 (NM_001190452), MTRNR2L2 (NM_001190470), MTRNR2L3 (NM_001190472), MTRNR2L4 (NM_001190476), MTRNR2L5 (NM_001190478), MTRNR2L6 (NM_001190487), MTRNR2L7 (NM_001190489), MTRNR2L8 (NM_001190702), MTRNR2L9 (NM_001190706), MTRNR2L10 (NM_001190708), MTRNR2L11 (ENST00000604646), MTRNR2L13 (ENST00000604093). Tracks in each panel: K27 hmRIP enrichment, K27 hmRIP, Input, ENCODE MCF7 H3K27me3 ChIP-seq.]
Figure 4.5: MTRNR2L12 Paralogs hmRIP-seq. hmRIP-seq profiles of MTRNR2L1–11 and MTRNR2L13.
[Figure 4.6: a) tree with leaves L11, L10, L7, L3, L13, L4, L5, L6, L1, L9, L12, L2, L8, and MT-RNR2; b) bar chart.]
Figure 4.6: MTRNR2L12 Paralogs Relationship and H3K27me3 ChIP-qPCR. a) Evolutionary distance between MTRNR2L12, MT-RNR2, and other paralogs. Tree generated using UCSC Genome Browser (http://genome.ucsc.edu), Clustal Omega (http://www.clustal.org/), and Dendroscope (http://dendroscope.org). b) ChIP-qPCR of MTRNR2L12 loci (same amplicons as Figure 4.4) and the other MT-RNR2 paralogs using DNA from an H3K27me3 hmRIP experiment. Y-axis is percent input, normalized against percent input at control locus “+.” Error bars represent standard error (n=4).
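Figure 4.6b reports percent input normalized against percent input at a control locus, with standard error across replicates. The sketch below shows one common way such values are derived from qPCR threshold cycles. It is offered as an assumption-laden illustration and not as the calculation actually performed here: the Ct-based percent-input formula, the 1% input fraction default, the assumption of perfect amplification efficiency, and all function names are choices made for the example.

```python
import math
import statistics


def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent input from threshold cycles, assuming 100% PCR efficiency.

    The input Ct is first adjusted as if the whole chromatin sample,
    not just `input_fraction` of it, had been measured.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


def normalized_signal(target_pct, control_pct):
    """Mean and standard error of target/control percent-input ratios.

    Both arguments are lists with one value per replicate, paired by index.
    At least two replicates are needed to estimate the standard error.
    """
    ratios = [target / control for target, control in zip(target_pct, control_pct)]
    mean = statistics.mean(ratios)
    sem = statistics.stdev(ratios) / math.sqrt(len(ratios))
    return mean, sem
```

Dividing by the positive control within each replicate before averaging keeps replicate-to-replicate differences in pulldown efficiency from inflating the error bars.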
[Figure 4.7: schematic. Labels: MTRNR2L12, MTRNR2L2, MTRNR2L3, MTRNR2L7, and three X marks.]
Figure 4.7: Potential Mechanism of MTRNR2L12-Mediated Paralog Silencing. MTRNR2L12 is transcribed, then binds to PRC2 at the genomic loci of its paralogs, effectively silencing them. MTRNR2L12 may also associate with heterochromatin via other heterochromatin-associated proteins.
[Figure 4.8: a) browser tracks at SORCS1 (NM_052918): Input, H3K27me3 hmRIP, H3K9me2 hmRIP, RepeatMasker LINEs; b) schematic with labels Piwi, Pol II, Active LINE, ORF1, ORF2, Nucleus, and numbered steps 1 to 5.]
Figure 4.8: Model of Potential LINE Functions. a) An example of a LINE with diffuse H3K9me2 hmRIP-seq across it. b) Model of potential LINE transcript-H3K9me2 interactions.
(1) An active LINE is transcribed and exported from the nucleus.
(2) The LINE transcript is recognized by a piRNA-associated Piwi complex, which contains proteins that degrade the LINE transcript.
(3) The LINE transcript codes for two proteins, ORF1 and ORF2.
(4) ORF1, a chaperone, and ORF2, a reverse transcriptase, can facilitate re-integration of the LINE transcript into the genome at a different location.
(5) The LINE transcript can also be recognized by nuclear Piwi complexes that contain histone methyltransferases. These complexes can silence LINEs by depositing H3K9me2 upstream of the LINE transcription start site.
[Figure 4.9: line plot. X-axis: distance along LINE (0% to 100%), spanning 5′ UTR, ORF1, ORF2, and 3′ UTR. Y-axis: frequency (0.00 to 0.05). Legend (LINE Type): Intact, Intact ORF2, All Full Length.]
Figure 4.9: hmRIP-seq Reads in LINEs. hmRIP-seq H3K9me2 sequencing reads mapped along all genomic LINEs, as classified by L1Base. Green arrows mark peaks that only appear in LINEs that are fully intact or only have an intact ORF2. Red arrow marks a peak that only appears in fully intact LINEs.
CHAPTER V
DISCUSSION
The world of RNA-associating proteins is vast. In this thesis, I have focused on a small subset of RNA-protein interactions: interactions involved in chromatin
and silencing, and interactions that are involved in RNA trafficking. The insights
gained from research into these areas, however, can also be applied to many other
aspects of RNA-protein interactions.
Research on the role of RNA in chromatin silencing has thus far focused on the potential for transcripts to act as a targeting mechanism for histone-modifying protein
complexes. The Matchmaker Model proposed by our lab provides a twist to this
formula, shifting the focus from individual protein-RNA interactions towards
networks of interactions in which multiple factors must be present and interacting
with each other appropriately in order for gene silencing to occur. This paradigm
greatly increases the potential complexity of targeting interactions.
With greater potential for complexity comes greater potential for something
to go wrong. In the example of HOTAIR-mediated PRC2 gene silencing,
dysregulation of or mutations in either of those components, hnRNP A2/B1, or the
nascent transcript that A2/B1 binds, could result in disrupted PRC2 activity. It is
reasonable to wonder how such a complex system evolved, when a simpler system
(in which HOTAIR binds to PRC2 and specific gene loci, and PRC2 acts whenever it
is adjacent to chromatin) appears as though it would suffice to regulate gene
silencing in most circumstances.
One possibility is that PRC2 is a multifunctional protein complex that has
functions outside of catalyzing the formation of H3K27me3. Although research on
PRC2 has focused on its effects on H3K27me3, its subunits also contain binding regions for histones and non-H3K27me3 histone PTMs. If PRC2 plays some role on
chromatin aside from silencing, it would make sense that another cofactor would be
necessary to activate the silencing function of PRC2, deposition of H3K27me3, and
the silencing of the locus.
Another possibility is that having multiple factors involved in silencing
through an interaction network could facilitate quicker evolution of gene targeting. If
mutation of any factor in the network can lead to disease, then such mutations can
also lead to novel gene silencing or activation events. On a small scale this could be
advantageous, but a large number of novel silencing targets would predispose the
organism to disease. Since lncRNAs have been shown to evolve more rapidly than
proteins and other types of transcripts (Kapusta and Feschotte, 2014; Necsulea et al.,
2014; Washietl et al., 2014), they would likely be the primary driver of changes in
targeting through this mechanism. PRC2 and A2/B1, although they have evolved
more slowly than lncRNAs, could also provide changes to gene targeting specificity.
It is also possible that modifications, such as alternate splice isoforms (in the case of
A2/B1), phosphorylation (in the case of Ezh2 (Kaneko et al., 2010)), or methylation
(in the case of Jarid2 (Sanulli et al., 2015)) of these proteins could affect their
function and specificity.
Our identification of binding of A2/B1 to intron 3 of HOTAIR suggests new
possibilities regarding the roles that RNA-binding proteins can play. Since A2/B1 is a
known splicing regulator, one possibility would be for A2/B1 to bind intron 3 merely
to regulate its alternative splicing. In fact, such binding may be responsible for the
small fraction of intron 3 that is retained in the HOTAIR transcript in total RNA.
However, our evidence that A2/B1 also binds fully spliced HOTAIR suggests that
binding to intron 3 may simply represent a precursor to subsequent binding
elsewhere in HOTAIR domain 1.
It is possible that intron 3 represents a particularly strong binding site for
A2/B1, and serves to recruit A2/B1 more strongly than a fully spliced HOTAIR
would. If even a fraction of A2/B1 remained bound to HOTAIR following the splicing
of intron 3, that would enhance A2/B1-HOTAIR interactions, leading to increased
formation of RNA-RNA interactions between HOTAIR and other transcripts. An
interesting experiment to perform would be overexpression of HOTAIR without
intron 3, to see if A2/B1 binding and subsequent B1-mediated silencing events are
affected genome-wide.
Although investigation of A2/B1 by eCLIP is a good way to study the silencing
aspects of A2/B1, it also allows for study of A2/B1 away from chromatin, a role which
has been much less extensively studied than its actions on chromatin and with
nascent transcripts. We have characterized the activity of cytoplasmic A2/B1 in
cancer cell lines. The identification of numerous A2/B1 targets in the cytoplasm
under normal circumstances was a fascinating observation, given that the existence
of many RNA-binding proteins in the cytoplasm was previously thought to be
predominantly pathologic.
Our eCLIP fractionation experiment showed a small amount of cytoplasmic and nucleoplasmic A2/B1-RNA binding in normal MCF7 cells, with a far different binding profile than chromatin-associated A2/B1. Most striking was the identification of strong binding to proximal introns and 5′ UTR sequences, as opposed to exonic sequence. The fraction of cytoplasmic RNAs that is exonic is far
higher than in unspliced chromatin-associated RNA, leading us to predict that A2/B1
would be more likely to bind exonic sequence in the cytoplasm.
Our analysis instead suggests other roles for cytoplasmic A2/B1, as the lack of
exonic binding potentially evolved to prevent interference between A2/B1 binding
and interactions between RNAs and the actively translating ribosomes. First, binding
to proximal introns likely represents a consequence of A2/B1 modulating the splicing
of those transcripts on chromatin, representing a potential mechanism for regulation
of their maturation. A potential model for this would involve A2/B1 binding to and
interfering with the splicing of an intron, then remaining bound to that intron following nuclear export. It is also possible that A2/B1 binds in order to slow the process of mRNA maturation, allowing splicing to happen eventually but at a specifically defined time or place.
Second, 5′ UTR binding likely represents binding to regulatory regions. While binding of A2/B1 to stability-associated elements in the 3′ UTR has been shown
previously (Geissler et al., 2016), no such binding has been identified at the 5′ end.
Although there are no functional examples of A2 interacting with the 5′ UTR, its
paralog hnRNP A1 has been shown to interact with the 5′ UTR of a viral RNA (Lin et
al., 2009), providing a model for how 5′ UTR binding could function. It is possible
that A2/B1 is binding to 5′ UTR elements in human transcripts in a similar manner,
in a process involving elements that have not yet been described.
C2C12 mouse muscle cells represent a far different cellular environment than
MCF7 breast adenocarcinoma cells. For example, our eCLIP of A2/B1 in myoblasts
indicated a large proportion of binding to the 3′ UTR, similar to eCLIP in human
neurons (Martinez et al., 2016) but far different from the preferential intron binding
detected in our nuclear MCF7 dataset. This suggests that A2/B1 binding sites are highly variable and dependent on both cell type and local cellular environment. On a
transcript containing an A2/B1 binding site, that site might only be used in certain
cell types, in certain cellular compartments, or at certain times in development.
Our comparison of TDP-43 and A2/B1 eCLIP libraries provides evidence for the hypothesis that the transcriptome may be subdivided by association with different RNA-binding proteins (Mohagheghi et al., 2016). Between TDP-43 and
A2/B1 there are a significant number of uniquely bound transcripts, and even on the transcripts that are bound by both proteins there appear to be regions of transcript that are only bound by one or the other protein. This suggests a model in which transcripts can be bound cooperatively by a number of RNA-binding proteins, each with its own distinct binding regions.
Combinatorial transcript recognition by RNA binding proteins could represent a major point of regulation for particular transcripts. For example, binding of RNA by TDP-43 could promote aggregation into RNP granules, as TDP-43 tends to form octameric complexes in the cell. If RNAs in such granules are also bound by hnRNP A1, that could then promote RNA-RNA interactions in the granule, as the antiparallel orientation of the RRMs in A1 could promote base pairing between different transcripts (Shamoo et al., 1997). Transcripts are likely to contain regions that are bound by more than one protein; the RepA region of Xist is one example, since it appears to be bound by both TDP-43 and A2/B1. It is possible that binding by one protein might prevent binding by another to the same region, allowing for more binding combinations if one protein is absent.
The aggregation of TDP-43 into oligomers that display features of amyloid
structures is of specific interest, as amyloid is described almost exclusively in terms
of its pathologic potential. In humans there are few examples of non-pathologic
amyloid formation, including packing of hormone peptides in secretory granules
(Maji et al., 2009) and proteins that are specific to melanosomes (Berson et al.,
2003). Both of these examples involve proteins that take advantage of the natural properties of amyloid: in secretory granules, it would be advantageous to have proteins packed as tightly as possible in a small space. Meanwhile, in melanosomes
the aggregative properties of amyloid structures appear to aid in the pigmentation of the melanosomes.
The discovery of non-pathogenic amyloid structures in muscle cells thus represents one of the first identifications of functional amyloid in an organ system where it is thought to cause disease. This redefines the questions that researchers should ask when thinking about the effect that amyloid has on disease: rather than assuming that amyloid structures are pathogenic, the question should be how the timing of amyloid formation and degradation is altered in disease. Our data indicate strong binding of TDP-43 to myogenic transcripts during muscle differentiation. If cytoplasmic TDP-43 is also binding long myogenic transcripts in disease states, that would almost certainly affect cellular function.
First, the massive size of many myogenic transcripts (a fully spliced Titin mRNA transcript is 114 kilobases, which would be over 30 µm long if unfolded) would lead them to interfere with other cellular functions, even if binding by TDP-43
(which binds across the length of the mature transcript) prompts those transcripts to fold into a more compact shape. Second, cytoplasmic localization of TDP-43 during
normal cellular function would have an adverse effect on RNA homeostasis in both
the nucleus and cytoplasm. Less TDP-43 in the nucleus would adversely affect the
alternative splicing events that it mediates, while more TDP-43 in the cytoplasm
could lead to inappropriate localization of mRNAs that it would normally bind.
The investigations into protein-RNA interactions described in this thesis
suggest a number of avenues for future research. Our study of hnRNP A2/B1
provides evidence for a number of proposed activities that it might perform, and
posits it as a multifunctional protein that acts within numerous interrelated
pathways. The collaborative work we have performed on TDP-43 clarifies its
functional role in muscle, where it has previously been thought to act pathologically, and sets the stage for future investigations into its critical role in myogenesis.
Subsequent studies characterizing these proteins, their interactions, and their associations with disease will have the opportunity to build upon these results and give a more complete picture of how their interactions with RNA function in the cell.
REFERENCES
Abdella, P.M., Smith, P.K., and Royer, G.P. (1979). A new cleavable reagent for cross-linking and reversible immobilization of proteins. Biochem. Biophys. Res. Commun. 87, 734–742.
Alarcón, C.R., Goodarzi, H., Lee, H., Liu, X., Tavazoie, S., and Tavazoie, S.F. (2015). HNRNPA2B1 Is a Mediator of m6A-Dependent Nuclear RNA Processing Events. Cell 162, 1299–1308.
Amit-Avraham, I., Pozner, G., Eshar, S., Fastman, Y., Kolevzon, N., Yavin, E., and Dzikowski, R. (2015). Antisense long noncoding RNAs regulate var gene activation in the malaria parasite Plasmodium falciparum. Proc Natl Acad Sci USA 112, E982–E991.
Amlie-Wolf, A., Ryvkin, P., Tong, R., Dragomir, I., Suh, E., Xu, Y., Van Deerlin, V.M., Gregory, B.D., Kwong, L.K., Trojanowski, J.Q., et al. (2015). Transcriptomic Changes Due to Cytoplasmic TDP-43 Expression Reveal Dysregulation of Histone Transcripts and Nuclear Chromatin. PLoS ONE 10, e0141836.
Augui, S., Nora, E.P., and Heard, E. (2011). Regulation of X-chromosome inactivation by the X-inactivation centre. Nat. Rev. Genet. 12, 429–442.
Bailey, T.L. (2011). DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659.
Barbosa-Morais, N.L., Carmo-Fonseca, M., and Aparício, S. (2006). Systematic genome-wide annotation of spliceosomal proteins reveals differential gene family expansion. Genome Res 16, 66–77.
Beltran, M., Yates, C.M., Skalska, L., Dawson, M., Reis, F.P., Viiri, K., Fisher, C.L., Sibley, C.R., Foster, B.M., Bartke, T., et al. (2016). The interaction of PRC2 with RNA or chromatin is mutually antagonistic. Genome Res 26, 896–907.
Bengoechea, R., Pittman, S.K., Tuck, E.P., True, H.L., and Weihl, C.C. (2015). Myofibrillar disruption and RNA-binding protein aggregation in a mouse model of limb-girdle muscular dystrophy 1D. Hum Mol Genet 24, 6588–6602.
Bensasson, D., Zhang, D.X., Hartl, D.L., and Hewitt, G.M. (2001). Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends in Ecology & … 16, 314–321.
Berson, A., Barbash, S., Shaltiel, G., Goll, Y., Hanin, G., Greenberg, D.S., Ketzef, M., Becker, A.J., Friedman, A., and Soreq, H. (2012). Cholinergic-associated loss of hnRNP-A/B in Alzheimer's disease impairs cortical splicing and cognitive function in mice. EMBO Mol Med 4, 730–742.
Berson, J.F., Theos, A.C., Harper, D.C., Tenza, D., Raposo, G., and Marks, M.S. (2003). Proprotein convertase cleavage liberates a fibrillogenic fragment of a resident glycoprotein to initiate melanosome biogenesis. The Journal of Cell Biology 161, 521–533.
Bhardwaj, A., Myers, M.P., Buratti, E., and Baralle, F.E. (2013). Characterizing TDP-43 interaction with its RNA targets. Nucleic Acids Research 41, 5062–5074.
Biamonti, G., Ruggiu, M., Saccone, S., Valle, Della, G., and Riva, S. (1994). Two homologous genes, originated by duplication, encode the human hnRNP proteins A2 and A1. Nucleic Acids Research 22, 1996–2002.
Bik-Multanowski, M., Pietrzyk, J.J., and Midro, A. (2015). MTRNR2L12: A Candidate Blood Marker of Early Alzheimer’s Disease-Like Dementia in Adults with Down Syndrome. Jad 46, 145–150.
Bin Cho, S., Ahn, K.J., Do Hee Kim, Zheng, Z., Cho, S., Kang, S.-W., Lee, J.H., Park, Y.-B., Lee, K.H., and Bang, D. (2012). Identification of HnRNP-A2/B1 as a Target Antigen of Anti-Endothelial Cell IgA Antibody in Behçet's Disease. Journal of Investigative Dermatology 132, 601–608.
Bodzioch, M., Lapicka-Bodzioch, K., Zapala, B., Kamysz, W., Kiec-Wilk, B., and Dembinska-Kiec, A. (2009). Evidence for potential functionality of nuclearly-encoded humanin isoforms. Genomics 94, 247–256.
Boettiger, A.N., Bintu, B., Moffitt, J.R., Wang, S., Beliveau, B.J., Fudenberg, G., Imakaev, M., Mirny, L.A., Wu, C.-T., and Zhuang, X. (2016). Super-resolution imaging reveals distinct chromatin folding for different epigenetic states. Nature 529, 418–422.
Borodina, T., Adjaye, J., and Sultan, M. (2011). A strand-specific library preparation protocol for RNA sequencing. Meth. Enzymol. 500, 79–98.
Boros, J., Arnoult, N., Stroobant, V., Collet, J.-F., and Decottignies, A. (2014). Polycomb repressive complex 2 and H3K27me3 cooperate with H3K9 methylation to maintain heterochromatin protein 1α at chromatin. Mol. Cell. Biol. 34, 3662–3674.
Bulut-Karslioglu, A., La Rosa-Velázquez, De, I.A., Ramirez, F., Barenboim, M., Onishi-Seebacher, M., Arand, J., Galán, C., Winter, G.E., Engist, B., Gerle, B., et al. (2014). Suv39h-dependent H3K9me3 marks intact retrotransposons and silences LINE elements in mouse embryonic stem cells. Mol Cell 55, 277–290.
Buratti, E., Dörk, T., Zuccato, E., Pagani, F., Romano, M., and Baralle, F.E. (2001). Nuclear factor TDP-43 and SR proteins promote in vitro and in vivo CFTR exon 9 skipping. The EMBO Journal 20, 1774–1784.
Buratti, E., Brindisi, A., Giombi, M., Tisminetzky, S., Ayala, Y.M., and Baralle, F.E. (2005). TDP-43 binds heterogeneous nuclear ribonucleoprotein A/B through its C-terminal tail: an important region for the inhibition of cystic fibrosis transmembrane conductance regulator exon 9 splicing. J Biol Chem 280, 37572–37584.
Burattini, S., Ferri, P., Battistelli, M., Curci, R., Luchetti, F., and Falcieri, E. (2004). C2C12 murine myoblasts as a model of skeletal muscle development: morpho-functional characterization. Eur J Histochem 48, 223–233.
Burd, C.G., and Dreyfuss, G. (1994). RNA binding specificity of hnRNP A1: significance of hnRNP A1 high-affinity binding sites in pre-mRNA splicing. The EMBO Journal 13, 1197–1204.
Burd, C.G., Swanson, M.S., Görlach, M., and Dreyfuss, G. (1989). Primary structures of the heterogeneous nuclear ribonucleoprotein A2, B1, and C2 proteins: a diversity of RNA binding proteins is generated by small peptide inserts. Proc Natl Acad Sci USA 86, 9788–9792.
Busch, A., and Hertel, K.J. (2012). Evolution of SR protein and hnRNP splicing regulatory factors. WIREs RNA 3, 1–12.
Busch, A., Richter, A.S., and Backofen, R. (2008). IntaRNA: efficient prediction of bacterial sRNA targets incorporating target site accessibility and seed regions. Bioinformatics 24, 2849–2856.
Cabili, M.N., Trapnell, C., Goff, L., Koziol, M., Tazon-Vega, B., Regev, A., and Rinn, J.L. (2011). Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev 25, 1915–1927.
Cao, R., and Zhang, Y. (2004). SUZ12 Is Required for Both the Histone Methyltransferase Activity and the Silencing Function of the EED-EZH2 Complex. Mol Cell 15, 57–67.
Cesana, M., Cacchiarelli, D., Legnini, I., Santini, T., Sthandier, O., Chinappi, M., Tramontano, A., and Bozzoni, I. (2011). A long noncoding RNA controls muscle differentiation by functioning as a competing endogenous RNA. Cell 147, 358–369.
Chow, J.C., Ciaudo, C., Fazzari, M.J., Mise, N., Servant, N., Glass, J.L., Attreed, M., Avner, P., Wutz, A., Barillot, E., et al. (2010). LINE-1 activity in facultative heterochromatin formation during X chromosome inactivation. Cell 141, 956–969.
Chu, C., Qu, K., Zhong, F.L., Artandi, S.E., and Chang, H.Y. (2011). Genomic Maps of Long Noncoding RNA Occupancy Reveal Principles of RNA-Chromatin Interactions. Mol Cell 44, 667–678.
Chu, C., Zhang, Q.C., da Rocha, S.T., Flynn, R.A., Bharadwaj, M., Calabrese, J.M., Magnuson, T., Heard, E., and Chang, H.Y. (2015). Systematic discovery of Xist RNA binding proteins. Cell 161, 404–416.
Ciferri, C., Lander, G.C., Maiolica, A., Herzog, F., Aebersold, R., and Nogales, E. (2012). Molecular architecture of human polycomb repressive complex 2. eLife 1, e00005.
Cifuentes-Rojas, C., Hernandez, A.J., Sarma, K., and Lee, J.T. (2014). Regulatory interactions between RNA and polycomb repressive complex 2. Mol Cell 55, 171–185.
Crawford, G.E., Holt, I.E., Whittle, J., Webb, B.D., Tai, D., Davis, S., Margulies, E.H., Chen, Y., Bernat, J.A., Ginsburg, D., et al. (2006). Genome-wide mapping of DNase hypersensitive sites using massively parallel signature sequencing (MPSS). Genome Res 16, 123–131.
da Rocha, S.T., Boeva, V., Escamilla-Del-Arenal, M., Ancelin, K., Granier, C., Matias, N.R., Sanulli, S., Chow, J., Schulz, E., Picard, C., et al. (2014). Jarid2 Is Implicated in the Initial Xist-Induced Targeting of PRC2 to the Inactive X Chromosome. Mol Cell 53, 301–316.
Darzacq, X., Jády, B.E., Verheggen, C., Kiss, A.M., Bertrand, E., and Kiss, T. (2002). Cajal body-specific small nuclear RNAs: a novel class of 2'-O-methylation and pseudouridylation guide RNAs. The EMBO Journal 21, 2746–2756.
Davidovich, C., Wang, X., Cifuentes-Rojas, C., Goodrich, K.J., Gooding, A.R., Lee, J.T., and Cech, T.R. (2015). Toward a consensus on the binding specificity and promiscuity of PRC2 for RNA. Mol Cell 57, 552–558.
Davidovich, C., Zheng, L., Goodrich, K.J., and Cech, T.R. (2013). Promiscuous RNA binding by Polycomb repressive complex 2. Nat Struct Mol Biol 20, 1250–1257.
Dayama, G., Emery, S.B., Kidd, J.M., and Mills, R.E. (2014). The genomic landscape of polymorphic human nuclear mitochondrial insertions. Nucleic Acids Research 42, 12640–12649.
de Klerk, E., Fokkema, I.F.A.C., Thiadens, K.A.M.H., Goeman, J.J., Palmblad, M., Dunnen, den, J.T., Lindern, von, M., and 't Hoen, P.A.C. (2015). Assessing the translational landscape of myogenic differentiation by ribosome profiling. Nucleic Acids Research 43, 4408–4428.
De Raedt, T., Beert, E., Pasmant, E., Luscan, A., Brems, H., Ortonne, N., Helin, K., Hornick, J.L., Mautner, V., Kehrer-Sawatzki, H., et al. (2014). PRC2 loss amplifies Ras-driven transcription and confers sensitivity to BRD4-based therapies. Nature 514, 247–251.
Denisenko, O.N., and Bomsztyk, K. (1997). The product of the murine homolog of the Drosophila extra sex combs gene displays transcriptional repressor activity. Mol. Cell. Biol. 17, 4707–4717.
Derrien, T., Johnson, R., Bussotti, G., Tanzer, A., Djebali, S., Tilgner, H., Guernec, G., Martin, D., Merkel, A., Knowles, D.G., et al. (2012). The GENCODE v7 catalog of human long noncoding RNAs: Analysis of their gene structure, evolution, and expression. Genome Res 22, 1775–1789.
Ding, J., Hayashi, M.K., Zhang, Y., Manche, L., Krainer, A.R., and Xu, R.M. (1999). Crystal structure of the two-RRM domain of hnRNP A1 (UP1) complexed with single-stranded telomeric DNA. Genes Dev 13, 1102–1115.
Djebali, S., Davis, C.A., Merkel, A., Dobin, A., Lassmann, T., Mortazavi, A., Tanzer, A., Lagarde, J., Lin, W., Schlesinger, F., et al. (2012). Landscape of transcription in human cells. Nature 488, 101–108.
Dreyfuss, G., Kim, V.N., and Kataoka, N. (2002). Messenger-RNA-binding proteins and the messages they carry. Nat. Rev. Mol. Cell Biol. 3, 195–205.
Dunham, I., Aldred, S.F., Davis, C.A., Doyle, F., Harrow, J., Pauli, F., Rosenbloom, K.R., Sabo, P., Safi, A., Simon, J.M., et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74.
D’Ambrogio, A., Buratti, E., Stuani, C., Guarnaccia, C., Romano, M., Ayala, Y.M., and Baralle, F.E. (2009). Functional mapping of the interaction between TDP-43 and hnRNP A2 in vivo. Nucleic Acids Research 37, 4116–4126.
Elisaphenko, E.A., Kolesnikov, N.N., Shevchenko, A.I., Rogozin, I.B., Nesterova, T.B., Brockdorff, N., and Zakian, S.M. (2008). A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements. PLoS ONE 3, e2521.
Engreitz, J.M., Pandya-Jones, A., McDonel, P., Shishkin, A., Sirokman, K., Surka, C., Kadri, S., Xing, J., Goren, A., Lander, E.S., et al. (2013). The Xist lncRNA Exploits Three-Dimensional Genome Architecture to Spread Across the X Chromosome. Science 341, 1237973–1237973.
Fan, X., Messaed, C., Dion, P., Laganiere, J., Brais, B., Karpati, G., and Rouleau, G.A. (2014). hnRNP A1 and A/B Interaction with PABPN1 in Oculopharyngeal Muscular Dystrophy. Canadian Journal of Neurological Sciences / Journal Canadien Des Sciences Neurologiques 30, 244–251.
Fatimy, El, R., Davidovic, L., Tremblay, S., Jaglin, X., Dury, A., Robert, C., De Koninck, P., and Khandjian, E.W. (2016). Tracking the Fragile X Mental Retardation Protein in a Highly Ordered Neuronal RiboNucleoParticles Population: A Link between Stalled Polyribosomes and RNA Granules. PLoS Genet. 12, e1006192.
Flynn, R.A., Martin, L., Spitale, R.C., Do, B.T., Sagan, S.M., Zarnegar, B., Qu, K., Khavari, P.A., Quake, S.R., Sarnow, P., et al. (2014). Dissecting noncoding and pathogen RNA–protein interactomes. RNA 21, 135–143.
Franke, A., DeCamillis, M., Zink, D., Cheng, N., Brock, H.W., and Paro, R. (1992). Polycomb and polyhomeotic are constituents of a multimeric protein complex in chromatin of Drosophila melanogaster. The EMBO Journal 11, 2941–2950.
Geissler, R., Simkin, A., Floss, D., Patel, R., Fogarty, E.A., Scheller, J., and Grimson, A. (2016). A widespread sequence-specific mRNA decay pathway mediated by hnRNPs A1 and A2/B1. Genes Dev 30, 1070–1085.
Gerstein, M.B., Kundaje, A., Hariharan, M., Landt, S.G., Yan, K.-K., Cheng, C., Mu, X.J., Khurana, E., Rozowsky, J., Alexander, R., et al. (2012). Architecture of the human regulatory network derived from ENCODE data. Nature 488, 91–100.
Glazko, G.V., Zybailov, B.L., and Rogozin, I.B. (2012). Computational prediction of polycomb-associated long non-coding RNAs. PLoS ONE 7, e44878.
Goodarzi, H., Najafabadi, H.S., Oikonomou, P., Greco, T.M., Fish, L., Salavati, R., Cristea, I.M., and Tavazoie, S. (2013). Systematic discovery of structural elements governing stability of mammalian messenger RNAs. Nature 485, 264–268.
Gueroussov, S., Weatheritt, R.J., O'Hanlon, D., Lin, Z.-Y., Narula, A., Gingras, A.-C., and Blencowe, B.J. (2017). Regulatory Expansion in Mammals of Multivalent hnRNP Assemblies that Globally Control Alternative Splicing. Cell 170, 324–339.e23.
Guo, F., Jiao, F., Song, Z., Li, S., Liu, B., Yang, H., Zhou, Q., and Li, Z. (2015). Regulation of MALAT1 expression by TDP43 controls the migration and invasion of non-small cell lung cancer cells in vitro. Biochem. Biophys. Res. Commun. 465, 293–298.
Gupta, R.A., Shah, N., Wang, K.C., Kim, J., Horlings, H.M., Wong, D.J., Tsai, M.-C., Hung, T., Argani, P., Rinn, J.L., et al. (2011). Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 464, 1071–1076.
Guttman, M., Donaghey, J., Carey, B.W., Garber, M., Grenier, J.K., Munson, G., Young, G., Lucas, A.B., Ach, R., Bruhn, L., et al. (2012). lincRNAs act in the circuitry controlling pluripotency and differentiation. Nature 477, 295–300.
Han, H., Braunschweig, U., Gonatopoulos-Pournatzis, T., Weatheritt, R.J., Hirsch, C.L., Ha, K.C.H., Radovani, E., Nabeel-Shah, S., Sterne-Weiler, T., Wang, J., et al. (2017). Multilayered Control of Alternative Splicing Regulatory Networks by Transcription Factors. Mol Cell 65, 539–553.e7.
Harrow, J., Denoeud, F., Frankish, A., Reymond, A., Chen, C.-K., Chrast, J., Lagarde, J., Gilbert, J.G.R., Storey, R., Swarbreck, D., et al. (2006). GENCODE: producing a reference annotation for ENCODE. Genome Biol 7 Suppl 1, S4.1–S4.9.
Hasegawa, Y., Brockdorff, N., Kawano, S., Tsutui, K., Tsutui, K., and Nakagawa, S. (2010). The Matrix Protein hnRNP U Is Required for Chromosomal Localization of Xist RNA. Developmental Cell 19, 469–476.
Hatfield, J.T., Rothnagel, J.A., and Smith, R. (2002). Characterization of the mouse hnRNP A2/B1/B0 gene and identification of processed pseudogenes. Gene 295, 33–42.
He, Y., Selvaraju, S., Curtin, M.L., Jakob, C.G., Zhu, H., Comess, K.M., Shaw, B., The, J., Lima-Fernandes, E., Szewczyk, M.M., et al. (2017). The EED protein-protein interaction inhibitor A-395 inactivates the PRC2 complex. Nat. Chem. Biol. 13, 389–395.
Huelga, S.C., Vu, A.Q., Arnold, J.D., Liang, T.Y., Liu, P.P., Yan, B.Y., Donohue, J.P., Shiue, L., Hoon, S., Brenner, S., et al. (2012). Integrative genome-wide analysis reveals cooperative regulation of alternative splicing by hnRNP proteins. Cell Rep 1, 167–178.
Huppertz, I., Attig, J., D’Ambrogio, A., Easton, L.E., Sibley, C.R., Sugimoto, Y., Tajnik, M., König, J., and Ule, J. (2014). iCLIP: protein-RNA interactions at nucleotide resolution. Methods 65, 274–287.
Ji, Z., Song, R., Huang, H., Regev, A., and Struhl, K. (2016). Transcriptome-scale RNase-footprinting of RNA-protein complexes. Nature Biotechnology 34, 410–413.
Jiang, J., Jing, Y., Cost, G.J., Chiang, J.-C., Kolpa, H.J., Cotton, A.M., Carone, D.M., Carone, B.R., Shivak, D.A., Guschin, D.Y., et al. (2013). Translating dosage compensation to trisomy 21. Nature 500, 296–300.
Johnson, R., and Guigó, R. (2014). The RIDL hypothesis: transposable elements as functional domains of long noncoding RNAs. RNA 20, 959–976.
Jürgens, G. (1985). A group of genes controlling the spatial expression of the bithorax complex in Drosophila. Nature 316, 153–155.
Kamma, H., Horiguchi, H., Wan, L., Matsui, M., Fujiwara, M., Fujimoto, M., Yazawa, T., and Dreyfuss, G. (1999). Molecular characterization of the hnRNP A2/B1 proteins: tissue-specific expression and novel isoforms. Exp. Cell Res. 246, 399–411.
Kaneko, S., Bonasio, R., Saldaña-Meyer, R., Yoshida, T., Son, J., Nishino, K., Umezawa, A., and Reinberg, D. (2014a). Interactions between JARID2 and noncoding RNAs regulate PRC2 recruitment to chromatin. Mol Cell 53, 290–300.
Kaneko, S., Li, G., Son, J., Xu, C.-F., Margueron, R., Neubert, T.A., and Reinberg, D. (2010). Phosphorylation of the PRC2 component Ezh2 is cell cycle-regulated and up-regulates its binding to ncRNA. Genes Dev 24, 2615–2620.
Kaneko, S., Son, J., Bonasio, R., Shen, S.S., and Reinberg, D. (2014b). Nascent RNA interaction keeps PRC2 activity poised and in check. Genes Dev 28, 1983–1988.
Kaneko, S., Son, J., Shen, S.S., Reinberg, D., and Bonasio, R. (2013). PRC2 binds active promoters and contacts nascent RNAs in embryonic stem cells. Nat Struct Mol Biol 20, 1258–1264.
Kapusta, A., and Feschotte, C. (2014). Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications. Trends in Genetics 30, 439–452.
Kapusta, A., Kronenberg, Z., Lynch, V.J., Zhuo, X., Ramsay, L., Bourque, G., Yandell, M., and Feschotte, C. (2013). Transposable Elements Are Major Contributors to the Origin, Diversification, and Regulation of Vertebrate Long Noncoding RNAs. PLoS Genet. 9, e1003470.
Katsushima, K., Natsume, A., Ohka, F., Shinjo, K., Hatanaka, A., Ichimura, N., Sato, S., Takahashi, S., Kimura, H., Totoki, Y., et al. (2016). Targeting the Notch-regulated non-coding RNA TUG1 for glioma treatment. Nature Communications 7, 13616.
Kelley, D., and Rinn, J. (2012). Transposable elements reveal a stem cell-specific class of long noncoding RNAs. Genome Biol 13, R107.
Kellis, M., Wold, B., Snyder, M.P., Bernstein, B.E., Kundaje, A., Marinov, G.K., Ward, L.D., Birney, E., Crawford, G.E., Dekker, J., et al. (2014). Defining functional DNA elements in the human genome. Proc Natl Acad Sci USA 111, 6131–6138.
Kent, W.J., Zweig, A.S., Barber, G., Hinrichs, A.S., and Karolchik, D. (2010). BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207.
Khalil, A.M., Guttman, M., Huarte, M., Garber, M., Raj, A., Rivea Morales, D., Thomas, K., Presser, A., Bernstein, B.E., van Oudenaarden, A., et al. (2009). Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc Natl Acad Sci USA 106, 11667–11672.
Khong, A., Kerr, C.H., Yeung, C.H.L., Keatings, K., Nayak, A., Allan, D.W., and Jan, E. (2017). Disruption of Stress Granule Formation by the Multifunctional Cricket Paralysis Virus 1A Protein. J. Virol. 91, e01779–16.
Kim, H., Kang, K., and Kim, J. (2009). AEBP2 as a potential targeting protein for Polycomb Repression Complex PRC2. Nucleic Acids Research 37, 2940–2950.
Kim, H.J., Kim, N.C., Wang, Y.-D., Scarborough, E.A., Moore, J., Diaz, Z., MacLea, K.S., Freibaum, B., Li, S., Molliex, A., et al. (2013a). Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS. Nature 495, 467–473.
Kim, T.-G., Kraus, J.C., Chen, J., and Lee, Y. (2003). JUMONJI, a critical factor for cardiac development, functions as a transcriptional repressor. J Biol Chem 278, 42247–42255.
Kim, W., Bird, G.H., Neff, T., Guo, G., Kerenyi, M.A., Walensky, L.D., and Orkin, S.H. (2013b). Targeted disruption of the EZH2-EED complex inhibits EZH2-dependent cancer. Nat. Chem. Biol. 9, 643–650.
Kino, T., Hurt, D.E., Ichijo, T., Nader, N., and Chrousos, G.P. (2010). Noncoding RNA gas5 is a growth arrest- and starvation-associated repressor of the glucocorticoid receptor. Sci Signal 3, ra8.
Kirmizis, A., Bartley, S.M., Kuzmichev, A., Margueron, R., Reinberg, D., Green, R., and Farnham, P.J. (2004). Silencing of human polycomb target genes is associated with methylation of histone H3 Lys 27. Genes Dev 18, 1592–1605.
Kumar, A., and Wilson, S.H. (1990). Studies of the strand-annealing activity of mammalian hnRNP complex protein A1. Biochemistry 29, 10717–10722.
Kumari, D., and Usdin, K. (2014). Polycomb group complexes are recruited to reactivated FMR1 alleles in Fragile X syndrome in response to FMR1 transcription. Hum Mol Genet 23, 6575–6583.
Kundu, S., Ji, F., Sunwoo, H., Jain, G., Lee, J.T., Sadreyev, R.I., Dekker, J., and Kingston, R.E. (2017). Polycomb Repressive Complex 1 Generates Discrete Compacted Domains that Change during Differentiation. Mol Cell 65, 432–446.e435.
Kung, J.T., Kesner, B., An, J.Y., Ahn, J.Y., Cifuentes-Rojas, C., Colognori, D., Jeon, Y., Szanto, A., del Rosario, B.C., Pinter, S.F., et al. (2015). Locus-specific targeting to the X chromosome revealed by the RNA interactome of CTCF. Mol Cell 57, 361–375.
Kuzmichev, A., Nishioka, K., Erdjument-Bromage, H., Tempst, P., and Reinberg, D. (2002). Histone methyltransferase activity associated with a human multiprotein complex containing the Enhancer of Zeste protein. Genes Dev 16, 2893–2905.
la Cruz, de, C.C., Kirmizis, A., Simon, M.D., Isono, K.-I., Koseki, H., and Panning, B. (2007). The Polycomb Group Protein SUZ12 regulates histone H3 lysine 9 methylation and HP1α distribution. Chromosome Res 15, 299–314.
Le Ber, I., Van Bortel, I., Nicolas, G., Bouya-Ahmed, K., Camuzat, A., Wallon, D., De Septenville, A., Latouche, M., Lattante, S., Kabashi, E., et al. (2014). hnRNPA2B1 and hnRNPA1 mutations are rare in patients with “multisystem proteinopathy” and frontotemporal lobar degeneration phenotypes. Neurobiology of Aging 35, 934.e5–934.e6.
Le Thomas, A., Rogers, A.K., Webster, A., Marinov, G.K., Liao, S.E., Perkins, E.M., Hur, J.K., Aravin, A.A., and Toth, K.F. (2013). Piwi induces piRNA-guided transcriptional silencing and establishment of a repressive chromatin state. Genes Dev 27, 390–399.
Lee, S., Kopp, F., Chang, T.-C., Sataluri, A., Chen, B., Sivakumar, S., Yu, H., Xie, Y., and Mendell, J.T. (2016). Noncoding RNA NORAD Regulates Genomic Stability by Sequestering PUMILIO Proteins. Cell 164, 69–80.
Levine, S.S., Weiss, A., Erdjument-Bromage, H., Shao, Z., Tempst, P., and Kingston, R.E. (2002). The core of the polycomb repressive complex is compositionally and functionally conserved in flies and humans. Mol. Cell. Biol. 22, 6070–6078.
Lewis, E.B. (1978). A gene complex controlling segmentation in Drosophila. Nature 276, 565–570.
Lewis, P.H. (1947). Pc: Polycomb (Drosophila Information Service).
Lewis, P.W., Müller, M.M., Koletsky, M.S., Cordero, F., Lin, S., Banaszynski, L.A., Garcia, B.A., Muir, T.W., Becher, O.J., and Allis, C.D. (2013). Inhibition of PRC2 activity by a gain-of-function H3 mutation found in pediatric glioblastoma. Science 340, 857–861.
Li, G., Margueron, R., Ku, M., Chambon, P., and Reinberg, D. (2010). Jarid2 and PRC2, partners in regulating gene expression. Genes Dev 24, 368–380.
Li, Q., Peterson, K.R., Fang, X., and Stamatoyannopoulos, G. (2002). Locus control regions. Blood 100, 3077–3086.
Li, S., Zhang, P., Freibaum, B.D., Kim, N.C., Kolaitis, R.-M., Molliex, A., Kanagaraj, A.P., Yabe, I., Tanino, M., Tanaka, S., et al. (2016a). Genetic interaction of hnRNPA2B1 and DNAJB6 in a Drosophila model of multisystem proteinopathy. Hum Mol Genet 25, 936–950.
Li, T., Hu, J.-F., Qiu, X., Ling, J., Chen, H., Wang, S., Hou, A., Vu, T.H., and Hoffman, A.R. (2008). CTCF regulates allelic expression of Igf2 by orchestrating a promoter-polycomb repressive complex 2 intrachromosomal loop. Mol. Cell. Biol. 28, 6473–6482.
Li, W., Notani, D., Ma, Q., Tanasa, B., Nunez, E., Chen, A.Y., Merkurjev, D., Zhang, J., Ohgi, K., Song, X., et al. (2013). Functional roles of enhancer RNAs for oestrogen-dependent transcriptional activation. Nature 498, 516–520.
Li, Z., Shen, J., Chan, M.T.V., and Wu, W.K.K. (2016b). TUG1: a pivotal oncogenic long non-coding RNA of human cancers. Cell Prolif. 49, 471–475.
Li, Z., Chao, T.-C., Chang, K.-Y., Lin, N., Patil, V.S., Shimizu, C., Head, S.R., Burns, J.C., and Rana, T.M. (2014). The long noncoding RNA THRIL regulates TNFα expression through its interaction with hnRNPL. Proc Natl Acad Sci USA 111, 1002–1007.
Lin, J.Y., Shih, S.R., Pan, M., Li, C., Lue, C.F., Stollar, V., and Li, M.L. (2009). hnRNP A1 Interacts with the 5' Untranslated Regions of Enterovirus 71 and Sindbis Virus RNA and Is Required for Viral Replication. J. Virol. 83, 6106–6114.
Linder, B., Grozhik, A.V., Olarerin-George, A.O., Meydan, C., Mason, C.E., and Jaffrey, S.R. (2015). Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome. Nat Methods 12, 767–772.
Liu, F., Somarowthu, S., and Pyle, A.M. (2017a). Visualizing the secondary and tertiary architectural domains of lncRNA RepA. Nat. Chem. Biol. 13, 282–289.
Liu, N., Dai, Q., Zheng, G., He, C., Parisien, M., and Pan, T. (2015). N6-methyladenosine-dependent RNA structural switches regulate RNA-protein interactions. Nature 518, 560–564.
Liu, T.-Y., Chen, Y.-C., Jong, Y.-J., Tsai, H.-J., Lee, C.-C., Chang, Y.-S., Chang, J.-G., and Chang, Y.-F. (2017b). Muscle developmental defects in heterogeneous nuclear Ribonucleoprotein A1 knockout mice. Open Biol. 7, 160303–160312.
Lovci, M.T., Ghanem, D., Marr, H., Arnold, J., Gee, S., Parra, M., Liang, T.Y., Stark, T.J., Gehman, L.T., Hoon, S., et al. (2013). Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges. Nat Struct Mol Biol 20, 1434–1442.
Maji, S.K., Perrin, M.H., Sawaya, M.R., Jessberger, S., Vadodaria, K., Rissman, R.A., Singru, P.S., Nilsson, K.P.R., Simon, R., Schubert, D., et al. (2009). Functional amyloids as natural storage of peptide hormones in pituitary secretory granules. Science 325, 328–332.
Margueron, R., Justin, N., Ohno, K., Sharpe, M.L., Son, J., Drury III, W.J., Voigt, P., Martin, S.R., Taylor, W.R., De Marco, V., et al. (2009). Role of the polycomb protein EED in the propagation of repressive histone marks. Nature 461, 762–767.
Margueron, R., Li, G., Sarma, K., Blais, A., Zavadil, J., Woodcock, C.L., Dynlacht, B.D., and Reinberg, D. (2008). Ezh1 and Ezh2 maintain repressive chromatin through different mechanisms. Mol Cell 32, 503–518.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal 17, 10.
Martinez, F.J., Pratt, G.A., Van Nostrand, E.L., Batra, R., Huelga, S.C., Kapeli, K., Freese, P., Chun, S.J., Ling, K., Gelboin-Burkhart, C., et al. (2016). Protein-RNA Networks Regulated by Normal and ALS-Associated Mutant HNRNPA2B1 in the Nervous System. Neuron 92, 780–795.
Maximov, V., Martynenko, A., Hunsmann, G., and Tarantul, V. (2002). Mitochondrial 16S rRNA gene encodes a functional peptide, a potential drug for Alzheimer’s disease and target for cancer therapy. Medical Hypotheses 59, 670–673.
Mayeda, A., and Krainer, A.R. (1992). Regulation of alternative pre-mRNA splicing by hnRNP A1 and splicing factor SF2. Cell 68, 365–375.
Mayeda, A., Munroe, S.H., Caceres, J.F., and Krainer, A.R. (1994). Function of conserved domains of hnRNP A1 and other hnRNP A/B proteins. The EMBO Journal 13, 5483–5495.
McHugh, C.A., Chen, C.-K., Chow, A., Surka, C.F., Tran, C., McDonel, P., Pandya-Jones, A., Blanco, M., Burghard, C., Moradian, A., et al. (2015). The Xist lncRNA interacts directly with SHARP to silence transcription through HDAC3. Nature 521, 232–236.
McKay, S.J., and Cooke, H. (1992). hnRNP A2/B1 binds specifically to single stranded vertebrate telomeric repeat TTAGGGn. Nucleic Acids Research 20, 6461–6464.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297–1303.
Mendenhall, E.M., Koche, R.P., Truong, T., Zhou, V.W., Issac, B., Chi, A.S., Ku, M., and Bernstein, B.E. (2010). GC-Rich Sequence Elements Recruit PRC2 in Mammalian ES Cells. PLoS Genet. 6, e1001244.
Meredith, E.K., Balas, M.M., Sindy, K., Haislop, K., and Johnson, A.M. (2016). An RNA matchmaker protein regulates the activity of the long noncoding RNA HOTAIR. RNA 22, 995–1010.
Mi, H., Huang, X., Muruganujan, A., Tang, H., Mills, C., Kang, D., and Thomas, P.D. (2017). PANTHER version 11: expanded annotation data from Gene Ontology and Reactome pathways, and data analysis tool enhancements. Nucleic Acids Research 45, D183–D189.
Minajigi, A., Froberg, J.E., Wei, C., Sunwoo, H., Kesner, B., Colognori, D., Lessing, D., Payer, B., Boukhali, M., Haas, W., et al. (2015). A comprehensive Xist interactome reveals cohesin repulsion and an RNA-directed chromosome conformation. Science 349, aab2276–aab2276.
Mishra, R.K., Mihaly, J., Barges, S., Spierer, A., Karch, F., Hagstrom, K., Schweinsberg, S.E., and Schedl, P. (2001). The iab-7 polycomb response element maps to a nucleosome-free region of chromatin and requires both GAGA and pleiohomeotic for silencing activity. Mol. Cell. Biol. 21, 1311–1318.
Mohagheghi, F., Prudencio, M., Stuani, C., Cook, C., Jansen-West, K., Dickson, D.W., Petrucelli, L., and Buratti, E. (2016). TDP-43 functions within a network of hnRNP proteins to inhibit the production of a truncated human SORT1 receptor. Hum Mol Genet 25, 534–545.
Moore, M.J., Zhang, C., Gantman, E.C., Mele, A., Darnell, J.C., and Darnell, R.B. (2014). Mapping Argonaute and conventional RNA-binding protein interactions with RNA at single-nucleotide resolution using HITS-CLIP and CIMS analysis. Nat Protoc 9, 263–293.
Moran-Jones, K., Wayman, L., Kennedy, D.D., Reddel, R.R., Sara, S., Snee, M.J., and Smith, R. (2005). hnRNP A2, a potential ssDNA/RNA molecular adapter at the telomere. Nucleic Acids Research 33, 486–496.
Mozzetta, C., Pontis, J., Fritsch, L., Robin, P., Portoso, M., Proux, C., Margueron, R., and Ait-Si-Ali, S. (2014). The histone H3 lysine 9 methyltransferases G9a and GLP regulate polycomb repressive complex 2-mediated gene silencing. Mol Cell 53, 277–289.
Mulholland, N.M., King, I.F.G., and Kingston, R.E. (2003). Regulation of Polycomb group complexes by the sequence-specific DNA binding proteins Zeste and GAGA. Genes Dev 17, 2741–2746.
Munroe, S.H., and Dong, X.F. (1992). Heterogeneous nuclear ribonucleoprotein A1 catalyzes RNA.RNA annealing. Proc Natl Acad Sci USA 89, 895–899.
Murzina, N.V., Pei, X.-Y., Zhang, W., Sparkes, M., Vicente-Garcia, J., Pratap, J.V., McLaughlin, S.H., Ben-Shahar, T.R., Verreault, A., Luisi, B.F., et al. (2008). Structural Basis for the Recognition of Histone H4 by the Histone-Chaperone RbAp46. Structure 16, 1077–1085.
Musselman, C.A., Avvakumov, N., Watanabe, R., Abraham, C.G., Lalonde, M.-E., Hong, Z., Allen, C., Roy, S., Nuñez, J.K., Nickoloff, J., et al. (2012). Molecular basis for H3K36me3 recognition by the Tudor domain of PHF1. Nat Struct Mol Biol 19, 1266–1272.
Mysliwiec, M.R., Bresnick, E.H., and Lee, Y. (2011). Endothelial Jarid2/Jumonji Is Required for Normal Cardiac Development and Proper Notch1 Expression. Journal of Biological Chemistry 286, 17193–17204.
Necsulea, A., Soumillon, M., Warnefors, M., Liechti, A., Daish, T., Zeller, U., Baker, J.C., Grützner, F., and Kaessmann, H. (2014). The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature 505, 635–640.
Neff, T., Sinha, A.U., Kluk, M.J., Zhu, N., Khattab, M.H., Stein, L., Xie, H., Orkin, S.H., and Armstrong, S.A. (2012). Polycomb repressive complex 2 is required for MLL-AF9 leukemia. Proc Natl Acad Sci USA 109, 5028–5033.
Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman, R.E., John, S., Sandstrom, R., Johnson, A.K., et al. (2012). An expansive human regulatory lexicon encoded in transcription factor footprints. Nature 488, 83–90.
Neumann, M., Sampathu, D.M., Kwong, L.K., Truax, A.C., Micsenyi, M.C., Chou, T.T., Bruce, J., Schuck, T., Grossman, M., Clark, C.M., et al. (2006). Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis. Science 314, 130–133.
Ng, S.-Y., Bogu, G.K., Soh, B.S., and Stanton, L.W. (2013). The long noncoding RNA RMST interacts with SOX2 to regulate neurogenesis. Mol Cell 51, 349–359.
Ng, S.-Y., Johnson, R., and Stanton, L.W. (2011). Human long non-coding RNAs promote pluripotency and neuronal differentiation by association with chromatin modifiers and transcription factors. The EMBO Journal 31, 522–533.
Noordermeer, D., and de Laat, W. (2008). Joining the loops: beta-globin gene regulation. IUBMB Life 60, 824–833.
Ou, S.H., Wu, F., Harrich, D., García-Martínez, L.F., and Gaynor, R.B. (1995). Cloning and characterization of a novel cellular protein, TDP-43, that binds to human immunodeficiency virus type 1 TAR DNA sequence motifs. J. Virol. 69, 3584–3596.
Palhais, B., Dembic, M., Sabaratnam, R., Nielsen, K.S., Doktor, T.K., Bruun, G.H., and Andresen, B.S. (2016). The prevalent deep intronic c. 639+919 G>A GLA mutation causes pseudoexon activation and Fabry disease by abolishing the binding of hnRNPA1 and hnRNP A2/B1 to a splicing silencer. Molecular Genetics and Metabolism 119, 258–269.
Pandey, R.R., Mondal, T., Mohammad, F., Enroth, S., Redrup, L., Komorowski, J., Nagano, T., Mancini-DiNardo, D., and Kanduri, C. (2008). Kcnq1ot1 antisense noncoding RNA mediates lineage-specific transcriptional silencing through chromatin-level regulation. Mol Cell 32, 232–246.
Pasini, D., Cloos, P.A.C., Walfridsson, J., Olsson, L., Bukowski, J.-P., Johansen, J.V., Bak, M., Tommerup, N., Rappsilber, J., and Helin, K. (2010). JARID2 regulates binding of the Polycomb repressive complex 2 to target genes in ES cells. Nature 464, 306–310.
Patil, D.P., Chen, C.-K., Pickering, B.F., Chow, A., Jackson, C., Guttman, M., and Jaffrey, S.R. (2016). m(6)A RNA methylation promotes XIST-mediated transcriptional repression. Nature 537, 369–373.
Peng, J.C., Valouev, A., Swigut, T., Zhang, J., Zhao, Y., Sidow, A., and Wysocka, J. (2009). Jarid2/Jumonji Coordinates Control of PRC2 Enzymatic Activity and Target Gene Occupancy in Pluripotent Cells. Cell 139, 1290–1302.
Penzkofer, T. (2004). L1Base: from functional annotation to prediction of active LINE-1 elements. Nucleic Acids Research 33, D498–D500.
Polymenidou, M., Lagier-Tourenne, C., Hutt, K.R., Huelga, S.C., Moran, J., Liang, T.Y., Ling, S.-C., Sun, E., Wancewicz, E., Mazur, C., et al. (2011). Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43. Nature Neuroscience 14, 459–468.
Qi, W., Zhao, K., Gu, J., Huang, Y., Wang, Y., Zhang, H., Zhang, M., Zhang, J., Yu, Z., Li, L., et al. (2017). An allosteric PRC2 inhibitor targeting the H3K27me3 binding pocket of EED. Nat. Chem. Biol. 13, 381–388.
Quinn, J.J., Ilik, I.A., Qu, K., Georgiev, P., Chu, C., Akhtar, A., and Chang, H.Y. (2014). Revealing long noncoding RNA architecture and functions using domain-specific chromatin isolation by RNA purification. Nature Biotechnology 32, 933–940.
Ramos, A., Barbena, E., Mateiu, L., del Mar González, M., Mairal, Q., Lima, M., Montiel, R., Aluja, M.P., and Santos, C. (2011). Nuclear insertions of mitochondrial origin: Database updating and usefulness in cancer studies. Mitochondrion 11, 946–953.
Rinn, J.L., Kertesz, M., Wang, J.K., Squazzo, S.L., Xu, X., Brugmann, S.A., Goodnough, L.H., Helms, J.A., Farnham, P.J., Segal, E., et al. (2007). Functional Demarcation of Active and Silent Chromatin Domains in Human HOX Loci by Noncoding RNAs. Cell 129, 1311–1323.
Romano, M., Buratti, E., Romano, G., Klima, R., Del Bel Belluz, L., Stuani, C., Baralle, F., and Feiguin, F. (2014). Evolutionarily conserved heterogeneous nuclear ribonucleoprotein (hnRNP) A/B proteins functionally interact with human and Drosophila TAR DNA-binding protein 43 (TDP-43). Journal of Biological Chemistry 289, 7121–7130.
Rothbart, S.B., Dickson, B.M., Raab, J.R., Grzybowski, A.T., Krajewski, K., Guo, A.H., Shanle, E.K., Josefowicz, S.Z., Fuchs, S.M., Allis, C.D., et al. (2015). An Interactive Database for the Assessment of Histone Antibody Specificity. Mol Cell 59, 502–511.
Sadic, D., Schmidt, K., Groh, S., Kondofersky, I., Ellwart, J., Fuchs, C., Theis, F.J., and Schotta, G. (2015). Atrx promotes heterochromatin formation at retrotransposons. EMBO Rep. 16, 836–850.
Sanulli, S., Justin, N., Teissandier, A., Ancelin, K., Portoso, M., Caron, M., Michaud, A., Lombard, B., da Rocha, S.T., Offer, J., et al. (2015). Jarid2 Methylation via the PRC2 Complex Regulates H3K27me3 Deposition during Cell Differentiation. Mol Cell 57, 769–783.
Sarma, K., Cifuentes-Rojas, C., Ergun, A., del Rosario, A., Jeon, Y., White, F., Sadreyev, R., and Lee, J.T. (2014). ATRX Directs Binding of PRC2 to Xist RNA and Polycomb Targets. Cell 159, 869–883.
Schimmelmann, von, M., Feinberg, P.A., Sullivan, J.M., Ku, S.M., Badimon, A., Duff, M.K., Wang, Z., Lachmann, A., Dewell, S., Ma'ayan, A., et al. (2016). Polycomb repressive complex 2 (PRC2) silences genes responsible for neurodegeneration. Nature Neuroscience 19, 1321–1330.
Schmitges, F.W., Prusty, A.B., Faty, M., Stützer, A., Lingaraju, G.M., Aiwazian, J., Sack, R., Hess, D., Li, L., Zhou, S., et al. (2011). Histone methylation by PRC2 is inhibited by active chromatin marks. Mol Cell 42, 330–341.
Schoeftner, S., Sengupta, A.K., Kubicek, S., Mechtler, K., Spahn, L., Koseki, H., Jenuwein, T., and Wutz, A. (2006). Recruitment of PRC1 function at the initiation of X inactivation independent of PRC2 and silencing. The EMBO Journal 25, 3110–3122.
Schwendemann, A., and Lehmann, M. (2002). Pipsqueak and GAGA factor act in concert as partners at homeotic and many other loci. Proc Natl Acad Sci USA 99, 12883–12888.
Seong, I.S., Woda, J.M., Song, J.J., Lloret, A., Abeyrathne, P.D., Woo, C.J., Gregory, G., Lee, J.-M., Wheeler, V.C., Walz, T., et al. (2010). Huntingtin facilitates polycomb repressive complex 2. Hum Mol Genet 19, 573–583.
Shamoo, Y., Krueger, U., Rice, L.M., Williams, K.R., and Steitz, T.A. (1997). Crystal structure of the two RNA binding domains of human hnRNP A1 at 1.75 Å resolution. Nat. Struct. Biol. 4, 215–222.
Shen, X., Kim, W., Fujiwara, Y., Simon, M.D., Liu, Y., Mysliwiec, M.R., Lee, Y., and Orkin, S.H. (2009). Jumonji Modulates Polycomb Activity and Self-Renewal versus Differentiation of Stem Cells. Cell 139, 1303–1314.
Shen, X., Liu, Y., Hsu, Y.-J., Fujiwara, Y., Kim, J., Mao, X., Yuan, G.-C., and Orkin, S.H. (2008). EZH1 mediates methylation on histone H3 lysine 27 and complements EZH2 in maintaining stem cell identity and executing pluripotency. Mol Cell 32, 491–502.
Simon, M.D., Pinter, S.F., Fang, R., Sarma, K., Rutenberg-Schoenberg, M., Bowman, S.K., Kesner, B.A., Maier, V.K., Kingston, R.E., and Lee, J.T. (2014). High-resolution Xist binding maps reveal two-step spreading during X-chromosome inactivation. Nature 504, 465–469.
Simon, M.D., Wang, C.I., Kharchenko, P.V., West, J.A., Chapman, B.A., Alekseyenko, A.A., Borowsky, M.L., Kuroda, M.I., and Kingston, R.E. (2011). The genomic binding sites of a noncoding RNA. Proc Natl Acad Sci USA 108, 20497–20502.
Smit, A., Hubley, R., and Green, P. RepeatMasker 4.0. http://www.repeatmasker.org.
Somarowthu, S., Legiewicz, M., Chillón, I., Marcia, M., Liu, F., and Pyle, A.M. (2015). HOTAIR Forms an Intricate and Modular Secondary Structure. Mol Cell 58, 353–361.
Son, J., Shen, S.S., Margueron, R., and Reinberg, D. (2013). Nucleosome-binding activities within JARID2 and EZH1 regulate the function of PRC2 on chromatin. Genes Dev 27, 2663–2677.
Soule, H.D., Maloney, T.M., Wolman, S.R., Peterson, W.D., Brenz, R., McGrath, C.M., Russo, J., Pauley, R.J., Jones, R.F., and Brooks, S.C. (1990). Isolation and characterization of a spontaneously immortalized human breast epithelial cell line, MCF-10. Cancer Res. 50, 6075–6086.
Steiner, G., Hartmuth, K., Skriner, K., Maurer-Fogy, I., Sinski, A., Thalmann, E., Hassfeld, W., Barta, A., and Smolen, J.S. (1992). Purification and partial sequencing of the nuclear autoantigen RA33 shows that it is indistinguishable from the A2 protein of the heterogeneous nuclear ribonucleoprotein complex. J. Clin. Invest. 90, 1061–1066.
Sun, M., Gadad, S.S., Kim, D.-S., and Kraus, W.L. (2015). Discovery, Annotation, and Functional Analysis of Long Noncoding RNAs Controlling Cell-Cycle Gene Expression and Proliferation in Breast Cancer Cells. Mol Cell 59, 698–711.
Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., et al. (2012). The accessible chromatin landscape of the human genome. Nature 489, 75–82.
Tichon, A., Gil, N., Lubelsky, Y., Solomon, T.H., Lemze, D., Itzkovitz, S., Stern-Ginossar, N., and Ulitsky, I. (2016). A conserved abundant cytoplasmic long noncoding RNA modulates repression by Pumilio proteins in human cells. Nature Communications 7, 1–10.
Tilgner, H., Knowles, D.G., Johnson, R., Davis, C.A., Chakrabortty, S., Djebali, S., Curado, J., Snyder, M., Gingeras, T.R., and Guigo, R. (2012). Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome Res 22, 1616–1625.
Tollervey, J.R., Curk, T., Rogelj, B., Briese, M., Cereda, M., Kayikci, M., König, J., Hortobágyi, T., Nishimura, A.L., Župunski, V., et al. (2011). Characterizing the RNA targets and position-dependent splicing regulation by TDP-43. Nature Neuroscience 14, 452–458.
Tomoum, H.Y., Mostafa, G.A., and El-Shahat, E.M.F. (2009). Autoantibody to heterogeneous nuclear ribonucleoprotein-A2 (RA33) in juvenile idiopathic arthritis: Clinical significance. Pediatrics International 51, 188–192.
Trapnell, C., Pachter, L., and Salzberg, S.L. (2009). TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111.
Tsai, M.C., Manor, O., Wan, Y., Mosammaparast, N., Wang, J.K., Lan, F., Shi, Y., Segal, E., and Chang, H.Y. (2010). Long Noncoding RNA as Modular Scaffold of Histone Modification Complexes. Science 329, 689–693.
Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., and Rozen, S.G. (2012). Primer3--new capabilities and interfaces. Nucleic Acids Research 40, e115.
Van Nostrand, E.L., Pratt, G.A., Shishkin, A.A., Gelboin-Burkhart, C., Fang, M.Y., Sundararaman, B., Blue, S.M., Nguyen, T.B., Surka, C., Elkins, K., et al. (2016). Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP). Nat Methods 13, 508–514.
Villegas, V.E., and Zaphiropoulos, P.G. (2015). Neighboring gene regulation by antisense long non-coding RNAs. Int J Mol Sci 16, 3251–3266.
Wang, F., Tang, M., Zeng, Z., Wu, R., and Xue, Y. (2012). Telomere- and telomerase-interacting protein that unfolds telomere G-quadruplex and promotes telomere extension in mammalian cells.
Wang, X., Lu, Z., Gomez, A., Hon, G.C., Yue, Y., Han, D., Fu, Y., Parisien, M., Dai, Q., Jia, G., et al. (2014). N6-methyladenosine-dependent regulation of messenger RNA stability. Nature 505, 117–120.
Wang, X., Goodrich, K.J., Gooding, A.R., Naeem, H., Archer, S., Paucek, R.D., Youmans, D.T., Cech, T.R., and Davidovich, C. (2017). Targeting of Polycomb Repressive Complex 2 to RNA by Short Repeats of Consecutive Guanines. Mol Cell 65, 1056–1067.e5.
Washietl, S., Kellis, M., and Garber, M. (2014). Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals. Genome Res 24, 616–628.
West, J.A., Davis, C.P., Sunwoo, H., Simon, M.D., Sadreyev, R.I., Wang, P.I., Tolstorukov, M.Y., and Kingston, R.E. (2014). The long noncoding RNAs NEAT1 and MALAT1 bind active chromatin sites. Mol Cell 55, 791–802.
Wright, D.K., Liu, S., van der Poel, C., McDonald, S.J., Brady, R.D., Taylor, L., Yang, L., Gardner, A.J., Ordidge, R., O'Brien, T.J., et al. (2016). Traumatic Brain Injury Results in Cellular, Structural and Functional Changes Resembling Motor Neuron Disease. Cereb. Cortex.
Wu, L., Murat, P., Matak-Vinkovic, D., Murrell, A., and Balasubramanian, S. (2013). Binding interactions between long noncoding RNA HOTAIR and PRC2 proteins. Biochemistry 52, 9519–9527.
Wysocka, J., Reilly, P.T., and Herr, W. (2001). Loss of HCF-1-chromatin association precedes temperature-induced growth arrest of tsBN67 cells. Mol. Cell. Biol. 21, 3820–3829.
Xie, H., Xu, J., Hsu, J.H., Nguyen, M., Fujiwara, Y., Peng, C., and Orkin, S.H. (2014). Polycomb repressive complex 2 regulates normal hematopoietic stem cell function in a developmental-stage-specific manner. Cell Stem Cell 14, 68–80.
Xu, J., Shao, Z., Li, D., Xie, H., Kim, W., Huang, J., Taylor, J.E., Pinello, L., Glass, K., Jaffe, J.D., et al. (2015). Developmental control of polycomb subunit composition by GATA factors mediates a switch to non-canonical functions. Mol Cell 57, 304–316.
Xu, R.M., Jokhan, L., Cheng, X., Mayeda, A., and Krainer, A.R. (1997). Crystal structure of human UP1, the domain of hnRNP A1 that contains two RNA-recognition motifs. Structure 5, 559–570.
Yang, N., and Kazazian, H.H. (2006). L1 retrotransposition is suppressed by endogenously encoded small interfering RNAs in human cultured cells. Nat Struct Mol Biol 13, 763–771.
Yang, Y.W., Flynn, R.A., Chen, Y., Qu, K., Wan, B., Wang, K.C., Lei, M., and Chang, H.Y. (2014). Essential role of lncRNA binding for WDR5 maintenance of active chromatin and embryonic stem cell pluripotency. Elife 3, e02046.
Yen, Z.C., Meyer, I.M., Karalic, S., and Brown, C.J. (2007). A cross-species comparison of X-chromosome inactivation in Eutheria. Genomics 90, 453–463.
Yildirim, E., Kirby, J.E., Brown, D.E., Mercier, F.E., Sadreyev, R.I., Scadden, D.T., and Lee, J.T. (2013). Xist RNA is a potent suppressor of hematologic cancer in mice. Cell 152, 727–742.
Yu, Y., Lv, F., Liang, D., Yang, Q., Zhang, B., Lin, H., Wang, X., Qian, G., Xu, J., and You, W. (2017). HOTAIR may regulate proliferation, apoptosis, migration and invasion of MCF-7 cells through regulating the P53/Akt/JNK signaling pathway. Biomed. Pharmacother. 90, 555–561.
Zeng, P.-Y., Vakoc, C.R., Chen, Z.-C., Blobel, G.A., and Berger, S.L. (2006). In vivo dual cross-linking for identification of indirect DNA-associated proteins by chromatin immunoprecipitation. BioTechniques 41, 694, 696, 698.
Zhang, H., Zeitz, M.J., Wang, H., Niu, B., Ge, S., Li, W., Cui, J., Wang, G., Qian, G., Higgins, M.J., et al. (2014a). Long noncoding RNA-mediated intrachromosomal interactions promote imprinting at the Kcnq1 locus. The Journal of Cell Biology 204, 61–75.
Zhang, M., Wang, Y., Jones, S., Sausen, M., McMahon, K., Sharma, R., Wang, Q., Belzberg, A.J., Chaichana, K., Gallia, G.L., et al. (2014b). Somatic mutations of SUZ12 in malignant peripheral nerve sheath tumors. Nat. Genet. 46, 1170–1172.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum, C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol 9, R137.
Zhao, J., Ohsumi, T.K., Kung, J.T., Ogawa, Y., Grau, D.J., Sarma, K., Song, J.J., Kingston, R.E., Borowsky, M., and Lee, J.T. (2010). Genome-wide identification of polycomb-associated RNAs by RIP-seq. Mol Cell 40, 939–953.
Zhao, J., Sun, B.K., Erwin, J.A., Song, J.J., and Lee, J.T. (2008). Polycomb proteins targeted by a short repeat RNA to the mouse X chromosome. Science 322, 750–756.
Ørom, U.A., Derrien, T., Beringer, M., Gumireddy, K., Gardini, A., Bussotti, G., Lai, F., Zytnicki, M., Notredame, C., Huang, Q., et al. (2010). Long noncoding RNAs with enhancer-like function in human cells. Cell 143, 46–58.