Genomic Relationship between Small RNAs and Histone Modifications Revealed by Deep Sequencing of Human Embryonic Stem Cells

Loyal A. Goff1,2, Ahmad Khalil1,3, Mavis Swerdel4, Jennifer Moore4, Ronald P. Hart4, John L. Rinn1,3, Manolis Kellis1,2

1. The Broad Institute, MIT, Cambridge, MA.
2. Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA.
3. Division of Experimental Pathology, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, MA.
4. W.M. Keck Center for Collaborative Neuroscience, Rutgers University, Piscataway, NJ.

Abstract

New sequencing technologies have enabled probing of the cellular state in ever increasing detail. They have revealed many new families of small RNAs and many new members of existing families, such as microRNAs, piRNAs, and rasiRNAs. However, the function of a large fraction of the small RNAs found in such sequencing experiments remains unknown, because they do not fit any of these functional categories. To understand potential new functions in chromatin regulation, we used SOLiD sequencing of small RNAs (18-28 nucleotides) in differentiated and undifferentiated human ES cells and compared these to previously published ChIP-Seq data for hES histone modifications. The comparison revealed a striking relationship between the two. Small RNAs with previously unknown functions align to the center of peaks for histone modifications, suggesting that endogenous small RNAs may nucleate epigenetic changes. Moreover, the association showed distinct relationships with separate marks. It was significantly reduced when smRNAs from differentiated hES cells were compared with histone modifications of undifferentiated cells, suggesting that these relationships are likely to be cell-type specific and modification specific. Several recent studies have suggested a functional relationship between histone modifications and the RNAi pathway. This is the first genome-wide evidence of a likely global role for small RNAs in the establishment and maintenance of histone modification states. If smRNAs do in fact direct epigenetic modifications, this association may provide new understanding of epigenetic changes during differentiation. It may also suggest new therapeutic means for diseases associated with chromatin dysregulation, such as cancer.

1. Hypothesis: Endogenous smRNAs direct histone-modifying complexes to target genomic regions

Epigenetic modifications are inheritable alterations to the human genome. They take the form of enzymatic manipulations of key histone residues or the methylation of specific cytosines in the DNA sequence. These modifications are responsible for chromatin organization and, ultimately, for regulating the transcriptional availability of neighboring genetic elements. Aberrant epigenetic regulation can affect cell proliferation and tissue transformation, a hallmark of cancer. We are interested in understanding the mechanisms by which epigenetic modifications are targeted to specific genomic regions. We hypothesize that endogenous small RNAs interact directly with specific epigenome-modifying complexes. We further propose that they direct these complexes to target DNA sequences to effect specific changes in chromatin organization and transcriptional availability.

We have sequenced the endogenous small RNA population of selected human embryonic stem cells and their differentiated progeny. This model system is characterized by dramatic genome-wide changes in epigenetic profiles. Exploring these small RNAs alongside previously published epigenomic data for key histone modifications has identified an association between these two previously independent pathways. To test this hypothesis further, we have sequenced the population of smRNAs that co-immunoprecipitate (RIP-Seq) with Suz12, a component of Polycomb Repressive Complex 2 (PRC2).

[Figure: proposed model showing Pol II, chromatin, a target RNA (antisense), an smRNA bound to Ago2, and a histone-modifying complex.]
A model for smRNA-directed histone modification. An endogenous smRNA is bound in complex with an Argonaute protein (e.g. Ago2). This primed complex can then target a nascent RNA sequence, such as a promoter-associated RNA (pasRNA), an antisense RNA, or a nascent mRNA. It then recruits a histone-modifying complex to the target genomic location via protein-protein interactions.

2. Existing Evidence: Implication of the RNAi pathway in mammalian epigenetic regulation

Recent studies have suggested a link between specific epigenetic mechanisms and various components of the small RNA-specific RNA interference (RNAi) pathway.
• Exogenous siRNAs directed against a promoter region can silence transcription of the downstream gene by inducing heterochromatin formation in mammals (Kim, et al., 2006; Morris, Chan, Jacobsen, & Looney, 2004; Weinberg, et al., 2006).
• Promoter-targeting siRNAs can induce methylation of H3K27, a repressive histone mark (Weinberg, et al., 2006).
• Endogenous microRNAs have also been shown to affect transcriptional gene silencing (TGS) by targeting promoter-associated RNAs. In some cases these RNAs are required for proper transcription of downstream genes (Kim, Saetrom, Snove, & Rossi, 2008).

Promoter-encoded microRNAs and exogenous siRNAs can influence the epigenetic landscape in a sequence-specific manner. Could endogenous smRNAs use a similar mechanism? We propose a model in which smRNAs, through association with an Argonaute (Ago) complex, target nascent RNA transcripts. In doing so, they recruit selected epigenome-modifying complexes to target genomic locations.

3a. Experimental Setup: Sequencing of hESC smRNAs

• In a previous effort to identify novel microRNA sequences (in prep.), we sequenced the smRNA population of the H1 human embryonic stem cell (hESC) line in both undifferentiated and differentiated states.
• Cultures were maintained in a chemically defined medium supplemented with the growth factor bFGF. Cells are passaged every 7 days so that the resulting cultures are 80-90% confluent on the day of passage. In their undifferentiated state, all lines express Oct4 and SSEA4, commonly used markers of pluripotency (not shown).
• Cells destined for differentiation were pre-conditioned with Neural Induction Media (NIM) with N2 supplement for 2 days. Cells are then passaged, and after the 5th day the media is changed to Neural Precursor Media (NPM) containing B27, N2 and bFGF. After 2-4 days in NPM, cells take on the flattened, bipolar morphology typical of neural stem cells (NSC). They are classified as such by marker expression (Nestin+, Sox2+, CD133+, Oct4-).
• Total RNA was isolated from the undifferentiated (H1-Undiff) and differentiated (H1-Diff) cells and used as input for the SOLiD Small RNA Expression Kit (Applied Biosystems).

3b. Sequencing Results: Many unannotated regions express smRNAs in hESC

• ~6.0 × 10^7 reads were obtained for each cellular condition. Reads were aligned to the reference genome (hg18) using the SHRiMP algorithm, a fast k-mer search with a Smith-Waterman extension (http://compbio.cs.toronto.edu/shrimp). Intervals overlapping repeat sequence and/or known small RNA content were removed.

[Figure: (A) Size distribution of aligned small RNA regions, density vs. length (nt). (B) Number of non-redundant loci derived from H1-Undiff smRNAs vs. number of mismatches allowed (0-5), by annotation: Unannotated, RefSeq - Sense, RefSeq - Antisense, Known smRNA.]
A) Size distribution of genomic intervals derived from collapsed short RNA reads from the H1-Undiff and H1-Diff smRNA libraries. Consistent with the size selection, most genomic intervals are ~17-28 nt long. B) Annotation overlap between smRNAs sequenced in H1-Undiff, known smRNAs, and RefSeq protein-coding genes. The vast majority of known smRNAs, including miRNAs, scaRNAs and snoRNAs, are identified through perfect alignment to the genome.

3c. Link to chromatin modifications: hESC smRNAs align to peaks of histone modifications

• To determine the relationship between our observed smRNAs and selected histone modifications, we used a previously published dataset from a similar cell type. Ku, et al. (2008) used ChIP-Seq to identify regions of the human genome associated with histone methylations (H3K4me3, H3K27me3, and H3K36me3). They also mapped genomic occupancy by EZH2 and RING1B, members of the PRC2 and PRC1 complexes respectively.
• We assessed the genome-wide relationship between our sequenced endogenous smRNAs from H1-Undiff and H1-Diff cells and the histone modification data. For each smRNA in our dataset, we computed flanking sum densities of H3K4me3, H3K27me3, H3K36me3, EZH2, and RING1B reads from undifferentiated H9 hESC. These densities cover a 1 kb region upstream and downstream of the aligned smRNA.

Results:
• Comparing H1-Undiff smRNAs with H9-Undiff histone modifications reveals a clear genomic relationship: the smRNA is located at the peak of various histone modifications (A, solid lines).
• This relationship is weaker when the H1-Diff smRNAs are compared with the H9-Undiff histone modifications (A, dashed lines). This suggests that a cell-type specific relationship exists between the two pathways.
• A similar relationship is observed between the smRNAs and the enzymatic complexes associated with polycomb repression (B).
• No association is observed for random genomic regions (not shown).

[Figure: tag density (relative to min) vs. position relative to small RNAs (-1,000 to +1,000 bp). (A) H3K4me3, H3K27me3, H3K36me3. (B) EZH2, RING1B. Each panel shows Undiff and Diff smRNAs.]
Flanking sum densities of reads from histone methylations (A) and Polycomb complex groups (B) in undifferentiated H9 hESC, surrounding endogenous smRNAs from H1-Undiff (solid lines) and H1-Diff (dashed lines). smRNAs show a clear tendency to be centered in regions of epigenetic modifications. The association decreases across divergent cell types, suggesting a cell-type specific interaction between smRNAs and histone modifications.

4a. Mechanistic Relationship: RIP-Seq of Suz12 to identify PRC2-associated smRNAs

To determine whether smRNAs are associated with specific epigenome-modifying complexes, we performed RIP-Seq to identify Suz12-associated smRNA sequences (Figure 4). As a proof of principle, we performed the RIPs in HeLa cells. As a control, we pulled down any smRNAs that associated with an IgG isotype antibody. Immunoprecipitated RNA was used as input for SOLiD sequencing of short (~18-28 nt) RNAs.

[Figure: RIP-Seq workflow: native ribonucleoprotein complexes, RNA immunoprecipitation, RNA isolation, sequencing library preparation.]
Overview of the RIP-Seq method for identifying RNAs associated with a specific protein complex. After immunoprecipitation and isolation of the IP RNA, the RNA is used as input material for the Small RNA Expression Kit (Applied Biosystems). The resulting libraries (Suz12+ and IgG control) were sequenced on the SOLiD platform, and the data were analyzed using a custom pipeline.

4b. RIP Results: ~18-28 nt smRNAs are present in Suz12 RNA Immunoprecipitation

smRNAs are selectively amplified during SOLiD library preparation. Adapter sequences are ligated directly onto the 5' and 3' ends of any smRNA with a 5'-P and 3'-OH, the ends expected from RNase III cleavage products. PCR amplification with oligos designed against the adapter sequences shows enrichment of smRNA sequences in the Suz12 library relative to the IgG control. The smRNAs are then size-selected by gel excision and ligated to beads for emulsion PCR (ePCR) and sequencing.

4c. RIP-Seq results: Suz12-enriched smRNAs align to known and novel ncRNAs

• We generated ~3 × 10^7 25-nt reads from each of the Suz12 IP and IgG control samples. Reads were aligned uniquely to the genome using the maq aligner.
• 348,327 Suz12-associated reads aligned to the reference human genome with 0 mismatches.
• These reads correspond to 34,237 distinct genomic intervals after collapsing overlapping reads.
• 20,002 intervals remain after RepeatMasker filtering and removal of any overlap with the IgG control.

Known annotation content of Suz12-associated smRNAs enriched relative to the IgG control RIP:
  Known miRNA                        63
  lincRNAs                           115 / 1505
  Within RefSeq                      13,897 (52.4% of reads), within 9,126 RefSeq mRNAs
  1 kb upstream of RefSeq TSS        2,179
  200 b downstream of RefSeq TSS     853

5a. Recovered smRNAs are specific to PRC2-bound regions

[Figure: (A) Flanking sum density of EZH2. (B) Flanking sum density of RING1B. Number of reads vs. position relative to smRNA (-10,000 to +10,000 bp).]
Flanking sum densities of H9 hESC ChIP-Seq reads for EZH2 (A) and RING1B (B) surrounding the alignment positions of smRNAs associated with Suz12 in HeLa cells. Despite the dramatic differences in cell type, there still appears to be a genomic relationship between the smRNAs associated with PRC2 and the binding of EZH2, a PRC2 component. This relationship is not observed for the PRC1 component RING1B.

• Suz12-associated and Suz12-enriched smRNAs from HeLa cells appear to overlap specifically with regions associated with EZH2 (PRC2), and not with RING1B (PRC1), in H9 cells.
• Despite the dramatically different cell types, a genomic relationship remains between Suz12-associated smRNAs and EZH2 genomic occupancy. This suggests that our assay is specific for PRC2-related smRNAs.

5b. Example: A miRNA with known transcriptional silencing activity is associated with Suz12

[Figure: genome browser view of the POLR3D locus and hsa-mir-320a. Tracks: hESC H3K4me3, H3K27me3, EZH2, RING1B and H3K36me3 (scale 0-12); Suz12-IP associated small RNAs (RepeatMasker filtered); CpG islands (islands < 300 bases in light green); Vertebrate Multiz Alignment & Conservation (44 species); Mammal Cons.]

Known smRNA content is enriched in the Suz12-associated smRNAs, including 41 known microRNAs (Table). One example is miR-320, which is encoded in the promoter of the POLR3D gene. miR-320 has previously been shown to induce transcriptional gene silencing of the downstream gene and to direct the association of Ago1, EZH2, and H3K27me3 with the POLR3D promoter (Kim, et al., 2008).

6. Future Directions

• Continue RIP-Seq on additional epigenome-modifying complexes (RING1B, G9a, MLLs, etc.).
• Use a modified CLIP-Seq assay to identify target RNAs.
• Clone and validate high-confidence Suz12-associated smRNA target sites.
• Perform functional over-expression and inhibition of Suz12-associated smRNAs to determine sequence-specific effects on target genomic regions.
• Perform site-directed mutagenesis of target regions to confirm smRNA specificity.
• Use EMSA and supershift assays to assess direct interactions between smRNAs and epigenome-modifying complexes.

Conclusions

• Observed smRNA sequences align to the centers of peaks of specific histone modifications in human embryonic stem cells.
• smRNAs are present in a Suz12 RNA immunoprecipitation.
• 20,002 genomic loci are enriched for Suz12-associated smRNAs relative to IgG background.
• Suz12-associated smRNAs contain known microRNAs and intersect with lincRNAs.
• A miRNA with a demonstrated role in transcriptional gene silencing and PRC2 recruitment is enriched in the Suz12 RIP.
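Sections 3b and 4c describe collapsing overlapping aligned reads into distinct genomic intervals (for example, 348,327 reads into 34,237 intervals) and plotting the interval size distribution. The Python sketch below shows that step. It assumes each aligned read is a (chromosome, start, end) tuple in 0-based, half-open coordinates and ignores strand. The names collapse_reads and length_distribution are hypothetical and are not taken from the authors' custom pipeline.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open; strand ignored.
Interval = Tuple[str, int, int]


def collapse_reads(reads: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping aligned reads into distinct genomic intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(reads):
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Read overlaps the open interval on the same chromosome: extend it.
            _, open_start, open_end = merged[-1]
            merged[-1] = (chrom, open_start, max(open_end, end))
        else:
            # No overlap (reads that merely touch stay separate): start a new interval.
            merged.append((chrom, start, end))
    return merged


def length_distribution(intervals: Iterable[Interval]) -> Counter:
    """Count collapsed intervals by length in nt."""
    return Counter(end - start for _, start, end in intervals)


reads = [("chr1", 105, 125), ("chr1", 100, 122), ("chr1", 300, 321), ("chr2", 100, 118)]
intervals = collapse_reads(reads)
print(intervals)  # [('chr1', 100, 125), ('chr1', 300, 321), ('chr2', 100, 118)]
print(length_distribution(intervals))  # Counter({25: 1, 21: 1, 18: 1})
```

Sorting once and sweeping left to right keeps the merge linear after the sort. A strand-aware variant would add strand to the tuple and to the comparison.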
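The filtering steps in sections 3b and 4c remove intervals that overlap repeat sequence, known small RNA annotation, or the IgG control library. One way to sketch this is to merge the masks per chromosome and test each interval with a binary search. The sketch assumes the masks are already available as (chromosome, start, end) tuples and that a single shared base counts as overlap. The helpers build_mask and remove_overlapping are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]
MaskIndex = Dict[str, Tuple[List[int], List[int]]]


def build_mask(masks: Iterable[Interval]) -> MaskIndex:
    """Merge mask intervals per chromosome into sorted, non-overlapping runs."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in masks:
        by_chrom[chrom].append((start, end))
    index: MaskIndex = {}
    for chrom, spans in by_chrom.items():
        starts: List[int] = []
        ends: List[int] = []
        for start, end in sorted(spans):
            if ends and start <= ends[-1]:
                ends[-1] = max(ends[-1], end)  # overlapping or touching: extend the run
            else:
                starts.append(start)
                ends.append(end)
        index[chrom] = (starts, ends)
    return index


def remove_overlapping(intervals: Iterable[Interval], mask: MaskIndex) -> List[Interval]:
    """Keep only intervals that share no base with any masked run."""
    kept: List[Interval] = []
    for chrom, start, end in intervals:
        starts, ends = mask.get(chrom, ([], []))
        # Index of the last masked run that starts before this interval ends.
        i = bisect.bisect_left(starts, end) - 1
        # Runs are disjoint and sorted, so run i has the largest end of all candidates.
        if i >= 0 and ends[i] > start:
            continue
        kept.append((chrom, start, end))
    return kept


# Assumption: the mask would combine RepeatMasker, known smRNA and IgG-control intervals.
mask = build_mask([("chr1", 100, 200), ("chr1", 150, 260), ("chr1", 500, 520)])
candidates = [("chr1", 50, 100), ("chr1", 240, 270), ("chr1", 300, 320), ("chr2", 100, 120)]
print(remove_overlapping(candidates, mask))
# [('chr1', 50, 100), ('chr1', 300, 320), ('chr2', 100, 120)]
```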
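Panel B of section 3b assigns loci to four annotation classes: Unannotated, RefSeq - Sense, RefSeq - Antisense and Known smRNA. The sketch below shows one possible rule set, not the authors' method. The precedence (known smRNA first, then sense, then antisense) and the strand-agnostic known-smRNA check are assumptions, and classify is a hypothetical name. It uses a linear scan for clarity rather than an indexed lookup.

```python
from typing import Iterable, Tuple

# Hypothetical representation: (chromosome, start, end, strand), 0-based half-open.
Stranded = Tuple[str, int, int, str]


def overlaps(a: Stranded, b: Stranded) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify(interval: Stranded,
             known_smrnas: Iterable[Stranded],
             refseq_genes: Iterable[Stranded]) -> str:
    """Assign one annotation category using an assumed precedence order."""
    # Assumption: any overlap with a known smRNA wins, regardless of strand.
    if any(overlaps(interval, k) for k in known_smrnas):
        return "Known smRNA"
    # Strands of all RefSeq genes that overlap this interval.
    gene_strands = {g[3] for g in refseq_genes if overlaps(interval, g)}
    if interval[3] in gene_strands:
        return "RefSeq - Sense"
    if gene_strands:
        return "RefSeq - Antisense"
    return "Unannotated"


known = [("chr1", 1000, 1022, "+")]
genes = [("chr1", 5000, 9000, "+"), ("chr2", 100, 900, "-")]
print(classify(("chr1", 6000, 6021, "-"), known, genes))  # RefSeq - Antisense
```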
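Section 3c computes, for each smRNA, the flanking sum density of ChIP-Seq reads within 1 kb upstream and downstream, and plots it relative to the minimum. The sketch below is one reading of that description, not the authors' code. It rests on several assumptions:
• each smRNA is reduced to one anchor coordinate and each ChIP-Seq read to one coordinate;
• strand is ignored;
• reads are pooled in fixed 50-bp bins;
• "relative to min" means division by the smallest bin.
flanking_sum_density is a hypothetical name.

```python
import bisect
from typing import Dict, List, Sequence


def flanking_sum_density(
    anchors: Dict[str, Sequence[int]],
    reads: Dict[str, Sequence[int]],
    window: int = 1000,
    bin_size: int = 50,
) -> List[float]:
    """Pool ChIP-Seq reads by offset from every smRNA anchor within +/- window.

    Assumptions (not from the source): one coordinate per smRNA and per read,
    strand ignored, fixed-width bins, and values divided by the smallest bin.
    If the smallest bin is empty, raw counts are returned instead.
    """
    n_bins = -(-2 * window // bin_size)  # ceiling division covers [-window, +window)
    totals = [0] * n_bins
    for chrom, positions in anchors.items():
        chip = sorted(reads.get(chrom, ()))
        for anchor in positions:
            # Binary search for reads in [anchor - window, anchor + window).
            lo = bisect.bisect_left(chip, anchor - window)
            hi = bisect.bisect_left(chip, anchor + window)
            for pos in chip[lo:hi]:
                totals[(pos - anchor + window) // bin_size] += 1
    lowest = min(totals)
    if lowest == 0:
        return [float(t) for t in totals]
    return [t / lowest for t in totals]


density = flanking_sum_density(
    {"chr1": [1000]}, {"chr1": [910, 960, 1000, 1001, 1060, 5000]}, window=100, bin_size=50
)
print(density)  # [1.0, 1.0, 2.0, 1.0]
```

Running it once with H1-Undiff smRNA anchors and once with H1-Diff anchors against the same ChIP-Seq reads would yield the kind of paired curves compared in section 3c. Setting window=10000 would match the wider range plotted in section 5a.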