
Supplemental Methods

RNA-seq

ESCs were lysed with TRIzol reagent (Life Technologies) and RNA was extracted according to the manufacturer's instructions. RNase-free DNase I (Sigma) was used to eliminate DNA contamination during RNA extraction, and the treated RNA was purified with the RNeasy mini kit (Qiagen). RNA-seq libraries were prepared using the TruSeq stranded total RNA sample preparation kit (Illumina). At least 2 biological replicates were performed for each experimental condition.

Next generation sequencing data processing

ChIP-seq and RNA-seq libraries were single read sequenced on the NextSeq 500 sequencer (Illumina) for 50 error rated cycles. The resulting raw BCL files were converted into FASTQ format using bcl2fastq (Illumina, version 2.17.1.14). These FASTQ files were trimmed from the 3’ end using the Trimmomatic processing tool (Bolger et al. 2014) until the final base had a sequencing quality score greater than 30. The trimmed reads were then aligned to the mouse genome assembly mm9 using Bowtie version 1.1.2 (Langmead et al. 2009) (ChIP-seq) or TopHat version 2.1.0 (Trapnell et al. 2009) (RNA-seq), allowing a maximum of two mismatches over the length of the read. Only uniquely mapping reads were considered for further analysis, with gene annotations coming from Ensembl release 72. Mapped ChIP-seq reads were extended 150 bases to represent sequenced fragments. Raw read counts from both ChIP-seq and RNA-seq were normalized to total read counts per million (cpm) and visualized in the UCSC genome browser as bigWig-formatted coverage tracks.
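The three per-read operations described above (3’ quality trimming, fragment extension, and cpm normalization) can be illustrated with a minimal Python sketch. This is not the Trimmomatic, Bowtie, or TopHat code used in the study; the function names are hypothetical, coordinates are assumed to be 0-based and half-open, and "extended 150 bases" is assumed here to mean extension to a total fragment length of 150 bases.

```python
def trim_3prime(seq, quals, min_q=30):
    """Drop bases from the 3' end until the final base has quality > min_q."""
    end = len(quals)
    while end > 0 and quals[end - 1] <= min_q:
        end -= 1
    return seq[:end], quals[:end]


def extend_read(start, end, strand, fragment_len=150):
    """Extend an aligned read in its 5'->3' direction to fragment_len bases.

    Assumption: coordinates are 0-based, half-open, and the read is
    shorter than fragment_len.
    """
    if strand == "+":
        return start, start + fragment_len
    # Minus-strand reads extend toward lower coordinates (clamped at 0).
    return max(0, end - fragment_len), end


def cpm(counts, total_mapped_reads):
    """Scale raw counts to counts per million mapped reads."""
    return [c * 1_000_000 / total_mapped_reads for c in counts]


if __name__ == "__main__":
    print(trim_3prime("ACGTA", [35, 36, 20, 31, 12]))
    print(extend_read(1000, 1050, "-"))
    print(cpm([5, 20], 2_000_000))
```

Note that the trimming rule only inspects the final base, so a low-quality base in the interior of a read is retained as long as a higher-quality base follows it.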

Differential analysis

Gene count tables were constructed using Ensembl gene annotations and used as input for edgeR 3.0.8 (Robinson et al. 2010). Genes with Benjamini-Hochberg adjusted p-values less than 0.01 were considered to be differentially expressed. Custom Perl and R scripts were used to generate the correlation plots.
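For readers unfamiliar with the Benjamini-Hochberg adjustment, the thresholding step can be sketched in plain Python as follows. This is an illustration of the standard procedure only, not the edgeR implementation or the authors' scripts; the function names and the example gene labels and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential(gene_pvalues, alpha=0.01):
    """List genes whose adjusted p-value is below alpha."""
    genes = list(gene_pvalues)
    adj = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adj) if q < alpha]


if __name__ == "__main__":
    # Hypothetical raw p-values for four placeholder genes.
    example = {"geneA": 0.001, "geneB": 0.02, "geneC": 0.0005, "geneD": 0.5}
    print(call_differential(example))
```

The adjusted value for each gene is its raw p-value scaled by the number of tests divided by its rank, then capped so that adjusted values never decrease as raw p-values increase.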

Gene ontology analysis

GO terms in Supplemental Fig. S6 and Supplemental Fig. S8 were derived using Metascape (http://metascape.org/) (Tripathi et al. 2015) with default parameters. Significantly differentially expressed genes (adjusted p<0.01) were used as input.

Supplemental Figure Legends

Supplemental Figure S1. Characterization and deletion of putative enhancers of the HoxA cluster.

(A) UCSC genome browser view of H3K27ac and H3K4me3 ChIP-seq tracks at the Halr1 gene in RA-treated ESCs. Positions and genome coordinates of the 4 gRNAs used for respective E1 and E2 deletions are labeled.

(B) Schematic representation of the strategy to knock out E1. CTCFBS: CTCF binding site.

(C) Southern blot of WT and E1 KO ESC DNA showing that E1 KO cells contain the knockout allele but not the WT allele.

(D) Genotyping PCR results of E1 KO and E2 KO ESCs. Arrowheads indicate the size of PCR products.

(E) Sanger sequencing results showing the DNA sequence of the mutated regions in E1KO, E2KO, and DKO cells, respectively. Genome locations of gRNAs are labeled in blue, and PAM sequences are labeled in green. Red dashed lines represent deleted sequences.

Supplemental Figure S2. Characterization of enhancer-deleted and Halr1-depleted cells with RNA-seq.

(A-B) Correlation analysis of gene expression levels between enhancer-deleted and WT ESCs. RNA-seq correlations of undifferentiated (A) and differentiated (B) cells are shown, with HoxB genes labeled in (B). None of the HoxB cluster genes were significantly misregulated in the KO ESCs. Significantly upregulated genes are marked with purple dots, while significantly downregulated genes are marked with green dots. Other genes are shown as grey dots.

(C) UCSC genome browser view of transcription levels of Halr1 (top), Hoxa1 (middle), and Hoxa3-a7 (bottom) in non-targeting shRNA (nonTsh) infected and Halr1 shRNA (Halr1sh) infected ESCs treated with RA. Arrows indicate direction of transcription.

Supplemental Figure S3. Epigenetic effects of E1 and E2 deletion.

(A) UCSC genome browser view of H3K4me1 (top) and H3K27ac (bottom) ChIP-seq tracks at the HoxB cluster in WT and DKO ESCs.

(B) UCSC genome browser view of H3K4me1 (top) and H3K27ac (bottom) ChIP-seq tracks at the HoxA cluster in E1KO and E2KO ESCs. E1 and E2 are marked with blue stripes.

(C) UCSC genome browser view of H3K4me2 ChIP-seq tracks at the HoxA gene cluster in WT and DKO ESCs. E1 and E2 are marked with blue stripes.

(D) Hi-C interaction map of a ~170Mb region surrounding the HoxA cluster. Data were extracted from (Dixon et al. 2012). HoxA genes are labeled red in the gene track.

Supplemental Figure S4. Identification of transcription factors (TFs) recruited to E1 and E2.

(A) UCSC genome browser view of SMC1A ChIP-seq track at E1 and E2 regions in ESCs. Data were extracted from (Kagey et al. 2010). E1 and E2 positions are marked below the ChIP-seq track.

(B) Anti-SMC1A western blot image (top) and Ponceau S staining (bottom) with control and SET1A-depleted cell lysates, respectively.

(C) UCSC genome browser view of transcription levels of Hoxa1 (top) and Hoxa3-a7 (bottom) in non-targeting shRNA (nonTsh) infected and Halr1 shRNA (Halr1sh) infected ESCs treated with RA. Arrows indicate transcription direction.

(D) Genome positions and epigenetic states of E1, E2, and control (Ctrl) regions for TF motif analysis.

(E) Venn diagram showing the overlap of TF motif matrices found on E1, E2, and Ctrl regions. Red box denotes the number of E1 and E2 shared matrices not found on the control region. TF matrices were generated with MatInspector.

(F) The list of genes coding TFs that have the 20 motif matrices shown in (E). Genes that have RPKM (reads per kilobase of transcript per million mapped reads) less than 0.5 in both undifferentiated and RA-induced cells are marked in green; other genes are labeled in magenta. Genes are arranged according to their expression levels in undifferentiated cells such that the top gene has the highest expression.

(G) Relative expression levels of Zic2 and Yy1 in RA-treated cells over that of undifferentiated ESCs.

(H) UCSC genome browser view of ZIC2, YY1, and P300 ChIP-seq tracks at E1 and E2 regions in ESCs. Data were extracted from (Creyghton et al. 2010; Luo et al. 2015; Sigova et al. 2015), respectively.

(I) nonTsh or target shRNA infected ESCs were treated with RA, and the transcriptional changes between nonTsh and target knockdown cells were determined by qRT-PCR (quantitative reverse transcription PCR) and are presented in log2 scale. *: p<0.05; **: p<0.01.

Supplemental Figure S5. UCSC genome browser view of SET1A, MLL2, MLL4, H3K4me1, H3K27ac, and H3K4me3 ChIP-seq tracks at HoxA and HoxB clusters in undifferentiated and RA-treated ESCs. Elevated peaks of COMPASS ChIP in RA-treated cells are indicated with arrowheads. E1 and E2 are marked by blue stripes.

Supplemental Figure S6. MLL3 and MLL4 co-regulate H3K4me1 levels but are not responsible for Hox gene activation.

(A) Western blotting assay of lysates from non-targeting shRNA (nonTsh) and MLL3 shRNA (MLL3sh) infected MLL4ΔSET ESCs with H3K4me1, H3K4me3, and H3 antibodies.

(B) Correlation analysis of gene expression levels in MLL3sh and nonTsh infected MLL4ΔSET ESCs treated with RA. HoxA genes that were significantly changed (adjusted p<0.01) are marked with red dots, while unchanged ones are marked with black dots. Significantly downregulated genes are labeled green, and upregulated ones are labeled purple. Other genes are shown as grey dots.

(C) GO analysis of upregulated (top) or downregulated (bottom) genes in SET1A-depleted RA-treated ESCs.

(D) UCSC genome browser view of RNA-seq signals at Hoxa1 (left) and Hoxa3-a7 (right) genes of nonTsh and MLL3sh infected MLL4ΔSET ESCs treated with RA.

(E) UCSC genome browser view of RNA-seq signals at Pou5f1 (left) and Nanog (right) genes of nonTsh and MLL3sh infected MLL4ΔSET ESCs treated with RA.

(F) UCSC genome browser view of H3K4me1 ChIP-seq tracks at the HoxA cluster in undifferentiated (left) and RA-treated (right) nonTsh and MLL3sh infected MLL4ΔSET ESCs, respectively. Regions with altered H3K4me1 levels are highlighted with blue.

Supplemental Figure S7. Expression levels of Hox genes are not affected by MLL1 or MLL2 deletion in ESCs.

(A) UCSC genome browser view of RNA-seq signals showing successful deletion of the indicated exons (red arrows and the red rectangle) at Kmt2a gene.

(B) Western blotting assay indicating the loss of MLL1 in MLL1KO ESCs.

(C) Western blotting assay of lysates from WT and MLL1KO ESCs with H3K4me1, H3K4me3, and H3 antibodies.

(D) UCSC genome browser view of RNA-seq signals at Hoxa1 (left) and Hoxa3-a7 (right) genes of WT, MLL1KO, and MLL2KO ESCs upon RA induction.

(E) UCSC genome browser view of H3K4me1 ChIP-seq tracks at HoxA cluster in undifferentiated (left) and RA-treated (right) WT, MLL1KO, and MLL2KO ESCs, respectively.

Supplemental Figure S8. Depletion of SET1A leads to failures in Hox gene activation and inactivation of proximal regulatory sequences.

(A-B) GO analysis of upregulated (A) or downregulated (B) genes in SET1A-depleted RA-treated ESCs.

(C) Correlation analysis of RNA-seq signals comparing the expression levels of 1212 RA-induced genes in SET1A-depleted cells with control knockdown cells treated with RA. Genes significantly upregulated by SET1A depletion are marked with purple dots, while genes significantly downregulated by SET1A depletion are marked with green dots. Other genes are shown as grey dots.

(D) UCSC genome browser view of RNA-seq signals at Pou5f1, Nanog, Lefty1, and Cdx1 genes of nonTsh and SET1Ash infected ESCs treated with RA.

(E) UCSC genome browser view of H3K4me1 (top), H3K27ac (middle), and H3K4me3 (bottom) ChIP-seq tracks at the HoxA gene cluster (left) and HoxB gene cluster (right) in undifferentiated or RA-treated ESCs infected with respective non-targeting (nonTsh) and SET1A (SET1Ash) shRNA. Regions with altered levels of histone marks are highlighted with blue. Positions of E1 and E2 are marked by blue boxes.

Supplemental Figure S9. COMPASS-dependent H3K4me2 at Hox clusters.

UCSC genome browser view of H3K4me2 ChIP-seq tracks at HoxA and HoxB clusters in undifferentiated and RA-treated WT, MLL1KO, MLL2KO, MLL3KO, MLL4ΔSET, nonTsh infected WT, and SET1Ash infected WT ESCs.

Supplemental References

Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114‐2120.

Creyghton MP, Cheng AW, Welstead GG, Kooistra T, Carey BW, Steine EJ, Hanna J, Lodato MA, Frampton GM, Sharp PA et al. 2010. Histone H3K27ac separates active from poised enhancers and predicts developmental state. Proc Natl Acad Sci U S A 107: 21931‐21936.

Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. 2012. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485: 376‐380.

Kagey MH, Newman JJ, Bilodeau S, Zhan Y, Orlando DA, van Berkum NL, Ebmeier CC, Goossens J, Rahl PB, Levine SS et al. 2010. Mediator and cohesin connect gene expression and chromatin architecture. Nature 467: 430‐435.

Langmead B, Trapnell C, Pop M, Salzberg SL. 2009. Ultrafast and memory‐efficient alignment of short DNA sequences to the human genome. Genome Biol 10: R25.

Luo Z, Gao X, Lin C, Smith ER, Marshall SA, Swanson SK, Florens L, Washburn MP, Shilatifard A. 2015. Zic2 is an enhancer‐binding factor required for embryonic stem cell specification. Mol Cell 57: 685‐694.

Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139‐140.

Sigova AA, Abraham BJ, Ji X, Molinie B, Hannett NM, Guo YE, Jangi M, Giallourakis CC, Sharp PA, Young RA. 2015. Transcription factor trapping by RNA in gene regulatory elements. Science 350: 978‐981.

Trapnell C, Pachter L, Salzberg SL. 2009. TopHat: discovering splice junctions with RNA‐Seq. Bioinformatics 25: 1105‐1111.

Tripathi S, Pohl MO, Zhou Y, Rodriguez‐Frandsen A, Wang G, Stein DA, Moulton HM, DeJesus P, Che J, Mulder LC et al. 2015. Meta‐ and Orthogonal Integration of Influenza "OMICs" Data Defines a Role for UBR4 in Virus Budding. Cell Host & Microbe 18: 723‐735.
