Supporting Information

Wakabayashi et al. 10.1073/pnas.1521754113 (www.pnas.org/cgi/content/short/1521754113)

SI Materials and Methods

Plasmid Preparation. Short guide RNA (sgRNA) sequences (Table S1) were cloned into the pSg1 vector (Addgene) and the XPR5 lentiviral vector (Broad Institute), respectively. [The XPR5 vector contains the Cas9 nuclease and a red fluorescent protein (RFP) cassette.] The Cas9 nuclease expression vector used was pxPR_BRD001, which contains a puromycin resistance cassette as a selection marker. Off-target scores for each guide were calculated using the CRISPR design tool (CRISPR Design; crispr.mit.edu); only guides with a score >50 (except for a score of 49 in one case) were used.

Cell Culture and Lentivirus Production. The K562 cells (American Type Culture Collection) were maintained in RPMI medium 1640 plus L-glutamine (Life Technologies) supplemented with 10% (vol/vol) FBS (Atlanta Biologicals) and 1% penicillin-streptomycin (PS; Life Technologies). Cells were incubated at 37 °C with 5% CO2 in air atmosphere. The 293T cells were maintained in DMEM, high glucose (Life Technologies) supplemented with 10% FBS and 1% PS. For lentivirus production, 293T cells were transfected with sgRNA constructs (Table S1) along with the VSV-G envelope and the pDelta8.9 packaging vector using FuGENE 6 (Roche) transfection reagent according to the manufacturer's instructions. The next day, the medium was changed to Iscove's modified Dulbecco's medium (IMDM; Life Technologies) supplemented with 2% (vol/vol) human AB plasma, 3% (vol/vol) human AB serum (Atlanta Biologicals), and 1% PS. The next day (48 h posttransfection), viral supernatant was collected and filtered through a 0.45-μm filter, and then concentrated by centrifuging at 68,320 × g for 2 h at 4 °C with an Optima L-100 XP ultracentrifuge (Beckman Coulter). Concentrated virus was dissolved in 2 mL of supernatant overnight at 4 °C, and infection was performed the next day.

Primary Cell Culture and Lentiviral Infection. Adult CD34+ HSPCs from mobilized peripheral blood mononuclear cells (Harvard Stem Cell Institute Flow Cytometry Facility) were maintained in base medium (IMDM plus 2% human AB plasma, 3% human AB serum, 1% PS, 3 U/mL heparin, 10 μg/mL insulin, and 200 μg/mL holo-transferrin) supplemented with 3 U/mL erythropoietin, 10 ng/mL human recombinant stem cell factor, and 1 ng/mL human recombinant IL-3 for days 0–6 of culture. IL-3 was omitted on days 7–12 of culture. Cells were incubated at 37 °C with 5% CO2. For lentiviral infection, cells (at day 2 of culture) were spun at 931 × g for 90 min at room temperature with 8 μg/mL polybrene (Millipore), 1 mL of concentrated virus, and 1 mL of IMDM supplemented with 2% human AB plasma, 3% human AB serum, and 1% PS. The next day, the medium was changed to omit the 8 μg/mL polybrene and concentrated virus. At 3 d postinfection, RFP was measured by flow cytometry analysis with a BD Accuri C6 flow cytometer (BD Biosciences).

Fluorescence-Activated Cell Sorting and Analysis. For fluorescence-activated cell sorting (FACS), cells were harvested at day 10 of culture, washed with FACS buffer [3% (vol/vol) FBS in PBS], and then resuspended in 600 μL of FACS buffer. RFP-positive cells were then sorted using a FACSAria II cell sorter (BD Biosciences). For flow cytometry analysis of differentiation, cells were harvested at day 12 of culture, washed with FACS buffer, stained with allophycocyanin-conjugated anti-human CD235a antibody (17-9987; eBioscience), and analyzed with a BD Accuri C6 flow cytometer.

DNA Transfection and Puromycin Selection. K562 cells were cotransfected with 1 μg total of Cas9 nuclease and sgRNA plasmids using Lipofectamine LTX Plus Reagent (Thermo Fisher Scientific) at a 1:2 ratio of Cas9 to sgRNA. For a control, K562 cells were cotransfected with 1 μg total of Cas9 nuclease and pLKO.1-GFP plasmid at a 1:2 ratio of Cas9 to pLKO.1-GFP. At 24 h after cotransfection, puromycin was added at a concentration of 2 μg/mL, followed 24 h later by a reduction to 1 μg/mL for an additional 24 h. Selection efficiency was assessed by flow cytometry with propidium iodide staining (to assess viability) on a FACSCanto II flow cytometer (BD Biosciences). Limiting dilutions were performed to obtain single cell-derived clonal populations for both cells targeted with sgRNAs as well as for GFP controls. Unless specified otherwise, three matching clonal GFP controls were analyzed for each experiment.

Identification of Deletions. Genomic DNA was isolated from single clones using the DNeasy Blood & Tissue Kit (Qiagen). PCR analyses were performed using Platinum PCR Supermix (Invitrogen) and a T100 Thermal Cycler (Bio-Rad) with primer pairs flanking the sgRNA target sequences, as well as with primer pairs that select for clones containing a deletion within the targeted GATA1 binding site (Fig. S3). Genomic DNA isolated for screening was PCR-amplified using Platinum PCR Supermix (Invitrogen) on a T100 Thermal Cycler (Bio-Rad) (Table S1), and purified with the QIAquick PCR Purification Kit (Qiagen). The product was Sanger-sequenced, and trace files were analyzed on FinchTV (Geospiza). Chromatograms were compared with DNA sequences obtained from the University of California Santa Cruz (UCSC) Genome Browser (reference genome GRCh37/hg19).

When identifying deletions, it is important to keep in mind that K562 cells are aneuploid in certain regions. Here we aimed to completely disrupt the targeted RE regardless of the copy number at a particular locus. For all clonal deletions (either homozygous or compound heterozygous deletions), we observed Sanger-sequencing traces only with a deletion and did not observe any traces of the reference K562 DNA, confirming that the targeted CRE was modified in all copies. In cases with compound heterozygosity, we typically observed similar intensities for both alleles, suggesting that the loci were diploid.

Quantitative RT-PCR. RNA was extracted from selected clones using the RNeasy Plus Mini Kit (Qiagen) and the Ambion RNAqueous-Micro Total RNA Isolation Kit (Life Technologies). cDNA was synthesized using the iScript cDNA Synthesis Kit (Bio-Rad). Quantitative RT-PCR (qRT-PCR) was performed with iQ SYBR Green Supermix (Bio-Rad) on a CFX96 Real-Time PCR System (Bio-Rad). Primer sequences are listed in Table S1. mRNA expression levels were quantified and calculated via the ΔΔCt method and normalized to β-actin levels (55).

Heme Quantification. A total of 250,000 cells from the controls and ALAS2 clones were counted with a hemacytometer and harvested for heme quantification using the QuantiChrom Heme Assay Kit (BioAssay Systems; DIHM-250). The assay was performed in triplicate, following the manufacturer's protocol.

Porphyrin Quantification. A total of 100,000 cells from the controls and UROS clones were seeded in RPMI medium 1640 including L-glutamine supplemented with 10% FBS and 1% PS, along with 1 mM δ-aminolevulinic acid (Sigma-Aldrich), at a density of 50,000 cells/mL. Cells were incubated at 37 °C with 5% CO2, protected from light, for 72 h. The cells were then centrifuged at 456 × g for 5 min at 4 °C, and 1 mL of supernatant was collected for analysis. Samples were deproteinized with an equal volume of 20% TCA/DMSO (1:1, vol/vol), incubated on ice for 15 min, and then centrifuged at 10,000 × g for 10 min. Porphyrins in the samples were separated by ultra-performance liquid chromatography (UPLC) in an ACQUITY UPLC system (Waters) on a BEH C18 2.1 × 100 mm column (Waters) as described previously (56).

Eluent A was 1 mol/L ammonium acetate–acetic acid buffer, pH 5.16, with 0.02% sodium azide, and eluent B was neat acetonitrile. The elution program was as follows: 2 min concave (Waters #8) gradient from 8% (vol/vol) to 70% (vol/vol) B, followed by 0.5 min at 70% B, and a 1.5-min reequilibration at 8% B at a flow rate of 0.6 mL/min. Column temperature was maintained at 65 °C. Porphyrins were detected using a Waters ACQUITY UPLC fluorescence detector with 405 nm excitation, 619 nm emission, and a gain of 100. For porphyrin quantitation, the UPLC chromatograph was standardized using the URO I and COPRO III fluorescence standards UFS-1 and CFS-3, respectively (Frontier Scientific). All experiments were performed in triplicate.

Intracellular Flow Cytometry. Cells from the controls and PKLR clones were harvested for intracellular staining with the anti-PKLR antibody (1H9, ab123908; Abcam), using a modified protocol and reagents from the Click-iT Flow Cytometry Assay Kit (Life Technologies; C-10418). Cells were analyzed on a FACSCanto II flow cytometer (BD Biosciences).

Enzymatic Activity Quantification. A total of 50,000 cells of the controls and PKLR clones were harvested. The assay was performed with the Sigma-Aldrich Pyruvate Kinase Activity Assay Kit (MAK072) following the manufacturer's protocol.

ChIP-PCR. A total of 60,000,000 cells of the controls and clones ALAS2 A-4, UROS U-1, and PKLR P-1 were harvested and fixed with 1% formaldehyde in preparation for ChIP with 10 μg of normal rabbit-IgG (sc-2027x; Santa Cruz Biotechnology) or TAL1 clone C-21 (sc-12984; Santa Cruz Biotechnology). ChIP DNA was then quantified against the ALAS2, PKLR, and UROS loci via qRT-PCR using iQ SYBR Green Supermix (Bio-Rad) on the CFX96 Real-Time PCR System (Bio-Rad).

Identifying Putative CREs in MEDs. MEDs whose genes act intrinsically within the erythroid lineage (ANK1, SPTB, SPTA1, SLC4A1, EPB42, EPB41, PIEZO1, KCNN4, GLUT1, G6PD, PKLR, NT5C3A, HK1, GPI, PGK1, ALDOA, TPI1, PFKM, ALAS2, UROS, and FECH) were selected from the literature. CREs were defined as NDR peaks in proEs that were proximal to the target gene (manually defined within 500 kb of each MED gene so as not to include nearby promoters or regulatory regions that were more likely to specifically regulate other genes).

Bioinformatics and Statistical Analyses. The Student t test was used to compare mRNA expression, functional assays, and ChIP-PCR results between controls and CRISPR-edited clones. Unless specified otherwise, three replicates were performed for each experiment, and error bars represent the SEM. Disruptions of GATA1 binding motifs for selected mutations were initially identified using TRAP (57). ChIP-seq data for GATA1, TAL1, NFE2, and KLF1 were analyzed as described previously (32) and displayed as input-normalized reads per million. Raw ChIP-seq data were obtained and combined from multiple studies (31, 32, 58–62). LDB1 ChIP-seq and FAIRE-seq in EBs were obtained from www.ncbi.nlm.nih.gov/geo/ (accession nos. GSE52637 and GSE36985) and processed similarly (58, 63). GATA1 and NDR peaks were defined using MACS2 (64). DHS-seq for K562 cells was obtained from the Integrative Genomics Viewer (IGV) server, and IGV was used for visualization of genome-wide assays. RNA-seq data across the erythroid lineage were obtained from www.ncbi.nlm.nih.gov/geo/ (accession nos. GSE61566 and GSE53983) and processed as described previously (32, 65).

PhastCons scores derived from 100 vertebrates were downloaded from hgdownload.cse.ucsc.edu/goldenpath/hg19/phastCons100way/ and compared using the seqplots package in R/Bioconductor. K-means clustering was used to derive clusters of conservation across GATA1 motifs. TF binding intensities at promoters (defined as ±1 kb from the TSS) were determined using the UCSC Genome Browser tool bigWigAverageOverBed; the maximum value in an interval was taken. When multiple promoters were present for a single annotated gene, the promoter with the maximum TF intensity was chosen. Intensities were then log2-transformed, and negative values (i.e., when input is greater than ChIP) were set to 0. Random forest models of proE gene expression were then learned on a training set of TF binding intensities for 15,000 genes with the randomForest package in R with the parameters mtry = 2, mtree = 300, and ntree = 501. A total of 3,000 genes were held out as a test set, and fit was reported for this set.

TF binding intensities were then transformed to z-scores, and PAM was used to identify TF binding intensity clusters with the pamk function from the fpc package in R. Enhancers near genes in each cluster were defined by EB DHS peaks within ±100 kb of the TSS (excluding DHS peaks overlapping ±2 kb of each TSS). Similar to promoter analyses, LDB1, GATA1, and TAL1 binding intensities were mapped to each enhancer and summed for each gene. The Mann–Whitney U test was used to compare TF intensities across two clusters.

A gkmer-SVM and a convolutional neural network (DeepBind) were used to predict the effects of NC mutations (15, 16). K562 weights for gkmer-SVM were obtained from www.beerlab.org/deltasvm/, and weights for the EB NDRs were derived according to the protocol of Lee et al. (16). Trained models of GATA1, TAL1, NFE2, and KLF1 (SP1 SELEX-seq) binding were obtained from tools.genes.toronto.edu/deepbind/ (D00765.001, D00815.001, D00535.004, and D00650.005, respectively). Mutation maps for both the gkmer-SVM and DeepBind predictions were created using a custom R script following the outline in the supplemental notes of Alipanahi et al. (15). Predictions were averaged across windows of 10 nt for gkmer-SVM and 24 nt for DeepBind unless specified otherwise (e.g., in the PKLR promoter where multiple GATA1 motifs were observed within a single 24-nt window).
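The promoter-intensity processing described under Bioinformatics and Statistical Analyses (take the strongest promoter per gene, log2-transform, set negative values to 0, then convert to z-scores) can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names are hypothetical, the handling of non-positive inputs is an assumption, and the sample standard deviation is assumed for the z-score.

```python
import math
from statistics import mean, stdev


def gene_intensity(promoter_maxima):
    """Collapse the promoters of one gene into a single floored log2 intensity.

    promoter_maxima holds, for each promoter of the gene, the maximum
    input-normalized TF signal within +/-1 kb of its TSS.
    """
    best = max(promoter_maxima)  # promoter with the maximum TF intensity
    if best <= 0:
        # Assumption: a non-positive signal has no defined log2, treat as 0.
        return 0.0
    # Negative log2 values (input greater than ChIP) are set to 0.
    return max(0.0, math.log2(best))


def z_scores(values):
    """Standardize intensities across genes (sample SD, an assumption)."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]
```

Because log2 is monotonic, choosing the strongest promoter before or after the transform gives the same result. The z-scored values would then be the input to a clustering step such as PAM.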

Fig. S1. Selected NC mutations from three distinct MEDs disrupt the GATA1 motif. The canonical GATA1 motif is shown as a position weight matrix. For each mutated CRE, one reported mutation in the GATA1 motif is shown. P values for the presence of the GATA1 motif in each reference (Ref) sequence are shown, along with the log10 difference in this P value between the reference sequence and the sequence with the reported mutation. In each case, the mutation is predicted to completely disrupt the GATA1 motif.


Fig. S2. Comparison of erythroid TF chromatin occupancy between primary human proEs and the erythroid K562 cell line. (A–C) ChIP-seq profiles across ALAS2, UROS, and PKLR in proEs, similar to Fig. 1 A–C. (D–F) ChIP-seq profiles across ALAS2, UROS, and PKLR in K562 cells, similar to Fig. 1 A–C.

Fig. S3. PCR-based genomic DNA screening method and deletions generated across interrogated GATA1 motifs. (A) For each gene, one forward primer and two reverse primers were designed for the screen (Table S1). FW1 and RV1 acted as control primers and amplified the whole region of interest. FW1 and RV2 failed to amplify a product when the GATA1 binding site (in red) was sufficiently deleted to interfere with RV2 binding. Assuming a diploid genotype, we determined whether the deletions introduced were present in the first (1) or second (2) chromosomal copy. (B) PCR products of FW and RV1 of PKLR clones and two GFP controls. PCR products for all samples are observed. Band sizes are unequal due to deletions as a result of CRISPR/Cas9 editing. (C) PCR products of FW and RV2 of PKLR clones and two GFP controls. PKLR clones 1, 3, 4, and 5 show no product. (D–F) Clonal deletions across GATA1 motifs obtained for CREs proximal to ALAS2 (D), UROS (E), and PKLR (F).

Fig. S4. Expression of ALAS2 and ALAS1. (Left) Expression of ALAS2 and ALAS1 in the erythroid lineage. (Right) Expression of ALAS2 and ALAS1 in the K562 cell line.

Fig. S5. Targeting additional GATA1 binding sites near genes implicated in MEDs. (A) ChIP-seq profiles across HK1, similar to Fig. 1 A–C. H1 and H2 are GATA1 motifs in CREs occupied by GATA1 that we targeted using CRISPR/Cas9. (B) Relative mRNA expression levels of HK1. **P < 0.01; ***P < 0.001. (C) ChIP-seq profiles across SLC4A1, similar to Fig. 1 A–C. S1 and S2 are GATA1 motifs in CREs occupied by GATA1 that we targeted using CRISPR/Cas9. (D) Relative mRNA expression levels of SLC4A1. **P < 0.01.

Fig. S6. Clusters of TF intensity at promoter regions. Multiple plots showing the differences in TF activity across PAM-derived clusters of ∼18,000 genes. Units are all in log2 input-normalized reads per million. (Lower Diagonal) Pairwise scatterplots for the binding intensities of TFs. Points (genes) are color-coded by cluster, as in Bottom Right. (Diagonal) Probability densities of TF binding intensities for each cluster. (Upper Diagonal) Spearman correlations for pairwise TFs. (Far Right) Boxplots of the TF binding intensities for each cluster. (Bottom Right) Relative numbers of genes in each cluster.

Fig. S7. Erythroid TFs binding at enhancers. Shown are intensities of LDB1 (A), GATA1 (B), and TAL1 (C) across enhancers assigned to genes in each cluster, as defined in Fig. 4 B and C. ***P < 10⁻⁹.

Fig. S8. Mutation maps based on a gkmer-SVM trained on erythroid open chromatin and DeepBind trained on GATA1 and TAL1 ChIP-seq. All interrogated GATA1 motifs that exhibited a decrease in target gene expression are shown: (A) ALAS2 intron 1, (B) UROS intron, (C) EPB41 E1, (D) EPB41 E2, (E) KCNN4 K1, (F) HK1 H1, and (G) HK1 H2. In most cases, disruption of a GATA1 motif was more important than disruption of a TAL1 motif for predicted TAL1 binding.

Fig. S9. Interrogation of two GATA1 binding motifs in the "complex" enhancer in the eighth intron of ALAS2. (A) ChIP-seq profiles across ALAS2, similar to Fig. 1A. A2 is a GATA1 motif in a CRE occupied by GATA1 that we targeted using CRISPR/Cas9. (B) Relative mRNA expression levels of ALAS2. **P < 0.01; ****P < 0.0001.

Table S1. sgRNA oligos and primers for PCR screening, qRT-PCR, and sequencing

sgRNA oligo/primer      5′ → 3′

sgRNA oligos for pSg1 cloning
ALAS2 sgRNA FW          AACTCTGGCAACTTTATCTGGTTTT
ALAS2 sgRNA RV          CAGATAAAGTTGCCAGAGTTCGGTG
PKLR sgRNA FW           AAACTGCTGGTCTTATCTAAGTTTT
PKLR sgRNA RV           TTAGATAAGACCAGCAGTTTCGGTG
UROS sgRNA FW           GAAGACCCCTGTCACTGATAGTTTT
UROS sgRNA RV           TATCAGTGACAGGGGTCTTCCGGTG
EPB41-1a sgRNA FW       GCCCAGGCTCTGACAGGATAGTTTT
EPB41-1a sgRNA RV       TATCCTGTCAGAGCCTGGGCCGGTG
EPB41-1b sgRNA FW       CTTGAGGTGGGTGATAAAGAGTTTT
EPB41-1b sgRNA RV       TCTTTATCACCCACCTCAAGCGGTG
EPB41-1c sgRNA FW       CTGTGGGGCGCTGATAAGCTGTTTT
EPB41-1c sgRNA RV       AGCTTATCAGCGCCCCACAGCGGTG
EPB41-2 sgRNA FW        CACACACACTCTTATCAGGCGTTTT
EPB41-2 sgRNA RV        GCCTGATAAGAGTGTGTGTGCGGTG
KCNN4 sgRNA FW          CGGCACACCCCACTTATCTCGTTTT
KCNN4 sgRNA RV          GAGATAAGTGGGGTGTGCCGCGGTG
HK1-1 sgRNA FW          GACTCAGTGTTACTTATCTGGTTTT
HK1-1 sgRNA RV          CAGATAAGTAACACTGAGTCCGGTG
HK1-2 sgRNA FW          AGGGTTTGCTGGCTCAGATAGTTTT
HK1-2 sgRNA RV          TATCTGAGCCAGCAAACCCTCGGTG
SLC4A1-1 sgRNA FW       TGTGCTGCCTAGCACTGATAGTTTT
SLC4A1-1 sgRNA RV       TATCAGTGCTAGGCAGCACACGGTG
SLC4A1-2 sgRNA FW       GTGGAGGGAGAAGATAGCTCGTTTT
SLC4A1-2 sgRNA RV       GAGCTATCTTCTCCCTCCACCGGTG

sgRNA oligos for XPR5 cloning
ALAS2 shRNA FW          CACCGAACTCTGGCAACTTTATCTG
ALAS2 shRNA RV          AAACCAGATAAAGTTGCCAGAGTTC
PKLR shRNA FW           CACCGAAACTGCTGGTCTTATCTAA
PKLR shRNA RV           AAACTTAGATAAGACCAGCAGTTTC
UROS shRNA FW           CACCGAAGACCCCTGTCACTGATA
UROS shRNA RV           AAACTATCAGTGACAGGGGTCTTC

Primers for PCR screening
ALAS2 FW                TGCCTGCTTGTGAAAGCTAA
ALAS2 RV1               GGAGTGGTCAGACCCCAAT
ALAS2 RV2               GGCGATAAACTCTGGCAACTTTA
PKLR FW                 CGGGACCATGGAATGAGAG
PKLR RV1                TGTGCCCCTTTTCTCTTCTC
PKLR RV2                CTTTTCTCTTCTCTGTCTCCCTTAGAT
UROS FW                 GCACTAATGGGCTTGTTCTTTC
UROS RV1                TGGTTTCATCTGTCTTTCCAAG
UROS RV2                CATGCTCTTTCTTGGCCTTA

Primers for qRT-PCR
B-actin FW              AGAAAATCTGGCACCACACC
B-actin RV              GGGGTGTTGAAGGTCTCAAA
ALAS2 FW                ACCTACCGTGTGTTCAAGACT
ALAS2 RV                AGATGCCTCAGAGAAATGTTGG
PKLR FW                 TCAAGGCCGGGATGAACATTG
PKLR RV                 CTGAGTGGGGAACCTGCAAAG
UROS FW                 GCCAAGTCAGTGTATGTGGTT
UROS RV                 GCAATCCCTTTGTCCTTGAGC
EPB41 FW                TGAACTGGGAGACTACGACCC
EPB41 RV                AGCTGGAGTCATGGACCTGT
KCNN4 FW                CTGCTGCGTCTCTACCTGG
KCNN4 RV                AGGGTGCGTGTTCATGTAAAG
HK1 FW                  GCTCTCCGATGAAACTCTCATAG
HK1 RV                  GGACCTTACGAATGTTGGCAA
SLC4A1 FW               GGTGATGGACGAAAAGAACCA
SLC4A1 RV               AAGACTCTACGCAGCTCTAGG

Primers for sequencing
ALAS2 PCR FW            CAGCCTGGGTTGGTATGTG
ALAS2 PCR RV            TAGCCAGATGCTCAGACGTG
ALAS2 SEQ FW            TCAGCTGTCAAACGTGAGGT
PKLR PCR FW             CCTCTCTGGGTCTCCCTCTC
PKLR PCR RV             GAGGAAATGCCAGGAGATGA
PKLR SEQ FW             GGCTTCTGTCTCCCCTTCTT
UROS PCR FW             AGGGATCAAAGTGGCTTCAA
UROS PCR RV             TCTTTCCGGAACCATAAACG
UROS SEQ FW             TCCTAAGCAATTTCCGATGG
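The sgRNA oligo pairs in Table S1 appear to follow a regular pattern, which makes transcription errors easy to catch. The Python sketch below checks that pattern. The rules it encodes are inferred from the listed sequences, not stated by the authors: pSg1 oligos look like guide + GTTTT (FW) and reverse complement of the guide + CGGTG (RV), and XPR5 oligos look like CACC + insert (FW) and AAAC + reverse complement of the insert (RV). The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def psg1_pair_ok(fw, rv):
    """Check the inferred pSg1 pattern: the oligos share a guide and carry
    the 3' tails GTTTT (FW) and CGGTG (RV)."""
    return (
        fw.endswith("GTTTT")
        and rv.endswith("CGGTG")
        and rv[:-5] == revcomp(fw[:-5])
    )


def xpr5_pair_ok(fw, rv):
    """Check the inferred XPR5 pattern: 5' overhangs CACC (FW) and AAAC (RV)
    flank mutually reverse-complementary inserts."""
    return (
        fw.startswith("CACC")
        and rv.startswith("AAAC")
        and rv[4:] == revcomp(fw[4:])
    )


# Example with the ALAS2 pSg1 pair and the UROS XPR5 pair from Table S1.
print(psg1_pair_ok("AACTCTGGCAACTTTATCTGGTTTT", "CAGATAAAGTTGCCAGAGTTCGGTG"))
print(xpr5_pair_ok("CACCGAAGACCCCTGTCACTGATA", "AAACTATCAGTGACAGGGGTCTTC"))
```

A pair that fails either check is worth comparing against the published table before ordering oligos.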
