The Pennsylvania State University

The Graduate School

College of Medicine

EPIGENETIC AND GENE EXPRESSION CHANGES MEDIATED BY HISTONE H3 METHYLATION IN ACUTE MYELOID LEUKEMIA

A Thesis in

Biomedical Sciences

by

Abigail Harris Becker

© 2018 Abigail Harris Becker

Submitted in Partial Fulfillment of the Requirements for the Degree of

Master of Science

May 2018

The thesis of Abigail Harris Becker was reviewed and approved* by the following:

Sergei Grigoryev
Professor of Biochemistry and Molecular Biology
Thesis Advisor

Kristin Eckert
Professor of Pathology, and Biochemistry and Molecular Biology

Gregory S. Yochum
Associate Professor of Biochemistry and Molecular Biology and Surgery

Ralph L. Keil
Associate Professor of Biochemistry and Molecular Biology
Chair of Biomedical Sciences Graduate Program

*Signatures are on file in the Graduate School


ABSTRACT

Histone modifications are the principal regulators of chromatin dynamics, mediating the extent of chromatin compaction which in turn alters DNA accessibility for gene regulation and repair. Large-scale chromatin remodeling, manifested by the progressive condensation of chromatin marked by the repressive histone modification H3K9me2, is essential to establish cell-specific gene expression patterns during lineage commitment and cellular differentiation.

Dysregulation of this large-scale regulatory mechanism has been linked to cellular transformation and oncogenesis in cancer cell lines, diverse solid tumors and hematopoietic cancers. Acute myeloid leukemia (AML) is a highly malignant blood cancer and many of the recurrent gene fusions and driver mutations associated with AML are directly related to epigenetic regulation and chromatin higher order structure, suggesting broad genetic and epigenetic disruption in the pathogenesis of AML.

We conducted Chromatin Immunoprecipitation followed by massively parallel sequencing (ChIP-seq) to analyze genome-wide H3K9me2 distribution in primary AML patient samples, CD34+ hematopoietic progenitors, and mature granulocytes, and identified distinct genomic regions marked by significant bidirectional changes of H3K9me2, including domains persistently altered in AML patient samples. In addition, we showed that targeting the histone methyltransferase G9a, which is responsible for the majority of euchromatic H3K9me2, with a pharmacologically active inhibitor reverses some of the changes observed in AML cells. We performed ChIP-qPCR and transcriptional analysis of genes within the specific regions and in this thesis present evidence in support of the ChIP-seq analysis. We propose that the bidirectional changes in H3K9me2 domains (i.e. loss or gain of H3K9me2, rearrangement in domain size or location) result in coordinated activation or silencing of gene clusters, which significantly contributes to the development of AML.


TABLE OF CONTENTS

List of Figures
List of Tables
List of Abbreviations
Acknowledgements

Chapter 1: Introduction
  1.1 Epigenetic mechanisms mediated by Histone H3 modifications
    1.1.1 Regulation of heritable changes in chromatin state
    1.1.2 Role of Histone H3 K9 dimethylation in cellular differentiation
    1.1.3 The histone methyltransferases G9a and GLP
  1.2 Epigenetic dysregulation in Acute Myeloid Leukemia
    1.2.1 Evidence of broad disruption of epigenetic mechanisms in AML
    1.2.2 H3K9 dimethylation domains and oncogenesis
  1.3 Characterization of H3K9me2 domains in primary AML cells, CD34+ cells and granulocytes
    1.3.1 Distinguishing characteristics associated with H3K9me2 domains
    1.3.2 Deregulation of H3K9me2 domains and gene expression in AML
    1.3.3 Inhibiting the histone methyltransferases G9a and GLP in K562 cells

Chapter 2: Materials and Methods
  2.1 Cells and cell culture
    2.1.1 Human blood samples
    2.1.2 Bone marrow CD34+ cells
    2.1.3 K562 cells
    2.1.4 G9a/GLP inhibition in K562 cells by UNC0638
  2.2 Antibodies
  2.3 SDS-PAGE and Western blot
  2.4 Chromatin immunoprecipitation
    2.4.1 Nuclei isolation
    2.4.2 Immunoprecipitation
  2.5 Library preparation and Next-Generation sequencing
  2.6 Quantitative PCR
    2.6.1 Instrumentation
    2.6.2 Primer sequences
    2.6.3 ChIP-qPCR
    2.6.4 RNA extraction and mRNA expression analysis

Chapter 3: Domain-specific changes in H3K9 dimethylation in AML patient samples
  3.1 Global H3 Lysine 9 methylation in AML
  3.2 Evaluating H3K9me2 levels at selected dLOCKs by ChIP-qPCR

Chapter 4: Domain-specific inhibition of G9a/GLP in K562 cells and coordinate effects on H3K9me2 domains and gene expression
  4.1 Global H3K9 di- and tri-methylation in K562 cells after pharmacological inhibition of G9a/GLP
  4.2 Quantitative PCR evaluation of H3K9me2 levels and gene expression following G9a inhibition

Chapter 5: Discussion

References

Appendix: Protocol, ChIP-seq of histone modifications and chromatin-binding proteins


LIST OF FIGURES

Figure 1: H3K9me2 domain in granulocytes, CD34+ progenitors and K562 cells.

Figure 2: Genome-wide correlation analysis of H3K9me2 domains in granulocytes, CD34+ and K562 cells.

Figure 3: H3K9me2 domain correlation analysis separates AML samples into two epigenetically distinct clusters.

Figure 4: Genome-wide analysis of H3K9me2 domains reveals sites recurrently deregulated in AML.

Figure 5: Gene regulation within AML-associated H3K9me2 domains.

Figure 6: Analysis of K562 cells following G9a inhibition with the small molecule UNC0638.

Figure 7: qPCR assay validation using standard curve and melt peak analysis.

Figure 8: Analysis of global histone H3K9 methylation in AML samples and K562 cells.

Figure 9: Comparing ChIP-seq and ChIP-qPCR analysis in AML samples and granulocytes.

Figure 10: Treatment with UNC0638 results in selective and reversible reduction of global histone H3K9me2 but not H3K9me3 in K562 cells.

Figure 11: ChIP- and RT-qPCR analysis of K562 cells following G9a inhibition.


LIST OF TABLES

Table 1. Quantitative PCR cycling program

Table 2. Primer sequences


LIST OF ABBREVIATIONS

AML         Acute myeloid leukemia
CGI         Cytosine-Guanine Island
ChIP        Chromatin Immunoprecipitation
ChIP-qPCR   Chromatin Immunoprecipitation quantitative polymerase chain reaction
ChIP-seq    Chromatin Immunoprecipitation followed by massively parallel sequencing
Ct or Cq    Cycle threshold or Quantification cycle
DNAse       Deoxyribonuclease
dsDNA       Double stranded DNA
ECL         Enhanced chemiluminescence
EMT         Epithelial-to-mesenchymal transition
ESC         Embryonic stem cell
H3K9me1     Histone H3 Lysine 9 monomethylation
H3K9me2     Histone H3 Lysine 9 dimethylation
H3K9me3     Histone H3 Lysine 9 trimethylation
H3K4me3     Histone H3 Lysine 4 trimethylation
HKMTase     Histone Lysine (K) methyltransferase
HMTase      Histone methyltransferase
HMM         Hidden Markov Model
HSC         Hematopoietic stem cells
HSPC        Hematopoietic stem and progenitor cells
IP          Immunoprecipitation
KDM         Lysine demethylase
KRAB-ZNF    Zinc finger proteins which contain a Krüppel-associated box domain
LAD         Lamina associated domains
LOCK        Large Organized Chromatin K (Lysine) modifications
LSC         Leukemic stem cells
MDS         Myelodysplastic syndrome
mESC        Mouse embryonic stem cell
MNAse       Micrococcal nuclease
PCR         Polymerase chain reaction
PVDF        Polyvinylidene difluoride
qPCR        Quantitative polymerase chain reaction
RT-qPCR     Reverse transcriptase quantitative polymerase chain reaction
SNV         Single nucleotide variants


ACKNOWLEDGEMENTS

I would like to thank Sergei for his guidance and enthusiasm throughout my time at Hershey. In addition, a heartfelt thank you to Drs. Genya Popova, Sarah Bronson, David Claxton, Ralph Keil and my committee members Drs. Kristin Eckert and Gregory Yochum, who provided much needed support and guidance. Many thanks to Dr. James Broach and the Biochemistry and Molecular Biology Department, as well as a Penn State Hershey Clinical and Translational Sciences Institute grant, for providing funding and enabling my graduate research to continue.

Thank you to the many graduate and medical students who touched my life, especially Jenna Craig Buckwalter, Caitlin Nealon and Heather Young Moyer. I am eternally grateful for such wonderful friends, whose warmth, understanding and humor kept me going through many long days and nights.

Finally, I would like to express the utmost gratitude and love to my parents for their continued support and constant faith in me, and to my dearest husband Joseph, who has been my most devoted and loving supporter through everything; without him none of this would have been possible.


Chapter 1

Introduction

1.1 Epigenetic mechanisms mediated by Histone H3 modifications

1.1.1 Regulation of heritable changes in chromatin state

Eukaryotic genomic DNA is packaged in the nucleus in the form of nucleosomes, each consisting of 147 base pairs (bp) of DNA wrapped around an octamer of histone proteins containing two copies each of H3, H4, H2A and H2B [Richmond and Davey, 2003; Li and Reinberg, 2011]. Nucleosomes constitute the first order of chromatin organization, which is further condensed by binding of the linker histone (H1 or H5) [Li and Reinberg, 2011]. Chromatin condensation and organization are further regulated by post-translational modifications of the N-terminal “tails” of the core histone proteins [Kouzarides, 2007; Li and Reinberg, 2011]. Chromatin states are regulated through a complex interplay of epigenetic mechanisms, including post-translational modifications to histone tails, DNA methylation and non-histone proteins that serve as chaperones and chromatin remodelers [Tachibana et al., 2002; Grigoryev et al., 2004; Kouzarides, 2007; Ernst et al., 2011; Chen et al., 2012]. Histone modifications are the principal regulators of chromatin dynamics, altering the epigenetic states of chromatin and the extent of chromatin compaction [Grigoryev et al., 2006; Kouzarides, 2007], which in turn alter DNA accessibility for gene regulation and repair [Timp and Feinberg, 2013; Burman et al., 2015].

Constitutive heterochromatin is enriched for repressive histone modifications like histone H3 trimethylation at lysine 9 (H3K9me3) and histone H4 trimethylation at lysine 20 (H4K20me3) and occurs at sites that are not transcriptionally active, such as highly repetitive sequences and gene deserts [Grigoryev, 2006; Krishnan, 2011]. Transcriptionally active euchromatin typically has elevated levels of histone H4 acetylation at lysine 8 and lysine 12 (H4K8Ac and H4K12Ac, respectively) [Grigoryev et al., 2004; Mirabella et al., 2016]. Facultative heterochromatin is established during cellular differentiation when euchromatin progressively acquires histone H3 K9 dimethylation (H3K9me2) and H3 K27 trimethylation (H3K27me3), which results in a more condensed chromatin state [Tachibana et al., 2002; Wen et al., 2009; Li and Reinberg, 2011; Chen et al., 2012; Schones et al., 2014]. Alterations in the type, extent and patterning of these histone modifications can result in large-scale remodeling of repressed or active chromatin [Tachibana et al., 2002; Grigoryev et al., 2004; Tachibana et al., 2005; Li and Reinberg, 2011].

1.1.2 Role of Histone H3 Lysine 9 dimethylation in cellular differentiation

In vertebrates, histone H3 K9me2 is the predominant marker of euchromatin in proliferating cells [Tachibana et al., 2002] and of chromatin condensation during terminal cell differentiation [Grigoryev et al., 2004; Popova et al., 2012]. Extensive Histone H3 K9 dimethylation domains, contiguous chromosomal regions with elevated levels of the repressive histone modification, arise during differentiation [Wen et al., 2009; Chen et al., 2012].

Establishment of facultative heterochromatin, driven primarily by the spread of H3K9me2, serves to repress blocks of genes, usually in a tissue-specific or developmentally-regulated manner [Wen et al., 2009; Chen et al., 2012]. This large-scale chromatin remodeling is essential during lineage commitment to specify gene expression patterns for specialized cell types [Wen et al., 2009; Liu et al., 2015].

1.1.3 The histone methyltransferases mediating histone H3K9 dimethylation

The conserved histone lysine methyltransferases (HKMTases) G9a (or EHMT2) and GLP (or EHMT1) are responsible for the majority of H3K9me2 across the genome and play a significant role in genome-wide epigenetic reprogramming [Tachibana et al., 2005; Wen et al., 2009; Liu et al., 2015]. G9a can function as a “reader” of epigenetic modifications, recognizing and binding unmethylated and monomethylated histone H3 at lysine 9, usually at or near CG sites [Chen et al., 2012; Schones et al., 2014]. As a “writer”, G9a can catalyze conversion of H3K9me1 to H3K9me2 and can recruit the HKMTases Suv39h1 (KMT1A) and SETDB1 (KMT1E) to catalyze conversion of H3K9me2 to H3K9me3 [Fritsch et al., 2010; Schones et al., 2014]. These modifications provide an interaction point for the heterochromatin-associated protein HP1, which promotes local heterochromatization and de novo DNA methylation, thereby regulating chromatin state and gene repression [Feldman et al., 2006; Chin et al., 2007; Shinkai and Tachibana, 2011; Shankar et al., 2013]. G9a can form functional multimeric complexes with GLP, Suv39h1 and SETDB1 that are recruited to individual G9a target genes as well as major satellite repeats, known Suv39h1 genomic targets [Fritsch et al., 2010]. The associations among G9a, H3K9me2 and H3K9me3, HP1 and DNA methylation are thought to constitute a positive feedback mechanism for spreading and maintaining repressive domains, wherein G9a directs methylation of H3K9 on adjacent nucleosomes [Chin et al., 2007; Collins and Chen, 2010; Shinkai and Tachibana, 2011; Krishnan et al., 2011]. The HKMTases G9a, GLP and SETDB1, together with the HP1 protein family, have been implicated as the major regulators of reestablishing pluripotency during reprogramming [Sridharan et al., 2013].

G9a specifically is necessary for embryonic stem cell (ESC) pluripotency, hematopoiesis and early mammalian development [Tachibana et al., 2002; Tachibana et al., 2004; Feldman et al., 2006; Chen et al., 2012; Liu et al., 2015]. Knockdown of G9a results in reduced ESC growth and embryonic lethality in mice [Tachibana et al., 2002; Liu et al., 2015]. While inhibiting G9a in human CD34+ hematopoietic stem cells promoted stem cell-like morphology and function in culture, it also induced up-regulation of gene clusters associated with committed hematopoietic progenitors and tissues like brain and liver [Chen et al., 2012].

Multiple studies have provided links between G9a and GLP and the progressive establishment of large blocks of H3K9me2 during development [Tachibana et al., 2002; Wen et al., 2009; Chen et al., 2012]. Global analysis of chromatin modifications in mouse embryonic stem cells (mESCs) identified distinct genomic domains that acquire high levels of H3K9me2 during differentiation, which the researchers termed LOCKs (Large Organized Chromatin K (Lysine) modifications) [Wen et al., 2009]. Using knockout cells, G9a was shown to be responsible for up to 90% of H3K9me2 domains established during mESC differentiation in vitro [Wen et al., 2009]. These regions tend to be coincident with lamina-associated domains (LADs), heterochromatic regions that associate with the proteins of the nuclear lamina, two features which are inversely related to gene expression [Yokochi et al., 2009; Timp and Feinberg, 2013]. In accordance with the known role of H3K9me2 in establishing facultative heterochromatin, these domains show cell- and tissue-specific patterns and result in silencing of functionally related gene clusters [Wen et al., 2009; Yokochi et al., 2009; Chen et al., 2012]. Together, these studies indicate an extensive role for G9a in genome-wide epigenetic remodeling and developmental regulation of a wide range of tissue-specific genes.

1.2 Epigenetic dysregulation in Acute Myeloid Leukemia

1.2.1 Evidence of broad disruption of epigenetic mechanisms in AML

Acute leukemias are malignant clonal disorders of hematopoietic development, characterized by stalled maturation and abnormal differentiation of hematopoietic progenitor cells (HPCs), at the expense of mature blood cells [Shih and Levine, 2011]. In acute myeloid leukemia (AML), myeloid progenitor cells are arrested at various stages of differentiation and show dramatically increased capacity for self-renewal and proliferation, leading to hypercellularity of the bone marrow and low peripheral white blood cell count [Shih and Levine, 2011; Amente et al., 2013; Corces-Zimmerman et al., 2014].

AML shows marked heterogeneity in genetics, disease presentation and clinical outcome [Borate et al., 2012; Patel et al., 2012; Uy et al., 2016]. In addition to “founding” mutations that initiate oncogenesis, serially acquired mutations in hematopoietic stem and progenitor cells (HSPCs) may give rise to so-called leukemic stem cells (LSCs) [Mochmann et al., 2013; Corces-Zimmerman et al., 2014]. It has been suggested that these malignant and multipotent LSCs contribute not only to leukemic initiation and pathogenesis, but also to relapse and chemotherapy resistance in some patients [Mochmann et al., 2013; Corces-Zimmerman et al., 2014; Cai et al., 2015; Uy et al., 2016]. Recently, repeated sequencing of bone marrow samples from patients with AML and myelodysplastic syndromes (MDS) identified leukemic sub-clones with very rare mutations (defined as one heterozygous mutant in 2000 non-mutant cells), which persisted through treatment and later expanded, causing relapse [Uy et al., 2016].

Many of the recurrent mutations and chromosomal translocations seen in AML affect genes directly related to epigenetic regulation, including chromatin modifiers and DNA-methylation-associated genes, suggesting that broad disruption of epigenetic regulation is a primary component in the initiation and pathogenesis of AML [Shih and Levine, 2011; Yoshimi and Kurokawa, 2011; Genomic and epigenomic landscapes, 2013; Hu and Shilatifard, 2016]. For example, overexpression of one transcription factor, ERG, in a leukemic cell line resulted in down-regulation of many genes associated with chromatin remodeling and DNA repair, an increase in DNA damage and resistance to apoptosis [Mochmann et al., 2013]. High ERG expression in leukemia patients is associated with a poor prognostic outcome [Bock et al., 2013; Turskey et al., 2016], and overexpression of ERG in cell models mimics the expansion and molecular profile seen in certain leukemias, and is sufficient to establish drug resistance [Bock et al., 2013; Mochmann et al., 2013; Turskey et al., 2016].

ERG, ETS1 and FLI1 are all ETS transcription factor family members that have been linked to hematopoietic differentiation and oncogenesis in cell and mouse models [Martens, 2011; Mochmann et al., 2013]. Both ERG and FLI1 were shown to promote the expression and functional interactions of the AML-associated oncogene and oncofusion protein RUNX1/AML1 (formerly called AML1-ETO), which is generated by the t(8;21) translocation in about 15% of AML cases [Martens et al., 2012; Chen et al., 2015]. One study singled out histone H3 acetylation as the primary epigenetic marker associated with ERG and FLI1 binding, and the authors suggest that histone deacetylase activity at ERG- and FLI1-bound sites may be a useful target in certain AMLs [Martens et al., 2012].

Two histone lysine demethylases (KDMs) that target H3K9me1/2 have been implicated in these oncogenic pathways as well. The KDM JMJD2A (or KDM4A) was identified as a direct interacting partner with ERG, driving aberrant gene regulation in prostate cancer [Kim et al., 2015]. JMJD1C is required for RUNX1 to drive cellular proliferation in AML cell lines, and is recruited directly to target genes by RUNX1/AML1, where it promotes a transcriptionally permissive chromatin state by maintaining low H3K9me2 levels [Chen et al., 2015]. The JMJD2 family of KDMs is essential for ESC self-renewal and transcriptional capability by preventing H3K9me3 (and H3K36me3) accumulation, thereby maintaining chromatin boundaries [Pedersen et al., 2016]. Similarly, the KDM LSD1 (KDM1A) is essential for hematopoietic cell renewal and early development, and loss of LSD1 severely inhibits hematopoietic differentiation, in part by de-repression of key genes associated with primitive hematopoietic stem and progenitor cells [Kerenyi et al., 2013]. LSD1 has been implicated in leukemogenesis and identified as a potential therapeutic target in acute promyelocytic leukemia (APL) [Borate, 2012]. APL is characterized by the t(15;17)(q22;q12) fusion of the promyelocytic leukemia (PML) and retinoic acid receptor-α (RARA) genes. In APL, all-trans-retinoic acid (ATRA) therapy is used to drive differentiation of leukemic blasts, which, in conjunction with chemotherapy, yields an 80% cure rate [Borate, 2012]. Although ATRA fails to drive transcriptional activation of retinoic acid receptor (RAR) target genes in non-APL AML, Schenk et al. showed that inhibiting LSD1 in addition to ATRA therapy enabled ATRA-driven differentiation in leukemic cell lines and significantly limited the engraftment of primary human AML cells in immunodeficient NOD-SCID mice [Schenk et al., 2012]. LSD1 was shown to have an essential oncogenic function in both human and murine MLL-AML, where a translocation of the H3K4 methyltransferase MLL gene results in aberrant transcription of MLL target genes [Harris et al., 2012]. LSD1 inhibition or knockdown again promoted differentiation of leukemic cells and abrogated the clonogenic capacity of leukemic stem cells [Harris et al., 2012].

The HMTase G9a was shown to promote rapid proliferation of myeloid progenitor cells, in part by interacting with the transcription factor HoxA9, which has established roles in normal HSPCs as well as in AML, where HoxA9-regulated gene expression contributes to enhanced proliferation and impaired differentiation [Lehnertz et al., 2014]. Reducing G9a, by knockout or small-molecule inhibitor treatment, slowed cell proliferation and reduced the LSC population in both a mouse model and human AML cells [Lehnertz et al., 2014]. G9a has also been implicated in contributing to leukemia development by transcriptionally repressing UHRF1 (ubiquitin-like with PHD and ring finger domains 1), which is itself an epigenetic regulator that recognizes histone modifications and DNA methylation, and has roles in maintenance of DNA methylation and regulating cell cycle progression by ubiquitination of H3K23 [Kim et al., 2015].

1.2.2 H3K9me2 genomic domains and oncogenesis

In addition to the regulatory role of H3K9me2 domains during lineage commitment and cellular differentiation, loss or alteration of this large-scale regulatory mechanism has been linked to cellular transformation and oncogenesis [Wen et al., 2009; McDonald et al., 2011]. In cancer cell lines, including hepatocarcinoma, cervical, lymphoma and leukemia, H3K9me2 domains were found to be drastically reduced in both size and number, compared to normal cells or tissues [Wen et al., 2009]. Likewise, there were particular genomic sites, common across various types of cancer, where the patterns of H3K9me2 enrichment were dramatically altered, relative to normal cells and tissues [Wen et al., 2009; Wen et al., 2012].

The epithelial-to-mesenchymal transition (EMT) is a remarkable example of cellular plasticity, where cells undergo reversible morphological and functional changes to gain more stem cell-like characteristics. The changes seen in EMT have roles in normal development and injury repair, but EMT has also been linked to malignant transformation and may contribute to development of cancer “stem” cells in certain cancer types [McDonald et al., 2011]. In mouse ESCs undergoing EMT in vitro, genome coverage by H3K9me2 was reduced and H3K4me3, a modification associated with active euchromatin, concomitantly increased, resulting in expression of genes previously associated with EMT and malignancy [McDonald et al., 2011]. Interestingly, chromatin reprogramming during EMT is largely specific to H3K9me2 domains and was shown to be LSD1-dependent [McDonald et al., 2011]. Differentiating stem cells and cells undergoing EMT go through widespread, highly dynamic epigenetic reprogramming, and H3K9me2 domains may represent a common mechanism for both normal and aberrant epigenetic changes [Amente et al., 2013; McDonald et al., 2011].

1.3 Characterization of H3K9me2 domains in primary AML cells, CD34+ cells and granulocytes

This thesis discusses figures, results and conclusions that are included in the publication: Rearrangement of large blocks of facultative heterochromatin in acute myeloid leukemia is linked to genetic instability and silencing of key genes regulating stem cell maintenance (PLOS One, 12(3), March 2017). Authors were Anna Salzberg*, Abigail Harris Becker*, Evgenya Popova, Nikki Keasey, Thomas Loughran, David Claxton and Sergei Grigoryev (*co-first authors). Section 1.3 discusses Figures 1-4, 6, 7 and Supplemental Figure F of Salzberg et al. Of the work presented in this section, Abigail Harris Becker (AHB) contributed significantly to the ChIP-sequencing data (analysis shown in thesis Figures 1-4) and conducted G9a inhibition studies in K562 cells (thesis Figure 6, discussed in greater detail in Chapter 4). Thesis Chapter 2 presents methods and results generated by AHB. Thesis Chapters 3 and 4 present results generated by AHB and included in Supplemental Figure G and Figure 7 of Salzberg et al., although thesis Figures 8 and 10 are substantially expanded from published Supplemental Figure Ga and Figure 7a, respectively.

The H3K9me2 modification drives the transition from active euchromatin to repressed heterochromatin during differentiation of hematopoietic and other stem cells [Tachibana et al., 2005; Chen et al., 2012; Liu et al., 2015], and altered H3K9me2 domains have been linked to oncogenesis [Wen et al., 2009; McDonald et al., 2011; Wen et al., 2012]. We asked whether changes in H3K9me2 domains could be a potential mechanism for some of the epigenetic and gene expression changes seen in leukemic cells. To address this question, we conducted genome-wide analysis of H3K9me2 distribution patterns by Chromatin Immunoprecipitation followed by massively parallel sequencing (ChIP-seq), in primary AML patient samples, human CD34+ hematopoietic progenitors, human mature granulocytes and the leukemic cell line K562.

Using the RSEG computational method, which finds epigenomic domain boundaries based on hidden Markov model (HMM) analysis of ChIP-seq read distribution [Song and Smith, 2011], we identified distinct genomic regions characterized by H3K9me2 levels that were significantly higher than the genome-wide average (at least 2-fold). Unlike the even distribution seen with the total (“input”) DNA reads (Figure 1, e), the H3K9me2 distribution shows marked variation over different genomic regions (Figure 1, a). Comparisons between cell types revealed H3K9me2 domains that are conserved and common to all cell types evaluated, and domains that showed variable (higher or lower) levels of H3K9me2 in different cell types or developmental stages. This analysis confirmed the conserved H3K9me2 LOCKs as previously described [Wen et al., 2009], including a conserved H3K9me2 domain at a locus on chromosome 5 (Figure 1, red box) overlying a gene-poor region (Figure 1, bottom panel). In addition, this analysis identified single gene loci such as TERT (Figure 1, arrows) and more extended domains containing gene clusters where H3K9me2 levels are increased or decreased in the AML samples evaluated (Figure 4).


Figure 1: H3K9me2 domain in granulocytes, CD34+ progenitors and K562 cells. DNA sequence reads mapped to the genome, at 1 Mb resolution (a) and 10 kb resolution (b-g), from granulocyte H3K9me2 ChIP-seq (a, b), granulocyte input (total) DNA (e), granulocyte total histone H3 (H3 C-tail, f) and granulocyte H3K4me2 (g); H3K9me2 ChIP-seq for CD34+ (c) and K562 (d) cells; genes within the locus are shown at the very bottom. Note the even distribution of reads over input DNA (e) and total histone H3 (f), in contrast to the sharp peaks for H3K4me2 ChIP (g) and the more diffuse marks seen with H3K9me2 ChIP (b, c, d). The H3K9me2 domain or LOCK (red box in b, c, d) clearly targets a gene-poor region, while the TERT gene (arrows) shows varying H3K9me2 levels between samples at a single locus. The y-axes depict the fold enrichment of sequence reads over the genome-wide average.
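The fold-enrichment scale used in Figure 1 and the 2-fold criterion mentioned above can be illustrated with a minimal sketch. This is not the RSEG method, which fits a hidden Markov model to the read distribution. It is a simple threshold on binned read counts, and the function names, the toy counts and the `min_bins` parameter are assumptions made only for illustration.

```python
from typing import List, Tuple


def fold_enrichment(bin_counts: List[int]) -> List[float]:
    """Divide each bin's read count by the mean count over all bins."""
    mean = sum(bin_counts) / len(bin_counts)
    if mean == 0:
        raise ValueError("no reads in any bin")
    return [count / mean for count in bin_counts]


def call_domains(enrichment: List[float],
                 threshold: float = 2.0,
                 min_bins: int = 2) -> List[Tuple[int, int]]:
    """Return half-open (start, end) bin ranges whose enrichment stays at or
    above `threshold` for at least `min_bins` consecutive bins.

    Simplified stand-in for HMM-based segmentation (assumption, not RSEG).
    """
    domains = []
    start = None
    for i, value in enumerate(enrichment):
        if value >= threshold:
            if start is None:
                start = i  # a candidate domain opens here
        else:
            if start is not None and i - start >= min_bins:
                domains.append((start, i))
            start = None
    # Close a domain that runs to the end of the region.
    if start is not None and len(enrichment) - start >= min_bins:
        domains.append((start, len(enrichment)))
    return domains


# Toy read counts per bin (hypothetical values, not thesis data).
counts = [10, 12, 9, 80, 90, 85, 11, 8, 70, 10]
enrichment = fold_enrichment(counts)
domains = call_domains(enrichment)
print(domains)
```

In this toy example only bins 3 to 5 exceed twice the mean for two or more consecutive bins. The isolated elevated bin near the end falls below the 2-fold cutoff and is not called.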

1.3.1 Identification of distinguishing characteristics associated with H3K9me2 domains

The next step was to investigate whether the H3K9me2 domains were co-localized with the standard chromatin states [Ernst et al., 2011] and other relevant human genomic features previously identified as part of the ENCODE Project. Correlation analysis of the H3K9me2 domains, in granulocytes (GC), CD34+ and K562 cells, and biodata for K562 cells available from human genome databases [Ernst et al., 2011; ENCODE Project Consortium, 2012] showed positive correlations with general heterochromatin (Figure 2, far left) and the repressive marks H3K9me3 and H3K27me3 (Figure 2, labeled arrows). In line with the established role of H3K9me2 in repressed chromatin, there were negative correlations with features of actively transcribed DNA including the activating mark H3K4me2, promoters, enhancers and gene density. Additionally, the analysis showed positive correlations between the H3K9me2 domain regions and LADs, single-nucleotide variants (SNVs) associated with cancer and AML-associated SNVs (Figure 2, 3rd and 4th from left), consistent with the model wherein condensed heterochromatin may promote DNA breaks or mutations, and also limit access of DNA repair machinery [Timp and Feinberg, 2013; Burman et al., 2015].


Figure 2: Genome-wide correlation analysis of H3K9me2 domains in granulocytes, CD34+ and K562 cells. From far left: H3K9me2 in T lymphocytes [Barski et al., 2007], general heterochromatin, cancer SNVs, AML-associated SNVs and LADs, all show positive correlations in granulocytes (top panel), CD34+ (middle panel) and K562 cells (bottom panel). “Discriminatory” datasets, H3K9me2, H3K27me3, “repressed” and “insulator” (arrows), show divergence between the cell types shown here.

Notably, H3K9me2 in CD34+ and K562 cells showed positive correlations with the chromatin states defined as “repressed” and “insulator” [Ernst et al., 2011], while granulocytes showed minimal or negative correlations, respectively (Figure 2, labeled arrows). The constitutive heterochromatin mark H3K9me3 was strongly correlated with H3K9me2 in K562 cells, but showed only a weak positive correlation in granulocytes and CD34+ cells (Figure 2, labeled arrow).

The H3K9me2 ChIP-seq and correlation analyses provide additional evidence that H3K9me2 domains represent a genome-wide mechanism, which includes the domains conserved across cell types (“Red” domains) as well as developmentally regulated domains, with distinct profiles associated with other genomic features and functions, in line with previous reports [Wen et al., 2009; McDonald et al., 2011; Wen et al., 2012].
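To make the sign of these correlations concrete, the comparison between a domain track and another genomic feature can be sketched as a per-bin correlation. The statistic used for Figure 2 is not specified in this section, so the Pearson coefficient below is an assumption. The tracks (`domain`, `lad`, `gene_density`) are hypothetical toy vectors, not thesis data.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length tracks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("tracks must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / math.sqrt(var_x * var_y)


# One value per genomic bin (hypothetical toy data).
domain = [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]        # 1 = bin lies inside an H3K9me2 domain
lad = [0, 1, 1, 1, 1, 0, 0, 1, 0, 0]           # 1 = bin overlaps a lamina-associated domain
gene_density = [5, 4, 0, 1, 0, 6, 5, 1, 0, 4]  # genes per bin

r_lad = pearson(domain, lad)             # positive: features tend to coincide
r_genes = pearson(domain, gene_density)  # negative: domains sit in gene-poor bins
print(round(r_lad, 3), round(r_genes, 3))
```

With binary tracks the Pearson coefficient reduces to the phi coefficient, so the same function covers both presence/absence features such as LADs and quantitative features such as gene density.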

Similar analysis was performed for the 10 AML patient peripheral mononuclear blood samples, and overall the H3K9me2 HMM domains in each of the AML samples showed strong positive correlation with the domains in the two normal cell types evaluated, CD34+ progenitors and mature granulocytes. Genome-wide correlation analysis showed that the comparisons that were “discriminatory” between granulocytes, CD34+ and K562 cells, namely the “insulator”, “repressed” and heterochromatin states, showed marked differences between certain AML samples as well (Figure 3 A). These correlations, together with hierarchical cluster analysis of H3K9me2 domain distribution (Figure 3 B), suggested that the AML patient samples fell into two distinct groups. The group termed ‘AML type A’ had a higher level of correlation between H3K9me2, repressed chromatin and insulators, in a pattern similar to CD34+ cells, and contained AML cases with less differentiated phenotype (data not shown, see Suppl. Table 2S in Salzberg et al.). The other samples, termed ‘AML type B’, had a correlation pattern more similar to K562 cells and granulocytes (looking at biodata sets for repressed chromatin, heterochromatin and insulators). In addition, while there were only 3 samples that fell into the AML type B group, all showed more differentiated, monocytic features than the other 7 samples included in this evaluation.


Figure 3: H3K9me2 domain correlation analysis separates AML samples into two epigenetically distinct clusters. A: Correlations between H3K9me2 HMM domains and selected discriminatory datasets (“insulator”, “repressed” and heterochromatin), for granulocytes, CD34+ progenitors, 10 primary AML patient samples and K562 cells. B: Hierarchical cluster analysis of genome-wide H3K9me2 distribution in K562, granulocytes (female, F, and male, M, donors), CD34+ progenitor cells and 10 primary AML patient samples.

1.3.2 Deregulation of H3K9me2 domains and gene expression in AML

As previously reported [Wen et al., 2009; Chen et al., 2012; Wen et al., 2012], particular H3K9me2 domains showed notable differences between developmental stages (e.g. CD34+ progenitors vs. mature granulocytes), which were termed differential LOCKs (dLOCKs).

Interestingly, ChIP-seq HMM analysis also identified H3K9me2 domains with drastic differences between AML and other cell types, suggesting that these domains play a role in both developmental and leukemia-associated epigenetic regulation.

A map of a 10 kb region on chromosome 19 (50-59.12 MB) shows two H3K9me2 domains (Figure 4 A, red dashed boxes) with higher H3K9me2 in granulocytes (A 1) than in CD34+ progenitors (A 2) and two representative AML samples (A 3 and 4). While H3K9me2 levels are lower in the AML samples compared to granulocytes, the pattern of H3K9me2 distribution is distinctly similar in the AML type A sample and CD34+ cells (Figure 4 A, panels 3 and 2, respectively), and the AML type B sample shows similar H3K9me2 coverage within the regions to the coverage in granulocytes, albeit at a lower level (Figure 4 A, panels 4 and 1, respectively). The similarity of the input DNA maps (Figure 4 A, panels 5-7) is in marked contrast to the differences in H3K9me2 coverage and levels seen between sample types (i.e. CD34+ progenitors vs. mature granulocytes, AML vs. normal CD34+ cells or granulocytes, AML type A vs. AML type B, Figure 4 A panels 1-4). While an unsupervised cluster analysis of H3K9me2 at the chromosome 19 locus (50-59.12 MB) again suggests two distinct groupings of the AML samples (Figure 4 B), protein analysis by western blot showed no significant differences in total H3K9me2 or total histone H3 between AML type A and AML type B (see Chapter 3, Figure 8).

Figure 4: Genome-wide analysis of H3K9me2 domains reveals sites recurrently deregulated in AML. A: Ten kb region on chromosome 19 containing two H3K9me2 domains (red dashed boxes at 19q13.43 and 19q13.41), showing higher H3K9me2 in granulocytes (1) than CD34+ progenitors (2) and AML (3, 4) (dLOCK gran>AML). Panels 1-4 show H3K9me2 ChIP DNA and panels 5, 6 and 7 show input DNA maps for granulocyte and 2 AML samples, respectively. B: Hierarchical cluster analysis of H3K9me2 distribution at chromosome 19 (50-59.12 MB) locus in K562, granulocytes, CD34+ progenitor cells and 10 primary AML patient samples.

Taken together, these results indicate that the variance in regions designated as dLOCKs reflects specific, genome-wide changes in H3K9me2, and that the variance in H3K9me2 domains seen in AML type A and AML type B samples is not simply due to quantitative differences in global H3K9me2 but rather to a change in their predominant association from temporarily repressed chromatin (H3K27me3) to constitutive heterochromatin (H3K9me3).

Six comparison categories were used to further classify the differential H3K9me2 domains (dLOCKs): CD34+ > granulocyte (GC); GC > CD34+; CD34+ > AML A; AML A > CD34+; GC > AML A; AML A > GC. These categories were used to identify the dLOCKs showing the most significant changes associated with developmental and leukemic states. The thresholds for significant dLOCKs were set at the 95th and 5th percentiles (alpha = 0.95 and 0.05, respectively). (Note: further analysis focused primarily on AML samples designated as ‘AML type A’, with genome-wide correlation patterns more similar to CD34+ cells, because the majority of the primary patient samples evaluated fell into this category.)
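The percentile thresholding described above can be illustrated with a minimal Python sketch. It assumes that each domain in a given comparison has a single difference score (for example, a log ratio of H3K9me2 signal between two cell types); the score definition, the function name `select_dlocks` and the domain names are hypothetical, as the statistic actually used is not specified here.

```python
from statistics import quantiles

def select_dlocks(scores, upper_pct=95, lower_pct=5):
    """Return (gained, lost) domain names beyond the percentile cut-offs.

    scores: dict mapping a (hypothetical) domain name to a difference score
            for one comparison, e.g. CD34+ vs. AML A.
    """
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99
    cuts = quantiles(scores.values(), n=100, method="inclusive")
    upper_cut = cuts[upper_pct - 1]
    lower_cut = cuts[lower_pct - 1]
    gained = sorted(d for d, s in scores.items() if s > upper_cut)
    lost = sorted(d for d, s in scores.items() if s < lower_cut)
    return gained, lost

# Hypothetical scores for 101 domains (values 0..100), for illustration only
example_scores = {f"domain_{i}": float(i) for i in range(101)}
gained, lost = select_dlocks(example_scores)
```

Applying the same function to each of the six comparisons would yield one pair of domain lists per comparison.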

Focusing on the developmental and leukemia-associated H3K9me2 HMM domains classified as described above, genes from these regions were subjected to Ingenuity® Pathway Analysis, which indicated that all dLOCKs had a moderate (up to 2.7-fold) but significant (p < 10⁻⁶ to p < 10⁻¹⁵) enrichment for AML-associated genes (see Salzberg et al., 2017, Supplemental Materials). The AML-depleted H3K9me2 domains (lower H3K9me2 in AML than in CD34+ cells or GCs; dLOCK CD34+ > AML A and dLOCK GC > AML A) were associated with clusters of repeated genes, including many ZNF (zinc finger or KRAB-ZNF, zinc finger proteins which contain a Krüppel-associated box, or C2H2, domain) genes, a large family of repressive transcription regulators, many of which are clustered on chromosome 19 (Figure 4), and clusters of protocadherin genes on chromosome 5. Interestingly, the domains with increased H3K9me2 in AML (dLOCK AML A > CD34+ and dLOCK AML A > GC) contained genes associated with hematopoietic development and myeloid leukemia, including ERG, RUNX1, ETS2 and MECOM.

Upstream transcriptional regulators were also identified using Ingenuity® Pathway Analysis (Figure 5 A), and the key regulator ERG was associated with AML-enriched H3K9me2 domains (dLOCK AML A > CD34+ and dLOCK AML A > GC). For the AML-depleted H3K9me2 domains (dLOCK CD34+ > AML A and dLOCK GC > AML A), two transcription factors associated with chromosome domain boundaries were identified: CTCF, a zinc finger domain-containing architectural protein, and RAD21 (cohesin). TRIM28 (KAP1) is the primary co-repressor for the KRAB-ZNF genes and was associated with dLOCKs enriched in granulocytes.

Microarray gene expression data from AML type A samples and CD34+ cells were analyzed using the Broad Institute Gene Set Enrichment Analysis tool (GSEA) [Subramanian et al., 2005] and Ingenuity® Pathway Analysis, which highlighted differences in the population of AML-activated genes in dLOCK CD34+ > AML A and the AML-repressed genes enriched in dLOCK AML A > CD34+ (Figure 5 B-E). Taken together, these results suggest that changes in H3K9me2 domains (i.e. loss or gain of H3K9me2, rearrangement in domain coverage) not only regulate the individual underlying genes in some cases, but also affect specific transcriptional regulatory pathways such as those controlled by CTCF/RAD21 (cohesin), TRIM28 and ERG.


Figure 5: Gene regulation within AML-associated H3K9me2 domains: A: Top upstream transcriptional regulators associated with genes within the six dLOCK comparisons (a>0.95; a<0.05) shown at top. B, C: Gene set enrichment analysis (Subramanian et al., 2005) shows association of H3K9me2 dLOCK CD34+>AML (B) and dLOCK AML>CD34+ (C) with genes up- and down-regulated in AML. D, E: Top upstream regulators associated with the AML-expressed genes in dLOCK CD34+>AML (D) and AML-repressed genes in dLOCK AML>CD34+ (E).

1.3.3 Inhibiting the histone methyltransferases G9a and GLP in K562 cells

As the HKMTases G9a and GLP have been linked to gene regulation in AML [Lehnertz et al., 2014; Kim et al., 2015], and inhibition of G9a slowed leukemic progression [Lehnertz et al., 2014], the effect of pharmacological inhibition of G9a in K562 cells was evaluated by ChIP-seq and Reverse Transcription quantitative PCR (RT-qPCR). K562 cells were treated with UNC0638, a small molecule inhibitor of G9a/GLP [Vedadi et al., 2011], continuously for 5 days (see Chapter 4, Figure 10). Genome-wide correlation analysis revealed that, in contrast to untreated K562 cells, G9a inhibition resulted in negative correlations between H3K9me2 domains and key genomic features including insulators, the “repressed” dataset and H3K27me3, and a decrease in the positive correlations with heterochromatin and H3K9me3 (Figure 6 A). Correlation patterns in G9a-inhibited K562 cells closely resembled those in mature granulocytes and AML ‘type B’ samples, in sharp contrast to the similarities between untreated K562 cells and CD34+ cells. Interestingly, the top upstream transcriptional regulators included TRIM28 for dLOCKs UNC0638 > K562 and ERG for dLOCKs K562 > UNC0638 (Figure 6 B), both of which also featured in the primary AML cells (Figure 5). Further evaluation of the effects of G9a inhibition in K562 cells is discussed in greater detail in Chapter 4.


Figure 6: Analysis of K562 cells following G9a inhibition with the small molecule UNC0638: A: Correlation analysis of H3K9me2 domains in K562 cells following UNC0638 treatment (1 uM for 5 days) and untreated control K562 cells, vs. selected discriminatory datasets (insulator, repressed, H3K27me3, heterochromatin and H3K9me3). B: Ingenuity pathway analysis of upstream regulators in K562 cells with and without UNC0638 treatment.


Chapter 2

Materials and Methods

2.1 Cells and Cell culture

2.1.1 Human blood samples

AML patient samples were collected with informed consent for the performance of diagnostic tests under the guidelines and procedures approved by the institutional review board of Milton Hershey Medical Center (STUDY00002518). Blood samples were collected and processed as described in [Salzberg, Harris Becker, et al. 2017].

2.1.2 Bone marrow CD34+ cells

Cryopreserved bone marrow CD34+ cells from normal donors were obtained from AllCells (Alameda, CA).

2.1.3 K562 cells

Human K562 cells (ATCC CCL-243) were grown in RPMI 1640 GlutaMAX medium (Gibco, Thermo Fisher Scientific) supplemented with 10% FBS (HyClone) and 1% penicillin-streptomycin (Gibco, Thermo Fisher Scientific), in 5% CO2 at 37°C. Cells were replated every 2 to 5 days at 1:5 to 1:10 dilution.

2.1.4 G9a/GLP inhibition in K562 cells

The small molecule G9a inhibitor UNC0638 (Sigma) (Vedadi et al., 2011) was dissolved at 10 mM in DMSO under sterile conditions. K562 cells were treated with 0-1 µM UNC0638 with 0.01% (vol/vol) DMSO for 5 days; growth medium was replaced with fresh medium containing UNC0638 or DMSO only on day 3. Cells were incubated with UNC0638 at the concentrations indicated; when not specified, the K562 cells were treated with 1 µM UNC0638. Three biological replicates were performed, using independently cultured K562 cells.
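As a worked check of the dilution arithmetic, the short Python sketch below computes the fold dilution and the DMSO carried over when a 10 mM stock is diluted to a 1 µM working concentration. The function name `dilution` is hypothetical and the calculation assumes the stock is diluted directly into growth medium.

```python
def dilution(stock_mM, final_uM):
    """Return (fold dilution, vehicle % vol/vol) for a DMSO stock diluted into medium."""
    stock_uM = stock_mM * 1000.0      # convert mM to uM
    fold = stock_uM / final_uM        # 10 mM -> 1 uM is a 10,000-fold dilution
    vehicle_pct = 100.0 / fold        # DMSO carried over, in % (vol/vol)
    return fold, vehicle_pct

fold, dmso_pct = dilution(stock_mM=10, final_uM=1)
print(f"{fold:.0f}-fold dilution, {dmso_pct:.2f}% DMSO")
```

Under this assumption a 1 µM treatment carries 0.01% (vol/vol) DMSO, matching the vehicle concentration stated above; lower doses would carry proportionally less DMSO unless vehicle is added separately.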

2.2 Antibodies

Antibodies used for chromatin immunoprecipitation (ChIP) and western blotting analysis were anti-H3K9me2 (ab1220), anti-H3K9me3 (ab8898) and anti-H3 C-tail (ab8898) from Abcam, and anti-H3K4me2 (07–030) from Millipore. These antibodies were evaluated previously for ChIP-seq and were used successfully for immunocytochemical analysis, western blotting, and ChIP in previous work with myeloid cells [Popova et al. 2012]. See the standard Grigoryev lab ChIP-seq protocol in the Appendix for additional information on the amount of antibody used per ChIP.

2.3 SDS-PAGE and Western blot

Protein separation by SDS-PAGE was carried out in 15% polyacrylamide and the proteins were transferred to Immobilon-P polyvinylidene difluoride (PVDF) membrane (Millipore). For immunoblotting, the blocking buffer (also used for antibody dilution) was tris-buffered saline (TBS) containing 3% nonfat dry milk and 0.1% Tween 20 (Fisher Scientific). The primary antibodies used were anti-H3K9me2 (ab1220), anti-H3K9me3 (ab8898), and anti-H3 C-tail (ab8898), which was used to detect total histone H3 as a loading control. Appropriate secondary antibodies conjugated to horseradish peroxidase (HRP) were from Jackson ImmunoResearch Laboratories. Semi-quantitative analysis of relative protein levels was performed using the chemiluminescence detection reagent kit ECL Prime (Amersham, GE Healthcare) and photographic film. Protein bands were quantified using ImageJ [Rasband, https://imagej.nih.gov/ij/] and the adjusted relative density (I) was calculated for AML samples relative to K562.

2.4 Chromatin immunoprecipitation

Nuclei isolation, chromatin preparation and chromatin immunoprecipitation were performed as previously described (Popova et al. 2012; Salzberg, Harris Becker et al. 2017). A summary of the protocols performed by Abigail Harris Becker to generate the data included in this thesis is provided here.

2.4.1 Nuclei isolation

Cultured or freshly thawed cells were fixed with 1% formaldehyde (Fisher Scientific); fixation was stopped by adding glycine (Fisher Scientific), and cells were centrifuged at +4°C and washed with PBS. Nuclei were isolated from formaldehyde-fixed cells as previously described [Popova et al. 2012]. In brief, after cell lysis and disruption using a Dounce homogenizer, the cell homogenate was layered on top of a 1.2 M sucrose “cushion” and centrifuged for 35 min. at 4700 x g, 4°C. The supernatant contained the less dense cell debris, and the pelleted nuclei were resuspended in micrococcal nuclease digestion buffer (MNase digest buffer: 50 mM Tris-HCl, pH 7.6; 3 mM CaCl2).

The nuclear suspension was digested with micrococcal nuclease (Roche) for 20-40 min. at +37°C to DNA fragments of ~200 bp (nucleosome size), and digestion was stopped by the addition of EDTA. MNase digests chromatin into nucleosome-sized (~150 bp) fragments. The digested material was centrifuged to collect the nuclear digest pellet, which was resuspended in SDS-containing lysis buffer (1% SDS; 10 mM EDTA; 50 mM Tris-HCl pH 8.0) and treated with light sonication (Fisher model 100, 2 x 10 sec. at setting 2) to enhance solubility but not shear DNA.

2.4.2 Immunoprecipitation

For a typical chromatin immunoprecipitation (ChIP) procedure, the MNase-digested nuclear suspension was centrifuged (14,000 rpm for 5 min.) and the supernatant was diluted 1 to 10 with dilution buffer (0.01% SDS; 1.1% Triton X-100; 1.2 mM EDTA; 16.7 mM Tris-HCl pH 8.0; 167 mM NaCl). For each ChIP, antibody (2-5 µg), 3 µL of 1X protease inhibitor cocktail (Sigma, P2714) and 3 µL of 100 mM of the serine protease inhibitor PMSF (Thermo Fisher Scientific) were added to the diluted nuclear digest suspension and incubated with rotation at +4°C overnight. At the same time, protein A Sepharose beads (Sigma, P9424) were washed twice with Washing buffer (9:1 = Dilution buffer : Lysis buffer), then resuspended in Washing buffer with salmon sperm DNA (0.5 mg/ml final, Thermo Fisher Scientific) and BSA (100 µg/ml final, New England Biolabs) and incubated with rotation at +4°C overnight. On the next day, the beads were washed twice with Dilution buffer and resuspended in Washing buffer. The bead slurry was divided between the IPs prepared the previous day and rotated at 4°C for 2 hours.

The IPs were centrifuged at 2000 rpm for 5 min, the unbound fraction was discarded and the protein A Sepharose beads were resuspended in 1 ml Low Salt buffer (0.1% SDS; 1% Triton X-100; 2 mM EDTA; 20 mM Tris-HCl pH 8.0; 150 mM NaCl). The protein A Sepharose beads were washed 4 times with Low Salt buffer and once with High Salt buffer (0.1% SDS; 1% Triton X-100; 2 mM EDTA; 20 mM Tris-HCl pH 8.0; 500 mM NaCl), and collected by gentle centrifugation after each wash. The IP material was eluted from the protein A Sepharose beads with Elution buffer (1% SDS, 0.1 M NaHCO3) during a 10 min. incubation with rotation at room temperature. The liquid supernatant was collected by centrifugation and designated as “IP CHIP”. The starting nuclear suspension was diluted 6-fold in Elution buffer and designated as “IP Input”, representing the non-immunoprecipitated total chromatin. The IP CHIP and IP Input were incubated with RNase A (Thermo Fisher Scientific) and Proteinase K (Roche), both at 0.5 mg/ml final concentration, for 30 min at 37°C. The formaldehyde cross-linking was reversed by incubating the IP preparations at 65°C for a minimum of 6 hours.

Immunoprecipitated DNA (or total DNA in the case of IP Input) was extracted using Phase Lock tubes (5 PRIME) with two extractions using phenol/chloroform/isoamyl alcohol (25:24:1; Fisher Scientific), followed by one extraction with chloroform/isoamyl alcohol (24:1; Fisher Scientific). DNA was precipitated using 100% ethanol (4.5X the sample volume) with glycogen (40 µg/ml final; Sigma) and sodium acetate (60 mM final, pH 5.5; Fisher Scientific), and washed twice with 70% ethanol. The final DNA was dissolved in 50-100 µl DNase-free water. DNA concentration was measured using a Qubit fluorometer and the dsDNA High Sensitivity quantitation assay (Thermo Fisher Scientific).

2.5 Library preparation and Next-Generation sequencing

Libraries prepared by AHB for Illumina sequencing used 10 ng or less of ChIP or Input DNA, following the manufacturer’s protocol (NEBNext ChIP-Seq Library Prep Reagent Set for Illumina, New England Biolabs), using AMPure XP magnetic beads (Beckman Coulter) for DNA fragment clean-up. Quality control of the final libraries was conducted using an Agilent Bioanalyzer (see the standard Grigoryev lab ChIP-seq protocol in the Appendix).

Genomic DNA sequencing was performed at the Pennsylvania State University (University Park) Genomics Core Facility, with special thanks to Dr. Craig Praul.

2.6 Quantitative PCR

2.6.1 qPCR Instrumentation

All quantitative PCR (qPCR) was performed using FastStart Universal SYBR Green Master (Roche) on a Roche LightCycler® 480 Instrument following the manufacturer’s instructions (LC480® Instrument Manual, Software v1.5), using the cycling program shown in Table 1.

Table 1: Quantitative PCR cycling

| Step | Analysis Mode | Acquisition | Temp | Time | Ramp rate |
|---|---|---|---|---|---|
| 1. Pre-Incubation | None | None | 95°C | 10 min | 4.8 °C/s |
| 2. Amplification - 45 cycles | Quantification | | | | |
| i. Denaturation | | None | 95°C | 15 sec | 4.8 °C/s |
| ii. Annealing | | None | 55°C | 30 sec | 2.5 °C/s |
| iii. Extension | | Single | 72°C | 30 sec | 4.8 °C/s |
| 3. Melting Curve | Melting Curves | | | | |
| i. | | None | 95°C | 10 sec | 4.8 °C/s |
| ii. | | None | 45°C | 60 sec | 2.5 °C/s |
| iii. | | Continuous | 97°C | - | |
| 4. Hold | None | | 4°C | | 2 °C/s |

2.6.2 Primer sequences

Custom primers were designed using Primer3 [Rozen and Skaletsky, 1998], NCBI Primer-BLAST [Ye et al. 2012] and the UCSC Genome Browser (hg19 assembly, Feb. 2009 (GRCh37/hg19)) [Kent et al. 2002]. With the exception of the GAPDH and ERG primers (Qiagen), all custom primer oligos were prepared by Integrated DNA Technologies (IDT). Primer sequences are listed in Table 2.


Table 2: Primer sequences

| Gene | Forward primer | Reverse primer |
|---|---|---|
| ChIP-qPCR primers | | |
| NLRP11-4 | CGCTCTATACGACACCAGGAG | GAATTGCTCCCTTGCCATTG |
| ZNF274_A | TGTCTTGCTCTGCTTTTGACTTAC | CCTGGTTCAGTGTGAGGACC |
| ZNF274_B | GAAGATGGAAGCCTGAGTGC | TATAACGGAACTGCCGGAAC |
| ZNF544 | ATCTGTGTGCTTCGAGGATGT | GAGACAATATGCTCCCAGGTCT |
| TRIM28* | AACATTGCAGAAGAGCACCAA | AGGTCAGGCTAGGTAGGGTCTT |
| MECOM | GAACCATCTGAAGCAGGTCTTG | GGCAGTAGGAGTAGAGCCAGTG |
| ETS1 | CAAGATCCTTTTAGGCCAAGC | TTCTGGATAGGCTGGGTTGA |
| ERG | GTTCTCTCCAGGGCACTCATC | CCAGGTGACAGGCGACAC |
| CDH1 | GTGAACCCTCAGCCAATCAG | TCACAGGTGCTTTGCAGTTC |
| RT-qPCR primers | | |
| ZNF274 | CTGGGTTTTACCCCGGAAG | GCCAGAAATCTTCTGCCTCCT |
| ZNF544 | CTGAGGACCTCTGCCCTCTA | GCCATAGCCACATCCTCGAA |
| TRIM28* | AACATTGCAGAAGAGCACCAA | AACATTGCAGAAGAGCACCAA |
| MECOM | AGTGCCCTGGAGATGAGTTG | TTTGAGGCTATCTGTGAAGTGC |
| ETS1 | AGAAGTCGTCACCCCAGACA | GGGTGAGGCGATCACAACTA |
| CDH1 | TGGAACAGGGACACTTCTGC | TTCTTGGGTTGGGTCGTTGT |
| ERG | Qiagen cat# QT00074193 | sequence not available |
| GAPDH | Qiagen cat# PPH00150F | sequence not available |

* TRIM28 ChIP and RT-qPCR primer pairs share the same forward primer

All PCR primer pairs were initially screened by running PCR products on agarose gels to establish a range for optimal melting temperature (Tm) and confirm the presence of a single product produced by the amplification reaction. By constraining primer oligo length and complexity it was possible to design primers such that all final primer pairs had similar or overlapping Tm ranges. As the annealing temperature for a given primer pair is approximately 5-10°C below the Tm, this permitted the use of the same qPCR thermal cycling program with a 55°C annealing temperature (Table 1) for all primer pairs used with all ChIP- and RT-qPCR templates.

To validate the performance of primer pairs as qPCR assays, qPCR was performed using four 4-fold dilutions of an appropriate template with each primer pair (gDNA for ChIP-qPCR primers and cDNA for RT-qPCR primers). Standard curves were generated by plotting the Ct values versus the log of template input (Figure 7 A) and performing a linear regression to confirm linearity of the standard curve data (desired R² > 0.980, Figure 7 B) [Hellemans et al., 2007; Bustin et al., 2009]. In addition to evaluating the sensitivity and detection limit (working template range, in ng of template per reaction) of a given primer pair, the slope of the linear regression of the standard curve can be used to calculate the amplification efficiency of the reaction using the equation:

$$\text{Efficiency} = 10^{(-1/\text{slope})} - 1$$

A slope between -3.9 and -3.0 (corresponding to approximately 80-115% efficiency) is considered acceptable; a slope of -3.33 indicates that the PCR reaction is 100% efficient ($10^{(-1/-3.33)} - 1 \approx 1.00$, i.e. 100%) and that the amount of product doubles with each amplification cycle [Hellemans et al., 2007].
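The standard curve calculation can be illustrated with a minimal Python sketch that fits Ct against the log10 of template input by least squares and converts the slope to an amplification efficiency. The function names, the dilution series and the Ct values are hypothetical and are not data from this thesis; the LightCycler software performs its own fit.

```python
import math

def standard_curve(inputs_ng, ct_values):
    """Least-squares fit of Ct against log10(template input).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(v) for v in inputs_ng]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in ct_values)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

def efficiency(slope):
    """Amplification efficiency from the standard curve slope: 10^(-1/slope) - 1."""
    return 10 ** (-1.0 / slope) - 1.0

# Hypothetical four-point, 4-fold dilution series (ng per reaction).
# With perfect doubling, each 4-fold dilution adds exactly 2 cycles.
inputs = [20.0, 5.0, 1.25, 0.3125]
cts = [20.0, 22.0, 24.0, 26.0]
slope, intercept, r2 = standard_curve(inputs, cts)
print(f"slope={slope:.2f}, efficiency={efficiency(slope):.0%}, R2={r2:.3f}")
```

In this idealized series the fitted slope is close to -3.32 and the efficiency is 100%; real assays would be accepted or rejected against the slope and R² criteria stated above.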

A melt program (Table 1) was used to generate the melting temperature profile (“melt peaks”) of the amplified DNA; a single sharp melt peak indicates that a single specific product was produced by the reaction (Figure 7 C). The standard curve and melt peak analysis was performed in 3 replicates for each primer pair; representative data are shown.


Figure 7: qPCR assay validation using standard curve and melt peak analysis. A: Standard curves generated for 3 custom primer pairs using cDNA template. B: Table showing target and calculated slope, efficiency and R2 values. C: Representative melt peak as depicted by Roche LightCycler 480 Instrument Software, a sharp single peak indicates that a single specific product was produced by the amplification reaction.

2.6.3 ChIP-qPCR

ChIP-qPCR assays were performed with 2 ng of ChIP or Input DNA per reaction, with triplicate reactions, and two technical replicates were run for each ChIP or Input DNA. The fold enrichment of ChIP over IP Input was calculated as 2 to the power of the Ct difference between IP Input chromatin and ChIP samples:

$$\text{Fold enrichment} = 2^{(Ct_{\text{IP Input}} - Ct_{\text{ChIP}})}$$
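A minimal Python sketch of the fold enrichment calculation follows. The normalization to the average fold enrichment across all target sites is taken from the legends of Figures 9 and 11; the function names, locus names and Ct values are hypothetical and serve only as an illustration.

```python
from statistics import mean

def fold_enrichment(ct_input, ct_chip):
    """Fold enrichment of ChIP over Input: 2^(Ct Input - Ct ChIP)."""
    return 2.0 ** (ct_input - ct_chip)

def normalized_enrichment(ct_pairs):
    """Normalize each locus to the average fold enrichment across all loci.

    ct_pairs: dict mapping locus name -> (mean Ct Input, mean Ct ChIP)
    """
    raw = {locus: fold_enrichment(ct_in, ct_ip)
           for locus, (ct_in, ct_ip) in ct_pairs.items()}
    average = mean(list(raw.values()))
    return {locus: value / average for locus, value in raw.items()}

# Hypothetical mean Ct values for two loci (not data from this thesis)
example = {"locus_high": (25.0, 23.0), "locus_low": (25.0, 25.0)}
print(normalized_enrichment(example))
```

A locus whose ChIP Ct is two cycles lower than its Input Ct has a raw fold enrichment of 4, whereas equal Ct values give a fold enrichment of 1.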

2.6.4 RNA extraction and mRNA expression analysis

Total RNA was extracted from cells using Trizol (Thermo Fisher Scientific) and the Direct-zol RNA MiniPrep kit (Zymo Research) and quantified using a NanoDrop spectrophotometer (Thermo Fisher Scientific). cDNA was synthesized using the SuperScript II First-Strand Synthesis System and oligo(dT)12-18 (Thermo Fisher Scientific). Reactions were performed using the equivalent of 20 ng cDNA per reaction (assuming 100% conversion of RNA to cDNA). Triplicate reactions for each target were performed, with two technical replicates for each sample. The ΔΔCt method (Livak and Schmittgen, 2001) was used to determine changes in gene expression:

$$\text{Fold change} = \frac{2^{(Ct_{\text{Housekeeping}} - Ct_{\text{Target}})_{\text{Test sample}}}}{2^{(Ct_{\text{Housekeeping}} - Ct_{\text{Target}})_{\text{Reference sample}}}}$$

Two housekeeping genes, GAPDH and TBP, were used to calculate relative target gene mRNA levels, with similar results; data relative to GAPDH is shown. Fold change in gene expression was calculated relative to DMSO control-treated K562 cells. Student’s T-Test was used to evaluate the statistical significance of RT-qPCR gene expression analysis in two independent biological replicates.
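The relative expression calculation can be sketched in Python as follows, using the standard form of the ΔΔCt method (Livak and Schmittgen, 2001). The function names and Ct values are hypothetical; in this work the reference sample would correspond to DMSO control-treated K562 cells and the housekeeping gene to GAPDH.

```python
def delta_ct(ct_target, ct_housekeeping):
    """Delta Ct: target Ct normalized to the housekeeping gene Ct."""
    return ct_target - ct_housekeeping

def fold_change(test, reference):
    """Fold change by the delta-delta-Ct method: 2^-(dCt test - dCt reference).

    test, reference: (Ct target, Ct housekeeping) tuples
    """
    ddct = delta_ct(*test) - delta_ct(*reference)
    return 2.0 ** (-ddct)

# Hypothetical Ct values (not data from this thesis): the target Ct is
# two cycles lower in the treated sample at equal housekeeping Ct.
treated = (24.0, 18.0)
control = (26.0, 18.0)
print(fold_change(treated, control))
```

With these example values the target amplifies two cycles earlier in the treated sample, which corresponds to a 4-fold increase in expression relative to the control.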

Chapter 3

Domain-specific changes in H3K9 dimethylation in AML patient samples

Data presented in this chapter, in particular Figure 9, were included in the paper Salzberg, Harris Becker, et al., 2017. Of the work and data included in this chapter, Abigail Harris Becker was responsible for conducting protein analysis by Western blot, designing the primers used for qPCR, performing and analyzing qPCR work, and preparation of most libraries for ChIP-seq. ChIP-seq analysis included in Figure 9 was conducted by Sergei Grigoryev.

ChIP-seq HMM analysis identified H3K9me2 domains with drastic differences between AML and other cell types (“dLOCKs”, Figure 4 A, red dashed boxes), suggesting that these domains play a role in both developmental and leukemia-associated epigenetic regulation. The data suggested that the changes identified by ChIP-seq HMM analysis stemmed from distinct, site-specific changes in H3K9me2 levels, rather than differences in total DNA, as input DNA maps of normal cells and the AML samples showed no differences (Figure 4 A, panels 5-7). To support this, we evaluated total H3K9 methylation by immunoblot. Additionally, site-specific methylation was examined by a second method, ChIP quantitative PCR (ChIP-qPCR) targeting selected sites within dLOCKs, and ChIP-qPCR results were highly consistent with the ChIP-seq HMM analysis.

3.1 Global H3 Lysine 9 methylation in AML

Cell lysate was prepared from freshly thawed unfixed cells and used to prepare two immunoblots in parallel (Figure 8). Using antibodies that can distinguish H3K9me2 (α-H3K9me2, Figure 8 E) and H3K9me3 (α-H3K9me3, Figure 8 F), we compared overall levels of H3K9 methylation in AML patient samples and K562 cells. To indicate relative amounts of total histone H3 in each sample and serve as a loading control, blots were stripped and re-probed with an antibody against the unmodified C-terminus of histone H3 (α-H3 C-tail, Figure 8 G and H). This measure of global H3K9 methylation (shown as the amount of protein present in the AML samples relative to K562 in Figure 8 I) showed no notable differences in the relative levels of the H3K9me2, H3K9me3 and unmodified H3 C-tail epitopes across AML type A and AML type B samples.


Figure 8: Analysis of global histone H3K9 methylation in AML samples and K562 cells. Cell lysate from unfixed cells was prepared and analyzed in parallel. A-B: Coomassie stained polyacrylamide gels after protein transfer. C-D: Ponceau stained PVDF blot membrane. E-H: Immunoblots detecting H3K9me2 (E), H3K9me3 (F) and total histone H3 (G and H, blots stripped and re-probed for the unmodified C-terminus of histone H3) in 3 AML type A samples (8, 11, 9) and 2 AML type B samples (13, 14) and K562 cells. Protein bands were quantified using ImageJ and the adjusted relative density (I) was calculated for AML samples relative to K562. Representative immunoblots and quantification shown.

3.2 Evaluating H3K9me2 levels at selected dLOCKs by ChIP-qPCR

Quantitative PCR (qPCR) was performed using primers for selected loci within H3K9me2 domains, representing different types of LOCKs with constitutive high or low, or variable H3K9me2 levels. Targeted loci included conserved domains, where H3K9me2 levels were either high (NLRP11-4) or low (TRIM28) in all of the samples evaluated, and “differential LOCKs” (dLOCKs), regions where H3K9me2 levels tend to differ between the majority of the AML samples evaluated and granulocytes or CD34+ cells (as representative non-cancerous cell types). Genes within dLOCKs were evaluated at a region on chromosome 19 with low H3K9me2 in AML samples (dLOCK CD34+ > AML; ZNF274, ZNF544) and 2 regions where H3K9me2 levels are higher in AML samples than in granulocytes (dLOCKs AML > CD34+; MECOM, ETS1). H3K9me2 ChIP DNA (Figure 9, solid bars) from mature granulocytes (Figure 9 A), AML type A samples (Figure 9 B through E) and AML type B samples (Figure 9 F and G) was used for ChIP-qPCR.


Figure 9: Comparing ChIP-seq and ChIP-qPCR analysis in AML samples and granulocytes. ChIP-seq data was used to calculate a gene-average for H3K9me2 levels at the loci indicated, representing dLOCK CD34+ > AML (ZNF274, ZNF544), conserved H3K9me2 domains (high H3K9me2 at NLRP4, low at TRIM28), dLOCK AML > CD34+ (MECOM, ETS1), for granulocytes (A), AML type A (B-E) and AML type B (G, H). For ChIP-qPCR data (solid bars), the fold enrichment of each locus was calculated as 2^(Ct Input – Ct ChIP), and values were normalized to the average fold enrichment across all target sites evaluated. Values depicted are the fold enrichment of H3K9me2 ChIP DNA over Input DNA. ChIP-qPCR assays were performed in triplicate with 2 ng DNA per reaction and two replicates were run for each ChIP or Input DNA. Error bars represent standard deviation.

As with ChIP-seq, for each sample (e.g. granulocyte), ChIP-qPCR was performed on both H3K9me2 ChIP DNA and Input DNA (total DNA, without immunoprecipitation), and the data are shown as the enrichment of H3K9me2 ChIP DNA over Input DNA. To corroborate the ChIP-seq HMM analysis, gene-averages of H3K9me2 levels were calculated from ChIP-seq data, and the gene-average H3K9me2 levels were compared to the ChIP-qPCR data (Figure 9, white bars). The ChIP-qPCR results are highly consistent with the ChIP-seq HMM analysis that was used to identify and define H3K9me2 domains.

Chapter 4

Domain-specific inhibition of G9a/GLP in K562 cells and coordinate effects on H3K9me2 domains and gene expression

The data presented in this chapter, specifically portions of Figure 10 (E and G) and all of Figure 11, were included in the paper Salzberg, Harris Becker et al., 2017. Of the work and data included in this chapter, Abigail Harris Becker was responsible for culturing K562 cells, optimizing conditions for G9a inhibition, conducting protein analysis by Western blot, designing the primers used for qPCR, performing and analyzing qPCR, and preparation of some libraries for ChIP-seq.

Differential computational analysis of ChIP-seq data identified H3K9me2-enriched domains that differed (dLOCKs) in K562 cells with and without inhibition of the histone lysine-specific methyltransferase (HKMTase) G9a/GLP. There was a strong overlap between domains where inhibition of G9a reduced H3K9me2 levels and domains with reduced H3K9me2 in AML compared to mature granulocytes. We analyzed K562 cells following pharmacological inhibition of G9a to evaluate coincident changes in H3K9me2 and gene regulation.

4.1 Global H3K9 di- and tri-methylation in K562 cells after pharmacological inhibition of G9a

Since G9a/GLP is responsible for H3K9 dimethylation and, to a lesser extent, trimethylation, it was necessary to evaluate the effects of UNC0638 treatment on both H3K9me2 and H3K9me3 levels in K562 cells. Myeloid leukemia-derived K562 cells were treated continuously with UNC0638, a well-characterized small molecule which has been shown to inhibit both G9a and GLP in a substrate-competitive manner [Vedadi et al., 2011]; media with fresh UNC0638 or DMSO control was replaced every 3 days. Whole cell lysate was boiled for 10 minutes in buffer containing SDS, then analyzed by SDS-PAGE. Antibodies that can distinguish H3K9me2 (α-H3K9me2) and H3K9me3 (α-H3K9me3) were used to detect H3K9 methylation levels in cell lysate prepared from UNC0638-treated or untreated K562 cells (Figure 10) by Western blotting. To indicate the amount of total histone H3 in each sample and serve as a loading control, blots were stripped and re-probed with an antibody against the unmodified C-terminus of histone H3 (α-H3 C-tail). Immunoblot analysis of 2 independent experiments was performed; representative images are shown.


Figure 10: Treatment with UNC0638 results in selective and reversible reduction of global histone H3K9me2 but not H3K9me3 in K562 cells. Western blot detection of total H3K9 di- and tri-methylation in untreated K562 cells and K562 cells following treatment with varying amounts of UNC0638 or DMSO vehicle control, as indicated. Cell lysate from unfixed cells was run on two polyacrylamide gels in parallel. A-B: Coomassie stained polyacrylamide gels after protein transfer. C-D: Ponceau stained PVDF membrane. E-H: Immunoblots showing H3K9me2 (E), H3K9me3 (F) and total histone H3 as indicated (G-H: blots from E and F were stripped and re-probed for the unmodified C-terminus of histone H3). Cells were collected after 5 days of UNC0638 treatment (lanes 1 through 5), 5 days of treatment followed by 2 days in culture in the absence of UNC0638 (lane 7), or after 7 days of continuous treatment with UNC0638 (lane 8) or DMSO control (lane 6).

K562 cells were treated with increasing concentrations of UNC0638 (0.25, 0.5 or 1 µM UNC0638, or DMSO vehicle control) and collected after 5 days (Figure 10 E-H, lanes 1 through 5). As expected, increasing concentrations of UNC0638 reduced total H3K9me2 without reducing H3K9me3. In cells treated for 5 days and then maintained in culture for another two days in the absence of UNC0638 (Figure 10 E and G, lane 7), H3K9me2 levels began to rebound (Figure 10 E and G, compare lane 7 with lanes 4 and 8, which received continuous 0.5 µM UNC0638 treatment for 5 and 7 days, respectively; lane 6 shows 7 days in culture with DMSO control only). These data show that UNC0638 treatment reduces H3K9me2 in K562 cells in an H3K9me2-specific, dose-dependent and reversible manner. Based on this analysis, the UNC0638 treatment used for the majority of the work described here was 1 µM UNC0638 (unless otherwise specified) for 5 days, with fresh UNC0638-containing medium every other day.

4.2 Quantitative PCR evaluation of H3K9me2 levels and gene expression following G9a inhibition

As described in Chapter 3, qPCR was performed using primers for targeted loci within H3K9me2 domains, including conserved LOCKs and select differential LOCKs containing genes in the two regulatory pathways identified by the Ingenuity® Pathway Analysis (as discussed in Chapter 1.3.2; gene loci are specified below). ChIP-qPCR was performed on H3K9me2 ChIP DNA and Input DNA (total DNA, no IP) for UNC0638-treated K562 cells (1 µM UNC0638 for 5 days) and DMSO control-treated K562 cells; the data are shown as the enrichment of H3K9me2 ChIP DNA over Input DNA (Figure 11 A). Gene expression analysis was conducted on RNA samples prepared from UNC0638-treated K562 cells (0.5 and 1 µM, 5 days) and DMSO vehicle-treated K562 cells, using RT-qPCR primer assays for the same loci as for ChIP-qPCR (Figure 11 B).
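As an illustration of the ChIP-qPCR quantification summarized in the Figure 11 legend, the following Python sketch computes a per-locus ChIP/Input ratio from Ct values and rescales it to the mean across loci. The function names and the Ct values are hypothetical. The sketch assumes 100% amplification efficiency and the conventional orientation ΔCt = Ct(ChIP) - Ct(Input), so that a locus amplifying earlier in the ChIP sample than in Input scores above 1; it is not the analysis script used for this work.

```python
from statistics import mean


def fold_enrichment(ct_chip: float, ct_input: float) -> float:
    """ChIP/Input ratio for one locus, assuming perfect doubling per cycle."""
    # Assumed orientation: delta_ct is negative when the locus is enriched in ChIP.
    delta_ct = ct_chip - ct_input
    return 2.0 ** (-delta_ct)


def normalize_to_mean(enrichment: dict[str, float]) -> dict[str, float]:
    """Divide each locus by the average fold enrichment across all loci."""
    average = mean(enrichment.values())
    return {locus: value / average for locus, value in enrichment.items()}


# Hypothetical mean Ct values (technical replicates already averaged).
ct_values = {
    "locus_high": {"chip": 24.0, "input": 26.0},
    "locus_low": {"chip": 27.0, "input": 26.0},
}

raw = {name: fold_enrichment(ct["chip"], ct["input"]) for name, ct in ct_values.items()}
normalized = normalize_to_mean(raw)

for name in raw:
    print(f"{name}: raw={raw[name]:.2f} normalized={normalized[name]:.2f}")
```

After normalization the values average to 1 across the loci, so a value above 1 marks a locus with higher-than-average H3K9me2 enrichment within that sample.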


Figure 11: ChIP- and RT-qPCR analysis of K562 cells following G9a inhibition. A: ChIP-qPCR quantification of H3K9me2 levels at selected loci, representing constitutive H3K9me2 domains (H3K9me2 high: NLRP11-4, H3K9me2 low: TRIM28), dLOCK CD34+ > AML (ZNF274, ZNF544), dLOCK AML > CD34+ (MECOM, ETS1, ERG) and dLOCK AML > GC (CDH1) for untreated (control) K562 cells and K562 cells treated for 5 days with 1 µM UNC0638. For ChIP-qPCR data, the fold enrichment of each locus was calculated as 2^-(Ct Input - Ct ChIP), and values were normalized to the average fold enrichment across all target sites. Values depicted are the fold enrichment of H3K9me2 ChIP DNA over Input DNA. ChIP-qPCR assays were performed in triplicate with 2 ng DNA per reaction, and two technical replicates were run for each ChIP or Input DNA. Error bars represent standard deviation. B: RT-qPCR gene expression quantification at the same loci as in A, for DMSO vehicle-treated (control) K562 cells and K562 cells treated for 5 days with 0.5 or 1 µM UNC0638, as indicated. RT-qPCR data were normalized to GAPDH, and fold change in gene expression was calculated relative to DMSO control-treated K562 cells. RT-qPCR assays were performed in triplicate with 20 ng DNA per reaction. Student’s t-test was used to evaluate the statistical significance of RT-qPCR gene expression analysis in two independent biological replicates.
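Normalization of RT-qPCR data to a reference gene (here GAPDH) and to a control condition (here DMSO-treated cells) is commonly done with the 2^-ΔΔCt method [Livak and Schmittgen, 2001]. The Python sketch below assumes that method and equal amplification efficiency for target and reference assays; the function name and the Ct values are hypothetical.

```python
def ddct_fold_change(
    ct_target_treated: float,
    ct_ref_treated: float,
    ct_target_control: float,
    ct_ref_control: float,
) -> float:
    """Relative expression of a target gene in treated vs. control cells (2^-ddCt)."""
    # Step 1: normalize the target gene to the reference gene within each sample.
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    # Step 2: express the treated sample relative to the control sample.
    delta_delta = delta_treated - delta_control
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: the target amplifies one cycle earlier after treatment,
# while the reference gene is unchanged.
fold = ddct_fold_change(
    ct_target_treated=24.0,
    ct_ref_treated=18.0,
    ct_target_control=25.0,
    ct_ref_control=18.0,
)
print(f"fold change vs. control: {fold:.2f}")
```

A value of 1 means no change relative to the control; values above 1 indicate increased expression after treatment.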

ZNF genes were the most represented genes in H3K9me2 dLOCKs depleted in AML. However, G9a inhibition in K562 cells increased rather than reduced H3K9me2 levels and did not impact gene expression at these loci (ZNF274, ZNF544), consistent with previous reports [Chen et al., 2012]. In UNC0638-treated K562 cells, two genes associated with the ERG regulatory pathway, ERG and ETS1, showed decreased H3K9me2 and increased gene expression relative to untreated K562 cells. The qPCR data independently show that G9a inhibition by UNC0638 results in changes in H3K9me2 and gene expression that are highly locus-specific. In addition, these data are in agreement with the ChIP-seq data and HMM analysis. Our studies determined that there are H3K9me2 LOCKs where either G9a-dependent or G9a-independent transcriptional regulation predominates.

Chapter 5

Discussion

Altered epigenetic regulation is increasingly recognized as a significant factor in the pathogenesis of leukemia and other cancers, driving cancer initiation, progression and resistance to therapy [Yoshimi and Kurokawa, 2011; Timp and Feinberg, 2013; Corces-Zimmerman et al., 2014]. Many genetic and epigenetic factors have been implicated in AML, and information on epigenetic mechanisms in oncogenesis is fast emerging [Yoshimi and Kurokawa, 2011; Borate et al., 2014; Cai et al., 2015; Mirabella et al., 2016; Hu and Shilatifard, 2016]. Our work focused on dysregulation of the large-scale epigenetic changes mediated by dimethylation of histone H3 at lysine 9 (H3K9me2), a repressive histone modification associated with gene silencing. The H3K9me2 modification mediates the transition from transcriptionally active euchromatin to repressed heterochromatin during differentiation of hematopoietic and other stem cells [Tachibana et al., 2005; Chen et al., 2012; Liu et al., 2015]. This large-scale chromatin remodeling is essential during lineage commitment to restrict the totipotent genome and specify gene expression patterns for specific cell types [Wen et al., 2009; Liu et al., 2015].

Alterations in H3K9me2 domains have been linked to cellular transformation in leukemic and other cancer cell lines [Wen et al., 2009; McDonald et al., 2011; Wen et al., 2012], but comprehensive examination of H3K9me2 domains in primary leukemic cells had not been previously reported.

We conducted genome-wide analysis of H3K9me2 distribution patterns by ChIP-seq in primary AML patient samples, human CD34+ hematopoietic progenitors, mature human granulocytes and the leukemic cell line K562. We identified distinct genomic regions marked by H3K9me2 levels that were significantly higher than the genome-wide average (defined as 2-fold over the genome-wide average). Some domains showed variation between cell types, including specific sites where H3K9me2 levels are recurrently depleted in AML. Analysis of genome-wide ChIP-seq profiling of H3K9me2 domains in hematopoietic cells and primary AML cells revealed novel information on the distinguishing characteristics, genomic features and gene content associated with these H3K9me2 domains.
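The 2-fold criterion quoted above can be illustrated with a simple thresholding pass over binned signal. The Python sketch below flags bins at or above twice the genome-wide mean and merges adjacent flagged bins into intervals. The function name, bin size and signal values are hypothetical, and this thresholding is only an illustration of the criterion, not the HMM-based domain analysis used in this work.

```python
def call_enriched_domains(
    signal: list[float], bin_size: int, fold: float = 2.0
) -> list[tuple[int, int]]:
    """Return (start, end) intervals whose bins are >= fold x the mean of all bins."""
    threshold = fold * (sum(signal) / len(signal))
    domains: list[tuple[int, int]] = []
    start = None  # index of the first bin of the domain currently being extended
    for index, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = index
        elif start is not None:
            domains.append((start * bin_size, index * bin_size))
            start = None
    if start is not None:  # close a domain that runs to the last bin
        domains.append((start * bin_size, len(signal) * bin_size))
    return domains


# Hypothetical normalized H3K9me2 signal in consecutive 1 kb bins.
bins = [1.0, 1.0, 10.0, 10.0, 1.0, 1.0, 1.0, 1.0]
domains = call_enriched_domains(bins, bin_size=1000)
print(domains)
```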

The domains showed a positive correlation with lamina-associated domains (LADs) and single nucleotide variants (SNVs) associated with AML and other cancers, which may indicate that more condensed heterochromatin limits access of DNA repair machinery [Timp and Feinberg, 2013; Burman et al., 2015; Supek and Lehner, 2015]. As shown recently, proto-oncogenes may “escape” insulated chromosomal neighborhoods and become activated oncogenes after microdeletions or other mutations eliminate or impact chromosomal boundary elements [Hnisz et al., 2016]. Notably, our studies showed that only AML-specific H3K9me2 domains (dLOCK AML A > CD34+) significantly overlapped with genes associated with chromosomal translocations in AML. Taken together, these observations suggest that aberrant spreading of repressive H3K9me2 inhibits DNA repair and that the resulting mutations may promote chromosomal translocations. Interestingly, Ingenuity® Pathway Analysis of genes within the AML-depleted H3K9me2 domains (dLOCK CD34+ > AML A and dLOCK GC > AML A) identified two transcription factors associated with chromosome domain boundaries, CTCF and RAD21 (Cohesin). This may indicate that depleted H3K9me2 domains in AML also contribute to chromosomal translocations by altering the expression or function of key chromosome boundary mediators.

In addition, pharmacological inhibition of G9a by UNC0638 in K562 cells provided evidence for the involvement of the HKMTases G9a/GLP in changes seen in H3K9me2 domains in AML cells, as dLOCKs enriched in AML (compared to mature granulocytes) tended to overlap with those reduced by UNC0638 treatment. We observed that pharmacological inhibition of G9a/GLP reduced H3K9me2 at certain loci in K562 cells, concurrent with a moderate increase in the expression of genes including ERG and ETS1 (Figure 11). The key regulator ERG was associated with AML-enriched H3K9me2 domains (dLOCK AML A > CD34+ and dLOCK AML A > GC), and these same domains contained genes associated with hematopoietic development and myeloid leukemias, including ERG, RUNX1, ETS2 and MECOM. ERG and other members of the ETS transcription factor family have been linked to hematopoietic differentiation and oncogenesis in cell and mouse models [Martens, 2011; Mochmann et al., 2013]. High expression of ERG has been linked to poor prognostic outcomes [Bock et al., 2013; Tursky et al., 2016], mimics the expansion and molecular fingerprint of certain leukemias, and is sufficient to establish drug resistance [Bock et al., 2013; Mochmann et al., 2013; Tursky et al., 2016]; a treatment that increases ERG expression is therefore liable to adversely affect the course of AML. However, UNC0638 treatment and genetic inactivation of G9a/GLP were shown to slow the progression of AML characteristics in human cells and a mouse model of leukemia [Lehnertz et al., 2014]. It is therefore likely that the role of G9a/GLP and associated H3K9me2 in AML is highly context dependent, and thus the success of potential treatments targeting them will necessarily be highly patient-specific.

From these studies we conclude that while most H3K9me2 domains do not undergo radical reprogramming in AML cells (e.g. from a repressive heterochromatin state to a transcriptionally active euchromatin state), there are significant changes in and rearrangements of H3K9me2 marks, which result in correlated changes in gene regulation. Results from our studies and others suggest that loss or dysregulation of this large-scale repression system may play a significant role in cellular transformation and oncogenesis. Our working model proposes that changes in H3K9me2 domains (i.e. loss or gain of H3K9me2, rearrangement in domain size or location) result in coordinated activation or silencing of gene clusters, which significantly contributes to the development of AML.

Future studies utilizing UNC0638 to inhibit G9a could be performed in AML cell model systems, primary CD34+ progenitors and leukemic cells. It would be of interest to confirm the inhibition of cell growth and clonal expansion, without increased apoptosis, reported in mouse cells and human AML cells [Lehnertz et al., 2014]. Subsequent research on G9a inhibition in leukemia has identified other G9a/GLP inhibitors with fewer cytotoxic effects in leukemia cell lines [Pappano et al., 2015], a novel mechanism of action [Kondengaden et al., 2016] or dual action against G9a and DNA methyltransferases (DNMTs) [San Jose-Eneriz et al., 2017].

The question remains whether the effects of G9a inhibition that we observed in our K562 model system (derived from chronic myelogenous leukemia, CML) will be replicated in an AML cell model system and in primary AML patient cells. Genetic inactivation of G9a/GLP, by siRNA or CRISPR/Cas9 methods, in a suitable AML model system is necessary to establish the role of G9a/GLP in the H3K9me2 domain changes suggested by the current work. As this study was limited by the number of patient samples, future evaluations should include a broader range of AML patient samples and normal hematopoietic progenitors. It would also be of interest to evaluate the expression and functionality of G9a/GLP, SETDB1, LSD1 and other key epigenetic regulators linked to leukemogenesis in the primary AML patient samples.

Finally, our studies indicate that while the majority of H3K9me2 LOCKs are regulated by G9a, there is a subset of G9a-independent LOCKs which tend to overlap with regions of highly repetitive gene families, including KRAB-ZNF genes, in accordance with previous reports [Wen et al., 2009]. Interestingly, TRIM28 (KAP1) is the primary co-repressor for the KRAB-ZNF genes [Schultz et al., 2002] and was identified as the top upstream transcriptional regulator associated with dLOCKs UNC0638 > K562. In addition, CTCF, one of the factors linked with the AML-depleted H3K9me2 domains (dLOCK CD34+ > AML A and dLOCK GC > AML A), is itself a zinc finger domain-containing architectural protein associated with chromosome domain boundaries. The HKMT SETDB1, which catalyzes H3K9 di- and trimethylation, is known to co-localize with TRIM28 and HP1 to silence KRAB-ZNF genes [Schultz et al., 2002; Frietze et al., 2010], which suggests that the G9a-independent H3K9me2 domains may be regulated by SETDB1. As KRAB-ZNFs and TRIM28 (KAP1) have been shown to act as long-distance transcriptional repressors capable of silencing promoters tens of kilobases (kb) away [Groner et al., 2010], the G9a-independent H3K9me2 domains represent another exciting avenue for future research. Moreover, SETDB1 has recently been implicated in oncogenesis, including silencing of tumor suppressor genes [Karanth et al., 2017], retrotransposon activation and immune system avoidance [Cuellar et al., 2017]. Verification and evaluation of SETDB1-dependent H3K9me2 domains will therefore be essential to understanding the role of dLOCKs in leukemogenesis.

References

Amente S, Lania L, Majello B. The histone LSD1 demethylase in stemness and cancer transcription programs. Biochimica et Biophysica Acta, v. 1829, n. 10, 2013.

Barski A, Cuddapah S, Cui K, Roh TY, et al. High-resolution profiling of histone methylations in the human genome. Cell, 129, 2007.

Bock J, Mochmann LH, Schlee C, et al. ERG transcriptional networks in primary Acute Leukemia cells implicate a role for ERG in deregulated kinase signaling. PLOS One, v. 8(1), 2013.

Borate U, Absher D, Erba HP, Pasche B. Potential of whole-genome sequencing to determine risk and personalizing therapy: focus on AML. Expert Reviews in Anticancer Therapy, 12(10), 2012.

Burman B, Zhang ZZ, Pegoraro G, Lieb JD, Misteli T. Histone modifications predispose genome regions to breakage and translocation. Genes and Development, v. 29, 2015.

Bustin SA, Benes V, Garson JA, et al. The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments. Clinical Chemistry, v. 55, n. 4, 2009.

Cai SF, Chen CW, Armstrong SA. Drugging Chromatin in Cancer: Recent Advances and Novel Approaches. Molecular Cell, 60, 2015.

Chen X, Skutt-Kakaria K, Davidson J, et al. G9a/GLP-dependent histone H3K9me2 patterning during human hematopoietic stem cell lineage commitment. Genes and Development, v. 26, n. 22, 2012.

Chen M, Zhu N, Liu X, et al. JMJD1C is required for the survival of acute myeloid leukemia by functioning as a coactivator for key transcription factors. Genes and Development, v. 29, 2015.

Chin HG, Esteve PO, Pradhan M et al. Automethylation of G9a and its implication in wider substrate specificity and HP1 binding. Nucleic Acids Research, v. 35, n. 21, 2007.

Collins R, Cheng X. A case study in cross-talk: the histone lysine methyltransferases G9a and GLP. Nucleic Acids Research, v. 38, n. 11, 2010.

Corces-Zimmerman MR, Hong WJ, Weissman IL, et al. Preleukemic mutations in human acute myeloid leukemia affect epigenetic regulators and persist in remission. Proceedings of the National Academy of Sciences, v. 111, n. 7, 2014.

Cuellar TL, Herzner AM, Zhang X, Goyal Y et al. Silencing of retrotransposons by SETDB1 inhibits the interferon response in acute myeloid leukemia. Journal of Cell Biology, 216(11), 2017

ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, v. 489, n. 7414, 2012.

Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature, v. 473, 2011.

Feldman N, Gerson A, Fang J, et al. G9a-mediated irreversible epigenetic inactivation of Oct-3/4 during early embryogenesis. Nature Cell Biology, v. 8, n. 2, 2006.

Frietze S, O’Geen H, Blahnik KR, Jin VX, et al. ZNF274 recruits the histone methyltransferase SETDB1 to the 3' ends of ZNF genes. PLoS One, v. 5, n. 12, 2010.

Fritsch L, Robin P, Mathieu JRR, Souidi M, et al. A subset of the Histone H3 Lysine 9 Methyltransferases Suv39h1, G9a, GLP, and SETDB1 Participate in a Multimeric Complex. Molecular Cell, 37, 2010.

Cancer Genome Atlas Research Network. Genomic and epigenomic landscapes of adult de novo acute myeloid leukemia. New England Journal of Medicine, v. 368, n. 22, 2013.

Grigoryev SA, Nikitina T, Pehrson JR, Singh PB, Woodcock CL. Dynamic relocation of epigenetic chromatin markers reveals an active role of constitutive heterochromatin in the transition from proliferation to quiescence. Journal of Cell Science, 117, 2004.

Grigoryev SA, Bulynko YA, Popova EY. The end adjusts the means: heterochromatin remodelling during terminal cell differentiation. Chromosome Research, v. 14, n. 1, 2006.

Groner AC, Meylan S, Ciuffi A, Zangger N et al. KRAB-zinc finger proteins and KAP1 can mediate long-range transcriptional repression through heterochromatin spreading. PLoS Genetics, v. 6, n. 3, Mar 2010.

Harris WJ, Huang X, Lynch JT, Spencer GJ, et al. The histone demethylase KDM1A sustains the oncogenic potential of MLL-AF9 leukemia stem cells. Cancer Cell, 21(4), 2012.

Hellemans J, Mortier G, De Paepe A, Speleman F and Vandesompele J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biology, 8:R19, 2007.

Hnisz D, Weintraub AS, Day DS, Valton AL, Bak RO et al. Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science, 351, 2016.

Hu D, Shilatifard A. Epigenetics of hematopoiesis and hematological malignancies. Genes and Development, v. 30, n. 18, 2016.

Karanth AV, Maniswami RR, Prashanth S, Govindaraj H et al. Emerging role of SETDB1 as a therapeutic target. Expert Opinion on Therapeutic Targets, 21(3), 2017.

Kerenyi MA, Shao Z, Hsu YJ, Guo G, et al. Histone demethylase LSD1 represses hematopoietic stem and progenitor cell signatures during blood cell maturation. Elife 2: e00633, 2013.

Kent WJ, Sugnet CW, Furey TS, et al. The human genome browser at UCSC. Genome Research, 12(6), 2002.

Kim TD, Shin S, Janknecht R. ETS transcription factor ERG cooperates with histone demethylase KDM4A. Oncology Reports, v. 35(6), 2015.

Kondengaden SM, Luo LF, Huang K, Zhu M, et al. Discovery of novel small molecule inhibitors of lysine methyltransferase G9a and their mechanism in leukemia cell lines. European Journal of Medicinal Chemistry, 122, 2016.

Kouzarides T. Chromatin modifications and their function. Cell 128: 693-705, 2007.

Krishnan S, Horowitz S, Trievel RC. Structure and function of histone H3 lysine 9 methyltransferases and demethylases. Chembiochem, v. 12, n. 2, 2011.

Lehnertz B, Pabst C, Su L, Miller M, et al. The methyltransferase G9a regulates HoxA9-dependent transcription in AML. Genes and Development, v. 28, n. 4, 2014.

Li G, Reinberg D. Chromatin higher-order structures and gene regulation. Current Opinion in Genetics and Development 21: 175-186, 2011.

Livak KJ, Schmittgen TD. Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta (CT)) Method. Methods, 25(4), 2001.

Liu N, Zhang Z, Wu H, et al. Recognition of H3K9 methylation by GLP is required for efficient establishment of H3K9 methylation, rapid gene repression and mouse viability. Genes and Development, v. 29, 2015.

Martens JH. Acute myeloid leukemia: A central role for the ETS factor ERG. Int J Biochem Cell Biol., 43(10), 2011.

Martens JH, Mandoli A, Simme F, et al. ERG and FLI1 binding sites demarcate targets for aberrant epigenetic regulation by AML1-ETO in acute myeloid leukemia. Blood, 120(19), 2012.

McDonald OG, Wu H, Timp W, Doi A, Feinberg AP. Genome-scale epigenetic reprogramming during epithelial-to-mesenchymal transition. Nat Struct Mol Biol, v. 18, n. 8, Aug 2011.

Mirabella AC, Foster BM, Bartke T. Chromatin deregulation in disease. Chromosoma, 125, 2016.

Mochmann LH, Neumann M, von der Heide EK, et al. ERG introduces a mesenchymal-like state associated with chemoresistance in leukemia cells. Oncotarget, v. 5, n. 2, 2013.

Pappano WN, Guo J, He Y, Ferguson D, et al. The Histone Methyltransferase Inhibitor A-366 Uncovers a Role for G9a/GLP in the Epigenetics of Leukemia. PLOS One, 10(7), 2015.

Patel JP, Gonen M, Figueroa ME, Fernandez H, et al. Prognostic relevance of integrated genetic profiling in acute myeloid leukemia. New England Journal of Medicine, v. 366, n. 12, Mar 22 2012.

Pedersen MT, Kooistra SM, Radzisheuskaya A, Laugesen A, et al. Continual removal of H3K9 promoter methylation by Jmjd2 demethylases is vital for ESC self-renewal and early development. The EMBO Journal 35: 1550-1564, 2016.

Popova EY, Xu X, DeWan AT, Salzberg AC, et al. Stage and gene specific signatures defined by histones H3K4me2 and H3K27me3 accompany mammalian retina maturation in vivo. PLOS One, 7(10), 2012.

Rasband WS, ImageJ, U.S. National Institutes of Health, Bethesda, Maryland, USA https://imagej.nih.gov/ij/, 1997-2016.

Richmond TJ, Davey CA. The structure of DNA in the nucleosome core. Nature, 423, 2003.

Rozen S, Skaletsky HJ. Primer3, 1998. Code available at: http://www-genome.wi.mit.edu/genome_software/other/primer3.html

Salzberg A, Harris Becker A, Popova E, Keasey N, Loughran T, Claxton D, Grigoryev S. Rearrangement of large blocks of facultative heterochromatin in acute myeloid leukemia is linked to genetic instability and silencing of key genes regulating stem cell maintenance. PLOS One, 12(3), 2017.

San Jose-Eneriz E, Agirre X, Rabal O, Vilas-Zornoza A, et al. Discovery of first-in-class reversible dual small molecule inhibitors against G9a and DNMTs in hematological malignancies. Nature Communications, 2017.

Schenk T, Chen WC, Gollner S, Howell L, et al. Inhibition of the LSD1 (KDM1A) demethylase reactivates the all-trans-retinoic acid differentiation pathway in acute myeloid leukemia. Nature Medicine, 18(4), 2012.

Schultz DC, Ayyanathan K, Negorev D, Maul GG et al. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes and Development, v. 16, n. 8, 2002.

Schones DE, Chen X, Trac C, Setten R and Paddison PJ. G9a/GLP-dependent H3K9me2 patterning alters chromatin structure at CpG islands in hematopoietic progenitors. Epigenetics Chromatin, 7:23, 2014.

Shankar SR, Bahirvani AG, Rao VK, Bharathy N, Ow JR, Taneja R. G9a, a multipotent regulator of gene expression. Epigenetics, v. 8, n. 1, Jan 2013.

Shinkai Y, Tachibana, M. H3K9 methyltransferase G9a and the related molecule GLP. Genes and Development, v. 25, n. 8, 2011.

Shih AH, Levine RL. Molecular biology of myelodysplastic syndromes. Seminars in Oncology, v. 38, n. 5, 2011.

Song Q, Smith AD. Identifying dispersed epigenomic domains from ChIP-Seq data. Bioinformatics, v. 27, n. 6, 2011.

Sridharan R, Gonzales-Cope M, Chronis C et al. Proteomic and genomic approaches reveal critical functions of H3K9 methylation and heterochromatin protein-1 gamma in reprogramming to pluripotency. Nature Cell Biology, 15(7), 2013.

Subramanian A, Tamayo P, Mootha VK, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS USA, v. 102, 2005.

Supek F and Lehner B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature, 7:521, 2015.

Tachibana M, Sugimoto K, Nozaki M, et al. G9a histone methyltransferase plays a dominant role in euchromatic histone H3 lysine 9 methylation and is essential for early embryogenesis. Genes and Development, v. 16, n. 14, 2002.

Tachibana M, Ueda J, Fukuda M, et al. Histone methyltransferases G9a and GLP form heteromeric complexes and are both crucial for methylation of euchromatin at H3-K9. Genes and Development, v. 19, 2005.

Timp W, Feinberg AP. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nature Reviews Cancer, v. 13, n. 7, 2013.

Tursky ML, Beck D, Thomas JA, et al. Overexpression of ERG in cord blood progenitors promotes expansion and recapitulates molecular signatures of high ERG leukemias. Leukemia, v. 29(4), 2014.

Uy GL, Duncavage EJ, Chang GS, et al. Dynamic changes in the clonal structure of MDS and AML in response to epigenetic therapy. Leukemia, 10-1, 2016.

Vedadi M, Barsyte-Lovejoy D, Liu F, et al. A chemical probe selectively inhibits G9a and GLP methyltransferase activity in cells. Nat Chem Biol, v. 7, n. 8, 2011.

Welch JS, Ley TJ, Link DC, et al. The origin and evolution of mutations in acute myeloid leukemia. Cell, 150, 2012.

Wen B, Wu H, Shinkai Y, Irizarry RA, Feinberg AP. Large histone H3 lysine 9 dimethylated chromatin blocks distinguish differentiated from embryonic stem cells. Nature Genetics, v. 41, n. 2, 2009.

Wen B, Wu H, Loh YH, Briem E, Daley GQ, Feinberg AP. Euchromatin islands in large heterochromatin domains are enriched for CTCF binding and differentially DNA-methylated regions. BMC Genomics, v. 13, 2012.

Ye J, Coulouris G, Zaretskaya I, et al. Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics, 13, 2012. Available at: www.ncbi.nlm.nih.gov/tools/primer-blast

Yokochi T, Poduch K, Ryba T. et al. G9a selectively represses a class of late-replicating genes at the nuclear periphery. PNAS USA, v. 106, no. 46, 2009.

Yoshimi A, Kurokawa M. Key roles of histone methyltransferase and demethylase in leukemogenesis. J Cell Biochem, v. 112, n. 2, 2011.

Appendix

Protocol: ChIP-seq of histone modifications and chromatin-binding proteins.

1) Chromatin isolation with MNase fragmentation.

a) Cells

Cryopreserved AML cells (mostly myeloblasts) from unidentified AML patients were isolated from bone-marrow samples by Ficoll-Paque density gradient centrifugation at the Claxton laboratory (IRB protocol 2000-186). For one ChIP experiment, each sample containing approximately 10^7 cells was thawed and resuspended in PBS for fixation immediately after thawing.

Cultured cells (e.g. K562) in suspension were washed twice in PBS, with centrifugation for 5 min at 200 g, and then resuspended in PBS for fixation.

Consider cells to be a potential biohazard until after fixation; treat cells and waste (tips, wash liquid, reagents etc.) accordingly and dispose of them as biohazardous waste.

b) Formaldehyde fixation.

1. Prepare fresh 10% formaldehyde in PBS from 37% stock (Fisher, ACS reagent F79-500). Take living cells or freshly thawed cells suspended in PBS at approximately 10^6 cells/ml at RT. Add 1/10 of the final volume of 10% formaldehyde, for 1% final formaldehyde concentration. Incubate the cells with 1% formaldehyde for 10-15 min at RT, with occasional gentle swirling of the tube (do not shake, vortex etc.).

2. Stop the fixation by adding 1M glycine to final concentration 125 mM, invert tube to mix gently, and incubate on ice for no more than 5 min.

3. Spin the cells down in cold centrifuge at 3000 rpm for 5 min., and wash 3 times with 1-2x volume of PBS (spin each time as above), i.e. if you fixed cells in 1 ml wash with 1-2 ml PBS.

4. The fixed cells may be counted to estimate the yield and stored overnight at +4 °C.
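The reagent additions in steps 1 and 2 are simple dilutions. As a worked check, the following Python sketch computes how much stock to add to a cell suspension so that the mixture reaches a target concentration. The function name and the example volumes are hypothetical.

```python
def stock_volume_to_add(sample_volume: float, stock_conc: float, final_conc: float) -> float:
    """Volume of stock to add to `sample_volume` so the mixture reaches `final_conc`.

    Derived from stock_conc * v = final_conc * (sample_volume + v).
    Both concentrations share one unit; the result has the unit of `sample_volume`.
    """
    if not 0 < final_conc < stock_conc:
        raise ValueError("final concentration must be positive and below the stock concentration")
    return sample_volume * final_conc / (stock_conc - final_conc)


# 9 ml of cells brought to 1% with 10% formaldehyde:
# the added 1 ml is 1/10 of the 10 ml final volume.
formaldehyde_ml = stock_volume_to_add(9.0, stock_conc=10.0, final_conc=1.0)

# Quench the resulting 10 ml with 1 M (1000 mM) glycine to reach 125 mM.
glycine_ml = stock_volume_to_add(10.0, stock_conc=1000.0, final_conc=125.0)

print(f"formaldehyde: {formaldehyde_ml:.2f} ml, glycine: {glycine_ml:.2f} ml")
```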

c) Nuclear Isolation.

Alternate protocol: sucrose cushion nuclei isolation

Cell nuclei were isolated using a protocol that employed physiological salt during the nuclear lysis and purification through a sucrose layer (Nekrasov, Amrichova et al. 2012). An alternative protocol was used for mouse retina (Popova, Xu et al. 2012).

Notes: All buffers need to be filtered for ChIP and used ice-cold. Pre-chill the centrifuge. Use 14 ml polystyrene Falcon tubes. Add sucrose on the day of the experiment. Add protease inhibitors just before the experiment.


1. Resuspend cells in 2 ml of ice-cold swelling buffer I, incubate on ice for 10 min.
2. Add 2 ml of buffer II and transfer to an ice-cold Dounce homogenizer. Homogenize with pestle “B” by 20 strokes for ~ 20 min. Nuclei may be checked under the microscope with Trypan Blue: if the cells are not broken, the nuclei are transparent; if broken, the nuclei are stained blue.
3. Prepare 2 x 14 ml tubes with 8 ml of Buffer III (sucrose cushion), keep on ice.
4. Carefully layer 2 ml of cell homogenate on top of each 8 ml sucrose cushion.
5. Centrifuge at 4700 g for 35 min at +4 °C (this will pellet the nuclei and the cell debris will stay on top).
6. Resuspend the nuclei in 1 ml MNase digestion buffer. Take an aliquot of ~ 20 µl of the nuclear suspension, dilute 50x in 1% SDS and measure DNA concentration spectrophotometrically.
7. Adjust the suspension concentration to ~ 20 units O.D. (1 mg/ml DNA) with MNase buffer. Add PMSF to 0.5 mM from 100 mM stock solution in isopropanol. The nuclear suspension may be stored for several days at +4 °C.
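Steps 6 and 7 involve a short calculation: back-calculating the concentration of the undiluted suspension from the reading of the 50x dilution, then working out how much MNase buffer brings it to ~ 20 O.D. units. The Python sketch below illustrates this, using the conversion implied by step 7 (20 O.D. units = 1 mg/ml DNA, i.e. 50 µg/ml per absorbance unit). The function names and the example reading are hypothetical, and the absorbance is assumed to be read at 260 nm.

```python
UG_PER_ML_PER_OD = 50.0  # from step 7: 20 O.D. units correspond to 1 mg/ml DNA


def stock_od(measured_absorbance: float, dilution_factor: float = 50.0) -> float:
    """O.D. units of the undiluted nuclear suspension."""
    return measured_absorbance * dilution_factor


def dna_mg_per_ml(od_units: float) -> float:
    """DNA concentration in mg/ml for a given number of O.D. units."""
    return od_units * UG_PER_ML_PER_OD / 1000.0


def buffer_to_add_ml(volume_ml: float, od_units: float, target_od: float = 20.0) -> float:
    """MNase buffer needed to dilute `volume_ml` of suspension down to `target_od`."""
    if od_units <= target_od:
        return 0.0  # already at or below target; dilution cannot raise the concentration
    return volume_ml * (od_units / target_od - 1.0)


# Hypothetical reading of 0.8 for the 50x dilution of a 1 ml suspension.
od = stock_od(0.8)
print(f"stock: {od:.1f} O.D. units = {dna_mg_per_ml(od):.2f} mg/ml DNA")
print(f"add {buffer_to_add_ml(1.0, od):.2f} ml MNase buffer to reach 20 O.D. units")
```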

Nuclear isolation buffers:

Swelling Buffer I: 25 mM HEPES, pH 7.6; 15 mM NaCl; 10 mM KCl; 2 mM MgCl2; 0.2% NP-40; 1 mM EDTA; 0.5 mM EGTA; 0.5 mM DTT

Buffer II: 0.6 M sucrose; 15 mM HEPES, pH 7.6; 15 mM NaCl; 120 mM KCl; 2 mM MgCl2; 0.2% NP-40; 1 mM EDTA; 0.5 mM EGTA; 0.5 mM DTT

Buffer III: 1.2 M sucrose; 15 mM HEPES, pH 7.6; 15 mM NaCl; 60 mM KCl; 2 mM MgCl2; 1 mM EDTA; 0.5 mM EGTA; 0.5 mM DTT

MNase digestion buffer: 50 mM Tris-HCl, pH 7.6; 3 mM CaCl2

To Buffers I, II and III, add just before using: 1) 0.2 mM PMSF 2) Sigma (Cat# P 2714) proteinase inhibitor at 1x

2) Micrococcal nuclease digestion and chromatin solubilization.

The nuclear preparation was digested with micrococcal nuclease to ~150 bp nucleosome sizes (Fig. 1). After digestion, the supernatant was collected and discarded and the nuclear pellet resuspended in SDS-containing lysis buffer with light sonication. A typical yield for granulocyte nuclei was ~ 0.5 mg DNA for 10^8 cells. The solubilized material was used for immunoprecipitation with specific antibodies against H3K9me2 and other histone modifications.

1. Take 240 µl of the nuclei suspension. Take 40 µl as sample 0. Warm the remaining material (200 µl) for 5 min at +37 °C. Add 0.8 µl of Micrococcal nuclease solution (Roche), 15 units/µl. Incubate at +37 °C with occasional shaking. Collect 40 µl aliquots at 2, 5, 10, 20, and 40 min. Terminate the reaction by adding 10 mM Na2-EDTA and placing on ice. Add SDS to 0.8% and proteinase K to 0.5 mg/ml. Incubate 1 hr at +55 °C and 6 hrs at +65 °C.
2. To all samples add DNA electrophoresis sample buffer and run DNA electrophoresis in TAE buffer/1.2% agarose (80 V, 40 min), stain with EtBr. Select the optimal time of digestion to release predominantly nucleosome core-size DNA (Figure 1).

Figure 1. A: Agarose gel electrophoresis shows DNA from granulocyte nuclei in the process of Micrococcal nuclease (MNase) digestion for 0 – 40 min. as indicated. MNase converts granulocyte chromatin into nucleosome cores containing ~150 bp DNA fragments (indicated by the arrow) suitable for ChIP and genomic sequencing. B: DNA from solubilized fractions S1 and pellet (P1) resuspended in L-ChIP buffer.

3. Take ~1 ml of the nuclear material. Add 0.5 mM PMSF and warm the nuclei for 5 min at +37 °C. Scale up by digesting the nuclei with 4 µl of Micrococcal nuclease solution (Roche), 15 units/µl, for the pre-selected time (usually 20 min).
4. Stop the reaction with 3 mM Na2-EDTA and spin down for 2 min at 10,000 rpm in a pre-chilled microcentrifuge.
5. Collect the supernatant (S1) and add EDTA to 10 mM. Resuspend the pellet (P) in 500 µl of the L-CHIP buffer (ChIP lysis buffer). Add the inhibitors of proteinases: 5 µl of 100 mM PMSF and 5 µl of PI cocktail to both S1 and the lysate.

6. Take aliquots of ~ 20 µl of S1 and of the lysate nuclear suspension, dilute 50x in 1% SDS and measure DNA concentration spectrophotometrically. Normally S1 contains no more than 10% of total DNA; in that case use only the lysate for ChIP.
7. Sonicate the lysate 2 x 10 sec at setting 2 on the sonicator (small probe) to ensure solubilization.
8. Measure the protein concentration of the lysate. Adjust the protein concentration to ~ 1 mg/ml by addition of L-CHIP buffer. This lysate can now be stored at +4 °C for a couple of months.
9. Collect 40 µl aliquots of S1 and lysate. Add SDS to 0.8% and proteinase K to 0.5 mg/ml. Incubate 1 hr at +55 °C and 6 hrs at +65 °C. Run an agarose gel to confirm that the size of the eluted material is the same as in the test digest (Fig. 1B).
10. If S1 contains as much or more nucleosomal material than the lysate, use S1 for ChIP by mixing with 1/10 vol of 10x concentrated L-buffer. It is also recommended to collect aliquots of the S1 and lysate and treat them with 1% SDS-proteinase K as described for the test digest. Add 1 mM PMSF to the lysate and S1; these samples may be stored at +4 °C for several weeks.

Major Advantages of MNase Enzymatic Digestion Compared to Sonication (http://www.cellsignal.com/technologies/chip_vs.html):
• Enzymatic digestion is milder than sonication and better preserves the integrity of the chromatin and antibody epitopes, which means increased IP efficiency.
• Increased IP efficiency of target proteins means enhanced detection of protein-bound DNA loci.
• Enhanced detection of target DNA loci means less input chromatin required, saving valuable time and reagents.
• Enzymatic digestion provides greater sensitivity with lower background, which is essential when detecting DNA-bound transcription factors and cofactors.

3) Alternative method: Chromatin isolation from mouse retina with sonication fragmentation (G. Popova, Barnstable lab).

Note: we have not yet used this method for ChIP-seq library preparation, but we have used it for ChIP-PCR to validate ChIP-seq results. The longer DNA fragments resulting from sonication increase the probability of intact PCR templates for amplification.

1. Dissect ~10 retinas. Collect them in 1 ml PBS on ice in an Eppendorf tube.
2. Centrifuge in the cold for 3 min at 500 rpm. Remove the PBS and add 1 ml of fresh PBS. Resuspend to the single-cell level by pipetting.
3. Crosslink: add 27 µl of 37% formaldehyde (final concentration ~1%). Vortex. Incubate 15 min at room temperature, rotating periodically during incubation (or place the tubes on a rotator).
4. Quench the crosslinking reaction by adding 0.125 ml of 1 M glycine. Vortex. Incubate 5 min on ice.
5. Pellet the cells at 2000 g for 7 min at 4 °C.
6. Lysis: resuspend the cells in 500 µl of L-CHIP buffer. Add the proteinase inhibitors: 5 µl of 100 mM PMSF and 5 µl of PI cocktail.
7. Incubate on ice for 10 min.
8. Sonicate the lysate (Fisher model 100 sonicator, small probe):
   - 10 sec at setting #2
   - 10 sec at setting #3
   - 10 sec at setting #4
9. Keep on ice at all times except during sonication, and allow at least 1 min of chilling between sonications.
10. Check the lysate for efficiency of sonication: to 20 µl of sample add 5 µl of proteinase K and incubate 30 min at 37 °C and then overnight at 65 °C.
11. Measure the protein concentration of the lysate. Adjust the protein concentration to ~1 µg/µl by adding L-CHIP buffer. This lysate can now be stored at 4 °C for a couple of months. Run the de-crosslinked sample from step 10 on a 1% agarose gel (TAE buffer + ethidium bromide). The DNA fragments should form a smear of around 300-800 bp (see Figure 2).

Figure 2. A: Agarose gel electrophoresis shows DNA from retina nuclei after sonication.
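The reagent volumes used for crosslinking and quenching in steps 3 and 4 can be checked with a short calculation. This is a minimal sketch, not part of the original protocol: the function name `final_concentration` is hypothetical, and the calculation ignores any volume contributed by the tissue itself.

```python
def final_concentration(stock_conc, added_volume, starting_volume):
    """Concentration after adding `added_volume` of stock to `starting_volume`.

    Both volumes must use the same unit; the result has the unit of `stock_conc`.
    """
    return stock_conc * added_volume / (starting_volume + added_volume)


# Step 3: 27 ul of 37% formaldehyde added to 1 ml (1000 ul) of PBS.
formaldehyde_pct = final_concentration(37.0, 27.0, 1000.0)

# Step 4: 0.125 ml (125 ul) of 1 M glycine added to the 1027 ul crosslinking mix.
glycine_molar = final_concentration(1.0, 125.0, 1027.0)

print(f"formaldehyde: {formaldehyde_pct:.2f} %")
print(f"glycine: {glycine_molar:.3f} M")
```

Under these assumptions the formaldehyde comes out at roughly 0.97% and the glycine at about 0.11 M, consistent with the nominal 1% crosslinking condition stated in step 3.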

4) Chromatin Immunoprecipitation.

This protocol was successfully used in Grigoryev’s lab with hematological tissues and MNase fragmentation (sections 1-2) and in the Barnstable lab for retina and other neural tissues with both MNase and sonication DNA fragmentation (section 3). For this technique we have successfully used anti-H3K9me2 (ab1220), anti-H3K9me3 (ab8898), and anti-H3 C-tail (ab8898) from Abcam, and anti-H3K4me2 (07–030) and anti-H3K27me3 (07–449) from Upstate. For an assessment of many commercial antibodies see (Egelhofer, Minoda et al. 2011). Note: use low-retention plastic tips and low-retention microcentrifuge tubes (e.g. Axygen T-205-WB-C-L 200 µL Maxymum Recovery Pipet Tips and MCT-175-L-C Maxymum Recovery 1.7 ml tubes).

Day 1:
12. Spin ~350 µl of lysate in an Eppendorf centrifuge at max speed for 5 min. In a 15 ml Falcon tube, dilute the supernatant 10-fold with D-CHIP buffer (~3.15 ml). Add 3 µl of protease inhibitor cocktail (general use, Sigma cat# P2714) and 3 µl of 100 mM PMSF.
13. Add ~5 µg of antibody or preimmune serum to the solution. Rotate at +4 °C overnight.
14. Simultaneously, prepare the protein A (or G) Sepharose beads (e.g. Sigma cat# P9424, or magnetic beads): wash 90 µl of bead slurry (for 3 samples) 2 times with 400 µl of washing buffer (9:1 = D-CHIP:L-CHIP), resuspend the beads in 400 µl of washing buffer, and add 25 µl of 10 mg/ml salmon sperm DNA (500 µg/ml final) and 5 µl of 10 mg/ml BSA (100 µg/ml final).
15. Rotate the beads at +4 °C overnight.
16. Wash the beads (after blocking) 2 times with D-CHIP buffer and resuspend in 300 µl of washing buffer (same as above: 9 D-CHIP:1 L-CHIP).

Day 2:
17. Add ~100 µl of bead slurry to each Falcon tube with IP. Rotate at 4 °C for 2 hours.
18. Gently centrifuge the Falcon tubes with the IP reactions in an appropriate centrifuge (2000 rpm, 5 min). Remove the unbound fraction. Resuspend the beads in 1 ml of LS-CHIP buffer. Transfer the beads to a 1.5 ml Eppendorf tube.
19. Wash with 1 ml each: 4 times with LS-CHIP buffer, then once with HS-CHIP buffer. Spin at 2000 rpm in an Eppendorf centrifuge (or use a magnet for magnetic beads) and gently aspirate the liquid after each wash.
20. To elute the material, add 350 µl of E-CHIP buffer to the beads. Incubate with rotation for 10 min at RT, spin 2 min at 12,000 rpm (or use the magnet), and collect the supernatant = IP CHIP.
21. Take 50 µl of starting lysate + 300 µl of E-CHIP = IP Input.
22. To each tube with IP CHIP or IP Input add 17.5 µl of RNase A (10 mg/ml) and 17.5 µl of Proteinase K (10 mg/ml) (final concentration of each 0.5 mg/ml). Incubate 30 min at 37 °C.
23. Reverse the cross-links by incubating the tubes at 65 °C overnight.

Day 3:
24. Extract 2 times with phenol/chloroform/isoamyl alcohol and once with chloroform using Phase Lock tubes (5 PRIME, Hamburg, ref# 2302820).
25. Ethanol-precipitate the 350 µl DNA solution with 1.6 ml of 100% ethanol, 4 µl of glycogen (Sigma G1767, 20 µg/ml) and 40 µl of Na acetate (3 M, pH 5.5).
26. Dissolve the DNA in 50-100 µl of fresh, DNase-free water.
27. Measure the DNA concentration using a Qubit fluorometer (HS mode) and proceed to library preparation for sequencing.

The buffers:

L-CHIP (Lysis), 50 ml
  Final concentration        Volume (stock)
  1% SDS                     5 ml (10%)
  10 mM EDTA                 1 ml (0.5 M)
  50 mM Tris-HCl pH 8.0      2.5 ml (1 M)

D-CHIP (Dilution), 50 ml
  Final concentration        Volume (stock)
  0.01% SDS                  50 µl (10%)
  1.1% Triton X-100          5.5 ml (10%)
  1.2 mM EDTA                0.12 ml (0.5 M)
  16.7 mM Tris-HCl pH 8.0    0.835 ml (1 M)
  167 mM NaCl                1.67 ml (5 M)

LS-CHIP (Low Salt), 50 ml
  Final concentration        Volume (stock)
  0.1% SDS                   0.5 ml (10%)
  1% Triton X-100            5 ml (10%)
  2 mM EDTA                  0.2 ml (0.5 M)
  20 mM Tris-HCl pH 8.0      1 ml (1 M)
  150 mM NaCl                1.5 ml (5 M)

HS-CHIP (High Salt), 50 ml
  Final concentration        Volume (stock)
  0.1% SDS                   0.5 ml (10%)
  1% Triton X-100            5 ml (10%)
  2 mM EDTA                  0.2 ml (0.5 M)
  20 mM Tris-HCl pH 8.0      1 ml (1 M)
  500 mM NaCl                5 ml (5 M)

E-CHIP (Elution)
  Final concentration        For 10 ml         For 50 ml
  1% SDS                     1 ml (10%)        5 ml (10%)
  0.1 M NaHCO3               0.084 g           0.42 g

Glycine 1M = 0.75 g/10 ml
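The stock volumes in the buffer recipes follow from C1 x V1 = C2 x V2. The sketch below recomputes the D-CHIP recipe as an illustration; the function name `stock_volume_ml` and the dictionary layout are assumptions made for this example, and the water volume is simply whatever remains to reach 50 ml.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock needed, from C1 * V1 = C2 * V2.

    `final_conc` and `stock_conc` must be in the same unit (%, mM, ...).
    """
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed the stock concentration")
    return final_conc * final_volume_ml / stock_conc


# D-CHIP (dilution) buffer, 50 ml: component -> (final, stock) concentration.
d_chip = {
    "SDS (%)": (0.01, 10.0),
    "Triton X-100 (%)": (1.1, 10.0),
    "EDTA (mM)": (1.2, 500.0),
    "Tris-HCl pH 8.0 (mM)": (16.7, 1000.0),
    "NaCl (mM)": (167.0, 5000.0),
}

volumes = {
    name: stock_volume_ml(final, stock, 50.0)
    for name, (final, stock) in d_chip.items()
}
water_ml = 50.0 - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name}: {vol:.3f} ml")
print(f"water to 50 ml: {water_ml:.3f} ml")
```

The same function can be reused for the other buffers by changing the final concentrations and the total volume.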

5) Controls and yields:

For typical controls, conduct two parallel immunoprecipitation reactions with a) no antibody added; b) similarly prepared IgG from a non-immunized animal (e.g. rabbit IgG #2729, Cell Signaling). A typical yield from immunoprecipitation with antibodies against H3K9me2 (Abcam, ChIP Grade, ab1220) is more than 400 ng from 40 µg of input (ratio >1%). Nonspecific binding in the control samples is <0.02%. The yields vary between cell types depending on the level of the histone modification. For example, for a number of primary AML cells, the yield of ChIP-recovered DNA was in the range of 420-4200 ng and the ratio of ChIP to input was between 0.45% and 3.4% (Table 1). These variations are consistent with the natural variations of H3K9me2 levels in leukemia blood cells that we described previously (Popova, Claxton et al. 2006).

AML sample   Cells (x10^8)   DNA, nuclei (ng)   DNA, input (ng)   DNA, ChIP (ng)   ChIP/input
126          1.2             1405000            403200            4280             0.0106
150          1.2             275000             86333             924              0.0107
176          1.2             186000             92167             418.8            0.0045
218          1               760000             136267            2184             0.0160
245          1               1137000            125300            1824             0.0146
248          0.9             817000             115500            1452             0.0126
267          1.6             711000             70467             1416             0.0201
274          1               1205000            336000            1840             0.0055
329          1.2             171000             29400             993.6            0.0338
424          1.1             630000             62300             721.2            0.0116
452          1               1022000            90067             1344             0.0149
472          0.96            402500             73266             1548             0.0211
482          0.78            456000             55533             472.8            0.0085

Table 1. Yield of chromatin immunoprecipitation from AML-derived myeloid progenitors with H3K9me2 antibodies.
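The ChIP/input column of Table 1 is the ChIP DNA mass divided by the input DNA mass. As an illustrative sketch (the variable names are hypothetical; the numbers are copied from Table 1), the ratios can be recomputed and the extremes identified as follows:

```python
# (input DNA in ng, ChIP DNA in ng) for each AML sample, copied from Table 1.
table1 = {
    "126": (403200, 4280.0),
    "150": (86333, 924.0),
    "176": (92167, 418.8),
    "218": (136267, 2184.0),
    "245": (125300, 1824.0),
    "248": (115500, 1452.0),
    "267": (70467, 1416.0),
    "274": (336000, 1840.0),
    "329": (29400, 993.6),
    "424": (62300, 721.2),
    "452": (90067, 1344.0),
    "472": (73266, 1548.0),
    "482": (55533, 472.8),
}

# Fraction of input DNA recovered by ChIP for each sample.
ratios = {sample: chip / inp for sample, (inp, chip) in table1.items()}

lowest = min(ratios, key=ratios.get)
highest = max(ratios, key=ratios.get)

for sample, ratio in ratios.items():
    print(f"AML {sample}: {100 * ratio:.2f} % of input")
print(f"lowest: AML {lowest}, highest: AML {highest}")
```

The lowest and highest recoveries correspond to the 0.45% and 3.4% bounds quoted in the text.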

6) ChIP optimization:

We conducted several experiments to test antibody efficiency for immunoprecipitating chromatin labeled by H3K9me2. First, we conducted ChIP using two different commercially available antibodies at the same concentration: Abcam, ChIP Grade, ab1220 and Millipore 05-1249. We observed that, for the same concentration of antibody, the yield for the Abcam antibody was, on average, 1.84-fold higher, so in subsequent experiments we used only Abcam ab1220. As the Abcam antibodies are of the IgG2a type, which could potentially react with both protein A and protein G, we set up experiments to test which type of immobilized protein is best suited for immunoprecipitation. For this analysis, we reacted similar amounts of sonicated material (36 µg) with the same amount of Abcam antibodies (1.8 µg) and immunoprecipitated with a) protein A beads, b) protein G beads, or c) a protein A-G combination. We observed that protein A was the most efficient at immunoprecipitating H3K9me2 chromatin, but the combination with protein G increased the efficiency by 20% (Fig. 3A). In all subsequent experiments we employed a combination of protein A-G beads.

Next, we tested the optimal amount of antibody for immunoprecipitating a fixed amount of material (each sample contained 36 µg of DNA) with variable amounts of antibody (0-3.6 µg). We found that the maximal amount of antibody was the most efficient at pulling down material for deep sequencing (Fig. 3B). In all subsequent experiments we used 3.6 µg of antibodies, which appeared to be a cost-efficient way of conducting ChIP for H3K9me2.

Finally, we estimated the number of cells providing the necessary amount of material for ChIP by keeping the amount of antibody constant (2.7 µg/ml) and changing the amount of material from 0 to 36 µg per sample. We estimated that with 3.6 µg of input DNA (corresponding to the yield from ~2 x 10^6 cells) we could generate 100 ng of ChIP DNA, sufficient for robust sequencing (Fig. 3C).

Figure 3. Testing and optimizing conditions for chromatin immunoprecipitation with H3K9me2 antibodies. A: similar amounts of chromatin (36 µg by DNA) were reacted with 1.8 µg of Abcam ab1220 antibodies and immunoprecipitated with protein A beads, protein G beads, or a protein A+G combination. The control immunoprecipitation contained no antibodies and A+G beads. B: chromatin samples each containing 36 µg of DNA were reacted with variable amounts of anti-H3K9me2 antibodies as indicated. C: Chromatin samples containing different amounts of input DNA (indicated on the X-axis) were each reacted with 2.7 µg of anti-H3K9me2 antibodies. The Y-axis shows the amount of DNA recovered by ChIP.

7) Library preparation for Next-Gen sequencing

For library preparation for Illumina sequencing we use the NEBNext ChIP-Seq library preparation reagent set for Illumina (cat #E6200S/L for 12/60 reactions) and NEBNext Multiplex Oligos for Illumina, Index Primers Set 1 (cat #E7335S/L for 24/96 reactions).

1. For one library take 10-30 ng of chromatin-immunoprecipitated or control DNA in ≤ 40 µl of water or elution buffer.
2. End-repair the ChIP DNA and clean with AMPure XP magnetic beads (Agencourt part # A63880); see the NEBNext ChIP-Seq library preparation manual.
3. dA-tail and clean with AMPure XP magnetic beads; see the NEBNext ChIP-Seq library preparation manual.
4. Ligate the adaptors and clean the DNA with AMPure XP magnetic beads according to the NEBNext ChIP-Seq library preparation manual, with one exception: after adaptor ligation and AMPure cleaning, redissolve the material in 20 µl (rather than 100 µl) of 0.1x TE buffer for agarose gel size selection.
5. Add 4 µl of 6X Gel Loading Dye to 20 µl of sample DNA.
6. Prepare a 50 ml, 2% agarose gel with SyBr Gold using 1X TAE buffer as follows:
   a. Add 1 g of agarose powder to 50 ml of 1X TAE buffer.
   b. Microwave the gel buffer until the agarose powder is completely dissolved.
   c. Cool the gel buffer on the bench for 5 minutes, and then add 5 µl of 10,000X SyBr Gold concentrate (Invitrogen cat# S11494). Swirl to mix.
   d. Insert the comb (5 x 1.5 mm wells) and pour the entire gel solution into a 10 x 6 cm gel tray.

NOTE: The final concentration of SyBr Gold should be 1X in the agarose gel buffer. It is very important to pre-stain your gel with SyBr Gold. When using other staining dyes or staining the gel after running, the DNA will migrate more slowly than the ladder. This will result in cutting out the wrong size fragments.

7. Add 32 µl of TAE buffer with 1x SyBr Gold and 7 µl of 6X Gel Loading Dye to 1 µl of DNA ladder (100 bp DNA ladder with 6X gel loading dye, NEB N3231S).
8. When the agarose gel is set, put it in the gel electrophoresis unit and fill the tank with 1X TAE buffer to the maximum fill mark.
9. Load 20 µl of the ladder solution onto one lane of the gel.
10. Load the 20 µl samples onto the other lanes of the gel, leaving a gap of at least one empty lane between samples and ladders.

NOTES: Flanking the library on both sides with ladders may make the library excision easier. When handling multiple samples, leave a gap of at least one empty lane between samples to avoid the risk of cross-contamination between libraries, and load ladders in the first and last wells of the gel to help locate the gel area to be excised.

11. Run the gel at 90 V constant voltage for 90 minutes.
12. View the gel on a Dark Reader transilluminator.

13. Excise a band from the gel spanning the width of the lane and ranging in size from 175-225 bp (150 bp nucleosome plus 50 bp adaptors) using a clean scalpel. Use the DNA ladder as a guide.
14. Take a snapshot of the gel after band excision for a record (Figure 4).
15. Purify the DNA on one Qiagen QIAquick (cat# 28704) gel extraction mini column and elute in 50 µl of sterile water or elution buffer.
16. Take 23 µl for one PCR enrichment of the adaptor-ligated DNA (see the NEBNext ChIP-Seq library preparation manual). Use one universal primer and one of the 12 index primers (NEB cat #E7335S) for multiplexing. We use 15-17 cycles of PCR depending on the amount of starting material.

Note: carefully record the number of the index primer used in each PCR reaction. It is suggested to include the index primer number in the PCR sample name.

17. Clean the PCR-amplified library DNA with AMPure XP magnetic beads. Redissolve in 20 µl of 0.1x TE buffer.
18. Measure the DNA concentration using a Qubit fluorometer (HS mode). Expected yield: 40-100 ng/µl.
19. Take 1 µl from each PCR reaction, add 9 µl of TAE buffer with 1x SyBr Gold and 2 µl of 6X Gel Loading Dye, and run DNA electrophoresis in TAE buffer/2% agarose with SyBr Gold (80 V, 40 min) (Figure 5).
20. Dilute the library 50-fold with nuclease-free water and assess the library quality on a Bioanalyzer (Agilent high sensitivity chip). Check that the electropherogram shows a narrow distribution with a peak size around 275 bp (Figure 6).
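When a library has to be reported or pooled by molar concentration, the Qubit reading (ng/µl) and the Bioanalyzer peak size (bp) can be combined. The sketch below is not part of the original protocol: it assumes the commonly used average mass of 660 g/mol per base pair of double-stranded DNA, and the function name `library_molarity_nM` is hypothetical.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of double-stranded DNA (approximation)


def library_molarity_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a mass concentration (ng/ul) into nanomolar for dsDNA fragments."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


# Expected yield range from step 18, with the ~275 bp peak from step 20.
low_nM = library_molarity_nM(40.0, 275)
high_nM = library_molarity_nM(100.0, 275)

print(f"40 ng/ul at 275 bp  -> {low_nM:.0f} nM")
print(f"100 ng/ul at 275 bp -> {high_nM:.0f} nM")
```

Under these assumptions, the expected yield of 40-100 ng/µl corresponds to roughly 220-550 nM for a 275 bp library.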

Figure 4. SyBr Gold stained 2% agarose gel after excision of two 175-225 bp bands corresponding to adaptor-conjugated library DNA.

Figure 5. SyBr Gold stained 2% agarose gel after loading 1 µl of DNA after PCR amplification (15-17 cycles) of adaptor-conjugated DNA libraries.

Figure 6. Bioanalyzer traces of final libraries. Note: DNA from samples 1, 2, 5-9 was extracted for 175-225 bp range resulting in library peaks at 275 bp. Samples 3, 4, 10 were extracted for 150-175 bp range resulting in library peaks at 250 bp. Sample 11 was extracted for 225-275 bp range resulting in library peaks at 320 bp.

8) Sequencing and statistics.

We submitted the DNA libraries shown in Fig. 5 for deep sequencing on an Illumina HiSeq 2500 sequencer at the Penn State Genomics Core Facility (University Park). We used rapid-run mode with 2 lanes, 4 samples per lane and 100 nucleotides per read. The cost of HiSeq Rapid-Run 100 nt single-read sequencing was $1300 per lane.

For each sample, Illumina sequencing provides a fastq file and a FastQC report. For example, each of the 8 fastq files obtained from 8 libraries sequenced in one run on two lanes contains 32-40 million 100-nt reads. About 83-93% of all reads were successfully and uniquely aligned to the human genome using the Bowtie software.

Sample 1:
# reads processed: 34781372
# reads with at least one reported alignment: 30840953 (88.67%)
# reads that failed to align: 3940419 (11.33%)
Reported 30840953 alignments to 1 output stream(s)

Sample 2:
# reads processed: 32062020
# reads with at least one reported alignment: 29687113 (92.59%)
# reads that failed to align: 2374907 (7.41%)
Reported 29687113 alignments to 1 output stream(s)

Sample 5:
# reads processed: 35785754
# reads with at least one reported alignment: 31627859 (88.38%)
# reads that failed to align: 4157895 (11.62%)
Reported 31627859 alignments to 1 output stream(s)

Sample 6:
# reads processed: 36996584
# reads with at least one reported alignment: 31737180 (85.78%)
# reads that failed to align: 5259404 (14.22%)
Reported 31737180 alignments to 1 output stream(s)

Sample 7m:
# reads processed: 39123015
# reads with at least one reported alignment: 34583744 (88.40%)
# reads that failed to align: 4539271 (11.60%)
Reported 34583744 alignments to 1 output stream(s)

Sample 7t:
# reads processed: 32187175
# reads with at least one reported alignment: 27005691 (83.90%)
# reads that failed to align: 5181484 (16.10%)
Reported 27005691 alignments to 1 output stream(s)

Sample 8:
# reads processed: 40598207
# reads with at least one reported alignment: 36561290 (90.06%)
# reads that failed to align: 4036917 (9.94%)
Reported 36561290 alignments to 1 output stream(s)

Sample 9:
# reads processed: 37715448
# reads with at least one reported alignment: 35018636 (92.85%)
# reads that failed to align: 2696812 (7.15%)
Reported 35018636 alignments to 1 output stream(s)
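Alignment summaries in this format can be collected programmatically when many samples are processed. The following is a minimal sketch that parses one summary block and recomputes the alignment rate; the function name `parse_bowtie_summary` is hypothetical, and the example text is the Sample 1 summary reported above.

```python
import re

# Sample 1 summary, copied from the alignment statistics above.
BOWTIE_SUMMARY = """\
# reads processed: 34781372
# reads with at least one reported alignment: 30840953 (88.67%)
# reads that failed to align: 3940419 (11.33%)
Reported 30840953 alignments to 1 output stream(s)
"""


def parse_bowtie_summary(text):
    """Extract read counts from one alignment summary block."""
    patterns = {
        "processed": r"# reads processed: (\d+)",
        "aligned": r"# reads with at least one reported alignment: (\d+)",
        "failed": r"# reads that failed to align: (\d+)",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"summary line for '{key}' not found")
        stats[key] = int(match.group(1))
    # Recompute the alignment rate instead of trusting the printed percentage.
    stats["percent_aligned"] = 100.0 * stats["aligned"] / stats["processed"]
    return stats


stats = parse_bowtie_summary(BOWTIE_SUMMARY)
print(f"{stats['aligned']} of {stats['processed']} reads aligned "
      f"({stats['percent_aligned']:.2f} %)")
```

Recomputing the percentage from the raw counts gives a simple consistency check on each summary before the values are tabulated across samples.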

References

Egelhofer, T. A., A. Minoda, et al. (2011). "An assessment of histone-modification antibody quality." Nat Struct Mol Biol 18(1): 91-93.

Nekrasov, M., J. Amrichova, et al. (2012). "Histone H2A.Z inheritance during the cell cycle and its impact on promoter organization and dynamics." Nat Struct Mol Biol 19(11): 1076-1083.

Popova, E. Y., D. F. Claxton, et al. (2006). "Epigenetic heterochromatin markers distinguish terminally differentiated leukocytes from incompletely differentiated leukemia cells in human blood." Exp Hematol 34(4): 453-462.

Popova, E. Y., X. Xu, et al. (2012). "Stage and gene specific signatures defined by histones H3K4me2 and H3K27me3 accompany mammalian retina maturation in vivo." PLOS One 7(10): e46867.