EPIGENETIC BASIS OF STEM CELL IDENTITY IN NORMAL AND MALIGNANT HEMATOPOIETIC DEVELOPMENT by Namyoung Jung
A dissertation submitted to Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy
Baltimore, Maryland July, 2015
© 2015 Namyoung Jung All Rights Reserved
Abstract
Acute myeloid leukemia (AML) is a heterogeneous hematologic malignancy characterized by subpopulations of leukemia-initiating or leukemia stem cells (LSC) that give rise to clonally related non-stem leukemic blasts. The LSC model proposes that, because LSC and their blast progeny are clonally related, their functional differences must be due to epigenetic differences. In addition, the cell of origin of LSC among normal hematopoietic stem and progenitor cells (HSPCs) has yet to be clearly demonstrated. To investigate the role of epigenetics in LSC function and hematopoietic development, we profiled DNA methylation and gene expression of CD34+CD38-, CD34+CD38+ and CD34- cells from 15 AML patients, along with 6 well-defined HSPC populations from 5 normal bone marrows, using the Illumina Infinium HumanMethylation450 BeadChip and the Affymetrix Human Genome U133 Plus 2.0 Array.
To define LSC and blasts functionally, we performed engraftment assays on the three subpopulations from the 15 AML patients and identified 20 LSC and 24 blast samples. We identified a functional LSC epigenetic signature that distinguishes LSC from blasts, consisting of 84 differentially methylated regions (DMRs) in 70 genes whose methylation correlated with differential gene expression. HOXA cluster genes were enriched within the LSC epigenetic signature. Most of these DMRs reflect epigenetic alterations independent of underlying mutations, although several are downstream targets of mutations in epigenome-modifying enzymes and upstream regulators. The LSC epigenetic signature predicted overall survival of AML patients independently of known risk factors such as age and cytogenetics. We also characterized epigenetic changes during normal human hematopoietic development and identified novel regulators of hematopoietic differentiation such as HMHB1 and MIR539. We found that global hypomethylation is a critical mechanism of lineage commitment in human hematopoiesis.
Our DNA methylation analysis of human hematopoiesis revealed differences in epigenetic regulation compared with murine hematopoiesis. Furthermore, we found that LSC populations formed two distinct clusters resembling either lymphoid-primed multipotent progenitors (L-MPPs) or granulocyte/macrophage progenitors (GMPs). These results provide the first evidence for epigenetic variation between LSC and their blast progeny in AML and for its prognostic power. We also provide a comprehensive methylome map of human hematopoiesis and identify epigenetically distinct subgroups of AML LSCs that likely reflect their cell of origin.
Readers:
Ravindra Majeti, M.D., Ph.D.
Andrew P. Feinberg, M.D., M.P.H.
Thesis Advisor:
Andrew P. Feinberg, M.D., M.P.H.
Acknowledgments
Over the past 6 years of my graduate study, many people have contributed to my personal and professional growth. It would be impossible to mention everyone who helped me finish this long journey through graduate school, but I wish to acknowledge a few people here.
Foremost, I would like to express my deepest gratitude to my thesis advisor, Dr. Andrew Feinberg, for his relentless support, patience, and motivation. Andy has given insightful suggestions and comments on every subject we discussed. He also taught me how to conduct myself as a professional scientist in an academic setting. Without Andy’s guidance and persistent help throughout my graduate study, this thesis would not have been possible.
I would like to thank Dr. Ravindra Majeti at Stanford, who has been a wonderful thesis committee member, the second reader of my thesis, and a collaborator. Ravi has provided expert advice and valuable knowledge in leukemia and hematopoiesis throughout this thesis. His comments and suggestions were an enormous help as I learned a new field.
I am deeply grateful to my thesis committee chair, Dr. Roger Reeves, for his generous support and encouragement of this thesis and my career, and to thesis committee member Dr. Donald Small for his advice and support.
I wish to acknowledge Dr. Yunje Cho, my undergraduate thesis advisor, whose encouragement and advice motivated my PhD work.
I am deeply grateful to my collaborators, Dr. Andrew Gentles and Dr. Rafael Irizarry, for their statistical advice and comments, and to Dr. Bo Dai for performing essential functional experiments for this thesis. I would like to express my gratitude to the AML patients and their family members for their decision to provide valuable samples for this thesis.
I could not have done my PhD work without the advice and support of current and former Feinberg lab members, who have had an impact on every aspect of this thesis. Akiko Doi and Brian Herb always offered great feedback on experiments and encouraged me to move forward during my graduate coursework. I have been very lucky to have Amy Vandiver as my bench mate for the past 5 years; she has provided useful scientific suggestions and warm support for every decision related to graduate work and even my personal life. I’m grateful to Carolina Montano, Lindsay Rizzardi, Hwajin Lee, Xin Li, Yun Liu, Hong Ji, Peter Murakami, Michael Multhaup, Varenka Rodriguez, and Elisabet Pujadas for useful discussions, feedback, and their friendship. I thank Rakel Tryggvadottir and Arni Runarsson for their experimental help.
I am grateful to the Samsung Scholarship for financial support during the first five years, and to the Mogam Science Scholarship Foundation for financial support during the last year of my graduate study.
Many friends have supported me and helped me stay centered throughout the past 6 years. I have been fortunate to have warmhearted CMM classmates, who helped me adapt to a new culture. Friends in the Korean community and in Korea have always been there to listen to my problems and have given me great emotional support.
Last but not least, this long journey through graduate school would have been impossible without the warm support of my family. My grandparents, Jongsung Jung and Oksoon Cho, offered priceless life lessons and taught me how to be a polite person. My parents, Insoo Cheong and Geumsook Choi, have provided unconditional love and support for every decision I have made. My two younger sisters, Gayoung Jung and Chaeyoung Jung, have been my best friends throughout my life, giving me emotional support.
I will always be grateful to all the individuals mentioned above who shaped me as a professional scientist.
Table of Contents
Title Page ...... i
Abstract ...... ii
Acknowledgements ...... iv
Table of Contents ...... vii
List of Tables ...... viii
List of Figures ...... xi
Chapter 1: Introduction ...... 1
Chapter 2: Epigenetic signature of leukemia stem cell ...... 27
Chapter 3: Epigenetic basis of human normal hematopoietic development ...... 135
Chapter 4: The cell of origin of leukemia stem cell ...... 168
References ...... 190
Curriculum Vitae ...... 207
Appendix ...... See attached files
List of Tables
Table 1.1. French-American-British (FAB) classification ...... 25
Table 1.2. Cytogenetics and prognosis ...... 26
Table 2.1. Clinical features of AML patients in this study ...... 60
Table 2.2. Genetic mutations identified ...... 61
Table 2.3. Engraftment of AML Subpopulations ...... 62
Table 2.4. DMRs of LSC vs Blast ...... See Appendix1
Table 2.5. Summary of LSC vs Blast DMRs ...... 64
Table 2.6. LSC epigenetic signature ...... 65
Table 2.7. Second DMR analysis to examine confounding effect of MLL cases ...... 70
Table 2.8. Ingenuity pathway analysis ...... 71
Table 2.9. Ingenuity upstream regulator analysis ...... 77
Table 2.10. Association of LSC epigenetic signature with DMRs of genetic mutations ...... 120
Table 2.11. Multivariate analysis of overall survival of TCGA patients using either DNA methylation or gene expression ...... 125
Table 2.12. Univariate overall survival analysis for LSC epigenetic signature regarding differential gene expression in various cohorts ...... 126
Table 2.13. Multivariate overall survival analysis for LSC epigenetic signature regarding differential gene expression in various cohorts ...... 127
Table 2.14. Univariate overall survival analysis for genetic mutations in epigenome modifying enzymes in TCGA ...... 128
Table 2.15. Multivariate overall survival analysis including DNMT3A mutation for LSC epigenetic signature in TCGA ...... 129
Table 2.16. Multivariate overall survival analysis for LSC epigenetic signature within intermediate cytogenetic risk patients in TCGA ...... 130
Table 2.17. Antibodies for Flow Cytometry ...... 131
Table 2.18. Primers used for sequencing of TET2, IDH1, IDH2, and DNMT3A mutations of AML ...... 132
Table 3.1. Normal bone marrow donor sample analysis ...... 158
Table 3.2. Antibodies for Flow Cytometry ...... 159
Table 3.3. DMR lists for pair-wise comparisons among HSPCs ...... See Appendix2
Table 3.4. Summary of DMRs identified in the indicated pairwise comparisons ...... 161
Table 3.5. Enrichment of DMRs for normal hematopoiesis in super-enhancers ...... 162
Table 3.6. DMRs of normal hematopoiesis located in super-enhancer of different tissues and cell types ...... See Appendix3
Table 3.7. Common genes between mouse and human hematopoiesis ...... See Appendix4
Table 3.8. Primers for bisulfite pyrosequencing ...... 165
Table 4.1. DMRs for normal hematopoiesis ...... 181
Table 4.2. FAB type distribution for L-MPP-like and GMP-like AML samples ...... 189
List of Figures
Figure 1.1. The role of epigenetics in multicellular organism ...... 24
Figure 2.1. Pre-sort and post-sort FACS analysis of subpopulations from human AML ...... 44
Figure 2.2. Gene expression inversely correlates with DMRs at CpG island and open sea ...... 46
Figure 2.3. Gene body methylation doesn’t show statistically significant positive correlation with gene expression ...... 48
Figure 2.4. AML LSC and Blasts exhibit epigenetic differences that define an LSC epigenetic signature ...... 51
Figure 2.5. NPM1 mutation is associated with decreased methylation and increased expression of HOXA genes ...... 52
Figure 2.6. The LSC epigenetic signature is partially dependent on underlying somatic mutations ...... 55
Figure 2.7. The LSC epigenetic signature is associated with overall survival in human AML ...... 56
Figure 2.8. The gene expression of the LSC epigenetic signature highly correlates with clinical outcome in the TCGA dataset ...... 58
Figure 2.9. R script for multivariate survival analysis ...... 59
Figure 3.1. Schematic of human hematopoiesis with the immunophenotype of individual HSPC ...... 148
Figure 3.2. Pre-sort and post-sort FACS analysis of HSPCs from human bone marrow ...... 149
Figure 3.3. Comprehensive DNA methylation analysis shows tight clustering of human hematopoietic stem and progenitor cells (HSPCs) ...... 150
Figure 3.4. DMR plots indicating genomic loci for genes with previously known functions in hematopoiesis MPO and CDK6 ...... 151
Figure 3.5. DMR plots indicating genomic loci for newly identified genes with previously unknown functions in hematopoiesis HMHB1 and MIR539 ...... 152
Figure 3.6. Location of DMRs for normal hematopoiesis relative to CpG island ...... 153
Figure 3.7. Gene expression inversely correlates with DMRs at non-CpG island regions in normal hematopoiesis ...... 154
Figure 3.8. Examples of DMRs located in master transcription factor or super-enhancer ...... 156
Figure 3.9. Global methylation changes during hematopoietic development in human and mouse ...... 157
Figure 4.1. Epigenetic signatures define subgroups of AML LSC reflecting the cell of origin ...... 175
Figure 4.2. Clustering analysis of AML populations with normal HSPCs using 216 length-matched random regions ...... 176
Figure 4.3. Epigenetic signatures define subgroups of AML samples in TCGA reflecting the cell of origin ...... 177
Figure 4.4. Cell identity of TCGA AML samples ...... 178
Figure 4.5. Distribution of FAB types for L-MPP-like and GMP-like TCGA AML samples ...... 179
Figure 4.6. Correlation of disease features with cell identity ...... 180
Chapter 1
Introduction
1. Epigenetics
Overview
Epigenetics is the study of information, other than the primary DNA sequence, that is heritable through cell division. Epigenetics is involved in many cellular processes such as embryonic development, differentiation, and responses to environmental stimuli. Epigenetic mechanisms include DNA methylation, histone modification, chromatin factors, chromatin structure, and noncoding RNAs. Dysregulation of epigenetic modifications is implicated in many diseases, including cancer.
Definition (historical and modern)
Conrad Waddington first coined the term ‘epigenetics’ in 1942 to describe the mechanisms by which genotype brings about phenotype during development (Waddington, 2012). He proposed the ‘epigenetic landscape’ as a model of cellular development in which a ball rolls down a hill toward the lowest points (Goldberg et al., 2007). The epigenetic landscape is a metaphor for how a cell (the ball) decides its fate (the lowest points) (Goldberg et al., 2007). The role of epigenetics is evident in multicellular organisms, where the vast majority of cells have the same genomic sequence yet exhibit distinct cellular phenotypes. All cells in the human body originate from the same source, the pluripotent stem cells in the inner cell mass (ICM) of a blastocyst. These pluripotent stem cells differentiate into different cell types, such as neurons, hepatocytes, and tubular epithelial cells, which constitute the brain, liver, and kidney, respectively. Because all of these cells derive from the pluripotent stem cells of the ICM, they share the same genomic sequence yet display distinct phenotypes and functions (Figure 1.1). To investigate the role of epigenetics systematically across different levels of biological processes, the NIH Roadmap Epigenomics Project was launched in 2008 (Bernstein et al., 2010). The NIH Roadmap Epigenomics Project defines epigenetics as both heritable changes and other stable, long-term changes in the regulation of gene expression in a cell. Collective efforts to provide a public resource of epigenomic information have generated comprehensive epigenomic maps of DNA methylation, histone modifications, DNA-binding proteins, chromatin accessibility, and noncoding RNAs in diverse cell types and tissues (Bernstein et al., 2010; Roadmap Epigenomics et al., 2015). Among these epigenetic modifications, DNA methylation is the focus of this thesis.
DNA methylation
In mammals, DNA methylation generally refers to a covalent modification in which a methyl group is attached to the fifth carbon of cytosine in CpG dinucleotides, producing 5-methylcytosine (5mC). Note that non-CG methylation in CHG and CHH (H = A, C, or T) contexts has been reported in stem cells, neurons, and other differentiated tissues, yet its functional role remains an active area of research (Lister et al., 2013; Lister et al., 2009; Ramsahoye et al., 2000; Schultz et al., 2015). DNA methylation is established and maintained by three DNA methyltransferases: DNMT1, DNMT3A, and DNMT3B (Li et al., 1992; Okano et al., 1999). These enzymes use S-adenosyl methionine (SAM) as the methyl group donor. DNMT1 is responsible for methylating hemimethylated DNA after DNA replication, whereas DNMT3A and DNMT3B can methylate both hemimethylated and unmethylated DNA and therefore serve as de novo methyltransferases (Leonhardt et al., 1992; Okano et al., 1999). The human genome contains about 28 million CpGs, and 60-80% of them are generally methylated (Smith and Meissner, 2013). A small fraction (~10%) of CpGs is clustered into largely unmethylated regions called CpG islands, which are predominantly located in the promoters of protein-coding genes (Bird, 1986; Deaton and Bird, 2011). Maintaining the unmethylated state of CpG islands requires transcription factor binding, deposition of the histone variant H2A.Z, and trimethylation of histone H3 at lysine 4 (H3K4me3), which inhibit the binding of DNA methyltransferases (Brandeis et al., 1994; Conerly et al., 2010; Macleod et al., 1994; Otani et al., 2009). Repression of genes with CpG island promoters is largely mediated by histone modification, particularly trimethylation of histone H3 at lysine 27 (H3K27me3), and by polycomb proteins (Bartke et al., 2010; Brinkman et al., 2012; Jones, 2012). However, DNA methylation of CpG island promoters does occur to silence specific CpG island-containing genes, for example during X-chromosome inactivation and genomic imprinting (Jones, 2012).
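The CpG island concept can be made concrete with the classical Gardiner-Garden and Frommer criteria. The thesis does not state these criteria, so they are an assumption here: a sequence of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The Python sketch below tests a single sequence against these thresholds. The function names are hypothetical. Genome-wide island annotations typically scan the genome with sliding windows rather than testing one sequence at a time.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for one DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return 0.0, 0.0
    c, g = s.count("C"), s.count("G")
    # str.count is non-overlapping, which is exact for "CG" (it cannot overlap itself).
    cpg = s.count("CG")
    gc_fraction = (c + g) / n
    expected = c * g / n  # CpGs expected if C and G were placed independently
    obs_exp = cpg / expected if expected else 0.0
    return gc_fraction, obs_exp


def is_cpg_island(seq: str, min_len: int = 200,
                  min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Assumed Gardiner-Garden/Frommer-style thresholds, not the thesis's annotation."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp


print(is_cpg_island("CG" * 100))   # True
print(is_cpg_island("ACGT" * 50))  # False: GC content is exactly 50%, not above it
```

The strict inequalities mirror the "above 50%" and "above 0.6" wording. Annotation pipelines differ in such edge cases.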
Recently, this CpG island-centric view has been challenged as array and sequencing technologies have improved the measurement of DNA methylation. Evidence has accumulated that other genomic regions also play important roles in development and disease progression. A large portion (~80%) of methylation differences among tissues in human and mouse has been reported to occur not in CpG islands but in ‘CpG island shores’, regions located up to 2 kb from CpG islands (Irizarry et al., 2009). Beyond normal tissue development, most differentially methylated regions (DMRs) that distinguish human colon cancer from normal colon tissue, human induced pluripotent stem (iPS) cells from fibroblasts, and murine hematopoietic stem and progenitor cell (HSPC) populations from one another are located in CpG island shores (Doi et al., 2009; Hansen et al., 2011; Irizarry et al., 2009; Ji et al., 2010). Interestingly, the tissue-specific, cancer-specific, and reprogramming-specific DMRs (tDMRs, cDMRs, and rDMRs, respectively) showed statistically significant overlap with each other. This overlap suggests that there may be a core set of genomic regions targeted for epigenetic regulation during normal development and disease progression. In addition, DNA methylation at CpG island shores showed a strong inverse correlation with gene expression, whereas DNA methylation at CpG islands did not show a statistically significant correlation with gene expression in these data sets. This contrast suggests that CpG island shores are functionally important in diverse biological contexts (Doi et al., 2009; Irizarry et al., 2009; Ji et al., 2010).
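The shore definition above can be sketched as a simple annotation function. In this minimal, hypothetical example, islands are sorted, non-overlapping, half-open intervals on a single chromosome. "Shore" means within 2 kb of an island, as in the text. The "shelf" (2-4 kb) and "open sea" labels follow the convention of Illumina's 450K annotation and are an assumption here.

```python
from bisect import bisect_right


def classify_cpg(pos: int, islands: list[tuple[int, int]],
                 shore_bp: int = 2000, shelf_bp: int = 4000) -> str:
    """Label a CpG position relative to CpG islands on the same chromosome.

    islands: sorted, non-overlapping half-open (start, end) intervals.
    Shelf/open-sea labels are an assumed 450K-style convention.
    """
    starts = [start for start, _ in islands]
    i = bisect_right(starts, pos)
    nearest = None
    # With sorted, non-overlapping islands, only the island starting at or
    # before pos (index i - 1) and the next one (index i) can be closest.
    for j in (i - 1, i):
        if 0 <= j < len(islands):
            start, end = islands[j]
            if start <= pos < end:
                return "island"
            dist = start - pos if pos < start else pos - end + 1
            nearest = dist if nearest is None else min(nearest, dist)
    if nearest is not None and nearest <= shore_bp:
        return "shore"
    if nearest is not None and nearest <= shelf_bp:
        return "shelf"
    return "open sea"


islands = [(1000, 2000), (10000, 10500)]
for p in (1500, 2500, 5000, 20000):
    print(p, classify_cpg(p, islands))
# 1500 island, 2500 shore, 5000 shelf, 20000 open sea
```

Binary search keeps each lookup logarithmic in the number of islands. This matters when annotating hundreds of thousands of array probes.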
In cancer epigenetics, it has long been thought that hypermethylation of CpG islands is a core mechanism of cancer progression, yet various types of alterations have been observed in other genomic regions. For example, in colon cancer, hypermethylation is a major type of change in CpG island shores (Irizarry et al., 2009). Large hypomethylated domains, termed ‘blocks’, that cover about half of the genome have been identified in solid tumors, with increased variation in the expression of genes inside the blocks (Hansen et al., 2011; Timp et al., 2014; Timp and Feinberg, 2013). Stochastic variation of DNA methylation in cDMRs distinguished solid tumors from their normal counterparts and distinguished different tumor stages (Hansen et al., 2011; Timp et al., 2014). Thus, increased epigenetic plasticity arising from epigenetic dysregulation, together with genetic mutation, may be a critical mechanism of cancer progression (Feinberg et al., 2006; Timp and Feinberg, 2013).
Recently, such large hypomethylated domains have also been observed in aging. Sun-exposed epidermal samples from older people (age > 65) showed hypomethylated blocks that overlapped with those in colon cancer and squamous cell carcinoma (Vandiver et al., 2015). The hypomethylated blocks largely overlap with higher-order genomic domains such as large organized chromatin lysine modifications (LOCKs) and nuclear lamina-associated domains (LADs). This overlap suggests that large-scale alterations in DNA methylation are associated with dysregulation of these large genomic domains in disease progression.
Beyond CpG islands and shores, gene body methylation has received considerable attention, yet its functional importance remains to be determined. DNA methylation in gene bodies has traditionally been thought to silence repetitive DNA sequences, such as retroviruses and LINE elements (Yoder et al., 1997). Recent evidence suggests that gene body methylation positively correlates with gene expression in normal tissues and cancer samples (Kulis et al., 2012; Varley et al., 2013; Yang et al., 2014). Potential mechanisms by which gene body methylation regulates gene expression include effects on transcription elongation and splicing (Jones, 2012; Laurent et al., 2010). However, the role of gene body methylation in transcriptional regulation remains less clear than that of methylation in other genomic regions.
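Whether methylation in a region tracks gene expression inversely (as reported for CpG island shores) or positively (as reported for gene bodies) depends on the sign of a correlation across samples. The sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman, the function names, and the example values are illustrative assumptions, not data or methods from this thesis.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation = Pearson correlation of the average ranks.

    Undefined (raises ZeroDivisionError) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for one region across five samples (not thesis data).
methylation = [0.9, 0.7, 0.4, 0.2, 0.1]  # beta values
expression = [1.0, 2.5, 4.0, 6.2, 8.1]   # arbitrary expression units
print(spearman(methylation, expression))  # -1.0: perfectly inverse ranking
```

A rank-based measure is used here because it is insensitive to the nonlinear scale of methylation and expression values. A real analysis would also need a significance test and correction for multiple comparisons.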
DNA demethylation
Although DNA methylation is a stable covalent modification of genomic DNA, it is also reversible. Both passive and active loss of DNA methylation can occur through diverse biological processes. Passive DNA demethylation occurs over successive rounds of DNA replication in the absence of functional DNMT1 activity (Kohli and Zhang, 2013). Recent identification of enzymes that remove the methyl group from cytosine has advanced our understanding of active DNA demethylation. Ten-eleven translocation (TET) family enzymes, TET1, TET2, and TET3, oxidize 5-methylcytosine (5mC) to generate intermediate products including 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC) (Ito et al., 2010; Ito et al., 2011; Kriaucionis and Heintz, 2009; Tahiliani et al., 2009). These oxidized 5mC intermediates are then removed in one of three ways: passively through successive DNA replication, directly, or through DNA repair pathways (Kohli and Zhang, 2013). Among these, the base excision repair (BER) pathway has been actively investigated. Thymine DNA glycosylase (TDG), a BER enzyme, is known to remove thymine from G-T mismatches in DNA (Cortazar et al., 2007). Recently, TDG was reported to be required for epigenetic stability during embryonic development in mice (Cortellino et al., 2011). Because TDG removes thymine from G-T mispairs, it has been hypothesized that 5mC or oxidized 5mC may first be converted to thymine or uracil by a deaminase. Several studies have suggested that AID/APOBEC enzymes, known cytosine deaminases, deaminate 5mC during reprogramming or embryonic development (Bhutani et al., 2010; Kumar et al., 2013; Popp et al., 2010). However, the role of these deaminases in DNA demethylation remains controversial because of their limited activity on modified cytosines (Kohli and Zhang, 2013). In addition to deaminase-mediated BER, other studies have shown that TDG can directly remove 5fC and 5caC (He et al., 2011; Maiti and Drohat, 2011). DNA demethylation is implicated in multiple biological processes, including pre-implantation methylation dynamics, primordial germ cell (PGC) reprogramming, maintenance of stem cell pluripotency, and cancer development (Kohli and Zhang, 2013).
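Passive demethylation can be illustrated with a deliberately simplified model. This model is an assumption, not a published kinetic model. Parental strands keep their methylation. Each newly synthesized strand copies a methylated template CpG with probability equal to a maintenance efficiency e, which stands in for DNMT1 activity. No de novo methylation or active demethylation occurs. Under these assumptions, the expected methylation level updates as m_next = m(1 + e)/2, so with no maintenance (e = 0) methylation halves at every division.

```python
def passive_dilution(m0: float, divisions: int,
                     maintenance: float = 0.0) -> list[float]:
    """Expected methylation level at one CpG after each cell division.

    Simplified model (an assumption, not from the thesis): parental strands
    keep their methylation, each new strand is methylated with probability
    m * maintenance (methylated template copied by DNMT1-like activity), and
    there is no de novo methylation or active demethylation.
    """
    levels = [m0]
    m = m0
    for _ in range(divisions):
        # Half of the strands are parental (level m), half are new (m * maintenance).
        m = (m + m * maintenance) / 2
        levels.append(m)
    return levels


print(passive_dilution(1.0, 3))                   # [1.0, 0.5, 0.25, 0.125]
print(passive_dilution(0.8, 2, maintenance=1.0))  # [0.8, 0.8, 0.8]
```

With full maintenance, the level is preserved indefinitely. Any loss of DNMT1 activity produces geometric decay with each replication, which is the sense in which the loss is passive.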
Methods to measure DNA methylation
Array-based (CHARM and Illumina Infinium HumanMethylation450 BeadChip)
In 2008, the comprehensive high-throughput arrays for relative methylation (CHARM) method was developed as the first platform to interrogate DNA methylation genome-wide without a bias toward CpG islands (Irizarry et al., 2008). CHARM uses a methylation-dependent restriction enzyme, McrBC, that cleaves DNA containing methylated cytosines. Sheared genomic DNA is divided into two fractions: one serves as an undigested control and the other is digested with McrBC. Both fractions undergo size selection, and the size-selected DNA is amplified and hybridized to a tiling array designed to exclude isolated CpGs. The log ratio of the array signal intensities of the untreated and McrBC-treated samples (M-value) is then computed. Because the methylation states of neighboring CpGs tend to be correlated, the M-value is averaged within a given genomic region of interest. This process, called ‘genome-weighted smoothing’, improves the accuracy and specificity of methylation measurement at CpG sites (Irizarry et al., 2008). Many studies have applied CHARM to identify genome-wide differential methylation in diverse model systems, including tissue development, cellular reprogramming, hematopoiesis, cancer, and the behavior of social insects (Doi et al., 2009; Herb et al., 2012; Irizarry et al., 2009; Ji et al., 2010; Kim et al., 2010; Kim et al., 2011).
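The CHARM read-out described above can be sketched in a few lines. The per-probe M-value is the log2 ratio of untreated to McrBC-digested signal. Methylated regions are cleaved by McrBC and depleted from the digested fraction, so they give higher values. The smoothing step here is a plain moving average within a hypothetical 300 bp window. It is a simplified stand-in for, not an implementation of, the genome-weighted smoothing of Irizarry et al. (2008).

```python
import math


def charm_m_values(untreated: list[float], treated: list[float]) -> list[float]:
    """Per-probe log2(untreated / McrBC-treated); higher means more methylated."""
    return [math.log2(u / t) for u, t in zip(untreated, treated)]


def moving_average(positions: list[int], values: list[float],
                   window_bp: int = 300) -> list[float]:
    """Average each probe with all probes within +/- window_bp (hypothetical window).

    A plain moving average, not the published genome-weighted smoothing.
    """
    smoothed = []
    for p in positions:
        nearby = [v for q, v in zip(positions, values) if abs(q - p) <= window_bp]
        smoothed.append(sum(nearby) / len(nearby))
    return smoothed


m = charm_m_values([8.0, 4.0, 6.0], [2.0, 4.0, 0.75])
print(m)                                  # [2.0, 0.0, 3.0]
print(moving_average([0, 100, 1000], m))  # [1.0, 1.0, 3.0]
```

Probes far from any neighbor, such as the one at position 1000, are left effectively unsmoothed. This illustrates why smoothing helps most in densely tiled regions.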
Illumina has developed commercially available arrays to measure genome-wide DNA methylation: the Infinium HumanMethylation27 BeadChip (27K) and the HumanMethylation450 BeadChip (450K). As their names imply, the 27K array covers about 27,000 CpG sites, mostly in CpG islands. The 450K array, in contrast, covers about 480,000 CpG sites in diverse genomic regions beyond CpG islands, including sites selected from previous studies of tDMRs, cDMRs, rDMRs, non-CpG methylation, and miRNA promoter regions.
In this section, we focus on the 450K array, the method used in this thesis. The 450K array uses bisulfite conversion followed by genotyping of the resulting C/T polymorphism to measure methylation at each CpG site quantitatively. After bisulfite conversion, methylated cytosine remains cytosine, whereas unmethylated cytosine is converted to uracil (Clark et al., 1994; Frommer et al., 1992). Note that this method does not distinguish 5mC from 5hmC, because 5hmC also remains cytosine after bisulfite conversion (Huang et al., 2010). For the 450K array, the DNA sample is treated with bisulfite and then whole-genome amplified. The bisulfite-treated, amplified DNA is hybridized to the 450K array, which reports the methylation level at the CpG site targeted by each probe (Dedeurwaerder et al., 2011).
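Array software summarizes each 450K probe as a methylated signal and an unmethylated signal. The sketch below converts these into two commonly used summaries. The first is the beta value, meth / (meth + unmeth + 100), where the offset of 100 follows Illumina's conventional definition. The second is the log2-ratio M-value, computed with a small offset alpha. The offsets and function names are assumptions for illustration, and the 450K M-value is distinct from the CHARM M-value defined earlier.

```python
import math


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Methylated fraction at one probe: meth / (meth + unmeth + offset)."""
    # Guard against negative signals after background correction.
    meth, unmeth = max(meth, 0.0), max(unmeth, 0.0)
    return meth / (meth + unmeth + offset)


def m_value(meth: float, unmeth: float, alpha: float = 1.0) -> float:
    """log2 ratio of methylated to unmethylated signal, with offset alpha."""
    return math.log2((max(meth, 0.0) + alpha) / (max(unmeth, 0.0) + alpha))


print(beta_value(900.0, 0.0))  # 0.9
print(m_value(7.0, 1.0))       # 2.0
```

The beta value reads directly as a fraction methylated. The M-value has better statistical properties near 0 and 1, which is why both summaries are in common use.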
Sequencing-based (Bisulfite pyrosequencing and whole-genome bisulfite sequencing (WGBS))
Two levels of sequencing technology are widely used to measure DNA methylation: bisulfite pyrosequencing for small genomic regions and WGBS at the genome-wide level. Bisulfite pyrosequencing is based on bisulfite conversion. As explained above, methylated cytosine is not affected by bisulfite treatment, whereas unmethylated cytosine is converted to uracil, which is read as thymine after PCR amplification. Stepwise incorporation of deoxynucleotide triphosphates (dNTPs) during strand extension releases pyrophosphate, which ATP sulfurylase converts to ATP. Luciferase then uses this ATP to convert luciferin to oxyluciferin, producing light that is detected by the pyrosequencing instrument. The intensity of the light is proportional to the number of nucleotides incorporated at a given position. The C-to-T ratio at the cytosine of a CpG site can therefore be measured quantitatively from the relative amounts of dCTP and dTTP incorporation, as inferred from the light intensity (Bassil et al., 2013). Bisulfite pyrosequencing has been used to quantify methylation at individual CpG sites and to validate results from array-based methods (Doi et al., 2009; Herb et al., 2012; Ji et al., 2010; Kim et al., 2010).
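The bisulfite logic and the pyrosequencing read-out can be sketched together. The first function shows the expected top-strand sequence after bisulfite treatment and PCR. Unmethylated C is read as T, while methylated C (or 5hmC, which bisulfite cannot distinguish) remains C. Positions are 0-based indices, an assumption of this sketch. The second function computes percent methylation as C / (C + T) from the two signals, which is a simplification of the analysis done by instrument software.

```python
def bisulfite_readout(seq: str, methylated: set[int]) -> str:
    """Expected top-strand sequence after bisulfite treatment and PCR.

    Unmethylated C becomes U and is read as T after PCR; a C whose 0-based
    index is in `methylated` stays C (5hmC would also stay C).
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


def percent_methylation(c_signal: float, t_signal: float) -> float:
    """Methylation (%) at one CpG from the C and T signals at that position."""
    return 100.0 * c_signal / (c_signal + t_signal)


# Two CpGs (indices 1 and 5); only index 1 is methylated.
# The non-CpG C at index 4 is unmethylated and is converted as well.
print(bisulfite_readout("ACGTCCGA", {1}))  # ACGTTTGA
print(percent_methylation(30.0, 70.0))     # 30.0
```

Because conversion of non-CpG cytosines is expected to be near complete, residual C at non-CpG positions is commonly used as a check on bisulfite conversion efficiency.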
Next-generation sequencing technology has enabled the development of WGBS, allowing researchers to measure DNA methylation quantitatively at CpG sites genome-wide. WGBS uses bisulfite conversion as in bisulfite pyrosequencing, but the bisulfite-converted DNA undergoes next-generation sequencing instead of region-specific amplification. The high-throughput sequencing data provide counts of reads reporting cytosine versus thymine at each CpG, yielding a quantitative measure of DNA methylation across the CpG sites of the genome (Laird, 2010; Lister et al., 2009). Several statistical methods and software packages, such as BSmooth and Bismark, have been developed to analyze WGBS data. Among these, BSmooth uses a smoothing algorithm to provide relatively accurate estimates of DNA methylation at individual CpG sites from low-coverage WGBS data (Hansen et al., 2012). WGBS has enabled researchers to discover not only DMRs but also non-CG methylation in stem cells and large-scale DNA methylation changes in cancer and in sun-exposed skin of elderly people (Hansen et al., 2011; Lister et al., 2009; Vandiver et al., 2015).
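WGBS data reduce to two numbers per CpG: the count of reads reporting C (methylated) and the count reporting T (unmethylated). The sketch below computes the raw per-CpG level. It then pools reads from neighboring CpGs within a hypothetical window to borrow strength at low coverage. This pooling only conveys the idea behind smoothing. It is not the BSmooth algorithm, which fits a local likelihood model (Hansen et al., 2012).

```python
from typing import Optional


def per_cpg_methylation(c_reads: list[int], t_reads: list[int]) -> list[Optional[float]]:
    """Raw level per CpG: C reads / (C + T reads); None where a CpG has no coverage."""
    return [c / (c + t) if c + t else None for c, t in zip(c_reads, t_reads)]


def pooled_methylation(positions: list[int], c_reads: list[int],
                       t_reads: list[int], window_bp: int = 1000) -> list[Optional[float]]:
    """Pool reads from all CpGs within +/- window_bp of each CpG (hypothetical window)."""
    pooled = []
    for p in positions:
        c_sum = t_sum = 0
        for q, c, t in zip(positions, c_reads, t_reads):
            if abs(q - p) <= window_bp:
                c_sum += c
                t_sum += t
        pooled.append(c_sum / (c_sum + t_sum) if c_sum + t_sum else None)
    return pooled


positions = [100, 150, 5000]
c_reads, t_reads = [3, 0, 1], [1, 0, 1]
print(per_cpg_methylation(c_reads, t_reads))            # [0.75, None, 0.5]
print(pooled_methylation(positions, c_reads, t_reads))  # [0.75, 0.75, 0.5]
```

The uncovered CpG at position 150 receives an estimate from its neighbor at position 100. This is the basic reason smoothing makes low-coverage WGBS usable.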
2. Hematopoiesis
Overview
Hematopoiesis is one of the best-studied, yet most complex, developmental systems. It is a hierarchical process initiated by hematopoietic stem cells (HSCs), which differentiate into hematopoietic progenitor cells that eventually produce all the mature blood lineages (Chao et al., 2008; Doulatov et al., 2012). Experimental investigation of the hematopoietic system in the mouse was pioneered by Till and McCulloch. They identified a small subset of mouse bone marrow cells that could self-renew and form myeloerythroid colonies (Becker et al., 1963; Till and McCulloch, 1961). These studies led other investigators to develop assays, such as in vitro clonal assays combined with fluorescence-activated cell sorting (FACS), to identify and characterize hematopoietic stem and progenitor populations in mouse and human (Chao et al., 2008; Doulatov et al., 2012). In the classical model of hematopoiesis, fully differentiated blood cells constitute two major lineages: myeloid and lymphoid. Myeloid lineage cells include granulocytes, monocytes, erythrocytes, and megakaryocytes, which give rise to platelets. Lymphoid lineage cells include T, B, and natural killer (NK) cells, which are involved in immune responses (Doulatov et al., 2012).
Human hematopoiesis
Identification of hematopoietic stem and progenitor cells (HSPCs) has advanced our understanding of human hematopoiesis. Multiple cell surface markers have been identified and used to isolate HSPCs. For example, CD34 is a well-known marker of HSPCs that possess regenerative potential (Civin et al., 1984; DiGiusto et al., 1994; Krause et al., 1996). Within the CD34+ HSPCs, additional markers such as CD90, CD38, CD45RA, CD123, and CD10 enable researchers to isolate different components of the hierarchy. Several studies have demonstrated that HSCs reside in the Lin-CD34+CD38-CD90+ population (Chao et al., 2008). Further investigation of the downstream progenitors of HSCs established that multipotent progenitor (MPP) cells are contained in the Lin-CD34+CD38-CD90-CD45RA- population (Majeti et al., 2007). After HSCs give rise to MPPs, MPPs further differentiate into progenitors of the myeloid or lymphoid lineages. In myeloid differentiation, common myeloid progenitors (CMPs) develop into either granulocyte/macrophage progenitors (GMPs) or megakaryocyte/erythrocyte progenitors (MEPs). The interleukin-3 receptor alpha chain (CD123) and CD45RA distinguish CMPs, GMPs, and MEPs. CMPs reside in the Lin-CD34+CD38+CD123+CD45RA- population, GMPs in Lin-CD34+CD38+CD123+CD45RA+, and MEPs in Lin-CD34+CD38+CD123-CD45RA- (Chao et al., 2008). Lymphoid differentiation is more complicated. Lymphoid-primed multipotent progenitors (L-MPPs) can generate lymphoid lineage cells as well as monocytes, macrophages, and dendritic cells, the latter by differentiating into GMPs. The L-MPP population is contained in Lin-CD34+CD38-CD90-CD45RA+ (Doulatov et al., 2010; Goardon et al., 2011). This thesis characterizes DNA methylation differences among human HSPCs and compares epigenetic plasticity between human and mouse hematopoietic development.
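The gating scheme described above can be summarized as a decision tree over marker positivity. In this hypothetical sketch, every cell is assumed to be lineage-negative. Each marker is reduced to a boolean, whereas real gating uses continuous fluorescence intensities, and CMPs and GMPs are often described as CD123-low. CD10 is not modeled.

```python
def classify_hspc(cd34: bool, cd38: bool, cd90: bool,
                  cd45ra: bool, cd123: bool) -> str:
    """Assign a lineage-negative (Lin-) cell to an HSPC gate described in the text.

    Hypothetical sketch: markers are booleans (True = positive).
    """
    if not cd34:
        return "not CD34+"
    if not cd38:                                # CD34+CD38- compartment
        if cd90:
            return "HSC"                        # Lin-CD34+CD38-CD90+
        return "L-MPP" if cd45ra else "MPP"     # CD90-CD45RA+ / CD90-CD45RA-
    if cd123:                                   # CD34+CD38+CD123+
        return "GMP" if cd45ra else "CMP"       # CD45RA+ / CD45RA-
    if not cd45ra:
        return "MEP"                            # CD34+CD38+CD123-CD45RA-
    return "unassigned"                         # CD123-CD45RA+ is not defined in the text


print(classify_hspc(cd34=True, cd38=False, cd90=True, cd45ra=False, cd123=False))  # HSC
print(classify_hspc(cd34=True, cd38=True, cd90=False, cd45ra=True, cd123=True))    # GMP
```

Writing the gates as explicit branches makes it easy to see which marker separates each pair of populations. CD38 splits the stem/multipotent compartment from committed progenitors, and CD45RA marks the lymphoid-primed and granulocyte/macrophage branches.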
DNA methylation in normal hematopoiesis
Two studies have investigated genome-wide DNA methylation in mouse
hematopoietic development using array- or sequencing-based methods (Bock et al., 2012;
Ji et al., 2010). Ji et al. used comprehensive high-throughput arrays for relative methylation (CHARM) to examine 4.6 million CpG sites throughout
the genome in MPPs, common lymphoid progenitors (CLPs), CMPs, GMPs, and
thymocyte progenitors (DN1, DN2, DN3). Global methylation changes accompanied
fate decisions at the myeloid or lymphoid commitment stage: loss of methylation with
myeloid commitment and gain of methylation with lymphoid commitment. This first comprehensive
methylome map of murine hematopoietic progenitor cells identified
potential novel regulators such as Arl4c and Jdp2, as well as Meis1, a previously known transcription factor in hematopoietic differentiation. This study demonstrated
that DNA methylation is a core mechanism of hematopoietic development and that epigenetic plasticity regulates lineage commitment (Ji et al., 2010). In another study, Bock et al. investigated genome-wide DNA methylation of HSCs, MPP1s, MPP2s, CMPs, CLPs,
GMPs, MEPs, CD4+ T cells, CD8+ T cells, B cells, erythrocytes, granulocytes, and monocytes using reduced representation bisulfite sequencing (RRBS). They observed a similar pattern of epigenetic plasticity in cell fate decisions: hypermethylation in CLPs and hypomethylation in CMPs. DNA methylation played a crucial role in silencing genes involved in myeloid differentiation in lymphoid lineage cells and vice versa. For
example, promoters of key transcription factors (TFs) for myeloid differentiation, such as
Tal1, and binding sites of important myeloid TFs, such as Gata1, were highly methylated in
CLPs compared with CMPs. This study demonstrated that combining
DNA methylation and gene expression data accurately inferred the cellular
identity of different blood cells, underscoring the value of DNA methylation in
hematopoietic development (Bock et al., 2012).
For human hematopoiesis, a recent study provided a genome-wide DNA methylation profile of HSPCs. This study used a nano HpaII-tiny-fragment-enrichment-by-ligation-mediated-PCR (nanoHELP) assay to investigate DNA methylation of long-term HSCs (LT-HSCs), short-term HSCs (ST-HSCs), CMPs, and MEPs. Loss of methylation was observed as ST-HSCs differentiated into CMPs, and methylation changes correlated with gene expression at this transition, whereas other
commitment steps, such as the CMP-to-MEP transition, did not show a statistically significant
correlation between DNA methylation and gene expression. These HSC commitment-associated
methylation patterns were able to predict overall patient survival in three
independent AML patient cohorts, indicating the importance of epigenetic regulation for
normal hematopoietic development (Bartholdy et al., 2014). This thesis will compare our
comprehensive methylome map of human hematopoiesis with these mouse studies
and this human study.
3. AML and LSC
AML
AML is a genetically heterogeneous cancer of myeloid lineage blood cells and is characterized by an accumulation of immature myeloid cells in the bone marrow. Because AML arises through diverse pathogenic mechanisms, it is important to identify subgroups based on different features of the disease, such as morphology and genetic alterations (Lowenberg et al., 1999). The French-American-British (FAB) classification, the most common scheme for subdividing this heterogeneous disease, is based on the morphology of leukemic blasts, which reflects their degree of differentiation
(Table 1.1) (Bennett et al., 1976, 1985). Specific cytogenetic abnormalities, such as translocations and other chromosomal rearrangements, correlate with particular FAB subtypes
(Table 1.1). These cytogenetic lesions are used prognostically to predict clinical outcome and relapse rate (Table 1.2) (Byrd et al., 2002; Grimwade et al., 1998; Slovak et al., 2000). Beyond cytogenetic abnormalities, other genetic mutations play an important role in leukemogenesis, as about half of AML cases do not harbor a cytogenetic lesion (Dohner, 2007; Lowenberg et al., 1999). The Cancer Genome Atlas
(TCGA) has provided a public genomic map of 200 AML patients, generated by either whole-genome or whole-exome sequencing. This comprehensive sequencing of a large AML cohort revealed the mutational landscape of AML. Notably, AML genomes carry relatively few mutations compared with solid tumors: 13 mutations on average, only 5 of which are in genes recurrently mutated in AML. Genetic alterations were classified into nine categories according to the biological function of the affected genes: TF fusions (18% of cases), NPM1 mutation (27%), tumor suppressors (16%), DNA methylation enzymes (44%), activated signaling genes (59%), myeloid TFs (22%),
chromatin modifiers (30%), cohesin-complex genes (13%), and spliceosome-complex
genes (14%). TF fusions include PML-RARA, MYH11-CBFB, RUNX1-RUNX1T1, and
PICALM-MLLT10; tumor suppressors include TP53, WT1, and PHF6; DNA methylation enzymes include DNMT3A, DNMT3B,
DNMT1, TET1, TET2, IDH1, and IDH2; activated signaling genes include FLT3, KIT,
KRAS/NRAS, PTPs, and other Tyr or Ser-Thr kinases; myeloid TFs include RUNX1, CEBPA, and others; and chromatin modifiers include MLL-X fusions, MLL-PTD, NUP98-NSD1, ASXL1,
EZH2, KDM6A, and others. This study
suggested that common mutations, such as those in DNMT3A, NPM1, CEBPA, IDH1/2, and RUNX1,
which were mutually exclusive with TF fusions, might be involved in the initiation of AML
(Cancer Genome Atlas Research, 2013).
LSC model
The LSC model postulates that AML, like normal
hematopoiesis, is organized as a hierarchy in which LSCs give rise to leukemic blast cells, just as HSCs give rise to normal progenitors and differentiated cells. In the 1990s, Dick's group showed that a
small subset of CD34+CD38- cells was uniquely able to transplant AML into immunodeficient mice (Bonnet and Dick, 1997; Lapidot et al., 1994). These observations led to
the hypothesis that LSCs possess increased self-renewal capacity, which enables them to
maintain and propagate the disease by generating the bulk of cancer cells (Kreso and Dick,
2014). Later, improved xenotransplantation models revealed that LSC activity can
also be detected in other subpopulations, such as CD34+CD38+ or CD34- cells (Eppert et
al., 2011; Goardon et al., 2011; Martelli et al., 2010; Sarry et al., 2011; Taussig et al.,
2008; Taussig et al., 2010). More recently, surface markers other than CD34 and CD38 have been identified that enrich for LSCs among heterogeneous leukemic cells. C-type lectin-like molecule-1
(CLL-1) was expressed in one third of the CD34+CD38- compartment in 29 AML patients
(van Rhenen et al., 2007). CD96 was highly expressed in the CD34+CD38- compartment of about two-thirds of AML patients (Hosen et al., 2007). T-cell immunoglobulin mucin-3 (TIM3) was elevated in the LSC fraction compared with normal HSCs (Jan et al., 2011; Kikushige et al.,
2010). CD47 was highly expressed in LSCs, and its expression protected LSCs from phagocytosis by macrophages (Majeti et al., 2009b). CD25 and CD32 were also identified as novel markers of LSCs (Saito et al., 2010). Although many studies have reported a variety of LSC surface markers, the heterogeneous expression of these markers across patients suggests a complex immunophenotype that cannot be applied universally in
AML.
Recently, several studies have investigated genome-wide gene expression profiles of LSCs compared with HSCs or leukemia progenitor cells (LPCs) (de Jonge et al., 2011;
Eppert et al., 2011; Gentles et al., 2010; Majeti et al., 2009a). Majeti et al. performed the first genome-wide comparison of gene expression between LSCs and HSCs. They identified 3005 differentially expressed genes, enriched in pathways such as Wnt signaling, MAP kinase signaling, adherens junction, ribosome, and T cell receptor signaling (Majeti et al., 2009a). Using genome-wide gene expression analysis, Gentles et al. identified 52 genes that distinguished
CD34+CD38- LSCs from CD34+CD38+ LPCs; these 52 genes were associated with overall, event-free, and relapse-free survival and with therapeutic response (Gentles et al., 2010). De Jonge et al. compared the CD34+ fraction of AML patients with both the CD34- subfraction and the CD34+
normal progenitor compartment, and found that the top 50 CD34+-specific genes were
able to predict overall survival of AML patients (de Jonge et al., 2011). These studies
used the cell surface markers CD34 and CD38 to define the LSC compartment, following the traditional LSC model. In contrast, Eppert et al. defined LSCs functionally using a xenograft assay. They first sorted AML cells from 16 patients into four fractions based on CD34 and CD38 expression and then performed xenograft assays on these fractions. They compared the gene expression profiles of the functionally validated LSCs with those of non-LSCs and HSCs. The analysis demonstrated that LSCs and HSCs share a core transcriptional program, suggesting that the commonality between the two populations reflects shared 'stemness' properties. These stemness-related genes were associated with clinical outcome (Eppert et al., 2011).
Cell of origin
A number of both mouse and human studies have investigated the cell of origin in
AML. Mouse studies have typically used retroviral oncogene transduction or knock-in models to explore this question and have generally concluded that committed progenitors, in particular CMPs and/or GMPs, serve as the cell of origin in most AML models. In one study of MN1-induced AML, retroviral transduction of single CMPs, but not GMPs or HSCs, resulted in AML, indicating tight restriction of transformation by this oncogene (Heuser et al., 2011). In a second study, using a mouse model of MLL-AF9 AML, the cell of origin influenced biological properties such as gene expression, epigenetics, and drug response (Krivtsov et al., 2013). Both of these studies
highlight the significance of this question for leukemogenesis and potential therapies. In
contrast to mouse models, the cell of origin in human leukemia can only be inferred
from features of the disease. Studies investigating the cell of origin of human AML
using surface immunophenotype and gene expression originally suggested that AML LSCs
arise from HSCs (Kreso and Dick, 2014). However, a more recent study that compared
genome-wide gene expression and surface markers of LSCs with those of normal HSPCs
suggests that LSCs arise from more committed progenitors, including L-MPPs and GMPs
(Goardon et al., 2011). In blast-crisis chronic myeloid leukemia (CML), patient GMPs with beta-catenin activation showed increased self-renewal and leukemogenic potential (Jamieson et al., 2004). Notably, three recent studies reported that leukemogenic mutations are present in HSCs, termed 'pre-leukemic HSCs', which undergo further clonal evolution to give rise to AML LSCs. These studies demonstrated a hierarchy among genetic mutations during clonal evolution of pre-leukemic HSCs.
For example, mutations in epigenome modifying enzymes such as DNMT3A and IDH1/2 occurred earlier than mutations in genes such as FLT3 (activated signaling) and NPM1
(Corces-Zimmerman et al., 2014; Jan et al., 2012; Shlush et al., 2014).
These studies suggest that pre-leukemic HSCs harboring the early-occurring mutations are a cell of origin in AML and may lead to disease relapse after remission.
4. DNA methylation in AML
Somatic mutations in epigenome modifying enzymes in AML
Dysregulation of the epigenome is a common feature of AML, as indicated by the
recent discovery that a number of epigenome modifying genes are mutated in AML.
These include genes involved in the regulation of DNA methylation, such as
IDH1/2, DNMT3A, and TET2, and in the modulation of chromatin modifications, such as
ASXL1, EZH2, and others (Abdel-Wahab et al., 2012; Cancer Genome Atlas Research,
2013; Ley et al., 2010; Yan et al., 2011). Beyond somatic mutations in these epigenome
modifying factors, characterization of DNA methylation in bulk AML cells has revealed great heterogeneity among patients. Figueroa et al. examined ~350 AML patient samples using the HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) assay, interrogating ~14,000 unique gene loci. This study identified 16 distinct clusters of patients based on DNA methylation profiles, some of which were associated with particular genetic aberrations, such as CEBPA and NPM1 mutations and the
AML1-ETO, CBFb-MYH11, and PML-RARA fusions (Figueroa et al., 2010b). This study defined the first epigenetically distinct subtypes of AML and linked them to genetic alterations. However, it was performed before mutations in epigenetic modifiers had been identified. Mutations in epigenome modifying enzymes induce aberrant methylation in AML cells. In particular, AML with IDH1 or IDH2 mutations was associated with globally increased DNA methylation (Cancer Genome Atlas
Research, 2013; Figueroa et al., 2010a). MLL fusions or mutations in NPM1, DNMT3A, or FLT3 were associated with decreased DNA methylation. Notably, nearly half of AML patients (44%) harbor a mutation in a DNA methylation enzyme, suggesting a critical role of this process in leukemogenesis (Cancer
Genome Atlas Research, 2013). Recent studies have investigated how mutations
in DNA methylation enzymes contribute to leukemic transformation. Challen
et al. showed that loss of DNMT3A function impaired HSC differentiation over serial transplantation, while the LT-HSC compartment expanded (Challen et al., 2012).
DNMT3A mutations have been associated with overexpression of HSC-specific genes, such as HOXA and HOXB genes, in AML patients (Yan et al., 2011). As mentioned
previously, TET enzymes are involved in DNA demethylation through the production of
5hmC intermediates. Bone marrow samples from patients with
myeloid malignancies carrying TET2 mutations showed decreased 5hmC and
hypomethylation compared with normal controls, and disruption of TET2 function in
mouse models caused myeloid-skewed differentiation of HSCs (Ko et al., 2010;
Moran-Crusio et al., 2011; Quivoron et al., 2011). TET2 mutation has also been implicated in
clonal hematopoiesis in elderly individuals (Busque et al., 2012; Genovese et al., 2014;
Jaiswal et al., 2014; Xie et al., 2014). Several studies identified recurrent mutations in
IDH1 and IDH2 in AML patients and characterized their function in leukemogenesis (Figueroa et al.,
2010a; Marcucci et al., 2010; Mardis et al., 2009; Ward et al., 2010). IDH1/IDH2
mutations confer a neomorphic enzymatic activity that generates 2-hydroxyglutarate (2-HG) from α-ketoglutarate (Dang et al., 2009; Ward et al., 2010). 2-HG, an oncometabolite,
has been shown to inhibit TET enzyme activity and induce promoter hypermethylation
(Figueroa et al., 2010a; Shih et al., 2012). Together, these studies demonstrate the
significant role of DNA methylation enzymes in regulating the epigenome during
hematopoietic differentiation and disease development.
Genetic mutations in chromatin modifying enzymes also play a significant role in
leukemogenesis. For example, loss-of-function mutations of ASXL1 cause a genome-wide loss of H3K27me3 and collaborate with oncogenes to promote leukemogenesis
(Abdel-Wahab et al., 2012). The role of EZH2 mutation in myeloid malignancies is complex, as both loss-of-function and gain-of-function mutations in EZH2, a histone lysine methyltransferase and member of the PRC2 complex, have been implicated in leukemogenesis (Ernst et al., 2010; Lund et al., 2014).
Clinical implications of epigenetics in AML patients
Mutations in DNMT3A, IDH1, IDH2, and TET2 have been linked to clinical prognosis, including risk stratification and therapeutic response, in AML patients (Shih et al., 2012). DNMT3A mutation has been associated with adverse overall survival in intermediate-risk patients (Ley et al., 2010; Patel et al., 2012). TET2 mutation has also been associated with adverse overall survival in intermediate-risk AML patients, whereas IDH1 or
IDH2 mutation, which frequently co-occurs with NPM1 mutation, has been associated with favorable clinical outcome (Patel et al., 2012).
In addition to the association of mutations in DNA methylation enzymes with clinical outcome, DNA methylation itself has been proposed as a prognostic marker.
Figueroa et al. showed that distinct methylation clusters of AML patients were associated with overall survival and that aberrant DNA methylation of 15 genes could predict overall survival of AML patients (Figueroa et al., 2010b). Furthermore, quantitative DNA methylation analysis has successfully predicted clinical outcome in AML patients (Bullinger et al., 2010). Deneberg et al. reported that hypermethylation of polycomb group (PcG) target genes was associated with favorable clinical outcome in cytogenetically normal AML (CN-AML) (Deneberg et al., 2011).
Beyond its prognostic power, the altered DNA methylome of myeloid malignancies has become a therapeutic target because it is reversible.
The DNMT inhibitors azacitidine and decitabine are widely used to improve clinical outcome in AML (Estey, 2013). Recent studies have reported that drugs targeting chromatin modification could also be effective therapies for AML patients. For example, inhibition of bromodomain-containing protein 4 (BRD4), which recognizes acetylated lysines on histones, by the small-molecule inhibitor JQ1 demonstrated anti-leukemic effects
(Valent and Zuber, 2014; Zuber et al., 2011). Small-molecule inhibitors of DOT1L (disruptor of telomeric silencing 1-like), a histone H3 lysine 79 (H3K79) methyltransferase, have shown therapeutic effects against MLL-fusion AML (Daigle et al., 2013; Daigle et al., 2011).
Epigenetic therapy is an active area of research and clinical trials for AML patients.
[Figure 1.1 schematic; labels: Blastocyst, Pluripotent stem cell, Genomic DNA, Neuron (Brain), Hepatocyte (Liver), Tubular epithelial cell (Kidney)]
Figure 1.1. The role of epigenetics in a multicellular organism.
Table 1.1. French-American-British (FAB) classification.
Type | Name | Cytogenetics
M0 | Undifferentiated acute myeloblastic leukemia |
M1 | Acute myeloblastic leukemia with minimal maturation |
M2 | Acute myeloblastic leukemia with maturation | t(8;21)(q22;q22), t(6;9)
M3 | Acute promyelocytic leukemia (APL) | t(15;17)
M4 | Acute myelomonocytic leukemia | inv(16)(p13q22), del(16q)
M4eo | Acute myelomonocytic leukemia with eosinophilia | inv(16), t(16;16)
M5 | Acute monocytic leukemia | del(11q), t(9;11), t(11;19)
M6 | Acute erythroid leukemia |
M7 | Acute megakaryoblastic leukemia | t(1;22)
Table 1.2. Cytogenetics and prognosis.
Risk group | Cytogenetics | 5-year survival | Relapse rate
Good | t(8;21), t(15;17), inv(16) | 70% | 33%
Intermediate | Normal, +8, +21, +22, del(7q), abnormal 11q23 | 48% | 50%
Poor | -5, -7, del(5q), abnormal 3q, complex cytogenetics | 15% | 78%
Chapter 2
Epigenetic signature of leukemia stem cell
This work is an ongoing project of the Feinberg Lab and Johns Hopkins. All publication
rights are reserved for these institutions and the presentation of this work here does not
preclude future publication elsewhere.
Summary
AML is a hematopoietic malignancy organized as a hierarchy in which LSCs give rise
to Blast cells. Because LSCs are clonally related to Blasts, we hypothesized that particular epigenetic features endow LSCs with their distinct capacity to initiate and propagate the disease. Here, we first demonstrated epigenetic differences between LSCs and Blasts by performing genome-wide methylation analysis. We identified 84 DMRs in
70 genes, termed the LSC epigenetic signature, that showed both differential methylation and differential expression. We found that HOXA cluster genes were enriched in the LSC epigenetic signature, suggesting a critical role for these genes in conferring the unique abilities of LSCs.
The LSC epigenetic signature was partially dependent on mutations in upstream regulators and epigenome modifying enzymes, yet about half of it was independent of genetic mutation. The LSC epigenetic signature showed prognostic power in both DNA methylation and gene expression data sets, independent of previously known clinical factors such as age. These results provide the first evidence for epigenetic variation between LSCs and their blast progeny in AML and, moreover, demonstrate that DMRs define prognostic subgroups of AML.
Results
AML LSC and Blasts Exhibit Epigenetic Differences That Define an LSC Epigenetic Signature
To formally investigate epigenetic differences between LSC and blast progeny, we sought to identify DMRs between functionally defined AML LSC-enriched populations and their downstream non-engrafting blasts in a cohort of 15 primary AML patient samples (Tables 2.1 and 2.2). From each sample, we isolated subpopulations based on the expression of CD34 and CD38: Lin-CD34+CD38-, Lin-CD34+CD38+, and Lin-CD34- (Figure 2.1). We then performed comprehensive genome-scale DNA methylation analysis using the Illumina Infinium HumanMethylation450 BeadChip array. While AML LSC were originally described as exclusively contained in the CD34+CD38- subpopulation, recent reports have indicated that leukemia-initiating cells can also be detected in multiple compartments, including both the CD34+CD38+ and CD34- subpopulations, although usually at lower frequencies
(Eppert et al., 2011; Goardon et al., 2011; Sarry et al., 2011).
To identify LSC and blast populations, we conducted xenotransplantation assays on all three CD34/CD38 subpopulations from each of the 15 AML cases (Table
2.3). Similar to other reports, leukemic engraftment was observed from at least one subpopulation in 10 of the 15 AML patients. As expected, LSC activity decreased markedly along the immunophenotypic hierarchy, with 64.3% of CD34+CD38-,
46.7% of CD34+CD38+, and 26.7% of CD34- subpopulations engrafting in vivo (Table
2.3). To identify epigenetic markers of functional LSC, we performed DMR analysis between the 20 LSC-containing (engrafting) and 24 blast-containing (non-engrafting) fractions (hereafter termed “LSC” and “Blast”). The analysis identified 3030 DMRs, of
which 91.4% were hypomethylated in LSC (Table 2.4; see Appendix 1 and Table 2.5).
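The reported engraftment rates and the 20/24 split can be cross-checked with simple arithmetic. The per-subpopulation counts below are not given in the text. They are inferred because, with at most 15 fractions per subpopulation, they are the only ratios consistent with 64.3%, 46.7%, and 26.7% and with the totals of 20 engrafting and 24 non-engrafting fractions. Together they would imply that 14 rather than 15 CD34+CD38- fractions were assayed, presumably because one was unavailable.

```python
# (engrafting, assayed) fractions per subpopulation. These counts are
# INFERRED from the reported percentages, not stated in the text.
engraftment = {
    "CD34+CD38-": (9, 14),
    "CD34+CD38+": (7, 15),
    "CD34-": (4, 15),
}

# Percentage of assayed fractions that engrafted, rounded as reported.
percent_engrafted = {
    name: round(100 * engrafted / assayed, 1)
    for name, (engrafted, assayed) in engraftment.items()
}

n_lsc = sum(engrafted for engrafted, _ in engraftment.values())    # engrafting
n_assayed = sum(assayed for _, assayed in engraftment.values())
n_blast = n_assayed - n_lsc                                        # non-engrafting

print(percent_engrafted)  # {'CD34+CD38-': 64.3, 'CD34+CD38+': 46.7, 'CD34-': 26.7}
print(n_lsc, n_blast)     # 20 24
```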
These DMRs were further classified according to their location relative to CpG islands:
islands (regions with a GC content greater than 50% and an observed/expected
CpG ratio greater than 0.6), shores (regions within 2 kb of an island), shelves (regions 2
to 4 kb from an island), and open sea (isolated CpG sites in the genome without a
specific designation). Similar proportions of DMRs fell in CpG islands
(27.8%) and open seas (29%) (Table 2.5). In addition, the DMRs strongly correlated
with gene expression at CpG islands and open seas, and most hypomethylated DMRs
in the engrafting populations were associated with transcriptional up-regulation of
the associated genes (Figure 2.2).
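The island criteria and distance bands above can be written down directly. The sketch below is illustrative only. It computes GC content and the observed/expected CpG ratio of a sequence, using the common definition #CpG × length / (#C × #G), and it labels a site as island, shore, shelf, or open sea from its distance to the nearest island. It omits the minimum-length requirement used in standard CpG island definitions. In practice, these labels are usually taken from the array's probe annotation rather than recomputed. All function names are hypothetical.

```python
from typing import Optional


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count
    return seq.count("CG") * len(seq) / (c * g)


def is_island_like(seq: str) -> bool:
    """GC content > 50% and observed/expected CpG ratio > 0.6."""
    return gc_content(seq) > 0.5 and cpg_obs_exp(seq) > 0.6


def island_context(distance_to_island_bp: Optional[int]) -> str:
    """Label a CpG site by its distance (bp) to the nearest island.

    0 means the site lies inside an island; None means no island nearby.
    """
    if distance_to_island_bp is None:
        return "open sea"
    if distance_to_island_bp == 0:
        return "island"
    if distance_to_island_bp <= 2000:
        return "shore"
    if distance_to_island_bp <= 4000:
        return "shelf"
    return "open sea"


print(is_island_like("CGCGAATTCGCG"))  # True
print([island_context(d) for d in (0, 1500, 3000, 10000)])
# ['island', 'shore', 'shelf', 'open sea']
```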
We next integrated DNA methylation with gene expression analysis to
identify an LSC epigenetic signature. We extracted genes that had a DMR passing a p value
cutoff of <0.01 within 2 kb of the transcriptional start site (TSS), exhibited an absolute log2 fold change in expression >0.5 between the LSC and
Blast populations, and showed an inverse relationship between gene expression and DNA
methylation. We excluded gene body DMRs, as gene body methylation showed no statistically significant positive correlation with expression in either the AML or the normal
hematopoiesis comparisons (Figure 2.3). The minimum absolute log2 fold change of 0.5 is similar to the cutoff used for our previous LSC gene expression signature on the same microarray platform (Gentles et al., 2010). With these parameters, we identified 84 regions in 70 unique genes exhibiting differential methylation and gene
expression in LSC compared with Blasts (Table 2.6).
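The signature criteria can be expressed as a single predicate. In this sketch, each candidate pairs a DMR with its gene. Several details are assumptions: the record layout, the sign convention for the methylation difference (LSC minus Blast), and the use of absolute distance to the TSS for the 2 kb window. The example values are toy numbers, not data from this study.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record pairing one DMR with its nearest gene."""
    gene: str
    dmr_p: float          # DMR p value, LSC vs Blast
    meth_diff: float      # methylation difference, LSC minus Blast
    expr_log2fc: float    # log2 expression fold change, LSC vs Blast
    dist_to_tss: int      # distance (bp) from the DMR to the gene's TSS


def in_lsc_signature(c: Candidate, p_cut: float = 0.01,
                     fc_cut: float = 0.5, tss_window: int = 2000) -> bool:
    """Apply the p value, fold-change, direction, and TSS-window filters."""
    return (
        c.dmr_p < p_cut
        and abs(c.expr_log2fc) > fc_cut
        and c.meth_diff * c.expr_log2fc < 0     # methylation and expression move oppositely
        and abs(c.dist_to_tss) <= tss_window    # DMR must lie within 2 kb of the TSS
    )


# Toy candidates (invented values for illustration only)
candidates = [
    Candidate("geneA", 0.001, -0.20, 1.2, 300),    # passes all filters
    Candidate("geneB", 0.001, 0.15, 0.9, 500),     # same direction: fails
    Candidate("geneC", 0.020, -0.30, 2.0, 100),    # p value too high: fails
    Candidate("geneD", 0.004, -0.25, 1.0, 15000),  # outside TSS window: fails
]
print([c.gene for c in candidates if in_lsc_signature(c)])  # ['geneA']
```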
We compared our LSC epigenetic signature with the LSC gene expression
signatures from previous studies (Eppert et al., 2011; Gentles et al., 2010). Only six of the 70 genes were found in these earlier studies, suggesting that most of the genes identified
here comprise a novel LSC signature, defined first by DMR analysis and refined by
gene expression differences. One gene in this signature, REC8, which encodes a kleisin
family protein associated with the cohesin complex, was hypomethylated and
transcriptionally up-regulated in LSC (Figure 2.4a and Table 2.6). Notably, mutations in
components of the cohesin complex have been identified in AML and other tumor types
(Losada, 2014; Thol et al., 2014). We speculate that hypomethylation and increased
expression of REC8 in LSC might be related to cohesin complex activity in LSC. We also
identified HOXA5, HOXA6, HOXA7, HOXA9, and HOXA10 in the LSC epigenetic signature (Figure 2.4b-d and Table 2.6); all of these HOXA cluster genes were hypomethylated and highly expressed in LSC (Table 2.6). HOXA9 is of particular interest (Figure 2.4b), because aberrant HOXA9 expression is known to be involved in increased proliferation of HSPCs and in leukemogenesis, suggesting a critical role in LSC activity (Chung et al., 2006; Lehnertz et al., 2014;
Takeda et al., 2006; Thorsteinsdottir et al., 2002).
Because the MLL subtype is itself associated with changes in expression of
members of the HOXA gene cluster (Drabkin et al., 2002; Milne et al., 2002), we
performed a second DMR analysis excluding the 5 LSC populations from the 2 MLL
patients in our cohort. We observed substantial overlap between the DMRs identified
with and without the MLL cases. For the key LSC epigenetic signature, 81 of 84
DMRs, including those in the HOXA genes, were still present after removal of the MLL cases (Table
2.7). Considering all DMRs with p < 0.01 (not just the LSC signature), the overlap was 77% (Table 2.7). Thus, the presence of the MLL subtype was not a confounding
variable in defining the LSC epigenetic signature.
The LSC Epigenetic Signature is Partially Dependent on Underlying Somatic Mutations
In order to identify important pathways and upstream regulators of LSC activity,
we used Ingenuity Pathway Analysis (IPA). The most significantly enriched pathway
was fatty acid α-oxidation (Table 2.8), and inhibitors of fatty acid oxidation have
previously been shown to induce apoptosis of leukemia cells (Samudio et al., 2010). Ingenuity
upstream regulator analysis identified NPM1, ASXL1, and KAT6A as the most significant upstream regulators of the LSC epigenetic signature genes, acting primarily through regulation of HOXA genes, including HOXA5, HOXA6, HOXA7, HOXA9, and HOXA10 (Table 2.9).
Notably, all three of these upstream regulators are mutated in
AML and likely serve as driver genes (Cancer Genome Atlas Research, 2013). In particular, mutations in ASXL1 and NPM1 have been shown to cooperate with HOX
genes to initiate leukemia by enhancing self-renewal and proliferation of hematopoietic
progenitors (Abdel-Wahab et al., 2012; Vassiliou et al., 2011). Consistent with this, we
observed that NPM1 mutation was associated with decreased methylation and increased expression of HOXA5, HOXA6, HOXA7, HOXA9, and HOXA10 compared with NPM1 wild-type samples in the TCGA cohort (Figure 2.5).
We then sought to investigate the LSC epigenetic signature for its association
with AML mutations in the TCGA cohort (Figure 2.6). The TCGA cohort consists of 200
AML patient samples with associated DNA methylation, gene expression, and full
genotyping from genome/exome sequencing (Cancer Genome Atlas Research, 2013).
First, we identified the epigenetic signatures associated with individual AML mutations
by performing DMR analysis between wild-type and mutant patient samples (Figure 2.6).
The mutations tested included those in epigenome modifying enzymes, such as DNMT3A, IDH1/2, and TET1/2, and in the upstream regulators of our LSC epigenetic signature, NPM1 and ASXL1
(Figure 2.6). KAT6A was not included because none of the patients profiled on methylation arrays carried a KAT6A mutation. Next, we examined the overlap between the mutation-associated DMRs and our LSC epigenetic signature (Figure 2.6).
Each LSC epigenetic signature DMR was classified into three categories: (1) upstream regulator-associated, if differentially methylated in association with any mutation in an upstream regulator; (2) epigenome modifying enzyme-associated, if differentially methylated in association with any mutation in an epigenetic enzyme; or (3) mutation-independent, if differentially methylated in association with neither (Figure 2.6 and Table 2.10). Of the 84 LSC
DMRs, 28 (33.3%) and 27 (32.1%) were associated with upstream regulator or epigenome modifying enzyme mutations, respectively (Figure 2.6 and Table 2.10).
However, 40 DMRs (47.6%), including DMRs in HOXA7 and HOXA9, were mutation-independent (Figure 2.6 and Table 2.10). It should be noted that some of the LSC differentially methylated genes, including HOXA7 and HOXA9, have multiple DMRs regulated by different mechanisms. For example, HOXA7 has 4 DMRs in the LSC epigenetic
signature: one associated with mutation in NPM1, two associated with mutations in
DNMT3A, TET1, and NPM1, and one mutation-independent (Table 2.10). We therefore
numbered the DMRs within such genes individually, for example HOXA9/DMR1 (Figure 2.6 and Table 2.10). A small subset of 11 DMRs was associated with both upstream regulator and epigenetic enzyme mutations, including DMRs in REC8, HOXA6, and HOXA7 (Figure 2.6 and Table 2.10). This analysis showed that all the HOXA genes are epigenetically regulated by at least one upstream regulator, and that HOXA6, HOXA7/DMR2, and HOXA7/DMR3 are common targets of both upstream regulators and epigenetic enzymes (Figure 2.6 and Table 2.10); all of these changes involved DNA hypomethylation. In addition, hypomethylation of HOXA7/DMR1 occurred independently of mutations (Figure 2.6 and Table 2.10). Together, these results suggest that overexpression of HOXA genes mediated by DNA hypomethylation is a core
mechanism for LSC activity.
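The classification rule and the reported counts can be checked together. Because categories (1) and (2) overlap, the number of mutation-independent DMRs follows from inclusion-exclusion. There are 28 + 27 - 11 = 44 mutation-associated DMRs, which leaves 84 - 44 = 40 (47.6%), matching the text. The sketch below implements the rule with Python sets. The function and argument names are hypothetical, and labeling the overlap "both" is a presentation choice, not terminology from this thesis.

```python
def classify_signature_dmrs(signature, upstream_assoc, enzyme_assoc):
    """Label each signature DMR by the mutation classes it is associated with.

    signature:      iterable of DMR IDs (e.g. "HOXA7/DMR1")
    upstream_assoc: set of DMRs differentially methylated with any upstream
                    regulator mutation (e.g. NPM1, ASXL1)
    enzyme_assoc:   set of DMRs differentially methylated with any epigenome
                    modifying enzyme mutation (e.g. DNMT3A, IDH1/2, TET1/2)
    """
    labels = {}
    for dmr in signature:
        up, enz = dmr in upstream_assoc, dmr in enzyme_assoc
        if up and enz:
            labels[dmr] = "both"
        elif up:
            labels[dmr] = "upstream regulator-associated"
        elif enz:
            labels[dmr] = "enzyme-associated"
        else:
            labels[dmr] = "mutation-independent"
    return labels


# Consistency check of the reported counts via inclusion-exclusion
n_total, n_upstream, n_enzyme, n_both = 84, 28, 27, 11
n_associated = n_upstream + n_enzyme - n_both   # 44
n_independent = n_total - n_associated           # 40
print(n_independent, round(100 * n_independent / n_total, 1))  # 40 47.6
```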
The LSC Epigenetic Signature is Associated with Overall Survival in Human AML
We hypothesized that if the LSC epigenetic signature reflected key drivers of the
functional differences between LSC and Blasts, then this signature should be associated
with clinical outcomes in human AML. First, we tested the association between the LSC
epigenetic signature and overall survival in the DNA methylation data from the TCGA
AML cohort (Cancer Genome Atlas Research, 2013). To assign each TCGA patient to an
LSC-like or Blast-like category, we scored each TCGA sample by its probability of being
closer to LSC or to Blasts. Comparable numbers of samples
were assigned to each category by this method (99 Blast-like and 93 LSC-like). In
univariate survival analysis, the LSC-like group showed worse outcome than the
Blast-like group (hazard ratio (HR) = 2.3, 95% confidence interval (CI) 1.6-3.4;
p = 1.07 × 10⁻⁵) (Figure 2.7a). The LSC-like vs Blast-like stratification remained associated with overall survival in multivariate analysis together with other known prognostic factors such as age (considered as a continuous variable), cytogenetic risk (assessed as
high vs low risk and intermediate vs low risk), NPM1, and FLT3 mutations (HR = 1.9,
95% CI 1.2-2.9; p = 0.003; Table 2.11).
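The text does not specify how the probability of being closer to LSC or to Blasts was computed. One simple reading is sketched below purely as an assumption. Each sample is scored by the fraction of signature DMRs at which its methylation value lies closer to the LSC group mean than to the Blast group mean, and it is called LSC-like when that fraction exceeds 0.5. The function names, the per-DMR comparison, and the 0.5 threshold are all hypothetical, and the values are toy numbers.

```python
from typing import Sequence


def lsc_closeness(sample: Sequence[float], lsc_mean: Sequence[float],
                  blast_mean: Sequence[float]) -> float:
    """Fraction of signature DMRs where the sample's methylation is closer
    to the LSC mean than to the Blast mean (hypothetical scoring rule)."""
    if not sample or not (len(sample) == len(lsc_mean) == len(blast_mean)):
        raise ValueError("inputs must be non-empty and of equal length")
    closer = sum(
        abs(s - l) < abs(s - b)   # ties count as not closer to LSC
        for s, l, b in zip(sample, lsc_mean, blast_mean)
    )
    return closer / len(sample)


def assign_group(score: float, threshold: float = 0.5) -> str:
    """Call a sample LSC-like when its closeness score exceeds the threshold."""
    return "LSC-like" if score > threshold else "Blast-like"


# Toy methylation values (beta values between 0 and 1) at three DMRs
lsc_mean = [0.20, 0.30, 0.10]
blast_mean = [0.80, 0.70, 0.60]
score = lsc_closeness([0.25, 0.65, 0.20], lsc_mean, blast_mean)
print(round(score, 3), assign_group(score))  # 0.667 LSC-like
```

A probabilistic classifier trained on the LSC and Blast samples would be an equally plausible reading; this counting rule is used only because it needs no fitted parameters.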
Next, we tested the association between expression of LSC epigenetic signature genes and clinical outcome using four different cohorts: TCGA (Cancer
Genome Atlas Research, 2013), a cohort of normal karyotype patients (Dufour et al.,
2010; Metzeler et al., 2008), and two cohorts of mixed karyotype patients (Valk et al.,
2004; Wilson et al., 2006; Wouters et al., 2009). These cohorts comprise a total of 776
AML patients who were treated on different clinical protocols and exhibited distinct biological characteristics (Gentles et al., 2010). In the TCGA cohort, we observed a strong correlation between each gene's relative expression in LSC versus Blasts and its association with overall survival (correlation = 0.49; p = 4 × 10⁻¹³; Figure 2.8): the more highly expressed a gene was in LSC compared with Blasts, the more robust its association with worse overall survival. In all four cohorts, the overall expression level of the signature genes was significantly associated with overall survival, with higher expression associated with worse clinical outcome
(Table 2.12). In TCGA, this association remained significant in multivariate Cox regression including age (continuous), cytogenetic risk, NPM1, and FLT3 mutations (HR = 1.7, 95%
CI 1.0-2.7; p = 0.03; Table 2.11). Similar results were observed for the three other cohorts in univariate and multivariate analyses (Figures 2.7c-e, Tables 2.12 and 2.13).
Finally, we tested whether mutations in epigenetic enzymes such as DNMT3A, IDH1/2,
TET2, and ASXL1 affected the prognostic impact of the LSC epigenetic signature in the
TCGA cohort. As previously reported, mutation in DNMT3A, but not in the other genes, was associated with patient overall survival (Table 2.14). Multivariate survival analysis including DNMT3A mutation showed that the LSC epigenetic signature
remained independently associated with clinical outcome in both the DNA methylation and gene expression data from TCGA, even when cytogenetic risk group was incorporated
(Table 2.15), as well as within the intermediate cytogenetic risk group alone (Table 2.16).
Overall, these results demonstrate that the LSC epigenetic signature defined by DNA methylation and gene expression is associated with overall survival in human AML.
Discussion
The cancer stem cell (CSC) model was originally proposed based on observations from human AML in which only subpopulations of leukemia-initiating cells, or LSC, possessed engraftment potential (Bonnet and Dick, 1997; Lapidot et al., 1994). According to this model, LSC give rise to downstream Blasts that lack critical stem cell properties. As
LSC and their non-engrafting Blast progeny are clonally related, a major implication of this leukemia stem cell model is that the differences in their functional properties must be due to epigenetic differences. Here, we provide such evidence by characterizing global DNA methylation features of LSC, defined by xenotransplantation of AML subpopulations, in comparison with non-engrafting Blast cells, and show that AML LSC exhibit global hypomethylation relative to non-LSC Blast cells. By integrating DNA methylation and gene expression analyses, we identified 84 regions in 70 genes as the LSC epigenetic signature. Of these 70 genes, 64 had not been reported in previous gene expression studies of LSC (the exceptions being CD34, SH3BP5, RBPMS, LTB, MS4A3, and VNN1) (Eppert et al., 2011;
Gentles et al., 2010). Most of the LSC epigenetic signature was mutation-independent, i.e., not associated with mutations in upstream regulators or epigenome-modifying enzymes, suggesting that leukemogenesis may converge on these primary epigenetic signatures. We
also identified some mutation-associated epigenetically dysregulated genes, including
REC8 and HOXA7. Together, these epigenetic signatures represent potential therapeutic targets regardless of the types of underlying mutations present in individual
AML cases. Furthermore, the LSC epigenetic signature was prognostic of patient overall survival independently of known survival predictors such as age and cytogenetic abnormalities, emphasizing its functional importance.
Apart from its prognostic value, the LSC epigenetic signature represents a potential
molecular target for improving patient survival and preventing relapse. Recently,
epigenetic therapy with the hypomethylating agents azacitidine and decitabine has been
approved for the treatment of AML. Randomized trials demonstrated improved overall
survival compared to chemotherapy, but also indicated a limited effect on relapse rate in
high-risk AML (Estey, 2013). Our results indicate that LSC are relatively
hypomethylated compared to Blasts, suggesting that they may be less effectively targeted
by hypomethylating agents; this could account for the limited efficacy of these agents in
prolonging relapse-free survival. It will therefore be of great interest to determine how the
LSC epigenetic signature is affected by these drugs.
More specifically, the LSC epigenetic signature was markedly enriched for
members of the HOXA cluster, suggesting that this cluster is a key driver of LSC function.
The HOXA cluster has been implicated as a key regulator of hematopoiesis and myeloid malignancy (Alharbi et al., 2013). In particular, HOXA9 is known to be involved in HSPC proliferation and leukemogenesis (Thorsteinsdottir et al., 2002), and in rare cases it even occurs as part of a fusion oncogene (Nakamura et al., 1996). Moreover, increased expression of HOXA9 has been identified as an adverse prognostic factor in
AML (Golub et al., 1999). Additional HOXA family members, including posterior (HOXA7,
HOXA9, and HOXA10) and anterior (HOXA6) members, have been implicated in leukemogenesis, as overexpression of these genes in normal mouse HSPC leads to increased self-renewal, transformation, and development of myeloid malignancies (Bach
et al., 2010). The functional LSC epigenetic signature provided here supports the view that
the HOXA family is a key driver of AML LSC that may function in imparting aberrant
self-renewal.
Materials and Methods
Human Samples
Human acute myeloid leukemia (AML) samples were collected from patient peripheral
blood (PB) or bone marrow (BM) at Stanford Hospital under an IRB-approved
protocol (22264), and informed consent was obtained from all subjects. Mononuclear cells
(PBMC or BMMC) were separated with Ficoll-Paque Plus (Amersham Biosciences, Piscataway, NJ,
Catalog number: 17-1440-03) and cryopreserved in freezing medium (90% FBS +
10% DMSO). All AML experiments were conducted with cryopreserved PBMC or
BMMC samples that were thawed and washed in IMDM medium containing 10% FBS.
Flow Cytometry Analysis and Cell Sorting
A panel of antibodies (Abs) was used for staining, analysis, and sorting of progenitor
cells from AML patient PBMCs/BMMCs, as well as for lineage analysis of human
chimerism/engraftment (Table 2.17). Cells were analyzed or sorted on a FACSAria II
cytometer (BD Biosciences, Franklin Lakes, NJ). Raw flow cytometry data were analyzed with FlowJo software (Tree Star, Ashland, OR).
Xenotransplantation Assay
NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ mice (NSG) were obtained from The Jackson
Laboratory (Bar Harbor, ME) and bred in a specific pathogen-free environment per
Stanford Administrative Panel on Laboratory Animal Care Guidelines (Protocol 22264).
Six- to eight-week-old adult mice received 200 rad of gamma irradiation 2 to 24 hours prior to transplantation. Up to 500,000 freshly sorted
cells of each AML subpopulation were resuspended in 30 µl of Hank's Balanced Salt Solution
(HBSS) (Gibco Life Technologies, Grand Island, NY) containing 2% FBS and injected intravenously via the tail vein using a 29-gauge needle. For each cell subpopulation, at least three technical replicates were performed by transplanting three aliquots of cells into three mice. Approximately 150 mice were used in total. Neither randomization nor blinding was used in this study.
After eight weeks, mice were euthanized with CO2 according to the approved protocol
(22264). BM cells were isolated using scissors and needle flushing, then subjected to hypotonic red cell lysis using ACK (Ammonium-Chloride-Potassium) lysing buffer (Life
Technologies, Grand Island, NY, Catalog# A10492). BMMCs were stained with Ab combinations (Table 2.17) on ice for 30 minutes, and dead cells were excluded by propidium iodide (PI) staining. Human myeloid engraftment (hCD45+CD33+) and lymphoid engraftment (hCD45+CD19+) were analyzed by flow cytometry as described above.
Illumina Infinium Human Methylation 450 Bead Array Assay
Genomic DNA from each sample was purified using the MasterPure DNA Purification Kit (Epicentre) according to the manufacturer's protocol. Genomic DNA (250-500 ng) was treated with sodium bisulfite using the EZ DNA Methylation Kit (Zymo
Research) as recommended by the manufacturer, with the alternative incubation
conditions for the Illumina Infinium Methylation Assay. Converted DNA was eluted in
11 µl of elution buffer. DNA methylation levels were measured using the Illumina Infinium HD
Methylation Assay (Illumina) according to the manufacturer's specifications.
Methylation array data are deposited at the Gene Expression Omnibus (GEO) with
accession number GSE63409.
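The assay relies on sodium bisulfite converting unmethylated cytosines to uracil (read as thymine after amplification) while methylated cytosines remain cytosine, so methylation becomes a C/T difference at each CpG. The toy sketch below illustrates that logic on a single top-strand sequence; it is a conceptual illustration only (the function names are hypothetical, and it ignores strand handling and the Infinium probe chemistry).

```python
def bisulfite_convert(seq, methylated_positions):
    """Expected top-strand sequence after bisulfite conversion and PCR.

    Unmethylated C -> T; cytosines at the 0-based indices in
    methylated_positions are protected and stay C. Other bases are unchanged.
    """
    out = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated_positions:
            out.append("T")
        else:
            out.append(base)
    return "".join(out)


def methylation_fraction(reads, position):
    """Fraction of converted reads that still show C at a 0-based position."""
    calls = [r[position] for r in reads if r[position] in "CT"]
    return calls.count("C") / len(calls)


# One molecule methylated at the CpG cytosine (index 1), one unmethylated
reads = [bisulfite_convert("ACGT", {1}), bisulfite_convert("ACGT", set())]
```

Here the two molecules read "ACGT" and "ATGT", so the CpG at index 1 is 50% methylated, which is the quantity a Beta value estimates.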
Illumina Infinium Human Methylation 450 Bead Array Analysis
Raw intensity files were read using the minfi package (Aryee et al., 2014), and methylation ratios (Beta values) were calculated. The data were normalized using the Illumina preprocessing method implemented in minfi. Several quality control measures were applied to remove
low-quality arrays. Control probes on the 450k array were examined to assess several measures, including bisulfite conversion, extension, hybridization, and specificity. Next, median methylated and unmethylated signals were calculated for each array; no array had log2 median signal values below the cutoff of 10.5. For multidimensional scaling analysis, probes containing an annotated SNP (dbSNP137) at the single-base extension or CpG site were removed (17,398 probes). minfi version 1.8.9 was used.
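As a sketch of the two quantities described above, the block computes Beta values from methylated (M) and unmethylated (U) intensities and applies a median-signal QC rule. Two assumptions are labeled: the offset of 100 follows Illumina's Beta convention, and the QC rule (flag an array when the mean of its log2 median M and log2 median U falls below 10.5) mirrors the minfi QC plot as we understand it. This is illustrative, not the minfi implementation.

```python
import math
import statistics


def beta_values(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset); offset=100 is the assumed Illumina convention."""
    return [m / (m + u + offset) for m, u in zip(meth, unmeth)]


def flag_low_quality(meth, unmeth, cutoff=10.5):
    """Assumed QC rule: mean of log2 median M and log2 median U below cutoff.

    Returns (is_bad, log2_median_meth, log2_median_unmeth).
    """
    m_med = math.log2(statistics.median(meth))
    u_med = math.log2(statistics.median(unmeth))
    return (m_med + u_med) / 2 < cutoff, m_med, u_med
```

An array whose medians are both 2048 (log2 = 11) passes, whereas one with medians of 256 (log2 = 8) would be flagged.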
The previously described bump hunting method was applied to identify DMRs on the 450k array
(Aryee et al., 2014; Jaffe et al., 2012). A Beta-value difference of 0.1 (10% methylation difference) was used as the cutoff when finding DMRs. Statistical significance was assessed by permutation testing; unless a different cutoff is specified in the Results, a p-value cutoff of <0.01, corresponding to a Benjamini-Hochberg adjusted p-value <0.1 (data not shown), was used for downstream analysis. bumphunter version 1.2.0 was used. The same method
was applied in a second DMR analysis of LSC vs Blasts, in which 5 LSC samples from 2 MLL patients (SU042 and SU046) were removed.
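The core region-finding step of bump hunting can be sketched as follows: consecutive probes whose group difference in Beta exceeds the 0.1 cutoff in the same direction, and that lie close together on the chromosome, are merged into a candidate region. This is a simplified illustration rather than the bumphunter package: it skips smoothing and the permutation null, and the 500 bp maximum gap is an assumed parameter. A Benjamini-Hochberg adjustment, as used to relate the p < 0.01 cutoff to an adjusted p < 0.1, is included for completeness; the function names are hypothetical.

```python
def candidate_regions(positions, diffs, cutoff=0.1, max_gap=500):
    """Group probes on one chromosome into candidate DMRs.

    positions: sorted probe coordinates; diffs: Beta differences (group A - B).
    Returns (first_index, last_index, sign) for each run of probes whose
    |diff| > cutoff with the same sign and spacing <= max_gap (assumed value).
    """
    regions = []
    run_start, run_sign = None, 0
    for i, d in enumerate(diffs):
        sign = 1 if d > cutoff else (-1 if d < -cutoff else 0)
        contiguous = i > 0 and positions[i] - positions[i - 1] <= max_gap
        # Close the current run if the direction changes or the gap is too large
        if run_start is not None and (sign != run_sign or not contiguous):
            regions.append((run_start, i - 1, run_sign))
            run_start = None
        if sign != 0 and run_start is None:
            run_start, run_sign = i, sign
    if run_start is not None:
        regions.append((run_start, len(diffs) - 1, run_sign))
    return regions


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In bumphunter itself, each candidate region's summary statistic is then compared against regions found after permuting sample labels, which is where the permutation p-values come from.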
Affymetrix Microarray Expression Analysis
Total RNA was extracted from each FACS-sorted cell population using the RNeasy® Plus
Mini Kit (QIAGEN, Valencia, CA, Catalog#: 74134) according to the manufacturer's protocol. All RNA samples were quantified with a 2100 Bioanalyzer (Agilent
Technologies, Santa Clara, CA) and subjected to reverse transcription, two consecutive rounds of linear amplification, and production and fragmentation of biotinylated cRNA.
A total of 15 µg of cRNA from each sample was hybridized to HG U133 Plus 2.0 microarrays.
Hybridization and scanning were performed according to the manufacturer's instructions
(Affymetrix) at the PAN Center of Stanford University. Data were normalized by the GC robust multi-array average (GCRMA) method and analyzed in
R/Bioconductor. The SU042 CD34+CD38+ sample was removed from further analysis due to low quality. Patient SU001 was excluded because samples from this patient were not profiled on the expression arrays (GEO GSE63270).
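GCRMA combines a sequence-based background adjustment with quantile normalization and median-polish summarization. The sketch below shows only the quantile normalization step, which forces every array to share the same intensity distribution by replacing each value with the mean of the values at the same rank across arrays. It is a simplified illustration (ties are broken arbitrarily, and the function name is hypothetical), not the gcrma code.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length arrays (one list of intensities per sample)."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Mean intensity at each rank across all samples (the reference distribution)
    rank_means = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])  # indices from lowest to highest
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

After normalization each sample keeps its own within-array ordering of probes, but all samples have identical sorted values.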
Sanger Sequencing to Detect AML Mutations
Genomic DNA was extracted from patient BMMC or PBMC using the QIAamp DNA Mini
Kit (QIAGEN, Valencia, CA, Catalog#: 51304) according to the manufacturer's instructions. PCR primers were designed to cover exons 3-11 of TET2, exon 4 of IDH1/2, and exons 7-23 of DNMT3A (Table 2.18). Each PCR reaction contained 1x
OneTaq 2x Master Mix (NEB, Ipswich, MA, Catalog#: M0482L), 0.2 µM each of forward and reverse primers, and 10 ng (up to 100 ng) of genomic DNA as template. Cycling conditions were an initial denaturation at 95 °C for 30 seconds; 45 cycles of
94 °C for 30 seconds, 56 °C for 1 minute (adjusted as necessary), and 72 °C for 1 minute; and a final extension at 72 °C for 5 minutes. The PCR products were concentrated with a PCR purification kit (QIAGEN, Valencia, CA, Catalog#: 28106) and then submitted to Sequetech (Mountain View, CA) for sequencing in both forward and reverse directions on a 3730xl DNA Analyzer (Applied Biosystems, Foster City, CA) according to the manufacturer's instructions. Sequencing data were analyzed using
Sequencher 5.1 (Gene Codes Corporation, Ann Arbor, MI), and known single-nucleotide polymorphisms (SNPs) were excluded by checking NCBI databases before final mutation calls were made.
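The reaction composition above translates into pipetting volumes via C1·V1 = C2·V2. The text does not give the reaction volume, primer stock concentration, or template concentration, so the sketch below treats them as hypothetical inputs (for example a 25 µl reaction with 10 µM primer stocks); only the 1x final master mix and 0.2 µM final primer concentrations come from the protocol.

```python
def pcr_mix(reaction_ul, primer_stock_um, template_ng, template_ng_per_ul,
            primer_final_um=0.2, master_mix_fold=2.0):
    """Per-reaction volumes (µl) from C1*V1 = C2*V2.

    reaction_ul, primer_stock_um and the template concentration are
    hypothetical inputs; 0.2 µM primers and a 2x mix diluted to 1x follow the text.
    """
    master_mix_ul = reaction_ul / master_mix_fold              # 2x stock -> 1x final
    primer_ul = primer_final_um * reaction_ul / primer_stock_um  # per primer
    template_ul = template_ng / template_ng_per_ul
    water_ul = reaction_ul - master_mix_ul - 2 * primer_ul - template_ul
    if water_ul < 0:
        raise ValueError("components exceed the reaction volume")
    return {"master_mix_ul": master_mix_ul, "primer_each_ul": primer_ul,
            "template_ul": template_ul, "water_ul": water_ul}


# Hypothetical 25 µl reaction, 10 µM primer stocks, 10 ng template at 10 ng/µl
mix = pcr_mix(25, 10, 10, 10)
```

For these assumed inputs this gives 12.5 µl master mix, 0.5 µl of each primer, 1 µl template, and 10.5 µl water.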
Survival Analysis
Survival analysis was performed to assess the association of the LSC DNA methylation and gene expression signatures with clinical outcome (overall survival) in four different cohorts.
For the DNA methylation dataset (TCGA), patients were separated into two groups, LSC-like and Blast-like, based on the methylation profile of each individual. Survival was compared between the two groups using the coxph function in R (survival package 2.37), with significance assessed by log-rank test. For gene expression, the genes in the LSC epigenetic signature were identified in expression datasets for which survival outcomes were available. The first principal component of their expression levels was computed, and patients were stratified as "high" or "low" relative to its median value. Survival differences between the groups were assessed by log-rank test. In multivariate analyses, age was incorporated as a continuous variable, mutations were coded as present/absent
(1/0), and cytogenetic risk was coded as separate comparisons of
intermediate vs low risk and high vs low risk (Figure 2.9). Analysis was also performed within the intermediate-risk group.
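The gene expression stratification described above can be sketched in two steps: compute each patient's score on the first principal component of the signature genes and split at the median, then compare the groups with a log-rank test. The sketch is a pure-Python illustration under stated assumptions (power iteration for PC1, and orienting the component so that higher scores track higher overall expression, since the sign of a principal component is arbitrary); it is not the R code used in this study, and all function names are hypothetical.

```python
import math
import statistics


def first_pc_scores(expr, iters=200):
    """PC1 score per patient; expr is a list of patient rows of gene values."""
    n, p = len(expr), len(expr[0])
    means = [sum(row[j] for row in expr) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in expr]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):  # power iteration for the leading eigenvector
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variance in expression data")
        v = [x / norm for x in w]
    if sum(v) < 0:  # assumed orientation: high score = high signature expression
        v = [-x for x in v]
    return [sum(r[j] * v[j] for j in range(p)) for r in centered]


def median_split(scores):
    """1 = 'high' (above the median), 0 = 'low'."""
    med = statistics.median(scores)
    return [1 if s > med else 0 for s in scores]


def logrank(times, events, groups):
    """Two-group log-rank test; events 1 = death, groups 0/1. Returns (chi2, p)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
```

In use, `groups = median_split(first_pc_scores(expr))` would feed directly into `logrank(times, events, groups)`.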
Statistical Analysis
To assign an LSC or Blast identity to each TCGA sample, the mean methylation value of each
LSC epigenetic signature region (84 DMRs) was calculated separately for LSC and Blast samples (methylation profiles), together with the standard deviation for each region. Scores
(log probability density values) of each TCGA sample under the LSC and Blast profiles were then calculated using the dnorm function with the means and standard deviations from the previous step. The higher of the two scores (LSC or Blast methylation profile) determined the cell identity assigned to each sample.
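The assignment rule can be illustrated with a small sketch: build per-DMR mean and standard deviation profiles from reference LSC and Blast samples, score a new sample by its normal log density under each profile (the analogue of R's dnorm with log = TRUE), and take the larger score. The text does not state how per-DMR densities are combined, so summing the log densities across the 84 DMRs is an assumption here; ties default to Blast-like, and all function names are hypothetical.

```python
import math
import statistics


def log_normal_density(x, mean, sd):
    """Log of the normal pdf, analogous to dnorm(x, mean, sd, log = TRUE) in R."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def build_profile(reference_samples):
    """Per-DMR mean and SD from reference samples (rows = samples, cols = DMRs)."""
    columns = list(zip(*reference_samples))
    return [statistics.mean(c) for c in columns], [statistics.stdev(c) for c in columns]


def profile_score(sample, means, sds):
    # ASSUMPTION: per-DMR log densities are summed into one score per profile
    return sum(log_normal_density(x, m, s) for x, m, s in zip(sample, means, sds))


def assign_identity(sample, lsc_profile, blast_profile):
    lsc = profile_score(sample, *lsc_profile)
    blast = profile_score(sample, *blast_profile)
    return ("LSC-like" if lsc > blast else "Blast-like"), lsc, blast


# Toy example with two DMRs (values are illustrative Beta values)
lsc_profile = build_profile([[0.2, 0.3], [0.3, 0.4], [0.25, 0.35]])
blast_profile = build_profile([[0.7, 0.8], [0.8, 0.9], [0.75, 0.85]])
label, _, _ = assign_identity([0.27, 0.33], lsc_profile, blast_profile)
```

Working on the log scale avoids numerical underflow when many small densities are combined, which is one reason to prefer log densities over raw probabilities.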
Bioinformatics Analysis
QIAGEN's Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, www.ingenuity.com) was used for pathway analysis.
Supplementary Figure 1. Pre-sort and post-sort FACS analysis of subpopulations from human AML. Top panel: FACS-sorting scheme of three immunophenotypically defined subpopulations from human AML samples. Other panels: Two rounds of post-sort analysis to check the purity of sorting.
Figure 2.1. Pre-sort and post-sort FACS analysis of subpopulations from human
AML. Top panel: FACS-sorting scheme of three immunophenotypically defined subpopulations from human AML samples. Other panels: Two rounds of post-sort analysis to check the purity of sorting.