In memory of Mimi Nordlund

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Nordlund J, Kiialainen A, Karlberg O, Berglund EC, Göransson- Kultima H, Sønderkær M, Nielsen KL, Gustafsson MG, Behrendtz M, Forestier E, Perkkiö M, Söderhäll S, Lönnerholm G, Syvänen A- C. (2012) Digital gene expression profiling of primary acute lym- phoblastic leukemia cells. Leukemia, 6(26):1218-1227

II Milani L, Lundmark A, Nordlund J, Kiialainen A, Flaegstad T, Jonmundsson G, Kanerva J, Schmiegelow K, Gunderson KL, Lön- nerholm G, Syvänen A-C. (2009) Allele-specific gene expression patterns in primary leukemic cells reveal regulation of gene expres- sion by CpG site methylation. Research, 19:1-11

III Nordlund J, Milani L, Lundmark A, Lönnerholm G, Syvänen A-C. (2012) DNA methylation analysis of bone marrow cells at diagnosis of acute lymphoblastic leukemia and at remission. PLoS ONE, 7(4)

IV Nordlund J*, Bäcklin C*, Wahlberg P, Busche S, Berglund EC, El- oranta M-L, Flaegstad T, Forestier E, Frost B-M, Harila-Saari A, Heyman M, Jonsson O, Larsson R, Palle J, Rönnblom L, Schmie- gelow K, Sinnett D, Söderhäll S, Pastinen T, Gustafsson MG, Lön- nerholm G, Syvänen A-C. (2012) DNA hypermethylation signatures for genetic subtypes of pediatric ALL at diagnosis and relapse. Man- uscript.

* Equally contributing authors

Reprints were made with permission from the respective publishers.

Related Paper

Milani L*, Lundmark A*, Kiialainen A, Nordlund J, Flaegstad T, Forestier E, Heyman M, Jonmundsson G, Kanerva J, Schmiegelow K, Söderhäll S, Gustafsson MG, Lönnerholm G, Syvänen A-C (2010) DNA methylation for subtype classification and prediction of treatment outcome in patients with childhood acute lymphoblastic leukemia. Blood, 11;115(6)

Contents

Introduction ...... 11 Acute Lymphoblastic Leukemia ...... 11 Cytogenetic Classification of ALL ...... 12 Clinical Implications ...... 15 Genetic Variation in the Human Genome ...... 16 Genetic Variation and Risk of ALL ...... 17 Somatic Genetic Variation in ALL ...... 18 The Human Epigenome ...... 19 Histone modifications ...... 19 DNA Methylation ...... 20 DNA Methylation and Cancer ...... 21 Epigenetics in ALL ...... 22 Cooperation of Genetic and Epigenetic Mechanisms in Cancer ...... 23 Gene Expression ...... 23 Allele-Specific Gene Expression ...... 24 Gene Expression Profiling in ALL ...... 25 Technologies for Genome-Scale Analysis ...... 25 Array-Based Technologies ...... 25 Sequencing-Based Technologies ...... 26 Present Investigations ...... 28 Thesis Aims ...... 28 Materials and Methods ...... 28 Samples ...... 28 Digital Gene Expression Profiling (Paper I) ...... 30 SNP Genotyping of DNA and RNA (Paper II) ...... 30 DNA Methylation Analyses (Papers II, III & IV) ...... 31 and Statistical Analyses (Papers I-IV) ...... 31 Results ...... 33 Genome-Wide Gene Expression in ALL Cells (Paper I) ...... 33 Allele-Specific Gene Expression in ALL Cells (Paper II) ...... 33 DNA Methylation at Diagnosis of ALL and Remission (Paper III) ...... 34 Genome-Wide DNA methylation Analysis in a Large ALL Cohort (Paper IV) ...... 34 Discussion ...... 36 Concluding Remarks ...... 38

Acknowledgements ...... 40 References ...... 43

Abbreviations

5mC 5-Methylcytosine ABL1 C-abl oncogene, non-receptor tyrosine kinase ALL Acute lymphoblastic leukemia APA Alternative polyadenylation ASE Allele-specific expression BCP-ALL B-cell precursor ALL BCR Breakpoint cluster region bp Base-pair cDNA Complementary DNA ChIP-seq Chromatin immunoprecipitation sequencing CML Chronic myeloid leukemia CNA Copy number alteration CNV Copy number variation CpG CG dinucleotide DGE Digital gene expression DNMT DNA methyltransferase E2A Transcription factor 3 (E2A immunoglobulin enhanc- er binding factors E12/E47) ENCODE Encyclopedia of DNA Elements ETV6 Ets variant 6 (TEL oncogene) EZH2 Histone-lysine N-methyltransferase FDR False discovery rate GWAS Genome-wide association study HeH High hyperdiploidy miRNA Micro RNA MLL Myeloid/lymphoid or mixed lineage leukemia MRD Minimum residual disease ncRNA Non-coding RNA NOPHO Nordic Society of Pediatric Hematology & Oncology

NSC Nearest shrunken centroids nsSNP Non-synonymous SNP PBX1 Pre-B leukemia homobox 1 PCA Principal component analysis PcG Polycomb-group PCR Polymerase chain reaction RNA-seq (RNA) sequencing RT-PCR Reverse transcription PCR RUNX1 Runt-related transcription factor 1 SAGE Serial analysis of gene expression SNP Single nucleotide polymorphism SNV Single nucleotide variation SV Structural variation T-ALL T-cell acute lymphoblastic leukemia TPM Tags per million TPMT Thiopurine s-methylstransferase TSS Transcription start site UTR Untranslated region WGA Whole-genome amplification WGBS Whole-genome bisulfite sequencing WGS Whole-genome sequencing

Introduction

Over the last decade, the completed sequence of the ~3 billion nucleotides that comprise the human genome has provided the foundation for under- standing human genetic variation in health and disease [1]. New high- throughput methods made genome-wide discovery and analysis of such vari- ation a feasible endeavor [2-4]. In studying malignancies, we are interested in identifying and understanding the causal and associated molecular chang- es that contribute to malignant phenotypes. This thesis focuses on using new technologies for identifying such factors that may have a functional role in the regulation of gene expression, DNA methylation, and in the development of acute lymphoblastic leukemia in children.

Acute Lymphoblastic Leukemia Acute lymphoblastic leukemia (ALL) is a malignant disease of the bone marrow that is characterized by the rapid clonal expansion of immature white blood cells. This expanded lymphoblast population is defective in their normal maturation and function and quickly crowd out the healthy function- ing blood cells [5]. Thus, immediate treatment is required to restore normal functional bone marrow. ALL is the most common form of cancer in chil- dren, with a peak incidence between two and five years of age [6]. Accord- ing to the Nordic Society of Pediatric Hematology and Oncology (NOPHO), approximately 200 children are newly diagnosed with ALL in the Nordic countries (Denmark, Finland, Iceland, Norway & Sweden) each year [7]. With contemporary combination chemotherapy protocols for ALL, approxi- mately 80% of patients achieve continuous complete remission (CR). De- spite these improvements, for unknown reasons approximately 20% of chil- dren do not respond to treatment, experience relapse, or suffer serious treat- ment related complications [8]. ALL is a clinically and genetically heterogeneous disease that can initially be divided into two immunophenotypic subgroups based on the cellular origin of lymphoblastic cells. The most frequent immunophenotype arises from pre-B cell precursor cells (BCP-ALL) and accounts for approximately 85% of cases. T-cell ALL (T-ALL) represents the remaining 15% of cases. ALL cells can be characterized at the molecular level by the presence of several recurrent structural and numerical chromosomal aberrations. Cyto-

11 genetic subtyping in BCP-ALL has a significant impact on the clinical man- agement of patients as some subtypes infer a better or worse prognosis [9].

Cytogenetic Classification of ALL Cytogenetic characterization provides the basis for subtype classification of BCP-ALL cases. The known recurrent structural and numerical chromoso- mal aberrations that define the cytogenetic subtypes of ALL are summarized in Table 1. Today, therapy decisions follow risk assessment based on clini- cal features such as presenting white blood cell count, sex, age, and analysis of minimal residual disease (MRD) in addition to detection of genetic aber- rations [8,10]. Thus, understanding the molecular mechanisms underlying the different cytogenetic aberrations will be fundamental in improving thera- peutic options.

Aneuploidies The most common cytogenetic subtype of pediatric ALL is high hyperdip- loidy (HeH), which is characterized by the presence of >50 chromosomes in the nuclei of leukemic blasts. The number of chromosomal gains can vary, however classical trisomies involve chromosomes 4, 6, 10, 14, 17, 18, 21 and X. HeH has a peak incidence around 2-5 years of age and infers one of the best prognoses [10]. Conversely, hypodiploidy is an uncommon subtype defined by a cellular chromosome count of less than 44 and a very poor prognosis [11]. Intrachromosomal amplification of chromosome 21 (iAMP21) was identi- fied by the observation of an amplification in the RUNX1 signal in routine screening for the t(12;21)ETV6/RUNX1 gene fusion [12]. The chromosomal structure of iAMP21 is complex and heterogeneous between patients, and is believed to be a primary cytogenetic event in this subtype [13]. iAMP21 is rare, tends to occur in older children, and infers an unfavorable prognosis.

Translocations Several of the recurrent structural abnormalities in ALL are chromosomal translocations that result in functional fusion genes. The translocations affect genes that are important for cell-regulation and development. For example, in t(12;21), t(1;19) and t(9;22) ALL subtypes, the translocations result in the functional fusion genes ETV6/RUNX1 , TCF3/PBX1 and BCR/ABL1, respec- tively [14,15]. The t(12;21)ETV6/RUNX1 translocation (formally TEL/AML1), involving two critical genes for hematopoiesis, is the second most frequent cytogenetic aberration in ALL and the most common gene fusion [14]. It is unique to BCP-ALL and infers one of the best clinical outcomes. It is frequently found in younger children and the co-incidence in monozygotic twins is approxi- mately 10% [16]. Interestingly, up to 1% of neonatal blood spots (Guthrie

12 cards) are t(12;21) positive, however very few children will go on to develop the disease [17,18]. This translocation is likely to be established in utero, however the low prevalence of the disease suggests that t(12;21)ETV6/RUNX1 is a required first hit, but alone it is not sufficient to cause overt leukemia [16,17]. The t(1;19)TCF3/PBX1 translocation (formally E2A/PBX1) occurs in ap- proximately 5% of BCP-ALL cases. Originally classified as a poor prognos- tic subtype, modern intense chemotherapy has now rendered TCF3/PBX1 is now among one of the subtypes with the best prognosis [12]. The low con- cordance rate of this aberration in monozygotic twins and negative results from Guthrie card studies suggest postnatal initiation, as opposed to t(12;21)ETV6/RUNX1 [18]. The discovery of the t(9;22)BCR/ABL1 gene fusion in chronic myeloid leukemia (CML) cells in 1960, also known as the Philadelphia chromosome was the first genetic abnormality identified to cause cancer [19]. Although more common in CML, t(9;22) is detected in roughly 5% of pediatric ALL cases. The resulting tyrosine kinase activity of the BCR/ABL1 fusion protein affects a number of downstream pathways such as cell death, Ras signaling, and growth factor independent proliferation [20]. The development of the drug Imatinib mesylate, which prevents the phosphorylation of substrates by BCR-ABL, is now considered a paradigm for molecular based therapies [21], and a revolution in the treatment of leukemias harboring the BCR/ABL1 fusion gene. Imatinib was added to the NOPHO treatment regiment for BCR/ABL1 positive patients in 2003. Most children under the age of one year have a different distinct form of ALL that is characterized by associated translocations of the mye- loid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila) gene (MLL) that results in chimeric proteins consisting of MLL fused to >40 different genes. The fusion proteins have enhanced activity that affects the downstream expression patterns of genes in the HOX gene family [5]. Monozygotic twins with MLL-rearranged subtypes have a concordance rate close to 100%, indicating in utero manifestation of the disease and potential trans-placental transmission from one twin to the other [10]. Infants are among the group of patients with the worst overall-survival [11]. Although dic(9;20) is an established chromosomal aberration, no gene fu- sion has been identified to date [22]. Whilst very little is known about the underlying molecular consequences, it is hypothesized that the loss of 9p and/or 20q material may be important. However, this has not been experi- mentally verified in a large set of patients and the prognostic significance of this aberration remains unclear [23,24].

Undefined Subtypes In approximately 25% of BCP-ALL cases the underlying genetic alterations are either unknown or non-recurrent. However, by applying new techniques

13 for unbiased genome-wide profiling, genetic alterations can be identified in virtually all cases and novel subgroups with recurrent genetic alterations are currently being uncovered [25]. For instance, it has been suggested that a rearrangement of the CRLF2 gene or deletion of the ERG gene may be found in up to 14% of undefined cases [26,27]. In addition, another novel high-risk subtype has been discovered that harbors IKZF1 deletions and displays gene expression profiles mirroring BCR/ABL1 positive ALL [12]. Because of their recent discovery, they have not yet been incorporated into risk stratification in clinical trials [28] and their prevalence in the Nordic countries has yet to be determined.

T-cell ALL In contrast to BCP-ALL, where the results of cytogenetic analysis impacts risk group classification, the cytogenetics of T-ALL is less well understood [29]. Detectable chromosomal abnormalities exist in approximately 50% of cases, albeit few are recurrent [25] and such abnormalities do not appear to have prognostic significance, as in BCP-ALL [29]. Instead, widespread dele- tion of CDKN2A/2B and mutations in genes in the NOTCH1 pathways ap- pear to be common across most T-ALL patients [25]. Several recent studies have identified a subset of high-risk, extremely aggressive T-ALL subtype termed “early T-cell precursor” ALL, which is classified by cell surface markers and a gene expression profile highly similar to hematopoietic stem cells [28,30].

Table 1. Summary of recurrent genomic alterations in acute lymphoblastic leukemia (ALL).

Immuno- Genetic feature Cytogenetic Median age Frequency* phenotype aberration (years)* T-ALL Various Various 9.4 13% BCP-ALL High hyperdiploidy >50 chromosomes 3.9 24% ETV6/RUNX1 t(12;21) 4.5 21% BCP-ALL “other” Various 7.8 13% BCP-ALL “undefined” None detected 6.7 10% MLL-rearrangement 11q23 and >40 fusion 1.0 4% partners BCR/ABL1 t(9;22) 10.0 ~3% TCF3/PBX1 t(1;19) 11.1 ~3% Dicentric chromosome dic(9;20) 2.6 ~3% iAMP21 Amplification 21q 9.2 ~1% Hypodioploidy <45 chromosomes 5.1 <1% * Based on data from the Nordic Society of Pediatric Hematology and Oncology for ALL patients diagnosed between 1996 and 2010.

14 Clinical Implications All patients diagnosed with ALL in the Nordic countries are treated with intensive combined chemotherapy including three therapy phases consisting of remission induction, periods of intensification (continuation), and mainte- nance treatment according to NOPHO protocols [8]. The event-free and long term survival for the NOPHO-92 protocol after 10 years was 75% and 85% and for the NOPHO-2000 protocol after 5 years was 79% and 89%, respec- tively [8]. The event-free survival of ~~800 Nordic ALL cases by cytogenet- ic subtype is plotted in Figure 1. Patients with MLL-rearrangements, iAMP21, t(9;22) and hypodiploidy have the lowest event-free survival rate. Younger children have the best prognosis (excluding infants <1 year) since these patients often have specific chromosomal re-arrangements that are associated with better prognosis, such as HeH and t(12;21). However, with increasing age, these translocations become less common and conse- quently teenagers and young adults rarely present with these “good progno- sis” aberrations. While several genetic subtypes (MLL-rearrangements, t(9;22), and hypodiploidy) are associated with a higher risk of relapse, it is important to note that relapse occurs in all genetic subtypes. In sheer num- bers many relapses occur in the large subtypes with favorable prognoses such as t(12;21)ETV6/RUNX1 and HeH. Thus, genetic subtyping is not suf- ficient enough for accurate prediction of chemotherapeutic response.

Figure 1. Kaplan-Meier curves representing the probability of event-free survival of ~800 pediatric acute lymphoblastic leukemia (ALL) patients of the different im- munophenotype and cytogenetic subtypes treated on the NOPHO protocols between 1996 and 2010.

15 Chemotherapy and Drug Resistance Cure rates for pediatric ALL have improved steadily over the past 50 years with the advent of several effective chemotherapies, targeted therapy, and the use of uniform treatment protocols [8,10]. However, both the side and late effects of intensive chemotherapy continue to be a problem in ALL treatment, and 15 to 20% of ALL patients will experience a relapse. At re- lapse, leukemic cells tend to be much more resistant to chemotherapies and thus the overall survival for relapsed ALL patients is merely 40%, complete- ly opposite to the situation at first diagnosis [31]. There is a lack in mecha- nistic understanding of what causes relapse in ALL, nevertheless with rapid advancements in genome-scale analysis, uncovering novel genomic, tran- scriptomic, and epigenomic markers for relapse prediction and mechanistic understanding increasing [9,27,32,33].

Genetic Variation in the Human Genome The human genome contains 3.1 billion base pairs (bps) of genetic infor- mation, encoding for roughly 20,000 protein-coding genes, and a diverse set of non-coding genes [34-36]. As a result of three major efforts; the Human , the International Haplotype Mapping Project, and the 1000 Project, the complete human genome sequence and the known variants are cataloged in publicly accessible databases [1,37-39]. Normal genetic variation ranges from single bp changes to very large chromosomal rearrangements. All somatic cells in a given individual contain approximate- ly the same DNA sequence, but the cellular phenotypes differ dramatically due to differential expression of lineage-specific genes. Gene expression is tightly regulated by a number of genetic and epigenetic factors, including regulatory elements encoded in the DNA and non-coded elements such as the methylation of cytosine residues in CpG dinucleotides and non-coding RNAs (ncRNA). Single base differences in the DNA sequence are referred to as single nu- cleotide polymorphisms (SNPs). The frequency of SNPs in the population is approximately 1 in every 300 bps in the human genome. Most SNPs are non- pathogenic. These small differences between individuals result in normal phenotypic variation (why we are not all the same), but some changes are pathogenic and have important consequences in disease. A specific class of SNPs, called non-synonymous SNPs (nsSNPs) are found in coding regions of genes that cause a base change in a codon that alters the corresponding amino acid, which may be deleterious to the structure or function of the pro- tein. One relevant example associated with ALL is the germline genetic varia- tion in the thiopurine s-methyltransferase (TPMT) enzyme that catalyzes

16 inactivation of thiopurine drugs. Approximately 10% of the population has an inherited deficiency in TPMT. ALL patients that are heterozygous or homozygous for the variant allele require lower doses of mercaptopurine used during maintenance therapy to avoid serious side effects such as re- duced bone marrow activity [8,9,40]. In a second example, results from a recent genome-wide association study (GWAS) of ALL suggested that sev- eral SNPs in the SLCO1B1 gene were independently associated with clear- ance of methotrexate, an antifolate agent used in the treatment of ALL [41]. These two examples demonstrate the importance of understanding the effects of germline genetic variation in applying treatment during the course of a disease to reduce toxicity and other unwanted side effects. A second class of variation known as structural variants (SVs) are defined as genomic rearrangements such as inversions, translocations, deletions, and insertions that affect >1 kb of consecutive DNA sequence [42]. Copy num- ber variants (CNVs) are a form of structural variation that changes the num- ber of copies of a DNA segment. Today, it has become apparent that SVs and CNVs are collectively common in the population and contribute to a large proportion of normal human variation and disease susceptibility [43].

Genetic Variation and Risk of ALL In general, ALL is considered a sporadic disease arising from the accumula- tion of somatic aberrations in lymphoid progenitor cells. However, some genetic syndromes are associated with increased risk of developing ALL. The best example of this is Down’s syndrome, which confers a 20-fold in- creased risk for ALL in comparison to the general population [16,25]. Inter- estingly, the trisomy of chromosome 21, the characteristic genetic abnormal- ity of Down’s syndrome patients, is the most frequently acquired chromo- somal aberration in ALL followed by the t(12;21) chromosomal transloca- tion [44]. Furthermore, in the less frequent subtype iAMP21, the q arm of chromosome 21 (21q) is amplified [13]. Together, these observations sug- gest an important contribution of genes on chromosome 21, such as RUNX1, ERG, and ETS2 in malignant transformation of ALL cells [16]. Several GWAS have identified and verified risk alleles in genes important for B-cell regulation (IKZF1, ARID5B, CEBPE, CRLF2) with a predictive value for risk and clinical outcome in ALL [45-48]. These studies highlight- ed variants that are associated with lower expression of the corresponding genes, signifying the importance of defects in B-cell development [16]. Epidemiological studies have suggested a link between environmental exposures, such as prenatal exposure to environmental hazards or medica- tions in utero and post natal infections, to the risk of developing ALL [49]. A Swedish population-based study revealed that although there is high con- cordance among monozygotic twins, the evidence is weak for recessive pol- ygeneic models of inheritance [50]. Current estimates are that underlying

17 genetic predisposition is present in less than 5% of cases [25]. Given these pieces of evidence, it is likely that the causative pathways are multifactorial, including a combination of environmental exposures, various low- penetrance susceptibility variants, and potentially other unknown triggers. The promise of uncovering genetic risk variants for predisposition to ALL have spurred the formation of large international consortiums, such as the International Childhood Acute Lymphoblastic Leukemia Consorti- um that aims to identify inherited susceptibility variants for ALL [51].

Somatic Genetic Variation in ALL Somatic mutations occur randomly throughout the numerous cell divisions that a cell goes through starting in utero and continuing throughout postnatal life, usually with no major consequences. However, some somatic mutations in key genes such as those that control cellular growth, proliferation, and death destabilize can the normal control machinery of a cell. These muta- tions, referred to as “driver mutations” [52], give the cell the growth ad- vantage to out compete neighboring cells and thus divide uncontrolled; ulti- mately resulting in what is referred to as cancer. Along side the driver muta- tions, there are also hundreds to several thousands of mutations of termed “passenger” in any given cancer. Passenger mutations occur sporadically across the genome, but driver mutations are likely to be clustered in a subset of cancer-driving genes, such as tumor suppressor or proto-oncogenes [52]. Recent estimates suggest that between 5-10 driver mutations are required for tumorigenesis, but fewer may be required for ALL [52,53]. Somatic single bp changes, or point mutations are referred to as single nucleotide variants (SNVs). Whole-genome and exome sequencing of leu- kemia genomes at high resolution have revealed many SNVs in genes im- portant for cellular development and signaling such as ETV6, RUNX1, GA- TA3, FLT3, among several others [30,54,55]. To date, there appears to be a much lower mutational burden in ALL when compared to other pediatric and adult tumors [53]. In addition to single base changes, a second class of somatic alterations that comprises large copy number changes are referred to as copy number alterations (CNAs) have been studied extensively in ALL. In ALL, approxi- mately six somatic CNAs occur per individual [45]. Several key CNAs have been reported in genes required for normal B-cell development thus imply- ing that arrest in lymphoid differentiation may be a key step in malignant transformation [12,27]. In 80% of t(9;22)BCR/ABL1 cases, the genomic re- gion containing the key hematological regulator gene IKZF1 is deleted [56]. Furthermore, a number of CNAs are known to increase in relapsed ALL and some have been associated with increased drug resistance [57,58].

18

Figure 2. The organization of chromatin within the nucleus. The basic units in the chromatin are the nucleosomes, which contain 147 bps of DNA wrapped around a histone octamer. Various DNA binding proteins, non-coding RNAs, modifications to the core histone proteins, and DNA methylation interact to influence the chroma- tin structure. Figure adapted from reference number [59].

The Human Epigenome

The various cell types that make up an organism have identical copies of their DNA sequence, yet they can have very different phenotypes. What drives this phenotypic diversity is non-genic or “epi(above)-genetic” memory that receives its instructions from developmental and environmental signals [60,61]. Thus, the various layers of epigenetic information are key regulators of gene expression. In contrast to the static nature of the DNA sequence, numerous molecules are involved in shaping and remodeling the epigenomic landscape during development and cellular differentiation such as the addition of methyl groups to cytosine residues (DNA methylation), histone modifications (methylation or acetylation), and expression of various ncRNAs [59] (Figure 2). Ongoing large-scale projects by the Roadmap Project [59] and Blueprint Epigenome Project [62] aim to pro- duce public resources for normal human reference epigenomes. As the Hu- man Genome Project [37], HapMap Project [38], and the 1,000 Genomes Project [39] have provided a resource for normal human genetic variation, these consortia provide a unique resource for epigenetic variation in numer- ous cell types. This promises to be an invaluable source of data for the re- search community and will help to broaden the understanding of the role(s) that epigenetics plays in health and disease [63].

Histone modifications The complexity and plasticity of the epigenomic landscape is modulated in part by covalent modifications to the histones that make up the core of the nucleosome. In simplified terms, histone proteins can be subjected to various

19 dynamic modifications like acetylation or methylation that are associated with induced or repressed gene expression. For example histone 3 lysine 4 trimethylation (H3K4me3) is associated with actively transcribed regions, while histone 3 lysine 27 timethylation (H3K27me3) is associated with gene silencing [64,65]. Polycomb group (PcG) proteins are regulators of embryonic stem cell pluripotency that act in multiprotein complexes to repress linage-specific genes, a process that is frequently controlled via histone mediation [66]. PcG proteins occupy the promoters of numerous genes, such as transcriptional regulators, signaling factors, and other genes important for cellular differen- tiation [65,67]. In embryonic stem cells, such genes feature “bivalent” marks of both the active H3K4me3 and the repressive H3K27me3. This phenome- na, known as bivalent chromatin, sets genes in a primed, yet controlled state, preparing the cell for differentiation by “poising” certain genes for expres- sion [68].

DNA Methylation DNA methylation of cytosine residues was first described in 1975 [69,70]. Since then it has been the most well-studied epigenetic mark, studied in great depth with regards to X chromosome inactivation in females, genomic im- printing, tissue specific inactivation, and in cancer [71,72]. The cytosine residue in the CpG dinucleotide content is a target for methylation by DNA methyltransferases (DNMTs) (Figure 3). The addition of a methyl group provides an important mechanism for regulating gene expression and can be stably inherited by daughter cells. DNMTs are responsible for de novo DNA methylation during embryonic development and for maintaining the methyl- ation mark by recognizing hemi-methylated DNA molecules after replication and copying the DNA methylation to the other strand [66,71]. There are approximately 28 million CpG dinucleotides in the human ge- nome and this is where the vast majority of DNA methylation occurs. Alt- hough, non-CpG DNA methylation has been observed in pluripotent cells, but appears to be rare in differentiated cell types [73]. Genome-wide, be- tween 70-80% of CpG dinucleotides are methylated in healthy cells [74]. The genome is underrepresented in CpG dinucleotides due to an increased mutation rate caused by spontaneous deamination of 5-methylcytosine (5mC) to thymine. Conversely, the deamination of unmethylated cytosine to uracil is recognized and corrected by the mismatch repair machinery [66]. As a consequence of the increased mutation rate of 5mC to thymine, such spontaneous point mutations have led to an over-representation of C/T SNPs in the human genome.

20

Figure 3. The 5’ carbon of the cytosine ring in cytosine-phosphate-guanine (CpG) dinucleotides is frequently methylated to form 5-methylcytosine due to the addition of a methyl group by DNA methyltransferase (DNMT) enzymes.

Functions of DNA methylation DNA methylation is often described as a mechanism for gene silencing. However this relationship largely depends on the context of the CpG sites in regards to CG density and relationship to gene annotation such as transcrip- tion start sites and gene body [61]. Over 50% of protein coding genes in the human genome have short stretches of 200-1000 bp in their promoter regions with high CG density called CpG islands [61]. The CpG sites in CpG islands tend to be hypomethylated, and thus they have opposing methylation pat- terns to CpG sites elsewhere in the genome. Unmethylated CpG islands and highly methylated gene bodies are correlated with highly expressed genes [61]. The DNA methylation in the regions up to two kb up- or down-stream from CpG islands termed CpG island “shores” have been found to be tightly regulated by the tissue of origin [75]. These regions are reported to be the most variable regions in healthy cell types [76], in cancer cells [75], and during aging [77]. To date, the function of CpG island “shore” methylation is largely unknown. The normal function of DNA methylation also extends to maintaining chromosomal stability. As mentioned previously, with the exception of CpG islands, the rest of the genome is globally methylated. This means that most CpG sites in intergenic regions, gene bodies, and repetitive regions in the genome are highly methylated [78]. Transposable elements that have inte- grated into the human genome throughout the course of evolution are usually completely methylated. These potentially harmful elements are suppressed in a large extent by DNA methylation [61].

DNA Methylation and Cancer Classically, the model of cancer progression involves the clonal proliferation of cells with driver mutations, which gradually accumulate more mutations as the malignant cell population expands. However, the classical model has

21 not been successful in predicting recurrent driver mutations for all cancers [53,79]. Changes in the epigenetic landscape preceding or during malignant transformation may offer at least a partial explanation. In 1983, Feinberg and Vogelstein made an important discovery that brought about the idea of an epigenetic contribution to cancer [72]. They identified a global pattern of hypomethylation (decrease in methylation) across the genome of cancer cells. Maintaining the integrity of DNA methyl- ation at such regions plays a significant role in malignant development, as global hypomethylation can lead to increased instability resulting in translo- cations and detrimental activation of growth enhancing genes such as growth stimulating proto-oncogenes [71]. In contrast, aberrant hypermethylation (increase in methylation) tends to occur in promoter regions, for example in the CpG islands of tumor suppressor genes [79-81]. In fact, between 5-10% of the CpG islands in the human genome become aberrantly methylated in cancer [82]. Such DNA methylation may be occurring due to an instructive or predisposed mechanism. Hypermethylation occurs in regions normally associated with PcG Repressive Complex during development [83]. This process has been coined “epigenetic switching [78], as cancer cells perma- nently switch off key regulators of cell development. Abnormal expression of the Histone-lysine N-methyltransferase (EZH2) protein that is part of the Polycomb Repressive Complex may be directing aberrant hypermethylation in cancer cells [84].

Epigenetics in ALL ALL has traditionally been considered as a disease that arises from genetic events, such as the recurrent cytogenetic events described previously. How- ever, mounting evidence suggests that the pathogenesis and phenotypic characteristics of leukemic cells may be partially explained by aberrant regu- lation of the epigenome [85]. Aberrant DNA methylation in BCP-ALL and T-ALL subtypes has been described [86,87] and cellular resistance to chem- otherapy in ALL cells may be associated with increased promoter methyla- tion [88]. Detection of such a methylation profile could be used at diagnosis to stratify patients with higher risk of resistance or relapse and importantly to better understand the mechanisms behind resistance in general. To date, nu- merous candidate genes have been identified as biomarkers for subtype clas- sification and prognosis of patients by CpG methylation status [33,89-92]. The MLL gene is a transcription factor with H3K4 methyltransferase activity involved in chromatin remodeling [93]. The ALL patients characterized by these rearrangements have very few additional genetic aberrations, thus is it speculated that this subtype might be driven by epigenetic deregulation [93].

22 Cooperation of Genetic and Epigenetic Mechanisms in Cancer It is becoming increasingly clear that genetic and epigenetic events are not separately acting entities, but together they contribute to malignant transfor- mation through cooperative interactions. First, single base pair CèT transi- tion mutations in the context of CpG dinucleotides are the most prevalent form of single base change in cancer, and account for nearly 30% of all dis- ease causing variants [94]. These mutations can disrupt the CpG dinucleotide may also infer an epigenetic effect. Second, in light of several recent re- sequencing efforts of various human cancers, multiple somatic mutations in genes responsible for epigenetic regulation have been characterized [30,66,95,96]. These mutations occur in genes involved in the maintenance and establishment of DNA methylation, histone modifications, and in chro- matin remodeling [66]. One particular example is the observation of recur- rent mutations within the EZH2 gene in T-ALL [30] and how the disruption of the EZH2 protein is enough to induce T-ALL in mice [97]. Third, the structure of the epigenomic landscape may be predisposing specific regions for increased mutation rates, such as those associated with the H3K9me3 repressive mark [98]. Given these pieces of evidence, Knudson’s two-hit hypothesis for inactivation of tumor suppressor genes that has mainly fo- cused on mutation and deletion events [99], could be extended to include epigenetic events as well [100].

Gene Expression Gene expression is the process by which genetic information (DNA) is tran- scribed into a functional product. It is the differential expression of RNA molecules that gives rise to cell-to-cell differences (phenotype). RNA mole- cules consist of both protein coding messenger RNAs (mRNAs) and a varie- ty of ncRNAs. Together, these molecules collectively consist of what we refer to as the transcriptome (Figure 4) [101]. It is now widely accepted that up to 70% of the human genome is transcribed, and that a large proportion of the transcribed regions are ncRNAs [34,36]. Many different types of ncRNAs exist including the highly transcribed ribosomal RNAs (rRNA) and transfer RNAs (tRNA) that are essential for translation of mRNA to protein. There are also many types of RNAs that are involved in regulation, particu- larly in tissue specialization, development and in disease [102,103]. Such ncRNAs include; small nucleolar RNAs (snoRNAs), micro RNAs (miR- NAs), spliceosomal RNAs (snRNAs), piwi-protein-interacting RNAs (piR- NAs), and medium to large RNAs such as long ncRNAs and antisense RNAs [101,104,105].

23

Figure 4. An example of the complexity of the human transcriptome. A single hu- man gene can have multiple coding transcripts (green/red lines) that originate from multiple alternative transcription start sites (arrows, green gene) and terminate at alternative polyadenylation sites (red gene). The two different alleles can be ex- pressed at different levels (green gene). The exons of protein coding genes are shown as boxes. miRNAs and snoRNAs (orange) can originate from intronic se- quences. Long and short antisense transcripts (blue) are transcribed from the oppo- site strand of coding genes. Novel short RNAs (purple) cluster around the transcrip- tion start and stop sites. Figure adapted from [101].

The functions of many of several of the ncRNA species are mostly unknown, however one interesting possibility is that they are involved in regulating several epigenetic processes [106]. For example, heterochromatin appears to be partly regulated by small RNAs, such as piRNAs and the RNA interfer- ence process [106]. Bidirectional transcription of can result in double strand- ed sense-antisense transcript pairs that are processed into small RNAs and direct epigenetic repression [107]. Alternatively, as hypothesized in the RNA-DNA interaction model, newly formed antisense ncRNAs can interact with DNMTs or recruit histone modifying enzymes to de-repress or repress sense RNA expression [108]. Notably, ncRNAs and anti-sense RNAs have been correlated with the development of cancer-specific DNA hypermeth- ylation at gene promoters [102], and may be responsible for recruiting Poly- comb Group proteins such as EZH2, which results in DNA hypermethylation [109,110].

Allele-Specific Gene Expression Another form of RNA regulation is the allelic imbalance in gene expression or allele specific gene expression (ASE) of two loci (alleles) of a gene in an individual [111,112]. It is commonly observed in imprinted genes and in X chromosome inactivation where only one of the two inherited alleles is ac- tive in females [113], but has also been observed in non-imprinted genes [114]. Current estimates suggest that the ~20% of genes expressed in a given cell type display ASE, whereas at the population level ~50% of expressed

24 genes display ASE [115]. ASE can be used as a guide for identifying genes regulated by cis-regulatory elements such as regulatory SNPs [116] or DNA methylation [117]. ASE and allele specific CpG site methylation may be controlled in a similar fashion by the same cis-regulatory variants, for exam- ple by a SNP disrupting a CpG methylation site [118].

Gene Expression Profiling in ALL Over a decade ago, Golub et al published their seminal study that demon- strated for the first time that subclasses of leukemia could be classified by district gene expression patterns [119]. Since then several microarray-based expression studies have established that cytogenetic subtypes of ALL have unique gene expression signatures [120-122] and have also discovered new subtypes of ALL [27]. In addition, several reports have suggested that the gene expression profiles of ALL samples taken at diagnosis can have prog- nostic significance [32]. Consequently, gene expression profiling has been explored as a potential tool for diagnostics and prognosis in ALL. However, owing to limitations related to the collection of the small amount of pure leukemic cells, RNA storage, and data interpretability its wide-spread appli- cation has been hindered [123]. Nonetheless, such expression analyses have led to a considerably better molecular understanding of ALL subtypes.

Technologies for Genome-Scale Analysis With the completion of the [124], the advancement in technologies for high-throughput DNA analysis with high-density micro- arrays [2,125] and DNA sequencing quickly followed [3,4]. Today, we can measure genetic and epigenetic variation at the single nucleotide level in thousands of samples at a reasonable cost and the long awaited 1,000 USD genome is now within reach [126]. New applications for sequencing RNA (RNA-seq) have provided a revolution in terms of dynamic range and single- base resolution for gene expression studies that were previously hampered by cost, throughput, and a priori design [127].

Array-Based Technologies The advent of microarray-based technologies has transformed the setting of high-throughput molecular biological studies. Microarrays have been adapted for large-scale SNP genotyping, copy number analyses, gene ex- pression analyses, determining transcription factor binding sites, and more [43,125,128]. Although the improvements in new sequencing technologies have made whole-genome sequencing (WGS) a possibility, the cost of such efforts in not reasonable for studies of hundreds to several thousands of indi-

25 viduals. Therefore, microarray-based studies are still the most feasible choice for study designs where many individuals are to be interrogated. SNP genotyping arrays are built by a combination of different methods: First, DNA is amplified by whole-genome amplification (WGA) in order to have ample DNA for subsequent reactions. Second, the amplified DNA is hybridized to oligonucleotide probes in solution or directly onto a microar- ray. Third, by reactions specific to each array, the SNP alleles are distin- guished from one another. Lastly, the signal from the SNP allele is and de- tected [2]. Currently, a single SNP up to 5 million SNPs can be assayed sim- ultaneously, depending on the platform used. SNP detection assays can be further applied for detection of CNVs and ASE. Both CNV and ASE anal- yses involve the analysis of raw intensity signals obtained from the arrays used for SNP calling [43,115]. The array-based techniques used for analyzing SNPs can also be used to determine the methylation status of cytosine within CpG dinucleotides. However, an additional step involving the addition of bisulfite treatment of DNA is required. 5mC has special properties in comparison to unmethylated cytosine in that the methylation protects the residue from conversion to ura- cil when the DNA is treated with sodium bisulfite [129]. Therefore, after sodium bisulfite treatment, CpG sites can be genotyped as C/T polymor- phisms since uracil is transformed into thymine during PCR. Array-based assays such as the Golden Gate and Infinium DNA methylation Arrays allow for simultaneous “epigenotyping” of 1,536-485,000 sites in multiple DNA samples in a single experiment (Illumina Inc).

Sequencing-Based Technologies The terms next generation sequencing (NGS) or “massively parallel” se- quencing describes technologies for sequencing many short DNA fragments in parallel. The advent of such technologies marked a dramatic shift away from automated Sanger sequencing towards methods that significantly in- crease the number of sequences produced in a single experiment, at a frac- tion of the cost [3]. These platforms are not yet capable of producing reads as long with traditional Sanger sequencing, but with rapid technology ad- vancements several of the platforms are expected to reach the same read lengths in the coming year. NGS technologies have circumvented issues with short sequencing lengths due to the sheer amount of sequences produced. The dominating commercial systems available today are built on three basic steps: Template preparation, clonal template amplification, and sequencing. In this thesis, we utilized the sequencing by synthesis, single base extension technology on the Genome Analyzer/HiSeq/MiSeq sequencing machines from Illumina Inc (San Diego, CA, USA) for sequencing short cDNA tags for digital gene expression analysis.

26 NGS has decreased the cost and increased the throughput of sequencing- based gene expression methods [3,4,130]. With such improvements, meth- ods like serial analysis of gene expression (SAGE) [131], which was previ- ously hindered by the low-throughput of cloning and high cost of Sanger sequencing, are now feasible for high-throughput genome-wide expression analysis [132]. Sequencing-based methods measure gene expression by digi- tal counting of the RNA molecules in a sample, whereas with microarrays the expressed genes are detected by hybridization to pre-designed probes and measurement by fluorescence intensity. Sequencing methods outperform microarrays in the increased dynamic range and in detection of novel tran- scripts [132]. Additionally, ASE can also be studied by digitally counting the occurrence of the different alleles in RNA sequence data. Such digital counts can be achieved by interrogating heterozygous SNPs in mRNA sequences achieved with RNA-seq [133] or by capturing the heterozygous information by using padlock probes in DNA and cDNA followed by sequencing [134]. WGS and whole-genome bisulfite sequencing (WGBS) are now common- ly used to interrogate the entire 3.1 billion bp of the human genome and the 28 million CpG sites in healthy and diseased DNA samples [4,30,77,96]. Although WGS has become feasible in the recent years, it is still an expen- sive procedure, especially when multiple samples are to be studied. An alter- native approach is to use enrichment methods to “fish out” specific regions of the genome of interest, and sequence only those regions. Several strate- gies based on this approach are available, such as target capture by custom- designed oligonucleotide microarrays, solution-based hybridization, and chromatin immunoprecipitation (ChIP) with antibodies against specific his- tone modifications or transcription factors [4,135]. In summary, array-based and NGS technologies have transformed the ge- netic and epigenetic study of human diseases. The vast improvement in the understanding of the mutational burden in cancer cells, the expression of genes that deregulate important pathways, and the disruption of epigenetic machinery will undoubtedly increase exponentially as more diseased ge- nomes are analyzed daily.

27 Present Investigations

Thesis Aims The aim of this thesis was to utilize modern methods for genome-wide anal- ysis to discover and characterize gene expression and DNA methylation patterns in primary ALL cells. The long-term aim of these studies was to advance the understanding of the genes, their variants, and the molecular pathways underlying the etiology of ALL and the clinical diversity between ALL patients. The specific objectives of the work in this thesis were to de- termine:

• Genome-wide gene expression patterns in five common subtypes of ALL. • Allele-specific gene expression caused by cis-acting DNA meth- ylation in ALL cells. • DNA methylation signatures in ALL at diagnosis, remission, and at relapse.

Materials and Methods Samples The primary ALL samples analyzed in Papers I-IV were collected at collab- orating institutions within the Nordic Society of Pediatric Hematology and Oncology (NOPHO). Bone marrow aspirates or peripheral blood samples from patients at diagnosis, relapse, and remission were collected from over 800 children diagnosed with ALL between 1996 and 2010. Ficoll-isopaque centrifugation (Pharmacia) was performed on all samples. All of the patients were enrolled in the NOPHO 1992 or 2000 treatment protocols [8]. The Eth- ics Committee of Uppsala University approved the study, and the patients and/or their guardians provided written informed consent. For papers III-IV, several types of non-leukemic control cells were ana- lyzed including healthy cells from matched remission bone marrow samples taken from the same ALL patients during routine follow-up screening for MRD within one year of diagnosis. In paper IV, CD19+ B-cells and CD3+ T- cells were isolated from peripheral blood mononuclear cells from healthy Swedish blood donors using positive selection and MACS cell separation

28 reagents (Miltenyi Biotec). Pooled CD34+ cells isolated from five healthy blood donors were purchased from 3H Biomedical (Uppsala, Sweden). DNA and RNA were extracted from pelleted or cryopreserved samples of 0.2-30 million cells using the AllPrep DNA/RNA Mini kit or the Qiamp Blood mini kit (Qiagen). RNA quantity and integrity was assessed by capil- lary electrophoresis on RNA 6000 Nano LabChips (Agilent). DNA quantity was measured by UV absorbance (Nanodrop) or fluorescence intensity with the Picogreen Assay (Invitrogen).

Figure 5. Summary of the protocol for digital gene expression (DGE) tag profiling. Polyadenylated RNAs are captured from 1 µg of total RNA on magnetic oligo(dT) beads and double stranded cDNA is synthesized. The cDNA is digested with the restriction enzyme NlaIII. Adapter one containing the recognition sequence for the restriction enzyme MmeI is ligated to the construct. MmeI cuts the construct down- stream from the recognition sequence, leaving 17 bp of “tag” sequence. A second adapter is ligated and the construct is amplified by PCR. The PCR products are gel purified and quantified on a Bioanalyzer (Agilent). The library is subsequently se- quenced on an Illumina Genome Analyzer (Illumina Inc).

29 Digital Gene Expression Profiling (Paper I) Digital gene expression (DGE) libraries for genome-wide RNA expression were generated using the commercially available Digital Gene Expression Profiling with NlaIII kit from Illumina, Inc. Twenty-one RNA samples of high quality from five subtypes of ALL were selected for analysis. The de- tailed protocol for DGE profiling is outlined in Figure 5. Using this bead- based protocol with chronological addition of adapters enables strand- specific interrogation because the information about which DNA strand the original polyadenylated RNA molecule originated from is retained. Each of the DGE libraries was sequenced on a single lane of a flow cell on the Ge- nome Analyzer II (Illumina, Inc).

Figure 6. Allele-specific gene expression (ASE). (A) Two copies of a gene are tran- scribed, one from each allele (allele 1 and allele 2). The gene is indicated as down- wards arrows and the mRNAs are indicated as wavy lines. In this hypothetical ex- ample, the mRNA is expressed in four copies from allele 1 and one copy from allele 2. Using the equation in (B), the ASE value is calculated by subtracting the allele fraction in DNA (1/1+1) from the allelic fraction in RNA (4/(4+1)). This hypothet- ical gene has a quantitative ASE value of 0.3.

SNP Genotyping of DNA and RNA (Paper II) Allele-specific gene expression (ASE) is measured by genotyping heterozy- gous SNPs in DNA and RNA. In this method, the RNA (cDNA) level for each SNP allele is compared to the corresponding DNA from the same sam- ple (Figure 6). The commercially available NS-12 SNP genotyping Bead- Chip from Illumina, Inc was utilized for the detection of ASE. The NS-12 BeadChip is based on allele-specific primer extension for the analysis of 13,900 SNPs annotated to approximately 8,000 genes. Prior to genotyping, RNA was reverse transcribed into cDNA with oligo-dT primers (Ambion). High quality DNA and cDNA from 197 ALL samples were simultaneously analyzed in triplicate. Detection of SNPs was performed in a one-color assay by allele-specific primer extension with biotinylated dNTPs and streptavi- din-phycoerythrin sandwich staining.

30 DNA Methylation Analyses (Papers II, III & IV) For the analysis of DNA methylation at single CpG site resolution we de- signed a custom GoldenGate DNA methylation panel to assay 1,536 CpG sites and also utilized the Infinium Human Methylation 450k Array that as- says a predefined set of ~485,000 sites across the genome (Illumina, Inc). In papers II-III, the custom selected 1,536 CpG sites were chosen to be within 2 kb upstream and 1 kb downstream from the transcription start sites (TSS) of 416 genes. In total, 400 of the genes were chosen based on observed ASE and several candidate genes were included based on previously previous implication in hematological diseases. In both assays, genomic DNA was treated with sodium bisulfite (Zymo Research), which converts all un- methylated cytosines to uracils, while methylated cytosines remain uncon- verted. CpG sites are interrogated by genotyping the bisulfite converted DNA for U(T)/C SNPs at the sites of interest by a two-color detection system (Figure 7). The GoldenGate assay is based on hybridization of two allele-specific oligos that are labeled with two different fluorophores, which were measured using an Illumina Bead Station GX scanner. In the Infinium assay, two dif- ferent probe types are utilized for maximum coverage of the genome. Type I probes are based on allele-specific hybridization of the two alleles to two different bead types, one for the methylated allele and the other for the un- methylated allele. The type II probe design is based on allele-specific primer extension where the methylated/unmethylated base is interrogated by fluo- rescent signals in two color channels. The fluorescence signals were meas- ured using the Illumina iScan system.

Bioinformatics and Statistical Analyses (Papers I-IV) Bioinformatic and statistical analyses were performed using custom de- signed scripts in the Perl and R languages [136]. Statistical packages were downloaded from Bioconductor (www.bioconductor.org) including “edg- eR”, “pamr” and “limma” [137-139]. Special consideration was given to the underlying distributions of the data. For example, DNA methylation obser- vations are confined to values between 0.0-1.0, and in cancer samples espe- cially, the β-value distribution is typically not normally distributed. There- fore, we performed non-parametric tests, such as the Wicoxon rank-sum test (also known as the Mann-Whitney test) and the Wilcoxon signed-rank test for paired analyses. In addition, multiple tests were performed on genome- scale data in each of the studies. An important step in the data analysis is the adjustment for performing multiple tests. We utilized three different methods for controlling spurious findings due to multiple testing that included strict Bonferroni adjustment, controlling false discovery rate (FDR) with the Ben- jamini-Hochberg approach, and permutation testing [140].

31

Figure 7. Schematic depiction of DNA methylation assays. (A) Bisulfite is applied to genomic DNA to convert unmethylated cytosines to uracils. In the GoldenGate Assay, the methylation levels are assayed using allele-specific oligos that hybridize to the site of interest. Ligated constructs are then amplified by PCR with three uni- versal primers: Two fluorescently labeled allele specific primers (shown in red and green) and a reverse primer universal to each template (shown in gray). The PCR products are hybridized to BeadArrays by their complementary address sequences (yellow) and the methylation status is deduced by the signals from the two fluoro- phores. (B) Thirty percent of the 450k probes on the Infinium Human Methylation 450k BeadChip are type I, based on allele-specific hybridization followed by single base extension with fluorescently labeled nucleotides that measure the methylation status of CpG sites with two different bead types in one color channel. (C) The re- maining (type II) probes are based on allele-specific base extension by fluorescently labeled nucleotides. Type II probes are analyzed in two color channels, one color for methylated and the other for unmethylated cytosines on one bead type.

32 Results Genome-Wide Gene Expression in ALL Cells (Paper I) We generated digital gene expression (DGE) profiles from RNA extracted from blood or bone marrow taken at diagnosis from 21 ALL samples with either BCP ALL (four subtypes) or T-ALL: High Hyperploidy (HeH) (n=8), t(9;22) BCR-ABL1 (n=3), t(12;21) ETV6-RUNX1 (n=3), dic(9;20) (n=3) and four patients with T-cell lineage ALL (T-ALL). We obtained between 4.9 and 23.7 million sequence reads for each library, of which approximately 70% were uniquely annotated to the human reference transcriptome se- quence. DGE combined with supervised classification by nearest shrunken cen- troids (NSC) proved excellent for distinguishing the subtypes of ALL (Paper I, Figure 2). Due to the strand-specific nature of DGE profiling, we detected antisense transcripts for ~48% of the expressed genes. Again, implementing supervised classification with NSCs, we found that a subset of the antisense transcripts was expressed in a subtype-specific manner (Paper I, Figure 2). Alternative polyadenylation (APA) changes the length of the 3’ untrans- lated region (UTR), resulting in shorter and longer mRNA isoforms that are detectable as multiple expressed 3’ tags using DGE. The detection of multi- ple DGE tags in 3’ UTR gene regions indicated APA for 38% of the genes expressed) in our analysis. Of the APA observations with exactly two tags in the 3’ UTR, 17% were supported by the presence of a predicted polyA cleavage site between the tags [141]. Interestingly, genes with evidence of APA highly correlated with the presence of miRNA binding sites (p-value < 0.0001) and the presence of antisense transcripts in the same gene region (p- value < 0.0001), suggesting that antisense transcripts, alternative polyad- enylation, and the presence of miRNA binding sites may be spatially con- nected.

Allele-Specific Gene Expression in ALL Cells (Paper II) To discover genes that display differential expression of the two alleles in primary leukemia cells, we screened heterozygous SNPs in the coding re- gions of ~8,000 genes in 197 diagnostic ALL samples. In order for a SNP to be informative for the detection of ASE, the SNP must be heterozygous in DNA and expressed in RNA. In our study, 32% (n = 3,531) of the SNPs in 2,529 genes were informative and ASE was detected in 470 SNPs in 400 genes. To determine whether ASE correlated with DNA methylation, we quanti- fied the methylation levels of CpG sites in the promoter regions and 1st in- tron of the genes with ASE. Applying three different approaches, we found a strong correlation between ASE and CpG site methylation.

33 Firstly, we noted that there was a difference in the magnitude of ASE be- tween one-directional (the same allele always over-expressed) and bidirec- tional (different alleles over-expressed) ASE. Genes with bidirectional ASE had significantly larger variability in methylation than genes with one- directional ASE (p-value = 2.2 x 10-5) (Paper II, Figure 4). In a second approach, we compared ASE-levels for genes with two groups of samples displaying different methylation patterns. The first group contained samples with variable CpG methylation (β-values from 0.25-0.75) and the second group contained samples with high or low CpG methylation (β-values > 0.75 and < 0.25). For 35 genes (Paper II, Table 2), we observed significantly higher ASE-levels for the first group, which contained samples with variable CpG methylation (p-value < 0.05). In the third approach, we looked for quantitative correlation between ASE-level and β-values. This analysis highlighted 24 genes (Paper II, Table 3) with quantitative correlation (p-value < 0.05). Twelve common genes were highlighted in this analysis and all were from the 35 genes highlighted above.

DNA Methylation at Diagnosis of ALL and Remission (Paper III) Next, we continued our investigation of the 1,536 CpG sites by determining the methylation levels in 20 bone marrow samples taken at diagnosis of ALL and matched bone marrow samples taken from the same patients at remis- sion. In addition, we measured the methylation levels in non-leukemic pedi- atric bone marrow samples as controls. We found that the ALL cells had methylation profiles that were clearly distinct from the bone marrow cells at remission (Paper III, Figure 2). The methylation profiles of the cells at re- mission were highly similar to those of bone marrow cells from children without leukemia. For 28 CpG sites, we observed large DNA methylation differences (∆β-value ≥0.3, corrected p-value < 0.01) between the paired diagnostic and remission samples (Paper III, Table 2). Thus the observed differential methylation patterns most likely reflect de novo methylation gains/losses in the ALL cells in comparison to mononuclear cells present in healthy bone marrow.

Genome-Wide DNA methylation Analysis in a Large ALL Cohort (Paper IV) Human Methylation 450k BeadArrays were used for genome-wide epigeno- typing of >1,000 samples, including 850 ALL samples and 150 control sam- ples (Paper IV, Table 1). The control samples consisted of DNA from matched bone marrow samples taken from 86 ALL patients at remission and hematopoietic stem cells, B-cells, and T-cells from healthy blood donors.

34 The 450k assay provides quantitative methylation levels for ~485,000 CpG sites in >99% of human genes, >96% of “CpG islands” on all human auto- somes and the X/Y chromosomes. Stringent filtering was applied to both the CpG sites and DNA samples analyzed. First, DNA samples were quality controlled by determining how many CpG sites passed a detection p-value filter of < 0.01. Second, each DNA samples was controlled for complete bisulfite conversion, by a set of control probes that interrogate cytosine resi- dues that are expected to be unmethylated and completely converted to ura- cil. All samples passed these quality filters. Additionally, we identified unre- liable probes that may introduce bias into the data. We removed all probes annotated on the X and Y chromosomes, as X-inactivation would cause skewed β -values in females. Then we removed all probes with a SNP or indel with >5% minor allele frequency (dbSNP v135) annotated within 3 bp of the 3’ end of the probe due to highly variable β -values. Non-reference genetic variation is not accounted for in the 450k probe design. Lastly, we removed probes that aligned to the genome in multiple places (Paper IV, supplementary figure 2). After stringent quality control and filtering, 440,923 CpG sites remained for analysis (Paper II, supplementary figure 12). We evaluated the β-values for reproducibility. Of the 1,536 CpG sites that we measured with the GoldenGate assay (Papers II-III), 207 CpG sites were also assayed on the 450k array. In total, methylation values for 364 primary ALL samples included in Paper IV were also measured with the GoldenGate assay (Papers II-III or [33]). We found extremely high correlation in β - values measured at these sites between the two assay types (correlation > 0.9). Biological and technical replicates also proved that the β-values deter- mined by the array were highly reproducible (Paper IV, supplementary fig- ures 13-14). As CpG sites are known to display different patterns by relationship to CpG islands, we utilized annotation data on the 450k to split sites into four categories referred to as CpG “islands”, “shores” (0-2 kb from an island), “shelves” (2-4 kb from an island), and “open sea” for CpG sites outside of CpG island regions [75]. In general, CpG islands are unmethylated, shores display bimodal patterns (either methylated or unmethylated) and “shelves” and “open sea” are usually completely methylated (Paper IV, supplementary figure 3). Coordinately, we found that the different cytogenetic subtypes of ALL displayed highly divergent clustering based on subtype and class of CpG site analyzed (Paper IV, Figure 1 & supplementary figure 4). In all comparisons, the control cells clustered together. The methylation patterns in ALL cells were highly divergent from that of the tightly controlled patterns of the four different control cell types consisting of healthy bone marrow mononuclear cells, CD34+ hematopoietic stem cells, CD19+ B-cells, and CD3+ T-cells.

35 Discussion Papers I-IV describe how the implementation of new technologies for ge- nome-wide analysis, such as sequencing-based expression profiling by “massively parallel” sequencing technology, detection of allele-specific gene expression by genotyping SNPs in DNA and RNA, and array-based technol- ogies for determining the methylation status of thousands to hundreds of thousands of cytosine residues across the genome can enhance our understat- ing of the molecular phenotypes of ALL cells. To achieve this, an exception- ally unique, well-characterized, international collection of primary ALL cells were analyzed in the four papers included in this thesis. The results from Papers I-II document the detection two previously un- explored expression patterns in primary ALL cells. In Paper I, the identifica- tion of widespread antisense transcription was possible by use of a strand- specific sequencing protocol. Methods based on randomly-primed cDNA synthesis like most RNA-seq [142] and array-based [128] protocols cannot distinguish between transcripts from opposite strands expressed at the same genomic locus. In addition to difficulty quantifying overlapping transcripts, such methods potentially skew the quantitation of sense transcription due to transcripts arising from the opposite strand [132,142]. For many years, it has been known that cytogenetic subtypes of ALL express mRNA molecules in subtype-specific patterns [119]. Although several studies have shown that antisense transcription is prevalent in healthy and in cancer cells and several examples of antisense transcripts at tumor suppressor loci in leukemic cells lines [102,143], antisense expression had yet to be addressed genome-wide in ALL. We not only found evidence to suggest that antisense transcripts are both prevalent in ALL cells (up to 49%), but that a cytogenetic subtypes of ALL express a subset of antisense transcripts in a subtype-specific manner. Such genes included important regulators of B- and T-cell development known to be involved in ALL pathogenesis such as notch 3 (NOTCH3), paired box 5 (PAX5), B-cell CLL/lymphoma 9 protein (BCL9), B-cell CLL/lymphoma 11B (BCL11B), T-cell leukemia/lymphoma 1A (TCL1A), among others (Paper I, supplementary tables S7-S8). The length and struc- ture of the antisense transcripts remains unclear since DGE profiling only yields 21 bp of sequence. Allele-specific expression outside of the X-chromosome and imprinted genes appears to be common in the human genome. Today, several studies have described this occurrence in non-diseased cells [115,144]. A major advantage of using measuring ASE by SNP arrays as described in this thesis (Paper II), is that the levels of the two alleles are simultaneously measured relative to each other, within the same sample. This approach controls for environmental factors that may cause differences between samples by com- paring the allelic ratio in RNA to DNA, which serves as an internal control [115]. ASE was observed in several interesting genes related to leukemia

36 (Paper II, supplementary table 2), including the NOTCH3 gene that was mentioned in Paper I. Together, Papers I -II suggest that complex regulatory mechanisms play a role in gene regulation in ALL. Whether these transcripts are aberrantly ex- pressed in the ALL cells, as non-leukemic cells were not included in either analysis, remains to be addressed. To gain further insight into the function of the antisense and ASE patterns, a comprehensive analysis of the entire cod- ing and non-coding transcriptome could to be performed using new strand- specific deep RNA sequencing methods [145]. Today, array-based ASE studies can also be extended using genome-wide SNP chips since the pres- ence of unspliced RNAs can add to the confidence in calling ASE by in- creasing the heterozygous markers measured in each sample [115,146]. Ex- panding to genome-wide SNP arrays would also allow for measuring ASE in ncRNAs [147]. Thus, focusing on SNPs outside protein coding regions may provide a more complete genome-wide picture of ASE. In Paper II, we identified a subset of genes with ASE that are regulated by DNA methylation. One striking observation was that CpG sites outside of CpG islands exhibited a larger degree of variation in methylation than those within CpG islands. In related paper by our group [33] in which we charac- terize the subtype-specific methylation patterns and overall outcome in ALL, we also found that these variable sites were the most informative. The im- portance of methylation patterns of CpG island “shores” has been noted in several other studies [75,148]. In Paper IV, we analyzed the CpG sites based on annotations in CpG islands, “shores”, “shelves” and “open sea” we found that different cytogenetic subtypes display different clustering patterns de- pending on which sites were interrogated. Shores are strongly correlated to gene expression and are where the majority of tissue-specific methylation occurs [75], but how “shore” methylation is directed remains unclear. In Papers III-IV we investigated how DNA methylation is perturbed in ALL cells compared to non-leukemic cells. Paper III can be viewed as a pilot study for large-scale analysis of DNA methylation in Paper IV. In Pa- per III, we measured the methylation levels of ALL cells at diagnosis and healthy bone marrow mononuclear cells 1-4 months after initial treatment, while the patients were in morphological remission. In this paper, we includ- ed additional bone marrow mononuclear cells from children without leuke- mia or any other hematological disease to control for chemotherapy-induced difference in the bone marrow, which was not observed in the sites we as- sayed. An important consideration in this study design was the use of control material. As DNA methylation is known to be highly correlated to cell type [76], individual genotype [144], and is known to differ throughout the life- time of individuals [77], with especially large changes during childhood and adolescence [149], the selection of appropriate control cells is essential. There is inadequate knowledge of the “leukemia-initiating cell” to define which cell type should be used in such studies [150]. The mononuclear cells

37 that comprise healthy bone marrow stem from all normal hematopoietic cell lineages, i.e. lymphoid, myeloid and erythroid progenitor cells at varying stages of differentiation. In Paper IV, we chose to expand the control types and therefore we chose to analyze three additional specific controls includ- ing purified B-cells, T-cells, and hematopoietic stem cells from healthy indi- viduals. One general, but important observation was that in both studies, the ALL cells were clearly different from all types of control cells, thus giving us the confidence that the majority of the differential methylation observed were ALL-specific. In Paper III, we found that 28 CpG sites annotated to 24 genes consistent- ly displayed differential methylation in 20 diagnosis-remission pairs. Among the differentially methylated sites in Paper IV, several CpG sites in each of the genes from Paper III were validated by analysis in additional ALL sam- ples and controls. Nine of the genes (CR1, WDR35, ZNF502, EYA4, RUNDC3B, FAM83B, DBC1, MYO3A, SEC14L1) were hypermethylated across all cytogenetic subtypes of ALL and unmethylated in each control cell type (Paper IV, supplementary table 2). These results indicate that aberrant DNA methylation of these genes is likely a general event in ALL. Hyper- methylation of EYA4 is particular is interesting because it is observed in other malignancies [151] and has been recently implicated among epigenetic drivers in cancer [152]. One striking observation from the combination of these four studies is a profound overlap in gene lists. For example, NOTCH3 displays antisense transcription in T-ALL cells (Paper I) and ASE (Paper II). Over 30 of the genes with ASE (Paper II) were found to have differential methylation be- tween ALL cells and controls (Papers III-IV), possibly indicating that the ASE observed in a subset of the genes in paper II was due to de novo hyper- or hypermethylation in ALL.

Concluding Remarks In conclusion, the results presented in this thesis identify new patterns that will aid in unraveling the complexity and variability of ALL at the molecular level. With the accumulating knowledge of inherited and somatic variation that drives human malignancies and advances in sequencing, “personalized” medicine is likely to become a reality in the forthcoming years. However, we are only at the beginning of understanding how and if the different genetic and epigenetic marks contribute to leukemic initiation and affect long-term survival for the patients. In sheer numbers, the epigenetic events are far more frequent than genetic events. It can be argued that epigenetic profiling may be just as, or more important for diagnostics, prognostication, and therapeu- tic stratification of leukemia patients in the future [153]. Importantly, with the increasing content available on microarrays and sequencing technologies

38 that are rapidly improving output every day, in order to utilize the growing data improvements in computational infrastructures and the adaptation of statistical analyses and algorithms are needed. Today and in the future, the challenges will be not in generating the data, but in interpreting the results and drawing conclusions from the underlying biology.

39 Acknowledgements

The work in this thesis was performed in the Molecular Medicine group at the De- partment of Medical Sciences, Uppsala University, Sweden. These studies have been supported by numerous grants from the Swedish Cancer Society, the Swedish Pedi- atric Cancer Foundation, the Swedish Research Council for Science and Technolo- gy, the Swedish Foundation for Strategic Research, and Uppsala University Hospi- tal. Numerous travel grants and stipends were generously provided by the Swedish Cancer Society, the Mortar Board SDSU Chapter, the American Women’s Club Stockholm, and the Hayfork Scholarship Foundation Hurlbut Grant.

To my main supervisor, Ann-Christine Syvänen: Thank you for giving me the chance to do my PhD in your group. In the beginning of my PhD you saw an oppor- tunity and believed that I could learn bioinformatics, at first I was not as convinced, but I cannot thank you enough for sending me to all the programming courses and putting up with the slow progress at the start. Thank you for allowing me to work independently on my projects, and for pulling me back into focus when I got side- tracked, I’ve learned so much!

To my co-supervisor, Anna Kiialainen: Thank you for being not only a phenomenal co-supervisor, but for becoming a dear friend as well. I have so many fond memo- ries of running 21.1k in the midnight sun to snowboarding down to the slopes of Sälen. I’ll always remember struggling to put oil on the flow cell at midnight… and your patience in the process.

Gudmar Lönnerholm, Thank you for sharing your vast knowledge of ALL and for the fantastic collaboration. I also would like to thank you for proof reading my thesis and paper IV this summer (while on vacation!).

To all the co-workers in the MolMed Group: Annika Ahlford, Lili Milani & Johanna Sandling, the big “sisters” who set the groundwork for a successful PhD. I can’t imagine what a mess this whole process would have been without your organization, good advice and support throughout. To the current MolMed members: Chuan, Olof, Anders, “Old” Per, Mårten, Eva, “new” Per, Elin, Christofer, Julianna, Lina, Jonas, Amanda, Mathias, Josefine, Niklas, Behrooz, Satyam, Shahina and all the past members of the group: thank you for making our workplace a “nerdy” & fun environment, especially for the famous “duble låda” and late night wine breaks! Chuan Wang, my twin! Thank you for

40 being a good listener, an excellent problem solver and the best co-TA I could ever ask for. I enjoyed sharing an office and teaching responsibilities with you all these years. Good luck on the 18th J. Olof Karlberg, THANK YOU for your help on the DGE paper, all the programming advice, and for your refreshing honesty. I’m sorry for crashing mumble during the weekends/summer and not wanting to use Uppmax. I do promise to work on that… Anders Lundmark, thank you for being an amazing co-author and R guru. Without your support most of the projects in this thesis would not have been possible. Christofer Bäcklin, thank you for putting in so many long hours on the last manuscript & for creating so many beautiful figures. It’s been a great pleasure to work together & I’m looking forward working on more 450k pro- jects in the future. Mårten Lindqvist, thank you for fueling this thesis with espresso shots,trying to calm down the freak-outs, and for agreeing to be my ToastMan! I owe you big. Per Wahlberg, the fiercest editor around, thank you for all the help with paper IV. To my former students, Ying, Yanara, & Emelie: Thank you for giv- ing me inspiration in your eagerness to learn about ALL and epigenetics and for all that you’ve taught me in return. Lillebil Andersson thank you for making sure every- thing runs so smoothly around here!

A big thanks to everyone (past and present) from the SNP&SEQ platform. I would especially like to thank Ulrika Liljedahl, Tomas Axelsson, Torbjörn Öst, Ingvar Örn Thorsteinsson, Camilla Enström, Kristina Larsson, and Marie Linders- son for all the hands-on help with the projects included in this thesis and to the IT guys for putting up with me & all my “user issues”.

To the collaborators: Thank you to all the clinicians and patients in the Nordic Society of Pediatric Hema- tology and Oncology (NOPHO) for your invaluable contribution to this work. Mads Sønderkær, Annabeth Petersen & Kåre Lehmann Nielsen, thank you for all the help with the DGE pipeline and for an enjoyable trip to Aalborg. Maija-Leena Eloranta for the fractionated cells and for the immunology discussions, Mats Gustafsson for helping me to understand and implement “multivariate” analyses, Rolf Larsson for the help with Metacore and interesting discussions about drug resistance, Christina Leek & Anna Karin Lannergård the help with the patient samples.

Thank you to all the lovely FoA1-3 people for making our workplace enjoyable! Special thanks to the “Pub crew” Olle, Gry,& Navya, and to Stina for the long chats.

A big thank you to all my Uppsala friends, in particular: Britta, for being an excellent friend and for the encouraging skype messages. Angie, for being an amazing conference/course companion, for reminding me to be realistic & for volunteering to proof read this thesis! Daniel OD, thank for always listening and for letting me be silly. To Sahar & Sayeh, thank you for taking me in as a sister, and Sayeh for a wonderful year living together. To my roomie Lesley, I couldn’t have asked for better company while finalizing the thesis. Thanks for the great ad-

41 vice, emergency proof reading & for washing up after me. Nikki, for the moral sup- port and for helping me out at the last minute! Jelena, for being fabulous. Sara, for being there for the tears and the smiles. To Didi and Fifi for inspiring me to run faster. To Mi for always being a positive spirit. To the wine club (Sara, Jelena, Christina, & Ville), the t-shirts (Angelica, Emilia, Tobias B., Tanai, Daniel C., Ana, Caro, Daniel OD, Ammar, Ola, Claire, Sahar, Tobias J., Anna & Manda), and the “Rudbeck Gang” (Sara, Jelena, Lesley, Fiona, Nicola, Angie, Dijana, Millaray, Demet, Lucy, Mattias, Ammar & Paul) I couldn’t imagine a better group of people (Fig. 8) to spend all my “free time” with, thank you for the amazing adventures!

Figure 8. A Norreda Torp “kräftskiva” with 27.5 lovely friends (Aug 2012).

To my Hayfork girls: thank you for supporting me (each in your own special ways) and reminding me what a “hardy laugh” truly is. Jo, thank you for coming to Swe- den and for all the “fika” when I needed it most!

To my New York family: it was so nice to visit while on my “school” trips and during the holidays. I am so lucky to have such fantastic cousins.

To Martin, Your unconditional support made writing this thesis much less of a “guessing game” J. Thank you for putting up with me!

To my family, Mom & Dad, Johanna, John, Jenny, Isaac, Ellie, Cole & Ashley: Thank you all for always taking time off from work/school to have fun when I come to visit. Thank you for all the hikes, days at the lake, family dinners, and the numer- ous trips to the emergency room that made all those vacations so much fun! Thank you for being there for me, always.

42 References

1. Project THG (2004) Finishing the euchromatic sequence of the human genome. Nature 431: 931-945. 2. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS (2005) A genome- wide scalable SNP genotyping assay using microarray technology. Nat Genet 37: 549-554. 3. Mardis ER (2008) Next-generation DNA sequencing methods. Annu Rev Hum Genet 9: 387-402. 4. Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11: 31-46. 5. Pui CH, Relling MV, Downing JR (2004) Acute lymphoblastic leukemia. N Engl J Med 350: 1535-1548. 6. Pui CH (2011) Recent advances in acute lymphoblastic leukemia. Oncology (Williston Park) 25: 341, 346-347. 7. Gustafsson G, Schmiegelow K, Forestier E, Clausen N, Glomstein A, et al. (2000) Improving outcome through two decades in childhood ALL in the Nordic countries: the impact of high-dose methotrexate in the reduction of CNS irradiation. Nordic Society of Pediatric Haematology and Oncology (NOPHO). Leukemia 14: 2267-2275. 8. Schmiegelow K, Forestier E, Hellebostad M, Heyman M, Kristinsson J, et al. (2010) Long-term results of NOPHO ALL-92 and ALL-2000 studies of childhood acute lymphoblastic leukemia. Leukemia 24: 345-354. 9. Pui CH (2010) Recent research advances in childhood acute lymphoblastic leukemia. J Formos Med Assoc 109: 777-787. 10. Pui CH, Carroll WL, Meshinchi S, Arceci RJ (2011) Biology, risk stratification, and therapy of pediatric acute leukemias: an update. J Clin Oncol 29: 551- 565. 11. Loh ML, Mullighan CG (2012) Advances in the genetics of high-risk childhood B-progenitor acute lymphoblastic leukemia and juvenile myelomonocytic leukemia: implications for therapy. Clin Cancer Res 18: 2754-2767. 12. Mullighan CG (2011) Genomic profiling of B-progenitor acute lymphoblastic leukemia. Best Pract Res Clin Haematol 24: 489-503. 13. Rand V, Parker H, Russell LJ, Schwab C, Ensor H, et al. (2011) Genomic characterization implicates iAMP21 as a likely primary genetic event in childhood B-cell precursor acute lymphoblastic leukemia. Blood 117: 6848-6855. 14. Wiemels JL, Greaves M (1999) Structure and possible mechanisms of TEL- AML1 gene fusions in childhood acute lymphoblastic leukemia. Cancer Res 59: 4075-4082.

43 15. Pane F, Intrieri M, Quintarelli C, Izzo B, Muccioli GC, et al. (2002) BCR/ABL genes and leukemic phenotype: from molecular mechanisms to clinical correlations. Oncogene 21: 8652-8667. 16. Carroll WL, Loh M, Biondi A, Willman C (2011) The Biology of Acute Lymphoblastic Leukemia Childhood Leukemia. In: Reaman GH, Smith FO, editors: Springer Berlin Heidelberg. pp. 29-61. 17. Mori H, Colman SM, Xiao Z, Ford AM, Healy LE, et al. (2002) Chromosome translocations and covert leukemic clones are generated during normal fetal development. Proc Natl Acad Sci U S A 99: 8242-8247. 18. Schmiegelow K, Vestergaard T, Nielsen SM, Hjalgrim H (2008) Etiology of common childhood acute lymphoblastic leukemia: the adrenal hypothesis. Leukemia 22: 2137-2141. 19. Nowell PC, Hungerford DA (1961) Chromosome studies in human leukemia. II. Chronic granulocytic leukemia. J Natl Cancer Inst 27: 1013-1035. 20. Kurzrock R, Kantarjian HM, Druker BJ, Talpaz M (2003) Philadelphia chromosome-positive leukemias: from basic mechanisms to molecular therapeutics. Ann Intern Med 138: 819-830. 21. Druker BJ, Sawyers CL, Kantarjian H, Resta DJ, Reese SF, et al. (2001) Activity of a specific inhibitor of the BCR-ABL tyrosine kinase in the blast crisis of chronic myeloid leukemia and acute lymphoblastic leukemia with the Philadelphia chromosome. N Engl J Med 344: 1038-1042. 22. Zachariadis V, Gauffin F, Kuchinskaya E, Heyman M, Schoumans J, et al. (2011) The frequency and prognostic impact of dic(9;20)(p13.2;q11.2) in childhood B-cell precursor acute lymphoblastic leukemia: results from the NOPHO ALL-2000 trial. Leukemia 25: 622-628. 23. Schoumans J, Johansson B, Corcoran M, Kuchinskaya E, Golovleva I, et al. (2006) Characterisation of dic(9;20)(p11-13;q11) in childhood B-cell precursor acute lymphoblastic leukaemia by tiling resolution array-based comparative genomic hybridisation reveals clustered breakpoints at 9p13.2 and 20q11.2. Br J Haematol 135: 492-499. 24. Forestier E, Gauffin F, Andersen MK, Autio K, Borgstrom G, et al. (2008) Clinical and cytogenetic features of pediatric dic(9;20)(p13.2;q11.2)- positive B-cell precursor acute lymphoblastic leukemias: a Nordic series of 24 cases and review of the literature. Genes Chromosomes Cancer 47: 149- 158. 25. Carroll WL, Raetz EA (2012) Clinical and laboratory biology of childhood acute lymphoblastic leukemia. J Pediatr 160: 10-18. 26. Russell LJ, Capasso M, Vater I, Akasaka T, Bernard OA, et al. (2009) Deregulated expression of cytokine receptor gene, CRLF2, is involved in lymphoid transformation in B-cell precursor acute lymphoblastic leukemia. Blood 114: 2688-2698. 27. Mullighan CG, Su X, Zhang J, Radtke I, Phillips LA, et al. (2009) Deletion of IKZF1 and prognosis in acute lymphoblastic leukemia. N Engl J Med 360: 470-480.

44 28. Pui CH, Mullighan CG, Evans WE, Relling MV (2012) Pediatric acute lymphoblastic leukemia: where are we going and how do we get there? Blood 120: 1165-1174. 29. Kraszewska MD, Dawidowska M, Szczepanski T, Witt M (2012) T-cell acute lymphoblastic leukaemia: recent molecular biology findings. Br J Haematol 156: 303-315. 30. Zhang J, Ding L, Holmfeldt L, Wu G, Heatley SL, et al. (2012) The genetic basis of early T-cell precursor acute lymphoblastic leukaemia. Nature 481: 157-163. 31. Gaynon PS (2005) Childhood acute lymphoblastic leukaemia and relapse. Br J Haematol 131: 579-587. 32. Kang H, Chen IM, Wilson CS, Bedrick EJ, Harvey RC, et al. (2010) Gene expression classifiers for relapse-free survival and minimal residual disease improve risk classification and outcome prediction in pediatric B-precursor acute lymphoblastic leukemia. Blood 115: 1394-1405. 33. Milani L, Lundmark A, Kiialainen A, Nordlund J, Flaegstad T, et al. (2010) DNA methylation for subtype classification and prediction of treatment outcome in patients with childhood acute lymphoblastic leukemia. Blood 115: 1214-1225. 34. Gutschner T, Diederichs S (2012) The Hallmarks of Cancer: A long non-coding RNA point of view. RNA Biol 9. 35. Stein LD (2004) Human genome: end of the beginning. Nature 431: 915-916. 36. Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, et al. (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447: 799-816. 37. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921. 38. The-International-Haplotype-Mapping-Project (2005) A haplotype map of the human genome. Nature 437: 1299-1320. 39. Durbin RM, Abecasis GR, Altshuler DL, Auton A, Brooks LD, et al. (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073. 40. Relling MV, Hancock ML, Rivera GK, Sandlund JT, Ribeiro RC, et al. (1999) Mercaptopurine therapy intolerance and heterozygosity at the thiopurine S- methyltransferase gene locus. J Natl Cancer Inst 91: 2001-2008. 41. Ramsey LB, Bruun GH, Yang W, Trevino LR, Vattathil S, et al. (2012) Rare versus common variants in pharmacogenetics: SLCO1B1 variation and methotrexate disposition. Genome Res 22: 1-8. 42. Stankiewicz P, Lupski JR (2010) Structural variation in the human genome and its role in disease. Annu Rev Med 61: 437-455. 43. Alkan C, Coe BP, Eichler EE (2011) Genome structural variation discovery and genotyping. Nat Rev Genet 12: 363-376. 44. Forestier E, Izraeli S, Beverloo B, Haas O, Pession A, et al. (2008) Cytogenetic features of acute lymphoblastic and myeloid leukemias in pediatric patients with Down syndrome: an iBFM-SG study. Blood 111: 1575-1583.

45 45. Mullighan CG, Goorha S, Radtke I, Miller CB, Coustan-Smith E, et al. (2007) Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature 446: 758-764. 46. Papaemmanuil E, Hosking FJ, Vijayakrishnan J, Price A, Olver B, et al. (2009) Loci on 7p12.2, 10q21.2 and 14q11.2 are associated with risk of childhood acute lymphoblastic leukemia. Nat Genet 41: 1006-1010. 47. Collins-Underwood JR, Mullighan CG (2010) Genomic profiling of high-risk acute lymphoblastic leukemia. Leukemia 24: 1676-1685. 48. Trevino LR, Yang W, French D, Hunger SP, Carroll WL, et al. (2009) Germline genomic variants associated with childhood acute lymphoblastic leukemia. Nat Genet 41: 1001-1005. 49. Greaves M (2006) Infection, immune responses and the aetiology of childhood leukaemia. Nat Rev Cancer 6: 193-203. 50. Hemminki K, Jiang Y (2002) Risks among siblings and twins for childhood acute lymphoid leukaemia: results from the Swedish Family-Cancer Database. Leukemia 16: 297-298. 51. Sherborne AL, Hemminki K, Kumar R, Bartram CR, Stanulla M, et al. (2011) Rationale for an international consortium to study inherited genetic susceptibility to childhood acute lymphoblastic leukemia. Haematologica 96: 1049-1054. 52. Stratton MR (2011) Exploring the genomes of cancer cells: progress and promise. Science 331: 1553-1558. 53. Downing JR, Wilson RK, Zhang J, Mardis ER, Pui CH, et al. (2012) The Pediatric Cancer Genome Project. Nat Genet 44: 619-622. 54. Link DC, Schuettpelz LG, Shen D, Wang J, Walter MJ, et al. (2011) Identification of a novel TP53 cancer susceptibility mutation through whole-genome sequencing of a patient with therapy-related AML. JAMA 305: 1568-1576. 55. Lilljebjorn H, Rissler M, Lassen C, Heldrup J, Behrendtz M, et al. (2012) Whole-exome sequencing of pediatric acute lymphoblastic leukemia. Leukemia 26: 1602-1607. 56. Mullighan CG, Miller CB, Radtke I, Phillips LA, Dalton J, et al. (2008) BCR- ABL1 lymphoblastic leukaemia is characterized by the deletion of Ikaros. Nature 453: 110-114. 57. Yang JJ, Bhojwani D, Yang W, Cai X, Stocco G, et al. (2008) Genome-wide copy number profiling reveals molecular evolution from diagnosis to relapse in childhood acute lymphoblastic leukemia. Blood 112: 4178-4183. 58. Kuster L, Grausenburger R, Fuka G, Kaindl U, Krapf G, et al. (2011) ETV6/RUNX1-positive relapses evolve from an ancestral clone and frequently acquire deletions of genes implicated in glucocorticoid signaling. Blood 117: 2658-2667. 59. Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, et al. (2010) The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28: 1045-1048. 60. Gibney ER, Nolan CM (2010) Epigenetics and gene expression. Heredity 105: 4-13.

46 61. Jones PA (2012) Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet 13: 484-492. 62. Adams D, Altucci L, Antonarakis SE, Ballesteros J, Beck S, et al. (2012) BLUEPRINT to decode the epigenetic signature written in blood. Nat Biotechnol 30: 224-226. 63. Esteller M (2006) The necessity of a human epigenome project. Carcinogenesis 27: 1121-1125. 64. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, et al. (2011) Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473: 43-49. 65. Henikoff S (2008) Nucleosome destabilization in the epigenetic regulation of gene expression. Nat Rev Genet 9: 15-26. 66. You JS, Jones PA (2012) Cancer genetics and epigenetics: two sides of the same coin? Cancer Cell 22: 9-20. 67. Hock H (2012) A complex Polycomb issue: the two faces of EZH2 in cancer. Genes Dev 26: 751-755. 68. Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, et al. (2006) A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125: 315-326. 69. Holliday R, Pugh JE (1975) DNA modification mechanisms and gene activity during development. Science 187: 226-232. 70. Riggs AD (1975) X inactivation, differentiation, and DNA methylation. Cytogenet Cell Genet 14: 9-25. 71. Cedar H, Bergman Y (2012) Programming of DNA methylation patterns. Annu Rev Biochem 81: 97-117. 72. Feinberg AP, Vogelstein B (1983) Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 301: 89-92. 73. Ziller MJ, Muller F, Liao J, Zhang Y, Gu H, et al. (2011) Genomic distribution and inter-sample variation of non-CpG methylation across human cell types. PLoS Genet 7: e1002389. 74. Bird AP (1980) DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res 8: 1499-1504. 75. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, et al. (2009) The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41: 178-186. 76. Ji H, Ehrlich LI, Seita J, Murakami P, Doi A, et al. (2010) Comprehensive methylome map of lineage commitment from haematopoietic progenitors. Nature 467: 338-342. 77. Heyn H, Li N, Ferreira HJ, Moran S, Pisano DG, et al. (2012) Distinct DNA methylomes of newborns and centenarians. Proc Natl Acad Sci U S A 109: 10522-10527. 78. Sharma S, Kelly TK, Jones PA (2010) Epigenetics in cancer. Carcinogenesis 31: 27-36. 79. Feinberg AP, Ohlsson R, Henikoff S (2006) The epigenetic progenitor origin of human cancer. Nat Rev Genet 7: 21-33. 80. Feinberg AP, Tycko B (2004) The history of cancer epigenetics. Nat Rev Cancer 4: 143-153.

47 81. Jones PA, Baylin SB (2007) The epigenomics of cancer. Cell 128: 683-692. 82. Dawson MA, Kouzarides T (2012) Cancer epigenetics: from mechanism to therapy. Cell 150: 12-27. 83. Easwaran H, Johnstone SE, Van Neste L, Ohm J, Mosbruger T, et al. (2012) A DNA hypermethylation module for the stem/progenitor cell signature of cancer. Genome Res 22: 837-849. 84. Vire E, Brenner C, Deplus R, Blanchon L, Fraga M, et al. (2006) The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439: 871- 874. 85. Florean C, Schnekenburger M, Grandjenette C, Dicato M, Diederich M (2011) Epigenomics of leukemia: from mechanisms to therapeutic applications. Epigenomics 3: 581-609. 86. Wong NC, Ashley D, Chatterton Z, Parkinson-Bates M, Ng HK, et al. (2012) A distinct DNA methylation signature defines pediatric pre-B cell acute lymphoblastic leukemia. Epigenetics 7: 535-541. 87. Kraszewska MD, Dawidowska M, Larmonie NS, Kosmalska M, Sedek L, et al. (2012) DNA methylation pattern is altered in childhood T-cell acute lymphoblastic leukemia patients as compared with normal thymic subsets: insights into CpG island methylator phenotype in T-ALL. Leukemia 26: 367-371. 88. Hogan LE, Meyer JA, Yang J, Wang J, Wong N, et al. (2011) Integrated genomic analysis of relapsed childhood acute lymphoblastic leukemia reveals therapeutic strategies. Blood 118: 5218-5226. 89. Taylor KH, Pena-Hernandez KE, Davis JW, Arthur GL, Duff DJ, et al. (2007) Large-scale CpG methylation analysis identifies novel candidate genes and reveals methylation hotspots in acute lymphoblastic leukemia. Cancer Res 67: 2617-2625. 90. Gutierrez MI, Siraj AK, Bhargava M, Ozbek U, Banavali S, et al. (2003) Concurrent methylation of multiple genes in childhood ALL: Correlation with phenotype and molecular subgroup. Leukemia 17: 1845-1850. 91. Roman-Gomez J, Jimenez-Velasco A, Castillejo JA, Agirre X, Barrios M, et al. (2004) Promoter hypermethylation of cancer-related genes: a strong independent prognostic factor in acute lymphoblastic leukemia. Blood 104: 2492-2498. 92. Kuang SQ, Tong WG, Yang H, Lin W, Lee MK, et al. (2008) Genome-wide identification of aberrantly methylated promoter associated CpG islands in acute lymphocytic leukemia. Leukemia 22: 1529-1538. 93. Bernt KM, Armstrong SA (2011) Targeting epigenetic programs in MLL- rearranged leukemias. Hematology Am Soc Hematol Educ Program 2011: 354-360. 94. Rideout WM, 3rd, Coetzee GA, Olumi AF, Jones PA (1990) 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science 249: 1288-1290. 95. Figueroa ME, Abdel-Wahab O, Lu C, Ward PS, Patel J, et al. (2010) Leukemic IDH1 and IDH2 mutations result in a hypermethylation phenotype, disrupt TET2 function, and impair hematopoietic differentiation. Cancer Cell 18: 553-567.

48 96. Ley TJ, Ding L, Walter MJ, McLellan MD, Lamprecht T, et al. (2010) DNMT3A mutations in acute myeloid leukemia. N Engl J Med 363: 2424- 2433. 97. Simon C, Chagraoui J, Krosl J, Gendron P, Wilhelm B, et al. (2012) A key role for EZH2 and associated genes in mouse and human adult T-cell acute leukemia. Genes Dev 26: 651-656. 98. Schuster-Bockler B, Lehner B (2012) Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 99. Knudson AG, Jr. (1971) Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 68: 820-823. 100. Berger AH, Knudson AG, Pandolfi PP (2011) A continuum model for tumour suppression. Nature 476: 163-169. 101. Gingeras TR (2007) Origin of phenotypes: genes and transcripts. Genome Res 17: 682-690. 102. Yu W, Gius D, Onyango P, Muldoon-Jacobs K, Karp J, et al. (2008) Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 451: 202-206. 103. Mattick JS (2009) The genetic signatures of noncoding RNAs. PLoS Genet 5: e1000459. 104. Nielsen H (2011) The transcriptional landscape. Methods Mol Biol 703: 3-14. 105. Taft RJ, Simons C, Nahkuri S, Oey H, Korbie DJ, et al. (2010) Nuclear- localized tiny RNAs are associated with transcription initiation and splice sites in metazoans. Nat Struct Mol Biol 17: 1030-1034. 106. Mattick JS, Amaral PP, Dinger ME, Mercer TR, Mehler MF (2009) RNA regulation of epigenetic processes. Bioessays 31: 51-59. 107. Buhler M, Moazed D (2007) Transcription and RNAi in heterochromatic gene silencing. Nat Struct Mol Biol 14: 1041-1048. 108. Faghihi MA, Wahlestedt C (2009) Regulatory roles of natural antisense transcripts. Nat Rev Mol Cell Biol 10: 637-643. 109. Baylin SB, Jones PA (2011) A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 11: 726-734. 110. Kim DH, Villeneuve LM, Morris KV, Rossi JJ (2006) Argonaute-1 directs siRNA-mediated transcriptional gene silencing in human cells. Nat Struct Mol Biol 13: 793-797. 111. Pant PV, Tao H, Beilharz EJ, Ballinger DG, Cox DR, et al. (2006) Analysis of allelic differential expression in human white blood cells. Genome Res 16: 331-339. 112. Lo HS, Wang Z, Hu Y, Yang HH, Gere S, et al. (2003) Allelic variation in gene expression is common in the human genome. Genome Res 13: 1855-1862. 113. Knight JC (2004) Allele-specific gene expression uncovered. Trends Genet 20: 113-116. 114. Tycko B (2010) Allele-specific DNA methylation: beyond imprinting. Hum Mol Genet 19: R210-220. 115. Pastinen T (2010) Genome-wide allele-specific analysis: insights into regulatory variation. Nat Rev Genet 11: 533-538.

49 116. Milani L, Gupta M, Andersen M, Dhar S, Fryknas M, et al. (2007) Allelic imbalance in gene expression as a guide to cis-acting regulatory single nucleotide polymorphisms in cancer cells. Nucleic Acids Res 35: e34. 117. Ghotbi R, Gomez A, Milani L, Tybring G, Syvanen AC, et al. (2009) Allele- specific expression and gene methylation in the control of CYP1A2 mRNA level in human livers. J 9: 208-217. 118. Shoemaker R, Deng J, Wang W, Zhang K (2010) Allele-specific methylation is prevalent and is contributed by CpG-SNPs in the human genome. Genome Res 20: 883-889. 119. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531-537. 120. Ross ME, Zhou X, Song G, Shurtleff SA, Girtman K, et al. (2003) Classification of pediatric acute lymphoblastic leukemia by gene expression profiling. Blood 102: 2951-2959. 121. Alizadeh AA, Ross DT, Perou CM, van de Rijn M (2001) Towards a novel classification of human malignancies based on gene expression patterns. J Pathol 195: 41-52. 122. Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1: 133-143. 123. Bacher U, Kohlmann A, Haferlach T (2009) Perspectives of gene expression profiling for diagnosis and therapy in haematological malignancies. Brief Funct Genomic Proteomic 8: 184-193. 124. International-Human-Genome-Consortium; (2004) Finishing the euchromatic sequence of the human genome. Nature 431: 931-945. 125. Syvanen AC (2005) Toward genome-wide SNP genotyping. Nat Genet 37 Suppl: S5-10. 126. Mardis ER (2006) Anticipating the 1,000 dollar genome. Genome Biol 7: 112. 127. Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 10: 57-63. 128. Lipshutz RJ, Fodor SP, Gingeras TR, Lockhart DJ (1999) High density synthetic oligonucleotide arrays. Nat Genet 21: 20-24. 129. Frommer M, McDonald LE, Millar DS, Collis CM, Watt F, et al. (1992) A genomic sequencing protocol that yields a positive display of 5- methylcytosine residues in individual DNA strands. Proc Natl Acad Sci U S A 89: 1827-1831. 130. Nielsen KL, Hogh AL, Emmersen J (2006) DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res 34: e133. 131. Velculescu VE, Vogelstein B, Kinzler KW (2000) Analysing uncharted with SAGE. Trends Genet 16: 423-425. 132. t Hoen PA, Ariyurek Y, Thygesen HH, Vreugdenhil E, Vossen RH, et al. (2008) Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Res 36: e141.

50 133. Heap GA, Yang JH, Downes K, Healy BC, Hunt KA, et al. (2010) Genome- wide analysis of allelic expression imbalance in human primary cells by high-throughput transcriptome resequencing. Hum Mol Genet 19: 122-134. 134. Zhang K, Li JB, Gao Y, Egli D, Xie B, et al. (2009) Digital RNA allelotyping reveals tissue-specific and allele-specific gene expression in human. Nat Methods 6: 613-618. 135. Kiialainen A, Karlberg O, Ahlford A, Sigurdsson S, Lindblad-Toh K, et al. (2011) Performance of microarray and liquid based capture methods for target enrichment for massively parallel sequencing and SNP discovery. PLoS One 6: e16486. 136. The-R-Development-Core-Team (2010) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. 137. Smyth GK, Michaud J, Scott HS (2005) Use of within-array replicate spots for assessing differential expression in microarray experiments. Bioinformatics 21: 2067-2075. 138. Tibshirani R, Hastie T, Narasimhan B, Chu G (2002) Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99: 6567-6572. 139. Robinson MD, McCarthy DJ, Smyth GK (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139-140. 140. Shaffer JP (1995) Multiple Hypothesis Testing. Annual Reviews Psychology 46: 561-584. 141. Koscielny G, Le Texier V, Gopalakrishnan C, Kumanduri V, Riethoven JJ, et al. (2009) ASTD: The Alternative Splicing and Transcript Diversity database. Genomics 93: 213-220. 142. Levin JZ, Yassour M, Adiconis X, Nusbaum C, Thompson DA, et al. (2010) Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 7: 709-715. 143. Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, et al. (2009) Next- generation tag sequencing for cancer gene expression profiling. Genome Res 19: 1825-1835. 144. Gertz J, Varley KE, Reddy TE, Bowling KM, Pauli F, et al. (2011) Analysis of DNA methylation in a three-generation family reveals widespread genetic influence on epigenetic regulation. PLoS Genet 7: e1002228. 145. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, et al. (2008) The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320: 1344-1349. 146. Gimelbrant A, Hutchinson JN, Thompson BR, Chess A (2007) Widespread monoallelic expression on human autosomes. Science 318: 1136-1140. 147. Guttman M, Rinn JL (2012) Modular regulatory principles of large non-coding RNAs. Nature 482: 339-346. 148. Doi A, Park IH, Wen B, Murakami P, Aryee MJ, et al. (2009) Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 41: 1350-1353.

51 149. Alisch RS, Barwick BG, Chopra P, Myrick LK, Satten GA, et al. (2012) Age- associated DNA methylation in pediatric populations. Genome Res 22: 623-632. 150. Testa U (2011) Leukemia stem cells. Ann Hematol 90: 245-271. 151. Tan AC, Jimeno A, Lin SH, Wheelhouse J, Chan F, et al. (2009) Characterizing DNA methylation patterns in pancreatic cancer genome. Mol Oncol 3: 425- 438. 152. De Carvalho DD, Sharma S, You JS, Su SF, Taberlay PC, et al. (2012) DNA methylation screening identifies driver epigenetic events of cancer cell survival. Cancer Cell 21: 655-667. 153. Issa JP (2012) DNA methylation as a clinical marker in oncology. J Clin Oncol 30: 2566-2568.

52