DNA methylation marks in peripheral blood and the risk of developing mature B cell neoplasms

Nicole Wong Doo

A thesis in fulfillment of the requirements for the degree of Doctor of Philosophy

School of Population and Global Health

The University of Melbourne

2018

ORCID ID: 0000-0003-3725-3397


Abstract

Dysregulation of DNA methylation is a feature of mature B cell neoplasms (MBCN), but it is not known whether methylation changes can be detected in blood-derived DNA prior to MBCN diagnosis. In this prospective cohort study, peripheral blood was collected from healthy participants at recruitment (1990-1994). Participants who were subsequently diagnosed with MBCN (chronic lymphocytic leukaemia, B cell non-Hodgkin lymphoma and myeloma) up to 2012 were matched to the same number of controls based on age, sex, ethnicity and type of blood sample (Guthrie cards, mononuclear cells, buffy coats). DNA methylation was measured using the Infinium® HumanMethylation450 BeadChip. Peripheral blood DNA was collected from 438 matched case-control pairs, a median of 10.6 years prior to diagnosis with MBCN. A series of analytical approaches was used to evaluate whether there was a distinct methylation profile associated with MBCN. First, global methylation analysis was performed, identifying increased methylation in CpG island and promoter-associated CpGs and widespread hypomethylation. Second, conditional logistic regression was used to identify differentially methylated CpG sites (DMPs) and kernel smoothing was used to identify differentially methylated regions (DMRs). Third, differential methylation variability, considered to be a distinctive feature in cancer, was assessed. In total, 1,338 DMPs were identified, of which 90 had gain of methylation in CpG sites associated with homeobox genes and 1,248 had loss of methylation in CpG sites associated with MAPK signaling pathway genes and genes involved in chemokine signaling pathways. There were 9,857 DMRs, with a cluster of 151 DMRs located in a 3.8 kb region on 6p21.3, corresponding to the major histocompatibility locus. Differential methylation variability analysis identified 144 novel CpG sites distinctively located outside CpG islands.

Conclusion: Distinctive changes in peripheral blood DNA methylation can be detected many years prior to diagnosis with MBCN, suggesting that changes in DNA methylation are an early epigenetic event. This contributes to our understanding of the timing of methylation changes in the development of MBCN.


Declaration

This is to certify that: (i) the thesis comprises only my original work towards the PhD except where indicated, (ii) due acknowledgement has been made in the text to all other material used, (iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies, appendices and footnotes.


Preface

(i) The work contained within this thesis was performed as a collaboration with Graham G. Giles, Dallas R. English and John L. Hopper at the School of Population and Global Health, University of Melbourne and the Cancer Epidemiology and Intelligence Division, Cancer Council Victoria, who established the Melbourne Collaborative Cohort Study (MCCS); Melissa C. Southey, JiHoon E. Joo and Ee Ming Wong at the Genetic Epidemiology Laboratory, University of Melbourne, who performed the DNA methylation assay; and Enes Makalic, Daniel F. Schmidt and Chol-Hee Jung, who performed the bioinformatics analysis. The component of the work which I contributed as original research was to assist in planning the nested study design, wholly planning and designing the statistical analyses, wholly interpreting the results and writing the manuscript. The regional methylation analysis was performed by JiHoon E. Joo, according to specifications outlined by me, and I interpreted the analysis in full. The differential variability analysis was performed by Dr Pierre Antoine-Dugué, and I interpreted the results.
(ii) No portion of this thesis has been submitted for other qualifications.
(iii) No portion of this thesis was carried out prior to enrolment in the degree.
(iv) No third party editorial assistance was provided in preparation of the thesis.
(v) For the publication included in this thesis, the roles of the authors are as follows: Enes Makalic – biostatistical analysis of array data; JiHoon E. Joo – DNA methylation assay; Claire M. Vajdic – contribution to manuscript; Daniel F. Schmidt – biostatistical analysis of array data; Ee Ming Wong – DNA methylation assay; Gianluca Severi – contribution to study design and review of manuscript; Daniel J. Park – contribution to manuscript; Jessica Chung – contribution to developing bioinformatics pipeline; Laura Baglietto – contribution to study design and review of manuscript; Henry M. Prince – contribution to manuscript; John F. Seymour – contribution to manuscript; Constantine Tam – contribution to manuscript; John L. Hopper – contribution to study design; Dallas R. English – contribution to study design; Roger L. Milne – contribution to manuscript; Simon J. Harrison – contribution to manuscript; Melissa C. Southey – DNA methylation assay and contribution to study design; Graham G. Giles – contribution to study design and manuscript.
(vii) The MCCS was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553 and 504711 and by infrastructure provided by Cancer Council Victoria.


Acknowledgements

Nicholas Brennan – for your patience and understanding

Ella & Luca – you embody the forces of nature and time

Elsa & Victor Wong Doo – a lifetime of support and encouragement

Colleagues at Concord Hospital – for your unquestioning support


Table of Contents

Abstract
Declaration
Preface
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Introduction
2 Background
  2.1 Mature B cell neoplasms – Background
  2.2 DNA Methylation
  2.3 Differential methylation as a marker of cancer risk
  2.4 Measuring DNA methylation
  2.5 Measures of differential methylation
  2.6 Biological challenges in measuring DNA methylation
3 Study Design
  3.1 Melbourne Collaborative Cohort Study
  3.2 Nested Case-Control Study, participant selection
4 Methods
  4.1 DNA source and sample collection
  4.2 DNA Extraction and Bisulfite conversion
  4.3 DNA methylation measurement
  4.4 Data processing
  4.5 CpG site selection
  4.6 Assembly of Candidate Genes
5 Results
  5.1 Global DNA Methylation
  5.2 Differentially methylated positions
    Background
    Analysis
    Results
    Discussion
  5.3 Differentially methylated regions
    Background
    Analysis
    Results
    Discussion
  5.4 Differential methylation variability
    Background
    Analysis
    Results
    Discussion
6 Conclusions and Future Work
Appendices
References

List of Tables

Table 1: MBCN classification (WHO)
Table 2: Common structural chromosomal abnormalities thought to be primary events in MBCN
Table 3: Recurrent mutations in MBCN
Table 4: SNPs identified from published genome-wide association studies
Table 5: Putative tumour suppressor genes exhibiting promoter hypermethylation and reduced expression
Table 6: Summary of published studies reporting aberrant DNA methylation in CLL
Table 7: Summary of studies reporting an association between DNA methylation and prognosis
Table 8: Summary of published studies reporting aberrant DNA methylation in mantle cell lymphoma
Table 9: Summary of published studies reporting aberrant DNA methylation in follicular lymphoma
Table 10: Summary of published studies reporting an association between methylation and prognosis in follicular lymphoma
Table 11: Summary of published studies reporting aberrant DNA methylation in diffuse large B cell lymphoma
Table 12: Summary of studies reporting an association between DNA methylation and prognosis
Table 13: Summary of published studies reporting aberrant global DNA methylation in multiple myeloma
Table 14: Summary of published studies reporting aberrant DNA methylation at specific CpG sites in multiple myeloma
Table 15: Summary of studies reporting an association between DNA methylation and the clinical progression in later stages of myeloma
Table 16: Histological diagnoses and assigned tumour group
Table 17: Demographics of study population
Table 18: Significant KEGG pathways for genes containing one or more CpG sites with loss of methylation
Table 19: ANOVA comparisons of mean methylation difference in the four tumour subtypes
Table 20: ANOVA comparisons of mean methylation according to time lag between blood collection and diagnosis
Table 21: DMPs corresponding to genes described as showing aberrant methylation in literature review
Table 22: Effect of correcting conditional logistic regression analysis using different models of white blood cell content adjustment
Table 23: DMPs identified after adjustment for white blood cell content
Table 24: Low grade lymphoma histological types
Table 25: Methylation status of DMRs within genes known to be aberrantly methylated in MBCN
Table 26: DMRs containing a DMP identified following conditional logistic regression with adjustment for white blood cell content (p<1.2×10⁻⁷)
Table 27: Top DMRs ranked by different methods
Table 28: Number of DMRs with mean and maximum methylation difference 2 - 4%
Table 29: KEGG pathways associated with genes demonstrating a loss of methylation within a DMR
Table 30: DMRs found after applying a threshold of mean methylation difference of >3%
Table 31: DMRs found after applying a threshold of maximum methylation difference of >4%
Table 32: Number of DMRs according to different pcomb value thresholds
Table 33: KEGG pathway analysis of DMRs, p<1×10⁻¹⁵
Table 34: DMRs with Stouffer p<1×10⁻²⁵ (λ=1000), ranked by magnitude of maximum methylation difference
Appendix Table 1: List of candidate genes identified as mutated in MBCN (mutation prevalence >4%)
Appendix Table 2: List of genes identified as aberrantly methylated in MBCN from literature review
Appendix Table 3: Full list of DMPs
Appendix Table 4: Differentially variable probes

List of Figures

Figure 1: Flow diagram of study participants
Figure 2: Strategy for selecting CpG sites to include in the final analysis
Figure 3: Comparison of different methods of correction for multiple testing demonstrating the stringency of the Bonferroni method
Figure 4: Mean methylation difference in all DMPs
Figure 5: Differential methylation in DMP cg04771285
Figure 6: Mean methylation difference of DMPs by CpG island location
Figure 7: Comparison of models before and after adjustment for lifestyle
Figure 8: Computed white blood cell proportions by DNA source
Figure 9: B cell proportions
Figure 10: Granulocyte proportions
Figure 11: Correlation between methylation and cell content
Figure 12: Correlation between B cell content and absolute DNA methylation for DMP cg06313775
Figure 13: Correlation between B cell content and DNA methylation for DMP cg13814485
Figure 14: Correlation between granulocyte content and absolute DNA methylation level for DMP cg12699321
Figure 15: DMPs in common between the unadjusted model and the model adjusted for white blood cell content
Figure 16: Principal component analysis of all 438 case-control pairs, demonstrating the group of outliers
Figure 17: Differential methylation by tumour subtype
Figure 18: Differential methylation in the 1,338 DMPs compared across the four time-lag groups (A & B)
Figure 19: Association between proportion of CpG island content of DMR and methylation
Figure 20: Association between proportion of promoter-associated CpGs within DMR and methylation
Figure 21: Chromosomal location of DMRs showing peak in Chm 6p21.3
Figure 22: DMRs in 6
Figure 23: Association between properties of DMRs and concordance with methylation findings in literature
Figure 24: No relationship between magnitude of methylation difference and Stouffer p value
Figure 25: Comparison of DMRs identified using λ=1000 and λ=500
Figure 26: Differential methylation variability
Figure 27: Differentially variable positions demonstrated by chromosomal location
Figure 28: Plot of variance ratio and differential methylation

List of Abbreviations

ABC = activated B-cell
AID = activation-induced cytidine deaminase
AZA = azathioprine
BCR = B-cell receptor
BMI = body mass index
BTK = Bruton's tyrosine kinase
CHARM = comprehensive high-throughput arrays for relative methylation
CI = confidence interval
CLL = chronic lymphocytic leukaemia
COBRA = combined bisulfite restriction analysis
CpG = CpG dinucleotide
DLBCL = diffuse large B-cell lymphoma
DMP = differentially methylated position
DMR = differentially methylated region
DNA = deoxyribonucleic acid
DNMT = DNA methyltransferase
DVP = differentially variable position
ES = embryonic stem
FDR = false discovery rate
GCB = germinal centre B-cell
GWAS = genome-wide association studies
HELP = HpaII tiny fragment enrichment by ligation-mediated PCR
HIV = human immunodeficiency virus
HNSCC = head and neck squamous cell carcinoma
HR = hazard ratio
ICD-O-3 = International Classification of Diseases for Oncology, 3rd edition
Ig = immunoglobulin
IGH = immunoglobulin heavy chain
IGHV-M = mutated immunoglobulin heavy chain gene
IGHV-UM = unmutated immunoglobulin heavy chain gene
IL-7 = interleukin 7
KEGG = Kyoto Encyclopedia of Genes and Genomes
LPL = lymphoplasmacytic lymphoma
MALT = mucosal-associated lymphoid tissue
MAPK = mitogen-activated protein kinase
MBCN = mature B-cell neoplasm
MBL = monoclonal B-cell lymphocytosis
MCAM = methylated CpG island amplification microarray
MCCS = Melbourne Collaborative Cohort Study
MCL = mantle cell lymphoma
MGUS = monoclonal gammopathy of undetermined significance
MIRA = methylated CpG island recovery assay
MM = multiple myeloma
MSP = methylation-specific PCR
MVP = methylation variable position
NF-kB = nuclear factor kappa-light-chain-enhancer of activated B cells
NFAT = nuclear factor of activated T cells
NHL = non-Hodgkin lymphoma
OR = odds ratio
PCR = polymerase chain reaction
PRC2 = polycomb repressive complex 2
RAF = relative allele frequency
RNA = ribonucleic acid
RR = risk ratio
RRBS = reduced representation bisulfite sequencing
SEER = Surveillance, Epidemiology and End Results
SLL = small lymphocytic lymphoma
SNP = single nucleotide polymorphism
SWAN = subset-quantile within array normalisation
SYK = spleen tyrosine kinase
TSG = tumour suppressor gene
USA = United States of America
VMR = variably methylated region
WBC = white blood cell
WHO = World Health Organisation
WM = Waldenström's macroglobulinaemia

1 Introduction

Mature B cell neoplasms (MBCN) are a group of haematological malignancies classified histologically according to the World Health Organisation into chronic lymphocytic leukaemia (CLL), non-Hodgkin lymphoma (NHL), including follicular lymphoma and diffuse large B cell lymphoma (DLBCL), and multiple myeloma (MM) (1). They account for about 6% of all cancers diagnosed in Australia (2) and the majority of haematological malignancies. Although overall survival from MBCN has improved over the last decade, they continue to be a significant source of morbidity, and in Australia they cause over 25,000 years of life to be lost prematurely each year (3).

The underlying aetiology of MBCN remains largely unexplained. While there is evidence for both environmental and genetic risk factors, the magnitude of any individual risk factor identified thus far is small, suggesting that other, as yet unidentified, risks may be in play. The pattern of familial risk in MBCN suggests that heritable risk factors are shared across different MBCN types. For instance, a family history of monoclonal gammopathy of undetermined significance (MGUS), a precursor of MM, is associated with increased risk of CLL (4).

Genetic events such as structural chromosomal changes and gene mutations are thought to occur both as initial genetic ‘hits’ in the pathogenesis of MBCN and later during disease progression. In B-NHL and CLL the most common locations of recurrent genetic events are in the immunoglobulin heavy chain (IGH) locus on chromosome 14, and in BCL2 (B cell lymphoma 2), MYC and NOTCH genes (5-7). In MM, primary or driver genetic events are found in the IGH locus and in the BCL6 and MYC genes (8). While some MBCN are virtually defined by a pathognomonic mutation, such as BRAF in hairy cell leukaemia and MYD88 in Waldenström’s macroglobulinaemia, the majority of MBCN are not characterised by a single genetic mutation and carry a diverse range of mutations occurring at low frequencies. Epidemiological studies examining the environmental and familial risk factors for MBCN have found modest associations (4, 9, 10). A number of genome-wide association studies have identified single nucleotide genetic changes associated with MBCN, revealing that a number of relatively common genetic variations may each contribute a small risk of MBCN (11-33).

Thus far, traditional epidemiological and genetic epidemiological studies have not established major risks associated with developing MBCN. A focus on epigenetic pathways in solid cancers and in haematological malignancies has thus evolved with our increasing understanding of the relationship between epigenetic mechanisms and gene transcription. A host of epigenetic pathways are dysregulated in MBCN, including DNA methylation (34-38), histone modification, chromatin remodeling and microRNAs (39). This thesis focuses on DNA methylation, the addition of a methyl group to DNA at CpG dinucleotides. In many instances the effect of CpG methylation is silencing of downstream transcription and, together with histone modification, this is considered to be a major mechanism of gene regulation (40). DNA methylation is observed to change with increasing age and under the influence of environmental exposures such as high dietary folate intake and smoking, mediated by reversible enzymatic processes that can add or remove the methyl group. Methylation therefore represents an appealing potential link between environmental exposures and gene regulation in the pathogenesis of MBCN. Pharmacological agents that affect methylation are incorporated into treatment of MBCN in some instances, raising the possibility that novel methylation findings could contribute to clinically relevant discoveries in MBCN. Two distinct patterns, widespread DNA hypomethylation and promoter hypermethylation, are recognised features of MBCN, but it is not yet clear whether these are early or late epigenetic events, nor whether they drive malignant transformation or are passenger events. The traditional view of DNA methylation in MBCN is that it is a late and possibly secondary event (8). If methylation changes were identified in blood samples many years prior to diagnosis, it would support an alternative hypothesis that methylation is in fact an early step in MBCN pathogenesis. To date, only one study has evaluated methylation in pre-diagnostic peripheral blood from CLL subjects (41), and none has evaluated it in other MBCN types.

Identification of novel epigenetic risk factors has the potential to lead to new methods of early diagnosis through ‘liquid biopsies’ or minimally invasive methods for monitoring disease activity. There is also the potential of uncovering novel mechanisms of disease that may become future therapeutic targets. The available technology for the evaluation of DNA methylation has improved in recent years with the advent of high-throughput array-based assays which measure methylation at single CpG sites. Until recently, the cost of measuring the methylation status of large areas of the genome was prohibitive. Using modern high-throughput DNA methylation assays, a ‘methylome’ has now been described for CLL, some NHL types and MM (42-47). This technology is well suited to population-based studies evaluating DNA methylation levels in peripheral blood.

Hypothesis: DNA methylation changes present in peripheral blood could be detectable years before diagnosis with MBCN.

Aims:
• To describe global DNA changes associated with MBCN
• To describe CpG site-specific changes in methylation associated with MBCN
• To use exploratory methods of identifying differential methylation including measures of differential regional methylation and methylation variance
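The site-specific and variance aims can be made concrete with a small numerical sketch. The Python below is illustrative only and is not the analysis pipeline used in this thesis: the function name `paired_summary`, the example β values and the choice to express the variance ratio as case variance divided by control variance are all assumptions introduced for illustration. It summarises a single CpG site across matched case-control pairs by the mean within-pair methylation difference and by a ratio of sample variances.

```python
from statistics import mean, variance


def paired_summary(case_beta, control_beta):
    """Summarise one CpG site across matched case-control pairs.

    Both arguments are methylation beta values (between 0 and 1),
    listed in the same matched order. Names are hypothetical.
    """
    if len(case_beta) != len(control_beta):
        raise ValueError("cases and controls must be matched one-to-one")
    # Within-pair difference: case minus its matched control.
    diffs = [case - control for case, control in zip(case_beta, control_beta)]
    return {
        # Negative values indicate lower methylation in cases.
        "mean_difference": mean(diffs),
        # Assumed convention: case variance over control variance.
        "variance_ratio": variance(case_beta) / variance(control_beta),
    }


# Hypothetical beta values for three matched pairs at a single CpG site.
summary = paired_summary([0.50, 0.40, 0.30], [0.60, 0.50, 0.40])
```

In this toy example every case is 0.10 less methylated than its matched control, so the mean difference is about -0.10, while the spread of values is the same in both groups, giving a variance ratio of about 1. A site could instead show no mean difference but a variance ratio well away from 1, which is the situation the variability analysis is intended to capture.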

2 Background

2.1 Mature B cell neoplasms – Background

The process of B cell development, including the normal signaling pathways and genetic events, is described as background to the genetic mutations and epigenetic events in MBCN.

Normal B cell development

Normal B cells arise from the pluripotent haematopoietic stem cell precursor in the bone marrow and undergo maturation initially in the marrow and then in the lymph nodes. The primary physiological role of B lymphocytes is to recognise and react to an infinite number of potential antigens that may be encountered during a lifetime. This is achieved by developing a broad range of B cell clones through multiple genetic events called recombination, somatic hypermutation and class or isotype switching. This process, by which each clone expresses a unique B-cell receptor, allows B cells to fulfill their immunological role but puts them at risk of acquiring DNA errors.

The first event in recombination takes place in precursor B cells in the bone marrow where they undergo rearrangement of the V, D and J segments of the immunoglobulin (Ig) heavy chain gene and rearrangement of the light chain gene that induce double-stranded DNA (dsDNA) breaks.

The immunoglobulin heavy chain (IGH) locus is located on the long arm of chromosome 14 at 14q32. The segments undergoing recombination cover a very large region; for instance, VH and DH segments can be separated by up to 2.5 Mb, suggesting that a process of locus contraction and expansion occurs in order to bring the IGH segments in contact with each other. Recombination is carried out by the lymphoid-specific recombination-activating gene 1 (RAG1) and RAG2 (48). Initially, the method for recombination across such large regions of the genome was poorly understood. One of the generally accepted models is that recombination begins with the IGH locus in its usual expanded conformation after RAG expression is initiated. The process continues under the influence of early B cell factor gene (EBF) upregulation, which promotes contraction of the IGH locus under Pax5 transcription factor influence (48). The RAG complex brings together two segments of the IGH locus and induces a double-stranded break in a random location. Subsequently, terminal deoxynucleotidyl transferase inserts nucleotides at the D-J and V-DJ junctions, thereby increasing the diversity of the antigen-receptor repertoire (49). Other essential factors for Ig rearrangement include interleukin-7 (IL-7) receptor signaling and phosphoinositide 3-kinase (PI3K) signaling, which regulate proliferation and survival of the pre-B cells. Recombination is a process of allelic exclusion, where the final B-cell receptor is expressed from only one allele at the heavy chain locus (one IGH allele) and light chain locus (one IgK or IgL allele). If recombination results in a non-functional B-cell receptor, the B cell undergoes apoptosis.

The resultant immature B cells migrate from the bone marrow to the periphery – either lymph nodes or spleen. Upon encounter with antigen they are activated and migrate to the germinal centre of a lymph node. Within the germinal centre, the IGH locus undergoes further modification known as somatic hypermutation during which additional mutations are promoted under the influence of activation-induced deaminase (AID).

A final AID-induced process of IGH modification called class switch recombination occurs in order to produce different immunoglobulin isotypes (from IgM/IgD to IgG or IgA) (49). These isotypes promote ability to respond to different antigenic stimuli and generate varied cellular and humoral immune response. AID induces double strand DNA breaks which are repaired by the mismatch repair pathway. However, during this repair process they can be mistakenly joined to double strand breaks occurring elsewhere in the genome. This can result in the recurrent chromosomal translocations characteristic of some MBCN such as those described in Table 2.

It is not only the B cells themselves that mediate the maturation process. Cellular interactions between T follicular helper cells, follicular dendritic cells and B cells within the germinal centre of the lymph node are thought to be essential for somatic hypermutation and isotype switching (49). The resulting activated B cells further differentiate into either memory B cells or plasma cells.

Regulation of B cell proliferation

B cell proliferation and differentiation are linked to changes in the signaling roles of the B cell surface receptor. Early B cells express a pre-B cell receptor that favours B cell proliferation under the influence of interleukin-7 (IL-7). The surface membrane component of the mature B cell receptor is composed of CD79a and CD79b proteins. The role of the nearby surface immunoglobulins, as described above, is to recognise a broad repertoire of antigens and so facilitate the appropriate immune response. The B cell receptor components aggregate with cell surface immunoglobulin, and subsequent antigen binding to the surface immunoglobulin leads to a conformational change in the B cell receptor and downstream pro-proliferative signaling. The enzymes spleen tyrosine kinase (SYK) and Bruton’s tyrosine kinase (BTK) are essential intermediate components linking surface proteins to more distal signaling via B cell linker (50). This cytoplasmic cell signaling cascade orchestrates the activation of three transcription factors that each drive pro-survival and proliferative effects: nuclear factor of activated T cells (NFAT), nuclear factor kappa-light-chain-enhancer of activated B cells (NF-kB) and activator protein 1 (AP1) (51). NFAT and NF-kB signaling is initiated by SYK-mediated BTK phosphorylation. In parallel, the activated B cell receptor activates MEK/ERK signaling, which also leads to DNA transcription.

Definition of MBCN

MBCN are classified histologically in the current World Health Organisation (WHO) Classification of Tumours of Haematopoietic and Lymphoid Tissues (2008) (1) as neoplasms arising from a B cell precursor after the stage of pro-B cell maturation (Table 1). The malignant cells of CLL and small lymphocytic lymphoma (SLL) are indistinguishable by histological and molecular phenotype and are differentiated clinically, with CLL occurring predominantly in the blood and bone marrow, while SLL occurs predominantly in lymph nodes.

Table 1: MBCN classification (WHO) (1)

Non-Hodgkin lymphoma (NHL)
- Aggressive / high grade NHL: Diffuse large B cell lymphoma, Burkitt lymphoma, primary mediastinal large B-cell lymphoma, intravascular large B-cell lymphoma, plasmablastic lymphoma
- Indolent / low grade NHL: Follicular NHL, small lymphocytic lymphoma, extranodal marginal zone lymphoma of mucosa-associated lymphoid tissue, nodal marginal zone lymphoma, lymphoplasmacytic lymphoma (or Waldenström’s macroglobulinaemia), mantle cell lymphoma
Chronic lymphocytic leukaemia
Hairy cell leukaemia
Plasma cell dyscrasias
- Multiple myeloma, plasmacytoma

Molecular pathogenesis of MBCN

A description of the known molecular pathways in MBCN follows with, where possible, their proven or putative functional role, as background to the possible biological relevance of putative methylation changes.

I) Structural chromosomal abnormalities in MBCN

Balanced translocations, occurring when there is large-scale transfer of DNA from one chromosome to another, are a feature of B-NHL. Reciprocal translocations frequently involve the IGH gene and are thought to have an oncogenic role, whereby IGH regulatory elements are placed next to an oncogene partner, putting the oncogene under the influence of an IGH enhancer. The putative mechanism for the frequency of involvement of IGH in chromosomal translocations is loss of the usual DNA repair mechanisms required during normal immunoglobulin recombination, somatic hypermutation or isotype switching. Many partner genes have been identified, with some MBCN subtypes featuring a particular partner gene translocation as a characteristic abnormality. Translocations involving IGH occur early in pre/pro-B cells when the partner gene is BCL2 or CCND1 and later in germinal centre B cells where the partner gene is BCL6. Other chromosomal translocations occurring in germinal centre B cells occur between IGH and PAX5, IRF4, FOXP1, IRF8, EBF1 and TNFRSF13 (52).

In CLL/SLL, the timing of the initial genetic event that initiates malignant transformation is not clearly understood, but cases can be divided into a type arising from a less mature pre-B cell with an unmutated IGH locus (IGHV-UM) and a type arising from a more mature post-germinal centre B cell with a mutated IGH locus (IGHV-M) (6). Deletions of 13q14 occur in 50-60% of cases and are often the sole cytogenetic abnormality, suggesting this is an early genetic event. Other deletions include 11q22-23 in 20%, with associated loss of ATM, and 17p13 in 10%, with TP53 tumour suppressor gene inactivation. An extra copy of chromosome 12 is detected in 15% of cases. Recurrent balanced translocations are extremely rare in CLL/SLL, with t(14;18)(q32;q21) occurring in only 2% of cases, all of IGHV-M type.

Chromosomal translocations in MM include t(4;14), t(11;14) and t(14;16) (Table 2). Duplication of multiple chromosomes resulting in chromosomal hyperdiploidy occurs in 57% of MM. These translocations and hyperdiploid abnormalities are thought to be primary genetic events in MM as they are detectable in the MM precursor condition monoclonal gammopathy of undetermined significance (MGUS). Chromosomal abnormalities believed to be secondary events include chromosomal gains (1q, 12p, 17q), translocations (t(8;14) and other non-IGH translocations) and deletions (1p in 30% [CDKN2C, FAF1, FAM46C], 6q in 33%, 8p in 25%, 11q in 7% [BIRC2 and BIRC3], 13 in 45% [RB1 and DIS3], 14q in 38% [TRAF3], 16q in 35% [CYLD and WWOX] and 17p in 8% [TP53]).

Table 2: Common structural chromosomal abnormalities thought to be primary events in MBCN

Chm translocation | Partner genes | Stage of B cell development | MBCN type | Mutation frequency
t(14;18)(q32;q21) | IGH, BCL2 | Pre/pro-B cell | Follicular lymphoma | 85%
 | | | GCB-DLBCL | 30%
 | | | CLL | 2%
 | | | MM | Infrequent
t(11;14)(q12;q32) | CCND1, IGH | Pre/pro-B cell | Mantle cell lymphoma | Ubiquitous
t(8;14)(q24;q32) | MYC, IGH | Germinal centre | Burkitt lymphoma | Ubiquitous
 | IGH, BCL6 | Germinal centre | DLBCL |
t(4;14) | FGFR3 & MMSET, IGH | | MM | 11%
t(11;14) | CCND1, IGH | | MM | 14%
t(14;16) | MAF, IGH | | MM | 3%
t(14;20) | IGH, MAFB | | MM | 1.5%

GCB = germinal centre B cell, DLBCL = diffuse large B-cell lymphoma

II) Mutations in MBCN

The mutation landscape in MBCN is highly heterogeneous (6, 8, 52). In CLL, a handful of mutations are present at frequencies of 10-15% while a large number of genes are mutated at lower frequency (2-5%). Quesada et al recently reported 60 recurrent mutations identified in 105 CLL samples after whole exome sequencing, with a median of 45 mutations per case (7). Whole exome sequencing in multiple myeloma reveals multiple detectable mutations present at diagnosis in 36 genes associated with the mitogen-activated protein kinase (MAPK) cell proliferation pathway, and in genes of likely pathogenetic interest, including KRAS- and/or NRAS-activating mutations and BRAF mutations (53).

Table 3: Recurrent mutations in MBCN

| MBCN type | Gene mutation |
| Follicular NHL | FAS mutations |
| Follicular NHL | TNFRSF14 |
| Follicular NHL | BCL2 activation as a consequence of IGH-BCL2 with apoptosis resistance |
| Follicular NHL | KMT2D (MLL2) resulting in decreased histone methylation |
| Follicular NHL | CREBBP |
| Follicular NHL | EP300 |
| Follicular NHL | Gain of function MEF2B mutations leading to BCL6 activation |
| Follicular NHL | HIST1H1(B-E): aberrant chromatin compaction |
| Follicular NHL | ARID1A mutation: nucleosome remodeling |
| Follicular NHL | TNFSF13C: encodes for BAFF (B cell activating factor), mutation leads to aberrant B cell signaling cascade |
| Germinal centre type DLBCL | BCL2 activation due to IGH-BCL2 translocation and apoptosis resistance |
| Germinal centre type DLBCL | BCL6 autoregulatory domain mutations |
| Germinal centre type DLBCL | Epigenetic: MEF2B/CREBBP/EP300 |
| Germinal centre type DLBCL | FOXO1 mutations |
| Germinal centre type DLBCL | EZH2 mutations and increased H3K27me3 |
| Germinal centre type DLBCL | PTEN loss resulting in MYC up-regulation and constitutive PI3K/Akt signaling |
| Germinal centre type DLBCL | GNA13/S1PR2/RHOA |
| Activated B-cell type DLBCL | MYD88 (L265P) and TNFAIP3 mutations resulting in NF-kB activation |
| Activated B-cell type DLBCL | CARD11, CD79A, CD79B mutations resulting in chronic active BCR signaling |
| Activated B-cell type DLBCL | BCL6 dysregulation |
| Activated B-cell type DLBCL | PRDM1/Blimp1 mutations/deletions |
| Burkitt lymphoma | MYC, CCND3 (activating) and DDX3X (inactivating): enhanced cellular proliferation and abnormal cell cycle control |
| Burkitt lymphoma | Activating TCF3 and inactivating ID3 promoting BCR signaling |
| Burkitt lymphoma | Inactivating ARID1A and SMARCA4 mutations leading to abnormal chromatin remodeling |
| Marginal zone lymphoma | Activating mutations of MYD88/CARD11 and inactivating mutations or deletions of KLF2, TNFAIP3, BIRC3, TRAF3, IKBKB leading to enhanced NF-kB activation |
| Marginal zone lymphoma | Activating NOTCH2 mutations leading to constitutive NOTCH activation. Inactivating mutations of NOTCH repressors SPEN and DTX1 |
| Marginal zone lymphoma | KMT2D (MLL2), EP300 and ARID1A mutations leading to epigenetic abnormalities |
| Mantle cell lymphoma | CCND1 activating mutations due to IGH-CCND1 translocation leading to cell cycle dysregulation |
| Mantle cell lymphoma | Inactivation of RB1 (cell cycle) |
| Mantle cell lymphoma | Bi-allelic loss/dysfunction of ATM leading to loss of DNA damage repair |
| Mantle cell lymphoma | TP53 mutations |
| Mantle cell lymphoma | NOTCH1 and NOTCH2 mutations |
| Mantle cell lymphoma | WHSC1, KMT2D (MLL2) and MEF2B mutations leading to epigenetic dysregulation |
| CLL | Cell cycle and apoptosis: BRAF, PTPN11, KRAS, CCND2, ANKHD1, BAX, NRAS, CDKN1B, CDKN2A. Loss of RB1 can occur due to del13q14 |
| CLL | NOTCH signaling: NOTCH1 (mutations in coding region in 10-12% at diagnosis, generally in IGHV-UM. 40% will also carry trisomy 12. Less commonly, mutations in the 3' UTR region of NOTCH1 leading to aberrant splicing events, found in 3%), FBXW7 (inactivating mutations found in this gene coding for a ubiquitin ligase), SPEN |
| CLL | RNA processing: SF3B1 (10-15%), DDX3X, MBA, ZNF292, XPO1, CNOT3, NXF1, RPS15, LUC7L2, SKIV2L2 |
| CLL | DNA damage: ATM (9%, inactivation can occur due to del11q22-23 in addition to mutation of the remaining allele), TP53 (15%, can occur due to 17p deletion or inactivating mutations), POT1 |
| CLL | Chromatin remodeling and transcription: CHD2, ZMYM3, SYNE1, MED12, FUBP1, SETD1A, ASXL1, ATRX, ARID1A, MED1, SETD2, BAZ2A, POLR3B, CREBBP, MLL2, HIST1H1B |
| CLL | BCR, NF-kB, TLR and B cell activation: PAX5, MYD88 (10%), KLHL6, EGR2, BIRC3 (deletion can occur in del11q), BCOR, NFKBIE, IRF4, IKZF3, TRAF3, TLR2, CD79A, NKAP, CD79B, IRAK1. As part of del13q14: DLEU2, the microRNA cluster MIR15A-MIR16-1 (putative role in regulation of BCL-2 expression), the DLEU1 lncRNA gene and sometimes DLEU7, which is a putative negative regulator of the NF-kB transcriptional complex |
| CLL | Tumour suppressor: LRP1B (4.8%) |
| Myeloma | Cell cycle abnormality: CDKN2C, RB1 (3%), CCND1 (3%), CDKN2A |
| Myeloma | Proliferation: NRAS (21%), KRAS (28%), BRAF (5%), MYC (1%) |
| Myeloma | Resistance to apoptosis: PI3K, AKT, MCL-1 |
| Myeloma | NF-kB: TRAF3 (3%), CYLD (3%) |
| Myeloma | Abnormal localization and bone disease: DKK1, FRZB, DNAH5 (8%) |
| Myeloma | Abnormal plasma cell differentiation: XBP1 (3%), BLIMP1 (6%), IRF4 (5%) |
| Myeloma | Abnormal DNA repair: TP53 (6%), MRE11A (1%), PARP1 |
| Myeloma | RNA editing: DIS3, FAM46C, LRRK2 |
| Myeloma | KDM6A (UTX) (10%), MLL (1%), MMSET (8%), HOXA9, KDM6B |
| Myeloma | Recurrent chromosomal deletions resulting in gene deletion |

Epidemiology of MBCN

Non-Hodgkin lymphoma is slightly more common in males than females and more common in white than black populations. There is a lower incidence of follicular lymphoma in China and Japan (49). A number of studies have reported increased incidence of MGUS in Africans and African-Americans among healthy participants screened for the presence of a serum paraprotein. An approximate three-fold increase in MGUS prevalence has been noted in African-Americans (54, 55). An American study using Surveillance, Epidemiology and End Results (SEER) data of 5,798 African-American and 28,939 white MM cases found a two-fold increased incidence of MM in African-Americans as well as a significantly lower age at diagnosis of MM for African-Americans compared with whites (65.8 compared with 69.8 years) (56).

Some autoimmune diseases and infections are strongly associated with B-NHL. A large international pooled case-control study of 17,471 NHL cases and 23,096 controls conducted by the International Lymphoma Epidemiology Consortium (Interlymph) investigated the association between autoimmune disease and subsequent development of NHL (9). Extensive demographic, environmental and clinical information was available. Autoimmune diseases classified as ‘B-cell-activating’ (such as Sjögren’s syndrome and systemic lupus erythematosus) were associated with increased risk of marginal zone lymphoma (OR=5.46), Waldenström’s macroglobulinaemia (OR=2.61) and DLBCL (OR=2.45). Hepatitis C infection was associated with a 3.05-fold risk for Burkitt lymphoma, 3.04-fold for marginal zone lymphoma, 2.70-fold for Waldenström’s and 2.33-fold for DLBCL. There is no association between Sjögren’s syndrome or systemic lupus erythematosus and MM risk (57). Further support for the importance of immune function in the pathogenesis of lymphoma comes from the strong association between two specific instances of acquired immunodeficiency and the development of NHL. Infection with Human Immunodeficiency Virus (HIV) and subsequent Acquired Immunodeficiency Syndrome (AIDS) is associated with substantially increased risk of DLBCL, Burkitt lymphoma and primary central nervous system lymphoma (26). The HIV-related immune deficiency is considered to be the most significant factor in the pathogenesis of lymphoma; however, other factors such as infection with oncogenic viruses (Epstein Barr virus and Kaposi sarcoma herpes virus), chronic antigen stimulation, abnormal cytokine production and a possible lymphomagenic role of HIV itself are cofactors (58).
A second demonstration of the importance of the immune system in the development of NHL is the phenomenon of post-transplant lymphoproliferative disorders, a group of B-NHL that occurs in the setting of severe acquired immunodeficiency following immune suppression for solid or haematopoietic stem cell transplantation (1).

With respect to infection in the aetiology of NHL, specific infections are implicated in B-NHL: Helicobacter pylori infection is associated with gastric marginal zone lymphoma (59) and Epstein Barr Virus is strongly associated with Burkitt lymphoma and other B-NHLs (49).

In the past, due to a reliance on smaller case-control studies, evidence for environmental exposures as risk factors for MBCN has been inconsistent and weak. The international Interlymph consortium, combining a number of case-control studies in a pooled analysis, reported that associations between lifestyle factors or occupation and B-NHL were weaker than those for medical history or family history (9). There was an inverse association between alcohol consumption and B-NHL, which has been observed in previous studies. Increased duration of smoking was associated with a modest, statistically significant increased risk for some B-NHL types (a 1.5-fold increased risk for Waldenström’s, 1.27-fold for marginal zone lymphoma, 1.19-fold for follicular lymphoma and 1.24-fold for mantle cell lymphoma). There was no association observed between smoking and DLBCL or Burkitt lymphoma risk. There was an observed association between recreational sun exposure and a reduced risk of NHL (OR=0.74 per increasing quartile of hours of sun exposure per week), which is in keeping with the literature suggesting low vitamin D levels are associated with risk of NHL (60-62).

The only occupations associated with increased risk of NHL in the Interlymph study were those of painter and farm worker. Work as a painter was associated with an increased risk of Burkitt lymphoma (OR=2.28), and occupation as a general farm worker was associated with a modest increased risk of all NHL (OR=1.28). Occupation as a teacher was associated with a reduced risk of Waldenström’s, marginal zone lymphoma and Burkitt lymphoma. Higher socioeconomic status was associated with a reduced risk of NHL (OR=0.88 per increasing tertile of socioeconomic category). Other putative NHL risk factors that were evaluated without evidence of association were exposure to hair dye and hormonal / reproductive factors.

There is strong evidence from a number of studies for the association between obesity and MBCN risk. In the Interlymph consortium study, there was an association between adult body mass index (BMI) and DLBCL (OR=1.32 per each increasing WHO category of BMI) but not for other subtypes. In comparison, there was an association between body mass index as a young adult and NHL with no evidence of heterogeneity between NHL subtypes (OR=1.95 per increasing category of BMI). Meta-analyses comparing individuals of normal weight with those overweight or obese report the risk of MM and NHL to be significantly increased (63, 64). An increase in BMI of 5 kg/m2 was associated with a modest increase in overall NHL risk (RR 1.07, 95%CI 1.04-1.10) and in DLBCL risk (RR 1.13, 95%CI 1.02-1.13) (64). Analysis of SEER data also suggests obesity (BMI ≥30 kg/m2) is associated with shorter overall survival from NHL, compared with non-obese (HR 1.32, 1.02-1.70) (65).
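As an illustrative aside, a relative risk reported per fixed increment of BMI can be rescaled to a different increment if a log-linear dose-response relationship is assumed. The sketch below makes that assumption explicit; the function name is hypothetical, and the log-linear form is an assumption made here rather than something stated by the cited meta-analysis.

```python
import math


def rescale_relative_risk(rr: float, from_increment: float, to_increment: float) -> float:
    """Rescale a per-increment relative risk to another increment.

    Assumes log(RR) is linear in the exposure (a modelling assumption,
    not a result reported in the source).
    """
    return math.exp(math.log(rr) * to_increment / from_increment)


# RR for overall NHL per 5 kg/m2 increase in BMI, as quoted in the text (64)
rr_per_5 = 1.07

# Implied RR for a 10 kg/m2 difference under the log-linear assumption
rr_per_10 = rescale_relative_risk(rr_per_5, from_increment=5, to_increment=10)
print(round(rr_per_10, 3))  # 1.145
```

Under this assumption a 10 kg/m2 difference corresponds to roughly a 14% increase in risk, which remains a modest effect.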

For MM, lifestyle and environmental factors associated with risk have been investigated using case-control studies, prospective cohort studies and meta-analyses. Environmental factors associated with increased risk include smoking (66) and occupation (including farming, longer term exposure to pesticides and occupational exposure to benzene or petroleum products) (57). A recent meta-analysis reports alcohol consumption to have a small protective effect (pooled RR 0.88, 95% CI: 0.79, 0.99) (57). Obesity also appears to be positively associated with risk of MM (57). BMI assessed in a prospective cohort study demonstrated that increasing BMI was associated with a modest increased risk of MM both at study entry (HR = 1.10, 95% CI: 1.00, 1.22 per 5 kg/m2) and at age 50 years (HR = 1.14, 95% CI: 1.02, 1.28) (67). A meta-analysis of 16 prospective studies of obesity and NHL risk confirmed a small but statistically significant increased risk of NHL with each 5 kg/m2 increment (RR 1.07, 95% CI 1.04-1.10).

Overall, there is a lack of appropriate prospective studies with which to validate the findings of retrospective case-control studies.

Familial predisposition to MBCN

Twin studies evaluate the concordance of a disease in monozygotic twins, who share all genes, and dizygotic twins, who share a proportion of genes. If concordance is higher in monozygotic twins, it provides evidence for a genetic component. A large study of leukaemia in 44,788 twins from Scandinavia found an excess in monozygotic twins, yielding an estimated heritability of 20% (68). While all leukaemias were included in this study, the results are considered to be largely attributable to CLL as there is minimal evidence for familial clustering in the other leukaemias: acute lymphoblastic leukaemia or acute myeloid leukaemia.
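The logic of the twin design can be illustrated with Falconer's classical approximation, in which heritability is estimated as twice the difference between the monozygotic and dizygotic twin correlations. This is a simplified sketch only: the correlations below are hypothetical values chosen for illustration, and the cited study may have used a different (for example, model-based) estimation method.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz and r_dz are twin-pair correlations in liability for
    monozygotic and dizygotic twins respectively.
    """
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, not taken from the cited twin study
h2 = falconer_heritability(r_mz=0.40, r_dz=0.25)
print(round(h2, 2))  # 0.3
```

A larger gap between monozygotic and dizygotic correlations implies a larger heritable component; equal correlations would imply that shared environment, rather than shared genes, explains the familial resemblance.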

Familial clustering of lymphoid haematological malignancies has been noted by numerous case reports and small population-based case-control studies (69, 70) demonstrating that individuals within a family are at increased risk of developing different subtypes of MBCN. In the large Interlymph study, the prevalence of a family history of NHL in cases was compared with that of controls. For those with a first-degree relative with NHL, there was a 1.8-fold increased risk of NHL compared with those with no family history (9). The study controlled for an extensive range of putative environmental and medical risk factors for NHL, leading the authors to suggest that the association between family history and increased NHL risk is due to shared genetics rather than shared environmental factors (24).

More recent analysis of large family cancer registry data has enabled an estimation of the risk of lymphoid malignancy when a first-degree relative is affected; the larger numbers in such studies and unselected case identification minimise the biases to which small case-control studies are prone. A large population-based case-control study using cancer registry data from Sweden and Denmark evaluated 8,974 first-degree relatives of 2,517 DLBCL cases and 10,188 relatives of 2,668 follicular NHL cases for the prevalence of NHL compared with that in relatives of matched controls. First-degree relatives of DLBCL cases had a 9.8-fold increased risk of DLBCL and first-degree relatives of follicular NHL cases had a 4-fold increased risk of follicular NHL (71). In 2009, Landgren et al (37) reported a large population-based case-control study of 4,458 MGUS cases and 14,621 first-degree relatives (compared with 17,505 controls and 58,387 first-degree relatives of controls). Relatives of MGUS patients had a 4.0-fold (95% CI 1.5-11.0) increased risk of Waldenström’s, while relatives of IgM MGUS patients had a 5.0-fold (95% CI 1.1-23) increased risk of CLL (4). A case-control study of Waldenström’s published in 2008 demonstrated 3.0-fold (95% CI 2.0-4.4), 3.4-fold (95% CI 1.7-6.6) and 5.0-fold (95% CI 1.3-18.9) increased relative risks of developing non-Hodgkin lymphoma, CLL and MGUS, respectively (72). These two landmark studies add further evidence of shared heritable susceptibility pathways that predispose to MGUS, Waldenström’s and other MBCNs.

For MM, the largest population-based study evaluated 14,621 relatives of 4,458 MGUS cases compared with relatives of matched controls, finding that a family history of MGUS was associated with a 2.8-fold risk of developing MGUS (4). Results have been confirmed by other studies in populations of European ancestry (73) and also by smaller studies in African-American populations (74). A large population-based study using cancer registry-identified cases pooled from USA, Canada and Europe evaluated 2,843 MM cases and 11,470 controls. The presence of a first-degree family history of any lympho-haematopoietic malignancy was positively associated with the risk of MM (OR 1.29, 95% CI: 1.08-1.55) (75). MM risk was positively associated with having a first-degree relative with MM (OR 1.90, 95% CI: 1.27-2.88). There were differences by ethnicity; for African-American participants, there was a very strong association between MM risk and having a first-degree relative with MM (OR 5.52, 95% CI: 1.87-16.28).
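For readers less familiar with how odds ratios of this kind are derived, the following is a minimal sketch of an unadjusted odds ratio with a Woolf (log-scale) 95% confidence interval from a 2x2 table of family history by case status. The counts are hypothetical, and the published estimates above are likely to come from adjusted models rather than this simple calculation.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table using the Woolf method."""
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR) is the square root of the summed reciprocal counts
    se_log_or = math.sqrt(sum(1.0 / n for n in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 40 of 1,000 cases and 20 of 1,000 controls
# report an affected first-degree relative
or_, lower, upper = odds_ratio_with_ci(40, 960, 20, 980)
print(f"OR {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The wide interval reported for African-American participants above (1.87-16.28) is what this calculation produces when one or more cells of the table contain few individuals.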

Genetic risk and MBCN

Genome-wide association studies (GWAS) survey variants across the whole genome, typically using genotyping arrays, to identify single nucleotide variants of relatively common population frequency (>5%) that serve as genetic markers associated with disease.

Overall, 81 SNPs have been identified from GWAS of MBCN and are summarised in the table below. Few candidate gene loci have been replicated in separate studies. As the loci are common by definition (relative allele frequency [RAF] >5%) and have small effect sizes, the findings of GWAS support the hypothesis that multiple genetic modifications and non-genetic risk factors are necessary for NHL genesis. There are clusters of SNPs at 6p21.32 (an immune regulatory region known as the human leucocyte antigen [HLA] system) and 8q24 (near the MYC gene), with other SNPs scattered across the genome. Targeted studies of genes encoding the B cell activating factor protein BAFF also suggest an association between mutations in TNFRSF13C/BAFF-R and TNFSF13B and CLL risk (76).

Table 4: SNPs identified from published genome-wide association studies

| Disease | SNP | Chm.loc | RAF | Nearest gene | p (combined) | OR (combined) | Refs |
| CLL | rs17483466 | 2q13 | 0.21 | ACOXL, BCL2L11 | 2.36E-10 | 1.39 | (11, 12) |
| CLL | rs13397985 | 2q37.1 | 0.19 | SP140, SP110 | 5.40E-10 | 1.41 | (11, 12, 13) |
| CLL | rs3770745 | 2p22.2 | 0.22 | QPCT, PRKD3 | 1.68E-08 | 1.24 | (15) |
| CLL | rs13401811 | 2q13 | 0.81 | ACOXL, BCL2L11 | 2.08E-18 | 1.41 | (15) |
| CLL | rs9308731 | 2q13 | 0.541 | BCL2L11 | 1.00E-11 | 1.19 | (14) |
| CLL | rs3769825 | 2q33.1 | 0.45 | CASP10/CASP8 | 2.50E-09 | 1.22 | (15) |
| CLL | rs757978 | 2q37.3 | | FARP2 | 2.11E-09 | 1.39 | (12, 16) |
| CLL | rs9880772 | 3p24.1 | 0.465 | EOMES | 2.55E-11 | 1.17 | (14) |
| CLL | rs10936599 | 3q26.2 | | MYNN | 1.74E-10 | 1.26 | (17) |
| CLL | rs9815073 | 3q28 | 0.651 | LPP | 3.62E-08 | 1.2 | (14) |
| CLL | rs898518 | 4q25 | 0.59 | LEF1 | 4.24E-10 | 1.2 | (15) |
| CLL | rs6858698 | 4q26 | | CAMK2D | 3.07E-10 | 1.31 | (17) |
| CLL | rs872071 | 6p25.3 | 0.54 | IRF4 | 1.91E-20 | 1.54 | (11, 12, 13) |
| CLL | rs210134 | 6p21.33 | | BAK1 | 9.47E-16 | 1.35 | (18, 19) |
| CLL | rs210142 | 6p21.33 | | BAK1 | 1.03E-12 | 1.47 | (18, 19) |
| CLL | rs73718779 | 6p25.2 | 0.11 | SERPINB6 | 1.97E-08 | 1.27 | (15) |
| CLL | rs2236256 | 6q25.2 | | IPCEF1 | 1.50E-10 | 1.23 | (17) |
| CLL | rs17246404 | 7q31.33 | | POT1 | 3.40E-08 | 1.22 | (17) |
| CLL | rs2456449 | 8q24.21 | | - | 7.84E-10 | 1.26 | (12, 16) |
| CLL | rs1679013 | 9p21.3 | 0.52 | CDKN2B-AS1 | 1.27E-08 | 1.11 | (15) |
| CLL | rs4406737 | 10q23.31 | 0.57 | ACTA2/FAS | 1.22E-14 | 1.27 | (15) |
| CLL | rs735665 | 11q24.1 | 0.21 | GRAMD1V | 3.78E-12 | 1.45 | (11, 20) |
| CLL | rs7944004 | 11p15.5 | 0.49 | C11orf21, TSPAN32 | 2.15E-10 | 1.27 | (14) |
| CLL | rs7176508 | 15q23 | 0.37 | - | 4.54E-12 | 1.37 | (11) |
| CLL | rs8024033 | 15q15.1 | 0.51 | BMF | 2.71E-10 | 1.22 | (15) |
| CLL | rs7169431 | 15q21.3 | | - | 4.74E-07 | 1.36 | (16) |
| CLL | rs783540 | 15q25.2 | | CPEB1 | 1.10E-07 | 1.17 | (13) |
| CLL | rs305077 | 16q24.1 | 0.37 | IRF8 | 3.37E-08 | 0.66 | (19) |
| CLL | rs391525 | 16q24.1 | 0.37 | IRF8 | 3.16E-09 | 0.64 | (19) |
| CLL | rs2292982 | 16q24.1 | 0.37 | IRF8 | 6.48E-09 | 0.65 | (19) |
| CLL | rs2292980 | 16q24.1 | 0.37 | IRF8 | 1.89E-08 | 0.68 | (19) |
| CLL | rs305061 | 16q24.1 | | IRF8 | 3.60E-07 | 1.22 | (16) |
| CLL | rs4368253 | 18q21.32 | 0.69 | PMAIP1 | 2.51E-08 | 1.18 | (15) |
| CLL | rs4987852 | 18q21.33 | 0.06 | BCL2 | 7.76E-11 | 1.41 | (15) |
| CLL | rs4987855 | 18q21.33 | 0.91 | BCL2 | 2.66E-12 | 1.47 | (15) |
| CLL | rs11083846 | 19q13.32 | 0.22 | PRKD2, STRN4 | 3.96E-09 | 1.35 | (11) |
| DLBCL | rs79480871 | 2p23.3 | 0.076 | NCOA1 | 4.23E-08 | 1.35 | (21) |
| DLBCL | rs10484561 | 6p21.32 | | HLA-DQB1, HLA-DQA1 | 1.00E-07 | 1.36 | (22) |
| DLBCL | rs2647012 | 6p21.32 | | HLA-DQB1 | | 1.28 | |
| DLBCL | rs2523607 | 6p21.33 | 0.12 | HLA-B | 2.40E-10 | 1.45 | (21, 23) |
| DLBCL | rs872071 | 6p25.3 | | IRF4 | | 1.2 | |
| DLBCL | rs116446171 | 6p25.3 | 0.019 | EXOC2 | 2.33E-21 | 2.26 | (21, 23) |
| DLBCL | rs13255292 | 8q24.21 | 0.321 | PVT1 | 1.15E-13 | 1.19 | (21, 23) |
| DLBCL | rs4733601 | 8q24.21 | 0.477 | PVT1 | 3.63E-11 | 1.45 | (21) |
| DLBCL | rs7097 | 13q12 | | LNX2 | 6.57E-07 | 1.42 | (24) |
| DLBCL | rs751837 | 14q32 | | CDC42BPB | 3.30E-07 | 3.5 | (24) |
| Follicular | rs2647012 | 6p21.32 | | - | 2.00E-21 | 0.64 | (22) |
| Follicular | rs4530903 | 6p21.32 | | HLA-DRB5, HLA-DQA1 | 2.69E-12 | 1.93 | (25) |
| Follicular | rs7755224 | 6p21.32 | | - | 5.18E-08 | 1.95 | (20) |
| Follicular | rs10484561 | 6p21.32 | | HLA-DQB1, HLA-DQA1 | 5.18E-08 | 2.07 | (20, 22) |
| Follicular | rs2647046 | 6p21.32 | | HLA-DQB1, HLA-DQA2 | 3.77E-10 | 0.59 | (25) |
| Follicular | rs6457327 | 6p21.33 | | STG | 4.00E-11 | 0.59 | (26) |
| Follicular | rs9268853 | 6p23 | | - | 2.48E-10 | 1.56 | (25) |
| Mixed NHL | rs6773854 | 3q27 | | - | 3.36E-13 | 1.44 | (27) |
| Mixed NHL | rs12289961 | 11q12.0 | | - | 3.89E-08 | 1.29 | |
| Myeloma | rs1052501 | 3p22.1 | 0.2 | ULK4 | 7.47E-09 | 1.32 | (28, 29) |
| Myeloma | rs10936599 | 3q26.2 | 0.75 | MYNN | 8.70E-14 | 1.26 | (29, 30) |
| Myeloma | rs56219066 | 5q31 | 0.73 | ELL2 | 2.20E-10 | 1.24 | (29, 31) |
| Myeloma | rs2285803 | 6p21.33 | 0.32 | PSORS1C1 | 9.67E-11 | 1.19 | (29, 30) |
| Myeloma | rs34229995 | 6p22.3 | 0.029 | JARID2 | 1.30E-08 | 1.37 | (29) |
| Myeloma | rs9372120 | 6q21 | 0.218 | ATG5 | 9.09E-15 | 1.18 | (29) |
| Myeloma | rs4487645 | 7p15.3 | 0.71 | DNAH1 | 3.33E-15 | 1.38 | (28, 32) |
| Myeloma | rs7781265 | 7q36.1 | 0.125 | SMARCD3 | 9.71E-09 | 1.19 | (29) |
| Myeloma | rs1948915 | 8q24.21 | 0.345 | CCAT1 | 4.20E-11 | 1.13 | (29) |
| Myeloma | rs2811710 | 9p21.3 | 0.657 | CDKN2A | 1.72E-13 | 1.15 | (29) |
| Myeloma | rs2790457 | 10p12.1 | 0.739 | WAC | 1.77E-08 | 1.12 | (29) |
| Myeloma | rs7193541 | 16q23.1 | 0.585 | RFWD3 | 5.00E-12 | 1.13 | (29) |
| Myeloma | rs4273077 | 17p11.2 | 0.12 | TNFRSF13B | 7.67E-09 | 1.26 | (29, 30) |
| Myeloma | rs6066835 | 20q13.13 | 0.083 | PREX1 | 1.36E-13 | 1.26 | (29) |
| Myeloma | rs877529 | 22q13.1 | 0.44 | - | 7.63E-16 | 1.23 | (29, 30) |
| Myeloma | rs603965 | 11q13.3 | | CCND1 | 7.96E-11 | 1.82 | (29, 33) |

SNP = single nucleotide polymorphism, RAF = relative allele frequency
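Not every entry in Table 4 reaches the conventional genome-wide significance threshold. The sketch below checks a handful of p-values copied from the table against a threshold of 5E-08; that threshold is a widely used convention assumed here for illustration and is not stated in the text, and the variable names are hypothetical.

```python
# Conventional genome-wide significance threshold (an assumption, not from the text)
GENOME_WIDE_THRESHOLD = 5e-8

# (disease, SNP, combined p-value) for a subset of rows taken from Table 4
table4_subset = [
    ("CLL", "rs872071", 1.91e-20),
    ("CLL", "rs7169431", 4.74e-07),
    ("CLL", "rs783540", 1.10e-07),
    ("DLBCL", "rs116446171", 2.33e-21),
    ("DLBCL", "rs7097", 6.57e-07),
    ("Myeloma", "rs9372120", 9.09e-15),
]


def is_genome_wide_significant(p_value: float, threshold: float = GENOME_WIDE_THRESHOLD) -> bool:
    """True when the p-value is below the chosen threshold."""
    return p_value < threshold


below_threshold = [snp for _, snp, p in table4_subset if is_genome_wide_significant(p)]
suggestive_only = [snp for _, snp, p in table4_subset if not is_genome_wide_significant(p)]
print(suggestive_only)  # ['rs7169431', 'rs783540', 'rs7097']
```

Associations that fall short of the threshold are usually described as suggestive and carry more weight once replicated in an independent study.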

2.2 DNA Methylation

Epigenetic mechanisms

The term ‘epigenetics’ was coined in the 1950s, meaning ‘over’ or ‘upon’ genetics. The modern description of epigenetics refers to mitotically and/or meiotically heritable changes in gene expression without alteration to the DNA sequence (39). Epigenetic modifications include DNA methylation, histone modification, chromatin remodeling and noncoding RNAs (39).

Regulation of DNA Methylation in normal epigenome

DNA methylation involves the addition of a methyl group at the 5 position of the cytosine ring after DNA replication at positions where cytosine is adjacent to guanine, known as a CG dinucleotide or CpG. The genomic location of CpGs and their methylation patterns provide insight into the regulation and function of CpG methylation. The majority of the genome is observed to be CpG-poor, containing only approximately 21% of the CpG dinucleotides that are statistically expected. A CpG-poor genome is common amongst vertebrates and is explained by the observation that methylated cytosine has a propensity to be deaminated by 5-methylcytosine deaminase, forming thymine in its place (40).
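The notion of observed versus statistically expected CpG content can be made concrete with a short calculation. The sketch below uses a commonly used definition of the observed/expected ratio, (number of CpG x sequence length) / (number of C x number of G); this formula and the function name are assumptions introduced for illustration, as the text does not specify how the 21% figure was derived.

```python
def cpg_observed_expected(sequence: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count assumes C and G occur independently:
    expected = (count_C * count_G) / length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = sequence.upper()
    count_c = seq.count("C")
    count_g = seq.count("G")
    count_cpg = seq.count("CG")  # 'CG' cannot overlap itself, so count() is exact
    if count_c == 0 or count_g == 0:
        return 0.0
    return count_cpg * len(seq) / (count_c * count_g)


print(cpg_observed_expected("CCGG"))    # 1.0  (CpG at the expected rate)
print(cpg_observed_expected("CGCGCG"))  # 2.0  (CpG-rich, island-like)
print(cpg_observed_expected("GGCCAT"))  # 0.0  (C and G present, but no CpG)
```

A ratio of about 0.21 across the bulk genome corresponds to the CpG depletion described above, whereas CpG islands have ratios much closer to, or above, the expected value.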

The majority of CpGs actually reside within CpG islands of 0.5-4kb in length in which the proportion of observed CpGs is higher than that across the rest of the genome (77). Other descriptions of CpG locations include ‘shores’, ‘shelves’ and ‘open sea’ locations, which reference the proximity of the CpG to a defined CpG island.
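The island, shore, shelf and open sea categories can be expressed as a simple distance rule. The sketch below assumes the convention used in common methylation array annotation, in which shores lie within 2 kb of an island and shelves lie 2-4 kb from an island; these cut-offs are not given in the text and are labelled as defaults that can be changed. The function name and coordinates are hypothetical.

```python
def classify_cpg(position, islands, shore_bp=2000, shelf_bp=4000):
    """Classify a CpG by its distance to the nearest CpG island.

    position: coordinate of the CpG on a chromosome.
    islands: list of (start, end) island coordinates (inclusive) on the
             same chromosome.
    shore_bp / shelf_bp: assumed distance cut-offs in base pairs.
    """
    nearest = None
    for start, end in islands:
        if start <= position <= end:
            return "island"
        distance = start - position if position < start else position - end
        nearest = distance if nearest is None else min(nearest, distance)
    if nearest is None:
        return "open sea"  # no islands annotated on this chromosome
    if nearest <= shore_bp:
        return "shore"
    if nearest <= shelf_bp:
        return "shelf"
    return "open sea"


islands = [(10000, 11000)]  # hypothetical island coordinates
for pos in (10500, 9000, 7000, 1000):
    print(pos, classify_cpg(pos, islands))
# 10500 island
# 9000 shore
# 7000 shelf
# 1000 open sea
```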

While sporadically located CpGs are generally methylated, CpGs located within CpG islands are virtually always unmethylated in normal cells. These unmethylated, CpG-rich islands are frequently associated with gene bodies and promoter-associated regions. The mechanisms that protect CpG islands from methylation are unclear, but it is known that methylation of CpG islands is possible and occurs during normal female X chromosome inactivation. Functionally, X chromosome inactivation via DNA methylation is associated with long term stabilization of a repressed state. In cancer cells, promoter-associated CpG island methylation is a common feature (78, 79).

The genomic location of CpG islands is a clue to their functional importance. Approximately 70% of gene promoters are associated with a CpG island, including housekeeping genes, tissue-specific genes and developmental regulator genes (77, 80). Conversely, about half of CpG islands occur within traditionally defined promoter regions of annotated genes (78).

DNA methylation is catalysed by DNA methyltransferases (DNMTs). The addition of a methyl group results in a change to binding characteristics of the transcription assembly that in some circumstances leads to downstream gene silencing. DNMT3A and DNMT3B target previously unmethylated CpGs, resulting in de novo methylation, while DNMT1 maintains existing methylation patterns following DNA replication. De novo methylation catalysed by DNMT3A and DNMT3B is an infrequent event occurring during X chromosome inactivation and at imprinted genes (77). The more ubiquitous mechanism of methylation is maintenance by DNMT1, which conserves DNA methylation patterns after DNA replication. There are also active processes of demethylation, the best described being mediated by the ten-eleven translocation (TET) enzymes (TET1, TET2 and TET3), which convert 5-methylcytosine to 5-hydroxymethylcytosine (81-83). The TET enzymes are thought to be important in maintaining CpG island protection from methylation. Disruption of TET protein activity is associated with aberrant methylation. Other enzymes involved in active and passive removal of methyl groups are activation-induced cytidine deaminase (AID) and thymine DNA glycosylase (TDG).
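The distinction between maintenance and de novo methylation can be illustrated with a deliberately simplified toy model of one round of DNA replication. It treats each CpG as methylated or unmethylated, copies existing marks to the new strand when a DNMT1-like maintenance activity is present, and adds marks at specified sites to represent DNMT3A/DNMT3B-like de novo activity. All names are hypothetical and the model ignores enzyme kinetics, fidelity and active demethylation.

```python
def replicate_methylation(parent, maintenance_active=True, de_novo_sites=frozenset()):
    """Return the methylation pattern of a newly synthesised strand.

    parent: list of booleans, True where the parental CpG is methylated.
    maintenance_active: if True, marks are copied from the parental strand
        (DNMT1-like maintenance); if False, the new strand inherits none.
    de_novo_sites: indices methylated regardless of the parental state
        (DNMT3A/DNMT3B-like de novo methylation).
    """
    daughter = []
    for index, parent_methylated in enumerate(parent):
        copied = parent_methylated and maintenance_active
        added = index in de_novo_sites
        daughter.append(copied or added)
    return daughter


parent = [True, False, True, False]

# Maintenance alone reproduces the parental pattern
print(replicate_methylation(parent))
# [True, False, True, False]

# Without maintenance the new strand carries no methylation (passive loss)
print(replicate_methylation(parent, maintenance_active=False))
# [False, False, False, False]

# De novo activity at site 1 adds a mark that was absent in the parent
print(replicate_methylation(parent, de_novo_sites={1}))
# [True, True, True, False]
```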

DNA Methylation and gene silencing

The settings in which DNA methylation may result in gene silencing remain unclear and a number of different mechanisms for this have been proposed.

DNA methylation changes DNA binding conformation. In this model, DNA methylation may directly inhibit transcription factors from binding to their cognate DNA sequence.

MBD-mediated transcriptional silencing. Methylated CpG sites attract the binding of proteins that recruit chromatin-modifying activities. Four such proteins, MBD1, MBD2, MBD3 and MeCP2, have been implicated in methylation-dependent transcriptional repression, but further specific mechanisms have not been shown (77).

DNA methylation targets transcriptionally inactive genes. In some instances, DNA methylation targets genes that are already transcriptionally repressed, and acts to cause irreversible silencing. Methylation of the Xist gene after it has been inactivated by non-coding chromosomal RNA is described during X chromosome inactivation (77). The active X chromosome, in contrast, is protected from DNA methylation.

DNA Methylation and Cancer

Abnormal patterns of DNA methylation, both loss of normal CpG methylation and gain of methylation in CpG islands, have been recognised as a hallmark of cancer for many years (78).

(I) Gain of methylation and associated gene silencing in cancer

In cancer cells, hypermethylation of promoter-associated CpG islands and associated gene silencing has been described in a number of putative and established tumour suppressor genes. An epigenetic hypothesis of cancer pathogenesis suggests that methylation-associated tumour suppressor gene silencing could be one of the two genetic ‘hits’ required to silence both copies of the gene (78). The large number of established or putative tumour suppressor genes silenced by promoter hypermethylation in different malignancies indicates the prevalence of this mechanism in cancer pathogenesis. The summary below was compiled from two review articles on promoter hypermethylation (78, 84) and a literature search of promoter hypermethylation with associated reduced gene expression in MBCN. The resulting gene list was cross-checked with TSGene 2.0, a database of tumour suppressor genes (85).

Table 5: Putative tumour suppressor genes exhibiting promoter hypermethylation and reduced gene expression

| Gene | Function | Tumour |
| AHR | Transcription factor | MCL (86) |
| APC | Inhibitor of β-catenin | Aerodigestive tract (84) |
| AR | Androgen receptor | Prostate, MBCN (84, 87) |
| BRCA1 | DNA repair, transcription | Breast, ovary (84) |
| CDH1 | E-cadherin cell adhesion | Breast, stomach, leukaemia, NHL (36, 78, 86, 88) |
| CDH13 | H-cadherin, cell adhesion | Breast, lung (84) |
| COX-2 | Cyclo-oxygenase 2 | Colon, stomach (84) |
| CYB5R2 | Cell proliferation inhibition | TET2-mutated DLBCL (43) |
| DAPK | Pro-apoptotic | Lymphoma, lung, colon, MM (36, 89-94) |
| DKK1 | Transcriptional target of P53 | MM (95, 96) |
| EGR1 | Transcription factor and regulator of TP53 | Follicular NHL (97) |
| ER | Oestrogen receptor | Breast (84) |
| EXT1 | Heparan sulphate synthesis | Leukaemia, skin (84) |
| FAT | Cadherin, tumour suppressor | Colon (84) |
| FOXA1 | Represses cell proliferation and migration | CLL, follicular (97, 98) |
| FOXA2/HNF3B | Transcription factor | CLL (98) |
| GATA-4 | Transcription factor | Colon, stomach (84) |
| GATA-5 | Transcription factor | Multiple types (84) |
| GPX3 | Detoxifies reactive oxidative species | MM (99) |
| GSTP1 | Conjugation to glutathione | Prostate, breast, kidney (84) |
| HIC-1 | Transcription factor | Multiple (84) |
| hMLH1 | DNA mismatch repair | Colon, endometrium, stomach (84) |
| HOXA9 | Homeobox protein | Neuroblastoma (46, 84, 100) |
| IGFBP3 | Growth factor-binding protein | Lung, skin (84) |
| IKZF1 | T cell differentiation | DLBCL |
| IRX1 | Targets anti-angiogenesis genes | CLL, gastric (84) |
| KLF4 | Cell cycle inhibitor | Follicular NHL |
| JDP2 | Transcription regulator | DLBCL |
| LKB1/STK11 | Serine/threonine kinase | Colon, breast, lung (84) |
| MGMT | DNA repair of O6-alkyl-guanine | Multiple (84) |
| MIR129-2 | Promotion of apoptosis | MM |
| MIR34B/C | A transcriptional target of TP53; silences B-cell translocation gene 4 (BTG4) | Colon, MM (84) |
| NORE1A | Ras effector homologue | Lung (84) |
| P14/ARF | MDM2 inhibitor | Colon, stomach, kidney (84) |
| P15/CDKN2B | Cyclin-dependent kinase inhibitor | Leukaemia, MM, lymphoma |
| P16/CDKN2A | Cyclin-dependent kinase inhibitor | Multiple (84) |
| P73 | P53 homologue | Lymphoma |
| PR | Progesterone receptor | Breast (84) |
| PRLR | Prolactin receptor | Breast (84) |
| RARB2 | Retinoic acid receptor β2 | Colon, lung, head and neck (84) |
| RASSF1A | Ras effector homologue | Multiple (84) |
| RB | Cell cycle inhibitor | Retinoblastoma (84) |
| RBP1 | Morphogenesis and cellular proliferation | MM |
| RIZ1 | Histone/protein methyltransferase | Breast, liver (84) |
| SFRP1 | Secreted Frizzled-related protein 1 | Colon (84) |
| SOCS-1 | Inhibitor of JAK/STAT pathway | Liver, myeloma (84) |
| SOCS-3 | Inhibitor of JAK/STAT pathway | Lung (84) |
| SOX11 | Transcription factor | CLL |
| SPARC | Haematopoiesis | MM |
| SRBC | BRCA-1 binding protein | Breast, lung (84) |
| SYK | Tyrosine kinase | Breast (84) |
| TGFBI | Cell cycle inhibitor, can also be tumour promoter | MM |
| THBS-1 | Thrombospondin-1, anti-angiogenic | Glioma (84) |
| TMS1 | Pro-apoptotic | Breast (84) |
| TPEF/HPP1 | Transmembrane protein | Colon, bladder (84) |
| VHL | Ubiquitin ligase component | Kidney, haemangioblastoma (84) |

MCL = mantle cell lymphoma

(II) Loss of methylation in cancer

Widespread DNA hypomethylation occurring across the genome is observed in a variety of cancer cells. Within colorectal tumour cell lines, hypomethylation is associated with genomic instability, suggesting that one possible mechanism of oncogenesis is the induction of chromosomal abnormalities. Experimentally, Chen et al examined DNMT1-/- embryonic stem (ES) cells after noting that disruption of DNMT1 in mice resulted in genomic demethylation and a lethal phenotype (101). The DNMT1-deficient ES cells demonstrated higher rates of gene mutations, a phenotype that was reversed when DNMT1-/- ES cells were rescued with DNMT1 cDNA. The mutations in the DNMT1-deficient cell lines were deletions and insertions, and the findings suggest that widespread loss of DNA methylation may result in chromosomal instability.

Aberrant DNA methylation in MBCN

Methylation in Chronic lymphocytic leukaemia

The CLL methylome has been extensively studied using modern array technology, following on from early studies repeatedly demonstrating promoter methylation in a number of candidate genes: DAPK1, TWIST2, ZAP70, HOXA4, SFRP1, SFRP2, SFRP4, ID4, the cyclin-related genes P16 (CDKN2A) and P15 (CDKN2B), CDH1 and DUSP22 (102-109).

The most comprehensive genome-wide analysis of DNA methylation in CLL to date was published by Kulis et al in 2012 using the Infinium 450K (42). Methylation in two CLL subgroups was compared with that of B cells from a normal donor using both the Illumina Infinium 450K array and bisulfite sequencing. In total, 139 CLL cases were studied, comparing IGH-unmutated CLL to naïve B cells and IGH-mutated CLL to memory B cells. Hypermethylated CpGs were enriched at 5’ regulatory regions, in CpG islands and in 5’ regions of introns. Hypomethylation was observed in CpGs in the gene body located outside CpG islands.
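Array-based studies such as these typically summarise methylation at each CpG as a β value, the proportion of signal arising from the methylated probe. The sketch below shows the usual form of that calculation together with the logit-transformed M-value; the offset of 100 is the commonly used default for this type of array and is an assumption here, as are the function names and the example intensities.

```python
import math


def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Proportion of signal from the methylated probe, between 0 and 1.

    The offset stabilises the ratio when both intensities are low
    (100 is an assumed conventional default).
    """
    return methylated / (methylated + unmethylated + offset)


def m_value(beta: float) -> float:
    """Logit (base 2) transform of a beta value; requires 0 < beta < 1."""
    return math.log2(beta / (1.0 - beta))


# Hypothetical probe intensities for a single CpG
beta = beta_value(methylated=900, unmethylated=0)
print(beta)          # 0.9
print(m_value(0.5))  # 0.0
```

A differential β between cases and normal B cells is then simply the difference of two such values at the same CpG, which is the quantity that some of the studies summarised in Table 6 did not report.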

A number of other groups have investigated methylation in CLL/SLL using DNA methylation assays with limited genome coverage, such as an endonuclease digestion method known as methylated CpG island amplification microarray (MCAM) (110, 111), but these are not whole genome assays and are therefore not considered further here. Results of aberrant methylation reported by single studies should be interpreted with caution as the published studies are generally based on the analysis of 30-100 CLL samples. For instance, while the gene AIRE was reported to be unmethylated by the Kanduri study, Pei et al found this gene to be hypermethylated.

In summary, CLL exhibits a range of patterns of aberrant methylation including promoter methylation and widespread non-promoter hypomethylation, although the published studies vary in sample selection, type of control used, type of methylation assay, statistical analysis and reporting, making comparisons between studies challenging. A small number of genes have been recurrently identified as having aberrant methylation: ID4, ABI3, WISP3 and SFRP1, as well as the cell cycle regulators P15/CDKN2B and P16/CDKN2A.

Table 6: Summary of published studies reporting aberrant DNA methylation in CLL

Author | Subjects / Methodology | Finding
Pei 2012 (98) | 11 CLL samples, 3 normal B cell controls. Method: RRBS (23,000 CpG islands). | 533 significant DMRs. Hypermethylation of FOXD3, FOXE1, FOXG1, RIX1, ID4, SFRP1, SLIT2, BNC1, ADCY5, EBF3, NR2F2 and DIO3. In addition, HOXD8, HOXD11, HOXC13, SOX1, SOX2, SOX4, SOX6, SOX9, SOX11. Methylation status of FOXA1, FOXA2, SOX9, SOX11 and IRX1 inversely correlated with gene expression. Hypomethylation of TCL1A, BCR, LFNG, NOTCH1, TCF7, RASGRF1 and VAV2 (genes with a known or potential role as oncogenes).
Kanduri 2010 (38) | 23 CLL samples. Separated into IGHV mutation status (mutated [M] and unmutated [U]). Method: Infinium 27K, absolute but not differential β methylation reported. | Widespread hypomethylation in all CLL samples compared to normal B cells. Methylation of VHL, SCGB2A1, ABI3, GPX2, IGSF4, SERPIND5 and ZNF540 in IGHV-U CLL and PPP1R3A and WISP3 (known TSGs) in IGHV-M CLL. Genes involved in cell proliferation and tumour progression (PLD1 and BCL10 in IGHV-M CLL and ADORA3, AIRE, CARD15, FABP7, LOC340061, PRF1, UNC5CL, ANGPT2, IFNB1, URP2, BCL2, IL19, IL17RC and S100A14) were unmethylated in IGHV-U CLL. Inverse correlation between methylation and gene regulation demonstrated for VHL, IGSF4 and ABI3. A distinct methylation profile was reported between IGHV-U and IGHV-M.
Ronchetti 2014 (112) | 37 CLL samples (purified B cells). Samples separated into IGHV mutation status. Method: Infinium 27K. | Few shared DMPs between IGHV subgroups. Only 35 DMPs in which increased DNA methylation was inversely correlated with gene expression, including the genes encoding ZNF471 and ZFP28 and the SPG20 gene.
Cahill 2013 (113) | 18 CLL samples separated into IGHV mutation status. Method: Infinium 450K, absolute but not differential β methylation reported. | DNA methylation changes over two sampling timepoints separated by 5-8 years demonstrated to be stable. Increased methylation of NOTCH1 was observed in IGHV-M CLL compared to normal B cells.
Halldorsdottir 2012 (46) | 30 CLL samples, 20 MCL samples. Method: Infinium 27K. | Methylation of apoptosis-related genes e.g. CIDEB, CYFIP2, NR4A1, PRDM2 and PTGES. ABI3 and VHL were methylated only in unmutated CLL.
Kulis 2012 (114) | 139 CLL samples compared to normal B cells, separated into IGHV mutation status. Method: Infinium 450K with confirmation by bisulfite sequencing. | IGHV-U CLL exhibited 3243 hypermethylated DMPs and 29743 hypomethylated DMPs while IGHV-M CLL had 246 hypermethylated and 4606 hypomethylated DMPs. Note: Threshold for calling differential methylation was not made clear, therefore DMPs from this analysis were not included in list of confirmed differentially-methylated MBCN genes.

RRBS = reduced representation bisulfite sequencing, TSG = tumour suppressor gene, MCL = Mantle cell lymphoma

A few studies have shown the methylation status of some genes in CLL cells at diagnosis to be associated with prognosis.

Table 7: Summary of studies reporting an association between DNA methylation and prognosis

Author | Subjects, methodology | Finding
Irving 2011 (115) | 118 CLL samples. Method: Bisulfite treatment and gel-based PCR (COBRA assay). | Methylation of CD38, HOXA4 and BTG4 as a predictor of survival
Tong 2010 (110) | 78 CLL samples. Method: Endonuclease digestion (MCAM). | Methylation status in LINE and APP associated with shorter survival
Queiros 2015 (116) | 211 CLL samples. Validation series: 97 samples. Method: Infinium 450K. | 5 CpG sites can reliably differentiate between three CLL subgroups with prognostic difference

COBRA = Combined Bisulfite Restriction Analysis, MCAM = Methylated CpG Island Amplification and Microarray

Methylation in B cell non-Hodgkin lymphomas

The following studies have assessed DNA methylation within MBCN tumour tissue.

The early studies of DNA methylation focused on methylation in cell cycle regulation genes. In 1999, Baur et al reported methylation of the P16 gene (CDKN2A) in 32% of B-NHL, most commonly in DLBCL (50%) (117). Methylation of the P15 gene (CDKN2B) was found in 64% of B-NHL, both low grade and high grade histological subtypes. Both P15 and P16 inactivation are considered to be important in haematological malignancies but there is limited evidence of mutations or homozygous deletions in lymphomas. It is thus proposed that aberrant DNA methylation is the primary mechanism of P15 and P16 silencing (118, 119).

In 2009, using MSP and bisulfite sequencing, Martin-Subero et al (120) reported promoter hypermethylation in a mixed sample of B-NHL compared with normal B cells. Methylated genes were highly enriched for targets of the polycomb repressor complex. The same group utilised the emerging array-based technology for methylation measurement (Infinium Goldengate assay) to investigate the pattern of methylation changes in a wide range of haematological neoplasms (121). Methylation in tumour tissue from 367 haematological neoplasms was compared with that from matching normal cells of the haematopoietic system: whole peripheral blood, whole bone marrow and CD34+ cells were used as controls for myeloid neoplasms, CD3-positive T cells for T-cell neoplasms and CD19-positive cells, germinal centre B cells and lymphoblastoid cell lines for B-cell neoplasms. Using unsupervised hierarchical analysis, they reported methylation patterns that differed between B cell neoplasms, T cell neoplasms and myeloid neoplasms, with greater similarity of hypermethylated genes between T and B cell neoplasms than between either of these and myeloid neoplasms. Six genes were hypomethylated in all three types of haematological neoplasms: DIO3, FZD9, HS3ST2, MOS and MYOD1. They noted higher levels of de novo DNA methylation in germinal centre B cell derived lymphomas such as DLBCL, Burkitt lymphoma and follicular lymphoma, intermediate de novo methylation in mantle cell lymphoma and MM and lower levels of methylation in CLL and marginal zone lymphoma (110).

Other genes methylated in B-NHL include: transcription factor ABF1 in follicular lymphoma, Burkitt lymphoma and DLBCL (122), DAPK1 promoter methylation in 85% follicular lymphoma and 72% mucosa-associated lymphoid tissue (MALT) lymphoma, MGMT methylation in 28% of all mature B cell tumours (123) and CDKN1C promoter methylation in 44% follicular lymphoma and 55% DLBCL (35).

Methylation in mantle cell lymphoma

Mantle cell lymphoma (MCL) is characterised by a major genetic event in which the t(11;14)(q13;q32) translocation leads to cyclin D1 gene (CCND1) dysregulation. Within mantle cell lymphoma there are heterogeneous subgroups with varied prognosis and clinico-biological behaviour, including the presence or absence of SOX11 expression. This has led to the proposition that additional epigenetic events could be critical to mantle cell lymphoma diversity (86, 124). Queiros et al described the mantle cell lymphoma methylome in detail and proposed an epigenetic model of mantle cell lymphoma in which epigenetic hits are acquired by B cells carrying the t(11;14) translocation (47). The methodology of the experiment is notable for the investigators’ attempt to apply a correction for differences in white blood cell composition within tumour samples.

Table 8: Summary of published studies reporting aberrant DNA methylation in mantle cell lymphoma

Author | Subjects, methodology | Finding
Enjuanes 2011 (86) | MCL cell lines and 38 MCL tumour samples. Method: MassArray EpiTYPER applied to 25 candidate genes. | Hypermethylation of seven TSGs: CDH1, AHR, ROBO1, SOX9, NR2F2, and NPTX2 but not CDC14B. Only SOX9, AHR and NR2F2 demonstrated inverse correlation between methylation and gene expression.
Halldorsdottir 2012 (46) | 30 CLL samples, 20 MCL samples. Method: Infinium 27K. | No significant methylation differences between MCL subgroups. Methylation of homeobox genes HOXA2, HOXA9, and HOXA13 (chromosome 7) and HLXB9, LHX1, PAX7, and POU4F1. Hypomethylation of proto-oncogene MERTK and CAMP.
Queiros 2016 (47) | 82 MCL samples, separated into SOX11 expressers / non-expressers. Method: Infinium 450K. | Two distinct methylation profiles correlating with SOX11 expression and prognosis.

Methylation in follicular lymphoma

The early application of array-based methylation technologies, e.g. by Killian et al (125), demonstrated pervasive differential methylation in follicular lymphoma. Using the smaller scale Infinium Goldengate assay on 12 follicular lymphoma samples, the authors noted differential methylation at 259 of the approximately 1,500 CpG probes in the array. The following table summarises studies in which differential methylation has been described at individual genes in follicular lymphoma.

Table 9: Summary of published studies reporting aberrant DNA methylation in follicular lymphoma

Author | Subjects, methodology | Finding
O’Riain 2009 (93) | 164 follicular lymphoma samples. Method: Infinium Goldengate assay. | Promoter hypermethylation of 133 genes including HOXA9, ONECUT2, NOTCH3, DAPK1, MYOD1, GRB10. Enrichment for gene targets of polycomb repressor complex.
Bennett 2009 (97) | Six follicular lymphoma samples. Method: MCAM validated with COBRA and bisulfite sequencing. | Widespread CpG island hypermethylation enriched for homeobox genes and targets of polycomb repressor complex. Methylation of HOXA11 and PAX6 confirmed by COBRA and sequencing. Hypermethylation with reduced gene expression in CCND1, EGR1, FOXA1, KLF4, SIM2, SOX9.
Choi 2010 (100) | 14 follicular lymphoma samples. Method: MIRA-assisted microarray analysis for enrichment of CpG islands followed by RRBS. Validation with massARRAY EpiTyper. | Promoter hypermethylation in HOXA4, HOXA9, PCDHGA-B gene cluster, PCDHA gene cluster and IRF4.

MIRA = Methylated-CpG island recovery assay, MSP = methylation-specific PCR

A small number of studies have suggested that methylation is also associated with follicular lymphoma prognosis. The study by Giachella et al reported aberrant DAPK1 methylation detectable in the peripheral blood of patients with newly diagnosed follicular lymphoma. This is the only study published to date that reports peripheral blood methylation findings for MBCN. The authors discuss the potential for a peripheral blood methylation profile to be used as a clinically relevant biomarker (90).

Table 10: Summary of published studies reporting an association between methylation and prognosis in follicular lymphoma

Author | Subjects, methodology | Finding
Alhelaly 2014 (118) | 118 follicular lymphoma samples, laser-capture microdissection. Method: MSP. | Methylation of p16/CDKN2A in 19% samples, associated with gene inactivation and poor clinical outcome.
Giachella 2014 (90) | 107 patient samples from bone marrow, blood and lymph node. Method: MethyLight PCR. | (93) promoter methylation associated with increased likelihood of relapse

Methylation in diffuse large B cell lymphoma (DLBCL)

DLBCL carries mutations in several genes related to epigenetic pathways (TET and EZH2). TET enzymes, as described above, are involved in the removal of methyl groups from CpG sites. EZH2 encodes a histone methyltransferase that is a functional component of the polycomb repressor complex 2. A number of inhibitors of EZH2 have been evaluated in phase 1 and 2 clinical trials in non-Hodgkin lymphoma with promising clinical efficacy (126-128). Further support for the dysregulation of methylation in DLBCL is an increase in protein expression of the DNA methyltransferases DNMT1, -3A and -3B observed in a proportion of cases of newly diagnosed DLBCL (129), together with a correlation between DNMT expression status and the promoter hypermethylation status of 11 investigated genes. Other specific targets of increased promoter methylation include the cell cycle regulators P15 and P16 as well as DAPK1, PCDH10 and MGMT (91, 130).

Genome-wide approaches to measuring methylation have produced an evolving description of specific gene promoter methylation, frequently within CpG islands (43, 131), as well as an observation of widespread non-promoter hypomethylation (132).

Aberrant methylation has also been tested as an independent predictor of prognosis. For example, MGMT methylation was associated with favourable prognosis (133), while a methylation profile identified using the HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) assay was associated with poor prognosis (Chambwe 2014).

In the only published study of circulating cell-free DNA, Kristensen et al examined methylation in five candidate tumour suppressor genes known to be methylated in DLBCL tumour samples. They found modestly increased methylation in DAPK1 and DBC1 and only very slightly increased methylation in MIR34A and MIR34B/C compared with controls (134). DAPK1 methylation disappeared in patients during remission but was detectable in patients with poorly responsive or relapsed disease and, therefore, could have a potential role as a biomarker. A number of studies identify MGMT methylation or a distinct generalised methylation pattern as being associated with prognosis in DLBCL (Table 12).

Table 11: Summary of published studies reporting aberrant DNA methylation in DLBCL

Author | Subjects, methodology | Finding
Pike 2008 (87) | 8 DLBCL samples divided into cell-of-origin groups GCB and ABC. Method: Screening with endonuclease digestion and sequencing, confirmation by MethyLight. | Methylation of 12 of 15 candidate CpG islands (AR, CDKN1C, DLC1, GATA4, GDNF, GRIN2B, MTHFR, MYOD1, NEUROD1, ONECUT2 and TFAP2A)
Lee 2009 (131) | 44 DLBCL samples. Method: MSP. | Hypermethylation of promoter regions of MGMT (52%), P15 (32%), P16 (55%), P57/CDKN1C (48%) and MAD2 (50%). MGMT and P57 methylation were associated with increased survival.
Shaknovich 2010 (135) | 69 DLBCL samples (min. 80% tumour purity). Divided by cell-of-origin groups GCB and ABC. Method: HELP assay. Validation: EpiTYPER® assay. | A methylation signature identifies GCB and ABC subtypes of DLBCL with a subset of 16 differentially methylated genes with inversely correlated gene expression: IKZF1, ASPHD2, PMM2, PAK1, LYAR, JDP2, FGD2, GALNS, IL12A, ARHGAP17, SORL1, KIAA0746, LANCL1, KCNK12, SOX9, CXorf57.
Asmar 2013 (44) | 100 DLBCL samples. Divided by TET2 mutation status. Method: Infinium 450K. | Compared to TET2wt, the TET2mut samples exhibited hypermethylation in 578 CpGs (315 genes) including 35 genes in which gene expression was reduced. Included in this list were some genes identified by Liu et al: CYB5R2, SDR42E1 and ZIK1.
Krajnović 2014 (91) | 51 DLBCL samples. Method: MSP. | Hypermethylation was detected in P15 in 23%, P16 in 37%, MGMT in 39% and DAPK in 55%. Methylation of P15 was associated with better prognosis.
Kristensen 2014 (92) | 119 DLBCL cases. Method: MSP. | DAPK1 methylation present in 84% DLBCL samples. Allelic DAPK1 methylation associated with poor survival
Huang 2017 (136) | 107 DLBCL cases. Method: PCDH10 promoter methylation by MSP. | PCDH10 methylation in 54% DLBCL, associated with inferior prognosis
Liu 2017 (43) | 31 DLBCL samples. Divided by TET2 mutation status. Method: Infinium 450K. | In TET2mut samples compared to TET2wt, the following genes were hypermethylated with reduced expression: CRY1, CYB5R2, DCLK2, SDR42E1, SPIB, ZIK1, ZNF134, ZNF256 and ZNF615.

GCB = germinal centre B cell type, ABC = activated B cell type

Table 12: Summary of studies reporting an association between methylation and prognosis in DLBCL

Author | Subjects, methodology | Finding
Lee 2009 (131) | | MGMT methylation associated with favourable prognosis
Wedge 2017 (132) | | Widespread hypomethylation associated with poor prognosis
Chambwe 2014 (137) | Method: HELP assay. | Methylation pattern an independent predictor of prognosis

Multiple Myeloma (MM)

A summary of the literature reporting DNA methylation aberrancy in MM tumour samples is presented below. Recurrent findings include frequent methylation of P16 and variable methylation of DAPK. Aberrant methylation of a number of tumour suppressor genes and of genes in important oncogenic pathways such as Wnt signaling is also reported.

Table 13: Summary of published studies reporting aberrant global DNA methylation in MM

Author | Subjects, methodology | Finding
Salhia 2010 (138) | 13 MGUS, 26 smouldering MM, 140 MM samples. Method: Goldengate Illumina Array. | Global hypomethylation in 97% MGUS, 90% smouldering MM and 94% MM samples compared with normal plasma cells. Of the 1500 loci, only 22 were hypermethylated.
Walker 2011 (45) | 161 MM samples from patients taking part in the UK MRC IX clinical trial. Method: Infinium 27K. | Global hypomethylation evident in the transition from MGUS to MM. Increased methylation of 82 CpG probes (77 genes) was observed. Differential methylation in subgroup with hyperdiploidy and t(4;14) translocation.
Aoki 2012 (139) | 7 MGUS, 74 MM. Method: MSP. | Hypomethylation of repetitive elements LINE-1 and Alu

Table 14: Summary of published studies reporting aberrant DNA methylation at CpG sites in MM

Author | Subjects, methodology | Finding
Galm 2004 (89) | 56 MM samples. Method: MSP of 11 candidate tumour suppressor genes. | Methylation of SOCS-1 (46%), P16 (36%), ECAD (21%), DAPK (13%), TP73 (2%).
Tatetsu 2007 (140) | 29 MM samples. Method: bisulfite sequencing. | Methylation of PU.1 regulator region observed in 35% MM samples but not in MGUS and was associated with down-regulation.
Cheng 2007 (141) | 28 MM samples. Method: bisulfite sequencing of PF4 promoter. | PF4 promoter hypermethylation in 54%, associated with no or low PF4 expression. PF4 is a putative tumour suppressor gene on 4q13.3 which is deleted in 35-50% MM.
Gonzalez-Paz 2007 (142) | 17 MGUS, 40 smouldering MM, 522 MM. Method: MSP. | Methylation of P16 present in MM (34%) more frequently than smouldering MM (24%) or MGUS (28%). Methylation was not associated with reduced gene expression.
Peng 2013 (143) | 56 subjects, recently diagnosed MM. | 66% samples showed ADAMTS9 promoter methylation and gene silencing
Yuregir 2009 (94) | 20 newly diagnosed MM. Method: MSP. | Methylation in P16 (10%), MGMT (40%), DAPK (10%), ECAD (45%).
Jost 2009 (144) | 3 MGUS, 66 MM, 7 plasma cell leukaemia (both at diagnosis and relapse). Method: MSP. | Methylation of Frizzled-related proteins. Methylation of SFRP1, -2 and -5 associated with gene silencing.
Stanganelli 2010 (145) | 21 MGUS, 44 newly diagnosed MM samples. Method: MSP. | Methylation status of tumour suppressor genes: Methylation of SOCS-1 (52%), TP73 (45%), ARF (29%), P15 (32%), P16 (7%).
Braggio 2010 (36) | 68 newly diagnosed MM. Method: MSP. | Methylation status of nine tumour suppressor genes: Promoter hypermethylation in CDH1 (50%), P16 (43%), P15 (16%), SHP1 (15%), ER (13%), BNIP3 (13%), RARb (12%), DAPK (6%), MGMT (0%).
Wong 2013 (146) | 40 MGUS, 95 MM (newly diagnosed), 29 MM at relapse/progression. Method: MSP. | 46% newly diagnosed MM, 41% relapsed MM: mir129-2 methylation and reduced expression. Methylation and expression changes reversible following exposure to azacitidine
Jung 2012 (147) | 193 MM samples. Method: Goldengate Methylation Cancer Panel I, confirmed with bisulfite sequencing. | IGF1R and IL17RB-associated CpG probes were methylation and genes expressed. P16 and DLC1-associated CpG probes were methylated and genes expressed.
Hatzimichael 2012 (148) | 40 MM samples. Method: MSP. | Bcl2-interacting killer (BIK) gene methylated in 40% cases at diagnosis. BIK methylation associated with evolution to relapsed / refractory disease
Chim 2007 (95) | 50 MM samples. Method: MSP. | Methylation in genes associated with Wnt signaling pathway: 42% samples demonstrated methylation of at least one of seven genes (WIF1, DKK3, APC, SFRP1, SFRP2, SFRP4, SFRP5)
Kocemba 2012 (96) | 7 MGUS, 41 newly diagnosed MM samples. Method: MSP and bisulfite sequencing. | Methylation of DKK1 promoter region associated with reduced expression. Restoration of DKK1 expression noted after exposure to azacitidine.
Wong 2011 (149) | 95 MM at diagnosis, 23 MM relapse. Method: MSP. | MIR34B/C methylation and reduced expression in 5.3% MM at diagnosis and 52% at relapse/progression. MIR34B/C is a direct transcriptional target of TP53

MSP = methylation-specific PCR

Methylation patterns may be associated with the progression of disease from MGUS to MM and may also be associated with MM prognosis.

Table 15: Summary of studies reporting an association between DNA methylation and the clinical progression in later stages of myeloma

Author | Subjects, methodology | Finding
Guillerm 2003 (37) | 61 newly diagnosed MM. Method: MSP. | P16 methylation associated with poor overall survival
Galm 2004 (89) | 56 MM samples. Method: MSP of 11 candidate tumour suppressor genes. | P16 methylation associated with poor survival
Chim 2007 (150) | 19 MGUS, 32 MM samples. Method: MSP. | Methylation status of tumour suppressor genes. Methylation of ECAD (56%), SHP1 (84%) and p16 (53%) was observed in MM but not MGUS.
Park 2011 (151) | 100 newly diagnosed MM samples. Method: MSP. | P16 methylation an independent predictor for overall survival
Kim 2013 (152) | 103 newly diagnosed MM. Method: MSP. | P16 methylation present in 38% but not associated with survival
De Carvalho 2009 (88) | 51 MM samples. Method: MSP. | Aberrant methylation in CDKN2B in 90%, CDH1 in 88%, ESR1 in 73%, HIC1 in 73%, CCND2 in 63%, DCC in 45% and TGFBR2 in 39%. Methylation of DCC and TGFBR2 was associated with poor prognosis.
De Larrea 2013 (153) | 75 MM relapsed after bortezomib-based regimen. Method: PCR following digestion with methylation-sensitive restriction enzymes. | Hypomethylation in promoter of CXCR4 and NFKB1 associated with increased survival. CXCR4 important in cell adhesion.
Kaiser 2013 (99) | 161 newly diagnosed MM samples from patients in UK MRC IX clinical trial. Method: Infinium 27K. | Methylation of GPX3, RBP1, SPARC, TGFBI was associated with reduced gene expression and shorter overall survival. These are putative tumour suppressor genes.

Environmental exposures and DNA methylation

The putative associations between environmental exposures and changes in DNA methylation, in particular where exposures are also associated with MBCN risk, are summarised below. Limited data are available to directly support a link between environmental exposures and methylation in the aetiology of MBCN but, given the known interaction between methylation and environmental exposures, these factors should ideally be identified and controlled for in large scale methylation studies.

Obesity/Diet

In animals, obesity induced by a high-fat diet results in global DNA hypermethylation (154), while in humans, DNA methylation of the specific obesity-related genes for leptin and adiponectin is associated with weight. An association between extreme caloric restriction and DNA methylation was proposed from data gathered in the Dutch Hunger Winter studies. There appeared to be an increased risk of colon cancer together with differentially methylated regions associated with prenatal exposure to famine (155).

Vitamin D

Epidemiological studies generally suggest that increased exposure to ultraviolet light is associated with a reduced risk of NHL (156) while low vitamin D levels are associated with shorter time to treatment in previously untreated CLL (60) as well as reduced overall survival from CLL and DLBCL (157). Epigenetic modification is a possible mechanism whereby vitamin D insufficiency may be related to MBCN risk: in cancer cell lines, the vitamin D receptor gene (VDR) is hypermethylated, with resultant reduced VDR expression. This effect can be reversed by exposure to azacitidine with subsequent VDR hypomethylation and increased expression of the receptor (158).

Tobacco

Smoking is modestly associated with follicular NHL risk, most prominently for females exposed to passive smoking (RR 2.02, 1.06-3.87) (159). Cigarette smoke exposure affects DNA methylation, with aberrant promoter hypermethylation of the CDKN2A/P16INK4 gene demonstrated in the bronchial epithelial cells of smokers but not non-smokers (160). No association between tobacco smoke exposure and methylation is known in MBCN.

Folic Acid

Specific dietary deficiencies are highly likely to be of importance to DNA methylation, particularly deficiencies of dietary sources of methyl groups including folate, methionine, betaine, serine and choline (161). Manipulation of dietary folate has been shown to affect DNA methylation. Women who were administered a low-folate diet (56μg/day) developed global DNA hypomethylation which was reversed on re-introduction of dietary folate at 516μg/day (162). This is the only publication thus far reporting an effect of dietary folate levels on methylation in humans. The association of MTHFR (methylenetetrahydrofolate reductase) gene polymorphisms with NHL risk is additional evidence of the importance of the folate metabolism pathway to NHL (163). Dietary folic acid has an established role in the formation of thymidylate for DNA synthesis as well as being a co-factor for vitamin B12 synthesis. Recommended daily intakes generally refer to the folic acid levels required for adequate red blood cell synthesis. These levels may not be the same as those required for DNA methylation. The Australian National Health and Medical Research Council Nutrient Reference Values publish a Recommended Daily Intake of 400μg/day and an Estimated Average Requirement (daily level estimated to meet the requirements of half the healthy individuals) of 320μg/day (164).

Age

Age-related changes in DNA methylation are well described, both at a global level and at specific CpG sites. Further to this, the methylation-based measures of biological ageing are themselves associated with increased cancer risk (165). Any study wishing to relate DNA methylation to cancer risk should, therefore, match for age in order to remove this potential confounder.

Epigenetic therapy in MBCN treatment

Hypomethylating agents are widely available and used to treat a number of haematological malignancies, particularly 5-azacitidine (AZA) in myelodysplasia (166). Their use for MBCN is being tested in phase 1 clinical trials. Pre-clinical data are encouraging, demonstrating that AZA treatment results in hypomethylation of MIR34B, with resultant re-expression of the gene, inhibition of cellular proliferation and enhancement of apoptosis (167).

2.3 Differential methylation as a marker of cancer risk

The hypothesis that non-mutation genetic variation may be a marker or determinant of cancer risk follows on from classical epidemiology studies evaluating familial and environmental risk factors as well as genetic epidemiology studies evaluating genetic variation as a risk factor for cancer. Non-mutation genetic variation encompasses the epigenetic processes of DNA methylation, histone modification and microRNAs. Epigenome-wide equivalents of GWASs were proposed as far back as 2008 (168). However, the term epigenome-wide association study is a misnomer given that the multitude of different epigenetic mechanisms cannot be encompassed in a single assay. DNA methylation is the most extensively investigated epigenetic mechanism as a risk factor for cancer due to the advances in high-throughput methylation assays. As a consequence there has been a rapid increase in ‘genome-wide’ methylation studies of solid cancers (168).

In a 2012 meta-analysis of publications reporting genome-wide methylation and cancer risk, 23 studies were included (169). Notably, only eight were performed in prospective cohorts and in six of the remaining 15 retrospective studies, subjects may even have received cancer treatment prior to blood sampling, which is highly likely to have affected measured methylation. These studies utilised older methods of DNA methylation analysis, which could only report on methylation across large regions such as repetitive elements (LINE-1, Alu) or in small pre-defined genomic regions. A disadvantage of using a retrospective case-control study for the evaluation of DNA methylation is the inability to determine whether the epigenetic variation is due to disease-associated differences or post-disease processes (168). The relative ease with which a methyl group is added or removed by methyltransferases and demethylases means that the impact of treatments such as chemotherapy is likely to be greater in studies of DNA methylation than in studies assessing genetic risk.

Studies measuring quantitative CpG site specific methylation have been performed in incident case-control studies for breast, bladder, gastric and small cell lung cancers and colorectal adenoma, suggesting an association between peripheral blood-derived DNA methylation and cancer (170-180). A prospective, population-based study of breast cancer risk using the Infinium 450K also found an association between DNA methylation and breast cancer that was more marked for cases with shorter time between blood collection and cancer diagnosis (176). In all these studies the absolute difference in DNA methylation between cases and controls is reported to be very small, in the order of 0.5% to 3%. This contrasts with studies comparing methylation in tumour and normal tissue in which differences in methylation of 20% are considered the threshold for defining differential methylation. Given the small differences in methylation anticipated, studies of peripheral blood methylation require adequate statistical power and robust technical reliability in order to reduce the risk of reporting false positive results.
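The consequence of such small effect sizes for study size can be illustrated with a rough calculation. The sketch below is not the power calculation used in any of the cited studies: it applies the standard normal-approximation sample size formula for a paired comparison, and the standard deviation of the within-pair methylation difference (0.05) and the Bonferroni-style significance threshold are hypothetical values chosen purely for illustration.

```python
import math
from statistics import NormalDist


def pairs_needed(delta, sd_diff, alpha=0.05, power=0.8):
    """Approximate number of matched pairs needed to detect a mean
    within-pair methylation difference `delta` (two-sided test,
    normal approximation). `sd_diff` is the assumed standard deviation
    of the within-pair differences (a hypothetical input)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


if __name__ == "__main__":
    sd = 0.05  # assumed SD of within-pair differences (illustrative only)
    for delta in (0.005, 0.01, 0.03, 0.20):
        nominal = pairs_needed(delta, sd)
        # Bonferroni-style threshold for roughly 485,000 CpG sites
        genome_wide = pairs_needed(delta, sd, alpha=0.05 / 485_000)
        print(f"delta={delta:.3f}: {nominal} pairs (alpha=0.05), "
              f"{genome_wide} pairs (genome-wide alpha)")
```

Under these assumptions the required number of pairs grows with the inverse square of the expected difference, which is why a difference of 1% demands far more samples than the 20% threshold used in tumour versus normal comparisons.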

The first published study to report aberrant methylation in blood as a marker for MBCN risk is presented in Results Chapter 5.1 (181). There has been one additional study examining DNA methylation changes in pre-diagnostic blood samples in 347 individuals from a prospective cohort, 28 of whom were diagnosed with CLL 2.0 to 15.7 years after enrolment (41). Using the Infinium 450K Beadchip, 722 differentially methylated (DM) CpG sites were identified, of which 73.4% overlapped with DM sites identified in a study of CLL cells (42).

2.4 Measuring DNA methylation

Comparison of genome-scale DNA methylation Assays

(A) DNA pre-treatment

Most DNA methylation assays rely on treatment of DNA before amplification or hybridization. The main approaches are endonuclease digestion, affinity enrichment and bisulfite conversion (182, 183).

I. Endonuclease (enzyme) digestion

Enzyme digestion uses the properties of methylation-sensitive restriction enzymes which are inhibited by 5-methylcytosine (5mC) so that the patterns of digestion by such enzymes reflect DNA methylation (182). HpaII and SmaI are the most widely used restriction enzymes, inhibited by the presence of 5mC at a CpG site. Other endonucleases such as McrBC cleave at methylated sites. Endonuclease digestion produces fragments of either only methylated or only unmethylated DNA, and is followed by polymerase chain reaction (PCR) across the restriction site. Its strength is high sensitivity and its major weakness is false positive results due to digestion for reasons other than DNA methylation (182). Different techniques exist to couple the enzymatic methods to array-based analysis. One modification is to use the methylation-dependent endonuclease McrBC followed by comprehensive high-throughput arrays for relative methylation (CHARM). Another method involving HpaII restriction enzymes uses PCR to amplify the restriction fragments followed by array hybridization in a method known as HpaII tiny fragment enrichment by ligation-mediated PCR (HELP). Following on from array hybridization technology, next-generation sequencing can be used to analyse the output of restriction enzyme procedures. Sequencing has been used to analyse the output of the HELP assay, while sequencing of HpaII or MspI digests is known as Methyl-seq.
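The principle of methylation-sensitive digestion can be sketched in a few lines of code. The example below is a simplified in silico illustration, not a model of any published assay: it assumes a single DNA strand, the shared CCGG recognition site of HpaII and MspI, cleavage after the first cytosine, and that HpaII (but not MspI) is blocked when the internal cytosine is methylated. The function name and the toy sequence are hypothetical.

```python
def digest(seq, methylated_positions, enzyme="HpaII"):
    """Cut `seq` at CCGG sites and return the resulting fragments.

    `methylated_positions` is a set of 0-based indexes of methylated
    cytosines. HpaII is treated as blocked when the internal cytosine
    of CCGG is methylated; MspI is treated as insensitive to it.
    """
    fragments = []
    start = 0
    pos = seq.find("CCGG")
    while pos != -1:
        internal_c = pos + 1  # second C of the site (the CpG cytosine)
        blocked = enzyme == "HpaII" and internal_c in methylated_positions
        if not blocked:
            cut = pos + 1  # cleavage between C and CGG
            fragments.append(seq[start:cut])
            start = cut
        pos = seq.find("CCGG", pos + 1)
    fragments.append(seq[start:])
    return fragments


if __name__ == "__main__":
    toy = "AACCGGTTTTCCGGAA"  # two CCGG sites, internal Cs at 3 and 11
    print(digest(toy, set(), "HpaII"))  # both sites unmethylated
    print(digest(toy, {11}, "HpaII"))   # second site methylated
    print(digest(toy, {11}, "MspI"))    # MspI ignores the methylation
```

Comparing the HpaII and MspI fragment patterns for the same sequence shows how a difference in digestion, rather than a difference in sequence, carries the methylation information.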

II. Affinity enrichment

Affinity enrichment of methylated regions using antibodies specific for 5mC or using methyl-binding proteins has the potential to enable comprehensive methylation profiling (182). The techniques MeDIP and mDIP involve enrichment of methylated regions by using an antibody specific for methylated cytosine followed by hybridization to a microarray. Enriched and non-enriched DNA are labeled with different fluorescent dyes, followed by two-colour fluorescent hybridization. Calculation of relative fluorescent signal intensities is used to extrapolate DNA methylation information at the corresponding loci. Affinity enrichment methods can also be linked to sequencing analysis. The main strength of affinity enrichment is rapid genome-scale assessment of DNA methylation. However, these methods do not give information on individual CpG site methylation, instead reporting on pre-defined regions. Significant experimental and bioinformatic adjustment is required to account for varying CpG density throughout the genome. Affinity-based methods are susceptible to measurement error where there are copy number alterations because the method does not measure the unmethylated version of a sequence.

III. Bisulfite conversion

Bisulfite conversion takes advantage of the discovery in the 1990s that treatment of denatured genomic DNA with sodium bisulfite results in chemical deamination of unmethylated cytosines more rapidly than methylated cytosines (182). The result is that unmethylated cytosines are converted to uracils (‘C’ to ‘U’) and methylated cytosines remain ‘C’. This process transforms an epigenetic difference into a genetic one that can be detected using many technologies.
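The way in which bisulfite conversion turns a methylation difference into a sequence difference can be illustrated with a minimal Python sketch. The function name, the ten-base fragment and the methylated positions are hypothetical and chosen only for illustration; they are not taken from the study. Because uracil is read as thymine after PCR amplification, converted cytosines are written as 'T'.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as it would be read after bisulfite conversion and PCR.

    Unmethylated cytosines are deaminated to uracil, which is read as
    thymine after amplification; methylated cytosines remain 'C'.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated C -> U -> read as T
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment with CpG cytosines at positions 2 and 6
# and a non-CpG cytosine at position 8.
fragment = "ATCGTACGCA"
fully_unmethylated = bisulfite_convert(fragment, set())
first_cpg_methylated = bisulfite_convert(fragment, {2})
```

The two converted strings differ only at position 2, which is the sequence difference that array probes or sequencing reads subsequently detect.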

Bisulfite-treated DNA allows for the measurement of methylation at single-base resolution. As bisulfite-treated DNA consists of three rather than four bases, standard hybridization arrays require significant modification in order to render an authentic signal. The Illumina GoldenGate BeadArray used primers to detect methylation at up to 1,526 different CpG sites. Primers were designed to be specific for methylated and unmethylated sequences and labeled with different fluorescent dyes (182). The subsequent stages of development have been assays with increasing numbers of primers/probes: the Illumina Infinium HumanMethylation 27K interrogating 27,578 CpG sites (184) and the Infinium 450K covering 485,000 CpG sites (185). Bisulfite-treated DNA is also well-suited to sequencing approaches. Ultra-deep sequencing of 100 PCR products at an average coverage of more than 1,600 reads is possible on the Roche 454 platform (182). A challenge of sequencing bisulfite-converted DNA is that the low sequence complexity can lead to sequence redundancy. Reduced representation bisulfite sequencing (RRBS) limits this phenomenon by using restriction enzymes to fractionate DNA sequences based on size and limit the regions selected for sequencing. Regions with moderate to high CpG density are targeted with this method, making the assay less costly than genome-wide bisulfite sequencing, but excluding it from the class of truly genome-wide methylation assays (182, 184).

Some sources of error in bisulfite-based methods include incomplete bisulfite conversion and differential PCR efficiency for methylated and unmethylated versions of the same sequence. While completion of bisulfite conversion should be measured as a quality control step, over-treatment with bisulfite can degrade DNA and result in methylated cytosines converting to thymine residues, resulting in under-reporting of DNA methylation. In bisulfite-based methods, C to T single nucleotide polymorphisms at CpG sites are at risk of being misinterpreted as methylation variation; hence it is recommended that CpG sites affected by single nucleotide polymorphisms be removed from analysis or that the presence or absence of the polymorphism be confirmed by sequencing (182).

(B) DNA methylation measurement

Following DNA pre-treatment, a choice of technologies is available to measure DNA methylation.

I. Locus-specific analysis – Methods such as MeDIP-PCR, MethyLight and EpiTYPER® were the first methylation assays developed and, although they are still in use, are not applicable to large-scale epidemiological studies.

II. Gel-based analysis – Methylation-specific PCR (MSP) and combined bisulfite restriction analyses (COBRA) are commonly used assays for the purposes of measuring between 10 and 100 CpG sites per sample (182).

III. Array-based technologies include CHARM and Illumina’s GoldenGate and Infinium platforms. The advantages of array-based assays include the ability to use small quantities of DNA in a high-throughput, relatively low-cost assay (168). The number of CpG sites analysed in the assay varies greatly, from 1,000 up to 850,000. The criteria for selection of CpG sites included in the array themselves lead to bias, as generally CpG sites associated with genes of known disease interest and regions with known high CpG density are selected for inclusion, leaving other areas of the genome relatively sparsely covered by the assay. The Illumina Infinium assay is discussed below in greater detail.

IV. Sequencing-based technologies provide the highest level of coverage and resolution for DNA methylation measurement. Other advantages include negligible bias towards CpG-dense regions. Ultra-deep sequencing of a limited number of PCR products (100-1,000) is possible by pyrosequencing after bisulfite conversion. The gold standard for single-base pair resolution methylation is whole genome bisulfite sequencing (182). When combined with bisulfite-treated DNA, sequencing methods are still unable to differentiate between methylated and hydroxymethylated cytosine bases. Sequencing generally remains too expensive and time-consuming for genome-wide methylation profiling.

Illumina Infinium 450k Assay

The Illumina Infinium HumanMethylation450 BeadChip uses specific probes to characterise individual CpG sites in bisulfite-converted DNA (185). The 50 base pair probes hybridise to specific CpG sites. Allele-specific single base extension of the primer incorporates a biotin-labeled nucleotide (for C and G) or a dinitrophenyl-labeled nucleotide (for A and T).

There are two probe chemistries used in the Infinium 450k assay. The type I assay uses two probes per CpG site, one each for the methylated and unmethylated states. One probe hybridises to the unmethylated CpG site, denoted by ‘T’. The other probe hybridises to the methylated CpG site, denoted by ‘C’. The type II assay uses one bead type, with the methylated state determined at the extension step after hybridization. Most probes (75%) are of the type II design. The two assays have different characteristics; in particular, the range of β values in the type II assay is smaller than that of the type I assay (186), potentially resulting in bias. Several correction methods have been published, including a validated methodology, Subset-quantile Within Array Normalization (SWAN), by Maksimovic et al (187).

Although it offers less coverage than sequencing methods, the throughput, resolution and cost-effectiveness of the Illumina Infinium make it an attractive platform for genome-wide methylation studies (168). There is a bias towards representation of annotated gene regions and CpG islands, with 99% of RefSeq genes covered and 96% of CpG islands represented.

It is important to note that, while often described as a ‘genome-wide’ assay for methylation, the Infinium 450K covers 482,421 CpG dinucleotide positions of the human genome, distributed across all 22 autosome pairs and the sex chromosome pair. From a functional genome distribution perspective, 200,339 CpGs (41%) are located in proximal promoters, defined as sites located within 200 bp or 1,500 bp upstream of the transcription start site, in the 5’-untranslated region or in exon 1. Of these CpG sites in proximal promoter regions, 46.1% are in CpG islands (the clusters of predominantly unmethylated CpG sites), 28.3% are in CpG shores, 3.8% are in CpG shelves and 21.8% are in other regions of the genome (9).

Data processing - methods

After probe hybridization, a signal for methylation or no methylation at each CpG site is generated. Background intensity is computed from a set of negative controls and subtracted from the raw methylation probe intensity. In contrast to genomic data, which exhibit categorical properties, methylation status is a continuous variable, generally expressed as a ratio of methylated to unmethylated molecules. Methylation levels within the sample are reported as a ratio and can be expressed as an M-value or β-value.

The β value for methylation is the ratio of methylated probe intensity to the overall intensity (the sum of methylated and unmethylated intensities) and results in a value ranging between 0 and 1 or 0 and 100%. The β value has the advantage of intuitive biological interpretation. In an ideal setting, a β value of zero indicates that all copies of the CpG site measured by the corresponding probe are completely unmethylated across the entire sample.

β = M / (M + U + α) (188)

where M = intensity of methylated probes, U = intensity of unmethylated probes and α = constant offset (recommended to be set to α = 100 in order to regularise β values when M and U values are small).

The M value is calculated as the log2 ratio of the intensity of the methylated probe to that of the unmethylated probe. An M value close to zero indicates similar intensity between methylated and unmethylated probes, which means the CpG site is about half methylated. Positive M values indicate that more molecules are methylated than unmethylated, while negative values mean the opposite. The relationship between β and M values is approximately linear between 0.2 and 0.8 for β values (-2 and 2 for M values); however, at the extremes of methylation β values are severely compressed. The M value is therefore preferred for regression analyses because its standard deviation is more constant across the entire methylation range (189).

M value = log2(M/U) (190)
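The two measures can be computed from probe intensities as in the following minimal Python sketch. The function names and the example intensities are illustrative assumptions; the formulas are those given above, and the conversion from β to M assumes that the offset α is zero and that both intensities are positive.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Beta value: methylated intensity over total intensity plus offset."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth):
    """M value: log2 ratio of methylated to unmethylated intensity."""
    return math.log2(meth / unmeth)


def beta_to_m(beta):
    """Convert a beta value to an M value, assuming an offset of zero."""
    return math.log2(beta / (1.0 - beta))


# Hypothetical probe intensities for a single CpG site.
example_beta = beta_value(3000.0, 1000.0)   # about 0.73
example_m = m_value(3000.0, 1000.0)         # about 1.58

# Beta values of 0.2 and 0.8 correspond to M values of about -2 and 2,
# while values near 0 or 1 map to rapidly growing M values.
m_range = [beta_to_m(b) for b in (0.01, 0.2, 0.5, 0.8, 0.99)]
```

Equal steps in β near 0 or 1 correspond to much larger steps in M than in the mid-range, which is the compression of β values at the extremes described above.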

Reproducibility and accuracy

Reproducibility between technical replicates (tumour cell lines and tumour tissue) is reported for the Infinium assay, with an average correlation (R²) of β values of 0.992 (191).

Accuracy is measured by concordance with gold standard methods for methylation assays. Roessler et al reported that of 352 tested CpG sites measured in primary tumour tissue, 60.5% are within 10% of the result obtained by quantitative pyrosequencing (192). As has been observed with previous array methodologies, there is lower concordance with the gold standard in the middle regions of methylation – when CpG sites are methylated between 25% and 75%. The authors recommend caution in describing CpG sites as hypo- or hyper-methylated if the CpG site is methylated in the middle range.

2.5 Measures of differential methylation

The most common measurement of differential methylation is the differentially methylated position (DMP), in which there is a difference in methylation at a single CpG site usually between a disease and non-disease state. DNA methylation altered at multiple CpG sites within a small genomic region is termed a differentially methylated region (DMR) (168).

Methylation variance is a more recently described phenomenon in which methylation at a single CpG site or region is highly variable in a cohort of samples with disease but very consistent in a cohort of controls. Both differentially variable probes and differentially variable regions (DVRs) are described. Differential methylation variance is associated with some cancers (c-DVRs) and can exhibit specific patterns in different tissue types (t-DVRs) (168).

2.6 Biological challenges in measuring DNA methylation

Confounders of DNA methylation

Confounding of DNA methylation results can be due either to genetic or congenital factors or environmental exposures. Potential confounders include ethnic background, age (193), smoking (194, 195), sun exposure (193), dietary folate and other micronutrient intake (162, 196-198), obesity (199), and alcohol intake (200). Rakyan et al recommend measuring and adjusting for covariates (201) if they are known to have an independent effect on phenotype, as adjustment may allow for better estimation of the direct methylation effect.

Use of whole peripheral blood

Whole blood is a readily accessible source of DNA and can be collected and stored relatively easily and inexpensively, properties which are desirable for large genetic epidemiological studies. In this study, whole blood was stored predominantly as dried blood spots on Guthrie cards, a method validated for analysing DNA methylation using the Infinium 450K (202). In this study, for a number of individuals, blood was stored as both dried blood spots and frozen buffy coats. Methylation profiles of blood stored as dried blood spots and buffy coats from the same individuals were compared and demonstrated to be highly reproducible (means correlation, r = 0.9907). However, 5,723 CpG sites (of a total of >470,000 in the assay) demonstrated a significant difference in methylation between dried blood spots and buffy coats (comparison of methylation at individual CpG sites, F-test, q≤0.01) (202). The high means correlation supports the use of DNA from dried blood spots as an alternative for DNA methylation profiling. However, the detection of even a small proportion of probes with differences in methylation underscores the importance of matching for DNA source in case-control studies, because of potential systematic differences arising from preparation, storage conditions or cell population in the two methods.

Challenges of using whole peripheral blood

(I) Effect of different white blood cell proportions on methylation

Whole blood in a healthy individual contains DNA from different subtypes of white blood cells, the major subgroups being lymphocytes (B and T), monocytes and granulocytes. In any healthy individual, there will be variation in the proportion of white blood cell subtypes in the peripheral blood. This may have a significant effect on methylation studies of whole blood, as methylation patterns are recognised to be tissue-specific to the extent that characteristic methylation patterns can reliably predict cell of origin; not only between skin cells and blood cells, for example, but also between the different leucocyte subtypes (203-206). Hierarchical clustering of genome-wide methylation data from purified leucocytes is consistent with the well-established model of haematopoietic differentiation (204).

The magnitude of difference between white blood cell subtypes may be large enough to confound the differences one may expect to see between cases and controls. Using data from published case-control series of ovarian cancer, bladder cancer and head and neck squamous cell carcinoma (HNSCC), Koestler et al documented a number of DMRs in the leucocyte subgroups. They reported that the magnitude of difference between beta-values of the proposed leucocyte DMRs was equivalent to the difference between beta-values of controls and cancer cases in the ovarian and HNSCC data sets (203). Clearly, the variable proportion of leucocytes in whole blood samples may present a significant confounder in any study of whole blood methylation. Houseman et al have developed a model that uses known leucocyte DMRs to predict leucocyte proportions in whole blood (207). The model was recently validated in a sizeable cohort, using methylation data from the Infinium 27K to accurately predict the percentage of monocytes and lymphocytes in whole blood compared with a full blood count analyser (208). The aim in applying such a model is to use methylation data to impute the white blood cell composition of blood samples, using a regression model.
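The general idea of imputing cell composition by regression can be sketched as follows. This is a simplified illustration under stated assumptions and not a reimplementation of the published Houseman model: whole blood methylation at each CpG site is treated as a mixture of reference profiles from purified cell types, and non-negative mixing weights are found by projected gradient descent on the least-squares error. The reference β values, the three cell types and the function name are all hypothetical.

```python
def estimate_cell_proportions(reference, sample, iterations=500):
    """Estimate non-negative weights w so that reference x w approximates sample.

    reference: one row per CpG site, holding the mean beta value of that
               site in each purified cell type (hypothetical values here).
    sample:    beta values for the same CpG sites measured in whole blood.
    """
    n_types = len(reference[0])
    # A step of 1 / trace(R^T R) never exceeds 1 / largest eigenvalue,
    # which keeps the descent stable.
    step = 1.0 / sum(value * value for row in reference for value in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residuals = [
            sum(r * w for r, w in zip(row, weights)) - observed
            for row, observed in zip(reference, sample)
        ]
        gradient = [
            sum(row[k] * res for row, res in zip(reference, residuals))
            for k in range(n_types)
        ]
        # Gradient step, then projection back onto non-negative weights.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical reference: columns are lymphocytes, monocytes, granulocytes.
reference = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.2],
    [0.2, 0.8, 0.2],
    [0.2, 0.2, 0.8],
]
true_proportions = [0.6, 0.3, 0.1]
# Simulated noise-free whole blood profile built from the reference.
whole_blood = [
    sum(r * p for r, p in zip(row, true_proportions)) for row in reference
]
estimated = estimate_cell_proportions(reference, whole_blood)
```

With real data the estimates would be affected by measurement noise and by cell types absent from the reference, so the recovered proportions would only approximate the true composition.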

(II) Possibility of circulating tumour cells

Whole blood in an individual who subsequently develops MBCN may contain circulating tumour cells or tumour DNA, as some of these malignancies are characterised by a long latency during which individuals remain asymptomatic and/or are associated with a pre-malignant precursor condition. In some cases there may be perturbation of the total lymphocyte count, while in others at earlier stages the lymphocyte count may remain unchanged. The performance of white blood cell correction algorithms such as the Houseman algorithm described above is not published or otherwise described for this situation.

Chronic lymphocytic leukaemia (CLL) is preceded nearly universally by a condition known as monoclonal B-lymphocytosis, in which pre-malignant monoclonal B cells are detectable in the peripheral blood at very low levels which do not necessarily result in a raised total lymphocyte count (209). The duration of monoclonal B-lymphocytosis prior to CLL is not clear and likely varies greatly between individuals. In a study of pre-diagnostic blood samples collected a median of 3 years before the diagnosis of CLL, mutations consistent with circulating B cell clones were detected in 44 of 45 patients (209). Conversely, in a healthy population of 1,520 adults aged 62-80 years, monoclonal B cells (defined by abnormal surface immunophenotype) were detectable in the blood of 5.1% of subjects with a normal lymphocyte count and 13.9% of subjects with an elevated lymphocyte count (210). The annual rate of subsequent progression to overt CLL was 1.1%, the only factor associated with increased risk of progression being the B lymphocyte level (HR 1.46, 95% CI 1.12-1.91, for each increase of 1,000 B cells per ml).

For multiple myeloma (MM) the analogous precursor condition is monoclonal gammopathy of undetermined significance (MGUS), which is defined by the detection of an abnormal monoclonal protein (211). MGUS can be detected in 100% of MM cases at least 2 years prior to MM diagnosis, in 93% of cases 7 years prior to diagnosis and in 82.4% of cases 8+ years prior to diagnosis (211). The prevalence of MGUS increases with age, and in a white American population was reported to be 3.2% in those >50 years and 7.5% in those aged 85 years and older (212). The presence of circulating clonal plasma cells and associated tumour DNA in myeloma is now well-established (5, 213) but the frequency of detection of the same in MGUS is yet to be described.

For follicular lymphoma, the characteristic genetic lesion, rearrangement of the BCL-2/JH genes, is detectable in peripheral blood lymphocytes in up to 50% of healthy adults (214). The high prevalence of this mutation in healthy adults confirms that its presence alone is insufficient for the development of overt lymphoma. In a case-control study of subjects diagnosed with follicular lymphoma nested in the European Prospective Investigation Into Cancer and Nutrition (EPIC), pre-diagnostic blood collected a mean of 6.4 years prior to diagnosis was evaluated for the t(14;18) chromosomal translocation. The t(14;18) was detected in 56% of cases compared with 29% of controls (p<0.001) (215). The high prevalence of this translocation in circulating lymphocytes of healthy controls, as well as the significant difference between cases and controls, raises the possibility that it is a confounder in methylation analysis of peripheral blood.

In summary, correction for white blood cell content appears to be important to adjust for cellular heterogeneity but alone is unlikely to account for the presence of low levels of circulating tumour cells and circulating tumour DNA. One of the potential outcomes of this study is the identification of a biomarker for early development of MBCN, and for this purpose whole blood is an appropriate source of DNA.

3 Study Design

3.1 Melbourne Collaborative Cohort Study

The Melbourne Collaborative Cohort Study (MCCS) is a prospective study of 41,513 people (24,469 women and 17,044 men) recruited between 1990 and 1994, designed to investigate the roles of diet and lifestyle in causing cancer. While repeated blood sampling was part of the study methodology, the rapid growth in genomics was not specifically part of the initial study plan (216). Blood samples were collected from 41,133 participants. Initially blood was collected and separated into mononuclear cells by Ficoll separation (n=9,832) or buffy coat by centrifugation (n=635). Due to inadequate budget, from 1991 whole blood was stored on Guthrie diagnostic cellulose filter paper as dried blood spots (n=30,633). The majority of participants were identified from the Victorian state electoral rolls, on which enrolment is required of Australian citizens. Announcements were also made targeting the Italian and Greek communities of Melbourne. Of the participants, 99.3% were aged between 40 and 69 years at baseline. Participants were Caucasian, of whom 25% were born in Italy or Greece and the remainder in Australia, New Zealand or the United Kingdom. Compared with the general population of Melbourne in the same age group, there were more females (59%) and the average age of MCCS participants (55 years) was slightly higher than that of the general population (216).

At recruitment (baseline), a wide range of epidemiological, lifestyle, personal medical history and medication data were collected via questionnaires, including diet, skin type, physical activity, alcohol and smoking. Dietary information was obtained using a food frequency questionnaire (FFQ) specifically developed for the MCCS (217), which estimates the intake of 121 food items including meats, fish, fruits, vegetables, cereal-based foods, fats and oils, and non-alcoholic beverages. Additionally, the FFQ is used to calculate intakes of macronutrients such as protein, carbohydrates and fats, and micronutrients such as fatty acids, vitamins, calcium, iron, folate and carotenoids. The consumption of alcoholic beverages and smoking were captured by a separate questionnaire. The MCCS has also collected information on family history of malignancy and could be used to identify families to be followed up as a result of the project’s findings. In addition, direct physical measurements (height, weight, waist and hip circumferences and blood pressure) were made according to standard protocols and a blood sample was taken from virtually every participant. For the purposes of passive follow-up, the MCCS is matched routinely to the Victorian Cancer Registry and to other cancer registries in Australia via the Australian Institute of Health and Welfare, which links it to the Australian Cancer Database and to the National Death Index. It is also linked 6-monthly to the Victorian Electoral Enrolment Register to update changes in address. Follow-up is maintained at a high level with only 110 (0.03%) of cohort members known to have left Australia.

Genome-wide association studies have been conducted on DNA samples from cases with breast, prostate, colorectal and urothelial cancers using the Illumina Octorara beadchip. Genome-wide DNA methylation has been evaluated in a series of nested case-control studies of MBCN, prostate, colorectal, urothelial, kidney, gastric and lung cancers using the Illumina Infinium HumanMethylation450 BeadChip (216).

3.2 Nested Case-Control Study, participant selection

Participants were enrolled in the MCCS (216). Incident cases of MBCN diagnosed between baseline attendance and 31 December 2011 were identified by linkage to the Victorian Cancer Registry. The Victorian Cancer Registry receives mandatory notifications of all new cancer cases in Victoria from both hospital medical records and pathology laboratories. The diagnosis is preferably obtained from the diagnostic histopathology report. According to routine Victorian Cancer Registry practice, each case is assigned a diagnostic code according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) (218). Cases of mature B cell malignancies were identified and classified according to ICD-O-3 codes 9670, 9823, 9680, 9690, 9691, 9695, 9698, 9731, 9732, 9733, 9734, 9671, 9673, 9675, 9678, 9679, 9684, 9687, 9689, 9699, 9761, 9764, 9826, 9833, 9940. The diagnostic histopathology reports were reviewed by the clinical haematology investigator (NWD) in order to confirm that the appropriate ICD-O-3 code was assigned to each case.

Controls were selected using density sampling with attained age as the time scale. Controls were individually matched to cases at a 1:1 ratio based on age at enrolment (± 1 year), sex, country of birth (Southern Europe [Italy or Greece] or United Kingdom, Australia or New Zealand) and DNA source (mononuclear cells, buffy coat or dried blood spot).
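The matching criteria can be expressed as a simple eligibility rule, sketched below in Python. The record fields, the function names and the example participants are hypothetical and do not reflect the MCCS data structures. Density sampling is represented here, as an assumption, by requiring that a candidate control has enrolled by, and is still under cancer-free follow-up at, the age at which the case was diagnosed.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    pid: str
    age_at_enrolment: float
    sex: str
    birth_region: str      # e.g. "Southern Europe" or "UK/AU/NZ"
    dna_source: str        # mononuclear cells, buffy coat or dried blood spot
    exit_age: float        # age at diagnosis, death or end of follow-up


def eligible_controls(case, age_at_diagnosis, cohort):
    """Return cohort members who satisfy every matching criterion."""
    return [
        person for person in cohort
        if person.pid != case.pid
        and abs(person.age_at_enrolment - case.age_at_enrolment) <= 1.0
        and person.sex == case.sex
        and person.birth_region == case.birth_region
        and person.dna_source == case.dna_source
        # Still at risk at the attained age of the case (assumed rule).
        and person.age_at_enrolment <= age_at_diagnosis < person.exit_age
    ]


def pick_control(case, age_at_diagnosis, cohort, rng):
    """Choose one eligible control at random, or None if there is none."""
    pool = eligible_controls(case, age_at_diagnosis, cohort)
    return rng.choice(pool) if pool else None


# Hypothetical participants for illustration only.
case = Participant("case1", 58.0, "F", "UK/AU/NZ", "dried blood spot", 69.0)
cohort = [
    case,
    Participant("c1", 58.6, "F", "UK/AU/NZ", "dried blood spot", 80.0),
    Participant("c2", 60.5, "F", "UK/AU/NZ", "dried blood spot", 80.0),
    Participant("c3", 57.5, "M", "UK/AU/NZ", "dried blood spot", 80.0),
    Participant("c4", 58.2, "F", "UK/AU/NZ", "buffy coat", 80.0),
    Participant("c5", 57.9, "F", "UK/AU/NZ", "dried blood spot", 65.0),
]
matched = pick_control(case, 69.0, cohort, random.Random(0))
```

In this example only one candidate meets all criteria; the others fail on age, sex, DNA source or follow-up respectively.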

Participants with a history of cancer (other than non-melanoma skin cancer) before baseline or those with no baseline DNA sample were ineligible. Peripheral blood samples were obtained as part of the MCCS at study entry (baseline).

Due to the potential influence of environmental exposures on DNA methylation, baseline smoking status, dietary folate intake, alcohol consumption and BMI were considered to be potential confounding variables. Information on smoking and alcohol was obtained from an interviewer-administered questionnaire. Smoking status was categorised as never smoked, former smoker and current smoker while alcohol consumption was categorised as no intake, 1-39g/day, 40-59g/day and ≥60g/day. Dietary folate intake was assessed by a 121-item self-recorded food frequency questionnaire specifically designed for the MCCS and categorised as <320μg/day or ≥320μg/day, the level estimated to meet the requirements of half of healthy adults (164). BMI was calculated from measured height and weight performed at baseline and categorised as <30 or ≥30, corresponding to the BMI threshold at which obesity is defined by the WHO.

Study participants provided written, informed consent and the study was approved by Cancer Council Victoria’s Human Research Ethics Committee and performed in accordance with the institution’s ethical guidelines.


Between baseline attendance (period spanning 1990-1994) and 31 December 2011, 572 cases of MBCN were notified to the Victorian Cancer Registry. Six cases were identified as having a pre-baseline or concurrent cancer diagnosis of which two were only identified as a result of review of the diagnostic pathology reports by the clinical haematology investigator (NWD). Of the remaining 566 cases, a matched control was available in 469. After DNA extraction for 469 pairs, for 5 there was insufficient DNA for the methylation assay. A total of 464 pairs were, thus, assayed on the Infinium 450K with only one sample failing the quality control threshold. The remaining 463 pairs were processed using the normalization procedure described in Methods Chapter 4.4. One case appeared to have been matched with two different controls and due to a discrepancy in dealing with the correct case-control pair both pairs were excluded.

A check of actual sex versus methylation findings of X chromosome inactivation and Y chromosome signals was performed and, surprisingly, there was an apparent sex discrepancy in 20 cases and 6 controls. A thorough reconciliation was undertaken to determine whether a labeling error had occurred. Three unique patient identifiers (Victorian Cancer Registry identifier, MCCS identifier and date of birth) were checked and no clerical error was found. After the corresponding 23 pairs were removed, 438 case-control pairs remained available for statistical analysis.

Table 16: Histological diagnoses and assigned tumour group

| ICD-O-3 code | N | Morphology | Assigned tumour group | n |
|---|---|---|---|---|
| 9670 | 25 | CLL | Low grade | 137 |
| 9823 | 57 | CLL / Small lymphocytic lymphoma | Low grade | |
| 9671 | 4 | Lymphoplasmacytic lymphoma | Low grade | |
| 9673 | 17 | Mantle cell lymphoma | Low grade | |
| 9689 | 6 | Splenic marginal zone lymphoma | Low grade | |
| 9761 | 9 | Waldenström’s macroglobulinaemia | Low grade | |
| 9940 | 4 | Hairy cell leukaemia | Low grade | |
| 9699 | 15 | Marginal zone lymphoma | Low grade | |
| 9679 | 3 | Mediastinal B cell lymphoma | High grade | 110 |
| 9680 | 102 | DLBCL | High grade | |
| 9675 | 2 | Mixed small and large cell diffuse lymphoma | High grade | |
| 9684 | 2 | DLBCL, immunoblastic | High grade | |
| 9687 | 1 | Burkitt lymphoma | High grade | |
| 9690 | 9 | Follicular lymphoma | Follicular | 82 |
| 9691 | 30 | Follicular lymphoma | Follicular | |
| 9695 | 31 | Follicular lymphoma | Follicular | |
| 9698 | 12 | Follicular lymphoma | Follicular | |
| 9731 | 2 | Plasmacytoma | Myeloma | 109 |
| 9732 | 106 | Multiple myeloma | Myeloma | |
| 9733 | 1 | Plasma cell leukaemia | Myeloma | |
| Total | 438 | | | 438 |


572 MBCN cases in MCCS cohort reported to Victorian Cancer Registry

Missing DNA sample n=4; pre-baseline cancer identified by Victorian Cancer Registry n=97

471 cases matched with control

After review of pathology reports, 2 additional cases were found to have a pre-baseline haematological malignancy#

469 pairs DNA extracted

Insufficient DNA n=5

464 assayed with Infinium 450K

QC failure n=1

463

A duplicated case-control pair identified, both pairs removed

461

Sex discrepancy in 20 cases and 6 controls. 23 matched pairs removed

438 pairs for statistical analysis

Figure 1: Flow diagram of study participants
# One case had a pre-baseline diagnosis of chronic lymphocytic leukaemia and one case had a concurrent diagnosis of MBCN and chronic myelomonocytic leukaemia
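The counts in the flow diagram can be checked with a short piece of arithmetic, sketched here in Python. The exclusion labels paraphrase the diagram and the variable names are illustrative; note that the unit changes from cases to matched pairs once controls have been assigned.

```python
# Starting number of MBCN cases reported to the Victorian Cancer Registry.
start = 572

# Exclusions in the order shown in the flow diagram.
exclusions = [
    ("missing DNA sample", 4),
    ("pre-baseline cancer identified by the registry", 97),
    ("pre-baseline haematological malignancy on pathology review", 2),
    ("insufficient DNA", 5),
    ("quality control failure", 1),
    ("duplicated case-control pair, both pairs removed", 2),
    ("sex discrepancy, matched pairs removed", 23),
]

remaining = start
running = []
for reason, removed in exclusions:
    remaining -= removed
    running.append((reason, remaining))
```

The running totals reproduce the intermediate figures of 471, 469, 464, 463 and 461 and the final 438 pairs available for statistical analysis.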


Characteristics of the study sample are shown below. The median age at diagnosis was 69 years and the median time between baseline DNA collection and MBCN diagnosis was 10.6 years (range 2.4 months – 20 years). There was no significant difference between cases and controls for body mass index, smoking status, folate intake or alcohol consumption.

Table 17: Demographics of study population

| Characteristic | | Controls (N=438) | Cases (N=438) |
|---|---|---|---|
| Age at enrolment (years)* | Mean | 58.7 | 58.8 |
| | SD | 7.8 | 7.9 |
| | Range | 40-70 | 40-70 |
| DNA source* | Dried blood spot | 316 | 316 |
| | Lymphocyte | 117 | 117 |
| | Buffy coat | 5 | 5 |
| Age at diagnosis (years)* | Mean | NA | 76.9 |
| | SD | NA | 7.8 |
| | Range | NA | 56-91 |
| Ethnicity* | Anglo-Celtic | 340 | 340 |
| | Southern European | 98 | 98 |
| Body Mass Index (BMI)* | Mean | 27.3 | 27.1 |
| | SD | 4.4 | 4.3 |
| BMI category | BMI ≥30 | 87 | 87 |
| Smoking status* | Never smoked | 246 | 241 |
| | Former smoker | 151 | 152 |
| | Current smoker | 37 | 43 |
| Daily folate intake (mcg/day)* | Mean | 328 | 343 |
| | SD | 140 | 177 |
| Folate intake category* | <320mcg/day | 216 | 226 |
| Daily alcohol intake* | 0g/day | 160 | 158 |
| | 1-39g/day | 215 | 228 |
| | 40-59g/day | 40 | 37 |
| | ≥60g/day | 23 | 15 |

* p=ns between controls and cases

4 Methods

4.1 DNA source and sample collection

Blood samples were collected at baseline entry to the study, prior to any diagnosis with cancer. For 636 samples, the DNA source was from dried blood spots. For dried blood spot samples, whole blood was collected and transferred to Guthrie Card Diagnostic Cellulose filter paper (Whatman, Kent, UK) and stored in airtight containers at room temperature. For 234 samples, the DNA source was peripheral blood mononuclear cells and for ten samples the DNA source was buffy coats. Mononuclear cells were isolated by density gradient centrifugation method using Ficoll-Paque Plus (GE Healthcare, Parramatta, Australia). Briefly, whole blood was centrifuged and plasma was removed. The red cell fraction was carefully transferred onto Ficoll and spun down for 20 min without applying a brake. After the spin, the mononuclear cell layer was transferred and washed with RPMI medium. Mononuclear cells and buffy coats were stored in liquid nitrogen at -80°C since collection.

4.2 DNA Extraction and Bisulfite conversion

DNA extraction and bisulfite conversion were performed in the Genetic Epidemiology Laboratory at The University of Melbourne. DNA was extracted from lymphocytes and buffy coats in a 96-well format using the QIAamp 96 DNA blood kit (Qiagen, Hilden, Germany). DNA was extracted from dried blood spot samples using a published method (202) whereby 21 x 3.2 mm blood spot punches were re-suspended in water and lysed in phosphate-buffered saline using Tissue Lyser (Qiagen). The resulting supernatant was processed using Qiagen mini spin columns according to the manufacturer’s protocol. The quality and quantity of DNA were confirmed using the Quant-iT™ PicoGreen® dsDNA assay measured on the Qubit® Fluorometer (Life Technologies, NY, USA). The ideal quantity of DNA was 1 µg, with 300 ng the minimum quantity considered acceptable for methylation analysis; the remaining DNA was stored for other applications such as genotyping assays. Where an insufficient quantity of DNA was extracted, samples were reassessed and DNA re-isolated where possible. If the DNA source from a control sample was exhausted, replacement controls were requested from MCCS. Where the DNA source from a case was exhausted, the case-control set was dropped.

Bisulfite conversion was performed using Zymo Gold single tube kit (EZ DNA Methylation-Gold kit, Zymo Research, CA, USA) according to the manufacturer’s instructions. Post-conversion quality control was performed using SYBR Green (Applied Biosystems) quantitative PCR, an in-house assay, designed to determine the success of bisulfite conversion by comparing amplification of the test sample with positive and negative controls.

4.3 DNA methylation measurement

Plate Design

Samples were processed in batches of 96 samples per plate (8 Infinium HumanMethylation450K BeadChips per batch). A total of 200 ng of bisulfite converted DNA was used per sample.

Array-based technologies are subject to intra- and inter-run batch effects due to environmental factors (e.g., temperature, humidity), reagent variability or operator effect. The following principles were therefore adhered to:

1. Matched case and control samples were always processed together and run on the same plate, placed consecutively to minimise potential batch effects.
2. MBCN tumour subtypes were distributed evenly across all plates, to reduce the risk of incorrectly attributing differences in plate-to-plate measurement to differences between tumour subtypes.
3. Two technical replicates and two cell line controls (multiple myeloma cell line U266) were included per 92 samples to check reproducibility of the assay.
4. The positions for cases and controls were randomly distributed on each BeadChip in order to minimise any position effects within the chips.
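The plating principles above can be illustrated with a minimal sketch. It assumes 12 sample positions per BeadChip (96 wells over 8 chips) and keeps the two members of each matched pair in adjacent positions on the same chip, with pair order and within-pair order randomised. The function name `layout_plate` and the ID format are hypothetical, and the sketch does not attempt to balance tumour subtypes or place replicate and cell line wells.

```python
import random

SAMPLES_PER_CHIP = 12   # assumed: 96 wells per plate / 8 chips per plate
CHIPS_PER_PLATE = 8


def layout_plate(pairs, seed=0):
    """Assign matched (case, control) pairs to the chips of one plate.

    pairs: list of (case_id, control_id) tuples.
    Returns a list of chips; each chip is a list of sample IDs in which
    the two members of a pair occupy adjacent positions.
    """
    rng = random.Random(seed)
    pairs_per_chip = SAMPLES_PER_CHIP // 2
    if len(pairs) > pairs_per_chip * CHIPS_PER_PLATE:
        raise ValueError("too many pairs for one plate")
    shuffled = pairs[:]
    rng.shuffle(shuffled)                  # random order of pairs across chips
    chips = []
    for start in range(0, len(shuffled), pairs_per_chip):
        chip = []
        for case_id, control_id in shuffled[start:start + pairs_per_chip]:
            members = [case_id, control_id]
            rng.shuffle(members)           # randomise which member comes first
            chip.extend(members)
        chips.append(chip)
    return chips


if __name__ == "__main__":
    demo_pairs = [(f"case{i}", f"ctrl{i}") for i in range(46)]  # 92 samples
    for number, chip in enumerate(layout_plate(demo_pairs, seed=1), start=1):
        print(number, chip)
```

With 46 pairs (92 samples), four positions remain free on the last chip, which is where the replicate and cell line controls could be placed.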

Methylation Analysis

The Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA) was used for methylation analysis in accordance with the manufacturer’s instructions. In further detail, bisulfite-converted DNA was denatured and neutralised. Isothermal whole genome amplification took place overnight. The amplified product was fragmented by an enzymatic process controlled by end-point fragmentation to avoid over-fragmenting the sample. After isopropanol precipitation, the fragmented DNA was collected by centrifugation and the precipitated DNA resuspended in the hybridisation buffer. Resuspended DNA samples were distributed onto the BeadChips for hybridisation.

The TECAN automated liquid handler (Tecan Group Ltd, Männedorf, Switzerland) was used for the single base extension and staining BeadChip steps. A final washing step was performed to remove unhybridised DNA and ready the plate for staining and primer extension. Single base extension of the oligos on the BeadChip incorporated biotin-labelled (ddCTP and ddGTP) and 2,4-dinitrophenol-labelled (ddATP and ddTTP) nucleotides. Staining with antibodies and fixation with fluorophores were performed prior to scanning with the Illumina HiScan SQ scanner.

4.4 Data processing

Methylation data were imported into the R statistical software (219) and processed using the ‘minfi’ Bioconductor package (220). Background correction and between-sample normalization were performed using the ‘preprocessIllumina’ function of minfi, which is equivalent to the standard normalization methods used in the GenomeStudio software package provided by Illumina. Subset-quantile within array normalization (SWAN) was performed for type I and type II probe bias correction (187). Illumina normalization to control for position effect, SWAN correction to correct for type I and II probe bias and ComBat normalization to correct for chip and batch effect are now part of a standard pipeline for analyzing such datasets (221).

Samples were excluded if >5% CpG probes (excluding chrX and chrY probes) had a detection p-value higher than 0.01. CpG probes were excluded from further analysis if they had detection p-value >0.01. ComBat normalization was applied to minimise chip and batch effects (222). Following methylation analysis, using the Infinium HumanMethylation450K BeadChip, only one pair of samples was rejected on the basis of returning >5% of probes with detection p-value >0.01.
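The exclusion rules above can be sketched in a few lines. This is an illustration in plain Python, not the minfi-based processing used in the study. The function names and the nested-dictionary input are hypothetical. Treating a probe as excluded when it exceeds the threshold in any retained sample is an assumption, taken from the wording of the reproduced article in Section 5.1.

```python
def failing_samples(detp, sex_probes=frozenset(), p_cut=0.01, max_fail_frac=0.05):
    """Return samples in which more than max_fail_frac of the
    non-sex-chromosome probes have a detection p-value above p_cut.

    detp maps sample ID -> {probe ID: detection p-value}.
    """
    failed = set()
    for sample, pvals in detp.items():
        autosomal = [p for probe, p in pvals.items() if probe not in sex_probes]
        n_fail = sum(1 for p in autosomal if p > p_cut)
        if not autosomal or n_fail / len(autosomal) > max_fail_frac:
            failed.add(sample)
    return failed


def drop_failed_pairs(pairs, failed):
    """Keep only matched pairs in which neither member failed QC."""
    return [(case, ctrl) for case, ctrl in pairs
            if case not in failed and ctrl not in failed]


def passing_probes(detp, samples, p_cut=0.01):
    """Return probes with detection p-value <= p_cut in every listed sample."""
    samples = list(samples)
    if not samples:
        return []
    shared = set(detp[samples[0]])
    for sample in samples[1:]:
        shared &= set(detp[sample])
    return sorted(probe for probe in shared
                  if all(detp[s][probe] <= p_cut for s in samples))
```

Because the design is matched, a failed sample removes its partner as well, which is why the text reports rejection of a pair rather than a single sample.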

4.5 CpG site selection

Out of a total of 485,512 sites covered by the Infinium 450k array, after quality control, 40,961 (9%) had a detection p-value ≥0.01 and were excluded from analysis. All CpG sites on sex chromosomes were also excluded. The resulting 444,551 CpG sites were included in the analysis of global DNA methylation. For the analysis of differentially methylated positions and regions, SNP-associated CpG sites and non-CpG targeted sites were removed, leaving 416,669 CpG sites considered in the analysis of differentially methylated positions.
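The site counts in this selection can be checked with simple arithmetic. The sketch below uses only the three totals stated in the text; the helper `funnel` and the step labels are illustrative.

```python
def funnel(start, steps):
    """Walk through successive filtering steps.

    start: number of sites before filtering.
    steps: list of (label, sites_remaining) in the order applied.
    Returns a list of (label, sites_removed, sites_remaining).
    """
    rows = []
    previous = start
    for label, remaining in steps:
        if remaining > previous:
            raise ValueError(f"step {label!r} cannot add sites")
        rows.append((label, previous - remaining, remaining))
        previous = remaining
    return rows


ARRAY_SITES = 485_512  # total sites on the array, as stated in the text
rows = funnel(ARRAY_SITES, [
    ("quality control (global methylation set)", 444_551),
    ("SNP-associated and non-CpG sites removed (DMP/DMR set)", 416_669),
])
for label, removed, remaining in rows:
    print(f"{label}: removed {removed:,}, remaining {remaining:,}")
```

The first step removes 40,961 sites, matching the number quoted for quality control, and the second removes a further 27,882 sites.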

Figure 2: Strategy for selecting CpG sites to include in the final analysis

4.6 Assembly of Candidate Genes

Genes identified as mutated in MBCN

Following the literature review outlined in Chapter 2, a list of 201 known mutations occurring in MBCN was compiled and used to evaluate the relevance of methylation findings (Appendix Table 1). The presence of these mutations in tumour tissue raises the possibility that they contribute to the pathogenesis of MBCN, through either activation or loss of gene function. However, only a small proportion are likely to be true driver mutations, and some are found in only a small percentage of MBCN cases. Identifying known mutations in MBCN is important in order to consider two possible relationships between gene mutation and methylation.

First, a finding of aberrant methylation in these candidate genes may indicate that gene regulation via DNA methylation is an important alternative non-mutation pathway in the pathogenesis of MBCN.

Second, an alternative possibility is that a finding of aberrant methylation in these genes could be a downstream result of a mutated gene within circulating tumour cells rather than of a primary DNA methylation abnormality itself. Ideally this would be confirmed by mutation analysis before a definitive conclusion could be drawn.

Genes identified as aberrantly methylated in MBCN

Following the literature review of aberrant methylation in MBCN described in Chapter 2, a list of differentially methylated genes was compiled for comparison with our results (Appendix Table 2).

5 Results

5.1 Global DNA Methylation

Wong Doo, et al. Global measures of peripheral blood-derived DNA methylation as a risk factor in the development of mature B-cell neoplasms. Epigenomics 2016; 8(1), 55-56

Reproduced with permission from Epigenomics as agreed by Future Medicine Ltd.


60 Research Article Wong Doo, Makalic, Joo et al.

to detect novel regions of differential methylation associated with a cancer phenotype. Although any tissue can be used, DNA from peripheral blood offers the advantage of being a readily available tissue from which new biomarkers for the prevention, early detection and monitoring of cancer could be identified [17]. To date, several retrospective case–control studies have demonstrated aberrant global methylation profiles in the peripheral blood of patients with colorectal cancer, breast cancer, head and neck cancers and urothelial cancers [18] but there are no published prospective studies regarding MBCN. The only published study of DNA methylation in peripheral blood in MBCN is a retrospective case–control study of follicular lymphoma measuring DNA methylation in the DAPK1 gene only [19].

To our knowledge, ours is the first prospective study to describe the importance of DNA methylation profile measured in peripheral blood to the risk of MBCN tumors. We evaluated a global measure of DNA methylation as a predictor of subsequent development of MBCN using the Infinium® HumanMethylation450 BeadChip (Infinium 450k; Illumina, CA, USA) which is suitable for epigenetic epidemiological studies due to the small amounts of DNA required, the high-throughput capability and the ability to interrogate methylation at single base pair resolution [20]. Our aims were to assess whether a global measure of DNA methylation dispersed over a large portion of the genome could be detected in DNA from peripheral blood, and whether such a measure was associated with the risk of developing MBCN.

Materials & methods

Participants

Participants were enrolled in the Melbourne Collaborative Cohort Study (MCCS), a prospective cohort study of 41,514 healthy adult volunteers (24,469 women, 17,045 men) aged between 27 and 76 years (99.3% aged 40–69 years), recruited between 1990 and 1994 [21]. Peripheral blood samples were obtained from participants at study entry (baseline). For this study, those with a history of cancer before baseline (n = 1970) or no baseline DNA sample (n = 420) were ineligible, leaving 39,124 eligible participants. Incident cases of MBCN diagnosed between baseline attendance and 31 December 2011 were identified by linkage to the Victorian Cancer Registry, which receives mandatory notification of all new cancer cases in Victoria, Australia. The diagnostic pathology reports were reviewed and classified according to the International Classification of Disease (ICD-O-3). Controls were selected using density sampling with attained age as the time scale and were individually matched to cases at a 1:1 ratio based on age at enrollment (±1 year), gender, ethnicity and DNA source (see below).

We considered baseline smoking, folate and alcohol intake and BMI as potential confounding variables. Information on smoking and alcohol consumption was obtained from an interviewer-administered questionnaire. Folate intake was assessed by a 121-item food frequency questionnaire specially designed for the MCCS. BMI was calculated from height and weight, which were measured at baseline.

Study participants provided written, informed consent. The study was approved by Cancer Council Victoria’s Human Research Ethics Committee and performed in accordance with the institution’s ethical guidelines.

DNA source & sample collection

Blood samples were collected at baseline entry into the study, prior to any diagnosis with cancer. For 636 samples, the DNA source was dried blood spots (DBS). For DBS samples, whole blood was collected and transferred to Guthrie Card Diagnostic Cellulose filter paper (Whatman, Kent, UK) and stored in airtight containers at room temperature. Methylation profiles from DBS using the Infinium HumanMethylation450 BeadChip are highly reproducible between technical replicates (mean correlation, r = 0.9932) and compared with methylation from buffy coat samples from the same individuals (r = 0.9932) [20]. For 234 samples, the DNA source was peripheral blood mononuclear cells and for ten samples the DNA source was buffy coats. Both had been stored at -80°C since collection. Mononuclear cells were isolated by density gradient centrifugation using Ficoll-Paque Plus (GE Healthcare, Parramatta, Australia). Briefly, whole blood was centrifuged and plasma was removed. The red cell fraction was carefully transferred onto Ficoll and spun down for 20 min without applying a brake. After the spin, the mononuclear cell layer was transferred and washed with RPMI. The mononuclear cells were stored in liquid nitrogen.

DNA extraction & bisulfite conversion

DNA was extracted from mononuclear cells and buffy coat specimens using QIAamp mini spin columns (Qiagen, Hilden, Germany). DNA was extracted from dried blood spots using a published method [20]. Briefly, 20 blood spots of 3.2 mm diameter were punched from the Guthrie card and lysed in phosphate-buffered saline using Tissue Lyser (Qiagen). The resulting supernatant was processed using Qiagen mini spin columns according to the manufacturer’s protocol. The quality and quantity of DNA were assessed using the Quant-iT™ PicoGreen® dsDNA assay measured on the Qubit® Fluorometer (Life Technologies, NY, USA), with a minimum of 0.3 μg

56 Epigenomics (2016) 8(1) future science group

61 Global measures of peripheral blood-derived DNA methylation in mature B-cell neoplasms Research Article

DNA considered acceptable for methylation analysis. Bisulfite conversion was performed using Zymo Gold single tube kit (EZ DNA Methylation-Gold kit, Zymo Research, CA, USA) according to the manufacturer’s instructions. Postconversion quality control was performed using SYBR green-based quantitative PCR, an in-house assay, designed to determine the success of bisulfite conversion by comparing amplification of the test sample with positive and negative controls.

DNA methylation assay

Samples were processed in batches of 96 samples per plate (8 Infinium HumanMethylation450 BeadChips per batch). In order to minimize potential batch effects, matched cases and controls were processed together and run on the same BeadChip and cancer subtypes were evenly distributed across the plates/chips. Two technical replicates and two controls (multiple myeloma cell line U266) were included on each plate. The positions for cases and controls were randomly ordered for every BeadChip to reduce any possible position effects within the chips. The Infinium HumanMethylation450 BeadChip analysis was performed according to manufacturer’s instructions. A total of 200 ng of bisulfite converted DNA was whole genome amplified and hybridized onto the BeadChips. The TECAN automated liquid handler (Tecan Group Ltd, Männedorf, Switzerland) was used for the single-base extension and staining BeadChip steps. Bisulfite conversion and the Infinium 450k methylation assay were performed in accordance with the manufacturer’s instructions.

Data processing

Methylation data were imported into R statistical software [22] and processed using the ‘minfi’ Bioconductor package [23]. Background correction and between-sample normalization were performed using the ‘preprocessIllumina’ function of minfi which is equivalent to standard normalization methods used in GenomeStudio software package provided by Illumina. Subset-quantile within array (SWAN) was performed for type I and II probe bias correction [24]. We excluded 65 CpG sites corresponding to known single nucleotide polymorphisms. Samples were excluded if >5% CpG probes (excluding chrX and chrY probes) had a detection p-value higher than 0.01, which were regarded as probes with ‘missing value’, while CpG sites were excluded from further analysis if they had missing values for one or more samples. ComBat normalization was applied to minimize chip and batch effects [25]. Illumina normalization to control for position effect, SWAN correction to correct for type I and II probe bias and ComBat normalization to correct for chip and batch effect have been previously used as a pipeline for analyzing such datasets [26]. Following methylation analysis, using the Infinium HumanMethylation450 BeadChip, only one pair was rejected on the basis of returning >5% of probes with detection p-value > 0.01. A total of 19,834 probes was identified as hybridizing to multiple genomic locations [27], and analysis was performed before and after exclusion in order to avoid unexpected effect of these cross-hybridizing probes on global methylation.

Statistical analysis

The M-value is defined as log2(Meth/Unmeth), where Meth and Unmeth are the intensities of the methylated and unmethylated probes, respectively. M-values and β-values were calculated using minfi [23]. Statistical analyses were performed using M-values due to the possibility of heteroscedasticity encountered when using β-values [28]. We calculated genome-wide global DNA methylation using all 444,551 probes and global DNA methylation in specific regions of the genome. Using the annotation file provided by Illumina, CpG sites were classified according to their location in CpG islands, CpG shores or shelves or other (‘open sea’ locations). Promoter regions were defined as loci 1500 bp upstream of the transcription start site, within enhancer-associated regions and within the 5′UTR. Promoter regions were further divided according to their CpG content and ratio, as differential CpG content within promoter regions is known to influence methylation profile and gene expression [29,30]. High CpG promoters (HCP), intermediate CpG promoters (ICP) and low CpG promoters (LCP), as described by Weber et al. [29], were analyzed using a published annotation file [31].

Stata Version 10 (StataCorp, TX, USA) was used for statistical analysis. For each sample, the median M-value across all probes was defined as the global measure of DNA methylation and grouped into tertiles based on the distribution in controls. Hypomethylation was defined as the lowest tertile of M-values and hypermethylation defined as the highest tertile of M-values. Conditional logistic regression was used to estimate odds ratios (OR) in relation to tertiles of global DNA methylation, with the middle tertile as the reference category. Variables adjusted for in conditional logistic regression were: gender, age at enrollment, ethnicity and DNA source. Associations were also assessed by including the median M-value as a continuous explanatory variable. p-values from the likelihood ratio test were reported. The likelihood ratio test was also used to test for subgroup heterogeneity.

The majority of blood in our study was collected and stored as whole peripheral blood in which the

future science group www.futuremedicine.com 57 62


proportions of leukocytes were not known. Due to the described effect of differential leukocyte cell content on DNA methylation profile [32] we sought to control for leukocyte heterogeneity. We therefore applied the algorithm of Houseman et al. [32] in which the cell composition of peripheral blood samples is estimated based on distinctive methylation profiles. This was performed in minfi. A paired t-test was performed comparing leukocyte proportions (B lymphocytes, T lymphocytes, natural killer (NK) cells, monocytes, granulocytes) in cases and controls. Leukocytes with significantly different proportions in cases and controls were factored into conditional logistic regression to determine whether adjustment for cell composition affected the OR.

The effect of the time interval between baseline and MBCN diagnosis on the ORs was analyzed by group-

Table 1. Characteristics of study sample.

Demographics | Controls (n = 438) | Cases (n = 438) | p-value
Age at enrollment (years): | | |
– Median | 59 | 59 | †
– SD | 8 | 8 |
– Range | 40–70 | 40–70 |
DNA source: | | |
– Dried blood spot | 316 | 316 | †
– Mononuclear cell | 117 | 117 | †
– Buffy coat | 5 | 5 | †
Age at diagnosis (years): | | |
– Median | 69 | 69 |
– SD | 9 | 9 |
– Range | 42–87 | 42–87 |
Tumor subtype: | | |
– MM | | 109 |
– Follicular lymphoma | | 81 |
– Low-grade NHL/CLL | | 136 |
– High-grade NHL | | 112 |
Ethnicity: | | |
– Anglo-Celtic | 340 | 340 | †
– Southern European | 98 | 98 |
BMI‡: | | |
– BMI ≥30 | 87 | 87 | 0.58
Smoking status‡: | | |
– Never smoked | 246 | 241 | 0.53
– Former smoker | 151 | 152 |
– Current smoker | 37 | 43 |
Daily folate intake‡: | | |
– <320 μg/day | 216 | 226 | 0.11
Daily alcohol intake‡: | | |
– 0 g/day | 160 | 158 | 0.54
– 1–39 g/day | 215 | 228 |
– 40–59 g/day | 40 | 37 |
– ≥60 g/day | 23 | 15 |

†Variables matched for cases–controls.
‡Measured at baseline.
CLL: Chronic lymphocytic leukemia; MM: Multiple myeloma; NHL: Non-Hodgkin’s lymphoma; SD: Standard deviation.


Table 2. Comparison of estimated leukocyte proportions in cases and controls.

Leukocyte breakdown | Mean, cases | Mean, controls | Standard error, cases | Standard error, controls | Difference in means | 95% CI | p-value
All samples (n = 876) | | | | | | |
T lymphocytes | 0.286 | 0.287 | 0.005 | 0.005 | -0.001 | (-0.013 to 0.011) | 0.863
NK cells | 0.102 | 0.957 | 0.004 | 0.005 | 0.007 | (-0.002 to 0.015) | 0.127
B lymphocytes | 0.107 | 0.081 | 0.005 | 0.002 | 0.026 | (0.016 to 0.035) | <0.001
Monocytes | 0.079 | 0.078 | 0.002 | 0.002 | 0.001 | (-0.005 to 0.007) | 0.786
Granulocytes | 0.478 | 0.5 | 0.008 | 0.007 | -0.022 | (-0.042 to -0.002) | 0.029
Dried blood spot samples (n = 632) | | | | | | |
T lymphocytes | 0.266 | 0.277 | 0.004 | 0.004 | -0.01 | (-0.021 to 0.001) | 0.051
NK cells | 0.076 | 0.07 | 0.002 | 0.002 | 0.006 | (-0.001 to 0.012) | 0.067
B lymphocytes | 0.103 | 0.083 | 0.005 | 0.002 | 0.02 | (0.009 to 0.030) | <0.001
Monocytes | 0.073 | 0.072 | 0.001 | 0.001 | 0.001 | (-0.002 to 0.005) | 0.41
Granulocytes | 0.524 | 0.54 | 0.006 | 0.005 | -0.016 | (-0.031 to -0.001) | 0.033
Mononuclear cell samples | | | | | | |
T lymphocytes | 0.348 | 0.323 | 0.013 | 0.012 | 0.025 | (-0.007 to 0.057) | 0.128
NK cells | 0.178 | 0.169 | 0.009 | 0.01 | 0.01 | (-0.014 to 0.034) | 0.413
B lymphocytes | 0.116 | 0.08 | 0.011 | 0.004 | 0.037 | (0.015 to 0.059) | 0.001
Monocytes | 0.097 | 0.097 | 0.007 | 0.007 | <0.001 | (-0.020 to 0.020) | 1
Granulocytes | 0.346 | 0.371 | 0.022 | 0.02 | -0.05 | (-0.106 to 0.006) | 0.08

Direct evaluation of leukocyte content was not available, therefore leukocyte proportions were estimated by applying an algorithm based on the distinctive methylation profiles of each leukocyte. Leukocyte proportions were compared in cases and controls for all sample types as well as for whole blood samples stored as dried blood spots and purified mononuclear cell samples.
Leukocyte proportions estimated according to methylation profile [32].
NK: Natural killer.

ing the interval into <5, 5–9 and >9 years. For controls, the time interval was calculated as the time between DNA collection and the date at which the matched case was diagnosed with MBCN.

DNA methylation within several target genes known to be hypermethylated in MBCN was analyzed (HOXA9, CDH1, CDH13, ADAMTS18 and PCDH10) [33–36]. A conditional logistic regression model was used to identify CpG probes differentially methylated in cases compared with controls. We assigned a threshold of significance for differential methylation as p < 10⁻⁸, in keeping with the cut-off used in genome-wide association studies [37]. This is a relatively strict threshold comparable to other multiple testing procedures such as the Bonferroni procedure and false discovery rate. Given the exploratory nature of this prospective study of methylation, we also adopted an inclusive approach and considered probes with p < 10⁻⁵.

Results

During follow-up (median 10.6 years; range 2.4 months to 20 years), 471 MBCN cases were identified. Out of these, 24 had no suitable controls. Pairs were excluded if either the case or control had inadequate sample after DNA extraction (n = 7) or high detection p-values after data processing (n = 1), leaving 438 matched pairs for analysis.

MBCN cases were grouped into four major subtypes: MM (n = 111), follicular NHL (n = 81), high-grade NHL (n = 111), comprising diffuse large B-cell lymphoma (n = 110) and Burkitt lymphoma (n = 1) and low-grade NHL (n = 136), comprising CLL and small lymphocytic lymphoma (n = 86), splenic marginal zone lymphoma (n = 6), mantle cell lymphoma (n = 16), marginal zone lymphoma (n = 15), Waldenstrom’s macroglobulinemia (n = 9) and hairy cell leukemia (n = 4). Characteristics of participants and tumors are outlined in Table 1.

The proportions of T lymphocytes, NK cells, B lymphocytes, monocytes and granulocytes in blood samples were estimated (Table 2, complete data in Supplementary Table). There were significant differences in the proportions of estimated B lymphocytes and granulocytes in cases compared with controls. Further analyses therefore included an adjustment for the


Table 3. Odds ratios for mature B-cell neoplasms in relation to hypomethylation and hypermethylation.†

CpG location | Global methylation level | OR (95% CI)‡, without cell content adjustment | Likelihood-ratio test p-value, without cell content adjustment | OR (95% CI)‡, with cell content adjustment | Likelihood-ratio test p-value, with cell content adjustment
Genome-wide | Hypomethylated | 2.27 (1.59–3.25) | <0.001 | 2.01 (1.39–2.93) | <0.001
 | Hypermethylated | 0.99 (0.68–1.44) | | 0.88 (0.60–1.30) |
Regulatory region: | | | | |
– Nonpromoter regions | Hypomethylated | 1.66 (1.18–2.34) | <0.001 | 1.55 (1.07–2.25) | <0.001
 | Hypermethylated | 0.67 (0.46–1.00) | | 0.61 (0.41–0.92) |
– Promoter regions | Hypomethylated | 1.00 (0.71–1.40) | 0.01 | 0.92 (0.64–1.33) | <0.001
 | Hypermethylated | 1.54 (1.10–2.13) | | 1.48 (1.04–2.11) |
CpG density: | | | | |
– High CpG density | Hypomethylated | 1.26 (0.90–1.78) | 0.004 | 1.29 (0.90–1.85) | <0.001
 | Hypermethylated | 1.76 (1.25–2.48) | | 1.74 (1.21–2.48) |
– Intermediate CpG density | Hypomethylated | 1.01 (0.72–1.40) | 0.70 | 0.95 (0.65–1.39) | <0.001
 | Hypermethylated | 1.14 (0.81–1.61) | | 1.06 (0.72–1.57) |
– Low CpG density | Hypomethylated | 1.89 (1.34–2.65) | <0.001 | 1.79 (1.23–2.58) | <0.001
 | Hypermethylated | 0.89 (0.61–1.29) | | 0.81 (0.55–1.20) |
CpG distribution: | | | | |
– CpG island | Hypomethylated | 1.11 (0.78–1.59) | <0.001 | 1.07 (0.74–1.54) | <0.001
 | Hypermethylated | 1.89 (1.35–2.65) | | 1.82 (1.27–2.60) |
– CpG shore or shelf | Hypomethylated | 2.03 (1.45–2.84) | <0.001 | 1.89 (1.33–2.70) | <0.001
 | Hypermethylated | 1.00 (0.68–1.47) | | 0.93 (0.63–1.38) |
– Neither island, shore or shelf | Hypomethylated | 1.78 (1.26–2.52) | <0.001 | 1.65 (1.31–2.41) | <0.001
 | Hypermethylated | 0.78 (0.54–1.13) | | 0.69 (0.47–1.01) |

The association between methylation level and mature B-cell malignancies is reported for all CpG probes within the Infinium® HumanMethylation450 BeadChip (genome-wide) as well as for different regulatory regions and CpG density regions.
†OR calculated from conditional logistic regression, accounting for age, gender, ethnicity and DNA source.
‡OR are calculated relative to the middle tertile of methylation in the controls.
OR: Odds ratio.
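The hypomethylated and hypermethylated categories in Table 3 are tertiles of each sample's median M-value, with cut-points taken from the controls. A minimal sketch of that grouping follows, using the M-value definition given in the Statistical analysis section. The function names are hypothetical, the quantile interpolation is Python's default rather than Stata's, and the handling of values that fall exactly on a cut-point is an assumption.

```python
import math
import statistics


def m_value(meth, unmeth):
    """M-value as defined in the text: log2(Meth / Unmeth)."""
    return math.log2(meth / unmeth)


def sample_global_m(intensities):
    """Median M-value over an iterable of (meth, unmeth) intensity pairs."""
    return statistics.median(m_value(m, u) for m, u in intensities)


def tertile_cutpoints(control_values):
    """Lower and upper tertile cut-points from the control distribution."""
    lower, upper = statistics.quantiles(control_values, n=3)
    return lower, upper


def classify(value, cutpoints):
    """Label a global M-value relative to the control tertiles."""
    lower, upper = cutpoints
    if value <= lower:          # boundary handling is an assumption
        return "hypomethylated"
    if value > upper:
        return "hypermethylated"
    return "reference"


if __name__ == "__main__":
    controls = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
    cuts = tertile_cutpoints(controls)
    for value in (-0.25, 0.1, 0.45):
        print(value, classify(value, cuts))
```

Deriving the cut-points from controls only means that cases are classified against the distribution expected in the absence of disease, and the middle tertile serves as the reference category for the odds ratios.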

B lymphocyte and granulocyte content of the sample in order to account for the potential systematic effect of different cell composition on DNA methylation.

Overall, genome-wide hypomethylation, defined as the lowest tertile of methylation values across all CpG sites analyzed, was associated with increased risk of MBCN (p = 7.20 × 10⁻⁶; OR: 2.27 [95% CI: 1.59–3.25]) whereas genome-wide hypermethylation showed no association (OR: 0.99 [95% CI: 0.68–1.44]) (Table 3). Repetitive elements were analyzed, including SINE, LINE and long terminal repeat regions. Within regions containing repetitive elements, hypomethylation was associated with increased risk (OR: 1.84 [95% CI: 1.31–2.59]; p < 0.001) and the association persisted after correction for cell content (OR: 1.74 [95% CI: 1.20–2.52]; p = 0.03).

Methylation in promoter & nonpromoter regions

Within nonpromoter regions, hypomethylation was associated with increased risk of developing MBCN (p = 3.5 × 10⁻³; OR: 1.66 [95% CI: 1.18–2.34]). The OR was similar after correction for B cell and granulocyte content. Within promoter regions as a whole, higher global methylation was associated with increased risk of MBCN (OR: 1.54 [95% CI: 1.10–2.13]), and for the subgroup of high CpG promoter regions (HCPs), hypermethylation was associated with increased risk of MBCN (OR: 1.76 [95% CI: 1.25–2.48]). In contrast, for low CpG promoter regions (LCPs), lower global methylation was associated with increased risk of MBCN (OR: 1.89 [95% CI: 1.34–2.65]).
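As a rough consistency check on the odds ratios above, the standard error of the log odds ratio can be recovered from a reported 95% confidence interval and turned into an approximate Wald p-value. This is a sketch under the assumption of a symmetric interval on the log scale. The article reports likelihood ratio test p-values, so small differences from the values in the text are expected. The function name is illustrative.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE(log OR), z statistic and two-sided p-value
    from an odds ratio and its 95% confidence limits."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail probability
    return se, z, p


if __name__ == "__main__":
    # Nonpromoter hypomethylation and genome-wide hypomethylation, from the text
    for label, estimate in [("nonpromoter", (1.66, 1.18, 2.34)),
                            ("genome-wide", (2.27, 1.59, 3.25))]:
        se, z, p = wald_from_ci(*estimate)
        print(f"{label}: SE = {se:.3f}, z = {z:.2f}, p = {p:.2g}")
```

For the nonpromoter estimate this gives a p-value of roughly 0.004, and for the genome-wide estimate roughly 7 × 10⁻⁶, both of the same order as the p-values reported in the text.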


Methylation in CpG islands, shores & shelves

Within CpG islands, where CpG content is highest, hypermethylation was associated with increased risk of developing MBCN (OR: 1.89 [95% CI: 1.35–2.65]). In contrast, within CpG shores/shelves, hypomethylation was associated with the risk of MBCN (p = 3.3 × 10⁻⁵, OR: 2.03 [95% CI: 1.45–2.84]). Similarly, in regions that are neither CpG islands, shores nor shelves, the ‘open sea’ regions, hypomethylation was associated with increased risk of MBCN (OR: 1.78 [95% CI: 1.24–2.52]).

Tumor subgroups

There was no significant heterogeneity in associations between the different tumor subgroups, with MM, follicular NHL, high-grade NHL and low-grade NHL all demonstrating similar patterns of global methylation associated with risk (p = 0.36).

In order to account for the known effect of some lifestyle and environmental factors on DNA methylation, a separate analysis adjusting for BMI, dietary folate intake and smoking status was performed and did not affect the magnitude of association or the p-value significance levels. Analysis after adjustment for differences in B-cell and granulocyte content in blood samples was performed and made no material difference to the ORs.

Time from blood collection to diagnosis

The time from baseline blood sampling to diagnosis with MBCN was less than 5 years for 83 (20%), 5–9 years for 114 (26%) and ≥10 years for 239 (54%) cases. There was no evidence that the association between global methylation and MBCN risk was stronger when blood was sampled closer to the time of diagnosis (phet = 0.19) (Table 4).

Methylation in promoter regions of tumor suppressor genes

We analyzed DNA methylation in CpG probes associated with HOXA9, CDH1, CDH13, ADAMTS18 and PCDH10, genes known to exhibit promoter methylation in MBCN tumor tissue. Within the HOXA9 gene, the Infinium HumanMethylation450 BeadChip maps 28 CpG sites; four probes demonstrated significant hypermethylation in cases compared with controls (p < 10⁻⁸) with three probes lying in the exon region and one probe in the 5′UTR (Table 5). There are eight further hypermethylated probes within the HOXA9 gene reaching a lower level of statistical significance (p < 10⁻⁵) with four probes in the promoter region. In PCDH10, five of 24 probes within 1500 bp of the transcription start site were hypermethylated (p < 10⁻⁵, Table 5). In CDH1, two of 21 probes within the gene body were hypermethylated (p < 10⁻⁵, Table 5) and one probe was hypermethylated (p < 10⁻⁵) in the 3′UTR. In CDH13, six of 64 probes within the gene body were hypomethylated (p < 10⁻⁵) (Table 5).

Discussion

Using a case–control study nested within a prospective cohort study, and blood collected years prior to diagnosis with MBCN, we were able to demonstrate for the first time an association between genome-wide global DNA hypomethylation and the risk of developing MBCN (OR: 2.27 for lowest tertile vs middle tertile of global methylation). Further, we were able to characterize distinct patterns of global DNA methylation according to the functional status and genomic location of CpG sites. Within promoter regions containing high CpG density (HCP), hypermethylation was associated with increased risk of MBCN. HCP regions contain a high proportion of gene promoters within CpG islands and in normal tissue are overwhelmingly protected from methylation [6,29]. The loss of constitutive protection from DNA methylation within HCP promoter regions has been described in MM, follicular NHL and DLBCL tissues [11,38]. Its role in the development of MBCN has been uncertain to date, but our finding of promoter hypermethylation in pre-diagnostic blood samples suggests it is an early rather than late event.

Hypermethylation of specific gene promoters in MBCN, such as HOXA gene family in B-NHL, tumor suppressor genes CDH1, CDH13 and ADAMTS18 in DLBCL and mantle cell lymphoma, and PCDH10 in NHL and MM, is described [33–36], suggesting that HCP hypermethylation, perhaps through epigenetic silencing of tumor suppressor genes, is important in MBCN pathogenesis. We confirmed that within the HOXA9 gene, there was one CpG probe within the promoter region hypermethylated in cases compared with controls with a genome-wide level of significance (p < 10⁻⁸). On relaxing the criteria for significance to p < 10⁻⁵, there were four additional probes hypermethylated within the promoter region. Within the PCDH10 gene, we confirmed there were four hypermethylated CpG probes within the promoter region (p < 10⁻⁵). It is intriguing that we have been able to detect promoter hypermethylation in genes implicated in MBCN pathogenesis in blood samples collected many years before diagnosis with MBCN. While studies of tumor tissue commonly nominate an arbitrary cut-off for differential methylation of β(case–control) > 0.2, in our analysis, we did not apply a specific cut-off for differential methylation (some studies in tumor tissue nominate a methylation difference of β[case–


control] > 0.2) as the strength of our study is compari- son of a large number of matched cases and controls and rigorous control for bias by study design consisting of matched pairs and analytical methods (sample plat- ing methods and use of normalization methods to cor- rect for possible bias). The magnitude of methylation differences in peripheral blood collected years prior

to tumor diagnosis is untested in the current literature, and we felt that the methylation thresholds from tumor tissue studies could exclude significant results. We note that the findings of promoter hypermethylation in HOXA9 and PCDH10 were apparent after relaxation of our criteria for genome-wide significance therefore confirmation of our findings in a similar large prospective sample is ideal.

It is increasingly recognized that regions outside CpG islands, particularly CpG shores and shelves, contain higher proportions of differentially methylated regions within tumor tissue [39,40]. We found a strong association between global DNA hypomethylation in CpG shores and shelves that was not present in CpG islands, supporting their relevance to MBCN risk.

A major strength of our study is its prospective design, with DNA collected many years before MBCN diagnosis and therefore suggesting that the aberrant methylation patterns detected are an early event. While it is possible that peripheral blood samples from cases could contain circulating tumor cells, the latency between baseline blood collection and diagnosis did not affect the risk of MBCN. Factors known to alter DNA methylation (age, gender, year of birth and ethnicity) were taken into account while lifestyle factors with the potential to affect methylation (dietary folate intake, obesity and smoking status) did not affect risk. The interpretation of DNA methylation from peripheral blood samples requires caution due to the well-described influence of white blood cell composition on methylation [41]. While there are several published algorithms which attempt to account for differential cell composition of peripheral blood [42,43], the optimal approach remains uncertain. We used a validated algorithm to estimate B cell and granulocyte content of blood samples [32] and found there was no association between estimated cell content and measures of global methylation in our cohort. The choice of tissue in DNA methylation studies is highly likely to influence results due to tissue-specific methylation profiles. In MBCN where the cell of origin is the B lymphocyte, peripheral blood is both likely to be reflective of early changes in B lymphocytes and readily obtainable in the context of a large prospective study. The use of whole blood as opposed to purified B lymphocytes allows the potential discovery of novel germline epimutations and the potential revelation of epimutations associated with

Table 4. Odds ratios for methylation and mature B-cell neoplasm risk by 5-year time intervals between baseline blood collection and diagnosis.
Time between baseline and diagnosis | OR (95% CI) by global methylation level | Likelihood ratio test p-value
<5 years (n = 83) | 2.06 (0.97–4.35); 0.41 (0.15–1.10) | 0.002
5–9 years (n = 90) | 1.43 (0.65–3.15); 0.70 (0.32–1.54) | 0.12
>9 years (n = 263) | 2.25 (1.42–3.58); 1.30 (0.79–2.14) | 0.001
Global methylation levels: hypermethylation and hypomethylation. Odds ratio calculated from conditional logistic regression accounting for age, gender, ethnicity and DNA source.

62 Epigenomics (2016) 8(1) future science group 67 Global measures of peripheral blood-derived DNA methylation in mature B-cell neoplasms Research Article

Table 5. Differential methylation in genes associated with mature B-cell neoplasms and promoter hypermethylation.
CpG probe | CHR | MAPINFO | Probes with SNPs | UCSC gene group | p-value | β(case) | β(control) | β(case–control)
HOXA9
cg27009703 | 7 | 27204894 | | 1stExon | 2.44 × 10⁻⁹ | 0.171 | 0.134 | 0.037
cg27009703 | 7 | 27204894 | | 1stExon | 2.44 × 10⁻⁹ | 0.171 | 0.134 | 0.037
cg07778029 | 7 | 27205114 | | 1stExon; 5′UTR | 1.47 × 10⁻⁸ | 0.063 | 0.047 | 0.016
cg26521404 | 7 | 27204981 | rs11975265 | 1stExon | 7.44 × 10⁻⁸ | 0.096 | 0.071 | 0.025
cg26476852 | 7 | 27204728 | | 1stExon | 7.40 × 10⁻⁷ | 0.182 | 0.162 | 0.020
cg01285501 | 7 | 27206461 | rs6461991 | TSS1500 | 2.29 × 10⁻⁶ | 0.254 | 0.236 | 0.018
cg01354473 | 7 | 27204803 | | 1stExon | 3.52 × 10⁻⁶ | 0.295 | 0.278 | 0.016
cg26365299 | 7 | 27205381 | | TSS1500 | 4.69 × 10⁻⁶ | 0.028 | 0.022 | 0.006
cg02643054 | 7 | 27206544 | | TSS1500 | 4.82 × 10⁻⁶ | 0.094 | 0.082 | 0.011
cg01381846 | 7 | 27204785 | | 1stExon | 5.16 × 10⁻⁶ | 0.234 | 0.212 | 0.022
cg20399871 | 7 | 27204663 | | 1stExon | 2.15 × 10⁻⁵ | 0.136 | 0.119 | 0.017
cg15506609 | 7 | 27206073 | rs77936575 | TSS1500 | 2.73 × 10⁻⁵ | 0.242 | 0.215 | 0.027
PCDH10
cg07665387 | 4 | 134069388 | | TSS1500 | 3.37 × 10⁻⁶ | 0.222 | 0.199 | 0.023
cg01408654 | 4 | 134070437 | | TSS200 | 3.31 × 10⁻⁵ | 0.089 | 0.078 | 0.011
cg14146100 | 4 | 134069236 | | TSS1500 | 3.55 × 10⁻⁵ | 0.064 | 0.053 | 0.010
cg02043159 | 4 | 134070235 | | TSS1500 | 4.25 × 10⁻⁵ | 0.171 | 0.161 | 0.010
CDH1
cg20716119 | 16 | 68771763 | | Body | 1.62 × 10⁻⁵ | 0.217 | 0.204 | 0.013
cg01857829 | 16 | 68771498 | | Body | 2.83 × 10⁻⁵ | 0.169 | 0.151 | 0.018
cg06875305 | 16 | 68869013 | | 3′UTR | 8.93 × 10⁻⁵ | 0.817 | 0.839 | -0.022
CDH13
cg03011928 | 16 | 83023240 | | Body | 4.65 × 10⁻⁷ | 0.694 | 0.716 | -0.022
cg02495250 | 16 | 82735595 | | Body | 4.27 × 10⁻⁶ | 0.899 | 0.912 | -0.013
cg01973778 | 16 | 83280511 | | Body | 4.30 × 10⁻⁶ | 0.726 | 0.750 | -0.024
cg02168291 | 16 | 82671520 | | Body | 3.98 × 10⁻⁵ | 0.783 | 0.801 | -0.019
cg09765297 | 16 | 83029415 | rs61696909 | Body | 4.82 × 10⁻⁵ | 0.893 | 0.905 | -0.012
cg08856946 | 16 | 82661421 | | Body | 6.09 × 10⁻⁵ | 0.101 | 0.091 | 0.011
cg01093138 | 16 | 83374884 | | Body | 7.10 × 10⁻⁵ | 0.638 | 0.664 | -0.026

immune regulation such as T-cell dysregulation and aberrant antigen presentation as well as changes in the microenvironment such as changes to nurse-like T cells.

The functional significance of global methylation changes in peripheral blood remains an open question. Our detection of aberrant global methylation may reflect that it is an early event in MBCN pathogenesis; alternatively, the aberrant methylation could reflect co-incident events in the development of MBCN such as inflammatory or immune responses. The current understanding of the role of methylation in immune regulation is limited. There appears to be an association between hypomethylation of a number of genes and gene regions related to T-cell antigen presentation in autoimmune diseases such as primary Sjögren's syndrome and Behçet's disease [44,45]. In turn, autoimmune disease is associated with an increased risk of MBCN, but a role for methylation in the relationship between autoimmune disease and MBCN has not been described.

Our findings suggest that differential global methylation is detectable in DNA extracted from peripheral blood samples collected years prior to diagnosis and is strongly associated with risk of subsequent MBCN. It will be important to validate our findings in other large prospective cohorts. While this study has identified a potential marker for use in the prediction of MBCN


risk, further analyses should be conducted to detect differentially methylated regions and specific loci, particularly within HCP regions. This may identify aberrant methylation in specific regions of functional importance and refine our understanding of the role of DNA methylation in MBCN initiation and development.

Supplementary data

To view the supplementary data that accompany this paper, please visit the journal website at: www.futuremedicine.com/doi/full/.../epi...

Financial & competing interests disclosure

MCCS cohort recruitment was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants ..., ... and ... and by infrastructure provided by Cancer Council Victoria. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

Executive summary

Background
• Global hypomethylation and promoter hypermethylation are recognized characteristics of tumor tissue in both solid cancer and hematological malignancies.
• To date, there have been no prospective studies examining peripheral blood methylation as a risk factor for the development of mature B-cell malignancies (MBCN).

Methods
• Participants were healthy adult volunteers who had peripheral blood samples taken at baseline enrollment into the study.
• New cases of MBCN were identified by linkage to cancer registries and subsequently matched to controls by age, gender, ethnicity and DNA source.
• DNA methylation was measured using the Infinium® HumanMethylation450 BeadChip and three tertiles of methylation were used as measures of global methylation.

Results
• After a median 10.6-year follow-up, 471 cases of MBCN were diagnosed, with 438 eligible for analysis.
• Across the entire epigenome, hypomethylation was strongly associated with an increased risk of developing MBCN.
• In contrast, within promoter regions, hypermethylation was associated with MBCN risk, particularly within promoter regions with high CpG content (OR: 1.54 [95% CI: 1.10–2.13]).
• Global hypomethylation and promoter hypermethylation were demonstrated in all tumor subgroups, with no significant tumor subgroup heterogeneity.
• Methylation status in the promoter regions of candidate genes HOXA9, CDH1, CDH13, ADAMTS18 and PCDH10 was analyzed. Hypermethylation of CpG probes in the promoter regions of HOXA9 and PCDH10 was associated with MBCN.
• In order to account for the known effect of leucocyte composition on methylation, we repeated the analysis after adjusting for B lymphocyte and granulocyte content and found no difference in our results.
• Further analyses were performed adjusting for potential environmental factors such as dietary folate intake, obesity and smoking with no effect on results.
• The latency between blood sampling and MBCN diagnosis did not affect results, with similar associations found for cases sampled >10 years, 5–10 years and <5 years prior to diagnosis.

Discussion
• Our findings of global hypomethylation and promoter hypermethylation in peripheral blood are concordant with the described methylation changes that occur in MBCN tumor tissue.
• This is the first time aberrant methylation patterns have been described in the peripheral blood of a prospective cohort in MBCN. Utilizing a prospective cohort with long latency between blood sampling and diagnosis (median 10.6 years) reduces the likelihood that methylation patterns are due to circulating tumor cells, a particular issue in hematological malignancies.
• The stability of our methylation findings regardless of long or short latency suggests the methylation patterns occur early in the development of MBCN.
• A further strength of our study was that we were able to correct for common confounders of methylation such as age, gender, ethnicity and lifestyle factors by the use of matched controls.

Summary
• Differential global methylation is detectable in DNA extracted from peripheral blood samples collected years prior to diagnosis, suggesting that global methylation is an early and perhaps driver event in MBCN pathogenesis.


References

Papers of special note have been highlighted as: • of interest; •• of considerable interest

1 Swerdlow S, Swerdlow E, Harris N et al. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. IARC, Lyon, France (2008).
2 SEER Cancer Statistics Review, 1975–2012, National Cancer Institute. http://seer.Cancer.Gov/csr/1975_2012/
3 Yuille MR, Matutes E, Marossy A, Hilditch B, Catovsky D, Houlston RS. Familial chronic lymphocytic leukaemia: a survey and review of published studies. Br. J. Haematol. 109(4), 794–799 (2000).
4 Goldin LR, Pfeiffer RM, Li X, Hemminki K. Familial risk of lymphoproliferative tumors in families of patients with chronic lymphocytic leukemia: results from the Swedish family-cancer database. Blood 104(6), 1850–1854 (2004).
5 Landgren O, Kristinsson SY, Goldin LR et al. Risk of plasma cell and lymphoproliferative disorders among 14621 first-degree relatives of 4458 patients with monoclonal gammopathy of undetermined significance in Sweden. Blood 114(4), 791–795 (2009).
6 Herman JG, Baylin SB. Gene silencing in cancer in association with promoter hypermethylation. N. Engl. J. Med. 349(21), 2042–2054 (2003).
7 Martín-Subero JI, Kreuz M, Bibikova M et al. New insights into the biology and origin of mature aggressive B-cell lymphomas by combined epigenomic, genomic, and transcriptional profiling. Blood 113(11), 2488–2497 (2009).
8 Walker BA, Wardell CP, Chiecchio L et al. Aberrant global methylation patterns affect the molecular pathogenesis and prognosis of multiple myeloma. Blood 117(2), 553–562 (2011).
•• Methylation profile of multiple myeloma samples at early, mid and late stages of tumor progression, demonstrating aberrant methylation profiles occur in the early precursor stage of myeloma.
9 Kanduri M, Cahill N, Goransson H et al. Differential genome-wide array-based methylation profiles in prognostic subsets of chronic lymphocytic leukemia. Blood 115(2), 296–305 (2010).
10 Salhia B, Baker A, Ahmann G, Auclair D, Fonseca R, Carpten J. DNA methylation analysis determines the high frequency of genic hypomethylation and low frequency of hypermethylation events in plasma cell tumors. Cancer Res. 70(17), 6934–6944 (2010).
11 Martin-Subero JI, Ammerpohl O, Bibikova M et al. A comprehensive microarray-based DNA methylation study of 367 hematological neoplasms. PLoS ONE 4(9), e6986 (2009).
12 Chen RZ, Pettersson U, Beard C, Jackson-Grusby L, Jaenisch R. DNA hypomethylation leads to elevated mutation rates. Nature 395(6697), 89–93 (1998).
13 Berdasco M, Esteller M. Aberrant epigenetic landscape in cancer: how cellular identity goes awry. Dev. Cell 19(5), 698–711 (2010).
14 Park G, Kang SH, Lee JH et al. Concurrent p16 methylation pattern as an adverse prognostic factor in multiple myeloma: a methylation-specific polymerase chain reaction study using two different primer sets. Ann. Hematol. 90(1), 73–79 (2011).
15 Braggio E, Maiolino A, Gouveia ME et al. Methylation status of nine tumor suppressor genes in multiple myeloma. Int. J. Hematol. 91(1), 87–96 (2010).
16 Kocemba KA, Groen RW, Van Andel H et al. Transcriptional silencing of the Wnt-antagonist DKK1 by promoter methylation is associated with enhanced Wnt signaling in advanced multiple myeloma. PLoS ONE 7(2), e30359 (2012).
17 Dawson S-J, Tsui DWY, Murtaza M et al. Analysis of circulating tumor DNA to monitor metastatic breast cancer. N. Engl. J. Med. 368(13), 1199–1209 (2013).
18 Woo HD, Kim J. Global DNA hypomethylation in peripheral blood leukocytes as a biomarker for cancer risk: a meta-analysis. PLoS ONE 7(4), e34615 (2012).
19 Giachelia M, Bozzoli V, D’Alo F et al. Quantification of DAPK1 promoter methylation in bone marrow and peripheral blood as a follicular lymphoma biomarker. J. Mol. Diagn. 16(4), 467–476 (2014).
• First publication describing aberrant methylation in peripheral blood of patients with newly diagnosed lymphoma, suggesting peripheral blood methylation could be used as a biomarker.
20 Joo JE, Wong EM, Baglietto L et al. The use of DNA from archival dried blood spots with the Infinium HumanMethylation450 array. BMC Biotechnol. 13, 23 (2013).
21 Giles GG, English DR. The Melbourne Collaborative Cohort Study. IARC Sci. Publ. 156, 69–70 (2002).
22 R Core Team, R Foundation for Statistical Computing, 2015. www.r-project.org
23 Aryee MJ, Jaffe AE, Corrada-Bravo H et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30(10), 1363–1369 (2014).
24 Maksimovic J, Gordon L, Oshlack A. SWAN: subset-quantile within array normalization for Illumina Infinium HumanMethylation450 BeadChips. Genome Biol. 13(6), R44 (2012).
25 Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8(1), 118–127 (2007).
26 Marabita F, Almgren M, Lindholm ME et al. An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics 8(3), 333–346 (2013).
27 Naeem H, Wong NC, Chatterton Z et al. Reducing the risk of false discovery enabling identification of biologically significant genome-wide methylation status using the HumanMethylation450 array. BMC Genomics 15, 51 (2014).
28 Du P, Zhang X, Huang CC et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11, 587 (2010).
29 Weber M, Hellmann I, Stadler MB et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat. Genet. 39(4), 457–466 (2007).
30 Martin-Subero JI, Kreuz M, Bibikova M et al. New insights into the biology and origin of mature aggressive B-cell lymphomas by combined epigenomic, genomic, and transcriptional profiling. Blood 113(11), 2488–2497 (2009).
31 Price ME, Cotton AM, Lam LL et al. Additional annotation enhances potential for biologically-relevant analysis of the Illumina Infinium HumanMethylation450 BeadChip array. Epigenetics Chromatin 6(1), 4 (2013).
32 Houseman EA, Accomando WP, Koestler DC et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13(1), 86 (2012).
• The authors propose a model allowing estimation of leucocyte composition in peripheral blood through identification of distinct methylation profiles.
33 Choi JH, Li Y, Guo J et al. Genome-wide DNA methylation maps in follicular lymphoma cells determined by methylation-enriched bisulfite sequencing. PLoS ONE 5(9), e13020 (2010).
34 Narayan G, Xie D, Freddy AJ et al. PCDH10 promoter hypermethylation is frequent in most histologic subtypes of mature lymphoid malignancies and occurs early in lymphomagenesis. Genes Chromosomes Cancer 52(11), 1030–1041 (2013).
35 Halldorsdottir AM, Kanduri M, Marincevic M et al. Mantle cell lymphoma displays a homogenous methylation profile: a comparative analysis with chronic lymphocytic leukemia. Am. J. Hematol. 87(4), 361–367 (2012).
36 Handa H, Tahara K, Shimizu H et al. Chromosome 16q located genes CDH1, CDH13 and ADAMTS18 are correlated and frequently methylated but not associated with DNMTs levels in human lymphoma. Blood 122, 4289–4289 (2013).
37 Dudbridge F, Gusnanto A. Estimation of significance thresholds for genomewide association scans. Genet. Epidemiol. 32(3), 227–234 (2008).
38 O’Riain C, O’Shea DM, Yang Y et al. Array-based DNA methylation profiling in follicular lymphoma. Leukemia 23(10), 1858–1866 (2009).
39 Cahill N, Bergh AC, Kanduri M et al. 450k-array analysis of chronic lymphocytic leukemia cells reveals global DNA methylation to be relatively stable over time and similar in resting and proliferative compartments. Leukemia 27(1), 150–158 (2013).
40 Irizarry RA, Ladd-Acosta C, Wen B et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat. Genet. 41(2), 178–186 (2009).
41 Reinius LE, Acevedo N, Joerink M et al. Differential DNA methylation in purified human blood cells: implications for cell lineage and studies on disease susceptibility. PLoS ONE 7(7), e41361 (2012).
42 Koestler DC, Christensen B, Karagas MR et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: a validation analysis. Epigenetics 8(8), 816–826 (2013).
43 Jaffe AE, Irizarry RA. Accounting for cellular heterogeneity is critical in epigenome-wide association studies. Genome Biol. 15(2), R31 (2014).
•• Emphasizes the impact of cellular heterogeneity on methylation analysis.
44 Hughes T, Ture-Ozdemir F, Alibaz-Oner F, Coit P, Direskeneli H, Sawalha AH. Epigenome-wide scan identifies a treatment-responsive pattern of altered DNA methylation among cytoskeletal remodeling genes in monocytes and CD4+ T cells in Behcet’s disease. Arthritis Rheumatol. 66(6), 1648–1658 (2014).
45 Altorok N, Coit P, Hughes T et al. Genome-wide DNA methylation patterns in naive CD4+ T cells from patients with primary Sjögren’s syndrome. Arthritis Rheumatol. 66(3), 731–739 (2014).


5.2 Differentially methylated positions

Background

One of the most commonly reported measures of differential methylation is a comparison of methylation between case and control samples at a single CpG site. A differentially methylated position (DMP) is a CpG site at which methylation differs significantly between cases and controls. Differential methylation within promoter-associated regions and within CpG islands has historically been the major research focus due to the putative link between CpG island methylation and downstream gene silencing (223). The other well-described pattern of aberrant methylation in cancer is that of widespread hypomethylation. Loss of methylation from CpG island-associated CpG sites linked to a gene could potentially lead to uninhibited gene transcription (224).

Analysis

Analysis was performed using M-values of DNA methylation, defined as the log2 ratio of the methylated to unmethylated probe intensities at each CpG site. M-values were calculated using the minfi Bioconductor package (220). Using the annotation file provided by Illumina (version 1.2), CpG sites were classified according to their location in CpG islands, CpG shores, CpG shelves or other (‘open sea’) locations. CpG sites were classified as promoter-associated if they were annotated as promoter-associated or enhancer-associated, or were located within 1500 bp upstream of the transcription start site or within the 5ʹUTR.
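The relationship between the β scale (proportion methylated) and the M-value scale can be sketched as follows. This is a minimal illustration that assumes the idealised relation M = log2(β / (1 − β)); it is not the minfi implementation, and the function names and the clipping constant eps are hypothetical choices made for the example.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Convert a beta value (proportion methylated, 0..1) to an M-value."""
    # Clip away from 0 and 1 so the log ratio stays finite
    # (eps is an illustrative choice, not a value from the thesis).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform: M-value back to a beta value."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

A β of 0.5 maps to M = 0, and values near 0 or 1 are stretched on the M scale, which is one reason M-values are often preferred for regression-based testing.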

Conditional logistic regression was applied to compute odds ratios (OR), 95% confidence intervals (95% CI), and p-values for the association between methylation levels at each CpG site and the incidence of MBCN. Compared with unconditional logistic regression, conditional logistic regression allows the matched variables in case-control pairs to be taken into account and is preferred when there are a moderate number of matching variables. In this setting, unconditional logistic regression can over-estimate the odds ratio of association or widen the confidence interval of the odds ratio, leading to reduced precision (225, 226).
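For 1:1 matched pairs, the conditional likelihood depends only on the within-pair difference in the covariate, which is how the matching is retained in the analysis. The sketch below fits a single-covariate model by Newton-Raphson. It is an illustration under simplifying assumptions (one covariate, no adjustment variables, Wald 95% CI), not the Stata routine used in the thesis, and the function name is hypothetical.

```python
import math

def fit_matched_pairs(diffs, max_iter=50, tol=1e-10):
    """Fit a one-covariate conditional logistic model for 1:1 matched pairs.

    diffs: case value minus matched-control value for each pair.
    Returns (odds_ratio, ci_low, ci_high) per unit increase in the covariate.
    """
    beta = 0.0
    info = 0.0
    for _ in range(max_iter):
        # Conditional probability that the observed case is the case,
        # given the pair: 1 / (1 + exp(-beta * d)).
        probs = [1.0 / (1.0 + math.exp(-beta * d)) for d in diffs]
        score = sum(d * (1.0 - p) for d, p in zip(diffs, probs))
        info = sum(d * d * p * (1.0 - p) for d, p in zip(diffs, probs))
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    se = 1.0 / math.sqrt(info)
    return (math.exp(beta),
            math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se))
```

Pairs with identical case and control values contribute nothing to the fit, and if every difference has the same sign the estimate does not converge; a full implementation would guard against both situations.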

Differentially methylated positions (DMPs) were defined as CpG sites with a statistically significant difference in DNA methylation between cases and controls. Bonferroni adjustment was applied for multiple testing across 416,669 sites; thus DMPs were designated as those with p < 1.2 × 10⁻⁷ (227). Bonferroni correction is a stringent, conservative method for multiple testing correction and controls the family-wise error rate (i.e., the probability of making at least one type I error in the entire body of tests). Other corrections for multiple testing were explored, including the false discovery rate (228) corrected p value (q) at different exploratory values of q (q<0.05, q<0.005, q<0.0005) and a global methylation cut-off suggested for assays analyzing 450,000 CpG probes, set at p < 1 × 10⁻⁷ (168).
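The two kinds of threshold can be sketched as below. The example assumes a family-wise α of 0.05 (which reproduces the stated cut-off of 1.2 × 10⁻⁷ for 416,669 tests) and assumes the false discovery rate was controlled with the Benjamini-Hochberg step-up rule; both are assumptions of this illustration, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cut-off that controls the family-wise error rate."""
    return alpha / n_tests

def benjamini_hochberg_count(p_values, q):
    """Number of tests declared significant by the Benjamini-Hochberg rule."""
    m = len(p_values)
    n_significant = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest rank whose p-value passes q * rank / m.
        if p <= q * rank / m:
            n_significant = rank
    return n_significant

# 0.05 / 416,669 is approximately 1.2e-7, the Bonferroni cut-off quoted above.
cutoff = bonferroni_threshold(0.05, 416669)
```

Because the Benjamini-Hochberg cut-off grows with the number of small p-values observed, it generally admits more sites than Bonferroni at the same nominal level.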

It has become common practice when comparing methylation in normal and abnormal tissue using a quantitative assay such as the Infinium 450K to set a threshold of difference in absolute β methylation ≥0.2 between cases and controls as one of the criteria to define a DMP (229). This threshold evolved from the technical validation study of the Infinium 450K assay by Bibikova et al., in which the authors reported the assay’s ability to detect a difference in β of 0.2 with 99% confidence (191). The validation study involved comparison of DNA methylation in a tumour sample with normal tissue, a situation in which large differences of methylation are expected. In the tumour tissue studies that followed, a moderate number of cases (50-200) is generally compared with one or two control samples containing normal cells (229, 230). In comparison, in studies of DNA methylation in peripheral blood, much smaller differences in methylation are anticipated. The analysis herein follows the approach used in other studies of DNA methylation in whole blood (169, 180). A threshold of absolute methylation difference to define a DMP was not applied, and the potential identification of false positive results is balanced by use of more stringent p values as described above.

The reliability of methylation measures at each CpG site was assessed by computing an intraclass correlation coefficient (ICC) based on 22 replicate pairs, following published methodology (231). The ICC calculation incorporates explained variability due to plate and chip effects. An ICC ≥ 0.8 is considered to indicate excellent reliability. Principal component analysis was performed to evaluate systematic effects such as batch effects.
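As a rough illustration of what an ICC measures, the sketch below computes a one-way random-effects ICC from replicate pairs at a single CpG site. This is a simplified stand-in: the published methodology cited above also models plate and chip effects, which this example does not, and the function name is hypothetical.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC for duplicate measurements.

    pairs: list of (replicate_1, replicate_2) values, one tuple per sample.
    """
    n = len(pairs)
    k = 2  # two replicates per sample
    grand_mean = sum(a + b for a, b in pairs) / (n * k)
    sample_means = [(a + b) / 2.0 for a, b in pairs]
    # Between-sample and within-sample mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand_mean) ** 2 for m in sample_means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, sample_means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
```

Values close to 1 indicate that almost all variation is between samples rather than between replicates of the same sample.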

DAVID Bioinformatics Resources 6.8 was used for functional classification (232, 233) by identifying the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and using the functional annotation clustering tool (clusters with Benjamini-Hochberg p value <0.05 were considered significant). Stata Version 10 (StataCorp, College Station, TX) was used for conditional logistic regression. Chromosome mapping visualization was performed using Phenogram (234).

Adjustment for differences in white blood cell content

Differences in white blood cell content were controlled in two stages. First, the study design required all case-control pairs to be matched according to the method of blood purification and storage (whether stored as whole dried blood spots, Ficoll-separated mononuclear cells, or buffy coat). The conditional logistic regression methodology ensured retention of pairwise matching. Second, although white blood cell content was not directly measured at time of blood collection, an estimation was performed using a validated algorithm by Houseman et al., which computes white cell proportions based on the highly specific methylation patterns of different white blood cell populations (207). Estimated granulocyte, B cell, CD4+ T cell, CD8+ T cell, NK cell and monocyte proportions were included as variables in a separate conditional logistic regression model in order to correct for systematic effects of cell content on methylation.

Tumour subtypes and time between blood collection and MBCN diagnosis

Conditional logistic regression was performed for samples in each of the four tumour subtypes. The same was done for samples divided into four groups by time between blood sampling and MBCN diagnosis (<5, 5-9, 10-14 and ≥15 years). Interaction between methylation levels and length of follow-up was tested using likelihood-ratio tests, in order to examine whether the associations with MBCN were stronger when blood was collected closer to the time of diagnosis. The tumour subtype and time lag analyses were fully corrected for white blood cell content by using the Houseman algorithm described above.
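A likelihood-ratio test of interaction compares the maximised log-likelihood of a model with interaction terms against the nested model without them. The sketch below shows the arithmetic only; the log-likelihood values in the usage line are hypothetical, the degrees of freedom depend on how the interaction is parameterised, and the function names are illustrative rather than taken from any package.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: closed-form finite sum.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: complementary error function plus a finite sum.
    total = math.erfc(math.sqrt(half))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (j - 0.5) / math.gamma(j + 0.5)
    return total

def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the LR statistic and its p-value for two nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods; three extra interaction parameters.
stat, p_value = likelihood_ratio_test(-100.0, -96.0, 3)
```

A small p-value would suggest that the association between methylation and MBCN differs across the follow-up intervals.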

Adjustment for lifestyle factors

A conditional logistic regression model was additionally adjusted for lifestyle factors that might act as confounders for methylation. The lifestyle factors included in the adjusted analysis were: smoking status (never smoked, former smoker, current smoker), alcohol consumption (0 g/day, 1-39 g/day, 40-59 g/day and ≥60 g/day), body mass index (BMI<30, BMI≥30) and folate intake (<320 µg/day and ≥320 µg/day). The adjusted regression model was fitted separately and compared with the unadjusted model.

Summary of analysis plan for conditional logistic regression:

1. Conditional logistic regression for all MBCN
2. Conditional logistic regression for subgroups identified by DNA source (Guthrie card source and mononuclear cell source only)
3. Conditional logistic regression for subgroups identified by tumour subtype (Low grade, high grade NHL, follicular NHL, multiple myeloma)
4. Conditional logistic regression after adjustment for white blood cell content (based on Houseman algorithm)
   a. For all MBCN
   b. For subgroups identified by the four tumour subtypes
5. Conditional logistic regression after adjustment for lifestyle factors

Results

Following conditional logistic regression analysis, the Bonferroni cut-off p < 1.2 × 10⁻⁷ was applied as the threshold for significant differential methylation. Overall, excluding CpG sites associated with SNPs, there were 1,338 differentially methylated CpG positions (DMPs). The complete list of DMPs is presented in Appendix Table 3. Compared with other commonly used statistical methods such as the false discovery rate, the Bonferroni cut-off is one of the most stringent thresholds for differential methylation, as shown in Figure 3.

Figure 3: Comparison of different methods of correction for multiple testing demonstrating the stringency of the Bonferroni method. FDR=false discovery rate. Number of CpG probes meeting each threshold: FDR q<0.05, 71,951; FDR q<0.005, 12,009; FDR q<0.00005, 1,738; Bonferroni (416,669 probes, p < 1.2 × 10⁻⁷), 1,338; global methylation cut-off for 450K probes (p < 1.0 × 10⁻⁷), 1,126.

The intraclass correlation coefficient (ICC), a measure of technical reliability of the assay at individual CpG sites, was computed for 485,349 CpG sites in 22 duplicate samples. For all 485,389 evaluable CpG sites the median ICC = 0.60 (IQR 0.24-0.87), which is a moderate score for assay reliability. For the 1,338 DMPs of interest, the median ICC = 0.95 (IQR 0.93-0.98), with 1284/1338 DMPs (96%) achieving ICC ≥ 0.8, considered excellent technical reliability, i.e. no significant intra-individual variation.

The mean methylation difference between all cases and controls at each DMP is depicted in Figure 4. The overall mean methylation difference at each DMP was small, with absolute values ranging from 0.01 to 0.047 (1.0–4.7%).

Figure 4: Mean methylation difference in all DMPs (x-axis: chromosome; y-axis: methylation difference, β(case–control)). Each dot represents one CpG site. The majority of CpGs are less methylated in cases.

Despite the small differences in mean methylation, substantially larger methylation differences were observed for some individual case-control pairs, with some cases demonstrating methylation differences of >0.1 compared with their matched control. DMP cg04771285 is chosen as an example as it illustrates the variation in differential methylation in different sample pairs (Figure 5). While the mean methylation difference was -0.03 (on average 3% less methylation in cases compared with controls), there were eight sample pairs in which differential methylation was <-0.2. The presence of at least a 0.2 difference in methylation meets the criterion for differential methylation applied in studies of methylation in tumour tissue (184). This same threshold is not generally applied in the description of peripheral blood methylation studies where smaller differences are reported (180).
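The distinction between a small mean difference and large differences in individual pairs can be made concrete with a short sketch. The β values used in the usage example are invented for illustration and are not data from the study; the function name and the 0.2 default are likewise assumptions of the example.

```python
def summarize_pair_differences(case_betas, control_betas, threshold=0.2):
    """Summarise within-pair methylation differences at one CpG site.

    Returns (mean difference, pairs with loss >= threshold,
    pairs with gain >= threshold), using beta(case) - beta(control).
    """
    diffs = [case - control for case, control in zip(case_betas, control_betas)]
    mean_diff = sum(diffs) / len(diffs)
    n_large_loss = sum(1 for d in diffs if d <= -threshold)
    n_large_gain = sum(1 for d in diffs if d >= threshold)
    return mean_diff, n_large_loss, n_large_gain

# Hypothetical beta values for four matched pairs.
cases = [0.50, 0.30, 0.62, 0.55]
controls = [0.52, 0.60, 0.60, 0.56]
mean_diff, n_loss, n_gain = summarize_pair_differences(cases, controls)
```

In this toy example one pair carries most of the signal, so the mean difference is modest even though that pair exceeds the tumour-tissue criterion.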


Figure 5: Differential methylation in DMP cg04771285, demonstrating marked methylation difference in some case-control pairs. Each dot represents the methylation difference of a single case-control pair.

DMPs with increased methylation

Methylation was higher in cases than controls for 90 of 1,338 DMPs. Of these 90 DMPs with increased methylation, 53 (59%) were within a promoter-associated region, while 67 (74%) were located in CpG islands. Thirty-nine of the 90 DMPs were promoter-associated and in CpG islands or shores. The classification of DMPs into promoter-associated and CpG island locations is of interest because a pattern of increased DNA methylation in promoter-associated CpG islands has been described consistently for all MBCN tumours (86, 97, 120, 131). In many of these settings, hypermethylation of promoter-associated CpG islands is associated with gene silencing. For DMPs within CpG islands there was a mean gain of methylation of 0.02 while for DMPs located in non-island regions there was a mean loss of methylation (Figure 6).

Of the 90 DMPs with increased methylation, 65 were associated with an annotated gene, but ontology analysis revealed no significant KEGG pathways (233). The functional annotation clustering tool in DAVID was used, which groups genes based on functional similarity. The clustering algorithm is based on a gene-to-gene similarity matrix constructed from 14 functional annotation sources. A high level of clustering stringency was applied, with the following specifications: Similarity Term Overlap = 3 (a minimum of three overlapping annotation terms per gene is required to qualify) and Similarity Threshold = 0.85 (where the minimum kappa value considered to represent a biologically significant cluster is 0.3) (232). There was a significant cluster associated with the homeodomain (enrichment score 9.17, p=2.3×10⁻¹¹). The associated genes in this cluster were: GSX1, HMX3, LHX2, NKX3-2, DLX1, EMX2, GBX2, HOXA9, HOXC11, HOXD11, ONECUT2.
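To illustrate the kappa statistic on which this kind of gene-to-gene similarity rests, the following minimal sketch computes Cohen's kappa between pairs of genes from binary annotation-term vectors. It is not DAVID's implementation; the gene labels and term memberships are invented for illustration, and only the 0.85 threshold is taken from the text.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length binary (0/1) annotation vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of terms on which both genes agree (both 1 or both 0).
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    # Agreement expected by chance from each gene's marginal annotation rate.
    rate_a = sum(a) / n
    rate_b = sum(b) / n
    expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)
    if expected == 1.0:
        return 1.0  # degenerate case: both vectors are constant and identical
    return (observed - expected) / (1 - expected)


# Hypothetical membership of three genes in ten annotation terms (1 = annotated).
gene_a = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gene_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
gene_c = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

SIMILARITY_THRESHOLD = 0.85  # stringency setting quoted in the text

kappa_ab = cohen_kappa(gene_a, gene_b)
kappa_ac = cohen_kappa(gene_a, gene_c)
pair_ab_similar = kappa_ab >= SIMILARITY_THRESHOLD
```

In this toy example genes A and B share three of their annotation terms and give a kappa of roughly 0.78, which would still fall below a 0.85 threshold; this shows how a high stringency setting restricts clusters to genes with nearly identical annotation profiles.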


| | CpG island | Open sea | CpG shelf | CpG shore |
|---|---|---|---|---|
| Mean methylation difference | 0.02 | -0.02* | -0.02* | -0.02* |
| (95%CI) | (-0.03, 0.04) | (-0.03, 0.02) | (-0.03, -0.02) | (-0.03, -0.10) |

Figure 6: Mean methylation difference of DMPs by CpG island location. DMPs within CpG islands demonstrate a gain of methylation in cases compared with controls; DMPs in CpG shore, shelf and open sea locations demonstrate a loss of methylation. * = p<0.001 (ANOVA). Each dot represents one DMP.

DMPs with reduced methylation

Methylation levels were lower in MBCN cases for the remaining 1,248 of 1,338 DMPs. In contrast to the DMPs with a gain of methylation, only 20 of 1,248 (18%) DMPs with loss of methylation were in CpG islands and 735 (59%) were promoter-associated. This pattern, in which the majority of DMPs exhibit loss of methylation, is of interest given that findings in MBCN tumours show widespread DNA hypomethylation (42, 45, 138, 235, 236).

Promoter-associated DMPs with reduced methylation were associated with the following known proto-oncogenes: RUNX1, RARA, MAS1L, GAS7, AKT2, FAM83A, PBX1, FLT3, EPS15, BCAS4, TAL2, AKAP13, PAX7, CARS, ROBO1, ZAK.

Of the 1,248 DMPs with loss of methylation, 831 were associated with an annotated gene. Gene ontology of the 831 genes revealed 16 significant KEGG pathways. Of particular interest were pathways associated with MAPK signaling and chemokine signaling.

Table 18: Significant KEGG pathways for genes containing one or more CpG sites with loss of methylation

| KEGG pathway | No. DMPs represented | p value | Genes |
|---|---|---|---|
| Aldosterone synthesis and secretion | 12 | 3.2×10⁻⁴ | |
| Tight junction | 16 | 3.2×10⁻³ | |
| Cholinergic synapse | 14 | 4.2×10⁻⁴ | |
| Arrhythmogenic right ventricular cardiomyopathy | 11 | 4.4×10⁻⁴ | |
| MAPK signaling pathway | 22 | 1.1×10⁻³ | ZAK, CACNA1I, MKNK2, CACNG5, MAP4K1, FGF1, FGF23, CACNG2, CACNA2D2, CACNA2D4, AKT1, ARRB1, RPS6KA2, RASGRP4, CACNA1G, CACNA1C, AKT2, NFATC1, CACNA1B, FLNB, IKBKB, RRAS2 |
| Retrograde endocannabinoid signaling | 12 | 2.1×10⁻³ | |
| Morphine addiction | 11 | 3.1×10⁻³ | |
| Dopaminergic synapse | 13 | 4.7×10⁻³ | |
| Glutamatergic synapse | 12 | 5.3×10⁻³ | |
| Acute myeloid leukaemia | 8 | 1.3×10⁻² | JUP, AKT1, FLT3, RUNX1T1, RARA, TCF7L1, AKT2, IKBKB |
| Hypertrophic cardiomyopathy | 9 | 1.2×10⁻² | |
| Adrenergic signaling in cardiomyocytes | 13 | 1.3×10⁻² | |
| Chemokine signaling pathway | 15 | 1.6×10⁻² | AKT1, AKT2, CCL22, CCL5, CCL8, CCR6, CXCR1, CX3CL1, GNB3, ADCY2, ARRB1, IKBKB, PRKCZ |
| Focal adhesion | 16 | 1.7×10⁻² | |
| Dilated cardiomyopathy | 9 | 1.8×10⁻² | |
| GABAergic synapse | 9 | 1.9×10⁻² | |

DMPs corresponding to candidate MBCN genes

A comparison was made between the 1,338 DMPs and the 201 genes known to contain recurrent mutations in MBCN, obtained from the literature review presented in Chapter 2.

Twelve DMPs were associated with a candidate MBCN gene. Three CpG sites within a CpG island demonstrated increased methylation and corresponded to the homeobox gene HOXA9 (mutated in MM). The remainder of the DMPs demonstrated loss of methylation. There was one DMP in each of the following genes known to be mutated in CLL (one of the low grade histological types): cg16046833 within a promoter-associated region in the gene body of SNX7, cg21614211 within a promoter-associated region in the gene body of VILL and cg10044629 in the gene body of MUC2. There was one DMP in each of the following genes known to be mutated in myeloma: cg24512093 in a promoter-associated region in the gene body of ROBO1, cg24995347 in a promoter-associated region of SP140 and cg12664938 in the gene body of LRRK2. There was one DMP in each of the following genes known to be mutated in marginal zone lymphoma, a type of low grade NHL: cg11283404 in a promoter-associated region in the gene body of IKBKB, relevant to the MAPK signaling pathway, and cg18567954 associated with DTX1. There was one DMP in a promoter-associated region of TNFRSF13C (also known as B-cell activating factor receptor, BAFF-R), a gene known to be mutated in B cell non-Hodgkin lymphomas and CLL (237, 238).

Gene mutations, if present due to low levels of circulating tumour cells or tumour DNA, could lead to methylation differences attributable to genetic rather than epigenetic change. Therefore, an attempt was made to determine whether the observed methylation difference was more marked within the tumour type in which the mutation has been reported. The mean methylation difference at the DMP of interest was compared across each of the four tumour subtypes using a one-way analysis of variance. If there was a significant difference in methylation difference by subtype, a Tukey post-hoc analysis was performed to determine which tumour subtypes were responsible for the methylation difference.
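As a worked illustration of the one-way ANOVA step, the sketch below computes the F statistic and its degrees of freedom from per-pair methylation differences grouped by tumour subtype. The values are invented and the function name is hypothetical; it is intended only to show how the statistic is formed, not to reproduce the analysis software used in the study. With four subtypes and 438 case-control pairs, the degrees of freedom are (4 - 1, 438 - 4) = (3, 434).

```python
def one_way_anova(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual observations around their own group mean.
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented per-pair methylation differences (case minus control) for one CpG site.
groups = [
    [-0.05, -0.04, -0.06],  # low grade
    [-0.02, -0.03, -0.01],  # follicular
    [0.00, -0.01, 0.01],    # myeloma
    [-0.01, -0.02, 0.00],   # high grade NHL
]

f_stat, df_between, df_within = one_way_anova(groups)

# Degrees of freedom implied by four subtypes and 438 matched pairs.
study_df = (4 - 1, 438 - 4)
```

A large F statistic indicates that the subtype means differ by more than within-subtype scatter would explain; the Tukey post-hoc comparison then identifies which pairs of subtypes account for the difference.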

For instance, SNX7 is mutated in CLL (belonging to the low grade tumour subtype), and there was no statistically significant difference in differential methylation between tumour groups by one-way ANOVA (p=0.45). There was no evidence that the differential methylation in SNX7 was stronger in CLL samples, and thus no evidence that the methylation difference was due to CLL-specific mutations present in the blood samples. However, in 10 CpGs associated with eight genes there was a significant difference in methylation between the tumour groups by one-way ANOVA: HOXA9 (cg27009703 F(3,434)=4.318, p=0.005; cg07778029 F(3,434)=6.834, p<0.001; cg26521404 F(3,434)=3.747, p=0.011), VILL (F(3,434)=4.079, p=0.007), ROBO1 (F(3,434)=3.751, p=0.011), SP140 (F(3,434)=2.81, p=0.039), LRRK2 (F(3,434)=6.414, p<0.001), IKBKB (F(3,434)=6.91, p<0.001), DTX1 (F(3,434)=5.227, p=0.001) and TNFRSF13C/BAFF-R (F(3,434)=9.257, p<0.001).

The Tukey post-hoc tests, presented in Table 19 below, show that the differences between tumour groups are largely due to a significant difference in the low grade group and, to a lesser degree, in the follicular lymphoma group. The genes VILL, IKBKB and DTX1, in which the methylation difference appears to be strongest in the low grade group, are known to be recurrently mutated in low grade NHL, raising the possibility that the methylation findings in these genes may be due to a lymphoma-related genetic mutation rather than a primary methylation abnormality.

In HOXA9, ROBO1, SP140, LRRK2 and TNFRSF13C/BAFF-R, aberrant methylation was no stronger in the corresponding tumour group known to exhibit recurrent mutations of these genes. It remains plausible, therefore, that the methylation abnormalities detected in these genes are primary rather than secondary to the presence of mutations in the samples.

Table 19: ANOVA comparisons of methylation difference in the four tumour subtypes

| CpG site | Group | Mean delta β | Tukey's HSD: Low grade | Tukey's HSD: Follicular | Tukey's HSD: Myeloma |
|---|---|---|---|---|---|
| cg27009703 (HOXA9, mutated in myeloma) | Low grade | 0.057 | | | |
| | Follicular | 0.052 | 0.985 | | |
| | Myeloma | 0.008 | 0.005 | 0.048 | |
| | High grade NHL | 0.03 | 0.244 | 0.565 | 0.486 |
| cg07778029 (HOXA9) | Low grade | 0.028 | | | |
| | Follicular | 0.023 | 0.891 | | |
| | Myeloma | 0.002 | <0.001 | 0.021 | |
| | High grade NHL | 0.009 | 0.014 | 0.207 | 0.734 |
| cg26521404 (HOXA9) | Low grade | 0.043 | | | |
| | Follicular | 0.029 | 0.671 | | |
| | Myeloma | 0.008 | 0.01 | 0.082 | |
| | High grade NHL | 0.017 | 0.082 | 0.751 | 0.883 |
| cg21614211 (VILL, mutated in low grade NHL) | Low grade | -0.03 | | | |
| | Follicular | -0.009 | 0.041 | | |
| | Myeloma | -0.001 | 0.01 | 0.996 | |
| | High grade NHL | -0.015 | 0.172 | 0.871 | 0.711 |
| cg24512093 (ROBO1, mutated in myeloma) | Low grade | -0.04 | | | |
| | Follicular | -0.044 | 0.991 | | |
| | Myeloma | -0.02 | 0.351 | 0.313 | |
| | High grade NHL | -0.007 | 0.026 | 0.032 | 0.694 |
| cg24995347 (SP140, mutated in myeloma) | Low grade | -0.039 | | | |
| | Follicular | -0.02 | 0.296 | | |
| | Myeloma | -0.021 | 0.28 | 1 | |
| | High grade NHL | -0.011 | 0.028 | 0.876 | 0.785 |
| cg12664938 (LRRK2, mutated in myeloma) | Low grade | -0.044 | | | |
| | Follicular | -0.014 | 0.016 | | |
| | Myeloma | -0.001 | 0.001 | 0.977 | |
| | High grade NHL | -0.012 | 0.003 | 0.996 | 1 |
| cg11283404 (IKBKB, mutated in low grade NHL) | Low grade | -0.042 | | | |
| | Follicular | -0.025 | 0.378 | | |
| | Myeloma | -0.017 | 0.038 | 0.848 | |
| | High grade NHL | -0.0002 | <0.001 | 0.085 | 0.339 |
| cg18567954 (DTX1, mutated in low grade NHL) | Low grade | -0.055 | | | |
| | Follicular | -0.022 | 0.082 | | |
| | Myeloma | -0.016 | 0.01 | 0.967 | |
| | High grade NHL | -0.011 | 0.003 | 0.865 | 0.987 |
| cg00253346 (TNFRSF13C/BAFF-R, mutated in B-NHL and CLL) | Low grade | -0.044 | | | |
| | Follicular | -0.02 | 0.043 | | |
| | Myeloma | -0.001 | <0.001 | 0.495 | |
| | High grade NHL | -0.001 | <0.001 | 0.506 | 1 |

To further evaluate whether differential methylation was likely to be secondary to genetic mutation in a candidate MBCN gene, the effect of latency (the time lag between blood sampling and MBCN diagnosis) on methylation was evaluated. The mean methylation difference for each of the DMPs corresponding to a candidate gene was compared across the four time lag groups (separated into those with blood collected <5 years, 5-9 years, 10-15 years and >15 years prior to diagnosis with MBCN).

The DMPs associated with HOXA9, MUC2 and SP140 did not show any significantly greater differential methylation for the shortest time lag group compared with the others. For seven DMPs the mean methylation difference was significantly different between the time lag groups by one-way ANOVA test: LRRK2, SNX7, ROBO1, VILL, TNFRSF13C/BAFF-R, IKBKB and DTX1. In all instances, the difference was due to larger methylation differences in those with <5 years lag compared with 5-9 years lag, and in all but one instance it was also due to larger methylation differences in those with <5 years lag compared with those with 10-15 years lag.

Only in LRRK2, TNFRSF13C/BAFF-R and IKBKB were significant differences also seen between the shortest lag <5 years and the longest lag >15 years. The results of the post-hoc tests are presented in Table 20 below.

Table 20: ANOVA comparison of methylation difference according to time between blood collection & diagnosis

| CpG site | Group | Mean delta β | Tukey HSD: <5 years | Tukey HSD: 5-9 years | Tukey HSD: 10-15 years |
|---|---|---|---|---|---|
| cg16046833 (SNX7) | <5y | -0.051 | | | |
| | 5-9y | -0.029 | 0.008 | | |
| | 10-15y | -0.028 | 0.215 | 0.394 | |
| | >15y | -0.010 | 0.388 | 0.435 | 1.000 |
| cg12664938 (LRRK2) | <5y | -0.050 | | | |
| | 5-9y | -0.018 | 0.009 | | |
| | 10-15y | -0.011 | <0.001 | 0.842 | |
| | >15y | -0.018 | 0.017 | 1.000 | 0.877 |
| cg24995347 (SP140) | <5y | -0.047 | | | |
| | 5-9y | -0.024 | 0.016 | | |
| | 10-15y | -0.020 | 0.007 | 0.858 | |
| | >15y | -0.012 | 0.249 | 0.745 | 0.986 |
| cg24512093 (ROBO1) | <5y | -0.056 | | | |
| | 5-9y | -0.030 | 0.021 | | |
| | 10-15y | -0.017 | 0.013 | 1.000 | |
| | >15y | -0.018 | 0.274 | 0.775 | 0.734 |
| cg21614211 (VILL) | <5y | -0.034 | | | |
| | 5-9y | -0.014 | 0.024 | | |
| | 10-15y | -0.011 | 0.014 | 0.989 | |
| | >15y | -0.012 | 0.086 | 1.000 | 0.982 |
| cg00253346 (TNFRSF13C/BAFF-R) | <5y | -0.043 | | | |
| | 5-9y | -0.016 | 0.021 | | |
| | 10-15y | -0.015 | 0.013 | 1.000 | |
| | >15y | -0.015 | 0.005 | 1.000 | 0.999 |
| cg11283404 (IKBKB) | <5y | -0.053 | | | |
| | 5-9y | -0.022 | 0.004 | | |
| | 10-15y | -0.007 | <0.001 | 0.638 | |
| | >15y | -0.018 | 0.031 | 0.973 | 0.420 |
| cg18567954 (DTX1) | <5y | -0.057 | | | |
| | 5-9y | -0.032 | 0.009 | | |
| | 10-15y | -0.022 | 0.054 | 0.457 | |
| | >15y | -0.012 | 0.380 | 0.821 | 0.875 |

Comparing DMPs with genes reported to show aberrant methylation in MBCN

DMPs were also compared with a list of 172 genes known to demonstrate aberrant methylation in MBCN, obtained from the literature review presented in Chapter 2. In published studies to date, 142 hypermethylated and 30 hypomethylated genes have been described. Twelve DMPs with a gain of methylation corresponded to genes reported to be hypermethylated in MBCN, summarised in Table 21.

Within the homeobox gene HOXA9, known to be hypermethylated in follicular lymphoma (93, 100) and mantle cell lymphoma (46, 86), three DMPs demonstrated a gain of methylation. The homeobox genes HOXD11 (hypermethylated in CLL (42)) and ONECUT2 (hypermethylated in follicular lymphoma (93) and DLBCL (87)) contained one DMP each with a gain of methylation. FOXD3 and EBF3 (hypermethylated in CLL (42, 98)) each contained one DMP with a gain of methylation. DIO3, reported to be methylated in CLL by Pei et al. (98) but hypomethylated in a sample of mixed haematological malignancies (121), contained three DMPs with a gain of methylation. DAPK1, hypermethylated in follicular lymphoma (93), CLL (102) and DLBCL (90, 92), contained a promoter-associated DMP within 200bp of the transcription start site. SFRP1, hypermethylated in myeloma (95, 144, 150), also contained a promoter-associated DMP within 200bp of the transcription start site.

The one-way ANOVA test was also applied to this set of DMPs to evaluate whether differential methylation was more marked in any particular tumour subtype, and therefore whether the experimental findings were in keeping with those described in the literature to date or whether they were more general findings across all tumour subtypes. For the HOXA9-associated DMP, cg27009703, there was a statistically significant difference in differential methylation between tumour subtypes by one-way ANOVA (F=4.318, p=0.005).

The Tukey post-hoc test revealed that differential methylation was significantly greater for the follicular lymphoma group compared with the myeloma group and significantly greater for the low grade group compared with the myeloma group. For the second HOXA9-associated DMP, cg07778029, there was also a difference in methylation between tumour subtypes (F=6.834, p=0.0002). The Tukey post-hoc test revealed greater differential methylation for the follicular lymphoma group compared with myeloma, and greater differential methylation for the low grade group compared with both high grade NHL and myeloma groups. These results suggest that the differential methylation findings for HOXA9 are strongest for the low grade and follicular lymphoma subtypes, which is concordant with the aberrant methylation findings reported within the tumours themselves. For the remainder of DMPs listed above, there were no significant differences in methylation between the tumour subtypes.

Seven DMPs corresponded to genes with known aberrant methylation, but the direction of methylation was not concordant with the published studies (our findings showed a loss of methylation, whereas previous studies reported hypermethylation). These DMPs are summarised in Table 21 below. Another feature of these DMPs is that the mean methylation levels of controls lie in the intermediate range of the assay (0.25-0.75), the range in which the assay is reported to be less reliable (192). The lack of concordance with the published literature, together with methylation levels in a range prone to error, suggests caution in the interpretation of differential methylation for these DMPs.

Table 21: DMPs corresponding to genes described as showing aberrant methylation in literature review

| Probe ID | Gene | Chm | OR | p value | β (cont.) | Δβ (case-control) | Relation to CpG island | Prom-ass | Meth. agrees^ |
|---|---|---|---|---|---|---|---|---|---|
| cg18279094 | FOXD3 | 1 | 1.53 | 5.07×10⁻⁸ | 0.256 | 0.043 | CpG island | No | ✓ |
| cg02527112 | HOXD11 | 2 | 1.59 | 3.39×10⁻⁸ | 0.078 | 0.016 | Shore | Yes | ✓ |
| cg27009703 | HOXA9 | 7 | 1.58 | 2.44×10⁻⁹ | 0.134 | 0.037 | CpG island | No | ✓ |
| cg26521404 | HOXA9 | 7 | 1.53 | 7.44×10⁻⁸ | 0.071 | 0.025 | CpG island | No | ✓ |
| cg07778029 | HOXA9 | 7 | 1.56 | 1.47×10⁻⁸ | 0.047 | 0.016 | CpG island | Yes | ✓ |
| cg24319902 | SFRP1 | 8 | 1.48 | 9.90×10⁻⁸ | 0.144 | 0.029 | CpG island | Yes | ✓ |
| cg20401521 | DAPK1 | 9 | 1.49 | 6.73×10⁻⁸ | 0.049 | 0.019 | CpG island | Yes | ✓ |
| cg09890775 | EBF3 | 10 | 1.57 | 7.14×10⁻⁸ | 0.335 | 0.021 | CpG island | No | ✓ |
| cg02287710 | DIO3 | 14 | 1.50 | 9.70×10⁻⁸ | 0.476 | 0.031 | CpG island | Yes | ✓ |
| cg18804615 | DIO3 | 14 | 1.52 | 3.98×10⁻⁸ | 0.157 | 0.024 | CpG island | Yes | ✓ |
| cg01664864 | DIO3 | 14 | 1.52 | 1.07×10⁻⁷ | 0.428 | 0.019 | CpG island | Yes | ✓ |
| cg02455094 | ONECUT2 | 18 | 1.59 | 3.81×10⁻⁹ | 0.128 | 0.019 | CpG island | No | ✓ |
| cg20932535 | PAX7 | 1 | 1.52 | 6.53×10⁻⁸ | 0.801 | -0.022 | Open sea | No | ✕ |
| cg22383924 | TP73 | 1 | 1.61 | 4.33×10⁻⁹ | 0.607 | -0.027 | Open sea | Yes | ✕ |
| cg24512093 | ROBO1 | 3 | 1.65 | 2.83×10⁻⁸ | 0.569 | -0.027 | Open sea | Yes | ✕ |
| cg11223735 | MGMT | 10 | 1.62 | 1.57×10⁻⁸ | 0.655 | -0.009 | Open sea | No | ✕ |
| cg26166595 | DKK3 | 11 | 1.54 | 3.45×10⁻⁸ | 0.368 | -0.026 | Open sea | Yes | ✕ |
| cg05583636 | ARHGAP17 | 16 | 1.57 | 1.92×10⁻⁸ | 0.892 | -0.021 | Open sea | Yes | ✕ |
| cg02601893 | BIK | 22 | 1.52 | 8.41×10⁻⁸ | 0.631 | -0.016 | Open sea | Yes | ✕ |

β = β methylation value
OR = odds ratio for the association between the difference in methylation between cases and controls and the risk of MBCN
^ Direction of methylation is concordant with literature

Adjustment for lifestyle factors

After adjustment for smoking, alcohol intake, dietary folate intake and body mass index, 1,339 CpG sites reached statistical significance for differential methylation by Bonferroni criteria. The great majority of the differentially methylated sites identified after adjustment for lifestyle factors (1,267 CpGs) were the same as those identified by the unadjusted analysis.

In order to evaluate any potential change in association between methylation and MBCN before and after adjustment for lifestyle factors, the regression coefficients of the unadjusted and adjusted conditional logistic regression analyses were compared for the 1,338 DMPs. There was very high correlation (r²=0.99) between the regression coefficients of the two analyses, suggesting minimal effect of lifestyle factors on methylation results.

Figure 7: Comparison of models before and after adjustment for lifestyle factors. Comparison of regression coefficients (r²=0.99).

In addition, a likelihood ratio test was performed for each CpG probe to compare the unadjusted model with the model adjusted for lifestyle factors. For all 1,338 DMPs the p value for the likelihood ratio test was ≥0.05, indicating that the addition of the lifestyle factors as variables in the adjusted regression model did not significantly improve upon the unadjusted model for the prediction of MBCN.
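The likelihood ratio comparison can be sketched as follows. The log-likelihood values are invented, the function names are hypothetical, and the sketch assumes that each of the four lifestyle factors enters the adjusted model as a single parameter (four extra degrees of freedom), which may not match the coding used in the study. A closed-form chi-square tail probability for even degrees of freedom keeps the example dependency-free.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square variable; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Compare nested models; returns (test statistic, p value)."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    p_value = chi2_upper_tail_even_df(max(statistic, 0.0), extra_params)
    return statistic, p_value


# Invented log-likelihoods for one CpG: unadjusted model vs lifestyle-adjusted model.
statistic, p_value = likelihood_ratio_test(
    loglik_reduced=-300.0,  # unadjusted model
    loglik_full=-298.5,     # adjusted model (assumed four additional parameters)
    extra_params=4,
)
adjustment_improves_fit = p_value < 0.05
```

A p value of 0.05 or greater, as in this toy example, indicates that the adjusted model does not fit significantly better than the unadjusted one.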

Adjustment for white blood cell content

Differences in white blood cell proportions between blood samples can have significant effects on methylation analysis. The following steps were taken, first to identify the effect of white blood cell content on methylation and then, where necessary, to adjust analyses to take these differences into account.

I. Estimated white blood cell content in samples

The white blood cell content of each sample was not measured directly at the time of collection; white cell proportions were therefore computed using the Houseman algorithm. The algorithm is based on the known distinctive patterns of DNA methylation within different white blood cell types (207). Using the experimental methylation data, the algorithm allows an estimation of the proportions of the major white blood cell types: CD4 T cell, CD8 T cell, NK cell, B cell, granulocyte and monocyte. The majority of samples in our cohort were stored as whole blood on Guthrie cards. The computed white blood cell content for these samples appears to be valid, as the results are within the population reference ranges for healthy adults (239, 240). The samples stored as purified mononuclear cells show, as expected, lower granulocyte proportions compared with the whole blood samples (Figure 8).
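The idea behind reference-based estimation of cell proportions can be sketched with a small example. This is a simplified stand-in, not the Houseman implementation: it fits non-negative mixing proportions by projected gradient descent on a least-squares objective, using an invented reference matrix of methylation values for three cell types at six CpG sites. All names and numbers are hypothetical.

```python
def estimate_proportions(reference, observed, iterations=500):
    """Non-negative least squares by projected gradient descent.

    reference: rows are CpG sites, columns are cell types (reference β values).
    observed:  β values of the mixed sample at the same CpG sites.
    """
    n_types = len(reference[0])
    # Step size 1 / (squared Frobenius norm) is a safe bound for this quadratic objective.
    step = 1.0 / sum(v * v for row in reference for v in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residual = [
            sum(r * w for r, w in zip(row, weights)) - y
            for row, y in zip(reference, observed)
        ]
        gradient = [
            sum(row[j] * res for row, res in zip(reference, residual))
            for j in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        weights = [max(0.0, w - step * g) for w, g in zip(weights, gradient)]
    return weights


# Invented reference profiles; columns: B cell, granulocyte, T cell.
reference_profiles = [
    [0.9, 0.1, 0.1],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.1],
    [0.2, 0.8, 0.2],
    [0.1, 0.1, 0.9],
    [0.1, 0.2, 0.8],
]
true_proportions = [0.1, 0.6, 0.3]

# Simulate a whole-blood sample as a weighted mixture of the reference profiles.
mixture = [sum(r * p for r, p in zip(row, true_proportions)) for row in reference_profiles]

estimated = estimate_proportions(reference_profiles, mixture)
```

In this noise-free toy case the estimated proportions recover the simulated ones; with real data the estimates depend on how well the reference profiles discriminate between cell types.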


Figure 8: Computed white blood cell proportions by DNA source. Proportions of CD8 T cells, CD4 T cells, natural killer (NK) cells, B cells, monocytes and granulocytes within the three sample types: Guthrie card (whole blood dried blood spot), mononuclear cells and buffy coat.

The mean B cell proportion in samples stored as whole dried blood spots (n=632) was 0.11 (range 0-0.73) for cases and 0.08 (range 0-0.25) for controls (p<0.01), with some cases having much higher B cell content than controls (Figure 9). Samples with higher B cell content were more frequently cases in the low grade group than in other tumour groups (Figure 9a). B cell content was slightly greater in cases with the shortest time lag (<5 years) between blood collection and MBCN diagnosis. The mean granulocyte proportion was 0.52 for cases and 0.54 for controls (p<0.01) (Figure 10). In contrast to B cell content, there was no difference in granulocyte content between the four tumour subtypes or according to the time lag between blood collection and diagnosis (Figure 10). There were no significant differences in CD4 T cell, CD8 T cell, NK cell or monocyte proportions between cases and controls.


Figure 9: B cell proportions by tumour subtype and case-control status, and by time lag between blood collection and MBCN diagnosis.

Figure 10: Granulocyte proportions by tumour subtype (granulocyte content by tumour group) and by time lag between blood collection and MBCN diagnosis (granulocyte content by time lag).

II. Association between DNA methylation level in DMPs and white blood cell content

The next step was to evaluate to what extent differences in white blood cell content were associated with methylation values among the 1,338 DMPs.

The association between cell content and CpG methylation was evaluated to discern whether the relationship between white blood cell content and methylation was stronger for some white blood cell types than others. For each of the 1,338 DMPs, a Pearson's correlation coefficient was calculated for the two variables: white blood cell proportion and absolute β methylation level. The correlation coefficients are summarised in Figure 11 below, with separate graphs for each white blood cell type (B cell, granulocyte, CD4 T cell, CD8 T cell, NK cell and monocyte). A correlation coefficient of ≥0.80 or ≤-0.80 was considered strong, and a coefficient of 0.50 to 0.80 or -0.80 to -0.50 was considered moderate.
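The per-DMP calculation can be sketched as below: a Pearson correlation coefficient between cell proportion and β value across samples, followed by classification using the cut-offs stated above. The data are invented, the function names are hypothetical, and treating a coefficient of exactly 0.50 as moderate is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def classify_correlation(r):
    """Apply the strong / moderate cut-offs described in the text."""
    magnitude = abs(r)
    if magnitude >= 0.80:
        return "strong"
    if magnitude >= 0.50:
        return "moderate"
    return "weak or none"


# Invented values for one DMP across five samples.
b_cell_proportion = [0.05, 0.08, 0.10, 0.20, 0.40]
beta_values = [0.80, 0.78, 0.75, 0.60, 0.35]

r = pearson_r(b_cell_proportion, beta_values)
label = classify_correlation(r)
```

Repeating this for every DMP and every cell type gives the distributions of coefficients summarised as histograms.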

Focusing on the relationship with B cell content, 48% of DMPs showed a moderate negative correlation (r<-0.5) and 3% showed a positive correlation (r>0.5) between B cell content and methylation. For granulocyte content, 66% of DMPs showed a moderate positive correlation and 2% showed a moderate negative correlation between granulocyte content and methylation. For CD8 T cells, CD4 T cells and monocytes there was weak or no correlation between cell content and methylation levels.

These findings suggest that it is predominantly B cell and granulocyte content that affect methylation in this cohort.


Figure 11: Correlation between methylation and cell content. Histograms representing the correlation coefficient between cell content and DNA methylation level at each DMP: (1) B cell content, (2) granulocyte content, (3) CD4 T cell content, (4) CD8 T cell content, (5) NK cell content, (6) monocyte content.

The CpG site cg06313775, in the 5'UTR of DSN1, is presented as an example as it exhibited the strongest negative correlation between B cell content and absolute methylation (r=-0.80, 95%CI [-0.82, -0.78]). Samples with the highest B cell content demonstrated low levels of methylation and tended to be cases from the low grade tumour group. This suggests that B cell content is likely to be a major confounder in the assessment of methylation, particularly for the low grade tumour group.


Figure 12: Correlation between B cell content and absolute DNA methylation for DMP cg06313775. Each sample is plotted, coded by case-control status and tumour subtype. Samples with strongest correlation in lower right corner tend to be cases from the low grade tumour group.

The CpG site cg13814485, located in a CpG island within 200bp of the transcription start site of GATA3, exhibited the strongest positive correlation between B cell proportion and absolute methylation (r=0.69, 95%CI [0.65, 0.72]). In Figure 13, samples with the highest B cell content featured high levels of methylation and were exclusively cases, predominantly from the low grade tumour group.

Figure 13: Correlation between B cell content and DNA methylation for DMP cg13814485. Each sample is plotted, coded by case-control status. Cases are coded by the time interval between blood collection and diagnosis with MBCN.

The CpG site cg12699321, in the 5'UTR of SEPT9, showed the strongest positive correlation between granulocyte proportion and methylation levels (r=0.79, 95%CI [0.76, 0.81]). Figure 14 demonstrates that, unlike the pattern seen for B cell content, the correlation between granulocyte content and methylation was equally true for cases and controls. It was also uniformly distributed across all tumour groups and time lag groups. This suggests that while granulocyte content clearly exerts an important effect on methylation, there is no systematic difference in the DNA methylation results between cases and controls, tumour groups or time lag groups.

Figure 14: Correlation between granulocyte content and absolute DNA methylation level for DMP cg12699321. Each sample is plotted, coded for case-control status and tumour group (top) and time lag group (bottom).

III. Separate regression model adjusted for white blood cell content

The results presented above demonstrate that this cohort exhibits systematic differences in B cell and granulocyte content between cases and controls. The results confirm the relationship between white blood cell content and DNA methylation levels for B cells and granulocytes but also reveal that for CD4 T cells, CD8 T cells, NK cells and monocytes, there is no significant effect of white blood cell content on methylation.

A separate conditional logistic regression analysis was performed adjusted for computed white blood cell content.
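For 1:1 matched pairs, the conditional likelihood depends only on the within-pair difference in each covariate, which makes the model easy to sketch. The example below fits a single covariate by Newton-Raphson; the data and function name are invented, and adjustment variables such as cell proportions would enter as additional within-pair differences in a multivariable version. It is a minimal illustration rather than the software used for the analyses reported here.

```python
import math


def fit_matched_pairs(differences, iterations=25):
    """Newton-Raphson estimate of the log odds ratio for 1:1 matched pairs.

    differences[i] is the covariate value of the case minus that of its matched control.
    """
    beta = 0.0
    for _ in range(iterations):
        gradient = 0.0
        hessian = 0.0
        for d in differences:
            # Conditional probability that the observed case is the case within its pair.
            p = 1.0 / (1.0 + math.exp(-beta * d))
            gradient += d * (1.0 - p)
            hessian -= d * d * p * (1.0 - p)
        if hessian == 0.0:
            break
        beta -= gradient / hessian
    return beta


# Invented standardised within-pair differences: six pairs favour the case, two the control.
pair_differences = [1, 1, 1, 1, 1, 1, -1, -1]

beta_hat = fit_matched_pairs(pair_differences)
odds_ratio = math.exp(beta_hat)
```

With differences of +1 in six pairs and -1 in two pairs, the estimate reduces to the ratio of discordant pairs, giving an odds ratio of 3 per unit of the covariate.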

Two exploratory models were considered: (A) adjusting for B cell and granulocyte content only and (B) adjusting for all white blood cell components computed from the Houseman algorithm (B cell, granulocyte, CD4 T cell, CD8 T cell, NK cells and monocytes). The consequence of adjustment for white blood cell content in model A is a marked reduction in the number of CpG sites reaching the statistical threshold for differential methylation. The addition of more parameters in the regression analysis in model B results in only two significant CpG sites with differential methylation (Table 22).

Table 22: Effect of correcting conditional logistic regression analysis using different models of white blood cell content adjustment

| Analysis type | No. of CpG sites reaching statistical significance* |
|---|---|
| All samples, unadjusted (n=876) | 1,338 |
| All samples, adjusted for B cell, granulocyte content | 34 |
| All samples, adjusted for B cell, CD4 T cell, CD8 T cell, NK cell, granulocyte, monocyte | 2 |

* Bonferroni p<0.05 (raw p<1.2×10⁻⁷)

Given the findings above, showing only significant differences in B cell and granulocyte content between cases and controls, model A was chosen as the optimal model for white blood cell content adjustment. In model A, of the 34 DMPs, seven had not previously been identified by the unadjusted conditional logistic regression. The 34 DMPs that were identified after white blood cell content adjustment are listed in Table 23; of note, all demonstrated a loss of methylation.


Figure 15: DMPs in common between the unadjusted model and the model adjusted for white blood cell content

IV. DMPs corresponding to candidate MBCN genes

Of the 34 DMPs identified after adjustment for white blood cell content, two were associated with genes known to carry mutations in MBCN, identified from the literature review in Chapter 2: cg10044629, in the gene body of MUC2, known to be mutated in CLL, and cg02841912, in the promoter-associated region of SYNE1, also mutated in CLL. The SYNE1 site was one of the seven DMPs that had not previously been identified as differentially methylated in the unadjusted regression analysis.

Other CpG sites associated with genes involved in pro-proliferative pathways are cg17352891 in the promoter region of SOX5 on chromosome 12 and cg04665287 in the gene body of PDPK1 on chromosome 16.

No DMPs in this analysis corresponded to genes known to be aberrantly methylated in MBCN (compiled from the literature review in Chapter 2).

Table 23: DMPs identified after adjustment for white blood cell content

| CpG | Chm | MAPINFO | p value | OR | β control | Δβ case-control | Relation to CpG island | Prom-assoc. | Gene |
|---|---|---|---|---|---|---|---|---|---|
| cg07560948 | 1 | 41264498 | 4.42×10⁻⁸ | 1.67 | 0.8 | -0.019 | Shelf | Yes | KCNQ4 |
| cg14522302 | 1 | 226300601 | 1.03×10⁻⁷ | 1.73 | 0.85 | -0.018 | Shelf | No | - |
| cg22627753* | 1 | 988623 | 1.03×10⁻⁷ | 1.79 | 0.74 | -0.047 | Shore | No | AGRN |
| cg19965312* | 2 | 239799954 | 3.02×10⁻⁸ | 1.69 | 0.71 | -0.028 | | No | TWIST2 |
| cg20850016 | 3 | 5165254 | 2.02×10⁻¹⁰ | 1.90 | 0.12 | -0.012 | Shore | Yes | ARL8B |
| cg25518868 | 5 | 140984057 | 6.79×10⁻⁹ | 1.72 | 0.1 | -0.010 | | Yes | DIAPH1^ |
| cg06664486* | 6 | 170379377 | 2.17×10⁻⁸ | 1.66 | 0.67 | -0.022 | | Yes | - |
| cg02841912* | 6 | 152955983 | 2.28×10⁻⁸ | 1.79 | 0.22 | -0.017 | Shore | Yes | SYNE1^ |
| cg00114160* | 6 | 29430096 | 3.77×10⁻⁸ | 1.90 | 0.81 | -0.027 | | No | OR2H1 |
| cg09083279 | 6 | 29454873 | 1.16×10⁻⁷ | 1.53 | 0.6 | -0.028 | | No | MAS1L |
| cg02088996* | 7 | 41817771 | 9.75×10⁻⁹ | 1.95 | 0.3 | -0.016 | | Yes | LOC285954^ |
| cg16328023 | 7 | 36382071 | 3.15×10⁻⁸ | 1.77 | 0.24 | -0.022 | | No | ARHGAP39^ |
| cg14714629* | 7 | 99495452 | 4.61×10⁻⁸ | 2.15 | 0.4 | -0.013 | | No | TRIM4^ |
| cg00718539* | 7 | 74122 | 6.04×10⁻⁸ | 1.98 | 0.6 | -0.026 | | No | - |
| cg24048338 | 8 | 43046482 | 2.65×10⁻⁸ | 1.84 | 0.82 | -0.032 | | Yes | HGSNAT |
| cg00661018* | 8 | 145769086 | 8.89×10⁻⁸ | 1.68 | 0.74 | -0.028 | Shore | No | KIAA1688 |
| cg09238000* | 8 | 142238452 | 1.10×10⁻⁷ | 1.67 | 0.85 | -0.02 | Shore | Yes | SLC45A4 |
| cg27209072* | 10 | 127702584 | 4.04×10⁻⁸ | 1.93 | 0.74 | -0.028 | | Yes | - |
| cg09623279* | 10 | 45395734 | 6.79×10⁻⁸ | 1.86 | 0.79 | -0.034 | | Yes | - |
| cg10568634* | 10 | 5005562 | 7.48×10⁻⁸ | 1.61 | 0.52 | -0.026 | | No | AKR1C1 |
| cg10044629* | 11 | 1097602 | 6.04×10⁻⁸ | 1.68 | 0.67 | -0.019 | Shore | No | MUC2 |
| cg04645070 | 11 | 34393106 | 6.36×10⁻⁸ | 1.81 | 0.26 | -0.012 | | Yes | - ^ |
| cg05083067 | 11 | 4661335 | 7.10×10⁻⁸ | 1.82 | 0.75 | -0.029 | | No | OR51D1 |
| cg04771285 | 12 | 117557630 | 1.35×10⁻⁸ | 1.77 | 0.52 | -0.026 | | Yes | - |
| cg10524576* | 12 | 63322644 | 1.02×10⁻⁷ | 1.80 | 0.28 | -0.013 | | Yes | PPM1H^ |
| cg17352891* | 12 | 24636625 | 1.13×10⁻⁷ | 1.76 | 0.86 | -0.023 | | Yes | SOX5 |
| cg09046979* | 16 | 28333134 | 2.66×10⁻⁸ | 1.61 | 0.42 | -0.035 | Shore | No | SBK1 |
| cg05772125* | 16 | 31539169 | 7.75×10⁻⁸ | 1.88 | 0.64 | -0.041 | | Yes | AHSP |
| cg02198144* | 16 | 85462102 | 1.17×10⁻⁷ | 1.79 | 0.81 | -0.032 | Shelf | No | - |
| cg04665287* | 16 | 2590387 | 1.18×10⁻⁷ | 1.85 | 0.62 | -0.031 | Shore | No | PDPK1 |
| cg25212453* | 17 | 1509953 | 4.90×10⁻⁹ | 1.74 | 0.86 | -0.033 | Island | No | SLC43A2 |
| cg25649188* | 17 | 73499917 | 1.07×10⁻⁸ | 1.64 | 0.74 | -0.024 | Shore | No | CASKIN2 |
| cg01750895 | 17 | 17463041 | 6.53×10⁻⁸ | 1.59 | 0.66 | -0.013 | Shelf | No | PEMT |
| cg10613706 | 17 | 19355650 | 9.63×10⁻⁸ | 1.80 | 0.66 | -0.031 | | No | - |

β = β methylation value
OR = odds ratio for the association between the difference in methylation between cases and controls and the development of MBCN
* DMP corresponds with differentially methylated CpG from Kulis et al (42)
^ DMP not previously identified in unadjusted conditional logistic regression analysis

Tumour subtypes

The initial suggestion of heterogeneity between tumour subtypes was the appearance of 22 outlier samples on the principal components analysis that were not explained by sex, ethnicity, type of DNA or participant age.

Figure 16: Principal component analysis of all 438 case-control pairs, demonstrating the group of outliers

Of the 22 outliers (20 cases and 2 controls), 15 were low grade B-NHL cases, all with a histological diagnosis of CLL/SLL. Four were high grade B-NHL cases and one a follicular NHL case.

Heterogeneity between the tumour subtypes was formally assessed using the likelihood ratio test. For 216 of the 1,338 DMPs there was significant heterogeneity between the four subtypes (phet<0.05). To illustrate this, four CpG sites with phet<0.05 and associated with genes of interest in MBCN were selected: HOXA9, PBX1, HOXC11 and IKBKB. Methylation differences between cases and controls within each tumour subtype are plotted for each of the four CpG sites, showing that samples in the low grade and follicular subtypes have a greater magnitude of differential methylation than do the high grade and myeloma subtypes. There was a statistically significant difference in methylation difference between the tumour subtypes by one-way ANOVA for cg07778029/HOXA9 (F=6.834, p=0.0002), cg17273416/HOXC11 (F=7.194, p=0.0001) and cg11283404/IKBKB (F=6.91, p=0.0001) but not for cg10758824/PBX1 (p=0.055).

Figure 17: Differential methylation by tumour subtype. Mean methylation difference in four CpG sites demonstrating heterogeneity between tumour subtypes showing that the low grade and follicular groups demonstrate greater magnitude of differential methylation in these instances.

The experimental data showing that cases in the low grade lymphoma group had the highest B cell content (figure 10) suggests that part of the reason for the larger methylation differences seen in this group could be due to differences in white blood cell content.

Separate conditional logistic regression for each MBCN subtype

Conditional logistic regression was performed separately for the four MBCN tumour subtypes (low grade [n=137], high grade [n=110], follicular [n=82] and multiple myeloma [n=109]). In the unadjusted conditional logistic regression, only one CpG site, in the low grade group, remained significant: cg15410236 (OR=2.78, 95% CI 1.89-4.00, p=1.00x10-7), located in the gene body of ARID3A, with loss of methylation.

Following adjustment for white blood cell content, no CpG site retained statistical significance.
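For readers unfamiliar with conditional logistic regression in a 1:1 matched design, the conditional likelihood for a single exposure depends only on the within-pair difference d = (case value minus control value), with each pair contributing 1/(1+exp(-βd)). The sketch below fits β by Newton-Raphson under that assumption. It is a simplified illustration with one covariate and invented differences, not the analysis pipeline used in this study; the function names are hypothetical.

```python
import math

def score_and_information(beta, diffs):
    """Score and observed information of the 1:1 conditional log-likelihood."""
    score = 0.0
    info = 0.0
    for d in diffs:
        p = 1.0 / (1.0 + math.exp(-beta * d))  # conditional probability for this pair
        score += d * (1.0 - p)
        info += d * d * p * (1.0 - p)
    return score, info

def fit_matched_pairs(diffs, max_iter=50, tol=1e-10):
    """Return (beta, standard error) for case-minus-control differences."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, diffs)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, diffs)  # information at the final estimate
    return beta, 1.0 / math.sqrt(info)

# Invented differences: three pairs where the case is one unit higher, one pair lower.
beta_hat, se = fit_matched_pairs([1.0, 1.0, 1.0, -1.0])
odds_ratio = math.exp(beta_hat)
print(f"OR per unit = {odds_ratio:.2f}")
```

With these toy differences the estimate converges to β = ln 3, so the odds ratio equals the ratio of discordant pairs (3 to 1), which is the familiar matched-pair result. Note that the fit does not converge if every difference has the same sign.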

The low-grade lymphoma group comprised 137 paired samples, with the breakdown of specific histological types described in Table 24 below.

Table 24: Low grade lymphoma histological types

Tumour histology                          n
CLL / Small lymphocytic lymphoma          82
Lymphoplasmacytic lymphoma                4
Mantle cell lymphoma                      17
Splenic marginal zone lymphoma            6
Waldenström’s macroglobulinaemia          9
Hairy cell leukaemia                      4
Marginal zone lymphoma                    15

Differential methylation in CLL/SLL

Conditional logistic regression was performed separately for the 82 CLL/SLL pairs to try to identify aberrant methylation in this homogeneous group, but no CpG site remained significantly differentially methylated.

Time-lag between blood collection and diagnosis

Blood samples were taken from cases a median of 10.4 years prior to diagnosis with MBCN but the range of time between blood sampling and diagnosis was 2.4 months to 20 years. The possibility of an effect of time lag on the methylation results required evaluation as it may provide further information on the nature of the potential confounding factors on methylation and further insights as to the relevance of the methylation findings.

The time between blood collection and MBCN diagnosis was divided into four time-lag groups: <5 years (n=83), 5-9 years (n=116), 10-15 years (n=152) and ≥15 years (n=87). Of note, samples in the shortest time lag group had a higher estimated B cell content (mean B cell proportion 0.14 for time lag <5 years and 0.10 for other time lag groups, p<0.01). There were no significant differences in proportion by sex, ethnicity or tumour subtype between the time lag groups.

A likelihood ratio test was performed, comparing the standard conditional logistic regression model with an extended model in which adjustment was also made for the four time lag groups (the standard model is nested within the extended model). A significant difference in log likelihoods between the two models (p<0.05) signifies that the extended model, with the additional time lag groups as covariates, is superior to the comparator model, and reflects heterogeneity of methylation between the groups. Because B cell content was significantly different for the shortest time lag group, the likelihood ratio test was performed both with and without adjustment for white blood cell content. Without adjustment for white blood cell content, there was significant heterogeneity (phet<0.05) in 203 of the 1,338 DMPs. The results were substantially different when white blood cell content was taken into account: after the model was adjusted for estimated white blood cell content, only 108 of the same 1,338 DMPs demonstrated heterogeneity. This suggests that for a minority of DMPs (203 of 1,338) methylation is associated with the time between blood collection and diagnosis; the remaining 1,135 DMPs may represent DMPs with differential methylation that is more stable over time. It further suggests that in approximately half of these 203 DMPs the association between methylation and the time lag may be due to differences in white blood cell content, while in the other half the association appears to be due to the time lag, irrespective of white blood cell content.
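The likelihood ratio comparison of two nested models can be sketched as follows. The statistic is twice the difference in maximised log likelihoods and is referred to a chi-square distribution whose degrees of freedom equal the number of additional parameters. The log-likelihood values below are invented, and the use of three degrees of freedom assumes the four time lag groups enter the model as three additional terms, which is an assumption rather than a detail stated in the text.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        return math.exp(-z) * sum(z ** j / math.factorial(j) for j in range(df // 2))
    # Odd df: start from the df=1 tail and add the remaining terms.
    extra = sum(z ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * extra

def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return (statistic, p value) comparing a reduced model with a fuller one."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, extra_params)

# Invented log likelihoods for one CpG site; 3 extra parameters is an assumption.
stat, p_het = likelihood_ratio_test(-280.4, -275.9, extra_params=3)
print(f"LR statistic = {stat:.2f}, phet = {p_het:.4f}")
```

In the analysis described above, a test of this kind would be repeated for each of the 1,338 DMPs, once without and once with white blood cell covariates in both models.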

Conditional logistic regression analysis was also performed separately for each time lag group, resulting in four regression analyses. A comparison of differential methylation in the 1,338 DMPs was made between the four time lag groups, separating DMPs into those with a gain of methylation and those with a loss of methylation. The aim was to visualise whether the shortest time lag group was associated with either more gains or more losses of methylation. In figure 18A, the group diagnosed within 5 years of blood collection was associated with a significantly greater degree of methylation gain compared with those diagnosed 5-9, 10-15 or ≥15 years after blood collection.

[Figure 18A plot: Δβ (case minus control) for DMPs with a gain of methylation, shown for each time lag group (<5, 5-9, 10-15 and ≥15 years)]

Time lag group   Median Δβ   IQR             Coefficient   P value
<5 years         0.040       0.036, 0.054
5-9 years        0.019       0.013, 0.020    -0.02         <0.001
10-15 years      0.018       0.013, 0.020    -0.02         <0.001
≥15 years        0.024       0.017, 0.030    -0.02         <0.001

Figure 18A: Differential methylation in the 1,338 DMPs compared across the four time-lag groups. DMPs with a gain of methylation are shown, with differential methylation on the y axis. Case-control pairs are divided into four groups based on the time interval between blood collection and MBCN diagnosis of the case. The greatest mean methylation difference was present in the shortest time lag group. Δβ = difference in β methylation between case and control

In figure 18B, showing patterns of loss of methylation, the group diagnosed within 5 years of blood collection showed a greater loss of methylation compared with the groups diagnosed 5-9 and 10-15 years after blood collection but there was no significant difference in methylation loss compared with the group with the longest time between blood collection and diagnosis.

[Figure 18B plot: Δβ (case minus control) for DMPs with a loss of methylation, shown for each time lag group (<5, 5-9, 10-15 and ≥15 years)]

Time lag group   Median Δβ   IQR               Coefficient   P value
<5 years         -0.046      -0.071, -0.020
5-9 years        -0.016      -0.027, -0.005    0.05          0.017
10-15 years      -0.017      -0.026, -0.007    0.13          <0.001
≥15 years        -0.024      -0.041, -0.008    0.14          0.5

Figure 18B: Differential methylation in the 1,338 DMPs compared across the four time-lag groups. DMPs with a loss of methylation are shown. The shortest time lag group showed significantly greater methylation loss than the 5-9 and 10-15 year groups.

Discussion

1. Differentially methylated CpG sites

The major finding of the analysis of differential methylation at individual CpG sites in the 438 paired MBCN samples was the association between DNA methylation at 1,338 CpG sites and subsequent development of MBCN. Despite the small differences in methylation between cases and controls, methylation measurements at the DMPs exhibited excellent technical reliability, and the DMPs were identified using a stringent threshold of p<1.12x10-7 (corresponding to Bonferroni correction p<0.05).

Of the 90 CpG sites demonstrating a gain of methylation, 11 were associated with homeobox genes (GSX1, HMX3, LHX2, NKX3-2, DLX1, EMX2, GBX2, HOXA9, HOXC11, HOXD11, ONECUT2). The HOXA9 gene was of particular interest as it contained three separate DMPs with a gain of DNA methylation. All three DMPs were located within a CpG island and would therefore usually be protected from methylation. Dysregulated homeobox gene expression is thought to be a mechanism of cellular proliferation and inhibition of apoptosis. Methylation-induced transcriptional silencing of HOXA9 is associated with downstream dysregulation of the Pim apoptotic pathway in vitro (241) and HOXA9 is hypermethylated in CLL and follicular lymphoma. HOXD11 was another homeobox gene demonstrating a gain of methylation, in a CpG shore-associated DMP. HOXD11 has been reported to be hypermethylated in CLL cells in two separate studies using both RRBS and Infinium HumanMethylation 450K assays (98, 114).

A gain of methylation was found in eight genes described as being hypermethylated in studies of MBCN tumours, suggesting that the epigenetic changes detectable in tumour tissue may be initiated well before clinical evidence of disease becomes apparent. For the gains of methylation found in three HOXA9-associated DMPs, samples in the low grade and follicular lymphoma groups demonstrated significantly greater methylation differences than the other tumour subtypes, suggesting that these are tumour subtype-specific changes. However, for the other seven genes (HOXD11, EBF3, DAPK1, SFRP1, FOXD3, DIO3, ONECUT2) the methylation difference was similar between the tumour subtypes, suggesting the effect may be generalizable across this diverse range of B cell malignancies. Hypermethylation of EBF3, a transcription factor and putative tumour suppressor gene, is described in CLL (98). EBF3 (early B-cell factor 3 gene) is one of three transcription factors necessary for B-cell commitment and plays a role in regulating genes required for cell proliferation and survival. EBF3 itself can carry missense mutations within brain and pancreatic tumours, but is also epigenetically regulated by methylation-induced gene silencing in brain tumours and by MIR-650 silencing (242) in vitro in CLL cells. DAPK1 is a tumour suppressor gene involved in apoptosis regulation via the ERK pathway, responsible for the downstream signaling of the B cell receptor. Methylation of DAPK1 has been reported in 85% of follicular lymphoma samples (93), in cases of CLL (102) and in up to 84% of DLBCL (90, 92), and is associated with poor prognosis in follicular lymphoma and DLBCL. An association between DAPK1 methylation and gene silencing has been described in lymphoma, lung and colon cancer (84). Aberrant promoter methylation of the putative tumour suppressor gene SFRP1 has been reported in up to 53% of cases of myeloma, with associated gene silencing (95, 144, 150). Epigenetic silencing of SFRP1 leads to deregulated activation of the Wnt signaling pathway, a key cell proliferation pathway in many cancers including myeloma (95). FOXD3 is a transcription regulator with DNA binding properties, and has been found to be hypermethylated in CLL (98). Additional epigenetic silencing of FOXD3 by chromatin remodeling has been reported by Chen et al to be an early and key event in CLL leukaemogenesis (243).

Twelve of the 1,338 DMPs corresponded with genes reported to show differential methylation in MBCN tumours. The most significant of these appears to be HOXA9, which is methylated in follicular lymphoma and mantle cell lymphoma (46, 86, 93, 100). In the MBCN analysis, mantle cell lymphoma was included in the low grade tumour group. The ANOVA test and post-hoc analysis comparing the four tumour groups showed that differential methylation in HOXA9 was significantly greater in the follicular lymphoma and low grade groups, supporting the possibility that HOXA9 methylation in pre-diagnostic peripheral blood could represent early detection of the disease.

Loss of methylation was far more common among the 1,338 DMPs, occurring in 1,248 of them. Specific DMPs of interest included those associated with the MAPK signaling and chemokine signaling pathways. The mitogen-activated protein kinase (MAPK) signaling pathway is highly regulated by a series of protein kinases that transmit signals from the cell surface to the nucleus and ultimately regulate cell proliferation and survival (244). Dysregulation of the MAPK/ERK pathway is pathognomonic of hairy cell leukaemia, an uncommon MBCN subtype in which the V600E substitution in BRAF leads to constitutive activation of the pathway. Unopposed MAPK/ERK signaling is also critical in the most common MBCN (diffuse large B cell lymphoma, one of the high grade NHL subtypes), in which targeted blockade of the pathway is able to induce apoptosis in lymphoma cells (245). One of the DMPs with a loss of methylation was in the MAPK/ERK gene NFATC1, which encodes one of the five members of the NFAT (nuclear factor of activated T cells) family of transcription factors. In diffuse large B cell lymphoma and Burkitt lymphoma, NFAT is dysregulated, with abnormal localization from the cytoplasm to the nucleus and constitutive activation (246-248). NFAT appears to be an important transcriptional regulator of the B-lymphocyte stimulator (BLyS), binding to sites in the BLYS promoter in aggressive B cell lymphomas (248). There was also loss of methylation in IKBKB, part of the MAPK/ERK pathway as well as a critical activating kinase in nuclear factor kappaB (NFκB) signaling. Both IKBKB mutations and IκB kinase activation have been described in B cell lymphomas and multiple myeloma (249, 250). While there appears to be a role for DNA methylation-mediated suppression of the NFκB pathway (251), there are no published reports of aberrant IKBKB methylation in MBCN.

Overall, the functional significance of hypomethylation in cancer is much less understood than that of hypermethylation. The findings presented here, in which the vast majority of DMPs exhibited a loss of methylation, are consistent with the widespread global hypomethylation reported in the global methylation study (see Chapter 5.1). They are also consistent with other studies that report global hypomethylation in cancer tissue (224). As a marker of cancer risk, DNA methylation in peripheral blood has been investigated by a relatively small number of studies for solid cancers but not for MBCN. Brennan et al reported a meta-analysis of such studies and concluded that the overall odds ratio for the association between DNA hypomethylation and cancer risk for mixed solid cancers was not significant (169). The authors commented that the lack of association in the meta-analysis may have been due to the design of the studies available for review at that time, which were largely retrospective in nature and, thus, unable to control or adjust for variables such as age or smoking exposure, and which may have under-estimated the relationship between methylation and cancer.

A small number of DMPs were associated with genes known to be mutated in MBCN. CpG sites within SNX7, VILL and MUC2 (mutated in CLL), ROBO1, SP140 and LRRK2 (mutated in myeloma), DTX1 (mutated in marginal zone NHL), TNFRSF13C/BAFF-R (mutated in a number of B-NHL and CLL) and IKBKB (mutated in myeloma and low grade lymphoma) all demonstrated loss of methylation. As described above, three CpG sites within HOXA9 (mutated in mantle cell lymphoma) demonstrated gains of methylation. The correlation between genes known to be mutated in MBCN and the detection of aberrant methylation in our cohort could, in one respect, be interpreted as confirming the relevance of these genes in MBCN. However, it is also possible that circulating tumour cells may have been present in the baseline DNA samples and could have contained tumour-specific mutations resulting in perceived methylation differences. The clinical diagnosis of CLL, follicular lymphoma and some subtypes of mantle cell lymphoma is known to be preceded by a period of up to many years of an asymptomatic precursor condition in which circulating tumour cells can be detected and it is, therefore, possible that for these tumour subtypes methylation findings are a consequence of an underlying mutation. Mutation analysis was not performed, but two analyses were performed to test the likelihood that methylation changes in these genes could be due to mutations in circulating tumour DNA. First, it was postulated that if tumour-related mutations were causing the methylation signals, then the methylation differences should be greatest in samples from the tumour group known to contain the recurrent mutations. DTX1 and IKBKB contained DMPs with significantly greater differential methylation in the low-grade NHL cases compared with the other tumour subtypes. As both these genes are recurrently mutated in low grade lymphoma, this raises the possibility that the methylation changes detected in them could be due to mutations in circulating tumour DNA. There was no difference in methylation between the tumour subtypes for the other genes (SNX7, VILL, MUC2, ROBO1, SP140, LRRK2 or TNFRSF13C/BAFF-R). Second, an evaluation of differential methylation according to the time between blood collection and diagnosis was performed. For DMPs associated with the genes SNX7, LRRK2, ROBO1, VILL, TNFRSF13C/BAFF-R, IKBKB and DTX1, differential methylation was greatest in cases with the shortest latency (<5 years). This could suggest that differential methylation in these genes may be due to the presence of circulating tumour cells or tumour DNA, which is more likely to be present in the years just prior to diagnosis and would be important to follow up with mutational analysis. Alternatively, these methylation changes may still be true epigenetic alterations which become more evident closer to the time of diagnosis. The other genes, HOXA9, MUC2 and SP140, do not demonstrate any increase in differential methylation in the shortest latency group, suggesting that methylation in these genes may represent stable, longer term changes.

2. DMPs after adjustment for white blood cell content

The effect of cell content on DNA methylation is well established (203-206); therefore, efforts were made to identify cell content differences and adjust for them. The conditional logistic regression analysis was adjusted for B cell and granulocyte content after it was determined that there were differences in these white blood cells between cases and controls. Importantly, levels of these white blood cells were also associated with significant differences in DNA methylation.

After adjustment for B cell and granulocyte content, 34 DMPs were identified, all demonstrating loss of methylation. One of these, in PDPK1 on chromosome 16, is of interest as the protein it encodes, 3-phosphoinositide-dependent protein kinase 1 (PDPK1), is part of the PI3K-Akt pathway that transmits cell surface signals from the B cell receptor complex and chemokine receptors to intracellular pathways such as MAPK (cell proliferation and DNA repair), FOXO (cyclin regulation) and BCL2 (apoptosis) signaling. Activation of PDPK1, an oncogenic serine/threonine kinase, has been demonstrated in both multiple myeloma and lymphoma (252, 253). PDPK1 is an essential activator of the RAS/ERK pathway commonly activated in myeloma and appears to regulate a key group of cell survival molecules such as MYC, IRF4, D-type cyclins and PLK1, while PDPK1 inactivation is associated with myeloma cell apoptosis (253). Tatekawa et al propose epigenetic regulation of PDPK1 via miR-375, in which both DNA methylation and histone acetylation appear to regulate miR-375 activity and subsequent PDPK1 expression (254). There are no reports in the literature of direct regulation of PDPK1 by DNA methylation.

A second gene of interest demonstrating loss of methylation is SOX5, on chromosome 12. SOX5 up-regulation has been demonstrated in nasopharyngeal carcinoma and melanoma, and has more recently been observed in TRAF3-/- mice, in which it was associated with rapid progression of B cell lymphoma (255). The mechanism of SOX5 up-regulation does not appear to be due to mutations, and epigenetic mechanisms of regulation have yet to be explored. Other candidate MBCN genes that remained significantly differentially methylated after white blood cell adjustment were MUC2, described above, and SYNE1.

It is essential that any study of peripheral blood DNA methylation as a potential cancer risk marker should adjust its findings for white blood cell content, given the effect of different cell types on methylation. However, although validated cell content prediction algorithms were used, it is still not clear how well they perform in the context of potential circulating tumour B cells, as they have not been tested in this setting. First, it is not clear whether the tumour B cells would carry the same methylation profile as normal B cells. Second, in participants who may have had an undiagnosed precursor condition such as monoclonal B lymphocytosis or MGUS, circulating tumour B cells are likely to be present in very low numbers and may not even be identified using the Houseman algorithm. Separate from these questions about the validity of the white blood cell content algorithm is a potential argument that in correcting for white blood cell content we may suppress important methylation changes occurring in non-tumour lymphocytes that may relate to immune regulation or inflammation. Finally, any DNA sample, whether from blood, buccal mucosa or skin, has the potential to contain several different cell types and may be affected by cellular heterogeneity. It would be preferable to examine methylation within one cell type, for instance by immunophenotypic sorting of CD20 positive B cells; however, this is unlikely to be feasible for large population-based studies.
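The idea behind reference-based cell content estimation can be illustrated with a deliberately reduced example. The sketch below assumes only two cell types with known reference methylation profiles and estimates the fraction of one of them by least squares, constrained to lie between 0 and 1. It is not the Houseman algorithm itself, which handles several cell types with a constrained projection; the function name and all β values are hypothetical.

```python
def estimate_fraction(observed, profile_a, profile_b):
    """Least-squares fraction of cell type A in a two-cell-type mixture.

    Model (an assumption): observed[j] = f * profile_a[j] + (1 - f) * profile_b[j].
    """
    numerator = sum((y - b) * (a - b)
                    for y, a, b in zip(observed, profile_a, profile_b))
    denominator = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    fraction = numerator / denominator
    return min(1.0, max(0.0, fraction))  # keep the estimate in [0, 1]

# Invented reference beta values at three CpG sites for two cell types.
b_cell_profile = [0.90, 0.10, 0.80]
granulocyte_profile = [0.20, 0.70, 0.30]

# Simulate a blood sample that is 10% B cells and 90% granulocytes.
true_fraction = 0.10
sample = [true_fraction * a + (1 - true_fraction) * b
          for a, b in zip(b_cell_profile, granulocyte_profile)]
print(round(estimate_fraction(sample, b_cell_profile, granulocyte_profile), 3))
```

The estimate is only as good as the reference profiles: if tumour B cells carry a different methylation profile from the normal B cell reference, as raised above, their contribution would not be attributed correctly.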

3. Methylation by tumour subtype

Tumour heterogeneity was formally tested between the four pre-specified tumour subtypes and was confirmed for a significant proportion of DMPs (216 of 1,338 DMPs demonstrated heterogeneity). Each of these four tumour subtypes itself comprises histologically and genetically diverse tumours, so this finding was anticipated. The next step was to perform separate conditional logistic regression analyses for each of the tumour subtypes: low grade, high grade NHL, follicular lymphoma and multiple myeloma. Given the marked loss of statistical power as a result of analyzing the subgroups, it is perhaps not surprising that only the low grade group retained a significant DMP: one DMP and a large number of DMRs were associated with this group.

The single DMP was associated with ARID3A, also known as BRIGHT (B-cell regulator of immunoglobulin heavy chain transcription), on chromosome 19. The functional significance of dysregulation of ARID3A is uncertain; it appears to function as a transcriptional regulator and is essential in B cell lineage commitment (256) and haematopoietic stem cell differentiation (257).

In addition to overall tests of heterogeneity, analysis of methylation data in individual probes reveals differences in the degree of differential methylation in each tumour subtype. Frequently, the largest absolute difference in methylation between cases and controls was seen in the low grade group. The low grade group (n=137 cases) comprised predominantly chronic lymphocytic leukaemia/small lymphocytic lymphoma (CLL/SLL) (n=82) as well as marginal zone lymphoma (n=15), lymphoplasmacytic lymphoma/Waldenström’s macroglobulinaemia (WM) (n=13), mantle cell lymphoma (n=17), splenic marginal zone lymphoma and hairy cell leukaemia. CLL and SLL are histologically and genetically identical. Their diagnosis is invariably preceded by a variable period of an asymptomatic, benign precursor condition in which very low levels of monoclonal B lymphocytes can be detected in the peripheral circulation (209). There was a statistically significantly higher B cell level in the low grade tumour subtype compared with the other tumour subtypes. This observation supports the importance of adjusting for white blood cell content. However, the circulating tumour B cells in these precursor conditions only represent a small proportion of the circulating B cells for an individual with this condition and frequently the total B cell count remains within the normal range. There is a limit to the Houseman algorithm’s ability to fully correct for circulating tumour B cells. Accurate detection of monoclonal B cells in the archived blood samples of our cohort would have required additional sequencing, which has not yet been performed. Of note, a large number of DMPs did not demonstrate heterogeneity by tumour subtype, which may indicate that some of the differential methylation observations are relevant to all MBCN subtypes.

Further conclusions regarding methylation within each of the tumour subtypes are limited by the small sample sizes and the resulting lack of statistical power.

4. Effect of time lag between blood collection and MBCN diagnosis on methylation

The effect of latency, the time between blood collection and MBCN diagnosis, was explored as a potential means to identify, first, the timing of methylation changes in oncogenesis and, second, whether methylation changes are primary events or secondary to genetic mutation events.

Overall, the tests for heterogeneity by time lag group suggest that for a substantial minority of DMPs the latency between blood collection and MBCN diagnosis has an effect on methylation levels. Samples with the shortest time lag between blood collection and diagnosis show the largest methylation differences. This latency effect appears to be partly explained by differences in white cell content, given that the measurable heterogeneity between the four time lag groups (<5 years, 5-9 years, 10-15 years and ≥15 years) is reduced after adjustment for white cell content.

The detection of stronger methylation findings in the shortest time lag group could still reflect methylation as a primary event, one that becomes more marked closer to the time of diagnosis. Alternative explanations include, first, that the methylation differences found in the shortest time lag group are due to the presence of circulating tumour cells prior to diagnosis. As described above, this is a well described phenomenon in CLL/SLL and is also recognised in follicular lymphoma. A second hypothesis is that these differences are truly markers of MBCN risk and could be due to germline methylation. To date, it remains unclear whether peripheral blood DNA methylation is a risk factor for solid cancers. In a recent meta-analysis, Brennan et al reported varied results and overall did not find a significant association between methylation and cancer risk (258). A third possible hypothesis is that the differential methylation reflects changes in other, non-tumour, cells carrying aberrant methylation signals such as those associated with chronic inflammation or immune dysregulation. A model for inflammation as a risk factor for MBCN due to a background of autoimmune disease or chronic infection is supported by the observation that a small number of specific lymphomas are strongly associated with infection and inflammation (Epstein Barr virus, Helicobacter pylori) (49, 59). Conroy et al investigated whether circulating levels of inflammatory markers found prior to diagnosis were associated with the risk of subsequent development of NHL (259). The study found a positive association with the inflammatory cytokine IL-10 and a negative association with the adipokine leptin (usually reduced in obesity), supporting the hypothesis that pre-diagnostic markers of inflammation could be relevant biomarkers of risk for NHL. A model for overt immune dysregulation as a risk factor for MBCN exists in the increased incidence of MBCN in those with autoimmune disease (9), but there is no validated clinical model linking the detection of subclinical autoimmune dysfunction with MBCN risk. Finally, one must consider the possibility that the finding of the most marked differential methylation in the shortest time-lag group could be due to changes in DNA methylation over long periods of storage. This does not seem likely, first because methylation results in samples stored as dried blood spots do not appear to degrade significantly over time (202). Second, the study was designed so that case-control pairs were matched for DNA sample age, and any changes in methylation due to storage time would have affected both members of the pair evenly.

Despite the heterogeneity found in some DMPs by latency, there were 1,135 DMPs in which there was no significant difference in differential methylation over time. It is possible these are stable methylation marks, but this needs to be confirmed by measuring methylation in the same individuals at multiple time-points.

5. Discussion on importance of methodological features of study

Study design controls for biological confounders

Potential known biological confounders of DNA methylation include age, sex and ethnicity. Additionally, lifestyle exposures such as obesity, smoking, alcohol intake and vitamin D intake may also modulate methylation. The study design mitigated the effect of these confounders by 1:1 matching of controls to cases for age, sex and ethnicity. For the lifestyle exposures, information from participants was available, and their effect on methylation was tested by adjusting for these factors in a separate regression analysis. There were no significant differences in the lifestyle exposures between cases and controls, and after adjustment for these factors the results of the logistic regression model did not change significantly. Another characteristic of DNA methylation that makes interpretation challenging is that it is subject to change over time, due both to age (which we were able to control for) and to subsequent changes in environmental exposures (which we were unable to measure). In this study, methylation was not analysed for the same individuals at different time points, but DNA was collected at later time points from a subset of participants. A measurement of the stability of methylation changes over time would be a valuable study to undertake in the future.

Analysis pipeline applies corrections for measurement error

There are a number of well-described potential sources of measurement error and bias when using the Infinium HumanMethylation BeadChip 450K. The first is the potential for batch and plate effects, in which systematic differences in conditions between samples run in different batches or chips could lead to methylation differences due to technical error rather than biological effect. The potential batch and chip effects were addressed by sample placement, ensuring cases and controls were co-located and samples from different tumour subtypes were mixed evenly across all chips. In addition, we used a data processing pipeline that is now considered a standard approach, comprising corrections for background noise, between-sample normalization and ComBat normalization (221). The second potential source of bias, which is inherent to the design of the 450K array, is the use of two different CpG probe designs. While the two probe designs are the basis for the large number of CpG sites measured by the assay, a correction step is necessary; this was performed using a validated normalization package (187).

Validated assay precision is critical for detection of small methylation differences

One of the purported strengths of the Infinium Human Methylation BeadChip 450K is its ability to quantitate methylation at single CpG dinucleotides at over 480,000 sites. In order to detect small levels of change in methylation, both precision and accuracy of the assay are important to consider. Published values of precision show very high correlation following repeated measurement of technical replicates (r²=0.992) (188). All methylation assays in this study were run in-house and a technical replicate sample was placed on each plate. The intraclass correlation coefficient (ICC) at each DMP was used as a measure of assay reliability. An interesting finding was that the ICC varied between CpG sites. Among all 485,349 evaluable CpG sites the median ICC was moderate (0.60), while for the 1,338 differentially methylated CpG sites the median ICC was very high at 0.95. Among the DMPs, 96% achieved ICC≥0.8, considered excellent reliability (231). The reason for the difference in measured reliability of the assay at different CpG sites is not clear, but it supports the practice of checking reliability within each methylation experiment.
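To make the reliability measure concrete, the sketch below computes an intraclass correlation coefficient for one CpG site from repeated measurements of the same samples. It assumes the one-way random effects form, ICC = (MSB - MSW) / (MSB + (k - 1) MSW), where MSB and MSW are the between-sample and within-sample mean squares and k is the number of repeats. The text does not state which ICC form was used, so this choice, the function name and the β values are all assumptions made for illustration.

```python
from statistics import mean

def icc_oneway(measurements):
    """One-way random effects ICC from per-sample lists of k repeated values."""
    n = len(measurements)          # number of samples
    k = len(measurements[0])       # repeats per sample (assumed equal)
    all_values = [x for row in measurements for x in row]
    grand_mean = mean(all_values)
    row_means = [mean(row) for row in measurements]
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    ms_within = sum((x - m) ** 2
                    for row, m in zip(measurements, row_means)
                    for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Invented beta values at one CpG: four samples, each measured twice.
replicates = [
    [0.81, 0.80],
    [0.42, 0.45],
    [0.65, 0.63],
    [0.30, 0.31],
]
print(f"ICC = {icc_oneway(replicates):.3f}")
```

A site whose repeats agree closely relative to the spread between samples yields an ICC near 1, whereas a site with little true between-sample variation can have a low ICC even when the assay noise is small, which is one possible reason for reliability to differ between CpG sites.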

A strict threshold for correction of multiple testing is appropriate given small methylation differences

Three commonly used methods of correction for multiple testing were applied to the data: the false discovery rate method (FDR q<0.05), Bonferroni correction (in this case p<1.2x10-7) and a generalised cut-off applicable to genome-wide studies (p<1x10-8) (201). Both the Bonferroni and global cut-offs were far more stringent than FDR and were, therefore, preferable given the small absolute differences in methylation. The Bonferroni correction was chosen as it has been used more frequently than the global cut-off in published methylation array studies.
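The difference in stringency between the Bonferroni and FDR approaches can be sketched as follows. This is an illustration only: the FDR procedure is assumed to be Benjamini-Hochberg, the number of tests passed to the Bonferroni function is a placeholder rather than the number used in this study, and the p values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p value threshold that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the test is rejected at FDR q.

    Step-up procedure: find the largest rank r with p_(r) <= r * q / m and
    reject every hypothesis whose p value has rank <= r.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Placeholder number of tests (hypothetical, chosen for a round result).
print(bonferroni_threshold(0.05, 500_000))

# Invented p values for ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p))
```

The Bonferroni threshold depends only on the number of tests, whereas the FDR threshold adapts to the observed distribution of p values, which is why it is typically the less stringent of the two.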

5.3 Differentially methylated regions

Background

Differential methylation occurring in multiple CpG sites across a continuous genomic region is referred to as a differentially methylated region (DMR). DMRs are distinctive enough to be able to differentiate between normal and cancer tissue (c-DMRs) and can also identify certain tissue types (t-DMRs) (168). Compared with DNA methylation status at an individual CpG site, knowledge of methylation in neighbouring sites is of potentially greater biological significance, given that hypermethylation of promoter-associated regions is sometimes associated with gene silencing (224). The definition of a DMR varies, but generally refers to differential methylation occurring in two or more CpG sites within a genomic region spanning 500bp – 1,000bp (224).

Based on work by Weber et al, the properties of CpG sites within a DMR could indicate the potential functional consequences of aberrant methylation in the DMR. In normal somatic cells, promoter regions containing a small proportion of CpG island sites exhibit different methylation patterns to those with high CpG island proportions (79, 120). Low CpG island promoters are typically methylated while high CpG island promoters are typically unmethylated. In the global methylation analysis reported in Chapter 4, a gain of methylation in high CpG promoter regions, which are usually protected from methylation, was associated with MBCN. These observations suggest that the type of CpG sites that constitute DMRs, whether they are promoter-associated or CpG island-associated, may provide additional clues to their underlying functional significance.

DMRs are typically identified from methylation data obtained from one of the ‘whole-genome’ DNA methylation assays, followed by identification of all differential methylation within a region by using one of a number of published methods (260-262). An important consideration is to avoid methods that rely on analysis of pre-annotated CpG sites because approximately one-quarter of the CpGs in the Infinium450K array are intergenic and, therefore, not annotated in the Illumina manifest file. The preferred methods ignore any annotations and identify regions based solely on chromosomal co-ordinates (261). Different thresholds have been used in studies of DMRs in peripheral blood, such as a threshold of mean methylation difference >2%, maximum methylation difference >5%, or a more stringent p value cut-off such as FDR p<0.001 (203, 263). Finally, one group also applied an additional criterion that a DMR should contain at least five CpG sites (263).

For this thesis, regional methylation analysis was used as a secondary analysis following the main site-by-site conditional logistic regression analysis. This is because current DMR methods rely on linear rather than logistic regression, use unconditional rather than conditional rules, and are relatively recent analytical methods for which common standards are not yet agreed.

Analysis

DMRs were identified using the DMRcate Bioconductor package (261). This method uses a linear regression model to rank differentially methylated positions and agglomerates single CpG sites into clusters by genomic location. The method uses kernel smoothing which is tunable depending on the user-defined bandwidth λ. A DMR may contain CpG sites that do not by themselves meet the criteria for a differentially methylated position. DMRcate avoids distinguishing between CpG sites with UCSC gene or CpG island annotation and thus reduces the potential bias of analyzing CpG sites based on Illumina’s annotations. DMRs are reported with Stouffer (combined) p values (pcomb), which reflect the association of each cluster of CpG sites with the occurrence of MBCN.
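As an illustration of how a combined p value summarises a cluster of CpG sites, the following is a minimal sketch of an unweighted Stouffer combination. It is not DMRcate's code, and the exact inputs DMRcate combines may differ; the function name and the example p values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_combined_p(p_values):
    """Combine one-sided p values with the unweighted Stouffer method.

    Each p value is converted to a standard normal z-score, the z-scores
    are summed and divided by sqrt(n), and the result is converted back
    to a p value. Every p value must lie strictly between 0 and 1.
    """
    nd = NormalDist()
    # -inv_cdf(p) equals the upper-tail z-score and stays accurate for tiny p.
    z_scores = [-nd.inv_cdf(p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return nd.cdf(-z_combined)


# Hypothetical per-site p values for the CpG sites in one region.
print(stouffer_combined_p([0.01, 0.03, 0.002, 0.04]))
```

Because the z-scores are summed, a region containing many CpG sites with individually modest p values can reach a very small combined p value, which is relevant when interpreting large DMRs with small methylation differences.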

A DMR was initially broadly defined as a region with FDR p<0.05, containing >1 CpG site, with <1,000bp between CpG sites. Given there is no agreed threshold for defining DMRs in peripheral blood studies, an exploratory approach was used, reporting the results achieved using different thresholds.
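The positional part of this definition (more than one CpG site, with less than 1,000bp between neighbouring sites) can be sketched as a simple gap-based grouping. This is a simplified illustration that assumes the input sites have already passed the significance threshold; it does not reproduce DMRcate's kernel smoothing, and the function name and coordinates are hypothetical.

```python
def call_regions(sites, max_gap=1000, min_cpgs=2):
    """Group CpG sites into candidate regions by genomic position.

    `sites` is an iterable of (chromosome, position) pairs. Neighbouring
    sites on the same chromosome are joined when they are less than
    `max_gap` bp apart. Regions with fewer than `min_cpgs` sites are
    dropped. Returns (chromosome, start, end, n_cpgs) tuples.
    """
    regions = []
    current = []

    def close_region():
        if len(current) >= min_cpgs:
            regions.append((current[0][0], current[0][1], current[-1][1], len(current)))

    # Sorting is lexicographic on the chromosome label, then by position.
    for chrom, pos in sorted(sites):
        if current and (chrom != current[-1][0] or pos - current[-1][1] >= max_gap):
            close_region()
            current = []
        current.append((chrom, pos))
    close_region()
    return regions


# Hypothetical coordinates of significant CpG sites.
example_sites = [
    ("chr6", 100), ("chr6", 600), ("chr6", 1500), ("chr6", 5000),
    ("chr7", 5200), ("chr7", 5300),
]
print(call_regions(example_sites))
```

Grouping by co-ordinates alone, without reference to gene or CpG island annotation, mirrors the annotation-free approach described above.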

DMRs were described according to chromosomal location. They were further divided according to the proportion of CpG sites contained within them that were either promoter-associated or were located within a CpG island, in order to comment on patterns of methylation status according to promoter and CpG island content of the DMR. DAVID Bioinformatics Resources 6.8 was used for gene ontology classification (232, 233) by identifying the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways and using the functional annotation clustering tool (clusters with Benjamini-Hochberg p value <0.05 were considered significant).

Results

Using the DMRcate package, 9,857 DMRs were identified, comprising 6,364 DMRs in which there was a loss of methylation and 3,498 in which there was a gain of methylation.

Pattern of dysregulated methylation by epigenomic location

The results of the DMR analysis were compared with the known patterns of methylation in CpG islands and promoter-associated regions in cancer tissue, in which gains of methylation are predominantly seen in these methylome locations compared with widespread losses of methylation in non-promoter regions (79, 181). Individual CpG sites within DMRs were identified as being located within CpG islands or promoter-associated regions. This involved a manual process of referencing CpG sites back to the Illumina manifest and was performed for a subset of 41 DMRs with pcomb <1x10-25. DMRs with higher CpG island content demonstrated a gain of methylation while DMRs with lower CpG island content exhibited loss of methylation. As shown in the figure below, CpG island content of 40% was an accurate predictor of whether a DMR exhibited a gain or loss of methylation.

[Plot: Methylation in DMRs and proportion of CpGs in CpG island. x-axis: % probes in a DMR located in CpG island; y-axis: max delta beta]

Figure 19: Association between proportion of CpG island content of DMR and methylation The top 40 DMRs (ranked by pcomb) are shown. CpG content is plotted against methylation difference.

For DMRs with greater promoter-associated CpG site content there was a pattern of gain in methylation. In particular, gains of methylation were only seen in DMRs with promoter-associated content of >20%. Losses of methylation were seen in DMRs with both high and low promoter-associated CpG site content.

[Plot: Methylation in DMRs and proportion of promoter-associated CpGs. x-axis: % promoter-associated probes in a DMR; y-axis: max delta beta]

Figure 20: Association between proportion of promoter-associated CpGs within DMR and methylation DMRs are shown, plotted by percentage of CpG sites in a promoter-associated region. Gains of methylation were restricted to DMRs with high promoter-associated CpG site content (>20%)

Chromosomal location of DMRs

The chromosomal location of all 9,857 DMRs was visualised using the UCSC Genome Browser Human Genome Graphs function using track data hubs (264) (Figure 21). While there was a broad numerical distribution of DMRs across all chromosomes, a high concentration of DMRs can be seen in the short arm of chromosome 6. This location corresponds to the human leucocyte antigen (HLA) gene complex, which encodes a group of proteins known as the major histocompatibility complex (MHC). The HLA region contains a high concentration of genes related to antigen presentation, immune diversity and inflammatory proteins (265). Within the ~3.8kB region of the HLA locus (ch6:27,500,000-33,400,000), 203 DMRs were identified, including DMRs associated with genes encoding MHC class I antigens (HLA-B, HLA-C, HLA-E, HLA-F, HLA-G), MHC class II antigens (HLA-DMA, HLA-DMB, HLA-DOA, HLA-DOB, HLA-DPB1, HLA-DPB2 and HLA-DQA2) and MHC class III genes (NOTCH4, LTA, LTB, MICB, HSPA1B, ATP6G and TNXB) (265, 266). An enrichment odds ratio was calculated, comparing the proportion of HLA-associated DMRs to the proportion of HLA-associated probes in the Infinium450K assay (12,871 probes within the region ch6:27,500,000-33,400,000). The enrichment OR = 1.50 (95%CI 1.30 – 1.73, p=1.87x10-9), suggesting the number of DMRs within the HLA region is greater than that expected by chance.
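For readers unfamiliar with the calculation, an odds ratio and its confidence interval can be obtained from a 2x2 table as sketched below. The exact table and interval method used for the enrichment analysis are not given in this section, so the example assumes a Woolf (log-based) interval and uses invented counts; it does not reproduce the reported OR of 1.50.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    2x2 table layout (assumed for illustration):
        a = in region, in set of interest     b = outside region, in set of interest
        c = in region, in reference set       d = outside region, in reference set
    All four counts must be greater than zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, not the data from this study.
or_value, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that excludes 1 indicates that the proportion in the region differs between the two sets by more than would be expected by chance at the chosen confidence level.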

The same region on chromosome 6p contains two common polymorphisms identified by genome-wide association studies that were not removed prior to DMR analysis and which, therefore, need to be considered as potential confounders of methylation results. One was a polymorphism rs2647046, corresponding to the HLA-DQA2 gene, associated with reduced follicular lymphoma risk (OR 0.59) (25). The second was rs2285803, corresponding to PSORS1C1, associated with increased myeloma risk (OR 1.19) (30).

Figure 21: Chromosomal location of DMRs showing peak in Chm 6p21.3


Figure 22 below shows all DMRs located on chromosome 6 (pcomb <1x10-8) again demonstrating the clear hyper-localisation of DMRs in 6p21.3 with predominant loss of methylation compared with DMRs in other parts of chromosome 6 which demonstrate predominantly gains of methylation.

Figure 22: DMRs in chromosome 6 High concentration of DMRs in 6p21.3 and distinct predominance of loss of methylation in this region compared with DMRs in other parts of chromosome 6. Light blue=methylation gain, dark blue=methylation loss

Comparison with literature review of candidate genes

Of the 9,857 DMRs, 80 DMRs in 56 unique genes correlated with the candidate MBCN genes identified by literature review (see Chapter 2). These included NFKBIE and SYNE1 (mutated in CLL), CARD11 and TCF3 (mutated in DLBCL), CD79A (in DLBCL and CLL), FAS (in follicular lymphoma), ROBO1, AHR, NPTX2, CDC14B, NR2F2, CDH1 and SOX9 (mantle cell lymphoma), FRZB, FGFR3 and DKK1 (myeloma), HOXA9 and CDK6 (myeloma and mantle cell lymphoma) and TNFRSF13C/BAFF-R.

Comparison with literature review of aberrantly methylated genes

Of the total 9,857 DMRs, 106 correlated with genes known to be aberrantly methylated in the literature review presented in Chapter 2. These DMRs are presented in Table 25, showing that the majority (78 of 106) demonstrated methylation changes that were concordant with those reported for MBCN tumours. An evaluation was performed of which DMR parameters were important in identifying DMRs that were concordant with those reported in the published literature. A number of factors were identified a priori as being likely to identify ‘confirmed DMRs’: greater magnitude of methylation difference, larger number of CpG sites within the DMR and lower pcomb value. Univariate logistic regression was used to evaluate whether these factors were associated with the likelihood of a DMR being concordant with the published literature. Direction of differential methylation was significant (gain of methylation associated with greater likelihood of identifying a confirmed DMR) but magnitude of the methylation difference was not significant. Both the number of CpG sites contained within a DMR and the pcomb value were significant factors (Figure 23).


The finding that gains rather than losses of methylation in DMRs were associated with identifying confirmed DMRs may be due to a publication bias in the reporting of gene-specific hypermethylation in MBCN. In the literature review presented in Chapter 2, only 30 individual genes are reported to be hypomethylated in comparison with 142 genes hypermethylated. For this reason, the mean methylation difference was not included in multivariate analysis. By multivariate analysis including the number of CpG sites and pcomb as variables, only the number of CpG sites remained statistically significant as a predictor of DMR concordance (p<0.001). This suggests that a threshold for number of CpG sites alone may be adequate to select peripheral blood DMRs concordant with methylation occurring in MBCN tumours. A threshold of at least 7 CpG sites per DMR resulted in 81% sensitivity and 76% specificity for identifying DMRs with methylation concordant with the published literature.
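The sensitivity and specificity of a CpG-count threshold can be computed as in the sketch below. The records are a small subset of Table 25 (the DLC1, GRB10 and NOTCH3 DMRs), used only to illustrate the calculation, so the output does not reproduce the 81% and 76% figures obtained from the full set of 106 DMRs. The function name is hypothetical.

```python
def sensitivity_specificity(records, min_cpgs):
    """Evaluate 'n_cpgs >= min_cpgs' as a predictor of concordance.

    `records` is an iterable of (n_cpgs, concordant) pairs, where
    `concordant` is True when the DMR's direction of methylation agrees
    with the literature. Returns (sensitivity, specificity).
    """
    tp = fn = tn = fp = 0
    for n_cpgs, concordant in records:
        predicted = n_cpgs >= min_cpgs
        if concordant and predicted:
            tp += 1
        elif concordant:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# (number of CpG sites, concordant with literature) for nine DMRs in Table 25.
subset = [
    (16, True), (7, True), (3, False), (3, False),   # DLC1
    (25, True), (13, False), (4, True),              # GRB10
    (8, True), (5, False),                           # NOTCH3
]
sens, spec = sensitivity_specificity(subset, min_cpgs=7)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Repeating the calculation over a range of `min_cpgs` values is one way to see how the trade-off between sensitivity and specificity changes with the threshold.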

[Figure 23 panels, each plotted by "Methylation concordant" group (0, 1): (a) mean methylation difference; (b) absolute mean methylation difference; (c) number of CpGs; (d) log(pcombined)]

Figure 23: Association between properties of DMRs and concordance with methylation findings in literature (a) methylation difference (p<0.001), (b) absolute methylation difference (p=0.15) (c) number of CpGs (p<0.001), (d) log(pcomb) (p<0.001), univariate logistic regression

DMRs demonstrating gains of methylation concordant with that reported in the literature are summarised in Table 25 and include genes associated with the Wnt signaling pathway (WIF1, DKK1, SFRP1, SFRP2 and SFRP4) and putative tumour suppressor genes (SOX11, RBP1, TGFB1, AHR, HOXA9, SFRP1, KLF4, FOXA1, CDH1, SPARC, CYB5R2). In 27 DMRs, there is loss of methylation while in published studies the same genes are described as hypermethylated. DMRs exhibiting methylation concordant with that described in the literature were more likely to contain a larger number of CpG sites (number of CpG sites in the DMR, mean=16.4 [95% CI=14.1-18.7]) compared with DMRs in which methylation was discordant with the literature (mean=5.5 [95% CI=3.9-7.1]) (p<0.0001).

Some genes were represented by multiple DMRs in which the direction of methylation varied within the same gene. Methylation patterns within DLC1 illustrate the association between the number of CpG sites within a DMR and the variable methylation results. The tumour suppressor gene DLC1, located on chromosome 8 and methylated in 85% of DLBCL tumours (87), contained four significant DMRs. In two large DMRs, containing 16 and 7 CpG sites respectively, there was a gain of methylation, while in two small DMRs, both containing three CpG sites, there was a loss of methylation. Similarly in GRB10, hypermethylated in follicular lymphoma (93), three DMRs were identified: in two DMRs (one containing 25 CpG sites) there was a small gain of methylation and in one smaller DMR containing 13 CpG sites there was a loss of methylation. For NOTCH3, two DMRs of similar size (containing 5-8 CpG sites) were found, one with a gain of methylation and one with loss of methylation, highlighting the challenges of interpreting differential regional methylation in DMRs comprising smaller numbers of CpG sites.

Table 25: Methylation status of DMRs within genes known to be aberrantly methylated in MBCN. The direction of methylation in DMRs was concordant with that reported in the literature in the majority of cases. For the minority of cases in which methylation was discordant, DMRs contained fewer CpG sites (p<0.0001).

Gene1 | Disease2 | No. CpGs | Chm | start | end | Adj. p value3 | Mean Δβ | Meth agrees4
Hypermethylated in MBCN
TFAP2A | DLBCL | 40 | 6 | 10410042 | 10417726 | 1.30x10-24 | 0.01 | ✓
TFAP2A | DLBCL | 23 | 6 | 10419016 | 10422322 | 1.68x10-06 | 0.007 | ✓
TFAP2A | DLBCL | 4 | 6 | 10404084 | 10404967 | 2.97x10-5 | 0.008 | ✓
GATA4 | DLBCL | 31 | 8 | 11559334 | 11563023 | 1.98x10-15 | 0.007 | ✓
GATA4 | DLBCL | 12 | 8 | 11565277 | 11568356 | 1.44x10-7 | 0.009 | ✓
GATA4 | DLBCL | 6 | 8 | 11550001 | 11550841 | 2.31x10-2 | 0.004 | ✓
DLC1 | DLBCL | 16 | 8 | 12989237 | 12991196 | 2.40x10-20 | 0.009 | ✓
DLC1 | DLBCL | 7 | 8 | 13133706 | 13135151 | 1.11x10-6 | 0.01 | ✓
DLC1 | DLBCL | 3 | 8 | 13373033 | 13373141 | 8.13x10-5 | -0.017 | ✕
DLC1 | DLBCL | 3 | 8 | 12952379 | 12952665 | 2.57x10-5 | -0.009 | ✕
GDNF | DLBCL | 15 | 5 | 37838741 | 37840839 | 1.97x10-12 | 0.007 | ✓
GDNF | DLBCL | 14 | 5 | 37834102 | 37835965 | 6.17x10-9 | 0.007 | ✓
NEUROD1 | DLBCL | 13 | 2 | 182544307 | 182545934 | 1.56x10-7 | 0.007 | ✓
CRY1 | DLBCL | 11 | 12 | 107486229 | 107488596 | 1.27x10-10 | 0.005 | ✓
KCNK12 | DLBCL | 11 | 2 | 47796781 | 47798679 | 3.63x10-2 | 0.004 | ✓
ZIK1 | DLBCL | 10 | 19 | 58095011 | 58095659 | 1.98x10-3 | 0.007 | ✓
SPIB | DLBCL | 9 | 19 | 50921232 | 50922484 | 7.32x10-4 | -0.005 | ✕
GALNS | DLBCL | 7 | 16 | 88905165 | 88907370 | 1.34x10-4 | 0.011 | ✓
GRIN2B | DLBCL | 6 | 12 | 14133336 | 14133887 | 1.61x10-2 | 0.005 | ✓
CYB5R2 | DLBCL | 5 | 11 | 7693674 | 7694814 | 2.63x10-6 | -0.008 | ✕
FGD2 | DLBCL | 3 | 6 | 36972027 | 36972163 | 2.82x10-9 | -0.018 | ✕
SORL1 | DLBCL | 3 | 11 | 121460778 | 121460973 | 3.35x10-2 | -0.009 | ✕
SORL1 | DLBCL | 2 | 11 | 121477655 | 121477737 | 1.33x10-3 | -0.007 | ✕
ONECUT2 | FL/DLBCL | 20 | 18 | 55101965 | 55106337 | 8.63x10-19 | 0.01 | ✓
MYOD1* | FL/DLBCL | 18 | 11 | 17740430 | 17743946 | 3.30x10-13 | 0.01 | ✓
GRB10 | FL | 25 | 7 | 50860173 | 50862266 | 2.68x10-5 | 0.006 | ✓
GRB10 | FL | 13 | 7 | 50799978 | 50801423 | 9.29x10-7 | -0.007 | ✕
GRB10 | FL | 4 | 7 | 50773857 | 50774217 | 1.87x10-3 | 0.001 | ✓
SIM2 | FL | 22 | 21 | 38069950 | 38072998 | 3.00x10-21 | 0.009 | ✓
FOXA1 | FL | 15 | 14 | 38063564 | 38065582 | 8.22x10-26 | 0.012 | ✓
FOXA1 | FL | 15 | 14 | 38067219 | 38069639 | 3.08x10-3 | 0.006 | ✓
KLF4 | FL | 9 | 9 | 110249749 | 110252451 | 1.42x10-2 | 0.005 | ✓
NOTCH3 | FL | 8 | 19 | 15311304 | 15312233 | 5.19x10-5 | 0.006 | ✓
NOTCH3 | FL | 5 | 19 | 15292440 | 15292775 | 2.73x10-2 | -0.005 | ✕
SIM2 | FL | 8 | 21 | 38080526 | 38081976 | 1.42x10-2 | 0.007 | ✓
LHX1 | low grade | 42 | 17 | 35289432 | 35295726 | 4.45x10-19 | 0.007 | ✓
LHX1 | low grade | 16 | 17 | 35296732 | 35300874 | 2.04x10-4 | 0.008 | ✓
HOXA9 | low grade/FL | 26 | 7 | 27203430 | 27206907 | 3.78x10-42 | 0.013 | ✓
PAX7 | low grade | 18 | 1 | 18956762 | 18959891 | 5.65x10-10 | 0.007 | ✓
MNX1/HLXB9 | low grade | 17 | 7 | 156801419 | 156804268 | 3.44x10-13 | 0.008 | ✓
POU4F1 | low grade | 16 | 13 | 79176272 | 79177925 | 2.59x10-14 | 0.005 | ✓
CDH1 | low grade/MM | 14 | 16 | 68770613 | 68772469 | 9.96x10-8 | 0.007 | ✓
DAPK1 | low grade/FL/DLBCL | 11 | 9 | 90112101 | 90114156 | 2.51x10-11 | 0.009 | ✓
CDKN1C | low grade/DLBCL | 9 | 11 | 2904951 | 2906679 | 2.21x10-4 | 0.005 | ✓
HOXA13 | low grade | 9 | 7 | 27238910 | 27240251 | 7.83x10-5 | 0.007 | ✓
NPTX2 | low grade | 8 | 7 | 98245716 | 98247758 | 1.31x10-10 | 0.015 | ✓
NPTX2 | low grade | 6 | 7 | 98248852 | 98250025 | 8.33x10-5 | 0.001 | ✓
ROBO1 | low grade | 7 | 3 | 79067596 | 79068857 | 2.11x10-05 | 0.006 | ✓
ROBO1 | low grade | 7 | 3 | 79815639 | 79816949 | 4.73x10-9 | 0.01 | ✓
AHR | low grade | 7 | 7 | 17337976 | 17338557 | 1.77x10-4 | 0.001 | ✓
NR2F2 | CLL/low grade | 27 | 15 | 96873850 | 96877300 | 7.32x10-33 | 0.012 | ✓
NR2F2 | CLL/low grade | 3 | 15 | 96869870 | 96870228 | 6.03x10-4 | 0.013 | ✓
SOX9 | CLL/low grade/FL/DLBCL | 13 | 17 | 70115862 | 70118558 | 1.21x10-7 | 0.008 | ✓
SOX9 | CLL/low grade/FL/DLBCL | 9 | 17 | 70112050 | 70114623 | 1.47x10-7 | 0.009 | ✓
EBF3 | CLL | 46 | 10 | 131760778 | 131770909 | 1.64x10-34 | 0.01 | ✓
FOXD3 | CLL | 39 | 1 | 63782395 | 63790202 | 5.15x10-19 | 0.009 | ✓
SOX2 | CLL | 31 | 3 | 181428242 | 181431306 | 3.39x10-10 | 0.006 | ✓
BNC1 | CLL | 31 | 15 | 83951663 | 83954849 | 1.53x10-23 | 0.009 | ✓
DIO3* | CLL | 30 | 14 | 102025815 | 102031503 | 4.82x10-41 | 0.012 | ✓
HOXD11 | CLL | 28 | 2 | 176968052 | 176974060 | 2.57x10-29 | 0.013 | ✓
FOXG1 | CLL | 28 | 14 | 29234671 | 29237480 | 1.13x10-38 | 0.014 | ✓
SOX1 | CLL | 27 | 13 | 112720171 | 112724454 | 2.35x10-27 | 0.013 | ✓
SOX6 | CLL | 22 | 11 | 16625289 | 16629389 | 5.85x10-10 | 0.006 | ✓
SOX6 | CLL | 3 | 11 | 16424673 | 16424935 | 1.47x10-6 | -0.014 | ✓
HOXD8 | CLL | 19 | 2 | 176993017 | 176995556 | 2.12x10-22 | 0.013 | ✓
HOXC13 | CLL | 19 | 12 | 54331892 | 54333823 | 8.64x10-16 | 0.01 | ✓
SLIT2 | CLL | 14 | 4 | 20253514 | 20255985 | 1.08x10-7 | 0.008 | ✓
SOX4 | CLL | 14 | 6 | 21593881 | 21597055 | 3.56x10-4 | 0.003 | ✓
CADM1/IGFS4 | CLL | 14 | 11 | 115373757 | 115375718 | 1.42x10-9 | 0.007 | ✓
SOX11 | CLL | 12 | 2 | 5831147 | 5834638 | 8.26x10-16 | 0.013 | ✓
ID4 | CLL | 12 | 6 | 19837015 | 19839375 | 7.99x10-15 | 0.013 | ✓
ABI3 | CLL | 11 | 17 | 47286719 | 47288569 | 2.57x10-10 | -0.011 | ✕
ABI3 | CLL | 4 | 17 | 47296970 | 47297512 | 4.62x10-5 | -0.01 | ✕
ABI3 | CLL | 2 | 17 | 47294967 | 47295142 | 6.85x10-6 | -0.014 | ✕
ZNF471 | CLL | 10 | 19 | 57018614 | 57019958 | 9.56x10-20 | 0.02 | ✓
WISP3 | CLL | 9 | 6 | 112374151 | 112375870 | 2.08x10-4 | -0.005 | ✕
WISP3 | CLL | 3 | 6 | 112380564 | 112381114 | 3.86x10-3 | -0.008 | ✕
ZFP28 | CLL | 8 | 19 | 57049695 | 57050834 | 1.20x10-8 | 0.017 | ✓
NR4A1 | CLL | 8 | 12 | 52435420 | 52437571 | 1.16x10-6 | -0.007 | ✕
ADCY5 | CLL | 7 | 3 | 123166882 | 123167973 | 2.53x10-3 | 0.008 | ✓
ADCY5 | CLL | 3 | 3 | 123123576 | 123124018 | 1.62x10-5 | -0.014 | ✕
FOXE1 | CLL | 7 | 9 | 100615084 | 100616607 | 5.75x10-6 | 0.01 | ✓
PRDM2 | CLL | 6 | 1 | 14029478 | 14031383 | 1.02x10-3 | -0.009 | ✕
SCGB2A1 | CLL | 5 | 11 | 61974948 | 61976482 | 1.01x10-4 | -0.01 | ✕
CADM1/IGFS4 | CLL | 4 | 11 | 115088647 | 115088907 | 2.75x10-3 | -0.011 | ✕
SFRP1 | CLL/MM | 12 | 8 | 41166169 | 41167923 | 5.33x10-9 | 0.01 | ✓
SFRP2 | MM | 44 | 4 | 154709441 | 154712422 | 4.97x10-6 | 0.007 | ✓
DCC | MM | 29 | 18 | 49865270 | 49868788 | 3.30x10-6 | 0.004 | ✓
IGF1R | MM | 19 | 15 | 99190447 | 99194624 | 2.10x10-20 | 0.008 | ✓
IGF1R | MM | 4 | 15 | 99443213 | 99443666 | 2.90x10-4 | 0.001 | ✓
DKK3 | MM | 18 | 11 | 12029738 | 12031508 | 3.48x10-5 | 0.006 | ✓
RBP1 | MM | 17 | 3 | 139257229 | 139258948 | 5.46x10-8 | 0.007 | ✓
DKK1 | MM | 17 | 10 | 54073047 | 54075082 | 3.26x10-10 | 0.008 | ✓
TP73 | MM | 17 | 1 | 3605087 | 3607425 | 3.44x10-4 | -0.007 | ✕
SFRP4 | MM | 16 | 7 | 37955508 | 37957021 | 3.44x10-4 | 0.003 | ✓
RARB | MM | 14 | 3 | 25469125 | 25469991 | 1.77x10-4 | 0.005 | ✓
TGFBI | MM | 12 | 5 | 135364060 | 135365890 | 1.22x10-4 | 0.004 | ✓
WIF1 | MM | 9 | 12 | 65514473 | 65516002 | 8.25x10-9 | 0.008 | ✓
SOCS1 | MM | 6 | 16 | 11348611 | 11349308 | 4.85x10-5 | -0.009 | ✕
SPARC | MM | 3 | 5 | 151066730 | 151067848 | 2.36x10-3 | -0.007 | ✕
BIK | MM | 3 | 22 | 43507369 | 43507665 | 8.83x10-3 | -0.009 | ✕
Hypomethylated in MBCN
NOD2/CARD15 | CLL | 3 | 16 | 50731412 | 50732212 | 7.72x10-06 | -0.01 | ✓
NOD2/CARD15 | CLL | 3 | 16 | 50745944 | 50746063 | 1.54x10-03 | 0.007 | ✕

1 Genes identified by literature review as being differentially methylated in MBCN
2 Tumour subtype in which aberrant methylation is reported
3 Adj. p value = Stouffer p value
4 Direction of methylation (either gain or loss) is concordant with literature
* DIO3, MYOD1 described as hypomethylated in other NHL subtypes

FL=follicular NHL, low grade=low grade NHL, DLBCL=diffuse large B cell lymphoma (high grade NHL subtype), CLL=chronic lymphocytic leukaemia, MM=multiple myeloma


Correlation between DMRs and DMPs

As the DMR analysis was not corrected for white blood cell content, the 9,857 DMRs were cross-referenced with DMPs identified in the regression model adjusted for white blood cell content. Fourteen corresponding DMRs were identified, of which two were located in chromosome 6 in the MHC Class III region (Table 26).

This comparison reveals some inconsistencies between methylation findings at individual CpG sites by conditional logistic regression analysis and those found in regions by DMR analysis. Genes SLC45A4 and OR2H1 each contain one DMR with a gain of methylation and one with a loss of methylation. While both could be a true measure of methylation, it is not clear how to interpret this with respect to potential functional significance. The TWIST2 and SOX5 genes each contain a DMP with a loss of methylation and a DMR with a gain of methylation.

Table 26: DMRs containing a DMP identified following conditional logistic regression with adjustment for white blood cell content (p<1.2x10-7)

Chm | start | end | No. CpG | Min. FDR p value | Stouffer p value | Max. Δβ | Mean Δβ | Gene
Methylation loss
16 | 31,538,556 | 31,539,234 | 6 | 3.19x10-16 | 2.77x10-05 | -0.041 | -0.017 | AHSP
17 | 1,509,883 | 1,510,666 | 6 | 1.21x10-38 | 8.21x10-25 | -0.033 | -0.023 | SLC43A2
16 | 2,589,556 | 2,590,751 | 4 | 5.56x10-18 | 2.20x10-07 | -0.031 | -0.010 | PDPK1
11 | 4,661,207 | 4,661,335 | 2 | 1.59x10-10 | 7.96x10-07 | -0.029 | -0.019 | OR51D1
6 | 29,454,557 | 29,457,282 | 20 | 2.26x10-28 | 1.78x10-13 | -0.028 | -0.007 | MAS1L
6 | 29,429,346 | 29,431,410 | 11 | 1.93x10-24 | 3.10x10-11 | -0.027 | -0.011 | OR2H1
17 | 73,499,195 | 73,502,038 | 13 | 6.16x10-21 | 9.81x10-13 | -0.023 | -0.009 | CASKIN2
1 | 984,992 | 986,894 | 5 | 4.02x10-13 | 1.71x10-06 | -0.017 | -0.008 | AGRN
8 | 142,262,761 | 142,263,053 | 2 | 6.21x10-07 | 5.85x10-05 | -0.010 | -0.008 | SLC45A4
6 | 152,702,330 | 152,702,660 | 6 | 7.25x10-06 | 1.43x10-03 | -0.007 | -0.005 | SYNE1
Methylation gain
6 | 29,427,519 | 29,427,915 | 4 | 1.12x10-05 | 8.09x10-04 | 0.038 | 0.004 | OR2H1
8 | 142,238,452 | 142,239,323 | 9 | 2.38x10-15 | 1.64x10-03 | 0.022 | 0.005 | SLC45A4
2 | 239,755,455 | 239,756,128 | 5 | 1.01x10-05 | 3.75x10-04 | 0.014 | 0.008 | TWIST2
12 | 24,102,581 | 24,103,374 | 5 | 3.46x10-07 | 1.25x10-04 | 0.013 | 0.008 | SOX5

Methylation difference within DMRs

Given the large number of DMRs identified and the lack of a published standard to guide the selection of DMRs most likely to be significant, an analysis was performed using increasing magnitudes of differential methylation as exploratory cut-offs to define different DMR groups. The relationship between the magnitude of the mean methylation difference in each DMR and the pcomb value is plotted in Figure 24 below. There was no relationship between pcomb value and the magnitude of methylation difference. In fact, several DMRs with highly statistically significant pcomb values reveal very small mean methylation differences.

Figure 24: No relationship between magnitude of methylation difference and Stouffer p value

In order to visualise the consequences of ranking DMRs by pcomb, compared with ranking DMRs by magnitude of differential methylation, the top 10 DMRs ranked by each method are presented in Table 27 below. The comparison suggests that, despite selecting highly statistically significant pcomb values, a weakness in using the pcomb value as the only means of selecting DMRs is the identification of DMRs with mean methylation differences of 1%. The alternative method, using the magnitude of methylation difference, also presents issues as it appears to favour DMRs containing a very small number of CpG sites: of the 10 DMRs with the largest methylation difference, nine contained only two CpG sites and one contained three.

Table 27: Top DMRs ranked by different methods

Chm | start | end | No. CpG | Min FDR p value | pcomb | Max. Δβ | Mean Δβ | Gene
A. Top DMRs by combined (Stouffer) p value ranking
6 | 33,130,696 | 33,149,047 | 140 | 1.31x10-65 | 7.88x10-114 | -0.031 | -0.009 | COL11A2
10 | 8,091,753 | 8,098,328 | 63 | 9.69x10-89 | 1.40x10-71 | 0.022 | 0.008 | GATA3
6 | 32,034,322 | 32,059,605 | 233 | 8.52x10-44 | 1.44x10-58 | -0.032 | -0.006 | RNA5SP206
6 | 32,012,897 | 32,033,307 | 167 | 1.93x10-32 | 2.45x10-56 | -0.028 | -0.006 | TNXB
16 | 54,962,001 | 54,967,714 | 25 | 6.87x10-36 | 1.12x10-42 | 0.021 | 0.012 | IRX5
7 | 27,203,430 | 27,206,907 | 26 | 1.49x10-57 | 3.78x10-42 | 0.037 | 0.013 | HOXA9
14 | 102,025,815 | 102,031,503 | 30 | 2.54x10-46 | 4.82x10-41 | 0.031 | 0.012 | DIO3
14 | 29,234,671 | 29,237,480 | 28 | 6.07x10-47 | 1.13x10-38 | 0.027 | 0.014 | FOXG1
8 | 24,770,342 | 24,773,060 | 19 | 3.41x10-45 | 1.95x10-36 | 0.032 | 0.016 | NEFM
16 | 51,183,363 | 51,189,291 | 43 | 8.52x10-35 | 5.36x10-36 | 0.025 | 0.010 | SALL1
B. Top DMRs by magnitude of mean differential methylation
5 | 1,489,875 | 1,489,889 | 2 | 1.16x10-12 | 1.01x10-09 | -0.048 | -0.040 |
5 | 1,342,124 | 1,342,172 | 3 | 6.89x10-13 | 3.83x10-10 | -0.039 | -0.033 | CLPTM1L
21 | 38,349,937 | 38,350,069 | 2 | 6.78x10-07 | 1.94x10-05 | -0.036 | -0.029 |
19 | 38,918,135 | 38,918,253 | 2 | 3.78x10-14 | 2.28x10-10 | -0.035 | -0.032 | RASGRP4
8 | 104,185,332 | 104,185,501 | 2 | 9.33x10-11 | 5.65x10-08 | -0.034 | -0.030 | RP11
6 | 134,757,763 | 134,758,490 | 2 | 1.82x10-09 | 7.02x10-09 | -0.032 | -0.029 | LINC01010
16 | 84,628,969 | 84,629,008 | 2 | 1.02x10-11 | 3.69x10-09 | -0.031 | -0.031 | RP11
16 | 23,850,106 | 23,850,404 | 2 | 1.40x10-08 | 3.75x10-07 | -0.031 | -0.031 | PRKCB
9 | 140,446,823 | 140,446,993 | 2 | 4.42x10-05 | 8.42x10-04 | -0.031 | -0.030 | PNPLA7
8 | 27,144,085 | 27,144,453 | 2 | 3.44x10-09 | 5.53x10-05 | 0.047 | 0.030 |

(A) Ranked by pcomb. Top DMRs contain a large number of CpG sites, predominantly with gain of methylation. (B) Ranked by magnitude of methylation difference. Top DMRs contain a small number of CpG sites, predominantly with loss of methylation. Δβ=methylation difference (case-control)

Evaluating different thresholds for calling DMRs

Three different types of thresholds for calling DMRs were evaluated.

A. Restricting DMRs to those in which a minimum level of methylation difference was detected
B. Restricting DMRs based upon the pcomb value
C. Altering DMRcate settings, resulting in a change in the definition of a DMR by changing the allowable genomic distance between CpG sites within a DMR

A. Evaluating different thresholds of methylation difference

Within each DMR, DMRcate reports the methylation difference between cases and controls, averaged across all CpG sites contained within the DMR as well as the maximum methylation difference of the most differentially methylated CpG site.

Exploratory thresholds were applied to DMRs, starting with all DMRs reaching FDR p<0.05 (n=9,857). Different cut-offs based on the magnitude of methylation difference within the DMRs were applied to evaluate the effect on the number of DMRs, presented in Table 28 below. The number of DMRs captured at each increasing level of stringency is shown, alongside the number of those DMRs containing at least 5 CpG sites, another criterion for DMR selection suggested by Istas et al (263).

Table 28: Number of DMRs with mean and maximum methylation difference 2 - 4%.

DMRs, FDR p<0.05 | Total no. of DMRs | No. DMRs with methylation gain* | No. DMRs with methylation loss* | Mean no. CpGs per DMR
No Δβ cutoff | 9857 | 3496 (1432) | 6361 (2300) | 6.8
Mean Δβ >0.01 | 3183 | 1054 (482) | 2128 (203) | 5.8
Mean Δβ >0.02 | 262 | 30 (4) | 232 (14) | 2.8
Mean Δβ >0.03 | 6 | 0 | 6 (0) | 2.0
Max Δβ >0.01 | 8885 | 3051 (2062) | 5833 (1371) | 6.1
Max Δβ >0.02 | 3274 | 1016 (771) | 2258 (648) | 7.2
Max Δβ >0.03 | 342 | 121 (103) | 221 (74) | 9.5
Max Δβ >0.04 | 24 | 11 (7) | 13 (7) | 10.8

Δβ=methylation difference (case-control)
* The figure in brackets following the number of DMRs is the number of DMRs containing at least 5 CpG sites
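The effect of applying magnitude and size cut-offs to a list of DMRs can be sketched as a simple filter. The four example records are taken from Table 27; the class and function names are hypothetical, and the use of "greater than or equal to" for the cut-offs is an assumption made for the illustration.

```python
from typing import List, NamedTuple


class DMR(NamedTuple):
    gene: str
    n_cpgs: int
    max_delta_beta: float   # largest single-site methylation difference
    mean_delta_beta: float  # methylation difference averaged over the DMR


def filter_dmrs(dmrs: List[DMR], min_abs_mean=0.0, min_abs_max=0.0, min_cpgs=1):
    """Keep DMRs whose absolute mean and maximum differences and CpG count
    all meet the given minimum values (gains and losses are treated alike)."""
    return [
        d for d in dmrs
        if abs(d.mean_delta_beta) >= min_abs_mean
        and abs(d.max_delta_beta) >= min_abs_max
        and d.n_cpgs >= min_cpgs
    ]


# Four DMRs from Table 27 (gene, no. CpGs, max Δβ, mean Δβ).
EXAMPLE_DMRS = [
    DMR("COL11A2", 140, -0.031, -0.009),
    DMR("HOXA9", 26, 0.037, 0.013),
    DMR("CLPTM1L", 3, -0.039, -0.033),
    DMR("PRKCB", 2, -0.031, -0.031),
]

# A mean-difference cut-off keeps only the small DMRs...
print([d.gene for d in filter_dmrs(EXAMPLE_DMRS, min_abs_mean=0.02)])
# ...whereas a maximum-difference cut-off with a size criterion keeps the large ones.
print([d.gene for d in filter_dmrs(EXAMPLE_DMRS, min_abs_max=0.03, min_cpgs=5)])
```

Even in this small example the two types of cut-off select non-overlapping sets of DMRs, which is the pattern seen across the full results.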

I. Applying threshold of mean methylation difference ≥2%

First, a mean methylation difference of ≥2% was selected as a threshold for DMRs for further consideration. The most striking observation is that the vast majority of DMRs exhibit a mean methylation difference of <2%. The second observation is that when a threshold of mean methylation difference ≥2% is used, the resulting DMRs contain an average of only 2.8 CpG sites.

Using this cut-off, 262 DMRs were identified, of which 232 had a loss of methylation and of which 161 were associated with a known gene (130 unique genes as some genes contained several DMRs). Ontology analysis of the 130 unique genes was performed, identifying 11 KEGG pathways, shown in Table 29 below. Pathways particularly applicable to MBCN include cytokine-cytokine receptor interaction, TNF signaling pathway, chemokine signaling pathway and T-cell receptor signaling pathway.

Fifteen DMRs demonstrated a gain of methylation, 13 of which were associated with known genes: GPR146 (G protein-coupled receptor 146), CRMP1, CHIC2, the imprinted gene DUSP22, FAM198, PRR33, TDRP and four zinc finger protein genes ZSCAN1, ZNF471, ZNF728 and ZNF835.

Table 29: KEGG pathways associated with genes demonstrating a loss of methylation within a DMR (threshold for DMR = mean methylation difference ≥2%)

KEGG pathway | No. genes | P value | Genes
Cytokine-cytokine receptor interaction | 7 | 2.8x10-3 | CCL5, CCR2, CCR8, TNFRSF17, IFNG, IL25, IL5
Olfactory transduction | 9 | 3.3x10-3 |
TNF signaling pathway | 5 | 4.1x10-3 | AKT1, CCL5, NFKBIA, TRAF5, VCAM1
Chemokine signaling pathway | 6 | 5.6x10-3 | AKT1, CCL5, CCR2, CCR8, NFKBIA, PRKCZ
African trypanosomiasis | 3 | 1.8x10-2 |
HIF-1 signaling pathway | 4 | 2.3x10-2 |
Influenza A | 5 | 2.4x10-2 |
T-cell receptor signaling pathway | 4 | 2.6x10-2 | AKT1, NFKBIA, IFNG, IL5
Chagas disease | 4 | 3.1x10-2 |
Insulin resistance | 4 | 3.1x10-2 |
NOD-like receptor signaling pathway | 3 | 4.6x10-2 |

II. Applying threshold of mean methylation difference ≥3%

Second, a threshold of mean methylation difference of ≥3% was applied. Six DMRs were identified, shown in Table 30, all with loss of methylation and containing only a small number of CpG sites (2-3 CpGs per DMR). Of these six DMRs, three were associated with known genes: RASGRP4, PRKCB (protein kinase C beta, associated with the MAPK signaling pathway) and PNPLA7.

Table 30: DMRs found after applying a threshold of mean methylation difference of ≥3%.

Chm | start | end | No. CpG | Min. FDR p value | Stouffer p value | Max. Δβ | Mean Δβ | Gene
Methylation loss
5 | 1342124 | 1342172 | 3 | 6.89x10-13 | 3.83x10-10 | -0.039 | -0.033 | CLPTM1L
5 | 1489875 | 1489889 | 2 | 1.16x10-12 | 1.01x10-09 | -0.048 | -0.040 |
8 | 104185332 | 104185501 | 2 | 9.33x10-11 | 5.65x10-8 | -0.034 | -0.030 | RP11-318M2.2
16 | 23850106 | 23850404 | 2 | 1.40x10-8 | 3.75x10-07 | -0.031 | -0.031 | PRKCB
16 | 84628969 | 84629008 | 2 | 1.02x10-11 | 3.69x10-09 | -0.031 | -0.031 | RP11-61F12.1
19 | 38918135 | 38918253 | 2 | 3.78x10-14 | 2.28x10-10 | -0.035 | -0.032 | RASGRP4*
*= a DMR containing a DMP (p<1.2x10-7)

III. Applying threshold of maximum methylation difference

Third, thresholds of maximum methylation difference of 1%, 2%, 3% and 4% were applied. Other authors have applied a maximum methylation cut-off as one criterion for defining a DMR, but these studies involved tumour tissue DNA rather than peripheral blood DNA (203, 263). In peripheral blood DNA studies, the magnitude of methylation difference at individual CpG sites is in the order of 0.5-3% (170, 171, 173, 174, 179, 267). Twenty-four DMRs were identified using the threshold of 4%, 11 with a gain of methylation, summarised in Table 31. All CpG sites retained highly significant minimum p values and Stouffer (combined) p values. DMRs associated with genes of interest in MBCN included the oncogene NOTCH4, the homeobox gene GSX1 and the imprinted gene DUSP22. NOTCH4 was also identified in the DMP analysis (using logistic regression, presented in Chapter 5.2) as containing three differentially methylated CpG sites. The DMR associated with NOTCH4 is very large, containing 56 CpG sites with a highly statistically significant combined p value (p=1.99x10^-14) but only a small mean methylation difference of 1%. In GSX1, one DMP was identified by logistic regression, and the associated DMR was also large, containing 20 CpG sites (p=2.13x10^-17) with a mean methylation difference of only 1%.

Table 31: DMRs found after applying a threshold of maximum methylation difference of ≥4%.

| Chm | start | end | No. CpG | Min. FDR p value | Stouffer p value | Max. Δβ | Mean Δβ | Gene |
|---|---|---|---|---|---|---|---|---|
| Methylation loss | | | | | | | | |
| 3 | 38664022 | 38664451 | 4 | 3.31x10^-10 | 9.51x10^-06 | -0.06 | -0.02 | |
| 5 | 1489875 | 1489889 | 2 | 1.16x10^-12 | 1.01x10^-09 | -0.05 | -0.04 | |
| 1 | 988623 | 989392 | 4 | 1.80x10^-13 | 1.95x10^-03 | -0.05 | -0.01 | AGRN* |
| 6 | 32184296 | 32192024 | 56 | 4.99x10^-49 | 1.99x10^-14 | -0.05 | -0.01 | NOTCH4* |
| 6 | 33083287 | 33083989 | 5 | 1.98x10^-05 | 6.72x10^-04 | -0.04 | -0.01 | |
| 16 | 2801092 | 2802711 | 15 | 1.93x10^-30 | 1.81x10^-06 | -0.04 | -0.01 | SRRM2-AS1* |
| 20 | 23548991 | 23550668 | 10 | 1.78x10^-11 | 4.28x10^-07 | -0.04 | -0.01 | CST9L |
| 12 | 129279936 | 129282057 | 10 | 1.13x10^-34 | 7.23x10^-33 | -0.04 | -0.03 | |
| 14 | 63670940 | 63671737 | 4 | 2.51x10^-06 | 1.04x10^-04 | -0.04 | -0.02 | RHOJ |
| 16 | 31538556 | 31539234 | 6 | 3.19x10^-16 | 2.77x10^-05 | -0.04 | -0.02 | AHSP* |
| 1 | 205497713 | 205499355 | 6 | 6.55x10^-07 | 4.19x10^-06 | -0.04 | -0.02 | CDK18* |
| 2 | 128453108 | 128453484 | 5 | 6.84x10^-06 | 2.65x10^-04 | -0.04 | -0.03 | |
| 1 | 159046391 | 159047034 | 6 | 5.60x10^-24 | 3.10x10^-17 | -0.04 | -0.03 | AIM2 |
| Methylation gain | | | | | | | | |
| 10 | 134876495 | 134876895 | 2 | 1.70x10^-04 | 5.43x10^-03 | 0.05 | 0.02 | |
| 8 | 27144085 | 27144453 | 2 | 3.44x10^-09 | 5.53x10^-05 | 0.05 | 0.03 | |
| 4 | 9782243 | 9783965 | 14 | 1.22x10^-45 | 1.00x10^-22 | 0.05 | 0.02 | DRD5* |
| 4 | 5894691 | 5895410 | 5 | 3.41x10^-16 | 1.47x10^-08 | 0.05 | 0.02 | CRMP1* |
| 12 | 126675667 | 126676790 | 6 | 1.32x10^-14 | 1.12x10^-09 | 0.04 | 0.02 | RP4 |
| 20 | 11869467 | 11872261 | 13 | 2.68x10^-25 | 4.45x10^-10 | 0.04 | 0.01 | BTBD3 |
| 1 | 63782395 | 63790202 | 39 | 7.71x10^-17 | 5.15x10^-19 | 0.04 | 0.01 | FOXD3* |
| 6 | 290800 | 293331 | 12 | 8.92x10^-08 | 5.00x10^-08 | 0.04 | 0.03 | DUSP22 |
| 6 | 166856056 | 166856094 | 3 | 6.47x10^-05 | 3.62x10^-04 | 0.04 | 0.03 | |
| 13 | 28365587 | 28369574 | 20 | 2.28x10^-33 | 2.13x10^-17 | 0.04 | 0.01 | GSX1* |

*= a DMR containing a DMP (p<1.2x10^-7)

In summary, it is possible to limit DMRs to those with a mean methylation difference of at least 2% and retain a significant number of DMRs for further analysis. The limitation of DMRs to a mean methylation difference of ≥3% resulted in a very small number of DMRs. Restriction of DMRs by mean methylation difference appears to select out DMRs containing smaller numbers of CpG sites and this seems to be a limitation of the approach.

B. Evaluating thresholds of p value in DMRs

Next, exploratory thresholds of the combined (Stouffer) p value were applied. Table 32 shows the number of DMRs captured at each level of stringency, alongside the number of those DMRs containing at least 5 CpG sites. Thresholds started at pcomb<1x10^-8, the stringent cut-off proposed for studies employing arrays with approximately 500K probes (168). Increasing stringency of pcomb appears to select for DMRs containing larger numbers of CpG sites.
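The tendency for stringent combined p values to favour DMRs with many CpG sites can be illustrated with a minimal sketch of an unweighted Stouffer combination. This is an illustration only: it assumes independent one-sided p values, the function name is hypothetical, and the exact combination used by the DMR-calling software may differ.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combined_p(p_values):
    """Combine one-sided p values with the unweighted Stouffer method.

    Hypothetical helper for illustration; assumes independent tests.
    """
    if not p_values:
        raise ValueError("at least one p value is required")
    if any(not 0.0 < p < 1.0 for p in p_values):
        raise ValueError("p values must lie strictly between 0 and 1")
    # Convert each p value to a z-score (a small p gives a large positive z).
    z_scores = [-_STD_NORMAL.inv_cdf(p) for p in p_values]
    # The combined z-score grows with the square root of the number of sites.
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    # Upper-tail probability of the combined z-score.
    return _STD_NORMAL.cdf(-combined_z)
```

Under these assumptions, a region with ten CpG sites each at p=0.05 receives a smaller combined p value than a region with two such sites, so a strict pcomb threshold preferentially retains regions with more CpG sites.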

Table 32: Number of DMRs according to different pcomb value thresholds

| pcomb threshold | No. of DMRs | No. DMRs with methylation gain* | No. DMRs with methylation loss* | Mean no. CpG sites per DMR |
|---|---|---|---|---|
| No cut-off | 9857 | 3496 (1432) | 6361 (2300) | 6.1 |
| pcomb<1x10^-8 | 1120 | 633 (614) | 487 (258) | 13.1 |
| pcomb<1x10^-15 | 239 | 179 (179) | 60 (57) | 23.1 |
| pcomb<1x10^-20 | 91 | 67 (67) | 24 (24) | 30.6 |
| pcomb<1x10^-25 | 45 | 33 (33) | 12 (12) | 40.5 |

* The figure in brackets following the number of DMRs is the number of DMRs containing at least 5 CpG sites

I. Effect of restricting DMRs to those with Stouffer (combined) p<1x10^-15

After applying a threshold of Stouffer (combined) p<1x10^-15, there were 239 significant DMRs, of which 179 had a gain of methylation and 60 had a loss. The effect of applying a more stringent Stouffer p value to the data was that nearly all resultant DMRs contained at least 5 CpG sites.

Of the 179 DMRs with a gain in methylation, gene ontology analysis was performed on the 167 associated with a named gene, identifying seven genes in the KEGG "Pathways in cancer" pathway. Of the 60 DMRs with a loss in methylation, gene ontology analysis was performed on the 51 named genes, identifying genes associated with the pro-proliferative pathways PI3K-Akt, Ras signaling, B cell receptor signaling and cytokine interactions (Table 33).

Table 33: KEGG pathway analysis of DMRs, p<1x10^-15

| KEGG pathway | No. genes | P value | Genes |
|---|---|---|---|
| DMRs with increased methylation | | | |
| Maturity onset diabetes of the young | 4 | 5.7x10^-4 | |
| cAMP signaling pathway | 5 | 3.6x10^-2 | |
| Pathways in cancer | 7 | 3.7x10^-2 | KITLG, EGLN3, EDNRB, FZD10, IGF1R, PDGFRA, PTGS2 |
| Axon guidance | 4 | 4.7x10^-2 | |
| DMRs with reduced methylation | | | |
| PI3K-Akt signaling pathway | 7 | 9.3x10^-4 | CD19, GNB2, COL11A2, FGF21, GRB2, INS, TNXB |
| Ras signaling pathway | 5 | 7.1x10^-3 | GNB3, FGF21, GRB2, INS, KSR1 |
| Type I diabetes mellitus | 3 | 9.1x10^-3 | |
| B cell receptor signaling pathway | 3 | 2.4x10^-2 | CD19, CD81, GRB2 |
| Cytokine-cytokine receptor interaction | 4 | 4.4x10^-2 | CCR6, TNFRSF13B, LTA, LTB |

II. Effect of restricting DMRs to those with Stouffer (combined) p<1x10^-25

A more stringent threshold of Stouffer p<1x10^-25 was applied and the resulting 45 DMRs are all listed in Table 34 below, including the methylation differences. These highly stringently selected DMRs tend to contain larger numbers of CpGs within each DMR (average 40 CpG sites per DMR).

The genes of interest in MBCN include the homeobox genes HOXA9, HOXD11, GBX2, ISL1, NKX6-1, NKX6-2, TLX3, UNCX, EVX2, IRX5, MSX1 and TSHZ3, the transcription regulators SOX1, SOX21, TBX15, EBF3, FOXA1, FOXG1, FOXI2 and NR2F2, and GATA3, a positive regulator of T cell differentiation.

Table 34: DMRs with Stouffer p<1x10^-25 (λ=1000), ranked by magnitude of maximum methylation difference.

| Chm | start | end | No. CpGs | Min. FDR p value | Stouffer p value | Max. Δβ | Mean Δβ | Gene |
|---|---|---|---|---|---|---|---|---|
| Methylation loss | | | | | | | | |
| 1 | 1096717 | 1106175 | 47 | 4.43x10^-27 | 6.47x10^-29 | -0.022 | -0.009 | MIR429 |
| 1 | 25290947 | 25292412 | 18 | 2.02x10^-48 | 1.19x10^-29 | -0.028 | -0.018 | RUNX3^ |
| 4 | 40517938 | 40519094 | 8 | 6.58x10^-36 | 6.38x10^-26 | -0.035 | -0.021 | RBM47* |
| 6 | 30618299 | 30619242 | 13 | 1.03x10^-50 | 5.37x10^-30 | -0.033 | -0.015 | C6orf136*^ |
| 6 | 31538984 | 31543300 | 30 | 2.13x10^-36 | 7.84x10^-34 | -0.028 | -0.018 | LTA^ |
| 6 | 32012897 | 32033307 | 167 | 1.93x10^-32 | 2.45x10^-56 | -0.028 | -0.006 | TNXB* |
| 6 | 32034322 | 32059605 | 233 | 8.52x10^-44 | 1.44x10^-58 | -0.032 | -0.006 | RNA5SP206 |
| 6 | 33130696 | 33149047 | 140 | 1.31x10^-65 | 7.88x10^-114 | -0.031 | -0.009 | COL11A2*^ |
| 6 | 33240090 | 33249087 | 99 | 7.86x10^-62 | 3.54x10^-35 | -0.030 | -0.007 | VPS52 |
| 12 | 129279936 | 129282057 | 10 | 1.13x10^-34 | 7.23x10^-33 | -0.043 | -0.025 | |
| 14 | 94423192 | 94425114 | 9 | 2.48x10^-32 | 7.42x10^-26 | -0.021 | -0.015 | ASB2* |
| 14 | 106319730 | 106322429 | 13 | 6.25x10^-27 | 6.88x10^-28 | -0.020 | -0.014 | IGHM |
| Methylation gain | | | | | | | | |
| 1 | 50879560 | 50893149 | 56 | 1.75x10^-19 | 6.13x10^-27 | 0.024 | 0.007 | DMRTA2* |
| 1 | 119526060 | 119532925 | 42 | 1.04x10^-21 | 2.18x10^-26 | 0.028 | 0.010 | TBX15 |
| 2 | 14772312 | 14775494 | 21 | 4.26x10^-24 | 1.46x10^-26 | 0.026 | 0.015 | FAM84A |
| 2 | 176943669 | 176953512 | 44 | 1.81x10^-20 | 1.96x10^-30 | 0.025 | 0.011 | EVX2 |
| 2 | 176968052 | 176974060 | 28 | 3.27x10^-39 | 2.57x10^-29 | 0.036 | 0.013 | HOXD11*^ |
| 2 | 237072391 | 237080535 | 39 | 1.19x10^-32 | 2.77x10^-33 | 0.030 | 0.012 | GBX2^ |
| 4 | 4856380 | 4862240 | 45 | 1.29x10^-42 | 9.15x10^-30 | 0.033 | 0.009 | MSX1 |
| 4 | 85416865 | 85421050 | 26 | 5.82x10^-26 | 3.71x10^-26 | 0.022 | 0.009 | NKX6-1 |
| 5 | 50678451 | 50679817 | 16 | 4.80x10^-40 | 1.15x10^-26 | 0.028 | 0.015 | ISL1^ |
| 5 | 170734312 | 170739853 | 25 | 5.20x10^-25 | 2.44x10^-29 | 0.024 | 0.011 | TLX3 |
| 6 | 133561386 | 133563532 | 41 | 1.70x10^-56 | 2.67x10^-28 | 0.021 | 0.009 | EYA4^ |
| 7 | 1269181 | 1278508 | 42 | 4.81x10^-20 | 2.32x10^-33 | 0.021 | 0.010 | UNCX |
| 7 | 27203430 | 27206907 | 26 | 1.49x10^-57 | 3.78x10^-42 | 0.037 | 0.013 | HOXA9*^ |
| 7 | 28995458 | 28998860 | 24 | 7.08x10^-28 | 8.52x10^-26 | 0.026 | 0.008 | TRIL^ |
| 7 | 79081455 | 79084196 | 28 | 5.19x10^-40 | 1.19x10^-29 | 0.030 | 0.012 | MAGI2*^ |
| 8 | 24770342 | 24773060 | 19 | 3.41x10^-45 | 1.95x10^-36 | 0.032 | 0.016 | NEFM*^ |
| 10 | 8091753 | 8098328 | 63 | 9.69x10^-89 | 1.40x10^-71 | 0.022 | 0.008 | GATA3*^ |
| 10 | 25463757 | 25466027 | 24 | 3.97x10^-37 | 6.51x10^-27 | 0.032 | 0.012 | GPR158^ |
| 10 | 129533998 | 129537990 | 23 | 1.57x10^-36 | 2.65x10^-31 | 0.037 | 0.009 | FOXI2 |
| 10 | 131760778 | 131770909 | 46 | 1.78x10^-24 | 1.64x10^-34 | 0.022 | 0.010 | EBF3* |
| 10 | 134598352 | 134602228 | 41 | 6.07x10^-39 | 1.03x10^-26 | 0.024 | 0.011 | NKX6-2^ |
| 13 | 95362829 | 95366456 | 24 | 3.76x10^-38 | 2.95x10^-28 | 0.017 | 0.009 | SOX21^ |
| 13 | 112720171 | 112724454 | 27 | 2.63x10^-29 | 2.35x10^-27 | 0.027 | 0.013 | SOX1^ |
| 14 | 29234671 | 29237480 | 28 | 6.07x10^-47 | 1.13x10^-38 | 0.027 | 0.014 | FOXG1^ |
| 14 | 38063564 | 38065582 | 15 | 4.57x10^-36 | 8.22x10^-26 | 0.025 | 0.012 | FOXA1^ |
| 14 | 85995655 | 85998669 | 22 | 4.50x10^-28 | 2.07x10^-27 | 0.030 | 0.013 | FLRT2* |
| 14 | 102025815 | 102031503 | 30 | 2.54x10^-46 | 4.82x10^-41 | 0.031 | 0.012 | DIO3*^ |
| 15 | 96873850 | 96877300 | 27 | 3.78x10^-30 | 7.32x10^-33 | 0.024 | 0.012 | NR2F2 |
| 16 | 51183363 | 51189291 | 43 | 8.52x10^-35 | 5.36x10^-36 | 0.025 | 0.010 | SALL1^ |
| 16 | 54962001 | 54967714 | 25 | 6.87x10^-36 | 1.12x10^-42 | 0.021 | 0.012 | IRX5^ |
| 17 | 79367615 | 79374515 | 30 | 7.56x10^-25 | 3.26x10^-29 | 0.026 | 0.008 | RP11 |
| 18 | 76737392 | 76741157 | 17 | 6.53x10^-26 | 1.28x10^-28 | 0.032 | 0.014 | SALL3 |
| 19 | 31840737 | 31848310 | 30 | 1.88x10^-28 | 4.60x10^-28 | 0.021 | 0.008 | TSHZ3 |

*= a DMR containing a DMP (p<1.2x10^-7)
^ = a DMR also identified following tuning of kernel smoothing (λ=500)

C. Evaluation of tunable kernel smoothing

DMRcate permits tunable kernel smoothing, allowing the user to set the maximum distance between CpG sites considered to be part of a DMR. The authors of DMRcate, Peters et al, suggest that λ can be reduced to a distance of 500bp between CpGs (the default value is λ=1,000), with the effect of increasing stringency. Further reduction of λ is not recommended, as this is likely to result in false negative results.
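The role of the distance parameter can be sketched with a simplified grouping step. This is not the DMRcate implementation: it omits the Gaussian kernel smoothing and significance testing entirely and only shows how a maximum gap between neighbouring significant CpG sites determines region boundaries. The function names and the example coordinates are hypothetical.

```python
def group_by_gap(positions, max_gap):
    """Group genomic positions so that neighbours within a group are at most max_gap bp apart."""
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)  # close enough: extend the current group
        else:
            groups.append([pos])  # gap too large: start a new group
    return groups


def candidate_regions(positions, max_gap, min_sites=2):
    """Keep only groups with at least min_sites CpG sites (more than one site by default)."""
    return [g for g in group_by_gap(positions, max_gap) if len(g) >= min_sites]


# Hypothetical significant CpG positions on one chromosome.
example_positions = [100, 400, 1200, 1900, 5000]
regions_lambda_1000 = candidate_regions(example_positions, max_gap=1000)
regions_lambda_500 = candidate_regions(example_positions, max_gap=500)
```

In this toy example the λ=1,000 setting yields one four-site region, whereas λ=500 yields a shorter two-site region with different boundaries. Boundary shifts of this kind are one possible reason why region sets called under different λ settings need not coincide.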

Using λ=500 and FDR p<0.05, 6,731 DMRs were identified, confirming an effect of increased stringency compared with the standard λ=1,000 setting. On comparison of the DMRs identified using both λ settings, unexpectedly there was an overlap of only 29% of DMRs, with the more stringent λ=500 setting identifying 2,978 novel DMRs. The significance of the differences in DMRs identified is not clear and, ideally, should be clarified in a validation cohort.

Figure 25: Comparison of DMRs identified using λ=1000 and λ=500

In summary, the effect of tuning the kernel size down and restricting the defined distance between CpG sites within a DMR was to create a slightly more stringent analysis; however, the identification of a large number of novel DMRs was unexpected, highlighting the sensitivity of DMR analysis to changes in analysis parameters.

Final DMR definition

Based on the evaluations above, a final selection of DMRs was defined requiring at least 5 CpG sites, a pcomb<1x10^-8 and a maximum methylation difference of >1%, yielding 433 DMRs.
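The final definition amounts to three simultaneous filters, which can be sketched as follows. The class and function names are hypothetical, and the maximum methylation difference is assumed to be compared as an absolute value on the beta scale (1% = 0.01). The three example rows take their values from Tables 31 and 34.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dmr:
    label: str             # gene name or genomic coordinate
    n_cpgs: int            # number of CpG sites in the region
    stouffer_p: float      # combined (Stouffer) p value
    max_delta_beta: float  # maximum methylation difference (signed)


def passes_final_definition(dmr, min_cpgs=5, max_p=1e-8, min_abs_delta=0.01):
    """Apply the three criteria: >=5 CpG sites, pcomb<1e-8, |max delta beta|>1%."""
    return (
        dmr.n_cpgs >= min_cpgs
        and dmr.stouffer_p < max_p
        and abs(dmr.max_delta_beta) > min_abs_delta
    )


EXAMPLES = [
    Dmr("GATA3", 63, 1.40e-71, 0.022),         # Table 34
    Dmr("chr5:1489875", 2, 1.01e-09, -0.048),  # Table 31: too few CpG sites
    Dmr("AGRN", 4, 1.95e-03, -0.05),           # Table 31: fails CpG count and p value
]

selected = [d.label for d in EXAMPLES if passes_final_definition(d)]
```

Of these three examples only the GATA3 region satisfies all criteria, which illustrates how regions with a large maximum difference but few CpG sites are excluded.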

DMRs with gain of methylation

Of the 433 DMRs, 256 demonstrated increased methylation, of which 223 were associated with a named gene. Ontology analysis revealed a number of statistically significant KEGG pathways but none of likely relevance to MBCN (Table 35). Sixteen DMRs corresponded to genes described as hypermethylated in MBCN: the homeobox genes SOX11, HOXD11, HOXD8, HOXA9, SOX1, DIO3 and ONECUT2, as well as ROBO1, ID4, NPTX2, SFRP1, MYOD1, FOXG1, FOXA1 and ZNF471. Ten DMRs corresponded to genes known from the literature review (Chapter 2) to carry recurrent mutations in MBCN: SOX11, SLC4A10, ROBO1, HOXA9, NPTX2, DPP6, UNC5D, LUZP2 and NR2F2.

DMRs with loss of methylation

Of the 433 DMRs, 177 demonstrated a loss of methylation and 154 of these were associated with a named gene. Ontology analysis, performed using DAVID, revealed a number of KEGG pathways, two of which were of particular relevance to MBCN: B cell receptor signaling and chemokine signaling (Table 35). Genes related to the B cell receptor signaling pathway included CD81, CD19, PIK3R5 and GRB2. Other KEGG pathways of interest in MBCN were those related to regulation of autoimmune disease and immune response to infection, including genes in the major histocompatibility complex such as HLA-DMA, HLA-DOB and HLA-DQA2. Three DMRs corresponded to genes associated with recurrent mutation in MBCN: SP140, LTB and DTX1. None of the DMRs with a loss of methylation corresponded to any hypomethylated gene described in MBCN literature.

Table 35: KEGG pathways for final DMR list
DMRs defined as >5 CpG sites, pcomb<1x10^-8 and max. methylation difference >1%

| KEGG pathway | No. genes | P value | Genes |
|---|---|---|---|
| DMRs with increased methylation | | | |
| Neuroactive ligand-receptor interaction | 12 | 5.3x10^-5 | |
| Axon guidance | 7 | 1.3x10^-3 | |
| Glutamatergic synapse | 5 | 2.3x10^-2 | |
| Mucin type O-Glycan biosynthesis | 3 | 3.5x10^-2 | |
| Cell adhesion molecules | 5 | 4.6x10^-2 | |
| DMRs with reduced methylation | | | |
| Staphylococcus aureus infection | 6 | 1.0x10^-4 | |
| Type I diabetes | 4 | 5.9x10^-3 | |
| Rheumatoid arthritis | 5 | 7.4x10^-3 | CCL5, LTB, HLA-DMA, HLA-DOB, HLA-DQA2 |
| Intestinal immune network for IgA production | 4 | 8.1x10^-3 | |
| Cell adhesion molecules | 6 | 8.1x10^-3 | HLA-DMA, HLA-DOB, HLA-DQA2, MAG, NRXN2, PDCD1 |
| Viral myocarditis | 4 | 1.4x10^-3 | |
| Influenza A | 6 | 1.8x10^-2 | |
| Tuberculosis | 6 | 1.0x10^-2 | |
| B cell receptor signaling pathway | 4 | 2.3x10^-2 | CD19, CD81, GRB2, PIK3R5 |
| Chemokine signaling pathway | 6 | 2.4x10^-2 | CCL22, CCL5, CCR6, GNB3, GRB2, PIK3R5 |
| Asthma | 3 | 3.4x10^-2 | |
| Graft-versus-host-disease | 3 | 2.1x10^-2 | HLA-DMA, HLA-DOB, HLA-DQA2 |

Discussion

The identification of a differentially methylated region, or cluster of CpG sites, as opposed to the detection of a single differentially methylated CpG site may be associated with greater biological significance, in particular when close to the promoter region of a gene (168, 261). The identification of DMRs is, therefore, an increasingly important component of describing differential methylation in genome-wide studies. Using pre-defined criteria for the identification of DMRs (FDR p<0.05, distance between CpG sites <1,000 bp and more than one CpG site), 9,857 DMRs were identified. The pattern of methylation in these DMRs was biologically valid; there was a gain of methylation within DMRs containing high proportions of CpG island-associated and promoter-associated CpG sites. Conversely, DMRs with low CpG island content demonstrated loss of methylation.

Without an available consensus on the ideal criteria by which to identify a DMR, and guided by methods described in the literature (203, 263), a selection of variably stringent thresholds was applied in order to narrow down the large number of initially identified DMRs. The two main approaches to identifying significant DMRs are ranking DMRs either by the magnitude of methylation difference or by the combined (Stouffer) p value. Each approach identified a completely different set of DMRs: either DMRs with large methylation differences but containing a small number of CpG sites and therefore high p values, or DMRs with very small p values and small absolute methylation differences.

A comparison was made, therefore, between methylation in the 9,857 DMRs identified and those that were described in MBCN literature, in order to evaluate features of DMRs demonstrating concordant methylation difference. Due to the interest in the functional association between promoter methylation and gene silencing, the vast majority of literature relates to hypermethylated genes (78). While widespread hypomethylation is a consistent and well-described feature of all cancers, it often occurs in non-promoter regions (38) and is less frequently reported due to uncertainties about its functional significance. Concordant methylation, determined as the detection of methylation either gained or lost within a DMR, was demonstrated by a small proportion (77/9857) of DMRs. DMRs with discordant methylation to that in the literature were more likely to contain a small number of CpG sites, suggesting that DMRs containing fewer CpG sites could be less reliable. This observation suggests that it is important to consider the number of CpG sites within a DMR. The arbitrary threshold of >2 CpG sites used by some studies to adequately define a DMR in tumour (168, 261) may be inadequate for studies using peripheral blood-derived DNA in which methylation differences are likely to be very small, and additional measures to improve reliability are needed.

Differential methylation in HLA locus

An interesting finding was that of 151 DMRs occurring within a small area on the short arm of chromosome 6 (6p21.3) corresponding to the HLA locus. Genes associated with classes I, II and III HLAs were differentially methylated. HLA genes encode proteins of the major histocompatibility complex (MHC) whose function is to bind peptide fragments and present them to T cells, enabling them to initiate an appropriate immune response. Class I antigens are expressed on all cells and present small peptides to CD8 T cells (cytotoxic T cells), generally resulting in the killing of the cell presenting the foreign antigen. Class II antigens are limited in expression generally to B cells, monocytes and macrophages, which present peptides to CD4 T cells (helper T cells) whose role it is to recruit other components of the immune system including additional B cells. Defects in the MHC are associated with MBCN risk: deficient HLA molecule expression is associated with particular lymphoma histological types and the risk of relapse (268), while at least 11 common variants within the HLA locus have been identified as being associated with increased risk of MBCN.

The following factors were considered as possible confounders of this observation of differential methylation identified in the MHC. First, the presence of single nucleotide polymorphisms identified by genome-wide association studies could theoretically result in abnormal methylation; however, the polymorphisms were spread across different HLA genes and were therefore unlikely to have influenced methylation in one region. Second, systematic ethnic differences could potentially influence methylation findings because genes of the MHC are highly diverse with significant differences between ethnic groups (49). However, in our cohort, the ethnic background of all participants was Caucasian and ethnic origin (whether from Southern Europe or Australia/UK/New Zealand) was matched for each case and control pair. We would not, therefore, expect to find any systematic differences in HLA alleles in our cohort and no systematic methylation differences based on ethnicity. Confirmation of the absence of systematic differences in HLA alleles between cases and controls would need to be performed by sequencing. Third, a potential confounder of the methylation findings in the HLA locus is that class II antigen expression is restricted to B lymphocytes and, to a lesser degree, monocytes. Systematic differences in white blood cell content between cases and controls might affect methylation of HLA class II antigens (HLA-DR, HLA-DQ, HLA-DO, HLA-DM and HLA-DP). Validation of the results in a cohort in which the white blood cell content is known or in samples of purified B cells could determine whether the methylation findings for genes associated with class II antigens are due to differences in cell content.

Top DMRs by p value

Given that the likelihood of concordance between DMR findings and the methylation literature was predicted by DMRs containing a larger number of CpG sites, the top DMRs identified by Stouffer p value (which selects for DMRs with more CpG sites) are discussed in more detail. For the 239 DMRs with p<1x10^-15, ontology analysis suggests dysregulated methylation in PI 3-kinase and Ras signaling pathways and B cell receptor signaling. The PI3K and Ras signaling pathways regulate proliferation and survival of pre-B cells (49). The B cell receptor signaling pathway links the binding of surface proteins to the B cell receptor to the activation of a cytoplasmic cell signaling cascade that ultimately leads to activation of nuclear transcription factors that each drive pro-survival and proliferative effects (51).

For the 45 DMRs with p<1x10^-25, DMRs associated with homeobox genes were significantly differentially methylated. The genes of interest in MBCN include the homeobox genes HOXA9, HOXD11, GBX2, ISL1, NKX6-1, NKX6-2, TLX3, UNCX, EVX2, IRX5, MSX1 and TSHZ3, the transcription regulators SOX1, SOX21, TBX15, EBF3, FOXA1, FOXG1, FOXI2 and NR2F2, as well as GATA3, a positive regulator of T cell differentiation. The finding of differential methylation in regions associated with homeobox genes is concordant with the findings of the single CpG site methylation analysis (Chapter 5.2). Dysregulation of the homeobox genes is associated with interruption of cellular apoptotic pathways and is found in chronic lymphocytic leukaemia (42, 98) and in many solid tumours.

Establishing a combined threshold for calling DMRs

A final threshold for defining DMRs was proposed based upon the available literature and the data available. The criteria were: a moderately stringent combined p value (pcomb<1x10^-8), a 1% threshold for the magnitude of maximum differential methylation in line with other studies investigating peripheral blood methylation (170, 171, 173, 174, 179, 267) and a threshold for the minimum number of CpG sites within each DMR based upon previous studies (263).

Of the DMRs with a gain of methylation, a number were associated with genes known to be hypermethylated in MBCN. The first is SOX11, which encodes the SOX11 transcription factor. SOX11 is traditionally thought of as a growth promoter, and its expression is absent in most normal adult tissues. Consistent with this, SOX11 expression is reported in subsets of mantle cell lymphoma and Burkitt lymphoma (269). SOX11 expression appears to be particularly relevant in mantle cell lymphoma, where its expression is associated with an aggressive form of the lymphoma with inferior prognosis compared with SOX11 negative tumours (270). However, somewhat paradoxically, SOX11 is epigenetically silenced in mantle cell lymphoma and Burkitt lymphoma by both DNA methylation and histone methylation (271, 272). SOX11 expression is not a feature of other MBCN subtypes, and there is no further information to date regarding aberrant SOX11 methylation in other MBCN. Next, HOXA9, one of the homeobox family of genes, is known to be hypermethylated in follicular lymphoma (93, 100) and mantle cell lymphoma (46, 86). ROBO1 is methylated in mantle cell lymphoma and, although expression levels have not been shown to correlate with methylation, ROBO1 expression in normal lymphoid cells is lower than in lymphoma cells (86). The functional significance of ROBO1 is not clear, but the presence of ROBO1 methylation in solid tumours, and animal data showing that ROBO1 inactivation increases the likelihood of developing lymphoma, suggest that it could be a tumour suppressor gene (86). SFRP1 is a putative tumour suppressor gene that is hypermethylated in CLL (98) as well as in myeloma, in which methylation of SFRP1 is associated with gene silencing (144).

ZNF471 hypermethylation, with associated reduced gene expression, has been noted in CLL and solid tumours but its functional significance is unknown (112). Increased ZNF471 methylation is found in the blood of nonagenarians compared with young, healthy controls and is proposed to be one of a range of genes associated with an ageing methylome (273). HOXD11, HOXD8, ID4, FOXG1 and SOX1 are hypermethylated in CLL (98), FOXA1 in CLL and follicular lymphoma (98) and NPTX2 in mantle cell lymphoma, but these are of uncertain functional significance. ONECUT2 is hypermethylated in DLBCL and follicular lymphoma, but there is no demonstrated association with gene expression in these tumours and the functional relevance of ONECUT2 methylation is not known (93). MYOD1 is hypermethylated in follicular lymphoma and DLBCL (87) but another study found it to be hypomethylated in B cell lymphomas (121). DIO3 is hypermethylated in CLL (98) but hypomethylated in another study of mixed B cell malignancies (121). It is notable that HOXA9, SOX11, ROBO1 and NPTX2 are known to be mutated in a subset of MBCN cases, with the potential risk that low levels of circulating tumour DNA present in the pre-diagnostic blood samples could result in tumour-related mutations being misinterpreted as aberrant methylation.

Of the DMRs with a loss of methylation, there were two KEGG pathways of relevance to MBCN: the B cell receptor signaling and the chemokine signaling pathways. Associated genes included those encoding the cell surface proteins CD81 and CD19; PIK3R5, which encodes the PI 3-kinase regulatory subunit, an intracellular signaling enzyme that is part of the PI3K-Akt pathway; and GRB2, which plays a role in downstream signaling from the B cell receptor. There was also a KEGG pathway related to graft versus host disease which contained a number of genes related to regulation of autoimmune disease and immune response to infection, in particular genes in the major histocompatibility complex such as HLA-DMA, HLA-DOB and HLA-DQA2.

Limitations

Compared with the many published papers describing differential methylation at single CpG positions, there is very little guidance in the literature on the appropriate thresholds of differential methylation in CpG regions. The analysis presented here is, therefore, exploratory and requires comparison with the findings of the DMP analysis, published findings in the literature, and in future validation in another cohort.

Without adjustment for white blood cell content, it is possible that the DMRs identified here represent tissue-specific DMRs as described by Rakyan et al (260). Concordance with methylation findings reported in the literature supports the validity of the findings, but we cannot exclude the possibility that the effect is due to the presence of circulating tumour cells. A smaller list of DMRs does correlate with DMPs following white blood cell content correction. These may represent DMRs that predict MBCN independent of B cell content.

The magnitude of the methylation difference is small, with a mean methylation difference within each DMR in the order of 1-4%, and this has several possible interpretations. First, the differential methylation detected is less likely to reflect germline methylation, as this may be expected to result in a greater magnitude of methylation difference between cases and controls. The small magnitude of difference supports the hypothesis that these methylation differences are acquired, most likely occurring in a subset of circulating white blood cells. Given that DNA was sampled a mean of 10 years prior to diagnosis with MBCN, the burden of abnormal methylation is more likely to be low compared with DNA sampled at the time of MBCN diagnosis. For the majority of cases, the DNA source was whole peripheral blood consisting of different white blood cell types, only some of which may contain aberrant methylation.

5.4 Differential methylation variability

Background

Inter-individual variability of methylation is increasingly understood to be a feature of cancer tissue compared with normal tissue (274-277). The variability of methylation at one CpG site among a series of normal individuals is generally small. In comparison, at the same CpG site, a series of cancer samples frequently exhibits high degrees of methylation variability due to large inter-individual (not technical) differences in methylation. The phenomenon of variability of methylation at a particular site or region is thought to be an important evolutionary mechanism reflecting another pathway by which cells can adapt and survive (278). At the very least, patterns of methylation variability reflect distinctive cancer phenotypes (274, 277).

Phipson et al presented methylation data from kidney cancer in the figure below to illustrate that traditional methylation analysis measuring the difference in methylation between cancer and normal tissues identifies CpGs with the largest mean methylation difference. In comparison, measurement of differential methylation variability identifies CpGs in which the methylation values in cancer samples are highly variable compared with that in normal samples (277). Measuring methylation variability identifies a novel set of CpG sites with dysregulated methylation and using this information in addition to measuring differential methylation may lead to improvements in models of cancer prediction (279).

Figure 26: Differential methylation variability Beta methylation values at a single CpG site are shown. Each dot represents a single sample. In (A) the top differentially methylated CpG site is hypomethylated in cancer. Most of the samples with cancer show very similar levels of methylation. In (B) the top CpG site measured by differential methylation variance is characterised by a very broad range of methylation in cancer samples while in normal tissue the values are more tightly clustered. From Phipson et al. (277)

The mechanism by which methylation variance occurs is unknown, but one hypothesis is that it arises due to a loss of methylation repair mechanisms rather than from an increase in methylation error frequency. Insults such as age or inflammation are proposed to impair DNA methylation repair mechanisms, supported by the observation that peripheral blood methylation variance increases with age (277, 278).

There are a number of available statistical models for measuring methylation variance. A comprehensive method was published by Jaffe et al in 2012 but was only for use with the CHARM microarray (275). The same group refined the methodology and enabled it to be used with the Illumina Infinium microarrays, identifying marked variance in cancer-associated DMRs for colon, breast, lung and thyroid cancers as well as Wilms tumours (274). The authors were able to exclude cell heterogeneity, a potential confounder of variability analysis, as a cause of the variability. Teschendorff et al used their own methodology to demonstrate that differentially variable regions may be of particular use in identifying precancerous phenotypes in cervical epithelial neoplasms (276). Using an algorithm called EVORA, differential variability outperformed differential methylation measures in identifying these pre-cancer lesions, but underperformed differential methylation in identifying invasive cancer lesions and in whole blood analysis. Their methodology appears to be distinct in that it is sensitive to outlier methylation profiles, which they hypothesise to be present in pre-cancer lesions containing a mix of normal and abnormal cells.

More recently, Phipson et al published a method based on Levene’s z-test called DiffVar, which measures differentially variable individual CpG sites (277). The methodology varies slightly compared with the EVORA method in that DiffVar is less sensitive to outlier samples. The authors propose that this may lead to fewer false discoveries because in many cases single outlier samples can be a result of quality and technical issues. In comparison with other methods that rely on Bartlett’s test or the F-test, DiffVar has superior false discovery performance and has equivalent sensitivity for detecting variable sites so long as there is a reasonable sample size (n>5 appears sufficient) (277). In a kidney cancer cohort analysed using the Infinium 450K BeadChip, the 10,000 top ranked variable CpG sites showed very little overlap with the 10,000 top ranked differentially methylated sites.

So far there are differing observations regarding the epigenomic location of differentially variable CpG sites and regions. Hansen et al reported that differentially variable regions were located predominantly in non-CpG island locations with 44% in CpG shores, 57% in regions distant from CpG islands and only 31% within CpG islands themselves (274). This was surprising given that CpG island-associated probes were over-represented in the methylation array and that differentially methylated regions are found predominantly in CpG islands. In contrast, using DiffVar in kidney cancer samples, differentially variable CpG sites were reported to be mostly in CpG islands (277). The differences observed could be due to differences in analytical methodology or differences in tumour purity between the studies, but underline the caution needed in comparing findings in this relatively young field.

Analysis

Differential methylation variability for individual CpG sites was assessed using the DiffVar method as implemented in the missMethyl Bioconductor package (277). In this method, differential variability is measured by the variability ratio, the ratio of methylation variance in cases to that in controls. A threshold of variability ratio >5.0 and FDR <5% (277) was used to define a Differentially Variable Position (DVP). CpGs with low technical reproducibility, defined as an intraclass correlation coefficient (ICC) <0.8, were excluded.
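The selection rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the thesis: the FDR procedure is assumed here to be Benjamini-Hochberg (the text does not name it), the adjustment is applied across all sites before the ICC filter, and the function names, dictionary keys and example values are all hypothetical.

```python
from statistics import variance


def variability_ratio(cases, controls):
    """Ratio of sample variance in cases to sample variance in controls."""
    return variance(cases) / variance(controls)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_dvps(sites, ratio_cutoff=5.0, fdr_cutoff=0.05, icc_cutoff=0.8):
    """Return IDs of sites passing the ratio, FDR and ICC thresholds.

    sites: list of dicts with hypothetical keys 'id', 'ratio', 'p', 'icc'.
    """
    fdr = benjamini_hochberg([s["p"] for s in sites])
    return [
        s["id"]
        for s, q in zip(sites, fdr)
        if s["ratio"] > ratio_cutoff and q < fdr_cutoff and s["icc"] >= icc_cutoff
    ]


# Hypothetical per-site summaries (illustrative values only).
sites = [
    {"id": "cpg_a", "ratio": 6.2, "p": 0.0001, "icc": 0.97},  # passes all filters
    {"id": "cpg_b", "ratio": 7.5, "p": 0.0002, "icc": 0.60},  # fails ICC
    {"id": "cpg_c", "ratio": 1.3, "p": 0.0300, "icc": 0.95},  # fails ratio
    {"id": "cpg_d", "ratio": 5.8, "p": 0.4000, "icc": 0.99},  # fails FDR
]

dvps = call_dvps(sites)
```

Only the first hypothetical site satisfies all three criteria; each of the others fails exactly one filter, which makes the role of each threshold visible.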

DVPs were compared with genes demonstrating recurrent mutations in MBCN and with genes demonstrating aberrant methylation in MBCN, both identified following literature review (Chapter 2). DVPs were also compared with the list of CpG sites identified following conditional logistic regression analysis which ranks CpG sites by differences in mean methylation. Enrichment of DVPs with genes known to be targets of the polycomb repressor complex was assessed by calculating an enrichment odds ratio, using Fisher’s exact test. The list of genes that are targets of the polycomb repressor complex was taken from the publication by Bracken et al (280) that identified 3,750 genes to be polycomb targets out of a total of 20,525 analysed.
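As a worked illustration of an enrichment odds ratio with Fisher’s exact test, the sketch below builds a 2x2 table of genes (in the DVP list or not, polycomb target or not). The background totals of 3,750 targets among 20,525 genes are taken from the text; the split of a hypothetical 100-gene list into 30 targets and 70 non-targets is invented for illustration and does not reproduce any result in this thesis. A one-sided test is assumed, since the sidedness used is not stated.

```python
from math import comb


def enrichment_odds_ratio(a, b, c, d):
    """Sample odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment.

    Sums hypergeometric probabilities for tables with the top-left cell
    at least as large as the observed value a, with margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    total = comb(n, row1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / total


# Rows: genes in the list of interest / all other genes.
# Columns: polycomb target / not a polycomb target.
a, b = 30, 70            # hypothetical gene list of 100 genes
c, d = 3750 - a, 20525 - 3750 - b   # remainder of the 20,525 genes analysed

odds_ratio = enrichment_odds_ratio(a, b, c, d)
p_value = fisher_exact_greater(a, b, c, d)
```

With these hypothetical counts the odds ratio is a little under 2, meaning polycomb targets are roughly twice as common in the gene list as in the background.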

Results

Of all 416,669 CpG sites analysed, 172 CpGs demonstrated differential methylation variability. Two CpGs were excluded due to a low score for technical reproducibility (ICC <0.8). In the final 170 differentially variable positions (DVPs) technical reproducibility at each CpG site was very high (mean ICC=0.98), confirming the presumption that the measured methylation variability is due to inter-individual variance and not technical artifact.
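For readers unfamiliar with the ICC, the sketch below computes a one-way random-effects ICC from technical replicates at a single CpG site. The specific ICC form used in this study is not stated, so the one-way form is an assumption, and the function name and replicate values are hypothetical.

```python
from statistics import mean


def icc_oneway(measurements):
    """One-way random-effects ICC for equal-sized groups of replicates.

    measurements: list of per-sample lists, each holding k replicate values.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand = mean(v for row in measurements for v in row)
    row_means = [mean(row) for row in measurements]
    # Between-sample and within-sample mean squares from one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for row, m in zip(measurements, row_means) for v in row
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical duplicate beta values for three samples at one CpG site.
perfect = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]
noisy = [[0.10, 0.12], [0.50, 0.48], [0.90, 0.91]]

icc_perfect = icc_oneway(perfect)
icc_noisy = icc_oneway(noisy)
```

An ICC close to 1 indicates that almost all of the observed variance lies between individuals rather than between replicate measurements of the same individual, which is the property relied upon in the paragraph above.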

Of the 170 DVPs, 134 were associated with a named gene. Ontology analysis of the 122 unique genes revealed that variable methylation was present in genes associated with transcription regulation (E2F1, E2F5, POU5F1, ACVRL1, APBB1, RERE, ARNT, CREB3L2, FOXP1, KPNA6, MSGN1, NFATC1, ZBED3), the ubiquitin conjugation pathway (FBXL22, AMFR, EGR2, KDM2B, MARCH9, RNF14, UBE2O, USP10, USP3) and chromatin remodeling (EIA, ERCC, SMARCA2). There was one KEGG pathway of interest identified: B cell receptor signaling (CD72, NFKBIE, NFATC1). All 170 DVPs are listed in Appendix Table 4.

Unlike differentially methylated positions, DVPs were located nearly exclusively outside CpG islands. Only two of 170 DVPs were in a CpG island, 34 of 170 (20%) were located in a CpG shelf, 38 of 170 (22%) were located in a CpG shelf region and 56% were located in a region distant from a CpG island. 45 of 170 DVPs were in a promoter-associated region. All 170 DVPs exhibited a loss of methylation in cases compared with controls.

The chromosomal location of DVPs is shown in Figure 27 below. There is a spread of DVPs across all autosomal chromosomes. There appeared to be a cluster of DVPs on the short arm of chromosome 6, a small section of which corresponds to the Major Histocompatibility Complex (MHC) / Human Leucocyte Antigen locus that is associated with immune system regulation and inflammation. Within this region, spanning from Chm6:27,500,000 to Chm6:33,400,000, DMP and DMR analysis had identified enrichment of differentially methylated CpG sites and regions (presented in Chapters 5.2, 5.3). However, the enrichment odds ratio for DVPs located in this region was not statistically significant (p=0.198).

Figure 27: Differentially variable positions demonstrated by chromosomal location. DVPs are widely distributed across all chromosomes.

Comparison with genes mutated in MBCN

In order to discover whether there was any overlap with genes identified as carrying recurrent mutations in MBCN, DVPs were compared with genes identified in the literature review described in chapter 2. There were three genes in common: NFKBIE, KLHL6 and EGR2, all identified to carry recurrent mutations in chronic lymphocytic leukaemia.

Comparison with genes aberrantly methylated in MBCN

Next, in order to discover whether DVPs were similar to differentially methylated genes identified in the MBCN literature, DVPs were cross-checked with genes identified in the literature review described in chapter 2. Only one DVP corresponded to a gene reported to be aberrantly methylated in MBCN: ARHGAP17 (hypermethylated in diffuse large B cell lymphoma).

Comparison with DMPs

Finally, in order to compare the findings of methylation variability with the conventional differential methylation analysis, DVPs were compared with the 1,338 DMPs identified by conditional logistic regression (chapter 5.2). Only 26 of the 170 DVPs had already been identified as differentially methylated CpG sites. None of the DVPs overlapped with the differentially methylated CpG sites that were identified after adjustment for white blood cell content. Therefore, it appears that measures of variable methylation identify a novel set of CpG sites compared with the differential position and differential regional methylation analyses presented thus far. Additionally, as shown in the plot below (Figure 28), a high variability ratio at an individual CpG site does not correlate with a larger mean methylation difference. The lack of correlation between probes identified by high variability and those identified by large methylation difference supports the interpretation that these two methods describe different mechanisms of dysregulated methylation.

Figure 28: Plot of variance ratio and differential methylation. Lack of correlation between these two methods of identifying dysregulated methylation.

Enrichment of PRC2 target genes

Next, DVPs were assessed to see whether they were enriched for targets of the polycomb repressive complex 2 (PRC2). Other groups have postulated that polycomb targets feature increased epigenetic variability in cancer and pre-cancer states (276). PRC2 target genes were represented in 16 of the 122 unique genes associated with DVPs (13.5%). To test whether differentially variable CpGs were enriched for PRC2 targets compared with differentially methylated CpGs, the enrichment OR for the top 500 variable CpGs was compared with that for the top 500 differentially methylated CpGs. Both DVPs and DMPs were enriched for targets of PRC2, but there was no significant difference between them. For differentially variable CpGs, the enrichment OR was 2.16 (95% CI 1.55-3.08, p=5.90×10⁻⁷) compared to the enrichment OR for DMPs of 1.85 (95% CI 1.35-2.58, p=4.53×10⁻⁵).

Discussion

Measures of methylation variability identify a different set of CpG sites associated with cancer and pre-cancer states. The utility of identifying differential variability appears to vary depending on the clinical context. For instance, Teschendorff et al. found differential variability to provide a more accurate estimation of the likelihood of transformation of cervical neoplasia from a precursor state to invasive cancer (276). However, in their hands, methylation variability did not appear to be additionally informative either in the analysis of whole blood samples or in the comparison of normal and cancer tissue. In this analysis, a small number of highly variable CpG sites was identified, the majority of which were novel compared with the CpGs ranked by difference in mean methylation using conditional logistic regression. The pathways represented by genes with variable methylation were related to transcription regulation, ubiquitination and chromatin remodeling. The prior DMP analysis, presented in Chapter 5.2, had not highlighted the ubiquitin pathway, which is an essential mechanism by which cells regulate responses to various stimuli, including proteasome-dependent degradation of proteins modified by cellular enzymes (281). Dysfunction in the ubiquitination machinery is associated with all MBCN and is a proposed target of both current and novel therapies in MBCN (281).

In this analysis, there was enrichment of PRC2 targets identified by both differential variability and differential methylation over that expected by chance, consistent with the current understanding that the polycomb complex and DNA methylation are companion epigenetic events. The relationship between variable methylation and Polycomb group genes was explored due to increasing understanding of the interaction between the chromatin modification role of Polycomb group genes and DNA methylation (282, 283). The Polycomb Repressive Complex 2 (PRC2) is a complex of proteins required for both H3K27me2 and H3K27me3 post-translational histone modification. Gene targets of PRC2 are characterised by transcriptional silencing through both histone modification and DNA methylation. Initiation of transcription repression likely occurs via histone modification, which remains easily reversible until DNA methylation at the same site occurs, a step that has been likened to a ‘dead-lock’ ensuring long-term gene silencing (283). There was, however, no additional enrichment of PRC2 targets in the variable CpGs compared with the differentially methylated CpGs. Teschendorff et al had observed that the top variable CpG sites exhibited even greater enrichment for PRC2 targets compared with differentially methylated CpGs, but this was in the specific context of comparing pre-cancerous cervical epithelial cells with cervical cancer cells (276). Additionally, they did not observe the same DVP enrichment for PRC2 targets in whole blood samples, which is in keeping with the findings in this analysis.

Methods for detecting methylation variability are still evolving and there are several different published methods (275-277). In this analysis, the method based on Levene’s z-test by Phipson et al. was used due to its reported balance between detection of variable methylation and control of false discovery, as well as its availability in a Bioconductor package. To date, there are no published reports identifying methylation variability in peripheral blood samples collected prior to the diagnosis of MBCN.

A limitation of these findings is the prevailing uncertainty about the biological significance of these differentially variable CpGs. It is unclear whether they may play a role as an additional biomarker for early detection of disease or whether they may be similar to findings of variable methylation in MBCN tumour tissue itself. As with measurement of differential methylation, methylation variability is subject to the effect of cellular heterogeneity; hence these data, derived predominantly from whole blood samples containing a mix of white blood cell types, should be interpreted with some caution and ideally validated in a sample with either known white blood cell content or consisting of purified cells.

6 Conclusions and Future Work

In summary, this study describes small but measurable changes present in peripheral blood DNA methylation profile up to 20 years prior to diagnosis with MBCN, using four different but complementary analytical strategies.

Analysis of differential DNA methylation at a global level (Chapter 5.1) revealed distinctive patterns of aberrant methylation. Increased methylation was particularly notable in high CpG content promoter regions, while reduced methylation was widespread and dominant in non-CpG island-associated CpGs. These global methylation changes were not significantly changed by correction for white blood cell content, appeared to be the same for all tumour subtypes, and were not stronger in the shorter time latency groups.

In the conditional logistic regression analysis identifying differentially methylated CpG sites (Chapter 5.2), a short-list of DMPs was identified both before and after correction for white blood cell content and both of these analyses revealed genes of potential and confirmed relevance to MBCN. The analysis was designed a priori to investigate differential methylation according to histological subgroups and time latency from blood collection to MBCN diagnosis. Increased methylation was seen in the homeobox genes, while reduced methylation was seen in genes associated with the MAPK signaling and chemokine signaling pathways. Significant heterogeneity was identified by tumour subtype among some, but not all, CpGs with the low grade group containing several outliers demonstrating more marked methylation. Together with heterogeneity for some CpGs between time latency groups suggesting stronger DNA methylation effect for the shortest latency group, this supported the possibility that some of the methylation changes were due to circulating tumour cells. In the low grade tumours, the majority of cases were chronic lymphocytic leukaemia and small lymphocytic lymphoma, in which the clinical presentation is preceded by a precursor condition where monoclonal B cells are detectable in the peripheral blood. There were also a large number of DMPs for which there was no detectable heterogeneity by tumour subtype or time latency, suggesting there are also methylation changes that may not be related to circulating tumour DNA. Differential methylation in HLA genes raises the possibility that some of the methylation changes might be related to changes in immunological mechanisms occurring prior to diagnosis with MBCN. Loss of immunological regulation is observed in both MM and NHL and is felt to be a key pathogenetic mechanism of both the development of disease and disease progression (284, 285).

Regional methylation analysis confirmed a large number of differentially methylated regions in the histocompatibility locus on chromosome 6p (Chapter 5.3). Given the lack of a uniform standard for peripheral blood methylation analysis, an exploratory process was performed to construct a definition of a DMR. The shortlist of DMRs identified regions with increased methylation in homeobox genes and regions with reduced methylation in genes associated with B cell receptor signaling (including the MAPK pathway), chemokine signaling and immune function. These findings broadly confirmed those of the conditional logistic regression analysis.

Lastly, differentially variable methylation analysis was used to identify aberrant methylation patterns (Chapter 5.4). This is a novel approach to assessing methylation in cancer cases and controls. As predicted, there was little overlap between the CpG sites identified by this analysis and the traditional logistic regression analysis. Variable methylation sites were associated with genes related to ubiquitin conjugation and chromatin remodeling, both relevant to MBCN pathogenesis. By using a completely different methodology, the variability analysis provided additional confirmation of a distinct methylation profile detectable in peripheral blood DNA samples.

Some aberrantly methylated genes identified in peripheral blood DNA were consistent with those reported in MBCN tumours, suggesting that some methylation changes detected in this study may represent the earliest reported form of tumour-relevant methylation changes. This observation challenges most current genetic theories of MBCN pathogenesis, which suggest genetic mutation events occur first, followed by epigenetic events. There were also numerous aberrantly methylated CpG sites and regions detected in novel genes and intergenic locations, raising the possibility that these new epigenetic pathways may be associated with MBCN. A region of interest was the major histocompatibility locus on chromosome 6p which was enriched with aberrantly methylated CpG sites and regions, suggesting aberrant methylation could be associated with dysregulation of immunological function in the years leading up to MBCN diagnosis.

The methylation assay, normalization strategies and statistical analyses were undertaken at a time during which there was limited published material to guide an appropriate strategy. Performance of the methylation assay in-house provided control over each step of quality control, and the development of a bioinformatics pathway for the MBCN cohort informed future bioinformatics pipelines for the group to apply to other tumour groups. Finally, the statistical analysis upon which this thesis is based was exploratory in nature and evolved while the surrounding field of methylation array analysis developed and matured. Incorporation of correction algorithms for cell content and use of regional methylation and methylation variance analyses are examples of the more recent acknowledgements

Future work may involve validation of the findings presented here in a similarly designed case-control study nested within a prospective cohort study. Within our own MCCS cohort, since the time of data collection, there have been a number of new MBCN cases identified through the cancer registry but the numbers, thus far, are significantly smaller than those already included in this study. An international collaborator would be required to identify a cohort study of similar size. Once a confirmed short-list of DMPs and DMRs is identified, it would be ideal to perform bisulfite sequencing to confirm the array findings, as well as DNA sequencing to identify whether underlying genetic mutations are responsible for the methylation findings. Additionally, performing methylation studies on paired tumour samples from the cases would test the hypothesis that the methylation findings are related to the subsequent development of MBCN. The clinical relevance of these findings is that they potentially place epigenetic mechanisms at the earliest stage of MBCN development and could result in novel therapeutic targets.

It is possible that these methylation abnormalities could become part of a clinically relevant and readily accessible liquid biomarker for early MBCN diagnosis, assessment of response to treatment or early detection of relapse. Further work identifying methylation changes specific to disease subgroups is required. The identification of altered methylation in novel genes may assist in the identification of new epigenetic MBCN therapeutics to add to the present limited arsenal of general hypomethylating agents (azacytidine and decitabine), EZH2 inhibitors (tazemetostat) and histone deacetylase inhibitors (vorinostat, romidepsin).

Appendices

Appendix Table 1: List of candidate genes identified as mutated in MBCN (mutation prevalence>4%)

AGMO CELF4 (BRUNOL4) FAM46C LTB PAX5
AHR CHD2 FAS LUC7L2 PCDHGC5
ALS2CL CHK1 FBXL2 LUZP2 PEG3
ANKHD1 CHK2 FBXW7 MAML1 PHC2
ARHGAP28 CLL1 FGFR2 MBA PI3K
ARID1A CLL2 FGFR3 MCL-1 POF1B
ASAP2 CLL3 FOXO1 MDM2 POLR3B
ASXL1 CLL4 FRZB MECOM POT1
ATM CNOT3 FSIP2 MED1 PRDM1
ATRX COASY FSTL5 MED12 PRDM9 (BLIMP1)
BAX COL25A1 FUBP1 MEF2B PRLR
BAZ2A CREBBP GNA13 MiR15A PROM1
BCL2 CSMD3 GNAT1 MiR16 PTEN
BCL6 CYLD HIST1H1B MIR16-1 PTPN11
BCOR CYP2A7 HIST1H1C MRE11A RAS
BIRC3 DAPK HIST1H1D MUC2 RB
BRAF DDX3X HIST1H1E MYC RFTN1
C12orf64 DHX32 HOXA9 MYD88 RHOA
CARD11 DIS3 ID3 NECAB3 RIMS2
CCND1 DKK1 IKBKB NFKBIA RNF103
CCND3 DLEU1 IKZF3 NFKBIE ROBO1
CD36 DLEU2 IRAK1 NKAP RPL5
CD79A DLEU7 IRF4 NLN RPS15
CD79B DNAH5 KCNAB1 NOTCH1 RYR2
CDC14B DPP6 KDM6A NOTCH2 S1PR2
CDH1 (ECAD) DTNA KDM6B NPTX2 SETD1A
CDH9 DTX1 KLF2 NR2F2 SETD2
CDK2 EGR1 KLHL6 NRAS SF3B1
CDK4 EGR2 KMT2A (MLL2) NSD2 (MMSET) SFRS1
CDK6 EIF4G2 KRAS NXF1 SI
CDKN1B EP300 LPHN3 OR2M4 SKIV2L2
CDKN2A EPHB1 LRP1B OR6S1 SLC4A10
CDKN2B EZH2 LRRC16A P53 TNFRSF13C (BAFF-R)
CDKN2C FAM117A LRRK2 PARP1

Appendix Table 2: List of genes identified as aberrantly methylated in MBCN from literature review

Hypermethylated Hypomethylated
CLL Low grade B-NHL DLBCL Myeloma CLL Mixed NHL

ABI3 DAPK1 AR ADAMTS9 SORL1 ADORA3 CAMP ADCY5 ABF1 ARHGAP17 APC SOX9 AIRE DIO3 BNC1 AHR ASPHD2 ARF SPIB ANGPT2 FZD9 CDKN2A CDH1 CDKN1C BIK TFAP2 BCL10 HS3ST2 CDKN2B CDKN1C CDKN1C BNIP3 ZIK1 BCL2 MERTK CIDEB CDKN2A CDKN2A CDH1 ZIK1 BCR MOS CYFIP2 CDKN2B CDKN2B CDKN2A ZNF134 CARD15 (NOD2) MYOD1 DIO3 HLXB9 (HOXA11) CRY1 CDKN2B ZNF256 FABP7 EBF3 HOXA13 CXorf57 CXCR4 ZNF615 IFNB1 FOXD3 HOXA2 CYB5R2 DAPK IL19IL17Rc FOXE1 HOXA9 CYB5R2 DCC LFNG FOXG1 LHX1 DAPK1 DKK1 LOC340061 GPX2 MGMT DBC1 (CCAR2) DKK3 NOTCH1 HOXC13 NPTX2 DCLK2 DLC-1 PLD1 HOXD11 NR2F2 DLC1 ECAD PRF1 HOXD8 PAX7 FGD2 ER RASGRF1 ID4 POU4F1 GALNS ESR1 S100A14 IGSF4 ROBO1 GATA4 GPX3 TCF7 NOTCH1 SOX9 GDNF IGF1R TCL1A NR2F2 GRIN2B IL17RB UNC5CL NR4A1 IKZF1 (IGSF4) MGMT URP2 PPP1R3A IL12A MIR129-2 VAV2 PRDM2 JDP2 MIR34B PTGES KCNK12 MIR34C ABI3 KIAA0746 NFKB1 RIX1 LANCL1 RARB SCGB2A1 LYAR RBP1 SERPIND5 MAD2 SFRP1 SFRP1 MGMT SFRP1 SLIT2 MIR34A SFRP2 SOX1 MIR34B/C SFRP4 SOX11 MTHFR SFRP5 SOX2 MYOD1 SFRP5 SOX4 NEUROD1 SHP1 SOX6 ONECUT2 SOCS1 SOX9 P57 SPARC SPG20 PAK1 TGFBI VHL PCDH10 TGFBR2 VHL PCDH10 TP53 WISP3 PMM2 TP73 ZFP28 SDR42E1 WIF1 ZNF471 SDR42E1


Appendix Table 3: Full list of DMPs (unadjusted for white blood cells). See table footnote for indications of DMPs corresponding with other Infinium 450K publications

Columns, in order: Probe ID; OR; 95% CI; p value; β (control); β (case-control); Gene name (blank where none is given); Chm

cg25212453 1.83 1.55-2.17 1.08E-12 0.864 -0.033 SLC43A2 17
cg04771285* 1.79 1.5-2.12 3.95E-11 0.521 -0.026 12
cg14341467* 1.72 1.46-2.03 7.05E-11 0.768 -0.029 GRK5 10
cg24048338* 1.8 1.5-2.14 8.92E-11 0.816 -0.032 HGSNAT 8
cg22035959* 1.74 1.47-2.06 1.55E-10 0.773 -0.039 ARID3A 19
cg08666707* 1.68 1.43-1.97 1.99E-10 0.678 -0.025 6
cg10568634* 1.68 1.43-1.96 2.17E-10 0.521 -0.026 AKR1C1 10
cg09905973* 1.67 1.42-1.95 2.63E-10 0.821 -0.021 CXCR1 2
cg09046979 1.66 1.42-1.94 3.33E-10 0.424 -0.035 SBK1 16
cg24420717 1.7 1.44-2 3.77E-10 0.606 -0.024 GRN 17
cg10700424 1.59 1.37-1.83 4.22E-10 0.0449 0.0291 GLB1L2 11
cg19052355* 1.64 1.4-1.92 4.32E-10 0.0887 0.0283 GBX2 2
cg06144110 1.72 1.45-2.04 4.58E-10 0.803 -0.026 16
cg05379509* 1.65 1.41-1.93 5.11E-10 0.9 -0.014 SPTB 14
cg00290340 1.64 1.4-1.92 5.17E-10 0.828 -0.027 IGDCC3 15
cg14522302* 1.69 1.43-2 5.32E-10 0.849 -0.018 1
cg05484458* 1.7 1.44-2.02 5.39E-10 0.817 -0.025 GNB3 12
cg19965312* 1.63 1.39-1.9 6.07E-10 0.712 -0.028 TWIST2 2
cg01858866 1.66 1.42-1.95 6.30E-10 0.463 -0.026 PABPC4 1
cg10319905* 1.69 1.43-2 6.39E-10 0.834 -0.032 14
cg05772125* 1.69 1.43-2 6.69E-10 0.639 -0.041 AHSP 16
cg13981001* 1.67 1.42-1.96 7.49E-10 0.577 -0.022 RPL12 9
cg17958984* 1.67 1.42-1.97 8.20E-10 0.631 -0.021 17
cg01770019* 1.7 1.43-2.01 8.38E-10 0.563 -0.023 C1orf127 1
cg10613706* 1.66 1.41-1.96 8.49E-10 0.659 -0.031 17
cg06576920* 1.7 1.43-2.01 8.73E-10 0.661 -0.019 GH1 17
cg08224238* 1.62 1.39-1.9 8.75E-10 0.879 -0.029 CA5A 16
cg20524040* 1.61 1.38-1.87 8.99E-10 0.58 -0.017 TCEA2 20
cg06454982 1.66 1.41-1.95 9.12E-10 0.835 -0.025 ARRB1 11
cg18440123* 1.7 1.43-2.01 9.33E-10 0.874 -0.028 7
cg22110517* 1.68 1.42-1.99 9.34E-10 0.864 -0.017 MINK1 17
cg02478589* 1.71 1.44-2.02 9.77E-10 0.739 -0.028 5
cg09623279* 1.69 1.43-2 1.02E-09 0.794 -0.034 10
cg12667732* 1.71 1.44-2.02 1.05E-09 0.849 -0.029 SLCO3A1 15
cg03295107* 1.65 1.41-1.94 1.07E-09 0.73 -0.029 7
cg06861115* 1.65 1.4-1.93 1.07E-09 0.728 -0.028 15
cg16072814 1.69 1.42-1.99 1.18E-09 0.791 -0.028 NOTCH4 6
cg22936990* 1.64 1.4-1.92 1.20E-09 0.786 -0.028 5
cg22961335* 1.63 1.39-1.9 1.21E-09 0.68 -0.025 10

cg04308797* 1.7 1.43-2.02 1.22E-09 0.926 -0.027 SEC14L1 17
cg04340203* 1.7 1.43-2.02 1.25E-09 0.904 -0.021 MED16 19
cg26083141* 1.67 1.41-1.97 1.37E-09 0.894 -0.022 MYH6 14
cg08928933* 1.66 1.41-1.96 1.41E-09 0.808 -0.022 FOXN1 17
cg25649188* 1.6 1.37-1.86 1.45E-09 0.743 -0.024 CASKIN2 17
cg18841653 1.62 1.38-1.89 1.46E-09 0.523 -0.024 AKR1C2 10
cg17352891* 1.69 1.43-2 1.46E-09 0.857 -0.023 SOX5 12
cg19754282* 1.7 1.43-2.02 1.51E-09 0.831 -0.026 17
cg19025573* 1.59 1.37-1.86 1.51E-09 0.708 -0.024 CRYBB2 22
cg26865084* 1.61 1.38-1.88 1.52E-09 0.67 -0.033 6
cg05700681* 1.66 1.41-1.96 1.53E-09 0.852 -0.026 CCL22 16
cg26972479* 1.64 1.39-1.92 1.55E-09 0.745 -0.028 17
cg08753560* 1.69 1.42-2 1.56E-09 0.753 -0.032 7
cg06664486* 1.6 1.37-1.86 1.58E-09 0.666 -0.022 6
cg10583876* 1.65 1.4-1.94 1.59E-09 0.554 -0.021 SPEG 2
cg13244315 1.62 1.39-1.9 1.59E-09 0.957 -0.017 AGRN 1
cg25693289 1.64 1.39-1.92 1.59E-09 0.0314 0.0117 MYADM 19
cg15256491 1.64 1.39-1.92 1.68E-09 0.74 -0.025 FOXK1 7
cg16809040* 1.64 1.4-1.93 1.69E-09 0.828 -0.031 17
cg00661018* 1.66 1.41-1.96 1.70E-09 0.735 -0.028 ARHGAP39 8
cg06873452 1.65 1.4-1.93 1.70E-09 0.114 0.026 17
cg07375851 1.64 1.4-1.93 1.71E-09 0.817 -0.03 PGLYRP2 19
cg06813100* 1.66 1.4-1.95 1.72E-09 0.519 -0.026 15
cg20440093* 1.61 1.38-1.87 1.73E-09 0.644 -0.018 MUC4 3
cg04665287* 1.61 1.38-1.88 1.77E-09 0.621 -0.031 PDPK1 16
cg23900696* 1.65 1.4-1.95 1.79E-09 0.904 -0.019 12
cg14960373* 1.6 1.37-1.87 1.82E-09 0.755 -0.026 COL6A2 21
cg22627753* 1.63 1.39-1.91 1.84E-09 0.737 -0.047 AGRN 1
cg14578363* 1.66 1.41-1.96 1.88E-09 0.833 -0.027 DBF4B 17
cg02095334 1.65 1.4-1.95 1.90E-09 0.856 -0.026 COL11A2 6
cg22950493 1.74 1.45-2.09 1.90E-09 0.332 -0.018 7
cg00718539 1.7 1.43-2.02 1.91E-09 0.598 -0.026 OR2H1 7
cg15909319 1.62 1.38-1.9 1.97E-09 0.0464 0.0177 17
cg21691893 1.67 1.41-1.97 1.99E-09 0.594 -0.023 TNXB 6
cg10575367* 1.64 1.4-1.93 2.14E-09 0.852 -0.028 SRRM2-AS1 16
cg02045981* 1.63 1.39-1.9 2.23E-09 0.764 -0.022 21
cg04694619 1.63 1.39-1.92 2.25E-09 0.854 -0.032 ANO7 2
cg11987068 1.68 1.42-1.99 2.26E-09 0.829 -0.028 15
cg06539006* 1.69 1.42-2 2.28E-09 0.757 -0.037 CPO 2
cg05028750* 1.65 1.4-1.94 2.33E-09 0.753 -0.026 4
cg19229651 1.67 1.41-1.98 2.41E-09 0.93 -0.02 7
cg27009703 1.58 1.36-1.83 2.44E-09 0.134 0.037 HOXA9 7
cg22476960 1.68 1.42-2 2.56E-09 0.698 -0.028 OR8G1 11

cg09237849* 1.63 1.39-1.91 2.57E-09 0.79 -0.023 TNFRSF10C 8
cg06180869* 1.64 1.4-1.93 2.59E-09 0.734 -0.034 11
cg08548500* 1.61 1.38-1.88 2.62E-09 0.728 -0.023 F10 13
cg11911769 1.62 1.38-1.9 2.63E-09 0.819 -0.025 CUX1 7
cg26270261* 1.64 1.4-1.94 2.74E-09 0.775 -0.027 KRT4 12
cg10433327* 1.66 1.4-1.95 2.74E-09 0.853 -0.025 NEDD4 15
cg02632200* 1.62 1.38-1.9 2.85E-09 0.654 -0.016 SYT15 10
cg10718130* 1.58 1.36-1.83 2.90E-09 0.72 -0.022 DGAT1 8
cg08720250* 1.64 1.39-1.93 2.99E-09 0.635 -0.022 12
cg23016726* 1.62 1.38-1.89 2.99E-09 0.769 -0.021 CACNA1I 22
cg01871963* 1.61 1.38-1.89 3.00E-09 0.901 -0.017 LCN2 9
cg27045264* 1.59 1.37-1.86 3.01E-09 0.7 -0.023 16
cg22762189* 1.63 1.39-1.91 3.05E-09 0.91 -0.017 MSGN1 2
cg22653878* 1.71 1.43-2.04 3.09E-09 0.744 -0.027 6
cg22210779* 1.59 1.37-1.86 3.15E-09 0.948 -0.02 TBCD 17
cg02934736*^ 1.6 1.37-1.87 3.16E-09 0.67 -0.016 CROCCP2 1
cg03473640 1.68 1.42-2 3.17E-09 0.813 -0.022 MYO5A 15
cg07370588* 1.63 1.39-1.92 3.18E-09 0.762 -0.023 17
cg09577986 1.64 1.39-1.94 3.20E-09 0.734 -0.028 6
cg02198144* 1.64 1.39-1.93 3.21E-09 0.813 -0.032 16
cg04696391* 1.61 1.38-1.89 3.22E-09 0.963 -0.023 PAQR4 16
cg10044629* 1.61 1.37-1.88 3.34E-09 0.666 -0.019 MUC2 11
cg01279413* 1.7 1.43-2.03 3.48E-09 0.658 -0.035 20
cg17319889 1.66 1.4-1.96 3.48E-09 0.699 -0.03 CACNA1G 17
cg15386103 1.64 1.39-1.93 3.49E-09 0.845 -0.025 JUP 17
cg08683297* 1.62 1.38-1.9 3.49E-09 0.6 -0.019 KRT85 12
cg02418072* 1.66 1.4-1.97 3.54E-09 0.845 -0.02 10
cg06825415* 1.62 1.38-1.9 3.69E-09 0.713 -0.035 5
cg08548718* 1.61 1.37-1.88 3.70E-09 0.58 -0.014 PSG3 19
cg02455094 1.59 1.36-1.86 3.81E-09 0.128 0.019 ONECUT2 18
cg13782937 1.64 1.39-1.93 3.84E-09 0.748 -0.032 ABLIM1 10
cg24430703* 1.65 1.4-1.95 3.94E-09 0.813 -0.022 VOPP1 7
cg08899442* 1.68 1.41-2 4.06E-09 0.593 -0.03 5
cg23986143* 1.6 1.37-1.87 4.16E-09 0.741 -0.028 ZP3 7
cg02656049 1.6 1.37-1.88 4.28E-09 0.906 -0.02 FBRSL1 12
cg10901383* 1.6 1.37-1.87 4.29E-09 0.457 -0.026 TAC3 12
cg23485591 1.69 1.42-2.02 4.32E-09 0.827 -0.025 7
cg22383924* 1.61 1.37-1.89 4.33E-09 0.607 -0.027 TP73 1
cg11043251* 1.63 1.39-1.92 4.35E-09 0.656 -0.028 RUNX1T1 8
cg11334728* 1.7 1.42-2.02 4.38E-09 0.765 -0.023 GALNT9 12
cg03394909* 1.56 1.35-1.82 4.39E-09 0.678 -0.021 SPEG 2
cg08181642* 1.57 1.35-1.83 4.49E-09 0.751 -0.02 SLC29A2 11
cg21186966* 1.65 1.39-1.95 4.56E-09 0.691 -0.032 STK32B 4

cg09034695* 1.6 1.37-1.87 4.61E-09 0.905 -0.026 17
cg21423404* 1.63 1.39-1.93 4.65E-09 0.717 -0.019 KDM2B 12
cg02571055* 1.61 1.38-1.89 4.71E-09 0.781 -0.037 TULP4 6
cg06184396* 1.57 1.35-1.83 4.83E-09 0.722 -0.019 LINC00051 8
cg07034276 1.61 1.37-1.88 4.90E-09 0.899 -0.017 AQP6 12
cg25652701 1.58 1.36-1.84 4.91E-09 0.111 0.019 ATXN7L1 7
cg08160246* 1.56 1.35-1.82 4.92E-09 0.757 -0.025 KCNQ1 11
cg13888748 1.58 1.36-1.84 4.92E-09 0.906 -0.022 NIPSNAP3B 9
cg17809945 1.71 1.43-2.05 5.02E-09 0.798 -0.028 SDPR 2
cg04695027* 1.61 1.37-1.89 5.09E-09 0.547 -0.027 CATSPER1 11
cg25989783 1.66 1.4-1.97 5.18E-09 0.749 -0.033 NCALD 8
cg04285935* 1.59 1.36-1.86 5.19E-09 0.759 -0.035 KCNH5 14
cg05027458 1.54 1.33-1.79 5.20E-09 0.0241 0.0118 ADAMTS17 15
cg00470817 1.59 1.36-1.87 5.21E-09 0.917 -0.02 LPAR5 12
cg06315208* 1.6 1.37-1.88 5.22E-09 0.845 -0.035 MDC1 6
cg11516004* 1.63 1.39-1.92 5.23E-09 0.87 -0.03 RERE 1
cg00114160* 1.65 1.4-1.96 5.28E-09 0.811 -0.027 OR2H1 6
cg22298487* 1.61 1.37-1.88 5.34E-09 0.702 -0.024 SRMS 20
cg05374044* 1.61 1.37-1.89 5.37E-09 0.842 -0.025 MYOZ3 5
cg10624445* 1.6 1.37-1.88 5.39E-09 0.665 -0.028 CNGB1 16
cg01744056* 1.66 1.4-1.97 5.54E-09 0.765 -0.03 PARVA 11
cg18945039* 1.61 1.37-1.89 5.61E-09 0.698 -0.026 OVGP1 1
cg17316300* 1.59 1.36-1.85 5.67E-09 0.632 -0.027 IQCA1 2
cg09338297* 1.63 1.38-1.92 5.68E-09 0.889 -0.018 11
cg06774893* 1.61 1.37-1.89 5.73E-09 0.562 -0.026 16
cg26287609* 1.6 1.37-1.88 5.73E-09 0.773 -0.021 C20orf173 20
cg24724618* 1.63 1.38-1.92 5.74E-09 0.718 -0.029 PADI1 1
cg06599169* 1.57 1.35-1.84 5.90E-09 0.674 -0.022 RPL18AP3 19
cg19621065* 1.62 1.38-1.91 6.04E-09 0.845 -0.02 SLCO3A1 15
cg14851701* 1.56 1.34-1.82 6.07E-09 0.749 -0.029 12
cg01477133* 1.63 1.38-1.92 6.29E-09 0.691 -0.028 FAM83C 20
cg26538691* 1.61 1.37-1.9 6.33E-09 0.734 -0.024 1
cg22779330* 1.66 1.4-1.98 6.35E-09 0.786 -0.021 MYEOV 11
cg11629220 1.58 1.35-1.84 6.37E-09 0.948 -0.03 ARHGAP45 19
cg18806660* 1.66 1.4-1.97 6.37E-09 0.746 -0.027 12
cg23328335* 1.64 1.39-1.94 6.47E-09 0.629 -0.021 TNXB 6
cg04356090* 1.56 1.34-1.81 6.49E-09 0.878 -0.028 SLC15A4 12
cg20569369* 1.6 1.36-1.88 6.54E-09 0.783 -0.021 22
cg03908676 1.61 1.37-1.88 6.56E-09 0.713 -0.036 LYVE1 11
cg00178928* 1.61 1.37-1.88 6.58E-09 0.662 -0.022 2
cg14349078* 1.72 1.43-2.06 6.60E-09 0.736 -0.034 PCSK5 9
cg25308381* 1.62 1.38-1.91 6.67E-09 0.709 -0.029 TEX36-AS1 10
cg16685313* 1.63 1.38-1.92 6.74E-09 0.888 -0.022 1

cg08408971 1.65 1.39-1.95 6.79E-09 0.774 -0.026 5
cg11114503* 1.64 1.39-1.94 6.82E-09 0.823 -0.029 LTBP2 14
cg17872861* 1.6 1.37-1.88 6.85E-09 0.726 -0.019 CYFIP1 15
cg15798153* 1.62 1.38-1.91 6.89E-09 0.79 -0.03 CDK14 7
cg10405075* 1.65 1.39-1.95 6.91E-09 0.83 -0.024 16
cg20436086* 1.65 1.39-1.95 6.92E-09 0.801 -0.031 COL13A1 10
cg03582327* 1.63 1.38-1.93 6.94E-09 0.822 -0.029 SIGLEC10 19
cg02493771* 1.61 1.37-1.89 6.97E-09 0.403 -0.015 KRTAP13-2 21
cg22361106* 1.6 1.36-1.88 6.98E-09 0.683 -0.025 1
cg06686679* 1.62 1.38-1.91 7.03E-09 0.547 -0.012 19
cg09440340* 1.6 1.36-1.88 7.12E-09 0.764 -0.021 MAB21L3 1
cg02890365* 1.64 1.39-1.94 7.14E-09 0.799 -0.025 TUFT1 1
cg13502395* 1.61 1.37-1.89 7.32E-09 0.734 -0.025 KAZN 1
cg13480465* 1.66 1.39-1.96 7.33E-09 0.77 -0.035 LDB2 4
cg06952577* 1.63 1.38-1.93 7.35E-09 0.851 -0.019 1
cg13850695* 1.58 1.35-1.85 7.38E-09 0.815 -0.023 9
cg00912772* 1.64 1.39-1.93 7.44E-09 0.836 -0.025 FGD6 12
cg09083279* 1.54 1.33-1.78 7.56E-09 0.603 -0.028 MAS1L 6
cg06241689* 1.6 1.36-1.87 7.58E-09 0.585 -0.022 C2orf72 2
cg05267955* 1.58 1.36-1.86 7.63E-09 0.706 -0.026 VARS2 6
cg12856708* 1.62 1.38-1.91 7.64E-09 0.759 -0.024 17
cg03710354 1.55 1.34-1.81 7.67E-09 0.0932 0.0168 OTOP2 17
cg10758824 1.71 1.42-2.04 7.68E-09 0.726 -0.034 PBX1 1
cg19115017* 1.7 1.42-2.03 7.71E-09 0.743 -0.033 INSC 11
cg25757869* 1.62 1.37-1.9 7.72E-09 0.74 -0.02 MYOZ1 10
cg20951539 1.63 1.38-1.92 7.77E-09 0.887 -0.027 PXT1 6
cg03645007 1.56 1.34-1.81 7.77E-09 0.794 -0.018 SLC38A3 3
cg05083067* 1.61 1.37-1.88 7.78E-09 0.751 -0.029 OR51D1 11
cg17529152* 1.57 1.35-1.83 7.78E-09 0.421 -0.019 DBI 2
cg19030682* 1.63 1.38-1.92 7.85E-09 0.724 -0.04 SLC38A10 17
cg24706992* 1.62 1.38-1.91 7.88E-09 0.763 -0.019 1
cg12549345* 1.56 1.34-1.81 7.92E-09 0.824 -0.027 THOP1 19
cg13703089* 1.54 1.33-1.79 8.06E-09 0.935 -0.028 14
cg21357629 1.74 1.44-2.1 8.10E-09 0.0864 0.0196 21
cg13209335 1.63 1.38-1.93 8.15E-09 0.854 -0.031 1
cg26306435* 1.61 1.37-1.9 8.15E-09 0.848 -0.028 CHST15 10
cg26426437* 1.59 1.36-1.87 8.17E-09 0.899 -0.032 CAMKK2 12
cg03221702 1.57 1.35-1.83 8.20E-09 0.755 -0.021 ILF3 19
cg25781210* 1.57 1.35-1.84 8.20E-09 0.956 -0.013 5
cg14645027* 1.57 1.35-1.83 8.31E-09 0.739 -0.031 KRT7 12
cg18295744* 1.61 1.37-1.89 8.37E-09 0.764 -0.025 ZMIZ1 10
cg24402151* 1.61 1.37-1.89 8.39E-09 0.905 -0.025 11
cg18290981* 1.54 1.33-1.78 8.40E-09 0.784 -0.027 MACROD1 11

155 cg09238000 1.59 1.36-1.87 8.42E-09 0.85 -0.02 SLC45A4 8 cg20074526 1.62 1.37-1.9 8.43E-09 0.861 -0.021 3 cg16697850* 1.65 1.39-1.95 8.45E-09 0.744 -0.036 MOXD2P 7 cg18361425* 1.58 1.35-1.85 8.55E-09 0.872 -0.024 RAMP1 2 cg20091384* 1.61 1.37-1.89 8.61E-09 0.64 -0.028 GNG7 19 cg09017139* 1.59 1.36-1.87 8.62E-09 0.891 -0.013 10 cg05264223* 1.6 1.36-1.87 8.68E-09 0.876 -0.024 LINC00593 15 cg01581237 1.57 1.35-1.83 8.73E-09 0.0765 0.02 SPEF2 5 cg18169095 1.63 1.38-1.93 8.74E-09 0.695 -0.024 LAD1 1 cg01352906 1.64 1.38-1.93 8.77E-09 0.582 -0.026 1 cg10244994 1.59 1.36-1.86 8.77E-09 0.812 -0.02 17 cg27564715 1.59 1.36-1.87 8.85E-09 0.651 -0.017 ARHGEF10 8 cg20998127* 1.63 1.38-1.93 8.90E-09 0.855 -0.025 VIPR2 7 cg02143778* 1.64 1.39-1.95 8.91E-09 0.734 -0.028 16 cg04151146* 1.63 1.38-1.93 8.92E-09 0.788 -0.021 LINC00905 19 cg03507326 1.56 1.34-1.82 8.94E-09 0.68 -0.044 SRRM2-AS1 16 cg13898430* 1.56 1.34-1.82 8.95E-09 0.626 -0.023 PPP1R32 11 cg25720825* 1.61 1.37-1.89 8.97E-09 0.685 -0.03 1 cg15885043* 1.57 1.35-1.83 9.12E-09 0.767 -0.019 APBA2 15 cg13616033 1.56 1.34-1.82 9.24E-09 0.652 -0.018 SPATA31E1 9 cg06244306* 1.61 1.37-1.89 9.36E-09 0.88 -0.022 PC 11 cg18060999* 1.61 1.37-1.9 9.39E-09 0.8 -0.03 MAGI1 3 cg26062581* 1.6 1.36-1.87 9.48E-09 0.757 -0.027 11 cg05344066* 1.6 1.36-1.88 9.51E-09 0.619 -0.02 TMEM105 17 cg27356115* 1.56 1.34-1.82 9.63E-09 0.599 -0.03 15 cg27209072* 1.61 1.37-1.9 9.68E-09 0.737 -0.028 10 cg10315334 1.61 1.37-1.89 9.69E-09 0.646 -0.032 CCL5 17 cg03550727* 1.62 1.38-1.92 9.71E-09 0.792 -0.031 6 cg06617335* 1.57 1.34-1.83 9.86E-09 0.801 -0.022 LOC91450 15 cg21272713* 1.63 1.38-1.93 9.89E-09 0.744 -0.032 12 cg21742975* 1.6 1.36-1.88 9.93E-09 0.914 -0.022 15 cg20102955* 1.62 1.38-1.91 9.96E-09 0.804 -0.023 OC90 8 cg15439862 1.57 1.34-1.83 1.00E-08 0.356 0.036 DSC3 18 cg03634394 1.63 1.38-1.93 1.01E-08 0.784 -0.021 LINC00905 19 cg26609631* 1.55 1.33-1.8 1.01E-08 0.124 0.04 GSX1 13 cg10395806 1.54 1.33-1.79 1.02E-08 0.868 -0.027 CTTN 11 
cg19303305 1.59 1.36-1.86 1.02E-08 0.68 -0.022 HTRA3 4 cg15449049* 1.59 1.36-1.87 1.03E-08 0.824 -0.024 MITF 3 cg27388962* 1.62 1.38-1.92 1.04E-08 0.711 -0.036 LINC00704 10 cg08462367* 1.61 1.37-1.89 1.04E-08 0.65 -0.03 8 cg12279968* 1.58 1.35-1.86 1.04E-08 0.871 -0.029 LINC01134 1 cg04784410* 1.66 1.4-1.98 1.05E-08 0.576 -0.034 SNORD113-4 14 cg26685998 1.61 1.37-1.9 1.05E-08 0.783 -0.02 2

156 cg12804755* 1.59 1.36-1.87 1.06E-08 0.691 -0.031 11 cg16046833* 1.59 1.36-1.86 1.07E-08 0.725 -0.028 SNX7 1 cg26748477 1.6 1.36-1.88 1.07E-08 0.637 -0.026 17 cg15817482* 1.58 1.35-1.84 1.07E-08 0.678 -0.02 ASB2 14 cg04198075 1.62 1.37-1.91 1.08E-08 0.687 -0.03 DIP2C 10 cg01208224* 1.63 1.38-1.92 1.09E-08 0.836 -0.024 PPP2R2C 4 cg16489259* 1.57 1.35-1.83 1.09E-08 0.604 -0.023 10 cg19165390 1.63 1.38-1.93 1.09E-08 0.365 0.025 18 cg04588380 1.61 1.36-1.89 1.13E-08 0.609 -0.025 KRT2 12 cg10319640* 1.61 1.37-1.9 1.14E-08 0.696 -0.033 SERPINA3 14 cg22379463* 1.59 1.36-1.87 1.15E-08 0.579 -0.024 3 cg17131905* 1.62 1.37-1.91 1.15E-08 0.883 -0.022 JPH1 8 cg24896053* 1.62 1.37-1.9 1.15E-08 0.891 -0.021 10 cg26163284* 1.61 1.36-1.89 1.15E-08 0.634 -0.018 19 cg24516259* 1.59 1.36-1.87 1.16E-08 0.882 -0.031 PLCH2 1 cg06370069* 1.61 1.37-1.89 1.16E-08 0.509 -0.031 CHST3 10 cg19513903* 1.57 1.34-1.83 1.17E-08 0.738 -0.02 DHRS4-AS1 14 cg13236257* 1.55 1.33-1.81 1.17E-08 0.777 -0.017 PPP1R32 11 cg09637963* 1.62 1.37-1.91 1.18E-08 0.764 -0.03 6 cg07699300 1.62 1.37-1.91 1.19E-08 0.676 -0.028 APBA2 15 cg00068433* 1.58 1.35-1.86 1.19E-08 0.843 -0.019 ARHGEF19 1 cg27232166 1.59 1.36-1.86 1.20E-08 0.89 -0.019 21 cg09655341* 1.55 1.33-1.8 1.20E-08 0.635 -0.017 PDE6G 17 cg12801067* 1.61 1.37-1.89 1.21E-08 0.825 -0.029 12 cg24598700* 1.62 1.37-1.91 1.21E-08 0.742 -0.027 5 cg07972309* 1.58 1.35-1.85 1.21E-08 0.952 -0.015 2 cg13259290* 1.59 1.36-1.87 1.21E-08 0.724 -0.014 CSF2 5 cg20181209* 1.66 1.39-1.97 1.22E-08 0.751 -0.032 RPS6KA2 6 cg26621780* 1.64 1.39-1.95 1.23E-08 0.871 -0.026 RASAL1 12 cg07064720* 1.61 1.37-1.9 1.23E-08 0.736 -0.026 15 cg15020132* 1.67 1.4-1.99 1.23E-08 0.829 -0.021 BMPER 7 cg12642651* 1.61 1.37-1.89 1.23E-08 0.876 -0.018 GAS7 17 cg01890568 1.58 1.35-1.85 1.24E-08 0.738 -0.032 ASIC2 17 cg07586799* 1.57 1.35-1.84 1.24E-08 0.793 -0.019 TNK2 3 cg18001869* 1.59 1.36-1.87 1.25E-08 0.733 -0.027 CACNA1I 22 cg04538990* 1.56 1.34-1.82 1.25E-08 0.83 -0.025 11 cg08111661* 
1.57 1.35-1.84 1.26E-08 0.613 -0.037 CACNA1C 12 cg01986263* 1.62 1.37-1.9 1.26E-08 0.734 -0.026 20 cg14804384* 1.57 1.34-1.83 1.26E-08 0.763 -0.025 AP2A2 11 cg11876705* 1.59 1.36-1.87 1.27E-08 0.785 -0.035 RASGRP4 19 cg17090237* 1.62 1.37-1.91 1.27E-08 0.724 -0.022 TUB 11 cg03778108* 1.58 1.35-1.86 1.28E-08 0.816 -0.022 MYBPC3 11 cg02878544* 1.6 1.36-1.88 1.28E-08 0.835 -0.02 16

157 cg23274984* 1.65 1.39-1.96 1.29E-08 0.783 -0.03 2 cg03649060* 1.64 1.39-1.95 1.29E-08 0.758 -0.029 ELOF1 19 cg23679992* 1.6 1.36-1.88 1.29E-08 0.728 -0.027 PLEKHG6 12 cg15237494 1.59 1.36-1.87 1.30E-08 0.192 0.03 ADAMTS5 21 cg17480554 1.58 1.35-1.86 1.31E-08 0.885 -0.023 ITGA2 5 cg05348084* 1.6 1.36-1.88 1.31E-08 0.68 -0.019 MIR376B 14 cg26847866* 1.62 1.37-1.9 1.32E-08 0.52 -0.022 SCARA3 8 cg12083893* 1.64 1.38-1.94 1.33E-08 0.748 -0.032 10 cg00895997* 1.57 1.34-1.83 1.33E-08 0.76 -0.025 SLC13A1 7 cg14857193* 1.63 1.38-1.93 1.34E-08 0.61 -0.027 ALDH1L1 3 cg05685056* 1.56 1.34-1.82 1.34E-08 0.655 -0.027 11 cg26275110 1.56 1.34-1.82 1.34E-08 0.629 -0.024 SLC22A11 11 cg10497754 1.58 1.35-1.85 1.35E-08 0.844 -0.023 17 cg18401376* 1.58 1.35-1.86 1.35E-08 0.691 -0.02 6 cg19846389* 1.59 1.35-1.86 1.36E-08 0.668 -0.03 KRT9 17 cg01389386 1.61 1.36-1.89 1.36E-08 0.891 -0.017 ST3GAL2 16 cg27426287 1.53 1.32-1.77 1.36E-08 0.105 0.018 SVIL 10 cg12169927* 1.59 1.35-1.86 1.37E-08 0.511 -0.015 7 cg24682512 1.64 1.38-1.94 1.38E-08 0.803 -0.031 GRM3 7 cg23369632* 1.65 1.39-1.97 1.40E-08 0.84 -0.027 DST 6 cg04656009* 1.56 1.34-1.83 1.40E-08 0.75 -0.026 CCL8 17 cg10902033* 1.55 1.33-1.8 1.40E-08 0.748 -0.022 PP7080 5 cg04344476* 1.54 1.33-1.79 1.40E-08 0.6 -0.02 NKPD1 19 cg21151963 1.61 1.37-1.9 1.41E-08 0.887 -0.026 HLA-DPB1 6 cg11935831 1.56 1.34-1.82 1.41E-08 0.733 -0.023 CCDC57 17 cg08284151 1.65 1.39-1.97 1.41E-08 0.582 -0.016 DPPA3 12 cg05053829* 1.57 1.34-1.83 1.41E-08 0.484 -0.007 8 cg05939560* 1.57 1.34-1.83 1.43E-08 0.446 -0.032 RIMBP2 12 cg09990169* 1.55 1.33-1.81 1.44E-08 0.69 -0.024 C2orf54 2 cg14160419* 1.58 1.35-1.85 1.45E-08 0.721 -0.034 RBM47 4 cg20696173* 1.63 1.38-1.93 1.45E-08 0.824 -0.027 13 cg20711030* 1.6 1.36-1.88 1.46E-08 0.697 -0.03 GNG7 19 cg04363281* 1.71 1.42-2.05 1.46E-08 0.78 -0.025 LINC01101 2 cg11387248* 1.55 1.33-1.81 1.46E-08 0.709 -0.025 PSG8 19 cg00808175* 1.58 1.35-1.86 1.46E-08 0.86 -0.024 GNB3 12 cg25116503 1.57 1.35-1.84 1.47E-08 0.874 
-0.022 TMPRSS4 11 cg07778029 1.56 1.34-1.82 1.47E-08 0.047 0.0157 HOXA9 7 cg21605271*^ 1.52 1.32-1.76 1.48E-08 0.573 -0.01 3 cg02083319 1.59 1.36-1.87 1.49E-08 0.753 -0.025 DOK3 5 cg13106156 1.56 1.34-1.83 1.50E-08 0.705 -0.024 COL11A2 6 cg01739845* 1.55 1.33-1.81 1.50E-08 0.681 -0.022 15 cg03975135* 1.56 1.34-1.81 1.50E-08 0.69 -0.016 PDCD6 5 cg25512381 1.57 1.34-1.83 1.52E-08 0.726 -0.024 3

158 cg10322876* 1.57 1.35-1.85 1.52E-08 0.881 -0.016 CYP2B6 19 cg11915812* 1.57 1.34-1.83 1.53E-08 0.854 -0.022 HPCAL1 2 cg16358741* 1.56 1.34-1.82 1.53E-08 0.541 -0.018 ST14 11 cg00706913* 1.62 1.37-1.92 1.55E-08 0.719 -0.022 PRDM16 1 cg08251765* 1.58 1.35-1.86 1.55E-08 0.904 -0.017 ZFAT 8 cg10129884* 1.61 1.36-1.89 1.57E-08 0.858 -0.029 16 cg20408175 1.56 1.34-1.82 1.57E-08 0.418 -0.019 PFDN5 12 cg11223735* 1.62 1.37-1.91 1.57E-08 0.655 -0.009 MGMT 10 cg02879159* 1.58 1.35-1.86 1.58E-08 0.777 -0.025 TRIM15 6 cg24358467* 1.58 1.35-1.85 1.59E-08 0.703 -0.021 AFAP1 4 cg16200799 1.58 1.35-1.86 1.59E-08 0.818 -0.019 MRC2 17 cg13894747* 1.75 1.44-2.12 1.59E-08 0.799 -0.015 1 cg04548002* 1.58 1.35-1.86 1.60E-08 0.701 -0.017 DEGS2 14 cg21071625* 1.6 1.36-1.89 1.61E-08 0.606 -0.014 ACTN4 19 cg15723874 1.61 1.37-1.9 1.62E-08 0.778 -0.035 MEF2D 1 cg06073417* 1.61 1.36-1.89 1.62E-08 0.741 -0.025 PLA1A 3 cg25061610* 1.56 1.34-1.82 1.63E-08 0.68 -0.02 HLA-DOA 6 cg27425146* 1.56 1.34-1.82 1.65E-08 0.933 -0.015 1 cg27596890 1.62 1.37-1.9 1.66E-08 0.727 -0.032 ZNF423 16 cg11214576*^ 1.57 1.34-1.84 1.66E-08 0.739 -0.018 LDLRAP1 1 cg24678732* 1.62 1.37-1.92 1.67E-08 0.788 -0.028 3 cg05302701* 1.55 1.33-1.81 1.67E-08 0.834 -0.022 NFATC1 18 cg22368204* 1.6 1.36-1.89 1.68E-08 0.775 -0.028 KCNQ2 20 cg18869840* 1.6 1.36-1.88 1.69E-08 0.831 -0.021 TCF7L1 2 cg14979674* 1.59 1.36-1.87 1.70E-08 0.767 -0.027 HCN1 5 cg05231226* 1.59 1.36-1.87 1.70E-08 0.825 -0.02 ACSF3 16 cg11932891* 1.6 1.36-1.88 1.72E-08 0.814 -0.024 ATXN7L1 7 cg27069753 1.57 1.34-1.83 1.73E-08 0.652 -0.02 CELA3B 1 cg09149541 1.54 1.33-1.79 1.74E-08 0.887 -0.017 TXNRD2 22 cg15169900 1.58 1.35-1.85 1.75E-08 0.778 -0.031 6 cg20710258* 1.53 1.32-1.78 1.75E-08 0.78 -0.018 3 cg07429087* 1.56 1.34-1.82 1.76E-08 0.713 -0.035 NMUR2 5 cg22372285* 1.58 1.35-1.86 1.76E-08 0.827 -0.024 11 cg10755723* 1.62 1.37-1.91 1.77E-08 0.636 -0.03 MB21D2 3 cg02556071* 1.64 1.38-1.95 1.77E-08 0.789 -0.029 CARD18 11 cg26808154* 1.59 1.35-1.87 
1.77E-08 0.751 -0.027 MORN1 1 cg07457213 1.58 1.35-1.85 1.83E-08 0.843 -0.023 SH3PXD2A 10 cg17162576 1.57 1.34-1.84 1.83E-08 0.0966 0.0174 SGIP1 1 cg07472708 1.59 1.36-1.88 1.85E-08 0.812 -0.025 SMOX 20 cg07915323* 1.55 1.33-1.81 1.86E-08 0.96 -0.015 14 cg26924223* 1.57 1.34-1.84 1.87E-08 0.819 -0.02 ZNF563 19 cg18320149* 1.58 1.35-1.86 1.88E-08 0.749 -0.026 5 cg03838714* 1.56 1.34-1.82 1.88E-08 0.693 -0.026 10

159 cg00624799 1.54 1.32-1.78 1.88E-08 0.768 -0.025 ZNF710 15 cg20143758* 1.55 1.33-1.81 1.88E-08 0.748 -0.022 CCR6 6 cg23960324 1.58 1.35-1.85 1.88E-08 0.116 0.018 MIR203 14 cg01674585* 1.62 1.37-1.91 1.89E-08 0.651 -0.03 20 cg02328758* 1.67 1.4-2 1.89E-08 0.779 -0.028 NLGN1 3 cg22822910* 1.58 1.35-1.86 1.89E-08 0.91 -0.016 22 cg03604731* 1.55 1.33-1.81 1.90E-08 0.86 -0.021 2 cg06941630 1.58 1.35-1.86 1.91E-08 0.667 -0.027 12 cg16706613*^ 1.59 1.35-1.87 1.91E-08 0.885 -0.022 12 cg05583636* 1.57 1.34-1.84 1.92E-08 0.892 -0.021 ARHGAP17 16 cg22037030* 1.6 1.36-1.89 1.93E-08 0.78 -0.032 1 cg07972716* 1.57 1.34-1.84 1.94E-08 0.601 -0.037 20 cg13442606* 1.59 1.35-1.87 1.94E-08 0.681 -0.027 CARS 11 cg17960823* 1.58 1.35-1.85 1.96E-08 0.798 -0.021 LOC283332 12 cg07235935* 1.62 1.37-1.92 1.97E-08 0.48 -0.031 NDST4 4 cg19131667* 1.61 1.36-1.9 1.99E-08 0.676 -0.025 TBC1D5 3 cg05334655* 1.6 1.36-1.89 2.00E-08 0.669 -0.027 MYH7 14 cg03455990 1.55 1.33-1.81 2.00E-08 0.828 -0.023 10 cg17866692 1.61 1.36-1.9 2.01E-08 0.659 -0.029 TRPV4 12 cg01832757* 1.59 1.35-1.87 2.01E-08 0.817 -0.026 6 cg05977669 1.62 1.37-1.92 2.01E-08 0.148 0.026 HOXA11-AS 7 cg24512581* 1.73 1.43-2.1 2.02E-08 0.742 -0.026 4 cg21646060* 1.56 1.34-1.82 2.02E-08 0.64 -0.017 7 cg06013833 1.64 1.38-1.95 2.03E-08 0.851 -0.025 11 cg16652259 1.57 1.34-1.83 2.03E-08 0.117 0.019 DLX1 2 cg00827176 1.56 1.34-1.82 2.04E-08 0.676 -0.023 NPM2 8 cg11515702 1.54 1.33-1.8 2.04E-08 0.834 -0.015 ALDH3B1 11 cg18335991 1.54 1.33-1.8 2.05E-08 0.763 -0.028 SEMA7A 15 cg07633702* 1.58 1.35-1.85 2.05E-08 0.697 -0.026 18 cg08414647* 1.68 1.4-2.02 2.06E-08 0.639 -0.029 GPR156 3 cg21949830 1.56 1.34-1.83 2.06E-08 0.812 -0.026 SLC43A2 17 cg01692088* 1.56 1.34-1.82 2.06E-08 0.666 -0.023 1 cg15912800 1.55 1.33-1.81 2.06E-08 0.0994 0.0286 MIR196B 7 cg12664938* 1.6 1.36-1.89 2.07E-08 0.924 -0.021 LRRK2 12 cg26538442* 1.58 1.35-1.86 2.07E-08 0.865 -0.021 CES3 16 cg27565277 1.58 1.35-1.85 2.08E-08 0.831 -0.029 XYLT1 16 cg18014109 1.56 
1.33-1.81 2.11E-08 0.708 -0.027 11 cg23210075 1.56 1.34-1.82 2.11E-08 0.834 -0.017 BAIAP2 17 cg13826452 1.56 1.34-1.83 2.11E-08 0.489 0.033 6 cg02965895* 1.6 1.36-1.89 2.13E-08 0.746 -0.038 OR51L1 11 cg03290522* 1.6 1.36-1.89 2.15E-08 0.664 -0.027 15 cg01184194* 1.55 1.33-1.81 2.15E-08 0.837 -0.017 5 cg14597586* 1.56 1.33-1.82 2.16E-08 0.721 -0.021 KLHL30 2

160 cg07125243* 1.62 1.37-1.92 2.17E-08 0.599 -0.021 ANGPT1 8 cg03960562* 1.56 1.34-1.82 2.17E-08 0.682 -0.016 SCGB1C1 11 cg25968937 1.56 1.34-1.82 2.19E-08 0.741 -0.032 DOK6 18 cg24900164* 1.52 1.31-1.75 2.20E-08 0.832 -0.024 8 cg14642381* 1.56 1.33-1.82 2.21E-08 0.617 -0.024 PRDM14 8 cg14894368* 1.56 1.34-1.83 2.22E-08 0.825 -0.029 NAPSA 19 cg18170297* 1.6 1.36-1.89 2.22E-08 0.681 -0.025 8 cg24009030* 1.56 1.34-1.82 2.22E-08 0.744 -0.02 13 cg18845433* 1.6 1.36-1.88 2.24E-08 0.679 -0.036 12 cg04299028* 1.56 1.33-1.81 2.24E-08 0.763 -0.023 KCNA10 1 cg24700494* 1.57 1.34-1.85 2.27E-08 0.645 -0.025 MDC1 6 cg20547131* 1.58 1.35-1.86 2.29E-08 0.617 -0.026 MIR412 14 cg16247269* 1.58 1.35-1.86 2.31E-08 0.607 -0.024 5 cg04896437* 1.58 1.34-1.85 2.32E-08 0.673 -0.018 1 cg26429636* 1.57 1.34-1.85 2.33E-08 0.744 -0.025 AZGP1 7 cg06769994 1.54 1.32-1.79 2.35E-08 0.487 -0.03 17 cg16334425* 1.54 1.32-1.79 2.36E-08 0.846 -0.025 CORO1B 11 cg00568164* 1.58 1.34-1.85 2.37E-08 0.859 -0.02 TSPAN18 11 cg02713951 1.56 1.33-1.82 2.37E-08 0.511 -0.016 22 cg23399011* 1.63 1.37-1.93 2.38E-08 0.647 -0.035 PIRT 17 cg25227672 1.59 1.35-1.87 2.38E-08 0.677 -0.028 HEPACAM 11 cg21012296 1.53 1.32-1.77 2.38E-08 0.142 0.03 11 cg16247184* 1.59 1.35-1.87 2.40E-08 0.735 -0.032 CEP112 17 cg00085732 1.55 1.33-1.81 2.40E-08 0.758 -0.024 SLC13A2 17 cg25420945* 1.53 1.32-1.78 2.40E-08 0.646 -0.011 5 cg01300684* 1.62 1.36-1.91 2.43E-08 0.777 -0.027 SCOC 4 cg24446586 1.57 1.34-1.85 2.44E-08 0.0966 0.0294 HOXA11-AS 7 cg01435039 1.59 1.35-1.87 2.45E-08 0.893 -0.02 PTCHD3P1 10 cg24017800* 1.57 1.34-1.84 2.45E-08 0.905 -0.014 17 cg01206944* 1.53 1.32-1.77 2.46E-08 0.652 -0.026 ATP10A 15 cg20302957* 1.6 1.36-1.89 2.46E-08 0.792 -0.017 11 cg12524061 1.57 1.34-1.85 2.47E-08 0.883 -0.016 SDR9C7 12 cg27252164 1.56 1.34-1.83 2.47E-08 0.185 0.028 ASCL2 11 cg24995347 1.57 1.34-1.84 2.48E-08 0.839 -0.023 SP140 2 cg17306339* 1.61 1.36-1.9 2.50E-08 0.704 -0.034 ATP6V0A2 12 cg27000831* 1.54 1.32-1.8 2.50E-08 0.763 -0.026 
CCL8 17 cg08956010* 1.54 1.32-1.8 2.50E-08 0.873 -0.022 FGF23 12 cg12800105* 1.6 1.36-1.89 2.52E-08 0.846 -0.031 KDM2B 12 cg00741624 1.54 1.32-1.8 2.53E-08 0.347 0.026 MYOZ3 14 cg18204684 1.63 1.37-1.93 2.54E-08 0.844 -0.027 4 cg14665203* 1.61 1.36-1.9 2.55E-08 0.798 -0.029 FAM46B 1 cg10623198 1.56 1.34-1.82 2.55E-08 0.802 -0.022 GNB3 12 cg22724921* 1.58 1.35-1.86 2.55E-08 0.558 -0.014 NUTM2A 10

161 cg01704534* 1.63 1.37-1.93 2.56E-08 0.695 -0.027 RBM47 4 cg09412882* 1.56 1.34-1.83 2.56E-08 0.794 -0.026 MICAL2 11 cg01285652* 1.53 1.32-1.78 2.56E-08 0.826 -0.025 SBF1 22 cg09706133* 1.6 1.36-1.89 2.58E-08 0.428 -0.019 ITGA11 15 cg23973801* 1.54 1.32-1.8 2.58E-08 0.906 -0.015 UBOX5 20 cg16513326 1.53 1.32-1.77 2.59E-08 0.737 -0.024 XKR6 8 cg18046365* 1.58 1.34-1.85 2.60E-08 0.661 -0.031 SLC1A7 1 cg15396686 1.53 1.32-1.78 2.60E-08 0.176 0.03 HMX3 10 cg16375358* 1.6 1.36-1.88 2.62E-08 0.743 -0.031 CACNG5 17 cg26896979 1.6 1.36-1.89 2.62E-08 0.824 -0.027 CENPS 1 cg06956052* 1.56 1.34-1.83 2.62E-08 0.662 -0.02 ATP1A4 1 cg25811526* 1.58 1.34-1.86 2.63E-08 0.639 -0.033 SLC4A5 2 cg09807856* 1.6 1.36-1.89 2.63E-08 0.841 -0.023 13 cg06139513* 1.6 1.36-1.89 2.64E-08 0.867 -0.026 RAB35 12 cg24673600* 1.57 1.34-1.84 2.64E-08 0.87 -0.026 BATF 14 cg02486096* 1.57 1.34-1.84 2.64E-08 0.519 -0.025 19 cg21415035 1.59 1.35-1.88 2.64E-08 0.88 -0.021 ACVRL1 12 cg23847712 1.55 1.33-1.81 2.64E-08 0.254 0.034 DRD5 4 cg00608779* 1.57 1.34-1.84 2.65E-08 0.808 -0.018 PRRX2 9 cg02556649* 1.59 1.35-1.88 2.66E-08 0.881 -0.036 TNNT3 11 cg00550955*^ 1.53 1.32-1.78 2.66E-08 0.839 -0.024 22 cg05426956 1.56 1.33-1.82 2.66E-08 0.689 -0.019 HTRA1 10 cg02387843* 1.63 1.37-1.94 2.67E-08 0.727 -0.03 SLC2A9 4 cg09108532 1.63 1.37-1.93 2.67E-08 0.557 -0.029 11 cg25945273* 1.57 1.34-1.84 2.67E-08 0.939 -0.02 CLEC17A 19 cg02155891* 1.59 1.35-1.87 2.67E-08 0.71 -0.019 NKD2 5 cg16062877* 1.56 1.34-1.83 2.67E-08 0.691 -0.019 KSR1 17 cg21376526* 1.56 1.33-1.82 2.68E-08 0.81 -0.02 SPDYE4 17 cg11131599 1.63 1.37-1.94 2.69E-08 0.78 -0.033 PLEKHA6 1 cg08500172* 1.61 1.36-1.91 2.69E-08 0.828 -0.023 8 cg08989942* 1.55 1.33-1.81 2.69E-08 0.872 -0.021 CD207 2 cg02362292* 1.57 1.34-1.85 2.70E-08 0.753 -0.023 DAGLA 11 cg03099790* 1.63 1.37-1.93 2.70E-08 0.808 -0.018 AMFR 16 cg06588201 1.55 1.33-1.8 2.71E-08 0.213 0.024 4 cg01597398 1.58 1.34-1.86 2.72E-08 0.641 -0.022 17 cg10529401 1.55 1.33-1.8 2.72E-08 0.732 
-0.019 SLC1A7 1 cg03377767* 1.58 1.35-1.87 2.73E-08 0.794 -0.029 MSGN1 2 cg14421980* 1.54 1.32-1.8 2.74E-08 0.806 -0.025 RPL29P2 17 cg17185921* 1.56 1.34-1.83 2.74E-08 0.856 -0.024 6 cg13490979* 1.6 1.36-1.89 2.74E-08 0.866 -0.02 ZNF385B 2 cg01119374 1.56 1.33-1.82 2.75E-08 0.743 -0.023 2 cg06430509* 1.63 1.37-1.93 2.76E-08 0.777 -0.037 21 cg14558716* 1.56 1.34-1.83 2.76E-08 0.762 -0.026 ATF1 12

162 cg15110243* 1.6 1.36-1.89 2.78E-08 0.764 -0.036 TENM4 11 cg24416609 1.58 1.34-1.86 2.78E-08 0.642 -0.025 GPR55 2 cg15152595* 1.56 1.33-1.82 2.78E-08 0.918 -0.021 NFASC 1 cg16431978 1.61 1.36-1.9 2.80E-08 0.669 -0.03 KRTAP13-3 21 cg02240748* 1.55 1.33-1.81 2.80E-08 0.64 -0.02 16 cg12599700 1.53 1.32-1.78 2.80E-08 0.126 0.021 16 cg10211744 1.54 1.32-1.8 2.82E-08 0.638 -0.019 22 cg06506523* 1.58 1.34-1.85 2.82E-08 0.903 -0.018 MEGF10 5 cg24512093 1.65 1.39-1.98 2.83E-08 0.569 -0.027 ROBO1 3 cg22816131* 1.56 1.33-1.82 2.84E-08 0.929 -0.015 5 cg00934232* 1.61 1.36-1.91 2.85E-08 0.854 -0.033 BAALC 8 cg19267163* 1.62 1.37-1.92 2.85E-08 0.85 -0.023 NKAIN2 6 cg23477380* 1.57 1.34-1.84 2.85E-08 0.778 -0.019 NDUFA11 19 cg06759632* 1.56 1.33-1.82 2.85E-08 0.902 -0.016 ACAP3 1 cg08267399* 1.58 1.35-1.86 2.86E-08 0.52 -0.033 LINC01300 8 cg08951271* 1.53 1.32-1.79 2.86E-08 0.836 -0.029 DDR1 6 cg00321901* 1.57 1.34-1.85 2.87E-08 0.458 -0.019 RP1L1 8 cg23345292 1.59 1.35-1.87 2.89E-08 0.7 -0.03 LOXHD1 18 cg25394452* 1.5 1.3-1.73 2.89E-08 0.969 -0.008 PRRX1 1 cg02478149* 1.53 1.32-1.78 2.90E-08 0.704 -0.021 GALR3 22 cg00013618 1.57 1.34-1.85 2.91E-08 0.861 -0.021 22 cg20442289* 1.62 1.36-1.92 2.92E-08 0.805 -0.026 C5orf15 5 cg02441543* 1.54 1.32-1.8 2.92E-08 0.861 -0.013 SHANK3 22 cg04010868* 1.56 1.33-1.82 2.93E-08 0.726 -0.022 ZC3H7A 16 cg14741236* 1.59 1.35-1.88 2.96E-08 0.74 -0.029 5 cg03707749* 1.63 1.37-1.93 2.96E-08 0.836 -0.026 12 cg11686208* 1.6 1.35-1.89 2.96E-08 0.862 -0.023 BCL11A 2 cg00361176* 1.54 1.32-1.8 2.97E-08 0.798 -0.02 HIP1 7 cg01668099 1.59 1.35-1.88 2.98E-08 0.663 -0.027 11 cg10472263* 1.65 1.38-1.97 3.00E-08 0.828 -0.025 SNORD114-25 14 cg26040761* 1.52 1.31-1.76 3.02E-08 0.902 -0.012 EPS8L3 1 cg06256858 1.59 1.35-1.87 3.02E-08 0.101 0.015 CRMP1 4 cg20017995 1.54 1.32-1.8 3.03E-08 0.675 -0.028 SNORD24 9 cg13683194* 1.63 1.37-1.94 3.04E-08 0.726 -0.029 TMEM246 9 cg24479552* 1.56 1.33-1.82 3.04E-08 0.915 -0.018 FFAR1 19 cg12682367* 1.63 1.37-1.94 3.05E-08 
0.697 -0.031 13 cg10767662* 1.5 1.3-1.73 3.05E-08 0.827 -0.02 NFIX 19 cg12562132 1.57 1.34-1.84 3.06E-08 0.784 -0.023 11 cg08314660* 1.53 1.31-1.77 3.06E-08 0.821 -0.017 PKP3 11 cg09529783* 1.55 1.33-1.81 3.08E-08 0.851 -0.023 8 cg03014326^ 1.54 1.32-1.79 3.09E-08 0.201 0.035 ANKRD30B 18 cg08903465 1.61 1.36-1.91 3.10E-08 0.814 -0.02 1 cg02872868 1.62 1.36-1.92 3.11E-08 0.627 -0.037 SORBS2 4

163 cg14206056* 1.59 1.35-1.88 3.11E-08 0.816 -0.028 9 cg25637966* 1.57 1.34-1.84 3.11E-08 0.683 -0.023 TMPRSS4 11 cg12113981 1.58 1.34-1.86 3.13E-08 0.834 -0.023 CHRNA7 15 cg06191091* 1.56 1.33-1.82 3.13E-08 0.829 -0.022 17 cg25055341* 1.53 1.32-1.79 3.13E-08 0.84 -0.016 12 cg26354908 1.55 1.33-1.81 3.14E-08 0.756 -0.031 12 cg12074024* 1.54 1.32-1.8 3.14E-08 0.771 -0.02 1 cg11061434* 1.55 1.33-1.81 3.16E-08 0.77 -0.026 TENM2 5 cg07356876 1.56 1.33-1.82 3.16E-08 0.725 -0.02 KRT6B 12 cg12008118* 1.53 1.32-1.78 3.16E-08 0.816 -0.013 LDB3 10 cg03770548 1.52 1.31-1.76 3.18E-08 0.832 -0.021 CDH24 14 cg02035102* 1.56 1.33-1.82 3.20E-08 0.567 -0.018 7 cg07759394 1.5 1.3-1.73 3.20E-08 0.0365 0.0214 GLB1L2 11 cg02936931 1.6 1.35-1.89 3.21E-08 0.811 -0.029 6 cg20428976* 1.59 1.35-1.88 3.21E-08 0.866 -0.021 TTR 18 cg03369289* 1.51 1.31-1.75 3.21E-08 0.705 -0.02 7 cg14326354* 1.56 1.34-1.83 3.22E-08 0.647 -0.025 PRODH 22 cg16845126* 1.54 1.32-1.8 3.22E-08 0.708 -0.024 RAMP1 2 cg19063856 1.56 1.33-1.82 3.22E-08 0.534 -0.02 RAPGEFL1 17 cg10546487* 1.58 1.34-1.86 3.22E-08 0.889 -0.016 CLIC4 1 cg10203223* 1.52 1.31-1.75 3.23E-08 0.596 -0.021 13 cg25104234 1.58 1.35-1.87 3.24E-08 0.572 -0.034 2 cg02862087 1.5 1.3-1.72 3.24E-08 0.659 -0.02 PXDN 2 cg03216698* 1.58 1.34-1.86 3.27E-08 0.564 -0.022 TNXB 6 cg07330438* 1.62 1.37-1.93 3.28E-08 0.782 -0.031 21 cg21508713* 1.55 1.33-1.81 3.29E-08 0.722 -0.022 5 cg08735716 1.56 1.33-1.82 3.30E-08 0.71 -0.021 2 cg16360432* 1.56 1.33-1.83 3.30E-08 0.672 -0.021 TRPV6 7 cg00056002 1.56 1.33-1.82 3.30E-08 0.936 -0.015 PABPN1L 16 cg05653018* 1.55 1.33-1.81 3.32E-08 0.877 -0.022 ELF3 1 cg18584639* 1.61 1.36-1.91 3.34E-08 0.7 -0.03 5 cg12594635* 1.55 1.33-1.81 3.34E-08 0.77 -0.025 12 cg10363926* 1.58 1.34-1.86 3.34E-08 0.662 -0.024 SH3RF3 2 cg26750789 1.54 1.32-1.8 3.35E-08 0.617 -0.015 CSHL1 17 cg13614440 1.63 1.37-1.93 3.36E-08 0.644 -0.028 DNAH17 17 cg04945379* 1.59 1.35-1.88 3.37E-08 0.682 -0.033 CIITA 16 cg19589396* 1.58 1.34-1.86 3.37E-08 0.696 
-0.033 8 cg21100712* 1.58 1.35-1.87 3.37E-08 0.881 -0.023 5 cg10631854* 1.64 1.38-1.96 3.38E-08 0.708 -0.032 3 cg27428414* 1.58 1.34-1.86 3.38E-08 0.633 -0.028 HOGA1 10 cg00062437* 1.58 1.34-1.86 3.38E-08 0.572 -0.018 22 cg02527112 1.59 1.35-1.87 3.39E-08 0.0775 0.0154 HOXD11 2 cg09558425 1.57 1.34-1.84 3.40E-08 0.78 -0.022 10

164 cg18468962* 1.52 1.31-1.77 3.41E-08 0.702 -0.032 19 cg17987968* 1.59 1.35-1.88 3.41E-08 0.718 -0.029 GRIA1 5 cg20326936* 1.56 1.34-1.83 3.41E-08 0.595 -0.029 7 cg03603395* 1.57 1.34-1.84 3.41E-08 0.805 -0.027 7 cg09512973* 1.61 1.36-1.9 3.42E-08 0.609 -0.032 MOB2 11 cg23180780* 1.61 1.36-1.9 3.42E-08 0.814 -0.028 2 cg02598564 1.59 1.35-1.88 3.42E-08 0.721 -0.026 10 cg11395890* 1.55 1.33-1.81 3.42E-08 0.851 -0.02 FHAD1 1 cg01617211* 1.56 1.34-1.83 3.43E-08 0.822 -0.028 AFG3L1P 16 cg26166595* 1.54 1.32-1.79 3.45E-08 0.368 -0.026 DKK3 11 cg20462100* 1.57 1.34-1.85 3.45E-08 0.879 -0.025 1 cg27649094 1.58 1.34-1.86 3.46E-08 0.74 -0.026 FAM83A 8 cg03619083* 1.52 1.31-1.75 3.46E-08 0.857 -0.017 GCDH 19 cg20116869 1.66 1.39-1.98 3.47E-08 0.784 -0.02 15 cg05495005 1.59 1.35-1.88 3.51E-08 0.859 -0.024 15 cg17480438* 1.56 1.33-1.83 3.52E-08 0.803 -0.03 ARFGEF3 6 cg13091702* 1.54 1.32-1.8 3.55E-08 0.844 -0.019 CTTN 11 cg07071839* 1.55 1.33-1.81 3.56E-08 0.805 -0.016 CACNA2D4 12 cg20766179 1.56 1.33-1.83 3.57E-08 0.729 -0.032 KY 3 cg01720520* 1.57 1.34-1.85 3.57E-08 0.713 -0.027 LILRB1 19 cg04670168* 1.57 1.34-1.85 3.57E-08 0.737 -0.023 MYOZ3 5 cg18782488 1.55 1.32-1.81 3.61E-08 0.834 -0.04 AGBL3 7 cg06313775* 1.55 1.33-1.81 3.62E-08 0.931 -0.022 DSN1 20 cg01010091 1.58 1.34-1.86 3.63E-08 0.815 -0.028 14 cg01967416* 1.55 1.32-1.81 3.63E-08 0.795 -0.023 SH2D1B 1 cg14480035 1.58 1.34-1.86 3.65E-08 0.66 -0.03 9 cg25430713* 1.56 1.33-1.83 3.65E-08 0.792 -0.024 2 cg26244013* 1.59 1.35-1.87 3.66E-08 0.724 -0.037 SCNN1A 12 cg15093248* 1.56 1.33-1.83 3.66E-08 0.655 -0.017 ESPNP 1 cg21614211 1.53 1.32-1.78 3.66E-08 0.72 -0.017 VILL 3 cg04857172* 1.56 1.33-1.82 3.67E-08 0.812 -0.02 SRL 16 cg17470095 1.58 1.34-1.87 3.67E-08 0.687 -0.017 AIFM3 22 cg21150026* 1.54 1.32-1.8 3.68E-08 0.713 -0.027 2 cg05673966* 1.52 1.31-1.77 3.69E-08 0.739 -0.023 16 cg00718336 1.62 1.37-1.93 3.69E-08 0.863 -0.023 22 cg03986792* 1.55 1.33-1.81 3.74E-08 0.791 -0.015 2 cg07566463 1.61 1.36-1.9 3.74E-08 0.866 
-0.015 8 cg13595308 1.55 1.32-1.81 3.75E-08 0.779 -0.022 9 cg06865119 1.6 1.35-1.89 3.76E-08 0.908 -0.02 SNX29 16 cg27511824* 1.56 1.33-1.82 3.77E-08 0.648 -0.022 1 cg02498072* 1.58 1.34-1.87 3.77E-08 0.829 -0.02 SNX29 16 cg09388414* 1.54 1.32-1.79 3.79E-08 0.723 -0.026 MIR96 7 cg20131897* 1.55 1.32-1.81 3.80E-08 0.779 -0.022 ACVRL1 12

165 cg08757721* 1.53 1.32-1.78 3.80E-08 0.961 -0.016 CDR2L 17 cg08130490* 1.52 1.31-1.76 3.81E-08 0.948 -0.018 NOTCH4 6 cg06119575* 1.52 1.31-1.76 3.81E-08 0.937 -0.017 TAL2 9 cg20748559 1.58 1.34-1.86 3.82E-08 0.858 -0.025 MAP4K1 19 cg26727666* 1.57 1.34-1.85 3.83E-08 0.884 -0.023 TVP23A 16 cg18122743* 1.54 1.32-1.8 3.86E-08 0.704 -0.027 2 cg14909325* 1.56 1.33-1.83 3.86E-08 0.766 -0.023 MOB2 11 cg14972827* 1.62 1.36-1.92 3.87E-08 0.818 -0.028 6 cg04113258* 1.57 1.34-1.84 3.87E-08 0.714 -0.023 XYLT1 16 cg26325174 1.54 1.32-1.8 3.89E-08 0.606 -0.021 MEG8 14 cg00819818* 1.53 1.31-1.78 3.89E-08 0.761 -0.018 RARA 17 cg27629673* 1.61 1.36-1.9 3.90E-08 0.707 -0.023 ADCY2 5 cg14217211* 1.64 1.37-1.95 3.92E-08 0.761 -0.022 3 cg18463876* 1.59 1.35-1.87 3.94E-08 0.731 -0.03 SRRM3 7 cg26113278* 1.62 1.36-1.92 3.95E-08 0.738 -0.03 LDLRAD1 1 cg09673807* 1.56 1.33-1.83 3.96E-08 0.673 -0.028 6 cg18804615 1.52 1.31-1.77 3.98E-08 0.157 0.024 DIO3 14 cg27157619 1.56 1.33-1.83 3.99E-08 0.884 -0.02 SEMA7A 15 cg08638960 1.59 1.35-1.88 4.00E-08 0.668 -0.024 6 cg21852342 1.52 1.31-1.76 4.00E-08 0.0396 0.0134 4 cg22012759* 1.51 1.3-1.75 4.01E-08 0.742 -0.018 1 cg10318725 1.55 1.33-1.81 4.02E-08 0.929 -0.023 RASA3 13 cg00977903* 1.6 1.36-1.9 4.03E-08 0.829 -0.012 ZNF691 1 cg02267488 1.63 1.37-1.93 4.04E-08 0.865 -0.021 DCK 4 cg11851129* 1.52 1.31-1.77 4.04E-08 0.681 -0.02 SHMT2 12 cg09594635* 1.55 1.33-1.81 4.05E-08 0.819 -0.034 16 cg15993345* 1.54 1.32-1.8 4.05E-08 0.694 -0.019 17 cg02116437* 1.54 1.32-1.79 4.06E-08 0.768 -0.021 GPBAR1 2 cg01245155 1.62 1.37-1.93 4.07E-08 0.772 -0.029 PARD3B 2 cg16709433* 1.63 1.37-1.94 4.07E-08 0.751 -0.024 6 cg18076767* 1.55 1.32-1.81 4.07E-08 0.831 -0.022 11 cg25444821* 1.54 1.32-1.8 4.07E-08 0.85 -0.014 KCNK3 2 cg08687753* 1.56 1.33-1.82 4.11E-08 0.632 -0.034 NOX4 11 cg12179154* 1.58 1.34-1.86 4.12E-08 0.845 -0.03 18 cg10303698* 1.52 1.31-1.76 4.12E-08 0.554 -0.02 CUEDC1 17 cg22481882 1.53 1.32-1.79 4.13E-08 0.913 -0.022 WDR97 8 cg24542929* 1.56 
1.33-1.84 4.13E-08 0.798 -0.021 GABRP 5 cg26293512*^ 1.55 1.32-1.81 4.14E-08 0.633 -0.028 TEPP 16 cg17271504* 1.51 1.3-1.75 4.15E-08 0.851 -0.017 1 cg20078074 1.52 1.31-1.77 4.16E-08 0.884 -0.011 12 cg06031011 1.56 1.33-1.83 4.17E-08 0.848 -0.017 6 cg23081357 1.55 1.32-1.81 4.18E-08 0.748 -0.023 2 cg18283321 1.57 1.34-1.85 4.19E-08 0.769 -0.025 DDC 7

166 cg26257163* 1.51 1.3-1.75 4.19E-08 0.558 -0.013 5 cg22852840 1.59 1.35-1.88 4.20E-08 0.769 -0.025 OPN5 6 cg25090514 1.51 1.3-1.75 4.20E-08 0.0705 0.0189 5 cg18320737* 1.54 1.32-1.8 4.23E-08 0.697 -0.03 CDK18 1 cg01717524* 1.57 1.34-1.85 4.25E-08 0.679 -0.028 1 cg26159860* 1.57 1.34-1.85 4.25E-08 0.758 -0.028 11 cg04743510* 1.54 1.32-1.8 4.25E-08 0.673 -0.022 17 cg18666630* 1.61 1.36-1.9 4.27E-08 0.574 -0.026 CACNG2 22 cg23925201* 1.58 1.34-1.87 4.29E-08 0.689 -0.023 2 cg03941975* 1.53 1.31-1.78 4.29E-08 0.536 -0.018 ANKRD26P3 13 cg10459018 1.63 1.37-1.93 4.30E-08 0.753 -0.03 6 cg10134963* 1.58 1.34-1.87 4.30E-08 0.914 -0.018 AGPAT5 8 cg07196992* 1.57 1.34-1.84 4.32E-08 0.862 -0.024 4 cg15410236* 1.52 1.31-1.77 4.33E-08 0.89 -0.027 ARID3A 19 cg17482801 1.52 1.31-1.77 4.33E-08 0.547 -0.022 11 cg23222278* 1.54 1.32-1.8 4.33E-08 0.829 -0.015 3 cg13852084* 1.58 1.34-1.87 4.35E-08 0.743 -0.029 HLA-DMA 6 cg16657886* 1.52 1.31-1.76 4.36E-08 0.691 -0.021 SYT8 11 cg07487751* 1.52 1.31-1.77 4.36E-08 0.921 -0.016 ZDHHC14 6 cg26043460* 1.55 1.32-1.81 4.37E-08 0.801 -0.028 KRT38 17 cg27232482* 1.54 1.32-1.8 4.39E-08 0.551 -0.016 ACKR1 1 cg15758700* 1.57 1.34-1.85 4.42E-08 0.806 -0.03 LMNTD1 12 cg01438467* 1.53 1.31-1.78 4.42E-08 0.878 -0.025 SLC43A2 17 cg15895132* 1.52 1.31-1.77 4.42E-08 0.637 -0.02 1 cg20192747^ 1.59 1.35-1.88 4.42E-08 0.394 0.02 18 cg21513316 1.53 1.31-1.78 4.46E-08 0.841 -0.016 MIR409 14 cg04278197 1.57 1.33-1.84 4.48E-08 0.894 -0.026 20 cg00253346 1.58 1.34-1.86 4.50E-08 0.857 -0.02 TNFRSF13C 22 cg02764245 1.53 1.31-1.78 4.50E-08 0.074 0.0186 2 cg20994118*^ 1.56 1.33-1.83 4.51E-08 0.593 -0.027 CAMK1G 1 cg01295539* 1.56 1.33-1.83 4.52E-08 0.749 -0.028 TLN2 15 cg25754143* 1.56 1.33-1.82 4.52E-08 0.723 -0.027 TRPM5 11 cg01372572* 1.52 1.31-1.76 4.53E-08 0.729 -0.019 ATP10A 15 cg18343556* 1.56 1.33-1.83 4.54E-08 0.671 -0.023 LSMEM2 3 cg03634928* 1.54 1.32-1.79 4.54E-08 0.801 -0.021 TMPRSS4 11 cg20274006* 1.6 1.35-1.9 4.56E-08 0.848 -0.021 LINC02209 8 
cg08707112 1.53 1.31-1.78 4.57E-08 0.0887 0.0133 GATA3 10 cg00029521 1.59 1.35-1.88 4.58E-08 0.682 -0.029 KCNK17 6 cg11306701 1.51 1.3-1.74 4.58E-08 0.135 0.026 CTSL 9 cg00566320 1.52 1.31-1.76 4.59E-08 0.783 -0.022 1 cg11035122* 1.6 1.35-1.89 4.61E-08 0.825 -0.025 MIR758 14 cg10633609* 1.63 1.37-1.94 4.63E-08 0.76 -0.027 4 cg27133493* 1.51 1.3-1.75 4.64E-08 0.937 -0.016 TANGO6 16

167 cg12439157* 1.52 1.31-1.77 4.65E-08 0.455 -0.029 1 cg23561531 1.6 1.35-1.9 4.65E-08 0.682 -0.029 11 cg21560697* 1.57 1.34-1.85 4.66E-08 0.79 -0.032 TAF1B 2 cg25291978* 1.53 1.31-1.78 4.66E-08 0.818 -0.027 PLAGL1 6 cg24004327* 1.55 1.32-1.81 4.68E-08 0.894 -0.015 4 cg23424158 1.63 1.37-1.95 4.69E-08 0.721 -0.03 DLG2 11 cg05316785* 1.57 1.34-1.86 4.71E-08 0.801 -0.03 0 7 cg03515656* 1.54 1.32-1.8 4.71E-08 0.75 -0.025 IFT140 16 cg01091565* 1.59 1.35-1.88 4.72E-08 0.911 -0.012 MESP1 15 cg11161417 1.58 1.34-1.87 4.76E-08 0.735 -0.033 SPACA3 17 cg16412573* 1.53 1.31-1.78 4.76E-08 0.753 -0.019 RGS6 14 cg03182608* 1.52 1.31-1.77 4.77E-08 0.733 -0.021 0 17 cg08118034* 1.52 1.31-1.77 4.77E-08 0.846 -0.02 GAS2L1 22 cg08681689* 1.56 1.33-1.83 4.78E-08 0.622 -0.017 EIF4G1 3 cg19113081* 1.57 1.34-1.85 4.82E-08 0.655 -0.035 NXPE2 11 cg26575166* 1.55 1.32-1.81 4.82E-08 0.907 -0.018 0 15 cg23508908* 1.58 1.34-1.87 4.84E-08 0.861 -0.024 MRAP2 6 cg04935792* 1.62 1.36-1.92 4.84E-08 0.844 -0.018 0 12 cg01840724* 1.57 1.34-1.85 4.85E-08 0.642 -0.021 0 9 cg02964102* 1.53 1.32-1.79 4.85E-08 0.754 -0.021 0 22 cg07205823* 1.56 1.33-1.84 4.87E-08 0.683 -0.032 0 8 cg11352430* 1.56 1.33-1.82 4.88E-08 0.831 -0.022 PDE11A 2 cg06531034* 1.58 1.34-1.86 4.88E-08 0.808 -0.021 0 2 cg19171175* 1.49 1.29-1.72 4.91E-08 0.959 -0.015 LRRC56 11 cg07429623* 1.55 1.32-1.81 4.93E-08 0.732 -0.032 KLHDC8A 1 cg10067062* 1.52 1.31-1.77 4.94E-08 0.675 -0.016 PDE9A 21 cg10603156 1.6 1.35-1.9 4.95E-08 0.764 -0.034 FRMD4A 10 cg16112945* 1.57 1.34-1.85 4.97E-08 0.707 -0.032 ADAMTS13 9 cg25249728* 1.58 1.34-1.86 4.97E-08 0.905 -0.028 0 2 cg09232225 1.51 1.3-1.75 4.98E-08 0.826 -0.02 CES4A 16 cg03655527* 1.5 1.3-1.73 4.98E-08 0.857 -0.013 MT1B 16 cg00518276* 1.57 1.34-1.86 4.99E-08 0.544 -0.022 0 11 cg17153794* 1.55 1.32-1.81 5.02E-08 0.646 -0.022 MACROD1 11 cg05475010 1.53 1.32-1.79 5.04E-08 0.781 -0.016 TREML3P 6 cg19189176* 1.52 1.31-1.77 5.07E-08 0.692 -0.027 KIAA1644 22 cg17545399 1.59 1.35-1.88 5.07E-08 0.805 
-0.025 FRMPD2 10 cg18780021* 1.56 1.33-1.84 5.07E-08 0.789 -0.02 0 17 cg18279094 1.53 1.31-1.78 5.07E-08 0.256 0.043 FOXD3 1 cg13089904* 1.53 1.31-1.78 5.08E-08 0.616 -0.031 0 1 cg20219911* 1.54 1.32-1.79 5.08E-08 0.674 -0.018 ITIH1 3 cg21461720* 1.52 1.3-1.76 5.09E-08 0.823 -0.019 PKDCC 2 cg17033546* 1.55 1.32-1.81 5.10E-08 0.822 -0.016 ZZZ3 1 cg14990643* 1.55 1.32-1.81 5.11E-08 0.62 -0.025 CDHR1 10

168 cg08155591 1.64 1.37-1.95 5.11E-08 0.878 -0.018 E2F5 8 cg17739643 1.55 1.32-1.82 5.12E-08 0.797 -0.021 GUCA2A 1 cg21165812 1.56 1.33-1.82 5.12E-08 0.848 -0.015 0 20 cg14508021* 1.55 1.32-1.81 5.15E-08 0.904 -0.021 AKAP13 15 cg27450010* 1.62 1.36-1.93 5.17E-08 0.683 -0.034 LDB2 4 cg03012028* 1.5 1.3-1.74 5.17E-08 0.62 -0.023 0 7 cg03625136* 1.51 1.3-1.75 5.20E-08 0.82 -0.02 CACNA2D2 3 cg02588809* 1.53 1.31-1.78 5.21E-08 0.824 -0.023 RASA3 13 cg02762689* 1.52 1.31-1.77 5.21E-08 0.919 -0.015 GUCA1B 6 cg02849894*^ 1.56 1.33-1.84 5.22E-08 0.692 -0.026 GNG7 19 cg07191791* 1.54 1.32-1.8 5.23E-08 0.743 -0.032 SUSD4 1 cg21301148* 1.6 1.35-1.9 5.23E-08 0.539 -0.027 MYH6 14 cg12107247* 1.6 1.35-1.9 5.25E-08 0.808 -0.031 0 17 cg13388277* 1.52 1.31-1.76 5.26E-08 0.669 -0.03 ZNF727 7 cg11018337 1.52 1.31-1.77 5.26E-08 0.0245 0.0097 GATA3 10 cg17434901 1.55 1.32-1.82 5.27E-08 0.703 -0.023 0 11 cg11499681 1.56 1.33-1.82 5.27E-08 0.2 0.026 DNAH9 17 cg27448110* 1.52 1.31-1.77 5.27E-08 0.118 0.015 PTPRN2 7 cg05120428 1.56 1.33-1.83 5.28E-08 0.814 -0.018 AFAP1L2 10 cg22489204* 1.54 1.32-1.79 5.29E-08 0.847 -0.026 0 1 cg27353824* 1.55 1.32-1.81 5.31E-08 0.827 -0.021 APOC4 19 cg22129111* 1.55 1.32-1.81 5.32E-08 0.771 -0.025 ALOXE3 17 cg01632562* 1.55 1.32-1.81 5.33E-08 0.828 -0.022 0 6 cg14443182* 1.54 1.32-1.8 5.34E-08 0.678 -0.025 DNAJB5 9 cg06131936* 1.55 1.32-1.82 5.35E-08 0.618 -0.027 OSBPL5 11 cg17401106* 1.57 1.34-1.85 5.35E-08 0.566 -0.02 ADAM33 20 cg23155089 1.54 1.32-1.8 5.35E-08 0.766 -0.019 0 6 cg07707641* 1.56 1.33-1.82 5.37E-08 0.735 -0.022 CST4 20 cg01843768* 1.53 1.31-1.79 5.37E-08 0.812 -0.021 MAD1L1 7 cg13863076 1.53 1.31-1.78 5.40E-08 0.777 -0.018 0 22 cg12439439 1.5 1.3-1.73 5.40E-08 0.0464 0.0081 0 11 cg16095155 1.58 1.34-1.86 5.41E-08 0.783 -0.031 TCF19 6 cg10917910* 1.59 1.35-1.89 5.41E-08 0.848 -0.028 FGD6 12 cg14113353* 1.55 1.32-1.81 5.43E-08 0.808 -0.028 0 2 cg15780078* 1.57 1.34-1.86 5.43E-08 0.734 -0.026 RGR 10 cg05635169* 1.56 1.33-1.83 5.43E-08 0.582 
-0.025 0 5 cg17235833* 1.51 1.3-1.75 5.44E-08 0.66 -0.02 TNXB 6 cg07777793 1.51 1.3-1.75 5.45E-08 0.792 -0.03 ALPPL2 2 cg16407998 1.57 1.34-1.85 5.45E-08 0.672 -0.022 MAGI2 7 cg12660813* 1.57 1.34-1.86 5.46E-08 0.694 -0.027 PRDM16 1 cg23676618* 1.54 1.32-1.8 5.47E-08 0.842 -0.025 C1orf186 1 cg16431352 1.55 1.32-1.81 5.47E-08 0.715 -0.024 DSC3 18 cg06079620 1.61 1.36-1.91 5.47E-08 0.558 -0.021 MPEG1 11

169 cg24787130* 1.58 1.34-1.86 5.48E-08 0.846 -0.026 0 5 cg15145672 1.52 1.31-1.77 5.48E-08 0.878 -0.015 SH3BP5 3 cg05002837* 1.55 1.32-1.82 5.48E-08 0.911 -0.014 GRAMD4 22 cg18254183* 1.6 1.35-1.9 5.49E-08 0.663 -0.026 0 2 cg13944685* 1.59 1.35-1.89 5.49E-08 0.809 -0.021 IL18BP 11 cg11224946* 1.54 1.32-1.8 5.51E-08 0.644 -0.027 AKR1C1 10 cg10754002 1.61 1.36-1.91 5.52E-08 0.582 -0.025 0 2 cg23622353* 1.54 1.32-1.8 5.52E-08 0.839 -0.024 0 6 cg06159980* 1.55 1.32-1.81 5.53E-08 0.84 -0.023 HSPB7 1 cg05653707 1.52 1.31-1.77 5.54E-08 0.11 0.028 EMX2 10 cg19988235 1.56 1.33-1.83 5.57E-08 0.792 -0.023 0 2 cg15549231* 1.54 1.32-1.8 5.58E-08 0.945 -0.022 BCAS4 20 cg13904667* 1.55 1.32-1.81 5.60E-08 0.518 -0.027 0 20 cg22488719*^ 1.59 1.34-1.88 5.60E-08 0.793 -0.026 LINC00636 3 cg21247874* 1.53 1.31-1.78 5.61E-08 0.623 -0.032 POLR1E 9 cg01522296* 1.59 1.34-1.88 5.61E-08 0.859 -0.014 IL17REL 22 cg25662428* 1.52 1.31-1.77 5.62E-08 0.845 -0.018 MKNK2 19 cg15238200 1.53 1.31-1.79 5.62E-08 0.813 -0.017 TRIM65 17 cg25517810 1.56 1.33-1.83 5.63E-08 0.179 0.027 0 2 cg08363345 1.58 1.34-1.86 5.67E-08 0.73 -0.024 FOXK1 7 cg26917745* 1.54 1.32-1.81 5.70E-08 0.598 -0.025 0 7 cg07988206* 1.53 1.31-1.78 5.70E-08 0.514 -0.023 0 12 cg02807446* 1.61 1.36-1.92 5.70E-08 0.863 -0.022 SORCS3 10 cg09251508 1.53 1.31-1.78 5.70E-08 0.765 -0.018 0 3 cg22247041* 1.52 1.31-1.78 5.73E-08 0.577 -0.024 0 17 cg01687389* 1.53 1.31-1.78 5.73E-08 0.664 -0.022 0 16 cg00187299* 1.51 1.3-1.76 5.75E-08 0.703 -0.019 TBATA 10 cg12556586* 1.54 1.32-1.81 5.77E-08 0.87 -0.019 FAM81A 15 cg19117322* 1.54 1.32-1.81 5.78E-08 0.774 -0.017 ARHGAP23 17 cg21531389 1.54 1.32-1.8 5.79E-08 0.609 -0.025 CRYBB1 22 cg03058346* 1.59 1.34-1.88 5.81E-08 0.745 -0.03 0 1 cg26980244 1.56 1.33-1.83 5.81E-08 0.205 0.03 NEFM 8 cg26976908 1.52 1.31-1.76 5.82E-08 0.669 -0.02 0 2 cg00753399* 1.59 1.34-1.88 5.84E-08 0.8 -0.031 0 2 cg24056232 1.54 1.32-1.81 5.84E-08 0.893 -0.02 C1orf186 1 cg08124372* 1.51 1.3-1.75 5.84E-08 0.7 -0.017 0 17 
cg26500588* 1.5 1.3-1.74 5.86E-08 0.78 -0.018 SLC12A7 5 cg22689833 1.51 1.3-1.75 5.87E-08 0.0524 0.0182 0 10 cg13696135 1.54 1.32-1.81 5.92E-08 0.813 -0.019 PRKAG1 12 cg04511534* 1.57 1.33-1.85 5.94E-08 0.839 -0.016 GGT6 17 cg06933777* 1.57 1.34-1.86 5.95E-08 0.685 -0.027 0 18 cg01793445 1.52 1.31-1.78 5.95E-08 0.816 -0.023 RCSD1 1 cg13644626* 1.59 1.34-1.88 5.97E-08 0.691 -0.032 PALM2 9

cg10910525* 1.53 1.31-1.79 5.97E-08 0.673 -0.022 HRC 19 cg14125604 1.55 1.32-1.82 6.00E-08 0.09 0.021 FLRT2 14 cg20972466 1.53 1.31-1.78 6.01E-08 0.373 -0.024 DPEP1 16 cg00509772 1.55 1.32-1.82 6.04E-08 0.601 -0.019 E2F1 20 cg14481124 1.59 1.35-1.89 6.05E-08 0.798 -0.026 0 10 cg14608550 1.49 1.29-1.71 6.06E-08 0.875 -0.016 SEPT9 17 cg02900332 1.53 1.31-1.79 6.06E-08 0.0911 0.0189 0 10 cg12699321* 1.63 1.37-1.95 6.08E-08 0.821 -0.031 0 3 cg02404974* 1.53 1.31-1.78 6.08E-08 0.764 -0.026 0 6 cg22609474* 1.58 1.34-1.86 6.08E-08 0.826 -0.021 0 5 cg20629551 1.52 1.31-1.78 6.08E-08 0.845 -0.019 CEMIP 15 cg00375999 1.55 1.32-1.81 6.08E-08 0.734 -0.013 0 16 cg06377177 1.53 1.31-1.79 6.09E-08 0.77 -0.028 0 11 cg00113369 1.56 1.33-1.83 6.10E-08 0.75 -0.022 LINC00982 1 cg15474711 1.52 1.3-1.76 6.10E-08 0.729 -0.021 TREH 11 cg15114651* 1.52 1.31-1.77 6.10E-08 0.563 -0.019 SLC1A5 19 cg01787559* 1.55 1.32-1.81 6.11E-08 0.598 -0.017 HGFAC 4 cg08010546 1.5 1.3-1.75 6.12E-08 0.712 -0.022 0 17 cg11258108* 1.53 1.31-1.79 6.13E-08 0.677 -0.022 SHROOM3 4 cg11143092 1.59 1.34-1.88 6.14E-08 0.803 -0.027 0 2 cg05709468* 1.59 1.34-1.88 6.15E-08 0.862 -0.029 OSBPL1A 18 cg02172819* 1.57 1.34-1.85 6.15E-08 0.557 -0.022 KCNMA1 10 cg12617225* 1.52 1.3-1.76 6.15E-08 0.779 -0.019 0 6 cg22572071* 1.58 1.34-1.86 6.16E-08 0.741 -0.033 0 6 cg05279387* 1.54 1.32-1.8 6.18E-08 0.683 -0.028 AIFM3 22 cg04401876 1.54 1.31-1.8 6.18E-08 0.82 -0.023 APOC4 19 cg00755423* 1.59 1.35-1.89 6.18E-08 0.882 -0.021 RGS8 1 cg25786785* 1.49 1.29-1.72 6.19E-08 0.898 -0.009 TMPRSS6 22 cg25527882 1.58 1.34-1.87 6.20E-08 0.887 -0.025 TFEB 6 cg20227165* 1.53 1.31-1.79 6.20E-08 0.688 -0.02 PRDM11 11 cg08031024 1.59 1.34-1.88 6.21E-08 0.681 -0.035 GLI2 2 cg10913077* 1.59 1.34-1.88 6.21E-08 0.684 -0.034 ARHGEF3 3 cg14817241* 1.54 1.31-1.79 6.21E-08 0.73 -0.021 0 9 cg22902289* 1.57 1.33-1.85 6.21E-08 0.799 -0.013 LFNG 7 cg12756396 1.53 1.31-1.78 6.21E-08 0.165 0.024 DMRTA2 1 cg00656728* 1.57 1.33-1.85 6.22E-08 0.613 -0.023 
GUCA2B 1 cg14431528* 1.56 1.33-1.84 6.23E-08 0.812 -0.021 MIR323B 14 cg19261426 1.51 1.3-1.75 6.24E-08 0.601 -0.024 OBSCN 1 cg01008602 1.6 1.35-1.89 6.24E-08 0.244 0.047 TRIM35 8 cg17273416 1.52 1.31-1.77 6.24E-08 0.192 0.022 HOXC11 12 cg07710073 1.52 1.3-1.76 6.25E-08 0.817 -0.016 0 3 cg18511011* 1.56 1.33-1.84 6.26E-08 0.663 -0.021 HABP2 10 cg08463231 1.57 1.33-1.85 6.28E-08 0.685 -0.024 0 15

171 cg09443153* 1.54 1.32-1.8 6.29E-08 0.736 -0.024 PYROXD2 10 cg06523516 1.55 1.32-1.82 6.30E-08 0.543 -0.019 0 2 cg17395184* 1.6 1.35-1.9 6.30E-08 0.859 -0.018 ZNF106 15 cg24774396* 1.53 1.31-1.79 6.31E-08 0.753 -0.019 NXN 17 cg02765403 1.51 1.3-1.75 6.31E-08 0.608 -0.017 MZF1 19 cg20530056 1.55 1.32-1.81 6.32E-08 0.772 -0.026 IKBKE 1 cg05947761* 1.57 1.33-1.85 6.33E-08 0.754 -0.027 0 7 cg15417654 1.57 1.33-1.85 6.33E-08 0.856 -0.019 LPP 3 cg25367999* 1.57 1.34-1.86 6.37E-08 0.814 -0.03 0 11 cg26786407* 1.55 1.32-1.81 6.38E-08 0.732 -0.03 MEGF6 1 cg08349804 1.55 1.32-1.82 6.38E-08 0.708 -0.028 GPR162 12 cg14615946 1.54 1.32-1.8 6.38E-08 0.796 -0.015 AANAT 17 cg20242937* 1.5 1.3-1.74 6.38E-08 0.902 -0.014 SEC14L1 17 cg25278175* 1.54 1.32-1.8 6.40E-08 0.726 -0.031 NRXN1 2 cg02612971* 1.51 1.3-1.76 6.40E-08 0.905 -0.02 0 22 cg16444826* 1.56 1.33-1.83 6.43E-08 0.923 -0.017 0 2 cg11335536*^ 1.53 1.31-1.79 6.44E-08 0.714 -0.033 0 3 cg11596947* 1.6 1.35-1.9 6.48E-08 0.77 -0.024 0 11 cg13537146* 1.56 1.33-1.83 6.49E-08 0.848 -0.018 0 3 cg13093023* 1.53 1.31-1.79 6.50E-08 0.892 -0.024 EIF4G3 1 cg03833216* 1.58 1.34-1.87 6.51E-08 0.773 -0.027 0 17 cg27172140 1.55 1.32-1.82 6.52E-08 0.805 -0.025 EML1 14 cg00025643* 1.55 1.32-1.81 6.52E-08 0.744 -0.024 0 1 cg20932535* 1.52 1.3-1.76 6.53E-08 0.801 -0.022 PAX7 1 cg09241928* 1.55 1.32-1.82 6.54E-08 0.671 -0.02 0 11 cg21489565 1.56 1.32-1.82 6.57E-08 0.831 -0.025 0 5 cg15259904* 1.57 1.33-1.85 6.58E-08 0.708 -0.028 0 6 cg05787063* 1.6 1.35-1.9 6.63E-08 0.831 -0.03 0 10 cg04526104* 1.61 1.36-1.91 6.65E-08 0.71 -0.034 0 2 cg14338073* 1.61 1.35-1.91 6.67E-08 0.81 -0.023 0 10 cg18685034* 1.59 1.35-1.89 6.69E-08 0.57 -0.028 DOCK1 10 cg16580935* 1.52 1.3-1.76 6.69E-08 0.748 -0.025 B3GALT4 6 cg26890181* 1.52 1.31-1.77 6.70E-08 0.645 -0.026 KCNN4 19 cg20401521 1.49 1.29-1.72 6.73E-08 0.0485 0.019 DAPK1 9 cg26363598 1.55 1.32-1.81 6.80E-08 0.92 -0.015 DIP2C 10 cg01635405 1.51 1.3-1.76 6.80E-08 0.145 0.03 HMX3 10 cg13630739 1.55 1.32-1.82 
6.82E-08 0.738 -0.027 ATP6V0A4 7 cg27606002* 1.57 1.34-1.86 6.83E-08 0.725 -0.015 C17orf99 17 cg11330282* 1.5 1.3-1.74 6.84E-08 0.796 -0.017 LOC728264 5 cg12355110 1.52 1.31-1.77 6.84E-08 0.271 0.025 0 19 cg00921097 1.57 1.33-1.85 6.85E-08 0.638 -0.024 GPR37L1 1 cg12693436* 1.58 1.34-1.87 6.85E-08 0.734 -0.021 0 10 cg13294738* 1.51 1.3-1.75 6.85E-08 0.728 -0.02 0 10

172 cg01309121* 1.53 1.31-1.78 6.87E-08 0.802 -0.025 0 1 cg03520471* 1.56 1.33-1.83 6.88E-08 0.513 -0.023 GABRR3 3 cg17232583* 1.52 1.31-1.78 6.89E-08 0.835 -0.024 BTNL2 6 cg07014349* 1.55 1.32-1.82 6.90E-08 0.741 -0.023 GJB5 1 cg09293786 1.5 1.29-1.73 6.91E-08 0.0558 0.0179 0 16 cg14157844* 1.52 1.31-1.77 6.94E-08 0.113 0.019 0 10 cg15224955 1.57 1.33-1.85 6.95E-08 0.735 -0.025 AKR1B15 7 cg17952824* 1.55 1.32-1.81 6.98E-08 0.853 -0.025 HNRNPA3P1 10 cg11304682* 1.58 1.34-1.87 7.00E-08 0.681 -0.023 PDZRN3 3 cg14667104* 1.53 1.31-1.78 7.00E-08 0.867 -0.016 GCN1 12 cg03553786 1.52 1.3-1.77 7.02E-08 0.112 0.026 LINC00620 3 cg00532824 1.53 1.31-1.78 7.06E-08 0.773 -0.016 CTRB2 16 cg00917101 1.52 1.31-1.77 7.08E-08 0.524 -0.023 ESPN 1 cg06441398* 1.55 1.32-1.81 7.09E-08 0.665 -0.026 SHANK2 11 cg14107328* 1.51 1.3-1.75 7.09E-08 0.812 -0.017 ZAK 2 cg23580945* 1.51 1.3-1.75 7.09E-08 0.757 -0.016 0 17 cg23128056* 1.56 1.33-1.84 7.10E-08 0.85 -0.02 KRT15 17 cg00099519 1.52 1.31-1.78 7.10E-08 0.87 -0.016 0 5 cg13814485 1.51 1.3-1.75 7.11E-08 0.0312 0.0122 GATA3 10 cg09890775* 1.57 1.33-1.85 7.14E-08 0.335 0.021 EBF3 10 cg14366598* 1.54 1.32-1.8 7.18E-08 0.628 -0.03 IL25 14 cg15016120 1.57 1.33-1.85 7.18E-08 0.69 -0.026 DCD 12 cg25912037* 1.58 1.34-1.87 7.20E-08 0.819 -0.019 0 2 cg12579684 1.61 1.35-1.91 7.22E-08 0.774 -0.028 0 14 cg18330571* 1.49 1.29-1.72 7.22E-08 0.772 -0.02 ITGA2 5 cg04496906* 1.54 1.32-1.81 7.23E-08 0.483 -0.023 SPATC1 8 cg16936846 1.55 1.32-1.82 7.24E-08 0.922 -0.023 FLT3 13 cg01081091* 1.54 1.31-1.8 7.24E-08 0.864 -0.02 RECQL4 8 cg13565300*^ 1.52 1.31-1.78 7.24E-08 0.822 -0.018 GDF10 10 cg00898920* 1.53 1.31-1.79 7.27E-08 0.553 -0.023 0 2 cg07973507* 1.57 1.33-1.85 7.27E-08 0.952 -0.012 0 14 cg17753169 1.53 1.31-1.78 7.29E-08 0.864 -0.024 ANK1 8 cg23684410* 1.5 1.29-1.74 7.30E-08 0.405 -0.022 SIK3 11 cg05007097* 1.53 1.31-1.78 7.30E-08 0.604 -0.021 MLXIPL 7 cg19818308* 1.57 1.33-1.85 7.31E-08 0.522 -0.029 0 14 cg02014733* 1.52 1.3-1.77 7.31E-08 0.813 
-0.021 WTIP 19 cg00232265* 1.49 1.29-1.73 7.34E-08 0.765 -0.021 NDUFS6 5 cg20312179* 1.53 1.31-1.79 7.36E-08 0.796 -0.03 EPS15 1 cg14271862* 1.55 1.32-1.81 7.36E-08 0.819 -0.03 0 11 cg21390574* 1.53 1.31-1.79 7.37E-08 0.734 -0.02 SEMA5B 3 cg14679780* 1.54 1.32-1.81 7.38E-08 0.653 -0.03 ZBTB7A 19 cg20813374* 1.54 1.31-1.8 7.38E-08 0.441 -0.02 FKBP5 6 cg12840847 1.52 1.31-1.78 7.38E-08 0.921 -0.017 SEC14L1 17

173 cg09939823* 1.55 1.32-1.82 7.39E-08 0.596 -0.014 0 8 cg14222574 1.55 1.32-1.81 7.40E-08 0.595 -0.029 0 13 cg00080418 1.51 1.3-1.75 7.40E-08 0.833 -0.012 CD8B 2 cg25025966* 1.51 1.3-1.75 7.42E-08 0.714 -0.015 0 2 cg03309350 1.57 1.33-1.85 7.43E-08 0.713 -0.026 0 6 cg24723690 1.52 1.31-1.78 7.44E-08 0.752 -0.023 0 19 cg26521404 1.53 1.31-1.79 7.44E-08 0.0706 0.0251 HOXA9 7 cg24755177 1.5 1.29-1.74 7.45E-08 0.947 -0.02 LILRA4 19 cg19972193 1.56 1.33-1.83 7.45E-08 0.878 -0.017 FLT4 5 cg21910196* 1.6 1.35-1.9 7.46E-08 0.795 -0.03 RGS6 14 cg12804677* 1.52 1.31-1.77 7.46E-08 0.751 -0.029 PRDM16 1 cg19998328* 1.5 1.3-1.74 7.46E-08 0.807 -0.017 YIPF2 19 cg11161550* 1.56 1.33-1.84 7.51E-08 0.593 -0.019 0 2 cg11846559 1.55 1.32-1.81 7.53E-08 0.813 -0.034 HSPBP1 19 cg19650573* 1.54 1.32-1.81 7.54E-08 0.846 -0.021 ITGA11 15 cg19514854 1.53 1.31-1.79 7.55E-08 0.767 -0.023 0 17 cg04381298* 1.54 1.32-1.8 7.55E-08 0.819 -0.014 0 12 cg20013688 1.59 1.35-1.89 7.58E-08 0.807 -0.027 0 1 cg05099196* 1.59 1.34-1.88 7.58E-08 0.701 -0.017 LINC00905 19 cg12890346 1.57 1.33-1.85 7.59E-08 0.669 -0.031 GRIP2 3 cg00409816* 1.54 1.32-1.81 7.61E-08 0.782 -0.024 PRKCZ 1 cg01270241* 1.54 1.32-1.81 7.61E-08 0.86 -0.022 0 14 cg01856892* 1.48 1.28-1.71 7.64E-08 0.752 -0.022 BAIAP3 16 cg23530596* 1.56 1.33-1.84 7.65E-08 0.754 -0.029 SLC12A1 15 cg20924825* 1.57 1.33-1.85 7.65E-08 0.798 -0.026 0 2 cg11892307* 1.54 1.32-1.81 7.66E-08 0.919 -0.014 0 6 cg18413427* 1.54 1.31-1.8 7.70E-08 0.732 -0.033 0 5 cg17518776* 1.52 1.3-1.77 7.72E-08 0.838 -0.016 PACSIN1 6 cg19750321 1.59 1.34-1.89 7.75E-08 0.913 -0.018 ARNT 1 cg26683252* 1.56 1.33-1.83 7.78E-08 0.797 -0.03 0 17 cg11521325* 1.59 1.34-1.89 7.78E-08 0.702 -0.027 CCDC114 19 cg04931184* 1.56 1.33-1.83 7.80E-08 0.653 -0.031 KCNJ4 22 cg10076902* 1.53 1.31-1.79 7.82E-08 0.647 -0.023 SYT8 11 cg12467852* 1.52 1.31-1.78 7.84E-08 0.798 -0.017 0 6 cg06815910 1.51 1.3-1.75 7.85E-08 0.773 -0.022 CES4A 16 cg06066697 1.52 1.31-1.78 7.87E-08 0.912 -0.014 XYLT1 16 
cg07780199 1.55 1.32-1.81 7.89E-08 0.798 -0.026 CRTC1 19 cg07560948 1.51 1.3-1.75 7.89E-08 0.804 -0.019 KCNQ4 1 cg06803614* 1.54 1.32-1.8 7.90E-08 0.826 -0.028 NT5C1A 1 cg06326865* 1.53 1.31-1.79 7.91E-08 0.729 -0.02 CBFA2T3 16 cg07824114* 1.56 1.33-1.83 7.92E-08 0.728 -0.029 0 11 cg03164805* 1.51 1.3-1.75 7.94E-08 0.514 -0.017 0 16 cg13580827 1.56 1.32-1.83 7.95E-08 0.695 -0.031 APBA1 9

174 cg10759648* 1.52 1.3-1.77 7.95E-08 0.503 -0.02 OPCML 11 cg17377066* 1.51 1.3-1.75 7.98E-08 0.805 -0.019 PCSK4 19 cg09706243 1.49 1.29-1.73 7.99E-08 0.39 -0.022 LOC100130987 11 cg22408108 1.57 1.33-1.85 8.00E-08 0.732 -0.03 0 7 cg19350767* 1.55 1.32-1.82 8.00E-08 0.638 -0.03 OR9Q2 11 cg14512813* 1.54 1.32-1.8 8.01E-08 0.891 -0.022 0 3 cg17898124* 1.56 1.33-1.85 8.02E-08 0.711 -0.03 KCND3 1 cg25916172* 1.59 1.34-1.88 8.03E-08 0.719 -0.03 0 10 cg14950321* 1.55 1.32-1.82 8.03E-08 0.499 -0.025 PLIN5 19 cg06166915* 1.5 1.3-1.74 8.03E-08 0.733 -0.022 CX3CL1 16 cg02003612 1.56 1.33-1.83 8.04E-08 0.681 -0.02 TMEM114 16 cg13560871 1.51 1.3-1.76 8.04E-08 0.111 0.023 C1orf115 1 cg26226408 1.57 1.33-1.86 8.05E-08 0.847 -0.022 LINC00598 13 cg00883837 1.57 1.33-1.86 8.07E-08 0.789 -0.023 OR2H1 6 cg19939178* 1.55 1.32-1.81 8.07E-08 0.874 -0.022 BTBD11 12 cg14625154* 1.54 1.32-1.81 8.07E-08 0.575 -0.019 ADAMTSL1 9 cg26296488 1.49 1.29-1.73 8.07E-08 0.265 0.047 DRD5 4 cg07431973* 1.61 1.35-1.91 8.10E-08 0.758 -0.03 FRMD6 14 cg05733703* 1.55 1.32-1.82 8.12E-08 0.677 -0.022 PDS5A 4 cg09676630 1.54 1.31-1.8 8.13E-08 0.676 -0.027 FBRSL1 12 cg20850016* 1.52 1.31-1.77 8.13E-08 0.119 -0.012 ARL8B 3 cg21379458* 1.55 1.32-1.81 8.14E-08 0.672 -0.023 0 12 cg14076161 1.54 1.32-1.81 8.15E-08 0.708 -0.023 PRB4 12 cg24231148 1.53 1.31-1.78 8.16E-08 0.902 -0.013 0 12 cg24632873 1.56 1.32-1.83 8.18E-08 0.728 -0.028 0 3 cg26723045* 1.61 1.35-1.91 8.20E-08 0.831 -0.027 SCUBE2 11 cg12595306* 1.5 1.29-1.74 8.22E-08 0.886 -0.013 0 15 cg10735599 1.57 1.33-1.85 8.24E-08 0.76 -0.028 HLA-DPB2 6 cg20888142 1.53 1.31-1.79 8.30E-08 0.129 0.023 SYN2 3 cg04631994 1.52 1.31-1.78 8.36E-08 0.841 -0.02 THSD4 15 cg19777853* 1.54 1.32-1.81 8.37E-08 0.884 -0.024 UBXN11 1 cg09909478* 1.52 1.3-1.77 8.37E-08 0.853 -0.019 ABR 17 cg01611894 1.56 1.32-1.83 8.38E-08 0.772 -0.024 ALPI 2 cg20682639* 1.56 1.33-1.84 8.40E-08 0.609 -0.031 PDZD2 5 cg13670316 1.51 1.3-1.75 8.40E-08 0.0709 0.0164 PCSK5 9 cg24047926 1.52 1.3-1.77 
8.41E-08 0.812 -0.023 SOX13 1 cg08273635* 1.52 1.3-1.77 8.41E-08 0.743 -0.022 CSK 15 cg23541422* 1.51 1.3-1.76 8.41E-08 0.884 -0.017 0 2 cg02601893* 1.52 1.31-1.78 8.41E-08 0.631 -0.016 BIK 22 cg01386157* 1.57 1.33-1.85 8.42E-08 0.757 -0.028 0 6 cg01659184* 1.59 1.34-1.89 8.43E-08 0.783 -0.023 0 6 cg16596691* 1.54 1.31-1.8 8.44E-08 0.813 -0.027 SNRPA 19 cg21245729* 1.52 1.31-1.78 8.45E-08 0.915 -0.02 0 9

175 cg19937056* 1.59 1.34-1.88 8.47E-08 0.791 -0.028 KCNMB2 3 cg10455321* 1.51 1.3-1.75 8.47E-08 0.875 -0.015 0 11 cg12470092* 1.55 1.32-1.82 8.48E-08 0.692 -0.027 CLDN14 21 cg22835851* 1.59 1.34-1.89 8.49E-08 0.823 -0.029 WDFY4 10 cg03278271* 1.5 1.29-1.74 8.50E-08 0.946 -0.02 0 12 cg09042277 1.53 1.31-1.78 8.51E-08 0.203 0.024 TBX5 12 cg05569220 1.5 1.29-1.74 8.52E-08 0.653 -0.02 AATK-AS1 17 cg23302998* 1.52 1.31-1.78 8.53E-08 0.771 -0.03 ASIC2 17 cg08023852* 1.62 1.36-1.93 8.56E-08 0.801 -0.028 MAGI1 3 cg04556646* 1.56 1.33-1.85 8.56E-08 0.597 -0.025 PHF21B 22 cg09936762* 1.53 1.31-1.79 8.56E-08 0.85 -0.019 ZMIZ1 10 cg13369817* 1.57 1.33-1.86 8.57E-08 0.755 -0.028 MROH7 1 cg24317310* 1.51 1.3-1.76 8.57E-08 0.653 -0.021 GPR142 17 cg09530108* 1.63 1.36-1.95 8.58E-08 0.831 -0.027 CRIM1 2 cg08243931* 1.59 1.34-1.89 8.58E-08 0.761 -0.026 0 5 cg12798194* 1.55 1.32-1.81 8.58E-08 0.617 -0.024 0 17 cg06233930* 1.51 1.3-1.76 8.58E-08 0.792 -0.022 TIMP2 17 cg05505545* 1.5 1.3-1.75 8.64E-08 0.809 -0.02 KHK 2 cg00949661 1.59 1.34-1.88 8.65E-08 0.739 -0.031 ATP9A 20 cg11283404* 1.55 1.32-1.82 8.66E-08 0.836 -0.022 IKBKB 8 cg00393415* 1.52 1.3-1.76 8.66E-08 0.758 -0.015 0 5 cg04744717* 1.52 1.31-1.78 8.69E-08 0.853 -0.019 0 19 cg23637354 1.52 1.3-1.77 8.73E-08 0.658 -0.026 0 6 cg09989938* 1.52 1.3-1.76 8.73E-08 0.842 -0.023 CD19 16 cg14451382 1.51 1.3-1.75 8.73E-08 0.149 0.03 0 5 cg08465774* 1.53 1.31-1.79 8.75E-08 0.873 -0.018 IDO1 8 cg22813711 1.55 1.32-1.82 8.76E-08 0.693 -0.029 LINC01140 1 cg24513381* 1.62 1.36-1.93 8.77E-08 0.743 -0.03 PRRX1 1 cg26111757* 1.58 1.34-1.87 8.78E-08 0.807 -0.027 BPIFB3 20 cg15713106* 1.52 1.31-1.78 8.80E-08 0.59 -0.012 0 4 cg13672638 1.53 1.31-1.79 8.82E-08 0.484 -0.02 FSD1L 9 cg17267681* 1.54 1.32-1.81 8.83E-08 0.86 -0.021 0 5 cg26620021 1.54 1.31-1.8 8.84E-08 0.697 -0.032 AKT2 19 cg06531870* 1.54 1.32-1.81 8.89E-08 0.68 -0.031 0 13 cg16270990 1.52 1.3-1.77 8.91E-08 0.787 -0.027 0 11 cg06912601* 1.52 1.3-1.77 8.91E-08 0.88 -0.023 0 22 
cg11494699* 1.54 1.31-1.8 8.91E-08 0.868 -0.018 RAG1 11 cg20322837* 1.57 1.33-1.86 8.92E-08 0.74 -0.028 APLNR 11 cg11841127* 1.54 1.32-1.81 8.96E-08 0.64 -0.025 KLHDC8A 1 cg01958906* 1.57 1.33-1.85 9.00E-08 0.899 -0.017 0 14 cg23569968* 1.53 1.31-1.79 9.03E-08 0.754 -0.022 SCGB1A1 11 cg02379333 1.5 1.3-1.75 9.03E-08 0.891 -0.014 ST14 11 cg25918821 1.54 1.31-1.81 9.05E-08 0.552 -0.021 0 17

176 cg17920076* 1.57 1.33-1.86 9.07E-08 0.812 -0.027 0 7 cg02539664* 1.5 1.3-1.75 9.07E-08 0.837 -0.018 SRGAP3 3 cg04193065* 1.53 1.31-1.8 9.09E-08 0.708 -0.029 0 15 cg13599062* 1.54 1.31-1.81 9.09E-08 0.794 -0.028 SHB 9 cg01750895 1.49 1.29-1.73 9.11E-08 0.663 -0.013 PEMT 17 cg06332621 1.59 1.34-1.89 9.15E-08 0.89 -0.031 RBM47 4 cg01807131* 1.54 1.32-1.81 9.15E-08 0.622 -0.028 0 7 cg20941820 1.49 1.29-1.72 9.15E-08 0.0274 0.0127 ADAMTS17 15 cg17967260 1.55 1.32-1.82 9.20E-08 0.795 -0.024 0 6 cg04599612* 1.5 1.3-1.75 9.21E-08 0.694 -0.023 SLC2A7 1 cg19839801 1.54 1.32-1.81 9.21E-08 0.473 0.013 0 11 cg06141561 1.5 1.3-1.75 9.23E-08 0.665 -0.027 NDUFB10 16 cg19392831 1.6 1.35-1.91 9.23E-08 0.41 0.016 PRLHR 10 cg26363695 1.59 1.34-1.88 9.24E-08 0.761 -0.032 0 4 cg20587945* 1.62 1.36-1.93 9.24E-08 0.804 -0.023 SEMA3E 7 cg03114584* 1.53 1.31-1.79 9.28E-08 0.542 -0.033 0 4 cg02556363* 1.55 1.32-1.82 9.29E-08 0.853 -0.028 RBMS3 3 cg05397490* 1.56 1.33-1.84 9.29E-08 0.772 -0.024 N4BP2L2 13 cg01868128* 1.53 1.31-1.79 9.30E-08 0.649 -0.021 LCE5A 1 cg02684562* 1.49 1.29-1.73 9.32E-08 0.378 -0.016 0 10 cg02364703 1.58 1.34-1.87 9.32E-08 0.457 -0.014 GOLGA6A 15 cg09489330 1.51 1.3-1.76 9.34E-08 0.815 -0.016 0 5 cg19284368 1.59 1.34-1.89 9.36E-08 0.647 -0.028 0 8 cg25381737* 1.53 1.31-1.8 9.39E-08 0.789 -0.019 CYMP 1 cg01402347 1.59 1.34-1.88 9.40E-08 0.792 -0.024 0 4 cg16081441* 1.49 1.29-1.73 9.40E-08 0.607 -0.018 AANAT 17 cg10952171* 1.5 1.29-1.74 9.40E-08 0.92 -0.017 0 14 cg09313140* 1.62 1.35-1.92 9.41E-08 0.719 -0.03 0 12 cg00813135* 1.56 1.32-1.83 9.42E-08 0.925 -0.02 ITPR2 12 cg14809166* 1.52 1.31-1.78 9.43E-08 0.736 -0.018 DRD2 11 cg02097120 1.53 1.31-1.8 9.45E-08 0.617 -0.023 TDRD10 1 cg05007163* 1.53 1.31-1.78 9.46E-08 0.616 -0.026 CBX6 22 cg19521646* 1.52 1.3-1.77 9.47E-08 0.736 -0.027 OR1I1 19 cg01532487* 1.54 1.31-1.8 9.47E-08 0.861 -0.017 0 1 cg22012156 1.55 1.32-1.82 9.51E-08 0.826 -0.027 0 1 cg02944084* 1.56 1.32-1.83 9.51E-08 0.654 -0.026 0 17 cg14643264 1.59 
1.34-1.88 9.52E-08 0.778 -0.03 MOB3B 9 cg12712659* 1.54 1.31-1.8 9.52E-08 0.446 -0.013 0 7 cg11765913* 1.52 1.3-1.76 9.54E-08 0.797 -0.026 EVI5L 19 cg14140691* 1.54 1.32-1.81 9.54E-08 0.574 -0.024 CACNA1B 9 cg04915788* 1.56 1.32-1.83 9.54E-08 0.913 -0.018 0 12 cg17240157* 1.57 1.33-1.86 9.57E-08 0.83 -0.017 PBX1 1 cg25711375* 1.54 1.31-1.8 9.59E-08 0.724 -0.022 0 13

177 cg17929273 1.5 1.29-1.75 9.60E-08 0.835 -0.022 LGALS2 22 cg06535498 1.53 1.31-1.78 9.61E-08 0.85 -0.025 NOTCH4 6 cg22626506* 1.52 1.3-1.77 9.61E-08 0.838 -0.023 KANK2 19 cg12038641* 1.5 1.3-1.75 9.68E-08 0.777 -0.027 MIR1-1HG 20 cg02287710 1.5 1.29-1.75 9.70E-08 0.476 0.031 DIO3 14 cg20665259* 1.58 1.34-1.87 9.71E-08 0.729 -0.032 KCNH2 7 cg01277890 1.55 1.32-1.82 9.71E-08 0.853 -0.021 PROB1 5 cg26355894* 1.49 1.29-1.73 9.72E-08 0.755 -0.015 TNNT3 11 cg06697448 1.56 1.32-1.83 9.73E-08 0.905 -0.019 LMCD1 3 cg08288703* 1.51 1.3-1.75 9.74E-08 0.586 -0.028 SLC22A8 11 cg21190595* 1.5 1.3-1.75 9.74E-08 0.563 -0.022 CARS 11 cg27182159* 1.54 1.31-1.8 9.75E-08 0.603 -0.024 RPS18 6 cg15061662* 1.54 1.31-1.8 9.80E-08 0.856 -0.017 0 9 cg11828203 1.51 1.3-1.76 9.81E-08 0.465 -0.018 GHRHR 7 cg00454696* 1.55 1.32-1.81 9.82E-08 0.486 -0.018 PRAMEF6 1 cg13795883* 1.5 1.29-1.74 9.83E-08 0.688 -0.018 ABCA2 9 cg17944561* 1.54 1.32-1.81 9.84E-08 0.698 -0.024 KRT16 17 cg15811435 1.56 1.32-1.83 9.85E-08 0.835 -0.028 LNX1 4 cg00238391* 1.56 1.32-1.84 9.85E-08 0.679 -0.016 0 7 cg27230984 1.48 1.28-1.71 9.90E-08 0.822 -0.018 ABR 17 cg24319902 1.48 1.28-1.71 9.90E-08 0.144 0.029 SFRP1 8 cg26544722* 1.59 1.34-1.89 9.92E-08 0.742 -0.029 0 4 cg04875821* 1.54 1.31-1.8 9.92E-08 0.872 -0.02 0 15 cg04215950 1.53 1.31-1.79 9.93E-08 0.701 -0.025 CAMTA1 1 cg13928961* 1.5 1.29-1.74 9.94E-08 0.907 -0.012 KRT73 12 cg10435838* 1.59 1.34-1.88 9.95E-08 0.796 -0.026 LINC01020 5 cg13954213 1.49 1.29-1.72 9.95E-08 0.689 -0.021 0 2 cg05726935* 1.54 1.31-1.8 9.95E-08 0.676 -0.02 AKT1 14 cg25067686* 1.49 1.29-1.73 9.97E-08 0.809 -0.018 TNXB 6 cg13064679* 1.53 1.31-1.79 9.98E-08 0.568 -0.032 TNXB 6 cg18773591 1.59 1.34-1.89 9.98E-08 0.605 -0.026 NRXN1 2 cg01104208* 1.52 1.3-1.78 9.98E-08 0.679 -0.025 EXOC3L4 14 cg24126567 1.52 1.3-1.77 1.00E-07 0.885 -0.016 CYGB 17 cg24083702* 1.51 1.3-1.76 1.00E-07 0.901 -0.016 MIA 19 cg13920435^ 1.47 1.28-1.7 1.00E-07 0.0661 0.0234 CDK20 9 cg18311537 1.5 1.29-1.75 1.00E-07 
0.172 0.019 MIR196B 7 cg20001810* 1.55 1.32-1.82 1.01E-07 0.675 -0.033 KRT75 12 cg26408612 1.52 1.31-1.78 1.01E-07 0.714 -0.032 BAT1 6 cg11683364* 1.48 1.28-1.71 1.01E-07 0.431 -0.028 LGR6 1 cg20316538* 1.55 1.32-1.82 1.01E-07 0.765 -0.026 RUFY4 2 cg11705167* 1.54 1.31-1.8 1.01E-07 0.826 -0.026 LRIT1 10 cg03135964* 1.51 1.3-1.75 1.01E-07 0.812 -0.025 GALNT9 12 cg11420633 1.54 1.32-1.81 1.01E-07 0.483 -0.024 KRTAP3-3 17

178 cg15950504* 1.6 1.35-1.9 1.01E-07 0.832 -0.024 0 13 cg22742965* 1.55 1.32-1.81 1.01E-07 0.847 -0.022 TMEFF2 2 cg09896093* 1.52 1.31-1.78 1.01E-07 0.838 -0.022 SNX8 7 cg16392193 1.54 1.31-1.81 1.01E-07 0.589 -0.019 SLC6A3 5 cg01475538* 1.5 1.29-1.74 1.01E-07 0.594 -0.017 RRAS2 11 cg01017397* 1.53 1.31-1.79 1.01E-07 0.951 -0.016 ADAP1 7 cg08057037* 1.48 1.28-1.71 1.01E-07 0.531 -0.015 TSPAN4 11 cg01827999* 1.51 1.3-1.75 1.01E-07 0.782 -0.015 TOM1L2 17 cg24721899 1.51 1.3-1.76 1.01E-07 0.0743 0.0267 NKX3-2 4 cg01359532 1.52 1.31-1.78 1.02E-07 0.706 -0.033 CYP1A2 15 cg21187195* 1.57 1.33-1.85 1.02E-07 0.855 -0.024 0 5 cg14413700 1.55 1.32-1.82 1.02E-07 0.751 -0.022 FAM184A 6 cg08424166* 1.52 1.3-1.78 1.02E-07 0.957 -0.015 AGPAT4 6 cg26627129* 1.51 1.3-1.76 1.02E-07 0.509 -0.011 CGB8 19 cg10755973 1.49 1.29-1.73 1.02E-07 0.03 0.0167 LHX2 9 cg22142137 1.56 1.32-1.83 1.03E-07 0.745 -0.025 OR10G2 14 cg24131262 1.5 1.29-1.75 1.03E-07 0.808 -0.019 ITPR1 3 cg23149300* 1.52 1.3-1.78 1.03E-07 0.29 -0.018 0 10 cg25296222 1.49 1.29-1.73 1.03E-07 0.768 -0.014 0 11 cg25997110 1.53 1.31-1.78 1.03E-07 0.0836 0.0184 0 19 cg02987618* 1.58 1.34-1.87 1.04E-07 0.794 -0.029 0 10 cg18567954* 1.53 1.31-1.79 1.04E-07 0.704 -0.028 DTX1 12 cg03564620 1.54 1.31-1.81 1.04E-07 0.703 -0.028 MYH6 14 cg24830241* 1.52 1.3-1.78 1.04E-07 0.861 -0.024 MYL10 7 cg18758987 1.54 1.31-1.81 1.04E-07 0.797 -0.023 0 1 cg08537804* 1.57 1.33-1.85 1.04E-07 0.851 -0.022 DSCAML1 11 cg01152616* 1.54 1.31-1.81 1.04E-07 0.833 -0.02 C6orf136 6 cg14267222* 1.5 1.29-1.75 1.04E-07 0.867 -0.015 0 10 cg00676085* 1.57 1.33-1.86 1.05E-07 0.709 -0.031 KRT72 12 cg02680909 1.55 1.32-1.83 1.05E-07 0.713 -0.03 MOCS1 6 cg19951105 1.49 1.29-1.73 1.05E-07 0.835 -0.013 0 5 cg22920603* 1.54 1.31-1.8 1.06E-07 0.889 -0.031 ADNP 20 cg11886884* 1.57 1.33-1.86 1.06E-07 0.679 -0.03 0 3 cg14491667 1.53 1.31-1.79 1.06E-07 0.81 -0.03 TMOD1 9 cg04839683* 1.51 1.3-1.75 1.06E-07 0.79 -0.027 0 11 cg03496533* 1.53 1.31-1.79 1.06E-07 0.724 -0.026 
PRR15L 17 cg26929394* 1.55 1.32-1.81 1.06E-07 0.528 -0.025 0 11 cg21090182 1.54 1.31-1.8 1.06E-07 0.863 -0.024 LRCH1 13 cg11857626* 1.54 1.31-1.8 1.06E-07 0.826 -0.023 IQCF5 3 cg09102690 1.52 1.3-1.77 1.06E-07 0.521 -0.019 ACCN1 17 cg21697667* 1.55 1.32-1.82 1.06E-07 0.746 -0.018 0 2 cg22975568* 1.51 1.3-1.76 1.06E-07 0.933 -0.017 CD74 5 cg20274430* 1.51 1.3-1.75 1.06E-07 0.522 -0.017 MCHR1 22

179 cg00928002* 1.55 1.32-1.81 1.06E-07 0.618 -0.014 0 8 cg09640070* 1.57 1.33-1.85 1.06E-07 0.93 -0.014 ITPR2 12 cg07725527 1.53 1.31-1.79 1.06E-07 0.224 0.029 NAA16 13 cg16273979* 1.6 1.35-1.9 1.07E-07 0.737 -0.028 0 4 cg26973018 1.56 1.32-1.83 1.07E-07 0.756 -0.024 ZNRF1 16 cg23080908 1.53 1.31-1.78 1.07E-07 0.762 -0.021 MAGI1 3 cg14042230* 1.51 1.3-1.76 1.07E-07 0.715 -0.017 0 9 cg01664864 1.52 1.3-1.77 1.07E-07 0.428 0.019 DIO3 14 cg01185573* 1.61 1.35-1.91 1.08E-07 0.712 -0.029 0 3 cg07356534* 1.54 1.31-1.8 1.08E-07 0.832 -0.028 KIRREL 1 cg09118247 1.54 1.32-1.81 1.08E-07 0.829 -0.024 0 1 cg14507845* 1.52 1.3-1.77 1.08E-07 0.547 -0.02 0 9 cg01502373* 1.51 1.3-1.75 1.08E-07 0.872 -0.016 TOP1P2 22 cg03517913* 1.57 1.33-1.85 1.08E-07 0.874 -0.015 0 3 cg20678043 1.55 1.32-1.82 1.09E-07 0.795 -0.031 SLC6A17 1 cg14599380* 1.55 1.32-1.81 1.09E-07 0.707 -0.026 0 4 cg25561401* 1.56 1.32-1.83 1.09E-07 0.76 -0.022 PDZD2 5 cg20505457 1.53 1.31-1.79 1.09E-07 0.798 -0.022 ACTN1 14 cg02047661* 1.52 1.3-1.78 1.09E-07 0.724 -0.02 RRP9 3 cg06059409 1.51 1.3-1.76 1.09E-07 0.659 -0.019 ARHGEF17 11 cg18760496* 1.51 1.3-1.75 1.09E-07 0.636 -0.019 0 17 cg05084700* 1.49 1.29-1.72 1.09E-07 0.775 -0.016 0 13 cg17284553* 1.53 1.31-1.8 1.09E-07 0.381 -0.015 0 7 cg21846970 1.5 1.29-1.75 1.09E-07 0.946 -0.014 SCIMP 17 cg11073381 1.59 1.34-1.89 1.10E-07 0.781 -0.027 ANKFN1 17 cg19131272 1.54 1.31-1.81 1.10E-07 0.776 -0.025 0 13 cg13051970 1.54 1.32-1.81 1.10E-07 0.544 -0.024 DDC 7 cg15168000* 1.56 1.32-1.84 1.10E-07 0.89 -0.022 MAML2 11 cg00151124* 1.54 1.31-1.8 1.10E-07 0.835 -0.021 0 15 cg26298273^ 1.52 1.3-1.78 1.10E-07 0.83 -0.02 SLC22A23 6 cg03528538* 1.51 1.3-1.76 1.10E-07 0.63 -0.018 MUC6 11 cg03449379 1.52 1.3-1.78 1.10E-07 0.726 -0.018 CYP11A1 15 cg10955243* 1.57 1.33-1.86 1.10E-07 0.908 -0.017 0 3 cg15437231* 1.55 1.32-1.82 1.10E-07 0.701 -0.013 TBC1D15 12 cg09887059 1.55 1.32-1.82 1.10E-07 0.252 0.036 0 12 cg07700234 1.5 1.29-1.74 1.10E-07 0.059 0.0087 PPM1L 3 cg17851868* 1.56 
1.32-1.84 1.12E-07 0.729 -0.032 0 2 cg20054412* 1.52 1.3-1.77 1.12E-07 0.727 -0.025 C7orf50 7 cg27567922* 1.52 1.3-1.78 1.12E-07 0.873 -0.024 MUC4 3 cg10923036 1.54 1.31-1.81 1.12E-07 0.681 -0.023 IGFBP7 4 cg23942508 1.53 1.31-1.8 1.12E-07 0.586 -0.023 CEACAM16 19 cg22454564* 1.54 1.31-1.8 1.12E-07 0.44 -0.022 0 20 cg05785510* 1.56 1.32-1.83 1.12E-07 0.821 -0.022 0 2

180 cg15279413* 1.49 1.29-1.73 1.12E-07 0.647 -0.017 DGCR5 22 cg07557173* 1.55 1.32-1.83 1.12E-07 0.907 -0.016 FLNB 3 cg27389562 1.51 1.3-1.75 1.12E-07 0.491 -0.01 CEACAM8 19 cg04210186* 1.47 1.27-1.69 1.12E-07 0.57 -0.01 0 4 cg01914621 1.51 1.29-1.75 1.12E-07 0.196 0.024 0 7 cg02441149 1.57 1.33-1.86 1.13E-07 0.846 -0.028 EFNB2 13 cg04943797* 1.55 1.32-1.82 1.13E-07 0.685 -0.026 0 20 cg20821276* 1.51 1.3-1.76 1.13E-07 0.531 -0.023 0 17 cg08111284* 1.47 1.28-1.7 1.13E-07 0.789 -0.02 TNXB 6 cg12632309 1.5 1.29-1.74 1.13E-07 0.921 -0.015 ST6GAL1 3 cg17139392* 1.58 1.33-1.87 1.14E-07 0.626 -0.022 0 12 cg07950786* 1.51 1.3-1.76 1.14E-07 0.821 -0.021 EPS8L3 1 cg14051236 1.51 1.3-1.76 1.14E-07 0.383 -0.019 CCDC85C 14 cg19716038 1.53 1.31-1.79 1.14E-07 0.883 -0.016 0 6 cg22850587 1.51 1.3-1.76 1.14E-07 0.121 0.022 0 4 cg00244040 1.48 1.28-1.71 1.14E-07 0.0417 0.0128 PRKCH 14 cg23091122 1.56 1.33-1.85 1.15E-07 0.823 -0.021 SYPL2 1 cg02953144* 1.5 1.29-1.75 1.16E-07 0.746 -0.038 PI4KA 22 cg13107839* 1.56 1.33-1.85 1.16E-07 0.763 -0.031 0 1 cg00012576* 1.56 1.33-1.85 1.16E-07 0.671 -0.028 GPC6 13 cg12592691* 1.52 1.3-1.78 1.16E-07 0.765 -0.025 COBL 7 cg22739858* 1.55 1.32-1.83 1.16E-07 0.658 -0.024 FGF1 5 cg12765567* 1.51 1.3-1.76 1.16E-07 0.495 -0.021 0 6 cg15085044* 1.49 1.29-1.74 1.16E-07 0.754 -0.019 0 17 cg19293468 1.53 1.31-1.8 1.17E-07 0.657 -0.038 SMG6 17 cg08426962* 1.58 1.33-1.87 1.17E-07 0.698 -0.033 MCTP1 5 cg03466326* 1.51 1.3-1.76 1.17E-07 0.819 -0.026 0 1 cg03896940* 1.55 1.32-1.82 1.17E-07 0.765 -0.025 BAI1 8 cg24315209* 1.57 1.33-1.85 1.17E-07 0.622 -0.023 CDK18 1 cg21447170 1.52 1.3-1.77 1.17E-07 0.638 -0.023 0 4 cg02928935 1.54 1.31-1.81 1.17E-07 0.879 -0.022 0 17 cg08454125* 1.51 1.3-1.75 1.17E-07 0.812 -0.019 TMIE 3 cg11496113* 1.54 1.31-1.8 1.17E-07 0.493 -0.016 0 5 cg18394557* 1.5 1.29-1.74 1.17E-07 0.874 -0.016 P2RX1 17 cg12499130 1.5 1.29-1.74 1.17E-07 0.0589 0.0162 SYT15 10 cg11074933 1.49 1.29-1.73 1.18E-07 0.577 -0.031 GABRB3 15 cg01247693* 1.52 
1.3-1.77 1.18E-07 0.752 -0.026 NPAS2 2 cg25520440* 1.53 1.31-1.79 1.18E-07 0.751 -0.025 0 7 cg00305993 1.54 1.31-1.81 1.18E-07 0.683 -0.025 0 15 cg14480531 1.51 1.3-1.76 1.18E-07 0.844 -0.025 MSI2 17 cg24312489* 1.55 1.32-1.82 1.18E-07 0.622 -0.023 WNT9A 1 cg10314459* 1.52 1.3-1.78 1.18E-07 0.693 -0.023 IL17RE 3 cg02771362* 1.5 1.29-1.75 1.18E-07 0.363 -0.017 SH2D3C 9

181 cg10325659 1.48 1.28-1.71 1.18E-07 0.892 -0.015 ASB2 14 cg06803853* 1.49 1.29-1.73 1.18E-07 0.965 -0.012 CASKIN1 16 cg03999067 1.61 1.35-1.92 1.19E-07 0.763 -0.028 0 12 cg01556466* 1.56 1.32-1.84 1.19E-07 0.625 -0.026 ADCYAP1R1 7 cg14989661* 1.52 1.3-1.77 1.19E-07 0.879 -0.024 OXSR1 3 cg24987622 1.56 1.32-1.83 1.19E-07 0.643 -0.024 PDE2A 11 cg19076265* 1.57 1.33-1.85 1.19E-07 0.855 -0.021 DOCK5 8 cg08509586* 1.55 1.32-1.82 1.19E-07 0.892 -0.015 0 11 cg12243034* 1.51 1.29-1.75 1.19E-07 0.908 -0.015 0 17

β = β methylation value
* = corresponds with differentially methylated CpG in Kulis et al. (42)
^ = corresponds with differentially methylated CpG in Georgiadis et al. (41)


Appendix Table 4: Differentially variable probes

Probe ID  Adj p value  VarRatio  ICC  Chm  Gene name
cg20649847  2.17E-09  5.12  0.998  17  ANKRD13B
cg10318725  2.17E-08  5.54  0.994  13  RASA3
cg19750321  4.74E-08  10.20  0.996  1  ARNT
cg17395184  7.10E-08  8.37  0.990  15  ZFP106
cg11913951  9.13E-08  6.09  0.987  6  NFKBIE
cg06530347  1.18E-07  6.17  0.996  12  C12orf77
cg22325958  1.18E-07  7.42  0.995  12  SFSWAP
cg13586038  1.89E-07  5.08  0.991  9  SMARCA2
cg24747239  2.37E-07  5.03  0.983  10  0
cg12493107  2.66E-07  5.78  0.969  20  GZF1
cg12492087  2.72E-07  7.54  0.994  15  ZFP106
cg00396427  3.56E-07  5.41  0.993  16  ARHGAP17
cg02141543  3.91E-07  6.54  0.993  5  ZBED3
cg05723179  6.61E-07  5.04  0.993  7  CREB3L2
cg02068351  7.90E-07  6.00  0.986  6  STX7
cg22042908  8.11E-07  5.56  0.987  20  RASSF2
cg00885461  8.11E-07  5.64  0.991  6  CCDC167
cg06073041  9.18E-07  7.97  0.995  5  RNF14
cg14517390  9.98E-07  5.52  0.997  15  ACSBG1
cg14972228  1.21E-06  6.40  0.992  19  SIPA1L3
cg09640070  1.29E-06  7.83  0.997  12  ITPR2
cg01706498  1.37E-06  5.76  0.955  3  KLHL6
cg00087511  1.45E-06  5.14  0.982  1  SLC2A7
cg01389386  1.45E-06  6.52  0.990  16  ST3GAL2
cg06536988  1.49E-06  5.32  0.991  4  TMEM154
cg15316118  1.62E-06  5.16  0.993  6  POU5F1
cg21017641  1.86E-06  5.37  0.991  13  RASA3
cg27232166  1.92E-06  5.02  0.996  21  0
cg24056232  2.02E-06  5.48  0.989  1  C1orf186
cg00958944  2.03E-06  5.33  0.994  1  KPNA6
cg27383873  2.23E-06  7.37  0.987  10  ERCC6
cg11071155  2.37E-06  5.84  0.988  6  0
cg18184941  2.60E-06  6.03  0.996  6  C6orf136
cg10325253  2.60E-06  5.04  0.993  17  LOC645638
cg19124151  2.74E-06  5.65  0.995  2  0
cg06555661  2.79E-06  5.56  0.993  12  SFSWAP
cg07557173  2.97E-06  6.42  0.996  3  FLNB
cg15706973  3.03E-06  5.39  0.983  2  ACOXL
cg27630771  3.07E-06  5.03  0.988  1  ADAMTSL4
cg11915812  3.13E-06  5.57  0.986  2  HPCAL1
cg16444826  3.16E-06  5.30  0.994  2  0
cg02878544  3.47E-06  5.06  0.976  16  0
cg03969651  3.59E-06  6.54  0.996  17  TBX4


cg01213455  3.59E-06  5.35  0.997  5  MRNIP
cg02087075  3.59E-06  5.03  0.986  6  SLC35B2
cg01810713  3.59E-06  5.76  0.980  16  0
cg27143842  3.62E-06  5.34  0.993  12  B4GALNT3
cg22762189  3.64E-06  5.57  0.994  2  MSGN1
cg05677184  3.77E-06  5.08  0.991  16  0
cg19753641  3.84E-06  5.43  0.970  19  CD37
cg00253346  4.06E-06  6.17  0.996  22  TNFRSF13C
cg01767116  4.09E-06  5.97  0.984  15  FBXL22
cg07562039  4.09E-06  5.30  0.993  10  PAOX
cg05849013  4.35E-06  5.11  0.994  16  SNX29
cg00346970  4.35E-06  5.36  0.985  6  ALDH5A1
cg24009030  4.35E-06  5.74  0.980  13  0
cg03586564  4.39E-06  5.14  0.993  18  NFATC1
cg06101576  4.42E-06  5.58  0.997  7  0
cg23529249  4.43E-06  6.33  0.988  15  0
cg27660099  4.69E-06  5.48  0.991  7  EEPD1
cg08010619  4.77E-06  7.05  0.991  11  SWAP70
cg02498072  4.84E-06  6.63  0.989  16  SNX29
cg05962793  4.97E-06  5.23  0.995  16  USP10
cg14800014  4.97E-06  5.47  0.992  5  GRAMD3
cg24264962  5.06E-06  5.52  0.990  3  OSBPL10
cg04132379  5.13E-06  5.53  0.994  20  RASSF2
cg23900696  5.13E-06  5.15  0.991  12  0
cg07534892  5.26E-06  5.25  0.974  12  0
cg16445423  5.47E-06  6.44  0.991  2  TMEM131
cg15418419  5.55E-06  5.88  0.966  6  HLA-DMA
cg08126223  5.60E-06  5.48  0.989  12  RND1
cg19865523  5.75E-06  5.77  0.988  12  EP400
cg10597661  5.83E-06  6.41  0.991  8  0
cg08958952  6.28E-06  5.68  0.988  8  0
cg05475010  6.38E-06  5.24  0.990  6  TREML3P
cg21419003  6.52E-06  6.05  0.994  1  ADORA3
cg12547839  6.64E-06  5.95  0.989  17  UBE2O
cg15472885  6.65E-06  5.89  0.980  3  FOXP1
cg07552374  6.94E-06  6.34  0.994  11  APBB1
cg09477124  7.65E-06  5.27  0.973  3  ATP2B2
cg02400725  8.04E-06  5.70  0.992  6  COL19A1
cg00386940  8.17E-06  5.53  0.986  20  0
cg07566050  8.23E-06  5.27  0.989  1  PEA15
cg07322003  8.41E-06  5.01  0.996  11  0
cg22779330  8.72E-06  5.12  0.989  11  MYEOV
cg16844818  8.89E-06  6.84  0.984  2  FARP2
cg01676996  8.93E-06  5.50  0.988  6  C6orf136
cg06221222  9.91E-06  5.20  0.986  1  BCAR3
cg05041511  9.97E-06  5.79  0.993  7  CREB3L2

184

cg14667104 1.00E-05 5.13 0.985 12 GCN1L1 cg06900514 1.03E-05 5.04 0.990 2 0 cg11611676 1.03E-05 5.52 0.989 6 0 cg00509772 1.03E-05 5.22 0.977 20 E2F1 cg25402083 1.09E-05 5.04 0.977 8 GPR20 cg08692295 1.10E-05 5.14 0.992 19 SFRS16 cg08461425 1.11E-05 5.79 0.992 12 KDM2B cg23534301 1.11E-05 5.65 0.995 17 SSH2 cg03282991 1.14E-05 5.09 0.994 6 C6orf10 cg19621160 1.23E-05 5.79 0.984 4 STIM2 cg19185146 1.25E-05 6.36 0.990 2 0 cg17516156 1.28E-05 5.57 0.981 2 INPP1 cg02042156 1.32E-05 5.27 0.993 2 0 cg05220083 1.39E-05 5.99 0.995 12 MARCH9 cg03099790 1.42E-05 5.62 0.995 16 AMFR cg02588809 1.44E-05 5.10 0.977 13 RASA3 cg15417654 1.52E-05 5.47 0.988 3 LPP cg02268510 1.59E-05 5.54 0.992 19 SIPA1L3 cg20131897 1.69E-05 5.40 0.987 12 ACVRL1 cg21856603 1.80E-05 5.29 0.976 17 0 cg20596543 1.93E-05 5.07 0.991 8 0 cg10235741 1.96E-05 5.15 0.990 3 DNASE1L3 cg18670483 1.96E-05 5.36 0.986 5 CD180 cg16252905 2.13E-05 5.63 0.988 1 TNFRSF4 cg19818271 2.17E-05 5.63 0.993 7 PHTF2 cg20964216 2.18E-05 5.08 0.992 17 0 cg05931231 2.22E-05 5.59 0.994 12 0 cg26465837 2.30E-05 5.18 0.981 7 HDAC9 cg25995870 2.44E-05 6.30 0.990 19 SIPA1L3 cg04289069 2.50E-05 5.14 0.995 10 DCLRE1C cg06650250 2.54E-05 5.49 0.989 13 LRCH1 cg10042132 2.57E-05 5.39 0.995 14 0 cg14162361 2.60E-05 5.05 0.985 9 0 cg21879236 2.62E-05 5.25 0.992 5 0 cg16914953 2.67E-05 5.47 0.978 7 AP1S1 cg12889817 2.70E-05 5.30 0.991 17 0 cg08708629 2.85E-05 7.35 0.963 1 PDE4DIP cg11938386 2.87E-05 5.34 0.985 15 USP3 cg02418072 2.90E-05 5.04 0.993 10 0 cg27446570 2.94E-05 5.19 0.988 1 C1orf174 cg18002437 3.01E-05 5.11 0.980 1 LOC728875 cg12495807 3.02E-05 5.94 0.994 12 MVK cg04242821 3.41E-05 5.25 0.986 22 SMTN cg18786643 3.50E-05 5.95 0.989 1 IGSF3 cg04617914 3.59E-05 5.18 0.975 6 FOXP4 cg25722041 3.60E-05 6.41 0.990 1 RERE

185

cg03493668 3.75E-05 5.10 0.987 11 THRSP cg25431916 3.84E-05 5.35 0.997 19 SF3A2 cg04341526 3.91E-05 5.41 0.992 1 0 cg05922591 3.97E-05 5.42 0.989 19 LILRB4 cg27319563 4.01E-05 5.14 0.992 10 DNMBP cg00963675 4.24E-05 5.45 0.976 10 EGR2 cg07094785 4.40E-05 5.23 0.995 11 0 cg04232098 5.18E-05 5.45 0.992 3 0 cg25060035 5.19E-05 5.47 0.990 17 TTYH2 cg25652781 5.54E-05 5.22 0.991 4 TMEM154 cg18165251 5.87E-05 5.26 0.980 10 CDHR1 cg26915618 5.92E-05 6.17 0.989 1 RERE cg02484633 5.93E-05 5.26 0.974 6 C6orf227 cg01403532 6.66E-05 5.62 0.961 16 LOC652276 cg15236196 6.76E-05 5.42 0.984 17 AMZ2P1 cg19458529 7.40E-05 6.36 0.963 5 MAST4 cg11834658 7.59E-05 5.25 0.990 1 ECE1 cg25538984 7.73E-05 5.22 0.987 5 0 cg12976832 9.07E-05 5.30 0.998 19 ZNF317 cg14323928 9.27E-05 5.83 0.989 15 TRPM1 cg00319157 9.34E-05 5.66 0.986 22 GRAMD4 cg23523388 9.72E-05 5.03 0.976 10 ITPRIP cg17342709 0.000100105 5.19 0.993 2 0 cg11043092 0.000106445 5.14 0.994 8 MCPH1 cg16273734 0.000113767 5.21 0.974 1 0 cg14369329 0.000122711 5.74 0.978 14 KIAA0247 cg04866639 0.000136015 5.09 0.994 15 RHOV cg05967129 0.000157294 6.21 0.992 7 SND1 cg03487430 0.000170657 5.48 0.980 9 CD72 cg22995064 0.000177276 5.26 0.988 6 ZFAND3 cg08155591 0.000178926 5.58 0.990 8 E2F5 cg03566787 0.000241552 5.70 0.983 11 CELF1 cg23796329 0.000242433 5.26 0.995 6 BEND3 cg13039199 0.000431549 5.15 0.923 15 SLCO3A1 cg23020857 0.002749529 5.27 0.996 14 CGRRF1 VarRatio=variance ratio ICC=intraclass correlation coefficient

References

1. Swerdlow SH, Campo E, Harris NL, et al., editors. WHO Classification of Tumours of Haematopoietic and Lymphoid Tissues. Lyon, France: IARC; 2008.
2. Australian Institute of Health and Welfare, Australasian Association of Cancer Registries. Cancer in Australia: an overview, 2010. Canberra: AIHW; 2010. Contract No.: 56.
3. Thursfield V. Person years of life lost to age 85 years from B-cell neoplasms in Australia. VCR, Cancer Council Victoria: Melbourne; (based on Victorian Cancer Registry data for 2010).
4. Landgren O, Kristinsson SY, Goldin LR, Caporaso NE, Blimark C, Mellqvist U-H, et al. Risk of plasma cell and lymphoproliferative disorders among 14621 first-degree relatives of 4458 patients with monoclonal gammopathy of undetermined significance in Sweden. Blood. 2009;114(4):791-5.
5. Blombery PA, Ryland GL, Markham J, Guinto J, Wall M, McBean M, et al. Detection of clinically relevant early genomic lesions in B-cell malignancies from circulating tumour DNA using a single hybridisation-based next generation sequencing assay. Br J Haematol. 2017.
6. Fabbri G, Dalla-Favera R. The molecular pathogenesis of chronic lymphocytic leukaemia. Nat Rev Cancer. 2016;16(3):145-62.
7. Quesada V, Conde L, Villamor N, Ordonez GR, Jares P, Bassaganyas L, et al. Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nat Genet. 2011;44(1):47-52.
8. Morgan GJ, Walker BA, Davies FE. The genetic architecture of multiple myeloma. Nat Rev Cancer. 2012;12(5):335-48.
9. Morton LM, Slager SL, Cerhan JR, Wang SS, Vajdic CM, Skibola CF, et al. Etiologic heterogeneity among non-Hodgkin lymphoma subtypes: the InterLymph Non-Hodgkin Lymphoma Subtypes Project. J Natl Cancer Inst Monogr. 2014;2014(48):130-44.
10. Goldin LR, Bjorkholm M, Kristinsson SY, Samuelsson J, Landgren O. Germline and somatic JAK2 mutations and susceptibility to chronic myeloproliferative neoplasms. Genome Med. 2009;1(5):55.
11. Di Bernardo MC, Crowther-Swanepoel D, Broderick P, Webb E, Sellick G, Wild R, et al. A genome-wide association study identifies six susceptibility loci for chronic lymphocytic leukemia. Nat Genet. 2008;40(10):1204-10.
12. Crowther-Swanepoel D, Corre T, Lloyd A, Gaidano G, Olver B, Bennett FL, et al. Inherited genetic susceptibility to monoclonal B-cell lymphocytosis. Blood. 2010;116(26):5957-60.
13. Crowther-Swanepoel D, Di Bernardo MC, Jamroziak K, Karabon L, Frydecka I, Deaglio S, et al. Common genetic variation at 15q25.2 impacts on chronic lymphocytic leukaemia risk. Br J Haematol. 2011;154(2):229-33.
14. Berndt SI, Camp NJ, Skibola CF, Vijai J, Wang Z, Gu J, et al. Meta-analysis of genome-wide association studies discovers multiple loci for chronic lymphocytic leukemia. Nat Commun. 2016;7:10933.
15. Berndt SI, Skibola CF, Joseph V, Camp NJ, Nieters A, Wang Z, et al. Genome-wide association study identifies multiple risk loci for chronic lymphocytic leukemia. Nat Genet. 2013;45(8):868-76.
16. Crowther-Swanepoel D, Broderick P, Di Bernardo MC, Dobbins SE, Torres M, Mansouri M, et al. Common variants at 2q37.3, 8q24.21, 15q21.3 and 16q24.1 influence chronic lymphocytic leukemia risk. Nat Genet. 2010;42(2):132-6.

17. Speedy HE, Di Bernardo MC, Sava GP, Dyer MJ, Holroyd A, Wang Y, et al. A genome-wide association study identifies multiple susceptibility loci for chronic lymphocytic leukemia. Nat Genet. 2014;46(1):56-60.
18. Slager SL, Camp NJ, Conde L, Shanafelt TD, Achenbach SJ, Rabe KG, et al. Common variants within 6p21.31 locus are associated with chronic lymphocytic leukaemia and, potentially, other non-Hodgkin lymphoma subtypes. Br J Haematol. 2012;159(5):572-6.
19. Slager SL, Rabe KG, Achenbach SJ, Vachon CM, Goldin LR, Strom SS, et al. Genome-wide association study identifies a novel susceptibility locus at 6p21.3 among familial CLL. Blood. 2011;117(6):1911-6.
20. Conde L, Halperin E, Akers NK, Brown KM, Smedby KE, Rothman N, et al. Genome-wide association study of follicular lymphoma identifies a risk locus at 6p21.32. Nat Genet. 2010;42(8):661-4.
21. Cerhan JR, Berndt SI, Vijai J, Ghesquieres H, McKay J, Wang SS, et al. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma. Nat Genet. 2014;46(11):1233-8.
22. Smedby KE, Foo JN, Skibola CF, Darabi H, Conde L, Hjalgrim H, et al. GWAS of follicular lymphoma reveals allelic heterogeneity at 6p21.32 and suggests shared genetic susceptibility with diffuse large B-cell lymphoma. PLoS Genet. 2011;7(4):e1001378.
23. Bassig BA, Cerhan JR, Au WY, Kim HN, Sangrajrang S, Hu W, et al. Genetic susceptibility to diffuse large B-cell lymphoma in a pooled study of three Eastern Asian populations. Eur J Haematol. 2015;95(5):442-8.
24. Kumar V, Matsuo K, Takahashi A, Hosono N, Tsunoda T, Kamatani N, et al. Common variants on 14q32 and 13q12 are associated with DLBCL susceptibility. J Hum Genet. 2011;56(6):436-9.
25. Vijai J, Kirchhoff T, Schrader KA, Brown J, Dutra-Clarke AV, Manschreck C, et al. Susceptibility loci associated with specific and shared subtypes of lymphoid malignancies. PLoS Genet. 2013;9(1):e1003220.
26. Skibola CF, Bracci PM, Halperin E, Conde L, Craig DW, Agana L, et al. Genetic variants at 6p21.33 are associated with susceptibility to follicular lymphoma. Nat Genet. 2009;41(8):873-5.
27. Tan DE, Foo JN, Bei JX, Chang J, Peng R, Zheng X, et al. Genome-wide association study of B cell non-Hodgkin lymphoma identifies 3q27 as a susceptibility locus in the Chinese population. Nat Genet. 2013;45(7):804-7.
28. Broderick P, Chubb D, Johnson DC, Weinhold N, Forsti A, Lloyd A, et al. Common variation at 3p22.1 and 7p15.3 influences multiple myeloma risk. Nat Genet. 2012;44(1):58-61.
29. Mitchell JS, Li N, Weinhold N, Forsti A, Ali M, van Duin M, et al. Genome-wide association study identifies multiple susceptibility loci for multiple myeloma. Nat Commun. 2016;7:12050.
30. Chubb D, Weinhold N, Broderick P, Chen B, Johnson DC, Forsti A, et al. Common variation at 3q26.2, 6p21.33, 17p11.2 and 22q13.1 influences multiple myeloma risk. Nat Genet. 2013.
31. Swaminathan B, Thorleifsson G, Joud M, Ali M, Johnsson E, Ajore R, et al. Variants in ELL2 influencing immunoglobulin levels associate with multiple myeloma. Nat Commun. 2015;6:7213.

32. Martino A, Campa D, Jamroziak K, Reis RM, Sainz J, Buda G, et al. Impact of polymorphic variation at 7p15.3, 3p22.1 and 2p23.3 loci on risk of multiple myeloma. Br J Haematol. 2012;158(6):805-9.
33. Weinhold N, Johnson DC, Chubb D, Chen B, Forsti A, Hosking FJ, et al. The CCND1 c.870G>A polymorphism is a risk factor for t(11;14)(q13;q32) multiple myeloma. Nat Genet. 2013;45(5):522-5.
34. Sharma A, Heuck CJ, Fazzari MJ, Mehta J, Singhal S, Greally JM, et al. DNA methylation alterations in multiple myeloma as a model for epigenetic changes in cancer. Wiley Interdiscip Rev Syst Biol Med. 2010;2(6):654-69.
35. Li Y, Nagai H, Ohno T, Yuge M, Hatano S, Ito E, et al. Aberrant DNA methylation of p57(KIP2) gene in the promoter region in lymphoid malignancies of B-cell phenotype. Blood. 2002;100(7):2572-7.
36. Braggio E, Maiolino A, Gouveia ME, Magalhaes R, Souto Filho JT, Garnica M, et al. Methylation status of nine tumor suppressor genes in multiple myeloma. Int J Hematol. 2010;91(1):87-96.
37. Guillerm G, Gyan E, Wolowiec D, Facon T, Avet-Loiseau H, Kuliczkowski K, et al. p16(INK4a) and p15(INK4b) gene methylations in plasma cells from monoclonal gammopathy of undetermined significance. Blood. 2001;98(1):244-6.
38. Kanduri M, Cahill N, Goransson H, Enstrom C, Ryan F, Isaksson A, et al. Differential genome-wide array-based methylation profiles in prognostic subsets of chronic lymphocytic leukemia. Blood. 2010;115(2):296-305.
39. Berger SL, Kouzarides T, Shiekhattar R, Shilatifard A. An operational definition of epigenetics. Genes Dev. 2009;23(7):781-3.
40. Jones PA, Takai D. The role of DNA methylation in mammalian epigenetics. Science. 2001;293(5532):1068-70.
41. Georgiadis P, Liampa I, Hebels DG, Krauskopf J, Chatziioannou A, Valavanis I, et al. Evolving DNA methylation and gene expression markers of B-cell chronic lymphocytic leukemia are present in pre-diagnostic blood samples more than 10 years prior to diagnosis. BMC Genomics. 2017;18(1):728.
42. Kulis M, Heath S, Bibikova M, Queiros AC, Navarro A, Clot G, et al. Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat Genet. 2012;44(11):1236-42.
43. Liu P, Jiang W, Zhao J, Zhang H. Integrated analysis of genome-wide gene expression and DNA methylation microarray of diffuse large B-cell lymphoma with TET mutations. Mol Med Rep. 2017;16(4):3777-82.
44. Asmar F, Punj V, Christensen J, Pedersen MT, Pedersen A, Nielsen AB, et al. Genome-wide profiling identifies a DNA methylation signature that associates with TET2 mutations in diffuse large B-cell lymphoma. Haematologica. 2013;98(12):1912-20.
45. Walker BA, Wardell CP, Chiecchio L, Smith EM, Boyd KD, Neri A, et al. Aberrant global methylation patterns affect the molecular pathogenesis and prognosis of multiple myeloma. Blood. 2011;117(2):553-62.
46. Halldorsdottir AM, Kanduri M, Marincevic M, Mansouri L, Isaksson A, Goransson H, et al. Mantle cell lymphoma displays a homogenous methylation profile: a comparative analysis with chronic lymphocytic leukemia. Am J Hematol. 2012;87(4):361-7.

47. Queiros AC, Beekman R, Vilarrasa-Blasi R, Duran-Ferrer M, Clot G, Merkel A, et al. Decoding the DNA Methylome of Mantle Cell Lymphoma in the Light of the Entire B Cell Lineage. Cancer Cell. 2016;30(5):806-21.
48. Hewitt SL, Chaumeil J, Skok JA. Chromosome dynamics and the regulation of V(D)J recombination. Immunol Rev. 2010;237(1):43-54.
49. Beutler E, Lichtman MA, Coller BS, Kipps TJ, Seligsohn U. Williams Hematology. 6th ed: McGraw-Hill; 2001.
50. Seda V, Mraz M. B-cell receptor signalling and its crosstalk with other pathways in normal and malignant cells. Eur J Haematol. 2015;94(3):193-205.
51. Zhong Y, Byrd JC, Dubovsky JA. The B-cell receptor pathway: a critical component of healthy and malignant immune biology. Semin Hematol. 2014;51(3):206-18.
52. Blombery PA, Wall M, Seymour JF. The molecular pathogenesis of B-cell non-Hodgkin lymphoma. Eur J Haematol. 2015;95(4):280-93.
53. Bolli N, Avet-Loiseau H, Wedge DC, Van Loo P, Alexandrov LB, Martincorena I, et al. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat Commun. 2014;5:2997.
54. Landgren O, Gridley G, Turesson I, Caporaso NE, Goldin LR, Baris D, et al. Risk of monoclonal gammopathy of undetermined significance (MGUS) and subsequent multiple myeloma among African American and white veterans in the United States. Blood. 2006;107(3):904-6.
55. Cohen HJ, Crawford J, Rao MK, Pieper CF, Currie MS. Racial differences in the prevalence of monoclonal gammopathy in a community-based sample of the elderly. Am J Med. 1998;104(5):439-44.
56. Waxman AJ, Mink PJ, Devesa SS, Anderson WF, Weiss BM, Kristinsson SY, et al. Racial disparities in incidence and outcome in multiple myeloma: a population-based study. Blood. 2010;116(25):5501-6.
57. Sergentanis TN, Zagouri F, Tsilimidos G, Tsagianni A, Tseliou M, Dimopoulos MA, et al. Risk Factors for Multiple Myeloma: A Systematic Review of Meta-Analyses. Clin Lymphoma Myeloma Leuk. 2015;15(10):563-77 e1-3.
58. Dolcetti R, Gloghini A, Caruso A, Carbone A. A lymphomagenic role for HIV beyond immune suppression? Blood. 2016;127(11):1403-9.
59. Thieblemont C, Bertoni F, Copie-Bergman C, Ferreri AJ, Ponzoni M. Chronic inflammation and extra-nodal marginal-zone lymphomas of MALT-type. Semin Cancer Biol. 2014;24:33-42.
60. Shanafelt TD, Drake MT, Maurer MJ, Allmer C, Rabe KG, Slager SL, et al. Vitamin D insufficiency and prognosis in chronic lymphocytic leukemia. Blood. 2011;117(5):1492-8.
61. Lim U, Freedman DM, Hollis BW, Horst RL, Purdue MP, Chatterjee N, et al. A prospective investigation of serum 25-hydroxyvitamin D and risk of lymphoid cancers. Int J Cancer. 2009;124(4):979-86.
62. Giovannucci E, Liu Y, Rimm EB, Hollis BW, Fuchs CS, Stampfer MJ, et al. Prospective study of predictors of vitamin D status and cancer incidence and mortality in men. J Natl Cancer Inst. 2006;98(7):451-9.
63. Larsson SC, Wolk A. BMI and risk of myeloma: a meta-analysis. Int J Cancer. 2007;121:2512-6.
64. Larsson SC, Wolk A. Body mass index and risk of non-Hodgkin's and Hodgkin's lymphoma: a meta-analysis of prospective studies. Eur J Cancer. 2011;47(16):2422-30.

65. Geyer SM, Morton LM, Habermann TM, Allmer C, Davis S, Cozen W, et al. Smoking, alcohol use, obesity, and overall survival from non-Hodgkin lymphoma: a population-based study. Cancer. 2010;116(12):2993-3000.
66. Andreotti G, Birmann BM, Cozen W, De Roos AJ, Chiu BC, Costas L, et al. A pooled analysis of cigarette smoking and risk of multiple myeloma from the international multiple myeloma consortium. Cancer Epidemiol Biomarkers Prev. 2015;24(3):631-4.
67. Hofmann JN, Moore SC, Lim U, Park Y, Baris D, Hollenbeck AR, et al. Body mass index and physical activity at different ages and risk of multiple myeloma in the NIH-AARP diet and health study. Am J Epidemiol. 2013;177(8):776-86.
68. Cerhan JR, Slager SL. Familial predisposition and genetic risk factors for lymphoma. Blood. 2015;126(20):2265-73.
69. Linet MS, Pottern LM. Familial aggregation of hematopoietic malignancies and risk of non-Hodgkin's lymphoma. Cancer Res. 1992;52(19 Suppl):5468s-73s.
70. Yuille MR, Matutes E, Marossy A, Hilditch B, Catovsky D, Houlston RS. Familial chronic lymphocytic leukaemia: a survey and review of published studies. Br J Haematol. 2000;109(4):794-9.
71. Goldin LR, Bjorkholm M, Kristinsson SY, Turesson I, Landgren O. Highly increased familial risks for specific lymphoma subtypes. Br J Haematol. 2009;146(1):91-4.
72. Kristinsson SY, Bjorkholm M, Goldin LR, McMaster ML, Turesson I, Landgren O. Risk of lymphoproliferative disorders among first-degree relatives of lymphoplasmacytic lymphoma/Waldenstrom macroglobulinemia patients: a population-based study in Sweden. Blood. 2008;112(8):3052-6.
73. Vachon CM, Kyle RA, Therneau TM, Foreman BJ, Larson DR, Colby CL, et al. Increased risk of monoclonal gammopathy in first-degree relatives of patients with multiple myeloma or monoclonal gammopathy of undetermined significance. Blood. 2009;114(4):785-90.
74. Lynch HT, Ferrara K, Barlogie B, Coleman EA, Lynch JF, Weisenburger D, et al. Familial myeloma. N Engl J Med. 2008;359(2):152-7.
75. Schinasi LH, Brown EE, Camp NJ, Wang SS, Hofmann JN, Chiu BC, et al. Multiple myeloma and family history of lymphohaematopoietic cancers: Results from the International Multiple Myeloma Consortium. Br J Haematol. 2016;175(1):87-101.
76. Jasek M, Bojarska-Junak A, Wagner M, Sobczynski M, Wolowiec D, Rolinski J, et al. Association of variants in BAFF (rs9514828 and rs1041569) and BAFF-R (rs61756766) genes with the risk of chronic lymphocytic leukemia. Tumour Biol. 2016;37(10):13617-26.
77. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002;16(1):6-21.
78. Herman JG, Baylin SB. Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med. 2003;349(21):2042-54.
79. Weber M, Hellmann I, Stadler MB, Ramos L, Paabo S, Rebhan M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39(4):457-66.
80. Deaton AM, Bird A. CpG islands and the regulation of transcription. Genes Dev. 2011;25(10):1010-22.

81. Williams K, Christensen J, Pedersen MT, Johansen JV, Cloos PA, Rappsilber J, et al. TET1 and hydroxymethylcytosine in transcription and DNA methylation fidelity. Nature. 2011;473(7347):343-8.
82. Ito S, Shen L, Dai Q, Wu SC, Collins LB, Swenberg JA, et al. Tet proteins can convert 5-methylcytosine to 5-formylcytosine and 5-carboxylcytosine. Science. 2011;333(6047):1300-3.
83. Jones PA. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet. 2012;13(7):484-92.
84. Esteller M. Dormant hypermethylated tumour suppressor genes: questions and answers. J Pathol. 2005;205(2):172-80.
85. The University of Texas. Tumor Suppressor Gene Database. 2016.
86. Enjuanes A, Fernandez V, Hernandez L, Navarro A, Bea S, Pinyol M, et al. Identification of methylated genes associated with aggressive clinicopathological features in mantle cell lymphoma. PLoS One. 2011;6(5):e19736.
87. Pike BL, Greiner TC, Wang X, Weisenburger DD, Hsu YH, Renaud G, et al. DNA methylation profiles in diffuse large B-cell lymphoma and their relationship to gene expression status. Leukemia. 2008;22(5):1035-43.
88. de Carvalho F, Colleoni GW, Almeida MS, Carvalho AL, Vettore AL. TGFbetaR2 aberrant methylation is a potential prognostic marker and therapeutic target in multiple myeloma. Int J Cancer. 2009;125(8):1985-91.
89. Galm O, Wilop S, Reichelt J, Jost E, Gehbauer G, Herman JG, et al. DNA methylation changes in multiple myeloma. Leukemia. 2004;18(10):1687-92.
90. Giachelia M, Bozzoli V, D'Alo F, Tisi MC, Massini G, Maiolo E, et al. Quantification of DAPK1 promoter methylation in bone marrow and peripheral blood as a follicular lymphoma biomarker. J Mol Diagn. 2014;16(4):467-76.
91. Krajnovic M, Jovanovic MP, Mihaljevic B, Andelic B, Tarabar O, Knezevic-Usaj S, et al. Hypermethylation of p15 gene in diffuse - large B-cell lymphoma: association with less aggressiveness of the disease. Clin Transl Sci. 2014;7(5):384-90.
92. Kristensen LS, Asmar F, Dimopoulos K, Nygaard MK, Aslan D, Hansen JW, et al. Hypermethylation of DAPK1 is an independent prognostic factor predicting survival in diffuse large B-cell lymphoma. Oncotarget. 2014;5(20):9798-810.
93. O'Riain C, O'Shea DM, Yang Y, Le Dieu R, Gribben JG, Summers K, et al. Array-based DNA methylation profiling in follicular lymphoma. Leukemia. 2009;23(10):1858-66.
94. Yuregir OO, Yurtcu E, Kizilkilic E, Kocer NE, Ozdogu H, Sahin FI. Detecting methylation patterns of p16, MGMT, DAPK and E-cadherin genes in multiple myeloma patients. Int J Lab Hematol. 2010;32(2):142-9.
95. Chim CS, Pang R, Fung TK, Choi CL, Liang R. Epigenetic dysregulation of Wnt signaling pathway in multiple myeloma. Leukemia. 2007;21(12):2527-36.
96. Kocemba KA, Groen RW, van Andel H, Kersten MJ, Mahtouk K, Spaargaren M, et al. Transcriptional silencing of the Wnt-antagonist DKK1 by promoter methylation is associated with enhanced Wnt signaling in advanced multiple myeloma. PLoS One. 2012;7(2):e30359.
97. Bennett LB, Schnabel JL, Kelchen JM, Taylor KH, Guo J, Arthur GL, et al. DNA Hypermethylation Accompanied by Transcriptional Repression in Follicular Lymphoma. Genes Chromosomes Cancer. 2009;48(9):828-41.

98. Pei L, Choi JH, Liu J, Lee EJ, McCarthy B, Wilson JM, et al. Genome-wide DNA methylation analysis reveals novel epigenetic changes in chronic lymphocytic leukemia. Epigenetics. 2012;7(6):567-78.
99. Kaiser MF, Johnson DC, Wu P, Walker BA, Brioli A, Mirabella F, et al. Global methylation analysis identifies prognostically important epigenetically inactivated tumor suppressor genes in multiple myeloma. Blood. 2013;122(2):219-26.
100. Choi JH, Li Y, Guo J, Pei L, Rauch TA, Kramer RS, et al. Genome-wide DNA methylation maps in follicular lymphoma cells determined by methylation-enriched bisulfite sequencing. PLoS One. 2010;5(9).
101. Chen RZ, Pettersson U, Beard C, Jackson-Grusby L, Jaenisch R. DNA hypomethylation leads to elevated mutation rates. Nature. 1998;395(6697):89-93.
102. Raval A, Tanner SM, Byrd JC, Angerman EB, Perko JD, Chen SS, et al. Downregulation of death-associated protein kinase 1 (DAPK1) in chronic lymphocytic leukemia. Cell. 2007;129(5):879-90.
103. Corcoran M, Parker A, Orchard J, Davis Z, Wirtz M, Schmitz OJ, et al. ZAP-70 methylation status is associated with ZAP-70 expression status in chronic lymphocytic leukemia. Haematologica. 2005;90(8):1078-88.
104. Strathdee G, Sim A, Parker A, Oscier D, Brown R. Promoter hypermethylation silences expression of the HoxA4 gene and correlates with IgVh mutational status in CLL. Leukemia. 2006;20(7):1326-9.
105. Raval A, Lucas DM, Matkovic JJ, Bennett KL, Liyanarachchi S, Young DC, et al. TWIST2 demonstrates differential methylation in immunoglobulin variable heavy chain mutated and unmutated chronic lymphocytic leukemia. J Clin Oncol. 2005;23(17):3877-85.
106. Liu XF, Zhu SG, Zhang H, Xu Z, Su HL, Li SJ, et al. The methylation status of the TMS1/ASC gene in cholangiocarcinoma and its clinical significance. Hepatobiliary Pancreat Dis Int. 2006;5(3):449-53.
107. Chen SS, Claus R, Lucas DM, Yu L, Qian J, Ruppert AS, et al. Silencing of the inhibitor of DNA binding protein 4 (ID4) contributes to the pathogenesis of mouse and human CLL. Blood. 2011;117(3):862-71.
108. Seeliger B, Wilop S, Osieka R, Galm O, Jost E. CpG island methylation patterns in chronic lymphocytic leukemia. Leuk Lymphoma. 2009;50(3):419-26.
109. Arruga F, Gizdic B, Bologna C, Cignetto S, Buonincontri R, Serra S, et al. Mutations in NOTCH1 PEST-domain orchestrate CCL19-driven homing of Chronic Lymphocytic Leukemia cells by modulating the tumor suppressor gene DUSP22. Leukemia. 2016.
110. Tong WG, Wierda WG, Lin E, Kuang SQ, Bekele BN, Estrov Z, et al. Genome-wide DNA methylation profiling of chronic lymphocytic leukemia allows identification of epigenetically repressed molecular pathways with clinical impact. Epigenetics. 2010;5(6):499-508.
111. Rahmatpanah FB, Carstens S, Hooshmand SI, Welsh EC, Sjahputera O, Taylor KH, et al. Large-scale analysis of DNA methylation in chronic lymphocytic leukemia. Epigenomics. 2009;1(1):39-61.
112. Ronchetti D, Tuana G, Rinaldi A, Agnelli L, Cutrona G, Mosca L, et al. Distinct patterns of global promoter methylation in early stage chronic lymphocytic leukemia. Genes Chromosomes Cancer. 2014;53(3):264-73.

113. Cahill N, Bergh AC, Kanduri M, Goransson-Kultima H, Mansouri L, Isaksson A, et al. 450K-array analysis of chronic lymphocytic leukemia cells reveals global DNA methylation to be relatively stable over time and similar in resting and proliferative compartments. Leukemia. 2013;27(1):150-8.
114. Kulis M, Heath S, Bibikova M, Queiros AC, Navarro A, Clot G, et al. Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat Genet. 2012;44(11):1236-42.
115. Irving L, Mainou-Fowler T, Parker A, Ibbotson RE, Oscier DG, Strathdee G. Methylation markers identify high risk patients in IGHV mutated chronic lymphocytic leukemia. Epigenetics. 2011;6(3):300-6.
116. Queiros AC, Villamor N, Clot G, Martinez-Trillos A, Kulis M, Navarro A, et al. A B-cell epigenetic signature defines three biologic subgroups of chronic lymphocytic leukemia with clinical impact. Leukemia. 2015;29(3):598-605.
117. Baur AS, Shaw P, Burri N, Delacretaz F, Bosman FT, Chaubert P. Frequent methylation silencing of p15(INK4b) (MTS2) and p16(INK4a) (MTS1) in B-cell and T-cell lymphomas. Blood. 1999;94(5):1773-81.
118. Alhejaily A, Day AG, Feilotter HE, Baetz T, Lebrun DP. Inactivation of the CDKN2A tumor-suppressor gene by deletion or methylation is common at diagnosis in follicular lymphoma and associated with poor clinical outcome. Clin Cancer Res. 2014;20(6):1676-86.
119. Pinyol M, Cobo F, Bea S, Jares P, Nayach I, Fernandez PL, et al. p16(INK4a) gene inactivation by deletions, mutations, and hypermethylation is associated with transformed and aggressive variants of non-Hodgkin's lymphomas. Blood. 1998;91(8):2977-84.
120. Martín-Subero JI, Kreuz M, Bibikova M, Bentink S, Ammerpohl O, Wickham-Garcia E, et al. New insights into the biology and origin of mature aggressive B-cell lymphomas by combined epigenomic, genomic, and transcriptional profiling. Blood. 2009;113(11):2488-97.
121. Martin-Subero JI, Ammerpohl O, Bibikova M, Wickham-Garcia E, Agirre X, Alvarez S, et al. A Comprehensive Microarray-Based DNA Methylation Study of 367 Hematological Neoplasms. PLoS One. 2009;4(9):e6986.
122. Ushmorov A, Leithauser F, Ritz O, Barth TF, Moller P, Wirth T. ABF-1 is frequently silenced by promoter methylation in follicular lymphoma, diffuse large B-cell lymphoma and Burkitt's lymphoma. Leukemia. 2008;22(10):1942-4.
123. Rossi D, Capello D, Gloghini A, Franceschetti S, Paulli M, Bhatia K, et al. Aberrant promoter methylation of multiple genes throughout the clinico-pathologic spectrum of B-cell neoplasia. Haematologica. 2004;89(2):154-64.
124. Liu H, Wang J, Epner EM. Cyclin D1 activation in B-cell malignancy: association with changes in histone acetylation, DNA methylation, and RNA polymerase II binding to both promoter and distal sequences. Blood. 2004;104(8):2505-13.
125. Killian JK, Bilke S, Davis S, Walker RL, Killian MS, Jaeger EB, et al. Large-scale profiling of archival lymph nodes reveals pervasive remodeling of the follicular lymphoma methylome. Cancer Res. 2009;69(3):758-64.
126. Brach D, Johnston-Blackwell D, Drew A, Lingaraj T, Motwani V, Warholic NM, et al. EZH2 Inhibition by Tazemetostat Results in Altered Dependency on B-cell Activation Signaling in DLBCL. Mol Cancer Ther. 2017;16(11):2586-97.
127. Ribrag V, Soria J-C, Michot J-M, Schmitt A, Postel-Vinay S, Bijou F, et al. Phase 1 Study of Tazemetostat (EPZ-6438), an Inhibitor of Enhancer of Zeste-Homolog 2 (EZH2): Preliminary Safety and Activity in Relapsed or Refractory Non-Hodgkin Lymphoma (NHL) Patients. ASH Annual Scientific Meeting: Blood; 2015. p. 473.
128. Yap TA, Winter JN, Leonard JP, Ribrag V, Constantinidou A, Giulino-Roth L, et al. A Phase I Study of GSK2816126, an Enhancer of Zeste Homolog 2 (EZH2) Inhibitor, in Patients (pts) with Relapsed/Refractory Diffuse Large B-Cell Lymphoma (DLBCL), Other Non-Hodgkin Lymphomas (NHL), Transformed Follicular Lymphoma (tFL), Solid Tumors and Multiple Myeloma (MM). ASH Annual Scientific Meeting: Blood; 2016. p. 4203.
129. Amara K, Ziadi S, Hachana M, Soltani N, Korbi S, Trimeche M. DNA methyltransferase DNMT3b protein overexpression as a prognostic factor in patients with diffuse large B-cell lymphomas. Cancer Sci. 2010;101(7):1722-30.
130. Zainuddin N, Kanduri M, Berglund M, Lindell M, Amini RM, Roos G, et al. Quantitative evaluation of p16(INK4a) promoter methylation using pyrosequencing in de novo diffuse large B-cell lymphoma. Leuk Res. 2011;35(4):438-43.
131. Lee SM, Lee EJ, Ko YH, Lee SH, Maeng L, Kim KM. Prognostic significance of O6-methylguanine DNA methyltransferase and p57 methylation in patients with diffuse large B-cell lymphomas. APMIS. 2009;117(2):87-94.
132. Wedge E, Hansen JW, Garde C, Asmar F, Tholstrup D, Kristensen SS, et al. Global hypomethylation is an independent prognostic factor in diffuse large B cell lymphoma. Am J Hematol. 2017;92(7):689-94.
133. Esteller M, Guo M, Moreno V, Peinado MA, Capella G, Galm O, et al. Hypermethylation-associated Inactivation of the Cellular Retinol-Binding-Protein 1 Gene in Human Cancer. Cancer Res. 2002;62(20):5902-5.
134. Kristensen LS, Hansen JW, Kristensen SS, Tholstrup D, Harslof LB, Pedersen OB, et al. Aberrant methylation of cell-free circulating DNA in plasma predicts poor outcome in diffuse large B cell lymphoma. Clin Epigenetics. 2016;8(1):95.
135. Shaknovich R, Geng H, Johnson NA, Tsikitas L, Cerchietti L, Greally JM, et al. DNA methylation signatures define molecular subtypes of diffuse large B-cell lymphoma. Blood. 2010;116(20):e81-9.
136. Huang W, Xue X, Shan L, Qiu T, Guo L, Ying J, et al. Clinical significance of PCDH10 promoter methylation in diffuse large B-cell lymphoma. BMC Cancer. 2017;17(1):815.
137. Chambwe N, Kormaksson M, Geng H, De S, Michor F, Johnson NA, et al. Variability in DNA methylation defines novel epigenetic subgroups of DLBCL associated with different clinical outcomes. Blood. 2014;123(11):1699-708.
138. Salhia B, Baker A, Ahmann G, Auclair D, Fonseca R, Carpten J. DNA methylation analysis determines the high frequency of genic hypomethylation and low frequency of hypermethylation events in plasma cell tumors. Cancer Res. 2010;70(17):6934-44.
139. Aoki Y, Nojima M, Suzuki H, Yasui H, Maruyama R, Yamamoto E, et al. Genomic vulnerability to LINE-1 hypomethylation is a potential determinant of the clinicogenetic features of multiple myeloma. Genome Med. 2012;4(12):101.
140. Tatetsu H, Ueno S, Hata H, Yamada Y, Takeya M, Mitsuya H, et al. Down-regulation of PU.1 by methylation of distal regulatory elements and the promoter is required for myeloma cell growth. Cancer Res. 2007;67(11):5328-36.
141. Cheng SH, Ng MH, Lau KM, Liu HS, Chan JC, Hui AB, et al. 4q loss is potentially an important genetic event in MM tumorigenesis: identification of a tumor suppressor gene regulated by promoter methylation at 4q13.3, platelet factor 4. Blood. 2007;109(5):2089-99.
142. Gonzalez-Paz N, Chng WJ, McClure RF, Blood E, Oken MM, Van Ness B, et al. Tumor suppressor p16 methylation in multiple myeloma: biological and clinical implications. Blood. 2007;109(3):1228-32.
143. Peng L, Yang Z, Tan C, Ren G, Chen J. Epigenetic inactivation of ADAMTS9 via promoter methylation in multiple myeloma. Mol Med Rep. 2013;7(3):1055-61.
144. Jost E, do ON, Wilop S, Herman JG, Osieka R, Galm O. Aberrant DNA methylation of the transcription factor C/EBPalpha in acute myelogenous leukemia. Leuk Res. 2009;33(3):443-9.
145. Stanganelli C, Arbelbide J, Fantl DB, Corrado C, Slavutsky I. DNA methylation analysis of tumor suppressor genes in monoclonal gammopathy of undetermined significance. Ann Hematol. 2010;89(2):191-9.
146. Wong KY, Yim RL, Kwong YL, Leung CY, Hui PK, Cheung F, et al. Epigenetic inactivation of the MIR129-2 in hematological malignancies. J Hematol Oncol. 2013;6:16.
147. Jung S, Kim S, Gale M, Cherni I, Fonseca R, Carpten J, et al. DNA methylation in multiple myeloma is weakly associated with gene transcription. PLoS One. 2012;7(12):e52626.
148. Hatzimichael E, Dasoula A, Kounnis V, Benetatos L, Lo Nigro C, Lattanzio L, et al. Bcl2-interacting killer CpG methylation in multiple myeloma: a potential predictor of relapsed/refractory disease with therapeutic implications. Leuk Lymphoma. 2012;53(9):1709-13.
149. Wong KY, Yim RL, So CC, Jin DY, Liang R, Chim CS. Epigenetic inactivation of the MIR34B/C in multiple myeloma. Blood. 2011;118(22):5901-4.
150. Chim CS, Liang R, Leung MH, Kwong YL. Aberrant gene methylation implicated in the progression of monoclonal gammopathy of undetermined significance to multiple myeloma. J Clin Pathol. 2007;60(1):104-6.
151. Park G, Kang SH, Lee JH, Suh C, Kim M, Park SM, et al. Concurrent p16 methylation pattern as an adverse prognostic factor in multiple myeloma: a methylation-specific polymerase chain reaction study using two different primer sets. Ann Hematol. 2011;90(1):73-9.
152. Kim H, Jekarl DW, Kim M, Kim Y, Lim J, Han K, et al. Prevalence of p16 methylation and prognostic factors in plasma cell myeloma at a single institution in Korea. Ann Lab Med. 2013;33(1):28-33.
153. Fernandez de Larrea C, Martin-Antonio B, Cibeira MT, Navarro A, Tovar N, Diaz T, et al. Impact of global and gene-specific DNA methylation pattern in relapsed multiple myeloma patients treated with bortezomib. Leuk Res. 2013;37(6):641-6.


154. Boque N, Campion J, Milagro FI, Moreno-Aliaga MJ, Martinez JA. Some cyclin-dependent kinase inhibitors-related genes are regulated by vitamin C in a model of diet-induced obesity. Biol Pharm Bull. 2009;32(8):1462-8.
155. Tobi EW, Goeman JJ, Monajemi R, Gu H, Putter H, Zhang Y, et al. DNA methylation signatures link prenatal famine exposure to growth and metabolism. Nat Commun. 2014;5:5592.
156. Hughes AM, Armstrong BK, Vajdic CM, Turner J, Grulich AE, Fritschi L, et al. Sun exposure may protect against non-Hodgkin lymphoma: a case-control study. Int J Cancer. 2004;112(5):865-71.
157. Drake MT, Maurer MJ, Link BK, Habermann TM, Ansell SM, Micallef IN, et al. Vitamin D insufficiency and prognosis in non-Hodgkin's lymphoma. J Clin Oncol. 2010;28(27):4191-8.
158. Marik R, Fackler M, Gabrielson E, Zeiger MA, Sukumar S, Stearns V, et al. DNA methylation-related vitamin D receptor insensitivity in breast cancer. Cancer Biol Ther. 2010;10(1):44-53.
159. Lu Y, Wang SS, Reynolds P, Chang ET, Ma H, Sullivan-Halley J, et al. Cigarette smoking, passive smoking, and non-Hodgkin lymphoma risk: evidence from the California Teachers Study. Am J Epidemiol. 2011;174(5):563-73.
160. Belinsky SA, Nikula KJ, Palmisano WA, Michels R, Saccomanno G, Gabrielson E, et al. Aberrant methylation of p16(INK4a) is an early event in lung cancer and a potential biomarker for early diagnosis. Proc Natl Acad Sci U S A. 1998;95(20):11891-6.
161. Mathers JC, Strathdee G, Relton CL. Induction of epigenetic alterations by dietary and other environmental factors. Adv Genet. 2010;71:3-39.
162. Jacob RA, Gretz DM, Taylor PC, James SJ, Pogribny IP, Miller BJ, et al. Moderate folate depletion increases plasma homocysteine and decreases lymphocyte DNA methylation in postmenopausal women. J Nutr. 1998;128(7):1204-12.
163. Kim HN, Lee IK, Kim YK, Tran HT, Yang DH, Lee JJ, et al. Association between folate-metabolizing pathway polymorphism and non-Hodgkin lymphoma. Br J Haematol. 2008;140(3):287-94.
164. National Health and Medical Research Council. Nutrient Reference values for Australia and New Zealand Including Recommended Dietary Intakes. Canberra: National Health and Medical Research Council; 2006.
165. Dugue PA, Bassett JK, Joo JE, Jung CH, Ming Wong E, Moreno-Betancur M, et al. DNA methylation-based biological aging and cancer risk and survival: Pooled analysis of seven prospective studies. Int J Cancer. 2018;142(8):1611-9.
166. Fenaux P, Mufti GJ, Hellstrom-Lindberg E, Santini V, Finelli C, Giagounidis A, et al. Efficacy of azacitidine compared with that of conventional care regimens in the treatment of higher-risk myelodysplastic syndromes: a randomised, open-label, phase III study. Lancet Oncol. 2009;10(3):223-32.
167. Wong KY, So CC, Loong F, Chung LP, Lam WW, Liang R, et al. Epigenetic inactivation of the miR-124-1 in haematological malignancies. PLoS One. 2011;6(4):e19027.
168. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12(8):529-41.
169. Brennan K, Flanagan JM. Is there a link between genome-wide hypomethylation in blood and cancer risk? Cancer Prev Res (Phila). 2012;5(12):1345-57.


170. Choi JY, James SR, Link PA, McCann SE, Hong CC, Davis W, et al. Association between global DNA hypomethylation in leukocytes and risk of breast cancer. Carcinogenesis. 2009;30(11):1889-97.
171. Dauksa A, Gulbinas A, Endzinas Z, Oldenburg J, El-Maarri O. DNA methylation at selected CpG sites in peripheral blood leukocytes is predictive of gastric cancer. Anticancer Res. 2014;34(10):5381-8.
172. Koestler DC, Marsit CJ, Christensen BC, Accomando W, Langevin SM, Houseman EA, et al. Peripheral blood immune cell methylation profiles are associated with nonhematopoietic cancers. Cancer Epidemiol Biomarkers Prev. 2012;21(8):1293-302.
173. Lim U, Flood A, Choi SW, Albanes D, Cross AJ, Schatzkin A, et al. Genomic methylation of leukocyte DNA in relation to colorectal adenoma among asymptomatic women. Gastroenterology. 2008;134(1):47-55.
174. Marsit CJ, Koestler DC, Christensen BC, Karagas MR, Houseman EA, Kelsey KT. DNA methylation array analysis identifies profiles of blood-derived DNA methylation associated with bladder cancer. J Clin Oncol. 2011;29(9):1133-9.
175. Moore LE, Pfeiffer RM, Poscablo C, Real FX, Kogevinas M, Silverman D, et al. Genomic DNA hypomethylation as a biomarker for bladder cancer susceptibility in the Spanish Bladder Cancer Study: a case-control study. Lancet Oncol. 2008;9(4):359-66.
176. Severi G, Southey MC, English DR, Jung CH, Lonie A, McLean C, et al. Epigenome-wide methylation in DNA from peripheral blood as a marker of risk for breast cancer. Breast Cancer Res Treat. 2014;148(3):665-73.
177. Teschendorff AE, Menon U, Gentry-Maharaj A, Ramus SJ, Gayther SA, Apostolidou S, et al. An epigenetic signature in peripheral blood predicts active ovarian cancer. PLoS One. 2009;4(12):e8274.
178. Wallner M, Herbst A, Behrens A, Crispin A, Stieber P, Goke B, et al. Methylation of serum DNA is an independent prognostic marker in colorectal cancer. Clin Cancer Res. 2006;12(24):7347-52.
179. Widschwendter M, Apostolidou S, Raum E, Rothenbacher D, Fiegl H, Menon U, et al. Epigenotyping in peripheral blood cell DNA and breast cancer risk: a proof of principle study. PLoS One. 2008;3(7):e2656.
180. Woo HD, Kim J. Global DNA hypomethylation in peripheral blood leukocytes as a biomarker for cancer risk: a meta-analysis. PLoS One. 2012;7(4):e34615.
181. Wong Doo N, Makalic E, Joo JE, Vajdic CM, Schmidt DF, Wong EM, et al. Global measures of peripheral blood-derived DNA methylation as a risk factor in the development of mature B-cell neoplasms. Epigenomics. 2016;8(1):55-66.
182. Laird PW. Principles and challenges of genomewide DNA methylation analysis. Nat Rev Genet. 2010;11(3):191-203.
183. Fazzari MJ, Greally JM. Epigenomics: beyond CpG islands. Nat Rev Genet. 2004;5(6):446-55.
184. Bock C, Tomazou EM, Brinkman AB, Muller F, Simmer F, Gu H, et al. Quantitative comparison of genome-wide DNA methylation mapping technologies. Nature biotechnology. 2010;28(10):1106-14.
185. Sandoval J, Heyn H, Moran S, Serra-Musach J, Pujana MA, Bibikova M, et al. Validation of a DNA methylation microarray for 450,000 CpG sites in the human genome. Epigenetics. 2011;6(6):692-702.


186. Pan H, Chen L, Dogra S, Ling Teh A, Tan JH, Lim YI, et al. Measuring the methylome in clinical samples: Improved processing of the Infinium Human Methylation450 BeadChip Array. Epigenetics. 2012;7(10).
187. Maksimovic J, Gordon L, Oshlack A. SWAN: Subset-quantile within array normalization for illumina infinium HumanMethylation450 BeadChips. Genome biology. 2012;13(6):R44.
188. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, et al. High-throughput DNA methylation profiling using universal bead arrays. Genome research. 2006;16(3):383-93.
189. Du P, Zhang X, Huang CC, Jafari N, Kibbe WA, Hou L, et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC bioinformatics. 2010;11:587.
190. Allison DB, Cui X, Page GP, Sabripour M. Microarray data analysis: from disarray to consolidation and consensus. Nat Rev Genet. 2006;7(1):55-65.
191. Bibikova M, Barnes B, Tsan C, Ho V, Klotzle B, Le JM, et al. High density DNA methylation array with single CpG site resolution. Genomics. 2011;98(4):288-95.
192. Roessler J, Ammerpohl O, Gutwein J, Hasemeier B, Anwar SL, Kreipe H, et al. Quantitative cross-validation and content analysis of the 450k DNA methylation array from Illumina, Inc. BMC Res Notes. 2012;5:210.
193. Gronniger E, Weber B, Heil O, Peters N, Stab F, Wenck H, et al. Aging and chronic sun exposure cause distinct epigenetic changes in human skin. PLoS Genet. 2010;6(5):e1000971.
194. Breitling LP, Yang R, Korn B, Burwinkel B, Brenner H. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet. 2011;88(4):450-7.
195. Zeilinger S, Kuhnel B, Klopp N, Baurecht H, Kleinschmidt A, Gieger C, et al. Tobacco smoking leads to extensive genome-wide changes in DNA methylation. PLoS One. 2013;8(5):e63812.
196. La Merrill M, Torres-Sanchez L, Ruiz-Ramos R, Lopez-Carrillo L, Cebrian ME, Chen J. The association between first trimester micronutrient intake, MTHFR genotypes, and global DNA methylation in pregnant women. J Matern Fetal Neonatal Med. 2012;25(2):133-7.
197. McKay JA, Waltham KJ, Williams EA, Mathers JC. Folate depletion during pregnancy and lactation reduces genomic DNA methylation in murine adult offspring. Genes Nutr. 2011;6(2):189-96.
198. Vineis P, Chuang SC, Vaissiere T, Cuenin C, Ricceri F, Johansson M, et al. DNA methylation changes associated with cancer risk factors and blood levels of vitamin metabolites in a prospective study. Epigenetics. 2011;6(2):195-201.
199. Milagro FI, Campion J, Garcia-Diaz DF, Goyenechea E, Paternain L, Martinez JA. High fat diet-induced obesity modifies the methylation pattern of leptin promoter in rats. J Physiol Biochem. 2009;65(1):1-9.
200. Zhao R, Zhang R, Li W, Liao Y, Tang J, Miao Q, et al. Genome-wide DNA methylation patterns in discordant sib pairs with alcohol dependence. Asia-Pacific Psychiatry. 2013;5(1):39-50.
201. Rakyan VK, Down TA, Balding DJ, Beck S. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011;12.


202. Joo JE, Wong EM, Baglietto L, Jung CH, Tsimiklis H, Park DJ, et al. The use of DNA from archival dried blood spots with the Infinium HumanMethylation450 array. BMC biotechnology. 2013;13:23.
203. Koestler DC, Marsit CJ, Christensen BC, Accomando WP, Langevin SM, Houseman EA, et al. Peripheral blood immune cell methylation profiles are associated with non-hematopoietic cancers. Cancer Epidemiol Biomarkers Prev. 2012;19:19.
204. Bock C, Beerman I, Lien WH, Smith ZD, Gu H, Boyle P, et al. DNA methylation dynamics during in vivo differentiation of blood and skin stem cells. Molecular cell. 2012;47(4):633-47.
205. Adalsteinsson BT, Gudnason H, Aspelund T, Harris TB, Launer LJ, Eiriksdottir G, et al. Heterogeneity in White Blood Cells Has Potential to Confound DNA Methylation Measurements. PLoS One. 2012;7(10):e46705.
206. Reinius LE, Acevedo N, Joerink M, Pershagen G, Dahlén S-E, Greco D, et al. Differential DNA Methylation in Purified Human Blood Cells: Implications for Cell Lineage and Studies on Disease Susceptibility. PLoS One. 2012;7(7):e41361.
207. Houseman EA, Accomando WP, Koestler DC, Christensen BC, Marsit CJ, Nelson HH, et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC bioinformatics. 2012;13(1).
208. Koestler DC, Christensen B, Karagas MR, Marsit CJ, Langevin SM, Kelsey KT, et al. Blood-based profiles of DNA methylation predict the underlying distribution of cell types: A validation analysis. Epigenetics. 2013;8(8).
209. Landgren O, Albitar M, Ma W, Abbasi F, Hayes RB, Ghia P, et al. B-cell clones as early markers for chronic lymphocytic leukemia. N Engl J Med. 2009;360(7):659-67.
210. Rawstron AC, Bennett FL, O'Connor SJ, Kwok M, Fenton JA, Plummer M, et al. Monoclonal B-cell lymphocytosis and chronic lymphocytic leukemia. N Engl J Med. 2008;359(6):575-83.
211. Landgren O, Kyle RA, Pfeiffer RM, Katzmann JA, Caporaso NE, Hayes RB, et al. Monoclonal gammopathy of undetermined significance (MGUS) consistently precedes multiple myeloma: A prospective study. Blood. 2009;113(22):5412-7.
212. Kyle RA, Therneau TM, Rajkumar SV, Larson DR, Plevak MF, Offord JR, et al. Prevalence of monoclonal gammopathy of undetermined significance. New England Journal of Medicine. 2006;354(13):1362-9.
213. Zandecki M, Bernardi F, Genevieve F, Lai JL, Preudhomme C, Flactif M, et al. Involvement of peripheral blood cells in multiple myeloma: chromosome changes are the rule within circulating plasma cells but not within B lymphocytes. Leukemia. 1997;11(7):1034-9.
214. Hirt C, Camargo MC, Yu KJ, Hewitt SM, Dolken G, Rabkin CS. Risk of follicular lymphoma associated with BCL2 translocations in peripheral blood. Leuk Lymphoma. 2015;56(9):2625-9.
215. Roulland S, Kelly RS, Morgado E, Sungalee S, Solal-Celigny P, Colombat P, et al. t(14;18) Translocation: A predictive blood biomarker for follicular lymphoma. J Clin Oncol. 2014;32(13):1347-55.
216. Milne RL, Fletcher AS, MacInnis RJ, Hodge AM, Hopkins AH, Bassett JK, et al. Cohort Profile: The Melbourne Collaborative Cohort Study (Health 2020). International journal of epidemiology. 2017.
217. Ireland P, Jolley DJ, Giles GGea. Development of the Melbourne FFQ: a food frequency questionnaire for use in an Australian prospective study involving an ethnically diverse cohort. Asia Pacific Journal of Clinical Nutrition. 1994;3(19-31).
218. Percy CF, A; Jack, A; Shanmugarathan, S; Sobin, L; Parkin, D.M.; Whelan, S. International Classification of Diseases for Oncology (ICD-O-3): World Health Organisation; 2000.
219. R Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2015.
220. Aryee MJ, Jaffe AE, Corrada-Bravo H, Ladd-Acosta C, Feinberg AP, Hansen KD, et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics. 2014;30(10):1363-9.
221. Marabita F, Almgren M, Lindholm ME, Ruhrmann S, Fagerstrom-Billai F, Jagodic M, et al. An evaluation of analysis pipelines for DNA methylation profiling using the Illumina HumanMethylation450 BeadChip platform. Epigenetics. 2013;8(3):333-46.
222. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118-27.
223. Herman JG, Baylin SB. Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med. 2003;349(21):2042-54.
224. Jones PA, Baylin SB. The fundamental role of epigenetic events in cancer. Nat Rev Genet. 2002;3(6):415-28.
225. Kleinbaum DGK, M. Logistic Regression - A Self-Learning Text. 3rd ed. Gail MK, J.M.; Tsiatis, A.; Wong, W., editor. New York: Springer Publishers; 2010.
226. Rahman M, Sakamoto J, Fukui T. Conditional versus unconditional logistic regression in the medical literature. J Clin Epidemiol. 2003;56(1):101-2.
227. Dunn OJ. Multiple Comparisons Among Means. Journal of the American Statistical Association. 1961;56(293):52-64.
228. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57.
229. Martin-Subero JI, Ammerpohl O, Bibikova M, Wickham-Garcia E, Agirre X, Alvarez S, et al. A comprehensive microarray-based DNA methylation study of 367 hematological neoplasms. PLoS One. 2009;4(9):e6986.
230. Walker BA, Wardell CP, Chiecchio L, Smith EM, Boyd KD, Neri A, et al. Aberrant global methylation patterns affect the molecular pathogenesis and prognosis of multiple myeloma. Blood. 2011;117(2):553-62.
231. Dugue PA, English DR, MacInnis RJ, Jung CH, Bassett JK, FitzGerald LM, et al. Reliability of DNA methylation measures from dried blood spots and mononuclear cells using the HumanMethylation450k BeadArray. Sci Rep. 2016;6:30317.
232. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protocols. 2008;4(1):44-57.
233. Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1-13.
234. Wolfe D, Dudek S, Ritchie MD, Pendergrass SA. Visualizing genomic information across chromosomes with PhenoGram. BioData Mining. 2013;6:18.


235. Kanduri M, Marincevic M, Halldorsdottir AM, Mansouri L, Junevik K, Ntoufa S, et al. Distinct transcriptional control in major immunogenetic subsets of chronic lymphocytic leukemia exhibiting subset-biased global DNA methylation profiles. Epigenetics. 2012;7(12):1435-42.
236. Aoki T, Frosen J, Fukuda M, Bando K, Shioi G, Tsuji K, et al. Prostaglandin E2-EP2-NF-kappaB signaling in macrophages as a potential therapeutic target for intracranial aneurysms. Sci Signal. 2017;10(465).
237. Yang S, Li JY, Xu W. Role of BAFF/BAFF-R axis in B-cell non-Hodgkin lymphoma. Crit Rev Oncol Hematol. 2014;91(2):113-22.
238. Takahata H, Ohara N, Ichimura K, Tanaka T, Sato Y, Morito T, et al. BAFF-R is expressed on B-cell lymphomas depending on their origin, and is related to proliferation index of nodal diffuse large B-cell lymphomas. Journal of clinical and experimental hematopathology : JCEH. 2010;50(2):121-7.
239. Jentsch-Ullrich K, Koenigsmann M, Mohren M, Franke A. Lymphocyte subsets' reference ranges in an age- and gender-balanced population of 100 healthy adults--a monocentric German study. Clin Immunol. 2005;116(2):192-7.
240. Lewis SM, Bain, B.J., Bates I. Dacie and Lewis Practical Haematology. 10th ed. Philadelphia: Churchill Livingstone; 2006.
241. Hu Y-L, Passegué E, Fong S, Largman C, Lawrence HJ. Evidence that the Pim1 kinase gene is a direct target of HOXA9. Blood. 2007;109(11):4732-8.
242. Mraz M, Pospisilova S. MicroRNAs in chronic lymphocytic leukemia: from causality to associations and back. Expert review of hematology. 2012;5(6):579-81.
243. Chen SS, Raval A, Johnson AJ, Hertlein E, Liu TH, Jin VX, et al. Epigenetic changes during disease progression in a murine model of human chronic lymphocytic leukemia. Proc Natl Acad Sci U S A. 2009;106(32):13433-8.
244. Rossi D, Ciardullo C, Gaidano G. Genetic aberrations of signaling pathways in lymphomagenesis: revelations from next generation sequencing studies. Semin Cancer Biol. 2013;23(6):422-30.
245. Islam S, Qi W, Morales C, Cooke L, Spier C, Weterings E, et al. Disruption of Aneuploidy and Senescence Induced by Aurora Inhibition Promotes Intrinsic Apoptosis in Double Hit or Double Expressor Diffuse Large B-cell Lymphomas. Mol Cancer Ther. 2017;16(10):2083-93.
246. Pham LV, Tamayo AT, Yoshimura LC, Lin-Lee YC, Ford RJ. Constitutive NF-kappaB and NFAT activation in aggressive B-cell lymphomas synergistically activates the CD154 gene and maintains lymphoma cell survival. Blood. 2005;106(12):3940-7.
247. Marafioti T, Pozzobon M, Hansmann ML, Ventura R, Pileri SA, Roberton H, et al. The NFATc1 transcription factor is widely expressed in white cells and translocates from the cytoplasm to the nucleus in a subset of human lymphomas. Br J Haematol. 2005;128(3):333-42.
248. Fu L, Lin-Lee YC, Pham LV, Tamayo A, Yoshimura L, Ford RJ. Constitutive NF-kappaB and NFAT activation leads to stimulation of the BLyS survival pathway in aggressive B-cell lymphomas. Blood. 2006;107(11):4540-8.
249. Kai X, Chellappa V, Donado C, Reyon D, Sekigami Y, Ataca D, et al. IkappaB kinase beta (IKBKB) mutations in lymphomas that constitutively activate canonical nuclear factor kappaB (NFkappaB) signaling. J Biol Chem. 2014;289(39):26960-72.


250. Ceribelli M, Kelly PN, Shaffer AL, Wright GW, Xiao W, Yang Y, et al. Blockade of oncogenic IkappaB kinase activity in diffuse large B-cell lymphoma by bromodomain and extraterminal domain protein inhibitors. Proc Natl Acad Sci U S A. 2014;111(31):11365-70.
251. O'Gorman A, Colleran A, Ryan A, Mann J, Egan LJ. Regulation of NF-kappaB responses by epigenetic suppression of IkappaBalpha expression in HCT116 intestinal epithelial cells. Am J Physiol Gastrointest Liver Physiol. 2010;299(1):G96-G105.
252. Maegawa S, Chinen Y, Shimura Y, Tanba K, Takimoto T, Mizuno Y, et al. Phosphoinositide-dependent protein kinase 1 is a potential novel therapeutic target in mantle cell lymphoma. Exp Hematol. 2017.
253. Chinen Y, Kuroda J, Shimura Y, Nagoshi H, Kiyota M, Yamamoto-Sugitani M, et al. Phosphoinositide protein kinase PDPK1 is a crucial cell signaling mediator in multiple myeloma. Cancer Res. 2014;74(24):7418-29.
254. Tatekawa S, Chinen Y, Ri M, Narita T, Shimura Y, Matsumura-Kimoto Y, et al. Epigenetic repression of miR-375 is the dominant mechanism for constitutive activation of the PDPK1/RPS6KA3 signalling axis in multiple myeloma. Br J Haematol. 2017;178(4):534-46.
255. Edwards SK, Desai A, Liu Y, Moore CR, Xie P. Expression and function of a novel isoform of Sox5 in malignant B cells. Leuk Res. 2014;38(3):393-401.
256. Webb CF, Bryant J, Popowski M, Allred L, Kim D, Harriss J, et al. The ARID family transcription factor bright is required for both hematopoietic stem cell and B lineage development. Mol Cell Biol. 2011;31(5):1041-53.
257. Kim PG, Canver MC, Rhee C, Ross SJ, Harriss JV, Tu HC, et al. Interferon-alpha signaling promotes embryonic HSC maturation. Blood. 2016;128(2):204-16.
258. Skibola CF, Berndt SI, Vijai J, Conde L, Wang Z, Yeager M, et al. Genome-wide association study identifies five susceptibility loci for follicular lymphoma outside the HLA region. Am J Hum Genet. 2014;95(4):462-71.
259. Conroy SM, Maskarinec G, Morimoto Y, Franke AA, Cooney RV, Wilkens LR, et al. Non-hodgkin lymphoma and circulating markers of inflammation and adiposity in a nested case-control study: the multiethnic cohort. Cancer Epidemiol Biomarkers Prev. 2013;22(3):337-47.
260. Rakyan VK, Down TA, Thorne NP, Flicek P, Kulesha E, Graf S, et al. An integrated resource for genome-wide identification and analysis of human tissue-specific differentially methylated regions (tDMRs). Genome research. 2008;18(9):1518-29.
261. Peters TJ, Buckley MJ, Statham AL, Pidsley R, Samaras K, R VL, et al. De novo identification of differentially methylated regions in the human genome. Epigenetics & chromatin. 2015;8:6.
262. Jaffe AE, Murakami P, Lee H, Leek JT, Fallin MD, Feinberg AP, et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies. International journal of epidemiology. 2012;41(1):200-9.
263. Istas G, Declerck K, Pudenz M, Szic KSV, Lendinez-Tortajada V, Leon-Latre M, et al. Identification of differentially methylated BRCA1 and CRISP2 DNA regions as blood surrogate markers for cardiovascular disease. Sci Rep. 2017;7(1):5120.


264. Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, et al. Track data hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics. 2014;30(7):1003-5.
265. Milner CM, Campbell RD. Genetic organization of the human MHC class III region. Front Biosci. 2001;6:D914-26.
266. Janeway CA, Travers, J.P., Walport, M., Shlomchik, M.J. Immunobiology. New York: Garland Science; 2001.
267. Wang L, Aakre JA, Jiang R, Marks RS, Wu Y, Chen J, et al. Methylation markers for small cell lung cancer in peripheral blood leukocyte DNA. Journal of thoracic oncology : official publication of the International Association for the Study of Lung Cancer. 2010;5(6):778-85.
268. Drenou B, Le Friec G, Bernard M, Pangault C, Grosset JM, Lamy T, et al. Major histocompatibility complex abnormalities in non-Hodgkin lymphomas. Br J Haematol. 2002;119(2):417-24.
269. Dictor M, Ek S, Sundberg M, Warenholt J, Gyorgy C, Sernbo S, et al. Strong lymphoid nuclear expression of SOX11 transcription factor defines lymphoblastic neoplasms, mantle cell lymphoma and Burkitt's lymphoma. Haematologica. 2009;94(11):1563-8.
270. Narurkar R, Alkayem M, Liu D. SOX11 is a biomarker for cyclin D1-negative mantle cell lymphoma. Biomark Res. 2016;4:6.
271. Gustavsson E, Sernbo S, Andersson E, Brennan DJ, Dictor M, Jerkeman M, et al. SOX11 expression correlates to promoter methylation and regulates tumor growth in hematopoietic malignancies. Mol Cancer. 2010;9:187.
272. Nordstrom L, Andersson E, Kuci V, Gustavsson E, Holm K, Ringner M, et al. DNA methylation and histone modifications regulate SOX11 expression in lymphoid and solid cancer cells. BMC Cancer. 2015;15:273.
273. Marttila S, Kananen L, Hayrynen S, Jylhava J, Nevalainen T, Hervonen A, et al. Ageing-associated changes in the human DNA methylome: genomic locations and effects on gene expression. BMC Genomics. 2015;16:179.
274. Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet. 2011;43.
275. Jaffe AE, Feinberg AP, Irizarry RA, Leek JT. Significance analysis and statistical dissection of variably methylated regions. Biostatistics. 2012;13.
276. Teschendorff AE, Widschwendter M. Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics. 2012;28(11):1487-94.
277. Phipson B, Oshlack A. DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging. Genome biology. 2014;15(9):1-16.
278. Issa JP. Epigenetic variation and cellular Darwinism. Nat Genet. 2011;43(8):724-6.
279. Teschendorff AE, Widschwendter M. Differential variability improves the identification of cancer risk markers in DNA methylation studies profiling precursor cancer lesions. Bioinformatics. 2012;28.
280. Bracken AP, Dietrich N, Pasini D, Hansen KH, Helin K. Genome-wide mapping of Polycomb target genes unravels their roles in cell fate transitions. Genes & development. 2006;20(9):1123-36.


281. Yang Y, Staudt LM. Protein ubiquitination in lymphoid malignancies. Immunol Rev. 2015;263(1):240-56.
282. Conway E, Healy E, Bracken AP. PRC2 mediated H3K27 methylations in cellular identity and cancer. Curr Opin Cell Biol. 2015;37:42-8.
283. Cedar H, Bergman Y. Linking DNA methylation and histone modification: patterns and paradigms. Nat Rev Genet. 2009;10(5):295-304.
284. Bryant C, Suen H, Brown R, Yang S, Favaloro J, Aklilu E, et al. Long-term survival in multiple myeloma is associated with a distinct immunological profile, which includes proliferative cytotoxic T-cell clones and a favourable Treg/Th17 balance. Blood cancer journal. 2013;3:e148.
285. Mpakou VE, Ioannidou HD, Konsta E, Vikentiou M, Spathis A, Kontsioti F, et al. Quantitative and qualitative analysis of regulatory T cells in B cell chronic lymphocytic leukemia. Leuk Res. 2017;60:74-81.


Minerva Access is the Institutional Repository of The University of Melbourne

Author/s: Wong Doo, Nicole

Title: DNA methylation marks in peripheral blood and the risk of developing mature B cell neoplasms

Date: 2018

Persistent Link: http://hdl.handle.net/11343/220501

File Description: DNA methylation marks in peripheral blood and the risk of developing mature B cell neoplasms

Terms and Conditions: Copyright in works deposited in Minerva Access is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only download, print and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works.