DNA methylation marks in peripheral blood and the risk of developing mature B cell neoplasms
Nicole Wong Doo
A thesis in fulfillment of the requirements for the degree of Doctor of Philosophy
School of Population and Global Health
The University of Melbourne
2018
ORCID ID: 0000-0003-3725-3397
Abstract
Dysregulation of DNA methylation is a feature of mature B cell neoplasms (MBCN), but it is not known whether methylation changes can be detected in blood-derived DNA prior to MBCN diagnosis. In this prospective cohort study, peripheral blood was collected from healthy participants at recruitment (1990-1994). Participants who were subsequently diagnosed with MBCN (chronic lymphocytic leukaemia, B cell non-Hodgkin lymphoma and myeloma) up to 2012 were matched to the same number of controls based on age, sex, ethnicity, and type of blood sample (Guthrie cards, mononuclear cells, buffy coats). DNA methylation was measured using the Infinium® HumanMethylation450 BeadChip. Peripheral blood DNA was collected from 438 matched case-control pairs, a median of 10.6 years prior to diagnosis with MBCN. A series of analytical approaches was used to evaluate whether there was a distinct methylation profile associated with MBCN. First, global methylation analysis was performed, identifying increased methylation in CpG island and promoter-associated CpGs and widespread hypomethylation. Second, conditional logistic regression was used to identify differentially methylated CpG sites (DMPs) and kernel smoothing was used to identify differentially methylated regions (DMRs). Third, differential methylation variability, considered to be a distinctive feature in cancer, was assessed. In total, 1,338 DMPs were identified, of which 90 had gain of methylation in CpG sites associated with homeobox genes and 1,248 had loss of methylation in CpG sites associated with MAPK signaling pathway genes and genes involved in chemokine signaling pathways. There were 9,857 DMRs, with a cluster of 151 DMRs located in a 3.8 kb region on 6p21.3, corresponding to the major histocompatibility locus. Differential methylation variability analysis identified 144 novel CpG sites distinctively located outside CpG islands.
Conclusion: Distinctive changes in peripheral blood DNA methylation can be detected many years prior to diagnosis with MBCN, suggesting that changes in DNA methylation are an early epigenetic event. This contributes to our understanding of the timing of methylation changes in the development of MBCN.
Declaration
This is to certify that: (i) the thesis comprises only my original work towards the PhD except where indicated, (ii) due acknowledgement has been made in the text to all other material used, (iii) the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliographies, appendices and footnotes.
Preface
(i) The work contained within this thesis was performed as a collaboration with Graham G. Giles, Dallas R. English and John L. Hopper at the School of Population and Global Health, University of Melbourne and the Cancer Epidemiology and Intelligence Division, Cancer Council Victoria, who established the Melbourne Collaborative Cohort Study (MCCS); Melissa C. Southey, JiHoon E. Joo and Ee Ming Wong at the Genetic Epidemiology Laboratory, University of Melbourne, who performed the DNA methylation assay; and Enes Makalic, Daniel F. Schmidt and Chol-Hee Jung, who performed the bioinformatics analysis. The component of the work which I contributed as original research was to assist in planning the nested study design, wholly planning and designing the statistical analyses, wholly interpreting the results and writing the manuscript. The regional methylation analysis was performed by JiHoon E. Joo, according to specifications outlined by me, and I interpreted the analysis in full. The differential variability analysis was performed by Dr Pierre Antoine-Dugué, and I interpreted the results.
(ii) No portion of this thesis has been submitted for other qualifications.
(iii) No portion of this thesis was carried out prior to enrolment in the degree.
(iv) No third party editorial assistance was provided in preparation of the thesis.
(v) For the publication included in this thesis, the roles of the authors are as follows: Enes Makalic – biostatistical analysis of array data; JiHoon E. Joo – DNA methylation assay; Claire M Vajdic – contribution to manuscript; Daniel F. Schmidt – biostatistical analysis of array data; Ee Ming Wong – DNA methylation assay; Gianluca Severi – contribution to study design and review of manuscript; Daniel J. Park – contribution to manuscript; Jessica Chung – contribution to developing bioinformatics pipeline; Laura Baglietto – contribution to study design and review of manuscript; Henry M. Prince – contribution to manuscript; John F. Seymour – contribution to manuscript; Constantine Tam – contribution to manuscript; John L. Hopper – contribution to study design; Dallas R. English – contribution to study design; Roger L. Milne – contribution to manuscript; Simon J. Harrison – contribution to manuscript; Melissa C. Southey – DNA methylation assay and contribution to study design; Graham G. Giles – contribution to study design and manuscript.
(vii) The MCCS was funded by VicHealth and Cancer Council Victoria. The MCCS was further supported by Australian NHMRC grants 209057, 251553 and 504711 and by infrastructure provided by Cancer Council Victoria.
Acknowledgements
Nicholas Brennan – for your patience and understanding
Ella & Luca – you embody the forces of nature and time
Elsa & Victor Wong Doo – a lifetime of support and encouragement
Colleagues at Concord Hospital – for your unquestioning support
Table of Contents
Abstract
Declaration
Preface
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Abbreviations
1 Introduction
2 Background
  2.1 Mature B cell neoplasms – Background
  2.2 DNA Methylation
  2.3 Differential methylation as a marker of cancer risk
  2.4 Measuring DNA methylation
  2.5 Measures of differential methylation
  2.6 Biological challenges in measuring DNA methylation
3 Study Design
  3.1 Melbourne Collaborative Cohort Study
  3.2 Nested Case-Control Study, participant selection
4 Methods
  4.1 DNA source and sample collection
  4.2 DNA Extraction and Bisulfite conversion
  4.3 DNA methylation measurement
  4.4 Data processing
  4.5 CpG site selection
  4.6 Assembly of Candidate Genes
5 Results
  5.1 Global DNA Methylation
  5.2 Differentially methylated positions
    Background
    Analysis
    Results
    Discussion
  5.3 Differentially methylated regions
    Background
    Analysis
    Results
    Discussion
  5.4 Differential methylation variability
    Background
    Analysis
    Results
    Discussion
6 Conclusions and Future Work
Appendices
References
List of Tables
Table 1: MBCN classification (WHO)
Table 2: Common structural chromosomal abnormalities thought to be primary events in MBCN
Table 3: Recurrent mutations in MBCN
Table 4: SNPs identified from published genome-wide association studies
Table 5: Putative tumour suppressor genes exhibiting promoter hypermethylation and reduced gene expression
Table 6: Summary of published studies reporting aberrant DNA methylation in CLL
Table 7: Summary of studies reporting an association between DNA methylation and prognosis
Table 8: Summary of published studies reporting aberrant DNA methylation in mantle cell lymphoma
Table 9: Summary of published studies reporting aberrant DNA methylation in follicular lymphoma
Table 10: Summary of published studies reporting an association between methylation and prognosis in follicular lymphoma
Table 11: Summary of published studies reporting aberrant DNA methylation in diffuse large B cell lymphoma
Table 12: Summary of studies reporting an association between DNA methylation and prognosis
Table 13: Summary of published studies reporting aberrant global DNA methylation in multiple myeloma
Table 14: Summary of published studies reporting aberrant DNA methylation at specific CpG sites in multiple myeloma
Table 15: Summary of studies reporting an association between DNA methylation and the clinical progression in later stages of myeloma
Table 16: Histological diagnoses and assigned tumour group
Table 17: Demographics of study population
Table 18: Significant KEGG pathways for genes containing one or more CpG sites with loss of methylation
Table 19: ANOVA comparisons of mean methylation difference in the four tumour subtypes
Table 20: ANOVA comparisons of mean methylation according to time lag between blood collection and diagnosis
Table 21: DMPs corresponding to genes described as showing aberrant methylation in literature review
Table 22: Effect of correcting conditional logistic regression analysis using different models of white blood cell content adjustment
Table 23: DMPs identified after adjustment for white blood cell content
Table 24: Low grade lymphoma histological types
Table 25: Methylation status of DMRs within genes known to be aberrantly methylated in MBCN
Table 26: DMRs containing a DMP identified following conditional logistic regression with adjustment for white blood cell content (p < 1.2 × 10⁻⁷)
Table 27: Top DMRs ranked by different methods
Table 28: Number of DMRs with mean and maximum methylation difference 2-4%
Table 29: KEGG pathways associated with genes demonstrating a loss of methylation within a DMR
Table 30: DMRs found after applying a threshold of mean methylation difference of >3%
Table 31: DMRs found after applying a threshold of maximum methylation difference of >4%
Table 32: Number of DMRs according to different pcomb value thresholds
Table 33: KEGG pathway analysis of DMRs, p < 1 × 10⁻¹⁵
Table 34: DMRs with Stouffer p < 1 × 10⁻²⁵ (λ=1000), ranked by magnitude of maximum methylation difference
Appendix Table 1: List of candidate genes identified as mutated in MBCN (mutation prevalence >4%)
Appendix Table 2: List of genes identified as aberrantly methylated in MBCN from literature review
Appendix Table 3: Full list of DMPs
Appendix Table 4: Differentially variable probes
List of Figures
Figure 1: Flow diagram of study participants
Figure 2: Strategy for selecting CpG sites to include in the final analysis
Figure 3: Comparison of different methods of correction for multiple testing demonstrating the stringency of the Bonferroni method
Figure 4: Mean methylation difference in all DMPs
Figure 5: Differential methylation in DMP cg04771285
Figure 6: Mean methylation difference of DMPs by CpG island location
Figure 7: Comparison of models before and after adjustment for lifestyle
Figure 8: Computed white blood cell proportions by DNA source
Figure 9: B cell proportions
Figure 10: Granulocyte proportions
Figure 11: Correlation between methylation and cell content
Figure 12: Correlation between B cell content and absolute DNA methylation for DMP cg06313775
Figure 13: Correlation between B cell content and DNA methylation for DMP cg13814485
Figure 14: Correlation between granulocyte content and absolute DNA methylation level for DMP cg12699321
Figure 15: DMPs in common between the unadjusted model and the model adjusted for white blood cell content
Figure 16: Principal component analysis of all 438 case-control pairs, demonstrating the group of outliers
Figure 17: Differential methylation by tumour subtype
Figure 18: Differential methylation in the 1,338 DMPs compared across the four time-lag groups (A & B)
Figure 19: Association between proportion of CpG island content of DMR and methylation
Figure 20: Association between proportion of promoter-associated CpGs within DMR and methylation
Figure 21: Chromosomal location of DMRs showing peak in chromosome 6p21.3
Figure 22: DMRs in chromosome 6
Figure 23: Association between properties of DMRs and concordance with methylation findings in literature
Figure 24: No relationship between magnitude of methylation difference and Stouffer p value
Figure 25: Comparison of DMRs identified using λ=1000 and λ=500
Figure 26: Differential methylation variability
Figure 27: Differentially variable positions demonstrated by chromosomal location
Figure 28: Plot of variance ratio and differential methylation
List of Abbreviations
ABC = activated B-cell
AID = activation-induced cytidine deaminase
AZA = azathioprine
BCR = B-cell receptor
BMI = body mass index
BTK = Bruton's tyrosine kinase
CHARM = comprehensive high-throughput arrays for relative methylation
CI = confidence interval
CLL = chronic lymphocytic leukaemia
COBRA = combined bisulfite restriction analysis
CpG = CpG dinucleotide
DLBCL = diffuse large B-cell lymphoma
DMP = differentially methylated position
DMR = differentially methylated region
DNA = deoxyribonucleic acid
DNMT = DNA methyltransferase
DVP = differentially variable position
ES = embryonic stem
FDR = false discovery rate
GCB = germinal centre B-cell
GWAS = genome-wide association studies
HELP = HpaII tiny fragment enrichment by ligation-mediated PCR
HIV = human immunodeficiency virus
HNSCC = head and neck squamous cell carcinoma
HR = hazard ratio
ICD-O-3 = International Classification of Diseases for Oncology, 3rd edition
Ig = immunoglobulin
IGH = immunoglobulin heavy chain
IGHV-M = mutated immunoglobulin heavy chain gene
IGHV-UM = unmutated immunoglobulin heavy chain gene
IL-7 = interleukin 7
KEGG = Kyoto Encyclopedia of Genes and Genomes
LPL = lymphoplasmacytic lymphoma
MALT = mucosa-associated lymphoid tissue
MAPK = mitogen-activated protein kinase
MBCN = mature B-cell neoplasm
MBL = monoclonal B-cell lymphocytosis
MCAM = methylated CpG island amplification microarray
MCCS = Melbourne Collaborative Cohort Study
MCL = mantle cell lymphoma
MGUS = monoclonal gammopathy of undetermined significance
MIRA = methylated CpG island recovery assay
MM = multiple myeloma
MSP = methylation-specific PCR
MVP = methylation variable position
NF-kB = nuclear factor kappa-light-chain-enhancer of activated B cells
NFAT = nuclear factor of activated T cells
NHL = non-Hodgkin lymphoma
OR = odds ratio
PCR = polymerase chain reaction
PRC2 = polycomb repressive complex 2
RAF = relative allele frequency
RNA = ribonucleic acid
RR = risk ratio
RRBS = reduced representation bisulfite sequencing
SEER = Surveillance, Epidemiology and End Results
SLL = small lymphocytic lymphoma
SNP = single nucleotide polymorphism
SWAN = subset-quantile within array normalisation
SYK = spleen tyrosine kinase
TSG = tumour suppressor gene
USA = United States of America
VMR = variably methylated region
WBC = white blood cell
WHO = World Health Organisation
WM = Waldenström's macroglobulinaemia
1 Introduction
Mature B cell neoplasms (MBCN) are a group of haematological malignancies classified histologically by the World Health Organisation into chronic lymphocytic leukaemia (CLL), non-Hodgkin lymphoma (NHL), including follicular lymphoma and diffuse large B cell lymphoma (DLBCL), and multiple myeloma (MM) (1). They account for about 6% of all cancers diagnosed in Australia (2) and the majority of haematological malignancies. Although overall survival from MBCN has improved over the last decade, they continue to be a significant source of morbidity, and in Australia they cause over 25,000 years of life to be lost prematurely each year (3).
The underlying aetiology of MBCN remains largely unexplained. While there is evidence for both environmental and genetic risk factors, the magnitude of any individual risk factor identified thus far is small, suggesting that other, as yet unidentified, risks may be in play. The pattern of familial risk in MBCN suggests that heritable risk factors are shared across different MBCN types. For instance, a family history of monoclonal gammopathy of undetermined significance (MGUS), a precursor of MM, is associated with increased risk of CLL (4).
Genetic events such as structural chromosomal changes and gene mutations are thought to occur both as initial genetic ‘hits’ in the pathogenesis of MBCN and later during disease progression. In B-NHL and CLL the most common locations of recurrent genetic events are in the immunoglobulin heavy chain (IGH) locus on chromosome 14, and in BCL2 (B cell lymphoma 2), MYC and NOTCH genes (5-7). In MM, primary or driver genetic events are found in the IGH locus and BCL6 and MYC genes (8). While some MBCN are virtually defined by a pathognomonic mutation, such as BRAF in hairy cell leukaemia and MYD88 in Waldenström’s macroglobulinaemia, the majority of MBCN are not characterised by a single genetic mutation and carry a diverse range of mutations occurring at low frequencies. Epidemiological studies examining the environmental and familial risk factors for MBCN have found modest associations (4, 9, 10). A number of genome-wide association studies have identified single nucleotide genetic changes associated with MBCN, revealing that a number of relatively common genetic variations may each contribute a small risk of MBCN (11-33).
Thus far, traditional epidemiological and genetic epidemiological studies have not established major risks associated with developing MBCN. A focus on epigenetic pathways in solid cancers and in haematological malignancies has thus evolved with our increasing understanding of epigenetic mechanisms and their role in gene transcription. A host of epigenetic pathways are dysregulated in MBCN, including DNA methylation (34-38), histone modification, chromatin remodelling and microRNAs (39). This thesis focuses on DNA methylation, the addition of a methyl group to DNA at CpG dinucleotides. In many instances the effect of CpG methylation is downstream transcriptional silencing and, together with histone modification, this is considered to be a major mechanism of gene regulation (40). DNA methylation is observed to change with increasing age and under the influence of environmental exposures such as high dietary folate intake and smoking, mediated by reversible enzymatic processes that can add or remove the methyl group. Methylation therefore represents an appealing potential link between environmental exposures and gene regulation in the pathogenesis of MBCN. Pharmacological agents that affect methylation are incorporated into treatment of MBCN in some instances, raising the possibility that novel methylation findings could contribute to clinically relevant discoveries in MBCN. Two distinct patterns, widespread DNA hypomethylation and promoter hypermethylation, are established features of MBCN, but it is not yet clear whether these are early or late epigenetic events, nor whether they drive malignant transformation or are passenger events. The traditional view of DNA methylation in MBCN is that it is a late and possibly secondary event (8). If methylation changes were identified in blood samples many years prior to diagnosis, it would support an alternative hypothesis that methylation is in fact an early step in MBCN pathogenesis. To date, only one study has evaluated peripheral blood methylation in pre-diagnostic blood from CLL subjects, and none has evaluated this in other MBCN types (41).
Identification of novel epigenetic risk factors has the potential to lead to new methods of early diagnosis through ‘liquid biopsies’ or minimally-invasive methods for monitoring disease activity. There is also the potential of uncovering novel mechanisms of disease that may become future therapeutic targets. The available technology for the evaluation of DNA methylation has improved in recent years with the advent of high-throughput array-based assays which measure methylation at single CpG sites. Until recently, the cost of measuring the methylation status of large areas of the genome was prohibitive. Using modern high-throughput DNA methylation assays, a ‘methylome’ has now been described for CLL, some NHL types and MM (42-47). This technology is well-suited for application to population-based studies evaluating DNA methylation levels in peripheral blood.
Hypothesis: DNA methylation changes present in peripheral blood could be detectable years before diagnosis with MBCN.
Aims:
• To describe global DNA methylation changes associated with MBCN
• To describe CpG site-specific changes in methylation associated with MBCN
• To use exploratory methods of identifying differential methylation, including measures of differential regional methylation and methylation variance
2 Background
2.1 Mature B cell neoplasms – Background
The process of B cell development, including the normal signaling pathways and genetic events, is described as background to the genetic mutations and epigenetic events in MBCN.
Normal B cell development
Normal B cells arise from the pluripotent haematopoietic stem cell precursor in the bone marrow and undergo maturation initially in the marrow and then in the lymph nodes. The primary physiological role of B lymphocytes is to recognise and react to the virtually unlimited number of potential antigens that may be encountered during a lifetime. This is achieved by developing a broad range of B cell clones through multiple genetic events called recombination, somatic hypermutation and class or isotype switching. This process, by which each clone expresses a unique B-cell receptor, allows B cells to fulfill their immunological role but puts them at risk of acquiring DNA errors.
The first event in recombination takes place in precursor B cells in the bone marrow, where they undergo rearrangement of the V, D and J segments of the immunoglobulin (Ig) heavy chain gene and rearrangement of the light chain gene. These rearrangements induce double-stranded DNA (dsDNA) breaks.
The immunoglobulin heavy chain (IGH) locus is located on the long arm of chromosome 14 at 14q32. The segments undergoing recombination cover a very large region; for instance, VH and DH segments can be separated by up to 2.5 Mb, suggesting that a process of locus contraction and expansion occurs in order to bring the IGH segments into contact with each other. Recombination is carried out by the lymphoid-specific recombination-activating gene 1 (RAG1) and RAG2 proteins (48). Initially, the method for recombination across such large regions of the genome was poorly understood. One of the generally accepted models is that recombination begins with the IGH locus in its usual expanded conformation once RAG expression is initiated. The process continues under the influence of early B cell factor gene (EBF) upregulation, which promotes contraction of the IGH locus under the influence of the Pax5 transcription factor (48). The RAG complex brings together two segments of the IGH locus and induces a double-stranded break in a random location. Subsequently, terminal deoxynucleotidyl transferase inserts nucleotides at the D-J and V-DJ junctions, thereby increasing the diversity of the antigen-receptor repertoire (49). Other essential factors for Ig rearrangement include interleukin-7 (IL-7) receptor signaling and phosphoinositide 3-kinase (PI3K) signaling, which regulate proliferation and survival of the pre-B cells. Recombination is subject to allelic exclusion, whereby the final B-cell receptor is expressed from only one allele at the heavy chain locus (one IGH allele) and light chain locus (one IgK or IgL allele). If recombination results in a non-functional B-cell receptor, the B cell undergoes apoptosis.
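As a purely illustrative aid, the combinatorial and junctional diversity described above can be sketched in a few lines of Python. The segment names and sequences are hypothetical toy values, not real IGH gene segments, and the random untemplated nucleotides merely stand in for terminal deoxynucleotidyl transferase activity. The sketch assumes D-J joining precedes V-DJ joining and ignores locus contraction, RAG targeting, light chain rearrangement and allelic exclusion.

```python
import random

# Hypothetical toy segments (assumption): real IGH segments are longer
# and far more numerous than the handful listed here.
V_SEGMENTS = {"V1": "ATGGCC", "V2": "ATGTTC", "V3": "ATGCAG"}
D_SEGMENTS = {"D1": "GGTAC", "D2": "GACTA"}
J_SEGMENTS = {"J1": "TTGGGC", "J2": "TACTGG"}


def random_insert(rng, max_len=4):
    """Return 0..max_len random nucleotides, standing in for TdT additions."""
    return "".join(rng.choice("ACGT") for _ in range(rng.randint(0, max_len)))


def recombine(rng):
    """Join one V, one D and one J segment: D-J first, then V-DJ."""
    v = rng.choice(sorted(V_SEGMENTS))
    d = rng.choice(sorted(D_SEGMENTS))
    j = rng.choice(sorted(J_SEGMENTS))
    dj = D_SEGMENTS[d] + random_insert(rng) + J_SEGMENTS[j]  # D-J junction
    vdj = V_SEGMENTS[v] + random_insert(rng) + dj            # V-DJ junction
    return {"v": v, "d": d, "j": j, "sequence": vdj}


def combinatorial_diversity():
    """Number of distinct V-D-J combinations before junctional insertions."""
    return len(V_SEGMENTS) * len(D_SEGMENTS) * len(J_SEGMENTS)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(recombine(rng))
    print("V-D-J combinations:", combinatorial_diversity())
```

Even with only three, two and two toy segments, segment choice alone gives twelve combinations, and the junctional insertions multiply the number of distinct sequences further. This is the sense in which recombination generates a broad receptor repertoire.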
The resultant immature B cells migrate from the bone marrow to the periphery, either lymph nodes or spleen. Upon encounter with antigen they are activated and migrate to the germinal centre of a lymph node. Within the germinal centre, the IGH locus undergoes further modification known as somatic hypermutation, during which additional mutations are promoted under the influence of activation-induced cytidine deaminase (AID).
A final AID-induced process of IGH modification called class switch recombination occurs in order to produce different immunoglobulin isotypes (from IgM/IgD to IgG or IgA) (49). These isotypes enable responses to different antigenic stimuli and generate varied cellular and humoral immune responses. AID induces double strand DNA breaks which are repaired by the mismatch repair pathway. However, during this repair process they can be mistakenly joined to double strand breaks occurring elsewhere in the genome. This can result in the recurrent chromosomal translocations characteristic of some MBCN, such as those described in Table 2.
It is not only the B cells themselves that mediate the maturation process. Cellular interactions between T follicular helper cells, follicular dendritic cells and B cells within the germinal centre of the lymph node are thought to be essential for somatic hypermutation and isotype switching (49). The resulting activated B cells further differentiate into either memory B cells or plasma cells.
Regulation of B cell proliferation
B cell proliferation and differentiation are linked to changes in the signaling roles of the B cell surface receptor. Early B cells express a pre-B cell receptor that favours B cell proliferation under the influence of interleukin-7 (IL-7). The surface membrane component of the mature B cell receptor is composed of CD79a and CD79b proteins. The role of the nearby surface immunoglobulin, as described above, is to recognise a broad repertoire of antigens to facilitate the appropriate immune response. The B cell receptor components aggregate with cell surface immunoglobulin, and subsequent antigen binding to the surface immunoglobulin leads to a conformational change in the B cell receptor and downstream pro-proliferative signaling. The enzymes spleen tyrosine kinase (SYK) and Bruton’s tyrosine kinase (BTK) are essential intermediate components linking surface proteins to more distal signaling via B cell linker protein (50). This cytoplasmic cell signaling cascade orchestrates the activation of three transcription factors that each drive pro-survival and proliferative effects: nuclear factor of activated T cells (NFAT), nuclear factor kappa-light-chain-enhancer of activated B cells (NF-kB) and activator protein 1 (AP1) (51). NFAT and NF-kB signaling is initiated by SYK-mediated BTK phosphorylation. In parallel, the activated B cell receptor activates MEK/ERK signaling, which also leads to DNA transcription.
Definition of MBCN
MBCN are classified histologically in the current WHO (World Health Organisation) Classification of Tumours of Haematopoietic and Lymphoid Tissues (2008) (1) as neoplasms arising from a B cell precursor after the stage of pro-B cell maturation (Table 1). The malignant cells of CLL and small lymphocytic lymphoma (SLL) are indistinguishable by histological and molecular phenotype and are differentiated clinically, with CLL occurring predominantly in the blood and bone marrow, while SLL occurs predominantly in lymph nodes.
Table 1: MBCN classification (WHO) (1)
Non-Hodgkin lymphoma (NHL)
- Aggressive / high grade NHL: Diffuse large B cell lymphoma, Burkitt lymphoma, primary mediastinal large B-cell lymphoma, intravascular large B-cell lymphoma, plasmablastic lymphoma
- Indolent / low grade NHL: Follicular NHL, small lymphocytic lymphoma, extranodal marginal zone lymphoma of mucosa-associated lymphoid tissue, nodal marginal zone lymphoma, lymphoplasmacytic lymphoma (or Waldenström’s macroglobulinaemia), mantle cell lymphoma
Chronic lymphocytic leukaemia
Hairy cell leukaemia
Plasma cell dyscrasias: Multiple myeloma, plasmacytoma
Molecular pathogenesis of MBCN
A description of the known molecular pathways in MBCN follows with, where possible, their proven or putative functional role, as background to the possible biological relevance of putative methylation changes.
I) Structural chromosomal abnormalities in MBCN
Balanced translocations, occurring when there is large-scale transfer of DNA from one chromosome to another, are a feature of B-NHL. Reciprocal translocations frequently involve the IGH gene and are thought to have an oncogenic role, whereby IGH regulatory elements are placed next to an oncogene partner, putting the oncogene under the influence of an IGH enhancer. The putative mechanism for the frequency of involvement of IGH in chromosomal translocations is loss of the usual DNA repair mechanisms required during normal immunoglobulin recombination, somatic hypermutation or isotype switching. Many partner genes have been identified, with some MBCN subtypes featuring a particular partner gene translocation as a characteristic abnormality. Translocations involving IGH occur early in pre/pro-B cells when the partner gene is BCL2 or CCND1 and later in germinal centre B cells where the partner gene is BCL6. Other chromosomal translocations occurring in germinal centre B cells occur between IGH and PAX5, IRF4, FOXP1, IRF8, EBF1 and TNFRSF13 (52).
In CLL/SLL, the timing of the initial genetic event that initiates malignant transformation is not clearly understood, but CLL/SLL can be divided into a type arising from a less mature pre-B cell with an unmutated IGH locus (IGHV-UM) and a type arising from a more mature post-germinal centre B cell with a mutated IGH locus (IGHV-M) (6). Deletions of 13q14 occur in 50-60% of cases and are often the sole cytogenetic abnormality, suggesting this is an early genetic event. Other deletions include 11q22-23 in 20% with associated loss of ATM and 17p13 in 10% with TP53 tumour suppressor gene inactivation. An extra copy of chromosome 12 is detected in 15% of cases. Recurrent balanced translocations are extremely rare in CLL/SLL, with t(14;18)(q32;q21) occurring in only 2% of cases, all of IGHV-M type.
Chromosomal translocations in MM include t(4;14), t(11;14) and t(14;16) (Table 2). Duplication of multiple chromosomes resulting in chromosomal hyperdiploidy occurs in 57% of MM. These translocations and hyperdiploid abnormalities are thought to be primary genetic events in MM as they are detectable in the MM precursor condition monoclonal gammopathy of undetermined significance (MGUS). Chromosomal abnormalities believed to be secondary events include chromosomal gains (1q, 12p, 17q), translocations (t(8;14) and other non-IGH translocations) and deletions [1p in 30% (CDKN2C, FAF1, FAM46C), 6q in 33%, 8p in 25%, 11q in 7% (BIRC2 and BIRC3), 13 in 45% (RB1 and DIS3), 14q in 38% (TRAF3), 16q in 35% (CYLD and WWOX) and 17p in 8% (TP53)].
Table 2: Common structural chromosomal abnormalities thought to be primary events in MBCN
Chromosomal translocation | Partner genes | Stage of B cell development | MBCN type | Mutation frequency
t(14;18)(q32;q21) | IGH, BCL2 | Pre/pro-B cell | Follicular lymphoma | 85%
 | | | GCB-DLBCL | 30%
 | | | CLL | 2%
 | | | MM | Infrequent
t(11;14)(q12;q32) | CCND1, IGH | Pre/pro-B cell | Mantle cell lymphoma | Ubiquitous
t(8;14)(q24;q32) | MYC, IGH | Germinal centre | Burkitt lymphoma | Ubiquitous
 | IGH, BCL6 | Germinal centre | DLBCL |
t(4;14) | FGFR3 & MMSET, IGH | | MM | 11%
t(11;14) | CCND1, IGH | | MM | 14%
t(14;16) | MAF, IGH | | MM | 3%
t(14;20) | IGH, MAFB | | MM | 1.5%
GCB = germinal centre B cell, DLBCL = diffuse large B-cell lymphoma
II) Mutations in MBCN
The mutation landscape in MBCN is highly heterogeneous (6, 8, 52). In CLL, a handful of mutations are present at frequencies of 10-15% while a large number of genes are mutated at lower frequency (2-5%). Quesada et al recently reported 60 recurrent mutations identified in 105 CLL samples after whole exome sequencing, with a median of 45 mutations per case (7). Whole exome sequencing in multiple myeloma reveals multiple detectable mutations present at diagnosis in 36 genes associated with the cell proliferation pathway mitogen-activated protein kinase (MAPK) and genes of likely pathogenetic interest, including KRAS and/or NRAS-activating mutations and BRAF mutations (53).
Table 3: Recurrent mutations in MBCN
MBCN type | Gene mutation
Follicular NHL | FAS mutations
 | TNFRSF14
 | BCL2 activation as a consequence of IGH-BCL2 with apoptosis resistance
 | KMT2D (MLL2) resulting in decreased histone methylation
 | CREBBP
 | EP300
 | Gain of function MEF2B mutations leading to BCL6 activation
 | HIST1H1(B-E): aberrant chromatin compaction
 | ARID1A mutation: nucleosome remodeling
 | TNFSF13C: encodes for BAFF (B cell activating factor), mutation leads to aberrant B cell signaling cascade
Germinal centre type DLBCL | BCL2 activation due to IGH-BCL2 translocation and apoptosis resistance
 | BCL6 autoregulatory domain mutations
 | Epigenetic: MEF2B/CREBBP/EP300
 | FOXO1 mutations
 | EZH2 mutations and increased H3K27me3
 | PTEN loss resulting in MYC up-regulation and constitutive PI3K/Akt signaling
 | GNA13/S1PR2/RHOA
Activated B-cell type DLBCL | MYD88 (L265P) and TNFAIP3 mutations resulting in NF-kB activation
 | CARD11, CD79A, CD79B mutations resulting in chronic active BCR signaling
 | BCL6 dysregulation
 | PRDM1/Blimp1 mutations/deletions
Burkitt lymphoma | MYC, CCND3 (activating) and DDX3X (inactivating): enhanced cellular proliferation and abnormal cell cycle control
 | Activating TCF3 and inactivating ID3 promoting BCR signaling
 | Inactivating ARID1A and SMARCA4 mutations leading to abnormal chromatin remodeling
Marginal zone lymphoma | Activating mutations of MYD88/CARD11 and inactivating mutations or deletions of KLF2, TNFAIP3, BIRC3, TRAF3, IKBKB leading to enhanced NF-kB activation
 | Activating NOTCH2 mutations leading to constitutive NOTCH activation. Inactivating mutations of NOTCH repressors SPEN and DTX1
 | KMT2D (MLL2), EP300 and ARID1A mutations leading to epigenetic abnormalities
Mantle cell lymphoma | CCND1 activating mutations due to IGH-CCND1 translocation leading to cell cycle dysregulation
 | Inactivation of RB1 (cell cycle)
 | Bi-allelic loss/dysfunction of ATM leading to loss of DNA damage repair
 | TP53 mutations
 | NOTCH1 and NOTCH2 mutations
 | WHSC1, KMT2D (MLL2) and MEF2B mutations leading to epigenetic dysregulation
CLL | Cell cycle and apoptosis: BRAF, PTPN11, KRAS, CCND2, ANKHD1, BAX, NRAS, CDKN1B, CDKN2A. Loss of RB1 can occur due to del13q14
 | NOTCH signaling: NOTCH1 (mutations in coding region in 10-12% at diagnosis, generally in IGHV-UM. 40% will also carry trisomy 12. Less commonly, mutations in the 3'UTR region of NOTCH1 leading to aberrant splicing events, found in 3%), FBXW7 (inactivating mutations found in this gene coding for a ubiquitin ligase), SPEN
 | RNA processing: SF3B1 (10-15%), DDX3X, MBA, ZNF292, XPO1, CNOT3, NXF1, RPS15, LUC7L2, SKIV2L2
 | DNA damage: ATM (9%, inactivation can occur due to del11q22-23 in addition to mutation of the remaining allele), TP53 (15%, can occur due to 17p deletion or inactivating mutations), POT1
 | Chromatin remodeling and transcription: CHD2, ZMYM3, SYNE1, MED12, FUBP1, SETD1A, ASXL1, ATRX, ARID1A, MED1, SETD2, BAZ2A, POLR3B, CREBBP, MLL2, HIST1H1B
 | BCR, NF-kB, TLR and B cell activation: PAX5, MYD88 (10%), KLHL6, EGR2, BIRC3 (deletion can occur in del11q), BCOR, NFKBIE, IRF4, IKZF3, TRAF3, TLR2, CD79A, NKAP, CD79B, IRAK1
 | As part of del13q14: DLEU2, the microRNA cluster MIR15A-MIR16-1 (putative role in regulation of BCL-2 expression), the DLEU1 lncRNA gene and sometimes DLEU7, which is a putative negative regulator of the NF-kB transcriptional complex
 | Tumour suppressor: LRP1B (4.8%)
Myeloma | Cell cycle abnormality: CDKN2C, RB1 (3%), CCND1 (3%), CDKN2A
 | Proliferation: NRAS (21%), KRAS (28%), BRAF (5%), MYC (1%)
 | Resistance to apoptosis: PI3K, AKT, MCL-1
 | NF-kB: TRAF3 (3%), CYLD (3%)
 | Abnormal localization and bone disease: DKK1, FRZB, DNAH5 (8%)
 | Abnormal plasma cell differentiation: XBP1 (3%), BLIMP1 (6%), IRF4 (5%)
 | Abnormal DNA repair: TP53 (6%), MRE11A (1%), PARP1
 | RNA editing: DIS3, FAM46C, LRRK2
 | KDM6A (UTX) (10%), MLL (1%), MMSET (8%), HOXA9, KDM6B
 | Recurrent chromosomal deletions resulting in gene deletion
Epidemiology of MBCN
Non-Hodgkin lymphoma is slightly more common in males than females and more common in white than black populations. There is a lower incidence of follicular lymphoma in China and Japan (49). A number of studies screening healthy participants for the presence of a serum paraprotein have reported an increased incidence of MGUS in Africans and African-Americans. An approximate three-fold increase in MGUS prevalence has been noted in African-Americans (54, 55). An American study using Surveillance, Epidemiology and End Results (SEER) data of 5,798 African-American and 28,939 white MM cases found a two-fold increased incidence of MM in African-Americans as well as a significantly lower age at diagnosis of MM for African-Americans compared with whites (65.8 compared with 69.8 years) (56).
Some autoimmune diseases and infections are strongly associated with B-NHL. A large international pooled case-control study of 17,471 NHL cases and 23,096 controls conducted by the International Lymphoma Epidemiology Consortium (Interlymph) investigated the risk of autoimmune disease and subsequent development of NHL (9). Extensive demographic, environmental and clinical information was available. Autoimmune diseases classified as ‘B-cell-activating’ (such as Sjögren’s syndrome and systemic lupus erythematosus) were associated with increased risk of marginal zone lymphoma (OR=5.46), Waldenström’s macroglobulinaemia (OR=2.61) and DLBCL (OR=2.45). Hepatitis C infection was associated with a 3.05-fold risk for Burkitt lymphoma, a 3.04-fold risk for marginal zone lymphoma, a 2.70-fold risk for Waldenström’s and a 2.33-fold risk for DLBCL. There is no association between Sjögren’s syndrome or systemic lupus erythematosus and MM risk (57).
Further support for the importance of an intact immune system in the pathogenesis of lymphoma comes from the strong association between two specific instances of acquired immunodeficiency and the development of NHL. Human Immunodeficiency Virus (HIV) infection and subsequent Acquired Immunodeficiency Syndrome (AIDS) is associated with substantially increased risk of DLBCL, Burkitt lymphoma and primary central nervous system lymphoma (26). The HIV-related immune deficiency is considered to be the most significant factor in the pathogenesis of lymphoma; however, other factors such as infection with oncogenic viruses (Epstein Barr virus and Kaposi sarcoma herpes virus), chronic antigen stimulation, abnormal cytokine production and a possible lymphomagenic role of HIV itself are cofactors (58).
A second demonstration of the importance of the immune system in the development of NHL is the phenomenon of post-transplant lymphoproliferative disorders, a group of B-NHL that occurs in the setting of severe acquired immunodeficiency following immune suppression for solid or haematopoietic stem cell transplantation (1).
With respect to infection in the aetiology of NHL, specific infections are implicated in B-NHL: Helicobacter pylori infection is associated with gastric marginal zone lymphoma (59) and Epstein Barr virus is strongly associated with Burkitt lymphoma and other B-NHLs (49).
In the past, due to a reliance on smaller case-control studies, evidence for environmental exposures as risk factors for MBCN has been inconsistent and weak. The international Interlymph consortium, combining a number of case-control studies in a pooled analysis, reported that associations between lifestyle factors or occupation and B-NHL were weaker than those for medical history or family history (9). There was an inverse association between alcohol consumption and B-NHL, which has been observed in previous studies. Increased duration of smoking was associated with a modest, statistically significant increased risk for some B-NHL types (a 1.5-fold increased risk for Waldenström’s, 1.27-fold for marginal zone lymphoma, 1.19-fold for follicular lymphoma and 1.24-fold for mantle cell lymphoma). There was no association observed between smoking and DLBCL or Burkitt lymphoma risk. There was an observed association between recreational sun exposure and a reduced risk of NHL (OR=0.74 per increasing quartile of hours of sun exposure per week), which is in keeping with the literature suggesting low vitamin D levels are associated with risk of NHL (60-62).
The only occupations associated with increased risk of NHL in the Interlymph study were those of painter and farm worker. Work as a painter was associated with an increased risk of Burkitt lymphoma (OR=2.28), and occupation as a general farm worker was associated with a modest increased risk of all NHL (OR=1.28). Occupation as a teacher was associated with a reduced risk of Waldenström’s, marginal zone lymphoma and Burkitt lymphoma. Higher socioeconomic status was associated with a reduced risk of NHL (OR=0.88 per increasing tertile of socioeconomic category). Other putative NHL risk factors that were evaluated without evidence of association were exposure to hair dye and hormonal / reproductive factors.
There is strong evidence from a number of studies for the association between obesity and MBCN risk. In the Interlymph consortium study, there was an association between adult body mass index (BMI) and DLBCL (OR=1.32 per each increasing WHO category of BMI) but not for other subtypes. In comparison, there was an association between body mass index as a young adult and NHL with no evidence of heterogeneity between NHL subtypes (OR=1.95 per increasing category of BMI). Meta-analyses comparing individuals of normal weight with those overweight or obese report the risk of MM and NHL to be significantly increased (63, 64). An increase in BMI of 5 kg/m2 was associated with a modest increase in overall NHL risk (RR 1.07, 95%CI 1.04-1.10) and in DLBCL risk (RR 1.13, 95%CI 1.02-1.13) (64). Analysis of SEER data also suggests obesity (BMI ≥30 kg/m2) is associated with shorter overall survival from NHL, compared with non-obese (HR 1.32, 1.02-1.70) (65).
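As an illustration of how a per-increment estimate such as the RR of 1.07 per 5 kg/m2 might be rescaled to other BMI differences, the following is a minimal sketch. It assumes a log-linear dose-response relationship, which is a common modelling convention and not something the cited meta-analysis is confirmed here to have tested; the function name scale_relative_risk is hypothetical.

```python
import math


def scale_relative_risk(rr_per_step: float, step: float, delta: float) -> float:
    """Rescale a relative risk reported per `step` units to a change of `delta` units.

    Assumes log(RR) is linear in the exposure (an illustrative assumption).
    """
    if rr_per_step <= 0 or step <= 0:
        raise ValueError("rr_per_step and step must be positive")
    # Under log-linearity, RR(delta) = RR(step) ** (delta / step).
    return math.exp(math.log(rr_per_step) * delta / step)


# RR of 1.07 per 5 kg/m2 rescaled to a 10 kg/m2 difference (about 1.14).
rr_10 = scale_relative_risk(1.07, 5.0, 10.0)
print(round(rr_10, 2))
```

Under this assumption the estimate compounds multiplicatively, so a 10 kg/m2 difference corresponds to the square of the per-5 kg/m2 estimate rather than to twice the excess risk.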
For MM, lifestyle and environmental factors associated with risk have been investigated using case-control studies, prospective cohort studies and meta-analyses. Environmental factors associated with increased risk include smoking (66) and occupation (including farming, longer term exposure to pesticides and occupational exposure to benzene or petroleum products) (57). A recent meta-analysis reports alcohol consumption to have a small protective effect (pooled RR 0.88, 95% CI: 0.79, 0.99) (57). Obesity also appears to be positively associated with risk of MM (57). BMI assessed in a prospective cohort study demonstrated that increasing BMI was associated with a modest increased risk of MM both at study entry (HR = 1.10, 95% CI: 1.00, 1.22 per 5 kg/m2) and at age 50 years (HR = 1.14, 95% CI: 1.02, 1.28) (67). A meta-analysis of 16 prospective studies of obesity and NHL risk confirmed a small but statistically significant increased risk of NHL with each 5 kg/m2 increment (RR 1.07, 95% CI 1.04-1.10).
Overall, there is a lack of appropriate prospective studies with which to validate the findings of retrospective case-control studies.
Familial predisposition to MBCN
Twin studies evaluate the concordance of a disease in monozygotic twins, who share all of their genes, and dizygotic twins, who share a proportion of their genes. If concordance is higher in monozygotic twins, this provides evidence for a genetic component. A large study of leukaemia in 44,788 twins from Scandinavia found an excess in monozygotic twins, yielding an estimated heritability of 20% (68). While all leukaemias were included in this study, the results are considered to be largely attributable to CLL as there is minimal evidence for familial clustering in the other leukaemias: acute lymphoblastic leukaemia or acute myeloid leukaemia.
Familial clustering of lymphoid haematological malignancies has been noted by numerous case reports and small population-based case-control studies (69, 70) demonstrating that individuals within a family are at increased risk of developing different subtypes of MBCN. In the large Interlymph study, the prevalence of a family history of NHL in cases was compared with that of controls. For those with a first-degree relative with NHL, there was a 1.8-fold increased risk of NHL compared with those with no family history (9). The study controlled for an extensive range of putative environmental and medical risk factors for NHL, leading the authors to suggest that the association between family history and increased NHL risk is due to shared genetics rather than shared environmental factors (24).
More recent analysis of large family cancer registry data has enabled an estimation of the risk of lymphoid malignancy when a first-degree relative is affected; the larger numbers in such studies and unselected case identification minimise the biases to which small case-control studies are prone. A large population-based case-control study using cancer registry data from Sweden and Denmark evaluated 8,974 first-degree relatives of 2,517 DLBCL cases and 10,188 relatives of 2,668 follicular NHL cases for the prevalence of NHL compared with that in relatives of matched controls. First-degree relatives of DLBCL cases had a 9.8-fold increased risk of DLBCL and first-degree relatives of follicular NHL cases had a 4-fold increased risk of follicular NHL (71). In 2009, Landgren et al (37) reported a large population-based case-control study of 4,458 MGUS cases and 14,621 first-degree relatives (compared with 17,505 controls and 58,387 first-degree relatives of controls). Relatives of MGUS patients had a 4.0-fold (95% CI 1.5-11.0) increased risk of Waldenström’s, while relatives of IgM MGUS patients had a 5.0-fold (95% CI 1.1-23) increased risk of CLL (4). A case-control study of Waldenström’s published in 2008 demonstrated 3.0-fold (95% CI 2.0-4.4), 3.4-fold (95% CI 1.7-6.6) and 5.0-fold (95% CI 1.3-18.9) increased relative risks of developing non-Hodgkin lymphoma, CLL and MGUS, respectively (72). These two landmark studies add further evidence of shared heritable susceptibility pathways that predispose to MGUS, Waldenström’s and other MBCNs.
For MM, the largest population-based study evaluated 14,621 relatives of 4,458 MGUS cases compared with relatives of matched controls, finding that a family history of MGUS was associated with a 2.8-fold risk of developing MGUS (4). Results have been confirmed by other studies in populations of European ancestry (73) and also by smaller studies in African-American populations (74). A large population-based study using cancer registry-identified cases pooled from USA, Canada and Europe evaluated 2,843 MM cases and 11,470 controls. The presence of a first-degree family history of any lympho-haematopoietic malignancy was positively associated with the risk of MM (OR 1.29, 95% CI: 1.08-1.55) (75). MM risk was positively associated with having a first-degree relative with MM (OR 1.90, 95% CI: 1.27-2.88). There were differences by ethnicity; for African-American participants, there was a very strong association between MM risk and having a first-degree relative with MM (OR 5.52, 95% CI: 1.87-16.28).
Genetic risk and MBCN
Genome-wide association studies (GWAS) survey single nucleotide polymorphisms (SNPs) of relatively common population frequency (>5%) across the whole genome to identify genetic markers associated with disease.
Overall, 81 SNPs have been identified from GWAS of MBCN and are summarised in Table 4. Few candidate gene loci have been replicated in separate studies. As the loci are common by definition (relative allele frequency [RAF]>5%) and have small effect sizes, the findings of GWAS support the hypothesis that multiple genetic modifications and non-genetic risk factors are necessary for NHL genesis. There are clusters of SNPs at 6p21.32 (an immune regulatory region known as the human leucocyte antigen [HLA] system) and 8q24 (near the MYC gene), with other SNPs scattered across the genome. Targeted studies of genes encoding the B cell activating factor protein BAFF also suggest an association between mutations in TNFRSF13C/BAFF-R and TNFSF13B and CLL risk (76).
Table 4: SNPs identified from published genome-wide association studies
Disease | SNP | Chm.loc | RAF | Nearest gene | p (combined) | OR (combined) | Refs
CLL | rs17483466 | 2q13 | 0.21 | ACOXL, BCL2L11 | 2.36E-10 | 1.39 | (11) (12)
CLL | rs13397985 | 2q37.1 | 0.19 | SP140, SP110 | 5.40E-10 | 1.41 | (11) (13) (12)
CLL | rs3770745 | 2p22.2 | 0.22 | QPCT, PRKD3 | 1.68E-08 | 1.24 | (15)
CLL | rs13401811 | 2q13 | 0.81 | ACOXL, BCL2L11 | 2.08E-18 | 1.41 | (15)
CLL | rs9308731 | 2q13 | 0.541 | BCL2L11 | 1.00E-11 | 1.19 | (14)
CLL | rs3769825 | 2q33.1 | 0.45 | CASP10/CASP8 | 2.50E-09 | 1.22 | (15)
CLL | rs757978 | 2q37.3 | | FARP2 | 2.11E-09 | 1.39 | (16) (12)
CLL | rs9880772 | 3p24.1 | 0.465 | EOMES | 2.55E-11 | 1.17 | (14)
CLL | rs10936599 | 3q26.2 | | MYNN | 1.74E-10 | 1.26 | (17)
CLL | rs9815073 | 3q28 | 0.651 | LPP | 3.62E-08 | 1.2 | (14)
CLL | rs898518 | 4q25 | 0.59 | LEF1 | 4.24E-10 | 1.2 | (15)
CLL | rs6858698 | 4q26 | | CAMK2D | 3.07E-10 | 1.31 | (17)
CLL | rs872071 | 6p25.3 | 0.54 | IRF4 | 1.91E-20 | 1.54 | (11) (13) (12)
CLL | rs210134 | 6p21.33 | | BAK1 | 9.47E-16 | 1.35 | (19) (18)
CLL | rs210142 | 6p21.33 | | BAK1 | 1.03E-12 | 1.47 | (19) (18)
CLL | rs73718779 | 6p25.2 | 0.11 | SERPINB6 | 1.97E-08 | 1.27 | (15)
CLL | rs2236256 | 6q25.2 | | IPCEF1 | 1.50E-10 | 1.23 | (17)
CLL | rs17246404 | 7q31.33 | | POT1 | 3.40E-08 | 1.22 | (17)
CLL | rs2456449 | 8q24.21 | | - | 7.84E-10 | 1.26 | (16) (12)
CLL | rs1679013 | 9p21.3 | 0.52 | CDKN2B-AS1 | 1.27E-08 | 1.11 | (15)
CLL | rs4406737 | 10q23.31 | 0.57 | ACTA2/FAS | 1.22E-14 | 1.27 | (15)
CLL | rs735665 | 11q24.1 | 0.21 | GRAMD1B | 3.78E-12 | 1.45 | (11) (20)
CLL | rs7944004 | 11p15.5 | 0.49 | C11orf21, TSPAN32 | 2.15E-10 | 1.27 | (14)
CLL | rs7176508 | 15q23 | 0.37 | - | 4.54E-12 | 1.37 | (11)
CLL | rs8024033 | 15q15.1 | 0.51 | BMF | 2.71E-10 | 1.22 | (15)
CLL | rs7169431 | 15q21.3 | | - | 4.74E-07 | 1.36 | (16)
CLL | rs783540 | 15q25.2 | | CPEB1 | 1.10E-07 | 1.17 | (13)
CLL | rs305077 | 16q24.1 | 0.37 | IRF8 | 3.37E-08 | 0.66 | (19)
CLL | rs391525 | 16q24.1 | 0.37 | IRF8 | 3.16E-09 | 0.64 | (19)
CLL | rs2292982 | 16q24.1 | 0.37 | IRF8 | 6.48E-09 | 0.65 | (19)
CLL | rs2292980 | 16q24.1 | 0.37 | IRF8 | 1.89E-08 | 0.68 | (19)
CLL | rs305061 | 16q24.1 | | IRF8 | 3.60E-07 | 1.22 | (16)
CLL | rs4368253 | 18q21.32 | 0.69 | PMAIP1 | 2.51E-08 | 1.18 | (15)
CLL | rs4987852 | 18q21.33 | 0.06 | BCL2 | 7.76E-11 | 1.41 | (15)
CLL | rs4987855 | 18q21.33 | 0.91 | BCL2 | 2.66E-12 | 1.47 | (15)
CLL | rs11083846 | 19q13.32 | 0.22 | PRKD2, STRN4 | 3.96E-09 | 1.35 | (11)
DLBCL | rs79480871 | 2p23.3 | 0.076 | NCOA1 | 4.23E-08 | 1.35 | (21)
DLBCL | rs10484561 | 6p21.32 | | HLA-DQB1, HLA-DQA1 | 1.00E-07 | 1.36 | (22)
DLBCL | rs2647012 | 6p21.32 | | HLA-DQB1 | | 1.28 | 
DLBCL | rs2523607 | 6p21.33 | 0.12 | HLA-B | 2.40E-10 | 1.45 | (21) (23)
DLBCL | rs872071 | 6p25.3 | | IRF4 | | 1.2 | 
DLBCL | rs116446171 | 6p25.3 | 0.019 | EXOC2 | 2.33E-21 | 2.26 | (21) (23)
DLBCL | rs13255292 | 8q24.21 | 0.321 | PVT1 | 1.15E-13 | 1.19 | (21) (23)
DLBCL | rs4733601 | 8q24.21 | 0.477 | PVT1 | 3.63E-11 | 1.45 | (21)
DLBCL | rs7097 | 13q12 | | LNX2 | 6.57E-07 | 1.42 | (24)
DLBCL | rs751837 | 14q32 | | CDC42BPB | 3.30E-07 | 3.5 | (24)
Follicular | rs2647012 | 6p21.32 | | - | 2.00E-21 | 0.64 | (22)
Follicular | rs4530903 | 6p21.32 | | HLA-DRB5, HLA-DQA1 | 2.69E-12 | 1.93 | (25)
Follicular | rs7755224 | 6p21.32 | | - | 5.18E-08 | 1.95 | (20)
Follicular | rs10484561 | 6p21.32 | | HLA-DQB1, HLA-DQA1 | 5.18E-08 | 2.07 | (20) (22)
Follicular | rs2647046 | 6p21.32 | | HLA-DQB1, HLA-DQA2 | 3.77E-10 | 0.59 | (25)
Follicular | rs6457327 | 6p21.33 | | STG | 4.00E-11 | 0.59 | (26)
Follicular | rs9268853 | 6p23 | | - | 2.48E-10 | 1.56 | (25)
Mixed NHL | rs6773854 | 3q27 | | - | 3.36E-13 | 1.44 | (27)
Mixed NHL | rs12289961 | 11q12.0 | | - | 3.89E-08 | 1.29 | 
Myeloma | rs1052501 | 3p22.1 | 0.2 | ULK4 | 7.47E-09 | 1.32 | (28) (29)
Myeloma | rs10936599 | 3q26.2 | 0.75 | MYNN | 8.70E-14 | 1.26 | (30) (29)
Myeloma | rs56219066 | 5q31 | 0.73 | ELL2 | 2.20E-10 | 1.24 | (31) (29)
Myeloma | rs2285803 | 6p21.33 | 0.32 | PSORS1C1 | 9.67E-11 | 1.19 | (30) (29)
Myeloma | rs34229995 | 6p22.3 | 0.029 | JARID2 | 1.30E-08 | 1.37 | (29)
Myeloma | rs9372120 | 6q21 | 0.218 | ATG5 | 9.09E-15 | 1.18 | (29)
Myeloma | rs4487645 | 7p15.3 | 0.71 | DNAH1 | 3.33E-15 | 1.38 | (28) (32)
Myeloma | rs7781265 | 7q36.1 | 0.125 | SMARCD3 | 9.71E-09 | 1.19 | (29)
Myeloma | rs1948915 | 8q24.21 | 0.345 | CCAT1 | 4.20E-11 | 1.13 | (29)
Myeloma | rs2811710 | 9p21.3 | 0.657 | CDKN2A | 1.72E-13 | 1.15 | (29)
Myeloma | rs2790457 | 10p12.1 | 0.739 | WAC | 1.77E-08 | 1.12 | (29)
Myeloma | rs7193541 | 16q23.1 | 0.585 | RFWD3 | 5.00E-12 | 1.13 | (29)
Myeloma | rs4273077 | 17p11.2 | 0.12 | TNFRSF13B | 7.67E-09 | 1.26 | (30) (29)
Myeloma | rs6066835 | 20q13.13 | 0.083 | PREX1 | 1.36E-13 | 1.26 | (29)
Myeloma | rs877529 | 22q13.1 | 0.44 | - | 7.63E-16 | 1.23 | (30) (29)
Myeloma | rs603965 | 11q13.3 | | CCND1 | 7.96E-11 | 1.82 | (33) (29)
SNP = single nucleotide polymorphism, RAF = relative allele frequency
2.2 DNA Methylation
Epigenetic mechanisms
The term ‘epigenetics’ was coined in the 1940s, meaning ‘over’ or ‘upon’ genetics. The modern description of epigenetics refers to mitotically and/or meiotically heritable changes in gene expression without alteration to the DNA sequence (39). Epigenetic modifications include DNA methylation, histone modification, chromatin remodeling and noncoding RNAs (39).
Regulation of DNA Methylation in normal epigenome
DNA methylation involves the addition of a methyl group at the 5 position of the cytosine ring after DNA replication, at positions where cytosine is adjacent to guanine, known as CG dinucleotides or CpGs. The genomic location of CpGs and their methylation pattern provide insight into the regulation and function of CpG methylation. The majority of the genome is observed to be CpG-poor, containing only approximately 21% of the CpG dinucleotides that are statistically expected. A CpG-poor genome is common amongst vertebrates and is explained by the observation that methylated cytosine has a propensity to be deaminated by 5-methylcytosine deaminase, forming thymine in its place (40).
The majority of CpGs actually reside within CpG islands of 0.5-4kb in length in which the proportion of observed CpGs is higher than that across the rest of the genome (77). Other descriptions of CpG locations include ‘shores’, ‘shelves’ and ‘open sea’ locations, which reference the proximity of the CpG to a defined CpG island.
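The ratio of observed to expected CpG dinucleotides can be computed directly from a sequence. The sketch below assumes the commonly used definition in which the expected count is (number of C x number of G) / sequence length; the source does not state which definition underlies the 21% figure, and the function name and example sequences are illustrative only.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return the observed/expected CpG ratio of a DNA sequence.

    Expected CpG count is taken as (count of C * count of G) / length,
    an assumed definition used here for illustration.
    """
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    if c == 0 or g == 0:
        # No CpG can occur, and the expected count would be zero.
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n
    return observed / expected


print(cpg_observed_expected("CGCGCGCG"))  # CpG-rich toy sequence: 2.0
print(cpg_observed_expected("CATGCATG"))  # C and G present but no CpG: 0.0
```

On this definition a ratio near 1 indicates that CpGs occur about as often as base composition predicts, while a genome-wide value of roughly 0.2 corresponds to the CpG depletion described in the text.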
While sporadically located CpGs are generally methylated, CpGs located within CpG islands are virtually always unmethylated in normal cells. These unmethylated, CpG-rich islands are frequently associated with gene bodies and promoter-associated regions. The mechanisms that protect CpG islands from methylation are unclear, but it is known that methylation of CpG islands is possible and occurs during normal female X chromosome inactivation. Functionally, X chromosome inactivation via DNA methylation is associated with long term stabilization of a repressed state. In cancer cells, promoter-associated CpG island methylation is a common feature (78, 79).
The genomic location of CpG Islands is a clue to their functional importance. Approximately 70% of gene promoters are associated with a CpG Island, including housekeeping genes, tissue-specific genes and developmental regulator genes (80) (77). Conversely, about half of CpG Islands occur within traditionally-defined promoter regions of annotated genes (78).
DNA methylation is catalysed by DNA methyltransferases (DNMTs). The addition of a methyl group results in a change to binding characteristics of the transcription assembly that in some circumstances leads to downstream gene silencing. DNMT3A and DNMT3B target previously unmethylated CpGs, resulting in de novo methylation, while DNMT1 maintains existing methylation patterns following DNA replication. De novo methylation catalysed by DNMT3A and DNMT3B is an infrequent event, occurring during X chromosome inactivation and at imprinted genes (77); maintenance by DNMT1 is the more ubiquitous mechanism. There are also active processes of demethylation, the best described being mediated by the ten-eleven translocation (TET) enzymes (TET1, TET2 and TET3), which convert 5-methylcytosine to 5-hydroxymethylcytosine (81-83). The TET enzymes are thought to be important in maintaining CpG island protection from methylation. Disruption of TET protein activity is associated with aberrant methylation. Other enzymes involved in active and passive removal of methyl groups are activation-induced cytidine deaminase (AID) and thymine DNA glycosylase (TDG).
DNA Methylation and gene silencing
The settings in which DNA methylation may result in gene silencing remain unclear and a number of different mechanisms for this have been proposed.
DNA methylation changes DNA binding conformation. In this model, DNA methylation may directly inhibit transcription factors from binding to their cognate DNA sequence.
MBD-mediated transcriptional silencing. Methylated CpG sites attract the binding of proteins that recruit chromatin-modifying activities. Four such proteins, MBD1, MBD2, MBD3 and MeCP2, have been implicated in methylation-dependent transcriptional repression, but further specific mechanisms have not been shown (77).
DNA methylation targets transcriptionally inactive genes. In some instances, DNA methylation targets genes that are already transcriptionally repressed, and acts to cause irreversible silencing. Methylation of the Xist gene after it has been inactivated by non-coding chromosomal RNA is described during X chromosome inactivation (77). The active X chromosome, in contrast, is protected from DNA methylation.
DNA Methylation and Cancer
Abnormal patterns of DNA methylation, both loss of normal CpG methylation and gain of methylation in CpG islands, have been recognised as a hallmark of cancer for many years (78).
(I) Gain of methylation and associated gene silencing in cancer
In cancer cells, hypermethylation of promoter-associated CpG islands and associated gene silencing has been described in a number of putative and established tumour suppressor genes. An epigenetic hypothesis of cancer pathogenesis suggests that methylation-associated tumour suppressor gene silencing could be one of the two genetic ‘hits’ required to silence both copies of the gene (78). The large number of established or putative tumour suppressor genes silenced by promoter hypermethylation in different malignancies indicates the prevalence of this mechanism in cancer pathogenesis. The summary below was compiled from two review articles on promoter hypermethylation (78, 84) and a literature search of promoter hypermethylation with associated reduced gene expression in MBCN. The resulting gene list was cross-checked with TSGene 2.0, a database of tumour suppressor genes (85).
Table 5: Putative tumour suppressor genes exhibiting promoter hypermethylation and reduced gene expression
Gene | Function | Tumour
AHR | Transcription factor | MCL (86)
APC | Inhibitor of β-catenin | Aerodigestive tract (84)
AR | Androgen receptor | Prostate, MBCN (84, 87)
BRCA1 | DNA repair, transcription | Breast, ovary (84)
CDH1 | E-cadherin cell adhesion | Breast, stomach, leukaemia, NHL (36, 78, 86, 88)
CDH13 | H-cadherin, cell adhesion | Breast, lung (84)
COX-2 | Cyclo-oxygenase 2 | Colon, stomach (84)
CYB5R2 | Cell proliferation inhibition | TET2-mutated DLBCL (43)
DAPK | Pro-apoptotic | Lymphoma, lung, colon, MM (36, 89-94)
DKK1 | Transcriptional target of P53 | MM (95, 96)
EGR1 | Transcription factor and regulator of TP53 | Follicular NHL (97)
ER | Oestrogen receptor | Breast (84)
EXT1 | Heparan sulphate synthesis | Leukaemia, skin (84)
FAT | Cadherin, tumour suppressor | Colon (84)
FOXA1 | Represses cell proliferation and migration | CLL, follicular (97, 98)
FOXA2/HNF3B | Transcription factor | CLL (98)
GATA-4 | Transcription factor | Colon, stomach (84)
GATA-5 | Transcription factor | Multiple types (84)
GPX3 | Detoxifies reactive oxidative species | MM (99)
GSTP1 | Conjugation to glutathione | Prostate, breast, kidney (84)
HIC-1 | Transcription factor | Multiple (84)
hMLH1 | DNA mismatch repair | Colon, endometrium, stomach (84)
HOXA9 | Homeobox protein | Neuroblastoma (84) (46) (100)
IGFBP3 | Growth factor-binding protein | Lung, skin (84)
IKZF1 | T cell differentiation | DLBCL
IRX1 | Targets anti-angiogenesis genes | CLL, gastric (84)
KLF4 | Cell cycle inhibitor | Follicular NHL
JDP2 | Transcription regulator | DLBCL
LKB1/STK11 | Serine/threonine kinase | Colon, breast, lung (84)
MGMT | DNA repair of O6-alkyl-guanine | Multiple (84)
MIR129-2 | Promotion of apoptosis | MM
MIR34B/C | A transcriptional target of TP53; silences B-cell translocation gene 4 (BTG4) | Colon, MM (84)
NORE1A | Ras effector homologue | Lung (84)
P14/ARF | MDM2 inhibitor | Colon, stomach, kidney (84)
P15/CDKN2B | Cyclin-dependent kinase inhibitor | Leukaemia, MM, lymphoma
P16/CDKN2A | Cyclin-dependent kinase inhibitor | Multiple (84)
P73 | P53 homologue | Lymphoma
PR | Progesterone receptor | Breast (84)
PRLR | Prolactin receptor | Breast (84)
RARB2 | Retinoic acid receptor β2 | Colon, lung, head and neck (84)
RASSF1A | Ras effector homologue | Multiple (84)
RB | Cell cycle inhibitor | Retinoblastoma (84)
RBP1 | Morphogenesis and cellular proliferation | MM
RIZ1 | Histone/protein methyltransferase | Breast, liver (84)
SFRP1 | Secreted Frizzled-related protein 1 | Colon (84)
SOCS-1 | Inhibitor of JAK/STAT pathway | Liver, myeloma (84)
SOCS-3 | Inhibitor of JAK/STAT pathway | Lung (84)
SOX11 | Transcription factor | CLL
SPARC | Haematopoiesis | MM
SRBC | BRCA-1 binding protein | Breast, lung (84)
SYK | Tyrosine kinase | Breast (84)
TGFBI | Cell cycle inhibitor, can also be tumour promoter | MM
THBS-1 | Thrombospondin-1, anti-angiogenic | Glioma (84)
TMS1 | Pro-apoptotic | Breast (84)
TPEF/HPP1 | Transmembrane protein | Colon, bladder (84)
VHL | Ubiquitin ligase component | Kidney, haemangioblastoma (84)
MCL = mantle cell lymphoma
(II) Loss of methylation in cancer
Widespread DNA hypomethylation occurring across the genome is observed in a variety of cancer cells. Within colorectal tumour cell lines, hypomethylation is associated with genomic instability, suggesting that a possible mechanism of oncogenesis may be via the induction of chromosomal abnormalities. Experimentally, Chen et al examined DNMT1-/- embryonic stem (ES) cells after noting that disruption of DNMT1 in mice resulted in genomic demethylation and a lethal phenotype (101). The DNMT1-deficient ES cells demonstrated higher rates of gene mutations, a phenotype that was reversed when DNMT1-/- ES cells were rescued with DNMT1 cDNA. The mutations in the DNMT1-deficient cell lines were deletions and insertions, and the findings suggest that widespread loss of DNA methylation may result in chromosomal instability.
Aberrant DNA methylation in MBCN
Methylation in Chronic lymphocytic leukaemia
The CLL methylome has been extensively studied using modern array technology, following on from early studies repeatedly demonstrating promoter methylation in a number of candidate genes: DAPK1, TWIST2, ZAP70, HOXA4, SFRP1, SFRP2, SFRP4, ID4, the cyclin-related genes P16 (CDKN2A) and P15 (CDKN2B), CDH1 and DUSP22 (102-109).
The most comprehensive genome-wide analysis of DNA methylation in CLL to date was published by Kulis et al in 2012 (42). Methylation in two CLL subgroups was compared with that of B cells from a normal donor using both the Illumina Infinium 450K array and bisulfite sequencing. In total, 139 CLL cases were studied, comparing IGH-unmutated CLL to naïve B cells and IGH-mutated CLL to memory B cells. Hypermethylated CpGs were enriched at 5’ regulatory regions, in CpG islands and in 5’ regions of introns. Hypomethylation was observed in CpGs in the gene body located outside CpG islands.
A number of other groups have investigated methylation in CLL/SLL using DNA methylation assays with limited genome coverage, including an endonuclease digestion method known as methylated CpG island amplification microarray (MCAM) (110, 111), but these are not whole genome assays and are, therefore, not further considered here. Results of aberrant methylation reported by single studies should be interpreted with caution as the published studies are generally based on the analysis of 30-100 CLL samples. For instance, while the gene AIRE was reported to be unmethylated by the Kanduri study, Pei et al found this gene to be hypermethylated.
In summary, CLL exhibits a range of patterns of aberrant methylation, including promoter methylation and widespread non-promoter hypomethylation, although the published studies vary in sample selection, type of control used, type of methylation assay, statistical analysis and reporting, making comparisons between studies challenging. A small number of genes has been recurrently identified as having aberrant methylation: ID4, ABI3, WISP3, and SFRP1 as well as cell cycle regulators P15/CDKN2B and P16/CDKN2A.
Table 6: Summary of published studies reporting aberrant DNA methylation in CLL
Author | Subjects / Methodology | Finding
Pei 2012 (98) | 11 CLL samples, 3 normal B cell controls. Method: RRBS (23,000 CpG islands). | 533 significant DMRs. Hypermethylation of FOXD3, FOXE1, FOXG1, RIX1, ID4, SFRP1, SLIT2, BNC1, ADCY5, EBF3, NR2F2 and DIO3. In addition, HOXD8, HOXD11, HOXC13, SOX1, SOX2, SOX4, SOX6, SOX9, SOX11. Methylation status of FOXA1, FOXA2, SOX9, SOX11 and IRX1 inversely correlated with gene expression. Hypomethylation of TCL1A, BCR, LFNG, NOTCH1, TCF7, RASGRF1 and VAV2 (genes with a known or potential role as oncogenes).
Kanduri 2010 (38) | 23 CLL samples. Separated into IGHV mutation status (mutated [M] and unmutated [U]). Method: Infinium 27K, absolute but not differential β methylation reported. | Widespread hypomethylation in all CLL samples compared to normal B cells. Methylation of VHL, SCGB2A1, ABI3, GPX2, IGSF4, SERPIND5 and ZNF540 in IGHV-U CLL and PPP1R3A and WISP3 (known TSGs) in IGHV-M CLL. Genes involved in cell proliferation and tumour progression (PLD1 and BCL10 in IGHV-M CLL and ADORA3, AIRE, CARD15, FABP7, LOC340061, PRF1, UNC5CL, ANGPT2, IFNB1, URP2, BCL2, IL19, IL17RC and S100A14) were unmethylated in IGHV-U CLL. Inverse correlation between methylation and gene regulation demonstrated for VHL, IGSF4 and ABI3. A distinct methylation profile was reported between IGHV-U and IGHV-M.
Ronchetti 2014 (112) | 37 CLL samples (purified B cells). Samples separated into IGHV mutation status. Method: Infinium 27K. | Few shared DMPs between IGHV subgroups. Only 35 DMPs in which increased DNA methylation was inversely correlated with gene expression, including genes encoding ZNF471 and ZFP28 and the SPG20 gene.
Cahill 2013 (113) | 18 CLL samples separated into IGHV mutation status. Method: Infinium 450K, absolute but not differential β methylation reported. | DNA methylation changes over two sampling timepoints separated by 5-8 years demonstrated to be stable. Increased methylation of NOTCH1 was observed in IGHV-M CLL compared to normal B cells.
Halldorsdottir 2012 (46) | 30 CLL samples, 20 MCL samples. Method: Infinium 27K. | Methylation of apoptosis-related genes e.g. CIDEB, CYFIP2, NR4A1, PRDM2 and PTGES. ABI3 and VHL were methylated only in unmutated CLL.
Kulis 2012 (114) | 139 CLL samples compared to normal B cells, separated into IGHV mutation status. Method: Infinium 450K with confirmation by bisulfite sequencing. | IGHV-U CLL exhibited 3243 hypermethylated DMPs and 29743 hypomethylated DMPs while IGHV-M CLL had 246 hypermethylated and 4606 hypomethylated DMPs. Note: Threshold for calling differential methylation was not made clear, therefore DMPs from this analysis were not included in list of confirmed differentially-methylated MBCN genes.
RRBS = reduced representation bisulfite sequencing, TSG = tumour suppressor gene, MCL = Mantle cell lymphoma
A few studies have shown the methylation status of some genes in CLL cells at diagnosis to be associated with prognosis.
Table 7: Summary of studies reporting an association between DNA methylation and prognosis
Author | Subjects, methodology | Finding
Irving 2011 (115) | 118 CLL samples. Method: Bisulfite treatment and gel-based PCR (COBRA assay). | Methylation of CD38, HOXA4 and BTG4 as a predictor of survival.
Tong 2010 (110) | 78 CLL samples. Method: Endonuclease digestion (MCAM). | Methylation status in LINE and APP associated with shorter survival.
Queiros 2015 (116) | 211 CLL samples. Validation series: 97 samples. Method: Infinium 450K. | 5 CpG sites can reliably differentiate between three CLL subgroups with prognostic difference.
COBRA = Combined Bisulfite Restriction Analysis, MCAM = Methylated CpG Island Amplification and Microarray
Methylation in B cell non-Hodgkin lymphomas
The following studies have assessed DNA methylation within MBCN tumour tissue.
The early studies of DNA methylation focused on methylation in cell cycle regulation genes. In 1999, Baur et al reported methylation of the P16 gene (CDKN2A) in 32% of B-NHL, most commonly in DLBCL (50%) (117). Methylation of the P15 gene (CDKN2B) was found in 64% of B-NHL, both low grade and high grade histological subtypes. Both P15 and P16 inactivation are considered to be important in haematological malignancies but there is limited evidence of mutations or homozygous deletions in lymphomas. It is thus proposed that aberrant DNA methylation is the primary mechanism of P15 and P16 silencing (118, 119).
In 2009, using MSP and bisulfite sequencing, Martin-Subero et al (120) reported promoter hypermethylation in a mixed sample of B-NHL compared with normal B cells. Methylated genes were highly enriched for targets of the polycomb repressor complex. The same group utilised the emerging array-based technology for methylation measurement (Infinium Goldengate assay) to investigate the pattern of methylation changes in a wide range of haematological neoplasms (121). Methylation in tumour tissue from 367 haematological neoplasms was compared with that from matching normal cells of the haematopoietic system: whole peripheral blood, whole bone marrow and CD34+ cells were used as controls for myeloid neoplasms, CD3-positive T cells for T-cell neoplasms and CD19-positive cells, germinal centre B cells and lymphoblastoid cell lines for B-cell neoplasms. They reported differential methylation patterns by unsupervised hierarchical analysis between B cell neoplasms, T cell neoplasms and myeloid neoplasms, with greater similarity of hypermethylated genes between T and B cell neoplasms compared with myeloid neoplasms. Six genes were hypomethylated in all three types of haematological neoplasms: DIO3, FZD9, HS3ST2, MOS and MYOD1. They noted higher levels of de novo DNA methylation in germinal centre B cell derived lymphomas such as DLBCL, Burkitt lymphoma and follicular lymphoma, intermediate de novo methylation in mantle cell lymphoma and MM and lower levels of methylation in CLL and marginal zone lymphoma (110).
Other genes methylated in B-NHL include: transcription factor ABF1 in follicular lymphoma, Burkitt lymphoma and DLBCL (122), DAPK1 promoter methylation in 85% follicular lymphoma and 72% mucosa-associated lymphoid tissue (MALT) lymphoma, MGMT methylation in 28% of all mature B cell tumours (123) and CDKN1C promoter methylation in 44% follicular lymphoma and 55% DLBCL (35).
Methylation in mantle cell lymphoma
Mantle cell lymphoma (MCL) is characterised by a major genetic event in which the t(11;14)(q13;q32) translocation leads to cyclin D1 gene (CCND1) dysregulation. Within mantle cell lymphoma there are heterogeneous subgroups with varied prognosis and clinico-biological behaviour, including the presence or absence of SOX11 expression. This has led to the proposition that additional epigenetic events could be critical to mantle cell lymphoma diversity (86, 124). Queiros et al described the mantle cell lymphoma methylome in detail and proposed an epigenetic model of mantle cell lymphoma in which epigenetic hits are acquired by B cells carrying the t(11;14) mutation (47). The methodology of the experiment is notable for the investigators’ attempt to apply a correction for differences in white blood cell composition within tumour samples.
Table 8: Summary of published studies reporting aberrant DNA methylation in mantle cell lymphoma
Author | Subjects, methodology | Finding
Enjuanes 2011 (86) | MCL cell lines and 38 MCL tumour samples. Method: MassArray EpiTYPER applied to 25 candidate genes. | Hypermethylation of seven TSGs: CDH1, AHR, (86), ROBO1, SOX9, NR2F2, and NPTX2 but not CDC14B. Only SOX9, AHR and NR2F2 demonstrated inverse correlation between methylation and gene expression.
Halldorsdottir 2012 (46) | 30 CLL samples, 20 MCL samples. Method: Infinium 27K. | No significant methylation differences between MCL subgroups. Methylation of homeobox genes HOXA2, HOXA9, and HOXA13 (chromosome 7) and HLXB9, LHX1, PAX7, and POU4F1. Hypomethylation of proto-oncogene MERTK and CAMP.
Queiros 2016 (47) | 82 MCL samples, separated into SOX11 expressers / non-expressers. Method: Infinium 450K. | Two distinct methylation profiles correlating with SOX11 expression and prognosis.
Methylation in follicular lymphoma
The early application of array-based methylation technologies, e.g. Killian et al (125), demonstrated pervasive differential methylation in follicular lymphoma. Using the smaller scale Infinium Goldengate assay on 12 follicular lymphoma samples, the authors noted differential methylation at 259 of the approximately 1,500 CpG probes in the array. The following table summarises studies in which differential methylation has been described at individual genes in follicular lymphoma.
Table 9: Summary of published studies reporting aberrant DNA methylation in follicular lymphoma
Author | Subjects, methodology | Finding
O’Riain 2009 (93) | 164 follicular lymphoma samples. Method: Infinium Goldengate assay. | Promoter hypermethylation of 133 genes including HOXA9, ONECUT2, NOTCH3, DAPK1, MYOD1, GRB10. Enrichment for gene targets of polycomb repressor complex.
Bennett 2009 (97) | Six follicular lymphoma samples. Method: MCAM validated with COBRA and bisulfite sequencing. | Widespread CpG island hypermethylation enriched for homeobox genes and targets of polycomb repressor complex. Methylation of HOXA11 and PAX6 confirmed by COBRA and sequencing. Hypermethylation with reduced gene expression in CCND1, EGR1, FOXA1, KLF4, SIM2, SOX9.
Choi 2010 (100) | 14 follicular lymphoma samples. Method: MIRA-assisted microarray analysis for enrichment of CpG islands followed by RRBS. Validation with MassARRAY EpiTYPER. | Promoter hypermethylation in HOXA4, HOXA9, PCDHGA-B gene cluster, PCDHA gene cluster and IRF4.
MIRA = Methylated-CpG island recovery assay, MSP = methylation-specific PCR
A small number of studies has suggested that methylation is also associated with follicular lymphoma prognosis. The study by Giachella et al reported aberrant DAPK1 methylation detectable in the peripheral blood of patients with newly diagnosed follicular lymphoma. This is the only published study to date to report peripheral blood methylation findings for MBCN. The authors discuss the potential for a peripheral blood methylation profile to be used as a clinically relevant biomarker (90).
Table 10: Summary of published studies reporting an association between methylation and prognosis in follicular lymphoma
Author | Subjects, methodology | Finding
Alhelaly 2014 (118) | 118 follicular lymphoma samples, laser-capture microdissection. Method: MSP. | Methylation of p16/CDKN2A in 19% samples, associated with gene inactivation and poor clinical outcome.
Giachella 2014 (90) | 107 patient samples from bone marrow, blood and lymph node. Method: MethyLight PCR. | (93) promoter methylation associated with increased likelihood of relapse.
Methylation in diffuse large B cell lymphoma (DLBCL)
DLBCL carries mutations in several genes related to epigenetic pathways (TET and EZH2). TET enzymes, as described above, are involved in the removal of methyl groups from CpG sites. EZH2 encodes a histone methyltransferase that is a functional component of the polycomb repressor complex 2. A number of inhibitors of EZH2 have been evaluated in phase 1 and 2 clinical trials in non-Hodgkin lymphoma with promising clinical efficacy (126-128). Further support for the dysregulation of methylation in DLBCL is an increase in protein expression of the DNA methyltransferases DNMT1, -3A and -3B observed in a proportion of cases of newly diagnosed DLBCL (129), together with a correlation between DNMT expression status and the promoter hypermethylation status of 11 investigated genes. Other specific targets of increased promoter methylation include the cell cycle regulators P15 and P16 as well as DAPK1, PCDH10 and MGMT (91, 130).
In genome-wide approaches to measuring methylation, there is an evolving description of specific gene promoter methylation, frequently within CpG islands (43, 131), as well as an observation of widespread non-promoter hypomethylation (132).
Aberrant methylation has also been tested as an independent predictor of prognosis: MGMT methylation was associated with favourable prognosis (133), and a methylation profile identified using the HpaII tiny fragment enrichment by ligation-mediated PCR (HELP) assay was associated with poor prognosis (Chambwe 2014).
In the only published study of circulating cell-free DNA, Kristensen et al looked at methylation in five candidate tumour suppressor genes known to be methylated in DLBCL tumour samples. They found modestly increased methylation in DAPK1 and DBC1 and only very slightly increased methylation in MIR34A and MIR34B/C compared with controls (134). DAPK1 methylation disappeared in patients during remission but was detectable in patients with poorly responsive or relapsed disease and, therefore, could have a potential role as a biomarker. A number of studies identify MGMT methylation or a distinct generalised methylation pattern as being associated with prognosis in DLBCL (Table 12).
Table 11: Summary of published studies reporting aberrant DNA methylation in DLBCL
Author | Subjects, methodology | Finding
Pike 2008 (87) | 8 DLBCL samples divided into cell-of-origin groups GCB and ABC. Method: Screening with endonuclease digestion and sequencing, confirmation by MethyLight. | Methylation of 12 of 15 candidate CpG islands (AR, CDKN1C, DLC1, GATA4, GDNF, GRIN2B, MTHFR, MYOD1, NEUROD1, ONECUT2 and TFAP2A).
Lee 2009 (131) | 44 DLBCL samples. Method: MSP. | Hypermethylation of promoter regions of MGMT (52%), P15 (32%), P16 (55%), P57/CDKN1C (48%) and MAD2 (50%). MGMT and P57 methylation were associated with increased survival.
Shaknovich 2010 (135) | 69 DLBCL samples (min. 80% tumour purity). Divided by cell-of-origin groups GCB and ABC. Method: HELP assay. Validation: EpiTYPER® assay. | A methylation signature identifies GCB and ABC subtypes of DLBCL with a subset of 16 differentially methylated genes with inversely correlated gene expression: IKZF1, ASPHD2, PMM2, PAK1, LYAR, JDP2, FGD2, GALNS, IL12A, ARHGAP17, SORL1, KIAA0746, LANCL1, KCNK12, SOX9, CXorf57.
Asmar 2013 (44) | 100 DLBCL samples. Divided by TET2 mutation status. Method: Infinium 450K. | Compared to TET2wt, the TET2mut samples exhibited hypermethylation in 578 CpGs (315 genes) including 35 genes in which gene expression was reduced. Included in this list were some genes identified by Liu et al: CYB5R2, SDR42E1 and ZIK1.
Krajnović 2014 (91) | 51 DLBCL samples. Method: MSP. | Hypermethylation was detected in P15 in 23%, P16 in 37%, MGMT in 39% and DAPK in 55%. Methylation of P15 was associated with better prognosis.
Kristensen 2014 (92) | 119 DLBCL cases. Method: MSP. | DAPK1 methylation present in 84% DLBCL samples. Allelic DAPK1 methylation associated with poor survival.
Huang 2017 (136) | 107 DLBCL cases. Method: PCDH10 promoter methylation by MSP. | PCDH10 methylation in 54% DLBCL, associated with inferior prognosis.
Liu 2017 (43) | 31 DLBCL samples. Divided by TET2 mutation status. Method: Infinium 450K. | In the TET2mut samples compared to TET2wt, the following genes were hypermethylated with reduced expression: CRY1, CYB5R2, DCLK2, SDR42E1, SPIB, ZIK1, ZNF134, ZNF256 and ZNF615.
GCB = germinal centre B cell type, ABC = activated B cell type
Table 12: Summary of studies reporting an association between methylation and prognosis in DLBCL
Author | Subjects, methodology | Finding
Lee 2009 (131) | - | MGMT methylation associated with favourable prognosis.
Wedge 2017 (132) | - | Widespread hypomethylation associated with poor prognosis.
Chambwe 2014 (137) | Method: HELP assay. | Methylation pattern an independent predictor of prognosis.
Multiple Myeloma (MM)
A summary of the literature reporting DNA methylation aberrancy in MM tumour samples is presented below. Recurrent findings include frequent methylation of P16 and variable methylation of DAPK. Aberrant methylation of a number of tumour suppressor genes and of genes involved in important oncogenic pathways such as Wnt signaling is also reported.
Table 13: Summary of published studies reporting aberrant global DNA methylation in MM
Author | Subjects, methodology | Finding
Salhia 2010 (138) | 13 MGUS, 26 smouldering MM, 140 MM samples. Method: Goldengate Illumina Array. | Global hypomethylation in 97% MGUS, 90% smouldering MM and 94% MM samples compared with normal plasma cells. Of the 1500 loci, only 22 were hypermethylated.
Walker 2011 (45) | 161 MM samples from patients taking part in the UK MRC IX clinical trial. Method: Infinium 27K. | Global hypomethylation evident in the transition from MGUS to MM. Increased methylation of 82 CpG probes (77 genes) was observed. Differential methylation in subgroup with hyperdiploidy and t(4;14) translocation.
Aoki 2012 (139) | 7 MGUS, 74 MM. Method: MSP. | Hypomethylation of repetitive elements LINE-1 and Alu.
Table 14: Summary of published studies reporting aberrant DNA methylation at CpG sites in MM
Author | Subjects, methodology | Finding
Galm 2004 (89) | 56 MM samples. Method: MSP of 11 candidate tumour suppressor genes. | Methylation of SOCS-1 (46%), P16 (36%), ECAD (21%), DAPK (13%), TP73 (2%).
Tatetsu 2007 (140) | 29 MM samples. Method: bisulfite sequencing. | Methylation of PU.1 regulator region observed in 35% MM samples but not in MGUS and was associated with down-regulation.
Cheng 2007 (141) | 28 MM samples. Method: bisulfite sequencing of PF4 promoter. | PF4 promoter hypermethylation in 54%, associated with no or low PF4 expression. PF4 is a putative tumour suppressor gene on 4q13.3 which is deleted in 35-50% MM.
Gonzalez-Paz 2007 (142) | 17 MGUS, 40 smouldering MM, 522 MM. Method: MSP. | Methylation of P16 present in MM (34%) more frequently than smouldering MM (24%) or MGUS (28%). Methylation was not associated with reduced gene expression.
Peng 2013 (143) | 56 subjects, recently diagnosed MM. | 66% samples showed ADAMTS9 promoter methylation and gene silencing.
Yuregir 2009 (94) | 20 newly diagnosed MM. Method: MSP. | Methylation in P16 (10%), MGMT (40%), DAPK (10%), ECAD (45%).
Jost 2009 (144) | 3 MGUS, 66 MM, 7 plasma cell leukaemia (both at diagnosis and relapse). Method: MSP. | Methylation of Frizzled-related proteins. Methylation of SFRP1, -2 and -5 associated with gene silencing.
Stanganelli 2010 (145) | 21 MGUS, 44 newly diagnosed MM samples. Method: MSP. | Methylation status of tumour suppressor genes: Methylation of SOCS-1 (52%), TP73 (45%), ARF (29%), P15 (32%), P16 (7%).
Braggio 2010 (36) | 68 newly diagnosed MM. Method: MSP. | Methylation status of nine tumour suppressor genes: Promoter hypermethylation in CDH1 (50%), P16 (43%), P15 (16%), SHP1 (15%), ER (13%), BNIP3 (13%), RARb (12%), DAPK (6%), MGMT (0%).
Wong 2013 (146) | 40 MGUS, 95 MM (newly diagnosed), 29 MM at relapse/progression. Method: MSP. | 46% newly diagnosed MM, 41% relapsed MM: mir129-2 methylation and reduced expression. Methylation and expression changes reversible following exposure to azacitidine.
Jung 2012 (147) | 193 MM samples. Method: Goldengate Methylation Cancer Panel I, confirmed with bisulfite sequencing. | IGF1R and IL17RB-associated CpG probes were methylation and genes expressed. P16 and DLC1-associated CpG probes were methylated and genes expressed.
Hatzimichael 2012 (148) | 40 MM samples. Method: MSP. | Bcl2-interacting killer (BIK) gene methylated in 40% cases at diagnosis. BIK methylation associated with evolution to relapsed / refractory disease.
Chim 2007 (95) | 50 MM samples. Method: MSP. | Methylation in genes associated with Wnt signaling pathway: 42% samples demonstrated methylation of at least one of seven genes (WIF1, DKK3, APC, SFRP1, SFRP2, SFRP4, SFRP5).
Kocemba 2012 (96) | 7 MGUS, 41 newly diagnosed MM samples. Method: MSP and bisulfite sequencing. | Methylation of DKK1 promoter region associated with reduced expression. Restoration of DKK1 expression noted after exposure to azacitidine.
Wong 2011 (149) | 95 MM at diagnosis, 23 MM relapse. Method: MSP. | MIR34B/C methylation and reduced expression in 5.3% MM at diagnosis and 52% at relapse/progression. MIR34B/C is a direct transcriptional target of TP53.
MSP = methylation-specific PCR
Methylation patterns may be associated with the progression of disease from MGUS to MM and may also be associated with MM prognosis.
Table 15: Summary of studies reporting an association between DNA methylation and the clinical progression in later stages of myeloma
Author | Subjects, methodology | Finding
Guillerm 2003 (37) | 61 newly diagnosed MM. Method: MSP. | P16 methylation associated with poor overall survival.
Galm 2004 (89) | 56 MM samples. Method: MSP of 11 candidate tumour suppressor genes. | P16 methylation associated with poor survival.
Chim 2007 (150) | 19 MGUS, 32 MM samples. Method: MSP. | Methylation status of tumour suppressor genes. Methylation of ECAD (56%), SHP1 (84%) and p16 (53%) was observed in MM but not MGUS.
Park 2011 (151) | 100 newly diagnosed MM samples. Method: MSP. | P16 methylation an independent predictor for overall survival.
Kim 2013 (152) | 103 newly diagnosed MM. Method: MSP. | P16 methylation present in 38% but not associated with survival.
De Carvalho 2009 (88) | 51 MM samples. Method: MSP. | Aberrant methylation in CDKN2B in 90%, CDH1 in 88%, ESR1 in 73%, HIC1 in 73%, CCND2 in 63%, DCC in 45% and TGFBR2 in 39%. Methylation of DCC and TGFBR2 were associated with poor prognosis.
De Larrea 2013 (153) | 75 MM relapsed after bortezomib-based regimen. Method: PCR following digestion with methylation-sensitive restriction enzymes. | Hypomethylation in promoter of CXCR4 and NFKB1 associated with increased survival. CXCR4 important in cell adhesion.
Kaiser 2013 (99) | 161 newly diagnosed MM samples from patient in UK MRC IX clinical trial. Method: Infinium 27K. | Methylation of GPX3, RBP1, SPARC, TGFBI was associated with reduced gene expression and shorter overall survival. These are putative tumour suppressor genes.
Environmental exposures and DNA methylation
The putative associations between environmental exposures and changes in DNA methylation, in particular where exposures are also associated with MBCN risk, are summarised below. Limited data are available to directly support a link between environmental exposures and methylation in the aetiology of MBCN but, given the known interaction between methylation and environmental exposures, these factors should ideally be identified and controlled for in large scale methylation studies.
Obesity/Diet
Obesity induced by a high-fat diet fed to animals results in global DNA hypermethylation (154), while in humans, DNA methylation of specific obesity-related genes for leptin and adiponectin is associated with weight. An association between extreme caloric restriction and DNA methylation was proposed from data gathered following the Dutch Hunger Winter studies. There appeared to be an increased risk of colon cancer together with differentially methylated regions associated with prenatal exposure to famine (155).
Vitamin D
Epidemiological studies generally suggest that increased exposure to ultraviolet light is associated with a reduced risk of NHL (156), while low vitamin D levels are associated with shorter time to treatment in previously untreated CLL (60) as well as reduced overall survival from CLL and DLBCL (157). Epigenetic modification is a possible mechanism whereby vitamin D insufficiency may be related to MBCN risk: in cancer cell lines, the vitamin D receptor gene (VDR) is hypermethylated, with resultant reduced VDR expression. This effect can be reversed by exposure to azacitidine with subsequent VDR hypomethylation and increased expression of the receptor (158).
Tobacco
Smoking is modestly associated with follicular NHL risk, most prominently for females exposed to passive smoking (RR 2.02, 1.06-3.87) (159). Cigarette smoke exposure affects DNA methylation, with aberrant promoter hypermethylation of the CDKN2A/P16INK4 gene demonstrated in the bronchial epithelial cells of smokers but not non-smokers (160). No association between tobacco smoke exposure and methylation is known in MBCN.
Folic Acid
Specific dietary deficiencies are highly likely to be of importance to DNA methylation, particularly deficiencies of dietary sources of methyl groups including folate, methionine, betaine, serine and choline (161). Manipulation of dietary folate has been shown to affect DNA methylation. Women who were administered a low-folate diet (56 μg/day) developed global DNA hypomethylation which was reversed on reintroduction of dietary folate at 516 μg/day (162). This is the only publication thus far reporting an effect of dietary folate levels on methylation in humans. The association of MTHFR (methylenetetrahydrofolate reductase) gene polymorphisms with NHL risk is additional evidence of the importance of the folate metabolism pathway to NHL (163). Dietary folic acid has an established role in the formation of thymidylate for DNA synthesis as well as being a co-factor for vitamin B12 synthesis. Recommended daily intakes generally refer to the folic acid levels required for adequate red blood cell synthesis. These levels may not be the same threshold as those required for DNA methylation. The Australian National Health and Medical Research Council Nutrient Reference Values publish a Recommended Daily Intake of 400 μg/day and an Estimated Average Requirement (daily level estimated to meet the requirements of half the healthy individuals) of 320 μg/day (164).
Age
Age-related changes in DNA methylation are well described, both at a global level as well as changes at specific CpG sites. Further to this, the methylation-based measures of biological ageing are themselves associated with increased cancer risk (165). Any study wishing to relate DNA methylation to cancer risk should, therefore, match for age in order to remove this potential confounder.
Epigenetic therapy in MBCN treatment
Hypomethylating agents are widely available and used to treat a number of haematological malignancies, particularly 5-azacitidine (AZA) in myelodysplasia (166). Their use for MBCN is being tested in phase 1 clinical trials. Pre-clinical data are encouraging, demonstrating that AZA treatment results in hypomethylation of MIR34B, with resultant re-expression of the gene, inhibition of cellular proliferation and enhancement of apoptosis (167).
2.3 Differential methylation as a marker of cancer risk
The hypothesis that non-mutation genetic variation may be a marker or determinant of cancer risk follows on from classical epidemiology studies evaluating familial and environmental risk factors as well as genetic epidemiology studies evaluating genetic variation as a risk factor for cancer. Non-mutation genetic variation encompasses the epigenetic processes of DNA methylation, histone modification and microRNAs. Epigenome-wide equivalents of GWASs were proposed as far back as 2008 (168). However, the term epigenome-wide association study is a misnomer given that the multitude of different epigenetic mechanisms cannot be encompassed in a single assay. DNA methylation is the most extensively investigated epigenetic mechanism as a risk factor for cancer due to the advances in high-throughput methylation assays. As a consequence there has been a rapid increase in ‘genome-wide’ methylation studies of solid cancers (168).
In a 2012 meta-analysis of publications reporting genome-wide methylation and cancer risk, 23 studies were included (169). Notably, only eight were performed in prospective cohorts and in six of the remaining 15 retrospective studies, subjects may even have received cancer treatment prior to blood sampling, which is highly likely to have affected measured methylation. These studies utilised older methods of DNA methylation analysis, with only the ability to comment on methylation across large regions such as repetitive elements (LINE-1, Alu) or in small pre-defined genomic regions. A disadvantage of using a retrospective case-control study for the evaluation of DNA methylation is the inability to determine whether the epigenetic variation is due to disease-associated differences or post-disease processes (168). The relative ease of subtraction or addition of a methyl group that occurs via methyl- and demethyl-transferases means that the impact of treatments such as chemotherapy is likely to be greater in studies of DNA methylation than in studies assessing genetic risk.
Studies measuring quantitative CpG site-specific methylation have been performed in incident case-control studies for breast, bladder, gastric and small cell lung cancers and colorectal adenoma, suggesting an association between peripheral blood-derived DNA methylation and cancer (170-180). A prospective, population-based study of breast cancer risk using the Infinium 450K also found an association with DNA methylation, which was more marked for cases with a shorter time between blood collection and cancer diagnosis (176). In all these studies the absolute difference in DNA methylation between cases and controls is reported to be very small, in the order of 0.5% to 3%. This contrasts with studies comparing methylation in tumour and normal tissue, in which differences in methylation of 20% are considered the threshold for defining differential methylation. Given the small differences in methylation anticipated, studies of peripheral blood methylation require adequate statistical power and robust technical reliability in order to reduce the risk of reporting false positive results.
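The need for adequate power when differences are small can be illustrated with a rough sample-size sketch. The example below uses the standard normal-approximation formula for a paired comparison, n = ((z_(1-alpha/2) + z_power) x sd_diff / delta)^2. The standard deviation of within-pair differences (0.05), the power (80%) and the significance thresholds are illustrative assumptions and are not values taken from the cited studies; `pairs_needed` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def pairs_needed(delta, sd_diff, alpha, power=0.8):
    """Approximate number of matched pairs needed to detect a mean
    within-pair methylation difference `delta` (as a proportion, e.g. 0.01
    for 1%), given an assumed standard deviation of within-pair differences
    `sd_diff`, a two-sided significance level `alpha` and the desired power.
    Uses the normal approximation; a sketch, not a full power analysis."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_power) * sd_diff / delta) ** 2)


if __name__ == "__main__":
    assumed_sd = 0.05  # assumption for illustration only
    for delta in (0.005, 0.01, 0.03, 0.20):
        single_test = pairs_needed(delta, assumed_sd, alpha=0.05)
        # Bonferroni-style threshold for roughly 485,000 array probes
        array_wide = pairs_needed(delta, assumed_sd, alpha=0.05 / 485_000)
        print(f"delta={delta:.3f}: {single_test} pairs (single test), "
              f"{array_wide} pairs (array-wide threshold)")
```

Under these assumptions the required number of pairs rises with the inverse square of the difference, so a 0.5% difference needs far more pairs than a 20% difference, and the array-wide threshold raises the requirement further.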
The first published study to report aberrant methylation in blood as marker for MBCN risk is presented in Results Chapter 5.1 (181). There has been one additional study examining DNA methylation changes in pre-diagnostic blood samples in 347 individuals from a prospective cohort, 28 of whom were diagnosed with CLL 2.0 to 15.7 years after enrollment (41). Using the Infinium 450K Beadchip, 722 differentially methylated (DM) CpG sites were identified, of which 73.4% overlapped with DM sites identified in a study of CLL cells (42).
2.4 Measuring DNA methylation
Comparison of genome-scale DNA methylation assays
(A) DNA pre-treatment
Most DNA methylation assays rely on treatment of DNA before amplification or hybridization. The main approaches are endonuclease digestion, affinity enrichment and bisulfite conversion (182, 183).
I. Endonuclease (enzyme) digestion
Enzyme digestion uses the properties of methylation-sensitive restriction enzymes which are inhibited by 5-methylcytosine (5mC) so that the patterns of digestion by such enzymes reflect DNA methylation (182). HpaII and SmaI are the most widely used restriction enzymes, inhibited by the presence of 5mC at a CpG site. Other endonucleases such as McrBC cleave at methylated sites. Endonuclease digestion produces fragments of either only methylated or only unmethylated DNA, and is followed by polymerase chain reaction (PCR) across the restriction site. Its strength is high sensitivity and its major weakness is false positive results due to digestion for reasons other than DNA methylation (182). Different techniques exist in order to couple the enzymatic methods to array-based analysis. One modification is to use the methylation-dependent endonuclease McrBC followed by comprehensive high-throughput arrays for relative methylation (CHARM). Another method involving HpaII restriction enzymes uses PCR to amplify the restriction fragments followed by array hybridization in a method known as HpaII tiny fragment enrichment by ligation-mediated PCR (HELP). Following on from array hybridization technology is next-generation sequencing to analyse the output of restriction enzyme procedures. Sequencing has been used to analyse the output of the HELP assay, while sequencing of HpaII or MspI digests is known as Methyl-seq.
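The restriction-enzyme logic described above can be sketched as a small in silico digestion. The example assumes a simplified model: both HpaII and MspI recognise CCGG and cut after the first C, HpaII is blocked when the internal cytosine (the C of the CpG) is methylated, and MspI cuts regardless of methylation at that position. Only the top strand is modelled, and the function name `digest` and the toy sequence are hypothetical.

```python
import re


def digest(sequence, methylated_positions, enzyme):
    """Return the fragments produced by cutting `sequence` at CCGG sites.

    sequence: uppercase DNA string (top strand only).
    methylated_positions: set of 0-based indices of methylated cytosines.
    enzyme: "HpaII" (blocked by methylation of the internal C of CCGG)
            or "MspI" (cuts CCGG irrespective of that methylation).
    """
    if enzyme not in ("HpaII", "MspI"):
        raise ValueError("enzyme must be 'HpaII' or 'MspI'")
    cut_points = []
    for match in re.finditer(r"(?=CCGG)", sequence):
        site_start = match.start()
        internal_c = site_start + 1  # cytosine of the CpG within CCGG
        if enzyme == "HpaII" and internal_c in methylated_positions:
            continue  # methylation protects this site from HpaII
        cut_points.append(site_start + 1)  # cut between the two cytosines
    fragments = []
    previous = 0
    for cut in cut_points:
        fragments.append(sequence[previous:cut])
        previous = cut
    fragments.append(sequence[previous:])
    return fragments


if __name__ == "__main__":
    toy_sequence = "AACCGGTTTTCCGGAA"
    methylated = {11}  # internal C of the second CCGG site is methylated
    print("HpaII:", digest(toy_sequence, methylated, "HpaII"))
    print("MspI: ", digest(toy_sequence, methylated, "MspI"))
```

Comparing the two digests of the same DNA shows which sites were protected: a site cut by MspI but not by HpaII is inferred to be methylated in this model.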
II. Affinity enrichment Affinity enrichment of methylated regions using antibodies specific for 5mC or using methyl-binding proteins has the potential to enable comprehensive methylation profiling (182). The techniques MeDIP and mDIP involve enrichment of methylated regions by using an antibody specific for methylated cytosine followed by hybridization to a microarray. Enriched and non-enriched DNA are labeled with different fluorescent dyes, followed by two colour fluorescent hybridization. Calculation of relative fluorescent signal intensities is used to extrapolate DNA methylation information at the corresponding loci. Affinity enrichment methods can also be linked to sequencing analysis. The main strength of affinity enrichment is rapid genome-scale assessment of DNA methylation. However, these methods do not give information on individual CpG site methylation, instead reporting on pre-defined regions. Significant experimental and bioinformatic adjustment is required to account for varying CpG density throughout the genome. Affinity-based methods are susceptible to measurement error where there are copy number alterations because the method does not measure the unmethylated version of a sequence.
III. Bisulfite conversion Bisulfite conversion takes advantage of the discovery in the 1990s that treatment of denatured genomic DNA with sodium bisulfite results in chemical deamination of unmethylated cytosines more rapidly than methylated cytosines (182). The result is that unmethylated cytosines are converted to uracils (‘C’ to ‘U’) and methylated cytosines remain ‘C’. This process transforms an epigenetic difference into a genetic one that can be detected using many technologies.
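The conversion principle described above can be illustrated with a minimal Python sketch. The function name, the toy sequence and the set of methylated positions are hypothetical, and conversion is assumed to be complete and error-free, which is an idealisation of the chemistry.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence after idealised (complete) bisulfite conversion.

    Unmethylated cytosines become uracil ('U'); cytosines whose 0-based
    index is listed in methylated_positions are protected and remain 'C'.
    """
    converted = []
    for index, base in enumerate(sequence):
        if base == "C" and index not in methylated_positions:
            converted.append("U")
        else:
            converted.append(base)
    return "".join(converted)


# Hypothetical fragment with CpG sites at indices 1-2 and 6-7;
# only the cytosine at index 1 is assumed to be methylated.
fragment = "ACGTTCCGAC"
print(bisulfite_convert(fragment, {1}))
```

After conversion the methylated cytosine is the only 'C' remaining, which is how the epigenetic difference becomes a sequence difference that downstream assays can read.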
Bisulfite treated DNA allows for the measurement of methylation at single base pair resolution. As bisulfite-treated DNA consists of three rather than four bases, standard hybridization arrays require significant modification in order to render an authentic signal. The Illumina GoldenGate BeadArray used primers to detect methylation at up to 1,526 different CpG sites. Primers were designed to be specific for methylated and unmethylated sequences and labeled with different fluorescent dyes (182). The subsequent stages of development have been assays with increasing numbers of primers/probes: the Illumina Infinium HumanMethylation 27K interrogating 27,578 CpG sites (184) and the Infinium 450K covering 485,000 CpG sites (185). Bisulfite-treated DNA is also well-suited to sequencing approaches. Ultra-deep sequencing of 100 PCR products at an average coverage of more than 1,600 reads is possible on the Roche 454 platform (182). A challenge of sequencing bisulfite-converted DNA is that the low sequence complexity can lead to sequence redundancy. Reduced representation bisulfite sequencing (RRBS) limits this phenomenon by using restriction enzymes to fractionate DNA sequences based on size and limit regions selected for sequencing. Regions with moderate to high CpG density are targeted with this method, making the assay less costly than genome-wide bisulfite sequencing, but excluding it from the class of truly genome-wide methylation assays (182, 184).
Some sources of error in bisulfite-based methods include incomplete bisulfite conversion and differential PCR efficiency for methylated and unmethylated versions of the same sequence. While completion of bisulfite conversion should be measured as a quality control step, over-treatment with bisulfite can degrade DNA and result in methylated cytosines converting to thymine residues, resulting in under-reporting of DNA methylation. In bisulfite-based methods, single nucleotide polymorphisms of C to T at CpG sites are at risk of being misinterpreted as methylation variation; hence it is recommended that such polymorphisms be removed from analysis or that the presence or absence of the polymorphism be confirmed by sequencing (182).
(B) DNA Methylation measurement
Following DNA pre-treatment, a choice of technologies is available to measure DNA methylation.
I. Locus-specific analysis – Methods such as MeDIP-PCR, MethyLight and EpiTYPER® were the first methylation assays developed and although they are still in use are not applicable to large-scale epidemiological studies.
II. Gel-based analysis – Methylation-specific PCR (MSP) and combined bisulfite restriction analyses (COBRA) are commonly used assays for the purposes of measuring between 10-100 CpG sites per sample (182).
III. Array-based technologies include CHARM and Illumina’s Goldengate and Infinium platforms. The advantages of array-based assays include the ability to use small quantities of DNA in a high throughput, relatively low-cost assay (168). The number of CpG sites analysed in the assay varies greatly from 1,000 up to 850,000. The criteria for selection of CpG sites included in the array itself leads to bias, as generally CpG sites associated with genes of known disease interest, and regions with known high CpG density are selected for inclusion, leaving other areas of the genome relatively sparsely covered by the assay. The Illumina Infinium assay is discussed below in greater detail.
IV. Sequencing-based technologies provide the highest level of coverage and resolution for DNA methylation measurement. Other advantages include negligible bias towards CpG-dense regions. Ultra-deep sequencing of a limited number of PCR products (100-1,000) is possible by pyrosequencing of PCR products after bisulfite conversion. The gold standard for single-base pair resolution methylation is whole genome bisulfite sequencing (182). When combined with bisulfite-treated DNA, sequencing methods are still unable to differentiate between methylated and hydroxymethylated cytosine bases. Sequencing generally remains too expensive and time-consuming for genome-wide methylation profiling.
Illumina Infinium 450k Assay
The Infinium Illumina HumanMethylation450 BeadChip uses specific probes to characterise individual CpG sites in bisulfite converted DNA (185). The 50 base pair length probes hybridise to specific CpG sites. Allele-specific single base extension of the primer incorporates a biotin nucleotide (for C and G) or a dinitrophenyl labeled nucleotide (for A and T).
There are two probe chemistries used in the Infinium 450k assay. The type I assay uses two probes per CpG site, one each for methylated and unmethylated states. One probe hybridises to the unmethylated CpG site denoted by ‘T’. The other probe hybridises to the methylated CpG site denoted by ‘C’. The type II assay uses one bead type, with the methylated state determined at the extension step after hybridization. Most probes (75%) are type II design. The two assays have different characteristics, in particular the range of β values in the type II assay has been smaller than that of the type I assay (186), potentially resulting in bias. Several correction methods have been published including a validated methodology, Subset-quantile Within Array Normalization (SWAN) by Maksimovic et al (187).
Although the Illumina Infinium offers less coverage than sequencing methods, its throughput, resolution and cost-effectiveness make it an attractive platform for genome-wide methylation studies (168). There is a bias towards representation of annotated gene regions and CpG islands, with 99% of RefSeq genes covered and 96% of CpG islands represented.
It is important to note that, while often described as a ‘genome-wide’ assay for methylation, the Infinium 450K covers 482,421 CpG dinucleotide positions of the human genome, distributed across the 22 autosome pairs and the sex chromosome pair. From a functional genome distribution perspective, 200,339 CpGs (41%) are located in proximal promoters, defined as sites located within 200 bp or 1,500 bp upstream of the transcription start site and in the 5’-untranslated region and exon 1. Of these CpG sites in proximal promoter regions, 46.1% are in CpG islands (the clusters of predominantly unmethylated CpG sites), 28.3% are in CpG shores, 3.8% in CpG shelves and 21.8% are in other regions of the genome (9).
Data processing - methods After probe hybridization, a signal for methylation or no methylation at each CpG site is generated. Background intensity is computed from a set of negative controls and subtracted from the raw methylation probe intensity. In contrast to genomic data which exhibit categorical properties, methylation status is a continuous variable, generally expressed as a ratio of methylated to unmethylated molecules. Methylation levels within the sample are reported as a ratio and can be expressed as an M-value or β-value.
The β value for methylation is the ratio of methylated probe intensity to the overall intensity (the sum of methylated and unmethylated intensities) and results in a value ranging between 0 and 1 or 0 and 100%. The β value has the advantage of intuitive biological interpretation. In an ideal setting, a β value of zero indicates that all copies of the CpG site measured by the corresponding probe are completely unmethylated across the entire sample.
β = M / (M + U + α) (188), where M = intensity of methylated probes, U = intensity of unmethylated probes and α = constant offset (recommended to be set to α = 100 in order to regularise β values when M and U values are small).
The M value is calculated as the log2 ratio of the intensity of methylated probes compared to unmethylated probes. An M value close to zero indicates similar intensity between methylated and unmethylated probes, which means the CpG site is about half methylated across the sample. Positive M values indicate that more copies of the CpG site are methylated than unmethylated, while negative values mean the opposite. The relationship between β and M values is approximately linear between 0.2 and 0.8 for β values (-2 and 2 for M values); however, at the extremes of methylation β values are severely compressed. The M value is, therefore, preferred for regression analyses due to its more constant standard deviation across the entire methylation range (189).
M value = log2(M / U) (190)
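The two definitions above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the example intensities are hypothetical, the intensities are assumed to be positive and already background-corrected, and the direct conversion from β to M is exact only when the offset α is zero.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Beta value: methylated intensity over total intensity plus offset."""
    return meth / (meth + unmeth + offset)


def m_value(meth, unmeth):
    """M value: log2 ratio of methylated to unmethylated intensity."""
    return math.log2(meth / unmeth)


def beta_to_m(beta):
    """Convert a beta value to an M value (exact only for offset = 0)."""
    return math.log2(beta / (1.0 - beta))


# Hypothetical probe intensities for one CpG site in one sample.
meth_intensity, unmeth_intensity = 3000.0, 1000.0
print(beta_value(meth_intensity, unmeth_intensity))   # with the offset of 100
print(m_value(meth_intensity, unmeth_intensity))

# Beta values of 0.2, 0.5 and 0.8 map to M values of about -2, 0 and 2.
for beta in (0.2, 0.5, 0.8):
    print(beta, beta_to_m(beta))
```

The last loop reproduces the correspondence quoted in the text between the β range 0.2 to 0.8 and the M range -2 to 2.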
Reproducibility and accuracy Reproducibility between technical replicates (tumour cell lines and tumour tissue) is reported by Infinium to show an average correlation R² of beta-values of 0.992 (191).
Accuracy is measured by concordance with gold standard methods for methylation assays. Roessler et al reported that of 352 tested CpG sites measured in primary tumour tissue, 60.5% are within 10% of the result obtained by quantitative pyrosequencing (192). As has been observed with previous array methodologies, there is lower concordance with the gold standard in the middle regions of methylation – when CpG sites are methylated between 25% and 75%. The authors recommend caution in describing CpG sites as hypo- or hyper-methylated if the CpG site is methylated in the middle range.
2.5 Measures of differential methylation
The most common measurement of differential methylation is the differentially methylated position (DMP), in which there is a difference in methylation at a single CpG site usually between a disease and non-disease state. DNA methylation altered at multiple CpG sites within a small genomic region is termed a differentially methylated region (DMR) (168).
Methylation variance is a more recently described phenomenon in which methylation at a single CpG site or region is highly variable in a cohort of samples with disease but very consistent in a cohort of controls. Both differentially variable probes and differentially variable regions (DVRs) are described. Differential methylation variance is associated with some cancers (c-DVRs) and can exhibit specific patterns in different tissue types (t-DVRs) (168).
2.6 Biological challenges in measuring DNA methylation
Confounders of DNA methylation
Confounding of DNA methylation results can be due either to genetic or congenital factors or environmental exposures. Potential confounders include ethnic background, age (193), smoking (194, 195), sun exposure (193), dietary folate and other micronutrient intake (162, 196-198), obesity (199), and alcohol intake (200). Rakyan et al recommend measuring and adjusting for covariates (201) if they are known to have an independent effect on phenotype, as adjustment may allow for better estimation of the direct methylation effect.
Use of whole peripheral blood
Whole blood is a readily accessible source of DNA and can be collected and stored relatively easily and inexpensively, properties which are desirable for large genetic epidemiological studies. In this study, whole blood was stored predominantly as dried blood spots on Guthrie cards, a method validated for analysing DNA methylation using the Infinium 450K (202). In this study, for a number of individuals the sampled blood was stored as both dried blood spots and frozen buffy coats. Methylation profiles of blood stored as dried blood spots and buffy coats from the same individuals were compared and demonstrated to be highly reproducible (means correlation, r = 0.9907). However, there were 5,723 CpG sites (of a total of >470,000 in the assay) demonstrating a significant difference in methylation between dried blood spots and buffy coats (comparison of methylation at individual CpG sites, F-test, q≤0.01) (202). The high means correlation supports the use of DNA from dried blood spots as an alternative for DNA methylation profiling. However, the detection of even a small proportion of probes with differences in methylation underscores the importance of matching for DNA source in case-control studies, because of potential systematic differences arising from preparation, storage conditions or cell population in the two methods.
Challenges of using whole peripheral blood
(I) Effect of different white blood cell proportions on methylation Whole blood in a healthy individual contains DNA from different subtypes of white blood cells – the major subgroups being lymphocytes (B and T), monocytes and granulocytes. In any healthy individual, there will be variation in the proportion of white blood cell subtypes in the peripheral blood. This may have a significant effect on methylation studies of whole blood as methylation patterns are recognised to be tissue-specific to the extent that characteristic methylation patterns can reliably predict cell of origin; not only between skin cells and blood cells, for example, but also between the different leucocyte subtypes (203-206). Hierarchical clustering of genome-wide methylation data from purified leucocytes is consistent with the well-established model of haematopoietic differentiation (204).
The magnitude of difference between white blood cell subtypes may be large enough to confound the differences one may expect to see between cases and controls. Using data from published case-control series of ovarian cancer, bladder cancer and head and neck squamous cell carcinoma (HNSCC), Koestler et al documented a number of DMRs in the leucocyte subgroups. They reported that the magnitude of difference between beta-values of the proposed leucocyte DMRs was equivalent to the difference between beta-values of controls and cancer cases in the ovarian and HNSCC data sets (203). Clearly, the variable proportion of leucocytes in whole blood samples may present a significant confounder in any study of whole blood methylation. Houseman et al have developed a model that uses known leucocyte DMRs to predict leucocyte proportions in whole blood (207). The model was recently validated in a sizeable cohort, using methylation data from the Infinium 27K to accurately predict the percentage of monocytes and lymphocytes in whole blood compared with a full blood count analyser (208). The aim in applying such a model is to use methylation data to impute the white blood cell composition of blood samples, using a regression model.
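The general idea of imputing cell composition by regression can be sketched as follows. This is not the published Houseman implementation: the reference methylation profiles, the cell-type proportions and the function name are hypothetical, and a simple projected gradient descent for non-negative least squares is swapped in purely for illustration.

```python
def estimate_cell_proportions(reference, mixture, iterations=2000):
    """Estimate non-negative cell-type weights w that minimise
    the squared error between reference * w and the mixture profile.

    reference: list of rows, one per CpG site, each holding the mean beta
               value of that site in every purified cell type.
    mixture:   beta values of the same CpG sites in whole blood.
    """
    n_sites = len(reference)
    n_types = len(reference[0])
    # 1 / squared Frobenius norm never exceeds 1 / largest eigenvalue of
    # R^T R, so this step size is conservative and guarantees convergence.
    step = 1.0 / sum(value * value for row in reference for value in row)
    weights = [1.0 / n_types] * n_types
    for _ in range(iterations):
        residuals = [
            sum(reference[i][k] * weights[k] for k in range(n_types)) - mixture[i]
            for i in range(n_sites)
        ]
        gradient = [
            sum(reference[i][k] * residuals[i] for i in range(n_sites))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto non-negative weights.
        weights = [max(0.0, weights[k] - step * gradient[k]) for k in range(n_types)]
    return weights


# Hypothetical reference: 4 cell-type-discriminating CpG sites x 3 cell types.
REFERENCE = [
    [0.9, 0.1, 0.1],
    [0.1, 0.9, 0.1],
    [0.1, 0.1, 0.9],
    [0.8, 0.2, 0.5],
]
TRUE_PROPORTIONS = [0.6, 0.3, 0.1]  # assumed composition of the toy sample
mixture = [sum(r * w for r, w in zip(row, TRUE_PROPORTIONS)) for row in REFERENCE]
estimated = estimate_cell_proportions(REFERENCE, mixture)
print([round(value, 4) for value in estimated])
```

With noise-free toy data the estimate recovers the assumed proportions; real whole blood data would add measurement noise and would require reference profiles from purified leucocytes.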
(II) Possibility of circulating tumour cells Whole blood in an individual who subsequently develops MBCN may contain circulating tumour cells or tumour DNA as some of these malignancies are characterised by a long latency during which individuals remain asymptomatic and/or are associated with a pre-malignant precursor condition. In some cases there may be perturbation of the total lymphocyte count while in others at earlier stages the lymphocyte count may remain unchanged. The performance of the white blood cell correction algorithms such as the Houseman algorithm described above is not published or otherwise described for this situation.
Chronic lymphocytic leukaemia (CLL) is preceded nearly universally by a condition known as monoclonal B-lymphocytosis, in which pre-malignant monoclonal B cells are detectable in the peripheral blood at very low levels which do not necessarily result in a raised total lymphocyte count (209). The duration of monoclonal B-lymphocytosis prior to CLL is not clear and likely varies greatly between individuals. In a study of pre-diagnostic blood samples collected a median of 3 years before the diagnosis of CLL, mutations consistent with circulating B cell clones were detected in 44 of 45 patients (209). Conversely, in a healthy population of 1,520 adults aged 62-80 years, monoclonal B cells (defined by abnormal surface immunophenotype) were detectable in the blood of 5.1% of subjects with a normal lymphocyte count and 13.9% of subjects with an elevated lymphocyte count (210). The annual rate of subsequent progression to overt CLL was 1.1%; the only factor associated with increased risk of progression was B lymphocyte level, with a hazard ratio (HR) of 1.46 (95% CI 1.12-1.91) for each increase of 1,000 B cells per ml.
For multiple myeloma (MM) the analogous precursor condition is monoclonal gammopathy of undetermined significance (MGUS), which is defined by the detection of an abnormal monoclonal protein (211). MGUS can be detected in 100% of MM cases at least 2 years prior to MM diagnosis, in 93% of cases 7 years prior to diagnosis and in 82.4% of cases 8+ years prior to diagnosis (211). The prevalence of MGUS increases with age, and in a white American population was reported to be 3.2% in those >50 years and 7.5% in those aged 85 years and older (212). The presence of circulating clonal plasma cells and associated tumour DNA in myeloma is now well-established (5, 213) but the frequency of detection of the same in MGUS has yet to be described.
For follicular lymphoma, the characteristic genetic lesion, rearrangement of the BCL-2/JH genes, is detectable in peripheral blood lymphocytes in up to 50% of healthy adults (214). The high prevalence of this mutation in healthy adults confirms that its presence alone is insufficient for the development of overt lymphoma. In a case-control study of subjects diagnosed with follicular lymphoma nested in the European Prospective Investigation Into Cancer and Nutrition (EPIC), pre-diagnostic blood collected a mean of 6.4 years prior to diagnosis was evaluated for the t(14;18) chromosomal translocation. The t(14;18) was detected in 56% of cases compared with 29% of controls (p<0.001) (215). The high prevalence of this mutation in circulating lymphocytes of healthy controls, as well as the significant difference between cases and controls, raises the possibility for this to be a potential confounder in methylation analysis of peripheral blood.
In summary, correction for white blood cell content appears to be important to adjust for cellular heterogeneity but alone is unlikely to account for the presence of low levels of circulating tumour cells and circulating tumour DNA. One of the potential outcomes of this study is the identification of a biomarker for early development of MBCN, and for this purpose whole blood is an appropriate source of DNA.
3 Study Design
3.1 Melbourne Collaborative Cohort Study
The Melbourne Collaborative Cohort Study (MCCS) is a prospective study of 41,513 people (24,469 women and 17,044 men) recruited between 1990 and 1994, designed to investigate the roles of diet and lifestyle in causing cancer. While repeated blood sampling was part of the study methodology, the rapid growth in genomics was not specifically part of the initial study plan (216). Blood samples were collected from 41,133 participants. Initially blood was collected and separated into mononuclear cells by Ficoll separation (n=9,832) or buffy coat by centrifugation (n=635). Due to inadequate budget, from 1991, whole blood was stored on Guthrie diagnostic cellular filter paper as dried blood spots (n=30,633). The majority of participants were identified from the Victorian state electoral rolls, on which enrolment is required of Australian citizens. Announcements were also made targeting the Italian and Greek communities of Melbourne. 99.3% were aged between 40 and 69 years at baseline. Participants were Caucasian, of whom 25% were born in Italy or Greece and the remainder in Australia, New Zealand or the United Kingdom. Compared with the general population of Melbourne in the same age group, there were more females (59%) and the average age of MCCS participants (55 years) was slightly higher than the general population (216).
At recruitment (baseline), a wide range of epidemiological, lifestyle, personal medical history and medication data were collected via questionnaires, including diet, skin type, physical activity, alcohol and smoking. Dietary information was obtained using a food frequency questionnaire (FFQ) specifically developed for the MCCS (217) which estimates the intake of 121 food items including meats, fish, fruits, vegetables, cereal-based foods, fats and oils, and non-alcoholic beverages. Additionally, the FFQ is used to calculate intakes of macronutrients such as protein, carbohydrates and fats, and micronutrients such as fatty acids, vitamins, calcium, iron, folate and carotenoids. The consumption of alcoholic beverages and smoking were captured by a separate questionnaire. The MCCS has also collected information on family history of malignancy and could be used to identify families to be followed up as a result of the project’s findings. In addition, direct physical measurements (height, weight, waist and hip circumferences and blood pressure) were made according to standard protocols and a blood sample was taken from virtually every participant. For the purposes of passive follow-up, the MCCS is matched routinely to the Victorian Cancer Registry and to other cancer registries in Australia via the Australian Institute of Health and Welfare, which links it to the Australian Cancer Database and to the National Death Index. It is also linked 6-monthly to the Victorian Electoral Enrolment Register to update changes in address. Follow-up is maintained at a high level with only 110 (0.3%) of cohort members known to have left Australia.
Genome-wide association studies have been conducted on DNA samples from cases with breast, prostate, colorectal and urothelial cancers using the Illumina Octorara beadchip. Genome-wide DNA methylation has been evaluated in a series of nested case-control studies of MBCN, prostate, colorectal, urothelial, kidney, gastric and lung cancers using the Illumina Infinium HumanMethylation450 BeadChip (216).
3.2 Nested Case-Control Study, participant selection
Participants were enrolled in the MCCS (216). Incident cases of MBCN diagnosed between baseline attendance and 31 December 2011 were identified by linkage to the Victorian Cancer Registry. The Victorian Cancer Registry receives mandatory notifications of all new cancer cases in Victoria from both hospital medical records and pathology laboratories. The diagnosis is preferentially obtained from the diagnostic histopathology report. According to routine Victorian Cancer Registry practice, each case is assigned a diagnostic code according to the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) (218). Cases of mature B cell malignancies were identified and classified according to ICD-O-3 codes 9670, 9823, 9680, 9690, 9691, 9695, 9698, 9731, 9732, 9733, 9734, 9671, 9673, 9675, 9678, 9679, 9684, 9687, 9689, 9699, 9761, 9764, 9826, 9833, 9940. The diagnostic histopathology reports were reviewed by the clinical haematology investigator (NWD) in order to confirm that the appropriate ICD-O-3 code was assigned to each case.
Controls were selected using density sampling with attained age as the time scale. Controls were individually matched to cases at a 1:1 ratio based on age at enrolment (±1 year), sex, country of birth (Southern Europe [Italy or Greece] or United Kingdom, Australia or New Zealand) and DNA source (mononuclear cells, buffy coat or dried blood spot).
Participants with a history of cancer (other than non-melanoma skin cancer) before baseline or those with no baseline DNA sample were ineligible. Peripheral blood samples were obtained as part of the MCCS at study entry (baseline).
Due to the potential influence of environmental exposures on DNA methylation, baseline smoking status, dietary folate intake, alcohol consumption and BMI were considered to be potential confounding variables. Information on smoking and alcohol was obtained from an interviewer-administered questionnaire. Smoking status was categorised as never smoked, former smoker and current smoker while alcohol consumption was categorised as no intake, 1-39g/day, 40-59g/day and ≥60g/day. Dietary folate intake was assessed by a 121-item self-recorded food frequency questionnaire specifically designed for the MCCS and categorised as <320μg/day or ≥320μg/day, the level estimated to meet the requirements of half of healthy adults (164). BMI was calculated from measured height and weight performed at baseline and categorised as <30 or ≥30, corresponding to the BMI threshold at which obesity is defined by the WHO.
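The covariate categories described above could be derived from the raw measurements roughly as follows. This is a sketch only: the function names and category labels are hypothetical, and the handling of alcohol intakes between 0 and 1 g/day (assigned here to the lowest drinking category) is an assumption that the text does not specify.

```python
def alcohol_category(grams_per_day):
    """Categorise daily alcohol intake using the cut-points in the text."""
    if grams_per_day <= 0:
        return "no intake"
    if grams_per_day < 40:          # assumption: 0 < intake < 1 g/day falls here
        return "1-39 g/day"
    if grams_per_day < 60:
        return "40-59 g/day"
    return ">=60 g/day"


def folate_category(micrograms_per_day):
    """Dichotomise dietary folate intake at 320 micrograms per day."""
    return "<320 ug/day" if micrograms_per_day < 320 else ">=320 ug/day"


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2 from measured weight and height."""
    return weight_kg / height_m ** 2


def bmi_category(bmi):
    """Dichotomise BMI at the WHO obesity threshold of 30."""
    return "<30" if bmi < 30 else ">=30"


# Hypothetical participant.
bmi = body_mass_index(weight_kg=81.0, height_m=1.80)
print(alcohol_category(12.5), folate_category(343.0), bmi_category(bmi))
```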
Study participants provided written, informed consent and the study was approved by Cancer Council Victoria’s Human Research Ethics Committee and performed in accordance with the institution’s ethical guidelines.
Between baseline attendance (period spanning 1990-1994) and 31 December 2011, 572 cases of MBCN were notified to the Victorian Cancer Registry. Six cases were identified as having a pre-baseline or concurrent cancer diagnosis of which two were only identified as a result of review of the diagnostic pathology reports by the clinical haematology investigator (NWD). Of the remaining 566 cases, a matched control was available in 469. After DNA extraction for 469 pairs, for 5 there was insufficient DNA for the methylation assay. A total of 464 pairs were, thus, assayed on the Infinium 450K with only one sample failing the quality control threshold. The remaining 463 pairs were processed using the normalization procedure described in Methods Chapter 4.4. One case appeared to have been matched with two different controls and due to a discrepancy in dealing with the correct case-control pair both pairs were excluded.
A check of recorded sex against methylation findings of X chromosome inactivation and Y chromosome signals was performed and surprisingly there was an apparent sex discrepancy in 20 cases and 6 controls. A thorough reconciliation was undertaken to determine whether a labeling error had occurred. Three unique patient identifiers (Victorian Cancer Registry identifier, MCCS identifier and date of birth) were checked and no clerical error was found. After the corresponding 23 pairs were removed, 438 case-control pairs remained available for statistical analysis.
Table 16: Histological diagnoses and assigned tumour group
ICD-O-3 code | N | Morphology | Assigned tumour group | n (group total)
9670 | 25 | CLL | Low grade | 137
9823 | 57 | CLL / Small lymphocytic lymphoma | Low grade |
9671 | 4 | Lymphoplasmacytic lymphoma | Low grade |
9673 | 17 | Mantle cell lymphoma | Low grade |
9689 | 6 | Splenic marginal zone lymphoma | Low grade |
9761 | 9 | Waldenström’s macroglobulinaemia | Low grade |
9940 | 4 | Hairy cell leukaemia | Low grade |
9699 | 15 | Marginal zone lymphoma | Low grade |
9679 | 3 | Mediastinal B cell lymphoma | High grade | 110
9680 | 102 | DLBCL | High grade |
9675 | 2 | Mixed small and large cell diffuse lymphoma | High grade |
9684 | 2 | DLBCL, immunoblastic | High grade |
9687 | 1 | Burkitt lymphoma | High grade |
9690 | 9 | Follicular lymphoma | Follicular | 82
9691 | 30 | Follicular lymphoma | Follicular |
9695 | 31 | Follicular lymphoma | Follicular |
9698 | 12 | Follicular lymphoma | Follicular |
9731 | 2 | Plasmacytoma | Myeloma | 109
9732 | 106 | Multiple myeloma | Myeloma |
9733 | 1 | Plasma cell leukaemia | Myeloma |
Total | 438 | | | 438
572 MBCN cases in MCCS cohort reported to Victorian Cancer Registry
Excluded: missing DNA sample n=4; pre-baseline cancer identified by Victorian Cancer Registry n=97
471 cases matched with control
Excluded: after review of pathology reports, 2 additional cases were found to have a pre-baseline haematological malignancy#
469 pairs DNA extracted
Excluded: insufficient DNA n=5
464 assayed with Infinium 450K
Excluded: QC failure n=1
463
Excluded: a duplicated case-control pair identified, both pairs removed
461
Excluded: sex discrepancy in 20 cases and 6 controls, 23 matched pairs removed
438 pairs for statistical analysis
Figure 1: Flow diagram of study participants # One case had a pre-baseline diagnosis of chronic lymphocytic leukaemia and one case had a concurrent diagnosis of MBCN and chronic myelomonocytic leukaemia
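As a simple consistency check, the counts in Figure 1 can be reproduced by subtracting each exclusion in turn. The sketch below uses only the numbers shown in the figure; the variable and function names are hypothetical, and it should be noted that the unit of counting changes from cases to matched pairs part-way through the flow.

```python
# (reason, number removed) in the order shown in Figure 1.
EXCLUSION_STEPS = [
    ("missing DNA sample", 4),
    ("pre-baseline cancer identified by Victorian Cancer Registry", 97),
    ("pre-baseline haematological malignancy found on pathology review", 2),
    ("insufficient DNA", 5),
    ("quality control failure", 1),
    ("duplicated case-control pair, both pairs removed", 2),
    ("sex discrepancy, matched pairs removed", 23),
]


def remaining_after_exclusions(start, steps):
    """Return the running count after each exclusion step."""
    counts = [start]
    for _reason, removed in steps:
        counts.append(counts[-1] - removed)
    return counts


counts = remaining_after_exclusions(572, EXCLUSION_STEPS)
for (reason, removed), remaining in zip(EXCLUSION_STEPS, counts[1:]):
    print(f"-{removed:>3}  {reason}: {remaining} remaining")
```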
Characteristics of the study sample are shown below. The median age at diagnosis was 69 years and median time between baseline DNA collection and MBCN diagnosis was 10.6 years (range 2.4 months – 20 years). There was no significant difference between cases and controls in body mass index, smoking status, folate intake or alcohol consumption.
Table 17: Demographics of study population
Characteristic | Category or statistic | Controls (N=438) | Cases (N=438)
Age at enrolment (years)* | Mean | 58.7 | 58.8
 | SD | 7.8 | 7.9
 | Range | 40-70 | 40-70
DNA source* | Dried blood spot | 316 | 316
 | Lymphocyte | 117 | 117
 | Buffy coat | 5 | 5
Age at diagnosis (years)* | Mean | NA | 76.9
 | SD | NA | 7.8
 | Range | NA | 56-91
Ethnicity* | Anglo-Celtic | 340 | 340
 | Southern European | 98 | 98
Body Mass Index (BMI)* | Mean | 27.3 | 27.1
 | SD | 4.4 | 4.3
BMI category | BMI ≥30 | 87 | 87
Smoking status* | Never smoked | 246 | 241
 | Former smoker | 151 | 152
 | Current smoker | 37 | 43
Daily folate intake (mcg/day)* | Mean | 328 | 343
 | SD | 140 | 177
Folate intake category* | <320mcg/day | 216 | 226
Daily alcohol intake* | 0g/day | 160 | 158
 | 1-39g/day | 215 | 228
 | 40-59g/day | 40 | 37
 | ≥60g/day | 23 | 15
* p=ns between controls and cases
4 Methods
4.1 DNA source and sample collection
Blood samples were collected at baseline entry to the study, prior to any diagnosis with cancer. For 636 samples, the DNA source was from dried blood spots. For dried blood spot samples, whole blood was collected and transferred to Guthrie Card Diagnostic Cellulose filter paper (Whatman, Kent, UK) and stored in airtight containers at room temperature. For 234 samples, the DNA source was peripheral blood mononuclear cells and for ten samples the DNA source was buffy coats. Mononuclear cells were isolated by density gradient centrifugation method using Ficoll-Paque Plus (GE Healthcare, Parramatta, Australia). Briefly, whole blood was centrifuged and plasma was removed. The red cell fraction was carefully transferred onto Ficoll and spun down for 20 min without applying a brake. After the spin, the mononuclear cell layer was transferred and washed with RPMI medium. Mononuclear cells and buffy coats were stored in liquid nitrogen at -80°C since collection.
4.2 DNA Extraction and Bisulfite conversion
DNA extraction and bisulfite conversion were performed in the Genetic Epidemiology Laboratory at The University of Melbourne. DNA was extracted from lymphocytes and buffy coats in a 96-well format using the QIAamp 96 DNA blood kit (Qiagen, Hilden, Germany). DNA was extracted from dried blood spot samples using a published method (202) whereby 21x3.2mm blood spot punches were re-suspended in water and lysed in phosphate-buffered saline using Tissue Lyser (Qiagen). The resulting supernatant was processed using Qiagen mini spin columns according to the manufacturer’s protocol. The quality and quantity of DNA were confirmed using the Quant-iT™ PicoGreen® dsDNA assay measured on the Qubit® Fluorometer (Life Technologies, NY, USA). The ideal quantity of DNA was 1 µg; 300 ng was the minimum quantity considered acceptable for methylation analysis, with the remaining DNA stored for other applications such as genotyping assays. Where an insufficient quantity of DNA was extracted, samples were reassessed and DNA re-isolated where possible. If the DNA source from a control sample was exhausted, replacement controls were requested from MCCS. Where the DNA source from a case was exhausted, the case-control set was dropped.
Bisulfite conversion was performed using Zymo Gold single tube kit (EZ DNA Methylation-Gold kit, Zymo Research, CA, USA) according to the manufacturer’s instructions. Post-conversion quality control was performed using SYBR Green (Applied Biosystems) quantitative PCR, an in-house assay, designed to determine the success of bisulfite conversion by comparing amplification of the test sample with positive and negative controls.
4.3 DNA methylation measurement
Plate Design
Samples were processed in batches of 96 samples per plate (8 Infinium HumanMethylation450K BeadChips per batch). A total of 200 ng of bisulfite converted DNA was used per sample.
Array-based technologies are subject to intra- and inter-run batch effect due to environmental factors (e.g., temperature, humidity), reagent variability or operator effect. The following principles were therefore adhered to:
1. Matched case and control samples were always processed together and run on the same plate, placed consecutively to minimise potential batch effects.
2. MBCN tumour subtypes were distributed evenly across all plates, to reduce the risk of incorrectly attributing difference in plate-to-plate measurement to differences between tumour subtypes.
3. Two technical replicates and two cell lines (multiple myeloma cell lines U266) were included per 92 samples to check reproducibility of the assay.
4. The positions for cases and controls were randomly distributed on each BeadChip in order to minimise any position effects within the chips.
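The plating principles above can be illustrated with a short sketch. It is not the laboratory's actual procedure: the function name `layout_plate`, the sample identifiers and the fixed random seed are hypothetical, and balancing of tumour subtypes across plates (principle 2) is deliberately left out. It assumes 12 sample positions per BeadChip and 8 BeadChips per plate, keeps each matched pair in adjacent wells on the same chip, and randomises both the order of pairs and the case/control order within each pair.

```python
import random

WELLS_PER_CHIP = 12   # assumed: 12 sample positions per BeadChip
CHIPS_PER_PLATE = 8   # 8 chips x 12 wells = 96 wells per plate


def layout_plate(pairs, qc_samples, seed=0):
    """Return {sample_id: (chip, well)} for one 96-well plate.

    pairs      -- list of (case_id, control_id) tuples (hypothetical IDs)
    qc_samples -- technical replicate / cell-line IDs, placed after the pairs
    """
    capacity = WELLS_PER_CHIP * CHIPS_PER_PLATE
    if 2 * len(pairs) + len(qc_samples) > capacity:
        raise ValueError("plate capacity exceeded")
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)            # random order of pairs across the plate
    ordered = []
    for case_id, control_id in shuffled:
        members = [case_id, control_id]
        rng.shuffle(members)         # random case/control order within a pair
        ordered.extend(members)      # pair members occupy adjacent wells
    ordered.extend(qc_samples)
    # Index i maps to chip i // 12 and well i % 12; pairs start at even
    # indices, so a pair never straddles two chips.
    return {s: divmod(i, WELLS_PER_CHIP) for i, s in enumerate(ordered)}


pairs = [(f"case{i}", f"ctrl{i}") for i in range(46)]
layout = layout_plate(pairs, ["rep1", "rep2", "U266_a", "U266_b"])
```

With 46 pairs (92 samples) and four quality-control wells the plate is exactly full, which matches the "per 92 samples" figure quoted in principle 3.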
Methylation Analysis
The Infinium HumanMethylation450 BeadChip (Illumina, San Diego, CA, USA) was used for methylation analysis in accordance with the manufacturer’s instructions. In further detail, bisulfite-converted DNA was denatured and neutralised, and isothermal whole genome amplification took place overnight. The amplified product was fragmented by a controlled enzymatic process that uses end-point fragmentation to avoid over-fragmenting the sample. After isopropanol precipitation, the fragmented DNA was collected by centrifugation and resuspended in hybridisation buffer. Resuspended DNA samples were distributed onto the BeadChips for hybridisation.
After hybridisation, a washing step was performed to remove unhybridised DNA and ready the BeadChips for primer extension and staining. The TECAN automated liquid handler (Tecan Group Ltd, Männedorf, Switzerland) was used for the single base extension and staining steps. Single base extension of the oligos on the BeadChip incorporated nucleotides labelled with biotin (ddCTP and ddGTP) and 2,4-dinitrophenol (ddATP and ddTTP). Staining with antibodies and fixation with fluorophores were performed prior to scanning with the Illumina HiScan SQ scanner.
4.4 Data processing
Methylation data were imported into the R statistical software (219) and processed using the ‘minfi’ Bioconductor package (220). Background correction and between-sample normalization were performed using the ‘preprocessIllumina’ function of minfi, which is equivalent to the standard normalization methods used in the GenomeStudio software package provided by Illumina. Subset-quantile within array normalization (SWAN) was performed for type I and type II probe bias correction (187). Illumina normalization to control for position effect, SWAN correction to correct for type I and II probe bias and ComBat normalization to correct for chip and batch effect are now part of a standard pipeline for analyzing such datasets (221).
Samples were excluded if >5% CpG probes (excluding chrX and chrY probes) had a detection p-value higher than 0.01. CpG probes were excluded from further analysis if they had detection p-value >0.01. ComBat normalization was applied to minimise chip and batch effects (222). Following methylation analysis, using the Infinium HumanMethylation450K BeadChip, only one pair of samples was rejected on the basis of returning >5% of probes with detection p-value >0.01.
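The two exclusion rules can be sketched as follows. This is an illustration only, not the minfi-based code used in the study: the function `qc_filter` and its arguments are hypothetical, the detection p-values are assumed to be held in a plain probe-by-sample nested list, and a probe is assumed to be dropped if it fails in any retained sample. The sketch also excludes individual samples, whereas in the study a failed sample led to rejection of the whole matched pair.

```python
def qc_filter(detp, autosomal, p_cut=0.01, max_fail_frac=0.05):
    """Apply sample-level then probe-level detection p-value filters.

    detp       -- detp[i][j] is the detection p-value of probe i in sample j
    autosomal  -- autosomal[i] is True if probe i is not on chrX or chrY
    Returns (kept_sample_indices, kept_probe_indices).
    """
    n_probes = len(detp)
    n_samples = len(detp[0])
    auto_idx = [i for i in range(n_probes) if autosomal[i]]

    # Sample filter: drop a sample if more than 5% of autosomal probes fail.
    keep_samples = []
    for j in range(n_samples):
        failed = sum(detp[i][j] > p_cut for i in auto_idx)
        if failed <= max_fail_frac * len(auto_idx):
            keep_samples.append(j)

    # Probe filter (assumption): drop a probe that fails in any kept sample.
    keep_probes = [
        i for i in range(n_probes)
        if all(detp[i][j] <= p_cut for j in keep_samples)
    ]
    return keep_samples, keep_probes


# Toy data: 4 probes x 2 samples; the last probe is on a sex chromosome.
detp = [
    [0.001, 0.500],
    [0.002, 0.600],
    [0.003, 0.001],
    [0.020, 0.001],
]
samples, probes = qc_filter(detp, autosomal=[True, True, True, False])
```

In the toy data the second sample fails two of three autosomal probes and is removed, after which the fourth probe is removed because it fails in the remaining sample.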
4.5 CpG site selection
Out of a total of 485,512 sites covered by the Infinium 450k array, after quality control, 40,961 (8.4%) had a detection p-value >0.01 and were excluded from analysis. All CpG sites on sex chromosomes were also excluded. The resulting 444,551 CpG sites were included in the analysis of global DNA methylation. For the analysis of differentially methylated positions and regions, SNP-associated CpG sites and non-CpG-targeted sites were removed, leaving 416,669 CpG sites considered in the analysis of differentially methylated positions.
Figure 2: Strategy for selecting CpG sites to include in the final analysis
4.6 Assembly of Candidate Genes
Genes identified as mutated in MBCN
Following the literature review outlined in Chapter 2, a list of 201 known mutations occurring in MBCN was compiled and used to evaluate the relevance of methylation findings (Appendix Table 1). The presence of mutations in the tumour tissue leads to the possibility that they are important in some element of the pathogenesis of MBCN, either through activating or deactivating function. However, only a smaller proportion is likely to comprise actual driver mutations and some are only found in small percentages of MBCN cases. Identification of known mutations in MBCN is important in order to consider two possible relationships between gene mutation and methylation.
First, a finding of aberrant methylation in these candidate genes may indicate that gene regulation via DNA methylation is an important alternative non-mutation pathway in the pathogenesis of MBCN.
Second, an alternative possibility is that a finding of aberrant methylation in these genes could be a downstream result of a mutated gene within circulating tumour cells rather than of a primary DNA methylation abnormality itself. Ideally this would be confirmed by mutation analysis before a definitive conclusion could be drawn.
Genes identified as aberrantly methylated in MBCN
Following the literature review of aberrant methylation in MBCN described in Chapter 2, a list of differentially methylated genes was compiled for comparison with our results (Appendix Table 2).
5 Results
5.1 Global DNA Methylation
Wong Doo, et al. Global measures of peripheral blood-derived DNA methylation as a risk factor in the development of mature B-cell neoplasms. Epigenomics 2016 (1), 55- 56
Reproduced with permission from Epigenomics as agreed by Future Medicine Ltd.
Research Article Wong Doo, Makalic, Joo et al.
to detect novel regions of differential methylation associated with a cancer phenotype. Although any tissue can be used, DNA from peripheral blood offers the advantage of being a readily available tissue from which new biomarkers for the prevention, early detection and monitoring of cancer could be identified [17]. To date, several retrospective case–control studies have demonstrated aberrant global methylation profiles in the peripheral blood of patients with colorectal cancer, breast cancer, head and neck cancers and urothelial cancers [18] but there are no published prospective studies regarding MBCN. The only published study of DNA methylation in peripheral blood in MBCN is a retrospective case–control study of follicular lymphoma measuring DNA methylation in the DAPK1 gene only [19].
To our knowledge, ours is the first prospective study to describe the importance of DNA methylation profile measured in peripheral blood to the risk of MBCN tumors. We evaluated a global measure of DNA methylation as a predictor of subsequent development of MBCN using the Infinium® HumanMethylation450 BeadChip (Infinium 450k; Illumina, CA, USA) which is suitable for epigenetic epidemiological studies due to the small amounts of DNA required, the high-throughput capability and the ability to interrogate methylation at single base pair resolution [20]. Our aims were to assess whether a global measure of DNA methylation dispersed over a large portion of the genome could be detected in DNA from peripheral blood, and whether such a measure was associated with the risk of developing MBCN.
Materials & methods
Participants
Participants were enrolled in the Melbourne Collaborative Cohort Study (MCCS), a prospective cohort study of 41,514 healthy adult volunteers (24,469 women, 17,045 men) aged between 27 and 76 years (99.3% aged 40–69 years), recruited between 1990 and 1994 [21]. Peripheral blood samples were obtained from participants at study entry (baseline). For this study, those with a history of cancer before baseline (n = 1970) or no baseline DNA sample (n = 420) were ineligible, leaving 39,124 eligible participants. Incident cases of MBCN diagnosed between baseline attendance and 31 December 2011 were identified by linkage to the Victorian Cancer Registry, which receives mandatory notification of all new cancer cases in Victoria, Australia. The diagnostic pathology reports were reviewed and classified according to the International Classification of Disease (ICD-O-3). Controls were selected using density sampling with attained age as the time scale and were individually matched to cases at a 1:1 ratio based on age at enrollment (±1 year), gender, ethnicity and DNA source (see below).
We considered baseline smoking, folate and alcohol intake and BMI as potential confounding variables. Information on smoking and alcohol consumption was obtained from an interviewer-administered questionnaire. Folate intake was assessed by a 121-item food frequency questionnaire specially designed for the MCCS. BMI was calculated from height and weight, which were measured at baseline.
Study participants provided written, informed consent. The study was approved by Cancer Council Victoria’s Human Research Ethics Committee and performed in accordance with the institution’s ethical guidelines.
DNA source & sample collection
Blood samples were collected at baseline entry into the study, prior to any diagnosis with cancer. For 636 samples, the DNA source was from dried blood spots (DBS). For DBS samples, whole blood was collected and transferred to Guthrie Card Diagnostic Cellulose filter paper (Whatman, Kent, UK) and stored in airtight containers at room temperature. Methylation profiles from DBS using the Infinium HumanMethylation450 BeadChip are highly reproducible between technical replicates (means correlation, r = 0.9932) and compared with methylation from buffy coat samples from the same individuals (r = 0.9932) [20]. For 234 samples, the DNA source was peripheral blood mononuclear cells and for ten samples the DNA source was buffy coats. Both had been stored at -80°C since collection. Mononuclear cells were isolated by density gradient centrifugation method using Ficoll-Paque Plus (GE Healthcare, Parramatta, Australia). Briefly, whole blood was centrifuged and plasma was removed. The red cell fraction was carefully transferred onto Ficoll and spun down for 20 min without applying brake. After the spin, the mononuclear cell layer was transferred and washed with RPMI. The mononuclear cells were stored in liquid nitrogen.
DNA extraction & bisulfite conversion
DNA was extracted from mononuclear cells and buffy coat specimens using QIAamp mini spin columns (Qiagen, Hilden, Germany). DNA was extracted from dried blood spots using a published method [20]. Briefly, 20 blood spots of 3.2 mm diameter were punched from the Guthrie card and lysed in phosphate-buffered saline using Tissue Lyser (Qiagen). The resulting supernatant was processed using Qiagen mini spin columns according to the manufacturer’s protocol. The quality and quantity of DNA was assessed using the Quant-iT™ Picogreen®ds DNA assay measured on the Qubit® Fluorometer (Life Technologies, NY, USA), with a minimum of 0.3 μg
56 Epigenomics (2016) 8(1) future science group
Global measures of peripheral blood-derived DNA methylation in mature B-cell neoplasms Research Article
DNA considered acceptable for methylation analysis. Bisulfite conversion was performed using Zymo Gold single tube kit (EZ DNA Methylation-Gold kit, Zymo Research, CA, USA) according to the manufacturer’s instructions. Postconversion quality control was performed using SYBR green-based quantitative PCR, an in-house assay, designed to determine the success of bisulfite conversion by comparing amplification of the test sample with positive control and negative controls.
DNA methylation assay
Samples were processed in batches of 96 samples per plate (8 Infinium HumanMethylation450 BeadChips per batch). In order to minimize potential batch effects, matched cases and controls were processed together and run on the same BeadChip and cancer subtypes were evenly distributed across the plates/chips. Two technical replicates and two controls (multiple myeloma cell line U266) were included on each plate. The positions for cases and controls were randomly ordered for every BeadChip to reduce any possible position effects within the chips. The Infinium HumanMethylation450 BeadChip analysis was performed according to manufacturer’s instructions. A total of 200 ng of bisulfite converted DNA was whole genome amplified and hybridized onto the BeadChips. The TECAN automated liquid handler (Tecan Group Ltd, Männedorf, Switzerland) was used for the single-base extension and staining BeadChip steps. Bisulfite conversion and the Infinium 450k methylation assay were performed in accordance with the manufacturer’s instructions.
Data processing
Methylation data were imported into R statistical software [22] and processed using ‘minfi’ bioconductor package [23]. Background correction and between-sample normalization were performed using the ‘preprocessIllumina’ function of minfi which is equivalent to standard normalization methods used in GenomeStudio software package provided by Illumina. Subset-quantile within array (SWAN) was performed for type I and II probe bias correction [24]. We excluded 65 CpG sites corresponding to known single nucleotide polymorphisms. Samples were excluded if >5% CpG probes (excluding chrX and chrY probes) had a detection p-value higher than 0.01, which were regarded as probes with ‘missing value’, while CpG sites were excluded from further analysis if they had missing values for one or more samples. ComBat normalization was applied to minimize chip and batch effects [25]. Illumina normalization to control for position effect, SWAN correction to correct for type I and II probe bias and ComBat normalization to correct for chip and batch effect have been previously used as a pipeline for analyzing such datasets [26]. Following methylation analysis, using the Infinium HumanMethylation450 BeadChip, only one pair was rejected on the basis of returning >5% of probes with detection p-value > 0.01. A total of 19,834 probes was identified as hybridizing to multiple genomic locations [27], and analysis was performed before and after exclusion in order to avoid unexpected effect of these cross-hybridizing probes on global methylation.
Statistical analysis
The M-value is defined as log2(Meth/Unmeth), where Meth and Unmeth are the intensities of the methylated and unmethylated probes, respectively. M-values and β-values were calculated using minfi [23]. Statistical analyses were performed using M-values due to the possibility of heteroscedasticity encountered when using β-values [28]. We calculated genome-wide global DNA methylation using all 444,551 probes and global DNA methylation in specific regions of the genome. Using the annotation file provided by Illumina, CpG sites were classified according to their location in CpG islands, CpG shores or shelves or other (‘open sea’ locations). Promoter regions were defined as loci 1500 bp upstream of the transcription start site, within enhancer-associated regions and within the 5′UTR. Promoter regions were further divided according to their CpG content and ratio, as differential CpG content within promoter regions is known to influence methylation profile and gene expression [29,30]. High CpG promoters (HCP), intermediate CpG promoters (ICP) and low CpG promoters (LCP), as described by Weber et al. [29], were analyzed using a published annotation file [31].
Stata Version 10 (StataCorp, TX, USA) was used for statistical analysis. For each sample, the median M-value across all probes was defined as the global measure of DNA methylation and grouped into tertiles based on the distribution in controls. Hypomethylation was defined as the lowest tertile of M-values and hypermethylation defined as the highest tertile of M-values. Conditional logistic regression was used to estimate odds ratios (OR) in relation to tertiles of global DNA methylation, with the middle tertile as the reference category. Variables adjusted for in conditional logistic regression were: gender, age at enrollment, ethnicity and DNA source. Associations were also assessed by including the median M-value as a continuous explanatory variable. p-values from the likelihood ratio test were reported. The likelihood ratio test was also used to test for subgroup heterogeneity.
The majority of blood in our study was collected and stored as whole peripheral blood in which the
future science group www.futuremedicine.com 57 62
proportions of leukocytes was not known. Due to the described effect of differential leukocyte cell content on DNA methylation profile [32] we sought to control for leukocyte heterogeneity. We therefore applied the algorithm of Houseman et al. [32] in which the cell composition of peripheral blood samples is estimated based on distinctive methylation profiles. This was performed in minfi. A paired t-test was performed comparing leukocyte proportions (B lymphocytes, T lymphocytes, natural killer (NK) cells, monocytes, granulocytes) in cases and controls. Leukocytes with significantly different proportions in cases and controls were factored into conditional logistic regression to determine whether adjustment for cell composition affected the OR.
The effect of the time interval between baseline and MBCN diagnosis on the ORs was analyzed by group-
Table 1. Characteristics of study sample.
Demographics                  Controls (n = 438)   Cases (n = 438)   p-value
Age at enrollment (years):
  – Median                    59                   59                †
  – SD                        8                    8
  – Range                     40–70                40–70
DNA source:
  – Dried blood spot          316                  316               †
  – Mononuclear cell          117                  117               †
  – Buffy coat                5                    5                 †
Age at diagnosis (years):
  – Median                    69                   69
  – SD                        9                    9
  – Range                     42–87                42–87
Tumor subtype:
  – MM                                             109
  – Follicular lymphoma                            81
  – Low-grade NHL/CLL                              136
  – High-grade NHL                                 112
Ethnicity:
  – Anglo-Celtic              340                  340               †
  – Southern European         98                   98
BMI:‡
  – BMI ≥30                   87                   87                0.58
Smoking status‡:
  – Never smoked              246                  241               0.53
  – Former smoker             151                  152
  – Current smoker            37                   43
Daily folate intake‡:
  – <320 μg/day               216                  226               0.11
Daily alcohol intake‡:
  – 0 g/day                   160                  158               0.54
  – 1–39 g/day                215                  228
  – 40–59 g/day               40                   37
  – ≥60 g/day                 23                   15
†Variables matched for cases/controls. ‡Measured at baseline. CLL: Chronic lymphocytic leukemia; MM: Multiple myeloma; NHL: Non-Hodgkin’s lymphoma; SD: Standard deviation.
Table 2. Comparison of estimated leukocyte proportions in cases and controls.
Leukocyte breakdown   Mean (cases)   Mean (controls)   SE (cases)   SE (controls)   Difference in means   95% CI               p-value
All samples (n = 876)
  T lymphocytes       0.286          0.287             0.005        0.005           -0.001                (-0.013 to 0.011)    0.863
  NK cells            0.102          0.957             0.004        0.005           0.007                 (-0.002 to 0.015)    0.127
  B lymphocytes       0.107          0.081             0.005        0.002           0.026                 (0.016 to 0.035)     <0.001
  Monocytes           0.079          0.078             0.002        0.002           0.001                 (-0.005 to 0.007)    0.786
  Granulocytes        0.478          0.5               0.008        0.007           -0.022                (-0.042 to -0.002)   0.029
Dried blood spot samples (n = 632)
  T lymphocytes       0.266          0.277             0.004        0.004           -0.01                 (-0.021 to 0.001)    0.051
  NK cells            0.076          0.07              0.002        0.002           0.006                 (-0.001 to 0.012)    0.067
  B lymphocytes       0.103          0.083             0.005        0.002           0.02                  (0.009 to 0.030)     <0.001
  Monocytes           0.073          0.072             0.001        0.001           0.001                 (-0.002 to 0.005)    0.41
  Granulocytes        0.524          0.54              0.006        0.005           -0.016                (-0.031 to -0.001)   0.033
Mononuclear cell samples
  T lymphocytes       0.348          0.323             0.013        0.012           0.025                 (-0.007 to 0.057)    0.128
  NK cells            0.178          0.169             0.009        0.01            0.01                  (-0.014 to 0.034)    0.413
  B lymphocytes       0.116          0.08              0.011        0.004           0.037                 (0.015 to 0.059)     0.001
  Monocytes           0.097          0.097             0.007        0.007           <0.001                (-0.020 to 0.020)    1
  Granulocytes        0.346          0.371             0.022        0.02            -0.05                 (-0.106 to 0.006)    0.08
Direct evaluation of leukocyte content was not available, therefore leukocyte proportions were estimated by applying an algorithm based on the distinctive methylation profiles of each leukocyte. Leukocyte proportions were compared in cases and controls for all sample types as well as for whole blood samples stored as dried blood spots and purified mononuclear cell samples. Leukocyte proportions estimated according to methylation profile [32]. NK: Natural killer; SE: Standard error.
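The comparison in Table 2 rests on a paired t-test across matched case-control pairs. A minimal sketch of that calculation is given here, using only the Python standard library. The function name `paired_t` and the proportions in the usage example are invented for illustration and are not study data; the published analysis was run in statistical software, not with this code.

```python
import math
from statistics import mean, stdev


def paired_t(cases, controls):
    """Paired t-test summary for matched case/control measurements.

    Returns (mean difference, standard error, t statistic, degrees of freedom).
    cases[i] and controls[i] must belong to the same matched pair.
    """
    diffs = [a - b for a, b in zip(cases, controls)]
    n = len(diffs)
    d_bar = mean(diffs)                   # mean within-pair difference
    se = stdev(diffs) / math.sqrt(n)      # standard error of that mean
    return d_bar, se, d_bar / se, n - 1


# Hypothetical B lymphocyte proportions for four matched pairs.
case_b = [0.12, 0.10, 0.11, 0.09]
control_b = [0.08, 0.09, 0.07, 0.08]
d_bar, se, t_stat, df = paired_t(case_b, control_b)
```

The p-value and confidence interval follow from comparing the t statistic with a t distribution on `df` degrees of freedom, which is left to a statistics package.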
ing the interval into <5, 5–9 and >9 years. For controls, the time interval was calculated as the time between DNA collection and the date at which the matched case was diagnosed with MBCN.
DNA methylation within several target genes known to be hypermethylated in MBCN was analyzed (HOXA9, CDH1, CDH13, ADAMTS18 and PCDH10) [33–36]. A conditional logistic regression model was used to identify CpG probes differentially methylated in cases compared with controls. We assigned a threshold of significance for differential methylation as p < 10⁻⁸, in keeping with the cut-off used in genome-wide association studies [37]. This is a relatively strict threshold comparable to other multiple testing hypotheses such as the Bonferroni procedure and false discovery rate. Given the exploratory nature of this prospective study of methylation, we also adopted an inclusive approach and considered probes with p < 10⁻⁵.
Results
During follow-up (median 10.6 years; range 2.4 months to 20 years), 471 MBCN cases were identified. Out of these, 24 had no suitable controls. Pairs were excluded if either the case or control had inadequate sample after DNA extraction (n = 7) or high detection p-values after data processing (n = 1), leaving 438 matched pairs for analysis.
MBCN cases were grouped into four major subtypes: MM (n = 111), follicular NHL (n = 81), high-grade NHL (n = 111), comprising diffuse large B-cell lymphoma (n = 110) and Burkitt lymphoma (n = 1) and low-grade NHL (n = 136), comprising CLL and small lymphocytic lymphoma (n = 86), splenic marginal zone lymphoma (n = 6), mantle cell lymphoma (n = 16), marginal zone lymphoma (n = 15), Waldenstrom’s macroglobulinemia (n = 9) and hairy cell leukemia (n = 4). Characteristics of participants and tumors are outlined in Table 1.
The proportions of T lymphocytes, NK cells, B lymphocytes, monocytes and granulocytes in blood samples were estimated (Table 2, complete data in Supplementary Table). There were significant differences in the proportions of estimated B lymphocytes and granulocytes in cases compared with controls. Further analyses therefore included an adjustment for the
Table 3. Odds ratios for mature B-cell neoplasms in relation to hypomethylation and hypermethylation.†
                                                              Without cell content adjustment          With cell content adjustment
CpG location                      Global methylation level    OR (95% CI)‡        LR test p-value      OR (95% CI)‡        LR test p-value
Genome-wide                       Hypomethylated              2.27 (1.59–3.25)    <0.001               2.01 (1.39–2.93)    <0.001
                                  Hypermethylated             0.99 (0.68–1.44)                         0.88 (0.60–1.30)
Regulatory region:
– Nonpromoter regions             Hypomethylated              1.66 (1.18–2.34)    <0.001               1.55 (1.07–2.25)    <0.001
                                  Hypermethylated             0.67 (0.46–1.00)                         0.61 (0.41–0.92)
– Promoter regions                Hypomethylated              1.00 (0.71–1.40)    0.01                 0.92 (0.64–1.33)    <0.001
                                  Hypermethylated             1.54 (1.10–2.13)                         1.48 (1.04–2.11)
CpG density:
– High CpG density                Hypomethylated              1.26 (0.90–1.78)    0.004                1.29 (0.90–1.85)    <0.001
                                  Hypermethylated             1.76 (1.25–2.48)                         1.74 (1.21–2.48)
– Intermediate CpG density        Hypomethylated              1.01 (0.72–1.40)    0.70                 0.95 (0.65–1.39)    <0.001
                                  Hypermethylated             1.14 (0.81–1.61)                         1.06 (0.72–1.57)
– Low CpG density                 Hypomethylated              1.89 (1.34–2.65)    <0.001               1.79 (1.23–2.58)    <0.001
                                  Hypermethylated             0.89 (0.61–1.29)                         0.81 (0.55–1.20)
CpG distribution:
– CpG island                      Hypomethylated              1.11 (0.78–1.59)    <0.001               1.07 (0.74–1.54)    <0.001
                                  Hypermethylated             1.89 (1.35–2.65)                         1.82 (1.27–2.60)
– CpG shore or shelf              Hypomethylated              2.03 (1.45–2.84)    <0.001               1.89 (1.33–2.70)    <0.001
                                  Hypermethylated             1.00 (0.68–1.47)                         0.93 (0.63–1.38)
– Neither island, shore or shelf  Hypomethylated              1.78 (1.26–2.52)    <0.001               1.65 (1.31–2.41)    <0.001
                                  Hypermethylated             0.78 (0.54–1.13)                         0.69 (0.47–1.01)
The association between methylation level and mature B-cell malignancies is reported for all CpG probes within the Infinium® HumanMethylation450 BeadChip genome-wide as well as for different regulatory regions and CpG density regions. †OR calculated from conditional logistic regression accounting for age, gender, ethnicity and DNA source. ‡OR are calculated relative to the middle tertile of methylation in the controls. LR: Likelihood ratio; OR: Odds ratio.
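The exposure categories in Table 3 come from each sample's median M-value, grouped into tertiles defined by the controls. The sketch below shows one way to reproduce that grouping. It is an illustration under stated assumptions: the helper names are hypothetical, intensities are assumed to be strictly positive, and the tertile cut-points use the default interpolation of `statistics.quantiles`, which may differ slightly from the rule applied in Stata.

```python
import math
from statistics import median, quantiles


def global_m_value(meth, unmeth):
    """Median of log2(Meth/Unmeth) across probes for one sample."""
    return median(math.log2(m / u) for m, u in zip(meth, unmeth))


def tertile_cutpoints(control_values):
    """Lower and upper tertile boundaries from the control distribution."""
    low, high = quantiles(control_values, n=3)
    return low, high


def classify(value, cutpoints):
    """Label a sample relative to the control tertiles."""
    low, high = cutpoints
    if value <= low:
        return "hypomethylated"      # lowest tertile
    if value > high:
        return "hypermethylated"     # highest tertile
    return "reference"               # middle tertile


# Hypothetical global M-values for nine controls.
cuts = tertile_cutpoints([1, 2, 3, 4, 5, 6, 7, 8, 9])
labels = [classify(v, cuts) for v in (2.0, 5.0, 8.0)]
example_m = global_m_value([8.0, 4.0, 2.0], [2.0, 2.0, 2.0])
```

Defining the cut-points on controls only means that, by construction, about one third of controls fall in each category, so any excess of cases in the lowest or highest tertile shows up directly in the odds ratio.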
B lymphocyte and granulocyte content of the sample in order to account for the potential systematic effect of different cell composition on DNA methylation.
Overall, genome-wide hypomethylation, defined as the lowest tertile of methylation values across all CpG sites analyzed was associated with increased risk of MBCN (p = 7.20 × 10⁻⁶; OR: 2.27 [95% CI: 1.59–3.25]) whereas genome-wide hypermethylation showed no association (OR: 0.99 [0.68–1.44]) (Table 3). Repetitive elements were analyzed, including SINE, LINE and long terminal repeat regions. Regions containing repetitive elements were associated with hypomethylation (OR: 1.84 [95% CI: 1.31–2.59]; p < 0.001) and persisted after correction for cell content (OR: 1.74 [95% CI: 1.20–2.52]; p = 0.03).
Methylation in promoter & nonpromoter regions
Within nonpromoter regions, hypomethylation was associated with increased risk of developing MBCN (p = 3.5 × 10⁻³; OR: 1.66 [95% CI: 1.18–2.34]). The OR was similar after correction for B cell and granulocyte content. Within promoter regions as a whole, higher global methylation was associated with increased risk of MBCN (OR: 1.54 [95% CI: 1.10–2.13]), and for the subgroup of high CpG promoter regions (HCPs), hypermethylation was associated with increased risk of MBCN (OR: 1.76 [95% CI: 1.25–2.48]). In contrast, for low CpG promoter regions (LCPs), lower global methylation was associated with increased risk of MBCN (OR: 1.89 [95% CI: 1.34–2.65]).
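To make the odds ratios quoted above more concrete: for 1:1 matched pairs and a single binary exposure with no further covariates, the conditional maximum-likelihood estimate of the odds ratio reduces to the ratio of the two kinds of discordant pairs. The sketch below shows only that special case. It is not the model fitted in the article, which used three tertile categories and adjustment for cell content, and the function name and pair counts are hypothetical.

```python
import math


def matched_pair_or(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs with a binary exposure.

    pairs -- sequence of (case_exposed, control_exposed) booleans
    Returns (odds_ratio, (ci_low, ci_high)) using a Wald interval on log(OR).
    """
    pairs = list(pairs)
    # Only discordant pairs carry information about the odds ratio.
    b = sum(1 for case, ctrl in pairs if case and not ctrl)
    c = sum(1 for case, ctrl in pairs if ctrl and not case)
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined without both discordant types")
    odds_ratio = b / c
    se_log = math.sqrt(1 / b + 1 / c)
    ci = (math.exp(math.log(odds_ratio) - z * se_log),
          math.exp(math.log(odds_ratio) + z * se_log))
    return odds_ratio, ci


# Hypothetical counts: exposure = lowest tertile of global methylation.
example_pairs = ([(True, False)] * 30 + [(False, True)] * 15
                 + [(True, True)] * 10 + [(False, False)] * 20)
or_est, (ci_low, ci_high) = matched_pair_or(example_pairs)
```

Concordant pairs drop out of the estimate entirely, which is why matching on age, gender, ethnicity and DNA source removes those factors from the comparison without needing explicit coefficients for them.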
Methylation in CpG islands, shores & shelves
Within CpG Islands, where CpG content is highest, hypermethylation was associated with increased risk of developing MBCN (OR: 1.89 [95% CI: 1.35–2.65]). In contrast, within CpG shores/shelves, hypomethylation was associated with the risk of MBCN (p = 3.3 × 10⁻⁵, OR: 2.03 [95% CI: 1.45–2.84]). Similarly, in regions that are neither CpG islands, shores nor shelves, the ‘open sea’ regions, hypomethylation was associated with increased risk of MBCN (OR: 1.78 [95% CI: 1.24–2.52]).
Tumor subgroups
There was no significant heterogeneity in associations between the different tumor subgroups, with MM, follicular NHL, high-grade NHL and low-grade NHL all demonstrating similar patterns of global methylation associated with risk (p = 0.36).
In order to account for the known effect of some lifestyle and environmental factors on DNA methylation, a separate analysis adjusting for BMI, dietary folate intake and smoking status was performed and did not affect the magnitude of association nor p value significance levels. Analysis after adjustment for differences in B-cell and granulocyte content in blood samples was performed and made no material difference to the ORs.
Time from blood collection to diagnosis
The time from baseline blood sampling to diagnosis with MBCN was less than 5 years for 83 (20%), 5–9 years for 114 (26%) and ≥10 years for 239 (54%) cases. There was no evidence that the association between global methylation and MBCN risk was stronger when blood was sampled closer to the time of diagnosis (phet = 0.19) (Table 4).
Methylation in promoter regions of tumor suppressor genes
We analyzed DNA methylation in CpG probes associated with HOXA9, CDH1, CDH13, ADAMTS18 and PCDH10, genes known to exhibit promoter methylation in MBCN tumor tissue. Within the HOXA9 gene, the Infinium HumanMethylation450 BeadChip maps 28 CpG sites; four probes demonstrated significant hypermethylation in cases compared with controls (p < 10⁻⁸) with three probes lying in the exon region and one probe in the 5′UTR (Table 5). There are eight further hypermethylated probes within the HOXA9 gene reaching a lower level of statistical significance (p < 10⁻⁵) with four probes in the promoter region. In PCDH10, five of 24 probes within 1500 bp of the transcription start site were hypermethylated (p < 10⁻⁵, Table 5). In CDH1, two of 21 probes within the gene body were hypermethylated (p < 10⁻⁵, Table 5) and one probe was hypermethylated (p < 10⁻⁵) in the 3′UTR. In CDH13, six of 64 probes within the gene body were hypomethylated (p < 10⁻⁵) (Table 5).
Discussion
Using a case–control study nested within a prospective cohort study, and blood collected years prior to diagnosis with MBCN, we were able to demonstrate for the first time an association between genome-wide global DNA hypomethylation and the risk of developing MBCN (OR: 2.27 for lowest tertile vs middle tertile of global methylation). Further, we were able to characterize distinct patterns of global DNA methylation according to the functional status and genomic location of CpG sites. Within promoter regions containing high CpG density (HCP), hypermethylation was associated with increased risk of MBCN. HCP regions contain a high proportion of gene promoters within CpG islands and in normal tissue are overwhelmingly protected from methylation [6,29]. The loss of constitutive protection from DNA methylation within HCP promoter regions has been described in MM, follicular NHL and DLBCL tissues [11,38]. Its role in the development of MBCN has been uncertain to date, but our finding of promoter hypermethylation in pre-diagnostic blood samples suggests it is an early rather than late event.
Hypermethylation of specific gene promoters in MBCN, such as HOXA gene family in B-NHL, tumor suppressor genes CDH1, CDH13 and ADAMTS18 in DLBCL and mantle cell lymphoma, and PCDH10 in NHL and MM, is described [33–36], suggesting that HCP hypermethylation, perhaps through epigenetic silencing of tumor suppressor genes, is important in MBCN pathogenesis. We confirmed that within the HOXA9 gene, there was one CpG probe within the promoter region hypermethylated in cases compared with controls with a genome-wide level of significance (p < 10⁻⁸). On relaxing the criteria for significance to p < 10⁻⁵, there were four additional probes hypermethylated within the promoter region. Within the PCDH10 gene, we confirmed there were four hypermethylated CpG probes within the promoter region (p < 10⁻⁵). It is intriguing that we have been able to detect promoter hypermethylation in genes implicated in MBCN pathogenesis in blood samples collected many years before diagnosis with MBCN. While studies of tumor tissue commonly nominate an arbitrary cut-off for differential methylation of β(case–control) > 0.2, in our analysis, we did not apply a specific cut-off for differential methylation (some studies in tumor tissue nominate a methylation difference of β[case–
control] > 0.2) as the strength of our study is comparison of a large number of matched cases and controls and rigorous control for bias by study design consisting of matched pairs and analytical methods (sample plating methods and use of normalization methods to correct for possible bias). The magnitude of methylation differences in peripheral blood collected years prior to tumor diagnosis is untested in the current literature, and we felt that the methylation thresholds from tumor tissue studies could exclude significant results. We note that the findings of promoter hypermethylation in HOXA9 and PCDH10 were apparent after relaxation of our criteria for genome-wide significance therefore confirmation of our findings in a similar large prospective sample is ideal.
It is increasingly recognized that regions outside CpG islands, particularly CpG shores and shelves, contain higher proportions of differentially methylated regions within tumor tissue [39,40]. We found a strong association between global DNA hypomethylation in CpG shores and shelves that was not present in CpG islands, supporting their relevance to MBCN risk.
[Table 4, partial: Time between baseline and diagnosis. >9 years (n = 263): OR (95% CI) 2.25 (1.42–3.58) and 1.30 (0.79–2.14), likelihood ratio test p-value 0.001. 5–9 years (n = 90): OR (95% CI) 1.43 (0.65–3.15) and 0.70 (0.32–1.54), likelihood ratio test p-value 0.12. Footnote fragment: age, gender, ethnicity and DNA source.]
A major strength of our study is its prospective design, with DNA collected many years before MBCN diagnosis and therefore suggesting that the aberrant methylation patterns detected are an early event. While it is possible that peripheral blood samples from cases could contain circulating tumor cells, the latency between baseline blood collection and diagnosis did not affect the risk of MBCN. Factors known to alter DNA methylation (age, gender, year of birth and ethnicity) were taken into account while lifestyle factors with the potential to affect methylation (dietary folate intake, obesity and smoking status) did not affect risk. The interpretation of DNA methylation from peripheral blood samples requires caution due to the well-