EVALUATION OF MISMATCH REPAIR POLYMORPHISMS AND THEIR CONTRIBUTION TO COLORECTAL CANCER AND ITS SUBSETS

By

Miralem Mrkonjic

A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy

Graduate Department of Laboratory Medicine and Pathobiology

University of Toronto

© Copyright by Miralem Mrkonjic (2009)

ABSTRACT

Evaluation of Mismatch Repair Polymorphisms and Their Contribution to Colorectal Cancer and its Subsets

Doctor of Philosophy, 2009

Miralem Mrkonjic

Department of Laboratory Medicine and Pathobiology, University of Toronto

Colorectal cancer (CRC) is a major source of morbidity and mortality in the Western world.

Approximately 15% of all CRCs develop via the mutator pathway, which results from a deficiency of the mismatch repair (MMR) system and leads to genome-wide microsatellite instability (MSI). MLH1 promoter hypermethylation accounts for the majority of MSI CRCs.

Numerous single nucleotide polymorphisms (SNPs) have been identified in MMR genes; however, their functional roles in affecting the MMR system, and therefore susceptibility to MSI CRCs, are unknown. This study uses a multidisciplinary approach combining molecular genetics, epigenetics, and epidemiology to examine the contribution of MMR gene polymorphisms to CRC. Among a panel of MMR SNPs examined, the MLH1 (-93G>A) promoter polymorphism (rs1800734) was shown to be associated with an increased risk of MSI CRCs in two Canadian populations, Ontario and Newfoundland. Functional studies of the MLH1-93G>A polymorphism indicate that it has weak effects on core promoter activity, although it dramatically reduces the activity of the shorter promoter constructs in a panel of cell lines. Furthermore, the MLH1 gene shares a bi-directional promoter with the EPM2AIP1 gene, and the MLH1-93G>A polymorphism increases the activity of the reverse, EPM2AIP1 promoter. Examination of an alternative role of the MLH1-93G>A polymorphism in MSI-H CRCs led to the evaluation of a 500-kilobase pair chromosome 3 region around the MLH1 gene and the identification of two additional SNPs, rs749072 and rs13098279, which are in strong linkage disequilibrium with rs1800734. All three SNPs showed strong associations with MLH1 promoter methylation, loss of MLH1 protein expression, and MSI-H CRCs in three populations: Ontario, Newfoundland, and Seattle. Such findings potentially implicate genetic susceptibility to DNA methylation. Logistic regression models for MSI-H versus non-MSI-H CRCs demonstrate that models including MLH1 IHC status and MLH1 promoter methylation status fit the data most parsimoniously in all three populations combined; however, when rs1800734/rs749072/rs13098279 was added to this model, the polymorphisms no longer remained significant, indicating that the observed associations of these polymorphisms with MSI-H CRCs occur through their effect on DNA methylation.

This study identified a novel mechanism in which common missense alterations may contribute to complex disease.

ACKNOWLEDGEMENTS

I would like to extend my heartfelt gratitude to my supervisor, Dr. Bharati Bapat, for all her guidance and support over the years and for giving me an opportunity to pursue my interest in cancer research. I have gained much experience and knowledge and was able to grow both as a person and a researcher during my tenure in her laboratory.

I would also like to thank my thesis advisory members, Dr. Julia Knight, Dr. Hilmi Ozcelik, and Dr. John McLaughlin, for their guidance, support, and inspiration over the last five years. A special thank you goes to Dr. Julia Knight, who co-supervised my initial training. Her contributions to my projects have been substantial and greatly appreciated.

I am grateful to our collaborators, Nicole Roslin, Dr. Andrew Paterson, and Dr. Celia Greenwood, from the Hospital for Sick Children, for their guidance and contribution to my final data chapter and beyond. I am also grateful to the many individuals from the Mount Sinai Hospital/SLRI, OFCCR, NFCCR, FHCRC, and UCSC who contributed to my thesis project over the past five years. I would like to thank Neerav Monga for his statistical support in my first data chapter, and Michael Manno for the statistical analyses in my many side projects.

Many thanks to the staff and administrators of the Laboratory Medicine and Pathobiology department, especially Dr. Harry Elsholtz, for all the support and the friendly environment they have provided, and to the many CLAMPS members with whom I have had the privilege of working over the years.

I have had the pleasure of working with many exceptional people in the lab who have made my graduate experience truly memorable. Special thanks go to Roula for her patience and mentorship in the early stages of my project, and for correcting all the “thes” that I tended to neglect. To Sheron, with whom I started my graduate journey, thank you for all your support and for always lending me a helping hand when I needed it the most. You have been a great friend and a confidante. Many thanks to George, who always willingly shared his experiences and offered guidance. Kenneth (Gyro James and countless other nicknames that I gave you), thank you for making my time in the lab an entertaining one – keep corrupting young minds and I promise to do the same. To Vaiju and the newbies, James and Liyang (Kiki), I have had an extraordinary time working with you. Additional thanks go to lab alumni, Susmita, Pinella, and Ted, with whom I share very fond memories. I have thoroughly enjoyed the time I spent with all of you; many thanks for all the support you have given me and, most importantly, for your friendships, which I hope will continue well into the future.

Finally and most importantly, countless thanks go to my wonderful family – my mother, my father, and my sister, Mirela. You have always supported me and encouraged me to do my utmost best. Thank you very much for the many sacrifices you have made so that I can pursue my dreams. I love you very much.

TABLE OF CONTENTS

ABSTRACT ...... ii
ACKNOWLEDGEMENTS ...... iv
TABLE OF CONTENTS ...... v
LIST OF TABLES ...... viii
LIST OF FIGURES ...... xi
LIST OF ABBREVIATIONS ...... xiii

CHAPTER 1 GENERAL INTRODUCTION
1.1 COLORECTAL CANCER ...... 1
1.1.1 PHYSIOLOGY AND ANATOMY OF THE COLORECTUM ...... 1
1.1.2 HISTORY AND EPIDEMIOLOGY OF COLORECTAL CANCER ...... 2
1.1.3 RISK FACTORS ...... 4
1.1.4 SCREENING, STAGING, AND PROGNOSIS ...... 6
1.1.5 MOLECULAR PATHWAYS IN CRC (CIN/MSS, MSI/MIN, CIMP) ...... 8
1.1.6 HEREDITARY SYNDROMES, PATHOLOGY AND PRESENTATIONS ...... 23
1.2 MISMATCH REPAIR (MMR) ...... 30
1.2.1 SOURCES OF DNA NUCLEOTIDE MISMATCHES ...... 30
1.2.2 MMR SYSTEM OVERVIEW AND HISTORY ...... 32
1.2.3 REPLICATION ERROR REPAIR ...... 32
1.2.4 MMR IN DNA DAMAGE SIGNALLING, CELL CYCLE ARREST, AND APOPTOSIS ...... 41
1.2.5 OTHER ROLES OF MMR ...... 43
1.2.6 ROLE OF MMR SYSTEM IN CANCER ...... 47
1.3 GENETIC VARIATION ...... 49
1.3.1 POLYMORPHISMS (DEFINITIONS AND INTRODUCTION) ...... 49
1.3.2 UTILITY AND ROLES OF SNPS IN GENETIC EPIDEMIOLOGY ...... 51
1.3.3 IDENTIFICATION OF LOW-PENETRANCE ALLELES IN CRC AND FOUNDER POPULATIONS ...... 53
1.3.4 SNPS IN MMR GENES ...... 55
1.4 HYPOTHESIS AND OBJECTIVES ...... 56

CHAPTER 2 IDENTIFICATION OF MMR MODIFIER ALLELES IN CRC IN TWO CANADIAN POPULATIONS
2.1 SUMMARY ...... 60
2.2 INTRODUCTION ...... 61
2.3 MATERIALS AND METHODS ...... 64
2.3.1 SNP SELECTION CRITERIA ...... 64
2.3.2 STUDY SUBJECTS ...... 64
2.3.3 MOLECULAR GENETIC ANALYSIS ...... 69
2.3.4 STATISTICAL METHODS ...... 75
2.4 RESULTS ...... 76
2.4.1 POPULATIONS OF CASE PATIENTS AND CONTROL SUBJECTS ...... 76
2.4.2 DISTRIBUTION OF GENOTYPES AND ALLELES ...... 76
2.4.3 ASSOCIATIONS OF POLYMORPHISMS WITH CLINICOPATHOLOGICAL TUMOUR FEATURES ...... 79
2.5 DISCUSSION ...... 93

CHAPTER 3 THE FUNCTIONAL EFFECTS OF THE MLH1-93G>A POLYMORPHISM ON THE MLH1/EPM2AIP1 PROMOTER ACTIVITY
3.1 SUMMARY ...... 112
3.2 INTRODUCTION ...... 113
3.3 MATERIALS AND METHODS ...... 116
3.3.1 MATERIALS ...... 116
3.3.2 CELL CULTURE ...... 116
3.3.3 PROMOTER CONSTRUCTS ...... 117
3.3.4 LUCIFERASE REPORTER GENE ASSAYS ...... 119
3.3.5 IN VITRO PROMOTER CONSTRUCT METHYLATION WITH HHAI AND M.SSSI ...... 120
3.4 RESULTS ...... 121
3.4.1 EFFECTS OF THE MLH1-93G>A SNP ON THE CORE PROMOTER (C+D) ACTIVITY ...... 123
3.4.2 EFFECTS OF THE MLH1-93G>A SNP ON THE D-REGION PROMOTER ACTIVITY ...... 124
3.4.3 EFFECTS OF THE -93G>A SNP ON THE EPM2AIP1 PROMOTER ACTIVITY ...... 129
3.4.4 EFFECTS OF THE MLH1-93G>A SNP ON METHYLATED CORE PROMOTER ACTIVITY ...... 130
3.5 DISCUSSION ...... 139

CHAPTER 4 SPECIFIC VARIANTS IN THE MLH1 GENE REGION CONTRIBUTE TO DNA METHYLATION, LOSS OF PROTEIN EXPRESSION, AND MSI-H COLORECTAL CANCER
4.1 SUMMARY ...... 147
4.2 INTRODUCTION ...... 148
4.3 MATERIALS AND METHODS ...... 151
4.3.1 SNP SELECTION CRITERIA ...... 152
4.3.2 STUDY SUBJECTS ...... 152
4.3.3 MOLECULAR GENETIC ANALYSIS ...... 154
4.3.4 STATISTICAL METHODS ...... 160
4.4 RESULTS ...... 161
4.5 DISCUSSION ...... 179

CHAPTER 5 SUMMARY AND FUTURE DIRECTIONS
5.1 MMR POLYMORPHISMS IN COLORECTAL CANCER ...... 189
5.2 MLH1-93G>A EFFECTS OF PROMOTER ACTIVITY ...... 191
5.3 EXAMINATION OF MSH2/MSH6 GENE REGION POLYMORPHISMS IN CRC ...... 194
5.4 IMPLICATION OF NEW GENES IN CRC ...... 194

APPENDICES ...... 198

REFERENCES ...... 216

LIST OF TABLES

CHAPTER 1

TABLE 1.1 TUMOUR STAGE CLASSIFICATION ACCORDING TO THE AMERICAN JOINT COMMITTEE ON CANCER (AJCC) CANCER STAGING MANUAL, 6TH EDITION ...... 8

TABLE 1.2 TUMOUR STAGING AS DEFINED BY THE AJCC, 6TH EDITION ...... 8

TABLE 1.3 CLINICAL CRITERIA FOR HEREDITARY NONPOLYPOSIS COLORECTAL CANCER ...... 27

TABLE 1.4 IDENTITY AND FUNCTIONS OF E. COLI AND EUKARYOTIC PROTEINS INVOLVED IN MMR ...... 35

TABLE 1.5 PHENOTYPE OF MMR-DEFICIENT KNOCKOUT MICE ...... 48

CHAPTER 2

TABLE 2.1 SEQUENCES OF PRIMERS AND PROBES ...... 70

TABLE 2.2 PRIMER SEQUENCES FOR MICROSATELLITE INSTABILITY TESTING ...... 73

TABLE 2.3 DISTRIBUTION OF AGE, FAMILY HISTORY, AND CLINICOPATHOLOGICAL FEATURES OF ONTARIO AND NEWFOUNDLAND COLORECTAL CANCER PATIENTS ...... 77

TABLE 2.4 ALLELE FREQUENCIES OF EACH POLYMORPHISM IN PARTICIPANTS IN ONTARIO AND NEWFOUNDLAND ...... 78

TABLE 2.5 ASSOCIATION OF MMR GENE POLYMORPHISMS WITH RISK OF CRC IN ONTARIO AND NEWFOUNDLAND ...... 80

TABLE 2.6 RISK OF COLORECTAL CANCER BY MICROSATELLITE INSTABILITY STATUS FOR THE MLH1-93G>A POLYMORPHISM ONLY IN ONTARIO AND NEWFOUNDLAND* ...... 81

TABLE 2.7 GENOTYPE FREQUENCIES OF MLH1 IVS14-19A>G POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN ONTARIO ...... 82

TABLE 2.8 GENOTYPE FREQUENCIES OF MLH1 IVS14-19A>G POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 83

TABLE 2.9 GENOTYPE FREQUENCIES OF MSH6-159C>T POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN ONTARIO ...... 84

TABLE 2.10 GENOTYPE FREQUENCIES OF MSH6-159C>T POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 85

TABLE 2.11 GENOTYPE FREQUENCIES OF MSH2 IVS12-6T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN ONTARIO ...... 87

TABLE 2.12 GENOTYPE FREQUENCIES OF MSH2 IVS12-6T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 88

TABLE 2.13 GENOTYPE FREQUENCIES OF MLH1 –93G>A POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN ONTARIO ...... 89

TABLE 2.14 GENOTYPE FREQUENCIES OF MLH1 –93G>A POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 90

TABLE 2.15 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN ONTARIO ...... 93

TABLE 2.16 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 94

TABLE 2.17 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF MALE CASE PATIENTS WITH CRC IN ONTARIO ...... 95

TABLE 2.18 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF FEMALE CASE PATIENTS WITH CRC IN ONTARIO ...... 96

TABLE 2.19 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF MLH1-PROFICIENT FEMALE CASE PATIENTS WITH CRC IN ONTARIO ...... 97

TABLE 2.20 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF MALE CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 98

TABLE 2.21 GENOTYPE FREQUENCIES OF MSH2-118T>C POLYMORPHISM AND CLINICOPATHOLOGIC FEATURES OF FEMALE CASE PATIENTS WITH CRC IN NEWFOUNDLAND ...... 99

CHAPTER 3

TABLE 3.1 SUMMARY OF LUCIFERASE REPORTER ASSAY RESULTS...... 136

CHAPTER 4

TABLE 4.1 SEQUENCES OF PRIMERS AND PROBES ...... 156

TABLE 4.2 CHARACTERISTICS OF THREE STUDY POPULATIONS ...... 162

TABLE 4.3 POLYMORPHISMS ASSOCIATED WITH AN OVERALL INCREASED RISK OF CRC IN ONTARIO WITH ATTEMPTED REPLICATION OF RS4431050 IN NEWFOUNDLAND ...... 165

TABLE 4.4 SEQUENTIAL ANALYSES OF ONTARIO SAMPLES WITH TUMOUR MSI STATUS, MLH1 IHC STATUS AND MLH1 PROMOTER METHYLATION STATUS ...... 166

TABLE 4.5 SINGLE MARKER ANALYSIS FOR 3 SNPS FOR CRC CASES VERSUS CONTROLS, MLH1 PROMOTER METHYLATION, MLH1 IHC STAINING AND MSI TUMOUR STATUS IN ONTARIO ...... 167

TABLE 4.6 SINGLE MARKER ANALYSIS FOR 3 SNPS FOR CRC CASES VERSUS CONTROLS, MLH1 PROMOTER METHYLATION, MLH1 IHC STAINING AND MSI TUMOUR STATUS IN NEWFOUNDLAND ...... 168

TABLE 4.7 SINGLE MARKER ANALYSIS FOR 3 SNPS FOR CRC CASES VERSUS CONTROLS, MLH1 PROMOTER METHYLATION, MLH1 IHC STAINING AND MSI TUMOUR STATUS IN SEATTLE ...... 169

TABLE 4.8 SINGLE MARKER ANALYSIS IN THE COMBINED DATA FOR 3 SNPS FOR CRC CASES VERSUS CONTROLS, MLH1 PROMOTER METHYLATION, MLH1 IHC STAINING AND MSI TUMOUR STATUS ...... 176

TABLE 4.9 MULTIVARIATE LOGISTIC REGRESSION MODEL RESULTS FOR MSI STATUS FOR RS1800734, RS749072, AND RS13098279 FOR ONTARIO ...... 179

TABLE 4.10 MULTIVARIATE LOGISTIC REGRESSION MODEL RESULTS FOR MSI STATUS FOR RS1800734, RS749072, AND RS13098279 FOR NEWFOUNDLAND ...... 180

TABLE 4.11 MULTIVARIATE LOGISTIC REGRESSION MODEL RESULTS FOR MSI STATUS FOR RS1800734, RS749072, AND RS13098279 FOR SEATTLE ...... 181

TABLE 4.12 MULTIVARIATE LOGISTIC REGRESSION MODEL RESULTS FOR MSI STATUS FOR RS1800734, RS749072, AND RS13098279 IN THE COMBINED DATA ...... 182

LIST OF FIGURES

CHAPTER 1

FIGURE 1.1 GENETIC AND EPIGENETIC INSTABILITY PATHWAYS IN CRC...... 13

FIGURE 1.2 COMPARISON OF SPORADIC, FAP, AND HNPCC CRCS ...... 18

FIGURE 1.3 CHANGES IN THE PROMOTER CPG ISLAND FOLLOWING DNA METHYLATION...... 22

FIGURE 1.4 DISTRIBUTION OF CPG DINUCLEOTIDES IN THE HUMAN GENOME AND DIFFERENCES IN METHYLATION PATTERNS BETWEEN THE NORMAL CELLS AND CANCER CELLS ...... 23

FIGURE 1.5 SUMMARY OF MAMMALIAN MMR ...... 39

FIGURE 1.6 THREE MODELS FOR DOWNSTREAM SIGNALLING BETWEEN THE MISMATCH RECOGNITION AND STRAND RECOGNITION SIGNAL ...... 40

FIGURE 1.7 MMR-MEDIATED DNA DAMAGE SIGNALLING LEADS TO G2/M ARREST AND APOPTOSIS ...... 43

CHAPTER 3

FIGURE 3.1 SCHEMATIC DIAGRAM OF THE EPM2AIP1/MLH1 BI-DIRECTIONAL PROMOTER WITH THE FOUR REGIONS OUTLINED ...... 115

FIGURE 3.2 SCHEMATIC DIAGRAM REPRESENTING DIFFERENT PROMOTER CONSTRUCTS USED IN LUCIFERASE REPORTER ASSAYS ...... 123

FIGURE 3.3 EFFECT OF THE MLH1-93G>A POLYMORPHISM ON MLH1 CORE PROMOTER ACTIVITY ...... 126

FIGURE 3.4 EFFECT OF THE MLH1-93G>A POLYMORPHISM ON MLH1 D-REGION PROMOTER ACTIVITY ...... 128

FIGURE 3.5 EFFECT OF THE MLH1-93G>A POLYMORPHISM ON EPM2AIP1 (IP1) PROMOTER ACTIVITY ...... 131

FIGURE 3.6 IN VITRO METHYLATION OF PGL3-BASIC, C+D -93G AND C+D -93A VECTOR CONSTRUCTS USING HHAI METHYLTRANSFERASE FOLLOWED BY DIGESTION WITH HHAI RESTRICTION ENDONUCLEASE ...... 133

FIGURE 3.7 EFFECT OF THE MLH1-93G>A POLYMORPHISM ON MLH1 HHAI METHYLATED CORE PROMOTER ACTIVITY ...... 134

FIGURE 3.8 IN VITRO METHYLATION OF PGL3-BASIC, C+D -93G AND C+D -93A VECTOR CONSTRUCTS USING M.SSSI METHYLTRANSFERASE FOLLOWED BY DIGESTION WITH BSTUI RESTRICTION ENDONUCLEASE ...... 135

FIGURE 3.9 DETERMINATION OF THE PROTEIN-BINDING TO THE -93G>A SNP REGION BY EMSA ...... 138

CHAPTER 4

FIGURE 4.1 PROPOSED MODEL FOR GENETIC SUSCEPTIBILITY TO DNA METHYLATION IN SPORADIC MSI-H CRCS ...... 151

FIGURE 4.2 500KB REGION OF CHROMOSOME 3 EXAMINED IN ONTARIO SAMPLES ...... 163

FIGURE 4.3 D-PRIME MAP OF ONTARIO SAMPLES...... 171

FIGURE 4.4 R-SQUARED MAP OF ONTARIO SAMPLES...... 173

FIGURE 4.5 LRRFIP2 PROMOTER CPG ISLAND ...... 177

FIGURE 4.6 EVALUATION OF LRRFIP2 PROMOTER METHYLATION BY MS-PCR...... 177

LIST OF APPENDICES

TABLE A.1 SINGLE MARKER CASE-CONTROL ANALYSES ...... 198

TABLE A.2 SINGLE MARKER ANALYSES BY MLH1 PROMOTER METHYLATION STATUS...... 201

TABLE A.3 SINGLE MARKER ANALYSES BY TUMOUR MLH1 IHC STATUS...... 203

TABLE A.4 SINGLE MARKER ANALYSES BY TUMOUR MSI STATUS ...... 205

TABLE A.5 SUMMARIES OF LOGISTIC REGRESSION MODELS...... 207

TABLE A.6 DETAILS OF LOGISTIC REGRESSION MODELS ...... 209

LIST OF ABBREVIATIONS

5-FU 5-Fluorouracil

A Adenine

ADP Adenosine Diphosphate

APC Adenomatous Polyposis Coli

ATP Adenosine Triphosphate

bp base pair

C Cytosine

CFR Cancer Family Registries

CIMP CpG Island Methylator Phenotype

CNV Copy Number Variation

CRC Colorectal Cancer

dbSNP SNP Database

DCLK3 Doublecortin-like Kinase 3

DMEM Dulbecco's Modified Eagle's Medium

DNA Deoxyribonucleic Acid

dNTP deoxyribonucleoside triphosphate

EMSA Electrophoretic Mobility Shift Assay

EPM2AIP1 Laforin (EPM2A) Interacting Protein 1

EV empty vector

Exo1 Exonuclease 1

FAP Familial Adenomatous Polyposis

FBS foetal bovine serum

FFQ Food Frequency Questionnaire

FHQ Family History Questionnaire

FOBT Faecal Occult Blood Test

FHCRC Fred Hutchinson Cancer Research Centre

G Guanine

GOLGA4 Golgi Autoantigen, Golgin Subfamily A, 4

GWAS Genome Wide Association Studies

HNPCC Hereditary Nonpolyposis Colorectal Cancer

hr hour

IDL Insertion/Deletion Loop

IHC Immunohistochemistry

indel insertion/deletion

IVS Intervening Sequence

kb kilo base pair

LBA1 Lupus Brain Antigen 1

LD linkage disequilibrium

LPS lipopolysaccharide

LRRFIP2 Leucine Rich Repeat in Flightless (FLII) Interacting Protein 2

MAF Minor Allele Frequency

MAP MYH Associated Polyposis

Mb Mega base pair

min minute

MLH1 MutL Homologue 1

mM millimolar

MS-PCR Methylation Specific Polymerase Chain Reaction

MSH2 MutS Homologue 2

MSH6 MutS Homologue 6

MSI Microsatellite Instability

MSI-H High Frequency Microsatellite Instability

MSI-L Low Frequency Microsatellite Instability

MSS Microsatellite Stable

MYH MutY Homologue

NFCCR Newfoundland Familial Colorectal Cancer Registry

ng nanogram

OFCCR Ontario Familial Colorectal Cancer Registry

PCNA Proliferating Cell Nuclear Antigen

PCR Polymerase Chain Reaction

PHQ Personal History Questionnaire

PMS2 Post-Meiotic Segregation 2

PolD Polymerase Delta

RNA Ribonucleic Acid

ROS Reactive Oxygen Species

RPA Replication Protein A

RT-PCR Reverse Transcribed Polymerase Chain Reaction

SNP Single Nucleotide Polymorphism

T Thymine

TNM Tumour-Node-Metastasis

TS Thymidylate Synthase

U

WT wild-type

CHAPTER 1

GENERAL INTRODUCTION

1.1 Colorectal Cancer

1.1.1 Physiology and Anatomy of the Colorectum

The colorectum is a component of the human digestive system responsible for the absorption of water, sodium, and chloride, and for the excretion of potassium and bicarbonate (1). The large intestine is capable of passively absorbing short-chain fatty acids, the main fuel for the colonic epithelium (1).

It is also responsible for the storage of waste until it can be removed from the body. The colorectum can be embryologically subdivided into two regions, the proximal and distal colon. The proximal colon consists of the appendix, caecum, ascending colon, hepatic flexure, and transverse colon, while the distal colon consists of the splenic flexure, descending colon, sigmoid colon, and rectum. Proximal and distal segments of the large intestine have different embryological origins and distinct biological characteristics. The proximal colon, up to two-thirds of the transverse colon, derives from the midgut, while the distal colon derives from the hindgut. The distinct embryological origins are also reflected in the dual blood supply of the colon and its separate innervation patterns (1). Colonic epithelial cells of the proximal colon depend mostly on acetate as a fuel source, while the epithelial cells of the distal colon rely primarily on butyrate. In addition, there are well-reported differences in the concentration of biliary salts and in bacterial populations between the two regions (2). The concentration of biliary salts is higher in the proximal colon, and it has been suggested that certain bile acids could selectively increase the risk of proximal colorectal cancer (3).

The colonic mucosa, which lines the lumen, contains infoldings that form crypts approximately fifty cells deep. Paneth cells are situated at the bases of these crypts, and immediately above them, three to five cells from the base, lie the multipotent stem cells (a region known as the proliferative zone) (4). Multipotent stem cells produce, among other cell types, epithelial cells, which migrate towards the lumen and become differentiated enterocytes.

Colonic epithelial cells undergo rapid turnover, with a lifetime of only 4-6 days (5). The rapid rate of epithelial proliferation and the sloughing of mature, aged cells at the apex of the crypts maintain proper crypt structure. However, this rapid proliferation rate increases the possibility of replication errors and the accumulation of genetic and/or epigenetic errors, which may lead to cancer development (6).

1.1.2 History and Epidemiology of Colorectal Cancer

Colorectal cancer (CRC) results from the progressive accumulation of genetic and epigenetic alterations, which lead to the transformation of normal colonic epithelium into benign adenomas and, finally, malignant adenocarcinomas (7). During the formation of adenomatous polyps, the gland-forming mucosal projections observed in most CRCs, the proliferative zone expands towards the luminal surface. The earliest identifiable lesion that occurs is a cluster of colonic crypts displaying abnormal morphology, termed the dysplastic aberrant crypt foci (ACF) (8). Individual cells are still morphologically normal, but the accumulation of cells within the crypts leads to crowding and mucosal folding, resulting in a saw-toothed appearance (8). Additional changes in genetic and/or epigenetic profiles in cells of the adenoma may provide it with a growth advantage that drives tumour progression as successive clonal outgrowths are generated, resulting in carcinoma (7). Individuals with a personal history of colonic polyps or adenomas are at increased risk of developing CRC (9).

CRC is one of the most common neoplasms in developed countries. Over one million new cases are diagnosed every year worldwide (10). It is the second leading cause of cancer-related deaths in North America, affecting approximately 5% of the general population (10, 11).

While the prevalence of colonic adenomas is higher in males than in females, the rates of CRC are similar in both sexes (9). The incidence rate of CRC increases with advancing age (12). The estimated incidence of CRC in Canada for 2009 is 22,000 cases, with approximately 9,100 deaths resulting from this disease (13). The lifetime risk of developing CRC in Canada is 6.5-7.5% and the five-year survival rate is 62% (13). In Ontario, approximately 8,100 CRC cases will be diagnosed in 2009, and the age-standardized incidence rates are 60 for males (per 100,000) and 41 for females (per 100,000) (13). The incidence of CRC within Canada varies by geographic location, with the Atlantic provinces having the highest rates in the country and the Pacific provinces having the lowest rates. The province of Newfoundland and Labrador (hereafter referred to as Newfoundland) has the highest rate of CRC in Canada and even the world (13-15). In Newfoundland, approximately 490 CRC cases will be diagnosed in 2009, and the age-standardized incidence rates are 87 for males (per 100,000) and 52 for females (per 100,000) (13). In contrast, in British Columbia, approximately 2,650 new CRC cases will be diagnosed in 2009, and the age-standardized incidence rates are 53 for males (per 100,000) and 36 for females (per 100,000) (13).

1.1.3 Risk Factors

A strong genetic predisposition accounts for approximately 1-6% of CRCs (16), leaving the large remaining proportion open to speculation about the contribution of the environment, possible mutations in yet-to-be-identified genes, and/or a combination of genetic and environmental factors. A strong indicator of the environmental contribution to CRC comes from studies of migrants who move from nations with a low incidence of CRC to nations with a high incidence and, within a span of one generation, attain incidence rates similar to those of their adopted country (15, 17-19).

Diet plays an important role in the etiology of CRC. Diets high in red and processed meat have been shown to increase the risk of CRC, while diets rich in fruits, vegetables, and fibre have been shown to lower CRC risk (20, 21). Heterocyclic amines formed in cooking, nitroso compounds in processed meats, and heme iron found in red meats are the proposed agents that form DNA adducts, induce double-stranded DNA breaks, and stimulate epithelial proliferation (22-24). To date, studies on fat-rich diets in CRC show inconsistent results (25-28). However, fat-rich diets are associated with an increased risk of colon tumours in combination with activating KRAS mutations (27, 29).

Alcohol consumption also increases the risk of CRC (30). This risk is consistent regardless of the type of alcoholic beverage, suggesting that the association is attributable to ethanol intake itself (31). Both the volume of alcohol consumed and the pattern of drinking influence CRC risk (32). It is not fully understood how ethanol plays a role in carcinogenesis; however, it has been suggested that ethanol acts as a cocarcinogen (33). Specifically, when ethanol is metabolized, acetaldehyde and free radicals are generated. These byproducts are believed to be responsible for alcohol-associated carcinogenesis by damaging nucleic acids, proteins, and folate (34).

Smoking cigarettes, cigars, and pipes is also associated with an elevated risk of CRC (35). Furthermore, a strong association between smoking and the risk of adenomatous polyps has been well established. Owing to the large number of toxic compounds present in cigarette smoke, the exact mechanism of these associations is unclear, although the risk may be further modified by polymorphisms in metabolizing enzymes, such as NAT1 and NAT2 (35).

The relationship between physical activity and reduced risk of CRC is one of the most consistent findings in the epidemiologic literature (35). Individuals with high levels of activity throughout their lives have a significantly lower risk of CRC than inactive individuals. It is therefore not surprising that body mass index (BMI) is also associated with CRC risk. Individuals in the highest quantile of BMI, classified as obese, have a two-fold higher risk than those in the lowest quantile (35). However, studies of the relationship between BMI and CRC risk show inconsistent results, especially among women (35).

Patients with inflammatory bowel disease (IBD), such as ulcerative colitis (UC) and Crohn’s disease (CD), are at increased risk of developing CRC (36, 37). It is believed that this risk is caused by the persistent inflammation of the colon, though the exact mechanism is still unclear (38).

However, one certainty attained from the study of this disease is the importance of family history in determining the risk of CRC. An individual’s risk of colorectal cancer more than doubles if one first-degree relative is affected with CRC, and it almost quadruples if two first-degree relatives are affected (39). Family history of CRC is the most important indicator of CRC risk (40).

1.1.4 Screening, Staging, and Prognosis

In Canada, the five-year survival rate of CRC is 62%, and detection of CRC at the early stages of disease frequently leads to a better prognosis (13, 14). There is strong evidence that screening is the key to reducing the morbidity and mortality associated with CRC when it is done consistently and correctly, and when polyps are removed (41). The estimated five-year survival is 90% if CRC is diagnosed while still localized (confined to the bowel walls), but only 68% for regional disease (lymph node involvement), and only 10% if distant metastases are detected (41). The most important step in CRC screening is an accurate assessment of the risk of developing the disease and application of the appropriate screening interval and modality (41).

Screening methods for early detection of CRC include faecal occult blood test (FOBT), flexible sigmoidoscopy (FSIG), colonoscopy, double contrast barium enema (DCBE), and computed tomography colonography (CTC) (41). FOBTs are non-invasive and commonly used in the detection of CRC. However, their sensitivity for advanced neoplasia and cancer is so low that they cannot be considered an acceptable measure for CRC screening. FSIG has been studied extensively and is documented to yield a 60-80% reduction in CRC mortality for the area of the colon that can be examined (41). Colonoscopy has increased in popularity because it not only detects polyps but also makes it possible to remove them in the same procedure. While the majority of polyps will not develop into adenocarcinoma, the removal of such polyps is considered a true form of primary cancer prevention (42). DCBE, like colonoscopy, requires a bowel preparation to clean the colon. However, any positive findings must be further evaluated using colonoscopy, which requires an additional bowel preparation. CTC, commonly referred to as virtual colonoscopy, is considered a minimally invasive method to evaluate the colon and rectum (41).

An additional benefit of CTC is the possibility of finding extracolonic pathology, because CTC produces an image of the colon as well as the upper and lower abdomen. However, more defined guidelines for the management of CTC findings are required (41).

The stage of disease in CRC patients must be determined prior to the selection and management of an appropriate treatment regime. The current staging system for CRC is the tumour-node-metastasis (TNM) staging system, developed by the International Union Against Cancer (UICC) and the American Joint Committee on Cancer (AJCC) in order to standardize staging for the uniform evaluation of data (43). Clinical staging (clinically derived TNM) is carried out during the initial evaluation of the patient, before any cancer-related therapy is initiated, and includes physical examination, radiologic imaging, endoscopy, biopsy, and surgical exploration (44). Pathologic classification (pathologically derived TNM) is based on gross and microscopic examination of the resection specimen or biopsy of a previously untreated tumour (44). Because the TNM system can be applied to the preoperative evaluation of patients, it is more meaningful and helpful to clinicians in preoperative patient management than the older, pathologically based Dukes classification system (44). In addition to stage, the histological grade of the tumour is also used as a determinant for both treatment and survival. Tumours are graded into one of three categories according to the gland-like structures in the tumour and the regularity of the cells. Grade 1 tumours contain well-differentiated cells, Grade 2 tumours are moderately differentiated, and Grade 3 tumours are poorly differentiated. CRC prognosis correlates well with both stage and grade, which is not surprising as these two variables parallel each other (45). TNM staging and histological grade criteria are summarized in Tables 1.1 and 1.2.

Table 1.1 Tumour Stage Classification According to the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 6th Edition

TNM Stage   Description
TX          Primary tumour cannot be assessed
T0          No evidence of primary tumour
Tis         Carcinoma in situ (limited to mucosa)
T1          Tumour invades submucosa and is ≤1 cm in size
T2          Tumour invades muscularis propria or is ≥1 cm in size
T3          Invasion through muscle wall to regional tissue
T4          Tumour invades other organs/peritoneum
NX          Regional lymph node metastasis cannot be assessed
N0          No regional lymph node metastasis
N1          Metastasis in 1 to 3 regional lymph nodes
N2          Metastasis in 4 or more regional lymph nodes
MX          Distant metastasis cannot be assessed
M0          No distant metastasis
M1          Distant metastasis

Table 1.2 Tumour Staging as Defined by the AJCC, 6th Edition

TNM Disease Stage   T-Primary Tumour   N-Regional Lymph Nodes   M-Distant Metastasis
0                   Tis                N0                       M0
I                   T1/T2              N0                       M0
II A                T3                 N0                       M0
II B                T4                 N0                       M0
III A               T1/T2              N1                       M0
III B               T3/T4              N1                       M0
III C               Any T              N2                       M0
IV                  Any T              Any N                    M1
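As an illustration only, the stage grouping in Table 1.2 can be expressed as a small lookup. The following Python sketch is not part of any staging software: the function name `tnm_stage` and the string codes it accepts are hypothetical, and combinations that Table 1.2 does not list (for example MX, or NX without distant metastasis) are rejected rather than guessed.

```python
def tnm_stage(t: str, n: str, m: str) -> str:
    """Return the disease stage for a T/N/M combination, following Table 1.2.

    Hypothetical helper for illustration; inputs are codes such as
    "Tis", "T3", "N1", "M0". Unlisted combinations raise ValueError.
    """
    # Stage IV: any T, any N, with distant metastasis.
    if m == "M1":
        return "IV"
    if m != "M0":
        raise ValueError(f"Table 1.2 does not cover M category {m!r}")

    # Stage III C: any T with N2.
    if n == "N2":
        return "III C"

    # Stage III A / III B: N1, split by depth of the primary tumour.
    if n == "N1":
        if t in ("T1", "T2"):
            return "III A"
        if t in ("T3", "T4"):
            return "III B"

    # Stages 0 to II B: no regional lymph node metastasis.
    if n == "N0":
        node_negative = {"Tis": "0", "T1": "I", "T2": "I", "T3": "II A", "T4": "II B"}
        if t in node_negative:
            return node_negative[t]

    raise ValueError(f"Table 1.2 does not list the combination {t} {n} {m}")
```

For example, `tnm_stage("T3", "N0", "M0")` returns the node-negative grouping for a T3 tumour, while a call with `"MX"` raises an error because the table gives no grouping for unassessed metastasis.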


1.1.5 Molecular Pathways in CRC (CIN/MSS, MSI/MIN, CIMP)

The sequential process of genetic mutations and epigenetic alterations is widely believed to drive the initiation and progression of benign adenomas to malignant adenocarcinomas because these mutations influence pathways that regulate hallmark behaviours of cancer (46, 47). Such mutations provide a clonal growth advantage that leads to the outgrowth of progressively more malignant cells, which ultimately present as invasive adenocarcinoma (7). The acquisition of these mutations is facilitated by the loss of genomic stability, a key molecular step in cancer formation (48, 49). Approximately 30% of the genes in the human genome encode proteins that regulate DNA fidelity, underscoring how important it is for eukaryotic cells to maintain the integrity of their DNA (7, 50). Because of this, there are likely many different mechanisms that can cause loss of genomic DNA stability. Two types of genomic instability, chromosomal instability and microsatellite instability, as well as a more recently recognized phenomenon, epigenetic instability, contribute to the molecular progression of CRC.

Chromosomal Instability (CIN)

One of the earliest genetic alterations that occurs in colorectal carcinogenesis is the mutation or loss of the adenomatous polyposis coli (APC) gene (Figure 1.1a), which is mutated in ~80% of CRCs (51, 52). Dysplastic ACF that carry APC mutations have the highest potential for progressing to CRC. APC is a tumour suppressor and a key player in the Wnt/Wingless signalling pathway, which is dysregulated in ~90% of CRCs (53). The Wnt signalling pathway controls cell fate determination and axis specification in all metazoan organisms (54). Wnt signalling is also implicated in a variety of cellular processes, including proliferation, differentiation, survival, apoptosis, and cellular motility (54). Constitutive activation of the Wnt pathway, on the other hand, causes hyperproliferation of intestinal epithelial cells and is closely linked with the development and progression of CRC (55). Normally, APC suppresses this pathway by forming a β-catenin destruction complex (with GSK3β and Axin), thereby preventing β-catenin from accumulating in the cytoplasm and translocating into the nucleus (56).

The increased likelihood of ACF with mutated APC progressing to cancer may also result from the induction of chromosomal instability and not only from overactivation of the Wnt signalling pathway. APC has, among its several protein binding sites, an EB1-binding domain in its C-terminus (7). EB1 associates with the growing ends of cytoplasmic and spindle microtubules as well as centrosomes. The yeast homologue of EB1 is required for a microtubule-dependent cytokinesis checkpoint (57). APC-null cells present with CIN as a consequence of abnormal mitotic spindles, caused by an abundance of microtubules that connect inefficiently with kinetochores, resulting in polyploidy (58). However, gains and/or losses of individual chromosomes, rather than polyploidy, are observed in CRCs. A model that reconciles these discrepancies is that APC inactivation creates a permissive state that allows the evolving tumour clone to tolerate the development of true aneuploidy (59). Despite the fact that CIN occurs in 80-85% of all CRCs and that aneuploidy is recognized as a hallmark of cancer, the specific molecular events that account for this form of genomic instability are not well understood (7). Elucidation of the pathways responsible for CIN has proven to be extremely challenging because of the vast number of genes that regulate the fidelity of chromosomes. The list of candidate genes numbers more than 100 and includes genes involved in kinetochore structure and function, centrosome and microtubule formation and behaviour, chromosome condensation, sister chromatid cohesion, and cell cycle checkpoint control (7, 59). Possible mechanisms responsible for CIN are briefly discussed below.


Inactivation of a “DNA caretaker” mechanism, base excision repair (BER), is found in a subset of CRCs and appears to predispose to point mutations that contribute to CRC formation by providing instability at the level of the base pairs. Inactivation of one of the BER genes, MutY homologue (MYH), causes an autosomal recessive form of adenomatous polyposis called MYH-Associated Polyposis (MAP) (7). While somatic mutations in MYH are rare, biallelic germline MYH mutations are responsible for MAP. The majority of MAP adenomas (80%) present with aneuploidy, with frequent losses at chromosomes 1p, 17, 19, and 22 and gains affecting chromosomes 7 and 13 (60). The exact role of MYH in causing CIN is not clear, but MAP tumours do not show differences in frequency of TP53, SMAD4, or TGFBR2 mutations, indicating a unique molecular pathogenesis compared to sporadic CRCs (60).

The proteins best demonstrated to date to play a role in CIN are those that regulate spindle-kinetochore interactions during mitosis. Animal model studies have shown that these proteins, which control sister chromatid separation at the metaphase-anaphase transition, cause CIN (7). Mutations in two genes that control the human mitotic checkpoint, BUB1 and BUBR1, were found in a subset of CIN CRC cell lines. Mutations in other human mitotic checkpoint genes, MAD1 and MAD2, have been identified in breast cancer and leukaemia, respectively (7). Other genes involved in kinetochore function that have been found to be mutated in CRC include CDC4, ROD, ZW10, ZWILCH, and DING (7).

Abnormal centrosome number and function have been proposed as a candidate mechanism in CIN, as most solid tumours are characterized by centrosome amplification and by aneuploidy. The centrosome amplification observed in cancers is thought to increase the frequency of abnormal mitoses and chromosome missegregation (7). A centrosome-associated serine-threonine kinase, STK15/BTAK/ARK1/aurora2, was identified on chromosome 20q13, which is frequently amplified in a variety of cancers (61). While STK15 amplification is a promising potential cause of CIN, two other cell cycle-regulated proteins, polo-like kinases 2 and 4 (PLK2 and PLK4), are overexpressed in some cases of CRC and have been implicated in aneuploidy and abnormal centrosome function (62). Although abnormal regulation of centrosomes is an excellent candidate cause of CIN, additional studies are needed to develop a better understanding of the role of centrosome amplification and centrosome-associated kinase activity in CIN CRCs (7).

Genomic stability is also maintained by checkpoints that regulate the maintenance of DNA fidelity in response to genotoxic stress. The proteins involved in these DNA checkpoints act either as sensors of DNA damage or as transducers of the DNA damage signal to effector proteins, which then regulate cell cycle arrest or progression. A number of these proteins, including TP53, ATM, ATR, BRCA1, and BRCA2, have been shown to play a role in human cancers (7). Of all these checkpoint proteins, TP53 has been most clearly shown to play a role in CRC and to play at least a permissive role in the development of CIN (Figure 1.1a) (7).

Cell cycle proteins have recently been implicated in the maintenance of chromosome fidelity in human cancers. Mutations in CDC4, an evolutionarily conserved E3 ubiquitin ligase that regulates the G1-S checkpoint, have been identified in a subset of CRCs (63). Cdc4 is regulated by p53 and is one of the downstream effectors of p53-mediated regulation of genomic stability, especially during genomic stress (7).

Other factors, including telomere maintenance, may also play a role in the development of CIN in CRC, and future studies will elucidate their role in CRC development. It is important to note that CIN CRCs, which occur mostly in the distal colon, are generally aggressive tumours, and that CIN status serves as an independent predictor of poor survival in patients with CRC (1).


[Figure 1.1: three panels. A. Progression of CRCs with CIN. B. Progression of CRCs with MSI. C. Conceptual progression of CRCs with CIMP.]

Adapted from Grady WM, Carethers JM. Genomic and epigenetic instability in colorectal cancer pathogenesis. Gastroenterology 2008;135(4):1079-99.

Figure 1.1 Genetic and Epigenetic Instability Pathways in CRC. Progression of CRCs along the CIN (A), MSI (B), and CIMP (C) pathways from normal colonic epithelium to metastatic cancer. Several key genes affected in each pathway, along with the stage at which they are affected, are emphasized.


Microsatellite Instability (MSI or MIN)

Microsatellites are short (<150 nucleotides) tracts of repeated sequence motifs, each one to five nucleotides long, which are found throughout the human genome in large numbers (64). During DNA replication, the primer and template strands in a microsatellite can occasionally dissociate and re-anneal incorrectly. This gives rise to heteroduplex DNA molecules, in which the number of microsatellite-repeat units differs between the template and the newly synthesized strand (64). Heteroduplexes in which the unpaired nucleotides are partially extrahelical are known as insertion/deletion loops (IDLs). A microsatellite may lengthen in a daughter cell when nucleotide-pairing slippage (looping) occurs along the newly synthesized strand. Similarly, a microsatellite may shorten if slippage occurs along the template strand during DNA replication (7). Such alterations in the length of microsatellite repeats in genomic DNA often escape the proofreading activity of the polymerase and, unless repaired by the DNA mismatch repair (MMR) system, define microsatellite instability (MSI). Thus, MSI serves as a marker for deficient DNA MMR activity. MSI can be detected and quantified by examining a panel of microsatellite markers in tumour DNA and comparing them to the corresponding markers in DNA from normal tissue. Five or more microsatellite markers from the panel of 10 markers recommended by the National Cancer Institute are examined. These markers consist of the mononucleotides BAT-25, BAT-26, BAT-40, and BAT-34C4; the dinucleotides D2S123, D5S346, ACTC, D18S55, and D10S197; and one penta-mono-tetra compound marker, MYC-L (65). Tumours are classified as having high-frequency microsatellite instability (MSI-H) if ≥30% of the markers show instability. If <30% of the markers show instability, the tumour is classified as having low-frequency microsatellite instability (MSI-L), while if there is no apparent instability, the tumour is classified as microsatellite stable (MSS) (65). While germline MMR mutations cause MSI in HNPCC patients, approximately 15% of sporadic CRCs also display MSI due to combinations of somatic MMR mutations, loss of heterozygosity (LOH) at an MMR gene locus, and, most frequently, MLH1 promoter hypermethylation (66).
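The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated clinical tool: the function name `classify_msi` and its arguments are hypothetical, and the requirement of at least five scored markers simply mirrors the panel size stated in the text.

```python
def classify_msi(unstable_markers: int, markers_tested: int) -> str:
    """Classify a tumour as MSI-H, MSI-L, or MSS from marker counts.

    Hypothetical helper for illustration. `markers_tested` is the number
    of panel markers scored in tumour versus normal DNA; `unstable_markers`
    is how many of them showed instability.
    """
    if markers_tested < 5:
        raise ValueError("at least five markers from the panel must be examined")
    if not 0 <= unstable_markers <= markers_tested:
        raise ValueError("unstable_markers must lie between 0 and markers_tested")

    # No apparent instability: microsatellite stable.
    if unstable_markers == 0:
        return "MSS"

    # >=30% of markers unstable: high-frequency MSI.
    # Integer arithmetic avoids floating-point edge cases at exactly 30%.
    if 10 * unstable_markers >= 3 * markers_tested:
        return "MSI-H"

    # Some instability, but in <30% of markers: low-frequency MSI.
    return "MSI-L"
```

With a ten-marker panel, three unstable markers sit exactly on the 30% threshold and are therefore classified as MSI-H, whereas two unstable markers give MSI-L.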

The loss of MMR has been shown to increase the normal mutation rate of a cell by as much as 100- to 1,000-fold, a state termed the mutator pathway, and permits accumulation of mutations in genes crucial to many cellular functions (67, 68). This increase in the spontaneous mutation rate appears to be the mechanism for the apparent rapid neoplastic progression in HNPCC and sporadic MSI tumours (Figure 1.2) (69). Microsatellite sequences are present in critical growth regulatory genes, and because of the tumour-promoting effects of their mutational inactivation, these genes are the ones selected for in MSI tumours (7). Ultimately, clones that acquire mutations in these genes gain a growth advantage that promotes the formation of the carcinoma (7). The TGFBR2 gene, which contains two microsatellite sequences, undergoes frameshift mutations, observed in ~85% of MSI CRCs, causing the cells to escape the growth-suppressive effects of its ligand, transforming growth factor (TGF)-β1 (70, 71). Another targeted gene is BAX, a member of the BCL2 gene family that determines the commitment of the cell to programmed cell death (7). Monoallelic frameshift mutations in BAX, observed in ~50% of MSI CRCs, prevent programmed cell death and help immortalize cells (72, 73). However, mutations of TGFBR2 and BAX appear to occur late in the adenoma-carcinoma progression, in high-grade dysplasia (Figure 1.1b) (74, 75). Additional gene targets in MSI CRCs include the MMR genes MSH3, MSH6, and MLH3, whose inactivation further accelerates the accumulation of mutations (76). Other genes targeted in MSI CRCs include the Wnt signalling genes APC, AXIN2, TCF4, and WISP-3; the pro-apoptotic factors APAF1, BCL10, Caspase-5, and FAS; the DNA damage response and repair genes BLM, CHK1, MRE11, and RAD50; the cell cycle genes PTEN, RIZ, hG4-1, and E2F-4; and many others (7, 77, 78).

The BRAF gene, which encodes a component downstream of KRAS in the RAS/RAF/MAPK pathway, is also often mutated in MSI CRCs; however, the exact mechanism is unknown, since the BRAF gene does not contain a coding microsatellite sequence (79). Mutations in BRAF are usually V600E missense mutations that result in a constitutively active BRAF protein (79).

A number of clinicopathologic features of MSI CRCs distinguish them from CIN/MSS CRCs. MSI CRCs tend to have prominent tumour infiltrating lymphocytes (TILs), a Crohn’s-like reaction (several nodular lymphoid aggregates beyond the advancing edge of the tumour), mucinous and signet ring histology, a medullary growth pattern, poor differentiation, and lower tumour stage, and they occur predominantly in the proximal colon (80-82). MSI-H CRCs also tend to occur more frequently in female CRC patients and with earlier onset of disease (<50 years) (83, 84). Furthermore, MSI CRCs tend to be diploid, have a better overall survival compared to MSS CRCs, and are less likely to metastasize (85, 86).

The major agent used for the medical treatment of patients with advanced CRC, 5-Fluorouracil (5-FU), is recognized by the MMR system (87). 5-FU selectively kills cells with intact MMR, while MMR-deficient cells are resistant to 5-FU (87). In a clinical setting, patients with stage II and III sporadic MSI CRC do not show a survival benefit with 5-FU therapy when compared with MSS CRC patients in retrospective and prospective studies (88-90). 5-FU-based adjuvant chemotherapy might even decrease overall and disease-free survival among MSI CRC patients (88). Similarly, stage III HNPCC patients do not demonstrate a 5-year survival benefit with 5-FU treatment over untreated patients (91). CRC is a heterogeneous disease, and MSI may serve as a predictive marker for the response of patients with CRC to specific therapies and as a guide to the selection of optimal therapy (7).

The very existence of an MSI-L CRC subgroup has been controversial. MSI-L status was considered an intermediate step in the MSS-to-MSI-H progression, and in the vast majority of studies these cases are categorized with the MSS cases. Recently, however, MSI-L CRCs have been recognized as a unique subgroup with clinicopathological features that do not follow the pattern of either the MSI-H group or the MSS group (84). Like MSI-H tumours, MSI-L tumours tend to occur more commonly in the proximal colon, but their gender and family history associations are more similar to those of MSS tumours (84). More studies are required to fully characterize this CRC subgroup.


Adapted from Grady WM, Carethers JM. Genomic and epigenetic instability in colorectal cancer pathogenesis. Gastroenterology 2008;135(4):1079-99.

Figure 1.2 Comparison of Sporadic, FAP, and HNPCC CRCs. CRC progression is depicted in sporadic and high-risk hereditary syndromes. The tumour is initiated from a normal colonocyte stem cell that has sustained genetic alterations over time due to the local environment and any inherited germline mutations. DNA alterations provide a growth advantage that drives tumour progression as successive clonal outgrowths are generated, ultimately resulting in carcinoma. In FAP, inherited germline APC mutations accelerate tumour initiation; in HNPCC, tumour initiation is normal or slightly accelerated, but tumour progression is greatly accelerated due to the mutator phenotype that results from defective DNA MMR. Photomicrographs depict, in order, normal colonic tissue, tubular adenoma, high-grade dysplasia, and cancer.


Epigenetic Instability (CpG Island Methylator Phenotype, or CIMP)

Epigenetics refers to the heritable changes in the pattern of gene expression mediated by mechanisms other than alterations in the primary nucleotide sequence of DNA (92). Epigenetic modifications include cytosine methylation and a number of histone modifications, such as acetylation, methylation, phosphorylation, sumoylation, and ubiquitination, that make up the histone code (93). DNA methylation is a powerful mechanism for gene silencing. It maintains the large amounts of non-coding DNA in the cell in a transcriptionally inert state. This process prevents the transcription of large parts of the genome that consist of repeat elements, inert viral sequences, and transposons (DNA sequences that move from their usual location into a new region of the genome) (94). Transcription of such elements might be harmful to cells. DNA methylation also stabilizes pericentromeric regions of the chromosome, which are required for faithful DNA replication, and is responsible for X chromosome inactivation in females, silencing of imprinted genes, and age-related and tissue-specific gene expression (94).

The most widely studied epigenetic mechanism in humans is the enzymatic modification of cytosine bases in DNA to form 5-methylcytosine, which occurs only at 5’-CG-3’ cytosine-phosphoguanine dinucleotides (also called CpG dinucleotides) (95). Approximately 70% of all CpG dinucleotides in the human genome are methylated, while the remainder are clustered in CpG-rich regions of ≥200 base pairs that span the promoters and occasionally the first exons of genes (95). These regions are known as CpG islands and are found in association with 60% of all human genes. Methylated CpG islands are associated with DNA methyltransferases (DNMTs), histone deacetylases (HDACs), deacetylated histones, methylcytosine-binding proteins (MBPs), and transcriptional co-repressors, which silence gene transcription (Figure 1.3) (94). In normal cells, most CpG dinucleotides outside of CpG islands are methylated, while most CpG-island sites in gene promoters are unmethylated. In a cancer cell, however, the DNA methylation patterns are reversed. Abnormal methylation of promoters containing CpG islands occurs, resulting in gene silencing, whereas CpG sites in the bulk of the genome become unmethylated (Figure 1.4) (94). Not surprisingly, research in this discipline has focused on promoter silencing by hypermethylation of tumour suppressor genes. Many tumour suppressor genes have been identified as targets of this process, including RB, p16INK4A, VHL, APC, CDH1 (E-cadherin), and MLH1 (96, 97). CpG island methylation is a common event in CRC, affecting genes from many key functional groups that define the cancer phenotype, including Wnt signalling (the SFRP family of genes), mismatch repair (MLH1), cell-cycle regulation (CDKN2A/p16), epithelial differentiation (GATA4, 5), p53-mediated damage response (HIC1), and cell matrix interactions (TIMP3) (95). Aberrant methylation of HLTF, SLC5A8, MGMT, MINT1, and MINT31 has been observed in ACF, indicating that aberrant promoter methylation occurs early in the adenoma sequence (Figure 1.1c) (7).
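To make the notion of a CpG-rich region concrete, the following Python sketch counts CpG dinucleotides in a sequence and computes its GC content and observed-to-expected CpG ratio. The function names are hypothetical. Only the ≥200 base pair length comes from the text above; the GC content (>50%) and observed/expected (>0.6) cut-offs are commonly quoted sequence criteria that are assumed here for illustration and are not taken from this thesis.

```python
def cpg_summary(sequence: str) -> dict:
    """Summarize CpG content of a DNA sequence (hypothetical helper)."""
    seq = sequence.upper()
    length = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    # "CG" cannot overlap with itself, so str.count finds every CpG site.
    cpg_count = seq.count("CG")

    gc_content = (c_count + g_count) / length if length else 0.0
    # Number of CpG sites expected if C and G were placed independently.
    expected = (c_count * g_count) / length if length else 0.0
    obs_exp = cpg_count / expected if expected else 0.0

    return {
        "length": length,
        "cpg_count": cpg_count,
        "gc_content": gc_content,
        "obs_exp": obs_exp,
    }


def looks_like_cpg_island(sequence: str,
                          min_length: int = 200,     # from the text: >=200 bp
                          min_gc: float = 0.5,       # assumed cut-off
                          min_obs_exp: float = 0.6   # assumed cut-off
                          ) -> bool:
    """Apply the length criterion plus the assumed GC and obs/exp cut-offs."""
    stats = cpg_summary(sequence)
    return (stats["length"] >= min_length
            and stats["gc_content"] > min_gc
            and stats["obs_exp"] > min_obs_exp)
```

A sequence-based check of this kind only identifies candidate islands; whether a given island is methylated in a tumour must be established experimentally.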

Like genomic DNA instability, epigenetic instability is a common occurrence in colorectal neoplasms. A new concept was introduced in 1999 by Toyota and colleagues, in which CRCs characterized by multiple concordant methylation events were named CpG Island Methylator Phenotype (CIMP) (98). Since then, several epigenetic instability (CIMP) groups have been recognized based on the methylated loci, microsatellite instability status, and the presence or absence of BRAF and KRAS mutations. Based on these molecular characteristics, CIMP tumours have been subdivided into two groups, CIMP1 and CIMP2 (7). CIMP1 tumours are characterized by MSI status (~80%) and BRAF V600E mutations (>50%). A commonly used panel of genes to determine the CIMP1 subgroup includes MLH1, TIMP3, MINT12, MINT17, RUNX3, SOCS1, CDKN2A, RIZ1, and P14. CIMP2 CRCs, on the other hand, are characterized by MSS status and KRAS mutations (~90%) (99). A commonly used panel of genes to determine the CIMP2 subgroup includes MINT2, MINT27, MEGALIN, and NEUROG1 (7). Both CIMP CRC groups share some clinicopathologic features with MSI CRCs, such as the propensity for proximal colonic location, female sex, and mucinous histology (100, 101). However, several clinicopathologic differences between the CIMP1 and CIMP2 groups have been reported. CIMP1 CRCs tend to have higher tumour grades and an inflammatory cell infiltrate (TILs and/or Crohn’s-like reaction), while CIMP2 tumours do not (100). Patients with CIMP2 CRCs have a significantly worse outcome than either CIMP1 or non-CIMP MSS CRC patients (101).

It is important to note that the CIMP concept has not been accepted by all researchers in this field. Over the past few years there has been much debate as to whether CIMP tumours represent a biologically distinct group of CRCs or are an artificially selected group from a continuum of tumours displaying different degrees of methylation at specific loci (95, 102). The recognition of several CIMP groups has strengthened the notion that CIMPs may be a unique molecular subgroup of CRCs defined by a high proportion of aberrantly methylated gene promoters that arise by distinct mechanisms (7). The mechanism responsible for causing CIMP tumours is unknown, but recent progress in our understanding of the mechanisms through which DNA methylation occurs makes this a popular area of research.

An unresolved issue related to all types of genetic and epigenetic instability is whether it initiates the adenoma-carcinoma sequence or whether it arises during the process and facilitates CRC formation (103, 104).


Adapted from Baylin SB, Herman JG. DNA hypermethylation in tumorigenesis: epigenetics joins genetics. Trends Genet 2000;16(4):168-74.

Figure 1.3 Changes in the promoter CpG island following DNA methylation. The chromatin around the transcriptionally active promoter (green arrow) with non-methylated CpG sites (white circles) is occupied by widely spaced nucleosomes composed of acetylated histones (green ovals) methylated at lysine 4 residues (yellow asterisks). The region is accessible to key components of the transcription machinery, including transcription factors (TF), histone acetyltransferases (HAT), and transcriptional coactivators (CA). The flanking regions on either side of the active promoter contain methylated CpG sites (black circles). These regions are transcriptionally silenced and are embedded in chromatin characteristic of the transcriptionally silenced state. They contain methylcytosine-binding proteins (MBPs) and deacetylated histones (purple ovals) methylated at lysine 9 residues (black asterisks). The blue vertical bars on either side of the unmethylated CpG island represent the unknown molecular events that prevent the spread of DNA methylation and transcriptional silencing machinery across the CpG island in the promoter region of normal cells. The DNA methylation machinery consists of DNA methyltransferases (DNMTs), transcriptional corepressors (CR), and histone deacetylases (HDAC). The lower panel depicts the breakdown of the barriers in cancer cells, in which transcriptionally repressive chromatin and DNA methylation have spread into the CpG island in the promoter region, silencing promoter activity (red arrow).


Adapted from Baylin SB, Herman JG. DNA hypermethylation in tumorigenesis: epigenetics joins genetics. Trends Genet 2000;16(4):168-74.

Figure 1.4 Distribution of CpG dinucleotides in the human genome and differences in methylation patterns between normal cells and cancer cells. In normal cells, most CpG sites outside of CpG islands, in exons (blue boxes) and introns (lines between blue boxes), are methylated (black circles), whereas most CpG islands in gene promoters are unmethylated (white circles). In cancer cells, the DNA methylation and chromatin patterns are reversed. Many CpG sites in the bulk of the genome and in coding regions of genes (which should be methylated) are unmethylated, while CpG islands (which should be unmethylated and permit transcription) are methylated, with associated transcriptional silencing.


1.1.6 Hereditary Syndromes, Pathology and Presentations

The majority of CRCs are sporadic (~70-85%) and arise as the result of somatic alterations in colon cancer susceptibility genes. CRCs display familial aggregation in ~25% of cases, with <10% of all CRCs attributed to inherited susceptibility syndromes. The hereditary CRC syndromes can be broadly classified into two categories based on the presence or absence of multiple colorectal polyps. The development of large numbers of polyps distinguishes the polyposis syndromes, which can be additionally subdivided on the basis of histologic criteria into adenomatous or hamartomatous syndromes (105).

Hereditary Nonpolyposis Colorectal Cancer (HNPCC)

Hereditary Nonpolyposis Colorectal Cancer (HNPCC), or Lynch syndrome (named after the oncologist who pioneered the study of this disease), is the most common form of hereditary CRC, accounting for 1-6% of all CRCs (106-108). It is an autosomal dominant syndrome caused by germline mutations in MMR genes. The majority of HNPCC families (over 90%) are found to have mutations in three key MMR genes, MLH1, MSH2, and MSH6 (109). Mutations are located in all regions of these genes without any obvious hotspots. Other, less commonly implicated genes are PMS2 and PMS1 (110). The number of polyps in patients with HNPCC is usually modest. However, compared to polyps observed in the general population, those in HNPCC exhibit an earlier onset (third to fourth decade of life), greater size and frequency, more villous and dysplastic characteristics, and a more rapid progression to cancer (Figure 1.2) (111, 112). Patients with HNPCC have an 80% lifetime risk of developing CRC, with a mean age of onset of 44 years, which is approximately 20 years earlier than that for sporadic CRCs (16). In HNPCC, multiple CRCs tend to occur (synchronous 18%, or metachronous 24%), with a propensity for proximal colonic location (70%), poor differentiation, and mucinous histology, and more than 95% of CRCs demonstrate MSI (113). HNPCC patients may also present with extracolonic tumours of the endometrium, stomach, small bowel, ovary, hepatobiliary epithelium, uroepithelium (kidney and ureter), bladder, and brain (114-116). Women with HNPCC have a 60% lifetime risk of developing endometrial tumours (16). In the absence of screening, most HNPCC patients remain asymptomatic until the development of cancer.

Because early recognition of patients at risk for HNPCC is not straightforward, several diagnostic criteria were developed. In order to identify the genes involved and to define families with the syndrome, the Amsterdam criteria (Amsterdam I) were established (117). Several modified versions have been proposed to include extracolonic cancers and to account for patients from small families (Table 1.3) (118-120). However, these criteria were intended for research and not for diagnostic purposes. The Bethesda Guidelines, and later the revised Bethesda Guidelines, were subsequently developed to identify patients who would benefit from MSI testing (121, 122).

A novel mechanism causing HNPCC has been identified in which different deletions that disrupt the 3’ end of TACSTD1, a gene located immediately upstream of MSH2, lead to inactivation of the adjacent MSH2 gene through induction of methylation of its promoter in tissues that express TACSTD1 (123). The resulting allele-specific hypermethylation of the MSH2 promoter is transmitted over generations.

Patients with an unusual variant of HNPCC, Muir-Torre syndrome, manifest sebaceous gland neoplasms, skin cancers (keratoacanthomas and basal cell carcinoma), and an increased risk of breast cancer (105, 124).

Recently, a number of reports have described patients with biallelic germline mutations in one of the MMR genes. These patients develop childhood malignancies, mainly haematological malignancies, brain tumours, early-onset CRCs, and HNPCC-associated tumours (125). Patients with these tumours are said to suffer from constitutional mismatch repair-deficiency syndrome (CMMR-D) and often display MSI in germline DNA sources.

Additionally, relatives of patients who meet strict criteria for family risk of CRC (Amsterdam I) and have microsatellite stable or low-frequency microsatellite instability tumours (MSS/L) have a statistically significantly lower risk of developing CRC than relatives of Amsterdam I patients with MSI-H tumours (126). This new type of familial segregation of MSS/L tumours was termed “familial colorectal cancer type X” syndrome (126, 127). While the exact mechanism is unknown, some of this familial risk may result from common variant alleles, with low to moderate penetrance, of key candidate genes already associated with colorectal cancer (128, 129).


Table 1.3 Clinical criteria for hereditary nonpolyposis colorectal cancer

Name of Criteria    Criteria

Amsterdam (117)
1. Three or more relatives with CRC, one of whom is a first-degree relative of the other two
2. CRC involving at least two generations
3. One or more CRCs diagnosed at age <50 years

Amsterdam II (118)
1. Three or more relatives with HNPCC-associated cancer (colorectal, endometrial, small bowel, ureter, renal pelvis), one of whom is a first-degree relative of the other two
2. CRC involving at least two generations
3. One or more CRCs diagnosed at age <50 years

Modified Amsterdam (119, 120)
1. Very small families which cannot be further expanded can be considered as HNPCC with only two CRCs in first-degree relatives; CRC must involve at least two generations, and at least one CRC must be diagnosed at age <50 years
2. In families with two first-degree relatives affected by CRC, the presence of a third relative with an unusual early-onset neoplasm or endometrial cancer is sufficient

Young age at onset
Proband diagnosed at age <40 years without a family history fulfilling Amsterdam or Modified Amsterdam criteria

HNPCC-variant
Family history suggestive of HNPCC, but not fulfilling Amsterdam, Modified Amsterdam, or young age at onset criteria

Bethesda (121)
1. Amsterdam criteria
2. Two HNPCC-related cancers, including synchronous and metachronous CRCs or associated extracolonic cancers
3. CRC and a first-degree relative with CRC and/or an HNPCC-related extracolonic cancer and/or a colorectal adenoma; one of the cancers diagnosed at age <45 years, and the adenoma diagnosed at age <40 years
4. CRC or endometrial cancer diagnosed at age <45 years
5. Right-sided CRC with an undifferentiated pattern on histology diagnosed at age <45 years
6. Signet-ring cell-type CRC diagnosed at age <45 years
7. Adenomas diagnosed at age <40 years

Revised Bethesda (122)
1. CRC diagnosed at age <50 years
2. Synchronous or metachronous CRC or other HNPCC-associated tumours regardless of age
3. CRC diagnosed at age <60 years with histologic findings of infiltrating lymphocytes, Crohn’s-like lymphocytic reaction, mucinous/signet ring differentiation or medullary growth pattern
4. CRC in ≥1 first-degree relative(s) with an HNPCC-related tumour, with one of the cancers being diagnosed at age <50 years
5. CRC diagnosed in ≥2 first- or second-degree relatives with HNPCC-related tumours, regardless of age

Adapted from Strate LL, Syngal S. Hereditary colorectal cancer syndromes. Cancer Causes Control 2005;16(3):201-13.


Familial Adenomatous Polyposis (FAP)

Familial Adenomatous Polyposis (FAP) was the first recognized polyposis syndrome and is the best characterized (105). FAP accounts for <1% of all CRCs (16). It is an autosomal dominant syndrome, caused by germline mutations in the adenomatous polyposis coli (APC) gene, with ~100% penetrance. It is characterized by rapid initiation, with hundreds to thousands of colorectal adenomatous polyps developing during the second or third decade of life (Figure 1.2) (105). The number and size of these polyps increase with time, potentially reaching 5,000 polyps, and the polyps have a slight predilection for the distal colon (105). FAP is usually associated with APC mutations in codons 169-1,600 (16), with a mutation hotspot located in exon 15 between codons 1,286 and 1,513 (130). More than 95% of these germline mutations are truncating or nonsense (16). Patients with FAP also develop polyps in the upper gastrointestinal tract, mainly in the gastric fundus and body, jejunum, and ileum. These polyps, unlike duodenal and ampullary polyps, have low malignant potential (105). Extracolonic tumours in FAP patients include papillary carcinoma of the thyroid, hepatoblastoma, adrenal hyperplasia and carcinoma, and various central nervous system tumours (131). Another associated extracolonic feature, congenital hypertrophy of the retinal pigment epithelium (CHRPE), is seen in some cases of FAP, mostly in patients with APC mutations in codons 463-1,387 (16).

Several FAP variants with unique characteristics have been identified. Gardner’s syndrome is characterized, in addition to colorectal adenomas, by osteomas, epidermoid cysts, skin fibromas, dental anomalies, and desmoid tumours. Causal APC mutations in Gardner’s syndrome occur in the small region between codons 1,403 and 1,587 (16). Turcot syndrome is characterized by central nervous system tumours (medulloblastomas, astrocytomas, and ependymomas) and colorectal adenomas. Mutations in APC (70%) as well as in the MMR genes MLH1 and PMS2 (30%) are observed in Turcot syndrome patients (105). In addition to these syndromes, an attenuated form of FAP (AFAP), with a more favourable prognosis and far fewer adenomas (usually fewer than 100), has been distinguished from classic FAP. Most cases of AFAP arise from mutations in the 5’ region of APC, upstream of codon 157 (16).

Hamartomatous Polyposis Syndromes

The hamartomatous polyposis syndromes are very rare, and combined, account for <1% of all CRCs. They are characterized by multiple, benign, nodular growths in the mucous lining of the intestinal wall, which develop at a young age (132).

Peutz-Jeghers syndrome (PJS) patients are distinguished by diffuse intestinal hamartomatous polyps and distinctive mucocutaneous pigmentation (black or brown macules on the perioral and oral mucosa as well as on the face). It is an autosomal dominant syndrome with high penetrance caused by mutations in the tumour suppressor gene LKB1, which encodes a serine/threonine kinase (133, 134). Patients with PJS are at increased risk of gastrointestinal as well as extraintestinal cancers, including cancers of the pancreas, breast, ovaries, cervix, thyroid, lung and prostate (135, 136).

Juvenile polyposis syndrome (JPS) patients develop distinctive smooth, spherical polyps composed of cystically dilated crypts. These polyps have increased malignant potential, but the exact magnitude of cancer risk is unknown (137). CRC, gastric, duodenal, and pancreatic cancers are observed in JPS patients (138). JPS is caused by mutations in two genes involved in the TGF-β signalling pathway, SMAD4 and BMPR1A (139-141). Mutations in PTEN are also found in a small number of JPS families (142).


Cowden’s disease and Ruvalcaba-Myhre-Smith/Bannayan-Zonana syndrome are closely related to JPS. Both syndromes involve mutations in the PTEN gene (143). The hamartomatous polyps observed throughout the gastrointestinal tract in these syndromes do not appear to be pre-neoplastic (105).

1.2 Mismatch Repair (MMR)

1.2.1 Sources of DNA Nucleotide Mismatches

Base pairing is an inherent structural and functional trait of deoxyribonucleic acid (DNA). Canonical base pairing, also referred to as Watson-Crick pairing, follows a strict rule by which adenine (A) pairs with thymine (T) and cytosine (C) pairs with guanine (G), establishing two and three hydrogen bonds respectively (144). DNA replication is an extraordinarily faithful process, with mutations occurring at a frequency of 1 in 10⁹ to 10¹⁰ base pairs per cell division (145). In eukaryotes, the majority of nuclear DNA is replicated by two members of the B family of DNA polymerases, Polδ and Polε (146). The average fidelity of these two enzymes is in the order of one error in 10⁵ nucleotides synthesized, which is further enhanced by their inherent proofreading ability to one error in 10⁷ nucleotides (147). Considering that the size of the human genome is ~3×10⁹ base pairs, an error rate of 1 in 10⁷ base pairs will generate hundreds of mispairs per DNA replication (144). Nucleotide misincorporation during DNA synthesis yields mismatches, or noncomplementary base pairs, within the DNA, which are fixed as mutations during the subsequent round of DNA replication (148). Such errors threaten the integrity of the DNA structure and the genetic code (144).
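The arithmetic behind "hundreds of mispairs per DNA replication" can be made explicit with a short sketch. This is a back-of-the-envelope illustration only: the function name is hypothetical, and the calculation assumes a single pass over a haploid genome of ~3×10⁹ base pairs with a uniform error rate.

```python
# Back-of-the-envelope estimate of polymerase errors per genome replication.
# Assumption: one pass over a haploid genome with a uniform error rate.
GENOME_BP = 3_000_000_000  # ~3x10^9 bp, as quoted in the text


def expected_mispairs(nucleotides_per_error, genome_bp=GENOME_BP):
    """Expected number of misincorporations for one pass over the genome."""
    return genome_bp / nucleotides_per_error


without_proofreading = expected_mispairs(10**5)  # one error per 10^5 nucleotides
with_proofreading = expected_mispairs(10**7)     # one error per 10^7 nucleotides
print(without_proofreading, with_proofreading)
```

Under these assumptions, proofreading alone reduces the expected load from tens of thousands of mispairs to a few hundred, which is the substrate left for mismatch repair.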

DNA mismatches can also occur at sites of DNA damage. Sources of exogenous DNA damage include ionizing radiation, sunlight, chemical compounds, reactive oxygen and nitrogen species, and genotoxic drugs (149). The oxidation of dGTP, or its precursors, to 8-oxo-dGTP by reactive oxygen species (ROS) is the most common form of oxidative DNA damage (144). 8-oxo-dGTP competes with dTTP for incorporation opposite A during DNA synthesis, resulting in generation of 8-oxo-G⋅A mispairs and subsequent T⋅A→G⋅C transversion mutations unless repaired (150, 151). Similarly, nucleotide pool alkylation generates base derivatives, including the highly mutagenic O⁶-methylguanine (MeG). Upon conversion to MeGTP, its pairing with C is distorted and pairing with T becomes more favourable in the context of the DNA duplex (144). If not corrected, the resulting MeG⋅T mismatch will give rise to A⋅T→G⋅C transition mutations (152). Alkylating agents, such as N-methyl-N’-nitro-N-nitrosoguanidine (MNNG), generate a range of DNA lesions including MeG, which will pair equally well with C or T during DNA replication and result in G⋅C→A⋅T transition mutations (153, 154). Alterations in the dNTP pool also result in mismatch generation. If the dNTP pool becomes unbalanced so that the ratio of correct to incorrect dNTPs decreases, base substitutions by DNA polymerases increase (155). This effect is often exploited by clinical chemotherapies such as 5-fluorouracil (5-FU). 5-FU inhibits thymidylate synthase (TS), the enzyme responsible for the reductive methylation of dUMP in the final step of dTMP biosynthesis, resulting in accumulation of dUMP at the expense of dTMP. Following phosphorylation of dUMP to dUTP, dUTP is incorporated into DNA (156, 157), occasionally opposite G, to generate U⋅G mismatches (158).

Another type of DNA mismatch can arise when, due to the presence of extra nucleotides in one DNA strand relative to the other, one or several DNA bases are left unpaired and form small nucleotide insertion/deletion loops (IDLs) (144).


A most surprising recent discovery is that, under certain circumstances, DNA mismatches are actively generated to facilitate localized mutagenesis or recombination (144). An important example of this is somatic hypermutation (SHM) in antigen-stimulated B-cells, where activation-induced deaminase (AID) deaminates DNA cytosines to generate G⋅U mismatches in the immunoglobulin variable gene region (144). The function is to induce a mutagenic response that will result in affinity maturation of the expressed antibody (159). Deamination of cytosines by AID also induces class-switch recombination (CSR) at the immunoglobulin locus, most likely by providing excision repair-mediated single-stranded DNA (ssDNA) breaks, which trigger initiation of recombination (160).

1.2.2 MMR System Overview and History

The DNA mismatch repair (MMR) system guards the integrity of the genome (161). MMR contributes ~50-1000-fold to the overall fidelity of DNA replication, targeting mispaired bases and insertion-deletion loops occurring during replication, homologous recombination, or as a result of DNA damage (161). The following sections outline the function of MMR, its importance in maintaining DNA integrity, and the consequences of a defective MMR system.
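As a rough consistency check, the 50-1000-fold contribution of MMR can be combined with the post-proofreading polymerase fidelity of one error in 10⁷ nucleotides quoted in Section 1.2.1. The sketch below is illustrative only; the function names are hypothetical, and it assumes that the fold improvement simply multiplies polymerase fidelity and that a single ~3×10⁹ bp genome is copied.

```python
# Combine polymerase fidelity with the MMR fold improvement (illustrative only).
POLYMERASE_NT_PER_ERROR = 10**7  # after proofreading, from Section 1.2.1
GENOME_BP = 3_000_000_000        # ~3x10^9 bp


def fidelity_with_mmr(fold_improvement, nt_per_error=POLYMERASE_NT_PER_ERROR):
    """Nucleotides synthesized per residual error once MMR has acted."""
    return nt_per_error * fold_improvement


def residual_errors_per_replication(fold_improvement, genome_bp=GENOME_BP):
    """Expected errors that escape both proofreading and MMR in one genome copy."""
    return genome_bp / fidelity_with_mmr(fold_improvement)


for fold in (50, 1000):
    print(fold, fidelity_with_mmr(fold), residual_errors_per_replication(fold))
```

Under these assumptions the combined fidelity falls between one error in 5×10⁸ and one error in 10¹⁰ nucleotides, which is broadly in line with the overall mutation frequency of 1 in 10⁹ to 10¹⁰ base pairs per cell division cited earlier.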

1.2.3 Replication Error Repair

The process of MMR is highly conserved throughout evolution (149). Essential components of the system were initially identified in Escherichia coli (E. coli) by genetic analysis and were termed the “mutator” (Mut) genes since their inactivation generated hypermutable strains (149, 162, 163). The first MMR gene to be identified in E. coli was mutS1 (164). Subsequent studies led to the characterization of at least 10 MMR factors in E. coli (148, 165). Heteroduplex repair in E. coli is controlled by the status of adenine methylation at GATC sequences (166). Newly synthesized DNA is modified at GATC sequences by the deoxyadenine (dam) methylase following a transient delay (167-169). Dam methylase lags behind the replication fork by approximately 2 minutes (64). Mismatch repair of hemi-methylated DNA occurs on the unmodified, newly synthesized strand (148). E. coli cells that are either deficient in dam methylase or overproduce dam methylase display a mutator phenotype (170-173). Three proteins are essential in detecting a mismatch and directing the repair process: MutS, MutL, and MutH (149). MutS, with a DNA-binding domain and an ATPase/dimerization domain, detects a mismatch and forms a homodimer that non-specifically binds and bends DNA in search of mismatches (149). Once MutS recognizes a specific mismatch, it undergoes a conformational change and initiates the MMR pathway through direct and indirect interactions with other proteins, including MutL, MutH, and UvrD (149). MutL is recruited to the heteroduplex in a MutS- and ATP-dependent fashion (174-176). The MutL protein is an ATPase and is believed to act as a “molecular matchmaker” that mediates the interaction between MutS and MutH (177). MutL dimerizes and interacts with MutS, activating the latent endonuclease activity of MutH (149). Monomeric MutH endonuclease recognizes the temporarily unmethylated, newly synthesized strand and cleaves it at a hemimethylated GATC sequence located 3’ or 5’ to the mismatch, within 1,000 bp of the mismatch (178, 179). MutL also recruits UvrD, also known as MutU, to the damaged site (149). UvrD is DNA helicase II, which unwinds the DNA from the nick generated by MutH to about 100 bp past the mismatch, while single-stranded DNA binding (SSB) protein stabilizes the single-stranded gap (180). The ssDNA flap is degraded in the 5’→3’ direction by ExoVII or RecJ exonuclease if the nick occurred 5’ to the mismatch, or in the 3’→5’ direction by ExoI, ExoVII, or ExoX exonuclease if the nick occurred 3’ to the mismatch (181-183). Finally, the SSB-stabilized single-stranded gap is filled in by DNA polymerase III holoenzyme and DNA ligase seals the DNA ends (109). Importantly, in vitro MMR activity requires β clamp, a polymerase processivity factor, and γ complex, which loads β clamp onto the DNA helix (180). β clamp interacts directly with MutS (184).
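The strand-signal and excision rules described above can be sketched as a toy lookup. This is a minimal illustration, not a model from the cited literature: the function names, the 0-based coordinates, the assumption that every GATC site on the nascent strand is still unmethylated, and the choice of the nearest GATC site are all assumptions made for the example. Only the 1,000 bp window and the exonuclease assignments are taken from the text.

```python
# Toy model of MutH site selection and excision polarity in E. coli MMR.
# All names are illustrative; the lookup rules follow the paragraph above.

EXONUCLEASES = {
    "5'": ("5'->3'", ["ExoVII", "RecJ"]),          # nick lies 5' to the mismatch
    "3'": ("3'->5'", ["ExoI", "ExoVII", "ExoX"]),  # nick lies 3' to the mismatch
}


def gatc_sites_near(seq, mismatch_pos, window=1000):
    """Return (position, side) for each GATC within `window` bp of the mismatch.

    `seq` is the nascent strand written 5'->3'; positions are 0-based.
    """
    sites = []
    start = seq.find("GATC")
    while start != -1:
        if abs(start - mismatch_pos) <= window:
            side = "5'" if start < mismatch_pos else "3'"
            sites.append((start, side))
        start = seq.find("GATC", start + 1)
    return sites


def excision_plan(seq, mismatch_pos, window=1000):
    """Pick the nearest GATC site (an assumption) and report how excision proceeds."""
    sites = gatc_sites_near(seq, mismatch_pos, window)
    if not sites:
        return None  # no strand signal within reach in this toy model
    pos, side = min(sites, key=lambda site: abs(site[0] - mismatch_pos))
    direction, nucleases = EXONUCLEASES[side]
    return {"gatc_pos": pos, "nick_side": side,
            "direction": direction, "exonucleases": nucleases}
```

For example, `excision_plan("AAGATCAAAAAAAATAAAA", 14)` places the nick 5’ to the mismatch and therefore selects the 5’→3’ exonucleases, whereas a GATC site downstream of the mismatch selects the 3’→5’ set.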

While the MMR system in prokaryotes is well understood and its minimal components have been identified, our understanding of the eukaryotic MMR system is not as complete. The MMR pathway in eukaryotes follows the broad outline described above for the E. coli methyl-directed pathway; however, some notable differences are readily apparent (185). First, while bacterial MutS and MutL function as homodimers, their eukaryotic homologues are heterodimeric complexes composed of two related, but distinct, protein subunits (185). In fact, eukaryotic cells possess several MutS and MutL homologues (MSHs and MLHs respectively), and the choice of subunit partners dictates substrate specificity and cellular functions (165, 185). Second, while E. coli and other closely related Gram-negative bacteria take advantage of dam methylation to direct strand-specific repair, such signals are not available to other prokaryotic or eukaryotic cells (185). Third, no eukaryotic homologues of MutH or UvrD have been identified or implicated in eukaryotic MMR. The prokaryotic factors, their functions in MMR, and their eukaryotic homologues are summarized in Table 1.4.


Table 1.4 Identity and functions of E. coli and eukaryotic proteins involved in MMR

E. coli protein | Function | Homologues | Function
MutS | Binds mismatches | MSH2-MSH6 (MutSα) | Repairs single base-base mismatches and 1-2 base IDLs
 | | MSH2-MSH3 (MutSβ) | Repairs some single base IDLs and IDLs ≥2 bases
MutL | Matchmaker, coordinates multiple MMR steps | MLH1-PMS2 (yPMS1) (MutLα) | Matchmaker for coordinating events from mismatch recognition to DNA repair synthesis
 | | MLH1-PMS1 (yMLH2) (MutLβ) | Human heterodimer function unknown; suppresses some IDLs in yeast
 | | MLH1-MLH3 (MutLγ) | Suppresses some IDLs; participates in meiosis
MutH | Nicks nascent unmethylated strand at hemimethylated GATC sequence | None |
γ-δ Complex | Loads β-clamp onto DNA | RFC Complex | Loads PCNA, modulates excision polarity
β-Clamp | Interacts with MutS and may recruit it to mismatches and/or replication fork; enhances DNA pol III processivity | PCNA | Interacts with MutS and MutL homologues; recruits MMR proteins to mismatches; increases mismatch binding specificity of MSH2-MSH6
UvrD/MutU/Helicase II | Loaded onto DNA at nick by MutS and MutL; unwinds DNA to allow excision of ssDNA | None; LBA1? |
ExoI/ExoX | 3’ to 5’ excision of ssDNA | EXOI (Rth1) | Excision of dsDNA
RecJ | 5’ to 3’ excision of ssDNA | 3’ exo of Polδ | Excision of ssDNA
ExoVII | 3’ to 5’ and 5’ to 3’ excision of ssDNA | 3’ exo of Polε | Synergistic mutator with Exo1 mutant
DNA pol III | Accurate resynthesis of DNA | DNA polδ | Accurate repair synthesis
SSB | Participates in excision and in DNA synthesis | RPA | Participates in excision and in DNA synthesis
DNA ligase | Seals nicks after completion of DNA synthesis | DNA ligase | Seals nicks after completion of DNA synthesis

dsDNA – double-stranded DNA, PCNA – proliferating cell nuclear antigen, RFC – replication factor C, ssDNA – single-stranded DNA, exo – exonuclease, LBA1 – lupus brain antigen 1. Adapted from Kunkel TA, Erie DA. DNA mismatch repair. Annu Rev Biochem 2005;74:681-710.


Mammalian mismatch repair begins with the recognition of mispaired bases by either of the two heterodimeric MutS homologue complexes: MSH2-MSH6 (MutSα), which recognizes base-base mispairs and small insertion-deletion loops (up to 10 unpaired nucleotides), or MSH2-MSH3 (MutSβ), which only recognizes larger (up to 16 unpaired nucleotides) insertion-deletion loops (186-188). A second protein complex of MutL homologues, MLH1-PMS2 (MutLα) or MLH1-MLH3 (MutLγ), binds MutSα or MutSβ to form a ternary complex and recruits additional proteins required for the repair. MLH1-PMS1 (MutLβ) has been identified in humans but its function is still unknown (189). MutL proteins belong to the GHKL (gyrase/Hsp90/histidine-kinase/MutL) ATPase family and their ATPase function is critical for MMR (190). Upon formation, the ternary complex undergoes an ATP-dependent conformational change which allows it to slide away from the mismatch (as a sliding clamp) in either the 3’ or 5’ direction (144). The complex will reach proliferating cell nuclear antigen (PCNA), which is loaded onto the 3’ terminus of an Okazaki fragment or onto the 3’ end of the leading strand by replication factor C (RFC) (64). Binding to PCNA/RFC leads to the activation of a MutSα/PCNA/RFC/ATP-dependent endonuclease activity (located in the PMS2 subunit of MutLα) that will introduce nicks in the discontinuous strand (191). This generates 5’ entry points for the 5’→3’ exonuclease (EXO1), independent of whether the location of the initial strand discontinuity was 5’ or 3’ to the mismatch (144). The replication protein A (RPA) regulates the rate of DNA resection by the MutSα-EXO1 complex and additionally binds and protects single-stranded DNA generated by the exonuclease (144). MutLα regulates termination of mismatch-provoked excision by suppressing EXO1 hydrolysis of DNA that lacks a mismatched base pair (78, 191, 192). Polymerase δ (Polδ) loads at the 3’ terminus and fills in the gap with the help of its cofactors PCNA and RFC (144). Finally, DNA Ligase I completes the repair process by sealing the remaining nick. A simplified view of the current model of mammalian MMR is summarized in Figure 1.5.

There are several components of the eukaryotic MMR system that are still not well understood. Three models have been proposed for signalling events that take place downstream of the mismatch recognition. The first, the active-translocation model, proposes that MutSα (probably with MutLα) translocates along the DNA in a controlled manner dependent on ATP hydrolysis (193, 194). The second, the molecular-switch model, favours the stochastic bidirectional diffusion of multiple MutSα (probably with MutLα) sliding clamps away from the mismatch (195). Clamps that encounter a strand break with PCNA will then initiate the repair process. The third, the DNA bending/verification or stationary model, proposes that MSH proteins remain bound at the mismatch and that the communication between the mismatch and the strand recognition signal involves DNA bending and looping (196, 197). These three models are summarized in Figure 1.6. Another element of eukaryotic MMR that is not well understood is how the system discriminates between the newly synthesized and parental DNA strands. While E. coli uses hemi-methylated GATC sites for strand discrimination, it is proposed that the eukaryotic system uses strand breaks such as the 5’ or 3’ termini of Okazaki fragments in the lagging strand, or the 3’ terminus of the leading strand (78, 198). Finally, the full set of components of eukaryotic MMR has not yet been established. The human MMR system has been reconstituted in vitro using the proteins described above; however, several other proteins may also play a role in vivo. Similar to the prokaryotic MMR system, the eukaryotic system is bidirectional. However, EXO1 only contains 5’→3’ activity, and another exonuclease, MRE11, with a 3’→5’ activity may also be involved (199). Another protein, the non-histone chromatin component high-mobility group box 1 (HMGB1), is believed to be involved in MMR, as MutSα-dependent EXO1-mediated 5’→3’ degradation is optimal in its presence (192).

Figure 1.5 Summary of mammalian MMR. Simplified overview of the mismatch repair system in a mammalian cell. Bidirectional MMR requires strand discontinuities located either 3’ or 5’ to the mismatch (T-G mismatch) (a). The MSH2-MSH6 heterodimer recognizes and binds the mismatch (b), recruits the MLH1-PMS2 heterodimer, and undergoes an ATP-dependent conformational switch that allows the complex to slide along the DNA molecule (c). A latent PMS2 endonuclease is activated in an MSH2-MSH6-, RFC-, PCNA-, and ATP-dependent manner and introduces nicks in the discontinuous strand (red arrows) (d). These nicks generate 5’ entry points for EXO1, which degrades the nicked strand (e), generating single-stranded gaps that are protected by RPA (f). Polymerase δ then loads at the 3’ terminus, filling in the gap with the help of its co-factors PCNA and RFC, and, finally, DNA Ligase I completes the repair by sealing the remaining nick (g).

Adapted from Kunz C, et al. DNA Repair in mammalian cells: Mismatched repair: variations on a theme. Cell Mol Life Sci 2009;66(6):1021-38.

Figure 1.6 Three models for downstream signalling between the mismatch recognition and strand recognition signal. The stationary or trans model (right) states that MutS proteins remain bound to the mismatch while searching for the strand discrimination signal. The protein-protein interaction then induces DNA looping that brings the two distant sites together. The two sites can cooperate in a trans configuration. In the two cis/moving models, the MSH proteins move away from the site and search for the strand discrimination signal. In the translocation model (left), unidirectional ATP-dependent MutS protein movement results in formation of an alpha-like loop. In the molecular switch model (centre), binding of a MutS protein to the mismatch triggers bi-directional sliding of the protein away from the mismatch, thereby emptying the mismatch site for an oncoming MutS protein. The mismatch repair process begins once the MutS proteins reach the strand break.

Adapted from Li GM. Mechanisms and functions of DNA mismatch repair. Cell Res 2008;18(1):85-98.

1.2.4 MMR in DNA Damage Signalling, Cell Cycle Arrest, and Apoptosis

The MMR system is implicated in the cellular response to several types of DNA damage, including lesions produced by methylating and alkylating agents, ultraviolet radiation, and other carcinogens (148). Recognition and processing of such lesions by MMR activates DNA damage signalling pathways, resulting in cell cycle arrest and, in the case of high lesion load, apoptosis (148). Cell cycle arrest is an important mechanism for preventing cells with DNA damage from replicating (78). MutSα- and MutLα-deficient cells are defective in cell cycle arrest in response to a variety of DNA damaging agents (154, 200, 201). In addition, MMR-deficient cells are 100 times more resistant to killing by alkylating agents and approximately 2-4 times more resistant to killing by cisplatin (202). The exact molecular basis of this effect is unknown, but it has been reported that MMR-deficient cells fail to phosphorylate p53 and p73 in response to DNA damaging agents (203, 204). Treatment of human cells with methylating agents results in phosphorylation of p53 and induction of apoptosis, a response that is dependent on functional MutSα and MutLα (205). These findings implicate ataxia telangiectasia mutated (ATM), ATM- and Rad3-related (ATR), and/or c-Abl because these kinases phosphorylate p53 and p73 in response to DNA damage (203). MutSα and MutLα interact physically with ATM, ATR-ATR interacting protein (ATRIP), c-Abl, and p73 in cells treated with DNA damaging agents (201, 206-208). PMS2 can also interact with p73 independently of MLH1, indicating that MMR proteins have specific roles in the DNA damage response (209). MMR mutant cells fail to trigger G2/M arrest in response to the methylating agent MNNG or similar alkylating agents (210) and, more specifically, this methylation-induced G2/M arrest is dependent on a full complement of MLH1 protein (211). The components that transmit the MMR-dependent G2/M checkpoint arrest in response to methylating agents are ATM, ATR, checkpoint kinase 1 (Chk1) and checkpoint kinase 2 (Chk2) (212). MSH2 binds to Chk1 and Chk2 in response to DNA damage and physically interacts with ATR, which leads to Chk1 phosphorylation (213, 214). The finding that the mitogen activated protein (MAP) kinase p38α is activated in an MLH1-dependent fashion in response to the methylating agent temozolomide provides an additional link between the MMR system and the G2/M checkpoint (215). A simplified model of the DNA damage signalling pathway involving MMR is summarized in Figure 1.7.

Such observations implicate MMR proteins in a signalling cascade that leads from DNA damage to cell cycle arrest and/or apoptosis (78). Two models have been proposed to describe the role of MMR in DNA damage signalling. The first, the futile DNA repair cycle model, suggests that MMR, which targets only the newly replicated DNA strand, engages in a futile DNA repair cycle when it encounters DNA lesions in the template strand. This futile cycling then activates DNA damage signalling pathways to induce cell cycle arrest and apoptosis (200). An alternative model, the direct signalling model, suggests that MutSα/MutLα directly trigger DNA damage signalling and a checkpoint response by recruiting ATM or ATR/ATRIP to the lesion (208).


Figure 1.7 MMR-mediated DNA damage signalling leads to G2/M arrest and apoptosis. MMR proteins bind to damaged DNA and recruit various signal-transducing kinases, including ataxia telangiectasia mutated (ATM), ATM and Rad3-related (ATR), and checkpoint kinase 1/checkpoint kinase 2 (Chk1/Chk2). These kinases then stabilize and activate p53, a key component in DNA damage responses, including cell cycle checkpoint activation and programmed cell death (apoptosis). p73, a p53 homologue, also transduces the MMR protein-dependent DNA damage response, and postmeiotic segregation 2 (PMS2) is known to bind and stabilize p73. In this pathway, the p38 mitogen-activated protein (MAP) kinase pathway connects MMR proteins and p53/p73. c-Abl, a tyrosine kinase, acts upstream in this pathway by stabilizing p73.

Adapted from Jun SH, Kim TG, Ban C. DNA mismatch repair system. Classical and fresh roles. FEBS J 2006;273(8):1609-19.


1.2.5 Other Roles of MMR

Fidelity of Genetic Recombination

MMR affects the efficiency and fidelity of both mitotic and meiotic DNA recombination (187, 216, 217). Recombination involves the pairing of single strands of DNA derived from different parental duplexes, which often leads to mismatch generation (216). A failure to repair such mismatches results in segregation of non-identical DNA strands at the next mitotic division (216). During mitotic recombination, MMR is believed to prevent strand exchange between similar, but non-identical, or homeologous sequences. The proposed mechanism by which MMR accomplishes this is by blocking the strand-exchange process on mismatch detection (64). Consequently, loss of MMR increases the rate of gene duplication 50-100-fold, creating a genetic destabilization effect that is attributed to illegitimate recombination and may further contribute to cancer predisposition (218). This effect is illustrated by E. coli and Salmonella typhimurium, which do not normally exchange genetic information; however, MMR defects within one of these species allow it to incorporate DNA material from the other into its genome (219, 220). The MMR system appears to be a chief determinant of the species barrier in bacteria (148). Studies in yeast (Saccharomyces cerevisiae) demonstrated elevated levels of recombination between diverged sequences, more so in msh2 mutants than in mutants of MutL homologues such as pms1 (221, 222). Such results indicate that msh2 possesses anti-recombinant activity that is not dependent on MutL homologues (216). Additional studies in yeast have implicated MutSα and a RecQ family helicase, SGS1, in suppression of homeologous recombination (223). It has been postulated that MutSα recruits SGS1 to the site of DNA mismatches, where it unwinds the heteroduplex and blocks recombination (223). While suppression of homeologous recombination by MMR proteins in human cells is not well understood, two human SGS1 homologues, BLM and RECQ1, interact with MutSα, indicating that a mechanism similar to that in yeast cells may also be present in human cells (224, 225).

Meiosis is the cytological process that reduces the diploid chromosome content of gametocytes by one half to create haploid gametes (226). Following a single round of DNA replication, meiosis is achieved through two consecutive cycles of chromosome divisions (226). Crossovers between homologous non-sister chromatids are created from homology-dependent repair of meiosis-induced DNA double strand breaks during the first round of meiotic division (226, 227). Defects in crossover formation cause a failure in chromosome segregation, resulting in inefficient progression through meiosis and poor viability of the meiotic products (226). Inactivation of two additional MMR factors, MSH4 and MSH5, results in such meiotic defects in yeast (228, 229). A heterodimer composed of MSH4 and MSH5 forms a sliding clamp which was demonstrated to load onto synthetic Holliday junctions, and Holliday junction progenitors, with high specificity (230). Repeated loading of MSH4-MSH5 heterodimers stabilizes the recombination intermediate and facilitates its maturation to a double Holliday junction, which is then resolved to yield reciprocal crossovers associated with interference (COi) (226). Bound MSH4-MSH5 may play a role in positioning a Holliday junction resolvase to cut the junctions in opposite directions (226). Another MMR complex, MLH1-MLH3, is also believed to play a role in meiosis, since both male and female mice that lack these proteins are sterile (231, 232). PMS2 is also believed to play a role in synaptonemal-complex formation, but perhaps only in spermatogenesis, as only male knockout mice are sterile (233).

Generation of Immunoglobulin Diversity

Recent animal model studies have suggested that MMR proteins are directly involved in antibody diversification. MSH2- and MSH6-deficient mice accumulated five-fold fewer mutations in the V region of antibody genes (234, 235). During antibody diversification, AID generates G⋅U mispairs, which are processed by MMR (236). However, during this MMR, the high-fidelity polymerases δ and ε are replaced by the error-prone polymerase η, which introduces base substitutions and frameshift mutations (237). Additionally, MMR proteins play an important role in class switch recombination (CSR), in which the IgM constant region is substituted by downstream constant sequences (78). To accomplish this, MMR proteins utilize strand breaks generated by uracil DNA glycosylase (UDG) to repair the AID-induced G⋅U mispairs in a strand-indiscriminate manner, resulting in double-strand DNA breaks, which stimulate CSR (238).

Trinucleotide Repeat Instability

Trinucleotide repeat (TNR) expansion is the cause of several common neurodegenerative diseases, including Huntington’s disease, myotonic dystrophy, fragile X syndrome, and Friedreich’s ataxia (148, 239). Long trinucleotide tracts (>100 repeats) expand and delete less frequently in MMR-deficient E. coli strains (240). Subsequent studies with mouse models revealed similar results, where somatic and germline expansions at CAG⋅CTG loci within the Huntington’s gene and CTG⋅CAG loci within myotonic dystrophy protein kinase (DMPK) depend on functional MMR (148). Many of these studies were conducted using animal models; the role of MMR in human neurological diseases involving TNR is at present unclear (78). Additional genetic and biochemical studies are required to elucidate the mechanism of TNR expansion in human cells.


1.2.6 Role of MMR System In Cancer

Inactivation of the human MMR system leads to a large increase in spontaneous mutability and a strong predisposition to tumour development (241). Loss of normal MMR function leads to microsatellite instability (MSI), a type of genomic instability characterized by alterations in the length of microsatellite sequences (16, 187). Over 300 mutations have been identified in the MSH2, MLH1, and MSH6 genes, with the majority being missense mutations leading to inactivation of mismatch repair (16). In the early 1990s, it was shown that defects in the MMR pathway are the cause of typical and atypical HNPCC (242). MMR defects also play a role in 15-20% of sporadic CRCs and occur in tumours from a variety of other tissues (243). These include endometrial, ovarian, gastric, cervical, breast, skin, lung, prostate, and bladder cancers, as well as glioma, leukemia, and lymphoma (78). Regarding HNPCC, two independent groups, Kolodner and colleagues and Vogelstein and colleagues, identified germline mutations in the MSH2 gene at chromosome 2p16-p21 in HNPCC families (244, 245). The second locus linked to HNPCC, MLH1 at 3p21-p23, was identified shortly afterwards (246-248). Approximately 15-20% of sporadic CRCs exhibit a loss of MMR due to MLH1 promoter methylation (249, 250).

Knockout mouse models have been developed for MSH2, MSH3, MSH4, MSH5, MSH6, MLH1, PMS1, PMS2, MLH3, and EXO1 (Table 1.5). Most of the knockout mice are cancer prone with a mutator phenotype and microsatellite instability (78). The primary cancer susceptibility of MLH1, MSH2, and PMS2 knockout mice is lymphoma, and secondary cancer susceptibilities are to gastrointestinal tumours, skin neoplasms, and/or sarcomas (78). MSH3-knockout mice are either tumour free or develop tumours at a late age (78). MSH4-, MSH5-, and PMS1-knockout mice do not develop tumours and, of these, only PMS1-knockouts show low microsatellite instability (185). Male and female knockout mice for MSH4, MSH5, MLH1, MLH3 and EXO1 are infertile, while PMS2-knockout mice display infertility only in males (185).

Table 1.5 Phenotype of MMR-deficient knockout mice

Genotype         Tumour Spectrum                         Tumour Incidence  MSI                          Fertility (Male/Female)
MSH2-/-          Lymphoma, GI, skin, and other tumours   High              High                         +/+
MSH3-/-          GI tumours at late age or tumour free   Low               Low to High                  +/+
MSH6-/-          Lymphoma, GI, skin, and other tumours   High              Low in dinucleotide repeats  +/+
MSH3-/- MSH6-/-  Lymphoma, GI, skin, and other tumours   High              High                         +/+
MSH4-/-          None                                    None              N/A                          -/-
MSH5-/-          None                                    None              N/A                          -/-
MLH1-/-          Lymphoma, GI, skin, and other tumours   High              High                         -/-
PMS1-/-          None                                    None              Mononucleotide repeats only  +/+
PMS2-/-          Lymphoma and sarcoma                    High              High                         -/+
MLH3-/-          N/A                                     N/A               Low                          -/-
EXO1-/-          Lymphomas                               Moderate          High                         -/-

GI – gastro-intestinal, MSI – microsatellite instability. Adapted from Hsieh P, Yamane K. DNA mismatch repair: molecular mechanism, cancer, and ageing. Mech Ageing Dev 2008;129(7-8):391-407.

1.3 Genetic Variation

The publication of the human genome sequence (The Human Genome Project) and annotations of genetic variation in the human genome (The HapMap Project) combined with advances in technology have significantly changed the landscape of human genetics (251-253).

Now geneticists are able to address the contribution of common germline variants to disease susceptibility and outcome.

1.3.1 Polymorphisms (Definitions and Introduction)

The human genome is composed of over three billion bases of DNA encoding somewhere between 25,000 and 30,000 genes (251, 252). Multiple forms of genetic variation are present in the human genome, but the most common form is the single nucleotide polymorphism (SNP). SNPs are DNA variants where a single nucleotide at a fixed position in the genome is substituted with another (254). Traditionally, SNPs have been defined as sequence alterations that are present in the general population with a minor allele frequency (MAF) of >1%, and they have been regarded as resulting in neutral or benign phenotypic alterations (255). It has been estimated that there are in excess of 10 million common SNPs within the genome, occurring, on average, every 300-1,000 base pairs (254). Although the vast majority of SNPs are shared between populations, many are specific to populations or continental groupings of populations that share a recent history. This subset of SNPs is likely to give rise to the observable phenotypic differences in and between populations, including disease susceptibility and outcome (254). In this regard, SNPs can be used to measure admixture in populations and may be utilized to map genes that could account for the differences in disease incidence between populations (256, 257).
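As an illustration of the MAF threshold used in the traditional definition of a SNP, the following minimal sketch computes the minor allele frequency of a biallelic variant from genotype counts. The function names and the genotype counts are hypothetical and are not taken from the data in this thesis.

```python
def minor_allele_frequency(n_hom_ref, n_het, n_hom_alt):
    """Minor allele frequency of a biallelic variant from genotype counts.

    Each individual carries two alleles, so the alternate allele count is
    2 * homozygous-alternate + heterozygous, out of 2 * N alleles in total.
    """
    n_individuals = n_hom_ref + n_het + n_hom_alt
    if n_individuals <= 0:
        raise ValueError("at least one genotyped individual is required")
    alt_freq = (2 * n_hom_alt + n_het) / (2 * n_individuals)
    # The minor allele is whichever of the two alleles is less frequent.
    return min(alt_freq, 1.0 - alt_freq)


def is_common_polymorphism(maf, threshold=0.01):
    """Apply the traditional MAF > 1% definition of a polymorphism."""
    return maf > threshold


# Hypothetical genotype counts for 100 individuals (GG, GA, AA).
example_maf = minor_allele_frequency(60, 35, 5)
```

With these illustrative counts the variant allele accounts for 45 of 200 chromosomes, so the variant would be classed as a common polymorphism under the 1% threshold.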

It has been estimated that 50,000 to 200,000 SNPs may be biologically important (258-260). SNPs have the potential to contribute directly to disease pathogenesis, acting in a variety of ways depending on where they occur. SNPs located within genes can have serious consequences for the function or structural stability of a protein if they change its primary structure (254). Exonic SNPs that result in amino acid substitutions are referred to as non-synonymous SNPs (nsSNPs). These are the best-characterized class of genetic polymorphisms as they are subject to detection bias and their functional effects are usually easily assayable (254). The relative severity of an amino acid substitution can be predicted by evaluating the biochemical properties of the amino acid side chain in question. The significance of amino acid substitutions can be assessed using algorithms such as Sorting Intolerant From Tolerant (SIFT) and PolyPhen (261, 262). Exonic SNPs that do not alter the protein’s primary structure are called synonymous and were thought to be functionally uninteresting. However, these SNPs can affect mRNA stability and alter splicing signals in genes (263, 264). SNPs in introns, regulatory regions, and gene-distant regions can also be functionally important by affecting gene regulation. Promoter SNPs can directly affect transcription of a gene, while intronic SNPs may affect the splicing mechanism. Even SNPs that occur in apparent gene deserts have been associated with disease risk (265). Methodologies for prediction of functional intronic or regulatory SNPs are in their infancy (254).

Other classes of genetic variants include short tandem repeats (STRs) and variable number tandem repeats (VNTRs), collectively known as microsatellites. They are often extremely heterogeneous within a population (254). Copy number variants (CNVs) are structural variants containing large regions of variable copy numbers and can have MAF >1% (266). It has been estimated that a pair of individuals from a population will differ by at least 11 CNVs (267). CNVs may encompass entire genes or promoter regions and can have dose effects (266). CNVs, therefore, may have an impact on phenotype; however, the technology required to detect and assay CNVs has not reached the level of accessibility and versatility of that available for SNPs (268).

Insertion-deletion variants (indels) occur when one or more base pairs (up to several kilobases) are present in some genomes but not in others. An inversion variant is one in which the order of nucleotides is reversed in a specific region of the chromosome. A well-known inversion variant has been identified on chromosome 17, in which a ~900 kilobase interval is in the reverse order in approximately 20% of individuals with Northern European ancestry (269). Block substitutions, on the other hand, are strings of adjacent nucleotides that vary between genomes (270).

1.3.2 Utility and Roles of SNPs in Genetic Epidemiology

Genetic associations may be tested either directly, where the SNP under examination is considered the disease-causing variant, or indirectly, where a SNP serves as a surrogate marker for the functional variant (254). The direct approach poses the problem of selecting an appropriate subset of SNPs to study, because there are simply too many SNPs in the genome to study every one. Such studies are usually limited to evaluating select variants with known or predicted functional consequences. However, the chance of success may be greatly improved by examining biological candidate genes implicated in disease pathogenesis (254). Over the past decade, the roles of common polymorphisms in numerous candidate genes were investigated in several cancers. However, the majority of the published studies have been small, and therefore underpowered, and the effects detected for some of the polymorphisms studied have not been replicated. It is likely that most reported associations were false positives (type I error); however, a lack of adequate statistical power can also result in false-negative results (type II error).

Indirect study designs have the opportunity to map disease genes while remaining agnostic with regard to function. Such studies may detect causal variants by proxy as a consequence of the correlation between a marker SNP and the true functional variant (254). Markers on the same chromosome that remain strongly associated with one another are said to be in linkage disequilibrium (LD). The LD between many SNPs generally persists because meiotic recombination does not occur at random, but is concentrated in recombination hotspots (271). When LD is plotted across the genome, it appears to create blocks of common genetic variation, haplotype blocks, separated by hotspots of recombination (272). Some haplotypes may be large and extend over hundreds of kilobases, demonstrating that parts of the genome are relatively protected from recombination (254). The patterns of LD in the genome vary according to population genetic history. Two measures of pairwise LD are in common use, D’ (the standardized LD coefficient, D) and r² (the squared correlation coefficient), both of which have maximum values of one (complete LD between two markers). A maximum D’ value of one is reached when fewer than all four possible two-SNP haplotypes are observed in a population. Alternatively, r² is a direct measure of pairwise correlation, and for it to reach a value of one, only two of the four possible two-SNP haplotypes may be observed (254). A SNP that best represents a haplotype block is referred to as a tagSNP.
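The difference between the two LD measures can be made concrete with a short calculation. The sketch below uses the standard textbook definitions of D, D’ and r² for two biallelic SNPs. The function name and the haplotype counts are hypothetical and serve only to illustrate the two cases described above.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) for two biallelic SNPs from haplotype counts.

    A/a are the alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total <= 0:
        raise ValueError("at least one haplotype must be observed")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total

    # D: observed haplotype frequency minus that expected under independence.
    d = p_AB - p_A * p_B

    # D' rescales D by the largest magnitude it could take for these
    # allele frequencies (which depends on the sign of D).
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two SNPs.
    variance_product = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / variance_product if variance_product > 0 else 0.0
    return d, d_prime, r_squared


# Hypothetical counts: three of the four haplotypes observed.
three_haplotypes = pairwise_ld(50, 0, 30, 20)
# Hypothetical counts: only two of the four haplotypes observed.
two_haplotypes = pairwise_ld(60, 0, 0, 40)
```

In the first case D’ reaches one while r² stays well below one, because one SNP does not fully predict the other. In the second case both measures equal one, which is the situation in which one SNP can serve as a perfect proxy for the other.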

In the span of a decade, from 1997 to 2007, technological advances have moved the field from testing one SNP at a time to the assessment of a million SNPs per individual. Such genome-wide association studies (GWAS), unlike candidate gene studies, facilitate a hypothesis-free approach to genetic epidemiological investigations of common diseases (254). GWAS have resulted in the identification of novel loci in a wide spectrum of conditions, many of which are in genes that had not previously been considered to be involved in their pathogenesis (254, 273). Additionally, several instances have been reported where one genomic interval is associated with two or more seemingly distinct diseases, indicating a common pathogenic thread (270). Examples include different interleukin receptor genes that are associated with Crohn’s disease, multiple sclerosis, systemic lupus erythematosus, and rheumatoid arthritis (274, 275). However, observations of replicable associations in regions of the genome in which there are no known genes expose an enormous gap in the ability of GWAS to provide a biological explanation for why a genomic interval associates with a complex trait (270, 276, 277). For the most part, it can be inferred that a tagSNP for an LD block is statistically associated with a trait, but the precise variants in the block that have a causal role in contributing to variation in the trait are unknown (270). GWAS platforms are well suited for identifying variants under the common disease-common variant hypothesis, which states that common complex traits are largely due to common variants with small to modest effect sizes (278, 279). Research on the opposing theory, the rare allele hypothesis, is underway to identify low-frequency (<1%) high-penetrance variants, which are not in LD with common variants and therefore are not detected in GWAS.

1.3.3 Identification of Low-Penetrance Alleles in CRC and Founder Populations

Although family history is one of the most important factors in determining the risk of developing CRC, only 10-25% of all CRCs may be attributed to family history, and even fewer (up to 6%) are caused by mutations in the highly penetrant genes implicated in hereditary CRC syndromes. It is possible that there are still undefined high-penetrance genes contributing to familial CRC or, alternatively, that a number of low-risk alleles together confer a significant risk. The considerable genetic heterogeneity observed in these tumours may be explained, in part, by the contribution of such low-penetrance alleles in candidate genes.

Several low-penetrance alleles, including APC*I1307K, TGFBR1*6Ala, BLM*Ash, MTHFR*677V, HRAS1*VNTR, and MLH1 D132H, have been identified (16). However, APC*I1307K and MLH1 D132H are found only in the Ashkenazi Jewish population, which is a well-established founder population (16). Founder populations provide considerable advantages in mapping complex traits (280). These populations have expanded for a relatively small number of generations from a limited number of founders, which decreases genetic heterogeneity and reduces the complexity of genetic models even for complex diseases like CRC (281). Susceptibility to CRC will arise from a small number of independent high-penetrance alleles that are segregating in such populations and accounting for the majority of cases (281). The small number of generations in a founder population results in a reduced number of genetic recombination events surrounding the disease allele, compared to a large outbred population, and allows identification of rare alleles that contribute to a disease phenotype (280). Several founder populations have been identified and well studied for a variety of genetic disorders. Founder populations that have been used to advance our understanding of genetics and risks in CRC include Finns, Icelanders, Ashkenazi Jews, Danes, and Newfoundlanders (16).

Newfoundland is a relatively young founder population, established a little over 300 years ago when approximately 20,000 settlers from South-west England and Ireland made their homes there (282). In a span of 20 generations, the Newfoundland population has expanded to only 500,000 inhabitants living in large families in relatively isolated communities throughout the island. This has made Newfoundland an important resource for genetic mapping in CRC and has led to the identification of the MSH2 gene implicated in HNPCC (245). Therefore, founder populations may also assist in the identification of low-penetrance alleles involved in CRC susceptibility.

1.3.4 SNPs in MMR Genes

Recently, several GWAS have identified and replicated a number of common, low-penetrant alleles (with odds ratios between 1.0 and 1.2) in CRC. These low-penetrant alleles include rs16892766 (EIF3H on 8q23.3), rs4444235 (BMP4 on 14q22.2), rs9929218 (CDH1 on 16q21), rs4939827 and rs12953717 (SMAD7 on 18q21), rs10411210 (RHPN2 on 19q13.1), and a number of SNPs that are not located in a gene region, such as rs6983267 (8q24), rs10795668 (10p14), rs3802842 (11q23), rs4779584 (15q13), and rs961253 (20p12.3) (276). Not one of these polymorphisms is located in a gene implicated in MMR. This may be attributed to the GWAS study designs, which were focused on identifying low-susceptibility alleles across all CRCs regardless of the pathway involved. However, CRC is a heterogeneous disease caused by defects in a range of different pathways, and it is less likely that one or several polymorphisms will account for such a broad range of disorders. Since MMR defects are the sole cause of MSI CRCs, it is likely that polymorphisms in MMR genes will primarily be associated with this CRC subtype.

A number of MMR SNPs have been evaluated in a variety of conditions and, surprisingly, many reported associations are with extracolonic cancers where MSI status was not evaluated. The MLH1 I219V (rs1799977) polymorphism, which is located in exon 8 at nucleotide position 655 (A>G), was shown in a multilocus analysis with other metabolic SNPs to be associated with an increased risk for childhood acute lymphoblastic leukemia (283). A more recent evaluation of SNPs (284) found a significant association between the I219V homozygous variant (GG) and an increased risk of breast cancer. The MSH2 G322D (rs4987188) polymorphism is located in the coding region of the MSH2 gene, in exon 6, at nucleotide position 965 (G>A). G322D is a rare polymorphism with MAF <3% (285-287). Functional studies of its yeast homologue, G317D, indicate that this alteration results in a modest decrease in mismatch repair efficiency (288, 289). However, segregation studies in confirmed HNPCC or HNPCC-like families have provided no consistent evidence of its association with colorectal cancer among affected family members (287). Another MSH2 SNP, IVS12-6T>C (rs2303428), has been associated with acute myeloid leukaemia, non-Hodgkin lymphoma, familial and sporadic CRCs, high-grade dysplasia and cancer in ulcerative colitis patients, and primary lung cancer (290-295). The MSH6 G39E polymorphism, located in exon 1 (116G>A, rs1042821), was shown to be associated with increased risk of both colon and rectal cancers (296, 297). One of the best-known MMR polymorphisms is MLH1 D132H, which was associated with susceptibility to sporadic CRCs (298). This polymorphism attenuates the ATPase activity of MLH1 without affecting the MMR functions. This uncoupling of the MLH1 functions resulted in an increased risk of MSS, but not MSI-H, CRCs (298). However, this variant proved to be a founder polymorphism limited to Israeli populations and was not detected in North America or Europe (299, 300).

1.4 Hypothesis and Objectives

The MMR system is an important caretaker of genomic DNA stability. Numerous genetic variants and polymorphisms have been identified in MMR genes, most of which remain uncharacterized, and their role in disease, especially CRC and its subtypes, remains undetermined. It is likely that MMR SNPs contribute, in some form, to CRC susceptibility. We hypothesize that certain MMR polymorphisms constitute low-penetrance/modifier alleles that contribute to CRC susceptibility. The frequency of such polymorphisms may vary according to population.

In order to identify and characterize MMR modifier alleles, the following aims were formulated:

I. Evaluate a panel of MMR SNPs and their contribution to CRC susceptibility or to CRC subtypes by microsatellite instability status in two independent populations, Ontario and Newfoundland and Labrador. We also investigated the potential role of each selected SNP in affecting clinicopathological tumour characteristics such as age at onset, sex, family risk status, tumour location, histological grade, and TNM stage in the patient populations.

II. Investigate the functional effects of the MLH1-93G>A promoter polymorphism on the MLH1 promoter activity. Since the MLH1 gene shares a bidirectional promoter with another gene, EPM2AIP1, we also want to investigate the functional effects of the MLH1-93G>A polymorphism on the EPM2AIP1 promoter activity.

III. Determine if the MLH1-93G>A polymorphism along with other variants in the region make the region susceptible to DNA methylation resulting in the loss of MLH1 protein expression, microsatellite instability, and finally CRC.

By integrating data obtained from objectives I-III, we expect to identify genetic markers that will likely serve as modifiers of cancer risk in a distinct subset of CRC patients.


CHAPTER 2

Identification of MMR Modifier Alleles in CRC in Two Canadian Populations

Miralem Mrkonjic (1,2,3), Stavroula Raptis (1,2,3), Roger C. Green (4), Vaijayanti Pethe (1,2), Neerav Monga (5), Yuen Man Chan (3), Darshana Daftary (5), Elizabeth Dicks (6), Banfield H. Younghusband (4), Patrick S. Parfrey (7), Steven Gallinger (2,8,9), John R. McLaughlin (2,10,11), Julia A. Knight (2,10,11), and Bharati Bapat (1,2,3)

1) Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, ON, Canada; 2) Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, Canada; 3) Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada; 4) Department of Genetics, Memorial University, St. John’s, NL, Canada; 5) Ontario Familial Colorectal Cancer Registry, Cancer Care Ontario, Toronto, ON, Canada; 6) Faculty of Medicine, Memorial University, St. John’s, NL, Canada; 7) Department of Clinical Epidemiology, Memorial University, St. John’s, NL, Canada; 8) Department of Surgery, Mount Sinai Hospital, Toronto, ON, Canada; 9) Department of Surgery, University of Toronto, Toronto, ON, Canada; 10) Prosserman Centre for Health Research, Mount Sinai Hospital, Toronto, ON, Canada; 11) Department of Public Health Sciences, University of Toronto, Toronto, ON, Canada.


Most of the data presented in this chapter has been previously published essentially in this form in the Journal of the National Cancer Institute: Raptis S*, Mrkonjic M*, Green RC, Pethe VV, Monga N, Chan YM, Daftary D, Dicks E, Younghusband HB, Parfrey PS, Gallinger SS, McLaughlin JR, Knight JA, and Bapat B. MLH1-93G>A promoter polymorphism and the risk of microsatellite-unstable colorectal cancer. J Natl Cancer Inst, 2007. 99(6): p. 463-74 (*first co-authors); and in the journal Carcinogenesis: Mrkonjic M, Raptis S, Green RC, Monga N, Daftary D, Dicks E, Younghusband HB, Parfrey PS, Gallinger SS, McLaughlin JR, Knight JA, and Bapat B. MSH2 118T>C and MSH6 159C>T promoter polymorphisms and the risk of colorectal cancer. Carcinogenesis, 2007. 28(12): p. 2575-80.

The work in this chapter was primarily contributed by Miralem Mrkonjic. SR, an M.Sc. student, assisted in genotyping the MLH1-93G>A polymorphism and helped with optimization of the genotyping assay for the MSH2-118T>C polymorphism. YMC, a summer student, and VVP, a research associate, assisted in genotyping the MLH1-93G>A polymorphism. NM, a biostatistician, performed logistic regressions and Cochran–Armitage trend tests, and double-checked all other statistical tests. DD, JRM, and SG provided DNA samples from Ontario along with relevant clinical and pathologic data. RCG, ED, HBY, and PSP provided DNA samples from Newfoundland along with relevant clinical and pathologic data. JAK, a principal investigator, performed power calculations and provided critical evaluations for the manuscripts. SR, RCG, VVP, NM, DD, ED, HBY, PSP, SG, JRM, and BB also provided critical evaluations for the manuscripts.


Chapter 2 Identification of MMR Modifier Alleles in CRC in Two Canadian Populations

2.1 SUMMARY

Although up to 30% of patients with colorectal cancer have a positive family history of colorectal neoplasia, few colorectal cancers can be explained by mutations in high-penetrance genes. Inherited genetic changes, such as single nucleotide polymorphisms, in key candidate genes may contribute to colorectal cancer risk. We investigated whether polymorphisms in DNA mismatch repair genes are associated with the risk of colorectal cancer.

We genotyped 929 case patients and 1098 control subjects from Ontario and 471 case patients and 347 control subjects from Newfoundland and Labrador for five polymorphisms in the mismatch repair genes MLH1, MSH2, and MSH6 with the fluorogenic 5’ nuclease assay.

Tumour microsatellite instability was determined with a polymerase chain reaction-based method; microsatellite instability (MSI) status was assigned as high (MSI-H, ≥30% unstable markers among all markers tested), low (MSI-L, <30% markers unstable), or stable (MSS, no unstable markers). We used unconditional logistic regression to evaluate the association between each polymorphism and colorectal cancer after adjusting for age and sex. The associations between polymorphisms and tumour clinicopathologic features were evaluated with a Pearson’s chi-squared or Fisher’s exact test. All statistical tests were two-sided.
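The MSI status thresholds stated above can be expressed as a simple rule. The following is a minimal sketch of that rule only; the function name is hypothetical, and the sketch does not cover how individual markers are scored as stable or unstable.

```python
def classify_msi(unstable_markers, markers_tested):
    """Assign MSI status from the number of unstable markers.

    Thresholds follow the text: MSS = no unstable markers,
    MSI-H = 30% or more of tested markers unstable, MSI-L = otherwise.
    """
    if markers_tested <= 0:
        raise ValueError("at least one marker must be tested")
    if not 0 <= unstable_markers <= markers_tested:
        raise ValueError("unstable markers must be between 0 and markers tested")
    if unstable_markers == 0:
        return "MSS"
    # Integer comparison avoids floating-point issues at exactly 30%.
    if unstable_markers * 100 >= 30 * markers_tested:
        return "MSI-H"
    return "MSI-L"
```

For example, under this rule a tumour with 3 of 10 markers unstable is classified as MSI-H, while one with 2 of 10 unstable markers is classified as MSI-L.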

We observed strong associations between the MLH1 –93G>A (rs1800734) polymorphism and MSI-H tumours among case patients from Ontario (P = 0.001) and Newfoundland (P = 0.003). When compared with the control populations, homozygosity for the MLH1 –93G>A variant allele was associated with MSI-H tumours among case patients in Ontario (adjusted odds ratio [OR] = 3.23, 95% confidence interval [CI] = 1.65 to 6.30) and in Newfoundland (OR = 8.88, 95% CI = 2.33 to 33.9), as was heterozygosity among case patients in Ontario (OR = 1.84, 95% CI = 1.20 to 2.83) and in Newfoundland (OR = 2.56, 95% CI = 1.14 to 5.75). Genotype frequencies were similar among case patients with MSS and MSI-L tumours and control subjects, and the majority of homozygous carriers had MSS tumours. Among case patients from Ontario, an association between the MLH1 –93G>A polymorphism and a strong family history of colorectal cancer (for Amsterdam criteria I and II, P = 0.004 and P = 0.02, respectively) was observed.
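For readers less familiar with the form in which these estimates are reported, the sketch below computes a crude (unadjusted) odds ratio and a Wald 95% confidence interval from a 2×2 table. This is a simplification: the estimates reported in this chapter are adjusted for age and sex by unconditional logistic regression, which this sketch does not reproduce. The function name and the cell counts are hypothetical and are not study data.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Crude odds ratio with a Wald confidence interval from a 2x2 table.

    z = 1.96 corresponds to a two-sided 95% confidence interval.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of ln(OR) is the root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: variant carriers vs non-carriers among cases and controls.
crude_or, ci_lower, ci_upper = odds_ratio_wald_ci(20, 80, 10, 90)
```

An interval that excludes one indicates an association at the corresponding two-sided significance level; an interval that includes one does not.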

We also observed strong associations between the MSH2-118T>C (rs2303425) polymorphism and family history of colorectal cancer, based on the Amsterdam criteria I (P = 0.005), and Amsterdam criteria I and II (P = 0.036) among case patients from Ontario. This association was especially evident among female colorectal cancer patients in Ontario (for Amsterdam criteria I, and I and II combined, P = 0.003 and P = 0.0001, respectively).

In two patient populations, the MLH1 –93G>A polymorphism was associated with an increased risk of MSI-H colorectal cancer, while the MSH2-118T>C polymorphism was associated with strong family history of colorectal cancer in Ontario patients.

2.2 INTRODUCTION

MLH1 and MSH2 are the key components of the mismatch repair system, which participates in the recognition of nucleotide mismatches occurring during DNA replication and in the recruitment of additional mismatch repair proteins to the site to correct the replication error (301-303). In addition to germline mutations that have been identified in the MLH1 and MSH2 genes, numerous polymorphisms have also been identified; however, their functional contributions are currently unknown. Five single-nucleotide polymorphisms (SNPs) are of particular interest because of their prevalence and potential to affect mismatch repair functions. These SNPs are located in the MLH1 gene (i.e., –93G>A promoter SNP, rs1800734, and intronic IVS14-19A>G SNP, rs9876116, located 19 nucleotides upstream from the exon 15 splice acceptor site), in the MSH2 gene (i.e., -118T>C promoter SNP, rs2303425, and intronic IVS12-6T>C SNP, rs2303428, located 6 nucleotides upstream from the exon 13 splice acceptor site), and in the MSH6 gene (i.e., -159C>T promoter SNP, rs41540312). The MLH1 –93G>A polymorphism is located in the core promoter region, 93 nucleotides upstream of the translational start site in potential transcription factor binding sites (304). MLH1 –93G>A has been associated with several cancers, including lung and breast cancers (305, 306), and with increased risks of hyperplastic polyps and colonic adenomas in long-term smokers (307). Hutter et al. (308) showed that the MLH1 I219V polymorphism is linked with another downstream SNP, IVS14-19 A>G. In a European HNPCC population, the IVS14-19 variant G allele is overrepresented on chromosomes bearing a germline MLH1 mutation, and there was a strong association of the variant G allele and MLH1 substitution and deletion mutations (309).

The MSH2-118T>C polymorphism is located in the core promoter region, 118 nucleotides upstream of the transcription start site in a potential transcription factor binding site (310). The MSH2-118T>C SNP has been examined in small subsets of HNPCC patients (n = 40), suspected HNPCC patients (n = 56), and early-onset colorectal cancer patients (n = 40), with 157 control subjects, in the Korean population. No differences in variant allele frequencies between CRC case patients and control subjects were observed in this study (310). The contribution of the MSH2-118T>C SNP to lung cancer was also examined in a Korean case-control population with no significant results (295). The MSH2 IVS12-6 T>C polymorphism is located six nucleotides upstream of the exon 13 splice acceptor site. An association between this intronic SNP and non-Hodgkin lymphoma has been observed; the C allele occurred at a frequency of 11.4% in cancer patients and 5.0% in the normal control subjects (291, 311). The MSH2 IVS12-6 T>C polymorphism may also be associated with a predisposition to cancer in patients with ulcerative colitis; the risk of developing colorectal cancer was three times higher for patients with ulcerative colitis who carried the variant C allele than for those who carried the wild-type T allele (294).

In contrast to the other SNPs mentioned, no known case-control studies have been performed for the MSH6-159C>T polymorphism, which is located 159 nucleotides upstream of the transcriptional start site. This MSH6-159C>T SNP is located in an Sp1 transcription factor binding site that is conserved in the MSH6 gene in multiple species including mouse, guinea pig, and Mycobacterium tuberculosis (312). Functional analysis of the MSH6 promoter reveals that this SNP inactivates an Sp1 transcription factor binding site and reduces the transcription of the MSH6 gene (312). The MSH6-159C>T polymorphism was also found to be always associated with another MSH6 promoter polymorphism, MSH6-448G>A, located 448 nucleotides upstream of the transcriptional start site, and the polymorphic alleles MSH6-448A-159T inactivate two Sp1 binding sites and reduce MSH6 promoter activity by 50% (312).

In this case–control study of two populations, one in Ontario and the other in Newfoundland, we investigated whether any of these five polymorphisms in mismatch repair genes constitute low-penetrant alleles and contribute to colorectal cancer susceptibility by evaluating the association between each SNP and colorectal cancer risk. Because low-penetrance alleles in candidate genes may not only be associated with cancer incidence but may also influence cancer phenotype and prognosis, we also evaluated their association with clinical and pathologic tumour characteristics among case patients and control subjects.

2.3 MATERIALS AND METHODS

2.3.1 SNP Selection Criteria

The five polymorphisms analyzed in this study were selected on the basis of extensive database and literature searches. The databases that we surveyed included: Ensembl (www.ensembl.org/index), Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/ac/index.php), Human Genome Variation dataBase (http://hgvbaseg2p.org/index), International Society for Gastrointestinal Hereditary Tumours (www.insight-group.org/), National Centre for Biotechnology Information SNP database (http://www.ncbi.nlm.nih.gov/SNP/), and Swiss-Prot (http://www.ebi.ac.uk/uniprot/). We selected validated SNPs with a minor allele frequency of greater than 1% that had multiple independent submissions to the SNP databases and/or multiple citations in the literature, that were confirmed by frequency or genotype data in which all alleles had been observed in at least two chromosomes, and that were located in putative functional domains and regulatory regions.
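The selection criteria can be read as a conjunction of filters. The sketch below encodes one possible reading of them; it is an illustration only, not the procedure actually used to search the databases. The record fields, the interpretation of "multiple" as two or more, and the example entries (including their identifiers) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNP:
    # All fields are hypothetical summaries of a database/literature search.
    rs_id: str
    validated: bool
    minor_allele_frequency: float
    independent_submissions: int
    literature_citations: int
    min_chromosomes_per_allele: int
    in_functional_or_regulatory_region: bool


def meets_selection_criteria(snp):
    """Return True if a candidate passes every stated selection criterion."""
    # Assumption: "multiple" submissions or citations means two or more.
    well_supported = (snp.independent_submissions >= 2
                      or snp.literature_citations >= 2)
    return (snp.validated
            and snp.minor_allele_frequency > 0.01
            and well_supported
            and snp.min_chromosomes_per_allele >= 2
            and snp.in_functional_or_regulatory_region)


# Made-up candidates for illustration; these are not real database records.
candidates = [
    CandidateSNP("rs_example_1", True, 0.20, 3, 5, 40, True),
    CandidateSNP("rs_example_2", True, 0.005, 2, 0, 2, True),   # MAF too low
    CandidateSNP("rs_example_3", True, 0.10, 1, 1, 10, True),   # weak support
]
selected = [snp.rs_id for snp in candidates if meets_selection_criteria(snp)]
```

Only the first made-up candidate passes all filters in this sketch.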

2.3.2 Study Subjects

We conducted this case–control study with subjects from two different populations: those from the province of Ontario, and those from the province of Newfoundland and Labrador.

Case patients accrued from the Ontario and Newfoundland populations were stratified by family risk according to Amsterdam criteria I and II (117, 118). The Amsterdam I criteria include case patients with at least three family members who have been diagnosed with colorectal cancer in two successive generations, with one affected family member being a first-degree relative of the other two, and with at least one of the three being younger than age 50 years at diagnosis (117). The Amsterdam II criteria include the Amsterdam I criteria but are less stringent because they take into account HNPCC-associated tumours, such as those arising in the endometrium, small bowel, ureter, or renal pelvis (118). Additionally, we stratified case patients by family risk as described by Cotterchio et al. (313). Briefly, case patients were classified as having a high familial risk if they met Amsterdam I criteria (117) and did not have familial adenomatous polyposis. Case patients were classified as having intermediate familial or other pathologic risk if they met any of the following criteria: case patient had two relatives with any of the HNPCC-associated tumours, with one affected family member being a first-degree relative of at least one of the other two; case patient had a family member with HNPCC diagnosed at age 35 years or younger; case patient was younger than age 50 years and had one first- or second-degree relative diagnosed with colorectal cancer under the age of 50; case patient was age 35 years or younger irrespective of family history; case patient had multiple primary colorectal tumours; case patient had other primary HNPCC-associated tumours; case patient had one of the following: multiple polyps, Peutz–Jeghers disease, hamartomatous polyp, juvenile polyp, inflammatory bowel disease, or any unusual colorectal cancer histology (such as carcinosarcoma, adenosquamous, spindle cell, metaplastic, choriocarcinoma, signet ring, undifferentiated, trophoblastic differentiation, or small cell neuroendocrine carcinoma); or case patient was of Ashkenazi Jewish ancestry. All other case patients were classified as having a low or sporadic risk.

Case patients and control subjects from the Ontario population were obtained from the Ontario Familial Colorectal Cancer Registry, which is part of a US National Cancer Institute-supported consortium, the Colon Cancer Family Registry (http://epi.grants.cancer.gov/CFR/).


Living residents of Ontario with pathology-confirmed colorectal cancer, aged 20-74 years, and diagnosed between July 1, 1997, and June 30, 2000, were identified and recruited for the Ontario

Familial Colorectal Cancer Registry from the population-based Ontario Cancer Registry, as described by Cotterchio et al. (313). Family history information was collected by a mailed questionnaire and was used to construct pedigrees that would be used to classify the patient by his or her family risk. A total of 3776 patients with colorectal cancer were identified in Ontario; after we obtained their physicians’ approval, the patients were asked to complete and return the family history questionnaire. The familial risk for each patient was determined by use of the

Amsterdam Criteria (117) and other risk criteria described above (i.e., high, intermediate, or other family risk and low or sporadic risk) (313). Among the 3776 case patients contacted, 1593 were willing and able to participate in the registry. Of the 1593 case patients, 1103 had a blood sample available, from whom we identified 1004 case patients for this study by restricting inclusion to those with an adequate blood sample available and with colorectal cancer indicated as the primary tumour site. Because the majority of case patients and control subjects with specified ethnicity in Ontario (92.5% and 86.9%, respectively) and in Newfoundland (98.6% and

96.8%, respectively) are of white European/Caucasian ethnicity, we decided to exclude from the analyses those who were non-white and those who did not report ethnicity to minimize the potential for population stratification. Of 1004 case patients in Ontario, 929 were white and were successfully genotyped for all five SNPs, and thus constitute the case patients from Ontario. All information was collected from three mailed questionnaires (family, personal, and diet questionnaires) and phone and in-person interviews as well as blood and tissue specimens that were obtained after informed written consent was provided to participate in the Ontario Familial

Colorectal Cancer Registry, as described in protocols approved by the research ethics boards of Mount Sinai Hospital and the University of Toronto. No case patients with familial adenomatous polyposis were included in the registry.

In Ontario, population control subjects who had not been diagnosed with colorectal cancer were accrued by randomly selected residential telephone numbers during the years 1999 and 2000 and the years 2002 and 2003 by use of population-based Tax Assessment Rolls of the provincial government (313, 314). The Ministry of Finance Property Assessment Database (year 2000) was used to identify age- (5-year groups) and sex-matched controls, and thus control subjects were frequency-matched to case patients by sex and 5-year age group. Permission to use this file for recruiting control subjects was granted to an investigator in this study (J. R. M., a co-investigator with the Ontario Familial Colorectal Cancer Registry). The subjects were sent a letter of invitation with a reply form to assess eligibility (no colorectal cancer and age- and sex-matched to case patients). Non-responders received a follow-up telephone call. Another letter was sent if they did not respond within 7 weeks. A total of 2736 control subjects from Ontario agreed to participate in the study, with 1957 completing all three questionnaires (family, personal, and diet questionnaires). Of the 1957, 1314 control subjects provided blood samples, and 1098 of them were white. These 1098 control subjects were successfully genotyped for all five SNPs and thus constituted the control subjects from Ontario in this study. The remaining case patients and control subjects did not complete one or more questionnaires, did not have an adequate blood sample, or were not white.

The accrual pattern followed by the Newfoundland Familial Colorectal Cancer Registry was similar to that followed by the Ontario Familial Colorectal Cancer Registry. Case patients with colorectal cancer who were younger than age 75 years and diagnosed between January 1, 1999, and December 31, 2003, were identified through the Newfoundland tumour registry. No additional sampling was done in Newfoundland. We identified 1175 case patients with colorectal cancer and obtained consent from physicians to contact 1144 of them. Of the 1144 patients, 747 patients responded to the family history questionnaire, and of those, 555 provided blood samples.

We were able to obtain 504 case patients from the Newfoundland Familial Colorectal Cancer

Registry, of whom 430 provided ethnicity information and were classified as white. The recruitment of population control subjects who had not been diagnosed with colorectal cancer in

Newfoundland was accomplished through random digit dialling, and the control subjects were again matched to case patients by sex and 5-year age group. We identified 1602 control subjects from Newfoundland who agreed to participate in the study, of whom 703 completed all three questionnaires. Of the 703, 530 control subjects provided blood samples. The remaining case patients and control subjects did not complete one or more questionnaires, did not have an adequate blood sample, or were not white. Because the accrual of the Newfoundland population control subjects is being performed on an ongoing basis, at the time of this analysis, the 347 white control subjects who had provided an adequate blood sample and completed all three questionnaires were included in this study.

The mean age for control subjects from both provinces was calculated from the date of completion of the family history questionnaire, and that for case patients was calculated from the age at diagnosis. Case patients accrued from the Ontario and Newfoundland populations were stratified according to Amsterdam criteria I and II (117, 118) as described above. We collected data on tumour microsatellite instability (MSI) status, tumour location, tumour stage, and tumour grade, when available, through review of pathologic and/or surgical reports. Tumours were staged and graded according to the methodology of the American Joint Committee on Cancer (45).

2.3.3 Molecular Genetic Analysis

2.3.3.1 Single-Nucleotide Polymorphism Genotyping with the Fluorogenic 5’ Nuclease Assay

Peripheral blood lymphocytes were isolated from whole blood by use of Ficoll–Paque gradient centrifugation according to the manufacturer’s protocol (Amersham Biosciences, Baie d’Urfé, Quebec, Canada). Phenol–chloroform or the Qiagen DNA extraction kit (Qiagen Inc., Montgomery Co., MD) was used to extract genomic DNA from lymphocytes. The fluorogenic 5’ nuclease polymerase chain reaction (PCR) assay, or TaqMan assay (315), was used to genotype each of the following five SNPs: MLH1-93G>A, MLH1 IVS14-19A>G, MSH2-118T>C, MSH2 IVS12-6T>C, and MSH6-159C>T. Primers and probes were designed with Primer Express version 2.0 software (Applied Biosystems, Foster City, CA). Primers were tested for successful PCR amplification of the 70 (MLH1-93G>A), 85 (IVS14-19A>G), 57 (MSH2-118T>C), 105 (IVS12-6T>C), and 52 (MSH6-159C>T) base-pair products on the basis of the visualization of a robust amplicon product on an agarose gel (316, 317). Sequences of primers and probes are listed in Table 2.1. The master reaction mixture for the MLH1-93G>A polymorphism was prepared with reagents in the TaqMan 1000 Rxn PCR Core Reagents kit (Applied Biosystems, Foster City, CA). The PCR conditions for IVS14-19A>G were as follows: 2 minutes at 50°C, 10 minutes at 95°C (AmpliTaq Gold activation), 15 seconds at 92°C (DNA denaturation), and 1 minute at 58.5°C (primer–probe annealing and primer extension). The denaturation–annealing–extension sequence (15 seconds at 92°C and 1 minute at 58.5°C) was repeated for 39 cycles.


Table 2.1 Sequences of primers and probes

Polymorphism        Primer/probe   Sequence (5’ to 3’)
MLH1 –93G>A         F              TGAAGGGTGGGGCTGGAT
                    R              AATCACCTCAGTGCCTCGT
                    FAM            ACGTTCTTCCTTCAGCTGTAGC-TAMT
                    TET            TTCTTCCTTTAGCTGTAGCTTAC-TAMT
MLH1 IVS14-19A>G    F              TTGTATCTCAAGCATGAATTCAGCT
                    R              AATGAGTATCTGGTAGAACAGTTCTTCACT
                    FAM            TCCTTAAAGTCACTTCAT-MGBNFQ
                    VIC            TCCTTAAAGTCGCTTCA-MGBNFQ
MSH2 -118T>C        F              TCC CAC CCA CCG AAA CG
                    R              TCC GGC CAC GGC GAC CA
                    FAM            ACC CAA TCA GCT TCC A - MGBNFQ
                    VIC            ACC CAG TCA GCT TCC A - MGBNFQ
MSH2 IVS12-6T>C     F              TCCATTTATTAGTAGCAGAAAGAAGTTTAAA
                    R              CCAGTTTGTCGAATATATGTTGATTTACC
                    FAM            ATAATTTGTTTTGTAGGCCCCA-MGBNFQ
                    VIC            TAATTTGTTCTGTAGGCCCCA-MGBNFQ
MSH6 -159C>T        F              CGG GGG CGG GGC CT
                    R              ATC AAC AGG CGC CTC GC
                    FAM            CGC ACC GCC CGC GCA - MGBNFQ
                    VIC            CGC ACC GCC TGC GCA - MGBNFQ

F = forward primer; R = reverse primer; FAM = wild-type allele probe; VIC or TET = variant allele probe; MGBNFQ = minor groove binder non-fluorescent quencher.


The PCR conditions for the MSH2-118T>C and MSH6-159C>T SNPs were as follows: 2 minutes at 50°C, 10 minutes at 95°C (AmpliTaq® Gold activation), 15 seconds at 92°C, and 1 minute 30 seconds at 62.0°C, with the denaturation–annealing/extension sequence repeated for 49 cycles.

For the MLH1-93G>A polymorphism, the master reaction mixture contained (final concentrations) 96 µM dATP, 96 µM dCTP, 96 µM dGTP, 96 µM dUTP, 3.5 mM MgCl2, 300 nM forward primer, 300 nM reverse primer, 15 nM wild-type FAM-labelled probe, 100 nM variant TET-labelled probe, and AmpliTaq Gold DNA polymerase (0.025 U/µL). The conditions used were identical to those used for IVS14-19A>G, except the annealing temperature was 62°C. The MLH1–93G>A SNP assay used 29 µL of the master reaction mixture and 1 µL of DNA template (at 2–7.5 ng/µL).

The MLH1 IVS14-19A>G, MSH2-118T>C, MSH2 IVS12-6T>C, and MSH6-159C>T polymorphisms were genotyped by use of the Eurogentec qtPCR kit (Eurogentec, San Diego, CA). The master reaction mixture contained (final concentrations) 0.72 mM primers and 0.16 mM probes (both VIC and FAM labelled). Assays for these polymorphisms used the same conditions described for IVS14-19A>G. The Eurogentec qtPCR SNP assays used 12.5 µL of the master reaction mixture and 1 µL of DNA template (at 2–7.5 ng/µL).

All SNP genotyping assays were conducted in 96-well polypropylene plates (Axygen Scientific, Union City, CA), and the results were analyzed with the Applied Biosystems 7900HT Sequence Detection system and the accompanying software, SDS versions 2.0 and/or 2.1 (Applied Biosystems, Foster City, CA). Independent quality control for genotyping was done on 5%–10% of samples with restriction fragment length polymorphism (318) and/or sequencing (319, 320).


2.3.3.2 Tumour Microsatellite Instability Analysis

Tumour MSI analysis was performed as described previously (321). Briefly, paraffin-embedded colorectal tumour tissue from case patients with incident cases of colorectal cancer and paraffin-embedded normal colorectal tissue from the same patients were microdissected in areas with more than 70% cellularity in tumour and normal cell populations. MSI analysis was carried out with five or more microsatellite markers from the panel of 10 microsatellite markers, as recommended by the National Cancer Institute; these markers consist of the mononucleotides BAT-25, BAT-26, BAT-40, and BAT-34C4; the dinucleotides D2S123, D5S346, ACTC, D18S55, and D10S197; and one penta-mono-tetra compound marker, MYC-L (65). MSI was indicated by the presence of altered or additional bands of the PCR-amplified product from the tumour tissue, compared with the bands from matched normal colon tissue. MSI status was assigned as MSI high (MSI-H, ≥30% unstable markers among all markers tested), MSI low (MSI-L, <30% markers unstable), or microsatellite stable (MSS, no unstable markers) as described by the NCI recommended guidelines for MSI testing (65). For the analysis, the MSI-L and MSS groups were combined into one group (hereafter referred to as “MSS/L”), in order to distinguish them from the MMR-deficient MSI-H group. Primers were obtained from Applied Biosystems (Foster City, CA), and primer sequences are listed in Table 2.2.
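The MSI scoring rule stated above (MSI-H at 30% or more unstable markers, MSI-L below 30%, MSS with none, and MSI-L pooled with MSS for analysis) can be written as a short function. This is a minimal sketch of the rule as described in the text; the function names and the input check on the minimum of five markers are illustrative assumptions, not part of the original protocol.

```python
def msi_status(unstable: int, tested: int) -> str:
    """Classify a tumour as MSI-H, MSI-L, or MSS from marker counts."""
    # The text states that five or more markers were analysed per tumour.
    if tested < 5:
        raise ValueError("at least five markers are expected")
    if not 0 <= unstable <= tested:
        raise ValueError("unstable must be between 0 and tested")
    if unstable == 0:
        return "MSS"
    # Integer form of unstable / tested >= 0.30, avoiding float rounding.
    if 10 * unstable >= 3 * tested:
        return "MSI-H"
    return "MSI-L"


def analysis_group(status: str) -> str:
    """Pool MSI-L with MSS, as done for the statistical analysis."""
    return "MSI-H" if status == "MSI-H" else "MSS/L"
```

For example, three unstable markers out of ten tested sits exactly on the 30% threshold and is scored MSI-H, whereas two of ten is MSI-L and is analysed in the MSS/L group.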

Table 2.2 Primer sequences for microsatellite instability testing

Locus     Primer sequence (5’ to 3’)                  Product size (nucleotides)   Microsatellite type
BAT25     F-NED-TCG CCT CCA AGA ATG TAA GT            150-125                      Mononucleotide
          R-TCT GCA TTT TAA CTA TGG CTC
BAT26     F-HEX-TGA CTA CTT TTG ACT TCA GCC           107-125                      Mononucleotide
          R-AAC CAT TCA ACA TTT TTA ACC C
D5S346    F-6FAM-ACT CAC TCT AGT GAT AAA TCG GG       110-135                      Dinucleotide
          R-AGC AGA TAA GAC AGT ATT ACT AGT T
D2S123    F-HEX-AAA CAG GAT GCC TGC CTT TA            200-230                      Dinucleotide
          R-GGA CTT TCC ACC TAT GGG AC
ACTC      F-6FAM-CTT GAC CTG AAT GCA CTG TG           70-98                        Dinucleotide
          R-ATT CCA TAC CTG GGA ACG AG
BAT40     F-NED-ATT AAC TTC CTA CAC CAC AAC           110-140                      Mononucleotide
          R-GTA GAG CAA GAC CAC CTT G
MYC-L     F-NED-TGG CGA GAC TCC ATC AAA G             140-210                      Tetranucleotide
          R-CCT TTT AAG CTG CAA CAA TTT C
BAT34C4   F-HEX-ACC CTG GAG GAT TTC ATC TC            120-145                      Mononucleotide
          R-AAC AAA GCG AGA CCC AGT CT
D10S197   F-HEX-ACC ACT GCA CTT CAG GTG ACA           155-185                      Dinucleotide
          R-GTG ATA CTG TCC TCA GGT CTC C
D18S55    F-6FAM-GGG AAG TCA AAT GCA AAA TC           135-165                      Dinucleotide
          R-AGC TTC TGA GTA ATC TTA TGC TGT G

F = forward primer; R = reverse primer. NED, HEX, and 6FAM are the spectral dyes used.

PCRs for MSI analysis were prepared with reagents in the Platinum Taq Kit (Invitrogen, Burlington, ON) and in the Gold Taq Kit (Applied Biosystems, Foster City, CA). For BAT25 and D5S346, the master mixture contained (final concentrations): 2.0 mM MgCl2, all four deoxyribonucleotide triphosphates (each at 0.4 mM), forward primer (2 ng/µL), reverse primer (2 ng/µL), 2 units of Platinum Taq Polymerase, and 1x PCR buffer. The master reaction mixtures for BAT26, ACTC, and D2S123 were identical to that for BAT25 and D5S346, except that 1 unit of Gold Taq polymerase was used. The master reaction mixtures for D18S55, BAT40, and

MYC-L were also identical to that of BAT25 and D5S346, except that 2.25 mM MgCl2 was used. The master reaction mixture for BAT34C4 also used 2.25 mM MgCl2 but with 1 unit of Gold Taq polymerase, and the master reaction mixture for D10S197 used 1.0 mM MgCl2 and 1 unit of Gold Taq polymerase. All assays used 20 µL of the master reaction mixture and 2 µL of DNA template (at 25 ng/µL). The PCR conditions for all microsatellite sequences were as follows: 5 minutes at 94°C (Taq activation), 30 seconds at 94°C (DNA denaturation), 30 seconds at 55°C (primer annealing), and 30 seconds at 72°C (primer extension). The denaturation–annealing–extension sequence was repeated for 35 cycles, followed by 10 minutes at 72°C. Upon completion of the PCRs, the products of individual DNA samples were pooled into two groups as follows: group 1 contained 15% ACTC, 11.3% D5S346, 23.7% BAT26, 30% D2S123, and 20% BAT25; group 2 contained 10% D18S55, 15% BAT34C4, 20% D10S197, 35% BAT40, and 20% MYC-L. Once the pooled PCR products were mixed by brief vortexing, 1.5 µL of pooled PCR products was mixed with 8 µL of Hi-Di formamide (Applied Biosystems, Foster City, CA) and 0.5 µL of ROX standard (i.e., a passive reference dye) (Applied Biosystems, Foster City, CA) and loaded onto 96-well polypropylene plates. All microsatellite instability assays were conducted in 96-well polypropylene plates (Axygen Scientific, Union City, CA), and the results were analyzed with the Applied Biosystems 3130xl DNA Analyzer system and the accompanying software, GeneMapper version 3.7 for Microsatellite Instability (Applied Biosystems, Foster City, CA).


2.3.4 Statistical Methods

The associations of the variant alleles with colorectal cancer incidence, age at onset, MSI status, tumour location, tumour grade, tumour stage, and family risk status (Amsterdam criteria I and II) were evaluated with a two-sided Pearson’s chi-squared or Fisher’s exact test, in which a P value of less than 0.05 was considered statistically significant. Unconditional logistic regression was also used to evaluate the association between each SNP and colorectal cancer, after adjusting for age and sex. The Cochran–Armitage trend test was performed to examine the trend between MLH1 –93 variant allele carriers (of zero, one, or two copies) and case patients with MSI-H tumours. Chi-squared and Fisher’s exact tests, the trend test, and logistic regression were performed with SAS version 9.0 (SAS Institute, Cary, NC). All statistical tests were two-sided, and the results were adjusted by use of the Bonferroni correction method for multiple comparisons.
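To make the trend test concrete, the sketch below computes a two-sided Cochran–Armitage statistic for a 2 x 3 genotype table with allele-count scores of 0, 1, and 2. The analyses in this study were run in SAS; this standard-library Python version is only an illustration, it is unadjusted, and the function name and its normal-approximation P value are assumptions of the sketch, so the result need not match the SAS output exactly.

```python
import math


def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Two-sided Cochran-Armitage trend test for a 2 x k table.

    Returns (z, p_value) using the normal approximation.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls, and scores must have equal length")
    totals = [a + b for a, b in zip(cases, controls)]
    n = sum(totals)
    p_bar = sum(cases) / n  # overall proportion of cases
    # Score-weighted difference between observed and expected case counts.
    t_stat = sum(s * (a - m * p_bar) for s, a, m in zip(scores, cases, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    variance = p_bar * (1 - p_bar) * (s2 - s1 * s1 / n)
    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Ontario MSI-H case patients and Ontario control subjects by MLH1 -93
# genotype (GG, GA, AA), as listed in Table 2.6 and its footnote.
z, p = cochran_armitage_trend(cases=(55, 45, 14), controls=(687, 352, 59))
```

With these counts the statistic is positive, meaning the proportion of MSI-H case patients rises with the number of variant A alleles carried.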

Before the initiation of this study, we had performed power calculations. It was estimated that 600 case patients and 600 control subjects would be made available from the Ontario and Newfoundland registries and that the rarest allele frequency of selected polymorphisms was 5% (rarest carrier frequency of 10%), if we assumed Hardy–Weinberg equilibrium. From 600 case patients and 600 control subjects, we had 80% power to detect an odds ratio of 1.9 [if we assumed a correlation coefficient of 0.2 between case patients and control subjects as described by Dupont and Plummer (322)] with a two-sided alpha of 0.01. This more conservative value of alpha was used to reduce the likelihood of a chance positive result among the approximately 20 comparisons that were made in each population.
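The step from a 5% allele frequency to a carrier frequency of roughly 10% follows directly from Hardy–Weinberg proportions. As a worked check (the symbols p and q are introduced here for illustration only), let q = 0.05 be the variant allele frequency and p = 1 - q; carriers are heterozygotes plus variant homozygotes:

```latex
\[
P(\text{carrier}) = 2pq + q^{2} = 1 - (1 - q)^{2} = 1 - 0.95^{2} = 0.0975 \approx 10\%
\]
```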


2.4 RESULTS

2.4.1 Populations of Case Patients with Colorectal Cancer and Control Subjects

We genotyped a total of 929 case patients and 1098 control subjects in Ontario, and a total of 471 case patients and 347 control subjects in Newfoundland. The ages (mean ± standard deviation) at diagnosis for case patients in Ontario and Newfoundland were 59.8 ± 9 years and 60.4 ± 9 years, respectively. The age at diagnosis, family history, and histopathologic feature distributions for all case patients enrolled in this study from both provinces are shown in Table 2.3. Overall, no differences were observed in the distribution of characteristics between the case patients from the two provinces. The ages (mean ± standard deviation) of the control subjects from Ontario and Newfoundland were 63.7 ± 9 years and 60.5 ± 9 years, respectively. No differences in age or sex distributions between case patients and control subjects were observed in either province.

2.4.2 Distribution of Genotypes and Alleles

The variant allele frequencies of selected SNPs in the general populations and in the case patient populations of Ontario and Newfoundland were not known. Whether there were any differences in variant allele frequencies for selected polymorphisms between the general populations of Ontario and Newfoundland was also unknown. The variant allele frequencies for the SNPs examined are shown in Table 2.4. All SNPs examined were in Hardy–Weinberg equilibrium among control populations of both provinces. Overall, the variant allele frequencies were similar between case patients and control subjects in both Ontario and Newfoundland. We observed no statistically significant differences in allele frequencies between the general populations (represented by the control subjects) of Ontario and Newfoundland (data not shown).
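A Hardy–Weinberg check of the kind referred to above can be illustrated with a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. The text does not state which test was used, so this is a sketch under that assumption; the function name is hypothetical, and the example counts are the Ontario control genotypes for MLH1 –93G>A reported in Table 2.5.

```python
import math


def hwe_chi_square(n_ref_hom, n_het, n_var_hom):
    """Chi-square test of Hardy-Weinberg proportions (1 degree of freedom).

    Returns (variant_allele_frequency, chi2, p_value).
    """
    n = n_ref_hom + n_het + n_var_hom
    q = (2 * n_var_hom + n_het) / (2 * n)  # variant allele frequency
    p = 1 - q
    if q in (0.0, 1.0):
        raise ValueError("monomorphic locus: the test is undefined")
    observed = (n_ref_hom, n_het, n_var_hom)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


# Ontario control subjects, MLH1 -93G>A: GG = 687, GA = 352, AA = 59.
q, chi2, p_value = hwe_chi_square(687, 352, 59)
```

The variant allele frequency computed from these counts agrees with the 21.4% listed for Ontario control subjects in Table 2.4, and the P value is above 0.05, in line with the statement that the SNPs were in Hardy–Weinberg equilibrium among control subjects.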


Table 2.3 Distribution of age, family history, and clinicopathological features of Ontario and Newfoundland colorectal cancer patients

Characteristic            Ontario, N (%)   Newfoundland, N (%)
No. of Caucasian cases    929 (NC)         467 (NC)
Sex
  Males                   496 (53.4)       285 (61.0)
  Females                 433 (46.6)       182 (39.0)
Age at diagnosis
  < 50 years              126 (13.6)       51 (10.9)
  ≥ 50 years              802 (86.3)       416 (89.1)
  Unavailable             1 (0.1)          0 (0.0)
Family history
  Amsterdam I only        42 (4.5)         17 (3.6)
  Amsterdam II only       15 (1.6)         0 (0.0)
  Non-Amsterdam           872 (93.9)       447 (95.7)
  Unavailable             0 (0.0)          3 (0.6)
Histological grade
  1                       80 (8.6)         59 (12.6)
  2                       568 (61.1)       334 (71.5)
  3                       92 (9.9)         36 (7.7)
  Unavailable             189 (20.3)       38 (8.1)
TNM stage
  1                       178 (19.2)       NA
  2                       302 (32.5)       NA
  3                       247 (26.6)       NA
  4                       47 (5.0)         NA
  Unknown                 155 (16.7)       NA
MSI
  MSI-H                   118 (12.7)       33 (7.1)
  MSS/MSI-L               650 (70.0)       288 (61.7)
  Unavailable             161 (17.3)       146 (31.2)
Tumour location
  Proximal                364 (39.2)       170 (36.4)
  Distal                  554 (59.6)       279 (59.7)
  Other                   10 (1.1)         17 (3.6)
  Unavailable             1 (0.1)          1 (0.2)

NC = not calculated; NA = not available; TNM = tumour–node–metastasis (available for Ontario cases only); MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.


Table 2.4 Allele frequencies of each polymorphism in participants in Ontario and Newfoundland

                                      Variant allele frequencies, %
Gene   SNP           Population       Case patients   Control subjects   P value†
MLH1   –93G>A        Ontario          22.6            21.4               0.22
                     Newfoundland     22.4            19.3               0.17
       IVS14-19A>G   Ontario          43.2            43.2               0.95
                     Newfoundland     42.2            45.3               0.76
MSH2   –118T>C       Ontario          14.0            13.9               0.75
                     Newfoundland     14.3            12.5               0.24
       IVS12-6T>C    Ontario          9.3             10.7               0.22
                     Newfoundland     12.3            12.9               0.93
MSH6   –159C>T       Ontario          9.9             11.0               0.38
                     Newfoundland     12.9            10.6               0.22

† Pearson’s chi-square test was used. All statistical tests were two-sided. A P-value of less than 0.01 was considered statistically significant after adjusting for multiple comparisons using Bonferroni method of correction.


The distributions of the genotypes for all five SNPs (MLH1 –93G>A, MLH1 IVS14-19A>G, MSH2-118T>C, MSH2 IVS12-6T>C, and MSH6-159C>T) among control subjects and case patients in Ontario and Newfoundland are shown in Table 2.5. We found no differences in genotype frequency distribution between all case patients and all control subjects within each province. When case patients were stratified by tumour MSI status (i.e., MSI-H and MSS/L) and compared with control subjects (Table 2.6), we observed a statistically significant association between the MLH1 –93G>A polymorphism and MSI-H tumours in both the Ontario and Newfoundland populations (for Ontario heterozygotes, OR = 1.84, 95% CI = 1.20 to 2.83; for Ontario homozygotes, OR = 3.23, 95% CI = 1.65 to 6.30; for Newfoundland heterozygotes, OR = 2.56, 95% CI = 1.14 to 5.75; and for Newfoundland homozygotes, OR = 8.88, 95% CI = 2.33 to 33.9). However, genotype frequencies were similar among case patients with MSS/L tumours and control subjects, and the differences were not statistically significant (Table 2.6). We also observed a statistically significant increasing trend for the association between the number of MLH1 –93 variant A alleles carried (zero, one, or two alleles) and MSI-H status in both the Ontario and Newfoundland populations (both Ptrend < 0.001) (Table 2.6).
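For readers who want to see how an odds ratio and its confidence interval arise from genotype counts, the sketch below computes a crude (unadjusted) odds ratio with a Woolf-type 95% confidence interval. The odds ratios reported in this chapter come from logistic regression adjusted for age and sex, so the crude values will not reproduce them; the function name and the choice of the Woolf interval are assumptions made for this illustration.

```python
import math


def crude_odds_ratio(case_exposed, case_ref, ctrl_exposed, ctrl_ref, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-based) confidence interval.

    Returns (odds_ratio, lower, upper).
    """
    counts = (case_exposed, case_ref, ctrl_exposed, ctrl_ref)
    if min(counts) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (case_exposed * ctrl_ref) / (case_ref * ctrl_exposed)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(sum(1 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Ontario MSI-H case patients versus Ontario control subjects (Table 2.6),
# with GG as the referent genotype.
or_ga, lo_ga, hi_ga = crude_odds_ratio(45, 55, 352, 687)  # GA versus GG
or_aa, lo_aa, hi_aa = crude_odds_ratio(14, 55, 59, 687)   # AA versus GG
```

The crude estimates come out at about 1.6 for heterozygotes and about 3.0 for homozygotes, in the same direction as the adjusted values of 1.84 and 3.23 but not identical to them, as expected when age and sex are not taken into account.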

2.4.3 Associations of polymorphisms with clinicopathological tumour features

Because low-penetrance alleles may not only be associated with cancer incidence but may also influence cancer phenotype and prognosis, we examined associations between available clinicopathologic tumour features among case patients and the variant alleles for each of the five SNPs. For the MLH1 polymorphism IVS14-19A>G and the MSH6 polymorphism -159C>T, we found no association between any clinicopathologic characteristic and the variant alleles of these two SNPs in case patients from either Ontario or Newfoundland (Tables 2.7–2.10). In addition,


Table 2.5 Association of MMR gene polymorphisms with risk of CRC in Ontario and Newfoundland

Population, SNP, and genotype     Case patients, No. (%)   Control subjects, No. (%)   OR (95% CI)
Ontario
  MLH1 –93G>A
    GG                            554 (59.6)               687 (62.6)                  1.00 (referent)
    GA                            331 (35.6)               352 (32.1)                  1.19 (0.98 to 1.45)
    AA                            44 (4.7)                 59 (5.4)                    0.89 (0.59 to 1.35)
  MLH1 IVS14-19A>G
    AA                            297 (32.0)               355 (32.3)                  1.00 (referent)
    AG                            462 (49.7)               538 (49.0)                  1.02 (0.83 to 1.25)
    GG                            170 (18.3)               205 (18.7)                  1.02 (0.78 to 1.33)
  MSH2 -118T>C
    TT                            681 (73.4)               808 (74.0)                  1.00 (referent)
    CT                            234 (25.2)               265 (24.3)                  1.05 (0.85 to 1.29)
    CC                            13 (1.4)                 19 (1.7)                    0.92 (0.44 to 1.93)
  MSH2 IVS12-6T>C
    TT                            770 (82.9)               878 (80.0)                  1.00 (referent)
    TC                            146 (15.7)               205 (18.7)                  0.81 (0.64 to 1.03)
    CC                            13 (1.4)                 15 (1.3)                    0.99 (0.46 to 2.13)
  MSH6 -159C>T
    CC                            757 (81.8)               868 (79.4)                  1.00 (referent)
    CT                            155 (16.7)               209 (19.1)                  0.80 (0.63 to 1.02)
    TT                            14 (1.5)                 16 (1.5)                    1.11 (0.53 to 2.30)
Newfoundland
  MLH1 –93G>A
    GG                            285 (61.0)               222 (64.5)                  1.00 (referent)
    GA                            158 (33.8)               113 (32.9)                  1.10 (0.81 to 1.48)
    AA                            24 (5.1)                 9 (2.6)                     2.08 (0.94 to 4.58)
  MLH1 IVS14-19A>G
    AA                            151 (32.5)               106 (30.7)                  1.00 (referent)
    AG                            226 (48.7)               168 (48.7)                  0.93 (0.67 to 1.28)
    GG                            87 (18.8)                71 (20.6)                   0.86 (0.57 to 1.28)
  MSH2 -118T>C
    TT                            342 (73.2)               260 (75.6)                  1.00 (referent)
    CT                            116 (24.8)               82 (23.8)                   1.07 (0.77 to 1.48)
    CC                            9 (1.9)                  2 (0.6)                     3.59 (0.76 to 16.9)
  MSH2 IVS12-6T>C
    TT                            360 (77.1)               264 (77.2)                  1.00 (referent)
    TC                            99 (21.2)                71 (20.8)                   1.03 (0.73 to 1.47)
    CC                            8 (1.7)                  7 (2.0)                     0.84 (0.30 to 2.35)
  MSH6 -159C>T
    CC                            355 (76.7)               274 (79.6)                  1.00 (referent)
    CT                            97 (20.9)                67 (19.5)                   1.11 (0.78 to 1.58)
    TT                            11 (2.4)                 3 (0.9)                     2.90 (0.80 to 10.6)

OR = odds ratio adjusted for age and sex; CI = confidence interval; SNP = single-nucleotide polymorphism.


Table 2.6 Risk of colorectal cancer by microsatellite instability status for the MLH1-93G>A polymorphism only in Ontario and Newfoundland*

Population, tumour status, and genotype   Case patients, No. (%)   OR (95% CI)
Ontario
  MSI-H
    GG                                    55 (48.2)                1.00 (referent)
    GA                                    45 (39.5)                1.84 (1.20 to 2.83)
    AA                                    14 (12.3)                3.23 (1.65 to 6.30)
    Ptrend†                                                        <0.001
  MSS/L
    GG                                    379 (60.7)               1.00 (referent)
    GA                                    221 (35.4)               1.17 (0.94 to 1.46)
    AA                                    24 (3.9)                 0.69 (0.42 to 1.15)
Newfoundland
  MSI-H
    GG                                    12 (36.4)                1.00 (referent)
    GA                                    16 (48.5)                2.56 (1.14 to 5.75)
    AA                                    5 (15.1)                 8.88 (2.33 to 33.9)
    Ptrend†                                                        <0.001
  MSS/L
    GG                                    166 (63.6)               1.00 (referent)
    GA                                    83 (31.8)                0.96 (0.66 to 1.39)
    AA                                    12 (4.6)                 1.70 (0.65 to 4.51)

* The number of control subjects (percentage) from Ontario by genotype was as follows: GG = 687 (62.6); GA = 352 (32.1); AA = 59 (5.4). The number of control subjects (percentage) from Newfoundland by genotype was as follows: GG = 176 (64.0); GA = 92 (33.4); AA = 7 (2.5). OR = odds ratio adjusted for age and sex; CI = confidence interval; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Cochran–Armitage trend test was used. All statistical tests were two-sided.


Table 2.7 Genotype frequencies of MLH1 IVS14-19A>G polymorphism and clinicopathologic features of case patients with CRC in Ontario

Feature               No.   AA, No. (%)  GA, No. (%)  GG, No. (%)  Chi-square*  DF  P value†
Total                 929   297 (32.0)   462 (49.7)   170 (18.3)   NC           NC  NC
Sex
  Male                496   152 (51.2)   247 (53.5)   97 (57.1)    1.504        2   0.47
  Female              433   145 (48.8)   215 (46.5)   73 (42.9)
Age at diagnosis
  < 50 years          126   36 (12.2)    65 (14.1)    25 (14.7)    0.785        2   0.68
  ≥ 50 years          802   260 (87.8)   397 (85.9)   145 (85.3)
Family history
  Amsterdam I only    42    9 (3.0)      23 (5.0)     10 (5.9)     2.483        2   0.29
  Non-Amsterdam I     887   288 (97.0)   439 (95.0)   160 (94.1)
  Amsterdam I & II    57    13 (4.4)     30 (6.5)     14 (8.2)     2.999        2   0.22
  Non-Am I or II      872   284 (95.6)   432 (93.5)   156 (91.8)
Tumour location
  Proximal            364   121 (40.9)   180 (39.0)   63 (37.1)    Fisher§      -   0.68§
  Distal              554   173 (58.4)   275 (59.5)   106 (62.4)
  Other               10    2 (0.7)      7 (1.5)      1 (0.5)
MSI
  MSS/MSI-L           624   196 (81.3)   314 (86.3)   114 (85.7)   2.872        2   0.24
  MSI-H               114   45 (18.7)    50 (13.7)    19 (14.3)
Histological grade
  1                   80    30 (12.3)    35 (9.6)     15 (11.3)    1.609        4   0.81
  2                   568   182 (74.9)   286 (78.6)   100 (75.2)
  3                   92    31 (12.8)    43 (11.8)    18 (13.5)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.007 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test was used. All samples with unavailable data were omitted from the analysis.


Table 2.8 Genotype frequencies of MLH1 IVS14-19A>G polymorphism and clinicopathologic features of case patients with CRC in Newfoundland

Feature               No.   AA, No. (%)  GA, No. (%)  GG, No. (%)  Chi-square*  DF  P value†
Total                 430   143 (33.2)   211 (49.1)   76 (17.7)    NC           NC  NC
Sex
  Male                262   81 (56.6)    129 (61.1)   52 (68.4)    2.899        2   0.23
  Female              168   62 (43.4)    82 (38.9)    24 (31.6)
Age at diagnosis
  < 50 years          46    14 (9.8)     25 (11.9)    7 (9.2)      0.592        2   0.74
  ≥ 50 years          384   129 (90.2)   186 (88.1)   69 (90.8)
Family history‡
  Amsterdam I only    14    6 (4.2)      7 (3.3)      1 (1.4)      Fisher§      -   0.67§
  Non-Amsterdam I     413   137 (95.8)   204 (96.7)   72 (98.6)
Tumour location
  Proximal            154   55 (38.5)    74 (35.2)    25 (32.9)    5.819        4   0.21
  Distal              259   79 (55.2)    132 (62.9)   48 (63.2)
  Other               16    9 (6.3)      4 (1.9)      3 (3.9)
MSI
  MSS/MSI-L           261   86 (86.0)    134 (88.7)   41 (95.4)    2.638        2   0.27
  MSI-H               33    14 (14.0)    17 (11.3)    2 (4.6)
Histological grade
  1                   55    19 (14.7)    25 (12.8)    11 (15.5)    7.319        4   0.12
  2                   309   105 (81.4)   148 (75.5)   56 (78.9)
  3                   32    5 (3.9)      23 (11.7)    4 (5.6)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test was used. All samples with unavailable data were omitted from the analysis.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.


Table 2.9 Genotype frequencies of MSH6-159C>T polymorphism and clinicopathologic features of case patients with CRC in Ontario

Feature               No.   CC, No. (%)  CT, No. (%)  TT, No. (%)  Chi-square*  DF  P value†
Total                 925   756 (81.7)   155 (16.8)   14 (1.5)     NC           NC  NC
Sex
  Male                495   398 (52.6)   88 (56.8)    9 (64.3)     1.545        2   0.46
  Female              430   358 (47.4)   67 (43.2)    5 (35.7)
Age at diagnosis
  < 50 years          125   99 (13.1)    26 (16.8)    0 (0.0)      3.698        2   0.16
  ≥ 50 years          799   656 (86.9)   129 (83.2)   14 (100.0)
Family history
  Amsterdam I only    41    34 (4.5)     7 (4.5)      0 (0.0)      0.659        2   0.72
  Non-Amsterdam I     884   722 (95.5)   148 (95.5)   14 (100.0)
  Amsterdam I & II    56    47 (6.2)     9 (5.8)      0 (0.0)      0.954        2   0.62
  Non-Am I or II      869   709 (93.8)   146 (94.2)   14 (100.0)
Tumour location
  Proximal            363   293 (38.8)   66 (42.6)    4 (28.6)     Fisher§      -   0.17§
  Distal              552   456 (60.4)   87 (56.1)    9 (64.3)
  Other               9     6 (0.8)      2 (1.3)      1 (7.1)
MSI
  MSS/MSI-L           650   527 (83.9)   114 (88.4)   9 (81.8)     1.701        2   0.43
  MSI-H               118   101 (16.1)   15 (11.6)    2 (18.2)
Histological grade
  1                   79    65 (10.7)    14 (11.7)    0 (0.0)      Fisher§      -   0.76§
  2                   565   468 (77.2)   88 (73.3)    9 (90.0)
  3                   92    73 (12.1)    18 (15.0)    1 (10.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.007 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test was used. All samples with unavailable data were omitted from the analysis.


Table 2.10 Genotype frequencies of MSH6-159C>T polymorphism and clinicopathologic features of case patients with CRC in Newfoundland

Feature               No.   CC, No. (%)  CT, No. (%)  TT, No. (%)  Chi-square*  DF  P value†
Total                 463   355 (76.7)   97 (20.9)    11 (2.4)     NC           NC  NC
Sex
  Male                282   215 (60.6)   61 (62.9)    6 (54.5)     0.364        2   0.83
  Female              181   140 (39.4)   36 (37.1)    5 (45.5)
Age at diagnosis
  < 50 years          51    39 (11.0)    10 (10.3)    2 (18.2)     0.626        2   0.73
  ≥ 50 years          412   316 (89.0)   87 (89.7)    9 (81.8)
Family history‡
  Amsterdam I only    16    15 (4.2)     1 (1.0)      0 (0.0)      Fisher§      -   0.38§
  Non-Amsterdam I     444   339 (95.8)   95 (99.0)    10 (100.0)
Tumour location
  Proximal            168   123 (34.7)   40 (41.2)    5 (45.4)     Fisher§      -   0.75§
  Distal              277   217 (61.3)   54 (55.7)    6 (54.6)
  Other               17    14 (4.0)     3 (3.1)      0 (0.0)
MSI
  MSS/MSI-L           289   221 (89.8)   60 (90.9)    8 (80.0)     1.132        2   0.57
  MSI-H               33    25 (10.2)    6 (9.1)      2 (20.0)
Histological grade
  1                   58    43 (13.3)    14 (15.2)    1 (10.0)     Fisher§      -   0.57§
  2                   309   250 (77.4)   74 (80.4)    8 (80.0)
  3                   32    30 (9.3)     4 (4.4)      1 (10.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test was used. All samples with unavailable data were omitted from the analysis.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.

no association was found between clinicopathologic features and the MSH2 IVS12-6T>C polymorphism, except for MSI status: the IVS12-6T>C polymorphism was associated with MSI-H tumours among case patients from Ontario (P = 0.04; Table 2.11) but not among case patients from Newfoundland (Table 2.12). The association observed in Ontario did not remain statistically significant after Bonferroni correction for multiple comparisons.

Analysis of MSI status and the genotype distribution of the MLH1 –93G>A promoter SNP among case patients from Ontario and Newfoundland found strong associations (P = 0.001 and P = 0.003, respectively) between the variant allele and MSI status (Tables 2.13 and 2.14), with MLH1 –93G>A variant allele carriers making up a higher proportion of case patients with MSI-H tumours than of those with MSS/L tumours. Both of these results remained statistically significant after Bonferroni correction for multiple comparisons. Analysis of Ontario case patients with a strong family history, as defined by Amsterdam I (117) and/or Amsterdam II criteria (118), found a strong, statistically significant association between the MLH1 –93G>A variant A allele and family history of colorectal cancer (for Amsterdam I, P = 0.004; and for Amsterdam I and II, P = 0.016) (Table 2.13). The association of the MLH1 –93G>A polymorphism with disease meeting Amsterdam I criteria remained statistically significant after Bonferroni correction for multiple comparisons; however, the association of MLH1 –93G>A with disease meeting Amsterdam I and II criteria did not. No association between the MLH1 –93G>A variant and family history was observed among Newfoundland case patients. However, tumours in Newfoundland case patients who carried the variant allele were more likely to be located in the proximal region of the colon than in the distal region (P = 0.04; Table 2.14). This association


Table 2.11 Genotype frequencies of MSH2 IVS12-6T>C polymorphism and clinicopathologic features of case patients with CRC in Ontario

Feature               No.   TT, No. (%)   TC, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 929   770 (82.9)    146 (15.7)    13 (1.4)      NC           NC   NC
Sex
  Male                496   417 (54.2)    76 (52.1)     3 (23.1)      5.086        2    0.08
  Female              433   353 (45.8)    70 (47.9)     10 (76.9)
Age at diagnosis
  < 50 years          126   107 (13.9)    19 (13.0)     0 (0.0)       2.156        2    0.34
  ≥ 50 years          802   662 (86.1)    127 (87.0)    13 (100.0)
Family history
  Amsterdam I only    42    37 (4.8)      5 (3.4)       0 (0.0)       1.166        2    0.56
  Non-Amsterdam I     887   733 (95.2)    141 (96.6)    13 (100.0)
  Amsterdam I & II    57    48 (6.2)      9 (6.2)       0 (0.0)       0.863        2    0.65
  Non-Am I or II      872   722 (93.8)    137 (93.8)    13 (100.0)
Tumour location
  Proximal            364   297 (38.6)    61 (41.8)     6 (46.2)      Fisher§      -    0.83§
  Distal              554   463 (60.2)    84 (57.5)     7 (53.8)
  Other               10    9 (1.2)       1 (0.7)       0 (0.0)
MSI
  MSS/MSI-L           624   517 (84.5)    102 (87.2)    5 (55.6)      6.415        2    0.04
  MSI-H               114   95 (15.5)     15 (12.8)     4 (44.4)
Histological grade
  1                   80    63 (10.3)     16 (13.8)     1 (10.0)      Fisher§      -    0.61§
  2                   568   477 (77.7)    83 (71.5)     8 (80.0)
  3                   92    74 (12.0)     17 (14.7)     1 (10.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.007 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.


Table 2.12 Genotype frequencies of MSH2 IVS12-6T>C polymorphism and clinicopathologic features of case patients with CRC in Newfoundland

Feature               No.   TT, No. (%)   TC, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 430   332 (77.2)    90 (20.9)     8 (1.9)       NC           NC   NC
Sex
  Male                262   203 (61.1)    54 (60.0)     5 (62.5)      Fisher§      -    0.97§
  Female              168   129 (38.9)    36 (40.0)     3 (37.5)
Age at diagnosis
  < 50 years          46    36 (10.8)     9 (10.0)      1 (12.5)      0.080        2    0.96
  ≥ 50 years          384   296 (89.2)    81 (90.0)     7 (87.5)
Family history‡
  Amsterdam I only    14    9 (2.7)       5 (5.6)       0 (0.0)       Fisher§      -    0.38§
  Non-Amsterdam I     413   320 (97.3)    85 (94.4)     8 (100.0)
Tumour location
  Proximal            154   117 (35.4)    33 (36.7)     4 (50.0)      Fisher§      -    0.37§
  Distal              259   203 (61.3)    53 (58.9)     3 (37.5)
  Other               16    11 (3.3)      4 (4.4)       1 (12.5)
MSI
  MSS/MSI-L           261   206 (89.6)    52 (86.7)     3 (75.0)      Fisher§      -    0.33§
  MSI-H               33    24 (10.4)     8 (13.3)      1 (25.0)
Histological grade
  1                   55    45 (14.9)     9 (10.5)      1 (14.3)      Fisher§      -    0.84§
  2                   309   234 (77.2)    69 (80.2)     6 (85.7)
  3                   32    24 (7.9)      8 (9.3)       0 (0.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless indicated otherwise. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test was used.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.
All samples with unavailable data were omitted from the analysis.


Table 2.13 Genotype frequencies of MLH1 –93G>A polymorphism and clinicopathologic features of case patients with CRC in Ontario

Feature               No.   GG, No. (%)   GA, No. (%)   AA, No. (%)   Chi-square*  DF   P value†
Total                 929   554 (59.6)    331 (35.6)    44 (4.7)      NC           NC   NC
Sex
  Male                496   309 (55.8)    167 (50.5)    20 (45.5)     3.528        2    0.17
  Female              433   245 (44.2)    164 (49.5)    24 (54.5)
Age at diagnosis, y
  <50                 126   82 (14.8)     38 (11.5)     6 (13.6)      1.904        2    0.39
  ≥50                 802   472 (85.2)    292 (88.5)    38 (86.4)
Family history
  Amsterdam I only    42    27 (4.9)      9 (2.7)       6 (13.6)      11.119       2    0.004
  Non-Amsterdam I     887   527 (95.1)    322 (97.3)    38 (86.4)
  Amsterdam I & II    57    34 (6.1)      16 (4.8)      7 (15.9)      8.272        2    0.016
  Non-Am I or II      872   520 (93.9)    315 (95.2)    37 (84.1)
Tumour location
  Proximal            364   219 (39.5)    129 (39.1)    16 (36.4)     2.197        4    0.935
  Distal              554   328 (59.2)    198 (60.0)    28 (63.6)
  Other               10    7 (1.3)       3 (0.9)       0 (0.0)
MSI
  MSS/L               649   391 (87.3)    231 (83.4)    27 (65.9)     16.315       2    0.001
  MSI-H               117   57 (12.7)     46 (16.6)     14 (34.1)
Histological grade
  1                   80    49 (11.1)     30 (11.2)     1 (3.0)       Fisher§      -    0.13§
  2                   568   346 (78.6)    196 (73.4)    26 (78.8)
  3                   92    45 (10.2)     41 (15.4)     6 (18.2)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.007 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.
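
As a rough illustration of how a Pearson chi-square comparison of this kind is computed, the following Python sketch recomputes the MSI comparison from the counts in Table 2.13. The function names are hypothetical, and the closed-form tail probability used here applies only when DF = 2. The statistical software and settings used for the thesis are not stated in this section, so the statistic produced by this sketch need not match the tabulated value exactly.

```python
import math


def pearson_chi_square(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of genotype and MSI status.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail P value of a chi-square statistic with exactly 2 DF."""
    # For 2 degrees of freedom the survival function is exp(-x / 2).
    return math.exp(-stat / 2.0)


# MSI rows of Table 2.13 (columns: GG, GA, AA).
msi_ontario = [
    [391, 231, 27],  # MSS/L
    [57, 46, 14],    # MSI-H
]

stat, df = pearson_chi_square(msi_ontario)
p = p_value_df2(stat)
print(f"chi-square = {stat:.3f}, DF = {df}, P = {p:.4f}")
```

For tables with other degrees of freedom (for example the 3 x 3 tumour location comparison, DF = 4), a general chi-square survival function from a statistics library would be needed instead of the DF = 2 shortcut.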


Table 2.14 Genotype frequencies of MLH1 –93G>A polymorphism and clinicopathologic features of case patients with CRC in Newfoundland

Feature               No.   GG, No. (%)   GA, No. (%)   AA, No. (%)   Chi-square*  DF   P value†
Total                 430   260 (60.5)    147 (34.2)    23 (5.3)      NC           NC   NC
Sex
  Male                262   158 (60.8)    90 (61.2)     14 (60.9)     0.008        2    0.99
  Female              168   102 (39.2)    57 (38.8)     9 (39.1)
Age at diagnosis, y
  <50                 46    28 (10.8)     17 (11.6)     1 (4.3)       1.088        2    0.58
  ≥50                 384   232 (89.2)    130 (88.4)    22 (95.7)
Family history‡
  Amsterdam I         14    8 (3.1)       6 (4.1)       0 (0.0)       Fisher§      -    0.81§
  Non-Amsterdam I     413   249 (96.9)    141 (95.9)    23 (100)
Tumour location
  Proximal            154   88 (33.8)     52 (35.6)     14 (60.9)     9.908        4    0.04
  Distal              259   164 (63.1)    88 (60.3)     7 (30.4)
  Other               16    8 (3.1)       6 (4.1)       2 (8.7)
MSI
  MSS/L               261   166 (93.3)    83 (83.8)     12 (70.6)     11.655       2    0.003
  MSI-H               33    12 (6.7)      16 (16.2)     5 (29.4)
Histological grade
  1                   55    38 (15.4)     15 (11.6)     2 (10.0)      Fisher§      -    0.42§
  2                   309   193 (78.1)    99 (76.7)     17 (85.0)
  3                   32    16 (6.5)      15 (11.6)     1 (5.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.
All samples with unavailable data were omitted from the analysis.

between MLH1-93G>A and tumour location did not remain statistically significant after Bonferroni correction for multiple comparisons. Tumour location, histological grade, and tumour–node–metastasis (TNM) stage of the tumours were not associated with the MLH1 –93G>A variant A allele among the Ontario case patients.

In an analysis of Ontario case patients with a strong family history, as defined by the Amsterdam I (117) and/or Amsterdam II criteria (118), we found a strong, statistically significant association between the MSH2-118T>C variant C allele and family history of colorectal cancer (for Amsterdam I, P = 0.005; and for Amsterdam I and II, P = 0.036) (Table 2.15). Only the association of the MSH2-118T>C polymorphism with case patients meeting the Amsterdam I criteria remained statistically significant after Bonferroni correction for multiple comparisons. A similar association between the MSH2-118T>C variant and family history (based on Amsterdam criteria) was not observed among Newfoundland case patients (Table 2.16). Tumour location, tumour MSI, histological grade, and tumour–node–metastasis (TNM) stage of the tumours were not associated with the MSH2-118T>C variant C allele among the Ontario and Newfoundland patients (Tables 2.15 and 2.16). To prevent potential confounding by MLH1 promoter methylation and to capture the role of the MSH2-118T>C polymorphism in CRC more accurately, we additionally excluded all CRC patients lacking MLH1 expression, based on immunohistochemical analysis (n = 74 in Ontario, n = 29 in Newfoundland). A strong, statistically significant association between the MSH2-118T>C variant C allele and family history of colorectal cancer remained among MLH1-proficient CRC patients (for Amsterdam I, P = 0.001; and for Amsterdam I and II, P = 0.011; data not shown), although the latter association was still not significant after Bonferroni correction for multiple comparisons.

Interestingly, for the MSH2-118T>C promoter SNP, we observed that male case patients were significantly or borderline significantly more likely than female case patients to carry the variant allele in both Newfoundland and Ontario (P = 0.015 and P = 0.05, respectively; Tables 2.16 and 2.15). This difference persisted among MLH1-proficient case patients (P = 0.025 and P = 0.044; data not shown). However, neither of these results remained statistically significant after Bonferroni correction for multiple comparisons. There were no such differences in the two control subject populations.

Since there was some evidence of sex differences in the genotype distributions among case patients, we examined the clinical and pathologic tumour characteristics separately for each sex. In Ontario, no associations were observed between the MSH2-118T>C SNP and clinicopathological features for male CRC patients (Table 2.17). However, among female case patients from Ontario, strong associations were observed between the MSH2-118T>C SNP and meeting the Amsterdam I criteria alone (P = 0.003) and meeting the Amsterdam I and II criteria (P = 0.0001) (Table 2.18). These associations were also observed in MLH1-proficient female CRC patients (P = 0.001 and P = 0.003, respectively; Table 2.19). Both of these associations remained significant after Bonferroni correction for multiple comparisons. Additionally, there was a trend towards an association between the MSH2-118T>C SNP and MSI-H tumour status among female case patients in Ontario (P = 0.08, Table 2.18; P = 0.06 in MLH1-proficient female case patients, Table 2.19). There were no sex-specific associations between the MSH2-118T>C polymorphism and clinicopathologic tumour features in Newfoundland (Tables 2.20 and 2.21), even after the MLH1-deficient case patients were removed (data not shown).
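
Several of the sex-stratified comparisons rely on Fisher’s exact test because some genotype cells are very small. The Python sketch below shows one common way to obtain a two-sided exact P value for a 2 x c table (the Freeman-Halton extension, which sums the probabilities of all tables with the observed margins that are no more likely than the observed table). It is an assumption that the thesis software used this particular definition; the function name is hypothetical, and the counts are the Amsterdam I rows of Table 2.18.

```python
from math import comb


def fisher_exact_2xc(table):
    """Two-sided exact P value for a 2 x c table of counts (Freeman-Halton).

    Sums the probabilities of every table sharing the observed margins
    whose probability does not exceed that of the observed table.
    """
    top, bottom = table
    col_totals = [a + b for a, b in zip(top, bottom)]
    r1 = sum(top)
    n = sum(col_totals)

    def weight(cells):
        # Unnormalised multivariate hypergeometric probability of a top row.
        w = 1
        for total, cell in zip(col_totals, cells):
            w *= comb(total, cell)
        return w

    def top_rows(idx, remaining):
        # Enumerate every top row compatible with the fixed margins.
        if idx == len(col_totals) - 1:
            if remaining <= col_totals[idx]:
                yield [remaining]
            return
        for cell in range(min(col_totals[idx], remaining) + 1):
            for rest in top_rows(idx + 1, remaining - cell):
                yield [cell] + rest

    observed = weight(top)
    extreme = sum(w for w in map(weight, top_rows(0, r1)) if w <= observed)
    # Exact integer arithmetic until the final division.
    return extreme / comb(n, r1)


# Table 2.18, female case patients in Ontario (columns: TT, CT, CC).
amsterdam_i_females = [
    [14, 4, 3],     # Amsterdam I only
    [319, 89, 3],   # Non-Amsterdam I
]
p_amsterdam = fisher_exact_2xc(amsterdam_i_females)
print(f"Exact two-sided P = {p_amsterdam:.4f}")
```

Working with integer binomial coefficients avoids floating-point ties when deciding which tables are at least as extreme as the observed one.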


Table 2.15 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of case patients with CRC in Ontario

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 928   681 (73.5)    234 (25.1)    13 (1.4)      NC           NC   NC
Sex
  Male                496   348 (51.1)    141 (60.3)    7 (53.8)      5.846        2    0.05
  Female              432   333 (48.9)    93 (39.7)     6 (46.2)
Age at diagnosis
  < 50 years          126   92 (13.5)     32 (13.7)     2 (15.4)      0.072        2    0.98
  ≥ 50 years          801   589 (86.5)    201 (86.3)    11 (84.6)
Family history
  Amsterdam I only    42    29 (4.3)      10 (4.3)      3 (23.1)      10.881       2    0.005
  Non-Amsterdam I     886   652 (95.7)    224 (95.7)    10 (76.9)
  Amsterdam I & II    57    41 (6.0)      13 (5.6)      3 (23.1)      6.941        2    0.036
  Non-Am I or II      871   640 (94.0)    221 (94.4)    10 (76.9)
Tumour location
  Proximal            363   278 (40.9)    80 (34.3)     5 (38.5)      Fisher§      -    0.44§
  Distal              552   396 (58.2)    150 (64.4)    8 (61.5)
  Other               9     6 (0.9)       3 (1.3)       0 (0.0)
MSI
  MSS/MSI-L           650   490 (85.2)    153 (83.6)    7 (70.0)      1.946        2    0.38
  MSI-H               118   85 (14.8)     30 (16.4)     3 (30.0)
Histological grade
  1                   79    58 (10.5)     21 (11.9)     0 (0.0)       Fisher§      -    0.54§
  2                   565   427 (77.1)    134 (76.1)    4 (66.7)
  3                   92    69 (12.4)     21 (11.9)     2 (33.3)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.007 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.


Table 2.16 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of case patients with CRC in Newfoundland

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 467   342 (73.2)    116 (24.8)    9 (1.9)       NC           NC   NC
Sex
  Male                285   196 (57.3)    84 (72.4)     5 (55.6)      8.424        2    0.015
  Female              182   146 (42.7)    32 (27.6)     4 (44.4)
Age at diagnosis
  < 50 years          51    38 (11.1)     12 (10.3)     1 (11.1)      0.053        2    0.97
  ≥ 50 years          416   304 (88.9)    104 (89.7)    8 (88.9)
Family history‡
  Amsterdam I         17    13 (3.8)      4 (3.5)       0 (0.0)       Fisher§      -    1.00§
  Non-Amsterdam I     447   327 (96.2)    111 (96.5)    9 (100.0)
Tumour location
  Proximal            170   123 (36.1)    44 (37.9)     3 (33.3)      Fisher§      -    0.23§
  Distal              279   203 (59.5)    71 (61.2)     5 (55.6)
  Other               17    15 (4.4)      1 (0.9)       1 (11.1)
MSI
  MSS/MSI-L           288   210 (88.2)    72 (93.5)     6 (100.0)     2.453        2    0.29
  MSI-H               33    28 (11.8)     5 (6.5)       0 (0.0)
Histological grade
  1                   59    43 (13.9)     16 (14.4)     0 (0.0)       Fisher§      -    0.71§
  2                   334   243 (78.4)    84 (75.7)     7 (87.5)
  3                   36    24 (7.7)      11 (9.9)      1 (12.5)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.
All samples with unavailable data have been omitted from the analysis.


Table 2.17 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of male case patients with CRC in Ontario

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 496   348 (70.2)    141 (28.4)    7 (1.4)       NC           NC   NC
Age at diagnosis
  < 50 years          56    36 (10.3)     20 (14.2)     0 (0.0)       2.451        2    0.30
  ≥ 50 years          440   312 (89.7)    121 (85.8)    7 (100.0)
Family history
  Amsterdam I only    21    15 (4.3)      6 (4.3)       0 (0.0)       0.440        2    0.85
  Non-Amsterdam I     475   333 (95.7)    135 (95.7)    7 (100.0)
  Amsterdam I & II    28    20 (5.8)      8 (5.7)       0 (0.0)       0.518        2    0.81
  Non-Am I or II      468   328 (94.2)    133 (94.3)    7 (100.0)
Tumour location
  Proximal            179   137 (39.4)    40 (28.6)     2 (28.6)      Fisher§      -    0.13§
  Distal              312   209 (60.0)    98 (70.0)     5 (71.4)
  Other               4     2 (0.6)       2 (1.4)       0 (0.0)
MSI
  MSS/MSI-L           357   257 (89.2)    96 (89.7)     4 (100.0)     Fisher§      -    1.00§
  MSI-H               42    31 (10.8)     11 (10.3)     0 (0.0)
Histological grade
  1                   49    34 (12.4)     15 (13.2)     0 (0.0)       Fisher§      -    0.99§
  2                   298   214 (77.8)    82 (77.4)     2 (100.0)
  3                   37    27 (9.8)      10 (9.4)      0 (0.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless indicated otherwise. A P value of less than 0.01 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.


Table 2.18 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of female case patients with CRC in Ontario

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 432   333 (77.1)    93 (21.5)     6 (1.4)       NC           NC   NC
Age at diagnosis
  < 50 years          70    56 (16.8)     12 (13.0)     2 (33.3)      1.946        2    0.36
  ≥ 50 years          361   277 (83.2)    80 (87.0)     4 (66.7)
Family history
  Amsterdam I only    21    14 (4.2)      4 (4.3)       3 (50.0)      Fisher§      -    0.003§
  Non-Amsterdam I     411   319 (95.8)    89 (95.7)     3 (50.0)
  Amsterdam I & II    29    21 (6.3)      5 (5.4)       3 (50.0)      18.193       2    0.0001
  Non-Am I or II      403   312 (93.7)    88 (94.6)     3 (50.0)
Tumour location
  Proximal            184   141 (42.5)    40 (44.0)     3 (50.0)      Fisher§      -    0.97§
  Distal              240   187 (56.3)    50 (54.9)     3 (50.0)
  Other               5     4 (1.2)       1 (1.1)       0 (0.0)
MSI
  MSS/MSI-L           295   234 (81.3)    58 (75.3)     3 (50.0)      Fisher§      -    0.08§
  MSI-H               76    54 (18.7)     19 (24.7)     3 (50.0)
Histological grade
  1                   31    24 (8.6)      7 (10.0)      0 (0.0)       Fisher§      -    0.43§
  2                   267   213 (76.3)    52 (74.3)     2 (50.0)
  3                   55    42 (15.1)     11 (15.7)     2 (50.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.


Table 2.19 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of MLH1-proficient female case patients with CRC in Ontario

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 386   299 (77.5)    82 (21.2)     5 (1.3)       NC           NC   NC
Age at diagnosis
  < 50 years          66    53 (17.7)     11 (13.6)     2 (40.0)      Fisher§      -    0.23§
  ≥ 50 years          319   246 (82.3)    70 (86.4)     3 (60.0)
Family history
  Amsterdam I only    18    12 (4.0)      3 (3.7)       3 (60.0)      Fisher§      -    0.001§
  Non-Amsterdam I     368   287 (96.0)    79 (96.3)     2 (40.0)
  Amsterdam I & II    24    17 (5.7)      4 (4.9)       3 (60.0)      Fisher§      -    0.003§
  Non-Am I or II      362   282 (94.3)    78 (95.1)     2 (40.0)
Tumour location
  Proximal            142   110 (36.8)    30 (37.0)     2 (40.0)      Fisher§      -    1.00§
  Distal              238   185 (61.9)    50 (61.7)     3 (60.0)
  Other               5     4 (1.3)       1 (1.2)       0 (0.0)
MSI
  MSS/MSI-L           295   234 (91.4)    58 (87.9)     3 (60.0)      Fisher§      -    0.06§
  MSI-H               32    22 (8.6)      8 (12.1)      2 (40.0)
Histological grade
  1                   23    19 (7.6)      4 (6.7)       0 (0.0)       Fisher§      -    0.74§
  2                   247   196 (78.7)    49 (81.7)     2 (66.7)
  3                   42    34 (13.7)     7 (11.7)      1 (33.3)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless otherwise indicated. A P value of less than 0.008 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
All samples with unavailable data have been omitted from the analysis.


Table 2.20 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of male case patients with CRC in Newfoundland

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 281   192 (68.3)    84 (29.9)     5 (1.8)       NC           NC   NC
Age at diagnosis
  < 50 years          31    23 (12.0)     8 (9.5)       0 (0.0)       Fisher§      -    0.82§
  ≥ 50 years          250   169 (88.0)    76 (90.5)     5 (100.0)
Family history‡
  Amsterdam I only    9     7 (3.7)       2 (2.4)       0 (0.0)       Fisher§      -    0.77§
  Non-Amsterdam I     270   184 (96.3)    81 (97.6)     5 (100.0)
Tumour location
  Proximal            84    55 (28.8)     29 (34.5)     0 (0.0)       Fisher§      -    0.10§
  Distal              184   126 (66.0)    54 (64.3)     4 (80.0)
  Other               12    10 (5.2)      1 (1.2)       1 (20.0)
MSI
  MSS/MSI-L           187   128 (93.4)    55 (93.2)     4 (100.0)     Fisher§      -    1.00§
  MSI-H               13    9 (6.6)       4 (6.8)       0 (0.0)
Histological grade
  1                   30    18 (10.5)     12 (15.2)     0 (0.0)       Fisher§      -    0.29§
  2                   204   142 (82.5)    58 (73.4)     4 (80.0)
  3                   22    12 (7.0)      9 (11.4)      1 (20.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless indicated otherwise. A P value of less than 0.01 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.
All samples with unavailable data have been omitted from the analysis.


Table 2.21 Genotype frequencies of MSH2-118T>C polymorphism and clinicopathologic features of female case patients with CRC in Newfoundland

Feature               N     TT, No. (%)   CT, No. (%)   CC, No. (%)   Chi-square*  DF   P value†
Total                 181   145 (80.1)    32 (17.7)     4 (2.2)       NC           NC   NC
Age at diagnosis
  < 50 years          20    15 (10.3)     4 (12.5)      1 (25.0)      Fisher§      -    0.43§
  ≥ 50 years          161   130 (89.7)    28 (87.5)     3 (75.0)
Family history‡
  Amsterdam I only    7     5 (3.5)       2 (6.2)       0 (0.0)       Fisher§      -    0.67§
  Non-Amsterdam I     173   139 (96.5)    30 (93.8)     4 (100.0)
Tumour location
  Proximal            84    66 (45.5)     15 (46.9)     3 (75.0)      Fisher§      -    0.69§
  Distal              92    74 (51.0)     17 (53.1)     1 (25.0)
  Other               5     5 (3.5)       0 (0.0)       1 (0.0)
MSI
  MSS/MSI-L           101   82 (81.2)     17 (94.4)     2 (100.0)     Fisher§      -    0.51§
  MSI-H               20    19 (18.8)     1 (5.6)       0 (0.0)
Histological grade
  1                   30    24 (18.0)     4 (12.5)      0 (0.0)       Fisher§      -    0.91§
  2                   204   98 (73.7)     26 (81.3)     3 (100.0)
  3                   22    11 (8.3)      2 (6.2)       0 (0.0)

* Value of the two-sided Pearson’s chi-square test unless otherwise indicated. DF = degrees of freedom; NC = not calculated; MSI = microsatellite instability; MSI-H = high-frequency microsatellite instability; MSS/L = microsatellite stable or low-frequency microsatellite instability.
† Two-sided Pearson’s chi-square test was used, unless indicated otherwise. A P value of less than 0.01 was considered statistically significant after adjusting for multiple comparisons with the Bonferroni method of correction.
§ Two-sided Fisher’s exact test.
‡ No probands meeting Amsterdam II criteria have been identified in the Newfoundland population.
All samples with unavailable data have been omitted from the analysis.


2.5 DISCUSSION

This is the first large-scale case–control study, to our knowledge, to examine the allele frequencies of these five polymorphisms in mismatch repair genes and their association with the incidence of colorectal cancer in two populations. The MLH1-93G>A promoter polymorphism appeared to be strongly associated with MSI-H tumours in both populations. The MLH1-93G>A variant allele, either in the homozygous or heterozygous state, was associated with a higher risk of developing MSI-H tumours than the wild-type allele. The absolute risks of colorectal cancer for heterozygous and homozygous carriers of the variant A allele in Ontario were 7.4% and 6.9%, respectively. For Newfoundland, the absolute risks of colorectal cancer were 13.1% for heterozygous carriers of the variant A allele and 12.6% for homozygous carriers. Thus, the MLH1-93G>A polymorphism may modify colorectal tumourigenesis. Carriers of the wild-type (G) allele predominantly had MSS/L tumours, and the genotype frequencies among case patients with MSS/L tumours were very similar to those among the control subjects (Table 4). Among case patients in Ontario, a statistically significant association was found between the MLH1-93G>A promoter variant allele and a strong family history of colorectal cancer, as defined by the Amsterdam criteria (117, 118). This association was not observed in the Newfoundland case patients, perhaps because few patients in this study both met the Amsterdam criteria and carried the variant A allele. The MLH1-93G>A variant allele was not associated with tumour location, histological grade, or TNM stage among Ontario case patients, but it was associated with proximally located tumours among Newfoundland case patients. This association was not statistically significant after Bonferroni correction for multiple comparisons. At the time of analysis, TNM stage was not available for Newfoundland case patients and so could not be analyzed in this population.


Among Ontario case patients, a statistically significant association was found between the MSH2-118T>C promoter variant allele and a strong family history of colorectal cancer, as defined by the Amsterdam criteria (117, 118). Upon further investigation, we found that this association was seen only among female CRC patients and not among male CRC patients. This association was not observed in the Newfoundland patients, perhaps because few case patients in this study both met the Amsterdam criteria and carried the variant C allele. Independent of sex, the MSH2-118T>C variant allele was not associated with tumour location, tumour MSI, histological grade, or TNM stage in the Ontario and Newfoundland case patients (again, TNM stage was not available for Newfoundland cases at the time of analysis).

Because we performed many statistical tests, some statistically significant results could have occurred by chance, and so the Bonferroni correction for multiple comparisons was applied. The major finding of this study, a strong association between the MLH1-93G>A variant and MSI-H colorectal tumours, was highly statistically significant in two independent populations even after Bonferroni adjustment and was, therefore, unlikely to have occurred by chance. The second major finding of this study, the association between the MSH2-118T>C SNP and a strong family history of CRC (based on Amsterdam criteria) in female patients from Ontario, was also significant after Bonferroni correction for multiple comparisons. The other finding of our study, the association between the MSH2-118T>C SNP and sex, was not significant after Bonferroni correction. However, this latter association was observed in two independent populations, suggesting that it may not have occurred by chance. A study with a larger sample size is needed to confirm this finding.
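
The Bonferroni thresholds quoted in the table footnotes can be illustrated with a short Python sketch. It assumes a nominal significance level of 0.05 and that the number of comparisons equals the number of clinicopathologic features tested per table (seven in the Ontario tables, six in the Newfoundland tables); neither assumption is stated explicitly in this section, although both are consistent with the footnoted thresholds of 0.007 and 0.008. The function and variable names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni correction."""
    return alpha / n_tests


ALPHA = 0.05  # assumed nominal significance level

# Assumed numbers of comparisons per table (features tested).
ontario_threshold = bonferroni_threshold(ALPHA, 7)
newfoundland_threshold = bonferroni_threshold(ALPHA, 6)

# P values reported for the MLH1-93G>A polymorphism in Tables 2.13 and 2.14.
reported = {
    "Ontario, MSI status": (0.001, ontario_threshold),
    "Ontario, Amsterdam I": (0.004, ontario_threshold),
    "Ontario, Amsterdam I and II": (0.016, ontario_threshold),
    "Newfoundland, MSI status": (0.003, newfoundland_threshold),
    "Newfoundland, tumour location": (0.04, newfoundland_threshold),
}

# A result survives correction when its P value is below the threshold.
survives = {name: p < threshold for name, (p, threshold) in reported.items()}

for name, significant in survives.items():
    print(f"{name}: {'significant' if significant else 'not significant'}")
```

Under these assumptions the sketch gives the same pattern as the text: the MSI associations and the Ontario Amsterdam I association fall below the corrected threshold, while the Amsterdam I and II and tumour location associations do not.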


Polymorphism frequencies often vary by ethnic background (323). The frequency of the MLH1-93G>A variant A allele in the Ontario general population was 21.4% and that in the Newfoundland general population was 19.3%; both values are considerably lower than those published for Asian populations (46% in the Japanese population and 50% in the Korean population) (304-306). The frequencies of the MSH2-118T>C variant C allele in the general populations of Ontario and Newfoundland were 13.8% and 12.5%, respectively; both values are lower than those published for Asian populations (20% in Japanese and Korean populations) (310, 324). The differences between our study populations, which are Caucasian, and the reported Asian populations are not surprising. Indeed, D132H, a SNP in the MLH1 gene, has been associated with colorectal cancer in an Ashkenazi Jewish population in Israel (298). However, the D132H SNP occurs at a very low frequency in North America and was not detected in a group of patients with colorectal cancer or endometrial cancer in the United States (299).
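
For readers unfamiliar with how a variant allele frequency is derived from genotype counts, the following Python sketch shows the arithmetic. Because the control genotype counts are not given in this section, it uses the case-patient totals from Tables 2.13 and 2.14 instead, so the results are case frequencies and are not expected to equal the general-population values of 21.4% and 19.3% quoted above. The function name is hypothetical.

```python
def variant_allele_frequency(n_hom_ref, n_het, n_hom_var):
    """Variant allele frequency from diploid genotype counts.

    Each heterozygote contributes one variant allele and each variant
    homozygote contributes two, out of two alleles per person.
    """
    total_alleles = 2 * (n_hom_ref + n_het + n_hom_var)
    return (n_het + 2 * n_hom_var) / total_alleles


# MLH1-93G>A genotype counts (GG, GA, AA) among case patients.
ontario_cases = variant_allele_frequency(554, 331, 44)       # Table 2.13
newfoundland_cases = variant_allele_frequency(260, 147, 23)  # Table 2.14

print(f"Ontario cases: {ontario_cases:.1%}")
print(f"Newfoundland cases: {newfoundland_cases:.1%}")
```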

The MLH1-93G>A promoter SNP has been associated with other cancers, including lung and breast cancers (305-307). A recent study found a statistically significant association between the variant AA genotype and an increased risk of squamous cell lung carcinoma in a Korean population (306). Another study found that the GG genotype was statistically significantly associated with an increased risk of breast cancer in Korean women (305). These differing results may reflect differences in the tissue specificity and diverse cellular functions of MLH1 (161). Recently, Yu et al. (307) reported that the MLH1-93G>A promoter SNP was associated with an increased risk of colorectal polyps and adenomas among long-term smokers in the United States. The results of Yu et al. are consistent with our finding that this polymorphism has modifying effects in colorectal cancers.


Functional studies have shown that the MLH1 promoter region from nucleotide position –184 to the transcription start site, in which the G to A alteration occurs, is essential for transcription of the MLH1 gene (304). Within this region, there are two potential binding sites for transcription factors GT-IIB (GT-motif 2B) and NF-IL6 (interleukin-6-regulated nuclear factor) (304, 325, 326). The G>A alteration may, therefore, affect the transcriptional activation of MLH1. This polymorphism may alternatively be in linkage disequilibrium with another coding or intronic polymorphism and/or an MLH1 mutation that reduces MLH1 function. Because the A allele of the MLH1-93G>A polymorphism has also been found in patients with MSS tumours, we can rule out the possibility of this polymorphism being linked to a founder MLH1 mutation in our populations.

Traditionally, several different empirically developed clinical criteria, such as the

Amsterdam criteria, were used to assess an individual’s risk of developing colorectal tumours in

HNPCC-like families. These criteria were based primarily on family history and early age at onset of cancer, giving little consideration to particular tumour characteristics/phenotype. This was recently addressed in a study by Lindor et al. (126), which found that relatives of

Amsterdam I patients with MSS/L tumours have a statistically significantly lower risk of developing colorectal tumours than relatives of Amsterdam I patients with MSI-H tumours

(126). This type of familial segregation of MSS/L tumours was termed “familial colorectal cancer type X” syndrome (126, 127). Our study identified 42 Ontario case patients meeting

Amsterdam I criteria—17 (40%) carried MSI-H tumours, 15 (36%) carried MSS/L tumours, and

10 (24%) lacked MSI data. Of the 17 case patients with MSI-H tumours, eight (47%) carried the variant allele, and of the 15 case patients with MSS/L tumours, only three (20%) carried the variant allele. Similarly, of the 14 Newfoundland case patients who met Amsterdam I criteria,

three (21%) carried MSI-H tumours, seven (50%) carried MSS/L tumours, and four (29%) lacked MSI data. Of three case patients with MSI-H tumours, two (67%) were carriers of the variant allele, and of the seven case patients with MSS/L tumours, three (42%) were carriers of the variant allele. Because the MLH1-93G>A polymorphism is found in a subset of Amsterdam I case patients with MSS/L tumours, further investigation appears to be warranted to examine the association between MSS/L tumours and familial colorectal cancer type X syndrome.

We found that the remaining SNPs examined were not associated with colorectal cancer in either population (Tables 3 and 4), and we did not observe an association between the variant alleles and any clinicopathologic characteristic examined. Despite its strong functional effects on MSH6 promoter activity, the MSH6-159C>T polymorphism was not associated with CRC or with any clinical or pathologic tumour features. Perhaps a more dramatic reduction in MSH6 levels is needed to affect mismatch repair. The MSH2 IVS12-6T>C polymorphism was associated with MSI-H tumour status in the Ontario population of case patients but not in the

Newfoundland population.

Our study has several limitations. One is the potential for selection bias. The major reason for nonparticipation in Ontario was unwillingness and/or inability to participate (only

42% were able and willing to participate). It is unlikely that self-selection would be related to a subject’s genotype, unless genotype is related to advanced disease stages (327). However, because the general clinical and pathologic characteristics of colorectal cancer of our case patient populations were similar to previously published reports (16, 302, 328-330), our study was not particularly limited by this potential source of bias.

Another potential source of bias was self-report of ethnicity. In Newfoundland, 15% of study subjects (case patients and study controls) did not report ethnicity, and in Ontario, 9% of

study subjects reported mixed ethnicity. To minimize the potential of population stratification and to make the two populations comparable, non-whites and subjects not reporting ethnicity were excluded from subsequent analyses.

Another limitation was the unavailability of some clinical data from our study subjects.

Clinical and pathologic characteristics were not available for a variety of reasons (e.g., tumour material not available for MSI analysis, technical difficulties with the MSI analyses, or death of case patients before determination of the tumour’s stage or grade). However, the proportion of case patients with missing data was small and was unlikely to affect our results. Accrual of study subjects was not complete in Newfoundland at the time of the analyses, because control subject accrual in Newfoundland was still ongoing, which limited our sample size and statistical power.

Our study also has numerous strengths. The large sample size gave us high precision and made our results less susceptible to random fluctuation. All statistical analyses were adjusted for the main confounding variables of age and sex. In addition, we were able to address the issue of multiple comparisons by using the Bonferroni correction method. A major strength of our study is the use of two separate population-based registries (Ontario and Newfoundland) that had been accrued with similar strategies, thus providing us with confirmation that our observations reflect true associations and are less likely to be due to chance.
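The Bonferroni correction mentioned above can be illustrated with a minimal Python sketch. The number of comparisons and the p-values used here are hypothetical placeholders chosen for illustration; they are not values from this study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under the Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: each p times the number of tests, capped at 1."""
    n = len(p_values)
    return [min(1.0, p * n) for p in p_values]


# Hypothetical example: five comparisons at an overall alpha of 0.05.
threshold = bonferroni_threshold(0.05, 5)
adjusted = bonferroni_adjust([0.004, 0.03, 0.5, 0.2, 0.9])
```

A comparison would then be called significant only if its raw p-value falls below `threshold`, or equivalently if its adjusted p-value falls below the overall alpha.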

The important findings in this study, namely the associations between the MLH1-93G>A polymorphism and MSI status in both populations and between the MLH1-93G>A polymorphism and a strong family history of colorectal cancer (Amsterdam criteria) among Ontario case patients, indicate that low-penetrance alleles of mismatch repair genes may be associated with the risk of colorectal cancer. Because as many as 25% of all patients with colorectal cancer exhibit familial aggregation without Mendelian patterns of inheritance (331, 332), it would be informative to determine whether MLH1-93G>A is associated with family risk defined by use of less stringent criteria than Amsterdam I or II.

Most MSI-H colorectal cancer tumours are sporadic, in which methylation of the MLH1 promoter results in mismatch repair-deficient tumours (333). Results from this study indicate that the MLH1-93G>A promoter polymorphism may act as a modifier allele of CRC that contributes to susceptibility of MSI-H tumours; however, the mechanism used by subtle alterations of mismatch repair genes to contribute to the MSI-H phenotype remains unknown. A possible mechanism through which the MLH1-93G>A polymorphism could affect gene transcription is by altering the promoter’s sensitivity to methylation. The MLH1 promoter is hypermethylated in

15%–20% of sporadic colorectal cancers (249, 250). The MLH1-93G>A promoter polymorphism is located in a CpG island, adjacent to CpG sites that are able to undergo methylation (334). The MLH1 promoter is bi-directional and is used not only by MLH1 but also by another gene, EPM2AIP1, which is located approximately 321 base pairs upstream (of MLH1 transcription start site) and has been implicated in myoclonus epilepsy (335, 336). Because the

MLH1-93G>A polymorphism occurs in the core promoter region of both genes, it may have a preferential effect on the directionality of gene transcription. The effect of promoter polymorphisms on gene transcription has been illustrated with the progesterone receptor gene in endometrial cancers; in this cancer, the promoter of the progesterone receptor has a SNP that selectively transcribes one gene isoform over another (337).

In addition to mismatch repair, MLH1 plays a role in the control of the G2–M cell cycle checkpoint (211). Reductions in the cellular levels of MLH1 tend to have a greater impact on its role in cell cycle control than on DNA mismatch repair (211). Decreased levels of MLH1 expression associated with the MLH1-93G>A polymorphism may lead to impaired cell cycle

control, allowing cells to proceed with cell division before proper DNA repair can be accomplished. This impaired control would overwhelm the mismatch repair mechanism, leading to the accumulation of mutations and subsequent microsatellite instability.

Different alterations in mismatch repair genes may have various effects on phenotype, depending on their location within the coding region or regulatory domain of the gene and on the amino acid or nucleotide change that results. Individual missense alterations that commonly occur in mismatch repair genes and other cancer susceptibility genes may not be pathogenic and severe enough to cause colorectal cancer, but they may affect the level of protein (through mRNA levels or rates of translation) required by the specific cell type to perform normal function (338). Such missense alterations may be associated with the MLH1-93G>A promoter polymorphism, which we found to be associated with an increased susceptibility to MSI-H colorectal cancers. Additional characterization of these changes and the cumulative effects that these alterations have on disease risk may lead to new insights into the contribution of low-penetrant alleles to cancer incidence and disease progression.

We did not observe an association between the MSH2-118T>C polymorphism and family history in Newfoundland. The number of CRC patients meeting Amsterdam I criteria was very small due to the overall smaller sample size; however, another factor may also have played a role. Newfoundland is a founder population (282) with the highest rate of CRC in Canada (339);

31% of all CRC cases in Newfoundland have a first-degree relative affected with CRC (339).

Many of these families have a large number of HNPCC-related cancers segregating in a pattern consistent with an autosomal dominant mode of inheritance, but have failed to meet the

Amsterdam criteria due to the late onset of the cancers (340). Therefore, the Amsterdam criteria may not be the most appropriate classification system to determine the high familial risk of CRC

in patients from Newfoundland. Indeed, we did not observe any associations with case patients meeting the Amsterdam criteria in our earlier analysis with the MLH1-93G>A polymorphism, even though we saw such associations with Ontario case patients (341). In addition, only 40% of

Newfoundland families meeting the Amsterdam criteria have MMR-deficient tumours (339).

The other 60% would therefore be defined as Familial Colorectal Cancer Type X (FCC-X) syndrome (126, 339). Thus, the proportion of families with FCC-X is higher in Newfoundland

(60%) than those reported in other populations (40%) (339).

In the Korean population, the MSH2-118T>C SNP was examined in a small subset of

HNPCC patients (n = 40), suspected HNPCC patients (n = 56), and early-onset colorectal cancer patients (n = 40), and 157 control subjects. Consistent with our findings, no differences in variant allele frequencies between CRC case patients and control subjects were observed in this study

(310). Similarly, the contribution of the MSH2-118T>C SNP to lung cancer was also examined in another Korean case-control population with no significant results (295).

Functional studies have shown that the MSH2 promoter region beginning 300 nucleotides upstream of the transcription start site is crucial for the transcription of the MSH2 gene (310,

324). The MSH2-118T>C polymorphism is located in a NF-Y (also called CBF or CAAT-binding factor) transcription factor binding site (324). NF-Y is a known oestrogen-responsive element (342-346). The variant C allele creates an AP1 transcription factor binding site and the

AP1 is activated by antiestrogens (like androgens) in the presence of oestrogen receptor-beta, which is expressed in the gastrointestinal tract (324, 345). Oestrogen was found to upregulate

MSH2 gene transcription and increase mismatch repair activity in endometrial cells (347); however, the exact mechanism of this upregulation is unknown. Oestrogen has a protective effect in colorectal cancer as it reduces the risk of MSI-H colorectal cancers while its withdrawal

increases the risk of MSI-H tumours (348). Taken together, the evidence presented in these studies could indicate a role for the MSH2-118T>C polymorphism which could involve altering the responsiveness of the MSH2 promoter to oestrogen. Additionally, MLH1 was also found to be upregulated by oestrogen (347) and the MLH1 promoter is primarily regulated by the NF-Y transcription factor (349). These findings may explain why the MSH2-118 variant C allele is associated with a strong family history of CRC, as defined by Amsterdam criteria, in female CRC patients and shows a trend towards association with MSI-H tumour phenotype. The MSH2-118 variant C allele would eliminate the protective effects of oestrogen as it abolishes the binding of the oestrogen-responsive element to the MSH2 promoter. Additional investigations are required to define the exact role of this MSH2-118T>C variant.

The MSH2-118T>C is not the first polymorphism to show sex-related differences in colorectal cancer risk. A promoter polymorphism in the MDM2 gene, SNP309 (T>G), was shown to increase the risk of colorectal cancer, as well as other cancers, in women only (350,

351). It is believed that the SNP309 variant G allele increases the affinity of an oestrogen-responsive SP1 transcription factor and, thus, upregulates the transcription of the MDM2 gene

(350, 352). MDM2, in turn, attenuates the p53 pathway affecting cell cycle arrest, DNA repair, cellular senescence and apoptosis, and increases the risk for tumourigenesis. It seems that the p53 and MSH2 pathways have sex-specific, hormone-dependent roles.

In this study, we found the MSH2-118T>C polymorphism to be associated with clinical family history in women. It is possible that this SNP plays a role in the MSH2 response to sex hormones, since MSH2 expression is enhanced by the sex hormones in particular tissues (347).

Unfortunately, this polymorphism has not received much attention, as there is only one study, to date, which examines the effects of the MSH2-118T>C in lung cancer (295). It would be interesting to examine the role of this polymorphism in sex-specific cancers, such as endometrial, ovarian, and prostate cancers. Further characterization of promoter polymorphisms and the cumulative effects that such alterations have on gene expression and ultimately disease risk, may lead to new insights into the contribution of low-penetrant alleles to cancer incidence and progression.


CHAPTER 3

The Functional Effects of the MLH1-93G>A Polymorphism on the

MLH1/EPM2AIP1 Promoter Activity

Miralem Mrkonjic, Sheron Perera, James B. Rawson, and Bharati Bapat

Samuel Lunenfeld Research Institute, Department of Pathology and Laboratory Medicine, Mount

Sinai Hospital; Department of Laboratory Medicine and Pathobiology, University of Toronto;

Toronto, Canada.

The work presented in this chapter was primarily contributed by Miralem Mrkonjic. Sheron

Perera, a Ph.D. student, and James Rawson, a summer student, assisted in creation of MLH1 D-promoter constructs and EPM2AIP1 promoter constructs. A modified version of the data presented in this chapter will be submitted for publication.


Chapter 3 The Functional Effects of the MLH1-93G>A Polymorphism on the

MLH1/EPM2AIP1 Promoter Activity

3.1 SUMMARY

Defective mismatch repair (MMR) activity leads to the microsatellite instability (MSI) phenotype of colorectal cancer (CRC). Unlike MMR gene mutations, the contribution of MMR gene polymorphisms to tumourigenesis is not well elucidated. In population-based studies, we have previously shown that the MLH1-93G>A promoter polymorphism (rs1800734) is strongly associated with MSI CRCs when compared to controls, thereby suggesting a modifier role for this polymorphism in CRC. To evaluate the functional effects of this polymorphism on MLH1 transcription, we transfected a panel of CRC cell lines and one normal colonic cell line with luciferase promoter constructs containing either the -93G or -93A allele. The luciferase activity was measured and compared using the Student’s t-test. To distinguish any subtle effects, two constructs encompassing the variant site were used: one spans the core promoter (C+D regions) of MLH1, and the other, shorter construct spans only the D region of the MLH1 promoter.

When the CRC cell lines were transfected with the core promoter constructs, no significant differences between the two alleles were observed. The largest difference occurred in normal colonic epithelial, CCD-841-CoTr cells, with 72% activity observed in the -93G promoter compared to the -93A promoter (P = 0.12). However, with the abbreviated D promoter, the -93A promoter was significantly less active (6% to 35% transcriptional activity compared to the -93G promoter) in all cell lines examined. Since the MLH1 promoter is bi-directional and also transcribes the antisense EPM2AIP1 gene, we also evaluated the role of this polymorphism on

the EPM2AIP1 transcription. The -93A promoter was consistently more active (11% to 54% more activity than the -93G promoter) across all cell lines examined. Our studies indicate that the

-93G>A variant is able to affect the efficiency of MLH1/EPM2AIP1 transcription in these cell lines.

3.2 INTRODUCTION

Colorectal cancer (CRC) is the second leading cause of cancer-related deaths in North

America. CRCs can be broadly subdivided into two separate groups depending on their mismatch repair (MMR) status. DNA mismatch repair system guards the integrity of the genome

(161) and contributes ~1000-fold to the overall fidelity of DNA replication by targeting mispaired bases and insertion-deletion loops that occur through replication errors, during homologous recombination, and as a result of DNA damage (161). CRCs deficient in the MMR system present with genome-wide microsatellite instability, or MSI. Microsatellites are nucleotide repeats found throughout the genome. Instability is characterized by nucleotide insertions or deletions in these repeated loci, leading to widespread genomic instability.

Microsatellite instability can be quantified depending on how many tested markers show instability. When 30% or more of the tested markers show instability, the tumour presents high-frequency microsatellite instability, or MSI-H. On the other hand, when fewer than 30% of the markers tested, but more than 1%, show instability, the tumour presents low-frequency microsatellite instability, or MSI-L.
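As a minimal sketch, the thresholds described above can be written as a small Python function. The function name and the "MSS" label for tumours at or below the 1% threshold are assumptions made for illustration; the text above defines only the MSI-H and MSI-L categories.

```python
def classify_msi(unstable_markers: int, tested_markers: int) -> str:
    """Classify a tumour by the percentage of tested markers showing instability."""
    if tested_markers <= 0:
        raise ValueError("tested_markers must be positive")
    if not 0 <= unstable_markers <= tested_markers:
        raise ValueError("unstable_markers must be between 0 and tested_markers")
    # Integer arithmetic avoids floating-point rounding at the 30% boundary.
    if unstable_markers * 100 >= 30 * tested_markers:
        return "MSI-H"  # 30% or more of markers unstable
    if unstable_markers * 100 > 1 * tested_markers:
        return "MSI-L"  # more than 1% but fewer than 30% unstable
    return "MSS"        # assumed label for the remaining tumours
```

For example, a tumour with 3 of 10 markers unstable would be classified as MSI-H, and one with 1 of 10 unstable as MSI-L.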

Approximately 85-95% of MSI-H CRCs occur due to epigenetic silencing of the MLH1 gene. The MLH1 gene promoter has been empirically subdivided into four regions based on the groupings of CpG dinucleotides and the relationship between their methylation and MLH1 gene expression. These regions were termed: A (that spans from -711 to -577 base pairs from the

translational start site), B (-552 to -266 base pairs from the start site), C (-248 to -178 base pairs from the translational start site), and D (-109 to +15 base pairs from the translational start site)

(Figure 3.1) (334). Methylation of region A does not correlate with MLH1 gene expression, while the effects of methylation in region B are more ambiguous. The core promoter region consists of regions C and D and methylation of the C region shows the best correlation with the

MLH1 gene silencing (353). In addition to epigenetic silencing, recent evidence has shown this to be a bi-directional promoter with a second gene, Laforin (EPM2A) Interacting Protein 1

(EPM2AIP1), located on the opposite strand 351 base pairs away from the MLH1 gene’s translational start site (321 bases from transcription start site) (354). Epigenetic modification of this promoter region results in transcriptional silencing of both MLH1 and EPM2AIP1 genes

(354).
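The four promoter regions described above can be captured in a small lookup, shown here as a minimal Python sketch. It assumes that all positions, including the -93 position of the polymorphism, are expressed relative to the MLH1 translational start site; the names used are illustrative only.

```python
from typing import Optional

# Region boundaries in base pairs relative to the MLH1 translational start site,
# as listed in the text (inclusive on both ends).
MLH1_PROMOTER_REGIONS = {
    "A": (-711, -577),
    "B": (-552, -266),
    "C": (-248, -178),
    "D": (-109, 15),
}


def promoter_region(position: int) -> Optional[str]:
    """Return the region (A-D) containing a position, or None if it lies between regions."""
    for name, (start, end) in MLH1_PROMOTER_REGIONS.items():
        if start <= position <= end:
            return name
    return None
```

Under that assumption, `promoter_region(-93)` returns `"D"`, which matches the placement of the MLH1-93G>A polymorphism in the D region.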

Our previous study has identified an association between the MLH1-93G>A promoter polymorphism and MSI-H CRCs in two separate Canadian populations (341). Since then our findings have been independently replicated. In the colon this polymorphism was shown to be associated with increased risk of hyperplastic polyps and adenomas in smokers (307) as well as

MSI-H CRCs, by itself, or in combination with lifestyle factors (296). Furthermore, the MLH1-93G>A polymorphism was also shown to be associated with CIMP positive CRCs (which include MLH1 promoter methylation) (355) and with the loss of MLH1 protein expression (356).

The MLH1-93G>A polymorphism was also shown to be associated with the risk of endometrial carcinoma (357) and it was shown to be associated with the MLH1 promoter methylation in endometrial cancers (358). Hodgkin lymphoma patients who carried the MLH1-93 variant A allele were at higher risk of developing secondary tumours following methylating chemotherapy (359).


Adapted from Deng et al. Cancer Research 1999;59(9):2029-33

Figure 3.1 Schematic diagram of the EPM2AIP1/MLH1 bi-directional promoter with the four regions outlined. The MLH1 promoter has been divided into four regions A-D, which are delineated by CpG dinucleotide clustering (vertical bars). Methylation of CpG sites in regions C and D most strongly correlates with MLH1 gene expression. The A and B promoter regions are located in the EPM2AIP1 gene. Borders of each region are shown as well as transcriptional start sites (black arrows) of MLH1 and EPM2AIP1 genes. The MLH1-93G>A polymorphism is located in the D region and is represented by a red asterisk.


In this study we investigated whether the MLH1-93G>A promoter polymorphism plays a role in MLH1 gene transcription alone or in conjunction with DNA methylation. In addition, since this polymorphism is located in a bi-directional promoter, we also investigated its effects on the expression of the EPM2AIP1 gene.

3.3 MATERIALS AND METHODS

3.3.1 Materials

Tissue culture reagents and foetal bovine serum (FBS) were purchased from Invitrogen

Life Technologies (Burlington, Ontario, Canada). Wildtype and variant MLH1-93G>A pGL3-C+D (-301 to +99) MLH1 promoter luciferase constructs were gifts from Dr. Hiromichi Hemmi

(Toho University School of Medicine, Japan). The renilla pRL-SV40 vector was purchased from

Promega (Madison, WI). The pGL3-Basic (empty) vector was a gift from Dr. David Hedley

(Princess Margaret Hospital, Toronto).

3.3.2 Cell Culture

HT29 cells were cultured in McCoy’s 5A media. SW620 and SW480 cells were grown in

Leibovitz’s L-15 media. Human embryonic kidney (HEK) 293T and CCD-841-CoTr cells were maintained in Dulbecco’s Modified Eagle’s Medium (DMEM). All cell culture media were supplemented with 10% FBS. All cell lines were incubated at 37°C in a humidified atmosphere of 5% CO2, except CCD-841-CoTr, which were grown at 33°C.


3.3.3 Promoter Constructs

3.3.3.1 MLH1 Promoter D-region Construct

DNA fragments corresponding to the MLH1 promoter D region (-113 to +99) were PCR amplified from the C+D MLH1 promoter constructs containing either G or A allele at the -93 position. The PCR reaction contained 10X Taq Buffer HiFi, 2 mM MgCl2, 0.2 mM dNTPs, 1 mM primers, 14 ng of template DNA, and 1 unit high-fidelity Taq DNA polymerase (Fermentas

Life Sciences, Burlington, Ontario, Canada). The PCR conditions were as follows: initial denaturation (95°C for 10 min), denaturation (95°C for 30 sec), annealing (57°C for 45 sec), extension (72°C for 1 min), and final extension (72°C for 8 min). The denaturation-annealing-extension steps were repeated 35 times. The 5’ to 3’ sequences of the forward and reverse primers, respectively, are: AAAAAAAACTCGAGGGATGGCGTAAGCTA (XhoI site underlined), AAAAAAAAAAGCTTCTTTGATAGCATTAGCTGGCCG (HindIII site underlined).

3.3.3.2 EPM2AIP1 Promoter Construct

DNA fragments corresponding to the -513 to +122 region of the MLH1/EPM2AIP1 promoter were amplified from the lymphocyte DNA of CRC patients homozygous (G or A) at the -93 locus. The PCR reaction was identical to that described for the MLH1 promoter D-region, except that it used 40 ng of template DNA and 2.5 mM MgCl2. The 5’ to 3’ sequences of the forward and reverse primers, respectively, are: CCTCGTCGACTTCCATCTTGCTTCTTTT and

CCGTACCAGTTCTCAATCATCTCTTTGAT. A second, nested PCR was performed using the

PCR fragment generated in the above reaction as the template. The PCR conditions were identical to the above reaction, except that it used 20 ng of template DNA. The 5’ to 3’

sequences of the forward and reverse primers used, respectively, are:

AAAAAAAAAAGCTTCACAAGCCCGGTTCC (HindIII site underlined) and

AAAAAAAACTCGAGAAACGTCTAGATGCTCAACGG (XhoI site underlined).

Following all PCR reactions, DNA was isolated by gel electrophoresis on 1% agarose gel

(Invitrogen Life Technologies) and purified by QIAquick Gel Extraction Kit (Qiagen, Mississauga,

Ontario, Canada). MLH1 D-region promoter constructs, EPM2AIP1 promoter constructs, and pGL3-Basic vector were digested with XhoI and HindIII restriction endonucleases (Fermentas

Life Sciences). The restriction digestion reaction used 10x Buffer R with BSA, 12.5 units of each restriction enzyme, and 400 ng of inserts/vectors. The reaction mix was incubated at 37°C for 18 hrs. A second round of restriction enzymes was added 8 hrs into the reaction to ensure maximum digestion. Following the digestion, 5 µl of shrimp alkaline phosphatase (SAP) was added, to prevent re-ligation of PCR products, and incubated at 37°C for 1 hr. Following enzyme inactivation, at 80°C for 20 min, DNA was purified using Qiagen QIAQuick PCR Purification

Kit.

The amplified promoter regions were ligated into pGL3-Basic’s multiple cloning sites

(MCS). The ligation reactions contained 100 ng of pGL3-Basic, 33 ng of promoter inserts, 5x

Rapid Ligation Buffer, and 5 units of T4 Ligase (Fermentas Life Sciences). The ligation reactions were incubated at room temperature for 2 hrs.

The newly made plasmid constructs were transformed into DH5α supercompetent cells

(Invitrogen Life Technologies). The transformation reactions used 20 ng of plasmids applied to

100 µl of cells followed by a heat shock for 45 sec. The cells were then incubated at 37°C in a shaker at 225 rpm for 2 hrs. Following incubation, 75 µl of cells were plated on selective media containing Ampicillin and were incubated for an additional 24 hrs. Single colonies were selected and grown in 100 µl of selective cultures containing Ampicillin. DNA plasmids were isolated from cells with an Invitrogen PureLink HQ Maxi Purification Kit. Sequencing was performed on plasmids to confirm the integrity and sequence for the correct insert using a primer upstream of the pGL3-Basic’s MCS: 5’-CTAGCAAAATAGGCTGTCC-3’.

3.3.4 Luciferase Reporter Gene Assays

All transfection experiments were carried out using Lipofectamine 2000 reagent

(Invitrogen Life Technologies) according to manufacturer’s instructions. Cells were plated in 24-well plates at a density of 100,000 cells per well 24 hours prior to transfection. Luciferase constructs (500 ng) containing either pGL3-Basic, wild type MLH1-93G C+D promoter, MLH1-93G D promoter, variant MLH1-93A C+D promoter, variant MLH1-93A D promoter, wild type EPM2AIP1 G-promoter, or variant EPM2AIP1 A-promoter, were transfected into cells to determine luciferase reporter activity (360, 361). Cells were additionally co-transfected with an internal control (5 ng of pRL-SV40) for monitoring transfection efficiency and normalization.

Luciferase and Renilla activities were measured 24 hr after transfection using Dual Luciferase

Reporter Assay System from Promega (Madison, WI). Luminescence was quantified using a

Berthold 96-well microplate luminometer (Berthold, Wildbad, Germany). The luciferase levels were expressed as the ratio of firefly luciferase readings to renilla luciferase readings. The relative luciferase activity for each transfection was expressed as its luciferase level divided by that from the pGL3-Basic promoter transfection. All reporter assays were done in quadruplicate and at least three independent experiments were performed. Statistically significant differences in promoter activity were assessed using the Student’s t-test. Error bars represent standard deviation of the mean (SD).
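The normalization steps described above can be sketched in Python as follows. This is an illustrative outline only: the function names are hypothetical, averaging the empty-vector replicates before dividing is an assumption, and the sketch computes only the pooled-variance (Student's) t statistic, not the corresponding P value.

```python
from math import sqrt
from statistics import mean, variance


def luciferase_levels(firefly: list[float], renilla: list[float]) -> list[float]:
    """Firefly reading divided by renilla reading for each replicate well."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and renilla must have the same number of readings")
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(levels: list[float], empty_vector_levels: list[float]) -> list[float]:
    """Each luciferase level divided by the mean empty-vector (pGL3-Basic) level."""
    baseline = mean(empty_vector_levels)  # assumption: replicates are averaged first
    return [level / baseline for level in levels]


def student_t_statistic(a: list[float], b: list[float]) -> float:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed)."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
```

In use, the relative activities of the -93G and -93A constructs from one experiment would be passed to `student_t_statistic`, and the result compared against the t distribution with n_a + n_b - 2 degrees of freedom.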


3.3.5 In vitro Promoter Construct Methylation with HhaI and M.SssI

The pGL3-C+D MLH1 plasmids (10 µg) were in vitro methylated using HhaI and

M.SssI methyltransferases (New England Biolabs, Beverly, MA). The HhaI methylation reactions, which methylate the internal cytosine residue in the 5’-GCGC-3’ sequence, were performed with NEBuffer 4, 160 µM S-adenosylmethionine (SAM) at 37°C for 4 hrs. The

M.SssI methylation reactions, which methylate all cytosine residues within the 5’-CG-3’ sequence, were performed with NEBuffer 2, 320 µM SAM at 37°C for 4 hrs. The procedure for mock-methylation reactions was identical to the one for DNA methylations except that no SAM was included. The plasmids were purified, following methylation reactions, with Qiagen

QIAQuick PCR Purification Kit. The degree of methylation was determined by analyzing the band patterns on agarose gel electrophoresis (2% agarose gel) after digestion of purified plasmids with HhaI or BstUI restriction endonucleases (New England Biolabs). The HhaI restriction enzyme is methylation sensitive and only cuts the DNA sequence following the internal cytosine residue in non-methylated 5’-GCGC-3’ sequence. The restriction reaction was performed with NEBuffer 4 supplemented with 100 µg/ml bovine serum albumin (BSA) at 37°C for 4 hrs. The BstUI restriction enzyme is also methylation sensitive and cuts only the non-methylated 5’-CGCG-3’ sequence. The restriction reaction was performed with NEBuffer 4 at 60°C for 4 hrs. The completely methylated plasmids were utilized for luciferase reporter assays in SW620 and CCD-841-CoTr cell lines.
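To illustrate how the diagnostic digests relate to sequence, the following minimal Python sketch lists the positions of HhaI (5'-GCGC-3') and BstUI (5'-CGCG-3') recognition sites in a DNA string. The helper name and the short test sequence are hypothetical; the actual plasmid sequence is not reproduced here.

```python
import re

HHAI_SITE = "GCGC"   # HhaI recognition sequence
BSTUI_SITE = "CGCG"  # BstUI recognition sequence


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) occurrence of site."""
    # A zero-width lookahead lets overlapping sites such as GCGCGC be reported twice.
    pattern = re.compile(f"(?={re.escape(site)})")
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

For the hypothetical fragment `"AGCGCGT"`, the sketch reports one HhaI site starting at position 1 and one BstUI site starting at position 2. A fully methylated plasmid would be expected to resist cutting at all such sites.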

3.3.6 Electrophoretic Mobility Shift Assays (EMSA)

We generated the following double-stranded probes biotinylated at the 5’ end with the MLH1-93G>A polymorphism bolded within brackets: -93G sense, 5’-TAAGCTACAGCT[G]AAGGAAGAACGTG-3’; -93G antisense, 5’-CACGTTCTTCCTT[C]AGCTGTAGCTTA-3’; -93A sense, 5’-TAAGCTACAGCT[A]AAGGAAGAACGTG-3’; -93A antisense, 5’-CACGTTCTTCCTT[T]AGCTGTAGCTTA-3’. The probes generated for the competition reaction were identical in sequence, but lacked the 5’ biotin group.
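As a quick consistency check on the probe design, the antisense strands listed above should be the reverse complements of the sense strands. A minimal Python sketch (brackets around the polymorphic base removed; helper name chosen for illustration) is:

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.upper().translate(_COMPLEMENT)[::-1]


# Sense-strand probe sequences from the text, with the bracketed base inlined.
SENSE_G = "TAAGCTACAGCTGAAGGAAGAACGTG"
SENSE_A = "TAAGCTACAGCTAAAGGAAGAACGTG"

antisense_g = reverse_complement(SENSE_G)
antisense_a = reverse_complement(SENSE_A)
```

Both computed strands match the antisense probe sequences given above.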

Nuclear extracts of the cell lines were prepared using the NE-PER kit (Pierce) according to the manufacturer’s instructions. EMSA was carried out using the LightShift

Chemiluminescent EMSA Kit (Thermo-Fisher) as instructed by the manufacturer. We incubated

10 µg of nuclear proteins from the CCD-841-CoTr cell line with 60 fmol biotin-labelled oligonucleotide for 20 min at room temperature in binding buffer (10 mM Tris, 50 mM KCl, 1 mM dithiothreitol, pH 7.5) and 50 ng/µl poly (dI-dC). Binding was competed by 200, 500, 2,000,

5,000, or 25,000-fold excess unlabelled oligonucleotides. Binding complexes were resolved by electrophoresis using 5% TBE Criterion gels (Bio-Rad), transferred to Biodyne B pre-cut modified nylon membranes (Pierce), UV cross-linked, and visualized using the

Chemiluminescent Nucleic Acid Detection system (Pierce).

3.4 RESULTS

We have examined the effects of the MLH1-93G>A promoter polymorphism on the

MLH1 promoter activity using two different constructs, one that spans a core promoter C+D region (-280 to +99) and another that spans a smaller, D only region (-113 to +99) of the promoter (Figure 3.2). Because of the bi-directional nature of the MLH1/EPM2AIP1 promoter,

we have also evaluated the effects of this polymorphism using the reverse promoter constructs (-354 to -23) (Figure 3.2). The luciferase reporter activities of these promoter constructs were examined in three colon cancer cell lines, SW480, SW620, and HT29, one normal colonic epithelium cell line, CCD-841-CoTr, and one extracolonic, non-cancerous cell line, HEK293T.


Adapted from Deng et al. Cancer Research 1999;59(9):2029-33.

Figure 3.2 Schematic diagram representing different promoter constructs used in luciferase reporter assays. The core promoter region, C+D, is shared between the MLH1 and EPM2AIP1 genes. Three sets of luciferase promoter constructs were created: C+D core promoter, D promoter, and reverse EPM2AIP1 (referred to as IP1 construct) promoter. The promoter constructs used in luciferase reporter assays are represented with purple arrows. Red asterisk represents the -93G>A polymorphism.


3.4.1 Effects of the MLH1-93G>A SNP on the core promoter (C+D) activity

For the MLH1 core promoter constructs (C+D) we did not observe statistically significant differences in luciferase reporter activities between the -93G and -93A alleles in any of the cell lines examined (Figure 3.3 A-E). In four of five cell lines evaluated, the differences between the two alleles were less than 0.1-fold (Table 3.1) and only showed trends towards statistical significance in SW480 (P = 0.09) (Figure 3.3 C) and HT29 (P = 0.09) (Figure 3.3 D) cell lines.

In both of these cell lines the constructs containing the -93A allele had slightly higher activities than the constructs containing the -93G allele. The largest difference between the two alleles was observed in the normal colonic CCD-841-CoTr cell line. The constructs containing the -93G allele showed 0.7-fold activity to that of the -93A allele constructs. However, this difference was not statistically significant (P = 0.12) (Figure 3.3 A). In SW620 cell line, the -93G allele constructs showed slightly higher activities, 1.06-fold, than the -93A allele constructs (P = 0.22)

(Figure 3.3 B), while in HEK293T cell line, the -93G allele constructs had 0.94-fold activity to that of the -93A allele constructs (P = 0.57) (Figure 3.3 E).

3.4.2 Effects of the MLH1-93G>A SNP on the D-region promoter activity

The results are much more consistent for D-region constructs across all five cell lines.

We observed statistically significant differences between the -93G allele and -93A allele in all cell lines examined: CCD-841-CoTr (P = 0.02), SW620 (P = 0.003), SW480 (P = 0.001), HT29 (P = 0.001), and HEK293T (P = 0.02) (Figure 3.4 A-E). In all instances the -93G allele constructs had considerably higher activities than the -93A allele constructs. The differences in activities range from 2.8-fold higher activity in the -93G constructs for the SW620 cell line to 17.7-fold higher activity in the -93G constructs for CCD-841-CoTr (Table 3.1).

Figure 3.3 Effect of the MLH1-93G>A polymorphism on MLH1 core promoter activity. The effects of the MLH1-93G>A polymorphism on the MLH1 core promoter were evaluated in a panel of cell lines (A-E). Overall, no statistically significant differences were observed in promoter activities between the -93G wild-type and -93A variant alleles. The relative luciferase activity for each transfectant was expressed as its luciferase level divided by that of the pGL3-Basic (empty vector) transfectant. Histograms represent the average + SD of quadruplicate measurements. Results presented are representative of at least three independent experiments. EV = empty vector (pGL3-Basic), G = -93G allele core promoter, A = -93A allele core promoter.
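
The normalisation described in the caption can be sketched as follows. This is a minimal illustration, not the analysis code used in the thesis: the function name `relative_activity` and the raw readings are hypothetical, and dividing each replicate by the mean empty-vector reading is an assumption about how the quadruplicates were combined.

```python
from statistics import mean, stdev

def relative_activity(construct_readings, empty_vector_readings):
    """Return (mean, sd) of construct readings divided by the empty-vector mean."""
    baseline = mean(empty_vector_readings)
    ratios = [reading / baseline for reading in construct_readings]
    return mean(ratios), stdev(ratios)

# Hypothetical quadruplicate luminometer readings (arbitrary units), not thesis data.
empty_vector = [100.0, 110.0, 90.0, 100.0]
g_allele = [1000.0, 1100.0, 900.0, 1000.0]

avg, sd = relative_activity(g_allele, empty_vector)
print(f"relative luciferase activity = {avg:.2f} (sd {sd:.2f})")
```

With this convention an empty-vector transfectant has a relative activity of about 1, so the values in Table 3.1 read directly as fold activity over background.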

Figure 3.4 Effect of the MLH1-93G>A polymorphism on MLH1 D-region promoter activity. The effects of the MLH1-93G>A polymorphism on the MLH1 D-region promoter were evaluated in a panel of cell lines (A-E). Overall, statistically significant differences were observed in promoter activities between the -93G wild-type and -93A variant alleles in all cell lines examined. The relative luciferase activity for each transfectant was expressed as its luciferase level divided by that of the pGL3-Basic (empty vector) transfectant. Histograms represent the average + SD of quadruplicate measurements. Results presented are representative of at least three independent experiments. EV = empty vector (pGL3-Basic), G = -93G allele D-region promoter, A = -93A allele D-region promoter.

Not surprisingly, across all cell lines examined, the shorter D-constructs showed lower activities than the C+D constructs (matched for the -93 allele status). However, the activities of the shorter promoter constructs were not reduced to the same extent in all cell lines. For -93G (wildtype) allele constructs, some cell lines, HEK293T and CCD-841-CoTr, showed modestly attenuated promoter activity, retaining 63% and 61% of the core promoter (C+D) activity, respectively. Other cell lines, SW620 and SW480, showed a more pronounced reduction, retaining 36% and 41% of core promoter activity, respectively, while HT29 cells showed the most dramatic reduction, retaining 16% of core promoter activity.
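
The percentages quoted above follow directly from the -93G means in Table 3.1. The short sketch below reproduces them; the dictionary and function names are illustrative only, and the numbers are the published means copied from the table.

```python
# Mean relative luciferase activities for -93G constructs, copied from Table 3.1.
core_g = {"HEK293T": 11.23, "SW620": 90.07, "HT29": 62.20,
          "SW480": 102.09, "CCD-841-CoTr": 86.51}
d_region_g = {"HEK293T": 7.13, "SW620": 32.08, "HT29": 9.84,
              "SW480": 42.00, "CCD-841-CoTr": 52.85}

def percent_of_core(d_value, core_value):
    """Activity of the D-region construct as a whole-number percentage of the C+D construct."""
    return round(100 * d_value / core_value)

retained = {cell: percent_of_core(d_region_g[cell], core_g[cell]) for cell in core_g}
for cell, pct in retained.items():
    print(f"{cell}: D-region -93G retains {pct}% of core promoter activity")
```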

3.4.3 Effects of the -93G>A SNP on the EPM2AIP1 promoter activity

The results are also relatively consistent for IP1 constructs across all five cell lines examined. The IP1 -93A allele constructs consistently showed higher activity than the IP1 -93G allele constructs, although this difference did not reach statistical significance in all cell lines (Figure 3.5 A-E and Table 3.1). In HT29 cells the IP1 -93G allele constructs had 0.65-fold the activity of the IP1 -93A allele constructs (P = 0.05), while in SW620 cells the IP1 -93G allele constructs had 0.8-fold the activity of the IP1 -93A allele constructs (P = 0.02) (Figure 3.5 D and B, respectively). In SW480 and HEK293T cells, the IP1 -93G allele constructs had 0.8-fold the activity of the IP1 -93A allele constructs, and these differences showed a trend towards statistical significance (P = 0.09 in both cases) (Figure 3.5 C and E, respectively). CCD-841-CoTr cells, however, showed the smallest difference between the IP1 -93G and IP1 -93A allele constructs (P = 0.15), with IP1 -93G constructs having 0.9-fold the activity of the IP1 -93A constructs (Figure 3.5 A).

Figure 3.5 Effect of the MLH1-93G>A polymorphism on EPM2AIP1 (IP1) promoter activity. The effects of the MLH1-93G>A polymorphism on the EPM2AIP1 promoter were evaluated in a panel of cell lines (A-E). Overall, the -93A promoters were consistently more active than the -93G promoters, although this difference was not always statistically significant. The relative luciferase activity for each transfectant was expressed as its luciferase level divided by that of the pGL3-Basic (empty vector) transfectant. Histograms represent the average + SD of quadruplicate measurements. Results presented are representative of at least three independent experiments. EV = empty vector (pGL3-Basic), G = -93G allele IP1 promoter, A = -93A allele IP1 promoter.

3.4.4 Effects of the MLH1-93G>A SNP on methylated core promoter (C+D) activity

To evaluate the cooperative effects of the promoter polymorphism and promoter methylation, we methylated the pGL3-Basic, C+D -93G, and C+D -93A plasmids in vitro with HhaI methyltransferase. We identified three HhaI recognition sequences in the C+D promoter constructs, two within promoter C-region (at positions -242 and -247) and one within promoter D-region (at position -5). The completeness of methylation by HhaI methyltransferase was examined by measuring the extent of protection from digestion by the HhaI restriction enzyme (Figure 3.6). In CCD-841-CoTr cells we observed no significant differences between the two methylated C+D promoter constructs (P = 0.07), with the -93A allele again having slightly higher activity than the -93G allele (Figure 3.7 A). Both methylated constructs had 5- to 6-fold reduced activities compared to their respective non-methylated constructs (Table 3.1). In SW620 cells, likewise, no significant differences were observed between the methylated C+D -93G and -93A promoter constructs (Figure 3.7 B). Unlike non-methylated constructs, in which the -93G allele had slightly higher activity, in methylated constructs the -93A allele had slightly higher activity (1.16-fold); however, this difference was not statistically significant (P = 0.08). Both methylated C+D constructs had approximately 3-fold reduced activities compared to their respective non-methylated C+D counterparts (Table 3.1).
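
The fold reductions quoted for HhaI methylation can be recomputed from the means in Table 3.1. The sketch below does so; the variable names are illustrative, and the calculation is simply the non-methylated mean divided by the HhaI-methylated mean for the same cell line and allele.

```python
# Mean relative luciferase activities of C+D constructs, copied from Table 3.1.
# Keys are (cell line, -93 allele).
non_methylated = {
    ("SW620", "G"): 90.07, ("SW620", "A"): 85.03,
    ("CCD-841-CoTr", "G"): 86.51, ("CCD-841-CoTr", "A"): 119.37,
}
hhai_methylated = {
    ("SW620", "G"): 28.91, ("SW620", "A"): 33.56,
    ("CCD-841-CoTr", "G"): 16.14, ("CCD-841-CoTr", "A"): 20.42,
}

# Fold reduction = activity without methylation / activity after HhaI methylation.
fold_reduction = {
    key: round(non_methylated[key] / hhai_methylated[key], 1)
    for key in non_methylated
}
for (cell, allele), fold in fold_reduction.items():
    print(f"{cell} -93{allele}: {fold}-fold lower after HhaI methylation")
```

The SW620 values fall between 2.5- and 3.1-fold and the CCD-841-CoTr values between 5.4- and 5.8-fold, in line with the approximate figures given in the text.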

In addition to methylation by HhaI methyltransferase, which methylated only three sites, we also evaluated complete promoter methylation by M.SssI methyltransferase, which methylates all CpG dinucleotides. The completeness of methylation by M.SssI methyltransferase was examined by measuring the extent of protection from digestion by the BstUI restriction enzyme (Figure 3.8). Methylation of all available CpG sites completely abolished promoter activity, reducing it to that of the methylated pGL3-Basic empty vector in SW620 and CCD-841-CoTr cells.
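
The difference between the two methylation strategies comes down to which sequence motifs are targeted: HhaI recognises GCGC, BstUI recognises CGCG, and M.SssI acts on every CpG. The sketch below locates these motifs in a sequence. It is an illustration only: the helper `find_sites` is hypothetical and the toy sequence is invented, not the MLH1 promoter.

```python
import re

def find_sites(sequence, motif):
    """Return 0-based start positions of every occurrence of motif, including overlaps."""
    pattern = f"(?={re.escape(motif)})"  # lookahead so overlapping sites are all reported
    return [match.start() for match in re.finditer(pattern, sequence.upper())]

# Invented toy sequence for illustration; NOT the MLH1 promoter sequence.
toy_sequence = "ATGCGCTTACGCGCAACGTT"

hhai_sites = find_sites(toy_sequence, "GCGC")   # HhaI recognition sequence
bstui_sites = find_sites(toy_sequence, "CGCG")  # BstUI recognition sequence
cpg_sites = find_sites(toy_sequence, "CG")      # every CpG, the substrate of M.SssI

print("HhaI sites:", hhai_sites)
print("BstUI sites:", bstui_sites)
print("CpG dinucleotides:", cpg_sites)
```

Because every HhaI site contains a CpG but not every CpG lies in an HhaI site, the CpG list is always at least as long as the HhaI list, which is why M.SssI treatment methylates the constructs far more extensively.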

Figure 3.6 In vitro methylation of pGL3-Basic, C+D -93G and C+D -93A vector constructs using HhaI methyltransferase followed by digestion with HhaI restriction endonuclease. HhaI-methylated and mock-methylated constructs were treated with HhaI restriction endonuclease. The methylated constructs were protected from digestion, while mock-treated constructs were completely digested and yielded multiple bands on the gel. Note: the second, faint band visible in the treated constructs is not a digestion product, but supercoiled constructs corresponding to the size of full vectors (~5.4 kb).

Figure 3.7 Effect of the MLH1-93G>A polymorphism on MLH1 HhaI methylated core promoter activity. The effects of the MLH1-93G>A polymorphism on the methylated MLH1 core promoter were evaluated in two cell lines (A and B). Overall, no statistically significant differences were observed in promoter activities between the -93G wild-type and -93A variant alleles. The relative luciferase activity for each transfectant was expressed as its luciferase level divided by that of the pGL3-Basic (empty vector) transfectant. Histograms represent the average + SD of quadruplicate measurements. Results presented are representative of at least three independent experiments. EV = empty vector (pGL3-Basic), G = -93G allele methylated core promoter, A = -93A allele methylated core promoter.

Figure 3.8 In vitro methylation of pGL3-Basic, C+D -93G and C+D -93A vector constructs using M.SssI methyltransferase followed by digestion with BstUI restriction endonuclease. M.SssI-methylated and mock-methylated constructs were treated with BstUI restriction endonuclease. The methylated constructs were protected from digestion, while mock-treated constructs were completely digested and yielded multiple bands on the gel. Note: the second, faint band visible in the treated constructs is not a digestion product, but supercoiled constructs corresponding to the size of full vectors (~5.4 kb).

Table 3.1 Summary of luciferase reporter assay results

C+D Region Promoter Constructs
                 Relative Luciferase Activity
Cell Line        C+D -93G (sd)     C+D -93A (sd)     Fold Change (G/A)   P-value
HEK293T          11.23 (2.35)      12.01 (0.43)      0.94                0.57
SW620            90.07 (8.19)      85.03 (4.36)      1.06                0.22
HT29             62.20 (2.75)      68.36 (6.01)      0.91                0.09
SW480            102.09 (4.87)     112.46 (4.83)     0.91                0.09
CCD-841-CoTr     86.51 (16.08)     119.37 (25.43)    0.72                0.12

D Region Promoter Constructs
                 Relative Luciferase Activity
Cell Line        D -93G (sd)       D -93A (sd)       Fold Change (G/A)   P-value
HEK293T          7.13 (1.31)       2.15 (0.34)       3.32                0.01
SW620            32.08 (3.02)      11.3 (1.02)       2.84                0.003
HT29             9.84 (0.67)       2.45 (0.44)       4.02                0.001
SW480            42.00 (2.21)      4.28 (0.13)       9.81                0.001
CCD-841-CoTr     52.85 (12.76)     2.98 (0.90)       17.73               0.02

EPM2AIP1 Promoter Constructs
                 Relative Luciferase Activity
Cell Line        IP1 -93G (sd)     IP1 -93A (sd)     Fold Change (G/A)   P-value
HEK293T          13.25 (0.95)      16.23 (0.82)      0.82                0.09
SW620            61.68 (0.86)      76.04 (9.18)      0.81                0.02
HT29             16.94 (2.40)      26.13 (6.02)      0.65                0.05
SW480            39.51 (1.92)      50.70 (7.82)      0.78                0.09
CCD-841-CoTr     22.19 (1.11)      24.71 (0.86)      0.90                0.15

HhaI Treated C+D Promoter Constructs
                 Relative Luciferase Activity
Cell Line        CH3 C+D -93G (sd) CH3 C+D -93A (sd) Fold Change (G/A)   P-value
SW620            28.91 (7.98)      33.56 (5.52)      0.86                0.08
CCD-841-CoTr     16.14 (2.20)      20.42 (2.80)      0.79                0.07

M.SssI Treated C+D Promoter Constructs
                 Relative Luciferase Activity
Cell Line        CH3 C+D -93G (sd) CH3 C+D -93A (sd) Fold Change (G/A)   P-value
SW620            1.16 (0.07)       1.02 (0.13)       1.14                0.07
CCD-841-CoTr     0.80 (0.07)       0.83 (0.06)       0.96                0.68

Values are relative luciferase activity; sd = standard deviation; CH3 = methylated constructs
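
As a consistency check, the G/A fold changes in Table 3.1 can be recomputed from the tabulated means. The sketch below does this for the three non-methylated construct sets; the data structure and function name are illustrative, and the small tolerance is an assumption that allows for the means themselves having been rounded to two decimals.

```python
# (G-allele mean, A-allele mean, reported G/A fold change), copied from Table 3.1.
TABLE_3_1 = {
    "C+D": {
        "HEK293T": (11.23, 12.01, 0.94), "SW620": (90.07, 85.03, 1.06),
        "HT29": (62.20, 68.36, 0.91), "SW480": (102.09, 112.46, 0.91),
        "CCD-841-CoTr": (86.51, 119.37, 0.72),
    },
    "D": {
        "HEK293T": (7.13, 2.15, 3.32), "SW620": (32.08, 11.3, 2.84),
        "HT29": (9.84, 2.45, 4.02), "SW480": (42.00, 4.28, 9.81),
        "CCD-841-CoTr": (52.85, 2.98, 17.73),
    },
    "IP1": {
        "HEK293T": (13.25, 16.23, 0.82), "SW620": (61.68, 76.04, 0.81),
        "HT29": (16.94, 26.13, 0.65), "SW480": (39.51, 50.70, 0.78),
        "CCD-841-CoTr": (22.19, 24.71, 0.90),
    },
}

def fold_change(g_mean, a_mean):
    """G/A ratio of mean relative luciferase activities, to two decimals."""
    return round(g_mean / a_mean, 2)

mismatches = []
for construct, rows in TABLE_3_1.items():
    for cell_line, (g_mean, a_mean, reported) in rows.items():
        computed = fold_change(g_mean, a_mean)
        if abs(computed - reported) > 0.011:  # tolerance for rounding of the means
            mismatches.append((construct, cell_line, computed, reported))

print("rows where recomputed and reported fold change disagree:", mismatches)
```

A value below 1 means the -93A construct was the more active one (as for all IP1 rows), while a value above 1 means the -93G construct was more active (as for all D-region rows).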

3.4.5 Effect of the MLH1-93G>A SNP on transcription/nuclear factor binding

We carried out EMSA to assess whether the differences seen in the luciferase reporter activity between the G and the A allele were due to differential binding of nuclear factors. These experiments revealed that multiple factors bound the probe. We used increasing amounts of unlabelled probe as a competitor to verify the specificity of binding interactions. In reactions carried out with the nuclear extract of the CCD-841-CoTr cell line, multiple factors bound to the labelled probe. In general, the specificity of certain interactions appeared to be very high, with more than a 5,000-fold excess of unlabelled probe required to compete out the binding reaction. In addition, these factors appear to have different affinities for the binding sequence and consequently seem to exhibit competitive binding. Addition of the competitor probe diminished certain interactions (top arrowhead, Figure 3.9) and strengthened others (middle and bottom arrowheads, Figure 3.9). It appears that nuclear factors 1 and 3 (top and bottom arrowheads, Figure 3.9) had higher affinity for the -93G probe, as higher concentrations of competitor were required to compete out this binding, which was not seen with the -93A allele.

Figure 3.9 Determination of protein binding to the -93G>A SNP region by EMSA. The complete reaction mixture contained nuclear extract of CCD-841-CoTr cells and biotin-labelled probe with either the -93G or -93A allele (lanes 2 and 7, respectively). As a competitor, 500- (lanes 3 and 8), 2,000- (lanes 4 and 9), and 25,000-fold (lanes 5 and 10) excess amounts of the corresponding -93G or -93A unlabelled probes were used. Mixtures without extract (lanes 1 and 6) were used as a negative control. Arrowheads indicate three bands that specifically bind the probes.

3.5 DISCUSSION

Since the MLH1-93G>A polymorphism is associated with increased risk of MSI-H CRCs (296, 341) and with MLH1 IHC deficiency (356), it is possible that it plays a functional role in affecting transcription of the MLH1 gene. To our knowledge, this is the first study to examine the effects of the MLH1-93G>A promoter polymorphism on MLH1 promoter activity using constructs of two different lengths. This is also the first study to evaluate the effects of this SNP on transcription of the antisense gene, EPM2AIP1, which shares a bidirectional promoter with MLH1.

However, we did not observe any statistically significant differences between the -93G and -93A alleles in core promoter constructs (Figure 3.3 A-E). The largest observable difference, of 30-40% in promoter activity in favour of the -93A allele, occurred in CCD-841-CoTr normal colonic epithelial cells. This is comparable to what was reported in another study; however, their observations were made in the SW620 colon cancer cell line (360). Only one other study attempted to evaluate the functional effects of the MLH1-93G>A promoter polymorphism; however, no details other than negative results are given regarding the experimental approach or the cell lines used (306).

Even though the -93G>A alteration occurs at putative binding sites for the transcription factors GT-IIB and NF-IL6, it is likely that these factors are not the main drivers of MLH1 transcription (304). Indeed, it was observed in several cell lines that CCAAT-box binding factor (CBF, or NF-Y) is a major regulator of MLH1 transcription (334). Two CBF recognition sequences, CCAAT, are present in the MLH1 promoter, both upstream of the -93G>A polymorphism, at -278 and -141 base pairs from the translational start site. It was found that mutations of CCAAT sequences or methylation of nearby CpG dinucleotides significantly reduce MLH1 promoter activity (334). It is likely that the effects of the -93G>A polymorphism are simply masked in a core promoter construct. Even the maximum observable effect, a 40% difference in promoter activity between the two alleles, would not be enough to explain why this polymorphism is associated with microsatellite instability or with loss of MLH1 protein expression. The cellular levels of MLH1 need to be reduced by more than 80% before any defects in MMR are observed (211). Other functions of MLH1, such as G2/M cell cycle arrest, are affected by a much smaller decrease in MLH1 levels (211). The fact that MMR is preserved over such a wide range of MLH1 levels signifies the importance of this system in a cell.

In contrast, we have observed significant differences in promoter activities between the two alleles in shorter D-region constructs (Figure 3.4 A-E). This difference was consistent in all cell lines examined. As expected, the overall activities of the D-region promoters were considerably lower compared to the core promoters. Surprisingly, in two cell lines, CCD-841-CoTr and HEK293T, this decrease in promoter activity was not so pronounced in wildtype -93G constructs. The shorter D-region construct did not contain any CCAAT sequence motifs, so the promoter activities observed here must be due to transcription factors other than CBF. Therefore, MLH1 transcription is initiated by different sets of transcription factors that appear to be cell-line specific.
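
The statement that the D-region construct lacks both CCAAT motifs can be checked against the coordinates given in this chapter. The sketch below is a simplification under stated assumptions: each feature is treated as a single coordinate relative to the translational start site (the CCAAT positions of -278 and -141 are taken as quoted), and the names are illustrative.

```python
# Construct boundaries and feature positions as quoted in this chapter,
# numbered relative to the MLH1 translational start site.
constructs = {"C+D": (-280, 99), "D": (-113, 99), "IP1 (reverse)": (-354, -23)}
features = {"-93G>A SNP": -93, "CCAAT box (-278)": -278, "CCAAT box (-141)": -141}

def contains(span, position):
    """True if position lies within the inclusive (start, end) span."""
    start, end = span
    return start <= position <= end

content = {
    name: [feature for feature, pos in features.items() if contains(span, pos)]
    for name, span in constructs.items()
}
for name, present in content.items():
    print(f"{name}: {', '.join(present)}")
```

Under these assumptions the SNP is present in all three constructs, whereas the two CCAAT boxes are present in the C+D and reverse constructs but absent from the D-region construct.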

Nevertheless, the effects of the MLH1-93G>A polymorphism on promoter activity were much more pronounced in the shorter promoter constructs. The activities of D-region -93A promoters were reduced by 65% in SW620 cells to as much as 94% in CCD-841-CoTr cells compared to D-region -93G promoters. This effect could occur biologically in individuals whose MLH1 promoter C-region is inactivated by methylation, while the promoter D-region remains hypomethylated. Differential methylation of the MLH1 promoter regions has been observed in CRC cell lines and in CRC patient samples (353, 362). Methylation of distal promoter regions, A and B, was reported in normal mucosa as well as MSS CRC tissues (362). We explored the possibility that the effects of this polymorphism are more pronounced in cases where the promoter C-region is selectively methylated. We methylated our C+D constructs using HhaI methyltransferase. Two HhaI recognition sites were identified in the C region (at positions -242 and -247) and one in the D region (at position -5). Methylated constructs yielded a 2- to 3-fold reduction in promoter activity in SW620 cells and a 5- to 6-fold reduction in CCD-841-CoTr cells, without significant differences in activities between the -93G and -93A alleles (Figure 3.7 A and B). Moreover, methylated C+D promoters with a wildtype -93G allele were less active than the shorter -93G D promoters, indicating that our attempt to mimic the D promoter by methylation was unsuccessful (Table 3.1). Complete methylation of C+D constructs by M.SssI methyltransferase abolished any promoter activity.

Methylation of C and D regions correlates most strongly with loss of MLH1 gene expression; however, methylation of the D region, as well as differences in methylation patterns between C and D regions, are not well studied (363). Demethylation of the MLH1 promoter with 5-Aza-2’-Deoxycytidine resulted in significant demethylation of C and D rather than A and B regions, suggesting the presence of a methylation-response element in region C or D (353).

Recent studies have reported associations of the MLH1-93G>A polymorphism with MLH1 promoter methylation in endometrial and colon cancers (358). This polymorphism was also associated with CIMP1 colon tumours, which are MSI-H due to MLH1 promoter methylation (355). Such findings raise the possibility that this polymorphism affects binding of methylation-responsive elements, making the region more susceptible to methylation and gene silencing.

In addition to its role in MLH1 transcription, analyses of the reverse promoter constructs indicate that the -93G>A promoter polymorphism also plays a role in transcription of the EPM2AIP1 (Laforin interacting protein) gene. The -93A promoter construct was 1.5-fold more active in HT29 cells than the -93G promoter (Figure 3.5 D). Similar results were observed in other cell lines examined (Figure 3.5). The -93G>A polymorphism is located in a region (-20 to -213 base pairs from the MLH1 translational start site) that was previously shown to be crucial for EPM2AIP1 transcription (354). Promoter methylation that silences the MLH1 gene also results in loss of EPM2AIP1 gene expression (354). The functions of the EPM2AIP1 gene are currently unknown. The EPM2AIP1 gene comprises one large exon, 1824 nucleotides long, encoding a 607 amino acid protein (336). The deduced protein contains two coiled-coil domains and the 3’ untranslated region contains MIR sequences (336, 364). Copy number amplifications in the EPM2AIP1 gene were recently observed in 30% of idiopathic generalized epilepsy (IGE) patients (365), while a genome-wide breast cancer association study identified the EPM2AIP1 gene as a high-priority candidate that modifies breast cancer risk in BRCA2 mutation carriers (366). Mutations in Laforin (EPM2A), which interacts with EPM2AIP1, cause a fatal neurological disorder known as Lafora disease (progressive myoclonus epilepsy) (367). More recently, Laforin was implicated in Wnt signalling, where it acts as a phosphatase for GSK-3β and serves as an inhibitor of the Wnt signalling pathway (368). Laforin also negatively regulates cell cycle progression by phosphorylating cyclin D1 in a GSK-3β-dependent manner (369). Insertion-mediated mutagenesis of EPM2A results in rapid early-onset lymphoma in immune compromised mice (368). It is not clear whether EPM2AIP1 plays a role in Laforin-mediated tumourigenesis.

The EMSA results indicate that the region surrounding the -93G>A SNP was bound by multiple nuclear factors and that the -93G>A SNP altered the affinity and binding pattern of these factors. These results corroborate the findings of our promoter activity assays. However, our experimental set-up cannot determine the identity of the nuclear factors. Several studies have analyzed the promoter region of MLH1 for potential transcription factor binding sites. Based on GENETYX-SV software analysis, one study showed that the -93G>A SNP occurs in putative GT-IIB and NF-IL6 binding sites (304). A subsequent study identified a protein-binding site at the -93G>A location, although no known proteins were predicted to bind this region based on the TFSEARCH and DNASIS software (361). These studies suggested that the region surrounding the -93G>A SNP is important for protein binding. The results from our EMSA provide further evidence that the -93G>A SNP alters transcription of the two genes, MLH1 and EPM2AIP1, regulated by a common promoter.

The main limitation of our study is our inability to conclude whether the proteins that bind the region surrounding the -93G>A SNP are activators or repressors, or which gene they are responsible for regulating. It is also unclear whether a subset of the observed proteins represents multi-protein complexes of coactivators or corepressors or simply single factors.

Our study has several strengths. This is the first study to examine the effects of the MLH1-93G>A promoter polymorphism on promoter activity in MLH1 promoter constructs of two different lengths. We used a range of cell lines representative of CRC to demonstrate the functional effects of this polymorphism. This is also the first study to evaluate the effects of this SNP on transcription of EPM2AIP1, demonstrating that a genetic alteration simultaneously affects the regulation of both MLH1 and its antisense gene. We demonstrated that -93G>A is a moderately functional polymorphism, which, in combination with previously published genetic epidemiology studies, supports its role in modifying the risk of cancer development. However, the effects of this polymorphism are most pronounced in promoters with an inactivated C region. The observed associations of this polymorphism with MSI-H CRCs and MLH1 IHC deficiency are likely caused through its association with DNA methylation. However, further studies are required to examine whether this polymorphism predisposes the region to DNA methylation or whether it simply serves as a surrogate marker for an epigenetic event. In addition, this polymorphism increased promoter activity of the EPM2AIP1 gene, although the exact consequences of this are unknown.

CHAPTER 4

Specific Variants in the MLH1 Gene Region Contribute to DNA Methylation,

Loss of Protein Expression, and MSI-H Colorectal Cancer

Miralem Mrkonjic1,2,3, Nicole M. Roslin4, Celia M. Greenwood4,5, Stavroula Raptis1,2,3, Aaron Pollett3, Peter W. Laird6, Vaijayante V. Pethe2, Theodore Chiang4, Darshana Daftary7, Elizabeth Dicks8, Steven Gallinger1,2,7,9, Patrick S. Parfrey8, H. Banfield Younghusband8, John D. Potter10, Thomas J. Hudson11,12,13, John R. McLaughlin2,5,7, Roger C. Green8, Brent W. Zanke7,14, Polly A. Newcomb10, Andrew D. Paterson4,5, and Bharati Bapat1,2,3

1) Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, 2) Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, 3) Department of Pathology and Laboratory Medicine, Mount Sinai Hospital, Toronto, ON, 4) Program in Genetics and Genome Biology, Hospital for Sick Children, Toronto, ON, 5) Dalla Lana School of Public Health, University of Toronto, Toronto, ON, 6) USC Epigenome Center, University of Southern California, Los Angeles, CA, 7) Ontario Familial Colorectal Cancer Registry, Cancer Care Ontario, Toronto, ON, 8) Memorial University of Newfoundland, St. John’s, NL, 9) Department of Surgery, University of Toronto, Toronto, ON, 10) Fred Hutchinson Cancer Research Centre, Seattle, WA, 11) Ontario Institute for Cancer Research, Toronto, ON, 12) Department of Molecular Genetics, University of Toronto, Toronto, ON, 13) Department of Medical Biophysics, University of Toronto, Toronto, ON, 14) Ottawa Hospital Research Institute, Ottawa, ON.

The work in this chapter was primarily contributed by Miralem Mrkonjic. NMR, a biostatistician, performed statistical analyses and modelling. CMG, a principal investigator, performed preliminary statistical analyses. SR, an M.Sc. student, genotyped I219V (rs1799977) and assisted in genotyping the MLH1-93G>A SNP. VVP, a research associate, assisted in genotyping the MLH1-93G>A SNP. CMG, TC, BWZ and TJH performed GWAS for the ARCTIC group. TC collected and organized relevant ARCTIC data for this project. DD, JRM, and SG provided DNA samples from Ontario along with relevant clinical and pathologic data. RCG, ED, HBY, and PSP provided DNA samples from Newfoundland along with relevant clinical and pathologic data. PAN and JDP provided DNA samples from Seattle along with relevant clinical and pathologic data. AP performed immunohistochemical staining. PWL analyzed MLH1 methylation status for Seattle samples and validated 25% of Ontario samples. ADP and BB provided critical feedback and guidance for the project. A modified version of the data presented in this chapter will be submitted for publication.

4.1 SUMMARY

Our previous work on mismatch repair polymorphisms identified an association between an MLH1 promoter SNP (rs1800734) and microsatellite unstable (MSI-H) colorectal cancers (CRCs) in two distinct populations (P = 0.001 for Ontario and P = 0.003 for Newfoundland). Our current study expanded on this finding as we explored the genetic basis of DNA methylation in this region of chromosome 3. We hypothesized that specific polymorphisms in the MLH1 gene region predispose it to DNA methylation, resulting in the loss of MLH1 gene expression and mismatch-repair function, and consequently leading to genome-wide microsatellite instability. We first evaluated our hypothesis in a study population from the province of Ontario (Canada) (901 cases, 1097 controls) and replicated major findings in two additional populations, from the province of Newfoundland and Labrador (Canada) (479 cases, 336 controls) and from the Seattle Cancer Registry (US) (591 cases, 629 controls). Logistic regression was used to test for association between SNPs in the region of MLH1 and CRC, MSI-H CRC, MLH1 gene expression in CRC, and DNA methylation in CRC. The association between rs1800734 and MSI-H CRCs, previously reported in Ontario and Newfoundland, was replicated in the Seattle population. Two additional nearby SNPs, in strong linkage disequilibrium with rs1800734, showed a strong association with MLH1 promoter methylation, loss of MLH1 protein, and MSI-H CRC in all three study populations.

In order to examine whether these variants acted through the pathway that we hypothesized, we next created multiple logistic regression models for MSI-H versus non-MSI-H CRCs. We sequentially added the predictors to these models, beginning with MLH1 IHC status, then MLH1 promoter methylation status, then both MLH1 IHC status and MLH1 promoter methylation status, and finally both MLH1 IHC status and MLH1 promoter methylation status along with each SNP. The MLH1 IHC status, MLH1 promoter methylation status, and the 3 SNPs were strongly associated with the tumour MSI-H status. The logistic regression model of MSI-H CRC that included MLH1 IHC status and MLH1 promoter methylation status fit most parsimoniously in the dataset that combined all three study populations. When rs1800734 was added to this model, its effect was not statistically significant (P = 0.72 versus P = 2.30 x 10^-4 when the SNP was examined alone). This indicates that the observed association of rs1800734 with MSI-H CRC occurs through its effect on MLH1 promoter methylation, MLH1 IHC deficiency, or both.
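
The pattern described above, in which an SNP effect disappears once methylation and IHC status are accounted for, is what one expects when a downstream variable carries the effect. The sketch below illustrates the idea with a deliberately simpler technique than the logistic regression used in this chapter: a crude odds ratio is compared with odds ratios computed within strata of methylation status. All counts are invented for illustration and are not study data; the function and variable names are hypothetical.

```python
def odds_ratio(carrier_cases, carrier_noncases, noncarrier_cases, noncarrier_noncases):
    """Odds ratio for MSI-H in variant-allele carriers versus non-carriers."""
    return (carrier_cases * noncarrier_noncases) / (carrier_noncases * noncarrier_cases)

# INVENTED counts, not study data. Each tuple is
# (carrier MSI-H, carrier non-MSI-H, non-carrier MSI-H, non-carrier non-MSI-H).
strata = {
    "MLH1 promoter methylated": (80, 20, 40, 10),
    "MLH1 promoter unmethylated": (10, 190, 30, 570),
}

# Association within each methylation stratum.
stratum_or = {name: odds_ratio(*counts) for name, counts in strata.items()}

# Crude association, ignoring methylation status (column-wise sum of the strata).
pooled_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_or = odds_ratio(*pooled_counts)

print(f"crude odds ratio: {crude_or:.2f}")
for name, value in stratum_or.items():
    print(f"{name}: odds ratio {value:.2f}")
```

In this toy example carriers are more often methylated, so the crude odds ratio is well above 1, yet within each methylation stratum the odds ratio is exactly 1: once methylation status is fixed, the allele carries no additional information about MSI-H status.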

4.2 INTRODUCTION

Colorectal cancer (CRC) is the fourth most common cancer, and the second leading cause of cancer-related deaths, in North America (11). CRCs can be parsimoniously subdivided into two major groups defined by the genetic pathways involved. The suppressor pathway, observed in more than 80% of CRC cases, involves abnormalities of the APC/wingless signalling pathway and is characterized by frequent somatic mutations in tumour suppressor genes and oncogenes, chromosomal instability (CIN), and microsatellite stable (MSS) tumour status. The mutator pathway, on the other hand, accounts for ~15-20% of CRC cases and results from a deficiency of the mismatch-repair system, which leads to genome-wide microsatellite instability (MSI) (301, 370). MSI tumours have clinicopathologic features distinct from MSS tumours in that they tend to occur more commonly in the proximal colon and to have mucinous histology, tumour infiltrating lymphocytes, poor differentiation, and Crohn’s-like reaction (80).

Our previous work aimed to elucidate the role of a panel of MMR gene SNPs in CRC. Included in this panel was the MLH1-93G>A promoter polymorphism (rs1800734), and we observed its association with MSI-H tumours in two study populations from the Canadian provinces of Ontario and Newfoundland and Labrador (341). Several studies have expanded on our findings and have observed associations between the MLH1-93G>A polymorphism and MLH1 promoter methylation as well as with CpG Island Methylator Phenotype (CIMP) CRCs (355, 358). Another study has observed the propensity of DNA methylation to occur on a DNA strand carrying the MLH1-93G>A variant ‘A’ allele, whereas no methylation occurs on a strand carrying the wild type ‘G’ allele (371).

Approximately 80-90% of sporadic MSI CRCs exhibit loss of MMR function due to MLH1 promoter methylation (249, 250). The potential mechanism by which MLH1 is epigenetically silenced is unclear, but the association between the MLH1 promoter polymorphism (rs1800734) and methylation may indicate sequence specificity. Others have reported associations between rs1800734 and MLH1 promoter methylation and CIMP-positive CRCs (355), or with MLH1 IHC deficiency (356); however, no testable model has been proposed to elucidate such findings.

We hypothesized a stepwise progression to MSI-H CRCs based on genetic susceptibility to DNA methylation leading to MLH1 gene silencing and microsatellite instability (Figure 4.1). Further, we hypothesized that the MLH1-93G>A polymorphism may be in LD with other variants, and that one or more of them predisposes the region to methylation, which then results in loss of MLH1 gene expression and a defective MMR system, leading to microsatellite instability. We have undertaken a population-based approach using three independent study populations. We first screened Ontario samples for 101 SNPs on chromosome 3 surrounding the MLH1 gene, while the Newfoundland and Seattle samples were screened for 19 and 16 SNPs of interest, respectively. Sequential analysis of markers associated with MSI-H CRC, MLH1 IHC deficiency, and MLH1 methylation revealed two additional markers highly associated with the MSI-H CRC subtype. This study uses a unique combination of genetic epidemiology and functional strategies to identify and characterize alleles that play a role in modifying CRC development in an important subgroup of cases.


Figure 4.1 Proposed model for genetic susceptibility to DNA methylation in sporadic MSI-H CRCs. Specific SNPs predispose the region, including the MLH1 gene promoter, to methylation, which results in promoter silencing and loss of MLH1 gene expression that is measured by immunohistochemical staining. Loss of MLH1 gene expression leads to genome-wide microsatellite instability and colorectal cancer.


4.3 MATERIALS AND METHODS

4.3.1 SNP Selection Criteria

The polymorphisms analyzed by 5’ nuclease assay in this study were selected on the basis of extensive database and literature searches as described previously (341, 372). The 500kb region of chromosome 3 surrounding MLH1 was genotyped for all available polymorphisms from a combination of Affymetrix GeneChip Human Mapping 100K and 500K platforms. In addition, we selected SNPs in the region of interest that are in strong LD with rs1800734 in the

HapMap data (release 27 in CEU population), publicly available at www.hapmap.org. Two such

SNPs were identified and were also included.

4.3.2 Study Subjects

We conducted this study with subjects from three different locations: the province of

Ontario, the province of Newfoundland and Labrador (hereafter referred to as Newfoundland), and Seattle. CRC patients and unaffected controls from Ontario and Newfoundland were accrued as described previously (341, 372). Briefly, for Ontario, 1004 CRC patients and 1947 controls were identified by the Ontario Familial Colorectal Cancer Registry (313). In order to minimize the potential for population stratification, we excluded from the analyses case patients who were non-white and those who did not report ethnicity. Of the 1004 cases, 929 were white. Further, we excluded all CRC patients with known MMR germline gene mutations (11 cases with a known mutation in MLH1, 10 in MSH2, and one in MSH6) and all CRC cases that were deficient in one of the MMR proteins, other than MLH1 (14 MSH2/MSH6 IHC deficient tumours). 901 CRC patients remained and constitute the Ontario case patients. All patient information as well as blood and tissue specimens was obtained as described previously.


A total of 1957 control subjects from Ontario agreed to participate in the study and completed all three questionnaires (family, personal, and diet questionnaires). Of the 1957, 1314 control subjects provided blood samples, and 1098 of them were white. These 1098 control subjects were successfully genotyped and thus constituted the Ontario control subjects.

The accrual pattern followed by the Newfoundland Familial Colorectal Cancer Registry was similar to that followed by the Ontario Familial Colorectal Cancer Registry. Cases with colorectal cancer who were younger than age 75 years and diagnosed between January 1, 1999, and December 31, 2003, were identified through the Newfoundland tumour registry. Of the 1144 patients with CRC identified, 747 responded to the family history questionnaire and 555 provided blood samples; 490 provided ethnicity information and were classified as white. Four cases with known germline mutations in MSH2 were excluded, as were 11 MMR IHC deficient cases (5 for

MSH2, 5 for MSH6, and one for PMS2 deficiency). The remaining 479 patients constitute the

Newfoundland case patients.

Newfoundland controls were recruited using random digit dialling, and matched to case patients by sex and 5-year age group; 1602 controls agreed to participate, of whom 336, to this point, completed all three questionnaires, provided blood samples, and were white.

For Seattle, cases and controls were recruited by the Fred Hutchinson Cancer Research

Center (FHCRC) as described previously (373). Briefly, CRC patients were diagnosed in

Washington’s King, Snohomish, or Pierce Counties between January 1998 and June 2002 with incident CRC between the ages of 20 and 74 years. All CRC cases were included (no family history eligibility criteria) as well as their first-degree relatives. Of the 1814 case patients and

1531 control subjects who completed all three questionnaires, 1497 case patients and 745 control subjects donated a blood sample. For this study, we obtained DNA samples for 668 CRC case

patients and 667 control subjects. Fifteen MMR IHC deficient cases were excluded (10 for

MSH2, one for MSH6, and 4 for PMS2 deficiency).

Mean age was calculated for control subjects based on the date of completion of the family history questionnaire, and for cases based on the age at diagnosis. Data were collected on tumour microsatellite instability (MSI) status, tumour location, tumour stage, and tumour grade, when available, through review of pathologic and/or surgical reports. Tumours were staged and graded according to the method of the American Joint Committee on Cancer (45).

4.3.3 Molecular Genetic Analysis

4.3.3.1 SNP Genotyping

Peripheral blood lymphocytes were isolated from whole blood by use of Ficoll–Paque gradient centrifugation according to the manufacturer’s protocol (Amersham Biosciences, Baie d’Urfé, Quebec, Canada). Phenol–chloroform or the Qiagen DNA extraction kit (Qiagen Inc.,

Montgomery Co., MD) was used to extract genomic DNA from lymphocytes. The fluorogenic 5’ nuclease polymerase chain reaction (PCR) assay, also known as the TaqMan assay (315), was used to genotype each of the following six SNPs: MLH1-93G>A (rs1800734), I219V (rs1799977),

IVS14-19A>G (rs9876116), LRRFIP2 intron 26 IVS26-18T>C (rs749072), LBA1 intron 8

(rs4431050), and intergenic rs13098279. Primers and probes were designed with Primer Express version 2.0 software (Applied Biosystems, Foster City, CA). Sequences of primers and probes as well as the master reaction mixtures for MLH1-93G>A (rs1800734), I219V (rs1799977), and

IVS14-19A>G (rs9876116) were described previously (341). The LRRFIP2 rs749072, LBA1 rs4431050, and intergenic rs13098279 polymorphisms were genotyped by use of the Eurogentec qtPCR kit (Eurogentec, San Diego, CA). The master reaction mixture contained (final

concentrations) 0.72 mM primers and 0.16 mM probes (both VIC and FAM labeled). Assays for these polymorphisms used the following conditions: 2 minutes at 50°C, 10 minutes at 95°C

(AmpliTaq Gold activation), 15 seconds at 92°C (DNA denaturation), and 1 minute at 58.5°C

(primer–probe annealing and primer extension). The denaturation–annealing–extension sequence

(15 seconds at 92°C and 1 minute at 58.5°C) was repeated for 39 cycles. The rs749072, rs4431050, and rs13098279 assays used 12.5 µL of the master reaction mixture and 1 µL of

DNA template (at 2–7.5 ng/µL). Primer and probe sequences used for genotyping rs749072, rs4431050, and rs13098279 are listed in Table 4.1.

All genotyping assays using 5’ nuclease assay were conducted in 96-well polypropylene plates (Axygen Scientific, Union City, CA), and the results were analyzed with the Applied

Biosystems 7900HT Sequence Detection system and the accompanying software—SDS versions

2.0 and/or 2.1 (Applied Biosystems, Foster City, CA).


Table 4.1 Sequences of primers and probes

Gene/SNP                        5’ to 3’ Primer/Probe Sequences

LRRFIP2 IVS26-18T>C rs749072    F – ACCAAAAAGTGGTGACTTCTAGTGC
                                R – CTCAAGCCGGCTAATCTGTAAGTAT
                                FAM – ACTAGAGGCCTATGTTCT-MGBNFQ
                                VIC – TACTAGAGGCCTGTGTTCT-MGBNFQ

rs13098279                      F – TGCAGCTTGAGTTGACAAATGAAA
                                R – TAAATCCTGCTTCATCAGCTGAGG
                                FAM – TAAGACATCTCACCATAAGGA-MGBNFQ
                                VIC – ACATAAGACATCTTACCATAAG-MGBNFQ

LBA1 rs4431050 IVS7+477G>A      F – TGGAGTACACGTACTGGAGGTAGA
                                R – CCCAGGGATCTGGATGTGTT
                                FAM – TGATACCATCTCTATCCCC-MGBNFQ
                                VIC – TGATACCATCTTTATCCCC-MGBNFQ

LRRFIP2 MS-PCR                  FM – TTTCGTAGTTTCGTGGGATTC
                                RM – CAACGCGATAAAAAAATAAACGT
                                FU – GGTTTTTGTAGTTTTGTGGGATTT
                                RU – CAACACAATAAAAAAATAAACATT

MLH1 MethyLight                 F – AGGAAGAGCGGATAGCGATTT
                                R – TCTTCGTCCCTCCCTAAAACG
                                FAM – CCCGCTACCTAAAAAAATATACGCTTACGCG-BHQ-1

Alu-C4 MethyLight               F – GGTTAGGTATAGTGGTTTATATTTGTAATTTTAGTA
                                R – ATTAACTAAACTAATCTTAAACTCCTAACCTCA
                                FAM – CCTACCTTAACCTCCC-MGBNFQ

F = forward primer; R = reverse primer; FAM = wild type allele probe; VIC = variant allele probe; MGBNFQ = minor groove binder non-fluorescent quencher; FM = methylated forward primer; RM = methylated reverse primer; FU = unmethylated forward primer; RU = unmethylated reverse primer; BHQ-1 = black hole quencher-1.


SNPs located in the 500kb region of chromosome 3 surrounding the MLH1 gene were genotyped using the Affymetrix GeneChip Human Mapping 100K and 500K platforms as a part of the Assessment of Risk of Colorectal Tumours in Canada (ARCTIC) project, described previously (265). An additional 96 SNPs in the 500kb region were genotyped for the Ontario samples spanning the following genes: DCLK3, LBA1, EPM2AIP1, MLH1, LRRFIP2, and

GOLGA4. The list of SNPs genotyped for the Ontario samples is provided in Supplementary

Table 1. The Newfoundland and Seattle samples were genotyped using the Illumina iSelect

500K Chip platform. A total of 16 SNPs in this region were genotyped including rs1800734, rs749072, and rs13098279. The Newfoundland samples were further characterized for three polymorphisms: the I219V (rs1799977) and IVS14-19A>G (rs9876116) genotyped previously

(341), and LBA1 rs4431050. The rs1800734 SNP was genotyped both by the Affymetrix Chips and TaqMan platforms and was used to validate genotyping calls.

SNPs were excluded from the data analysis if the minor allele frequency was less than 1% and the call rate was less than 87% in each of the three collection centres. Additionally, SNPs were excluded if the p-value from a test for Hardy-Weinberg equilibrium was less than 10^-4 in the controls. Individuals were excluded if the genotyping call rate was less than 87%.
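The quality-control filters above can be illustrated with a minimal sketch. The function names and the example genotype counts are hypothetical, and the sketch assumes that a SNP failing any single filter is excluded, which is one possible reading of the criteria. The Hardy-Weinberg test shown is the 1-degree-of-freedom chi-square test; the source does not state which test was used.

```python
import math

def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Minor allele frequency from counts of the three genotypes."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    return min(p, 1.0 - p)

def hwe_p_value(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))

def snp_passes_qc(all_counts, control_counts, n_attempted,
                  maf_min=0.01, call_min=0.87, hwe_min=1e-4):
    """Apply the MAF, call-rate and HWE thresholds quoted in the text.

    Assumption: failing any one threshold excludes the SNP.
    """
    call_rate = sum(all_counts) / n_attempted
    if call_rate < call_min:
        return False
    if minor_allele_frequency(*all_counts) < maf_min:
        return False
    return hwe_p_value(*control_counts) >= hwe_min

# Hypothetical genotype counts (AA, AB, BB), for illustration only
print(snp_passes_qc((25, 50, 25), (25, 50, 25), n_attempted=100))
```

The thresholds are passed as keyword arguments so that the same check can be rerun per collection centre with different cut-offs.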

4.3.3.2 Tumour Microsatellite Instability Analysis

Tumour MSI analysis was performed as described previously (321). Briefly, paraffin-embedded colorectal tumour tissue from patients with incident cases of colorectal cancer and paraffin-embedded normal colorectal tissue from the same patients were microdissected in areas with more than 70% cellularity in tumour and normal cell populations, respectively. MSI analysis was carried out with five or more microsatellite markers from the panel of 10 microsatellite markers, as recommended by the National Cancer Institute: mononucleotides

BAT-25, BAT-26, BAT-40, and BAT-34C4; dinucleotides D2S123, D5S346, ACTC, D18S55, and D10S197; and one penta-mono-tetra compound marker, MYC-L (65). MSI was indicated by the presence of altered or additional bands of the PCR-amplified product from the tumour tissue, compared with the bands from matched normal colon tissue. MSI status was assigned as MSI high (MSI-H, ≥30% unstable markers among all markers tested), MSI low (MSI-L, <30% markers unstable), or microsatellite stable (MSS, no unstable markers) as described by the NCI recommended guidelines for MSI testing (65). For the analysis, MSI-L and MSS groups were combined into one group (hereafter referred to as “MSS/L”). Primers were obtained from

Applied Biosystems (Foster City, CA), and primer sequences were described previously (341).

All microsatellite instability assays were conducted in 96-well polypropylene plates

(Axygen Scientific, Union City, CA), and the results were analyzed with the Applied Biosystems

3130xl DNA Analyzer system and the accompanying software—GeneMapper version 3.7 for

Microsatellite Instability (Applied Biosystems, Foster City, CA).
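The MSI classification rule described above can be written as a short sketch. The function names are hypothetical; the thresholds (at least five markers tested, ≥30% unstable for MSI-H) and the pooling of MSI-L with MSS are taken from the text.

```python
def classify_msi(n_unstable, n_tested):
    """Assign MSI status from the number of unstable markers among those tested."""
    if n_tested < 5:
        raise ValueError("at least five markers must be tested")
    if not 0 <= n_unstable <= n_tested:
        raise ValueError("n_unstable must lie between 0 and n_tested")
    if n_unstable == 0:
        return "MSS"
    # Integer comparison avoids floating-point issues at exactly 30%
    if 10 * n_unstable >= 3 * n_tested:
        return "MSI-H"
    return "MSI-L"

def analysis_group(status):
    """Collapse MSI-L and MSS into the combined 'MSS/L' group used in the analysis."""
    return "MSI-H" if status == "MSI-H" else "MSS/L"

print(classify_msi(3, 10), analysis_group(classify_msi(1, 5)))
```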

4.3.3.3 DNA MLH1 Promoter Methylation Analysis

MLH1 promoter methylation was analyzed using MethyLight (374, 375). Tumour DNA from the available cases was subjected to sodium-bisulphite conversion using the EZ DNA

Methylation Gold Kit (Zymo Research, Orange, CA) per manufacturer’s recommendations.

Briefly, 50 ng of tumour DNA was mixed with the CT Conversion Reagent and incubated at

98°C for 10 minutes followed by 64°C for 2.5 hours. The samples were then mixed with the M-

Binding buffer and centrifuged through Zymo-Spin IC Column. DNA was subsequently washed

with the M-Wash buffer and treated with the M-Desulphonation buffer for 20 minutes. DNA was further washed twice with the M-Wash buffer and eluted with M-Elution buffer.

MethyLight analysis of the MLH1 promoter was performed as previously described

(375). The Alu-C4 control reaction was used to normalize for sodium-bisulphite-converted input

DNA (375). The samples were classified as positive for MLH1 promoter methylation if percent methylated reference (PMR) ≥ 10 as described previously (375). The primer and probe sequences for the MLH1 and Alu-C4 as well as the real-time PCR program for MethyLight analysis have been previously reported (375). All assays were run in 96-well polypropylene plates (Axygen Scientific, Union City, CA) and the results were analyzed using the ABI 7500HT

Real-Time PCR instrument and the accompanying software, SDS version 2.2 (Applied

Biosystems, Foster City, CA). Independent quality control for the MLH1 promoter methylation analysis was performed on 15% of Ontario samples.
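For orientation, the PMR classification can be sketched as follows. The formula is the definition commonly used in the MethyLight literature, in which the MLH1/Alu-C4 ratio in the sample is divided by the same ratio in a fully methylated reference DNA; the text above does not spell it out, so it should be read as an assumption. The input quantities and function names are hypothetical.

```python
def percent_methylated_reference(gene_sample, alu_sample, gene_ref, alu_ref):
    """PMR = 100 * (gene/Alu in sample) / (gene/Alu in methylated reference)."""
    return 100.0 * (gene_sample / alu_sample) / (gene_ref / alu_ref)

def is_mlh1_methylated(pmr, cutoff=10.0):
    """Positive for MLH1 promoter methylation if PMR >= 10, as stated in the text."""
    return pmr >= cutoff

# Hypothetical real-time PCR quantities, for illustration only
pmr = percent_methylated_reference(50.0, 1000.0, 200.0, 1000.0)
print(round(pmr, 1), is_mlh1_methylated(pmr))
```

Normalizing by the Alu-C4 reaction corrects for the amount of bisulphite-converted input DNA, so samples with different DNA yields remain comparable.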

4.3.3.4 MMR Protein Immunohistochemical Staining Analysis

Formalin-fixed, paraffin-embedded tissues sectioned at 4µm were deparaffinized and rehydrated with alcohol and xylene for immunohistochemical analysis of MLH1 as described previously (340, 376). Following rehydration, the slides were placed into either a pressure cooker or microwave antigen retrieval medium (10 mmol/L citrate buffer at pH 6.0 for 3 minutes at

115°C in microMED T/T Mega; Hacker Instruments & Industries, Inc., Fairfield, NJ). Protein blocker (20%) with avidin was used to prevent nonspecific binding (Signet Laboratories, Inc,

Dedham, MA). After the slides were washed in PBS, the sections were incubated with mouse antibody against MLH1 (1:40; G168-728, PharMingen, San Diego, CA), MSH2 (1:100; FE 11,

Oncogene Research Products, Cambridge, MA), MSH6 (1:100; 44, BD Transduction


Laboratories, Mississauga, Ontario, Canada), or PMS2 (1:50; BD Biosciences PharMingen,

Mississauga, Ontario, Canada) for 1 hour. The antibodies were then detected using avidin-biotin; 3,3’-diaminobenzidine tetrachloride was used as the chromogen and hematoxylin for counterstaining.

4.3.3.5 LRRFIP2 Methylation Specific PCR (MS-PCR)

The methylation specific PCR reaction contained 10X Taq Buffer, 1.5 mM MgCl2, 0.2 mM dNTPs, 1 mM primers (specific for methylated or nonmethylated template), 10 ng of sodium-bisulphite converted template DNA, and 1 unit platinum Taq DNA polymerase

(Invitrogen Life Technologies, Burlington, Ontario, Canada). The PCR conditions were as follows: initial denaturation (95°C for 4 minutes), denaturation (95°C for 30 seconds), annealing (58°C for 45 seconds), extension (72°C for 1 minute), and final extension (72°C for 5 minutes). The denaturation-annealing-extension steps were repeated 40 times. Following all MS-PCR reactions, DNA products were visualized by gel electrophoresis on 2% agarose gel (Invitrogen

Life Technologies). The sequences of the forward and reverse primers are listed in Table 4.1.
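The principle behind primers specific for methylated or unmethylated template can be illustrated with an in-silico bisulphite conversion. This is a simplified sketch, not the primer design procedure used in this study: it assumes complete conversion, considers one strand only, and treats all CpG sites as uniformly methylated or unmethylated. The function name and example sequence are hypothetical.

```python
def bisulfite_convert(seq, cpg_methylated):
    """Convert a DNA strand as sodium bisulphite would (read after PCR).

    Unmethylated cytosines are read as T; cytosines in a CpG context are
    retained as C only when cpg_methylated is True.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if in_cpg and cpg_methylated else "T")
        else:
            converted.append(base)
    return "".join(converted)

example = "ACGTCCG"  # hypothetical sequence
print(bisulfite_convert(example, cpg_methylated=True))
print(bisulfite_convert(example, cpg_methylated=False))
```

The same logic is visible in Table 4.1, where the methylated-specific LRRFIP2 primers retain CG dinucleotides at positions where the unmethylated-specific primers carry TG.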

4.3.4 Statistical Methods

Association with each SNP was assessed using logistic regression for each of the outcomes: colon cancer, methylation, IHC, and MSI, using an additive coding of genotypes for each SNP. Sex and age at exam were used as covariates when colon cancer was the outcome, whereas sex and age at diagnosis were used in models with the other outcomes. Analysis of separate models for the three collection sites and the combined dataset was undertaken; in the analysis of the combined data, site was included as a covariate.
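The additive genotype coding and the per-allele odds ratio can be sketched with a small Newton-Raphson fit. This is an illustration only: the analyses reported here were run in R and included sex, age and (for combined data) site as covariates, all of which are omitted from the sketch. The function names and the toy data are hypothetical.

```python
import math

def additive_code(genotype, allele):
    """Number of copies (0, 1 or 2) of `allele` in a two-letter genotype."""
    return genotype.count(allele)

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b0, b1, se_b1); exp(b1) is the per-allele odds ratio.
    """
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                 # score for intercept
            g1 += (yi - p) * xi          # score for slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)         # from the inverse information matrix
    return b0, b1, se_b1

# Toy data: 0 or 1 copies of the variant allele; y = 1 for the outcome of interest
x = [0] * 60 + [1] * 50
y = [1] * 20 + [0] * 40 + [1] * 30 + [0] * 20
b0, b1, se = fit_logistic(x, y)
odds_ratio = math.exp(b1)
ci = (math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se))
print(additive_code("AG", "A"), round(odds_ratio, 3), [round(v, 3) for v in ci])
```

With a single binary predictor the fitted odds ratio equals the cross-product ratio of the 2x2 table, which gives a convenient check on the implementation.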


Multiple logistic regression models were also evaluated in the subset of the data in which there were no missing values for all of the variables. MSI status was regressed on combinations of IHC, methylation, and SNP, for each of three SNPs that showed associations in the initial logistic regression models. Due to the strong association between MSI, IHC and methylation and nearly complete separation, maximum penalized likelihood was used to produce finite parameter estimates (377). Sex and age at diagnosis were used as covariates, as well as collection site in the combined analysis. All statistical analyses were performed with R 2.7.0 (http://www.R-project.org). All haplotype figures were generated using Haploview software

(http://www.broadinstitute.org/mpg/haploview).

4.4 RESULTS

We genotyped the 901 case patients and 1097 control subjects from Ontario for 101 SNPs in a 500kb region of chromosome 3 surrounding the MLH1 gene (Figure 4.2). We then screened the Newfoundland (479 case patients and 336 control subjects) and Seattle (591 case patients and

629 control subjects) samples for 19 and 16 SNPs of interest, respectively. Tumour microsatellite instability was evaluated for 744 Ontario, 463 Newfoundland, and 487 Seattle cases. MLH1 immunohistochemical (IHC) staining was undertaken on 709 Ontario, 462 Newfoundland, and

517 Seattle cases, and MLH1 promoter methylation analysis was performed on 569 Ontario, 468

Newfoundland, and 210 Seattle cases. Characteristics of all three study populations are summarized in Table 4.2.


Table 4.2 Characteristics of three study populations

                                            Ontario        Seattle        Newfoundland

All Subjects
Controls
  No.                                       1097           629            336
  Percent male                              56             41             55
  Age at exam, y – mean (sd)                64.3 (8.6)     60.6 (10.2)    60.2 (8.6)
Cases
  No.                                       901            591            479
  Percent male                              53             40             62
  Age at exam, y – mean (sd)                61.6 (9.0)     60.1 (10.2)    62.3 (9.1)
  Age at diagnosis, y – mean (sd)           60.7 (9.0)     60.1 (10.2)    60.9 (8.9)
Tumour Histological Grade, No.              719            541            417
  1 – Well differentiated – No. (%)         79 (11.0)      46 (8.5)       58 (13.9)
  2 – Moderately differentiated – No. (%)   552 (76.8)     374 (69.1)     324 (77.7)
  3 – Poorly differentiated – No. (%)       88 (12.2)      121 (22.3)     35 (8.4)
Tumour TNM Stage, No.                       751            499            Na
  Stage I – No. (%)                         172 (22.9)     150 (30.1)     Na
  Stage II – No. (%)                        291 (38.7)     140 (28.1)     Na
  Stage III – No. (%)                       241 (32.1)     167 (33.5)     Na
  Stage IV – No. (%)                        47 (6.3)       42 (8.4)       Na
Tumour MSI status, No.                      744            487            463
  MSI-high – No. (%)                        90 (12.1)      75 (15.4)      40 (8.6)
  MSI-low – No. (%)                         3 (0.4)        48 (9.9)       23 (5.0)
  MSI-stable – No. (%)                      651 (87.5)     364 (74.7)     400 (86.4)
MLH1 IHC status, No.                        709            517            462
  IHC present – No. (%)                     635 (89.6)     447 (86.5)     428 (92.6)
  IHC deficient – No. (%)                   74 (10.4)      70 (13.5)      34 (7.4)
MLH1 promoter methylation status, No.       569            210            468
  Methylation positive – No. (%)            62 (10.9)      58 (27.6)      25 (5.3)
  Methylation negative – No. (%)            507 (89.1)     152 (72.4)     443 (94.7)

Subjects With No Missing Data
Controls
  No.                                       1097           628            330
  Percent male                              56             41             55
  Age at exam, y – mean (sd)                64.3 (8.6)     60.6 (10.2)    60.2 (8.6)
Cases
  No.                                       526            193            457
  Percent male                              52             32             62
  Age at diagnosis, y – mean (sd)           60.9 (8.7)     60.5 (10.2)    61.0 (8.9)
Tumour Histological Grade, No.              471            188            402
  1 – Well differentiated – No. (%)         49 (10.4)      14 (7.4)       57 (14.2)
  2 – Moderately differentiated – No. (%)   363 (77.1)     122 (64.9)     310 (77.1)
  3 – Poorly differentiated – No. (%)       59 (12.5)      52 (27.7)      35 (8.7)
Tumour TNM Stage, No.                       488            179            Na
  Stage I – No. (%)                         105 (22.1)     41 (22.9)      Na
  Stage II – No. (%)                        194 (39.8)     59 (33.0)      Na
  Stage III – No. (%)                       161 (33.0)     70 (39.1)      Na
  Stage IV – No. (%)                        28 (5.7)       9 (5.0)        Na
Tumour MSI status, No.                      526            193            457
  MSI-high – No. (%)                        71 (13.5)      67 (34.7)      40 (8.8)
  MSI-low – No. (%)                         1 (0.2)        41 (21.2)      22 (4.8)
  MSI-stable – No. (%)                      454 (86.3)     85 (44.0)      395 (86.4)
MLH1 IHC status, No.                        526            193            457
  IHC present – No. (%)                     464 (88.2)     131 (67.9)     423 (92.6)
  IHC deficient – No. (%)                   62 (11.8)      62 (32.1)      34 (7.4)
MLH1 promoter methylation status, No.       526            193            457
  Methylation positive – No. (%)            54 (10.3)      56 (29.0)      25 (5.5)
  Methylation negative – No. (%)            472 (89.7)     137 (71.0)     432 (94.5)

Na = not available, y = year, sd = standard deviation


Figure 4.2 500kb region of chromosome 3 examined in Ontario samples. A total of 101 polymorphisms (99 unique polymorphisms) were examined in a 500kb region of chromosome 3 surrounding the MLH1 gene for Ontario samples. Genes in this region are outlined (above) along with their transcriptional directionality (below). The three polymorphisms of interest are indicated.


We first tested for association between each SNP and the risk of CRC, MSI-H CRCs,

MLH1 IHC-deficient CRCs, and with MLH1 promoter methylation in CRCs (Table 4.3). Three

SNPs were statistically significantly associated with increased risk of CRC in Ontario: rs931913

(P = 0.001), rs4624519 (P = 0.005), and rs4431050 (P = 0.001). In Newfoundland, rs4431050 was not associated with CRC risk (P = 0.79).

Three additional SNPs were statistically significantly associated with increased risk of MSI-H CRCs, MLH1 IHC deficient CRCs, and with MLH1 promoter methylated CRCs in Ontario (for rs1800734 P = 0.005, P = 0.04, and P = 0.018 respectively; for rs749072 P = 3.0 × 10^-4, P = 0.011, and P = 0.003 respectively; and for rs13098279 P = 0.017, P = 0.090, and P = 0.037 respectively; Table 4.4 and Table 4.5). We replicated these findings in the two other populations: for rs1800734 in Newfoundland, P = 8.53 × 10^-5, P = 1.92 × 10^-5, and P = 8.95 × 10^-7 for MSI-H, MLH1 IHC-deficiency, and MLH1 promoter methylation respectively and, for Seattle, P = 0.08, P = 0.02, and P = 0.04 respectively; for rs749072 in Newfoundland, P = 0.001, P = 2.4 × 10^-4, and P = 6.65 × 10^-6 respectively and, for Seattle, P = 0.03, P = 0.004, and P = 0.014 respectively; for rs13098279 in Newfoundland, P = 4.5 × 10^-4, P = 4.30 × 10^-5, and P = 1.98 × 10^-6 respectively and, for Seattle, P = 0.24, P = 0.07, and P = 0.14 respectively (see Table 4.6 and Table 4.7). None of the three latter SNPs were significantly associated with overall risk of CRC in the three populations studied. These three SNPs span a 197.5-kb region with rs1800734 located in the MLH1 promoter, 93 nucleotides upstream of the translational start site; rs749072 located in intron 26 of the LRRFIP2 gene (IVS26-18T>C); and rs13098279 located between the LRRFIP2 and GOLGA4 genes (Figure 4.2). All three SNPs are in strong linkage disequilibrium in the Ontario control subjects (pairwise r^2 > 0.73, D’ > 0.98). Haplotype maps, r^2 and D’, of all the SNPs examined in Ontario samples are shown in Figures 4.3 and 4.4.
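The two linkage disequilibrium measures reported above can be computed from haplotype and allele frequencies as in the following sketch. The frequencies are hypothetical and are assumed to be already estimated; in practice haplotype frequencies must be inferred from unphased genotypes, for example by the Haploview software used for the figures. The function name is also hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2

# Hypothetical case: allele A (20%) occurs only on haplotypes carrying B (25%)
d, d_prime, r2 = ld_stats(p_ab=0.20, p_a=0.20, p_b=0.25)
print(round(d_prime, 3), round(r2, 3))
```

The example shows why D’ can be close to 1 while r^2 is noticeably lower: D’ reaches 1 whenever one of the four possible haplotypes is absent, whereas r^2 reaches 1 only when the two alleles also have equal frequencies.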


Table 4.3 Polymorphisms associated with an overall increased risk of CRC in Ontario with attempted replication of rs4431050 in Newfoundland

Polymorphism               Position   Gene            Sample No.   Common Allele   P-value   OR      95% CI
rs931913                   36818153   Intergenic      1642         G               0.001     1.286   1.107 to 1.494
rs4624519                  36837984   Intergenic      1640         C               0.005     1.240   1.069 to 1.439
rs4431050 (Ontario)        36855688   LBA1 intron 8   1329         G               0.001     1.554   1.186 to 2.037
rs4431050 (Newfoundland)   36855688   LBA1 intron 8   692          G               0.787     0.951   0.662 to 1.366

Analyses are adjusted for age and sex. OR = odds ratio, CI = confidence interval.
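As a reading aid for the tables, the relationship between an odds ratio, its 95% confidence interval and the P-value can be checked with the sketch below. It assumes the tabulated intervals are Wald intervals, symmetric on the log-odds scale, which the tables do not state explicitly; the function name is hypothetical.

```python
import math

def wald_from_or_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover beta, its standard error, the Wald z and a two-sided P-value."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal P-value
    return beta, se, z, p

# rs931913 in Ontario (Table 4.3): OR 1.286, 95% CI 1.107 to 1.494
beta, se, z, p = wald_from_or_ci(1.286, 1.107, 1.494)
print(round(z, 2), round(p, 4))
```

Applied to the rs931913 row, the recovered P-value is close to the tabulated 0.001, and the Newfoundland rs4431050 row likewise gives a value close to the tabulated 0.787.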


Table 4.4 Sequential analyses of Ontario samples with tumour MSI status, MLH1 IHC status and MLH1 promoter methylation status

Polymorphisms Associated with Tumour MSI Status
Polymorphism   Position   Gene                Sample Size   Common Allele   P-value   OR      95% CI
rs7646626      36823920   Intergenic          555           T               0.017     1.597   1.087 to 2.347
rs1800734      37009950   MLH1 promoter       744           G               0.005     1.664   1.164 to 2.381
rs749072       37071028   LRRFIP2 intron 26   744           T               <0.001    1.890   1.339 to 2.667
rs11720064     37076523   LRRFIP2 intron 23   556           G               0.078     0.702   0.473 to 1.041
rs336601       37159477   LRRFIP2 intron 2    557           G               0.045     0.620   0.288 to 0.989
rs13098279     37207462   Intergenic          744           G               0.017     1.555   1.083 to 2.237

Polymorphisms Associated with MLH1 IHC Deficiency
Polymorphism   Position   Gene                Sample Size   Common Allele   P-value   OR      95% CI
rs7646626      36823920   Intergenic          529           T               0.056     1.531   0.990 to 2.368
rs1800734      37009950   MLH1 promoter       709           G               0.040     1.509   1.019 to 2.235
rs9876420      37051463   MLH1 intron 13      457           G               0.024     2.700   1.140 to 6.391
rs749072       37071028   LRRFIP2 intron 26   709           T               0.011     1.623   1.115 to 2.360
rs11720064     37076523   LRRFIP2 intron 23   531           G               0.083     0.673   0.430 to 1.053
rs13098279     37207462   Intergenic          709           G               0.090     1.412   0.947 to 2.106

Polymorphisms Associated with MLH1 Promoter Methylation
Polymorphism   Position   Gene                Sample Size   Common Allele   P-value   OR      95% CI
rs7646626      36823920   Intergenic          414           T               0.021     1.767   1.088 to 2.874
rs1800734      37009950   MLH1 promoter       569           G               0.018     1.684   1.093 to 2.591
rs749072       37071028   LRRFIP2 intron 26   569           T               0.003     1.880   1.238 to 2.857
rs11720064     37076523   LRRFIP2 intron 23   415           G               0.039     0.586   0.353 to 0.973
rs7651033      37091046   LRRFIP2 intron 19   398           C               0.046     0.593   0.355 to 0.990
rs13098279     37207462   Intergenic          569           G               0.037     1.590   1.029 to 2.457


Table 4.5 Single marker analysis for 3 SNPs for CRC cases versus controls, MLH1 promoter methylation, MLH1 IHC staining and MSI tumour status in Ontario

Colon Cancer Cases vs. Controls
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               1998          0.476      0.946   0.813          1.100
rs749072     T               1998          0.679      0.970   0.839          1.121
rs13098279   G               1998          0.616      0.961   0.824          1.122

MLH1 Promoter Methylation
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               569           0.018      1.684   1.093          2.591
rs749072     T               569           0.003      1.883   1.238          2.857
rs13098279   G               569           0.037      1.590   1.029          2.457

MLH1 IHC Staining
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               709           0.040      1.509   1.019          2.235
rs749072     T               709           0.011      1.623   1.115          2.360
rs13098279   G               709           0.090      1.412   0.947          2.106

Tumour MSI Status
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               744           0.005      1.667   1.164          2.381
rs749072     T               744           2.90E-04   1.890   1.339          2.667
rs13098279   G               744           0.017      1.555   1.083          2.237

CRC cases versus controls analyses are adjusted for age and sex. OR = odds ratio, CI = confidence interval.


Table 4.6 Single marker analysis for 3 SNPs for CRC cases versus controls, MLH1 promoter methylation, MLH1 IHC staining and MSI tumour status in Newfoundland

Colon Cancer Cases vs. Controls
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               727           0.336      0.879   0.677          1.143
rs749072     T               714           0.377      0.893   0.694          1.148
rs13098279   G               714           0.282      0.865   0.661          1.128

MLH1 Promoter Methylation
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               462           8.95E-07   5.000   2.625          9.434
rs749072     T               427           6.65E-06   4.274   2.273          8.065
rs13098279   G               427           1.98E-06   4.854   2.532          9.346

MLH1 IHC Staining
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               456           1.92E-05   3.231   1.887          5.533
rs749072     T               421           2.40E-04   2.676   1.582          4.527
rs13098279   G               421           4.30E-05   3.085   1.800          5.292

Tumour MSI Status
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               457           8.53E-05   2.747   1.667          4.545
rs749072     T               423           0.001      2.278   1.393          3.717
rs13098279   G               423           4.50E-04   2.500   1.490          4.098

CRC cases versus controls analyses are adjusted for age and sex. OR = odds ratio, CI = confidence interval.


Table 4.7 Single marker analysis for 3 SNPs for CRC cases versus controls, MLH1 promoter methylation, MLH1 IHC staining and MSI tumour status in Seattle

Colon Cancer Cases vs. Controls
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               1198          0.222      0.887   0.731          1.075
rs749072     T               1200          0.117      0.862   0.716          1.038
rs13098279   G               1200          0.286      0.900   0.741          1.093

MLH1 Promoter Methylation
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               205           0.043      1.695   1.016          2.825
rs749072     T               206           0.014      1.848   1.131          3.012
rs13098279   G               206           0.137      1.477   0.883          2.500

MLH1 IHC Staining
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               509           0.018      1.660   1.093          2.522
rs749072     T               510           3.60E-03   1.823   1.217          2.731
rs13098279   G               510           0.070      1.479   0.969          2.257

Tumour MSI Status
Marker       Common Allele   Sample Size   P-value    OR      Lower 95% CI   Upper 95% CI
rs1800734    G               480           0.082      1.429   0.956          2.141
rs749072     T               481           0.029      1.541   1.046          2.268
rs13098279   G               481           0.241      1.279   0.847          1.927

CRC cases versus controls analyses are adjusted for age and sex. OR = odds ratio, CI = confidence interval.


Figure 4.3 D-prime map of Ontario samples. A total of 99 unique polymorphisms were genotyped in a 500kb region of chromosome 3 for Ontario samples. All polymorphisms are listed on the left (top to bottom) along with their chromosomal locations. Linkage disequilibrium (LD) of each polymorphism with every other polymorphism is compared using the D’ measure. The results are quantified and represented in colour-coded diamond shapes. D’ values range from 0 to 100, where 0 represents no LD between two markers and 100 represents complete LD. Grey diamonds represent no LD between two markers, while the degrees of LD are depicted in various shades of red. As the LD between two markers gets stronger, so does the shade of red along with the magnitude of the D’ value inside a diamond. Polymorphisms in LD are grouped together into haplotype blocks (red triangles), which are separated by white/grey areas where genetic recombination most likely occurs. MLH1 and LRRFIP2 genes are located in a single large block (top), with LBA1 located in a smaller block just below it, followed by DCLK3 immediately below LBA1.


Figure 4.4 R-squared map of Ontario samples. A total of 99 unique polymorphisms were genotyped in a 500kb region of chromosome 3 for Ontario samples. All polymorphisms are listed on the left (top to bottom) along with their chromosomal locations. Linkage disequilibrium (LD) of each polymorphism with every other polymorphism is compared using the r^2 measure. The results are quantified and represented in colour-coded diamond shapes. The r^2 values range from 0 to 100, where 0 represents no LD between two markers and 100 represents complete LD. The degree of LD is depicted in various shades of black. As the LD between two markers gets stronger, so does the shade of black along with the magnitude of the r^2 value inside a diamond. Polymorphisms in LD are grouped together into haplotype blocks (black triangles), which are separated by white areas where genetic recombination most likely occurs. MLH1 and LRRFIP2 genes are located in a single large block (top), with LBA1 located in a smaller block just below it. DCLK3 is not present in a block based on the r^2 values.


Combined analysis of all three populations revealed strong associations between rs749072 and increased risk of each of the following: MSI-H CRC (Pcombined = 2.50 × 10^-7, OR = 1.82, CI = 1.45-2.27); MLH1 IHC-deficient CRC (Pcombined = 3.99 × 10^-7, OR = 1.87, CI = 1.47-2.39); and MLH1-promoter-methylated CRC (Pcombined = 3.80 × 10^-6, OR = 2.22, CI = 1.67-2.94). Because the other two SNPs (rs1800734 and rs13098279) are in strong linkage disequilibrium with rs749072, analyses of these SNPs yielded similar results (Table 4.8).

Since an LRRFIP2 polymorphism, rs749072, showed associations with MLH1 promoter methylation, we evaluated a 5-kb region upstream of the LRRFIP2 start site for CpG islands. We identified a CpG island spanning a region from -2516 to -865 relative to the LRRFIP2 transcriptional start site (Figure 4.5). We then examined the presence of DNA methylation in the

LRRFIP2 promoter in a subset of CRCs consisting of 15 MSI-H, MLH1 methylated and 15

MSS, MLH1 non-methylated cases by MS-PCR. We did not detect LRRFIP2 promoter methylation in any of the CRCs examined. An example of LRRFIP2 MS-PCR is shown in Figure 4.6.
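The text above does not state which criteria or software were used to call the CpG island. As an illustration only, the sketch below applies the widely used Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content of at least 50%, observed/expected CpG ratio of at least 0.6); these thresholds, the function names and the toy sequences are assumptions rather than the procedure used in this study.

```python
def cpg_metrics(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")                 # CG cannot overlap itself
    gc_fraction = (c + g) / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp

def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Check a candidate region against the assumed island thresholds."""
    gc_fraction, obs_exp = cpg_metrics(seq)
    return len(seq) >= min_len and gc_fraction >= min_gc and obs_exp >= min_obs_exp

# Toy sequences, for illustration only
print(looks_like_cpg_island("CG" * 100), looks_like_cpg_island("AT" * 100))
```

In practice such a check is run in a sliding window across the region of interest, here the 5-kb upstream sequence, and adjacent qualifying windows are merged.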

In order to examine whether these SNPs acted through the pathway that we hypothesized

(Figure 4.1), we next created logistic regression models for MSI-H versus non-MSI-H CRCs for each study population. We sequentially added the predictors to these models, beginning with

MLH1 IHC status; then MLH1 promoter methylation status; then both MLH1 IHC status and

MLH1-promoter-methylation status; and finally both MLH1 IHC status and MLH1-promoter- methylation status along with each SNP (Tables 4.9 to 4.11). The MLH1 IHC status, MLH1 promoter methylation status, and the SNPs of interest were all strong predictors of the tumour

MSI-H status. The models that included MLH1 IHC status and MLH1-promoter-methylation status gave the smallest Akaike’s Information Criterion (AIC) value in Ontario (93.35) and this

175 the smallest AIC was replicated in other two samples (93.77 for Newfoundland, and 47.06 for

Seattle). Adding each of the three SNPs did not change the model parameters statistically significantly and resulted in the second best model fit (Tables 4.9 to 4.11).


Table 4.8 Single marker analysis in the combined data for 3 SNPs for CRC cases versus controls, MLH1 promoter methylation, MLH1 IHC staining and MSI tumour status

Colon Cancer Cases vs. Controls

Marker      Common Allele  Sample Size  P-value   OR     Lower 95% CI  Upper 95% CI
rs1800734   G              3923         0.108     0.915  0.822         1.020
rs749072    T              3912         0.102     0.918  0.828         1.017
rs13098279  G              3912         0.155     0.924  0.828         1.031

MLH1 Promoter Methylation

Marker      Common Allele  Sample Size  P-value   OR     Lower 95% CI  Upper 95% CI
rs1800734   G              1236         3.25E-07  2.128  1.592         2.841
rs749072    T              1202         3.80E-06  2.217  1.669         2.941
rs13098279  G              1202         5.80E-06  1.969  1.468         2.639

MLH1 IHC Staining

Marker      Common Allele  Sample Size  P-value   OR     Lower 95% CI  Upper 95% CI
rs1800734   G              1674         2.43E-06  1.823  1.420         2.339
rs749072    T              1640         3.99E-07  1.872  1.469         2.386
rs13098279  G              1640         4.71E-05  1.691  1.313         2.179

Tumour MSI Status

Marker      Common Allele  Sample Size  P-value   OR     Lower 95% CI  Upper 95% CI
rs1800734   G              1681         3.38E-06  1.742  1.389         2.203
rs749072    T              1648         2.50E-07  1.818  1.447         2.278
rs13098279  G              1648         1.15E-04  1.597  1.259         2.024

CRC cases versus controls analyses are adjusted for age, sex and site. OR = odds ratio, CI = confidence interval.


Figure 4.5 LRRFIP2 Promoter CpG Island. The LRRFIP2 promoter contains a CpG island, underlined in blue, spanning a region from -2516 to -865 base pairs away from the ATG start site. Vertical crimson lines delineate CpG dinucleotides.

Figure 4.6 Evaluation of LRRFIP2 promoter methylation by MS-PCR Methylated− and Unmethylated− = negative controls with no DNA template, M represents reactions with methylated primers, U represents reactions with unmethylated primers, Methylated+ and Unmethylated+ = positive controls with supermethylated DNA template


We subsequently evaluated similar models in the combined samples and the results were consistent with the individual samples. MLH1 IHC status alone was a strong predictor of MSI-H CRCs (P = 2.08 × 10^-30) as was the MLH1-promoter-methylation status (P = 1.33 × 10^-44) or the SNPs of interest (for rs1800734, P = 2.30 × 10^-4; for rs749072, P = 1.36 × 10^-5; and for rs13098279, P = 5.10 × 10^-3). The model with MLH1 IHC status and MLH1 promoter methylation status again gave the smallest AIC value (225.12) and addition of rs1800734 resulted in the next most parsimonious model (AIC = 227.05) (Table 4.12). In the combined model with the MLH1 IHC status and MLH1 promoter-methylation-status, both variables remained statistically significant and parameter estimates remained in the same direction; however, when the SNP of interest was added to this model, the SNP no longer remained statistically significant: for rs1800734, the P value changed from 2.30 × 10^-4 to 0.72; for rs749072, the P value changed from 1.36 × 10^-5 to 0.98; and for rs13098279, the P value changed from 0.005 to 0.29 (Table 4.12).
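The AIC comparison described above can be reproduced from the tabulated values. The sketch below ranks the rs1800734 model series using the AIC values from Table 4.12 and reports each model's difference from the best model (delta-AIC); the function and variable names are hypothetical. AIC is defined as 2k - 2 ln L, where k is the number of parameters and L the maximized likelihood, so adding an uninformative predictor raises the AIC by up to 2. Such comparisons strictly assume that all models were fitted to the same set of observations.

```python
# AIC values for the rs1800734 model series, transcribed from Table 4.12.
AIC_COMBINED = {
    "IHC": 238.72,
    "CH3": 470.64,
    "IHC + CH3": 225.12,
    "rs1800734": 890.89,
    "IHC + rs1800734": 240.83,
    "CH3 + rs1800734": 472.63,
    "IHC + CH3 + rs1800734": 227.05,
}


def rank_by_aic(aic_values):
    """Return (model, AIC, delta-AIC) tuples sorted from best to worst fit."""
    best = min(aic_values.values())
    ranked = sorted(aic_values.items(), key=lambda item: item[1])
    return [(name, aic, round(aic - best, 2)) for name, aic in ranked]


for name, aic, delta in rank_by_aic(AIC_COMBINED):
    print(f"{name:<24}{aic:>8.2f}{delta:>8.2f}")
```

In this series the model with IHC and CH3 ranks first, and adding rs1800734 increases the AIC by 1.93, which is close to the penalty of 2 expected for a predictor that adds no information.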

A complete list of single marker analyses performed by individual study population and of all three study populations combined along with the details of logistic regression models is provided in Appendices following Chapter 5.


Table 4.9 Multivariate logistic regression model results for MSI status for rs1800734, rs749072, and rs13098279 for Ontario

Covariate   Model No.  AIC     Parameter Estimate  Standard Error  P-value
IHC         4          96.71   8.70                1.44            1.63E-09
CH3         5          195.47  5.88                0.63            1.59E-20
IHC         6          95.35   7.16                1.40            3.50E-07
CH3                            2.99                1.15            0.0092
rs1800734   8          409.45  -0.28               0.21            0.18
IHC         9          98.67   8.62                1.42            1.21E-09
rs1800734                      -0.29               0.48            0.54
CH3         10         197.03  5.90                0.64            1.90E-20
rs1800734                      0.24                0.36            0.50
IHC         11         97.56   7.09                1.37            2.55E-07
CH3                            2.84                1.17            0.016
rs1800734                      -0.14               0.51            0.79
rs749072    13         405.82  -0.46               0.20            0.019
IHC         14         97.39   8.70                1.44            1.48E-09
rs749072                       -0.61               0.44            0.17
CH3         15         197.43  5.82                0.63            2.59E-20
rs749072                       -0.12               0.32            0.70
IHC         16         96.67   7.19                1.39            2.25E-07
CH3                            2.63                1.14            0.02
rs749072                       -0.51               0.46            0.26
rs13098279  18         410.76  -0.13               0.21            0.51
IHC         19         98.90   8.60                1.41            1.05E-09
rs13098279                     -0.04               0.51            0.93
CH3         20         195.44  6.00                0.65            3.52E-20
rs13098279                     0.53                0.39            0.17
IHC         21         97.39   7.04                1.37            3.18E-07
CH3                            3.06                1.21            0.01
rs13098279                     0.16                0.55            0.77

Age at diagnosis, sex, and location are covariates common to all the models described above. IHC refers to the MLH1 immunohistochemical staining variable, CH3 refers to the MLH1 promoter methylation variable, AIC = Akaike’s information criterion.


Table 4.10 Multivariate logistic regression model results for MSI status for rs1800734, rs749072, and rs13098279 for Newfoundland

Covariate   Model No.  AIC     Parameter Estimate  Standard Error  P-value
IHC         4          99.66   6.17                0.75            1.62E-16
CH3         5          159.38  5.32                0.76            2.03E-12
IHC         6          93.77   5.21                0.78            2.44E-11
CH3                            3.28                1.15            0.004
rs1800734   8          241.94  -1.01               0.26            8.07E-05
IHC         9          101.59  5.96                0.75            1.59E-15
rs1800734                      -0.30               0.48            0.54
CH3         10         161.35  5.16                0.79            6.23E-11
rs1800734                      -0.17               0.39            0.66
IHC         11         95.82   5.17                0.78            4.45E-11
CH3                            3.36                1.23            0.006
rs1800734                      0.12                0.52            0.82
rs749072    13         246.45  -0.83               0.25            8.80E-04
IHC         14         101.78  6.02                0.75            1.11E-15
rs749072                       -0.16               0.47            0.73
CH3         15         161.47  5.27                0.79            3.19E-20
rs749072                       -0.01               0.38            0.98
IHC         16         95.62   5.20                0.79            4.54E-11
CH3                            3.44                1.23            0.005
rs749072                       0.22                0.51            0.66
rs13098279  18         245.40  -0.89               0.25            4.70E-04
IHC         19         101.82  6.07                0.77            3.04E-15
rs13098279                     -0.01               0.50            0.99
CH3         20         161.47  5.26                0.80            3.67E-11
rs13098279                     -0.01               0.40            0.97
IHC         21         95.09   5.28                0.82            1.07E-10
CH3                            3.64                1.27            0.004
rs13098279                     0.42                0.56            0.45

Age at diagnosis, sex, and location are common parameters to all the models described above. IHC refers to the MLH1 immunohistochemical staining variable, CH3 refers to the MLH1 promoter methylation variable, AIC = Akaike’s information criterion.


Table 4.11 Multivariate logistic regression model results for MSI status for rs1800734, rs749072, and rs13098279 for Seattle

Covariate   Model No.  AIC     Parameter Estimate  Standard Error  P-value
IHC         4          47.06   8.75                1.64            9.41E-08
CH3         5          125.40  4.91                0.64            2.14E-14
IHC         6          47.06   7.23                1.57            4.41E-06
CH3                            2.32                1.12            0.04
rs1800734   8          244.21  -0.36               0.24            0.14
IHC         9          48.68   8.76                1.61            5.37E-08
rs1800734                      0.52                0.74            0.49
CH3         10         127.47  4.85                0.63            2.99E-14
rs1800734                      -0.05               0.39            0.89
IHC         11         48.77   7.28                1.58            4.17E-06
CH3                            2.22                1.10            0.04
rs1800734                      0.48                0.74            0.51
rs749072    13         241.81  -0.49               0.23            0.035
IHC         14         48.43   8.90                1.66            8.48E-08
rs749072                       0.63                0.74            0.40
CH3         15         127.33  4.28                0.64            4.23E-14
rs749072                       -0.16               0.37            0.67
IHC         16         48.58   7.38                1.62            5.00E-06
CH3                            2.19                1.10            0.047
rs749072                       0.58                0.74            0.44
rs13098279  18         245.23  -0.27               0.24            0.27
IHC         19         48.73   8.73                1.60            4.92E-08
rs13098279                     0.49                0.74            0.50
CH3         20         127.47  4.85                0.64            2.60E-14
rs13098279                     -0.06               0.39            0.89
IHC         21         48.80   7.27                1.57            4.03E-06
CH3                            2.23                1.10            0.044
rs13098279                     0.46                0.73            0.53

Age at diagnosis, sex, and location are common parameters to all the models described above. IHC refers to the MLH1 immunohistochemical staining variable, CH3 refers to the MLH1 promoter methylation variable, AIC = Akaike’s information criterion.


Table 4.12 Multivariate logistic regression model results for MSI status for rs1800734, rs749072, and rs13098279 in the combined data

Covariate   Model No.  AIC     Parameter Estimate  Standard Error  P-value
IHC         4          238.72  7.79                0.68            2.08E-30
CH3         5          470.64  5.56                0.40            1.33E-44
IHC         6          225.12  6.53                0.68            7.36E-22
CH3                            3.03                0.66            4.29E-06
rs1800734   8          890.89  -0.49               0.13            2.30E-04
IHC         9          240.83  7.74                0.68            2.31E-30
rs1800734                      -0.03               0.33            0.94
CH3         10         472.63  5.55                0.40            1.52E-43
rs1800734                      0.04                0.22            0.85
IHC         11         227.05  6.50                0.67            5.98E-22
CH3                            3.06                0.67            5.18E-06
rs1800734                      0.12                0.34            0.72
rs749072    13         885.46  -0.56               0.13            1.36E-05
IHC         14         240.72  7.72                0.67            2.70E-30
rs749072                       -0.13               0.31            0.68
CH3         15         472.56  5.52                0.40            2.24E-43
rs749072                       -0.08               0.21            0.70
IHC         16         227.23  6.48                0.67            5.64E-22
CH3                            3.01                0.66            5.72E-06
rs749072                       -0.01               0.32            0.98
rs13098279  18         896.57  -0.38               0.14            0.0051
IHC         19         240.41  7.80                0.68            2.73E-30
rs13098279                     0.21                0.35            0.55
CH3         20         471.93  5.60                0.40            1.23E-43
rs13098279                     0.19                0.23            0.41
IHC         21         225.94  6.55                0.68            1.04E-21
CH3                            3.17                0.69            3.74E-06
rs13098279                     0.39                0.37            0.29

Age at diagnosis, sex, and location are common parameters to all the models described above. IHC refers to the MLH1 immunohistochemical staining variable, CH3 refers to the MLH1 promoter methylation variable, AIC = Akaike’s information criterion.


4.5 DISCUSSION

This large-scale multi-centre study examined germline DNA markers and their contribution to somatic events in CRC. In three independent populations, three polymorphisms, rs1800734, rs749072, and rs13098279, were associated with increased DNA methylation in the chromosomal region and were also associated with gene silencing and microsatellite instability.

Although these three markers are not associated with an increase in the risk of CRC overall, they do play a role in colorectal tumourigenesis in the subset of CRCs that display genome-wide microsatellite instability. Among case patients in each individual study population and in an analysis of all three combined, statistically significant associations were observed between each of these three polymorphisms and MLH1 promoter methylation, MLH1 IHC deficiency, and MSI-H tumour status. In multivariate logistic regression models, each SNP was associated with tumour MSI-H status; however, once MLH1 IHC deficiency or MLH1 promoter methylation, or both, were included in the model, the SNP association was no longer statistically significant. These results indicate that the observed associations between these polymorphisms and MSI-H status occur through MLH1 methylation and subsequent gene silencing. Furthermore, when both IHC and methylation status were included in the model, MLH1 IHC status and MLH1 promoter methylation were both strongly associated with MSI-H status, indicating that these two events, while highly correlated, are not completely dependent on each other even after exclusion of all known germline MMR gene mutation carriers. A similar observation was reported previously, where MLH1 promoter methylation accounted for 80% of MLH1 IHC-deficient MSI-H CRCs after excluding all MLH1 germline mutation carriers (378). Other mechanisms must, then, be responsible for the remaining 20% of MLH1 IHC-deficient MSI-H CRCs. These may include somatic MMR gene mutations, loss of heterozygosity at an MMR gene locus, or possibly silencing of an MMR gene by as yet unidentified microRNAs.
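The reasoning that a SNP association which disappears after adjustment for an intermediate event acts through that event can be illustrated with a stratified 2x2 analysis. This is a simplified stand-in for the logistic regression models actually used in this chapter, and all counts below are hypothetical, constructed so that the variant allele doubles the frequency of methylation while MSI-H status depends only on methylation.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.
    a, b = MSI-H / non-MSI-H among variant carriers
    c, d = MSI-H / non-MSI-H among non-carriers"""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts (not study data), stratified by MLH1 promoter methylation.
methylated = (180, 20, 90, 10)      # 90% of methylated tumours are MSI-H in both groups
unmethylated = (40, 760, 45, 855)   # 5% of unmethylated tumours are MSI-H in both groups

# Collapsing the strata gives the crude (unadjusted) table: (220, 780, 135, 865).
crude = tuple(x + y for x, y in zip(methylated, unmethylated))

print(f"crude OR    = {odds_ratio(*crude):.2f}")
print(f"adjusted OR = {mantel_haenszel_or([methylated, unmethylated]):.2f}")
```

In this constructed example the crude odds ratio is about 1.8, whereas the methylation-adjusted odds ratio is 1.0, the pattern expected when the allele influences MSI-H status only through promoter methylation.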

The major agent used for the medical treatment of patients with advanced CRC, 5-Fluorouracil (5-FU), is recognized by the MMR system (87). 5-FU selectively kills cells with intact MMR, while MMR-deficient cells are resistant (87). Patients with stage II and III sporadic MSI CRC do not show a survival benefit following 5-FU therapy when compared with MSS CRC patients in retrospective and prospective studies (88-90). Indeed, 5-FU-based adjuvant chemotherapy might decrease overall and disease-free survival among MSI CRC patients (88). Similarly, stage III HNPCC patients do not show a 5-year survival benefit with 5-FU treatment over untreated patients (91). CRC is a heterogeneous disease and the three polymorphisms used in this study may serve as predictive markers in at-risk individuals for early identification of MSI and selection of optimal therapies.

In addition to colon cancer, the MLH1-93G>A polymorphism (rs1800734) is also associated with other cancers, including ovarian cancer (379), endometrial cancer (357, 358), and secondary tumours in Hodgkin lymphoma patients (359). More specifically, the MLH1-93G>A polymorphism was shown to be associated with MLH1 promoter methylation in endometrial cancers (358). Hodgkin lymphoma patients who carried the variant -93A allele were at higher risk of developing secondary tumours following methylating chemotherapy (359). In the colon, this polymorphism has been shown to increase the risk of hyperplastic polyps and adenomas in smokers (307) as well as the risk of MSI-H CRCs, alone or in combination with lifestyle factors (296). Furthermore, the MLH1-93G>A polymorphism is associated with CIMP-positive CRCs (which include MLH1 promoter methylation) (355) and with the loss of MLH1 gene expression (356), both of which are consistent with the hypothesis proposed and tested here.


One possible explanation of our previous finding that the MLH1-93G>A promoter polymorphism was associated with increased risk of MSI-H CRCs is that the association is caused by another functional MLH1 polymorphism in strong linkage disequilibrium (LD) with the MLH1-93G>A SNP (341). In this study, we identified two polymorphisms, rs749072 and rs13098279, that are in strong LD with the MLH1-93G>A SNP. However, neither of these two polymorphisms is located in the MLH1 gene; rs749072 is located in intron 26 of LRRFIP2 (leucine-rich repeat in Flightless interaction protein 2), 18 nucleotides from a splice acceptor site (IVS26-18T>C); rs13098279 is an intergenic polymorphism located between LRRFIP2 and GOLGA4 (golgi autoantigen, golgin subfamily a, 4). LRRFIP2 binds Dishevelled and serves as an activator of the Wnt signalling pathway, which is deregulated in ~85% of CRCs (380). In addition, LRRFIP2 also positively regulates Toll-like receptor (TLR) signalling as well as cytokine production in macrophages (381). LRRFIP2 splice variants were identified in colon and prostate cancers (382). Isoforms containing exons 5 and 6 predominate in normal colon, whereas a shorter isoform lacking both of these exons is found in adenomas and colon cancer samples. Decreased levels of the longer LRRFIP2 variant are also observed in metastatic prostate cancers (382). The spliced exons contain several potential phosphorylation sites that might influence protein function (382). The roles of the identified splice variants in tumourigenesis, as well as potential effects of rs749072 on splicing machinery, are still unclear.

We identified three additional polymorphisms, rs931913, rs4624519, and rs4431050, associated with an overall increased risk of CRC in the Ontario samples. Our attempt to replicate, in the Newfoundland population, the findings with rs4431050 (Lupus brain antigen 1, LBA1 IVS7+477G>A), which had demonstrated the strongest association in Ontario, was unsuccessful. It is possible that the association observed in Ontario occurred by chance, or that its effects are population specific and strongly influenced by environmental factors. We did not attempt to replicate the findings for rs931913 and rs4624519 in Newfoundland or Seattle.

Our study has several limitations, including the unavailability of some clinical data from our study subjects. Clinical and pathologic characteristics were not available for several reasons (e.g., tumour material not available for MSI, IHC, or methylation testing, technical difficulties, or death of the patient before tissue samples could be obtained). However, because the general clinical and pathologic characteristics of CRC in our whole case patient populations were similar to those of case patients with no missing data, our study was not limited by this potential source of bias. One exception was the methylation analysis of Seattle samples, which was mostly completed on MSI-H case patients.

Our study also has numerous strengths. The large sample size gave us high power and precision. In order to observe statistically significant associations of the same order of magnitude that we report here in a genome-wide association study design, we would require between 23,000 and 61,000 case patients and control subjects. A major strength of our study is the use of three independent population-based registries, Ontario, Newfoundland, and Seattle. Replication of our main findings in two additional independent populations provides strong evidence that our findings reflect real associations and are unlikely to have occurred by chance.

The important finding of this study is the identification of a genetic basis for DNA methylation susceptibility; it indicates that genetic variants may play an indirect role in increasing the risk of MSI-H colorectal cancer. Rather than directly affecting gene transcription, splicing machinery, or protein function, these alleles indirectly lead to gene silencing. Nonetheless, the exact mode of action of these polymorphisms is not clear. Perhaps they alter the binding sites of transcription factors and DNA-binding proteins that protect the DNA molecule from methylation. Inability of these protective proteins to bind DNA would expose DNA to the methylating machinery. Conversely, these polymorphisms may create binding sites for co-repressors, methylated DNA-binding proteins, or other proteins involved in epigenetic silencing that modify DNA and silence gene expression. Another possible mechanism involves the production of antisense RNA; it was shown recently that increased production of antisense RNA resulted in epigenetic silencing of the p15 tumour suppressor gene (383). That paper also listed a number of antisense RNAs identified in the MLH1 gene (383). The polymorphisms in this study may increase the production of antisense RNAs that result in epigenetic silencing of the corresponding sense-strand genes.

The fact that polymorphisms in genes other than MLH1 are associated with DNA methylation may indicate that the MLH1 promoter methylation observed in MSI-H colorectal cancers is not localized just to the MLH1 locus, but extends beyond the gene. Indeed, Hitchins et al. observed that, in MSI-H colorectal cancers, methylation is not limited to the MLH1 promoter region, but affects genes in a region as large as 2.4 megabase pairs (384). We may have identified, in a much smaller region, genetic markers of the predisposition to such epigenetic alterations and, because a mismatch repair gene, MLH1, is involved, microsatellite instability invariably develops. However, we cannot yet exclude the possibility that these markers tag some other unknown variant(s) that are the true cause of DNA susceptibility to methylation.

We identified a CpG island in the LRRFIP2 promoter spanning a region from -2516 to -865 relative to the transcriptional start site. However, we could not detect any methylation in a subset of MLH1-methylated MSI-H CRCs or in MLH1-non-methylated MSS CRCs by MS-PCR. Nevertheless, LRRFIP2 was found to be downregulated in MSI-H CRCs even in the absence of epigenetic silencing events (DNA methylation or histone modifications) (384). This downregulation of the LRRFIP2 gene in MSI-H CRCs was attributed to epigenetic silencing of neighbouring genes. Treatment of the LRRFIP2 non-expressing RKO cell line with the methyltransferase inhibitor (5-aza-2’-deoxycytidine) and/or histone deacetylase inhibitor (trichostatin A) resulted in LRRFIP2 gene expression (384).

In this study, we built on our previous finding, an association of the MLH1-93G>A polymorphism with MSI-H colorectal cancers (341). We identified a novel mechanism in which common missense alterations may contribute to complex disease. The three polymorphisms reported in this study serve as germline markers/predisposition alleles for a somatic event that will result in gene silencing and, consequently, a specific subtype of colorectal cancer. Additional characterization of the genes and polymorphisms noted here may lead to new insights and new mechanisms by which alleles contribute to cancer incidence and progression.


CHAPTER 5

Summary and Future Directions

5.1 MMR Polymorphisms in Colorectal Cancer

Over the past decade, studies on genetic polymorphisms have undergone considerable transformation, evolving from examinations of non-synonymous polymorphisms in key candidate genes into evaluations of polymorphisms in gene regulatory regions, and more recently into more complex, genome-wide association studies. Similarly, our understanding of DNA methylation has also evolved from simplistic monogenic events into more complex, genome-wide instability pathways. One such pathway, CIMP1, is specific to MSI-H CRCs due to silencing of the MLH1 gene and impaired mismatch repair. Mismatch repair is a key DNA caretaker system whose dysfunction results in genome-wide microsatellite instability (69). MLH1 is one of the key members of the mismatch repair system. Loss of MLH1 gene expression caused by mutations, or more commonly epigenetic silencing, is a critical step in the majority of MSI-H colorectal cancers (66). Tumours exhibiting microsatellite instability are recognized as a unique subset of CRCs with distinct clinical and pathological features (80-82). Recent evidence suggests that patients with MSI-H CRCs respond differently to chemotherapeutic agents (88-90). Early identification of patients with this CRC subtype has become critical in selecting suitable treatments and patient management.


In this regard, my work implicates a promoter polymorphism in the MLH1 gene as a novel common susceptibility allele for MSI-H CRCs. In this thesis, I demonstrate that the MLH1-93G>A promoter SNP increases the risk of MSI-H CRCs in two distinct populations. Since then, our work has been replicated in three additional populations (296, 355, 356). I further demonstrate that the effects of this polymorphism are not caused by its direct role in transcription, but rather through the epigenetic silencing of the MLH1 promoter. In another study, this polymorphism was also implicated in CIMP1 CRCs that display the MSI-H phenotype due to MLH1 promoter methylation (355). It is possible that certain gene-gene or gene-environment interactions modify this risk. It has been reported that the MLH1-93A allele increases the risk of hyperplastic polyps and adenomas among smokers (307). Interestingly, Campbell et al. reported that the MLH1-93A allele diminishes the risk of MSS CRCs among smokers (296). It would be interesting to examine how diet and environmental factors interact with the MLH1-93G>A SNP and MLH1 promoter methylation.

Since the MLH1-93G>A promoter polymorphism increases the risk of MLH1 promoter methylation resulting in MSI-H colorectal tumours, it is possible that it also plays a role in modifying cancer risk in patients receiving methylating chemotherapy. In Hodgkin lymphoma patients, carriers of the variant -93A allele were at increased risk of developing secondary tumours, primarily acute myeloid leukaemia and breast cancers (359). Future studies can evaluate the response to certain chemotherapeutic agents, cancer recurrence, and five-year survival with the MLH1-93G>A polymorphism in our study populations.

However, we cannot eliminate the possibility that the MLH1-93G>A, rs749072, and rs13098279 polymorphisms tag other variants in the region that contribute to DNA methylation susceptibility. In order to identify such new variants, it would be useful to sequence a 197.5-kb region between the MLH1-93G>A and rs13098279 polymorphisms by deep sequencing (Solexa/Illumina). This sequencing would be performed in a subset of CRC patients stratified by MSI/MLH1 promoter methylation tumour status and by two main haplotypes, homozygous wild-type or variant genotypes at all three loci. Any new variants identified from the analysis of these four subgroups of CRC patients would then be genotyped for the three study populations, Ontario, Newfoundland, and Seattle.

5.2 MLH1-93G>A Effects of Promoter Activity

The functional promoter assays with the MLH1-93G>A polymorphism showed consistent results across all colon cancer and normal colonic cell lines examined. Even though the cancer cell lines examined represent a heterogeneous CRC population, the SW480 (MSS with KRAS mutation), SW620 (MSS with APC and KRAS mutations), HT29 (MSS, polyploid with APC and BRAF mutations), and HCT116 (MSI-H, MLH1 mutation plus promoter methylation, examined by a colleague in the lab) cell lines showed similar results, indicating that the effects observed are comparable among colon cancer tissues. The effects of this SNP were examined in a core promoter region of MLH1 as well as in a smaller, more proximal promoter region. In addition, I also examined the role of this polymorphism on the reverse promoter belonging to the EPM2AIP1 gene. It appears that the G>A nucleotide substitution at the -93 location affects the promoter activity, although considerably less in the MLH1 core promoter.

The variant -93A allele significantly reduces the promoter activity in a shorter promoter construct. Conversely, the same variant -93A allele increases the promoter activity of the EPM2AIP1 gene. It is tempting to hypothesize that since the -93A allele decreases the activity of the MLH1 promoter due to impaired transcription factor binding, it will then free the reverse promoter sequence for transcription factors to bind and therefore increase the activity of the EPM2AIP1 promoter. However, the proteins that bind this region have not yet been identified. Moreover, as many as five proteins/protein-complexes have been observed to bind the 25-nucleotide probes around the -93G>A polymorphism in electrophoretic mobility shift assays (EMSA) carried out in our lab (by graduate student Sheron Perera). The identities and functions of these protein complexes remain to be elucidated.

A recent study found that expression of the MLH1 gene was repressed in cancer cells in response to hypoxic stress (385). This repression occurs via a dynamic shift in occupancy from activating c-Myc/Max to repressive Mad1/Max and Mnt/Max complexes at the proximal promoter of the MLH1 gene (385). While the specific c-Myc binding site in the promoter of the MLH1 gene has not been identified, c-Myc transcription factors can bind E-Box motifs (AACGTG), one of which is located 8 nucleotides downstream from the -93G>A polymorphism. Future studies can evaluate possible effects of the MLH1-93G>A polymorphism on MLH1 repression in cells maintained in hypoxic conditions (95% N2, 5% CO2, and less than 10 ppm O2).

Unlike the MLH1-93G>A polymorphism, the MSH2-118T>C polymorphism is still not well studied. Our work identified a strong association between the MSH2-118T>C polymorphism and family history (based on the Amsterdam Criteria), which is especially pronounced among female CRC patients. In addition, we have observed that the -118 variant C allele is more common among the male CRC patients. The -118T>C polymorphism alters the oestrogen-responsive CBF-binding site into an androgen-responsive AP1 binding site. Future projects can evaluate the functional effects of the MSH2-118T>C polymorphism using luciferase promoter assays in cells grown in media with and without hormonal (oestrogen/androgen) supplementation. Such assays can be carried out in a variety of male (HCT116, SW480, SW620) and female (HT29, SW48) CRC cell lines. Evaluation of such polymorphisms may provide insight into disparities seen in adenoma and CRC incidence between the two sexes.

MSI-H CRCs have clinical and pathological tumour features distinct from those of MSS CRCs, and these differences can be quantified and used to predict tumour MSI status. Jenkins et al. developed a scoring system to predict tumour MSI status based on histological tumour characteristics (80). This scoring system was termed microsatellite instability by pathology, or MsPath. This system offers rapid scoring of colorectal tumours and the probabilities of their MSI status. However, the MsPath system has not been widely accepted and is susceptible to variation, as the scoring is subjective and may vary between pathologists. Since we have identified genetic markers that serve as strong predictors of tumour MSI-H status, it would be useful to examine whether our markers can be utilized in combination with the MsPath scoring system. Because genotypes are fixed and do not change over time, the SNPs examined in this study may provide a degree of stability to the MsPath scoring system. Colorectal tumours from Ontario and Newfoundland have been scored on the clinical and pathological features used in MsPath (80). The features included: histological subtypes (adenocarcinoma, signet ring cell carcinoma, mucinous carcinoma, and medullary carcinoma), degree of differentiation (poor or other), presence of tumour infiltrating lymphocytes (yes or no), presence of Crohn’s-like reaction (yes or no), anatomical site (proximal versus distal), and age at diagnosis (<50 or 50 and over). Future analyses will evaluate whether the predictive polymorphisms we examined increase the sensitivity and/or specificity of the MsPath scoring system.
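The proposed evaluation amounts to comparing the sensitivity and specificity of two classifiers against the observed tumour MSI status. The sketch below shows that comparison; the counts are hypothetical placeholders and do not represent MsPath performance or any result from this study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts, where a
    'positive' is a tumour predicted (or observed) to be MSI-H."""
    sensitivity = tp / (tp + fn)  # fraction of MSI-H tumours correctly predicted
    specificity = tn / (tn + fp)  # fraction of non-MSI-H tumours correctly predicted
    return sensitivity, specificity


# Hypothetical counts for 100 MSI-H and 900 non-MSI-H tumours (illustration only).
pathology_only = sensitivity_specificity(tp=80, fn=20, tn=810, fp=90)
pathology_plus_snp = sensitivity_specificity(tp=84, fn=16, tn=828, fp=72)

for label, (sens, spec) in [("pathology score alone", pathology_only),
                            ("pathology score + SNP", pathology_plus_snp)]:
    print(f"{label}: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

An analysis on real data would additionally need confidence intervals and a paired comparison, since both classifiers would be applied to the same tumours.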


5.3 Examination of MSH2/MSH6 Gene Region Polymorphisms in CRC

In addition to the MLH1 gene polymorphisms, our lab has also evaluated a number of polymorphisms in the MSH2 and MSH6 genes located on the short arm of chromosome 2 at 2p22-21 and 2p16, respectively. Assessment of additional polymorphisms (already genotyped using the Affymetrix GeneChip Human Mapping 100K and 500K platforms) around and between these two genes may identify additional markers that are specific for MSI-H or MSS CRCs. For these studies, MLH1 IHC-deficient tumours can be removed to eliminate any confounding effects and to capture the effects of polymorphisms more accurately.

5.4 Implication of New Genes in CRC

Our work has implicated potential new genes proximal to MLH1 on chromosome 3 in CRC either through SNP associations or DNA methylation. Because MLH1 silencing by DNA methylation does not result in silencing of just that one gene, but potentially several others (at least EPM2AIP1, since it shares a promoter with MLH1), it would be most informative to examine expression of these genes in a subset of MSI-H (MLH1 IHC deficient) and MSS CRCs. For this purpose, we have requested, from the OFCCR, 21 CRC and matched normal flash-frozen tissue samples that will enable future studies to examine expression of these new candidate genes. Of these 21 tumours, 15 are MSS and 6 are MSI-H and deficient for MLH1 protein expression. Potential new genes to be examined include EPM2AIP1, LRRFIP2, and LBA1.

The EPM2AIP1 gene is simultaneously silenced in MSI-H tumours with MLH1 promoter methylation (354); however, its contribution to the tumourigenic process is unknown. Our colleagues at the Hospital for Sick Children have created an Epm2aip1 knockout mouse model and its phenotype is being closely monitored for any abnormalities. The EPM2AIP1 interacting protein, Laforin, has been implicated in the Wnt signalling pathway and cell cycle regulation (369), although the role of EPM2AIP1 in these pathways is unclear. Potential future projects can evaluate whether EPM2AIP1 plays a role in Wnt signalling by examining its effects on the TOPFlash luciferase reporter assay (representing Wnt signalling target gene activation) in transiently transfected HEK293T cells that do not express endogenous EPM2AIP1 due to promoter methylation. Any effects of EPM2AIP1 on Wnt signalling pathways can be further evaluated.

LRRFIP2 has been implicated as a Wnt signalling pathway activator by interacting with Dishevelled-3 (Dvl) (380). Since Dishevelled also plays a role in the nucleus to activate Wnt target genes (386), it is unclear whether its interaction with LRRFIP2 is strictly cytoplasmic or whether it occurs in the nucleus. Should this interaction prove to be nuclear, the presence of LRRFIP2 at the promoters of Wnt signalling target genes can then be examined with chromatin immunoprecipitation (ChIP) studies. More recently LRRFIP2 has also been implicated in the inflammatory response by interacting with MyD88 and activating TLR signalling (381). Although the LRRFIP2 gene is downregulated in MSI-H CRCs (384), it is not well understood how this downregulation affects the Wnt signalling pathway or the inflammatory response to LPS stimulation and whether it contributes to CRC tumourigenesis. We have managed to obtain flag- and myc-tagged LRRFIP2 and dominant-negative LRRFIP2 constructs from Dr. Peter Schultz (La Jolla, California). These constructs will enable future studies to gain insights into the role of LRRFIP2 in tumourigenesis. According to Entrez Gene (http://www.ncbi.nlm.nih.gov/sites/entrez), three isoforms of LRRFIP2 exist. All three isoforms contain a DUF2051 double-stranded RNA binding domain, while isoform 1 additionally contains an Smc chromosome segregation ATPase involved in cell division and chromosome partitioning.


Coincidentally, three splice variants of LRRFIP2 have been identified in normal colonic tissue; however, only two are found in CRCs and metastatic prostate cancers (382). In addition to a CpG island located in the LRRFIP2 promoter, further CpG islands are present throughout the gene. It is possible that methylation of some of these CpG islands contributes to alternative splicing and tissue-specific expression of the LRRFIP2 gene. Evaluating the methylation patterns of these CpG islands together with LRRFIP2 isoform expression may provide insight into the tumour-specific expression patterns and potential role of LRRFIP2 in CRC.

LBA1 is a poorly characterised gene. No LBA1 functions, pathways, or binding partners are known, which makes it an attractive candidate gene to study. According to Entrez Gene (http://www.ncbi.nlm.nih.gov/sites/entrez), LBA1 contains a potential UvrD/Rec helicase domain. UvrD helicase is a component of the prokaryotic MMR system; however, no mammalian homologue has been identified. LBA1 expression was suppressed in MSI-H CRCs by DNA methylation, although it was also suppressed in MSS CRCs (384), where the mechanism of its suppression is unclear. Thus, LBA1 appears to be downregulated in CRCs regardless of tumour MSI status. This may explain why polymorphisms (rs931913, rs4624519, and rs4431050) in and around the LBA1 locus were associated with increased risk of all CRCs, at least in Ontario samples. Future studies can evaluate whether LBA1 acts as a genuine tumour suppressor or whether its downregulation is simply a bystander effect in CRC.

In summary, the present study focused on elucidating the roles of mismatch repair gene polymorphisms in colorectal cancer. Multiple interesting associations between genetic polymorphisms and tumour features were observed. Notably, the MLH1-93G>A polymorphism was associated with increased risk of MSI-H colorectal tumours. While this SNP does have some functional effects, this association is mediated via DNA methylation and gene silencing. Two additional polymorphisms in the neighbouring gene region that are in linkage disequilibrium with MLH1-93G>A are also associated with DNA methylation. Combined with the finding that DNA methylation affects several genes surrounding MLH1, it can be postulated that specific DNA sequence variants evoke conditions that are favourable for DNA methylation. Taken together, it appears that DNA sequence and epigenetic events are closely intertwined and affect each other. It is not clear how silencing of the genes neighbouring MLH1 contributes to MSI-H CRCs or whether it results in the clinical and pathologic features observed in MSI-H patients. The results presented in this thesis underscore the importance of future investigations into how the interplay of genetic variants and DNA methylation promotes tumour progression. Future studies may also reveal the utility of these genetic variants as prognostic markers for MSI-H tumours, which could be used to optimise patient care and improve outcome.


APPENDICES

Table A.1 Single Marker Case-Control Analyses

Location  Marker  Position  Ref Allele  CC No.  CC P-value  CC OR  CC L95%  CC U95%
NL  rs6781630  36561025  3  714  0.427  1.124  0.843  1.499
SEA  rs6781630  36561025  3  1200  0.049  1.263  1.001  1.594
NL  rs6550425  36566207  1  714  0.769  0.968  0.781  1.200
SEA  rs6550425  36566207  3  1200  0.168  1.119  0.954  1.312
NL  rs11716822  36566673  1  714  0.427  1.124  0.843  1.499
SEA  rs11716822  36566673  1  1200  0.049  1.263  1.001  1.594
NL  rs1357069  36567042  1  714  0.427  1.124  0.843  1.499
SEA  rs1357069  36567042  1  1200  0.049  1.263  1.001  1.594
NL  rs1357070  36567136  2  714  0.427  1.124  0.843  1.499
SEA  rs1357070  36567136  2  1200  0.049  1.263  1.001  1.594
NL  rs1357071  36567376  4  712  0.386  1.136  0.851  1.516
SEA  rs1357071  36567376  4  1189  0.039  1.277  1.012  1.612
NL  rs11129723  36572905  1  714  0.427  1.124  0.843  1.499
SEA  rs11129723  36572905  1  1200  0.037  1.280  1.015  1.614
NL  rs1521268  36609993  1  713  0.302  1.166  0.871  1.561
SEA  rs1521268  36609993  1  1199  0.046  1.268  1.004  1.600
NL  rs1402563  36611158  4  714  0.427  1.124  0.843  1.499
SEA  rs1402563  36611158  4  1200  0.042  1.273  1.009  1.608
NL  rs1464412  36611910  2  714  0.897  0.986  0.797  1.220
SEA  rs1464412  36611910  4  1200  0.161  0.891  0.759  1.047
NL  rs17248901  36622465  4  714  0.528  1.096  0.824  1.460
SEA  rs17248901  36622465  4  1200  0.041  1.271  1.010  1.600
NL  rs6780039  36630641  2  700  0.243  1.166  0.901  1.508
SEA  rs6780039  36630641  2  1176  0.060  1.223  0.992  1.508
ON  rs6550433  36792332  4  1638  0.288  0.924  0.799  1.069
ON  rs11927951  36792615  2  1643  0.291  0.925  0.801  1.069
ON  rs1472862  36800143  3  1603  0.411  1.065  0.916  1.238
ON  rs1472863  36800537  1  1580  0.186  0.865  0.698  1.072
ON  rs4075977  36803662  2  1645  0.110  0.841  0.680  1.040
ON  rs885164  36804255  2  1641  0.416  1.064  0.916  1.235
ON  rs987190  36804984  1  1587  0.343  0.912  0.755  1.103
ON  rs7648754  36807086  1  1620  0.365  1.069  0.926  1.234
ON  rs9311137  36817888  3  1593  0.109  0.868  0.730  1.032
ON  rs931913  36818153  3  1642  0.001  1.286  1.107  1.494
ON  rs10510689  36821298  4  1534  0.510  0.859  0.547  1.350
ON  rs9311138  36822275  3  1527  0.153  0.852  0.684  1.061
ON  rs7646626  36823920  4  1643  0.258  0.915  0.784  1.067
ON  rs906482  36831332  3  1629  0.188  0.908  0.787  1.048
ON  rs1553656  36834709  2  1527  0.377  0.936  0.807  1.084
ON  rs1532964  36835547  3  1534  0.221  0.901  0.763  1.064
ON  rs1006834  36836485  4  1617  0.061  0.872  0.755  1.006
ON  rs4624519  36837984  2  1640  0.005  1.240  1.069  1.439
ON  rs9917659  36839164  1  1646  0.283  0.916  0.781  1.075
NL  rs4431050  36855688  3  692  0.787  0.951  0.662  1.366
ON  rs4431050  36855688  3  1329  0.001  1.554  1.186  2.037
ON  rs9882911  36860396  4  1629  0.099  1.133  0.977  1.313
ON  rs3733033  36867590  1  1507  0.093  1.247  0.963  1.614
ON  rs6808230  36886601  4  1646  0.214  0.905  0.774  1.059
ON  rs7649846  36889962  2  1637  0.280  0.917  0.783  1.073
ON  rs9810132  36895054  3  1636  0.226  0.907  0.774  1.062
ON  rs11129740  36900198  2  1629  0.169  0.895  0.764  1.048
ON  rs6765624  36902550  3  1636  0.297  0.919  0.784  1.077
ON  rs6781803  36912525  3  1502  0.856  0.985  0.836  1.161
ON  rs4328757  36913184  4  1524  0.673  0.970  0.842  1.117
ON  rs4528889  36914098  1  1629  0.798  0.977  0.819  1.166
ON  rs6550440  36914290  3  1813  0.329  0.929  0.800  1.078
ON  rs9864910  36915745  3  1535  0.074  0.862  0.732  1.014
ON  rs7614265  36919082  4  1534  0.545  0.869  0.551  1.370
ON  rs6769189  36933777  3  1633  0.709  1.039  0.850  1.269
ON  rs6769400  36933999  3  1614  0.813  1.018  0.880  1.176
ON  rs7622114  36935664  1  1646  0.608  0.964  0.836  1.110
ON  rs6777094  36951930  1  1545  0.849  1.014  0.878  1.172
ON  rs17035812  36969947  3  1626  0.984  1.003  0.717  1.404
ON  rs17202548  36970991  1  1582  0.990  1.001  0.867  1.155
ON  rs6768108  36973759  2  1626  0.917  0.992  0.861  1.144
ON  rs6789043  37000871  4  1872  0.875  0.990  0.868  1.128
ON  rs9311149  37005179  1  1817  0.761  1.021  0.894  1.165
Combined  rs1800734  37009950  3  3923  0.108  0.915  0.822  1.020
NL  rs1800734  37009950  3  727  0.336  0.879  0.677  1.143
ON  minus93  37009950  3  1998  0.476  0.946  0.813  1.101
SEA  rs1800734  37009950  3  1198  0.222  0.887  0.731  1.075
ON  rs1540354  37019493  1  1521  0.600  1.053  0.868  1.278
NL  i219v  37028572  1  687  0.338  1.122  0.887  1.420
ON  i219v  37028572  1  1998  0.733  1.024  0.892  1.176
ON  rs4647269  37032595  2  1626  0.974  1.002  0.869  1.156
ON  rs3774339  37037858  2  1615  0.937  0.994  0.861  1.148
ON  rs3774332  37049672  1  1872  0.791  0.969  0.768  1.223
ON  rs9876420  37051463  3  1421  0.068  0.677  0.445  1.029
ON  rs748766  37057878  4  1645  0.902  1.009  0.875  1.164
NL  ivs1419  37058744  1  693  0.401  1.095  0.886  1.352
ON  ivs1419  37058744  1  1998  0.879  0.990  0.872  1.125
ON  rs2241031  37065278  2  1631  0.867  1.012  0.878  1.168
Combined  rs749072  37071028  4  3912  0.102  0.918  0.828  1.017
NL  rs749072  37071028  4  714  0.377  0.893  0.694  1.148
ON  rs749072  37071028  1  1998  0.679  0.970  0.839  1.121
SEA  rs749072  37071028  4  1200  0.117  0.862  0.716  1.038
ON  rs3774326  37074570  3  1626  0.999  1.000  0.867  1.153
ON  rs11720064  37076523  3  1642  0.911  1.008  0.874  1.163
ON  rs1468712  37081017  4  1629  0.984  1.001  0.869  1.154
ON  rs7639375  37082026  2  1456  0.546  0.956  0.825  1.107
ON  rs7651033  37091046  2  1585  0.794  0.981  0.847  1.136
ON  rs2302504  37091390  3  1644  0.893  1.010  0.876  1.164
ON  rs17204801  37118438  3  1623  0.979  1.002  0.869  1.155
ON  rs6800554  37118883  2  1610  0.919  0.993  0.861  1.144
ON  rs336601  37159477  3  1646  0.519  1.053  0.900  1.233
ON  rs6550458  37163322  4  1614  0.976  1.002  0.868  1.157
ON  rs9869432  37171605  4  1646  0.847  1.014  0.880  1.169
ON  rs9823617  37174354  1  1600  0.636  1.036  0.895  1.199
ON  rs6786857  37180019  2  1555  0.651  1.066  0.808  1.406
ON  rs1392749  37180195  2  1630  0.936  1.006  0.873  1.159
ON  rs11711937  37199842  2  1626  0.852  1.014  0.879  1.169
Combined  rs13098279  37207462  3  3912  0.155  0.924  0.828  1.031
NL  rs13098279  37207462  3  714  0.282  0.864  0.661  1.128
ON  rs13098279  37207462  3  1998  0.615  0.961  0.824  1.122
SEA  rs13098279  37207462  3  1200  0.286  0.900  0.741  1.093
ON  rs9874437  37241682  2  1645  0.990  0.999  0.867  1.152
ON  rs7639607  37244043  4  1604  0.674  0.970  0.841  1.119
ON  rs4678562  37257074  2  1607  0.731  1.026  0.886  1.188
ON  rs6800842  37264608  3  1629  0.677  1.031  0.893  1.191
ON  rs17036181  37277760  4  1616  0.773  1.057  0.724  1.543
NL  rs10510691  37500765  4  714  0.507  0.900  0.661  1.227
SEA  rs10510691  37500765  4  1200  0.750  0.964  0.770  1.208

SEA = Seattle, NL = Newfoundland, ON = Ontario; Alleles: 1 = A, 2 = C, 3 = G, 4 = T; CC = Case-Control; OR = odds ratio; L95% = lower limit of the 95% confidence interval; U95% = upper limit of the 95% confidence interval
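
As a reading aid for the P-value, OR, L95% and U95% columns of these appendix tables, the following minimal Python sketch shows how the four quantities relate to one another under a Wald (normal) approximation on the log-odds scale. The Wald approximation is an assumption made here purely for illustration; it is not a statement of how the tabulated values were computed, and the function name wald_from_ci is hypothetical.

```python
import math

def wald_from_ci(odds_ratio, lower, upper, z_crit=1.959964):
    """Recover the standard error of log(OR), the z statistic and the
    two-sided p-value from an odds ratio and its 95% confidence limits.

    Assumes (for illustration only) a symmetric Wald interval on the log scale:
        log(upper) - log(lower) = 2 * z_crit * SE
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p

# Values taken from the ON rs931913 row of Table A.1
se, z, p = wald_from_ci(1.286, 1.107, 1.494)
print(f"SE(log OR) = {se:.4f}, z = {z:.2f}, p = {p:.4f}")
```

For this row, the approximation gives a p-value of about 0.001, in line with the tabulated value. The same check can be applied to any row to confirm that an interval excluding 1.000 corresponds to a P-value below 0.05.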


Table A.2 Single Marker Analyses by MLH1 Promoter Methylation Status

Location  Marker  Position  Ref Allele  Meth No.  Meth P-value  Meth OR  Meth L95%  Meth U95%
NL  rs6781630  36561025  3  427  0.368  0.712  0.340  1.492
SEA  rs6781630  36561025  3  206  0.821  0.925  0.469  1.823
NL  rs6550425  36566207  1  427  0.859  0.946  0.511  1.749
SEA  rs6550425  36566207  3  206  0.271  1.300  0.815  2.074
NL  rs11716822  36566673  1  427  0.368  0.712  0.340  1.492
SEA  rs11716822  36566673  1  206  0.821  0.925  0.469  1.823
NL  rs1357069  36567042  1  427  0.368  0.712  0.340  1.492
SEA  rs1357069  36567042  1  206  0.821  0.925  0.469  1.823
NL  rs1357070  36567136  2  427  0.368  0.712  0.340  1.492
SEA  rs1357070  36567136  2  206  0.821  0.925  0.469  1.823
NL  rs1357071  36567376  4  426  0.365  0.710  0.339  1.488
SEA  rs1357071  36567376  4  205  0.963  1.017  0.510  2.026
NL  rs11129723  36572905  1  427  0.368  0.712  0.340  1.492
SEA  rs11129723  36572905  1  206  0.966  1.015  0.509  2.024
NL  rs1521268  36609993  1  426  0.326  0.685  0.322  1.457
SEA  rs1521268  36609993  1  206  0.966  1.015  0.509  2.024
NL  rs1402563  36611158  4  427  0.368  0.712  0.340  1.492
SEA  rs1402563  36611158  4  206  0.966  1.015  0.509  2.024
NL  rs1464412  36611910  2  427  0.703  0.892  0.498  1.600
SEA  rs1464412  36611910  4  206  0.970  1.009  0.636  1.601
NL  rs17248901  36622465  4  427  0.405  0.731  0.350  1.528
SEA  rs17248901  36622465  4  206  0.854  0.938  0.475  1.852
NL  rs6780039  36630641  2  418  0.625  0.849  0.440  1.638
SEA  rs6780039  36630641  2  200  0.120  0.617  0.336  1.134
ON  rs6550433  36792332  4  411  0.554  0.869  0.547  1.383
ON  rs11927951  36792615  2  415  0.395  0.821  0.521  1.293
ON  rs1472862  36800143  3  406  0.239  1.394  0.802  2.426
ON  rs1472863  36800537  1  425  0.387  0.760  0.408  1.416
ON  rs4075977  36803662  2  415  0.714  0.885  0.459  1.706
ON  rs885164  36804255  2  414  0.094  1.633  0.920  2.899
ON  rs987190  36804984  1  429  0.916  0.969  0.536  1.751
ON  rs7648754  36807086  1  434  0.195  1.398  0.842  2.320
ON  rs9311137  36817888  3  401  0.099  0.621  0.352  1.094
ON  rs931913  36818153  3  414  0.182  1.457  0.838  2.532
ON  rs10510689  36821298  4  415  0.151  0.463  0.162  1.326
ON  rs9311138  36822275  3  411  0.972  1.011  0.540  1.894
ON  rs7646626  36823920  4  414  0.021  0.566  0.348  0.919
ON  rs906482  36831332  3  438  0.442  1.198  0.756  1.899
ON  rs1553656  36834709  2  408  0.749  0.927  0.581  1.477
ON  rs1532964  36835547  3  415  0.456  0.826  0.500  1.365
ON  rs1006834  36836485  4  432  0.633  0.897  0.574  1.401
ON  rs4624519  36837984  2  414  0.078  1.661  0.945  2.921
ON  rs9917659  36839164  1  416  0.515  0.845  0.508  1.405
NL  rs4431050  36855688  3  441  0.477  1.545  0.465  5.129
ON  rs4431050  36855688  3  352  0.061  0.498  0.240  1.033
ON  rs9882911  36860396  4  438  0.206  1.397  0.832  2.347
ON  rs3733033  36867590  1  406  0.221  0.628  0.298  1.323
ON  rs6808230  36886601  4  416  0.712  0.907  0.541  1.521
ON  rs7649846  36889962  2  415  0.618  0.877  0.524  1.468
ON  rs9810132  36895054  3  413  0.643  0.886  0.530  1.480
ON  rs11129740  36900198  2  438  0.702  1.103  0.668  1.822
ON  rs6765624  36902550  3  412  0.597  0.871  0.521  1.454
ON  rs6781803  36912525  3  412  0.909  0.971  0.588  1.605
ON  rs4328757  36913184  4  405  0.278  0.778  0.495  1.224
ON  rs4528889  36914098  1  438  0.464  0.816  0.472  1.408
ON  rs6550440  36914290  3  510  0.715  1.087  0.694  1.704
ON  rs9864910  36915745  3  407  0.953  0.985  0.594  1.632
ON  rs7614265  36919082  4  415  0.336  0.582  0.193  1.753
ON  rs6769189  36933777  3  413  0.557  0.826  0.435  1.566
ON  rs6769400  36933999  3  406  0.614  1.135  0.693  1.859
ON  rs7622114  36935664  1  416  0.787  0.937  0.585  1.500
ON  rs6777094  36951930  1  387  0.615  0.881  0.539  1.442
ON  rs17035812  36969947  3  438  0.788  0.873  0.324  2.354
ON  rs17202548  36970991  1  424  0.091  1.510  0.937  2.433
ON  rs6768108  36973759  2  438  0.105  1.483  0.921  2.387
ON  rs6789043  37000871  4  531  0.150  1.353  0.897  2.041
ON  rs9311149  37005179  1  511  0.607  0.897  0.591  1.359
Combined  rs1800734  37009950  3  1236  0.000  0.470  0.352  0.628
NL  rs1800734  37009950  3  462  0.000  0.201  0.106  0.381
ON  minus93  37009950  3  569  0.018  0.594  0.386  0.915
SEA  rs1800734  37009950  3  205  0.043  0.590  0.354  0.984
ON  rs1540354  37019493  1  406  0.830  0.933  0.495  1.758
NL  i219v  37028572  1  438  0.331  1.436  0.692  2.980
ON  i219v  37028572  1  569  0.221  1.317  0.847  2.046
ON  rs4647269  37032595  2  437  0.176  1.384  0.864  2.217
ON  rs3774339  37037858  2  431  0.093  1.500  0.935  2.406
ON  rs3774332  37049672  1  531  0.298  1.579  0.668  3.733
ON  rs9876420  37051463  3  360  0.317  0.581  0.200  1.684
ON  rs748766  37057878  4  415  0.111  1.499  0.911  2.466
NL  ivs1419  37058744  1  438  0.022  2.234  1.121  4.450
ON  ivs1419  37058744  1  569  0.198  1.301  0.872  1.940
ON  rs2241031  37065278  2  412  0.090  1.529  0.935  2.499
Combined  rs749072  37071028  4  1202  0.000  0.451  0.340  0.599
NL  rs749072  37071028  4  427  0.000  0.234  0.124  0.440
ON  rs749072  37071028  1  569  0.003  0.532  0.350  0.808
SEA  rs749072  37071028  4  206  0.014  0.541  0.332  0.884
ON  rs3774326  37074570  3  436  0.121  1.450  0.907  2.318
ON  rs11720064  37076523  3  415  0.039  1.706  1.028  2.831
ON  rs1468712  37081017  4  438  0.126  1.442  0.902  2.303
ON  rs7639375  37082026  2  404  0.275  1.300  0.812  2.079
ON  rs7651033  37091046  2  398  0.046  1.687  1.010  2.815
ON  rs2302504  37091390  3  415  0.084  1.544  0.944  2.527
ON  rs17204801  37118438  3  436  0.127  1.439  0.902  2.298
ON  rs6800554  37118883  2  432  0.125  1.434  0.905  2.271
ON  rs336601  37159477  3  416  0.184  1.496  0.826  2.709
ON  rs6550458  37163322  4  414  0.084  1.533  0.945  2.487
ON  rs9869432  37171605  4  416  0.072  1.574  0.961  2.578
ON  rs9823617  37174354  1  404  0.103  1.527  0.918  2.539
ON  rs6786857  37180019  2  420  0.588  1.289  0.514  3.236
ON  rs1392749  37180195  2  438  0.126  1.442  0.902  2.303
ON  rs11711937  37199842  2  436  0.148  1.416  0.884  2.268
Combined  rs13098279  37207462  3  1202  0.000  0.508  0.379  0.681
NL  rs13098279  37207462  3  427  0.000  0.206  0.107  0.395
ON  rs13098279  37207462  3  569  0.037  0.629  0.407  0.972
SEA  rs13098279  37207462  3  206  0.137  0.677  0.405  1.132
ON  rs9874437  37241682  2  416  0.191  1.380  0.852  2.235
ON  rs7639607  37244043  4  432  0.333  1.255  0.792  1.988
ON  rs4678562  37257074  2  402  0.139  1.456  0.885  2.396
ON  rs6800842  37264608  3  438  0.237  1.323  0.832  2.104
ON  rs17036181  37277760  4  432  0.770  0.852  0.290  2.503
NL  rs10510691  37500765  4  427  0.174  0.583  0.268  1.270
SEA  rs10510691  37500765  4  206  0.013  0.450  0.240  0.844

SEA = Seattle, NL = Newfoundland, ON = Ontario; Alleles: 1 = A, 2 = C, 3 = G, 4 = T; Meth = MLH1 promoter methylation; OR = odds ratio; L95% = lower limit of the 95% confidence interval; U95% = upper limit of the 95% confidence interval


Table A.3 Single Marker Analyses by Tumour MLH1 IHC Status

Location  Marker  Position  Ref Allele  IHC No.  IHC P-value  IHC OR  IHC L95%  IHC U95%
NL  rs6781630  36561025  3  421  0.622  1.181  0.609  2.292
SEA  rs6781630  36561025  3  510  0.889  1.040  0.604  1.790
NL  rs6550425  36566207  1  421  0.600  1.149  0.684  1.931
SEA  rs6550425  36566207  3  510  0.155  0.764  0.527  1.107
NL  rs11716822  36566673  1  421  0.622  1.181  0.609  2.292
SEA  rs11716822  36566673  1  510  0.889  1.040  0.604  1.790
NL  rs1357069  36567042  1  421  0.622  1.181  0.609  2.292
SEA  rs1357069  36567042  1  510  0.889  1.040  0.604  1.790
NL  rs1357070  36567136  2  421  0.622  1.181  0.609  2.292
SEA  rs1357070  36567136  2  510  0.889  1.040  0.604  1.790
NL  rs1357071  36567376  4  420  0.613  1.187  0.612  2.301
SEA  rs1357071  36567376  4  504  0.869  0.954  0.548  1.662
NL  rs11129723  36572905  1  421  0.622  1.181  0.609  2.292
SEA  rs11129723  36572905  1  510  0.909  0.968  0.556  1.685
NL  rs1521268  36609993  1  420  0.548  1.230  0.626  2.419
SEA  rs1521268  36609993  1  510  0.909  0.968  0.556  1.685
NL  rs1402563  36611158  4  421  0.622  1.181  0.609  2.292
SEA  rs1402563  36611158  4  510  0.909  0.968  0.556  1.685
NL  rs1464412  36611910  2  421  0.252  1.340  0.812  2.209
SEA  rs1464412  36611910  4  510  0.812  1.046  0.724  1.511
NL  rs17248901  36622465  4  421  0.690  1.143  0.591  2.211
SEA  rs17248901  36622465  4  510  0.945  1.019  0.594  1.749
NL  rs6780039  36630641  2  413  0.707  0.889  0.480  1.645
SEA  rs6780039  36630641  2  501  0.154  1.402  0.881  2.233
ON  rs6550433  36792332  4  527  0.753  0.933  0.604  1.440
ON  rs11927951  36792615  2  531  0.841  1.044  0.686  1.588
ON  rs1472862  36800143  3  519  0.861  1.041  0.667  1.623
ON  rs1472863  36800537  1  543  0.637  0.855  0.445  1.641
ON  rs4075977  36803662  2  531  0.441  0.767  0.391  1.505
ON  rs885164  36804255  2  529  0.645  0.897  0.566  1.423
ON  rs987190  36804984  1  542  0.655  0.874  0.483  1.580
ON  rs7648754  36807086  1  552  0.898  1.028  0.674  1.569
ON  rs9311137  36817888  3  513  0.224  1.371  0.825  2.278
ON  rs931913  36818153  3  530  0.400  0.814  0.505  1.314
ON  rs10510689  36821298  4  525  0.140  2.103  0.784  5.639
ON  rs9311138  36822275  3  521  0.480  0.791  0.412  1.517
ON  rs7646626  36823920  4  529  0.056  1.531  0.990  2.368
ON  rs906482  36831332  3  556  0.484  0.862  0.570  1.305
ON  rs1553656  36834709  2  521  0.842  0.956  0.615  1.486
ON  rs1532964  36835547  3  525  0.760  1.075  0.675  1.713
ON  rs1006834  36836485  4  550  0.420  1.185  0.785  1.790
ON  rs4624519  36837984  2  530  0.186  0.720  0.442  1.171
ON  rs9917659  36839164  1  532  0.956  1.013  0.640  1.603
NL  rs4431050  36855688  3  439  0.245  0.540  0.191  1.526
ON  rs4431050  36855688  3  447  0.308  1.496  0.690  3.243
ON  rs9882911  36860396  4  556  0.170  0.717  0.446  1.153
ON  rs3733033  36867590  1  512  0.783  1.116  0.512  2.431
ON  rs6808230  36886601  4  532  0.225  1.323  0.842  2.079
ON  rs7649846  36889962  2  529  0.191  1.351  0.860  2.122
ON  rs9810132  36895054  3  529  0.200  1.342  0.856  2.105
ON  rs11129740  36900198  2  556  0.418  1.202  0.771  1.874
ON  rs6765624  36902550  3  526  0.169  1.371  0.875  2.147
ON  rs6781803  36912525  3  518  0.474  1.183  0.746  1.877
ON  rs4328757  36913184  4  514  0.653  1.100  0.726  1.668
ON  rs4528889  36914098  1  555  0.719  0.905  0.526  1.558
ON  rs6550440  36914290  3  637  0.562  1.126  0.754  1.682
ON  rs9864910  36915745  3  518  0.358  1.234  0.788  1.931
ON  rs7614265  36919082  4  525  0.197  1.903  0.715  5.065
ON  rs6769189  36933777  3  529  0.605  1.158  0.665  2.017
ON  rs6769400  36933999  3  520  0.958  0.988  0.635  1.537
ON  rs7622114  36935664  1  532  0.931  0.981  0.642  1.501
ON  rs6777094  36951930  1  498  0.626  1.116  0.719  1.732
ON  rs17035812  36969947  3  556  0.359  1.481  0.639  3.431
ON  rs17202548  36970991  1  534  0.262  0.782  0.508  1.202
ON  rs6768108  36973759  2  556  0.280  0.789  0.513  1.213
ON  rs6789043  37000871  4  663  0.455  0.869  0.600  1.257
ON  rs9311149  37005179  1  636  0.775  1.056  0.724  1.541
Combined  rs1800734  37009950  3  1674  0.000  1.823  1.420  2.339
NL  rs1800734  37009950  3  456  0.000  3.231  1.887  5.533
ON  minus93  37009950  3  709  0.040  1.509  1.019  2.235
SEA  rs1800734  37009950  3  509  0.018  1.660  1.093  2.522
ON  rs1540354  37019493  1  513  0.703  1.117  0.632  1.975
NL  i219v  37028572  1  436  0.638  0.869  0.484  1.560
ON  i219v  37028572  1  709  0.711  0.931  0.636  1.361
ON  rs4647269  37032595  2  554  0.344  0.813  0.529  1.249
ON  rs3774339  37037858  2  548  0.221  0.764  0.497  1.175
ON  rs3774332  37049672  1  663  0.182  0.587  0.268  1.284
ON  rs9876420  37051463  3  457  0.024  2.700  1.140  6.391
ON  rs748766  37057878  4  531  0.190  0.744  0.478  1.158
NL  ivs1419  37058744  1  436  0.062  0.594  0.343  1.026
ON  ivs1419  37058744  1  709  0.609  0.913  0.642  1.296
ON  rs2241031  37065278  2  529  0.166  0.734  0.473  1.137
Combined  rs749072  37071028  4  1640  0.000  1.872  1.469  2.386
NL  rs749072  37071028  4  421  0.000  2.676  1.582  4.527
ON  rs749072  37071028  1  709  0.011  1.623  1.115  2.360
SEA  rs749072  37071028  4  510  0.004  1.823  1.217  2.731
ON  rs3774326  37074570  3  554  0.345  0.813  0.529  1.249
ON  rs11720064  37076523  3  531  0.083  0.673  0.430  1.053
ON  rs1468712  37081017  4  555  0.262  0.784  0.512  1.200
ON  rs7639375  37082026  2  508  0.628  0.901  0.590  1.374
ON  rs7651033  37091046  2  512  0.183  0.733  0.464  1.157
ON  rs2302504  37091390  3  531  0.155  0.727  0.469  1.128
ON  rs17204801  37118438  3  553  0.272  0.787  0.514  1.206
ON  rs6800554  37118883  2  546  0.516  0.868  0.567  1.329
ON  rs336601  37159477  3  532  0.231  0.733  0.441  1.219
ON  rs6550458  37163322  4  527  0.158  0.728  0.469  1.131
ON  rs9869432  37171605  4  532  0.136  0.716  0.461  1.111
ON  rs9823617  37174354  1  520  0.105  0.685  0.433  1.083
ON  rs6786857  37180019  2  527  0.285  1.504  0.712  3.176
ON  rs1392749  37180195  2  556  0.269  0.786  0.513  1.204
ON  rs11711937  37199842  2  554  0.191  0.748  0.485  1.156
Combined  rs13098279  37207462  3  1640  0.000  1.691  1.313  2.179
NL  rs13098279  37207462  3  421  0.000  3.085  1.798  5.292
ON  rs13098279  37207462  3  709  0.090  1.412  0.947  2.106
SEA  rs13098279  37207462  3  510  0.070  1.479  0.969  2.257
ON  rs9874437  37241682  2  532  0.519  0.868  0.564  1.336
ON  rs7639607  37244043  4  547  0.695  0.918  0.598  1.409
ON  rs4678562  37257074  2  516  0.394  0.827  0.535  1.280
ON  rs6800842  37264608  3  556  0.645  0.905  0.593  1.382
ON  rs17036181  37277760  4  549  0.250  1.708  0.686  4.254
NL  rs10510691  37500765  4  421  0.452  1.307  0.651  2.626
SEA  rs10510691  37500765  4  510  0.399  1.233  0.758  2.008

SEA = Seattle, NL = Newfoundland, ON = Ontario; Alleles: 1 = A, 2 = C, 3 = G, 4 = T; IHC = MLH1 IHC status; OR = odds ratio; L95% = lower limit of the 95% confidence interval; U95% = upper limit of the 95% confidence interval


Table A.4 Single Marker Analyses by Tumour MSI Status

Location  Marker  Position  Ref Allele  MSI No.  MSI P-value  MSI OR  MSI L95%  MSI U95%
NL  rs6781630  36561025  3  423  0.868  0.948  0.503  1.785
SEA  rs6781630  36561025  3  481  0.240  0.738  0.445  1.224
NL  rs6550425  36566207  1  423  0.301  0.775  0.478  1.256
SEA  rs6550425  36566207  3  481  0.580  1.105  0.775  1.576
NL  rs11716822  36566673  1  423  0.868  0.948  0.503  1.785
SEA  rs11716822  36566673  1  481  0.240  0.738  0.445  1.224
NL  rs1357069  36567042  1  423  0.868  0.948  0.503  1.785
SEA  rs1357069  36567042  1  481  0.240  0.738  0.445  1.224
NL  rs1357070  36567136  2  423  0.868  0.948  0.503  1.785
SEA  rs1357070  36567136  2  481  0.240  0.738  0.445  1.224
NL  rs1357071  36567376  4  422  0.856  0.943  0.501  1.776
SEA  rs1357071  36567376  4  475  0.381  0.795  0.476  1.328
NL  rs11129723  36572905  1  423  0.868  0.948  0.503  1.785
SEA  rs11129723  36572905  1  481  0.350  0.783  0.469  1.308
NL  rs1521268  36609993  1  422  0.770  0.908  0.476  1.733
SEA  rs1521268  36609993  1  481  0.350  0.783  0.469  1.308
NL  rs1402563  36611158  4  423  0.868  0.948  0.503  1.785
SEA  rs1402563  36611158  4  481  0.350  0.783  0.469  1.308
NL  rs1464412  36611910  2  423  0.291  0.778  0.488  1.240
SEA  rs1464412  36611910  4  481  0.645  1.088  0.761  1.555
NL  rs17248901  36622465  4  423  0.954  0.982  0.523  1.842
SEA  rs17248901  36622465  4  481  0.279  0.758  0.459  1.252
NL  rs6780039  36630641  2  415  0.837  1.060  0.606  1.856
SEA  rs6780039  36630641  2  473  0.039  0.625  0.401  0.976
ON  rs6550433  36792332  4  552  0.783  1.055  0.721  1.543
ON  rs11927951  36792615  2  556  0.867  0.969  0.669  1.403
ON  rs1472862  36800143  3  544  0.764  1.065  0.707  1.603
ON  rs1472863  36800537  1  567  0.736  0.912  0.532  1.561
ON  rs4075977  36803662  2  556  0.895  1.038  0.594  1.816
ON  rs885164  36804255  2  554  0.361  1.217  0.798  1.856
ON  rs987190  36804984  1  567  0.999  1.000  0.605  1.655
ON  rs7648754  36807086  1  577  0.937  1.016  0.691  1.492
ON  rs9311137  36817888  3  536  0.062  0.655  0.419  1.022
ON  rs931913  36818153  3  555  0.890  1.029  0.685  1.548
ON  rs10510689  36821298  4  549  0.187  0.535  0.211  1.355
ON  rs9311138  36822275  3  545  0.973  0.991  0.580  1.693
ON  rs7646626  36823920  4  555  0.017  0.626  0.426  0.920
ON  rs906482  36831332  3  581  0.086  1.398  0.954  2.049
ON  rs1553656  36834709  2  544  0.545  1.129  0.762  1.673
ON  rs1532964  36835547  3  549  0.507  1.163  0.744  1.819
ON  rs1006834  36836485  4  574  0.207  0.791  0.549  1.139
ON  rs4624519  36837984  2  555  0.773  1.061  0.708  1.591
ON  rs9917659  36839164  1  557  0.227  1.309  0.845  2.027
NL  rs4431050  36855688  3  436  0.514  1.335  0.560  3.183
ON  rs4431050  36855688  3  465  0.133  0.607  0.317  1.164
ON  rs9882911  36860396  4  581  0.596  1.115  0.746  1.668
ON  rs3733033  36867590  1  535  0.345  0.737  0.391  1.389
ON  rs6808230  36886601  4  557  0.515  0.872  0.576  1.318
ON  rs7649846  36889962  2  554  0.450  0.853  0.564  1.289
ON  rs9810132  36895054  3  554  0.465  0.858  0.568  1.295
ON  rs11129740  36900198  2  581  0.875  0.968  0.644  1.454
ON  rs6765624  36902550  3  551  0.398  0.838  0.556  1.263
ON  rs6781803  36912525  3  544  0.703  0.919  0.595  1.420
ON  rs4328757  36913184  4  536  0.692  0.927  0.638  1.347
ON  rs4528889  36914098  1  580  0.541  0.868  0.550  1.368
ON  rs6550440  36914290  3  668  0.918  0.980  0.674  1.426
ON  rs9864910  36915745  3  543  0.647  0.908  0.599  1.374
ON  rs7614265  36919082  4  549  0.363  0.636  0.240  1.686
ON  rs6769189  36933777  3  554  0.450  0.824  0.498  1.363
ON  rs6769400  36933999  3  544  0.871  1.033  0.698  1.528
ON  rs7622114  36935664  1  557  0.885  1.028  0.703  1.506
ON  rs6777094  36951930  1  523  0.653  0.913  0.616  1.356
ON  rs17035812  36969947  3  581  0.945  0.971  0.422  2.236
ON  rs17202548  36970991  1  558  0.169  1.309  0.892  1.922
ON  rs6768108  36973759  2  581  0.148  1.326  0.905  1.944
ON  rs6789043  37000871  4  696  0.234  1.227  0.876  1.719
ON  rs9311149  37005179  1  668  0.753  0.947  0.673  1.331
Combined  rs1800734  37009950  3  1681  0.000  0.574  0.454  0.725
NL  rs1800734  37009950  3  457  0.000  0.364  0.220  0.603
ON  minus93  37009950  3  744  0.005  0.601  0.420  0.859
SEA  rs1800734  37009950  3  480  0.082  0.699  0.467  1.046
ON  rs1540354  37019493  1  538  0.997  1.001  0.602  1.664
NL  i219v  37028572  1  433  0.580  1.172  0.669  2.052
ON  i219v  37028572  1  744  0.743  1.060  0.747  1.504
ON  rs4647269  37032595  2  580  0.203  1.280  0.875  1.873
ON  rs3774339  37037858  2  573  0.161  1.315  0.897  1.928
ON  rs3774332  37049672  1  696  0.102  1.848  0.885  3.861
ON  rs9876420  37051463  3  479  0.255  0.600  0.249  1.447
ON  rs748766  37057878  4  556  0.164  1.319  0.893  1.950
NL  ivs1419  37058744  1  433  0.097  1.543  0.925  2.574
ON  ivs1419  37058744  1  744  0.331  1.175  0.849  1.625
ON  rs2241031  37065278  2  553  0.138  1.340  0.910  1.974
Combined  rs749072  37071028  4  1648  0.000  0.550  0.439  0.691
NL  rs749072  37071028  4  423  0.001  0.439  0.269  0.718
ON  rs749072  37071028  1  744  0.000  0.529  0.375  0.747
SEA  rs749072  37071028  4  481  0.029  0.649  0.441  0.956
ON  rs3774326  37074570  3  579  0.202  1.281  0.875  1.876
ON  rs11720064  37076523  3  556  0.078  1.425  0.961  2.113
ON  rs1468712  37081017  4  581  0.157  1.315  0.900  1.920
ON  rs7639375  37082026  2  534  0.461  1.158  0.784  1.711
ON  rs7651033  37091046  2  534  0.244  1.268  0.850  1.891
ON  rs2302504  37091390  3  556  0.173  1.311  0.888  1.935
ON  rs17204801  37118438  3  579  0.159  1.313  0.899  1.917
ON  rs6800554  37118883  2  572  0.268  1.238  0.849  1.804
ON  rs336601  37159477  3  557  0.045  1.614  1.011  2.576
ON  rs6550458  37163322  4  552  0.105  1.378  0.935  2.032
ON  rs9869432  37171605  4  557  0.120  1.361  0.923  2.006
ON  rs9823617  37174354  1  544  0.101  1.399  0.936  2.091
ON  rs6786857  37180019  2  553  0.461  0.773  0.390  1.533
ON  rs1392749  37180195  2  581  0.157  1.315  0.900  1.920
ON  rs11711937  37199842  2  579  0.114  1.363  0.928  2.003
Combined  rs13098279  37207462  3  1648  0.000  0.626  0.494  0.794
NL  rs13098279  37207462  3  423  0.000  0.405  0.244  0.671
ON  rs13098279  37207462  3  744  0.017  0.643  0.447  0.923
SEA  rs13098279  37207462  3  481  0.241  0.782  0.519  1.180
ON  rs9874437  37241682  2  557  0.261  1.244  0.850  1.821
ON  rs7639607  37244043  4  572  0.261  1.240  0.852  1.806
ON  rs4678562  37257074  2  540  0.257  1.253  0.848  1.849
ON  rs6800842  37264608  3  581  0.211  1.272  0.873  1.854
ON  rs17036181  37277760  4  573  0.733  0.858  0.355  2.070
NL  rs10510691  37500765  4  423  0.425  0.768  0.402  1.468
SEA  rs10510691  37500765  4  481  0.058  0.641  0.404  1.016

SEA = Seattle, NL = Newfoundland, ON = Ontario; Alleles: 1 = A, 2 = C, 3 = G, 4 = T; MSI = Tumour MSI status; OR = odds ratio; L95% = lower limit of the 95% confidence interval; U95% = upper limit of the 95% confidence interval


Table A.5 Summaries of Logistic Regression Models

Location  SNP  Model  Formula  AIC  Resid Dev  df  Sample Size
NL  NA  1  msi ~ AgeDiag  257.89  253.89  412  414
NL  NA  2  msi ~ sex  253.05  249.05  412  414
NL  NA  3  msi ~ AgeDiag + sex  255.07  249.07  411  414
NL  NA  4  msi ~ AgeDiag + sex + ihc  99.66  91.66  410  414
NL  NA  5  msi ~ AgeDiag + sex + methylation  159.38  151.38  410  414
NL  NA  6  msi ~ AgeDiag + sex + ihc + methylation  93.77  83.77  409  414
NL  rs1800734  7  msi ~ snp1  243.46  239.46  412  414
NL  rs1800734  8  msi ~ AgeDiag + sex + snp1  241.94  233.94  410  414
NL  rs1800734  9  msi ~ AgeDiag + sex + ihc + snp1  101.59  91.59  409  414
NL  rs1800734  10  msi ~ AgeDiag + sex + methylation + snp1  161.35  151.35  409  414
NL  rs1800734  11  msi ~ AgeDiag + sex + ihc + methylation + snp1  95.82  83.82  408  414
NL  rs749072  12  msi ~ snp2  247.85  243.85  412  414
NL  rs749072  13  msi ~ AgeDiag + sex + snp2  246.45  238.45  410  414
NL  rs749072  14  msi ~ AgeDiag + sex + ihc + snp2  101.78  91.78  409  414
NL  rs749072  15  msi ~ AgeDiag + sex + methylation + snp2  161.47  151.47  409  414
NL  rs749072  16  msi ~ AgeDiag + sex + ihc + methylation + snp2  95.62  83.62  408  414
NL  rs13098279  17  msi ~ snp3  246.69  242.69  412  414
NL  rs13098279  18  msi ~ AgeDiag + sex + snp3  245.40  237.40  410  414
NL  rs13098279  19  msi ~ AgeDiag + sex + ihc + snp3  101.82  91.82  409  414
NL  rs13098279  20  msi ~ AgeDiag + sex + methylation + snp3  161.47  151.47  409  414
NL  rs13098279  21  msi ~ AgeDiag + sex + ihc + methylation + snp3  95.09  83.09  408  414
SEA  NA  1  msi ~ AgeDiag  247.28  243.28  187  189
SEA  NA  2  msi ~ sex  243.16  239.16  187  189
SEA  NA  3  msi ~ AgeDiag + sex  244.46  238.46  186  189
SEA  NA  4  msi ~ AgeDiag + sex + ihc  47.06  39.06  185  189
SEA  NA  5  msi ~ AgeDiag + sex + methylation  125.40  117.40  185  189
SEA  NA  6  msi ~ AgeDiag + sex + ihc + methylation  47.06  37.06  184  189
SEA  rs1800734  7  msi ~ snp1  245.59  241.59  187  189
SEA  rs1800734  8  msi ~ AgeDiag + sex + snp1  244.21  236.21  185  189
SEA  rs1800734  9  msi ~ AgeDiag + sex + ihc + snp1  48.68  38.68  184  189
SEA  rs1800734  10  msi ~ AgeDiag + sex + methylation + snp1  127.47  117.47  184  189
SEA  rs1800734  11  msi ~ AgeDiag + sex + ihc + methylation + snp1  48.77  36.77  183  189
SEA  rs749072  12  msi ~ snp2  242.59  238.59  187  189
SEA  rs749072  13  msi ~ AgeDiag + sex + snp2  241.81  233.81  185  189
SEA  rs749072  14  msi ~ AgeDiag + sex + ihc + snp2  48.43  38.43  184  189
SEA  rs749072  15  msi ~ AgeDiag + sex + methylation + snp2  127.33  117.33  184  189
SEA  rs749072  16  msi ~ AgeDiag + sex + ihc + methylation + snp2  48.58  36.58  183  189
SEA  rs13098279  17  msi ~ snp3  246.90  242.90  187  189
SEA  rs13098279  18  msi ~ AgeDiag + sex + snp3  245.23  237.23  185  189
SEA  rs13098279  19  msi ~ AgeDiag + sex + ihc + snp3  48.73  38.73  184  189
SEA  rs13098279  20  msi ~ AgeDiag + sex + methylation + snp3  127.47  117.47  184  189
SEA  rs13098279  21  msi ~ AgeDiag + sex + ihc + methylation + snp3  48.80  36.80  183  189
ON  NA  1  msi ~ AgeDiag  417.50  413.50  524  526
ON  NA  2  msi ~ sex  410.67  406.67  524  526
ON  NA  3  msi ~ AgeDiag + sex  409.13  403.13  523  526
ON  NA  4  msi ~ AgeDiag + sex + ihc  96.71  88.71  522  526
ON  NA  5  msi ~ AgeDiag + sex + methylation  195.47  187.47  522  526
ON  NA  6  msi ~ AgeDiag + sex + ihc + methylation  95.35  85.35  521  526
ON  rs1800734  7  msi ~ snp1  418.13  414.13  524  526
ON  rs1800734  8  msi ~ AgeDiag + sex + snp1  409.45  401.45  522  526
ON  rs1800734  9  msi ~ AgeDiag + sex + ihc + snp1  98.67  88.67  521  526
ON  rs1800734  10  msi ~ AgeDiag + sex + methylation + snp1  197.03  187.03  521  526
ON  rs1800734  11  msi ~ AgeDiag + sex + ihc + methylation + snp1  97.56  85.56  520  526
ON  rs749072  12  msi ~ snp2  414.57  410.57  524  526
ON  rs749072  13  msi ~ AgeDiag + sex + snp2  405.82  397.82  522  526
ON  rs749072  14  msi ~ AgeDiag + sex + ihc + snp2  97.39  87.39  521  526
ON  rs749072  15  msi ~ AgeDiag + sex + methylation + snp2  197.43  187.43  521  526
ON  rs749072  16  msi ~ AgeDiag + sex + ihc + methylation + snp2  96.67  84.67  520  526
ON  rs13098279  17  msi ~ snp3  419.60  415.60  524  526
ON  rs13098279  18  msi ~ AgeDiag + sex + snp3  410.76  402.76  522  526
ON  rs13098279  19  msi ~ AgeDiag + sex + ihc + snp3  98.90  88.90  521  526
ON  rs13098279  20  msi ~ AgeDiag + sex + methylation + snp3  195.44  185.44  521  526
ON  rs13098279  21  msi ~ AgeDiag + sex + ihc + methylation + snp3  97.39  85.39  520  526
Combined  NA  0  msi ~ Location  920.79  914.79  1126  1129
Combined  NA  1  msi ~ Location + AgeDiag  920.11  912.11  1125  1129
Combined  NA  2  msi ~ Location + sex  902.87  894.87  1125  1129
Combined  NA  3  msi ~ Location + AgeDiag + sex  902.17  892.17  1124  1129
Combined  NA  4  msi ~ Location + AgeDiag + sex + ihc  238.72  226.72  1123  1129
Combined  NA  5  msi ~ Location + AgeDiag + sex + methylation  470.64  458.64  1123  1129
Combined  NA  6  msi ~ Location + AgeDiag + sex + ihc + methylation  225.12  211.12  1122  1129
Combined  rs1800734  7  msi ~ Location + snp1  907.84  899.84  1125  1129
Combined  rs1800734  8  msi ~ Location + AgeDiag + sex + snp1  890.89  878.89  1123  1129
Combined  rs1800734  9  msi ~ Location + AgeDiag + sex + ihc + snp1  240.83  226.83  1122  1129
Combined  rs1800734  10  msi ~ Location + AgeDiag + sex + methylation + snp1  472.63  458.63  1122  1129
Combined  rs1800734  11  msi ~ Location + AgeDiag + sex + ihc + methylation + snp1  227.05  211.05  1121  1129
Combined  rs749072  12  msi ~ Location + snp2  902.18  894.18  1125  1129
Combined  rs749072  13  msi ~ Location + AgeDiag + sex + snp2  885.46  873.46  1123  1129
Combined  rs749072  14  msi ~ Location + AgeDiag + sex + ihc + snp2  240.72  226.72  1122  1129
Combined  rs749072  15  msi ~ Location + AgeDiag + sex + methylation + snp2  472.56  458.56  1122  1129
Combined  rs749072  16  msi ~ Location + AgeDiag + sex + ihc + methylation + snp2  227.23  211.23  1121  1129
Combined  rs13098279  17  msi ~ Location + snp3
913.82 905.82 1125 1129 Combined rs13098279 18 msi ~ Location + AgeDiag + sex + snp3 896.57 884.57 1123 1129 Combined rs13098279 19 msi ~ Location + AgeDiag + sex + ihc + snp3 240.41 226.41 1122 1129 Combined rs13098279 20 msi ~ Location + AgeDiag + sex + methylation + 471.93 457.93 1122 1129 snp3 Combined rs13098279 21 msi ~ Location + AgeDiag + sex + ihc + 225.94 209.94 1121 1129 methylation + snp3 SEA = Seattle, NL = Newfoundland, ON = Ontario, Snp1 = rs1800734, Snp2 = rs749072, Snp3 = rs13098279, msi = Tumour MSI status, ihc = Tumour MLH1 IHC status, methylation = Tumour MLH1 promoter methylation status, AIC = Akaike’s Information Criterion, df = degrees of freedom, NA = not applicable
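The columns of Table A.5 are linked by simple identities that can be used as a sanity check when reading the table. The following is a minimal Python sketch, not the analysis code used in this thesis; the helper names are hypothetical. It assumes the models are ordinary binary logistic regressions, for which AIC equals the residual deviance plus 2k, where k is the number of estimated coefficients (sample size minus residual df). It also illustrates how two nested models, here Newfoundland models 3 and 8, could be compared with a likelihood-ratio test on one degree of freedom.

```python
import math


def n_coefficients(sample_size, resid_df):
    """Number of estimated coefficients implied by the residual df."""
    return sample_size - resid_df


def aic_from_deviance(resid_dev, k):
    """AIC of a binary logistic model: residual deviance plus 2k (assumed identity)."""
    return resid_dev + 2 * k


def lrt_p_value_1df(dev_reduced, dev_full):
    """Likelihood-ratio test p-value for nested models that differ by 1 df.

    Uses the identity P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    x = max(dev_reduced - dev_full, 0.0)
    return math.erfc(math.sqrt(x / 2.0))


# Values copied from Table A.5, Newfoundland (NL) rows.
k1 = n_coefficients(414, 412)         # model 1: intercept + AgeDiag
aic1 = aic_from_deviance(253.89, k1)  # the table reports 257.89

# Model 3 (AgeDiag + sex) versus model 8 (AgeDiag + sex + snp1).
p_snp1 = lrt_p_value_1df(249.07, 233.94)

print(k1, round(aic1, 2), p_snp1)
```

The same check can be applied to any row; a lower AIC in the model that adds a SNP term indicates that the SNP improves the fit beyond the penalty for the extra parameter.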

Table A.6 Details of Logistic Regression Models

Location  SNP         Covariate        Model Number  Estimate  Std. Error  z value  P-value
ON        NA          (Intercept)      1             -3.40     0.99         -3.44   0.001
ON        NA          AgeDiag          1              0.03     0.02          1.60   0.109
ON        NA          (Intercept)      2             -2.28     0.21        -10.97   0.000
ON        NA          sexTRUE          2              0.80     0.26          3.02   0.003
ON        NA          (Intercept)      3             -4.05     1.02         -3.97   0.000
ON        NA          AgeDiag          3              0.03     0.02          1.79   0.073
ON        NA          sexTRUE          3              0.83     0.27          3.13   0.002
ON        NA          (Intercept)      4             -1.66     2.00         -0.83   0.404
ON        NA          AgeDiag          4             -0.04     0.03         -1.16   0.246
ON        NA          sexTRUE          4              0.33     0.62          0.53   0.594
ON        NA          ihcTRUE          4              8.70     1.44          6.03   0.000
ON        NA          (Intercept)      5             -1.29     1.40         -0.92   0.356
ON        NA          AgeDiag          5             -0.03     0.02         -1.28   0.202
ON        NA          sexTRUE          5              0.00     0.42          0.01   0.994
ON        NA          methylationTRUE  5              5.88     0.63          9.29   0.000
ON        NA          (Intercept)      6             -1.45     2.00         -0.73   0.467
ON        NA          AgeDiag          6             -0.04     0.03         -1.26   0.209
ON        NA          sexTRUE          6              0.13     0.64          0.21   0.836
ON        NA          ihcTRUE          6              7.16     1.40          5.09   0.000
ON        NA          methylationTRUE  6              2.99     1.15          2.60   0.009
ON        rs1800734   (Intercept)      7             -1.38     0.32         -4.30   0.000
ON        rs1800734   snp1             7             -0.31     0.20         -1.53   0.125
ON        rs1800734   (Intercept)      8             -3.64     1.06         -3.43   0.001
ON        rs1800734   AgeDiag          8              0.03     0.02          1.81   0.070
ON        rs1800734   sexTRUE          8              0.81     0.27          3.03   0.002
ON        rs1800734   snp1             8             -0.28     0.21         -1.34   0.181
ON        rs1800734   (Intercept)      9             -1.16     2.13         -0.54   0.587
ON        rs1800734   AgeDiag          9             -0.04     0.03         -1.17   0.241
ON        rs1800734   sexTRUE          9              0.30     0.61          0.49   0.628
ON        rs1800734   ihcTRUE          9              8.62     1.42          6.08   0.000
ON        rs1800734   snp1             9             -0.29     0.48         -0.61   0.543
ON        rs1800734   (Intercept)      10            -1.69     1.50         -1.13   0.260
ON        rs1800734   AgeDiag          10            -0.03     0.02         -1.29   0.197
ON        rs1800734   sexTRUE          10             0.02     0.42          0.06   0.954
ON        rs1800734   methylationTRUE  10             5.90     0.64          9.27   0.000
ON        rs1800734   snp1             10             0.24     0.36          0.68   0.499
ON        rs1800734   (Intercept)      11            -1.20     2.16         -0.55   0.579
ON        rs1800734   AgeDiag          11            -0.04     0.03         -1.27   0.203
ON        rs1800734   sexTRUE          11             0.13     0.63          0.20   0.840
ON        rs1800734   ihcTRUE          11             7.09     1.38          5.15   0.000
ON        rs1800734   methylationTRUE  11             2.84     1.18          2.42   0.016
ON        rs1800734   snp1             11            -0.14     0.51         -0.27   0.789
ON        rs749072    (Intercept)      12            -1.19     0.29         -4.17   0.000
ON        rs749072    snp2             12            -0.47     0.19         -2.45   0.014
ON        rs749072    (Intercept)      13            -3.46     1.05         -3.29   0.001
ON        rs749072    AgeDiag          13             0.03     0.02          1.85   0.065
ON        rs749072    sexTRUE          13             0.80     0.27          3.01   0.003
ON        rs749072    snp2             13            -0.46     0.20         -2.34   0.019
ON        rs749072    (Intercept)      14            -0.66     2.12         -0.31   0.754
ON        rs749072    AgeDiag          14            -0.04     0.03         -1.19   0.235
ON        rs749072    sexTRUE          14             0.27     0.61          0.44   0.662
ON        rs749072    ihcTRUE          14             8.70     1.44          6.05   0.000
ON        rs749072    snp2             14            -0.61     0.44         -1.39   0.166
ON        rs749072    (Intercept)      15            -1.10     1.48         -0.74   0.458
ON        rs749072    AgeDiag          15            -0.03     0.02         -1.27   0.203
ON        rs749072    sexTRUE          15             0.00     0.42          0.00   0.998
ON        rs749072    methylationTRUE  15             5.83     0.63          9.23   0.000
ON        rs749072    snp2             15            -0.12     0.32         -0.39   0.699
ON        rs749072    (Intercept)      16            -0.58     2.12         -0.27   0.786
ON        rs749072    AgeDiag          16            -0.04     0.03         -1.30   0.193
ON        rs749072    sexTRUE          16             0.09     0.63          0.14   0.885
ON        rs749072    ihcTRUE          16             7.19     1.39          5.18   0.000
ON        rs749072    methylationTRUE  16             2.64     1.15          2.30   0.021
ON        rs749072    snp2             16            -0.51     0.46         -1.11   0.265
ON        rs13098279  (Intercept)      17            -1.56     0.34         -4.61   0.000
ON        rs13098279  snp3             17            -0.19     0.21         -0.90   0.366
ON        rs13098279  (Intercept)      18            -3.83     1.07         -3.58   0.000
ON        rs13098279  AgeDiag          18             0.03     0.02          1.80   0.072
ON        rs13098279  sexTRUE          18             0.82     0.27          3.07   0.002
ON        rs13098279  snp3             18            -0.14     0.21         -0.65   0.514
ON        rs13098279  (Intercept)      19            -1.57     2.15         -0.73   0.465
ON        rs13098279  AgeDiag          19            -0.04     0.03         -1.17   0.241
ON        rs13098279  sexTRUE          19             0.33     0.61          0.55   0.583
ON        rs13098279  ihcTRUE          19             8.60     1.41          6.10   0.000
ON        rs13098279  snp3             19            -0.04     0.51         -0.08   0.933
ON        rs13098279  (Intercept)      20            -2.21     1.52         -1.45   0.147
ON        rs13098279  AgeDiag          20            -0.03     0.02         -1.29   0.198
ON        rs13098279  sexTRUE          20             0.05     0.42          0.12   0.908
ON        rs13098279  methylationTRUE  20             6.00     0.65          9.20   0.000
ON        rs13098279  snp3             20             0.53     0.39          1.37   0.171
ON        rs13098279  (Intercept)      21            -1.74     2.19         -0.79   0.428
ON        rs13098279  AgeDiag          21            -0.04     0.03         -1.26   0.209
ON        rs13098279  sexTRUE          21             0.17     0.63          0.26   0.794
ON        rs13098279  ihcTRUE          21             7.04     1.38          5.11   0.000
ON        rs13098279  methylationTRUE  21             3.06     1.21          2.53   0.011
ON        rs13098279  snp3             21             0.16     0.55          0.29   0.768
NL        NA          (Intercept)      1             -2.00     1.16         -1.73   0.084
NL        NA          AgeDiag          1              0.00     0.02         -0.24   0.813
NL        NA          (Intercept)      2             -2.61     0.25        -10.53   0.000
NL        NA          sexTRUE          2              0.75     0.34          2.20   0.028
NL        NA          (Intercept)      3             -2.47     1.19         -2.08   0.038
NL        NA          AgeDiag          3              0.00     0.02         -0.11   0.909
NL        NA          sexTRUE          3              0.75     0.34          2.20   0.028
NL        NA          (Intercept)      4             -2.06     1.90         -1.08   0.278
NL        NA          AgeDiag          4             -0.03     0.03         -0.98   0.329
NL        NA          sexTRUE          4              0.48     0.59          0.81   0.417
NL        NA          ihcTRUE          4              6.17     0.75          8.25   0.000
NL        NA          (Intercept)      5             -0.73     1.43         -0.51   0.608
NL        NA          AgeDiag          5             -0.04     0.02         -1.61   0.107
NL        NA          sexTRUE          5              0.03     0.47          0.06   0.955
NL        NA          methylationTRUE  5              5.32     0.76          7.03   0.000
NL        NA          (Intercept)      6             -1.63     1.85         -0.88   0.378
NL        NA          AgeDiag          6             -0.04     0.03         -1.22   0.222
NL        NA          sexTRUE          6              0.18     0.64          0.28   0.779
NL        NA          ihcTRUE          6              5.22     0.78          6.68   0.000
NL        NA          methylationTRUE  6              3.28     1.15          2.85   0.004
NL        rs1800734   (Intercept)      7             -0.86     0.37         -2.35   0.019
NL        rs1800734   snp1             7             -0.99     0.25         -3.88   0.000
NL        rs1800734   (Intercept)      8             -0.80     1.30         -0.61   0.540
NL        rs1800734   AgeDiag          8             -0.01     0.02         -0.31   0.755
NL        rs1800734   sexTRUE          8              0.80     0.35          2.31   0.021
NL        rs1800734   snp1             8             -1.01     0.26         -3.94   0.000
NL        rs1800734   (Intercept)      9             -1.50     2.07         -0.73   0.467
NL        rs1800734   AgeDiag          9             -0.03     0.03         -1.02   0.309
NL        rs1800734   sexTRUE          9              0.49     0.59          0.84   0.401
NL        rs1800734   ihcTRUE          9              5.96     0.75          7.97   0.000
NL        rs1800734   snp1             9             -0.30     0.48         -0.62   0.537
NL        rs1800734   (Intercept)      10            -0.43     1.57         -0.28   0.782
NL        rs1800734   AgeDiag          10            -0.04     0.02         -1.62   0.105
NL        rs1800734   sexTRUE          10             0.04     0.47          0.08   0.934
NL        rs1800734   methylationTRUE  10             5.16     0.79          6.54   0.000
NL        rs1800734   snp1             10            -0.17     0.39         -0.45   0.655
NL        rs1800734   (Intercept)      11            -1.87     2.11         -0.88   0.376
NL        rs1800734   AgeDiag          11            -0.04     0.03         -1.19   0.236
NL        rs1800734   sexTRUE          11             0.16     0.63          0.26   0.798
NL        rs1800734   ihcTRUE          11             5.17     0.78          6.59   0.000
NL        rs1800734   methylationTRUE  11             3.36     1.23          2.73   0.006
NL        rs1800734   snp1             11             0.12     0.52          0.23   0.816
NL        rs749072    (Intercept)      12            -1.14     0.36         -3.19   0.001
NL        rs749072    snp2             12            -0.81     0.25         -3.25   0.001
NL        rs749072    (Intercept)      13            -1.14     1.27         -0.90   0.370
NL        rs749072    AgeDiag          13            -0.01     0.02         -0.27   0.790
NL        rs749072    sexTRUE          13             0.79     0.34          2.29   0.022
NL        rs749072    snp2             13            -0.83     0.25         -3.32   0.001
NL        rs749072    (Intercept)      14            -1.77     2.03         -0.87   0.384
NL        rs749072    AgeDiag          14            -0.03     0.03         -1.00   0.318
NL        rs749072    sexTRUE          14             0.48     0.58          0.83   0.409
NL        rs749072    ihcTRUE          14             6.02     0.75          8.01   0.000
NL        rs749072    snp2             14            -0.16     0.47         -0.34   0.734
NL        rs749072    (Intercept)      15            -0.72     1.54         -0.47   0.639
NL        rs749072    AgeDiag          15            -0.04     0.02         -1.61   0.108
NL        rs749072    sexTRUE          15             0.03     0.47          0.05   0.956
NL        rs749072    methylationTRUE  15             5.27     0.79          6.64   0.000
NL        rs749072    snp2             15            -0.01     0.38         -0.02   0.984
NL        rs749072    (Intercept)      16            -2.05     2.08         -0.99   0.324
NL        rs749072    AgeDiag          16            -0.04     0.03         -1.17   0.243
NL        rs749072    sexTRUE          16             0.16     0.63          0.26   0.794
NL        rs749072    ihcTRUE          16             5.20     0.79          6.59   0.000
NL        rs749072    methylationTRUE  16             3.45     1.23          2.80   0.005
NL        rs749072    snp2             16             0.23     0.51          0.44   0.658
NL        rs13098279  (Intercept)      17            -1.00     0.38         -2.67   0.008
NL        rs13098279  snp3             17            -0.87     0.25         -3.45   0.001
NL        rs13098279  (Intercept)      18            -0.97     1.30         -0.75   0.454
NL        rs13098279  AgeDiag          18            -0.01     0.02         -0.29   0.774
NL        rs13098279  sexTRUE          18             0.78     0.34          2.27   0.023
NL        rs13098279  snp3             18            -0.89     0.25         -3.50   0.000
NL        rs13098279  (Intercept)      19            -2.04     2.11         -0.97   0.333
NL        rs13098279  AgeDiag          19            -0.03     0.03         -0.97   0.331
NL        rs13098279  sexTRUE          19             0.47     0.58          0.81   0.421
NL        rs13098279  ihcTRUE          19             6.08     0.77          7.89   0.000
NL        rs13098279  snp3             19            -0.01     0.50         -0.01   0.991
NL        rs13098279  (Intercept)      20            -0.71     1.58         -0.45   0.653
NL        rs13098279  AgeDiag          20            -0.04     0.02         -1.61   0.108
NL        rs13098279  sexTRUE          20             0.03     0.47          0.05   0.957
NL        rs13098279  methylationTRUE  20             5.27     0.80          6.62   0.000
NL        rs13098279  snp3             20            -0.01     0.40         -0.03   0.975
NL        rs13098279  (Intercept)      21            -2.51     2.17         -1.16   0.248
NL        rs13098279  AgeDiag          21            -0.03     0.03         -1.11   0.266
NL        rs13098279  sexTRUE          21             0.17     0.63          0.27   0.784
NL        rs13098279  ihcTRUE          21             5.29     0.82          6.46   0.000
NL        rs13098279  methylationTRUE  21             3.64     1.27          2.86   0.004
NL        rs13098279  snp3             21             0.42     0.56          0.76   0.447
SEA       NA          (Intercept)      1             -1.65     0.98         -1.69   0.092
SEA       NA          AgeDiag          1              0.02     0.02          1.07   0.283
SEA       NA          (Intercept)      2             -1.17     0.30         -3.84   0.000
SEA       NA          sexTRUE          2              0.78     0.35          2.20   0.028
SEA       NA          (Intercept)      3             -1.91     1.00         -1.91   0.056
SEA       NA          AgeDiag          3              0.01     0.02          0.80   0.425
SEA       NA          sexTRUE          3              0.74     0.35          2.08   0.037
SEA       NA          (Intercept)      4             -0.02     2.11         -0.01   0.993
SEA       NA          AgeDiag          4             -0.04     0.04         -1.06   0.288
SEA       NA          sexTRUE          4             -1.57     0.94         -1.67   0.094
SEA       NA          ihcTRUE          4              8.76     1.64          5.34   0.000
SEA       NA          (Intercept)      5              1.07     1.32          0.81   0.417
SEA       NA          AgeDiag          5             -0.05     0.02         -2.26   0.024
SEA       NA          sexTRUE          5             -0.12     0.52         -0.23   0.817
SEA       NA          methylationTRUE  5              4.91     0.64          7.64   0.000
SEA       NA          (Intercept)      6              0.59     2.12          0.28   0.782
SEA       NA          AgeDiag          6             -0.05     0.04         -1.42   0.156
SEA       NA          sexTRUE          6             -1.49     0.93         -1.60   0.109
SEA       NA          ihcTRUE          6              7.23     1.58          4.59   0.000
SEA       NA          methylationTRUE  6              2.32     1.12          2.07   0.039
SEA       rs1800734   (Intercept)      7             -0.01     0.38         -0.04   0.972
SEA       rs1800734   snp1             7             -0.41     0.24         -1.71   0.088
SEA       rs1800734   (Intercept)      8             -1.34     1.08         -1.24   0.214
SEA       rs1800734   AgeDiag          8              0.01     0.02          0.78   0.437
SEA       rs1800734   sexTRUE          8              0.69     0.36          1.94   0.052
SEA       rs1800734   snp1             8             -0.36     0.24         -1.48   0.139
SEA       rs1800734   (Intercept)      9             -0.83     2.40         -0.34   0.731
SEA       rs1800734   AgeDiag          9             -0.04     0.04         -1.09   0.277
SEA       rs1800734   sexTRUE          9             -1.53     0.91         -1.68   0.093
SEA       rs1800734   ihcTRUE          9              8.76     1.61          5.44   0.000
SEA       rs1800734   snp1             9              0.52     0.74          0.70   0.485
SEA       rs1800734   (Intercept)      10             1.15     1.46          0.79   0.431
SEA       rs1800734   AgeDiag          10            -0.05     0.02         -2.25   0.024
SEA       rs1800734   sexTRUE          10            -0.12     0.51         -0.23   0.815
SEA       rs1800734   methylationTRUE  10             4.85     0.64          7.60   0.000
SEA       rs1800734   snp1             10            -0.05     0.39         -0.13   0.895
SEA       rs1800734   (Intercept)      11            -0.23     2.46         -0.09   0.927
SEA       rs1800734   AgeDiag          11            -0.05     0.04         -1.42   0.157
SEA       rs1800734   sexTRUE          11            -1.46     0.91         -1.61   0.108
SEA       rs1800734   ihcTRUE          11             7.28     1.58          4.60   0.000
SEA       rs1800734   methylationTRUE  11             2.22     1.10          2.01   0.044
SEA       rs1800734   snp1             11             0.48     0.74          0.65   0.514
SEA       rs749072    (Intercept)      12             0.16     0.35          0.45   0.654
SEA       rs749072    snp2             12            -0.55     0.23         -2.39   0.017
SEA       rs749072    (Intercept)      13            -1.14     1.07         -1.06   0.289
SEA       rs749072    AgeDiag          13             0.01     0.02          0.76   0.446
SEA       rs749072    sexTRUE          13             0.66     0.36          1.83   0.068
SEA       rs749072    snp2             13            -0.49     0.23         -2.10   0.035
SEA       rs749072    (Intercept)      14            -1.06     2.42         -0.44   0.661
SEA       rs749072    AgeDiag          14            -0.04     0.04         -1.06   0.288
SEA       rs749072    sexTRUE          14            -1.53     0.91         -1.68   0.094
SEA       rs749072    ihcTRUE          14             8.90     1.66          5.36   0.000
SEA       rs749072    snp2             14             0.63     0.75          0.85   0.398
SEA       rs749072    (Intercept)      15             1.31     1.44          0.91   0.364
SEA       rs749072    AgeDiag          15            -0.05     0.02         -2.25   0.025
SEA       rs749072    sexTRUE          15            -0.13     0.51         -0.26   0.793
SEA       rs749072    methylationTRUE  15             4.83     0.64          7.55   0.000
SEA       rs749072    snp2             15            -0.16     0.37         -0.42   0.673
SEA       rs749072    (Intercept)      16            -0.44     2.47         -0.18   0.860
SEA       rs749072    AgeDiag          16            -0.05     0.04         -1.39   0.164
SEA       rs749072    sexTRUE          16            -1.46     0.91         -1.60   0.109
SEA       rs749072    ihcTRUE          16             7.38     1.62          4.56   0.000
SEA       rs749072    methylationTRUE  16             2.19     1.10          1.99   0.047
SEA       rs749072    snp2             16             0.58     0.74          0.78   0.438
SEA       rs13098279  (Intercept)      17            -0.15     0.39         -0.37   0.710
SEA       rs13098279  snp3             17            -0.31     0.24         -1.28   0.200
SEA       rs13098279  (Intercept)      18            -1.45     1.08         -1.33   0.182
SEA       rs13098279  AgeDiag          18             0.01     0.02          0.76   0.450
SEA       rs13098279  sexTRUE          18             0.71     0.36          2.01   0.044
SEA       rs13098279  snp3             18            -0.27     0.25         -1.10   0.271
SEA       rs13098279  (Intercept)      19            -0.79     2.41         -0.33   0.743
SEA       rs13098279  AgeDiag          19            -0.04     0.04         -1.08   0.279
SEA       rs13098279  sexTRUE          19            -1.55     0.91         -1.70   0.089
SEA       rs13098279  ihcTRUE          19             8.73     1.60          5.45   0.000
SEA       rs13098279  snp3             19             0.49     0.74          0.67   0.502
SEA       rs13098279  (Intercept)      20             1.16     1.47          0.79   0.430
SEA       rs13098279  AgeDiag          20            -0.05     0.02         -2.26   0.024
SEA       rs13098279  sexTRUE          20            -0.12     0.51         -0.23   0.817
SEA       rs13098279  methylationTRUE  20             4.85     0.64          7.62   0.000
SEA       rs13098279  snp3             20            -0.05     0.39         -0.14   0.888
SEA       rs13098279  (Intercept)      21            -0.18     2.46         -0.07   0.940
SEA       rs13098279  AgeDiag          21            -0.05     0.04         -1.41   0.157
SEA       rs13098279  sexTRUE          21            -1.48     0.91         -1.63   0.104
SEA       rs13098279  ihcTRUE          21             7.27     1.58          4.61   0.000
SEA       rs13098279  methylationTRUE  21             2.23     1.10          2.01   0.044
SEA       rs13098279  snp3             21             0.46     0.73          0.62   0.533
Combined  NA          (Intercept)      0             -2.28     0.17        -13.46   0.000
Combined  NA          LocationO        0              0.43     0.21          2.02   0.043
Combined  NA          LocationS        0              1.66     0.23          7.29   0.000
Combined  NA          (Intercept)      1             -3.22     0.62         -5.20   0.000
Combined  NA          LocationO        1              0.43     0.21          2.04   0.041
Combined  NA          LocationS        1              1.67     0.23          7.31   0.000
Combined  NA          AgeDiag          1              0.02     0.01          1.60   0.111
Combined  NA          (Intercept)      2             -2.64     0.20        -13.46   0.000
Combined  NA          LocationO        2              0.36     0.21          1.68   0.092
Combined  NA          LocationS        2              1.46     0.23          6.29   0.000
Combined  NA          sexTRUE          2              0.79     0.18          4.37   0.000
Combined  NA          (Intercept)      3             -3.59     0.63         -5.68   0.000
Combined  NA          LocationO        3              0.36     0.21          1.71   0.088
Combined  NA          LocationS        3              1.47     0.23          6.28   0.000
Combined  NA          AgeDiag          3              0.02     0.01          1.60   0.110
Combined  NA          sexTRUE          3              0.79     0.18          4.37   0.000
Combined  NA          (Intercept)      4             -1.79     1.25         -1.43   0.152
Combined  NA          LocationO        4              0.19     0.46          0.42   0.677
Combined  NA          LocationS        4              0.90     0.54          1.65   0.098
Combined  NA          AgeDiag          4             -0.04     0.02         -1.77   0.077
Combined  NA          sexTRUE          4              0.00     0.40         -0.01   0.994
Combined  NA          ihcTRUE          4              7.79     0.68         11.46   0.000
Combined  NA          (Intercept)      5             -0.58     0.83         -0.71   0.480
Combined  NA          LocationO        5              0.04     0.31          0.14   0.889
Combined  NA          LocationS        5              0.81     0.36          2.28   0.023
Combined  NA          AgeDiag          5             -0.04     0.01         -3.03   0.002
Combined  NA          sexTRUE          5             -0.04     0.27         -0.15   0.878
Combined  NA          methylationTRUE  5              5.56     0.40         14.01   0.000
Combined  NA          (Intercept)      6             -1.22     1.24         -0.98   0.325
Combined  NA          LocationO        6              0.15     0.46          0.32   0.752
Combined  NA          LocationS        6              0.63     0.55          1.13   0.256
Combined  NA          AgeDiag          6             -0.05     0.02         -2.25   0.025
Combined  NA          sexTRUE          6             -0.18     0.41         -0.45   0.656
Combined  NA          ihcTRUE          6              6.53     0.68          9.61   0.000
Combined  NA          methylationTRUE  6              3.03     0.66          4.60   0.000
Combined  rs1800734   (Intercept)      7             -1.50     0.25         -5.91   0.000
Combined  rs1800734   LocationO        7              0.41     0.21          1.92   0.054
Combined  rs1800734   LocationS        7              1.65     0.23          7.16   0.000
Combined  rs1800734   snp1             7             -0.52     0.13         -3.92   0.000
Combined  rs1800734   (Intercept)      8             -2.83     0.67         -4.23   0.000
Combined  rs1800734   LocationO        8              0.34     0.21          1.59   0.112
Combined  rs1800734   LocationS        8              1.45     0.24          6.15   0.000
Combined  rs1800734   AgeDiag          8              0.02     0.01          1.58   0.115
Combined  rs1800734   sexTRUE          8              0.76     0.18          4.20   0.000
Combined  rs1800734   snp1             8             -0.49     0.13         -3.69   0.000
Combined  rs1800734   (Intercept)      9             -1.74     1.36         -1.28   0.199
Combined  rs1800734   LocationO        9              0.19     0.46          0.42   0.672
Combined  rs1800734   LocationS        9              0.90     0.54          1.67   0.095
Combined  rs1800734   AgeDiag          9             -0.04     0.02         -1.77   0.076
Combined  rs1800734   sexTRUE          9              0.00     0.40         -0.01   0.994
Combined  rs1800734   ihcTRUE          9              7.74     0.68         11.45   0.000
Combined  rs1800734   snp1             9             -0.03     0.33         -0.08   0.937
Combined  rs1800734   (Intercept)      10            -0.65     0.90         -0.72   0.469
Combined  rs1800734   LocationO        10             0.04     0.30          0.14   0.886
Combined  rs1800734   LocationS        10             0.81     0.36          2.29   0.022
Combined  rs1800734   AgeDiag          10            -0.04     0.01         -3.03   0.002
Combined  rs1800734   sexTRUE          10            -0.04     0.27         -0.15   0.882
Combined  rs1800734   methylationTRUE  10             5.55     0.40         13.84   0.000
Combined  rs1800734   snp1             10             0.04     0.22          0.19   0.848
Combined  rs1800734   (Intercept)      11            -1.43     1.37         -1.04   0.298
Combined  rs1800734   LocationO        11             0.15     0.46          0.33   0.744
Combined  rs1800734   LocationS        11             0.62     0.55          1.12   0.264
Combined  rs1800734   AgeDiag          11            -0.05     0.02         -2.24   0.025
Combined  rs1800734   sexTRUE          11            -0.17     0.41         -0.41   0.679
Combined  rs1800734   ihcTRUE          11             6.50     0.67          9.63   0.000
Combined  rs1800734   methylationTRUE  11             3.06     0.67          4.56   0.000
Combined  rs1800734   snp1             11             0.12     0.34          0.36   0.720
Combined  rs749072    (Intercept)      12            -1.44     0.24         -5.99   0.000
Combined  rs749072    LocationO        12             0.40     0.21          1.86   0.062
Combined  rs749072    LocationS        12             1.64     0.23          7.12   0.000
Combined  rs749072    snp2             12            -0.58     0.13         -4.57   0.000
Combined  rs749072    (Intercept)      13            -2.76     0.66         -4.16   0.000
Combined  rs749072    LocationO        13             0.33     0.22          1.54   0.124
Combined  rs749072    LocationS        13             1.44     0.24          6.10   0.000
Combined  rs749072    AgeDiag          13             0.02     0.01          1.57   0.116
Combined  rs749072    sexTRUE          13             0.76     0.18          4.17   0.000
Combined  rs749072    snp2             13            -0.56     0.13         -4.35   0.000
Combined  rs749072    (Intercept)      14            -1.58     1.33         -1.19   0.235
Combined  rs749072    LocationO        14             0.19     0.46          0.41   0.681
Combined  rs749072    LocationS        14             0.89     0.54          1.66   0.097
Combined  rs749072    AgeDiag          14            -0.04     0.02         -1.78   0.076
Combined  rs749072    sexTRUE          14            -0.01     0.40         -0.01   0.989
Combined  rs749072    ihcTRUE          14             7.72     0.67         11.44   0.000
Combined  rs749072    snp2             14            -0.13     0.31         -0.41   0.683
Combined  rs749072    (Intercept)      15            -0.46     0.89         -0.52   0.604
Combined  rs749072    LocationO        15             0.04     0.31          0.14   0.891
Combined  rs749072    LocationS        15             0.81     0.36          2.28   0.022
Combined  rs749072    AgeDiag          15            -0.04     0.01         -3.03   0.002
Combined  rs749072    sexTRUE          15            -0.04     0.27         -0.16   0.874
Combined  rs749072    methylationTRUE  15             5.52     0.40         13.81   0.000
Combined  rs749072    snp2             15            -0.08     0.21         -0.38   0.702
Combined  rs749072    (Intercept)      16            -1.20     1.34         -0.89   0.372
Combined  rs749072    LocationO        16             0.15     0.46          0.32   0.748
Combined  rs749072    LocationS        16             0.63     0.55          1.14   0.253
Combined  rs749072    AgeDiag          16            -0.05     0.02         -2.26   0.024
Combined  rs749072    sexTRUE          16            -0.18     0.41         -0.45   0.653
Combined  rs749072    ihcTRUE          16             6.48     0.67          9.64   0.000
Combined  rs749072    methylationTRUE  16             3.01     0.66          4.54   0.000
Combined  rs749072    snp2             16            -0.01     0.32         -0.02   0.982
Combined  rs13098279  (Intercept)      17            -1.65     0.26         -6.37   0.000
Combined  rs13098279  LocationO        17             0.42     0.21          1.96   0.050
Combined  rs13098279  LocationS        17             1.65     0.23          7.22   0.000
Combined  rs13098279  snp3             17            -0.41     0.13         -3.05   0.002
Combined  rs13098279  (Intercept)      18            -2.97     0.67         -4.44   0.000
Combined  rs13098279  LocationO        18             0.35     0.21          1.63   0.104
Combined  rs13098279  LocationS        18             1.46     0.23          6.22   0.000
Combined  rs13098279  AgeDiag          18             0.02     0.01          1.55   0.121
Combined  rs13098279  sexTRUE          18             0.77     0.18          4.24   0.000
Combined  rs13098279  snp3             18            -0.38     0.14         -2.80   0.005
Combined  rs13098279  (Intercept)      19            -2.14     1.38         -1.56   0.120
Combined  rs13098279  LocationO        19             0.20     0.46          0.44   0.662
Combined  rs13098279  LocationS        19             0.90     0.54          1.67   0.094
Combined  rs13098279  AgeDiag          19            -0.04     0.02         -1.77   0.077
Combined  rs13098279  sexTRUE          19             0.01     0.40          0.02   0.986
Combined  rs13098279  ihcTRUE          19             7.80     0.68         11.44   0.000
Combined  rs13098279  snp3             19             0.21     0.35          0.60   0.550
Combined  rs13098279  (Intercept)      20            -0.91     0.91         -0.99   0.320
Combined  rs13098279  LocationO        20             0.04     0.30          0.14   0.889
Combined  rs13098279  LocationS        20             0.81     0.35          2.28   0.023
Combined  rs13098279  AgeDiag          20            -0.04     0.01         -3.02   0.003
Combined  rs13098279  sexTRUE          20            -0.04     0.27         -0.13   0.896
Combined  rs13098279  methylationTRUE  20             5.60     0.40         13.85   0.000
Combined  rs13098279  snp3             20             0.19     0.23          0.82   0.410
Combined  rs13098279  (Intercept)      21            -1.92     1.40         -1.37   0.172
Combined  rs13098279  LocationO        21             0.15     0.46          0.32   0.749
Combined  rs13098279  LocationS        21             0.58     0.55          1.05   0.293
Combined  rs13098279  AgeDiag          21            -0.04     0.02         -2.21   0.027
Combined  rs13098279  sexTRUE          21            -0.14     0.41         -0.34   0.736
Combined  rs13098279  ihcTRUE          21             6.55     0.68          9.57   0.000
Combined  rs13098279  methylationTRUE  21             3.17     0.69          4.63   0.000
Combined  rs13098279  snp3             21             0.39     0.37          1.05   0.294

SEA = Seattle, NL = Newfoundland, ON = Ontario, Snp1 = rs1800734, Snp2 = rs749072, Snp3 = rs13098279, msi = Tumour MSI status, ihc = Tumour MLH1 IHC status, methylation = Tumour MLH1 promoter methylation status, NA = not applicable
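The estimates in Table A.6 are on the log-odds scale. As a reading aid, the sketch below converts one coefficient into an odds ratio with a Wald 95% confidence interval and recomputes the z value and two-sided p-value. It is an illustration in Python with a hypothetical helper name, not the code used for the thesis analyses, and it assumes the reported p-values are Wald tests. Because the table entries are rounded to two decimals, the recomputed z (about -3.88) differs slightly from the printed -3.94. The genotype coding of the SNP variables is not stated in the table, so the direction of the effect should be read together with the coding described in the main text.

```python
import math


def wald_summary(estimate, std_error, z_crit=1.96):
    """Odds ratio, Wald confidence interval, z value and two-sided p-value.

    estimate and std_error are on the log-odds scale, as in Table A.6.
    z_crit = 1.96 gives an approximate 95% interval.
    """
    z = estimate / std_error
    # Two-sided normal tail probability: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "odds_ratio": math.exp(estimate),
        "ci_low": math.exp(estimate - z_crit * std_error),
        "ci_high": math.exp(estimate + z_crit * std_error),
        "z": z,
        "p_value": p_value,
    }


# Newfoundland, rs1800734, model 8 (msi ~ AgeDiag + sex + snp1), snp1 row.
summary = wald_summary(-1.01, 0.26)
for name, value in summary.items():
    print(f"{name}: {value:.4g}")
```

Entries printed as 0.000 in the P-value column correspond to p < 0.0005 after rounding, which is consistent with the value recomputed here.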

REFERENCES

1. Gervaz P, Bucher P, Morel P. Two colons-two cancers: paradigm shift and clinical implications. J Surg Oncol 2004;88(4):261-6. 2. Bleday R, Braidt J, Ruoff K, Shellito PC, Ackroyd FW. Quantitative cultures of the mucosal-associated bacteria in the mechanically prepared colon and rectum. Dis Colon Rectum 1993;36(9):844-9. 3. McMichael AJ, Potter JD. Host factors in carcinogenesis: certain bile-acid metabolic profiles that selectively increase the risk of proximal colon cancer. J Natl Cancer Inst 1985;75(2):185-91. 4. Feinberg AP, Ohlsson R, Henikoff S. The epigenetic progenitor origin of human cancer. Nat Rev Genet 2006;7(1):21-33. 5. Booth C, Brady G, Potten CS. Crowd control in the crypt. Nat Med 2002;8(12):1360-1. 6. Potten CS, Roberts SA, Chwalinski S, Loeffler M, Paulus U. Scoring mitotic activity in longitudinal sections of crypts of the small intestine. Cell Tissue Kinet 1988;21(4):231-46. 7. Grady WM, Carethers JM. Genomic and epigenetic instability in colorectal cancer pathogenesis. Gastroenterology 2008;135(4):1079-99. 8. Kinzler KW, Vogelstein B. Lessons from hereditary colorectal cancer. Cell 1996;87(2):159-70. 9. Neugut AI, Jacobson JS, De Vivo I. Epidemiology of colorectal adenomatous polyps. Cancer Epidemiol Biomarkers Prev 1993;2(2):159-76. 10. Jemal A, Siegel R, Ward E, Murray T, Xu J, Thun MJ. Cancer statistics, 2007. CA Cancer J Clin 2007;57(1):43-66. 11. Winawer S, Fletcher R, Rex D, Bond J, Burt R, Ferrucci J, et al. Colorectal cancer screening and surveillance: clinical guidelines and rationale-Update based on new evidence. Gastroenterology 2003;124(2):544-60. 12. Mitchell RJ, Farrington SM, Dunlop MG, Campbell H. Mismatch repair genes hMLH1 and hMSH2 and colorectal cancer: a HuGE review. Am J Epidemiol 2002;156(10):885-902. 13. Committee CCSsS. Canadian Cancer Statistics 2009. Canadian Cancer Society. Toronto; 2009. 14. Boyle P, Langman JS. ABC of colorectal cancer: Epidemiology. Bmj 2000;321(7264):805-8. 15. Boyle P, Leon ME. 
Epidemiology of colorectal cancer. Br Med Bull 2002;64:1-25. 16. de la Chapelle A. Genetic predisposition to colorectal cancer. Nat Rev Cancer 2004;4(10):769-80. 17. Haenszel W. Cancer mortality among the foreign-born in the United States. J Natl Cancer Inst 1961;26:37-132. 18. Haenszel W, Kurihara M. Studies of Japanese migrants. I. Mortality from cancer and other diseases among Japanese in the United States. J Natl Cancer Inst 1968;40(1):43-68. 19. McMichael AJ, McCall MG, Hartshorne JM, Woodings TL. Patterns of gastro-intestinal cancer in European migrants to Australia: the role of dietary change. Int J Cancer 1980;25(4):431-7. 20. Emmons KM, McBride CM, Puleo E, Pollak KI, Marcus BH, Napolitano M, et al. Prevalence and predictors of multiple behavioral risk factors for colon cancer. Prev Med 2005;40(5):527-34.

217

21. Ferrari P, Jenab M, Norat T, Moskal A, Slimani N, Olsen A, et al. Lifetime and baseline alcohol intake and risk of colon and rectal cancers in the European prospective investigation into cancer and nutrition (EPIC). Int J Cancer 2007;121(9):2065-72. 22. Cross AJ, Pollock JR, Bingham SA. Haem, not protein or inorganic iron, is responsible for endogenous intestinal N-nitrosation arising from red meat. Cancer Res 2003;63(10):2358-60. 23. Sesink AL, Termont DS, Kleibeuker JH, Van der Meer R. Red meat and colon cancer: the cytotoxic and hyperproliferative effects of dietary heme. Cancer Res 1999;59(22):5704-9. 24. Hughes R, Cross AJ, Pollock JR, Bingham S. Dose-dependent effect of dietary meat on endogenous colonic N-nitrosation. Carcinogenesis 2001;22(1):199-202. 25. Bautista D, Obrador A, Moreno V, Cabeza E, Canet R, Benito E, et al. Ki-ras mutation modifies the protective effect of dietary monounsaturated fat and calcium on sporadic colorectal cancer. Cancer Epidemiol Biomarkers Prev 1997;6(1):57-61. 26. Diergaarde B, Braam H, van Muijen GN, Ligtenberg MJ, Kok FJ, Kampman E. Dietary factors and microsatellite instability in sporadic colon carcinomas. Cancer Epidemiol Biomarkers Prev 2003;12(11 Pt 1):1130-6. 27. Slattery ML, Curtin K, Anderson K, Ma KN, Edwards S, Leppert M, et al. Associations between dietary intake and Ki-ras mutations in colon tumors: a population-based study. Cancer Res 2000;60(24):6935-41. 28. Wark PA, Van der Kuil W, Ploemacher J, Van Muijen GN, Mulder CJ, Weijenberg MP, et al. Diet, lifestyle and risk of K-ras mutation-positive and -negative colorectal adenomas. Int J Cancer 2006;119(2):398-405. 29. Weijenberg MP, Luchtenborg M, de Goeij AF, Brink M, van Muijen GN, de Bruine AP, et al. Dietary fat and risk of colon and rectal cancer with aberrant MLH1 expression, APC or KRAS genes. Cancer Causes Control 2007;18(8):865-79. 30. Moskal A, Norat T, Ferrari P, Riboli E. 
Alcohol intake and colorectal cancer risk: a dose- response meta-analysis of published cohort studies. Int J Cancer 2006;120(3):664-71. 31. Cho E, Smith-Warner SA, Ritz J, van den Brandt PA, Colditz GA, Folsom AR, et al. Alcohol intake and colorectal cancer: a pooled analysis of 8 cohort studies. Ann Intern Med 2004;140(8):603-13. 32. Rehm J, Room R, Graham K, Monteiro M, Gmel G, Sempos CT. The relationship of average volume of alcohol consumption and patterns of drinking to burden of disease: an overview. Addiction 2003;98(9):1209-28. 33. Poschl G, Seitz HK. Alcohol and cancer. Alcohol Alcohol 2004;39(3):155-65. 34. Anand P, Kunnumakkara AB, Sundaram C, Harikumar KB, Tharakan ST, Lai OS, et al. Cancer is a preventable disease that requires major lifestyle changes. Pharm Res 2008;25(9):2097-116. 35. Potter JD. Colorectal cancer: molecules and populations. J Natl Cancer Inst 1999;91(11):916-32. 36. Krok KL, Lichtenstein GR. Colorectal cancer in inflammatory bowel disease. Curr Opin Gastroenterol 2004;20(1):43-8. 37. Pohl C, Hombach A, Kruis W. Chronic inflammatory bowel disease and cancer. Hepatogastroenterology 2000;47(31):57-70. 38. Ahmadi A, Polyak S, Draganov PV. Colorectal cancer surveillance in inflammatory bowel disease: the search continues. World J Gastroenterol 2009;15(1):61-6. 39. Butterworth AS, Higgins JP, Pharoah P. Relative and absolute risk of colorectal cancer for individuals with a family history: a meta-analysis. Eur J Cancer 2006;42(2):216-27.

218

40. Johns LE, Houlston RS. A systematic review and meta-analysis of familial colorectal cancer risk. Am J Gastroenterol 2001;96(10):2992-3003. 41. Mahon SM. Prevention and screening of gastrointestinal cancers. Semin Oncol Nurs 2009;25(1):15-31. 42. Bond JH. Colon polyps and cancer. Endoscopy 2003;35(1):27-35. 43. Sobin LH, Hermanek P, Hutter RV. TNM classification of malignant tumors. A comparison between the new (1987) and the old editions. Cancer 1988;61(11):2310-4. 44. Compton CC, Greene FL. The staging of colorectal cancer: 2004 and beyond. CA Cancer J Clin 2004;54(6):295-308. 45. Green FL, Page DL, Fleming ID, Fritz A, Balch CM, Haller DG, et al. AJCC Cancer Staging Manual, 6th Edition New York, NY: Springer; 2002. 46. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100(1):57-70. 47. Fearon ER, Vogelstein B. A genetic model for colorectal tumorigenesis. Cell 1990;61(5):759-67. 48. Lengauer C, Kinzler KW, Vogelstein B. Genetic instabilities in human cancers. Nature 1998;396(6712):643-9. 49. Grady WM. Genomic instability and colon cancer. Cancer Metastasis Rev 2004;23(1- 2):11-27. 50. Herrmann JL, Rastelli L, Burgess CE, Fernandez EE, Rothberg BE, Rothberg JM, et al. Implications of oncogenomics for cancer research and clinical oncology. Cancer J 2001;7(1):40- 51. 51. Chung DC. The genetic basis of colorectal cancer: insights into critical pathways of tumorigenesis. Gastroenterology 2000;119(3):854-65. 52. Miyaki M, Konishi M, Kikuchi-Yanoshita R, Enomoto M, Igari T, Tanaka K, et al. Characteristics of somatic mutation of the adenomatous polyposis coli gene in colorectal tumors. Cancer Res 1994;54(11):3011-20. 53. Taketo MM. Shutting down Wnt signal-activated cancer. Nat Genet 2004;36(4):320-2. 54. Willert K, Jones KA. Wnt signaling: is the party in the nucleus? Genes Dev 2006;20(11):1394-404. 55. Gregorieff A, Clevers H. Wnt signaling in the intestinal epithelium: from endoderm to cancer. Genes Dev 2005;19(8):877-90. 56. 
Behrens J, Jerchow BA, Wurtele M, Grimm J, Asbrand C, Wirtz R, et al. Functional interaction of an axin homolog, conductin, with beta-catenin, APC, and GSK3beta. Science 1998;280(5363):596-9. 57. Muhua L, Adames NR, Murphy MD, Shields CR, Cooper JA. A cytokinesis checkpoint requiring the yeast homologue of an APC-binding protein. Nature 1998;393(6684):487-91. 58. Fodde R, Kuipers J, Rosenberg C, Smits R, Kielman M, Gaspar C, et al. Mutations in the APC tumour suppressor gene cause chromosomal instability. Nat Cell Biol 2001;3(4):433-8. 59. Jallepalli PV, Lengauer C. Chromosome segregation and cancer: cutting through the mystery. Nat Rev Cancer 2001;1(2):109-17. 60. Cardoso J, Molenaar L, de Menezes RX, van Leerdam M, Rosenberg C, Moslein G, et al. Chromosomal instability in MYH- and APC-mutant adenomatous polyps. Cancer Res 2006;66(5):2514-9. 61. Sen S, Zhou H, White RA. A putative serine/threonine kinase encoding gene BTAK on chromosome 20q13 is amplified and overexpressed in human breast cancer cell lines. Oncogene 1997;14(18):2195-200.

219

62. Macmillan JC, Hudson JW, Bull S, Dennis JW, Swallow CJ. Comparative expression of the mitotic regulators SAK and PLK in colorectal cancer. Ann Surg Oncol 2001;8(9):729-40.
63. Rajagopalan H, Jallepalli PV, Rago C, Velculescu VE, Kinzler KW, Vogelstein B, et al. Inactivation of hCDC4 can cause chromosomal instability. Nature 2004;428(6978):77-81.
64. Jiricny J. The multifaceted mismatch-repair system. Nat Rev Mol Cell Biol 2006;7(5):335-46.
65. Boland CR, Thibodeau SN, Hamilton SR, Sidransky D, Eshleman JR, Burt RW, et al. A National Cancer Institute Workshop on Microsatellite Instability for cancer detection and familial predisposition: development of international criteria for the determination of microsatellite instability in colorectal cancer. Cancer Res 1998;58(22):5248-57.
66. Umar A. Lynch syndrome (HNPCC) and microsatellite instability. Dis Markers 2004;20(4-5):179-80.
67. Bhattacharyya NP, Skandalis A, Ganesh A, Groden J, Meuth M. Mutator phenotypes in human colorectal carcinoma cell lines. Proc Natl Acad Sci U S A 1994;91(14):6319-23.
68. Parsons R, Li GM, Longley MJ, Fang WH, Papadopoulos N, Jen J, et al. Hypermutability and mismatch repair deficiency in RER+ tumor cells. Cell 1993;75(6):1227-36.
69. Aaltonen LA, Peltomaki P, Leach FS, Sistonen P, Pylkkanen L, Mecklin JP, et al. Clues to the pathogenesis of familial colorectal cancer. Science 1993;260(5109):812-6.
70. Markowitz S, Wang J, Myeroff L, Parsons R, Sun L, Lutterbaugh J, et al. Inactivation of the type II TGF-beta receptor in colon cancer cells with microsatellite instability. Science 1995;268(5215):1336-8.
71. Parsons R, Myeroff LL, Liu B, Willson JK, Markowitz SD, Kinzler KW, et al. Microsatellite instability and mutations of the transforming growth factor beta type II receptor gene in colorectal cancer. Cancer Res 1995;55(23):5548-50.
72. Rampino N, Yamamoto H, Ionov Y, Li Y, Sawai H, Reed JC, et al. Somatic frameshift mutations in the BAX gene in colon cancers of the microsatellite mutator phenotype. Science 1997;275(5302):967-9.
73. Yamamoto H, Sawai H, Perucho M. Frameshift somatic mutations in gastrointestinal cancer of the microsatellite mutator phenotype. Cancer Res 1997;57(19):4420-6.
74. Grady WM, Rajput A, Myeroff L, Liu DF, Kwon K, Willis J, et al. Mutation of the type II transforming growth factor-beta receptor is coincident with the transformation of human colon adenomas to malignant carcinomas. Cancer Res 1998;58(14):3101-4.
75. Yagi OK, Akiyama Y, Nomizu T, Iwama T, Endo M, Yuasa Y. Proapoptotic gene BAX is frequently mutated in hereditary nonpolyposis colorectal cancers but not in adenomas. Gastroenterology 1998;114(2):268-74.
76. Malkhosyan S, Rampino N, Yamamoto H, Perucho M. Frameshift mutator mutations. Nature 1996;382(6591):499-500.
77. Duval A, Hamelin R. Genetic instability in human mismatch repair deficient cancers. Ann Genet 2002;45(2):71-5.
78. Li GM. Mechanisms and functions of DNA mismatch repair. Cell Res 2008;18(1):85-98.
79. Deng G, Bell I, Crawley S, Gum J, Terdiman JP, Allen BA, et al. BRAF mutation is frequently present in sporadic colorectal cancer with methylated hMLH1, but not in hereditary nonpolyposis colorectal cancer. Clin Cancer Res 2004;10(1 Pt 1):191-5.
80. Jenkins MA, Hayashi S, O'Shea AM, Burgart LJ, Smyrk TC, Shimizu D, et al. Pathology features in Bethesda guidelines predict colorectal cancer microsatellite instability: a population-based study. Gastroenterology 2007;133(1):48-56.

81. Popat S, Hubner R, Houlston RS. Systematic review of microsatellite instability and colorectal cancer prognosis. J Clin Oncol 2005;23(3):609-18.
82. Alexander J, Watanabe T, Wu TT, Rashid A, Li S, Hamilton SR. Histopathological identification of colon cancer with microsatellite instability. Am J Pathol 2001;158(2):527-35.
83. Thibodeau SN, French AJ, Cunningham JM, Tester D, Burgart LJ, Roche PC, et al. Microsatellite instability in colorectal cancer: different mutator phenotypes and the principal involvement of hMLH1. Cancer Res 1998;58(8):1713-8.
84. Bapat B, Lindor NM, Baron J, Siegmund K, Li L, Zheng Y, et al. The association of tumor microsatellite instability phenotype with family history of colorectal cancer. Cancer Epidemiol Biomarkers Prev 2009;18(3):967-75.
85. Gryfe R, Kim H, Hsieh ET, Aronson MD, Holowaty EJ, Bull SB, et al. Tumor microsatellite instability and clinical outcome in young patients with colorectal cancer. N Engl J Med 2000;342(2):69-77.
86. Thibodeau SN, Bren G, Schaid D. Microsatellite instability in cancer of the proximal colon. Science 1993;260(5109):816-9.
87. Carethers JM, Chauhan DP, Fink D, Nebel S, Bresalier RS, Howell SB, et al. Mismatch repair proficiency and in vitro response to 5-fluorouracil. Gastroenterology 1999;117(1):123-31.
88. Ribic CM, Sargent DJ, Moore MJ, Thibodeau SN, French AJ, Goldberg RM, et al. Tumor microsatellite-instability status as a predictor of benefit from fluorouracil-based adjuvant chemotherapy for colon cancer. N Engl J Med 2003;349(3):247-57.
89. Carethers JM, Smith EJ, Behling CA, Nguyen L, Tajima A, Doctolero RT, et al. Use of 5-fluorouracil and survival in patients with microsatellite-unstable colorectal cancer. Gastroenterology 2004;126(2):394-401.
90. Jover R, Zapater P, Castells A, Llor X, Andreu M, Cubiella J, et al. Mismatch repair status in the prediction of benefit from adjuvant fluorouracil chemotherapy in colorectal cancer. Gut 2006;55(6):848-55.
91. de Vos tot Nederveen Cappel WH, Meulenbeld HJ, Kleibeuker JH, Nagengast FM, Menko FH, Griffioen G, et al. Survival after adjuvant 5-FU treatment for stage III colon cancer in hereditary nonpolyposis colorectal cancer. Int J Cancer 2004;109(3):468-71.
92. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev 2002;16(1):6-21.
93. Santos-Rosa H, Caldas C. Chromatin modifier enzymes, the histone code and cancer. Eur J Cancer 2005;41(16):2381-402.
94. Herman JG, Baylin SB. Gene silencing in cancer in association with promoter hypermethylation. N Engl J Med 2003;349(21):2042-54.
95. Wong JJ, Hawkins NJ, Ward RL. Colorectal cancer: a model for epigenetic tumorigenesis. Gut 2007;56(1):140-8.
96. Feinberg AP, Tycko B. The history of cancer epigenetics. Nat Rev Cancer 2004;4(2):143-53.
97. Greger V, Passarge E, Hopping W, Messmer E, Horsthemke B. Epigenetic changes may contribute to the formation and spontaneous regression of retinoblastoma. Hum Genet 1989;83(2):155-8.
98. Toyota M, Ahuja N, Ohe-Toyota M, Herman JG, Baylin SB, Issa JP. CpG island methylator phenotype in colorectal cancer. Proc Natl Acad Sci U S A 1999;96(15):8681-6.
99. Shen L, Toyota M, Kondo Y, Lin E, Zhang L, Guo Y, et al. Integrated genetic and epigenetic analysis identifies three different subclasses of colon cancer. Proc Natl Acad Sci U S A 2007;104(47):18654-9.

100. Hawkins N, Norrie M, Cheong K, Mokany E, Ku SL, Meagher A, et al. CpG island methylation in sporadic colorectal cancers and its relationship to microsatellite instability. Gastroenterology 2002;122(5):1376-87.
101. Ward RL, Cheong K, Ku SL, Meagher A, O'Connor T, Hawkins NJ. Adverse prognostic effect of methylation in colorectal cancer is reversed by microsatellite instability. J Clin Oncol 2003;21(20):3729-36.
102. Yamashita K, Dai T, Dai Y, Yamamoto F, Perucho M. Genetics supersedes epigenetics in colon cancer phenotype. Cancer Cell 2003;4(2):121-31.
103. Sieber OM, Heinimann K, Tomlinson IP. Genomic instability--the engine of tumorigenesis? Nat Rev Cancer 2003;3(9):701-8.
104. Rajagopalan H, Nowak MA, Vogelstein B, Lengauer C. The significance of unstable chromosomes in colorectal cancer. Nat Rev Cancer 2003;3(9):695-701.
105. Strate LL, Syngal S. Hereditary colorectal cancer syndromes. Cancer Causes Control 2005;16(3):201-13.
106. Aaltonen LA, Salovaara R, Kristo P, Canzian F, Hemminki A, Peltomaki P, et al. Incidence of hereditary nonpolyposis colorectal cancer and the feasibility of molecular screening for the disease. N Engl J Med 1998;338(21):1481-7.
107. Samowitz WS, Curtin K, Lin HH, Robertson MA, Schaffer D, Nichols M, et al. The colon cancer burden of genetically defined hereditary nonpolyposis colon cancer. Gastroenterology 2001;121(4):830-8.
108. Lynch HT, Krush AJ. Cancer family "G" revisited: 1895-1970. Cancer 1971;27(6):1505-11.
109. Arczewska KD, Kusmierek JT. Bacterial DNA repair genes and their eukaryotic homologues: 2. Role of bacterial mutator gene homologues in human disease. Overview of nucleotide pool sanitization and mismatch repair systems. Acta Biochim Pol 2007;54(3):435-57.
110. Syngal S, Fox EA, Li C, Dovidio M, Eng C, Kolodner RD, et al. Interpretation of genetic test results for hereditary nonpolyposis colorectal cancer: implications for clinical predisposition testing. JAMA 1999;282(3):247-53.
111. Lynch HT, Watson P, Lanspa SJ, Marcus J, Smyrk T, Fitzgibbons RJ, Jr., et al. Natural history of colorectal cancer in hereditary nonpolyposis colorectal cancer (Lynch syndromes I and II). Dis Colon Rectum 1988;31(6):439-44.
112. Vasen HF, Mecklin JP, Watson P, Utsunomiya J, Bertario L, Lynch P, et al. Surveillance in hereditary nonpolyposis colorectal cancer: an international cooperative study of 165 families. The International Collaborative Group on HNPCC. Dis Colon Rectum 1993;36(1):1-4.
113. Jass JR, Stewart SM. Evolution of hereditary non-polyposis colorectal cancer. Gut 1992;33(6):783-6.
114. Jass JR, Smyrk TC, Stewart SM, Lane MR, Lanspa SJ, Lynch HT. Pathology of hereditary non-polyposis colorectal cancer. Anticancer Res 1994;14(4B):1631-4.
115. Lynch HT, Smyrk TC, Watson P, Lanspa SJ, Lynch JF, Lynch PM, et al. Genetics, natural history, tumor spectrum, and pathology of hereditary nonpolyposis colorectal cancer: an updated review. Gastroenterology 1993;104(5):1535-49.
116. Watson P, Lynch HT. Extracolonic cancer in hereditary nonpolyposis colorectal cancer. Cancer 1993;71(3):677-85.
117. Vasen HF, Mecklin JP, Khan PM, Lynch HT. The International Collaborative Group on Hereditary Non-Polyposis Colorectal Cancer (ICG-HNPCC). Dis Colon Rectum 1991;34(5):424-5.

118. Vasen HF, Watson P, Mecklin JP, Lynch HT. New clinical criteria for hereditary nonpolyposis colorectal cancer (HNPCC, Lynch syndrome) proposed by the International Collaborative group on HNPCC. Gastroenterology 1999;116(6):1453-6.
119. Benatti P, Sassatelli R, Roncucci L, Pedroni M, Fante R, Di Gregorio C, et al. Tumour spectrum in hereditary non-polyposis colorectal cancer (HNPCC) and in families with "suspected HNPCC". A population-based study in northern Italy. Colorectal Cancer Study Group. Int J Cancer 1993;54(3):371-7.
120. Bellacosa A, Genuardi M, Anti M, Viel A, Ponz de Leon M. Hereditary nonpolyposis colorectal cancer: review of clinical, molecular genetics, and counseling aspects. Am J Med Genet 1996;62(4):353-64.
121. Rodriguez-Bigas MA, Boland CR, Hamilton SR, Henson DE, Jass JR, Khan PM, et al. A National Cancer Institute Workshop on Hereditary Nonpolyposis Colorectal Cancer Syndrome: meeting highlights and Bethesda guidelines. J Natl Cancer Inst 1997;89(23):1758-62.
122. Umar A, Boland CR, Terdiman JP, Syngal S, de la Chapelle A, Ruschoff J, et al. Revised Bethesda Guidelines for hereditary nonpolyposis colorectal cancer (Lynch syndrome) and microsatellite instability. J Natl Cancer Inst 2004;96(4):261-8.
123. Ligtenberg MJ, Kuiper RP, Chan TL, Goossens M, Hebeda KM, Voorendt M, et al. Heritable somatic methylation and inactivation of MSH2 in families with Lynch syndrome due to deletion of the 3' exons of TACSTD1. Nat Genet 2009;41(1):112-7.
124. Bapat B, Xia L, Madlensky L, Mitri A, Tonin P, Narod SA, et al. The genetic basis of Muir-Torre syndrome includes the hMLH1 locus. Am J Hum Genet 1996;59(3):736-9.
125. Wimmer K, Etzler J. Constitutional mismatch repair-deficiency syndrome: have we so far seen only the tip of an iceberg? Hum Genet 2008;124(2):105-22.
126. Lindor NM, Rabe K, Petersen GM, Haile R, Casey G, Baron J, et al. Lower cancer incidence in Amsterdam-I criteria families without mismatch repair deficiency: familial colorectal cancer type X. JAMA 2005;293(16):1979-85.
127. Kaz AM, Brentnall TA. Genetic testing for colon cancer. Nat Clin Pract Gastroenterol Hepatol 2006;3(12):670-9.
128. de Jong MM, Nolte IM, te Meerman GJ, van der Graaf WT, de Vries EG, Sijmons RH, et al. Low-penetrance genes and their involvement in colorectal cancer susceptibility. Cancer Epidemiol Biomarkers Prev 2002;11(11):1332-52.
129. Houlston RS, Peto J. The search for low-penetrance cancer susceptibility alleles. Oncogene 2004;23(38):6471-6.
130. Miyoshi Y, Nagase H, Ando H, Horii A, Ichii S, Nakatsuru S, et al. Somatic mutations of the APC gene in colorectal tumors: mutation cluster region in the APC gene. Hum Mol Genet 1992;1(4):229-33.
131. Galle TS, Juel K, Bulow S. Causes of death in familial adenomatous polyposis. Scand J Gastroenterol 1999;34(8):808-12.
132. Wirtzfeld DA, Petrelli NJ, Rodriguez-Bigas MA. Hamartomatous polyposis syndromes: molecular genetics, neoplastic risk, and surveillance recommendations. Ann Surg Oncol 2001;8(4):319-27.
133. Hemminki A, Markie D, Tomlinson I, Avizienyte E, Roth S, Loukola A, et al. A serine/threonine kinase gene defective in Peutz-Jeghers syndrome. Nature 1998;391(6663):184-7.

134. Jenne DE, Reimann H, Nezu J, Friedel W, Loff S, Jeschke R, et al. Peutz-Jeghers syndrome is caused by mutations in a novel serine threonine kinase. Nat Genet 1998;18(1):38-43.
135. Giardiello FM, Brensinger JD, Tersmette AC, Goodman SN, Petersen GM, Booker SV, et al. Very high risk of cancer in familial Peutz-Jeghers syndrome. Gastroenterology 2000;119(6):1447-53.
136. Boardman LA, Thibodeau SN, Schaid DJ, Lindor NM, McDonnell SK, Burgart LJ, et al. Increased risk for cancer in patients with the Peutz-Jeghers syndrome. Ann Intern Med 1998;128(11):896-9.
137. Stram DO, Hankin JH, Wilkens LR, Pike MC, Monroe KR, Park S, et al. Calibration of the dietary questionnaire for a multiethnic cohort in Hawaii and Los Angeles. Am J Epidemiol 2000;151(4):358-70.
138. Howe JR, Mitros FA, Summers RW. The risk of gastrointestinal carcinoma in familial juvenile polyposis. Ann Surg Oncol 1998;5(8):751-6.
139. Howe JR, Roth S, Ringold JC, Summers RW, Jarvinen HJ, Sistonen P, et al. Mutations in the SMAD4/DPC4 gene in juvenile polyposis. Science 1998;280(5366):1086-8.
140. Howe JR, Bair JL, Sayed MG, Anderson ME, Mitros FA, Petersen GM, et al. Germline mutations of the gene encoding bone morphogenetic protein receptor 1A in juvenile polyposis. Nat Genet 2001;28(2):184-7.
141. Sayed MG, Ahmed AF, Ringold JR, Anderson ME, Bair JL, Mitros FA, et al. Germline SMAD4 or BMPR1A mutations and phenotype of juvenile polyposis. Ann Surg Oncol 2002;9(9):901-6.
142. Olschwang S, Serova-Sinilnikova OM, Lenoir GM, Thomas G. PTEN germ-line mutations in juvenile polyposis coli. Nat Genet 1998;18(1):12-4.
143. Liaw D, Marsh DJ, Li J, Dahia PL, Wang SI, Zheng Z, et al. Germline mutations of the PTEN gene in Cowden disease, an inherited breast and thyroid cancer syndrome. Nat Genet 1997;16(1):64-7.
144. Kunz C, Saito Y, Schar P. DNA Repair in mammalian cells: Mismatched repair: variations on a theme. Cell Mol Life Sci 2009;66(6):1021-38.
145. Drake JW. A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci U S A 1991;88(16):7160-4.
146. Hubscher U, Maga G, Spadari S. Eukaryotic DNA polymerases. Annu Rev Biochem 2002;71:133-63.
147. McCulloch SD, Kunkel TA. The fidelity of DNA synthesis by eukaryotic replicative and translesion synthesis polymerases. Cell Res 2008;18(1):148-61.
148. Iyer RR, Pluciennik A, Burdett V, Modrich PL. DNA mismatch repair: functions and mechanisms. Chem Rev 2006;106(2):302-23.
149. Altieri F, Grillo C, Maceroni M, Chichiarelli S. DNA damage and repair: from molecular mechanisms to health implications. Antioxid Redox Signal 2008;10(5):891-937.
150. Lipscomb LA, Peek ME, Morningstar ML, Verghis SM, Miller EM, Rich A, et al. X-ray structure of a DNA decamer containing 7,8-dihydro-8-oxoguanine. Proc Natl Acad Sci U S A 1995;92(3):719-23.
151. Kouchakdjian M, Bodepudi V, Shibutani S, Eisenberg M, Johnson F, Grollman AP, et al. NMR structural studies of the ionizing radiation adduct 7-hydro-8-oxodeoxyguanosine (8-oxo-7H-dG) opposite deoxyadenosine in a DNA duplex. 8-Oxo-7H-dG(syn).dA(anti) alignment at lesion site. Biochemistry 1991;30(5):1403-12.

152. Eadie JS, Conrad M, Toorchen D, Topal MD. Mechanism of mutagenesis by O6-methylguanine. Nature 1984;308(5955):201-3.
153. Kaina B, Christmann M, Naumann S, Roos WP. MGMT: key node in the battle against genotoxicity, carcinogenicity and apoptosis induced by alkylating agents. DNA Repair (Amst) 2007;6(8):1079-99.
154. Stojic L, Brun R, Jiricny J. Mismatch repair and DNA damage signalling. DNA Repair (Amst) 2004;3(8-9):1091-101.
155. Roberts JD, Kunkel TA. Fidelity of a human cell DNA replication complex. Proc Natl Acad Sci U S A 1988;85(19):7064-8.
156. An Q, Robins P, Lindahl T, Barnes DE. 5-Fluorouracil incorporated into DNA is excised by the Smug1 DNA glycosylase to reduce drug cytotoxicity. Cancer Res 2007;67(3):940-5.
157. Longley DB, Harkin DP, Johnston PG. 5-fluorouracil: mechanisms of action and clinical strategies. Nat Rev Cancer 2003;3(5):330-8.
158. Sowers LC, Eritja R, Kaplan B, Goodman MF, Fazakerly GV. Equilibrium between a wobble and ionized base pair formed between fluorouracil and guanine in DNA as studied by proton and fluorine NMR. J Biol Chem 1988;263(29):14794-801.
159. Neuberger MS. Antibody diversification by somatic mutation: from Burnet onwards. Immunol Cell Biol 2008;86(2):124-32.
160. Stavnezer J, Guikema JE, Schrader CE. Mechanism and regulation of class switch recombination. Annu Rev Immunol 2008;26:261-92.
161. Schofield MJ, Hsieh P. DNA mismatch repair: molecular mechanisms and biological function. Annu Rev Microbiol 2003;57:579-608.
162. Treffers HP, Spinelli V, Belser NO. A Factor (or Mutator Gene) Influencing Mutation Rates in Escherichia Coli. Proc Natl Acad Sci U S A 1954;40(11):1064-71.
163. Siegel EC, Bryson V. Selection of Resistant Strains of Escherichia Coli by Antibiotics and Antibacterial Agents: Role of Normal and Mutator Strains. Antimicrob Agents Chemother (Bethesda) 1963;161:629-34.
164. Siegel EC, Bryson V. Mutator gene of Escherichia coli B. J Bacteriol 1967;94(1):38-47.
165. Kunkel TA, Erie DA. DNA mismatch repair. Annu Rev Biochem 2005;74:681-710.
166. Pukkila PJ, Peterson J, Herman G, Modrich P, Meselson M. Effects of high levels of DNA adenine methylation on methyl-directed mismatch repair in Escherichia coli. Genetics 1983;104(4):571-82.
167. Marinus MG. Adenine methylation of Okazaki fragments in Escherichia coli. J Bacteriol 1976;128(3):853-4.
168. Geier GE, Modrich P. Recognition sequence of the dam methylase of Escherichia coli K12 and mode of cleavage of Dpn I endonuclease. J Biol Chem 1979;254(4):1408-13.
169. Lyons SM, Schendel PF. Kinetics of methylation in Escherichia coli K-12. J Bacteriol 1984;159(1):421-3.
170. Marinus MG, Morris NR. Biological function for 6-methyladenine residues in the DNA of Escherichia coli K12. J Mol Biol 1974;85(2):309-22.
171. Glickman BW. Spontaneous mutagenesis in Escherichia coli strains lacking 6-methyladenine residues in their DNA: an altered mutational spectrum in dam- mutants. Mutat Res 1979;61(2):153-62.
172. Herman GE, Modrich P. Escherichia coli K-12 clones that overproduce dam methylase are hypermutable. J Bacteriol 1981;145(1):644-6.

173. Marinus MG, Poteete A, Arraj JA. Correlation of DNA adenine methylase activity with spontaneous mutability in Escherichia coli K-12. Gene 1984;28(1):123-5.
174. Grilley M, Welsh KM, Su SS, Modrich P. Isolation and characterization of the Escherichia coli mutL gene product. J Biol Chem 1989;264(2):1000-4.
175. Galio L, Bouquet C, Brooks P. ATP hydrolysis-dependent formation of a dynamic ternary nucleoprotein complex with MutS and MutL. Nucleic Acids Res 1999;27(11):2325-31.
176. Spampinato C, Modrich P. The MutL ATPase is required for mismatch repair. J Biol Chem 2000;275(13):9863-9.
177. Modrich P. Mechanisms and biological effects of mismatch repair. Annu Rev Genet 1991;25:229-53.
178. Welsh KM, Lu AL, Clark S, Modrich P. Isolation and characterization of the Escherichia coli mutH gene product. J Biol Chem 1987;262(32):15624-9.
179. Bruni R, Martin D, Jiricny J. d(GATC) sequences influence Escherichia coli mismatch repair in a distance-dependent manner from positions both upstream and downstream of the mismatch. Nucleic Acids Res 1988;16(11):4875-90.
180. Lahue RS, Au KG, Modrich P. DNA mismatch correction in a defined system. Science 1989;245(4914):160-4.
181. Cooper DL, Lahue RS, Modrich P. Methyl-directed mismatch repair is bidirectional. J Biol Chem 1993;268(16):11823-9.
182. Grilley M, Griffith J, Modrich P. Bidirectional excision in methyl-directed mismatch repair. J Biol Chem 1993;268(16):11830-7.
183. Burdett V, Baitinger C, Viswanathan M, Lovett ST, Modrich P. In vivo requirement for RecJ, ExoVII, ExoI, and ExoX in methyl-directed mismatch repair. Proc Natl Acad Sci U S A 2001;98(12):6765-70.
184. Lopez de Saro FJ, O'Donnell M. Interaction of the beta sliding clamp with MutS, ligase, and DNA polymerase I. Proc Natl Acad Sci U S A 2001;98(15):8376-80.
185. Hsieh P, Yamane K. DNA mismatch repair: molecular mechanism, cancer, and ageing. Mech Ageing Dev 2008;129(7-8):391-407.
186. Harfe BD, Minesinger BK, Jinks-Robertson S. Discrete in vivo roles for the MutL homologs Mlh2p and Mlh3p in the removal of frameshift intermediates in budding yeast. Curr Biol 2000;10(3):145-8.
187. Harfe BD, Jinks-Robertson S. DNA mismatch repair and genetic instability. Annu Rev Genet 2000;34:359-399.
188. McCulloch SD, Gu L, Li GM. Bi-directional processing of DNA loops by mismatch repair-dependent and -independent pathways in human cells. J Biol Chem 2003;278(6):3891-6.
189. Raschle M, Marra G, Nystrom-Lahti M, Schar P, Jiricny J. Identification of hMutLbeta, a heterodimer of hMLH1 and hPMS1. J Biol Chem 1999;274(45):32368-75.
190. Dutta R, Inouye M. GHKL, an emergent ATPase/kinase superfamily. Trends Biochem Sci 2000;25(1):24-8.
191. Kadyrov FA, Dzantiev L, Constantin N, Modrich P. Endonucleolytic function of MutLalpha in human mismatch repair. Cell 2006;126(2):297-308.
192. Zhang Y, Yuan F, Presnell SR, Tian K, Gao Y, Tomkinson AE, et al. Reconstitution of 5'-directed human mismatch repair in a purified system. Cell 2005;122(5):693-705.
193. Blackwell LJ, Martik D, Bjornson KP, Bjornson ES, Modrich P. Nucleotide-promoted release of hMutSalpha from heteroduplex DNA is consistent with an ATP-dependent translocation mechanism. J Biol Chem 1998;273(48):32055-62.

194. Martik D, Baitinger C, Modrich P. Differential specificities and simultaneous occupancy of human MutSalpha nucleotide binding sites. J Biol Chem 2004;279(27):28402-10.
195. Gradia S, Subramanian D, Wilson T, Acharya S, Makhov A, Griffith J, et al. hMSH2-hMSH6 forms a hydrolysis-independent sliding clamp on mismatched DNA. Mol Cell 1999;3(2):255-61.
196. Wang H, Hays JB. Mismatch repair in human nuclear extracts: effects of internal DNA-hairpin structures between mismatches and excision-initiation nicks on mismatch correction and mismatch-provoked excision. J Biol Chem 2003;278(31):28686-93.
197. Wang H, Hays JB. Signaling from DNA mispairs to mismatch-repair excision sites despite intervening blockades. EMBO J 2004;23(10):2126-33.
198. Lacks SA, Dunn JJ, Greenberg B. Identification of base mismatches recognized by the heteroduplex-DNA-repair system of Streptococcus pneumoniae. Cell 1982;31(2 Pt 1):327-36.
199. Vo AT, Zhu F, Wu X, Yuan F, Gao Y, Gu L, et al. hMRE11 deficiency leads to microsatellite instability and defective DNA mismatch repair. EMBO Rep 2005;6(5):438-44.
200. Li GM. The role of mismatch repair in DNA damage-induced apoptosis. Oncol Res 1999;11(9):393-400.
201. Brown KD, Rathi A, Kamath R, Beardsley DI, Zhan Q, Mannino JL, et al. The mismatch repair system is required for S-phase checkpoint activation. Nat Genet 2003;33(1):80-4.
202. O'Brien V, Brown R. Signalling cell cycle arrest and cell death through the MMR System. Carcinogenesis 2006;27(4):682-92.
203. Gong JG, Costanzo A, Yang HQ, Melino G, Kaelin WG, Jr., Levrero M, et al. The tyrosine kinase c-Abl regulates p73 in apoptotic response to cisplatin-induced DNA damage. Nature 1999;399(6738):806-9.
204. Duckett DR, Drummond JT, Murchie AI, Reardon JT, Sancar A, Lilley DM, et al. Human MutSalpha recognizes damaged DNA base pairs containing O6-methylguanine, O4-methylthymine, or the cisplatin-d(GpG) adduct. Proc Natl Acad Sci U S A 1996;93(13):6443-7.
205. Duckett DR, Bronstein SM, Taya Y, Modrich P. hMutSalpha- and hMutLalpha-dependent phosphorylation of p53 in response to DNA methylator damage. Proc Natl Acad Sci U S A 1999;96(22):12384-8.
206. Kim WJ, Rajasekaran B, Brown KD. MLH1- and ATM-dependent MAPK signaling is activated through c-Abl in response to the alkylator N-methyl-N'-nitro-N'-nitrosoguanidine. J Biol Chem 2007;282(44):32021-31.
207. Shimodaira H, Yoshioka-Yamashita A, Kolodner RD, Wang JY. Interaction of mismatch repair protein PMS2 and the p53-related transcription factor p73 in apoptosis response to cisplatin. Proc Natl Acad Sci U S A 2003;100(5):2420-5.
208. Yoshioka K, Yoshioka Y, Hsieh P. ATR kinase activation mediated by MutSalpha and MutLalpha in response to cytotoxic O6-methylguanine adducts. Mol Cell 2006;22(4):501-10.
209. Jun SH, Kim TG, Ban C. DNA mismatch repair system. Classical and fresh roles. FEBS J 2006;273(8):1609-19.
210. Kat A, Thilly WG, Fang WH, Longley MJ, Li GM, Modrich P. An alkylation-tolerant, mutator human cell line is deficient in strand-specific mismatch repair. Proc Natl Acad Sci U S A 1993;90(14):6424-8.
211. Cejka P, Stojic L, Mojas N, Russell AM, Heinimann K, Cannavo E, et al. Methylation-induced G(2)/M arrest requires a full complement of the mismatch repair protein hMLH1. EMBO J 2003;22(9):2245-54.

212. Stojic L, Mojas N, Cejka P, Di Pietro M, Ferrari S, Marra G, et al. Mismatch repair-dependent G2 checkpoint induced by low doses of SN1 type methylating agents requires the ATR kinase. Genes Dev 2004;18(11):1331-44.
213. Adamson AW, Beardsley DI, Kim WJ, Gao Y, Baskaran R, Brown KD. Methylator-induced, mismatch repair-dependent G2 arrest is activated through Chk1 and Chk2. Mol Biol Cell 2005;16(3):1513-26.
214. Wang Y, Qin J. MSH2 and ATR form a signaling module and regulate two branches of the damage response to DNA methylation. Proc Natl Acad Sci U S A 2003;100(26):15387-92.
215. Hirose Y, Katayama M, Stokoe D, Haas-Kogan DA, Berger MS, Pieper RO. The p38 mitogen-activated protein kinase pathway links the DNA mismatch repair system to the G2 checkpoint and to resistance to chemotherapeutic DNA-methylating agents. Mol Cell Biol 2003;23(22):8306-15.
216. Harfe BD, Jinks-Robertson S. Mismatch repair proteins and mitotic genome stability. Mutat Res 2000;451(1-2):151-67.
217. Hoffmann ER, Borts RH. Meiotic recombination intermediates and mismatch repair proteins. Cytogenet Genome Res 2004;107(3-4):232-48.
218. Chen S, Bigner SH, Modrich P. High rate of CAD gene amplification in human cells deficient in MLH1 or MSH6. Proc Natl Acad Sci U S A 2001;98(24):13802-7.
219. Rayssiguier C, Thaler DS, Radman M. The barrier to recombination between Escherichia coli and Salmonella typhimurium is disrupted in mismatch-repair mutants. Nature 1989;342(6248):396-401.
220. Matic I, Radman M, Rayssiguier C. Structure of recombinants from conjugational crosses between Escherichia coli donor and mismatch-repair deficient Salmonella typhimurium recipients. Genetics 1994;136(1):17-26.
221. Nicholson A, Hendrix M, Jinks-Robertson S, Crouse GF. Regulation of mitotic homeologous recombination in yeast. Functions of mismatch repair and nucleotide excision repair genes. Genetics 2000;154(1):133-46.
222. Chen W, Jinks-Robertson S. The role of the mismatch repair machinery in regulating mitotic and meiotic recombination between diverged sequences in yeast. Genetics 1999;151(4):1299-313.
223. Goldfarb T, Alani E. Distinct roles for the Saccharomyces cerevisiae mismatch repair proteins in heteroduplex rejection, mismatch repair and nonhomologous tail removal. Genetics 2005;169(2):563-74.
224. Doherty KM, Sharma S, Uzdilla LA, Wilson TM, Cui S, Vindigni A, et al. RECQ1 helicase interacts with human mismatch repair factors that regulate genetic recombination. J Biol Chem 2005;280(30):28085-94.
225. Pedrazzi G, Bachrati CZ, Selak N, Studer I, Petkovic M, Hickson ID, et al. The Bloom's syndrome helicase interacts directly with the human DNA mismatch repair protein hMSH6. Biol Chem 2003;384(8):1155-64.
226. Kunz C, Schar P. Meiotic recombination: sealing the partnership at the junction. Curr Biol 2004;14(22):R962-4.
227. Zickler D, Kleckner N. Meiotic chromosomes: integrating structure and function. Annu Rev Genet 1999;33:603-754.
228. Ross-Macdonald P, Roeder GS. Mutation of a meiosis-specific MutS homolog decreases crossing over but not mismatch correction. Cell 1994;79(6):1069-80.

229. Hollingsworth NM, Ponte L, Halsey C. MSH5, a novel MutS homolog, facilitates meiotic reciprocal recombination between homologs in Saccharomyces cerevisiae but not mismatch repair. Genes Dev 1995;9(14):1728-39.
230. Snowden T, Acharya S, Butz C, Berardini M, Fishel R. hMSH4-hMSH5 recognizes Holliday Junctions and forms a meiosis-specific sliding clamp that embraces homologous chromosomes. Mol Cell 2004;15(3):437-51.
231. Lipkin SM, Moens PB, Wang V, Lenzi M, Shanmugarajah D, Gilgeous A, et al. Meiotic arrest and aneuploidy in MLH3-deficient mice. Nat Genet 2002;31(4):385-90.
232. Prolla TA, Baker SM, Harris AC, Tsao JL, Yao X, Bronner CE, et al. Tumour susceptibility and spontaneous mutation in mice deficient in Mlh1, Pms1 and Pms2 DNA mismatch repair. Nat Genet 1998;18(3):276-9.
233. Baker SM, Bronner CE, Zhang L, Plug AW, Robatzek M, Warren G, et al. Male mice defective in the DNA mismatch repair gene PMS2 exhibit abnormal chromosome synapsis in meiosis. Cell 1995;82(2):309-19.
234. Rada C, Ehrenstein MR, Neuberger MS, Milstein C. Hot spot focusing of somatic hypermutation in MSH2-deficient mice suggests two stages of mutational targeting. Immunity 1998;9(1):135-41.
235. Wiesendanger M, Kneitz B, Edelmann W, Scharff MD. Somatic hypermutation in MutS homologue (MSH)3-, MSH6-, and MSH3/MSH6-deficient mice reveals a role for the MSH2-MSH6 heterodimer in modulating the base substitution pattern. J Exp Med 2000;191(3):579-84.
236. Wilson TM, Vaisman A, Martomo SA, Sullivan P, Lan L, Hanaoka F, et al. MSH2-MSH6 stimulates DNA polymerase eta, suggesting a role for A:T mutations in antibody genes. J Exp Med 2005;201(4):637-45.
237. Casali P, Pal Z, Xu Z, Zan H. DNA repair in antibody somatic hypermutation. Trends Immunol 2006;27(7):313-21.
238. Schrader CE, Guikema JE, Linehan EK, Selsing E, Stavnezer J. Activation-induced cytidine deaminase-dependent DNA breaks in class switch recombination occur during G1 phase of the cell cycle and depend upon mismatch repair. J Immunol 2007;179(9):6064-71.
239. Pearson CE, Nichol Edamura K, Cleary JD. Repeat instability: mechanisms of dynamic mutations. Nat Rev Genet 2005;6(10):729-42.
240. Jaworski A, Rosche WA, Gellibolian R, Kang S, Shimizu M, Bowater RP, et al. Mismatch repair in Escherichia coli enhances instability of (CTG)n triplet repeats from human hereditary diseases. Proc Natl Acad Sci U S A 1995;92(24):11019-23.
241. Modrich P. Mechanisms in eukaryotic mismatch repair. J Biol Chem 2006;281(41):30305-9.
242. Peltomaki P. Lynch syndrome genes. Fam Cancer 2005;4(3):227-32.
243. Peltomaki P. Role of DNA mismatch repair defects in the pathogenesis of human cancer. J Clin Oncol 2003;21(6):1174-9.
244. Fishel R, Lescoe MK, Rao MR, Copeland NG, Jenkins NA, Garber J, et al. The human mutator gene homolog MSH2 and its association with hereditary nonpolyposis colon cancer. Cell 1993;75(5):1027-38.
245. Leach FS, Nicolaides NC, Papadopoulos N, Liu B, Jen J, Parsons R, et al. Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell 1993;75(6):1215-25.
246. Bronner CE, Baker SM, Morrison PT, Warren G, Smith LG, Lescoe MK, et al. Mutation in the DNA mismatch repair gene homologue hMLH1 is associated with hereditary non-polyposis colon cancer. Nature 1994;368(6468):258-61.

247. Papadopoulos N, Nicolaides NC, Wei YF, Ruben SM, Carter KC, Rosen CA, et al. Mutation of a mutL homolog in hereditary colon cancer. Science 1994;263(5153):1625-9.
248. Lindblom A, Tannergard P, Werelius B, Nordenskjold M. Genetic mapping of a second locus predisposing to hereditary non-polyposis colon cancer. Nat Genet 1993;5(3):279-82.
249. Herman JG, Umar A, Polyak K, Graff JR, Ahuja N, Issa JP, et al. Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proc Natl Acad Sci U S A 1998;95(12):6870-5.
250. Veigl ML, Kasturi L, Olechnowicz J, Ma AH, Lutterbaugh JD, Periyasamy S, et al. Biallelic inactivation of hMLH1 by epigenetic gene silencing, a novel mechanism causing human MSI cancers. Proc Natl Acad Sci U S A 1998;95(15):8698-702.
251. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001;409(6822):860-921.
252. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. Science 2001;291(5507):1304-51.
253. International HapMap Consortium. A haplotype map of the human genome. Nature 2005;437(7063):1299-320.
254. Orr N, Chanock S. Common genetic variation and human disease. Adv Genet 2008;62:1-32.
255. Collins FS, Brooks LD, Chakravarti A. A DNA polymorphism discovery resource for research on human genetic variation. Genome Res 1998;8(12):1229-31.
256. Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, et al. Methods for high-density admixture mapping of disease genes. Am J Hum Genet 2004;74(5):979-1000.
257. Shriver MD, Mei R, Parra EJ, Sonpar V, Halder I, Tishkoff SA, et al. Large-scale SNP analysis reveals clustered and continuous patterns of human genetic variation. Hum Genomics 2005;2(2):81-9.
258. Chanock S. Candidate genes and single nucleotide polymorphisms (SNPs) in the study of human disease. Dis Markers 2001;17(2):89-98.
259. Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000;405(6788):847-56.
260. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409(6822):928-33.
261. Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res 2003;31(13):3812-4.
262. Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res 2002;30(17):3894-900.
263. Capon F, Allen MH, Ameen M, Burden AD, Tillman D, Barker JN, et al. A synonymous SNP of the corneodesmosin gene leads to increased mRNA stability and demonstrates association with psoriasis across diverse ethnic groups. Hum Mol Genet 2004;13(20):2361-8.
264. Chamary JV, Parmley JL, Hurst LD. Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet 2006;7(2):98-108.
265. Zanke BW, Greenwood CM, Rangrej J, Kustra R, Tenesa A, Farrington SM, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24. Nat Genet 2007;39(8):989-94.

266. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet 2004;36(9):949-51.
267. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science 2004;305(5683):525-8.
268. Eichler EE. Widening the spectrum of human genetic variation. Nat Genet 2006;38(1):9-11.
269. Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, et al. The diploid genome sequence of an individual human. PLoS Biol 2007;5(10):e254.
270. Frazer KA, Murray SS, Schork NJ, Topol EJ. Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009;10(4):241-51.
271. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. The fine-scale structure of recombination rate variation in the human genome. Science 2004;304(5670):581-4.
272. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of recombination rates and hotspots across the human genome. Science 2005;310(5746):321-4.
273. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest 2008;118(5):1590-605.
274. Xavier RJ, Rioux JD. Genome-wide association studies: a new window into immune-mediated diseases. Nat Rev Immunol 2008;8(8):631-43.
275. Lettre G, Rioux JD. Autoimmune diseases: insights from genome-wide association studies. Hum Mol Genet 2008;17(R2):R116-21.
276. Houlston RS, Webb E, Broderick P, Pittman AM, Di Bernardo MC, Lubbe S, et al. Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer. Nat Genet 2008;40(12):1426-35.
277. Tenesa A, Farrington SM, Prendergast JG, Porteous ME, Walker M, Haq N, et al. Genome-wide association scan identifies a colorectal cancer susceptibility locus on 11q23 and replicates risk loci at 8q24 and 18q21. Nat Genet 2008;40(5):631-7.
278. Lander ES. The new genomics: global views of biology. Science 1996;274(5287):536-9.
279. Chakravarti A. Population genetics--making sense out of sequence. Nat Genet 1999;21(1 Suppl):56-60.
280. Ober C, Cox NJ. The genetics of asthma. Mapping genes for complex traits in founder populations. Clin Exp Allergy 1998;28 Suppl 1:101-5; discussion 108-10.
281. Wittke-Thompson JK, Ambrose N, Yairi E, Roe C, Cook EH, Ober C, et al. Genetic studies of stuttering in a founder population. J Fluency Disord 2007;32(1):33-50.
282. Rahman P, Jones A, Curtis J, Bartlett S, Peddle L, Fernandez BA, et al. The Newfoundland population: a unique resource for genetic investigation of complex diseases. Hum Mol Genet 2003;12 Spec No 2:R167-72.
283. Mathonnet G, Krajinovic M, Labuda D, Sinnett D. Role of DNA mismatch repair genetic polymorphisms in the risk of childhood acute lymphoblastic leukaemia. Br J Haematol 2003;123(1):45-8.
284. Listgarten J, Damaraju S, Poulin B, Cook L, Dufour J, Driga A, et al. Predictive models for breast cancer susceptibility from multiple single nucleotide polymorphisms. Clin Cancer Res 2004;10(8):2725-37.
285. Froggatt NJ, Joyce JA, Davies R, Gareth D, Evans R, Ponder BA, et al. A frequent hMSH2 mutation in hereditary non-polyposis colon cancer syndrome. Lancet 1995;345(8951):727.

286. Salovaara R, Loukola A, Kristo P, Kaariainen H, Ahtola H, Eskelinen M, et al. Population-based molecular detection of hereditary nonpolyposis colorectal cancer. J Clin Oncol 2000;18(11):2193-200.
287. Liu T, Stathopoulos P, Lindblom P, Rubio C, Wasteson Arver B, Iselius L, et al. MSH2 codon 322 Gly to Asp seems not to confer an increased risk for colorectal cancer susceptibility. Eur J Cancer 1998;34(12):1981.
288. Ellison AR, Lofing J, Bitter GA. Functional analysis of human MLH1 and MSH2 missense variants and hybrid human-yeast MLH1 proteins in Saccharomyces cerevisiae. Hum Mol Genet 2001;10(18):1889-900.
289. Drotschmann K, Clark AB, Kunkel TA. Mutator phenotypes of common polymorphisms and missense mutations in MSH2. Curr Biol 1999;9(16):907-10.
290. Worrillow LJ, Travis LB, Smith AG, Rollinson S, Smith AJ, Wild CP, et al. An intron splice acceptor polymorphism in hMSH2 and risk of leukemia after treatment with chemotherapeutic alkylating agents. Clin Cancer Res 2003;9(8):3012-20.
291. Paz-y-Mino C, Perez JC, Fiallo BF, Leone PE. A polymorphism in the hMSH2 gene (gIVS12-6T>C) associated with non-Hodgkin lymphomas. Cancer Genet Cytogenet 2002;133(1):29-33.
292. Palicio M, Blanco I, Tortola S, Gonzalez I, Marcuello E, Brunet J, et al. Intron splice acceptor site polymorphism in the hMSH2 gene in sporadic and familial colorectal cancer. Br J Cancer 2000;82(3):535-7.
293. Goessl C, Plaschke J, Pistorius S, Hahn M, Frank S, Hampl M, et al. An intronic germline transition in the HNPCC gene hMSH2 is associated with sporadic colorectal cancer. Eur J Cancer 1997;33(11):1869-74.
294. Brentnall TA, Rubin CE, Crispin DA, Stevens A, Batchelor RH, Haggitt RC, et al. A germline substitution in the human MSH2 gene is associated with high-grade dysplasia and cancer in ulcerative colitis. Gastroenterology 1995;109(1):151-5.
295. Jung CY, Choi JE, Park JM, Chae MH, Kang HG, Kim KM, et al. Polymorphisms in the hMSH2 gene and the risk of primary lung cancer. Cancer Epidemiol Biomarkers Prev 2006;15(4):762-8.
296. Campbell PT, Curtin K, Ulrich C, Samowitz W, Bigler J, Velicer C, et al. Mismatch repair polymorphisms and risk of colon cancer, tumor microsatellite instability, and interactions with lifestyle factors. Gut 2008.
297. Berndt SI, Platz EA, Fallin MD, Thuita LW, Hoffman SC, Helzlsouer KJ. Mismatch repair polymorphisms and the risk of colorectal cancer. Int J Cancer 2007;120(7):1548-54.
298. Lipkin SM, Rozek LS, Rennert G, Yang W, Chen PC, Hacia J, et al. The MLH1 D132H variant is associated with susceptibility to sporadic colorectal cancer. Nat Genet 2004;36(7):694-9.
299. Shin BY, Chen H, Rozek LS, Paxton L, Peel DJ, Anton-Culver H, et al. Low allele frequency of MLH1 D132H in American colorectal and endometrial cancer patients. Dis Colon Rectum 2005;48(9):1723-7.
300. Schafmayer C, Buch S, Egberts JH, Franke A, Brosch M, El Sharawy A, et al. Genetic investigation of DNA-repair pathway genes PMS2, MLH1, MSH2, MSH6, MUTYH, OGG1 and MTH1 in sporadic colon cancer. Int J Cancer 2007;121(3):555-8.
301. Ilyas M, Straub J, Tomlinson IP, Bodmer WF. Genetic pathways in colorectal and other cancers. Eur J Cancer 1999;35(14):1986-2002.

302. Lynch HT, de la Chapelle A. Hereditary colorectal cancer. N Engl J Med 2003;348(10):919-32.
303. Hampel H, Frankel WL, Martin E, Arnold M, Khanduja K, Kuebler P, et al. Screening for the Lynch syndrome (hereditary nonpolyposis colorectal cancer). N Engl J Med 2005;352(18):1851-60.
304. Ito E, Yanagisawa Y, Iwahashi Y, Suzuki Y, Nagasaki H, Akiyama Y, et al. A core promoter and a frequent single-nucleotide polymorphism of the mismatch repair gene hMLH1. Biochem Biophys Res Commun 1999;256(3):488-94.
305. Lee KM, Choi JY, Kang C, Kang CP, Park SK, Cho H, et al. Genetic polymorphisms of selected DNA repair genes, estrogen and progesterone receptor status, and breast cancer risk. Clin Cancer Res 2005;11(12):4620-6.
306. Park SH, Lee GY, Jeon HS, Lee SJ, Kim KM, Jang SS, et al. -93G-->A polymorphism of hMLH1 and risk of primary lung cancer. Int J Cancer 2004;112(4):678-82.
307. Yu JH, Bigler J, Whitton J, Potter JD, Ulrich CM. Mismatch repair polymorphisms and colorectal polyps: hMLH1-93G>A variant modifies risk associated with smoking. Am J Gastroenterol 2006;101(6):1313-9.
308. Hutter P, Couturier A, Rey-Berthod C. Two common forms of the human MLH1 gene may be associated with functional differences. J Med Genet 2000;37(10):776-81.
309. Hutter P, Wijnen J, Rey-Berthod C, Thiffault I, Verkuijlen P, Farber D, et al. An MLH1 haplotype is over-represented on chromosomes carrying an HNPCC predisposing mutation in MLH1. J Med Genet 2002;39(5):323-7.
310. Shin KH, Shin JH, Kim JH, Park JG. Mutational analysis of promoters of mismatch repair genes hMSH2 and hMLH1 in hereditary nonpolyposis colorectal cancer and early onset colorectal cancer patients: identification of three novel germ-line mutations in promoter of the hMSH2 gene. Cancer Res 2002;62(1):38-42.
311. Paz-y-Mino C, Fiallo BF, Morillo SA, Acosta A, Gimenez P, Ocampo L, et al. Analysis of the polymorphism [gIVS12-6T > C] in the hMSH2 gene in lymphoma and leukemia. Leuk Lymphoma 2003;44(3):505-8.
312. Gazzoli I, Kolodner RD. Regulation of the human MSH6 gene by the Sp1 transcription factor and alteration of promoter activity and expression by polymorphisms. Mol Cell Biol 2003;23(22):7992-8007.
313. Cotterchio M, McKeown-Eyssen G, Sutherland H, Buchan G, Aronson M, Easson AM, et al. Ontario familial colon cancer registry: methods and first-year response rates. Chronic Dis Can 2000;21(2):81-6.
314. Croitoru ME, Cleary SP, Di Nicola N, Manno M, Selander T, Aronson M, et al. Association between biallelic and monoallelic germline MYH gene mutations and colorectal cancer risk. J Natl Cancer Inst 2004;96(21):1631-4.
315. Livak KJ. Allelic discrimination using fluorogenic probes and the 5' nuclease assay. Genet Anal 1999;14(5-6):143-9.
316. Mullis KB, Faloona FA. Specific synthesis of DNA in vitro via a polymerase-catalyzed chain reaction. Methods Enzymol 1987;155:335-50.
317. Smith NH, Selander RK. Sequence invariance of the antigen-coding central region of the phase 1 flagellar filament gene (fliC) among strains of Salmonella typhimurium. J Bacteriol 1990;172(2):603-9.
318. Wyman AR, White R. A highly polymorphic locus in human DNA. Proc Natl Acad Sci U S A 1980;77(11):6754-8.

319. Watts D, MacBeath JR. Automated fluorescent DNA sequencing on the ABI PRISM 310 Genetic Analyzer. Methods Mol Biol 2001;167:153-70.
320. MacBeath JR, Harvey SS, Oldroyd NJ. Automated fluorescent DNA sequencing on the ABI PRISM 377. Methods Mol Biol 2001;167:119-52.
321. Lindor NM, Burgart LJ, Leontovich O, Goldberg RM, Cunningham JM, Sargent DJ, et al. Immunohistochemistry versus microsatellite instability testing in phenotyping colorectal tumors. J Clin Oncol 2002;20(4):1043-8.
322. Dupont WD, Plummer WD, Jr. Power and sample size calculations for studies involving linear regression. Control Clin Trials 1998;19(6):589-601.
323. Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Genetic structure of human populations. Science 2002;298(5602):2381-5.
324. Iwahashi Y, Ito E, Yanagisawa Y, Akiyama Y, Yuasa Y, Onodera T, et al. Promoter analysis of the human mismatch repair gene hMSH2. Gene 1998;213(1-2):141-7.
325. Zenke M, Grundstrom T, Matthes H, Wintzerith M, Schatz C, Wildeman A, et al. Multiple sequence motifs are involved in SV40 enhancer function. Embo J 1986;5(2):387-97.
326. Kinoshita S, Akira S, Kishimoto T. A member of the C/EBP family, NF-IL6 beta, forms a heterodimer and transcriptionally synergizes with NF-IL6. Proc Natl Acad Sci U S A 1992;89(4):1473-6.
327. Hung RJ, Brennan P, Canzian F, Szeszenia-Dabrowska N, Zaridze D, Lissowska J, et al. Large-scale investigation of base excision repair genetic polymorphisms and lung cancer risk in a multicenter study. J Natl Cancer Inst 2005;97(8):567-76.
328. Pawlik TM, Raut CP, Rodriguez-Bigas MA. Colorectal carcinogenesis: MSI-H versus MSI-L. Dis Markers 2004;20(4-5):199-206.
329. Diaz LA, Jr. The current clinical value of genomic instability. Semin Cancer Biol 2005;15(1):67-71.
330. Benson AB, 3rd. New approaches to the adjuvant therapy of colon cancer. Oncologist 2006;11(9):973-80.
331. Burt RW, Bishop DT, Lynch HT, Rozen P, Winawer SJ. Risk and surveillance of individuals with heritable factors for colorectal cancer. WHO Collaborating Centre for the Prevention of Colorectal Cancer. Bull World Health Organ 1990;68(5):655-65.
332. Bodmer WF. Cancer genetics: colorectal cancer as a model. J Hum Genet 2006;51(5):391-6.
333. Wheeler JM. Epigenetics, mismatch repair genes and colorectal cancer. Ann R Coll Surg Engl 2005;87(1):15-20.
334. Deng G, Chen A, Pong E, Kim YS. Methylation in hMLH1 promoter interferes with its binding to transcription factor CBF and inhibits gene expression. Oncogene 2001;20(48):7120-7.
335. Hitchins M, Williams R, Cheong K, Halani N, Lin VA, Packham D, et al. MLH1 germline epimutations as a factor in hereditary nonpolyposis colorectal cancer. Gastroenterology 2005;129(5):1392-9.
336. Ianzano L, Zhao XC, Minassian BA, Scherer SW. Identification of a novel protein interacting with laforin, the EPM2a progressive myoclonus epilepsy gene product. Genomics 2003;81(6):579-87.
337. De Vivo I, Hankinson SE, Colditz GA, Hunter DJ. A functional polymorphism in the progesterone receptor gene is associated with an increase in breast cancer risk. Cancer Res 2003;63(17):5236-8.
338. Fodde R, Smits R. Cancer biology. A matter of dosage. Science 2002;298(5594):761-3.

339. Green RC, Green JS, Buehler SK, Robb JD, Daftary D, Gallinger S, et al. Very high incidence of familial colorectal cancer in Newfoundland: a comparison with Ontario and 13 other population-based studies. Fam Cancer 2006.
340. Woods MO, Hyde AJ, Curtis FK, Stuckless S, Green JS, Pollett AF, et al. High frequency of hereditary colorectal cancer in Newfoundland likely involves novel susceptibility genes. Clin Cancer Res 2005;11(19 Pt 1):6853-61.
341. Raptis S, Mrkonjic M, Green RC, Pethe VV, Monga N, Chan YM, et al. MLH1 -93G>A promoter polymorphism and the risk of microsatellite-unstable colorectal cancer. J Natl Cancer Inst 2007;99(6):463-74.
342. Ru Lee W, Chen CC, Liu S, Safe S. 17beta-estradiol (E2) induces cdc25A gene expression in breast cancer cells by genomic and non-genomic pathways. J Cell Biochem 2006;99(1):209-20.
343. Wang W, Dong L, Saville B, Safe S. Transcriptional activation of E2F1 gene expression by 17beta-estradiol in MCF-7 cells is regulated by NF-Y-Sp1/estrogen receptor interactions. Mol Endocrinol 1999;13(8):1373-87.
344. Kushner PJ, Agard DA, Greene GL, Scanlan TS, Shiau AK, Uht RM, et al. Estrogen receptor pathways to AP-1. J Steroid Biochem Mol Biol 2000;74(5):311-7.
345. Webb P, Lopez GN, Uht RM, Kushner PJ. Tamoxifen activation of the estrogen receptor/AP-1 pathway: potential origin for the cell-specific estrogen-like effects of antiestrogens. Mol Endocrinol 1995;9(4):443-56.
346. Webb P, Nguyen P, Valentine C, Lopez GN, Kwok GR, McInerney E, et al. The estrogen receptor enhances AP-1 activity by two distinct mechanisms with different requirements for receptor transactivation functions. Mol Endocrinol 1999;13(10):1672-85.
347. Miyamoto T, Shiozawa T, Kashima H, Feng YZ, Suzuki A, Kurai M, et al. Estrogen up-regulates mismatch repair activity in normal and malignant endometrial glandular cells. Endocrinology 2006;147(10):4863-70.
348. Slattery ML, Potter JD, Curtin K, Edwards S, Ma KN, Anderson K, et al. Estrogens reduce and withdrawal of estrogens increase risk of microsatellite instability-positive colon cancer. Cancer Res 2001;61(1):126-30.
349. Quaresima B, Faniello MC, Baudi F, Cuda G, Grandinetti C, Tassone P, et al. Transcriptional regulation of the mismatch repair gene hMLH1. Gene 2001;275(2):261-5.
350. Bond GL, Hirshfield KM, Kirchhoff T, Alexe G, Bond EE, Robins H, et al. MDM2 SNP309 accelerates tumor formation in a gender-specific and hormone-dependent manner. Cancer Res 2006;66(10):5104-10.
351. Bond GL, Menin C, Bertorelle R, Alhopuro P, Aaltonen LA, Levine AJ. MDM2 SNP309 accelerates colorectal tumour formation in women. J Med Genet 2006;43(12):950-2.
352. Bond GL, Hu W, Bond EE, Robins H, Lutzker SG, Arva NC, et al. A single nucleotide polymorphism in the MDM2 promoter attenuates the p53 tumor suppressor pathway and accelerates tumor formation in humans. Cell 2004;119(5):591-602.
353. Deng G, Chen A, Hong J, Chae HS, Kim YS. Methylation of CpG in a small region of the hMLH1 promoter invariably correlates with the absence of gene expression. Cancer Res 1999;59(9):2029-33.
354. Lin JC, Jeong S, Liang G, Takai D, Fatemi M, Tsai YC, et al. Role of nucleosomal occupancy in the epigenetic silencing of the MLH1 CpG island. Cancer Cell 2007;12(5):432-44.

355. Samowitz WS, Curtin K, Wolff RK, Albertsen H, Sweeney C, Caan BJ, et al. The MLH1 -93 G>A promoter polymorphism and genetic and epigenetic alterations in colon cancer. Genes Chromosomes Cancer 2008;47(10):835-44.
356. Allan JM, Shorto J, Adlard J, Bury J, Coggins R, George R, et al. MLH1 -93G>A promoter polymorphism and risk of mismatch repair deficient colorectal cancer. Int J Cancer 2008;123(10):2456-9.
357. Beiner ME, Rosen B, Fyles A, Harley I, Pal T, Siminovitch K, et al. Endometrial cancer risk is associated with variants of the mismatch repair genes MLH1 and MSH2. Cancer Epidemiol Biomarkers Prev 2006;15(9):1636-40.
358. Chen H, Taylor NP, Sotamaa KM, Mutch DG, Powell MA, Schmidt AP, et al. Evidence for heritable predisposition to epigenetic silencing of MLH1. Int J Cancer 2007;120(8):1684-8.
359. Worrillow LJ, Smith AG, Scott K, Andersson M, Ashcroft AJ, Dores GM, et al. Polymorphic MLH1 and risk of cancer after methylating chemotherapy for Hodgkin lymphoma. J Med Genet 2008;45(3):142-6.
360. Zhong X, Arita M, Koike J, Tsujita K, Hemmi H. A Single Nucleotide Polymorphism in the Promoter Region of the hMLH1 Gene: Effect on Transcriptional Activity and Application as a Marker for Detecting Allelic Loss. Journal of the Medical Society of Toho University 2001;48(2):114-124.
361. Arita M, Zhong X, Min Z, Hemmi H, Shimatake H. Multiple sites required for expression in 5'-flanking region of the hMLH1 gene. Gene 2003;306:57-65.
362. Deng G, Peng E, Gum J, Terdiman J, Sleisenger M, Kim YS. Methylation of hMLH1 promoter correlates with the gene silencing with a region-specific manner in colorectal cancer. Br J Cancer 2002;86(4):574-9.
363. Capel E, Flejou JF, Hamelin R. Assessment of MLH1 promoter methylation in relation to gene expression requires specific analysis. Oncogene 2007;26(54):7596-600.
364. Nagase T, Ishikawa K, Suyama M, Kikuno R, Miyajima N, Tanaka A, et al. Prediction of the coding sequences of unidentified human genes. XI. The complete sequences of 100 new cDNA clones from brain which code for large proteins in vitro. DNA Res 1998;5(5):277-86.
365. Kim HS, Yim SV, Jung KH, Zheng LT, Kim YH, Lee KH, et al. Altered DNA copy number in patients with different seizure disorder type: by array-CGH. Brain Dev 2007;29(10):639-43.
366. Walker LC, Waddell N, Ten Haaf A, Grimmond S, Spurdle AB. Use of expression data and the CGEMS genome-wide breast cancer association study to identify genes that may modify risk in BRCA1/2 mutation carriers. Breast Cancer Res Treat 2008;112(2):229-36.
367. Liu Y, Wang Y, Wu C, Liu Y, Zheng P. Dimerization of Laforin is required for its optimal phosphatase activity, regulation of GSK3beta phosphorylation, and Wnt signaling. J Biol Chem 2006;281(46):34768-74.
368. Wang Y, Liu Y, Wu C, Zhang H, Zheng X, Zheng Z, et al. Epm2a suppresses tumor growth in an immunocompromised host by inhibiting Wnt signaling. Cancer Cell 2006;10(3):179-90.
369. Liu R, Wang L, Chen C, Liu Y, Zhou P, Wang Y, et al. Laforin negatively regulates cell cycle progression through glycogen synthase kinase 3beta-dependent mechanisms. Mol Cell Biol 2008;28(23):7236-44.
370. Wang WS, Chen PM, Su Y. Colorectal carcinoma: from tumorigenesis to treatment. Cell Mol Life Sci 2006;63(6):663-71.

371. Hitchins MP, Wong JJ, Suthers G, Suter CM, Martin DI, Hawkins NJ, et al. Inheritance of a cancer-associated MLH1 germ-line epimutation. N Engl J Med 2007;356(7):697-705.
372. Mrkonjic M, Raptis S, Green RC, Monga N, Daftary D, Dicks E, et al. MSH2 118T>C and MSH6 159C>T promoter polymorphisms and the risk of colorectal cancer. Carcinogenesis 2007;28(12):2575-80.
373. Newcomb PA, Baron J, Cotterchio M, Gallinger S, Grove J, Haile R, et al. Colon Cancer Family Registry: an international resource for studies of the genetic epidemiology of colon cancer. Cancer Epidemiol Biomarkers Prev 2007;16(11):2331-43.
374. Eads CA, Danenberg KD, Kawakami K, Saltz LB, Blake C, Shibata D, et al. MethyLight: a high-throughput assay to measure DNA methylation. Nucleic Acids Res 2000;28(8):E32.
375. Weisenberger DJ, Siegmund KD, Campan M, Young J, Long TI, Faasse MA, et al. CpG island methylator phenotype underlies sporadic microsatellite instability and is tightly associated with BRAF mutation in colorectal cancer. Nat Genet 2006;38(7):787-93.
376. Hampel H, Frankel W, Panescu J, Lockman J, Sotamaa K, Fix D, et al. Screening for Lynch syndrome (hereditary nonpolyposis colorectal cancer) among endometrial cancer patients. Cancer Res 2006;66(15):7810-7.
377. Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993;80(1):11.
378. Poynter JN, Siegmund KD, Weisenberger DJ, Long TI, Thibodeau SN, Lindor N, et al. Molecular characterization of MSI-H colorectal cancer by MLHI promoter methylation, immunohistochemistry, and mismatch repair germline mutation screening. Cancer Epidemiol Biomarkers Prev 2008;17(11):3208-15.
379. Harley I, Rosen B, Risch HA, Siminovitch K, Beiner ME, McLaughlin J, et al. Ovarian cancer risk is associated with a common variant in the promoter sequence of the mismatch repair gene MLH1. Gynecol Oncol 2008;109(3):384-7.
380. Liu J, Bang AG, Kintner C, Orth AP, Chanda SK, Ding S, et al. Identification of the Wnt signaling activator leucine-rich repeat in Flightless interaction protein 2 by a genome-wide functional analysis. Proc Natl Acad Sci U S A 2005;102(6):1927-32.
381. Dai P, Jeong SY, Yu Y, Leng T, Wu W, Xie L, et al. Modulation of TLR signaling by multiple MyD88-interacting partners including leucine-rich repeat Fli-I-interacting proteins. J Immunol 2009;182(6):3450-60.
382. Thorsen K, Sorensen KD, Brems-Eskildsen AS, Modin C, Gaustadnes M, Hein AM, et al. Alternative splicing in colon, bladder, and prostate cancer identified by exon array analysis. Mol Cell Proteomics 2008;7(7):1214-24.
383. Yu W, Gius D, Onyango P, Muldoon-Jacobs K, Karp J, Feinberg AP, et al. Epigenetic silencing of tumour suppressor gene p15 by its antisense RNA. Nature 2008;451(7175):202-6.
384. Hitchins MP, Lin VA, Buckle A, Cheong K, Halani N, Ku S, et al. Epigenetic inactivation of a cluster of genes flanking MLH1 in microsatellite-unstable colorectal cancer. Cancer Res 2007;67(19):9107-16.
385. Bindra RS, Glazer PM. Co-repression of mismatch repair gene expression by hypoxia in cancer cells: role of the Myc/Max network. Cancer Lett 2007;252(1):93-103.
386. Gan XQ, Wang JY, Xi Y, Wu ZL, Li YP, Li L. Nuclear Dvl, c-Jun, beta-catenin, and TCF form a complex leading to stabilization of beta-catenin-TCF interaction. J Cell Biol 2008;180(6):1087-100.